A data-mining approach for multiple structural alignment of proteins

Siu, WY; Mamoulis, N; Yiu, SM; Chan, HL

PMID: 21079664

Citations:
- PubMed Central: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: A data-mining approach for multiple structural alignment of proteins

Title	A data-mining approach for multiple structural alignment of proteins
Authors	Siu, WY; Mamoulis, N; Yiu, SM; Chan, HL
Keywords	Structural comparisons; Proteins; Multiple alignment
Issue Date	2010
Publisher	Biomedical Informatics Publishing Group. The Journal's web site is located at http://www.bioinformation.net/
Citation	Bioinformation, 2010, v. 4 n. 8, p. 366-370
Abstract	Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when far less information is available and far fewer assumptions can be made. We model the structural alignment of proteins as a combinatorial problem in which each protein is simply a set of points in 3D space, without sequence-order information, and the objective is to discover all sufficiently large alignments for any subset of the input. We propose a data-mining approach to this problem. We first perform geometric hashing of the structures so that points with similar locations in 3D space are hashed into the same bin of the hash table. The novelty is that we treat each bin as a coincidence group and mine the groups for frequent patterns, a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. A simple heuristic is then used to extend the alignments where possible. We implemented the algorithm and tested it on real protein structures, comparing the results with those of existing tools. The results showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture that includes dissimilar ones. The running time of the program was shorter than or comparable to that of the existing tools.
Persistent Identifier	http://hdl.handle.net/10722/129981
ISSN	0973-2063 (2022 Impact Factor: 1.9)
PubMed Central ID	PMC2951672
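
The pipeline outlined in the abstract (hash points into bins, treat each bin as a coincidence group, mine the groups for frequent patterns) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the structures are already placed in a common reference frame, whereas full geometric hashing would also enumerate candidate frames, and it counts supports by brute-force subset enumeration rather than with a dedicated frequent-pattern miner. The function names, the `cell` size, and the toy coordinates are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hash_points(structures, cell=2.0):
    """Map each grid cell to the set of structure ids that have a point in it.

    Assumption: all structures share one reference frame, so a plain
    uniform grid stands in for the geometric hash table.
    """
    bins = defaultdict(set)
    for sid, points in structures.items():
        for x, y, z in points:
            key = (int(x // cell), int(y // cell), int(z // cell))
            bins[key].add(sid)
    return bins


def frequent_groups(bins, min_support=3, min_size=2):
    """Return the structure subsets that co-occur in at least min_support bins.

    Each bin is one coincidence group; the support of a subset is the
    number of bins containing every member, i.e. a rough alignment size.
    Brute-force enumeration: fine for a toy input, exponential in general.
    """
    support = defaultdict(int)
    for members in bins.values():
        ids = sorted(members)
        for k in range(min_size, len(ids) + 1):
            for combo in combinations(ids, k):
                support[combo] += 1
    return {group: n for group, n in support.items() if n >= min_support}


# Toy input (hypothetical coordinates): A and B share three nearby points,
# while C coincides with them at only one location.
structures = {
    "A": [(0.5, 0.5, 0.5), (2.5, 0.5, 0.5), (4.5, 0.5, 0.5), (8.5, 8.5, 8.5)],
    "B": [(0.6, 0.4, 0.5), (2.4, 0.6, 0.5), (4.6, 0.5, 0.4), (20.5, 0.5, 0.5)],
    "C": [(0.4, 0.5, 0.6), (30.5, 30.5, 30.5)],
}

result = frequent_groups(hash_points(structures))
print(result)  # {('A', 'B'): 3}
```

In this sketch the pair (A, B) is reported because it is supported by three bins, while any subset involving C is supported by only one. A uniform grid can split nearby points that straddle a cell boundary, so a practical version would need some tolerance handling, followed by the extension heuristic the abstract mentions.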

DC Field	Value	Language
dc.contributor.author	Siu, WY	en_US
dc.contributor.author	Mamoulis, N	en_US
dc.contributor.author	Yiu, SM	en_US
dc.contributor.author	Chan, HL	en_US
dc.date.accessioned	2010-12-23T08:45:07Z	-
dc.date.available	2010-12-23T08:45:07Z	-
dc.date.issued	2010	en_US
dc.identifier.citation	Bioinformation, 2010, v. 4 n. 8, p. 366-370	en_US
dc.identifier.issn	0973-2063	-
dc.identifier.uri	http://hdl.handle.net/10722/129981	-
dc.description.abstract	Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when far less information is available and far fewer assumptions can be made. We model the structural alignment of proteins as a combinatorial problem in which each protein is simply a set of points in 3D space, without sequence-order information, and the objective is to discover all sufficiently large alignments for any subset of the input. We propose a data-mining approach to this problem. We first perform geometric hashing of the structures so that points with similar locations in 3D space are hashed into the same bin of the hash table. The novelty is that we treat each bin as a coincidence group and mine the groups for frequent patterns, a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. A simple heuristic is then used to extend the alignments where possible. We implemented the algorithm and tested it on real protein structures, comparing the results with those of existing tools. The results showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture that includes dissimilar ones. The running time of the program was shorter than or comparable to that of the existing tools.	-
dc.language	eng	en_US
dc.publisher	Biomedical Informatics Publishing Group. The Journal's web site is located at http://www.bioinformation.net/	-
dc.relation.ispartof	Bioinformation	en_US
dc.subject	Structural comparisons	-
dc.subject	Proteins	-
dc.subject	Multiple alignment	-
dc.title	A data-mining approach for multiple structural alignment of proteins	en_US
dc.type	Article	en_US
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0973-2063&volume=4&issue=8&spage=366&epage=370&date=2010&atitle=A+data-mining+approach+for+multiple+structural+alignment+of+proteins	-
dc.identifier.email	Mamoulis, N: nikos@cs.hku.hk	en_US
dc.identifier.email	Yiu, SM: smyiu@cs.hku.hk	en_US
dc.identifier.email	Chan, HL: hlchan@cs.hku.hk	en_US
dc.identifier.authority	Mamoulis, N=rp00155	en_US
dc.identifier.authority	Yiu, SM=rp00207	en_US
dc.identifier.authority	Chan, HL=rp01310	en_US
dc.description.nature	published_or_final_version	-
dc.identifier.pmid	21079664	-
dc.identifier.pmcid	PMC2951672	-
dc.identifier.hkuros	177372	en_US
dc.identifier.volume	4	en_US
dc.identifier.issue	8	en_US
dc.identifier.spage	366	en_US
dc.identifier.epage	370	en_US
dc.identifier.issnl	0973-2063	-
