Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness

Yiu, SM; Chan, PY; Lam, TW; Sung, WK; Ting, HF; Wong, WH

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness

Title	Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness
Authors	Yiu, SM; Chan, PY; Lam, TW; Sung, WK; Ting, HF; Wong, WH
Issue Date	2005
Publisher	World Scientific Publishing Co Pte Ltd. The Journal's web site is located at http://www.worldscibooks.com/series/abcb_series.shtml
Citation	The 3rd Asia-Pacific Bioinformatics Conference (APBC 2005), Singapore, 17-21 January 2005. In Series on Advances In Bioinformatics and Computational Biology, 2005, v. 1, p. 1-10
Abstract	Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most of such tools start with locating a set of short and highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues for the conserved regions, and the effectiveness of the tools is highly related to the quality of the anchors. Some popular software tools use the exact match maximal unique substrings (EM-MUM) as anchors. However, the result is not satisfactory especially for genomes with high mutation rates (e.g. virus). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following. Based on the experiments on 35 pairs of virus genomes using three software tools (MUMmer-3, MaxMinCluster, MSS), we show that using anchors with mismatches does increase the effectiveness of locating conserved regions (about 10% more conserved gene regions are located, while maintaining a high sensitivity). To generate a more comprehensive set of anchors with mismatches is not trivial for long sequences due to the time and memory limitation. We propose two practical algorithms for generating this anchor set. One aims at speeding up the process, the other aims at saving memory. Experimental results show that both algorithms are faster (6 times and 5 times, respectively) than a straightforward suffix tree based approach.
Description	This journal issue is proceedings of the 3rd Asia-Pacific Bioinformatics Conference (APBC)
Persistent Identifier	http://hdl.handle.net/10722/93466
ISSN	1751-6404
References	References in Scopus
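
The abstract contrasts exact-match unique anchors with anchors that tolerate mismatches, but this record does not describe the authors' two algorithms. The Python sketch below only illustrates the general idea under stated assumptions: it seeds on k-mers that occur exactly once in each sequence, then extends each seed in both directions while spending a shared mismatch budget. The function names, the seed length `k`, and the `max_mismatches` parameter are hypothetical, and the naive hashing approach is not the suffix-tree-based method the abstract mentions.

```python
from collections import defaultdict


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}


def extend(a, b, i, j, step, budget):
    """Extend from (i, j) in direction `step` with at most `budget` mismatches.

    Returns (length, mismatches_used), where the extension always ends on a
    matching character, so trailing mismatches are trimmed away.
    """
    length = 0          # best extension length found so far
    used_at_best = 0    # mismatches inside that best extension
    scanned = 0         # characters examined so far
    used = 0            # mismatches spent so far
    while 0 <= i < len(a) and 0 <= j < len(b):
        if a[i] != b[j]:
            if used == budget:
                break
            used += 1
        else:
            length = scanned + 1
            used_at_best = used
        scanned += 1
        i += step
        j += step
    return length, used_at_best


def anchors_with_mismatches(a, b, k, max_mismatches):
    """Return sorted (start_a, start_b, length) anchors (illustrative only).

    Seeds are k-mers unique in both sequences; each seed is extended to the
    right first, and the left extension gets whatever budget remains.
    """
    unique_a = unique_kmer_positions(a, k)
    unique_b = unique_kmer_positions(b, k)
    anchors = set()
    for kmer, i in unique_a.items():
        j = unique_b.get(kmer)
        if j is None:
            continue
        right_len, right_used = extend(a, b, i + k, j + k, 1, max_mismatches)
        left_len, _ = extend(a, b, i - 1, j - 1, -1, max_mismatches - right_used)
        anchors.add((i - left_len, j - left_len, k + left_len + right_len))
    return sorted(anchors)


if __name__ == "__main__":
    a = "GATTACAGGC"
    b = "GATTCCAGGC"  # differs from a at one position
    print(anchors_with_mismatches(a, b, 4, 0))
    print(anchors_with_mismatches(a, b, 4, 1))
```

With these two toy sequences, a budget of zero mismatches yields two short anchors split at the differing base, `[(0, 0, 4), (5, 5, 5)]`, while a budget of one merges them into a single anchor spanning both sequences, `[(0, 0, 10)]`. This is the kind of effect the abstract attributes to allowing mismatches, though real genomes need the time- and memory-efficient generation the paper proposes.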

DC Field	Value	Language
dc.contributor.author	Yiu, SM	en_HK
dc.contributor.author	Chan, PY	en_HK
dc.contributor.author	Lam, TW	en_HK
dc.contributor.author	Sung, WK	en_HK
dc.contributor.author	Ting, HF	en_HK
dc.contributor.author	Wong, WH	en_HK
dc.date.accessioned	2010-09-25T15:02:01Z	-
dc.date.available	2010-09-25T15:02:01Z	-
dc.date.issued	2005	en_HK
dc.identifier.citation	The 3rd Asia-Pacific Bioinformatics Conference (APBC 2005), Singapore, 17-21 January 2005. In Series on Advances In Bioinformatics and Computational Biology, 2005, v. 1, p. 1-10	en_HK
dc.identifier.issn	1751-6404	-
dc.identifier.uri	http://hdl.handle.net/10722/93466	-
dc.description	This journal issue is proceedings of the 3rd Asia-Pacific Bioinformatics Conference (APBC)	-
dc.description.abstract	Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most of such tools start with locating a set of short and highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues for the conserved regions, and the effectiveness of the tools is highly related to the quality of the anchors. Some popular software tools use the exact match maximal unique substrings (EM-MUM) as anchors. However, the result is not satisfactory especially for genomes with high mutation rates (e.g. virus). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following. Based on the experiments on 35 pairs of virus genomes using three software tools (MUMmer-3, MaxMinCluster, MSS), we show that using anchors with mismatches does increase the effectiveness of locating conserved regions (about 10% more conserved gene regions are located, while maintaining a high sensitivity). To generate a more comprehensive set of anchors with mismatches is not trivial for long sequences due to the time and memory limitation. We propose two practical algorithms for generating this anchor set. One aims at speeding up the process, the other aims at saving memory. Experimental results show that both algorithms are faster (6 times and 5 times, respectively) than a straightforward suffix tree based approach.	-
dc.language	eng	en_HK
dc.publisher	World Scientific Publishing Co Pte Ltd. The Journal's web site is located at http://www.worldscibooks.com/series/abcb_series.shtml	en_HK
dc.relation.ispartof	Series on Advances In Bioinformatics and Computational Biology	en_HK
dc.title	Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.email	Yiu, SM: smyiu@cs.hku.hk	en_HK
dc.identifier.email	Chan, PY: pychan@cs.hku.hk	en_HK
dc.identifier.email	Lam, TW: twlam@cs.hku.hk	en_HK
dc.identifier.email	Sung, WK: wksung@eti.hku.hk	en_HK
dc.identifier.email	Ting, HF: hfting@cs.hku.hk	en_HK
dc.identifier.authority	Yiu, SM=rp00207	en_HK
dc.identifier.authority	Lam, TW=rp00135	en_HK
dc.identifier.authority	Ting, HF=rp00177	en_HK
dc.identifier.scopus	eid_2-s2.0-84857009288	-
dc.identifier.hkuros	102704	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-84857009288&selection=ref&src=s&origin=recordpage	-
dc.identifier.volume	1	-
dc.identifier.spage	1	en_HK
dc.identifier.epage	10	en_HK
dc.publisher.place	Singapore	-
dc.identifier.scopusauthorid	Yiu, SM=7003282240	-
dc.identifier.scopusauthorid	Chan, PY=26435793700	-
dc.identifier.scopusauthorid	Lam, TW=7202523165	-
dc.identifier.scopusauthorid	Sung, WK=13310059700	-
dc.identifier.scopusauthorid	Ting, HF=7005654198	-
dc.identifier.scopusauthorid	Wong, PWH=9734871500	-
dc.customcontrol.immutable	sml 151014 - merged	-
dc.identifier.issnl	1751-6404	-
