
Conference Paper: Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness

Title: Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness
Authors: Yiu, SM; Chan, PY; Lam, TW; Sung, WK; Ting, HF; Wong, WH
Issue Date: 2005
Publisher: World Scientific Publishing Co Pte Ltd. The Journal's web site is located at http://www.worldscibooks.com/series/abcb_series.shtml
Citation: The 3rd Asia-Pacific Bioinformatics Conference (APBC 2005), Singapore, 17-21 January 2005. In Series on Advances In Bioinformatics and Computational Biology, 2005, v. 1, p. 1-10
Abstract: Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most such tools start by locating a set of short and highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues for the conserved regions, and the effectiveness of the tools is highly related to the quality of the anchors. Some popular software tools use exact-match maximal unique substrings (EM-MUM) as anchors. However, the results are not satisfactory, especially for genomes with high mutation rates (e.g., viruses). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following. Based on experiments on 35 pairs of virus genomes using three software tools (MUMmer-3, MaxMinCluster, MSS), we show that using anchors with mismatches does increase the effectiveness of locating conserved regions (about 10% more conserved gene regions are located, while maintaining a high sensitivity). Generating a more comprehensive set of anchors with mismatches is not trivial for long sequences because of time and memory limitations. We propose two practical algorithms for generating this anchor set: one aims at speeding up the process, and the other aims at saving memory. Experimental results show that both algorithms are faster (6 times and 5 times, respectively) than a straightforward suffix-tree-based approach.
Description: This journal issue is the proceedings of the 3rd Asia-Pacific Bioinformatics Conference (APBC)
Persistent Identifier: http://hdl.handle.net/10722/93466
ISSN: 1751-6404

 

DC Field | Value | Language
dc.contributor.author | Yiu, SM | en_HK
dc.contributor.author | Chan, PY | en_HK
dc.contributor.author | Lam, TW | en_HK
dc.contributor.author | Sung, WK | en_HK
dc.contributor.author | Ting, HF | en_HK
dc.contributor.author | Wong, WH | en_HK
dc.date.accessioned | 2010-09-25T15:02:01Z | -
dc.date.available | 2010-09-25T15:02:01Z | -
dc.date.issued | 2005 | en_HK
dc.identifier.citation | The 3rd Asia-Pacific Bioinformatics Conference (APBC 2005), Singapore, 17-21 January 2005. In Series on Advances In Bioinformatics and Computational Biology, 2005, v. 1, p. 1-10 | en_HK
dc.identifier.issn | 1751-6404 | -
dc.identifier.uri | http://hdl.handle.net/10722/93466 | -
dc.description | This journal issue is proceedings of the 3rd Asia-Pacific Bioinformatics Conference (APBC) | -
dc.language | eng | en_HK
dc.publisher | World Scientific Publishing Co Pte Ltd. The Journal's web site is located at http://www.worldscibooks.com/series/abcb_series.shtml | en_HK
dc.relation.ispartof | Series on Advances In Bioinformatics and Computational Biology | en_HK
dc.title | Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_HK
dc.identifier.email | Chan, PY: pychan@cs.hku.hk | en_HK
dc.identifier.email | Lam, TW: twlam@cs.hku.hk | en_HK
dc.identifier.email | Sung, WK: wksung@eti.hku.hk | en_HK
dc.identifier.email | Ting, HF: hfting@cs.hku.hk | en_HK
dc.identifier.authority | Yiu, SM=rp00207 | en_HK
dc.identifier.authority | Lam, TW=rp00135 | en_HK
dc.identifier.authority | Ting, HF=rp00177 | en_HK
dc.identifier.scopus | eid_2-s2.0-84857009288 | -
dc.identifier.hkuros | 102704 | en_HK
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-84857009288&selection=ref&src=s&origin=recordpage | -
dc.identifier.volume | 1 | -
dc.identifier.spage | 1 | en_HK
dc.identifier.epage | 10 | en_HK
dc.publisher.place | Singapore | -
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | -
dc.identifier.scopusauthorid | Chan, PY=26435793700 | -
dc.identifier.scopusauthorid | Lam, TW=7202523165 | -
dc.identifier.scopusauthorid | Sung, WK=13310059700 | -
dc.identifier.scopusauthorid | Ting, HF=7005654198 | -
dc.identifier.scopusauthorid | Wong, PWH=9734871500 | -
dc.customcontrol.immutable | sml 151014 - merged | -
