Conference Paper: Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness
Title | Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness |
---|---|
Authors | Yiu, SM; Chan, PY; Lam, TW; Sung, WK; Ting, HF; Wong, WH |
Issue Date | 2005 |
Publisher | World Scientific Publishing Co Pte Ltd. The Journal's web site is located at http://www.worldscibooks.com/series/abcb_series.shtml |
Citation | The 3rd Asia-Pacific Bioinformatics Conference (APBC 2005), Singapore, 17-21 January 2005. In Series on Advances In Bioinformatics and Computational Biology, 2005, v. 1, p. 1-10 |
Abstract | Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most such tools start by locating a set of short, highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues to the conserved regions, and the effectiveness of the tools depends heavily on the quality of the anchors. Some popular software tools use exact-match maximal unique substrings (EM-MUM) as anchors. However, the results are not satisfactory, especially for genomes with high mutation rates (e.g. viruses). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following. Based on experiments on 35 pairs of virus genomes using three software tools (MUMmer-3, MaxMinCluster, MSS), we show that using anchors with mismatches does increase the effectiveness of locating conserved regions (about 10% more conserved gene regions are located, while maintaining high sensitivity). Generating a more comprehensive set of anchors with mismatches is not trivial for long sequences because of time and memory limitations. We propose two practical algorithms for generating this anchor set: one aims at speeding up the process, the other at saving memory. Experimental results show that both algorithms are faster (6 times and 5 times, respectively) than a straightforward suffix-tree-based approach. |
Description | This journal issue is the proceedings of the 3rd Asia-Pacific Bioinformatics Conference (APBC) |
Persistent Identifier | http://hdl.handle.net/10722/93466 |
ISSN | 1751-6404 |
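
The abstract contrasts exact-match anchors with anchors that tolerate mismatches. The sketch below illustrates only that contrast: it extends a candidate anchor from a pair of start positions and stops once a mismatch budget is exhausted. The function name `extend_with_mismatches` and the example sequences are hypothetical, and this naive scan is not one of the two algorithms proposed in the paper, nor the suffix tree baseline it is compared against.

```python
def extend_with_mismatches(a: str, b: str, i: int, j: int, k: int) -> int:
    """Return the length of the longest stretch starting at a[i] and b[j]
    that contains at most k mismatches and ends on a matching character.

    Illustrative only: a naive left-to-right scan, not the paper's method.
    """
    length = 0      # characters examined so far
    mismatches = 0  # mismatches consumed so far
    end = 0         # length up to and including the last matching character
    while i + length < len(a) and j + length < len(b):
        if a[i + length] != b[j + length]:
            if mismatches == k:
                break  # mismatch budget exhausted
            mismatches += 1
        else:
            end = length + 1
        length += 1
    return end


# Hypothetical sequences that differ at a single position (index 3).
seq_a = "ACGTTGCA"
seq_b = "ACGATGCA"

exact_len = extend_with_mismatches(seq_a, seq_b, 0, 0, k=0)
relaxed_len = extend_with_mismatches(seq_a, seq_b, 0, 0, k=1)
print(exact_len, relaxed_len)
```

With no mismatches allowed, the extension stops at the first differing base; allowing one mismatch lets the same start positions yield a single longer anchor, which is the intuition behind relaxing exact matching for fast-mutating genomes.
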
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yiu, SM | en_HK |
dc.contributor.author | Chan, PY | en_HK |
dc.contributor.author | Lam, TW | en_HK |
dc.contributor.author | Sung, WK | en_HK |
dc.contributor.author | Ting, HF | en_HK |
dc.contributor.author | Wong, WH | en_HK |
dc.date.accessioned | 2010-09-25T15:02:01Z | - |
dc.date.available | 2010-09-25T15:02:01Z | - |
dc.date.issued | 2005 | en_HK |
dc.identifier.citation | The 3rd Asia-Pacific Bioinformatics Conference (APBC 2005), Singapore, 17-21 January 2005. In Series on Advances In Bioinformatics and Computational Biology, 2005, v. 1, p. 1-10 | en_HK |
dc.identifier.issn | 1751-6404 | - |
dc.identifier.uri | http://hdl.handle.net/10722/93466 | - |
dc.description | This journal issue is the proceedings of the 3rd Asia-Pacific Bioinformatics Conference (APBC) | - |
dc.description.abstract | Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most such tools start by locating a set of short, highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues to the conserved regions, and the effectiveness of the tools depends heavily on the quality of the anchors. Some popular software tools use exact-match maximal unique substrings (EM-MUM) as anchors. However, the results are not satisfactory, especially for genomes with high mutation rates (e.g. viruses). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following. Based on experiments on 35 pairs of virus genomes using three software tools (MUMmer-3, MaxMinCluster, MSS), we show that using anchors with mismatches does increase the effectiveness of locating conserved regions (about 10% more conserved gene regions are located, while maintaining high sensitivity). Generating a more comprehensive set of anchors with mismatches is not trivial for long sequences because of time and memory limitations. We propose two practical algorithms for generating this anchor set: one aims at speeding up the process, the other at saving memory. Experimental results show that both algorithms are faster (6 times and 5 times, respectively) than a straightforward suffix-tree-based approach. | - |
dc.language | eng | en_HK |
dc.publisher | World Scientific Publishing Co Pte Ltd. The Journal's web site is located at http://www.worldscibooks.com/series/abcb_series.shtml | en_HK |
dc.relation.ispartof | Series on Advances In Bioinformatics and Computational Biology | en_HK |
dc.title | Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness | en_HK |
dc.type | Conference_Paper | en_HK |
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_HK |
dc.identifier.email | Chan, PY: pychan@cs.hku.hk | en_HK |
dc.identifier.email | Lam, TW: twlam@cs.hku.hk | en_HK |
dc.identifier.email | Sung, WK: wksung@eti.hku.hk | en_HK |
dc.identifier.email | Ting, HF: hfting@cs.hku.hk | en_HK |
dc.identifier.authority | Yiu, SM=rp00207 | en_HK |
dc.identifier.authority | Lam, TW=rp00135 | en_HK |
dc.identifier.authority | Ting, HF=rp00177 | en_HK |
dc.identifier.scopus | eid_2-s2.0-84857009288 | - |
dc.identifier.hkuros | 102704 | en_HK |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-84857009288&selection=ref&src=s&origin=recordpage | - |
dc.identifier.volume | 1 | - |
dc.identifier.spage | 1 | en_HK |
dc.identifier.epage | 10 | en_HK |
dc.publisher.place | Singapore | - |
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | - |
dc.identifier.scopusauthorid | Chan, PY=26435793700 | - |
dc.identifier.scopusauthorid | Lam, TW=7202523165 | - |
dc.identifier.scopusauthorid | Sung, WK=13310059700 | - |
dc.identifier.scopusauthorid | Ting, HF=7005654198 | - |
dc.identifier.scopusauthorid | Wong, PWH=9734871500 | - |
dc.customcontrol.immutable | sml 151014 - merged | - |
dc.identifier.issnl | 1751-6404 | - |