Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1186/1471-2105-10-S1-S15
- Scopus: eid_2-s2.0-60849121412
- PMID: 19208114
- WOS: WOS:000265601900015
Conference Paper: Finding optimal threshold for correction error reads in DNA assembling
Title | Finding optimal threshold for correction error reads in DNA assembling |
---|---|
Authors | Chin, FYL; Leung, HCM; Li, WL; Yiu, SM |
Issue Date | 2009 |
Publisher | BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/ |
Citation | The 7th Asia-Pacific Bioinformatics Conference (APBC 2009), Beijing, China, 13-16 January 2009. In BMC Bioinformatics, 2009, v. 10 suppl. 1, p. 153-161 |
Abstract | Background: DNA assembling is the problem of determining the nucleotide sequence of a genome from its substrings, called reads. In the experiments, there may be some errors in the reads which affect the performance of DNA assembly algorithms. Existing algorithms, e.g. ECINDEL and SRCorr, correct the error reads by considering the number of times each length-k substring of the reads appears in the input. They treat length-k substrings that appear at least M times as correct substrings and correct the error reads based on these substrings. However, since the threshold M is chosen without any solid theoretical analysis, these algorithms cannot guarantee their performance on error correction. Results: In this paper, we propose a method to calculate the probabilities of false positives and false negatives when determining whether a length-k substring is correct using threshold M. Based on this calculation, we find the optimal threshold M that minimizes the total errors (false positives plus false negatives). Experimental results on both real data and simulated data showed that our calculation is correct and that we can reduce the total error substrings by 77.6% and 65.1% when compared to ECINDEL and SRCorr respectively. Conclusion: We introduced a method to calculate the probabilities of false positives and false negatives for length-k substrings under different thresholds. Based on this calculation, we found the optimal threshold that minimizes the total error (false positives plus false negatives). © 2009 Chin et al; licensee BioMed Central Ltd. |
Persistent Identifier | http://hdl.handle.net/10722/60604 |
ISSN | 1471-2105 (2023 Impact Factor: 2.9; 2023 SCImago Journal Rankings: 1.005) |
PubMed Central ID | PMC2648749 |
ISI Accession Number ID | WOS:000265601900015 |
References |
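
The thresholding rule described in the abstract (a length-k substring counts as correct when it appears at least M times in the reads) can be illustrated with a small sketch. This is not the authors' method: the paper derives the false positive and false negative probabilities analytically, whereas the code below simply counts both kinds of error empirically on toy data where the true genome is known, and sweeps M to find the value with the fewest total errors. All function names and the toy genome and reads are hypothetical, introduced only for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each length-k substring occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def accepted_kmers(counts, m):
    """Substrings treated as correct: those seen at least m times."""
    return {kmer for kmer, c in counts.items() if c >= m}


def total_errors(counts, true_kmers, m):
    """False positives plus false negatives for threshold m.

    Assumption for this sketch: a false positive is an accepted substring
    that is not in the genome, and a false negative is a genome substring
    that is not accepted (including ones never sampled).
    """
    accepted = accepted_kmers(counts, m)
    false_pos = len(accepted - true_kmers)
    false_neg = len(true_kmers - accepted)
    return false_pos + false_neg


def best_threshold(counts, true_kmers, max_m):
    """Empirical sweep over m = 1..max_m; returns the smallest best m."""
    return min(range(1, max_m + 1),
               key=lambda m: total_errors(counts, true_kmers, m))


# Toy data (hypothetical): the last read carries one substitution error.
genome = "ACGTTGCA"
k = 3
reads = ["ACGTTG", "CGTTGC", "GTTGCA", "ACGTAG"]
true_kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}
counts = count_kmers(reads, k)

for m in range(1, 5):
    print(m, total_errors(counts, true_kmers, m))
print("best M:", best_threshold(counts, true_kmers, 4))
```

On real data the genome is unknown, so a sweep like this is only possible in simulation; that is the gap the analytic calculation in the paper is meant to fill.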
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chin, FYL | en_HK |
dc.contributor.author | Leung, HCM | en_HK |
dc.contributor.author | Li, WL | en_HK |
dc.contributor.author | Yiu, SM | en_HK |
dc.date.accessioned | 2010-05-31T04:14:49Z | - |
dc.date.available | 2010-05-31T04:14:49Z | - |
dc.date.issued | 2009 | en_HK |
dc.identifier.citation | The 7th Asia-Pacific Bioinformatics Conference (APBC 2009), Beijing, China, 13-16 January 2009. In BMC Bioinformatics, 2009, v. 10 suppl. 1, p. 153-161 | en_HK |
dc.identifier.issn | 1471-2105 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/60604 | - |
dc.description.abstract | Background: DNA assembling is the problem of determining the nucleotide sequence of a genome from its substrings, called reads. In the experiments, there may be some errors in the reads which affect the performance of DNA assembly algorithms. Existing algorithms, e.g. ECINDEL and SRCorr, correct the error reads by considering the number of times each length-k substring of the reads appears in the input. They treat length-k substrings that appear at least M times as correct substrings and correct the error reads based on these substrings. However, since the threshold M is chosen without any solid theoretical analysis, these algorithms cannot guarantee their performance on error correction. Results: In this paper, we propose a method to calculate the probabilities of false positives and false negatives when determining whether a length-k substring is correct using threshold M. Based on this calculation, we find the optimal threshold M that minimizes the total errors (false positives plus false negatives). Experimental results on both real data and simulated data showed that our calculation is correct and that we can reduce the total error substrings by 77.6% and 65.1% when compared to ECINDEL and SRCorr respectively. Conclusion: We introduced a method to calculate the probabilities of false positives and false negatives for length-k substrings under different thresholds. Based on this calculation, we found the optimal threshold that minimizes the total error (false positives plus false negatives). © 2009 Chin et al; licensee BioMed Central Ltd. | en_HK |
dc.language | eng | en_HK |
dc.publisher | BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/ | en_HK |
dc.relation.ispartof | BMC Bioinformatics | en_HK |
dc.rights | BMC Bioinformatics. Copyright © BioMed Central Ltd. | en_HK |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.mesh | Base Sequence | - |
dc.subject.mesh | Computational Biology - methods | - |
dc.subject.mesh | DNA - chemistry | - |
dc.subject.mesh | Genome | - |
dc.subject.mesh | Sequence Analysis, DNA - methods | - |
dc.title | Finding optimal threshold for correction error reads in DNA assembling | en_HK |
dc.type | Conference_Paper | en_HK |
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1471-2105&volume=10, supp 1, article no. S15&spage=&epage=&date=2009&atitle=Finding+optimal+threshold+for+correction+error+reads+in+DNA+assembling | en_HK |
dc.identifier.email | Chin, FYL:chin@cs.hku.hk | en_HK |
dc.identifier.email | Leung, HCM:cmleung2@cs.hku.hk | en_HK |
dc.identifier.email | Li, WL:wlli@cs.hku.hk | en_HK |
dc.identifier.email | Yiu, SM:smyiu@cs.hku.hk | - |
dc.identifier.authority | Chin, FYL=rp00105 | en_HK |
dc.identifier.authority | Leung, HCM=rp00144 | en_HK |
dc.identifier.authority | Yiu, SM=rp00207 | en_HK |
dc.description.nature | published_or_final_version | en_US |
dc.identifier.doi | 10.1186/1471-2105-10-S1-S15 | en_HK |
dc.identifier.pmid | 19208114 | - |
dc.identifier.pmcid | PMC2648749 | - |
dc.identifier.scopus | eid_2-s2.0-60849121412 | en_HK |
dc.identifier.hkuros | 161353 | en_HK |
dc.identifier.hkuros | 166440 | - |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-60849121412&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.volume | 10 | en_HK |
dc.identifier.issue | suppl. 1 | en_HK |
dc.identifier.spage | 153 | - |
dc.identifier.epage | 161 | - |
dc.identifier.eissn | 1471-2105 | - |
dc.identifier.isi | WOS:000265601900015 | - |
dc.publisher.place | United Kingdom | en_HK |
dc.identifier.scopusauthorid | Chin, FYL=7005101915 | en_HK |
dc.identifier.scopusauthorid | Leung, HCM=35233742700 | en_HK |
dc.identifier.scopusauthorid | Li, WL=36063309100 | en_HK |
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_HK |
dc.identifier.citeulike | 4307690 | - |
dc.identifier.issnl | 1471-2105 | - |