Conference Paper: Finding optimal threshold for correction error reads in DNA assembling

Title: Finding optimal threshold for correction error reads in DNA assembling
Authors: Chin, FYL; Leung, HCM; Li, WL; Yiu, SM
Issue Date: 2009
Publisher: BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/
Citation: The 7th Asia-Pacific Bioinformatics Conference (APBC 2009), Beijing, China, 13-16 January 2009. In BMC Bioinformatics, 2009, v. 10 suppl. 1, p. 153-161
Abstract: Background: DNA assembling is the problem of determining the nucleotide sequence of a genome from its substrings, called reads. In experiments, the reads may contain errors, which affect the performance of DNA assembly algorithms. Existing algorithms, e.g. ECINDEL and SRCorr, correct erroneous reads by considering the number of times each length-k substring of the reads appears in the input. They treat length-k substrings that appear at least M times as correct substrings and correct the erroneous reads based on these substrings. However, since the threshold M is chosen without any solid theoretical analysis, these algorithms cannot guarantee their performance on error correction. Results: In this paper, we propose a method to calculate the probabilities of false positives and false negatives when determining whether a length-k substring is correct using threshold M. Based on this calculation, we find the optimal threshold M that minimizes the total errors (false positives and false negatives). Experimental results on both real data and simulated data showed that our calculation is correct and that we can reduce the total error substrings by 77.6% and 65.1% when compared to ECINDEL and SRCorr, respectively. Conclusion: We introduced a method to calculate the probabilities of false positives and false negatives for length-k substrings under different thresholds. Based on this calculation, we found the optimal threshold that minimizes the total error (false positives plus false negatives). © 2009 Chin et al; licensee BioMed Central Ltd.
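The thresholding idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the paper derives the false positive and false negative probabilities analytically, whereas this sketch simply counts errors empirically against a known toy genome, as one might in a simulation. The function names (`count_kmers`, `solid_kmers`, `total_errors`, `best_threshold`), the toy genome, and the toy reads are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, m):
    """Return the k-mers that appear at least m times (treated as correct)."""
    return {kmer for kmer, c in counts.items() if c >= m}


def total_errors(counts, true_kmers, m):
    """False positives plus false negatives for threshold m.

    Assumes the true k-mer set is known, which only holds for simulated data.
    """
    accepted = solid_kmers(counts, m)
    false_pos = len(accepted - true_kmers)   # erroneous k-mers accepted
    false_neg = len(true_kmers - accepted)   # genuine k-mers rejected
    return false_pos + false_neg


def best_threshold(counts, true_kmers, max_m):
    """Pick the threshold in 1..max_m with the fewest total errors."""
    return min(range(1, max_m + 1),
               key=lambda m: total_errors(counts, true_kmers, m))


# Toy data (hypothetical): the last read carries one substitution error.
K = 3
GENOME = "ACGTTGCA"
READS = ["ACGTTG", "CGTTGC", "GTTGCA", "ACGATG"]

TRUE_KMERS = set(count_kmers([GENOME], K))
COUNTS = count_kmers(READS, K)
BEST_M = best_threshold(COUNTS, TRUE_KMERS, max_m=3)
```

On this toy input, a threshold of 1 accepts the three erroneous substrings, a threshold of 3 rejects most genuine ones, and a threshold of 2 gives the fewest total errors. Real data has no known genome to compare against, which is why a theoretical estimate of the two error probabilities is needed.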
Persistent Identifier: http://hdl.handle.net/10722/60604
ISSN: 1471-2105
2015 Impact Factor: 2.435
2015 SCImago Journal Rankings: 1.722
PubMed Central ID: PMC2648749
ISI Accession Number ID: WOS:000265601900015

DC Field | Value | Language
dc.contributor.author | Chin, FYL | en_HK
dc.contributor.author | Leung, HCM | en_HK
dc.contributor.author | Li, WL | en_HK
dc.contributor.author | Yiu, SM | en_HK
dc.date.accessioned | 2010-05-31T04:14:49Z | -
dc.date.available | 2010-05-31T04:14:49Z | -
dc.date.issued | 2009 | en_HK
dc.identifier.citation | The 7th Asia-Pacific Bioinformatics Conference (APBC 2009), Beijing, China, 13-16 January 2009. In BMC Bioinformatics, 2009, v. 10 suppl. 1, p. 153-161 | en_HK
dc.identifier.issn | 1471-2105 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/60604 | -
dc.language | eng | en_HK
dc.publisher | BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/ | en_HK
dc.relation.ispartof | BMC Bioinformatics | en_HK
dc.rights | BMC Bioinformatics. Copyright © BioMed Central Ltd. | en_HK
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.subject.mesh | Base Sequence | -
dc.subject.mesh | Computational Biology - methods | -
dc.subject.mesh | DNA - chemistry | -
dc.subject.mesh | Genome | -
dc.subject.mesh | Sequence Analysis, DNA - methods | -
dc.title | Finding optimal threshold for correction error reads in DNA assembling | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1471-2105&volume=10, supp 1, article no. S15&spage=&epage=&date=2009&atitle=Finding+optimal+threshold+for+correction+error+reads+in+DNA+assembling | en_HK
dc.identifier.email | Chin, FYL: chin@cs.hku.hk | en_HK
dc.identifier.email | Leung, HCM: cmleung2@cs.hku.hk | en_HK
dc.identifier.email | Li, W: wlli@cs.hku.hk | en_HK
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | -
dc.identifier.authority | Chin, FYL=rp00105 | en_HK
dc.identifier.authority | Leung, HCM=rp00144 | en_HK
dc.identifier.authority | Yiu, SM=rp00207 | en_HK
dc.description.nature | published_or_final_version | en_US
dc.identifier.doi | 10.1186/1471-2105-10-S1-S15 | en_HK
dc.identifier.pmid | 19208114 | -
dc.identifier.pmcid | PMC2648749 | -
dc.identifier.scopus | eid_2-s2.0-60849121412 | en_HK
dc.identifier.hkuros | 161353 | en_HK
dc.identifier.hkuros | 166440 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-60849121412&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 10 | en_HK
dc.identifier.issue | suppl. 1 | en_HK
dc.identifier.spage | 153 | -
dc.identifier.epage | 161 | -
dc.identifier.eissn | 1471-2105 | -
dc.identifier.isi | WOS:000265601900015 | -
dc.publisher.place | United Kingdom | en_HK
dc.identifier.scopusauthorid | Chin, FYL=7005101915 | en_HK
dc.identifier.scopusauthorid | Leung, HCM=35233742700 | en_HK
dc.identifier.scopusauthorid | Li, WL=36063309100 | en_HK
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_HK
dc.identifier.citeulike | 4307690 | -
dc.customcontrol.immutable | sml 151113 - merged | -
