Finding optimal threshold for correction error reads in DNA assembling

Chin, FYL; Leung, HCM; Li, WL; Yiu, SM

Publisher Website: 10.1186/1471-2105-10-S1-S15
Scopus: eid_2-s2.0-60849121412
PMID: 19208114
WOS: WOS:000265601900015

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Finding optimal threshold for correction error reads in DNA assembling

Title	Finding optimal threshold for correction error reads in DNA assembling
Authors	Chin, FYL Leung, HCM Li, WL Yiu, SM
Issue Date	2009
Publisher	BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/
Citation	The 7th Asia-Pacific Bioinformatics Conference (APBC 2009), Beijing, China, 13-16 January 2009. In BMC Bioinformatics, 2009, v. 10 suppl. 1, p. 153-161. DOI: http://dx.doi.org/10.1186/1471-2105-10-S1-S15
Abstract	Background: DNA assembling is the problem of determining the nucleotide sequence of a genome from its substrings, called reads. Experimental errors in the reads degrade the performance of DNA assembly algorithms. Existing algorithms, e.g. ECINDEL and SRCorr, correct erroneous reads by considering the number of times each length-k substring of the reads appears in the input. They treat length-k substrings that appear at least M times as correct, and correct the erroneous reads based on these substrings. However, since the threshold M is chosen without solid theoretical analysis, these algorithms cannot guarantee their error-correction performance. Results: In this paper, we propose a method to calculate the probabilities of false positives and false negatives when determining whether a length-k substring is correct using threshold M. Based on this calculation, we find the optimal threshold M that minimizes the total errors (false positives plus false negatives). Experimental results on both real and simulated data showed that our calculation is correct and that we can reduce the total number of erroneous substrings by 77.6% and 65.1% compared to ECINDEL and SRCorr, respectively. Conclusion: We introduced a method to calculate the probabilities of false positives and false negatives for length-k substrings under different thresholds. Based on this calculation, we found the optimal threshold that minimizes the total error (false positives plus false negatives). © 2009 Chin et al; licensee BioMed Central Ltd.
Persistent Identifier	http://hdl.handle.net/10722/60604
ISSN	1471-2105 (2023 Impact Factor: 2.9; 2023 SCImago Journal Rankings: 1.005)
PubMed Central ID	PMC2648749
ISI Accession Number ID	WOS:000265601900015
References	References in Scopus
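
The threshold selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' calculation: the record does not give their probability model, so the sketch assumes, purely for illustration, that the number of occurrences of a correct length-k substring follows a Poisson distribution with mean lam_true and that of an erroneous substring follows a Poisson distribution with mean lam_err. All function and parameter names (poisson_cdf, expected_errors, best_threshold, n_true, n_err, max_m) are hypothetical.

```python
import math


def poisson_cdf(m, lam):
    """Return P(X <= m) for X ~ Poisson(lam); 0.0 when m < 0."""
    if m < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, m + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def expected_errors(M, n_true, lam_true, n_err, lam_err):
    """Expected (false positives, false negatives) for threshold M.

    A substring is accepted as correct when it occurs at least M times.
    ASSUMPTION: occurrence counts are Poisson; this is an illustrative
    model, not the one derived in the paper.
    """
    # Correct substrings seen fewer than M times are wrongly rejected.
    false_neg = n_true * poisson_cdf(M - 1, lam_true)
    # Erroneous substrings seen at least M times are wrongly accepted.
    false_pos = n_err * (1.0 - poisson_cdf(M - 1, lam_err))
    return false_pos, false_neg


def best_threshold(n_true, lam_true, n_err, lam_err, max_m=100):
    """Return the M in 1..max_m minimizing false positives + false negatives."""
    return min(
        range(1, max_m + 1),
        key=lambda M: sum(expected_errors(M, n_true, lam_true, n_err, lam_err)),
    )


if __name__ == "__main__":
    # Hypothetical inputs: one million correct and one million erroneous
    # substrings, with mean occurrence counts 20 and 0.5 respectively.
    M = best_threshold(1_000_000, 20.0, 1_000_000, 0.5)
    fp, fn = expected_errors(M, 1_000_000, 20.0, 1_000_000, 0.5)
    print(f"threshold M = {M}, expected FP = {fp:.1f}, expected FN = {fn:.1f}")
```

Raising M lowers the expected false positives and raises the expected false negatives, so the sum has a minimum at some intermediate threshold. The exhaustive scan over 1..max_m is chosen for clarity rather than speed.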

DC Field	Value	Language
dc.contributor.author	Chin, FYL	en_HK
dc.contributor.author	Leung, HCM	en_HK
dc.contributor.author	Li, WL	en_HK
dc.contributor.author	Yiu, SM	en_HK
dc.date.accessioned	2010-05-31T04:14:49Z	-
dc.date.available	2010-05-31T04:14:49Z	-
dc.date.issued	2009	en_HK
dc.identifier.citation	The 7th Asia-Pacific Bioinformatics Conference (APBC 2009), Beijing, China, 13-16 January 2009. In BMC Bioinformatics, 2009, v. 10 suppl. 1, p. 153-161	en_HK
dc.identifier.issn	1471-2105	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/60604	-
dc.language	eng	en_HK
dc.publisher	BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/	en_HK
dc.relation.ispartof	BMC Bioinformatics	en_HK
dc.rights	BMC Bioinformatics. Copyright © BioMed Central Ltd.	en_HK
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.mesh	Base Sequence	-
dc.subject.mesh	Computational Biology - methods	-
dc.subject.mesh	DNA - chemistry	-
dc.subject.mesh	Genome	-
dc.subject.mesh	Sequence Analysis, DNA - methods	-
dc.title	Finding optimal threshold for correction error reads in DNA assembling	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1471-2105&volume=10, supp 1, article no. S15&spage=&epage=&date=2009&atitle=Finding+optimal+threshold+for+correction+error+reads+in+DNA+assembling	en_HK
dc.identifier.email	Chin, FYL:chin@cs.hku.hk	en_HK
dc.identifier.email	Leung, HCM:cmleung2@cs.hku.hk	en_HK
dc.identifier.email	Li, WL:wlli@cs.hku.hk	en_HK
dc.identifier.email	Yiu, SM:smyiu@cs.hku.hk	-
dc.identifier.authority	Chin, FYL=rp00105	en_HK
dc.identifier.authority	Leung, HCM=rp00144	en_HK
dc.identifier.authority	Yiu, SM=rp00207	en_HK
dc.description.nature	published_or_final_version	en_US
dc.identifier.doi	10.1186/1471-2105-10-S1-S15	en_HK
dc.identifier.pmid	19208114	-
dc.identifier.pmcid	PMC2648749	-
dc.identifier.scopus	eid_2-s2.0-60849121412	en_HK
dc.identifier.hkuros	161353	en_HK
dc.identifier.hkuros	166440	-
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-60849121412&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	10	en_HK
dc.identifier.issue	suppl. 1	en_HK
dc.identifier.spage	153	-
dc.identifier.epage	161	-
dc.identifier.eissn	1471-2105	-
dc.identifier.isi	WOS:000265601900015	-
dc.publisher.place	United Kingdom	en_HK
dc.identifier.scopusauthorid	Chin, FYL=7005101915	en_HK
dc.identifier.scopusauthorid	Leung, HCM=35233742700	en_HK
dc.identifier.scopusauthorid	Li, WL=36063309100	en_HK
dc.identifier.scopusauthorid	Yiu, SM=7003282240	en_HK
dc.identifier.citeulike	4307690	-
dc.customcontrol.immutable	sml 151113 - merged	-
dc.identifier.issnl	1471-2105	-
