Benchmarking of computational error-correction methods for next-generation sequencing data

Mitchell, Keith; Brito, Jaqueline J.; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S.; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L.; Wu, Nicholas C.; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei

Links for fulltext

Publisher Website: 10.1186/s13059-020-01988-3
Scopus: eid_2-s2.0-85082008336
PMID: 32183840
WOS: WOS:000521297300001
Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- President's Office: Journal/Magazine Articles
- Biomedical Sciences: Journal/Magazine Articles

Article: Benchmarking of computational error-correction methods for next-generation sequencing data

Title	Benchmarking of computational error-correction methods for next-generation sequencing data
Authors	Mitchell, Keith; Brito, Jaqueline J.; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S.; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L.; Wu, Nicholas C.; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei
Issue Date	2020
Citation	Genome Biology, 2020, v. 21, n. 1, article no. 71. DOI: http://dx.doi.org/10.1186/s13059-020-01988-3
Abstract	© 2020 The Author(s). Background: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. Results: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. Conclusions: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
Persistent Identifier	http://hdl.handle.net/10722/285863
ISSN	1474-7596 (2012 Impact Factor: 10.288; 2023 SCImago Journal Rankings: 7.197)
PubMed Central ID	PMC7079412
ISI Accession Number ID	WOS:000521297300001

DC Field	Value	Language
dc.contributor.author	Mitchell, Keith	-
dc.contributor.author	Brito, Jaqueline J.	-
dc.contributor.author	Mandric, Igor	-
dc.contributor.author	Wu, Qiaozhen	-
dc.contributor.author	Knyazev, Sergey	-
dc.contributor.author	Chang, Sei	-
dc.contributor.author	Martin, Lana S.	-
dc.contributor.author	Karlsberg, Aaron	-
dc.contributor.author	Gerasimov, Ekaterina	-
dc.contributor.author	Littman, Russell	-
dc.contributor.author	Hill, Brian L.	-
dc.contributor.author	Wu, Nicholas C.	-
dc.contributor.author	Yang, Harry Taegyun	-
dc.contributor.author	Hsieh, Kevin	-
dc.contributor.author	Chen, Linus	-
dc.contributor.author	Littman, Eli	-
dc.contributor.author	Shabani, Taylor	-
dc.contributor.author	Enik, German	-
dc.contributor.author	Yao, Douglas	-
dc.contributor.author	Sun, Ren	-
dc.contributor.author	Schroeder, Jan	-
dc.contributor.author	Eskin, Eleazar	-
dc.contributor.author	Zelikovsky, Alex	-
dc.contributor.author	Skums, Pavel	-
dc.contributor.author	Pop, Mihai	-
dc.contributor.author	Mangul, Serghei	-
dc.date.accessioned	2020-08-18T04:56:50Z	-
dc.date.available	2020-08-18T04:56:50Z	-
dc.date.issued	2020	-
dc.identifier.citation	Genome Biology, 2020, v. 21, n. 1, article no. 71	-
dc.identifier.issn	1474-7596	-
dc.identifier.uri	http://hdl.handle.net/10722/285863	-
dc.description.abstract	© 2020 The Author(s). Background: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. Results: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. Conclusions: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.	-
dc.language	eng	-
dc.relation.ispartof	Genome Biology	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.title	Benchmarking of computational error-correction methods for next-generation sequencing data	-
dc.type	Article	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.1186/s13059-020-01988-3	-
dc.identifier.pmid	32183840	-
dc.identifier.pmcid	PMC7079412	-
dc.identifier.scopus	eid_2-s2.0-85082008336	-
dc.identifier.volume	21	-
dc.identifier.issue	1	-
dc.identifier.spage	article no. 71	-
dc.identifier.epage	article no. 71	-
dc.identifier.eissn	1474-760X	-
dc.identifier.isi	WOS:000521297300001	-
dc.identifier.issnl	1474-7596	-
