Article: Accurate top protein variant discovery via low-N pick-and-validate machine learning

Title: Accurate top protein variant discovery via low-N pick-and-validate machine learning
Authors: Chu, Hoi Yee; Fong, John HC; Thean, Dawn GL; Zhou, Peng; Fung, Frederic KC; Huang, Yuanhua; Wong, Alan SL
Keywords: active learning, base editor, Cas9, combinatorial mutagenesis, CRISPR, genome editing, low-N, machine learning, protein engineering, zero-shot
Issue Date: 21-Feb-2024
Publisher: Elsevier
Citation: Cell Systems, 2024, v. 15, n. 2, p. 193-203
Abstract: A strategy to obtain the greatest number of best-performing variants with least amount of experimental effort over the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning via experimenting with only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants for machine learning yielded the best accuracy of up to 92.6% in selecting the true top 1% variants in combinatorial mutant libraries, whereas two rounds of 24 variants can also be used. We demonstrate our strategy in successfully discovering high-performance protein variants from diverse families including the CRISPR-based genome editors, supporting its generalizable application for solving protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information.
Persistent Identifier: http://hdl.handle.net/10722/345914
ISSN: 2405-4712
2023 Impact Factor: 9.0
2023 SCImago Journal Rankings: 4.872
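
The abstract describes a loop of zero-shot ranking followed by repeated rounds of picking a few predicted top variants, validating them, and retraining. A minimal sketch of that loop is given below, under stated assumptions: the function names (`pick_and_validate`, `fit_additive`), the synthetic fitness landscape, and the per-residue averaging surrogate are all hypothetical illustrations and are not the authors' model or code. Only the round structure (four rounds of 12 variants) is taken from the abstract.

```python
import itertools
import random


def fit_additive(library, measured):
    """Toy surrogate (an assumption, not the paper's model): score a variant
    by the mean measured fitness observed for each of its residues."""
    overall = sum(measured.values()) / len(measured)
    totals, counts = {}, {}
    for variant, y in measured.items():
        for site, aa in enumerate(variant):
            totals[(site, aa)] = totals.get((site, aa), 0.0) + y
            counts[(site, aa)] = counts.get((site, aa), 0) + 1

    def predict(variant):
        # Residues never measured fall back to the overall mean.
        per_site = [
            totals[(s, a)] / counts[(s, a)] if (s, a) in counts else overall
            for s, a in enumerate(variant)
        ]
        return sum(per_site) / len(per_site)

    return {v: predict(v) for v in library}


def pick_and_validate(library, zero_shot, measure, rounds=4, n_per_round=12):
    """Pick the top predicted variants, measure them, refit, and repeat."""
    measured = {}              # variant -> measured fitness
    scores = dict(zero_shot)   # the first round ranks by the zero-shot prior
    for _ in range(rounds):
        # Rank only the variants that have not been measured yet.
        candidates = [v for v in library if v not in measured]
        candidates.sort(key=lambda v: scores[v], reverse=True)
        for v in candidates[:n_per_round]:
            measured[v] = measure(v)  # stands in for one experimental assay
        scores = fit_additive(library, measured)
    return measured, scores


# Synthetic demonstration: 4 sites x 4 residues = 256 variants.
rng = random.Random(0)
residues = "ACDE"
library = list(itertools.product(residues, repeat=4))
effect = {(s, a): rng.gauss(0, 1) for s in range(4) for a in residues}
truth = {v: sum(effect[(s, a)] for s, a in enumerate(v)) for v in library}
# Hypothetical zero-shot prior: the true fitness plus heavy noise.
zero_shot = {v: truth[v] + rng.gauss(0, 2) for v in library}

measured, scores = pick_and_validate(library, zero_shot, truth.__getitem__)
print(len(measured))  # 48 variants assayed: 4 rounds x 12 picks
```

Passing `rounds=2, n_per_round=24` gives the alternative schedule mentioned in the abstract with the same total of 48 measurements. In practice `measure` would wrap an experimental readout and the surrogate would be replaced by whatever regression model is actually in use.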

 

DC Field | Value | Language
dc.contributor.author | Chu, Hoi Yee | -
dc.contributor.author | Fong, John HC | -
dc.contributor.author | Thean, Dawn GL | -
dc.contributor.author | Zhou, Peng | -
dc.contributor.author | Fung, Frederic KC | -
dc.contributor.author | Huang, Yuanhua | -
dc.contributor.author | Wong, Alan SL | -
dc.date.accessioned | 2024-09-04T07:06:26Z | -
dc.date.available | 2024-09-04T07:06:26Z | -
dc.date.issued | 2024-02-21 | -
dc.identifier.citation | Cell Systems, 2024, v. 15, n. 2, p. 193-203 | -
dc.identifier.issn | 2405-4712 | -
dc.identifier.uri | http://hdl.handle.net/10722/345914 | -
dc.language | eng | -
dc.publisher | Elsevier | -
dc.relation.ispartof | Cell Systems | -
dc.subject | active learning | -
dc.subject | base editor | -
dc.subject | Cas9 | -
dc.subject | combinatorial mutagenesis | -
dc.subject | CRISPR | -
dc.subject | genome editing | -
dc.subject | low-N | -
dc.subject | machine learning | -
dc.subject | protein engineering | -
dc.subject | zero-shot | -
dc.title | Accurate top protein variant discovery via low-N pick-and-validate machine learning | -
dc.type | Article | -
dc.identifier.doi | 10.1016/j.cels.2024.01.002 | -
dc.identifier.pmid | 38340729 | -
dc.identifier.scopus | eid_2-s2.0-85185824181 | -
dc.identifier.volume | 15 | -
dc.identifier.issue | 2 | -
dc.identifier.spage | 193 | -
dc.identifier.epage | 203 | -
dc.identifier.eissn | 2405-4720 | -
dc.identifier.issnl | 2405-4712 | -
