Reliable and Scalable Identification and Prioritization of Putative Cellulolytic Anaerobes With Large Genome Data

Wang, Yubo; Li, Liguan; Xia, Yu; Zhang, Tong

Publisher Website: 10.3389/fbinf.2022.813771
Scopus: eid_2-s2.0-85174485421

Citations:
- Scopus: 0
Appears in Collections:
- Civil Engineering: Journal/Magazine Articles
- Faculty of Engineering: Journal/Magazine Articles

Article: Reliable and Scalable Identification and Prioritization of Putative Cellulolytic Anaerobes With Large Genome Data

Title	Reliable and Scalable Identification and Prioritization of Putative Cellulolytic Anaerobes With Large Genome Data
Authors	Wang, Yubo; Li, Liguan; Xia, Yu; Zhang, Tong
Keywords	anaerobes; bioinformatic pipeline; cellulose hydrolysis; function interpretation; genome-centric
Issue Date	1-Jan-2022
Publisher	Frontiers Media
Citation	Frontiers in Bioinformatics, 2022, v. 2. DOI: http://dx.doi.org/10.3389/fbinf.2022.813771
Abstract	In the era of high-throughput sequencing, genetic information that inherently hints at microbes’ functional niches is becoming easily accessible; however, properly identifying and characterizing these genetic hints to infer the microbes’ functional niches remains a challenge. In genome-centric interpretation of the specific functional niche of cellulose hydrolysis for anaerobes, a common practical problem is a lack of confidence in predicting the anaerobes’ real cellulolytic competency based solely on the abundances of the annotated carbohydrate-active enzyme modules or on their taxonomic affiliation. Recognition of the synergy machineries, which include but are not limited to the cellulosome gene clusters, is as important as the annotation of individual carbohydrate-active modules or genes. In the interpretation of complete genomes of 2,768 microbe strains whose phenotypes have been well documented, incorporating automatic recognition of synergy among the annotated carbohydrate-active elements showed that an explicit genotype–phenotype correlation is feasible for cellulolytic anaerobes, and a bioinformatic pipeline was developed accordingly. This genome-centric pipeline categorizes putative cellulolytic anaerobes into six genotype groups based on differential cellulose-hydrolyzing capacity and varying synergy mechanisms. The genotype–phenotype correlation analysis suggested a finer categorization of the cellulosome gene clusters: although cellulosome complexes, by their nature, could enable the assembly of a number of carbohydrate-active units, they do not necessarily guarantee the formation of the cellulose–enzyme–microbe complex or the cellulose-hydrolyzing activity of the corresponding anaerobe strains, as exemplified by the well-known Clostridium acetobutylicum strains.
The genotype–phenotype correlation analysis also revealed the genetic foundation of a previously unrecognized machinery that may mediate microbe–cellulose adhesion: enzymes encoded by genes harboring both the surface layer homology and cellulose-binding CBM modules. Applicability of this pipeline to scalable annotation of large genome datasets was further tested by annotating 7,902 reference genomes downloaded from NCBI, from which 14 genomes of putative paradigm cellulose-hydrolyzing anaerobes were identified. We believe the pipeline developed in this study would be a good addition to the bioinformatic tools available for genome-centric interpretation of uncultivated anaerobes, specifically regarding their functional niche of cellulose hydrolysis.
Persistent Identifier	http://hdl.handle.net/10722/360455

DC Field	Value	Language
dc.contributor.author	Wang, Yubo	-
dc.contributor.author	Li, Liguan	-
dc.contributor.author	Xia, Yu	-
dc.contributor.author	Zhang, Tong	-
dc.date.accessioned	2025-09-11T00:30:30Z	-
dc.date.available	2025-09-11T00:30:30Z	-
dc.date.issued	2022-01-01	-
dc.identifier.citation	Frontiers in Bioinformatics, 2022, v. 2	-
dc.identifier.uri	http://hdl.handle.net/10722/360455	-
dc.language	eng	-
dc.publisher	Frontiers Media	-
dc.relation.ispartof	Frontiers in Bioinformatics	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject	anaerobes	-
dc.subject	bioinformatic pipeline	-
dc.subject	cellulose hydrolysis	-
dc.subject	function interpretation	-
dc.subject	genome-centric	-
dc.title	Reliable and Scalable Identification and Prioritization of Putative Cellulolytic Anaerobes With Large Genome Data	-
dc.type	Article	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.3389/fbinf.2022.813771	-
dc.identifier.scopus	eid_2-s2.0-85174485421	-
dc.identifier.volume	2	-
dc.identifier.eissn	2673-7647	-
dc.identifier.issnl	2673-7647	-
