Article: Can transcriptome size be estimated from SAGE catalogs?

Title: Can transcriptome size be estimated from SAGE catalogs?
Authors: Stern, MD; Anisimov, SV; Boheler, KR
Issue Date: 2003
Citation: Bioinformatics, 2003, v. 19 n. 4, p. 443-448
Abstract: Motivation: We sought to determine whether SAGE (Serial Analysis of Gene Expression) can be used to estimate the number of unique transcripts in a transcriptome. A simple estimator that corrects for sequencing and sampling errors was applied to a SAGE library (137 832 tags) obtained from mouse embryonic stem cells, and also to Monte Carlo simulated libraries generated using assumed distributions of 'true' expression levels consistent with the data. Results: When the corrected data themselves were taken as the underlying model of 'ground truth', the estimator converged to the 'true' value (53 535) only after counting 300 000 simulated tags, more than twice the number in the experiment. The SAGE data could also be well fit by a Monte Carlo model based on a truncated inverse-square distribution of expression levels, with 130 000 'true' transcripts and > 10^6 samples needed for convergence. We conclude that the size of a transcriptome is ill-determined from SAGE libraries of even moderately large size. In order to obtain a valid estimate, one must sample a number of tags inversely proportional to the lowest abundance level, which is not known a priori. This constrains the design of SAGE experiments intended to determine biological complexity.
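
The sampling argument in the abstract can be illustrated with a minimal Python sketch. It is not the estimator or the simulation used in the article. It assumes a hypothetical abundance model in which the number of transcripts at integer expression level k is proportional to 1/k^2 up to a cutoff `max_level`, and it simply counts how many distinct transcripts are expected to appear in a library of a given size. All function names and parameter values below are illustrative assumptions.

```python
import random


def inverse_square_abundances(n_transcripts, max_level):
    """Return per-transcript sampling probabilities.

    Assumption (not from the article): the number of transcripts at
    integer expression level k is proportional to 1/k^2 for
    k = 1..max_level. Rounding means the returned list may differ
    slightly in length from n_transcripts.
    """
    norm = sum(1.0 / k**2 for k in range(1, max_level + 1))
    levels = []
    for k in range(1, max_level + 1):
        n_at_k = round(n_transcripts * (1.0 / k**2) / norm)
        levels.extend([k] * n_at_k)
    total = sum(levels)
    return [level / total for level in levels]


def expected_unique(probs, n_tags):
    """Expected number of distinct transcripts seen in n_tags independent draws."""
    return sum(1.0 - (1.0 - p) ** n_tags for p in probs)


def simulate_unique(probs, n_tags, seed=0):
    """Monte Carlo count of distinct transcripts in one simulated library."""
    rng = random.Random(seed)
    drawn = rng.choices(range(len(probs)), weights=probs, k=n_tags)
    return len(set(drawn))


if __name__ == "__main__":
    # Illustrative parameters: 130 000 transcripts as in the abstract,
    # with a hypothetical cutoff of 1000 for the top expression level.
    probs = inverse_square_abundances(130_000, 1000)
    for n_tags in (137_832, 300_000, 1_000_000):
        print(n_tags, round(expected_unique(probs, n_tags)), "of", len(probs))
```

Under this model the rarest transcripts each have probability `min(probs)`, so the observed count approaches the true total only once the number of tags is several times `1 / min(probs)`, which is the inverse proportionality to the lowest abundance level that the abstract describes.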
Persistent Identifier: http://hdl.handle.net/10722/195164
ISSN: 1367-4803
2021 Impact Factor: 6.931
2020 SCImago Journal Rankings: 3.599
ISI Accession Number ID: WOS:000181410700001

 

DC Field                Value  Language
dc.contributor.author   Stern, MD  -
dc.contributor.author   Anisimov, SV  -
dc.contributor.author   Boheler, KR  -
dc.date.accessioned     2014-02-25T01:40:15Z  -
dc.date.available       2014-02-25T01:40:15Z  -
dc.date.issued          2003  -
dc.identifier.citation  Bioinformatics, 2003, v. 19 n. 4, p. 443-448  -
dc.identifier.issn      1367-4803  -
dc.identifier.uri       http://hdl.handle.net/10722/195164  -
dc.language             eng  -
dc.relation.ispartof    Bioinformatics  -
dc.title                Can transcriptome size be estimated from SAGE catalogs?  -
dc.type                 Article  -
dc.description.nature   link_to_subscribed_fulltext  -
dc.identifier.doi       10.1093/bioinformatics/btg018  -
dc.identifier.pmid      12611798  -
dc.identifier.scopus    eid_2-s2.0-0037340843  -
dc.identifier.volume    19  -
dc.identifier.issue     4  -
dc.identifier.spage     443  -
dc.identifier.epage     448  -
dc.identifier.isi       WOS:000181410700001  -
dc.identifier.issnl     1367-4803  -
