
Article: RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion

Title: RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion
Authors: SU, J; WU, Y; Ting, HF; Lam, TW; Luo, R
Issue Date: 2021
Publisher: Oxford University Press: Open Access Journals. The Journal's web site is located at https://academic.oup.com/nargab
Citation: NAR Genomics and Bioinformatics, 2021, v. 3 n. 3, p. article no. lqab062
Abstract: Relation extraction (RE) is a fundamental task for extracting gene–disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene–disease associations only from single sentences or abstract texts. A few studies have explored extracting gene–disease associations from full-text articles, but there exists a large room for improvements. In this work, we propose RENET2, a deep learning-based RE method, which implements Section Filtering and ambiguous relations modeling to extract gene–disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset to resolve the scarcity of labels on full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene–disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central and found ∼3.72M gene–disease associations; and (ii) the LitCovid articles and ranked the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene–disease association extraction. The source-code, manually curated abstract/full-text training data, and results of RENET2 are available at GitHub.
Persistent Identifier: http://hdl.handle.net/10722/301332
ISSN: 2631-9268
PubMed Central ID: PMC8256824
ISI Accession Number ID: WOS:000745279200006

 

DC Field | Value | Language
dc.contributor.author | SU, J | -
dc.contributor.author | WU, Y | -
dc.contributor.author | Ting, HF | -
dc.contributor.author | Lam, TW | -
dc.contributor.author | Luo, R | -
dc.date.accessioned | 2021-07-27T08:09:32Z | -
dc.date.available | 2021-07-27T08:09:32Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | NAR Genomics and Bioinformatics, 2021, v. 3 n. 3, p. article no. lqab062 | -
dc.identifier.issn | 2631-9268 | -
dc.identifier.uri | http://hdl.handle.net/10722/301332 | -
dc.description.abstract | Relation extraction (RE) is a fundamental task for extracting gene–disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene–disease associations only from single sentences or abstract texts. A few studies have explored extracting gene–disease associations from full-text articles, but there exists a large room for improvements. In this work, we propose RENET2, a deep learning-based RE method, which implements Section Filtering and ambiguous relations modeling to extract gene–disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset to resolve the scarcity of labels on full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene–disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central and found ∼3.72M gene–disease associations; and (ii) the LitCovid articles and ranked the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene–disease association extraction. The source-code, manually curated abstract/full-text training data, and results of RENET2 are available at GitHub. | -
dc.language | eng | -
dc.publisher | Oxford University Press: Open Access Journals. The Journal's web site is located at https://academic.oup.com/nargab | -
dc.relation.ispartof | NAR Genomics and Bioinformatics | -
dc.rights | Postprint This article has been accepted for publication in [Journal Title] Published by Oxford University Press | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.title | RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion | -
dc.type | Article | -
dc.identifier.email | Ting, HF: hfting@cs.hku.hk | -
dc.identifier.email | Lam, TW: twlam@cs.hku.hk | -
dc.identifier.email | Luo, R: rbluo@cs.hku.hk | -
dc.identifier.authority | Ting, HF=rp00177 | -
dc.identifier.authority | Lam, TW=rp00135 | -
dc.identifier.authority | Luo, R=rp02360 | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1093/nargab/lqab062 | -
dc.identifier.pmid | 34235433 | -
dc.identifier.pmcid | PMC8256824 | -
dc.identifier.scopus | eid_2-s2.0-85123224675 | -
dc.identifier.hkuros | 323503 | -
dc.identifier.volume | 3 | -
dc.identifier.issue | 3 | -
dc.identifier.spage | article no. lqab062 | -
dc.identifier.epage | article no. lqab062 | -
dc.identifier.isi | WOS:000745279200006 | -
dc.publisher.place | United Kingdom | -
