Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1093/nargab/lqab062
- Scopus: eid_2-s2.0-85123224675
- PMID: 34235433
- WOS: WOS:000745279200006
Article: RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion
Title | RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion |
---|---|
Authors | Su, J; Wu, Y; Ting, HF; Lam, TW; Luo, R |
Issue Date | 2021 |
Publisher | Oxford University Press: Open Access Journals. The Journal's web site is located at https://academic.oup.com/nargab |
Citation | NAR Genomics and Bioinformatics, 2021, v. 3 n. 3, p. article no. lqab062 |
Abstract | Relation extraction (RE) is a fundamental task for extracting gene–disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene–disease associations only from single sentences or abstract texts. A few studies have explored extracting gene–disease associations from full-text articles, but there remains considerable room for improvement. In this work, we propose RENET2, a deep learning-based RE method, which implements Section Filtering and ambiguous relations modeling to extract gene–disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset and thereby address the scarcity of labels on full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene–disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central and found ∼3.72M gene–disease associations; and (ii) the LitCovid articles and ranked the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene–disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available on GitHub. |
Persistent Identifier | http://hdl.handle.net/10722/301332 |
ISSN | 2631-9268 (2023 Impact Factor: 4.0; 2023 SCImago Journal Rankings: 2.454) |
PubMed Central ID | PMC8256824 |
ISI Accession Number ID | WOS:000745279200006 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Su, J | - |
dc.contributor.author | Wu, Y | - |
dc.contributor.author | Ting, HF | - |
dc.contributor.author | Lam, TW | - |
dc.contributor.author | Luo, R | - |
dc.date.accessioned | 2021-07-27T08:09:32Z | - |
dc.date.available | 2021-07-27T08:09:32Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | NAR Genomics and Bioinformatics, 2021, v. 3 n. 3, p. article no. lqab062 | - |
dc.identifier.issn | 2631-9268 | - |
dc.identifier.uri | http://hdl.handle.net/10722/301332 | - |
dc.description.abstract | Relation extraction (RE) is a fundamental task for extracting gene–disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene–disease associations only from single sentences or abstract texts. A few studies have explored extracting gene–disease associations from full-text articles, but there remains considerable room for improvement. In this work, we propose RENET2, a deep learning-based RE method, which implements Section Filtering and ambiguous relations modeling to extract gene–disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset and thereby address the scarcity of labels on full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene–disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central and found ∼3.72M gene–disease associations; and (ii) the LitCovid articles and ranked the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene–disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available on GitHub. | - |
dc.language | eng | - |
dc.publisher | Oxford University Press: Open Access Journals. The Journal's web site is located at https://academic.oup.com/nargab | - |
dc.relation.ispartof | NAR Genomics and Bioinformatics | - |
dc.rights | Postprint. This article has been accepted for publication in NAR Genomics and Bioinformatics, published by Oxford University Press. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.title | RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion | - |
dc.type | Article | - |
dc.identifier.email | Ting, HF: hfting@cs.hku.hk | - |
dc.identifier.email | Lam, TW: twlam@cs.hku.hk | - |
dc.identifier.email | Luo, R: rbluo@cs.hku.hk | - |
dc.identifier.authority | Ting, HF=rp00177 | - |
dc.identifier.authority | Lam, TW=rp00135 | - |
dc.identifier.authority | Luo, R=rp02360 | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.1093/nargab/lqab062 | - |
dc.identifier.pmid | 34235433 | - |
dc.identifier.pmcid | PMC8256824 | - |
dc.identifier.scopus | eid_2-s2.0-85123224675 | - |
dc.identifier.hkuros | 323503 | - |
dc.identifier.volume | 3 | - |
dc.identifier.issue | 3 | - |
dc.identifier.spage | article no. lqab062 | - |
dc.identifier.epage | article no. lqab062 | - |
dc.identifier.isi | WOS:000745279200006 | - |
dc.publisher.place | United Kingdom | - |
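
Several fields in the record above repeat (for example `dc.contributor.author` and `dc.identifier.email`), so a reader loading it programmatically needs a multi-valued mapping rather than a plain key-value dictionary. The following is a minimal sketch, assuming the rows are available as pipe-delimited text lines in the same layout as the table; the function name `parse_dc_rows` and the sample rows are illustrative and not part of the repository's own tooling.

```python
from collections import defaultdict


def parse_dc_rows(lines):
    """Group 'field | value | language' rows by field name, keeping repeats."""
    record = defaultdict(list)
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        # Skip the header row, the separator row, and anything malformed.
        if len(parts) < 2 or not parts[0].startswith("dc."):
            continue
        record[parts[0]].append(parts[1])
    return dict(record)


# Sample rows copied from the table above (hypothetical input format).
rows = [
    "DC Field | Value | Language |",
    "---|---|---|",
    "dc.contributor.author | Lam, TW | - |",
    "dc.contributor.author | Luo, R | - |",
    "dc.identifier.doi | 10.1093/nargab/lqab062 | - |",
]

record = parse_dc_rows(rows)
# A DOI resolves through the doi.org proxy when prefixed with its base URL.
doi_url = "https://doi.org/" + record["dc.identifier.doi"][0]
print(record["dc.contributor.author"])
print(doi_url)
```

Keeping every field as a list, even single-valued ones, avoids special-casing repeated fields; the sketch assumes no field value contains a literal `|` character.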