RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion

SU, J; WU, Y; Ting, HF; Lam, TW; Luo, R

Publisher Website: 10.1093/nargab/lqab062
Scopus: eid_2-s2.0-85123224675
PMID: 34235433
WOS: WOS:000745279200006
Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title	RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion
Authors	SU, J; WU, Y; Ting, HF; Lam, TW; Luo, R
Issue Date	2021
Publisher	Oxford University Press: Open Access Journals. The Journal's web site is located at https://academic.oup.com/nargab
Citation	NAR Genomics and Bioinformatics, 2021, v. 3 n. 3, article no. lqab062. DOI: http://dx.doi.org/10.1093/nargab/lqab062
Abstract	Relation extraction (RE) is a fundamental task for extracting gene–disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene–disease associations only from single sentences or abstract texts. A few studies have explored extracting gene–disease associations from full-text articles, but there remains substantial room for improvement. In this work, we propose RENET2, a deep learning-based RE method, which implements Section Filtering and ambiguous relation modeling to extract gene–disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset to resolve the scarcity of labels on full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene–disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central and found ∼3.72M gene–disease associations; and (ii) the LitCovid articles and ranked the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene–disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available at GitHub.
Persistent Identifier	http://hdl.handle.net/10722/301332
ISSN	2631-9268; 2023 Impact Factor: 4.0; 2023 SCImago Journal Rankings: 2.454
PubMed Central ID	PMC8256824
ISI Accession Number ID	WOS:000745279200006

DC Field	Value	Language
dc.contributor.author	SU, J	-
dc.contributor.author	WU, Y	-
dc.contributor.author	Ting, HF	-
dc.contributor.author	Lam, TW	-
dc.contributor.author	Luo, R	-
dc.date.accessioned	2021-07-27T08:09:32Z	-
dc.date.available	2021-07-27T08:09:32Z	-
dc.date.issued	2021	-
dc.identifier.citation	NAR Genomics and Bioinformatics, 2021, v. 3 n. 3, p. article no. lqab062	-
dc.identifier.issn	2631-9268	-
dc.identifier.uri	http://hdl.handle.net/10722/301332	-
dc.description.abstract	Relation extraction (RE) is a fundamental task for extracting gene–disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene–disease associations only from single sentences or abstract texts. A few studies have explored extracting gene–disease associations from full-text articles, but there remains substantial room for improvement. In this work, we propose RENET2, a deep learning-based RE method, which implements Section Filtering and ambiguous relation modeling to extract gene–disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset to resolve the scarcity of labels on full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene–disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central and found ∼3.72M gene–disease associations; and (ii) the LitCovid articles and ranked the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene–disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available at GitHub.	-
dc.language	eng	-
dc.publisher	Oxford University Press: Open Access Journals. The Journal's web site is located at https://academic.oup.com/nargab	-
dc.relation.ispartof	NAR Genomics and Bioinformatics	-
dc.rights	Postprint This article has been accepted for publication in [Journal Title] Published by Oxford University Press	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.title	RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion	-
dc.type	Article	-
dc.identifier.email	Ting, HF: hfting@cs.hku.hk	-
dc.identifier.email	Lam, TW: twlam@cs.hku.hk	-
dc.identifier.email	Luo, R: rbluo@cs.hku.hk	-
dc.identifier.authority	Ting, HF=rp00177	-
dc.identifier.authority	Lam, TW=rp00135	-
dc.identifier.authority	Luo, R=rp02360	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.1093/nargab/lqab062	-
dc.identifier.pmid	34235433	-
dc.identifier.pmcid	PMC8256824	-
dc.identifier.scopus	eid_2-s2.0-85123224675	-
dc.identifier.hkuros	323503	-
dc.identifier.volume	3	-
dc.identifier.issue	3	-
dc.identifier.spage	article no. lqab062	-
dc.identifier.epage	article no. lqab062	-
dc.identifier.isi	WOS:000745279200006	-
dc.publisher.place	United Kingdom	-
