Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction

Chen, Lei; Su, Junhao; Zheng, Zhenxian; Lam, Tak-Wah; Luo, Ruibang

Links for fulltext

Publisher Website: 10.1145/3584371.3612995
Scopus: eid_2-s2.0-85175851296
WOS: WOS:001143941200037

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers
- Faculty of Engineering: Conference papers

Conference Paper: Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction

Title	Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction
Authors	Chen, Lei; Su, Junhao; Zheng, Zhenxian; Lam, Tak-Wah; Luo, Ruibang
Keywords	biomedical literature; corresponding gene embeddings; DisGeNet platform; pubmed website; variant-disease associations
Issue Date	4-Oct-2023
Publisher	ACM
Abstract	Extracting variant-disease associations (VDAs) from the biomedical literature is a critical task in biomedical and genomics research, as it provides valuable insights into the genetic basis of diseases and facilitates the development of precision medicine. The biomedical literature is a vast and growing source of information containing a wealth of knowledge on genetic variants and their associations with diseases. However, the manual extraction of VDAs from the literature is a time-consuming and labor-intensive process, making it challenging to keep up with the rapidly expanding literature. Therefore, there is a pressing need to develop computational methods for effectively extracting and curating VDAs from the biomedical literature, and to build a comprehensive dataset for this significant task. In this paper, we present a large-scale, semi-automatically annotated dataset for VDA extraction from the biomedical literature (called VDAL) based on the DisGeNet platform which contains one of the largest publicly available collections of genes and variants associated with human diseases. To the best of our knowledge, VDAL is one of the largest datasets for VDA extraction, containing 9,362 related PubMed documents from the biomedical domain. In addition, we propose a novel and simple yet effective model, called VDANet, which incorporates the corresponding gene embeddings of the variants into the model to better explore the associations between genetic variants and human diseases. Extensive experiments on the constructed dataset show that VDANet significantly outperforms the state-of-the-art baseline methods, thus establishing a new benchmark for VDA extraction. For reproducibility, our code and data are available at https://github.com/JasonCLEI/VDANet.
Persistent Identifier	http://hdl.handle.net/10722/339280
ISI Accession Number ID	WOS:001143941200037

DC Field	Value	Language
dc.contributor.author	Chen, Lei	-
dc.contributor.author	Su, Junhao	-
dc.contributor.author	Zheng, Zhenxian	-
dc.contributor.author	Lam, Tak-Wah	-
dc.contributor.author	Luo, Ruibang	-
dc.date.accessioned	2024-03-11T10:35:23Z	-
dc.date.available	2024-03-11T10:35:23Z	-
dc.date.issued	2023-10-04	-
dc.identifier.uri	http://hdl.handle.net/10722/339280	-
dc.description.abstract	Extracting variant-disease associations (VDAs) from the biomedical literature is a critical task in biomedical and genomics research, as it provides valuable insights into the genetic basis of diseases and facilitates the development of precision medicine. The biomedical literature is a vast and growing source of information containing a wealth of knowledge on genetic variants and their associations with diseases. However, the manual extraction of VDAs from the literature is a time-consuming and labor-intensive process, making it challenging to keep up with the rapidly expanding literature. Therefore, there is a pressing need to develop computational methods for effectively extracting and curating VDAs from the biomedical literature, and to build a comprehensive dataset for this significant task. In this paper, we present a large-scale, semi-automatically annotated dataset for VDA extraction from the biomedical literature (called VDAL) based on the DisGeNet platform which contains one of the largest publicly available collections of genes and variants associated with human diseases. To the best of our knowledge, VDAL is one of the largest datasets for VDA extraction, containing 9,362 related PubMed documents from the biomedical domain. In addition, we propose a novel and simple yet effective model, called VDANet, which incorporates the corresponding gene embeddings of the variants into the model to better explore the associations between genetic variants and human diseases. Extensive experiments on the constructed dataset show that VDANet significantly outperforms the state-of-the-art baseline methods, thus establishing a new benchmark for VDA extraction. For reproducibility, our code and data are available at https://github.com/JasonCLEI/VDANet.	-
dc.language	eng	-
dc.publisher	ACM	-
dc.relation.ispartof	14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB'23 (03/09/2023-06/09/2023, Houston, Texas)	-
dc.subject	biomedical literature	-
dc.subject	corresponding gene embeddings	-
dc.subject	DisGeNet platform	-
dc.subject	pubmed website	-
dc.subject	variant-disease associations	-
dc.title	Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction	-
dc.type	Conference_Paper	-
dc.identifier.doi	10.1145/3584371.3612995	-
dc.identifier.scopus	eid_2-s2.0-85175851296	-
dc.identifier.isi	WOS:001143941200037	-
