Conference Paper: Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction

Title: Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction
Authors: Chen, Lei; Su, Junhao; Zheng, Zhenxian; Lam, Tak-Wah; Luo, Ruibang
Keywords: biomedical literature; corresponding gene embeddings; DisGeNet platform; pubmed website; variant-disease associations
Issue Date: 4-Oct-2023
Publisher: ACM
Abstract

Extracting variant-disease associations (VDAs) from the biomedical literature is a critical task in biomedical and genomics research, as it provides valuable insights into the genetic basis of diseases and facilitates the development of precision medicine. The biomedical literature is a vast and growing source of knowledge on genetic variants and their associations with diseases. However, manually extracting VDAs from the literature is time-consuming and labor-intensive, which makes it difficult to keep pace with the rapidly expanding body of publications. There is therefore a pressing need for computational methods that can effectively extract and curate VDAs from the biomedical literature, and for a comprehensive dataset to support this task. In this paper, we present VDAL, a large-scale, semi-automatically annotated dataset for VDA extraction from the biomedical literature, built on the DisGeNet platform, which hosts one of the largest publicly available collections of genes and variants associated with human diseases. To the best of our knowledge, VDAL is one of the largest datasets for VDA extraction, containing 9,362 relevant PubMed documents. In addition, we propose VDANet, a novel, simple yet effective model that incorporates the embeddings of each variant's corresponding gene to better capture the associations between genetic variants and human diseases. Extensive experiments on the constructed dataset show that VDANet significantly outperforms state-of-the-art baseline methods, establishing a new benchmark for VDA extraction. For reproducibility, our code and data are available at https://github.com/JasonCLEI/VDANet.
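The abstract states only that VDANet incorporates the gene embeddings corresponding to each variant; it does not say how. The sketch below illustrates one common way to inject such side information into a relation classifier: the variant's contextual vector is concatenated with an embedding of its gene, then combined with a disease vector and scored. Everything here is a hypothetical illustration, not VDANet's architecture. This includes the names `VARIANT_TO_GENE`, `GENE_EMB`, `fuse` and `score_pair`, the placeholder identifiers, the random toy vectors and the logistic scorer. The authors' actual model is in the linked repository.

```python
import math
import random

DIM = 4  # toy embedding size (hypothetical)

rng = random.Random(0)

# Hypothetical lookups standing in for real resources: variant mention -> gene,
# and gene -> embedding vector. Identifiers are placeholders, not real variants.
VARIANT_TO_GENE = {"VAR_A": "GENE_X", "VAR_B": "GENE_Y"}
GENE_EMB = {g: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for g in ("GENE_X", "GENE_Y")}
UNK_GENE = [0.0] * DIM  # fallback when a variant's gene is unknown


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(variant_id, variant_vec):
    """Concatenate a variant's contextual vector (a list) with its gene embedding."""
    gene = VARIANT_TO_GENE.get(variant_id)
    gene_vec = GENE_EMB.get(gene, UNK_GENE)
    return list(variant_vec) + gene_vec


def score_pair(variant_id, variant_vec, disease_vec, weights, bias=0.0):
    """Hypothetical linear-logistic score that a (variant, disease) pair is associated."""
    features = fuse(variant_id, variant_vec) + list(disease_vec)
    if len(features) != len(weights):
        raise ValueError("weights must match the feature length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


if __name__ == "__main__":
    variant_vec = [0.1, -0.2, 0.3, 0.0]  # pretend output of a text encoder
    disease_vec = [0.2, 0.1, -0.1, 0.4]
    weights = [0.5] * (3 * DIM)
    print(round(score_pair("VAR_A", variant_vec, disease_vec, weights), 3))
```

Concatenation is used here only because it is the simplest fusion to show. Gating, attention or summation would be equally plausible readings of "incorporates the corresponding gene embeddings".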


Persistent Identifier: http://hdl.handle.net/10722/339280
ISI Accession Number ID: WOS:001143941200037


DC Field | Value | Language
dc.contributor.author | Chen, Lei | -
dc.contributor.author | Su, Junhao | -
dc.contributor.author | Zheng, Zhenxian | -
dc.contributor.author | Lam, Tak-Wah | -
dc.contributor.author | Luo, Ruibang | -
dc.date.accessioned | 2024-03-11T10:35:23Z | -
dc.date.available | 2024-03-11T10:35:23Z | -
dc.date.issued | 2023-10-04 | -
dc.identifier.uri | http://hdl.handle.net/10722/339280 | -
dc.language | eng | -
dc.publisher | ACM | -
dc.relation.ispartof | 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB'23 (03/09/2023-06/09/2023, Houston, Texas) | -
dc.subject | biomedical literature | -
dc.subject | corresponding gene embeddings | -
dc.subject | DisGeNet platform | -
dc.subject | pubmed website | -
dc.subject | variant-disease associations | -
dc.title | Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction | -
dc.type | Conference_Paper | -
dc.identifier.doi | 10.1145/3584371.3612995 | -
dc.identifier.scopus | eid_2-s2.0-85175851296 | -
dc.identifier.isi | WOS:001143941200037 | -
