ExTRI: Extraction of transcription regulation interactions from literature

dc.contributor.author: Vazquez, Miguel
dc.contributor.author: Krallinger, Martin
dc.contributor.author: Leitner, Florian
dc.contributor.author: Kuiper, Martin
dc.contributor.author: Valencia, Alfonso
dc.contributor.author: Laegreid, Astrid
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2022-05-19T09:51:33Z
dc.date.available: 2022-05-19T09:51:33Z
dc.date.issued: 2022
dc.description.abstract: The regulation of gene transcription by transcription factors is a fundamental biological process, yet the relations between transcription factors (TF) and their target genes (TG) are still only sparsely covered in databases. Text-mining tools can offer broad and complementary solutions to help locate and extract mentions of these biological relationships in articles. We have generated ExTRI, a knowledge graph of TF-TG relationships, by applying a high recall text-mining pipeline to MedLine abstracts identifying over 100,000 candidate sentences with TF-TG relations. Validation procedures indicated that about half of the candidate sentences contain true TF-TG relationships. Post-processing identified 53,000 high confidence sentences containing TF-TG relationships, with a cross-validation F1-score close to 75%. The resulting collection of TF-TG relationships covers 80% of the relations annotated in existing databases. It adds 11,000 other potential interactions, including relationships for ~100 TFs currently not in public TF-TG relation databases. The high confidence abstract sentences contribute 25,000 literature references not available from other resources and offer a wealth of direct pointers to functional aspects of the TF-TG interactions. Our compiled resource encompassing ExTRI together with publicly available resources delivers literature-derived TF-TG interactions for more than 900 of the 1500–1600 proteins considered to function as specific DNA binding TFs. The obtained result can be used by curators, for network analysis and modelling, for causal reasoning or knowledge graph mining approaches, or serve to benchmark text mining strategies.
dc.description.peerreviewed: Peer Reviewed
dc.description.sponsorship: We thank the participants of the COST Action GREEKC (CA15205) for fruitful discussions during workshops supported by COST (European Cooperation in Science and Technology).
dc.description.version: Postprint (published version)
dc.format.extent: 12 p.
dc.identifier.citation: Vazquez, M. [et al.]. ExTRI: Extraction of transcription regulation interactions from literature. "Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms", 2022, vol. 1865, no. 1, 194778.
dc.identifier.doi: 10.1016/j.bbagrm.2021.194778
dc.identifier.issn: 1874-9399
dc.identifier.uri: https://hdl.handle.net/2117/367522
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S1874939921000961?via%3Dihub#!
dc.rights.access: Open Access
dc.rights.licensename: Attribution 3.0 Spain
dc.rights.licensename: Attribution 4.0 International (CC BY 4.0)
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Aplicacions de la informàtica::Bioinformàtica
dc.subject: Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Llenguatge natural
dc.subject.lcsh: Text data mining
dc.subject.lcsh: Genetic transcription
dc.subject.lemac: Intel·ligència artificial--Aplicacions biològiques (Subd. geog.)
dc.subject.lemac: Intel·ligència artificial--Aplicacions a la medicina
dc.subject.other: Text-mining
dc.subject.other: Transcription factors
dc.subject.other: Gene regulation
dc.subject.other: Systems biology
dc.title: ExTRI: Extraction of transcription regulation interactions from literature
dc.type: Article
dspace.entity.type: Publication
local.citation.number: 1
local.citation.other: 194778
local.citation.publicationName: Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms
local.citation.volume: 1865

Files

Original bundle

Name: 1-s2.0-S1874939921000961-main.pdf
Size: 3.19 MB
Format: Adobe Portable Document Format