Reaching a modular, domain-agnostic and containerized development in biomedical Natural Language Processing systems.

dc.contributor.author: Corvi, Javier
dc.contributor.author: Gelpi, Josep
dc.contributor.author: Capella Gutiérrez, Salvador
dc.date.accessioned: 2023-01-26T15:49:31Z
dc.date.available: 2023-01-26T15:49:31Z
dc.date.issued: 2022-05
dc.description.abstract: The last century saw an exponential increase in scientific publications in the biomedical domain. Despite the potential value of this knowledge, most of it is available only as unstructured textual literature, which has limited its systematic access, use and exploitation. This limitation can be avoided, or at least mitigated, by relying on text mining techniques to automatically extract relevant data from textual documents and structure it. A significant challenge for scientific software applications, including Natural Language Processing (NLP) systems, is providing facilities to share, distribute and run such systems in a simple and convenient way. Software containers can host their own dependencies and auxiliary programs, isolating them from the execution environment. In addition, a workflow manager can be used for the automated orchestration and execution of text mining pipelines. Our work focuses on the study and design of new techniques and approaches to construct, develop, validate and deploy NLP components and workflows that are generic, scalable and interoperable enough to be used and instantiated across different domains. The resulting techniques will be applied in two main use cases: the detection of relevant information in preclinical toxicological reports, under the eTRANSAFE project [1]; and the indexing of biomaterials publications with relevant concepts, as part of the DEBBIE project.
dc.format.extent: 2 p.
dc.identifier.citation: Corvi, J.; Gelpi, J.; Capella Gutiérrez, S. Reaching a modular, domain-agnostic and containerized development in biomedical Natural Language Processing systems. A: . Barcelona Supercomputing Center, 2022, p. 36-37.
dc.identifier.uri: https://hdl.handle.net/2117/381246
dc.language: en
dc.language.iso: eng
dc.publisher: Barcelona Supercomputing Center
dc.rights.access: Open Access
dc.rights.licensename: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: High performance computing
dc.subject.lcsh: Natural language processing (Computer science)
dc.subject.lemac: Càlcul intensiu (Informàtica)
dc.subject.lemac: Tractament del llenguatge natural (Informàtica)
dc.subject.other: Natural Language Processing (NLP)
dc.subject.other: Text mining
dc.subject.other: Toxicology
dc.subject.other: Preclinical
dc.subject.other: Biomaterials
dc.title: Reaching a modular, domain-agnostic and containerized development in biomedical Natural Language Processing systems.
dc.type: Conference report
dspace.entity.type: Publication
local.citation.endingPage: 37
local.citation.startingPage: 36

Files

Name: 9BSCDS_11_Reaching a modular.pdf
Size: 949.98 KB
Format: Adobe Portable Document Format