Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

Building a Spanish/Catalan health records corpus with very sparse protected information labelled

Cite as: hdl:2117/124710

Authors: Medina Herrera, Salvador; Turmo Borras, Jorge
Document type: Conference lecture
Defense date: 2018
Rights access: Open Access
This work is protected by the corresponding intellectual and industrial property rights. Except where otherwise noted, its contents are licensed under a Creative Commons license: Attribution-NonCommercial 3.0 Spain.
Abstract
Electronic Health Records (EHRs) are an important resource for the research and study of diseases, treatments and symptoms. However, due to data protection laws, information that could potentially compromise privacy must be anonymized before the records can be used. Thus, the identification of these pieces of information is mandatory. This identification is usually performed by linguistic models built from EHR corpora in which Protected Health Information (PHI) has been previously annotated. Nevertheless, two main drawbacks can occur. First, the annotated corpora required to build the models for a particular language may not exist. Second, unannotated corpora might exist for that language but contain very few words related to PHI mentions (i.e., a very sparse population). In this situation, manually annotating EHRs becomes extremely hard and costly, as PHI occurs in very few EHRs. This paper proposes an iterative method for building a corpus with labelled PHI from a large unlabelled corpus with a very sparse population of target PHI. The method makes use of manually defined rules specified in the form of Augmented Transition Networks, and tries to minimize the search for EHRs containing PHI, thus minimizing the cost of manually annotating very sparse EHR corpora. We use the method with primary care EHRs written in Spanish and Catalan, although it is language-independent and could be applied to EHRs written in other languages. Direct and indirect evaluations performed on the resulting labelled corpus show the appropriateness of our method.
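The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of rule-guided, iterative annotation, not the authors' implementation. It assumes that plain regular expressions can stand in for the Augmented Transition Networks, and that a hypothetical `review` callback plays the role of the human annotator, confirming candidate mentions and optionally proposing new rules. All names (`find_candidates`, `build_corpus`, `review`) are invented for this sketch.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: a rule is a (label, compiled regex) pair. The paper uses
# Augmented Transition Networks; regexes are a simplified stand-in.
Rule = Tuple[str, re.Pattern]
Hit = Tuple[str, str]  # (label, matched text)


def find_candidates(text: str, rules: List[Rule]) -> List[Hit]:
    """Return every (label, matched_text) pair the rules find in one record."""
    hits = []
    for label, pattern in rules:
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits


def build_corpus(
    records: List[str],
    rules: List[Rule],
    review: Callable[[str, List[Hit]], Tuple[List[Hit], List[Rule]]],
    max_rounds: int = 3,
) -> Dict[int, List[Hit]]:
    """Iteratively label only the records that the current rules flag.

    `review` is a hypothetical annotator callback: given a record and its
    candidate hits, it returns the confirmed hits plus any new rules.
    """
    labelled: Dict[int, List[Hit]] = {}
    pending = dict(enumerate(records))
    for _ in range(max_rounds):
        new_rules: List[Rule] = []
        for rec_id in list(pending):
            hits = find_candidates(pending[rec_id], rules)
            if not hits:
                continue  # no rule fired: the annotator never reads this record
            confirmed, suggested = review(pending[rec_id], hits)
            labelled[rec_id] = confirmed
            new_rules.extend(suggested)
            del pending[rec_id]
        if not new_rules:
            break  # nothing new to look for, so stop iterating
        rules = rules + new_rules
    return labelled
```

In this sketch the annotation cost is driven by the number of calls to `review`, which is made only for records where some rule fires; records that match no rule are never shown to the annotator.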
Citation: Medina, S., Turmo, J. Building a Spanish/Catalan health records corpus with very sparse protected information labelled. In: International Conference on Language Resources and Evaluation. "LREC 2018: Workshop MultilingualBIO: Multilingual Biomedical Text Processing: proceedings". 2018, p. 1-7.
URI: http://hdl.handle.net/2117/124710
ISBN: 979-10-95546-03-0
Publisher version: http://www.elra.info/en/
Collections
  • GPLN - Grup de Processament del Llenguatge Natural - Ponències/Comunicacions de congressos [192]
  • Departament de Ciències de la Computació - Ponències/Comunicacions de congressos [1.327]

Files

File          Description    Size      Format
6_W3(1).pdf   Main article   227,8Kb   PDF
