Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

A machine learning approach for layout inference in spreadsheets

Cite as: hdl:2117/100584

Koci, Elvis
Thiele, Maik
Romero Moral, Óscar
Lehner, Wolfgang
Document type: Conference report
Defense date: 2016
Publisher: SciTePress
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
Spreadsheet applications are among the most used tools for content generation and presentation in industry and on the Web. In spite of this success, there is no comprehensive approach to automatically extract and reuse the richness of data maintained in this format. The biggest obstacle is the lack of awareness about the structure of the data in spreadsheets, which otherwise could provide the means to automatically understand and extract knowledge from these files. In this paper, we propose a classification approach to discover the layout of tables in spreadsheets. To this end, we focus on the cell level, considering a wide range of features not covered before by related work. We evaluated the performance of our classifiers on a large dataset covering three different corpora from various domains. Finally, our work includes a novel technique for detecting and repairing incorrectly classified cells in a post-processing step. The experimental results show that our approach delivers very high accuracy, bringing us a crucial step closer towards automatic table extraction.
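As a rough illustration of the two ideas named in the abstract (classifying individual cells, then repairing suspicious labels in a post-processing pass), the following minimal Python sketch may help. It is not the authors' method: the abstract does not specify the features, the classifier, the label set, or the repair rule, so the two features, the 1-nearest-neighbour classifier, the labels "Header" and "Data", and the neighbour-agreement repair rule below are all assumptions chosen for brevity.

```python
def cell_features(value):
    """Map a raw cell string to a tiny feature vector (hypothetical features)."""
    text = value.strip()
    try:
        float(text)          # rough numeric check; an assumption, not from the paper
        is_numeric = 1.0
    except ValueError:
        is_numeric = 0.0
    length = min(len(text), 30) / 30.0   # content length, capped and scaled to [0, 1]
    return (is_numeric, length)


def classify(value, training):
    """Label a cell with the label of its nearest training example (1-NN).

    `training` is a list of (cell_text, label) pairs; 1-NN stands in for
    whatever classifiers the paper actually evaluates.
    """
    feats = cell_features(value)

    def squared_distance(example):
        other = cell_features(example[0])
        return sum((a - b) ** 2 for a, b in zip(feats, other))

    return min(training, key=squared_distance)[1]


def repair_row(labels):
    """Relabel a cell when its left and right neighbours agree with each
    other but not with it (a hypothetical repair rule)."""
    repaired = list(labels)
    for i in range(1, len(labels) - 1):
        # Decisions use the original labels, so one repair never triggers another.
        if labels[i - 1] == labels[i + 1] != labels[i]:
            repaired[i] = labels[i - 1]
    return repaired


# Toy training data with hypothetical labels.
training = [("Revenue", "Header"), ("Total", "Header"),
            ("1200", "Data"), ("35.5", "Data")]

predicted = [classify(v, training) for v in ["Region", "980"]]
fixed = repair_row(["Data", "Header", "Data", "Data"])
```

Here `predicted` holds one label per input cell, and `fixed` shows the isolated "Header" label in a run of "Data" cells being overwritten. A real system would use far richer features and a properly trained, validated classifier.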
Citation: Koci, E., Thiele, M., Romero, O., Lehner, W. A machine learning approach for layout inference in spreadsheets. In: International Conference on Knowledge Discovery and Information Retrieval. "IC3K 2016: Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: volume 1: KDIR". Porto: SciTePress, 2016, p. 77-88.
URI: http://hdl.handle.net/2117/100584
DOI: 10.5220/0006052200770088
ISBN: 978-989-758-203-5
Publisher version: http://dx.doi.org/10.5220/0006052200770088
Collections
  • MPI - Modelització i processament de la Informació - Ponències/Comunicacions de congressos [119]
  • Departament d'Enginyeria de Serveis i Sistemes d'Informació - Ponències/Comunicacions de congressos [566]

Files: KDIR_2016_47_CR.pdf (576.2 KB, PDF)
