
Universitat Politècnica de Catalunya


UPCommons. Global access to UPC knowledge


Table identification and reconstruction in spreadsheets

Camera_Ready_Caise_2017.pdf (1,478Mb)
 
Cite as:
hdl:2117/113249

Koci, Elvis
Thiele, Maik
Romero Moral, Óscar
Lehner, Wolfgang
Document type: Conference report
Defense date: 2017
Publisher: Springer
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
Spreadsheets are one of the most successful content generation tools, used in almost every enterprise to perform data transformation, visualization, and analysis. The high degree of freedom provided by these tools results in very complex sheets, intermingling the actual data with formatting, formulas, layout artifacts, and textual metadata. To unlock the wealth of data contained in spreadsheets, a human analyst will often have to understand and transform the data manually. To overcome this cumbersome process, we propose a framework that is able to automatically infer the structure and extract the data from these documents in a canonical form. In this paper, we describe our heuristics-based method for discovering tables in spreadsheets, given that each cell is classified as either header, attribute, metadata, data, or derived. Experimental results on a real-world dataset of 439 worksheets (858 tables) show that our approach is feasible and effectively identifies tables within partially structured spreadsheets.
Citation: Koci, E., Thiele, M., Romero, O., Lehner, W. Table identification and reconstruction in spreadsheets. In: International Conference on Advanced Information Systems Engineering. "Advanced Information Systems Engineering: 29th International Conference, CAiSE 2017: Essen, Germany, June 12-16, 2017: proceedings". Essen: Springer, 2017, p. 527-541.
URI: http://hdl.handle.net/2117/113249
DOI: 10.1007/978-3-319-59536-8_33
ISBN: 978-3-319-59536-8
Publisher version: https://link.springer.com/chapter/10.1007%2F978-3-319-59536-8_33
Collections
  • MPI - Modelització i processament de la Informació - Ponències/Comunicacions de congressos [119]
  • Departament d'Enginyeria de Serveis i Sistemes d'Informació - Ponències/Comunicacions de congressos [566]