XLIndy: interactive recognition and information extraction in spreadsheets
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise because spreadsheets are partially structured and carry implicit (visual and textual) information. This creates a bottleneck for automatic analysis and extraction of information. Therefore, we present XLIndy, a Microsoft Excel add-in with a machine learning back-end written in Python. It showcases our novel methods for layout inference and table recognition in spreadsheets. For a selected task and method, users can visually inspect the results, change configurations, and compare different runs, which enables iterative fine-tuning. Additionally, users can manually revise the predicted layout and tables and subsequently save them as annotations. These annotations are used to measure performance and (re-)train classifiers. Finally, data in the recognized tables can be extracted for further processing. XLIndy supports several standard formats, such as CSV and JSON.
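As an illustration of the final extraction step described in the abstract, the following minimal sketch shows how a recognized table region might be cut out of a sheet and serialized to CSV and JSON. It is not XLIndy's actual code or API: the function names (`extract_table`, `table_to_csv`, `table_to_json`), the list-of-lists sheet representation, the sample data, and the assumption that the first row of the region is a header are all hypothetical. Only the Python standard library is used.

```python
import csv
import io
import json


def extract_table(grid, top, left, bottom, right):
    """Return the rows of a rectangular region (inclusive, 0-based bounds).

    Hypothetical helper: assumes the table boundaries were already
    recognized by some earlier step.
    """
    return [row[left:right + 1] for row in grid[top:bottom + 1]]


def table_to_csv(table):
    """Serialize the table rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


def table_to_json(table):
    """Serialize the table to a JSON array of records.

    Assumes the first row of the region holds the column headers.
    """
    header, *body = table
    return json.dumps([dict(zip(header, row)) for row in body])


# Made-up sheet: a title row, an empty row, then the actual table.
sheet = [
    ["Quarterly sales", None, None],
    [None, None, None],
    ["Region", "Q1", "Q2"],
    ["North", 10, 12],
    ["South", 7, 9],
]

table = extract_table(sheet, top=2, left=0, bottom=4, right=2)
csv_text = table_to_csv(table)
json_text = table_to_json(table)
```

Here the table bounds are passed in by hand; in a tool like the one described, they would come from the table recognition step or from the user's manual revision.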
Citation: Koci, E. [et al.]. XLIndy: interactive recognition and information extraction in spreadsheets. In: ACM Symposium on Document Engineering. "DocEng '19: proceedings of the ACM Symposium on Document Engineering 2019: September 2019". New York: Association for Computing Machinery (ACM), 2019, p. 1-4.