Extracting Knowledge Bases from table-structured Web resources applied to the semantic-based requirements engineering methodology SoftWiki
Document type: Master thesis (pre-Bologna period)
Rights access: Open Access
Over the last few years, the way people use the Internet has shifted drastically from merely consulting content to publishing, sharing and modifying it. This has turned the Internet into a social network in which the possibilities for collaboration and communication grow every day. Wiki systems are a good example: they are collaborative, content-focused platforms whose performance depends on the work of a community. Another major web technology development is the so-called Semantic Web, a Web in which the context of every piece of data is clearly specified so that machines are able to understand it. The OntoWiki project merges Semantic Web and Wiki technology, enabling the definition, modification and visualization of agile, distributed knowledge engineering scenarios. Taking advantage of OntoWiki's extension system, the SoftWiki platform was created. With this tool and the associated Agile Requirements Engineering methodology, potentially very large and spatially separated stakeholder groups can gather, semantically enrich, classify and aggregate software requirements easily.
The CSVLoad extension for OntoWiki was originally motivated by the wish to import non-semantic requirement data from the Google Code Issues platform into SoftWiki. It imports plain data from CSV table files into OntoWiki with the help of an administrator-defined RDF semantic template, written in a modified subset of the Turtle (N3) language that supports input and mapping values. Using CSVLoad with the predefined Google Code Issues template makes it very easy to import the requirements of a project hosted on Google Code into SoftWiki (in other words, into a SWORE ontology).
Some platforms allow only part of their information (or, in some cases, none of it) to be exported in standard formats such as CSV or RDF. Instead, they present their data only in HTML documents, which makes building general, effective plain-to-semantic import tools extremely difficult (and in some cases impossible) and forces developers to build custom-made tools. The Gcode extension is one such tool, built specifically to extract additional requirements information from the HTML code of the Google Code Issues platform. Together with CSVLoad, it turns importing all the requirements information from Google Code Issues into SoftWiki into an easy, automatic process. Comparing the two extensions, their input data and their features makes clear the advantages of structured, view-independent data over data embedded in a view representation (e.g. data in an HTML document). Even structured data, however, needs a further step, semantic mark-up, so that computers can know the context of the information in an expandable, flexible environment.
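The template-driven import described for CSVLoad can be pictured with a small sketch. It is only an illustration of the general idea (CSV cells substituted into triple patterns, with some cells translated through a mapping table) and does not reproduce CSVLoad's actual template language. The `{Column}` placeholder syntax, the `example.org` URIs, the column names and the function names are all assumptions made for this example.

```python
import csv
import io

# Hypothetical template: one Turtle-like triple pattern per line.
# "{ID}"-style placeholders stand for input values taken from the CSV column
# of the same name. This syntax is an assumption, not CSVLoad's real one.
TEMPLATE = [
    '<http://example.org/req/{ID}> a <http://example.org/vocab#Requirement> .',
    '<http://example.org/req/{ID}> <http://purl.org/dc/terms/title> "{Summary}" .',
    '<http://example.org/req/{ID}> <http://example.org/vocab#priority> <{Priority}> .',
]

# Hypothetical mapping values: raw cell contents translated to resource URIs.
MAPPINGS = {
    "Priority": {
        "High": "http://example.org/vocab#high",
        "Low": "http://example.org/vocab#low",
    }
}


def escape_literal(value):
    """Escape backslashes and double quotes for use inside a quoted literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_triples(csv_text, template=TEMPLATE, mappings=MAPPINGS):
    """Instantiate every template line once per CSV row.

    Assumes well-formed rows and that every mapped column only contains
    values listed in the mapping table; no URI validation is performed.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = {}
        for column, cell in row.items():
            if column is None:  # surplus cells beyond the header are ignored
                continue
            cell = cell or ""
            mapped = mappings.get(column, {}).get(cell)
            values[column] = mapped if mapped is not None else escape_literal(cell)
        triples.extend(line.format(**values) for line in template)
    return triples


if __name__ == "__main__":
    sample = "ID,Summary,Priority\n12,Export fails,High\n7,Add search box,Low\n"
    for triple in rows_to_triples(sample):
        print(triple)
```

Because the input is already tabular, the whole import reduces to substitution and lookup. Data that is only available inside an HTML page would first need a page-specific extraction step, which is the role the Gcode extension plays.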
Project carried out through a mobility programme. Universität Leipzig. Fakultät für Mathematik und Informatik, Institut für Informatik, Betriebliche Informationssysteme