A new methodology for datascience automation in javaKlass: PTDD

Cite as: hdl:2117/400523
Document type: Master thesis
Date: 2023-10-16
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
Data science research is a multidisciplinary activity where people with
different backgrounds and skills (mathematicians, physicists, computer
scientists, etc.) often work together to design and build software that
implements research results.
Long-term projects face stability, maintainability, scalability, and reproducibility challenges when a large number of developers are involved, when very old code coexists with new code, and when the development team has high turnover, which is inherent to research teams and often happens for financial reasons.
JavaKLASS is a data science software system developed over 30 years of research under the leadership of Karina Gibert and her team of more than 25 researchers and developers from different backgrounds. The system is a Java desktop application that needs to evolve into a new version that can be used through different interfaces, including batch execution.
In this thesis, we will design and build an end-to-end scripting language for javaKLASS, so that scripts can execute the various data science processes supported by javaKLASS in different ways: either from the current javaKLASS graphical interface or from a batch process.
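The abstract does not describe the script syntax, so the following is only a minimal sketch, under stated assumptions, of how a single interpreter could serve both the graphical interface and batch runs. The class `ScriptRunner`, its commands (`load`, `filter`, `count`), and the in-memory row model are hypothetical illustrations, not javaKLASS's actual API: a GUI action would call `run(...)` on the script text it builds, while a batch job passes a script file path to `main`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a line-oriented script interpreter that a GUI action and a
// batch entry point could share. Command names and the row model are illustrative
// assumptions, not javaKLASS's actual scripting language.
public class ScriptRunner {
    /** State shared by the commands of one script run. */
    static final class Session {
        final List<Map<String, String>> rows = new ArrayList<>();
        final List<String> output = new ArrayList<>();
    }

    @FunctionalInterface
    interface Command {
        void run(Session session, String[] args);
    }

    private final Map<String, Command> commands = new LinkedHashMap<>();

    public ScriptRunner() {
        // load k=v ...: appends one row (stands in for reading a real data set)
        commands.put("load", (s, a) -> {
            Map<String, String> row = new LinkedHashMap<>();
            for (String kv : a) {
                String[] p = kv.split("=", 2);
                row.put(p[0], p.length > 1 ? p[1] : "");
            }
            s.rows.add(row);
        });
        // filter column value: keeps only rows whose column equals value
        commands.put("filter", (s, a) -> {
            if (a.length != 2) throw new IllegalArgumentException("usage: filter <column> <value>");
            s.rows.removeIf(r -> !a[1].equals(r.get(a[0])));
        });
        // count: appends the number of remaining rows to the output log
        commands.put("count", (s, a) -> s.output.add("rows=" + s.rows.size()));
    }

    /** Executes a script and returns its output log; GUI and batch both call this. */
    public List<String> run(String script) {
        Session session = new Session();
        for (String raw : script.split("\\R")) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            String[] tokens = line.split("\\s+");
            Command command = commands.get(tokens[0]);
            if (command == null) throw new IllegalArgumentException("unknown command: " + tokens[0]);
            command.run(session, Arrays.copyOfRange(tokens, 1, tokens.length));
        }
        return session.output;
    }

    /** Batch entry point: java ScriptRunner path/to/script.txt (runs a demo if no path). */
    public static void main(String[] args) throws IOException {
        String script = args.length > 0
                ? Files.readString(Path.of(args[0]))
                : "load species=cat\nload species=dog\nfilter species cat\ncount";
        new ScriptRunner().run(script).forEach(System.out::println);
    }
}
```

Keeping all process logic behind one `run` method is what would let a script behave the same way whichever interface launches it; the interfaces only differ in how they obtain the script text and display the output.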
By implementing this scripting language, we also open the door to defining a set of scripts that can be used to intensively test the stability of the code as new developers extend the functionality of the system.
These test scripts will provide a mechanism for comprehensive, automated testing, introducing a new methodology we call Process Testing Driven Development (PTDD).
This new methodology is intended to ensure that new developments do not
break existing functionality and to add robustness to future developments
and software upgrades. These tests will also be used in the long term to
support software refactoring activities.
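A PTDD test suite can be pictured as a set of process scripts, each paired with the output it produced when the process was known to be correct; re-running all of them after every change exposes regressions. The sketch below assumes exactly that golden-output scheme. `ProcessRegressionSuite` and its methods are hypothetical names, and the script engine is passed in as a plain `Function<String, List<String>>` so the harness stays independent of any particular interpreter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical PTDD-style harness: each case pairs a process script with the output
// approved when that process was known to be correct ("golden" output).
public class ProcessRegressionSuite {
    private static final class Case {
        final String name;
        final String script;
        final List<String> expected;

        Case(String name, String script, List<String> expected) {
            this.name = name;
            this.script = script;
            this.expected = List.copyOf(expected);
        }
    }

    private final List<Case> cases = new ArrayList<>();

    /** Registers a script together with its approved baseline output. */
    public ProcessRegressionSuite add(String name, String script, List<String> expected) {
        cases.add(new Case(name, script, expected));
        return this;
    }

    /** Re-runs every script; returns failing case names mapped to a diagnostic. */
    public Map<String, String> run(Function<String, List<String>> engine) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (Case c : cases) {
            try {
                List<String> actual = engine.apply(c.script);
                if (!c.expected.equals(actual)) {
                    failures.put(c.name, "expected " + c.expected + " but got " + actual);
                }
            } catch (RuntimeException e) {
                // A crash counts as a regression, not only a wrong result.
                failures.put(c.name, "threw " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Toy engine standing in for the real interpreter: upper-cases the script text.
        Function<String, List<String>> engine = script -> List.of(script.toUpperCase());
        Map<String, String> failures = new ProcessRegressionSuite()
                .add("echo", "abc", List.of("ABC"))
                .add("broken", "xyz", List.of("xyz")) // deliberately wrong baseline
                .run(engine);
        failures.forEach((name, why) -> System.out.println("REGRESSION " + name + ": " + why));
        if (failures.isEmpty()) System.out.println("all process scripts pass");
    }
}
```

Running `main` reports the deliberately wrong `broken` case as a regression. Treating an exception as a failure, rather than only comparing results, matters during refactoring, where a change may surface as a crash in a process nobody touched.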
Subjects: Computer software, Java (Computer program language), Data sets
Degree: Master's Degree in Informatics Engineering (2012 curriculum)
Files | Description | Size | Format |
---|---|---|---|
179063.pdf |  | 3.412 MB | PDF |