Data Science for Software Engineering (ds4se) is an academic initiative to perform exploratory and causal inference analysis on software engineering artifacts and metadata. It covers data management, analysis, and benchmarking for deep learning (DL) and traceability.
The testing team needs to flag all of the tests created in the desc section of the notebooks (nbs) and then add those flags to the settings.ini file in the proper format. Before this issue can be submitted as complete, the tests must be invokable through the ds4se way of running tests.
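As a rough aid for the flagging step above, the sketch below checks that every flag used in the notebooks is also declared in settings.ini. It assumes an nbdev-style `tst_flags` key under `[DEFAULT]` with `|` as the separator; the flag names `desc`, `facade`, and `slow` are hypothetical placeholders, not the project's actual flags.

```python
import configparser

def read_test_flags(ini_text, key="tst_flags", sep="|"):
    """Return the test flags declared under [DEFAULT] in settings.ini-style text.

    The key name and separator are assumptions; adjust them to the real file.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser["DEFAULT"].get(key, "")
    return [flag.strip() for flag in raw.split(sep) if flag.strip()]

def missing_flags(ini_text, used_flags):
    """Return the flags used in notebooks but not declared in the settings text."""
    declared = set(read_test_flags(ini_text))
    return sorted(set(used_flags) - declared)

# Hypothetical settings content, for illustration only.
SAMPLE_INI = """
[DEFAULT]
lib_name = ds4se
tst_flags = desc|facade
"""

print(read_test_flags(SAMPLE_INI))
print(missing_flags(SAMPLE_INI, ["desc", "slow"]))
```

An empty result from `missing_flags` would suggest that every flag used in the notebooks has a matching entry in the settings file.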
SE_Proj2: Main branch of the project, used by everyone
SE_Proj2_Testing: Branch used by Yangchen and Alex
SE_Proj2_Refactor: Branch used by Robert and Will
SE_Proj2_Facade: Branch used by Charles and Daniel
These separate branches are meant to prevent unnecessary collisions when merging and pushing. Domains can merge with each other when needed, while certain changes stay isolated until the group can confirm them.
Create assertion tests for the facade built by the Facade team. These proto-asserts are bare-bones assertions that will need to be updated as the facade changes.
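A bare-bones proto-assert of the kind described above might look like the following sketch. The function `facade_similarity` is a hypothetical stand-in defined here for illustration (a simple Jaccard token overlap); it is not the Facade team's actual interface, and the assertions only check type and range so that they survive changes to the underlying computation.

```python
def facade_similarity(source_text, target_text):
    """Hypothetical stand-in for a facade call: Jaccard overlap of lowercase tokens."""
    a = set(source_text.lower().split())
    b = set(target_text.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def test_facade_proto():
    """Proto-asserts: check the result type, its range, and two boundary cases."""
    score = facade_similarity("user login fails", "fix login bug")
    assert isinstance(score, float)
    assert 0.0 <= score <= 1.0
    assert facade_similarity("same text", "same text") == 1.0
    assert facade_similarity("", "") == 0.0

test_facade_proto()
```

Once the real facade exists, the stand-in would be replaced by an import of the actual function, and the assertions tightened to match its documented behavior.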
Create a Colab notebook presenting a tutorial on how to use DS4SE to analyze a system probabilistically. Use the refactored information theory and statistical components.
3.0_mining.ir.model
3.0_mining.unsupervised.traceability.ida
3.1_mining.ir.i
3.1_mining.unsupervised.traceability.eda - contains code that may need to be exported
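To give a flavor of the probabilistic analysis the tutorial is meant to cover, here is a minimal sketch of Shannon entropy over the tokens of an artifact. It uses only the Python standard library and does not call the DS4SE information theory components; the function name and the whitespace tokenization are assumptions made for illustration.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Return the Shannon entropy, in bits, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical artifact text, tokenized naively on whitespace.
requirement = "the system shall log every failed login attempt"
print(shannon_entropy(requirement.split()))
```

An artifact whose tokens are all distinct reaches the maximum entropy for its length, while a highly repetitive one scores close to zero.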
Phase II aims to fill the gaps needed for a fully functional T-Miner (beta) version. To have a stable version, we need to adopt new SE methodologies that work specifically for data science and machine learning. Such methodologies involve other frameworks such as DVC, nbdev, and TFX. This phase is composed of the following activities:
T-Miner
T-Miner Interoperability and Deployment. We must guarantee that T-Miner communicates with the DS4SE library, Jenkins, and a deployed version of SecureReqNet.
T-Miner Navigation. We must guarantee that the proposed navigation is functional and stable. Important use cases: information recovery (traceability) and information analysis (entropy). The tool should retrieve, create, update, and delete traceability results.
Causal Inference View. We need to implement a causal inference (CI) view for T-Miner. CI should be consumed from DS4SE. However, no CI modules in DS4SE have been fully developed. This is a whole back-end solution to update our previous COMET solution.
DS4SE
Data repository integration. We have been employing DVC for data versioning. However, our projects are not fully integrated. We need to centralize all the SE-related data in a single remote. Our current architecture allows one remote per git project, which generates data redundancies.
Data Science/ML Continuous Integration. We want to adopt Continuous Machine Learning (CML). The main goal of CML is to keep all our experiments and models under control. Similar to TFX, DVC has its own pipeline solution.
Migrating Unsupervised Traceability Models into CML-DVC. All our unsupervised models will be shaped as an ML pipeline for further enhancement and development.