wm-semeru / ds4se

Data Science for Software Engineering (ds4se) is an academic initiative to perform exploratory and causal inference analysis on software engineering artifacts and metadata. Data Management, Analysis, and Benchmarking for DL and Traceability.

Home Page: https://wm-csci-435-f19.github.io/ds4se/

License: Apache License 2.0

Dockerfile 0.01% Jupyter Notebook 99.50% Python 0.46% Shell 0.01% C 0.03% Makefile 0.01%

ds4se's People

Contributors

aldrodriguezca, caharris02, charleswang528, danaderp, danielrcardenas, dependabot[bot], dquiroga10, m13253, rmclanton, robertfrig, willkinney, wilsmccreight, yangchenye323

ds4se's Issues

Fix: exp Mongo simulation tests

There are pathing errors in the simulate mongodb method in 1.0_exp.i. These need to be fixed in order to use this as a basis for testing exploratory.

Missing Dependencies

Running list of dependencies that were not installed using requirements.txt
0.0
tokenizers

1.1
fastprogress - in ds4se.mgmnt.prep.bpe (?)

6.2
Lizard
Tree_sitter

3.3
Gensim
Prg
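
A list like this can be checked automatically by probing for each module before a notebook runs. The sketch below is a minimal check, assuming the import names `tokenizers`, `fastprogress`, `lizard`, `tree_sitter`, and `gensim` match the packages listed above; `Prg` is left out because its import name is unclear. The helper name `missing_modules` is illustrative and not part of ds4se.

```python
from importlib.util import find_spec

# Import names assumed from the list above; adjust if a package
# imports under a different name than it installs under.
CANDIDATES = ["tokenizers", "fastprogress", "lizard", "tree_sitter", "gensim"]


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for malformed or half-initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(CANDIDATES):
        print(f"missing: {name}")
```

Anything the script reports is a candidate for adding to requirements.txt.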

Fix CI/CD

Have the repository properly test the project with each push/pull request

Add desc Test flags to settings.ini

The testing team needs to flag all of the tests created in the desc section of the nbs and then add those flags to the settings.ini file in the proper format. Before this issue can be submitted as complete, the tests must be invokable through the ds4se format of running tests.
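
To confirm that the flags were registered, the settings file can be read back with the standard library. This is a minimal sketch, assuming the flags live under a `tst_flags` key in the `[DEFAULT]` section (the convention nbdev uses); the flag names `desc_slow` and `desc_mongo` are placeholders, not the project's real flags. Because the separator differs between nbdev versions, the sketch accepts both `|` and whitespace.

```python
import re
from configparser import ConfigParser

# Hypothetical settings.ini excerpt; the flag names are placeholders.
EXAMPLE_SETTINGS = """
[DEFAULT]
tst_flags = desc_slow|desc_mongo
"""


def read_test_flags(ini_text):
    """Return the test flags declared under `tst_flags` in the DEFAULT section."""
    parser = ConfigParser()
    parser.read_string(ini_text)
    raw = parser["DEFAULT"].get("tst_flags", "")
    # Accept either "|" or whitespace as the separator.
    return [flag for flag in re.split(r"[|\s]+", raw) if flag]


if __name__ == "__main__":
    print(read_test_flags(EXAMPLE_SETTINGS))
```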

Create/Implement TestDS4SE.ipynb

Create a Python program that, when run, runs the nbdev tests on our selected nbdev files and prints a stub summary of the process to the user.
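
One possible shape for such a runner is sketched below. It is not the TestDS4SE implementation: the function names are hypothetical, and the test command is passed in as a parameter because the nbdev test entry point and its options differ between nbdev versions. A notebook counts as passing when the command exits with status 0.

```python
import subprocess


def run_notebook_tests(notebooks, command):
    """Run `command + [notebook]` for each notebook; map notebook -> passed."""
    results = {}
    for nb in notebooks:
        completed = subprocess.run(command + [nb], capture_output=True, text=True)
        results[nb] = completed.returncode == 0
    return results


def summarize(results):
    """Build a short text summary: one PASS/FAIL line per notebook plus a total."""
    passed = sum(results.values())
    lines = [f"{'PASS' if ok else 'FAIL'}  {nb}" for nb, ok in sorted(results.items())]
    lines.append(f"{passed}/{len(results)} notebooks passed")
    return "\n".join(lines)
```

Keeping the summary separate from the subprocess call means it can be checked without running any notebooks, for example `print(summarize({"a.ipynb": True, "b.ipynb": False}))`.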

CSCI-SE-Proj2: Create Branches for Sub Domains

Create Branches For Project 2:

SE_Proj2: Main branch of Project 2, used by everyone
SE_Proj2_Testing: Branch used by Yangchen and Alex
SE_Proj2_Refactor: Branch used by Robert and Will
SE_Proj2_Facade: Branch used by Charles and Daniel

These separate branches are meant to prevent unnecessary collisions when merging and pushing. They let the domains merge with each other when needed while keeping certain changes isolated until the group can confirm them.

CSCI-SE-Proj2: Facade Proto Test Cases

Create assertion tests for the facade built by the Facade team. These proto asserts are barebones assertions that will need to be updated as the facade changes.

Fixing Corrupted Files

Certain files have become corrupted, resulting in problems throughout the repo. Fix these files and push to main.

System probability colab notebook tutorial

Create a Colab notebook presenting a tutorial on how to use DS4SE to analyze a system probabilistically. Use the refactored information theory and statistical components.

Non-exported code

3.0_mining.ir.model
3.0_mining.unsupervised.traceability.ida
3.1_mining.ir.i
3.1_mining.unsupervised.traceability.eda - stuff in here, maybe need to export?

Test Cases for Facade

  • TraceLinkValue function for word2vec and doc2vec functionality
  • NumDoc
  • VocabSize
  • AverageToken
  • Vocab
  • VocabShared
  • SharedVocabSize
  • MutualInformation
  • CrossEntropy
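
To make the expected values in such tests concrete, the quantities above can be computed by hand on a toy corpus and then compared against the facade. The sketch below is illustrative only: the helper names are hypothetical and do not reflect the ds4se facade signatures, the corpus is made up, and it assumes token-level unigram distributions and entropy in bits. TraceLinkValue and MutualInformation are left out because they depend on trained models and on how the joint distribution is defined.

```python
import math
from collections import Counter


def vocab(docs):
    """Set of distinct tokens across a list of tokenized documents."""
    return {tok for doc in docs for tok in doc}


def average_tokens(docs):
    """Mean number of tokens per document."""
    return sum(len(doc) for doc in docs) / len(docs)


def distribution(docs):
    """Unigram probability of each token over the whole corpus."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def cross_entropy(p, q):
    """H(p, q) in bits; infinite when q lacks a token that p uses."""
    total = 0.0
    for tok, p_tok in p.items():
        q_tok = q.get(tok, 0.0)
        if q_tok == 0.0:
            return math.inf
        total -= p_tok * math.log2(q_tok)
    return total


# Toy corpora, invented for illustration.
source = [["open", "file"], ["read", "file"]]
target = [["open", "file", "read"]]

assert len(source) == 2                       # NumDoc
assert len(vocab(source)) == 3                # VocabSize
assert average_tokens(source) == 2.0          # AverageToken
assert len(vocab(source) & vocab(target)) == 3  # SharedVocabSize
p = distribution(source)
assert math.isclose(cross_entropy(p, p), 1.5)  # entropy of the source corpus
```

Hand-computed baselines like these give the proto asserts a fixed reference that does not change when the facade is refactored.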

[Phase II] T-Miner & DS4SE

Phase II aims to fill the gaps needed for a fully functional T-Miner (beta) version. To have a stable version, we need to adopt new SE methodologies that work specifically for data science and machine learning. Such methodologies involve other frameworks such as DVC, nbdev, and TFX. This phase is composed of the following activities:

T-Miner

  • T-Miner Interoperability and Deployment. We must guarantee that T-Miner communicates with the DS4SE library, Jenkins, and a deployed version of SecureReqNet.
  • T-Miner Navigation. We must guarantee that the proposed navigation is functional and stable. Important use cases: information recovery (traceability) and information analysis (entropy). The tool should retrieve, create, update, and delete traceability results.
  • Causal Inference View. We need to implement a causal inference view for T-Miner. CI should be consumed from DS4SE. However, no modules in DS4SE have been fully developed. This is a whole back-end solution to update our previous COMET solution.

DS4SE

  • Data repository integration. We have been employing DVC for data versioning. However, our projects are not fully integrated. We need to centralize all the SE-related data in a single remote. Our current architecture allows one remote per git project, which generates data redundancies.
  • Data Science/ML Continuous Integration. We want to adopt Continuous Machine Learning (CML). The main goal of CML is to keep all our experiments and models under control. Similar to TFX, DVC has its own pipeline solution.
  • Migrating Unsupervised Traceability Models into CML-DVC. All our unsupervised models will be shaped as an ML pipeline for further enhancement and development.
