
team-2130-machine-learning-roulette's Introduction

Machine Learning Roulette - Team 2130

Our project is a website that lets users evaluate the performance of selected machine learning algorithms on a dataset they upload in CSV form. On the front end, the website accepts a dataset, lets the user select one or more models, and displays statistical model quality results based on the other parameters selected. On the back end, we run the selected models, as configured, on the uploaded dataset and return aggregated quality metrics for the website to render in an informative way.

Team members

Hyelin Lee: [email protected] (Role: Summarizer)

Yuanzhi (David) Liu: [email protected] (Role: Opinion Seeker)

Ruokun (Tommy) Niu: [email protected] (Role: Information Giver)

Harrison L O'Neal: [email protected] (Role: Clarifier)

Junqi (Jacky) Xu: [email protected] (Role: Initiator, Information Giver)

Haoran (Marty) Zhao: [email protected] (Role: Information Seeker)

Release Note

Version 0.4.0

New Features

  1. Display metrics for the machine learning model training result (including accuracy, prior probability, mean, standard deviation, etc.)
  2. Separate login page and register page. After a new user registers, they are automatically redirected to the upload page.
  3. Set up database procedure to store history data
  4. Default training percentage set to 70%

Bug Fixes

  1. Disabled the model selection once the user has uploaded their dataset

Version 0.3.0

New Features

  1. Implemented Naive Bayes
  2. Implemented Hierarchical Clustering
  3. Implemented Decision Tree
  4. Database setup
  5. Supported y-label for accuracy comparison

Bug Fixes

  1. Changed the order of the frontend upload stage: the user now chooses the ML model first and then uploads their dataset. A y-label is required for some ML models and optional for others, so choosing the model first makes the logic much clearer and lets the frontend decide whether a y-label is mandatory.

Version 0.2.0

New Features

  1. Implemented KMeans Algorithm that takes in CSV dataset and # of clusters as parameters
  2. Built backend API that calls the KMeans algorithm to get a cluster assessment
  3. User authentication (Register and Login)
  4. Deploy the website (https://www.mleroulette.com/)

Bug Fixes

  1. Disabled upload button when the user is not in the first stage of upload.

Version 0.1.0

New Features

  1. Frontend page for uploading dataset (CSV Format) and selecting ML models and parameters.
  2. Frontend page for Login and registration
  3. Error Modal
  4. Backend APIs to receive CSV dataset and parameters

Bug Fixes

  1. Large margin in "Upload: step2"

Installation Guide

  1. Install Git https://github.com/git-guides/install-git
    Git is a distributed version control system that tracks changes in any set of files. This project uses Git for version control. After installing it, clone the repository:
    git clone https://github.com/JackyXu-Cool/Team-2130-Machine-Learning-Roulette

  2. Install node and npm. Our backend uses node.js. After node is installed, follow the instruction here to install the backend.

  3. Install the frontend-related packages. Follow the instruction here.

  4. Database integration. The database features we planned are fully set up but are not yet integrated into the application. Follow the guide here to learn more about how our database works.

  5. IDE installation. We strongly recommend a lightweight IDE, such as Visual Studio Code, to run the application.

Client

Jay Lofstead, Sandia National Laboratories

team-2130-machine-learning-roulette's Issues

Dataset storage and publicly available

  • In the frontend, add a checkbox "Do you agree to make this dataset public?"
  • Add a new page for users to select from the currently available public datasets
  • Store Dataset in AWS S3

Control Random Seed Variable

As a researcher, I want to be able to control the random seed variable so I can easily reproduce my work.
Scenario: A user wants to control the random seed variable for the Machine Learning models
Given that the user wants to reproduce his/her previous work by controlling the random seed variable;
When the user inputs a numerical integer for the random seed value before executing the models;
Then the models will execute using the inputted value as the random seed variable.
Scenario: A researcher wants to verify a peer’s results.
Given the initial research has the random seed variable clearly indicated;
When the peer reviewer inputs the same random seed variable and dataset;
Then they will see the same exact results.
Scenario: A researcher wants to make sure their models are replicable.
Given they know what model they want to use.
When the researcher is entering hyperparameters, they will manually control and note down their random seed.
Then anyone who has the researcher’s data and random seed will be able to exactly recreate the researcher’s models.
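
The scenarios above can be illustrated with a minimal sketch of a seeded train/test split. It is not the project's implementation: the function name split_rows and its parameters are hypothetical, and the 70% default only mirrors the default training percentage mentioned in the release notes.

```python
import random


def split_rows(rows, seed, train_percent=70):
    """Shuffle rows with a fixed seed and split them into train/test parts.

    Hypothetical helper: the same rows and the same seed always give
    the same split, which is what makes a run reproducible.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100  # integer math avoids float rounding
    return shuffled[:cut], shuffled[cut:]


train, test = split_rows(list(range(10)), seed=42)
print(len(train), len(test))
```

A peer reviewer who passes the same dataset and the same seed gets an identical split, so any difference in results must come from somewhere else.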

AWS bucket construction

Set up the AWS buckets. Build one public bucket (its name starting with "www") for future website use and another one for internal testing.

Website Deploy

Create domains for the website and deploy it through Route 53

Documentation (for project tracking assignment)

All the required features of our application are done. However, we did not have enough time to implement all of the stretch goal, which is the database implementation and integration. See our database documentation for more details. https://github.com/JackyXu-Cool/Team-2130-Machine-Learning-Roulette/tree/master/mlr_database

Features we failed to implement:

  1. Select previously uploaded dataset https://app.zenhub.com/workspaces/team-2130-61f8345f61dce90014cd6cc5/issues/jackyxu-cool/team-2130-machine-learning-roulette/11
  2. Store training result https://app.zenhub.com/workspaces/team-2130-61f8345f61dce90014cd6cc5/issues/jackyxu-cool/team-2130-machine-learning-roulette/12
  3. Cloud service infrastructure https://app.zenhub.com/workspaces/team-2130-61f8345f61dce90014cd6cc5/issues/jackyxu-cool/team-2130-machine-learning-roulette/13

Uploading datasets

As a registered user, I want to upload my own dataset so I can generate models.
Scenario: A user has a fresh dataset and doesn’t know what models to generate.
Given the user is on the upload page;
When the user selects which dataset to upload;
Then they will be able to select a wide variety of models to start determining the best model to move forward with.
Scenario: The user knows which model they want to generate.
Given the user has already uploaded their dataset;
When the user is selecting their models to generate;
Then they will be able to fine tune their hyperparameters.
Scenario: The user does not have a dataset that they want to use;
Given the user is on the upload page;
When they don’t know what to upload;
Then they can follow links to find publicly available datasets.
Scenario: The dataset exceeds the maximum size allowed
Given the user has already uploaded their dataset and the size of the dataset has exceeded the limit;
When the user presses the “start training” button;
Then an error will be thrown, notifying the user that the data has exceeded the size limit
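
The size-limit scenario could be sketched as follows. This is an assumption-laden illustration: the issue does not state the actual limit, so MAX_DATASET_BYTES is a placeholder value, and DatasetTooLargeError and check_dataset_size are hypothetical names.

```python
MAX_DATASET_BYTES = 10 * 1024 * 1024  # placeholder limit; the real value is not specified


class DatasetTooLargeError(ValueError):
    """Raised when an uploaded dataset exceeds the allowed size."""


def check_dataset_size(csv_bytes, limit=MAX_DATASET_BYTES):
    """Return the dataset size in bytes, or raise if it exceeds the limit."""
    size = len(csv_bytes)
    if size > limit:
        raise DatasetTooLargeError(
            f"Dataset is {size} bytes; the limit is {limit} bytes."
        )
    return size


try:
    check_dataset_size(b"a,b\n1,2\n", limit=4)
except DatasetTooLargeError as error:
    print(error)  # this message is what the user would be shown
```

Checking the size before training starts lets the error reach the user immediately instead of after a long-running job has failed.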

Review results from previous runs

As a user, I want to see old results so I can avoid having to run the algorithm again.
Scenario: An existing user is looking for their results they got several days ago.
Given the user was logged in when they began running their models;
When the user goes to their results history page;
Then they will be able to see their past results.
Scenario: A new user is looking for historical result
Given the user was just a guest user and does not have an account
When the user tries to see their historical result
Then they will be directed to the account sign-up page.
Scenario: The user was not logged in when running their models.
Given the user has closed the webpage since running their models;
When the user goes to the results history page;
Then they will be asked to login but their results won’t be available.
Scenario: The user was not logged in but selected to make their results public.
Given the user has closed the webpage since running their models;
When the user goes to the general database;
Then they will be able to find the results.

How to run

Please help with instructions on how to run this. I am pretty new to this. Can I use Visual Studio Code?

Training notification

As a user, I want to have the training done in the background and notified when completed so I can leave the computer and do some other tasks at the same time.
Scenario: A user wants to leave his/her workplace
Given the user has started the training process by clicking the “start training” button
When the user shuts off the computer
Then the training should still keep running in the background until it is completed.
Scenario: the user wants to run two trainings at the same time
Given the user has started one training session
When the user wants to train another dataset
Then the user could open another tab to train another dataset while the first dataset will still keep training in the background.
Scenario: after the training is done in the background, the user wants to be notified via email.
Given the user has already uploaded the dataset and selected the parameters.
When the user wants to enable notification via emails.
Then the user can click the “enable notification when finished” button, after which a prompt appears asking the user to type in an email address or use a saved one. After the training is completed, a notification email is sent to that address.
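
One way to picture the background training and the notification step is the sketch below. It assumes a thread pool inside the server process. The names run_in_background, train and notify are hypothetical, and sending a real email is replaced by a callback.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_background(executor, train, notify):
    """Submit a training job and notify the user once it has finished.

    `train` is a zero-argument callable that returns the results.
    `notify` receives those results (for example, to send an email).
    """
    def job():
        result = train()
        notify(result)  # runs in the worker thread, after training completes
        return result

    return executor.submit(job)


notifications = []
with ThreadPoolExecutor(max_workers=2) as pool:
    # Two jobs can train at the same time, as in the second scenario.
    first = run_in_background(pool, lambda: {"model": "kmeans"}, notifications.append)
    second = run_in_background(pool, lambda: {"model": "dtree"}, notifications.append)
    first.result(timeout=5)
    second.result(timeout=5)
print(len(notifications))
```

Because the job runs on the server, closing the browser or shutting off the user's computer does not stop it. A thread pool does not survive a server restart, so a persistent job queue would likely be needed in production.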

Decision Tree Algorithm fail to run

I ran the decision tree algorithm with the X_test.csv and Y_test.csv in /testdata folder, but I got this error

  File "C:\JackyXu\GT\Spring 2022\Junior Design\code\mlr_backend\run.py", line 90, in trainData       
    dtree_accuracy = metrics.calculateAccuracy(prediction, Ytest)
  File "C:\JackyXu\GT\Spring 2022\Junior Design\code\mlr_backend\metrics.py", line 6, in calculateAccuracy
    model_output = [int(row[0]) for row in y.tolist()]
  File "C:\JackyXu\GT\Spring 2022\Junior Design\code\mlr_backend\metrics.py", line 6, in <listcomp>   
    model_output = [int(row[0]) for row in y.tolist()]
TypeError: 'float' object is not subscriptable

Can you take a look at it @ruokun-niu
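
The traceback indicates that y.tolist() returned a flat list of floats, so row[0] was applied to a float rather than to a one-element row. The sketch below shows one possible way to accept both one-dimensional and single-column two-dimensional inputs. It is not the fix the team applied: calculate_accuracy and to_flat_ints are hypothetical names, and only the int(row[0]) conversion is taken from the traceback.

```python
def calculate_accuracy(predictions, y_true):
    """Return the fraction of predictions that match the true labels."""

    def to_flat_ints(values):
        # Accept objects with tolist() (such as NumPy arrays) or plain lists.
        rows = values.tolist() if hasattr(values, "tolist") else list(values)
        flat = []
        for row in rows:
            # A 2-D input yields one-element rows; a 1-D input yields scalars.
            value = row[0] if isinstance(row, (list, tuple)) else row
            flat.append(int(value))
        return flat

    predicted = to_flat_ints(predictions)
    expected = to_flat_ints(y_true)
    if len(predicted) != len(expected):
        raise ValueError("predictions and labels differ in length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty label set")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Flat floats for the predictions, single-column rows for the labels.
print(calculate_accuracy([1.0, 0.0, 1.0, 1.0], [[1], [0], [0], [1]]))  # prints 0.75
```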

Select parameters

As a user, I want to choose the parameters so I can figure out the best fit case for different analysis.

  1. Scenario: User wants to select the parameters for the specific ML models
    Given the user has already uploaded the dataset and selected the ML model(s);
    when the user selects the related parameters;
    then ML models will adjust based on the selection when executed
  2. Scenario: User wants to find the random seed that generates the optimal results for their dataset.
    Given the user has selected their algorithm to use and identified a range of seed variables to try.
    When the user generates models for every random seed.
    Then they will be able to determine and utilize the best seed for their data in future uses.
  3. Scenario: User does not select any parameters by accident and the ML model they select requires the user to put at least one parameter
    Given the user does not select any parameter
    When they click “run the model”
    Then the model will not run and the user must select at least one parameter to continue.
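
The third scenario can be sketched as a check that runs before the model does. The mapping below is an assumption for illustration: the only parameter the release notes mention is the number of clusters for KMeans, and the keys "kmeans" and "n_clusters" as well as the function name missing_parameters are hypothetical.

```python
# Hypothetical mapping from model name to the parameters it cannot run without.
REQUIRED_PARAMETERS = {
    "kmeans": ["n_clusters"],
}


def missing_parameters(model, params):
    """Return the required parameters that the user has not filled in."""
    required = REQUIRED_PARAMETERS.get(model, [])
    return [name for name in required if params.get(name) in (None, "")]


missing = missing_parameters("kmeans", {})
if missing:
    # The model is not run; the user is asked for the missing values instead.
    print("Please select: " + ", ".join(missing))
```

Returning the list of missing names, rather than a bare True or False, lets the frontend tell the user exactly which field to fill in.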
