
call2018's Introduction

Intro

File Structure

  • data
    • asr: ASR raw CTM files and processed CSV files
    • fastText: fastText-related training data and trained model files
    • processed:
      • Huy: features from Huy
      • grammar: features from Chuan
      • df_ CSV files: used by the prep_ml_data notebooks
      • numpy: pickled NumPy arrays used as sklearn input
    • refernce_grammar:
    • scst1: training and test sets for the 2017 SCST1 challenge (text task only)
    • texttask_trainData: training sets for the text task of the 2018 SCST2 challenge. See the README inside this directory for the purpose of each of its three files.
  • result:
    • files ending in _tpot.pkl: ML models generated by TPOT
  • src: source code

Code

Active

  • text_data.py: loads the 2017 and 2018 data sets and generates data/processed/data.pkl
  • vecdist_feature_fasttext.py: uses fastText word embeddings to compute cosine-similarity and Word Mover's Distance (WMD) features
    • TODO: map-reduce on pandas rows
  • parse_grm_error.py: parses the grammar-error counts provided by Chuan
  • parse_ctm.py: converts ASR CTM output into .csv files with Id and RecResult columns
  • prep_ml_data_{year}_{task}.ipynb: IPython notebooks that prepare the data for the ML tasks; output is written to data/processed/numpy
  • train_model_v2.py: trains an ML model with default hyper-parameters (chosen with -t: LR, RF, XGB, or SVC) to make accept/reject predictions
  • train_model_hptuned.py: uses hyperopt to tune the ML models' hyper-parameters
    • TODO: enrich the tuning grids
  • train_model_tpot.py: uses TPOT's genetic-programming search to find the best-performing ML pipeline
  • utils.py: utility functions
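The two distance features computed by vecdist_feature_fasttext.py can be illustrated with the sketch below. It is not the script's code: it is a minimal pure-Python sketch that assumes a plain dict `emb` mapping tokens to vectors (standing in for a loaded fastText model), and it computes the *relaxed* WMD of Kusner et al. (2015), a cheap lower bound on the exact WMD, which would require an optimal-transport solver. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, Sequence[float]]  # token -> vector (stand-in for a fastText model)


def mean_vector(tokens: List[str], emb: Embeddings) -> List[float]:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        dim = len(next(iter(emb.values())))
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def relaxed_wmd(a_tokens: List[str], b_tokens: List[str], emb: Embeddings) -> float:
    """Relaxed WMD (Kusner et al., 2015), a lower bound on the exact WMD.

    Every token carries equal weight and sends all of it to its nearest token
    (Euclidean distance) in the other text; the larger one-sided cost is returned.
    """
    a = [emb[t] for t in a_tokens if t in emb]
    b = [emb[t] for t in b_tokens if t in emb]
    if not a or not b:
        return float("inf")  # undefined without in-vocabulary tokens

    def one_side(src, dst) -> float:
        return sum(min(math.dist(x, y) for y in dst) for x in src) / len(src)

    return max(one_side(a, b), one_side(b, a))


# Toy 2-d embeddings, purely illustrative.
emb = {"i": [1.0, 0.0], "like": [0.0, 1.0], "love": [0.1, 0.9], "cats": [1.0, 1.0]}
ref, hyp = ["i", "love", "cats"], ["i", "like", "cats"]
print(cosine_sim(mean_vector(ref, emb), mean_vector(hyp, emb)))
print(relaxed_wmd(ref, hyp, emb))
```

Averaging embeddings before the cosine discards word order and alignment, which is why a transport-style distance such as WMD is a useful complementary feature.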
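The CTM-to-CSV conversion performed by parse_ctm.py can be sketched as follows. This sketch assumes the standard NIST CTM layout (`<id> <channel> <start> <duration> <word> [<confidence>]`), that the first field is the utterance ID written to the Id column, and that RecResult is the recognized words joined in start-time order. The function names are hypothetical, not taken from the script.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple


def ctm_to_rows(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Group CTM word entries by utterance ID and join the words in start-time order."""
    words: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # skip blank lines and CTM comment lines
        # The optional sixth field (confidence) is ignored.
        utt_id, _channel, start, _duration, word = line.split()[:5]
        words[utt_id].append((float(start), word))
    return [
        (utt_id, " ".join(w for _, w in sorted(entries, key=lambda e: e[0])))
        for utt_id, entries in sorted(words.items())
    ]


def write_csv(rows: List[Tuple[str, str]], out: TextIO) -> None:
    """Write rows under an Id,RecResult header."""
    writer = csv.writer(out)
    writer.writerow(["Id", "RecResult"])
    writer.writerows(rows)


ctm = [
    ";; toy CTM input",
    "utt2 1 0.00 0.30 hello 0.95",
    "utt1 1 0.50 0.20 cat 0.88",
    "utt1 1 0.10 0.30 a 0.91",
]
buf = io.StringIO()
write_csv(ctm_to_rows(ctm), buf)
print(buf.getvalue())
```

Sorting by start time matters because CTM entries for one utterance are not guaranteed to appear in order.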

Inactive code (under the archive directory)

  • train_model.py: uses hyperopt to tune model hyper-parameters; supports RF, SVC, and XGBoost
  • eval_model.py: tunes a probability cutoff for accept/reject predictions and runs the D-score evaluation
  • ml_model.py
  • model_sandbox.ipynb
  • end_to_end.py
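The cutoff tuning that eval_model.py performed can be sketched as below. This is a minimal sketch, not the archived script. It assumes that the D score is the rejection rate on incorrect responses divided by the rejection rate on correct responses, and that label 1 means the response should be rejected. The official shared-task scorer may weight some error types differently, so check it before relying on these numbers. `d_score` and `tune_cutoff` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple


def d_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """D = rejection rate on incorrect responses / rejection rate on correct responses.

    Labels: 1 = reject (response is incorrect), 0 = accept (response is correct).
    """
    pairs = list(zip(y_true, y_pred))
    cr = sum(1 for t, p in pairs if t == 1 and p == 1)  # correct rejects
    fa = sum(1 for t, p in pairs if t == 1 and p == 0)  # false accepts
    fr = sum(1 for t, p in pairs if t == 0 and p == 1)  # false rejects
    ca = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct accepts
    if cr + fa == 0 or fr + ca == 0:
        raise ValueError("need both incorrect and correct responses")
    rej_incorrect = cr / (cr + fa)
    rej_correct = fr / (fr + ca)
    if rej_correct == 0:
        # No false rejects: D is unbounded if anything was correctly rejected.
        return float("inf") if rej_incorrect > 0 else 0.0
    return rej_incorrect / rej_correct


def tune_cutoff(y_true, p_reject, candidates) -> Tuple[float, float]:
    """Return (cutoff, D) with the highest D, predicting reject when p_reject >= cutoff."""
    best: Optional[Tuple[float, float]] = None
    for cutoff in candidates:
        y_pred = [1 if p >= cutoff else 0 for p in p_reject]
        d = d_score(y_true, y_pred)
        if best is None or d > best[1]:
            best = (cutoff, d)
    if best is None:
        raise ValueError("no candidate cutoffs given")
    return best


y_true = [1, 1, 1, 0, 0, 0]
p_reject = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # e.g. predict_proba(X)[:, 1] from a classifier
print(tune_cutoff(y_true, p_reject, [0.3, 0.5]))
```

Because D is a ratio, a cutoff that happens to produce no false rejects yields an infinite score, which is worth guarding against when tuning on a small development set.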

Code for debugging purposes

  • try_default_SVC.py: tries an SVC with default hyper-parameters
  • try_tuned_SVC.py: tries a tuned SVC; tuning did not improve the D score
