- data
- asr: raw ASR CTM files and the processed CSV files derived from them
- fastText: fastText training data and trained model files
- processed:
- Huy: features from Huy
- grammar: features from Chuan
- CSV files prefixed with df_ are consumed by the prep_ml_data_{year}_{task}.ipynb notebooks
- numpy: pickled NumPy arrays for scikit-learn
- refernce_grammar:
- scst1: training and test data sets for the 2017 SCST1 challenge (text task only)
- texttask_trainData: training data sets for the 2018 SCST2 challenge's text task; see the internal README for the purposes of the three files it contains
- result:
- files ending in _tpot.pkl are ML models generated by TPOT
- src: source code
- text_data.py: loads the 2017 and 2018 data sets and generates data/processed/data.pkl
- vecdist_feature_fasttext.py: uses fastText word embeddings to compute cosine-similarity and Word Mover's Distance features
- TODO: map-reduce on pandas rows
- parse_grm_error.py: parses the grammar error counts sent by Chuan
- parse_ctm.py: converts ASR CTM outputs to CSV files containing Id and RecResult columns
- prep_ml_data_{year}_{task}.ipynb: IPython notebooks that prepare data for the ML tasks; output goes to data/processed/numpy
- train_model_v2.py: trains an ML model with default hyperparameters (selected via -t: LR, RF, XGB, or SVC) to make accept/reject predictions
- train_model_hptuned.py: uses hyperopt to tune the ML models' hyperparameters
- TODO: enriching tuning grids
- train_model_tpot.py: uses TPOT's genetic-programming search to train an optimized ML model
- utils.py: utility functions
- train_model.py: uses hyperopt to tune model hyperparameters; currently supports RF, SVC, and XGBoost
- eval_model.py: tunes a probability cutoff for the accept/reject prediction and runs the D-score evaluation
- ml_model.py
- model_sandbox.ipynb
- end_to_end.py
- try_default_SVC.py: trains an SVC with default settings
- try_tuned_SVC.py: the tuned SVC did not improve the D score
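
The CTM-to-CSV conversion done by parse_ctm.py can be sketched as follows. This is a minimal illustration, not the script itself: it assumes the standard CTM layout (`<utt-id> <channel> <start> <dur> <word> [conf]`) and joins each utterance's words in time order into a single RecResult string.

```python
import csv
from collections import defaultdict
from io import StringIO

def ctm_to_rows(ctm_lines):
    """Group CTM word entries into one (Id, RecResult) row per utterance,
    with words joined in start-time order."""
    words = defaultdict(list)
    for line in ctm_lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip malformed lines
        utt_id, _channel, start, _dur, word = parts[:5]
        words[utt_id].append((float(start), word))
    rows = []
    for utt_id in sorted(words):
        rec = " ".join(w for _, w in sorted(words[utt_id]))
        rows.append({"Id": utt_id, "RecResult": rec})
    return rows

ctm = [
    "utt1 1 0.00 0.30 hello 0.98",
    "utt1 1 0.35 0.40 world 0.95",
    "utt2 1 0.10 0.25 goodbye 0.90",
]
rows = ctm_to_rows(ctm)

# Write the Id / RecResult columns as CSV (to a string buffer here).
buf = StringIO()
writer = csv.DictWriter(buf, fieldnames=["Id", "RecResult"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```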
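
The cosine-similarity feature computed by vecdist_feature_fasttext.py can be illustrated with averaged word vectors. The toy 3-d embeddings below stand in for a trained fastText model (the real script loads fastText vectors and also computes Word Mover's Distance, which is not shown here):

```python
import numpy as np

# Toy embeddings standing in for fastText vectors; illustrative only.
emb = {
    "good": np.array([1.0, 0.2, 0.0]),
    "great": np.array([0.9, 0.3, 0.1]),
    "bad": np.array([-1.0, 0.1, 0.0]),
}

def sent_vec(tokens):
    """Average the word vectors of in-vocabulary tokens."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cos_sim(a, b):
    """Cosine similarity between two sentence vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

s1 = cos_sim(sent_vec(["good"]), sent_vec(["great"]))
s2 = cos_sim(sent_vec(["good"]), sent_vec(["bad"]))
print(s1, s2)  # the near-synonym pair scores higher than the antonym pair
```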
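
The probability-cutoff tuning in eval_model.py amounts to sweeping candidate thresholds on the model's accept probabilities and keeping the one that maximizes the evaluation metric. A minimal sketch, with plain accuracy standing in for the challenge's D score (the actual metric and grid are assumptions):

```python
import numpy as np

def tune_cutoff(probs, labels, metric, grid=None):
    """Sweep candidate probability cutoffs and return the one maximizing
    metric(labels, preds); metric stands in for the D-score evaluation."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    best_t, best_s = 0.5, -np.inf
    for t in grid:
        preds = (probs >= t).astype(int)  # 1 = accept, 0 = reject
        s = metric(labels, preds)
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s

def accuracy(y, p):
    return float(np.mean(y == p))

probs = np.array([0.1, 0.4, 0.6, 0.9])   # model's accept probabilities
labels = np.array([0, 0, 1, 1])          # gold accept/reject labels
cutoff, score = tune_cutoff(probs, labels, accuracy)
```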