
gamechanger-ml's Introduction

GC - Machine Learning

Table of Contents

  1. Directory
  2. Development Rules
  3. Getting Started
  4. Train Models
  5. ML API
  6. FAQ

Directory

├── gamechangerml
│   ├── api
│   │   ├── fastapi
│   │   │   └── routers
│   │   ├── kube
│   │   │   └── gc-ml-workflow
│   │   │       ├── charts
│   │   │       └── templates
│   │   ├── logs
│   │   ├── tests
│   │   └── utils
│   ├── configs
│   ├── corpus
│   ├── data
│   │   ├── agencies
│   │   ├── nltk_data
│   │   │   └── tokenizers
│   │   │       └── punkt
│   │   ├── test_data
│   │   └── test_output
│   ├── experimental
│   │   └── notebooks
│   │       ├── evaluation
│   │       │   ├── ablation_inputs
│   │       │   ├── ablation_outputs
│   │       │   ├── assets
│   │       │   ├── eval_folder
│   │       │   └── msmarco_1k
│   │       ├── portion_marking_demo
│   │       └── sentence-transformer
│   │           ├── sample_corpus
│   │           └── sample_index
│   ├── mlflow
│   ├── models
│   │   ├── sent_index_20210728
│   │   ├── topic_models
│   │   └── transformers
│   │       ├── bert-base-cased-squad2
│   │       ├── distilbart-mnli-12-3
│   │       ├── distilbert-base-uncased-distilled-squad
│   │       ├── distilroberta-base
│   │       └── msmarco-distilbert-base-v2
│   ├── scripts
│   │   └── topic_model
│   ├── src
│   │   ├── featurization
│   │   │   ├── data
│   │   │   ├── extract_improvement
│   │   │   ├── keywords
│   │   │   │   └── qe_mlm
│   │   │   │       ├── example
│   │   │   │       └── tests
│   │   │   ├── term_extract
│   │   │   └── tests
│   │   ├── model_testing
│   │   ├── search
│   │   │   ├── QA
│   │   │   ├── embed_reader
│   │   │   │   ├── examples
│   │   │   │   ├── schema_example
│   │   │   │   └── test
│   │   │   ├── evaluation
│   │   │   │   ├── sample_data
│   │   │   │   └── tests
│   │   │   ├── query_expansion
│   │   │   │   ├── aux_data
│   │   │   │   ├── build_ann_cli
│   │   │   │   └── tests
│   │   │   ├── ranking
│   │   │   │   ├── generated_files
│   │   │   │   └── tests
│   │   │   ├── semantic
│   │   │   └── sent_transformer
│   │   │       └── tests
│   │   ├── text_classif
│   │   │   ├── cli
│   │   │   ├── examples
│   │   │   ├── tests
│   │   │   └── utils
│   │   ├── text_handling
│   │   │   └── assets
│   │   └── utilities
│   │       └── numpy_encoder
│   │           └── tests
│   ├── stresstest
│   ├── train
│   │   └── scripts
│   └── unittest
└── unittest

127 directories

Development Rules

  • Everything in gamechangerml/src should be independent of things outside of that structure (should not need to import from dataPipeline, common, etc).
  • Wherever possible, code should be modular, broken down into the smallest logical pieces, and placed in the most logical subfolder.
  • Include a README.md file and/or example scripts demonstrating the functionality of your code.
  • Models and other large files should not be stored on GitHub.
  • Data should not be stored on GitHub. There is a script in the gamechangerml/scripts folder to download a corpus from S3.
  • File paths in gamechangerml/configs config files should be relative to gamechangerml and only used for local testing purposes (feel free to change them on your local machine, but do not commit system-specific paths to the repo).
  • A config should not be required as an input parameter to a function; however a config can be used to provide parameters to a function (foo(path=Config.path), rather than foo(Config)).
  • If a config is used for a piece of code (such as training a model), the config should be placed in the relevant section of the repo (dataPipeline, api, etc.) and should clearly designate which environment the config is for (if relevant).
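The rule about configs as parameters can be illustrated with a minimal sketch. The names LocalConfig and count_corpus_files are hypothetical and do not come from the repo; the point is only that the function accepts a plain path, and the caller decides whether that path comes from a config.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LocalConfig:
    """Hypothetical config; paths are relative to gamechangerml."""
    corpus_dir: str = "corpus"
    model_dir: str = "models"


def count_corpus_files(corpus_dir: str, suffix: str = ".json") -> int:
    """Count files with the given suffix directly inside corpus_dir.

    The function takes a plain path, so it can be called and tested
    without constructing any config object.
    """
    return sum(1 for _ in Path(corpus_dir).glob(f"*{suffix}"))


# Preferred: the caller unpacks the config and passes only what is needed.
#   count_corpus_files(corpus_dir=LocalConfig().corpus_dir)
# Avoid: count_corpus_files(LocalConfig()), which ties the function to the config type.
```

Keeping the signature free of config types means code under gamechangerml/src stays importable on its own, which is what the first rule asks for.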

Getting Started

To use gamechangerml as a Python module:

  • pip install .
  • You should now be able to import gamechangerml anywhere that Python environment is available.

Train Models

  1. Set up your environment and make any changes to configs:
     • source ./gamechangerml/setup_env.sh DEV
  2. Ensure your AWS environment is set up (you have a default profile).
  3. Get dependencies:
     • source ./gamechangerml/scripts/download_dependencies.sh
  4. For query expansion:
     • python -m gamechangerml.train.scripts.run_train_models --flag {MODEL_NAME_SUFFIX} --saveremote {True or False} --model_dest {FILE_PATH_MODEL_OUTPUT} --corpus {CORPUS_DIR}
  5. For sentence embeddings:
     • python -m gamechangerml.train.scripts.create_embeddings -c {CORPUS LOCATION} --gpu True --em msmarco-distilbert-base-v2
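If you launch query expansion training from a script rather than typing the command, a small helper along these lines may be convenient. This is a sketch: build_train_command is a hypothetical name, the argument values are placeholders, and only the module path and flags shown above are taken from this README. It assumes the environment script has already been sourced in the shell that starts Python.

```python
import shlex
import sys


def build_train_command(flag, model_dest, corpus, save_remote=False):
    """Assemble the argv list for the query expansion training command."""
    return [
        sys.executable, "-m", "gamechangerml.train.scripts.run_train_models",
        "--flag", flag,
        "--saveremote", str(save_remote),  # rendered as "True" or "False"
        "--model_dest", model_dest,
        "--corpus", corpus,
    ]


# Placeholder values for illustration only.
cmd = build_train_command(
    flag="local_test",
    model_dest="gamechangerml/models",
    corpus="gamechangerml/corpus",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.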

ML API

  1. Set up your environment and make any changes to configs:
     • source ./gamechangerml/setup_env.sh DEV
  2. Ensure your AWS environment is set up (you have a default profile).
  3. Dependencies will be automatically downloaded and extracted.
  4. cd gamechangerml/api
  5. docker-compose build
  6. docker-compose up
  7. Visit localhost:5000/docs
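Loading models can take a while after the containers start, so the docs page may not respond immediately. The sketch below polls the docs URL until it answers. wait_for_docs is a hypothetical helper, not part of the repo, and the timeout values are arbitrary defaults.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_docs(url="http://localhost:5000/docs", timeout_s=60.0, interval_s=2.0):
    """Return True once url answers with HTTP 200, or False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset, or an error status: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_docs() after docker-compose up gives a simple readiness check. A False result may mean the API is still loading models or has crashed, so the container logs are the next place to look.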

FAQ

  • Do I need to train models to use the API?
    • No, you can use the pretrained models within the dependencies.
  • The API is crashing when trying to load the models.
    • Your machine likely does not have enough resources (RAM or CPU) to load all of the models. Try removing some models from the models folder.
  • Do I need a machine with a GPU?
    • No, but it will make training and inference faster.
  • What if I can't download the dependencies since I am external?
    • We are working on making models publicly available. In the meantime, you can download pretrained transformers from HuggingFace and place them in the models/transformers directory, which will enable some of the API's functionality. Even without any models, some functionality, such as text extraction, is still available.
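For external users, one possible way to fetch a pretrained transformer from the HuggingFace Hub is sketched below. It assumes the huggingface_hub package is installed and that the API can load a model from a plain copy of the Hub repository files. Neither assumption is confirmed by this README, and local_model_dir and fetch_transformer are hypothetical helper names.

```python
from pathlib import Path

# Assumed location, matching the directory tree at the top of this README.
DEFAULT_ROOT = "gamechangerml/models/transformers"


def local_model_dir(model_id: str, models_root: str = DEFAULT_ROOT) -> Path:
    """Map a Hub id such as "org/name" to <models_root>/name."""
    return Path(models_root) / model_id.split("/")[-1]


def fetch_transformer(model_id: str, models_root: str = DEFAULT_ROOT) -> Path:
    """Download every file of a Hub repository into the local models folder."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(model_id, models_root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=model_id, local_dir=str(target))
    return target
```

For example, fetch_transformer("distilroberta-base") would populate gamechangerml/models/transformers/distilroberta-base. Some of the other folders in the directory tree may live under an organization prefix on the Hub, so check the exact id before downloading.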

gamechanger-ml's People

Contributors

katerdowdy, rha930, cskiz, vctrstrm, smalagon15, jgrundy, cwren0110, brandonherzog, takao8, mishoe, jram930
