
code-kern-ai / automl-docker


CLI-based tool to automatically build ML models from training data into a servable Docker container

Home Page: https://www.kern.ai

License: Apache License 2.0

Python 94.75% Dockerfile 1.41% Shell 2.65% Batchfile 1.19%
auto-ml cli containerization docker fastapi natural-language-processing nlp python text-classification ui webservice

automl-docker's Introduction


๐Ÿณ automl-docker

This beginner-friendly CLI tool lets you create containerized machine learning models from your labeled texts in minutes. With it, you can:

  • Easily create a machine learning model
  • Create a docker container for the model
  • Connect to the container with an API
  • Use your model via a UI

Watch a tutorial on YouTube

Set-up & Installation

This repository uses various libraries, such as sklearn or our embedders library. In our guide, we use the clickbait dataset to illustrate how you can use 🐳 automl-docker. The dataset is small and easy to use; of course, you can use any other dataset instead.

Caution: Currently, the tool supports only .csv files, so please ensure that the dataset fits this requirement.

First, you can clone this repository to your local computer. Do this by typing:

$ git clone git@github.com:code-kern-ai/automl-docker.git

(If you are new to GitHub, you'll find a nice guide to cloning a repository here.)

After cloning the repository, install the necessary libraries with either pip or conda. Use one of the following commands, depending on whether you are using pip or conda:

$ pip install -r requirements.txt
$ conda install --file requirements.txt

Getting started

Once the requirements are installed, you are ready to go! In the first step of 🐳 automl-docker, you use a terminal to load your data, after which a machine learning model is created for you. To get going, start with the following command:

$ python3 ml/create_model.py

Once the script is started, you will be prompted to set a path to the data location on your system. Currently, only the .csv data format is usable in the tool. More data formats will follow soon!

On Mac or Linux, the path might look like this:

/home/user/data/training_data.csv

On Windows, the path might look something like this:

C:\\Users\\yourname\\data\\training_data.csv

You can also use relative paths from the directory you're currently in.

Next, you need to input the name of the columns where the training data and the labels are stored. In our example, you can use headline as the column for the inputs and label as the column for the labels.

headline    label
example1    Regular
example2    Clickbait
example3    Regular
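
Before starting the script, it can help to confirm that your file really contains the two columns you plan to enter. The following is a minimal sketch using only Python's standard library. The helper name missing_columns and the sample rows are made up for illustration and are not part of automl-docker.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]


# Illustrative in-memory data. For a real file, pass
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO("headline,label\nexample1,Regular\nexample2,Clickbait\n")
print(missing_columns(sample, ["headline", "label"]))  # prints []
```

An empty list means both columns were found; any names in the list are missing from the header row.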

Preprocessing the text data

To make text data usable for ML algorithms, it needs to be preprocessed. To ensure state-of-the-art machine learning, we use large, pre-trained transformers pulled from 🤗 Hugging Face for preprocessing:

  • distilbert-base-uncased -> Very accurate, state-of-the-art method, but slow (especially on large datasets). [ENG]
  • all-MiniLM-L6-v2 -> Faster, but still relatively accurate. [ENG]
  • Custom model -> Input your own model from 🤗 Hugging Face.

By choosing "Custom model", you can use any other model from Hugging Face. After you have chosen your model, the text data will be processed.
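
To make the preprocessing step more concrete, here is a rough sketch of how texts can be turned into embedding vectors. It assumes the separate sentence-transformers package rather than the embedders library this repository uses, so treat it as an illustration of the idea and not as the tool's actual code. The function names and the numeric menu keys are hypothetical.

```python
# Model names are taken from the list above; the numeric keys are hypothetical.
MODEL_CHOICES = {
    "1": "distilbert-base-uncased",
    "2": "all-MiniLM-L6-v2",
}


def resolve_model_name(choice, custom_name=None):
    """Map a menu choice to a Hugging Face model name."""
    if choice in MODEL_CHOICES:
        return MODEL_CHOICES[choice]
    if custom_name:
        return custom_name
    raise ValueError("Pick option 1 or 2, or provide a custom model name.")


def embed_texts(texts, model_name):
    """Encode texts into one fixed-size vector per text.

    Assumption: sentence-transformers is installed
    (pip install sentence-transformers). The import is done lazily
    so the rest of this file works without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # downloads the model on first use
    return model.encode(list(texts))
```

Under these assumptions, embed_texts(["Some headline"], resolve_model_name("2")) would return an array with one row per input text, which a classifier can then be trained on.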

Building the machine learning model 🚀

And that's it! Now it is time to grab a coffee, lean back and watch your model train on the data. A progress bar like the one below shows the training progress.

76%|████████████████████████        | 756/1000 [00:33<00:10, 229.00it/s]

After the training is done, we will automatically test your model and tell you how well it is doing!
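
To give an idea of what such a test can look like, the sketch below holds back part of the labeled data and measures accuracy on it. This is an assumption made for illustration: the README does not state which split or metric the tool uses, and both function names are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into a training part and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# One of two predictions is correct.
print(accuracy(["Regular", "Clickbait"], ["Regular", "Regular"]))  # prints 0.5
```

The model is trained only on the training part, so the accuracy on the test part estimates how well it handles texts it has not seen before.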

Creating a container with Docker

Now, all the components are ready and it's time to bring them all together. Building the container is super easy! Make sure that Docker Desktop is running on your machine. You can get it here. Next, run the following command:

$ bash start_container

Or, on Windows, run the following file:

$ start_container_windows.bat

Building the container can take a couple of minutes. The perfect opportunity to grab yet another cup of coffee! ☕
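
If the build fails right away, a common cause is that the Docker daemon is not running. The following minimal sketch checks this from Python by calling docker info, which exits with a non-zero status when the daemon cannot be reached. The function name docker_is_running is hypothetical and not part of this repository.

```python
import shutil
import subprocess


def docker_is_running():
    """Return True if the docker CLI exists and can reach the Docker daemon."""
    if shutil.which("docker") is None:
        return False  # docker is not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("Docker is running:", docker_is_running())
```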

Using the container

After the container has been built, you are free to use it anywhere you want. As an example of what you could do with the container, we will connect it to a graphical user interface that was built using the streamlit framework. If you have come this far, everything you need for that is already installed on your machine!

The user interface was designed to get predictions on single sentences of text. Our machine learning model was trained on the clickbait dataset to predict whether or not an article headline is clickbait. By the way: streamlit is very easy and fun to use, and the documentation is very helpful!

To start the streamlit application, simply use the following command:

$ streamlit run app/ui.py

And that's it! By default, the UI will automatically connect to the previously built container. You can test the UI on your local machine, usually by going to http://localhost:8501/.
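
You can also talk to the container directly over HTTP instead of going through the UI. The sketch below uses only Python's standard library. The URL, the port, the /predict route and the JSON payload shape are all placeholders chosen for illustration; check the code in this repository for the values the container actually uses.

```python
import json
import urllib.request

# Hypothetical values: look up the real port, route and payload in the repository.
API_URL = "http://localhost:8000/predict"


def build_request(texts, url=API_URL):
    """Build a JSON POST request carrying the texts to classify."""
    body = json.dumps({"text": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(texts, url=API_URL, timeout=10):
    """Send the texts to the running container and return the decoded JSON reply."""
    request = build_request(texts, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running and the placeholders adjusted, predict(["Some headline"]) would return whatever JSON the service sends back.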

Roadmap

  • Build basic CLI to capture the data
  • Build mappings for language data (e.g. EN -> ask for en_core_web_sm AND recommend using distilbert-base-uncased)
  • Implement AutoML for classification (training, validation and storage of model)
  • Implement AutoML for NER (training, validation and storage of model)
  • Wrap instructions for build in a Dockerfile
  • Add sample projects (clickbait dataset) and publish them in some posts
  • Publish the repository and set up new roadmap

If you want to have something added, feel free to open an issue.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

And please don't forget to leave a ⭐ if you like the work!

License

Distributed under the Apache 2.0 License. See LICENSE.txt for more information.

Contact

This library is developed and maintained by kern.ai. If you want to provide us with feedback or have some questions, don't hesitate to contact us. We're super happy to help. ✌️

Contributors

jhoetter, leonardpuettmann
