
ALaaS: Active Learning as a Service.


Active Learning as a Service (ALaaS) is a fast and scalable framework for automatically selecting a subset of a full dataset to be labeled, so as to reduce labeling cost. It provides an out-of-the-box, standalone experience for users to quickly utilize active learning.

ALaaS is featured for

  • 🐣 Easy-to-use With <10 lines of code to start the system and employ active learning.
  • 🚀 Fast Stage-level parallelism achieves over 10x speedup compared with an under-optimized active learning process.
  • 💥 Elastic Scale multiple active workers up and down depending on the number of GPU devices.

The project is still under active development. Welcome to join us!

Demo on AWS ☕

Free ALaaS demo on AWS (supports HTTP & gRPC)

Use least confidence sampling with ResNet-18 to select images to be labeled for your tasks!

We have deployed ALaaS on AWS for demonstration. Try it by yourself!

Call ALaaS with HTTP 🌐

curl \
-X POST http://13.213.29.8:8081/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}],
    "parameters": {"budget": 3},
    "execEndpoint":"/query"}'

Call ALaaS with gRPC 🔐
# pip install alaas
from alaas.client import Client

url_list = [
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('grpc://13.213.29.8:60035')
print(client.query_by_uri(url_list, budget=3))

Then you will see that 3 data samples (the most informative ones) have been selected from the 5 data points by ALaaS.

Installation 🚧

You can easily install ALaaS from PyPI:

pip install alaas

The ALaaS package contains both the client and the server. You can build an active data selection service on your own servers, or just use the client to perform data selection.

⚠️ Deep learning frameworks such as TensorFlow and PyTorch need to be installed manually, since the version required by your deployment may differ (the same applies to transformers if you are running models from it).

You can also use Docker to run ALaaS:

docker pull huangyz0918/alaas

and start a service with the following command:

docker run -it --rm -p 8081:8081 \
        --mount type=bind,source=<config path>,target=/server/config.yml,readonly huangyz0918/alaas:latest

Quick Start 🚚

After installing ALaaS, you can easily start a local server. Here is the simplest example, which needs only 2 lines of code:

from alaas.server import Server

Server.start()

By default, the example code will start an HTTP server on port 8081 for image data selection (a PyTorch ResNet-18 for the image classification task). After that, you can get the selection results on your own image dataset; a client-side example looks like this:

curl \
-X POST http://0.0.0.0:8081/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}], 
    "parameters": {"budget": 3},
    "execEndpoint":"/query"}'

You can also use alaas.Client to build the query request (for both HTTP and gRPC protocols) like this:

from alaas.client import Client

url_list = [
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('http://0.0.0.0:8081')
print(client.query_by_uri(url_list, budget=3))

The output is a subset of the URIs/data in your input dataset, indicating the samples selected for further labeling.

ALaaS Server Customization 🔧

We support two different methods to start your server: 1. by input parameters; 2. by a YAML configuration.

Input Parameters

You can customize your server by setting different input parameters:

from alaas.server import Server

Server.start(proto='http',                      # the server proto, can be 'grpc', 'http' and 'https'.
    port=8081,                                  # the access port of your server.
    host='0.0.0.0',                             # the access IP address of your server.
    job_name='default_app',                     # the server name.
    model_hub='pytorch/vision:v0.10.0',         # the active learning model hub, the server will automatically download it for data selection.
    model_name='resnet18',                      # the active learning model name (should be available in your model hub).
    device='cpu',                               # the deploy location/device (can be something like 'cpu', 'cuda' or 'cuda:0'). 
    strategy='LeastConfidence',                 # the selection strategy (read the document to see what ALaaS supports).
    batch_size=1,                               # the batch size of data processing.
    replica=1,                                  # the number of workers to select/query data.
    tokenizer=None,                             # the tokenizer name (should be available in your model hub), only for NLP tasks.
    transformers_task=None                      # the NLP task name (for Hugging Face [Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines)), only for NLP tasks.
)

YAML Configuration

You can also start the server from an input YAML configuration file like this:

from alaas import Server

# start the server by an input configuration file.
Server.start_by_config('path_to_your_configuration.yml')

Details about building a configuration for your deployment scenario can be found here.
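For orientation, a minimal configuration might look like the sketch below. The field names mirror the example configuration pasted in the issues on this page; treat the exact schema as an assumption and consult the documentation linked above for the authoritative format.

```yaml
# Sketch of an ALaaS YAML configuration (assumed schema, inferred from
# an example config on this page -- not the authoritative reference).
name: "default_app"
version: 0.1
active_learning:
  strategy:
    type: "LeastConfidence"          # selection strategy
    model:
      name: "resnet18"               # model name available in the hub
      hub: "pytorch/vision:v0.10.0"
      device: "cpu"
      batch_size: 1
  al_worker:
    protocol: "http"
    host: "0.0.0.0"
    port: 8081
    replicas: 1
```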

Strategy Zoo 🎨

Currently we support the active learning strategies shown in the following table:

Type Setting Abbr Strategy Year Reference
Random Pool-based RS Random Sampling - -
Uncertainty Pool LC Least Confidence Sampling 1994 DD Lew et al.
Uncertainty Pool MC Margin Confidence Sampling 2001 T Scheffer et al.
Uncertainty Pool RC Ratio Confidence Sampling 2009 B Settles et al.
Uncertainty Pool VRC Variation Ratios Sampling 1965 EH Johnson et al.
Uncertainty Pool ES Entropy Sampling 2009 B Settles et al.
Uncertainty Pool MSTD Mean Standard Deviation 2016 M Kampffmeyer et al.
Uncertainty Pool BALD Bayesian Active Learning Disagreement 2017 Y Gal et al.
Clustering Pool KCG K-Center Greedy Sampling 2017 Ozan Sener et al.
Clustering Pool KM K-Means Sampling 2011 Z Bodó et al.
Clustering Pool CS Core-Set Selection Approach 2018 Ozan Sener et al.
Diversity Pool DBAL Diverse Mini-batch Sampling 2019 Fedor Zhdanov
Adversarial Pool DFAL DeepFool Active Learning 2018 M Ducoffe et al.
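To make the uncertainty-based rows above concrete, here is a small self-contained sketch (illustrative only, not ALaaS internals) of the least confidence, margin confidence, and entropy scores computed from predicted class probabilities, followed by a budget-constrained selection:

```python
import math

def least_confidence(probs):
    # LC: 1 - max probability; low-confidence predictions score high.
    return 1.0 - max(probs)

def margin_confidence(probs):
    # MC: a small gap between the top-2 classes means high uncertainty,
    # so negate the margin to make higher = more informative.
    top2 = sorted(probs, reverse=True)[:2]
    return -(top2[0] - top2[1])

def entropy(probs):
    # ES: Shannon entropy of the predicted distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select(pool, budget, score):
    # Rank the unlabeled pool by a score function and keep `budget` samples.
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)
    return sorted(ranked[:budget])

pool = [
    [0.98, 0.01, 0.01],   # confident -> unlikely to be selected
    [0.40, 0.35, 0.25],   # uncertain -> likely to be selected
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],   # near uniform -> most uncertain
]
print(select(pool, budget=2, score=least_confidence))  # → [1, 3]
print(select(pool, budget=2, score=entropy))           # → [1, 3]
```

In ALaaS these scores are computed server-side over model predictions; the `strategy` parameter simply names which ranking rule is applied before the budget cut.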

Citation

Our tech report of ALaaS is available on arXiv and at NeurIPS 2022. Please cite it as:

@article{huang2022active,
  title={Active-Learning-as-a-Service: An Efficient MLOps System for Data-Centric AI},
  author={Huang, Yizheng and Zhang, Huaizheng and Li, Yuanming and Lau, Chiew Tong and You, Yang},
  journal={arXiv preprint arXiv:2207.09109},
  year={2022}
}

Contributors ✨

Thanks goes to these wonderful people (emoji key):


Yizheng Huang

🚇 ⚠️ 💻

Huaizheng

🖋 ⚠️ 📖

Yuanming Li

⚠️ 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Acknowledgement

  • Jina - Build cross-modal and multimodal applications on the cloud.
  • Transformers - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

License

ALaaS is available as open source under the terms of the Apache 2.0 License.


Issues

Serve source indexes of the queried data

Hello!
It would be great if the Client could return the source indices of the queried data. Jina's Document object includes a parent_id parameter; maybe it could be used?
I don't get why this is not part of the service functionality.

What plans do you have for developing the library in the near future?

need the docker file


Missing model confidence values for NER model prediction result in error

For random sampling, I have skipped the model loading in the [latest commit]
For NER tasks, the server can be started after the latest update. However, due to the nature of the task, the transformer model sometimes returns an empty list when no entity is detected. In such cases, the AL strategy has no values to rank. It is an interesting issue, since currently our system relies only on the scores given by the Hugging Face models.

If you have any ideas about that, feel free to discuss them here.

Originally posted by @huangyz0918 in #39 (comment)

integrate with huggingface evaluate


Model Updater

RT, require APIs and functions to update the online model.

transformers NER model service start failure

Running the service with the following config returns a weird error. Services with other huggingface models for the NER task end with the same result.

name: "default_app"
version: 0.1
active_learning:
  strategy:
    type: "RandomSampling"
    model:
      name: "dslim/bert-base-NER"
      hub: "huggingface"
      model: "bert-base-NER"
      tokenizer: "dslim/bert-base-NER"
      transformers_task: "ner"
      batch_size: 1
      device: "cpu"
  al_worker:
    protocol: "http"
    host: "0.0.0.0"
    port: 8081
    replicas: 1
  Waiting default_app... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/1 0:00:02
CRITI… default_app/rep-0@2592 can not load the executor from TorchALWorker                                                                                                 [12/29/22 19:05:32]
ERROR  default_app/rep-0@2592 TypeError("_sanitize_parameters() got an unexpected keyword argument 'return_all_scores'") during <class                                     [12/29/22 19:05:32]
       'jina.serve.runtimes.worker.WorkerRuntime'> initialization
        add "--quiet-error" to suppress the exception details
       Traceback (most recent call last):
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\orchestrate\pods\__init__.py", line 79, in run
           runtime = runtime_cls(
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\serve\runtimes\worker\__init__.py", line 39, in __init__
           super().__init__(args, **kwargs)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\serve\runtimes\asyncio.py", line 77, in __init__
           self._loop.run_until_complete(self.async_setup())
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\asyncio\base_events.py", line 616, in run_until_complete
           return future.result()
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\serve\runtimes\worker\__init__.py", line 104, in async_setup
           self._request_handler = WorkerRequestHandler(
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\serve\runtimes\worker\request_handling.py", line 54, in __init__
           self._load_executor(
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\serve\runtimes\worker\request_handling.py", line 204, in _load_executor
           self._executor: BaseExecutor = BaseExecutor.load_config(
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\jaml\__init__.py", line 766, in load_config
           obj = JAML.load(tag_yml, substitute=False, runtime_args=runtime_args)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\jaml\__init__.py", line 174, in load
           r = yaml.load(stream, Loader=get_jina_loader_with_runtime(runtime_args))
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\yaml\__init__.py", line 81, in load
           return loader.get_single_data()
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\yaml\constructor.py", line 51, in get_single_data
           return self.construct_document(node)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\yaml\constructor.py", line 55, in construct_document
           data = self.construct_object(node)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\yaml\constructor.py", line 100, in construct_object
           data = constructor(self, node)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\jaml\__init__.py", line 582, in _from_yaml
           return get_parser(cls, version=data.get('version', None)).parse(
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\jaml\parsers\executor\legacy.py", line 46, in parse
           obj = cls(
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\serve\executors\decorators.py", line 60, in arg_wrapper
           f = func(self, *args, **kwargs)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\serve\helper.py", line 73, in arg_wrapper
           f = func(self, *args, **kwargs)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\alaas\server\executors\al_torch.py", line 104, in __init__
           self._model = pipeline(self._task,
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\transformers\pipelines\__init__.py", line 870, in pipeline
           return pipeline_class(model=model, framework=framework, task=task, **kwargs)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\transformers\pipelines\token_classification.py", line 126, in __init__
           super().__init__(*args, **kwargs)
         File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\transformers\pipelines\base.py", line 788, in __init__
           self._preprocess_params, self._forward_params, self._postprocess_params = self._sanitize_parameters(**kwargs)
       TypeError: _sanitize_parameters() got an unexpected keyword argument 'return_all_scores'
ERROR  Flow@16320 Flow is aborted due to ['default_app'] can not be started.                                                                                               [12/29/22 19:05:32]
WARNI… gateway/rep-0@16320 Pod was forced to close after 1 second. Graceful closing is not available on Windows.                                                           [12/29/22 19:05:33]
Traceback (most recent call last):                                                                                                                                                            
  File ".\server.py", line 9, in <module>
    main()
  File ".\server.py", line 5, in main
    Server.start_by_config('al-server.yml')
  File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\alaas\server\server.py", line 67, in start_by_config
    Flow(protocol=_proto, port=_port, host=_host) \
  File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\orchestrate\flow\builder.py", line 33, in arg_wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\orchestrate\flow\base.py", line 1782, in start
    self._wait_until_all_ready()
  File "C:\Users\jedrz\anaconda3\envs\active_learner\lib\site-packages\jina\orchestrate\flow\base.py", line 1913, in _wait_until_all_ready
    raise RuntimeFailToStart
jina.excepts.RuntimeFailToStart

AlaaS version: 0.2.0
System OS version: Win 11 22H2

Btw, does the model really need to be initialized in the service with the random sampling strategy?
