authors: Yao-Yuan Yang, Shao-Chuan Lee, Yu-An Chung, Tung-En Wu, Si-An Chen, Hsuan-Tien Lin
libact is a Python package designed to make active learning easier for real-world users. The package not only implements several popular active learning strategies, but also features the active-learning-by-learning meta-algorithm, which helps users automatically select the best strategy on the fly. Furthermore, the package provides a unified interface for implementing more strategies, models, and application-specific labelers. The package is open source, with an issue tracker on GitHub, and can be easily installed from the Python Package Index (PyPI).
- Python 2.7, 3.3, 3.4, 3.5
- Python dependencies
pip install -r requirements.txt
- Debian (>= 7) / Ubuntu (>= 14.04)
sudo apt-get install build-essential gfortran libatlas-base-dev liblapacke-dev python3-dev
- macOS
brew install homebrew/science/openblas
After resolving the dependencies, you may install the package via pip (for all users):
sudo pip install libact
or install with pip into your home directory:
pip install --user libact
or install the latest source from the GitHub repository with pip:
pip install git+https://github.com/ntucllab/libact.git
To build and install from source in your home directory:
python setup.py install --user
To build and install from source for all users on Unix/Linux:
python setup.py build
sudo python setup.py install
The main usage of libact is as follows:
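from libact.query_strategies import UncertaintySampling
# trn_ds (the dataset) and lbr (the labeler instance) are assumed to be created beforehand
qs = UncertaintySampling(trn_ds, method='lc') # query strategy instance
ask_id = qs.make_query() # let the query strategy suggest which example to query
X, y = zip(*trn_ds.data) # split the dataset into features and labels
lb = lbr.label(X[ask_id]) # ask the labeler instance for the label of that example
trn_ds.update(ask_id, lb) # update the dataset with the newly queried label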
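To make the query, label, and update steps above concrete without requiring an installation, here is a minimal, self-contained sketch of the same cycle run in a loop. ToyDataset, ToyLabeler, and ToyLeastConfident are hypothetical stand-ins written for this illustration. They mimic only the method names used above (make_query, label, update) and are not libact's implementation. The toy strategy assumes 1-D features, binary labels, and a nearest-centroid model.

```python
import math


class ToyDataset:
    """Hypothetical stand-in: (feature, label) pairs, label None if unlabeled."""

    def __init__(self, X, y):
        self.data = list(zip(X, y))

    def update(self, entry_id, label):
        # Attach a newly queried label to an existing entry.
        feature, _ = self.data[entry_id]
        self.data[entry_id] = (feature, label)


class ToyLabeler:
    """Hypothetical oracle: the true label is 1 when the feature is >= 0.5."""

    def label(self, feature):
        return 1 if feature >= 0.5 else 0


class ToyLeastConfident:
    """Hypothetical least-confident strategy over a nearest-centroid model."""

    def __init__(self, dataset):
        self.dataset = dataset

    def _confidence(self, feature, centroids):
        # Map the distance gap to P(label = 1), then take the top-class probability.
        gap = abs(feature - centroids[0]) - abs(feature - centroids[1])
        p_one = 1.0 / (1.0 + math.exp(-gap))
        return max(p_one, 1.0 - p_one)

    def make_query(self):
        # Refit the toy model: one centroid per class from the labeled entries.
        centroids = {}
        for cls in (0, 1):
            members = [x for x, lb in self.dataset.data if lb == cls]
            centroids[cls] = sum(members) / len(members)
        unlabeled = [i for i, (_, lb) in enumerate(self.dataset.data) if lb is None]
        # Ask for the unlabeled entry whose predicted class is least certain.
        return min(
            unlabeled,
            key=lambda i: self._confidence(self.dataset.data[i][0], centroids),
        )


X = [0.0, 0.1, 0.2, 0.3, 0.42, 0.6, 0.7, 0.8, 0.9, 1.0]
y = [0] + [None] * 8 + [1]  # only the two endpoints start out labeled
trn_ds = ToyDataset(X, y)
qs = ToyLeastConfident(trn_ds)
lbr = ToyLabeler()

queried = []
for _ in range(3):  # query budget (an arbitrary choice for this sketch)
    ask_id = qs.make_query()
    features, _ = zip(*trn_ds.data)
    lb = lbr.label(features[ask_id])
    trn_ds.update(ask_id, lb)
    queried.append(ask_id)

print(queried)
```

In this sketch the queried entries cluster around the current decision boundary, which is the behavior least-confident sampling is meant to produce.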
Some examples are available under the examples directory. Before running, use examples/get_dataset.py to retrieve the dataset used by the examples.
Available examples:
- plot: This example shows the basic usage of libact. It splits a fully labeled dataset and removes some of the labels to simulate the pool-based active learning scenario. Each query of an unlabeled example is then equivalent to revealing one labeled example in the original dataset.
- label_digits: This example shows how to use libact when you want a human to label the selected sample for your algorithm.
- albl_plot: This example compares the performance of ALBL with other active learning algorithms.
- multilabel_plot: This example compares the performance of algorithms under the multilabel setting.
- alce_plot: This example compares the performance of algorithms under the cost-sensitive multi-class setting.
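The simulation idea behind the plot example (split a fully labeled dataset, hide most training labels, and treat each query as revealing a hidden label) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the code in the examples directory: simulate_pool and its parameters n_labeled, test_size, and seed are hypothetical names, and the synthetic data is made up for the demonstration.

```python
import random


def simulate_pool(X, y, n_labeled, test_size, seed=0):
    """Split a fully labeled dataset and hide most of the training labels.

    Returns (train_X, train_y_partial, train_y_true, test_X, test_y), where
    train_y_partial holds None for every example whose label is hidden.
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    test_ids, train_ids = order[:test_size], order[test_size:]

    train_X = [X[i] for i in train_ids]
    train_y_true = [y[i] for i in train_ids]
    # Keep the first n_labeled training labels and hide the rest.
    hidden = len(train_ids) - n_labeled
    train_y_partial = train_y_true[:n_labeled] + [None] * hidden

    test_X = [X[i] for i in test_ids]
    test_y = [y[i] for i in test_ids]
    return train_X, train_y_partial, train_y_true, test_X, test_y


# Synthetic, fully labeled data: 20 one-dimensional points with alternating labels.
X = [[float(i)] for i in range(20)]
y = [i % 2 for i in range(20)]

train_X, train_y_partial, train_y_true, test_X, test_y = simulate_pool(
    X, y, n_labeled=4, test_size=5
)

# "Querying" an unlabeled entry simply reveals the label that was hidden.
ask_id = 10
train_y_partial[ask_id] = train_y_true[ask_id]
```

Note that this sketch does not check that the initially labeled examples cover every class. A real experiment would usually want that guarantee before training the first model.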
Documentation for the latest release is available online.
Comments and questions on the package are welcome at [email protected]. All contributions to the documentation are greatly appreciated!
To run the test suite:
python setup.py test
To run pylint, install pylint through pip install pylint and run the following command in the root directory:
pylint libact
To measure the test code coverage, install coverage through pip install coverage and run the following commands in the root directory:
coverage run --source libact --omit '*/tests/*' setup.py test
coverage report
If you find this package useful, please cite the original works (see the Reference of each strategy) as well as the following (temporarily):
@TechReport{libact,
author = {Yao-Yuan Yang and Shao-Chuan Lee and Yu-An Chung and Tung-En Wu and Si-An Chen and Hsuan-Tien Lin},
title = {libact: Pool-based Active Learning in Python},
url = {https://github.com/ntucllab/libact},
year = {2015}
}
The authors thank Chih-Wei Chang and other members of the Computational Learning Lab at National Taiwan University for valuable discussions and various contributions to making this package better.