ntucllab / libact
Pool-based active learning in Python
Home Page: http://libact.readthedocs.org/
License: BSD 2-Clause "Simplified" License
To reduce redundancy in batch-mode active learning, one suggestion is to incorporate diversity during training, as described in the following paper:
Incorporating diversity in active learning
The algorithm is spelled out in the paper and is not complicated to implement.
If the feature vectors are passed as an np.array, calling the format_sklearn() method returns a 3-dimensional array for the features.
The build fails on the RTD server because dependencies (numpy) won't build there. We should look for a workaround.
It seems the current interfaces could support multi-label problems without too many changes?
Possible algorithms to implement:
I think we should move the current get_dataset.py into a utility module like scikit-learn's datasets package:
http://scikit-learn.org/stable/datasets/
That way it would be easier to write examples with sphinx-gallery.
What do you think?
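For illustration, a sklearn-style loader for libact examples might look like the following; the function name and module layout are hypothetical, purely to show the convention being proposed:

```python
import numpy as np

# Hypothetical sketch of a sklearn-style dataset utility; the name
# load_toy_dataset is illustrative only, not libact's actual API.
def load_toy_dataset(n_samples=20, seed=1126):
    """Return (X, y), mirroring scikit-learn's load_* convention."""
    rng = np.random.RandomState(seed)
    X = rng.randn(n_samples, 2)
    # simple linear labeling rule, deterministic given the seed
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

X, y = load_toy_dataset()
```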
@iamyuanchung @hsuantien
https://github.com/ntucllab/libact/blob/master/libact/labelers/ideal_labeler.py#L28
This line may have to be changed to
return self.y[np.where([np.array_equal(x, feature) for x in self.X])[0][0]]
when using numpy 1.11.0b3
Maybe caused by this?
numpy/numpy#6155
Give example usage in each module's docstring.
setuptools supports more features, such as declaring dependencies; this also prepares for PyPI submission.
For QSs that rely on a user-given model, a type check should be performed, since different QSs require different capabilities (e.g. UncertaintySampling requires a ContinuousModel).
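A minimal sketch of such a check, using stand-in classes in place of libact's actual interfaces to keep the example self-contained:

```python
class Model:
    """Stand-in for libact's base Model interface."""

class ContinuousModel(Model):
    """Stand-in for libact's ContinuousModel (adds predict_real)."""

class UncertaintySampling:
    """Sketch: validate the model's capability at construction time."""
    def __init__(self, model):
        # Fail fast with a clear message instead of a confusing
        # AttributeError later, when predict_real is first needed.
        if not isinstance(model, ContinuousModel):
            raise TypeError(
                "UncertaintySampling requires a ContinuousModel, got %s"
                % type(model).__name__)
        self.model = model
```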
Since we use scikit-learn models a lot, we should define an adapter from scikit-learn models to libact models.
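A minimal sketch of what such an adapter could look like; the method names mirror libact's Model/ContinuousModel interface, but the Dataset argument is replaced by a plain (X, y) pair to keep the example self-contained:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class SklearnProbaAdapter:
    """Wrap a scikit-learn classifier behind a libact-style interface.

    Sketch only: libact's Model.train takes a Dataset; here a plain
    (X, y) tuple stands in for it.
    """
    def __init__(self, clf):
        self._clf = clf

    def train(self, dataset):
        X, y = dataset
        self._clf.fit(X, y)

    def predict(self, X):
        return self._clf.predict(X)

    def predict_real(self, X):
        # ContinuousModel-style real-valued output: class probabilities.
        return self._clf.predict_proba(X)

X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([0, 0, 1, 1])
model = SklearnProbaAdapter(LogisticRegression())
model.train((X, y))
probs = model.predict_real(X)   # shape (4, 2); each row sums to 1
```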
Python 3.5 seems to import everything before running unit tests; the _variance_reduction native extension is built and installed, but the import fails:
ImportError: Failed to import test module: libact.query_strategies
Traceback (most recent call last):
File "/opt/python/3.5.0/lib/python3.5/unittest/loader.py", line 462, in _find_test_path
package = self._get_module_from_name(name)
File "/opt/python/3.5.0/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/travis/build/ntucllab/libact/libact/query_strategies/__init__.py", line 16, in <module>
from .variance_reduction import VarianceReduction
File "/home/travis/build/ntucllab/libact/libact/query_strategies/variance_reduction.py", line 11, in <module>
from libact.query_strategies import _variance_reduction
ImportError: cannot import name '_variance_reduction'
Build/install log of extension:
running build_ext
building 'libact.query_strategies._variance_reduction' extension
C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/libact
creating build/temp.linux-x86_64-3.5/libact/query_strategies
compile options: '-I/home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/numpy/core/include -I/opt/python/3.5.0/include/python3.5m -c'
extra options: '-std=c11'
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
gcc: libact/query_strategies/variance_reduction.c
gcc -pthread -shared -L/opt/python/3.5.0/lib -Wl,-rpath=/opt/python/3.5.0/lib build/temp.linux-x86_64-3.5/libact/query_strategies/variance_reduction.o -L/opt/python/3.5.0/lib -lpython3.5m -o build/lib.linux-x86_64-3.5/libact/query_strategies/_variance_reduction.cpython-35m-x86_64-linux-gnu.so -llapacke -llapack -lblas
running install_lib
creating /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact
creating /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact/query_strategies
copying build/lib.linux-x86_64-3.5/libact/query_strategies/_variance_reduction.cpython-35m-x86_64-linux-gnu.so -> /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact/query_strategies
For now, the unit tests for active learning algorithms use results on real-world data with fixed random seeds. If a future modification to these algorithms conflicts with the current tests, it should be handled carefully.
The rigorous way to test is to design artificial datasets. We'll leave that as a future development goal.
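As an illustration of the artificial-dataset idea, the test below checks a property that holds by construction rather than a seeded numeric result; the uncertainty score here is a stand-in, not libact's actual implementation:

```python
import numpy as np

def most_uncertain(X, center0, center1):
    """Stand-in uncertainty score: a point is maximally ambiguous when
    it is equidistant from the two class centers."""
    d = np.abs(np.linalg.norm(X - center0, axis=1)
               - np.linalg.norm(X - center1, axis=1))
    return int(np.argmin(d))

# Dataset built so the answer is known by construction, not by seed:
# index 3 sits exactly halfway between the two centers.
X = np.array([[0., 0.], [0.2, 0.], [5., 0.], [2.5, 0.]])
idx = most_uncertain(X, np.array([0., 0.]), np.array([5., 0.]))
```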
Hello, thank you for providing this project.
After installing the dependencies, I run
python setup.py install
but I get some errors:
Platform Detection: Linux. Link to liblapacke...
running install
running build
running build_py
running build_ext
building 'libact.query_strategies._variance_reduction' extension
C compiler: x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC
compile options: '-I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/lapacke -I/usr/include/python2.7 -c'
extra options: '-std=c11'
x86_64-linux-gnu-gcc: libact/query_strategies/src/variance_reduction/variance_reduction.c
libact/query_strategies/src/variance_reduction/variance_reduction.c:26:15: error: variable ‘moduledef’ has initializer but incomplete type
static struct PyModuleDef moduledef = {
^
libact/query_strategies/src/variance_reduction/variance_reduction.c:27:5: error: ‘PyModuleDef_HEAD_INIT’ undeclared here (not in a function)
PyModuleDef_HEAD_INIT,
^
...
I wondered if I needed to specify the Python version, so I tried
python3 setup.py install
Still, I cannot install successfully, but the error changes:
File "setup.py", line 13, in
from Cython.Build import cythonize
ImportError: No module named 'Cython'
However, I have already installed Cython using "pip install Cython".
It would be very kind of you to tell me the required versions of the dependencies,
or how to modify the "-I/usr/include/lapacke -I/usr/include/python2.7" compile options.
Many thanks
Hi,
I am trying to use the HintSVM query strategy with the vehicle dataset from mldata.
However, I don't understand why I get the following error:
File "testing.py", line 60, in run
ask_id = qs.make_query()
File "/usr/local/lib/python3.5/site-packages/libact-0.1.2-py3.5-macosx-10.12-x86_64.egg/libact/query_strategies/hintsvm.py", line 151, in make_query
np.array([x.tolist() for x in unlabeled_pool]), self.svm_params)
File "libact/query_strategies/_hintsvm.pyx", line 16, in libact.query_strategies._hintsvm.hintsvm_query (libact/query_strategies/_hintsvm.c:1836)
ValueError: Buffer dtype mismatch, expected 'float64_t' but got 'long'
I don't have this error when I use other strategies (UncertaintySampling, QUIRE).
def split_scale_train_test(name_dataset, test_size):
    # choose a dataset with unbalanced class instances
    # data = sklearn.datasets.fetch_mldata('segment')
    data = sklearn.datasets.fetch_mldata(name_dataset)
    X = StandardScaler().fit_transform(data['data'])
    target = np.unique(data['target'])
    # map the targets to 0 .. n_classes-1
    y = np.array([np.where(target == i)[0][0] for i in data['target']])
    X_trn, X_tst, y_trn, y_tst = \
        train_test_split(X, y, test_size=test_size, stratify=y)
    # make sure each class appears once initially
    init_y_ind = np.array(
        [np.where(y_trn == i)[0][0] for i in range(len(target))])
    y_ind = np.array([i for i in range(len(X_trn)) if i not in init_y_ind])
    trn_ds = Dataset(
        np.vstack((X_trn[init_y_ind], X_trn[y_ind])),
        np.concatenate((y_trn[init_y_ind], [None] * len(y_ind))))
    tst_ds = Dataset(X_tst, y_tst)
    fully_labeled_trn_ds = Dataset(
        np.vstack((X_trn[init_y_ind], X_trn[y_ind])),
        np.concatenate((y_trn[init_y_ind], y_trn[y_ind])))
    cost_matrix = 2000. * np.random.rand(len(target), len(target))
    np.fill_diagonal(cost_matrix, 0)
    return trn_ds, tst_ds, y_trn, y_tst, fully_labeled_trn_ds, cost_matrix
def run(trn_ds, tst_ds, lbr, model, qs, quota):
    E_in, E_out = [], []
    score_train = []
    score_test = []
    for _ in range(quota):
        ask_id = qs.make_query()
        X, _ = zip(*trn_ds.data)
        lb = lbr.label(X[ask_id])
        trn_ds.update(ask_id, lb)
        model.train(trn_ds)
        E_in = np.append(E_in, 1 - model.score(trn_ds))
        E_out = np.append(E_out, 1 - model.score(tst_ds))
        score_train = np.append(score_train, model.score(trn_ds) * 100)
        score_test = np.append(score_test, model.score(tst_ds) * 100)
    return E_in, E_out, score_train, score_test
qs5 = HintSVM(trn_ds5, cl=1.0, ch=1.0, p=0.5)
model = SVM(kernel='rbf', C=n_C, gamma=n_gamma, decision_function_shape='ovr')
E_in_5, E_out_5, score_train_5, score_test_5 = run(
    trn_ds5, tst_ds, idealLabels, model, qs5, quota_to_query)
results_out.append(E_out_5.tolist())
results_score.append(score_test_5.tolist())
Do you have any insights about this error?
thank you
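One plausible explanation (untested against this exact dataset): the vehicle features arrive as integers, while the _hintsvm Cython routine expects float64 buffers. Casting before building the Dataset may avoid the mismatch:

```python
import numpy as np

# Integer-valued features trigger "Buffer dtype mismatch, expected
# 'float64_t' but got 'long'" in Cython code declaring float64 buffers;
# casting them up front sidesteps the problem.
X_int = np.array([[1, 2], [3, 4]])        # dtype is a platform integer
X = np.asarray(X_int, dtype=np.float64)   # what _hintsvm expects
```

The float64 array would then be passed to Dataset in place of the raw integer features.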
Hello. The make_query() method fails at the following line, with q undefined:
ask_idx = np.random.choice(
np.arange(len(self.unlabeled_invert_id_idx)), size=1, p=q
)[0]
Could you please fix it?
Thanks!
Separate the changes out from the quire branch.
Hi,
Instead of unlabeled data that come as a stream, I would like to know if there is a way with libact to perform batch-mode active learning, meaning that users can select multiple images at once (positive and negative)?
Thank you in advance
Hello,
I am trying to install libact on my university's HPC facilities. However, I get the following error every time I try to install it:
error: Command "gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/rmegret/irodriguez/anaconda3/envs/bee/lib/python3.6/site-packages/numpy/core/include -I/usr/include/lapacke -I/home/rmegret/irodriguez/anaconda3/envs/bee/include/python3.6m -c libact/query_strategies/src/variance_reduction/variance_reduction.c -o build/temp.linux-x86_64-3.6/libact/query_strategies/src/variance_reduction/variance_reduction.o -std=c11" failed with exit status 1
I have tried pip, and also cloning the repo and then using setup.py.
Just in case, here are the specifications of the HPC: https://www.hpcf.upr.edu/documentation/boqueron/#ffs-tabbed-15
Hello,
I have found that your library is not compatible with the Python packages plotly and cufflinks. I tested this on a fresh install of Ubuntu 16.04 with Anaconda installed.
Everything was fine until the installation of plotly and cufflinks:
pip install plotly --upgrade
pip install cufflinks --upgrade
Then running python setup.py test ends with this:
======================================================================
ERROR: query_strategies (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: query_strategies
Traceback (most recent call last):
File "/path/anaconda3/lib/python3.5/unittest/loader.py", line 153, in loadTestsFromName
module = __import__(module_name)
File "/path/libact/libact/query_strategies/__init__.py", line 20, in <module>
from ._variance_reduction import estVar
ImportError: /usr/lib/liblapacke.so.3: undefined symbol: dpotrf2_
The basic infrastructure for documentation generation has been established.
Please read the spec on how to write documentation in your code; we are currently using numpydoc:
https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
There are also a lot of bugs in the Sphinx build waiting to be fixed:
https://readthedocs.org/projects/striatum/builds/4370706/
Some applications need the selection score for further processing.
In certain applications, you might want to know the top N unlabelled entities so that a human can go through and do batch labeling offline. Right now I have a particularly hacky way of getting multiple results out (just assuming the majority class in the update), but it would be great to tweak the make_query function to return an arbitrary number of ordered results for batch label processing.
for i in range(20):
    item_to_investigate = qs.make_query()
    libact_ds.update(item_to_investigate, 0)
    print(item_to_investigate)
Happy to contribute code to try to help this happen!
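A sketch of what an N-query extension could look like, assuming the strategy already computes a score per unlabeled entry; make_n_queries is a hypothetical name, not part of libact's API:

```python
import numpy as np

def make_n_queries(scores, unlabeled_ids, n=5):
    """Return the n entry ids with the highest scores, best first.

    Hypothetical API: libact's make_query returns a single id; this
    sketch assumes the strategy exposes one score per unlabeled entry.
    """
    order = np.argsort(scores)[::-1][:n]   # indices sorted by score, descending
    return [unlabeled_ids[i] for i in order]

ids = [10, 11, 12, 13]
scores = np.array([0.1, 0.9, 0.4, 0.7])
top = make_n_queries(scores, ids, n=2)   # → [11, 13]
```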
Perhaps write a faster implementation in C.
On Ubuntu, when installing libact with the command "pip install git+https://github.com/ntucllab/libact.git", I get the error:
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-Q9a2LI-build/
The current IdealLabeler seems to return a list of labels instead of a single label.
self.y[np.where([...])[0]]
should be
self.y[np.where([...])[0]][0]
or
self.y[np.where([...])[0][0]]
I would like to ask which classifiers are considered probabilistic, so that they can be combined with query strategies like Uncertainty Sampling?
Thanks in advance.
scikit-learn internally relabels the given labels to 0 .. n_labels-1; if I understand correctly, it does so in the order the data are passed to the fit method.
So if updating an unlabeled data point changes the order of the data passed to fit, the values from our model's predict_real method might end up in the wrong order.
One proposal for solving this problem is to manage the relabeling ourselves in the model classes.
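One way to see the issue and a possible fix: scikit-learn exposes its internal label order via the classes_ attribute after fit, so a wrapper can reorder predict_proba columns into a fixed label order regardless of the order data entered fit. A self-contained sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def predict_real_ordered(clf, X, label_order):
    """Reorder predict_proba columns to a fixed label order, using the
    classes_ mapping scikit-learn stores after fit."""
    proba = clf.predict_proba(X)
    cols = [list(clf.classes_).index(lbl) for lbl in label_order]
    return proba[:, cols]

X = np.array([[0.], [1.], [5.], [6.]])
y = np.array([2, 2, 7, 7])               # arbitrary, non-contiguous labels
clf = LogisticRegression().fit(X, y)
p = predict_real_ordered(clf, X, label_order=[7, 2])
# column 0 now always corresponds to label 7, column 1 to label 2
```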
Is there a Jupyter notebook for learning how to use this library?
https://github.com/ntucllab/libact/blob/master/libact/query_strategies/quire.py#L66
and the gamma parameter should be part of the kernel.
Hi all,
I have installed the libact package on my Ubuntu OS, but for some reason I can't run the alce_plot.py and multilabel_plot.py examples. I keep getting a ModuleNotFoundError for the module 'libact.query_strategies.multilabel'.
Please help!
Regards
I tried to install libact using sudo pip install libact
and got the following error message:
libact/query_strategies/variance_reduction.c:26:15: error: variable ‘moduledef’ has initializer but incomplete type
You can see the full error message here.
I also tried installing with the setup.py script, which actually worked just fine; the python3 installation via pip also worked on the same machine.
I did some googling and the error looked similar to this one, but I can't look into it further because setup.py worked.
Just wanted to let you guys know.
The labeled pool may contain only a subset of all possible labels.
Currently Model.predict_real is connected to predict_proba in scikit-learn, which returns an array of n_classes floats standing for the probabilities of the corresponding labels. But decision_function is another candidate, whose return shape varies from model to model, for example (in our case n_samples = 1):
We have to decide what we want in order to well-define the interface. @hsuantien, can you give us some advice on this?
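To make the shape difference concrete, here is a small scikit-learn comparison for a binary problem (using n_samples = 1 as in our case):

```python
import numpy as np
from sklearn.svm import SVC

X = np.arange(20, dtype=float).reshape(-1, 1)
y = (X.ravel() >= 10).astype(int)        # binary problem, 10 per class

clf = SVC(probability=True).fit(X, y)
# predict_proba always returns (n_samples, n_classes);
# decision_function returns (n_samples,) for a binary SVC.
proba_shape = clf.predict_proba(X[:1]).shape      # (1, 2)
dec_shape = clf.decision_function(X[:1]).shape    # (1,)
```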