SOLUTION
Could everyone (@stonk97 @awoloshuk @UCASREN) please try changing the pruning line to:

```python
label_errors = get_noise_indices(
    y_train,
    my_psx,
    n_jobs=1,  # BE SURE TO ADD THIS -- it turns off multiprocessing
)
```
where I have turned off multiprocessing. My guess is that Python multiprocessing is not working correctly on Windows. For now, this workaround should solve your issue.
from cleanlab.
I also have the same issue.
@cgnorthcutt I also have the same issue.
Hi folks @UCASREN @awoloshuk @stonk97, thanks for sharing. I am unable to reproduce your error. Here is a complete script (it works for me) that is intended to reproduce it:

```python
import cleanlab
from cleanlab.latent_estimation import estimate_py_noise_matrices_and_cv_pred_proba
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_digits
from cleanlab.pruning import get_noise_indices

data = load_digits()
X_train = data['data']
y_train = data['target']

est_py, est_nm, est_inv, confident_joint, my_psx = estimate_py_noise_matrices_and_cv_pred_proba(
    X=X_train,
    s=y_train,
    clf=GaussianNB(),
)

label_errors = get_noise_indices(
    y_train,
    my_psx,
    verbose=1,
)
```
However, this code runs without issues for me. Could each of you run exactly the code above and share more information about the error you get, including your Python version, how you installed cleanlab, your OS version, etc.?
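A small helper that collects most of this information in one go, as a sketch assuming Python 3.8+ (for `importlib.metadata`); `environment_report` is a hypothetical name. It also prints the multiprocessing start method, which is relevant here because Windows always uses "spawn".

```python
import multiprocessing
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report() -> dict:
    """Collect the details requested above for a bug report."""
    try:
        cleanlab_version = version("cleanlab")
    except PackageNotFoundError:
        cleanlab_version = "not installed"
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "cleanlab": cleanlab_version,
        # How worker processes are started; "spawn" is the default on Windows.
        "mp_start_method": multiprocessing.get_start_method(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

How cleanlab was installed (pip, conda, from source) still has to be reported by hand.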
I installed cleanlab with `pip install cleanlab`; I'm using Python 3.7.4 on Windows 10. When I run cleanlab-master/examples/iris_simple_example.ipynb, I get the errors below:
```
WITHOUT confident learning, Iris dataset test accuracy: 0.6
Now we show the improvement using confident learning to characterize the noise
and learn on the data that is (with high confidence) labeled correctly.
WITH confident learning (noise matrix given),
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\pruning.py", line 109, in _prune_by_count
noise_mask = np.zeros(len(psx), dtype=bool)
NameError: name 'psx' is not defined
"""
The above exception was the direct cause of the following exception:
NameError Traceback (most recent call last)
in
9 print()
10 print('WITH confident learning (noise matrix given),', end=" ")
---> 11 _ = rp.fit(X_train, s, noise_matrix=noise_matrix)
12 pred = rp.predict(X_test)
13 print("Iris dataset test accuracy:", round(accuracy_score(pred, y_test),2))
D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\classification.py in fit(self, X, s, psx, thresholds, noise_matrix, inverse_noise_matrix)
295 inverse_noise_matrix = self.inverse_noise_matrix,
296 confident_joint = self.confident_joint,
--> 297 prune_method = self.prune_method,
298 )
299
D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\pruning.py in get_noise_indices(s, psx, inverse_noise_matrix, confident_joint, frac_noise, num_to_remove_per_class, prune_method, sorted_index_method, multi_label, n_jobs, verbose)
334 )
335 else:
--> 336 noise_masks_per_class = p.map(_prune_by_count, range(K))
337 else: # n_jobs = 1, so no parallelization
338 noise_masks_per_class = [_prune_by_count(k) for k in range(K)]
D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in map(self, func, iterable, chunksize)
266 in a list that is returned.
267 '''
--> 268 return self._map_async(func, iterable, mapstar, chunksize).get()
269
270 def starmap(self, func, iterable, chunksize=None):
D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):
NameError: name 'psx' is not defined
```
@cgnorthcutt, thank you!
@UCASREN can you please run my example above?
@cgnorthcutt Yes, after changing the pruning line it works properly. Thank you very much!
@cgnorthcutt The code you wrote runs perfectly! I think you spotted the problem: Windows.
Thank you very much!
@cgnorthcutt, I am implementing the Rank Pruning algorithm on the Iris dataset. After using the code below, "NameError: name 'psx' is not defined" is resolved, but now I get "NameError: name 'my_psx' is not defined":

```python
label_errors = get_noise_indices(
    y_train,
    my_psx,
    n_jobs=1,  # BE SURE TO ADD THIS -- it turns off multiprocessing
)
```
Yes, because `my_psx` does not exist. You should write `psx=my_psx`.
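For anyone hitting the same error: `my_psx` has to be bound to your own array of out-of-sample predicted probabilities (one row per example, one column per class) before the pruning call. A minimal sketch, assuming scikit-learn is installed and using its `cross_val_predict` in place of cleanlab's `estimate_py_noise_matrices_and_cv_pred_proba` (the route taken in the reproduction script earlier in this thread). The cleanlab call is left as a comment because `cleanlab.pruning` only exists in old cleanlab releases like the one discussed here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

X_train, y_train = load_iris(return_X_y=True)

# Out-of-sample predicted probabilities: each row comes from a model that did
# not see that example during training (5-fold cross-validation).
my_psx = cross_val_predict(GaussianNB(), X_train, y_train, cv=5, method="predict_proba")

print(my_psx.shape)  # (150, 3): one row per example, one column per class

# With my_psx defined, the pruning call from this thread becomes:
# label_errors = get_noise_indices(s=y_train, psx=my_psx, n_jobs=1)
```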
@kagalkot @stonk97 @awoloshuk @UCASREN The issues on Windows should now be fixed. The main commit with the fix is cgnorthcutt@cf93242.
cleanlab now supports Windows multiprocessing and Python 3.4, 3.5, 3.6, 3.7 natively.
Upgrade cleanlab to version 0.1.1 to get the fix. To update, run `pip install --upgrade cleanlab` in your terminal; it should install the latest version.
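To confirm that the upgrade took effect, here is a small stdlib-only check, assuming Python 3.8+ (for `importlib.metadata`). `parse_release` and `has_windows_fix` are hypothetical helpers; the parser is deliberately naive and ignores pre-release suffixes such as `rc1`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Naively parse 'X.Y.Z' into a tuple of ints, stopping at the first
    non-numeric component (so '2.0.0rc1' becomes (2, 0))."""
    nums = []
    for part in v.split("."):
        if not part.isdigit():
            break
        nums.append(int(part))
    return tuple(nums)


def has_windows_fix(installed: str, minimum: str = "0.1.1") -> bool:
    """True if `installed` is at least the release that contains the fix."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        installed = version("cleanlab")
    except PackageNotFoundError:
        installed = None
    if installed is None:
        print("cleanlab is not installed")
    elif has_windows_fix(installed):
        print(f"cleanlab {installed} is 0.1.1 or newer")
    else:
        print(f"cleanlab {installed} is older than 0.1.1; run: pip install --upgrade cleanlab")
```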