Comments (11)

cgnorthcutt commented on May 18, 2024

SOLUTION

Also, can everyone (@stonk97 @awoloshuk @UCASREN) please try changing the pruning line to:

label_errors = get_noise_indices(
    y_train,
    my_psx,
    n_jobs=1,  # BE SURE TO ADD THIS -- it turns off multiprocessing
)

Here I have turned off multiprocessing. My guess is that Python multiprocessing is not working correctly on Windows. For now, this workaround (setting n_jobs=1) should solve your issue.
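If the same script has to run on several operating systems, the workaround can be applied only where it is needed. The following is a minimal sketch, not part of cleanlab: `choose_n_jobs` is a hypothetical helper, and it assumes that using every CPU is acceptable on non-Windows systems.

```python
import os
import platform


def choose_n_jobs(system=None):
    """Return 1 on Windows (multiprocessing off), else the CPU count.

    `system` defaults to the current platform name; passing it explicitly
    is only there to make the helper easy to test.
    """
    system = system or platform.system()
    if system == "Windows":
        return 1
    # os.cpu_count() can return None, so fall back to a single job.
    return os.cpu_count() or 1


# Intended use (assumes y_train and my_psx already exist):
# label_errors = get_noise_indices(y_train, my_psx, n_jobs=choose_n_jobs())
```

Once a cleanlab release handles Windows multiprocessing itself, a helper like this can be dropped.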

from cleanlab.

awoloshuk commented on May 18, 2024

I also have the same issue

UCASREN commented on May 18, 2024

@cgnorthcutt I also have the same issue

cgnorthcutt commented on May 18, 2024

Hi folks @UCASREN @awoloshuk @stonk97, thanks for sharing. I am unable to reproduce your error. Here is a complete script that works for me, intended to reproduce it:

import cleanlab
from cleanlab.latent_estimation import estimate_py_noise_matrices_and_cv_pred_proba
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_digits
from cleanlab.pruning import get_noise_indices

data = load_digits()

X_train = data['data']
y_train = data['target']

est_py, est_nm, est_inv, confident_joint, my_psx = estimate_py_noise_matrices_and_cv_pred_proba(
    X=X_train,
    s=y_train,
    clf=GaussianNB(),
)

label_errors = get_noise_indices(
    y_train,
    my_psx,
    verbose=1,
)

However, I have no issues running this code. Can each of you give me more information about when the error occurs? After running exactly the code above, please also include your Python version, how you installed cleanlab, your OS version, etc.

UCASREN commented on May 18, 2024

I installed cleanlab with 'pip install cleanlab'; my Python version is 3.7.4, on Windows 10. When I run cleanlab-master/examples/iris_simple_example.ipynb, I get the errors below:

`
WITHOUT confident learning, Iris dataset test accuracy: 0.6

Now we show the improvement using confident learning to characterize the noise
and learn on the data that is (with high confidence) labeled correctly.

WITH confident learning (noise matrix given),


RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\pruning.py", line 109, in _prune_by_count
noise_mask = np.zeros(len(psx), dtype=bool)
NameError: name 'psx' is not defined
"""

The above exception was the direct cause of the following exception:

NameError Traceback (most recent call last)
in
9 print()
10 print('WITH confident learning (noise matrix given),', end=" ")
---> 11 _ = rp.fit(X_train, s, noise_matrix=noise_matrix)
12 pred = rp.predict(X_test)
13 print("Iris dataset test accuracy:", round(accuracy_score(pred, y_test),2))

D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\classification.py in fit(self, X, s, psx, thresholds, noise_matrix, inverse_noise_matrix)
295 inverse_noise_matrix = self.inverse_noise_matrix,
296 confident_joint = self.confident_joint,
--> 297 prune_method = self.prune_method,
298 )
299

D:\ProgramData\Anaconda3\lib\site-packages\cleanlab\pruning.py in get_noise_indices(s, psx, inverse_noise_matrix, confident_joint, frac_noise, num_to_remove_per_class, prune_method, sorted_index_method, multi_label, n_jobs, verbose)
334 )
335 else:
--> 336 noise_masks_per_class = p.map(_prune_by_count, range(K))
337 else: # n_jobs = 1, so no parallelization
338 noise_masks_per_class = [_prune_by_count(k) for k in range(K)]

D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in map(self, func, iterable, chunksize)
266 in a list that is returned.
267 '''
--> 268 return self._map_async(func, iterable, mapstar, chunksize).get()
269
270 def starmap(self, func, iterable, chunksize=None):

D:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):

NameError: name 'psx' is not defined
`

@cgnorthcutt, thank you!
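The traceback suggests that `_prune_by_count` reads `psx` as a module-level global. On Windows, worker processes are spawned from a fresh interpreter rather than forked, so a global assigned at runtime in the parent does not exist in the workers. One common remedy is to hand the data to each worker through a pool initializer. The sketch below illustrates that pattern with a toy per-class job; all names in it are hypothetical, and it is not necessarily how cleanlab implements its fix.

```python
import multiprocessing as mp

# Hypothetical module-level slot. Every process has its own copy, and a
# freshly spawned worker starts with it unset.
_psx = None


def _init_worker(psx):
    """Run once inside each worker: store the data in that worker's global."""
    global _psx
    _psx = psx


def _count_predicted(k):
    """Toy per-class job: count rows whose largest probability is class k."""
    return sum(
        1 for row in _psx if max(range(len(row)), key=row.__getitem__) == k
    )


def counts_per_class(psx, n_jobs=1):
    """Run the toy job for every class, serially or in a spawned pool."""
    n_classes = len(psx[0])
    if n_jobs == 1:
        _init_worker(psx)  # same code path, but no child processes
        return [_count_predicted(k) for k in range(n_classes)]
    # "spawn" is the start method Windows uses, so requesting it explicitly
    # reproduces the Windows behaviour on other platforms too.
    ctx = mp.get_context("spawn")
    with ctx.Pool(n_jobs, initializer=_init_worker, initargs=(psx,)) as pool:
        return pool.map(_count_predicted, range(n_classes))
```

With `n_jobs` greater than 1, call `counts_per_class` from under an `if __name__ == "__main__":` guard in a script file, which spawn-based platforms require.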

cgnorthcutt commented on May 18, 2024

@UCASREN can you please run my example above?

UCASREN commented on May 18, 2024

@cgnorthcutt Yes, when I change the pruning line, it works properly. Thank you very much!

stonk97 commented on May 18, 2024

@cgnorthcutt The code you wrote runs perfectly! I think you spotted the problem: Windows.
Thank you very much!

kagalkot commented on May 18, 2024

@cgnorthcutt, I am implementing the Rank Pruning algorithm on the Iris dataset. After using the code below, "NameError: name 'psx' is not defined" is resolved, but now I get "NameError: name 'my_psx' is not defined":

label_errors = get_noise_indices(
    y_train,
    my_psx,
    n_jobs=1,  # BE SURE TO ADD THIS -- it turns off multiprocessing
)

cgnorthcutt commented on May 18, 2024

Yes, because my_psx does not exist. You should write psx=my_psx.

cgnorthcutt commented on May 18, 2024

@kagalkot @stonk97 @awoloshuk @UCASREN Issues using Windows should be fixed. The main commit with the fix is cgnorthcutt@cf93242.

cleanlab now supports Windows multiprocessing and Python 3.4, 3.5, 3.6, 3.7 natively.

Upgrade cleanlab to version 0.1.1 for the fix. To update, run pip install --upgrade cleanlab in your terminal; it should install the latest version.
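To confirm that the installed release includes the fix, the version can be checked from Python. This is a rough sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper names are hypothetical, and the comparison only looks at the leading numeric components of the version string.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.1.1' into (0, 1, 1), stopping at the first non-numeric part."""
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def has_fix(installed, minimum="0.1.1"):
    """True if `installed` is at least the release named in this thread."""
    return parse_version(installed) >= parse_version(minimum)


def installed_cleanlab_version():
    """Return the installed cleanlab version string, or None if absent."""
    try:
        return version("cleanlab")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_cleanlab_version()
    if found is None:
        print("cleanlab is not installed")
    elif has_fix(found):
        print(f"cleanlab {found} should include the Windows fix")
    else:
        print(f"cleanlab {found} predates 0.1.1; upgrade it")
```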
