Comments (7)

zeyiwen commented on May 17, 2024

We have fixed the issues with using multi:softprob and n_gpus. If you request more GPUs than are available, you should now see an error message saying that the number of available GPUs is smaller than n_gpus. Please update thundergbm to the latest version. If the problems still exist, feel free to let us know.
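
For example, you could cap n_gpus at the number of GPUs the machine actually exposes before building the classifier (a rough sketch, assuming nvidia-smi is on the PATH; the requested value is just illustrative):

import subprocess

from thundergbm import TGBMClassifier

def visible_gpu_count():
    # `nvidia-smi -L` prints one line per GPU; count the non-empty lines.
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
    return len([line for line in out.stdout.splitlines() if line.strip()])

requested_gpus = 2                                  # illustrative
n_gpus = min(requested_gpus, visible_gpu_count())   # never ask for more than the machine has
clf = TGBMClassifier(objective='multi:softprob', n_gpus=n_gpus)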

Regarding the data set size, we cannot reproduce the problem. Would you please provide more information about your data set, or even better, share the data set here directly?

zeyiwen commented on May 17, 2024

Thanks for the feedback. We will work on it and get back to you once the problem is fixed. Please stay tuned.

VoyagerIII commented on May 17, 2024

Thank you very much for the quick reply and for fixing the bug.
The probability values can now be obtained by setting the parameter:

objective='multi:softprob'

However, when I set "n_gpus" to more than 1, ThunderGBM still crashes.
Moreover, even with "n_gpus=1", it crashes with the following error when the training data is larger:

[error == cudaSuccess] out of memory.

Finally, when I del the variable "model", the GPU memory is not released. How can I release it in my code?

Thanks again.

Here is my code and data:

from __future__ import division

import gc
import warnings

import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.model_selection import train_test_split

import thundergbm

warnings.filterwarnings('ignore')

# Load the labels and the sparse training matrix, keeping the first 5000 features.
label = pd.read_csv("label.csv", header=None)
csr_trainData = sparse.load_npz('csr_trainData13100.npz')
csr_trainData = csr_trainData[:, :5000]
print(csr_trainData.shape)

# 80/20 train/validation split.
trainData, valData, trainLabel, valLabel = train_test_split(
    csr_trainData, label.iloc[:, 1], test_size=0.2, random_state=0)

clf = thundergbm.TGBMClassifier(
    bagging=1, lambda_tgbm=1, learning_rate=0.07, min_child_weight=1.2,
    n_gpus=1, verbose=0, n_parallel_trees=40, gamma=0.2, depth=7,
    n_trees=4000, tree_method='hist', objective='multi:softprob')

clf.fit(trainData, trainLabel)
print(clf.score(valData, valLabel))

predictions = clf.predict(valData)
print(predictions)

# Deleting the model does not seem to free the GPU memory.
del clf
gc.collect()
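
One workaround I am considering (a rough sketch, not specific to ThunderGBM; the hyperparameters are just placeholders) is to run the GPU work in a child process, so the driver releases everything when that process exits:

import multiprocessing as mp

def train_and_predict(train_X, train_y, val_X, queue):
    # Import inside the child so the CUDA context lives and dies with it.
    from thundergbm import TGBMClassifier
    clf = TGBMClassifier(n_gpus=1, objective='multi:softprob', depth=7, n_trees=100)
    clf.fit(train_X, train_y)
    queue.put(clf.predict(val_X))

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=train_and_predict, args=(trainData, trainLabel, valData, q))
    p.start()
    preds = q.get()   # read the predictions before joining
    p.join()          # GPU memory held by the child is released when it exits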

Label and data:
https://pan.baidu.com/s/1rssIuuL3icYHsNnlWfHWew
extract code:0gux

zeyiwen commented on May 17, 2024

The code runs fine on our machine. What OS, GPUs, and CUDA version do you use?

VoyagerIII commented on May 17, 2024

Ubuntu 18.04
NVIDIA:
NVIDIA-SMI 390.67 Driver Version: 390.67
CUDA:
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Could you check how much of the CUDA memory gets used on your machine when running it?
It performs well at small scale, but breaks with large-scale training data.
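
If it helps to compare, something like this can log the GPU memory while fit() runs (a rough sketch; it assumes nvidia-smi is on the PATH and reuses clf, trainData, and trainLabel from the script above):

import subprocess, threading, time

def log_gpu_memory(stop_event, interval=1.0):
    # Print used/total memory for every GPU roughly once per second.
    while not stop_event.is_set():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True)
        print(out.stdout.strip())
        time.sleep(interval)

stop = threading.Event()
threading.Thread(target=log_gpu_memory, args=(stop,), daemon=True).start()
clf.fit(trainData, trainLabel)   # clf, trainData, trainLabel as defined above
stop.set()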

fjgmoya commented on May 17, 2024

Thanks for your work, @zeyiwen.

Like @VoyagerIII, I see a similar error. When I execute my code with 1 GPU, there are no problems at all, but if I set n_gpus to 2 or 3, I get an "illegal memory access was encountered" error. My computer does have 3 GPUs.

It seems to occur at predict time: fitting completes successfully. I verified this by stopping the code after fitting and before predicting.

This is the code:

import numpy as np
import sys
from thundergbm import TGBMClassifier
from sklearn import datasets as dts
from sklearn.model_selection import train_test_split

# Overall parameters
train_ratio = 0.75
random_state = 123457
limit = None
num_classes = 10
num_estimators = 10
num_parallel_trees = 100
objective = 'multi:softmax'
max_depth = 6

# Number of GPUs
num_gpus = 3


# Load the digits dataset
digits = dts.load_digits()
X = digits.data
y = digits.target

# Create 0.75/0.25 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=(1 - train_ratio),
    train_size=train_ratio,
    random_state=random_state,
    shuffle=True,
    stratify=None)


# Classifier
clf = TGBMClassifier(
    objective=objective,
    n_trees=num_estimators,
    n_parallel_trees=num_parallel_trees,
    n_gpus=num_gpus,
    depth=max_depth,
    num_class=num_classes,
    tree_method='auto')

# Fitting
clf.fit(X_train, y_train)
# sys.exit(0)

# Predicting
y_pred = clf.predict(X_test)

# Score
print("Score: %10.5f" % (np.count_nonzero(np.equal(y_pred, y_test)) / y_test.shape[0]))

Ubuntu 18.04.4 LTS
NVIDIA-SMI 396.54, 3 TITAN Xp GPUs
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Thanks.

Kurt-Liuhf commented on May 17, 2024

Hi @fjgmoya, the "illegal memory access was encountered" issue when running prediction on multiple GPUs has been fixed. You can reinstall ThunderGBM and try again. Thank you!
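
A quick way to verify it (just a sketch; the parameters are illustrative) is to reinstall, e.g. with pip install --upgrade thundergbm or by rebuilding from source, and rerun the digits example with more than one GPU:

from sklearn import datasets as dts
from sklearn.model_selection import train_test_split
from thundergbm import TGBMClassifier

X, y = dts.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=123457)

clf = TGBMClassifier(objective='multi:softmax', num_class=10, n_gpus=2)
clf.fit(X_train, y_train)
print(clf.predict(X_test)[:10])   # should no longer hit "illegal memory access"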
