ray-project / lightgbm_ray
LightGBM on Ray
License: Apache License 2.0
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (210771146 vs. 104857600)"
debug_error_string = "{"created":"@1646188124.309444695","description":"Error received from peer ipv4:10.207.183.32:40455","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Received message larger than max (210771146 vs. 104857600)","grpc_status":8}"
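Both numbers in this message are byte counts: the size of the received message and the gRPC receive limit (104857600 bytes is exactly 100 MiB). The sketch below only does the arithmetic. The "chunks needed" figure assumes the payload could be split into equal pieces, which is an illustration and not a confirmed lightgbm_ray fix.

```python
MIB = 1024 * 1024

def describe_overflow(payload_bytes: int, limit_bytes: int) -> dict:
    """Summarise how far a payload exceeds a message-size limit.

    'chunks_needed' is the minimum number of equal-size pieces that would each
    fit under the limit (integer ceiling division); illustrative only.
    """
    return {
        "payload_mib": payload_bytes / MIB,
        "limit_mib": limit_bytes / MIB,
        "chunks_needed": (payload_bytes + limit_bytes - 1) // limit_bytes,
    }

info = describe_overflow(210771146, 104857600)
print(f"{info['payload_mib']:.1f} MiB vs {info['limit_mib']:.1f} MiB limit, "
      f"needs at least {info['chunks_needed']} chunks")
# prints: 201.0 MiB vs 100.0 MiB limit, needs at least 3 chunks
```

In other words, the message is roughly twice the allowed size.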
I am using RayDP with Spark, and I am using this package with Ray Tune for hyperparameter optimization of the LightGBM regressor. Unless I'm missing something, there's no way to use LightGBM's native cross-validation as in Ray's examples. Supporting it would greatly help model accuracy when training large models.
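Until a native cross-validation entry point exists, one workaround is to build the folds yourself and train once per fold. Below is a minimal, library-free sketch of the fold generation. The function name kfold_indices is hypothetical, and using each fold to build a separate RayDMatrix is an assumption about how it would be wired up, not an existing lightgbm_ray feature.

```python
import random
from typing import List, Tuple

def kfold_indices(n_rows: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n_rows-1 into k shuffled folds.

    Returns one (train_idx, valid_idx) pair per fold; every row appears in
    exactly one validation fold.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # local RNG: global random state is untouched
    folds = [idx[i::k] for i in range(k)]  # round-robin keeps fold sizes within 1
    splits = []
    for i, valid in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((sorted(train), sorted(valid)))
    return splits

for train_idx, valid_idx in kfold_indices(10, k=3, seed=42):
    print(len(train_idx), len(valid_idx))  # prints 6 4, then 7 3, then 7 3
```

Each pair could select rows from a pandas DataFrame (e.g. df.iloc[train_idx]) to build per-fold training and validation sets; averaging the per-fold validation metric gives a single value to report to Tune.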
Hi, is it possible to load LIBSVM files using the RayDMatrix class?
Thanks
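RayDMatrix accepts in-memory data such as NumPy arrays and pandas DataFrames, so one route is to parse the LIBSVM text into dense rows first. scikit-learn's sklearn.datasets.load_svmlight_file also reads this format, but it returns a SciPy sparse matrix, and whether RayDMatrix accepts that directly is not confirmed here. The stdlib-only sketch below (parse_libsvm is a hypothetical helper) assumes 1-based feature indices, the usual LIBSVM convention, and ignores qid: fields.

```python
from typing import Iterable, List, Tuple

def parse_libsvm(lines: Iterable[str], n_features: int) -> Tuple[List[List[float]], List[float]]:
    """Parse LIBSVM/svmlight lines ('label idx:val idx:val ...') into dense rows.

    Missing features are filled with 0.0; text after '#' is treated as a comment.
    """
    rows, labels = [], []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        label, *pairs = line.split()
        row = [0.0] * n_features
        for pair in pairs:
            idx, val = pair.split(":", 1)
            if idx == "qid":  # ranking query ids are not features
                continue
            row[int(idx) - 1] = float(val)  # 1-based index -> 0-based position
        rows.append(row)
        labels.append(float(label))
    return rows, labels

X, y = parse_libsvm(["1 1:0.5 3:2.0", "0 2:1.5  # second row"], n_features=3)
print(X, y)  # [[0.5, 0.0, 2.0], [0.0, 1.5, 0.0]] [1.0, 0.0]
```

The resulting rows could be converted to a NumPy array and passed as RayDMatrix(X, y), mirroring the RayDMatrix(train_x, train_y) call used elsewhere on this page. For large files, densifying everything in memory may not be practical.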
I've just realized that the RayDMatrix class in lightgbm_ray is named xgboost_ray.matrix.RayDMatrix rather than lightgbm_ray.matrix.RayDMatrix.
I understand code reuse, but name reuse? It is apparently by design (as per the docs), but in my opinion it violates the principle of least astonishment and should be changed to the more intuitive lightgbm_ray.matrix.RayDMatrix.
To reproduce:
from lightgbm_ray import RayParams as test1, RayDMatrix as test2
print([test1, test2])
Output:
[<class 'lightgbm_ray.main.RayParams'>, <class 'xgboost_ray.matrix.RayDMatrix'>]
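The output shows the class object is simply re-exported, so its __module__ still points at xgboost_ray.matrix. A thin subclass is one way a package could give the shared class its own qualified name. The sketch below simulates both packages in one file: the module names are plain strings and nothing is imported from xgboost_ray or lightgbm_ray, so it illustrates the idea rather than the project's actual code.

```python
class _SharedRayDMatrix:
    """Stand-in for the shared implementation living in xgboost_ray.matrix."""

    def __init__(self, data, label=None):
        self.data = data
        self.label = label

# Pretend this class was defined in xgboost_ray.matrix.
_SharedRayDMatrix.__module__ = "xgboost_ray.matrix"
_SharedRayDMatrix.__name__ = "RayDMatrix"
_SharedRayDMatrix.__qualname__ = "RayDMatrix"

# Option 1: plain re-export. Same object, so the reported name is unchanged.
ReExportedRayDMatrix = _SharedRayDMatrix

# Option 2 (hypothetical): a thin subclass owned by the LightGBM package.
class RayDMatrix(_SharedRayDMatrix):
    """Inherits all behaviour; only the reported module and name differ."""

RayDMatrix.__module__ = "lightgbm_ray.matrix"
RayDMatrix.__qualname__ = "RayDMatrix"

print(ReExportedRayDMatrix)  # <class 'xgboost_ray.matrix.RayDMatrix'>
print(RayDMatrix)            # <class 'lightgbm_ray.matrix.RayDMatrix'>
print(isinstance(RayDMatrix([1, 2]), ReExportedRayDMatrix))  # True
```

The trade-off: isinstance checks against the shared base keep working, but any code comparing type(obj) against the shared class exactly would treat the two as different classes.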
@Yard1 Hi Sir, I was trying LightGBM-Ray on a large dataset with 3 actors (num_actors=3) and 3 CPUs per actor. With this setup, the results change across runs. How can I make results reproducible in LightGBM-Ray?
I have set the following seeds:
LightGBM random_state seed
import numpy as np
np.random.seed(seed)
import random as python_random
python_random.seed(seed)
Are there any other seeds or parameters I should set?
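Beyond random_state, LightGBM's parameter reference documents several seed and determinism options: seed, bagging_seed, feature_fraction_seed, data_random_seed, deterministic, and force_row_wise / force_col_wise. The sketch below (reproducible_params is a hypothetical helper) collects them in one params dict. Whether this is sufficient for identical results across distributed runs is an assumption; how the data is sharded across actors could plausibly still vary between runs.

```python
import random

def reproducible_params(seed: int, base=None) -> dict:
    """Return a copy of `base` with LightGBM's seed/determinism options set.

    Parameter names follow LightGBM's parameter docs; whether they are enough
    for identical distributed results is an assumption.
    """
    params = dict(base or {})  # copy so the caller's dict is not mutated
    params.update({
        "seed": seed,                   # used to derive other seeds that are not set
        "bagging_seed": seed,
        "feature_fraction_seed": seed,
        "data_random_seed": seed,
        "deterministic": True,          # CPU only; may slow training down
        "force_row_wise": True,         # docs advise forcing row/col-wise with deterministic
    })
    return params

random.seed(0)  # Python-level RNG, as in the question; seed NumPy too if it is used
params = reproducible_params(42, {"objective": "regression"})
print(sorted(params))
```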
I'm getting the following error when I try to import RayLGBMClassifier:
Traceback (most recent call last):
File "/home/mforeman/miniconda3/envs/rapids-23.06/classifiers4.py", line 14, in <module>
from lightgbm_ray import RayLGBMClassifier
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/lightgbm_ray/__init__.py", line 1, in <module>
from lightgbm_ray.main import RayParams, train, predict
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/lightgbm_ray/main.py", line 55, in <module>
from xgboost_ray.main import (
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/xgboost_ray/__init__.py", line 1, in <module>
from xgboost_ray.main import RayParams, predict, train
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/xgboost_ray/main.py", line 76, in <module>
from xgboost_ray.matrix import (
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/xgboost_ray/matrix.py", line 36, in <module>
from ray.data.dataset import Dataset as RayDataset
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/__init__.py", line 5, in <module>
from ray.data._internal.compute import ActorPoolStrategy
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/_internal/compute.py", line 8, in <module>
from ray.data._internal.delegating_block_builder import DelegatingBlockBuilder
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/_internal/delegating_block_builder.py", line 4, in <module>
from ray.data._internal.arrow_block import ArrowBlockBuilder
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/_internal/arrow_block.py", line 22, in <module>
from ray.data._internal.numpy_support import (
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/_internal/numpy_support.py", line 5, in <module>
from ray.air.util.tensor_extensions.utils import create_ragged_ndarray
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/air/__init__.py", line 1, in <module>
from ray.air.checkpoint import Checkpoint
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/air/checkpoint.py", line 22, in <module>
from ray.air._internal.remote_storage import (
File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/air/_internal/remote_storage.py", line 142, in <module>
_cached_fs: Dict[tuple, Tuple[float, pyarrow.fs.FileSystem]] = {}
AttributeError: 'NoneType' object has no attribute 'fs'
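The last frame evaluates pyarrow.fs.FileSystem in a type annotation while pyarrow is None. This suggests that Ray's optional pyarrow import failed in this environment; that is an inference from the traceback, not a confirmed diagnosis. A small stdlib check (import_problem is a hypothetical helper) reports whether a module such as pyarrow.fs imports cleanly and, if not, which exception it raises:

```python
import importlib
from typing import Optional

def import_problem(module_name: str) -> Optional[str]:
    """Return None if `module_name` imports cleanly, else a short error string."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, but also e.g. binary/ABI load errors
        return f"{type(exc).__name__}: {exc}"
    return None

for name in ("pyarrow", "pyarrow.fs"):
    print(name, "->", import_problem(name) or "ok")
```

If pyarrow.fs fails to import, reinstalling pyarrow in that environment, or choosing a version compatible with the installed Ray, is a reasonable next step.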
Hi,
I've been testing the interaction_constraints parameter from LightGBM (see here).
Unfortunately, passing in the list of constraints causes training to fail with a SIGSEGV error.
Example:
#%%
# set up and load boston data
import numpy as np
import pandas as pd
from lightgbm_ray import RayLGBMRegressor, RayParams, RayDMatrix
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import ray
import os
boston = load_boston()
x, y = boston.data, boston.target
df = pd.DataFrame(x, columns= boston.feature_names)
# make into dmatrix
train_df_with_target = df.copy()
train_df_with_target['target'] = y
train_set = RayDMatrix(
data=train_df_with_target,
label = 'target'
)
# set params and ray params
params = {
'boosting_type': 'goss',
'objective': 'regression',
'metric': 'rmse',
'num_leaves': 10,
'max_depth': 4,
'learning_rate': 0.05,
'verbose': 1
}
ray_params = RayParams(
num_actors=2,
cpus_per_actor = 2,
)
#%% set up constraint (age cannot interact with any other feature)
constrained_feature = 'AGE'
other_features = [x for x,y in enumerate(df.columns) if y != constrained_feature ]
constrained_feature_idx = [x for x,y in enumerate(df.columns) if y == constrained_feature ]
constraint = [constrained_feature_idx, other_features]
#%% fit model
mod_ray_constrained = RayLGBMRegressor(
random_state=100,
interaction_constraints = constraint,
**params
)
mod_ray_constrained.fit(train_set,
y='target',
eval_set = [(train_set, 'target')],
eval_names=["train"],
ray_params=ray_params)
The constrained model fit returns the error:
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) *** SIGSEGV received at time=1673976819 on cpu 3 ***
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) PC: @ 0x7fedc74926f7 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) @ 0x7fedc750d420 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: *** SIGSEGV received at time=1673976819 on cpu 3 ***
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: PC: @ 0x7fedc74926f7 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: @ 0x7fedc750d420 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) Fatal Python error: Segmentation fault
Running this code with non-distributed LightGBM works fine, as does the above code with the interaction constraints removed.
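LightGBM's Python package takes interaction_constraints as a list of lists of feature indices (e.g. [[0, 1, 2], [2, 3]]), and two features may share a branch only if some list contains both. The reproducer builds such a list by hand. Below is a standalone sketch of the same construction with basic validation (isolate_features is a hypothetical helper), which may help rule out malformed input while narrowing down the crash. It does not explain the segfault itself.

```python
from typing import List, Sequence

def isolate_features(columns: Sequence[str], isolated: Sequence[str]) -> List[List[int]]:
    """Build LightGBM-style interaction constraints that isolate features.

    Each name in `isolated` gets its own single-index group and all remaining
    columns share one group, so isolated features cannot interact with others.
    Indices are plain Python ints (positions in `columns`).
    """
    columns = list(columns)  # accept lists, tuples, or a pandas Index
    missing = [name for name in isolated if name not in columns]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    groups = [[columns.index(name)] for name in isolated]
    rest = [i for i, name in enumerate(columns) if name not in isolated]
    if rest:
        groups.append(rest)
    return groups

# A subset of the Boston column names, for illustration.
cols = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE"]
print(isolate_features(cols, ["AGE"]))  # [[6], [0, 1, 2, 3, 4, 5]]
```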
Client tests in test_end_to_end.py often (though not always) time out during GitHub Actions CI. This should be fixed.
They don't seem to time out locally. My guess is that it's because fewer cores are available on the CI runner.
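If the timeouts do come from the runner having fewer cores than the tests request, one option is to size the Ray actors from the detected CPU count instead of hard-coding them. The sketch below assumes a hypothetical helper, fit_actors, and a simple policy (reserve one CPU for the driver, shrink CPUs per actor first, then the actor count). This is not what the test suite currently does.

```python
import os
from typing import Optional, Tuple

def fit_actors(wanted_actors: int, wanted_cpus_per_actor: int,
               available_cpus: Optional[int] = None) -> Tuple[int, int]:
    """Return (num_actors, cpus_per_actor) that fit in the available CPUs."""
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    budget = max(1, available_cpus - 1)  # keep one CPU for the driver when possible
    # Shrink CPUs per actor first so that the requested actor count survives longer.
    cpus_per_actor = max(1, min(wanted_cpus_per_actor, budget // max(1, wanted_actors)))
    num_actors = max(1, min(wanted_actors, budget // cpus_per_actor))
    return num_actors, cpus_per_actor

print(fit_actors(2, 2, available_cpus=4))  # (2, 1)
```

The returned pair would then feed RayParams(num_actors=..., cpus_per_actor=...) in the tests.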
I have some code that uses a callback to stop a Ray Tune trial if the complexity of the model (total leaves in the model) exceeds a given threshold. This works fine with a normal lightgbm model but fails when I use a lightgbm_ray model.
In the code below, use_distributed can be toggled to True to reproduce the error.
I presume the error occurs because the correct way of passing metrics back to Tune is with TuneReportCheckpointCallback() from ray.tune.integration.lightgbm. I've played around with this, but it seems I can only access the metrics reported by the LightGBM model. I can't add "total_leaves" as a metric because it relies on accessing the model itself, not just the data and predictions.
Is it possible to report total_leaves to Ray Tune with lightgbm_ray?
#%%
# set up and load boston data
import numpy as np
import pandas as pd
import os
import lightgbm
from lightgbm_ray import RayLGBMRegressor, RayParams, RayDMatrix
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import ray
from ray.air import session
from ray import tune
from ray.tune.search.optuna import OptunaSearch
ray.shutdown()
## Initialise ray:
if not ray.is_initialized():
    service_host = os.environ['RAY_HEAD_SERVICE_HOST']
    service_port = os.environ['RAY_HEAD_SERVICE_PORT']
    ray.init(
        f'ray://{service_host}:{service_port}'
    )
use_distributed = False
out_dir = '/path/to/output_folder'  # placeholder: replace with your output folder
boston = load_boston()
x, y = boston.data, boston.target
df = pd.DataFrame(x, columns= boston.feature_names)
# make into dmatrix
if use_distributed:
    actors = 2
    ray_params = RayParams(
        num_actors=actors,
        cpus_per_actor=2,
    )
    train_df_with_target = df.copy()
    train_df_with_target['target'] = y
    train_set = RayDMatrix(
        data=train_df_with_target,
        label='target'
    )
else:
    actors = 1
# set params and ray params
params = {
'boosting_type': 'goss',
'objective': 'regression',
'metric': 'rmse',
'n_estimators':100,
'num_leaves': 6,
'max_depth': 3,
'learning_rate': tune.quniform(0.05,0.1, 0.01),
'verbose': 1
}
#%% define function to count total leaves in model
def leaves_callback(env):
    model = env.model
    mod_dump = model.dump_model()
    tree_info = mod_dump['tree_info']
    num_leaves = 0
    num_iterations = 0
    for tree in tree_info:
        num_leaves += tree['num_leaves']
        num_iterations += 1
    session.report({'total_leaves': num_leaves,
                    "rmse_train": env.evaluation_result_list[0][2],
                    'num_iterations': num_iterations})
# define trainable
def trainable(params):
    if use_distributed:
        mod_ray = RayLGBMRegressor(
            random_state=100,
            **params
        )
        mod_ray.fit(train_set,
                    y='target',
                    eval_set=[(train_set, 'target')],
                    eval_names=["train"],
                    ray_params=ray_params,
                    callbacks=[leaves_callback])
    else:
        mod = lightgbm.LGBMRegressor(
            random_state=100,
            **params
        )
        mod.fit(X=x,
                y=y,
                eval_set=[(x, y)],
                eval_names=["train"],
                callbacks=[leaves_callback])
#%% RUN TUNING
resources = [{'CPU': 2.0} for x in range(actors+1)] + [{'CPU': 1.0}]
analysis = tune.Tuner(
tune.with_resources(
trainable,
tune.PlacementGroupFactory(
resources,
strategy='PACK')
),
tune_config=tune.TuneConfig(
metric="rmse_train",
mode= "min",
search_alg=OptunaSearch(),
num_samples=5),
run_config= ray.air.RunConfig(local_dir=out_dir,
name = 'test_callback',
stop = {'total_leaves': 300}),
param_space= params,
)
results = analysis.fit()
If I toggle use_distributed to True, I get this error:
(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) File "/opt/conda/lib/python3.9/site-packages/ray/air/session.py", line 61, in report
(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) _get_session().report(metrics, checkpoint=checkpoint)
(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) AttributeError: 'NoneType' object has no attribute 'report'
If I toggle use_distributed to False, I get the expected result:
(TunerInternal pid=2096) +--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------+
(TunerInternal pid=2096) | Trial name | status | loc | learning_rate | iter | total time (s) | total_leaves | rmse_train | num_iterations |
(TunerInternal pid=2096) |--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------|
(TunerInternal pid=2096) | trainable_a895fa72 | TERMINATED | 10.99.5.8:2131 | 0.05 | 56 | 0.24444 | 300 | 3.44845 | 56 |
(TunerInternal pid=2096) | trainable_aa0be088 | TERMINATED | 10.99.5.8:2131 | 0.1 | 61 | 0.296924 | 302 | 2.82896 | 61 |
(TunerInternal pid=2096) | trainable_aa3ab41c | TERMINATED | 10.99.5.8:2131 | 0.08 | 60 | 0.354107 | 301 | 2.89081 | 60 |
(TunerInternal pid=2096) | trainable_aa6d4d32 | TERMINATED | 10.99.15.76:749 | 0.07 | 59 | 0.310418 | 300 | 2.99355 | 59 |
(TunerInternal pid=2096) | trainable_aa89c7a0 | TERMINATED | 10.99.5.8:2131 | 0.05 | 56 | 0.265122 | 300 | 3.44845 | 56 |
(TunerInternal pid=2096) +--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------+
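The distributed traceback shows session.report being called inside the remote _RemoteRayLightGBMActor, where the session resolves to None. Reporting therefore has to happen in a process that owns the Tune session; how lightgbm_ray forwards per-iteration values to the driver is not covered here. Whatever the mechanism, the leaf count only needs the dict returned by Booster.dump_model(), as in the callback above. Pulling that logic into a pure function (total_leaves is a hypothetical name) makes it testable without Ray:

```python
from typing import Any, Dict

def total_leaves(model_dump: Dict[str, Any]) -> int:
    """Sum 'num_leaves' over all trees in a Booster.dump_model() dict."""
    return sum(tree["num_leaves"] for tree in model_dump.get("tree_info", []))

# Hand-written dict shaped like the relevant part of dump_model() output.
fake_dump = {"tree_info": [{"num_leaves": 6}, {"num_leaves": 5}, {"num_leaves": 6}]}
print(total_leaves(fake_dump), len(fake_dump["tree_info"]))  # 17 3
```

Inside a LightGBM callback this would be total_leaves(env.model.dump_model()). len(tree_info) corresponds to num_iterations in the original callback, which holds for single-output objectives like regression (one tree per iteration).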
Setup
conda create --name lgbm python=3.8
conda activate lgbm
conda install lightgbm
pip install lightgbm_ray
Script:
# light_ray.py
from lightgbm_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer
train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)
evals_result = {}
bst = train(
{
"objective": "binary",
"metric": ["binary_logloss", "binary_error"],
},
train_set,
num_boost_round=10,
evals_result=evals_result,
valid_sets=[train_set],
valid_names=["train"],
verbose_eval=False,
ray_params=RayParams(num_actors=2, cpus_per_actor=2))
bst.booster_.save_model("model.lgbm")
Exception:
% python light_ray.py
Traceback (most recent call last):
File "light_ray.py", line 1, in <module>
from lightgbm_ray import RayDMatrix, RayParams, train
File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm_ray/__init__.py", line 1, in <module>
from lightgbm_ray.main import RayParams, train, predict
File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm_ray/main.py", line 44, in <module>
from lightgbm import LGBMModel, LGBMRanker, Booster
File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/__init__.py", line 8, in <module>
from .basic import Booster, Dataset, register_logger
File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/basic.py", line 95, in <module>
_LIB = _load_lib()
File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/basic.py", line 86, in _load_lib
lib = ctypes.cdll.LoadLibrary(lib_path[0])
File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/ctypes/__init__.py", line 459, in LoadLibrary
return self._dlltype(name)
File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/ctypes/__init__.py", line 381, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/lib_lightgbm.so, 6): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
Referenced from: /Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/lib_lightgbm.so
Reason: image not found
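The OSError says lib_lightgbm.so is linked against an OpenMP runtime at /usr/local/opt/libomp/lib/libomp.dylib that is not present. On Intel Macs that is where Homebrew installs libomp, and LightGBM's installation instructions for macOS mention installing OpenMP (brew install libomp). Whether that resolves this particular conda/pip mix is an assumption. A small sketch (missing_library is a hypothetical helper) that extracts the missing path from such a message and checks for it on the current machine:

```python
import os
import re
from typing import Optional

_NOT_LOADED = re.compile(r"Library not loaded: (\S+)")

def missing_library(error_message: str) -> Optional[str]:
    """Extract the path after 'Library not loaded:' from a macOS dlopen error."""
    match = _NOT_LOADED.search(error_message)
    return match.group(1) if match else None

msg = ("dlopen(/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/"
       "lightgbm/lib_lightgbm.so, 6): Library not loaded: "
       "/usr/local/opt/libomp/lib/libomp.dylib")
path = missing_library(msg)
print(path)  # /usr/local/opt/libomp/lib/libomp.dylib
if path is not None and not os.path.exists(path):
    print(f"{path} is missing on this machine")
```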