lightgbm_ray's People

Contributors

jimthompson5802, justinvyu, krfricke, peytondmurray, richardliaw, yard1

lightgbm_ray's Issues

"grpc_message":"Received message larger than max

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (210771146 vs. 104857600)"
debug_error_string = "{"created":"@1646188124.309444695","description":"Error received from peer ipv4:10.207.183.32:40455","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Received message larger than max (210771146 vs. 104857600)","grpc_status":8}"
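
For scale, the two numbers in the error message are byte counts, and the limit is exactly 100 MiB. The snippet below only does that arithmetic. Which setting controls the limit in this setup is not shown in the report, so nothing here should be read as a fix.

```python
# Byte counts copied from the error message above.
received_bytes = 210_771_146
limit_bytes = 104_857_600

MIB = 1024 * 1024  # bytes per mebibyte

limit_mib = limit_bytes / MIB
received_mib = received_bytes / MIB

print(f"limit:    {limit_mib:.1f} MiB")
print(f"received: {received_mib:.1f} MiB")
print(f"ratio:    {received_bytes / limit_bytes:.2f}x the limit")
```

In other words, the rejected message was roughly twice the allowed size.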

Support for CrossValidation: Enhancement Request

I am using RayDP with Spark, and I use this package with Ray Tune for hyperparameter optimization with the LightGBM regressor. Unless I'm missing something, there is no way to use LightGBM's native cross-validation as in Ray's examples. Supporting it would be a huge help to model accuracy when training large models.

Unintuitive naming of RayDMatrix class

I've just realized that the RayDMatrix class exposed by lightgbm_ray is actually xgboost_ray.matrix.RayDMatrix, rather than lightgbm_ray.matrix.RayDMatrix.

I understand code re-use, but name re-use? It is apparently by design (as per docs), but in my opinion it violates the principle of least astonishment and thus should be changed to a more intuitive lightgbm_ray.matrix.RayDMatrix.

To reproduce:

from lightgbm_ray import RayParams as test1, RayDMatrix as test2
print([test1, test2])

Output:

[<class 'lightgbm_ray.main.RayParams'>, <class 'xgboost_ray.matrix.RayDMatrix'>]
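
The output above is what Python prints for any class that is defined in one module and re-exported from another: the class keeps the `__module__` of the place where it was defined. A minimal sketch of that behaviour, using two hypothetical in-memory modules (`origin_pkg.matrix` and `reexport_pkg` are stand-ins, not the real packages):

```python
import types

# Hypothetical stand-in for the package that defines the class.
origin = types.ModuleType("origin_pkg.matrix")
exec("class RayDMatrix:\n    pass\n", origin.__dict__)

# Hypothetical stand-in for the package that only re-exports it.
reexport = types.ModuleType("reexport_pkg")
reexport.RayDMatrix = origin.RayDMatrix

# The repr still names the defining module, not the re-exporting one.
print(reexport.RayDMatrix)
print(reexport.RayDMatrix is origin.RayDMatrix)
```

Getting a lightgbm_ray-prefixed name would therefore need a distinct class (for example a subclass) on the lightgbm_ray side, not just a re-export.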

Ray lightgbm reproducibility issue

@Yard1 Hi Sir, I was trying LightGBM-Ray on a large dataset with 3 actors and 3 CPUs per actor. With this setup, the results keep changing across runs. Can you advise how to make results reproducible in LightGBM-Ray?

I have set the following seeds:

Lightgbm random state seed

import numpy as np
np.random.seed(seed)

import random as python_random
python_random.seed(seed)

Are there any more seeds or parameters to set?
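
For reference, these are the LightGBM parameters usually looked at first when results drift between runs. This is a sketch, not a confirmed fix: the parameter names come from the LightGBM parameter list, but whether they are sufficient with several actors in lightgbm_ray is not established here, since how data is sharded across actors may also matter. `SEED` and `deterministic_lgbm_params` are hypothetical names.

```python
SEED = 42  # hypothetical value; the issue does not state the seed used


def deterministic_lgbm_params(seed: int) -> dict:
    """Return LightGBM parameters commonly set when chasing run-to-run drift."""
    return {
        "seed": seed,                   # master seed used to derive the others
        "bagging_seed": seed,           # row subsampling
        "feature_fraction_seed": seed,  # column subsampling
        "data_random_seed": seed,       # data loading / binning
        "deterministic": True,          # ask LightGBM for stable results
        "force_row_wise": True,         # pin the histogram-building strategy
    }


params = deterministic_lgbm_params(SEED)
print(params)
```

The returned dictionary can be merged into the parameters already passed to the model.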

import RayLGBMClassifier error

I'm getting the following error when I try to import RayLGBMClassifier.

Traceback (most recent call last):
  File "/home/mforeman/miniconda3/envs/rapids-23.06/classifiers4.py", line 14, in <module>
    from lightgbm_ray import RayLGBMClassifier
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/lightgbm_ray/__init__.py", line 1, in <module>
    from lightgbm_ray.main import RayParams, train, predict
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/lightgbm_ray/main.py", line 55, in <module>
    from xgboost_ray.main import (
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/xgboost_ray/__init__.py", line 1, in <module>
    from xgboost_ray.main import RayParams, predict, train
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/xgboost_ray/main.py", line 76, in <module>
    from xgboost_ray.matrix import (
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/xgboost_ray/matrix.py", line 36, in <module>
    from ray.data.dataset import Dataset as RayDataset
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/__init__.py", line 5, in <module>
    from ray.data._internal.compute import ActorPoolStrategy
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/_internal/compute.py", line 8, in <module>
    from ray.data._internal.delegating_block_builder import DelegatingBlockBuilder
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/_internal/delegating_block_builder.py", line 4, in <module>
    from ray.data._internal.arrow_block import ArrowBlockBuilder
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/_internal/arrow_block.py", line 22, in <module>
    from ray.data._internal.numpy_support import (
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/data/_internal/numpy_support.py", line 5, in <module>
    from ray.air.util.tensor_extensions.utils import create_ragged_ndarray
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/air/__init__.py", line 1, in <module>
    from ray.air.checkpoint import Checkpoint
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/air/checkpoint.py", line 22, in <module>
    from ray.air._internal.remote_storage import (
  File "/home/mforeman/miniconda3/envs/rapids-23.06/lib/python3.10/site-packages/ray/air/_internal/remote_storage.py", line 142, in <module>
    _cached_fs: Dict[tuple, Tuple[float, pyarrow.fs.FileSystem]] = {}
AttributeError: 'NoneType' object has no attribute 'fs'
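
The last frame fails on `pyarrow.fs` with `pyarrow` being None, which suggests (an assumption, not confirmed in this report) that pyarrow could not be imported in this environment. A quick check of which of the involved top-level packages can be found, using only the standard library:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Packages that appear in the traceback above.
for name in ("pyarrow", "ray", "xgboost_ray", "lightgbm_ray"):
    status = "found" if module_available(name) else "MISSING"
    print(f"{name}: {status}")
```

If pyarrow shows up as missing, installing it into the same environment would be the first thing to try.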

where are the examples

The examples folder is empty and the links to the examples are all broken. Please provide an updated link to the examples, thank you!

Interaction constraints not working

Hi,

I've been testing using the interaction_constraints parameter from lightgbm (see here).

Unfortunately, passing in the list of constraints causes training to fail with a SIGSEGV (segmentation fault).

Example:

#%%
# set up and load boston data
import numpy as np
import pandas as pd
from lightgbm_ray import RayLGBMRegressor, RayParams, RayDMatrix
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import ray
import os

boston = load_boston()
x, y = boston.data, boston.target
df = pd.DataFrame(x, columns= boston.feature_names)

# make into dmatrix
train_df_with_target = df.copy()
train_df_with_target['target'] = y

train_set = RayDMatrix(
    data=train_df_with_target,
    label = 'target'
    )

# set params and ray params
params = {
    'boosting_type': 'goss',
    'objective': 'regression',
    'metric': 'rmse',
    'num_leaves': 10,
    'max_depth': 4,
    'learning_rate': 0.05,
    'verbose': 1
}

ray_params = RayParams(
    num_actors=2,
    cpus_per_actor = 2,
    )



#%% set up constraint (age cannot interact with any other feature)
constrained_feature = 'AGE'
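# use idx/name as loop variables so the dataset variables x and y are not shadowed
other_features = [idx for idx, name in enumerate(df.columns) if name != constrained_feature]
constrained_feature_idx = [idx for idx, name in enumerate(df.columns) if name == constrained_feature]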

constraint = [constrained_feature_idx, other_features]


#%% fit model
mod_ray_constrained = RayLGBMRegressor(
    random_state=100,
    interaction_constraints = constraint,
    **params
)


mod_ray_constrained.fit(train_set,
    y='target',
    eval_set = [(train_set, 'target')],
    eval_names=["train"],
    ray_params=ray_params)

The constrained model fit returns the error:

(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) *** SIGSEGV received at time=1673976819 on cpu 3 ***
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) PC: @ 0x7fedc74926f7 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) @ 0x7fedc750d420 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: *** SIGSEGV received at time=1673976819 on cpu 3 ***
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: PC: @ 0x7fedc74926f7 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: @ 0x7fedc750d420 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) Fatal Python error: Segmentation fault

Running this code with non-distributed lightgbm works fine, as does the above code with interaction constraints removed.

Fix client tests being flaky (timing out)

Client tests in test_end_to_end.py often (though not always) time out during GitHub Actions CI. This should be fixed.

They don't seem to time out locally. My guess is that this is due to fewer cores being available on the CI runner.

Ray Tune custom callback based on model structure

I have some code that uses a callback to stop a Ray Tune trial if the complexity of the model (total leaves in the model) exceeds a given threshold. This works fine with a normal lightgbm model but fails when I use a lightgbm_ray model.

In the code below, use_distributed can be toggled to True to reproduce the error.

I presume the error is because the correct way of passing the metrics back to tune is with the TuneReportCheckpointCallback() from ray.tune.integration.lightgbm. I've played around with this, but it seems like I can only access the metrics reported by the lightgbm model. I can't add the "total_leaves" as a metric because it relies on accessing the model itself, not just the data and predictions.

Is it possible to report total_leaves to ray tune with lightgbm_ray?

#%%
# set up and load boston data
import numpy as np
import pandas as pd
import os
import lightgbm
from lightgbm_ray import RayLGBMRegressor, RayParams, RayDMatrix
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import ray
from ray.air import session
from ray import tune
from ray.tune.search.optuna import OptunaSearch


ray.shutdown()
## Initialise ray:
if not ray.is_initialized():
    service_host = os.environ['RAY_HEAD_SERVICE_HOST']
    service_port = os.environ['RAY_HEAD_SERVICE_PORT']
    ray.init(
        f'ray://{service_host}:{service_port}'
    )

use_distributed = False
out_dir = '/path/to/output_folder'  # placeholder: replace with a real output folder

boston = load_boston()
x, y = boston.data, boston.target
df = pd.DataFrame(x, columns= boston.feature_names)

# make into dmatrix
if use_distributed:
    actors = 2
    ray_params = RayParams(
        num_actors= actors,
        cpus_per_actor = 2,
    )


    train_df_with_target = df.copy()
    train_df_with_target['target'] = y

    train_set = RayDMatrix(
        data=train_df_with_target,
        label = 'target'
        )
else:
    actors = 1
    

# set params and ray params
params = {
    'boosting_type': 'goss',
    'objective': 'regression',
    'metric': 'rmse',
    'n_estimators':100,
    'num_leaves': 6,
    'max_depth': 3,
    'learning_rate': tune.quniform(0.05,0.1, 0.01),
    'verbose': 1
}




#%% define function to count total leaves in model
def leaves_callback(env):
    model = env.model

    mod_dump = model.dump_model()
    tree_info = mod_dump['tree_info']
    num_leaves = 0
    num_iterations = 0
    for tree in tree_info:
        num_leaves += tree['num_leaves']
        num_iterations += 1

    session.report({'total_leaves': num_leaves,
                    "rmse_train":  env.evaluation_result_list[0][2],
                    'num_iterations': num_iterations})

# define trainable
def trainable(params):
    if use_distributed:
        mod_ray = RayLGBMRegressor(
            random_state=100,
            **params
        )


        mod_ray.fit(train_set,
            y='target',
            eval_set = [(train_set, 'target')],
            eval_names=["train"],
            ray_params=ray_params,
            callbacks = [leaves_callback])
    else:
        mod = lightgbm.LGBMRegressor(
            random_state=100,
            **params
        )

        mod.fit(X = x,
            y=y,
            eval_set = [(x, y)],
            eval_names=["train"],
            callbacks = [leaves_callback])


#%% RUN TUNING

resources = [{'CPU': 2.0} for x in range(actors+1)] + [{'CPU': 1.0}]

analysis = tune.Tuner(
    tune.with_resources(
            trainable,
            tune.PlacementGroupFactory(
                resources,
                strategy='PACK')
        ),
    tune_config=tune.TuneConfig(
        metric="rmse_train",
        mode= "min",
        search_alg=OptunaSearch(),
        num_samples=5),
        
    run_config= ray.air.RunConfig(local_dir=out_dir,
                                name = 'test_callback',
                                stop = {'total_leaves': 300}),
    param_space= params,     
    )


results = analysis.fit()

If I toggle use_distributed to True, I get the following error:

(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) File "/opt/conda/lib/python3.9/site-packages/ray/air/session.py", line 61, in report
(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) _get_session().report(metrics, checkpoint=checkpoint)
(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) AttributeError: 'NoneType' object has no attribute 'report'

If I toggle use_distributed to False, I get the expected result:

(TunerInternal pid=2096) +--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------+
(TunerInternal pid=2096) | Trial name | status | loc | learning_rate | iter | total time (s) | total_leaves | rmse_train | num_iterations |
(TunerInternal pid=2096) |--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------|
(TunerInternal pid=2096) | trainable_a895fa72 | TERMINATED | 10.99.5.8:2131 | 0.05 | 56 | 0.24444 | 300 | 3.44845 | 56 |
(TunerInternal pid=2096) | trainable_aa0be088 | TERMINATED | 10.99.5.8:2131 | 0.1 | 61 | 0.296924 | 302 | 2.82896 | 61 |
(TunerInternal pid=2096) | trainable_aa3ab41c | TERMINATED | 10.99.5.8:2131 | 0.08 | 60 | 0.354107 | 301 | 2.89081 | 60 |
(TunerInternal pid=2096) | trainable_aa6d4d32 | TERMINATED | 10.99.15.76:749 | 0.07 | 59 | 0.310418 | 300 | 2.99355 | 59 |
(TunerInternal pid=2096) | trainable_aa89c7a0 | TERMINATED | 10.99.5.8:2131 | 0.05 | 56 | 0.265122 | 300 | 3.44845 | 56 |
(TunerInternal pid=2096) +--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------+
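
The traceback in the distributed case shows `session.report` being called inside a `_RemoteRayLightGBMActor` process, where no Tune session appears to exist. Independently of how the value is eventually reported, the leaf-counting step of the callback above can be factored into a pure function over the dictionary returned by `dump_model()`. A minimal sketch; `count_total_leaves` is a hypothetical helper name and `example_dump` is a hand-written stand-in, not real model output:

```python
from typing import Any, Dict


def count_total_leaves(model_dump: Dict[str, Any]) -> int:
    """Sum num_leaves over every tree in a dump_model()-style dictionary."""
    return sum(tree["num_leaves"] for tree in model_dump["tree_info"])


# Hand-written stand-in with the same shape as the dump used in the callback.
example_dump = {
    "tree_info": [
        {"num_leaves": 6},
        {"num_leaves": 5},
        {"num_leaves": 6},
    ]
}

total = count_total_leaves(example_dump)
print(total)  # 17
```

Keeping the count separate from the reporting call makes it easy to test and to reuse from whichever process ends up talking to Tune.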

Error when running example

Setup

conda create --name lgbm python=3.8
conda activate lgbm
conda install lightgbm
pip install lightgbm_ray

Script:

# light_ray.py
from lightgbm_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer

train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)

evals_result = {}
bst = train(
    {
        "objective": "binary",
        "metric": ["binary_logloss", "binary_error"],
    },
    train_set,
    num_boost_round=10,
    evals_result=evals_result,
    valid_sets=[train_set],
    valid_names=["train"],
    verbose_eval=False,
    ray_params=RayParams(num_actors=2, cpus_per_actor=2))


bst.booster_.save_model("model.lgbm")

Exception:

% python light_ray.py 
Traceback (most recent call last):
  File "light_ray.py", line 1, in <module>
    from lightgbm_ray import RayDMatrix, RayParams, train
  File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm_ray/__init__.py", line 1, in <module>
    from lightgbm_ray.main import RayParams, train, predict
  File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm_ray/main.py", line 44, in <module>
    from lightgbm import LGBMModel, LGBMRanker, Booster
  File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/__init__.py", line 8, in <module>
    from .basic import Booster, Dataset, register_logger
  File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/basic.py", line 95, in <module>
    _LIB = _load_lib()
  File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/basic.py", line 86, in _load_lib
    lib = ctypes.cdll.LoadLibrary(lib_path[0])
  File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/ctypes/__init__.py", line 459, in LoadLibrary
    return self._dlltype(name)
  File "/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/ctypes/__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/lib_lightgbm.so, 6): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
  Referenced from: /Users/will/opt/anaconda3/envs/lgbm/lib/python3.8/site-packages/lightgbm/lib_lightgbm.so
  Reason: image not found
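
The OSError says the LightGBM shared library could not load the OpenMP runtime (libomp.dylib). A small standard-library check for whether an OpenMP runtime can be located at all is sketched below. Note the hedge: `ctypes.util.find_library` does not search exactly the path named in the error, so a hit here does not guarantee that the import will succeed. `find_openmp_runtime` is a hypothetical helper name.

```python
import ctypes.util
import sys


def find_openmp_runtime():
    """Return the first OpenMP runtime the system loader can resolve, or None."""
    # "omp" is LLVM's runtime (libomp); the others are common alternatives.
    for name in ("omp", "gomp", "iomp5"):
        found = ctypes.util.find_library(name)
        if found:
            return found
    return None


runtime = find_openmp_runtime()
print(f"platform: {sys.platform}")
print(f"OpenMP runtime: {runtime or 'not found'}")
```

On macOS, libomp is commonly installed through Homebrew, which is also where the path in the error message points.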
