Comments (2)
@nzw0301 Thank you for your reply. #5118 seems to have a confusing objective, and the answers there appear to solve it using features already implemented in Optuna.
The feature proposed here might also help with #5118, but I would consider this an independent issue. See below a diagram of the proposed structure. The role of the parent wrapper is to allow comparison between different runs of nested cross-validation while keeping the study database table organized and the optuna-dashboard tidy. Maybe we can adopt the term "Experiment" for this new class?
The results of individual studies are irrelevant when assessing the results of nested cross-validation, as we are interested in the overall performance across all studies (i.e., the mean accuracy of the best model of each study). The "Experiment" class could implement a method `set_metric()` or equivalent to manually set the overall performance metric of the experiment. This parent wrapper could also serve purposes other than nested cross-validation, allowing users to organize studies into groups.
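To make the idea concrete, here is a minimal pure-Python sketch of such a wrapper. The class name, constructor arguments, and methods are hypothetical, not part of Optuna's current API:

```python
class Experiment:
    """Hypothetical wrapper that groups Optuna studies under one experiment."""

    def __init__(self, experiment_name, storage=None):
        self.experiment_name = experiment_name
        self.storage = storage  # e.g. 'sqlite:///example-storage.db'
        self.studies = []       # child studies grouped under this experiment
        self.metrics = {}       # overall metrics set via set_metric()

    def add_study(self, study):
        # A real implementation would persist a foreign key to this
        # experiment in the study table instead of keeping a Python list.
        self.studies.append(study)

    def set_metric(self, name, value):
        # Analogous to Study.set_user_attr(), but at the experiment level.
        self.metrics[name] = value
```

Since the overall metric lives on the experiment rather than on any single study, the dashboard could show one summary row per experiment instead of one row per study.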
I think users should not be forced to create an Experiment in order to create Studies, as this would break current Optuna code and many users won't need Experiments at all. Instead, an Experiment object could be created and passed when creating Study objects. The study database table could then store a nullable foreign key to the parent experiment. See below an example usage:
```python
import optuna
import numpy as np
from sklearn.model_selection import KFold, cross_val_score

# Suppose that `X` and `y` are the input features and input labels,
# respectively, and that `pipeline` is an sklearn Pipeline.

search_spaces = {
    'param_1': optuna.distributions.FloatDistribution(0.001, 0.4, log=True),
    'param_2': optuna.distributions.IntDistribution(10, 500),
}
storage = 'sqlite:///example-storage.db'

# Here, we create the new Experiment object.
experiment = Experiment(
    experiment_name='nestedCV-1',
    storage=storage,
)

outer_scores = []
cv_outer = KFold()
for split, (train_idx, test_idx) in enumerate(cv_outer.split(X, y)):
    study = optuna.create_study(
        study_name=f'study_{split}',
        storage=storage,
        direction='maximize',
        experiment=experiment,  # Assign this study to the experiment we defined.
    )
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    def objective(trial):
        trial_params = {key: trial._suggest(key, dist) for key, dist in search_spaces.items()}
        pipeline.set_params(**trial_params)
        cv_inner = KFold()
        scores = cross_val_score(pipeline, X_train, y_train, cv=cv_inner)
        return scores.mean()

    study.optimize(objective, n_trials=100)

    best_params = study.best_params
    pipeline.set_params(**best_params)
    pipeline.fit(X_train, y_train)
    test_score = pipeline.score(X_test, y_test)
    outer_scores.append(test_score)

# Add the overall performance metric (similar to `set_user_attr()`).
experiment.set_metric('Mean Accuracy', np.mean(outer_scores))
```
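For illustration, the nullable foreign-key layout mentioned above could be sketched with the standard-library `sqlite3` module. The table and column names here are assumptions for the sake of the example, not Optuna's actual schema:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE experiments (
    experiment_id   INTEGER PRIMARY KEY,
    experiment_name TEXT UNIQUE NOT NULL
);
CREATE TABLE studies (
    study_id      INTEGER PRIMARY KEY,
    study_name    TEXT UNIQUE NOT NULL,
    -- Nullable: studies created without an experiment keep working.
    experiment_id INTEGER REFERENCES experiments (experiment_id)
);
""")

# Register one experiment and attach five studies to it.
conn.execute("INSERT INTO experiments (experiment_name) VALUES ('nestedCV-1')")
exp_id = conn.execute(
    "SELECT experiment_id FROM experiments WHERE experiment_name = 'nestedCV-1'"
).fetchone()[0]
for split in range(5):
    conn.execute(
        "INSERT INTO studies (study_name, experiment_id) VALUES (?, ?)",
        (f'study_{split}', exp_id),
    )

# All studies of one experiment can then be fetched with a single join,
# which is what would keep the dashboard tidy.
rows = conn.execute(
    "SELECT s.study_name FROM studies s "
    "JOIN experiments e ON s.experiment_id = e.experiment_id "
    "WHERE e.experiment_name = 'nestedCV-1'"
).fetchall()
```

Because the column is nullable, existing code that creates studies without an experiment would be unaffected.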
Thank you for the feature request. Do you think #5118 is the same as this request?