TL;DR: a scoring method is chosen, but it does not seem to be followed.

Questionable Score Fitting (tpot issue, closed)

jeff-hykin commented on June 14, 2024


perib commented on June 14, 2024

We would appreciate the PR! We have moved development to focus on TPOT2, which you can find here: https://github.com/EpistasisLab/tpot2 . It was rebuilt from the ground up to be more modular and easier to develop.

In TPOT2, we have the parameters validation_fraction and validation_strategy to try to address this issue. It can hold out x% of the training data, which it will use as a validation set for the pareto front models.

perib commented on June 14, 2024

Great! Glad the issue has been resolved. I would also recommend trying out our next version of TPOT, TPOT2, found here: https://github.com/EpistasisLab/tpot2

perib commented on June 14, 2024

Your links go to a 404 error page; the repository is probably private. Your screenshot shows that one of the CV scores has an expected value of around .68. (It would be helpful to copy/paste your code rather than use screenshots, so we can quickly test it ourselves).

Is the lower accuracy you refer to on the out-of-sample test set? Sometimes TPOT can overfit the CV score with overly complex pipelines. So, while it is "correctly" optimizing the objective function, the result is a pipeline that performs poorly on held-out data. One option is to look at pareto_front_fitted_pipelines_ to see if any of those simpler pipelines perform better. (For example, you could hold out a validation set, use that to select from the Pareto front, and then do a final test on your out-of-sample test set.) By default tpot uses 5-fold CV; you could try setting cv=10. You can also limit complexity with template="Selector-Transformer-Classifier".
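To make the validation-set suggestion concrete, here is a rough sketch, not an official TPOT recipe. The helper names (`f1_binary`, `select_from_pareto_front`, `run_tpot_example`), the split sizes, and the generations/population values are all made up for illustration. It assumes classic TPOT, where, as far as I recall from the docs, `pareto_front_fitted_pipelines_` is a dict of fitted pipelines keyed by their string representation and is only populated when `verbosity=3`.

```python
from typing import Any, Callable, Mapping, Sequence, Tuple


def f1_binary(y_true: Sequence[Any], y_pred: Sequence[Any], positive: Any = 1) -> float:
    """Plain-Python binary F1, so the selection helper has no dependencies."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_from_pareto_front(
    fitted_pipelines: Mapping[str, Any],
    X_val: Any,
    y_val: Sequence[Any],
    scorer: Callable[[Sequence[Any], Sequence[Any]], float] = f1_binary,
) -> Tuple[str, float]:
    """Return (name, score) of the pipeline that scores best on the validation set."""
    best_name, best_score = None, float("-inf")
    for name, pipeline in fitted_pipelines.items():
        score = scorer(y_val, pipeline.predict(X_val))
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        raise ValueError("fitted_pipelines is empty")
    return best_name, best_score


def run_tpot_example(X, y):
    """Hypothetical end-to-end usage; needs tpot and scikit-learn installed."""
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    # Three-way split: train (TPOT's internal CV), validation (pick a pipeline),
    # test (one final check). The sizes are arbitrary choices.
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_dev, y_dev, test_size=0.25, stratify=y_dev, random_state=0
    )

    tpot = TPOTClassifier(
        generations=5,            # small illustrative values
        population_size=20,
        cv=10,                    # more folds than the default of 5
        scoring="f1",
        template="Selector-Transformer-Classifier",
        verbosity=3,              # assumed to be required for the Pareto front dict
        random_state=0,
    )
    tpot.fit(X_train, y_train)

    front = tpot.pareto_front_fitted_pipelines_
    name, val_score = select_from_pareto_front(front, X_val, y_val)
    test_score = f1_binary(y_test, front[name].predict(X_test))
    return name, val_score, test_score
```

The point of the extra split is that the test set is only touched once, after the pipeline has been chosen, so the final number is not biased by the selection step.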

When I run TPOT on a test dataset with this scorer, it seems to accurately maximize the objective function.

jeff-hykin commented on June 14, 2024

Thanks for the reply, and sorry about the links, I forgot the development repo was private.

I think you're right, and yes, there is a held-out dataset for the final evaluation score. I was thinking 5-fold CV within tpot would be enough to detect/prevent overfitting, but the training set does have very few positive examples, so I probably should've been looking for it. I'll run some tests, and if that is the case, then this is definitely a non-issue (and sorry about that). If it is the case, maybe I'll add some overfitting detection or a limited-examples warning in a PR.
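For illustration only, a limited-examples warning of the kind mentioned above might look something like the following. This is a hypothetical sketch and not code from tpot or any PR; the function name and the threshold of 10 examples per fold are arbitrary assumptions.

```python
import warnings
from collections import Counter
from typing import Any, Dict, Iterable


def warn_if_few_examples_per_fold(
    y: Iterable[Any], n_splits: int = 5, min_per_fold: int = 10
) -> Dict[Any, int]:
    """Warn when a class would average fewer than `min_per_fold` examples per CV fold.

    Returns a dict mapping each flagged label to its total count.
    """
    counts = Counter(y)
    flagged = {
        label: n for label, n in counts.items() if n / n_splits < min_per_fold
    }
    if flagged:
        warnings.warn(
            f"Classes {sorted(flagged, key=str)} average fewer than "
            f"{min_per_fold} examples per fold with {n_splits}-fold CV; "
            "CV scores may be noisy and easy to overfit."
        )
    return flagged
```

Calling this on the training labels before fitting would at least flag the few-positives situation described here, though it says nothing about whether overfitting actually happened.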

As for reproducing, no worries, I was just looking for high-level feedback first (rather than a full reproduction/debugging/analysis). If it does appear to be a bug, I'll put in the work to isolate the dataset+code and make it fully reproducible on a public repo.

jeff-hykin commented on June 14, 2024

Alright, I confirmed sample size was the issue (i.e. not an issue with tpot).

Not only that, but tpot worked amazingly! I was able to get +10% on F1 score over the best hand-crafted architecture for this problem.

Thanks again for the help
