Comments (6)

sgbaird commented on June 11, 2024

Create two time splits: one based on Materials Project IDs (MPIDs) and one based on the earliest ICSD publication year (following the two suggestions linked above).

from xtal2png.

sgbaird commented on June 11, 2024

Useful: sklearn.model_selection.TimeSeriesSplit

sgbaird commented on June 11, 2024

By default (n_splits=5), the TimeSeriesSplit train/test indices for a total of 10 compounds would look like:

TRAIN: [0 1 2 3 4] TEST: [5]
TRAIN: [0 1 2 3 4 5] TEST: [6]
TRAIN: [0 1 2 3 4 5 6] TEST: [7]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8]
TRAIN: [0 1 2 3 4 5 6 7 8] TEST: [9]
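
For reference, a minimal sketch that reproduces the listing above. It assumes the 10 compounds are already sorted by time; `X` is just a placeholder array of row indices, not real crystal data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder data: 10 compounds, assumed to be sorted chronologically already.
X = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit()  # default n_splits=5
default_splits = list(tscv.split(X))

for train_index, test_index in default_splits:
    print("TRAIN:", train_index, "TEST:", test_index)
```

With 10 samples and 5 splits, each test fold holds `10 // (5 + 1) = 1` sample, which is why every TEST set has a single index.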

I'm considering ignoring test_index and using np.setdiff1d so that the test indices include "everything else", i.e. everything other than the train_index values. I could also leave it as-is, but it seems strange not to use all of the data. It makes sense why TimeSeriesSplit is set up this way, though.

sgbaird commented on June 11, 2024

Alternative:

TRAIN: [0 1 2 3 4] TEST: [5 6 7 8 9]
TRAIN: [0 1 2 3 4 5] TEST: [6 7 8 9]
TRAIN: [0 1 2 3 4 5 6] TEST: [7 8 9]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
TRAIN: [0 1 2 3 4 5 6 7 8] TEST: [9]
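
A minimal sketch of how this alternative could be produced: keep the train indices from TimeSeriesSplit, discard its test indices, and take the set difference against all indices with `np.setdiff1d`. The names `all_index` and `extended_splits` are illustrative, and the data is again a placeholder.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_compounds = 10
all_index = np.arange(n_compounds)
X = all_index.reshape(-1, 1)  # placeholder data, assumed sorted by time

extended_splits = []
for train_index, _ in TimeSeriesSplit().split(X):
    # Ignore the default test fold; test on everything not used for training.
    test_index = np.setdiff1d(all_index, train_index)
    extended_splits.append((train_index, test_index))

for train_index, test_index in extended_splits:
    print("TRAIN:", train_index, "TEST:", test_index)
```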

Was expecting the first to be a 20/80 split, though. Maybe just need to wrap around KFold.

sgbaird commented on June 11, 2024

Here's something very close to what I was originally imagining. It's based on KFold(n_splits=6) + some post-processing:

TRAIN: [0 1] TEST: [2 3 4 5 6 7 8 9]
TRAIN: [0 1 2 3] TEST: [4 5 6 7 8 9]
TRAIN: [0 1 2 3 4 5] TEST: [6 7 8 9]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
TRAIN: [0 1 2 3 4 5 6 7 8] TEST: [9]
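
The post-processing itself isn't shown in the thread, so the following is only one guess that happens to reproduce the listing above: take the contiguous (unshuffled) test folds from `KFold(n_splits=6)`, then use the first k folds as the training set and the remaining folds as the test set. The function name `expanding_splits` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold


def expanding_splits(n_samples, n_splits=5):
    """Expanding-window splits built from the test folds of KFold(n_splits + 1)."""
    indices = np.arange(n_samples)
    # Without shuffling, KFold's test folds are contiguous, ordered blocks.
    folds = [test for _, test in KFold(n_splits=n_splits + 1).split(indices)]
    splits = []
    for k in range(1, len(folds)):
        train_index = np.concatenate(folds[:k])  # first k blocks
        test_index = np.concatenate(folds[k:])   # everything after them
        splits.append((train_index, test_index))
    return splits


kfold_splits = expanding_splits(10)
for train_index, test_index in kfold_splits:
    print("TRAIN:", train_index, "TEST:", test_index)
```

For 10 samples, `KFold(n_splits=6)` yields folds of sizes 2, 2, 2, 2, 1, 1, which is why the training set grows by two indices at a time until the last split.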

The default TimeSeriesSplit might not be so bad, since taking the mean and standard deviation of the performance metric is more straightforward: every test fold has the same size, so no weighting is involved. It might be better to go with the more interpretable option, which is the default behavior of TimeSeriesSplit.
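
To make the weighting point concrete, here is a small sketch. The `scores` values are hypothetical placeholders chosen only for illustration (not results from any model in this thread); the test-set sizes 8, 6, 4, 2, 1 come from the KFold-based listing above.

```python
import numpy as np

# Hypothetical per-split errors, for illustration only.
scores = np.array([0.50, 0.45, 0.40, 0.38, 0.35])

# Equal-sized test folds (default TimeSeriesSplit): plain statistics suffice.
plain_mean = scores.mean()
plain_std = scores.std(ddof=1)

# Unequal test folds (KFold-based variant): weight each split by its test size.
test_sizes = np.array([8, 6, 4, 2, 1])
weighted_mean = np.average(scores, weights=test_sizes)
weighted_var = np.average((scores - weighted_mean) ** 2, weights=test_sizes)
weighted_std = np.sqrt(weighted_var)

print(plain_mean, plain_std)
print(weighted_mean, weighted_std)
```

Weighting by test-set size is only one possible choice; the point is that the unequal-fold variants force such a choice, while the default does not.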

@hasan-sayeed @sp8rks feel free to chime in on this one if you have any thoughts.

sgbaird commented on June 11, 2024

@hasan-sayeed you should be able to use mp_time_split per the usage instructions or one of the examples.

If you run into any issues, lmk at https://github.com/sparks-baird/mp-time-split/issues
