Comments (6)
Create two time-splits, one based on MPIDs and one based on ICSD earliest pub year (based on two suggestions linked above).
from xtal2png.
Useful: sklearn.model_selection.TimeSeriesSplit
By default, TimeSeriesSplit train/test indices for a total of 10 compounds would look like:
TRAIN: [0 1 2 3 4] TEST: [5]
TRAIN: [0 1 2 3 4 5] TEST: [6]
TRAIN: [0 1 2 3 4 5 6] TEST: [7]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8]
TRAIN: [0 1 2 3 4 5 6 7 8] TEST: [9]
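For reference, a minimal sketch that should reproduce the listing above, assuming the 10 compounds are stand-in integer indices already sorted from oldest to newest (the variable names are illustrative only):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten placeholder "compounds", assumed to be sorted chronologically already.
X = np.arange(10).reshape(-1, 1)

# TimeSeriesSplit uses n_splits=5 unless told otherwise.
tscv = TimeSeriesSplit()
default_splits = list(tscv.split(X))

for train_index, test_index in default_splits:
    print("TRAIN:", train_index, "TEST:", test_index)
```

Each test fold holds a single compound here because 10 samples divided over n_splits + 1 = 6 blocks leaves one sample per test block.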
Considering ignoring test_index and using np.setdiff1d to make the test indices include "everything else", i.e. every index that is not in train_index. Or could leave it as-is, but it seems kind of strange not to use all the data. Makes sense why TimeSeriesSplit is set up this way, though.
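A rough sketch of that idea, assuming the same 10 placeholder indices as before: the test indices returned by TimeSeriesSplit are discarded and replaced with the set difference between all indices and the training indices, computed with NumPy's `np.setdiff1d`. The names `all_index` and `expanded_splits` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten placeholder "compounds", assumed to be sorted chronologically already.
X = np.arange(10).reshape(-1, 1)
all_index = np.arange(len(X))

expanded_splits = []
for train_index, _ in TimeSeriesSplit().split(X):
    # Ignore the returned test indices; test on everything outside the training set.
    test_index = np.setdiff1d(all_index, train_index)
    expanded_splits.append((train_index, test_index))
    print("TRAIN:", train_index, "TEST:", test_index)
```

Because the training indices always form a leading block, the complement is everything that comes later in time, so no future data leaks into training.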
Alternative:
TRAIN: [0 1 2 3 4] TEST: [5 6 7 8 9]
TRAIN: [0 1 2 3 4 5] TEST: [6 7 8 9]
TRAIN: [0 1 2 3 4 5 6] TEST: [7 8 9]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
TRAIN: [0 1 2 3 4 5 6 7 8] TEST: [9]
Was expecting the first to be a 20/80 split, though. Maybe just need to wrap around KFold.
Here's something very close to what I was originally imagining. It's based on KFold(n_splits=6) + some post-processing:
TRAIN: [0 1] TEST: [2 3 4 5 6 7 8 9]
TRAIN: [0 1 2 3] TEST: [4 5 6 7 8 9]
TRAIN: [0 1 2 3 4 5] TEST: [6 7 8 9]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
TRAIN: [0 1 2 3 4 5 6 7 8] TEST: [9]
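The post-processing is not spelled out in the comment. One guess that happens to reproduce the listing above is to take the contiguous, unshuffled KFold test folds in order, train on the first k of them, and test on the rest. The helper name `cumulative_kfold_splits` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold


def cumulative_kfold_splits(n_samples, n_splits=6):
    """Expanding-window splits built from contiguous KFold folds (assumed scheme)."""
    X = np.arange(n_samples)
    # Without shuffling, KFold yields contiguous test folds in index order.
    folds = [test for _, test in KFold(n_splits=n_splits, shuffle=False).split(X)]
    splits = []
    for k in range(1, n_splits):
        train_index = np.concatenate(folds[:k])  # first k folds, i.e. the oldest data
        test_index = np.concatenate(folds[k:])   # everything newer
        splits.append((train_index, test_index))
    return splits


kfold_splits = cumulative_kfold_splits(10)
for train_index, test_index in kfold_splits:
    print("TRAIN:", train_index, "TEST:", test_index)
```

With 10 samples and 6 folds, KFold makes the first four folds two samples long and the last two one sample long, which is why the final split trains on nine compounds rather than eight.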
Default TimeSeriesSplit might not be so bad in that taking the mean and standard deviation of the performance metric is more straightforward: no weighting involved. Might be better to go with the more interpretable option, which is the default behavior of TimeSeriesSplit.
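To illustrate the weighting point, here is a small sketch with made-up placeholder scores (not real results). With equal-sized test folds, a plain mean and standard deviation summarize the metric; with unequal test sets, one option (an assumption here, not something settled in this thread) would be to weight each score by its test-set size.

```python
import numpy as np

# Hypothetical per-split scores (e.g. an error metric); placeholders only.
scores = np.array([0.50, 0.40, 0.35, 0.30, 0.25])

# Equal-sized test folds (default TimeSeriesSplit): plain statistics are enough.
plain_mean = scores.mean()
plain_std = scores.std(ddof=1)  # sample standard deviation

# Unequal test sets (sizes 8, 6, 4, 2, 1 as in the KFold-based scheme):
# a size-weighted mean is one possible way to aggregate.
test_sizes = np.array([8, 6, 4, 2, 1])
weighted_mean = np.average(scores, weights=test_sizes)

print(plain_mean, plain_std, weighted_mean)
```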
@hasan-sayeed @sp8rks feel free to chime in on this one if you have any thoughts.
@hasan-sayeed you should be able to use mp_time_split per the usage instructions or one of the examples.
If you run into any issues, lmk at https://github.com/sparks-baird/mp-time-split/issues