
Machine Learning-based Classification of Electrochemical Impedance Spectra

Update (26 April 2023): Clarifications have been made regarding the distribution of synthetic and measured impedance spectra in the labeled and unlabeled portions of the data set.

This repository contains the code for the corresponding publication: "Machine learning benchmarks for the classification of equivalent circuit models from solid-state electrochemical impedance spectra" http://arxiv.org/abs/2302.03362

There was a minor bug in the splitting method for the validation set used for early stopping and learning rate adjustment during CNN training. To reproduce the results associated with the manuscript, please check out commit d893efd. The newest commit yields a slightly higher CNN performance; the ranking of the algorithms remains unaffected.

Setup

A requirements.txt file and, alternatively, an environment.yml file are provided to create the Python environment needed to run this code.

To set up the environment, please run the following:

conda create -n "eis-ml" python=3.9.15 ipython
conda activate eis-ml
pip install -r requirements.txt

If you are on macOS, you might have to install TensorFlow and tsfresh separately and manually fix the installation of sub-dependencies.

Workflow

  1. Run the preprocess.py file to calculate all required data, features, and images. (The GitHub repository comes with the data for the RF and CNN models. The tsfresh feature files are too large to ship; therefore, this step is required if you want to run the XGB model.)
  2. Run the model of your choice. The results are automatically saved in the respective results folder. The name of the folder is based on the timestamp.

Notebooks:
data_vis_and_exploration.ipynb: Exploring the data and making plots.

.py files:
utils.py: Plotting code.
utils_preprocessing.py: Code to preprocess the .csv files containing the spectra.
preprocess.py: Preprocessing of the data.
clf_rf.py: Random forest model script (train, test, save results).
clf_xgb.py: XGB model script (train, test, save results).
clf_cnn.py: CNN model script (train, test, save results).

eis folder: EIS toolkit that Raymond Gasper wrote for the BatteryDEV competition. Includes advanced options to simulate EIS data, visualize EIS data, and optimize equivalent circuit parameters based on initial guesses.

miscellaneous folder: Contains code to reproduce the results shown in the supplementary information.

Contribute

We welcome further contributions to this repository. If you find bugs or are interested in new features and capabilities, please raise an issue or open a pull request.

Data

QuantumScape (QS) provided the EIS data contained in this repository. The first data set comprises approximately 9,300 synthetic spectra with the associated Equivalent Circuit Model (ECM). The second data set contains approximately 19,000 unlabeled spectra, consisting of about 80% synthetic and 20% measured data. The parameter ranges for all synthetic data are informed by QS R&D. The measured spectra are from a range of different materials, with some replicate measurements at different temperatures, States of Charge (SOC), and/or States of Health (SOH).

The labeled data set is split into training and test sets: data/train_data.csv and data/test_data.csv. Furthermore, the repository contains approximately 19,000 unlabeled spectra: data/unlabeled_data.csv. We thank Tim Holme from QuantumScape for providing these data sets.

License

The code in this repository is made publicly available under the terms of the MIT license, as denoted in the LICENSE file. The data is shared under the terms of the CC-BY 4.0 license, according to the file data/LICENSE.

Acknowledgment/Citation

If you use code from this repository for your work, please cite:

@article{Schaeffer_2023,
author = {Joachim Schaeffer and Paul Gasper and Esteban Garcia-Tamayo and Raymond Gasper and Masaki Adachi and Juan Pablo Gaviria-Cardona and Simon Montoya-Bedoya and Anoushka Bhutani and Andrew Schiek and Rhys Goodall and Rolf Findeisen and Richard D. Braatz and Simon Engelke},
doi = {10.1149/1945-7111/acd8fb},
journal = {Journal of The Electrochemical Society},
month = {jun},
number = {6},
pages = {060512},
publisher = {IOP Publishing},
title = {Machine Learning Benchmarks for the Classification of Equivalent Circuit Models from Electrochemical Impedance Spectra},
volume = {170},
year = {2023},
}


Issues

Train/Validation Split Bug CNN

Hi,
I've been working through your benchmark and noticed a slight bug in the train/val dataset generation for the CNN part.
Specifically, these lines:

AutoECM/clf_cnn.py, lines 152 to 173 at commit d893efd:

train_ds = tf.keras.utils.image_dataset_from_directory(
    f"{input_dir}/train",
    validation_split=0.1,
    subset="training",
    seed=20,
    image_size=input_shape,
    label_mode="categorical",
    shuffle=True,
    color_mode="grayscale",
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    f"{input_dir}/train",
    validation_split=0.1,
    subset="validation",
    seed=20,
    image_size=input_shape,
    batch_size=64,
    label_mode="categorical",
    shuffle=False,
    color_mode="grayscale",
)

The shuffle operation is performed before the train/validation cut, so the validation set will almost certainly contain images from the training set (the intersection of the resulting datasets will be approximately as large as the validation set itself).
I also implemented a small test to illustrate the problem (see the end of this issue).

I think there was just a bit of confusion as to how the seed parameter works.
In order to ensure that the train and validation sets are disjoint you need to set the seed parameter to the same value for both datasets AND set both shuffle flags to the same value. This will ensure that the same random permutation is used for both datasets and thus no intersection will exist.
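This can be illustrated without TensorFlow at all. The sketch below (hypothetical file names and a 10% split, chosen only for illustration) mimics the relevant behaviour: the file list is optionally shuffled with a seeded RNG before the validation fraction is cut off, which is analogous to how the `seed`/`shuffle` arguments interact in `image_dataset_from_directory`.

```python
import random

# Hypothetical file list standing in for the images on disk.
files = [f"img_{i:04d}.png" for i in range(1000)]
VAL_SPLIT = 0.1
SEED = 20


def split(file_list, shuffle, seed):
    """Mimic the seed/shuffle interaction: optionally shuffle with a
    seeded RNG, then cut off the validation fraction."""
    order = list(file_list)
    if shuffle:
        random.Random(seed).shuffle(order)
    n_val = int(len(order) * VAL_SPLIT)
    return order[n_val:], order[:n_val]  # (train, val)


# Buggy combination: shuffle only one of the two calls. The calls see
# different orderings of the same files, so the two cuts overlap.
train_buggy, _ = split(files, shuffle=True, seed=SEED)
_, val_buggy = split(files, shuffle=False, seed=SEED)
overlap_buggy = set(train_buggy) & set(val_buggy)

# Fixed combination: same seed AND same shuffle flag for both calls. Both
# apply the identical permutation and cut it at the same point, so the
# resulting sets are disjoint by construction.
train_fixed, _ = split(files, shuffle=True, seed=SEED)
_, val_fixed = split(files, shuffle=True, seed=SEED)
overlap_fixed = set(train_fixed) & set(val_fixed)

print(f"buggy overlap: {len(overlap_buggy)} files")  # non-empty
print(f"fixed overlap: {len(overlap_fixed)} files")  # empty
```

With the buggy combination, roughly a validation-split-sized fraction of the validation files also appears in the training set; with matching seed and shuffle flags the overlap is empty.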

All in all, I don't think this will have a huge impact on the test results of the models, but I thought it's worth noting because I was confused for a while about why my validation accuracy was so high. I think it also affects how hyperparameters are tuned/selected, since the validation accuracy can no longer be trusted as a good indicator of model performance, especially when using early stopping and other callbacks that rely on it.

I hope this helps anyone else stumbling across it.
I think the simplest fix is to set the shuffle parameter to True for both datasets, but I know that doing so now would probably interfere with some reproducibility concerns.

Thanks,
Martin Christoph Frank

The short test script:

import tensorflow as tf

INPUT_PATH = "data/images"
SEED = 20
SHUFFLE_TRAIN = True
SHUFFLE_VAL = False
VAL_SPLIT = 0.1


def create_datasets(input_dir):
    train_ds = tf.keras.utils.image_dataset_from_directory(
        f"{input_dir}/train",
        validation_split=VAL_SPLIT,
        subset="training",
        seed=SEED,
        shuffle=SHUFFLE_TRAIN,
    )

    val_ds = tf.keras.utils.image_dataset_from_directory(
        f"{input_dir}/train",
        validation_split=VAL_SPLIT,
        subset="validation",
        seed=SEED,
        batch_size=64,
        shuffle=SHUFFLE_VAL,
    )

    return train_ds, val_ds


if __name__ == "__main__":
    train_ds, val_ds = create_datasets(INPUT_PATH)

    train_paths = train_ds.file_paths
    val_paths = val_ds.file_paths

    # check for intersection
    intersection = set(train_paths).intersection(set(val_paths))
    if len(intersection) > 0:
        print("Intersection exists")
        print(f"n_train: {len(train_paths)}")
        print(f"n_val: {len(val_paths)}")
        print(
            f"n_intersection: {len(intersection)} ({100*len(intersection)/len(train_paths):.2f} %)"
        )
    else:
        print("No intersection")
        print(f"n_train: {len(train_paths)}")
        print(f"n_val: {len(val_paths)}")
        print(f"n_intersection: {len(intersection)}")

keras version conflict in requirements.txt

Good morning,
the keras version in the requirements leads to an error when trying to set up the environment as detailed in the README:
ERROR: Cannot install -r requirements.txt (line 12) and keras==2.10.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested keras==2.10.0
    tensorflow 2.11.1 depends on keras<2.12 and >=2.11.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

I fixed it by changing the keras version to 2.11.0, but maybe there is a better solution?
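For reference, a minimal sketch of the corresponding change to requirements.txt, assuming the TensorFlow pin stays at 2.11.1 (the surrounding lines and exact line numbers in the file may differ):

```
# requirements.txt (excerpt)
tensorflow==2.11.1
keras==2.11.0  # was keras==2.10.0, which conflicts with tensorflow 2.11.1
```

TensorFlow 2.11.1 requires keras>=2.11.0,<2.12, so any keras pin inside that range resolves the conflict.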
