
universal-battery-database's People

Contributors

harvey2phase, samuel-buteau

universal-battery-database's Issues

Forward compatible ml_smoothing

When updating the dataset, more information will be added.

It would be very convenient to make sure that when a dictionary is missing from the dataset, ml_smoothing still does the right thing. We already have mechanisms in DegradationModel for handling missing values.

That way, even if the dataset is not 100% up to date, the code still runs fine.
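
A minimal sketch of what this could look like (the keys and defaults here are hypothetical):

    DEFAULTS = {
        "v_curves": None,       # handled downstream by DegradationModel
        "cell_metadata": {},
    }

    def get_or_default(dataset: dict, key: str):
        """Return dataset[key] if present, else a safe default."""
        return dataset.get(key, DEFAULTS.get(key))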

shift = shift_0 + A*sqrt(cycle_number)

This is a quick experiment to determine if the cycle dependence of shift is in fact relatively straightforward.

The power of this equation is that A can depend on the avg_cycling_history and on cell_features,
while shift_0 depends only on cell_features.

This equation is not literally true, but if we assume that delta_shift is proportional to sqrt(time), there is a more or less linear relationship between time and cycle number (though the details do depend on cycling_history).
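
As a minimal sketch (assuming shift_0 and A have already been predicted from the features):

    import tensorflow as tf

    def compute_shift(shift_0, A, cycle_number):
        # shift = shift_0 + A * sqrt(cycle_number)
        return shift_0 + A * tf.sqrt(cycle_number)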

downloadable zips

Basically, download all the filtered data as separate CSV files, named with the given cell name and the given filter names.

Then, give options for what to include in the CSVs.
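
A rough sketch of the zip-building step (filtered_csvs is a hypothetical mapping from "<cell_name>_<filter_name>.csv" to CSV text):

    import io
    import zipfile

    def build_zip(filtered_csvs: dict) -> bytes:
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for filename, csv_text in filtered_csvs.items():
                archive.writestr(filename, csv_text)
        return buffer.getvalue()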

incentive governance

Some incentives are applied to the outside of a computation, but some are applied to pieces of a computation that we might not want to expose elsewhere.

There are two issues:

  1. Getting the coefficients in.
  2. Getting the loss function out.

Coefficients in

access self.incentive_coeffs

Loss out

Use multiple returns: each piece returns its output together with its incentive loss.
When computing derivatives, ignore the loss.
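
A sketch of this "multiple returns" pattern (the computation and the incentive term here are placeholders):

    import tensorflow as tf

    def inner_computation(x, incentive_coeffs):
        value = tf.square(x)  # placeholder computation
        # placeholder incentive: keep the value below some bound
        incentive = incentive_coeffs["bound"] * tf.reduce_mean(
            tf.nn.relu(value - 1.0)
        )
        return value, incentive

    def derivative_of_value(x, incentive_coeffs):
        # The incentive loss is still produced, but ignored here, so the
        # derivative is taken through the value alone.
        with tf.GradientTape() as tape:
            tape.watch(x)
            value, _ = inner_computation(x, incentive_coeffs)
        return tape.gradient(value, x)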

Add wiki page for variables that are passed around a lot

  • Short description
  • Where and how they're created (file, class, function/method)
  • Type
  • How they should be used

Examples (some are more self-explanatory than others, but docs/wikis are good to have nonetheless):

  • all_data
  • degradation_model
  • cya_grp_dict

Separate machine_learning from neware_parser

For some reason, the machine learning code has ended up in neware_parser, which makes no sense since it has nothing to do with parsing Neware files.

We want a program to parse cycling files (such as Neware files) and output a universal cycling format.

We want a program to visualize the data.

We want a program to do the machine learning stuff.

Disentangle a Plot engine from a GetData engine

We need a strict demarcation between the code responsible for taking data in and outputting plots, and the code responsible for taking in a partially trained model + dataset and producing the data to be plotted.

Plot:

Doesn't:

  1. do any data processing,
  2. call DegradationModel,
  3. look into a large data structure to find a small piece of data.

Does:

  1. Take in efficiently packed, minimal data.
  2. Take in a bunch of options.
  3. Make nice plots of all types.

GetData:

Doesn't:

  1. do any plotting.

Does:

  1. Take different input types depending on the situation, but always output things in the same format.
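
A hypothetical sketch of the demarcation (PlotData, get_data, and plot are illustrative names):

    from dataclasses import dataclass

    @dataclass
    class PlotData:
        x: list        # minimal, already-extracted data
        y: list
        label: str

    def get_data(model, dataset, **options) -> PlotData:
        # Calls DegradationModel, digs through the dataset, and always
        # returns the same PlotData format.
        ...

    def plot(data: PlotData, options: dict) -> None:
        # Renders plots from PlotData and options only; no data
        # processing, no model calls, no large data structures.
        ...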

remove norm_cycle

For a while now, this has just been the same as cycle.
So let's replace features -> cell_features
and norm_cycle -> cycle.

input method: compound csv file

Given a name + c_rate + cap + a file containing current, voltage, capacity, and step_count,
this will split by step, add a name and a rate, and output many CSV files.
Capacity will be normalized to cap.
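
A rough sketch with pandas (the column names follow the description above; the output naming is illustrative):

    import pandas as pd

    def split_compound_csv(path, name, c_rate, cap):
        # columns: current, voltage, capacity, step_count
        df = pd.read_csv(path)
        df["capacity"] = df["capacity"] / cap  # normalize to cap
        for step, group in df.groupby("step_count"):
            group = group.assign(name=name, rate=c_rate)
            group.to_csv(f"{name}_step_{step}.csv", index=False)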

Make v_curves optional

The code should still run without the v_curves folder.

TODO:

  • fill v_curves with a bunch of zeros and turn off the relevant loss term
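
A sketch of this TODO (the loading convention is hypothetical): return dummy zeros plus a loss weight of zero when the folder is missing.

    import numpy as np

    def load_v_curves(path, shape):
        try:
            return np.load(path), 1.0    # real curves, loss term on
        except FileNotFoundError:
            return np.zeros(shape), 0.0  # zeros, loss term off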

Cell description is only part of the story

Each cell can be a little bit different.
Instead of either 100% latent cell features or 100% features predicted from components,
we can have 100% latent (if components are unknown) or 10% latent + 90% predicted from components (if components are known),

together with a strong incentive (if components are known) to have latent = predicted.
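
A minimal sketch of this blend (the 10%/90% split and the penalty form are illustrative):

    import tensorflow as tf

    def blended_features(latent, predicted, components_known, coeff=1.0):
        if not components_known:
            return latent, 0.0  # fully latent, no incentive
        features = 0.1 * latent + 0.9 * predicted
        # strong incentive to have latent = predicted
        incentive = coeff * tf.reduce_mean(tf.square(latent - predicted))
        return features, incentive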

Clean up and document Keys

Ideally, each key in the Key class refers to the same variable no matter the context or dictionary: if two dictionaries contain the same object/data, they can share the same key, and the same key should never reference different data types/structures/values.

Once the above is done, then each key can be documented in either the Key class or a new Wiki page (or short description in the former and longer one in the latter).

separating and naming cycle groups

For each (dataset, cell_id), be able to create a filter over charge groups/discharge groups and give it a good name, a position in a grid, and a color.

neighborhood placement

The sampling of cycles and protocols is done by placing neighborhoods across the span of the data.

Previously, this was done to smooth the data. But now, nothing really dictates where we put the neighborhoods.

The first 100 cycles are tricky to get right, and so are the last cycles.

So we can try to rewrite this system to correctly emphasize these important regions.

excel file import database

The background task system keeps track of all the files in a given folder. The database knows all the filenames.

We can give an Excel sheet mapping filenames to metadata and upload it to the database.
It will be read and will overwrite the data in the database.
It should have all the metadata columns and a prefilled header.
There should be a link to download a CSV file.

Rows with invalid filenames are ignored.
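
A rough sketch of the import step (update_database is a hypothetical helper; known_filenames comes from the database):

    import pandas as pd

    def import_metadata(xlsx_path, known_filenames: set):
        sheet = pd.read_excel(xlsx_path)
        # rows with invalid filenames are ignored
        valid = sheet[sheet["filename"].isin(known_filenames)]
        for _, row in valid.iterrows():
            update_database(row["filename"], row.to_dict())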

time as a prediction

Create a time predictor, which simply takes cell_features, cycle_history, and cycle_number and outputs time.

Also, include time elapsed as part of the dataset
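
A minimal sketch of such a predictor (the widths and the history summary are illustrative):

    import tensorflow as tf

    def make_time_predictor(feature_dim, history_dim, width=32):
        # input: [cell_features, cycling-history summary, cycle_number]
        inputs = tf.keras.Input(shape=(feature_dim + history_dim + 1,))
        hidden = tf.keras.layers.Dense(width, activation="relu")(inputs)
        time_out = tf.keras.layers.Dense(1)(hidden)
        return tf.keras.Model(inputs, time_out)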

expand config

We need to allow more customization. For instance, which directory should we use as the root of the raw data files?

also, all the magic numbers in the project should be accessible and modifiable.

Maybe a web page could be a convenient way to do that.

This is better than having config files.

Also, some settings should be (optionally) customizable per user.

latent variables should be within distribution of known variables

Basically, anytime a property can be unknown and therefore fit to the data, and we have some known instances of that property, the unknown instances should have typical values.

Assuming the known values come from a Gaussian distribution, we can compute their mean and standard deviation (with some regularization for small datasets).
Based on this, the latent versions of these variables should not be many standard deviations away from the mean.
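
A sketch of the corresponding penalty (the regularization epsilon and the cutoff are illustrative):

    import tensorflow as tf

    def typicality_penalty(known, latent, max_sigmas=3.0, epsilon=1e-2):
        mean = tf.reduce_mean(known)
        # epsilon regularizes the std estimate for small datasets
        std = tf.sqrt(tf.math.reduce_variance(known) + epsilon)
        z = tf.abs(latent - mean) / std
        # penalize only values more than max_sigmas away from the mean
        return tf.reduce_mean(tf.nn.relu(z - max_sigmas) ** 2)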

UserWarning: Converting sparse IndexedSlices

Resolve the following warning, which appears when running ml_smoothing:

python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

when running ml_smoothing

cv predictions for v_direct

we have two branches of prediction: voltage in and state of charge in.

Right now, "voltage in" predicts the constant-current data as well as the constant-voltage data.
We want to extend this to "state of charge in".

more specifically, the equation should be
I = v - V(q/Q_scale + Q(v_init - I_init R))

This requires passing the cv_capacities to the model.

initialization of last layer

The last layer of the neural net is initialized to zero,
but this should only be done for the last neural net.
For the previous pieces such as R, shift, scale, z_cell, etc., it should probably not be initialized to zero.

The goal is to add a final=False parameter to the neural net template and set it to final=True for the capacity network or the voltage network.
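
A sketch of the template change (the layer sizes are illustrative):

    import tensorflow as tf

    def feedforward(width, depth, final=False):
        # only the final network starts its last layer at zero
        last_init = "zeros" if final else "glorot_uniform"
        layers = [tf.keras.layers.Dense(width, activation="relu")
                  for _ in range(depth)]
        layers.append(tf.keras.layers.Dense(1, kernel_initializer=last_init))
        return tf.keras.Sequential(layers)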

output to folder

Same as downloadable zips, except that we create an "automated_output" folder and, periodically, the output folder gets overwritten with the new data.

more efficient model query

Basically, at test time, we call the model a bunch of times and compute everything every time.

Instead, make specialized calls that get all the quantities that don't depend on protocol, then all the scalars that depend on protocols, then all the vector dependencies (probably separately for current_dependent and voltage_dependent).

Also, wrap these calls in @tf.function.
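
A hypothetical sketch of the specialization (cell_quantities and scalars are illustrative method names, not the model's actual API):

    import tensorflow as tf

    @tf.function
    def protocol_independent(model, cell_features):
        # computed once, reused across all protocol queries
        return model.cell_quantities(cell_features)

    @tf.function
    def protocol_dependent_scalars(model, cached, protocols):
        # batched over protocols, reusing the cached quantities
        return model.scalars(cached, protocols)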

train-test split

The way to control what is in the test set is to change which neighborhoods end up in the train set.
For instance, if we filter out fast discharges in the neighborhood compilation, then they become part of the test set.

If a cell is filtered out, then it becomes part of the test set.

If an electrode is filtered out, then it becomes part of the test set.
However, if a token is latent, it can't be used at test time. We need a way to fall back to the generic token or to report that we can't produce results.

Make Foralls about current be distributed uniformly in the log-space instead of the linear-space.

Basically, currents are naturally distributed uniformly in logarithmic space, so it doesn't make sense to sample from a uniform distribution in linear space to enforce constraints.
This creates an important distribution shift between data-based losses and incentive-based losses, which can be exploited by strong models.

The trick is simply to choose the uniform interval as x in [log(a), log(b)] instead of x in [a, b], and then apply exp(x) instead of using x directly.
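
A minimal sketch of the trick:

    import tensorflow as tf

    def sample_currents_log_uniform(n, a, b):
        # x uniform in [log(a), log(b)], so exp(x) is log-uniform in [a, b]
        x = tf.random.uniform(
            (n,), minval=tf.math.log(a), maxval=tf.math.log(b)
        )
        return tf.exp(x)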

Evaluation Problems

Checking the results at 10000 steps is not super insightful because that's only 100 gradient steps per cell, which is quite low.
You can only see that things are not super visibly bad.
