
universal-battery-database's People

Contributors

harvey2phase, samuel-buteau

universal-battery-database's Issues

Forward compatible ml_smoothing

When updating the dataset, more information will be added.

It would be very convenient to make sure that when a dictionary is missing from the dataset, ml_smoothing still does the right thing. We already have mechanisms in DegradationModel for handling missing values.

That way, even if the dataset is not 100% up to date, the code still runs fine.
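
A minimal sketch of what this could look like (the keys and defaults here are hypothetical):

    DEFAULTS = {
        "v_curves": None,       # handled downstream by DegradationModel
        "cell_metadata": {},
    }

    def get_or_default(dataset: dict, key: str):
        """Return dataset[key] if present, else a safe default."""
        return dataset.get(key, DEFAULTS.get(key))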

shift = shift_0 + A*sqrt(cycle_number)

This is a quick experiment to determine if the cycle dependence of shift is in fact relatively straightforward.

The power of this equation is that A can depend on the avg_cycling_history and on cell_features,
while shift_0 depends only on cell_features.

This equation is not literally true, but if we assume that delta_shift is proportional to sqrt(time), there is a more or less linear relationship between time and cycle number (though the details do depend on cycling_history).
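
As a minimal sketch (assuming shift_0 and A have already been predicted from the features):

    import tensorflow as tf

    def compute_shift(shift_0, A, cycle_number):
        # shift = shift_0 + A * sqrt(cycle_number)
        return shift_0 + A * tf.sqrt(cycle_number)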

downloadable zips

Basically, download all the filtered data as separate CSV files, named with the given cell name and the given filter names.

Then, give options for what to include in the CSVs.
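
A rough sketch of the zip-building step (filtered_csvs is a hypothetical mapping from "<cell_name>_<filter_name>.csv" to CSV text):

    import io
    import zipfile

    def build_zip(filtered_csvs: dict) -> bytes:
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for filename, csv_text in filtered_csvs.items():
                archive.writestr(filename, csv_text)
        return buffer.getvalue()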

incentive governance

Some incentives are applied to the outside of a computation, but some are applied to pieces of a computation that we might not want to expose elsewhere.

There are two issues:

  1. Getting the coefficients in.
  2. Getting the loss function out.

Coefficients in

access self.incentive_coeffs

Loss out

Use multiple returns: each piece returns its output together with its incentive loss.
When computing derivatives, ignore the loss.
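
A sketch of this "multiple returns" pattern (the computation and the incentive term here are placeholders):

    import tensorflow as tf

    def inner_computation(x, incentive_coeffs):
        value = tf.square(x)  # placeholder computation
        # placeholder incentive: keep the value below some bound
        incentive = incentive_coeffs["bound"] * tf.reduce_mean(
            tf.nn.relu(value - 1.0)
        )
        return value, incentive

    def derivative_of_value(x, incentive_coeffs):
        # The incentive loss is still produced, but ignored here, so the
        # derivative is taken through the value alone.
        with tf.GradientTape() as tape:
            tape.watch(x)
            value, _ = inner_computation(x, incentive_coeffs)
        return tape.gradient(value, x)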

Add wiki page for variables that are passed around a lot

  • Short description
  • Where and how they're created (file, class, function/method)
  • Type
  • How they should be used

Examples (some are more self-explanatory than others, but docs/wikis are good to have nonetheless):

  • all_data
  • degradation_model
  • cya_grp_dict

Separate machine_learning from neware_parser

For some reason, the machine learning code has ended up in neware_parser, which makes no sense since it has nothing to do with parsing Neware files.

We want a program to parse cycling files (such as Neware files) and output a universal cycling format.

We want a program to visualize the data.

We want a program to do the machine learning stuff.

Disentangle a Plot engine from a GetData engine

We need a strict demarcation between the code responsible for taking data in and outputting plots, and the code responsible for taking in a partially trained model + dataset and producing the data to be plotted.

Plot:

Doesn't:

  1. do any data processing,
  2. call DegradationModel,
  3. look into a large data structure to find a small piece of data.

Does:

  1. Take in efficiently packed, minimal data.
  2. Take in a bunch of options.
  3. Make nice plots of all types.

GetData:

Doesn't:

  1. do any plotting.

Does:

  1. Take different input types depending on the situation, but always output things in the same format.
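
A hypothetical sketch of the demarcation (PlotData, get_data, and plot are illustrative names):

    from dataclasses import dataclass

    @dataclass
    class PlotData:
        x: list        # minimal, already-extracted data
        y: list
        label: str

    def get_data(model, dataset, **options) -> PlotData:
        # Calls DegradationModel, digs through the dataset, and always
        # returns the same PlotData format.
        ...

    def plot(data: PlotData, options: dict) -> None:
        # Renders plots from PlotData and options only; no data
        # processing, no model calls, no large data structures.
        ...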

remove norm_cycle

For a while now, this has just been the same as cycle.
So let's replace features -> cell_features
and norm_cycle -> cycle.

input method: compound csv file

Given a name + c_rate + cap + a file containing current, voltage, capacity, and step_count,
this will split by step, add a name and a rate, and output many CSV files.
Capacity will be normalized to cap.
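
A rough sketch with pandas (the column names follow the description above; the output naming is illustrative):

    import pandas as pd

    def split_compound_csv(path, name, c_rate, cap):
        # columns: current, voltage, capacity, step_count
        df = pd.read_csv(path)
        df["capacity"] = df["capacity"] / cap  # normalize to cap
        for step, group in df.groupby("step_count"):
            group = group.assign(name=name, rate=c_rate)
            group.to_csv(f"{name}_step_{step}.csv", index=False)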

Make v_curves optional

The code should still run without the v_curves folder.

TODO:

  • fill v_curves with a bunch of zeros and turn off the relevant loss term
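
A sketch of this TODO (the loading convention is hypothetical): return dummy zeros plus a loss weight of zero when the folder is missing.

    import numpy as np

    def load_v_curves(path, shape):
        try:
            return np.load(path), 1.0    # real curves, loss term on
        except FileNotFoundError:
            return np.zeros(shape), 0.0  # zeros, loss term off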

Cell description is only part of the story

Each cell can be a little bit different.
Instead of either 100% latent cell features or 100% features predicted from components,
we can have 100% latent (if components are unknown) or 10% latent + 90% predicted from components (if components are known),

together with a strong incentive (if components are known) to have latent = predicted.
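
A minimal sketch of this blend (the 10%/90% split and the penalty form are illustrative):

    import tensorflow as tf

    def blended_features(latent, predicted, components_known, coeff=1.0):
        if not components_known:
            return latent, 0.0  # fully latent, no incentive
        features = 0.1 * latent + 0.9 * predicted
        # strong incentive to have latent = predicted
        incentive = coeff * tf.reduce_mean(tf.square(latent - predicted))
        return features, incentive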

Clean up and document Keys

Ideally, each key in the Key class refers to the same variable no matter the context or dictionary: if two dictionaries contain the same object/data, they can share the same key, and the same key should never reference different data types/structures/values.

Once the above is done, then each key can be documented in either the Key class or a new Wiki page (or short description in the former and longer one in the latter).

separating and naming cycle groups

For each (dataset, cell_id), be able to create a filter over charge groups/discharge groups and give it a good name, a position in a grid, and a color.

neighborhood placement

The sampling of cycles and protocols is done by placing neighborhoods across the span of the data.

Previously, this was done to smooth the data. But now, nothing really dictates where we put the neighborhoods.

The first 100 cycles are tricky to get right, and so are the last cycles.

So we can try to rewrite this system to correctly emphasize these important regions.

excel file import database

The background task system keeps track of all the files in a given folder. The database knows all the filenames.

We can give an Excel sheet mapping filenames to metadata and upload it to the database.
It will be read and will overwrite the data in the database.
It should have all the metadata columns and a prefilled header.
There should be a link to download a CSV file.

Rows with invalid filenames are ignored.
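
A rough sketch of the import step (update_database is a hypothetical helper; known_filenames comes from the database):

    import pandas as pd

    def import_metadata(xlsx_path, known_filenames: set):
        sheet = pd.read_excel(xlsx_path)
        # rows with invalid filenames are ignored
        valid = sheet[sheet["filename"].isin(known_filenames)]
        for _, row in valid.iterrows():
            update_database(row["filename"], row.to_dict())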

time as a prediction

Create a time predictor, which simply takes cell_features, cycle_history, and cycle_number and outputs time.

Also, include time elapsed as part of the dataset
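
A minimal sketch of such a predictor (the widths and the history summary are illustrative):

    import tensorflow as tf

    def make_time_predictor(feature_dim, history_dim, width=32):
        # input: [cell_features, cycling-history summary, cycle_number]
        inputs = tf.keras.Input(shape=(feature_dim + history_dim + 1,))
        hidden = tf.keras.layers.Dense(width, activation="relu")(inputs)
        time_out = tf.keras.layers.Dense(1)(hidden)
        return tf.keras.Model(inputs, time_out)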

expand config

We need to allow more customization. For instance, which directory should we use as the root of the raw data files?

also, all the magic numbers in the project should be accessible and modifiable.

Maybe a web page could be a convenient way to do that.

This is better than having config files.

Also, some settings should be (optionally) customizable per user.

latent variables should be within distribution of known variables

Basically, anytime a property can be unknown and therefore fit to the data, and we have some known instances of that property, the unknown instances should have typical values.

Assuming the known values come from a Gaussian distribution, we can compute their mean and standard deviation (with some regularization for small datasets).
Based on this, the latent versions of these variables should not be many standard deviations away from the mean.
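
A sketch of the corresponding penalty (the regularization epsilon and the cutoff are illustrative):

    import tensorflow as tf

    def typicality_penalty(known, latent, max_sigmas=3.0, epsilon=1e-2):
        mean = tf.reduce_mean(known)
        # epsilon regularizes the std estimate for small datasets
        std = tf.sqrt(tf.math.reduce_variance(known) + epsilon)
        z = tf.abs(latent - mean) / std
        # penalize only values more than max_sigmas away from the mean
        return tf.reduce_mean(tf.nn.relu(z - max_sigmas) ** 2)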

UserWarning: Converting sparse IndexedSlices

Resolve the following warning, which appears when running ml_smoothing:

python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

when running ml_smoothing

cv predictions for v_direct

we have two branches of prediction: voltage in and state of charge in.

Right now, "voltage in" predicts the constant-current data as well as the constant-voltage data.
We want to extend this to "state of charge in".

more specifically, the equation should be
I = v - V(q/Q_scale + Q(v_init - I_init R))

This requires passing the cv_capacities to the model.

initialization of last layer

The last layer of the neural net is initialized to zero,
but this should only be done for the last neural net.
For the previous pieces such as R, shift, scale, z_cell, etc., it should probably not be initialized to zero.

The goal is to add a final=False parameter to the neural net template and set it to final=True for the capacity network or the voltage network.
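
A sketch of the template change (the layer sizes are illustrative):

    import tensorflow as tf

    def feedforward(width, depth, final=False):
        # only the final network starts its last layer at zero
        last_init = "zeros" if final else "glorot_uniform"
        layers = [tf.keras.layers.Dense(width, activation="relu")
                  for _ in range(depth)]
        layers.append(tf.keras.layers.Dense(1, kernel_initializer=last_init))
        return tf.keras.Sequential(layers)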

output to folder

Same as downloadable zips, except that we create an "automated_output" folder and, periodically, the output folder gets overwritten with the new data.

more efficient model query

Basically, at test time, we call the model a bunch of times and compute everything every time.

Instead, make specialized calls that get all the quantities that don't depend on protocol, then all the scalars that depend on protocols, then all the vector dependencies (probably separately for current_dependent and voltage_dependent).

Also, wrap these calls in @tf.function.
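
A hypothetical sketch of the specialization (cell_quantities and scalars are illustrative method names, not the model's actual API):

    import tensorflow as tf

    @tf.function
    def protocol_independent(model, cell_features):
        # computed once, reused across all protocol queries
        return model.cell_quantities(cell_features)

    @tf.function
    def protocol_dependent_scalars(model, cached, protocols):
        # batched over protocols, reusing the cached quantities
        return model.scalars(cached, protocols)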

train-test split

The way to control what is in the test set is to change which neighborhoods end up in the train set.
For instance, if we filter out fast discharges in the neighborhood compilation, then they become part of the test set.

If a cell is filtered out, then it becomes part of the test set.

If an electrode is filtered out, then it becomes part of the test set.
However, if a token is latent, it can't be used at test time. We need a way to fall back to the generic token or to report that we can't produce results.

Make Foralls about current be distributed uniformly in the log-space instead of the linear-space.

Basically, currents are naturally distributed uniformly in logarithmic space, so it doesn't make sense to sample from a uniform distribution in linear space to enforce constraints.
This creates an important distribution shift between data-based losses and incentive-based losses, which can be exploited by strong models.

The trick is simply to choose the uniform interval as x in [log(a), log(b)] instead of x in [a, b], and then apply exp(x) instead of using x directly.
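
A minimal sketch of the trick:

    import tensorflow as tf

    def sample_currents_log_uniform(n, a, b):
        # x uniform in [log(a), log(b)], so exp(x) is log-uniform in [a, b]
        x = tf.random.uniform(
            (n,), minval=tf.math.log(a), maxval=tf.math.log(b)
        )
        return tf.exp(x)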

Evaluation Problems

Checking the results at 10000 steps is not super insightful because that's only 100 gradient steps per cell, which is quite low.
You can only see that things are not super visibly bad.
