samuel-buteau / universal-battery-database
Open source Li-ion data management and modelling software
License: Apache License 2.0
When updating the dataset, more information will be added.
It would be very convenient to ensure that when a dictionary is missing from the dataset, ml_smoothing still does the right thing. We already have mechanisms in DegradationModel to handle missing values.
In this way, even if the dataset is not 100% up to date, the code runs fine.
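A minimal sketch of that guard, assuming the dataset is a plain dict (the key names here are illustrative assumptions, not the project's actual keys):

```python
# Hypothetical sketch: guard against missing dictionaries in the dataset so
# downstream code (e.g. ml_smoothing) keeps running on a stale dataset.

def get_with_default(dataset, key, default=None):
    """Return dataset[key] if present, else an empty dict (or given default)."""
    if default is None:
        default = {}
    return dataset.get(key, default)

dataset = {"all_data": {"cell_1": [1, 2, 3]}}
all_data = get_with_default(dataset, "all_data")
cell_features = get_with_default(dataset, "cell_features")  # missing -> {}
```

With this in place, code that iterates over `cell_features` simply sees an empty dict instead of raising a KeyError.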
When plotting vq_curves, for the cycles and rates used, compute shift, then plot on the same graph v_plus, v_minus(shifted), v_total
This is a quick experiment to determine if the cycle dependence of shift is in fact relatively straightforward.
The power of this equation is that A can depend on the avg_cycling_history and on cell_features,
and shift_0 depends on cell_features.
This equation is not literally true, but if delta_shift is proportional to sqrt(time), and time is more or less linear in cycle number (the details do depend on cycling_history), then the relationship holds approximately.
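As a sketch, the hypothesized form could be written as follows. The constants are illustrative assumptions; per the note, A would actually depend on avg_cycling_history and cell_features, and shift_0 on cell_features:

```python
import math

# Hypothetical sketch of the cycle dependence of shift:
#   shift(n) ~= shift_0 + A * sqrt(n)
# The default constant values below are illustrative assumptions.

def shift(n, shift_0=0.05, A=0.01):
    """Shift after n cycles under the sqrt-of-time hypothesis."""
    return shift_0 + A * math.sqrt(n)
```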
basically download all the filtered data in separate csv files, named with the given cell name and the given filter names.
then, give options for what to include in the csv
metadata file gives filename, name, rate
or just filename (and the rest is extracted)
Some incentives are applied to the outside of a computation, but some are applied to pieces of a computation that we might not want to expose elsewhere.
There are two issues:
access self.incentive_coeffs
multiple returns
When computing derivatives, ignore the loss
in the plots, show the data that has no groups. (i.e. groups is None)
Examples (some are more self-explanatory than others, but docs/wikis are good to have nonetheless):
all_data
degradation_model
cya_grp_dict
for some reason, the machine learning code has ended up in neware_parser, but this is dumb since it has nothing to do with parsing neware files.
We want a program to parse cycling files (such as neware files) and output into a universal cycling format.
We want a program to visualize the data
We want a program to do the machine learning stuff
We need a strict demarcation between the code responsible for taking data in and outputting plots, and the code responsible for taking in a partially trained model+dataset and producing the necessary data to be plotted.
for a while now, this has just been the same as cycle.
So let's replace features -> cell_features
and norm_cycle -> cycle
info: id, material, r, scale, data
given a name + c_rate + cap + file containing current, voltage, capacity, step_count
this will split by step, add a name and a rate, and output many csv files.
It will be normalized to cap
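A sketch of that splitting step, under assumed column names (`current`, `voltage`, `capacity`, `step_count`) and an assumed output naming scheme; it returns the per-step rows rather than writing files, to keep the example self-contained:

```python
from collections import defaultdict

# Hypothetical sketch: group rows by step, normalize capacity to the nominal
# cap, and produce one output (named after the cell and rate) per step.
# Column and file names are illustrative assumptions.

def split_by_step(rows, name, c_rate, cap):
    """Return {output_filename: [rows with capacity normalized to cap]}."""
    steps = defaultdict(list)
    for row in rows:
        steps[row["step_count"]].append(row)
    out = {}
    for step, step_rows in steps.items():
        fname = f"{name}_rate{c_rate}_step{step}.csv"
        out[fname] = [
            {
                "current": r["current"],
                "voltage": r["voltage"],
                # normalize capacity to the nominal cap
                "capacity": float(r["capacity"]) / cap,
            }
            for r in step_rows
        ]
    return out
```

Writing each dict value out with `csv.DictWriter` would then give the "many csv files" described above.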
The code should still run without the v_curves folder.
TODO:
for each (dataset, cell_id):
allow renaming the cell
Each cell can be a little bit different.
Instead of either 100% cell latent or 100% predicted cell features from components,
we can have 100% latent (if components are unknown) or a 10% latent / 90% predicted mix of features from components (if components are known),
together with a strong incentive (if components are known) to have latent = predicted.
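A sketch of that mixing rule and incentive, with the fractions and the quadratic penalty form as illustrative assumptions:

```python
# Hypothetical sketch of mixing latent and component-predicted cell features.
# The 10%/90% split and the quadratic incentive are illustrative assumptions.

def mixed_features(latent, predicted, components_known, latent_frac=0.1):
    """100% latent when components are unknown; otherwise a weighted mix."""
    if not components_known:
        return latent
    return [latent_frac * l + (1 - latent_frac) * p
            for l, p in zip(latent, predicted)]

def latent_incentive(latent, predicted, components_known, coeff=1.0):
    """Penalty pushing latent toward predicted when components are known."""
    if not components_known:
        return 0.0
    return coeff * sum((l - p) ** 2 for l, p in zip(latent, predicted))
```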
Ideally, each particular key in Key class refers to the same variable no matter the context and dictionary: if two dictionaries contain the same object/data, they can share the same key; the same key should not reference different data types/structures/values.
Once the above is done, then each key can be documented in either the Key class or a new Wiki page (or short description in the former and longer one in the latter).
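A minimal sketch of such a Key class, with the specific key names as illustrative assumptions; the point is that every dictionary referring to the same data reuses the same constant:

```python
# Hypothetical sketch: one canonical string constant per concept, so the same
# key never means two different things and is never retyped by hand.

class Key:
    ALL_DATA = "all_data"
    CELL_ID = "cell_id"
    CELL_FEAT = "cell_features"
    CYC = "cycle"

dataset = {Key.ALL_DATA: {}, Key.CELL_ID: 42}
cell_id = dataset[Key.CELL_ID]
```

Documentation for each key can then live as a comment next to its definition (short form) or in the Wiki (long form).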
for each (dataset, cell_id):
be able to create a filter over charge group/discharge group, give it a good name, a position in a grid, a color
allow removal of cells from a dataset
should be cell_id throughout. It is not a barcode.
the sampling of cycles and protocols is done by placing neighborhoods across the span of the data.
Previously, this was done to smooth the data. But now, nothing really dictates where we put the neighborhoods.
The first 100 cycles is tricky to get right, so is the last cycles.
So we can try to rewrite this system to correctly emphasize these important regions.
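One possible scheme, sketched under the assumption that a smooth warp of a uniform grid is acceptable: a smoothstep map has near-zero slope at both ends, so neighborhood centers cluster in the first and last cycles.

```python
# Hypothetical sketch: place neighborhood centers densely near the first and
# last cycles by warping a uniform grid through a smoothstep curve.
# The particular warp function is an illustrative assumption.

def neighborhood_centers(n_cycles, k):
    """Return k cycle indices in [0, n_cycles-1], denser at both ends."""
    centers = []
    for i in range(k):
        u = i / (k - 1) if k > 1 else 0.0
        s = 3 * u**2 - 2 * u**3  # smoothstep: flat near u=0 and u=1
        centers.append(round(s * (n_cycles - 1)))
    return centers
```

Because the warp is flat at the endpoints, consecutive centers are much closer together at the start and end of life than in the middle of the span.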
Not all keys have been replaced by those in the Key class.
First replace all string values in all files with ".
The background task system keeps track of all the files in a given folder. The database knows all the filenames.
We can give an excel sheet mapping filenames to metadata and upload it to the database.
This will be read and overwrite the data in the database.
It should have all the columns of metadata, and prefilled header.
there should be a link to download a csv file.
rows with invalid filenames are ignored.
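A sketch of the overwrite logic, assuming the database side is a dict keyed by filename and the sheet has a `filename` column (both assumptions for illustration):

```python
import csv
import io

# Hypothetical sketch of the metadata upload: read a CSV mapping filenames to
# metadata and overwrite matching database entries, skipping rows whose
# filename is unknown. Column names and the dict "database" are assumptions.

def apply_metadata_csv(csv_text, database):
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        fname = row.get("filename")
        if fname not in database:
            continue  # rows with invalid filenames are ignored
        database[fname].update(
            {k: v for k, v in row.items() if k != "filename"})
    return database
```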
Rename variables with prefixes my_
to be more descriptive.
E.g.
my_data
my_cycle
my_barcode
Create a time predictor, which simply takes cell_features, cycle_history, and cycle_number and outputs time.
Also, include time elapsed as part of the dataset
We need to allow more customization
for instance, which directory should we use as the root of the raw data files?
also, all the magic numbers in the project should be accessible and modifiable.
Maybe a web page could be a convenient way to do that.
This is better than having config files.
Also, some settings should be (optionally) customized to each user.
Basically, any time a property can be unknown and therefore fit to the data, and we have some instances of that property which are known, the unknown instances should take typical values.
Assuming the known values come from a gaussian distribution, then we can compute their mean and standard deviation (with some regularization for small datasets).
Based on this, the latent versions of these variables should not be many standard deviations away from the mean.
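A sketch of that prior, with the regularization constant and the 3-sigma threshold as illustrative assumptions:

```python
import math

# Hypothetical sketch: estimate mean/std from the known instances (with a
# small regularizer for tiny samples) and penalize latent values that sit
# many standard deviations away. Constants are illustrative assumptions.

def gaussian_prior(known_values, eps=1e-2):
    n = len(known_values)
    mean = sum(known_values) / n
    var = sum((x - mean) ** 2 for x in known_values) / n
    return mean, math.sqrt(var + eps)  # eps regularizes small datasets

def latent_penalty(latent, mean, std, max_sigma=3.0):
    """Quadratic penalty beyond max_sigma standard deviations from the mean."""
    z = abs(latent - mean) / std
    return max(0.0, z - max_sigma) ** 2
```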
dVdQ is almost infinite at certain places (i.e. Q vs V is almost flat). Maybe the present penalty/incentives do not allow this to be learned effectively.
Investigate and solve this problem.
specify source
Resolve the following warning when running ml_smoothing:
python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
make a grid of plots of all the desired filters and create a unified legend.
we have two branches of prediction: voltage in and state of charge in.
Right now, "voltage in" predicts the constant current data and constant voltage data as well.
We want to extend this to "state of charge in"
more specifically, the equation should be
I = v - V(q/Q_scale + Q(v_init - I_init R))
This requires passing the cv_capacities to the model.
The last layer of the neural net is initialized to be 0,
but this should only be done for the last neural net.
For the previous pieces such as R, shift, scale, z_cell, etc, it should probably not be initialized to be zero.
The goal is to add a final=False parameter to the neural net template and set it to final=True for the capacity network or the voltage network.
same as downloadable zips, except that we create an "automated_output" folder and periodically overwrite it with the new data.
instead of only selecting on electrolyte and dry cell,
also allow dataset membership to filter the search.
basically, at test time, we call the model a bunch of times and we compute everything every time.
Instead, make specialized calls that get all the quantities that don't depend on protocol, then all scalars that depend on protocols, then all vector dependencies (probably separately for current_dependent and voltage_dependent).
also, wrap these calls in @tf.function
remove explicit split of resistance, scale, and shift.
They will be reintroduced later.
for each cell_id, have two fields:
total_capacity,
auto_adjust.
If auto_adjust is False, then total_capacity is only set if None.
If auto_adjust is True, then total_capacity is set every time the data is processed.
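The two rules above can be sketched directly; the cell record is modeled as a dict here for illustration, not the project's actual schema:

```python
# Sketch of the proposed per-cell capacity fields, implementing exactly the
# rule described: auto_adjust=True refreshes on every processing run,
# auto_adjust=False sets total_capacity once (when it is None) and freezes it.

def update_total_capacity(cell, measured):
    if cell["auto_adjust"]:
        cell["total_capacity"] = measured  # refreshed every run
    elif cell["total_capacity"] is None:
        cell["total_capacity"] = measured  # set once, then frozen
    return cell
```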
the way to control what is in the test set is to change which neighborhoods end up in the train set.
For instance, if we filter out fast discharges in the neighborhood compilation, then that becomes part of the test set.
If a cell is filtered out, then it becomes part of the test set.
If an electrode is filtered out, then it becomes part of the test set.
However, if a token is latent, it can't be used at test time. We need a way to just use the generic token or to report that we can't produce results.
Basically, currents are naturally distributed uniformly in the logarithmic space, so it doesn't make sense to sample from a uniform distribution in linear-space to enforce constraints.
This makes an important distribution shift between data-based losses and incentive-based losses, which can be exploited by strong models.
The trick is simply to choose the uniform interval as x in [log(a), log(b)] instead of x in [a,b], and then to apply exp(x) instead of x.
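The trick in code, as a minimal sketch:

```python
import math
import random

# Sample a current log-uniformly in [a, b]: draw x uniformly in
# [log(a), log(b)] and return exp(x). This matches how currents are
# naturally distributed, unlike a linear-space uniform draw.

def sample_current_log_uniform(a, b, rng=random):
    x = rng.uniform(math.log(a), math.log(b))
    return math.exp(x)
```

Incentive terms evaluated at currents drawn this way then see the same distribution as the data-based losses, closing the distribution gap described above.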
for instance, a new format should be easy to define and then use instead of the default
Checking the results at 10000 steps is not super insightful because that's 100 gradient steps per cell, which is quite low.
you can only see that things are not super visibly bad