Comments (11)
Alright @sadransh, then take a look at this parallelized tSNE library; otherwise I think the sklearn version should run in decent time.
Then just try to train the autoencoder (play around with the different parameters and attention mechanisms). Once you have trained it, just pass your data into the encoder, retrieve the latent features, and feed them into a clustering algorithm; hopefully that will do the job!
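A rough sketch of that pipeline (the encode function here is a hypothetical stand-in for the trained encoder, not the repo's actual model):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for a trained encoder: anything that maps
# (n_samples, n_features) inputs down to a latent space works here.
def encode(x, latent_dim=8):
    rng = np.random.default_rng(0)
    w = rng.standard_normal((x.shape[1], latent_dim))  # fixed random projection
    return x @ w

x = np.random.default_rng(1).standard_normal((100, 50))  # toy "data"
latents = encode(x)                                      # retrieve latent features
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(latents)
print(latents.shape, clusters.shape)  # (100, 8) (100,)
```

With the real model you would replace encode with a forward pass through the trained encoder and keep only its latent output.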
Please feel free to fork the project, open issues if you encounter any, or simply open a PR if you want to add some cool stuff!
from time-series-autoencoder.
Also, I'm curious about your results, please keep me posted!
I'm closing this issue, but if there's any way I can help, open a new one!
- You should be able to directly use my model for this purpose by specifying label_col=[] in config.py. The model will then just be a compression network.
- A good repo I took inspiration from is this one. Even though it uses a VAE, the idea is similar, and it might also be a potential architecture to look at in your case.
- Unfortunately the data I used is not public, so I cannot share it... I'll try to provide an example in the near future.
- That should be feasible without too much struggle. If you can share your data we could potentially use it as an example in this repo!
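For the first point, the change might look roughly like this (the key names besides label_col are illustrative; check the actual config.py for the ones the repo uses):

```python
# Hypothetical excerpt of config.py -- the point is only that
# label_col is set to an empty list.
label_col = []     # no target column => the model is a pure compression network
batch_size = 64    # illustrative; keep whatever the repo already uses
hidden_size = 128  # illustrative
```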
Hi @sadransh, thanks for the interest!
I previously used this repo for a kind of similar task: anomaly detection. I used this autoencoder coupled with some clustering/classification algorithms. The particularity of my task was the really high dimensionality of the data (>700 features). For this reason I first trained the autoencoder and then used the hidden states in the clustering/classifiers.
I also used tSNE for visualization purposes, and there are some amazing libraries that will do the job for you, depending on your needs (I can give you some if you want).
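The sklearn version of that visualization step is just a couple of lines (the latents array here is a random stand-in for the encoder outputs):

```python
import numpy as np
from sklearn.manifold import TSNE

latents = np.random.default_rng(0).standard_normal((200, 16))  # stand-in for encoder outputs

# Project the latent features down to 2D for plotting; perplexity must stay
# below the number of samples (30 is the usual starting point).
embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(latents)
print(embedded.shape)  # (200, 2)
```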
In this repo I'm using tensorboard to manage experiments; it's relatively easy to use and it does the job. However, I recently got my hands on a new experiment manager called neptune.ai and literally fell in love, so I might switch to it pretty soon haha.
Anyway, to further help and guide you, my first question is: how high-dimensional is your data?
Thank you for the cool website you mentioned!
To be accurate, I have 1k samples, each containing 5 time-series signals (400 samples each).
As you have done previously, I want to train an autoencoder and do clustering on the latent space. I prefer to use the signals directly rather than extracting statistical features...
Thanks again. Btw, I don't exactly know how to change the network from a forecasting one to a reconstruction one.
My initial intuition is that my network should be a signal-reconstruction one to get better results from the encoder (is that correct?)
Could you please point me to a good place to start reading, like another repo doing reconstruction, so I can implement it faster?
Another thing is that I'm not sure how to feed the data in. Do you have any sample set that you have worked with on this network? That would really help me figure it out fast.
Consider that I have 4 different sensors that measure something simultaneously (400 samples per sensor), and I have run the experiment 1000 times. This is what I have, and I want to encode it and check the latent space to see whether similar experiments can be clustered into one group.
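In numpy terms, that data would shape up like this (a toy sketch; the real readings would replace the random array):

```python
import numpy as np

n_experiments, n_timesteps, n_sensors = 1000, 400, 4

# Suppose the raw readings arrive as one long table: one row per timestep,
# experiments stacked vertically (1000 * 400 rows, 4 sensor columns).
flat = np.random.default_rng(0).standard_normal((n_experiments * n_timesteps, n_sensors))

# Reshape into (experiments, timesteps, sensors) so each experiment becomes
# one multivariate sequence for the autoencoder.
sequences = flat.reshape(n_experiments, n_timesteps, n_sensors)
print(sequences.shape)  # (1000, 400, 4)
```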
@JulesBelveze Thanks for your help
Unfortunately, my set is also not public. However, this might be a good one for you.
I finally found exactly what I'm looking for: I need a many-sequence-to-one setup, meaning my whole sequence has one label. So, do you have any idea how I should transform my dataset to work with your code structure when I have a label_col? I believe if I order my data in this shape (consider that for 1 experiment each sensor gives me 400 samples, and I assign one label to all of them):
sensor1 sensor2 sensor3 sensor4 label_col
1 -4 0 5 forward
3 55 -4 6 forward
... ... ... ... forward
this would be problematic, since all 400 samples together mean the forward label (not each individual sample). So I think that after feeding 400 samples together I should get one label, and the same should happen during testing (I should feed the network ~400 samples, and then the network should output one label).
Could you please let me know if I am thinking about it correctly?
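A tiny pandas sketch of the "one label per block of rows" idea (toy data; 3 rows per experiment stand in for the real 400):

```python
import pandas as pd

# Toy frame mimicking the table above: 2 experiments x 3 timesteps each.
df = pd.DataFrame({
    "sensor1":   [1, 3, 2, 0, -1, 4],
    "label_col": ["forward"] * 3 + ["left"] * 3,
})
rows_per_experiment = 3  # would be 400 in the real data
df["experiment"] = df.index // rows_per_experiment

# Collapse the per-row labels into one label per experiment (sequence-to-one).
labels = df.groupby("experiment")["label_col"].first()
print(labels.tolist())  # ['forward', 'left']
```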
@sadransh yes I was thinking of adding this dataset :)
Hmmm, correct me if I'm wrong, but in your case you just want to train the autoencoder to reconstruct the input signals, right? You don't want to perform any kind of forecasting. In that case you don't need a target column, since the features can be seen as the target values.
From my understanding of your problem, you first need to find a way to interpret your data as time series. What you could do is concatenate all the experiments and say that each experiment contains 400 timesteps. Then you just need to set a window length of 400 and a window offset of 400. That way, each window will contain exactly one experiment.
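As a small numpy sketch of that windowing (shapes are illustrative, this is not the repo's actual code):

```python
import numpy as np

window_length, window_offset = 400, 400  # offset == length => non-overlapping windows

# 5 concatenated experiments of 400 timesteps each, 4 sensors.
signal = np.random.default_rng(0).standard_normal((5 * 400, 4))

n_windows = (len(signal) - window_length) // window_offset + 1
windows = np.stack([signal[i * window_offset : i * window_offset + window_length]
                    for i in range(n_windows)])
print(windows.shape)  # (5, 400, 4) -- exactly one window per experiment
```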
Also, in your case you want to cluster each experiment, meaning 400*4=1600 values (which is too much). To leverage the latent space you will need to aggregate (by taking the mean, for example) all the hidden states for each of your experiments.
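The aggregation step could look like this (the hidden states here are random stand-ins for the encoder outputs, and the hidden size of 64 is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical encoder hidden states: one vector per timestep
# for each of the 1000 experiments.
hidden = np.random.default_rng(0).standard_normal((1000, 400, 64))

pooled = hidden.mean(axis=1)  # average over the 400 timesteps -> one vector per experiment
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(pooled)
print(pooled.shape, clusters.shape)  # (1000, 64) (1000,)
```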
Hope that helps!
Thanks for your time and useful comments!
I got a question while taking a look at the code.
Here: time-series-autoencoder/dataset.py, line 57 (commit 8493801).
Could you please let me know if you have ever reconstructed any signal based on this version of the code?
Hey @sadransh thanks for pointing that out!
You're actually right; this should only be done when working with a target column!
Could you please open an issue? And if you want to open a PR, that would be truly awesome! :D
I actually did this for anomaly detection, yes. The model is slightly different, but I think once the issue you mentioned is fixed you should be able to do it.
Closing this for inactivity.
Related Issues (20)
- where is the dataset? HOT 1
- Encoded state HOT 3
- BUG HOT 10
- reconstruction error HOT 8
- Decoder output
- Trained model on NFLX stock data from 2014-2021. Predictions are scaled wrongly. HOT 10
- Is the problem statement just prediction or forecasting? HOT 1
- AssertionError: Pytorch Issue with prediction window > 1 HOT 5
- Not working for many of the tickers HOT 1
- y_hist HOT 2
- No License HOT 1
- Got difficulties in accessing the dataset HOT 1
- I need to work with Google Colab? any get started exa;ple please? HOT 2
- issue when test the trained model
- error when the output_size isn't 1
- Question about model evaluation HOT 2
- Prediction of multiple outputs dependent on multiple features. HOT 1
- Questions about using the model for denoising (reconstruction) HOT 4
- Tensors must have same number of dimensions: got 2 and 1 HOT 4
- Regarding input dimensions HOT 4