
Comments (11)

JulesBelveze avatar JulesBelveze commented on May 30, 2024 1

Alright @sadransh, then have a look at this parallelized tSNE library; otherwise I think the sklearn version should run in decent time.

Then just try to train the autoencoder (play around with the different parameters and attention mechanisms). Once it is trained, pass your data through the encoder, retrieve the latent features and feed them into a clustering algorithm; hopefully that will do the job!
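For concreteness, here is a minimal sketch of that last step; the encoder itself is out of scope here, so `latent` below is just random data standing in for the features returned by the trained encoder:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# `latent` stands in for the features returned by the trained encoder;
# random data is used here only so the snippet runs end to end.
latent = np.random.randn(1000, 64)

clusters = KMeans(n_clusters=5, n_init=10).fit_predict(latent)   # cluster the latent features
embedding_2d = TSNE(n_components=2).fit_transform(latent)        # 2-D projection for plotting
```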

Please feel free to fork the project, open issues if you encounter any or simply open a PR if you want to add some cool stuff πŸ€“


JulesBelveze avatar JulesBelveze commented on May 30, 2024 1

Also, I'm curious about your results, please keep me posted!

I'm closing this issue, but if there's any way I can help, feel free to open a new one!


JulesBelveze avatar JulesBelveze commented on May 30, 2024 1
  • You should be able to directly use my model for this purpose by specifying label_col=[] in config.py; the model will then just be a compression network (see the sketch after this list).

  • A good repo I took inspiration from is this one. Even though it involves a VAE, the idea is similar and it might also be a potential architecture to look at in your case.

  • Unfortunately the data I used is not public so I cannot share it... I'll try to provide an example in the near future.

  • That should be feasible without too much struggle. If you can share your data we could potentially use it as an example in this repo πŸ˜„
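For reference, a hypothetical sketch of what the compression-only configuration could look like; apart from label_col, the field names below are illustrative placeholders rather than the repository's exact config.py:

```python
# Hypothetical config.py excerpt -- only `label_col` comes from the discussion above,
# the other fields are illustrative placeholders.
config = {
    "label_col": [],       # no target column: the model acts as a pure compression network
    "hidden_size": 64,     # size of the latent representation used for clustering
    "batch_size": 32,
}
```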


JulesBelveze avatar JulesBelveze commented on May 30, 2024

Hi @sadransh thanks for the interest 😸

I previously used this repo for a kind of similar task: anomaly detection. I used this autoencoder coupled with some clustering/classification algorithms. The particularity of my task was the really high dimensionality of the data (>700 features). For this reason I first trained the autoencoder and then used the hidden states in the clustering/classifiers.

I also used tSNE for visualisation purposes, and there are some amazing libraries that will do the job for you, depending on your needs (I can give you some if you want).

In this repo I'm using tensorboard to manage experiments; it's relatively easy to use and it does the job. However, I recently got my hands on a new experiment manager called neptune.ai and literally fell in love 😝 So I might switch to it pretty soon haha.

Anyway, to further help and guide you, my first question is: how high-dimensional is your data?


sadransh avatar sadransh commented on May 30, 2024

Thank you for the cool website you mentioned!

To be accurate, I have 1k samples, each containing 5 time-series signals (with 400 samples each).

As you have done previously, I want to train an autoencoder and do clustering on the latent space. I prefer to use the signals directly rather than extracting statistical features ...


sadransh avatar sadransh commented on May 30, 2024

Updated:
Thanks again. Btw, I don't exactly know how to change the network from a forecasting one to a reconstruction one.
My initial intuition is that my network should reconstruct the signal to get better results from the encoder (is that correct?)

Could you please give me a good starting point to read from, like another repo doing reconstruction, so I can implement it faster?

Another thing is that I am not sure how to feed the data in. Do you have any sample set that you have worked with on this network? That would really help me figure it out quickly.

Consider that I have 4 different sensors that measure something simultaneously (400 samples per sensor), and I have run the experiment 1000 times. This is what I have, and I want to encode it and inspect the latent space to see whether similar experiments can be clustered into one group.


sadransh avatar sadransh commented on May 30, 2024

@JulesBelveze Thanks for your help!
Unfortunately, my dataset is also not public. However, this might be a good one for you.

I finally figured out exactly what I am looking for: I need a multi-sequence-to-one setup, meaning each sequence has one label. So, do you have any idea how I should transform my dataset to work with your code structure when I have a label_col? I believe that if I order my data in this shape:
(consider that for 1 experiment each sensor gives me 400 samples, and I use one label for the whole sampling)

sensor1       sensor2       sensor3      sensor4     label_col
   1            -4             0             5        forward
   3            55            -4             6        forward
  ...           ...           ...           ...       forward

this would be problematic, since all 400 samples together correspond to the forward label (not each individual sample). So I think that after feeding 400 samples together I should have one label, and during testing the same should happen (I should feed the network ~400 samples, and then the network should output one label).

Could you please let me know if I am thinking about this correctly?


JulesBelveze avatar JulesBelveze commented on May 30, 2024

@sadransh yes I was thinking of adding this dataset :)

Hmmm, correct me if I'm wrong, but in your case you just want to train the autoencoder to reconstruct the input signals, right? You don't want to perform any kind of forecasting. In that case you don't need a target column, since the features can be seen as the targeted values.
From my understanding of your problem, you first need to find a way to interpret your data as time series. What you could do is concatenate all the experiments and say that each experiment contains 400 timesteps. Then you just need to set a window length of 400 and a window offset of 400. That way, each window will contain exactly one experiment.

Also, in your case you want to cluster each experiment, meaning 400*4 = 1600 values (which is too much). To leverage the latent space you will need to aggregate (by taking the mean, for example) all the hidden states for each of your experiments, as in the sketch below.
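A rough sketch of both steps (per-experiment windowing and mean aggregation), assuming the data lives in a NumPy array of shape (experiments, timesteps, sensors); `encode` is a placeholder for the trained encoder, not a function from this repo:

```python
import numpy as np

# Illustrative shapes only: 1000 experiments, 400 timesteps, 4 sensors.
data = np.random.randn(1000, 400, 4)

# Concatenate the experiments along the time axis, then re-window with
# length 400 and offset 400 so that each window holds exactly one experiment.
window_len, offset = 400, 400
flat = data.reshape(-1, data.shape[-1])                   # (400000, 4)
windows = np.stack([flat[s:s + window_len]
                    for s in range(0, len(flat) - window_len + 1, offset)])  # (1000, 400, 4)

# After encoding, aggregate the per-timestep hidden states of each window,
# e.g. by taking the mean, to get one vector per experiment for clustering.
# experiment_embeddings = encode(windows).mean(axis=1)    # assumed shape: (1000, hidden_dim)
```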

Hope that helps !


sadransh avatar sadransh commented on May 30, 2024

Thanks for your time and useful comments!

I got a question when taking a look at the code.

Here

y_hist.append(
you are using y, but if I pass a dataset without a target column (y) it is still going to create the y_hist tensor, which should not be possible (since there is no target value when we want to reconstruct). Could you please let me know what y_hist is and how to resolve this problem?

Could you please let me know if you have ever reconstructed any signal based on this version of the code?


JulesBelveze avatar JulesBelveze commented on May 30, 2024

Hey @sadransh thanks for pointing that out!
You are actually right, this should only be done when working with a target column!
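For illustration, a minimal sketch of the kind of guard being discussed; the function and variable names below are assumptions made for the example, not the repository's actual code:

```python
from typing import Optional
import numpy as np

def build_histories(x: np.ndarray, y: Optional[np.ndarray], window_len: int):
    """Only build a target history from `y` when a target column exists;
    otherwise reuse the input features as reconstruction targets."""
    x_hist, y_hist = [], []
    for start in range(0, len(x) - window_len + 1, window_len):
        x_hist.append(x[start:start + window_len])
        if y is not None:                      # forecasting: a real target column is provided
            y_hist.append(y[start:start + window_len])
        else:                                  # reconstruction: the inputs are the targets
            y_hist.append(x[start:start + window_len])
    return np.stack(x_hist), np.stack(y_hist)
```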

Could you please open an issue? And if you want to open a PR, that would be truly awesome! :D

I actually did this for anomaly detection, yes. The model is slightly different, but I think once the issue you mentioned is fixed you should be able to do it.


JulesBelveze avatar JulesBelveze commented on May 30, 2024

Closing this for inactivity.

