
Comments (24)

fjxmlzn avatar fjxmlzn commented on July 29, 2024 4

By the way, for future readers of this thread:

If you are looking for a TF2 implementation of DoppelGANger, you can look at https://github.com/fjxmlzn/DoppelGANger/tree/TF2 by @yzion

If you are looking for a PyTorch implementation of DoppelGANger, you can look at https://synthetics.docs.gretel.ai/en/stable/models/timeseries_dgan.html#timeseries-dgan by Gretel AI.

from doppelganger.

Baukebrenninkmeijer avatar Baukebrenninkmeijer commented on July 29, 2024 3

Either this or a PyTorch version would be super great to have. TF 1.4 is kind of a bummer :(


yzion avatar yzion commented on July 29, 2024 1

Hi @chameleonzz
Try running it with the TF2 branch. That branch supports TensorFlow 2, so you can use later versions of Python and CUDA.
Please post an update here if it solves your problem.


fjxmlzn avatar fjxmlzn commented on July 29, 2024 1

Thanks, @yzion, for the help and the answers!

@chameleonzz Re: how to decide the attributes and features for your own data.

The definition of features and attributes can be very flexible, depending on the aspects of the data you want DoppelGANger to capture. More specifically, let's take a simple example. Let's say your original data is a table in the following format.

| ColumnA | ColumnB | ColumnC |
|---------|---------|---------|
| 1       | 2       | 3       |
| 1       | 2       | 4       |
| 2       | 2       | 3       |
| 2       | 2       | 5       |
| 2       | 2       | 6       |

You can treat any (even several) columns as attributes (or metadata), and group the rows according to those attributes, and treat the rest of the columns as features (or time-series).

For example, you can choose to treat ColumnA and ColumnB as attributes, and ColumnC as the feature. You will get 2 samples: {attributes=(1,2), features=(3,4)}, {attributes=(2,2), features=(3,5,6)}. DoppelGANger (ideally) will be able to learn the temporal correlations of features that are associated with the same attribute (i.e., (3,4) in the first sample, and (3,5,6) in the second sample). But you can also choose to treat only ColumnA as the attributes, or any other combinations of the columns you want. In short, how to choose features/attributes depends on the context of your application, and which part you want DoppelGANger to model as temporal correlations.
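The grouping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not code from the DoppelGANger repository: the function name `group_into_samples` and the dictionary layout are hypothetical, chosen for this example.

```python
from collections import defaultdict

# The example table from this comment: (ColumnA, ColumnB, ColumnC).
rows = [
    (1, 2, 3),
    (1, 2, 4),
    (2, 2, 3),
    (2, 2, 5),
    (2, 2, 6),
]


def group_into_samples(rows, attribute_idx, feature_idx):
    """Group rows by the attribute columns; the other columns form the time series.

    Hypothetical helper for illustration only. Rows keep their original order
    inside each group, so that order is treated as the time order.
    """
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in attribute_idx)
        grouped[key].append(tuple(row[i] for i in feature_idx))
    return [{"attributes": k, "features": v} for k, v in grouped.items()]


# ColumnA and ColumnB as attributes, ColumnC as the feature.
samples = group_into_samples(rows, attribute_idx=[0, 1], feature_idx=[2])
for sample in samples:
    print(sample)

# Only ColumnA as the attribute, ColumnB and ColumnC as features.
samples_a_only = group_into_samples(rows, attribute_idx=[0], feature_idx=[1, 2])
```

Changing `attribute_idx` and `feature_idx` is all it takes to try a different split, which makes it easy to compare how many samples each choice produces and how long they are.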

Hope this clarification helps!


shaanchandra avatar shaanchandra commented on July 29, 2024

Is there a PyTorch implementation available? TensorFlow is really hard to work with now. If anyone has worked on, or wants to collaborate on, open-sourcing a PyTorch version of this, lemme know! I will be interested :)


fjxmlzn avatar fjxmlzn commented on July 29, 2024

Thank you all for the suggestions. I agree that a TF2 or PyTorch version of DoppelGANger would be very useful. Unfortunately, we do not have one so far. If/when you have a TF2 or PyTorch implementation, please let me know and I'll add a link to it. Thank you!


yzion avatar yzion commented on July 29, 2024

Did anyone manage to update it to TF2?


chameleonzz avatar chameleonzz commented on July 29, 2024

Hi, when I installed TensorFlow 1.4.0, PyCharm warned that Python 3.5 has reached its end-of-life date and is no longer supported in PyCharm. DoppelGANger does not seem to work normally. Is there any solution?


fjxmlzn avatar fjxmlzn commented on July 29, 2024

@chameleonzz Could you please post error messages or screenshots of the errors?


chameleonzz avatar chameleonzz commented on July 29, 2024


chameleonzz avatar chameleonzz commented on July 29, 2024

DG works with TF 2.1.0 and Python 3.7. However, when I tried to run gan_task.py in 'example_training', there was a warning: "unresolved reference 'gan'". I tried to pip install gan, but there seems to be no package named gan. How can I solve this problem? Thank you for your help.


yzion avatar yzion commented on July 29, 2024

@chameleonzz can you share more information?
What was the Python command that you ran? Can you share the full warning? Does it still run despite this warning?
The gan folder is part of the project, so maybe there is an import issue that you need to solve.
[screenshot: gan_folder]


chameleonzz avatar chameleonzz commented on July 29, 2024

Thank you for your answer. I finally figured out how to solve the problem. To use DoppelGANger, there are three main steps. First, build a virtual environment, such as TF 2.1.0 + Python 3.7. Second, pip install some packages, such as gan, GPUTaskScheduler and tensorflow-privacy. The gan package can be downloaded from the DoppelGANger GitHub repository. These packages can be downloaded from GitHub and installed with the command 'pip install -e path/package_file_name'. Third, open the entire DoppelGANger (DG) project in PyCharm.
But I am confused by another problem. The DG examples use '.pkl' and '.npz' data files. If I want to create similar data files from my own data, how do I decide the attributes and features of my data? If the raw data for google, web and FCC_MBA were provided, along with an explanation of how the attributes and features of those datasets were chosen, more people might understand the work more easily. Besides, what output files will be generated after running each example project?
Finally, thank you very much for your enthusiastic answers each time. I hope the DG project can be used in more data-driven research. It is really significant work.


yzion avatar yzion commented on July 29, 2024

Any time :)
You can look for examples in the README file of the project.
There are examples for the pkl files and also for the npz files.
If you need them, there are also links to download the datasets that were used in this project, so you can download them and read them with Python to look at the structure.
Moreover, there are links to a number of blog posts, so you can try using them.
If you are still struggling, let me know and I will try to help.
Good luck
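As a rough sketch of what "read it with Python to look at the structure" might look like: the two helpers below (`describe_npz` and `describe_pkl`, both hypothetical names) list the arrays in a .npz archive and report the top-level type stored in a .pkl file. The toy files are created in a temporary directory only so the example runs on its own; for the real datasets you would pass the downloaded paths instead. Note that unpickling the project's .pkl files may additionally require the project's own modules to be importable.

```python
import os
import pickle
import tempfile

import numpy as np


def describe_npz(path):
    """Return {array name: shape} for every array stored in a .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


def describe_pkl(path):
    """Load a pickle file and return the type name of its top-level object.

    Only unpickle files from sources you trust.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    return type(obj).__name__


# Toy files, created here only so that the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "toy.npz")
pkl_path = os.path.join(tmp_dir, "toy.pkl")

np.savez(npz_path, first=np.zeros((4, 3, 2)), second=np.ones((4, 5)))
with open(pkl_path, "wb") as f:
    pickle.dump(["placeholder", "objects"], f)

npz_summary = describe_npz(npz_path)
pkl_summary = describe_pkl(pkl_path)
print(npz_summary)
print(pkl_summary)
```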


chameleonzz avatar chameleonzz commented on July 29, 2024


Recently, I ran into another problem.
I tried to run main.py in the example_training folder and main_generate_data.py in the example_generating_data folder. However, only a folder named results was created, and its sub-folders contained only a worker_*.log.txt file.
Q1: Why were no synthetic datasets for [web/google/FCC_MBA] generated?
[screenshot: Snipaste_2022-07-24_23-13-48]
I looked for a place in the code to specify the dataset path, but I found nothing.

Q2: Once I know the attributes and features of my datasets, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz and data_train.npz? Does additional code need to be written for this?

Finally, thank you for your continued patient answers.


fjxmlzn avatar fjxmlzn commented on July 29, 2024

Re: Q1. Can you share the content of worker_generate_data.log? Also, after running example_training/main.py, you should see another worker.log in these sub-folders. Did you see them?

Re: Q2. Yes, additional code needs to be written. You can refer to the README for an example of what those files should look like (after 'Let's look at a concrete example'). I will soon create an example of how these files were created for the datasets in our paper and share it here.
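In the meantime, here is a minimal sketch of how grouped samples could be padded into fixed-shape arrays and saved as a .npz archive. It rests on assumptions that should be checked against the README: the array names `data_feature`, `data_attribute` and `data_gen_flag`, their shapes, and the use of a 0/1 flag to mark padding. The helper `build_arrays` is hypothetical. The two .pkl files are not covered here because they describe each feature and attribute using the project's own classes, and any normalization or encoding the README asks for is not applied in this sketch.

```python
import os
import tempfile

import numpy as np


def build_arrays(samples):
    """Pad variable-length samples into fixed-shape arrays plus a 0/1 flag.

    Hypothetical helper. Assumed layout (verify against the README):
      data_feature:   [num samples, max length, feature dimension]
      data_attribute: [num samples, attribute dimension]
      data_gen_flag:  [num samples, max length], 1 for real steps, 0 for padding
    """
    n = len(samples)
    max_len = max(len(s["features"]) for s in samples)
    feat_dim = len(samples[0]["features"][0])
    attr_dim = len(samples[0]["attributes"])

    data_feature = np.zeros((n, max_len, feat_dim), dtype=np.float32)
    data_attribute = np.zeros((n, attr_dim), dtype=np.float32)
    data_gen_flag = np.zeros((n, max_len), dtype=np.float32)

    for i, s in enumerate(samples):
        length = len(s["features"])
        data_feature[i, :length, :] = np.asarray(s["features"], dtype=np.float32)
        data_attribute[i, :] = np.asarray(s["attributes"], dtype=np.float32)
        data_gen_flag[i, :length] = 1.0
    return data_feature, data_attribute, data_gen_flag


# Two toy samples with time series of different lengths (raw, unnormalized values).
samples = [
    {"attributes": (1, 2), "features": [(3,), (4,)]},
    {"attributes": (2, 2), "features": [(3,), (5,), (6,)]},
]
data_feature, data_attribute, data_gen_flag = build_arrays(samples)

# Written to a temporary directory so the sketch does not touch a real dataset folder.
out_path = os.path.join(tempfile.mkdtemp(), "data_train.npz")
np.savez(
    out_path,
    data_feature=data_feature,
    data_attribute=data_attribute,
    data_gen_flag=data_gen_flag,
)

with np.load(out_path) as saved:
    saved_names = sorted(saved.files)
print(saved_names)
```

A test split could be written the same way under a different file name, using samples held out from training.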


chameleonzz avatar chameleonzz commented on July 29, 2024

> Re: Q1. Can you share the content of worker_generate_data.log? Also, after running example_training/main.py, you should see another worker.log in these sub-folders. Did you see them?
>
> Re: Q2. Yes, another code needs to be written. You can refer to the README for an example of what those files should look like (after 'Let's look at a concrete example'). I will soon create an example of how these files were created for the datasets in our paper and share it here.

[screenshot: results]

After running example_training/main.py, the content of worker.log was as follows. (The 'aux_disc-False,dataset-FCC_MBA,epoch-17000,epoch_checkpoint_freq-70,extra_checkpoint_freq-850,run-0,sample_len-1,self_norm-False,' folder is taken as an example.)
[screenshot: workerlog]

After running example_generating_data/main_generate_data.py, the content of worker_generate_data.log was as follows.
[screenshot: worker_generate_data_log]

Are these the only output files of example_training/main.py and example_generating_data/main_generate_data.py? If I want to generate synthetic data corresponding to the real datasets (web/google/FCC_MBA), what should I do?


fjxmlzn avatar fjxmlzn commented on July 29, 2024

@chameleonzz No, there should be other files, and the content of worker.log or worker_generate_data.log should be more than this line.

Could you please delete the results folder completely, try running example_training/main.py again, and paste here the console output plus the content of worker.log?


chameleonzz avatar chameleonzz commented on July 29, 2024

> @chameleonzz No, there should be other files, and the content of worker.log or worker_generate_data.log should be more than this line.
>
> Could you please delete results folder completely, and try running example_training/main.py again, and paste here the console output plus the content of worker.log again?

Thanks for your answer.
I have tried several times to delete the results folder completely and run example_training/main.py again, but the output did not change. It was the same as in the previous pictures.
Should I change something in example_training/main.py and run it again?


fjxmlzn avatar fjxmlzn commented on July 29, 2024

Could you please paste here the console (i.e., terminal) output?


fjxmlzn avatar fjxmlzn commented on July 29, 2024

@chameleonzz
Also, we can move further discussion of this question to #30, since the problem you see is likely not due to TF2.


JimmyZhan1213 avatar JimmyZhan1213 commented on July 29, 2024

Hello, have you solved the problem of incomplete training and generated output? I had a similar problem recently: there was only a worker.log in the folder that was generated.
[screenshot]
[screenshot]


fjxmlzn avatar fjxmlzn commented on July 29, 2024

For the previous problem, please refer to #30. For this issue, would you mind creating a new issue? We can discuss it there. This is a different problem.


chameleonzz avatar chameleonzz commented on July 29, 2024

> Re: Q1. Can you share the content of worker_generate_data.log? Also, after running example_training/main.py, you should see another worker.log in these sub-folders. Did you see them?
>
> Re: Q2. Yes, another code needs to be written. You can refer to the README for an example of what those files should look like (after 'Let's look at a concrete example'). I will soon create an example of how these files were created for the datasets in our paper and share it here.

Recently, example_training\main.py was re-run on a computer with an Intel i7-11800H CPU @ 2.30 GHz and 64 GB of memory. It took 5740 minutes to generate the results folder named 'dataset-google,epoch-400,run-0,sample-len-1'. The results folder named 'dataset-google,epoch-400,run-0,sample-len-5' is being generated now.
I have three questions now.

  1. Once I know the attributes and features of my datasets, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz and data_train.npz?
  2. How should I choose parameters and hyperparameters, such as epoch, extra_checkpoint_freq, and so on?
  3. I have not yet run example_generating_data\main_generate_data.py. Will it generate a result file in CSV or XLS format?

