Comments (24)
By the way, for future readers of this thread:
If you are looking for a TF2 implementation of DoppelGANger, see https://github.com/fjxmlzn/DoppelGANger/tree/TF2 by @yzion.
If you are looking for a PyTorch implementation of DoppelGANger, see https://synthetics.docs.gretel.ai/en/stable/models/timeseries_dgan.html#timeseries-dgan by Gretel AI.
from doppelganger.
This or a PyTorch version would be super great to have. TF 1.4 is kind of a bummer :(
from doppelganger.
Hi @chameleonzz
Try running it with the TF2 branch. This branch supports TensorFlow 2, so you can use later versions of Python and CUDA.
Please post an update here if it solves your problem.
from doppelganger.
Thanks, @yzion, for the help and the answers!
@chameleonzz Re: how to decide the attributes and features for your own data.
The definition of features and attributes can be very flexible, depending on the aspects of the data you want DoppelGANger to capture. More specifically, let's take a simple example. Let's say your original data is a table in the following format.
| ColumnA | ColumnB | ColumnC |
|---|---|---|
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 2 | 2 | 3 |
| 2 | 2 | 5 |
| 2 | 2 | 6 |
You can treat any column (or several columns) as attributes (or metadata), group the rows according to those attributes, and treat the remaining columns as features (or time series).
For example, you can choose to treat ColumnA and ColumnB as attributes, and ColumnC as the feature. You will get 2 samples: {attributes=(1,2), features=(3,4)}, {attributes=(2,2), features=(3,5,6)}. DoppelGANger (ideally) will be able to learn the temporal correlations of features that are associated with the same attribute (i.e., (3,4) in the first sample, and (3,5,6) in the second sample). But you can also choose to treat only ColumnA as the attributes, or any other combinations of the columns you want. In short, how to choose features/attributes depends on the context of your application, and which part you want DoppelGANger to model as temporal correlations.
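To make the grouping concrete, here is a minimal Python sketch of the example above. It uses only the standard library. The names `rows`, `attribute_cols`, `feature_cols` and `samples` are made up for illustration and are not part of DoppelGANger.

```python
from collections import defaultdict

# The toy table from the example above, one dict per row.
rows = [
    {"ColumnA": 1, "ColumnB": 2, "ColumnC": 3},
    {"ColumnA": 1, "ColumnB": 2, "ColumnC": 4},
    {"ColumnA": 2, "ColumnB": 2, "ColumnC": 3},
    {"ColumnA": 2, "ColumnB": 2, "ColumnC": 5},
    {"ColumnA": 2, "ColumnB": 2, "ColumnC": 6},
]

# Illustrative choice: ColumnA and ColumnB are attributes, ColumnC is the feature.
attribute_cols = ("ColumnA", "ColumnB")
feature_cols = ("ColumnC",)

# Rows that share the same attribute values form one sample.
# Row order within a group is kept, so it acts as the time order.
grouped = defaultdict(list)
for row in rows:
    key = tuple(row[c] for c in attribute_cols)
    grouped[key].append(tuple(row[c] for c in feature_cols))

samples = [{"attributes": k, "features": v} for k, v in grouped.items()]

for sample in samples:
    print(sample)
```

Changing `attribute_cols` to `("ColumnA",)` and `feature_cols` to `("ColumnB", "ColumnC")` gives the alternative split mentioned above, where each time step carries two feature values.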
Hope this clarification helps!
from doppelganger.
Is there a PyTorch implementation available? TensorFlow is really hard to work with now. If anyone has worked on a PyTorch version of this, or wants to collaborate on open-sourcing one, let me know! I would be interested :)
from doppelganger.
Thank you all for the suggestions. I agree that a TF2 or PyTorch version of DoppelGANger would be very useful. Unfortunately, we do not have one so far. If and when you have a TF2 or PyTorch implementation, please let me know and I'll add a link to it. Thank you!
from doppelganger.
Did anyone manage to update it to TF2?
from doppelganger.
Hi, when I installed TensorFlow 1.4.0, PyCharm warned that Python 3.5 has reached its end-of-life date and is no longer supported in PyCharm. DoppelGANger does not seem to work normally. Is there any solution?
from doppelganger.
@chameleonzz Could you please post error messages or screenshots of the errors?
from doppelganger.
Running DG with TF 2.1.0 and Python 3.7 is feasible. However, when I tried to run gan_task.py in 'example_training', there was a warning: "unresolved reference 'gan'". I tried to pip install gan, but there seems to be no package named gan. How can I solve this problem? Thank you for your help.
from doppelganger.
@chameleonzz Can you share more information?
What was the Python command that you ran? Can you share the full warning? Does it still run with this warning?
The gan folder is part of the project, so maybe there is an import issue that you need to solve.
from doppelganger.
Thank you for your answer. I finally know how to solve the problem. To use DoppelGANger, there are three main steps. First, build a virtual environment, such as TF 2.1.0 + Python 3.7. Second, install some packages with pip, such as gan, GPUTaskScheduler and TensorFlow Privacy. The gan package can be downloaded from the DoppelGANger GitHub repository. These packages can be downloaded from GitHub and installed with the command 'pip install -e path/package_file_name'. Third, open the entire DoppelGANger (DG) project in PyCharm.
But I am confused by another problem. DG includes some examples whose data files are of type '.pkl' and '.npz'. If I want to create similar data files from my own data, how do I decide the attributes and features of my data? If the raw data for google, web and FCC_MBA were provided, together with an explanation of how the attributes and features of those datasets were decided, maybe more people could understand the work more easily. Besides, what output files will be generated after running each example project?
Finally, thank you very much for your enthusiastic answers each time. I hope the DG project can be used in more data-driven research. It is really significant work.
from doppelganger.
Any time :)
You can look for examples in the README file of the project.
There are examples for the pkl files and also for the npz files.
If you need them, there are also links to download the datasets that were used in this project, so you can download them and read them with Python to look at the structure.
Moreover, there are links to a number of blog posts, so you can try using them.
If you are still struggling, let me know and I will try to help.
Good luck
from doppelganger.
Recently, I ran into another problem.
I tried to run main.py in the example_training folder and main_generate_data.py in the example_generating_data folder. However, the only result was a folder named results, and its subfolders contained only a worker_*.log.txt file.
Q1: Why were no synthetic datasets for [web/google/FCC_MBA] generated?
I looked for a place in the code to specify the dataset path, but I found nothing.
Q2: Once I know the attributes and features of my datasets, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz and data_train.npz? Does additional code need to be written to do this?
Finally, thank you for your continued patient answers.
from doppelganger.
Re: Q1. Can you share the content of worker_generate_data.log? Also, after running example_training/main.py, you should see another worker.log in these sub-folders. Did you see them?
Re: Q2. Yes, additional code needs to be written. You can refer to the README for an example of what those files should look like (after 'Let's look at a concrete example'). I will soon create an example of how these files were created for the datasets in our paper and share it here.
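As a rough illustration of what such code could look like, the sketch below pads variable-length samples into fixed-size NumPy arrays and saves them to an .npz file. The array names `data_feature`, `data_attribute` and `data_gen_flag`, and their shapes, are assumptions about the layout described in the README, so please check them there before relying on this. The `samples` list is made-up toy data. Normalization and the two .pkl files that describe each column are not covered.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy samples: (attribute vector, list of per-step feature vectors).
samples = [
    ((1.0, 2.0), [[3.0], [4.0]]),
    ((2.0, 2.0), [[3.0], [5.0], [6.0]]),
]

num_samples = len(samples)
max_len = max(len(feats) for _, feats in samples)
feat_dim = len(samples[0][1][0])

# Assumed layout: attributes are [samples x attribute_dim],
# features are [samples x max_len x feature_dim], zero-padded,
# and the flag array marks which time steps hold real data.
data_attribute = np.array([attrs for attrs, _ in samples], dtype=np.float32)
data_feature = np.zeros((num_samples, max_len, feat_dim), dtype=np.float32)
data_gen_flag = np.zeros((num_samples, max_len), dtype=np.float32)

for i, (_, feats) in enumerate(samples):
    data_feature[i, :len(feats)] = feats
    data_gen_flag[i, :len(feats)] = 1.0

# Write to a temporary directory so the sketch does not touch real data.
out_path = os.path.join(tempfile.mkdtemp(), "data_train.npz")
np.savez(
    out_path,
    data_feature=data_feature,
    data_attribute=data_attribute,
    data_gen_flag=data_gen_flag,
)

# Reload the archive and list what it contains.
with np.load(out_path) as archive:
    loaded_shapes = {name: archive[name].shape for name in archive.files}
print(loaded_shapes)
```

The same `np.load` loop at the end can be pointed at one of the example .npz files from the repository to see how the real datasets are laid out.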
from doppelganger.
After running example_training/main.py, the content of worker.log was as follows. (The 'aux_disc-False,dataset-FCC_MBA,epoch-17000,epoch_checkpoint_freq-70,extra_checkpoint_freq-850,run-0,sample_len-1,self_norm-False,' folder was taken as an example.)
After running example_generating_data/main_generate_data.py, the content of worker_generate_data.log was as follows.
I wonder whether example_training/main.py and example_generating_data/main_generate_data.py produce only those output files. If I want to generate synthetic data corresponding to the real datasets (web/google/FCC_MBA), what should I do?
from doppelganger.
@chameleonzz No, there should be other files, and the content of worker.log or worker_generate_data.log should be more than this line.
Could you please delete the results folder completely, try running example_training/main.py again, and paste the console output plus the content of worker.log here again?
from doppelganger.
Thanks for your answer.
I have tried several times to delete the results folder completely and run example_training/main.py again, but the output did not change. It was the same as in the previous pictures.
Should I change something in example_training/main.py and run it again?
from doppelganger.
Could you please paste here the console (i.e., terminal) output?
from doppelganger.
@chameleonzz
Also, we can move further discussion of this question to #30, since the problem you see is likely not due to TF2.
from doppelganger.
Hello, have you solved the problem of incomplete training and generated output? I had a similar problem recently: I only had a worker.log in the folder that was generated.
from doppelganger.
For the previous problem, please refer to #30. For this issue, would you mind creating a new issue? We can discuss it there. This is a different problem.
from doppelganger.
Recently, example_training\main.py was re-run on a computer with an Intel i7-11800H CPU @ 2.30 GHz and 64 GB of memory. It took 5740 minutes to generate the results folder named “dataset-google,epoch-400,run-0,sample-len-1”. The results folder named 'dataset-google,epoch-400,run-0,sample-len-5' is being generated now.
I have three questions now.
- Once I know the attributes and features of my datasets, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz and data_train.npz?
- How do I choose parameters and hyperparameters, such as epoch, extra_checkpoint_freq, and so on?
- I have not yet run example_generating_data\main_generate_data.py. Will it generate a result file in CSV or XLS format?
from doppelganger.