borealisai / private-data-generation
A toolbox for differentially private data generation
License: Other
Hi,
I got this error message when running pate-gan with the breast cancer dataset.
Traceback (most recent call last):
  File "evaluate.py", line 144, in <module>
    lap_scale=opt.lap_scale, class_ratios=class_ratios, lr=1e-4))
  File "/content/private-data-generation/models/pate_gan.py", line 104, in train
    fake = self.generator(torch.cat([z.double(), category], dim=1))
RuntimeError: Sizes of tensors must match except in dimension 0. Got 45 and 64 (The offending index is 0)
I have made sure to drop all NaN values, and all values in the dataset are continuous.
Could you please shed some light on the issue?
Here is the command I used to run pate-gan:
python evaluate.py --target-variable='target' \
    --train-data-path=./data/breast_processed_train.csv \
    --test-data-path=./data/breast_processed_test.csv \
    --normalize-data pate-gan --enable-privacy \
    --target-epsilon=1
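The traceback shows `torch.cat([z.double(), category], dim=1)` failing because one tensor has 45 rows and the other has 64. One possible cause, offered as an assumption rather than a diagnosis of `pate_gan.py`, is that the noise `z` is drawn with a fixed batch size while `category` follows a shorter batch or data partition. The pure-Python sketch below uses hypothetical helper names (`batch_sizes`, `can_concat_columns`) and hypothetical row counts to show how such a mismatch can arise, and how sizing both inputs from the same row count avoids it.

```python
def batch_sizes(n_rows, batch_size):
    """Row count of each batch when n_rows is split into chunks of batch_size."""
    full, rest = divmod(n_rows, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def can_concat_columns(shape_a, shape_b):
    """Column-wise concatenation (dim=1) needs the same number of rows."""
    return shape_a[0] == shape_b[0]


# Hypothetical numbers: 173 rows with batch size 64 leave a final batch of 45.
sizes = batch_sizes(173, 64)

latent_dim = 10                           # hypothetical latent size
fixed_noise_shape = (64, latent_dim)      # noise always drawn with 64 rows
category_shape = (sizes[-1], 1)           # labels follow the short batch

# The fixed-size noise cannot be concatenated with the 45-row labels.
mismatch = not can_concat_columns(fixed_noise_shape, category_shape)

# Sizing the noise from the category's row count keeps both inputs aligned.
matched_noise_shape = (category_shape[0], latent_dim)
aligned = can_concat_columns(matched_noise_shape, category_shape)
```

If this is the cause, printing the shapes of `z` and `category` just before line 104 of `pate_gan.py` should show which of the two has 45 rows.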
Hi there,
First I wanted to say fantastic work, I'm looking forward to hopefully implementing this on some projects.
I've just run your example code:
python evaluate.py --target-variable='income' --train-data-path=./data/adult_processed_train.csv --test-data-path=./data/adult_processed_test.csv --normalize-data dp-wgan --enable-privacy --sigma=0.8 --target-epsilon=8
but my results are much lower than your example output.
Results were obtained on epoch 243, here's the final console output before training stopped:
Epoch : 283 Loss D real : 0.011110783401113983 Loss D fake : 0.010858841290446964 Loss G : 0.010988074410009374 Epsilon spent : 8.001855949312862
Any ideas why my output results are much lower and how I can fix this?
I did have another issue: the parser failed to pass the target variable to the pandas DataFrames of the train and test data in evaluate.py. I fixed this by replacing all instances of opt.target_variable with 'income'. I'm not sure whether the two issues are linked, so I thought I would mention it.
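One possible explanation for the target-variable problem (an assumption, since the issue does not say which shell was used) is that the shell passed the quote characters literally, so `opt.target_variable` became `'income'` including the quotes and no longer matched a DataFrame column. Windows `cmd.exe`, for example, does not strip single quotes. The sketch below shows a hypothetical `strip_quotes` converter for `argparse`; only the `--target-variable` flag name comes from the command above.

```python
import argparse


def strip_quotes(value):
    """Remove one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


parser = argparse.ArgumentParser()
# argparse applies the `type` callable to the raw command-line string.
parser.add_argument("--target-variable", type=strip_quotes, required=True)

# Simulates a shell that forwarded the single quotes unchanged.
opt = parser.parse_args(["--target-variable='income'"])
print(opt.target_variable)
```

Dropping the quotes on the command line (`--target-variable=income`) would have the same effect without any code change.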
I'm installing the requirements from requirements.txt in a conda environment that looks like this:
# Name              Version     Build            Channel
_libgcc_mutex       0.1         main
_openmp_mutex       5.1         1_gnu
bzip2               1.0.8       h7b6447c_0
ca-certificates     2023.08.22  h06a4308_0
ld_impl_linux-64    2.38        h1181459_1
libffi              3.4.4       h6a678d5_0
libgcc-ng           11.2.0      h1234567_1
libgomp             11.2.0      h1234567_1
libstdcxx-ng        11.2.0      h1234567_1
libuuid             1.41.5      h5eee18b_0
ncurses             6.4         h6a678d5_0
openssl             3.0.12      h7f8727e_0
pip                 23.3        py310h06a4308_0
python              3.10.13     h955ad1f_0
readline            8.2         h5eee18b_0
setuptools          68.0.0      py310h06a4308_0
sqlite              3.41.2      h5eee18b_0
tk                  8.6.12      h1ccaba5_0
tzdata              2023c       h04d1e81_0
wheel               0.41.2      py310h06a4308_0
xz                  5.4.2       h5eee18b_0
zlib                1.2.13      h5eee18b_0
However, it always encounters problems when trying to install pandas.
Collecting pandas==0.20.3 (from -r requirements.txt (line 6))
Using cached pandas-0.20.3.tar.gz (10.4 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
/tmp/pip-install-rm3spd3q/pandas_066c7fe344da4499ae83440107a7e8eb/setup.py:39: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
/home/quan-tran/anaconda3/envs/private-data-gen/lib/python3.10/site-packages/setuptools/__init__.py:84: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try `pip install --use-pep517`.
********************************************************************************
!!
dist.fetch_build_eggs(dist.setup_requires)
error in pandas setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Expected end or semicolon (after version specifier)
pytz >= 2011k
~~~~~~~^
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Has anyone encountered the same problem? How did you solve it?
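The log shows pip building pandas 0.20.3 from source on Python 3.10, and setuptools rejecting the `pytz >= 2011k` requirement in that release's setup script. That pandas release predates Python 3.10, so two commonly used workarounds are an environment with an older Python, or a relaxed pin. The sketch below illustrates the second option with a hypothetical helper, `relax_pins`, applied to hypothetical file contents. Whether the toolbox runs unchanged on a newer pandas is not confirmed here.

```python
import re


def relax_pins(requirements_text, packages):
    """Turn 'name==x.y.z' into 'name>=x.y.z' for the selected packages only."""
    relaxed = []
    for line in requirements_text.splitlines():
        match = re.match(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*(\S+)\s*$", line)
        if match and match.group(1).lower() in packages:
            line = f"{match.group(1)}>={match.group(2)}"
        relaxed.append(line)
    return "\n".join(relaxed) + "\n"


# Hypothetical file contents; only the pandas pin appears in the log above.
original = "somepackage==1.0.0\npandas==0.20.3\n"
relaxed_text = relax_pins(original, {"pandas"})
print(relaxed_text)
```

Only the listed packages are touched, so the remaining pins stay exactly as the repository specifies them.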
Hi, why do the generator (and discriminator) inputs include the category tensor? I don't understand why torch.multinomial is used, but if the result represents labels, why is it forwarded to the generator and the discriminator?
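The traceback earlier on this page shows the generator being called on `torch.cat([z.double(), category], dim=1)`. If `category` holds class labels sampled according to `class_ratios` (an assumption based on the argument names, not on a reading of the model code), this matches the usual conditional-GAN pattern: the label is appended to the noise so the generator produces a row for that class, and the discriminator sees the same label so it judges the row in the context of its class. The pure-Python sketch below mimics that pattern with hypothetical helpers and values; it is not the repository's implementation.

```python
import random


def sample_categories(class_ratios, batch_size, rng):
    """Draw one class index per row, with probability given by class_ratios."""
    classes = list(range(len(class_ratios)))
    return rng.choices(classes, weights=class_ratios, k=batch_size)


def generator_inputs(noise_dim, categories, rng):
    """Append each row's class index to its noise vector (noise + label)."""
    return [
        [rng.gauss(0.0, 1.0) for _ in range(noise_dim)] + [float(c)]
        for c in categories
    ]


rng = random.Random(0)
class_ratios = [0.75, 0.25]          # hypothetical class frequencies
categories = sample_categories(class_ratios, batch_size=8, rng=rng)
inputs = generator_inputs(noise_dim=4, categories=categories, rng=rng)
```

Each row of `inputs` has `noise_dim + 1` entries, with the sampled class in the last position, which is the role the concatenation along `dim=1` plays in the traceback.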
In the RDP accountant function, the noise multiplier is defined as the scale of the Gaussian noise added to perturb the gradients, divided by the sensitivity, i.e. (sigma * C) / (2C), where sigma is the noise hyper-parameter and C is the clipping coefficient. However, in your RDP function you pass only sigma as the noise multiplier. Shouldn't this be corrected, or am I missing something?
Hi, I ran the demo:
python evaluate.py --target-variable='income' --train-data-path=./data/adult_processed_train.csv --test-data-path=./data/adult_processed_test.csv --normalize-data dp-wgan --enable-privacy --sigma=0.8 --target-epsilon=8
But I get very low AUC scores:
AUC scores of downstream classifiers on test data :
LR: 0.3407119900995038
Random Forest: 0.2610576777879843
Neural Network: 0.3578788366713348
GaussianNB: 0.49315881768882325
GradientBoostingClassifier: 0.2606982987000339
May I know how to run the code to get your listed result?
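Several of the scores above are well below 0.5, i.e. worse than random ranking. For a binary target, scoring the same predictions against inverted labels gives 1 - AUC, so a value near 0.26 would correspond to about 0.74 after a flip. One thing that might be worth checking (an assumption, not a confirmed cause) is whether the label encoding of the synthetic data matches that of the test data. The sketch below demonstrates the flip on hypothetical toy data with a small pairwise AUC function.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]            # hypothetical toy data
scores = [0.6, 0.9, 0.2, 0.7]

original = auc(labels, scores)
# Same scores, evaluated against inverted labels.
flipped = auc([1 - y for y in labels], scores)
```

Here `original` and `flipped` sum to 1, which is why an AUC far below 0.5 usually signals a systematic inversion rather than a model that has learned nothing.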