mostly-ai / virtualdatalab

Benchmarking synthetic data generators for sequential data in terms of accuracy and privacy.

License: GNU General Public License v3.0

Python 99.65% Cython 0.35%

privacy sequential-data synthetic-data

virtualdatalab's People

Forkers

jgeofil baudoindesury hitum-dev aliwicks pngu4751 harishgovardhandamodar

virtualdatalab's Issues

benchmark_example.ipynb Not working

Hey,

I tried running your notebook benchmark_example.ipynb on Google Colab and got this error:
ModuleNotFoundError: No module named 'virtualdatalab.cython.cython_metric'

Log:

ModuleNotFoundError Traceback (most recent call last)
in <cell line: 4>()
2 from virtualdatalab.synthesizers.flatautoencoder import FlatAutoEncoderSynthesizer
3 from virtualdatalab.synthesizers.shuffle import ShuffleSynthesizer
----> 4 from virtualdatalab.benchmark import benchmark

1 frames
/content/vdl/virtualdatalab/virtualdatalab/benchmark.py in
14 import time
15
---> 16 from virtualdatalab.metrics import compare
17 from virtualdatalab.datasets.loader import load_cdnow,load_berka,load_mlb
18 from virtualdatalab.logging import getLogger

/content/vdl/virtualdatalab/virtualdatalab/metrics.py in
28 from virtualdatalab.target_data_manipulation import _generate_column_type_dictionary
29
---> 30 from virtualdatalab.cython.cython_metric import mixed_distance
31
32

ModuleNotFoundError: No module named 'virtualdatalab.cython.cython_metric'

New Feature: Unify logging interface

Context

VDL uses the Python logging infrastructure to generate logs. Log creation is currently configured in virtualdatalab/benchmark.py, which has implicit effects on all other modules that use logging.

Improvement

Implement a central logging module that sets up logging and is called by the other modules.

Benefits

Makes the behavior of logging more transparent
Breaks the implicit dependency between benchmark.py and other modules
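
A minimal sketch of what such a central module could look like, using only the standard logging package. The names setup_logging, get_logger and the "virtualdatalab" logger namespace are assumptions made for illustration, not the existing VDL API.

```python
import logging

PACKAGE = "virtualdatalab"  # assumed logger namespace for the whole package
_CONFIGURED = False         # guard so the handler is attached only once


def setup_logging(level=logging.INFO):
    """Attach one stream handler to the package logger and set its level."""
    global _CONFIGURED
    package_logger = logging.getLogger(PACKAGE)
    if not _CONFIGURED:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        package_logger.addHandler(handler)
        _CONFIGURED = True
    package_logger.setLevel(level)
    return package_logger


def get_logger(name):
    """Return a child logger; modules call this instead of configuring logging."""
    if not _CONFIGURED:
        setup_logging()
    return logging.getLogger(f"{PACKAGE}.{name}")
```

With this layout, benchmark.py and metrics.py would each call get_logger with their own name, and only an explicit setup_logging call changes the configuration.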

Adding version information

I think we should add versioning information using versioneer, similar to other projects.

Suggestion: Alter BaseSynthesizer training and generate interface to use callbacks

Context

Synthesizers derived from BaseSynthesizer are responsible for calling training and generate of the base class, which makes sure the data has specific properties.

Improvement

Instead of requiring this from the user, one could make the derived classes implement callbacks such as on_training_begin, on_training_end, on_generate_begin, and on_generate_end, which would be called by the BaseSynthesizer.

Impact

Doing so requires defining four interfaces. Training and synthetic data would either be passed through these interfaces or be held in class properties.

Benefits

A cleaner separation of responsibilities
Provide VDL as a framework to better track Synthesizer progress (i.e. monitor and benchmark runtime)
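
One possible reading of this proposal, sketched below. The hook names come from the suggestion above; everything else (the method signatures, the timing attributes, the toy _check_data stand-in and the RepeatSynthesizer example) is hypothetical and does not reflect the current BaseSynthesizer.

```python
import time


class BaseSynthesizer:
    """Hypothetical base class that drives the workflow and invokes hooks."""

    def train(self, target_data):
        self._check_data(target_data)        # validation lives in one place
        started = time.perf_counter()
        self.on_training_begin(target_data)  # derived class fits its model here
        self.on_training_end()               # derived class can finalize state
        self.training_seconds = time.perf_counter() - started

    def generate(self, number_of_subjects):
        started = time.perf_counter()
        self.on_generate_begin(number_of_subjects)
        synthetic = self.on_generate_end()   # derived class returns the data
        self.generate_seconds = time.perf_counter() - started
        self._check_data(synthetic)
        return synthetic

    @staticmethod
    def _check_data(data):
        # Stand-in for the real format checks, which are not shown here.
        if not isinstance(data, list) or not data:
            raise ValueError("data must be a non-empty list of rows")

    # Default hooks do nothing; derived classes override what they need.
    def on_training_begin(self, target_data):
        pass

    def on_training_end(self):
        pass

    def on_generate_begin(self, number_of_subjects):
        pass

    def on_generate_end(self):
        raise NotImplementedError


class RepeatSynthesizer(BaseSynthesizer):
    """Toy synthesizer that replays the training rows in order."""

    def on_training_begin(self, target_data):
        self._rows = list(target_data)

    def on_generate_begin(self, number_of_subjects):
        self._n = number_of_subjects

    def on_generate_end(self):
        return [self._rows[i % len(self._rows)] for i in range(self._n)]
```

Because the base class owns train and generate, the runtime measurement and the format checks happen without the derived class having to remember to call anything.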

New Feature: add L1D metric; simplify TVD metric

Let's add an L1D metric.
Let's rename the accuracy metric to TVD to make it explicit.
Let's simplify the TVD logic so that it no longer depends on conditional probabilities.
Consider adding Hellinger distance as well.
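
For reference, a small sketch of the three distances on a single categorical column, using the textbook definitions: L1 distance as the sum of absolute differences of relative frequencies, TVD as half of that, and Hellinger distance on the square roots of the frequencies. This is an assumption about what the metrics would compute. VDL's actual definitions (binning, aggregation across columns) may differ, and the function name distances is made up for this example.

```python
import math
from collections import Counter


def _relative_frequencies(values):
    """Map each category to its share of the observations."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one observation is required")
    return {category: count / total for category, count in counts.items()}


def distances(target_values, synthetic_values):
    """Compare two samples of one categorical column."""
    p = _relative_frequencies(target_values)
    q = _relative_frequencies(synthetic_values)
    categories = set(p) | set(q)  # a category missing on one side has frequency 0
    l1 = sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)
    squared = sum(
        (math.sqrt(p.get(c, 0.0)) - math.sqrt(q.get(c, 0.0))) ** 2
        for c in categories
    )
    return {"l1d": l1, "tvd": 0.5 * l1, "hellinger": math.sqrt(0.5 * squared)}
```

Under these definitions TVD and Hellinger distance both range from 0 (identical distributions) to 1 (no shared categories), while L1D ranges from 0 to 2.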

Add coherence metrics

Error Handling: Sequence count starting at one

Context

VDL expects datasets to start their sequence count (column sequence_pos) from 0. If a dataset is provided whose sequence starts at 1, virtualdatalab.benchmark.compare() fails with a cryptic error message about columns not being found.

Improvement

Add a check for the correct sequence count to virtualdatalab.synthesizers.utils.check_common_data_format() and make sure that the data format is either checked before compare, or after generation. This could be combined with issue #1 to enforce correct dataset formats.

Benefits

Better user experience in case of wrong values in sequence_pos.
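
A dependency-free sketch of the kind of check that could be added. The signature of check_common_data_format() is not shown in this issue, so the helper below is a standalone, hypothetical function that takes (id, sequence_pos) pairs rather than the real data frame.

```python
def check_sequence_start(rows):
    """Raise ValueError if any subject's smallest sequence_pos is not 0.

    rows: iterable of (subject_id, sequence_pos) pairs (assumed input shape).
    """
    first_pos = {}
    for subject_id, pos in rows:
        # Track the smallest position seen so far for each subject.
        first_pos[subject_id] = min(pos, first_pos.get(subject_id, pos))
    offenders = {sid: pos for sid, pos in first_pos.items() if pos != 0}
    if offenders:
        raise ValueError(
            "sequence_pos must start at 0 for every id; "
            f"found other starting values: {offenders}"
        )
```

Calling it before compare, or right after generation, would turn the column lookup failure into a message that names the actual problem.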
