mostly-ai / virtualdatalab
Benchmarking synthetic data generators for sequential data in terms of accuracy and privacy.
License: GNU General Public License v3.0
Hey,
I tried running your notebook benchmark_example.ipynb on Google Colab and I get an error:
ModuleNotFoundError: No module named 'virtualdatalab.cython.cython_metric'
ModuleNotFoundError Traceback (most recent call last)
in <cell line: 4>()
2 from virtualdatalab.synthesizers.flatautoencoder import FlatAutoEncoderSynthesizer
3 from virtualdatalab.synthesizers.shuffle import ShuffleSynthesizer
----> 4 from virtualdatalab.benchmark import benchmark
1 frames
/content/vdl/virtualdatalab/virtualdatalab/benchmark.py in
14 import time
15
---> 16 from virtualdatalab.metrics import compare
17 from virtualdatalab.datasets.loader import load_cdnow,load_berka,load_mlb
18 from virtualdatalab.logging import getLogger
/content/vdl/virtualdatalab/virtualdatalab/metrics.py in
28 from virtualdatalab.target_data_manipulation import _generate_column_type_dictionary
29
---> 30 from virtualdatalab.cython.cython_metric import mixed_distance
31
32
ModuleNotFoundError: No module named 'virtualdatalab.cython.cython_metric'
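The traceback shows that the package itself imports but the submodule virtualdatalab.cython.cython_metric cannot be found. One possible cause, which is an assumption and not confirmed in this report, is that the compiled extension was never built in the Colab environment. A minimal sketch for checking whether a module can be located before importing it (the helper name module_available is hypothetical):

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located.

    find_spec imports parent packages but not the target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError,
        # which is a subclass of ImportError.
        return False


if __name__ == "__main__":
    target = "virtualdatalab.cython.cython_metric"
    print(target, "available:", module_available(target))
```

If this reports that the module is unavailable while virtualdatalab itself imports, the installation step is the place to look.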
VDL uses the Python logging infrastructure to generate logs. Log creation is currently configured in virtualdatalab/benchmark.py, which implicitly affects all other modules that use logging.
Implement a central logging module that is used to setup logging and is called by the other modules.
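A rough sketch of what such a central module could look like, using only the standard library. The function names setup_logging and get_logger, the log format, and the "virtualdatalab" logger namespace are assumptions for illustration, not the project's actual API:

```python
import logging

_CONFIGURED = False  # guards against attaching the handler more than once


def setup_logging(level: int = logging.INFO) -> None:
    """Configure the package-level logger once; later calls are no-ops."""
    global _CONFIGURED
    if _CONFIGURED:
        return
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    package_logger = logging.getLogger("virtualdatalab")
    package_logger.addHandler(handler)
    package_logger.setLevel(level)
    _CONFIGURED = True


def get_logger(name: str) -> logging.Logger:
    """Return a child logger that inherits the central configuration."""
    setup_logging()
    return logging.getLogger(f"virtualdatalab.{name}")
```

Other modules would then call get_logger("benchmark") or similar instead of configuring handlers themselves, so importing benchmark.py no longer changes logging behaviour elsewhere.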
I think we should add versioning information using versioneer, similar to other projects.
Synthesizers derived from BaseSynthesizer are responsible for calling training and generate of the base class, which makes sure the data has specific properties.
Instead of requiring this from the user, the derived classes could implement callbacks such as on_training_begin, on_training_end, on_generate_begin, and on_generate_end, which would be called by the BaseSynthesizer.
Doing so requires defining 4 interfaces, which either pass the training and synthetic data as arguments or rely on class properties that hold this information.
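One possible reading of this proposal, sketched with plain Python lists standing in for the real data. The method names train, generate, _fit, _sample, and _check_format, and the EchoSynthesizer class, are hypothetical; only the four callback names come from the issue:

```python
from typing import Any, List


class BaseSynthesizer:
    """Base class owns the control flow; subclasses only fill in hooks."""

    def train(self, data: List[Any]) -> None:
        self._check_format(data)          # base class enforces the data format
        self.on_training_begin(data)
        self._fit(data)
        self.on_training_end()

    def generate(self, n: int) -> List[Any]:
        self.on_generate_begin(n)
        synthetic = self._sample(n)
        self._check_format(synthetic)     # output is checked as well
        self.on_generate_end(synthetic)
        return synthetic

    @staticmethod
    def _check_format(data: List[Any]) -> None:
        if not isinstance(data, list):
            raise TypeError("data must be a list")

    # Hooks default to no-ops so subclasses override only what they need.
    def on_training_begin(self, data: List[Any]) -> None: ...
    def on_training_end(self) -> None: ...
    def on_generate_begin(self, n: int) -> None: ...
    def on_generate_end(self, synthetic: List[Any]) -> None: ...

    def _fit(self, data: List[Any]) -> None:
        raise NotImplementedError

    def _sample(self, n: int) -> List[Any]:
        raise NotImplementedError


class EchoSynthesizer(BaseSynthesizer):
    """Toy subclass that replays the first n training rows."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self._data: List[Any] = []

    def on_training_begin(self, data: List[Any]) -> None:
        self.events.append("on_training_begin")

    def on_training_end(self) -> None:
        self.events.append("on_training_end")

    def on_generate_begin(self, n: int) -> None:
        self.events.append("on_generate_begin")

    def on_generate_end(self, synthetic: List[Any]) -> None:
        self.events.append("on_generate_end")

    def _fit(self, data: List[Any]) -> None:
        self._data = list(data)

    def _sample(self, n: int) -> List[Any]:
        return self._data[:n]
```

With this layout the user never has to remember to call the base class, because the base class is the only entry point and the format checks cannot be skipped.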
see also https://mostly.ai/2020/09/25/the-worlds-most-accurate-synthetic-data-platform/
VDL expects datasets to start their sequence count (column sequence_pos) from 0. If a dataset is provided whose sequences start at 1, virtualdatalab.benchmark.compare() fails with a cryptic error message about columns not found.
Add a check for the correct sequence count to virtualdatalab.synthesizers.utils.check_common_data_format() and make sure that the data format is either checked before compare, or after generation. This could be combined with issue #1 to enforce correct dataset formats.
Better user experience in case of wrong values in sequence_pos.
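A minimal sketch of such a check, assuming each row can be reduced to a (sequence id, sequence_pos) pair. Plain tuples stand in for the dataset's rows here, and the function name check_sequence_pos is hypothetical:

```python
from typing import Dict, Hashable, Iterable, Tuple


def check_sequence_pos(rows: Iterable[Tuple[Hashable, int]]) -> None:
    """Raise ValueError if any sequence does not start counting at 0."""
    first_pos: Dict[Hashable, int] = {}
    for seq_id, pos in rows:
        # Track the smallest sequence_pos seen for each sequence id.
        if seq_id not in first_pos or pos < first_pos[seq_id]:
            first_pos[seq_id] = pos

    bad = {seq_id: pos for seq_id, pos in first_pos.items() if pos != 0}
    if bad:
        example_id, example_pos = next(iter(bad.items()))
        raise ValueError(
            f"sequence_pos must start at 0 for every sequence; "
            f"{len(bad)} sequence(s) do not, e.g. id {example_id!r} "
            f"starts at {example_pos}"
        )
```

Running a check like this before compare, or right after generation, would replace the column-not-found failure with a message that names the actual problem.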