virtualdatalab's Issues

benchmark_example.ipynb Not working

Hey,

I tried running your notebook benchmark_example.ipynb on Google Colab and got the following error:
ModuleNotFoundError: No module named 'virtualdatalab.cython.cython_metric'

Log:

ModuleNotFoundError Traceback (most recent call last)
in <cell line: 4>()
2 from virtualdatalab.synthesizers.flatautoencoder import FlatAutoEncoderSynthesizer
3 from virtualdatalab.synthesizers.shuffle import ShuffleSynthesizer
----> 4 from virtualdatalab.benchmark import benchmark

1 frames
/content/vdl/virtualdatalab/virtualdatalab/benchmark.py in
14 import time
15
---> 16 from virtualdatalab.metrics import compare
17 from virtualdatalab.datasets.loader import load_cdnow,load_berka,load_mlb
18 from virtualdatalab.logging import getLogger

/content/vdl/virtualdatalab/virtualdatalab/metrics.py in
28 from virtualdatalab.target_data_manipulation import _generate_column_type_dictionary
29
---> 30 from virtualdatalab.cython.cython_metric import mixed_distance
31
32

ModuleNotFoundError: No module named 'virtualdatalab.cython.cython_metric'

New Feature: Unify logging interface

Context

VDL uses the Python logging infrastructure to generate logs. Logging is currently configured in virtualdatalab/benchmark.py, which implicitly affects every other module that uses logging.

Improvement

Implement a central logging module that sets up logging and is called by the other modules.

Benefits

  • Makes the behavior of logging more transparent
  • Breaks the implicit dependency between benchmark.py and other modules
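
A minimal sketch of what such a central module could look like, using only the standard library. The names setup_logging and get_logger and the root logger name "virtualdatalab" are hypothetical, chosen for illustration; the existing code may already expose a different interface (the traceback in the first issue shows an import of getLogger from virtualdatalab.logging).

```python
import logging

# Hypothetical root logger name for the package (an assumption, not from the source).
_ROOT_NAME = "virtualdatalab"
_configured = False


def setup_logging(level=logging.INFO):
    """Configure the package logger once; repeated calls only update the level."""
    global _configured
    root = logging.getLogger(_ROOT_NAME)
    root.setLevel(level)
    if not _configured:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        root.addHandler(handler)
        # Do not hand records on to handlers of the global root logger.
        root.propagate = False
        _configured = True
    return root


def get_logger(module_name):
    """Return a child logger that inherits the central configuration."""
    return logging.getLogger(f"{_ROOT_NAME}.{module_name}")
```

With this layout, benchmark.py and every other module would call get_logger("benchmark"), get_logger("metrics"), and so on, while only the entry point calls setup_logging. No module would then depend on benchmark.py having been imported first.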

Suggestion: Alter BaseSynthesizer training and generate interface to use callbacks

Context

Synthesizers derived from BaseSynthesizer are responsible for calling training and generate of the base class, which ensures that the data has specific properties.

Improvement

Instead of requiring this from the user, the derived classes could implement callbacks such as on_training_begin, on_training_end, on_generate_begin, and on_generate_end, which would be called by the BaseSynthesizer.

Impact

Doing so requires defining 4 interfaces. Training and synthetic data would either be passed through these interfaces or held in class properties.

Benefits

  • A cleaner separation of responsibilities
  • Provide VDL as a framework to better track Synthesizer progress (i.e. monitor and benchmark runtime)
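
A rough sketch of the idea, not the actual VDL code. The four hook names come from the suggestion above. Everything else is an assumption made for illustration: the _fit and _sample steps, the _check_format placeholder, the train_seconds attribute, the toy ResampleSynthesizer, and the use of plain lists of dicts instead of the real data format.

```python
import random
import time


class BaseSynthesizer:
    """Sketch: the base class drives the flow, subclasses fill in hooks."""

    def train(self, target_data):
        self._check_format(target_data)  # validation lives in one place
        start = time.perf_counter()
        self.on_training_begin(target_data)
        self._fit(target_data)  # hypothetical model-specific step
        self.on_training_end(target_data)
        self.train_seconds = time.perf_counter() - start  # runtime tracking

    def generate(self, number_of_subjects):
        self.on_generate_begin(number_of_subjects)
        synthetic = self._sample(number_of_subjects)  # hypothetical step
        self._check_format(synthetic)
        self.on_generate_end(synthetic)
        return synthetic

    # Hooks default to no-ops, so subclasses override only what they need.
    def on_training_begin(self, target_data):
        pass

    def on_training_end(self, target_data):
        pass

    def on_generate_begin(self, number_of_subjects):
        pass

    def on_generate_end(self, synthetic_data):
        pass

    def _check_format(self, data):
        # Placeholder for the real format check: a non-empty list of row dicts.
        if not data or not all(isinstance(row, dict) for row in data):
            raise ValueError("data must be a non-empty list of row dicts")

    def _fit(self, target_data):
        raise NotImplementedError

    def _sample(self, number_of_subjects):
        raise NotImplementedError


class ResampleSynthesizer(BaseSynthesizer):
    """Toy subclass: memorizes rows and resamples them with replacement."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.events = []

    def on_training_begin(self, target_data):
        self.events.append("training_begin")

    def on_generate_end(self, synthetic_data):
        self.events.append("generate_end")

    def _fit(self, target_data):
        self._rows = list(target_data)

    def _sample(self, number_of_subjects):
        return [dict(self._rng.choice(self._rows)) for _ in range(number_of_subjects)]
```

In this arrangement a derived class never calls the base class itself, so it cannot skip the data checks, and the base class can time or log each phase uniformly.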

Error Handling: Sequence count starting at one

Context

VDL expects datasets to start their sequence count (column sequence_pos) at 0. If a dataset is provided whose sequences start at 1, virtualdatalab.benchmark.compare() fails with a cryptic error message about columns not being found.

Improvement

Add a check for the correct sequence count to virtualdatalab.synthesizers.utils.check_common_data_format() and make sure that the data format is either checked before compare, or after generation. This could be combined with issue #1 to enforce correct dataset formats.

Benefits

Better user experience in case of wrong values in sequence_pos.
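
As an illustration, such a check could look roughly like the sketch below. The function name check_sequence_start is hypothetical, and it takes two plain iterables rather than the actual data structure used by check_common_data_format(), which is not shown in this issue.

```python
def check_sequence_start(ids, positions):
    """Raise ValueError unless every subject's lowest sequence_pos is 0.

    ids and positions are parallel iterables: one subject id and one
    sequence_pos value per row (an assumed, simplified layout).
    """
    first = {}
    for subject, pos in zip(ids, positions):
        # Track the smallest position seen so far for each subject.
        first[subject] = min(pos, first.get(subject, pos))

    bad = {subject: pos for subject, pos in first.items() if pos != 0}
    if bad:
        raise ValueError(
            "sequence_pos must start at 0 for every id; "
            f"offending ids and their first positions: {bad}"
        )
```

If the data lives in a pandas DataFrame with id and sequence_pos as regular columns (an assumption about the layout), the call would be check_sequence_start(df["id"], df["sequence_pos"]). The error then names the offending subjects directly instead of surfacing later as a missing-column failure in compare().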
