maf's People

Contributors

gpapamak, josemanuel22

maf's Issues

Migration to Python 3.6.4

While trying to understand the code I realized that it runs on Python 2, so I decided to attempt a small migration. From what I've seen, there are serious compatibility problems with Python > 3.6. After a few minor changes I've managed to get the code working on version 3.6.4 (the changes would still need to be tested exhaustively). It would be great if you could give me access to push my migration branch.

You can find my code in the fork on my GitHub.

Update requested

I'm not an expert, but I've been working hard to run this code on Google Colab. It looks like it does not work with the latest Python packages. Please make the small changes required to run on Python 3 so that people like me can run this code. Nice paper, by the way!

Problem with preprocessing of UCI datasets, especially MiniBooNE

When doing density estimation on the UCI datasets HEPMASS and MiniBooNE, I saw in appendix D.2 of the article that several dimensions of the raw data were removed because certain real values recur too frequently. This makes sense to me, since such densities would involve Dirac delta components, which are problematic to estimate with continuous densities. However, when I checked the code I stumbled upon the following line:

max_count = np.array([v for k, v in sorted(c.iteritems())])[0]

It seems to compute the maximum over the counts of each real value, but when I implemented it myself I found that this is not the case. sorted sorts the (value, count) pairs by their first entry, which is the real value the count corresponds to, and not the count itself. I demonstrate this problem in the following notebook:
https://gist.github.com/VincentStimper/bed1aa10ac187dc51eefa85e683a7df4
It also showcases the consequences. For the HEPMASS dataset there is coincidentally no difference between the features that get dropped and the features that would be dropped when max_count is computed correctly, i.e. by using

max_count = np.max(np.unique(feature, return_counts=True)[1])

For MiniBooNE, on the other hand, some dimensions are dropped although their max_count is only moderately high, e.g. 6, while dimensions with values recurring 3434 times are kept.

This might be a minor issue, but since the version of the MiniBooNE dataset you made publicly available has been used numerous times by others as a benchmark for density estimation, I think it requires our attention.
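
To make the failure mode concrete without the notebook, here is a minimal toy sketch (hypothetical data; Python 3's Counter.items() stands in for the Python 2 iteritems()):

import numpy as np
from collections import Counter

feature = np.array([0.1, 0.5, 0.5, 0.5, 0.9, 0.9])
c = Counter(feature)

# sorted(c.items()) orders the pairs by the real value, so index [0] picks
# the count of the *smallest* value (1 here), not the largest count.
buggy_max_count = np.array([v for k, v in sorted(c.items())])[0]  # -> 1

# Correct: the largest number of occurrences of any single value.
max_count = np.max(np.unique(feature, return_counts=True)[1])     # -> 3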

Can you provide details on your configuration (Theano version especially)?

I have tried running your code but got the following error message (MNIST experiments):

theano.gof.fg.MissingInputError: A variable that is an input to the graph was neither provided as an input to the function nor given a value. A chain of variables leading from this input to an output is [x, dot.0, Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0, h1, dot.0, Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0, h2, dot.0, logp, Elemwise{mul,no_inplace}.0, Elemwise{exp,no_inplace}.0, Elemwise{mul,no_inplace}.0, Sum{axis=[0], acc_dtype=float64}.0, mean]. This chain may not be unique
Backtrace when the variable is created:
  File "run_experiments.py", line 245, in <module>
    main()
  File "run_experiments.py", line 241, in main
    methods[name]()
  File "run_experiments.py", line 184, in run_experiments_mnist
    ex.train_maf_cond([n_hiddens]*2, act_fun, n_layers*i, mode)
  File "/u/home/maf/experiments.py", line 248, in train_maf_cond
    model = mafs.ConditionalMaskedAutoregressiveFlow(data.n_labels, data.n_dims, n_hiddens, act_fun, n_mades, mode=mode)
  File "/u/home/maf/ml/models/mafs.py", line 172, in __init__
    self.input = tt.matrix('x', dtype=dtype) if input is None else input

It looks like the model is not getting the data properly. Could this be caused by changes in the Theano version?

How do you preprocess your data?

I am trying to run your code on the POWER and GAS datasets.
The data I downloaded from the link consists of .txt files.
However, in your code, you read from a file called 'data.npy'.

def load_data():
    return np.load(datasets.root + 'power/data.npy')

Could you please provide the code used to preprocess the data and generate the .npy files?
Thanks.
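
For reference, a minimal hypothetical sketch of such a conversion, assuming the raw download is a plain whitespace-delimited numeric table (the filename is an assumption, not from the repo):

import numpy as np

# Hypothetical: load the downloaded text file as a 2-D float array and save
# it in the .npy layout that load_data() above expects.
raw = np.loadtxt('power/raw.txt')
np.save('power/data.npy', raw)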

Broken datasets due to pandas API changes

Hello @gpapamak,

Due to API changes in pandas, the GAS and HEPMASS datasets are not usable anymore. Notably, the DataFrame.as_matrix method has been deprecated since pandas=0.23.0, and the DataFrame pickling format of pandas<2.0 is not compatible with pandas>=2.0. There is also an issue with Counter.iteritems, which was removed in Python 3.
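
A quick sketch of the substitutions implied above (my paraphrase, not code from the fork):

import pandas as pd
from collections import Counter

df = pd.DataFrame({'a': [1, 2, 3]})
data = df.to_numpy()     # modern replacement for the removed df.as_matrix()

c = Counter(['x', 'x', 'y'])
pairs = list(c.items())  # Python 3 replacement for c.iteritems()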

I don't think modifying this repository to fix these issues is a good idea, as it could break the code. Instead, I made a lightweight fork (francois-rozet/uci-datasets) of the repo's UCI datasets and wrote instructions to generate environment-agnostic .npy files containing the processed data. These .npy files can then be used without relying on the original code and its dependencies. I hope that's OK with you.

Can you provide the preprocessed datasets?

It's unclear how attributes with a Pearson correlation coefficient greater than 0.98 are eliminated. Since correlation is computed between pairs of attributes, how do you decide which attribute of a pair to eliminate?

Thanks.
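
One plausible greedy scheme, stated purely as an assumption (the repo may break ties differently): scan the columns in order and drop any column that is too correlated with an already-kept column.

import numpy as np

def drop_correlated(data, threshold=0.98):
    # Absolute pairwise Pearson correlations between columns.
    corr = np.abs(np.corrcoef(data, rowvar=False))
    keep = []
    for j in range(data.shape[1]):
        # Keep column j only if it is not too correlated with any kept column.
        if all(corr[i, j] <= threshold for i in keep):
            keep.append(j)
    return data[:, keep]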

Batch normalization

Hi!

Thanks for sharing this amazing work!

I'm trying to port your code to PyTorch (for further use in my research).

I have a question regarding your implementation of Batch Norm. As you mention in the paper, it's implemented using global batch statistics. Could you please provide pointers to the lines where it is implemented exactly? My knowledge of Theano is a little bit rusty.
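
For context, a minimal PyTorch sketch of an invertible batch-norm layer for normalizing flows, in the spirit of the paper (my paraphrase of the transformation, not the repo's Theano code):

import torch
import torch.nn as nn

class BatchNormFlow(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.log_gamma = nn.Parameter(torch.zeros(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x):
        # Statistics of the current batch (no running averages at train time).
        m = x.mean(dim=0)
        v = x.var(dim=0, unbiased=False) + self.eps
        u = (x - m) / v.sqrt() * self.log_gamma.exp() + self.beta
        # Log-determinant of the Jacobian, identical for every example.
        log_det = (self.log_gamma - 0.5 * v.log()).sum()
        return u, log_det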

Preprocessed data

Hello,

Thank you for the code.
Could you specify the preprocessing methods you apply in the original datasets (e.g. mnist)? Apart from dequantization, logit and all the functions which are already in the code.
