
domain_adversarial_neural_network's Introduction

Domain Adversarial Neural Network (shallow implementation)

This Python code has been used to conduct the experiments presented in Section 5.1 of the following JMLR paper.

Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Victor Lempitsky.
Domain-Adversarial Training of Neural Networks. Journal of Machine Learning Research, 2016.
http://jmlr.org/papers/v17/15-239.html

Content

  • DANN.py contains the learning algorithm. The fit() function is a very straightforward implementation of Algorithm 1 of the paper.

  • experiments_amazon.py contains an example of execution on the Amazon sentiment analysis dataset (a copy of the dataset files is contained in the folder data). It computes the target test risk (see Table 1 of the paper) and the Proxy-A-Distance (see Figure 3 of the paper).

  • experiments_moons.py contains the code used to produce Figure 2 of the paper (experiments on the inter-twinning moons toy problem).

  • mSDA.py contains the functions used to generate the mSDA representations (these are literal translations of the Matlab code of Chen et al., 2012).
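For orientation, the stochastic training loop that fit() implements (Algorithm 1 of the paper) can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's actual API: the function and parameter names are invented, the hidden layer is a single sigmoid layer as in the shallow DANN, and domain labels are assumed to be 0 (source) and 1 (target).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dann_step(x_s, y_s, x_t, W, b, V, c, U, d, lam=0.1, lr=0.01):
    """One stochastic update in the style of Algorithm 1 (shallow DANN).

    x_s, y_s: one source example and its one-hot label.
    x_t: one unlabeled target example.
    Domain labels are assumed to be 0 (source) and 1 (target).
    """
    # Forward pass on the source example.
    h_s = sigmoid(W @ x_s + b)      # hidden representation
    p_y = softmax(V @ h_s + c)      # label predictor
    p_d = sigmoid(U @ h_s + d)      # domain classifier (scalar)

    # Label-loss gradients (softmax + cross-entropy => p - y at the logits).
    delta_c = p_y - y_s
    delta_V = np.outer(delta_c, h_s)
    delta_h = (V.T @ delta_c) * h_s * (1.0 - h_s)
    delta_b = delta_h.copy()
    delta_W = np.outer(delta_h, x_s)

    # Domain-loss gradients on the source example (label 0).
    # With sigmoid + log-loss, the gradient at the pre-activation is p - label.
    g_s = p_d - 0.0
    delta_d = g_s
    delta_U = g_s * h_s
    back_s = g_s * U * h_s * (1.0 - h_s)

    # Domain-loss gradients on the target example (label 1).
    h_t = sigmoid(W @ x_t + b)
    g_t = sigmoid(U @ h_t + d) - 1.0
    delta_d += g_t
    delta_U += g_t * h_t
    back_t = g_t * U * h_t * (1.0 - h_t)

    # Gradient reversal: the feature extractor ASCENDS the domain loss.
    delta_b -= lam * (back_s + back_t)
    delta_W -= lam * (np.outer(back_s, x_s) + np.outer(back_t, x_t))

    # Descent updates.
    W -= lr * delta_W
    b -= lr * delta_b
    V -= lr * delta_V
    c -= lr * delta_c
    U -= lr * delta_U
    d = d - lr * delta_d
    return W, b, V, c, U, d
```

The sign on the lam terms is the gradient reversal: the feature extractor (W, b) descends the label loss while ascending the domain loss, exactly the saddle-point update the paper trains toward.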

domain_adversarial_neural_network's People

Contributors

danceonly, haajakan, jfainberg, pgermain


domain_adversarial_neural_network's Issues

Why the test files have labels?

Thanks for sharing your code! However, I don't understand why the unlabeled data from the raw dataset has labels in your test.svmlight files. In other words, the test.svmlight files should not have labels. Could you please explain this?

Data preprocessing

Hi,
I was unable to recreate the Amazon data that has already been kindly provided in this repo. I tried to extract unigram and bigram features from the raw data (the 5,000 most frequent), yet the result is far from the same as yours. I also checked the Chen (2012) paper you referred to, but the preprocessing step is described rather obscurely there. Could you please explain exactly how the data was preprocessed?

Thank you.
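For reference, the "most frequent 5,000 unigram and bigram features" step described above can be sketched as follows. This is a guess at the pipeline, not the preprocessing actually used to produce the files in data; the tokenization, counting scheme, and any per-domain handling in Chen et al. (2012) may well differ, which is exactly what the question asks about.

```python
from collections import Counter
from itertools import chain

def ngram_features(docs, max_features=5000):
    """Count unigram + bigram features and keep the most frequent ones.

    A rough stand-in for the Chen et al. (2012) preprocessing; the exact
    tokenization and per-domain handling behind the repo's data are unknown.
    """
    def tokens(doc):
        words = doc.lower().split()
        bigrams = ["_".join(pair) for pair in zip(words, words[1:])]
        return words + bigrams

    # Rank features by corpus-wide frequency and keep the top max_features.
    counts = Counter(chain.from_iterable(tokens(d) for d in docs))
    vocab = [w for w, _ in counts.most_common(max_features)]
    index = {w: i for i, w in enumerate(vocab)}

    # Bag-of-ngrams count vector for each document.
    X = []
    for d in docs:
        row = [0] * len(vocab)
        for t in tokens(d):
            if t in index:
                row[index[t]] += 1
        X.append(row)
    return X, vocab
```

Differences in tokenization (punctuation handling, stemming) or in how the top features are selected (per domain pair versus globally) would easily explain a mismatch with the provided files.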

Hyper-parameter selection ?

It seems that the reverse validation (rv) approach to hyper-parameter selection (as mentioned in the paper) has not been carried out in this repository. So I wanted to ask about some of the implementation details behind the results reported in the paper:

  1. For the rv procedure, you retrain the neural network multiple times with different hyper-parameters, test on the validation splits, and choose the hyper-parameter setting with the best result. Once you find the best hyper-parameters, you retrain the neural network on all the data and report accuracy on the test data. Correct me if you did not follow such a policy.
  2. Also, reverse-validating a neural network over a range of hyper-parameters and then retraining with the best ones takes time. So, did you use a parallel procedure to produce these results?

Question on the gradient

I was recently trying to reproduce the results from the paper for the toy example with the two moons and stumbled upon this code. Thanks for the code, but I feel that the gradient being used here is missing a term.

Using the same labels as in the code, the networks are:

hidden_layer = sigmoid(W.X + b)
output = softmax(V.hidden_layer + c)
domain = sigmoid(U.hidden_layer + d)

The cost function (J) for the domain is D - D-hat (assuming D is the true domain and D-hat is the predicted one). Hence the gradient of the domain loss with respect to the bias (d) of the domain network should be (read delta-x/delta-y as dx/dy):

delta-J/delta-d = delta-J/delta-sigmoid * delta-sigmoid / delta-(U.hidden_layer + d) * delta-(U.hidden_layer + d) / delta-d
= -1 * sigmoid . (1 - sigmoid) * 1

And hence the update of the 'd' bias should be (I used the derivative of the sigmoid function from https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e):

d - lr * delta_d = d - lr * (-1 * sigmoid . (1 - sigmoid))
But as per the code, the update term is

d - lr * delta_d = d - lr * (domain - est-domain)

This is extended to both the source domain and the target domain.

I do not see the derivative of the sigmoid being used here. Is there an explanation for this?

Thanks a lot in advance, and Merry Christmas,
Kishor Kayyar
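A plausible resolution (not confirmed by the repo authors in this thread) is that the domain loss is the log-loss (binary cross-entropy) rather than the raw difference D - D-hat: for L = -[D log sigmoid(z) + (1 - D) log(1 - sigmoid(z))] with z = U.hidden_layer + d, the sigmoid derivative cancels against the derivative of the log, giving dL/dd = sigmoid(z) - D, which matches the (domain - est-domain) update in the code. A finite-difference check of that identity, with illustrative values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_log_loss(z, D):
    """Binary cross-entropy of sigmoid(z) against the true domain label D."""
    p = sigmoid(z)
    return -(D * math.log(p) + (1 - D) * math.log(1 - p))

# Analytic gradient w.r.t. the pre-activation z (equivalently the bias d):
# dL/dz = sigmoid(z) - D. The sigmoid*(1 - sigmoid) factor cancels against
# the derivative of the log, so it never appears in the update.
z, D = 0.7, 1.0
analytic = sigmoid(z) - D

eps = 1e-6
numeric = (domain_log_loss(z + eps, D) - domain_log_loss(z - eps, D)) / (2 * eps)
print(abs(analytic - numeric))  # tiny; dominated by floating-point error
```

With the raw difference D - D-hat as the loss, the sigmoid derivative would indeed survive, exactly as derived in the question; the cancellation only happens for the log-loss.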

How do you run multiple trials of experiments_amazon.py?

From the script, it seems that you used all of xs, xt, and xtest for a single realization of the experiment. How did you produce different trials; in other words, how did you split your data to report the average and standard deviation of accuracy in the paper? Kindly explain.
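For context, a common protocol is to repeat an experiment over several random splits (or random seeds) and report the mean and standard deviation of accuracy. The sketch below illustrates only that protocol; train_and_eval is a hypothetical placeholder, not a function from this repository, and the paper's actual splitting scheme may differ.

```python
import random
import statistics

def run_trials(dataset, n_trials=5, test_fraction=0.2, seed=0):
    """Repeat an experiment over random splits; report mean/std accuracy.

    `train_and_eval` is a hypothetical placeholder for one run of
    experiments_amazon.py; here it returns a dummy score so that the
    protocol itself can be demonstrated.
    """
    def train_and_eval(train, test):
        # A real run would fit DANN on `train` and score it on `test`.
        return len(test) / (len(train) + len(test))

    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_trials):
        shuffled = list(dataset)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_fraction))
        accuracies.append(train_and_eval(shuffled[:cut], shuffled[cut:]))
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```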

Reproducing results

Hi,
first of all, thanks for sharing your code. I was trying to reproduce the results presented in the paper for the Amazon experiment, but I noticed that when setting lambda=0.0 and/or adversarial_representation=False, the numerical results for the risk do not change at all.

[Screenshots attached to the issue show identical risk values for adversarial_representation=True and adversarial_representation=False.]
