
Comments (10)

Sachin19 commented on July 21, 2024

Thanks for pointing this out. You are right. I'll update the lambda values.


Sachin19 commented on July 21, 2024

Hi,

The loss term requires computing the modified Bessel function of the first kind (not the exponentially scaled one), which is simply scipy.special.iv, but that function is not numerically stable and can lead to issues during training. That is why we instead use scipy.special.ive() together with a -k term, which should give the same value as scipy.special.iv (since ive(v, k) = iv(v, k) * exp(-k)).
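
For context, a minimal sketch (not the repository's exact logcmk code) of how the -k term arises: the vMF normalizer is C_m(kappa) = kappa^(m/2-1) / ((2*pi)^(m/2) * I_{m/2-1}(kappa)), and since ive(v, k) = iv(v, k) * exp(-k), we have log I_v(k) = log ive(v, k) + k, so a -k appears when the Bessel term is subtracted.

```python
# Sketch only: stable log C_m(kappa) via the exponentially scaled Bessel function.
import numpy as np
from scipy.special import iv, ive

def log_cmk(m, kappa):
    """log C_m(kappa) = (m/2 - 1)*log(kappa) - (m/2)*log(2*pi) - log I_{m/2-1}(kappa),
    with -log I_v(kappa) rewritten as -log(ive(v, kappa)) - kappa."""
    v = m / 2.0 - 1.0
    return (v * np.log(kappa)
            - (m / 2.0) * np.log(2 * np.pi)
            - np.log(ive(v, kappa)) - kappa)

# For a kappa where iv itself is still representable, the two routes agree:
m, kappa = 300, 100.0
v = m / 2.0 - 1.0
direct = v * np.log(kappa) - (m / 2.0) * np.log(2 * np.pi) - np.log(iv(v, kappa))
print(np.isclose(log_cmk(m, kappa), direct))  # True
```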


amanjitsk commented on July 21, 2024

Noted! Cool, thanks for the quick reply!


amanjitsk commented on July 21, 2024

I was also trying to reproduce the loss function from the paper. I see the losses implemented in loss.py, but I am unable to figure out what's going on here - in particular, where is the

torch.log(1 + kappa) * (0.2-(out_vec_norm_t*tar_vec_norm_t).sum(dim=-1))

coming from? And why is the output embedding unit-normalized? Does that not defeat the purpose of the norm regularization version? Could you shed some light on what's going on here?

Thanks,
Amanjit


Sachin19 commented on July 21, 2024

This is just another regularization we were experimenting with, which gives slightly better results. Line 41 (commented out) gives the exact loss we used in the paper. kappa is the norm of the output vector, the log of which is multiplied with the dot product of the normalized vectors in the modified loss, so it is playing a role in the loss computation. I'm not sure what you mean by "defeat the purpose of the norm regularization version".
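
To make the comparison concrete, here is a hedged PyTorch sketch of the two variants being discussed. It assumes the paper's regularized loss has the form -log C_m(||e_hat||) + lambda_1*||e_hat|| - lambda_2*(e_hat . e(w)), and that the modified variant replaces the regularizers with the log(1 + kappa)-weighted cosine term quoted above; it is an illustration, not the repository's loss.py.

```python
# Illustrative sketch of the two loss variants (assumed forms, not the repo's loss.py).
import numpy as np
import torch
from scipy.special import ive

def log_cmk(m, kappa):
    """Placeholder log C_m(kappa); a real implementation needs a custom
    torch.autograd.Function so gradients flow through the Bessel term."""
    v = m / 2.0 - 1.0
    k = kappa.detach().cpu().numpy()
    out = v * np.log(k) - (m / 2.0) * np.log(2 * np.pi) - np.log(ive(v, k)) - k
    return torch.as_tensor(out, dtype=kappa.dtype)

def vmf_loss_paper(e_hat, e_w, m, lambda1, lambda2):
    """Assumed paper form: -log C_m(kappa) + lambda1*kappa - lambda2*(e_hat . e_w)."""
    kappa = e_hat.norm(dim=-1)            # predicted vector length = concentration
    dot = (e_hat * e_w).sum(dim=-1)       # e_w assumed unit-normalized
    return (-log_cmk(m, kappa) + lambda1 * kappa - lambda2 * dot).mean()

def vmf_loss_modified(e_hat, e_w, m):
    """Modified regularizer quoted above: log(1 + kappa) * (0.2 - cosine)."""
    kappa = e_hat.norm(dim=-1)
    cos = torch.nn.functional.cosine_similarity(e_hat, e_w, dim=-1)
    return (-log_cmk(m, kappa) + torch.log(1 + kappa) * (0.2 - cos)).mean()
```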


amanjitsk commented on July 21, 2024

Right, thanks for the reply. I was just trying to figure out the motivation for that loss objective. The reason I said the last statement was that I was not sure whether the model outputs unit-norm vectors by construction (e.g. by taking the output of the network and unit-normalizing it), because the first loss term (logcmk) would be constant if ||e_hat|| = 1. So I guess my question is: do you unit-normalize the output of the network before doing the nearest neighbour search at evaluation/test time, or do you expect the model to output "approximately unit-normalized" vectors and take them as given by the network?


Sachin19 commented on July 21, 2024

The purpose of the regularization we mention in the paper is to control the length of the output vector. By taking its log, as in line 42, we aim to reduce the effect the length of the vector has on the loss, as we did with lambda_2. It just empirically works better.

The model doesn't enforce any constraint on the output vectors, so ||e_hat|| is not 1. When using the vMF loss, we do nearest neighbor search using vMF probabilities as the metric, where the norm of the output vector plays a role. There is no requirement for the output vectors to be normalized. If you look at line 41, the loss is just written in a decomposed form as the norm multiplied by a unit-length vector, which is the same as the actual vector itself.
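
Roughly, "nearest neighbor search using vMF probabilities" could look like the sketch below (an illustration, assuming the target embeddings are unit-normalized and reusing a log C_m helper like the ones above). Since kappa * mu . e(w) = e_hat . e(w), the per-word score decomposes into a dot product plus a log C_m(||e_hat||) term that depends on the predicted norm.

```python
# Illustration only: scoring vocabulary embeddings by vMF log-density under e_hat.
import numpy as np
import torch
from scipy.special import ive

def log_cmk(m, kappa):
    # Stable log of the vMF normalizing constant (see the earlier sketch).
    v = m / 2.0 - 1.0
    return v * np.log(kappa) - (m / 2.0) * np.log(2 * np.pi) - np.log(ive(v, kappa)) - kappa

def vmf_log_density(e_hat, emb, m):
    """log vMF(e(w); mu=e_hat/||e_hat||, kappa=||e_hat||) for each row e(w) of emb.
    Rows of emb are assumed unit-normalized."""
    kappa = float(e_hat.norm())
    return float(log_cmk(m, kappa)) + emb @ e_hat    # (vocab,) log-densities

# Example: 5-word vocabulary of unit vectors in 300-d (random, for illustration).
emb = torch.nn.functional.normalize(torch.randn(5, 300), dim=-1)
e_hat = torch.randn(300)
print(vmf_log_density(e_hat, emb, m=300).argmax())   # index of highest-scoring word
```

For a single prediction the argmax over words matches a plain dot-product nearest neighbour; the log C_m(kappa) offset matters once scores from different predictions are compared, e.g. inside beam search.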


amanjitsk commented on July 21, 2024

Right, that makes sense. I was completely ignoring the vMF probability distribution and assuming a naive nearest-neighbour search; thanks for the clarification. So if I understand correctly, only the ground-truth word embeddings need to be unit-normalized, as required by the vMF density. Also, perhaps a minor question: is there a reason for not vectorizing the code for the loss functions in loss.py, or was it simply for more fine-grained control?


xxchauncey commented on July 21, 2024

Hi,

Previously you said that scipy.special.iv is numerically unstable; did you mean it can cause underflow in Python?

I applied the vMF loss to my model, and it seems that both scipy.special.iv and scipy.special.ive return 0.0, so the logarithm becomes infinite. The value of kappa produced by my model is only around 0.5, and the value of m I used is 300.

Does that suggest I should approximate the logarithm of C_m?
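
For what it's worth, the underflow is easy to reproduce: with m = 300 the Bessel order is m/2 - 1 = 149, and at kappa around 0.5 the true value of I_v(kappa) is on the order of 1e-350, below the smallest representable double, so both iv and ive return exactly 0.0. One possible workaround (an assumption on my part, not something from this repo) is to stay in log space, e.g. via the small-argument expansion I_v(k) ≈ (k/2)^v / Γ(v+1).

```python
# Reproducing the underflow and a possible log-space workaround (illustration only).
import numpy as np
from scipy.special import iv, ive, gammaln

m, kappa = 300, 0.5
v = m / 2 - 1                        # Bessel order: 149

print(iv(v, kappa), ive(v, kappa))   # both print 0.0: the true value underflows float64

# Small-argument expansion: I_v(k) ~ (k/2)^v / Gamma(v+1) as k -> 0, so
# log I_v(k) ~ v*log(k/2) - lgamma(v+1), which stays finite in log space.
log_iv_approx = v * np.log(kappa / 2) - gammaln(v + 1)
print(log_iv_approx)                 # about -807, i.e. I_v(kappa) ~ exp(-807)
```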


VictorSanh commented on July 21, 2024

> This is just another regularization we were experimenting with, which gives slightly better results. Line 41 (commented out) gives the exact loss we used in the paper. kappa is the norm of the output vector, the log of which is multiplied with the dot product of the normalized vectors in the modified loss, so it is playing a role in the loss computation. I'm not sure what you mean by "defeat the purpose of the norm regularization version".

It's just a small question about hyper-parameters: in line 41, it seems to me that lambda_1 and lambda_2 are swapped. From my understanding of the paper, 0.1 should be the multiplicative factor in front of the cosine similarity, not 0.01. Am I missing something?


