Comments (12)

r9y9 commented on May 27, 2024

Sorry, I'm not sure what you mean by repeating twice. In the paper:

  • sin(position_rate * pos / np.power(10000, i / d_pos_vec)) (for even i)
  • cos(position_rate * pos / np.power(10000, i / d_pos_vec)) (for odd i)

The strategy in the code is to compute position_rate * pos / np.power(10000, 2 * (i // 2) / d_pos_vec) (eq. 1) for each i and pos, and then apply sin and cos to slices with stride 2:

position_enc[1:, 0::2] = torch.sin(position_enc[1:, 0::2])  # dim 2i
position_enc[1:, 1::2] = torch.cos(position_enc[1:, 1::2])  # dim 2i+1
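For reference, the strategy described above could be sketched end to end in NumPy as follows. This is only an illustration, not the repository's code: the function name `paired_frequency_table` and its parameters are hypothetical, and treating row 0 as an untouched padding row is an assumption inferred from the `[1:]` slices.

```python
import numpy as np

def paired_frequency_table(n_position, d_pos_vec, position_rate=1.0):
    """Sinusoid table in which dims 2i and 2i+1 share one frequency (eq. 1)."""
    pos = np.arange(n_position, dtype=np.float64)[:, None]  # shape (n_position, 1)
    i = np.arange(d_pos_vec)[None, :]                        # shape (1, d_pos_vec)
    # 2 * (i // 2) maps dims 0,1 -> 0 and 2,3 -> 2, so each pair shares an exponent.
    table = position_rate * pos / np.power(10000, 2 * (i // 2) / d_pos_vec)
    # Row 0 is left as zeros, mirroring the [1:] slices (assumed padding row).
    table[1:, 0::2] = np.sin(table[1:, 0::2])  # dim 2i
    table[1:, 1::2] = np.cos(table[1:, 1::2])  # dim 2i+1
    return table

table = paired_frequency_table(n_position=4, d_pos_vec=256)
print(table.shape)   # (4, 256)
print(table[1, :4])  # first four dims for pos = 1
```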

from deepvoice3_pytorch.

taras-sereda commented on May 27, 2024

The strategy is clear.
The even values in your code are correct, but the odd values are wrong:

import numpy as np

d_pos_vec = 256
position_rate = 1.0
pos = 1
positions = [position_rate * pos / np.power(10000, 2 * (i//2) / d_pos_vec) for i in range(d_pos_vec)]
positions[0::2] = np.sin(positions[0::2])
positions[1::2] = np.cos(positions[1::2])
print(positions[:4])
[0.8414709848078965, 0.54030230586813977, 0.80196179521478528, 0.59737532508120794]

Approach from the paper:

d_pos_vec = 256
position_rate = 1.0
pos = 1
print(np.sin(position_rate * pos / np.power(10000, 0 / d_pos_vec)))
print(np.cos(position_rate * pos / np.power(10000, 1 / d_pos_vec)))
print(np.sin(position_rate * pos / np.power(10000, 2 / d_pos_vec)))
print(np.cos(position_rate * pos / np.power(10000, 3 / d_pos_vec)))
0.841470984808
0.569695008693
0.801961795215
0.623420035442

The results should be the same. Do you agree?
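The two computations above can also be compared across all dimensions at once. The sketch below is illustrative only; the helper `apply_sin_cos` and the names `paired` and `per_dim` are made up for this comparison.

```python
import numpy as np

d_pos_vec, position_rate, pos = 256, 1.0, 1
i = np.arange(d_pos_vec)

def apply_sin_cos(angles):
    """Apply sin to even dims and cos to odd dims of a 1-D angle vector."""
    out = np.empty_like(angles)
    out[0::2] = np.sin(angles[0::2])
    out[1::2] = np.cos(angles[1::2])
    return out

# Variant in the code: dims 2i and 2i+1 share the exponent 2 * (i // 2).
paired = apply_sin_cos(position_rate * pos / np.power(10000, 2 * (i // 2) / d_pos_vec))
# Variant as read from the paper: every dim i uses its own exponent i.
per_dim = apply_sin_cos(position_rate * pos / np.power(10000, i / d_pos_vec))

print(np.allclose(paired[0::2], per_dim[0::2]))    # even dims agree
print(np.abs(paired[1::2] - per_dim[1::2]).max())  # largest gap on odd dims
```

The even dimensions match because `2 * (i // 2)` equals `i` whenever `i` is even; only the odd dimensions pick up a different exponent.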

r9y9 commented on May 27, 2024

I see, you are right. Thank you for catching this. The implementation was actually adapted from https://github.com/jadore801120/attention-is-all-you-need-pytorch. I'm not sure whether the difference affects speech quality.

taras-sereda commented on May 27, 2024

@r9y9 you are welcome.
I'm not sure either whether it affects the speech quality. I just wanted to clarify.
By the way, have you tried the self-attention idea from attention-is-all-you-need-pytorch?
It's on my list to try it in the encoder part of DeepVoice3.

r9y9 commented on May 27, 2024

I haven't tried it yet. If you get an impressive result with it, that would be great!

taras-sereda commented on May 27, 2024

fixed #20

tuan3w commented on May 27, 2024

I think the original implementation is right (see paper). i is the dimension index, not the position of the word. PE(pos, 2i) and PE(pos, 2i+1) form a basis of the space (as in the FFT). This allows the model to learn to attend by relative position: See

r9y9 commented on May 27, 2024

@tuan3w Thank you for the explanation. For your information, assuming the DeepVoice3 paper is correct, @taras-sereda is right. However, as you pointed out, it makes more sense to design the positional encoding so that PE(pos+k, 2i) can be represented as a linear combination of PE(pos, 2i) and PE(pos, 2i+1) for any fixed k. I understand this allows the model to learn to attend by relative position.
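The linear-combination property can be checked numerically. The sketch below is an illustration under assumed values (`pos = 5`, `k = 3`, position rate 1), with variable names chosen for this example. It uses the identity sin(w(pos+k)) = cos(wk) sin(w pos) + sin(wk) cos(w pos), whose coefficients depend only on k when both dims of a pair share the frequency w.

```python
import numpy as np

d_pos_vec, pos, k = 256, 5.0, 3.0
two_i = np.arange(0, d_pos_vec, 2)  # even dimension indices 2i

# Paired variant: dims 2i and 2i+1 share the frequency w.
w = 1.0 / np.power(10000, two_i / d_pos_vec)
pe_even, pe_odd = np.sin(w * pos), np.cos(w * pos)
# Coefficients cos(wk) and sin(wk) depend on k only, not on pos.
shifted = np.cos(w * k) * pe_even + np.sin(w * k) * pe_odd
print(np.allclose(shifted, np.sin(w * (pos + k))))      # True

# Per-dimension variant: dim 2i+1 uses its own frequency w_odd.
w_odd = 1.0 / np.power(10000, (two_i + 1) / d_pos_vec)
pe_odd_alt = np.cos(w_odd * pos)
shifted_alt = np.cos(w * k) * pe_even + np.sin(w * k) * pe_odd_alt
print(np.allclose(shifted_alt, np.sin(w * (pos + k))))  # False
```

The second check only shows that these particular position-independent coefficients stop working once the pair no longer shares a frequency; it is not a proof about every possible choice of coefficients.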

@taras-sereda, what do you think?

taras-sereda commented on May 27, 2024

@r9y9 the justification provided by @tuan3w is a convincing argument for the approach described in the Attention Is All You Need paper. But the positional encoding described in the DeepVoice3 paper is clearly different. I'm not sure it makes much difference which one is used, considering that a learned positional encoding (which has no analytical form) gives similar results:
https://arxiv.org/pdf/1706.03762.pdf Table 3 row(E)

r9y9 commented on May 27, 2024

@taras-sereda While I don't fully understand why DeepVoice3 uses a slightly different version of positional encoding, personally, either is fine if it actually works. As you may have noticed, the code in the repository does not try to replicate DeepVoice3 exactly, but tries to build a good TTS system based on ideas from DeepVoice3. I have been using https://arxiv.org/abs/1706.03762 style positional encoding (with position rate) so far and have gotten reasonable results. So my question is: did you get a reasonable result with the modification?

r9y9 commented on May 27, 2024

Reverted aeed225 for now. Happy to reapply if this change actually works well.

stale commented on May 27, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
