Comments (12)
Sorry, I'm not sure what you mean by repeating twice. In the paper, the encoding is
sin(position_rate * pos / np.power(10000, i / d_pos_vec)) for even i
cos(position_rate * pos / np.power(10000, i / d_pos_vec)) for odd i
The strategy in the code is to compute position_rate * pos / np.power(10000, 2 * (i // 2) / d_pos_vec) (eq. 1) for each i and pos, and then slice the result with stride 2:
position_enc[1:, 0::2] = torch.sin(position_enc[1:, 0::2]) # dim 2i
position_enc[1:, 1::2] = torch.cos(position_enc[1:, 1::2]) # dim 2i+1
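For illustration, the stride-2 strategy described above can be sketched in NumPy as follows. This is a minimal sketch, not the repository's implementation: the function name `sinusoid_table` and the `n_position` parameter are hypothetical, and treating row 0 as an all-zero padding row is an assumption inferred from the `[1:, ...]` slices.

```python
import numpy as np


def sinusoid_table(n_position, d_pos_vec, position_rate=1.0):
    """Build an (n_position, d_pos_vec) table with the stride-2 strategy."""
    pos = np.arange(n_position, dtype=np.float64)[:, None]  # shape (n_position, 1)
    i = np.arange(d_pos_vec)[None, :]                       # shape (1, d_pos_vec)
    # eq. 1: dims 2k and 2k+1 share the exponent 2k / d_pos_vec
    table = position_rate * pos / np.power(10000.0, 2 * (i // 2) / d_pos_vec)
    table[1:, 0::2] = np.sin(table[1:, 0::2])  # dim 2i
    table[1:, 1::2] = np.cos(table[1:, 1::2])  # dim 2i+1
    # Row 0 is left untouched; its arguments are all 0 because pos = 0
    # (assumed here to act as a padding row).
    return table


table = sinusoid_table(n_position=4, d_pos_vec=256)
print(table[1, :4])
```

Because eq. 1 is evaluated once for the whole table, the two slice assignments are the only place where even and odd dimensions are treated differently.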
from deepvoice3_pytorch.
The strategy is clear.
The even values in your code are correct, but the odd values are wrong:
import numpy as np

d_pos_vec = 256
position_rate = 1.0
pos = 1
# Code's strategy: dims 2k and 2k+1 share the exponent 2k / d_pos_vec
positions = [position_rate * pos / np.power(10000, 2 * (i//2) / d_pos_vec) for i in range(d_pos_vec)]
positions[0::2] = np.sin(positions[0::2])  # even dims
positions[1::2] = np.cos(positions[1::2])  # odd dims
print(positions[:4])
[0.8414709848078965, 0.54030230586813977, 0.80196179521478528, 0.59737532508120794]
Approach from the paper:
d_pos_vec = 256
position_rate = 1.0
pos = 1
print(np.sin(position_rate * pos / np.power(10000, 0 / d_pos_vec)))
print(np.cos(position_rate * pos / np.power(10000, 1 / d_pos_vec)))
print(np.sin(position_rate * pos / np.power(10000, 2 / d_pos_vec)))
print(np.cos(position_rate * pos / np.power(10000, 3 / d_pos_vec)))
0.841470984808
0.569695008693
0.801961795215
0.623420035442
The results should be the same. Do you agree?
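The comparison above can also be run over all dimensions at once. The following is a minimal sketch under the same settings (`d_pos_vec = 256`, `position_rate = 1.0`, `pos = 1`); the helper `encode` and the names `code_style` and `paper_style` are hypothetical and only label the two exponent choices discussed here.

```python
import numpy as np

d_pos_vec = 256
position_rate = 1.0
pos = 1
i = np.arange(d_pos_vec)


def encode(exponent):
    """Apply sin on even dims and cos on odd dims of the scaled position."""
    arg = position_rate * pos / np.power(10000.0, exponent)
    return np.where(i % 2 == 0, np.sin(arg), np.cos(arg))


code_style = encode(2 * (i // 2) / d_pos_vec)  # exponent shared by each sin/cos pair
paper_style = encode(i / d_pos_vec)            # exponent i / d_pos_vec for every dim

# Even dims agree, because 2 * (i // 2) == i when i is even
print(np.allclose(code_style[0::2], paper_style[0::2]))
# Odd dims differ, because the two exponents differ by 1 / d_pos_vec
print(np.abs(code_style[1::2] - paper_style[1::2]).max())
```

The two variants therefore differ only in the frequency used for the cosine terms.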
from deepvoice3_pytorch.
I see, you are right. Thank you for catching this. The implementation was actually adapted from https://github.com/jadore801120/attention-is-all-you-need-pytorch. I'm not sure the difference affects speech quality.
from deepvoice3_pytorch.
@r9y9 You are welcome.
I'm not sure either whether it affects the speech quality. I just wanted to clarify.
By the way, have you tried the self-attention idea from attention-is-all-you-need-pytorch?
It's on my list to try it in the encoder part of DeepVoice3.
from deepvoice3_pytorch.
I haven't tried it yet. If you get an impressive result with it, that would be great!
from deepvoice3_pytorch.
fixed #20
from deepvoice3_pytorch.
I think the original implementation is right (see paper). i is the dimension index, not the position of the word. PE(pos, 2i) and PE(pos, 2i+1) form a basis of the space (like in the FFT). This allows the model to learn to attend by relative position: See
from deepvoice3_pytorch.
@tuan3w Thank you for the explanation. For your information, assuming the DeepVoice3 paper is correct, @taras-sereda is right. However, as you pointed out, it makes more sense to design the positional encoding so that PE(pos+k, 2i) can be represented as a linear combination of PE(pos, 2i) and PE(pos, 2i+1) for any fixed k. I understand this allows the model to learn to attend by relative position.
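The linear-combination property follows from the angle-addition identity sin(a + b) = sin(a) cos(b) + cos(a) sin(b), provided dims 2i and 2i+1 share one frequency. A minimal numerical check, assuming `position_rate = 1.0` and arbitrary illustrative values for `pos` and `k`:

```python
import numpy as np

d_pos_vec = 256
pos, k = 5.0, 3.0  # arbitrary position and offset, chosen only for illustration
i = np.arange(d_pos_vec // 2)
w = 1.0 / np.power(10000.0, 2 * i / d_pos_vec)  # frequency shared by dims 2i and 2i+1

pe_sin = np.sin(w * pos)         # PE(pos, 2i)
pe_cos = np.cos(w * pos)         # PE(pos, 2i+1)
shifted = np.sin(w * (pos + k))  # PE(pos+k, 2i)

# The coefficients cos(w * k) and sin(w * k) depend on k and i, but not on pos
combo = pe_sin * np.cos(w * k) + pe_cos * np.sin(w * k)
print(np.allclose(shifted, combo))
```

Since the coefficients are independent of `pos`, a shift by `k` acts as the same linear map at every position.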
@taras-sereda, what do you think?
from deepvoice3_pytorch.
@r9y9 The justification provided by @tuan3w is convincing in favour of the approach described in the Attention Is All You Need paper. But clearly the positional encoding described in the DeepVoice3 paper is different. I'm not sure it makes much difference which one is used, considering that a learnable positional encoding (which assumes no analytical representation) gives similar results:
https://arxiv.org/pdf/1706.03762.pdf Table 3, row (E)
from deepvoice3_pytorch.
@taras-sereda While I don't fully understand why DeepVoice3 uses a slightly different version of positional encoding, personally, either is fine if it actually works. As you may have noticed, the code in this repository is not trying to replicate DeepVoice3 exactly, but to build a good TTS system based on ideas from DeepVoice3. I have been using https://arxiv.org/abs/1706.03762 style positional encoding (with position rate) so far and have gotten reasonable results. So my question is, did you get a reasonable result with the modification?
from deepvoice3_pytorch.
Reverted aeed225 for now. Happy to reapply if this change actually works well.
from deepvoice3_pytorch.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
from deepvoice3_pytorch.