Attention Is All You Need | a PyTorch Tutorial to Transformers

License: MIT License

Python 100.00%

pytorch pytorch-tutorial attention-is-all-you-need transformer transformer-architecture transformer-tutorial

a-pytorch-tutorial-to-transformers's Introduction

Hello, world! 🌏🌎🌍

♟️ Take a look — chess-transformers.

🤖 I develop AI models.

🐍 I usually work with Python and PyTorch.

a-pytorch-tutorial-to-transformers's Issues

why attend over the <end> token?

Hi @sgrvinod
in the xe train function:

predicted_sequences = model(source_sequences, target_sequences, source_sequence_lengths, target_sequence_lengths) # (N, max_target_sequence_pad_length_this_batch, vocab_size)

The target_sequence_lengths still count the <end> token, so multi-head attention will attend over the <end> token.

I think it should be target_sequence_lengths - 1:
predicted_sequences = model(source_sequences, target_sequences, source_sequence_lengths, target_sequence_lengths - 1) # (N, max_target_sequence_pad_length_this_batch, vocab_size)

Please clarify
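Whether this matters depends on how the attention mask is built. Assuming the decoder's self-attention combines a causal (look-ahead) mask with a padding mask derived from the lengths, as in the standard Transformer decoder, a minimal plain-Python sketch can show which query positions change when the length is reduced by one. The helper `attention_mask` and the toy five-token sequence are hypothetical and are not code from the tutorial.

```python
def attention_mask(seq_len, length):
    """Hypothetical helper: mask[q][k] is True when query position q may attend to key k.

    Assumes a causal mask (k <= q) combined with a key padding mask (k < length).
    """
    return [[(k <= q) and (k < length) for k in range(seq_len)] for q in range(seq_len)]


# Toy target: <start> w1 w2 w3 <end>  (5 tokens, no padding)
seq_len = 5
full = attention_mask(seq_len, length=5)     # length counts the <end> token
trimmed = attention_mask(seq_len, length=4)  # target_sequence_lengths - 1

# Query positions whose visible keys differ between the two masks
differing_rows = [q for q in range(seq_len) if full[q] != trimmed[q]]
print(differing_rows)  # [4]
```

Under that assumption, only the output at the <end> position itself can see the <end> token, because earlier positions are already blocked by the causal mask. Whether that output reaches the loss depends on how the training loop slices predictions and targets, which this sketch does not cover.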

Error in PyTorch DataLoader

I got this error when training a model (SRGAN):

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/My Drive/PyTorch--master/datasets.py", line 67, in __getitem__
    img = Image.open(self.images[i], mode='r')
KeyError: 0

I'm training the model in Google Colab.
Thanks
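In Python, indexing a list past its end raises IndexError, not KeyError, so KeyError: 0 from self.images[i] suggests that self.images is a dict (or another mapping) rather than a list, for example if the JSON file it was loaded from holds an object instead of an array. A minimal sketch of a map-style dataset that normalizes this; the class name `ImageListDataset` and the dict-to-list conversion are hypothetical and are not the tutorial's dataset code.

```python
import json


class ImageListDataset:
    """Hypothetical map-style dataset: DataLoader only needs __len__ and __getitem__."""

    def __init__(self, images):
        # KeyError: 0 from self.images[i] means `images` is a mapping, not a list.
        # Assumption: if a dict was loaded, its values are the image paths.
        if isinstance(images, dict):
            images = list(images.values())
        self.images = list(images)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        # A real dataset would open the image here, e.g. Image.open(self.images[i]).
        return self.images[i]


# A JSON object (not an array) loads as a dict, so integer indexing fails.
loaded = json.loads('{"first": "img_0001.jpg", "second": "img_0002.jpg"}')
try:
    loaded[0]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 0

dataset = ImageListDataset(loaded)
print(len(dataset), dataset[0])  # 2 img_0001.jpg
```

Checking the type of whatever is assigned to self.images (for example, printing `type(self.images)` once in `__init__`) is a quick way to confirm this diagnosis before changing any code.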

grouping according to similar lengths

Hi @sgrvinod
Thank you for the tutorial you posted on Attention Is All You Need. I have a small question and would appreciate an answer.

In dataloader.py, you've grouped the batches by length so that each batch contains sequences of similar length. Is that necessary? I understand that it speeds up training and reduces memory usage, but does it affect model performance if I don't group the data by length?

Thanks
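Grouping by length mainly changes how much padding each batch carries. A minimal plain-Python sketch, assuming each batch is padded to its longest sequence; `padded_size` and `token_budget_batches` are hypothetical helpers, not the tutorial's SequenceLoader, and `tokens_in_batch` here only mirrors the name of the argument passed to SequenceLoader in train.py.

```python
import random


def padded_size(batch_lengths):
    """Token slots one batch occupies once every sequence is padded to the longest."""
    return max(batch_lengths) * len(batch_lengths)


def token_budget_batches(lengths, tokens_in_batch, seed=0):
    """Hypothetical helper: sort indices by length, then cut a new batch whenever
    (longest length in batch) * (batch size) would exceed tokens_in_batch."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current = [], []
    for i in order:
        candidate = [lengths[j] for j in current] + [lengths[i]]
        if current and padded_size(candidate) > tokens_in_batch:
            batches.append(current)
            current = []
        current.append(i)
    if current:
        batches.append(current)
    # Shuffle the order of batches (not their contents) so training does not
    # always see the shortest sequences first.
    random.Random(seed).shuffle(batches)
    return batches


lengths = [3, 10, 4, 9, 5, 8]
# Fixed batch size of 2: original order vs. sorted by length
unsorted = [lengths[k:k + 2] for k in range(0, len(lengths), 2)]
grouped = [sorted(lengths)[k:k + 2] for k in range(0, len(lengths), 2)]
print(sum(padded_size(b) for b in unsorted))  # 54
print(sum(padded_size(b) for b in grouped))   # 44
print(token_budget_batches(lengths, tokens_in_batch=20))
```

In this toy example, sorting cuts the padded token slots from 54 to 44 for the same 39 real tokens. Shuffling whole batches keeps the length grouping while still varying the training order. The sketch says nothing about the effect on final translation quality, which would have to be measured.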

The empty val set and test set

Thanks for your tutorial on Attention Is All You Need. I have a small question and would really appreciate an answer. Since the project only has training data, why do we need a val_loader in train.py? Should I download the validation data myself?
val_loader = SequenceLoader( data_folder=data_folder, source_suffix='en', target_suffix='de', split='val', tokens_in_batch=tokens_in_batch )
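If no validation files are at hand, one option is to hold out a small random slice of the training pairs, removing them from the training set so validation still measures generalization. A minimal sketch, assuming the source and target files are line-aligned; `hold_out_validation`, `write_split`, and the `<split>.<suffix>` file naming (e.g. `val.en`, `val.de`) are assumptions for illustration, not confirmed details of the tutorial's data layout.

```python
import os
import random


def hold_out_validation(source_lines, target_lines, val_fraction=0.01, seed=0):
    """Hypothetical helper: move a random fraction of aligned pairs into a validation set."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target files must have the same number of lines")
    pairs = list(zip(source_lines, target_lines))
    random.Random(seed).shuffle(pairs)  # shuffling pairs keeps source/target aligned
    n_val = max(1, int(len(pairs) * val_fraction))
    return pairs[n_val:], pairs[:n_val]  # (train_pairs, val_pairs)


def write_split(data_folder, split, pairs, source_suffix="en", target_suffix="de"):
    """Hypothetical helper: write pairs to '<split>.<suffix>' files (naming is an assumption)."""
    for index, suffix in ((0, source_suffix), (1, target_suffix)):
        path = os.path.join(data_folder, f"{split}.{suffix}")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(pair[index] for pair in pairs) + "\n")


source = [f"source sentence {i}" for i in range(200)]
target = [f"target sentence {i}" for i in range(200)]
train_pairs, val_pairs = hold_out_validation(source, target, val_fraction=0.05)
print(len(train_pairs), len(val_pairs))  # 190 10
```

Fixing the seed makes the split reproducible across runs, so validation numbers stay comparable between experiments.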

import youtokentome is not working presently

I don't know why youtokentome cannot be imported. Even running pip install youtokentome produces an error.

sgrvinod / a-pytorch-tutorial-to-transformers

