Implementation of a language model, starting from a very simple model and building up to the architecture introduced in the Attention Is All You Need paper. (Google Colab link)
Since this is a decoder-only language model, the encoder part of the Transformer is not implemented.
This repository is based on a tutorial from Andrej Karpathy.
Dataset: Tiny Shakespeare
We start with a bigram language model, which is very simple, and in seven steps turn it into a much more powerful model. Each step is implemented in code, so we can see how it affects the model's results (minimal sketches of the key modules follow the list):
- Bigram language model
- Add a single-head self-attention module to the model
- Add a multi-head attention module to the model
- Add a feed-forward module
- Combine the feed-forward and multi-head attention modules into blocks and repeat them in the model
- Add skip connections and layer normalization
- Add dropout
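
As an illustrative sketch of the starting point, a bigram model predicts the next token from the current token alone, using a single embedding table as a logits lookup. The names and hyperparameters here (e.g. `vocab_size`) are assumptions for illustration, not necessarily the repository's exact code:

```python
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    """Predicts the next token from the current token via a lookup table."""
    def __init__(self, vocab_size):
        super().__init__()
        # each row of the table holds the next-token logits for one token
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)  # (B, T, vocab_size)
        loss = None
        if targets is not None:
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss
```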
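Next, a minimal sketch of one head of causal self-attention, following the standard scaled dot-product formulation from the paper; `n_embd`, `head_size`, and `block_size` are assumed hyperparameter names:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal (masked) self-attention."""
    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # lower-triangular mask: each position may only attend to the past
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)                                      # (B, T, head_size)
        q = self.query(x)                                    # (B, T, head_size)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # scaled scores (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)
        v = self.value(x)                                    # (B, T, head_size)
        return wei @ v                                       # (B, T, head_size)
```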
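Multi-head attention then runs several such heads in parallel and concatenates their outputs, and the feed-forward step is a small position-wise MLP applied to each token independently. This sketch reuses the `Head` class above:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Several attention heads in parallel; outputs are concatenated and projected."""
    def __init__(self, n_embd, num_heads, head_size, block_size):
        super().__init__()
        self.heads = nn.ModuleList(
            [Head(n_embd, head_size, block_size) for _ in range(num_heads)])
        self.proj = nn.Linear(num_heads * head_size, n_embd)

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)
        return self.proj(out)

class FeedForward(nn.Module):
    """Position-wise MLP applied to every token independently."""
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # 4x expansion, as in the paper
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)
```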
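Finally, a Transformer block ties these together with skip (residual) connections, layer normalization, and dropout. This sketch uses the pre-norm arrangement and places dropout on each residual branch; the exact placement in the repository may differ:

```python
import torch.nn as nn

class Block(nn.Module):
    """Communication (attention) followed by computation (MLP),
    wrapped in skip connections with layer norm and dropout."""
    def __init__(self, n_embd, num_heads, block_size, dropout=0.2):
        super().__init__()
        head_size = n_embd // num_heads
        self.sa = MultiHeadAttention(n_embd, num_heads, head_size, block_size)
        self.ffwd = FeedForward(n_embd)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        x = x + self.drop(self.sa(self.ln1(x)))    # skip connection around attention
        x = x + self.drop(self.ffwd(self.ln2(x)))  # skip connection around the MLP
        return x
```

Stacking several such blocks, together with token and position embeddings and a final linear head over the vocabulary, gives the full decoder-only model.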