labmlai / annotated_deep_learning_paper_implementations

🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Home Page: https://nn.labml.ai

License: MIT License

Languages: Python 45.42%, Makefile 0.07%, Jupyter Notebook 54.51%

Topics: deep-learning, deep-learning-tutorial, pytorch, gan, transformers, reinforcement-learning, optimizers, neural-networks, transformer, machine-learning, attention, literate-programming

annotated_deep_learning_paper_implementations's Introduction

This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders them as side-by-side formatted notes. We believe these will help you understand these algorithms better.

We are actively maintaining this repo and adding new implementations almost weekly. Follow us on Twitter for updates.

Paper Implementations

LSTM

ResNet

U-Net

✨ Graph Neural Networks

Solving games with incomplete information such as poker with CFR.

Highlighted Research Paper PDFs

Installation

pip install labml-nn

Citing

If you use this for academic research, please cite it using the following BibTeX entry.

@misc{labml,
 author = {Varuna Jayasiri and Nipun Wijerathne},
 title = {labml.ai Annotated Paper Implementations},
 year = {2020},
 url = {https://nn.labml.ai/},
}

Other Projects

This shows the most popular research papers on social media. It also aggregates links to useful resources like paper explanation videos and discussions.

This is a library that lets you monitor deep learning model training and hardware usage from your mobile phone. It also comes with a bunch of other tools to help write deep learning code efficiently.

annotated_deep_learning_paper_implementations's People

Contributors

andreemic, astramax57, biogeek, callanwu, captn3m0, cjfghk5697, csl122, eltociear, f-hy, gstamatiou, hlyang1992, hnipun, jahatef, jakehsiao, keshsam, lc0, lizhuoq, mirmix, mmeendez8, mryxj, nirmalsinghania2008, nmasnadithya, qiangxinglin, sachdevkartik, saqibameen, shakedbr, tatsuookubo, vpj, xboyminemc, yangwu1227

annotated_deep_learning_paper_implementations's Issues

pylit implementation?

Hello,

Very nice job on this site, I really love the side-by-side code. I was hoping to do something similar for my own documentation and was hoping you could point me to what you used to create these pages. I see something called pylit in a Makefile, along with a link to your own templates for it, but it doesn't appear to be anywhere else under the lab-ml organization.

Thanks very much!

The explanation of `load_balancing_loss` is kind of confusing

Hi, thanks for the nice implementation of the Switch Transformer. I find the explanation of "load_balancing_loss" a bit confusing, though. In the tutorial, the formula on the left side calculates the loss for a single layer. However, if I read it right, the code on the right side computes the sum of the load balancing loss over all layers. I know this is a small problem, but I just want to make sure the two are consistent 😁.
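To make the distinction concrete, here is a minimal sketch of the per-layer auxiliary loss from the Switch Transformer paper, N * sum_i f_i * P_i (leaving out the coefficient alpha), next to a sum over layers. The function names and the toy routing probabilities are made up for illustration and are not taken from the repository.

```python
from typing import List, Sequence


def load_balancing_loss(route_probs: Sequence[Sequence[float]]) -> float:
    """Per-layer loss N * sum_i f_i * P_i (without the alpha coefficient).

    route_probs[t][i] is the router probability of expert i for token t.
    """
    n_tokens = len(route_probs)
    n_experts = len(route_probs[0])
    # f_i: fraction of tokens whose top-1 expert is i
    counts = [0] * n_experts
    for probs in route_probs:
        counts[max(range(n_experts), key=lambda i: probs[i])] += 1
    f = [c / n_tokens for c in counts]
    # P_i: mean router probability assigned to expert i
    p = [sum(probs[i] for probs in route_probs) / n_tokens for i in range(n_experts)]
    return n_experts * sum(fi * pi for fi, pi in zip(f, p))


def total_load_balancing_loss(per_layer: List[Sequence[Sequence[float]]]) -> float:
    """Sum of the per-layer losses over all switch layers."""
    return sum(load_balancing_loss(route_probs) for route_probs in per_layer)


# Toy data: two experts, four tokens per layer (hypothetical values)
balanced = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
skewed = [[0.9, 0.1]] * 4

print(load_balancing_loss(balanced))                  # single layer, even routing
print(load_balancing_loss(skewed))                    # single layer, all tokens to expert 0
print(total_load_balancing_loss([balanced, skewed]))  # summed over two layers
```

The formula describes what `load_balancing_loss` returns for one layer; the summed variant is just that value accumulated over every layer.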

PonderNet - possibility of making inference more efficient

Hey there,

First of all, great job on the implementation! I am impressed! BTW the official DeepMind source code is not available yet, right? So I would imagine you wrote it from scratch:)

I am reading through the source code.

If I understand correctly, setting is_halt=True is what makes the halting actually take place. However, since the loop is only exited after all samples in the batch have halted, there is a lot of redundant computation (assuming we only care about y). Did you consider an implementation where the batch gets smaller and smaller until it is empty?
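A rough sketch of the shrinking-batch idea, in plain Python rather than PyTorch so that it stays self-contained. `step_fn`, `toy_step` and the integer states are hypothetical stand-ins for one ponder step and its hidden state; none of this comes from the repository.

```python
from typing import Callable, List, Tuple

# Hypothetical per-sample step: takes a state, returns (new_state, prediction, halted)
StepFn = Callable[[int], Tuple[int, int, bool]]


def infer_with_shrinking_batch(states: List[int], step_fn: StepFn,
                               max_steps: int) -> Tuple[List[int], int]:
    """Run step_fn only on samples that have not halted yet.

    Returns the final prediction per sample and the total number of step calls.
    """
    outputs = [0] * len(states)
    states = list(states)
    active = list(range(len(states)))  # indices of samples still running
    n_calls = 0
    for step in range(max_steps):
        if not active:
            break
        still_active = []
        for idx in active:
            states[idx], y, halted = step_fn(states[idx])
            n_calls += 1
            outputs[idx] = y  # keep the latest prediction
            # Samples that have not halted continue, unless this was the last allowed step
            if not halted and step < max_steps - 1:
                still_active.append(idx)
        active = still_active
    return outputs, n_calls


def toy_step(state: int) -> Tuple[int, int, bool]:
    """Toy stand-in for one ponder step: subtract 3 and halt at or below zero."""
    new_state = state - 3
    return new_state, new_state, new_state <= 0


outputs, n_calls = infer_with_shrinking_batch([1, 7, 5], toy_step, max_steps=5)
print(outputs, n_calls)
```

Here the three samples need 1, 3 and 2 steps, so the shrinking loop makes 6 step calls where running the full batch until the last sample halts would make 9. With tensors, the same effect would come from indexing the batch with a mask of the samples that are still active.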

ResNet: replace x with \times

In the paragraph about the Bottleneck Residual Block in the ResNet paper implementation, a few LaTeX formulas are wrong and should be corrected.

The first convolution layer maps from `in_channels` to `bottleneck_channels` with a $1x1$

and

* `bottleneck_channels` is the number of channels for the $3x3$ convolution

There are more identical mistakes in the aforementioned section.
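For reference, with `\times` in place of the letter x, the two quoted formulas would read:

```latex
$1 \times 1$
$3 \times 3$
```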

502 Bad Gateway and endless retrying

Thank you for your brilliant work!
But when I tried running the demo of the Switch Transformer, it failed and showed the error below:

LABML WARNING
Failed to send to https://api.labml.ai/api/v1/track?run_uuid=14a2596057db11eca5b41866da9d5f72&labml_version=0.4.134: 502
Bad Gateway
--------------------------------------------------
Retrying again in 10 seconds (0)...

I think it is a network problem. Maybe my Linux server can't reach the API behind this convenience feature.
Is there an option to disable this feature? I stepped through the code in a debugger but didn't find one.
Thank you for your help!

Question about the site

Your website is truly impressive and really helps with readability.
I saw the index.html and became curious about how you made it.
Did you write all those hundreds of lines of HTML by hand?
Or is there a tool that formats your comments into predefined HTML?

Internal Covariate Shift example in BatchNorm

Hello!
The example of Internal Covariate Shift in the introduction to Batch Normalization seemed strange to me. It says that

The paper defines Internal Covariate Shift as the change in the distribution of network activations due to the change in network parameters during training.

However, the example given is:

For example, let’s say there are two layers l1 and l2. During the beginning of the training l1 outputs (inputs to l2) could be in distribution N(0.5,1). Then, after some training steps, it could move to N(0.5,1). This is internal covariate shift.

There is no difference between the parameters of the two distributions of the l1 outputs. Maybe the second distribution should have different values for the mean and variance?

Questions about Primer EZ

https://nn.labml.ai/transformers/primer_ez/index.html
self.query = nn.Sequential(self.query, SpatialDepthWiseConvolution(self.d_k))
I think there may be a bug when a cache is used, because the convolution op is time-dependent.
In multi-head attention, we first project the query, key, and value, and add the cache if it exists; only then do we apply the convolution op.
So wrapping the projection and the convolution in a Sequential directly may cause a bug.
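The time dependency can be illustrated with a toy causal convolution over a single channel. This is only a sketch: the kernel, the values and the helper `causal_conv` are hypothetical, and it assumes the convolution is causal with zero left-padding.

```python
from typing import List


def causal_conv(x: List[float], kernel: List[float]) -> List[float]:
    """Causal 1-D convolution along time with zero left-padding (one channel)."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


kernel = [0.25, 0.5, 1.0]         # hypothetical kernel of size 3
projected = [1.0, 2.0, 3.0, 4.0]  # hypothetical projected values for 4 time steps

# Reference: convolve the whole sequence at once
full = causal_conv(projected, kernel)

# Incremental decoding, convolution applied before the cache is added:
# the newest step is convolved on its own and only sees zero padding
conv_then_cache = causal_conv(projected[-1:], kernel)[-1]

# Incremental decoding, cache added first: the new projection is appended
# to the cached projections and the convolution runs over the result
cache_then_conv = causal_conv(projected[:-1] + projected[-1:], kernel)[-1]

print(full[-1], conv_then_cache, cache_then_conv)
```

In this toy setup the last output is 6.0 when the cached projections are available to the convolution and 4.0 when they are not, which is the mismatch described above.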

The gradient flow of Switch transformer seems wrong?

In a Mixture of Experts (MoE) system, the output is the weighted sum of the expert outputs. The contribution of each expert depends on its gate value, and that is how the gate weights are optimized. In the Switch Transformer specifically, only one expert is picked to contribute to the combined output. I would expect the gradient signal for the gate to come from multiplying the scaled routing probability with the output of the picked expert. However, I found that the scaled routing probability is multiplied with the input of the experts, as this line shows. I am wondering whether this is wrong. Looking forward to your reply.

StyleGAN2 original image normalization

Typically a generator has a tanh activation and hence outputs an image with pixel range [-1, 1]. The generator in StyleGAN2 returns an RGB image from the RGB block, which has a LeakyReLU activation.

  1. Does the generator still return an image with pixel range [-1, 1]?
  2. Don't we need to normalize the original images?

The only normalization I found is in the discriminator's forward pass:

# Try to normalize the image (this is totally optional, but sped up the early training a little)
x = x - 0.5

Since we are subtracting 0.5, I am assuming the images are normalized to the range [0, 1], but the transform in the dataset does not include this normalization.
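As a sanity check on the ranges involved, here is a small sketch under the assumption that 8-bit pixel values are scaled to [0, 1] before they reach the discriminator (torchvision's `ToTensor` does this for 8-bit PIL images; whether the dataset here relies on it is not confirmed by anything quoted in this issue). The helper names are made up.

```python
from typing import List


def to_unit_range(pixels_8bit: List[int]) -> List[float]:
    """Scale 8-bit pixel values from [0, 255] to [0, 1]."""
    return [p / 255 for p in pixels_8bit]


def center(pixels_unit: List[float]) -> List[float]:
    """Mirror the discriminator's `x = x - 0.5`: shift [0, 1] to [-0.5, 0.5]."""
    return [p - 0.5 for p in pixels_unit]


pixels = [0, 51, 255]         # hypothetical 8-bit pixel values
unit = to_unit_range(pixels)  # values in [0, 1]
centered = center(unit)       # values in [-0.5, 0.5]
print(unit, centered)
```

Under that assumption, the subtraction centers the discriminator input around zero, but the range is [-0.5, 0.5] rather than [-1, 1].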

Gatv2 directory is missing from labml-nn

After running "pip install labml_nn", I was able to find all directories except /gatv2.

This was confirmed by a visit to pypi.org: after downloading the source file (labml-nn-0.4.103.tar.gz), I found that gatv2 is likewise absent from the labml_nn/graphs/ directory.

I'm guessing this is a mistake since making a GraphAttentionv2 layer requires this file. I apologize if this isn't the right place to raise this.

[BUG] StyleGAN2: latent vector is ignored

The implementation of StyleGAN2 does not learn a mapping for the latent vector z. The vector z is completely ignored, and all the variety in the generated images comes from the noise. To demonstrate the issue, I created a Google Colab with a pre-trained model that I trained for 55400 iterations.

[Image: images generated with a random z and fixed noise]

[Image: images generated with a fixed z and random noise]

Question about the framework

Thanks for your excellent work on so many implementations. I was wondering whether you would accept algorithms implemented in TensorFlow, MXNet, or PaddlePaddle rather than PyTorch?

ParityDataset not deterministic

First of all, great job on the implementation of PonderNet!!

I have a quick question about the experimental setup. It seems like your ParityDataset is not deterministic.

def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Generate a sample
    """
    # Empty vector
    x = torch.zeros((self.n_elems,))
    # Number of non-zero elements - a random number between $1$ and total number of elements
    n_non_zero = torch.randint(1, self.n_elems + 1, (1,)).item()
    # Fill non-zero elements with $1$'s and $-1$'s
    x[:n_non_zero] = torch.randint(0, 2, (n_non_zero,)) * 2 - 1
    # Randomly permute the elements
    x = x[torch.randperm(self.n_elems)]
    # The parity
    y = (x == 1.).sum() % 2
    #
    return x, y

I guess this is very convenient for training. However, it also means that evaluating on this dataset will give random results.

If this was intended, please ignore this issue :)
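One possible way to get a reproducible evaluation set is to derive a private random generator from the sample index. The sketch below uses plain Python lists and the standard `random` module instead of tensors; the class name `DeterministicParityDataset` and the `seed` parameter are hypothetical, not part of the repository.

```python
import random
from typing import List, Tuple


class DeterministicParityDataset:
    """Parity samples where item `idx` is always the same for a given seed."""

    def __init__(self, n_samples: int, n_elems: int, seed: int = 0):
        self.n_samples = n_samples
        self.n_elems = n_elems
        self.seed = seed

    def __len__(self) -> int:
        return self.n_samples

    def __getitem__(self, idx: int) -> Tuple[List[int], int]:
        # A private generator derived from (seed, idx) makes the sample reproducible
        rng = random.Random(self.seed * 1_000_003 + idx)
        # Number of non-zero elements, between 1 and n_elems inclusive
        n_non_zero = rng.randint(1, self.n_elems)
        # Non-zero entries are +1 or -1, the rest stay 0
        x = [rng.choice((-1, 1)) for _ in range(n_non_zero)]
        x += [0] * (self.n_elems - n_non_zero)
        # Randomly permute the elements
        rng.shuffle(x)
        # The parity: number of +1 entries modulo 2
        y = sum(1 for v in x if v == 1) % 2
        return x, y


dataset = DeterministicParityDataset(n_samples=10, n_elems=8, seed=42)
print(dataset[3] == dataset[3])
```

In a tensor version, the same idea should carry over by creating a `torch.Generator`, seeding it with `manual_seed`, and passing it through the `generator=` argument of `torch.randint` and `torch.randperm`.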

Request for a pretrained model

The documentation is really clear. But what if I want to add a TransformerXLLayer to my own model and load pretrained weights? Is there a way to do that?

Feature Request: GPT-2

Thanks for this repo! Code descriptions are well written and code is clear!
It would be worthwhile to provide an implementation of OpenAI GPT-2.

Thanks.

Colors in the paper annotation

Hi team labml! Thanks for the great project -- really helpful for the community. It's also great that you've provided annotations for the papers -- just wondering: are there explanations for how to interpret the highlight color used in the PDF files? Thanks!

Request - Object Detection Papers

Currently, labmlai has no implementations of object detection papers such as the YOLO family, FPN, or RetinaNet.

Do you have a timeline for adding them?

YOLOv4-CSP

YOLOv4-CSP paper implementation for large-scale object detection, covering both small and large objects.

PonderNet implementation Issue

The variable self.is_halt = False is initialized but never changed in the code. It is supposed to be a halting condition. I guess some code is missing.

Save generator and load it only for prediction

Hello,

Thank you for your implementation of CycleGAN; it is very clear. Is there a way to save the generators every 500 iterations (exactly when they predict the test images), so that I can load them later and only run prediction on a specific test set with the loaded model (in new code, independent of cycle_gan.py)?

Thank you,
Agelos

Stride Issue in ResNet?

# The first block for the new feature map size, will have a stride length of $2$
# except for the very first block
stride = 2 if len(blocks) == 0 else 1

Your annotation in the above code says: "The first block for the new feature map size, will have a stride length of 2, except for the very first block".

On the contrary, your implementation means that the first convolution of the very first block has a stride of 2, and the first convolution of every other block has a stride of 1. I think this is an issue.
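The two readings can be compared with a small sketch. It assumes that `blocks` is a single list that keeps growing across all feature map sizes, which is not visible in the quoted snippet; the function names and the `n_blocks` argument are made up for illustration.

```python
from typing import List


def strides_as_quoted(n_blocks: List[int]) -> List[List[int]]:
    """Strides when `blocks` is one list shared across all stages (an assumption)."""
    blocks: List[int] = []
    per_stage = []
    for count in n_blocks:
        stage = []
        for _ in range(count):
            stride = 2 if len(blocks) == 0 else 1  # the quoted expression
            blocks.append(stride)
            stage.append(stride)
        per_stage.append(stage)
    return per_stage


def strides_as_commented(n_blocks: List[int]) -> List[List[int]]:
    """Strides as the comment describes: 2 for the first block of each stage except the first."""
    per_stage = []
    for stage_idx, count in enumerate(n_blocks):
        per_stage.append([2 if block_idx == 0 and stage_idx > 0 else 1
                          for block_idx in range(count)])
    return per_stage


print(strides_as_quoted([2, 2, 2]))
print(strides_as_commented([2, 2, 2]))
```

Under that assumption the quoted expression gives stride 2 only to the very first block, while the comment describes stride 2 for the first block of every later stage.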

Issue with autoregressive_experiment.py

Hey

I was trying out the code here - https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/basic/autoregressive_experiment.py
and it throws a division-by-zero error:

Traceback (most recent call last):
  File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/api/__init__.py", line 97, in run
    packets = self._get_packets()
  File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/api/__init__.py", line 92, in _get_packets
    packets = [s.get_data_packet() for s in sources]
  File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/api/__init__.py", line 92, in <listcomp>
    packets = [s.get_data_packet() for s in sources]
  File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/tracker/writers/web_api.py", line 71, in get_data_packet
    self.data['track'] = self.get_and_clear_indicators()
  File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/tracker/writers/web_api.py", line 113, in get_and_clear_indicators
    step = self._mean(step, max_buffer_size)
  File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/tracker/writers/web_api.py", line 85, in _mean       
    blocks = (len(values) + n_elems - 1) // n_elems
ZeroDivisionError: integer division or modulo by zero

I'm not quite sure how to fix this, because the error seems to come from a different module.
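The failing expression in the last frame is a ceiling division, which raises as soon as `n_elems` is 0. A minimal sketch of that behaviour, together with one hypothetical guard, is below; it does not explain why `n_elems` ends up as 0 inside labml, and the function names are made up.

```python
def n_blocks(n_values: int, n_elems: int) -> int:
    """Ceiling division, as in the last frame of the traceback."""
    return (n_values + n_elems - 1) // n_elems


def n_blocks_guarded(n_values: int, n_elems: int) -> int:
    """Hypothetical guard: treat a non-positive block size as 1."""
    return n_blocks(n_values, max(1, n_elems))


print(n_blocks(10, 4))          # 10 values in blocks of 4
print(n_blocks_guarded(10, 0))  # block size 0 is clamped to 1

try:
    n_blocks(10, 0)
except ZeroDivisionError as error:
    print("unguarded call fails:", error)
```

Whether clamping is the right behaviour depends on what a block size of 0 is supposed to mean in the caller, so this is only a way to reproduce and sidestep the error, not a proposed patch.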

The inference speed of Switch Transformer

Thanks for a nice implementation of Switch Transformer here https://nn.labml.ai/transformers/switch/.
In the original paper, the speed is slower than T5-BASE, but they used a lot of einsum operations in their code. I see you chose indexing operations to handle the different experts. How does the speed of this implementation compare to the original Transformer (without the switch layer)?
