
megdiffusion's Introduction

MegEngine

MegEngine is a fast, scalable, and user-friendly deep learning framework with 3 key features.

  • Unified framework for both training and inference
    • Quantization, dynamic shape/image pre-processing, and even derivation with a single model.
    • After training, put everything into your model and run inference on any platform with speed and precision. Check here for a quick guide.
  • The lowest hardware requirements
    • GPU memory usage can be reduced to one-third of the original when the DTR algorithm is enabled (see the sketch after this list).
    • Inference models with the lowest memory usage by leveraging our Pushdown memory planner.
  • Inference efficiently on all platforms
    • Inference with high speed and precision on x86, Arm, CUDA, and ROCm.
    • Supports Linux, Windows, iOS, Android, TEE, etc.
    • Optimize performance and memory usage by leveraging our advanced features.
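For example, enabling DTR before building the model and training loop is typically a two-line change (a minimal sketch, assuming the mge.dtr interface documented for recent MegEngine releases; the 5 GB threshold is just an example value):

import megengine as mge

# Assumption: recent MegEngine versions expose DTR through mge.dtr.
mge.dtr.eviction_threshold = "5GB"  # example threshold; tune for your GPU
mge.dtr.enable()                    # activations beyond the threshold may be evicted and recomputed

# ... build the model, GradManager, and training loop as usual ...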

Installation

NOTE: MegEngine now supports Python installation on Linux-64bit/Windows-64bit/MacOS(CPU-Only)-10.14+/Android 7+(CPU-Only) platforms with Python from 3.6 to 3.9. On Windows 10 you can either install the Linux distribution through Windows Subsystem for Linux (WSL) or install the Windows distribution directly. Many other platforms are supported for inference.

Binaries

To install the pre-built binaries via pip wheels:

python3 -m pip install --upgrade pip
python3 -m pip install megengine -f https://megengine.org.cn/whl/mge.html
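To verify the installation (prints the installed version):

python3 -c "import megengine; print(megengine.__version__)"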

Building from Source

How to Contribute

We strive to build an open and friendly community. We aim to power humanity with AI.

How to Contact Us

Resources

License

MegEngine is licensed under the Apache License, Version 2.0

Citation

If you use MegEngine in your publication, please cite it using the following BibTeX entry.

@Misc{MegEngine,
  institution = {megvii},
  title = {MegEngine: A fast, scalable and easy-to-use deep learning framework},
  howpublished = {\url{https://github.com/MegEngine/MegEngine}},
  year = {2020}
}

Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

megdiffusion's People

Contributors

asthestarsfalll, chaibyte


megdiffusion's Issues

About padding in Downsample

I'm willing to upload my conversion code, but it doesn't work well after converting: the error between the MegEngine and PyTorch implementations is large for the same input.
The cause is that the convolution padding in Downsample differs, since the PyTorch implementation uses asymmetric padding.
After I modified the MegEngine implementation, the result:

import megengine.functional as F
import megengine.module as M
from megengine.module import init


class DownSample(M.Module):
    """A downsampling layer with an optional convolution.

    Args:
        in_ch: channels in the inputs and outputs.
        with_conv: if ``True``, apply a strided convolution to downsample; otherwise use average pooling.
    """

    def __init__(self, in_ch, with_conv=True):
        super().__init__()
        self.with_conv = with_conv
        if with_conv:
            self.main = M.Conv2d(in_ch, in_ch, 3, stride=2)
        else:
            self.main = M.AvgPool2d(2, stride=2)

    def _initialize(self):
        for module in self.modules():
            if isinstance(module, M.Conv2d):
                init.xavier_uniform_(module.weight)
                init.zeros_(module.bias)

    def forward(self, x, temb):  # unused temb param kept for interface compatibility
        if self.with_conv:
            # Asymmetric padding (right/bottom only), matching the PyTorch implementation.
            x = F.nn.pad(x, [*[(0, 0) for i in range(x.ndim - 2)], (0, 1), (0, 1)])
        return self.main(x)

[image: comparison result after the modification]
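As a quick sanity check of the modified module (a sketch; the channel count and input size are arbitrary):

import megengine as mge

down = DownSample(64)
x = mge.random.normal(size=(1, 64, 32, 32))
y = down(x, None)   # temb is unused
print(y.shape)      # (1, 64, 16, 16): pad right/bottom by 1, then 3x3 conv with stride 2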

BTW, I'm also a beginner in DDPM; your blog helps me a lot!

Originally posted by @Asthestarsfalll in #5 (comment)

eps for GroupNorm

Great work!
The parameter 'eps' in GroupNorm is initialized to 1e-5 by default.
However, the GroupNorm in TensorFlow is slightly different: it is initialized with 1e-6.
Maybe it doesn't have any influence on training results, but could you change this (for every GroupNorm in the code) for alignment?
Since I want to convert trained models from torch or tf to megengine, the smaller the error, the better.
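For reference, aligning the epsilon is a one-argument change wherever the normalization layer is constructed (a minimal sketch; the group and channel counts are arbitrary example values):

import megengine.module as M

# Pass eps explicitly so it matches the TensorFlow default of 1e-6.
norm = M.GroupNorm(32, 128, eps=1e-6)  # 32 groups over 128 channels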

Handling checkpoint-saving failures

If the machine is preemptible, it might be scheduled for preemption (or hit other situations that bring the machine down). If a checkpoint is being saved at that exact moment, the original data will be corrupted. Therefore it is reasonable to keep multiple backups locally. Considering disk space usage, it would be even better to support cloud storage, such as AWS S3.
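One way to make the local save robust is to write to a temporary file first, atomically replace the previous checkpoint, and rotate a few numbered backups (a sketch, not the project's actual saving code; the paths and backup count are illustrative):

import os
import shutil

import megengine


def save_checkpoint(state, path="ckpt.pkl", keep=3):
    # Write to a temp file so a crash mid-save never corrupts the existing checkpoint.
    tmp_path = path + ".tmp"
    megengine.save(state, tmp_path)

    # Rotate old copies: ckpt.pkl -> ckpt.pkl.1 -> ckpt.pkl.2 -> ...
    for i in range(keep - 1, 0, -1):
        newer = f"{path}.{i - 1}" if i > 1 else path
        if os.path.exists(newer):
            shutil.copy2(newer, f"{path}.{i}")

    os.replace(tmp_path, path)  # atomic on POSIX filesystems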

Gradient clipping issues in MegEngine v1.9.x

Description

Training with a single GPU and using gradient clipping in this codebase causes an error on MegEngine 1.9.x. After one iteration with autodiff and a parameter update, the next forward pass of the model breaks. Error message:

RuntimeError: assertion `filter.ndim == img_ndim + 2 || filter.ndim == img_ndim + 4' failed at ../../../../../../imperative/src/impl/ops/convolution.cpp:61: megdnn::TensorLayout mgb::imperative::{anonymous}::convolution::do_shape_infer(const mgb::imperative::OpDef&, size_t, megdnn::TensorLayout, megdnn::TensorLayout)
extra message: bad filter ndim for dense convolution: spatial_ndim=2 filter_ndim=0

Here is the simplest example to reproduce this problem:

import megengine
import megengine.functional as F
import megengine.module as M
import megengine.optimizer as optim
import megengine.autodiff as autodiff

megengine.async_level = 0  # run synchronously so the error surfaces at the failing op

class SimpleModel(M.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.conv1 = M.Conv2d(in_ch, in_ch, 3, stride=1, padding=1)
        self.conv2 = M.Conv2d(in_ch, in_ch, 3, stride=1, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        x = F.nn.interpolate(x, scale_factor=1, mode="nearest")  # see Solution 4 below
        x = self.conv2(x)
        return x

if __name__ == "__main__":
    x = F.ones((1, 1, 2, 2))
    model = SimpleModel(in_ch=1)

    optimizer = optim.SGD(model.parameters(), lr=1e-3)
    gm = autodiff.GradManager()
    gm.attach(model.parameters())

    with gm:
        loss = model(x) + 0  # see Solution 3 below
        gm.backward(loss)

    optim.clip_grad_norm(model.parameters(), max_norm=1.)
    optimizer.step()
    y = model(x)  # breaks here on MegEngine 1.9.x

Workaround

  • Solution 1: Comment out this line in megdiffusion.scripts.train:

    optim.clip_grad_norm(model.parameters(), FLAGS.grad_clip)

    Then we can train the model without gradient clipping. (But that's not what we want... 😣)

  • Solution 2: The problem does not occur when using distributed training.

  • Solution 3: Try changing loss = model(x) + 0 to loss = model(x) 🤔🤔🤔

  • Solution 4: Try deleting x = F.nn.interpolate(x, scale_factor=1, mode="nearest") 🤔🤔🤔

Issue Track

This problem was fixed in MegEngine/MegEngine@df5ebd3, so you can wait for the MegEngine v1.10 release or build MegEngine from source at a commit newer than that one.
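Until v1.10 is available, one stopgap is to guard the clipping call on the installed version (a hypothetical sketch around the line from megdiffusion.scripts.train; it simply skips clipping on the affected 1.9.x builds):

import megengine
import megengine.optimizer as optim

# Hypothetical guard: skip gradient clipping on the affected 1.9.x releases.
if not megengine.__version__.startswith("1.9"):
    optim.clip_grad_norm(model.parameters(), FLAGS.grad_clip)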
