Git Product home page Git Product logo

stable-diffusion-pytorch's Introduction

stable-diffusion-pytorch

Open in Colab

Yet another PyTorch implementation of Stable Diffusion.

I tried my best to make the codebase minimal, self-contained, consistent, hackable, and easy to read. Features are pruned if not needed in Stable Diffusion (e.g. Attention mask at CLIP tokenizer/encoder). Configs are hard-coded (based on Stable Diffusion v1.x). Loops are unrolled when that shape makes more sense.

Despite of my efforts, I feel like I cooked another sphagetti. Well, help yourself!

Heavily referred to following repositories. Big kudos to them!

Dependencies

  • PyTorch
  • Numpy
  • Pillow
  • regex

How to Install

  1. Clone or download this repository.
  2. Install dependencies: Run pip install torch numpy Pillow regex or pip install -r requirements.txt.
  3. Download data.tar from here and unpack in the parent folder of stable_diffusion_pytorch. Your folders should be like this:
stable-diffusion-pytorch(-main)/
├─ data/
│  ├─ ckpt/
│  ├─ ...
├─ stable_diffusion_pytorch/
│  ├─ samplers/
└  ┴─ ...

Note that checkpoint files included in data.zip have different license -- you should agree to the license to use checkpoint files.

How to Use

Import stable_diffusion_pytorch as submodule.

Here's some example scripts. You can also read the docstring of stable_diffusion_pytorch.pipeline.generate.

Text-to-image generation:

from stable_diffusion_pytorch import pipeline

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts)
images[0].save('output.jpg')

...with multiple prompts:

prompts = [
    "a photograph of an astronaut riding a horse",
    ""]
images = pipeline.generate(prompts)

...with unconditional(negative) prompts:

prompts = ["a photograph of an astronaut riding a horse"]
uncond_prompts = ["low quality"]
images = pipeline.generate(prompts, uncond_prompts)

...with seed:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, uncond_prompts, seed=42)

Preload models (you will need enough VRAM):

from stable_diffusion_pytorch import model_loader
models = model_loader.preload_models('cuda')

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models)

If you get OOM with above code but have enough RAM (not VRAM), you can move models to GPU when needed and move back to CPU when not needed:

from stable_diffusion_pytorch import model_loader
models = model_loader.preload_models('cpu')

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models, device='cuda', idle_device='cpu')

Image-to-image generation:

from PIL import Image

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('space.jpg')]
images = pipeline.generate(prompts, input_images=images)

...with custom strength:

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('space.jpg')]
images = pipeline.generate(prompts, input_images=images, strength=0.6)

Change classifier-free guidance scale:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, cfg_scale=11)

...or disable classifier-free guidance:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, do_cfg=False)

Reduce steps (faster generation, lower quality):

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, n_inference_steps=28)

Use different sampler:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, sampler="k_euler")
# "k_lms" (default), "k_euler", or "k_euler_ancestral" is available

Generate image with custom size:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, height=512, width=768)

LICENSE

All codes on this repository are licensed with MIT License. Please see LICENSE file.

Note that checkpoint files of Stable Diffusion are licensed with CreativeML Open RAIL-M License. It has use-based restriction caluse, so you'd better read it.

stable-diffusion-pytorch's People

Contributors

kjsman avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.