In ADD, why are the inputs of the teacher nets the denoised results, rather than the same noise inputs as the student nets?

Question about ADD (Adversarial Diffusion Distillation): the inputs of teacher nets (generative-models, open issue)

YilanWang commented on September 22, 2024

Question about ADD (Adversarial Diffusion Distillation): the inputs of teacher nets

from generative-models.

Comments (4)

jon-chuang commented on September 22, 2024

Actually, if you look at the definition of SDS (score distillation sampling), you will see that it is important to use the noised generated output. That's because you can then interpret the loss in a nice way: it simplifies to predicting the noise at timestep t.

I think the intuition is that we need the generated output to be in the distribution of the teacher model. Using the noised student output just seems to give a better objective, at least mathematically: one has a ground-truth noise to compare against.
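To make the noise-prediction reading concrete, here is a minimal NumPy sketch. It assumes the usual forward process x_t = alpha_t * x + sigma_t * eps and a teacher that predicts noise. `toy_teacher_eps` is a hypothetical stand-in (the optimal predictor if the data were standard Gaussian), not the actual teacher network, and the values of `alpha_t` and `sigma_t` are arbitrary. Under those assumptions, the gap between the student output and the teacher's denoised estimate is just a rescaled noise-prediction error.

```python
import numpy as np

def noise_sample(x, alpha_t, sigma_t, rng):
    """Forward-diffuse x and return both the noised sample and the noise used."""
    eps = rng.standard_normal(x.shape)
    return alpha_t * x + sigma_t * eps, eps

def toy_teacher_eps(x_t, alpha_t, sigma_t):
    """Hypothetical teacher: optimal noise prediction if the data were N(0, I)."""
    return sigma_t * x_t / (alpha_t**2 + sigma_t**2)

def distill_residuals(student_out, alpha_t, sigma_t, rng):
    # Noise the student's generated output (not the student's own input).
    x_t, eps = noise_sample(student_out, alpha_t, sigma_t, rng)
    eps_pred = toy_teacher_eps(x_t, alpha_t, sigma_t)
    # Convert the teacher's noise prediction into a denoised image estimate.
    x_teacher = (x_t - sigma_t * eps_pred) / alpha_t
    image_residual = student_out - x_teacher   # what an image-space loss compares
    noise_residual = eps_pred - eps            # predicted noise vs. ground-truth noise
    return image_residual, noise_residual

rng = np.random.default_rng(0)
student_out = rng.standard_normal((4, 8))      # placeholder for a student sample
alpha_t, sigma_t = 0.8, 0.6                    # arbitrary illustrative values
img_res, eps_res = distill_residuals(student_out, alpha_t, sigma_t, rng)

# The two residuals differ only by the factor sigma_t / alpha_t.
print(np.allclose(img_res, (sigma_t / alpha_t) * eps_res))
```

The identity holds for any noise predictor, not just the toy one, because the noise added to the generated output is known exactly.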


jon-chuang commented on September 22, 2024

One plausible reason is that the timesteps s and t are sampled independently for the student and teacher models, respectively. However, this is not a hard obstacle.

The simpler explanation is that it inherits from score distillation sampling, which originates in the 3D generation domain (see DreamFusion), where the inputs to the teacher (image model) and student (3D model producing a differentiable 2D rendering) differ vastly.

This raises the open question of whether feeding the noised original image, rather than the noised student output, would lead to better or worse results (in terms of either the final loss or the loss curve).
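A rough sketch of the two candidate teacher inputs, in case it helps frame the question. Everything here is an assumption for illustration: the cosine-style schedule in `alpha_sigma`, the four student timesteps, the uniform teacher timestep, and `toy_student`, which is a hypothetical placeholder rather than the real student network.

```python
import numpy as np

def alpha_sigma(t, T=1000):
    """Hypothetical schedule with alpha^2 + sigma^2 = 1 (assumed for illustration)."""
    angle = 0.5 * np.pi * t / T
    return np.cos(angle), np.sin(angle)

def diffuse(x, t, rng):
    """Forward-diffuse x to timestep t with freshly drawn noise."""
    alpha, sigma = alpha_sigma(t)
    return alpha * x + sigma * rng.standard_normal(x.shape)

def toy_student(x_s, s):
    """Placeholder one-step denoiser (posterior mean if the data were N(0, I))."""
    alpha, _ = alpha_sigma(s)
    return alpha * x_s

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2, 8))                 # placeholder for a training image

student_steps = np.array([250, 500, 750, 1000])  # assumed student timestep set
s = int(rng.choice(student_steps))               # student timestep
t = int(rng.integers(1, 1001))                   # teacher timestep, drawn independently

x_s = diffuse(x0, s, rng)                        # student input
student_out = toy_student(x_s, s)                # student's generated output

# Option A: noise the student output, then hand it to the teacher.
teacher_in_a = diffuse(student_out, t, rng)
# Option B (the open question): noise the original image instead.
teacher_in_b = diffuse(x0, t, rng)
```

Because s and t are drawn independently, `x_s` is generally at a different noise level than either teacher input, so it cannot simply be reused as the teacher's input.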


jon-chuang commented on September 22, 2024

"rather than the same noise inputs of student nets"

Note that your original suggestion is invalid, because the student and teacher use different noise sampling.

What remains valid is the question of which input to noise.


jon-chuang commented on September 22, 2024

@qp-qp I'm not sure whether you're at liberty to share, but I think this is an interesting question that is worth shedding light on.

I have a feeling that feeding the original image may lead to degenerate results, as it simply amplifies the original dataset.

If the teacher model is perfectly faithful on the dataset, you would reproduce training on the original dataset.

Perhaps what is beneficial about distillation, i.e., feeding the generated output, is that it generates new diversity for the teacher model to provide feedback on.

However, I am uncertain whether any of these thoughts are valid.
