
Let's build a diffusion autoencoder!

Posted: Tue May 28, 2024 11:48 pm
by hbyte
Diffusion Probabilistic Model

My take on it is:

1. An autoencoder trained to encode and decode an image
2. Add Gaussian noise at successive time steps during encoding
3. Remove noise at successive time steps during decoding
4. Train on the noise or the image? (The noise, it turns out; see the end of the post.)

Generate an image from an image.

(Used with CLIP, which embeds images and text in a shared space, to generate images from text.)

How??

Gaussian noise N(0,1): p(x) ~ exp(-1/2 * |x|^2) :- the same bell curve as the Maxwell-Boltzmann velocity distribution of particles
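
To be concrete about where that noise comes from, a tiny C++ sketch (the variable names are mine) :-

#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double PI = 3.14159265358979323846;
    std::mt19937 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0); // N(0,1)

    double z = gauss(rng);                                   // one noise sample
    double p = std::exp(-0.5 * z * z) / std::sqrt(2.0 * PI); // density at z
    std::printf("z = %f  p(z) = %f\n", z, p);
    return 0;
}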

B(t) in (0,1) :- the noise (variance) schedule, one value per step t = 1..T
a(t) = 1 - B(t), abar(t) = a(1)*a(2)*...*a(t) :- how much of the signal survives after t steps
Btilde(t) = (1 - abar(t-1)) / (1 - abar(t)) * B(t) :- Normalize the schedule (this is the variance used when reversing step t)
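
Here's that schedule as a minimal C++ sketch. I'm assuming a linear ramp for B(t), which is one common choice, and the names beta, alpha_bar, beta_tilde are mine:

#include <cstdio>
#include <vector>

int main() {
    const int T = 1000;
    std::vector<double> beta(T + 1), alpha_bar(T + 1), beta_tilde(T + 1);

    alpha_bar[0] = 1.0;
    for (int t = 1; t <= T; ++t) {
        // Linear ramp from 1e-4 up to 0.02 (assumed; other ramps work too).
        beta[t] = 1e-4 + (0.02 - 1e-4) * (t - 1) / double(T - 1);
        // abar(t) = product of (1 - B(s)) for s = 1..t
        alpha_bar[t] = alpha_bar[t - 1] * (1.0 - beta[t]);
        // The "normalized" schedule: variance of the reverse step.
        beta_tilde[t] = (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * beta[t];
    }

    std::printf("abar(T) = %g  (near 0, so x(T) is nearly pure noise)\n", alpha_bar[T]);
    return 0;
}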

Forward diffusion: x(t) = sqrt(1 - B(t)) * x(t-1) + sqrt(B(t)) * z(t)

Add noise using B(t) and z(t) ~ N(0,1) so that as t approaches T, x(t) approaches N(0,1). The steps also telescope, so you can jump straight from the clean image to any step: x(t) = sqrt(abar(t)) * x(0) + sqrt(1 - abar(t)) * z.

z(t) is just the noise, drawn fresh at every step.
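
Both the step-by-step version and the jump version in code, treating the image as a flat array (helper names are mine, schedule values come from the sketch above):

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

std::mt19937 rng(42);
std::normal_distribution<double> gauss(0.0, 1.0);

// One forward step: x(t) = sqrt(1 - B(t)) * x(t-1) + sqrt(B(t)) * z(t)
std::vector<double> forward_step(const std::vector<double>& x_prev, double beta_t) {
    std::vector<double> x(x_prev.size());
    for (size_t i = 0; i < x.size(); ++i)
        x[i] = std::sqrt(1.0 - beta_t) * x_prev[i] + std::sqrt(beta_t) * gauss(rng);
    return x;
}

// The jump: x(t) = sqrt(abar(t)) * x(0) + sqrt(1 - abar(t)) * z
// Also returns the z it used, because training needs it (see the loss below).
std::vector<double> forward_jump(const std::vector<double>& x0, double abar_t,
                                 std::vector<double>& z_out) {
    std::vector<double> x(x0.size());
    z_out.resize(x0.size());
    for (size_t i = 0; i < x.size(); ++i) {
        z_out[i] = gauss(rng);
        x[i] = std::sqrt(abar_t) * x0[i] + std::sqrt(1.0 - abar_t) * z_out[i];
    }
    return x;
}

int main() {
    std::vector<double> x0(4, 0.5), z;
    std::vector<double> x1 = forward_step(x0, 1e-4);   // one step of noising
    std::vector<double> xt = forward_jump(x0, 0.3, z); // jump to abar(t) = 0.3
    std::printf("x1[0] = %f  xt[0] = %f\n", x1[0], xt[0]);
    return 0;
}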

q is the encoder half: q(x(t)|x(t-1)) is the forward (noising) process, and it is fixed, no trained parameters, just the schedule
p is the decoder half: p(x(t-1)|x(t)) is the reverse (denoising) process, and it is what gets learned

q describes each latent variable by a mean and std at each time step
p creates a new image by denoising step by step
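
To make p concrete, here's my sketch of one standard reverse step. predict_noise is a placeholder for the trained network (in reality a neural net taking x(t) and t), and beta_tilde is the normalized schedule from above:

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

std::mt19937 rng(7);
std::normal_distribution<double> gauss(0.0, 1.0);

// Placeholder for the trained network: guesses the noise hidden in x(t).
std::vector<double> predict_noise(const std::vector<double>& x_t, int t) {
    return std::vector<double>(x_t.size(), 0.0); // stand-in, not a real model
}

// One reverse step p(x(t-1)|x(t)):
// mean = (x(t) - B(t)/sqrt(1 - abar(t)) * z_pred) / sqrt(1 - B(t))
// variance = Btilde(t)
std::vector<double> reverse_step(const std::vector<double>& x_t, int t,
                                 double beta_t, double abar_t, double beta_tilde_t) {
    std::vector<double> z_pred = predict_noise(x_t, t);
    std::vector<double> x(x_t.size());
    for (size_t i = 0; i < x.size(); ++i) {
        double mean = (x_t[i] - beta_t / std::sqrt(1.0 - abar_t) * z_pred[i])
                      / std::sqrt(1.0 - beta_t);
        double noise = (t > 1) ? gauss(rng) : 0.0; // last step adds no noise
        x[i] = mean + std::sqrt(beta_tilde_t) * noise;
    }
    return x;
}

int main() {
    std::vector<double> x_t(4, 0.1);
    std::vector<double> x_prev = reverse_step(x_t, 10, 1e-3, 0.99, 5e-4);
    std::printf("x_prev[0] = %f\n", x_prev[0]);
    return 0;
}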

At each time step, for each image, noise is added using the schedule above, ending at t = T with an (almost) pure noise image.

Each x(t) was made from x(t-1) by adding known noise, so at training time the information needed to reverse the current step is always available, which means that

at each time step, for each image, the loss can be calculated as :-

p(x(t-1)|x(t)) = the reverse diffusion process, i.e. the probability of x(t-1) (less noise) given x(t)

L(t) = E over x(t-1) of [ -log p(x(t-1)|x(t)) ]

Because everything is Gaussian, that -log collapses into a squared error, and the practical version is a comparison of noises: L_simple(t) = ||z - z_pred(x(t), t)||^2. (So I think that's it, give or take constants.)

It's the noise that went in between t-1 and t, predicted versus actual, that gives the loss.

If you learn the difference (the noise), then you are also learning the signal.
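
Putting the training objective into code: pick a random t, noise the image with the jump formula, and score the network on how well it guesses the noise. This is the standard simplified loss; predict_noise is again a stand-in for the real network:

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

std::mt19937 rng(42);
std::normal_distribution<double> gauss(0.0, 1.0);

std::vector<double> predict_noise(const std::vector<double>& x_t, int t) {
    return std::vector<double>(x_t.size(), 0.0); // stand-in for the network
}

// One training example's loss: L_simple(t) = mean((z - z_pred)^2)
double diffusion_loss(const std::vector<double>& x0, int t, double abar_t) {
    std::vector<double> x_t(x0.size()), z(x0.size());
    for (size_t i = 0; i < x0.size(); ++i) {
        z[i] = gauss(rng);                                  // the actual noise
        x_t[i] = std::sqrt(abar_t) * x0[i]
               + std::sqrt(1.0 - abar_t) * z[i];            // noised image
    }
    std::vector<double> z_pred = predict_noise(x_t, t);     // network's guess
    double loss = 0.0;
    for (size_t i = 0; i < z.size(); ++i)
        loss += (z[i] - z_pred[i]) * (z[i] - z_pred[i]);
    return loss / double(z.size());
}

int main() {
    std::vector<double> x0(4, 0.5);
    std::printf("loss = %f\n", diffusion_loss(x0, 100, 0.8)); // abar(100) = 0.8, assumed
    return 0;
}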