The Science of Interpolation and Embedding?

There is no general Stable Diffusion category, so I didn't know where to ask this. There are a few demos out there on Replicate and Hugging Face that do latent space interpolation between prompts. How does that work?

If I render “bird” with seed 1, then seed 2, then seed 3, I get entirely different images, yet these latent space interpolation animation scripts can somehow create images that differ only slightly from the previous frame. What is the science here? Can anybody summarize it for me?
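
To make the question concrete, here is my rough guess at the technique: keep the prompt fixed and spherically interpolate the initial noise tensor itself, instead of jumping between unrelated seeds. Everything in this sketch (the slerp helper, passing `latents` into the diffusers pipeline, the checkpoint name) is my assumption, not code taken from any of the demos.

```python
# Sketch (my guess at the technique): spherically interpolate between two
# initial noise tensors and render every step with the same prompt, so each
# frame starts from noise that is only slightly different from the last one.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def slerp(t, a, b):
    # Spherical interpolation keeps the intermediate noise on the Gaussian
    # "shell" the model expects, unlike plain linear interpolation.
    a64, b64 = a.double().flatten(), b.double().flatten()
    dot = torch.clamp((a64 / a64.norm()) @ (b64 / b64.norm()), -1.0, 1.0)
    omega = torch.acos(dot)
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return out.to(a.dtype)

shape = (1, pipe.unet.config.in_channels, 64, 64)  # latent size for 512x512 output
noise_a = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(1),
                      device="cuda", dtype=torch.float16)
noise_b = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(2),
                      device="cuda", dtype=torch.float16)

for i, t in enumerate(torch.linspace(0, 1, 30)):
    latents = slerp(float(t), noise_a, noise_b)
    frame = pipe("bird", latents=latents, num_inference_steps=30).images[0]
    frame.save(f"frame_{i:03d}.png")  # stitch the frames into a GIF afterwards
```

Is that roughly what is going on, or do the demos interpolate the prompt embeddings instead (or as well)?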

Secondly, with older image models like StyleGAN, you can convert an image into a latent space representation. In the case of StyleGAN, you convert a JPEG into an array of floats/parameters that reproduces the source image as closely as possible (“Latent me”, for example).
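
For anyone not familiar with that workflow, the inversion I mean is basically an optimisation loop: start from a random latent and nudge it until the generator's output matches the photo. The `generator` below is a placeholder for a pretrained StyleGAN, and the plain MSE loss is just to keep the sketch short (real projectors use perceptual losses).

```python
# Rough sketch of optimisation-based GAN inversion ("Latent me"): `generator`
# is a placeholder for a pretrained StyleGAN, i.e. any module that maps a
# latent vector to an image.
import torch

def invert(generator, target_image, latent_dim=512, steps=500, lr=0.01):
    latent = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        reconstruction = generator(latent)  # image produced by the current latent
        loss = torch.nn.functional.mse_loss(reconstruction, target_image)
        loss.backward()
        optimizer.step()
    return latent.detach()  # the "array of floats" that best reproduces the photo
```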

Can we convert an image into some vector representation using Stable Diffusion? I understand the reconstructed image might not be 100% perfect. If this is possible, you could start a latent space interpolation from a source (and maybe even a destination) image. Of course, there are things like interrogate, which will give you a prompt that produces somewhat similar results, but that is nowhere near converting an image into embeddings.
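
My understanding (please correct me if this is wrong) is that Stable Diffusion's VAE already gives a rough version of this: it encodes a 512×512 image into a 4×64×64 latent and decodes it back with some loss of detail. The checkpoint name and the use of `VaeImageProcessor` below are just how I would try it with diffusers; I realise this only inverts the VAE, not the diffusion process itself, so it may not be exactly what the interpolation scripts need.

```python
# Sketch: round-trip an image through Stable Diffusion's VAE to get its
# latent "vector representation" and a (slightly lossy) reconstruction.
import torch
from PIL import Image
from diffusers import AutoencoderKL
from diffusers.image_processor import VaeImageProcessor

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
processor = VaeImageProcessor()

image = Image.open("me.jpg").convert("RGB").resize((512, 512))
pixels = processor.preprocess(image)  # [1, 3, 512, 512], values scaled to [-1, 1]

with torch.no_grad():
    # The [1, 4, 64, 64] tensor below is the image's position in latent space;
    # it could serve as the start (or end) point of an interpolation.
    latents = vae.encode(pixels).latent_dist.mean * vae.config.scaling_factor
    decoded = vae.decode(latents / vae.config.scaling_factor).sample

reconstruction = processor.postprocess(decoded)[0]  # close to, but not exactly, the original
reconstruction.save("reconstruction.png")
```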


Good questions… this is beyond my current realm of experience, so I hope someone else can chime in here. Maybe I can make a general Stable Diffusion category too. If you don't get a response here, though, you may have better luck on the official diffusers forums or maybe even the diffusers GitHub issues.