What’s hot in Text-to-Image Generation — GLIDE, DALL-E and IMAGEN

Nikhil Verma
6 min read · Jun 2, 2022

Unless you have been living under a rock or were busy attending a conference during the last few days, social media should have notified you of yet another diffusion model, IMAGEN, which generates images from text and follows on from OpenAI’s work.

Yet another text-to-image generator. Who would have thought that Google Brain would also take a shot at diffusion models that generate images from text? We have recently seen quite a lot of progress in text-to-image generation, so here is a short recap of the most important milestones.

  • DALL-E
  • GLIDE
  • DALL-E 2
  • IMAGEN

DALL-E, GLIDE and DALL-E 2

The first big wave of hype came with OpenAI’s DALL-E, an autoregressive model that takes in text and generates impressive, if somewhat blurry, images.

DALL-E

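DALL-E is a GPT-like model which, given a piece of text and optionally the start of an image, generates the rest of the image one element at a time, row by row. Strictly speaking, it predicts discrete image tokens from a learned codebook rather than raw pixels, and a separate decoder turns the finished token grid back into an image. Then the second, bigger bang again came from OpenAI, this time with GLIDE, which is a…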
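To make the row-by-row generation described for DALL-E concrete, here is a minimal sketch of an autoregressive sampling loop. It is not OpenAI’s implementation: `toy_logits` is a hypothetical stand-in for the trained transformer, and the grid size `GRID` and vocabulary size `VOCAB` are toy values chosen only for illustration. What it shows is the control flow: each new image element is sampled conditioned on the text and on everything generated so far, and the flat sequence is then read back as rows.

```python
import math
import random

GRID = 4    # toy grid side (assumption; real models use a much larger grid)
VOCAB = 8   # toy number of distinct image tokens (assumption)


def toy_logits(text_tokens, image_tokens):
    """Hypothetical stand-in for a trained transformer.

    Returns one score per vocabulary entry, computed only from the
    prefix (text plus the image tokens generated so far).
    """
    context = sum(text_tokens) + sum(image_tokens) + len(image_tokens)
    return [math.sin(context + v + 1) for v in range(VOCAB)]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate(text_tokens, image_start=(), rng=None):
    """Sample a GRID x GRID token grid in raster order (left to right, top to bottom)."""
    rng = rng or random.Random(0)
    image_tokens = list(image_start)  # optional given start of the image
    while len(image_tokens) < GRID * GRID:
        probs = softmax(toy_logits(text_tokens, image_tokens))
        next_token = rng.choices(range(VOCAB), weights=probs, k=1)[0]
        image_tokens.append(next_token)
    # Reshape the flat sequence into rows.
    return [image_tokens[r * GRID:(r + 1) * GRID] for r in range(GRID)]


if __name__ == "__main__":
    # The text prompt is given as already-tokenized integers (made-up values).
    for row in generate([1, 2, 3], image_start=[5, 5, 5, 5]):
        print(row)
```

Passing `image_start` fixes the first row and lets the loop fill in the remainder, which mirrors the "start of an image" conditioning mentioned above. In a real system the sampled grid would still have to be decoded into pixels.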

