Art generated with VQGAN+CLIP and Diffusion+CLIP Python scripts on Google Colab. You enter a text prompt and keywords, and the computer then generates a progression of images, refining with every iteration (in theory it could go on forever). The generation is guided by a “training set” of images – essentially images paired with text descriptions – which the model uses to try to match your text input. The more precise your keywords (you can add a painter whose style you wish to replicate, or a game engine – I used “Unreal Engine” as a keyword for some of these), the more accurately the script will generate what your text is asking for. In theory, the sky’s the limit – limited only by your graphics card and whatever resources Google Colab allocates you (the paid version offers more stability, RAM, GPU power, speed and storage). Every image is truly unique!
The difference between VQGAN and Diffusion lies mainly in how they handle the image-generation process. VQGAN concentrates on producing and refining a single image and, if allowed, can keep refining it indefinitely. Diffusion, on the other hand, can produce up to five completely different images from the same input text, but the results may not be as refined as what VQGAN generates, because the script imposes its own time limits. Both work extremely well, depending on what you’re aiming for.
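To give a feel for what “refining with every iteration” means, here is a toy sketch of the shared core loop – score a candidate against the prompt (that’s CLIP’s job) and keep changes that raise the score. This is not the actual VQGAN+CLIP or Diffusion code; the “image” is just a list of numbers and `clip_score` is a made-up closeness measure standing in for CLIP.

```python
import random

TARGET = [0.2, 0.8, 0.5]  # hypothetical stand-in for "what the prompt describes"

def clip_score(image):
    """Higher = closer match to the prompt (made-up scoring, not real CLIP)."""
    return -sum((a - b) ** 2 for a, b in zip(image, TARGET))

def refine(image, steps=200, step_size=0.05, seed=0):
    """Iteratively perturb the image, keeping only changes that improve the score."""
    rng = random.Random(seed)
    best = list(image)
    for _ in range(steps):
        candidate = [x + rng.uniform(-step_size, step_size) for x in best]
        if clip_score(candidate) > clip_score(best):
            best = candidate
    return best

start = [0.0, 0.0, 0.0]
result = refine(start)
# The match only improves with more iterations: VQGAN effectively runs this
# loop on one image for as long as you let it, while a Diffusion-style run
# starts several candidates and stops after a fixed budget.
```

In this analogy, letting VQGAN run longer just means more `steps`, which is why its single image can keep getting sharper while Diffusion trades that depth for variety.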
One caveat for both: Do not expect realism! Everything looks somewhat psychedelically impressionistic – spectacular nonetheless.
Added 2 videos in the VQGAN+CLIP section below to show the progression in image generation the VQGAN+CLIP scripts go through. The script produces roughly 12 images (frames) per second. It’s mindblowing what AI can do. I might as well give up trying to be a better artist when a computer can very easily beat us at our own game, even though it doesn’t really have any understanding of, or intention behind, what it’s creating! Astounding and somewhat disconcerting! Skynet is indeed here… so we might as well enjoy it.