Image generation using neural networks

The age of information continues to produce overwhelming volumes of data that we humans have little chance of processing on our own. This surplus of data is the perfect opportunity for Artificial Intelligence to take centre stage, as researchers around the world train learning machines to make sense of this world of numbers.

Computer Vision


Broadly speaking, Computer Vision is the field of teaching computers to understand visual data such as images or videos at a high level. For example, an AI can be trained to recognise faces in an image or classify different types of flowers. There are already many examples of this being put into practice, such as face recognition in smartphone cameras, or the automatic blurring of faces or other potentially sensitive information on map street views.
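To make the idea of classification concrete, here is a deliberately simplified, toy sketch: an image is reduced to a feature vector, and a trained model maps that vector to a label. A simple nearest-centroid classifier stands in for the neural network below; the feature values and class names are invented for illustration only.

```python
import math

# Toy stand-in for a trained classifier: each class is represented by the
# average ("centroid") of the feature vectors seen for it during training.
# Real computer-vision systems learn far richer features with neural networks.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(examples):
    """examples: list of (feature_vector, label) pairs -> {label: centroid}."""
    sums, counts = {}, {}
    for vec, label in examples:
        counts[label] = counts.get(label, 0) + 1
        sums[label] = [s + v for s, v in zip(sums.get(label, [0.0] * len(vec)), vec)]
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, vec):
    """Predict the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: euclidean(centroids[label], vec))

# Invented 2-D "features" (e.g. petal length/width) for two flower classes.
training_data = [
    ([1.4, 0.2], "setosa"), ([1.3, 0.3], "setosa"),
    ([4.7, 1.4], "versicolor"), ([4.5, 1.5], "versicolor"),
]
centroids = train(training_data)
print(classify(centroids, [1.5, 0.25]))  # prints "setosa"
```

A real flower classifier would of course learn its features from thousands of photographs rather than two hand-typed numbers, but the pipeline is the same: features in, label out.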

So, we can train an AI to recognise features in an image we give to it, but what about the opposite? We can also ask an AI to generate entirely new images.


OpenAI’s DALL-E, named after the artist Salvador Dalí and Pixar’s intelligent robot WALL-E, is a neural network based on GPT-3, a model designed for Natural Language Processing (NLP). DALL-E is trained on a dataset of text-image pairs and can generate images from text prompts. OpenAI have explored a wide variety of the neural network’s capabilities, including combining unrelated concepts, inferring contextual details, and even displaying some understanding of time and geography. For example, with the text prompt “a soap dispenser in the style of a turtle”, DALL-E produces the following images:

Image credit: OpenAI, DALL-E.

These images are the top results of an automatic ranking; they are not perfect, but most of them clearly match the prompt. In addition to generating completely new images, DALL-E can also manipulate and transform existing images.
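The automatic ranking deserves a brief aside: OpenAI describe reranking DALL-E’s raw samples with CLIP, a separate model that scores how well an image matches a caption. The sketch below illustrates the principle only; the embedding vectors are invented stand-ins for real CLIP encoder outputs.

```python
import math

def cosine_similarity(a, b):
    """CLIP-style score: higher means the image embedding matches the text better."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rerank(text_embedding, image_embeddings, top_k=2):
    """Sort candidate images by similarity to the prompt and keep the best few."""
    scored = sorted(
        image_embeddings.items(),
        key=lambda item: cosine_similarity(text_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]

# Invented embeddings: in reality these come from CLIP's text and image encoders.
prompt_embedding = [0.9, 0.1, 0.3]
candidates = {
    "sample_a": [0.8, 0.2, 0.3],    # close to the prompt
    "sample_b": [0.1, 0.9, 0.1],    # off-prompt
    "sample_c": [0.9, 0.1, 0.35],   # closest to the prompt
}
print(rerank(prompt_embedding, candidates))  # prints ['sample_c', 'sample_a']
```

Generating many samples and keeping only the best-matching few is a large part of why the published examples look as coherent as they do.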

DALL-E is not without its flaws. When asked to use specific colours, it is prone to confusing similar ones, and its images of geographical concepts such as local cuisines or wildlife tend towards stereotypes.

OpenAI have yet to discuss the potential impacts of DALL-E, but it raises many questions. What impact might models like this have on the world of art? DALL-E is able to render images in artistic forms such as paintings or sketches, and can follow specific instructions such as imitating the style of a particular artist. In the example below, we can see the highest-ranked results for “a painting of a capybara sitting on a mountain at sunrise” (left) and the same prompt rendered “in the style of Claude Monet” (right).

Image credit: OpenAI, DALL-E.

Here, in addition to changing the style, DALL-E has also shown an ability to adapt the lighting according to the time of day. These might not be considered artistic masterpieces, but to non-experts they are convincing paintings. Of course, DALL-E is limited by its dataset: it can combine existing concepts and even infer some context in images, but it cannot create something entirely new. It nonetheless raises the question: will neural networks like DALL-E one day compete with human artists?

The jury is still out on whether DALL-E or its successors will manage to penetrate the artistic scene. But as AI finds uses in ever more aspects of our lives, it’s easy to imagine models such as DALL-E finding practical applications in the near future.

What Is Important To You?

Would you like to exchange ideas with us on the subject of digital transformation and process automation without obligation? Let’s talk!