OpenAI, the San Francisco-based company best known for its massive GPT-3 natural language model, announced on Wednesday the release of a second version of its text-to-image AI model.
Like its predecessor, the new DALL-E 2 is a neural network that creates images based on natural language sentences entered by the user. But while images from the original DALL-E were low resolution and conceptually basic, images generated by DALL-E 2 are five times more realistic and accurate, OpenAI researchers told Fast Company. The second DALL-E is also, in fact, a smaller neural network. (OpenAI declined to specify DALL-E 2's size in parameters.)
DALL-E 2 is also a multimodal neural network, meaning it is able to process both natural language and visual images. You can show the model two different images, for example, and ask it to create images that combine aspects of the source images in different ways.
And the creativity the system seems to display when doing so is, well, a bit uncanny. During a demonstration on Monday, DALL-E 2 received two images, one resembling street art, the other something like art deco. It quickly created a set of twenty images arranged in a grid, each different from its neighbor. The system combined various visual aspects of the source images in several ways. In some cases, it seemed to let the dominant style of one source image fully express itself while suppressing the style of the other. Taken together, the new images had a design language distinct from that of the source images.
“It’s really fascinating to see these images being generated with math,” says OpenAI algorithm researcher Prafulla Dhariwal. “And it’s very beautiful.”
OpenAI engineers have been careful to explain the steps they take to prevent the model from creating offensive or harmful images. They removed any images containing nudity, violence, or gore from the training dataset, said OpenAI researcher Mark Chen. With those images removed, Chen says it's "extremely unlikely" that DALL-E 2 will accidentally produce such things. Humans at OpenAI will also monitor images created by users with DALL-E 2. "Adult, violent or political content will not be allowed on the platform," Chen said.
OpenAI says it plans to gradually roll out access to the new model to "trusted" user groups. "Eventually, we hope to offer access to DALL-E 2 via an API [application programming interface]," said Dhariwal. Developers will then be able to create their own applications based on the AI model.
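OpenAI had not published details of that API at the time of writing, so as a purely illustrative sketch, a developer-facing text-to-image request might be assembled something like this. The endpoint URL, field names (`prompt`, `n`, `size`), and helper functions here are all assumptions for illustration, not OpenAI's actual interface:

```python
import json

# Hypothetical sketch: endpoint and request fields are assumptions,
# not OpenAI's real DALL-E 2 API.
API_URL = "https://api.example.com/v1/images/generations"  # placeholder URL


def build_generation_request(prompt, n_images=1, size="1024x1024"):
    """Assemble the JSON body a text-to-image request might carry."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    return {
        "prompt": prompt,   # natural-language description of the desired image
        "n": n_images,      # how many candidate images to generate
        "size": size,       # requested output resolution
    }


def prepare_request(payload, api_key):
    """Build headers and serialize the body; actually POSTing is left to the caller."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return headers, json.dumps(payload)


payload = build_generation_request(
    "a mural combining street art and art deco styles", n_images=4
)
headers, body = prepare_request(payload, api_key="sk-hypothetical")
```

A real client would then POST `body` with `headers` to the provider's endpoint and download the returned image URLs.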
As for practical applications of the model, both Dhariwal and Chen expect DALL-E 2 to be useful for graphic designers, who could use the tool to open up new creative avenues. And developers who eventually access DALL-E 2 through the API will likely find new and fresh applications for the technology.
Chen says DALL-E 2 could be an important tool because while creating language seems natural to human beings, creating images isn’t as easy.
But even without an immediate practical application, DALL-E 2 is worth building. As a multimodal AI, it has basic research value that could benefit other AI systems for years to come.
“Vision and language are both key elements of human intelligence; building models like DALL-E 2 bridges these two areas,” says Dhariwal. “This is a very important step for us as we try to teach machines to perceive the world the way humans do and then eventually develop general intelligence.”