SAN FRANCISCO – At OpenAI, one of the world’s most ambitious artificial intelligence laboratories, researchers are building technology that allows you to create digital images by simply describing what you want to see.
It is called DALL-E, a nod to both "WALL-E," the 2008 animated film about an autonomous robot, and Salvador Dalí, the surrealist painter.
OpenAI, backed by $1 billion in funding from Microsoft, has not yet shared the technology with the public. But on a recent afternoon, Alex Nichol, one of the researchers behind the system, demonstrated how it works.
When he asked for "an avocado-shaped teapot," typing the phrase into a mostly blank computer screen, the system created 10 different images of a dark green avocado teapot, some with pits and some without. "DALL-E is good at avocados," Mr. Nichol said.
When he typed "cats playing chess," it placed two fluffy kittens on either side of a checkered game board, 32 chess pieces lined up between them. When he summoned "a bear playing a trumpet underwater," one image showed tiny air bubbles rising from the end of the bear's trumpet toward the surface of the water.
DALL-E can also edit photos. When Mr. Nichol erased the bear's trumpet and asked for a guitar instead, a guitar appeared between the bear's furry paws.
A team of seven researchers spent two years developing the technology, which OpenAI plans to eventually offer as a tool for people like graphic artists, providing new shortcuts and new ideas as they create and edit digital images. Computer developers are already using Copilot, a tool based on similar technology from OpenAI, to generate snippets of software code.
But for many experts, DALL-E is worrying. As this type of technology continues to improve, they say, it could help spread misinformation online, encouraging the kind of online campaigns that may have helped influence the 2016 presidential election.
"You can use it for good things, but you could certainly use it for all sorts of other crazy, worrying applications, and that includes deep fakes," such as deceptive photos and videos, said Subbarao Kambhampati, a professor of computer science at Arizona State University.
Half a decade ago, the world’s leading AI labs built systems that could identify objects in digital images and even generate images themselves, including flowers, dogs, cars and faces. A few years later, they built systems that could do the same with written language, summarizing articles, answering questions, generating tweets, and even writing blog posts.
Now, researchers are combining these technologies to create new forms of AI. DALL-E is a significant step forward because it juggles both language and images and, in some cases, understands the relationship between them.
"We can now use multiple, interconnected streams of information to create better and better technology," said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, an artificial intelligence laboratory in Seattle.
The technology is not perfect. When Mr. Nichol asked DALL-E to "put the Eiffel Tower on the moon," it did not quite grasp the idea. It placed the moon in the sky above the tower. When he asked for "a living room filled with sand," it produced a scene that looked more like a construction site than a living room.
But when Mr. Nichol adjusted his requests a little, adding or subtracting a word here or there, it produced what he wanted. When he asked for "a piano in a living room filled with sand," the image looked more like a beach inside a living room.
DALL-E is what artificial intelligence researchers call a neural network, a mathematical system loosely modeled on the network of neurons in the brain. It is the same technology that recognizes commands spoken into smartphones and identifies pedestrians as self-driving cars navigate city streets.
A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images, as well as the text captions that describe what each image depicts. In this way, it learns to recognize the links between images and words.
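The idea of learning links between words and images from captioned data can be illustrated with a deliberately tiny sketch. Everything below is invented for illustration: the real system learns far richer statistical patterns than simple co-occurrence counts, but counting which caption words appear alongside which image labels captures the basic intuition:

```python
from collections import defaultdict

def learn_associations(dataset):
    """Count how often each caption word co-occurs with each image label.

    `dataset` is a list of (labels, caption) pairs, a stand-in for
    millions of captioned images.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for labels, caption in dataset:
        for word in caption.lower().split():
            for label in labels:
                counts[word][label] += 1
    return counts

def best_label(counts, word):
    """Return the image label most strongly associated with a word."""
    labels = counts.get(word.lower())
    return max(labels, key=labels.get) if labels else None

# Toy "training data": image labels paired with captions.
data = [
    (["avocado"], "a ripe avocado"),
    (["avocado"], "avocado on toast"),
    (["cat"], "a fluffy cat"),
]
associations = learn_associations(data)
```

After "training," `best_label(associations, "avocado")` returns `"avocado"`, while an unseen word like `"teapot"` returns `None`: the sketch only knows what its data showed it, which is also why the real system's behavior depends so heavily on the images it was trained on.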
When someone describes an image to DALL-E, it generates a set of key features that the image might include. One feature might be the line along the edge of a trumpet. Another might be the curve at the tip of a teddy bear's ear.
Then a second neural network, called a diffusion model, creates the image, generating the pixels needed to realize those features. The latest version of DALL-E, unveiled on Wednesday alongside a new research paper describing the system, generates high-resolution images that in many cases look like photographs.
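The core loop of a diffusion model can be stripped down to a toy illustration: start from pure random noise and repeatedly nudge the values toward a target. This one-dimensional sketch is a hypothetical stand-in, not OpenAI's actual model, where the "target" would itself be predicted by a trained network rather than given:

```python
import random

def diffuse_then_denoise(target, steps=50, noise=1.0):
    """Toy diffusion sketch: begin with random noise, then denoise
    step by step, moving each value a little closer to `target`.

    In a real diffusion model, a trained neural network predicts the
    denoising direction at each step; here the target is simply known.
    """
    x = [random.gauss(0.0, noise) for _ in target]
    for t in range(steps):
        frac = 1.0 / (steps - t)  # denoise more aggressively near the end
        x = [xi + frac * (ti - xi) for xi, ti in zip(x, target)]
    return x
```

However noisy the starting point, the loop converges on the target by the final step, which mirrors how a diffusion model gradually refines static into a coherent image.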
Although DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often sharpen a neural network's skills by feeding it even more data.
They can also build more powerful systems by applying the same concepts to new kinds of data. The Allen Institute recently created a system that can analyze sound as well as images and text. After analyzing millions of YouTube videos, including their audio and captions, it learned to identify particular moments in TV shows and movies, like a barking dog or a closing door.
Experts believe researchers will continue to hone such systems. Ultimately, these systems could help companies improve search engines, digital assistants and other common technologies, as well as automate new tasks for graphic artists, programmers and other professionals.
But that potential comes with caveats. Artificial intelligence systems can show bias against women and people of color, in part because they learn their skills from vast collections of online text, images and other data that themselves show bias. They can be used to generate pornography, hate speech and other offensive material. And many experts believe the technology will eventually make it so easy to create misinformation that people will have to be skeptical of almost everything they see online.
"We can forge text. We can put text into someone's voice. And we can forge images and videos," Dr. Etzioni said. "There is already misinformation online, but the worry is that it will reach new levels."
OpenAI keeps a tight grip on DALL-E. It does not let outsiders use the system on their own, and it puts a watermark in the corner of every image the system generates. Although the lab plans to open the system to outside testers this week, the group will be small.
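Stamping a visible mark into the corner of an image is simple to sketch. The function name and the 2-D list representation below are assumptions for illustration; OpenAI's actual watermarking is not described in detail:

```python
def add_watermark(image, mark):
    """Overwrite the bottom-right corner of a 2-D pixel grid with `mark`.

    `image` and `mark` are lists of rows of pixel values. Returns a new
    grid, leaving the original untouched.
    """
    h, w = len(mark), len(mark[0])
    out = [row[:] for row in image]  # copy so the input is not mutated
    for i in range(h):
        for j in range(w):
            out[len(out) - h + i][len(out[0]) - w + j] = mark[i][j]
    return out

# A 4x4 blank "image" stamped with a 2x2 mark in its corner.
blank = [[0] * 4 for _ in range(4)]
stamped = add_watermark(blank, [[1, 1], [1, 1]])
```

A corner mark like this signals provenance to viewers, though unlike an invisible (steganographic) watermark it can be cropped away, one reason visible marks are only part of a provenance strategy.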
The system also includes filters that prevent users from generating images it deems inappropriate. When asked for "a sheep-headed pig," it declined to produce an image. The combination of the words "pig" and "head" most likely tripped OpenAI's anti-bullying filters, according to the lab.
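A prompt filter of the kind described can be sketched as a simple blocklist check on word combinations. The function name and the blocked pair below are invented for illustration, and OpenAI's actual filters are surely far more sophisticated than keyword matching:

```python
def is_allowed(prompt, blocked_pairs):
    """Reject a prompt if it contains every word of any blocked pair.

    `blocked_pairs` is a set of frozensets of words; a prompt is blocked
    when all words of some pair appear in it together.
    """
    words = set(prompt.lower().split())
    return not any(pair <= words for pair in blocked_pairs)

# Hypothetical blocklist echoing the article's example.
blocked = {frozenset({"pig", "head"})}
```

With this blocklist, `is_allowed("an avocado-shaped teapot", blocked)` passes, while `is_allowed("a pig with the head of a sheep", blocked)` is rejected. The example also shows the weakness of such filters: benign prompts that happen to combine blocked words get refused too, which is consistent with the lab's explanation of the "sheep-headed pig" refusal.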
"This is not a product," said Mira Murati, OpenAI's head of research. "The idea is to understand capabilities and limitations and give ourselves the opportunity to build in mitigations."
OpenAI can control the system's behavior to some extent. But others around the world may soon create similar technology that puts the same powers in the hands of almost anyone. Working from a research paper describing an early version of DALL-E, Boris Dayma, an independent researcher in Houston, has already built and released a simpler version of the technology.
“People need to know that the pictures they see may not be real,” he said.