Researchers brought KITT to life using AI and Nvidia Omniverse.
Developed by the Nvidia AI Research Lab in Toronto, the GANverse3D app inflates flat images into realistic 3D models that can be viewed and controlled in virtual environments. This capability could help architects, creators, game developers, and designers easily add new objects to their models without the need for 3D modeling expertise or a large budget to spend on renderings.
A single photo of a car, for example, could be turned into a 3D model that can drive around a virtual scene, complete with realistic headlights, taillights and turn signals.
To generate a dataset for training, the researchers leveraged a Generative Adversarial Network, or GAN, to synthesize images depicting the same object from multiple viewpoints — such as a photographer walking around a parked vehicle, taking pictures from different angles. These multi-view images were connected to a rendering framework for inverse graphics, the process of inferring 3D mesh models from 2D images.
When trained on multi-view images, GANverse3D only needs a single 2D image to predict a 3D mesh model. This model can be used with a 3D neural renderer that allows developers to control object customization and override backgrounds.
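The core idea — infer 3D structure by requiring consistency with many 2D views — can be illustrated with a toy NumPy sketch. Everything here (the `rotation_y` camera model, orthographic projection, the least-squares solve) is invented for illustration; the actual GANverse3D pipeline trains a neural network with a differentiable renderer to predict a textured mesh, not a closed-form solve.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotation_y(theta):
    # Camera circling the object: rotation about the vertical axis.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Ground-truth 3D vertices of a toy object (what we want to recover).
vertices = rng.normal(size=(5, 3))

# "Multi-view dataset": 2D projections from six viewpoints, standing in
# for the GAN-synthesized walk-around photos described above.
thetas = np.linspace(0.0, np.pi, 6, endpoint=False)
views = [(rotation_y(t)[:2] @ vertices.T).T for t in thetas]  # drop depth

# Inverse graphics as a fitting problem: find 3D coordinates whose
# projections match every 2D view at once (least squares over all views).
A = np.vstack([rotation_y(t)[:2] for t in thetas])            # (12, 3)
recovered = np.empty_like(vertices)
for i in range(vertices.shape[0]):
    b = np.concatenate([view[i] for view in views])           # (12,)
    recovered[i] = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.allclose(recovered, vertices))  # True
```

A single view here would be underdetermined (one 2×3 projection cannot pin down a 3D point), which is why the multi-view images matter for training — and why, once trained, a network can amortize this inference and predict 3D from a single photo.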
When imported as an extension into the Nvidia Omniverse platform and run on Nvidia RTX GPUs, GANverse3D can be used to recreate any 2D image in 3D — like the beloved crime-fighting car KITT from the popular 1980s TV show Knight Rider.
Previous models for inverse graphics relied on 3D shapes as training data.
Instead, without the help of 3D assets, “We turned a GAN model into a very efficient data generator so we could create 3D objects from any 2D image on the web,” said Wenzheng Chen, Nvidia researcher and lead author of the project.
“Because we trained on real images instead of the typical pipeline, which relies on synthetic data, the AI model generalizes better to real-world applications,” added Jun Gao, an Nvidia researcher and co-author of the project.
The research behind GANverse3D will be presented at two upcoming conferences: the International Conference on Learning Representations in May and the Conference on Computer Vision and Pattern Recognition in June.
Game, architecture, and design creators rely on virtual environments like the Nvidia Omniverse simulation and collaboration platform to test new ideas and visualize prototypes before creating their final products. With Omniverse Connectors, developers can use their favorite 3D applications in Omniverse to simulate complex virtual worlds with real-time ray tracing.
But not all designers have the time and resources to create 3D models of every object they design. The cost of capturing enough multi-view images to render a showroom’s worth of cars or a street’s worth of buildings can be prohibitive.
This is where a trained GANverse3D application can be used to convert standard images of a car, building, or even a horse into a 3D figure that can be customized and animated in Omniverse.
To recreate KITT, the researchers simply fed the trained model with an image of the car, letting GANverse3D predict a corresponding 3D textured mesh, as well as different parts of the vehicle such as wheels and headlights. They then used Nvidia Omniverse Kit and Nvidia PhysX to convert the predicted texture into high-quality materials that give KITT a more realistic look and feel, and placed him in a dynamic driving sequence alongside other cars.
“Omniverse enables researchers to bring exciting, cutting-edge research directly to creators and end users,” said Jean-François Laflèche, deep learning engineer at Nvidia.
As an Omniverse extension, GANverse3D will help artists create richer virtual worlds for game development, city planning, or even training new machine learning models.
GANs Power Dimensional Shift
Since real-world datasets that capture the same object from different angles are rare, most AI tools that convert images from 2D to 3D are trained using synthetic 3D datasets like ShapeNet.
To get multi-view images from real-world data — like images of cars publicly available on the web — Nvidia researchers instead turned to a GAN model, manipulating its neural network layers into a data generator.
The team found that varying the first four layers of the neural network while freezing the other 12 caused the GAN to render images of the same object from different viewpoints.
Conversely, keeping the first four layers fixed and varying the other 12 made the network generate different objects from the same viewpoint. By manually assigning standard viewpoints, with vehicles photographed at a specific elevation and camera distance, the researchers were able to quickly generate a multi-view dataset from individual 2D images.
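The layer split can be mimicked in a few lines. In this hedged NumPy toy, the generator `generate`, its weights `W`, and the 16-layer split are all invented stand-ins (not the real GAN architecture); the first four per-layer latents act as a “viewpoint” code and the remaining twelve as an “identity” code:

```python
import numpy as np

rng = np.random.default_rng(1)

N_LAYERS, VIEW_LAYERS, DIM = 16, 4, 8

# Hypothetical layer-wise generator: one latent code per layer, each mixed
# in through its own weight matrix. A real GAN generator is far deeper;
# only the viewpoint/identity split matters for this sketch.
W = rng.normal(size=(N_LAYERS, DIM, DIM))

def generate(latents):
    # latents: (N_LAYERS, DIM) -> toy "image" of shape (DIM,)
    return sum(latents[k] @ W[k] for k in range(N_LAYERS))

identity = rng.normal(size=(N_LAYERS, DIM))   # latent codes for one object

# Multi-view synthesis: resample only the first four (viewpoint) layers
# while freezing the other twelve, so the object's identity is preserved.
multi_view = []
for _ in range(6):
    z = identity.copy()
    z[:VIEW_LAYERS] = rng.normal(size=(VIEW_LAYERS, DIM))  # new viewpoint
    multi_view.append(generate(z))

# The frozen layers contribute identically to every image in multi_view;
# only the viewpoint layers change between them.
```

Swapping which half is resampled reverses the effect: fixing the first four layers and varying the rest yields different objects rendered from one shared viewpoint.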
The final model, trained on 55,000 GAN-generated car images, outperformed an inverse graphics network trained on the popular Pascal3D dataset.
The ICLR paper was authored by Wenzheng Chen; fellow Nvidia researchers Jun Gao and Huan Ling; Sanja Fidler, director of Nvidia’s Toronto research lab; University of Waterloo student Yuxuan Zhang; Stanford student Yinan Zhang; and MIT professor Antonio Torralba. Other contributors to the CVPR paper include Jean-François Laflèche, Nvidia researcher Kangxue Yin, and Adela Barriuso.
The Nvidia Research team consists of more than 200 scientists from around the world, focusing on areas such as AI, computer vision, self-driving cars, robotics and graphics.
Knight Rider ©1982 Universal Television Enterprises, Inc. Courtesy of Universal Studios Licensing LLC.