Google’s TensorFlow deep learning platform has added differentiable graphics layers through TensorFlow Graphics, a library that combines computer graphics with computer vision. Google says TensorFlow Graphics can address the data labeling problem in complex 3D vision tasks by enabling a self-supervised training approach.
A computer graphics pipeline typically requires a representation of the 3D objects and their positions in the scene, along with descriptions of the materials, lights, and camera. A renderer then uses this scene description to generate a synthetic image. In contrast, computer vision models take images as input and infer the scene parameters, predicting the objects in the scene along with their materials, positions, and orientations in three dimensions.
TensorFlow Graphics couples the two pipelines in a loop: the vision system extracts the scene parameters, and the graphics system renders an image from them. If the rendered image matches the original, the vision system has extracted the parameters accurately. In this setup, computer vision and computer graphics work together like an autoencoder, forming a machine learning system that can be trained in a self-supervised way.
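The analysis-by-synthesis loop above can be sketched in miniature. This is a hypothetical one-parameter example, not TensorFlow Graphics code: the "scene" is just a horizontal spot position on a 1-D image, the "renderer" draws a Gaussian blob there, and a finite-difference gradient stands in for the analytic gradients that differentiable rendering ops would supply.

```python
import math

# Hypothetical 1-parameter "scene": a renderer that draws a soft bright
# spot at horizontal position x on a 1-D image of width 16.
def render(x, width=16):
    return [math.exp(-((i - x) ** 2) / 8.0) for i in range(width)]

# Pixel-wise squared error between a rendered and a target image.
def loss(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target))

# Target image produced with the (unknown) true parameter x = 10.
target = render(10.0)

# "Vision system": gradient descent on the scene parameter, here with a
# finite-difference gradient; a differentiable renderer would provide
# this gradient analytically and for thousands of parameters at once.
x, lr, eps = 7.0, 1.0, 1e-4
for _ in range(300):
    g = (loss(render(x + eps), target) - loss(render(x - eps), target)) / (2 * eps)
    x -= lr * g

print(round(x, 2))  # converges toward the true position 10.0
```

No labels were needed: the only supervision signal is the mismatch between the rendered and the observed image, which is exactly the self-supervised setup described above.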
Below are some major features of TensorFlow Graphics.
To let users debug visually, TensorFlow Graphics adds a TensorBoard plugin that interactively displays 3D meshes and point clouds.
TensorFlow Graphics features
TensorFlow Graphics provides Colab notebooks covering rendering operations including object pose estimation, interpolation, object materials, lighting, non-rigid surface deformation, spherical harmonics, and mesh convolutions. The notebooks are ordered by difficulty.
Object pose estimation plays a key role in many applications, especially robotics. For example, a robotic arm grasping objects requires an accurate estimate of the pose of those objects relative to the arm. The example below shows how a neural network is trained to predict the rotation and translation of an observed object.
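As a toy illustration of the rotation-plus-translation model such a network learns to invert, the sketch below recovers a 2-D pose in closed form (2-D Procrustes alignment). This is an assumption-laden stand-in for the 3-D, learned version in the Colab notebook, not code from it.

```python
import math

# Apply a 2-D rigid pose (rotation theta, translation t) to points.
def transform(points, theta, t):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

# Object model points (centred at the origin) and one observation of the
# same object under an unknown pose.
model = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
observed = transform(model, theta=0.6, t=(2.0, -1.0))

# The translation is the observed centroid (the model is centred); the
# rotation follows from the centred correspondences.
tx = sum(p[0] for p in observed) / len(observed)
ty = sum(p[1] for p in observed) / len(observed)
centred = [(x - tx, y - ty) for x, y in observed]
sc = sum(mx * ox + my * oy for (mx, my), (ox, oy) in zip(model, centred))
ss = sum(mx * oy - my * ox for (mx, my), (ox, oy) in zip(model, centred))
theta_hat = math.atan2(ss, sc)

print(round(theta_hat, 3), (round(tx, 3), round(ty, 3)))  # recovers 0.6 and (2.0, -1.0)
```

A robotic-arm controller would consume exactly this kind of output: a rotation and translation of the object relative to the camera or gripper frame.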
Camera models play a fundamental role in computer vision and strongly affect how 3D objects appear when projected onto the image plane. The cube below appears to zoom in and out, but these changes are in fact due solely to changes in the camera’s focal length.
Reflectance defines how light interacts with an object and thus determines the object’s appearance. Accurately predicting material properties lays the foundation for many tasks.
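The simplest reflectance model of this kind is Lambertian (purely diffuse) shading, where outgoing intensity is the albedo times the cosine of the angle between the surface normal and the light direction. A minimal sketch, with the albedo and directions chosen arbitrarily for illustration:

```python
# Lambertian reflectance: intensity = albedo * max(0, n . l), where n is
# the unit surface normal and l the unit direction toward the light.
def lambertian(albedo, normal, light):
    ndotl = sum(n * l for n, l in zip(normal, light))
    return albedo * max(0.0, ndotl)

n = (0.0, 0.0, 1.0)                          # surface facing the camera
print(lambertian(0.8, n, (0.0, 0.0, 1.0)))   # light head-on -> 0.8
print(lambertian(0.8, n, (0.0, 0.0, -1.0)))  # light behind the surface -> 0.0
```

In the inverse direction, a vision system that observes the shaded intensities can recover material parameters such as the albedo, which is the "predicting material properties" task the text refers to.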
TensorFlow Graphics also offers advanced features such as spherical harmonic rendering, environment map optimization, semantic mesh segmentation, and more.
The global market for 3D sensors, such as smartphone-embedded depth sensors and self-driving-car LiDARs, is growing rapidly. These sensors produce 3D data in the form of point clouds or meshes. Because of their irregular structure, these representations are much harder for neural networks to convolve over than regular grids. TensorFlow Graphics therefore provides mesh-segmentation support, offering two 3D convolution layers and a 3D pooling layer that let a network be trained for semantic classification of object parts on a mesh.
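The core idea of convolving over an irregular mesh can be sketched as follows. This is an illustrative graph-convolution step with fixed weights, not the TensorFlow Graphics layers themselves, which learn their weights and operate on batched tensors:

```python
# One graph-convolution step on an irregular mesh: each vertex feature is
# replaced by a weighted combination of itself and the mean of its
# neighbours' features. Unlike an image convolution, each vertex may have
# a different number of neighbours, which is why a fixed grid kernel
# cannot be used directly.
def mesh_conv(features, neighbors, w_self=0.5, w_neigh=0.5):
    out = []
    for v, feat in enumerate(features):
        nbrs = neighbors[v]
        mean = sum(features[n] for n in nbrs) / len(nbrs)
        out.append(w_self * feat + w_neigh * mean)
    return out

# A tiny 4-vertex mesh given as a vertex adjacency list, with one scalar
# feature per vertex.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
features = [1.0, 0.0, 0.0, 0.0]
print(mesh_conv(features, neighbors))  # [0.5, 0.25, 0.25, 0.0]
```

Stacking such layers propagates information across the mesh surface, and a final per-vertex classifier then assigns each vertex to a semantic part.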
TensorFlow Graphics is compatible with TensorFlow 1.13.1 and higher. See the project’s official documentation for API details and installation instructions.
Author: Yuqing Li | Editor: Michel Sarazen