Multi-view 3D Object Reconstruction using Deep Neural Networks
Chirav Dave
Kirkland, Washington
Integrated a ROS-enabled 3D Recurrent Reconstruction Neural Network (3D-R2N2) in Theano to generate the 3D shape of an object from 2D images and detected grasping poses on it.
Project status: Concept
Groups
Student Developers for AI
Intel Technologies
Intel CPU
Overview / Usage
This project integrates 3D object reconstruction and grasp pose detection, two challenging problems involving perception, planning, and control. Whether for object reconstruction or grasp planning, both traditionally require a tremendous amount of feature engineering. Most state-of-the-art methods for 3D object reconstruction are subject to a number of restrictions: objects must be observed from a dense set of views, and those views must have a relatively small baseline. This is an issue when users wish to reconstruct an object from just a handful of views, or ideally just one.
In grasp planning, humans can grasp and manipulate objects with great ease, but asking a robot with a multi-fingered hand to perform even a simple task is not trivial, because it involves kinematics, motion planning, force-closure grasping, optimization of grasp forces, finger gaits, and so on. In addition, the large number of degrees of freedom (DOF) of a robotic hand creates a huge space of possible hand configurations.
Recently, however, deep learning has demonstrated state-of-the-art performance in a wide variety of tasks, including visual recognition, audio recognition, and natural language processing. These techniques are especially powerful because they are capable of learning useful features directly from both unlabeled and labeled data, avoiding the need for hand-engineering. The 3D-R2N2 network outperforms state-of-the-art methods for single-view reconstruction and enables 3D reconstruction of objects in situations where traditional SfM/SLAM methods fail. The GPD framework infers an optimal grasp for a given object, one that maximizes the chance of successfully grasping it. My goal in this project is to develop an end-to-end system for 3D object reconstruction and grasp pose detection.
Methodology / Approach
My framework is divided into two stages. In the first stage, I collect live images of the object for which we want to detect grasping positions from multiple views using a Kinect sensor. I then feed these images as input to the 3D-R2N2 network, which generates a reconstructed 3D shape of the object in the form of a 3D occupancy grid, as sketched below.
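As a rough sketch of this capture step, the Kinect frames can be collected over ROS and stacked into the multi-view tensor layout the network consumes. The topic name, view count, sleep interval, and 127x127 input size here are assumptions (based on common openni/freenect setups and the public 3D-R2N2 demo), not fixed details of this project:

#!/usr/bin/env python
# Sketch of stage 1: grab a few RGB views from the Kinect over ROS and
# stack them into a (views, batch, channels, H, W) tensor for 3D-R2N2.
import cv2
import numpy as np
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

N_VIEWS = 5                            # number of views to collect (assumed)
IMG_TOPIC = "/camera/rgb/image_color"  # Kinect RGB topic (assumed)

def collect_views(n_views=N_VIEWS, size=(127, 127)):
    """Block until n_views frames arrive, then resize and normalize them."""
    bridge = CvBridge()
    views = []
    for _ in range(n_views):
        msg = rospy.wait_for_message(IMG_TOPIC, Image)
        rgb = bridge.imgmsg_to_cv2(msg, desired_encoding="rgb8")
        rgb = cv2.resize(rgb, size).astype(np.float32) / 255.0
        views.append(rgb.transpose(2, 0, 1))  # HWC -> CHW
        rospy.sleep(1.0)  # time to move the camera/object between shots
    # Shape (n_views, batch=1, 3, H, W): a multi-view input batch.
    return np.stack(views)[:, np.newaxis, ...]

if __name__ == "__main__":
    rospy.init_node("view_collector")
    x = collect_views()
    np.save("views.npy", x)  # handed to the 3D-R2N2 solver downstream

The saved tensor would then be passed to the Theano 3D-R2N2 solver, whose prediction is a grid of per-voxel occupancy probabilities.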
In the final stage, I convert this occupancy grid into a point cloud format so it can be visualized in RViz, and then run a ROS library called Grasp Pose Detection (GPD) on this point cloud to obtain the grasping positions (see the sketch after this paragraph).
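A minimal sketch of the conversion and publishing step, assuming a cubic grid of per-voxel occupancy probabilities from the network; the voxel size, probability threshold, and topic/frame names are illustrative assumptions, not taken from the project:

#!/usr/bin/env python
# Sketch of stage 2: convert a voxel occupancy grid into a PointCloud2
# message that RViz can display and GPD can consume.
import numpy as np
import rospy
from std_msgs.msg import Header
from sensor_msgs.msg import PointCloud2
from sensor_msgs import point_cloud2

VOXEL_SIZE = 0.01  # metres per voxel edge (assumed)
THRESHOLD = 0.4    # occupancy-probability cutoff (assumed)

def grid_to_points(voxel_probs, voxel_size=VOXEL_SIZE, threshold=THRESHOLD):
    """Turn an (N, N, N) occupancy-probability grid into XYZ points."""
    occupied = np.argwhere(voxel_probs > threshold)    # indices of filled voxels
    return (occupied * voxel_size).astype(np.float32)  # scale to metric coords

def publish_cloud(points, topic="/reconstructed_cloud", frame="camera_link"):
    """Publish the points as a latched PointCloud2 on the given topic."""
    pub = rospy.Publisher(topic, PointCloud2, queue_size=1, latch=True)
    header = Header(stamp=rospy.Time.now(), frame_id=frame)
    cloud = point_cloud2.create_cloud_xyz32(header, points.tolist())
    pub.publish(cloud)

if __name__ == "__main__":
    rospy.init_node("grid_to_cloud")
    # Placeholder grid: in the real pipeline this comes from 3D-R2N2.
    probs = np.load("voxel_probs.npy")  # hypothetical saved network output
    publish_cloud(grid_to_points(probs))
    rospy.spin()  # keep the node alive so the latched cloud stays available

GPD's ROS node can then be pointed at this cloud topic (the exact topic remapping depends on the GPD launch configuration), while RViz displays the same cloud for visual inspection.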
Technologies Used
Technology Stack: Python, NumPy, Theano, ROS, Convolutional Neural Networks, Recurrent Neural Networks
Repository
https://github.com/chiravdave/Projects-Papers/blob/master/3D_ObjectReconstruction_Grasping.pdf