stereo matching

yinfei pan

Stereo vision methods are widely used in photogrammetry, autonomous driving, industrial inspection, virtual reality, and other fields. Stereo matching, a key step in stereo vision, strongly affects the accuracy and speed of the measurement results. The heterogeneous acceleration technology base ...

Project status: Under Development

oneAPI

Intel Technologies
DevCloud, DPC++, Intel Integrated Graphics

Overview / Usage

This demo project implements local stereo matching based on the Sum of Absolute Differences (SAD) in three forms: the original C++ code for the host, a Basic kernel, and an ND-Range kernel for accelerators, especially GPUs. Comparing the three shows how much each level of optimization effort improves the performance of the algorithm.
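As a point of reference, SAD-based local matching on the host can be sketched as below. This is a minimal illustration, not the project's code: the GrayImage type, the function names, the border clamping, and the convention that a left pixel at x matches a right pixel at x - d are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Row-major 8-bit grayscale image (hypothetical helper type).
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // size == width * height

    // Clamp coordinates to the border so windows near the edge stay valid.
    std::uint8_t at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// SAD cost between the window centred on (x, y) in the left image and the
// window centred on (x - d, y) in the right image.
int sad_cost(const GrayImage& left, const GrayImage& right,
             int x, int y, int d, int radius) {
    int cost = 0;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            cost += std::abs(static_cast<int>(left.at(x + dx, y + dy)) -
                             static_cast<int>(right.at(x + dx - d, y + dy)));
        }
    }
    return cost;
}

// For every pixel, keep the disparity in [0, max_disparity) with the lowest
// SAD cost. Ties go to the smaller disparity.
std::vector<int> sad_disparity(const GrayImage& left, const GrayImage& right,
                               int max_disparity, int radius) {
    assert(left.width == right.width && left.height == right.height);
    std::vector<int> disparity(
        static_cast<std::size_t>(left.width) * left.height, 0);
    for (int y = 0; y < left.height; ++y) {
        for (int x = 0; x < left.width; ++x) {
            int best_cost = std::numeric_limits<int>::max();
            int best_d = 0;
            for (int d = 0; d < max_disparity; ++d) {
                const int cost = sad_cost(left, right, x, y, d, radius);
                if (cost < best_cost) {
                    best_cost = cost;
                    best_d = d;
                }
            }
            disparity[static_cast<std::size_t>(y) * left.width + x] = best_d;
        }
    }
    return disparity;
}
```

The two outer loops over y and x are independent of each other, which is what makes this kind of code a natural candidate for one work-item per pixel on an accelerator.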

Inspiration

I had just finished an FPGA design of the stereo matching algorithm and studied the existing CUDA code, and found that almost nothing could be reused. Since I had already been following the progress of oneAPI, I took "The Great Cross Architecture Challenge" as an opportunity to learn more about writing DPC++ code and using the oneAPI toolset.

Usage

The complete implementation is available on GitHub. In brief:

1. Clone the stereo-matching-dpc project and compile it.

2. Prepare left and right images that have already been stereo rectified, or use the test images in the res folder.

(1) The two images can be color or grayscale; by default they are converted to grayscale before processing.

(2) The two images must have the same dimensions, and the left and right images must not be swapped.

3. Run the application, specifying the paths of the left image, the right image, and the result image.
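The input requirements in step 2 can be illustrated with a small sketch. It assumes interleaved 8-bit RGB input and integer BT.601-style luminance weights; the project may load and convert images differently, and the names to_gray and same_size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved 8-bit RGB to grayscale with integer weights
// (77 + 150 + 29 = 256, so the shift by 8 divides by the weight sum).
std::vector<std::uint8_t> to_gray(const std::vector<std::uint8_t>& rgb,
                                  int width, int height) {
    const std::size_t count = static_cast<std::size_t>(width) * height;
    assert(rgb.size() == count * 3);
    std::vector<std::uint8_t> gray(count);
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned r = rgb[3 * i];
        const unsigned g = rgb[3 * i + 1];
        const unsigned b = rgb[3 * i + 2];
        gray[i] = static_cast<std::uint8_t>((77u * r + 150u * g + 29u * b) >> 8);
    }
    return gray;
}

// Both inputs must have identical dimensions before matching starts.
bool same_size(int left_w, int left_h, int right_w, int right_h) {
    return left_w == right_w && left_h == right_h;
}
```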

Methodology / Approach

The stereo-matching-dpc project reads the images, runs and evaluates the ND-Range and Basic kernels on the GPU, compares their results with the CPU result, and saves the result image to the specified path. All of these steps are launched from main.cpp.

If PERF_NUM is enabled, the ND-Range kernel and the Basic kernel are each run 5 times, and the average running time is calculated.
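The averaging described above might look like the following sketch. The helper name average_ms and its signature are assumptions for illustration; how the project actually times its kernels under PERF_NUM is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run `work` `runs` times and return the mean wall-clock time per run
// in milliseconds.
double average_ms(const std::function<void()>& work, int runs = 5) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        total_ms +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

When timing an accelerator kernel this way, the callable has to block until the kernel has finished (for example by waiting on the queue), otherwise only the submission time is measured.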

Development Process

The Basic kernel is adapted almost directly from the original C code. It uses DPC++ data types such as buffer and accessor, so the movement of data between the host and the accelerator needs no explicit attention. Its parallelization is very direct, and the originally written C functions can even be reused as they are.

Writing the ND-Range kernel is more complicated. The counterparts of CUDA's thread grid and thread block work a little differently, which may confuse engineers familiar with CUDA code. CUDA code only needs to focus on the work of one thread block, whereas in DPC++ the boundary between global and local is relatively blurred; in return, DPC++ provides a richer API for obtaining global and local indexes.

Explicit data handling and sensible use of work-group local memory for data reuse can yield considerable performance improvements. Local memory does not always pay off, however: the benefit depends on whether the data is reused often enough.
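A rough way to judge whether a local-memory tile is worthwhile is to count global-memory reads per work-group for one image and one disparity. The model below is a simplified sketch under stated assumptions: one work-item per pixel, a square matching window of the given radius, no hardware caching, and a tile that covers the group plus a halo. The names are hypothetical.

```cpp
#include <cassert>

// Global-memory reads issued by one work-group, with and without a tile.
struct GroupLoads {
    long long direct;  // every work-item reads its whole window from global memory
    long long tiled;   // the group first copies the tile plus halo once
};

GroupLoads global_loads_per_group(int group_w, int group_h, int radius) {
    assert(group_w > 0 && group_h > 0 && radius >= 0);
    const long long side = 2LL * radius + 1;
    const long long direct =
        static_cast<long long>(group_w) * group_h * side * side;
    const long long tiled =
        (group_w + 2LL * radius) * (group_h + 2LL * radius);
    return {direct, tiled};
}
```

For a 16 x 16 work-group and a 9 x 9 window (radius 4), the model gives 20736 direct reads against 576 tiled reads, so each loaded value is reused many times. With radius 0 both counts are 256, and the extra copy into local memory would only add overhead.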

It is worth always calling the what() function of sycl::exception. Its message helps debug errors raised while a kernel is launched or running, such as a work-group size that does not divide the global range evenly, or a work-group size that exceeds the maximum allowed by the device.
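Both mistakes mentioned above can also be caught on the host before a kernel is submitted. The following is a plain C++ sketch, not SYCL API: round_up and check_nd_range are hypothetical helpers, and max_local stands for the limit reported by the device (in SYCL this would come from the device's max_work_group_size query).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Round `global` up to the next multiple of `local`, so the padded global
// range divides evenly into work-groups.
std::size_t round_up(std::size_t global, std::size_t local) {
    assert(local != 0);
    return (global + local - 1) / local * local;
}

// Throw with a descriptive message if the sizes would be rejected.
void check_nd_range(std::size_t global, std::size_t local,
                    std::size_t max_local) {
    if (local == 0 || local > max_local) {
        throw std::invalid_argument(
            "work-group size " + std::to_string(local) +
            " is outside the device limit " + std::to_string(max_local));
    }
    if (global % local != 0) {
        throw std::invalid_argument(
            "global range " + std::to_string(global) +
            " is not a multiple of work-group size " + std::to_string(local));
    }
}
```

When the global range is padded with round_up, the kernel needs a bounds check so that the extra work-items do not touch pixels outside the image.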

Intel Advisor, part of the oneAPI toolset, helps analyze performance bottlenecks in the code, such as the longest-running part and the proportion of active threads on the GPU, so that optimization can be targeted.

Stereo matching needs to find, for each pixel, the disparity with the lowest cost. The reduction library of DPC++ currently does not support multiple reduction variables and cannot compute multiple minimum values at the same time, so we chose to parallelize over other dimensions instead.
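One way to read this choice is that the search over disparities stays sequential while the pixels become the parallel dimension. The sketch below shows that structure on the host; the cost-volume layout (all disparities of a pixel stored contiguously) and the function name are assumptions for illustration, not the project's data layout.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// For each pixel, return the disparity index with the lowest cost.
// cost_volume holds `disparities` consecutive costs per pixel.
std::vector<int> argmin_per_pixel(const std::vector<int>& cost_volume,
                                  std::size_t pixels,
                                  std::size_t disparities) {
    assert(cost_volume.size() == pixels * disparities);
    std::vector<int> best(pixels, 0);
    // Iterations of this loop are independent: on an accelerator each one
    // could be handled by its own work-item.
    for (std::size_t p = 0; p < pixels; ++p) {
        int best_cost = std::numeric_limits<int>::max();
        // The minimum search itself stays sequential, so no reduction
        // across work-items is needed. Ties go to the smaller disparity.
        for (std::size_t d = 0; d < disparities; ++d) {
            const int cost = cost_volume[p * disparities + d];
            if (cost < best_cost) {
                best_cost = cost;
                best[p] = static_cast<int>(d);
            }
        }
    }
    return best;
}
```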

Technologies Used

  • Visual Studio Community 2017
  • Intel Advisor
  • DevCloud
  • DPC++

Repository

https://github.com/silverfly1992/stereo-matching-dpc
