OpenCV 2.4.8 StereoSGBM method, full variant (2 passes). Reimplementation of H. Hirschmüller's SGM method (CVPR 2006; PAMI 2008).
OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
SAD window: 3x3 pixel
Truncation value for pre-filter: 63
P1/P2: 8*3*3*3/32*3*3*3
Uniqueness ratio: 10
Speckle window size: 100
Speckle range: 32
Full DP: true
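The P1/P2 entry matches the convention suggested in OpenCV's StereoSGBM documentation, where both smoothness penalties scale with the number of image channels and the squared SAD window size. A minimal sketch of that arithmetic, assuming 3-channel input and reading the factors as 8 * channels * window * window; the variable and key names are illustrative, not OpenCV identifiers:

```python
# Hypothetical names; only the numeric values come from the parameter list above.
channels = 3      # assumption: colour input, which would explain the first factor of 3
sad_window = 3    # "SAD window: 3x3 pixel"

p1 = 8 * channels * sad_window * sad_window    # penalty for a disparity change of 1
p2 = 32 * channels * sad_window * sad_window   # penalty for larger disparity jumps

sgbm_settings = {
    "sad_window": sad_window,
    "pre_filter_cap": 63,
    "p1": p1,
    "p2": p2,
    "uniqueness_ratio": 10,
    "speckle_window_size": 100,
    "speckle_range": 32,
    "full_dp": True,   # the 2-pass, 8-direction variant described above
}
print(p1, p2)
```

P2 being four times P1 means that a large disparity jump between neighbouring pixels is penalised much more strongly than a one-level change.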
C/C++; 1 core, i7@3.3 GHz
07/25/14 | 1 | SGBM2 | Q | 1 | 26.4 (74) | 27.9 (68) | 12.1 (79) | 17.8 (87) | 13.7 (54) | 74.5 (113) | 14.0 (64) | 30.3 (69) | 26.3 (74) | 11.0 (79) | 64.4 (113) | 37.9 (83) | 25.8 (72) | 25.3 (72) | 29.3 (75) | 43.7 (77)
OpenCV 2.4.8 StereoSGBM method, single-pass variant. Reimplementation and modification of H. Hirschmüller's SGM method (CVPR 2006; PAMI 2008).
OpenCV's "semi-global block matching" method; memory efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
SAD window: 3x3 pixel
Truncation value for pre-filter: 63
P1/P2: 8*3*3*3/32*3*3*3
Uniqueness ratio: 10
Speckle window size: 100
Speckle range: 32
Full DP: false
The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.
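The Census/Hamming matching cost described above can be sketched in a few lines. This is an illustration only: the 3x3 window, the toy images, and the function names are assumptions, and the entry does not state its actual window size.

```python
def census_transform(img, half=1):
    """Encode each pixel as a bit string: one bit per neighbour in the window,
    set when the neighbour is darker than the centre. Border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            bits = 0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    if dy == 0 and dx == 0:
                        continue
                    bits = (bits << 1) | int(img[y + dy][x + dx] < img[y][x])
            out[y][x] = bits
    return out


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


# Toy stereo pair: the right image is the left image shifted by one pixel.
left = [
    [10, 20, 30, 90, 40, 50],
    [15, 25, 35, 80, 45, 55],
    [12, 22, 32, 70, 42, 52],
]
right = [row[1:] + row[-1:] for row in left]

census_left = census_transform(left)
census_right = census_transform(right)

# Pixelwise cost for left pixel (y=1, x=3) at disparities 0 and 1.
cost_d0 = hamming(census_left[1][3], census_right[1][3])
cost_d1 = hamming(census_left[1][3], census_right[1][2])
print(cost_d0, cost_d1)
```

Because the codes depend only on the ordering of intensities inside the window, the cost is unaffected by monotonic brightness changes between the two views.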
C/C++; 1 core, i7@3.3 GHz
07/25/14 | 4 | SGBM1 | F | 3 | 28.4 (80) | 43.5 (96) | 9.09 (64) | 13.6 (70) | 25.9 (86) | 82.0 (117) | 14.4 (69) | 43.4 (89) | 30.3 (82) | 5.98 (56) | 59.3 (105) | 45.8 (100) | 28.5 (84) | 24.9 (69) | 20.1 (60) | 45.9 (81)
Correlation with five, partly overlapping windows on Census transformed images using Hamming distance as matching cost. A left-right consistency check ensures unique matches and filtering small disparity segments removes outliers. Interpolation is done within image rows with the lowest, valid neighboring disparity.
Census window: 7x7 pixel
Correlation window: 9x9 pixel
LR-check: on
Min. segments: 200 pixel
Interpolation: horizontal, lowest neighbor
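The left-right check and the "horizontal, lowest neighbor" interpolation listed above can be illustrated on a single scanline. The sketch below is not the submitted implementation: the tolerance of one disparity level, the invalid marker, and the function names are assumptions.

```python
INVALID = -1  # hypothetical marker for rejected pixels


def left_right_check(disp_left, disp_right, max_diff=1):
    """Keep a left disparity only if the matching right pixel agrees with it."""
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # column of the corresponding pixel in the right image
        ok = d >= 0 and 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= max_diff
        checked.append(d if ok else INVALID)
    return checked


def fill_lowest_neighbor(row):
    """Fill each invalid pixel with the lower of its nearest valid neighbours in the row."""
    filled = list(row)
    for x, d in enumerate(row):
        if d != INVALID:
            continue
        left = next((v for v in reversed(row[:x]) if v != INVALID), None)
        right = next((v for v in row[x + 1:] if v != INVALID), None)
        candidates = [v for v in (left, right) if v is not None]
        if candidates:
            filled[x] = min(candidates)
    return filled


disp_left = [1, 1, 4, 1, 1]
disp_right = [1, 1, 1, 1, 1]
checked = left_right_check(disp_left, disp_right)
filled = fill_lowest_neighbor(checked)
print(checked, filled)
```

Choosing the lower neighbour favours the background surface, which is usually the right guess for pixels that were invalidated because they are occluded.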
A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.
nRounds=3
The full set of parameters is listed in the paper and the supplemental materials on the project webpage.
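As a small illustration of an NCC-based matching cost, the sketch below computes the zero-mean variant of normalized cross-correlation between two patches and turns it into a cost as 1 - NCC. Both the zero-mean form and the cost mapping are assumptions made for the example; the paper defines the exact formulation used by the method.

```python
from math import sqrt


def zncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches (flat lists)."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    denom = sqrt(sum(a * a for a in da) * sum(b * b for b in db))
    if denom == 0:
        return 0.0  # flat patch: correlation undefined, treated as uninformative here
    return sum(a * b for a, b in zip(da, db)) / denom


left_patch = [10, 20, 30, 40, 50, 60, 70, 80, 90]   # 3x3 patch, row-major
brighter = [2 * v + 10 for v in left_patch]          # same structure, different gain and offset
reversed_patch = left_patch[::-1]                    # opposite structure

cost_same = 1.0 - zncc(left_patch, brighter)
cost_reversed = 1.0 - zncc(left_patch, reversed_patch)
print(cost_same, cost_reversed)
```

The first cost is close to zero even though the second patch is much brighter, which is why correlation-based costs tolerate exposure differences between the two views.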
C++; Core2 Duo, 2 cores @ 3 GHz
08/27/14 | 10 | LPS | F | 3 | 20.3 (59) | 6.72 (30) | 6.06 (47) | 9.72 (46) | 9.87 (43) | 94.3 (121) | 14.1 (65) | 11.2 (35) | 11.2 (43) | 5.88 (54) | 89.3 (122) | 36.0 (74) | 20.5 (46) | 23.8 (64) | 16.0 (49) | 25.4 (47)
Kang Zhang, Jiyang Li, Yijing Li, Weidong Hu, Lifeng Sun, and Shiqiang Yang. Binary stereo matching. ICPR 2012.
No post-processing is used.
Same as in the original paper.
C/C++ single thread Intel(R) Core(TM)2 Duo CPU P7370 @ 2.00GHz
This approach is an adaptive local stereo method. It is integrated into a hierarchical scheme that exploits adaptive windows. Sub-pixel disparities are estimated, but not refined.
L = 10
t = 35
medianK = [3 3]
censusK = [9 7]
lambda = 45;
Block-matching stereo with the Summed Normalized Cross-Correlation (SNCC) measure. Standard post-processing is applied, including a left-right check, error island removal (region growing), hole-filling, and median filtering.
SNCC (first stage 3x3, second stage 11x11)
min correlation threshold = 0.3
region growing threshold = 2.5 disparity
min region size = 200 pixel
median filter = 1x5 and 5x1
Efficient two-pass aggregation with census/gradient cost metric, followed by iterative cost penalization and disparity re-selection to encourage local smoothness of disparities.
census window size = 9 x 7
max census distance = 38.03
max gradient difference = 2.51
census/gradient balance = 0.09
aggregation window size = 33 x 33
aggregation range parameter = 23.39
aggregation spatial parameter = 7.69
refinement window size = 65 x 65
refinement range parameter = 11.30
refinement spatial parameter = 17.20
cost penalty coefficient = 0.0023
median filter window size = 3 x 3
3 iterations of refinement
confidence threshold of 0.1 for sparse maps
In stereo matching, cost filtering methods and energy minimization algorithms are considered two different techniques. Due to their global extent, energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, in which cost filtering approaches obtain better results. In this paper we intend to combine both approaches with the aim of improving overall stereo matching results.
We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.
Only gradient component (6D vector) of color images is used
A local matching technique utilizing SAD+Census cost measure and a recursive edge-aware aggregation through Successive Weighted Summation. Occlusion handling is provided via left-right cross check and a background favored filling.
smoothness parameter sigma = 24
5x5 Census window, Census weight=0.7, SAD weight=0.3, occlusion threshold=2
This approach triangulates the polygonized SLIC segmentations of the input images and optimizes a lower-layer MRF on the resulting set of triangles defined by photo consistency and normal smoothness. The lower-layer MRF is solved by a quadratic relaxation method which iterates between PatchMatch and Cholesky decomposition. The lower-layer MRF is assisted by an upper-layer MRF defined on the set of triangle vertices, which exploits local 'visual complexity' cues and encourages smoothness of the vertices' splitting properties. The two layers interact through an alignment energy term which requires triangles sharing a non-split vertex to have their disparities agree on that vertex. Optimization of the whole model iterates between optimizations of the two layers until convergence, where the upper layer can be solved in closed form.
omega=0.2
tau_grad=15
theta goes from 0 to 100 by smoothstep function in ten iterations
gamma1=30
gamma2=60
gamma3=0.8
Compute the matching cost with a convolutional neural network (accurate architecture). Then apply cross-based cost aggregation, semiglobal matching, left-right consistency check, median filter, and a bilateral filter.
DETAILS:
The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector, which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number, the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.
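The layer sizes quoted above fit together as follows. This is a back-of-the-envelope sketch, assuming unpadded ("valid") convolutions and one bias per fully-connected unit; neither detail is stated in the description, and the helper name is hypothetical.

```python
def valid_conv_output(size, kernel=3, layers=5):
    """Spatial size after stacking unpadded convolutions (assumption: no padding, stride 1)."""
    for _ in range(layers):
        size -= kernel - 1
    return size


patch = 11
spatial = valid_conv_output(patch)          # 11 -> 9 -> 7 -> 5 -> 3 -> 1
feature_len = 112 * spatial * spatial       # one 112-length vector per patch
concat_len = 2 * feature_len                # left and right vectors concatenated

# Fully-connected stack: 224 -> 384 -> 384 -> 384 -> 1 (weights plus biases).
fc_sizes = [concat_len, 384, 384, 384, 1]
fc_params = sum(n_in * n_out + n_out for n_in, n_out in zip(fc_sizes, fc_sizes[1:]))
print(spatial, concat_len, fc_params)
```

Under these assumptions, five 3 x 3 layers reduce an 11 x 11 patch to exactly one spatial position, which is consistent with each patch yielding a single 112-length feature vector.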
A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed. (Improved results as of 9/14/2015 due to bug fix in color-to-gray conversion.)
Standard parameters of Libelas as provided with the MiddEval3-SDK.
The method generates multiple proposals on absolute and relative disparities from multi-segmentations. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes.
We post-process the depth maps produced by Zbontar & LeCun's MC-CNN technique. We use a domain transform to compute an edge-aware variance measure of our confidence in the depth map, and then run our robust bilateral solver on that depth map and confidence with a Geman-McClure loss function.
The MC-CNN is computed using the publicly available implementation (https://github.com/jzbontar/mc-cnn), which uses the GPU. The robust bilateral solver is computed using our CPU implementation, which does not use the GPU and is written in vanilla C++.
Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz, 6 cores; 32 GB RAM; NVIDIA GTX TITAN X
This paper proposes a new image-guided non-local dense matching method with a three-step optimization based on the combination of image-guided methods and energy function-guided methods.
Cost Computation:
Window Size: 5
Weighting Coefficient: 0.3
Truncation Threshold (Census): 15
Truncation Threshold (HOG): 1
Image-guided Non-local Matching:
Smooth Term: 6
Penalty Term P1: 0.3
Penalty Term P2: 3
Disparity Interpolation:
Truncation Threshold: 5
Smooth Term: 3
Penalty Term P1: 3
Penalty Term P2: 30
Function Base: 5
An efficient stereo matching algorithm, which applies adaptive smoothness constraints using texture and edge information, is proposed in this work. First, we determine non-textured regions, on which an input image yields flat pixel values. In the non-textured regions, we penalize depth discontinuity and complement the primary CNN-based matching cost with a color-based cost. Second, by combining two edge maps from the input image and a pre-estimated disparity map, we extract denoised edges that correspond to depth discontinuity with high probabilities. Thus, near the denoised edges, we penalize small differences of neighboring disparities.
The method uses the MC-CNN code for the matching cost computation only.
Compute the matching cost with a convolutional neural network (fast architecture). Then apply cross-based cost aggregation, semiglobal matching, left-right consistency check, median filter, and a bilateral filter.
Our approach is an extension of the ELAS (from Geiger et al.) algorithm. We extract edges and sample our candidate support points along them. For every two consecutive valid support points we create a (straight) line segment. We force the triangulation to include the set of line segments (constrained Delaunay) for a better preservation of the disparity discontinuity at the edges.
Parameters as in the original ELAS algorithm.
For sampling candidate support points along the edge segments:
Adaptive sampling activated:
step = ceil(sqrt(img_diag)*0.5);
sampler(sqrt(step) / 2, step / 2, step / 2);
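A worked example of the step formula above, assuming that img_diag is the image diagonal in pixels and using a 640 x 480 image purely for illustration. The meaning of the three sampler arguments is not given in the listing, so the sketch only evaluates the expressions.

```python
from math import ceil, hypot, sqrt


def sampling_step(width, height):
    """step = ceil(sqrt(img_diag) * 0.5), with img_diag taken as the pixel diagonal (assumption)."""
    img_diag = hypot(width, height)
    return ceil(sqrt(img_diag) * 0.5)


step = sampling_step(640, 480)                         # diagonal is 800 pixels
sampler_args = (sqrt(step) / 2, step / 2, step / 2)    # values passed to sampler(...)
print(step, sampler_args)
```

Because the step grows with the square root of the diagonal, larger images are sampled more coarsely along the edge segments, but only mildly so.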
The computation of the sparse disparity maps is achieved by means of a 3D diffusion of the costs contained in the disparity space volume. The watershed segmentations of the left and right views control the diffusion process and valid measurements are obtained by cross-checking.
The estimation of the dense disparity maps uses the sparse measurements as control points and is driven by a 3D watershed separating the disparity space volume into foreground and background pixels.
No post processing (no filtering, no hole-filling, no interpolation) performed.
The concepts of intrinsic curves were revisited and used for:
- disparity search space reduction, resulting in 83% reduction of the disparity range (individually for each pixel) directly from the original resolution of the image without needing hierarchical search
- reducing the ambiguities due to occluded pixels by integrating occlusion clues explicitly into the global energy function as a soft prior
The final energy minimization was done using a semi-global approach along eight paths.
Matching (data) cost = census transform 7*9
Occlusion cost= from intrinsic curves curvature
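Semi-global minimization accumulates costs along one-dimensional paths and then sums the paths. The sketch below shows the standard recurrence for a single left-to-right path only. It is a generic illustration with made-up costs and penalties (P1 = 2, P2 = 6), and it does not include the occlusion prior this entry adds to its energy.

```python
def aggregate_scanline(cost, p1, p2):
    """cost[x][d] is the matching cost of pixel x at disparity d.
    Returns the path costs L(x, d) for one left-to-right path."""
    n_disp = len(cost[0])
    agg = [list(cost[0])]
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = min(
                prev[d],                                                 # same disparity
                prev[d - 1] + p1 if d > 0 else float("inf"),             # change by one level
                prev[d + 1] + p1 if d + 1 < n_disp else float("inf"),
                prev_min + p2,                                           # larger jump
            )
            row.append(cost[x][d] + best - prev_min)  # subtract prev_min to keep values bounded
        agg.append(row)
    return agg


def winners(costs):
    """Index of the lowest cost per pixel (winner-takes-all)."""
    return [row.index(min(row)) for row in costs]


cost = [[0, 9, 9], [5, 4, 9], [0, 9, 9]]   # three pixels, three disparities
agg = aggregate_scanline(cost, p1=2, p2=6)
print(winners(cost), winners(agg))
```

In this toy case the middle pixel weakly prefers disparity 1 on its own, but the path cost pulls it back to disparity 0, in agreement with its neighbours.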
Incorporating cues from top-down (holistic) scene understanding into existing bottom-up stereo reconstruction techniques (CoR - Chakrabarti et al. CVPR 2015).
Learned weightings (from 2006 dataset) for High Level Scene Cues. Default parameters for CoR. Images with max disp > 256 were downsampled before the SGM step of CoR.
A 3D label based method with global optimization at the pixel level. A bilayer matching cost is employed by first matching small square windows and then aggregating over large irregular windows. Global optimization is carried out by fusing candidate proposals, which are generated from our specific superpixel structure.
We propose a method that incorporates a surface normal constraint predicted by deep learning. With reliable disparities selected from a stereo matching method and an effective edge fusion strategy, we can faithfully convert the predicted surface normal map to a disparity map by solving a least squares system that maintains discontinuities. We use the raw matching cost of MC-CNN.
A novel pooling scheme is used to train a matching cost function with a CNN. It widens the size of receptive field effectively without losing the fine details.
The overall post-processing pipeline is kept almost the same as in the original MC-CNN-acrt, except that the parameter settings are changed as follows:
cbca_num_iterations_1 = 0, cbca_num_iterations_2 = 1, sgm_P1 = 1.3, sgm_P2 = 17.0, sgm_Q1 = 3.6, sgm_Q2 = 36.0, and sgm_V = 1.4.
Torch; Intel Core i7 4790K CPU and a single Nvidia GeForce GTX Titan X GPU
An energy minimization framework for disparity estimation where the energy function consists of an intensity matching cost, a feature matching cost, an IGMRF prior, and sparsity priors.
This is a new weakly supervised method that makes it possible to learn a deep metric for stereo reconstruction from unlabeled stereo images, given coarse information about the scenes and the optical system. The deep metric architecture is similar to MC-CNN fst.
This is a segmentation-based stereo matching algorithm using an adaptive multi-cost approach, which is exploited to obtain accurate disparity maps.
We propose a cost aggregation method that efficiently weaves together MST-based support region filtering and PatchMatch-based 3D label search. We use the raw matching cost of MC-CNN.
We propose a novel method for stereo estimation, combining advantages of convolutional neural networks (CNNs) and optimization-based approaches. The optimization, posed as a conditional random field (CRF), takes local matching costs and consistency-enforcing (smoothness) costs as inputs, both estimated by CNN blocks. To perform the inference in the CRF we use an approach based on linear programming relaxation with a fixed number of iterations. We address the challenging problem of training this hybrid model end-to-end. We show that in the discriminative formulation (structured support vector machine) the training is practically feasible. The trained hybrid model with shallow CNNs is comparable to state-of-the-art deep models in both time and performance. The optimization part efficiently replaces sophisticated and not jointly trainable (but commonly applied) post-processing steps by a trainable, well-understood model.
Our method is a local matching approach using the Guided Filter for cost aggregation. We assign an appropriate Guided Filter size to each pixel of the input image using a Filter Size Map computed with the DoG kernel.
Parameters for Filter Size Map computation:
DoGparam.scalesize = 25 (index of scale space)
DoGparam.mfsize = 1 (window size for Filter Size Map optimization)
Parameters for Guided Filter:
eps = 0.001
Parameters for cost computation:
gamma = 0.11 (Weight of cost)
Parameters for Bilateral Filter in disparity map optimization:
gamma_c = 1
gamma_d = 11
r_median = 19
We propose local expansion moves for estimating dense 3D labels on a pairwise MRF. The data term uses a PatchMatch-like 3D slanted window formulation, where raw matching costs within a window are computed by MC-CNN-acrt and aggregated using guided image filtering. The smoothness term uses a pairwise curvature regularization term by Olsson et al. 2013.
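A "3D label" in a slanted-window formulation is usually a plane (a, b, c) that assigns each pixel the disparity d = a*x + b*y + c, so the pixels of one matching window need not share a single disparity. A minimal sketch with made-up plane parameters and a hypothetical function name:

```python
def plane_disparity(label, x, y):
    """Disparity of pixel (x, y) under the plane label (a, b, c)."""
    a, b, c = label
    return a * x + b * y + c


fronto_parallel = (0.0, 0.0, 12.0)   # constant disparity, the classic window assumption
slanted = (0.5, 0.0, 10.0)           # disparity grows by 0.5 per pixel in x

# Disparities inside a 3x3 window centred on pixel (4, 7), row by row.
window = [(x, y) for y in (6, 7, 8) for x in (3, 4, 5)]
disparities = [plane_disparity(slanted, x, y) for x, y in window]
print(disparities)
```

Both labels give the centre pixel a disparity of 12, but only the slanted label lets the neighbouring pixels in the window follow a tilted surface.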
We propose a feature ensemble network leveraging deep convolutional neural network to perform matching cost computation and the disparity refinement. For matching cost computation, patch-based network architecture with multi-size and multi-layer pooling unit is adopted to learn cross-scale feature representations. For disparity refinement, the initial optimal and sub-optimal disparity maps are incorporated and diverse base learners are applied.
We propose a robust learning-based method for stereo cost volume computation. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that our matching volume estimation method achieves similar accuracy to purely data-driven alternatives and that it generalizes to unseen data much better. In fact, we used the same model trained on Middlebury 2014 dataset to submit to the KITTI and ETH3D benchmarks.
We extend the standard BP sequential technique to the fully connected CRF models with the geodesic distance affinity.
We also propose a new approach to the BP marginal solution, which we call one-view-occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution makes it possible to find occluded regions in the disparity map and simultaneously improve the matching result.
As a result we can perform only one energy minimization process and avoid the cost calculation for the second view and the left-right check procedure.
All parameter settings are given in the C++ MS VS project available at the project website.
Intel(R) Xeon(R) CPU E5-1620 v4 @3.50 GHz
12/11/17 | 56 | OVOD | H | 2 | 8.87 (23) | 4.74 (15) | 3.64 (11) | 5.51 (14) | 4.82 (26) | 12.8 (31) | 6.51 (21) | 9.91 (30) | 9.96 (37) | 3.13 (23) | 16.6 (27) | 14.8 (23) | 14.1 (26) | 15.4 (25) | 6.92 (18) | 13.2 (26)
Hong Li and Chunbo Cheng. Adaptive weighted matching cost based on sparse representation. Submitted to IEEE TIP, 2018.
This paper proposes a novel non-data-driven matching cost for dense correspondence based on sparse representation. This new matching cost can separate sources of impact such as illumination and exposure, thus making it more suitable and selective for stereo matching. In addition, the new matching cost can be used as an adaptive weight in the process of cost calculation, and can improve the accuracy of the matching costs by weighting.
We propose a stereo matching algorithm that directly refines the winner-take-all (WTA) disparity map by exploring its statistical significance. WTA disparity maps are obtained from the pre-computed raw matching costs of MC-CNN-acrt.
Semi-Global Matching (SGM) uses an aggregation scheme to combine costs from multiple 1D scanline optimizations that tends to hurt its accuracy in difficult scenarios. We propose replacing this aggregation scheme with a new learning-based method that fuses disparity proposals estimated using scanline optimization. Our proposed SGM-Forest algorithm solves this problem using per-pixel classification. SGM-Forest currently ranks 1st on the ETH3D stereo benchmark and is ranked competitively on the Middlebury 2014 and KITTI 2015 benchmarks. It consistently outperforms SGM in challenging settings and under difficult training protocols that demonstrate robust generalization, while adding only a small computational overhead to SGM.
Median disparity over all training images of the ROB 2018 stereo challenge.
This submission is a baseline for the Robust Vision Challenge ROB 2018. Each pixel is set to the median disparity of the pixels at the same location in the training images. No test image information is used.
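The baseline described above amounts to a per-pixel median across the training disparity maps. A minimal sketch, assuming all maps share the same size and contain no invalid pixels; the toy data and function name are made up for illustration.

```python
from statistics import median


def median_disparity_map(training_maps):
    """Per-pixel median over a list of equally sized disparity maps (lists of rows)."""
    height = len(training_maps[0])
    width = len(training_maps[0][0])
    return [
        [median(m[y][x] for m in training_maps) for x in range(width)]
        for y in range(height)
    ]


training_maps = [
    [[10, 20], [30, 40]],
    [[12, 18], [33, 100]],
    [[50, 19], [31, 41]],
]
baseline = median_disparity_map(training_maps)
print(baseline)
```

The same map is then returned for every test image, which is why the baseline uses no test image information.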
03/23/18 | 63 | MEDIAN_ROB | H | 2 | 97.8 (126) | 96.1 (125) | 95.6 (125) | 99.0 (126) | 98.4 (126) | 98.4 (125) | 99.2 (126) | 98.4 (126) | 98.1 (125) | 99.0 (126) | 99.0 (126) | 99.6 (126) | 99.9 (126) | 94.7 (126) | 95.1 (125) | 98.3 (125)
Average disparity over all training images of the ROB 2018 stereo challenge.
This submission is a baseline for the Robust Vision Challenge ROB 2018. Each pixel is set to the average disparity of the pixels at the same location in the training images. No test image information is used.
A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed.
Updated ELAS submission as a baseline for the Robust Vision Challenge (http://robustvision.net), replacing the original ELAS (H) entry.
Standard parameters as provided with the MiddEval3-SDK and the Robust Vision Challenge stereo devkit.
A modification of the FlowNet 2 architecture [1] for the Robust Vision 2018 Stereo Challenge.
[1] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox. Flownet 2.0: Evolution of optical flow estimation with deep networks. CVPR 2017.
See paper.
GTX1070
05/22/18 | 69 | DN-CSS_ROB | H | 2 | 22.8 (64) | 31.4 (72) | 9.28 (67) | 13.5 (68) | 12.4 (51) | 44.3 (77) | 12.1 (46) | 28.1 (66) | 17.6 (58) | 9.11 (70) | 50.9 (92) | 40.0 (88) | 21.2 (50) | 25.0 (70) | 31.9 (82) | 43.2 (74)
Jie Li, Penglei Ji, and Xinguo Liu. Superpixel alpha-expansion and normal adjustment for stereo matching. Proceeding of CAD/Graphics 2019.
c/c++; core i7 7700@3.6GHz
05/26/18 | 70 | NOSS_ROB | H | 2 | 5.01 (1) | 3.57 (3) | 2.84 (3) | 3.99 (5) | 1.93 (2) | 5.15 (1) | 3.34 (2) | 3.32 (2) | 3.15 (1) | 2.32 (5) | 8.55 (2) | 7.45 (1) | 7.06 (3) | 12.5 (4) | 5.20 (4) | 9.06 (4)
Benedikt Wiberg. Stereo matching with neural networks. Bachelor's thesis, TU Munich 2018. ROB 2018 entry.
Neural Network based on a multidimensional similarity metric and Deeplab v3+
Numerous CNN algorithms focus on pixel-wise matching cost computation, which is an important building block of many state-of-the-art algorithms. However, these architectures are limited to small, single-scale receptive fields and use traditional methods for cost aggregation or even ignore cost aggregation. In this paper, we propose a novel architecture called cascaded multi-scale and multi-dimension network (MSMD) to take both into consideration. Firstly, we propose a new multi-scale matching cost computation sub-network, in which two different sizes of receptive fields are implemented in parallel. In this way, the network can make the best use of both variants to balance the trade-off between the increase of receptive field and the loss of details. Furthermore, we show that our multi-dimension aggregation sub-network, which contains 2D convolution and 3D convolution operations, can provide rich context and semantic information for estimating an accurate initial disparity.
A robust solution for semi-dense stereo matching is presented. It utilizes two CNN models for computing stereo matching cost and performing confidence-based filtering, respectively. Compared to existing CNNs-based matching cost generation approaches, our method feeds additional global information into the network so that the learned model can better handle challenging cases, such as lighting changes and lack of textures. Through utilizing non-parametric transforms, our method is also more self-reliant than most existing semi-dense stereo approaches, which rely highly on the adjustment of parameters.
Matlab, GTX1080Ti, Lua, Python
06/27/18 | 76 | DCNN | H | 2 | 10.9 (32) | 5.66 (23) | 4.98 (32) | 6.49 (23) | 5.73 (29) | 12.5 (30) | 8.51 (33) | 15.6 (44) | 10.9 (42) | 3.08 (21) | 24.1 (40) | 20.2 (36) | 16.8 (36) | 15.5 (27) | 10.3 (33) | 13.8 (27)
Julien Valentin, Adarsh Kowdle, Jonathan Barron, et al. Depth from motion for smartphone AR. ACM TOG 37(6):193 (Proc. of SIGGRAPH Asia), 2018.
Single core of a mobile phone (Qualcomm Snapdragon 821 Kryo @ 2.15 GHz)
Congxuan Zhang, Junjie Wu, Zhen Chen, Wen Liu, Ming Li, and Shaofeng Jiang. Dense-CNN: Dense convolutional neural network for stereo matching using multi-scale feature connection. Submitted to Signal Processing and Image Communication, 2019.
We propose an MST-based stereo matching method that uses image edge and brightness information, because classical MST-based methods produce inaccurate matching weights near image boundaries and in backgrounds of similar color.
We propose four efficient feature extractors based on convolutional neural networks for stereo matching cost computation. Two of them generate multiscale features with diverse receptive field sizes. These multiscale features are used to compute the corresponding multiscale matching costs. We then determine an optimal cost by combining the multiscale costs using edge information. On the other hand, the other two feature extractors produce uni-scale features by combining multiscale features directly through fully connected layers. Finally, after obtaining matching costs using one of the four extractors, we determine optimal disparities based on the cross-based cost aggregation and the semiglobal matching.
We design a fully convolutional network that generates the disparity map as a regression problem, applying pyramid pooling and skip connections to integrate hierarchical context information.
Julia Navarro and Antoni Buades. Dense and robust image registration by shift adapted weighted aggregation and variational completion. Submitted to Image and Vision Computing, 2019.
The method comprises two main steps. First, we use adaptive support weights for local matching. Apart from the color similarity and geometric distance, the adaptive weight distribution favors pixels in the block matching with smaller cost. Besides, we use a multiscale strategy with invalidation criteria to reduce match ambiguity and computational time.
Second, a global interpolation using a variational formulation is carried out. The energy functional penalizes deviations from the local disparity estimation at different scales.
Local approach (DAWA): 23x23 squared window, beta=11, lambda=6, gamma=4, pixel precision 1/4, three scales for multiscale procedure.
Variational model: alpha=1, gamma=5, phi1=30, phi2=15.
Stereo matching has attracted a large number of studies in recent years. The problem is difficult because visual discomfort effects reduce the accuracy of disparity maps. Following the multistage structure used by most stereo matching algorithms (the taxonomy of D. Scharstein and R. Szeliski), this paper proposes an improved stereo matching algorithm that uses an Adaptive Weighted Bilateral Filter as the main filter in the cost aggregation stage, which preserves edges and is robust in plain-colour regions. In the matching cost computation stage, the window size of the sum of absolute differences (SAD) and the thresholds are adjusted, and a Median Filter is used as the main filter in the disparity refinement stage, to overcome limitations in disparity map accuracy. Evaluation on the indoor scenes of the latest (2014) Middlebury dataset shows that the proposed algorithm with the Adaptive Weighted Bilateral Filter produces smooth disparity maps with good processing time.
Yuhao Xiao, Dingding Xu, Guijin Wang, Xiaowei Hu, Yongbing Zhang, Xiangyang Ji, and Li Zhang. Confidence map based 3D cost aggregation with multiple minimum spanning trees for stereo matching. ACPR 2019.
C++;i7-6700 3.4GHz CPU;32G memory
03/09/19 | 91 | 3DMST-CM | H | 2 | 5.47 (3) | 4.10 (11) | 3.37 (10) | 2.99 (2) | 2.95 (13) | 7.63 (11) | 4.55 (8) | 3.26 (1) | 3.95 (9) | 2.16 (3) | 10.2 (7) | 8.28 (2) | 6.37 (1) | 13.2 (7) | 5.86 (11) | 9.35 (6)
Chunbo Cheng, Hong Li, and Liming Zhang. A new stereo matching cost based on two-branch convolutional sparse coding and sparse representation. Submitted to IEEE TIP, 2019.
This paper presents a novel unsupervised stereo matching cost for stereo matching. Specifically, a novel two-branch convolutional sparse coding (CSC) is used to learn the convolution filter bank without ground truth disparity maps. Then, the sparse representations over the learned convolutional filter bank are utilized to measure the similarity between image patches, namely, the stereo matching cost can be computed by measuring the l1 distance between sparse representations of image patches.
Matlab/C/C++; 1 i5 core @2.9 GHz
04/12/19 | 92 | TCSCSM | H | 2 | 19.1 (56) | 45.2 (101) | 5.76 (44) | 11.0 (57) | 22.1 (74) | 41.1 (72) | 13.4 (57) | 24.8 (57) | 11.4 (47) | 7.17 (60) | 29.5 (48) | 26.6 (51) | 26.6 (75) | 20.5 (48) | 16.5 (50) | 17.4 (35)
Sonali Patil, Tanmay Prakash, Bharath Comandur, and Avinash Kak. A comparative evaluation of SGM variants for dense stereo matching. Submitted to PAMI, 2019.
Hierarchical MGM-16 where coarser-level results limit the per-pixel disparity search range. Post-processing at each level includes a joint bilateral filter, peak removal, and a consistency check. The final disparity maps are interpolated using discontinuity-preserving interpolation.
See Paper
C/C++; Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 16 Cores
In this work, we propose a learning-based method to denoise and refine disparity maps of a given stereo method. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method makes it possible to learn a robust collaborative regularizer leveraging the joint statistics of the color image, the confidence map, and the disparity map. Due to the variational structure of our method, the individual steps can be easily visualized, thus enabling interpretability of the method. We can therefore provide interesting insights into how our method refines and denoises disparity maps. The efficiency of our method is demonstrated on the publicly available stereo benchmarks Middlebury 2014 and KITTI 2015.
GTX 2080Ti
05/13/19
94
VN
H
2
14.2
43
9.69
38
9.58
70
10.9
56
7.33
34
9.54
16
13.8
61
11.3
36
11.3
46
7.17
60
27.4
45
23.3
44
24.8
66
22.8
60
14.6
46
18.4
39
Wei Wang, Wei Bao, Yulan Guo, Siyu Hong, Zhengfa Liang, Xiaohu Zhang, and Yuhua Xu. An indoor real scene dataset to train convolution networks for stereo matching. Submitted to SCIENCE CHINA Information Sciences, 2019.
We have collected 2000 pairs of stereo images with high accuracy disparity maps to fine-tune the network. Our goal is to improve the generalization performance of networks.
fine-tune num: 90000; the initial learning rate: 1e-3.
i7@3.4GHz, GTX 1080Ti GPU
05/15/19
95
PSMNet_2000
H
2
28.9
82
20.4
53
8.23
59
15.1
79
27.7
90
35.2
60
15.2
71
50.8
102
51.8
113
9.29
71
61.9
110
31.1
62
25.2
68
27.8
78
29.3
75
52.9
91
Hao Li, Yanwei Sun, and Li Sun. Edge-preserved disparity estimation with piecewise cost aggregation. Submitted to the International Journal of Geo-Information, 2019.
The cost aggregated paths are divided by edge pixels in the edge disparity map, and cost aggregation is calculated independently in each sub-path.
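The idea of aggregating independently in each sub-path can be sketched with the usual SGM-style recurrence along a single left-to-right path, restarted at every edge pixel. This is one interpretation under stated assumptions: the penalties `p1` and `p2`, the `is_edge` mask, and the function name are hypothetical, and the submission's actual formulation may differ.

```python
def aggregate_path(costs, is_edge, p1, p2):
    """Aggregate costs[x][d] left-to-right along one scanline path.

    is_edge[x] marks pixels where a new sub-path starts; aggregation is
    restarted there, so no smoothness penalty is carried across the edge.
    """
    inf = float("inf")
    n_disp = len(costs[0])
    out = []
    prev = None
    for x, c in enumerate(costs):
        if prev is None or is_edge[x]:
            # Start of a (sub-)path: use the raw matching cost only.
            cur = list(c)
        else:
            prev_min = min(prev)
            cur = []
            for d in range(n_disp):
                best = min(
                    prev[d],                                  # same disparity
                    prev[d - 1] + p1 if d > 0 else inf,       # disparity change of 1
                    prev[d + 1] + p1 if d + 1 < n_disp else inf,
                    prev_min + p2,                            # larger disparity jump
                )
                # Subtracting prev_min keeps the aggregated values bounded.
                cur.append(c[d] + best - prev_min)
        out.append(cur)
        prev = cur
    return out
```

Marking a pixel as an edge makes its aggregated cost equal to its raw matching cost, so evidence from the other side of the edge does not influence it.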
We propose "DeepPruner", a real-time stereo matching algorithm, which combines the strength of deep network and search space pruning techniques. Towards this goal, we developed a differentiable PatchMatch module that allows us to discard most disparities and generates a sparse representation of the cost-volume. We then exploit this representation to learn which range to prune for each pixel. Our method achieves competitive results on KITTI / SceneFlow datasets while running in real-time at 62ms. Moreover, we obtain the first place (on overall rankings) on the Robust Vision Challenge. For more details, check out our paper and source code.
It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
We formulate the scale transformation of the cost volume as Bayesian inference and propose an inter-scale subnetwork that reliably and adaptively generates details under the guidance of geometric information.
We fine-tune the model pre-trained on Scene Flow for 300 epochs, with a learning rate of 0.001 in the first 100 epochs and 0.0001 in the remaining epochs.
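The two-stage learning rate described above amounts to a step schedule. A minimal sketch, assuming zero-based epoch indexing; the function and parameter names are hypothetical.

```python
def learning_rate(epoch, switch_epoch=100, initial=1e-3, later=1e-4):
    """Step schedule: `initial` for the first `switch_epoch` epochs, `later` afterwards."""
    return initial if epoch < switch_epoch else later
```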
GTX 1080Ti
11/11/19
101
SACA-Net
Q
1
31.7
91
21.8
54
16.0
91
21.5
96
24.5
81
38.0
65
34.6
110
38.6
82
36.3
93
19.0
96
49.4
84
40.2
89
32.1
95
33.6
103
36.0
86
58.2
99
Anonymous. Enhancing deep stereo networks with geometric priors. CVPR 2020 submission 387.
Huaiyuan Xu, Xiaodong Chen, Haitao Liang, Siyu Ren, and Haotian Li. Cross-based rolling label expansion for dense stereo matching. Submitted to IEEE Access, 2019.
c++, i5-7400 CPU@3.0GHz
12/19/19
104
CRLE
H
2
5.75
5
3.66
5
3.11
7
5.92
17
2.14
4
6.01
5
3.39
4
3.49
7
3.68
6
2.34
6
10.2
7
9.63
5
8.04
5
14.9
22
5.45
9
9.26
5
Weimin Yuan. Efficient local stereo matching algorithm based on fast gradient domain guided image filtering. Submitted to Pattern Recognition Letters, 2019.
Matlab R2014a; @3.4GHz
12/30/19
105
F-GDGIF
H
2
31.6
90
37.3
84
7.72
57
16.1
83
34.9
106
35.2
61
20.1
82
55.3
108
46.6
103
9.01
69
47.6
81
52.9
111
29.4
87
29.7
88
29.4
77
61.0
102
Yuli Fu, Kaimin Lai, Weixiang Chen, and Youjun Xiang. A pixel pair based encoding pattern for stereo matching via an adaptively weighted cost. Submitted to IET Image Processing, 2020.
A novel encoding pattern, which is designed for the situation of radiometric distortion, is proposed. The pattern is applied for stereo matching cost function.
m=25
1 i5 core@3.0GHz
01/02/20
106
PPEP-GF
Q
1
34.6
100
42.4
91
21.7
107
24.8
103
30.8
96
44.0
76
25.1
93
45.7
94
42.1
100
20.1
100
44.1
75
43.6
95
35.2
104
32.8
99
39.3
99
55.0
96
Rafael Brandt, Nicola Strisciuglio, and Nicolai Petkov. MTStereo 2.0: Improved accuracy of stereo depth estimation. ICPR 2020 submission.
The method is based on a Max-tree hierarchical representation of image pairs, which we use to identify matching regions along image scan-lines.
The number of color quantization levels was set to 16. α was set to 0.8. The minimum (or maximum) width of nodes to be matched was set to 0 (or 1/2 of the input image width). Matched node levels S was set to {1, 0}. The maximum neighborhood size ω_γ was set to 10. The size of the Gaussian kernel used to aggregate the cost volume was 21. The minimum confidence percentage parameter ω_Π was set to 12. In guided pixel refinement, ω_ω was set to 12% when sparse disparity maps were generated.
i7 8565U (4 cores)
01/05/20
107
MTS2
F
3
53.8
120
51.7
110
21.5
106
38.8
120
52.7
121
97.5
123
43.0
120
66.4
119
60.8
119
32.0
116
85.7
120
69.0
119
46.3
117
45.1
116
71.2
120
85.2
120
Lingyin Kong, Jiangping Zhu, and Sancong Ying. Stereo matching based on guidance image and adaptive support region. Submitted to Acta Optica Sinica, 2020.
Matlab R2018b;Intel Core i7-3770 CPU
01/07/20
108
ADSR_GIF
Q
1
37.1
103
43.6
97
18.6
97
36.7
119
24.6
82
58.6
100
22.8
89
56.3
112
49.7
109
18.7
95
56.0
99
48.5
103
32.2
96
24.5
66
36.3
87
79.1
118
Anonymous. Cascade cost volume for high-resolution multi-view stereo and stereo matching. CVPR 2020 submission 6312.
GTX 1080Ti
02/07/20
109
CasStereo
H
2
18.8
54
23.9
58
9.01
63
10.5
50
11.7
48
74.0
112
13.1
55
10.1
31
7.86
25
4.09
35
45.4
77
25.2
47
24.4
63
17.3
36
20.5
62
44.3
78
Linghua Zeng and Xinmei Tian. CRAR: Accelerating stereo matching with cascaded regression and adaptive refinement. Submitted to Pattern Recognition, 2020.
A deep-learning model PSMNU, modified based on PSMNet, produces initial disparity and uncertainty on the down-sampled image. SGBMP performs full resolution prediction based on the initial disparity and uncertainty.
PSMNU: max disparity 256, trained on Scene Flow dataset (Flyingthings3D & Monkaa) only, without data augmentation. SGBMP: \lambda_b = 3, \lambda_s = 0.1, \lambda_d = 0.1. For the initial prediction of PSMNU, images are down-sampled to 768x1024.
The algorithm is based on a hierarchical representation of image pairs which is used to restrict disparity search range. We propose a cost function that takes into account region contextual information and a cost aggregation method that preserves disparity borders.
James Okae, Juan Du, and Yueming Hu. Robust statistical approach to stereo disparity map denoising and refinement. Submitted to Journal of Control Theory and Technology, 2020.
Using robust statistics and probability to detect and refine outliers in disparity maps by leveraging the joint statistics of the given disparity map and its reference image.
lamda=1,r1=5,r2=25, sigma=10,tho_d=1, tho_s=4
Matlab; Intel® Core™ i7-4600U CPU
05/14/20
115
SRM
H
2
13.1
38
8.50
36
7.04
53
7.86
35
7.73
37
16.1
38
7.90
27
18.4
48
18.5
62
5.03
44
22.3
36
20.0
34
18.1
42
18.5
40
11.3
36
19.3
40
Haoyu Ren, Mostafa El-Khamy, and Jungwon Lee. Stereo disparity estimation via joint supervised, unsupervised, and weakly supervised learning. ICIP 2020.
Mei Haocheng, Yu Lei, and Wang Tiankui. Class classification network for stereo matching. Submitted to The Visual Computer, 2020.
8 cores + GTX 1080Ti GPU
05/23/20
118
CCNet
Q
1
31.9
93
27.0
67
16.0
92
20.3
92
18.6
64
38.8
66
52.0
124
35.9
78
34.0
88
18.6
94
49.6
86
30.5
61
26.7
76
36.2
109
37.1
89
62.0
105
Yun Xie, Shaowu Zheng, and Weihua Li. Feature-guided spatial attention upsampling for real-time stereo matching. Submitted to IEEE MultiMedia, 2020.
Python, 2080TI
05/28/20
120
RTSMNet
H
2
45.6
116
47.0
103
21.9
108
31.9
114
36.4
109
75.1
114
43.9
121
58.9
116
55.3
116
32.7
117
62.2
111
56.4
115
42.2
113
39.1
113
58.0
114
59.9
101
Anonymous. End-to-end neural architecture search for deep stereo matching. NeurIPS 2020 submission 4988.
GPU @ 2.5 GHz (Python)
05/28/20
119
LEAStereo
H
2
7.15
11
7.56
33
4.52
23
4.62
7
4.64
24
8.83
14
5.66
13
5.86
18
6.03
21
3.30
28
13.1
18
11.3
10
10.3
11
12.1
3
7.06
20
9.90
10
Hector Vazquez, Madain Perez, Abiel Aguilar, Miguel Arias, Marco Palacios, Antonio Perez, Jose Camas, and Sabino Trujillo. Real-time multi-window stereo matching algorithm with fuzzy logic. Submitted to IET Computer Vision, 2020.
We propose a novel stereo matching algorithm with fuzzy logic and also implement it on an FPGA embedded system. We try to select the best SAD window size for each pixel by leveraging fuzzy logic.
We used blocks of different sizes (e.g., 5, 15, 21).
Cyclone V 5CSEBA6U23I7 FPGA
06/08/20
121
MANE
H
2
30.9
88
54.7
114
11.5
77
14.6
75
29.4
93
52.6
92
26.4
97
45.1
93
31.5
86
11.5
81
42.5
71
41.8
93
33.1
99
31.6
94
34.2
85
43.5
75
Xianjing Cheng and Yong Zhao. HLocalExp-CM: Confidence map by hierarchical local expansion moves for stereo matching. Submitted to IEEE Access, 2020.
GA-Net reference submission as baseline for the stereo benchmark of the robust vision challenge 2020.
All method credits go to the original author (Zhang et al.)
Submission by Nicolas Jourdan, TU Darmstadt, RVC 2020 team.
Trained on Middlebury, KITTI, and ETH3D, starting from the KITTI checkpoint made available in the GANet repository on GitHub by the original authors.
Frequency of sampling was adapted to the dataset size. Test images scaled to next multiple of 48.
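The target size for "next multiple of 48" can be computed as below. This is only a sketch of the arithmetic; how the images are actually rescaled to that size is not stated in the entry, and the function names are hypothetical.

```python
def next_multiple(size, base=48):
    """Smallest multiple of `base` that is greater than or equal to `size`."""
    return ((size + base - 1) // base) * base


def scaled_shape(height, width, base=48):
    """Target (height, width) after rounding both dimensions up to a multiple of `base`."""
    return next_multiple(height, base), next_multiple(width, base)
```

For example, a 375 x 1242 image would be brought to 384 x 1248.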
We propose a robust disparity estimation network. Our main novelty compared to existing work is a new use of attention that can handle a variety of scenes.
The RVC submission was trained on quarter-resolution Middlebury + KITTI + ETH. After validation, we chose quarter resolution instead of half resolution.