OpenCV 2.4.8 StereoSGBM method, full variant (2 passes). Reimplementation of H. Hirschmüller's SGM method (CVPR 2006; PAMI 2008).
OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle the quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
SAD window: 3x3 pixel
Truncation value for prefilter: 63
P1/P2: 8*3*3*3/32*3*3*3
Uniqueness ratio: 10
Speckle window size: 100
Speckle range: 32
Full DP: true
C/C++; 1 core, i7@3.3 GHz
07/25/14 | 1 | SGBM2 | Q | 1 | 26.4 (70) | 27.9 (59) | 12.1 (73) | 17.8 (79) | 13.7 (50) | 74.5 (100) | 14.0 (59) | 30.3 (63) | 26.3 (69) | 11.0 (71) | 64.4 (99) | 37.9 (74) | 25.8 (64) | 25.3 (68) | 29.3 (71) | 43.7 (72)
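The P1/P2 entries and the path-wise dynamic programming described for the SGBM variants can be made concrete with a small sketch. It assumes the standard SGM recurrence from Hirschmüller's papers, applied to a single left-to-right path on one scanline, and it reads `8*3*3*3` as 8 x channels x window width x window height. The function names `sgbm_penalties` and `aggregate_path` are illustrative and are not OpenCV APIs.

```python
def sgbm_penalties(channels, block_size):
    """Return (P1, P2) following the 8*cn*w*w / 32*cn*w*w pattern (assumed reading)."""
    area = channels * block_size * block_size
    return 8 * area, 32 * area


def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one left-to-right path.

    costs[x][d] is the matching cost of pixel x at disparity d.
    A disparity change of 1 is penalised by p1, any larger change by p2.
    """
    n_disp = len(costs[0])
    prev = list(costs[0])
    aggregated = [prev]
    for x in range(1, len(costs)):
        best_prev = min(prev)
        cur = []
        for d in range(n_disp):
            candidates = [prev[d], best_prev + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting best_prev keeps the values bounded along the path.
            cur.append(costs[x][d] + min(candidates) - best_prev)
        aggregated.append(cur)
        prev = cur
    return aggregated
```

With 3-channel images and a 3x3 window, `sgbm_penalties(3, 3)` gives `(216, 864)`, which matches the products in the parameter list. A full method would sum such aggregated costs over several path directions (8 or 5 in the variants listed here) before picking the minimum per pixel.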
The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.
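As a rough illustration of the Census/Hamming matching cost mentioned above, the following sketch computes a census bit string for one pixel and compares two bit strings. The 3x3 window (`radius=1`), the bit order, and the "neighbour darker than centre" convention are assumptions made for the example, not details taken from the submission.

```python
def census(image, y, x, radius=1):
    """Census bit string of pixel (y, x): one bit per neighbour, 1 if it is darker than the centre."""
    centre = image[y][x]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            bits = (bits << 1) | int(image[y + dy][x + dx] < centre)
    return bits


def hamming(a, b):
    """Number of differing bits, used as the pixelwise matching cost."""
    return bin(a ^ b).count("1")


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Same patch under a gain and offset change: the intensity ordering is unchanged.
brighter = [[2 * v + 10 for v in row] for row in patch]
cost = hamming(census(patch, 1, 1), census(brighter, 1, 1))
```

Because the census transform only encodes intensity ordering, `cost` is 0 here even though the raw intensities differ, which is the usual motivation for this cost under exposure changes.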
OpenCV 2.4.8 StereoSGBM method, single-pass variant. Reimplementation and modification of H. Hirschmüller's SGM method (CVPR 2006; PAMI 2008).
OpenCV's "semi-global block matching" method; memory-efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
SAD window: 3x3 pixel
Truncation value for prefilter: 63
P1/P2: 8*3*3*3/32*3*3*3
Uniqueness ratio: 10
Speckle window size: 100
Speckle range: 32
Full DP: false
C/C++; 1 core, i7@3.3 GHz
07/25/14 | 4 | SGBM1 | F | 3 | 28.4 (75) | 43.5 (84) | 9.09 (57) | 13.6 (63) | 25.9 (75) | 82.0 (103) | 14.4 (64) | 43.4 (77) | 30.3 (75) | 5.98 (51) | 59.3 (95) | 45.8 (88) | 28.5 (74) | 24.9 (65) | 20.1 (59) | 45.9 (74)
Correlation with five, partly overlapping windows on Census transformed images using Hamming distance as matching cost. A left-right consistency check ensures unique matches and filtering small disparity segments removes outliers. Interpolation is done within image rows with the lowest, valid neighboring disparity.
Census window: 7x7 pixel
Correlation window: 9x9 pixel
LR-check: on
Min. segments: 200 pixel
Interpolation: horizontal, lowest neighbor
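The left-right check and the "horizontal, lowest neighbor" interpolation listed above can be sketched on a single scanline as follows. This is a minimal illustration with integer disparities; the tolerance `max_diff` and the use of `None` for invalid pixels are assumptions of the example, not parameters of the submission.

```python
def left_right_check(disp_left, disp_right, max_diff=1):
    """Invalidate (None) left-view disparities that the right view does not confirm."""
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching column in the right image
        ok = 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= max_diff
        checked.append(d if ok else None)
    return checked


def fill_lowest_neighbor(row):
    """Fill invalid pixels with the lower of the nearest valid disparities to the left and right."""
    filled = list(row)
    for x, d in enumerate(row):
        if d is not None:
            continue
        left = next((v for v in reversed(row[:x]) if v is not None), None)
        right = next((v for v in row[x + 1:] if v is not None), None)
        valid = [v for v in (left, right) if v is not None]
        filled[x] = min(valid) if valid else None
    return filled


checked = left_right_check([2, 2, 2, 1, 1], [2, 2, 5, 1, 1])
dense = fill_lowest_neighbor(checked)
```

Choosing the lower neighbor favors the background surface, which is the usual rationale when the invalid pixels are caused by occlusion.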
A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.
nRounds=3
The full set of parameters is listed in the paper and the supplemental materials on the project webpage.
This approach is an adaptive local stereo method. It is integrated into a hierarchical scheme, which exploits adaptive windows. Subpixel disparities are estimated, but not refined.
L = 10
t = 35
medianK = [3 3]
censusK = [9 7]
lambda = 45;
Block-matching stereo with Summed Normalized Cross-Correlation (SNCC) measure. Standard post-processing is applied, including a left-right check, error island removal (region growing), hole-filling and median filtering.
SNCC (first stage 3x3, second stage 11x11)
min correlation threshold = 0.3
region growing threshold = 2.5 disparity
min region size = 200 pixel
median filter = 1x5 and 5x1
Efficient two-pass aggregation with census/gradient cost metric, followed by iterative cost penalization and disparity reselection to encourage local smoothness of disparities.
census window size = 9 x 7
max census distance = 38.03
max gradient difference = 2.51
census/gradient balance = 0.09
aggregation window size = 33 x 33
aggregation range parameter = 23.39
aggregation spatial parameter = 7.69
refinement window size = 65 x 65
refinement range parameter = 11.30
refinement spatial parameter = 17.20
cost penalty coefficient = 0.0023
median filter window size = 3 x 3
3 iterations of refinement
confidence threshold of 0.1 for sparse maps
CUDA C++, NVIDIA GeForce TITAN Black
10/07/14 | 14 | IDR | H | 2 | 18.1 (47) | 37.5 (75) | 4.08 (16) | 7.49 (30) | 23.3 (71) | 40.6 (62) | 12.8 (50) | 24.5 (49) | 11.3 (42) | 5.46 (45) | 33.1 (49) | 26.0 (43) | 21.5 (52) | 21.7 (53) | 15.3 (45) | 21.2 (40)
Anonymous. Using local cues to improve dense stereo matching. CVPR 2015 submission 973.
In stereo matching, cost filtering methods and energy minimization algorithms are considered two different techniques. Due to their global extent, energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, in which cost filtering approaches obtain better results. In this paper we intend to combine both approaches with the aim of improving overall stereo matching results.
We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.
Only gradient component (6D vector) of color images is used
A local matching technique utilizing SAD+Census cost measure and a recursive edge-aware aggregation through Successive Weighted Summation. Occlusion handling is provided via left-right cross-check and a background-favored filling.
smoothness parameter sigma = 24
5x5 Census window, Census weight=0.7, SAD weight=0.3, occlusion threshold=2
This approach triangulates the polygonized SLIC segmentations of the input images and optimizes a lower-layer MRF on the resulting set of triangles defined by photo consistency and normal smoothness. The lower-layer MRF is solved by a quadratic relaxation method which iterates between PatchMatch and Cholesky Decomposition. The lower-layer MRF is assisted by an upper-layer MRF defined on the set of triangle vertices which exploits local 'visual complexity' cues and encourages smoothness of the vertices' splitting properties. The two layers interact through an Alignment energy term which requires triangles sharing a non-split vertex to have their disparities agree on that vertex. Optimization of the whole model is iterated between optimizations of the two layers until convergence, where the upper layer can be solved in closed form.
omega=0.2
tau_grad=15
theta goes from 0 to 100 by smoothstep function in ten iterations
gamma1=30
gamma2=60
gamma3=0.8
Compute the matching cost with a convolutional neural network (accurate architecture). Then apply cross-based cost aggregation, semi-global matching, left-right consistency check, median filter, and a bilateral filter.
DETAILS:
The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number: the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.
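The layer sizes quoted above are consistent with each other if the five 3 x 3 convolutions are unpadded and have stride 1; that is an assumption of this sketch, since the description does not state padding or stride. The helper name `valid_conv_output` is illustrative.

```python
def valid_conv_output(size, kernel, layers):
    """Spatial size after stacking unpadded, stride-1 convolutions (assumed setup)."""
    for _ in range(layers):
        size = size - kernel + 1
    return size


patch_size = valid_conv_output(11, 3, 5)        # 11 -> 9 -> 7 -> 5 -> 3 -> 1
feature_len = 112 * patch_size * patch_size     # one feature vector per input patch
concat_len = 2 * feature_len                    # left and right vectors concatenated
```

Under that assumption each 11 x 11 patch collapses to a single 112-length vector, and concatenating the two gives the 224-length input to the fully-connected layers.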
A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency and disparity segments below a size of 50 pixels are removed. (Improved results as of 9/14/2015 due to bug fix in color-to-gray conversion.)
Standard parameters of Libelas as provided with the MiddEval3SDK.
C++/SSE; 1 core, i7@3.6 GHz
09/14/15 | 22 | ELAS | F | 3 | 32.3 (82) | 50.9 (95) | 9.17 (58) | 11.0 (51) | 33.0 (89) | 88.2 (107) | 18.3 (70) | 47.3 (84) | 26.8 (70) | 11.7 (75) | 41.7 (68) | 37.4 (71) | 23.7 (55) | 28.8 (78) | 63.0 (103) | 42.8 (69)
S. Fang and Y. Li. Removed based multiview stereo using window-based matching method. Submitted to MV&A, 2015.
Erroneous corresponding points are removed from the stereo pair; only the correct corresponding points, which are obtained by NCC, are kept.
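For reference, normalized cross-correlation (NCC) between two patches can be computed as in the sketch below. It uses the zero-mean form on flattened patches; how the submission thresholds the score to decide which correspondences to keep is not stated, so no threshold is shown.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized, flattened patches."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    denom = math.sqrt(sum(a * a for a in da) * sum(b * b for b in db))
    if denom == 0:
        return 0.0  # a constant patch carries no texture to correlate
    return sum(a * b for a, b in zip(da, db)) / denom
```

The score lies in [-1, 1] and is unaffected by a gain or offset change between the two patches, so `ncc([1, 2, 3, 4], [10, 20, 30, 40])` is 1 up to rounding.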
The method generates multiple proposals on absolute and relative disparities from multi-segmentations. The proposals are coordinated by pointwise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes.
We post-process the depth maps produced by Zbontar & LeCun's MCCNN technique. We use a domain transform to compute an edge-aware variance measure of our confidence in the depth map, and then run our robust bilateral solver on that depth map and confidence with a Geman-McClure loss function.
The MCCNN is computed using the publicly available implementation (https://github.com/jzbontar/mccnn), which uses the GPU, and the robust bilateral solver is computed using our CPU implementation, which does not use the GPU and is written in vanilla C++.
Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz, 6 cores; 32 GB RAM; NVIDIA GTX TITAN X
11/03/15 | 25 | MCCNN+RBS | H | 2 | 8.42 (20) | 6.05 (26) | 5.16 (36) | 6.24 (21) | 3.27 (16) | 11.1 (21) | 6.36 (19) | 8.87 (26) | 9.83 (34) | 3.21 (25) | 15.1 (19) | 15.9 (26) | 12.8 (15) | 13.5 (10) | 7.04 (19) | 9.99 (9)
Anonymous. High accuracy stereo matching with spatially varying regularization. CVPR 2016 submission 863.
C++; 1 i7 core @2.8GHz + Nvidia GTX 660Ti GPU
11/05/15 | 26 | GCSVR | H | 2 | 14.6 (43) | 17.1 (45) | 3.50 (10) | 8.22 (36) | 16.5 (55) | 47.4 (76) | 8.82 (30) | 9.75 (28) | 7.06 (21) | 3.17 (23) | 34.4 (52) | 27.1 (46) | 18.3 (39) | 19.2 (42) | 16.0 (47) | 19.3 (37)
Anonymous. Look wider and deeper to match. CVPR 2016 submission 975.
i7 core @3.2GHz + GTX 980 GPU
11/06/15 | 27 | SOU4Pnet | H | 2 | 13.4 (38) | 23.1 (51) | 5.41 (41) | 6.39 (22) | 13.1 (49) | 30.5 (48) | 9.00 (31) | 16.4 (44) | 12.7 (45) | 3.13 (21) | 28.9 (38) | 17.1 (28) | 16.4 (32) | 16.9 (33) | 10.7 (32) | 14.5 (30)
Anonymous. Image-guided nonlocal dense matching with three-steps optimization. ISPRS Congress 2016 submission 231.
This paper proposes a new image-guided nonlocal dense matching method with a three-step optimization based on the combination of image-guided methods and energy function-guided methods.
Cost Computation:
Window Size: 5
Weighting Coefficient: 0.3
Truncation Threshold (Census): 15
Truncation Threshold (HOG): 1
Image-guided Nonlocal Matching:
Smooth Term: 6
Penalty Term P1: 0.3
Penalty Term P2: 3
Disparity Interpolation:
Truncation Threshold: 5
Smooth Term: 3
Penalty Term P1: 3
Penalty Term P2: 30
Function Base: 5
An efficient stereo matching algorithm, which applies adaptive smoothness constraints using texture and edge information, is proposed in this work. First, we determine non-textured regions, on which an input image yields flat pixel values. In the non-textured regions, we penalize depth discontinuity and complement the primary CNN-based matching cost with a color-based cost. Second, by combining two edge maps from the input image and a pre-estimated disparity map, we extract denoised edges that correspond to depth discontinuity with high probabilities. Thus, near the denoised edges, we penalize small differences of neighboring disparities.
The method uses the MCCNN code for the matching cost computation only.
Compute the matching cost with a convolutional neural network (fast architecture). Then apply cross-based cost aggregation, semi-global matching, left-right consistency check, median filter, and a bilateral filter.
See paper.
NVIDIA GTX TITAN X
01/26/16 | 31 | MCCNNfst | H | 2 | 9.47 (24) | 7.35 (32) | 5.07 (34) | 7.18 (28) | 4.71 (24) | 16.8 (35) | 8.47 (28) | 7.37 (20) | 6.97 (20) | 2.82 (16) | 20.7 (27) | 17.4 (30) | 15.4 (29) | 15.1 (21) | 7.90 (23) | 12.6 (23)
Anonymous. Stereo depth map refinement with scene layout estimation. CVPR 2016 submission 617.
We exploit scene layout information to refine depth maps.
R. Ait-Jellal, M. Lange, B. Wassermann, A. Schilling, and A. Zell. LS-ELAS: line segment based efficient large scale stereo matching. ICRA 2017.
Our approach is an extension of the ELAS (from Geiger et al.) algorithm. We extract edges and sample our candidate support points along them. For every two consecutive valid support points we create a (straight) line segment. We force the triangulation to include the set of line segments (constrained Delaunay) for a better preservation of the disparity discontinuity at the edges.
Parameters as in the original ELAS algorithm.
For sampling candidate support points along the edge segments:
Adaptive sampling activated:
step = ceil(sqrt(img_diag)*0.5);
sampler(sqrt(step) / 2, step / 2, step / 2);
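The two adaptive-sampling lines above can be evaluated numerically with the sketch below. It assumes `img_diag` is the image diagonal in pixels; the meaning of the three `sampler` arguments is not given in the source, so they are returned as a plain tuple. The function name `sampling_parameters` is hypothetical.

```python
import math


def sampling_parameters(width, height):
    """Evaluate step = ceil(sqrt(img_diag) * 0.5) and the three sampler arguments."""
    img_diag = math.hypot(width, height)  # assumed: diagonal length in pixels
    step = math.ceil(math.sqrt(img_diag) * 0.5)
    # True division is used here; if step is an integer in the original C++,
    # step / 2 would truncate instead.
    return step, (math.sqrt(step) / 2, step / 2, step / 2)


step, sampler_args = sampling_parameters(1200, 900)
```

For a 1200 x 900 image the diagonal is 1500 pixels, so `step` evaluates to 20 under these assumptions.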
Anonymous. Morphological processing of stereoscopic image superimpositions for disparity map estimation. ECCV 2016 submission 1308.
The computation of the sparse disparity maps is achieved by means of a 3D diffusion of the costs contained in the disparity space volume. The watershed segmentations of the left and right views control the diffusion process and valid measurements are obtained by cross-checking.
The estimation of the dense disparity maps uses the sparse measurements as control points and is driven by a 3D watershed separating the disparity space volume into foreground and background pixels.
Python; 1 i5 core @2.7Ghz
03/15/16 | 33 | MPSV | Q | 1 | 43.5 (100) | 58.8 (101) | 33.9 (108) | 34.2 (102) | 37.9 (97) | 52.4 (80) | 30.8 (95) | 56.8 (98) | 51.0 (99) | 30.6 (102) | 56.9 (89) | 51.5 (93) | 44.6 (100) | 43.4 (99) | 44.2 (91) | 54.2 (86)
Shahbazi et al. Revisiting intrinsic curves for efficient dense stereo matching. ISPRS Congress 2016 submission 913.
No post-processing (no filtering, no hole-filling, no interpolation) performed.
The concepts of intrinsic curves were revisited and used for:
- disparity search space reduction, resulting in 83% reduction of the disparity range (individually for each pixel) directly from the original resolution of the image without needing hierarchical search
- reducing the ambiguities due to occluded pixels by integrating occlusion clues explicitly into the global energy function as a soft prior
The final energy minimization was done using a semi-global approach along eight paths.
Matching (data) cost = census transform 7*9
Occlusion cost = from intrinsic curves curvature
C++; 1 i7-2760QM CPU @ 2.4 GHz
04/03/16 | 34 | ICSG | F | 3 | 45.6 (101) | 69.7 (108) | 19.1 (86) | 21.3 (85) | 43.6 (102) | 77.6 (101) | 36.9 (102) | 65.3 (103) | 40.4 (88) | 20.3 (85) | 53.6 (86) | 58.7 (102) | 46.5 (101) | 47.1 (103) | 60.7 (101) | 79.1 (102)
C. Zhang, Z. Li, Y. Cheng, R. Cai, H. Chao, and Y. Rui. MeshStereo: a global stereo model with mesh alignment regularization for view interpolation. Submitted to IJCV 2016.
An extended version of the MeshStereo model. Uses the matching cost (matching cost only) computed by the MCCNN of Zbontar and LeCun.
See paper of MeshStereo, ICCV 2015
C/C++, 8 cores + NVIDIA TITAN X
04/12/16 | 35 | MeshStereoExt | H | 2 | 7.08 (10) | 4.41 (11) | 3.98 (15) | 5.40 (14) | 3.17 (14) | 10.0 (13) | 6.23 (17) | 4.62 (11) | 4.77 (13) | 3.49 (27) | 12.7 (13) | 12.4 (12) | 10.4 (10) | 14.5 (17) | 7.80 (22) | 8.85 (4)
Z. Ge. A global stereo matching algorithm with iterative optimization. China CAD & CG 2016 submission 595.
The basic idea is to treat stereo matching as a Markov Random Field problem and to iteratively optimize the initial result using a modified EM-like algorithm.
S. Hadfield, K. Lebeda, and R. Bowden. Stereo reconstruction using top-down cues. Submitted to CVIU 2016.
Incorporating cues from top-down (holistic) scene understanding into existing bottom-up stereo reconstruction techniques (CoR - Chakrabarti et al. CVPR 2015).
Learned weightings (from 2006 dataset) for High Level Scene Cues. Default parameters for CoR. Images with max disp > 256 were downsampled before the SGM step of CoR.
Matlab and C++. single E5 core at 2.4GHz
04/24/16 | 37 | HLSC_cor | H | 2 | 26.0 (69) | 26.5 (56) | 15.2 (81) | 21.0 (83) | 20.5 (67) | 35.7 (55) | 23.4 (82) | 33.1 (67) | 35.0 (80) | 11.9 (76) | 39.1 (58) | 34.2 (62) | 25.2 (62) | 32.8 (88) | 28.3 (70) | 22.7 (43)
Anonymous. Stereo matching by joint energy minimization. ECCV 2016 submission 41.
1 i7 Core @3.5GHz
04/27/16 | 38 | JEM | Q | 1 | 37.2 (90) | 35.7 (69) | 27.9 (103) | 30.6 (99) | 33.2 (90) | 43.0 (66) | 31.4 (97) | 49.5 (89) | 47.3 (94) | 26.5 (98) | 49.6 (80) | 46.0 (90) | 35.7 (90) | 30.8 (81) | 37.5 (81) | 55.8 (88)
L. Li, S. Zhang, X. Yu, and L. Zhang. PMSC: PatchMatch-based superpixel cut for accurate stereo matching. Submitted to IEEE Transactions on Circuits and Systems for Video Technology, 2016.
A 3D label based method with global optimization at pixel level. A bilayer matching cost is employed by first matching small square windows and then aggregating on large irregular windows. Global optimization is carried out by fusing candidate proposals, which are generated from our specific superpixel structure.
We propose a method to combine the predicted surface normal constraint by deep learning. With the selected reliable disparities from stereo matching method and effective edge fusion strategy, we can faithfully convert the predicted surface normal map to a disparity map by solving a least squares system which maintains discontinuity. We use the raw matching cost of MCCNN.
See paper.
Matlab; 1 i5-4590 CPU@3.3 GHz
09/13/16 | 43 | SNPRSM | H | 2 | 8.75 (21) | 5.46 (19) | 4.85 (28) | 6.50 (24) | 3.37 (17) | 10.4 (17) | 7.31 (24) | 8.73 (25) | 9.37 (31) | 3.58 (28) | 14.3 (16) | 14.7 (22) | 14.9 (26) | 12.8 (5) | 10.1 (29) | 10.8 (15)
Anonymous. A learned sparseness and IGMRF based regularization framework for dense disparity estimation using unsupervised feature learning. Submitted to IPSJ Transactions on Computer Vision and Applications, 2016.
Dense disparity estimation in a sparsity and IGMRF based regularization framework where the matching is performed using learned features and intensities of stereo images.
manually set
Core i7-3632QM, 2.20 GHz processor and 8.00 GB RAM.
09/20/16 | 44 | LFSIR | Q | 1 | 70.1 (110) | 75.7 (110) | 60.3 (109) | 67.1 (108) | 72.4 (110) | 80.8 (102) | 53.7 (110) | 85.4 (110) | 83.8 (110) | 42.5 (107) | 91.2 (108) | 90.4 (109) | 64.1 (109) | 71.3 (110) | 61.5 (102) | 90.3 (108)
H. Park and K. Lee. Look wider to match image patches with convolutional neural network. Submitted to IEEE Signal Processing Letters, 2016.
A novel pooling scheme is used to train a matching cost function with a CNN. It widens the size of receptive field effectively without losing the fine details.
The overall post-processing pipeline is kept almost the same as in the original MCCNNacrt, except that the parameter settings are changed as follows:
cbca_num_iterations_1 = 0, cbca_num_iterations_2 = 1, sgm_P1 = 1.3, sgm_P2 = 17.0, sgm_Q1 = 3.6, sgm_Q2 = 36.0, and sgm_V = 1.4.
Torch; Intel Core i7 4790K CPU and a single Nvidia GeForce GTX Titan X GPU
10/19/16 | 45 | LWCNN | H | 2 | 7.04 (9) | 4.65 (12) | 3.95 (13) | 5.30 (13) | 2.63 (6) | 11.2 (23) | 5.41 (11) | 4.32 (10) | 4.22 (11) | 2.43 (10) | 12.2 (11) | 13.4 (15) | 13.6 (22) | 14.8 (19) | 4.72 (1) | 12.0 (21)
M. Joshi. A learned IGMRF sparseness and IGMRF based regularization framework for dense disparity estimation. Submitted to IPSJ CVA 2016.
An energy minimization framework for disparity estimation where energy function consists of intensity matching cost, feature matching cost, IGMRF prior and sparsity priors.
Manually set
MATLAB 2014 @2.22 Ghz
10/23/16 | 46 | SIGMRF | Q | 1 | 64.2 (109) | 60.0 (103) | 33.0 (107) | 67.9 (109) | 63.2 (108) | 99.5 (112) | 39.8 (104) | 84.8 (109) | 82.0 (109) | 35.2 (104) | 95.2 (110) | 91.5 (110) | 58.1 (108) | 65.8 (109) | 55.0 (99) | 88.6 (107)
C. Legendre, K. Batsos, and P. Mordohai. High-resolution stereo matching based on sampled photoconsistency computation. BMVC 2017.
C/C++; 6 cores Intel Core i7-5820K @3.3 GHz
11/06/16 | 47 | SPS | F | 3 | 19.6 (52) | 14.2 (41) | 12.3 (75) | 14.9 (74) | 12.0 (44) | 15.8 (32) | 19.1 (73) | 17.4 (45) | 15.4 (49) | 8.23 (59) | 30.9 (42) | 34.8 (65) | 30.6 (80) | 25.3 (68) | 28.3 (69) | 28.0 (51)
S. Tulyakov, A. Ivanov, and F. Fleuret. Weakly supervised learning of deep metrics for stereo reconstruction. ICCV 2017.
This is a new weakly supervised method that allows learning a deep metric for stereo reconstruction from unlabeled stereo images, given coarse information about the scenes and the optical system. The deep metric architecture is similar to MCCNN fst.
1 core 2.5 Ghz + K40 NVIDIA, LuaTorch
11/15/16 | 48 | MCCNNWS | H | 2 | 12.1 (32) | 14.8 (44) | 7.20 (51) | 11.1 (52) | 7.62 (35) | 15.9 (33) | 11.8 (43) | 11.5 (35) | 9.01 (29) | 3.89 (31) | 19.7 (26) | 20.5 (34) | 16.3 (31) | 16.3 (30) | 12.1 (38) | 18.3 (36)
Anonymous. Learning to compute the stereo matching cost without supervision. CVPR 2017 submission 2151.
SGM type method where the matching cost is computed with a CNN trained without ground truth data.
See paper
NVIDIA Titan X
11/15/16 | 49 | UCNN | H | 2 | 20.5 (55) | 44.8 (86) | 9.77 (65) | 13.6 (63) | 18.2 (58) | 36.5 (57) | 12.8 (49) | 23.4 (48) | 12.4 (43) | 9.22 (64) | 39.5 (60) | 30.5 (55) | 24.8 (59) | 21.2 (49) | 19.1 (55) | 32.3 (56)
Anonymous. Simultaneous learning matching cost and smoothness constraint for stereo matching. CVPR 2017 submission 1368.
C/C++; 4 cores + GTX 1080 GPU
11/16/16 | 50 | MCSC | F | 3 | 11.3 (31) | 13.3 (40) | 5.96 (44) | 10.6 (48) | 8.69 (36) | 7.22 (8) | 11.3 (39) | 10.6 (30) | 7.48 (22) | 3.07 (18) | 3.10 (1) | 25.2 (42) | 19.0 (40) | 17.2 (35) | 10.3 (30) | 25.5 (49)
N. Ma, Y. Men, C. Men, and X. Li. Accurate dense stereo matching based on image segmentation using an adaptive multi-cost approach. Submitted to Symmetry, 2016.
This is a segmentation-based stereo matching algorithm using an adaptive multi-cost approach, which is exploited to obtain accurate disparity maps.
Used local adaptive cost aggregation including cross-arm support region, guided filtering and disparity subset selection. Refine homogeneous regions using weight propagation with disparity change penalty.
We propose a cost aggregation method that efficiently weaves together MST-based support region filtering and PatchMatch-based 3D label search. We use the raw matching cost of MCCNN.
NVIDIA GTX TITAN X for MCCNN.
Python, Ubuntu, 4 cores @2.7GHz for our method.
03/10/17 | 56 | MCCNN+TDSR | F | 3 | 6.35 (7) | 5.45 (18) | 4.45 (23) | 6.80 (25) | 3.46 (18) | 10.7 (19) | 6.05 (15) | 5.01 (14) | 5.19 (15) | 2.62 (11) | 10.8 (8) | 9.62 (5) | 6.59 (2) | 11.4 (1) | 6.01 (13) | 7.04 (1)
P. Knöbelreiter, C. Reinbacher, A. Shekhovtsov and T. Pock. End-to-end training of hybrid CNN-CRF models for stereo. CVPR 2017.
We propose a novel method for stereo estimation, combining advantages of convolutional neural networks (CNNs) and optimization-based approaches. The optimization, posed as a conditional random field (CRF), takes local matching costs and consistency-enforcing (smoothness) costs as inputs, both estimated by CNN blocks. To perform the inference in the CRF we use an approach based on linear programming relaxation with a fixed number of iterations. We address the challenging problem of training this hybrid model end-to-end. We show that in the discriminative formulation (structured support vector machine) the training is practically feasible. The trained hybrid model with shallow CNNs is comparable to state-of-the-art deep models in both time and performance. The optimization part efficiently replaces sophisticated and not jointly trainable (but commonly applied) post-processing steps by a trainable, well-understood model.

NVidia Titan X
03/22/17 | 57 | JMR | H | 2 | 12.5 (33) | 4.09 (9) | 3.97 (14) | 8.44 (37) | 6.93 (33) | 11.1 (20) | 13.8 (56) | 19.5 (47) | 19.0 (59) | 3.66 (29) | 17.0 (23) | 18.2 (31) | 18.0 (37) | 21.0 (48) | 7.29 (20) | 17.8 (34)
Anonymous. Deep self-guided cost aggregation for stereo matching. ICCV 2017 submission 1999.
This method utilizes a deep learning technique to perform a self-guided cost aggregation which does not require any guidance color image.
Matlab; i7-4770 @ 3.40 GHz; GTX 1080 GPU
03/23/17 | 58 | DSGCA | Q | 1 | 33.8 (85) | 42.9 (81) | 20.9 (90) | 23.6 (88) | 30.2 (82) | 45.5 (72) | 27.6 (88) | 42.0 (75) | 36.0 (81) | 21.0 (88) | 50.2 (81) | 44.2 (84) | 33.3 (85) | 34.6 (92) | 38.4 (84) | 46.8 (75)
J. Yin, H. Zhu, D. Yuan, and T. Xue. Sparse representation over discriminative dictionary for stereo matching. Submitted to Pattern Recognition 2017.
M. Kitagawa, I. Shimizu, and R. Sara. High accuracy local stereo matching using DoG scale map. IAPR MVA 2017.
Our method is a local matching approach that uses the Guided Filter for cost aggregation. We assign an appropriate Guided Filter size to each pixel of the input image using the Filter Size Map, which is computed with the DoG kernel.
Parameters for Filter Size Map computation:
DoGparam.scalesize = 25 (index of scale space)
DoGparam.mfsize = 1 (window size for Filter Size Map optimization)
Parameters for Guided Filter:
eps = 0.001
Parameters for cost computation:
gamma = 0.11 (Weight of cost)
Parameters for Bilateral Filter in disparity map optimization:
gamma_c = 1
gamma_d = 11
r_median = 19
We propose local expansion moves for estimating dense 3D labels on a pairwise MRF. The data term uses a PatchMatch-like 3D slanted window formulation, where raw matching costs within a window are computed by MCCNNacrt and aggregated using guided image filtering. The smoothness term uses a pairwise curvature regularization term by Olsson et al. 2013.
See our paper.
C++; 4 cores i7-4770K @3.5GHz
06/22/17 | 62 | LocalExp | H | 2 | 5.43 (3) | 3.65 (6) | 2.87 (5) | 2.98 (1) | 1.99 (3) | 5.59 (5) | 3.37 (5) | 3.48 (6) | 3.35 (5) | 2.05 (2) | 10.3 (6) | 9.75 (6) | 8.57 (7) | 14.4 (16) | 5.40 (7) | 9.55 (7)
H. Li, C. Cheng, and L. Zhang. Stereo matching cost based on sparse representation. Submitted to IEEE Transactions on Circuits and Systems for Video Technology, 2017.
The paper proposes a new matching cost based on sparse representation theory, which is obtained by solving convex quadratic programming (QP) problems without needing to learn the dictionary in advance.
We propose a feature ensemble network leveraging a deep convolutional neural network to perform matching cost computation and disparity refinement. For matching cost computation, a patch-based network architecture with a multi-size and multi-layer pooling unit is adopted to learn cross-scale feature representations. For disparity refinement, the initial optimal and suboptimal disparity maps are incorporated and diverse base learners are applied.
We propose a robust learning-based method for stereo cost volume computation. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that our matching volume estimation method achieves similar accuracy to purely data-driven alternatives and that it generalizes to unseen data much better. In fact, we used the same model trained on Middlebury 2014 dataset to submit to the KITTI and ETH3D benchmarks.
We extend the standard BP sequential technique to the fully connected CRF models with the geodesic distance affinity.
Also, a new approach to the BP marginal solution is proposed that we call one-view-occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution allows finding occluded regions in the disparity map while simultaneously improving the matching result.
As a result, we can perform only one energy minimization process and avoid the cost calculation for the second view and the left-right check procedure.
All parameter settings are given in the C++ MS VS project available at the project website.
Intel(R) Xeon(R) CPU E5-1620 v4 @3.50 GHz
12/11/17 | 67 | OVOD | H | 2 | 8.87 (22) | 4.74 (15) | 3.64 (11) | 5.51 (15) | 4.82 (25) | 12.8 (29) | 6.51 (20) | 9.91 (29) | 9.96 (35) | 3.13 (21) | 16.6 (22) | 14.8 (23) | 14.1 (24) | 15.4 (23) | 6.92 (18) | 13.2 (25)
H. Li and C. Cheng. Adaptive weighted matching cost based on sparse representation. Submitted to IEEE TIP, 2018.
This paper proposes a novel non-data-driven matching cost for dense correspondence in view of sparse representation. This new matching cost can separate the source of impact such as illuminations and exposures, thus making it more suitable and selective for stereo matching. In addition, the new matching cost can be used as an adaptive weight in the process of cost calculation, and can improve the accuracy of the matching costs by weighting.
Matlab; C++
01/24/18 | 68 | SMSSR | Q | 1 | 34.8 (88) | 47.4 (90) | 22.4 (94) | 27.3 (92) | 28.8 (79) | 46.7 (75) | 23.6 (84) | 49.5 (90) | 43.4 (89) | 20.9 (87) | 37.7 (56) | 44.1 (83) | 34.6 (87) | 33.0 (89) | 39.1 (88) | 49.8 (80)
Anonymous. Disparity filtering with 3D convolutional neural networks. CRV 2018 submission 52.
Matlab 2017
02/07/18 | 69 | DF | Q | 1 | 56.1 (107) | 47.7 (91) | 27.9 (102) | 36.1 (104) | 46.7 (104) | 62.5 (90) | 50.2 (109) | 72.4 (104) | 69.9 (107) | 37.8 (105) | 88.2 (106) | 70.0 (104) | 52.8 (107) | 50.2 (106) | 77.5 (106) | 91.8 (110)
T. Yan and Q. Zhao. Fast disparity refinement with occlusion handling for stereo matching. To appear in IEEE TIP 2018.
We propose a stereo matching algorithm that directly refines the winner-take-all (WTA) disparity map by exploring its statistical significance. WTA disparity maps are obtained from the precomputed raw matching costs of MCCNNacrt.
lambda: 0.3
gamma: 20
k: 30
C/C++; 1 core i5-4210M @2.6GHz
02/28/18 | 70 | FDR | H | 2 | 7.69 (16) | 5.41 (16) | 4.22 (18) | 4.20 (6) | 2.73 (8) | 10.2 (14) | 5.40 (10) | 6.40 (17) | 5.76 (18) | 4.72 (37) | 11.2 (9) | 15.4 (24) | 13.4 (20) | 16.5 (31) | 5.22 (5) | 13.0 (24)
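The FDR entry above starts from a winner-take-all (WTA) disparity map. As a minimal sketch of that starting point only (not of the refinement itself), WTA picks, for every pixel, the disparity with the lowest matching cost. The nested-list cost volume layout is an assumption of the example.

```python
def winner_take_all(cost_volume):
    """cost_volume[y][x] is a list of matching costs indexed by disparity.

    Returns the disparity map that selects the minimum-cost disparity per pixel;
    ties resolve to the smallest disparity.
    """
    return [
        [min(range(len(costs)), key=costs.__getitem__) for costs in row]
        for row in cost_volume
    ]


disparity = winner_take_all([[[3, 1, 2], [0, 5, 5]]])
```

WTA uses no smoothness or occlusion reasoning, which is why methods in this table typically follow it with aggregation, optimization, or refinement steps.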
Anonymous. Superpixel stereo matching based on normal optimization. ECCV 2018 submission 500.
c i7
03/06/18 | 71 | NOSS | H | 2 | 5.04 (2) | 3.57 (4) | 2.84 (3) | 3.99 (4) | 1.93 (1) | 5.15 (1) | 3.34 (2) | 3.32 (2) | 3.15 (1) | 2.32 (5) | 8.55 (2) | 7.45 (1) | 7.06 (3) | 12.5 (3) | 5.20 (3) | 10.0 (10)
H. Hirschmueller. Stereo processing by semi-global matching and mutual information. PAMI 30(2):328-341, 2008. ROB 2018 submission.
GPU implementation of SGM using Census as matching cost
p1=18
p2=32
segmentation filter=80 pixel
median filter=5x5
GeForce GTX 980
03/09/18 | 72 | SGM_ROB | H | 2 | 18.4 (49) | 37.4 (74) | 5.31 (38) | 9.03 (40) | 14.2 (51) | 31.7 (49) | 14.3 (63) | 24.7 (52) | 12.6 (44) | 5.27 (42) | 31.8 (46) | 29.7 (54) | 24.9 (60) | 22.0 (55) | 18.6 (51) | 28.2 (52)
Anonymous. Learning to aggregate costs from multiple scanline optimizations in semi-global matching. ECCV 2018 submission 1093.
Using MCCNNacrt matching cost
Intel Xeon CPU E5-2697, 32GB RAM
03/11/18 | 73 | SGMForest | H | 2 | 7.37 (13) | 4.71 (14) | 3.69 (12) | 4.93 (9) | 3.18 (15) | 11.1 (21) | 5.37 (9) | 5.57 (16) | 5.81 (19) | 2.65 (13) | 14.5 (18) | 13.2 (13) | 13.1 (17) | 14.8 (20) | 5.63 (9) | 11.2 (19)
Anonymous. The domain transform solver. ECCV 2018 submission 1189.
MC-CNN on Titan X and DTS on 1080 Ti using CUDA.
03/14/18
74
DTS
H
2
13.4
39
8.45
34
7.54
53
7.46
29
5.50
27
14.9
30
10.2
35
24.5
50
25.1
67
4.93
38
19.2
25
18.7
32
14.6
25
15.9
27
13.0
39
17.2
32
Median disparity over all training images of the ROB 2018 stereo challenge.
This submission is a baseline for the Robust Vision Challenge (http://robustvision.net). Each pixel is set to the median disparity of the pixels at the same location in the training images. No test image information is used.
03/23/18
75
MEDIAN_ROB
H
2
97.8
112
96.1
111
95.6
111
99.0
112
98.4
112
98.4
111
99.2
112
98.4
112
98.1
111
99.0
112
99.0
112
99.6
112
99.9
112
94.7
112
95.1
111
98.3
111
Average disparity over all training images of the ROB 2018 stereo challenge.
This submission is a baseline for the Robust Vision Challenge (http://robustvision.net). Each pixel is set to the average disparity of the pixels at the same location in the training images. No test image information is used.
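The median and average ROB baselines both reduce the training disparity maps pixel by pixel and ignore the test image entirely. A minimal sketch, assuming all training maps have the same size and contain no invalid pixels (the descriptions do not say how differing sizes or missing ground truth are handled); the function name `pixelwise_baseline` is hypothetical.

```python
from statistics import mean, median


def pixelwise_baseline(training_disparities, reduce=median):
    """Combine same-sized disparity maps with a per-pixel reduction.

    training_disparities is a list of maps indexed as disp[y][x].
    Pass reduce=median or reduce=mean to mimic either baseline.
    """
    height = len(training_disparities[0])
    width = len(training_disparities[0][0])
    return [
        [reduce([disp[y][x] for disp in training_disparities]) for x in range(width)]
        for y in range(height)
    ]


# Three tiny 1x2 "training" disparity maps.
training_maps = [[[1, 10]], [[3, 20]], [[8, 30]]]
median_map = pixelwise_baseline(training_maps, reduce=median)
mean_map = pixelwise_baseline(training_maps, reduce=mean)
```

The same output would be submitted for every test image, which is why such entries serve as a lower bound for methods that actually use the input.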
A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments smaller than 50 pixels are removed.
Updated ELAS submission as a baseline for the Robust Vision Challenge (http://robustvision.net), replacing the original ELAS (H) entry.
Standard parameters as provided with the MiddEval3-SDK and the Robust Vision Challenge stereo devkit.
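The ELAS description above checks the left and right disparities for consistency. A minimal sketch of such a left-right check follows, assuming integer disparities, the convention that left pixel (y, x) with disparity d corresponds to right pixel (y, x - d), and `None` as the invalid marker. The function name and the `tolerance` parameter are illustrative assumptions, not ELAS's actual interface, and the removal of small segments is not shown.

```python
def left_right_check(disp_left, disp_right, tolerance=1):
    """Invalidate left disparities that the right disparity map contradicts.

    A left disparity d at (y, x) is kept when the right map at (y, x - d)
    holds a value within `tolerance` of d; otherwise it becomes None.
    """
    checked = []
    for y, row in enumerate(disp_left):
        out = []
        for x, d in enumerate(row):
            xr = x - d  # matching column in the right image
            ok = 0 <= xr < len(row) and abs(d - disp_right[y][xr]) <= tolerance
            out.append(d if ok else None)
        checked.append(out)
    return checked


example_left = [[0, 1, 1, 3]]
example_right = [[1, 1, 0, 0]]
strict = left_right_check(example_left, example_right, tolerance=0)
relaxed = left_right_check(example_left, example_right, tolerance=1)
```

Pixels that fail the check are often occluded in one view; they are usually filled in by a later interpolation step when a dense result is required.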
Anonymous. Depth from motion for smartphone AR. SIGGRAPH Asia 2018 submission 242.
Single core of a mobile phone (Qualcomm Snapdragon 821 Kryo @ 2.15 GHz)
07/31/18
98
MotionStereo
H
2
40.4
94
67.6
107
25.0
99
29.2
96
40.9
100
57.3
87
35.5
101
57.5
99
40.4
87
19.9
84
42.8
69
52.6
96
39.8
96
37.1
96
51.7
98
34.9
60
Anonymous. DISCO: Depth inference from stereo using context. ICME 2019 submission 3.
Python, TensorFlow
10/10/18
99
DISCO
H
2
24.5
62
35.0
67
7.34
52
11.7
54
18.7
61
48.6
77
17.1
68
31.4
64
22.4
62
9.33
65
46.0
74
33.5
60
27.5
71
24.6
63
27.5
68
55.0
87
Anonymous. Learning for disparity estimation through feature constancy. CVPR 2019 submission 2403.
C/C++, NVIDIA Titan Xp
10/29/18
101
iResNet
H
2
22.9
60
28.3
60
9.19
59
15.8
76
19.3
66
35.1
54
11.3
40
27.7
58
16.8
53
15.2
79
54.7
88
27.6
48
19.5
42
21.5
50
31.9
75
51.6
82
Anonymous. DenseCNN: Dense convolutional neural network for stereo matching with feature connected modules. Submitted to Neurocomputing 2018.
Lua/Torch7; 8 cores + GTX 1080 GPU
10/29/18
100
DenseCNN
H
2
7.98
18
5.59
21
4.54
25
5.83
16
2.79
9
10.4
16
5.78
13
8.26
22
8.84
25
2.66
14
15.6
20
14.2
21
13.2
18
13.2
8
6.30
14
11.1
17
C. He, C. Zhang, Z. Chen, and S. Jiang. Minimum spanning tree based stereo matching using image edge and brightness information. CISP-BMEI 2017.
We propose an MST-based stereo matching method that uses image edge and brightness information, because classical MST-based methods produce inaccurate matching weights near image boundaries and in background regions of similar color.
λ=0.8
C++; OpenCV 2.4.9; i7-8750; UHD 630
11/07/18
102
IEBIMst
H
2
33.8
86
36.7
73
12.1
74
16.9
78
32.5
88
51.0
79
25.3
87
58.1
101
49.8
97
11.2
72
48.6
77
56.9
100
30.2
78
26.8
73
26.9
67
71.7
99
Anonymous. Real-time on-demand deep stereo matching on high-resolution images. CVPR 2019 submission 2471.
Anonymous. Matching cost network without using stereo data. CVPR 2019 submission 5913.
We propose a stereo matching function that is based on a convolutional neural network and does not require domain data (left and right stereo images and ground truth). A method is proposed to generate a synthesized corresponding image patch for a given image patch. In addition, we propose a multi-window stereo matching network that is aware of the flattening effect in stereo matching. The proposed matching cost is refined using cross-scale aggregation, semi-global matching, and left-right consistency post-processing.
We propose four efficient feature extractors based on convolutional neural networks for stereo matching cost computation. Two of them generate multiscale features with diverse receptive field sizes. These multiscale features are used to compute the corresponding multiscale matching costs. We then determine an optimal cost by combining the multiscale costs using edge information. The other two feature extractors produce uniscale features by combining multiscale features directly through fully connected layers. Finally, after obtaining matching costs using one of the four extractors, we determine optimal disparities based on cross-based cost aggregation and semi-global matching.
We design a fully convolutional network that generates the disparity map as a regression problem. Pyramid pooling and skip connections are applied to integrate hierarchical context information.
lr=0.001
Python, GTX 1080 Ti GPU
01/12/19
109
EHCI_net
H
2
9.47
25
3.75
8
4.27
20
13.1
58
27.6
77
5.30
3
3.23
1
3.47
5
3.18
3
3.90
32
9.20
4
9.58
4
9.26
8
13.9
11
17.3
49
10.6
12
Anonymous. 3D fill with minimum spanning trees by rolling filter for stereo matching. ISMAR 2019 submission.
Threshold=0.5
C++; i7 core @ 3.40 GHz
01/16/19
110
3DFMR
H
2
5.44
4
4.10
10
3.37
9
2.99
2
2.95
12
7.63
10
4.55
8
3.26
1
3.95
8
2.16
3
10.2
5
8.28
3
6.37
1
13.2
7
5.86
11
8.46
2
Anonymous. Stereo matching with fusing adaptive support weights. Submitted to IEEE Access, 2019.
C++; Intel Core i5-6500 @ 3.2 GHz
01/17/19
111
FASW
Q
1
28.6
76
41.7
78
18.1
84
23.1
86
27.2
76
40.6
62
19.1
74
34.9
68
28.1
71
18.5
83
40.8
66
36.4
69
29.3
77
28.4
75
31.1
73
41.0
66
Z. Liang, Y. Guo, Y. Feng, Y. Lei, Q. Wang, and X. Chen. Stereo matching using multilevel cost volume and multiscale feature constancy. Submitted to PAMI 2019.