OpenCV 2.4.8 StereoSGBM method, full variant (2 passes). Reimplementation of H. Hirschmüller's SGM method (CVPR 2006; PAMI 2008).
OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle the quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
SAD window: 3x3 pixel
Truncation value for pre-filter: 63
P1/P2: 8*3*3*3/32*3*3*3
Uniqueness ratio: 10
Speckle window size: 100
Speckle range: 32
Full DP: true
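As a rough illustration of the settings above, the sketch below computes the P1/P2 penalties and runs the SGM cost recurrence along a single 1D path. Reading 8*3*3*3 as 8 x channels x window x window is an assumption, and the names `sgbm_penalties` and `aggregate_path` are hypothetical. This is plain Python for exposition, not OpenCV's implementation.

```python
def sgbm_penalties(channels=3, window=3):
    """Return (P1, P2) following the 8*cn*w*w / 32*cn*w*w pattern (assumed reading)."""
    area = channels * window * window
    return 8 * area, 32 * area


def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one 1D path with the SGM recurrence.

    costs[i][d] is the matching cost of the i-th pixel on the path at disparity d.
    """
    num_disp = len(costs[0])
    prev = list(costs[0])  # first pixel on the path has no predecessor
    out = [prev]
    for pixel_costs in costs[1:]:
        prev_min = min(prev)
        cur = []
        for d in range(num_disp):
            # Same disparity is free, a +/-1 change costs P1, any jump costs P2.
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < num_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            cur.append(pixel_costs[d] + min(candidates) - prev_min)
        out.append(cur)
        prev = cur
    return out
```

In a full SGM pipeline the per-path results would be summed over all directions (8 in the two-pass variant described here) before picking the minimum-cost disparity per pixel.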
C/C++; 1 core, i7@3.3 GHz
07/25/14 | 1 | SGBM2 | Q | 1 | 26.4 (208) | 27.9 (199) | 12.1 (217) | 17.8 (228) | 13.7 (174) | 74.5 (266) | 14.0 (191) | 30.3 (197) | 26.3 (201) | 11.0 (212) | 64.4 (265) | 37.9 (223) | 25.8 (206) | 25.3 (209) | 29.3 (204) | 43.7 (216)
OpenCV 2.4.8 StereoSGBM method, single-pass variant. Reimplementation and modification of H. Hirschmüller's SGM method (CVPR 2006; PAMI 2008).
OpenCV's "semi-global block matching" method; memory efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
SAD window: 3x3 pixel
Truncation value for pre-filter: 63
P1/P2: 8*3*3*3/32*3*3*3
Uniqueness ratio: 10
Speckle window size: 100
Speckle range: 32
Full DP: false
The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.
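A minimal sketch of the Census transform and Hamming-distance cost described above, assuming a grey-value image given as a list of rows. The bit ordering, the "darker than centre" comparison, and the zero code for border pixels are illustrative choices, not details taken from the submission.

```python
def census_transform(image, radius=1):
    """Return one integer bit code per pixel: bit = 1 where a neighbour is darker than the centre.

    Border pixels without a full window get code 0 for simplicity.
    """
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            centre = image[y][x]
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre pixel is not compared with itself
                    code = (code << 1) | int(image[y + dy][x + dx] < centre)
            codes[y][x] = code
    return codes


def hamming(a, b):
    """Number of differing bits between two census codes (the pixelwise matching cost)."""
    return bin(a ^ b).count("1")
```

Because only the ordering of intensities inside the window matters, the resulting cost is insensitive to monotonic brightness changes between the two views.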
C/C++; 1 core, i7@3.3 GHz
07/25/14 | 4 | SGBM1 | F | 3 | 28.4 (216) | 43.5 (248) | 9.09 (187) | 13.6 (201) | 25.9 (227) | 82.0 (270) | 14.4 (197) | 43.4 (231) | 30.3 (214) | 5.98 (177) | 59.3 (255) | 45.8 (242) | 28.5 (224) | 24.9 (206) | 20.1 (183) | 45.9 (221)
Correlation with five, partly overlapping windows on Census transformed images using Hamming distance as matching cost. A left-right consistency check ensures unique matches and filtering small disparity segments removes outliers. Interpolation is done within image rows with the lowest, valid neighboring disparity.
Census window: 7x7 pixel
Correlation window: 9x9 pixel
LR-check: on
Min. segments: 200 pixel
Interpolation: horizontal, lowest neighbor
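The left-right check and the "horizontal, lowest neighbor" interpolation listed above can be sketched for a single image row as follows. The tolerance `max_diff`, the `INVALID` marker, and both function names are assumptions made for illustration.

```python
INVALID = -1  # hypothetical marker for pixels without a valid disparity


def left_right_check(disp_left, disp_right, max_diff=1):
    """Invalidate left disparities that the right-view disparities do not confirm."""
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # column of the matching pixel in the right image
        ok = (
            d >= 0
            and 0 <= xr < len(disp_right)
            and disp_right[xr] >= 0
            and abs(disp_right[xr] - d) <= max_diff
        )
        checked.append(d if ok else INVALID)
    return checked


def fill_row_lowest_neighbour(row):
    """Fill invalid pixels with the lower of the nearest valid disparities to the left and right."""
    filled = list(row)
    for x, d in enumerate(row):
        if d != INVALID:
            continue
        left = next((row[i] for i in range(x - 1, -1, -1) if row[i] != INVALID), None)
        right = next((row[i] for i in range(x + 1, len(row)) if row[i] != INVALID), None)
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            filled[x] = min(neighbours)
    return filled
```

Choosing the lower disparity favours the background surface, which is usually the right guess for holes caused by occlusion.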
A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.
nRounds=3
The full set of parameters is listed in the paper and the supplemental materials on the project webpage.
C++; Core2 Duo, 2 cores @ 3 GHz
08/27/14 | 10 | LPS | F | 3 | 20.3 (188) | 6.72 (99) | 6.06 (146) | 9.72 (164) | 9.87 (149) | 94.3 (275) | 14.1 (192) | 11.2 (119) | 11.2 (129) | 5.88 (173) | 89.3 (276) | 36.0 (209) | 20.5 (166) | 23.8 (199) | 16.0 (158) | 25.4 (162)
Kang Zhang, Jiyang Li, Yijing Li, Weidong Hu, Lifeng Sun, and Shiqiang Yang. Binary stereo matching. ICPR 2012.
No post-processing is used.
Same as in the original paper.
C/C++ single thread Intel(R) Core(TM)2 Duo CPU P7370 @ 2.00GHz
This approach is an adaptive local stereo method. It is integrated into a hierarchical scheme that exploits adaptive windows. Sub-pixel disparities are estimated but not refined.
L = 10
t = 35
medianK = [3 3]
censusK = [9 7]
lambda = 45;
Block-matching stereo with the Summed Normalized Cross-Correlation (SNCC) measure. Standard post-processing is applied, including a left-right check, error island removal (region growing), hole-filling, and median filtering.
SNCC (first stage 3x3, second stage 11x11)
min correlation threshold = 0.3
region growing threshold = 2.5 disparity
min region size = 200 pixel
median filter = 1x5 and 5x1
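To make the two SNCC stages concrete, here is a small sketch: a first stage that computes normalized cross-correlation on small patches, and a second stage that averages those scores over a larger window. Patches are passed as flat lists, and treating a flat (zero-variance) patch as score 0 is an assumption of this sketch rather than something stated above.

```python
from math import sqrt


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized flat patches, in [-1, 1]."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    var_a = sum((a - mean_a) ** 2 for a in patch_a)
    var_b = sum((b - mean_b) ** 2 for b in patch_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat patch: correlation is undefined, treated as no evidence
    return cov / sqrt(var_a * var_b)


def summed_ncc(ncc_values):
    """Second stage: average the first-stage NCC scores that fall inside the larger window."""
    return sum(ncc_values) / len(ncc_values)
```

With the settings listed above, `ncc` would be evaluated on 3x3 patches and `summed_ncc` over the 11x11 neighbourhood of those scores; matches scoring below the 0.3 correlation threshold would then be rejected.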
Efficient two-pass aggregation with census/gradient cost metric, followed by iterative cost penalization and disparity re-selection to encourage local smoothness of disparities.
census window size = 9 x 7
max census distance = 38.03
max gradient difference = 2.51
census/gradient balance = 0.09
aggregation window size = 33 x 33
aggregation range parameter = 23.39
aggregation spatial parameter = 7.69
refinement window size = 65 x 65
refinement range parameter = 11.30
refinement spatial parameter = 17.20
cost penalty coefficient = 0.0023
median filter window size = 3 x 3
3 iterations of refinement
confidence threshold of 0.1 for sparse maps
In stereo matching, cost filtering methods and energy minimization algorithms are considered two different techniques. Due to their global extent, energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, where cost filtering approaches obtain better results. In this paper we combine both approaches with the aim of improving overall stereo matching results.
We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.
Only gradient component (6D vector) of color images is used
A local matching technique utilizing SAD+Census cost measure and a recursive edge-aware aggregation through Successive Weighted Summation. Occlusion handling is provided via left-right cross check and a background favored filling.
smoothness parameter sigma = 24
5x5 Census window, Census weight=0.7, SAD weight=0.3, occlusion threshold=2
This approach triangulates the polygonized SLIC segmentations of the input images and optimizes a lower-layer MRF on the resulting set of triangles defined by photo consistency and normal smoothness. The lower-layer MRF is solved by a quadratic relaxation method which iterates between PatchMatch and Cholesky decomposition. The lower-layer MRF is assisted by an upper-layer MRF defined on the set of triangle vertices, which exploits local 'visual complexity' cues and encourages smoothness of the vertices' splitting properties. The two layers interact through an Alignment energy term which requires triangles sharing a non-split vertex to have their disparities agree on that vertex. Optimization of the whole model alternates between optimizations of the two layers until convergence, where the upper layer can be solved in closed form.
omega=0.2
tau_grad=15
theta goes from 0 to 100 by smoothstep function in ten iterations
gamma1=30
gamma2=60
gamma3=0.8
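The theta schedule mentioned in the parameter list can be sketched as below, assuming the standard cubic smoothstep 3t^2 - 2t^3 sampled at ten evenly spaced points that include both endpoints. The exact sampling used by the authors is not stated, so this is one plausible reading only.

```python
def smoothstep(t):
    """Cubic smoothstep on [0, 1]: zero slope at both ends."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def theta_schedule(theta_max=100.0, iterations=10):
    """Theta value per iteration, rising from 0 to theta_max (assumed sampling)."""
    return [theta_max * smoothstep(i / (iterations - 1)) for i in range(iterations)]
```

The schedule starts and ends gently, so the coupling strength changes slowly in the first and last iterations and fastest in the middle.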
Compute the matching cost with a convolutional neural network (accurate architecture). Then apply cross-based cost aggregation, semiglobal matching, left-right consistency check, median filter, and a bilateral filter.
DETAILS:
The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector, which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number, the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.
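The layer sizes quoted above can be checked with a little arithmetic. The sketch assumes the 3 x 3 convolutions are unpadded with stride 1, which is what makes an 11 x 11 patch collapse to a single 112-length vector. The weight count ignores biases and covers only the fully-connected part.

```python
def valid_conv_output(size, kernel=3, layers=5):
    """Spatial size after `layers` unpadded, stride-1 convolutions."""
    for _ in range(layers):
        size -= kernel - 1
    return size


patch = valid_conv_output(11)        # 11 -> 9 -> 7 -> 5 -> 3 -> 1
feature_len = 112 * patch * patch    # one feature vector per input patch
concat_len = 2 * feature_len         # left and right features concatenated
fc_sizes = [concat_len, 384, 384, 384, 1]
# Weights of the four fully-connected layers, biases excluded.
fc_weights = sum(a * b for a, b in zip(fc_sizes, fc_sizes[1:]))
```

Under these assumptions the five convolutions consume exactly the 11 x 11 input, which is consistent with the 112-length and 224-length vectors given in the description.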
A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed. (Improved results as of 9/14/2015 due to bug fix in color-to-gray conversion.)
Standard parameters of Libelas as provided with the MiddEval3-SDK.
The method generates multiple proposals on absolute and relative disparities from multi-segmentations. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes.
We post-process the depth maps produced by Zbontar & LeCun's MC-CNN technique. We use a domain transform to compute an edge-aware variance measure of our confidence in the depth map, and then run our robust bilateral solver on that depth map and confidence with a Geman-McClure loss function.
The MC-CNN is computed using the publicly available implementation (https://github.com/jzbontar/mc-cnn), which uses the GPU. The robust bilateral solver is computed using our CPU implementation, which does not use the GPU and is written in vanilla C++.
Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz, 6 cores; 32 GB RAM; NVIDIA GTX TITAN X
This paper proposes a new image-guided non-local dense matching method with a three-step optimization based on the combination of image-guided methods and energy function-guided methods.
Cost Computation:
Window Size: 5
Weighting Coefficient: 0.3
Truncation Threshold (Census): 15
Truncation Threshold (HOG): 1
Image-guided Non-local Matching:
Smooth Term: 6
Penalty Term P1: 0.3
Penalty Term P2: 3
Disparity Interpolation:
Truncation Threshold: 5
Smooth Term: 3
Penalty Term P1: 3
Penalty Term P2: 30
Function Base: 5
An efficient stereo matching algorithm, which applies adaptive smoothness constraints using texture and edge information, is proposed in this work. First, we determine non-textured regions, on which an input image yields flat pixel values. In the non-textured regions, we penalize depth discontinuity and complement the primary CNN-based matching cost with a color-based cost. Second, by combining two edge maps from the input image and a pre-estimated disparity map, we extract denoised edges that correspond to depth discontinuity with high probabilities. Thus, near the denoised edges, we penalize small differences of neighboring disparities.
The method uses the MC-CNN code for the matching cost computation only.
Compute the matching cost with a convolutional neural network (fast architecture). Then apply cross-based cost aggregation, semiglobal matching, left-right consistency check, median filter, and a bilateral filter.
Our approach is an extension of the ELAS (from Geiger et al.) algorithm. We extract edges and sample our candidate support points along them. For every two consecutive valid support points we create a (straight) line segment. We force the triangulation to include the set of line segments (constrained Delaunay) for a better preservation of the disparity discontinuity at the edges.
Parameters as in the original ELAS algorithm.
For sampling candidate support points along the edge segments:
Adaptive sampling activated:
step = ceil(sqrt(img_diag)*0.5);
sampler(sqrt(step) / 2, step / 2, step / 2);
The computation of the sparse disparity maps is achieved by means of a 3D diffusion of the costs contained in the disparity space volume. The watershed segmentations of the left and right views control the diffusion process and valid measurements are obtained by cross-checking.
The estimation of the dense disparity maps uses the sparse measurements as control points and is driven by a 3D watershed separating the disparity space volume into foreground and background pixels.
No post processing (no filtering, no hole-filling, no interpolation) performed.
The concepts of intrinsic curves were revisited and used for:
- disparity search space reduction, resulting in 83% reduction of the disparity range (individually for each pixel) directly from the original resolution of the image without needing hierarchical search
- reducing the ambiguities due to occluded pixels by integrating occlusion clues explicitly into the global energy function as a soft prior
The final energy minimization was done using a semi-global approach along eight paths.
Matching (data) cost = census transform, 7x9 window
Occlusion cost = derived from the curvature of the intrinsic curves
Incorporating cues from top-down (holistic) scene understanding into existing bottom-up stereo reconstruction techniques (CoR - Chakrabarti et al. CVPR 2015).
Learned weightings (from 2006 dataset) for High Level Scene Cues. Default parameters for CoR. Images with max disp > 256 were downsampled before the SGM step of CoR.
A 3D label based method with global optimization at pixel level. A bilayer matching cost is employed by first matching small square windows and then aggregating over large irregular windows. Global optimization is carried out by fusing candidate proposals, which are generated from our specific superpixel structure.
We propose a method that incorporates a surface normal constraint predicted by deep learning. With reliable disparities selected from a stereo matching method and an effective edge fusion strategy, we can faithfully convert the predicted surface normal map to a disparity map by solving a least squares system that maintains discontinuities. We use the raw matching cost of MC-CNN.
A novel pooling scheme is used to train a matching cost function with a CNN. It widens the size of receptive field effectively without losing the fine details.
The overall post-processing pipeline is kept almost the same as in the original MC-CNN-acrt, except that the parameter settings are changed as follows:
cbca_num_iterations_1 = 0, cbca_num_iterations_2 = 1, sgm_P1 = 1.3, sgm_P2 = 17.0, sgm_Q1 = 3.6, sgm_Q2 = 36.0, and sgm_V = 1.4.
Torch; Intel Core i7-4790K CPU and a single Nvidia GeForce GTX Titan X GPU
An energy minimization framework for disparity estimation in which the energy function consists of an intensity matching cost, a feature matching cost, an IGMRF prior, and sparsity priors.
This is a new weakly supervised method that learns a deep metric for stereo reconstruction from unlabeled stereo images, given coarse information about the scenes and the optical system. The deep metric architecture is similar to MC-CNN fst.
This is a segmentation-based stereo matching algorithm using an adaptive multi-cost approach, which is exploited to obtain accurate disparity maps.
We propose a cost aggregation method that efficiently weaves together MST-based support region filtering and PatchMatch-based 3D label search. We use the raw matching cost of MC-CNN.
We propose a novel method for stereo estimation, combining advantages of convolutional neural networks (CNNs) and optimization-based approaches. The optimization, posed as a conditional random field (CRF), takes local matching costs and consistency-enforcing (smoothness) costs as inputs, both estimated by CNN blocks. To perform the inference in the CRF we use an approach based on linear programming relaxation with a fixed number of iterations. We address the challenging problem of training this hybrid model end-to-end. We show that in the discriminative formulation (structured support vector machine) the training is practically feasible. The trained hybrid model with shallow CNNs is comparable to state-of-the-art deep models in both time and performance. The optimization part efficiently replaces sophisticated and not jointly trainable (but commonly applied) post-processing steps by a trainable, well-understood model.
Our method is a local matching approach that uses the Guided Filter for cost aggregation. An appropriate Guided Filter size is assigned to each pixel of the input image by the Filter Size Map, which is computed using the DoG kernel.
Parameters for Filter Size Map computation:
DoGparam.scalesize = 25 (index of scale space)
DoGparam.mfsize = 1 (window size for Filter Size Map optimization)
Parameters for Guided Filter:
eps = 0.001
Parameters for cost computation:
gamma = 0.11 (Weight of cost)
Parameters for Bilateral Filter in disparity map optimization:
gamma_c = 1
gamma_d = 11
r_median = 19
We propose local expansion moves for estimating dense 3D labels on a pairwise MRF. The data term uses a PatchMatch-like 3D slanted window formulation, where raw matching costs within a window are computed by MC-CNN-acrt and aggregated using guided image filtering. The smoothness term uses a pairwise curvature regularization term by Olsson et al. 2013.
We propose a feature ensemble network that leverages deep convolutional neural networks to perform matching cost computation and disparity refinement. For matching cost computation, a patch-based network architecture with multi-size and multi-layer pooling units is adopted to learn cross-scale feature representations. For disparity refinement, the initial optimal and sub-optimal disparity maps are incorporated and diverse base learners are applied.
We propose a robust learning-based method for stereo cost volume computation. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that our matching volume estimation method achieves similar accuracy to purely data-driven alternatives and that it generalizes to unseen data much better. In fact, we used the same model trained on Middlebury 2014 dataset to submit to the KITTI and ETH3D benchmarks.
We extend the standard BP sequential technique to the fully connected CRF models with the geodesic distance affinity.
Also, a new approach to the BP marginal solution is proposed, which we call one-view-occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution finds occluded regions in the disparity map and simultaneously improves the matching result.
As a result we can perform only one energy minimization process and avoid the cost calculation for the second view and the left-right check procedure.
All parameter settings are given in the C++ MS VS project available at the project website.
We propose a stereo matching algorithm that directly refines the winner-take-all (WTA) disparity map by exploring its statistical significance. WTA disparity maps are obtained from the pre-computed raw matching costs of MC-CNN-acrt.
Semi-Global Matching (SGM) uses an aggregation scheme to combine costs from multiple 1D scanline optimizations that tends to hurt its accuracy in difficult scenarios. We propose replacing this aggregation scheme with a new learning-based method that fuses disparity proposals estimated using scanline optimization. Our proposed SGM-Forest algorithm solves this problem using per-pixel classification. SGM-Forest currently ranks 1st on the ETH3D stereo benchmark and is ranked competitively on the Middlebury 2014 and KITTI 2015 benchmarks. It consistently outperforms SGM in challenging settings and under difficult training protocols that demonstrate robust generalization, while adding only a small computational overhead to SGM.
Median disparity over all training images of the ROB 2018 stereo challenge.
This submission is a baseline for the Robust Vision Challenge ROB 2018. Each pixel is set to the median disparity of the pixels at the same location in the training images. No test image information is used.
03/23/18 | 62 | MEDIAN_ROB | H | 2 | 97.8 (280) | 96.1 (279) | 95.6 (279) | 99.0 (280) | 98.4 (280) | 98.4 (279) | 99.2 (280) | 98.4 (280) | 98.1 (279) | 99.0 (280) | 99.0 (280) | 99.6 (280) | 99.9 (280) | 94.7 (280) | 95.1 (279) | 98.3 (279)
Average disparity over all training images of the ROB 2018 stereo challenge.
This submission is a baseline for the Robust Vision Challenge ROB 2018. Each pixel is set to the average disparity of the pixels at the same location in the training images. No test image information is used.
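Both ROB 2018 baselines (median and average) reduce a stack of training disparity maps pixel by pixel. A minimal sketch, assuming all maps have already been brought to the same size and contain no invalid pixels; how the actual baselines handled differing image sizes or missing ground truth is not stated here.

```python
from statistics import mean, median


def per_pixel_baseline(training_disparities, reducer=median):
    """Combine same-sized disparity maps pixel by pixel with `reducer`."""
    height = len(training_disparities[0])
    width = len(training_disparities[0][0])
    return [
        [reducer([disp[y][x] for disp in training_disparities]) for x in range(width)]
        for y in range(height)
    ]
```

Passing `reducer=mean` gives the average-disparity variant. In either case the output is the same for every test image, since no test image information is used.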
A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed.
Updated ELAS submission as a baseline for the Robust Vision Challenge (http://robustvision.net), replacing the original ELAS (H) entry.
Standard parameters as provided with the MiddEval3-SDK and the Robust Vision Challenge stereo devkit.
A modification of the FlowNet 2 architecture [1] for the Robust Vision 2018 Stereo Challenge.
[1] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox. Flownet 2.0: Evolution of optical flow estimation with deep networks. CVPR 2017.
See paper.
GTX1070
05/22/18 | 68 | DN-CSS_ROB | H | 2 | 22.8 (194) | 31.4 (209) | 9.28 (192) | 13.5 (199) | 12.4 (165) | 44.3 (222) | 12.1 (165) | 28.1 (191) | 17.6 (171) | 9.11 (203) | 50.9 (234) | 40.0 (229) | 21.2 (172) | 25.0 (207) | 31.9 (214) | 43.2 (213)
Jie Li, Penglei Ji, and Xinguo Liu. Superpixel alpha-expansion and normal adjustment for stereo matching. Proceeding of CAD/Graphics 2019.
C/C++; Core i7-7700 @ 3.6 GHz
05/26/18 | 69 | NOSS_ROB | H | 2 | 5.01 (45) | 3.57 (28) | 2.84 (34) | 3.99 (59) | 1.93 (31) | 5.15 (24) | 3.34 (56) | 3.32 (44) | 3.15 (46) | 2.32 (68) | 8.55 (50) | 7.45 (33) | 7.06 (56) | 12.5 (66) | 5.20 (60) | 9.06 (60)
Benedikt Wiberg. Stereo matching with neural networks. Bachelor's thesis, TU Munich 2018. ROB 2018 entry.
Neural Network based on a multidimensional similarity metric and Deeplab v3+
Numerous CNN algorithms focus on the pixel-wise matching cost computation, which is an important building block for many state-of-the-art algorithms. However, these architectures are limited to small, single-scale receptive fields and use traditional methods for cost aggregation or even ignore cost aggregation. In this paper, we propose a novel architecture called cascaded multi-scale and multi-dimension network (MSMD) to take both into consideration. First, we propose a new multi-scale matching cost computation sub-network, in which two different sizes of receptive fields are implemented in parallel. In this way, the network can make the best use of both variants to balance the trade-off between the increase of receptive field and the loss of details. Furthermore, we show that our multi-dimension aggregation sub-network, which contains 2D convolution and 3D convolution operations, can provide rich context and semantic information for estimating an accurate initial disparity.
A robust solution for semi-dense stereo matching is presented. It utilizes two CNN models for computing stereo matching cost and performing confidence-based filtering, respectively. Compared to existing CNN-based matching cost generation approaches, our method feeds additional global information into the network so that the learned model can better handle challenging cases, such as lighting changes and lack of texture. Through utilizing non-parametric transforms, our method is also more self-reliant than most existing semi-dense stereo approaches, which rely heavily on the adjustment of parameters.
Matlab, GTX1080Ti, Lua, Python
06/27/18 | 75 | DCNN | H | 2 | 10.9 (121) | 5.66 (81) | 4.98 (118) | 6.49 (122) | 5.73 (100) | 12.5 (111) | 8.51 (135) | 15.6 (138) | 10.9 (128) | 3.08 (100) | 24.1 (154) | 20.2 (140) | 16.8 (148) | 15.5 (113) | 10.3 (121) | 13.8 (115)
Julien Valentin, Adarsh Kowdle, Jonathan Barron, et al. Depth from motion for smartphone AR. ACM TOG 37(6):193 (Proc. of SIGGRAPH Asia), 2018.
Single core of a mobile phone (Qualcomm Snapdragon 821 Kryo @ 2.15 GHz)
We propose an MST-based stereo matching method that uses image edge and brightness information, because classical MST-based methods produce inaccurate matching weights near image boundaries and in regions with similarly colored backgrounds.
We propose four efficient feature extractors based on convolutional neural networks for stereo matching cost computation. Two of them generate multiscale features with diverse receptive field sizes. These multiscale features are used to compute the corresponding multiscale matching costs. We then determine an optimal cost by combining the multiscale costs using edge information. On the other hand, the other two feature extractors produce uni-scale features by combining multiscale features directly through fully connected layers. Finally, after obtaining matching costs using one of the four extractors, we determine optimal disparities based on the cross-based cost aggregation and the semiglobal matching.
We design a fully convolutional network that generates the disparity map as a regression problem, applying pyramid pooling and skip connections to integrate hierarchical context information.
The method comprises two main steps. First, we use adaptive support weights for local matching. Apart from the color similarity and geometric distance, the adaptive weight distribution favors pixels in the block matching with smaller cost. Besides, we use a multiscale strategy with invalidation criteria to reduce match ambiguity and computational time.
Second, a global interpolation using a variational formulation is carried out. The energy functional penalizes deviations from the local disparity estimation at different scales.
Local approach (DAWA): 23x23 squared window, beta=11, lambda=6, gamma=4, pixel precision 1/4, three scales for multiscale procedure.
Variational model: alpha=1, gamma=5, phi1=30, phi2=15.
Stereo matching has attracted numerous studies in recent years. The process is difficult because inaccurate disparity maps lead to visual discomfort. Following the multistage framework used by most stereo matching algorithms (the taxonomy of D. Scharstein and R. Szeliski), this paper proposes an improved stereo matching algorithm that uses an Adaptive Weighted Bilateral Filter as the main filter in the cost aggregation stage, which preserves edges and is robust in plain-colour regions. In the matching cost computation stage, the window size of the sum of absolute differences (SAD) and the thresholds are adjusted, and a Median Filter is used as the main filter in the disparity refinement stage to overcome limitations in disparity map accuracy. Evaluation on the indoor scenes of the latest (2014) Middlebury dataset shows that the Adaptive Weighted Bilateral Filter in the proposed algorithm yields smooth disparity maps and a good processing time.
This paper presents a novel unsupervised stereo matching cost for stereo matching. Specifically, a novel two-branch convolutional sparse coding (CSC) is used to learn the convolution filter bank without ground truth disparity maps. Then, the sparse representations over the learned convolutional filter bank are utilized to measure the similarity between image patches, namely, the stereo matching cost can be computed by measuring the l1 distance between sparse representations of image patches.
Hierarchical MGM-16 in which coarser-level results limit the per-pixel disparity search range. Post-processing at each level includes a joint bilateral filter, peak removal, and a consistency check. The final disparity maps are interpolated using discontinuity-preserving interpolation.
See Paper
C/C++; Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 16 Cores
In this work, we propose a learning-based method to denoise and refine disparity maps of a given stereo method. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method learns a robust collaborative regularizer leveraging the joint statistics of the color image, the confidence map and the disparity map. Due to the variational structure of our method, the individual steps can be easily visualized, thus enabling interpretability of the method. We can therefore provide interesting insights into how our method refines and denoises disparity maps. The efficiency of our method is demonstrated on the publicly available stereo benchmarks Middlebury 2014 and KITTI 2015.
We have collected 2000 pairs of stereo images with high accuracy disparity maps to fine-tune the network. Our goal is to improve the generalization performance of networks.
fine-tune num: 90000; the initial learning rate: 1e-3.
We propose "DeepPruner", a real-time stereo matching algorithm, which combines the strength of deep network and search space pruning techniques. Towards this goal, we developed a differentiable PatchMatch module that allows us to discard most disparities and generates a sparse representation of the cost-volume. We then exploit this representation to learn which range to prune for each pixel. Our method achieves competitive results on KITTI / SceneFlow datasets while running in real-time at 62ms. Moreover, we obtain the first place (on overall rankings) on the Robust Vision Challenge. For more details, check out our paper and source code.
It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
We formulate the scale transformation of the cost volume as Bayesian inference and propose the inter-scale subnetwork to reliably and adaptively generate details under the guidance of geometric information.
We fine-tune the model pre-trained on Scene Flow for 300 epochs with a learning rate of 0.001 in the first 100 epochs and 0.0001 in the rest 1000 epochs.
A novel encoding pattern, designed for situations with radiometric distortion, is proposed. The pattern is applied to the stereo matching cost function.
The method is based on a Max-tree hierarchical representation of image pairs, which we use to identify matching regions along image scan-lines.
The number of color quantization levels was set to 16. α was set to 0.8. The minimum (or maximum) width of nodes to be matched was set to 0 (or 1/2 of the input image width). Matched node levels S was set to {1, 0}. The maximum neighborhood size ω_γ was set to 10. The size of the Gaussian kernel used to aggregate the cost volume was 21. The minimum confidence percentage parameter ω_Π was set to 12. In guided pixel refinement, ω_ω was set to 12% when sparse disparity maps were generated.
A deep-learning model PSMNU, modified based on PSMNet, produces initial disparity and uncertainty on the down-sampled image. SGBMP performs full resolution prediction based on the initial disparity and uncertainty.
PSMNU: max disparity 256, trained on Scene Flow dataset (Flyingthings3D & Monkaa) only, without data augmentation. SGBMP: \lambda_b = 3, \lambda_s = 0.1, \lambda_d = 0.1. For the initial prediction of PSMNU, images are down-sampled to 768x1024.
The algorithm is based on a hierarchical representation of image pairs which is used to restrict disparity search range. We propose a cost function that takes into account region contextual information and a cost aggregation method that preserves disparity borders.
We use robust statistics and probability to detect and refine outliers in disparity maps by leveraging the joint statistics of the given disparity map and its reference image.
lamda=1,r1=5,r2=25, sigma=10,tho_d=1, tho_s=4
Matlab Intel® Core™ i7-4600U CPU
05/14/20
110
SRM
H
2
13.1
145
8.50
122
7.04
164
7.86
143
7.73
131
16.1
130
7.90
127
18.4
154
18.5
175
5.03
151
22.3
145
20.0
137
18.1
160
18.5
151
11.3
129
19.3
144
Haoyu Ren, Mostafa El-Khamy, and Jungwon Lee. Stereo disparity estimation via joint supervised, unsupervised, and weakly supervised learning. ICIP 2020.
We propose a novel stereo matching algorithm with fuzzy logic and also implement it on an FPGA embedded system. We try to select the best SAD window size for each pixel by leveraging fuzzy logic.
We used blocks of different sizes (e.g., 5, 15, 21).
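For orientation, the following is a minimal sketch of SAD (sum of absolute differences) block matching at a single pixel with a configurable window size. It only illustrates the fixed-window cost that a per-pixel window selection would choose between; the fuzzy-logic selection rule and the FPGA implementation are not reproduced, and the function names and border clamping are assumptions.

```python
def sad_cost(left, right, y, x, d, half):
    """SAD over a (2*half+1) x (2*half+1) window; coordinates are clamped at borders."""
    h, w = len(left), len(left[0])
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            yy = min(max(y + dy, 0), h - 1)
            xl = min(max(x + dx, 0), w - 1)
            xr = min(max(x + dx - d, 0), w - 1)  # left pixel x matches right pixel x - d
            total += abs(left[yy][xl] - right[yy][xr])
    return total


def best_disparity(left, right, y, x, max_disp, half):
    """Winner-takes-all disparity in [0, max_disp] for one pixel."""
    costs = [sad_cost(left, right, y, x, d, half) for d in range(max_disp + 1)]
    return min(range(len(costs)), key=costs.__getitem__)


# Toy image pair: the right image is the left image shifted by 2 pixels.
row = [0, 5, 1, 9, 3, 12, 7, 20, 2, 15, 8, 30]
left = [list(row) for _ in range(5)]
right = [[row[x + 2] if x + 2 < len(row) else 0 for x in range(len(row))] for _ in range(5)]
disp = best_disparity(left, right, y=2, x=6, max_disp=4, half=1)
```

A block size of 5, 15, or 21 as listed above corresponds to `half` = 2, 7, or 10 in this sketch.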
Cyclone V 5CSEBA6U23I7 FPGA
06/08/20
115
MANE
H
2
30.9
228
54.7
268
11.5
213
14.6
208
29.4
237
52.6
243
26.4
241
45.1
236
31.5
219
11.5
215
42.5
203
41.8
234
33.1
245
31.6
239
34.2
216
43.5
214
Xianjing Cheng and Yong Zhao. HLocalExp-CM: Confidence map by hierarchical local expansion moves for stereo matching. To appear in Journal of Electronic Imaging, 2022.
GA-Net reference submission as baseline for the stereo benchmark of the robust vision challenge 2020.
All method credits go to the original authors (Zhang et al.).
Submission by Nicolas Jourdan, TU Darmstadt, RVC 2020 team.
Trained on Middlebury, KITTI, and ETH3D, starting from the KITTI checkpoint made available in the GANet repository on GitHub by the original authors.
Frequency of sampling was adapted to the dataset size. Test images scaled to next multiple of 48.
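Scaling a test image to the next multiple of 48 amounts to rounding each dimension up. The helper below is a small sketch of that arithmetic; the function name is hypothetical and the actual resizing (interpolation or padding) used in the submission is not specified here.

```python
def next_multiple(n, base=48):
    """Smallest multiple of `base` that is >= n."""
    return ((n + base - 1) // base) * base


# Example: target size for a 1000 x 1500 image (sizes chosen for illustration).
target_h = next_multiple(1000)
target_w = next_multiple(1500)
```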
We proposed a robust disparity estimation network. Our major novelty compared to existing work is a novel usage of attention, which can handle scenes with different scenarios.
The RVC submission was trained on quarter-resolution Middlebury + KITTI + ETH. After validation, we go with quarter resolution instead of half resolution.
Accurate disparity prediction is an active topic in computer vision, and efficiently exploiting contextual information is key to improving performance. In this paper, we propose a simple yet effective non-local context attention network (NLCANet) that exploits global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning (GFL) module to obtain a more discriminative representation by taking advantage of multi-scale features and forming them into a variance-based cost volume. Then, we construct a non-local attention matching (NLAM) module using the non-local block and hierarchical 3D convolutions, which effectively regularizes the cost volume and captures global contextual information. Finally, we adopt a geometry refinement (GR) module to refine the disparity map and further improve performance. Moreover, we add a warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on the KITTI and SceneFlow datasets in end-point error (EPE) and the fraction of erroneous pixels (D1); and (2) our proposed method performs particularly well in reflective regions and occluded areas.
600 * 10^-3;
200 * 10^-4;
100 * 10^-5
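The two metrics named in the description above can be sketched as follows. The thresholds for D1 (error above 3 px and above 5% of the ground truth) follow the commonly used KITTI 2015 convention and are an assumption here, as is the treatment of `gt <= 0` as invalid; the function names are illustrative.

```python
def epe(pred, gt):
    """End-point error: mean absolute disparity error over valid pixels (gt > 0)."""
    errs = [abs(p - g) for p, g in zip(pred, gt) if g > 0]
    return sum(errs) / len(errs)


def d1(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of valid pixels whose error exceeds both thresholds (assumed KITTI-style)."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    bad = sum(
        1 for p, g in valid
        if abs(p - g) > abs_thresh and abs(p - g) > rel_thresh * g
    )
    return bad / len(valid)


pred = [10.0, 20.0, 35.0, 0.0]
gt = [10.0, 24.0, 30.0, 0.0]  # last pixel has no ground truth
epe_value = epe(pred, gt)
d1_value = d1(pred, gt)
```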
Nvidia v100
08/11/20
120
NLCA_NET_v2_RVC
H
2
10.4
112
11.8
142
4.12
76
6.39
120
6.44
113
19.7
147
10.9
153
14.5
133
13.2
137
3.26
110
21.2
141
14.7
94
10.1
84
14.5
100
7.17
95
11.5
99
Anonymous. Cascade and fuse cost volume for efficient and robust stereo matching. CVPR 2021 submission 1728.
We construct multi-scale cost volumes, fusing the lower-scale cost volumes and cascading the higher-scale ones to realize efficient and robust stereo matching.
We first pre-train our model on the Scene Flow dataset and then fine-tune it jointly on Middlebury + KITTI + ETH3D.
Tesla V100
08/12/20
121
CFNet_RVC
H
2
10.1
107
14.4
159
7.81
173
7.12
135
6.61
115
15.5
122
7.53
120
12.3
124
11.5
133
3.02
97
10.7
70
16.6
112
10.7
88
15.4
112
10.9
127
9.01
59
Haiwei Sang and Yong Zhao. A pixels based stereo matching algorithm using cooperative optimization. Submitted to IEEE Access, 2020
This paper presents a stereo matching algorithm based on inter-pixel cooperative optimization.
C/C++
08/30/20
122
LE_PC
H
2
5.58
53
3.52
26
2.99
42
4.24
66
1.92
30
5.39
29
3.42
60
3.16
41
3.72
55
2.30
66
7.83
46
9.90
53
7.79
62
17.4
138
4.74
50
9.51
67
Chenglong Xu, Chengdong Wu, Daokui Qu, Haibo Sun and Jilai Song. Accurate and efficient stereo matching by log-angle and pyramid-tree. Submitted to IEEE TCSVT, 2020.
Combined bearings-only cost metric and Cross-regional connection based aggregation.
The approach relies on a fast multi-resolution initialization step, differentiable 2D geometric propagation and warping mechanisms to infer slanted plane hypotheses at multiple resolutions.
We propose a novel lightweight network for stereo estimation. The method uses densely connected layer structures to learn expressive features without the need of fully-connected layers or 3D convolutions. This leads to a network structure with only 0.37M parameters while still having competitive results. The post-processing consists of filtering, a consistency check and hole filling.
\eta = 6 \times 10^{-6}
python 3.6; pytorch 1.2.0; GPU RTX 2080 TI
11/10/20
127
FC-DCNN
H
2
17.9
171
21.2
181
6.52
156
9.56
163
14.1
175
31.9
184
23.4
232
23.4
165
19.7
177
5.93
175
26.9
160
22.8
153
20.0
165
19.3
160
18.2
170
23.9
158
Anonymous. RLStereo: Real-time stereo matching based on reinforcement learning. CVPR 2021 submission 4443.
Tensorflow 2.0; Nvidia GeForce Titan RTX GPU
11/12/20
128
RLStereo
H
2
27.9
215
20.5
180
15.0
231
23.5
244
26.3
228
51.5
240
35.8
266
27.1
184
23.4
188
15.6
228
63.6
264
32.3
190
21.5
175
23.2
191
44.7
246
17.4
132
Anonymous. UnDAF: A general unsupervised domain adaptation framework for disparity, optical flow or scene flow estimation. CVPR 2021 submission 236.
Pytorch
11/12/20
129
UnDAF-GANet
H
2
16.2
164
3.74
38
2.94
38
16.7
222
18.3
191
24.1
162
26.3
240
19.2
156
15.7
156
1.86
53
36.8
191
26.8
170
11.1
93
24.8
205
6.54
83
28.0
170
Anonymous. Semi-synthesis: a fast way to produce effective datasets for stereo matching. CVPR 2021 submission 3688.
We propose a novel method, namely semi-synthesis, for producing large-scale, on-demand stereo datasets that do not require further fine-tuning on real datasets, i.e., we have not fine-tuned the submission model on Middlebury training data.
Python 1 Nvidia 1080Ti GPU
11/16/20
130
SSCasStereo
H
2
15.2
160
33.6
214
5.73
139
8.13
147
12.6
167
51.1
237
8.19
130
16.7
147
5.02
69
5.70
171
48.5
220
17.3
119
16.0
138
20.1
164
12.3
136
9.25
63
Anonymous. Stereo matching by high-resolution correlation volume learning and epipolar lookup. CVPR 2021 submission 1654.
Tesla V100 GPU
11/17/20
131
RASNet
H
2
13.1
146
11.9
144
5.65
138
5.71
101
8.36
136
25.8
166
8.31
132
7.18
90
5.29
76
2.93
94
25.0
157
16.0
108
13.9
121
18.4
149
38.2
227
21.4
153
Anonymous. A decomposition model for stereo matching. CVPR submission 2543.
GTX 1080Ti GPU
11/21/20
132
DecStereo
F
3
20.2
187
19.4
174
11.9
216
15.6
215
13.5
173
23.0
159
26.7
243
13.3
128
15.1
154
7.60
190
28.3
167
30.2
184
23.4
186
17.6
139
38.9
233
38.4
199
Xianjing Cheng and Yong Zhao. Local PatchMatch based on superpixel cut for efficient high-resolution stereo matching. Submitted to BABT (Brazilian Archives of Biology and Technology), 2021.
We propose an efficient method, i.e., local PatchMatch based on superpixel cut, for high-resolution stereo matching.
The number of superpixels N is 500; the two iteration parameters k_fea and k_SP are set to 9 and 7, respectively. The parameter γ, which measures the similarity weight, is set to 50, and k=8000.
i5-9400 CPU@2.90GHz, C++;
11/25/20
133
LPSC
H
2
10.7
117
5.15
65
4.23
82
5.48
93
6.38
110
16.5
132
7.84
124
9.56
105
10.3
125
4.02
129
20.2
136
19.0
129
17.7
154
18.5
152
9.73
114
18.0
136
Menglong Yang, Fangrui Wu, Wei Li, Peng Cheng, and Xuebin Lv. CooperativeStereo: Cooperative convolutional neural networks for stereo matching. Submitted to Pattern Recognition 2020.
Tensorflow1.0, GTX 1080Ti
11/26/20
134
CooperativeStereo
Q
1
28.8
220
28.5
204
12.3
220
17.3
227
18.5
195
62.3
254
22.4
229
36.3
213
24.7
190
15.8
229
74.5
270
37.8
221
28.4
222
26.6
214
41.6
241
28.4
173
Peng Yao and Jieqing Feng. Stacking learning with coalesced cost filtering for accurate stereo matching. Submitted to Journal of Visual Communication and Image Representation 2020.
We leverage stacking learning with coalesced cost filtering so that conventional algorithms achieve more accurate disparity estimates.
For the Random Forest, we use 10 decision trees with a maximum depth of 25 and a minimum of 12 samples in a node for it to be split.
C++, Intel Core-i7 Octa-Core CPUs;
12/22/20
135
SLCCF
H
2
8.83
98
6.97
104
4.90
114
6.05
114
4.35
85
8.89
68
5.33
82
6.29
83
5.15
71
4.80
146
13.0
88
18.1
124
17.8
156
17.7
141
6.93
90
15.4
124
Lingyin Kong, Jiangping Zhu, and Sancong Ying. Local stereo matching using adaptive cross-region based guided image filtering with orthogonal weights. Submitted to Mathematical Problems in Engineering, 2020.
We propose an improved cost aggregation method, in which the matching cost volume is filtered by ACR-GIF-OW.
This model is trained on low-resolution data but aims at high-resolution images. It uses a recurrent module to iteratively update a coarse disparity prediction. Then a special refinement module makes a final adjustment. The recurrent update and final refine are applied in a patch-wise manner across the initial disparity.
Trained on Scene Flow, Middlebury 1/4 size, and TartanAir (sampled) datasets. Training disparity range 256 pixels, testing range over 1000 pixels.
Trained on 4 Tesla V100 GPUs. Inference on 1 Tesla V100 GPU.
03/05/21
137
ORStereo
F
3
19.1
182
38.9
232
9.97
205
9.21
158
23.3
215
42.6
215
13.0
179
18.2
153
6.63
90
4.93
147
35.4
188
33.1
193
24.1
190
23.6
196
18.2
169
26.0
166
Anonymous. Local expansion moves for stereo matching based on RANSAC confidence. ICCV 2021 submission 3073.
A stereo matching algorithm based on collaborative optimization among pixels is proposed. Based on local expansion, the matching energy function of pixels is defined by using the color and gradient features of adjacent pixels, and the cooperative competition mechanism between pixels is introduced.
iterations = 5; pmIterations = 2;
C/C++,i7-4790 CPU@3.60GHz.
03/05/21
138
LocalExp-RC
H
2
5.54
52
3.78
40
3.02
43
3.85
52
2.08
37
5.95
38
3.48
61
3.61
52
3.65
53
2.52
78
10.3
66
6.85
27
7.25
58
16.1
124
5.12
58
10.2
77
Xianjing Cheng, Yong Zhao, Zhijun Hu, Xiaomin Yu, Ren Qian, and Haiwei Sang. Superpixel cut-based local expansion for accurate stereo matching. IET Image Processing, 2021.
i7 CPU @2.2GHz, C++, 8 cores
04/22/21
139
LESC
H
2
6.78
67
4.07
44
3.46
58
3.26
44
3.36
71
9.15
71
4.08
66
4.76
61
5.21
75
2.80
90
11.7
78
13.0
79
10.2
85
17.0
131
5.52
69
12.5
107
Hao Liu, Hanlong Zhang, Xiaoxi Nie, Wei He, Dong Luo, Guohua Jiao and Wei Chen. Stereo matching algorithm based on two-phase adaptive optimization of AD-census and gradient fusion. IEEE RCAR 2021.
In this paper, an improved AD-Census algorithm is proposed to improve the matching ratio in some special regions. The proposed algorithm contains an optimization method and three similarity metrics.
We propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, which is based on the popular and widely used Semi-Global Matching algorithm. In this, we propose an optimization of the algorithm for embedded CUDA GPUs, by using massively parallel computing, as well as using the NEON intrinsics to optimize the algorithm for vectorized SIMD processing on embedded ARM CPUs.
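The core of Semi-Global Matching is a per-path cost aggregation recurrence with a small penalty P1 for disparity changes of one and a larger penalty P2 for bigger jumps. The plain-Python sketch below shows that recurrence along a single scanline; it is meant only to make the recurrence concrete and does not reflect the CUDA or NEON optimizations described above. The function name and the toy cost values are assumptions.

```python
def aggregate_path(cost, p1, p2):
    """Aggregate matching costs along one path (e.g., left to right on a scanline).

    cost[x][d] is the pixel-wise matching cost at position x and disparity d.
    Returns agg[x][d] following the SGM recurrence:
      agg[x][d] = cost[x][d]
                  + min(agg[x-1][d], agg[x-1][d-1] + p1, agg[x-1][d+1] + p1,
                        min_k agg[x-1][k] + p2)
                  - min_k agg[x-1][k]
    """
    n_disp = len(cost[0])
    agg = [list(cost[0])]
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg


toy_cost = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
aggregated = aggregate_path(toy_cost, p1=1, p2=4)
```

A full SGM implementation sums such aggregated costs over several path directions and then picks the disparity with the lowest total per pixel.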
Anonymous. Deep learning based stereo cost aggregation on a small dataset. DICTA 2021 submission.
4*RTX 3090
06/11/21
145
R3DCNN
H
2
33.0
240
34.2
217
15.8
233
13.4
198
41.7
268
47.9
231
22.0
227
60.1
270
57.4
271
12.6
224
40.3
198
46.4
245
26.8
214
37.0
258
19.3
178
45.2
220
Anonymous. Estimate regularization weight for local expansion moves stereo matching. ACPR 2021 submission.
Methods that estimate optimal parameters for MRF stereo cannot be used directly to estimate parameters for local expansion moves stereo. To estimate the regularization weight for local expansion moves stereo, we propose probabilistic mixture models for the slanted patch matching terms and the curvature regularization terms.
This paper presents an accurate and efficient hierarchical BP framework using the representation of the image segmentation pyramid (ISP). We design a hierarchy of MRF networks using the graph of superpixels at each ISP level.
We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT. We introduce multi-level convolutional GRUs, which more efficiently propagate information across the image. A modified version of RAFT-Stereo can perform accurate real-time inference.
Xianjing Cheng and Yong Zhao. Segment-based disparity computation with occlusion handling for accurate stereo matching. Submitted to IEEE TCSVT, 2021.
i7-10870H CPU @2.20GHz,C++
08/22/21
150
SDCO
H
2
19.0
181
30.4
207
5.92
143
9.11
156
21.5
209
37.5
198
12.3
168
26.8
180
16.7
167
5.68
170
29.4
170
30.6
186
25.6
204
23.1
190
17.5
166
18.9
142
Krishna Shankar, Mark Tjersland, Jeremy Ma, Kevin Stone, and Max Bajracharya. A learned stereo depth system for robotic manipulation in homes. ICRA 2022 submission.
A lightweight network with dilated ResNet feature extractor, a correlation cost volume run at a low resolution, and a refinement network to get a full resolution disparity output. Sparse disparity is processed from the dense disparity using a threshold on the network confidence output and a region grower to remove suspected bad disparities.
Max disparity 512
Cost volume downsample 8x
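The step from a dense to a sparse disparity map by thresholding a confidence output can be sketched as below. The threshold value, the use of `None` as the invalid marker, and the function name are assumptions; the region grower mentioned in the description is not reproduced.

```python
def sparsify(disparity, confidence, min_conf=0.5, invalid=None):
    """Keep a disparity value only where the confidence reaches min_conf."""
    return [
        [d if c >= min_conf else invalid for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(disparity, confidence)
    ]


dense = [[1.0, 2.0], [3.0, 4.0]]
conf = [[0.9, 0.2], [0.5, 0.1]]
sparse = sparsify(dense, conf, min_conf=0.5)
```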
PyTorch on Nvidia Titan RTX
08/24/21
151
MMStereo
F
3
12.7
141
27.9
200
8.71
183
8.81
153
11.7
159
26.9
171
5.82
98
20.9
160
14.6
149
4.10
134
15.4
109
16.0
108
14.2
124
13.6
86
9.71
113
7.35
50
Anonymous. Region separable stereo matching. 3DV 2021 submission 110.
In stereo matching, there are two cases of poor performance: (1) the interior of large objects, and (2) object boundaries and small objects. In this work, we present a feature enhancement stereo matching network to address these problems.
None
2080ti
11/21/21
156
FENet
H
2
11.3
128
7.70
114
3.91
67
3.97
58
6.24
108
16.7
133
5.78
94
32.1
200
32.4
222
2.57
80
11.8
79
10.8
59
6.90
55
13.4
79
5.41
65
11.2
92
Junda Cheng and Gangwei Xu. CoAtRS stereo: Fully exploiting convolution and attention for stereo matching. Submitted to IEEE Transactions on Multimedia, 2021.
Madiha Zahari. A new cost volume estimation using modified CT. Submitted to the Bulletin of Electrical Engineering and Informatics (BEEI), paper ID 4122, 2022.
Visual Studio c++, Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99 GHz
We propose a novel and effective network architecture, RDNet, that utilizes edge detection and multi-scale cost volumes for robust stereo matching.
β1 = 0.9, β2 = 0.999
PowerEdge T640 GPUs
08/05/22
166
RDNet
H
2
11.3
126
11.2
138
5.24
128
5.45
91
6.51
114
16.7
134
8.89
138
16.7
147
14.9
153
4.75
145
16.5
117
19.3
134
15.4
134
12.5
67
12.1
133
14.2
118
Zhelun Shen. Digging into uncertainty-based pseudo-label for robust stereo matching. Submitted to TPAMI, 2022.
Tesla V100
08/08/22
167
UCFNet_RVC
H
2
10.7
118
12.2
147
6.48
152
5.83
102
5.90
105
16.9
136
6.61
108
15.8
141
14.6
150
2.73
89
11.4
75
18.8
127
11.0
92
18.9
158
10.7
123
11.4
96
Pengxiang Li, Chengtang Yao, Yunde Jia, and Yuwei Wu. Inter-scale similarity guided cost aggregation for stereo matching. Submitted to IEEE Transactions on Circuits and Systems for Video Technology, 2022.
python; 2 cores + RTX 3090 GPU
08/09/22
168
issga
H
2
18.9
179
12.0
146
11.6
214
11.1
185
18.3
192
14.3
118
14.6
199
28.6
193
26.2
200
5.90
174
13.5
93
41.4
232
21.9
178
22.2
184
19.4
179
30.7
177
Xiao Guo. Feature extractor augmentation network. Submitted to Neurocomputing, 2022.
Stereo matching algorithm based on multi-cost computation with hybrid aggregation using random walk and image segmentation with filtering in refinement stage.
A unified global matching formulation and framework for optical flow and stereo depth estimation
Please refer to the paper
V100 GPU
09/01/22
171
GMStereo
F
3
7.14
69
6.30
95
6.20
149
6.22
117
6.62
116
9.79
76
2.76
52
5.69
77
5.17
73
4.04
130
14.0
99
11.2
67
6.81
54
11.8
60
6.90
88
12.8
109
Xiaowei Yang. A light-weight stereo matching network based on multi-scale features fusion and robust disparity refinement. Submitted to IET Image Processing, 2022.
In recent years, convolutional-neural-network based stereo matching methods have achieved significant gains compared to conventional methods in terms of both speed and accuracy. Current state-of-the-art disparity estimation algorithms require many parameters and large amounts of computational resources and are not suited to applications on edge devices. In this paper, we propose an end-to-end light-weight network (LWNet) for fast stereo matching, which consists of an efficient backbone with multi-scale feature fusion for feature extraction, a 3D U-Net aggregation architecture for disparity computation and a color guidance in 2D CNN for disparity refinement.
(β1= 0.9, β2 = 0.999)
GeForce RTX 3090
09/20/22
172
LWNet
H
2
40.9
263
38.1
230
18.4
244
30.5
262
33.3
255
43.2
218
30.9
257
49.2
247
50.6
260
22.8
253
58.1
251
54.2
264
41.8
266
37.5
260
58.8
264
81.5
273
Xue Liu. Stereo matching with monocular augmentation. Submitted to Signal Processing Letters, 2022.
We propose an accurate and lightweight convolutional neural network for stereo estimation with depth completion. The whole method consists of three parts. The first part consists of fully-convolutional densely connected layers that compute expressive features of rectified image pairs.
learning rate: 0.00006 for feature extraction and similarity and learning rate: 0.000006 for depth completion
Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Bregier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, and Jerome Revaud. Self-supervised pretraining for 3D vision tasks by cross-view completion. NeurIPS 2022; RVC 2022 submission.
pretraining self-supervised model
masking rate 0.9
python Nvidia
10/03/22
178
CroCo_RVC
F
3
15.1
159
7.43
110
5.85
142
6.71
128
11.7
158
15.4
120
3.94
64
36.2
212
35.8
231
3.41
115
18.1
127
29.3
182
10.9
90
18.0
145
10.6
122
21.0
151
Anonymous. An improved RaftStereo trained with multiple mixed datasets for Robust Vision Challenge. RVC 2022 submission.
GTX 2080ti
10/03/22
180
iRaftStereo_RVC
H
2
8.07
86
9.13
127
8.25
178
5.55
98
4.68
89
6.92
48
6.41
106
6.29
83
6.19
88
3.96
127
17.9
125
13.0
78
9.58
81
11.4
56
9.24
112
11.8
103
Yang Xiaowei and Feng Zhiguo. Attention guide cost volume for stereo matching. Submitted to IET Image Processing, 2022.
GTX 3090
10/06/22
181
AGCVNet
H
2
12.0
133
10.6
136
5.14
123
5.47
92
7.00
120
17.0
138
8.91
140
18.9
155
15.7
157
4.64
142
15.8
112
19.1
132
16.6
146
13.7
87
15.0
153
14.6
123
Han Li. Adaptive slice stereo matching network. Submitted to Image and Vision Computing, 2022.
pytorch
10/06/22
182
GwcSlice
H
2
12.7
140
13.4
152
4.76
108
5.33
90
7.69
130
17.0
137
11.1
155
13.7
129
9.88
121
4.22
138
20.1
135
20.1
139
17.4
151
16.9
130
14.0
145
36.5
192
Han Li. Multi-cascade stereo matching network. Submitted to Neurocomputing, 2022.
Pan Lei. A multi view solid based on removal using the matching method of the deformation window. Submitted to Information Processing Letters, 2022.
2080ti
10/13/22
185
DW
F
3
19.9
186
15.1
162
13.1
222
14.3
206
14.2
176
14.2
117
8.90
139
16.4
143
16.0
161
12.3
223
50.2
229
36.4
211
21.7
177
21.8
180
31.8
212
40.4
206
Yang Xiuze. A lightweight multilevel cascaded recurrent network for high resolution stereo matching. Submitted to Neurocomputing, 2022.
Python;32cores+NVIDIA GeForce RTX3090 GPU
10/16/22
186
LMCR-Stereo
F
3
6.27
59
6.20
91
4.59
101
3.92
54
2.66
52
4.52
17
4.88
76
3.65
53
3.41
50
2.08
60
16.8
120
11.2
69
8.58
70
13.2
76
6.89
87
10.5
79
Anonymous. Revisiting cost aggregation in stereo matching from disparity classification. CVPR 2023 submission 1116.
Cost aggregation plays a critical role in existing stereo matching methods. Generally, aggregating matching costs in homogeneous regions with similar disparities is beneficial to matching accuracy. However, previous approaches commonly use 3D convolutions for cost aggregation without considering the homogeneity of different regions. In this paper, we revisit cost aggregation in stereo matching from a perspective of disparity classification and propose a generic yet efficient Disparity Context Aggregation (DCA) module to improve the performance of CNN-based methods.
Parameters:4.96 M;
Only using half-resolution Middlebury training images for validation.
Anonymous. Global occlusion-aware transformer for robust stereo matching. ICCV 2023 submission 6309.
Occlusion-Aware Global Aggregation for robust stereo matching using vision transformer.
Iter=18, resolution=1/8
NVIDIA RTX TITAN X
03/07/23
199
GOAT18
H
2
8.73
96
7.26
107
7.32
167
6.80
129
3.47
75
10.3
81
10.4
151
5.14
69
5.16
72
4.95
149
15.9
113
13.9
86
11.2
96
9.62
39
13.1
139
16.4
126
Kai Zeng. Deep stereo network with MRF-based cost aggregation. Submitted to IEEE TCSVT 2022.
Tesla V100
04/18/23
200
DMCANet
H
2
7.79
81
7.91
115
4.12
76
3.79
51
4.26
84
11.2
98
10.1
149
6.76
87
4.85
67
3.32
113
12.9
87
13.3
82
10.5
87
12.9
73
9.11
110
10.1
75
Wang Yun and Wang Longguang. ADStereo: Learning stereo matching from adaptive downsample with disparity alignment. Submitted to IEEE TIP, 2023.
NVIDIA 3090TI
04/28/23
201
ADStereo
H
2
18.0
172
16.4
166
14.9
230
12.6
193
21.3
208
20.6
149
16.6
204
15.8
140
16.0
160
7.43
187
19.1
132
52.0
255
24.8
199
18.1
146
17.7
167
11.2
91
Peng Yao, Haiwei Sang, and Xu Cheng. Structured support vector machine with coarse-to-fine PatchMatch filtering for stereo matching. Submitted to The Visual Computer, 2023.
Stereo matching using a structured support vector machine and coarse-to-fine features.
The proposed IGEV-Stereo builds a combined geometry encoding volume that encodes geometry and context information as well as local matching details, and iteratively indexes it to update the disparity map.
Details in code.
Python RTX 3090
06/22/23
203
IGEV-Stereo
F
3
4.83
44
3.17
19
2.46
26
1.97
18
2.19
42
5.63
31
1.22
15
16.2
142
9.20
111
1.17
29
3.77
18
4.93
16
5.35
41
6.99
23
2.31
20
5.00
32
Anonymous. CCL-Stereo: Stereo matching via looking up coupled correlations. ICCSIP 2023 submission.
TITAN RTX
06/26/23
204
CCL-Stereo
F
3
30.9
230
50.9
262
9.17
188
11.0
183
33.0
250
88.2
272
1.91
34
47.3
242
26.8
202
11.7
217
41.7
201
37.4
216
23.7
187
28.8
224
63.0
266
42.8
211
Wenhuan Wu, Xi Xu, Haokun Zhang, and Yanzhang Dong. Stereo matching with directional trees. Submitted to The Visual Computer, 2023.
Junhong Min and Youngpil Jeon. Confidence-aware symmetric stereo matching via u-net transformer. Submitted to ICRA 2024.
We propose a novel deep stereo matching network and a new real-world stereo dataset of cluttered objects taken with a commercially available stereo sensor. We design a U-shaped architecture with various types of attention, which more efficiently extracts global and local contexts from rectified image pairs, resulting in highly accurate disparities. Furthermore, its symmetric structure allows simultaneous estimation of both left and right disparities. It can also implicitly estimate the uncertainty, i.e., the confidence of the estimated disparities.
4 level unet for feature extraction
and 3 level unet for refinement
channel dimension is 128.
Yang Zhang, Peng Song, and Bo Song. A local side window algorithm with tree segmentation for stereo matching. Submitted to Laser and Optoelectronics Progress 2023.
I9-9880H CPU and RT5000GPU
10/30/23
214
LSTS
H
2
17.3
168
8.70
125
6.18
148
8.41
149
9.63
146
21.3
152
13.2
184
29.5
194
29.1
211
5.00
150
25.0
157
24.9
162
22.6
180
21.4
176
15.4
155
33.1
184
Kunhong Li, Longguang Wang, Ye Zhang, Kaiwen Xue, Shunbo Zhou, and Yulan Guo. LoS: Local structure-guided stereo matching. CVPR 2024.
RTX 4090
10/30/23
215
LoS
F
3
4.20
38
5.85
84
4.92
116
4.64
73
2.77
57
3.92
14
1.32
20
2.36
35
2.17
35
1.81
52
8.18
48
6.58
23
4.55
29
8.57
33
4.57
44
5.06
34
Guohui Wang and Yuanwei Bi. GASNet: Light-wise gated attention for efficient stereo matching. Submitted to Visual Computer, 2023.
Nvidia RTX 3090 GPU
11/10/23
216
GASNet
F
3
33.1
241
21.3
183
16.9
238
26.3
252
33.2
253
39.5
201
17.7
208
26.7
179
26.0
199
21.3
248
54.1
245
46.9
246
33.3
246
36.8
257
63.2
268
63.4
252
Haoxuan Sun and Taoyang Wang. Weighted RANSAC disparity refinement based on estimated single-view normal map and SAM. Submitted to IEEE TIP 2023.
This article presents a disparity map algorithm to improve depth map estimation based on the Census Transform and a hierarchical segment-tree on each block. The stereo matching algorithm presented in this study comprises four steps: Cost Computation, Cost Aggregation, Optimization, and Post-Processing, all of which refine the final disparity map.
CostAlpha = 0.3;
CEN-WND = 9x11;
k = 1600;
LR checking = Yes
PY_LVL = 3.
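The Census Transform cost mentioned above can be illustrated with a short sketch: each pixel is encoded as a bit string recording which neighbors are darker than the center, and two pixels are compared by the Hamming distance of their codes. The window here is 3x3 for brevity (the listed CEN-WND of 9x11 would correspond to `half_h=4, half_w=5`); the function names, border clamping, and bit ordering are assumptions, and the segment-tree aggregation is not reproduced.

```python
def census(img, y, x, half_h=1, half_w=1):
    """Census code of pixel (y, x): one bit per neighbor, set if neighbor < center."""
    h, w = len(img), len(img[0])
    center = img[y][x]
    bits = 0
    for dy in range(-half_h, half_h + 1):
        for dx in range(-half_w, half_w + 1):
            if dy == 0 and dx == 0:
                continue  # the center pixel itself contributes no bit
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            bits = (bits << 1) | (1 if img[yy][xx] < center else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


img_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
img_b = [[9, 2, 3], [4, 5, 6], [7, 8, 1]]
code_a = census(img_a, 1, 1)
code_b = census(img_b, 1, 1)
cost = hamming(code_a, code_b)
```

Because only the ordering of intensities matters, this cost is unaffected by monotonic brightness changes between the two images.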
C++, a personal PC with a CPU i7 8700@3.2 GHz, an RTX 2070 SUPER, and 16GB RAM.
12/31/23
222
H-CENST
Q
1
38.4
255
41.6
240
26.7
271
31.8
265
33.0
252
43.0
216
32.7
260
53.1
255
50.5
259
24.8
259
51.4
237
47.0
248
36.7
256
31.9
243
40.5
239
53.4
235
Anonymous. DualNet: Self-supervised stereo based on knowledge distillation. ECCV 2024 submission 327.
Unsupervised Stereo Matching methods have made significant strides recently. However, these approaches have predominantly relied on the assumption of photometric consistency, leading to potential limitations: sensitivity to illuminance changes and difficulty in dealing with problematic areas like occluded or textureless regions.
To mitigate these limitations, this paper introduces a novel self-supervised dual-level framework named Dual-Net.
This framework mainly consists of two key components: self-supervised teacher training and student training based on knowledge distillation.
Specifically, the teacher model is first trained in a self-supervised fashion with a focus on feature space and data augmentation consistency.
On the one hand, pixels from feature space are robust to noise and luminance changes, which are discriminative even in textureless regions.
On the other hand, a data augmentation consistency loss is presented to guide the model toward enhanced contextual awareness, thus leading to a completed depth estimation in problematic regions.
Then, the knowledge learned by the teacher model is distilled and transferred probabilistically to the student model. By leveraging this distilled knowledge, the student model is guided by validated insights, enabling it to outperform its teacher model by a large margin.
700 M
nvidia a100 GPU
01/08/24
223
DualNet
H
2
16.4
165
19.7
176
7.99
175
10.1
169
18.3
193
24.1
162
10.0
147
23.9
167
20.4
178
7.79
192
23.0
149
23.1
154
16.3
143
18.8
156
17.0
162
18.5
140
Aixin Chong, Hui Yin, Qianqian Du, Yanting Liu, and Ming Han. Gradual interaction network for stereo matching. Submitted to Pattern Recognition, 2024.
RTX 3090
01/08/24
224
GINet
H
2
15.6
162
16.1
164
7.15
165
7.37
138
9.39
144
25.1
164
7.88
126
35.2
210
32.6
223
3.19
104
15.5
110
16.7
115
11.6
97
14.7
101
15.8
157
26.4
168
Anonymous. HART: Hadamard beat matmul on self-attention for recurrent stereo transformer. ECCV 2024 submission 1197.
NVIDIA A6000
01/31/24
225
HART
F
3
4.24
39
3.13
18
2.24
19
4.16
64
1.10
17
4.01
15
2.03
37
1.86
24
1.68
27
0.85
10
9.83
56
11.0
65
8.71
71
9.65
40
3.26
35
6.96
47
Tuming Yuan. Hourglass cascaded recurrent stereo matching network. Submitted to Image and Vision Computing, 2024.
Combine stacked hourglass modules and recurrent neural networks.
The project proposes a stereo matching network based on a neural operator, which maps from the RGB image-pair space to the disparity space. The network lets users test images at any scale, customize the disparity range for different scenarios, and dynamically build the cost volume based on the chosen scale and disparity range.
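import argparse

# __models__ and __datasets__ are registries (dicts) assumed to be defined elsewhere in the project.
parser = argparse.ArgumentParser()
parser.add_argument('--model', default='gwcnet-g', help='select a model structure', choices=__models__.keys())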
parser.add_argument('--maxdisp', type=int, default=192, help='maximum disparity')
parser.add_argument('--start_disp', type=int, default=15, help='start of the disparity range')
parser.add_argument('--end_disp', type=int, default=303, help='end of the disparity range')
parser.add_argument('--dataset', required=True, help='dataset name', choices=__datasets__.keys())
parser.add_argument('--datapath', required=True, help='data path')
parser.add_argument('--testlist', required=True, help='testing list')
parser.add_argument('--lr', type=float, default=0.001, help='base learning rate')
parser.add_argument('--test_batch_size', type=int, default=1, help='testing batch size')
parser.add_argument('--epochs', type=int, required=True, help='number of epochs to train')
parser.add_argument('--lrepochs', type=str, required=True, help='the epochs to decay lr: the downscale rate')
parser.add_argument('--logdir', required=True, help='the directory to save logs and checkpoints')
parser.add_argument('--loadckpt', help='load the weights from a specific checkpoint')
parser.add_argument('--resume', action='store_true', help='continue training the model')
parser.add_argument('--seed', type=int, default=1, metavar='S', help='random seed (default: 1)')
parser.add_argument('--summary_freq', type=int, default=20, help='the frequency of saving summary')
parser.add_argument('--save_freq', type=int, default=1, help='the frequency of saving checkpoint')
parser.add_argument('--out_add', type=str)
parser.add_argument('--key_query_same', type=str)
parser.add_argument('--deformable_groups', type=int, required=True)
parser.add_argument('--output_representation', type=str, required=True, help='regressing disparity')
parser.add_argument('--sampling', type=str, default='dda', required=True)
parser.add_argument('--scale_min', type=float, default=1)
parser.add_argument('--scale_max', type=float, default=1)
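As a rough illustration of what building a cost volume for a user-chosen disparity range can look like, the following minimal sketch fills one slice per candidate disparity for a single rectified scanline. It is not the DispNO implementation: the function name `build_cost_volume`, the absolute-difference cost, and the half-open range `[start_disp, end_disp)` are all assumptions made for this example.

```python
def build_cost_volume(left, right, start_disp, end_disp):
    """Hypothetical helper: absolute-difference cost volume for one scanline.

    left, right: lists of pixel intensities from a rectified image pair.
    Returns volume[i][x] = |left[x] - right[x - d]| with d = start_disp + i,
    or None where x - d falls outside the right scanline.
    The half-open range [start_disp, end_disp) is an assumption.
    """
    width = len(left)
    volume = []
    for d in range(start_disp, end_disp):
        row = []
        for x in range(width):
            xr = x - d  # matching column in the right image
            row.append(abs(left[x] - right[xr]) if 0 <= xr < width else None)
        volume.append(row)
    return volume


# A bright pixel at x=2 in the left scanline appears at x=0 in the right one,
# so the cost at x=2 is lowest for disparity 2.
left = [0, 0, 9, 0, 0]
right = [9, 0, 0, 0, 0]
volume = build_cost_volume(left, right, start_disp=1, end_disp=4)
```

Because the range is an argument rather than a fixed constant, the same routine covers a narrow range for distant scenes and a wide one for close-up scenes; only the number of slices changes.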
python; 4 Tesla V100 GPU
02/20/24
227
DispNO
H
2
15.0
156
18.4
172
6.17
147
9.13
157
11.3
155
25.2
165
11.4
158
17.6
151
14.3
144
8.70
199
31.7
180
21.6
150
18.1
159
17.6
140
16.2
160
17.5
133
Anonymous. ClearDepth: Enhanced stereo perception of transparent objects for robotic vision. ECCV 2024 submission 3517.
stereo recovery network with a cascaded vision transformer and post feature fusion
Gangwei Xu. IGEV++: Iterative multi-range geometry encoding volumes for stereo matching. Submitted to TPAMI, 2024.
RTX 3090
06/14/24
239
IGEV++
F
3
3.23
19
3.24
20
2.46
26
4.12
62
1.15
19
6.71
47
1.38
24
1.53
15
1.52
20
1.02
20
4.57
24
4.68
13
5.41
42
7.68
27
2.22
14
4.68
27
Junhong Min and Youngpil Jeon. Confidence aware stereo matching for realistic cluttered scenario. ICIP 2024.
Our approach estimates disparities using implicitly inferred confidence levels with Unet transformer.
3x unet transformer for feature extraction, runs with 1984*2816px resolution
Python, Nvidia 4090
06/27/24
240
CAS++
F
3
3.33
21
4.27
51
3.72
65
3.17
40
2.17
40
2.44
4
1.33
21
2.24
33
2.01
30
1.47
43
4.04
20
8.15
36
4.97
37
5.80
16
3.73
40
3.04
17
Yansong Zhu, Songwei Pei, and Jun Gao. AP-Net: Attention-fused volume and progressive aggregation for accurate stereo matching. Submitted to Neurocomputing, 2024.
Python; 16 cores + 2 * A800
07/22/24
241
apnet
Q
1
30.9
229
18.3
171
9.59
200
17.1
225
24.8
222
49.1
233
19.5
217
32.3
201
29.2
212
22.2
252
60.7
260
33.2
194
27.0
216
28.0
219
64.4
272
63.7
253
Anonymous. Robust stereo matching for real world dataset. AAAI 2025 submission 768.
Python; A100
08/01/24
243
RSM
F
3
2.40
8
2.66
11
1.88
11
3.18
41
0.91
10
5.80
33
1.34
22
1.35
8
1.16
11
0.93
13
3.35
15
3.96
6
2.88
13
4.38
9
2.01
8
4.15
23
Anonymous. All-in-One: Transferring vision foundation models into stereo matching. AAAI 2025 submission 6620.
Pytorch; A100
08/07/24
244
AIO-Stereo
F
3
2.36
6
2.38
3
1.71
5
3.22
43
0.85
8
5.83
34
1.24
17
1.42
14
1.32
17
1.03
22
4.49
23
4.81
14
2.43
8
3.61
4
2.12
11
3.63
20
Anonymous. PointerStereo: Extract stereo position with robust feature. AAAI 2025 submission 8740.
pointer attention for global look-up
max_disp=768
RTX 3090
08/12/24
245
PointerNet
F
3
2.69
14
2.67
12
1.84
8
3.21
42
1.51
24
7.52
56
1.29
19
1.54
16
1.17
12
1.09
25
3.59
17
3.96
6
3.10
15
5.60
14
2.29
19
4.27
24
Anonymous. UniTT-Stereo: Unified training of transformers for enhanced stereo matching. AAAI 2025 submission 7987.
This paper focuses on effectively capturing local patterns from images during the fine-tuning of Transformer-based models with limited labeled training data in dense downstream tasks, particularly in the context of stereo matching. For that, we propose MaDis-stereo, a novel stereo depth estimation framework that enhances locality inductive biases during fine-tuning via Masked Image Modeling (MIM).
a100 / 1 GPU
08/15/24
247
MaDis-Stereo
F
3
9.49
104
3.73
37
3.14
48
1.76
9
9.05
141
10.5
86
1.74
30
27.8
187
27.9
206
1.50
44
7.47
43
19.8
136
4.80
34
11.8
59
3.40
37
10.2
78
Shimeng Fan. Accurate edge-preserving stereo matching by enhancing anisotropy. Submitted to Signal Processing: Image Communication, 2024.
Intel Core i5-9300H CPU @2.40 GHz (C++, OpenCV)
07/27/24
242
esmea
H
2
30.1
223
29.4
206
9.48
195
17.0
224
31.7
243
49.7
234
15.2
200
52.6
254
45.9
248
11.9
220
46.5
216
52.1
256
27.2
218
23.7
198
25.2
192
54.5
238
Anonymous. Robust stereo depth estimation for complex environments with visual transformer. WACV 2025 submission 2325.
RTX 4090
09/08/24
248
RSD
F
3
3.73
29
2.13
2
1.98
13
1.71
7
2.03
36
2.63
6
0.87
6
8.66
99
9.69
118
0.96
18
2.54
8
6.82
26
2.34
6
7.76
28
2.23
15
2.57
12
Anonymous. Grouped correlation aggregation with PatchMatch for stereo matching. ICLR 2025 submission 6577.
RTX 4090
09/26/24
249
GCAP_Stereo
F
3
4.31
40
5.32
70
3.40
56
2.38
27
2.16
39
11.2
99
4.44
69
2.13
30
2.04
32
1.32
39
7.16
41
8.97
44
5.03
39
8.38
31
3.22
32
6.08
41
Anonymous. S-MoEStereo: Selective mixture of experts with parameter-efficient fine-tuning for robust stereo matching. CVPR 2025 submission 3857.
We propose S-MoEStereo, which adapts pre-trained VFMs for stereo matching by integrating Low-Rank Adaptation (LoRA) with Mixture-of-Experts (MoE) modules.
This approach balances parameter efficiency and discriminative feature learning by dynamically selecting the optimal expert within each MoE module.
Additionally, we introduce CNN-based adapter layers to incorporate inductive bias, enhancing geometric feature extraction.
Furthermore, we propose a lightweight decision network to reduce computational costs by selectively activating MoE modules based on input complexity.
Peng Yao, Haiwei Sang, and Linlin Ge. Stacking learning with exhaustive disparity characteristics for accurate stereo matching. Submitted to Pattern Recognition, 2025.
Uses stacking learning with exhaustive disparity characteristics for accurate stereo matching, aiming to achieve the performance of deep learning methods.
P1=80,P2=320 for Semi-Global Matching Aggregation.
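For readers unfamiliar with the role of P1 and P2, the sketch below applies the standard semi-global matching recurrence along a single left-to-right path, using the penalty values quoted above as defaults. It shows the textbook recurrence only and is not the SLEDC code; the function name `aggregate_path` and the list-of-lists cost layout are assumptions for this example.

```python
def aggregate_path(cost, p1=80, p2=320):
    """Hypothetical helper: SGM aggregation along one left-to-right path.

    cost[x][d] is the pixelwise matching cost at column x and disparity d.
    P1 penalizes a disparity change of one level between neighbours,
    P2 penalizes any larger jump.
    """
    ndisp = len(cost[0])
    agg = [[float(c) for c in cost[0]]]  # first pixel has no predecessor
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(ndisp):
            candidates = [prev[d], prev_min + p2]      # same disparity, large jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)    # change by one level
            if d + 1 < ndisp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg


# Two pixels, three disparities: the best disparity jumps from 0 to 2,
# so the second pixel pays the P2 penalty at d=2.
agg = aggregate_path([[0, 500, 500], [500, 500, 0]])
```

A full SGM implementation sums such aggregated costs over several path directions before selecting the minimum-cost disparity per pixel.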
C/C++, Intel Core i7 3.20 GHz CPUs
02/06/25
260
SLEDC
H
2
10.1
108
8.58
123
4.40
88
5.51
94
5.84
103
21.4
153
7.02
113
6.16
81
14.3
145
5.30
158
17.0
121
15.9
105
14.8
129
18.7
153
6.52
81
13.7
113
Anonymous. Global regulation and excited attention tuning for stereo matching. ICCV 2025 submission 1322.
Apply the global context information to handle the ill-posed regions in stereo matching.
14.44 M
PyTorch RTX 4090 GPUs
02/10/25
261
GREAT-IGEV
F
3
2.81
15
3.30
21
2.44
24
2.31
26
0.96
11
7.12
50
1.17
14
1.38
10
1.36
18
1.04
23
3.89
19
3.82
5
4.66
32
6.24
19
2.17
12
4.65
26
Anze Xu, Lin Yang and Jingzhong Li. Real-time and accurate stereo matching via tri-fusion volume for stereo vision. Submitted to Neural Networks, 2025.
RTX 3090
02/16/25
262
TCM
H
2
12.8
142
13.5
154
5.64
137
7.01
133
12.0
162
30.2
179
10.0
148
9.66
107
7.53
97
5.58
166
27.0
162
21.0
147
16.0
139
18.8
155
12.5
137
18.1
137
Anonymous. State-Stereo: A robust stereo matching method for real-world scenes with state fusion. ICCV 2025 submission 3847.
RTX 3090
03/02/25
263
State-Stereo
F
3
2.64
12
3.55
27
3.06
46
1.91
16
1.19
20
1.67
3
1.28
18
1.68
19
1.66
24
1.22
31
1.64
2
4.45
12
5.81
45
5.06
10
3.25
34
2.38
9
Jiakang Yuan. LG-stereo: Local global cost volume for stereo matching. Submitted to TIP, 2025.
NVIDIA A100
03/04/25
264
LG-Stereo
F
3
1.76
2
2.57
8
1.86
9
2.02
19
0.65
4
3.23
9
0.68
3
0.98
4
0.81
2
0.55
2
2.15
5
4.26
10
2.03
3
3.85
8
1.27
2
2.42
10
Gaofeng Peng. G2L-Stereo: Global to local two-stage real-time stereo matching network. Submitted to Transactions on Computational Imaging, 2025.
We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). By elegantly coupling these complementary worlds through a dual-branch architecture, we seamlessly integrate stereo matching with learned contextual cues.
Datasets mixture (FSD, Tartanair, CREStereo)
Python; i7 8 cores + H100 GPU
04/24/25
267
StereoAnywhere
F
3
3.69
27
7.34
108
2.23
18
2.23
24
5.12
94
18.1
142
0.90
10
2.16
31
1.43
19
1.25
35
5.73
32
4.95
17
2.66
10
6.89
22
2.28
18
1.86
5
Jie Lin. M2-Stereo: Multi scale information fusion and multi branch iteration for stereo matching. Submitted to IEEE Robotics and Automation Letters, 2025.
M2-Stereo embeds three Multi scale Feature Fusion Attention Blocks in the feature extraction stage to fuse deep and shallow information, and uses a Multi scale Cost Aggregation Module in the cost aggregation stage to share cost information across scales. Finally, a Multi branch Iterative Strategy is used for efficient iteration.
Iters=12;lr=0.0002;batch size=8
python;8 cores + NVIDIA TITAN Xp GPU
04/24/25
266
M2-Stereo
F
3
3.90
33
8.13
117
2.35
22
1.61
6
3.58
78
15.9
128
1.43
26
1.83
23
1.13
9
0.79
8
3.09
13
7.36
32
6.23
48
7.99
30
2.63
24
4.03
21
Gaofeng Peng. G2L-Stereo: Global to local two-stage real-time stereo matching network. Submitted to Transactions on Computational Imaging, 2025.
python;Nvidia A800 GPU
05/07/25
268
G2L-ROB
F
3
11.6
129
15.0
161
5.19
126
4.91
79
13.1
171
19.4
146
6.22
102
17.5
150
14.0
141
2.55
79
13.9
98
20.8
145
14.0
122
18.2
147
6.96
91
14.6
122
Jie Lin. DS-Stereo: Deep-shallow information interaction for stereo matching. Submitted to Neural Networks, 2025.
DS-Stereo uses our proposed Adjacent Feature Hybrid Attention Block and Hierarchical Cost Aggregation Module to achieve deep-to-shallow information interaction in stereo matching. It also replaces the traditional ConvGRU iterative operator with an Inception-like iterative operator to improve the convergence of the updates.
lr=0.0002;iters=12;batch size=8
python; 8 cores + NVIDIA TITAN Xp GPU
05/07/25
269
DS-Stereo
F
3
3.13
17
4.92
61
2.14
15
1.40
3
1.23
21
10.1
79
1.16
13
1.55
17
1.25
14
0.93
13
3.41
16
6.18
19
5.10
40
8.66
35
2.21
13
2.43
11
Anonymous. MatchAttention for high-resolution cross-view matching. Submitted to TPAMI, 2025.
MatchAttention for High-Resolution Cross-View Matching
MatchStereo-B, 76M parameters
RTX 4090
05/10/25
270
MatchStereo
F
3
1.85
4
2.61
9
2.14
15
1.79
12
0.59
2
1.30
2
0.80
4
1.11
6
0.95
5
0.65
4
1.90
4
3.17
2
2.41
7
5.43
12
1.88
7
1.79
4
Xiaoyang Zhao. Monocular prior-based water surface stereo matching algorithm. Submitted to Neurocomputing, 2025.
RTX 4090D
05/21/25
271
waterstereo
H
2
8.48
93
6.74
100
6.51
154
9.22
159
5.11
93
6.44
43
4.71
73
5.50
74
5.36
78
3.44
116
18.2
129
14.2
90
9.13
76
14.4
97
13.3
141
12.9
110
Zhien Dai. Multi-scale geometric-structure-enhanced stereo matching. Submitted to TIP, 2025.
This paper proposes a robust stereo matching algorithm that combines a CNN for initial cost computation, bilateral filtering with cross-based cost aggregation (CBCA) for refinement, and a winner-take-all (WTA) strategy for disparity selection, followed by an edge-aware smoothing filter (EASF) to reduce noise.
transformer based global stereo matching using stacked multi-resolution vision transformer blocks
ch:384, ntr: 3, num_refine: 3
Intel i9-13900K, Nvidia 4090
06/27/25
276
S2M2
F
3
1.15
1
1.29
1
1.23
1
1.27
1
0.40
1
0.45
1
0.59
2
0.67
1
0.62
1
0.45
1
1.28
1
2.80
1
1.37
1
3.60
3
1.12
1
0.25
1
Yongjian Zhang, Longguang Wang, Kunhong Li, Ye Zhang, Yun Wang, Liang Lin, and Yulan Guo. PanMatch: Unleashing the potential of large vision models for unified matching models. Submitted to TPAMI, 2025.
A100
06/17/25
275
PanMatch
F
3
7.18
71
5.21
67
5.34
132
3.34
45
5.43
97
4.52
17
2.47
46
13.2
127
13.3
138
1.51
45
8.34
49
16.6
112
8.07
65
10.3
45
4.96
53
9.23
62
Peng Yao, Haiwei Sang, and Linlin Ge. Stacked learning with exhaustive disparity characteristics for accurate stereo matching. Submitted to Neurocomputing, 2025.
Uses stacked learning with exhaustive disparity characteristics for accurate stereo matching to achieve more accurate disparity estimation.
For the ERF, we use 10 decision trees with a maximum depth of 25 and a minimum of 12 samples required to split a node.
Dodeca-core Intel Core i7 CPUs, 32 GB RAM, C++, 64-bit OS
07/11/25
277
SLEDC_v1
H
2
6.67
65
4.22
50
2.72
32
3.49
48
3.38
73
13.3
113
5.11
79
4.36
59
3.92
58
2.19
63
13.5
94
10.8
64
11.9
100
14.5
99
6.83
86
8.27
53
Tongfan Guan, Jiaxin Guo, Chen Wang, and Yun-Hui Liu. BridgeDepth: Bridging monocular and stereo reasoning with latent alignment. ICCV 2025.
A unified framework that bridges monocular reasoning and stereo matching through iterative bidirectional alignment of their latent representations.
lr=5e-4,iters=100000
Python; PyTorch + RTX 4090D
07/24/25
279
BridgeDepth
H
2
3.78
30
13.0
150
2.45
25
1.58
5
1.54
27
9.56
75
2.27
41
3.67
54
1.65
23
1.29
38
7.63
44
6.44
22
2.72
11
7.79
29
2.70
25
2.68
13
Anonymous. GEAStereo: Geometry-aware stereo-matching via monocular disparity prior and gradients. AAAI 2026 submission 7299.
python with pytorch; Nvidia RTX 3090
07/19/25
278
GEAStereo
F
3
3.80
31
2.93
17
2.29
20
2.08
20
2.52
47
6.53
46
2.14
39
2.11
29
2.32
36
1.36
40
6.97
40
6.42
21
5.55
44
10.9
48
2.33
21
5.06
34
Junda Chen and Wenjing Liao. MonSter++: A unified geometric foundation model for multi-view depth estimation via harnessing monodepth priors. Submitted to TPAMI, 2025.