vision.middlebury.edu/stereo/eval3

Middlebury Stereo Evaluation - Version 3


Mouseover the table cells to see the produced disparity map. Clicking a cell will blink the ground truth for comparison. To change the table type, click the links below. For more information, please see the description of new features.

Submit and evaluate your own results.

Set:	test dense, test sparse, training dense, training sparse
Metric:	bad 0.5, bad 1.0, bad 2.0, bad 4.0, avgerr, rms, A50, A90, A95, A99, time, time/MP, time/GD
Mask:	nonocc, all
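As a rough illustration of the metrics listed above, the sketch below computes a "bad" percentage, the average absolute error, and the RMS error over a masked set of pixels. It reads "bad 2.0" as the percentage of evaluated pixels whose disparity error exceeds 2.0 pixels; the function name, argument layout, and list-of-rows image format are assumptions made for this example, not the benchmark's own evaluation code.

```python
import math

def error_metrics(disp, gt, mask, threshold=2.0):
    """Return (bad %, average abs error, RMS error) over masked pixels.

    disp, gt: lists of rows of disparities; mask: lists of rows of booleans
    (True where a pixel is evaluated, e.g. a non-occluded pixel).
    """
    errors = [abs(d - g)
              for d_row, g_row, m_row in zip(disp, gt, mask)
              for d, g, m in zip(d_row, g_row, m_row) if m]
    if not errors:
        raise ValueError("mask selects no pixels")
    n = len(errors)
    bad = 100.0 * sum(e > threshold for e in errors) / n   # bad-threshold %
    avgerr = sum(errors) / n                               # mean absolute error
    rms = math.sqrt(sum(e * e for e in errors) / n)        # root mean square
    return bad, avgerr, rms
```

Switching the mask between a non-occluded mask and an all-pixels mask corresponds to the "nonocc" and "all" options.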

bad 2.0 (%)

Weight

Date

Name

Res

Avg

Austr

AustrP

Bicyc2

Class

ClassE

Compu

Crusa

CrusaP

Djemb

DjembL

Hoops

Livgrm

Nkuba

Plants

Stairs

MP: 5.6
nd: 290
im0	im1
GT
nonocc

MP: 5.6
nd: 290
im0	im1
GT
nonocc

MP: 5.6
nd: 250
im0	im1
GT
nonocc

MP: 5.7
nd: 610
im0	im1
GT
nonocc

MP: 5.7
nd: 610
im0	im1
GT
nonocc

MP: 1.5
nd: 256
im0	im1
GT
nonocc

MP: 5.5
nd: 800
im0	im1
GT
nonocc

MP: 5.5
nd: 800
im0	im1
GT
nonocc

MP: 5.7
nd: 320
im0	im1
GT
nonocc

MP: 5.7
nd: 320
im0	im1
GT
nonocc

MP: 5.7
nd: 410
im0	im1
GT
nonocc

MP: 5.9
nd: 320
im0	im1
GT
nonocc

MP: 5.5
nd: 570
im0	im1
GT
nonocc

MP: 5.6
nd: 320
im0	im1
GT
nonocc

MP: 5.2
nd: 450
im0	im1
GT
nonocc

OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle the quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
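The sum-of-absolute-differences matching cost mentioned above can be sketched as follows. This is a minimal illustration on grayscale images stored as lists of rows, not OpenCV's implementation; the function name, the `radius` parameter, and the convention that the right-image window is shifted left by the disparity `d` are assumptions for the example.

```python
def sad_cost(left, right, x, y, d, radius=1):
    """Sum of absolute differences between the window centred at (x, y) in
    the left image and the window centred at (x - d, y) in the right image.
    The caller must keep both windows inside the images."""
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total
```

Evaluating this cost for every candidate `d` at a pixel yields one column of the cost volume that the aggregation step then smooths.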

07/25/14

SGBM2

26.4

175

27.9

165

12.1

182

17.8

192

13.7

144

74.5

227

14.0

158

30.3

161

26.3

165

11.0

179

64.4

225

37.9

185

25.8

173

25.3

176

29.3

175

43.7

180

OpenCV's "semi-global block matching" method; memory efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
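The path-wise dynamic programming used for aggregation can be sketched for a single left-to-right path along one scanline. The recurrence is the standard semi-global matching form with a small penalty `p1` for a disparity change of one level and a larger penalty `p2` for bigger jumps; the penalty values and the function name are assumptions for this example, and OpenCV's implementation differs in detail.

```python
def aggregate_path(costs, p1=1.0, p2=8.0):
    """costs[x][d] -> aggregated costs along a left-to-right path.

    L(x, d) = C(x, d) + min(L(x-1, d),
                            L(x-1, d-1) + p1,
                            L(x-1, d+1) + p1,
                            min_k L(x-1, k) + p2) - min_k L(x-1, k)
    """
    agg = [list(costs[0])]
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d, c in enumerate(costs[x]):
            best = prev[d]
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d + 1 < len(prev):
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)
            row.append(c + best - prev_min)  # subtract to keep values bounded
        agg.append(row)
    return agg
```

A full method sums such aggregated costs over several path directions (five or eight in the descriptions above) before picking the best disparity per pixel.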

07/25/14

SGBM1

27.9

180

28.3

167

17.2

201

19.0

193

14.5

148

57.9

212

15.6

168

31.8

164

31.4

177

13.2

190

58.6

212

38.6

187

27.0

183

25.9

178

31.4

180

59.7

206

The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.
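A small sketch of the Census transform and the Hamming-distance cost described above. The 3x3 neighbourhood (`radius=1`) and the "darker than the centre" bit convention are assumptions for this example; the submission's actual window size is not stated here.

```python
def census(img, x, y, radius=1):
    """Bit string with a 1 wherever a neighbour is darker than the centre."""
    centre = img[y][x]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < centre else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two Census signatures."""
    return bin(a ^ b).count("1")
```

Because only the ordering of intensities matters, adding a constant brightness offset to one image leaves the signature, and hence the cost, unchanged.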

07/25/14

SGM

20.8

157

35.5

182

9.57

164

13.8

169

16.5

153

32.1

153

19.0

177

25.8

144

16.7

139

8.95

167

39.8

164

31.1

158

22.6

150

20.7

143

21.3

159

32.2

152

07/25/14

SGBM1

28.4

182

43.5

209

9.09

157

13.6

168

25.9

189

82.0

231

14.4

163

43.4

193

30.3

175

5.98

147

59.3

215

45.8

203

28.5

190

24.9

173

20.1

156

45.9

185

07/28/14

SGBM1

23.8

164

32.9

175

10.8

174

13.6

167

16.2

151

71.2

222

12.6

140

26.6

147

23.0

156

5.83

142

53.8

205

39.2

189

25.6

172

22.8

155

18.8

147

47.4

189

07/28/14

SGM

18.4

144

40.3

197

4.54

8.03

120

22.9

176

40.5

166

11.4

132

24.7

141

10.1

104

5.40

137

29.6

143

28.5

150

23.9

157

20.0

135

14.2

127

30.9

147

07/28/14

SGM

25.3

172

45.1

212

4.33

6.87

109

32.2

208

50.0

197

13.0

149

48.1

204

18.3

145

7.66

158

29.6

142

36.1

177

31.2

200

24.2

168

24.5

164

50.2

193

Correlation with five, partly overlapping windows on Census transformed images using Hamming distance as matching cost. A left-right consistency check ensures unique matches and filtering small disparity segments removes outliers. Interpolation is done within image rows with the lowest, valid neighboring disparity.
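The left-right consistency check and the row-wise interpolation with the lowest valid neighbouring disparity can be sketched on a single image row. Using `None` for invalid pixels, integer disparities, and a tolerance of one disparity level are assumptions made for this example.

```python
INVALID = None

def left_right_check(disp_left, disp_right, max_diff=1):
    """Invalidate left disparities that the right disparity row contradicts."""
    out = []
    for x, d in enumerate(disp_left):
        xr = x - d if d is not INVALID else -1   # matching column on the right
        ok = (0 <= xr < len(disp_right)
              and disp_right[xr] is not INVALID
              and abs(disp_right[xr] - d) <= max_diff)
        out.append(d if ok else INVALID)
    return out

def fill_row(row):
    """Fill each invalid pixel with the lower of its nearest valid neighbours."""
    filled = list(row)
    for x, d in enumerate(row):
        if d is not INVALID:
            continue
        left = next((row[i] for i in range(x - 1, -1, -1)
                     if row[i] is not INVALID), INVALID)
        right = next((row[i] for i in range(x + 1, len(row))
                      if row[i] is not INVALID), INVALID)
        candidates = [v for v in (left, right) if v is not INVALID]
        if candidates:
            filled[x] = min(candidates)
    return filled
```

Choosing the lower disparity favours the background surface, which is usually the right guess for pixels invalidated by occlusion.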

07/28/14

Cens5

26.6

176

47.1

218

8.74

154

11.9

156

25.6

186

45.3

188

19.5

181

40.6

187

29.0

172

9.93

173

36.5

157

38.6

186

31.0

199

25.0

175

25.6

166

44.6

182

A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.
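For readers unfamiliar with NCC matching costs, here is a minimal sketch of normalized cross-correlation between two patches given as flat lists of intensities. Turning the correlation into a cost as `1 - ncc`, and returning 0 for flat patches, are assumptions for this example rather than details taken from the submission.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treated as 0 here
    return sum(u * v for u, v in zip(da, db)) / denom

def ncc_cost(a, b):
    """Lower is better: 0 for perfectly correlated patches, 2 for inverted."""
    return 1.0 - ncc(a, b)
```

NCC is invariant to gain and offset changes between the two patches, which makes it a common choice when exposure differs between the views.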

08/25/14

LPS

19.2

153

6.14

5.34

112

9.24

132

7.53

106

96.0

236

12.3

138

9.61

9.40

5.18

131

92.4

237

27.4

146

24.3

159

23.0

157

10.0

100

25.6

138

08/27/14

LPS

20.3

156

6.72

6.06

124

9.72

135

9.87

121

94.3

235

14.1

159

11.2

111

5.88

143

89.3

236

36.0

176

20.5

139

23.8

166

16.0

136

25.4

135

08/31/14

BSM

41.5

226

59.8

233

25.8

228

27.9

214

38.9

224

60.6

214

33.3

220

46.9

202

37.3

197

26.3

223

64.8

226

51.5

213

42.6

227

45.2

231

42.8

204

66.6

216

09/10/14

LAMC_DSM

26.0

173

55.8

229

11.9

180

14.3

173

18.3

160

44.0

182

18.3

176

39.9

185

29.5

174

6.67

150

31.1

146

34.5

170

28.8

191

26.3

179

30.1

178

35.7

157

09/18/14

SNCC

21.9

159

48.6

222

6.98

136

9.79

136

25.7

187

46.0

191

12.4

139

36.8

179

16.6

138

7.25

154

23.1

123

34.2

169

26.7

180

21.8

149

19.9

154

28.4

145

10/07/14

IDR

18.1

143

37.5

191

4.08

7.49

117

23.3

178

40.6

168

12.8

141

24.5

139

11.3

112

5.46

138

33.1

151

26.0

139

21.5

147

21.7

148

15.3

133

21.2

126

In stereo matching, cost filtering methods and energy minimization algorithms are considered two different techniques. Due to their global extent, energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, in which cost filtering approaches obtain better results. In this paper we intend to combine both approaches with the aim of improving overall stereo matching results. We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.

01/21/15

TSGO

39.1

216

34.1

178

16.9

200

20.0

194

43.3

228

55.4

208

14.3

161

54.1

218

49.2

216

33.9

231

66.2

227

45.9

204

39.8

222

42.6

227

47.2

213

52.6

195

04/08/15

REAF

31.4

194

58.3

230

30.9

234

13.1

162

45.3

230

63.8

217

30.9

217

38.7

183

25.3

162

8.60

165

39.3

163

36.8

180

27.0

182

35.5

215

18.2

145

39.7

170

04/09/15

PFS

32.2

199

65.1

235

29.4

233

12.1

157

50.0

233

70.8

221

28.2

207

44.6

195

23.1

157

7.85

161

37.0

161

37.7

183

27.9

187

36.0

217

19.8

153

35.7

158

04/17/15

TMAP

16.9

136

20.2

144

4.94

8.13

123

12.8

140

30.0

146

11.7

133

27.9

155

20.4

150

5.09

130

31.5

147

23.1

131

20.9

142

19.0

132

18.8

148

18.0

115

This approach triangulates the polygonized SLIC segmentations of the input images and optimizes a lower-layer MRF on the resulting set of triangles defined by photo consistency and normal smoothness. The lower-layer MRF is solved by a quadratic relaxation method which iterates between PatchMatch and Cholesky Decomposition. The lower-layer MRF is assisted by an upper-layer MRF defined on the set of triangle vertices which exploits local 'visual complexity' cues and encourages smoothness of the vertices' splitting properties. The two layers interact through an Alignment energy term which requires triangles sharing a non-split vertex to have their disparities agree on that vertex. Optimization of the whole model is iterated between optimizations of the two layers until convergence, where the upper layer can be solved in closed form.

04/19/15

MeshStereo

13.2

120

5.90

4.88

10.8

148

12.9

141

10.6

11.0

128

12.2

102

9.01

5.39

136

27.4

134

23.5

133

17.7

129

21.0

145

15.4

134

20.9

124

Compute the matching cost with a convolutional neural network (accurate architecture). Then apply cross-based cost aggregation, semiglobal matching, left-right consistency check, median filter, and a bilateral filter. DETAILS: The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number, the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.
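The layer sizes quoted above can be checked with a little arithmetic. The sketch below assumes unpadded, stride-1 convolutions (so each 3 x 3 layer trims two pixels from each spatial dimension) and fully-connected layers with bias terms; both are assumptions of this example, as are the helper names.

```python
def conv_output_size(size, kernel, layers, stride=1, padding=0):
    """Spatial size after `layers` identical convolutions."""
    for _ in range(layers):
        size = (size + 2 * padding - kernel) // stride + 1
    return size

def dense_parameter_count(widths):
    """Weights plus biases of a chain of fully-connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(widths, widths[1:]))

# 11 -> 9 -> 7 -> 5 -> 3 -> 1 under the no-padding, stride-1 assumption
patch_after_convs = conv_output_size(11, kernel=3, layers=5)
# two 112-length branch outputs concatenated
concat_length = 2 * 112
# three 384-unit layers followed by the single-output layer
fc_params = dense_parameter_count([concat_length, 384, 384, 384, 1])
```

Under these assumptions each 11 x 11 patch collapses to a single spatial position, which is consistent with the description of one 112-length feature vector per patch.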

08/28/15

MC-CNN-acrt

8.08

5.59

4.55

5.96

2.83

11.4

5.81

8.32

8.89

2.71

16.3

14.1

13.2

13.0

6.40

11.1

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed. (Improved results as of 9/14/2015 due to bug fix in color-to-gray conversion.)
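Removing small disparity segments can be sketched as a connected-component pass over the disparity map. The 4-connectivity, the rule that neighbouring pixels belong to the same segment when their disparities differ by at most `max_diff`, and the use of `None` for invalid pixels are assumptions of this example; only the 50-pixel size limit comes from the description above.

```python
from collections import deque

def remove_small_segments(disp, min_size=50, max_diff=1.0):
    """Set to None every 4-connected segment with fewer than min_size pixels."""
    h, w = len(disp), len(disp[0])
    out = [list(r) for r in disp]
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or disp[y0][x0] is None:
                continue
            seen[y0][x0] = True
            segment, queue = [], deque([(x0, y0)])
            while queue:  # breadth-first flood fill of one segment
                x, y = queue.popleft()
                segment.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and disp[ny][nx] is not None
                            and abs(disp[ny][nx] - disp[y][x]) <= max_diff):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(segment) < min_size:
                for x, y in segment:
                    out[y][x] = None
    return out
```

Isolated patches of disparity that disagree with their surroundings are typically mismatches, so discarding them trades a little density for fewer outliers.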

09/14/15

ELAS

32.3

200

50.9

223

9.17

158

11.0

151

33.0

211

88.2

233

18.3

175

47.3

203

26.8

166

11.7

184

41.7

168

37.4

181

23.7

156

28.8

189

63.0

227

42.8

175

09/28/15

R-NCC

48.4

231

26.2

161

14.8

192

30.2

220

30.9

203

72.9

225

41.6

233

77.7

236

64.1

235

27.4

226

59.1

214

71.9

236

50.9

233

33.9

213

78.2

236

80.8

232

The method generates multiple proposals on absolute and relative disparities from multi-segmentations. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes.

10/13/15

MDP

12.6

115

14.4

127

4.99

101

10.6

147

10.7

125

27.2

141

8.11

107

12.5

104

8.07

4.27

121

30.4

144

20.5

123

12.6

17.8

119

13.4

124

17.3

111

We post-process the depth maps produced by Zbontar & LeCun's MC-CNN technique. We use a domain transform to compute an edge-aware variance measure of our confidence in the depth map, and then run our robust bilateral solver on that depth map and confidence with a Geman-McClure loss function. The MC-CNN is computed using the publicly-available implementation (https://github.com/jzbontar/mc-cnn), which uses the GPU, and the robust bilateral solver is computed using our CPU implementation, which does not use the GPU and is written in vanilla C++.
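As a point of reference, one common form of the Geman-McClure loss is sketched below. The exact scaling used by the submission is not given here, so the formula and the `scale` parameter should be read as an assumption of this example.

```python
def geman_mcclure(residual, scale=1.0):
    """One common form: rho(r) = r^2 / (r^2 + scale^2).

    Behaves like a scaled quadratic near zero and saturates towards 1
    for large residuals, which limits the influence of outliers.
    """
    r2 = residual * residual
    return r2 / (r2 + scale * scale)
```

The saturation is what makes the loss robust: a grossly wrong depth value contributes at most a bounded penalty instead of dominating the solve.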

11/03/15

MC-CNN+RBS

8.42

6.05

5.16

105

6.24

3.27

11.1

6.36

8.87

9.83

101

3.21

15.1

15.9

12.8

13.5

7.04

9.99

12/18/15

INTS

14.5

128

20.2

143

4.52

8.62

126

11.6

128

29.5

143

10.7

126

16.4

119

10.3

105

4.69

122

27.6

136

22.5

128

20.7

140

20.5

139

11.5

114

24.9

134

An efficient stereo matching algorithm, which applies adaptive smoothness constraints using texture and edge information, is proposed in this work. First, we determine non-textured regions, on which an input image yields flat pixel values. In the non-textured regions, we penalize depth discontinuity and complement the primary CNN-based matching cost with a color-based cost. Second, by combining two edge maps from the input image and a pre-estimated disparity map, we extract denoised edges that correspond to depth discontinuity with high probabilities. Thus, near the denoised edges, we penalize small differences of neighboring disparities. The method uses the MC-CNN code for the matching cost computation only.

01/19/16

NTDE

7.44

5.72

4.36

5.92

2.83

10.4

5.71

5.30

5.54

2.40

13.5

14.1

12.6

13.9

6.39

12.2

01/26/16

MC-CNN-fst

9.47

7.35

5.07

102

7.18

113

4.71

16.8

110

8.47

111

7.37

6.97

2.82

20.7

113

17.4

104

15.4

112

15.1

7.90

12.6

Our approach is an extension of the ELAS (from Geiger et al.) algorithm. We extract edges and sample our candidate support points along them. For every two consecutive valid support points we create a (straight) line segment. We force the triangulation to include the set of line segments (constrained Delaunay) for a better preservation of the disparity discontinuity at the edges.

02/18/16

LS-ELAS

36.7

209

53.5

225

10.3

172

15.8

183

37.0

221

83.6

232

24.5

195

49.1

206

34.6

188

13.9

191

44.9

177

45.7

202

34.9

210

29.1

191

64.4

230

62.7

212

The computation of the sparse disparity maps is achieved by means of a 3D diffusion of the costs contained in the disparity space volume. The watershed segmentations of the left and right views control the diffusion process and valid measurements are obtained by cross-checking. The estimation of the dense disparity maps uses the sparse measurements as control points and is driven by a 3D watershed separating the disparity space volume into foreground and background pixels.

03/15/16

MPSV

43.5

228

58.8

231

33.9

236

34.2

229

37.9

223

52.4

202

30.8

215

56.8

225

51.0

223

30.6

228

56.9

208

51.5

214

44.6

229

43.4

228

44.2

206

54.2

198

No post-processing (no filtering, no hole-filling, no interpolation) is performed. The concepts of intrinsic curves were revisited and used for (1) disparity search space reduction, resulting in an 83% reduction of the disparity range (individually for each pixel) directly from the original resolution of the image without needing hierarchical search, and (2) reducing the ambiguities due to occluded pixels by integrating occlusion clues explicitly into the global energy function as a soft prior. The final energy minimization was done using a semi-global approach along eight paths.

04/03/16

ICSG

45.6

230

69.7

237

19.1

206

21.3

199

43.6

229

77.6

230

36.9

227

65.3

232

40.4

203

20.3

206

53.6

203

58.7

231

46.5

231

47.1

233

60.7

226

79.1

229

04/24/16

HLSC_cor

26.0

174

26.5

162

15.2

195

21.0

197

20.5

170

35.7

160

23.4

192

33.1

169

35.0

190

11.9

185

39.1

162

34.2

168

25.2

170

32.8

207

28.3

173

22.7

129

04/27/16

JEM

37.2

212

35.7

183

27.9

232

30.6

222

33.2

213

43.0

180

31.4

218

49.5

208

47.3

212

26.5

225

49.6

191

46.0

205

35.7

212

30.8

199

37.5

192

55.8

203

A 3D label based method with global optimization at pixel level. A bilayer matching cost is employed by first matching small square windows and then aggregating over large irregular windows. Global optimization is carried out by fusing candidate proposals, which are generated from our specific superpixel structure.

05/12/16

PMSC

6.71

3.46

2.68

6.19

2.54

6.92

4.54

3.96

4.04

2.37

13.1

12.3

12.2

16.2

106

5.88

10.8

05/28/16

APAP-Stereo

7.26

5.43

4.91

5.11

5.17

21.6

124

6.99

4.31

4.23

3.24

14.3

9.78

7.32

13.4

6.30

8.46

07/03/16

LPU

10.4

11.4

114

3.18

8.10

122

6.08

20.9

122

8.24

109

6.94

4.00

4.04

113

33.9

154

16.9

101

15.2

110

17.8

118

9.12

11.6

08/31/16

SED

63.4

237

54.3

227

22.4

221

72.9

238

64.5

238

71.4

223

42.5

234

80.1

237

67.9

236

49.8

237

79.6

231

74.4

237

65.4

238

55.1

237

86.1

238

91.6

237

We propose a method that incorporates a surface normal constraint predicted by deep learning. With reliable disparities selected from a stereo matching method and an effective edge fusion strategy, we can faithfully convert the predicted surface normal map to a disparity map by solving a least squares system which maintains discontinuity. We use the raw matching cost of MC-CNN.

09/13/16

SNP-RSM

8.75

5.46

4.85

6.50

102

3.37

10.4

7.31

8.73

9.37

3.58

103

14.3

14.7

14.9

109

12.8

10.1

102

10.8

10/19/16

LW-CNN

7.04

4.65

3.95

5.30

2.63

11.2

5.41

4.32

4.22

2.43

12.2

13.4

13.6

14.8

4.72

12.0

10/23/16

SIGMRF

64.2

238

60.0

234

33.0

235

67.9

237

63.2

237

99.5

240

39.8

231

84.8

238

82.0

238

35.2

233

95.2

238

91.5

238

58.1

236

65.8

238

55.0

221

88.6

236

11/06/16

SPS

19.6

154

14.2

126

12.3

184

14.9

177

12.0

134

15.8

102

19.1

178

17.4

124

15.4

130

8.23

163

30.9

145

34.8

171

30.6

197

25.3

176

28.3

172

28.0

141

11/15/16

MC-CNN-WS

12.1

112

14.8

130

7.20

140

11.1

153

7.62

107

15.9

104

11.8

134

11.5

101

9.01

3.89

108

19.7

109

20.5

122

16.3

117

16.3

107

12.1

117

18.3

118

11/16/16

MCSC

11.3

108

13.3

122

5.96

122

10.6

144

8.69

113

7.22

11.3

130

10.6

7.48

3.07

3.10

25.2

137

19.0

135

17.2

112

10.3

104

25.5

137

11/24/16

ADSM

38.7

215

40.4

199

20.3

210

27.3

213

35.1

218

55.9

209

22.3

190

56.1

221

50.9

222

24.2

217

58.0

210

56.3

227

36.5

214

32.1

206

38.7

197

69.7

219

01/15/17

IGF

34.0

204

42.7

206

20.1

209

23.7

207

32.2

207

45.6

190

28.6

211

43.0

191

37.2

196

21.4

210

50.9

196

44.7

201

34.7

209

31.9

204

37.4

190

47.1

188

01/24/17

3DMST

5.92

3.71

2.78

4.75

2.72

7.36

4.28

3.44

3.76

2.35

12.6

11.5

8.56

14.0

5.35

8.87

03/09/17

SGMEPi

13.9

125

6.92

6.71

133

9.47

133

9.72

119

11.8

13.6

153

10.9

10.6

108

5.26

132

32.8

150

26.9

142

22.7

151

22.7

154

12.0

116

21.7

127

03/10/17

MC-CNN+TDSR

6.35

5.45

4.45

6.80

107

3.46

10.7

6.05

5.01

5.19

2.62

10.8

9.62

6.59

11.4

6.01

7.04

We propose a novel method for stereo estimation, combining advantages of convolutional neural networks (CNNs) and optimization-based approaches. The optimization, posed as a conditional random field (CRF), takes local matching costs and consistency-enforcing (smoothness) costs as inputs, both estimated by CNN blocks. To perform the inference in the CRF we use an approach based on linear programming relaxation with a fixed number of iterations. We address the challenging problem of training this hybrid model end-to-end. We show that in the discriminative formulation (structured support vector machine) the training is practically feasible. The trained hybrid model with shallow CNNs is comparable to state-of-the-art deep models in both time and performance. The optimization part efficiently replaces sophisticated and not jointly trainable (but commonly applied) post-processing steps by a trainable, well-understood model.

03/22/17

JMR

12.5

114

4.09

3.97

8.44

125

6.93

101

11.1

13.8

154

19.5

130

19.0

147

3.66

104

17.0

100

18.2

109

18.0

131

21.0

146

7.29

17.8

114

03/23/17

DSGCA

33.8

202

42.9

207

20.9

216

23.6

206

30.2

200

45.5

189

27.6

205

42.0

190

36.0

194

21.0

207

50.2

193

44.2

200

33.3

208

34.6

214

38.4

195

46.8

186

04/04/17

DDL

30.1

189

44.3

211

19.4

207

25.8

211

28.3

195

42.1

174

21.1

187

37.1

180

28.7

171

21.7

211

46.8

183

36.0

175

30.3

196

28.4

186

32.7

183

37.5

163

05/23/17

r200high

40.9

223

70.5

238

14.4

189

21.3

198

37.7

222

72.2

224

38.1

229

53.2

216

31.4

177

18.3

196

52.4

201

52.6

218

44.1

228

45.4

232

50.7

216

66.5

215

06/14/17

DoGGuided

41.4

224

45.4

214

23.6

224

30.6

223

34.6

216

52.5

203

28.3

208

59.1

230

53.8

227

26.4

224

60.6

219

54.7

225

38.3

218

35.5

216

44.5

207

72.0

225

We propose local expansion moves for estimating dense 3D labels on a pairwise MRF. The data term uses a PatchMatch-like 3D slanted window formulation, where raw matching costs within a window are computed by MC-CNN-acrt and aggregated using guided image filtering. The smoothness term uses a pairwise curvature regularization term by Olsson et al. 2013.
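A "3D label" in slanted-window formulations of this kind is usually a disparity plane: three numbers per pixel instead of a single disparity. The sketch below evaluates such a plane at a pixel; the `(a, b, c)` parameterization follows the usual PatchMatch-stereo convention and is an assumption of this example, not code from the submission.

```python
def plane_disparity(label, x, y):
    """Disparity at pixel (x, y) under the plane label (a, b, c):
    d = a * x + b * y + c."""
    a, b, c = label
    return a * x + b * y + c

# A fronto-parallel surface is the special case a = b = 0.
fronto_parallel = (0.0, 0.0, 12.0)
# A slanted surface changes disparity across the support window.
slanted = (0.5, -0.25, 10.0)
```

When a window cost is aggregated, each pixel in the window is compared at the disparity the plane assigns to it, so slanted surfaces are not penalized the way they are with constant-disparity windows.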

06/22/17

LocalExp

5.43

3.65

2.87

2.98

1.99

5.59

3.37

3.48

3.35

2.05

10.3

9.75

8.57

14.4

5.40

9.55

We propose a feature ensemble network leveraging a deep convolutional neural network to perform matching cost computation and disparity refinement. For matching cost computation, a patch-based network architecture with multi-size and multi-layer pooling units is adopted to learn cross-scale feature representations. For disparity refinement, the initial optimal and sub-optimal disparity maps are incorporated and diverse base learners are applied.

10/12/17

FEN-D2DRR

7.23

4.68

4.11

5.03

3.03

8.42

6.05

4.90

5.32

3.20

11.5

14.1

13.4

13.9

5.06

14.3

102

We propose a robust learning-based method for stereo cost volume computation. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that our matching volume estimation method achieves similar accuracy to purely data-driven alternatives and that it generalizes to unseen data much better. In fact, we used the same model trained on the Middlebury 2014 dataset to submit to the KITTI and ETH3D benchmarks.

11/13/17

CBMV

11.1

107

6.07

5.22

107

8.09

121

4.05

18.7

117

9.31

116

10.7

9.61

3.11

33.7

152

15.6

17.5

127

17.1

111

10.1

101

14.4

103

We extend the standard BP sequential technique to fully connected CRF models with the geodesic distance affinity. We also propose a new approach to the BP marginal solution that we call one-view-occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution makes it possible to find occluded regions in the disparity map and simultaneously improve the matching result. As a result, we can perform only one energy minimization process and avoid the cost calculation for the second view and the left-right check procedure.
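For contrast with the OVOD solution, the standard winner-takes-all estimate simply picks the lowest-cost disparity at every pixel. The sketch below shows that baseline on one row of a cost volume; the `costs[x][d]` layout and the function name are assumptions of this example.

```python
def winner_takes_all(costs):
    """costs[x][d] -> index of the lowest-cost disparity for each pixel.
    Ties are resolved in favour of the smallest disparity."""
    return [min(range(len(c)), key=c.__getitem__) for c in costs]
```

WTA always returns some disparity, even for pixels that have no counterpart in the other view, which is why a separate left-right check is normally needed to find occlusions.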

12/11/17

OVOD

8.87

4.74

3.64

5.51

4.82

12.8

6.51

9.91

9.96

103

3.13

16.6

14.8

14.1

103

15.4

6.92

13.2

02/07/18

56.1

235

47.7

220

27.9

231

36.1

231

46.7

231

62.5

216

50.2

238

72.4

235

69.9

237

37.8

234

88.2

235

70.0

234

52.8

235

50.2

236

77.5

235

91.8

238

02/28/18

SDR

7.69

5.41

4.22

4.20

2.73

10.2

5.40

6.40

5.76

4.72

123

11.2

15.4

13.4

16.5

108

5.22

13.0

03/09/18

SGM_RVC

18.4

145

37.4

189

5.31

110

9.03

129

14.2

146

31.7

150

14.3

162

24.7

142

12.6

116

5.27

133

31.8

148

29.7

154

24.9

168

22.0

150

18.6

146

28.2

143

Semi-Global Matching (SGM) uses an aggregation scheme to combine costs from multiple 1D scanline optimizations that tends to hurt its accuracy in difficult scenarios. We propose replacing this aggregation scheme with a new learning-based method that fuses disparity proposals estimated using scanline optimization. Our proposed SGM-Forest algorithm solves this problem using per-pixel classification. SGM-Forest currently ranks 1st on the ETH3D stereo benchmark and is ranked competitively on the Middlebury 2014 and KITTI 2015 benchmarks. It consistently outperforms SGM in challenging settings and under difficult training protocols that demonstrate robust generalization, while adding only a small computational overhead to SGM.

03/11/18

SGM-Forest

7.37

4.71

3.69

4.93

3.18

11.1

5.37

5.57

5.81

2.65

14.5

13.2

13.1

14.8

5.63

11.2

03/14/18

DTS

13.4

122

8.45

101

7.54

144

7.46

116

5.50

14.9

10.2

124

24.5

140

25.1

160

4.93

126

19.2

108

18.7

110

14.6

107

15.9

103

13.0

121

17.2

110

03/23/18

MEDIAN_ROB

97.8

240

96.1

239

95.6

239

99.0

240

98.4

240

98.4

239

99.2

240

98.4

240

98.1

239

99.0

240

99.0

240

99.6

240

99.9

240

94.7

240

95.1

239

98.3

239

03/23/18

AVERAGE_ROB

97.6

239

96.2

240

96.5

240

96.8

239

97.8

239

97.8

238

98.2

239

97.9

239

98.2

240

98.9

239

98.9

239

99.1

239

99.7

239

93.1

239

97.9

240

98.9

240

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed. Updated ELAS submission as a baseline for the Robust Vision Challenge (http://robustvision.net), replacing the original ELAS (H) entry.

03/26/18

ELAS_RVC

27.3

177

43.4

208

12.4

185

13.9

171

23.8

179

66.4

218

20.4

184

33.0

168

20.7

152

11.0

178

43.9

174

37.5

182

26.3

174

28.7

188

38.4

196

33.3

153

04/17/18

ISM

40.8

221

42.5

205

26.4

229

34.8

230

36.1

219

44.5

185

34.4

222

56.2

222

52.7

226

25.2

220

51.0

198

52.4

217

39.7

220

33.3

209

38.8

199

75.3

227

05/01/18

PSMNet_ROB

42.1

227

33.0

176

23.1

222

30.1

219

31.4

204

54.8

207

30.7

214

48.7

205

48.3

214

28.3

227

80.8

233

53.5

222

36.9

216

38.6

224

63.9

229

71.2

221

05/18/18

PDS

14.2

126

14.4

127

5.80

120

10.5

143

10.5

123

22.1

125

14.0

157

14.5

110

8.97

5.93

145

24.2

128

21.5

126

18.2

134

18.9

130

11.9

115

33.6

155

05/22/18

DN-CSS_ROB

22.8

162

31.4

172

9.28

160

13.5

166

12.4

136

44.3

184

12.1

135

28.1

157

17.6

142

9.11

169

50.9

197

40.0

191

21.2

144

25.0

174

31.9

182

43.2

177

05/26/18

NOSS_ROB

5.01

3.57

2.84

3.99

1.93

5.15

3.34

3.32

3.15

2.32

8.55

7.45

7.06

12.5

5.20

9.06

05/31/18

FBW_ROB

32.2

198

36.3

185

9.37

161

14.2

172

19.1

163

69.3

220

13.2

151

51.0

210

39.1

200

10.8

177

43.9

174

41.5

195

33.0

206

29.9

197

51.3

217

71.4

222

05/31/18

iResNet_ROB

24.8

170

23.0

151

10.2

171

14.7

175

12.4

135

25.9

137

12.9

142

28.0

156

24.9

159

11.5

182

46.6

182

38.9

188

21.4

145

27.8

182

45.6

210

66.7

217

05/31/18

CBMV_ROB

7.65

3.48

3.35

4.80

3.57

6.32

6.88

4.84

3.91

1.97

25.4

130

11.1

13.1

15.8

102

7.34

13.8

06/05/18

CBMBNet

10.2

8.30

5.10

103

6.87

109

4.52

11.5

7.70

100

13.9

108

13.2

117

3.04

21.9

116

13.3

13.6

100

15.4

11.2

112

11.0

Numerous CNN algorithms focus on the pixel-wise matching cost computation, which is an important building block for many state-of-the-art algorithms. However, these architectures are limited to small, single-scale receptive fields and use traditional methods for cost aggregation or even ignore cost aggregation. In this paper, we propose a novel architecture called cascaded multi-scale and multi-dimension network (MSMD) to take them both into consideration. First, we propose a new multi-scale matching cost computation sub-network, in which two different sizes of receptive fields are implemented in parallel. In this way, the network can make the best use of both variants to balance the trade-off between the increase of receptive field and the loss of details. Furthermore, we show that our multi-dimension aggregation sub-network, which contains 2D convolution and 3D convolution operations, can provide rich context and semantic information for estimating an accurate initial disparity.

06/14/18

MSMD_ROB

30.9

191

26.9

164

14.6

190

20.0

195

22.6

175

33.7

154

27.8

206

43.9

194

38.4

199

21.1

208

49.5

190

40.8

193

31.8

201

31.6

200

37.5

194

43.6

179

A robust solution for semi-dense stereo matching is presented. It utilizes two CNN models for computing stereo matching cost and performing confidence-based filtering, respectively. Compared to existing CNN-based matching cost generation approaches, our method feeds additional global information into the network so that the learned model can better handle challenging cases, such as lighting changes and lack of texture. By utilizing non-parametric transforms, our method is also more self-reliant than most existing semi-dense stereo approaches, which rely heavily on the adjustment of parameters.

06/27/18

DCNN

10.9

104

5.66

4.98

6.49

101

5.73

12.5

8.51

112

15.6

114

10.9

110

3.08

24.1

127

20.2

120

16.8

123

15.5

10.3

105

13.8

07/31/18

MotionStereo

40.4

219

67.6

236

25.0

226

29.2

217

40.9

226

57.3

211

35.5

224

57.5

227

40.4

202

19.9

203

42.8

172

52.6

219

39.8

221

37.1

219

51.7

218

34.9

156

10/10/18

DISCO

24.5

167

35.0

181

7.34

142

11.7

154

18.7

162

48.6

194

17.1

170

31.4

162

22.4

154

9.33

172

46.0

180

33.5

165

27.5

186

24.6

170

27.5

170

55.0

201

10/29/18

iResNet

22.9

163

28.3

167

9.19

159

15.8

182

19.3

166

35.1

156

11.3

131

27.7

152

16.8

140

15.2

192

54.7

206

27.6

147

19.5

137

21.5

147

31.9

181

51.6

194

10/29/18

Dense-CNN

7.98

5.59

4.54

5.83

2.79

10.4

5.78

8.26

8.84

2.66

15.6

14.2

13.2

6.30

11.1

11/07/18

IEBIMst

33.8

203

36.7

187

12.1

183

16.9

188

32.5

209

51.0

198

25.3

198

58.1

228

49.8

218

11.2

180

48.6

187

56.9

229

30.2

195

26.8

181

26.9

168

71.7

223

11/08/18

HSM-Net_RVC

10.2

12.0

118

5.32

111

7.50

118

6.72

100

15.6

100

9.89

120

6.83

5.14

4.17

119

22.7

119

17.1

102

15.6

113

14.3

10.8

110

14.6

104

11/11/18

MBM

22.8

161

36.4

186

9.95

169

15.3

179

19.3

165

36.5

162

19.9

182

27.5

151

18.1

143

10.1

175

41.5

167

32.7

161

26.3

176

23.3

159

21.4

160

39.9

171

We propose four efficient feature extractors based on convolutional neural networks for stereo matching cost computation. Two of them generate multiscale features with diverse receptive field sizes. These multiscale features are used to compute the corresponding multiscale matching costs. We then determine an optimal cost by combining the multiscale costs using edge information. On the other hand, the other two feature extractors produce uni-scale features by combining multiscale features directly through fully connected layers. Finally, after obtaining matching costs using one of the four extractors, we determine optimal disparities based on the cross-based cost aggregation and the semiglobal matching.

11/28/18

MSFNetA

7.96

6.21

4.26

6.02

3.66

8.95

6.28

8.41

8.06

2.62

17.9

103

13.9

11.9

11.5

8.00

10.6

01/12/19

EHCI_net

9.47

3.75

4.27

13.1

162

27.6

192

5.30

3.23

3.47

3.18

3.90

109

9.20

9.58

9.26

13.9

17.3

141

10.6

01/17/19

FASW

28.6

184

41.7

202

18.1

203

23.1

204

27.2

191

40.6

168

19.1

179

34.9

174

28.1

170

18.5

199

40.8

166

36.4

178

29.3

192

28.4

185

31.1

179

41.0

172

12/18/18

MCV-MFC

24.8

169

26.8

163

9.63

167

14.8

176

17.4

156

54.1

206

14.2

160

26.5

146

18.2

144

16.0

194

62.6

223

28.7

152

20.9

141

24.6

171

37.4

191

48.1

191

02/05/19

AMNet

53.3

233

54.3

226

63.7

238

51.2

236

51.3

234

40.6

170

39.9

232

51.6

211

55.9

231

55.4

238

58.9

213

57.5

230

52.7

234

49.5

235

58.1

224

61.6

210

The method comprises two main steps. First, we use adaptive support weights for local matching. Apart from the color similarity and geometric distance, the adaptive weight distribution favors pixels in the block matching with smaller cost. Besides, we use a multiscale strategy with invalidation criteria to reduce match ambiguity and computational time. Second, a global interpolation using a variational formulation is carried out. The energy functional penalizes deviations from the local disparity estimation at different scales.
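The colour-similarity and geometric-distance parts of an adaptive support weight can be sketched as below, in the classic exponential form. The constants `gamma_c` and `gamma_g` and the function names are assumptions of this example, and the extra term described above that favours pixels with smaller block-matching cost is not modelled.

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c=10.0, gamma_g=10.0):
    """Weight of neighbour q in the support window of pixel p."""
    dc = math.dist(color_p, color_q)  # colour difference
    dg = math.dist(pos_p, pos_q)      # spatial distance in pixels
    return math.exp(-(dc / gamma_c + dg / gamma_g))

def weighted_cost(weights, costs):
    """Aggregate per-pixel costs of a window with the given support weights."""
    return sum(w * c for w, c in zip(weights, costs)) / sum(weights)
```

Neighbours that look like the centre pixel and lie close to it dominate the aggregated cost, so the window effectively adapts its shape to object boundaries.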

02/15/19

DAWA-F

27.4

178

47.2

219

13.6

187

13.1

164

19.2

164

66.4

218

20.4

185

30.3

160

33.9

186

8.73

166

48.9

188

37.8

184

26.7

179

29.9

196

28.0

171

36.5

160
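
The DAWA-F description above mentions adaptive support weights driven by color similarity and geometric distance. A minimal sketch of the classic form of such a weight follows; it is not the authors' implementation. The function names and the `gamma_c` and `gamma_g` constants are illustrative assumptions, and the extra cost-dependent term mentioned in the description is not modeled.

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c=10.0, gamma_g=7.0):
    """Weight of neighbour q in the support window centred on pixel p.

    gamma_c and gamma_g are hypothetical tuning constants.
    """
    dc = math.dist(color_p, color_q)   # colour dissimilarity
    dg = math.dist(pos_p, pos_q)       # spatial (geometric) distance
    return math.exp(-(dc / gamma_c + dg / gamma_g))

def aggregate(costs, weights):
    """Weighted average of the raw matching costs inside one support window."""
    return sum(w * c for w, c in zip(weights, costs)) / sum(weights)
```

Neighbours that look like the centre pixel and lie close to it receive weights near 1, so they dominate the aggregated cost.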

Stereo matching has attracted considerable study in recent years. The task is difficult because of visual discomfort effects that reduce the accuracy of disparity maps. Following the multistage structure used by most stereo matching algorithms (the taxonomy of D. Scharstein and R. Szeliski), this paper proposes an improved stereo matching algorithm that uses an Adaptive Weighted Bilateral Filter as the main filter in the cost aggregation stage, which preserves edges and is robust in plain-colour regions. In the matching cost computation stage, the window size of the sum of absolute differences (SAD) and the thresholds are adjusted, and a median filter is used as the main filter in the disparity refinement stage; together these may overcome limitations in disparity map accuracy. Evaluation on the indoor scenes of the latest (2014) Middlebury dataset shows that the proposed algorithm with the Adaptive Weighted Bilateral Filter produces smooth disparity maps with good processing time.

03/06/19

SM-AWP

38.1

213

30.7

171

24.0

225

25.2

210

30.3

201

44.9

186

38.1

228

56.0

220

55.8

230

19.9

204

60.1

217

51.2

212

32.1

203

30.2

198

40.0

202

61.7

211
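
The SM-AWP entry above names two standard building blocks: a sum of absolute differences (SAD) matching cost and a median filter for disparity refinement. The sketch below shows both on small grayscale images stored as nested lists. The function names, window radius, and border clamping are assumptions for illustration, and the bilateral aggregation step is not included.

```python
from statistics import median

def _clamp(v, lo, hi):
    return min(max(v, lo), hi)

def sad_cost(left, right, x, y, d, radius=1):
    """SAD between the window around (x, y) in the left image and the
    window around (x - d, y) in the right image (borders are clamped)."""
    h, w = len(left), len(left[0])
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = _clamp(y + dy, 0, h - 1)
            xl = _clamp(x + dx, 0, w - 1)
            xr = _clamp(x + dx - d, 0, w - 1)
            total += abs(left[yy][xl] - right[yy][xr])
    return total

def median_filter(disp, radius=1):
    """Replace every disparity by the median of its clamped neighbourhood."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for y in range(h):
        for x in range(w):
            window = [disp[_clamp(y + dy, 0, h - 1)][_clamp(x + dx, 0, w - 1)]
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)]
            out[y][x] = median(window)
    return out
```

A lower SAD value means a better match at that disparity; the median filter removes isolated outliers while keeping disparity edges sharper than a mean filter would.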

03/09/19

3DMST-CM

5.47

4.10

3.37

2.99

2.95

7.63

4.55

3.26

3.95

2.16

10.2

8.28

6.37

13.2

5.86

9.35

This paper presents a novel unsupervised matching cost for stereo matching. Specifically, a novel two-branch convolutional sparse coding (CSC) is used to learn the convolution filter bank without ground truth disparity maps. Then, the sparse representations over the learned convolutional filter bank are utilized to measure the similarity between image patches; that is, the stereo matching cost is computed by measuring the l1 distance between sparse representations of image patches.

04/12/19

TCSCSM

19.1

152

45.2

213

5.76

119

11.0

150

22.1

174

41.1

172

13.4

152

24.8

143

11.4

114

7.17

152

29.5

141

26.6

140

26.6

177

20.5

140

16.5

138

17.4

112
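
For the TCSCSM entry above, the matching cost is described as the l1 distance between sparse representations of image patches. The sketch below shows only that final step along one scanline, assuming the sparse codes have already been computed. The function names and data layout are hypothetical, and learning the filter bank is out of scope.

```python
def l1_matching_cost(code_left, code_right):
    """l1 distance between two per-pixel sparse code vectors."""
    return sum(abs(a - b) for a, b in zip(code_left, code_right))

def scanline_costs(codes_left, codes_right, x, max_disp):
    """Costs for left pixel x against right pixels x - d, d = 0..max_disp.

    Disparities that would fall outside the image are skipped.
    """
    return [l1_matching_cost(codes_left[x], codes_right[x - d])
            for d in range(max_disp + 1) if x - d >= 0]
```

The index of the smallest entry in the returned list is the best-matching disparity for that pixel under this cost.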

05/10/19

tMGM-16

17.3

137

8.70

106

6.49

129

9.82

137

20.7

171

13.7

13.9

156

21.8

133

16.0

133

5.57

139

26.9

132

25.4

138

28.3

188

22.1

151

14.6

130

39.6

169

In this work, we propose a learning-based method to denoise and refine disparity maps of a given stereo method. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method allows learning a robust collaborative regularizer that leverages the joint statistics of the color image, the confidence map and the disparity map. Due to the variational structure of our method, the individual steps can be easily visualized, thus enabling interpretability of the method. We can therefore provide interesting insights into how our method refines and denoises disparity maps. The efficiency of our method is demonstrated on the publicly available stereo benchmarks Middlebury 2014 and KITTI 2015.

05/13/19

14.2

127

9.69

108

9.58

165

10.9

149

7.33

104

9.54

13.8

155

11.3

100

11.3

113

7.17

152

27.4

135

23.3

132

24.8

165

22.8

156

14.6

131

18.4

119

05/15/19

PSMNet_2000

28.9

185

20.4

145

8.23

148

15.1

178

27.7

194

35.2

157

15.2

167

50.8

209

51.8

224

9.29

171

61.9

221

31.1

157

25.2

169

27.8

182

29.3

175

52.9

196

We propose "DeepPruner", a real-time stereo matching algorithm that combines the strengths of deep networks and search-space pruning techniques. Towards this goal, we developed a differentiable PatchMatch module that allows us to discard most disparities and generates a sparse representation of the cost volume. We then exploit this representation to learn which range to prune for each pixel. Our method achieves competitive results on the KITTI and SceneFlow datasets while running in real time at 62 ms. Moreover, we obtain the first place (on overall rankings) on the Robust Vision Challenge. For more details, check out our paper and source code.

06/26/19

DeepPruner_ROB

30.1

186

34.2

179

19.9

208

24.3

208

23.8

179

47.2

192

26.1

199

26.1

145

22.8

155

18.4

198

59.8

216

36.5

179

23.2

153

31.7

202

48.3

215

44.8

183

07/26/19

EdgeStereo

18.7

146

25.3

159

6.79

134

10.6

145

25.1

185

22.1

125

8.31

110

24.5

138

16.5

137

6.63

149

9.20

32.0

159

24.6

163

20.2

137

19.2

150

54.3

199

It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.

11/07/19

LBPS

9.68

5.05

4.98

5.57

3.24

6.03

12.9

144

5.44

5.50

3.55

102

15.2

15.9

17.5

126

17.3

114

9.84

28.3

144

11/11/19

CACA-Net

31.7

196

21.8

149

16.0

197

21.5

200

24.5

181

38.0

164

34.6

223

38.6

182

36.3

195

19.0

201

49.4

189

40.2

192

32.1

204

33.6

212

36.0

186

58.2

205

11/14/19

HSM-Smooth-Occ

10.8

103

11.7

115

5.62

115

8.75

127

8.39

112

15.4

9.60

117

8.29

6.11

4.09

115

23.6

124

20.4

121

17.0

124

13.4

10.2

103

16.6

109

11/15/19

100

SPPSMNet

41.4

225

32.5

174

20.6

212

34.0

228

33.0

210

55.9

210

36.3

226

53.8

217

48.6

215

35.0

232

71.6

228

52.8

220

37.1

217

38.1

223

46.7

212

56.9

204

12/19/19

101

CRLE

5.75

3.66

3.11

5.92

2.14

6.01

3.39

3.49

3.68

2.34

10.2

9.63

8.04

14.9

5.45

9.26

12/30/19

102

F-GDGIF

31.6

195

37.3

188

7.72

145

16.1

184

34.9

217

35.2

158

20.1

183

55.3

219

46.6

209

9.01

168

47.6

184

52.9

221

29.4

193

29.7

195

29.4

177

61.0

209

01/02/20

103

PPEP-GF

34.6

206

42.4

204

21.7

218

24.8

209

30.8

202

44.0

183

25.1

197

45.7

199

42.1

204

20.1

205

44.1

176

43.6

198

35.2

211

32.8

208

39.3

201

55.0

202

01/05/20

104

MTS2

53.8

234

51.7

224

21.5

217

38.8

233

52.7

235

97.5

237

43.0

235

66.4

233

60.8

233

32.0

229

85.7

234

69.0

233

46.3

230

45.1

230

71.2

234

85.2

234

01/07/20

105

ADSR_GIF

37.1

211

43.6

210

18.6

205

36.7

232

24.6

182

58.6

213

22.8

191

56.3

223

49.7

217

18.7

200

56.0

207

48.5

210

32.2

205

24.5

169

36.3

187

79.1

230

02/07/20

106

CasStereo

18.8

148

23.9

156

9.01

156

10.5

142

11.7

130

74.0

226

13.1

150

10.1

7.86

4.09

115

45.4

179

25.2

136

24.4

162

17.3

115

20.5

158

44.3

181

02/20/20

107

CRAR

22.0

160

23.2

154

13.5

186

16.4

186

16.3

152

21.0

123

21.8

188

28.5

158

26.9

167

10.5

176

32.5

149

32.9

162

23.3

154

23.6

164

20.3

157

37.4

162

02/24/20

108

SGBMP

27.8

179

37.5

192

16.3

199

17.1

191

27.6

193

75.7

229

14.6

164

33.4

171

25.8

163

12.2

187

60.3

218

34.0

167

23.1

152

29.3

192

28.7

174

31.2

149

03/13/20

109

MTS

59.7

236

58.8

232

25.2

227

51.1

235

60.4

236

91.3

234

48.3

237

70.3

234

63.4

234

44.4

236

79.3

230

71.7

235

60.9

237

47.4

234

79.6

237

88.4

235

05/14/20

110

SRM

13.1

119

8.50

102

7.04

138

7.86

119

7.73

110

16.1

105

7.90

105

18.4

128

18.5

146

5.03

129

22.3

118

20.0

117

18.1

132

18.5

125

11.3

113

19.3

122

05/19/20

111

SUWNet

30.1

188

24.5

157

13.9

188

20.3

196

20.0

168

35.7

159

26.1

200

40.9

188

43.2

206

17.9

195

49.8

192

28.6

151

24.4

160

28.5

187

52.5

220

37.9

165

05/20/20

112

AANet++

15.4

131

17.5

137

8.37

151

10.2

140

9.86

120

23.9

131

9.82

119

17.7

126

15.9

132

3.25

18.1

105

27.1

144

16.2

116

18.4

124

20.0

155

37.7

164

05/28/20

114

RTSMNet

45.6

229

47.0

217

21.9

219

31.9

225

36.4

220

75.1

228

43.9

236

58.9

229

55.3

229

32.7

230

62.2

222

56.4

228

42.2

226

39.1

225

58.0

223

59.9

207

05/28/20

113

LEAStereo

7.15

7.56

4.52

4.62

4.64

8.83

5.66

5.86

6.03

3.30

13.1

11.3

10.3

12.1

7.06

9.90

06/08/20

115

MANE

30.9

192

54.7

228

11.5

178

14.6

174

29.4

199

52.6

204

26.4

202

45.1

198

31.5

180

11.5

182

42.5

169

41.8

196

33.1

207

31.6

201

34.2

184

43.5

178

07/16/20

116

HLocalExp-CM

5.68

3.68

2.95

3.92

2.45

8.12

3.41

3.74

3.53

2.17

10.2

10.0

8.75

14.1

5.12

9.61

07/17/20

117

GANetREF_RVC

18.9

150

16.6

135

6.42

127

7.40

115

10.6

124

25.8

136

12.2

137

36.5

178

35.5

192

4.10

117

33.8

153

20.1

118

16.4

120

20.2

138

14.3

129

48.1

192

07/21/20

118

AANet_RVC

25.2

171

22.6

150

11.3

176

12.9

161

15.9

149

30.5

148

17.9

173

33.4

170

30.9

176

6.34

148

28.8

140

43.4

197

25.3

171

26.7

180

37.0

189

69.8

220

08/10/20

119

CVANet_RVC

31.8

197

25.6

160

14.6

191

21.7

201

22.1

173

39.8

165

28.4

209

44.7

196

47.0

211

19.1

202

50.6

195

30.4

156

24.7

164

29.3

193

52.1

219

41.6

174

Accurate disparity prediction is a hot topic in computer vision, and efficiently exploiting contextual information is the key to improving performance. In this paper, we propose a simple yet effective non-local context attention network (NLCANet) to exploit the global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning (GFL) module to get a more discriminative representation by taking advantage of multi-scale features and form them into the variance-based cost volume. Then, we construct a non-local attention matching (NLAM) module by using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture the global contextual information. Finally, we adopt a geometry refinement (GR) module to refine the disparity map to further improve the performance. Moreover, we add the warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on KITTI and SceneFlow datasets in the end-point error (EPE) and the fraction of erroneous pixels (D1); (2) our proposed method has particularly superior performance in the reflective regions and occluded areas.

08/11/20

120

NLCA_NET_v2_RVC

10.4

11.8

116

4.12

6.39

6.44

19.7

119

10.9

127

14.5

110

13.2

118

3.26

21.2

114

14.7

10.1

14.5

7.17

11.5

08/12/20

121

CFNet_RVC

10.1

14.4

129

7.81

146

7.12

112

6.61

15.5

7.53

12.3

103

11.5

115

3.02

10.7

16.6

10.7

15.4

10.9

111

9.01

09/03/20

122

LPSM

39.5

217

40.0

196

20.7

214

28.3

216

34.0

215

34.3

155

23.8

194

56.7

224

52.4

225

24.9

219

36.9

160

66.3

232

40.6

224

37.5

221

46.6

211

79.3

231

09/09/20

123

AdaStereo

13.7

124

19.6

141

7.41

143

10.6

146

14.5

147

15.7

101

7.85

103

22.6

135

9.32

7.00

151

9.20

22.4

127

14.5

106

17.8

120

14.8

132

24.2

132

10/28/20

124

HITNet

6.46

6.25

4.67

4.51

2.17

6.52

5.18

2.92

2.66

2.37

36.7

158

9.28

6.27

11.2

4.61

9.54

We propose a novel lightweight network for stereo estimation. The method uses densely connected layer structures to learn expressive features without the need of fully-connected layers or 3D convolutions. This leads to a network structure with only 0.37M parameters while still having competitive results. The post-processing consists of filtering, a consistency check and hole filling.

11/10/20

125

FC-DCNN

17.9

141

21.2

147

6.52

132

9.56

134

14.1

145

31.9

152

23.4

193

23.4

136

19.7

148

5.93

145

26.9

131

22.8

129

20.0

138

19.3

133

18.2

144

23.9

131
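
The FC-DCNN entry above lists a consistency check and hole filling as post-processing. A common way to realize these two steps on a single scanline is sketched below. The tolerance, the use of `None` for invalid pixels, and the fill rule (smaller of the nearest valid neighbours) are assumptions, not details taken from the submission.

```python
def left_right_check(disp_left, disp_right, tol=1):
    """Keep a left disparity only if the right view agrees within tol."""
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching column in the right image
        ok = 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol
        checked.append(d if ok else None)
    return checked

def fill_holes(row):
    """Fill invalid pixels with the smaller of the nearest valid neighbours."""
    filled = row[:]
    for x in range(len(filled)):
        if filled[x] is None:
            left = next((filled[i] for i in range(x - 1, -1, -1)
                         if filled[i] is not None), None)
            right = next((row[i] for i in range(x + 1, len(row))
                          if row[i] is not None), None)
            candidates = [v for v in (left, right) if v is not None]
            filled[x] = min(candidates) if candidates else 0
    return filled
```

Choosing the smaller neighbour favours the background surface, which is the usual assumption for pixels invalidated by occlusion.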

11/12/20

126

RLStereo

27.9

181

20.5

146

15.0

194

23.5

205

26.3

190

51.5

201

35.8

225

27.1

150

23.4

158

15.6

193

63.6

224

32.3

160

21.5

146

23.2

158

44.7

208

17.4

113

11/12/20

127

UnDAF-GANet

16.2

134

3.74

2.94

16.7

187

18.3

158

24.1

132

26.3

201

19.2

129

15.7

131

1.86

36.8

159

26.8

141

11.1

24.8

172

6.54

28.0

142

11/16/20

128

SSCasStereo

15.2

130

33.6

177

5.73

117

8.13

123

12.6

138

51.1

199

8.19

108

16.7

122

5.02

5.70

141

48.5

185

17.3

103

16.0

114

20.1

136

12.3

119

9.25

11/21/20

129

DecStereo

20.2

155

19.4

140

11.9

181

15.6

180

13.5

143

23.0

129

26.7

203

13.3

106

15.1

129

7.60

157

28.3

138

30.2

155

23.4

155

17.6

116

38.9

200

38.4

166

11/25/20

130

LPSC

10.7

101

5.15

4.23

5.48

6.38

16.5

107

7.84

102

9.56

10.3

106

4.02

112

20.2

111

19.0

113

17.7

128

18.5

126

9.73

18.0

116

12/22/20

131

SLCCF

8.83

6.97

4.90

6.05

4.35

8.89

5.33

6.29

5.15

4.80

125

13.0

18.1

108

17.8

130

17.7

117

6.93

15.4

106

12/24/20

132

ACR-GIF-OW

24.5

166

37.5

190

10.8

173

16.3

185

17.4

157

44.9

187

17.2

171

33.5

172

25.2

161

11.4

181

45.4

178

35.7

174

26.6

178

23.3

161

23.6

163

38.4

167

This model is trained on low-resolution data but targets high-resolution images. It uses a recurrent module to iteratively update a coarse disparity prediction. Then a special refinement module makes a final adjustment. The recurrent update and the final refinement are applied in a patch-wise manner across the initial disparity.

03/05/21

133

ORStereo

19.1

151

38.9

195

9.97

170

9.21

130

23.3

177

42.6

178

13.0

147

18.2

127

6.63

4.93

126

35.4

155

33.1

163

24.1

158

23.6

163

18.2

143

26.0

139

03/05/21

134

LocalExp-RC

5.54

3.78

3.02

3.85

2.08

5.95

3.48

3.61

3.65

2.52

10.3

6.85

7.25

16.1

105

5.12

10.2

04/22/21

135

LESC

6.78

4.07

3.46

3.26

3.36

9.15

4.08

4.76

5.21

2.80

11.7

13.0

10.2

17.0

110

5.52

12.5

05/09/21

136

ADSG

24.7

168

36.3

184

11.3

175

15.6

181

20.0

167

35.9

161

18.2

174

35.2

176

27.2

168

12.2

188

42.7

171

33.8

166

26.3

175

23.3

160

27.0

169

36.6

161

06/02/21

137

FADNet_RVC

28.4

183

18.3

138

9.48

162

13.9

170

16.0

150

40.9

171

13.0

148

43.4

192

45.0

207

8.43

164

57.8

209

35.7

172

24.8

166

23.9

167

47.4

214

68.1

218

06/07/21

138

FADNet++

40.2

218

25.2

158

22.1

220

33.4

227

28.8

198

54.0

205

33.6

221

46.8

201

46.8

210

23.5

216

73.4

229

47.1

209

28.5

189

37.7

222

65.4

232

71.9

224

We propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, which is based on the popular and widely used Semi-Global Matching algorithm. Specifically, we optimize the algorithm for embedded CUDA GPUs by using massively parallel computing, and we use NEON intrinsics to optimize it for vectorized SIMD processing on embedded ARM CPUs.

06/10/21

139

ReS2tAC

35.8

208

41.8

203

20.7

213

28.1

215

24.6

183

42.4

176

30.3

212

38.9

184

34.9

189

23.0

214

50.3

194

48.9

211

39.8

223

39.4

226

44.1

205

65.1

214

06/11/21

140

R3DCNN

33.0

201

34.2

180

15.8

196

13.4

165

41.7

227

47.9

193

22.0

189

60.1

231

57.4

232

12.6

189

40.3

165

46.4

206

26.8

181

37.0

218

19.3

151

45.2

184

07/23/21

142

HBP_ISP

5.20

3.70

3.05

3.57

2.34

7.80

3.79

3.34

3.09

1.87

9.85

10.1

7.82

11.2

5.26

7.86

07/26/21

143

RAFT-Stereo

4.74

4.19

3.44

3.11

1.51

7.30

2.79

2.67

2.59

1.39

7.46

10.2

5.86

13.0

3.59

9.38

07/14/21

141

MFN_USFDSRVC

36.7

210

31.4

173

16.1

198

22.0

202

28.5

196

42.3

175

21.1

186

52.2

213

50.7

221

21.4

209

53.6

204

53.8

223

30.7

198

31.7

203

63.7

228

60.7

208

A lightweight network with dilated ResNet feature extractor, a correlation cost volume run at a low resolution, and a refinement network to get a full resolution disparity output. Sparse disparity is processed from the dense disparity using a threshold on the network confidence output and a region grower to remove suspected bad disparities.

08/24/21

144

MMStereo

12.7

117

27.9

166

8.71

153

8.81

128

11.7

130

26.9

140

5.82

20.9

132

14.6

126

4.10

117

15.4

16.0

14.2

104

13.6

9.71

7.35

09/20/21

145

GANet-RSSM

10.6

100

11.9

117

8.54

152

6.60

103

6.26

16.3

106

7.10

15.5

113

14.5

125

2.93

11.3

16.1

10.8

16.0

104

10.7

107

11.3

10/17/21

146

ACVNet

13.6

123

9.69

108

3.65

4.82

7.48

105

22.9

128

12.9

144

15.7

115

14.4

124

3.82

107

21.8

115

17.7

105

14.3

105

15.6

25.6

167

32.1

151

11/10/21

147

CREStereo

3.71

4.73

3.94

5.07

1.96

3.02

1.42

2.28

2.05

1.51

6.86

6.35

4.25

6.01

4.60

5.49

11/21/21

148

FENet

11.3

110

7.70

3.91

3.97

6.24

16.7

108

5.78

32.1

165

32.4

184

2.57

11.8

10.8

6.90

13.4

5.41

11.2

11/21/21

149

Gwc_CoAtRS

6.50

6.92

6.82

135

4.55

3.48

5.12

5.80

4.88

4.96

2.69

15.3

12.8

6.40

10.2

7.13

8.48

01/27/22

150

UPFNet

10.3

9.74

110

4.67

6.28

5.54

20.1

120

8.78

113

9.42

7.51

3.78

106

22.9

121

16.0

16.7

122

15.6

8.70

16.0

107

02/27/22

151

CSTR

8.72

6.28

6.00

123

4.13

5.00

8.03

7.81

101

5.33

5.80

3.25

20.3

112

14.4

11.2

12.9

7.29

31.5

150

04/11/22

152

Z2ZNCC

34.4

205

40.9

200

20.4

211

29.9

218

32.0

206

42.5

177

27.0

204

41.4

189

38.0

198

26.0

221

48.6

186

43.7

199

36.5

213

33.3

210

37.5

193

41.3

173

05/25/22

153

LSMSW

8.15

5.45

4.64

5.93

2.93

10.6

5.68

8.70

9.23

2.68

13.4

14.6

13.3

13.5

7.69

11.4

06/13/22

154

EAI-Stereo

3.68

4.02

3.32

2.48

1.42

4.19

2.37

2.18

2.01

1.16

10.2

8.84

4.00

7.15

3.14

6.44

08/05/22

155

RDNet

11.3

109

11.2

112

5.24

108

5.45

6.51

16.7

109

8.89

114

16.7

122

14.9

128

4.75

124

16.5

19.3

115

15.4

111

12.5

12.1

117

14.2

101

08/08/22

156

UCFNet_RVC

10.7

102

12.2

120

6.48

128

5.83

5.90

16.9

111

6.61

15.8

117

14.6

127

2.73

11.4

18.8

111

11.0

18.9

131

10.7

107

11.4

08/09/22

157

issga

18.9

149

12.0

119

11.6

179

11.1

152

18.3

159

14.3

14.6

165

28.6

159

26.2

164

5.90

144

13.5

41.4

194

21.9

148

22.2

152

19.4

152

30.7

146

08/29/22

158

MCP-HA-VQ

30.6

190

47.8

221

17.8

202

23.0

203

25.9

188

41.9

173

24.6

196

38.2

181

31.5

179

18.4

197

43.0

173

35.7

173

29.6

194

29.6

194

35.8

185

47.9

190

09/01/22

159

GMStereo

7.14

6.30

6.20

125

6.22

6.62

9.79

2.76

5.69

5.17

4.04

113

14.0

11.2

6.81

11.8

6.90

12.8

In recent years, convolutional-neural-network based stereo matching methods have achieved significant gains compared to conventional methods in terms of both speed and accuracy. Current state-of-the-art disparity estimation algorithms require many parameters and large amounts of computational resources and are not suited to applications on edge devices. In this paper, we propose an end-to-end light-weight network (LWNet) for fast stereo matching, which consists of an efficient backbone with multi-scale feature fusion for feature extraction, a 3D U-Net aggregation architecture for disparity computation and a color guidance in 2D CNN for disparity refinement.

09/20/22

160

LWNet

40.9

222

38.1

193

18.4

204

30.5

221

33.3

214

43.2

181

30.9

216

49.2

207

50.6

220

22.8

213

58.1

211

54.2

224

41.8

225

37.5

220

58.8

225

81.5

233

09/27/22

161

FCDSN-DC

18.8

147

23.0

152

7.01

137

10.2

141

20.1

169

37.7

163

17.3

172

27.8

154

20.8

153

7.81

160

23.9

125

24.5

135

22.4

149

20.7

142

16.0

137

19.9

123

10/01/22

162

CREStereo++_RVC

4.68

5.09

4.04

5.24

4.21

5.05

2.11

3.52

3.58

1.67

8.01

6.61

4.68

9.53

4.61

5.98

10/02/22

163

MaskLacGwcNet_RVC

10.4

7.52

4.50

5.21

6.94

102

18.6

116

5.18

14.7

112

13.3

120

3.01

28.5

139

18.0

107

8.95

11.2

14.2

128

13.8

10/02/22

164

raft+_RVC

8.29

11.1

111

4.49

5.97

10.3

122

28.5

142

3.75

5.07

2.88

2.21

12.2

15.2

12.3

12.7

5.09

10.8

10/03/22

166

GEStereo_RVC

7.97

6.70

3.52

5.90

7.63

108

22.5

127

7.61

4.89

4.22

2.19

10.4

14.9

11.8

14.3

4.36

11.9

10/03/22

165

CroCo_RVC

15.1

129

7.43

5.85

121

6.71

106

11.7

129

15.4

3.94

36.2

177

35.8

193

3.41

100

18.1

104

29.3

153

10.9

18.0

121

10.6

106

21.0

125

10/03/22

167

iRaftStereo_RVC

8.07

9.13

107

8.25

149

5.55

4.68

6.92

6.41

6.29

6.19

3.96

110

17.9

102

13.0

9.58

11.4

9.24

11.8

10/06/22

168

GwcSlice

12.7

116

13.4

123

4.76

5.33

7.69

109

17.0

112

11.1

129

13.7

107

9.88

102

4.22

120

20.1

110

20.1

119

17.4

125

16.9

109

14.0

126

36.5

159

10/08/22

169

MANet

17.5

139

23.0

153

5.25

109

9.82

137

11.4

127

31.1

149

12.9

143

22.5

134

16.1

135

7.95

162

24.9

129

28.1

149

24.4

161

20.5

140

18.9

149

31.1

148

10/16/22

170

LMCR-Stereo

6.27

6.20

4.59

3.92

2.66

4.52

4.88

3.65

3.41

2.08

16.8

11.2

8.58

13.2

6.89

10.5

Cost aggregation plays a critical role in existing stereo matching methods. Generally, aggregating matching costs in homogeneous regions with similar disparities is beneficial to matching accuracy. However, previous approaches commonly use 3D convolutions for cost aggregation without considering the homogeneity of different regions. In this paper, we revisit cost aggregation in stereo matching from a perspective of disparity classification and propose a generic yet efficient Disparity Context Aggregation (DCA) module to improve the performance of CNN-based methods.

10/26/22

171

DCANet

8.55

8.41

100

6.26

126

4.79

5.41

10.3

7.14

10.1

9.75

100

3.38

12.8

13.5

12.4

12.7

7.37

10.2

11/10/22

172

DLNR

3.20

2.91

2.37

2.18

1.67

3.21

1.37

1.66

1.11

6.25

7.07

3.45

8.90

4.43

2.91

11/11/22

173

ICVP

7.97

11.3

113

3.97

5.02

8.79

114

17.1

113

5.62

7.51

6.97

3.09

13.7

12.7

9.23

10.9

6.28

9.73

12/02/22

174

GANet+ADL

17.7

140

21.3

148

3.97

6.61

104

11.7

130

25.9

138

6.07

40.5

186

35.0

191

3.68

105

24.1

126

19.3

114

13.9

101

14.4

23.0

162

33.5

154

12/05/22

175

Ct-Net

21.0

158

38.5

194

11.3

176

11.7

155

17.4

155

31.7

151

13.0

146

27.1

149

20.4

151

7.45

156

27.9

137

27.7

148

16.5

121

23.5

162

38.8

198

24.4

133

03/06/23

176

PCVNet

8.19

7.01

6.51

130

5.89

4.53

7.42

8.10

106

5.49

5.62

2.90

22.0

117

12.7

8.07

11.9

7.87

21.9

128

03/07/23

177

GOAT18

8.73

7.26

7.32

141

6.80

107

3.47

10.3

10.4

125

5.14

5.16

4.95

128

15.9

13.9

11.2

9.62

13.1

122

16.4

108

04/18/23

178

DMCANet

7.79

7.91

4.12

3.79

4.26

11.2

10.1

123

6.76

4.85

3.32

12.9

13.3

10.5

12.9

9.11

10.1

04/28/23

179

ADStereo

18.0

142

16.4

134

14.9

193

12.6

160

21.3

172

20.6

121

16.6

169

15.8

116

16.0

134

7.43

155

19.1

107

52.0

215

24.8

167

18.1

122

17.7

142

11.2

06/09/23

180

SSVM-CFPMF

9.52

8.58

103

4.40

5.51

5.84

7.02

6.16

14.3

123

5.30

134

17.0

100

15.9

14.8

108

18.7

127

6.52

13.7

06/22/23

181

IGEV-Stereo

4.83

3.17

2.46

1.97

2.19

5.63

1.22

16.2

118

9.20

1.17

3.77

4.93

5.35

6.99

2.31

5.00

08/10/23

182

CroCo-Stereo

7.29

4.90

3.62

1.74

7.01

103

9.90

1.78

16.4

120

17.4

141

1.45

6.20

15.3

4.95

8.62

5.00

10.0

08/13/23

183

Any-RAFT

5.22

5.19

4.20

4.00

2.23

5.88

4.06

3.05

2.91

2.04

9.76

10.7

8.77

9.90

4.94

6.72

09/27/23

184

FM-DT

40.6

220

45.4

215

20.8

215

32.3

226

40.5

225

51.3

200

30.6

213

52.1

212

48.0

213

23.2

215

53.4

202

47.0

207

39.7

219

29.0

190

55.3

222

76.2

228

10/09/23

185

EGLCR-Stereo

4.03

4.69

2.46

3.70

2.99

10.7

2.48

1.95

1.63

0.94

5.76

8.17

3.84

10.3

2.99

4.87

10/30/23

186

LoS

4.20

5.85

4.92

4.64

2.77

3.92

1.32

2.36

2.17

1.81

8.18

6.58

4.55

8.57

4.57

5.06

11/13/23

187

Selective-IGEV

2.51

2.54

1.86

2.51

1.12

7.22

1.23

1.36

1.17

1.16

4.48

4.83

2.99

3.79

2.26

4.72

12/07/23

188

4D-IteraStereo

10.9

105

5.87

5.59

114

6.15

6.07

9.15

7.46

27.0

148

32.4

183

2.46

12.2

11.2

7.18

12.2

7.37

6.77

This article presents a disparity map algorithm that improves depth map estimation based on the Census Transform and a hierarchical segment-tree on each block. The stereo matching algorithm presented in this study comprises four steps: Cost Computation, Cost Aggregation, Optimization, and Post-Processing, all of which refine the final disparity map.

12/31/23

189

H-CENST

38.4

214

41.6

201

26.7

230

31.8

224

33.0

212

43.0

179

32.7

219

53.1

215

50.5

219

24.8

218

51.4

199

47.0

208

36.7

215

31.9

205

40.5

203

53.4

197
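
The H-CENST entry above builds its matching cost on the Census Transform. As a reminder of how that cost works, the sketch below encodes a pixel's neighbourhood as a bit string and compares two encodings with the Hamming distance. The function names, the 3x3 default window, and the border clamping are illustrative assumptions, and the segment-tree aggregation is not shown.

```python
def census(img, x, y, radius=1):
    """Census signature of pixel (x, y): one bit per neighbour,
    set when the neighbour is darker than the centre pixel."""
    h, w = len(img), len(img[0])
    centre = img[y][x]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            yy = min(max(y + dy, 0), h - 1)   # clamp at image borders
            xx = min(max(x + dx, 0), w - 1)
            bits = (bits << 1) | int(img[yy][xx] < centre)
    return bits

def hamming(a, b):
    """Number of differing bits between two census signatures."""
    return bin(a ^ b).count("1")
```

Because only the ordering of intensities matters, the signature is unchanged when a constant brightness offset is added to the whole window, which is the main reason the transform is popular for matching under varying illumination.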

Unsupervised Stereo Matching methods have made significant strides recently. However, these approaches have predominantly relied on the assumption of photometric consistency, leading to potential limitations: sensitivity to illuminance changes and difficulty in dealing with problematic areas like occluded or textureless regions. To mitigate these limitations, this paper introduces a novel self-supervised dual-level framework named Dual-Net. This framework mainly consists of two key components: self-supervised teacher training and student training based on knowledge distillation. Specifically, the teacher model is first trained in a self-supervised fashion with a focus on feature space and data augmentation consistency. On the one hand, pixels from feature space are robust to noise and luminance changes, which are discriminative even in textureless regions. On the other hand, a data augmentation consistency loss is presented to guide the model toward enhanced contextual awareness, thus leading to a completed depth estimation in problematic regions. Then, the knowledge learned by the teacher model is distilled and transferred probabilistically to the student model. By leveraging this distilled knowledge, the student model is guided by validated insights, enabling it to outperform its teacher model by a large margin.

01/08/24

190

DualNet

16.4

135

19.7

142

7.99

147

10.1

139

18.3

160

24.1

132

10.0

121

23.9

137

20.4

149

7.79

159

23.0

122

23.1

130

16.3

119

18.8

129

17.0

139

18.5

120

01/08/24

191

GINet

15.6

132

16.1

132

7.15

139

7.37

114

9.39

117

25.1

135

7.88

104

35.2

175

32.6

185

3.19

15.5

16.7

100

11.6

14.7

15.8

135

26.4

140

01/31/24

192

HART

4.24

3.13

2.24

4.16

1.10

4.01

2.03

1.86

1.68

0.85

9.83

11.0

8.71

9.65

3.26

6.96

02/19/24

193

HCR

12.4

113

8.33

3.79

5.54

9.27

116

26.1

139

6.26

32.9

167

32.0

182

2.38

11.4

10.8

8.08

15.8

101

7.69

6.48

02/21/24

194

ClearDepth

3.48

4.14

3.16

2.81

1.95

4.55

2.36

1.73

1.70

1.25

5.46

11.2

3.12

7.30

3.70

3.45

04/19/24

195

DCSE

16.2

133

16.1

133

4.76

6.47

100

12.5

137

29.9

145

8.91

115

34.7

173

33.9

187

3.98

111

22.8

120

18.9

112

16.3

117

15.2

10.7

107

23.8

130

05/17/24

196

FormerRaft_RVC

10.9

106

13.4

124

8.32

150

6.67

105

9.42

118

15.6

3.24

9.67

10.5

107

5.30

134

13.8

17.8

106

9.52

17.2

113

17.1

140

18.8

121

06/06/24

197

MoCha-V2

3.51

2.52

1.95

2.25

1.47

4.61

0.98

7.35

8.07

0.66

2.95

4.18

4.46

5.70

2.54

2.70

06/14/24

198

IGEV++

3.23

3.24

2.46

4.12

1.15

6.71

1.38

1.53

1.52

1.02

4.57

4.68

5.41

7.68

2.22

4.68

06/27/24

199

CAS++

3.33

4.27

3.72

3.17

2.17

2.44

1.33

2.24

2.01

1.47

4.04

8.15

4.97

5.80

3.73

3.04

07/22/24

200

apnet

30.9

193

18.3

139

9.59

166

17.1

190

24.8

184

49.1

195

19.5

180

32.3

166

29.2

173

22.2

212

60.7

220

33.2

164

27.0

183

28.0

184

64.4

231

63.7

213

08/07/24

202

AIO-Stereo

2.36

2.38

1.71

3.22

0.85

5.83

1.24

1.42

1.32

1.03

4.49

4.81

2.43

3.61

2.12

3.63

08/13/24

203

UniTT-Stereo

6.34

3.96

2.69

1.82

7.92

111

11.7

1.81

14.2

109

13.8

121

1.22

5.07

16.6

4.09

5.89

2.91

8.44

This paper focuses on effectively capturing local patterns from images during the fine-tuning of Transformer-based models with limited labeled training data in dense downstream tasks, particularly in the context of stereo matching. For that, we propose MaDis-stereo, a novel stereo depth estimation framework that enhances locality inductive biases during fine-tuning via Masked Image Modeling (MIM).

08/15/24

204

MaDis-Stereo

9.49

3.73

3.14

1.76

9.05

115

10.5

1.74

27.8

153

27.9

169

1.50

7.47

19.8

116

4.80

11.8

3.40

10.2

07/27/24

201

esmea

30.1

187

29.4

170

9.48

162

17.0

189

31.7

205

49.7

196

15.2

166

52.6

214

45.9

208

11.9

186

46.5

181

52.1

216

27.2

185

23.7

165

25.2

165

54.5

200

We propose S-MoEStereo, which adapts pre-trained VFMs for stereo matching by integrating Low-Rank Adaptation (LoRA) with Mixture-of-Experts (MoE) modules. This approach balances parameter efficiency and discriminative feature learning by dynamically selecting the optimal expert within each MoE module. Additionally, we introduce CNN-based adapter layers to incorporate inductive bias, enhancing geometric feature extraction. Furthermore, we propose a lightweight decision network to reduce computational costs by selectively activating MoE modules based on input complexity.

10/26/24

205

SMoEStereo_RVC

5.83

6.58

5.15

104

3.96

3.82

5.84

5.41

4.43

4.21

2.31

10.2

8.41

6.20

11.1

6.59

8.19

11/03/24

206

DEFOM-Stereo

2.39

2.82

2.21

1.53

1.01

5.24

0.88

1.40

1.14

0.85

2.64

9.10

2.18

5.50

2.49

1.67

11/28/24

207

DEFOM-Stereo_RVC

3.28

3.50

2.61

2.41

0.87

2.51

0.89

1.38

1.26

0.97

6.35

10.8

2.43

11.0

3.03

5.00

02/03/25

208

FoundationStereo

1.84

2.46

1.71

1.36

0.79

5.19

0.53

0.93

0.84

0.93

2.41

3.39

3.45

3.28

1.82

1.17

02/10/25

209

GREAT-IGEV

2.81

3.30

2.44

2.31

0.96

7.12

1.17

1.38

1.36

1.04

3.89

3.82

4.66

6.24

2.17

4.65

02/16/25

210

TCM

12.8

118

13.5

125

5.64

116

7.01

111

12.0

133

30.2

147

10.0

122

9.66

7.53

5.58

140

27.0

133

21.0

125

16.0

115

18.8

128

12.5

120

18.1

117

03/04/25

211

LG-Stereo

1.76

2.57

1.86

2.02

0.65

3.23

0.68

0.98

0.81

0.55

2.15

4.26

2.03

3.85

1.27

2.42

03/19/25

212

G2L-Stereo

13.3

121

16.8

136

4.45

5.64

16.8

154

23.0

130

5.05

20.4

131

16.2

136

2.83

15.3

24.0

134

18.2

133

19.5

134

5.48

25.4

136

We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). By elegantly coupling these complementary worlds through a dual-branch architecture, we seamlessly integrate stereo matching with learned contextual cues.

04/24/25

214

StereoAnywhere

3.69

7.34

2.23

5.12

18.1

115

0.90

2.16

1.43

1.25

5.73

4.95

2.66

6.89

2.28

1.86

M2-Stereo embeds three Multi-scale Feature Fusion Attention Blocks in the feature extraction stage to fuse deep and shallow information, and uses a Multi-scale Cost Aggregation Module in the cost aggregation stage to share cost information across scales. Finally, a Multi-branch Iterative Strategy is used for efficient iteration.

04/24/25

213

M2-Stereo

3.90

8.13

2.35

1.61

3.58

15.9

103

1.43

1.83

1.13

0.79

3.09

7.36

6.23

7.99

2.63

4.03

05/07/25

215

G2L-ROB

11.6

111

15.0

131

5.19

106

4.91

13.1

142

19.4

118

6.22

17.5

125

14.0

122

2.55

13.9

20.8

124

14.0

102

18.2

123

6.96

14.6

105

DS-Stereo uses our proposed Adjacent Feature Hybrid Attention Block and Hierarchical Cost Aggregation Module to achieve deep-to-shallow information interaction in stereo matching. It also replaces the traditional ConvGRU iterative operator with an Inception-like iterative operator to achieve high-convergence updates.

05/07/25

216

DS-Stereo

3.13

4.92

2.14

1.40

1.23

10.1

1.16

1.55

1.25

0.93

3.41

6.18

5.10

8.66

2.21

2.43

05/21/25

217

waterstereo

8.48

6.74

6.51

130

9.22

131

5.11

6.44

4.71

5.50

5.36

3.44

101

18.2

106

14.2

9.13

14.4

13.3

123

12.9

06/05/25

218

MGS-Stereo

2.16

2.62

1.58

2.20

0.76

6.45

1.04

1.39

1.08

0.67

1.73

4.32

1.43

6.29

1.83

2.30

This paper proposes a robust stereo matching algorithm that combines a CNN for initial cost computation, bilateral filtering with cross-based cost aggregation (CBCA) for refinement, and a winner-take-all (WTA) strategy for disparity selection, followed by an edge-aware smoothing filter (EASF) to reduce noise.

06/12/25

219

IRDINA

35.4

207

40.3

198

23.5

223

26.0

212

28.6

197

40.6

167

28.5

210

46.6

200

43.2

205

26.1

222

51.8

200

39.2

189

32.0

202

33.3

211

44.8

209

46.9

187

06/17/25

220

UnViTAStereo

24.3

165

28.5

169

9.72

168

12.5

159

12.7

139

29.6

144

12.1

136

45.0

197

39.5

201

9.19

170

42.7

170

27.0

143

21.1

143

22.3

153

36.7

188

38.8

168

06/27/25

222

S2M2

1.15

1.29

1.23

1.27

0.40

0.45

0.59

0.67

0.62

0.45

1.28

2.80

1.37

3.60

1.12

0.25

06/17/25

221

PanMatch

7.18

5.21

5.34

112

3.34

5.43

4.52

2.47

13.2

105

13.3

119

1.51

8.34

16.6

8.07

10.3

4.96

9.23

07/11/25

223

SLEDC_v1

6.67

4.22

2.72

3.49

3.38

13.3

5.11

4.36

3.92

2.19

13.5

10.8

11.9

14.5

6.83

8.27

07/24/25

225

BridgeDepth

3.78

13.0

121

2.45

1.58

1.54

9.56

2.27

3.67

1.65

1.29

7.63

6.44

2.72

7.79

2.70

2.68

07/19/25

224

GEAStereo

3.80

2.93

2.29

2.08

2.52

6.53

2.14

2.11

2.32

1.36

6.97

6.42

5.55

10.9

2.33

5.06

09/02/25

226

MonSter++

2.60

7.04

1.61

1.91

1.04

8.92

0.85

2.08

1.02

0.75

3.06

8.01

2.73

3.84

2.11

2.00

10/23/25

227

VMStereo-Base

4.52

5.00

4.18

3.81

2.90

3.92

3.34

3.71

3.67

1.69

9.38

6.61

5.79

6.95

3.53

9.02

11/11/25

228

BLMT-Stereo

1.57

2.24

1.58

3.95

0.59

4.58

1.22

1.03

1.01

0.54

1.18

1.81

1.16

1.18

1.91

1.15

12/02/25

229

DispViT+

4.92

8.62

104

2.27

1.58

3.45

17.3

114

1.63

9.31

8.95

0.94

5.85

6.27

3.02

7.84

2.18

2.61

12/22/25

230

SelfViTAS

17.4

138

23.2

155

8.83

155

12.2

158

11.4

126

24.7

134

9.67

118

16.5

121

10.7

109

9.96

174

35.6

156

27.3

145

19.3

136

20.9

144

21.6

161

42.9

176

02/15/26

232

WAFT-Stereo

2.53

4.41

1.15

0.87

2.96

13.3

0.91

2.28

0.90

0.68

0.91

4.32

1.45

6.71

1.97

0.40

02/04/26

231

GGDA

7.80

7.53

5.73

117

4.83

4.68

8.60

3.92

4.55

3.59

3.09

14.6

12.7

8.81

15.7

100

13.8

125

14.1

100

02/15/26

233

DispArbiter

2.57

6.80

1.57

1.85

1.04

8.92

0.83

2.07

1.03

0.75

3.03

8.03

2.64

3.87

2.08

2.03

03/04/26

234

DepthFocus

1.00

1.80

1.45

0.93

0.45

0.62

0.54

0.83

0.76

0.50

1.19

3.61

1.33

0.67

1.26

0.25

03/23/26

235

LACA

4.71

6.81

2.54

1.54

2.96

10.2

1.70

9.62

8.96

1.32

5.29

6.33

5.01

6.69

2.98

2.58

MatchAttention is a novel attention mechanism that embeds explicit matching constraints by using continuous relative positions to dynamically target the exact sampling center for key-value pairs. By leveraging Continuous Attention Sampling, it achieves differentiable, linear-complexity matching, enabling highly efficient and accurate high-resolution stereo inference.

03/30/26

236

MatchAttention

1.29

1.99

1.73

0.98

0.48

0.94

0.55

0.93

0.77

0.67

1.22

1.31

0.98

4.16

1.39

1.45

04/16/26

237

CTHDGM

51.2

232

46.6

216

36.1

237

44.9

234

49.1

232

62.1

215

39.1

230

57.2

226

55.0

228

43.3

235

80.6

232

55.9

226

47.0

232

43.5

229

66.8

233

72.2

226

05/07/26

238

NBS

2.39

3.38

2.82

1.23

1.95

2.51

0.80

1.51

1.42

1.21

2.63

5.33

2.08

6.08

2.93

1.82

05/08/26

239

E³Stereo

9.31

8.66

105

4.53

2.51

5.33

7.04

3.50

31.5

163

31.5

181

1.88

4.79

8.05

4.38

8.68

5.41

5.63

05/11/26

240

GCAP-stereo

4.31

5.27

4.75

3.08

1.79

4.21

3.06

2.60

2.55

1.60

9.32

5.49

5.27

5.24

9.12

5.38
