vision.middlebury.edu/stereo/eval3

Middlebury Stereo Evaluation - Version 3

Set:	test dense, test sparse, training dense, training sparse
Metric:	bad 0.5, bad 1.0, bad 2.0, bad 4.0, avgerr, rms, A50, A90, A95, A99, time, time/MP, time/GD
Mask:	nonocc, all
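
The metric names above can be illustrated with a small sketch. It assumes the usual reading of these labels: "bad N" is the percentage of evaluated pixels whose absolute disparity error exceeds N, "avgerr" and "rms" are the mean and root-mean-square error, and "A90" is the 90% error quantile. The function name `stereo_errors` and the toy arrays are hypothetical; the benchmark's own evaluation code may treat invalid pixels and resolution scaling differently.

```python
import numpy as np

def stereo_errors(disp, gt, mask, thresh=2.0):
    """Summarise disparity error over the pixels selected by `mask` (e.g. nonocc)."""
    valid = mask & np.isfinite(gt) & np.isfinite(disp)
    err = np.abs(disp[valid] - gt[valid])
    return {
        "bad": 100.0 * float(np.mean(err > thresh)),  # % of pixels with error > thresh
        "avgerr": float(err.mean()),                  # mean absolute error
        "rms": float(np.sqrt(np.mean(err ** 2))),     # root-mean-square error
        "A90": float(np.quantile(err, 0.90)),         # 90% error quantile
    }

# Toy data (made up): four pixels, the last one is occluded.
gt = np.array([[10.0, 10.0, 20.0, 20.0]])
disp = np.array([[10.5, 13.0, 20.0, 25.0]])
nonocc = np.array([[True, True, True, False]])
all_pixels = np.ones_like(nonocc)

scores_nonocc = stereo_errors(disp, gt, nonocc)
scores_all = stereo_errors(disp, gt, all_pixels)
```

Switching the mask from `nonocc` to `all_pixels` brings the occluded pixel into the statistics, which is why the two masks can rank methods differently.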

Per-method details: Reference, Description, Parameters, Running Environment, [stat] error

bad 2.0 (%), mask: nonocc

Table columns: Weight, Date, Name, Res, Avg, followed by one column per dataset.

Dataset	MP	nd
Austr	5.6	290
AustrP	5.6	290
Bicyc2	5.6	250
Class	5.7	610
ClassE	5.7	610
Compu	1.5	256
Crusa	5.5	800
CrusaP	5.5	800
Djemb	5.7	320
DjembL	5.7	320
Hoops	5.7	410
Livgrm	5.9	320
Nkuba	5.5	570
Plants	5.6	320
Stairs	5.2	450

OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle the quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
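
The dynamic-programming aggregation mentioned above can be sketched for a single path direction. This is a minimal illustration of the standard semi-global recurrence, not OpenCV's implementation; the function name and the penalty values `p1` and `p2` are assumptions chosen for the example. A full method would run the same recurrence along each path direction and sum the results.

```python
import numpy as np

def aggregate_left_to_right(cost, p1=1.0, p2=8.0):
    """Aggregate a (width, ndisp) cost slice of one scanline along one path."""
    cost = np.asarray(cost, dtype=np.float64)
    width, ndisp = cost.shape
    agg = np.empty_like(cost)
    agg[0] = cost[0]
    for x in range(1, width):
        prev = agg[x - 1]
        prev_min = prev.min()
        # Previous costs at disparity d-1 and d+1, each with the small penalty p1.
        minus = np.concatenate(([np.inf], prev[:-1])) + p1
        plus = np.concatenate((prev[1:], [np.inf])) + p1
        # Best predecessor: same disparity, +/-1 step, or any jump (penalty p2).
        best = np.minimum(np.minimum(prev, minus), np.minimum(plus, prev_min + p2))
        # Subtracting prev_min keeps the values bounded along the path.
        agg[x] = cost[x] + best - prev_min
    return agg

# Toy cost slice (made up): the cheapest disparity shifts by one per pixel.
toy_cost = np.array([[0.0, 5.0, 5.0],
                     [5.0, 0.0, 5.0],
                     [5.0, 5.0, 0.0]])
toy_agg = aggregate_left_to_right(toy_cost)
```

The small penalty `p1` tolerates one-level disparity changes (slanted surfaces), while the larger `p2` discourages but still permits discontinuities.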

07/25/14

SGBM2

26.4

206

27.9

195

12.1

213

17.8

225

13.7

172

74.5

261

14.0

189

30.3

195

26.3

199

11.0

209

64.4

260

37.9

219

25.8

204

25.3

207

29.3

204

43.7

213

OpenCV's "semi-global block matching" method; memory efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.
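
As a rough illustration of the "sum of absolute differences over small windows" matching cost, the sketch below evaluates the cost at one pixel for several candidate disparities. The helper `sad`, the window radius, and the synthetic image pair are hypothetical and unrelated to OpenCV's actual code path.

```python
import numpy as np

def sad(left, right, y, x, d, radius=1):
    """SAD between the window centred at (y, x) in `left` and at (y, x - d) in `right`.

    The caller must keep both windows inside the image.
    """
    win_l = left[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(np.int64)
    win_r = right[y - radius:y + radius + 1,
                  x - d - radius:x - d + radius + 1].astype(np.int64)
    return int(np.abs(win_l - win_r).sum())

# Synthetic pair (made up): the left image is the right image shifted by 2 pixels.
ys, xs = np.mgrid[0:8, 0:12]
right = xs * xs + 3 * ys
left = np.roll(right, 2, axis=1)

# Cost at pixel (3, 6) for candidate disparities 0..3; the minimum marks the match.
costs = [sad(left, right, y=3, x=6, d=d) for d in range(4)]
best_d = int(np.argmin(costs))
```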

07/25/14

SGBM1

27.9

212

28.3

197

17.2

235

19.0

226

14.5

178

57.9

246

15.6

199

31.8

197

31.4

213

13.2

222

58.6

247

38.6

221

27.0

214

25.9

209

31.4

210

59.7

240

The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.
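
A minimal sketch of the Census transform and Hamming-distance cost described above follows. The 3 x 3 window, the bit ordering, and the function names are assumptions for illustration; the submission's window size is not stated here.

```python
import numpy as np

def census_transform(img, radius=1):
    """Per-pixel bit string: one bit per neighbour, set when neighbour < centre."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    code = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[radius + dy:radius + dy + h,
                               radius + dx:radius + dx + w]
            code = (code << np.uint64(1)) | (neighbour < img).astype(np.uint64)
    return code

def hamming(a, b):
    """Elementwise number of differing bits between two arrays of census codes."""
    x = np.bitwise_xor(a, b)
    count = np.zeros(x.shape, dtype=np.uint8)
    while np.any(x):
        count += (x & np.uint64(1)).astype(np.uint8)
        x = x >> np.uint64(1)
    return count

# Toy image (made up). A gain/offset change leaves the census codes unchanged.
img = np.array([[1, 5, 3], [4, 2, 8], [7, 6, 9]])
codes = census_transform(img)
codes_brighter = census_transform(2 * img + 10)
```

Because the codes depend only on the ordering of intensities within the window, the cost is insensitive to monotonic brightness changes between the two views.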

07/25/14

SGM

20.8

187

35.5

215

9.57

195

13.8

199

16.5

183

32.1

182

19.0

211

25.8

172

16.7

164

8.95

197

39.8

194

31.1

186

22.6

178

20.7

168

21.3

185

32.2

180

07/25/14

SGBM1

28.4

214

43.5

244

9.09

184

13.6

198

25.9

222

82.0

265

14.4

194

43.4

227

30.3

211

5.98

174

59.3

250

45.8

238

28.5

222

24.9

204

20.1

182

45.9

218

07/28/14

SGBM1

23.8

195

32.9

208

10.8

205

13.6

197

16.2

181

71.2

256

12.6

170

26.6

176

23.0

183

5.83

169

53.8

239

39.2

223

25.6

202

22.8

185

18.8

173

47.4

223

07/28/14

SGM

18.4

172

40.3

231

4.54

8.03

141

22.9

208

40.5

199

11.4

158

24.7

169

10.1

120

5.40

159

29.6

170

28.5

177

23.9

187

20.0

160

14.2

146

30.9

175

07/28/14

SGM

25.3

203

45.1

247

4.33

6.87

129

32.2

241

50.0

231

13.0

179

48.1

240

18.3

172

7.66

187

29.6

169

36.1

208

31.2

234

24.2

199

24.5

191

50.2

227

Correlation with five, partly overlapping windows on Census transformed images using Hamming distance as matching cost. A left-right consistency check ensures unique matches and filtering small disparity segments removes outliers. Interpolation is done within image rows with the lowest, valid neighboring disparity.
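
The row-wise interpolation step can be sketched as follows. The assumption here is that "lowest, valid neighboring disparity" means the smaller of the nearest valid values to the left and right in the same row; the function name and the `invalid` marker are hypothetical.

```python
import numpy as np

def fill_row_holes(disp, invalid=-1.0):
    """Fill invalid pixels with the lower of the nearest valid values left and right."""
    out = disp.astype(float).copy()
    h, w = out.shape
    for y in range(h):
        row = disp[y]
        left = np.full(w, np.inf)
        right = np.full(w, np.inf)
        last = np.inf
        for x in range(w):                 # nearest valid value to the left
            if row[x] != invalid:
                last = row[x]
            left[x] = last
        last = np.inf
        for x in range(w - 1, -1, -1):     # nearest valid value to the right
            if row[x] != invalid:
                last = row[x]
            right[x] = last
        holes = row == invalid
        fill = np.minimum(left, right)
        # Rows without any valid pixel stay invalid.
        out[y, holes] = np.where(np.isfinite(fill[holes]), fill[holes], invalid)
    return out

# Toy disparity map (made up): -1 marks pixels rejected by the consistency check.
sparse = np.array([[5.0, -1.0, -1.0, 9.0, -1.0],
                   [-1.0, -1.0, -1.0, -1.0, -1.0]])
dense = fill_row_holes(sparse)
```

Choosing the lower disparity reflects the common assumption that holes are mostly occlusions, which belong to the more distant surface.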

07/28/14

Cens5

26.6

208

47.1

252

8.74

180

11.9

186

25.6

219

45.3

222

19.5

215

40.6

221

29.0

207

9.93

203

36.5

187

38.6

220

31.0

233

25.0

206

25.6

193

44.6

215

A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.
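
For orientation, the NCC matching cost and the +/- 3 level sweep can be sketched as below. Only the patch score and the list of candidate offsets are shown; plane generation and the SGM stages are omitted, and the function names and the `1 - NCC` cost convention are assumptions.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-8):
    """Normalised cross-correlation of two equally sized patches, in [-1, 1]."""
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def sweep_candidates(plane_disparity, radius=3):
    """Disparity levels tested around a plane hypothesis at one pixel."""
    return [plane_disparity + k for k in range(-radius, radius + 1)]

patch = np.arange(9).reshape(3, 3)
score_same = ncc(patch, 3 * patch + 7)   # gain and offset do not change NCC
score_flip = ncc(patch, -patch)
cost_same = 1.0 - score_same             # assumed cost convention: low is good
candidates = sweep_candidates(40)
```

Restricting the search to seven levels per plane, rather than the full disparity range, is what keeps the cost volume small at high resolution.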

08/25/14

LPS

19.2

182

6.14

5.34

130

9.24

157

7.53

123

96.0

271

12.3

168

9.61

103

9.40

112

5.18

151

92.4

272

27.4

172

24.3

189

23.0

187

10.0

116

25.6

162

08/27/14

LPS

20.3

186

6.72

6.06

143

9.72

161

9.87

146

94.3

270

14.1

190

11.2

116

11.2

127

5.88

170

89.3

271

36.0

207

20.5

164

23.8

197

16.0

158

25.4

159

08/31/14

BSM

41.5

262

59.8

268

25.8

264

27.9

250

38.9

260

60.6

248

33.3

256

46.9

237

37.3

232

26.3

259

64.8

261

51.5

249

42.6

263

45.2

266

42.8

238

66.6

251

09/10/14

LAMC_DSM

26.0

204

55.8

264

11.9

211

14.3

204

18.3

190

44.0

216

18.3

209

39.9

219

29.5

210

6.67

179

31.1

174

34.5

199

28.8

223

26.3

210

30.1

207

35.7

187

09/18/14

SNCC

21.9

189

48.6

256

6.98

159

9.79

162

25.7

220

46.0

225

12.4

169

36.8

213

16.6

163

7.25

183

23.1

148

34.2

198

26.7

211

21.8

177

19.9

180

28.4

171

10/07/14

IDR

18.1

171

37.5

224

4.08

7.49

138

23.3

211

40.6

201

12.8

171

24.5

167

11.3

128

5.46

161

33.1

181

26.0

164

21.5

174

21.7

176

15.3

154

21.2

150

In stereo matching, cost filtering methods and energy minimization algorithms are considered two different techniques. Due to their global extent, energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, in which cost filtering approaches obtain better results. In this paper we combine both approaches with the aim of improving overall stereo matching results. We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.

01/21/15

TSGO

39.1

252

34.1

211

16.9

234

20.0

227

43.3

264

55.4

242

14.3

192

54.1

254

49.2

252

33.9

267

66.2

262

45.9

239

39.8

258

42.6

263

47.2

247

52.6

229

04/08/15

REAF

31.4

228

58.3

265

30.9

270

13.1

192

45.3

266

63.8

251

30.9

253

38.7

217

25.3

192

8.60

194

39.3

193

36.8

212

27.0

213

35.5

250

18.2

171

39.7

200

04/09/15

PFS

32.2

233

65.1

270

29.4

269

12.1

187

50.0

268

70.8

255

28.2

243

44.6

229

23.1

184

7.85

190

37.0

191

37.7

216

27.9

218

36.0

252

19.8

179

35.7

188

04/17/15

TMAP

16.9

163

20.2

173

4.94

115

8.13

144

12.8

167

30.0

175

11.7

159

27.9

187

20.4

177

5.09

149

31.5

175

23.1

152

20.9

168

19.0

156

18.8

174

18.0

133

This approach triangulates the polygonized SLIC segmentations of the input images and optimizes a lower-layer MRF on the resulting set of triangles defined by photo consistency and normal smoothness. The lower-layer MRF is solved by a quadratic relaxation method which iterates between PatchMatch and Cholesky decomposition. The lower-layer MRF is assisted by an upper-layer MRF defined on the set of triangle vertices, which exploits local 'visual complexity' cues and encourages smoothness of the vertices' splitting properties. The two layers interact through an alignment energy term which requires triangles sharing a non-split vertex to have their disparities agree on that vertex. Optimization of the whole model alternates between optimizations of the two layers until convergence; the upper layer can be solved in closed form.

04/19/15

MeshStereo

13.2

144

5.90

4.88

111

10.8

177

12.9

168

10.6

11.0

152

12.2

120

9.01

105

5.39

158

27.4

160

23.5

155

17.7

152

21.0

171

15.4

155

20.9

148

Compute the matching cost with a convolutional neural network (accurate architecture). Then apply cross-based cost aggregation, semiglobal matching, left-right consistency check, median filter, and a bilateral filter. DETAILS: The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number: the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.
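
The layer sizes in this description can be traced with a small shape-checking sketch. Five unpadded 3 x 3 convolutions shrink an 11 x 11 patch to 9, 7, 5, 3 and finally 1 pixel, which is how each patch becomes a 112-length vector. The weights below are random and untrained, and several details are assumptions not stated in the description: ReLU activations, no biases, no padding, weights shared between the two patches, and a raw (unsquashed) output.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv3x3_valid(x, weights):
    """x: (C_in, H, W); weights: (C_out, C_in, 3, 3); stride 1, no padding."""
    windows = sliding_window_view(x, (3, 3), axis=(1, 2))  # (C_in, H-2, W-2, 3, 3)
    return np.einsum("chwij,ocij->ohw", windows, weights)

def extract_features(patch, conv_weights):
    """Map one 11 x 11 patch to a 112-length feature vector."""
    x = patch[None, :, :]                          # (1, 11, 11)
    for w in conv_weights:
        x = np.maximum(conv3x3_valid(x, w), 0.0)   # ReLU is an assumption
    return x.reshape(-1)                           # (112, 1, 1) -> (112,)

# Random, untrained weights: used only to trace tensor shapes.
conv_weights = [0.05 * rng.normal(size=(112, 1 if i == 0 else 112, 3, 3))
                for i in range(5)]
fc_sizes = [224, 384, 384, 384, 1]
fc_weights = [0.05 * rng.normal(size=(fc_sizes[i + 1], fc_sizes[i]))
              for i in range(4)]

def matching_cost(patch_left, patch_right):
    """Concatenate both feature vectors and run the four fully-connected layers."""
    f = np.concatenate([extract_features(patch_left, conv_weights),
                        extract_features(patch_right, conv_weights)])  # 224 values
    for w in fc_weights[:-1]:
        f = np.maximum(w @ f, 0.0)
    return float((fc_weights[-1] @ f)[0])

patch_l = rng.normal(size=(11, 11))
patch_r = rng.normal(size=(11, 11))
features = extract_features(patch_l, conv_weights)
score = matching_cost(patch_l, patch_r)
```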

08/28/15

MC-CNN-acrt

8.08

5.59

4.55

5.96

109

2.83

11.4

100

5.81

8.32

8.89

102

2.71

16.3

114

14.1

13.2

111

13.0

6.40

11.1

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed. (Improved results as of 9/14/2015 due to bug fix in color-to-gray conversion.)
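
The left-right consistency check can be sketched as below, under the usual convention that a left-image pixel at column x with disparity d corresponds to column x - d in the right image. The tolerance, the `invalid` marker, and the function name are assumptions; the removal of small disparity segments is not shown.

```python
import numpy as np

def left_right_check(disp_left, disp_right, tol=1.0, invalid=-1.0):
    """Invalidate left disparities that disagree with the right disparity map."""
    h, w = disp_left.shape
    out = disp_left.astype(float).copy()
    xs = np.arange(w)
    for y in range(h):
        d = disp_left[y]
        target = np.rint(xs - d).astype(int)     # matching column in the right image
        inside = (target >= 0) & (target < w) & (d != invalid)
        ok = np.zeros(w, dtype=bool)
        ok[inside] = np.abs(d[inside] - disp_right[y, target[inside]]) <= tol
        out[y, ~ok] = invalid
    return out

# Toy maps (made up): one right-image pixel carries an inconsistent value.
disp_l = np.array([[2.0, 2.0, 2.0, 2.0, 2.0, 2.0]])
disp_r = np.array([[2.0, 2.0, 2.0, 9.0, 2.0, 2.0]])
checked = left_right_check(disp_l, disp_r)
```

Pixels whose match falls outside the right image are also invalidated, since their disparity cannot be confirmed.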

09/14/15

ELAS

32.3

234

50.9

257

9.17

185

11.0

180

33.0

245

88.2

267

18.3

208

47.3

238

26.8

200

11.7

214

41.7

198

37.4

213

23.7

185

28.8

221

63.0

261

42.8

207

09/28/15

R-NCC

48.4

267

26.2

191

14.8

225

30.2

256

30.9

236

72.9

259

41.6

268

77.7

271

64.1

270

27.4

262

59.1

249

71.9

271

50.9

268

33.9

248

78.2

271

80.8

267

The method generates multiple proposals on absolute and relative disparities from multi-segmentations. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes.

10/13/15

MDP

12.6

137

14.4

154

4.99

118

10.6

175

10.7

151

27.2

170

8.11

127

12.5

122

8.07

4.27

137

30.4

171

20.5

141

12.6

104

17.8

141

13.4

142

17.3

128

We post-process the depth maps produced by Zbontar & LeCun's MC-CNN technique. We use a domain transform to compute an edge-aware variance measure of our confidence in the depth map, and then run our robust bilateral solver on that depth map and confidence with a Geman-McClure loss function. The MC-CNN is computed using the publicly available implementation (https://github.com/jzbontar/mc-cnn), which uses the GPU, and the robust bilateral solver is computed using our CPU implementation, which does not use the GPU and is written in vanilla C++.
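
For reference, one common parameterisation of the Geman-McClure loss is sketched below. The scale parameter `sigma` and this particular normalisation are assumptions; the submission does not state which variant or scale it uses.

```python
import numpy as np

def geman_mcclure(residual, sigma=1.0):
    """Robust loss r^2 / (r^2 + sigma^2): quadratic near zero, saturating at 1."""
    r2 = np.square(residual)
    return r2 / (r2 + sigma ** 2)

residuals = np.array([0.0, 1.0, 10.0, 1000.0])
losses = geman_mcclure(residuals)
```

Because the loss saturates, a grossly wrong depth value contributes roughly the same penalty as a moderately wrong one, so outliers have limited influence on the solution.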

11/03/15

MC-CNN+RBS

8.42

6.05

5.16

123

6.24

116

3.27

11.1

6.36

104

8.87

9.83

117

3.21

104

15.1

103

15.9

102

12.8

106

13.5

7.04

9.99

12/18/15

INTS

14.5

152

20.2

172

4.52

8.62

148

11.6

155

29.5

172

10.7

150

16.4

141

10.3

121

4.69

140

27.6

162

22.5

149

20.7

166

20.5

164

11.5

130

24.9

158

An efficient stereo matching algorithm, which applies adaptive smoothness constraints using texture and edge information, is proposed in this work. First, we determine non-textured regions, on which an input image yields flat pixel values. In the non-textured regions, we penalize depth discontinuity and complement the primary CNN-based matching cost with a color-based cost. Second, by combining two edge maps from the input image and a pre-estimated disparity map, we extract denoised edges that correspond to depth discontinuity with high probabilities. Thus, near the denoised edges, we penalize small differences of neighboring disparities. The method uses the MC-CNN code for the matching cost computation only.

01/19/16

NTDE

7.44

5.72

4.36

5.92

106

2.83

10.4

5.71

5.30

5.54

2.40

13.5

14.1

12.6

105

13.9

6.39

12.2

104

01/26/16

MC-CNN-fst

9.47

100

7.35

106

5.07

119

7.18

134

4.71

16.8

132

8.47

131

7.37

6.97

2.82

20.7

137

17.4

117

15.4

132

15.1

107

7.90

105

12.6

106

Our approach is an extension of the ELAS (from Geiger et al.) algorithm. We extract edges and sample our candidate support points along them. For every two consecutive valid support points we create a (straight) line segment. We force the triangulation to include the set of line segments (constrained Delaunay) for a better preservation of the disparity discontinuity at the edges.

02/18/16

LS-ELAS

36.7

245

53.5

260

10.3

203

15.8

215

37.0

257

83.6

266

24.5

230

49.1

242

34.6

223

13.9

223

44.9

209

45.7

237

34.9

246

29.1

225

64.4

266

62.7

246

The computation of the sparse disparity maps is achieved by means of a 3D diffusion of the costs contained in the disparity space volume. The watershed segmentations of the left and right views control the diffusion process and valid measurements are obtained by cross-checking. The estimation of the dense disparity maps uses the sparse measurements as control points and is driven by a 3D watershed separating the disparity space volume into foreground and background pixels.

03/15/16

MPSV

43.5

264

58.8

266

33.9

272

34.2

265

37.9

259

52.4

236

30.8

251

56.8

261

51.0

259

30.6

264

56.9

243

51.5

250

44.6

265

43.4

264

44.2

240

54.2

232

No post-processing (no filtering, no hole-filling, no interpolation) is performed. The concepts of intrinsic curves were revisited and used for: (1) disparity search space reduction, resulting in an 83% reduction of the disparity range (individually for each pixel) directly at the original resolution of the image, without needing hierarchical search; (2) reducing the ambiguities due to occluded pixels by integrating occlusion cues explicitly into the global energy function as a soft prior. The final energy minimization was done using a semi-global approach along eight paths.

04/03/16

ICSG

45.6

266

69.7

272

19.1

241

21.3

232

43.6

265

77.6

264

36.9

263

65.3

267

40.4

238

20.3

240

53.6

237

58.7

266

46.5

267

47.1

268

60.7

260

79.1

264

04/24/16

HLSC_cor

26.0

205

26.5

192

15.2

228

21.0

230

20.5

201

35.7

190

23.4

227

33.1

202

35.0

225

11.9

216

39.1

192

34.2

197

25.2

200

32.8

242

28.3

202

22.7

153

04/27/16

JEM

37.2

248

35.7

216

27.9

268

30.6

258

33.2

249

43.0

213

31.4

254

49.5

244

47.3

248

26.5

261

49.6

223

46.0

240

35.7

248

30.8

234

37.5

223

55.8

237

A 3D-label-based method with global optimization at the pixel level. A bilayer matching cost is employed by first matching small square windows and then aggregating over large irregular windows. Global optimization is carried out by fusing candidate proposals, which are generated from our specific superpixel structure.

05/12/16

PMSC

6.71

3.46

2.68

6.19

114

2.54

6.92

4.54

3.96

4.04

2.37

13.1

12.3

12.2

101

16.2

126

5.88

10.8

05/28/16

APAP-Stereo

7.26

5.43

4.91

113

5.11

5.17

21.6

152

6.99

109

4.31

4.23

3.24

105

14.3

9.78

7.32

13.4

6.30

8.46

07/03/16

LPU

10.4

111

11.4

138

3.18

8.10

143

6.08

105

20.9

148

8.24

129

6.94

4.00

4.04

127

33.9

184

16.9

113

15.2

129

17.8

140

9.12

111

11.6

08/31/16

SED

63.4

272

54.3

262

22.4

257

72.9

273

64.5

273

71.4

257

42.5

269

80.1

272

67.9

271

49.8

272

79.6

267

74.4

272

65.4

273

55.1

272

86.1

273

91.6

272

We propose a method that incorporates a surface normal constraint predicted by deep learning. Using reliable disparities selected from a stereo matching method and an effective edge fusion strategy, we can faithfully convert the predicted surface normal map to a disparity map by solving a least squares system which maintains discontinuities. We use the raw matching cost of MC-CNN.

09/13/16

SNP-RSM

8.75

5.46

4.85

110

6.50

121

3.37

10.4

7.31

116

8.73

9.37

111

3.58

117

14.3

14.7

14.9

128

12.8

10.1

118

10.8

10/19/16

LW-CNN

7.04

4.65

3.95

5.30

2.63

11.2

5.41

4.32

4.22

2.43

12.2

13.4

13.6

116

14.8

104

4.72

12.0

103

10/23/16

SIGMRF

64.2

273

60.0

269

33.0

271

67.9

272

63.2

272

99.5

275

39.8

266

84.8

273

82.0

273

35.2

269

95.2

273

91.5

273

58.1

271

65.8

273

55.0

255

88.6

271

11/06/16

SPS

19.6

183

14.2

153

12.3

215

14.9

209

12.0

161

15.8

124

19.1

212

17.4

147

15.4

153

8.23

192

30.9

173

34.8

200

30.6

231

25.3

207

28.3

201

28.0

166

11/15/16

MC-CNN-WS

12.1

132

14.8

157

7.20

163

11.1

183

7.62

125

15.9

126

11.8

161

11.5

118

9.01

105

3.89

122

19.7

132

20.5

140

16.3

138

16.3

127

12.1

133

18.3

136

11/16/16

MCSC

11.3

123

13.3

148

5.96

141

10.6

172

8.69

135

7.22

11.3

154

10.6

112

7.48

3.07

3.10

25.2

162

19.0

160

17.2

133

10.3

120

25.5

161

11/24/16

ADSM

38.7

251

40.4

233

20.3

245

27.3

248

35.1

254

55.9

243

22.3

224

56.1

257

50.9

258

24.2

252

58.0

245

56.3

262

36.5

250

32.1

241

38.7

228

69.7

254

01/15/17

IGF

34.0

239

42.7

241

20.1

244

23.7

241

32.2

240

45.6

224

28.6

247

43.0

225

37.2

231

21.4

245

50.9

229

44.7

236

34.7

245

31.9

239

37.4

221

47.1

221

01/24/17

3DMST

5.92

3.71

2.78

4.75

2.72

7.36

4.28

3.44

3.76

2.35

12.6

11.5

8.56

14.0

5.35

8.87

03/09/17

SGMEPi

13.9

149

6.92

6.71

156

9.47

159

9.72

144

11.8

103

13.6

184

10.9

114

10.6

124

5.26

153

32.8

180

26.9

168

22.7

180

22.7

184

12.0

132

21.7

151

03/10/17

MC-CNN+TDSR

6.35

5.45

4.45

6.80

127

3.46

10.7

6.05

5.01

5.19

2.62

10.8

9.62

6.59

11.4

6.01

7.04

We propose a novel method for stereo estimation, combining advantages of convolutional neural networks (CNNs) and optimization-based approaches. The optimization, posed as a conditional random field (CRF), takes local matching costs and consistency-enforcing (smoothness) costs as inputs, both estimated by CNN blocks. To perform the inference in the CRF we use an approach based on linear programming relaxation with a fixed number of iterations. We address the challenging problem of training this hybrid model end-to-end. We show that in the discriminative formulation (structured support vector machine) the training is practically feasible. The trained hybrid model with shallow CNNs is comparable to state-of-the-art deep models in both time and performance. The optimization part efficiently replaces sophisticated and not jointly trainable (but commonly applied) post-processing steps by a trainable, well-understood model.

03/22/17

JMR

12.5

135

4.09

3.97

8.44

147

6.93

116

11.1

13.8

185

19.5

155

19.0

174

3.66

118

17.0

119

18.2

122

18.0

155

21.0

172

7.29

17.8

132

03/23/17

DSGCA

33.8

237

42.9

242

20.9

251

23.6

240

30.2

233

45.5

223

27.6

241

42.0

224

36.0

229

21.0

241

50.2

225

44.2

235

33.3

243

34.6

249

38.4

226

46.8

219

04/04/17

DDL

30.1

222

44.3

246

19.4

242

25.8

245

28.3

228

42.1

207

21.1

221

37.1

214

28.7

206

21.7

246

46.8

215

36.0

206

30.3

230

28.4

218

32.7

214

37.5

193

05/23/17

r200high

40.9

259

70.5

273

14.4

222

21.3

231

37.7

258

72.2

258

38.1

265

53.2

252

31.4

213

18.3

230

52.4

234

52.6

254

44.1

264

45.4

267

50.7

250

66.5

250

06/14/17

DoGGuided

41.4

260

45.4

249

23.6

260

30.6

259

34.6

252

52.5

237

28.3

244

59.1

265

53.8

263

26.4

260

60.6

254

54.7

261

38.3

254

35.5

251

44.5

241

72.0

260

We propose local expansion moves for estimating dense 3D labels on a pairwise MRF. The data term uses a PatchMatch-like 3D slanted window formulation, where raw matching costs within a window are computed by MC-CNN-acrt and aggregated using guided image filtering. The smoothness term uses a pairwise curvature regularization term by Olsson et al. 2013.

06/22/17

LocalExp

5.43

3.65

2.87

2.98

1.99

5.59

3.37

3.48

3.35

2.05

10.3

9.75

8.57

14.4

5.40

9.55

We propose a feature ensemble network that leverages a deep convolutional neural network to perform matching cost computation and disparity refinement. For matching cost computation, a patch-based network architecture with a multi-size and multi-layer pooling unit is adopted to learn cross-scale feature representations. For disparity refinement, the initial optimal and sub-optimal disparity maps are incorporated and diverse base learners are applied.

10/12/17

FEN-D2DRR

7.23

4.68

4.11

5.03

3.03

8.42

6.05

4.90

5.32

3.20

103

11.5

14.1

13.4

114

13.9

5.06

14.3

117

We propose a robust learning-based method for stereo cost volume computation. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that our matching volume estimation method achieves similar accuracy to purely data-driven alternatives and that it generalizes to unseen data much better. In fact, we used the same model trained on the Middlebury 2014 dataset to submit to the KITTI and ETH3D benchmarks.

11/13/17

CBMV

11.1

122

6.07

5.22

125

8.09

142

4.05

18.7

142

9.31

139

10.7

113

9.61

114

3.11

100

33.7

182

15.6

101

17.5

150

17.1

132

10.1

117

14.4

118

We extend the standard sequential BP technique to fully connected CRF models with a geodesic distance affinity. We also propose a new approach to the BP marginal solution, which we call one-view-occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution allows us to find occluded regions in the disparity map and simultaneously improve the matching result. As a result, we can perform only one energy minimization process and avoid the cost calculation for the second view and the left-right check procedure.

12/11/17

OVOD

8.87

4.74

3.64

5.51

4.82

12.8

110

6.51

106

9.91

107

9.96

119

3.13

101

16.6

116

14.8

14.1

120

15.4

111

6.92

13.2

110

02/07/18

56.1

270

47.7

254

27.9

267

36.1

267

46.7

267

62.5

250

50.2

273

72.4

270

69.9

272

37.8

270

88.2

270

70.0

269

52.8

270

50.2

271

77.5

270

91.8

273

02/28/18

SDR

7.69

5.41

4.22

4.20

2.73

10.2

5.40

6.40

5.76

4.72

141

11.2

15.4

100

13.4

113

16.5

128

5.22

13.0

109

03/09/18

SGM_RVC

18.4

173

37.4

222

5.31

128

9.03

151

14.2

175

31.7

179

14.3

193

24.7

170

12.6

133

5.27

154

31.8

178

29.7

181

24.9

198

22.0

179

18.6

172

28.2

168

Semi-Global Matching (SGM) uses an aggregation scheme to combine costs from multiple 1D scanline optimizations that tends to hurt its accuracy in difficult scenarios. We propose replacing this aggregation scheme with a new learning-based method that fuses disparity proposals estimated using scanline optimization. Our proposed SGM-Forest algorithm solves this problem using per-pixel classification. SGM-Forest currently ranks 1st on the ETH3D stereo benchmark and is ranked competitively on the Middlebury 2014 and KITTI 2015 benchmarks. It consistently outperforms SGM in challenging settings and under difficult training protocols that demonstrate robust generalization, while adding only a small computational overhead to SGM.

03/11/18

SGM-Forest

7.37

4.71

3.69

4.93

3.18

11.1

5.37

5.57

5.81

2.65

14.5

100

13.2

13.1

109

14.8

105

5.63

11.2

03/14/18

DTS

13.4

146

8.45

118

7.54

167

7.46

137

5.50

14.9

116

10.2

148

24.5

168

25.1

190

4.93

144

19.2

131

18.7

123

14.6

125

15.9

122

13.0

138

17.2

127

03/23/18

MEDIAN_ROB

97.8

275

96.1

274

95.6

274

99.0

275

98.4

275

98.4

274

99.2

275

98.4

275

98.1

274

99.0

275

99.0

275

99.6

275

99.9

275

94.7

275

95.1

274

98.3

274

03/23/18

AVERAGE_ROB

97.6

274

96.2

275

96.5

275

96.8

274

97.8

274

97.8

273

98.2

274

97.9

274

98.2

275

98.9

274

98.9

274

99.1

274

99.7

274

93.1

274

97.9

275

98.9

275

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed. Updated ELAS submission as a baseline for the Robust Vision Challenge (http://robustvision.net), replacing the original ELAS (H) entry.

03/26/18

ELAS_RVC

27.3

209

43.4

243

12.4

217

13.9

201

23.8

212

66.4

252

20.4

218

33.0

201

20.7

179

11.0

208

43.9

206

37.5

215

26.3

205

28.7

220

38.4

227

33.3

182

04/17/18

ISM

40.8

257

42.5

240

26.4

265

34.8

266

36.1

255

44.5

219

34.4

258

56.2

258

52.7

262

25.2

256

51.0

231

52.4

253

39.7

256

33.3

244

38.8

230

75.3

262

05/01/18

PSMNet_ROB

42.1

263

33.0

209

23.1

258

30.1

255

31.4

237

54.8

241

30.7

250

48.7

241

48.3

250

28.3

263

80.8

268

53.5

258

36.9

252

38.6

260

63.9

265

71.2

256

05/18/18

PDS

14.2

150

14.4

154

5.80

138

10.5

171

10.5

149

22.1

153

14.0

188

14.5

130

8.97

104

5.93

172

24.2

153

21.5

146

18.2

159

18.9

154

11.9

131

33.6

184

05/22/18

DN-CSS_ROB

22.8

192

31.4

205

9.28

189

13.5

196

12.4

163

44.3

218

12.1

163

28.1

189

17.6

169

9.11

199

50.9

230

40.0

225

21.2

170

25.0

205

31.9

213

43.2

210

05/26/18

NOSS_ROB

5.01

3.57

2.84

3.99

1.93

5.15

3.34

3.32

3.15

2.32

8.55

7.45

7.06

12.5

5.20

9.06

05/31/18

FBW_ROB

32.2

232

36.3

218

9.37

190

14.2

202

19.1

194

69.3

254

13.2

181

51.0

246

39.1

235

10.8

207

43.9

206

41.5

229

33.0

240

29.9

232

51.3

251

71.4

257

05/31/18

iResNet_ROB

24.8

201

23.0

181

10.2

202

14.7

206

12.4

162

25.9

166

12.9

172

28.0

188

24.9

189

11.5

212

46.6

214

38.9

222

21.4

172

27.8

214

45.6

244

66.7

252

05/31/18

CBMV_ROB

7.65

3.48

3.35

4.80

3.57

6.32

6.88

108

4.84

3.91

1.97

25.4

156

11.1

13.1

107

15.8

120

7.34

13.8

115

06/05/18

CBMBNet

10.2

108

8.30

115

5.10

120

6.87

129

4.52

11.5

101

7.70

120

13.9

127

13.2

134

3.04

21.9

141

13.3

13.6

117

15.4

112

11.2

128

11.0

Numerous CNN algorithms focus on pixel-wise matching cost computation, which is an important building block for many state-of-the-art algorithms. However, these architectures are limited to small, single-scale receptive fields and use traditional methods for cost aggregation or even ignore cost aggregation. In this paper, we propose a novel architecture called cascaded multi-scale and multi-dimension network (MSMD) to take both into consideration. First, we propose a new multi-scale matching cost computation sub-network, in which two different sizes of receptive fields are implemented in parallel. In this way, the network can make the best use of both variants to balance the trade-off between a larger receptive field and the loss of details. Furthermore, we show that our multi-dimension aggregation sub-network, which contains 2D and 3D convolution operations, can provide rich context and semantic information for estimating an accurate initial disparity.

06/14/18

MSMD_ROB

30.9

224

26.9

194

14.6

223

20.0

228

22.6

207

33.7

183

27.8

242

43.9

228

38.4

234

21.1

242

49.5

222

40.8

227

31.8

235

31.6

235

37.5

225

43.6

212

A robust solution for semi-dense stereo matching is presented. It utilizes two CNN models for computing stereo matching cost and performing confidence-based filtering, respectively. Compared to existing CNN-based matching cost generation approaches, our method feeds additional global information into the network so that the learned model can better handle challenging cases, such as lighting changes and lack of texture. By utilizing non-parametric transforms, our method is also more self-reliant than most existing semi-dense stereo approaches, which rely heavily on parameter tuning.

06/27/18

DCNN

10.9

119

5.66

4.98

116

6.49

120

5.73

12.5

109

8.51

132

15.6

135

10.9

126

3.08

24.1

152

20.2

137

16.8

145

15.5

114

10.3

121

13.8

113

07/31/18

MotionStereo

40.4

255

67.6

271

25.0

262

29.2

253

40.9

262

57.3

245

35.5

260

57.5

262

40.4

237

19.9

237

42.8

204

52.6

255

39.8

257

37.1

255

51.7

252

34.9

185

10/10/18

DISCO

24.5

198

35.0

214

7.34

165

11.7

184

18.7

193

48.6

228

17.1

202

31.4

196

22.4

181

9.33

202

46.0

212

33.5

193

27.5

217

24.6

201

27.5

199

55.0

235

10/29/18

iResNet

22.9

193

28.3

197

9.19

187

15.8

214

19.3

197

35.1

186

11.3

155

27.7

184

16.8

166

15.2

224

54.7

241

27.6

173

19.5

162

21.5

175

31.9

212

51.6

228

10/29/18

Dense-CNN

7.98

5.59

4.54

5.83

100

2.79

10.4

5.78

8.26

8.84

101

2.66

15.6

109

14.2

13.2

110

13.2

6.30

11.1

11/07/18

IEBIMst

33.8

238

36.7

220

12.1

214

16.9

220

32.5

242

51.0

232

25.3

233

58.1

263

49.8

254

11.2

210

48.6

219

56.9

264

30.2

229

26.8

213

26.9

197

71.7

258

11/08/18

HSM-Net_RVC

10.2

107

12.0

142

5.32

129

7.50

139

6.72

115

15.6

121

9.89

144

6.83

5.14

4.17

134

22.7

144

17.1

114

15.6

134

14.3

10.8

126

14.6

119

11/11/18

MBM

22.8

191

36.4

219

9.95

200

15.3

211

19.3

196

36.5

192

19.9

216

27.5

183

18.1

170

10.1

205

41.5

197

32.7

189

26.3

207

23.3

190

21.4

186

39.9

202

We propose four efficient feature extractors based on convolutional neural networks for stereo matching cost computation. Two of them generate multiscale features with diverse receptive field sizes. These multiscale features are used to compute the corresponding multiscale matching costs. We then determine an optimal cost by combining the multiscale costs using edge information. On the other hand, the other two feature extractors produce uni-scale features by combining multiscale features directly through fully connected layers. Finally, after obtaining matching costs using one of the four extractors, we determine optimal disparities based on the cross-based cost aggregation and the semiglobal matching.

11/28/18

MSFNetA

7.96

6.21

4.26

6.02

111

3.66

8.95

6.28

103

8.41

8.06

2.62

17.9

124

13.9

11.9

100

11.5

8.00

106

10.6

01/12/19

EHCI_net

9.47

101

3.75

4.27

13.1

192

27.6

225

5.30

3.23

3.47

3.18

3.90

123

9.20

9.58

9.26

13.9

17.3

165

10.6

01/17/19

FASW

28.6

216

41.7

237

18.1

238

23.1

238

27.2

224

40.6

201

19.1

213

34.9

207

28.1

205

18.5

233

40.8

196

36.4

209

29.3

225

28.4

217

31.1

209

41.0

204

12/18/18

MCV-MFC

24.8

200

26.8

193

9.63

198

14.8

208

17.4

186

54.1

240

14.2

191

26.5

175

18.2

171

16.0

227

62.6

258

28.7

179

20.9

167

24.6

202

37.4

222

48.1

225

02/05/19

AMNet

53.3

268

54.3

261

63.7

273

51.2

271

51.3

269

40.6

203

39.9

267

51.6

247

55.9

266

55.4

273

58.9

248

57.5

265

52.7

269

49.5

270

58.1

258

61.6

244

The method comprises two main steps. First, we use adaptive support weights for local matching. Apart from the color similarity and geometric distance, the adaptive weight distribution favors pixels in the block matching with smaller cost. Besides, we use a multiscale strategy with invalidation criteria to reduce match ambiguity and computational time. Second, a global interpolation using a variational formulation is carried out. The energy functional penalizes deviations from the local disparity estimation at different scales.

02/15/19

DAWA-F

27.4

210

47.2

253

13.6

220

13.1

194

19.2

195

66.4

252

20.4

219

30.3

194

33.9

221

8.73

196

48.9

220

37.8

217

26.7

210

29.9

231

28.0

200

36.5

190

Stereo matching has attracted considerable study in recent years. The process is difficult because of visual discomfort effects that reduce the accuracy of disparity maps. Following the multistage structure used by most stereo matching algorithms (the taxonomy of D. Scharstein and R. Szeliski), this paper proposes an improved stereo matching algorithm that uses an Adaptive Weighted Bilateral Filter as the main filter in the cost aggregation stage, which preserves edges and is robust in plain-colour regions. In the matching cost computation stage, the window size of the sum of absolute differences (SAD) and the thresholds are adjusted, and a Median Filter is used as the main filter in the disparity refinement stage, which may overcome limitations in disparity map accuracy. Evaluation on the indoor scenes of the latest (2014) Middlebury dataset shows that the proposed algorithm with the Adaptive Weighted Bilateral Filter produces smooth disparity maps and achieves good processing time.

03/06/19

SM-AWP

38.1

249

30.7

204

24.0

261

25.2

244

30.3

234

44.9

220

38.1

264

56.0

256

55.8

265

19.9

238

60.1

252

51.2

248

32.1

237

30.2

233

40.0

234

61.7

245

03/09/19

3DMST-CM

5.47

4.10

3.37

2.99

2.95

7.63

4.55

3.26

3.95

2.16

10.2

8.28

6.37

13.2

5.86

9.35

This paper presents a novel unsupervised matching cost for stereo matching. Specifically, a novel two-branch convolutional sparse coding (CSC) model is used to learn the convolutional filter bank without ground truth disparity maps. Then, the sparse representations over the learned convolutional filter bank are used to measure the similarity between image patches; that is, the stereo matching cost is computed as the l1 distance between the sparse representations of the image patches.
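The last step of this description, an l1 distance between sparse codes, can be illustrated with a minimal sketch. The code vectors here are made-up placeholders rather than output of a learned filter bank, and the winner-take-all selection is an assumption added only to show how such a cost might be used; it is not taken from the TCSCSM submission.

```python
def l1_cost(code_a, code_b):
    """Matching cost: l1 distance between two equally long code vectors."""
    if len(code_a) != len(code_b):
        raise ValueError("code vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(code_a, code_b))


def best_disparity(left_code, right_codes_by_disparity):
    """Winner-take-all over candidate disparities (illustrative assumption).

    right_codes_by_disparity[d] is the code of the right-image patch
    that corresponds to disparity d. Returns (best_d, all_costs).
    """
    costs = [l1_cost(left_code, r) for r in right_codes_by_disparity]
    best_d = min(range(len(costs)), key=costs.__getitem__)
    return best_d, costs
```

A lower cost means the two patches have more similar sparse representations.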

04/12/19

TCSCSM

19.1

181

45.2

248

5.76

137

11.0

179

22.1

206

41.1

205

13.4

183

24.8

171

11.4

130

7.17

181

29.5

168

26.6

166

26.6

208

20.5

165

16.5

161

17.4

129

05/10/19

tMGM-16

17.3

164

8.70

123

6.49

150

9.82

163

20.7

202

13.7

112

13.9

187

21.8

160

16.0

157

5.57

162

26.9

158

25.4

163

28.3

219

22.1

180

14.6

150

39.6

199

In this work, we propose a learning-based method to denoise and refine disparity maps of a given stereo method. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method allows learning a robust collaborative regularizer that leverages the joint statistics of the color image, the confidence map and the disparity map. Due to the variational structure of our method, the individual steps can be easily visualized, thus enabling interpretability of the method. We can therefore provide interesting insights into how our method refines and denoises disparity maps. The efficiency of our method is demonstrated on the publicly available stereo benchmarks Middlebury 2014 and KITTI 2015.

05/13/19

14.2

151

9.69

128

9.58

196

10.9

178

7.33

120

9.54

13.8

186

11.3

117

11.3

129

7.17

181

27.4

161

23.3

154

24.8

195

22.8

186

14.6

151

18.4

137

05/15/19

PSMNet_2000

28.9

218

20.4

174

8.23

174

15.1

210

27.7

227

35.2

187

15.2

198

50.8

245

51.8

260

9.29

201

61.9

256

31.1

185

25.2

199

27.8

214

29.3

204

52.9

230

We propose "DeepPruner", a real-time stereo matching algorithm, which combines the strength of deep network and search space pruning techniques. Towards this goal, we developed a differentiable PatchMatch module that allows us to discard most disparities and generates a sparse representation of the cost-volume. We then exploit this representation to learn which range to prune for each pixel. Our method achieves competitive results on KITTI / SceneFlow datasets while running in real-time at 62ms. Moreover, we obtain the first place (on overall rankings) on the Robust Vision Challenge. For more details, check out our paper and source code.

06/26/19

DeepPruner_ROB

30.1

219

34.2

212

19.9

243

24.3

242

23.8

212

47.2

226

26.1

234

26.1

173

22.8

182

18.4

232

59.8

251

36.5

211

23.2

182

31.7

237

48.3

249

44.8

216

07/26/19

EdgeStereo

18.7

174

25.3

189

6.79

157

10.6

173

25.1

218

22.1

153

8.31

130

24.5

166

16.5

162

6.63

178

9.20

32.0

187

24.6

193

20.2

162

19.2

176

54.3

233

It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.

11/07/19

LBPS

9.68

104

5.05

4.98

116

5.57

3.24

6.03

12.9

174

5.44

5.50

3.55

116

15.2

104

15.9

105

17.5

149

17.3

135

9.84

115

28.3

169

11/11/19

CACA-Net

31.7

230

21.8

179

16.0

230

21.5

233

24.5

214

38.0

196

34.6

259

38.6

216

36.3

230

19.0

235

49.4

221

40.2

226

32.1

238

33.6

247

36.0

217

58.2

239

11/14/19

HSM-Smooth-Occ

10.8

118

11.7

139

5.62

134

8.75

149

8.39

134

15.4

117

9.60

140

8.29

6.11

4.09

129

23.6

149

20.4

139

17.0

146

13.4

10.2

119

16.6

125

11/15/19

100

SPPSMNet

41.4

261

32.5

207

20.6

247

34.0

264

33.0

244

55.9

244

36.3

262

53.8

253

48.6

251

35.0

268

71.6

263

52.8

256

37.1

253

38.1

259

46.7

246

56.9

238

12/19/19

101

CRLE

5.75

3.66

3.11

5.92

106

2.14

6.01

3.39

3.49

3.68

2.34

10.2

9.63

8.04

14.9

106

5.45

9.26

12/30/19

102

F-GDGIF

31.6

229

37.3

221

7.72

168

16.1

216

34.9

253

35.2

188

20.1

217

55.3

255

46.6

245

9.01

198

47.6

216

52.9

257

29.4

226

29.7

230

29.4

206

61.0

243

01/02/20

103

PPEP-GF

34.6

241

42.4

239

21.7

254

24.8

243

30.8

235

44.0

217

25.1

232

45.7

234

42.1

240

20.1

239

44.1

208

43.6

233

35.2

247

32.8

243

39.3

232

55.0

236

01/05/20

104

MTS2

53.8

269

51.7

259

21.5

252

38.8

269

52.7

270

97.5

272

43.0

270

66.4

268

60.8

268

32.0

265

85.7

269

69.0

268

46.3

266

45.1

265

71.2

269

85.2

269

01/07/20

105

ADSR_GIF

37.1

247

43.6

245

18.6

240

36.7

268

24.6

215

58.6

247

22.8

226

56.3

259

49.7

253

18.7

234

56.0

242

48.5

246

32.2

239

24.5

200

36.3

218

79.1

265

02/07/20

106

CasStereo

18.8

176

23.9

186

9.01

183

10.5

170

11.7

157

74.0

260

13.1

180

10.1

109

7.86

4.09

129

45.4

211

25.2

160

24.4

192

17.3

136

20.5

184

44.3

214

02/20/20

107

CRAR

22.0

190

23.2

184

13.5

219

16.4

218

16.3

182

21.0

149

21.8

222

28.5

190

26.9

202

10.5

206

32.5

179

32.9

190

23.3

183

23.6

195

20.3

183

37.4

192

02/24/20

108

SGBMP

27.8

211

37.5

225

16.3

232

17.1

223

27.6

226

75.7

263

14.6

195

33.4

204

25.8

196

12.2

218

60.3

253

34.0

195

23.1

181

29.3

226

28.7

203

31.2

177

03/13/20

109

MTS

59.7

271

58.8

267

25.2

263

51.1

270

60.4

271

91.3

269

48.3

272

70.3

269

63.4

269

44.4

271

79.3

266

71.7

270

60.9

272

47.4

269

79.6

272

88.4

270

05/14/20

110

SRM

13.1

143

8.50

119

7.04

161

7.86

140

7.73

129

16.1

127

7.90

125

18.4

152

18.5

173

5.03

148

22.3

143

20.0

134

18.1

157

18.5

148

11.3

129

19.3

142

05/19/20

111

SUWNet

30.1

221

24.5

187

13.9

221

20.3

229

20.0

199

35.7

189

26.1

235

40.9

222

43.2

242

17.9

229

49.8

224

28.6

178

24.4

190

28.5

219

52.5

254

37.9

195

05/20/20

112

AANet++

15.4

158

17.5

165

8.37

177

10.2

167

9.86

145

23.9

159

9.82

143

17.7

150

15.9

156

3.25

106

18.1

126

27.1

170

16.2

137

18.4

147

20.0

181

37.7

194

05/28/20

114

RTSMNet

45.6

265

47.0

251

21.9

255

31.9

261

36.4

256

75.1

262

43.9

271

58.9

264

55.3

264

32.7

266

62.2

257

56.4

263

42.2

262

39.1

261

58.0

257

59.9

241

05/28/20

113

LEAStereo

7.15

7.56

109

4.52

4.62

4.64

8.83

5.66

5.86

6.03

3.30

109

13.1

11.3

10.3

12.1

7.06

9.90

06/08/20

115

MANE

30.9

225

54.7

263

11.5

209

14.6

205

29.4

232

52.6

238

26.4

237

45.1

232

31.5

216

11.5

212

42.5

200

41.8

230

33.1

241

31.6

236

34.2

215

43.5

211

07/16/20

116

HLocalExp-CM

5.68

3.68

2.95

3.92

2.45

8.12

3.41

3.74

3.53

2.17

10.2

10.0

8.75

14.1

5.12

9.61

07/17/20

117

GANetREF_RVC

18.9

178

16.6

163

6.42

148

7.40

136

10.6

150

25.8

165

12.2

165

36.5

212

35.5

227

4.10

131

33.8

183

20.1

135

16.4

141

20.2

163

14.3

148

48.1

226

07/21/20

118

AANet_RVC

25.2

202

22.6

180

11.3

207

12.9

191

15.9

179

30.5

177

17.9

206

33.4

203

30.9

212

6.34

177

28.8

166

43.4

231

25.3

201

26.7

212

37.0

220

69.8

255

08/10/20

119

CVANet_RVC

31.8

231

25.6

190

14.6

224

21.7

234

22.1

205

39.8

198

28.4

245

44.7

230

47.0

247

19.1

236

50.6

228

30.4

183

24.7

194

29.3

227

52.1

253

41.6

206

Accurate disparity prediction is a hot topic in computer vision, and efficiently exploiting contextual information is the key to improving performance. In this paper, we propose a simple yet effective non-local context attention network (NLCANet) to exploit the global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning (GFL) module to get a more discriminative representation by taking advantage of multi-scale features and form them into the variance-based cost volume. Then, we construct a non-local attention matching (NLAM) module by using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture the global contextual information. Finally, we adopt a geometry refinement (GR) module to refine the disparity map to further improve the performance. Moreover, we add the warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on KITTI and SceneFlow datasets in the end-point error (EPE) and the fraction of erroneous pixels (D1); (2) our proposed method performs particularly well in reflective regions and occluded areas.

08/11/20

120

NLCA_NET_v2_RVC

10.4

110

11.8

140

4.12

6.39

118

6.44

111

19.7

145

10.9

151

14.5

130

13.2

135

3.26

108

21.2

139

14.7

10.1

14.5

101

7.17

11.5

08/12/20

121

CFNet_RVC

10.1

105

14.4

156

7.81

170

7.12

133

6.61

113

15.5

119

7.53

118

12.3

121

11.5

131

3.02

10.7

16.6

109

10.7

15.4

113

10.9

127

9.01

09/03/20

122

LPSM

39.5

253

40.0

230

20.7

249

28.3

252

34.0

251

34.3

184

23.8

229

56.7

260

52.4

261

24.9

255

36.9

190

66.3

267

40.6

260

37.5

257

46.6

245

79.3

266

09/09/20

123

AdaStereo

13.7

148

19.6

170

7.41

166

10.6

174

14.5

177

15.7

123

7.85

123

22.6

162

9.32

110

7.00

180

9.20

22.4

148

14.5

124

17.8

142

14.8

152

24.2

156

10/28/20

124

HITNet

6.46

6.25

4.67

104

4.51

2.17

6.52

5.18

2.92

2.66

2.37

36.7

188

9.28

6.27

11.2

4.61

9.54

We propose a novel lightweight network for stereo estimation. The method uses densely connected layer structures to learn expressive features without the need of fully-connected layers or 3D convolutions. This leads to a network structure with only 0.37M parameters while still having competitive results. The post-processing consists of filtering, a consistency check and hole filling.
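The post-processing steps named above (a consistency check followed by hole filling) can be sketched on a single scanline. This is a generic left-right check under stated assumptions, not the FC-DCNN implementation: the `INVALID` marker, the tolerance parameter, and the nearest-valid-neighbor fill rule are all illustrative choices.

```python
INVALID = -1  # hypothetical marker for rejected disparities


def lr_check(disp_left, disp_right, tol=1):
    """Keep a left disparity d at column x only if the right map agrees.

    Pixel x in the left image maps to column x - d in the right image;
    the match is kept when the right disparity there is within tol of d.
    """
    out = []
    for x, d in enumerate(disp_left):
        xr = x - d
        if 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol:
            out.append(d)
        else:
            out.append(INVALID)
    return out


def fill_holes(row):
    """Fill invalid pixels from the nearest valid pixel to the left,
    then fill any remaining leading holes from the right."""
    out = list(row)
    last = INVALID
    for i, d in enumerate(out):
        if d != INVALID:
            last = d
        elif last != INVALID:
            out[i] = last
    nxt = INVALID
    for i in range(len(out) - 1, -1, -1):
        if out[i] != INVALID:
            nxt = out[i]
        elif nxt != INVALID:
            out[i] = nxt
    return out
```

For example, `fill_holes(lr_check(left_row, right_row, tol=0))` rejects every pixel whose two disparity estimates disagree and then propagates neighboring values into the gaps.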

11/10/20

125

FC-DCNN

17.9

169

21.2

176

6.52

153

9.56

160

14.1

173

31.9

181

23.4

228

23.4

163

19.7

175

5.93

172

26.9

157

22.8

150

20.0

163

19.3

157

18.2

170

23.9

155

11/12/20

126

RLStereo

27.9

213

20.5

175

15.0

227

23.5

239

26.3

223

51.5

235

35.8

261

27.1

182

23.4

186

15.6

225

63.6

259

32.3

188

21.5

173

23.2

189

44.7

242

17.4

130

11/12/20

127

UnDAF-GANet

16.2

161

3.74

2.94

16.7

219

18.3

188

24.1

160

26.3

236

19.2

154

15.7

154

1.86

36.8

189

26.8

167

11.1

24.8

203

6.54

28.0

167

11/16/20

128

SSCasStereo

15.2

157

33.6

210

5.73

136

8.13

144

12.6

165

51.1

233

8.19

128

16.7

145

5.02

5.70

168

48.5

217

17.3

116

16.0

135

20.1

161

12.3

136

9.25

11/21/20

129

DecStereo

20.2

185

19.4

169

11.9

212

15.6

212

13.5

171

23.0

157

26.7

238

13.3

125

15.1

152

7.60

186

28.3

164

30.2

182

23.4

184

17.6

137

38.9

231

38.4

196

11/25/20

130

LPSC

10.7

115

5.15

4.23

5.48

6.38

108

16.5

129

7.84

122

9.56

102

10.3

122

4.02

126

20.2

134

19.0

126

17.7

151

18.5

149

9.73

114

18.0

134

11/26/20

131

CooperativeStereo

28.8

217

28.5

200

12.3

216

17.3

224

18.5

192

62.3

249

22.4

225

36.3

211

24.7

188

15.8

226

74.5

265

37.8

218

28.4

220

26.6

211

41.6

237

28.4

170

12/22/20

132

SLCCF

8.83

6.97

101

4.90

112

6.05

112

4.35

8.89

5.33

6.29

5.15

4.80

143

13.0

18.1

121

17.8

153

17.7

139

6.93

15.4

122

12/24/20

133

ACR-GIF-OW

24.5

197

37.5

223

10.8

204

16.3

217

17.4

187

44.9

221

17.2

203

33.5

205

25.2

191

11.4

211

45.4

210

35.7

205

26.6

209

23.3

192

23.6

190

38.4

197

This model is trained on low-resolution data but targets high-resolution images. It uses a recurrent module to iteratively update a coarse disparity prediction. Then a special refinement module makes a final adjustment. The recurrent update and the final refinement are applied in a patch-wise manner across the initial disparity.

03/05/21

134

ORStereo

19.1

180

38.9

228

9.97

201

9.21

155

23.3

210

42.6

211

13.0

177

18.2

151

6.63

4.93

144

35.4

185

33.1

191

24.1

188

23.6

194

18.2

169

26.0

163

03/05/21

135

LocalExp-RC

5.54

3.78

3.02

3.85

2.08

5.95

3.48

3.61

3.65

2.52

10.3

6.85

7.25

16.1

125

5.12

10.2

04/22/21

136

LESC

6.78

4.07

3.46

3.26

3.36

9.15

4.08

4.76

5.21

2.80

11.7

13.0

10.2

17.0

130

5.52

12.5

105

05/09/21

137

ADSG

24.7

199

36.3

217

11.3

206

15.6

213

20.0

198

35.9

191

18.2

207

35.2

209

27.2

203

12.2

219

42.7

203

33.8

194

26.3

206

23.3

191

27.0

198

36.6

191

06/02/21

138

FADNet_RVC

28.4

215

18.3

166

9.48

192

13.9

200

16.0

180

40.9

204

13.0

178

43.4

226

45.0

243

8.43

193

57.8

244

35.7

203

24.8

196

23.9

198

47.4

248

68.1

253

06/07/21

139

FADNet++

40.2

254

25.2

188

22.1

256

33.4

263

28.8

231

54.0

239

33.6

257

46.8

236

46.8

246

23.5

251

73.4

264

47.1

245

28.5

221

37.7

258

65.4

268

71.9

259

We propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the widely used Semi-Global Matching algorithm. We optimize the algorithm for embedded CUDA GPUs using massively parallel computing, and use NEON intrinsics to optimize it for vectorized SIMD processing on embedded ARM CPUs.

06/10/21

140

ReS2tAC

35.8

244

41.8

238

20.7

248

28.1

251

24.6

216

42.4

209

30.3

248

38.9

218

34.9

224

23.0

249

50.3

227

48.9

247

39.8

259

39.4

262

44.1

239

65.1

249

06/11/21

141

R3DCNN

33.0

235

34.2

213

15.8

229

13.4

195

41.7

263

47.9

227

22.0

223

60.1

266

57.4

267

12.6

221

40.3

195

46.4

241

26.8

212

37.0

254

19.3

177

45.2

217

Methods that estimate optimal parameters for MRF stereo cannot be directly used to estimate parameters for local expansion moves stereo. To estimate the regularization weight for local expansion moves stereo, we propose probabilistic mixture models for the slanted patch matching terms and the curvature regularization terms.

06/23/21

142

ERW-LocalExp

5.53

3.64

2.84

2.66

1.97

5.68

4.87

3.27

3.25

2.36

10.5

11.5

7.46

14.7

103

5.55

9.18

07/23/21

144

HBP_ISP

5.20

3.70

3.05

3.57

2.34

7.80

3.79

3.34

3.09

1.87

9.85

10.1

7.82

11.2

5.26

7.86

07/26/21

145

RAFT-Stereo

4.74

4.19

3.44

3.11

1.51

7.30

2.79

2.67

2.59

1.39

7.46

10.2

5.86

13.0

3.59

9.38

07/14/21

143

MFN_USFDSRVC

36.7

246

31.4

206

16.1

231

22.0

235

28.5

229

42.3

208

21.1

220

52.2

249

50.7

257

21.4

244

53.6

238

53.8

259

30.7

232

31.7

238

63.7

264

60.7

242

08/22/21

146

SDCO

19.0

179

30.4

203

5.92

140

9.11

153

21.5

204

37.5

194

12.3

166

26.8

178

16.7

165

5.68

167

29.4

167

30.6

184

25.6

202

23.1

188

17.5

166

18.9

140

A lightweight network with a dilated ResNet feature extractor, a correlation cost volume computed at low resolution, and a refinement network that produces a full-resolution disparity output. The sparse disparity is derived from the dense disparity by thresholding the network confidence output and applying a region grower to remove suspected bad disparities.

08/24/21

147

MMStereo

12.7

139

27.9

196

8.71

179

8.81

150

11.7

157

26.9

169

5.82

20.9

158

14.6

147

4.10

131

15.4

107

16.0

106

14.2

121

13.6

9.71

113

7.35

09/20/21

148

GANet-RSSM

10.6

113

11.9

141

8.54

178

6.60

122

6.26

107

16.3

128

7.10

113

15.5

134

14.5

146

2.93

11.3

16.1

108

10.8

16.0

124

10.7

123

11.3

10/17/21

149

ACVNet

13.6

147

9.69

128

3.65

4.82

7.48

122

22.9

156

12.9

174

15.7

136

14.4

145

3.82

121

21.8

140

17.7

118

14.3

122

15.6

115

25.6

194

32.1

179

10/25/21

150

SWFSM

8.21

5.46

4.66

103

5.90

103

2.92

10.9

5.59

8.91

9.58

113

2.72

13.2

14.8

13.4

114

13.4

7.76

103

11.4

11/10/21

151

CREStereo

3.71

4.73

3.94

5.07

1.96

3.02

1.42

2.28

2.05

1.51

6.86

6.35

4.25

6.01

4.60

5.49

11/21/21

152

FENet

11.3

126

7.70

111

3.91

3.97

6.24

106

16.7

130

5.78

32.1

198

32.4

219

2.57

11.8

10.8

6.90

13.4

5.41

11.2

11/21/21

153

Gwc_CoAtRS

6.50

6.92

6.82

158

4.55

3.48

5.12

5.80

4.88

4.96

2.69

15.3

106

12.8

6.40

10.2

7.13

8.48

01/27/22

154

UPFNet

10.3

109

9.74

131

4.67

104

6.28

117

5.54

20.1

146

8.78

133

9.42

101

7.51

3.78

120

22.9

146

16.0

107

16.7

144

15.6

116

8.70

108

16.0

123

02/27/22

155

MSTR

8.72

6.28

6.00

142

4.13

5.00

8.03

7.81

121

5.33

5.80

3.25

106

20.3

136

14.4

11.2

12.9

7.29

31.5

178

03/02/22

156

AANet_Edge

23.7

194

28.4

199

9.56

194

14.8

207

14.3

176

34.9

185

15.9

200

23.9

164

17.2

167

5.62

165

31.6

176

35.2

201

29.6

228

22.1

181

41.4

236

74.0

261

04/11/22

157

Z2ZNCC

34.4

240

40.9

235

20.4

246

29.9

254

32.0

239

42.5

210

27.0

239

41.4

223

38.0

233

26.0

257

48.6

218

43.7

234

36.5

249

33.3

245

37.5

224

41.3

205

05/25/22

158

LSMSW

8.15

5.45

4.64

101

5.93

108

2.93

10.6

5.68

8.70

9.23

109

2.68

13.4

14.6

13.3

112

13.5

7.69

101

11.4

06/13/22

159

EAI-Stereo

3.68

4.02

3.32

2.48

1.42

4.19

2.37

2.18

2.01

1.16

10.2

8.84

4.00

7.15

3.14

6.44

07/14/22

161

CRMV2

11.9

130

9.65

127

7.98

171

10.0

165

5.81

100

12.4

107

12.3

167

9.96

108

8.75

100

5.64

166

20.3

135

21.1

145

17.2

147

15.8

120

13.7

143

19.6

143

07/13/22

160

ACT

35.0

242

40.7

234

21.7

253

27.5

249

32.5

243

43.9

215

27.5

240

45.4

233

41.0

239

24.6

253

52.5

235

43.5

232

34.2

244

29.7

229

39.6

233

47.4

222

08/05/22

162

RDNet

11.3

124

11.2

136

5.24

126

5.45

6.51

112

16.7

131

8.89

135

16.7

145

14.9

151

4.75

142

16.5

115

19.3

131

15.4

131

12.5

12.1

133

14.2

116

08/08/22

163

UCFNet_RVC

10.7

116

12.2

144

6.48

149

5.83

100

5.90

103

16.9

133

6.61

107

15.8

138

14.6

148

2.73

11.4

18.8

124

11.0

18.9

155

10.7

123

11.4

08/09/22

164

issga

18.9

177

12.0

143

11.6

210

11.1

182

18.3

189

14.3

115

14.6

196

28.6

191

26.2

198

5.90

171

13.5

41.4

228

21.9

176

22.2

182

19.4

178

30.7

174

08/22/22

165

PSM-Aug

15.0

154

10.1

132

9.43

191

10.8

176

8.87

137

13.8

113

9.63

141

14.0

128

14.7

150

5.98

174

20.7

138

24.6

158

21.3

171

21.1

173

22.7

188

29.0

172

08/29/22

166

MCP-HA-VQ

30.6

223

47.8

255

17.8

237

23.0

237

25.9

221

41.9

206

24.6

231

38.2

215

31.5

215

18.4

231

43.0

205

35.7

204

29.6

227

29.6

228

35.8

216

47.9

224

09/01/22

167

GMStereo

7.14

6.30

6.20

146

6.22

115

6.62

114

9.79

2.76

5.69

5.17

4.04

127

14.0

11.2

6.81

11.8

6.90

12.8

107

In recent years, convolutional-neural-network based stereo matching methods have achieved significant gains compared to conventional methods in terms of both speed and accuracy. Current state-of-the-art disparity estimation algorithms require many parameters and large amounts of computational resources and are not suited to applications on edge devices. In this paper, we propose an end-to-end light-weight network (LWNet) for fast stereo matching, which consists of an efficient backbone with multi-scale feature fusion for feature extraction, a 3D U-Net aggregation architecture for disparity computation and a color guidance in 2D CNN for disparity refinement.

09/20/22

168

LWNet

40.9

258

38.1

226

18.4

239

30.5

257

33.3

250

43.2

214

30.9

252

49.2

243

50.6

256

22.8

248

58.1

246

54.2

260

41.8

261

37.5

256

58.8

259

81.5

268

09/22/22

169

DCstereo

12.6

136

10.1

133

9.23

188

9.04

152

8.19

132

12.3

106

7.29

115

16.4

142

14.7

149

5.11

150

18.8

129

20.3

138

14.4

123

19.4

158

13.2

140

20.0

147

09/27/22

170

FCDSN-DC

18.8

175

23.0

182

7.01

160

10.2

168

20.1

200

37.7

195

17.3

204

27.8

186

20.8

180

7.81

189

23.9

150

24.5

157

22.4

177

20.7

167

16.0

159

19.9

146

10/01/22

171

CREStereo++_RVC

4.68

5.09

4.04

5.24

4.21

5.05

2.11

3.52

3.58

1.67

8.01

6.61

4.68

9.53

4.61

5.98

10/02/22

172

MaskLacGwcNet_RVC

10.4

112

7.52

108

4.50

5.21

6.94

117

18.6

141

5.18

14.7

132

13.3

137

3.01

28.5

165

18.0

120

8.95

11.2

14.2

147

13.8

113

10/02/22

173

raft+_RVC

8.29

11.1

135

4.49

5.97

110

10.3

148

28.5

171

3.75

5.07

2.88

2.21

12.2

15.2

12.3

102

12.7

5.09

10.8

10/03/22

175

GEStereo_RVC

7.97

6.70

3.52

5.90

103

7.63

127

22.5

155

7.61

119

4.89

4.22

2.19

10.4

14.9

11.8

14.3

4.36

11.9

102

10/03/22

174

CroCo_RVC

15.1

156

7.43

107

5.85

139

6.71

126

11.7

156

15.4

117

3.94

36.2

210

35.8

228

3.41

113

18.1

125

29.3

180

10.9

18.0

143

10.6

122

21.0

149

10/03/22

176

iRaftStereo_RVC

8.07

9.13

125

8.25

175

5.55

4.68

6.92

6.41

105

6.29

6.19

3.96

124

17.9

123

13.0

9.58

11.4

9.24

112

11.8

101

10/06/22

177

AGCVNet

12.0

131

10.6

134

5.14

121

5.47

7.00

118

17.0

135

8.91

137

18.9

153

15.7

155

4.64

139

15.8

110

19.1

129

16.6

143

13.7

15.0

153

14.6

121

10/06/22

178

GwcSlice

12.7

138

13.4

149

4.76

107

5.33

7.69

128

17.0

134

11.1

153

13.7

126

9.88

118

4.22

135

20.1

133

20.1

136

17.4

148

16.9

129

14.0

145

36.5

189

10/07/22

179

MCNet

11.6

128

12.6

145

4.72

106

6.63

124

7.62

125

19.0

143

11.4

156

12.5

123

9.12

107

4.47

138

17.6

122

23.1

153

15.5

133

13.5

8.44

107

30.1

173

10/08/22

180

MANet

17.5

167

23.0

183

5.25

127

9.82

163

11.4

154

31.1

178

12.9

173

22.5

161

16.1

160

7.95

191

24.9

154

28.1

176

24.4

191

20.5

165

18.9

175

31.1

176

10/13/22

181

19.9

184

15.1

159

13.1

218

14.3

203

14.2

174

14.2

114

8.90

136

16.4

140

16.0

159

12.3

220

50.2

226

36.4

209

21.7

175

21.8

178

31.8

211

40.4

203

10/16/22

182

LMCR-Stereo

6.27

6.20

4.59

100

3.92

2.66

4.52

4.88

3.65

3.41

2.08

16.8

118

11.2

8.58

13.2

6.89

10.5

Cost aggregation plays a critical role in existing stereo matching methods. Generally, aggregating matching costs in homogeneous regions with similar disparities is beneficial to matching accuracy. However, previous approaches commonly use 3D convolutions for cost aggregation without considering the homogeneity of different regions. In this paper, we revisit cost aggregation in stereo matching from a perspective of disparity classification and propose a generic yet efficient Disparity Context Aggregation (DCA) module to improve the performance of CNN-based methods.

10/26/22

183

DCANet

8.55

8.41

117

6.26

147

4.79

5.41

10.3

7.14

114

10.1

110

9.75

116

3.38

112

12.8

13.5

12.4

103

12.7

7.37

10.2

11/10/22

184

DLNR

3.20

2.91

2.37

2.18

1.67

3.21

1.37

1.66

1.11

6.25

7.07

3.45

8.90

4.43

2.91

11/11/22

185

ICVP

7.97

11.3

137

3.97

5.02

8.79

136

17.1

136

5.62

7.51

6.97

3.09

13.7

12.7

9.23

10.9

6.28

9.73

12/02/22

186

GANet+ADL

17.7

168

21.3

177

3.97

6.61

123

11.7

157

25.9

167

6.07

100

40.5

220

35.0

226

3.68

119

24.1

151

19.3

130

13.9

118

14.4

23.0

189

33.5

183

12/05/22

187

Ct-Net

21.0

188

38.5

227

11.3

207

11.7

185

17.4

185

31.7

180

13.0

176

27.1

181

20.4

178

7.45

185

27.9

163

27.7

174

16.5

142

23.5

193

38.8

229

24.4

157

12/12/22

188

KPEA-Stereo

10.6

114

9.69

128

5.35

132

4.52

6.43

109

12.1

104

6.99

109

11.7

119

7.84

5.59

164

16.6

117

19.1

128

13.1

108

17.1

131

12.2

135

26.0

164

01/14/23

189

AASNet

12.8

141

12.8

146

4.41

9.40

158

7.56

124

17.3

138

11.7

160

15.3

133

14.2

141

4.13

133

15.0

102

19.7

132

20.5

165

15.2

110

17.0

163

16.9

126

02/22/23

190

GLC_STEREO

6.42

5.35

4.65

102

5.19

5.79

7.59

2.22

10.4

111

14.1

140

2.00

9.92

10.8

4.94

5.07

6.16

5.72

03/06/23

191

PCVNet

8.19

7.01

102

6.51

151

5.89

102

4.53

7.42

8.10

126

5.49

5.62

2.90

22.0

142

12.7

8.07

11.9

7.87

104

21.9

152

03/07/23

192

GOAT18

8.73

7.26

104

7.32

164

6.80

127

3.47

10.3

10.4

149

5.14

5.16

4.95

146

15.9

111

13.9

11.2

9.62

13.1

139

16.4

124

04/18/23

193

DMCANet

7.79

7.91

112

4.12

3.79

4.26

11.2

10.1

147

6.76

4.85

3.32

111

12.9

13.3

10.5

12.9

9.11

110

10.1

04/28/23

194

ADStereo

18.0

170

16.4

162

14.9

226

12.6

190

21.3

203

20.6

147

16.6

201

15.8

137

16.0

158

7.43

184

19.1

130

52.0

251

24.8

197

18.1

144

17.7

167

11.2

06/09/23

195

SSVM-CFPMF

9.52

103

8.58

120

4.40

5.51

5.84

101

5.84

7.02

111

6.16

14.3

143

5.30

155

17.0

119

15.9

103

14.8

126

18.7

150

6.52

13.7

111

06/22/23

196

IGEV-Stereo

4.83

3.17

2.46

1.97

2.19

5.63

1.22

16.2

139

9.20

108

1.17

3.77

4.93

5.35

6.99

2.31

5.00

06/26/23

197

CCL-Stereo

30.9

227

50.9

257

9.17

185

11.0

180

33.0

245

88.2

267

1.91

47.3

238

26.8

200

11.7

214

41.7

198

37.4

213

23.7

185

28.8

221

63.0

261

42.8

207

08/03/23

198

26.6

207

39.9

229

17.7

236

22.2

236

23.0

209

36.6

193

18.3

210

29.7

193

24.2

187

16.9

228

42.6

201

34.0

196

28.9

224

28.8

223

26.2

196

39.8

201

08/10/23

199

CroCo-Stereo

7.29

4.90

3.62

1.74

7.01

119

9.90

1.78

16.4

143

17.4

168

1.45

6.20

15.3

4.95

8.62

5.00

10.0

08/12/23

200

UGRU

10.8

117

4.71

4.27

2.12

13.2

170

15.7

122

1.95

20.5

157

25.4

193

1.68

7.78

25.2

160

11.1

14.4

8.79

109

10.5

08/13/23

201

Any-RAFT

5.22

5.19

4.20

4.00

2.23

5.88

4.06

3.05

2.91

2.04

9.76

10.7

8.77

9.90

4.94

6.72

We propose a novel deep stereo matching network and a new real-world stereo dataset of cluttered objects taken with a commercially available stereo sensor. We design a U-shaped architecture with various types of attention that extracts global and local context from rectified image pairs more efficiently, resulting in highly accurate disparities. Furthermore, its symmetric structure allows simultaneous estimation of both left and right disparities. It can also implicitly estimate the uncertainty, i.e. the confidence of the estimated disparities.

09/14/23

202

CASS

11.8

129

9.23

126

8.92

182

10.4

169

7.84

130

12.4

107

4.43

8.42

8.65

6.03

176

30.9

172

17.1

115

15.3

130

15.7

117

17.8

168

19.0

141

09/27/23

203

FM-DT

40.6

256

45.4

250

20.8

250

32.3

262

40.5

261

51.3

234

30.6

249

52.1

248

48.0

249

23.2

250

53.4

236

47.0

243

39.7

255

29.0

224

55.3

256

76.2

263

10/09/23

204

EGLCR-Stereo

4.03

4.69

2.46

3.70

2.99

10.7

2.48

1.95

1.63

0.94

5.76

8.17

3.84

10.3

2.99

4.87

10/30/23

205

LSTS

17.3

165

8.70

123

6.18

145

8.41

146

9.63

143

21.3

150

13.2

182

29.5

192

29.1

208

5.00

147

25.0

155

24.9

159

22.6

178

21.4

174

15.4

155

33.1

181

10/30/23

206

LoS

4.20

5.85

4.92

114

4.64

2.77

3.92

1.32

2.36

2.17

1.81

8.18

6.58

4.55

8.57

4.57

5.06

11/10/23

207

GASNet

33.1

236

21.3

178

16.9

233

26.3

247

33.2

248

39.5

197

17.7

205

26.7

177

26.0

197

21.3

243

54.1

240

46.9

242

33.3

242

36.8

253

63.2

263

63.4

247

11/12/23

208

SNDR

6.09

5.30

4.20

3.11

2.66

9.22

4.70

5.10

3.98

4.26

136

9.96

7.01

9.11

15.1

108

4.57

7.26

11/13/23

209

Selective-IGEV

2.51

2.54

1.86

2.51

1.12

7.22

1.23

1.36

1.17

1.16

4.48

4.83

2.99

3.79

2.26

4.72

11/16/23

210

LoS_RVC

5.14

7.57

110

4.82

109

4.27

3.20

8.71

2.62

3.45

2.95

1.56

8.91

6.79

6.57

9.87

6.67

4.61

12/07/23

211

4D-IteraStereo

10.9

120

5.87

5.59

133

6.15

113

6.07

104

9.15

7.46

117

27.0

180

32.4

218

2.46

12.2

11.2

7.18

12.2

7.37

6.77

This article presents a disparity map algorithm that improves depth map estimation using the Census Transform and a hierarchical segment-tree on each block. The stereo matching algorithm presented in this study comprises four steps: Cost Computation, Cost Aggregation, Optimization, and Post-Processing, all of which refine the final disparity map.
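As a rough illustration of the cost computation step, a census transform encodes each pixel by comparing it with its neighbors, and the matching cost between two pixels is the Hamming distance between their codes. The sketch below is a generic census transform in plain Python; the function names, the default 3x3 window, and the bit ordering are assumptions for illustration and are not taken from the H-CENST submission.

```python
def census_transform(img, radius=1):
    """Census code per pixel for a 2D list of intensities.

    Each neighbor in the (2*radius+1)^2 window contributes one bit:
    1 if the neighbor is darker than the center, else 0. Bits are
    appended in row-major order. Border pixels keep code 0.
    """
    h, w = len(img), len(img[0])
    codes = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            center = img[y][x]
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the center pixel itself
                    bit = 1 if img[y + dy][x + dx] < center else 0
                    code = (code << 1) | bit
            codes[y][x] = code
    return codes


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def census_cost(left_codes, right_codes, y, x, d):
    """Cost of matching left pixel (y, x) to right pixel (y, x - d)."""
    return hamming(left_codes[y][x], right_codes[y][x - d])
```

Because the code depends only on intensity ordering within the window, this cost is unaffected by monotonic brightness changes between the two images.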

12/31/23

212

H-CENST

38.4

250

41.6

236

26.7

266

31.8

260

33.0

247

43.0

212

32.7

255

53.1

251

50.5

255

24.8

254

51.4

232

47.0

244

36.7

251

31.9

240

40.5

235

53.4

231

Unsupervised Stereo Matching methods have made significant strides recently. However, these approaches have predominantly relied on the assumption of photometric consistency, leading to potential limitations: sensitivity to illuminance changes and difficulty in dealing with problematic areas like occluded or textureless regions. To mitigate these limitations, this paper introduces a novel self-supervised dual-level framework named Dual-Net. This framework mainly consists of two key components: self-supervised teacher training and student training based on knowledge distillation. Specifically, the teacher model is first trained in a self-supervised fashion with a focus on feature space and data augmentation consistency. On the one hand, pixels from feature space are robust to noise and luminance changes, which are discriminative even in textureless regions. On the other hand, a data augmentation consistency loss is presented to guide the model toward enhanced contextual awareness, thus leading to a completed depth estimation in problematic regions. Then, the knowledge learned by the teacher model is distilled and transferred probabilistically to the student model. By leveraging this distilled knowledge, the student model is guided by validated insights, enabling it to outperform its teacher model by a large margin.

01/08/24

213

DualNet

16.4

162

19.7

171

7.99

172

10.1

166

18.3

190

24.1

160

10.0

145

23.9

165

20.4

176

7.79

188

23.0

147

23.1

151

16.3

140

18.8

153

17.0

162

18.5

138

01/08/24

214

GINet

15.6

159

16.1

160

7.15

162

7.37

135

9.39

141

25.1

163

7.88

124

35.2

208

32.6

220

3.19

102

15.5

108

16.7

112

11.6

14.7

102

15.8

157

26.4

165

01/31/24

215

HART

4.24

3.13

2.24

4.16

1.10

4.01

2.03

1.86

1.68

0.85

9.83

11.0

8.71

9.65

3.26

6.96

02/19/24

216

HCR

12.4

134

8.33

116

3.79

5.54

9.27

140

26.1

168

6.26

102

32.9

200

32.0

217

2.38

11.4

10.8

8.08

15.8

119

7.69

101

6.48

The project proposes a stereo matching network based on a neural operator, which maps from the space of RGB image pairs to disparity space. The network lets users test images at any scale, customize the disparity range for different scenarios, and dynamically build the cost volume according to the chosen scale and disparity range.

02/20/24

217

DispNO

15.0

153

18.4

168

6.17

144

9.13

154

11.3

152

25.2

164

11.4

156

17.6

149

14.3

142

8.70

195

31.7

177

21.6

147

18.1

156

17.6

138

16.2

160

17.5

131

02/21/24

218

ClearDepth

3.48

4.14

3.16

2.81

1.95

4.55

2.36

1.73

1.70

1.25

5.46

11.2

3.12

7.30

3.70

3.45

02/28/24

219

AKD_Stereo

3.87

4.21

3.53

3.91

1.08

7.63

4.75

1.72

1.60

1.18

5.26

9.62

3.66

7.63

3.23

5.37

03/04/24

220

AEACV

4.15

5.53

2.98

2.54

3.23

3.42

1.57

2.85

2.99

1.22

4.63

5.96

4.36

12.9

5.41

4.08

03/04/24

221

ET_Stereo

4.00

4.38

3.33

2.85

1.53

7.84

2.61

1.91

1.82

1.05

5.08

8.72

7.52

8.81

3.09

4.82

03/11/24

222

StereoIM

9.25

3.47

3.05

1.78

9.22

139

10.6

1.65

27.0

179

25.6

195

1.25

5.89

20.9

143

4.99

12.0

3.39

10.6

03/11/24

223

MIF-Stereo

11.3

125

6.68

154

5.90

103

7.33

120

8.57

11.9

162

11.1

115

11.6

132

5.24

152

18.4

128

27.7

175

11.8

15.8

118

13.8

144

19.8

145

04/14/24

224

SMFormer

12.8

142

14.2

152

7.76

169

7.10

132

6.43

109

17.6

139

8.81

134

9.71

106

6.39

3.49

115

16.3

112

19.1

127

10.9

18.3

146

30.3

208

35.6

186

04/19/24

225

DCSE

16.2

160

16.1

161

4.76

107

6.47

119

12.5

164

29.9

174

8.91

137

34.7

206

33.9

222

3.98

125

22.8

145

18.9

125

16.3

138

15.2

109

10.7

123

23.8

154

05/17/24

226

FormerRaft_RVC

10.9

121

13.4

150

8.32

176

6.67

125

9.42

142

15.6

120

3.24

9.67

105

10.5

123

5.30

155

13.8

17.8

119

9.52

17.2

134

17.1

164

18.8

139

06/05/24

227

MGS-Stereo

3.57

3.62

2.93

3.43

2.66

6.24

2.54

2.04

2.15

1.23

5.81

8.40

3.56

6.48

3.18

4.81

06/06/24

228

MoCha-V2

3.51

2.52

1.95

2.25

1.47

4.61

0.98

7.35

8.07

0.66

2.95

4.18

4.46

5.70

2.54

2.70

06/14/24

229

IGEV++

3.23

3.24

2.46

4.12

1.15

6.71

1.38

1.53

1.52

1.02

4.57

4.68

5.41

7.68

2.22

4.68

06/27/24

230

CAS++

3.33

4.27

3.72

3.17

2.17

2.44

1.33

2.24

2.01

1.47

4.04

8.15

4.97

5.80

3.73

3.04

07/22/24

231

apnet

30.9

226

18.3

167

9.59

197

17.1

222

24.8

217

49.1

229

19.5

214

32.3

199

29.2

209

22.2

247

60.7

255

33.2

192

27.0

214

28.0

216

64.4

267

63.7

248

08/01/24

233

RSM

2.40

2.66

1.88

3.18

0.91

5.80

1.34

1.35

1.16

0.93

3.35

3.96

2.88

4.38

2.01

4.15

08/07/24

234

AIO-Stereo

2.36

2.38

1.71

3.22

0.85

5.83

1.24

1.42

1.32

1.03

4.49

4.81

2.43

3.61

2.12

3.63

08/12/24

235

PointerNet

2.69

2.67

1.84

3.21

1.51

7.52

1.29

1.54

1.17

1.09

3.59

3.96

3.10

5.60

2.29

4.27

08/13/24

236

UniTT-Stereo

6.34

3.96

2.69

1.82

7.92

131

11.7

102

1.81

14.2

129

13.8

138

1.22

5.07

16.6

111

4.09

5.89

2.91

8.44

This paper focuses on effectively capturing local patterns from images during the fine-tuning of Transformer-based models with limited labeled training data in dense downstream tasks, particularly in the context of stereo matching. To this end, we propose MaDis-Stereo, a novel stereo depth estimation framework that enhances locality inductive biases during fine-tuning via Masked Image Modeling (MIM).

08/15/24

237

MaDis-Stereo

9.49

102

3.73

3.14

1.76

9.05

138

10.5

1.74

27.8

185

27.9

204

1.50

7.47

19.8

133

4.80

11.8

3.40

10.2

07/27/24

232

esmea

30.1

220

29.4

202

9.48

192

17.0

221

31.7

238

49.7

230

15.2

197

52.6

250

45.9

244

11.9

217

46.5

213

52.1

252

27.2

216

23.7

196

25.2

192

54.5

234

09/08/24

238

RSD

3.73

2.13

1.98

1.71

2.03

2.63

0.87

8.66

9.69

115

0.96

2.54

6.82

2.34

7.76

2.23

2.57

09/26/24

239

GCAP_Stereo

4.31

5.32

3.40

2.38

2.16

11.2

4.44

2.13

2.04

1.32

7.16

8.97

5.03

8.38

3.22

6.08

We propose S-MoEStereo, which adapts pre-trained VFMs for stereo matching by integrating Low-Rank Adaptation (LoRA) with Mixture-of-Experts (MoE) modules. This approach balances parameter efficiency and discriminative feature learning by dynamically selecting the optimal expert within each MoE module. Additionally, we introduce CNN-based adapter layers to incorporate inductive bias, enhancing geometric feature extraction. Furthermore, we propose a lightweight decision network to reduce computational costs by selectively activating MoE modules based on input complexity.
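The two building blocks named above can be sketched generically. This is an illustration under stated assumptions, not the S-MoEStereo implementation: `lora_weight` applies the standard LoRA update W + (alpha / r) * B A to a frozen weight matrix, and `select_expert` uses plain top-1 routing over gate scores. All function names, the plain-list matrix representation, and the top-1 choice are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w is the frozen (out x in) weight, a is (r x in), b is (out x r);
    only a and b would be trained during adaptation.
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def select_expert(gate_scores):
    """Assumed top-1 routing: index of the expert with the highest gate score."""
    return max(range(len(gate_scores)), key=lambda i: gate_scores[i])
```

With rank r much smaller than the layer dimensions, the trainable matrices hold far fewer parameters than the frozen weight, which is where the parameter efficiency mentioned above comes from.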

10/26/24

240

SMoEStereo_RVC

5.83

6.58

5.15

122

3.96

3.82

5.84

5.41

4.43

4.21

2.31

10.2

8.41

6.20

11.1

6.59

8.19

10/30/24

241

MonoStereo

2.64

3.72

1.68

1.77

1.05

10.5

0.88

1.27

0.97

0.63

4.39

8.10

4.59

3.70

1.73

2.69

11/01/24

242

GIP-stereo

4.03

2.86

2.31

2.64

1.76

5.00

1.55

6.75

7.30

0.86

4.63

7.24

3.51

10.6

2.06

2.37

11/03/24

243

DEFOM-Stereo

2.39

2.82

2.21

1.53

1.01

5.24

0.88

1.40

1.14

0.85

2.64

9.10

2.18

5.50

2.49

1.67

11/05/24

244

coffe_stereo

2.82

2.70

1.98

1.87

0.61

3.32

2.45

1.07

1.30

1.02

2.63

4.13

2.18

8.38

2.27

11.6

11/10/24

246

CFF

15.1

155

7.95

113

8.00

173

5.26

10.2

147

12.3

105

4.94

21.1

159

23.4

185

5.40

159

16.3

112

35.2

202

17.8

154

20.9

169

26.0

195

19.6

143

11/10/24

245

AdaRStereo

12.2

133

6.86

6.70

155

4.89

8.30

133

10.8

2.75

26.3

174

25.5

194

3.31

110

14.8

101

26.3

165

9.13

15.9

123

14.4

149

11.3

12/16/24

248

RPS

2.61

2.46

1.71

3.92

0.79

5.19

2.44

0.93

0.84

0.93

2.41

3.39

3.45

3.28

1.82

11.6

11/28/24

247

DEFOM-Stereo_RVC

3.28

3.50

2.61

2.41

0.87

2.51

0.89

1.38

1.26

0.97

6.35

10.8

2.43

11.0

3.03

5.00

02/03/25

249

FoundationStereo

1.84

2.46

1.71

1.36

0.79

5.19

0.53

0.93

0.84

0.93

2.41

3.39

3.45

3.28

1.82

1.17

02/06/25

250

SLEDC

10.1

106

8.58

120

4.40

5.51

5.84

101

21.4

151

7.02

111

6.16

14.3

143

5.30

155

17.0

119

15.9

103

14.8

126

18.7

150

6.52

13.7

111

02/10/25

251

GREAT-IGEV

2.81

3.30

2.44

2.31

0.96

7.12

1.17

1.38

1.36

1.04

3.89

3.82

4.66

6.24

2.17

4.65

02/16/25

252

TCM

12.8

140

13.5

151

5.64

135

7.01

131

12.0

160

30.2

176

10.0

146

9.66

104

7.53

5.58

163

27.0

159

21.0

144

16.0

136

18.8

152

12.5

137

18.1

135

03/02/25

253

State-Stereo

2.64

3.55

3.06

1.91

1.19

1.67

1.28

1.68

1.66

1.22

1.64

4.45

5.81

5.06

3.25

2.38

03/04/25

254

LG-Stereo

1.76

2.57

1.86

2.02

0.65

3.23

0.68

0.98

0.81

0.55

2.15

4.26

2.03

3.85

1.27

2.42

03/19/25

255

G2L-Stereo

13.3

145

16.8

164

4.45

5.64

16.8

184

23.0

158

5.05

20.4

156

16.2

161

2.83

15.3

105

24.0

156

18.2

158

19.5

159

5.48

25.4

160

We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). By elegantly coupling these complementary worlds through a dual-branch architecture, we seamlessly integrate stereo matching with learned contextual cues.

04/24/25

257

StereoAnywhere

3.69

7.34

105

2.23

5.12

18.1

140

0.90

2.16

1.43

1.25

5.73

4.95

2.66

6.89

2.28

1.86

M2-Stereo embeds three Multi-scale Feature Fusion Attention Blocks in the feature extraction stage to fuse deep and shallow information, and uses a Multi-scale Cost Aggregation Module in the cost aggregation stage to share cost information across scales. Finally, a Multi-branch Iterative Strategy is used for efficient iteration.

04/24/25

256

M2-Stereo

3.90

8.13

114

2.35

1.61

3.58

15.9

125

1.43

1.83

1.13

0.79

3.09

7.36

6.23

7.99

2.63

4.03

05/07/25

258

G2L-ROB

11.6

127

15.0

158

5.19

124

4.91

13.1

169

19.4

144

6.22

101

17.5

148

14.0

139

2.55

13.9

20.8

142

14.0

119

18.2

145

6.96

14.6

120

DS-Stereo uses our proposed Adjacent Feature Hybrid Attention Block and Hierarchical Cost Aggregation Module to achieve deep-to-shallow information interaction in stereo matching. It also replaces the traditional ConvGRU iterative operator with an Inception-like iterative operator to achieve high-convergence updates.

05/07/25

259

DS-Stereo

3.13

4.92

2.14

1.40

1.23

10.1

1.16

1.55

1.25

0.93

3.41

6.18

5.10

8.66

2.21

2.43

05/10/25

260

MatchStereo

1.85

2.61

2.14

1.79

0.59

1.30

0.80

1.11

0.95

0.65

1.90

3.17

2.41

5.43

1.88

1.79

05/21/25

261

waterstereo

8.48

6.74

6.51

151

9.22

156

5.11

6.44

4.71

5.50

5.36

3.44

114

18.2

127

14.2

9.13

14.4

13.3

141

12.9

108

06/05/25

262

MGS-Selectiv

2.16

2.62

1.58

2.20

0.76

6.45

1.04

1.39

1.08

0.67

1.73

4.32

1.43

6.29

1.83

2.30

This paper proposes a robust stereo matching algorithm that combines a CNN for initial cost computation, bilateral filtering with cross-based cost aggregation (CBCA) for refinement, and a winner-take-all (WTA) strategy for disparity selection, followed by an edge-aware smoothing filter (EASF) to reduce noise.
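The winner-take-all step mentioned above is simple enough to sketch directly. The layout `cost_volume[d][y][x]` and the function name `winner_take_all` are assumptions for this example; the entry does not describe how IRDINA stores its aggregated costs.

```python
def winner_take_all(cost_volume):
    """Pick, for every pixel, the disparity with the lowest aggregated cost.

    cost_volume[d][y][x] is the cost of assigning disparity d to pixel (y, x).
    Ties are resolved in favor of the smaller disparity.
    """
    num_disp = len(cost_volume)
    h, w = len(cost_volume[0]), len(cost_volume[0][0])
    disparity = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            disparity[y][x] = min(
                range(num_disp), key=lambda d: cost_volume[d][y][x]
            )
    return disparity
```

WTA makes an independent decision per pixel, so the quality of the result depends on the aggregation that precedes it and the smoothing that follows it.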

06/12/25

263

IRDINA

35.4

243

40.3

232

23.5

259

26.0

246

28.6

230

40.6

200

28.5

246

46.6

235

43.2

241

26.1

258

51.8

233

39.2

223

32.0

236

33.3

246

44.8

243

46.9

220

06/17/25

264

UnViTAStereo

24.3

196

28.5

201

9.72

199

12.5

189

12.7

166

29.6

173

12.1

164

45.0

231

39.5

236

9.19

200

42.7

202

27.0

169

21.1

169

22.3

183

36.7

219

38.8

198

06/27/25

266

S2M2

1.15

1.29

1.23

1.27

0.40

0.45

0.59

0.67

0.62

0.45

1.28

2.80

1.37

3.60

1.12

0.25

06/17/25

265

PanMatch

7.18

5.21

5.34

130

3.34

5.43

4.52

2.47

13.2

124

13.3

136

1.51

8.34

16.6

109

8.07

10.3

4.96

9.23

07/11/25

267

SLEDC_v1

6.67

4.22

2.72

3.49

3.38

13.3

111

5.11

4.36

3.92

2.19

13.5

10.8

11.9

14.5

100

6.83

8.27

07/24/25

269

BridgeDepth

3.78

13.0

147

2.45

1.58

1.54

9.56

2.27

3.67

1.65

1.29

7.63

6.44

2.72

7.79

2.70

2.68

07/19/25

268

GEAStereo

3.80

2.93

2.29

2.08

2.52

6.53

2.14

2.11

2.32

1.36

6.97

6.42

5.55

10.9

2.33

5.06

09/02/25

270

MonSter++

2.60

7.04

103

1.61

1.91

1.04

8.92

0.85

2.08

1.02

0.75

3.06

8.01

2.73

3.84

2.11

2.00

10/23/25

271

VMStereo-Base

4.52

5.00

4.18

3.81

2.90

3.92

3.34

3.71

3.67

1.69

9.38

6.61

5.79

6.95

3.53

9.02

11/07/25

272

DepthFocus

1.53

1.96

1.77

1.01

0.49

0.59

0.66

0.92

0.76

0.57

1.17

4.24

1.96

5.11

1.66

0.43

11/11/25

273

BLMT-Stereo

1.57

2.24

1.58

3.95

0.59

4.58

1.22

1.03

1.01

0.54

1.18

1.81

1.16

1.18

1.91

1.15

12/02/25

274

DispViT+

4.92

8.62

122

2.27

1.58

3.45

17.3

137

1.63

9.31

100

8.95

103

0.94

5.85

6.27

3.02

7.84

2.18

2.61

12/22/25

275

SelfViTAS

17.4

166

23.2

185

8.83

181

12.2

188

11.4

153

24.7

162

9.67

142

16.5

144

10.7

125

9.96

204

35.6

186

27.3

171

19.3

161

20.9

170

21.6

187

42.9

209
