vision.middlebury.edu/stereo/eval3

Middlebury Stereo Evaluation - Version 3

Set:	test dense, test sparse, training dense, training sparse
Metric:	bad 0.5, bad 1.0, bad 2.0, bad 4.0, avgerr, rms, A50, A90, A95, A99, time, time/MP, time/GD
Mask:	nonocc, all
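The "bad 2.0" metric is the percentage of evaluated pixels whose disparity error exceeds 2.0 pixels. The following is a minimal sketch of that computation, not the benchmark's own evaluation code. It assumes disparity maps are stored as lists of rows, that `None` in the ground truth marks pixels excluded by the mask, and that a missing estimate counts as bad; the function name is hypothetical.

```python
def bad_pixel_percentage(disp, gt, threshold=2.0):
    """Percentage of evaluated pixels whose absolute disparity error exceeds threshold."""
    bad = 0
    total = 0
    for row_d, row_g in zip(disp, gt):
        for d, g in zip(row_d, row_g):
            if g is None:
                # Assumption: None in the ground truth means the pixel is masked out.
                continue
            total += 1
            # Assumption: a missing estimate (None) is counted as a bad pixel.
            if d is None or abs(d - g) > threshold:
                bad += 1
    return 100.0 * bad / total if total else 0.0
```

Changing `threshold` to 0.5, 1.0, or 4.0 gives the other "bad" variants listed under Metric.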

Reference

Description

Parameters

Running Environment

[stat] error

bad 2.0 (%)

Weight

Date

Name

Res

Avg

Austr

AustrP

Bicyc2

Class

ClassE

Compu

Crusa

CrusaP

Djemb

DjembL

Hoops

Livgrm

Nkuba

Plants

Stairs

Dataset	MP	nd
Austr	5.6	290
AustrP	5.6	290
Bicyc2	5.6	250
Class	5.7	610
ClassE	5.7	610
Compu	1.5	256
Crusa	5.5	800
CrusaP	5.5	800
Djemb	5.7	320
DjembL	5.7	320
Hoops	5.7	410
Livgrm	5.9	320
Nkuba	5.5	570
Plants	5.6	320
Stairs	5.2	450

Each dataset also links to im0, im1, GT, and nonocc.

OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle the quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.

07/25/14

SGBM2

26.4

205

27.9

196

12.1

214

17.8

225

13.7

171

74.5

263

14.0

188

30.3

194

26.3

198

11.0

209

64.4

262

37.9

220

25.8

203

25.3

206

29.3

201

43.7

213
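The SGBM2 description above names the sum of absolute differences (SAD) over small windows as its matching cost. A minimal sketch of that cost follows, assuming grayscale images stored as lists of rows and a window that stays inside both images; the function name and the `radius` parameter are hypothetical, and this is not OpenCV's implementation.

```python
def sad_cost(left, right, x, y, d, radius=1):
    """SAD between the window centred at (x, y) in left and at (x - d, y) in right."""
    h, w = len(left), len(left[0])
    # Guard against windows leaving the image (negative indices would wrap silently).
    if not (radius <= y < h - radius and radius <= x < w - radius
            and radius <= x - d < w - radius):
        raise ValueError("window leaves the image")
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total
```

Evaluating `sad_cost` for every candidate disparity at a pixel yields the per-pixel cost vector that the path-wise dynamic programming then aggregates.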

OpenCV's "semi-global block matching" method; memory efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.

07/25/14

SGBM1

27.9

211

28.3

198

17.2

237

19.0

227

14.5

177

57.9

248

15.6

199

31.8

196

31.4

213

13.2

222

58.6

249

38.6

222

27.0

213

25.9

209

31.4

207

59.7

242

The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.

07/25/14

SGM

20.8

186

35.5

216

9.57

195

13.8

199

16.5

183

32.1

182

19.0

211

25.8

171

16.7

163

8.95

198

39.8

194

31.1

185

22.6

177

20.7

168

21.3

183

32.2

180
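The SGM description above uses the Census transform with the Hamming distance as the pixelwise matching cost. The sketch below illustrates both steps under stated assumptions: images are lists of rows of intensities, the window is (2 * radius + 1) squared, and border pixels are simply left at 0. The function names are hypothetical and the window size used by the submission is not given in the description.

```python
def census_transform(img, radius=1):
    """Encode each pixel as a bit string: 1 where a neighbour is darker than the centre."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]  # border pixels stay 0 (a simplification)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            bits = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    darker = img[y + dy][x + dx] < img[y][x]
                    bits = (bits << 1) | (1 if darker else 0)
            out[y][x] = bits
    return out


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")
```

Because only the ordering of intensities inside the window matters, adding a constant brightness offset to one image leaves its census codes unchanged.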

07/25/14

SGBM1

28.4

213

43.5

245

9.09

184

13.6

198

25.9

224

82.0

267

14.4

194

43.4

228

30.3

211

5.98

174

59.3

252

45.8

239

28.5

221

24.9

203

20.1

180

45.9

218

07/28/14

SGBM1

23.8

194

32.9

209

10.8

206

13.6

197

16.2

181

71.2

258

12.6

169

26.6

175

23.0

182

5.83

169

53.8

241

39.2

224

25.6

201

22.8

184

18.8

171

47.4

223

07/28/14

SGM

18.4

171

40.3

232

4.54

8.03

141

22.9

210

40.5

200

11.4

157

24.7

168

10.1

120

5.40

159

29.6

170

28.5

176

23.9

186

20.0

160

14.2

143

30.9

175

07/28/14

SGM

25.3

202

45.1

248

4.33

6.87

128

32.2

243

50.0

232

13.0

178

48.1

241

18.3

171

7.66

188

29.6

169

36.1

207

31.2

234

24.2

198

24.5

188

50.2

227

Correlation with five, partly overlapping windows on Census transformed images using Hamming distance as matching cost. A left-right consistency check ensures unique matches and filtering small disparity segments removes outliers. Interpolation is done within image rows with the lowest, valid neighboring disparity.

07/28/14

Cens5

26.6

207

47.1

253

8.74

181

11.9

186

25.6

221

45.3

223

19.5

215

40.6

221

29.0

207

9.93

204

36.5

186

38.6

221

31.0

233

25.0

205

25.6

190

44.6

215
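The Cens5 description above relies on a left-right consistency check and on interpolation within image rows using the lowest valid neighboring disparity. The sketch below shows one way these two steps could look. It assumes disparity rows are lists with `None` for invalid pixels, that a left pixel at column x with disparity d corresponds to column x - d in the right image, and that `max_diff` (a hypothetical tolerance) decides agreement.

```python
def left_right_check(disp_left, disp_right, max_diff=1):
    """Set to None every left disparity that the right disparity map does not confirm."""
    out = []
    for row_l, row_r in zip(disp_left, disp_right):
        new_row = []
        for x, d in enumerate(row_l):
            keep = False
            if d is not None:
                xr = x - int(round(d))  # matching column in the right image
                if 0 <= xr < len(row_r) and row_r[xr] is not None:
                    keep = abs(row_r[xr] - d) <= max_diff
            new_row.append(d if keep else None)
        out.append(new_row)
    return out


def fill_row_lowest_neighbour(row):
    """Fill None gaps with the smaller of the nearest valid values to the left and right."""
    n = len(row)
    filled = list(row)
    for x in range(n):
        if row[x] is not None:
            continue
        left = next((row[i] for i in range(x - 1, -1, -1) if row[i] is not None), None)
        right = next((row[i] for i in range(x + 1, n) if row[i] is not None), None)
        candidates = [v for v in (left, right) if v is not None]
        filled[x] = min(candidates) if candidates else None
    return filled
```

Choosing the lower of the two neighbors fills gaps with the background disparity, which is the usual choice for holes caused by occlusion.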

A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.

08/25/14

LPS

19.2

181

6.14

5.34

129

9.24

157

7.53

122

96.0

273

12.3

167

9.61

103

9.40

112

5.18

151

92.4

274

27.4

171

24.3

188

23.0

186

10.0

113

25.6

162
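The LPS description above uses NCC (normalized cross-correlation) matching costs. A minimal sketch of NCC between two patches follows, assuming each patch is flattened into a list of intensities of equal length; the function name is hypothetical, and returning 0.0 for a textureless patch is a choice made for this sketch only.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized, flattened patches."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    den = math.sqrt(sum((a - mean_a) ** 2 for a in patch_a)
                    * sum((b - mean_b) ** 2 for b in patch_b))
    # Assumption: a constant patch has no defined correlation; report 0.0.
    return num / den if den > 0 else 0.0
```

NCC is a similarity in [-1, 1], so a cost for minimization can be formed as `1 - ncc(a, b)`.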

08/27/14

LPS

20.3

185

6.72

6.06

143

9.72

161

9.87

146

94.3

272

14.1

189

11.2

116

11.2

126

5.88

170

89.3

273

36.0

206

20.5

163

23.8

196

16.0

155

25.4

159

08/31/14

BSM

41.5

264

59.8

270

25.8

266

27.9

252

38.9

262

60.6

250

33.3

258

46.9

238

37.3

232

26.3

261

64.8

263

51.5

250

42.6

265

45.2

268

42.8

239

66.6

253

09/10/14

LAMC_DSM

26.0

203

55.8

266

11.9

212

14.3

204

18.3

190

44.0

217

18.3

209

39.9

219

29.5

210

6.67

179

31.1

174

34.5

198

28.8

223

26.3

210

30.1

204

35.7

187

09/18/14

SNCC

21.9

188

48.6

257

6.98

159

9.79

162

25.7

222

46.0

226

12.4

168

36.8

212

16.6

162

7.25

183

23.1

147

34.2

197

26.7

210

21.8

176

19.9

178

28.4

171

10/07/14

IDR

18.1

170

37.5

225

4.08

7.49

138

23.3

213

40.6

202

12.8

170

24.5

166

11.3

127

5.46

161

33.1

181

26.0

164

21.5

173

21.7

175

15.3

151

21.2

149

In stereo matching, cost filtering methods and energy minimization algorithms are considered two different techniques. Due to their global extent, energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, in which cost filtering approaches obtain better results. In this paper we intend to combine both approaches with the aim of improving overall stereo matching results. We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.

01/21/15

TSGO

39.1

254

34.1

212

16.9

236

20.0

228

43.3

266

55.4

244

14.3

192

54.1

255

49.2

253

33.9

269

66.2

264

45.9

240

39.8

260

42.6

264

47.2

248

52.6

229

04/08/15

REAF

31.4

228

58.3

267

30.9

272

13.1

192

45.3

268

63.8

253

30.9

255

38.7

216

25.3

191

8.60

195

39.3

193

36.8

211

27.0

212

35.5

251

18.2

168

39.7

200

04/09/15

PFS

32.2

234

65.1

272

29.4

271

12.1

188

50.0

270

70.8

257

28.2

245

44.6

230

23.1

183

7.85

191

37.0

190

37.7

216

27.9

217

36.0

253

19.8

177

35.7

188

04/17/15

TMAP

16.9

163

20.2

175

4.94

114

8.13

144

12.8

166

30.0

175

11.7

158

27.9

186

20.4

176

5.09

149

31.5

175

23.1

152

20.9

167

19.0

156

18.8

172

18.0

132

This approach triangulates the polygonized SLIC segmentations of the input images and optimizes a lower-layer MRF on the resulting set of triangles defined by photo consistency and normal smoothness. The lower-layer MRF is solved by a quadratic relaxation method which iterates between PatchMatch and Cholesky decomposition. The lower-layer MRF is assisted by an upper-layer MRF defined on the set of triangle vertices which exploits local 'visual complexity' cues and encourages smoothness of the vertices' splitting properties. The two layers interact through an alignment energy term which requires triangles sharing a non-split vertex to have their disparities agree on that vertex. Optimization of the whole model iterates between optimizations of the two layers until convergence, where the upper layer can be solved in closed form.

04/19/15

MeshStereo

13.2

144

5.90

4.88

110

10.8

177

12.9

167

10.6

11.0

151

12.2

120

9.01

105

5.39

158

27.4

160

23.5

155

17.7

152

21.0

170

15.4

152

20.9

147

Compute the matching cost with a convolutional neural network (accurate architecture). Then apply cross-based cost aggregation, semiglobal matching, a left-right consistency check, a median filter, and a bilateral filter. DETAILS: The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number, the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.

08/28/15

MC-CNN-acrt

8.08

5.59

4.55

5.96

108

2.83

11.4

5.81

8.32

8.89

103

2.71

16.3

113

14.1

13.2

110

13.0

6.40

11.1
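The MC-CNN-acrt description above states that five 3 x 3 convolutional layers turn an 11 x 11 patch into a 112-length feature vector. That is consistent with unpadded, stride-1 convolutions, each of which trims one pixel from every border. The sketch below checks this arithmetic; the padding and stride values are assumptions, since the description does not state them, and the function name is hypothetical.

```python
def conv_output_size(size, kernel=3, layers=5, stride=1, padding=0):
    """Spatial side length after a stack of identical convolution layers."""
    for _ in range(layers):
        # Standard convolution size formula; padding=0 and stride=1 are assumptions here.
        size = (size + 2 * padding - kernel) // stride + 1
    return size


# 11 -> 9 -> 7 -> 5 -> 3 -> 1: one spatial position with 112 feature maps,
# i.e. one 112-length vector per patch, and 2 * 112 = 224 after concatenation.
side = conv_output_size(11)
features_per_patch = side * side * 112
concatenated_length = 2 * features_per_patch
```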

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency and disparity segments below a size of 50 pixels are removed. (Improved results as of 9/14/2015 due to bug fix in color-to-gray conversion.)

09/14/15

ELAS

32.3

235

50.9

259

9.17

185

11.0

180

33.0

247

88.2

269

18.3

208

47.3

239

26.8

199

11.7

214

41.7

198

37.4

213

23.7

184

28.8

221

63.0

263

42.8

208
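The ELAS description above removes disparity segments below a size of 50 pixels. One plausible reading, sketched here, treats a segment as a 4-connected region in which neighboring disparities differ by at most `max_step`. Both the connectivity and `max_step` are assumptions for illustration, as is the function name; this is not the ELAS source code.

```python
from collections import deque


def remove_small_segments(disp, min_size=50, max_step=1.0):
    """Set to None every 4-connected segment of similar disparity smaller than min_size."""
    h, w = len(disp), len(disp[0])
    out = [list(row) for row in disp]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or disp[sy][sx] is None:
                continue
            # Breadth-first flood fill collecting one segment.
            segment, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                segment.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and disp[ny][nx] is not None
                            and abs(disp[ny][nx] - disp[y][x]) <= max_step):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(segment) < min_size:
                for y, x in segment:
                    out[y][x] = None
    return out
```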

09/28/15

R-NCC

48.4

269

26.2

192

14.8

226

30.2

258

30.9

238

72.9

261

41.6

270

77.7

273

64.1

272

27.4

264

59.1

251

71.9

273

50.9

270

33.9

249

78.2

273

80.8

269

The method generates multiple proposals on absolute and relative disparities from multi-segmentations. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes.

10/13/15

MDP

12.6

136

14.4

154

4.99

117

10.6

175

10.7

151

27.2

169

8.11

126

12.5

122

8.07

4.27

137

30.4

171

20.5

141

12.6

102

17.8

140

13.4

139

17.3

127

We post-process the depth maps produced by Zbontar & LeCun's MC-CNN technique. We use a domain transform to compute an edge-aware variance measure of our confidence in the depth map, and then run our robust bilateral solver on that depth map and confidence with a Geman-McClure loss function. The MC-CNN is computed using the publicly-available implementation (https://github.com/jzbontar/mc-cnn), which uses the GPU, and the robust bilateral solver is computed using our CPU implementation, which does not use the GPU and is written in vanilla C++.

11/03/15

MC-CNN+RBS

8.42

6.05

5.16

122

6.24

115

3.27

11.1

6.36

102

8.87

9.83

117

3.21

103

15.1

102

15.9

101

12.8

105

13.5

7.04

9.99
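The MC-CNN+RBS description above mentions a Geman-McClure loss function. The sketch below shows one common parameterization of that loss; the exact form and scale used in the submission are not stated, so the `scale` parameter and the function name are assumptions.

```python
def geman_mcclure(residual, scale=1.0):
    """One common form of the Geman-McClure robust loss: r^2 / (r^2 + scale^2)."""
    r2 = residual * residual
    return r2 / (r2 + scale * scale)
```

Unlike a squared error, this loss saturates below 1 for large residuals, so gross outliers in the input depth map have bounded influence on the solution.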

12/18/15

INTS

14.5

152

20.2

174

4.52

8.62

148

11.6

154

29.5

171

10.7

149

16.4

141

10.3

121

4.69

140

27.6

162

22.5

149

20.7

165

20.5

164

11.5

127

24.9

158

An efficient stereo matching algorithm, which applies adaptive smoothness constraints using texture and edge information, is proposed in this work. First, we determine non-textured regions, on which an input image yields flat pixel values. In the non-textured regions, we penalize depth discontinuity and complement the primary CNN-based matching cost with a color-based cost. Second, by combining two edge maps from the input image and a pre-estimated disparity map, we extract denoised edges that correspond to depth discontinuity with high probabilities. Thus, near the denoised edges, we penalize small differences of neighboring disparities. The method uses the MC-CNN code for the matching cost computation only.

01/19/16

NTDE

7.44

5.72

4.36

5.92

105

2.83

10.4

5.71

5.30

5.54

2.40

13.5

14.1

12.6

103

13.9

6.39

12.2

103

01/26/16

MC-CNN-fst

9.47

7.35

107

5.07

118

7.18

134

4.71

16.8

132

8.47

131

7.37

6.97

2.82

20.7

136

17.4

117

15.4

132

15.1

103

7.90

102

12.6

105

Our approach is an extension of the ELAS (from Geiger et al.) algorithm. We extract edges and sample our candidate support points along them. For every two consecutive valid support points we create a (straight) line segment. We force the triangulation to include the set of line segments (constrained Delaunay) for a better preservation of the disparity discontinuity at the edges.

02/18/16

LS-ELAS

36.7

247

53.5

262

10.3

204

15.8

215

37.0

259

83.6

268

24.5

231

49.1

243

34.6

223

13.9

223

44.9

209

45.7

238

34.9

247

29.1

225

64.4

268

62.7

248

The computation of the sparse disparity maps is achieved by means of a 3D diffusion of the costs contained in the disparity space volume. The watershed segmentations of the left and right views control the diffusion process and valid measurements are obtained by cross-checking. The estimation of the dense disparity maps uses the sparse measurements as control points and is driven by a 3D watershed separating the disparity space volume into foreground and background pixels.

03/15/16

MPSV

43.5

266

58.8

268

33.9

274

34.2

267

37.9

261

52.4

238

30.8

253

56.8

262

51.0

260

30.6

266

56.9

245

51.5

251

44.6

267

43.4

265

44.2

241

54.2

233

No post processing (no filtering, no hole-filling, no interpolation) performed. The concepts of intrinsic curves were revisited and used for:
- disparity search space reduction, resulting in an 83% reduction of the disparity range (individually for each pixel) directly from the original resolution of the image, without needing hierarchical search
- reducing the ambiguities due to occluded pixels by integrating occlusion clues explicitly into the global energy function as a soft prior
The final energy minimization was done using a semi-global approach along eight paths.

04/03/16

ICSG

45.6

268

69.7

274

19.1

243

21.3

233

43.6

267

77.6

266

36.9

265

65.3

269

40.4

238

20.3

242

53.6

239

58.7

267

46.5

269

47.1

270

60.7

262

79.1

266

04/24/16

HLSC_cor

26.0

204

26.5

193

15.2

229

21.0

231

20.5

202

35.7

190

23.4

228

33.1

201

35.0

225

11.9

216

39.1

192

34.2

196

25.2

199

32.8

242

28.3

199

22.7

153

04/27/16

JEM

37.2

250

35.7

217

27.9

270

30.6

260

33.2

251

43.0

214

31.4

256

49.5

245

47.3

249

26.5

263

49.6

223

46.0

241

35.7

250

30.8

234

37.5

221

55.8

238

A 3D label based method with global optimization at pixel level. A bilayer matching cost is employed by first matching small square windows and then aggregating over large irregular windows. Global optimization is carried out by fusing candidate proposals, which are generated from our specific superpixel structure.

05/12/16

PMSC

6.71

3.46

2.68

6.19

113

2.54

6.92

4.54

3.96

4.04

2.37

13.1

12.3

12.2

16.2

122

5.88

10.8

05/28/16

APAP-Stereo

7.26

5.43

4.91

112

5.11

5.17

21.6

151

6.99

108

4.31

4.23

3.24

104

14.3

9.78

7.32

13.4

6.30

8.46

07/03/16

LPU

10.4

110

11.4

138

3.18

8.10

143

6.08

104

20.9

147

8.24

128

6.94

4.00

4.04

127

33.9

184

16.9

113

15.2

129

17.8

139

9.12

108

11.6

08/31/16

SED

63.4

274

54.3

264

22.4

259

72.9

275

64.5

275

71.4

259

42.5

271

80.1

274

67.9

273

49.8

274

79.6

269

74.4

274

65.4

275

55.1

274

86.1

275

91.6

274

We propose a method that incorporates a surface normal constraint predicted by deep learning. With reliable disparities selected from a stereo matching method and an effective edge fusion strategy, we can faithfully convert the predicted surface normal map to a disparity map by solving a least squares system that maintains discontinuities. We use the raw matching cost of MC-CNN.

09/13/16

SNP-RSM

8.75

5.46

4.85

108

6.50

120

3.37

10.4

7.31

115

8.73

9.37

111

3.58

116

14.3

14.7

14.9

128

12.8

10.1

115

10.8

10/19/16

LW-CNN

7.04

4.65

3.95

5.30

2.63

11.2

5.41

4.32

4.22

2.43

12.2

13.4

13.6

115

14.8

100

4.72

12.0

102

10/23/16

SIGMRF

64.2

275

60.0

271

33.0

273

67.9

274

63.2

274

99.5

277

39.8

268

84.8

275

82.0

275

35.2

271

95.2

275

91.5

275

58.1

273

65.8

275

55.0

257

88.6

273

11/06/16

SPS

19.6

182

14.2

153

12.3

216

14.9

209

12.0

160

15.8

124

19.1

212

17.4

146

15.4

152

8.23

193

30.9

173

34.8

199

30.6

231

25.3

206

28.3

198

28.0

166

11/15/16

MC-CNN-WS

12.1

131

14.8

157

7.20

163

11.1

183

7.62

124

15.9

126

11.8

160

11.5

118

9.01

105

3.89

122

19.7

131

20.5

140

16.3

138

16.3

123

12.1

130

18.3

135

11/16/16

MCSC

11.3

122

13.3

148

5.96

141

10.6

172

8.69

135

7.22

11.3

153

10.6

112

7.48

3.07

3.10

25.2

162

19.0

160

17.2

131

10.3

117

25.5

161

11/24/16

ADSM

38.7

253

40.4

234

20.3

247

27.3

250

35.1

256

55.9

245

22.3

225

56.1

258

50.9

259

24.2

254

58.0

247

56.3

263

36.5

252

32.1

241

38.7

227

69.7

256

01/15/17

IGF

34.0

241

42.7

242

20.1

246

23.7

243

32.2

242

45.6

225

28.6

249

43.0

226

37.2

231

21.4

247

50.9

229

44.7

237

34.7

246

31.9

239

37.4

219

47.1

221

01/24/17

3DMST

5.92

3.71

2.78

4.75

2.72

7.36

4.28

3.44

3.76

2.35

12.6

11.5

8.56

14.0

5.35

8.87

03/09/17

SGMEPi

13.9

149

6.92

101

6.71

156

9.47

159

9.72

144

11.8

102

13.6

183

10.9

114

10.6

124

5.26

153

32.8

180

26.9

168

22.7

179

22.7

183

12.0

129

21.7

151

03/10/17

MC-CNN+TDSR

6.35

5.45

4.45

6.80

126

3.46

10.7

6.05

5.01

5.19

2.62

10.8

9.62

6.59

11.4

6.01

7.04

We propose a novel method for stereo estimation, combining advantages of convolutional neural networks (CNNs) and optimization-based approaches. The optimization, posed as a conditional random field (CRF), takes local matching costs and consistency-enforcing (smoothness) costs as inputs, both estimated by CNN blocks. To perform the inference in the CRF we use an approach based on linear programming relaxation with a fixed number of iterations. We address the challenging problem of training this hybrid model end-to-end. We show that in the discriminative formulation (structured support vector machine) the training is practically feasible. The trained hybrid model with shallow CNNs is comparable to state-of-the-art deep models in both time and performance. The optimization part efficiently replaces sophisticated and not jointly trainable (but commonly applied) post-processing steps by a trainable, well-understood model.

03/22/17

JMR

12.5

134

4.09

3.97

8.44

147

6.93

115

11.1

13.8

184

19.5

154

19.0

173

3.66

117

17.0

118

18.2

122

18.0

155

21.0

171

7.29

17.8

131

03/23/17

DSGCA

33.8

239

42.9

243

20.9

253

23.6

242

30.2

235

45.5

224

27.6

243

42.0

224

36.0

229

21.0

243

50.2

225

44.2

236

33.3

244

34.6

250

38.4

225

46.8

219

04/04/17

DDL

30.1

222

44.3

247

19.4

244

25.8

247

28.3

230

42.1

208

21.1

221

37.1

213

28.7

205

21.7

248

46.8

215

36.0

205

30.3

230

28.4

218

32.7

212

37.5

193

05/23/17

r200high

40.9

261

70.5

275

14.4

223

21.3

232

37.7

260

72.2

260

38.1

267

53.2

253

31.4

213

18.3

232

52.4

236

52.6

255

44.1

266

45.4

269

50.7

252

66.5

252

06/14/17

DoGGuided

41.4

262

45.4

250

23.6

262

30.6

261

34.6

254

52.5

239

28.3

246

59.1

266

53.8

264

26.4

262

60.6

256

54.7

262

38.3

256

35.5

252

44.5

242

72.0

262

We propose local expansion moves for estimating dense 3D labels on a pairwise MRF. The data term uses a PatchMatch-like 3D slanted window formulation, where raw matching costs within a window are computed by MC-CNN-acrt and aggregated using guided image filtering. The smoothness term uses a pairwise curvature regularization term by Olsson et al. 2013.

06/22/17

LocalExp

5.43

3.65

2.87

2.98

1.99

5.59

3.37

3.48

3.35

2.05

10.3

9.75

8.57

14.4

5.40

9.55
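The LocalExp description above assigns each pixel a 3D label and evaluates costs over a slanted window. In PatchMatch-style formulations a label (a, b, c) defines the plane d(x, y) = a*x + b*y + c. The sketch below evaluates a window cost under such a label. It uses a plain average in place of the guided image filtering named in the description, and `pixel_cost` is a hypothetical callback standing in for the raw MC-CNN-acrt costs.

```python
def plane_disparity(label, x, y):
    """Disparity at pixel (x, y) under the plane label (a, b, c)."""
    a, b, c = label
    return a * x + b * y + c


def slanted_window_cost(label, cx, cy, radius, pixel_cost):
    """Average raw cost over a window, each pixel evaluated at the plane's disparity.

    Simplification: a uniform average replaces guided-filter aggregation.
    pixel_cost(x, y, d) is a hypothetical raw matching cost function.
    """
    total, count = 0.0, 0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            total += pixel_cost(x, y, plane_disparity(label, x, y))
            count += 1
    return total / count
```

A fronto-parallel window is the special case a = b = 0, which is why slanted labels fit tilted surfaces better than a single disparity per window.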

We propose a feature ensemble network leveraging deep convolutional neural networks to perform matching cost computation and disparity refinement. For matching cost computation, a patch-based network architecture with a multi-size and multi-layer pooling unit is adopted to learn cross-scale feature representations. For disparity refinement, the initial optimal and sub-optimal disparity maps are incorporated and diverse base learners are applied.

10/12/17

FEN-D2DRR

7.23

4.68

4.11

5.03

3.03

8.42

6.05

4.90

5.32

3.20

102

11.5

14.1

13.4

113

13.9

5.06

14.3

116

We propose a robust learning-based method for stereo cost volume computation. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that our matching volume estimation method achieves similar accuracy to purely data-driven alternatives and that it generalizes to unseen data much better. In fact, we used the same model trained on the Middlebury 2014 dataset to submit to the KITTI and ETH3D benchmarks.

11/13/17

CBMV

11.1

121

6.07

5.22

124

8.09

142

4.05

18.7

141

9.31

139

10.7

113

9.61

114

3.11

33.7

182

15.6

100

17.5

150

17.1

130

10.1

114

14.4

117

We extend the standard BP sequential technique to fully connected CRF models with geodesic distance affinity. We also propose a new approach to the BP marginal solution that we call one-view-occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution makes it possible to find occluded regions in the disparity map and simultaneously improve the matching result. As a result, we can perform only one energy minimization process and avoid the cost calculation for the second view and the left-right check procedure.

12/11/17

OVOD

8.87

4.74

3.64

5.51

4.82

12.8

109

6.51

104

9.91

107

9.96

119

3.13

100

16.6

115

14.8

14.1

120

15.4

107

6.92

13.2

109
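The OVOD description above contrasts its solution with standard winner-takes-all (WTA) estimation. For reference, a minimal WTA sketch follows. It assumes a cost volume stored as `cost_volume[y][x][d]`; the function name is hypothetical. It shows only the baseline, not the OVOD method.

```python
def winner_takes_all(cost_volume):
    """Pick, for every pixel, the disparity index with the lowest cost."""
    return [
        [min(range(len(costs)), key=costs.__getitem__) for costs in row]
        for row in cost_volume
    ]
```

WTA always returns some disparity for every pixel, including occluded ones, which is the limitation the description says OVOD addresses.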

02/07/18

56.1

272

47.7

255

27.9

269

36.1

269

46.7

269

62.5

252

50.2

275

72.4

272

69.9

274

37.8

272

88.2

272

70.0

271

52.8

272

50.2

273

77.5

272

91.8

275

02/28/18

SDR

7.69

5.41

4.22

4.20

2.73

10.2

5.40

6.40

5.76

4.72

141

11.2

15.4

13.4

112

16.5

126

5.22

13.0

108

03/09/18

SGM_RVC

18.4

172

37.4

223

5.31

127

9.03

151

14.2

174

31.7

179

14.3

193

24.7

169

12.6

132

5.27

154

31.8

178

29.7

180

24.9

197

22.0

178

18.6

170

28.2

168

Semi-Global Matching (SGM) uses an aggregation scheme to combine costs from multiple 1D scanline optimizations that tends to hurt its accuracy in difficult scenarios. We propose replacing this aggregation scheme with a new learning-based method that fuses disparity proposals estimated using scanline optimization. Our proposed SGM-Forest algorithm solves this problem using per-pixel classification. SGM-Forest currently ranks 1st on the ETH3D stereo benchmark and is ranked competitively on the Middlebury 2014 and KITTI 2015 benchmarks. It consistently outperforms SGM in challenging settings and under difficult training protocols that demonstrate robust generalization, while adding only a small computational overhead to SGM.

03/11/18

SGM-Forest

7.37

4.71

3.69

4.93

3.18

11.1

5.37

5.57

5.81

2.65

14.5

13.2

13.1

108

14.8

101

5.63

11.2
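The SGM-Forest description above refers to the 1D scanline optimizations whose costs SGM combines. The sketch below implements the standard SGM path recursion for one left-to-right scanline, with a small penalty `p1` for a disparity change of one level and a larger penalty `p2` for bigger jumps. The data layout, the function name, and any penalty values passed in are assumptions; the learned fusion step of SGM-Forest is not shown.

```python
def scanline_optimization(costs, p1, p2):
    """Aggregate matching costs along one scanline, left to right.

    costs[x][d] is the matching cost of pixel x at disparity d.
    Returns the aggregated path costs in the same layout.
    """
    n_disp = len(costs[0])
    aggregated = [list(costs[0])]
    for x in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = prev[d]                             # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)     # change by one level
            if d + 1 < n_disp:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)            # any larger jump
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(costs[x][d] + best - prev_min)
        aggregated.append(row)
    return aggregated
```

Standard SGM sums such path costs over several directions before taking the minimum per pixel; the description states that SGM-Forest replaces that summation with per-pixel classification.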

03/14/18

DTS

13.4

146

8.45

119

7.54

167

7.46

137

5.50

14.9

116

10.2

147

24.5

167

25.1

189

4.93

144

19.2

130

18.7

123

14.6

125

15.9

118

13.0

135

17.2

126

03/23/18

MEDIAN_ROB

97.8

277

96.1

276

95.6

276

99.0

277

98.4

277

98.4

276

99.2

277

98.4

277

98.1

276

99.0

277

99.0

277

99.6

277

99.9

277

94.7

277

95.1

276

98.3

276

03/23/18

AVERAGE_ROB

97.6

276

96.2

277

96.5

277

96.8

276

97.8

276

97.8

275

98.2

276

97.9

276

98.2

277

98.9

276

98.9

276

99.1

276

99.7

276

93.1

276

97.9

277

98.9

277

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency and disparity segments below a size of 50 pixels are removed. Updated ELAS submission as a baseline for the Robust Vision Challenge (http://robustvision.net), replacing the original ELAS (H) entry.

03/26/18

ELAS_RVC

27.3

208

43.4

244

12.4

218

13.9

201

23.8

214

66.4

254

20.4

218

33.0

200

20.7

178

11.0

208

43.9

206

37.5

215

26.3

204

28.7

220

38.4

226

33.3

182

04/17/18

ISM

40.8

259

42.5

241

26.4

267

34.8

268

36.1

257

44.5

220

34.4

260

56.2

259

52.7

263

25.2

258

51.0

232

52.4

254

39.7

258

33.3

245

38.8

229

75.3

264

05/01/18

PSMNet_ROB

42.1

265

33.0

210

23.1

260

30.1

257

31.4

239

54.8

243

30.7

252

48.7

242

48.3

251

28.3

265

80.8

270

53.5

259

36.9

254

38.6

261

63.9

267

71.2

258

05/18/18

PDS

14.2

150

14.4

154

5.80

138

10.5

171

10.5

149

22.1

152

14.0

187

14.5

130

8.97

104

5.93

172

24.2

152

21.5

146

18.2

159

18.9

154

11.9

128

33.6

184

05/22/18

DN-CSS_ROB

22.8

191

31.4

206

9.28

189

13.5

196

12.4

162

44.3

219

12.1

162

28.1

188

17.6

168

9.11

200

50.9

231

40.0

226

21.2

169

25.0

204

31.9

211

43.2

210

05/26/18

NOSS_ROB

5.01

3.57

2.84

3.99

1.93

5.15

3.34

3.32

3.15

2.32

8.55

7.45

7.06

12.5

5.20

9.06

05/31/18

FBW_ROB

32.2

233

36.3

219

9.37

190

14.2

202

19.1

195

69.3

256

13.2

180

51.0

247

39.1

235

10.8

207

43.9

206

41.5

230

33.0

241

29.9

232

51.3

253

71.4

259

05/31/18

iResNet_ROB

24.8

200

23.0

183

10.2

203

14.7

206

12.4

161

25.9

165

12.9

171

28.0

187

24.9

188

11.5

212

46.6

214

38.9

223

21.4

171

27.8

214

45.6

245

66.7

254

05/31/18

CBMV_ROB

7.65

3.48

3.35

4.80

3.57

6.32

6.88

106

4.84

3.91

1.97

25.4

156

11.1

13.1

106

15.8

116

7.34

13.8

114

06/05/18

CBMBNet

10.2

107

8.30

116

5.10

119

6.87

128

4.52

11.5

100

7.70

119

13.9

127

13.2

133

3.04

21.9

140

13.3

13.6

116

15.4

108

11.2

125

11.0

Numerous CNN algorithms focus on the pixel-wise matching cost computation, which is an important building block for many state-of-the-art algorithms. However, these architectures are limited to small and single scale receptive fields and use traditional methods for cost aggregation or even ignore cost aggregation. In this paper, we propose a novel architecture called cascaded multi-scale and multi-dimension network (MSMD) to take them both into consideration. First, we propose a new multi-scale matching cost computation sub-network, in which two different sizes of receptive fields are implemented in parallel. In this way, the network can make the best use of both variants to balance the trade-off between the increase of receptive field and the loss of details. Furthermore, we show that our multi-dimension aggregation sub-network, which contains 2D convolution and 3D convolution operations, can provide rich context and semantic information for estimating an accurate initial disparity.

06/14/18

MSMD_ROB

30.9

224

26.9

195

14.6

224

20.0

229

22.6

209

33.7

183

27.8

244

43.9

229

38.4

234

21.1

244

49.5

222

40.8

228

31.8

235

31.6

235

37.5

223

43.6

212

A robust solution for semi-dense stereo matching is presented. It utilizes two CNN models for computing stereo matching cost and performing confidence-based filtering, respectively. Compared to existing CNN-based matching cost generation approaches, our method feeds additional global information into the network so that the learned model can better handle challenging cases, such as lighting changes and lack of texture. Through utilizing non-parametric transforms, our method is also more self-reliant than most existing semi-dense stereo approaches, which rely heavily on the adjustment of parameters.

06/27/18

DCNN

10.9

118

5.66

4.98

115

6.49

119

5.73

12.5

108

8.51

132

15.6

135

10.9

125

3.08

24.1

151

20.2

137

16.8

145

15.5

110

10.3

118

13.8

112

07/31/18

MotionStereo

40.4

257

67.6

273

25.0

264

29.2

255

40.9

264

57.3

247

35.5

262

57.5

263

40.4

237

19.9

239

42.8

204

52.6

256

39.8

259

37.1

256

51.7

254

34.9

185

10/10/18

DISCO

24.5

197

35.0

215

7.34

165

11.7

184

18.7

193

48.6

229

17.1

202

31.4

195

22.4

180

9.33

203

46.0

212

33.5

192

27.5

216

24.6

200

27.5

196

55.0

236

10/29/18

iResNet

22.9

192

28.3

198

9.19

187

15.8

214

19.3

198

35.1

186

11.3

154

27.7

183

16.8

165

15.2

224

54.7

243

27.6

172

19.5

161

21.5

174

31.9

210

51.6

228

10/29/18

Dense-CNN

7.98

5.59

4.54

5.83

2.79

10.4

5.78

8.26

8.84

102

2.66

15.6

108

14.2

13.2

109

13.2

6.30

11.1

11/07/18

IEBIMst

33.8

240

36.7

221

12.1

215

16.9

220

32.5

244

51.0

233

25.3

234

58.1

264

49.8

255

11.2

210

48.6

219

56.9

265

30.2

229

26.8

213

26.9

194

71.7

260

11/08/18

HSM-Net_RVC

10.2

106

12.0

143

5.32

128

7.50

139

6.72

114

15.6

121

9.89

143

6.83

5.14

4.17

134

22.7

143

17.1

114

15.6

134

14.3

10.8

123

14.6

118

11/11/18

MBM

22.8

190

36.4

220

9.95

201

15.3

211

19.3

197

36.5

192

19.9

216

27.5

182

18.1

169

10.1

205

41.5

197

32.7

188

26.3

206

23.3

189

21.4

184

39.9

202

We propose four efficient feature extractors based on convolutional neural networks for stereo matching cost computation. Two of them generate multiscale features with diverse receptive field sizes. These multiscale features are used to compute the corresponding multiscale matching costs. We then determine an optimal cost by combining the multiscale costs using edge information. On the other hand, the other two feature extractors produce uni-scale features by combining multiscale features directly through fully connected layers. Finally, after obtaining matching costs using one of the four extractors, we determine optimal disparities based on the cross-based cost aggregation and the semiglobal matching.

11/28/18

MSFNetA

7.96

6.21

4.26

6.02

110

3.66

8.95

6.28

101

8.41

8.06

2.62

17.9

123

13.9

11.9

11.5

8.00

103

10.6

01/12/19

EHCI_net

9.47

100

3.75

4.27

13.1

192

27.6

227

5.30

3.23

3.47

3.18

3.90

123

9.20

9.58

9.26

13.9

17.3

162

10.6

01/17/19

FASW

28.6

216

41.7

238

18.1

240

23.1

239

27.2

226

40.6

202

19.1

213

34.9

206

28.1

204

18.5

235

40.8

196

36.4

208

29.3

225

28.4

217

31.1

206

41.0

205

12/18/18

MCV-MFC

24.8

199

26.8

194

9.63

198

14.8

208

17.4

186

54.1

242

14.2

190

26.5

174

18.2

170

16.0

228

62.6

260

28.7

178

20.9

166

24.6

201

37.4

220

48.1

225

02/05/19

AMNet

53.3

270

54.3

263

63.7

275

51.2

273

51.3

271

40.6

204

39.9

269

51.6

248

55.9

267

55.4

275

58.9

250

57.5

266

52.7

271

49.5

272

58.1

260

61.6

246

The method comprises two main steps. First, we use adaptive support weights for local matching. Apart from the color similarity and geometric distance, the adaptive weight distribution favors pixels in the block matching with smaller cost. Besides, we use a multiscale strategy with invalidation criteria to reduce match ambiguity and computational time. Second, a global interpolation using a variational formulation is carried out. The energy functional penalizes deviations from the local disparity estimation at different scales.

02/15/19

DAWA-F

27.4

209

47.2

254

13.6

221

13.1

194

19.2

196

66.4

254

20.4

219

30.3

193

33.9

221

8.73

197

48.9

220

37.8

217

26.7

209

29.9

231

28.0

197

36.5

190

Stereo matching has attracted a large number of studies in recent years. The process is difficult because visual discomfort affects the accuracy of disparity maps. Following the multistage structure used by most stereo matching algorithms (the taxonomy of D. Scharstein and R. Szeliski), this paper proposes an improved stereo matching algorithm that uses an Adaptive Weighted Bilateral Filter as the main filter in the cost aggregation stage, which contributes an edge-preserving factor and is robust in plain colour regions. Improved parameters in the matching cost computation stage, where the window size of the sum of absolute differences (SAD) and the thresholds were adjusted, together with a Median Filter as the main filter in the disparity map refinement stage, may overcome the limitations in disparity map accuracy. Evaluation on indoor scenes of the latest (2014) Middlebury dataset shows that the Adaptive Weighted Bilateral Filter in the proposed algorithm yields smooth disparity maps and good processing time.
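For readers unfamiliar with the SAD cost mentioned above, here is a minimal sketch of a windowed sum of absolute differences for one pixel and one candidate disparity. The function name, the `radius` parameter, and the border clamping are assumptions for this example and are not taken from the submitted method.

```python
def sad_cost(left, right, y, x, d, radius=1):
    """SAD between the window around (y, x) in the left image and the
    window around (y, x - d) in the right image.

    left / right: images as lists of rows of intensities.
    Coordinates outside the image are clamped to the border (an assumption).
    """
    height, width = len(left), len(left[0])
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = min(max(y + dy, 0), height - 1)
            xl = min(max(x + dx, 0), width - 1)
            xr = min(max(x + dx - d, 0), width - 1)
            total += abs(left[yy][xl] - right[yy][xr])
    return total
```

A larger `radius` makes the cost more robust in textured regions but blurs disparity edges, which is why the window size is a parameter worth tuning.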

03/06/19

SM-AWP

38.1

251

30.7

205

24.0

263

25.2

246

30.3

236

44.9

221

38.1

266

56.0

257

55.8

266

19.9

240

60.1

254

51.2

249

32.1

237

30.2

233

40.0

235

61.7

247

03/09/19

3DMST-CM

5.47

4.10

3.37

2.99

2.95

7.63

4.55

3.26

3.95

2.16

10.2

8.28

6.37

13.2

5.86

9.35

This paper presents a novel unsupervised matching cost for stereo matching. Specifically, a novel two-branch convolutional sparse coding (CSC) model is used to learn a convolutional filter bank without ground truth disparity maps. The sparse representations over the learned convolutional filter bank are then used to measure the similarity between image patches; that is, the stereo matching cost is computed as the l1 distance between the sparse representations of image patches.
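The l1 matching cost described above can be sketched as follows, assuming the sparse codes have already been computed and are available as plain per-pixel vectors. The function names and the winner-take-all disparity selection are assumptions added for illustration; the description itself only specifies the l1 distance.

```python
def l1_matching_cost(code_left, code_right):
    """l1 distance between two sparse code vectors of equal length."""
    if len(code_left) != len(code_right):
        raise ValueError("sparse codes must have the same length")
    return sum(abs(a - b) for a, b in zip(code_left, code_right))


def best_disparity(left_codes, right_codes, x, max_disp):
    """Winner-take-all disparity for pixel x on a single scanline.

    left_codes / right_codes: one sparse code vector per pixel.
    Candidate d compares left pixel x with right pixel x - d.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d < 0:
            break  # candidate falls outside the right image
        cost = l1_matching_cost(left_codes[x], right_codes[x - d])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```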

04/12/19

TCSCSM

19.1

180

45.2

249

5.76

137

11.0

179

22.1

208

41.1

206

13.4

182

24.8

170

11.4

129

7.17

181

29.5

168

26.6

166

26.6

207

20.5

165

16.5

158

17.4

128

05/10/19

tMGM-16

17.3

164

8.70

123

6.49

150

9.82

163

20.7

203

13.7

112

13.9

186

21.8

159

16.0

156

5.57

162

26.9

158

25.4

163

28.3

218

22.1

179

14.6

147

39.6

199

In this work, we propose a learning-based method to denoise and refine disparity maps of a given stereo method. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method makes it possible to learn a robust collaborative regularizer leveraging the joint statistics of the color image, the confidence map and the disparity map. Due to the variational structure of our method, the individual steps can be easily visualized, thus enabling interpretability of the method. We can therefore provide interesting insights into how our method refines and denoises disparity maps. The efficiency of our method is demonstrated on the publicly available stereo benchmarks Middlebury 2014 and KITTI 2015.

05/13/19

14.2

151

9.69

128

9.58

196

10.9

178

7.33

119

9.54

13.8

185

11.3

117

11.3

128

7.17

181

27.4

161

23.3

154

24.8

194

22.8

185

14.6

148

18.4

136

05/15/19

PSMNet_2000

28.9

218

20.4

176

8.23

174

15.1

210

27.7

229

35.2

187

15.2

198

50.8

246

51.8

261

9.29

202

61.9

258

31.1

184

25.2

198

27.8

214

29.3

201

52.9

230

We propose "DeepPruner", a real-time stereo matching algorithm, which combines the strength of deep network and search space pruning techniques. Towards this goal, we developed a differentiable PatchMatch module that allows us to discard most disparities and generates a sparse representation of the cost-volume. We then exploit this representation to learn which range to prune for each pixel. Our method achieves competitive results on KITTI / SceneFlow datasets while running in real-time at 62ms. Moreover, we obtain the first place (on overall rankings) on the Robust Vision Challenge. For more details, check out our paper and source code.

06/26/19

DeepPruner_ROB

30.1

219

34.2

213

19.9

245

24.3

244

23.8

214

47.2

227

26.1

235

26.1

172

22.8

181

18.4

234

59.8

253

36.5

210

23.2

181

31.7

237

48.3

250

44.8

216

07/26/19

EdgeStereo

18.7

173

25.3

190

6.79

157

10.6

173

25.1

220

22.1

152

8.31

129

24.5

165

16.5

161

6.63

178

9.20

32.0

186

24.6

192

20.2

162

19.2

174

54.3

234

It has been proposed by many researchers that combining deep neural networks with graphical models can create more efficient and better regularized composite models. The main difficulties in implementing this in practice are associated with a discrepancy in suitable learning objectives as well as with the necessity of approximations for the inference. In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: We connect it to learning formulations with losses on marginals and compute the backprop operation. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs), allowing us to design a hierarchical model composing BP inference and CNNs at different scale levels. The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.

11/07/19

LBPS

9.68

103

5.05

4.98

115

5.57

3.24

6.03

12.9

173

5.44

5.50

3.55

115

15.2

103

15.9

104

17.5

149

17.3

133

9.84

112

28.3

169

11/11/19

CACA-Net

31.7

230

21.8

181

16.0

231

21.5

234

24.5

216

38.0

197

34.6

261

38.6

215

36.3

230

19.0

237

49.4

221

40.2

227

32.1

238

33.6

248

36.0

215

58.2

240

11/14/19

HSM-Smooth-Occ

10.8

117

11.7

139

5.62

133

8.75

149

8.39

134

15.4

117

9.60

140

8.29

6.11

4.09

129

23.6

148

20.4

139

17.0

146

13.4

10.2

116

16.6

124

11/15/19

100

SPPSMNet

41.4

263

32.5

208

20.6

249

34.0

266

33.0

246

55.9

246

36.3

264

53.8

254

48.6

252

35.0

270

71.6

265

52.8

257

37.1

255

38.1

260

46.7

247

56.9

239

12/19/19

101

CRLE

5.75

3.66

3.11

5.92

105

2.14

6.01

3.39

3.49

3.68

2.34

10.2

9.63

8.04

14.9

102

5.45

9.26

12/30/19

102

F-GDGIF

31.6

229

37.3

222

7.72

168

16.1

216

34.9

255

35.2

188

20.1

217

55.3

256

46.6

246

9.01

199

47.6

216

52.9

258

29.4

226

29.7

230

29.4

203

61.0

245

01/02/20

103

PPEP-GF

34.6

243

42.4

240

21.7

256

24.8

245

30.8

237

44.0

218

25.1

233

45.7

235

42.1

240

20.1

241

44.1

208

43.6

234

35.2

248

32.8

243

39.3

232

55.0

237

01/05/20

104

MTS2

53.8

271

51.7

261

21.5

254

38.8

271

52.7

272

97.5

274

43.0

272

66.4

270

60.8

269

32.0

267

85.7

271

69.0

270

46.3

268

45.1

267

71.2

271

85.2

271

01/07/20

105

ADSR_GIF

37.1

249

43.6

246

18.6

242

36.7

270

24.6

217

58.6

249

22.8

227

56.3

260

49.7

254

18.7

236

56.0

244

48.5

247

32.2

239

24.5

199

36.3

216

79.1

267

02/07/20

106

CasStereo

18.8

175

23.9

187

9.01

183

10.5

170

11.7

156

74.0

262

13.1

179

10.1

109

7.86

4.09

129

45.4

211

25.2

160

24.4

191

17.3

134

20.5

182

44.3

214

02/20/20

107

CRAR

22.0

189

23.2

186

13.5

220

16.4

218

16.3

182

21.0

148

21.8

223

28.5

189

26.9

201

10.5

206

32.5

179

32.9

189

23.3

182

23.6

194

20.3

181

37.4

192

02/24/20

108

SGBMP

27.8

210

37.5

226

16.3

233

17.1

223

27.6

228

75.7

265

14.6

195

33.4

203

25.8

195

12.2

218

60.3

255

34.0

194

23.1

180

29.3

226

28.7

200

31.2

177

03/13/20

109

MTS

59.7

273

58.8

269

25.2

265

51.1

272

60.4

273

91.3

271

48.3

274

70.3

271

63.4

271

44.4

273

79.3

268

71.7

272

60.9

274

47.4

271

79.6

274

88.4

272

05/14/20

110

SRM

13.1

142

8.50

120

7.04

161

7.86

140

7.73

128

16.1

127

7.90

124

18.4

151

18.5

172

5.03

148

22.3

142

20.0

134

18.1

157

18.5

148

11.3

126

19.3

141

05/19/20

111

SUWNet

30.1

221

24.5

188

13.9

222

20.3

230

20.0

200

35.7

189

26.1

236

40.9

222

43.2

242

17.9

231

49.8

224

28.6

177

24.4

189

28.5

219

52.5

256

37.9

195

05/20/20

112

AANet++

15.4

158

17.5

166

8.37

177

10.2

167

9.86

145

23.9

158

9.82

142

17.7

149

15.9

155

3.25

105

18.1

125

27.1

170

16.2

137

18.4

147

20.0

179

37.7

194

05/28/20

114

RTSMNet

45.6

267

47.0

252

21.9

257

31.9

263

36.4

258

75.1

264

43.9

273

58.9

265

55.3

265

32.7

268

62.2

259

56.4

264

42.2

264

39.1

262

58.0

259

59.9

243

05/28/20

113

LEAStereo

7.15

7.56

110

4.52

4.62

4.64

8.83

5.66

5.86

6.03

3.30

108

13.1

11.3

10.3

12.1

7.06

9.90

06/08/20

115

MANE

30.9

225

54.7

265

11.5

210

14.6

205

29.4

234

52.6

240

26.4

238

45.1

233

31.5

216

11.5

212

42.5

200

41.8

231

33.1

242

31.6

236

34.2

213

43.5

211

07/16/20

116

HLocalExp-CM

5.68

3.68

2.95

3.92

2.45

8.12

3.41

3.74

3.53

2.17

10.2

10.0

8.75

14.1

5.12

9.61

07/17/20

117

GANetREF_RVC

18.9

177

16.6

164

6.42

148

7.40

136

10.6

150

25.8

164

12.2

164

36.5

211

35.5

227

4.10

131

33.8

183

20.1

135

16.4

141

20.2

163

14.3

145

48.1

226

07/21/20

118

AANet_RVC

25.2

201

22.6

182

11.3

208

12.9

191

15.9

179

30.5

177

17.9

206

33.4

202

30.9

212

6.34

177

28.8

166

43.4

232

25.3

200

26.7

212

37.0

218

69.8

257

08/10/20

119

CVANet_RVC

31.8

231

25.6

191

14.6

225

21.7

235

22.1

207

39.8

199

28.4

247

44.7

231

47.0

248

19.1

238

50.6

228

30.4

182

24.7

193

29.3

227

52.1

255

41.6

207

Accurate disparity prediction is an active topic in computer vision, and efficiently exploiting contextual information is the key to improving performance. In this paper, we propose a simple yet effective non-local context attention network (NLCANet) to exploit global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning (GFL) module to get a more discriminative representation by taking advantage of multi-scale features and forming them into a variance-based cost volume. Then, we construct a non-local attention matching (NLAM) module by using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture the global contextual information. Finally, we adopt a geometry refinement (GR) module to refine the disparity map and further improve the performance. Moreover, we add a warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on the KITTI and SceneFlow datasets in the end-point error (EPE) and the fraction of erroneous pixels (D1); (2) our proposed method performs particularly well in reflective regions and occluded areas.

08/11/20

120

NLCA_NET_v2_RVC

10.4

109

11.8

140

4.12

6.39

117

6.44

110

19.7

144

10.9

150

14.5

130

13.2

134

3.26

107

21.2

138

14.7

10.1

14.5

7.17

11.5

08/12/20

121

CFNet_RVC

10.1

104

14.4

156

7.81

170

7.12

132

6.61

112

15.5

119

7.53

117

12.3

121

11.5

130

3.02

10.7

16.6

109

10.7

15.4

109

10.9

124

9.01

08/30/20

122

LE_PC

5.58

3.52

2.99

4.24

1.92

5.39

3.42

3.16

3.72

2.30

7.83

9.90

7.79

17.4

135

4.74

9.51

09/03/20

123

LPSM

39.5

255

40.0

231

20.7

251

28.3

254

34.0

253

34.3

184

23.8

230

56.7

261

52.4

262

24.9

257

36.9

189

66.3

269

40.6

262

37.5

258

46.6

246

79.3

268

09/09/20

124

AdaStereo

13.7

148

19.6

172

7.41

166

10.6

174

14.5

176

15.7

123

7.85

122

22.6

161

9.32

110

7.00

180

9.20

22.4

148

14.5

124

17.8

141

14.8

149

24.2

156

09/24/20

125

ACMC

31.9

232

50.8

258

16.5

234

23.4

240

21.2

204

51.1

235

26.5

239

39.5

218

28.8

206

15.9

227

51.3

233

37.9

219

32.3

240

33.1

244

39.3

231

53.1

231

10/28/20

126

HITNet

6.46

6.25

4.67

102

4.51

2.17

6.52

5.18

2.92

2.66

2.37

36.7

187

9.28

6.27

11.2

4.61

9.54

We propose a novel lightweight network for stereo estimation. The method uses densely connected layer structures to learn expressive features without the need of fully-connected layers or 3D convolutions. This leads to a network structure with only 0.37M parameters while still having competitive results. The post-processing consists of filtering, a consistency check and hole filling.
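The consistency check and hole filling mentioned in the post-processing can be illustrated on a single scanline as below. This is a generic sketch, not the authors' implementation: integer disparities, the `INVALID` marker, the tolerance `tol`, and the nearest-valid-neighbor fill are all assumptions for this example.

```python
INVALID = -1  # hypothetical marker for rejected pixels


def lr_consistency(disp_left, disp_right, tol=1):
    """Keep a left disparity d at x only if the right map agrees at x - d."""
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d
        if 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol:
            checked.append(d)
        else:
            checked.append(INVALID)
    return checked


def fill_holes(disp):
    """Fill invalid pixels from the nearest valid value along the scanline."""
    filled = list(disp)
    last = INVALID
    for i, d in enumerate(filled):  # propagate valid values to the right
        if d != INVALID:
            last = d
        elif last != INVALID:
            filled[i] = last
    nxt = INVALID
    for i in range(len(filled) - 1, -1, -1):  # fix holes at the left border
        if filled[i] != INVALID:
            nxt = filled[i]
        elif nxt != INVALID:
            filled[i] = nxt
    return filled
```

For example, `fill_holes(lr_consistency(left_row, right_row))` rejects pixels whose left and right disparities disagree and then replaces them with a neighboring valid value.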

11/10/20

127

FC-DCNN

17.9

168

21.2

178

6.52

153

9.56

160

14.1

172

31.9

181

23.4

229

23.4

162

19.7

174

5.93

172

26.9

157

22.8

150

20.0

162

19.3

157

18.2

167

23.9

155

11/12/20

128

RLStereo

27.9

212

20.5

177

15.0

228

23.5

241

26.3

225

51.5

237

35.8

263

27.1

181

23.4

185

15.6

225

63.6

261

32.3

187

21.5

172

23.2

188

44.7

243

17.4

129

11/12/20

129

UnDAF-GANet

16.2

161

3.74

2.94

16.7

219

18.3

188

24.1

159

26.3

237

19.2

153

15.7

153

1.86

36.8

188

26.8

167

11.1

24.8

202

6.54

28.0

167

11/16/20

130

SSCasStereo

15.2

157

33.6

211

5.73

136

8.13

144

12.6

164

51.1

234

8.19

127

16.7

144

5.02

5.70

168

48.5

217

17.3

116

16.0

135

20.1

161

12.3

133

9.25

11/17/20

131

RASNet

13.1

143

11.9

142

5.65

135

5.71

8.36

133

25.8

163

8.31

129

7.18

5.29

2.93

25.0

154

16.0

105

13.9

118

18.4

146

38.2

224

21.4

150

11/21/20

132

DecStereo

20.2

184

19.4

171

11.9

213

15.6

212

13.5

170

23.0

156

26.7

240

13.3

125

15.1

151

7.60

187

28.3

164

30.2

181

23.4

183

17.6

136

38.9

230

38.4

196

11/25/20

133

LPSC

10.7

114

5.15

4.23

5.48

6.38

107

16.5

129

7.84

121

9.56

102

10.3

122

4.02

126

20.2

133

19.0

126

17.7

151

18.5

149

9.73

111

18.0

133

11/26/20

134

CooperativeStereo

28.8

217

28.5

201

12.3

217

17.3

224

18.5

192

62.3

251

22.4

226

36.3

210

24.7

187

15.8

226

74.5

267

37.8

218

28.4

219

26.6

211

41.6

238

28.4

170

12/22/20

135

SLCCF

8.83

6.97

103

4.90

111

6.05

111

4.35

8.89

5.33

6.29

5.15

4.80

143

13.0

18.1

121

17.8

153

17.7

138

6.93

15.4

121

12/24/20

136

ACR-GIF-OW

24.5

196

37.5

224

10.8

205

16.3

217

17.4

187

44.9

222

17.2

203

33.5

204

25.2

190

11.4

211

45.4

210

35.7

204

26.6

208

23.3

191

23.6

187

38.4

197

This model is trained on low-resolution data but targets high-resolution images. It uses a recurrent module to iteratively update a coarse disparity prediction. A dedicated refinement module then makes a final adjustment. The recurrent update and the final refinement are applied in a patch-wise manner across the initial disparity.

03/05/21

137

ORStereo

19.1

179

38.9

229

9.97

202

9.21

155

23.3

212

42.6

212

13.0

176

18.2

150

6.63

4.93

144

35.4

185

33.1

190

24.1

187

23.6

193

18.2

166

26.0

163

03/05/21

138

LocalExp-RC

5.54

3.78

3.02

3.85

2.08

5.95

3.48

3.61

3.65

2.52

10.3

6.85

7.25

16.1

121

5.12

10.2

04/22/21

139

LESC

6.78

4.07

3.46

3.26

3.36

9.15

4.08

4.76

5.21

2.80

11.7

13.0

10.2

17.0

128

5.52

12.5

104

05/09/21

140

ADSG

24.7

198

36.3

218

11.3

207

15.6

213

20.0

199

35.9

191

18.2

207

35.2

208

27.2

202

12.2

219

42.7

203

33.8

193

26.3

205

23.3

190

27.0

195

36.6

191

06/02/21

141

FADNet_RVC

28.4

214

18.3

167

9.48

192

13.9

200

16.0

180

40.9

205

13.0

177

43.4

227

45.0

244

8.43

194

57.8

246

35.7

202

24.8

195

23.9

197

47.4

249

68.1

255

06/04/21

142

RANet++

28.5

215

19.1

170

9.91

200

18.4

226

15.7

178

37.2

194

14.2

191

42.9

225

43.4

243

7.56

186

50.9

230

37.2

212

28.6

222

25.6

208

48.8

251

59.0

241

06/07/21

143

FADNet++

40.2

256

25.2

189

22.1

258

33.4

265

28.8

233

54.0

241

33.6

259

46.8

237

46.8

247

23.5

253

73.4

266

47.1

246

28.5

220

37.7

259

65.4

270

71.9

261

We propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, which is based on the popular and widely used Semi-Global Matching algorithm. We optimize the algorithm for embedded CUDA GPUs by using massively parallel computing, and for embedded ARM CPUs by using NEON intrinsics for vectorized SIMD processing.

06/10/21

144

ReS2tAC

35.8

246

41.8

239

20.7

250

28.1

253

24.6

218

42.4

210

30.3

250

38.9

217

34.9

224

23.0

251

50.3

227

48.9

248

39.8

261

39.4

263

44.1

240

65.1

251

06/11/21

145

R3DCNN

33.0

237

34.2

214

15.8

230

13.4

195

41.7

265

47.9

228

22.0

224

60.1

267

57.4

268

12.6

221

40.3

195

46.4

242

26.8

211

37.0

255

19.3

175

45.2

217

Methods that estimate optimal parameters for MRF stereo cannot be directly used to estimate parameters for local expansion moves stereo. To estimate the regularization weight for local expansion moves stereo, we propose probabilistic mixture models for the slanted patch matching terms and the curvature regularization terms.

06/23/21

146

ERW-LocalExp

5.53

3.64

2.84

2.66

1.97

5.68

4.87

3.27

3.25

2.36

10.5

11.5

7.46

14.7

5.55

9.18

07/23/21

148

HBP_ISP

5.20

3.70

3.05

3.57

2.34

7.80

3.79

3.34

3.09

1.87

9.85

10.1

7.82

11.2

5.26

7.86

07/26/21

149

RAFT-Stereo

4.74

4.19

3.44

3.11

1.51

7.30

2.79

2.67

2.59

1.39

7.46

10.2

5.86

13.0

3.59

9.38

07/14/21

147

MFN_USFDSRVC

36.7

248

31.4

207

16.1

232

22.0

236

28.5

231

42.3

209

21.1

220

52.2

250

50.7

258

21.4

246

53.6

240

53.8

260

30.7

232

31.7

238

63.7

266

60.7

244

08/22/21

150

SDCO

19.0

178

30.4

204

5.92

140

9.11

153

21.5

206

37.5

195

12.3

165

26.8

177

16.7

164

5.68

167

29.4

167

30.6

183

25.6

201

23.1

187

17.5

163

18.9

139

A lightweight network with a dilated ResNet feature extractor, a correlation cost volume computed at low resolution, and a refinement network that produces a full-resolution disparity output. Sparse disparity is derived from the dense disparity using a threshold on the network confidence output and a region grower to remove suspected bad disparities.
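The confidence thresholding step can be sketched as below. The function name, the default threshold, and the use of `None` as the invalid marker are assumptions for this example; the region grower is not specified in the description and is not reproduced here.

```python
def sparsify(disparities, confidences, threshold=0.5, invalid=None):
    """Keep a disparity only where the confidence reaches the threshold.

    Pixels below the threshold are replaced by the invalid marker so that
    a sparse evaluation can skip them.
    """
    if len(disparities) != len(confidences):
        raise ValueError("disparity and confidence maps must have the same size")
    return [d if c >= threshold else invalid
            for d, c in zip(disparities, confidences)]
```

Raising the threshold trades coverage for accuracy: fewer pixels survive, but those that do are more likely to be correct.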

08/24/21

151

MMStereo

12.7

138

27.9

197

8.71

180

8.81

150

11.7

156

26.9

168

5.82

20.9

157

14.6

146

4.10

131

15.4

106

16.0

105

14.2

121

13.6

9.71

110

7.35

09/20/21

152

GANet-RSSM

10.6

112

11.9

141

8.54

178

6.60

121

6.26

106

16.3

128

7.10

112

15.5

134

14.5

145

2.93

11.3

16.1

108

10.8

16.0

120

10.7

120

11.3

10/17/21

153

ACVNet

13.6

147

9.69

128

3.65

4.82

7.48

121

22.9

155

12.9

173

15.7

136

14.4

144

3.82

120

21.8

139

17.7

118

14.3

122

15.6

111

25.6

191

32.1

179

10/25/21

154

SWFSM

8.21

5.46

4.66

101

5.90

102

2.92

10.9

5.59

8.91

100

9.58

113

2.72

13.2

14.8

13.4

113

13.4

7.76

100

11.4

11/10/21

155

CREStereo

3.71

4.73

3.94

5.07

1.96

3.02

1.42

2.28

2.05

1.51

6.86

6.35

4.25

6.01

4.60

5.49

11/21/21

156

FENet

11.3

125

7.70

112

3.91

3.97

6.24

105

16.7

130

5.78

32.1

197

32.4

219

2.57

11.8

10.8

6.90

13.4

5.41

11.2

11/21/21

157

Gwc_CoAtRS

6.50

6.92

101

6.82

158

4.55

3.48

5.12

5.80

4.88

4.96

2.69

15.3

105

12.8

6.40

10.2

7.13

8.48

01/27/22

158

UPFNet

10.3

108

9.74

131

4.67

102

6.28

116

5.54

20.1

145

8.78

133

9.42

101

7.51

3.78

119

22.9

145

16.0

107

16.7

144

15.6

112

8.70

105

16.0

122

02/27/22

159

MSTR

8.72

6.28

6.00

142

4.13

5.00

8.03

7.81

120

5.33

5.80

3.25

105

20.3

135

14.4

11.2

12.9

7.29

31.5

178

03/02/22

160

AANet_Edge

23.7

193

28.4

200

9.56

194

14.8

207

14.3

175

34.9

185

15.9

200

23.9

163

17.2

166

5.62

165

31.6

176

35.2

200

29.6

228

22.1

180

41.4

237

74.0

263

04/11/22

161

Z2ZNCC

34.4

242

40.9

236

20.4

248

29.9

256

32.0

241

42.5

211

27.0

241

41.4

223

38.0

233

26.0

259

48.6

218

43.7

235

36.5

251

33.3

246

37.5

222

41.3

206

05/25/22

162

LSMSW

8.15

5.45

4.64

5.93

107

2.93

10.6

5.68

8.70

9.23

109

2.68

13.4

14.6

13.3

111

13.5

7.69

11.4

06/13/22

163

EAI-Stereo

3.68

4.02

3.32

2.48

1.42

4.19

2.37

2.18

2.01

1.16

10.2

8.84

4.00

7.15

3.14

6.44

07/14/22

165

CRMV2

11.9

129

9.65

127

7.98

171

10.0

165

5.81

12.4

106

12.3

166

9.96

108

8.75

101

5.64

166

20.3

134

21.1

145

17.2

147

15.8

116

13.7

140

19.6

142

07/13/22

164

ACT

35.0

244

40.7

235

21.7

255

27.5

251

32.5

245

43.9

216

27.5

242

45.4

234

41.0

239

24.6

255

52.5

237

43.5

233

34.2

245

29.7

229

39.6

233

47.4

222

08/05/22

166

RDNet

11.3

123

11.2

136

5.24

125

5.45

6.51

111

16.7

131

8.89

135

16.7

144

14.9

150

4.75

142

16.5

114

19.3

131

15.4

131

12.5

12.1

130

14.2

115

08/08/22

167

UCFNet_RVC

10.7

115

12.2

145

6.48

149

5.83

5.90

102

16.9

133

6.61

105

15.8

138

14.6

147

2.73

11.4

18.8

124

11.0

18.9

155

10.7

120

11.4

08/09/22

168

issga

18.9

176

12.0

144

11.6

211

11.1

182

18.3

189

14.3

115

14.6

196

28.6

190

26.2

197

5.90

171

13.5

41.4

229

21.9

175

22.2

181

19.4

176

30.7

174

08/22/22

169

PSM-Aug

15.0

154

10.1

132

9.43

191

10.8

176

8.87

137

13.8

113

9.63

141

14.0

128

14.7

149

5.98

174

20.7

137

24.6

158

21.3

170

21.1

172

22.7

185

29.0

172

08/29/22

170

MCP-HA-VQ

30.6

223

47.8

256

17.8

239

23.0

238

25.9

223

41.9

207

24.6

232

38.2

214

31.5

215

18.4

233

43.0

205

35.7

203

29.6

227

29.6

228

35.8

214

47.9

224

09/01/22

171

GMStereo

7.14

6.30

6.20

146

6.22

114

6.62

113

9.79

2.76

5.69

5.17

4.04

127

14.0

11.2

6.81

11.8

6.90

12.8

106

In recent years, convolutional-neural-network based stereo matching methods have achieved significant gains compared to conventional methods in terms of both speed and accuracy. Current state-of-the-art disparity estimation algorithms require many parameters and large amounts of computational resources and are not suited to applications on edge devices. In this paper, we propose an end-to-end light-weight network (LWNet) for fast stereo matching, which consists of an efficient backbone with multi-scale feature fusion for feature extraction, a 3D U-Net aggregation architecture for disparity computation and a color guidance in 2D CNN for disparity refinement.

09/20/22

172

LWNet

40.9

260

38.1

227

18.4

241

30.5

259

33.3

252

43.2

215

30.9

254

49.2

244

50.6

257

22.8

250

58.1

248

54.2

261

41.8

263

37.5

257

58.8

261

81.5

270

09/22/22

173

DCstereo

12.6

135

10.1

133

9.23

188

9.04

152

8.19

131

12.3

105

7.29

114

16.4

142

14.7

148

5.11

150

18.8

128

20.3

138

14.4

123

19.4

158

13.2

137

20.0

146

09/27/22

174

FCDSN-DC

18.8

174

23.0

184

7.01

160

10.2

168

20.1

201

37.7

196

17.3

204

27.8

185

20.8

179

7.81

190

23.9

149

24.5

157

22.4

176

20.7

167

16.0

156

19.9

145

10/01/22

175

CREStereo++_RVC

4.68

5.09

4.04

5.24

4.21

5.05

2.11

3.52

3.58

1.67

8.01

6.61

4.68

9.53

4.61

5.98

10/02/22

176

MaskLacGwcNet_RVC

10.4

111

7.52

109

4.50

5.21

6.94

116

18.6

140

5.18

14.7

132

13.3

136

3.01

28.5

165

18.0

120

8.95

11.2

14.2

144

13.8

112

10/02/22

177

raft+_RVC

8.29

11.1

135

4.49

5.97

109

10.3

148

28.5

170

3.75

5.07

2.88

2.21

12.2

15.2

12.3

100

12.7

5.09

10.8

10/03/22

179

GEStereo_RVC

7.97

6.70

3.52

5.90

102

7.63

126

22.5

154

7.61

118

4.89

4.22

2.19

10.4

14.9

11.8

14.3

4.36

11.9

101

10/03/22

178

CroCo_RVC

15.1

156

7.43

108

5.85

139

6.71

125

11.7

155

15.4

117

3.94

36.2

209

35.8

228

3.41

112

18.1

124

29.3

179

10.9

18.0

142

10.6

119

21.0

148

10/03/22

180

iRaftStereo_RVC

8.07

9.13

125

8.25

175

5.55

4.68

6.92

6.41

103

6.29

6.19

3.96

124

17.9

122

13.0

9.58

11.4

9.24

109

11.8

100

10/06/22

181

AGCVNet

12.0

130

10.6

134

5.14

120

5.47

7.00

117

17.0

135

8.91

137

18.9

152

15.7

154

4.64

139

15.8

109

19.1

129

16.6

143

13.7

15.0

150

14.6

120

10/06/22

182

GwcSlice

12.7

137

13.4

149

4.76

105

5.33

7.69

127

17.0

134

11.1

152

13.7

126

9.88

118

4.22

135

20.1

132

20.1

136

17.4

148

16.9

127

14.0

142

36.5

189

10/07/22

183

MCNet

11.6

127

12.6

146

4.72

104

6.63

123

7.62

124

19.0

142

11.4

155

12.5

123

9.12

107

4.47

138

17.6

121

23.1

153

15.5

133

13.5

8.44

104

30.1

173

10/08/22

184

MANet

17.5

166

23.0

185

5.25

126

9.82

163

11.4

153

31.1

178

12.9

172

22.5

160

16.1

159

7.95

192

24.9

153

28.1

175

24.4

190

20.5

165

18.9

173

31.1

176

10/13/22

185

19.9

183

15.1

159

13.1

219

14.3

203

14.2

173

14.2

114

8.90

136

16.4

140

16.0

158

12.3

220

50.2

226

36.4

208

21.7

174

21.8

177

31.8

209

40.4

203

10/16/22

186

LMCR-Stereo

6.27

6.20

4.59

3.92

2.66

4.52

4.88

3.65

3.41

2.08

16.8

117

11.2

8.58

13.2

6.89

10.5

Cost aggregation plays a critical role in existing stereo matching methods. Generally, aggregating matching costs in homogeneous regions with similar disparities is beneficial to matching accuracy. However, previous approaches commonly use 3D convolutions for cost aggregation without considering the homogeneity of different regions. In this paper, we revisit cost aggregation in stereo matching from a perspective of disparity classification and propose a generic yet efficient Disparity Context Aggregation (DCA) module to improve the performance of CNN-based methods.

10/26/22

187

DCANet

8.55

8.41

118

6.26

147

4.79

5.41

10.3

7.14

113

10.1

110

9.75

116

3.38

111

12.8

13.5

12.4

101

12.7

7.37

10.2

11/07/22

189

ConvStereo

4.62

5.66

4.86

109

4.49

2.47

3.38

1.83

2.81

2.62

1.55

10.4

7.32

5.53

9.74

5.08

6.91

11/06/22

188

7.33

5.11

4.18

4.06

2.65

9.94

5.33

5.91

5.37

3.85

121

10.9

15.0

12.8

104

16.4

125

4.95

11.4

11/10/22

190

DLNR

3.20

2.91

2.37

2.18

1.67

3.21

1.37

1.66

1.11

6.25

7.07

3.45

8.90

4.43

2.91

11/11/22

191

ICVP

7.97

11.3

137

3.97

5.02

8.79

136

17.1

136

5.62

7.51

6.97

3.09

13.7

12.7

9.23

10.9

6.28

9.73

11/11/22

192

AnPM

7.35

6.11

4.54

3.35

2.52

11.4

6.96

107

5.61

2.64

1.91

12.4

8.51

9.87

12.3

18.6

169

8.83

12/02/22

193

GANet+ADL

17.7

167

21.3

179

3.97

6.61

122

11.7

156

25.9

166

6.07

40.5

220

35.0

226

3.68

118

24.1

150

19.3

130

13.9

117

14.4

23.0

186

33.5

183

12/05/22

194

Ct-Net

21.0

187

38.5

228

11.3

208

11.7

185

17.4

185

31.7

180

13.0

175

27.1

180

20.4

177

7.45

185

27.9

163

27.7

173

16.5

142

23.5

192

38.8

228

24.4

157

12/12/22

195

KPEA-Stereo

10.6

113

9.69

128

5.35

131

4.52

6.43

108

12.1

103

6.99

108

11.7

119

7.84

5.59

164

16.6

116

19.1

128

13.1

107

17.1

129

12.2

132

26.0

164

01/14/23

196

AASNet

12.8

140

12.8

147

4.41

9.40

158

7.56

123

17.3

137

11.7

159

15.3

133

14.2

140

4.13

133

15.0

101

19.7

132

20.5

164

15.2

106

17.0

160

16.9

125

02/22/23

197

GLC_STEREO

6.42

5.35

4.65

100

5.19

5.79

7.59

2.22

10.4

111

14.1

139

2.00

9.92

10.8

4.94

5.07

6.16

5.72

03/06/23

198

PCVNet

8.19

7.01

104

6.51

151

5.89

101

4.53

7.42

8.10

125

5.49

5.62

2.90

22.0

141

12.7

8.07

11.9

7.87

101

21.9

152

03/07/23

199

GOAT18

8.73

7.26

105

7.32

164

6.80

126

3.47

10.3

10.4

148

5.14

5.16

4.95

146

15.9

110

13.9

11.2

9.62

13.1

136

16.4

123

04/18/23

200

DMCANet

7.79

7.91

113

4.12

3.79

4.26

11.2

10.1

146

6.76

4.85

3.32

110

12.9

13.3

10.5

12.9

9.11

107

10.1

04/28/23

201

ADStereo

18.0

169

16.4

163

14.9

227

12.6

190

21.3

205

20.6

146

16.6

201

15.8

137

16.0

157

7.43

184

19.1

129

52.0

252

24.8

196

18.1

143

17.7

164

11.2

06/09/23

202

SSVM-CFPMF

9.52

102

8.58

121

4.40

5.51

5.84

100

5.84

7.02

110

6.16

14.3

142

5.30

155

17.0

118

15.9

102

14.8

126

18.7

150

6.52

13.7

110

06/22/23

203

IGEV-Stereo

4.83

3.17

2.46

1.97

2.19

5.63

1.22

16.2

139

9.20

108

1.17

3.77

4.93

5.35

6.99

2.31

5.00

06/26/23

204

CCL-Stereo

30.9

227

50.9

259

9.17

185

11.0

180

33.0

247

88.2

269

1.91

47.3

239

26.8

199

11.7

214

41.7

198

37.4

213

23.7

184

28.8

221

63.0

263

42.8

208

08/03/23

205

26.6

206

39.9

230

17.7

238

22.2

237

23.0

211

36.6

193

18.3

210

29.7

192

24.2

186

16.9

230

42.6

201

34.0

195

28.9

224

28.8

223

26.2

193

39.8

201

08/10/23

206

CroCo-Stereo

7.29

4.90

3.62

1.74

7.01

118

9.90

1.78

16.4

143

17.4

167

1.45

6.20

15.3

4.95

8.62

5.00

10.0

08/12/23

207

UGRU

10.8

116

4.71

4.27

2.12

13.2

169

15.7

122

1.95

20.5

156

25.4

192

1.68

7.78

25.2

160

11.1

14.4

8.79

106

10.5

08/13/23

208

Any-RAFT

5.22

5.19

4.20

4.00

2.23

5.88

4.06

3.05

2.91

2.04

9.76

10.7

8.77

9.90

4.94

6.72

We propose a novel deep stereo matching network and a new real-world stereo dataset of cluttered objects taken with a commercially available stereo sensor. We design a U-shaped architecture with various types of attention, which more efficiently extracts global and local contexts from rectified image pairs, resulting in highly accurate disparities. Furthermore, its symmetric structure allows simultaneous estimation of both left and right disparities. It can also implicitly estimate the uncertainty, i.e. the confidence of the estimated disparities.

09/14/23

209

CASS

11.8

128

9.23

126

8.92

182

10.4

169

7.84

129

12.4

106

4.43

8.42

8.65

100

6.03

176

30.9

172

17.1

115

15.3

130

15.7

113

17.8

165

19.0

140

09/27/23

210

FM-DT

40.6

258

45.4

251

20.8

252

32.3

264

40.5

263

51.3

236

30.6

251

52.1

249

48.0

250

23.2

252

53.4

238

47.0

244

39.7

257

29.0

224

55.3

258

76.2

265

10/09/23

211

EGLCR-Stereo

4.03

4.69

2.46

3.70

2.99

10.7

2.48

1.95

1.63

0.94

5.76

8.17

3.84

10.3

2.99

4.87

10/26/23

212

StereoStar

33.0

236

15.4

160

8.57

179

12.0

187

19.0

194

29.6

172

21.7

222

62.6

268

61.0

270

16.9

229

37.2

191

59.9

268

35.2

249

43.9

266

40.0

234

40.9

204

10/28/23

213

SAMTormer

3.63

3.84

2.95

1.85

0.98

3.31

2.02

1.71

1.67

0.82

5.50

10.4

4.57

11.7

3.92

3.41

10/30/23

214

LSTS

17.3

165

8.70

123

6.18

145

8.41

146

9.63

143

21.3

149

13.2

181

29.5

191

29.1

208

5.00

147

25.0

154

24.9

159

22.6

177

21.4

173

15.4

152

33.1

181

10/30/23

215

LoS

4.20

5.85

4.92

113

4.64

2.77

3.92

1.32

2.36

2.17

1.81

8.18

6.58

4.55

8.57

4.57

5.06

11/10/23

216

GASNet

33.1

238

21.3

180

16.9

235

26.3

249

33.2

250

39.5

198

17.7

205

26.7

176

26.0

196

21.3

245

54.1

242

46.9

243

33.3

243

36.8

254

63.2

265

63.4

249

11/12/23

217

SNDR

6.09

5.30

4.20

3.11

2.66

9.22

4.70

5.10

3.98

4.26

136

9.96

7.01

9.11

15.1

104

4.57

7.26

11/13/23

218

Selective-IGEV

2.51

2.54

1.86

2.51

1.12

7.22

1.23

1.36

1.17

1.16

4.48

4.83

2.99

3.79

2.26

4.72

11/15/23

219

D2Stereo

9.33

5.25

3.25

7.13

133

4.16

13.3

111

5.70

5.75

7.30

1.65

12.6

10.8

9.92

16.3

124

31.6

208

5.64

11/16/23

220

LoS_RVC

5.14

7.57

111

4.82

107

4.27

3.20

8.71

2.62

3.45

2.95

1.56

8.91

6.79

6.57

9.87

6.67

4.61

12/07/23

221

4D-IteraStereo

10.9

119

5.87

5.59

132

6.15

112

6.07

103

9.15

7.46

116

27.0

179

32.4

218

2.46

12.2

11.2

7.18

12.2

7.37

6.77

This article presents a disparity map algorithm that improves depth map estimation using the Census Transform and a hierarchical segment-tree on each block. The stereo matching algorithm presented in this study comprises four steps: cost computation, cost aggregation, optimization, and post-processing, all of which refine the final disparity map.

12/31/23

222

H-CENST

38.4

252

41.6

237

26.7

268

31.8

262

33.0

249

43.0

213

32.7

257

53.1

252

50.5

256

24.8

256

51.4

234

47.0

245

36.7

253

31.9

240

40.5

236

53.4

232
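
The H-CENST entry above names the Census Transform as the basis of its cost computation step. As a rough illustration only (not the submission's code), the sketch below computes a census code per pixel and uses the Hamming distance between codes as the matching cost. The function names, the 3x3 window, and the border clamping are assumptions made for this example; the segment-tree aggregation is not reproduced.

```python
def census_transform(img, radius=1):
    """Encode each pixel as a bit string: one bit per neighbour, set when
    the neighbour is darker than the window centre."""
    h, w = len(img), len(img[0])
    codes = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre is not compared with itself
                    # Clamp coordinates at the border (assumption of this sketch).
                    ny = min(max(y + dy, 0), h - 1)
                    nx = min(max(x + dx, 0), w - 1)
                    bit = 1 if img[ny][nx] < img[y][x] else 0
                    code = (code << 1) | bit
            codes[y][x] = code
    return codes


def hamming_cost(left_codes, right_codes, y, x, d):
    """Cost of matching left pixel (y, x) to right pixel (y, x - d)."""
    if x - d < 0:
        raise ValueError("disparity points outside the right image")
    return bin(left_codes[y][x] ^ right_codes[y][x - d]).count("1")


# Toy pair: the left image is the right image shifted one pixel to the right.
right = [
    [1, 5, 2, 8, 3],
    [7, 2, 9, 4, 6],
    [3, 8, 1, 7, 2],
]
left = [[row[0]] + row[:-1] for row in right]
left_codes = census_transform(left)
right_codes = census_transform(right)
```

Because census codes depend only on the intensity ordering inside the window, the cost is unaffected by monotonic brightness changes between the two views, which is the usual reason for choosing it over raw intensity differences.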

Unsupervised stereo matching methods have made significant strides recently. However, these approaches rely predominantly on the assumption of photometric consistency, which makes them sensitive to illumination changes and weak in problematic areas such as occluded or textureless regions. To mitigate these limitations, this paper introduces a novel self-supervised dual-level framework named Dual-Net. The framework consists of two key components: self-supervised teacher training and student training based on knowledge distillation. Specifically, the teacher model is first trained in a self-supervised fashion with a focus on feature-space consistency and data-augmentation consistency. On the one hand, features are robust to noise and luminance changes and remain discriminative even in textureless regions. On the other hand, a data-augmentation consistency loss guides the model toward enhanced contextual awareness, leading to more complete depth estimation in problematic regions. Then, the knowledge learned by the teacher model is distilled and transferred probabilistically to the student model. Guided by this distilled knowledge, the student model outperforms its teacher by a large margin.

01/08/24

223

DualNet

16.4

162

19.7

173

7.99

172

10.1

166

18.3

190

24.1

159

10.0

144

23.9

164

20.4

175

7.79

189

23.0

146

23.1

151

16.3

140

18.8

153

17.0

159

18.5

137

01/08/24

224

GINet

15.6

159

16.1

161

7.15

162

7.37

135

9.39

141

25.1

161

7.88

123

35.2

207

32.6

220

3.19

101

15.5

107

16.7

112

11.6

14.7

15.8

154

26.4

165

01/31/24

225

HART

4.24

3.13

2.24

4.16

1.10

4.01

2.03

1.86

1.68

0.85

9.83

11.0

8.71

9.65

3.26

6.96

02/19/24

226

HCR

12.4

133

8.33

117

3.79

5.54

9.27

140

26.1

167

6.26

100

32.9

199

32.0

217

2.38

11.4

10.8

8.08

15.8

115

7.69

6.48

The project proposes a stereo matching network based on a neural operator, which maps from the space of RGB image pairs to disparity space. The network lets users test images at any scale, customize the disparity range for different scenarios, and dynamically build the cost volume according to the chosen scale and disparity range.

02/20/24

227

DispNO

15.0

153

18.4

169

6.17

144

9.13

154

11.3

152

25.2

162

11.4

155

17.6

148

14.3

141

8.70

196

31.7

177

21.6

147

18.1

156

17.6

137

16.2

157

17.5

130

02/21/24

228

ClearDepth

3.48

4.14

3.16

2.81

1.95

4.55

2.36

1.73

1.70

1.25

5.46

11.2

3.12

7.30

3.70

3.45

02/28/24

229

AKD_Stereo

3.87

4.21

3.53

3.91

1.08

7.63

4.75

1.72

1.60

1.18

5.26

9.62

3.66

7.63

3.23

5.37

03/04/24

230

AEACV

4.15

5.53

2.98

2.54

3.23

3.42

1.57

2.85

2.99

1.22

4.63

5.96

4.36

12.9

5.41

4.08

03/04/24

231

ET_Stereo

4.00

4.38

3.33

2.85

1.53

7.84

2.61

1.91

1.82

1.05

5.08

8.72

7.52

8.81

3.09

4.82

03/11/24

232

StereoIM

9.25

3.47

3.05

1.78

9.22

139

10.6

1.65

27.0

178

25.6

194

1.25

5.89

20.9

143

4.99

12.0

3.39

10.6

03/11/24

233

MIF-Stereo

11.3

124

6.68

154

5.90

102

7.33

119

8.57

11.9

161

11.1

115

11.6

131

5.24

152

18.4

127

27.7

174

11.8

15.8

114

13.8

141

19.8

144

04/14/24

234

SMFormer

12.8

141

14.2

152

7.76

169

7.10

131

6.43

108

17.6

138

8.81

134

9.71

106

6.39

3.49

114

16.3

111

19.1

127

10.9

18.3

145

30.3

205

35.6

186

04/19/24

235

DCSE

16.2

160

16.1

162

4.76

105

6.47

118

12.5

163

29.9

174

8.91

137

34.7

205

33.9

222

3.98

125

22.8

144

18.9

125

16.3

138

15.2

105

10.7

120

23.8

154

05/17/24

236

FormerRaft_RVC

10.9

120

13.4

150

8.32

176

6.67

124

9.42

142

15.6

120

3.24

9.67

105

10.5

123

5.30

155

13.8

17.8

119

9.52

17.2

132

17.1

161

18.8

138

06/05/24

237

MGS-Stereo

3.57

3.62

2.93

3.43

2.66

6.24

2.54

2.04

2.15

1.23

5.81

8.40

3.56

6.48

3.18

4.81

06/06/24

238

MoCha-V2

3.51

2.52

1.95

2.25

1.47

4.61

0.98

7.35

8.07

0.66

2.95

4.18

4.46

5.70

2.54

2.70

06/14/24

239

IGEV++

3.23

3.24

2.46

4.12

1.15

6.71

1.38

1.53

1.52

1.02

4.57

4.68

5.41

7.68

2.22

4.68

06/27/24

240

CAS++

3.33

4.27

3.72

3.17

2.17

2.44

1.33

2.24

2.01

1.47

4.04

8.15

4.97

5.80

3.73

3.04

07/22/24

241

apnet

30.9

226

18.3

168

9.59

197

17.1

222

24.8

219

49.1

230

19.5

214

32.3

198

29.2

209

22.2

249

60.7

257

33.2

191

27.0

213

28.0

216

64.4

269

63.7

250

08/01/24

243

RSM

2.40

2.66

1.88

3.18

0.91

5.80

1.34

1.35

1.16

0.93

3.35

3.96

2.88

4.38

2.01

4.15

08/07/24

244

AIO-Stereo

2.36

2.38

1.71

3.22

0.85

5.83

1.24

1.42

1.32

1.03

4.49

4.81

2.43

3.61

2.12

3.63

08/12/24

245

PointerNet

2.69

2.67

1.84

3.21

1.51

7.52

1.29

1.54

1.17

1.09

3.59

3.96

3.10

5.60

2.29

4.27

08/13/24

246

UniTT-Stereo

6.34

3.96

2.69

1.82

7.92

130

11.7

101

1.81

14.2

129

13.8

137

1.22

5.07

16.6

111

4.09

5.89

2.91

8.44

This paper focuses on effectively capturing local patterns from images when fine-tuning Transformer-based models with limited labeled training data for dense downstream tasks, particularly stereo matching. To this end, we propose MaDis-Stereo, a novel stereo depth estimation framework that enhances locality inductive biases during fine-tuning via Masked Image Modeling (MIM).

08/15/24

247

MaDis-Stereo

9.49

101

3.73

3.14

1.76

9.05

138

10.5

1.74

27.8

184

27.9

203

1.50

7.47

19.8

133

4.80

11.8

3.40

10.2

07/27/24

242

esmea

30.1

220

29.4

203

9.48

192

17.0

221

31.7

240

49.7

231

15.2

197

52.6

251

45.9

245

11.9

217

46.5

213

52.1

253

27.2

215

23.7

195

25.2

189

54.5

235

09/08/24

248

RSD

3.73

2.13

1.98

1.71

2.03

2.63

0.87

8.66

9.69

115

0.96

2.54

6.82

2.34

7.76

2.23

2.57

09/26/24

249

GCAP_Stereo

4.31

5.32

3.40

2.38

2.16

11.2

4.44

2.13

2.04

1.32

7.16

8.97

5.03

8.38

3.22

6.08

We propose S-MoEStereo, which adapts pre-trained vision foundation models (VFMs) for stereo matching by integrating Low-Rank Adaptation (LoRA) with Mixture-of-Experts (MoE) modules. This approach balances parameter efficiency and discriminative feature learning by dynamically selecting the optimal expert within each MoE module. Additionally, we introduce CNN-based adapter layers to incorporate inductive bias, enhancing geometric feature extraction. Furthermore, we propose a lightweight decision network that reduces computational cost by selectively activating MoE modules based on input complexity.

10/26/24

250

SMoEStereo_RVC

5.83

6.58

5.15

121

3.96

3.82

5.84

5.41

4.43

4.21

2.31

10.2

8.41

6.20

11.1

6.59

8.19
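
The SMoEStereo_RVC entry above combines Low-Rank Adaptation with expert selection. The sketch below is only a toy illustration of those two ideas on plain Python lists: a frozen weight matrix `W` receives a low-rank correction `B @ A`, scaled by `alpha / r` as in the common LoRA formulation, and a hypothetical gate picks the single highest-scoring expert. All names, shapes, and the top-1 routing rule are assumptions for this example and are not taken from the submission.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen weights, A: r x d_in, B: d_out x r.
    """
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def select_expert(gate_scores):
    """Top-1 routing: index of the highest gate score (assumed rule)."""
    return max(range(len(gate_scores)), key=gate_scores.__getitem__)


def moe_lora_forward(W, experts, gate_scores, x, alpha=1.0):
    """Apply the LoRA pair (A, B) of the selected expert only."""
    A, B = experts[select_expert(gate_scores)]
    return lora_forward(W, A, B, x, alpha)


W = [[1.0, 0.0], [0.0, 1.0]]             # frozen identity weights
experts = [
    ([[1.0, 1.0]], [[0.0], [0.0]]),      # expert 0: zero update
    ([[1.0, 1.0]], [[1.0], [2.0]]),      # expert 1: rank-1 update
]
y = moe_lora_forward(W, experts, [0.2, 0.8], [2.0, 3.0])
```

Only the small matrices `A` and `B` would be trained in such a setup, which is where the parameter efficiency mentioned in the description comes from.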

10/30/24

251

MonoStereo

2.64

3.72

1.68

1.77

1.05

10.5

0.88

1.27

0.97

0.63

4.39

8.10

4.59

3.70

1.73

2.69

11/01/24

252

GIP-stereo

4.03

2.86

2.31

2.64

1.76

5.00

1.55

6.75

7.30

0.86

4.63

7.24

3.51

10.6

2.06

2.37

11/03/24

253

DEFOM-Stereo

2.39

2.82

2.21

1.53

1.01

5.24

0.88

1.40

1.14

0.85

2.64

9.10

2.18

5.50

2.49

1.67

11/05/24

254

coffe_stereo

2.82

2.70

1.98

1.87

0.61

3.32

2.45

1.07

1.30

1.02

2.63

4.13

2.18

8.38

2.27

11.6

11/10/24

256

CFF

15.1

155

7.95

114

8.00

173

5.26

10.2

147

12.3

104

4.94

21.1

158

23.4

184

5.40

159

16.3

111

35.2

201

17.8

154

20.9

169

26.0

192

19.6

142

11/10/24

255

AdaRStereo

12.2

132

6.86

100

6.70

155

4.89

8.30

132

10.8

2.75

26.3

173

25.5

193

3.31

109

14.8

100

26.3

165

9.13

15.9

119

14.4

146

11.3

12/16/24

258

RPS

2.61

2.46

1.71

3.92

0.79

5.19

2.44

0.93

0.84

0.93

2.41

3.39

3.45

3.28

1.82

11.6

11/28/24

257

DEFOM-Stereo_RVC

3.28

3.50

2.61

2.41

0.87

2.51

0.89

1.38

1.26

0.97

6.35

10.8

2.43

11.0

3.03

5.00

02/03/25

259

FoundationStereo

1.84

2.46

1.71

1.36

0.79

5.19

0.53

0.93

0.84

0.93

2.41

3.39

3.45

3.28

1.82

1.17

02/06/25

260

SLEDC

10.1

105

8.58

121

4.40

5.51

5.84

100

21.4

150

7.02

110

6.16

14.3

142

5.30

155

17.0

118

15.9

102

14.8

126

18.7

150

6.52

13.7

110

02/10/25

261

GREAT-IGEV

2.81

3.30

2.44

2.31

0.96

7.12

1.17

1.38

1.36

1.04

3.89

3.82

4.66

6.24

2.17

4.65

02/16/25

262

TCM

12.8

139

13.5

151

5.64

134

7.01

130

12.0

159

30.2

176

10.0

145

9.66

104

7.53

5.58

163

27.0

159

21.0

144

16.0

136

18.8

152

12.5

134

18.1

134

03/02/25

263

State-Stereo

2.64

3.55

3.06

1.91

1.19

1.67

1.28

1.68

1.66

1.22

1.64

4.45

5.81

5.06

3.25

2.38

03/04/25

264

LG-Stereo

1.76

2.57

1.86

2.02

0.65

3.23

0.68

0.98

0.81

0.55

2.15

4.26

2.03

3.85

1.27

2.42

03/19/25

265

G2L-Stereo

13.3

145

16.8

165

4.45

5.64

16.8

184

23.0

157

5.05

20.4

155

16.2

160

2.83

15.3

104

24.0

156

18.2

158

19.5

159

5.48

25.4

160

We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). By elegantly coupling these complementary worlds through a dual-branch architecture, we seamlessly integrate stereo matching with learned contextual cues.

04/24/25

267

StereoAnywhere

3.69

7.34

106

2.23

5.12

18.1

139

0.90

2.16

1.43

1.25

5.73

4.95

2.66

6.89

2.28

1.86

M2-Stereo embeds three Multi-scale Feature Fusion Attention Blocks in the feature extraction stage to fuse deep and shallow information, and uses a Multi-scale Cost Aggregation Module in the cost aggregation stage to share cost information across scales. Finally, a Multi-branch Iterative Strategy is used for efficient iteration.

04/24/25

266

M2-Stereo

3.90

8.13

115

2.35

1.61

3.58

15.9

125

1.43

1.83

1.13

0.79

3.09

7.36

6.23

7.99

2.63

4.03

05/07/25

268

G2L-ROB

11.6

126

15.0

158

5.19

123

4.91

13.1

168

19.4

143

6.22

17.5

147

14.0

138

2.55

13.9

20.8

142

14.0

119

18.2

144

6.96

14.6

119

DS-Stereo uses our proposed Adjacent Feature Hybrid Attention Block and Hierarchical Cost Aggregation Module to achieve deep-to-shallow information interaction in stereo matching. It also replaces the traditional ConvGRU iterative operator with an Inception-like iterative operator to achieve high-convergence updates.

05/07/25

269

DS-Stereo

3.13

4.92

2.14

1.40

1.23

10.1

1.16

1.55

1.25

0.93

3.41

6.18

5.10

8.66

2.21

2.43

05/10/25

270

MatchStereo

1.85

2.61

2.14

1.79

0.59

1.30

0.80

1.11

0.95

0.65

1.90

3.17

2.41

5.43

1.88

1.79

05/21/25

271

waterstereo

8.48

6.74

6.51

151

9.22

156

5.11

6.44

4.71

5.50

5.36

3.44

113

18.2

126

14.2

9.13

14.4

13.3

138

12.9

107

06/05/25

272

MGS-Selectiv

2.16

2.62

1.58

2.20

0.76

6.45

1.04

1.39

1.08

0.67

1.73

4.32

1.43

6.29

1.83

2.30

This paper proposes a robust stereo matching algorithm that combines a CNN for initial cost computation, bilateral filtering with cross-based cost aggregation (CBCA) for refinement, and a winner-take-all (WTA) strategy for disparity selection, followed by an edge-aware smoothing filter (EASF) to reduce noise.

06/12/25

273

IRDINA

35.4

245

40.3

233

23.5

261

26.0

248

28.6

232

40.6

201

28.5

248

46.6

236

43.2

241

26.1

260

51.8

235

39.2

224

32.0

236

33.3

247

44.8

244

46.9

220
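
The IRDINA entry above selects disparities with a winner-take-all (WTA) strategy. As a minimal sketch, assuming the aggregated costs are stored as `cost_volume[y][x][d]` (a layout chosen for this example, not stated by the submission), WTA keeps for each pixel the disparity with the lowest cost. The CNN cost, CBCA aggregation, and edge-aware filter are not reproduced here.

```python
def winner_take_all(cost_volume):
    """Return a disparity map holding the argmin over disparities per pixel.

    cost_volume[y][x] is a list of matching costs indexed by disparity.
    Ties resolve to the smallest disparity.
    """
    return [
        [min(range(len(costs)), key=costs.__getitem__) for costs in row]
        for row in cost_volume
    ]


# Toy 2x2 cost volume with three candidate disparities per pixel.
toy_costs = [
    [[3, 1, 2], [0, 5, 5]],
    [[4, 4, 1], [2, 2, 2]],
]
disparity_map = winner_take_all(toy_costs)
```

WTA decides each pixel independently, so the quality of the result depends on how well the preceding aggregation step has already smoothed the costs.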

06/17/25

274

UnViTAStereo

24.3

195

28.5

202

9.72

199

12.5

189

12.7

165

29.6

173

12.1

163

45.0

232

39.5

236

9.19

201

42.7

202

27.0

169

21.1

168

22.3

182

36.7

217

38.8

198

06/27/25

276

S2M2

1.15

1.29

1.23

1.27

0.40

0.45

0.59

0.67

0.62

0.45

1.28

2.80

1.37

3.60

1.12

0.25

06/17/25

275

PanMatch

7.18

5.21

5.34

129

3.34

5.43

4.52

2.47

13.2

124

13.3

135

1.51

8.34

16.6

109

8.07

10.3

4.96

9.23

07/11/25

277

SLEDC_v1

6.67

4.22

2.72

3.49

3.38

13.3

110

5.11

4.36

3.92

2.19

13.5

10.8

11.9

14.5

6.83

8.27
