Middlebury Stereo Evaluation

before rounding bug fix

We fixed a rounding issue on 1/8/2017. This affected the error numbers for the three "old" image pairs with integer ground truth (test/Computer, training/ArtL, and training/Teddy), for which submitted disparities should have been rounded to integers (at full resolution) before evaluation.
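To illustrate why the rounding step matters for integer ground truth, the sketch below compares a bad 2.0 score computed on raw sub-pixel disparities with the score after rounding. The helper names, the sample values, and the round-half-up convention are assumptions for illustration only. This is not the benchmark's evaluation code, and it ignores the resolution scaling mentioned above.

```python
import math

def round_half_up(d):
    """Round to the nearest integer, halves rounded up (an assumed convention)."""
    return math.floor(d + 0.5)

def bad_percent(disp, gt, thresh):
    """Percentage of pixels whose absolute disparity error exceeds thresh."""
    errors = [abs(d - g) for d, g in zip(disp, gt)]
    return 100.0 * sum(e > thresh for e in errors) / len(errors)

# Hypothetical integer ground truth and sub-pixel submissions for four pixels.
gt = [10, 10, 10, 10]
submitted = [12.4, 7.6, 12.6, 10.3]
rounded = [round_half_up(d) for d in submitted]

raw_bad = bad_percent(submitted, gt, 2.0)    # errors 2.4, 2.4, 2.6, 0.3
rounded_bad = bad_percent(rounded, gt, 2.0)  # errors 2, 2, 3, 0
print(raw_bad, rounded_bad)
```

In this made-up case, three of the four pixels count as bad before rounding and only one after, which is the kind of shift in error numbers the fix describes.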

See other snapshots here.

Set:	test dense, test sparse, training dense, training sparse
Metric:	bad 0.5, bad 1.0, bad 2.0, bad 4.0, avgerr, rms, A50, A90, A95, A99, time, time/MP, time/GD
Mask:	nonocc, all
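As a rough guide to the error metrics listed above, the following sketch computes them from a list of per-pixel absolute disparity errors. It assumes that bad T is the percentage of pixels with error greater than T, that avgerr and rms are the mean absolute and root-mean-square errors, and that A50 through A99 are error quantiles. The nearest-rank quantile rule and all function names are assumptions, not the benchmark's own code.

```python
import math

def bad(errors, thresh):
    """Percentage of pixels with error greater than thresh."""
    return 100.0 * sum(e > thresh for e in errors) / len(errors)

def avgerr(errors):
    """Mean absolute error in pixels."""
    return sum(errors) / len(errors)

def rms(errors):
    """Root-mean-square error in pixels."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def quantile(errors, q):
    """Nearest-rank error quantile, e.g. q=0.5 for A50 (assumed rule)."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Hypothetical absolute errors (in pixels) for five evaluated pixels.
errors = [0.25, 0.5, 1.0, 3.0, 5.0]
print(bad(errors, 2.0), avgerr(errors), rms(errors), quantile(errors, 0.5))
```

The time-based metrics (time, time/MP, time/GD) are left out because they depend on runtime measurements rather than on disparity errors.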

Reference

Description

Parameters

Running Environment


bad 2.0 (%)

Dataset	MP	nd
Austr	5.6	290
AustrP	5.6	290
Bicyc2	5.6	250
Class	5.7	610
ClassE	5.7	610
Compu	1.5	256
Crusa	5.5	800
CrusaP	5.5	800
Djemb	5.7	320
DjembL	5.7	320
Hoops	5.7	410
Livgrm	5.9	320
Nkuba	5.5	570
Plants	5.6	320
Stairs	5.2	450

Date	Name	Avg	Austr	AustrP	Bicyc2	Class	ClassE	Compu	Crusa	CrusaP	Djemb	DjembL	Hoops	Livgrm	Nkuba	Plants	Stairs

OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle the quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.

07/25/14	SGBM2	26.6	27.9	12.1	17.8	13.7	74.5	16.2	30.3	26.3	11.0	64.4	37.9	25.8	25.3	29.3	43.7

OpenCV's "semi-global block matching" method; memory efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.

07/25/14	SGBM1	28.1	28.3	17.2	19.0	14.5	57.9	18.2	31.8	31.4	13.2	58.6	38.6	27.0	25.9	31.4	59.7

The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.

07/25/14	SGM	21.2	35.5	9.57	13.8	16.5	32.1	23.3	25.8	16.7	8.95	39.8	31.1	22.6	20.7	21.3	32.2

07/25/14	SGBM1	28.4	43.5	9.09	13.6	25.9	82.0	14.4	43.4	30.3	5.98	59.3	45.8	28.5	24.9	20.1	45.9

07/28/14	SGBM1	24.0	32.9	10.8	13.6	16.2	71.2	14.7	26.6	23.0	5.83	53.8	39.2	25.6	22.8	18.8	47.4

07/28/14	SGM	18.7	40.3	4.54	8.03	22.9	40.5	14.6	24.7	10.1	5.40	29.6	28.5	23.9	20.0	14.2	30.9

07/28/14	SGM	25.3	45.1	4.33	6.87	32.2	50.0	13.0	48.1	18.3	7.66	29.6	36.1	31.2	24.2	24.5	50.2

Correlation with five, partly overlapping windows on Census transformed images using Hamming distance as matching cost. A left-right consistency check ensures unique matches and filtering small disparity segments removes outliers. Interpolation is done within image rows with the lowest, valid neighboring disparity.

07/28/14	Cens5	26.9	47.1	8.74	11.9	25.6	45.3	22.6	40.6	29.0	9.93	36.5	38.6	31.0	25.0	25.6	44.6

A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.

08/25/14	LPS	19.4	6.14	5.34	9.24	7.53	96.0	15.0	9.61	9.40	5.18	92.4	27.4	24.3	23.0	10.0	25.6

08/27/14

LPS

20.3

6.72

6.06

9.72

9.87

94.3

14.1

11.2

5.88

89.3

36.0

20.5

23.8

16.0

25.4

08/31/14	BSM	41.5	59.8	25.8	27.9	38.9	60.6	33.3	46.9	37.3	26.3	64.8	51.5	42.6	45.2	42.8	66.6

09/10/14	LAMC_DSM	26.2	55.8	11.9	14.3	18.3	44.0	21.3	39.9	29.5	6.67	31.1	34.5	28.8	26.3	30.1	35.7

09/18/14	SNCC	22.2	48.6	6.98	9.79	25.7	46.0	15.2	36.8	16.6	7.25	23.1	34.2	26.7	21.8	19.9	28.4

10/07/14	IDR	18.4	37.5	4.08	7.49	23.3	40.6	15.7	24.5	11.3	5.46	33.1	26.0	21.5	21.7	15.3	21.2

11/12/14	LCU	17.0	24.7	7.59	11.6	11.9	27.9	14.0	19.3	15.8	8.10	36.1	29.1	21.3	18.4	14.1	23.8

In stereo matching, cost filtering methods and energy minimization algorithms are considered two different techniques. Due to their global extent, energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, in which cost filtering approaches obtain better results. In this paper we intend to combine both approaches with the aim of improving overall stereo matching results. We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.

01/21/15	TSGO	39.1	34.1	16.9	20.0	43.3	55.4	14.3	54.1	49.2	33.9	66.2	45.9	39.8	42.6	47.2	52.6

04/08/15	REAF	31.4	58.3	30.9	13.1	45.3	63.8	30.9	38.7	25.3	8.60	39.3	36.8	27.0	35.5	18.2	39.7

04/09/15	PFS	32.2	65.1	29.4	12.1	50.0	70.8	28.2	44.6	23.1	7.85	37.0	37.7	27.9	36.0	19.8	35.7

04/17/15	TMAP	17.1	20.2	4.94	8.13	12.8	30.0	14.1	27.9	20.4	5.09	31.5	23.1	20.9	19.0	18.8	18.0

This approach triangulates the polygonized SLIC segmentations of the input images and optimizes a lower-layer MRF on the resulting set of triangles defined by photo consistency and normal smoothness. The lower-layer MRF is solved by a quadratic relaxation method which iterates between PatchMatch and Cholesky decomposition. The lower-layer MRF is assisted by an upper-layer MRF defined on the set of triangle vertices, which exploits local 'visual complexity' cues and encourages smoothness of the vertices' splitting properties. The two layers interact through an alignment energy term which requires triangles sharing a non-split vertex to have their disparities agree on that vertex. Optimization of the whole model iterates between optimizations of the two layers until convergence, where the upper layer can be solved in closed form.

04/20/15	MeshStereo	13.4	5.90	4.88	10.8	12.9	10.6	13.6	12.2	9.01	5.39	27.4	23.5	17.7	21.0	15.4	20.9

Compute the matching cost with a convolutional neural network (accurate architecture). Then apply cross-based cost aggregation, semiglobal matching, left-right consistency check, median filter, and a bilateral filter. DETAILS: The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number: the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.

08/28/15	MC-CNN-acrt	8.29	5.59	4.55	5.96	2.83	11.4	8.44	8.32	8.89	2.71	16.3	14.1	13.2	13.0	6.40	11.1

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments below a size of 50 pixels are removed. (Improved results as of 9/14/2015 due to bug fix in color-to-gray conversion.)

09/14/15	ELAS	32.3	50.9	9.17	11.0	33.0	88.2	18.3	47.3	26.8	11.7	41.7	37.4	23.7	28.8	63.0	42.8

09/14/15	ELAS	27.3	43.3	12.5	13.9	23.7	66.1	20.4	33.1	20.5	11.0	43.9	37.8	26.4	28.6	38.3	33.3

09/28/15	R-NCC	48.4	26.2	14.8	30.2	30.9	72.9	41.6	77.7	64.1	27.4	59.1	71.9	50.9	33.9	78.2	80.8

The method generates multiple proposals on absolute and relative disparities from multi-segmentations. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes.

10/13/15	MDP	12.6	14.4	4.99	10.6	10.7	27.2	8.11	12.5	8.07	4.27	30.4	20.5	12.6	17.8	13.4	17.3

We post-process the depth maps produced by Zbontar & LeCun's MC-CNN technique. We use a domain transform to compute an edge-aware variance measure of our confidence in the depth map, and then run our robust bilateral solver on that depth map and confidence with a Geman-McClure loss function. The MC-CNN is computed using the publicly-available implementation (https://github.com/jzbontar/mc-cnn), which uses the GPU, and the robust bilateral solver is computed using our CPU implementation, which does not use the GPU and is written in vanilla C++.

11/04/15	MC-CNN+RBS	8.62	6.05	5.16	6.24	3.27	11.1	8.91	8.87	9.83	3.21	15.1	15.9	12.8	13.5	7.04	9.99

11/05/15	GCSVR	14.8	17.1	3.50	8.22	16.5	47.4	11.4	9.75	7.06	3.17	34.4	27.1	18.3	19.2	16.0	19.3

12/18/15	INTS	14.8	20.2	4.52	8.62	11.6	29.5	13.7	16.4	10.3	4.69	27.6	22.5	20.7	20.5	11.5	24.9

An efficient stereo matching algorithm, which applies adaptive smoothness constraints using texture and edge information, is proposed in this work. First, we determine non-textured regions, on which an input image yields flat pixel values. In the non-textured regions, we penalize depth discontinuity and complement the primary CNN-based matching cost with a color-based cost. Second, by combining two edge maps from the input image and a pre-estimated disparity map, we extract denoised edges that correspond to depth discontinuity with high probabilities. Thus, near the denoised edges, we penalize small differences of neighboring disparities. The method uses the MC-CNN code for the matching cost computation only.

01/19/16	NTDE	7.62	5.72	4.36	5.92	2.83	10.4	8.02	5.30	5.54	2.40	13.5	14.1	12.6	13.9	6.39	12.2

01/26/16	MC-CNN-fst	9.69	7.35	5.07	7.18	4.71	16.8	11.2	7.37	6.97	2.82	20.7	17.4	15.4	15.1	7.90	12.6

01/21/16	MCCNN_Layout	9.16	5.53	5.63	5.06	3.59	12.6	9.97	7.53	8.86	5.79	23.0	13.6	15.0	14.7	5.85	10.4

Our approach is an extension of the ELAS (from Geiger et al.) algorithm. We extract edges and sample our candidate support points along them. For every two consecutive valid support points we create a (straight) line segment. We force the triangulation to include the set of line segments (constrained Delaunay) for a better preservation of the disparity discontinuity at the edges.

02/18/16	LS-ELAS	36.7	53.5	10.3	15.8	37.0	83.6	24.5	49.1	34.6	13.9	44.9	45.7	34.9	29.1	64.4	62.7

The computation of the sparse disparity maps is achieved by means of a 3D diffusion of the costs contained in the disparity space volume. The watershed segmentations of the left and right views control the diffusion process and valid measurements are obtained by cross-checking. The estimation of the dense disparity maps uses the sparse measurements as control points and is driven by a 3D watershed separating the disparity space volume into foreground and background pixels.

03/15/16	MPSV	43.5	58.8	33.9	34.2	37.9	52.4	30.8	56.8	51.0	30.6	56.9	51.5	44.6	43.4	44.2	54.2

No post processing (no filtering, no hole-filling, no interpolation) performed. The concepts of intrinsic curves were revisited and used for:
- disparity search space reduction, resulting in an 83% reduction of the disparity range (individually for each pixel) directly from the original resolution of the image without needing hierarchical search
- reducing the ambiguities due to occluded pixels by integrating occlusion clues explicitly into the global energy function as a soft prior
The final energy minimization was done using a semi-global approach along eight paths.

04/03/16	ICSG	45.6	69.7	19.1	21.3	43.6	77.6	36.9	65.3	40.4	20.3	53.6	58.7	46.5	47.1	60.7	79.1

04/12/16	MeshStereoExt	7.29	4.41	3.98	5.40	3.17	10.0	8.89	4.62	4.77	3.49	12.7	12.4	10.4	14.5	7.80	8.85

04/13/16	Glstereo	19.8	20.7	8.25	13.9	18.6	34.6	12.9	25.4	23.2	8.61	31.1	30.6	20.3	25.4	21.1	22.7

04/24/16	HLSC_cor	26.4	26.5	15.2	21.0	20.5	35.7	27.7	33.1	35.0	11.9	39.1	34.2	25.2	32.8	28.3	22.7

04/28/16	JEM	37.2	35.7	27.9	30.6	33.2	43.0	31.4	49.5	47.3	26.5	49.6	46.0	35.7	30.8	37.5	55.8

A 3D label based method with global optimization at pixel level. A bilayer matching cost is employed by first matching small square windows and then aggregating over large irregular windows. Global optimization is carried out by fusing candidate proposals, which are generated from our specific superpixel structure.

05/12/16	PMSC	6.87	3.46	2.68	6.19	2.54	6.92	6.54	3.96	4.04	2.37	13.1	12.3	12.2	16.2	5.88	10.8

05/28/16	APAP-Stereo	7.46	5.43	4.91	5.11	5.17	21.6	9.50	4.31	4.23	3.24	14.3	9.78	7.32	13.4	6.30	8.46

11/06/15	SOU4P-net	13.5	23.1	5.41	6.39	13.1	30.5	11.1	16.4	12.7	3.13	28.9	17.1	16.4	16.9	10.7	14.5

07/03/16	LPU	10.5	11.4	3.18	8.10	6.08	20.9	9.84	6.94	4.00	4.04	33.9	16.9	15.2	17.8	9.12	11.6

08/31/16	SED	63.4	54.3	22.4	72.9	64.5	71.4	42.5	80.1	67.9	49.8	79.6	74.4	65.4	55.1	86.1	91.6

09/14/16	SNP-RSM	8.98	5.46	4.85	6.50	3.37	10.4	10.1	8.73	9.37	3.58	14.3	14.7	14.9	12.8	10.1	10.8

09/20/16	LFSIR	70.1	75.7	60.3	67.1	72.4	80.8	53.7	85.4	83.8	42.5	91.2	90.4	64.1	71.3	61.5	90.3

10/19/16	LW-CNN	7.23	4.65	3.95	5.30	2.63	11.2	7.86	4.32	4.22	2.43	12.2	13.4	13.6	14.8	4.72	12.0

10/23/16	SIGMRF	64.2	60.0	33.0	67.9	63.2	99.5	39.8	84.8	82.0	35.2	95.2	91.5	58.1	65.8	55.0	88.6

11/07/16	SPS	19.6	14.2	12.3	14.9	12.0	15.8	19.1	17.4	15.4	8.23	30.9	34.8	30.6	25.3	28.3	28.0

11/11/16	JMR	16.5	27.9	4.79	8.62	20.2	37.6	15.1	24.2	17.4	4.47	19.2	22.0	20.0	21.1	9.43	16.2

11/14/16	DES	11.2	7.80	4.56	10.2	5.62	9.75	10.4	9.19	8.39	4.21	30.9	17.5	16.9	17.1	12.2	15.3

11/15/16

MC-CNN-SS

12.3

14.8

7.20

11.1

7.62

15.9

14.3

11.5

9.01

3.89

19.7

20.5

16.3

12.1

18.3

11/16/16	UCNN	20.5	44.8	9.77	13.6	18.2	36.5	12.8	23.4	12.4	9.22	39.5	30.5	24.8	21.2	19.1	32.3

11/16/16	MCSC	11.3	13.3	5.96	10.6	8.69	7.22	11.3	10.6	7.48	3.07	3.10	25.2	19.0	17.2	10.3	25.5

11/25/16	ADSM	38.7	40.4	20.3	27.3	35.1	55.9	22.3	56.1	50.9	24.2	58.0	56.3	36.5	32.1	38.7	69.7

Middlebury Stereo Evaluation - Version 3
