Middlebury Stereo Evaluation

ELAS update

Snapshot of 9/14/2015, taken before the ELAS results were updated. The old ELAS results were worse because of a bug in the conversion to grayscale images (done with ImageMagick's "convert" under Linux). The conversion is now done by the code in SDK 1.6.


Set:	test dense, test sparse, training dense, training sparse
Metric:	bad 0.5, bad 1.0, bad 2.0, bad 4.0, avgerr, rms, A50, A90, A95, A99, time, time/MP, time/GD
Mask:	nonocc, all



bad 2.0 (%)
Mask: nonocc

Dataset	Austr	AustrP	Bicyc2	Class	ClassE	Compu	Crusa	CrusaP	Djemb	DjembL	Hoops	Livgrm	Nkuba	Plants	Stairs
MP	5.6	5.6	5.6	5.7	5.7	1.5	5.5	5.5	5.7	5.7	5.7	5.9	5.5	5.6	5.2
nd	290	290	250	610	610	256	800	800	320	320	410	320	570	320	450

Date	Name	Res	Avg	Austr	AustrP	Bicyc2	Class	ClassE	Compu	Crusa	CrusaP	Djemb	DjembL	Hoops	Livgrm	Nkuba	Plants	Stairs
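
As a reading aid for the table, here is a minimal sketch of a bad-pixel metric, assuming "bad 2.0" means the percentage of evaluated pixels whose absolute disparity error exceeds 2.0 pixels and that the "nonocc" mask selects the non-occluded pixels. The function name and the list-of-lists layout are illustrative and are not taken from the evaluation SDK.

```python
def bad_pixel_percentage(disp, gt, mask, threshold=2.0):
    """Percentage of evaluated pixels with |disp - gt| > threshold.

    disp, gt: 2-D lists of disparities (same shape).
    mask: 2-D list of booleans; only pixels marked True are evaluated
          (assumed here to be the non-occluded pixels).
    """
    bad = 0
    total = 0
    for row_d, row_g, row_m in zip(disp, gt, mask):
        for d, g, m in zip(row_d, row_g, row_m):
            if not m:
                continue  # pixel excluded by the mask
            total += 1
            if abs(d - g) > threshold:
                bad += 1
    return 100.0 * bad / total if total else 0.0


# Tiny example: 5 evaluated pixels, 2 of them off by more than 2.0.
disp = [[10.0, 12.5, 30.0], [5.0, 5.0, 5.0]]
gt = [[10.5, 10.0, 30.0], [5.0, 9.0, 5.0]]
mask = [[True, True, True], [True, True, False]]
print(bad_pixel_percentage(disp, gt, mask))
```

How the actual evaluation treats invalid pixels in sparse results is not covered by this sketch.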

OpenCV's "semi-global block matching" method; memory-intensive 2-pass version, which can only handle the quarter-size images. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.

07/25/14	SGBM2		26.6	27.9	12.1	17.8	13.7	74.5	16.2	30.3	26.3	11.0	64.4	37.9	25.8	25.3	29.3	43.7
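
The "dynamic programming along paths" step in the SGBM descriptions can be illustrated with the standard semi-global matching recurrence along a single path. This is a sketch of the general technique, not OpenCV's implementation; the penalties `p1` and `p2` and the toy cost values are assumptions chosen for illustration.

```python
def aggregate_along_path(costs, p1=1.0, p2=8.0):
    """Aggregate matching costs along one path (e.g. one scanline).

    costs[x][d] is the pixelwise matching cost at position x and disparity d.
    p1 penalizes a disparity change of 1 between neighbors, p2 any larger jump.
    """
    aggregated = [list(costs[0])]  # the first pixel has no predecessor
    for x in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d, cost in enumerate(costs[x]):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < len(prev):
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values from growing along the path.
            row.append(cost + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


toy_costs = [[0, 5, 5], [5, 5, 0], [0, 5, 5]]
print(aggregate_along_path(toy_costs))
```

A full method sums such aggregated costs over several path directions (8 for SGBM2, 5 for SGBM1 according to the descriptions) and picks the disparity with the lowest total per pixel.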

OpenCV's "semi-global block matching" method; memory-efficient single-pass version. The matching cost is the sum of absolute differences over small windows. Aggregation is performed by dynamic programming along paths in only 5 of 8 directions. Post filter as implemented in OpenCV. Dense results are created by hole-filling along scanlines.

07/25/14	SGBM1		28.1	28.3	17.2	19.0	14.5	57.9	18.2	31.8	31.4	13.2	58.6	38.6	27.0	25.9	31.4	59.7

The images are Census transformed and the Hamming distance is used as pixelwise matching cost. Aggregation is performed by a kind of dynamic programming along 8 paths that go from all directions through the image. Small disparity patches are invalidated. Interpolation is also performed along 8 paths.

07/25/14	SGM		21.2	35.5	9.57	13.8	16.5	32.1	23.3	25.8	16.7	8.95	39.8	31.1	22.6	20.7	21.3	32.2
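
The Census transform and Hamming-distance cost mentioned in the SGM description can be sketched as follows. The 3x3 window (`radius=1`) and the bit ordering are assumptions for illustration; the description does not state the window size used.

```python
def census_transform(image, radius=1):
    """Encode each pixel as a bit string of comparisons with its neighbors.

    A bit is 1 where the neighbor is darker than the center pixel.
    Border pixels without a full window are left as 0 in this sketch.
    """
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            center = image[y][x]
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    bit = 1 if image[y + dy][x + dx] < center else 0
                    code = (code << 1) | bit
            codes[y][x] = code
    return codes


def hamming_distance(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(census_transform(image)[1][1])                  # code of the center pixel
print(hamming_distance(0b11110000, 0b11000011))       # pixelwise matching cost
```

Because the code depends only on the ordering of intensities, adding a constant brightness offset to the image leaves it unchanged.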

A prior disparity image is calculated by matching a set of reliable support points and triangulating between them. A maximum a-posteriori approach refines the disparities. The disparities for the left and right image are checked for consistency, and disparity segments smaller than 50 pixels are removed.

07/25/14	ELAS		33.1	49.3	8.96	14.5	35.1	98.0	16.4	47.4	24.2	14.9	49.7	39.7	26.2	30.1	60.8	34.6
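
The left-right consistency check mentioned in the ELAS description can be sketched as below, assuming integer disparities, the usual convention that left pixel x with disparity d matches right pixel x - d, and a tolerance of one disparity level. The `INVALID` marker and `max_diff` value are illustrative assumptions, not ELAS parameters.

```python
INVALID = -1  # hypothetical marker for pixels without a valid disparity


def left_right_check(disp_left, disp_right, max_diff=1):
    """Invalidate left disparities that the right disparity map does not confirm."""
    checked = []
    for row_l, row_r in zip(disp_left, disp_right):
        out = []
        for x, d in enumerate(row_l):
            xr = x - d  # column of the matching pixel in the right image
            consistent = (
                d >= 0
                and 0 <= xr < len(row_r)
                and row_r[xr] >= 0
                and abs(row_r[xr] - d) <= max_diff
            )
            out.append(d if consistent else INVALID)
        checked.append(out)
    return checked


print(left_right_check([[0, 1, 1, 3]], [[1, 1, 0, 0]]))
```

Pixels that fail the check are typically occlusions or mismatches; removing small segments afterwards is a separate step not shown here.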

07/25/14	SGBM1		28.4	43.5	9.09	13.6	25.9	82.0	14.4	43.4	30.3	5.98	59.3	45.8	28.5	24.9	20.1	45.9

07/28/14	ELAS		29.3	41.8	11.1	16.6	27.9	96.0	19.8	31.6	21.4	14.1	50.8	40.2	29.2	29.7	32.3	37.3

07/28/14	SGBM1		24.0	32.9	10.8	13.6	16.2	71.2	14.7	26.6	23.0	5.83	53.8	39.2	25.6	22.8	18.8	47.4

07/28/14	SGM		18.7	40.3	4.54	8.03	22.9	40.5	14.6	24.7	10.1	5.40	29.6	28.5	23.9	20.0	14.2	30.9

07/28/14	SGM		25.3	45.1	4.33	6.87	32.2	50.0	13.0	48.1	18.3	7.66	29.6	36.1	31.2	24.2	24.5	50.2

Correlation with five partly overlapping windows on Census-transformed images, using Hamming distance as matching cost. A left-right consistency check ensures unique matches, and filtering small disparity segments removes outliers. Interpolation is done within image rows using the lowest valid neighboring disparity.

07/28/14	Cens5		26.9	47.1	8.74	11.9	25.6	45.3	22.6	40.6	29.0	9.93	36.5	38.6	31.0	25.0	25.6	44.6
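
The row-wise interpolation in the Cens5 description, which fills holes with the lowest valid neighboring disparity, might look like the following sketch. The `invalid` marker and the handling of holes at the row ends (use the single available neighbor) are assumptions.

```python
def fill_row_with_lowest_neighbor(row, invalid=-1):
    """Fill each run of invalid pixels with the lower of its two valid neighbors."""
    filled = list(row)
    n = len(row)
    x = 0
    while x < n:
        if row[x] != invalid:
            x += 1
            continue
        start = x
        while x < n and row[x] == invalid:
            x += 1  # advance to the end of the hole
        left = row[start - 1] if start > 0 else None
        right = row[x] if x < n else None
        neighbors = [v for v in (left, right) if v is not None]
        if neighbors:  # a row with no valid pixel at all stays unfilled
            value = min(neighbors)
            for i in range(start, x):
                filled[i] = value
    return filled


print(fill_row_with_lowest_neighbor([-1, 5, -1, -1, 9, -1]))
```

Choosing the lower disparity reflects the common reasoning that holes are mostly occluded regions, which belong to the more distant surface.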

A fast method for high-resolution stereo matching without exploring the full search space. Plane hypotheses are generated from sparse feature matches. Around each plane, a local plane sweep with +/- 3 disparity levels is performed to establish local disparity hypotheses via SGM using NCC matching costs. Finally, each pixel is assigned to one hypothesis using global optimization, again using SGM.

08/25/14	LPS		19.4	6.14	5.34	9.24	7.53	96.0	15.0	9.61	9.40	5.18	92.4	27.4	24.3	23.0	10.0	25.6
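
The NCC (normalized cross-correlation) matching cost named in the LPS description can be sketched for two flattened patches as follows. The function name and the choice to return 0.0 for textureless patches are assumptions; the description gives no such details.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized, flattened patches.

    Returns a value in [-1, 1]; 1 means the patches agree up to gain and offset.
    """
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    var_a = sum((a - mean_a) ** 2 for a in patch_a)
    var_b = sum((b - mean_b) ** 2 for b in patch_b)
    denominator = math.sqrt(var_a * var_b)
    # A constant patch has zero variance; treat it as uncorrelated.
    return numerator / denominator if denominator > 0 else 0.0


print(ncc([1, 2, 3, 4], [2, 4, 6, 8]))  # same pattern, different gain
print(ncc([1, 2, 3, 4], [4, 3, 2, 1]))  # inverted pattern
```

A matching cost is usually derived from the score, for example as 1 minus the NCC value, so that lower means better.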

08/27/14

LPS

20.3

6.72

6.06

9.72

9.87

94.3

14.1

11.2

5.88

89.3

36.0

20.5

23.8

16.0

25.4

08/31/14	BSM		41.5	59.8	25.8	27.9	38.9	60.6	33.3	46.9	37.3	26.3	64.8	51.5	42.6	45.2	42.8	66.6

09/10/14	LAMC_DSM		26.2	55.8	11.9	14.3	18.3	44.0	21.3	39.9	29.5	6.67	31.1	34.5	28.8	26.3	30.1	35.7

09/18/14	SNCC		22.2	48.6	6.98	9.79	25.7	46.0	15.2	36.8	16.6	7.25	23.1	34.2	26.7	21.8	19.9	28.4

10/07/14	IDR		18.4	37.5	4.08	7.49	23.3	40.6	15.7	24.5	11.3	5.46	33.1	26.0	21.5	21.7	15.3	21.2

11/12/14	LCU		17.0	24.7	7.59	11.6	11.9	27.9	14.0	19.3	15.8	8.10	36.1	29.1	21.3	18.4	14.1	23.8

We propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model.

01/21/15	TSGO		39.1	34.1	16.9	20.0	43.3	55.4	14.3	54.1	49.2	33.9	66.2	45.9	39.8	42.6	47.2	52.6

04/08/15	REAF		31.4	58.3	30.9	13.1	45.3	63.8	30.9	38.7	25.3	8.60	39.3	36.8	27.0	35.5	18.2	39.7

04/09/15	PFS		32.2	65.1	29.4	12.1	50.0	70.8	28.2	44.6	23.1	7.85	37.0	37.7	27.9	36.0	19.8	35.7

04/17/15	TMAP		17.1	20.2	4.94	8.13	12.8	30.0	14.1	27.9	20.4	5.09	31.5	23.1	20.9	19.0	18.8	18.0

04/20/15	MeshStereo		13.4	5.90	4.88	10.8	12.9	10.6	13.6	12.2	9.01	5.39	27.4	23.5	17.7	21.0	15.4	20.9

Compute the matching cost with a convolutional neural network. Then apply cross-based cost aggregation, semiglobal matching, a left-right consistency check, a median filter, and a bilateral filter. DETAILS: The network is similar to the one described in our CVPR paper, differing only in the values of some hyperparameters. The inputs to the network are two 11 x 11 image patches. Five convolutional layers with 3 x 3 kernels and 112 feature maps extract feature vectors from the input image patches. The two 112-length feature vectors are concatenated into a 224-length vector, which is passed through three fully-connected layers with 384 units each. The final (fourth) fully-connected layer projects the output to a single number, the matching cost. One important addition was the use of data augmentation techniques to increase the size of the training set. We tried to use as much training data as possible. Therefore we combined all of the 2001, 2003, 2005, 2006, and 2014 Middlebury datasets, obtaining 60 image pairs. For the newer datasets (2005, 2006, and 2014) we also used several illumination and exposure settings.

08/28/15	MC-CNN		8.29	5.59	4.55	5.96	2.83	11.4	8.44	8.32	8.89	2.71	16.3	14.1	13.2	13.0	6.40	11.1
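
The layer sizes in the MC-CNN description can be checked with a little arithmetic. The sketch below assumes unpadded, stride-1 convolutions, which the description does not state explicitly but which is consistent with an 11 x 11 patch being reduced to a single 112-length feature vector after five 3 x 3 layers.

```python
def valid_conv_sizes(size, kernel=3, layers=5):
    """Spatial size after each unpadded, stride-1 convolution (an assumption)."""
    sizes = [size]
    for _ in range(layers):
        size = size - kernel + 1
        sizes.append(size)
    return sizes


spatial = valid_conv_sizes(11)             # 11 -> 9 -> 7 -> 5 -> 3 -> 1
feature_len = 112 * spatial[-1] ** 2       # 112 feature maps at 1 x 1
concat_len = 2 * feature_len               # left and right patch features joined
fc_widths = [concat_len, 384, 384, 384, 1]  # three hidden layers, then one output

print(spatial)
print(concat_len)
print(fc_widths)
```

Under this assumption each convolution trims one pixel from every border, so five layers consume exactly the 10 extra pixels of an 11 x 11 patch.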

Middlebury Stereo Evaluation - Version 3
