Stereo Evaluation Version 2

Here are some of the design decisions we made for version 2 of the stereo evaluation. Feedback is welcome (schar@middlebury).

Differences from the old table

  1. The main difference is that we now include the Cones and Teddy stereo pairs, together with two of the old ones: Tsukuba and Venus. Motivation: The four old data sets have been virtually solved, and enough people have published results on Cones and Teddy that we should have a good "seed group" for a new table. So if you have published results on Cones and Teddy that are not yet in the table, please submit your results using the link at the top of the results page.

  2. The "all" column in the old table is now (more accurately) called "nonocc". It is still the first column for each data set, and the numbers (for thresh = 1.0) are the same.

  3. We have replaced the old second column, "textureless", with the new "all" column (which includes the half-occluded regions). Motivation: Textureless areas no longer present a real problem for the top algorithms, which tend to be global methods (look at these numbers in the old table). On the other hand, "guessing" the correct disparities in half-occluded regions is becoming more important for several applications, including view synthesis, so it seems important to report the performance in these areas. We are planning to add a separate submission page for methods that explicitly compute occlusion maps, so that the accuracy of these maps can be evaluated.

  4. The third column, "disc", is still there, but "near discontinuities" now also means "near occluded regions". Thus, the statistics are slightly different. Motivation: This makes sense and is in a way more symmetric, since the two edges of an occluded region are caused by the depth discontinuities in the two images, respectively. Click on the "disc" links for Tsukuba or Venus in both the old and new tables to see the difference. Clearly, the white regions mark the most difficult areas for the algorithms to get right. (A rough sketch of how such a region mask can be computed is given after this list.)

  5. On the two new data sets, Cones and Teddy, we no longer exclude a border region from the evaluation. Tsukuba and Venus still have the old border regions of 18 and 10 pixels, respectively. Motivation: Most algorithms (except for simple window-based implementations) can estimate disparities right up to the borders. Those that can't will need to add some sort of extrapolation step before submitting results (a minimal sketch of one possibility also follows this list). Of course, those border pixels that are visible in only one image are considered occluded and are not evaluated in the "nonocc" and "disc" categories.

  6. We decided to keep the requirement that each method be run with constant parameters across all image pairs. Note that this is not yet the case for all methods in the current table. In the short term, we will simply flag the methods that don't obey this requirement; in the long term, we will ask their authors to resubmit new results with constant parameter settings.
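
To make the new "disc" definition in item 4 concrete, here is a rough sketch of how such a region mask could be derived from the ground truth. The disparity-jump threshold, the dilation width, and the function name are assumptions for illustration, not the values or code used by the actual evaluation:

    # Rough sketch (not the actual evaluation code) of a "disc" region mask:
    # pixels near a disparity discontinuity or near an occluded region.
    # The jump threshold and dilation width below are assumed values.

    import numpy as np
    from scipy.ndimage import binary_dilation

    def disc_mask(gt_disp, occluded, jump=2.0, halfwidth=9):
        """gt_disp: 2-D float ground-truth disparities; occluded: 2-D bool mask."""
        # Mark pixels whose disparity jumps by more than `jump` relative to a
        # horizontal or vertical neighbor.
        disc = np.zeros(gt_disp.shape, dtype=bool)
        dx = np.abs(gt_disp[:, 1:] - gt_disp[:, :-1]) > jump
        dy = np.abs(gt_disp[1:, :] - gt_disp[:-1, :]) > jump
        disc[:, :-1] |= dx
        disc[:, 1:] |= dx
        disc[:-1, :] |= dy
        disc[1:, :] |= dy

        # New in version 2: occluded regions count as well.
        disc |= occluded

        # "Near" = within `halfwidth` pixels of any such pixel ...
        near = binary_dilation(disc, structure=np.ones((2 * halfwidth + 1,) * 2))

        # ... but occluded pixels themselves are excluded from the "disc" statistic.
        return near & ~occluded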
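
For the extrapolation step mentioned in item 5, here is a minimal sketch of one possible scheme: unassigned pixels simply copy the nearest valid disparity on their scanline. This is only an illustration under an assumed invalid-value convention, not a required or recommended procedure:

    # One possible extrapolation step: unassigned pixels copy the nearest valid
    # disparity on their scanline. The invalid-value convention is an assumption.

    import numpy as np

    def fill_unmatched(disp, invalid_value=0):
        """Replace unassigned disparities with the closest valid value on each row."""
        filled = disp.astype(float)
        height, width = filled.shape
        cols = np.arange(width)
        for y in range(height):
            row = filled[y]
            valid = np.flatnonzero(row != invalid_value)
            if valid.size == 0:
                continue  # nothing to extrapolate from on this scanline
            # Index of the closest valid column for every column.
            nearest = valid[np.argmin(np.abs(cols[:, None] - valid[None, :]), axis=1)]
            filled[y] = row[nearest]
        return filled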

New features

  1. The error threshold can now be changed with a pull-down menu to the values 0.5, 0.75, 1.0, 1.5, and 2.0. Motivation: The old, fixed error threshold of 1.0 has been problematic since it did not distinguish between integer-based methods and those that compute sub-pixel estimates. Even worse, on the Tsukuba images (whose ground-truth disparities are integer-valued) it favored rounding disparities to integers: a disparity error of, say, 1.4 would count as a "bad pixel", but rounding the disparity reduces the error to 1.0, which just passes the threshold. The new variable error threshold allows comparison of sub-pixel performance (when 0.75 or 0.5 is selected) and discourages rounding (when 0.5 or 1.5 is selected). The default error threshold that determines the "official" rank of each algorithm in the table is still 1.0. (A sketch of the underlying bad-pixel statistic is given after this list.)

  2. The table can now be resorted by data sets and different evaluation regions by clicking on the blue arrows. The green columns in the table indicate the columns by which the table is currently being sorted.

  3. The overall performance measure (the average rank over all 12 columns) is displayed in the second column of the table. Ties among rankings are thus made explicit, and are also visualized using the same row color. (A sketch of this computation is also given after this list.)

  4. Clicking on any of the links in the table now displays the corresponding result in a separate window, which pops to the front. This makes it easier to compare the different results. By default, the results always appear in the same (single) result window. If multiple result windows are desired (to compare results side by side), you can check the box above the table.

  5. The image names now link to the image pairs. You can switch back and forth between the two views by moving the mouse over the images.
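
For reference, here is a sketch of the bad-pixel statistic behind the threshold menu described in item 1 above. The function and argument names are mine, and the actual evaluation script may differ in its details:

    # Sketch of the bad-pixel statistic behind the threshold menu: a pixel is
    # "bad" when its disparity error exceeds the selected threshold.

    import numpy as np

    def percent_bad_pixels(disp, gt_disp, eval_mask, threshold=1.0):
        """Percentage of bad pixels within eval_mask ("nonocc", "all", or "disc")."""
        err = np.abs(disp - gt_disp)
        bad = (err > threshold) & eval_mask
        return 100.0 * bad.sum() / eval_mask.sum()

    # Why sub-pixel thresholds matter with integer ground truth (e.g., Tsukuba):
    # an estimate of 5.6 against a true disparity of 7 has error 1.4 and is bad
    # at threshold 1.0, but rounding it to 6 reduces the error to exactly 1.0,
    # which passes. At thresholds of 0.5 or 1.5, rounding no longer pays off.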
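
Similarly, here is a minimal sketch of the average-rank measure from item 3 above; the tie-handling convention shown is an assumption for illustration:

    # Sketch of the overall measure: rank each method within each of the
    # 12 columns (4 data sets x 3 regions), then average the ranks per method.
    # Tie handling (tied scores share the minimum rank) is an assumption.

    import numpy as np

    def average_ranks(scores):
        """scores: (num_methods, 12) array of error percentages, lower is better."""
        num_methods, num_cols = scores.shape
        ranks = np.empty_like(scores, dtype=float)
        for c in range(num_cols):
            col = scores[:, c]
            # Rank = 1 + number of strictly better (smaller) scores in this column.
            ranks[:, c] = 1 + (col[None, :] < col[:, None]).sum(axis=1)
        return ranks.mean(axis=1)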