Stereo Evaluation Version 2
Here are some of the design decisions we made for version 2
of the stereo evaluation. Feedback is welcome
(schar@middlebury).
Differences from the old table
- The main difference is that we now include the Cones and Teddy
stereo pairs, together
with two of the old ones: Tsukuba and Venus. Motivation: The
four old data sets have been virtually solved, and enough people
have published results on Cones and Teddy that we should have a good
"seed group" for a new table. So if you have published results on
Cones and Teddy that are not yet in the table, please submit your
results using the link at the top of the results page.
- The "all" column in the old table is now (more accurately)
called "nonocc". It is still the first column
for each data set, and the numbers (for thresh = 1.0) are the
same.
- We have replaced the old second column, "textureless", with the
new "all" column (which includes the half-occluded regions).
Motivation: Textureless areas no longer present a real
problem for the top algorithms, which tend to be global methods
(look at these numbers in the old table). On the other hand,
"guessing" the correct disparities in half-occluded regions is
becoming more important for several applications, including view
synthesis, so it seems important to report the performance in these
areas. We are planning to add a separate submission page for
those methods that explicitly compute occlusion maps, so that the
accuracy of these maps can be evaluated.
- The third column, "disc", is still there, but
"near discontinuities" now also means "near occluded regions".
Thus, the statistics are slightly different. Motivation: This
makes sense and is in a way more symmetric,
since the two edges of each occluded region are caused by
discontinuities in the two images, respectively. Click on the
"disc" links on Tsukuba or Venus in both old and new tables to see
the difference. Clearly, the white regions mark the areas that are
most difficult for the algorithms to solve. (A sketch of how these
per-region statistics can be computed follows this list.)
- On the two new data sets, Cones and Teddy, we no longer exclude a
border region from the evaluation. Tsukuba and Venus still have the
old border regions of 18 and 10 pixels,
respectively. Motivation: Most algorithms (except for simple
window-based implementations) can estimate disparities right up to
the borders. Those that can't will need to add some sort of
extrapolation step before submitting results (one possible approach
is sketched after this list). Of course, those border pixels that
are visible in only one image are considered occluded and are not
evaluated in the "nonocc" and "disc" categories.
- We decided to still require that methods be run with constant
parameters across all image pairs. Note that this is not the case
for all methods in the current table. In the short term, we will
just flag the methods that don't obey this requirement, but in the long
term we will ask their authors to resubmit new results with constant
parameter settings.
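
For concreteness, here is a minimal sketch (not the official
evaluation code) of how the three per-data-set statistics could be
computed, assuming binary masks for the "nonocc", "all", and "disc"
regions and a bad-pixel criterion of |d_est - d_gt| > threshold. All
variable names are hypothetical.

    import numpy as np

    def bad_pixel_percentage(d_est, d_gt, mask, threshold=1.0):
        # Percentage of pixels within `mask` whose absolute disparity
        # error exceeds `threshold`.
        bad = (np.abs(d_est - d_gt) > threshold) & mask
        return 100.0 * bad.sum() / mask.sum()

    # Hypothetical region masks for one data set:
    #   nonocc - pixels visible in both images
    #   all    - nonocc plus the half-occluded pixels
    #   disc   - pixels near depth discontinuities and occluded regions
    # stats = [bad_pixel_percentage(d_est, d_gt, m)
    #          for m in (nonocc, all_mask, disc)]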
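
And here is one possible extrapolation step for methods that leave
border disparities unestimated (an illustration of the idea, not a
prescribed method): replicate the nearest valid disparity along each
scanline.

    import numpy as np

    def extrapolate_borders(disp, invalid=0):
        # Fill unestimated pixels at the left and right borders of each
        # scanline with the nearest estimated value on that line.
        # Treating 0 as the "invalid" marker is an assumption.
        out = disp.copy()
        for row in out:                     # each row is a view into out
            valid = np.nonzero(row != invalid)[0]
            if valid.size == 0:
                continue
            row[:valid[0]] = row[valid[0]]          # left border
            row[valid[-1] + 1:] = row[valid[-1]]    # right border
        return out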
New features
- The error threshold can now be changed with a pull-down menu to the
values 0.5, 0.75, 1.0, 1.5, and 2.0. Motivation: The old fixed
error threshold of 1.0 has been problematic since it did not
distinguish between integer-based methods and those that compute
sub-pixel estimates. Even worse, on the Tsukuba images
(which only have integer ground-truth disparities)
it favored rounding disparities to integers: a sub-pixel estimate
with a disparity error of, say, 1.4 would be considered a "bad
pixel", but the same estimate rounded to the nearest integer has an
error of 1.0 and would not. The new variable error
threshold allows comparison of subpixel performance (when 0.75 or
0.5 is selected), and discourages rounding (if 0.5 or 1.5 is
selected). The default error threshold that determines the "official"
rank of each algorithm in the table is still 1.0. (A numeric
illustration of the rounding effect follows this list.)
- The table can now be re-sorted by data set and by evaluation
region by clicking on the blue arrows. The green columns indicate
the columns by which the table is currently sorted.
- The overall performance measure (the average rank over all 12
columns) is displayed in the second column of the table. Ties among
rankings are thus made explicit, and are also visualized using the
same row color. (A sketch of this measure follows this list.)
- Clicking on any link in the table now opens the output in a
separate window, which pops to the front. This makes it easier to
compare different results. By default, all results appear in this
same window; if multiple result windows are desired (to compare
results side by side), you can check the box above the table.
- Each image name now links to its image pair. You can switch back
and forth between the two views by moving the mouse over the images.
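
To illustrate the rounding effect described above, here is a small
numeric example (with hypothetical values) for a data set with
integer ground-truth disparities, such as Tsukuba:

    d_gt = 10.0             # integer ground-truth disparity
    d_sub = 11.4            # sub-pixel estimate, error 1.4
    d_round = round(d_sub)  # 11 after rounding, error 1.0

    for t in (0.5, 0.75, 1.0, 1.5, 2.0):
        print(t, abs(d_sub - d_gt) > t, abs(d_round - d_gt) > t)
    # At threshold 1.0 the sub-pixel estimate is a "bad pixel" but the
    # rounded one is not; at 0.5 and 1.5 (halfway between integer error
    # values) rounding gains nothing.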
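
Finally, a sketch of how the overall performance measure might be
computed, assuming it is the mean of the per-column ranks and that
ties receive averaged ranks (the tie handling is our assumption):

    import numpy as np
    from scipy.stats import rankdata

    def average_rank(errors):
        # errors: (num_methods, 12) array of bad-pixel percentages, one
        # column per data set / evaluation region; lower error = better.
        ranks = np.column_stack([rankdata(col, method="average")
                                 for col in errors.T])
        return ranks.mean(axis=1)  # one overall value per method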