Martin, D.R., Fowlkes, C., Walker, L., Malik, J. (2003). Local Boundary Detection in Natural Images: Matching Human and Machine Performance. European Conference on Visual Perception, Paris, France.
Local boundary detection remains an important problem in vision. Physiology shows that V1 extracts complex boundaries [Lee/et.al.,1998,Vis.Res.38]. Psychophysics shows that human subjects localize boundaries using multiple cues [Rivest/Cavanagh,1996,Vis.Res.36]. Our work in computer vision [Martin/et.al.,2002,NIPS] shows how to formulate and combine local boundary cues in natural images. To determine the quality of computational models, we need a precise characterization of human performance on the local boundary detection task. METHODS: A large dataset of ~1000 natural images (480x320), each segmented by ~10 human observers, provides ground-truth on/off-boundary pixel labels. We extracted 100 on-boundary and 100 off-boundary sample pixels, and presented patches of radius {9,18,36} centered at the samples to subjects in 200 ms exposures. The r=9 patches subtended 0.85 degrees. Subjects were asked whether an object boundary passed through the patch center. We also evaluated the machine model at each sample; its input is a radius 9 patch, and its output is the estimated probability of a boundary. We evaluate performance using precision-recall (PR) curves, which are similar to ROC curves and can be summarized by the F-measure, the harmonic mean of precision and recall. RESULTS: We find that the precision and recall of human subjects for radius 9 patches fall directly on the PR curve given by the machine boundary model, both with F=78%. Human performance increases with r={18,36} patches to F={83%,85%}. The classification error is 21% for the machine detector, and {23%,18%,15%} for the human subjects at r={9,18,36}. We conclude that the current state of the art in computational local boundary detection on natural images matches human performance, and is therefore optimal by this measure.
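The evaluation measure described above can be illustrated with a minimal sketch. It computes precision, recall, and the F-measure (their harmonic mean) for a detector that outputs a boundary probability, sweeping a decision threshold to trace a PR curve. A subject who gives yes/no answers corresponds to a single (precision, recall) point rather than a curve. The function names, thresholds, and toy data below are hypothetical and chosen for illustration only; they are not the data or the code used in the study.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pr_curve(probs, labels, thresholds):
    """Return one (threshold, precision, recall, F) tuple per threshold.

    probs  : estimated boundary probability for each sample
    labels : 1 for an on-boundary sample, 0 for an off-boundary sample
    """
    curve = []
    for t in thresholds:
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        prec, rec = precision_recall(tp, fp, fn)
        curve.append((t, prec, rec, f_measure(prec, rec)))
    return curve


# Hypothetical toy data: six samples, three on-boundary and three off-boundary.
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

curve = pr_curve(probs, labels, thresholds=[0.25, 0.5, 0.75])
for t, prec, rec, f in curve:
    print(f"threshold={t:.2f}  precision={prec:.3f}  recall={rec:.3f}  F={f:.3f}")
```

Reporting the maximum F over thresholds is one common way to reduce such a curve to a single number; whether the abstract's F values were obtained this way is not stated here.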