Computer Vision Research
WHAT IS COMPUTER VISION?
Computer vision (also known as machine vision) is a branch of artificial intelligence that strives to make computers “see” like humans. Everyday tasks such as recognizing objects, discerning shapes, colors and motions, and interpreting scenes are effortless for people with normal vision but very challenging for a computer.
Much of the challenge of computer vision arises from the ambiguity of local image information (at the scale of individual pixels), which must be interpreted in the larger context of the scene (see figure below). Resolving these local ambiguities requires a framework for combining multiple information sources and for representing the uncertainty inherent in this information. One such framework is Bayesian probability, which provides a way of quantifying knowledge and uncertainty about variables, and prescribes rules for inferring likely values of unknown variables from measured data. When applied to computer vision problems, Bayesian methods provide inferences about a scene within the context of all relevant evidence, rather than making premature (and often incorrect) decisions at a local scale.
Figure: Local image information is very ambiguous when presented out of context: what is this a picture of?
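As a toy illustration of the Bayesian combination of evidence described above, the following sketch computes a posterior over what a local patch depicts, first from the local evidence alone and then with a contextual cue added. The labels, the probabilities and the assumption that the two cues are conditionally independent given the label are all made up for illustration; they are not taken from any of the models described on this page.

```python
def posterior(prior, *likelihoods):
    """Combine a prior with conditionally independent likelihoods via Bayes' rule."""
    unnormalized = {}
    for label, p in prior.items():
        for likelihood in likelihoods:
            p *= likelihood[label]
        unnormalized[label] = p
    z = sum(unnormalized.values())
    return {label: p / z for label, p in unnormalized.items()}


# Hypothetical numbers: the local patch alone cannot tell the two labels apart,
# but the surrounding scene strongly favours one of them.
prior = {"wheel": 0.5, "plate": 0.5}
local_patch = {"wheel": 0.5, "plate": 0.5}    # P(patch appearance | label)
scene_context = {"wheel": 0.9, "plate": 0.1}  # P(scene context | label)

local_only = posterior(prior, local_patch)
with_context = posterior(prior, local_patch, scene_context)
print(local_only)
print(with_context)
```

With the local cue alone the posterior stays at the prior; adding the contextual cue shifts it toward the label that is consistent with the rest of the scene.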
RESEARCH AREAS
1.) Graphical Models for Representing, Finding and Matching Shapes
Dynamic Quantization
Dynamic quantization (DQ) is an extension to belief propagation (BP) for deformable template matching which speeds up the search process, allowing the detection of deformable shapes which are partially occluded in a cluttered background. DQ is an extension of standard pruning techniques which allows BP to adaptively add as well as subtract states as needed. Since DQ allows BP to focus on more probable regions of the image, the state space can be adaptively enlarged to include locations where features are occluded, without the computational burden of representing all possible pixel locations. The combination of BP and DQ yields deformable templates that are both fast and robust to significant occlusions, without requiring any user initialization.
Figure: Graphical model algorithm applied to finding cat head shape in an image.
Loopy Belief Propagation for Finding Deformable Shapes in Natural Images
A deformable template algorithm, based on the belief propagation algorithm used for inference on graphical models, is used to find shapes in natural images automatically (without any user initialization). The algorithm is based on a Bayesian graphical model of shapes (e.g. the letter "A" or the contour of an open hand) and edge statistics, and accommodates arbitrary rotation, translation and a variety of shape deformations. Empirically we find that it converges even in the presence of loops (cycles) in the Bayesian graphical model. Using belief propagation thus removes a serious restriction imposed in related earlier work, in which the matching was performed by dynamic programming and required the graphical model to be tree-shaped (i.e. without loops).
The standard belief propagation algorithm is augmented by a pruning procedure and a novel technique, inspired by the 20 Questions (divide-and-conquer) search strategy, called "focused message updating." These modifications boost the speed of convergence by over an order of magnitude compared with standard belief propagation, resulting in an algorithm that detects and localizes shapes in grayscale images in as little as several seconds.
J. Coughlan and S. J. Ferreira. "Finding Deformable Shapes using Loopy Belief Propagation." In proceedings of ECCV (European Conference on Computer Vision) 2002. pdf
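To make the message-passing idea concrete, here is a minimal sketch of min-sum belief propagation on a chain-shaped template. This is the loop-free special case, where it coincides with dynamic programming. It is not the loopy algorithm of the paper, and it includes neither pruning nor focused message updating. The cost tables, the 1-D candidate positions and the function name `min_sum_chain` are assumptions made for illustration.

```python
def min_sum_chain(unary, pairwise):
    """Min-sum belief propagation on a chain of template points.

    unary[i][s]      : cost of placing template point i at candidate state s
    pairwise(i, s, t): deformation cost for point i at s and point i+1 at t
    Returns (lowest total cost, best state for each point).
    """
    n = len(unary)
    # messages[i][t]: best cost of points 0..i-1 given that point i is at state t
    messages = [[0.0] * len(unary[0])]
    backpointers = []
    for i in range(n - 1):
        message, pointer = [], []
        for t in range(len(unary[i + 1])):
            costs = [messages[i][s] + unary[i][s] + pairwise(i, s, t)
                     for s in range(len(unary[i]))]
            best = min(range(len(costs)), key=costs.__getitem__)
            message.append(costs[best])
            pointer.append(best)
        messages.append(message)
        backpointers.append(pointer)

    final = [messages[-1][t] + unary[-1][t] for t in range(len(unary[-1]))]
    state = min(range(len(final)), key=final.__getitem__)
    total = final[state]

    # Trace the best configuration back from the last point to the first.
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return total, path[::-1]


# Hypothetical example: three template points, four candidate positions (0..3)
# on a line, and a deformation cost that prefers unit spacing between neighbours.
unary = [[0, 2, 2, 2],
         [2, 0.5, 2, 0],
         [2, 2, 0, 2]]
spacing_cost = lambda i, s, t: abs((t - s) - 1)

total, path = min_sum_chain(unary, spacing_cost)
print(total, path)
```

In the example, the second point has a slightly better local match at position 3, but the deformation cost makes position 1 the better global choice. On a graph with loops the same kind of message update is iterated rather than run in a single sweep, and convergence is then an empirical matter, as noted above.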
2.) Image Statistics
The g Factor: Relating Distributions on Features to Distributions on Images
Markov Random Field (MRF) distributions have been used to model images in terms of image features such as filter statistics (e.g. Minimax Entropy Learning devised by Zhu, Wu and Mumford for texture modeling). This type of MRF model can be formulated either as the maximum entropy distribution on images whose mean filter histograms match empirically measured target histograms (obtained by averaging over a dataset of representative images), or equivalently as the maximum likelihood distribution (whose form is dictated by the choice of filters) given the empirical histograms.
Our work on the g factor explores the connection between MRF distributions on images and the induced distributions in feature space (i.e. the space of possible image filter statistics). This connection motivates the multinomial approximation, which allows us to estimate the MRF clique potentials quickly (and provides insight into their form), sheds light on the coupling between different filters, and furnishes an estimate of the information provided by different filters.
We also establish a basic connection between the multinomial approximation and Generalized Iterative Scaling (GIS, Darroch and Ratcliff 1972), which is an iterative algorithm that is guaranteed to converge to the correct maximum likelihood estimate of the MRF clique potentials. The multinomial approximation of the MRF clique potentials turns out to be equivalent to performing one iteration of GIS.
J. Coughlan and A.L. Yuille. “The g Factor: Relating Distributions on Features to Distributions on Images.” Neural Information Processing Systems (NIPS ‘01).
It is also possible to improve upon the multinomial approximation by continuing to iterate GIS and evaluating the necessary expectations using a Bethe-Kikuchi approximation:
J. Coughlan and A.L. Yuille. “Algorithms from Statistical Physics for Generative Models of Images.” Image and Vision Computing (IVC) Special issue on Generative-Model Based Vision. Vol. 21/1, pp.29 - 36. 2003. pdf
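The GIS update referred to above can be sketched on a toy problem. The example below fits a maximum entropy model over all 3-pixel binary "images" so that the expected feature values match target values. The tiny state space, the two features, the target values and the zero initialization of the potentials are assumptions chosen for illustration. Expectations are computed here by exhaustive enumeration, which is only feasible for toy problems; the Bethe-Kikuchi approximation mentioned above is not implemented.

```python
import itertools
import math


def expectations(states, features, lam):
    """Expected feature values under p(x) proportional to exp(sum_k lam[k] * f_k(x))."""
    weights = [math.exp(sum(l * f for l, f in zip(lam, features(x))))
               for x in states]
    z = sum(weights)
    return [sum(w * features(x)[k] for w, x in zip(weights, states)) / z
            for k in range(len(lam))]


def gis(states, features, target, iterations):
    """Generalized Iterative Scaling, starting from zero potentials."""
    # GIS requires the features to sum to the same constant for every state.
    c = sum(features(states[0]))
    lam = [0.0] * len(target)
    for _ in range(iterations):
        model = expectations(states, features, lam)
        lam = [l + math.log(t / m) / c for l, t, m in zip(lam, target, model)]
    return lam


# Toy "images": all binary strings of length 3.
states = list(itertools.product([0, 1], repeat=3))
# Toy features: number of on pixels and number of off pixels (they sum to 3).
features = lambda x: (sum(x), len(x) - sum(x))
# Hypothetical target statistics: on average 2.1 pixels on and 0.9 off.
target = [2.1, 0.9]

one_step = gis(states, features, target, iterations=1)
converged = gis(states, features, target, iterations=200)
print(one_step, expectations(states, features, one_step))
print(converged, expectations(states, features, converged))
```

A single iteration gives potentials that are only approximately right, and further iterations move the model expectations onto the targets.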
Manhattan World: Interpreting the three-dimensional layout of cluttered urban scenes
A Bayesian model and algorithm are used to estimate camera orientation in an indoor or urban environment. A typical scene in such an environment contains many x-, y- and z-lines, which may be used to estimate the camera orientation relative to the xyz axes. Our model requires only a single input image and, since it does not rely on edge grouping processes (e.g. Hough transform), it can succeed even in the absence of continuous x-, y- and z-lines.
The Manhattan model can also be applied to images, such as rural scenes, which do not have strictly Manhattan structure, but nevertheless have one or more dominant vanishing points. In addition, the Manhattan model can be compared with a null hypothesis model assuming no 3-D structure, yielding an estimate of how much Manhattan structure is contained in an image. Details on these results are in:
J. Coughlan and A.L. Yuille. “
Also see powerpoint presentation on Manhattan world: pdf
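The geometric idea behind using x-, y- and z-lines to estimate camera orientation can be sketched in a heavily simplified form. The code below is not the Bayesian model described above. It assumes a pinhole camera with the principal point at the image origin, a known focal length and rotation about the vertical axis only, and it scores each candidate angle by how well the edge orientations at individual pixels point toward one of the three vanishing points. No edge grouping is used. All function names and numbers are hypothetical.

```python
import math


def vanishing_points(yaw, focal):
    """Homogeneous vanishing points (X, Y, W) of the world x, y and z axes
    for a pinhole camera rotated by `yaw` about the vertical (y) axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    axes_in_camera = [(c, 0.0, -s), (0.0, 1.0, 0.0), (s, 0.0, c)]
    return [(focal * x, focal * y, z) for x, y, z in axes_in_camera]


def predicted_angle(pixel, vp):
    """Orientation at `pixel` of the line joining it to the vanishing point.
    Works for vanishing points at infinity (W == 0) as well."""
    u, v = pixel
    X, Y, W = vp
    return math.atan2(Y - v * W, X - u * W)


def angle_diff(a, b):
    """Difference between two line orientations, which are defined modulo pi."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def misfit(yaw, focal, edges):
    """Sum over edges of the angular error to the best-matching axis."""
    vps = vanishing_points(yaw, focal)
    return sum(min(angle_diff(angle, predicted_angle(pixel, vp)) for vp in vps)
               for pixel, angle in edges)


def estimate_yaw(focal, edges):
    """Grid search over yaw in roughly [-pi/4, pi/4]; a yaw shifted by pi/2
    merely swaps the roles of the x and z axes."""
    candidates = [k / 100 for k in range(-78, 79)]
    return min(candidates, key=lambda yaw: misfit(yaw, focal, edges))


# Synthetic edges: (pixel, axis index) pairs rendered at a known yaw of 0.3 rad.
focal, true_yaw = 500.0, 0.3
labelled = [((100, 50), 0), ((-80, 120), 2), ((30, -60), 1),
            ((-150, -40), 2), ((60, 90), 0)]
true_vps = vanishing_points(true_yaw, focal)
edges = [(pixel, predicted_angle(pixel, true_vps[axis])) for pixel, axis in labelled]

estimate = estimate_yaw(focal, edges)
print(estimate)
```

A probabilistic treatment would replace the hard minimum over axes with a distribution over edge labels (including an outlier label for edges that follow none of the axes) and would search over full 3-D camera orientation.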
Edge statistics
All algorithms for finding edges in images look for some sort of image discontinuity that signals the presence of an object boundary. Rather than imposing an idealized model of what an edge should be, such as a step-edge plus Gaussian noise, we adopt a Bayesian procedure of learning the statistics of image properties both on and off object boundaries. This learning is done on large image databases that have been manually segmented. Given any filter applied to the images, such as the magnitude of the image gradient, two distributions are learned using histograms: Pon, the distribution of filter responses on edges, and Poff, the distribution off edges. The same technique can be applied using several filter responses simultaneously, e.g. image gradients at multiple scales, yielding Pon and Poff distributions that optimally combine multiple cues and which are very effective for local edge detection.
S. Konishi, A.L. Yuille, J. Coughlan and S.C. Zhu. “Statistical Edge Detection: Learning and Evaluating Edge Cues.” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 25, No. 1, pp. 57-74. January 2003. pdf
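The histogram learning described above can be sketched for a single filter as follows. The filter responses, bin edges and pseudocount are made-up values for illustration, and the helper names are hypothetical. A real system would learn from a large, manually segmented image database and could use joint histograms over several filters.

```python
import math


def bin_index(response, bin_edges):
    """Index of the histogram bin containing `response` (clamped to the range)."""
    for i in range(len(bin_edges) - 1):
        if response < bin_edges[i + 1]:
            return i
    return len(bin_edges) - 2


def learn_histogram(responses, bin_edges, pseudocount=1.0):
    """Normalized histogram of filter responses; the pseudocount avoids empty bins."""
    counts = [pseudocount] * (len(bin_edges) - 1)
    for response in responses:
        counts[bin_index(response, bin_edges)] += 1
    total = sum(counts)
    return [count / total for count in counts]


def log_likelihood_ratio(response, p_on, p_off, bin_edges):
    """log(Pon / Poff) for one response; positive values favour an edge."""
    b = bin_index(response, bin_edges)
    return math.log(p_on[b] / p_off[b])


# Hypothetical gradient magnitudes sampled on and off labelled boundaries.
on_edge = [8, 9, 12, 15, 7, 11]
off_edge = [1, 2, 0.5, 3, 1.5, 6, 2.5, 1]
bin_edges = [0, 4, 8, 16]

p_on = learn_histogram(on_edge, bin_edges)
p_off = learn_histogram(off_edge, bin_edges)

for response in (2, 10):
    print(response, log_likelihood_ratio(response, p_on, p_off, bin_edges))
```

Thresholding the log-likelihood ratio gives a local edge detector; the threshold trades off false positives against missed edges.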
3.) Fundamental Limits in Visual Inference
Order Parameters
What fundamental limits exist in performing visual inference? When is there enough information in an image to solve a visual task? And what are the consequences for algorithms that perform visual tasks? These questions are addressed for visual search tasks in terms of order parameters, which summarize the difficulty of correctly locating contours in cluttered/noisy backgrounds.
The following are links to three papers on order parameters. The first addresses the fundamental limits on visual inference, without regard to algorithmic considerations; the second explores the effects of performing inference with sub-optimal prior models; and the third analyzes the computational complexity of the A* tree search algorithm in terms of order parameters.
A.L. Yuille and J. Coughlan. “Visual Search: Fundamental Bounds, Order Parameters, and Phase Transitions.” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 22, No. 2, pp. 1-14. February 2000. pdf
A.L. Yuille and J. Coughlan. “High-Level and Generic Priors for Visual Search: When Does High-Level Knowledge Help?” Computer Vision and Pattern Recognition (CVPR ‘99). Fort Collins, CO. June 1999.
Last updated Aug. 2007.

