Threshold Psychophysics, Probability Summation and the Analysis of Spatial Summation Behavior

Christopher W. Tyler

Smith-Kettlewell Eye Research Institute, San Francisco.

cwt@skivs.ski.org

/cwt_lab/html

1.1 Ideal Observer Assumes A Continuum Of Linear Summing Mechanisms With Unrealistically Shallow Psychometric Function Slopes

The Ideal Observer formalism assumes that the observer has complete knowledge of the stimulus and uses a matched filter to detect its presence. The Ideal Observer therefore is effectively a Bayesian observer with a prior probability of 1.0. Optimal performance with this filter is assumed to occur with linear summation over the noisy filter inputs R_i. Since the matched filter increases with stimulus size, the overall signal-to-noise ratio should improve with the square root of the number of local regions summed, if the local regions have independent sources of Gaussian noise. However, the psychometric function in this model is based on a linear relation between d' (= R) and signal strength that is violated by most d' measurements, which typically show an exponent of 2. Similarly, translation of this prediction into the Weibull format yields a predicted Weibull exponent of 1.3, whereas most measurements show slopes of 3-4.
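The square-root improvement for linear summation over independent Gaussian noise sources can be checked with a minimal sketch. It assumes equal signal and equal unit-variance noise in every local region; the function names `summed_dprime` and `region_threshold` are illustrative and do not come from the original analysis.

```python
import math

def summed_dprime(n_regions, signal_per_region=1.0, noise_sd=1.0):
    """d' of a linear sum over n_regions inputs with independent Gaussian noise."""
    total_signal = n_regions * signal_per_region       # signals add linearly
    total_noise_sd = math.sqrt(n_regions) * noise_sd   # independent variances add
    return total_signal / total_noise_sd

def region_threshold(n_regions, criterion_dprime=1.0, noise_sd=1.0):
    """Per-region signal needed for the summed response to reach the criterion d'."""
    return criterion_dprime * noise_sd / math.sqrt(n_regions)

for n in (1, 4, 16):
    print(n, summed_dprime(n), region_threshold(n))
```

On log-log axes the threshold falls with a slope of -1/2 against the number of regions, which is the Ideal Observer summation slope referred to in the text.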

The Ideal Observer model requires a summing receptive field matching every size of stimulus for which the summation behavior is exhibited. Probability summation among these mechanisms could produce slightly greater sensitivity than the Bayesian strategy, because the Ideal Observer incorporates no probability summation.

1.2 Ideal Observer Analysis Implies Mechanism Summation At Threshold But No Summation Under Masking Conditions

Data from Kersten (1985) showed spatial summation that conformed to the Ideal Observer prediction up to about four cycles for grating detection, but no summation beyond one grating cycle for grating patterns buried in a noise mask. Since detection is presumably occurring against a background of intrinsic noise, the difference between these two conditions seems unlikely to be due to differences in the attentional strategy. Instead, it seems that the physiological units operating under high-contrast masking conditions respond to local regions of the image rather than integrating over larger areas. Conversely, the range of detection thresholds where the summation conforms to the Ideal Observer slope of -1/2 implies the existence of linear summing mechanisms up to four grating cycles in foveal viewing.

2.1 Probability Summation Implies Distributed Attention To Many Channels

In spatial vision, the Probability Summation hypothesis implies that the mechanism of attention is distributed over many spatial channels rather than focal, since one cannot monitor many channels without attending to them. It is then assumed that, on every trial, the attention mechanism can select the maximum channel response over the monitored range for use in the detection decision and ignore all other channels. Probability 'summation' is thus a max operator rather than a summing operator in the normal sense. In Quick's (1974) High-Threshold Theory, the psychometric function for an individual channel prior to summation is given by the Weibull function:

The effect on the overall psychometric function of probability summation by taking the maximum over the set of individual parallel channels is:

from which:

The summation predictions of Ideal Observer Theory and High Threshold Probability Summation therefore coincide when β = 2.
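The coincidence at an exponent of 2 can be illustrated with a short sketch. It assumes one common form of the Weibull psychometric function, 1 - exp(-(c/alpha)^beta), and n independent channels of equal sensitivity; this form and the names `weibull`, `pooled_detection` and `pooled_threshold` are assumptions for illustration, not necessarily the notation of the original equations.

```python
import math

def weibull(c, alpha, beta):
    """Detection probability of a single channel at signal strength c."""
    return 1.0 - math.exp(-((c / alpha) ** beta))

def pooled_detection(c, alpha, beta, n):
    """Probability that at least one of n independent, equal channels detects."""
    return 1.0 - (1.0 - weibull(c, alpha, beta)) ** n

def pooled_threshold(alpha, beta, n):
    """Threshold of the pooled function: alpha * n**(-1/beta)."""
    return alpha * n ** (-1.0 / beta)

# With an exponent of 2 the pooled threshold falls as n**(-1/2),
# the same rule as linear summation over independent Gaussian noise.
for n in (1, 4, 16):
    print(n, pooled_threshold(1.0, 2.0, n))
```

Evaluating `pooled_detection` at `pooled_threshold` returns the same criterion probability (1 - 1/e) for every n, which confirms that pooling only shifts the function along the log signal axis under these assumptions.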

2.2 High Threshold Analysis Of Probability Summation Assumes Non-Gaussian Noise

Under the assumption of a high threshold, above which the signal must climb to be detected, the Weibull formulation of the psychometric function

represents the integral of an internal S + N distribution:

which is reminiscent of the Poisson distribution. The forms of the psychometric functions and the implied noise distributions (the derivatives of the psychometric functions with respect to R) for values of β from 1.3 to 10.4 in half-octave increments (corresponding to d' exponents from 1 to 8 in the same increments) are shown. It is evident that the implied noise distributions in the lower portion are generally far from approximating a Gaussian form except in the mid-range, which constitutes the special case that β = 4 (dashed lines).

(It has been pointed out by Marius Usher that this derivative analysis is performed from the wrong side of the distribution, because it is the upper tail of the noise as it emerges above the hard threshold level that forms the lower tails of the cumulative function. I have not yet rederived the results, but it appears that the same degrees of asymmetry would be found, except flipped left to right.)
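The departure of the implied distributions from a Gaussian form can be quantified by their skewness. The sketch below assumes the psychometric function has the form 1 - exp(-R^beta), so that its derivative is the standard Weibull density; the function names are illustrative. Skewness is only one possible index of asymmetry, and a left-to-right flip of the distribution changes its sign but not its magnitude.

```python
import math

def weibull_density(r, beta):
    """Derivative of 1 - exp(-r**beta) with respect to r."""
    return beta * r ** (beta - 1.0) * math.exp(-(r ** beta))

def weibull_skewness(beta):
    """Skewness of the Weibull density from its first three raw moments."""
    g1, g2, g3 = (math.gamma(1.0 + k / beta) for k in (1, 2, 3))
    variance = g2 - g1 ** 2
    third_central = g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3
    return third_central / variance ** 1.5

for beta in (1.3, 2.0, 4.0, 10.4):
    print(beta, round(weibull_skewness(beta), 2))
```

Under this assumed form the skewness is strongly positive at the shallow end of the range, close to zero near an exponent of 4, and negative at the steep end, in line with the description of the mid-range case as the only near-Gaussian one.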

2.3 High-Threshold Analysis Of Probability Summation Predicts Improvement Inversely With Psychometric Function Slope

For n equally sensitive mechanisms, the summation term for the overall response in relation to the responses of the individual mechanisms in the above equations simplifies to:

where β empirically may take values from 1.3 to 6. In particular, Kersten (1985) reports exponents of d' = kc^1 for his noise-masked psychometric functions, which corresponds to β = 1.3 and would imply almost linear summation behavior if probability summation were operating. For example, summation over 1000 channels has a predicted improvement of a factor of 1000^(1/1.3) ≈ 200.
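The size of the predicted improvement is easy to tabulate. This sketch assumes the high-threshold rule that sensitivity over n equally sensitive channels improves as n raised to the power of one over the Weibull exponent; `summation_factor` is an illustrative name.

```python
def summation_factor(n_channels, beta):
    """Predicted sensitivity improvement for n equal channels: n**(1/beta)."""
    return n_channels ** (1.0 / beta)

# Exponents spanning the empirical range quoted in the text.
for beta in (1.3, 2.0, 4.0, 6.0):
    print(beta, round(summation_factor(1000, beta), 1))
```

For 1000 channels the factor is roughly 200 at an exponent of 1.3 but only about 3 at an exponent of 6, which is why shallow psychometric functions give the most sensitive test for the presence of probability summation.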

Nevertheless, since there was no detectable summation beyond 0.5 - 1 cycle in Kersten’s noise-masked data, it appears that probability summation was not occurring under these conditions (although Kersten himself did not draw this inference). This suggests the need to identify experimental conditions that would replicate this result, to verify the inferred absence of probability summation.

2.4 High-Threshold Probability Summation Fails For Additive Noise

High Threshold Theory assumes that probability summation occurs because the observer can identify the max of the samples of signal + noise provided by the distributions D_S(R) across each of K stimulated channels (where R is the internal response). The distribution of such maxes over trials, which provides the internal response after probability summation, is given by the standard statistical formula:

To obtain the new threshold signal level, the signal must be reduced accordingly until the max distribution M_S reaches the threshold criterion. If the noise is assumed to be additive, however, this creates the fatal problem that the mean signal needs to go negative in order to bring the signal + noise distribution down to threshold, as depicted in Fig. 1. High-threshold analysis is immune to this problem only if the noise in the Weibull approximation is multiplicative and hence can be reduced indefinitely by requisite signal reductions without the signal going negative.
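The negative-signal problem can be reproduced numerically. The sketch below assumes unit-variance additive Gaussian noise in each of K independent channels, a high threshold placed 2 noise SDs above zero, and a 50% detection criterion; these values and the function name are illustrative assumptions. It uses the standard result that the cumulative distribution of the max of K independent samples is the single-channel cumulative distribution raised to the power K.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def mean_signal_at_threshold(k_channels, high_threshold=2.0, criterion=0.5):
    """Mean signal s at which P(max of K channels > high_threshold) = criterion.

    Each channel response is Gaussian with mean s and unit SD (additive noise),
    so P(max <= T) = Phi(T - s)**K, which is solved here for s.
    """
    per_channel_cdf = (1.0 - criterion) ** (1.0 / k_channels)
    return high_threshold - STD_NORMAL.inv_cdf(per_channel_cdf)

for k in (1, 10, 100, 1000):
    print(k, round(mean_signal_at_threshold(k), 2))
```

With these assumed values the required mean signal is positive for small K but falls below zero once enough channels are monitored, because the max of the noise alone already exceeds the threshold on more than half of the trials.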

Note that, for mixed additive and multiplicative noise sources, reducing the signal will tend to reduce the multiplicative noise to the point where additive noise dominates. Since there are always sources of additive noise in any physical system (e.g., quantal and thermal noise), any noise-limited threshold is likely to be limited by its additive component. Weibull analysis is thus inappropriate for realistic threshold systems, particularly when noise is added as a stimulus mask.

The implication that the noise in the Quick/Weibull analysis is multiplicative is incompatible with the assumption of additive noise made for the analysis of section 2.2. That type of distribution analysis has not yet been performed under the assumption of multiplicative noise, for which the conclusions might be quite different. The main conclusion of both sections is that Quick/Weibull analysis is incompatible with the assumption of Gaussian additive noise over the complete range of empirical psychometric data.

3.1 Probability Summation In 2AFC Experiments Does Not Conform To High Threshold Analysis, But Derives From The Difference Distribution

Assume there are K stimulated channels and M·K total channels. The observers’ task is to distinguish between sample stimuli drawn from the max distributions of noise alone and signal + noise for summation over a given number of channels. For 2AFC probability summation by the max combination rule in each test interval, the summed response distribution M_N over M·K channels for the null stimulus of the pair is:

To obtain the distribution of the sample maxima in the signal interval, the activity of the K stimulated channels must be combined with that of the (M-1)·K unstimulated channels to obtain the summed signal distribution M_S:

For an ideal observer, attention is directed only to the stimulated channels and M = 1. As the stimulus extent is increased, the attentional area is increased to monitor a proportionately larger number of channels and the standard deviations (σ) of the max distributions decrease. Fig. 2 shows the numerical distributions for samples of maxes computed according to this derivation in factors of 2 from 1 to 1 million. The σ decreases only modestly for monitoring two channels and by no more than a factor of 4 for monitoring 1 million channels! Discriminability therefore improves with the reciprocal of the reduction in σ and is independent of the initial value of σ, and thus independent of β in the Weibull approximation to the psychometric function. Under these low-uncertainty conditions, 2AFC probability summation over equal-sensitivity channels is usually less than the summation rule implied by Pelli’s (1985) high-uncertainty approximation to High-Threshold Theory.
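How slowly the width of the max distribution shrinks with the number of monitored channels can be checked with a minimal numerical sketch. It assumes independent, unit-variance Gaussian responses in every channel; the function names and the integration step count are illustrative choices, not part of the original derivation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def max_quantile(u, n):
    """Quantile function of the max of n standard normal samples.

    The CDF of the max is Phi(x)**n, so x = Phi^-1(u**(1/n)). The upper-tail
    form used here avoids rounding u**(1/n) to 1.0 when n is large.
    """
    upper_tail = -math.expm1(math.log(u) / n)   # equals 1 - u**(1/n)
    return -STD_NORMAL.inv_cdf(upper_tail)

def max_sd(n, steps=20000):
    """SD of the max distribution by midpoint integration over u in (0, 1)."""
    xs = [max_quantile((i + 0.5) / steps, n) for i in range(steps)]
    mean = sum(xs) / steps
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / steps)

for n in (1, 2, 1024, 10 ** 6):
    print(n, round(max_sd(n), 3), round(max_sd(1) / max_sd(n), 2))
```

The last column gives the factor by which the width has narrowed relative to a single channel, which can be compared directly with the reductions described for Fig. 2.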

3.2 Channel Uncertainty Effects Can Be Eliminated By Renormalization

Channel Uncertainty Theory is an elaboration of Signal Detection Theory in which the number of neural channels monitored in the brain is greater than the number of channels stimulated (by ratio M) (Pelli, 1985). The left panels of Fig. 3 show the numerical distributions obtained by the 2AFC derivation for the certain condition (monitoring the stimulated channel) and an uncertain condition (monitoring one thousand channels, with only one stimulated). The right panel of Fig. 3 shows the d' functions obtained for 4 levels of uncertainty, together with an analytic approximation to these functions that is accurate up to about d’=3 (i.e., within the normal measurement range). Note that the log slope of these functions (straight dashed lines) is related simply to log uncertainty by the expression: where M is the ratio of monitored to stimulated channels.
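The steepening of the d' function with channel uncertainty can be reproduced with a numerical sketch of the 2AFC max rule. It assumes unit-variance Gaussian noise in every channel, K stimulated channels of equal sensitivity, and M·K monitored channels in each interval; the function names, integration limits and step size are illustrative. It does not reproduce the analytic approximation mentioned in the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def prop_correct_2afc(signal, m_ratio, k_stim=1, lo=-8.0, hi=14.0, step=0.005):
    """P(max in signal interval > max in null interval) under the max rule.

    Null interval:   CDF of max = Phi(x)**(M*K)
    Signal interval: CDF of max = Phi(x - s)**K * Phi(x)**((M-1)*K)
    """
    n_null = m_ratio * k_stim
    n_unstim = (m_ratio - 1) * k_stim

    def cdfs(x):
        phi = STD_NORMAL.cdf(x)
        f_null = phi ** n_null
        f_sig = STD_NORMAL.cdf(x - signal) ** k_stim * phi ** n_unstim
        return f_null, f_sig

    total = 0.0
    n_steps = int(round((hi - lo) / step))
    prev_null, prev_sig = cdfs(lo)
    for i in range(1, n_steps + 1):
        cur_null, cur_sig = cdfs(lo + i * step)
        # Stieltjes sum of F_null dF_sig with trapezoidal weights
        total += 0.5 * (prev_null + cur_null) * (cur_sig - prev_sig)
        prev_null, prev_sig = cur_null, cur_sig
    return total

def dprime_2afc(signal, m_ratio, k_stim=1):
    """Convert 2AFC proportion correct to d' = sqrt(2) * z(Pc)."""
    pc = prop_correct_2afc(signal, m_ratio, k_stim)
    return math.sqrt(2.0) * STD_NORMAL.inv_cdf(pc)

def log_slope(m_ratio, s1=1.0, s2=2.0):
    """Log-log slope of the d' function between two signal levels."""
    d1, d2 = dprime_2afc(s1, m_ratio), dprime_2afc(s2, m_ratio)
    return math.log(d2 / d1) / math.log(s2 / s1)

for m in (1, 10, 100, 1000):
    print(m, round(log_slope(m), 2))
```

With M = 1 the d' function is linear in signal strength (log slope of 1), and the slope steepens progressively as M increases, which is the uncertainty effect that the renormalization is intended to remove.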

Moreover, a "certain d' estimate" may be derived by extrapolating the measured slope of the log d' function up to d'=10, then extrapolating back on the slope for M=1 to estimate the d' that would have been obtained with no channel uncertainty. Note that the certain d' estimate is approximated simply by dividing the measured threshold at d'=1.1 by the slope of the d' function! Alternatively, the analytic d' functions may be used to provide an estimate of the psychometric function that requires no approximation. If a threshold estimate is required to be more accurate than either of these proposed approximation formulae, the data for the psychometric function may be fitted over the family of computed d' functions to refine the compensation for channel uncertainty.

3.3 Weibull Analysis Does Not Apply To 2AFC With High Uncertainty

Under high uncertainty conditions, the number of channels monitored is much greater than the number stimulated. If the number stimulated is now increased, as by increasing the stimulated area within the area attended, uncertainty is decreased with increasing stimulus size (and the psychometric function becomes shallower). Contrary to Pelli's claim, spatial probability summation effects at high uncertainty with a fixed attention window are not proportional to 1/β. The summation effects (dotted line) are controlled by the change in the β of the Weibull approximation as uncertainty is reduced by increasing stimulus area; as seen at the left of the plot, the summation effects are of negligible size when uncertainty is high (i.e., when the stimulus area is small relative to the attention window).

The dashed curve shows the summation effects expected of an ideal observer with a limited attention field, i.e., the ability to attend to a linear summing area matched in size and position to the stimulus for every stimulus size, up to some maximum designated as the size of the attention field.

4.1 High-Contrast Noise Masks Do Not Generate d' Exponents Of 1

Kersten's (1985) experiment was replicated for detection of Gabor patches with ramped noise masks in foveal viewing. The exponents of the d' functions are close to 2 (except at the highest d's), corresponding to a β of 2.5. These slopes therefore predict the same amount of probability summation on High Threshold Theory as on Signal Detection Theory, making it difficult to discriminate between the analyses on the basis of the sensitivity functions. The received explanation for the steeper slopes is that the observer is monitoring a larger number of channels than are stimulated, hence exhibiting uncertainty effects as in 3.2.

4.2 High-Contrast Grating Masks May Generate d' Powers Close To 1

On the other hand, it was possible to find conditions where the psychometric function exponents approximated 1, predicting strong probability summation effects. In particular, detection of Gabor patches on a 300 msec ramped grating mask in foveal vision conformed to an exponent of 1 over most of the range. Peripheral static masks also gave exponents close to 1 for large test sizes and higher d' values.

5.1 Spatial Summation With A High-Contrast Mask Is Less Than Predicted For Probability Summation By Weibull Or 2AFC Analysis

We now have two methods to eliminate the effects of channel uncertainty to obtain high levels of predicted probability summation: measuring sensitivity at high d' levels, where the exponent is close to unity, and using the certain estimates from 3.2. For both foveal and peripheral masking conditions, both types of estimate produced much less improvement in sensitivity than predicted by the High-Threshold Theory of probability summation (or its 2AFC High-Uncertainty variant).

A caveat on this analysis is that probability summation is expected only if the sources of noise are independent across the summation space. It is possible that, for example, eye movements across a static mask generate predominantly correlated noise, implying a zero-summation prediction for any noise-limited theory. However, even on a 300 msec ramped mask, the foveal data show no improvement beyond about one cycle. The degradation seen here for large field sizes is also evident in several plots from Kersten (1985). It may reflect lateral inhibitory effects in the second-order processing of the grating envelopes.

I conclude that, since probability summation does not seem to be operating under these conditions, it is unlikely that it is a strategy that is available to the human observer. Moreover, since there is no improvement in sensitivity with size beyond 0.5°, there must be no summation of any kind over the measured range under masked conditions, including summation by receptive-field mechanisms (although long-range, second-order inhibition may be present).

5.2 At Threshold, Spatial Summation Conforms To Ideal Observer Behavior Up To 12 Grating Cycles

Unmasked detection sensitivity at 10° in the periphery improved linearly with area up to 3 cycles and with the square root of area up to 12 cycles. In the absence of probability summation, these results imply that the smallest low-contrast mechanism in the periphery is 3 cycles wide and the largest is 12 cycles wide. Ideal Observer behavior corresponds to switching among mechanisms with sizes within this range by attending to the one best matching the stimulus in each condition. The operation of wide-field summing mechanisms contrasts with that under masked conditions, where almost no summation was evident.

Conclusion

A Discrete Attention Model Without Probability Summation Can Account For Both Threshold And High-Contrast Summation Behavior

In many respects, the data are inconsistent with plausible models of probability summation. Under masking conditions, the visual system appears to be able to attend only to a tiny region of the visual field. The dramatic localization suggests a unitary rather than distributed model of human consciousness, in which the attention mechanism can select the output of only one neural circuit for conscious processing at any moment in time.

On this interpretation, the spatial summation seen at threshold would be attributed to physiological rather than probability summation over arrays of independent noise sources, with Bayesian attentional selection to an optimal mechanism from those available, optimized during the repetitive runs of the experiment. This selection allows the observer to approximate ideal observer behavior. Presumably, the high background signal would desensitize these summing mechanisms under masking conditions by a gain control or compressive nonlinearity.

Consequently, the data suggest that spatial mechanisms can summate linearly up to 12 grating cycles in peripheral retina. Nevertheless, an alternative interpretation in which the observer changes attentional strategy according to the presence or absence of the masking pattern could be ruled out by an experiment in which stimulus position is unpredictable from trial to trial, forcing attention to be distributed or randomized.