Speech Intelligibility of the Talking Signs.

John A. Brabyn, Ph.D. and Lesley A. Brabyn, M.A.
Smith-Kettlewell Institute of Visual Sciences
San Francisco, CA.

Journal of Visual Impairment and Blindness, February, 1982


The Talking Signs concept is a method of making navigational signs and landmarks "readable" by blind and visually impaired persons. A sighted person can best understand the value of such an orientation system by imagining that all street signs, house markers, room numbers, and bus identification signs have suddenly been removed. Without such signs, travel through a city or inside a large building would become frustrating, time-consuming, and hazardous.

The intent of the Talking Signs system is to make these signs available to blind, visually impaired, and reading-impaired persons, a group estimated at over 10 million in the United States. The Talking Signs system achieves this goal by placing a miniature, low-powered infrared light transmitter at locations where written signs normally appear, both indoors and out. Each light source, invisible to the eye and therefore not intrusive to the sighted population, is modulated with a spoken message corresponding to the wording of the sign. This message is stored on a tiny computer memory chip. Although the light transmits continuously, its message is heard only when a blind pedestrian points his or her receiver in the general direction of the signal and presses the "on" button. The receiver, which contains a small speaker, then decodes the signal into a verbal message.

Research and Development:

At the current stage of development, each Talking Signs transmitter consists of a 6.4 cm. plug-in cube which contains the necessary electronics and the memory chip. A thin cable leads from this to the miniature infrared transmitter (2.5 cm. by 1.3 cm.), which is placed at the desired sign location. The receiver is a hand-held unit measuring 8.9 cm. by 5.1 cm. that has a lens on the front for directionality. The desired speech message is recorded on a 16K-bit EPROM (Erasable Programmable Read Only Memory) chip using delta modulation. This information is continuously transmitted in integrated form using pulsed FM transmission centered at 25 kHz. The receiver demodulates the FM signal and presents the speech information through its built-in speaker.
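As an illustration of the storage scheme described above, the following is a minimal sketch of linear delta modulation, in which each speech sample is reduced to a single bit saying whether a running estimate should step up or down, and the decoder recovers the waveform by integrating those steps. It is not the Talking Signs circuit: the step size, the 16 kHz bit rate, and the 200 Hz test tone are assumptions chosen for the example, as is the reading of "16K bit" as 16 x 1024 bits.

```python
import math


def delta_encode(samples, step):
    """Encode each sample as one bit: 1 = step the estimate up, 0 = step it down."""
    bits = []
    estimate = 0.0
    for sample in samples:
        bit = 1 if sample >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits


def delta_decode(bits, step):
    """Rebuild a staircase approximation by integrating the bit stream."""
    decoded = []
    estimate = 0.0
    for bit in bits:
        estimate += step if bit else -step
        decoded.append(estimate)
    return decoded


# Assumed values, not taken from the article.
BIT_RATE = 16000          # bits (and samples) per second
STEP = 0.1                # step size of the running estimate
TONE_HZ = 200             # test tone standing in for speech

tone = [math.sin(2 * math.pi * TONE_HZ * n / BIT_RATE) for n in range(320)]
bits = delta_encode(tone, STEP)
decoded = delta_decode(bits, STEP)
max_error = max(abs(a - b) for a, b in zip(tone, decoded))

# Message length that fits in the chip at the assumed bit rate.
capacity_bits = 16 * 1024
message_seconds = capacity_bits / BIT_RATE
```

With these assumed numbers the tone changes by less than one step per sample, so the decoded staircase stays within about one step of the input, and the chip would hold roughly one second of speech. A slower bit rate would lengthen the message at the cost of coarser tracking.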

The Current Experiment:

Pilot data on the optimal values for receiver and transmitter beam-width have been obtained in previous experiments. The study reported here was designed to address the question of message intelligibility. It has been a concern throughout the system's development that the intelligibility of the speech output should be sufficient for navigational purposes without necessitating excessive cost and sophistication in hardware design. The current transmitter/receiver design is being manufactured in quantities of hundreds for demonstration purposes. For these demonstrations to illustrate the system's potential effectively, it was necessary that speech intelligibility should not be the limiting factor in actual use.

To test the speech quality objectively, the intelligibility was compared with that of spoken messages reproduced over a high fidelity audio system. This applied the most stringent possible standard of comparison, so that the amount by which the Talking Signs speech fell short of "ideal" could be measured. This enabled a judgment to be made as to whether the speech quality was likely to prejudice significantly the performance of the system as an orientation aid for travelers.

Subjects:

The subjects were 18 adults ranging in age from 27 to 47 years, with a mean of 34.3 years. Participants reported no history of auditory impairment. Five of the subjects were bilingual in English and either German, Spanish, Swedish or Swiss-German. Two of the subjects were blind. Each subject had a moderate amount of familiarity with the Talking Signs and was naive as to the experimental hypothesis.

Apparatus:

Fifteen Talking Signs transmitters, each programmed with a different verbal message, were randomly selected from a pool of 100 prerecorded transmitters. The brief messages contained information that would be necessary for successful orientation in a typical office building, such as "room 3431", "Drinking Fountain", or "Stairway Exit." A Revox stereo tape recorder (Model G36) was programmed with identical messages spoken by the same voice as that recorded on the Talking Signs transmitters. Because of the difficulties in precision involved with random access to the reel recorder, 12 different randomized sequences of the 15 messages were recorded. Each subject was then randomly assigned one of these sequences, while the mode of presentation was randomly chosen for each trial. The sound level of the tape recorder speech was matched to the mean sound level of the 15 Talking Signs transmitters.
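The sequence preparation described above might be sketched as follows. The article does not say how the 12 orders were generated, so the shuffling method, the seeds, and the placeholder message names are assumptions made for illustration only.

```python
import random


def make_sequences(messages, n_sequences, seed=None):
    """Return n_sequences distinct random orderings of the messages."""
    rng = random.Random(seed)
    sequences = []
    while len(sequences) < n_sequences:
        order = list(messages)
        rng.shuffle(order)
        if order not in sequences:   # keep the recorded orders different
            sequences.append(order)
    return sequences


# Placeholder names; the article quotes only three of the 15 messages.
messages = [f"message {i}" for i in range(1, 16)]

sequences = make_sequences(messages, n_sequences=12, seed=1982)

# Random assignment of one recorded sequence to a subject.
assigned = random.Random(7).choice(sequences)
```

Recording a fixed set of orders in advance trades some randomness for practicality, since a reel recorder cannot jump to an arbitrary message between trials.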

Procedure:

Each subject was seated with his or her back to the experimenter. The tape recorder was positioned on a table 172.7 cm. (5 ft. 8 in.) from the back of and level with the subject's head. During the trials using the Talking Signs transmitters, the experimenter held the receiver next to the tape recorder speakers so that each message, regardless of its source, would be emitted from the same spatial locale. The transmitting Light Emitting Diode (LED) was positioned 111.8 cm. (3 ft. 8 in.) from the receiver and hidden from the subject's view by a screen. The subject was told that the purpose of the experiment was to compare the comprehensibility of two kinds of recorded speech. He or she was informed that during each trial a brief verbal message would be presented.

The subject was instructed to write down what he or she understood the message to be. The two blind subjects wrote their responses in Braille, which was later translated for scoring. Several practice trials were presented in order to familiarize the subject with the experimental procedure.

Each subject was presented with 60 trials. In 30 of these the messages were presented on the tape recorder, and in the remaining 30 they were presented using the Talking Signs transmitters. The mode of presentation and order of the messages were randomized. In each trial the subject was allowed only one presentation of the complete message and no repetitions were permitted.
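One simple way to randomize the mode of presentation while keeping the split at 30 trials per mode is to shuffle a balanced list of labels. The article does not describe the randomization procedure actually used, so the function name, labels, and seed below are hypothetical.

```python
import random


def mode_schedule(trials_per_mode=30, seed=None):
    """Return a shuffled list of mode labels with equal counts per mode."""
    rng = random.Random(seed)
    schedule = (["tape recorder"] * trials_per_mode
                + ["Talking Signs"] * trials_per_mode)
    rng.shuffle(schedule)
    return schedule


schedule = mode_schedule(seed=3431)
```

Shuffling a balanced list guarantees the 30/30 split, which choosing the mode independently on each trial would not.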

Results:

For each subject the number of messages correctly understood for the two modes of presentation was calculated. Because the high accuracy rate and very low variance created a skewed distribution in both experimental conditions, a non-parametric analysis of the data was chosen. The mean number of correct answers and the standard deviation for each condition appear in Table 1. Using the Walsh Test (see Siegel, 1956), a significant difference was found between the comprehensibility of the tape recorder and Talking Signs speech (p .047, two-tailed). The two blind subjects produced perfect scores for both conditions and correctly understood all of the messages.

Table 1. Mean number of correctly understood messages with standard deviations for Talking Signs and tape recorder speech.

                        Talking Signs       Tape Recorder

Mean                    28.94               29.83

Standard Deviation      1.43                0.38

An interesting trend was observed among the bilingual subjects, who exhibited a lower accuracy rate than the native English speakers in the Talking Signs condition. The percentage of correctly understood messages for these two sub-groups appears in Table 2. When the data from the bilingual subjects were eliminated from the analysis, the difference in comprehensibility between the Talking Signs and tape recorder speech was not significant.

Table 2. Percentage of correctly understood messages for bilingual and native English speakers listening to Talking Signs or tape recorder speech.

                        Talking Signs       Tape Recorder       Total

Native English          98.7%               99.4%               99.1%

Bilingual               90.7%               99.3%               98.1%

Both groups             96.5%               99.4%
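The pooled percentages in Table 2 can be checked against Table 1 with a short calculation. The sketch below assumes 13 native English speakers (the 18 subjects less the 5 bilingual subjects) and 30 trials per mode, and it works from the rounded percentages printed in the table, so agreement is only expected to about one decimal place.

```python
N_NATIVE = 13          # 18 subjects minus the 5 bilingual subjects
N_BILINGUAL = 5
TRIALS_PER_MODE = 30

# Percent correct as printed in Table 2.
table2 = {
    "Talking Signs": {"native": 98.7, "bilingual": 90.7},
    "Tape Recorder": {"native": 99.4, "bilingual": 99.3},
}


def pooled_percent(native_pct, bilingual_pct):
    """Average the two groups, weighting each by its number of subjects."""
    total = N_NATIVE + N_BILINGUAL
    return (N_NATIVE * native_pct + N_BILINGUAL * bilingual_pct) / total


pooled = {mode: pooled_percent(g["native"], g["bilingual"])
          for mode, g in table2.items()}

# Convert pooled percentages back to mean messages correct out of 30.
mean_correct = {mode: pct * TRIALS_PER_MODE / 100
                for mode, pct in pooled.items()}
```

The pooled values round to 96.5% and 99.4%, matching the table, and convert to roughly 28.9 and 29.8 messages correct, in line with the means in Table 1. Averaging the two bilingual entries in the same spirit, (90.7 + 99.3) / 2 = 95.0, does not reproduce the printed bilingual total of 98.1%, and the text does not explain that figure.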

Discussion:

The subjects demonstrated a high accuracy rate in understanding the content of the messages for both modes of presentation. They reported feeling a high level of confidence and a minimum of uncertainty in their answers, and this was true for both modes of presentation. The small difference between conditions, only 2.9%, may be due less to gross differences in the intelligibility of the two types of speech than to subject characteristics.

Post hoc analysis revealed differences in accuracy between bilingual and native English speakers in the Talking Signs condition. This may be analogous to the situation often reported when a bilingual person is first learning a new language and has increased difficulty in understanding the second language in telephone conversations. This can occur even though face-to-face conversation presents no problem to the listener. A relatively simple solution to this problem would be to modify the transmitter to incorporate two or more languages using different carrier frequencies or wavelengths of light. Each receiver would then be tuned to receive only the desired language.

The present experiment suggested at least two avenues of possible future investigation. First, the effect of user control over the stimulus would be of interest. In this experiment only one presentation of the message was permitted during each trial. The experimenter held the Talking Signs receiver some distance away from the subject's head, minimizing the amount of control the listener had over the stimulus. Most subjects reported feeling confident that they could have accurately understood all of the messages had they been allowed to hear them a second time. In actual practice the receiver would be held by the person using the Talking Signs system as close to his or her ear as desired, and the message could be repeated as many times as needed. The question of whether speech intelligibility is greater for the active than for the passive listener still remains to be answered. More research needs to be done in order to determine whether message repetition or ambient noise affects understandability.

Secondly, the influence of environmental context on message intelligibility deserves investigation. In the present experiment, message content was restricted to contextual information: subjects were told that they would be hearing phrases representative of markers found in a typical office building. Under these conditions, which were chosen because they were as close to "real life" use as possible, intelligibility scores were very high. Since the real life user will always possess some environmental contextual information, the question of intelligibility for random, non-contextual phrases is largely academic. However, a test of this sort would perhaps provide a more sensitive measure for comparison between Talking Signs speech and high fidelity reproduction.

In conclusion, the results of this study indicate that the speech intelligibility of the Talking Signs system is sufficient for accurate understanding of the transmitted messages. Bilingual users may be more sensitive to the imperfections of the speech and therefore experience greater initial difficulty with the system. While this difference may disappear with practice, a simple solution is to adapt the system for use with several different languages.

In future stages of Talking Signs development the speech quality will certainly be improved. The present study demonstrates that the current system is adequately intelligible and shows that its utility in terms of orientation and mobility performance will not be limited by speech quality.

References:

Loughborough, W. Talking Lights. Journal of Visual Impairment and Blindness. 1979, 73, 143.

Schenkman, B. The effect of different angles and training on the time of detection in an orientation aid for the blind. Unpublished manuscript.

Siegel, S. Non-parametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.


John Brabyn is co-director, Rehabilitation Center, and Lesley Brabyn is research assistant, Smith-Kettlewell Institute of Visual Sciences.

This research was supported by a grant from the National Institute of Handicapped Research (Grant no. 23-P-57590/9) to Dr. Arthur Jampolsky. The authors wish to express their appreciation to William Loughborough, Talking Signs inventor, for his suggestions.