Speech Intelligibility of the Talking Signs.
John A. Brabyn, Ph.D. and Lesley A. Brabyn, M.A.
Smith-Kettlewell Institute of Visual Sciences
San Francisco, CA.
Journal of Visual Impairment and Blindness, February, 1982
The Talking Signs concept is a method of making navigational signs and landmarks
"readable" by blind and visually impaired persons. A sighted person
can best understand the value of such an orientation system if he or she
imagines all street signs, house markers, room numbers, and bus identification
signs to be suddenly removed. Without such signs, travel through a city
or inside a large building would become frustrating, time-consuming, and
hazardous.
The intent of the Talking Signs system is to make these signs available
to blind, visually impaired, and reading-impaired persons -- estimated
to be over 10 million in the United States. The Talking Signs system achieves
this goal by placing a miniature, low-powered infrared light transmitter
at locations where written signs normally appear, both indoors and out.
Each light source, invisible to the eye and therefore not intrusive to the
sighted population, is modulated with a spoken message corresponding to
the wording of the sign. This message is stored on a tiny computer memory
chip. Although the light transmits continuously, its message is only heard
when a blind pedestrian points his or her receiver in the general direction
of the signal and presses the "on" button. The receiver, which
contains a small speaker, then decodes the signal into a verbal message.
Research and Development:
At the current stage of development, each Talking Sign transmitter consists
of a 6.4 cm. plug-in cube which contains the necessary electronics and the
memory chip. A thin cable leads from this to the miniature infrared transmitter
(2.5 cm. by 1.3 cm.) which is placed at the desired sign location. The receiver
is a hand-held unit measuring 8.9 cm. by 5.1 cm. that has a lens on the
front for directionality. The desired speech message is recorded on a 16K
bit EPROM (Erasable Programmable Read Only Memory) chip using delta modulation.
This information is continuously transmitted in integrated form using pulsed
FM transmission centered at 25 kHz. The receiver demodulates the FM signal
and presents the speech information through its built-in speaker.
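The delta modulation step mentioned above can be illustrated with a minimal sketch. It is not the Talking Signs circuit: the bit rate, step size, and test tone below are hypothetical values chosen only for illustration, and the "16K bit" capacity is assumed to mean 16 x 1024 bits. Each sample is stored as a single bit that says whether a running approximation should step up or down, and the decoder rebuilds the waveform by accumulating those steps.

```python
import math

def delta_encode(samples, step):
    """Encode each sample as one bit: 1 = approximation steps up, 0 = steps down."""
    bits = []
    approx = 0.0
    for s in samples:
        bit = 1 if s >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits

def delta_decode(bits, step):
    """Rebuild the staircase approximation by accumulating the up/down steps."""
    out = []
    approx = 0.0
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out

# Hypothetical parameters (not taken from the article).
BIT_RATE = 16_000       # bits per second; one bit per sample in delta modulation
STEP = 0.05             # size of each up/down step
EPROM_BITS = 16 * 1024  # assumed reading of "16K bit"

# A 200 Hz test tone standing in for recorded speech.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / BIT_RATE) for n in range(800)]

bits = delta_encode(tone, STEP)
decoded = delta_decode(bits, STEP)

# Largest gap between the original tone and its reconstruction.
max_error = max(abs(a - b) for a, b in zip(tone, decoded))

# Message length that would fit in the memory at the assumed bit rate.
duration_s = EPROM_BITS / BIT_RATE
```

Under these assumed values the memory would hold roughly one second of speech, which is in keeping with brief messages such as "Stairway Exit"; the actual bit rate of the system is not stated in this report.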
The Current Experiment:
Pilot data on the optimal values for receiver and transmitter beam-width
have been obtained in previous experiments. The study reported here was
designed to address the question of message intelligibility. It has been
a concern throughout the system's development that the intelligibility of
the speech output should be sufficient for navigational purposes without
necessitating excessive cost and sophistication in hardware design. The
current transmitter/receiver design is being manufactured in quantities
of hundreds for demonstration purposes. For these demonstrations to effectively
illustrate the system's potential, it was necessary that speech intelligibility
should not be the limiting factor in actual use.
To test the speech quality objectively, the intelligibility was compared
with that of spoken messages reproduced over a high fidelity audio system.
This applied the most stringent possible standard of comparison so that
the amount by which the Talking Signs speech fell short of "ideal"
could be measured. This enabled a judgment to be made as to whether the
speech quality was likely to prejudice significantly the performance of
the system as an orientation aid for travelers.
Subjects:
The subjects were 18 adults ranging in age from 27 to 47 years, with a mean
of 34.3 years. Participants reported no history of auditory impairment.
Five of the subjects were bilingual in English and either German, Spanish,
Swedish or Swiss-German. Two of the subjects were blind. Each subject had
a moderate amount of familiarity with the Talking Signs and was naive as
to the experimental hypothesis.
Apparatus:
Fifteen Talking Signs transmitters, each programmed with a different verbal
message, were randomly selected from a pool of 100 prerecorded transmitters.
The brief messages contained information that would be necessary for successful
orientation in a typical office building, such as "room 3431",
"Drinking Fountain", or "Stairway Exit." A Revox stereo
tape recorder (Model G36) was programmed with identical messages spoken
by the same voice recorded on the Talking Signs transmitters. Because of
the difficulties in precision involved with random access to the reel recorder,
12 different randomized sequences of the 15 messages were recorded. Each
subject was then randomly assigned one of these sequences while the mode
of presentation was randomly chosen for each trial. The sound level of the
tape recorder speech was matched to the mean sound level of the 15 Talking
Signs transmitters.
Procedure:
Each subject was seated with his or her back to the experimenter. The tape
recorder was positioned on a table 172.7 cm. (5 ft. 8 in.) from the back
of and level with the subject's head. During the trials using the Talking
Signs transmitters, the experimenter held the receiver next to the tape
recorder speakers so that the message, regardless of its source, would
be emitted from the same spatial locale. The transmitting Light Emitting
Diode (LED) was positioned 111.8 cm. (3 ft. 8 in.) from the receiver and
hidden from the subject's view by a screen. The subject was told that the
purpose of the experiment was to compare the comprehensibility of two kinds
of recorded speech. He or she was informed that during each trial a brief
verbal message would be presented.
The subject was instructed to write down what he or she understood the message
to be. The two blind subjects wrote their responses in Braille, which was
later translated for scoring. Several practice trials were presented in
order to familiarize the subject with the experimental procedure.
Each subject was presented with 60 trials. In 30 of these, the messages
were presented on the tape recorder and in the remaining 30, using the Talking
Signs transmitters. The mode of presentation and order of the messages were
randomized. In each trial the subject was allowed only one presentation
of the complete message and no repetitions were permitted.
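The trial structure described above can be sketched as follows. This is only an illustration under stated assumptions: the article gives 60 trials per subject, 30 in each mode, drawn from 15 messages, but it does not say how often each message appeared in each mode. The sketch assumes each message is heard twice per mode, uses placeholder message names, and replaces the 12 prerecorded tape sequences with a single shuffled list.

```python
import random

# Placeholder names standing in for the 15 recorded sign messages.
MESSAGES = [f"message {i + 1}" for i in range(15)]
MODES = ("tape recorder", "Talking Signs")
REPEATS_PER_MODE = 2  # assumption: 15 messages x 2 modes x 2 repeats = 60 trials

def make_schedule(rng):
    """Return a shuffled list of (message, mode) trials for one subject."""
    trials = [
        (message, mode)
        for message in MESSAGES
        for mode in MODES
        for _ in range(REPEATS_PER_MODE)
    ]
    rng.shuffle(trials)  # randomizes both message order and mode order
    return trials

rng = random.Random(1982)  # fixed seed so the schedule can be reproduced
schedule = make_schedule(rng)

tape_trials = sum(1 for _, mode in schedule if mode == "tape recorder")
sign_trials = sum(1 for _, mode in schedule if mode == "Talking Signs")
```

Building the full list first and then shuffling it guarantees the 30/30 split between modes, which choosing the mode independently on every trial would not.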
Results:
For each subject the number of messages correctly understood for the two
modes of presentation was calculated. Because the high accuracy rate and
very low variance produced a skewed distribution in both experimental conditions,
a non-parametric analysis of the data was chosen. The mean number of correct
answers and standard deviation for each condition appear in Table 1. Using
the Walsh Test (see Siegel, 1956), a significant difference was found between
the comprehensibility of the tape recorder and Talking Signs speech (p .047,
two-tailed). The two blind subjects produced perfect scores for both conditions
and correctly understood all of the messages.
Table 1. Mean number of correctly understood messages with standard
deviations for Talking Signs and tape recorder speech.
                      Talking Signs    Tape Recorder
Mean                  28.94            29.83
Standard Deviation    1.43             0.38
An interesting trend was observed among the bilingual subjects who exhibited
a lower accuracy rate than the native English speakers in the Talking Signs
condition. The percentage of correctly understood messages for these two
sub-groups appears in Table 2. When the data from the bilingual subjects
were eliminated from the analysis, the difference in comprehensibility between
the Talking Signs and tape recorder speech was not significant.
Table 2. Percentage of correctly understood messages for bilingual and
native English speakers listening to Talking Signs or tape recorder speech.
                  Talking Signs    Tape Recorder    Total
Native English    98.7%            99.4%            99.1%
Bilingual         90.7%            99.3%            98.1%
Both groups       96.5%            99.4%
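The relationship between the mean scores in Table 1 and the per-mode percentages in Table 2 can be checked with a short calculation. The sketch below assumes 30 trials per mode (as stated in the Procedure) and assumes that the 13 subjects not described as bilingual make up the native English group; the function names are illustrative only.

```python
TRIALS_PER_MODE = 30
N_NATIVE = 13     # assumption: 18 subjects minus the 5 bilingual subjects
N_BILINGUAL = 5

def mean_to_percent(mean_correct, trials=TRIALS_PER_MODE):
    """Convert a mean number of correct answers into a percentage."""
    return 100.0 * mean_correct / trials

def pooled_percent(native_pct, bilingual_pct):
    """Average the two subgroup percentages, weighted by subgroup size."""
    total = N_NATIVE + N_BILINGUAL
    return (N_NATIVE * native_pct + N_BILINGUAL * bilingual_pct) / total

# Per-mode percentages derived from the Table 1 means.
talking_signs_pct = round(mean_to_percent(28.94), 1)
tape_pct = round(mean_to_percent(29.83), 1)

# The same per-mode percentages derived from the Table 2 subgroup rows.
pooled_talking_signs = round(pooled_percent(98.7, 90.7), 1)
pooled_tape = round(pooled_percent(99.4, 99.3), 1)

# Difference between the two modes, in percentage points.
difference = tape_pct - talking_signs_pct
```

Both routes give 96.5% for Talking Signs and 99.4% for the tape recorder, a difference of about 2.9 percentage points.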
Discussion:
The subjects demonstrated a high accuracy rate in understanding the content
of the messages for both modes of presentation. They reported feeling a
high level of confidence and minimal uncertainty in their answers, and
this was true for both modes of presentation. The small difference between
conditions, only 2.9%, may be due less to gross differences in the intelligibility
of the two types of speech than to subject characteristics.
Post hoc analysis revealed differences in accuracy between bilingual and
native English speakers in the Talking Signs condition. This may be analogous
to the situation often reported when a bilingual person is first learning
a new language and has increased difficulty in understanding the second
language in telephone conversations. This can occur even though face-to-face conversation
presents no problem to the listener. A relatively simple solution to this
problem would be to modify the transmitter to incorporate two or more languages
using different carrier frequencies or wavelengths of light. Each receiver
would then be tuned to receive only the desired language.
The present experiment suggested at least two avenues of possible future
investigation. First, the effect of user control over the stimulus would
be of interest. In this experiment only one presentation of the message
was permitted during each trial. The experimenter held the Talking Signs
receiver some distance away from the subject's head, minimizing the amount
of control the listener had over the stimulus. Most subjects reported feeling
confident that they could have accurately understood all of the messages
had they been allowed to hear them a second time. In actual practice the
receiver would be held by the person using the Talking Signs system as close
to his or her ear as desired and an infinite number of repetitions of the
message is theoretically possible. The question of whether speech intelligibility
is enhanced more for the active rather than the passive listener still remains
to be answered. More research needs to be done in order to determine whether
message repetition or ambient noise affects understandability.
Secondly, environmental context may influence message intelligibility and
so alter the results. In the present experiment, message content was restricted
to contextual information. Subjects were instructed that they would be hearing
phrases representative of markers found in a typical office building. Under
these conditions, which were chosen because they were as close to "real
life" use as possible, intelligibility scores were very high. Since
the real life user will always possess some environmental contextual information,
the question of intelligibility for random, non-contextual phrases is largely
academic. However, a test of this sort would perhaps provide a more sensitive
measure for comparison between Talking Signs speech and high fidelity reproduction.
In conclusion, the results of this study indicate that the speech intelligibility
of the Talking Signs system is sufficient for accurate understanding of
the transmitted messages. Bilingual users may be more sensitive to the imperfections
of the speech and therefore experience greater initial difficulty with
the system. While this difference may disappear with practice, a simple
solution is to adapt the system for use with several different languages.
In future stages of Talking Signs development the speech quality will certainly
be improved. The present study demonstrates that the current system is adequately
intelligible and shows that its utility in terms of orientation and mobility
performance will not be limited by speech quality.
References:
Loughborough, W. Talking Lights. Journal of Visual Impairment and Blindness.
1979, 73, 143.
Schenkman, B. The effect of different angles and training on the time of
detection in an orientation aid for the blind. Unpublished manuscript.
Siegel, S. Non-parametric statistics for the behavioral sciences. New York:
McGraw-Hill, 1956.
John Brabyn is co-director, Rehabilitation Center, and Lesley Brabyn is
research assistant, Smith-Kettlewell Institute of Visual Sciences.
This research was supported by a grant from the National Institute of Handicapped
Research (Grant no. 23-P-57590/9) to Dr. Arthur Jampolsky. The authors
wish to express their appreciation to William Loughborough, Talking Signs
inventor, for his suggestions.