It is estimated that by 2030, people over the age of 65 will account for more than 20% of the total population of the United States. Hearing and vision loss naturally accompany the aging process. Persons with hearing loss can greatly improve their ability to comprehend speech by observing visual cues from a speaker, such as the shape of the lips and facial expressions. However, persons with vision loss cannot make use of these visual cues and have a harder time understanding speech, especially in noisy environments. Furthermore, people with normal vision can use visual information to identify a speaker in a group, which allows them to focus on that person. This can greatly benefit a person with hearing loss who may be using a device such as a sound amplifier or a hearing aid. A user with vision loss, however, needs to be provided with this speaker information to make optimal use of such devices.
We propose developing a prototype device that will clean the speech signal from a target speaker and improve speech comprehension for persons with hearing and vision loss in everyday situations. To accomplish this, we need to harness the visual cues that have so far largely been ignored in the design of assistive technologies for persons with hearing loss. Our first aim is to learn speaker-independent visual cues that are associated with the target speech signal, and to use these audio-visual cues to design speech enhancement algorithms that perform much better in noisy everyday environments than current methods, which use only the audio signal. We will use a video camera and computer vision methods to design advanced digital signal processing techniques that enhance the target speech signals recorded through a microphone. We believe that the end product will show the feasibility and importance of incorporating multiple modalities into sensory assistive devices, and set the stage for future research and development efforts.