NZSM Online



Seeing with Sound

Acoustical vision systems are opening up a new way of seeing for the blind.

Professor Leslie Kay

There has long been a search for a vision substitute for the blind: something that will give them the ability to perceive their surroundings well enough to move about and live independently. Traditionally, a cane or a dog guide has provided some assistance; however, the sonic "vision" of the bat may provide the ultimate answer, painting a complex picture of the environment using sound rather than sight.

Efforts have been made over many years to provide a form of artificial vision using electronic cameras coupled to computers. This approach transfers the information by vibrotactile devices which tap the skin, or by electrodes implanted directly in the occipital cortex, the vision centre of the brain. Little realistic progress has been reported.

The discovery, more than 50 years ago, of the bat's ability to "see with sound" stimulated scientific thought on how to use ultrasonic waves to see like the bat. The potential application for the blind was quite well publicised, especially through Professor Donald Griffin's book Echoes of Bats and Men. At that time, however, ultrasonic vision was envisaged only as an obstacle detector. It was not realised that the bat's remarkably effective information-gathering sonar could perhaps be used to produce rich spatial information, complete with object recognition capability.

In 1959, I had just left the highly classified field of underwater sonar development and, as an outlet for the sort of knowledge I'd gained, I proposed a sensing system for the blind which had many similarities to a bat's sonar. I studied these in collaboration with Griffin. Thus began the long haul towards what has recently become the first available acoustic vision substitute, modelling the dual functions of the eyes but using sound waves instead of light.

Providing a vision substitute for the blind is different from providing low-vision optical aids; the latter simply magnify things to enable ordinary vision to operate. The vision substitute is also different from hearing aids, which simply modify and amplify sound to permit hearing to occur. For the deaf, cochlear implants attempt to replace the mechanical process of the biological cochlea so that a form of hearing is recovered. None of these operates as a real-time interactive sensor, the way a vision substitute must if it is to adequately gather ever-changing information about the surrounding environment.

Early opposition to the use of sound focused on the complexity of the concept. Perceptual psychologists insisted that the presentation of spatial information to the blind must be easy to use and therefore of a simple form. "Single object detection" using a simple display, preferably not auditory, seemed to them the most appropriate way to go. Many simple single-object detecting devices have now been developed, but it has recently been recognised that these sensors do not in fact provide the blind with sufficient information for their needs, and only a few blind people use them.

It now seems to be accepted that a useful acoustic vision substitute must provide "rich" spatial information which permits the user to discriminate between objects in the local environment.

Ultrasonic waves produce aural information by reflecting off surfaces, in much the same manner that reflected light waves produce visual information. The information in the ultrasonic waves contains object distance, object direction, and the reflection characteristics of the elements forming the three-dimensional space we see. It has less detail because it operates at longer wavelengths (3-6mm, compared to the 0.0005mm, or thereabouts, of visible light), and it lacks the colour information present in visible light. The very wide-bandwidth ultrasonic waves instead carry information about surface texture.
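The quoted 3-6mm wavelengths follow directly from the ultrasonic frequencies the sensor uses (roughly 50-100 kHz) and the speed of sound in air. A quick check of the arithmetic:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def wavelength_mm(freq_hz: float) -> float:
    """Wavelength in millimetres for a sound wave of the given frequency."""
    return SPEED_OF_SOUND / freq_hz * 1000.0

print(round(wavelength_mm(100_000), 1))  # 100 kHz -> 3.4 mm
print(round(wavelength_mm(50_000), 1))   # 50 kHz  -> 6.9 mm
```

At roughly 500 nm (0.0005mm), visible light has wavelengths some ten thousand times shorter, which is why the acoustic "image" carries far less spatial detail.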

It was the perceived difference between the visual optical image with which we are familiar and the acoustic ultrasonic image with which we are not that led psychologists to declare that blind people would find great difficulty in learning to interpret this ultrasonic "vision". This was especially so when it became clear that, as in the bat, the auditory system would have to process the unique spatial information binaurally, using both ears. The question was: "what would the brain eventually perceive, and how difficult would it be to learn to use this?"

Acoustic Spatial Perception

Despite concerted opposition to this approach, an ultrasonic electronic instrument was eventually developed which enables a high degree of spatial awareness (imaging) well beyond what I first envisaged. It now seems to satisfy some of the earlier critics. Through the years, the spatial information gathering process has increased in capability very significantly and -- what is very surprising -- the process of learning its use has become easier and quicker.

The technological key was the development of a sensing element which would convert ultrasonic waves in air into understandable audio signals in real time. The sensor is fitted into a headband carrying the ultrasonic transducers and the specially designed earphones. It emits a series of ultrasonic waves ahead of the wearer, sweeping from around 100 kHz down to 50 kHz. The reflected and scattered waves are acoustically "textured" by the complex surfaces of objects, and carry information about this "insonified" environment. These waves continuously pass the head of the perceiver, enveloping it with sufficient spatial information to form a 3D acoustic "hologram" of the surroundings up to a selectable distance. A simple array of three specially shaped sensing elements in the sensor capsule samples this "holographic" field as the head is moved.

The special transducer array of receiving elements generates three independent broadband electrical signals, which are converted into spatially related audio signals and thence into binaural stereo sound in the 0-5000Hz range. Each complex-shaped reflecting surface produces a complex of tones, each tone having a frequency proportional to the distance of the reflecting element. The tone complexes enable high-resolution discrimination between objects at different distances.
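This distance-to-tone mapping is characteristic of continuous-transmission frequency-modulated (CTFM) sonar: the returning echo is mixed with the outgoing sweep, and the difference ("beat") frequency grows with the round-trip delay, hence with distance. A minimal sketch of the relationship, with sweep parameters chosen purely for illustration rather than taken from the device's actual specification:

```python
SPEED_OF_SOUND = 343.0      # m/s in air
SWEEP_BANDWIDTH = 50_000.0  # Hz: e.g. a sweep from 100 kHz down to 50 kHz
SWEEP_PERIOD = 0.1          # s: assumed repetition period (illustrative)

def beat_frequency(distance_m: float) -> float:
    """Audible tone frequency for an echo from a surface `distance_m` away."""
    round_trip_delay = 2.0 * distance_m / SPEED_OF_SOUND  # seconds out and back
    sweep_rate = SWEEP_BANDWIDTH / SWEEP_PERIOD           # Hz swept per second
    return sweep_rate * round_trip_delay

# Nearer surfaces give lower tones; each reflecting element of a complex
# object contributes its own tone, forming the "tone complex":
for d in (0.5, 1.0, 1.5):
    print(f"{d:.1f} m -> {beat_frequency(d):.0f} Hz")
```

With these assumed parameters, objects within a couple of metres map into the audible band; a complex object returns many such tones at once, one per reflecting element.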

Acoustic "texturation" comes about through the modulation of the ultrasonic waves by complex surfaces. The result is time-varying fields of scattered and reflected waves which vary in both intensity and field shape. This allows surface-related sound signatures to be formed -- creating a sound language -- and auditory discrimination between objects can take place. This is fundamental to the perceptual process.

Thus the stereo sound signatures at the eardrums represent the 3D surfaces producing the echo waves. Fine neck movement causes the audio signatures to vary in real time as the sensing array scans the ultrasonic field at the head. The audible sounds presented to the two ears are processed by the physiological operation of the two cochleas.

These act as continuous frequency spectrum analysers, sending the information in the sound waves through the auditory neural channels as electrical impulses. These stimuli provide the neural information for the formation of an image of the "insonified" surroundings. The information is initially perceived as complex sounds which vary as the head moves, providing fast feedback. This real-time response and interaction with the environment is unique among such systems.

As with any perceptual process, interpreting the sounds to be able to perceive one's surroundings requires learning. Psychologists had thought that this would be too difficult and, for a time, it was.

New Perceptual Process

To perceive something, the incoming sensory stimulus must be analysed and synthesised. With vision, you can usually look at something long enough for it to be identified, but with auditory input, the stimulus is transient and normally changes continually and uncontrollably with time. The information in the stimulus might not remain available for analysis and synthesis without some form of auditory store to aid perception. Vision involves an iconic store, but both areas of perception require a great deal more study.

The important difference between a normal transient auditory stimulus, such as vocal sound, and the new spatial stimulus of the vision substitute is that the latter is under the control of the blind observer through sensitive neck movement. This was true of the binaural sensors developed earlier, but the spatial resolution of those early models was poor, and they could cope only with a sparsely populated environment.

The new Trisensor (named for its three sensing elements) provides both a central narrow field and a wide peripheral field of view, modelling the dual functions of the eye. An object of interest in the wide field of view can be fixed on acoustically by turning the head; fine neck movements then explore the characteristics of the object using the central narrow field. This models the foveal saccadic movements of the ordinary eye, in which objects are fixed upon and rapidly scanned. As with ordinary vision, most of the surrounding environment remains constant while this scanning takes place.

The 200-5000Hz tone complexes produced by the sensor system vary at the ears in real time with head movement, and are perceived to do so instantaneously. There is no perceived delay in the feedback loop as the head is moved -- even very fast -- suggesting that the biological feedback loop is very tight.

The Acoustic Experience

To a beginner, the complex of tones is confusing, and an introduction to the new perceptual process is best carried out in a simple situation, such as "looking at" objects on a table. This lets the user explore using kinaesthetic-haptic feedback (reaching and grasping), just as in the normal maturation process when a baby first learns to judge object distance and size.

First, one acoustically simple object is used (a vertical circular rod); then multiple rods are introduced as sensing skills increase. The object shape is then varied, and more complex shapes and structural patterns are gradually introduced as perception develops. From small-scale space on the table, it is a natural step to transfer to medium-scale space on the floor and introduce movement, guided by the spatial perception from the sensor.

An exercise programme of 40 lessons has been conducted to study the change in perceptual skill and behaviour of blind children. They were presented with simple objects within reaching distance on a table top. Videos of the children show that both the peripheral and the central fields of view were being used by the time these exercises had been learned. Accurate fixation on an object through the central channel was observed regularly.

Because of this behaviour change, there can be little doubt that the children were using the sensor to form a new kind of perception of the small and medium-scale space. The trajectory of hand reaching to grasp a rod was almost sighted-like in some of the children, and all learned to reach instead of grope.

The acoustic sensor has now been extensively evaluated by independent psychology and educational research groups, and its usability has been established.

One totally blind nine-year-old, who shuffled around the school in fear of colliding with objects, had initially been rated as unteachable as a student of orientation and mobility (O&M). When required to learn the long cane, she insisted that she be allowed to use the Trisensor at the same time. An experienced O&M specialist acceded to the demand and discovered that this child could learn to travel better than all previous students.

Another six-year-old blind girl, with a specially designed sensor which had just a small peripheral field, learned to get about the playground of a sighted school. She became so adapted to the sensory information that she was unexpectedly captured on video running through the school yard between buildings, avoiding the many sighted children, to get to her class.

A two-and-a-half-year-old child with no locomotion, who did not respond to any verbal sounds, is seen on video six months later walking around the backyard following his mother's voice. When close to a silent van he catches hold of his mother's hand. At this point the unexpected happens -- the child releases his mother's hand and accurately places his other hand, palm outwards, on the side of the van in a way that makes it obvious he knew exactly where it was. Had the hand not made contact with the van, the boy would have lost his balance. He too was wearing a binaural sensor with only a peripheral field of view.

The peripheral fields enabled controlled locomotion, but the learning process was slow and object discrimination was possible only during motion. The Trisensor vision substitute has been shown to have a spatial resolution six times better than that of the commercial version of the binaural aid (the Sonic Glasses). This allows fixation on an object for the recognition process, something not available in the earlier sensors I developed.

A field test is now in progress internationally to determine the acceptability of the Trisensor. This involves a period of learning by teachers, a period of training blind people, and an assessment of the acceptability of the acoustic vision substitute by the user groups: babies, children, young adults, mature adults and older persons. The addition of the central field of view makes learning easier and quicker than with the earlier binaural sensors, and gives the device wider application.

It is not obvious that this new sense will be used. It certainly will not be if the device is not manufactured as a commercial venture. The process has now become one of marketing a sophisticated man-machine interactive technology to a humanities-oriented service provider.

Professor Leslie Kay is director of the Bay Advanced Technologies Ltd Spatial Sensing Laboratory in Russell.