
Assistive Technology® (2013) 25, 216–221 Copyright © 2013 RESNA ISSN: 1040-0435 print / 1949-3614 online DOI: 10.1080/10400435.2013.768718

An Assistive Device for Direction Estimation of a Sound Source

KI-WON KIM, MD*, JUNG-WOO CHOI, PhD, and YANG-HANN KIM, PhD


Mechanical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

A wearable assistive device is proposed for people who cannot hear but can see: when a loud sound occurs, the device visually indicates the direction of its source. The device has been implemented in the shape of common eyeglasses, so that it does not draw attention to the wearer's hearing impairment. Acoustical information is acquired by an array of microelectromechanical-system microphones attached to the eyeglasses. The direction of a sound source is estimated in real time and indicated to the user via four light-emitting diodes designating the front, back, left, and right directions. Two methods for estimating the sound direction were compared: a delay-and-sum beamformer and a sound pressure level (SPL) comparison. The performance of the directional estimation was evaluated using a head-and-torso simulator. The SPL comparison method was 92% accurate versus the 84% accuracy of the delay-and-sum beamformer approach. Keywords: assistive devices, deaf/hard of hearing, instrument development

Introduction

Several assistive technologies have been studied to convey sound information to deaf individuals through tactile or visual channels. Damper and Evans (1995) proposed an electronic system to alert individuals to the occurrence of household sounds, such as doorbells, telephones, or smoke alarms. Ho-Ching, Mankoff, and Landay (2003) attempted to deliver sound information visually by a spectrographic scheme with positional ripples; their method displays the position, pitch, and volume of the sound. Azar, Saleh, and Al-Alaoui (2007) presented different visual displays fused into a single program to enhance the user's awareness of his or her surroundings. Although all these sound visualization methods identify the location of a sound source, they do so only in special spaces where microphone arrays are installed. The systems proposed in these studies are also non-portable; hence, they cannot be used when deaf individuals are outdoors.

If deaf individuals are able to see, and an important sound occurs within their field of view, they can recognize the sound by observing their environment visually. However, they still have difficulty in recognizing and reacting to important sounds that occur outside their field of view. In this situation, a device that can display where a sound originates would assist them and allow them to become aware of their surroundings. The practical requirements of such a device are that it should be portable, lightweight, and wearable. An eyeglasses-type device is a good candidate because it could be worn without any unnatural attachments, and visual indicators could be seamlessly installed in the user's field of view.



Address correspondence to: Ki-Won Kim, Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea. Email: [email protected]

The ultimate device would be special eyeglasses that display the locations and propagation patterns of various sound sources. However, displaying such information would require expensive and sophisticated instrumentation. As an initial step toward developing such an ultimate device, we have attempted to construct eyeglasses that simply display the propagation directions of important or non-trivial sounds, such as various warning signals that occur on a street (Figure 1).

Development of an Eyeglasses-type Assistive Device

To measure sound pressure signals, seven microphones were attached to the inner surface of a pair of eyeglasses with equal spacing (6 cm), as shown in Figure 2. It is noteworthy that the microphone spacing and distribution largely determine the performance of the proposed device; when a beamforming technique is used (explained in the next section), they govern properties such as the spatial resolution and spatial aliasing. Although other microphone arrangements with better performance may exist, we find that the arrangement presented here is quite acceptable for deciding whether a signal originates from the front, back, left, or right. Two microphones are assigned to each of the left and right sides of the eyeglasses, and three microphones are assigned to the front side. Microelectromechanical-system (MEMS) microphones (Analog Devices ADMP401, 8 mV/Pa) were selected in the present work because their low power consumption (825 µW) and small size (4.72 mm × 3.76 mm × 1.0 mm) are suitable for portable devices. In addition, the inner surface of the eyeglasses was equipped with four light-emitting diodes (LEDs) to indicate the direction of a sound source, one LED for each direction: left, right, front, and back.

The processing unit for direction estimation was composed of three main parts (Figure 3). The measured sound pressure signals were acquired by a data acquisition device (NI9234). From these signals, the direction estimation results were obtained by a real-time algorithm running on a laptop PC. A signal was then generated by the algorithm and fed to the corresponding LED via an analog output module (NI9263).

Fig. 1. The ultimate goal of an assistive device in which sound visualization techniques are applied. The eyeglasses display the directions of important or non-trivial sounds, such as various warning signals that occur on a street (color figure available online).

Fig. 2. An eyeglasses-type assistive device that displays where a sound originates. Seven microphones were attached to the inner surface of the glasses with equal spacing (6 cm), and four LEDs are installed. Two microphones are assigned to the left and right sides of the eyeglasses, and three microphones are for the front side (color figure available online).

Fig. 3. Equipment setup for demonstration of the assistive device. The device is composed of two main parts: the eyeglasses with sensors and LEDs and the processing unit. The processing unit is made up of a data acquisition device (NI9234), a laptop PC for signal processing, and an analog signal output device (NI9263) (color figure available online).
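To illustrate the acquisition front end, the sketch below reads one multichannel block using the generic sounddevice library as a stand-in for the NI9234 hardware; both the library choice and the sampling rate are assumptions for illustration, since the paper does not specify the software stack.

```python
import sounddevice as sd  # generic audio I/O; a stand-in for the NI9234 here

FS = 25600      # sampling rate [Hz]; an assumed value, not given in the paper
N_MICS = 7      # seven MEMS microphones on the eyeglasses

def read_block(block_sec=0.1):
    """Acquire one time signal block as an (M, N) array (mics x samples)."""
    frames = int(FS * block_sec)
    block = sd.rec(frames, samplerate=FS, channels=N_MICS, dtype="float64")
    sd.wait()   # block until the recording completes
    return block.T
```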

Direction Estimation


To estimate the direction of a sound source, we assume that the sound source is farther from the eyeglasses than the Rayleigh distance (2A²/λ, where A is the size of the array and λ is the wavelength), so that the propagating sound can be considered a plane wave. We also assume that the direct sound is dominant relative to the reflected one. These assumptions are justified because the eyeglasses are mainly intended to alert the wearer to the location of potentially harmful noise sources. For direction estimation, we first need to define what we mean by "front," "back," "left," and "right" in terms of the azimuth angle ϕ (Figure 4). When the wearer turns his or her gaze toward the direction where a sound occurs, the corresponding region should fall inside the wearer's field of view so that the wearer can understand the situation. According to the literature (e.g., see Ferre, Aracil, & Sanchez-Uran, 2008), a typical human has a field of view of roughly 120 degrees, which allows us to define four regions: "front" (−45° ≤ ϕ ≤ 45°), "left" (45° ≤ ϕ ≤ 150°), "back" (150° ≤ ϕ ≤ 210°), and "right" (210° ≤ ϕ ≤ 315°). Once the direction of a sound source, ϕs, is estimated, we can determine the corresponding region and illuminate the LED for that region. Therefore, our problem is to estimate ϕs accurately, determine the incident region, and light the corresponding LED. Hence, we begin by investigating candidate techniques that can estimate ϕs accurately and quickly, given the pressure signals measured at the seven microphones.

Fig. 4. A plane wave from a sound source is incident upon the eyeglasses on the wearer's head. The eyeglasses quickly estimate where the sound originates: front, back, left, and right (color figure available online).

The first candidate is a delay-and-sum beamformer, which requires the fewest calculations among the various beamforming methods (e.g., see Choi & Kim, 1995; Johnson & Dudgeon, 1993; Pillai, 1989; Van Trees, 2002). As Figure 5 illustrates, this method works by compensating the relative time differences between the microphones caused by the differences in the wave propagation paths. When a plane wave is incident from ϕs at speed c, the acoustic pressure signal measured at the m-th microphone can be expressed as p(r_m, ϕ_m, t), where (r_m, ϕ_m) denotes the location of the m-th microphone. The beam output steered to an arbitrary direction, z(ϕ, t), is then obtained as follows:

$$z(\varphi, t) = \frac{1}{M} \sum_{m=1}^{M} p\big(r_m, \varphi_m, t + \tau_{m,1}(\varphi)\big), \qquad (1)$$

where τ_{m,1}(ϕ) is the relative time delay between the m-th microphone and the first microphone for an arbitrary ϕ-direction. This term can be expressed as follows:

$$\tau_{m,1}(\varphi) = \frac{r_m \left[ \cos(\varphi - \varphi_1) - \cos(\varphi - \varphi_m) \right]}{c}. \qquad (2)$$

In this article, we define the beam power, BP, as the sound pressure level (SPL) of the beam output, expressed by Equation (3). We then estimate the direction of a sound source by finding the maximum beam power:

$$\mathrm{BP}(\varphi) = 10 \log \left( \frac{1}{T} \int_{0}^{T} z^{2}(\varphi, t)\, dt \,\Big/\, p_{\mathrm{ref}}^{2} \right), \qquad p_{\mathrm{ref}} = 20\ \mu\mathrm{Pa}. \qquad (3)$$

Fig. 5. Schematics of sound source localization using the delay-and-sum beamformer. A plane wave propagated from the source is measured by the microphone array, and the beam output and beam power are then obtained by the delay-and-sum beamformer (color figure available online).
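To make the beamformer concrete, the sketch below implements Equations (1)–(3) in Python with nearest-sample delays, a common simplification of the continuous delay in Equation (2). It is an illustration under assumed inputs, not the authors' implementation; the microphone coordinates (r_m, ϕ_m) must come from the actual eyeglasses geometry, which the paper specifies only as a 6-cm spacing.

```python
import numpy as np

def beam_power(p, mic_r, mic_phi, fs, c=343.0, dphi_deg=5.0, p_ref=20e-6):
    """Delay-and-sum beam power over a grid of steering angles.

    p       : (M, N) pressure signals, one row per microphone [Pa]
    mic_r   : (M,) radial microphone coordinates r_m [m]
    mic_phi : (M,) angular microphone coordinates phi_m [rad]
    fs      : sampling rate [Hz]
    """
    M, _ = p.shape
    angles = np.deg2rad(np.arange(0.0, 360.0, dphi_deg))
    bp = np.empty(len(angles))
    for i, phi in enumerate(angles):
        # Eq. (2): delay of mic m relative to mic 1 for steering angle phi
        tau = mic_r * (np.cos(phi - mic_phi[0]) - np.cos(phi - mic_phi)) / c
        shifts = np.rint(tau * fs).astype(int)  # nearest-sample approximation
        # Eq. (1): average the delay-compensated signals (np.roll wraps at the
        # block edges, which is acceptable for a short illustrative block)
        z = np.mean([np.roll(p[m], -shifts[m]) for m in range(M)], axis=0)
        # Eq. (3): beam power as the SPL of the beam output
        bp[i] = 10.0 * np.log10(np.mean(z**2) / p_ref**2)
    # The estimated source direction maximizes the beam power
    return np.degrees(angles[np.argmax(bp)]), bp
```

Fractional-delay interpolation would reduce the quantization error of the nearest-sample shifts, at extra computational cost.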

The second candidate is a simpler method that directly compares the SPLs of the microphone groups facing front, back, left, and right. This method exploits the scattering of sound around the user's head, whereby the sound pressure is amplified on the side facing the incident direction and diminished on the opposite side. On the basis of this phenomenon, we estimate the direction of a sound source by comparing the averaged SPLs calculated at the front, back, left, and right. All the microphones are divided into four groups, depending on their positions, and the averaged SPL for each direction is obtained, as shown in Figure 6. The direction with the largest level is then taken as the direction of the sound source. As shown in Table 1, this method requires only 4 × (6N + 3) calculations, whereas the delay-and-sum beamformer requires (360/Δϕ) × (7N² + 9N) calculations, where Δϕ is the angular gap between steering angles and N is the number of data points in a time signal block.
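A minimal sketch of the SPL comparison method follows. The channel-to-group mapping is a hypothetical assignment for illustration only: the paper states that three microphones face front and two sit on each side, but not which channels form the "back" group.

```python
import numpy as np

P_REF = 20e-6  # reference pressure [Pa]

# Hypothetical grouping by channel index (illustrative, not the authors' wiring);
# the "back" group here assumes the rearmost temple microphone on each side.
GROUPS = {
    "front": [2, 3, 4],
    "left":  [0, 1],
    "right": [5, 6],
    "back":  [1, 5],
}

def spl_comparison(p, groups=GROUPS):
    """Estimate the source region by comparing group-averaged SPLs (Figure 6)."""
    spl = {}
    for region, idx in groups.items():
        mean_square = np.mean(p[idx] ** 2)            # averaged over the group's mics
        spl[region] = 10.0 * np.log10(mean_square / P_REF**2)
    return max(spl, key=spl.get), spl                 # largest SPL wins
```

Because each group needs only a mean square and one logarithm, the cost stays linear in N, consistent with the 4 × (6N + 3) count in Table 1.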

Implementation for Real-time Operation

To reduce unnecessary operation and display only important information, a threshold for detecting sounds is required; the results are displayed only when a significant sound occurs, not for unimportant sounds or background noise. The threshold should be updated automatically according to the background noise level. Dufaux, Besacier, Ansorge, and Pellandini (2000) suggested such a threshold, although it was limited to impulsive sounds. In the present work, this threshold has been modified so that it can also be applied to non-impulsive sounds.


Fig. 6. Schematics of the SPL comparison method. The microphones are divided into four groups that face in the front, back, left, and right directions (color figure available online).

Table 1. Number of calculations for the delay-and-sum beamformer and the SPL comparison method.

Process                                          Delay-and-sum beamformer   SPL comparison
Time delay compensation (convolution integral)   7N²                        ·
SPL calculation                                  ·                          6N
Average                                          7N                         3
Beam power                                       2N                         ·
Steering                                         360/Δϕ                     4
Total                                            (360/Δϕ) × (7N² + 9N)      4 × (6N + 3)

Note. N = the number of data points measured at one microphone; 360/Δϕ = the number of steering angles; · = no calculation load in this process.

The signal power e(k) for the k-th time signal block is defined as

$$e(k) = \frac{1}{M} \sum_{m=1}^{M} \left[ \frac{1}{T} \int_{0}^{T} p_m^{2}\big(t + (k-1)T\big)\, dt \right], \qquad k = 1, 2, \ldots \qquad (4)$$

where T is the length of the time signal block and M is the number of microphones. The signal power sequence e_pre(k) is then composed of the signal powers of the previous time signal blocks:

$$\mathbf{e}_{\mathrm{pre}}(k) = \left[ e(k-L)\ \ e(k-L+1)\ \cdots\ e(k-1) \right]^{T}, \qquad (5)$$

where L is the number of elements and is chosen to be 20. The threshold for detecting sounds is defined by Equation (6), with a factor α that is proportional to the threshold level:

$$th = \alpha \left( 5 \cdot \mathrm{std} + m \right), \qquad (6)$$

where m and std are the mean and standard deviation, respectively, of the elements of the sequence. The factor α is chosen to be 10 so that a sound is detected only when it is sufficiently loud (approximately 10 dB above the background noise). The power sequence is updated only if the present e(k) is less than or equal to the threshold:

$$\mathbf{e}_{\mathrm{pre}}(k+1) = \begin{cases} \text{no update}, & e(k) > th \\ \left[ e(k-L+1)\ \cdots\ e(k-1)\ \ e(k) \right]^{T}, & e(k) \le th \end{cases} \qquad (7)$$
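The detector of Equations (4)–(7) reduces to a few lines; the sketch below assumes blocks arrive as (M, N) arrays and keeps the power sequence in a fixed-length deque (seeding the first L powers from background noise is left to the caller).

```python
import numpy as np
from collections import deque

ALPHA = 10  # threshold factor alpha; ~10 dB above background, as in the text
L = 20      # number of past block powers retained, as in Eq. (5)

def block_power(p_block):
    """Eq. (4): signal power of one time block, averaged over mics and time."""
    return np.mean(p_block ** 2)

def detect(p_block, e_pre):
    """Eqs. (5)-(7): adaptive threshold test on one block.

    e_pre : deque(maxlen=L) of past block powers, assumed pre-seeded with
            L powers measured during background noise.
    """
    e_k = block_power(p_block)
    th = ALPHA * (5.0 * np.std(e_pre) + np.mean(e_pre))  # Eq. (6)
    if e_k > th:
        return True, e_pre       # sound detected; sequence not updated (Eq. 7)
    e_pre.append(e_k)            # Eq. (7): deque drops e(k-L), appends e(k)
    return False, e_pre
```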

For practical use, a recognition procedure should be added on top of the sound detection procedure to discern important sounds from unimportant ones or background noise. Such a procedure can be implemented by applying sound recognition techniques (e.g., see Dufaux et al., 2000). If a sound is detected, the eyeglasses should then estimate the region where the sound source is located in real time. Let the entire calculation time Tc be the time required to produce the result from the measured time signal block [(k − 1)T ≤ t ≤ kT]. We refer to the system as a real-time system if the calculation time is less than the block length: Tc ≤ T. In other words, if the results are produced sequentially from each time signal block without any delay or omission, the device can be considered to act in real time. The procedure in Figure 7 illustrates how the device operates in real time (a minimal loop skeleton follows this list):

(1) Acquisition of the acoustic pressure signal: T = 0.1 sec
(2) Calculation of th and e(k)
(3) Triggering of sound detection:
    e(k) > th: go to the direction estimation procedure
    e(k) ≤ th: go back to procedure (1)
(4) Direction estimation of the sound source
(5) Illumination of the corresponding LED
(6) Repeat (1)–(5)
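Tying the pieces together, a loop over the six procedures might look as follows; read_block, estimate_direction, and set_led are hypothetical stand-ins for the acquisition hardware, either estimator above, and the LED output via the analog module.

```python
def run(read_block, estimate_direction, set_led, e_pre, block_sec=0.1):
    """Real-time loop of Figure 7; real-time operation requires Tc <= T."""
    while True:
        p_block = read_block(block_sec)              # (1) acquire one 0.1-s block
        detected, e_pre = detect(p_block, e_pre)     # (2)-(3) threshold test
        if detected:
            region = estimate_direction(p_block)     # (4) beamformer or SPL method
            set_led(region)                          # (5) light the matching LED
        # (6) the while-loop repeats procedures (1)-(5)
```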


Fig. 7. Flow chart for operating the assistive device in real time (color figure available online).

Performance Test

The experimental system was installed in an anechoic chamber to evaluate the directional estimation performance of the developed eyeglasses. To reproduce the situation in which a deaf individual wears the assistive device, we used a head-and-torso simulator (HATS, B&K 4128-C) that represents an average human adult. As shown in Figure 8, a sound source radiated band-limited white noise (0.25–8 kHz), and the MEMS microphones attached to the eyeglasses measured the acoustic pressure signals. The distance between the speaker and the eyeglasses was 3.5 m; by the definition of the Rayleigh distance, the incident wave can therefore be considered a plane wave up to a frequency of approximately 10 kHz. The experiment was carried out by changing the direction of the speaker in angular steps of 15 degrees to observe changes in performance with respect to the direction of the sound source.

Fig. 8. Experimental setup for measuring the localization performance (color figure available online).

Table 2. Comparison of the overall accuracy of directional estimation.

             Delay-and-sum beamformer        SPL comparison
Direction    Correct       Incorrect         Correct       Incorrect
Front        34/42         8/42              38/42         4/42
Left         39/42         3/42              41/42         1/42
Back         13/18         5/18              17/18         1/18
Right        35/42         7/42              36/42         6/42
Total        121/144       23/144            132/144       12/144

The directional estimation performance of the two methods is compared in Table 2. The overall accuracy was calculated by counting the directions correctly identified by the eyeglasses at six frequencies (0.25, 0.5, 1, 2, 4, and 8 kHz). Because the direction was changed in 15-degree steps, each method estimated the direction of the sound source 144 times (24 directions × 6 frequencies). The results show that the SPL comparison method (approximately 92% accuracy) is more accurate than the delay-and-sum beamformer (approximately 84% accuracy), although each method has its pros and cons. First, consider the delay-and-sum beamformer. Figure 9(a) depicts the beam power distribution with respect to frequency and steering angle for a sound source positioned at ϕs = 0 degrees. The delay-and-sum beamformer can be used only up to 2.5 kHz; in the 2.5–8 kHz range, its use is restricted by the high level of the sidelobes. All of the beamformer's incorrect results in Table 2 occurred at 4 and 8 kHz, where the sidelobes confuse the estimation of the correct direction. The results of the SPL comparison method, on the other hand, show that the direction of the sound source can be estimated over a relatively wide frequency range (0.25–8 kHz), because the SPL differences between the directions become more pronounced as the frequency increases. However, the estimation performance is degraded at the boundary of two adjacent directional categories because of the small SPL difference there, as shown in Figure 9(b). From the results of the experiment, we conclude that the delay-and-sum beamforming method is useful in the low-frequency range below 2.5 kHz, whereas the SPL comparison method can be applied in the high-frequency range of 2.5–8 kHz.

Fig. 9. Results of the performance tests: (a) beam power distribution with respect to the frequency and the steering angle, and (b) results of the SPL comparison. The red dotted box in (b) indicates the boundary of two directions (color figure available online).

Conclusion

In the present work, we developed an eyeglasses-type assistive device for deaf individuals who can see. Seven MEMS microphones were attached to the surface of a pair of eyeglasses with equal spacing, and four LEDs were installed to display the results of the directional estimation. The delay-and-sum beamformer and the SPL comparison method were used to estimate the direction of a detected sound source in real time. To quantify the performance, a HATS representing an average adult was used in an experiment. The SPL comparison method has the advantages of simplicity and speed compared with the delay-and-sum beamformer. In addition, the SPL comparison method can be used in the high-frequency range. However, it has the disadvantage of poor accuracy at the boundary of two adjacent directions. To be more useful in practice, the assistive device should be improved in two respects. First, the processing unit should be miniaturized to make the device more portable. Second, a recognition method should be developed to discern important sounds among all detected sounds.

Acknowledgments

This research was supported by the Converging Research Center Program funded by the Ministry of Education, Science and Technology (2012K001329); the UTRC (Unmanned Technology Research Center) at KAIST (Korea Advanced Institute of Science and Technology), originally funded by DAPA (Defense Acquisition Program Administration) and ADD (Agency for Defense Development); the Ministry of Knowledge Economy grant funded by the Korea government (Grant No. 10037244); and the BK21 (Brain Korea 21) project initiated by the Ministry of Education, Science and Technology.

References

Azar, J., Saleh, H. A., & Al-Alaoui, M. A. (2007). Sound visualization for the hearing impaired. International Journal of Emerging Technologies in Learning (iJET), 2(1), 1–7.

Choi, J.-W., & Kim, Y.-H. (1995). Spherical beam-forming and MUSIC methods for the estimation of location and strength of spherical sound sources. Mechanical Systems and Signal Processing, 9, 569–588.

Damper, R. I., & Evans, M. D. (1995). A multifunction domestic alert system for the deaf-blind. IEEE Transactions on Rehabilitation Engineering, 3, 354–359.

Dufaux, A., Besacier, L., Ansorge, M., & Pellandini, F. (2000, September). Automatic sound detection and recognition for noisy environment. Paper presented at the X European Signal Processing Conference, Tampere, Finland.

Ferre, M., Aracil, R., & Sanchez-Uran, M. A. (2008). Stereoscopic human interfaces: Advanced telerobotic applications for telemanipulation. IEEE Robotics & Automation Magazine, 15, 50–57.

Ho-Ching, F. W., Mankoff, J., & Landay, J. A. (2003, April). Can you see what I hear? The design and evaluation of a peripheral sound display for the deaf. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 161–168). ACM.

Johnson, D. H., & Dudgeon, D. E. (1993). Beamforming. In Array signal processing: Concepts and techniques (pp. 112–119). Englewood Cliffs, NJ: Prentice Hall.

Pillai, S. U. (1989). Conventional techniques. In C. S. Burrus (Ed.), Array signal processing (pp. 15–18). New York, NY: Springer-Verlag.

Van Trees, H. (2002). Frequency–wavenumber response and beam patterns. In Optimum array processing: Part IV of detection, estimation, and modulation theory (pp. 23–36). New York, NY: Wiley.
