# BioAid Part 2: From Auditory Model to Hearing Aid Algorithm

In a previous blog post, I introduced the BioAid project and discussed some of the motivations. I advise reading that post before reading this one. In this second installment, I aim to describe the algorithm architecture in technical detail and then discuss some of its properties. This information is placed on my blog, allowing me to rapidly and informally communicate some of the technical details related to the project while I gather thoughts in preparation for a more rigorous account.

## Modeling the Auditory Periphery

The architecture of the BioAid algorithm is based on a computational model of the auditory periphery, developed in the hearing research laboratory at the University of Essex. This model has undergone refinements over a time period spanning four decades. Therefore, it would be unwise to describe it in detail in this blog post! However, an overview can be given that describes the processes most relevant to the design of BioAid.

The human auditory periphery (sound processing associated with the ear and low-level brain processing) is depicted in the abstract diagram below. The images represent the stages of processing in the auditory periphery that are modeled. Acoustic pressure waves enter the ear. The waveform in the diagram is a time-domain representation of the utterance ‘2841’ spoken by a male talker. The middle ear converts these pressure fluctuations into a stapes displacement that drives the motion of fluid within the cochlea. In turn, this fluid motion displaces a frequency-selective membrane, the basilar membrane (BM), which runs the length of the cochlea. Along the length of the BM are:

• Active structures that change the passive vibration characteristics of the membrane in a stimulus dependent manner.
• Transduction units that convert the displacement information at each point along the membrane into an electrical neural code that can be transmitted along the auditory nerve to the brain.

The image plot on the right shows the simulated neural code. The output of the process is made up of multiple frequency channels (y-axis), each containing a representation of neural activity as a function of time (x-axis). The output resembles a spectrogram in its basic structure, although the non-linear processing makes it rather unique. For this reason, it is referred to as the auditory spectrogram.

Diagram showing peripheral auditory processes. The input is shown on the left, and is processed to produce the output shown on the right.

This biological system can be modeled as a chain of discrete sequential processes. In general, the output of each process feeds into the next process in the sequence. The model takes an array of numbers representing the acoustic waveform as its input. This is then processed by an algorithm that converts the acoustic representation to the displacement of the stapes bone within the middle ear. Following this, an algorithm converts the stapes displacement into a multichannel representation of BM displacement along the cochlear partition. Next is a model of the transduction units, which convert the multichannel displacement information into a multichannel neural code. This is a representation of the information that would be conveyed by the auditory nerve to the brain.
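The chain structure can be sketched as simple function composition. This is a minimal illustration, not the Essex model: the stage functions `middle_ear`, `basilar_membrane` and `transduction` are hypothetical placeholders with deliberately trivial maths. Only the data flow (one waveform in, several channels of activity out) reflects the description above.

```python
from typing import Any, Callable, List, Sequence

# A stage consumes the previous stage's output and returns its own output.
Stage = Callable[[Any], Any]


def run_chain(waveform: Sequence[float], stages: Sequence[Stage]) -> Any:
    """Feed an acoustic waveform through each processing stage in turn."""
    data: Any = list(waveform)
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical placeholder stages: trivial maths, realistic data flow.
def middle_ear(pressure: List[float]) -> List[float]:
    # Pressure -> stapes displacement (here just a fixed scaling).
    return [1e-3 * p for p in pressure]


def basilar_membrane(stapes: List[float]) -> List[List[float]]:
    # Stapes displacement -> one BM displacement trace per frequency channel.
    channel_gains = [0.5, 1.0, 2.0]  # stand-ins for frequency-selective filters
    return [[g * s for s in stapes] for g in channel_gains]


def transduction(bm: List[List[float]]) -> List[List[float]]:
    # BM displacement -> "neural" activity; half-wave rectification as a crude proxy.
    return [[max(0.0, d) for d in channel] for channel in bm]


auditory_spectrogram = run_chain(
    [1.0, -1.0, 0.5], [middle_ear, basilar_membrane, transduction]
)
print(len(auditory_spectrogram))  # number of frequency channels
```

Swapping in a more realistic model only means replacing the stage functions; `run_chain` itself stays the same.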

The auditory model can then be used for various tasks. By making a model that can reproduce physical measurements, you can then use the model to predict the output of the system to all manner of different stimuli. For example, we know that the human auditory system is excellent at extracting speech information from noisy environments. By using the auditory model as a front end for an automatic speech recognizer, the modeler can investigate how the different components of the auditory periphery may contribute to this ability.

## The basic dual resonance non-linear filterbank

There are numerous models of cochlear mechanics. The dual resonance non-linear (DRNL) filterbank is the model developed within the Essex lab. BioAid is fundamentally a modified form of the latest version of the DRNL model.

The DRNL model was originally designed to account for two major experimental observations. The first observation is the non-linear relationship of BM displacement relative to stapes displacement. This is shown by the diagram below. The basilar membrane displacement has a linear relationship with stapes displacement at low stimulus intensities. For a large part of the auditory intensity range (approximately 20 dB to 80 dB SPL across most of the audible frequency range), the relationship between stapes and BM displacement is compressive, i.e. the BM displacement only increases by 0.2 dB per dB increase in stapes displacement. At very high stimulus intensities, the relationship is linear, like at low intensities.

Illustration of the BM ‘Broken Stick’ non-linearity. The x-axis is the input stapes displacement and the y-axis is the output BM displacement.
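The three-part relationship can be written as a piecewise function of level in dB. This sketch assumes knees at 20 dB and 80 dB and a slope of 0.2 dB/dB, taken from the approximate figures above. The output reference (output equals input at the lower knee) is arbitrary, and the function name `bm_output_db` is hypothetical.

```python
def bm_output_db(stapes_db: float, knee_low: float = 20.0,
                 knee_high: float = 80.0, slope: float = 0.2) -> float:
    """Illustrative three-part 'broken stick' IO function; all levels in dB.

    Slope 1 below knee_low, `slope` dB/dB between the knees, slope 1 above
    knee_high. The output is referenced so it equals the input at knee_low
    (an arbitrary choice for this sketch).
    """
    if stapes_db <= knee_low:
        return stapes_db
    if stapes_db <= knee_high:
        return knee_low + slope * (stapes_db - knee_low)
    return knee_low + slope * (knee_high - knee_low) + (stapes_db - knee_high)


for level in (10, 20, 50, 80, 90):
    print(level, round(bm_output_db(level), 1))
```

With these assumptions, the 60 dB span of input between the knees is squeezed into only 12 dB of output.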

The second observation concerns how the frequency selectivity of the BM changes with level. Each point along the BM displaces maximally at a specific frequency. Parts of the BM near the interface with the stapes (the base) respond maximally to high frequencies, while the opposite end (the apex) responds maximally to low frequencies. For this reason, different regions along the basilar membrane can be thought of as filters. At low stimulus levels, the regions are highly frequency selective, so they do not respond much to off-frequency stimulation. At higher stimulus intensities, however, the BM has reduced frequency selectivity, meaning that it will be displaced by a proportionately greater amount when off-frequency stimuli have high intensity. Not only does the bandwidth of the auditory filters change with stimulus intensity, but the centre frequency (or best frequency) also shifts.

Illustration of level dependent frequency selectivity. Each line shows data from a different stimulus intensity. The x-axis is stimulus frequency and the y-axis is BM displacement for a fixed position along the membrane.

The DRNL is a parallel filterbank model, in that each cochlear channel along the BM is modeled using an independent DRNL section. Each frequency channel of the DRNL model consists of two independent processing pathways. These pathways share a common input, and their outputs are summed to give the final displacement value for the location along the BM being modeled. The linear pathway is made of a linear gain function and a bandpass filter. The non-linear pathway is made of an instantaneous broken stick non-linearity sandwiched between two bandpass filters. The filters are tuned according to the position along the BM being modeled. This arrangement is shown by the diagram below.

Schematic showing one frequency channel of the DRNL model
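One DRNL channel can be sketched in Python as below. Several assumptions apply: the bandpass filters are 2nd-order biquads from the RBJ audio-EQ cookbook rather than the filters of the Essex implementation; the broken stick uses the form sign(x)·min(a|x|, b|x|^c); and all parameter values (the Q values, `a`, `b`, `c`, and the hypothetical 0.8·CF tuning of the linear path) are illustrative, not fitted to any data.

```python
import math
from typing import List, Sequence


class BandpassBiquad:
    """2nd-order bandpass, RBJ audio-EQ-cookbook form with 0 dB gain at the centre."""

    def __init__(self, centre_hz: float, q: float, fs: float) -> None:
        w0 = 2.0 * math.pi * centre_hz / fs
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        self.b0, self.b2 = alpha / a0, -alpha / a0  # b1 is zero for this design
        self.a1, self.a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x: float) -> float:
        # Direct form I difference equation.
        y = self.b0 * x + self.b2 * self.x2 - self.a1 * self.y1 - self.a2 * self.y2
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


def broken_stick(x: float, a: float, b: float, c: float) -> float:
    """sign(x) * min(a|x|, b|x|^c): linear for small inputs, compressive above the knee."""
    mag = abs(x)
    return math.copysign(min(a * mag, b * mag ** c), x)


def drnl_channel(signal: Sequence[float], fs: float, cf_hz: float,
                 lin_gain: float = 1.0, a: float = 1000.0, b: float = 10.0,
                 c: float = 0.2) -> List[float]:
    """One DRNL channel: linear path + non-linear path, outputs summed."""
    # Linear path: gain then a broad filter (hypothetically tuned to 0.8 * CF).
    lin_filter = BandpassBiquad(0.8 * cf_hz, q=1.0, fs=fs)
    # Non-linear path: sharp filter -> broken stick -> sharp filter, tuned to CF.
    nl_pre = BandpassBiquad(cf_hz, q=4.0, fs=fs)
    nl_post = BandpassBiquad(cf_hz, q=4.0, fs=fs)
    out = []
    for x in signal:
        linear = lin_filter.process(lin_gain * x)
        nonlinear = nl_post.process(broken_stick(nl_pre.process(x), a, b, c))
        out.append(linear + nonlinear)
    return out


fs = 16000.0
tone = [0.01 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]
bm = drnl_channel(tone, fs, cf_hz=1000.0)
```

A full filterbank would simply run one `drnl_channel` per centre frequency, with parameters varying along the modeled BM.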

The linear pathway simulates the passive mechanical properties of the cochlea. Therefore, the output of this pathway in isolation would give the BM displacement if the active structures in the cochlea were not functioning. Conversely, the non-linear pathway is the contribution of the active mechanisms to the displacement. The three-part piecewise relationship between BM and stapes displacement can be modeled by simply summing the responses of the pathways: when two levels expressed in decibels are added, the result is approximately the greater of the two whenever one exceeds the other by a wide margin. The output of each pathway is shown below, along with the sum total. The parameters are tuned so that the output of the model can reproduce experimental observations of BM displacement.

The green line is the input-output (IO) function relating stapes to BM displacement of the linear pathway of the DRNL model. The blue line is the IO function for the non-linear pathway. The red line is the decibel sum of the two pathways.
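The decibel addition can be checked numerically. This sketch assumes the two pathway outputs add in phase (amplitudes summed). The pathway IO functions, with gains of -20 dB and +30 dB, a knee at 30 dB and a 0.2 dB/dB slope, are hypothetical numbers chosen only to reproduce the shape of the figure.

```python
import math


def db_sum(level_a_db: float, level_b_db: float) -> float:
    """Level (dB) of the sum of two in-phase amplitudes given in dB."""
    return 20.0 * math.log10(10.0 ** (level_a_db / 20.0) + 10.0 ** (level_b_db / 20.0))


def linear_path_db(stapes_db: float) -> float:
    return stapes_db - 20.0  # hypothetical fixed gain of -20 dB


def nonlinear_path_db(stapes_db: float, knee_db: float = 30.0) -> float:
    # Hypothetical +30 dB gain below the knee, 0.2 dB/dB above it.
    if stapes_db <= knee_db:
        return stapes_db + 30.0
    return knee_db + 30.0 + 0.2 * (stapes_db - knee_db)


def total_db(stapes_db: float) -> float:
    return db_sum(linear_path_db(stapes_db), nonlinear_path_db(stapes_db))


print(round(db_sum(60.0, 40.0), 2))  # the 20 dB weaker term adds under 1 dB
for level in (10, 50, 100, 120):
    print(level, round(total_db(level), 1))
```

At 10 dB the total follows the non-linear path almost exactly, in the middle range it grows slowly, and at 120 dB it is within about 1 dB of the linear path.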

The DRNL model can also reproduce the level-dependent frequency selectivity data using this architecture. For this, the filters in the two pathways are tuned differently. As the level of stimulation increases, the contribution of the linear pathway becomes significant. By using different filter tunings, it is possible to make a level-dependent frequency response using this combination of linear filters.
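The effect of tuning the two pathways differently can be illustrated with filter magnitudes alone. This is a sketch under stated assumptions: analog 2nd-order bandpass magnitudes, a linear path tuned to a hypothetical 0.7·CF with Q = 1, a non-linear path at CF with Q = 5 and a +30 dB gain that compresses at 0.2 dB/dB above 30 dB, and in-phase summation of the two paths. None of these values come from the DRNL papers.

```python
import math


def bandpass_gain_db(f: float, fc: float, q: float) -> float:
    """Gain (dB) of an analog 2nd-order bandpass with 0 dB gain at fc."""
    x = f / fc - fc / f
    return -10.0 * math.log10(1.0 + (q * x) ** 2)


def broken_stick_db(level_db: float, gain_db: float = 30.0,
                    knee_db: float = 30.0, slope: float = 0.2) -> float:
    """+gain_db below the knee, `slope` dB/dB above it (hypothetical values)."""
    if level_db <= knee_db:
        return level_db + gain_db
    return knee_db + gain_db + slope * (level_db - knee_db)


def channel_output_db(f: float, level_db: float, cf: float = 1000.0) -> float:
    # Linear path: broad filter tuned below CF (hypothetical 0.7 * CF, Q = 1).
    lin_db = level_db + bandpass_gain_db(f, 0.7 * cf, 1.0)
    # Non-linear path: sharp filter at CF -> broken stick -> the same filter again.
    g = bandpass_gain_db(f, cf, 5.0)
    nl_db = broken_stick_db(level_db + g) + g
    # Sum the two paths as amplitudes, assuming they add in phase.
    return 20.0 * math.log10(10.0 ** (lin_db / 20.0) + 10.0 ** (nl_db / 20.0))


def best_frequency(level_db: float) -> int:
    """Frequency (10 Hz grid, 500-1500 Hz) giving the largest channel output."""
    return max(range(500, 1501, 10), key=lambda f: channel_output_db(float(f), level_db))


print(best_frequency(20.0), best_frequency(90.0))
```

With these numbers the best frequency sits at CF (1000 Hz) for a 20 dB input, but moves down towards the linear path's tuning at 90 dB, because the compressed non-linear path no longer dominates the sum.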

## The latest dual resonance non-linear filterbank

The active structures in the cochlea that give rise to the non-linear relationship between stapes and BM displacement are subject to control by a frequency-selective feedback pathway originating in the brain. When there is neural activity in this feedback pathway, the contribution of the active structures to BM displacement is reduced. The level of activity in the feedback network is at least partially reflexive: the feedback is activated when the acoustical stimulation intensity passes a certain threshold within a given frequency band, and then grows with increasing stimulus intensity.

Robert Ferry showed that the result of neural activity in the biological feedback network could be simulated by attenuating the input to the non-linear pathway of the DRNL model. The cochlear and neural transduction processes have limited dynamic ranges, and there is some evidence to suggest that the feedback modulated attenuation may assist a listener by optimally regulating the cochlear operating point for given background noise conditions.

We subsequently went on to complete the feedback loop in the computer model. This was achieved by deriving a feedback signal from the simulated neural information to modulate the attenuation value. This complete feedback model can then adjust the attenuation parameter over time to regulate the cochlear operating point in accordance with changes in the acoustical environment. Data from automatic speech recognition experiments have shown that machine listeners equipped with the feedback network consistently outperform (i.e. correctly identify a greater proportion of the speech material) machine listeners without the feedback network in a variety of background noises.

Diagram depicting the latest version of the DRNL model. The feedback signal is derived from the neural data after the displacement-to-neural transduction stage (T). This feedback signal is used to modulate the amount of attenuation applied to the non-linear pathway over time.

## Simulating hearing impairment

Some forms of hearing impairment are the result of a malfunction in certain parts of the auditory periphery. Some components of the auditory periphery are far more susceptible to failure (or reduced functionality) than others. Examples include reduced function of the active structures in the cochlea that influence BM displacement, and/or reduced effectiveness of the transduction structures that convert BM displacement into neural signals.

Simplified diagram of the DRNL model to highlight the impact of reduced peripheral component functionality on cochlear feedback.

Firstly, consider the case where the transduction units within a given channel are not functioning properly. Not only is there going to be an adverse effect on the quality of the information transmitted via this channel to the brain, but the feedback loop which is driven by the neural information will also not function optimally, thus compounding the problem.

Secondly, consider the case where the active structures are not functioning correctly. This will result in a reduced BM displacement for a given level of stapes displacement. The output of the transduction units will therefore be reduced, and so the feedback will be derived from a reduced-fidelity signal. To make things worse, any residual feedback signal will not be effective, because the feedback signal modulates the action of the active components, which in this case are not functioning correctly.

BioAid is designed to artificially replace the peripheral functionality that may be reduced or missing in hearing impaired listeners. By simulating the non-linear pathway and feedback loop, BioAid can at least partially restore the function of the regulating mechanisms that help normal-hearing listeners to cope when listening in noisy environments.

## BioAid Architecture

The image shows the architecture of the BioAid algorithm in block form. Only 4 channels are displayed for simplicity.

The first stage of processing in BioAid decomposes the signal into frequency bands, coarsely simulating the frequency decomposition performed by the cochlea. In the BioAid app, this is done by a simple bank of 7 non-overlapping, octave-wide Butterworth IIR filters centered at the standard audiometric frequencies between 125 and 8000 Hz. Because the signal is filtered twice (by the first- and second-stage filters in the algorithm), adjacent channels cross over at -6 dB. This means that the energy spectrum is flat when the channels are summed. The filters are each 2nd order. Even-order filters must be used to prevent sharp phase cancellations at the filter crossover points. In the laboratory version of the aid, we have found some benefit to using an 11-channel variant of the algorithm, with additional channels between 500 and 1000 Hz, 1000 and 2000 Hz, 2000 and 4000 Hz, and 4000 and 8000 Hz.
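The -6 dB crossover figure follows from the octave bandwidth. This sketch assumes that "2nd order" means a bandpass derived from a first-order Butterworth lowpass prototype, and it uses the analog magnitude response for the arithmetic; a digital implementation will differ slightly. The names `CENTRES_HZ`, `octave_edges` and `bandpass_mag` are hypothetical.

```python
import math

# Standard audiometric centre frequencies used for the 7-channel app version.
CENTRES_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def octave_edges(cf_hz: float) -> tuple:
    """Lower and upper edges of an octave-wide band centred (geometrically) on cf_hz."""
    return cf_hz / math.sqrt(2.0), cf_hz * math.sqrt(2.0)


def bandpass_mag(f_hz: float, cf_hz: float, q: float) -> float:
    """Magnitude of an analog 2nd-order bandpass prototype with unity gain at cf_hz."""
    x = f_hz / cf_hz - cf_hz / f_hz
    return 1.0 / math.sqrt(1.0 + (q * x) ** 2)


# Q for an octave bandwidth: cf / (upper - lower), which equals sqrt(2).
Q_OCTAVE = 1.0 / (math.sqrt(2.0) - 1.0 / math.sqrt(2.0))

lower, upper = octave_edges(1000)
once_db = 20.0 * math.log10(bandpass_mag(upper, 1000, Q_OCTAVE))
twice_db = 2.0 * once_db  # two identical filters in series: gains add in dB

for cf in CENTRES_HZ:
    lo, hi = octave_edges(cf)
    print(f"{cf:>5} Hz band: {lo:7.1f} - {hi:7.1f} Hz")
print(f"edge gain: one filter {once_db:.2f} dB, two in series {twice_db:.2f} dB")
```

Each band's upper edge coincides with the next band's lower edge, which is where the -6 dB crossover sits. The 8 kHz band's upper edge is about 11.3 kHz, so a digital version needs a sample rate above roughly 22.6 kHz.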

No phase correction network is used, as group delay differences between channels are not a primary issue when using wide bands with modest roll-off. For a higher frequency resolution, a filterbank with good reconstruction properties would be required. The optimum frequency resolution for this algorithm is still a research question. However, the really unique features of BioAid are related to the time domain dynamics processing that occurs within each band.

Within each band is an instantaneous compression process to simulate the action of the active components in the auditory periphery. Below the compression threshold, the input and output signals have a linear relationship. Above the threshold, the waveform is shaped so that the output only increases by 0.2 dB per dB increase in input level. In the code, this is implemented as a waveshaping algorithm that directly modifies the sample values, although it could be implemented equally effectively as a conventional side-chain compressor with zero attack and release time. Instantaneous compression is not commonly used in conventional hearing aid algorithms, as it introduces distortion. Normal-hearing listeners find this distortion particularly unpleasant. However, we believe that some distortion may be useful to an impaired listener if it mimics that which occurs naturally in a healthy auditory system.
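The waveshaping described above can be sketched as a per-sample function. This is an illustration of the stated 0.2 dB/dB rule, not the BioAid source; the function name and the threshold value are hypothetical.

```python
import math


def instantaneous_compress(x: float, threshold: float, slope: float = 0.2) -> float:
    """Sample-by-sample waveshaper: linear below `threshold`, `slope` dB/dB above it."""
    mag = abs(x)
    if mag <= threshold:
        return x
    # Above threshold: output magnitude = threshold * (input / threshold) ** slope.
    return math.copysign(threshold * (mag / threshold) ** slope, x)


thr = 0.1
loud = 10 * thr  # 20 dB above threshold
gain_above_db = 20 * math.log10(instantaneous_compress(loud, thr) / thr)
print(round(gain_above_db, 2))  # 20 dB in -> 4 dB above threshold out
```

Because each sample is reshaped independently, a sine wave above threshold comes out with flattened peaks, which is exactly the harmonic distortion the secondary filter bank is there to limit.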

Following the instantaneous compression stage, the signal is filtered by a secondary bank of filters with the same transfer function as the first bank of filters. The instantaneous compression process introduces harmonic distortion that extends above the frequency range of the band-limited signal. It can also produce intermodulation distortion products that extend above and below the band. The secondary filter bank reduces the spread of signal energy across the frequency spectrum. Astute readers will notice that the secondary filtering means that the net compressive effect can no longer be described as instantaneous, but this is a discussion for the next blog post.

The output of the secondary filter stage is then used to generate a feedback signal. This is similar to the feedback signal implemented in the latest DRNL model, but to reduce computational cost, it is derived directly from the stimulus waveform (omitting models of neural transduction and low-level brain processes). We call this feedback signal the Delayed Feedback Attenuation Control (DFAC) when discussing it in the context of the hearing aid. This signal is used to modulate the level of attenuation applied to the input of each instantaneous compressor. The feedback signal has a threshold and a compression ratio like the instantaneous compressor, but it also has an integration time constant (tau) and a delay parameter. Rather than modify the signal on a sample-by-sample basis, the DFAC integrates sample magnitude using an exponential window. The signal supplied to the integrator is delayed by 10 ms (using a ring buffer) to simulate the neuronal delay measured in the biological analogue of this process. The compression threshold value is then subtracted from the integrated value, and the result is multiplied by the compression ratio to give an attenuation value for the next sample.
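The DFAC steps (delay, exponential integration, threshold subtraction, scaling by the ratio) can be sketched for one channel as follows. Assumptions not stated in the text: the threshold subtraction is done in dB, negative attenuation is clipped to zero, and the class name `DFACSketch` and its default tau are hypothetical.

```python
import math
from collections import deque


class DFACSketch:
    """Delayed Feedback Attenuation Control, sketched for one channel.

    Assumptions (not from the BioAid source): the threshold is in dB re full
    scale, the threshold subtraction happens in dB, and negative attenuation
    is clipped to zero.
    """

    def __init__(self, fs: float, threshold_db: float, ratio: float,
                 tau_s: float = 0.1, delay_s: float = 0.010) -> None:
        self.alpha = math.exp(-1.0 / (tau_s * fs))  # exponential-window coefficient
        n_delay = max(1, round(delay_s * fs))
        # Ring buffer acting as the delay line (a full deque drops its oldest item).
        self.ring = deque([0.0] * n_delay, maxlen=n_delay)
        self.level = 0.0
        self.threshold_db = threshold_db
        self.ratio = ratio

    def update(self, y: float) -> float:
        """Take one channel output sample; return attenuation (dB) for the next input."""
        delayed = self.ring[0]  # magnitude from n_delay samples ago
        self.ring.append(abs(y))
        self.level = self.alpha * self.level + (1.0 - self.alpha) * delayed
        level_db = 20.0 * math.log10(max(self.level, 1e-12))
        return max(0.0, (level_db - self.threshold_db) * self.ratio)


dfac = DFACSketch(fs=1000.0, threshold_db=-20.0, ratio=0.5)
history = [dfac.update(1.0) for _ in range(5000)]  # steady 0 dB input
print(history[:3], round(history[-1], 3))
```

In a channel loop, the returned value would be converted to a linear factor, `10 ** (-att_db / 20)`, and applied to the next input sample before the instantaneous compressor. The 10-sample delay at 1 kHz shows up as a run of zero attenuation at the start of `history`.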

The implementation of the algorithm in the app is mono. However, the algorithm code can be used in a stereo configuration (we use a stereo configuration when evaluating the algorithm in the lab). When a stereo signal is supplied, the DFAC attenuation is averaged between left and right channels. This means that the attenuation applied is identical in left and right channels within a certain frequency band. This linked setup prevents the DFAC from scrambling interaural level difference cues that might be useful to the listener. In contrast, the instantaneous compression processing is completely independent between left and right channels.
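A small worked example shows why linking matters. It assumes the averaging is done on attenuation values in dB, and the input levels and per-ear attenuations are hypothetical numbers.

```python
def linked_attenuation(att_left_db: float, att_right_db: float) -> tuple:
    """Average the two DFAC attenuations so both ears get the same value."""
    shared = 0.5 * (att_left_db + att_right_db)
    return shared, shared


def ild_after(left_db: float, right_db: float,
              att_left_db: float, att_right_db: float) -> float:
    """Interaural level difference (dB) after attenuating each side."""
    return (left_db - att_left_db) - (right_db - att_right_db)


left_db, right_db = 70.0, 64.0  # source off to the left: 6 dB ILD
unlinked = ild_after(left_db, right_db, 8.0, 2.0)  # independent attenuations
linked = ild_after(left_db, right_db, *linked_attenuation(8.0, 2.0))
print(unlinked, linked)  # 0.0 6.0
```

Independent attenuation in this example erases the 6 dB level difference entirely, whereas the linked version leaves it intact.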

In a nutshell, each channel of BioAid is a laggy feedback compressor with an instantaneous compressor sandwiched between its attenuation and detection stages. This simple arrangement is completely unique to BioAid, and certainly quite unlike the automatic gain control circuits found in standard hearing aids.

After the secondary filtering, we depart from our adherence to physiological realism in the main signal chain. All of the processing up to this point has been focused on reducing the signal energy. To make sounds audible to hearing impaired listeners, a gain must be provided in the impaired frequency regions. This is done on a channel-by-channel basis before the signals from each of the channels are summed and then presented to the listener.
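Putting the pieces together, one BioAid-style channel can be sketched as: DFAC attenuation, first filter, instantaneous compressor, secondary filter, DFAC detector, then gain, with the channels summed. This is a hypothetical reconstruction from the description above, not the published source. The RBJ-cookbook biquad, the dB-domain DFAC arithmetic and every parameter value are assumptions.

```python
import math
from collections import deque
from typing import List, Sequence


class Bandpass:
    """2nd-order RBJ-cookbook bandpass biquad, 0 dB gain at the centre (assumed design)."""

    def __init__(self, cf: float, q: float, fs: float) -> None:
        w0 = 2.0 * math.pi * cf / fs
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        self.b0, self.b2 = alpha / a0, -alpha / a0
        self.a1, self.a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def __call__(self, x: float) -> float:
        y = self.b0 * x + self.b2 * self.x2 - self.a1 * self.y1 - self.a2 * self.y2
        self.x2, self.x1, self.y2, self.y1 = self.x1, x, self.y1, y
        return y


class ChannelSketch:
    """DFAC attenuation -> filter -> instantaneous compressor -> filter -> gain."""

    def __init__(self, cf: float, fs: float, comp_thr: float, dfac_thr_db: float,
                 dfac_ratio: float, gain_db: float, tau_s: float = 0.1,
                 delay_s: float = 0.010) -> None:
        q = math.sqrt(2.0)  # octave-wide bands
        self.pre, self.post = Bandpass(cf, q, fs), Bandpass(cf, q, fs)
        self.comp_thr = comp_thr
        self.alpha = math.exp(-1.0 / (tau_s * fs))
        n_delay = max(1, round(delay_s * fs))
        self.ring = deque([0.0] * n_delay, maxlen=n_delay)
        self.level = 0.0
        self.dfac_thr_db, self.dfac_ratio = dfac_thr_db, dfac_ratio
        self.gain = 10.0 ** (gain_db / 20.0)
        self.atten_db = 0.0

    def process(self, x: float) -> float:
        x *= 10.0 ** (-self.atten_db / 20.0)  # feedback-controlled attenuation
        y = self.pre(x)                        # first filter bank stage
        mag = abs(y)
        if mag > self.comp_thr:                # instantaneous 0.2 dB/dB compression
            y = math.copysign(self.comp_thr * (mag / self.comp_thr) ** 0.2, y)
        y = self.post(y)                       # secondary filter
        delayed = self.ring[0]                 # DFAC detector: delay, then integrate
        self.ring.append(abs(y))
        self.level = self.alpha * self.level + (1.0 - self.alpha) * delayed
        level_db = 20.0 * math.log10(max(self.level, 1e-12))
        self.atten_db = max(0.0, (level_db - self.dfac_thr_db) * self.dfac_ratio)
        return self.gain * y                   # per-channel gain (not physiological)


def process_signal(signal: Sequence[float], channels: List[ChannelSketch]) -> List[float]:
    """Run every channel on each sample and sum the channel outputs."""
    return [sum(ch.process(x) for ch in channels) for x in signal]


fs = 16000.0
bank = [ChannelSketch(cf, fs, comp_thr=0.01, dfac_thr_db=-50.0, dfac_ratio=0.5,
                      gain_db=20.0) for cf in (500, 1000, 2000, 4000)]
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(8000)]
out = process_signal(tone, bank)
print(round(bank[1].atten_db, 1))
```

The 8 kHz channel is left out here because its centre would sit at the Nyquist frequency of a 16 kHz sample rate; a full 7-channel bank needs a higher rate such as 44.1 kHz.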

## Summary

In this blog post I have described the architecture of the DRNL filterbank and how the non-linear pathway of the DRNL model forms the core of the BioAid algorithm. In the next post I will describe the unique properties of this algorithm.

# BioAid Part 1: Motivations for Building a New Class of Hearing Aid

Just before Christmas, I submitted a free app (BioAid) to the Apple iTunes Store that turns an iOS device into a hearing aid. It does this by taking the audio stream from the internal microphone, processing the audio in real time, and then playing the audio back over headphones connected to the device. For more general information on usage, please visit the main BioAid site. This information is placed on my blog, allowing me to rapidly and informally communicate some of the technical details related to the project while I gather thoughts in preparation for a more rigorous account. This is the first part of a series of posts that I intend to write about the project.

Screenshot of the BioAid app running on an iPhone.

BioAid is not some gimmicky sound amplifier app. The development and evaluation of the algorithm has been conducted by a team of researchers within the hearing research laboratory at the University of Essex. Our research group became involved in the development of an ‘aid on a phone’ out of necessity. BioAid is a novel design for a hearing aid that is still in its infancy. There was little chance of having it made up as a conventional hearing aid for a number of reasons. We could test it in the laboratory (using a setup described below) but convincing a manufacturer to adopt the algorithm would require a considerable financial investment. Making a case would be difficult even if our new ideas were to provide a small improvement to an established design. However, we wanted to do something much more radical. I realised that we could move directly into production using a mobile phone as a portable experimental hearing aid. This would allow us to demonstrate the viability of the concept and learn from the experiences of people all around the world, not just in our laboratory.

Laboratory tests with hearing-impaired volunteers are still in progress. These tests are being conducted using a ‘lab-scale’ version of BioAid, consisting of standard behind-the-ear (BTE) hearing aids connected to a laptop computer. The signal processing that would normally occur within the hearing aid is offloaded to the laptop, making it easier for us to change the parameters in the hearing aid at runtime, or even tweak the algorithm structure itself. Another avenue of research uses the algorithm to pre-process acoustic stimuli in an off-line mode (not real time) before they are presented to listeners over headphones. Therefore, it is important to think of BioAid as an algorithm concept, rather than to pigeonhole it as an iOS app. The BioAid algorithm has potential for use in many applications, and the iPhone app is just one form in which BioAid exists. Another one of the numerous motivations for making the iPhone implementation was that it might inspire others to use the algorithm in unusual ways, perhaps for processing speech in a VoIP application, or as a hack for a media centre, allowing film and television audio to be processed at the source. This is why the source is freely available at GitHub. There is also a Facebook page that I encourage anyone interested in the project to ‘like’ so that they can be periodically informed of developments.

## Generic hearing aid ‘gain model’

Modern hearing aids contain all manner of signal processing wizardry to assist the impaired listener in various ways. Much effort goes into developing noise-reduction technologies, and microphone array technology coupled with beam-forming algorithms to reduce off-axis sound interference. These may help to improve speech reception, or at least alleviate some of the exhaustion associated with the increased listening effort required from impaired listeners, especially when extracting information from sounds of interest in cacophonous environments. Processing often includes feedback cancellation algorithms to prevent howl associated with high gain settings in conjunction with open (non occluded) fittings. Some hearing aids even transpose information from different frequency bands to others. However, these technologies are not related to the core BioAid processing.

At the heart of any hearing aid is the ‘gain model’, and the BioAid algorithm falls into this category. The most basic goal of any hearing assistive device is to restore audibility of sounds that were previously inaudible to the hearing-impaired listener. Hearing impaired listeners have a reduced sensitivity to environmental sounds, i.e. they cannot detect the low level sounds that a normal hearing listener would be able to detect, and so it can be said that their thresholds of hearing are relatively high, or raised. To compensate for this deficit, the intensity of the stimulus must be increased, i.e. gain is provided by the hearing aid. The earliest hearing aids (the ear trumpet) just provided gain.

It is important to note that a flat loss (equal loss of sensitivity across frequency) is not often observed. More commonly, there is a distinct pattern of hearing loss, where the sensitivity is different to that of normal hearing listeners at different frequencies. For a hearing aid to work effectively across the audible spectrum, it must provide differing amounts of gain in different frequency regions. Modern hearing aids decompose sounds into separate frequency bands, perform various processing tasks, then finally recombine the signal into a waveform that can be presented to the listener via a loudspeaker. BioAid processing is no different to current hearing aids regarding this general principle.

Most hearing impaired listeners will begin to experience discomfort from loud sounds at levels not too dissimilar to those with a normal hearing sensitivity*. This means that the impaired listener has a reduced dynamic range into which the important sonic information must be squeezed. If the hearing aid applies a linear gain irrespective of the incoming sound intensity, it will help the listener detect quiet sounds, but it will also make loud sounds unbearably loud. For this reason, modern hearing aids also use compression algorithms. A lot of gain is applied to low intensity sounds to help with audibility, while considerably less gain is applied to high intensity sounds, so as not to over-amplify sounds that are already audible to the listener.

The figure below (taken from this open-access publication) is shown to help illustrate the concept of reduced dynamic range. It shows categorical loudness scaling (CLS) functions for a hypothetical hearing-impaired listener and a hypothetical normal-hearing listener. A test stimulus is presented at various intensities (represented by the x-axis), and the listener is asked to categorize the loudness on a rating scale (represented by the y-axis). For sounds rated as just audible, there is a large intensity difference between the normal- and impaired-hearing listener data. However, for sounds perceived as very loud, there is little or no difference between the two listeners. The normal-hearing listener’s ratings span a range of approximately 90 dB, whereas the impaired listener’s ratings span a reduced range of approximately 50 dB.

Categorical Loudness Scaling functions for hypothetical normal- and impaired-hearing listeners. Taken from here.
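The arithmetic behind squeezing a roughly 90 dB range into roughly 50 dB can be sketched with a straight-line mapping in dB. The endpoints (a normal floor of 0 dB, an impaired floor of 40 dB, a shared ceiling of 90 dB) are hypothetical values consistent with the ranges quoted above, not readings from the figure, and real fitting prescriptions are not a single straight line.

```python
def compressive_gain_db(input_db: float, normal_floor_db: float = 0.0,
                        impaired_floor_db: float = 40.0, ceiling_db: float = 90.0) -> float:
    """Gain (dB) that maps the normal range [normal_floor, ceiling] onto the
    impaired range [impaired_floor, ceiling] with a straight line in dB."""
    ratio = (ceiling_db - normal_floor_db) / (ceiling_db - impaired_floor_db)
    output_db = impaired_floor_db + (input_db - normal_floor_db) / ratio
    return output_db - input_db


print((90.0 - 0.0) / (90.0 - 40.0))  # compression ratio: 1.8
for level in (0, 45, 90):
    print(level, round(compressive_gain_db(level), 1))
```

Quiet sounds receive 40 dB of gain, mid-level sounds 20 dB, and sounds at the ceiling none at all, which is the behaviour described in the paragraph about compression.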

Unfortunately, any non-linear process (including dynamic range compression) applied to the processing chain will have side effects. In order to protect the listener from sudden loud sounds, the compression algorithm needs to respond quickly. However, standard compression algorithms with short time constants tend to make the acoustical environment sound distinctly unnatural. The action of the compressor is clearly audible and can interfere with the important information contained in the amplitude modulations of signals such as speech. Fast compression reduces the modulation depth of amplitude-modulated signals, and can therefore reduce our ability to extract information from the glimpses of the signal we might otherwise receive during the low-intensity dips in modulated masking sounds. Very fast compression also changes the signal-to-noise ratio (SNR) of steady-state signal and noise mixtures. At positive SNRs, the signal has a greater amplitude than the noise. If compression is so fast that it works near-instantaneously, then the high-level peaks of the signal will not be amplified as much as the lower-level peaks in the noise. The noise level will increase relative to the level of the signal, reducing an otherwise advantageous SNR. The resulting negative impact on speech intelligibility is compounded by any distortion introduced by the compression process. In contrast, slowly acting compression algorithms do not impose so many negative side effects. A very slow compressor acts like a person continuously adjusting the volume control of an amplifier while watching a movie: the gain is increased for the quiet spoken passages, and then decreased in the loud action sequences. This works well for sounds with slowly changing intensity, and the sound ‘quality’ is not vastly altered.
However, this is problematic if the volume is cranked up for quiet spoken passages and there is a sudden intense event in the soundtrack that nearly deafens the audience. For this reason, both fast- and slow-acting compression algorithms are used in modern hearing aids to get the best possible compromise**. BioAid also utilizes fast- and slow-acting compression.
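The SNR effect of very fast compression can be shown with a static compression curve. This is an idealization: it assumes the compressor is fast enough that signal-dominated and noise-dominated moments each receive gain according to their own level, and the 50 dB threshold and 3:1 ratio are hypothetical values.

```python
def static_output_db(level_db: float, threshold_db: float = 50.0, ratio: float = 3.0) -> float:
    """Static compression curve: unity below threshold, 1/ratio dB/dB above it."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


signal_db, noise_db = 70.0, 60.0  # +10 dB SNR at the input
snr_in = signal_db - noise_db
snr_out = static_output_db(signal_db) - static_output_db(noise_db)
print(snr_in, round(snr_out, 2))  # 10.0 3.33
```

Under these assumptions a favourable 10 dB SNR shrinks to about 3.3 dB, which is the mechanism described above.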

If BioAid is a multi-band compressor with both slow and fast acting components, then how is it different to current hearing aid gain models? On the surface, BioAid looks similar, but the architecture is certainly unique, and this gives it some unique properties.

*This is with the exception of those whose hearing is affected by a problem with the transfer of energy through the middle ear, who will generally have an increased discomfort threshold in addition to a raised detection threshold. It is also worth noting that many hearing impaired listeners have a lower discomfort threshold than that of normal hearing listeners. This condition is known as hyperacusis and is an area of active research.

**Modern digital hearing aids generally work by processing blocks (or frames) of samples. Each block of samples is processed and the output buffer is filled before the next block of samples arrives. This frame based processing is part of what gives rise to a hearing aid’s latency. This latency is generally undesirable, but while it exists, it can be used for good. It gives the compression algorithm the opportunity to ‘look ahead’ a few samples and adjust its parameters in an optimum way given the information about ‘future’ events.
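Look-ahead can be illustrated with a simple peak limiter that works on a buffered frame. This is a hypothetical sketch (the function name, ceiling and look-ahead length are invented), and a practical design would smooth the gain rather than let it jump from sample to sample.

```python
from typing import List, Sequence


def lookahead_limit(samples: Sequence[float], ceiling: float, lookahead: int) -> List[float]:
    """Scale each sample by a gain chosen from the peak of the next `lookahead` samples.

    Because a whole frame is already buffered, the gain can start falling before
    a transient arrives; a causal real-time version pays for this with
    `lookahead` samples of extra latency.
    """
    out = []
    for n, x in enumerate(samples):
        window_peak = max(abs(v) for v in samples[n:n + lookahead + 1])
        gain = min(1.0, ceiling / window_peak) if window_peak > 0.0 else 1.0
        out.append(gain * x)
    return out


frame = [0.1] * 5 + [1.0] + [0.1] * 5  # a sudden loud sample mid-frame
limited = lookahead_limit(frame, ceiling=0.5, lookahead=2)
print([round(v, 2) for v in limited])
```

The gain drops two samples before the loud sample arrives, so the output never exceeds the ceiling.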

## Technical motivation for BioAid

BioAid is unique in that the algorithm has been designed from the ground up to mimic the processes that occur in the ear. Hearing aid technology has generally evolved to solve problems with each generation of algorithm design. This incremental approach provides an increasingly refined product. However, the problem with extended design-and-refine methods of development is that the returns from each design revision tend to diminish. There is an asymptote. This partly explains why so much effort is now expended on the development of peripheral technologies in hearing aids, away from the core gain model. Machine hearing is a related field in which performance improvements are becoming harder to obtain using refinements of standard methods. In that field, there is a change going on, whereby radically different signal processes are being researched that are based on more physiologically accurate models of human hearing. Following in this revolutionary zeitgeist, BioAid is an effort to break through a current intellectual plateau in hearing aid gain model design.

The human auditory periphery (sound processing associated with the ear and low-level brain processing) can be modeled as a chain of discrete sequential processes. In general, the output of each process just feeds into the next process in the sequence. There are also some feedback signals that originate in processes situated further along the chain that modulate the behavior of the earlier-stage systems. The PhD thesis of Manasa Panda demonstrates that it is possible to model common hearing pathologies by reducing the functionality of, or completely removing, some of the processing blocks in the chain. This modified model is called a ‘Hearing Dummy’, as the models of the periphery can be tailored to individual listeners. An artificial (machine) listener will make the same responses in hearing tests as the human when connected to their personalized Hearing Dummy.

Having isolated the components of the model likely to cause the listening difficulties, we then thought it might be a good idea to replicate those processes in a hearing aid. This could be to assist some residual functionality of certain auditory components, or to completely replace lost functionality of others. BioAid can be thought of as a simplified auditory model, containing a chain of models of the components most susceptible to the malfunctions responsible for hearing impairments.

There is one major difference between BioAid and the peripheral model used in the lab. In a standard model of the auditory periphery, the output is a code made of neural spikes representing the transformed sound information. Information in this form is useful for higher stages of brain processing with the correct interface, but it cannot be played back through a hearing aid. BioAid must deviate from the physiological model, as the sound must be recombined into a waveform that can be presented to the listener acoustically. Apart from this necessary alteration, we aim to remain faithful to the physiological model. This allows us to observe emergent properties of the system, rather than deliberately engineering properties into it.

## Next Time

For those who want a technical overview of the whole project immediately, there is a YouTube video below containing a 42-minute screencast of a talk that I gave back in September 2012.

This post has described general hearing aid technology and some of the scientific motivations for developing a new class of hearing aid. In the next posts, I will discuss the algorithm structure and its properties.

# Cobalt Theme for Matlab

This is a quick post showing how to apply a cobalt-like theme to Matlab’s workspace and editor. Matlab has rather limited font and colour customization options compared to Xcode (see my other post about the Cobalt theme), but it is still possible to change the default theme to something that I personally find a little easier on the eyes.

The default Matlab theme (when viewed on a Mac) is shown below …

The end result is this …

If you like the look of this, make a file called matlab.prf containing the text at the end of this post. Navigate to the folder containing the original matlab.prf, make a backup of the original file and replace it with the new version. In *nix systems, this file is located in …

Below is the config text …

# Audiophile AirPlay With Raspberry Pi: Part 1

I had recently been looking into purchasing an Apple TV for the purpose of streaming audio via AirPlay. I have little interest in the video side of things, so I started searching for a dedicated audio solution, hoping to find a high fidelity AirPlay receiver. Given that AirPlay audio is transmitted using a lossless codec, the potential for high fidelity is excellent.

After a little searching, I found that a Raspberry Pi (RPi) could be converted into an AirPlay receiver. The only problem is the somewhat-less-than-perfect analogue audio output built into the RPi. The digital to analogue converter (DAC) in the RPi uses the pulse-width modulation (PWM) principle to keep costs down. Most high fidelity audio DACs use the pulse-code modulation (PCM) principle, generally resulting in a more faithful representation of the digitally encoded analogue signal.

A PWM signal consists of a rapid series of electrical pulses. Each pulse is either at zero voltage or at maximum voltage. The width (or duty cycle) of these very rapid pulses is modulated such that the desired output voltage is represented by the average voltage of the pulse train. This is then lowpass filtered to give a steady analogue voltage (the cutoff frequency of the filter can be well above the audible band). The pulse rate in the RPi is fixed at 100 MHz. The bit depth of this pulse stream is 1 bit (the pulse voltage can only be zero or maximum). The fixed pulse rate places an upper limit on the amount of information that can be transmitted by the audio output built into the RPi. The following formula can be used to estimate the bit depth at the standard CD sampling rate of 44.1 kHz.

This gives us a theoretical maximum of around 11 bits / sample at 44.1 kHz. This is absolutely fine for basic speech intelligibility when running a VoIP application, or for simple sound effects. However, for any serious music listening, at least 16-bit audio is required to get the full dynamic range out of CDs. This means that an external USB audio interface is required, but thankfully, there are some great USB external interfaces available.
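The estimate follows from counting pulse slots: if the PWM clock runs at $f_{\text{PWM}}$ and the audio sample rate is $f_s$, each sample period contains $f_{\text{PWM}}/f_s$ pulse slots, so the duty cycle can take roughly that many distinct levels, giving $b \approx \log_2(f_{\text{PWM}}/f_s)$ bits. A minimal C++ sketch of this calculation, assuming the 100 MHz pulse rate quoted above (the function name `pwmEffectiveBits` is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Estimate the effective bit depth of a 1-bit PWM output. Assumes each audio
// sample period contains pwmRateHz / sampleRateHz pulse slots, so the duty
// cycle can represent roughly that many distinct levels per sample.
double pwmEffectiveBits(double pwmRateHz, double sampleRateHz) {
    assert(pwmRateHz > 0.0 && sampleRateHz > 0.0);
    const double levelsPerSample = pwmRateHz / sampleRateHz;
    return std::log2(levelsPerSample);
}
```

`pwmEffectiveBits(100e6, 44100.0)` gives about 11.1 bits. Inverting the relationship shows why 16-bit output is out of reach: at 44.1 kHz it would need a pulse rate of $44.1\,\text{kHz} \times 2^{16} \approx 2.89$ GHz.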

### The Hardware

I happened to have a Creative X-Fi HD lying around that I was able to temporarily repurpose. This is certainly not the best external DAC that money can buy, but it sounds very good for its price, and it is more than adequate for the initial testing phase of this project.

The assembled test rig is shown in the photo below. Power is supplied by the USB hub. The X-Fi draws fairly little current, so it could be powered from the RPi using a Y-cable if desired. The networking is wired for the time being, but the setup could easily be changed to use wifi (I have found the ALFA Networks high power USB wifi devices to work very well compared to the small Edimax devices, but that is a whole other blog post). This setup is not very visually appealing, but it is functional, and this is just the pilot phase. I have been testing it by streaming music from my Mac using iTunes, through the test equipment, and on to some Rogers LS7t speakers. I have been listening for hours on end over the past few days without noticing a single audio dropout or glitch.

The sound quality is great. As far as I can tell, the digital data stream to the USB interface is uninterrupted and error free, meaning the sound quality is exactly as if the Mac were directly connected to the X-Fi sound interface. Many people on the official Raspberry Pi forums have reported difficulty getting stable audio when using an external USB interface. The rest of this article describes the software steps involved in getting pristine AirPlay audio through an external USB sound interface connected to the RPi.

### Preparation

Before we begin, make sure that you have the Debian Wheezy operating system installed, and that you have secure shell (SSH) access to your RPi (the instructions for doing so are beyond the scope of this tutorial, but the steps required can easily be found by searching the web). Make sure that you can log onto the RPi over the network before proceeding.

The very first step is to upgrade the packages on the RPi. As root, do the following …

On my system, the uname command gives the following output (so long as the output you see has the same date stamp, or is newer, then things should be OK) …

The next step is to upgrade the RPi’s firmware. As root, use the following command …

For me, this failed on the first run, but worked OK on the second attempt. For detailed information regarding the rpi-update command, see the associated github repository.

### Test AirPlay using the onboard sound

This section is a summary of the instructions found here. The default audio output should be set to the onboard stereo jack. For this, a single command is required as root …

Before downloading and compiling the shairport software (this is the software that makes the RPi mimic an authentic AirPlay device), some prerequisites need to be installed. As root, install the following packages …

I also needed to get the Net::SDP Perl library when I tried this. This was done using the following command as root …

With all the prerequisites successfully installed, the next step is to download the shairport sources and compile them as root …

Shairport can now be tested by launching it in the foreground …

… and then connecting to it using an AirPlay compatible device. The streamed data should then be audible from the 3.5 mm jack with speakers or headphones attached.

### Getting an external USB DAC to work without pops / clicks / noise

Firstly, install some prerequisites as root …

The next step is to edit /boot/cmdline.txt to fix some of the potential causes of pops when using a USB sound interface. Open /boot/cmdline.txt in the nano editor as root …

Add the following text to the file. I’m not sure whether the position of the text makes a difference, but I added it to the beginning of the existing text. Once the text has been added, quit the nano editor using ctrl+x and save the changes by pressing y and then return when prompted.

Using nano as in the previous step (or your favorite editor), edit /etc/libao.conf so that it contains the following …

Edit /etc/modprobe.d/alsa-base.conf, commenting out the snd-usb-audio line and adding the snd_bcm2835 line so part of the file looks like the following …

For testing purposes, create a hidden file, .asoundrc, in the home directory of a regular user. For example, as pi …

Then edit this file to contain the following configuration data…

Restart the RPi before proceeding. Attach the external USB sound interface to some speakers or headphones, manually start shairport again as in the previous example, then test it by connecting to the shairport server and streaming some music.

If this all works OK, then set up a daemon process so that shairport always runs on startup or after the RPi is reset.

### Running the AirPlay server at startup

The first task is to copy the local .asoundrc to /etc/asound.conf so that it can be found by the daemon process. This tripped me up for a bit!

As root, install the shairport software, copy the default configuration, make the shairport init script executable, and then update rc.d by issuing the following commands …

Before starting the daemon, we have to add the AP name to the launch parameters. Edit the file with nano shairport, then change the DAEMON_ARGS variable line so it looks like the following …

Replace AirPi with whatever name you want the device to have on your network. The daemon can be started using the following …

The AirPlay service will now start whenever the RPi is powered on.

### Coming soon

In the next installment of this blog series, I will attempt to interface the Raspberry Pi with a Schiit Modi dedicated USB DAC. I have one on order from the manufacturer in the USA (it is currently in the post). These are supposed to provide incredible sound quality for the price. The problem with the X-Fi is that it tries to do too much (optical / audio in / headphone amp). The Modi just has a USB input and a stereo line-level analogue output. In an AirPlay device built around the Raspberry Pi, no money (or space) is wasted on superfluous functionality.

In the next installment, I also intend to run some controlled tests comparing the setup’s sound quality to an Apple TV. I’ll do this with a number of listeners. I’m excited to find out what happens!

# Cobalt Theme for Xcode

I am very fond of the TextMate 2 open source editor. I use it to write posts for this blog. I also really like the ‘cobalt’ text formatting theme that is included with TextMate.

I do not like the hideous default themes that ship with Xcode, and neither does this guy. Daniel Barowy kindly supplies a cobalt emulation theme for Xcode on his site, but the file format does not work with the latest version of Xcode. I converted this theme and tweaked it slightly to darken the appearance of the console output. The screenshot below shows Xcode adjacent to TextMate.

If you like the look of this, then copy and paste the text from the code box below into the following file.

Finally, restart Xcode and select the Cobalt theme under the preferences menu.

# Basic Differential Equations Tutorial

This is a brief tutorial showing how to use differential equations to make predictions about the state of a simple dynamic analogue electronic circuit. I have ported this post over from my discontinued WordPress blog, as some readers may find the information useful.

Put simply, a differential equation is any equation that contains derivatives. A derivative is basically a gradient, and so it is an equation that describes a system containing something that changes. In audio, this will likely be something that changes over time (t), so an example might be

It looks kinda like a polynomial, but with derivatives of various orders. We’d call this example a second-order differential equation, as the highest-order derivative is 2.

I quickly want to move onto a real world example, showing some useful stuff we can do with differential equations. For this, I’m going to use simple electronic components in a circuit and then try and predict the state of the circuit at a given time. I’m going to use basic analogue electronic components for my first real-world example, as digital filter theory is rooted in basic analogue electronics theory.

The complex impedance ($Z$) of a circuit is made up of real resistive terms ($R$) and imaginary reactive terms ($X$) …
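In symbols, using the engineering convention $j = \sqrt{-1}$ and the angular frequency $\omega = 2\pi f$, the impedance of a series combination of a resistor, an inductor and a capacitor can be written as follows, where the inductive reactance $X_L$ and the capacitive reactance $X_C$ act in opposite directions:

$$Z = R + jX, \qquad X = X_L - X_C = \omega L - \frac{1}{\omega C}$$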

The reactance term comes from the combination of inductance and capacitance. Like resistance, capacitance and inductance are inherent properties of electronic components. The capacitive effect comes from the build-up of an electric field, which resists changes in voltage. The inductive effect comes from the build-up of a magnetic field, which resists changes in current. In a DC circuit, these reactive effects only show up when circuit conditions change. Therefore, we can use differential equations to predict the state of a circuit after a change. For the following example, the change is the circuit being switched on.

Consider a simple circuit containing a perfect voltage source, a resistor and an inductor …

When the circuit is switched on, the inductor chokes the initial burst of current, giving rise to a gradual increase in current through the inductor until a maximum value is reached.

One way to visualise the current flow through this circuit is to build it and measure it, but this is a bit of a faff. Another way is to use electronic simulation software. For simple circuits (and even for relatively complex ones too) I am a huge fan of the excellent and free web app, CircuitLab. CircuitLab allows the user to just drag and drop various components, set their values, then produce simulation plots like that shown below.

The plot shows that the rise in current through the inductor is not linear with time. This is because the voltage (V) across an inductor depends on its inductance (L) and on the rate of change of the current (I) through it, so the relationship is a differential one …

We will now see if we can predict the current at any given time analytically using differential equations. We know from Kirchhoff’s Voltage Law that the sum of the voltage drops across the series components (VR and VL) equals the supply voltage (VS) …

We want to predict the current, so using Ohm’s law (V=IR), we can express this in terms of current (I) …

The next thing we want to do is get this into a form suitable for integration, so that we can remove the differential terms (dI and dt) and find a solution. The first step is to split dI and dt …

… then separate the rest of the variables …

This is now in a suitable form for integration.

This is a great time to introduce a web app that I find particularly useful, Wolfram Alpha. This tool can be used to perform calculus. It will show the intermediate steps towards a solution, so there is no black-box trickery to get in the way of our understanding. I have posted a screenshot of the output of the web app when asked to integrate the left-hand-side (LHS) of the above equation. It saves all the $LaTeX$ typing anyway!

Integration of the right-hand-side (RHS) is trivial, so we end up with the following …

We are still trying to solve this for current, so the next step is to isolate the logarithmic term so that both sides of the equation can be easily exponentiated …

Now simply exp() both sides …

Now save a little writing by using the following alias for K, where K is some constant term …

… and so …

This allows us to finally get the current term by itself …

Great! We can almost state the current at a given time after the circuit is switched on, but there are still some pesky unknown constant terms in K that would cause some uncertainty in the result. However, this is not a problem because we know the initial conditions at switch on. At time t=0, the current through the circuit I=0. Substituting this in …

And so it follows …

so …

Now we can determine the current at any given time after switch on!
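Collected in one place, the steps above amount to the standard series RL derivation. Here $c$ is the constant of integration (lower case, to avoid confusion with capacitance), and $V_S - IR > 0$ is assumed so that the logarithm is defined; this holds because the current never exceeds $V_S/R$:

$$V_S = V_R + V_L = IR + L\frac{dI}{dt}$$

$$(V_S - IR)\,dt = L\,dI \quad\Rightarrow\quad \frac{dI}{V_S - IR} = \frac{dt}{L}$$

$$-\frac{1}{R}\ln(V_S - IR) = \frac{t}{L} + c$$

$$\ln(V_S - IR) = -\frac{R}{L}t - Rc \quad\Rightarrow\quad V_S - IR = e^{-Rc}\,e^{-Rt/L}$$

$$K = e^{-Rc} \quad\Rightarrow\quad I = \frac{V_S - K e^{-Rt/L}}{R}$$

$$I(0) = 0 \;\Rightarrow\; K = V_S \quad\Rightarrow\quad I(t) = \frac{V_S}{R}\left(1 - e^{-Rt/L}\right)$$

The ratio $L/R$ is the circuit's time constant: after one time constant, the current has reached $1 - e^{-1} \approx 63\%$ of its final value $V_S/R$.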

It is then easy to check whether our analytical solution matches the simulated circuit by making a quick Matlab script …
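A comparable check can be sketched in C++: evaluate the analytical solution and compare it against a brute-force numerical integration of $L\,dI/dt = V_S - IR$ using forward Euler steps. The function names and the component values used below (5 V, 10 Ω, 10 mH) are arbitrary assumptions, not the values from the CircuitLab plot.

```cpp
#include <cassert>
#include <cmath>

// Analytical step response of a series RL circuit switched on at t = 0:
// I(t) = (Vs / R) * (1 - exp(-R t / L)).
double rlCurrentAnalytic(double vs, double r, double l, double t) {
    return (vs / r) * (1.0 - std::exp(-r * t / l));
}

// Numerical check: integrate L dI/dt = Vs - I R with forward Euler,
// starting from I(0) = 0 and taking 'steps' equal time steps up to time t.
double rlCurrentEuler(double vs, double r, double l, double t, int steps) {
    assert(steps > 0);
    const double dt = t / steps;
    double current = 0.0;
    for (int n = 0; n < steps; ++n) {
        const double dIdt = (vs - current * r) / l;
        current += dIdt * dt;
    }
    return current;
}
```

With those values the time constant $L/R$ is 1 ms, so at $t = 1$ ms both functions should return roughly $0.5 \times (1 - e^{-1}) \approx 0.316$ A, and the Euler result converges towards the analytical one as the number of steps grows.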

So there we have it: a differential equation solved that allows us to predict the behaviour of a real physical system.

# Matlab-like Profiling in C++ Using RAII

I’ve always liked the ability to rapidly profile code snippets in Matlab using the ‘tic’ and ‘toc’ commands. I wanted to develop something equally simple to use in C++ for quick tests. It turns out that a particularly elegant way to do this is to use RAII techniques, allowing easy profiling of any commands within a scope. A time marker is recorded on construction; on destruction, a second time marker is taken and compared with the first.

The new ‘chrono’ classes bundled in the C++11 standard library make this task particularly simple. Below is the full class for simple tic toc type profiling …

Putting this to use is very simple. Just open a new scope with a pair of braces and create an instance of TicToc inside it. When the instance goes out of scope, it is destroyed, and its destructor displays the elapsed time.

Furthermore, other classes can be derived from this simple class, making this a really simple way to get stats on object lifetime.
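A minimal sketch of such a timer using `std::chrono::steady_clock` might look like the following. The names `TicToc`, `elapsedMicroseconds`, `profileExample` and the `Connection` class are illustrative assumptions, and reporting in microseconds is an arbitrary choice. The destructor is virtual so that derived classes are safe to delete through a base-class pointer.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// RAII timer sketch: the start time is recorded on construction, and the
// elapsed time is printed when the object is destroyed at the end of its scope.
class TicToc {
    using Clock = std::chrono::steady_clock;

public:
    explicit TicToc(std::string label = "Elapsed")
        : label_(std::move(label)), start_(Clock::now()) {}

    // Virtual so that derived classes can safely be deleted via a TicToc pointer.
    virtual ~TicToc() {
        std::cout << label_ << ": " << elapsedMicroseconds() << " us\n";
    }

    // Non-copyable: a copy would print a second, misleading timing.
    TicToc(const TicToc&) = delete;
    TicToc& operator=(const TicToc&) = delete;

    // Time since construction, in microseconds.
    long long elapsedMicroseconds() const {
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   Clock::now() - start_)
            .count();
    }

private:
    std::string label_;
    Clock::time_point start_;
};

// Hypothetical derived class: its lifetime is reported automatically,
// because the TicToc base destructor runs when a Connection is destroyed.
class Connection : public TicToc {
public:
    Connection() : TicToc("Connection lifetime") {}
};

void profileExample() {
    {   // a pair of braces opens a new scope
        TicToc timer("Filling a vector");
        std::vector<int> values(1000000, 1);
    }   // timer is destroyed here and prints the elapsed time

    {
        Connection connection;  // prints its lifetime when it goes out of scope
    }
}
```

`steady_clock` is used rather than `system_clock` because it is guaranteed never to go backwards, which suits interval timing.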

Enjoy and HAPPY NEW YEAR!

# C++ Palindrome Detection

An interesting question popped up on Stack Overflow about palindrome detection in C++. A palindrome is a word that reads the same backwards as forwards, e.g. “racecar”. My initial thought was to make a copy and use the std::reverse algorithm …

However, this copy can be avoided by using reverse iterators …
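A sketch of both approaches, assuming a plain `std::string` input and exact (case-sensitive) character comparison; the function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// First approach: make a reversed copy and compare it with the original.
bool isPalindromeCopy(const std::string& word) {
    std::string reversed(word);
    std::reverse(reversed.begin(), reversed.end());
    return reversed == word;
}

// Copy-free approach: walk the string forwards and backwards at the same
// time, comparing each character with its mirror image via reverse iterators.
bool isPalindrome(const std::string& word) {
    return std::equal(word.begin(), word.end(), word.rbegin());
}
```

Because `std::equal` compares every character with its mirror image, each pair is checked twice; passing `word.begin() + word.size() / 2` as the end iterator halves the work.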

Seeing as this function contains only a single statement, it might be nice to use a lambda as an alternative. You can specify a lambda’s return type with the trailing return type syntax, `-> type`.

So defining the palindrome detector becomes a simple 1-liner.
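For example, with the hypothetical name `isPalindromeLambda`, the trailing `-> bool` states the return type explicitly (it could also be deduced here, since the body is a single return statement):

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Palindrome check as a one-line lambda; "-> bool" is the trailing return type.
auto isPalindromeLambda = [](const std::string& s) -> bool {
    return std::equal(s.begin(), s.end(), s.rbegin());
};
```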

# C++11 Is Poetic

I will start posting cool little blocks of code, like short poems, whenever I see neat little things to do.

In vector: 12, 324, 45, 65, 787, 9, there are 2 multiples of 5
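One way to produce the line above, sketched with `std::count_if` and a lambda (the function names are hypothetical, and the vector contents are taken from the output shown):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Count the elements divisible by 5 using std::count_if and a lambda predicate.
std::ptrdiff_t countMultiplesOfFive(const std::vector<int>& values) {
    return std::count_if(values.begin(), values.end(),
                         [](int v) { return v % 5 == 0; });
}

// Build the summary line: each value followed by ", ", then the count.
std::string describe(const std::vector<int>& values) {
    std::ostringstream out;
    out << "In vector: ";
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<int>(out, ", "));
    out << "there are " << countMultiplesOfFive(values) << " multiples of 5";
    return out.str();
}
```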

For some more depth on the functional goodness, see this Dr. Dobb’s article.

# Using Inkscape Vector Graphics to Generate a Website Logo

This is a quick tutorial showing the steps required to generate a logo like the one seen at the top of this page. For this, I used the fantastic open source Inkscape software. In the past, I have tried to learn how to use various vector graphics drawing software packages, but have always run out of patience. That changed when I tried Inkscape. In comparison, Inkscape is easy to grasp, and there is a large amount of user-generated documentation online that will help you to produce amazing looking graphics in no time.

I am an Inkscape beginner, but thought it might be useful to produce a brief tutorial that combines some of the basic techniques that I’ve learnt from other tutorials. I have noticed that many of the tutorials on the web do not take good advantage of the layers tools, so I hope to address this here. This tutorial was generated on a Mac, so the screenshots have a Mac look and feel. However, the commands outlined here should be identical on all platforms.

The first step is to open the layers dialog (Shift+… as shown). I wanted some text with lighting to make it look shiny, plus some special effects like a glow and a shadow. I decided to break this idea down into three layers, as shown in the screenshot.

The next step was to create a basic logo using the text tool. Do this on the ‘BasicText’ layer. The ‘AP’ part of the logo used a non-standard font downloaded from a free font website. The colours of the text can be changed easily by highlighting a selection and then selecting one of the colours from the swatch at the bottom of the Inkscape window.

I wanted to check that the text was legible on a black background. For this, I created a black rectangle and sent it to the back using the item shown in the Object drop-down menu. All looked good, so I deleted the black box.

The next thing that I wanted to do was add lighting effects. I was only interested in doing this on the large text. The first thing to do was to duplicate the target text.

Once the duplicate was created, I moved it to its own layer, which I had reserved earlier for lighting effects.

The visibility of each layer can be toggled by clicking the eye icon in the layers window. The first job that I did with the duplicate was to create a gradient from black to transparent, from bottom to top. This was to make the lower part of the text look as if it were in a relatively shaded region. The gradient can be locked to the vertical axis by holding the ctrl key while dragging with the gradient tool.

<img src="/images/inkscape_logo_tut/Screen shot 2011-07-02 at 08.24.46.png" width="100%">

The next idea was to make the top of the text look shiny, as if light were coming from above. For this, another duplicate was made and pushed into the lighting layer. I coloured it grey for better visibility. I then put an ellipse on top of this duplicate.

The ellipse and duplicated text were both selected, and then the intersection tool in the Path menu was used.

This does what it says on the tin.

Setting the basic text layer to visible under this gives the following.

The colour of the inset was changed from grey to white and then a gradient was applied to transparent, adding to the shiny look.

To give the edge of the text more colour definition, an inset was added like so.

To give the following result.

Looking good! I wanted to give the coloured text a neon glow, so I duplicated the basic text once more and pushed it into the effects layer.

I then used the blur tool from the colours window to get the glow effect.

<img src="/images/inkscape_logo_tut/Screen shot 2011-07-02 at 08.36.02.png" width="100%">

The layers can all be set to visible once again to see the combined effect.

I again wanted to test the legibility of the text on a dark background after making the lighting and glow modifications. However, instead of just drawing a rectangle and sending it to the bottom of the layer, I created a new layer called ‘Alternative Background’, moved that layer to the bottom and then placed the black background within it. This makes switching the background on and off using the layers menu very easy.

<img src="/images/inkscape_logo_tut/Screen shot 2011-07-02 at 08.40.12.png" width="100%">

In doing this, I noticed that the bottom, shaded part of the text did not contrast well against the background. To rectify this, I went to the lighting layer and added an inset to the shading.

<img src="/images/inkscape_logo_tut/Screen shot 2011-07-02 at 08.41.21.png" width="100%">

The resulting image after this transform is much sharper to the eye.

After this, I wanted to create a reflection in the horizontal axis, making the logo appear as if it were hovering above a reflective surface. For this, I created a duplicate of everything, copied it into a new reflection layer, and then flipped the duplicate upside down using the button in the top-left of the following screenshot.

The duplicate was then moved below the original. Holding the ctrl key while dragging constrains the movement to one axis, which makes alignment simple.

Colour and transparency were modified.

Glow colour, glow amount and other subtle details were then adjusted.

The final image was then ready for export. The first stage was to resize the canvas. In the image properties box, there is a useful tool to resize the canvas to match the size of the current selection. Seeing as nearly half the selection was transparent in the example shown, the lower margin was then brought in by a few pixels.

To export the logo in a useful format such as png, use the “export to bitmap” option in the File menu. The first time that I attempted this, I used the “save as” option to save the image as a png. However, in doing this, I lost all of the alpha information, so the transparency was all messed up. The “export to bitmap” method gives predictable results.

The final png is shown in the operating system preview pane.