I have central air-conditioning in my apartment, and it’s controlled by a remote, employing IR signals to send commands to the A/C control unit.
As any decent geek would, I’d like to be able to control my A/C using other means (e.g., a smartphone).
In a previous post, I thoroughly covered the details of using an Arduino to send IR signals to the A/C instead of the remote – but if I’m far away, how can I know whether the command was received and executed by the A/C successfully?
Well, given that the A/C control unit beeps when it receives and executes a command, I thought I might take advantage of that – and virtually “listen for beeps” after sending A/C commands to verify successful execution!
The short version: A laptop running Ubuntu Linux sits within hearing distance of the A/C. Just before sending a signal from the Arduino, the laptop starts listening on the microphone (using the PyAlsaAudio library). It calculates Fourier transforms over the recorded audio sample, and measures the power around the beep central frequency (4100 Hz), looking for power peaks that correspond to a beep.
For the longer, detailed, version – do read on!
Also, check out the code that implements this on GitHub.
(nitpicks-alert: while I am aware that “power” and “energy” are different things, I am using the terms loosely and interchangeably throughout the post. please forgive me.)
First steps – manually analyzing the beep
Before even starting to think that I can pull this off, I wanted a quick way to examine the ability of the laptop to record audio and distinguish between beeps and random noise. Thankfully, the above-mentioned PyAlsaAudio library is quick enough to install:
- Install the dependency (libasound2-dev):
sudo apt-get install libasound2-dev
- Download the Python library from the website.
- Build and install it with python setup.py build and sudo python setup.py install
Once installed, it comes bundled with utility scripts that allow recording and playing back PCM recordings (using python recordtest.py to record, and python playbacktest.py to play back).
So, with the library installed, I recorded a couple of seconds that included “silence” (for reference background noise), and a beep (all using default settings, raw PCM, 1-channel, 16-bit little-endian samples at 44,100 Hz sample rate). Then I played back the recording to hear whether the recorded beep is distinguishable using the laptop microphone.
Initially, I couldn’t hear the beep at all! I figured this is because the laptop-microphone is optimized to pick up a close-by human speaking (e.g. for Skype calls), while trying to cancel out distant noise. I don’t really know how to change this behavior, so I looked in the sound settings in Ubuntu, and set the microphone-gain to maximum. The result was a very loud noisy recording, but with a human-audible beep in the background – so I figured that if I can hear it, then a program must be able to detect it.
The next step for me was to further analyse the audio sample. I wanted to see the spectrum of the audio, and see if I can spot the beep there.
So I installed Audacity (now on my Windows laptop), and imported the raw PCM sample (apparently it’s supported).
Using Audacity, I was able to zoom in on the half second that contained the entire beep, and export it as a separate file, and export another half second with just noise as a separate reference file.
As visible in the last screenshot, Audacity has useful analysis functions, including “Plot Spectrum” – which is exactly what I ran on the two half-second samples.
Clearly – the beep spectrum has a nice strong peak around 4100Hz, which is missing from the noise spectrum – hurray! I can use this!
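For readers without Audacity at hand, the same check can be approximated in a few lines of NumPy. This is only a sketch, not code from the GitHub repo: the function names are made up for illustration, and a synthetic 4100 Hz tone with added noise stands in for the half-second beep clip (an assumption, since the original recording is not included here).

```python
import numpy as np

SAMPLE_RATE = 44100  # Hz, matching the recording settings described earlier


def pcm_to_samples(raw_bytes):
    """Interpret raw bytes as mono 16-bit little-endian PCM samples."""
    return np.frombuffer(raw_bytes, dtype="<i2").astype(np.float64)


def peak_frequency(samples, sample_rate=SAMPLE_RATE):
    """Return the frequency (Hz) of the strongest spectral component."""
    windowed = samples * np.hanning(len(samples))  # reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    power[0] = 0.0  # ignore the DC component
    return float(freqs[np.argmax(power)])


# Synthetic stand-in for the exported beep clip: 4100 Hz tone plus noise.
rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE // 2) / SAMPLE_RATE
tone = 3000 * np.sin(2 * np.pi * 4100 * t) + rng.normal(0, 500, t.size)
raw = tone.astype("<i2").tobytes()  # what a raw PCM file would contain

peak = peak_frequency(pcm_to_samples(raw))
print("strongest component near", peak, "Hz")
```

With a real recording, the bytes would come from reading the exported raw PCM file in binary mode instead of the synthetic tone.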
Implementing spectrum-based detection in Python
Armed with scientific proof of the existence of the desired beep, I was ready to implement a Python program to perform the real-time recording and signal processing required to actually have the computer “hear” the beep autonomously.
The shiniest version of that program is available on GitHub, so you can dig into it if you’d like.
Here I will describe some of the design considerations behind that program.
First thing – the program is actually only a module that is part of a larger program (remember? sending an IR signal using an Arduino, while listening for a beep?), so the module must offer a convenient API – which is the MicAnalyzer class. The user program instantiates a MicAnalyzer object, with parameters for recording (sample rate and such) and detecting (beep central frequency and time-span, and energy and noise thresholds), and then can invoke several methods on the object:
start_listen: Activates a listening thread that uses PyAlsaAudio to collect live real-time samples from the microphone and write them to a processing queue. The method takes a rec_time parameter that determines the maximum length of the recording session (in seconds), along with other test/debug parameters (more on that later).
is_beep: Assuming a listening session was activated (if not, it will simply return immediately with False), the method will process a sliding window of the sample (e.g. a half-second window taken every 0.1 second), calculating the spectrum of the windowed signal for every window (using NumPy’s FFT), and return True if the signal energy (integrated over a narrow band around the expected beep central frequency) exceeds a “beep-threshold” (as read from a threshold file). In case a beep is detected, the method will stop the listening thread (before timeout), and wait for it to terminate before returning to the caller.
energy_generator: As the name implies, this is a generator method. It is used by is_beep to iterate over windowed samples, yielding the integrated signal energy that is_beep compares against the threshold. The method actually yields tuples of the form (timestamp, integrated_energy), where integrated_energy is for the time window that ends at time timestamp. The method may be used directly by external users if they wish to (for instance, I used it directly in order to produce signal energy graphs for debugging and awesome, as well as for calibrating the beep-threshold, and generally for testing and debugging the module).
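The sliding-window idea behind energy_generator and is_beep can be sketched roughly as follows. This is a simplified, assumption-laden illustration rather than the actual module: it works on an in-memory NumPy array instead of a live queue, the band half-width of 100 Hz is a guess, and the threshold is passed in directly instead of being read from a threshold file.

```python
import numpy as np


def band_energy(window, sample_rate, center_hz, half_width_hz):
    """Sum the power spectrum over a narrow band around center_hz."""
    power = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    in_band = np.abs(freqs - center_hz) <= half_width_hz
    return float(power[in_band].sum())


def energy_generator(samples, sample_rate, center_hz=4100.0,
                     half_width_hz=100.0, window_sec=0.5, step_sec=0.1):
    """Yield (timestamp, integrated_energy) per sliding window.

    timestamp is the end of the window, in seconds from the start.
    """
    window = int(window_sec * sample_rate)
    step = int(step_sec * sample_rate)
    for end in range(window, len(samples) + 1, step):
        chunk = samples[end - window:end]
        yield end / sample_rate, band_energy(
            chunk, sample_rate, center_hz, half_width_hz)


def is_beep(samples, sample_rate, threshold):
    """Return True as soon as any window exceeds the threshold."""
    return any(energy > threshold
               for _, energy in energy_generator(samples, sample_rate))


# Synthetic demo: two seconds of noise, with a 0.3 s beep added at t = 1 s.
rate = 44100
rng = np.random.default_rng(1)
noise = rng.normal(0, 100, 2 * rate)
t = np.arange(round(0.3 * rate)) / rate
with_beep = noise.copy()
with_beep[rate:rate + t.size] += 1000 * np.sin(2 * np.pi * 4100 * t)

threshold = 2 * max(e for _, e in energy_generator(noise, rate))
print(is_beep(noise, rate, threshold), is_beep(with_beep, rate, threshold))
```

Because is_beep uses any() over a generator, it stops consuming windows at the first hit, which mirrors the "stop listening as soon as a beep is detected" behavior described above.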
Graphing the audio signal energy
The signal energy graph above was produced using the graph function in the module (see in code). Given the MicAnalyzer class described above, creating the graph is pretty straightforward – so the graph function can be quite short and concise. It simply instantiates an object of that class, starts a listening session, iterates over the energy_generator iterator method of the object – appending yielded timestamps and energy values to internal lists, and uses the matplotlib Python library to plot the energy values (y-axis) as a function of the timestamp value (x-axis), adding axis labels and a graph title. For fun, the plot is XKCDified (which requires matplotlib >= 1.3).
Calibrating the Beep threshold
When describing the beep-detection flow above, I mentioned a beep-energy-threshold that is compared to the measured signal energy in order to decide whether a beep is present in the signal or not.
Naturally, during development I used a hard-coded threshold based on my manual observations.
But I really don’t like hard-coded values in my code, and I also don’t like the idea that I will need to repeat the manual process of determining the optimal threshold value each time I move the laptop or something like that.
So I implemented an automated “calibration wizard” (see in code).
The wizard guides the user in recording a couple of seconds of background noise, followed by a couple of seconds during which the user is requested to trigger a couple of beeps.
The detailed flow of the calibration wizard:
- Take the maximal energy value from the noise sample, and double it to obtain an initial “noise threshold”.
- During the second recording, look for energy values that exceed the noise threshold defined above. Such values are “beep suspects”.
- Group sequences of “beep suspects” (each consecutive series of higher-than-noise-threshold energy values is a sequence), so we have a list of beeps, each “beep” specified as a list of energy levels that define the beep.
- Verify correctness – count discrete beeps, and ask the user to approve that this is the right number of beeps during the recording.
- For each discrete beep, calculate the mean energy value of the beep, then set the final beep-threshold to be mid-way between the noise threshold and the weakest mean-beep.
- The final beep threshold is written to a threshold file as persistent storage.
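The arithmetic of the flow above fits in a short pure-Python sketch. The function names and the expected_beeps parameter are hypothetical (in the real wizard the user approves the count interactively), and writing the result to the threshold file is left out.

```python
def group_suspects(energies, noise_threshold):
    """Group consecutive above-threshold energy values into beeps."""
    beeps, current = [], []
    for energy in energies:
        if energy > noise_threshold:
            current.append(energy)      # still inside a "beep suspect" run
        elif current:
            beeps.append(current)       # run just ended
            current = []
    if current:                         # recording ended mid-beep
        beeps.append(current)
    return beeps


def calibrate(noise_energies, beep_energies, expected_beeps):
    """Return a beep threshold, or raise ValueError on the corner cases."""
    noise_threshold = 2 * max(noise_energies)
    beeps = group_suspects(beep_energies, noise_threshold)
    if not beeps:
        raise ValueError("no energy value exceeded the noise threshold")
    if len(beeps) != expected_beeps:
        raise ValueError("counted %d beeps, expected %d"
                         % (len(beeps), expected_beeps))
    weakest_mean = min(sum(beep) / len(beep) for beep in beeps)
    # Mid-way between the noise threshold and the weakest mean beep.
    return (noise_threshold + weakest_mean) / 2


# Made-up energy values: the noise peaks at 3, so the noise threshold is 6.
noise_energies = [1, 2, 3, 2]
beep_energies = [2, 3, 20, 30, 2, 1, 10, 14, 3]
beep_threshold = calibrate(noise_energies, beep_energies, expected_beeps=2)
print(beep_threshold)  # 9.0
```

In this example the two beeps have mean energies 25 and 12, so the final threshold lands half-way between 6 and 12.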
For some nice “visual” feedback to the user while processing the recording, the wizard prints a symbol every time_step seconds (e.g. 0.1 sec) that lets the user know what’s going on. A period (.) is printed for noise levels, and an asterisk (*) is printed for every higher-than-noise-threshold level.
Implementation note: It is OS-dependent whether the default console/terminal behavior for STDOUT is buffered or not. Most of the time, this is not important, since the rate of displayed characters is meaningless. In the case of the visual feedback, if STDOUT is configured to flush only when \n is printed, for example, then the feedback would be meaningless, and out of sync. The solution is to explicitly set STDOUT to unbuffered mode: sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0).
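A note for Python 3 users: the os.fdopen one-liner is a Python 2 idiom, and Python 3 rejects it with a ValueError because text streams cannot be unbuffered. A minimal alternative, assuming Python 3, is to flush explicitly after each symbol. The helper name below is made up for illustration.

```python
import sys


def print_feedback(above_noise, stream=None):
    """Write one feedback symbol and flush it right away."""
    stream = sys.stdout if stream is None else stream
    stream.write("*" if above_noise else ".")
    stream.flush()  # make the symbol visible now, not at the next newline


# Example: two noise levels, two beep suspects, one noise level.
for level_above_noise in (False, False, True, True, False):
    print_feedback(level_above_noise)
print()
```

The built-in print(symbol, end="", flush=True) achieves the same effect without a helper.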
Some corner cases:
- If no energy value exceeds the noise threshold, the wizard assumes either there were no beeps during the recording, or the beeps are too weak given the current environment and microphone. In that case the wizard exits.
- If the number of discrete beeps does not match the number provided by the user, the wizard fails and exits.
- If the weakest beep mean is not greater than noise threshold – then math is broken and the world probably ended, so it doesn’t matter what the wizard does…
Using pre-recorded PCM files for testing
The features described above assume a real, live recording session, but requiring such live sessions during development is less than ideal, for the following reasons:
- The audio library is for Linux, and I write the code on Windows.
- Noise/Beep conditions and levels are dynamic, depend on the ambient conditions in my location, and are generally not consistent or repeatable – which is very inconvenient when testing and debugging (e.g. “I had a specific bug with a specific situation, but I can’t recreate the exact sound that triggered it!”).
- The listening laptop / microphone needs to be close to the A/C control unit, which is not generally true for my work desk.
- Specifically, beeping requires that I play with the A/C (turn on, change settings) – and I don’t want to have to do it for each test run…
The solution for the issues above is simple – use pre-recorded PCM files instead of real live recordings!
Specifically, in order to be able to develop on Windows (where PyAlsaAudio is not available), the library is not imported globally, but only in the listening thread, and only if pre-recorded files are not used.
Since using this approach was so convenient, I opted for keeping support for files instead of live recording in the production version of my code, along with a couple of pre-recorded test files that I used.
I tried to integrate this feature as deeply as possible, in the listening thread code, in a way that would be as transparent as possible to all other parts of the code.
The purpose is to allow everything that is not the listening thread to behave the same for mocked-listening vs. live-listening (see more about Mock objects on Wikipedia), so I actually test and debug the same code that would later handle live-listening.
Some notes on the integration of pre-recorded files:
- The only place that is aware of mock-files for recording in the MicAnalyzer class is the start_listen method, and even there it’s just to pass these parameters on to the listening thread. The mock-file parameters default to None, which the listening thread interprets as “live recording”.
- The stand-alone variant of the module (that can be executed via command-line) supports optional flags that specify pre-recorded files. These flags are transparently passed to the recording thread (via the start_listen method), and also default to
- In the listening thread code itself, the distinction between live and pre-recorded audio is very clear and concentrated in the
- The listening thread supports a listening timeout that defaults to None (which means no internally-triggered timeout). The way that this is supported with pre-recorded files is by looping the file in case the listening time is longer than the recorded time.
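The notes above can be illustrated with a small sketch of a chunk source that is either live or file-backed. All function and parameter names here are hypothetical and the structure is an assumption, not a copy of the listening thread on GitHub. The alsaaudio import is deferred into the live branch, so the module still loads on machines without ALSA, and device configuration (channels, rate, format) is omitted for brevity.

```python
import io
import time


def looped_file_chunks(pcm_file, rec_time, sample_rate=44100, chunk_bytes=2048):
    """Yield raw chunks from an open binary file, rewinding at EOF,
    until rec_time seconds of 16-bit mono audio have been produced."""
    pcm_file.seek(0, io.SEEK_END)
    if pcm_file.tell() == 0:
        raise ValueError("pre-recorded file is empty")
    pcm_file.seek(0)
    remaining = int(rec_time * sample_rate) * 2  # 2 bytes per sample
    while remaining > 0:
        chunk = pcm_file.read(min(chunk_bytes, remaining))
        if not chunk:          # end of file reached: loop from the start
            pcm_file.seek(0)
            continue
        remaining -= len(chunk)
        yield chunk


def live_chunks(rec_time):
    """Yield raw chunks from the microphone for rec_time seconds."""
    import alsaaudio  # deferred: only needed (and available) for live use
    pcm = alsaaudio.PCM(alsaaudio.PCM_CAPTURE)
    deadline = time.monotonic() + rec_time
    while time.monotonic() < deadline:
        length, data = pcm.read()
        if length > 0:
            yield data


def chunk_source(rec_time, mock_file=None, **kwargs):
    """None means live recording; anything else is a pre-recorded file."""
    if mock_file is None:
        return live_chunks(rec_time)
    return looped_file_chunks(mock_file, rec_time, **kwargs)


# Demo with an in-memory "file": 1000 bytes looped to cover 1.25 seconds.
data = bytes(range(250)) * 4
fake_file = io.BytesIO(data)
chunks = list(chunk_source(1.25, fake_file, sample_rate=1000, chunk_bytes=400))
print(sum(len(c) for c in chunks), "bytes produced")
```

Everything downstream only sees an iterator of byte chunks, so the processing code cannot tell the mocked source from the live one.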
Other utility features
A couple of extra utility features, mostly for testing and debugging purposes, include:
- Mentioned only briefly before, the Python module can be executed as a stand-alone command-line Python program, in addition to its “natural” usage pattern (as an imported module, via the MicAnalyzer API). Everything related to stand-alone module behavior can be found at the bottom of the code (the '__main__' == __name__ condition is true only when the module is executed explicitly, as opposed to imported by another module). It uses the argparse module to define command-line sub-commands and flags. Refer to the README file on GitHub for details on the command-line flags and options.
  - The graph sub-command was explained above – it is used to produce signal energy graphs.
  - The calibrate sub-command was explained above – it is used to launch a wizard that calibrates the beep-energy-threshold.
  - The detect sub-command can be used to check the beep-threshold. It will print out periods and asterisks while processing a recording (same as the calibrate feedback).
  - The isbeep sub-command can also be used to check the beep-threshold. It differs from detect in that it ends as soon as a beep is detected, and doesn’t print “live” feedback.
- The listening thread also takes an optional debug_rec_file parameter. If specified, in addition to queuing audio samples for processing, the listening thread will also store the raw PCM samples in the file specified on disk. This proved to be invaluable for late-stage-debugging, when I started testing with real A/C beeps, as I could easily recreate bugs that occurred for certain recordings and not for others. I actually keep this active in production, so I can diagnose rare issues when they occur.
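As a rough picture of how the four sub-commands could be wired up with argparse, here is a minimal sketch. Only the sub-command names come from the description above. The --mock-file flag name and the help texts are assumptions made for illustration; the README on GitHub documents the real flags.

```python
import argparse

SUBCOMMANDS = {
    "graph": "plot the signal energy over time",
    "calibrate": "run the threshold calibration wizard",
    "detect": "print periods and asterisks while processing a recording",
    "isbeep": "exit as soon as a beep is detected",
}


def build_parser():
    """Build a parser with one sub-command per stand-alone feature."""
    parser = argparse.ArgumentParser(description="Beep detector (sketch)")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in SUBCOMMANDS.items():
        sub = subparsers.add_parser(name, help=help_text)
        # Hypothetical flag: None means "record live from the microphone".
        sub.add_argument("--mock-file", default=None,
                         help="pre-recorded raw PCM file to use instead")
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(["isbeep", "--mock-file", "beep.raw"])
print(args.command, args.mock_file)
```

In a real module, the parsing would sit under the '__main__' == __name__ condition and dispatch on args.command.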
This concludes this post on analyzing beeps.
You’re invited to clone, fork, hack and improve freely (and share back if you do).
This post is part of a series on my A/C-control project, so if you missed some of the other posts in the series, now is your chance to catch up: