Introduction
Last week, we took a look at the javax.sound.sampled package and
learned how to loop a snippet of audio.
In Part Two, we’re going to take a look at some issues related to
audio latency and real-time programming. This will help us increase
the responsiveness of our software, and make for smoother audio
transitions.
The High Concept: Reprise
If you’ll recall, we’re working on a program called Looper whose job
it is to play a snippet of sound over and over again. It’s also able
to switch between different snippets.
Assuming that our snippets are the same length and have the same
rhythm, and assuming that we switch properly between them, we should
be able to maintain continuity between the different snippets.
Transitioning: The Easy Way
Just switching from one loop to another is easy. Let’s suppose we
have two different rhythm tracks which have the same rhythm. If we
want to make a smooth transition from one loop to the other, we can’t
simply start playing the second loop from the beginning. If we did,
we would likely find that the second one started right in the middle
of the first one, ruining the rhythmic continuity between the snippets
and throwing everything off.
But it’s not hard to get this right: all we have to do is keep the
value of cursor around. As you remember from Part One, cursor tells
us how many bytes of the first loop we’ve written so far:
    // Write at most this many bytes per inner loop execution
    static private final int innerLoopWriteSize = 512;

    // The position within the currently playing sound
    // (i.e. the next sample to write)
    int cursor = 0;

    while (true) {
      // How many bytes are left to write from this snippet?
      int bytesLeft = snippet.length-cursor;

      // If we've reached the end, start from the top
      // of the sound
      if (bytesLeft<=0) {
        // restart sound
        cursor = 0;
        bytesLeft = snippet.length;
      }

      // Don't write more than this in one loop
      int towrite = innerLoopWriteSize;
      if (towrite > bytesLeft)
        towrite = bytesLeft;

      // Write a chunk
      int r = sdl.write( snippet, cursor, towrite );

      // Remember how much we wrote, by advancing the cursor
      // to the next chunk of sound
      cursor += r;
    }
When we switch to the second loop, we need to skip ahead cursor bytes
and start writing from that point. This ensures that the piece of
the first snippet matches up with the piece of the second snippet.
Put more concisely, if we’ve played up to sample N of the first
snippet, then we start playing the second one starting from sample N.
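The rule above can be sketched as a small helper. This is only an illustration under assumptions: the class name SnippetSwitch and its methods are hypothetical (they are not taken from Looper.java), both snippets are assumed to be raw byte arrays in the same audio format, and the modulo step is an added guard in case the new snippet is shorter than the old one.

```java
/** Hypothetical helper; not part of the original Looper source. */
class SnippetSwitch {

    /**
     * Returns the byte offset at which to start writing the new snippet
     * so that it picks up where the old snippet left off.
     */
    static int startOffset(int cursor, int newSnippetLength) {
        if (newSnippetLength <= 0) {
            throw new IllegalArgumentException("snippet must not be empty");
        }
        // For equal-length snippets this is just the cursor itself.
        return Math.floorMod(cursor, newSnippetLength);
    }

    /** Returns how many bytes to write next, capped at maxChunk. */
    static int nextChunkSize(int cursor, int snippetLength, int maxChunk) {
        return Math.min(maxChunk, snippetLength - cursor);
    }

    public static void main(String[] args) {
        byte[] second = new byte[4096]; // stand-in for real audio data
        int cursor = 1536;              // bytes of the first snippet written so far
        int offset = startOffset(cursor, second.length);
        System.out.println("continue second snippet at byte " + offset);
    }
}
```

With snippets of equal length, startOffset simply hands the cursor back, which is exactly the "sample N to sample N" rule.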
Following this rule ensures rhythmic continuity. However, we’re going
to take it a step further and try to deal with latency.
Latency
The reason that the above solution is not ideal is that, even if we do
make a perfect transition, we are not going to hear it right away.
Assuming that the transition is triggered in some way by the user,
there will be a delay between this trigger and the moment when the
transition becomes audible.
With any digital audio system, it takes a certain amount of time for
the audio to get from disk to speaker, or from microphone to disk.
(Or from microphone to speaker, for that matter.) This delay is
called latency.
For commercial hardware devices, this delay is usually on the order of
a few milliseconds or less.
For computer systems, this delay can be substantially longer —
sometimes on the order of large fractions of a second. This increased
delay is due to the fact that a desktop workstation is not really
designed to provide low-latency audio throughput. And the audio
generally has to go through several levels of software, each of which
adds a certain amount of delay.
Compensating for Latency
In our Looper application, the effect of the latency is
that there is a delay between the time that the user initiates a
transition and the time that the transition appears to happen. It is
this delay that we would like to reduce. We would like the transition
to occur as soon after initiation as possible.
When using javax.sound.sampled, there are three potential sources of
latency. Any audio leaving your application is first buffered within
the code for the javax.sound.sampled library. It is then sent through
the OS audio layers, and then finally to the sound hardware. Each of
these three stages adds its own delay.
In our Java code, there’s nothing we can do about the latency in the
OS and hardware layers. The latency in the Java library, however, is
under our control to a certain degree.
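To get a feel for how much delay the Java-side buffer can add, we can convert a number of buffered bytes into time. What follows is a rough sketch, assuming uncompressed PCM audio. The class name BufferLatency and the sample figures (44.1 kHz, 16-bit mono, an 8192-byte buffer) are illustrative assumptions, not values taken from Looper.

```java
/** Hypothetical helper; the figures used in main() are examples only. */
class BufferLatency {

    /**
     * Converts a byte count into seconds of audio.
     * frameSizeBytes is the size of one sample frame
     * (bytes per sample times number of channels).
     */
    static double bytesToSeconds(int bytes, float frameRate, int frameSizeBytes) {
        return bytes / (double) (frameRate * frameSizeBytes);
    }

    public static void main(String[] args) {
        // Assumed format: 44100 frames per second, 2 bytes per frame
        double seconds = bytesToSeconds(8192, 44100f, 2);
        System.out.println("8192 buffered bytes hold about "
                + Math.round(seconds * 1000) + " ms of audio");
    }
}
```

With a real line, the frame rate and frame size would come from the line’s AudioFormat (getFrameRate() and getFrameSize()) rather than from hard-coded numbers.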
Let’s consider the moment of transition. At this moment, we’ve
written a certain portion of the first snippet to our SourceDataLine,
and suddenly the user asks us to transition to the second snippet.
Because of the latency through the system, we know that not
all of the audio data that we’ve written has actually gone
through. Some of it may be in the Java buffers, some may be in the
OS buffers, and some may be in the hardware buffers.
What we’d really like to do is “recall” the data from these buffers.
Sadly, we cannot do this from the OS and hardware buffers, but we can
from the Java buffers. Naturally, this is as easy as calling a
method:
sourceDataLine.flush();
What this does is tell the SourceDataLine to empty everything from its
buffers. At this point, the SourceDataLine has nothing to play, which
means we’d better fill it with new data as fast as possible. As soon
as we’ve removed the remaining portion of the first snippet, we can
start writing the next portion of the second snippet.
A Bit More Complicated
However, we can’t simply start playing the second snippet the way we
did before. After all, we just emptied some audio data from the Java
buffers, which means we really haven’t played as much audio as we
thought we had. The audio we just took out never got played.
We had previously advanced cursor by the amount that we thought we
had played. But we didn’t play all of it, so we need to back up our
cursor by an amount equal to the amount of data we flushed from the
buffer. The flush() method doesn’t tell us how much was flushed, but
we can figure it out for ourselves, provided we measure it before
calling flush() (afterwards the buffer is empty and there is nothing
left to measure):
    // We have to flush the buffer, and thus have to
    // back the cursor up a certain amount to keep the
    // looping seamless
    int backlog = sourceDataLine.getBufferSize() -
                  sourceDataLine.available();
    cursor -= backlog;
getBufferSize() tells us exactly how large the Java buffer is, while
available() tells us how much room is left inside the buffer. The
first minus the second tells us how much data was waiting in the
buffer. It is this amount that we need to subtract from the cursor to
make sure things line up.
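One detail worth spelling out: subtracting the backlog can push the cursor below zero when the flush happens near the start of a loop. Since the snippet repeats, the audio "before" byte 0 is simply the end of the loop, so the cursor can wrap around. Here is a minimal sketch of that idea. The class FlushRewind and its method names are hypothetical, and the wrap-around step is my own assumption about how to handle the negative case, not something taken from Looper.java.

```java
import javax.sound.sampled.SourceDataLine;

/** Hypothetical helper; not part of the original Looper source. */
class FlushRewind {

    /**
     * Moves the cursor back by backlog bytes, wrapping around so the
     * result always lies in the range [0, snippetLength).
     */
    static int rewind(int cursor, int backlog, int snippetLength) {
        return Math.floorMod(cursor - backlog, snippetLength);
    }

    /** Measures the backlog, flushes the line, and returns the corrected cursor. */
    static int flushAndRewind(SourceDataLine line, int cursor, int snippetLength) {
        // Measure first: after flush() the buffer is empty
        int backlog = line.getBufferSize() - line.available();
        line.flush();
        return rewind(cursor, backlog, snippetLength);
    }

    public static void main(String[] args) {
        // No audio hardware needed to try the arithmetic
        System.out.println(rewind(1000, 300, 4096));
        System.out.println(rewind(100, 300, 4096));
    }
}
```

The second call in main() shows the wrap-around case: a cursor of 100 minus a backlog of 300 lands 200 bytes before the end of the 4096-byte snippet.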
Fudge Factor
In practice, however, even this is not quite enough. The compensation
factor above results in a looper that is nearly, but not quite,
seamless. A good ear can tell that the transition is not quite
… right.
There could be a number of explanations for this, and, frankly, I
don’t know for sure which is the right one. I do know, however, that
it is very common for there to be minor errors in latency compensation
whenever a general-purpose desktop computer is used for real-time
audio. Even the expensive commercial audio software packages have
this problem, and so they make it easy for the user to correct for
these small errors by entering a “fudge factor” that is added or
subtracted from the expected latency.
Setting the fudge factor properly is mostly a matter of
trial-and-error, and a good ear. In our software, we simply hard-code
a latency compensation value, and change it until things sound right:
    // the latency through the low-level sound system
    // this must be tuned for each system
    static private final double sysLatencyTime = 0.695; // seconds

    // the sound latency expressed in samples
    static private final int sysLatency = (int)(sysLatencyTime*sampRate);

    // ...

    // Back up by a bit more, because of the latency
    // through the low-level sound system
    cursor -= sysLatency*2; // 16 bit, don'tcha know
On my system, a value of 0.695 seconds sounds about right for my sound
hardware; other systems will need their own value.
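Since the value has to be retuned for each machine, it can be convenient to read it at startup instead of recompiling. The sketch below shows the seconds-to-bytes conversion along with one possible way to do that. The class LatencyFudge, the system property name looper.latency, and the 44.1 kHz 16-bit mono format are all assumptions made for illustration; none of them come from the original source.

```java
/** Hypothetical helper; property name and audio format are assumptions. */
class LatencyFudge {

    /**
     * Converts a latency in seconds into a byte count. Truncating to
     * whole samples first keeps the result aligned to a sample boundary.
     */
    static int latencyBytes(double seconds, float sampleRate, int bytesPerSample) {
        int samples = (int) (seconds * sampleRate);
        return samples * bytesPerSample;
    }

    public static void main(String[] args) {
        // Tunable from the command line, e.g. java -Dlooper.latency=0.5 LatencyFudge
        double seconds = Double.parseDouble(
                System.getProperty("looper.latency", "0.695"));
        int bytes = latencyBytes(seconds, 44100f, 2);
        System.out.println("back the cursor up by " + bytes + " more bytes");
    }
}
```

Converting to whole samples before multiplying matters for 16-bit audio: a cursor that lands on an odd byte would split a sample in half and produce noise.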
Full Source
Here are links to the full source to a program that demonstrates these
concepts. Looper.java contains the looper code. It also has a main()
routine which lets you try it out at the command-line.
To use it, simply enter the filenames of some audio files on the
command-line:
prompt> java Looper loop0.wav loop1.wav loop2.wav
Press Return, and Looper will start playing the first sound. Press
Return while the program is running, and it will smoothly transition
from one sound to the next. Entering q before pressing Return will
cause the program to exit gracefully.
The other class, Queue.java, is a utility class used to hold a list of
sounds that are to be played by the Looper. (In fact, the looper
simply skips ahead to the last element in the queue and transitions to
that one.)
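The skip-ahead behavior can be approximated with the standard collections. This is a sketch of the idea only, not the contents of Queue.java: the class LatestRequest and its method are hypothetical, and java.util.ArrayDeque stands in for the article’s own queue class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for the skip-ahead logic; not the real Queue.java. */
class LatestRequest {

    /**
     * Empties the queue and returns the most recently added entry,
     * or null if nothing was queued.
     */
    static <T> T drainToLast(Deque<T> queue) {
        T last = null;
        T next;
        while ((next = queue.pollFirst()) != null) {
            last = next; // older requests are simply discarded
        }
        return last;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>();
        pending.add("loop0.wav");
        pending.add("loop1.wav");
        pending.add("loop2.wav");
        System.out.println("transition to " + drainToLast(pending));
    }
}
```

If requests are queued from a different thread than the one writing audio, a thread-safe Deque such as java.util.concurrent.ConcurrentLinkedDeque would be the safer choice.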
Conclusion
In this two-part series, we’ve learned how to read audio data
from the filesystem and play it in a platform- and format-independent
way through the sound hardware.
We’ve also learned about some of the more subtle issues surrounding
real-time audio, such as latency, and we’ve explored some methods for
solving problems associated with these issues.
About the Author
Greg Travis is a free-lance programmer living in New York
City. His interest in computers can probably be traced back to that
episode of “The Bionic Woman” where Jaime runs around trying to escape
a building whose lights and doors are controlled by an evil artificial
intelligence which mocks her through loudspeakers. He’s a devout
believer in the religious idea that, when a computer program works,
it’s a complete coincidence. He can be reached at mito@panix.com.