Audio USB Fundamentals

USB Made Technical

USB cable illustration

Introduction and Overview:

This is a fairly technical article. Having said that, if you are at all curious about USB as it's used in hi-fi audio, then this will be a highly informative read. Most audiophiles are technically oriented to some degree, so I'm not going to be shy about getting deep into things. Let's wade in!

The USB interface has become extremely popular as the default connection between computers, music servers and DACs. It's not the only interface, but because it's so ubiquitous, I feel it needs to be explained in an explicit and approachable way for audio users. There is little information on audio USB geared toward the average audiophile, though plenty directed at the electrical and computer engineer. Moreover, there's a fair amount of misinformation regarding USB and its application to audio.

Simplified Audio USB view of the host computer and DAC endpoint.

The Universal Serial Bus (USB) specification covers a wide gamut of principles. When we say USB, we can be referring to a physical cable with A and B connectors on either end. We could also be referring to the communication protocol itself, or to the computer interface or the DAC interface connected at either end of the USB cable. I'll attempt to be clear throughout this article about which part of the system I'm referring to.

I would also like to establish a few conventions used in this article. Computer Audio (CA) means any computer (Windows, Mac or Linux) or any music server that is used to serve or play back hi-fi music files; there is arguably little difference between a computer dedicated to serving music and what's called a music server, which is still just a computer inside. I will not discuss the slight differences between music files, except to say that they are presumed to be lossless, whether compressed or not; the USB protocol is the data transfer communication, and the source file format is another topic altogether. When I refer to the DAC, this really means the USB to I2S converter board, which may be inside the DAC or may be an external device converting USB to SPDIF.

Audio USB Concerns:

The USB specification defines four different data transfer types: Control, Interrupt, Isochronous and Bulk. Audio data uses the isochronous model of communication. This is a highly important point, because isochronous flow control is very different from the other models. See the article on this site about the USB Specification. There are also several versions of the USB specification, each supporting different data transfer rates. The version that we are concerned with in audio is USB 2.0, which at high speed transfers up to 480 Mbit/s. That's megabits, not megabytes. More on this later. There's a lot of data moving along the USB cable in real time to faithfully play a hi-res music file.
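To put those numbers in perspective, here is a small back-of-the-envelope Python sketch (not part of the specification itself) that compares the raw payload rate of a few common PCM formats with the 480 Mbit/s signalling rate of USB 2.0 high speed. Protocol overhead is ignored; the figures are illustrative only.

# Back-of-the-envelope payload rates for common PCM formats versus
# the 480 Mbit/s raw signalling rate of USB 2.0 high speed.
# (Illustrative only; protocol overhead is ignored.)

USB2_HIGH_SPEED_BPS = 480_000_000  # raw bit rate, not usable payload

def audio_bit_rate(sample_rate_hz, bit_depth, channels=2):
    """Raw PCM payload in bits per second."""
    return sample_rate_hz * bit_depth * channels

for rate, depth in [(44_100, 16), (96_000, 24), (192_000, 24)]:
    bps = audio_bit_rate(rate, depth)
    print(f"{rate/1000:g} kHz / {depth}-bit stereo: "
          f"{bps/1e6:.2f} Mbit/s "
          f"({bps / USB2_HIGH_SPEED_BPS:.2%} of the 480 Mbit/s link)")

Even a 24-bit/192 kHz stereo stream occupies only a few percent of the raw link rate, so the challenge is not bandwidth; it is delivering the packets cleanly and on time, as discussed below.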

The most important point that I can make at this juncture is that audio USB can and does contain transmission errors; in fact these errors are anticipated, and the specification allows for them. If this is a surprise, consider that audio data is a timed stream, and if errors occur, there are no provisions in the specification for a request to resend the data. In order to have an audio stream, the USB designers (the USB Implementers Forum, or USB-IF) accept that things will go wrong while data is transferred from host to endpoint. The firmware and the cable designers need to work within the parameters established for USB communication, and make the best design decisions for their application.

In high end audio, the need to minimize these transmission errors is great; otherwise music will not sound as good as it should. We will explain how these errors occur and what to do about them. Simply using an inexpensive USB cable is not going to provide adequate error suppression. Often a poorly designed cable will introduce errors of its own as it transfers music data, and it will not suppress the external influences that cause data errors. Poorly designed cables simply are not acceptable for high-end CA use.

So what types of things happen to the audio data stream as it traverses the cable from computer to DAC? The answer to this is simple: noise and jitter. These terms often get lumped as one and the same, but it's useful to distinguish between them and their effects. Noise is degradation of the binary square waves that are transmitted down the two data wires in the cable. Noise is caused by electromagnetic interference (EMI) which is generated from many sources, and to a lesser extent radio frequency interference (RFI). Jitter is the deviation in the arrival time of a data point from what is expected. Jitter is caused by numerous things, but essentially it is due to less-than-perfect computer clocks in conjunction with poor cable design.

Stylized examples of square wave errors and phase shift.

Do not kid yourself: these adverse effects are audible in decent hi-fi systems. Both noise and jitter can be heard by any average listener. The good news is that there are reasonable solutions to help combat noise and jitter. One of the first places to start is by choosing a well-designed USB cable, but to take full advantage of the USB link you also need a decent music player and DAC on either end. See the article on this site about Computer Audio Basics. Let's now take an in-depth look at how the music data traverses a USB cable on its binary journey.

Audio USB Details:

The USB data coming from your computer or music server originates at a USB host controller, which may be on your motherboard or on a separate USB card plugged into the motherboard. Either way, the data flow is initiated by the host controller and sent out to the DAC, which acts as the endpoint. USB binary data is sent in packets: the data is chunked into small, manageable segments and wrapped with an ID and information about its contents. Each packet contains a header and an end-of-packet indicator, and each high-speed isochronous packet can carry up to 1024 bytes (8192 bits). There are also token and handshake packets in the protocol, although isochronous transfers skip the handshake phase, which is why lost data is never retried. An audio data stream consists of a continuous sequence of frames, each containing one or more packets; at high speed the bus is divided into microframes of 125 microseconds each, and the audio packets must be delivered on that schedule to keep the stream in time.
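As a rough illustration of those numbers, the following Python sketch estimates how many audio bytes each 125-microsecond microframe has to carry for a given sample rate and bit depth, next to the 1024-byte packet limit mentioned above. It is arithmetic only and ignores descriptor and header overhead.

# How much audio payload each 125-microsecond USB 2.0 microframe must carry,
# compared with the 1024-byte maximum isochronous packet size.
# (A sketch of the arithmetic only, ignoring protocol overhead.)

MICROFRAMES_PER_SECOND = 8000          # 1 second / 125 microseconds
MAX_ISO_PACKET_BYTES = 1024

def bytes_per_microframe(sample_rate_hz, bit_depth, channels=2):
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second / MICROFRAMES_PER_SECOND

for rate, depth in [(44_100, 16), (192_000, 24)]:
    payload = bytes_per_microframe(rate, depth)
    print(f"{rate} Hz / {depth}-bit stereo: "
          f"about {payload:.1f} audio bytes per microframe "
          f"(packet limit {MAX_ISO_PACKET_BYTES} bytes)")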

There are a total of four wires in a normal USB 2.0 cable, and they may also be shielded with foil or wire mesh. The standard USB cable has a 5 volt power wire and a ground wire. The data is carried on two wires designated data plus (D+) and data minus (D-). The data states are signalled by the polarity of the voltage between the two wires, that is, by which line is driven higher. This differential signalling provides excellent noise suppression, which is especially useful at high transmission speeds. The principle relies on the fact that in differential signalling both wires are exposed to the same noise, so it can ultimately be canceled out.

The data + and data - wires in a twisted pair (illustrative only).

The data + and data - wires are required to be in a twisted pair configuration, as per the USB standard. This is a well-known technique for improving data transmission integrity, originally devised by Alexander Graham Bell in 1881 for telephone circuits. It provides an effective means of reducing electromagnetically induced noise (from electric and magnetic fields) by cancellation.

Imagine the data being communicated for a single song that is 100 MB (megabytes) in size. Remember that a compressed song is first uncompressed before it's transferred. That one song contains well over 838 million bits of information: 100 * 1024 * 1024 * 8 = 838,860,800 bits. If we divide the total number of bits by 8000, the approximate number of data bits in each packet, we get 104,857 packets being transferred to play one song. That provides a lot of room for noise and timing errors to be introduced.
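For readers who like to check the arithmetic, the same calculation can be written as a few lines of Python, using the approximation above of roughly 8000 data bits per packet.

# Reproducing the packet-count arithmetic above for a 100 MB uncompressed song.
# The ~8000 bits per packet figure is the article's approximation.

song_bytes = 100 * 1024 * 1024          # 100 MB
song_bits = song_bytes * 8              # 838,860,800 bits
bits_per_packet = 8000                  # approximate payload per packet

packets = song_bits // bits_per_packet
print(f"{song_bits:,} bits -> roughly {packets:,} packets for one song")
# -> 838,860,800 bits -> roughly 104,857 packets for one song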

If you're still following along, let's look at some details of how the binary data is structured as it flows inside the cable. USB 2.0 connections are half-duplex, meaning data flows in only one direction at a time, and the signalling is differential, carried on a positive and a negative line. The smallest data entity is a bit, and it is represented by either the presence or the absence of a voltage transition (the signalling swing is roughly 3.3 volts for full-speed devices, and only about 400 millivolts at the 480 Mbit/s high-speed rate).

Another way of describing the data flow is that it uses NRZI (non-return-to-zero inverted) encoding. Here is an illustration of this encoding as represented on the positive wire.

Illustration showing one-half of the USB data encoding (NRZI).

In USB's NRZI scheme, the presence of a transition encodes a "0", and the absence of a transition encodes a "1". The voltage is applied for the duration of the bit, and the bit duration is determined by the data rate. Because the signalling is differential, the negative wire carries a mirrored complement of the positive data; these are the positive and negative sides of the diagram above. This is a clever technique that allows the receiving end to subtract the two lines and reduce noise that builds up equally on both.
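The following Python sketch is a minimal model of that encoding: a "0" bit forces a level transition and a "1" bit leaves the line level unchanged. Real USB also inserts a stuffed "0" after six consecutive "1"s to guarantee transitions for clock recovery, which is omitted here for brevity.

# A minimal sketch of USB-style NRZI encoding: a '0' bit is sent as a level
# transition, a '1' bit as no transition. (Bit stuffing is omitted.)

def nrzi_encode(bits, start_level=1):
    level = start_level
    out = []
    for bit in bits:
        if bit == 0:          # a 0 forces a transition
            level ^= 1
        out.append(level)     # a 1 leaves the line level unchanged
    return out

def nrzi_decode(levels, start_level=1):
    prev = start_level
    bits = []
    for level in levels:
        bits.append(1 if level == prev else 0)
        prev = level
    return bits

data = [1, 0, 1, 1, 0, 0, 0, 1]
line = nrzi_encode(data)
assert nrzi_decode(line) == data
print("data bits :", data)
print("line level:", line)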

Each of the voltage transitions that marks a bit boundary has to be precisely timed by an oscillating quartz clock at the host controller. It is critically important that the timing be accurate: at 480 Mbit/s each bit occupies only about two nanoseconds, so even deviations of a fraction of a nanosecond can make a difference. The receiving endpoint must then recover a matching clock from the stream in order to read the binary data. Isochronous (equal time) transfers guarantee a data rate, but data loss is still possible in real-time audio streaming.
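To see why the tolerances are so tight, a two-line calculation shows the time each bit occupies at the 480 Mbit/s high-speed rate.

# The time budget per bit at USB 2.0 high speed: at 480 Mbit/s each bit
# occupies only about 2 nanoseconds, so timing deviations are measured in
# fractions of a nanosecond. (Simple arithmetic, not quoted from the spec.)

bit_rate = 480e6                      # bits per second
bit_period_ns = 1e9 / bit_rate        # nanoseconds per bit

print(f"Bit period at 480 Mbit/s: {bit_period_ns:.2f} ns")
print(f"10% of a bit period:      {bit_period_ns * 0.1 * 1000:.0f} ps")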

Causes of Digital Noise:

Let's look at what can go wrong with this real-time data stream and how noise builds up and degrades the audio signal. The oscillation of digital clocks, switch-mode power supplies, wireless computer network traffic and transformer inductance are all examples of conducted and radiated interference sources. In digital logic circuits, noise enters the data signal from ground bus noise, power bus noise, transmission line reflections and crosstalk from adjacent wires. The presence of noise in the digital signal is audible in the music, and is passed through the DAC to manifest itself in the analog components. Noise makes the music sound grainy, degrades bass control, and causes loss of mid-range tone and high-frequency micro-detail. Noise in digital music makes the listener feel edgy and instruments sound less emotionally engaging.

Good music server design and attention to electromagnetic compatibility (EMC) is therefore necessary to reduce degradation to the digital signal, before it gets placed onto the USB cable. Once the digital music signal is flowing on the cable towards the DAC, whatever digital noise is already present then gets compounded by a whole set of similar noise sources.

The following diagram illustrates how the USB converter in the DAC interprets the voltage pulses on the two data lines. If the code pulses are well formed and free of noise, the subtraction logic circuit faithfully transcribes the correct music data. If there is noise on the data signal, it should be canceled by subtracting the two lines' values. However, if there is excessive noise on the data lines, the difference between their positive and negative parts may not be enough to eliminate it, and errors will occur. The firmware on the integrated chips in the DAC will not be able to correctly transcribe the signal; 1's become 0's and 0's become 1's. This leads to a noisy analog signal.

This illustrates how data subtraction works in the DAC logic circuit to ideally cancel out noise induced on the data lines.
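The idea can be modelled in a few lines of Python. The sketch below assumes, as in the diagram, that the same interfering voltage couples equally onto both data lines (common-mode noise); subtracting one line from the other recovers the data while that shared noise cancels. The signal swing and noise amplitude are illustrative values only.

# A toy model of differential signalling, assuming the same interfering
# voltage couples equally onto D+ and D-. Subtracting the two lines
# recovers the data while the shared noise cancels.

import random

bits = [1, 0, 1, 1, 0, 1, 0, 0]
swing = 0.4                                   # illustrative signal amplitude

d_plus  = [ swing if b else -swing for b in bits]
d_minus = [-swing if b else  swing for b in bits]   # mirrored complement

# Common-mode noise: the same random offset lands on both wires.
noise = [random.uniform(-0.2, 0.2) for _ in bits]
d_plus_noisy  = [v + n for v, n in zip(d_plus, noise)]
d_minus_noisy = [v + n for v, n in zip(d_minus, noise)]

# The receiver looks only at the difference between the two lines.
received = [p - m for p, m in zip(d_plus_noisy, d_minus_noisy)]
decoded = [1 if r > 0 else 0 for r in received]

print("sent   :", bits)
print("decoded:", decoded)      # identical, because the shared noise cancels

If the noise were large enough, or coupled unequally onto the two lines, the difference would no longer fall clearly on one side of the decision threshold, which is the error case described above.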

The two data lines can radiate their voltage pulses and influence the adjacent line, which is called crosstalk. If the data lines are shielded, as per the USB specification recommendations, that radiated electromagnetic field is confined around the wire leads and can be carried by the shield itself, causing a further buildup of digital noise. If there is a 5 volt power lead in the cable, it radiates an electromagnetic field of its own and, if not shielded, can add to the noise on the data lines. Still other sources of noise that can be picked up by the all-important data lines include nearby unshielded power cables (low voltage and high voltage), nearby transformers, and switch-mode power supplies that radiate high frequencies in the megahertz range. Of course, we also have to consider radio frequency interference (RFI), which may come from any source such as cell phone transmissions or a nearby radio station.

Causes of Digital Jitter:

Let's look at what can cause jitter, which is essentially inaccuracy in the timing of the data signal. When the USB host controller establishes a communication pipe with a DAC at its endpoint, a transmission rate is established and agreed upon. Any deviation in the expected arrival of the binary pulses constitutes jitter. Besides the digital noise discussed previously, digital jitter is a major concern. It's the USB cable's job to transfer both the positive and the negative data pulses to the DAC at precisely the same time. If the signal arrives shifted in time, even when plus and minus arrive together but too early or too late, the DAC firmware has a difficult job transcribing it. This creates audible differences in the music on high end equipment.

The two data lines must operate at the same frequency. If they don't, we get phase shift in the time domain.

The presence of jitter in digital music degrades dynamic range, reduces instrument sustain, affects sound staging and imaging, and can alter frequency balance. In addition, if the positive and negative signals travelling down the twisted pair of data wires arrive at the DAC endpoint at different times, we get an effect called phase shift. This too is audible in the music, and presents as a smearing of sounds.

Understanding bits and sample rates:

Taking a step backwards, let's look at how analog music is sampled and encoded into a digital format. The traditional and venerable Audio CD has digital music sampled at a bit depth of 16 bits and a sample rate of 44,100 samples per second (44.1 kHz). There are 8 binary bits in every byte, and it takes at least 1 byte to represent a single letter or number, like a, b, c or 1, 2, 3. Remember that binary bits are either ones or zeros, and nothing more, strung together in sequences.

Let's consider one channel of a two-channel stereo analog source. What we have is essentially a single waveform containing every one of the instruments, voices and sounds in that recording. It is a very complex wave, but if you magnify it on an oscilloscope it will look like a continuous, undulating line. To encode that wave into a digital representation, we take a large number of samples along it. Each sample is stored as a number representing the combined amplitude of all those instruments at one instant of the original recording. The amplitude of each sample point is encoded with enough resolution that even faint, low-amplitude signals keep a good signal-to-noise ratio. This process of encoding the analog signal into binary codes is called Pulse-Code Modulation (PCM). In other words, each sampled quantity of the analog signal is represented by a binary code and transmitted as binary pulses. The timing is determined by how many samples we agree to take within the span of one second: over 44 thousand (44.1 kHz) sample points in every second for CD quality, or 192 thousand (192 kHz) for hi-res sampling.
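Here is a minimal Python sketch of that process: it samples a simple 1 kHz test tone at 44.1 kHz and quantizes each sample to a signed 16-bit integer, as Red Book CD audio does. The pure sine tone and the 0.8 amplitude are illustrative stand-ins for the far more complex waveform of real music.

# A minimal sketch of PCM: sample a 1 kHz test tone at 44.1 kHz and quantize
# each sample to a signed 16-bit integer.

import math

SAMPLE_RATE = 44_100                     # samples per second
BIT_DEPTH = 16
FULL_SCALE = 2 ** (BIT_DEPTH - 1) - 1    # 32767 for 16-bit signed PCM

def sample_tone(freq_hz, n_samples, amplitude=0.8):
    """Return the first n_samples of a sine tone as 16-bit PCM integers."""
    pcm = []
    for n in range(n_samples):
        value = amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        pcm.append(round(value * FULL_SCALE))   # quantize to an integer code
    return pcm

print(sample_tone(1000, 8))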

So if the above describes sample rate, what is bit depth? Clearly, the more samples of the analog music we take, the better the digital representation. Similarly, the greater the bit depth, the finer the amplitude resolution of each sample. Digital audio is stored as binary numbers, which are just strings of 1's and 0's in base 2 (as opposed to the familiar base 10). Take a look at the following chart.

Bit depth    Largest binary value            Discrete steps
8            11111111                        2^8  = 256
16           1111111111111111                2^16 = 65,536
24           111111111111111111111111        2^24 = 16,777,216

So as you can see, if we encoded our music with 8 bit numbers, we would only be able to represent 256 discrete steps. With that limitation, digital music sounds pretty bad indeed. The common Red Book CD encodes each sample as a 16 bit number, providing over 65 thousand steps. While CDs are darn good, we can do much better than that. If the analog source is sampled into 24 bit numbers, we obtain over 16 million discrete step values.
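A few lines of Python reproduce the chart and show why extra bits matter: the number of discrete steps doubles with every bit added, so each step becomes a smaller fraction of full scale.

# Reproducing the chart above: discrete amplitude steps per bit depth.

for bits in (8, 16, 24):
    steps = 2 ** bits
    print(f"{bits:2d}-bit: {steps:>10,} steps, "
          f"smallest step = 1/{steps:,} of full scale")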

This is how pulse codes (PCM) are used to encode modulated analog signals.

How Noise and Jitter Degrade Music:

As mentioned previously, noise and jitter degrade our music signal, which degrades what we actually hear. Admittedly, it usually takes a rather good audio system and an attentive listener to hear a distinct difference. It's also important to address multiple causes of noise and jitter in our music data. Concentrating solely on USB cable design, though, there are a few key culprits for noise and jitter. A three-foot cable acts as an antenna. It will pick up radio frequency interference, of course, but more importantly it will pick up electromagnetic interference through inductive and capacitive coupling. In addition, the cable can generate its own noise internally if it is poorly designed. There is voltage on the Vbus, and there are also voltage pulses on the positive and negative data wires. These emit their own electric and magnetic fields, and that noise will be induced into the leads running in proximity to them.

Let's take a deep dive into a sixteen-bit binary encoded value such as 1100010000011000, which represents a recorded amplitude value of 50200. This numeric value might represent, for example, the combined sounds in a sampled instant of analog music (stringed instruments, bass and percussive tones). Since noise and jitter cause errors randomly, any one or more of these binary bits may be switched if the firmware can't read them properly. The DAC might therefore decode 1100011000011000, which is 50712, or 1000011000011000, which is 34328. As you can see, a flipped bit can make a minor or a major difference in the decoded sound depending on where it occurs in the sample code.
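The example can be verified with a short Python snippet that flips the same bit positions (bit 9, and then bits 9 and 14, counting from the least significant bit) in the value 50200.

# Reproducing the bit-flip example above: flipping bits of the 16-bit sample
# value 50200 changes the decoded amplitude by very different amounts
# depending on which bit position is hit.

original = 0b1100010000011000          # 50200

def flip_bit(value, position):
    """Return value with the bit at 'position' (0 = least significant) inverted."""
    return value ^ (1 << position)

print(f"original          : {original:016b} = {original}")
print(f"bit 9 flipped     : {flip_bit(original, 9):016b} = {flip_bit(original, 9)}")
print(f"bits 9+14 flipped : {flip_bit(flip_bit(original, 9), 14):016b} = "
      f"{flip_bit(flip_bit(original, 9), 14)}")
# -> 50712 and 34328, matching the values in the paragraph above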

Illustration of noisy and phase-shifted USB data; the results when noisy music servers and surroundings are at play.

The above illustration shows a hypothetical noisy and phase-shifted PCM signal carried by a USB audio stream. The noise is characterized by ringing, some overshoot and undershoot, degraded rise and fall times, and a fair amount of phase shift. Depending on the algorithms implemented in the DAC conversion firmware, this data signal may cause numerous random read errors.

Jitter is due either to drifting clock frequencies or to phase shifting between the two data lines. These effects can be introduced by the cable if the wire twisting is inconsistent, or if there is no twisted pair at all. To a lesser extent, metallurgy and wire gauge also play a role in jitter. Following is a chart illustrating the various ways a square wave can be malformed, both by how the pulse is generated at the source and by conditions on the cable itself.

The nomenclature used to describe square wave inconsistencies.

Now that we have seen the many ways that digital data in an audio USB stream can be degraded by noise and jitter, it's easy to understand why digital music can sound strident, edgy, confused, thin, smeared or generally noisy.

Good Audio USB Cable Design:

We have seen that twisting the pair of data wires reduces internal noise and crosstalk. We understand that sufficient wire gauge, silver-coated copper conductors and proper twisting all improve the cable's performance. The goal is to do no harm to the original signal coming from the computer or music server. It is reasonable that the shorter the USB cable, the better it will be, because there is less potential for picking up noise voltages and adding them to the signal. We can easily understand that eliminating the 5 volt Vbus from the cable, if at all possible, is beneficial in reducing electric and magnetic interference on the data lines. The following illustration demonstrates how a spurious noise voltage can impart itself onto a pulse code traversing a well-designed cable, and how the DAC's USB converter uses simple subtraction to effectively remove that noise.

The ideal scenario of how noise is canceled out in the USB protocol.

If the 5 volt lead is required for a given server and DAC combination, then it's important to shield the wire carrying this voltage. The shield should be grounded at only one end, preferably the computer chassis ground rather than the DAC ground plane. This shield helps protect the timed pulses of the pulse-code modulation carrying the music data. A ground wire is useful to complete the circuit, but ideally it will have galvanic isolation from the DAC ground. We have seen that noise enters the system through power and ground, and that noise also reaches the cable from exterior interference. But it is arguably better not to shield the data leads, because the shield can carry capacitively coupled noise from the electric field emanating from the data voltages. It's best to let the electric and magnetic flux escape the data wires. The cable will broadcast a small high-frequency signal in the near field (megahertz range), but this is not an issue for our audio signal (hertz and kilohertz ranges).

Summary:

It can't be overstated that the USB cable can have a profound impact on both noise and jitter. A well designed cable can communicate, without any additional degradation, the output of a good music server, and make the DAC's job that much easier. Similarly, a poorly designed USB cable can contribute its own noise and jitter to the signal, and make the DAC's job that much harder. A superior cable will maintain the lush mid-tones, solid controlled bass, and the pleasing highs of the original source material. It's noticeable when a USB cable faithfully presents the music with a well-balanced tone overall.

Ken Matesich, 2016