Illegitimi non carborundum


Sound on Linux is Confusing: Defuzzing Part 2: PulseAudio

In an earlier article, I described how the low level ALSA configuration allows us to route all applications using the ALSA API via PulseAudio. In this article we'll take a look at the various configuration files and variables that control this side of the audio path. Let's walk through what happens when an application tries to play sound.

ALSA Application

So first off, an application using the ALSA API tries to open the "default" device. Assuming we've configured this default device to be the PulseAudio plugin for ALSA, it will basically act like every PulseAudio client application. We're now into the land of the PulseAudio configuration.

PulseAudio Client

PulseAudio adopts a client/server model that is very similar in principle to that of the X11 system. It is the server that actually outputs the audio and the client app that tells the server what to play. While this approach can be inefficient, resulting in the copying of audio data around, PulseAudio goes to great lengths to ensure that data copying and other latency-prone operations are kept to a minimum. In the common use case of both client and server running on the same machine, PulseAudio uses SHM (Shared Memory) to ensure that data sent from the client to server is not copied across the wire. The core of the PulseAudio server itself is "zero copy" meaning that references to the data are passed around without actually copying the data itself.

The first thing a PulseAudio client has to do is connect to a server. In order to do this, it checks various variables and configuration files to determine precisely to which server it will connect!

Initially, the PulseAudio client library looks for a PULSE_SERVER environment variable. If found, this variable can define a list of servers to which the client should connect. These servers can be specified as local UNIX sockets or DNS names/IP addresses for a TCP connection.
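To make the "list of servers" idea concrete, here is a minimal sketch of how such a value could be split into individual entries. It is a simplification and not the client library's actual parser: it assumes entries are separated by whitespace and carry an optional unix: or tcp: prefix, and it ignores the other forms the real library accepts. The function name and the example addresses are hypothetical.

```python
def parse_server_list(value):
    """Split a PULSE_SERVER-style string into (kind, address) pairs.

    Simplified illustration only: handles whitespace-separated entries
    with an optional "unix:" or "tcp:" prefix.
    """
    servers = []
    for entry in value.split():
        if entry.startswith("unix:"):
            servers.append(("unix", entry[len("unix:"):]))
        elif entry.startswith("/"):
            # A bare absolute path is treated as a local UNIX socket.
            servers.append(("unix", entry))
        elif entry.startswith("tcp:"):
            servers.append(("tcp", entry[len("tcp:"):]))
        else:
            # Anything else is assumed to be a DNS name or IP address.
            servers.append(("tcp", entry))
    return servers


if __name__ == "__main__":
    # Hypothetical value: try a local socket first, then a machine on the LAN.
    print(parse_server_list("unix:/tmp/pulse/native tcp:192.168.0.10:4713"))
```

The entries are returned in the order given, which mirrors the idea that the client works through the list until one of the servers accepts the connection.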

If this variable does not exist or is empty, PulseAudio will then check for X11 properties on the root window. These properties are much like environment variables, but will be available remotely if you SSH to another machine with X11 forwarding. I'll speak about this more later. You can see a list of PulseAudio related properties by doing:

xprop -root | grep PULSE

The variable names used are the same as those used in the environment, so PulseAudio will look for a property called PULSE_SERVER.
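If you want to read these properties from a script rather than by eye, the output of the xprop command above can be picked apart with a few lines of Python. This is only a sketch: it assumes the properties are printed in xprop's usual NAME(STRING) = "value" form, and the function name and sample values are made up for illustration.

```python
import re

# Matches lines such as: PULSE_SERVER(STRING) = "tcp:desktop:4713"
_PROP = re.compile(r'^(PULSE_\w+)\(STRING\) = "(.*)"$')


def pulse_properties(xprop_output):
    """Return the PULSE_* string properties found in `xprop -root` output."""
    props = {}
    for line in xprop_output.splitlines():
        match = _PROP.match(line.strip())
        if match:
            props[match.group(1)] = match.group(2)
    return props


if __name__ == "__main__":
    # Hypothetical sample; on a real session the text could come from
    # subprocess.run(["xprop", "-root"], capture_output=True, text=True).stdout
    sample = (
        'PULSE_SERVER(STRING) = "tcp:desktop:4713"\n'
        "_NET_NUMBER_OF_DESKTOPS(CARDINAL) = 4\n"
    )
    print(pulse_properties(sample))
```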

Assuming it's still not got a server to connect to yet, PulseAudio will check for a default-server configuration in its client.conf file. This file is located in either /etc/pulse/ or ~/.pulse/. Only one of the client.conf files is parsed, so if the user has their own, the system one will not be parsed at all (this is a point I tried to make in vain on PulseAudio bug #606 - it took quite a while for this to sink in, as you can see!).
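The "only one file is parsed" rule is easy to get wrong, so here is a small sketch of it. It assumes the two locations named above and a plain "key = value" file layout with # or ; starting a comment line. The helper names are hypothetical and this is not how the client library itself is written.

```python
import os

SYSTEM_CONF = "/etc/pulse/client.conf"


def pick_client_conf(home, exists=os.path.exists):
    """Return the single client.conf that would be read.

    The user's file wins outright; the system file is NOT merged in.
    """
    user_conf = os.path.join(home, ".pulse", "client.conf")
    return user_conf if exists(user_conf) else SYSTEM_CONF


def parse_conf(text):
    """Parse simple 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    path = pick_client_conf(os.path.expanduser("~"))
    print("would read:", path)
    # Hypothetical file contents pointing clients at a machine called mediabox.
    print(parse_conf("; default-server =\ndefault-server = mediabox\n"))
```

The practical consequence is that a user who creates their own client.conf just to change one setting silently loses every setting the administrator put in the system file.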

So we've tried three ways to find a server. If we've still not found one, we just resort to defaults - i.e. connecting to a local, personal daemon and a local system-wide daemon (system-wide use is generally not recommended, but is supported for certain circumstances - typically embedded systems). If we still cannot connect, the client.conf file can specify whether or not we will try to automatically start a personal daemon. Since PulseAudio 0.9.11, this is the default behaviour and allows console applications to work out of the box without starting a PulseAudio daemon beforehand.

So, in the unlikely event that all that fails, we will ultimately not be able to play sound, but we've done pretty much everything we can to make it work! In order to better visualise this, let's look at a couple of typical scenarios and run through the above process.
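The search order described above can be summarised in a few lines. This is a sketch of the precedence only, with each source modelled as a plain dictionary; the function name is hypothetical and the real client library obviously does a good deal more (including the autospawn step).

```python
def find_server(env, x11_props, client_conf):
    """Return (source, value) for the first place a server is configured.

    Order: PULSE_SERVER environment variable, PULSE_SERVER X11 property,
    default-server in client.conf, then the built-in defaults.
    """
    for source, table, key in (
        ("environment", env, "PULSE_SERVER"),
        ("x11", x11_props, "PULSE_SERVER"),
        ("client.conf", client_conf, "default-server"),
    ):
        value = table.get(key, "").strip()
        if value:  # a missing or empty value means "keep looking"
            return source, value
    # Nothing configured: fall back to the local personal/system-wide daemons.
    return "defaults", None


if __name__ == "__main__":
    # Hypothetical console login: nothing is set anywhere.
    print(find_server({}, {}, {}))
    # Hypothetical X11 session where the root window properties are set.
    print(find_server({}, {"PULSE_SERVER": "tcp:desktop:4713"}, {}))
```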

Console Application

So, under a default install, we've booted to runlevel 3 and logged into the terminal. We don't set any special variables and start an application that plays sound via ALSA. Here is what happens.

  1. App opens default device
  2. ALSA PulseAudio plugin (like any PulseAudio client) checks its config for a server and finds none.
  3. It tries to connect to a local server but fails as it is not running.
  4. It then starts a PulseAudio server automatically and connects to it.
  5. The application then plays audio via the ALSA API functions and this is ultimately played by the PulseAudio daemon.
  6. The client application finishes doing its thing, and exits.
  7. The PulseAudio daemon stays around for a while just in case another app wants to play sound in the near future.
  8. After a while, the PulseAudio daemon will go in the huff because no one loves it and kill itself :p

So that's it. It's quite simple. Let's have another example.

X11 Application

Under X11 things are a little bit different, but the same basic principles are followed.

  1. During X11 initialisation, modern desktops that support XDG Autostart ultimately run the script start-pulseaudio-x11. This script ensures the PulseAudio daemon is started and some extra X11 related modules are loaded into it. These modules ensure that, unlike in the console case, the PulseAudio daemon will not exit after an idle timeout - it will instead stick around for as long as the X11 session exists. They also ensure that the X11 properties mentioned earlier are set - the reasons for which will become apparent in the next example.
  2. When any PulseAudio client (be it an ALSA app via the ALSA PulseAudio plugin, or a native PulseAudio client) is started, it goes through its "find a server" routine. It will now stop this process when it reaches the X11 properties, use the info therein and connect to the server.
  3. The client application then plays sound as before and ultimately finishes and exits.
  4. The PulseAudio daemon doesn't exit/kill itself as the X11 session is still going.

So again, things are actually quite simple.

Remote X11 Application

One of the handiest things with X11 is the ability to connect to another machine on your network, run GUI applications there and have them display on your local display. With PulseAudio, the sound is also heard on the local machine. Out of the box (for security reasons), remote connections are not enabled. To enable them, run paprefs and enable the option "Enable network access to local sound devices". This is the only option needed for this example. It loads an additional module into the server that listens on TCP port 4713 for incoming connections. It goes without saying that any firewall on the machine must allow connections to this port!
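If you are not sure whether the firewall is letting connections through, a quick reachability check from the remote machine can save some head scratching. The sketch below only tests whether something accepts a TCP connection on the port; it does not speak the PulseAudio protocol, and the function name and the host name in the example are hypothetical.

```python
import socket


def port_is_open(host, port=4713, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    # "desktop" is a placeholder for the machine running the PulseAudio server.
    print(port_is_open("desktop"))
```

A False result here points at the network module not being loaded or at a firewall in the way, rather than at the client configuration.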

  1. User starts a normal X11 session and the start-pulseaudio-x11 script ensures the X11 properties are set as in the previous example.
  2. User then connects to another machine on his network via SSH with X11 forwarding and starts an audio player (e.g. Rhythmbox, Amarok, etc.)
  3. Even though the app is running remotely, the visual display will be local.
  4. When the app starts playing audio, the PulseAudio client portion will find the X11 properties that have been forwarded through the SSH connection.
  5. The PulseAudio client will then connect over TCP to the PulseAudio server running on the user's local machine.
  6. The User revels in the local display and audio the application provides.

So as you can see, the use of the X11 properties has allowed us to piggy back on top of the X11 forwarding. It's not a totally clean connection as under SSH the X11 data will actually be tunnelled over a secure link handled by SSH itself, whereas all we are doing is telling the PulseAudio client where to connect directly, outside of any SSH tunnels. This means that while the display can work over a NATed system, the sound will not. This is fairly easily addressed, but we'd have to teach SSH about PulseAudio for this to work. The reason it works for X11 is because SSH is aware of, and has specific support for, X11. We're just piggy backing on this. That said, the current arrangement is "good enough" for most use cases.

So I hope this article has demystified how the PulseAudio client and server interact and the various configuration files/variables that come into play. If you have any questions, please ask in the comments and I'll endeavour to update the article.

  • Gary Copcutt

    Sorry Mandriva but you have lost me. I am going back to DreamLinux. Why? After 5 years and previous versions of Mandriva working, I now try Power 2009.1 and I can’t record or use Skype. This is not good enough. Linux on the desktop has to handle sound with Skype on all laptops without any problem. (That’s why I am not looking to try and find out why it does not work.) At least DreamLinux will do it from cold on my HP Pavilion and my Aspire 5315 without any problems. Come on guys, you can’t be serious!

    • Colin

      Well this is exactly the kind of problem you get when you deal with Closed Source applications in a Free Software system. I’ll be writing another article shortly about this, but put simply, the ALSA userspace API is pretty complex and some applications have used it in a pretty, ahem, “creative” way. Skype is one of these applications. Normally we would simply write a patch for the application to use the audio in a more standard way, thus making it work quite happily when the components underneath move forward and are modernised. We simply can’t do that with Closed Source applications.
      So what should we do? Hold back progress and not change anything because one or two apps are not written well? Or should we push on, accepting these problems, working around them as best we can but ultimately driving their developers to update their product? I very strongly say we push on. Yes, we need to push on; yes, we need to show these application developers that they need to make changes. Without this push the whole ecosystem breaks down and stalls, and innovation suffers.
      This is how things work with Free software. If you don’t like it you should be paying some company for your system who will use that money to give you the system you want. For those of us who want to do things the right way, we’ll crack on and improve things.
      And while I don’t want to be rude, using a Closed Source example of a badly written implementation and suggesting that we must pay lip service to this product is completely misunderstanding what Free Software is all about.

    • Sergio

      Your complaints are valid, but dude, why do you blame Mandriva for this issue?

  • Here we have the typical clash of the good and very well explained theory with the facts of reality. Yes, Skype is closed source and it may be badly written. Yes, the sound system of the open source ALSA and Pulse are going the modern way into the future. But reality is that Skype is not just a piece of software like a mail client or browser which I can change easily from Thunderbird to kmail or Firefox to Chrome or whatever. It is a worldwide communication system where the user either uses this Skype software and is connected – or he isn’t.

    This is one of the very few things in Linux where you do not have a choice. So, ignoring Skype is not something recommendable for a Linux distribution to do. The users can easily change distributions but it’s hard for them to abandon Skype.

    BTW: I’m using Skype on a Lenovo S10e netbook with Mandriva Linux 2009.1 – no problem at all. I have to use pasuspender to start Skype, though.

  • I don’t care about Skype, however my issue concerns pulse audio server startup. It relies on HAL+DBUS+PolicyKit. If one of those components is not working/configured correctly, then pulse audio will not start and IMHO this is not good … Especially concerning HAL ( now udev I guess which is safer ).

    • Colin

      Pulse no longer relies on HAL or PolicyKit. PolicyKit (indirectly via RealtimeKit) will allow pulse to obtain realtime privileges, but it’s not a required component. As you said, HAL has been supplanted by udev now that it’s got its act in gear and is deprecating HAL more generally. DBUS, well yes, there is a requirement on that and there will be even more so in the future when pulse will use DBUS as its primary IPC mechanism. But then again, what’s the point of having frameworks and structures in a modern desktop if we don’t use them and always implement alternatives? I say we sometimes just have to embrace these components. DBUS is just about everywhere – GNOME/KDE both make extensive use of it… I doubt it’s really that big a deal that pulse uses it too.

  • hairy t

    Thanks for the explanation. However, I have to say that these two articles still clarify nothing for me. I have full confidence that it is quite correct and indepth — I am not questioning your knowledge at all. But I feel that it’s still too esoteric for anyone but those extremely familiar with kernel or near kernel development. I am a fairly experienced linux user (more than 10 years, not afraid of modifying rc files and even have recompiled the kernel once or twice)

    I’ll tell you what I was hoping to find: an explanation of why my Fujitsu laptop sound system seems to be hosed. The sound works fine, but the drivers can’t seem to sense that a headphone jack has been plugged in, so the sound never goes to the headset, always to the speakers. Now where on a drawing with ALSA and PulseAudio and OSS does that issue lie? Where does one even start to look? (btw this is a pretty common issue that has been reported by many, with what seems to be no satisfactory answer.)

    Perhaps a less technical explanation with more pictures of boxes and lines would be clearer?

    Thanks for what you do.

    • Colin

      For a while now I’ve been meaning to work on a “Troubleshooting” guide that would help to identify where in the process such a problem lies. In your case I’d say that this is a problem in the ALSA driver and you should probably seek advice from the alsa-dev mailing list.

  • It’s not necessarily even a ‘problem’ as such, more a not-yet-implemented feature. Jack sensing is a fairly new thing for ALSA and hasn’t been implemented for anything like all hardware that supports it, yet. Still, Colin gave the best practical advice: ask on the alsa-devel mailing list. Include the output of the script (I don’t think Mandriva includes it – Colin, you should really package it, believe me it’ll make your life easier…in the meantime, you can find copies lying around the intarwebs).

    • Colin

      Fair point re… I’ll put it into a package somewhere.

  • Andrew Kane

    So, it does not work for me.
    I ssh into my desktop machine from my laptop. pabrowse on the laptop shows the desktop as a “new server” and a “new sink” (surely it should be a source?), but does not terminate. The options checked in paprefs are:
    – Enable network access to local sound devices
    – Allow other machines on the LAN to discover local sound devices
    – Don’t require authentication
    – Enable multicast sender
    – Create separate audio device for multicast

    and nothing else.

    paman shows only the local server, not the remote.

    Evidently it is not “discovering” the remote pulse server, I assume because pabrowser does not terminate- though I don’t know, since I can’t find anything in the sparse manpages for these various programs nor on that tells me how to troubleshoot this.

    I have configured both machines to allow incoming tcp on 4713, but netstat shows no connection on that port when I attempt to run totem under ssh -XC.
    I don’t understand why the various distributions that have now made Pulseaudio the default sound server have done so, since it immediately broke compatibility with so many popular applications and is in general so difficult to use.
    Like many folks, I want to be able to forward sound from one machine to another without becoming an expert in a particular application. I’ve already spent years of my life learning unix skills and tcp/ip networking, and if I have to study something I’d like it to be of more general application. This is something that ought to work with a few simple mouse clicks, and it does not; the server runs Ubuntu, but I have tried this under four different distros on the client (laptop).
    I’m not sure whether to blame Pulseaudio for this. Certainly Ubuntu’s implementation is incomplete by default, and Ubuntu’s documentation is lacking in this area. However, the docs at are not much better, and require more up-front knowledge than most non-audio-experts will have.

    This is a problem for a lot of people, as a search of forum posts will show.
    Apologies for the rant, but it is just not as easy as the article seems to indicate. I wish it were. Maybe someday it will be. But not today.

    • Colin

      Well there are several points to answer here:
      1. The term “sink” means somewhere audio data can be pushed to. A “source” would be somewhere audio data could be pulled from. It’s the same term as used in electrical engineering and various other disciplines too. When your tool finds a “new sink” it is basically reporting that it’s found a new output device. I think the terminology is pretty accurate.
      2. I would avoid the multicast stuff just now. If you don’t specifically need multicasting, it is much more efficient (and also less error prone) to leave it disabled.
      3. pabrowse is just a cut down version of avahi-browse. The problem you’re basically seeing here is something that seems to be an Ubuntu bug as no other distro has reported problems and I personally use this setup on other distros and have no problems. It seems that avahi is essentially broken on Ubuntu until the network is reset or something like that. I’m not an Ubuntu user so I’m not following this, but you can find more info at the upstream bug:

      So ultimately I agree with you, this is something that should work with a few mouse clicks. And for most people with working distributions it does. For folk who drink the Ubuntu Kool Aid, you’re out of luck until they deign to fix their problem.
      So while I appreciate that everyone needs to rant sometimes, please save the vitriol for those that actually deserve it rather than going off at the first person you see.