Illegitimi non carborundum


Sound on Linux Anti-FUD: Calm, Certainty and Confidence

Over the years I've listened to several opinions expressing doubt about the Linux sound stack. There are lots of ill-informed comments out there concerning various things sound related, both positive and negative, but more often than not commentators miss very important aspects of a modern, multi-user desktop sound stack. So in this article I'll attempt to discuss some of the misconceptions out there, provide a balanced view of the current state of affairs, discuss some of the perceived mistakes in the rollout of new sound stacks, and look at where things are going in the future.

There have been a few articles, some picking up mainstream coverage, talking about the Linux sound stack. Some comments suggest that it's not that bad, but totally miss the point regarding what a desktop audio stack is all about; most people are talking about how it's in a bad way and overly complicated. While such comments do have some merit, things really are not that bad, and I believe there is a really bright future.


A lot of the comments of late have been discussing how amazingly brilliant OSS is. Personally I don't buy it. I've never really played much with OSS and as such this is probably a slightly ill-informed view - although that's not to say it's not accurate of course :D. Most of these kinds of comments are made by people who don't really understand ALSA and are won over by the "ALSA API is overly complex" type comments. Yes, the ALSA client library is rather complex and has numerous pitfalls - so much so that there now exists an unofficial "safe" ALSA subset API. But what people invariably fail to comment on (and thus fully understand) is that ALSA comes in two parts: the kernel driver and the userspace library. ALSA differs from OSS in that all access to the kernel layer is performed via a userspace library; I don't know of any ALSA clients that communicate directly with the kernel layer without going through libasound. What this means is that the kernel interface has the freedom to be re-factored and improved at any time, provided the userspace library is developed in parallel. For this reason, the kernel layer is actually quite clean and well defined. The rather rigorous quality control that goes on in the kernel is testament to the fact that on the kernel side of things, ALSA is doing pretty well. Of course there can (and will) be improvements in this area in the future, but this side of things is certainly not in the poor state people seem to assume.

The "too complex" argument relates to the ALSA userspace API. In order to remain backwards compatible, the userspace API has undergone several refinements. As with anything not designed from the top down, some parts of it are rather confusing and have sometimes been misinterpreted. The classic example here is the confusion over snd_pcm_delay(): its documentation hinted at a hardware-based implementation, which subsequently led some projects (e.g. WINE) to assume that this function would eventually return 0, which is not true. Fortunately this problem is behind us now, with a new API call added that does return the info the WINE guys (and others) needed.

So yes, the ALSA userspace API could use a complete top-down redesign, but in order to do that we would immediately break compatibility with 90% of the apps out there - not a great idea, all in all. Retaining backwards compatibility is a pain, but it's also quite important!

But Sound Servers Suck!

What's in a name? That which we call a rose by any other name would smell as sweet. Some people seem to have some sort of built-in hatred of "sound servers" as a concept without really thinking through what this means. Yes, there have been some pretty awful experiences with some sound servers in the past (EsounD and aRTs being the immediate examples that spring to mind), but that doesn't mean the concept itself is flawed. You may drive a couple of shit cars, but that doesn't mean we should all abandon the roads. In addition, the sound servers of old were really just mixers. In the old days most hardware was not capable of doing hardware mixing and thus couldn't produce sound from multiple apps at the same time, so a mixer was an essential component. Nowadays, software mixing is the norm rather than the exception, even on high-end hardware, and ALSA itself has pretty solid software mixing in the form of dmix, thus obsoleting large parts of the previous sound server functionality - certainly making the additional features they did offer seem disproportionate to the hassle they introduced. In the early days dmix was just another sound server; apparently this has changed these days, no longer needing an additional process. While it achieves the job of software mixing very well, it's not as fast or as flexible as other solutions can be.
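To make this concrete, software mixing via dmix can be requested explicitly in ALSA's configuration. A minimal sketch, assuming a typical single-card setup (the device name and ipc_key are illustrative, and on recent ALSA versions dmix is already the default for consumer analog output, so you rarely need to write this yourself):

```
# ~/.asoundrc (illustrative sketch - device names may differ on your system)
pcm.!default {
    type plug
    slave.pcm "dmixed"
}

pcm.dmixed {
    type dmix
    ipc_key 1024              # any unique integer for the shared mixing buffer
    slave {
        pcm "hw:0,0"          # first card, first device
        rate 48000
    }
}
```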

Modern Multi-user Desktop

So, these days a modern, multi-user desktop is quite a different beast to what it once was. Components such as ConsoleKit track which users are currently active (e.g. when more than one user is logged in simultaneously) and tell udev to write appropriate ACLs to enforce this policy. Users also want to use network-attached sound systems, such as Apple Airtunes (RAOP) devices and UPnP media renderers, not to mention Bluetooth devices. All of this is much further up the sound stack than the low-level driver layer and has to deal with various permission and authentication schemes. This obviously needs a userspace component to govern the interaction; something has to be responsible for this, and a "sound server" of some sort fits the bill perfectly.


So enter PulseAudio. It's had its fair share of bad publicity, but ultimately this important part of the Linux sound stack is taking on several roles that are important on a modern desktop. It's dealing with several different things:

  • Software mixing
  • Independent (per-application) volume control
  • Dealing with permissions (is the user allowed to access the sound device?)
  • Dealing with Bluetooth devices
  • Dealing with network-based devices (UPnP, Apple Airtunes, native PulseAudio etc.)
  • Handling the moving of streams between outputs
  • Handling sound from remote applications run via X11 over a network
  • Dealing with routing policy (music goes to USB speakers, desktop sound events to built-in speakers, VoIP to a Bluetooth headset)
  • Effects to promote HCI (e.g. positional event sounds - button clicks coming out louder on the left-hand speaker when triggered from the left-hand side of the desktop)
  • Power consumption and efficiency savings
  • Reducing the risk of buffer under-runs

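Several of the roles above map onto individual PulseAudio modules. As a hedged sketch, a default.pa might contain lines like these (the module names are real, but which ones your distribution enables, and whether the optional ones were built at all, varies):

```
# Excerpt from a hypothetical /etc/pulse/default.pa
load-module module-udev-detect            # discover local ALSA sound cards
load-module module-bluetooth-discover     # Bluetooth headsets and speakers
load-module module-raop-discover          # Apple Airtunes (RAOP) devices
load-module module-native-protocol-tcp    # accept clients over the network
load-module module-stream-restore         # remember per-application volumes
load-module module-device-restore         # remember per-device volumes
```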
So the people who talk about OSSv4 and how it can do mixing and per-app volume control and how this means that ALSA and PulseAudio are not needed are totally underestimating what's needed in a modern audio stack. There still needs to be some kind of userspace daemon to govern these other sound systems and deal with multiple users. This is a non-trivial job and no other system out there is currently aiming to implement these capabilities.

One of the often overlooked advantages of PulseAudio is the "glitch free" system. This is an approach that ultimately disables interrupt-driven audio and instead relies on system timers. Modern kernels can provide these timers easily, and reducing the number of interrupts while using larger buffers allows you to greatly reduce the number of CPU wake-ups, thus saving power. This is a very important technique when dealing with modern mobile platforms.
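As a sketch of how this is controlled: the timer-based scheduler can be toggled via the tsched module argument (the argument is real; tsched=1 is the default in glitch-free releases, while tsched=0 falls back to classic interrupt-driven scheduling for drivers with unreliable timing information):

```
# In /etc/pulse/default.pa (illustrative)
load-module module-udev-detect tsched=1    # timer-based "glitch free" scheduling
# For drivers with broken timing info, revert to interrupt-driven mode:
#load-module module-udev-detect tsched=0
```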


It's obviously important to ensure efficient code reuse. It doesn't make sense for every sound-producing application to implement direct support for "exotic" sound systems such as Bluetooth, UPnP and Apple Airtunes; to do so is very inefficient (there are some exceptions - e.g. a media player that targets Win/Lin/Mac may need to implement direct support if it is to be available across the board). Keeping the implementation centralised, with a single app-to-sound-system API, is essential here.

Consistency of UI

One of my big problems with many applications is inconsistent UI. This is a problem on Windows as much as on Linux, but it's something OS X has got mostly right: users go to a central GUI to configure their sound and which device is currently active/in use. In Linux land, all sound-producing apps have their own config GUI for selecting sound devices. This is insane. Non-technical users don't know that you have to go to Tools->Preferences->Advanced->Sound in App A and Edit->Settings->Audio in App B. Sure, those of us who are reasonably technical will generally find the options (that's how we use applications - we click and look at all the settings pretty early on!), but it's going to be less than obvious for a massive number of users. Keeping the preferences centralised so the user always knows where to look is important, and for a general-purpose application that outputs sound there should be no reason to expose any config option relating to this to the user - it should "just work"(tm).


Some users have complained that some proprietary applications have stopped working with PulseAudio, Skype being an oft-mentioned example. Well, I'm sorry, but that's just tough. If a closed-source application does not implement an API cleanly and does bizarre things, there is nothing we can do to fix it. The problems Skype has experienced with PulseAudio would also be experienced by any other plugin to ALSA. I'm sorry to say it, but in order to move forward some applications have to suffer and/or be forced into action. By not allowing the people who care about this stuff the right to improve things themselves, you're taking on the responsibility to do this yourself, and you need to live up to your responsibilities. Considering the last version of Skype for Linux was released more than one and a half years ago, it's hard to consider it as anything more than abandonware at present. Will there be more pain like this? Yes, probably, but that's just the way things are - Free Software only truly works if the whole bundle is Free; if you mix and match, you, as a user, have to accept this state of affairs. I do.

Desktop environments need to ensure they integrate nicely with PulseAudio. GNOME is obviously doing this, but KDE is lagging behind. I do hope to rectify the latter situation personally, and have a pretty clear roadmap to making this happen - it's just a matter of finding the time to do it!


So, with all this in mind, the sound stack has to be more than just a driver layer. It needs a persistent userspace layer that can run and keep track of various permission problems, deal with network connections and generally govern things. At present PulseAudio is fitting the bill pretty nicely and is continuing to add support for additional constructs in the Linux stack. As things stand all the major Linux distributions are now using PulseAudio with commercial interest from Nokia, Intel and Palm among others.


So the future? Well, the drivers in ALSA need to be further debugged and developed to ensure the accuracy of the timing information, the lack of which has so far plagued the "glitch free" system in PulseAudio. Nothing has pushed the ALSA drivers to such limits before, but the benefits of glitch-free mode are clearly worth the pain. Applications using the ALSA API need to ensure that they are using it correctly and sticking to the safe subset whenever possible (thus ensuring compatibility with PulseAudio's ALSA plugin). In addition, applications such as media players need to deal properly with latencies. It's a bit of a myth that low latencies are needed by such applications - higher latencies will ensure better battery life on mobile players, and depending on how the user wants to route their sound (e.g. to a Bluetooth-enabled hi-fi system), latency will be beyond the control of the application in any event. It's therefore important to deal with this correctly and appropriately to ensure A/V sync. It's only been about half a year since the ALSA-level limitations on buffer sizes were lifted, after lobbying from the PulseAudio maintainer. Intel are even experimenting with 10-second buffers (that's not the same as latency!) in order to save power!

Every day more and more applications are tightening up their ALSA implementations. Every day the constructs of the Linux desktop are becoming more stable and solidified, offering a truly joined-up, multi-user and network-aware experience. I think this is particularly impressive considering that (as far as I know) only three people are employed to look after the Linux sound stack: Takashi Iwai and Jaroslav Kysela on the ALSA side and Lennart Poettering on the PulseAudio side. While there are numerous other contributors, this is still pretty impressive progress with the resources at hand. It's also worth noting that two of the three are employed by Red Hat, the other by Novell.

While Mandriva will still provide an easy way to disable PulseAudio if you feel it's not right for you (just untick the box - it's not hard!!) or need to use closed applications such as Skype, I believe that this will not be necessary in the not-too-distant future.

So where is Sound on Linux? In my opinion it's in a pretty good state - there are still lots of things to do, and that will never change, but there is a firm and solid framework out there now and it's getting better every day.

  • KimTjik

    Thanks for the clear and well-constructed article. Much appreciated. There are so many opinions circulating, hence it becomes very difficult for an uninitiated person to really know what to believe.

  • Thanks for the explanation. I appreciate the flexibility of Pulseaudio, but it has caused me some problems (out-of-sync audio with some video, some apps not working at all).

    The attitude to broken proprietary apps is unfortunate. Breaking very popular apps will lose users – i.e. Skype not working would be, for lots of people, a reason not to use Linux (especially for non-technical users).

    One question: where is KDE lagging? I thought (wrongly?) that Phonon was supposed to work with Pulseaudio? Is it just buggy or is there missing functionality?

    • Colin

      Where KDE is lagging is in not supporting pulseaudio directly (it does support it indirectly via the Phonon engines – although the Xine engine sadly has fairly poor pulse support (its author fully admits this), which does cause some problems).

      The main problem is a lack of integration. The system settings in KDE allows for different categories to be specified for different apps and then a different order of preference to be specified for e.g. Music vs. VoIP (when you have more than one sound device – e.g. a USB headset etc.).

      When pulse is used on the system, all the device management is off loaded, but the UI remains.

      What I want to do is ultimately make the UI work properly, e.g. allow the devices to be listed and reordered, but for this preference and setting to basically just be a window onto pulse – i.e. hand off everything for pulse to deal with.

      So that’s ultimately what I’ll work on in the next few months (in parallel to a few other things related to this).

  • audunmb

    I think most complaints come from poor implementation by distributions (especially Ubuntu in my experience), so it’s not necessarily a fault of the developers.

  • DanF

    A very helpful description of what may well be a great setup for the majority of users. In my audio production setup, none of the features in the PulseAudio list (that are not already supported by ALSA/JACK) are of use. A companion article, with the same top-level scope as this one, might describe the state of things for sound production.

    • Colin

      Some of the issues of working with jack+pulse have been discussed over the last couple of days, and automatic handover from pulse to jack should be working pretty nicely these days (or will be very soon).
      As many people using jack for production work will also check their mail and just “play tunes” on the machines too, I think that having both play nice with each other is a nice feature.
      I’m sadly not in any position to do any vaguely informed article relating to sound production work, so I’ll leave that to someone else more qualified 🙂

  • The only thing I think Linux sound is really lacking is something which, sadly, I don’t hear people asking for: how about a standard environment variable for sound?

    I have one session going running locally at the computer. Thanks to the DESKTOP environment variable every app I start knows to talk to my local X-server. I click and it appears on the screen.

    At the very same time I can have a remote VNC session running. I click inside there, it sees a different copy of that DESKTOP variable and my app starts inside the VNC session.

    The same goes if I am just running remote X-sessions, NoMachine’s NX or any other way of accessing X-applications remotely.

    That kind of flexibility is one of the things I think Linux used to really shine in when compared to Windows. Why doesn’t sound work like this? Even Windows has Remote Desktop now and guess what… it supports sound.

    I propose a new environment variable called AUDIO. Since there are multiple sound APIs and multiple sound servers it will have to use a URI format. It could be something like this…

    PULSE:// (I don’t want to look up the standard port# right now)
    ARTS://…. ok, you get the idea

    Of course not every distro would adopt the new standard immediately and not every user would upgrade right away so apps would still have to have a place to set the sound device in their setup areas. They would just have a default option, which would check for an environment variable and if none was present would use whatever is the most popular local device… probably ALSA at this time.

    Along with the environment variable I would also like to see an audio server which has a named pipe for local use… no TCP/IP lag, but is also accessible via TCP/IP for remote use. It would support multiple encoding formats and choose one automatically based on the connection speed. So… if I was away from home and connected on a slow network I would automatically get highly compressed tiny audio, but at least I would get audio. I would connect from a different room on the same LAN and automatically get full-quality stereo (or better). I don’t want to have to even think about it, though it would be nice to allow the user to override the default choices here.

    • Colin

      I think you mean the DISPLAY variable rather than DESKTOP, but the point stands :p

      Under pulse you can use the PULSE_SERVER variable to “display” sound on the right network machine so this goes some of the way to addressing what you want.
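      As a quick sketch (the host address here is made up; 4713 is the default native-protocol port):

      ```shell
      # Point PulseAudio clients at a remote server (hypothetical address)
      export PULSE_SERVER="tcp:192.168.1.10:4713"
      echo "$PULSE_SERVER"

      # Any pulse-aware client started from this shell now outputs remotely, e.g.:
      #   paplay /usr/share/sounds/alsa/Front_Center.wav
      ```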

      Also pulseaudio does use local pipes for communications (just look at your xprop -root | grep PULSE_SERVER output to see it). Pulse also uses SHM to communicate with clients, thus ensuring the copying of data is kept to a minimum – even on local connections there is no need to copy the audio data over the socket when you use SHM.
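      For the curious, the SHM transport is on by default but can be toggled in client.conf; a hedged sketch:

      ```
      ; ~/.pulse/client.conf (illustrative)
      enable-shm = yes          ; pass audio data via shared memory, not the socket
      ```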

      If Lennart’s once stated goal of a convergence sound library called libsydney ever saw the light of day, it would pretty much do everything you wanted, but that’s another story. I think with Pulse you pretty much get what you want for all practical use cases.

  • Matt

    I think where people got upset was that things that worked suddenly stopped working when pulse started to be used. That’s not very nice and certainly not good for appearances and acceptance.

    People like to blame the distribution’s implementation… that certainly has been a problem, but how about more information on how a user could fix his/her installation if it’s broken, instead of just waiting for a new version? Documentation for pulse has been hard to find and contradictory. Fortunately, pulse has become more usable and for the most part now “Just Works”.

    Where it still is confusing for me is pulse’s interaction with JACK which is the standard for music production apps on linux. My goal for that would be when JACK is started, pulse becomes a client of JACK without interrupting pulse’s clients.

    • Colin

      In Mandriva we provided pretty good methods for turning pulse off (a big button in the sound config portion) and as such it’s not been too difficult to revert problem cases back to plain alsa until we can make it work fully. I don’t think pulse is something your average user should worry too much about debugging, they should report it to the distro and the maintainers there should step them through the processes needed to gather more info. Of course enterprising and enthusiastic users are welcome to do it themselves (that’s all I am really!) and thus we always help out when people come on to the #pulseaudio IRC channel or ask on the mailing list.
      Granted, more written information needs to be made available, which is partly why I wrote this series of articles 🙂

  • n0ns

    The problem with your beloved PulseAudio (which is nice and sometimes useful) is that it isn’t ready for the DIGITAL world.
    I use one of my computers as a Media Center in the living room. Guess what … it’s running Linux.
    I like DTS and DD sound tracks for movies over S/PDIF, but PulseAudio can’t handle them and there are no plans to support them in the near future (as far as I know).
    So the only way to have DD or DTS is to use something like VLC that can output directly to S/PDIF regardless of system settings. This is an ugly solution.
    All sound goes over S/PDIF, and after watching a movie you have to restart applications to get sound back.
    There is no problem with PulseAudio on the analog front.
    Somehow, I even use Skype with it.
    The only disturbance I have is that sound captured from the mic is output through the speakers. I couldn’t find a solution by myself and am probably too lazy to ask this question on the forum, since Skype is mostly used from the phone (fring).

  • Burgers

    You also need to look at the reasons for the shitstorm and why people are pontificating about PulseAudio suckage. It’s because — ultimately — people don’t care about the reasons why, they … just want … … their au … dio to stop paus … ing and skipping when they’re trying to listen to an MP3.

    This is what Apple call minimum requirements in their HIG. We must meet minimum requirements before foisting this on users.

  • phred14

    I appreciate what you’ve written, and the clear explanation. But I’m still left with 2 problems at the moment…

    1 – Ever since upgrading to Ubuntu 9.04, my daughter has lost sound on Amarok and MythFrontend, but does have sound on YouTube. She had sound prior to that, back to Ubuntu Feisty Fawn. I don’t run Ubuntu myself, so I have no idea even where to start with this. The “sometimes” throws me off, especially.

    2 – On my intended dedicated MythFrontend machine, the HD-Audio is too soft to be practical – with all known volume knobs turned to max, ordinary room noises can drown it out. There are “known switches”, but poorly cross-referenced to my motherboard, and it takes using them to get the sound as loud as it is. I’ve turned it off and installed an ancient PCI sound card, which worked fine from the get-go.

    What I really mean is that esoteric explanations, no matter how correct or truthful, are all fine and good. But you also used the expression “just works”, and for my daughter and me, it doesn’t. What’s more I started using Linux with RedHat 4.0, back when there was NO sound at all until you compiled a custom kernel and did a bunch of custom configuration. Unfortunately “just works” usually means, “It should ‘just work’, so we’re not going to bother with easily usable documentation,” and ends up meaning, “If it doesn’t ‘just work’, it’s just about impossible to get working.”

    • Colin

      While I don’t know the versions involved in Ubuntu, MythFrontend may actually suspend pulseaudio when it’s running, which may mess with how it outputs sound. With Amarok, chances are it’s Phonon on KDE doing the work, and it could be misconfigured. The problem is that KDE sound preferences only affect KDE apps, whereas pulse is system wide. It makes no sense to show all the audio devices in KDE when you are using pulseaudio for sound output. That’s why in Mandriva we’ve ensured we’ve crippled the KDE UI when pulse is used, to only show the “PulseAudio” output in the list of options there. I would have hoped other distros did the same. Trouble is, it only takes one rogue application to “hog” the sound hardware and it throws everything else out of kilter.

  • Cláudio Pinheiro

    Strange. In the beginning of the article you say ALSA people need to maintain backward compatibility in an overly complicated userspace library so as not to break already-developed programs. In the end of the article you say closed-source programs should be blamed for not following the newest APIs (and not embracing PulseAudio), because they can’t take advantage of the higher-end features, like per-application volume controls and automatic audio routing when, let’s say, a USB or Bluetooth audio device is connected.
    I see the problem a bit deeper. First, there’s not only Linux out there. There are the BSDs and other UNIXes (that use OSS). Politics aside, the existence of two different kernel driver sets, one of them incompatible with the rest of the world, is a bit problematic. Not that the Windows world is in any better state. If a developer wants his/her program to reach the widest (UNIX) audience possible, choosing ALSA is not the optimal option. The funny thing is the most recent OSS has software mixing and per-application volume controls, is less CPU-intensive and offers better latency in most cases. The only uncontemplated feature would be automatic audio routing, but even that could in future be included in kernel drivers (such things already occur in the network stack when you configure redundant interfaces). So no need for an audio daemon at all.

    • Colin

      I don’t dispute the need to work with OSS, and patches to the OSS modules in pulseaudio are happily accepted. One of the major points to look at is the glitch-free stuff I mentioned. It’s actually an amazing bit of kit, but it really doesn’t fit at all into the OSS model, nor does dynamic routing/moving of streams, so I think by its very nature OSS is not suitable for a modern desktop sound system on its own – it needs a layer like pulse over the top to handle various desktop and user-based things (authenticated network systems being a case in point I made originally).

    • Vermax

      In my opinion nobody who’s interested in supporting Linux should care about Unixes. Their problems are their problems, and why should someone who’s only interested in supporting Linux care about them? I’m speaking for myself.

  • Sergio

    I thank Colin for his three latest great articles about sound on Linux. He was kind in answering my questions.
    Now I want to solve a really specific problem. If I want to use recordmydesktop, I have to run it this way:

    recordmydesktop –device “hw:1,0”

    I have to do that for the netbook mic to work.

    Now, I’d be glad to use ucview (an audio and video capture app) for recording video and audio; guess what, I haven’t found a way to do something like the above recordmydesktop example.

    If I enable audio recording ucview crashes and it outputs:

    ucil_alsa.c ucil_alsa_init (120) :cannot open audio device hw: 0,0 (No such a file or directory)

    You have said that alsa files are not really config files; in this case, shouldn’t I create an .asoundrc?

    And shame on me, I’ve found somewhat confusing how to do it.

    Please could you explain for all apps the mic?

    Thanks in advance!


  • Alejandro Nova


    It was good to read all of this, because I’m running two sound cards, and configuring them to be happy with plain ALSA is simply plain hell. PulseAudio has solved lots of headaches for me, but some still remain (namely, my Crystal SoundFusion soundcard is unusable without tsched=0).

    Now, to the question. How can I feed an input (SAA7132, a TV card) into an output (CS46xx speakers) using Pulse? It would be amazing to know, because when I watch TV I don’t want to use more expensive methods (sox) that involve data copying and latency (you don’t want latency when you are watching TV).

    Thanks in advance. If I can debug this CS46xx to be usable with tsched=1 I can help. Linux sound is evolving nicely (hey, I could connect my synth to my computer and the computer detected and enabled the USB MIDI interface in seconds), and I’m going to be here to watch it.

    • Colin

      With newer pulseaudio releases (well, 0.9.19, as versions .16–.18 should just be avoided really!) there is module-loopback. You can load this module and then move the input and output devices in pavucontrol to feed the audio from your TV card to your speakers.
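      As a hedged sketch of that (the source/sink names are placeholders – list the real ones with “pactl list short sources” and “pactl list short sinks”):

      ```
      # Loaded at runtime via "pactl load-module ...", or placed in default.pa:
      load-module module-loopback source=alsa_input.tvcard.analog-stereo sink=alsa_output.pci.analog-stereo latency_msec=200
      ```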

      There are some problems with the saa7132 alsa support it seems judging from some messages on the PA mailing list.

      Debugging the CS46xx would certainly be welcome feedback for the alsa developers I’m sure.

  • Syniurge

    Awesome article, Colin!
    I’ll be linking it all over the web and forums as soon as some misunderstanding (and hate) pops up!

    However I’d gladly welcome an ALSA lib v2, stripped of all features provided by Pulseaudio (dmix and the like) and redesigned to make device and per-device capability listing much more usable from Pulseaudio’s point of view.

    A little Pulseaudio annoyance, on the other hand, is the locked sample rate. Many people paying attention to sound quality and CPU usage (especially with 48kHz movies) are unhappy with Lennart’s decision to lock the sample rate and enforce unnecessary resampling for the sake of Lennart’s very own definition of sound quality.
    Why not make the choice to lock the sample rate a default policy, and offer other “sample rate policies” in the UI?

    But don’t mind, I’ll try to offer a patch when I get the time, in the meantime please bring back some good Phonon support !
    For some reason it was working nicely in Kubuntu Intrepid, the “physical devices” were mapped to Pulseaudio when possible, and since Jaunty/Karmic it has been regressing to a primitive “PulseAudio” device.

    • Colin

      Hi and thanks for the kind comments.

      The sample rate thing is trickier than you might think. We currently use the default sample rate as specified in the configuration file. We’ll probably expose this via some kind of profile selector (e.g. the Configuration tab in pavucontrol etc.) but that’s not available yet. The main problem comes when you do automatic switching: say nothing is using the device, and a new stream at 96kHz starts to play; the device is opened and switched to 96kHz. Cool, so far so good. Then an 8kHz sound event “bing” is played. As the device is open and set to 96kHz, the 8kHz sample is automatically up-sampled to work. Again, so far so good. Now consider the events happening the other way round. The 8kHz event is played first, so the device is opened in 8kHz mode, but then our 96kHz stream is played, and as the device is in 8kHz mode we have to down-sample to 8kHz… now that sure as hell is going to be audible! Even if we closed the device and reinitialised it at 96kHz (due to some kind of highest-possible-sample-rate policy) it’s going to be heard as a click or a pop, again not acceptable. Now this could be mitigated in the case where the device supports hardware mixing by opening the device several times at different rates, but this is going to introduce many more problems and require a lot of coding and probing to get right.

      So all in all, the easiest thing is just to lock the system to the sample rate the user wants to use and work with it. Once we expose this via a configurable interface, things will be easier. We may also be able to use a “default rate” and adopt a policy where the first opener gets the rate >= the default they request, but doing that will lead to inconsistent sound quality, so it’s probably a bad idea.
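      As a toy illustration of the trade-off described above (this is just a sketch, not PulseAudio code; the function names and the exact policies are mine):

```python
# A toy sketch (not PulseAudio code) contrasting two device-rate policies:
# a locked default rate vs. a "first opener >= default wins" policy.

def locked_rate(default_rate, stream_rates):
    """Locked policy: the device always runs at the configured default;
    every stream is resampled to it, so quality is consistent."""
    return default_rate

def first_opener_rate(default_rate, stream_rates):
    """First-opener policy: the first stream at or above the default
    sets the device rate; later streams must be resampled to it."""
    if not stream_rates:
        return default_rate
    first = stream_rates[0]
    return first if first >= default_rate else default_rate

# The ordering problem: with "first opener wins", the device rate you end
# up with depends on which stream happened to start first.
print(locked_rate(44100, [96000, 8000]))        # 44100: everything resampled
print(first_opener_rate(44100, [8000, 96000]))  # 44100: 8 kHz opener below default, ignored
print(first_opener_rate(44100, [96000, 8000]))  # 96000: first opener wins
```

      The point of the sketch is that the locked policy gives the same answer regardless of stream order, which is exactly the consistency argument made above.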

      As for the Phonon support, you should read my article here: I’ve no idea whether they are using my patches or not.

  • Name (required)

    Not a single proper use of “its” in the entire article, but a lot of creative abuses of “it is”, contracted. Lots of advocacy and feel-good stuff under the “anti-fud” label, and in that it is probably correct. But it still will not win me over to support a flawed idea done badly and with a lot of denial that there are problems, with much fingerpointing to elsewhere for the fixes. That just leaves ordinary user type people out in the cold and that in turn is something “linux” in general is infamous for: Sweeping changes that break lots of stuff and only shrugs or worse to show for it. No amount of feel-good marketing can mend that, as you-know-who too experienced.

    • Colin

      Well, I generally don’t allow anonymous posts, but I just wanted to address the grammar point. I found two cases where I’d used “it’s” incorrectly. All other cases are contractions of “it is” or “it has”, both of which are valid in my book. I’ve not counted, but there were over 15 such occurrences, so slightly higher than your “not a single” count.

      Now with Linux you can say what you like about the development process but ultimately whose itch is going to get scratched? Your itch because you’re the end user, or mine because I’m the developer. Guess what? Mine comes first, because I’m the one doing it!!! If it happens to fit in with your itch, then great, lucky you. If not, maybe I’ll scratch it for you because I’m in a good mood, or maybe I need to scratch it to get to something else behind it.

      Linux is not some commercial entity that must please its end users or hit some benchmark of desktop penetration. It’s not marketed, it’s not market-driven. Sure, there are commercial entities doing just that, but it’s not the driving force behind most developments. They are done because they are the right way to go. Adding in hacks left, right and centre to behave correctly given bad input leads to the shitty situation we have today in the web browser market, which has for too long tolerated crap input, leading to inconsistencies. The problems need to be addressed where they lie, not worked around by painting over the cracks. Call that finger pointing if you want, but that’s ill informed at best and vindictive at worst.

  • Bizarre

    I’d like somebody to write an article which illustrates the recommended configuration for sound. I’m using KDE 3.x, ALSA and artsd. Pulseaudio is installed but not running; I don’t know why. I don’t know whether I should disable artsd, how to start pulseaudio, or whether my ALSA is configured correctly. For me the whole sound subsystem appears as a mish-mash of pieces which don’t fit together properly.

    • Colin

      Arts is a pain and needs to die. Thankfully it has. I’d recommend updating to KDE 4 and using my various patches to Phonon + KDE to make KDE4 + PulseAudio not suck:

      That said, arts can be made to run on top of PA. Some people configure it to use ESD, but personally I had a good experience with just plain ALSA output and no silly specific config. YMMV tho’.
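      For what it’s worth, the usual way to point plain ALSA clients at PulseAudio is the “pulse” plugin from alsa-plugins; a minimal sketch of the config (the file location and plugin availability depend on your distro):

```
# ~/.asoundrc -- route the ALSA "default" device through PulseAudio
# (requires the alsa-plugins "pulse" PCM/CTL plugin to be installed)
pcm.!default {
    type pulse
}
ctl.!default {
    type pulse
}
```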

  • Pingback: So how does the KDE PulseAudio support work anyway?

  • J Story

    Funny. You say: “So the people who talk about OSSv4 and how it can do mixing and per-app volume control and how this means that ALSA and PulseAudio are not needed are totally underestimating what’s needed in a modern audio stack.” But what you seem to misunderstand is that OSSv4 actually works.

    Excuse after excuse is offered why Pulseaudio doesn’t *really* suck, if only blah, blah, blah, and how terribly superior it is. However, the fact is that it does suck. OSSv4 has what I want — some of what pulseaudio claims to offer — and doesn’t have what I don’t want — no audio, skipping, pegged cpu — which pulseaudio delivers in spades. Are audio decisions for Linux distributions being made by masochists, that they prefer the polished turd?

    • Colin

      Well, this is sadly the kind of reply that is so common from people who don’t really understand how open source software works. PulseAudio has certainly had its problems; no-one (myself included) has denied that. But these problems have typically fallen into two categories: integration issues (i.e. the fault of the distro) or driver issues (due to new features exposed in the low-level ALSA drivers that no other ALSA client has used before). Both of these have largely been addressed these days, although obviously some problems remain.
      PulseAudio has now been adopted by the major distributions, which means it is getting a lot of exposure and testing. It is this exposure and testing that drives the bug fixes that are needed. Without a critical mass, corner cases are missed and obscure hardware is not fixed at the driver level. OSSv4 has simply not had the exposure needed to make an “it just works” statement hold water. I cannot and will not accept such a statement without seeing either a detailed study of the testing procedure and hardware used, or until it reaches a critical mass and has been in use for a reasonable period of time. The bugs exposed in the ALSA drivers by later versions of PA (0.9.11+) arise because PA takes advantage of advanced timing feedback from the driver layer in a way that saves power. This is something OSSv4 cannot do, so the comparison is not like for like… do more advanced things, expect more bugs during the initial stages: that is what is expected. Sure there are teething problems, but that doesn’t mean the fundamental architecture or thinking is flawed.
      Like it or not, the OSS API is dying. It’s already being disabled by default on the major Linux distros and there is no chance of having two competing audio driver infrastructures in the kernel, ALSA has won there.
      If you really care about making Linux audio better, then you should just abandon OSS evangelism and help make the tools that are being used now better. Even if OSSv4 were better (I don’t personally believe it is, but that’s irrelevant here), sometimes you just have to admit defeat… Betamax had its little ass kicked by VHS, if you remember!

  • CasHew4

    It sounds like you are ignoring the KISS formula in order to implement features that are required by less than 2% of the users. Multiple users logged in simultaneously on one desktop computer? Come on now, how often does this happen? Many of the “modern desktop requirements” that you insist on are actually needed by very few users. Needless complexity usually results in chaos that negates any advances.

    • Colin

      I totally disagree with your figures there. 2% is very low. Both Windows (since XP) and OS X support multiple users logged in simultaneously very well. Are you suggesting that Linux should not support multiple user logins? If so, why do we bother having “users” at all? Why not just have a single “user” account?!!!

      The multiple user concept is ingrained in Linux history and there is no reason to change this now – it’s still very useful. The design decisions made are also completely separate from the audio stack per se. The user session tracking system is ConsoleKit, and it has several uses beyond the audio stack. We just integrate into the systems that are there, and to good effect.

      If you follow any of the more recent trends in Linux you will see that multiple users (and also simultaneous active users via a multi-seat system) are very much issues that we need to deal with.

  • Juan Carlos Perez

    This is an interesting article, but I should not be reading it. The microphone of the computer should “just work” and I should not have to read lots of things on the internet and install programs which result in forced reboots, to no avail. I have been using Ubuntu since its first version and it has advanced quite a lot, but regarding sound it went backwards some time ago, before version 9.04, because neither that nor 9.10 works. I used Skype without problems some time ago. So it is not the hardware, it is the sound system. And there are quite a few people who have the same problem.

    • Colin

      In an ideal world we’d all just sit down in a big room with all the hardware manufacturers in the world and then develop our replacement sound stack and not release until it worked with every bit of hardware in the room. In the real world this doesn’t happen!

      I’m sorry you’re having problems at the moment, but without people reporting the problems and seeing that they are fixed, we cannot expect to move forward. Complaining that “it worked with X, Y years ago” doesn’t really help. We’re aware that regressions are introduced by driving the sound hardware in a very different way than before. This is inevitable with such a change. People should be keen to say “OK, so things no longer work so well for me, what can I do to help fix it again?” rather than just saying “It’s broken, it worked a different way a year ago, this is shit, I’m going to complain”.

      It’s all about the attitude really, and while I don’t mean to pick on you personally (your message was very polite and well mannered), people who are part of the FOSS “movement” should appreciate that things are not deliberately broken, nor are things changed without due cause. Sure, it would be nice if we had the resources to fully QA every release on every piece of hardware in existence, but that simply isn’t possible. We need user feedback and quality bug reports to get things working. It’s how things work.

      I strongly encourage you to help report bugs and get the problems fixed for yourself and for the benefit of others with similar hardware.

      If you cannot do that just now, just use the interface your distro should provide you to disable PulseAudio and try again with a new round of updates to see if other people have managed to report and have fixed the bug or bugs you were experiencing before.
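      As an aside, if your distro doesn’t offer a switch for this, one common way to stop PulseAudio from starting automatically is the client configuration file (treat this as a sketch: the exact file location varies by distro and PA version):

```
# ~/.pulse/client.conf -- stop the PulseAudio daemon from auto-starting
# (system-wide equivalent is typically /etc/pulse/client.conf)
autospawn = no
```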

      For reference, Skype with PA works exactly how I want it to now! My Bluetooth headset works great with it, and the ability to move streams from the built-in sound card to the headset is perfect. It does it automatically without me having to do anything. Accept call, open headset, streams move across. Perfect!
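      For the curious, stream moving can also be done by hand; a rough sketch using pacmd (the index and sink identifier are placeholders you would read from the list output):

```
# List available sinks and the streams currently playing.
pacmd list-sinks
pacmd list-sink-inputs
# Move one stream to another sink; <index> comes from list-sink-inputs,
# <sink> is the target sink's index or name from list-sinks.
pacmd move-sink-input <index> <sink>
```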