Friday, September 4, 2015

Audio Networking with Sox and Netcat

Sox + Netcat = VoIP

The idea is to hook the sound devices on two machines together so that there is a bidirectional intercom between them.  This way, you can have an amateur SSB radio hooked to a distant computer in your radio shack and a headset on a local computer in your living room and do VoIP with the remote radio transceiver over your LAN.

I noticed that there is a steady stream of people reading this post.  Please note that Sox with CVSD and netcat will work fine and consume very little processing time on a tiny ARM embedded system, but gstreamer with raw audio over UDP may be a better choice if you have a half decent system on both ends of the link and want the best quality audio.  Therefore, do look into gstreamer also.

For a remote ham radio, the main thing missing is the PTT switch, which one can handle with another netcat proxy to the serial control port of the transceiver, with radio control software running locally to set the channel and key the radio.

For streaming, one needs a headerless, self-synchronizing CODEC, so that a receiver can connect to a stream that is already running.  Examples are ADPCM, CVSD and LPC10.

Sox gotchas

Sound Exchange has three programs: sox, rec and play.  The difference is that if you run sox, it will glom onto the sound device for both read and write, but it can only do one thing at a time, so it can either record or play, but not both at the same time.

The rec and play programs, however, can run independently and concurrently, so those can be used to make a bidirectional intercom.

UNIX systems have many different sound systems: OSS, ALSA, Coreaudio, Pulseaudio, Jack...  Sox knows how to handle them, if you let it use the default settings.  So get the default to work first with a simple mixer application such as aumix, rec and play, before you try something complicated.

Sox also has internal buffering that is extremely large: 8 kilobytes.  The result is that sox by default has a huge delay.  When you use compression, then the more you compress, the longer the delay gets, because the slower stream takes longer to fill the buffers.  You can reduce that with the --buffer parameter.
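As a rough back-of-the-envelope sketch of why compression lengthens the delay: the time to fill one buffer is its size divided by the byte rate of the stream.  The numbers below assume 8 kHz mono audio, take the roughly 1 kilobyte per second quoted in this post for CVSD, and treat the buffer as filling at the stream's byte rate, which is a simplification of what sox does internally.  The helper name fill_ms is made up for illustration.

```shell
# fill_ms BUFFER_BYTES BYTES_PER_SECOND: milliseconds needed to fill one buffer.
# (Hypothetical helper; a simplified model, not sox's internal accounting.)
fill_ms() {
    echo $(( $1 * 1000 / $2 ))
}

# Raw 16 bit mono at 8000 samples/s is 8000 * 2 = 16000 bytes/s.
# CVSD is taken as about 1000 bytes/s, the figure quoted in this post.
echo "8192 byte buffer, raw 16 bit: $(fill_ms 8192 16000) ms"
echo "8192 byte buffer, CVSD:       $(fill_ms 8192 1000) ms"
echo "32 byte buffer, CVSD:         $(fill_ms 32 1000) ms"
```

Under these assumptions the default buffer holds about half a second of raw audio but about eight seconds of CVSD, which is why a small --buffer value matters more the harder the stream is compressed.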

In general, a simpler sound system such as ALSA or OSS will have smaller buffers and lower delays than the complex pulseaudio or coreaudio, so stick to the basics for best results.

The general syntax is: sox inputspec inputfile outputspec outputfile

A lone "-" in place of a file name tells sox to use standard input or standard output (a pipe), so read the examples carefully.  The data flows from left to right in all the examples below.

Default setup

First get sox to work on the default sound device:

Make a noise:
$ cat /dev/urandom | sox -traw -r44100 -b16 -eunsigned-integer - -d

Play a tone:
$ play -n synth 10 sin 500

Record to a CVSD file (stop with Ctrl-C):
$ rec -r8000 -tcvsd file

Play the recording back:
$ play -r8000 -tcvsd file

Don't bother with the below if you cannot get the above to work.

Hook sox to netcat on two computers:

Assuming IP addresses and

With CVSD, be sure to set the sample rate and reduce the buffer size to reduce the delays.  The resulting network bandwidth is about 1 kilobyte per second and the end to end delay is about 1 second.  CVSD is a very simple CODEC, so it works well on an embedded processor.

On the first computer:
$ rec --buffer 32 -tcvsd -r8000 - | nc -u -l 5555 | play --buffer 32 -tcvsd -r8000 -

and on the second computer:
$ rec --buffer 32 -tcvsd -r8000 - | nc -u 5555 | play --buffer 32 -tcvsd -r8000 -

Just two one liners!

The above works and I had it running between a Macbook Pro with OSX (Mac sox is available from Homebrew) and a Fedora Linux laptop.  I also experimented on a BSD laptop and it behaves the same.
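The two one-liners differ only in the netcat arguments, so they can be folded into one small shell function.  This is only a sketch: the function name intercom and its listen/connect arguments are invented here, and HOST is a placeholder for the address of the listening machine, which you have to supply yourself.

```shell
# intercom listen | intercom connect HOST
# Hypothetical wrapper around the two one-liners.  HOST is a placeholder for
# the address of the listening machine; it is not taken from this post.
intercom() {
    port=5555   # UDP port used in the one-liners
    buf=32      # sox --buffer size in bytes
    case "$1" in
        listen)
            rec --buffer "$buf" -tcvsd -r8000 - \
                | nc -u -l "$port" \
                | play --buffer "$buf" -tcvsd -r8000 -
            ;;
        connect)
            if [ -z "$2" ]; then
                echo "usage: intercom connect HOST" >&2
                return 2
            fi
            rec --buffer "$buf" -tcvsd -r8000 - \
                | nc -u "$2" "$port" \
                | play --buffer "$buf" -tcvsd -r8000 -
            ;;
        *)
            echo "usage: intercom listen | intercom connect HOST" >&2
            return 2
            ;;
    esac
}
```

Run "intercom listen" on the first computer and "intercom connect HOST" on the second.  Defining the function does not touch the sound device; nothing starts until it is called.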

The main problem is the delay due to buffering: the more one compresses the data stream, the longer it takes to fill the pipes, hence the longer the delay.  The LPC10 CODEC will reduce the network bandwidth further, but the delay will be ridiculous.

To make it robust, you may want to put a while loop with a sleep delay around the above, so that if sox or netcat exits for whatever reason, it starts again after waiting a little bit for the dust to settle:
$ while true; do rec --buffer 32 -tcvsd -r8000 - | nc -u 5555 | play --buffer 32 -tcvsd -r8000 -; sleep 1; done

...and it is still a one liner!  OK, technically, it is three.
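The same restart idea can be written as a small reusable function, sketched below.  The name restart_loop and its MAX_RUNS argument are made up for this example; the cap exists only so that the loop can be tried out safely with a harmless stand-in command before pointing it at the real pipeline.

```shell
# restart_loop MAX_RUNS PAUSE CMD...: run CMD, and run it again each time it
# exits, sleeping PAUSE seconds in between.  MAX_RUNS=0 means loop forever.
# (Hypothetical helper; the MAX_RUNS cap is only there for safe experiments.)
restart_loop() {
    max_runs=$1
    pause=$2
    shift 2
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        "$@" || true              # run the command; ignore its exit status
        runs=$((runs + 1))
        sleep "$pause"            # let the dust settle before restarting
    done
}

# Harmless demonstration: run a stand-in command three times with no pause.
restart_loop 3 0 echo "link went down, restarting"
```

To use it for the intercom, wrap the rec | nc | play pipeline in a shell function of its own and pass that function's name as CMD, with MAX_RUNS set to 0 and PAUSE set to 1.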

It took me about 3 days to figure the above out, so when all else fails, RTFM and try again.

It is certainly much better than two paper cups and string...

La voila!


