Friday, September 4, 2015

Audio Networking with Sox and Netcat

Sox + Netcat = VoIP

The idea is to hook the sound devices on two machines together so that one has a bidirectional intercom between them.  This way, one can have a radio hooked to a distant computer and a headset on a local computer and do VoIP with a remote radio transceiver over a LAN.

The main thing missing then is the PTT switch, which one can do with another netcat proxy to the serial control port of the transceiver and radio control software running locally to set the channel and key the radio.  This would make a good setup for a radio ham who wants to key a remote radio in his garden shack, from the comfort of his living room.

For streaming, one needs a headerless, self-synchronizing CODEC, so that the receiver can lock onto a stream that is already running.  Examples are ADPCM, CVSD and LPC10.

Sox gotchas

Sound Exchange has three programs: sox, rec and play.  The difference is that if you run sox, it will glom onto the sound device for both read and write, but it can only do one thing at a time, so it can either record or play, but not both at the same time.

The rec and play programs however, can run independently and concurrently, so those can be used to make a bidirectional intercom.

UNIX systems have many different sound systems: OSS, ALSA, CoreAudio, PulseAudio, JACK...  Sox knows how to handle them, if you let it use the default settings.  So get the default to work first with a simple mixer application such as aumix, plus rec and play, before you try something complicated.

Sox also has internal buffering that is extremely large: 8 kilobytes.  The result is that sox by default has a huge delay.  When you use compression, then the more you compress, the longer the delay gets, because the slower stream takes longer to fill the buffers.  You can reduce that with the --buffer parameter.
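To put a rough number on that, divide the buffer size by the byte rate of the stream.  The sketch below assumes that a buffer has to fill with encoded data before it gets passed on, that CVSD uses 1 bit per sample and that LPC10 runs at 2400 bits per second.  The helper name fill_ms is made up for this example, and a real setup will have other sources of delay as well.

```shell
#!/bin/sh
# fill_ms BUFFER_BYTES BITS_PER_SECOND
# Time in milliseconds needed to fill one buffer (integer arithmetic).
# Assumption: the buffer fills with encoded bytes at the stream's bit rate.
fill_ms() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

fill_ms 8192 8000   # CVSD at 8000 samples/s, default 8 KB buffer -> 8192
fill_ms 32 8000     # the same stream with --buffer 32            -> 32
fill_ms 8192 2400   # LPC10 at 2400 bit/s, default buffer         -> 27306
```

So under these assumptions the default buffer costs about 8 seconds per stage with CVSD, and closer to half a minute with LPC10.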

In general, a simpler sound system such as ALSA or OSS will have smaller buffers and lower delays than the complex PulseAudio or CoreAudio, so stick to the basics for best results.

The general syntax is: sox inputspec inputfile outputspec outputfile

A lone "-" in place of a file name tells sox to use standard input or standard output, that is, a pipe, so read the examples carefully.  The data flow is from left to right in all the examples below.

Default setup

First get sox to work on the default sound device:

Make a noise:
$ cat /dev/urandom | sox -traw -r44100 -b16 -eunsigned-integer - -d

Play a tone:
$ play -n synth 10 sin 500

Record to a file:
$ rec -r8000 -tcvsd file

Play it back:
$ play -r8000 -tcvsd file

Don't bother with the below if you cannot get the above to work.
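Before going further, it can also save some head scratching to confirm that all the programs are actually installed on both machines.  A minimal sketch, where missing_tools is a made-up helper name and the tool list is simply the commands used in this post:

```shell
#!/bin/sh
# missing_tools NAME...
# Print the name of each tool that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools sox rec play nc)
if [ -n "$missing" ]; then
    # Unquoted on purpose, so the names end up on one line.
    echo "not found on PATH:" $missing
else
    echo "sox, rec, play and nc are all installed"
fi
```

This only checks that the commands exist.  It says nothing about whether the sound device works, so the tests above are still needed.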

Hook sox to netcat on two computers:

Assuming IP addresses and

With CVSD, be sure to set the sample rate and reduce the buffer size to reduce the delays.  The resulting network bandwidth is about 1 kilobyte per second and the end to end delay is about 1 second.  CVSD is a very simple CODEC, so it works well on an embedded processor.
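The bandwidth figure is easy to check.  This sketch assumes CVSD produces 1 bit per sample and compares it with plain 16 bit PCM at the same sample rate; stream_bytes_per_sec is a made-up helper name and the numbers ignore UDP and IP packet overhead.

```shell
#!/bin/sh
# stream_bytes_per_sec SAMPLE_RATE BITS_PER_SAMPLE
# Payload bytes per second of a single channel stream.
stream_bytes_per_sec() {
    echo $(( $1 * $2 / 8 ))
}

stream_bytes_per_sec 8000 1    # CVSD, assumed 1 bit per sample -> 1000
stream_bytes_per_sec 8000 16   # uncompressed 16 bit PCM        -> 16000
```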

On the first computer:
$ rec --buffer 32 -tcvsd -r8000 - | nc -u -l 5555 | play --buffer 32 -tcvsd -r8000 -

and on the second computer:
$ rec --buffer 32 -tcvsd -r8000 - | nc -u 5555 | play --buffer 32 -tcvsd -r8000 -

Just two one liners!

The above works and I had it running between a MacBook Pro with OS X (Mac sox is available from Homebrew) and a Fedora Linux laptop.  I also experimented on a BSD laptop and it behaves the same.

The main problem is delay due to buffering.  The more one compresses the data stream, the longer it takes to fill the pipes, hence the longer the delay.  The LPC10 CODEC will reduce the network bandwidth further, but the delay will be ridiculous.

To make it robust, you may want to put a while loop with a sleep delay around the above, so that if sox or netcat would exit for whatever reason, it would start again, after waiting a little bit for the dust to settle:
$ while true; do rec --buffer 32 -tcvsd -r8000 - | nc -u 5555 | play --buffer 32 -tcvsd -r8000 -; sleep 1; done

...and it is still a one liner!  OK, technically, it is three.
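If you want to try the restart logic without tying up the sound device, the same idea can be wrapped in a small function with an upper limit on the number of runs.  This is only a sketch: keep_running is a made-up name, and the false command stands in for a pipeline that keeps exiting.

```shell
#!/bin/sh
# keep_running MAX_RUNS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND again each time it exits, pausing DELAY_SECONDS between runs,
# and stop after MAX_RUNS runs.  Reports the exit status of every run.
keep_running() {
    max=$1; delay=$2; shift 2
    runs=0
    while [ "$runs" -lt "$max" ]; do
        status=0
        "$@" || status=$?
        runs=$((runs + 1))
        echo "run $runs exited with status $status"
        # Wait for the dust to settle, except after the final run.
        [ "$runs" -lt "$max" ] && sleep "$delay"
    done
    return 0
}

# Demonstration: "false" always fails, so it gets started three times.
keep_running 3 0 false
```

To wrap a whole pipeline this way, pass it as a single command such as sh -c '...' with the pipeline inside the quotes.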

It took me about 3 days to figure the above out, so when all else fails, RTFM and try again.

It is certainly much better than two paper cups and string...

La voila!


