Audio Networking with Sox and Netcat

Sox + Netcat = VoIP

The idea is to hook the sound devices on two machines together so that there is a bidirectional intercom between them.  This way, you can have an amateur SSB radio hooked to a distant computer in your radio shack and a headset on a local computer in your living room and do VoIP with the remote radio transceiver over your LAN.

I noticed that there is a steady stream of people reading this post.  Please note that Sox with CVSD and netcat will work fine and consume very little processing time on a tiny ARM embedded system, but gstreamer with raw audio over UDP may be a better choice if you have a half decent system on both ends of the link and want the best quality audio.  Therefore, do look into gstreamer also.

For a remote ham radio, the main thing missing is the PTT switch, which one can handle with another netcat proxy to the serial control port of the transceiver, with radio control software running locally to set the channel and key the radio.

For streaming, one needs a headerless, self-synchronizing CODEC, so that a receiver can connect to a stream that is already running.  Examples are ADPCM, CVSD and LPC10.

Sox gotchas

Sound Exchange has three programs: sox, rec and play.  The difference is that if you run sox, it will glom onto the sound device for both read and write, but it can only do one thing at a time, so it can either record or play, but not both at the same time.

The rec and play programs however, can run independently and concurrently, so those can be used to make a bidirectional intercom.

UNIX systems have many different sound systems: OSS, ALSA, Coreaudio, Pulseaudio, Jack...  Sox knows how to handle them, if you let it use the default settings.  So get the default to work first with a simple mixer application such as aumix, rec and play, before you try something complicated.

Sox also has internal buffering that is extremely large: 8 kilobytes.  The result is that sox by default has a huge delay.  When you use compression, then the more you compress, the longer the delay gets, because the slower stream takes longer to fill the buffers.  You can reduce that with the --buffer parameter.
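
To get a feel for the numbers, here is a rough sketch of the arithmetic. It assumes that a buffer simply fills at the byte rate of the stream, as described above. The function name fill_delay_ms is made up for illustration. The 1000 bytes per second figure is the CVSD rate at 8000 samples per second, and 16000 bytes per second is uncompressed 16 bit mono audio at the same sample rate.

```shell
# Milliseconds needed to fill a buffer of $1 bytes
# from a stream that delivers $2 bytes per second.
# (fill_delay_ms is an illustrative name, not part of sox.)
fill_delay_ms() {
    echo $(( $1 * 1000 / $2 ))
}

fill_delay_ms 8192 1000     # default 8192 byte buffer, CVSD stream: 8192 ms
fill_delay_ms 8192 16000    # default buffer, raw 16 bit audio: 512 ms
fill_delay_ms 32 1000       # --buffer 32, CVSD stream: 32 ms
```

Under that assumption, one default buffer holds about 8 seconds of CVSD, but only half a second of raw audio. This covers a single buffer only; the sound system and the network add their own delays on top.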

In general, a simpler sound system such as ALSA or OSS will have smaller buffers and lower delays than the complex pulseaudio or coreaudio, so stick to the basics for best results.

The general syntax is: sox inputspec inputfile outputspec outputfile

A lone "-" in place of a file name tells sox to use stdin or stdout, that is, a pipe, so read the examples carefully.  The data flow is from left to right in all the examples below.

Default setup

First get sox to work on the default sound device:

Make a noise:
$ cat /dev/urandom | sox -traw -r44100 -b16 -eunsigned-integer - -d

Play a tone:
$ play -n synth 10 sin 500

Record to a file with CVSD:
$ rec -r8000 -tcvsd file

Play the file back:
$ play -r8000 -tcvsd file

Don't bother with the below if you cannot get the above to work.

Hook sox to netcat on two computers:

Assuming IP addresses and

With CVSD, be sure to set the sample rate and reduce the buffer size to reduce the delays.  The resulting network bandwidth is about 1 kilobyte per second and the end to end delay is about 1 second.  CVSD is a very simple CODEC, so it works well on an embedded processor.
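
The bandwidth figure can be checked with a line of shell arithmetic. This is a sketch that assumes a mono stream and that CVSD carries one bit per sample, which agrees with the 1 kilobyte per second quoted above. The function name stream_bytes_per_sec is made up for illustration.

```shell
# Bytes per second of a mono stream:
# sample rate ($1) times bits per sample ($2), divided by 8.
# (stream_bytes_per_sec is an illustrative name.)
stream_bytes_per_sec() {
    echo $(( $1 * $2 / 8 ))
}

stream_bytes_per_sec 8000 1     # CVSD at 8000 samples/s: 1000
stream_bytes_per_sec 8000 16    # raw 16 bit audio for comparison: 16000
```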

On the first computer:
$ rec --buffer 32 -tcvsd -r8000 - | nc -u -l 5555 | play --buffer 32 -tcvsd -r8000 -

and on the second computer:
$ rec --buffer 32 -tcvsd -r8000 - | nc -u 5555 | play --buffer 32 -tcvsd -r8000 -

Just two one liners!

The above works and I had it running between a Macbook Pro with OSX (Mac sox is available from Homebrew) and a Fedora Linux laptop.  I also experimented on a BSD laptop and it behaves the same.

The main problem is with delays due to buffering and the more one compresses the data stream, the longer it takes to fill the pipes, hence longer delays.  The LPC10 CODEC will reduce the network bandwidth further, but the delay will be ridiculous.

To make it robust, you may want to put a while loop with a sleep delay around the above, so that if sox or netcat exits for whatever reason, it starts again after waiting a little for the dust to settle:
$ while true; do rec --buffer 32 -tcvsd -r8000 - | nc -u 5555 | play --buffer 32 -tcvsd -r8000 -; sleep 1; done

...and it is still a one liner!  OK, technically, it is three.
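
If the one liner gets unwieldy, the same restart idea can go into a small script. This is only a sketch: respawn, intercom and REMOTE_HOST are names made up for illustration, and REMOTE_HOST must be set to the address of the other computer. The run limit is there so that the loop can be tried out with a harmless command first; a limit of 0 means loop forever, like the while true above.

```shell
# respawn DELAY MAX CMD...
# Run CMD; when it exits, wait DELAY seconds and run it again.
# MAX is the number of runs, 0 means forever.
# (respawn and its arguments are illustrative, not part of sox or netcat.)
respawn() {
    delay=$1
    max=$2
    shift 2
    runs=0
    while [ "$max" -eq 0 ] || [ "$runs" -lt "$max" ]; do
        "$@"
        runs=$((runs + 1))
        sleep "$delay"
    done
}

# The whole pipeline goes into a function, so that it restarts as a unit.
# REMOTE_HOST is a hypothetical variable holding the other computer's address.
intercom() {
    rec --buffer 32 -tcvsd -r8000 - | nc -u "$REMOTE_HOST" 5555 | play --buffer 32 -tcvsd -r8000 -
}

# Dry run with a harmless command: prints "hello" three times.
respawn 0 3 echo hello

# Real use, restarting forever with a one second pause:
# REMOTE_HOST=... respawn 1 0 intercom
```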

It took me about 3 days to figure the above out, so when all else fails, RTFM and try again.

It is certainly much better than two paper cups and string...

La voila!


