Friday, May 18, 2018

Video Distribution With MPEG-2 Transport Streams

FFMPEG MPEG-2 TS Encapsulation

An observation aircraft could be fitted with three or four cameras and a radar.  In addition to the multiple video streams, there are also Key, Length, Value (KLV) metadata consisting of the time and date, the GPS position of the aircraft, the speed, heading and altitude, the position that the cameras are staring at, the range to the target, as well as the audio intercom used by the pilots and observers.  All this information needs to be combined into a single stream for distribution, so that the relationship between the various information sources is preserved.


Example UAV Video from FFMPEG Project

When the stream is recorded and played back later, one must still be able to determine, for example, which GPS position corresponds to which frame.  If the data were saved in separate files, that would be very difficult.  In a stream, everything is interleaved in chunks, so one can open the stream at any point and tell immediately exactly what happened, when and where.

The MPEG-2 TS container is used to encapsulate video, audio and metadata according to STANAG 4609.  This is similar to the Matroska format used for movies, but a movie has only one video channel.

The utilities and syntax required to manipulate encapsulated video streams are obscure, and the streams are difficult to debug, since off the shelf video players do not support streams with multiple video substreams: they will only play one of the substreams, with no way to select which one, since they were made for Hollywood movies, not STANAG 4609 movies.

After considerable head scratching, I finally figured out how to do it and, more importantly, how to test and debug it.  Using the Bash shell and a few basic utilities, it is possible to sit at any UNIX workstation and debug this complex stream wrapper and metadata puzzle interactively.  Once one has it all under control, one can write a C program to do it faster, or one can just leave it as a Bash script, since a working script is easy to maintain.

References


Install the utilities

If you are using Debian or Ubuntu Linux, install the necessary tools with apt.  Fedora and other Red Hat based distributions use dnf instead:
$ sudo apt install basez ffmpeg vlc mplayer espeak sox 

Note that these tests were done on Ubuntu Linux 18.04 LTS.  You can obtain the latest FFMPEG version from Git, by following the compile guide referenced above.  If you are using Windows, well, good luck.

Capture video for test purposes

Capture the laptop camera to an MP4 file in the simplest way:
$ ffmpeg -f v4l2 -i /dev/video0 c1.mp4

Make 4 camera files with different video sizes, so that one can distinguish them later.  Also make four numbered cards and hold them up to the camera to see easily which is which:

$ ffmpeg -f v4l2 -framerate 25 -video_size vga -pix_fmt yuv420p -i /dev/video0 -vcodec h264 c1.mp4
$ ffmpeg -f v4l2 -framerate 25 -video_size svga -pix_fmt yuv420p -i /dev/video0 -vcodec h264 c2.mp4
$ ffmpeg -f v4l2 -framerate 25 -video_size xga -pix_fmt yuv420p -i /dev/video0 -vcodec h264 c3.mp4
$ ffmpeg -f v4l2 -framerate 25 -video_size uxga -pix_fmt yuv420p -i /dev/video0 -vcodec h264 c4.mp4

 

Playback methods

SDL raises an error, unless pix_fmt is explicitly specified during playback: "Unsupported pixel format yuvj422p"

Here is the secret to play video with ffmpeg and SDL:
$ ffmpeg -i s2.mp4 -pix_fmt yuv420p -f sdl "SDL OUT"

...and here is the secret to play video with ffmpeg and X:
$ ffmpeg -i s2.mp4 -f xv Screen1 -f xv Screen2 

With X, you can decode the video once and display it on multiple screens, without increasing the processor load.  If you are a Windows user - please don't cry...

Play video with ffplay:
$ ffplay s2.mp4

ffplay also uses SDL, but it doesn't respect the -map option for stream playback selection.  Ditto for VLC and Mplayer.

Some help with window_size / video_size:
-window_size vga
‘cif’ = 352x288
‘vga’ = 640x480
...

 

Map multiple video streams into one mpegts container

Documentation: https://trac.ffmpeg.org/wiki/Map

Map four video camera input files into one stream:
$ ffmpeg -i c1.mp4 -i c2.mp4 -i c3.mp4 -i c4.mp4 -map 0:v -map 1:v -map 2:v -map 3:v -c:v copy -f mpegts s4.mp4

 

See whether the mapping worked

Compare the file sizes:
$ ls -al
total 14224
drwxr-xr-x  2 herman herman    4096 May 18 13:19 .
drwxr-xr-x 16 herman herman    4096 May 18 11:19 ..
-rw-r--r--  1 herman herman 1113102 May 18 13:12 c1.mp4
-rw-r--r--  1 herman herman 2474584 May 18 13:13 c2.mp4
-rw-r--r--  1 herman herman 1305167 May 18 13:13 c3.mp4
-rw-r--r--  1 herman herman 2032543 May 18 13:14 c4.mp4
-rw-r--r--  1 herman herman 7621708 May 18 13:19 s4.mp4


The output file s4.mp4 is roughly the sum of the four camera files above, plus a little MPEG-TS muxing overhead.
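To put a number on that, the file sizes can be summed and compared with awk.  A small sketch, assuming GNU stat; the function name is my own:

```shell
# Estimate the MPEG-TS muxing overhead: sum the sizes of the input files
# with awk and compare them against the size of the combined stream file.
# Usage: mux_overhead STREAMFILE PARTFILE...
mux_overhead() {
    local whole parts
    whole=$(stat -c%s "$1"); shift
    parts=$(stat -c%s "$@" | awk '{ s += $1 } END { print s }')
    echo "$parts $whole" | awk \
        '{ printf "parts: %d  stream: %d  overhead: %.1f%%\n", $1, $2, 100 * ($2 - $1) / $1 }'
}
```

For example, mux_overhead s4.mp4 c?.mp4 reports about a 10% overhead for the files listed above.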

 

Analyze the output stream file using ffmpeg

Run "ffmpeg -i INPUT" (without specifying an output) to see which program IDs and stream IDs it contains:

$ ffmpeg -i s4.mp4
ffmpeg version 3.4.2-2 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.3.0-16ubuntu2)
  configuration: --prefix=/usr --extra-version=2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-
...snip...
Input #0, mpegts, from 's4.mp4':
  Duration: 00:00:16.60, start: 1.480000, bitrate: 3673 kb/s
  Program 1
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
    Stream #0:0[0x100]: Video: h264 (High 4:2:2) ([27][0][0][0] / 0x001B), yuvj422p(pc, progressive), 640x480 [SAR 1:1 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x101]: Video: h264 (High 4:2:2) ([27][0][0][0] / 0x001B), yuvj422p(pc, progressive), 960x540 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:2[0x102]: Video: h264 (High 4:2:2) ([27][0][0][0] / 0x001B), yuvj422p(pc, progressive), 1024x576 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:3[0x103]: Video: h264 (High 4:2:2) ([27][0][0][0] / 0x001B), yuvj422p(pc, progressive), 1280x720 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc

Running ffmpeg with no output shows that the substreams have different resolutions and correspond to the original four files (640x480, 960x540, 1024x576, 1280x720).
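For use in scripts, ffprobe (part of the FFMPEG suite) can produce the same stream list in machine readable form.  A sketch, with a helper name of my own choosing:

```shell
# Print one line per substream: index,codec_type,width,height
# (the size fields are simply absent for audio and data streams).
list_streams() {
    ffprobe -v error -show_entries stream=index,codec_type,width,height \
        -of csv=p=0 "$1"
}
```

list_streams s4.mp4 should print four lines such as 0,video,640,480, one per substream.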

 

Play or extract specific substreams

Play the best substream with SDL (with no -map option, ffmpeg picks the highest resolution video stream by default, which is the uxga capture):
$ ffmpeg -i s4.mp4 -pix_fmt yuv420p -f sdl "SDL OUT"

Play the first substream (vga):
$ ffmpeg -i s4.mp4 -pix_fmt yuv420p -map v:0 -f sdl "SDL OUT"

Use -map v:0 through -map v:3 to play or extract the different video substreams.

Add audio and data to the mpegts stream:

Make two audio test files:
$ espeak "audio channel one, audio channel one, audio channel one" -w audio1.wav
$ espeak "audio channel two, audio channel two, audio channel two" -w audio2.wav


Convert the files from wav to m4a to be compliant with STANAG 4609:
$ ffmpeg -i audio1.wav -codec:a aac audio1.m4a
$ ffmpeg -i audio2.wav -codec:a aac audio2.m4a

Make two data test files:
$ echo "Data channel one. Data channel one. Data channel one." > data1.txt
$ echo "Data channel two. Data channel two. Data channel two." > data2.txt

 

Map video, audio and data into the mpegts stream

Map three video camera input files, two audio and one data stream into one mpegts stream:
$ ffmpeg -i c1.mp4 -i c2.mp4 -i c3.mp4 -i audio1.m4a -i audio2.m4a -f data -i data1.txt -map 0:v -map 1:v -map 2:v -map 3:a -map 4:a -map 5:d -c:v copy -c:d copy -f mpegts s6.mp4

(This shows that mapping data into a stream doesn't actually work yet - see below!) 

 

Verify the stream contents

See whether everything is actually in there:
$ ffmpeg -i s6.mp4
…snip...
[mpegts @ 0x55f2ba4e3820] start time for stream 5 is not set in estimate_timings_from_pts
Input #0, mpegts, from 's6.mp4':
  Duration: 00:00:16.62, start: 1.458189, bitrate: 2676 kb/s
  Program 1
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
    Stream #0:0[0x100]: Video: h264 (High 4:2:2) ([27][0][0][0] / 0x001B), yuvj422p(pc, progressive), 640x480 [SAR 1:1 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x101]: Video: h264 (High 4:2:2) ([27][0][0][0] / 0x001B), yuvj422p(pc, progressive), 960x540 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:2[0x102]: Video: h264 (High 4:2:2) ([27][0][0][0] / 0x001B), yuvj422p(pc, progressive), 1024x576 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:3[0x103](und): Audio: mp2 ([4][0][0][0] / 0x0004), 22050 Hz, mono, s16p, 160 kb/s
    Stream #0:4[0x104](und): Audio: mp2 ([4][0][0][0] / 0x0004), 22050 Hz, mono, s16p, 160 kb/s
    Stream #0:5[0x105]: Data: bin_data ([6][0][0][0] / 0x0006)

The ffmpeg analysis of the stream shows three video, two audio and one data substream.

 

Extract the audio and data from the stream

Extract and play one audio channel:
$ ffmpeg -i s6.mp4 -map a:0 aout1.m4a
$ ffmpeg -i aout1.m4a aout1.wav
$ play aout1.wav

and the other one:
$ ffmpeg -i s6.mp4 -map a:1 aout2.m4a
$ ffmpeg -i aout2.m4a aout2.wav
$ play aout2.wav

Extract the data

Extract the data using the -map d:0 parameter:
$ ffmpeg -i s6.mp4 -map d:0 -f data dout1.txt

...and nothing is copied.  The output file is zero length.

This means the original data was not inserted into the stream in the first place, so there is nothing to extract.

It turns out that while FFMPEG does support data copy, it doesn't support data insertion yet.  For the time being, one should either code it up in C using the API, or use Gstreamer to insert the data into the stream: https://developer.ridgerun.com/wiki/index.php/GStreamer_and_in-band_metadata#KLV_Key_Length_Value_Metadata

Extract KLV data from a real UAV video file

You can get a sample UAV observation file with video and metadata here:

$ wget http://samples.ffmpeg.org/MPEG2/mpegts-klv/Day%20Flight.mpg

Get rid of that stupid space in the file name:
$ mv "Day Flight.mpg" DayFlight.mpg

The above file is perfect for metadata copy and extraction experiments:
$ ffmpeg -i DayFlight.mpg -map d:0 -f data dayflightklv.dat
...snip
 [mpegts @ 0x55cb74d6a900] start time for stream 1 is not set in estimate_timings_from_pts
Input #0, mpegts, from 'DayFlight.mpg':
  Duration: 00:03:14.88, start: 10.000000, bitrate: 4187 kb/s
  Program 1
    Stream #0:0[0x1e1]: Video: h264 (Main) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1280x720, 60 fps, 60 tbr, 90k tbn, 180k tbc
    Stream #0:1[0x1f1]: Data: klv (KLVA / 0x41564C4B)
Output #0, data, to 'dayflightklv.dat':
  Metadata:
    encoder         : Lavf57.83.100
    Stream #0:0: Data: klv (KLVA / 0x41564C4B)
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=       1kB time=00:00:00.00 bitrate=N/A speed=   0x   
video:0kB audio:0kB subtitle:0kB other streams:1kB global headers:0kB muxing overhead: 0.000000%


Dump the KLV file in hexadecimal:
$ hexdump dayflightklv.dat
0000000 0e06 342b 0b02 0101 010e 0103 0001 0000
0000010 9181 0802 0400 8e6c 0320 8583 0141 0501
0000020 3d02 063b 1502 0780 0102 0b52 4503 4e4f
0000030 0e0c 6547 646f 7465 6369 5720 5347 3438
0000040 040d c44d bbdc 040e a8b1 fe6c 020f 4a1f
0000050 0210 8500 0211 4b00 0412 c820 7dd2 0413
0000060 ddfc d802 0414 b8fe 61cb 0415 8f00 613e
0000070 0416 0000 c901 0417 dd4d 2a8c 0418 beb1
0000080 f49e 0219 850b 0428 dd4d 2a8c 0429 beb1

...snip 
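Note that hexdump's default display groups the bytes into little-endian 16 bit words, so each pair appears swapped on screen: the "0e06 342b" above is really 06 0e 2b 34, the start of the KLV Universal Key.  The -C (canonical) flag shows single bytes in file order:

```shell
# hexdump without options prints little-endian 16 bit words, swapping
# each byte pair on screen; hexdump -C prints single bytes in file order.
printf '\x06\x0e\x2b\x34' > key.dat
hexdump key.dat       # first line reads: 0000000 0e06 342b
hexdump -C key.dat    # first line reads: 00000000  06 0e 2b 34
```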

Sneak a peek at interesting text strings:

$ strings dayflightklv.dat
KLVA'   

BNZ
Bms
JUD
07FEB
5g|IG

...snip

Cool, it works!

KLV Data Debugging

The KLV data is actually what got me started with this in the first place.   The basic problem is how to ensure that the GPS data is saved with the video, so that one can tell where the plane was and what it was looking at, when a recording is played back later.

The transport of KLV metadata over MPEG-2 transport streams in an asynchronous manner is defined in SMPTE RP 217 and MISB ST0601.8:
http://www.gwg.nga.mil/misb/docs/standards/ST0601.8.pdf

Here is a more human friendly description:
https://impleotv.com/2017/02/17/klv-encoded-metadata-in-stanag-4609-streams/

You can make a short form KLV Local Set test message using the echo \\x command to output binary values to a file.  Working with binary data in Bash is problematic, but one just needs to know what the limitations are: zero, line feed and carriage return characters may disappear, for example.  Don't store binary data in a shell variable (use a file) and don't do shell arithmetic on it; use the calculator bc or awk instead.
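A quick demonstration of that pitfall: a NUL byte survives in a file, but not in a shell variable.

```shell
# Three bytes written to a file keep their length:
printf 'a\x00b' > nul.dat
wc -c < nul.dat     # prints 3
# The same bytes through command substitution lose the NUL
# (newer bash versions print a warning about the ignored null byte):
v=$(printf 'a\x00b')
echo ${#v}          # prints 2
```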

The key, length and date are in this example, but I'm still working on the checksum calculation and the byte orders are probably not correct.  It only gives the general idea of how to do it at this point:

# Universal Key for Local Data Set
echo -en \\x06\\x0E\\x2B\\x34\\x02\\x0B\\x01\\x01 > klvdata.dat
echo -en \\x0E\\x01\\x03\\x01\\x01\\x00\\x00\\x00 >> klvdata.dat
# Length 76 bytes for short packet
echo -en \\x4c >> klvdata.dat
# Value: the first field is the time stamp: tag 2, length 8, 8 byte time
echo -en \\x02\\x08 >> klvdata.dat
# 8 byte big-endian time stamp in microseconds; bc does the arithmetic
# and printf converts the hex pairs back into binary
usec=$(echo "$(date +%s) * 1000000" | bc)
printf "$(printf '%016x' "$usec" | sed 's/../\\x&/g')" >> klvdata.dat
echo -en \\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09 >> klvdata.dat
echo -en \\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09 >> klvdata.dat
echo -en \\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09 >> klvdata.dat
echo -en \\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09 >> klvdata.dat
echo -en \\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09 >> klvdata.dat
echo -en \\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07\\x08\\x09 >> klvdata.dat
echo -en \\x00\\x01 >> klvdata.dat
# Checksum tag 1, length 2
echo -en \\x01\\x02 >> klvdata.dat
# Calculate 2 byte sum with bc
echo -en \\x04\\x05 >> klvdata.dat

The UTC time stamp (in microseconds since the Epoch, 1 Jan 1970) must be the first data field.  Beware that a plain printf writes the time as ASCII digit characters, not as an 8 byte binary integer, as a hexdump shows:
$ printf "%0d" "$(date +%s)" | hexdump
0000000 3531 3632 3237 3838 3030              

The checksum is a doozy.  It is a 16 bit sum of everything excluding the sum itself and would need the help of the command line calculator bc.  One has to read two bytes at a time, swap them around (probably), then convert the binary to hex text, do the calculation in bc and eventually output the data in binary back to the file.  I would need a very big mug of coffee to get that working.
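Here is a sketch of that calculation with od and awk instead of bc, so no binary data ever touches a shell variable.  It follows the MISB ST 0601 definition: a running 16 bit sum where bytes at even offsets (0-based) are added into the high byte and bytes at odd offsets into the low byte, taken over everything from the first byte of the Universal Key up to, but excluding, the two checksum value bytes.  The function name is my own:

```shell
# 16 bit MISB ST 0601 style checksum of a KLV packet file.
# Even offsets add into the high byte, odd offsets into the low byte,
# modulo 65536; the final two checksum value bytes are excluded.
klv_checksum() {
    # File sizes are plain numbers, so shell arithmetic is safe here:
    local len=$(( $(stat -c%s "$1") - 2 ))
    head -c "$len" "$1" | od -An -v -tu1 | tr -s ' ' '\n' | sed '/^$/d' |
    awk '{ if (NR % 2) s += $1 * 256; else s += $1; s %= 65536 }
         END { printf "%04x\n", s }'
}
```

klv_checksum klvdata.dat prints the two checksum bytes in hex; echo -en them onto the end of the file in place of the \x04\x05 placeholder above.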

Multicast Routing

Note that multicast routing is completely different from unicast routing.  A multicast packet still carries the sender's unicast source address, but its destination is a group address, and the Ethernet destination MAC is derived from that group address.  To receive a stream, a host has to subscribe to the group with IGMP.

Here, there be dragons.

If you need to route video between two subnets, then you should consider sparing yourself the headache and rather use unicast streaming.  Otherwise, you would need an expensive router from Cisco or HPE, or an OpenBSD machine running dvmrpd.

Linux multicast routing is not recommended, for three reasons: there is no documentation, the router code is unsupported, and it is buggy.  Windows cannot route multicast at all and FreeBSD needs a custom kernel build for multicast routing.  Only OpenBSD supports multicast routing out of the box.

Do not meddle in the affairs of dragons,
for you are crunchy
and taste good with ketchup.

Also consider that UDP multicast packets are normally sent with a Time To Live of 1, meaning that they will be dropped at the first router.  The sender therefore has to set a larger TTL before the stream can be routed at all.
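If you do stream multicast with ffmpeg, the TTL can be set on the sending side through the udp protocol options.  A sketch, where the group address, port and TTL value are example values of my own:

```shell
# Send the combined stream to a multicast group at the source frame rate.
# 239.0.0.1:1234 is an example group/port from the administratively
# scoped range; ttl=3 lets the stream cross up to three routers.
ffmpeg -re -i s4.mp4 -c copy -f mpegts "udp://239.0.0.1:1234?ttl=3"
```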

If you need to use OpenBSD, do get a copy of Absolute OpenBSD - UNIX for the Practically Paranoid, by M.W. Lucas.


Sigh...

Herman
