In-Game HD Video Recording using Real-Time YUYV-DXT Compression

Published March 28, 2010
This article describes a method of recording in-game HD videos without the large frame-rate impact of external video capture software.


The method used in this approach was inspired by an article about Real-Time YCoCg-DXT Compression which presented a real-time GPU compression algorithm to DXT formats.
Standard DXT texture formats aren't well suited to compressing general images such as game frames; the higher contrast results in artifacts like color bleeding and color blocking. The article introduced the YCoCg-DXT format, which encodes colors in the YCoCg color space (intensity plus orange and green chrominance). It also contains the source code for real-time GPU compression and a comparison of the achieved results.
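
For context, the YCoCg transform is a simple linear mapping; a minimal CPU-side sketch (float math, channels in [0,1] — names and structure are mine, not taken from the article's shader source):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Forward transform: RGB -> YCoCg (luma Y, orange chroma Co, green chroma Cg).
Vec3 rgb_to_ycocg(Vec3 c) {
    return { 0.25f * c.x + 0.5f * c.y + 0.25f * c.z,
             0.5f  * c.x              - 0.5f  * c.z,
            -0.25f * c.x + 0.5f * c.y - 0.25f * c.z };
}

// Inverse: YCoCg -> RGB is just additions and subtractions, which is why
// decoding back to RGB costs only a few shader instructions.
Vec3 ycocg_to_rgb(Vec3 c) {
    return { c.x + c.y - c.z,
             c.x + c.z,
             c.x - c.y - c.z };
}
```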

The YCoCg format is suitable for decompression on the GPU, because decoding YCoCg values back to RGB takes only a few shader instructions. However, for the purpose of decoding the frame data in a video codec, a YUV-based format is better, since it allows decoding the data directly to the video surface without additional conversions. The best format for this seemed to be YUYV at 16 bits per pixel, which means there is one U and one V value per two horizontal samples.
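
As an illustration, a CPU-side sketch of the conversion and packing step — the article does not specify which YUV variant the engine uses, so the full-range BT.601-style coefficients below are an assumption:

```cpp
#include <cstdint>

static uint8_t clamp_u8(float x) {
    int i = (int)(x + 0.5f);
    return (uint8_t)(i < 0 ? 0 : i > 255 ? 255 : i);
}

// Full-range BT.601-style RGB -> YUV for 8-bit channels (assumed coefficients).
static void rgb_to_yuv(uint8_t r, uint8_t g, uint8_t b,
                       uint8_t& y, uint8_t& u, uint8_t& v) {
    y = clamp_u8( 0.299f * r + 0.587f * g + 0.114f * b);
    u = clamp_u8(-0.169f * r - 0.331f * g + 0.500f * b + 128.0f);
    v = clamp_u8( 0.500f * r - 0.419f * g - 0.081f * b + 128.0f);
}

// Pack two horizontally adjacent RGB pixels (rgb[0..2] and rgb[3..5]) into
// one 4-byte Y0 U Y1 V macropixel; U and V are shared by the pair, giving
// an average of 16 bits per pixel.
static void pack_yuyv(const uint8_t rgb[6], uint8_t out[4]) {
    uint8_t y0, u0, v0, y1, u1, v1;
    rgb_to_yuv(rgb[0], rgb[1], rgb[2], y0, u0, v0);
    rgb_to_yuv(rgb[3], rgb[4], rgb[5], y1, u1, v1);
    out[0] = y0;
    out[1] = (uint8_t)((u0 + u1 + 1) / 2);
    out[2] = y1;
    out[3] = (uint8_t)((v0 + v1 + 1) / 2);
}
```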
The compression algorithm differs from the YCoCg-DXT one in the initial color space conversion to YUYV, and in that it encodes the 4x4 YY, U and V blocks the way the alpha component is encoded in the DXT5 format.
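
To sketch what "encoded the way the DXT5 alpha component is" means: each 4x4 block of 8-bit samples is stored as two endpoint values plus sixteen 3-bit indices into an 8-entry palette interpolated between the endpoints. A simplified CPU model (the real encoder runs in a fragment shader, packs the indices into 48 bits, and DXT5's special a0 <= a1 mode is ignored here):

```cpp
#include <algorithm>
#include <cstdint>

// Simplified DXT5-style alpha block: two 8-bit endpoints plus sixteen
// 3-bit indices (kept unpacked here for clarity).
struct AlphaBlock {
    uint8_t a0, a1;
    uint8_t idx[16];
};

static void build_palette(uint8_t a0, uint8_t a1, uint8_t pal[8]) {
    pal[0] = a0;
    pal[1] = a1;
    for (int k = 2; k < 8; ++k)
        pal[k] = (uint8_t)(((8 - k) * a0 + (k - 1) * a1) / 7);
}

// Encode one 4x4 block of 8-bit samples (a Y, U or V block of the frame).
AlphaBlock encode_block(const uint8_t s[16]) {
    AlphaBlock b;
    b.a0 = *std::max_element(s, s + 16);  // a0 > a1 selects the 8-value mode
    b.a1 = *std::min_element(s, s + 16);
    uint8_t pal[8];
    build_palette(b.a0, b.a1, pal);
    for (int i = 0; i < 16; ++i) {        // nearest palette entry per sample
        int best = 0, bestd = 256;
        for (int j = 0; j < 8; ++j) {
            int d = s[i] > pal[j] ? s[i] - pal[j] : pal[j] - s[i];
            if (d < bestd) { bestd = d; best = j; }
        }
        b.idx[i] = (uint8_t)best;
    }
    return b;
}

uint8_t decode_sample(const AlphaBlock& b, int i) {
    uint8_t pal[8];
    build_palette(b.a0, b.a1, pal);
    return pal[b.idx[i]];
}
```

Each encoded block is 8 bytes for 16 samples, i.e. 4 bits per sample.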

The algorithm is as follows:

  • Video frames are compressed to the YUYV-DXT format with a fragment shader using the render-to-texture technique, reducing the data to one third of its original size

  • The compressed textures are asynchronously read back to CPU

  • The data are continuously written to disk


The compression on the GPU reduces the bandwidth needed between the CPU and GPU, but more importantly also the bandwidth needed for disk writes. The sustainable write speed of a SATA drive is somewhere around 55 MB/s; transferring a raw 1280x720/30fps video takes 79.1 MB/s, while the DXT-compressed video takes only 26.4 MB/s. A compressed Full-HD video stream takes 59.3 MB/s.
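
These figures follow from simple arithmetic, assuming 24-bit RGB raw frames, 8 bits per pixel after compression (hence the 1/3 ratio), and 1 MB = 2^20 bytes:

```cpp
// Data rate in MB/s (1 MB = 2^20 bytes) for a video stream at the given
// resolution, frame rate and bits per pixel.
double stream_mbps(int w, int h, int fps, double bits_per_pixel) {
    double bytes_per_sec = (double)w * h * fps * bits_per_pixel / 8.0;
    return bytes_per_sec / (1024.0 * 1024.0);
}
// stream_mbps(1280,  720, 30, 24.0) -> ~79.1  (raw 24-bit RGB 720p)
// stream_mbps(1280,  720, 30,  8.0) -> ~26.4  (YUYV-DXT 720p)
// stream_mbps(1920, 1080, 30,  8.0) -> ~59.3  (YUYV-DXT Full HD)
```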

To capture the frame buffer data, the application first renders to an intermediate target. The compression shader uses this as its input texture, rendering to a uint4 target at one quarter of the original width and height, which is then read back to CPU memory.

The next step is decoding the captured video. To make this easy I've written a custom video codec and video format plugin for the ffmpeg library. The format was named Yog (from YCoCg), as the encoding was originally in the YCoCg format and was only later changed to YUYV.
The game produces *.yog video files that can be replayed directly with ffplay or converted to another video format with the ffmpeg utility. They are also recognized by any video processing software that uses the ffmpeg or ffplay executables, or the avcodec and avformat DLLs from the suite, such as WinFF, FFe and many others.

Results


After starting the video recording in our game, the frame rate drops by only a few fps and the game remains normally playable, unlike when recording with, for example, Fraps. The disadvantage is that the method has to be integrated into the renderer path.
Quality-wise the results are quite good, as can be seen in the following screenshots:

Original


YUYV compressed; note this is slightly lighter because of an issue in ffmpeg that has yet to be resolved.


The difference, 4X amplified


The source code and further implementation details can be found at outerra.com/video/index.html

Comments

tuan kuranes
Very Interesting, thanks for sharing.

Can you elaborate a bit more on the asynchronous readback ?
(opengl pbo ? dx9 trick? dx10 trick? special directx texture mode or fourCC thingy? something listed in here ?)

Or perhaps just emitting a Hardware Occlusion Query at frame start and waiting for all pixels to be drawn before issuing the read-back of the Render Texture is sufficient?
(would combine speed of FBO + asynchronous read-back...)

I'm sure you're aware of that, but just in case, you can even go one step further and have automatic posting on YouTube for instance ( as documented/coded here in this article && documented here youtube side )
March 28, 2010 09:41 AM
cameni
We are using OpenGL PBOs, emitting the readback after the frame was rendered to the intermediate target, and mapping the memory after a fixed number of frames. One frame later seems to be enough for us, but it can be easily set to more as the buffers are pooled.

The integration with YouTube will be interesting later for the players, thanks for the links.
March 28, 2010 12:29 PM
Prinz Eugn
You are fucking ridiculous.
March 28, 2010 04:48 PM
cameni
Quote:Original post by Prinz Eugn
You are fucking ridiculous.
I had to look this up in urban dictionary .. it was above my ability to understand English [wink]
March 28, 2010 11:50 PM
dgreen02
Awesome :-D
April 04, 2010 06:12 PM
cameni
Quote:Original post by dgreen02
Awesome :-D
Now this one was easy, thanks [wink]
April 05, 2010 12:57 PM
Matias Goldberg
Sweeet!

But I wonder if this is doable under DX9 (no async copies probably)

Cheers
Dark Sylinc
April 10, 2010 05:56 PM