Intermittent Issues: The H.26x Format
By Ben Gruchow
June 8, 2016
BoxOfficeProphets.com


This is going to function as a kind of intermediary Intermittent Issues column before the final installment of the four (five)-part series initiated last year, because it's much easier to talk about the next wave(s) of home cinema when you can shorthand the different formats.

ITU Telecommunication Standards

The H.26x family of video standards, which comprises requirements and parameters for digital video up to and including current HD and Ultra HD formats (essentially, it's the thing that makes most of the HD content you watch possible), comes from the International Telecommunication Union's Standardization Sector, or ITU-T. Its predecessor was the H.120 standard; H.120 found application mostly in videoconferencing, and never received much in the way of wide support, owing to coding strategies and protocols that varied by region. The quality of the video itself, owing primarily to poor temporal filtering, was also far enough below grade to be a liability on its own.

H.261 was developed on three core principles: the discrete cosine transform (DCT), a method of converting a signal into elementary frequency components; differential pulse code modulation (DPCM), which effects a sort of prediction of future samples based on data from past samples; and motion compensation, which anticipates movement in the frame based on past frames and a pre-defined set of logical properties, flagging the areas of a frame affected by motion to be encoded and leaving the unaffected areas alone. Motion compensation, when done properly, is incredibly advantageous for efficient video compression. A 2006 presentation on a 2002 Polytechnic University paper helpfully communicates the ratio of efficiency in broad language: proper motion compensation can improve compression efficiency by a factor of five to ten.
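If you want to see the motion-compensation idea in something more concrete than prose, here's a minimal sketch in Python - not anything lifted from the H.261 spec, just an illustration of exhaustive block matching, with the block size and search range chosen arbitrarily:

```python
import numpy as np

def best_motion_vector(prev_frame, curr_frame, top, left, block=16, search=8):
    """Find the (dy, dx) offset in prev_frame that best predicts one block of
    curr_frame, scored by sum of absolute differences (SAD)."""
    target = curr_frame[top:top + block, left:left + block]
    best, best_sad = (0, 0), np.inf
    h, w = prev_frame.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block would fall outside the frame
            candidate = prev_frame[y:y + block, x:x + block]
            sad = np.abs(target.astype(int) - candidate.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Toy frames: the second frame is the first one shifted right by 3 pixels.
prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, 3, axis=1)
print(best_motion_vector(prev, curr, top=16, left=16))
# expect a motion vector of (0, -3) and a SAD of zero
```

Instead of re-encoding the whole block, an encoder can transmit that one small motion vector plus whatever residual remains - which is where the five-to-ten-fold advantage comes from.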

These three principles were transformative; they paved the way for H.261 to become adopted on a widespread level, and all subsequent major video coding standards have been based on them. H.261 also made use of a type of hierarchy that has mostly carried over to the way digital content displays images now; this hierarchy centers around blocks - some concerned with luminosity and others concerned with color (i.e. images are made with the use of light and color; this hierarchy quite literally constitutes the building blocks of how we view HD media. I'm going to go sit and face the corner now until I feel less shameful about that pun). The items that occupy the upper level of this hierarchy are GOBs, or groups of blocks; each group of blocks is made up of a number of MBs, or macroblocks; and each macroblock is made up of four luminosity blocks (luma blocks) and two color blocks (chroma blocks). Together, these blocks express the YCbCr color space (where Y stands for luminosity, Cb for chroma-blue, and Cr for chroma-red), and the ratio of blocks illustrates a concept called chroma subsampling, which - without going into a dedicated article about how much more sensitively the human eye perceives light than color - halves the resolution of the chroma channels both horizontally and vertically. In the notation, that works out to four samplings of luminosity across a row, two samplings of color in the first row of the block, and zero new samplings of color in the second - or 4:2:0.
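For the curious, here's a rough sketch of what that color space and subsampling arrangement look like in code. The conversion coefficients are one common (BT.601-style) choice and an assumption on my part, not something the H.26x documents dictate:

```python
import numpy as np

def rgb_to_ycbcr_420(rgb):
    """Convert an RGB image (H x W x 3, values 0-255) to a full-resolution luma
    plane plus chroma planes subsampled 2x in both directions (4:2:0)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # BT.601-style full-range conversion (an assumption; real pipelines vary).
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128
    # 4:2:0: keep every luma sample, but average chroma over 2x2 neighborhoods,
    # halving chroma resolution horizontally AND vertically.
    cb_sub = cb.reshape(cb.shape[0] // 2, 2, cb.shape[1] // 2, 2).mean(axis=(1, 3))
    cr_sub = cr.reshape(cr.shape[0] // 2, 2, cr.shape[1] // 2, 2).mean(axis=(1, 3))
    return y, cb_sub, cr_sub

frame = np.random.rand(8, 8, 3) * 255
y, cb, cr = rgb_to_ycbcr_420(frame)
print(y.shape, cb.shape, cr.shape)   # (8, 8) (4, 4) (4, 4)
```

Notice that the two chroma planes come out at a quarter of the luma plane's area - the eye barely registers the loss, and the encoder gets a 50% reduction in raw samples before it has done any real compression at all.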

This is important to note here because this arrangement of color and light applies to almost every variant of H.26x; the more recent codecs build off of this scheme rather than replace it.

H.262: MPEG-2, parts 2, 3, and 7

If you have ever played a DVD in your home (and the talking-head oracle on my chamber desk assures me that many of you have), then you're familiar with MPEG-2, even if you're not aware of it. That's the codec's claim to fame, really: it's the video and audio standard that brought the home theater ethos to the population on a massive scale. LaserDisc, DVD's clear predecessor from an aficionado standpoint, already had multitrack audio, widescreen, and special features (not to mention nonlinear navigation of the disc), but DVD contained the digital video advancement (LaserDisc video was still analog) necessary to shrink the physical size of the thing and make it palatable to the casual viewer.

H.262 was developed as MPEG-2 Part 2, to occupy a stronger and more versatile spot in the industry. It contained the ability to display video in interlaced and progressive scan (LaserDisc couldn't do progressive, and MPEG-1 couldn't handle interlaced), and it leaned heavily on a newer type of frame. Early digital video encoding had made use of two types of frames. The first is the I-frame (intra), also called the reference frame, which consists of an entire, fully-rendered picture - something similar to a key frame in animation. It is the frame type that carries the most data, and consequently is the most space-intensive. The second type is the P-frame (predictive), also called the delta frame, which slots into a later frame position and increases efficiency by containing only the elements that have changed since the preceding reference frame. The third type, introduced with MPEG-1 and carried forward by H.262, is the B-frame, or bi-directionally predictive frame. This frame increases efficiency further by containing only the elements that change relative to the frames preceding and following it. The more P- and B-frames a piece of digital video contained, the more it could pack into a given amount of space - and, because so many frames now depended on their neighbors, the less able the user was to manipulate or edit the video in any way.
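One way to see why a B-frame-heavy stream resists editing is to look at the mismatch between the order frames are displayed in and the order a decoder actually needs them in. The little sketch below uses a made-up group-of-pictures pattern purely for illustration:

```python
# Display order of a small group of pictures (GOP); the pattern is invented.
display_order = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]

def decode_order(frames):
    """Reorder frames so each B-frame comes after the forward reference
    (the next I- or P-frame) it depends on - the order a decoder needs."""
    out, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)    # hold B-frames until their forward reference arrives
        else:
            out.append(f)          # I- and P-frames can be decoded immediately
            out.extend(pending_b)  # the held B-frames now have both references
            pending_b = []
    return out + pending_b

print(decode_order(display_order))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

Cutting the stream at an arbitrary point strands B- and P-frames whose references sit on the other side of the cut, which is exactly why heavily predicted video is cheap to store and awkward to edit.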

Part 2 also incorporated some HD-related elements at an embryonic level. Of the several "profiles" available for MPEG-2 encoding, two allowed for scalability; we'll talk more about scalability when we theoretically arrive at the point of this column. The other HD-related element involved the two highest coding "levels" within a profile, which provided HDTV-compliant resolutions to the encode. Neither of these levels saw much use on consumer discs, but the capability was there.

MPEG-2 Part 3 encompasses the audio coding definitions; it allows for multichannel audio up to a 5.1 configuration, and that capability is largely why it underpinned the (limited) advancement of audio DVDs. It also defines additional parameters and bitrates for MPEG-1 Audio Layer III, more commonly known as MP3. It's a substantive part of the MPEG-2 "build," but a relatively obscure one. Less so is Part 7, also known as Advanced Audio Coding, or AAC. Part 7 was designed as a replacement for MP3, being more efficient as far as compression goes while also allowing for many more discrete channels of audio and data.


H.263: Reengineered videoconferencing and early mobile applications

We won't spend much time on H.263. It's an important intermediary step between the H.262 standard and the relative game-changer that was H.264, and it was crucial to the advent of the streaming video sites that form one of HD cinema's multiple arms, but it doesn't have much to do with HD cinema itself directly; instead, H.263 was mostly about improving compression efficiency and the scalability of video quality. The codec is Web-oriented in a way that its predecessors were not. H.263 was never going to be used as a standard for physical media or home entertainment; its widest audience was going to be found in "new media" - an ethereal concept in the mid-1990s when the standard was established, and still slippery and insubstantial as a general term ten years later. Perhaps its biggest "get" was its use in encoding Flash video for YouTube and MySpace (mentioning the latter should let you place that moment in a fairly specific time frame).

H.263 also includes several improvements to predictive encoding; like H.262, it possesses B-frames as a way to retain maximum visual information at minimum storage cost. It also incorporates layering via extensions, which was a boon to scalability. As we mentioned, scalability refers to the ability of a video to sustain a certain resolution while intelligently degrading quality depending on bandwidth, device, or provider choice. Layering is a huge aspect of scalability in its modern form, and you can spot it in just about any current type of subscription media. Netflix is probably the easiest place to see it, where it's pretty frequent to start up a movie or show and behold a choppy, blocky image that quickly refines itself and gains clarity and resolution. Each instance of image improvement is essentially a layer being applied to the base image, changing the acceptable protocol and bitrate depending on the scenario. In this way, it's not dissimilar from MPEG-2's levels.
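To make the layering idea a bit more tangible, here's a toy sketch of how a player might decide how many layers it can afford under a given bandwidth budget. The layer names, bitrates, and resolutions are invented for illustration and don't describe any particular service's actual mechanism:

```python
# Hypothetical scalable layers: a base layer plus enhancement layers, each with
# an invented bitrate (kbps) and the resolution it unlocks.
LAYERS = [
    ("base",      1500, "480p"),
    ("enhance-1", 2500, "720p"),
    ("enhance-2", 4500, "1080p"),
]

def playable_layers(bandwidth_kbps):
    """Accumulate layers (base first) until the next one would exceed the
    available bandwidth - quality ratchets up as bandwidth allows."""
    chosen, total = [], 0
    for name, kbps, resolution in LAYERS:
        if total + kbps > bandwidth_kbps:
            break
        chosen.append((name, resolution))
        total += kbps
    return chosen

print(playable_layers(4500))   # base + one enhancement -> up to 720p
print(playable_layers(9000))   # all three layers -> 1080p
```

The base layer is always decodable on its own; everything above it is gravy that the connection may or may not be able to deliver, which is the blocky-then-sharp behavior you see when a stream starts up.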

H.264

H.264 is a product of the MPEG-4 video standard; it is also known as MPEG-4 Part 10, or by the acronym AVC (Advanced Video Coding). Put simply, it is the reason we have the video we do on Blu-ray discs, computers, tablets, and smartphones. It's the underpinning of HD in the home, of contemporary Internet streaming video, and of mobile services. And except for the standard covered in the final segment of this column, it is the only one capable of displaying content at Ultra HD-level resolution.

There are a number of core enhancements inherent to H.264 that set it above its predecessors; chief among them is a targeted reduction in bitrate, compensated for by more sophisticated means of arranging each frame. A big component of this arrangement is the introduction of variable block sizes. We talked about blocks already; the H.26x standard is built on the foundation of luma and chroma blocks. With H.264, these blocks can assume varying shapes and proportions in accordance with what the predictive framing requires. This results in a savings of approximately 15 percent in bitrate, without a discernible compromise in quality.
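Here's a toy sketch of the kind of decision variable block sizes enable - splitting a block into smaller pieces only where a simple prediction fails. The threshold, sizes, and the split-into-quadrants scheme are invented for illustration; real encoders weigh rate against distortion rather than raw residual energy:

```python
import numpy as np

def partition(block, top=0, left=0, min_size=4, threshold=500.0):
    """Toy block-partition decision: if a block is poorly described by its own
    mean (high residual energy), split it into four quadrants and recurse.
    Returns a list of (top, left, size) tuples."""
    residual_energy = np.sum((block - block.mean()) ** 2)
    size = block.shape[0]
    if residual_energy <= threshold or size <= min_size:
        return [(top, left, size)]            # keep this block whole
    half = size // 2
    parts = []
    for dy in (0, half):
        for dx in (0, half):
            sub = block[dy:dy + half, dx:dx + half]
            parts += partition(sub, top + dy, left + dx, min_size, threshold)
    return parts

# A 16x16 block that is flat except for a busy bottom-right corner:
blk = np.zeros((16, 16))
blk[8:, 8:] = np.random.rand(8, 8) * 255
print(partition(blk))   # the busy corner splits into small blocks; the rest stays coarse
```

Flat areas get described with a few big, cheap blocks, while detailed areas get the fine-grained treatment - which is where that roughly 15 percent of bitrate goes back into the encoder's pocket.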

H.264 was given an extension in 2009 to allow for multiview applications - a fancy way of saying that it accepts coding of content acquired from multiple cameras - which was useful for a predicted boom in 3-D stereoscopic films. There are several other instances of increased efficiency within the H.264 codec - a deblocking filter for reducing the amount of visible macroblocking without losing detail in the image, weighted and quarter-pixel predictive capabilities for P- and B-frames - but the main takeaway from the standard is the enormous leap forward it brought for HD in the home, and the way it paved for what's here now.
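As a quick aside on what "quarter-pixel" means in practice: the encoder can point a motion vector at positions between pixels, whose values have to be interpolated. The sketch below uses plain linear interpolation as a stand-in for illustration only; H.264's actual half-pel interpolation uses a longer six-tap filter:

```python
import numpy as np

def quarter_pel_row(row):
    """Illustrative stand-in for sub-pixel motion precision: estimate values at
    quarter-pixel positions along one row by linear interpolation."""
    positions = np.arange(0, len(row) - 1 + 0.25, 0.25)   # 0, 0.25, 0.5, ...
    return np.interp(positions, np.arange(len(row)), row)

pixels = np.array([10.0, 50.0, 30.0])
print(quarter_pel_row(pixels))
# [10. 20. 30. 40. 50. 45. 40. 35. 30.] -- positions the encoder can "point at" between pixels
```

Finer motion-vector precision means the prediction lines up more exactly with the actual movement, leaving a smaller residual to encode.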

H.265/HEVC

That "now" is the ultimate subject of our column; it's H.265, or HEVC, for High Efficiency Video Coding (the instincts of the ITU's naming committee are a reminder that clarity occasionally comes at the expense of creativity, for better or worse). The standard was approved in April 2013. Put very simply, H.265 improves compression efficiency by up to 50% over AVC without perceptible quality loss, opening the door for storing content at 4K/Ultra HD-level resolution in mobile, streaming, and physical-media applications while keeping bandwidth usage and processing times reasonable. Improvements to the process and extensions to the codec will continue to boost quality and lower bitrate, as happened with H.262 and H.264.

To that end, H.265 pulls a lot of its structural makeup from H.264. A few features have been simplified; most, however, have been geared toward streamlining more complex coding work. For example, H.264 utilized a method of spatial prediction using pixel values from adjacent decoded blocks in order to fine-tune its image, and it did this with a total of eight directional modes. To visualize this, picture three rows of three dots each. The dot in the center is the pixel asking for "hints" as to prediction values; the eight dots surrounding it are what provide those hints. H.265 utilizes the same basic scheme, but with 35 prediction modes instead of eight; this means there are more than four times the resources for that central dot to pull information from. It's picking a subject from a single-volume encyclopedia versus picking a subject from the Britannica Global Edition.
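To make the "hints" idea concrete, here's a toy version of intra prediction with just three modes - vertical, horizontal, and a flat "DC" average - picking whichever one predicts the block with the least error. Real encoders evaluate far more modes and use rate-distortion costs, so treat this purely as a sketch:

```python
import numpy as np

def predict_block(above, left, size, mode):
    """Build a size x size prediction from the row of pixels above the block
    and the column to its left, for three toy intra modes."""
    if mode == "vertical":       # copy the row above straight down
        return np.tile(above, (size, 1))
    if mode == "horizontal":     # copy the left column straight across
        return np.tile(left[:, None], (1, size))
    return np.full((size, size), (above.mean() + left.mean()) / 2)   # "DC": flat average

def best_intra_mode(block, above, left):
    """Pick whichever toy mode predicts the actual block with the lowest SAD."""
    scores = {}
    for mode in ("vertical", "horizontal", "dc"):
        pred = predict_block(above, left, block.shape[0], mode)
        scores[mode] = np.abs(block - pred).sum()
    return min(scores, key=scores.get), scores

# A block whose columns simply repeat the row above it should prefer vertical prediction.
above = np.array([10.0, 20.0, 30.0, 40.0])
left = np.array([10.0, 10.0, 10.0, 10.0])
block = np.tile(above, (4, 1))
print(best_intra_mode(block, above, left))
```

Going from a handful of modes to 35 simply gives the encoder many more candidate "shapes" to try before it has to fall back on encoding the difference the hard way.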

H.265 utilizes thirteen levels of encoding; given the available bitrates, Levels 5 and 6 and their derivatives are likely to be the most popular during the first phase of the codec's lifespan, offering displays of 1080p at 128-300 frames per second, 2160p at 30-120 frames per second, and 4320p at 30-120 frames per second, depending on the level and derivative. There are three profiles for HEVC: a standard "Main" profile (8 bits, 4:2:0 chroma sampling); a "Main 10" profile (10 bits, 4:2:0 chroma sampling, with a bitrate reduction of about 5% over the Main profile); and "Main Still Picture," a subset of the Main profile used for still image coding. There are additional profiles in development, including profiles for 12-bit decoding and 4:4:4 full chroma sampling.

What we see with the H.26x standard, cultivated chiefly through H.262, H.264, and H.265, is a format given to evolution rather than redefinition. A trend toward increased coding efficiency is the clearest improvement here, with the same general architecture producing, on average, a halving of bitrate for equivalent video quality every 10 years over the past three decades. The trends going into the future will focus on further improvements to compression efficiency and spatial resolution. It's worth noting that most of the efficiency tests conducted for H.265 (as of 2014, at least) were conducted with HD content (720p or 1080p), despite one of the stated goals of the format being to realize visuals beyond high definition (I bet the original packaging for the first wave of Blu-ray discs is feeling the crushing weight of that dated hyperbole now, and no, I will not let it go). Tests of H.265 with 4K content at 24 and 30 frames per second have been conducted since, and the format has been shown to handle the visual requirement easily. Support for greater resolutions and higher detail is baked into the H.265 mission statement, which should afford the codec a greater degree of permanence and flexibility.

Ultimately, Ultra HD is another stepping-stone; the end goal here is, I assume, total perceptual immersion into the atmosphere of whatever film or event you're watching. The implications of this are varied; the mind reels at the thought of being much closer to anything made by Zack Snyder or Michael Bay, for example. I'm reminded of The Pedestrian, a Ray Bradbury short story that I read all the way back in middle school. The story follows Leonard Mead, a man in a city of millions who takes walks outside in total solitude; everyone else stays inside, watching TV on giant wall screens. I don't think we'll get to that point; HDTV's market penetration is still spotty enough to blunt any real threat of an entirely disconnected society. Besides, it's far more likely that we'd just stay buried in our smartphones.