The Lantern Festival is approaching, which means the "New Year" is almost over. While the afterglow of the Spring Festival Gala has not yet faded, let's look back at a technology that supported the live broadcast of this "cultural New Year's Eve feast": ultra-high-definition shallow compression.
At first glance, this technology sounds impressive, but what exactly is so amazing about it?
Ultra-high-definition" is easy to understand, which is nothing more than the high definition of the ** picture, with a resolution of 4K or even 8K.
So what does "shallow compression" mean here? Let's start with some basic concepts.
* Some basic concepts.
A video, in essence, is a rapid succession of independent still pictures, exploiting the eye's persistence of vision to create the sensation of motion.
Each still picture is called a "frame", and the number of frames shown per second is the "frame rate", measured in fps (frames per second). Generally speaking, once the frame rate rises above roughly 10 to 12 fps, the eye perceives the pictures as continuous motion.
To get smoother, more refined motion, frame rates such as 24 fps, 25 fps, and 30 fps are commonly used in film and television production. Even higher frame rates, such as 60 fps and 120 fps, are used to record footage intended for slow-motion playback.
With so many images every second, a video is generally far larger than a single picture, and the bandwidth needed to play it smoothly is correspondingly large.
Without any processing, a raw video at 4K resolution and 30 fps theoretically needs close to 6 Gbps of bandwidth.
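As a quick sanity check, here is a minimal back-of-the-envelope calculation (assuming 8-bit RGB, i.e. 24 bits per pixel, with no chroma subsampling; the exact figure depends on the pixel format):

```python
# Raw bandwidth of 4K @ 30 fps, assuming 24 bits per pixel (8-bit RGB).
width, height = 3840, 2160        # 4K UHD resolution
bits_per_pixel = 24               # 8 bits each for R, G, B
fps = 30

bits_per_frame = width * height * bits_per_pixel
raw_bandwidth_bps = bits_per_frame * fps

print(f"{raw_bandwidth_bps / 1e9:.2f} Gbps")   # -> 5.97 Gbps
```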
That is sheer bandwidth gluttony. Over short distances it can still be carried by a wired link, but wireless transmission is extremely difficult; even a network as capable as 5G would be stretched thin.
Therefore, the video must be compressed.
* Deep compression.
Why can it be compressed? This is because there is a lot of redundant information within each frame and between multiple frames.
First, within each frame, adjacent pixels are strongly correlated in color: textures tend to change continuously, and the background may contain large areas of repeated color. With the right algorithms, this redundancy can be removed and the amount of data compressed.
Second, across frames, as long as there is no scene cut, most of the image content in adjacent frames is identical. Take a butterfly flying among flowers: apart from the butterfly's changing pose from frame to frame, everything else stays the same, so the data can be compressed.
Accordingly, there are two kinds of compression, "intra-frame" and "inter-frame": the encoder divides each frame into blocks of pixels, then compresses each block using the content of the current frame and of the preceding and following frames.
The main idea of intra-frame prediction is that image texture is continuous, so each frame contains a great deal of "spatial redundancy". Adjacent, already-decoded pixels can therefore be used to predict unknown pixels. In the actual encoding, only the residual block, left after subtracting the predicted block from the original block, needs to be processed, which effectively reduces the amount of data.
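Here is a toy numeric sketch of the idea: a simple "copy from the left" predictor, loosely inspired by horizontal intra modes, not the exact mode set of any real codec:

```python
import numpy as np

# Toy intra predictor: predict a 4x4 block by copying the column of
# already-decoded pixels to its left, then keep only the residual.
left_neighbors = np.array([100, 102, 101, 99])   # already-decoded column
original_block = np.array([
    [101, 100, 102, 101],
    [103, 102, 101, 103],
    [100, 101, 102, 100],
    [ 98,  99, 100,  98],
])

# Each row is predicted from the decoded pixel to its left.
predicted_block = np.tile(left_neighbors.reshape(4, 1), (1, 4))
residual = original_block - predicted_block      # small values, cheap to code
print(residual)
```

The residual values cluster around zero, which is exactly what makes them cheap to encode.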
The main idea of inter-frame prediction is that adjacent frames are strongly correlated, so there is a great deal of "temporal redundancy". The encoder simply finds where the current block sits in a reference frame, computes the corresponding displacement (the motion vector) and the residual between the two blocks, and then transmits only that small residual.
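A minimal sketch of this block-matching idea follows; real encoders use far more sophisticated search strategies, but the principle of "find the displacement, send the residual" is the same:

```python
import numpy as np

def find_motion_vector(block, reference, y, x, search_range=2):
    """Exhaustive block matching: slide the block over the reference frame
    around (y, x) and return the displacement with the smallest error."""
    h, w = block.shape
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + h > reference.shape[0] or rx + w > reference.shape[1]:
                continue
            candidate = reference[ry:ry + h, rx:rx + w]
            cost = np.abs(block - candidate).sum()   # sum of absolute differences
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv

# The bright "butterfly" block moved one pixel to the right between frames.
reference = np.zeros((6, 6), dtype=int); reference[2:4, 2:4] = 200
current   = np.zeros((6, 6), dtype=int); current[2:4, 3:5] = 200

mv = find_motion_vector(current[2:4, 3:5], reference, 2, 3)
print(mv)   # -> (0, -1): the block came from one pixel to the left
```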
In this way, the compression algorithm encodes the video content into I-frames and P-frames. An I-frame serves as a keyframe and must be decodable entirely on its own, so it can use only intra-frame prediction. A P-frame is a predicted frame: it records only what has changed since the preceding I-frame, may use both intra-frame and inter-frame prediction, but cannot be decoded without its I-frame.
In addition to I-frames and P-frames, there is a third type, the B-frame (bidirectionally predicted frame). A B-frame can reference frames both before and after it during encoding and decoding, so it needs even less data to transmit.
Since video content is continuous and therefore strongly correlated in the time domain, inter-frame compression is generally more efficient, so P-frames and B-frames are usually much smaller than I-frames.
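To make the size difference concrete, here is an illustrative sketch with hypothetical per-frame sizes (the specific numbers are invented, but the I ≫ P > B ordering matches what real encoders produce):

```python
# Illustrative GOP (group of pictures) with hypothetical per-frame sizes.
gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "P"]
typical_size_kb = {"I": 150, "P": 40, "B": 15}   # hypothetical sizes in KB

gop_total = sum(typical_size_kb[f] for f in gop)
all_intra = len(gop) * typical_size_kb["I"]
print(f"GOP total: {gop_total} KB vs all-I: {all_intra} KB "
      f"(~{all_intra / gop_total:.1f}x larger)")   # -> ~4.2x larger
```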
This one-two punch of intra-frame and inter-frame compression works extremely well: the original video can be shrunk dramatically with essentially no visible loss of image quality, which is why it is also called "deep compression".
Take the current mainstream coding standard H.264 as an example: a 1080p, 60 fps video, once compressed and encoded, needs only about 4 Mbps of network bandwidth for a good viewing experience on a phone.
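A rough calculation of the ratio implied by that example, again assuming full-rate 24-bit raw pixels (against the subsampled 4:2:0 sources codecs actually ingest, the ratio would be roughly half this):

```python
# Rough ratio implied by the H.264 example: raw 1080p @ 60 fps
# (assuming 24-bit pixels) versus a 4 Mbps encoded stream.
raw_bps = 1920 * 1080 * 24 * 60      # ~2.99 Gbps uncompressed
encoded_bps = 4e6                    # 4 Mbps after H.264 encoding

print(f"raw: {raw_bps / 1e9:.2f} Gbps, ratio ~1:{raw_bps / encoded_bps:.0f}")
```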
* The shortcomings of deep compression.
However, there are two sides to everything. What is the cost of deep compression?
The easiest one to think of is the access delay when joining a live stream.
As mentioned earlier, P-frames and B-frames depend on other frames for decoding. When a device tunes into a live stream, the join point is very likely to land on a predicted frame that cannot be decoded on its own, so the player has to wait for the next keyframe before it can start decoding and showing the picture, which inevitably introduces delay.
Imagine the interval between two I-frames is set to 2 seconds: if you are unlucky, you may have to wait up to 2 seconds after connecting to the live stream before you see anything.
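A quick Monte-Carlo sketch of this waiting time, under the simplifying assumption that viewers join at uniformly random moments:

```python
import random

# Simplified model: a viewer joins at a uniformly random moment within a
# 2-second keyframe interval and must wait for the next I-frame.
keyframe_interval = 2.0   # seconds between I-frames
waits = [keyframe_interval - random.uniform(0, keyframe_interval)
         for _ in range(100_000)]

print(f"average wait: {sum(waits) / len(waits):.2f} s, "
      f"worst case: {max(waits):.2f} s")   # ~1.00 s average, ~2.00 s worst
```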
This access delay for viewers is actually a minor problem; what is truly unacceptable is deep compression's impact on video production and broadcasting.
As we all know, the director plays a critical role in production and broadcasting. The main job of the broadcast control station is to bring together the video feeds from multiple cameras, and the director switches between them in real time according to what is happening on site.
Suppose that on the Spring Festival Gala stage an artist is singing, while multiple cameras shoot from different angles and send their feeds back to the broadcast station for production: switching, split screens, overlaying graphics, and so on.
If those video streams were deeply compressed, then when the director cuts the live picture from camera 1 to camera 2, the cut could easily land on a non-keyframe that cannot be decoded independently. And just like that, every viewer's screen freezes.
Therefore, deep compression is not appropriate in this scenario. Besides, the distance from the cameras to the broadcast station is usually short and easily covered by ultra-high-bandwidth wired links, so there is no real need to save bandwidth and thus no need for such extreme compression. "Shallow compression" technology is enough.
* Why shallow compression is needed.
So-called shallow compression simply abandons the inter-frame predictive coding that produces P-frames and B-frames and uses intra-frame coding throughout, so the resulting video consists entirely of I-frames, each of which can be decoded completely independently. The amount of data grows significantly, but the drawbacks of deep compression disappear.
If shallow compression is such a straightforward idea, why did it still cause a sensation at the Spring Festival Gala live broadcast?
This is because of the introduction of 5G.
Traditionally, a camera either stays in a fixed position or moves around dragging a cable behind it. That is because transmitting shallow-compressed video requires a lot of bandwidth: at a compression ratio of 1:8, each 4K feed still needs about 1 Gbps, and the latency requirement is strict, which ordinary wireless links cannot satisfy.
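The 1 Gbps figure is plausible if we assume a broadcast-style 10-bit 4:2:2 signal at 50 fps; these signal parameters are assumptions, since the text does not specify them:

```python
# Plausibility check for "~1 Gbps per 4K camera" at 1:8 compression,
# assuming a broadcast-style 10-bit 4:2:2 signal (20 bits per pixel)
# at 50 fps -- these parameters are assumptions, not from the text.
raw_bps = 3840 * 2160 * 20 * 50      # ~8.3 Gbps uncompressed
shallow_bps = raw_bps / 8            # 1:8 shallow compression

print(f"{shallow_bps / 1e9:.2f} Gbps per camera")   # -> ~1.04 Gbps
```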
But 5G can. A dedicated production-and-broadcast private network built on 5G-A can deliver more than 10 Gbps of bandwidth, which naturally makes "5G ultra-high-definition shallow-compression production" possible. And thanks to the low latency, footage from wireless and wired cameras can be mixed seamlessly.
Finally, let's summarize.
Shallow compression is mainly used for display interfaces and professional production and broadcasting, where channel bandwidth is abundant and the requirements are visually lossless image quality and very low latency; the compression ratio is typically around 1:8. Ordinary users rarely touch shallow compression directly, which is why it is called "ToB" compression.
Deep compression is primarily user-facing, hence the name "ToC" compression. After production at the broadcast station is finished, the video is deeply compressed before distribution, with compression ratios reaching 1:200 or even 1:500. This saves the maximum amount of transmission bandwidth while still looking smooth to the audience.
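As a closing recap, here is what those ratios mean for the same raw ~6 Gbps 4K/30fps source from the earlier calculation (the ratios come from the text; the resulting bitrates are purely illustrative):

```python
# Bitrates for one ~6 Gbps raw 4K/30fps source at the quoted ratios.
raw_gbps = 5.97
for name, ratio in [("shallow (ToB)", 8), ("deep (ToC)", 200),
                    ("deep (ToC), aggressive", 500)]:
    print(f"{name:>22}: ~{raw_gbps / ratio * 1000:.0f} Mbps")
# -> ~746 Mbps, ~30 Mbps, ~12 Mbps
```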
Thank you very much for reading all the way to the end.