BPG Specification

version 0.9.5

Copyright (c) 2014-2015 Fabrice Bellard

1) Introduction
---------------

BPG is a lossy and lossless picture compression format based on HEVC
[1]. It supports grayscale, YCbCr, RGB and YCgCo color spaces with an
optional alpha channel. CMYK is supported by reusing the alpha channel
to encode an additional white component. The bit depth of each
component is from 8 to 14 bits. The color values are stored either in
full range (JPEG case) or limited range (video case). The YCbCr color
space is either BT 601 (JPEG case), BT 709 or BT 2020.

The chroma can be subsampled by a factor of two horizontally, or both
horizontally and vertically (4:4:4, 4:2:2 and 4:2:0 chroma formats are
supported). In order to be able to transcode JPEG images or video
frames without modification to the chroma, both JPEG and MPEG2 chroma
sample positions are supported.

Progressive decoding and display is supported by interleaving the
alpha and color data.

Arbitrary metadata (such as EXIF, ICC profile, XMP) are supported.

Animations are supported as an optional feature. Decoders not
supporting animation display the first frame of the animation.

2) Bitstream conventions
------------------------

The bit stream is byte aligned and bit fields are read from most
significant to least significant bit in each byte.

- u(n) is an unsigned integer stored on n bits.

- ue7(n) is an unsigned integer of at most n bits stored on a variable
  number of bytes. All the bytes except the last one have a '1' as
  their first bit. The unsigned integer is represented as the
  concatenation of the remaining 7-bit codewords. Only the shortest
  encoding for a given unsigned integer shall be accepted by the
  decoder (i.e. the first byte is never 0x80). Example:

  Encoded bytes       Unsigned integer value
  0x08                8
  0x84 0x1e           542
  0xac 0xbe 0x17      728855

- ue(v) : unsigned integer 0-th order Exp-Golomb-coded (see HEVC
  specification).

- b(8) is an arbitrary byte.
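
(This example is informative.) The ue7(n) coding described above can
be sketched as follows. This is a minimal illustration and not a
reference implementation: the function names decode_ue7 and encode_ue7
and the max_bits parameter are hypothetical, chosen only for this
example. The decoder rejects a leading 0x80 byte (the shortest-encoding
rule) and values wider than max_bits.

```python
from typing import Tuple


def decode_ue7(data: bytes, offset: int = 0, max_bits: int = 32) -> Tuple[int, int]:
    """Decode one ue7(max_bits) value; return (value, next_offset)."""
    value = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated ue7 value")
        byte = data[pos]
        # A first byte of 0x80 carries no payload bits, so it can only
        # appear in a non-minimal encoding, which must be rejected.
        if pos == offset and byte == 0x80:
            raise ValueError("non-minimal ue7 encoding")
        pos += 1
        # Append the low 7 bits to the value, most significant group first.
        value = (value << 7) | (byte & 0x7F)
        if value >> max_bits:
            raise ValueError("ue7 value exceeds %d bits" % max_bits)
        # A clear top bit marks the last byte of the value.
        if not (byte & 0x80):
            return value, pos


def encode_ue7(value: int) -> bytes:
    """Encode a non-negative integer using the shortest ue7 form."""
    if value < 0:
        raise ValueError("ue7 values are unsigned")
    groups = [value & 0x7F]          # last byte: top bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: top bit set
        value >>= 7
    return bytes(reversed(groups))
```

Applied to the example table, decode_ue7(bytes([0x84, 0x1e])) yields
the value 542 and the offset 2 of the next field, so successive header
fields can be read by feeding the returned offset back in.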
3) File format
--------------

3.1) Syntax
-----------

heic_file() {
    file_magic                                          u(32)

    pixel_format                                        u(3)
    alpha1_flag                                         u(1)
    bit_depth_minus_8                                   u(4)

    color_space                                         u(4)
    extension_present_flag                              u(1)
    alpha2_flag                                         u(1)
    limited_range_flag                                  u(1)
    animation_flag                                      u(1)

    picture_width                                       ue7(32)
    picture_height                                      ue7(32)

    picture_data_length                                 ue7(32)
    if (extension_present_flag) {
        extension_data_length                           ue7(32)
        extension_data()
    }

    hevc_header_and_data()
}

extension_data() {
    while (more_bytes()) {
        extension_tag                                   ue7(32)
        extension_tag_length                            ue7(32)
        if (extension_tag == 5) {
            animation_control_extension(extension_tag_length)
        } else {
            for(j = 0; j < extension_tag_length; j++) {
                extension_tag_data_byte                 b(8)
            }
        }
    }
}

animation_control_extension(payload_length) {
    loop_count                                          ue7(16)
    frame_period_num                                    ue7(16)
    frame_period_den                                    ue7(16)
    while (more_bytes()) {
        dummy_byte                                      b(8)
    }
}

hevc_header_and_data() {
    if (alpha1_flag || alpha2_flag) {
        hevc_header()
    }
    hevc_header()
    hevc_data()
}

hevc_header() {
    hevc_header_length                                  ue7(32)
    log2_min_luma_coding_block_size_minus3              ue(v)
    log2_diff_max_min_luma_coding_block_size            ue(v)
    log2_min_transform_block_size_minus2                ue(v)
    log2_diff_max_min_transform_block_size              ue(v)
    max_transform_hierarchy_depth_intra                 ue(v)
    sample_adaptive_offset_enabled_flag                 u(1)
    pcm_enabled_flag                                    u(1)
    if (pcm_enabled_flag) {
        pcm_sample_bit_depth_luma_minus1                u(4)
        pcm_sample_bit_depth_chroma_minus1              u(4)
        log2_min_pcm_luma_coding_block_size_minus3      ue(v)
        log2_diff_max_min_pcm_luma_coding_block_size    ue(v)
        pcm_loop_filter_disabled_flag                   u(1)
    }
    strong_intra_smoothing_enabled_flag                 u(1)
    sps_extension_present_flag                          u(1)
    if (sps_extension_present_flag) {
        sps_range_extension_flag                        u(1)
        sps_extension_7bits                             u(7)
    }
    if (sps_range_extension_flag) {
        transform_skip_rotation_enabled_flag            u(1)
        transform_skip_context_enabled_flag             u(1)
        implicit_rdpcm_enabled_flag                     u(1)
        explicit_rdpcm_enabled_flag                     u(1)
        extended_precision_processing_flag              u(1)
        intra_smoothing_disabled_flag                   u(1)
        high_precision_offsets_enabled_flag             u(1)
        persistent_rice_adaptation_enabled_flag         u(1)
        cabac_bypass_alignment_enabled_flag             u(1)
    }
    trailing_bits                                       u(v)
}

hevc_data() {
    for(i = 0; i < v; i++) {
        hevc_data_byte                                  b(8)
    }
}

frame_duration_sei(payloadSize) {
    frame_duration                                      u(16)
}

3.2) Semantics
--------------

'file_magic' is defined as 0x425047fb.

'pixel_format' indicates the chroma subsampling:

  0 : Grayscale
  1 : 4:2:0. Chroma at position (0.5, 0.5) (JPEG chroma position)
  2 : 4:2:2. Chroma at position (0.5, 0) (JPEG chroma position)
  3 : 4:4:4
  4 : 4:2:0. Chroma at position (0, 0.5) (MPEG2 chroma position)
  5 : 4:2:2. Chroma at position (0, 0) (MPEG2 chroma position)

  The other values are reserved.

'alpha1_flag' and 'alpha2_flag' give information about the alpha
plane:

  alpha1_flag=0 alpha2_flag=0: no alpha plane.

  alpha1_flag=1 alpha2_flag=0: alpha present. The color is not
  premultiplied.

  alpha1_flag=1 alpha2_flag=1: alpha present. The color is
  premultiplied. The resulting non-premultiplied R', G', B' shall be
  recovered as:

      if A != 0
        R' = min(R / A, 1), G' = min(G / A, 1), B' = min(B / A, 1)
      else
        R' = G' = B' = 1.

  alpha1_flag=0 alpha2_flag=1: the alpha plane is present and
  contains the W color component (CMYK color). The resulting CMYK
  data can be recovered as follows:

      C = (1 - R), M = (1 - G), Y = (1 - B), K = (1 - W).

  In case no color profile is specified, the sRGB color R'G'B' shall
  be computed as:

      R' = R * W, G' = G * W, B' = B * W.

'bit_depth_minus_8' is the number of bits used for each component
minus 8. In this version of the specification, bit_depth_minus_8 <= 6.

'extension_present_flag' indicates that extension data are present.

'color_space' specifies how to convert the color planes to RGB.
It must be 0 when pixel_format = 0 (grayscale):

  0 : YCbCr (BT 601, same as JPEG and HEVC matrix_coeffs = 5)
  1 : RGB (component order: G B R)
  2 : YCgCo (same as HEVC matrix_coeffs = 8)
  3 : YCbCr (BT 709, same as HEVC matrix_coeffs = 1)
  4 : YCbCr (BT 2020 non constant luminance system, same as HEVC
      matrix_coeffs = 9)
  5 : reserved for BT 2020 constant luminance system, not supported
      in this version of the specification.

  The other values are reserved.

YCbCr is defined using the BT 601, BT 709 or BT 2020 conversion
matrices.

For RGB, G is stored in the Y plane, B in the Cb plane and R in the
Cr plane.

YCgCo is defined as HEVC matrix_coeffs = 8. Y is stored in the Y
plane, Cg in the Cb plane and Co in the Cr plane.

If no color profile is present, the RGB output data are assumed to be
in the sRGB color space [6].

'limited_range_flag': opposite of the HEVC video_full_range_flag.
The value zero indicates that the full range of each color component
is used. The value one indicates that a limited range is used:

  - (16 << (bit_depth - 8)) to (235 << (bit_depth - 8)) for Y and
    G, B, R,
  - (16 << (bit_depth - 8)) to (240 << (bit_depth - 8)) for Cb and
    Cr.

For the YCgCo color space, the range limitation shall be done on the
RGB data.

The alpha (or W) plane always uses the full range.

'animation_flag'. The value '1' indicates that more than one frame is
encoded in the HEVC data. The animation control extension must be
present. If the decoder does not support animations, it shall decode
the first frame only and ignore the animation information.

'picture_width' is the picture width in pixels. The value 0 is not
allowed.

'picture_height' is the picture height in pixels. The value 0 is not
allowed.

'picture_data_length' is the picture data length in bytes. The
special value of zero indicates that the picture data goes up to the
end of the file.

'extension_data_length' is the extension data length in bytes.

'extension_data()' is the extension data.

'extension_tag' is the extension tag.
The following values are defined:

  1: EXIF data.
  2: ICC profile (see [4])
  3: XMP (see [5])
  4: Thumbnail (the thumbnail shall be a lower resolution version of
     the image and stored in BPG format).
  5: Animation control data.

The decoder shall ignore the tags it does not support.

'extension_tag_length' is the length in bytes of the extension tag.

'loop_count' gives the number of times the animation shall be played.
The value of 0 means infinite.

'frame_period_num' and 'frame_period_den' encode the default delay
between each frame as frame_period_num/frame_period_den seconds. The
value of 0 for 'frame_period_num' or 'frame_period_den' is forbidden.

'hevc_header_length' is the length in bytes of the following data up
to and including 'trailing_bits'.

'log2_min_luma_coding_block_size_minus3',
'log2_diff_max_min_luma_coding_block_size',
'log2_min_transform_block_size_minus2',
'log2_diff_max_min_transform_block_size',
'max_transform_hierarchy_depth_intra',
'sample_adaptive_offset_enabled_flag', 'pcm_enabled_flag',
'pcm_sample_bit_depth_luma_minus1',
'pcm_sample_bit_depth_chroma_minus1',
'log2_min_pcm_luma_coding_block_size_minus3',
'log2_diff_max_min_pcm_luma_coding_block_size',
'pcm_loop_filter_disabled_flag',
'strong_intra_smoothing_enabled_flag', 'sps_extension_flag',
'sps_extension_present_flag', 'sps_range_extension_flag',
'transform_skip_rotation_enabled_flag',
'transform_skip_context_enabled_flag',
'implicit_rdpcm_enabled_flag', 'explicit_rdpcm_enabled_flag',
'extended_precision_processing_flag',
'intra_smoothing_disabled_flag',
'high_precision_offsets_enabled_flag',
'persistent_rice_adaptation_enabled_flag' and
'cabac_bypass_alignment_enabled_flag' are the corresponding fields of
the HEVC SPS syntax element.

'trailing_bits' has a value of 0 and has a length from 0 to 7 bits so
that the next data is byte aligned.

'hevc_data()' contains the corresponding HEVC picture data, excluding
the first NAL start code (i.e. the first 0x00 0x00 0x01 or 0x00 0x00
0x00 0x01 bytes).
The VPS and SPS NALs shall not be included in the HEVC picture data.
The decoder can recover the necessary fields from the header by
making the following assumptions:

  - vps_video_parameter_set_id = 0
  - sps_video_parameter_set_id = 0
  - sps_max_sub_layers = 1
  - sps_seq_parameter_set_id = 0
  - chroma_format_idc:
      for picture data: chroma_format_idc = pixel_format
      for alpha data: chroma_format_idc = 0.
  - separate_colour_plane_flag = 0
  - pic_width_in_luma_samples = ceil(picture_width/cb_size) * cb_size
  - pic_height_in_luma_samples = ceil(picture_height/cb_size) * cb_size
    with cb_size = 1 << log2_min_luma_coding_block_size
  - bit_depth_luma_minus8 = bit_depth_minus_8
  - bit_depth_chroma_minus8 = bit_depth_minus_8
  - max_transform_hierarchy_depth_inter =
    max_transform_hierarchy_depth_intra
  - scaling_list_enabled_flag = 0
  - log2_max_pic_order_cnt_lsb_minus4 = 4
  - amp_enabled_flag = 1
  - sps_temporal_mvp_enabled_flag = 1

Alpha data encoding:

  - If alpha data is present, all the corresponding NALs have
    nuh_layer_id = 1. NALs for color data shall have
    nuh_layer_id = 0.

  - Alpha data shall use the same tile sizes as color data and shall
    have the same entropy_coding_sync_enabled_flag value as color
    data.

  - Alpha slices shall use the same number of coding units as color
    slices and should be interleaved with color slices. Alpha NALs
    shall come before the corresponding color NALs.

Animation encoding:

  - The optional prefix SEI with payloadType = 257 (defined in
    frame_duration_sei()) specifies that the image must be repeated
    'frame_duration' times. 'frame_duration' shall not be zero.

    If the frame duration SEI is not present for a given frame,
    frame_duration = 1 shall be assumed by the decoder.

    If alpha data is present, the frame duration SEI shall be present
    only for the color data.

3.3) HEVC Profile
-----------------

Conforming HEVC bit streams shall conform to the Main 4:4:4 16 Still
Picture profile, Level 8.5 of the HEVC specification with the
following modifications.
- separate_colour_plane_flag shall be 0 when present.

- bit_depth_luma_minus8 <= 6

- bit_depth_chroma_minus8 = bit_depth_luma_minus8

- explicit_rdpcm_enabled_flag = 0 (does not matter for intra frames)

- extended_precision_processing_flag = 0

- cabac_bypass_alignment_enabled_flag = 0

- high_precision_offsets_enabled_flag = 0 (does not matter for intra
  frames)

- If the encoded image is larger than the size indicated by
  picture_width and picture_height, the lower right part of the
  decoded image shall be cropped. If a horizontal (resp. vertical)
  decimation by two is done for the chroma and the width (resp.
  height) is n pixels, ceil(n/2) pixels must be kept as the resulting
  chroma information.

When animations are present, the subsequent frames shall be encoded
with the following changes:

- P slices are allowed (but B slices are not allowed).

- Only the previous picture can be used as reference (hence a DPB
  size of 2 pictures).

4) Design choices
-----------------

(This section is informative)

- Our design principle was to keep the format as simple as possible
  while taking the HEVC codec as basis. Our main metric to evaluate
  the simplicity was the size of a software decoder which outputs 32
  bit RGBA pixel data.

- Pixel formats: we wanted to be able to convert JPEG images to BPG
  with as little loss as possible. So supporting the same color space
  (BT 601 YCbCr) with the same range (full range) and most of the
  allowed JPEG chroma formats (4:4:4, 4:2:2, 4:2:0 or grayscale) was
  mandatory to avoid going back to RGB or doing a subsampling or
  interpolation.

- Alpha support: alpha support is mandatory. We chose to use a
  separate HEVC monochrome plane to handle it instead of another
  format to simplify the decoder. The color is either
  non-premultiplied or premultiplied. Premultiplied alpha usually
  gives a better compression. Non-premultiplied alpha is supported in
  case no loss is needed on the color components.
  In order to allow progressive display, the alpha and color data are
  interleaved (the nuh_layer_id NAL field is 0 for color data and 1
  for alpha data). The alpha and color slices should contain the same
  number of coding units and each alpha slice should come before the
  corresponding color slice. Since alpha slices are usually smaller
  than color slices, it allows a progressive display even if there is
  a single slice.

- Color spaces: In addition to YCbCr, RGB is supported for the high
  quality or lossless cases. YCgCo is supported because it may give
  slightly better results than YCbCr for high quality images. CMYK is
  supported so that JPEGs containing this color space can be
  converted. The alpha plane is used to store the W (1-K) plane. The
  data is stored with inverted components (1-X) so that the
  conversion to RGB is simplified. The support of the BT 709 and BT
  2020 (non constant luminance) YCbCr encodings and of the limited
  range color values was added to reduce the losses when converting
  video frames.

- Bit depth: we decided to support the HEVC bit depths 8 to 14. The
  added complexity is small and it allows supporting high quality
  pictures from cameras.

- Picture file format: keeping a completely standard HEVC stream
  would have meant a more difficult parsing for the picture header,
  which is a problem for the various image utilities that need to get
  the basic picture information (pixel format, width, height). So we
  added a small header before the HEVC bit stream. The picture header
  is byte oriented so it is easy to parse.

- HEVC bit stream: the standard HEVC headers (the VPS and SPS NALs)
  give an overhead of about 60 bytes for no added value in the case
  of picture compression. Since the alpha plane uses a different HEVC
  bit stream, it also adds the same overhead again. So we removed the
  VPS and SPS NALs and added a very small header with the equivalent
  information (typically 4 bytes). We also removed the first NAL
  start code which is not useful.
  It is still possible to reconstruct a standard HEVC stream to feed
  an unmodified hardware decoder if needed.

- Extensions: the metadata are stored at the beginning of the file so
  that they can be read at the same time as the header. Since
  metadata tend to evolve faster than the image formats, we left room
  for extension by using a (tag, length) representation. The decoder
  can easily skip all the metadata because their length is explicitly
  stored in the image header.

- Animations: they are interesting compared to WebM or MP4 short
  videos for the following reasons:

  * transparency is supported
  * lossless encoding is supported
  * the decoding resources are smaller than with a generic video
    player because only two frames need to be stored (DPB size = 2).
  * the animations are expected to be small so the decoder can cache
    all the decoded frames in memory.
  * the animation can be decoded as a still image if the decoder does
    not support animations.

  Compared to the other animated image formats (GIF, APNG, WebP), the
  compression ratio is usually much higher because of the HEVC inter
  frame prediction.

5) References
-------------

[1] High efficiency video coding (HEVC) version 2 (ITU-T
    Recommendation H.265)

[2] JPEG File Interchange Format version 1.02
    (http://www.w3.org/Graphics/JPEG/jfif3.pdf)

[3] EXIF version 2.2 (JEITA CP-3451)

[4] The International Color Consortium (http://www.color.org/)

[5] Extensible Metadata Platform (XMP)
    (http://www.adobe.com/devnet/xmp.html)

[6] sRGB color space, IEC 61966-2-1