Introduction
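The MP4 file format, also known as MPEG-4 Part 14, is defined in Part 14 of the MPEG-4 standard. It is a multimedia container format widely used to wrap video and audio streams, posters, subtitles, and metadata. (Incidentally, the popular video coding format AVC/H.264 is defined in MPEG-4 Part 10.)
The MP4 format derives from Apple's QuickTime format, so the QuickTime File Format Specification is also an important reference when studying MP4.
Material on the MP4 file structure: http://www.52rd.com/Blog/wqyuwss/559/
mp4box analysis tool: http://download.tsi.telecom-paristech.fr/gpac/mp4box.js/filereader.html (this link is no longer valid)
New address: http://114.215.169.66:9001/test/filereader.html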
Overview
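An MP4 file is made up of boxes, and each box is split into a Header and Data. The Header carries the box's type and size; the Data carries either child boxes or payload, so boxes can nest.
[Figure: basic structure of a typical MP4 file]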
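The header layout described above can be sketched in C as follows. This is a minimal illustration, not a full parser: the `BoxHeader` struct and the function names are hypothetical, and it assumes the standard layout of a 32-bit big-endian size followed by a four-character type, with a 64-bit size stored after the type when the 32-bit size equals 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: one parsed box header. */
typedef struct {
    uint64_t size;       /* total box size in bytes, header included */
    char     type[5];    /* four-character code, NUL-terminated */
    size_t   header_len; /* 8, or 16 when a 64-bit size follows the type */
} BoxHeader;

/* Reads a 32-bit big-endian integer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short for the header. */
int parse_box_header(const uint8_t *buf, size_t len, BoxHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->size = read_be32(buf);
    memcpy(out->type, buf + 4, 4);
    out->type[4] = '\0';
    out->header_len = 8;
    if (out->size == 1) {            /* 64-bit size stored after the type */
        if (len < 16)
            return -1;
        out->size = ((uint64_t)read_be32(buf + 8) << 32) | read_be32(buf + 12);
        out->header_len = 16;
    }
    /* A size of 0 means "box runs to the end of the file"; the caller decides. */
    return 0;
}
```

Fed the first eight bytes of the ftyp box from the sample file (`00 00 00 20 66 74 79 70`), this yields size 32 and type `ftyp`. A reader would then skip `size` bytes to reach the next sibling box, or descend past `header_len` bytes into a container box.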

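The box is the basic building block of an MP4 file: the file consists of many kinds of boxes, some acting as parent boxes and others as child boxes, so the boxes form a hierarchy. The table below lists them and marks whether each box is mandatory or optional; √ means the box is mandatory.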
Box | Mandatory | Description |
---|---|---|
ftyp | √ | file type and compatibility |
pdin | | progressive download information |
moov | √ | container for all the metadata |
mvhd | √ | movie header, overall declarations |
trak | √ | container for an individual track or stream |
tkhd | √ | track header, overall information about the track, e.g. video width and height |
tref | | track reference container |
edts | | edit list container |
elst | | an edit list |
mdia | √ | container for the media information in a track |
mdhd | √ | media header, overall information about the media |
hdlr | √ | handler, declares the media (handler) type |
minf | √ | media information container |
vmhd | | video media header, overall information (video track only) |
smhd | | sound media header, overall information (sound track only) |
hmhd | | hint media header, overall information (hint track only) |
nmhd | | null media header, overall information (some tracks only) |
dinf | √ | data information box, container |
dref | √ | data reference box, declares source(s) of media data in track (how to locate the media data) |
stbl | √ | sample table box, container for the time/space map. Holds all timing and location information for the samples in the track, plus their codec information. With this table a parser can resolve each sample's timing, type, size, and position within its container. |
stsd | √ | sample descriptions (codec types, initialization etc.). For video: codec type, width, height, and so on; for audio: channel count, sample rate, and so on. |
stts | √ | (decoding) time-to-sample. Describes how time maps to samples, so the sample at any given time can be found. |
ctts | | (composition) time to sample |
stsc | √ | sample-to-chunk, partial data-offset information. Grouping samples into chunks makes data access easier to optimize; a chunk holds one or more samples. |
stsz | | sample sizes (framing), the size of each sample. Not ticked here, but still essential for MP4: either stsz or stz2 has to be present. |
stz2 | | compact sample sizes (framing) |
stco | √ | chunk offset, partial data-offset information. Gives the offset of each chunk in the file. |
co64 | | 64-bit chunk offset |
stss | | sync sample table (random access points). Identifies the key frames in the media. |
stsh | | shadow sync sample table |
padd | | sample padding bits |
stdp | | sample degradation priority |
sdtp | | independent and disposable samples |
sbgp | | sample-to-group |
sgpd | | sample group description |
subs | | sub-sample information |
mvex | | movie extends box |
mehd | | movie extends header box |
trex | √ | track extends defaults |
ipmc | | IPMP Control Box |
moof | | movie fragment |
mfhd | √ | movie fragment header |
traf | | track fragment |
tfhd | √ | track fragment header |
trun | | track fragment run |
sdtp | | independent and disposable samples |
sbgp | | sample-to-group |
subs | | sub-sample information |
mfra | | movie fragment random access |
tfra | | track fragment random access |
mfro | √ | movie fragment random access offset |
mdat | | media data container |
free | | free space |
skip | | free space |
udta | | user-data |
cprt | | copyright etc. |
meta | | metadata |
hdlr | √ | handler, declares the metadata (handler) type |
dinf | | data information box, container |
dref | | data reference box, declares source(s) of metadata items |
ipmc | | IPMP Control Box |
iloc | | item location |
ipro | | item protection |
sinf | | protection scheme information box |
frma | | original format box |
imif | | IPMP Information box |
schm | | scheme type box |
schi | | scheme information box |
iinf | | item information |
xml | | XML container |
bxml | | binary XML container |
pitm | | primary item reference |
fiin | | file delivery item information |
paen | | partition entry |
fpar | | file partition |
fecr | | FEC reservoir |
segr | | file delivery session group |
gitn | | group id to name |
tsel | | track selection |
meco | | additional metadata container |
mere | | metabox relation |
This article uses mediainfo and mp4box for the analysis.
An MP4 file has a few main top-level parts. The file 2_audio_track_5s.mp4 serves as the analysis example below:
ftyp
File Type Box. It normally sits at the very start of the file and describes the file's version, compatible brands, and so on.
1 000000 File Type (32 bytes)
2 000000 Header (8 bytes)
3 000000 Size: 32 (0x00000020)
4 000004 Name: ftyp
5 000008 MajorBrand: isom
6 00000C MajorBrandVersion: 512 (0x00000200)
7 000010 CompatibleBrand: isom
8 000014 CompatibleBrand: iso2
9 000018 CompatibleBrand: avc1
10 00001C CompatibleBrand: mp41
moov
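Movie Box. It holds the high-level description of all media data in the file plus the details of every media track. It is usually placed at the end of the file, but to support progressive playback over HTTP (play while downloading) the moov has to be moved to the front. Note that when the moov is moved, some values inside it must be recalculated, for example the chunk offsets in stco/co64, which are absolute file offsets.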
1 14B2CE File header (10341 bytes)
2 14B2CE Header (8 bytes)
3 14B2CE Size: 10341 (0x00002865)
4 14B2D2 Name: moov
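As a sketch of the recalculation mentioned above: when the moov is moved in front of the mdat, every chunk offset stored in stco grows by the number of bytes inserted ahead of the media data. The function name `stco_shift` is hypothetical, and the example assumes 32-bit stco entries already decoded into host integers; a file whose shifted offsets no longer fit would need co64 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds `delta` bytes to every stco entry.
 * Returns 0 on success, or -1 (leaving the table untouched) if any
 * shifted offset would not fit in 32 bits. */
int stco_shift(uint32_t *offsets, size_t n, int64_t delta)
{
    assert(offsets != NULL || n == 0);
    /* First pass: validate, so a failure never leaves a half-updated table. */
    for (size_t i = 0; i < n; i++) {
        int64_t moved = (int64_t)offsets[i] + delta;
        if (moved < 0 || moved > (int64_t)UINT32_MAX)
            return -1;
    }
    /* Second pass: apply the shift. */
    for (size_t i = 0; i < n; i++)
        offsets[i] = (uint32_t)((int64_t)offsets[i] + delta);
    return 0;
}
```

For instance, if the 10341-byte moov of the sample file were inserted before the mdat, the first video chunk offset 7544 would become 17885.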
The boxes inside moov are the ones we mainly analyze.

mdat
Media Data Box. It stores the actual media data.
1 000028 Data (1356454 bytes)
2 000028 Header (8 bytes)
3 000028 Size: 1356454 (0x0014B2A6)
4 00002C Name: mdat
5 000030 Data: (1356446 bytes)
6 ............. data area, stored contiguously .............
Inside moov
The media information of an MP4 file lives mainly in the Movie Box (moov), which is the focus of this analysis. The main parts of moov are:
mvhd
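Movie Header Box. It records the description of the whole media file, such as creation time, modification time, time scale, and playable duration.
In the example below, the file's duration reads Duration: 5016 ms.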
1 14B2D6 Movie header (108 bytes)
2 14B2D6 Header (8 bytes)
3 14B2D6 Size: 108 (0x0000006C)
4 14B2DA Name: mvhd
5 14B2DE Version: 0 (0x00)
6 14B2DF Flags: 0 (0x000000)
7 14B2E2 Creation time: 0 (0x00000000) -
8 14B2E6 Modification time: 0 (0x00000000) -
9 14B2EA Time scale: 1000 (0x000003E8) - 1000 Hz
10 14B2EE Duration: 5016 (0x00001398) - 5016 ms
11 14B2F2 Preferred rate: 65536 (0x00010000) - 1.000
12 14B2F6 Preferred volume: 256 (0x0100) - 1.000
13 14B2F8 Reserved: (10 bytes)
14 14B302 Matrix structure (36 bytes)
15 14B302 a (width scale): 1.000
16 14B306 b (width rotate): 0.000
17 14B30A u (width angle): 0.000
18 14B30E c (height rotate): 0.000
19 14B312 d (height scale): 1.000
20 14B316 v (height angle): 0.000
21 14B31A x (position left): 0.000
22 14B31E y (position top): 0.000
23 14B322 w (divider): 1.000
24 14B326 Preview time: 0 (0x00000000)
25 14B32A Preview duration: 0 (0x00000000)
26 14B32E Poster time: 0 (0x00000000)
27 14B332 Selection time: 0 (0x00000000)
28 14B336 Selection duration: 0 (0x00000000)
29 14B33A Current time: 0 (0x00000000)
30 14B33E Next track ID: 4 (0x00000004)
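The raw values in the dump above are turned into the human-readable numbers (5016 ms, 1.000, 1.000) by simple arithmetic: the duration is divided by the time scale, the preferred rate is a 16.16 fixed-point number, and the preferred volume is an 8.8 fixed-point number. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Duration in seconds = duration in ticks / ticks per second. */
double mp4_duration_seconds(uint64_t duration, uint32_t timescale)
{
    assert(timescale != 0); /* a zero time scale would be a malformed file */
    return (double)duration / (double)timescale;
}

/* 16.16 fixed point, used by mvhd "Preferred rate" and the matrix a/b/c/d. */
double fixed16_16_to_double(uint32_t v)
{
    return (double)v / 65536.0;
}

/* 8.8 fixed point, used by mvhd "Preferred volume" and tkhd "Volume". */
double fixed8_8_to_double(uint16_t v)
{
    return (double)v / 256.0;
}
```

With the values from the dump, a duration of 5016 at time scale 1000 is 5.016 s, rate 0x00010000 is 1.0 (normal speed), and volume 0x0100 is 1.0 (full volume).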
udta
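User Data Box. Holds custom, user-defined data.
trak
Track Box. It records the information of one media stream. A file can contain one or more tracks, and they are independent of each other.

Take the provided test file 2_audio_track_5s.mp4 as an example.
Each track contains the following parts: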
tkhd
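Track Header Box. It contains the header information of the media stream.
In the example below, the stream information shows a video track width of 1920 and a height of 800.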
1 14CEA6 Track Header (92 bytes)
2 14CEA6 Header (8 bytes)
3 14CEA6 Size: 92 (0x0000005C)
4 14CEAA Name: tkhd
5 14CEAE Version: 0 (0x00)
6 14CEAF Flags: 3 (0x000003)
7 14CEB2 Track Enabled: Yes
8 14CEB2 Track in Movie: 2 (0x0000000000000002)
9 14CEB2 Track in Preview: 0 (0x0000000000000000)
10 14CEB2 Track in Poster: 0 (0x0000000000000000)
11 14CEB2 Creation time: 0 (0x00000000) -
12 14CEB6 Modification time: 0 (0x00000000) -
13 14CEBA Track ID: 3 (0x00000003)
14 14CEBE Reserved: 0 (0x00000000)
15 14CEC2 Duration: 4875 (0x0000130B)- 4875 (0x130B) ms
16 14CEC6 Reserved: 0 (0x00000000)
17 14CECA Reserved: 0 (0x00000000)
18 14CECE Layer: 0 (0x0000)
19 14CED0 Alternate group: 2 (0x0002)
20 14CED2 Volume: 0 (0x0000) - 0.000
21 14CED4 Reserved: 0 (0x0000)
22 14CED6 Matrix structure (36 bytes)
23 14CED6 a (width scale): 1.000
24 14CEDA b (width rotate): 0.000
25 14CEDE u (width angle): 0.000
26 14CEE2 c (height rotate): 0.000
27 14CEE6 d (height scale): 1.000
28 14CEEA v (height angle): 0.000
29 14CEEE x (position left): 0.000
30 14CEF2 y (position top): 0.000
31 14CEF6 w (divider): 1.000
32 14CEFA Track width: 1920.000
33 14CEFE Track height: 800.000
For an audio track, the tkhd fields of interest are duration, volume, and so on.
Audio tkhd contents
1 14B34A Track Header (92 bytes)
2 14B34A Header (8 bytes)
3 14B34A Size: 92 (0x0000005C)
4 14B34E Name: tkhd
5 14B352 Version: 0 (0x00)
6 14B353 Flags: 3 (0x000003)
7 14B356 Track Enabled: Yes
8 14B356 Track in Movie: 2 (0x0000000000000002)
9 14B356 Track in Preview: 0 (0x0000000000000000)
10 14B356 Track in Poster: 0 (0x0000000000000000)
11 14B356 Creation time: 0 (0x00000000) -
12 14B35A Modification time: 0 (0x00000000) -
13 14B35E Track ID: 1 (0x00000001)
14 14B362 Reserved: 0 (0x00000000)
15 14B366 Duration: 5016 (0x00001398)- 5016 (0x1398) ms
16 14B36A Reserved: 0 (0x00000000)
17 14B36E Reserved: 0 (0x00000000)
18 14B372 Layer: 0 (0x0000)
19 14B374 Alternate group: 0 (0x0000)
20 14B376 Volume: 256 (0x0100) - 1.000
21 14B378 Reserved: 0 (0x0000)
22 14B37A Matrix structure (36 bytes)
23 14B37A a (width scale): 1.000
24 14B37E b (width rotate): 0.000
25 14B382 u (width angle): 0.000
26 14B386 c (height rotate): 0.000
27 14B38A d (height scale): 1.000
28 14B38E v (height angle): 0.000
29 14B392 x (position left): 0.000
30 14B396 y (position top): 0.000
31 14B39A w (divider): 1.000
32 14B39E Track width: 0.000
33 14B3A2 Track height: 0.000
mdia
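Media Box. This is a container box holding the track's media information. Its child boxes include:
mdhd: Media Header Box, stores the stream's creation time, duration, and similar information.
hdlr: Handler Reference Box, declares the handler type that determines how the track's media is processed.
minf: Media Information Box, holds the handler-specific information of the track's media data. minf is itself a container box; the part to focus on inside it is stbl, described in detail in the minf section.
mdia expanded looks like this: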
1 14CF32 Media (2975 bytes)
2 14CF32 Header (8 bytes)
3 14CF32 Size: 2975 (0x00000B9F)
4 14CF36 Name: mdia

mdhd
Media Header Box. Stores the stream's creation time, duration, and similar information.
For video, the mdhd carries Time scale, Duration, and so on.
Video mdhd
1 14CF3A Media Header (32 bytes)
2 14CF3A Header (8 bytes)
3 14CF3A Size: 32 (0x00000020)
4 14CF3E Name: mdhd
5 14CF42 Version: 0 (0x00)
6 14CF43 Flags: 0 (0x000000)
7 14CF46 Creation time: 0 (0x00000000) -
8 14CF4A Modification time: 0 (0x00000000) -
9 14CF4E Time scale: 90000 (0x00015F90)
10 14CF52 Duration: 438750 (0x0006B1DE) - 4875 (0x130B) ms
11 14CF56 Language: 21956 (0x55C4) - und
12 14CF58 Quality: 0 (0x0000)
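The audio mdhd is similar to the video one, but pay attention to Time scale: every timestamp calculation for the track must use this time scale. It corresponds to AVStream->time_base in FFmpeg.
Audio mdhd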
1 14B3D2 Media Header (32 bytes)
2 14B3D2 Header (8 bytes)
3 14B3D2 Size: 32 (0x00000020)
4 14B3D6 Name: mdhd
5 14B3DA Version: 0 (0x00)
6 14B3DB Flags: 0 (0x000000)
7 14B3DE Creation time: 0 (0x00000000) -
8 14B3E2 Modification time: 0 (0x00000000) -
9 14B3E6 Time scale: 44100 (0x0000AC44)
10 14B3EA Duration: 221184 (0x00036000) - 5015 (0x1397) ms
11 14B3EE Language: 21956 (0x55C4) - und
12 14B3F0 Quality: 0 (0x0000)
hdlr
Handler Reference Box. Declares the handler type that determines how the track's media is processed.
Video hdlr; the key field is Component subtype: vide
1 14CF5A Handler Reference (45 bytes)
2 14CF5A Header (8 bytes)
3 14CF5A Size: 45 (0x0000002D)
4 14CF5E Name: hdlr
5 14CF62 Version: 0 (0x00)
6 14CF63 Flags: 0 (0x000000)
7 14CF66 Component type:
8 14CF6A Component subtype: vide
9 14CF6E Component manufacturer:
10 14CF72 Component flags: 0 (0x00000000)
11 14CF76 Component flags mask: 0 (0x00000000)
12 14CF7A Component name: VideoHandler
Audio hdlr, with Component subtype: soun. When a file has several audio tracks, Component name tells them apart; here it is 粤语 (Cantonese).
1 14B3F2 Handler Reference (39 bytes)
2 14B3F2 Header (8 bytes)
3 14B3F2 Size: 39 (0x00000027)
4 14B3F6 Name: hdlr
5 14B3FA Version: 0 (0x00)
6 14B3FB Flags: 0 (0x000000)
7 14B3FE Component type:
8 14B402 Component subtype: soun
9 14B406 Component manufacturer:
10 14B40A Component flags: 0 (0x00000000)
11 14B40E Component flags mask: 0 (0x00000000)
12 14B412 Component name: 粤语
The other audio track of the file under analysis (Component name 国语, Mandarin):
1 14C0EA Handler Reference (39 bytes)
2 14C0EA Header (8 bytes)
3 14C0EA Size: 39 (0x00000027)
4 14C0EE Name: hdlr
5 14C0F2 Version: 0 (0x00)
6 14C0F3 Flags: 0 (0x000000)
7 14C0F6 Component type:
8 14C0FA Component subtype: soun
9 14C0FE Component manufacturer:
10 14C102 Component flags: 0 (0x00000000)
11 14C106 Component flags mask: 0 (0x00000000)
12 14C10A Component name: 国语
minf
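Media Information Box. It holds the handler-specific information of the track's media data. minf is itself a container box; the part to focus on inside it is stbl, which is also the most complex part of moov. stbl records the file offset, pts, duration, and similar information for every sample of the stream. To play an MP4 file, a player must use stbl to locate each sample correctly and feed it to the decoder.
Note also that the child boxes of minf differ between audio and video tracks: a video track has vmhd, while an audio track has smhd.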
vmhd
1 14CF8F Video Media Header (20 bytes)
2 14CF8F Header (8 bytes)
3 14CF8F Size: 20 (0x00000014)
4 14CF93 Name: vmhd
5 14CF97 Version: 0 (0x00)
6 14CF98 Flags: 1 (0x000001)
7 14CF9B Graphic mode: 0 (0x0000)
8 14CF9D Graphic mode color R: 0 (0x0000)
9 14CF9F Graphic mode color G: 0 (0x0000)
10 14CFA1 Graphic mode color B: 0 (0x0000)
smhd
1 14B421 Sound Media Header (16 bytes)
2 14B421 Header (8 bytes)
3 14B421 Size: 16 (0x00000010)
4 14B425 Name: smhd
5 14B429 Version: 0 (0x00)
6 14B42A Flags: 0 (0x000000)
7 14B42D Audio balance: 0 (0x0000)
8 14B42F Reserved: 0 (0x0000)
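Inside stbl
Sample Table Box. As mentioned above, the most important part of mdia is stbl, which stores the information about every sample in the file. Before parsing stbl, the two concepts chunk and sample need to be told apart.
In an MP4 file, a sample is the basic unit of a media stream; for example, one sample of a video stream holds the actual NAL data. A chunk is the basic unit of data storage: it is a collection of sample data, and one chunk can contain one or more samples.

[Figure: a chunk contains one or more samples]
stbl describes each sample and contains the following main child boxes: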
stsd
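Sample Description Box. Stores the description information that is required for decoding.
In the example below, for an H.264 video stream the sample entry type is avc1, and its extensions hold the SPS, PPS, and other information the decoder needs.
Video stsd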
1 14CFCF Sample Description (174 bytes)
2 14CFCF Header (8 bytes)
3 14CFCF Size: 174 (0x000000AE)
4 14CFD3 Name: stsd
5 14CFD7 Version: 0 (0x00)
6 14CFD8 Flags: 0 (0x000000)
7 14CFDB Count: 1 (0x00000001)
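The stsd contains avc1, and avc1 in turn contains avcC and pasp.

avc1: contains the video Width and Height
avcC: contains the video codec configuration, including the SPS, PPS, and related information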
1 14CFDF Video (158 bytes)
2 14CFDF Header (8 bytes)
3 14CFDF Size: 158 (0x0000009E)
4 14CFE3 Name: avc1
5 14CFE7 Reserved: 0 (0x0000000000000000)
6 14CFED Data reference index: 1 (0x0001)
7 14CFEF Version: 0 (0x0000)
8 14CFF1 Revision level: 0 (0x0000)
9 14CFF3 Vendor:
10 14CFF7 Temporal quality: 0 (0x00000000)
11 14CFFB Spatial quality: 0 (0x00000000)
12 14CFFF Width: 1920 (0x0780)
13 14D001 Height: 800 (0x0320)
14 14D003 Horizontal resolution: 4718592 (0x00480000)
15 14D007 Vertical resolution: 4718592 (0x00480000)
16 14D00B Data size: 0 (0x00000000)
17 14D00F Frame count: 1 (0x0001)
18 14D011 Compressor name size: 0 (0x00)
19 14D012 Padding: (31 bytes)
20 14D031 Depth: 24 (0x0018)
21 14D033 Color table ID: 65535 (0xFFFF)
22 14D035 AVC decode (56 bytes)
23 14D035 Header (8 bytes)
24 14D035 Size: 56 (0x00000038)
25 14D039 Name: avcC
26 14D03D Version: 1 (0x01)
27 14D03E Specific (47 bytes)
28 14D03E Profile: 100 (0x64)
29 14D03F Compatible profile: 0 (0x00)
30 14D040 Level: 40 (0x28)
31 14D041 Reserved: 63 (0x3F) - (6 bits)
32 14D041 Size of NALU length minus 1: 3 (0x3) - (2 bits)
33 14D042 Reserved: 7 (0x7) - (3 bits)
34 14D042 seq_parameter_set count: 1 (0x01) - (5 bits)
35 14D043 seq_parameter_set (30 bytes)
36 14D043 Size: 28 (0x001C)
37 14D045 nal_ref_idc: 3 (0x3) - (2 bits)
38 14D045 nal_unit_type: 7 (0x7) - (5 bits)
39 14D046 profile_idc: 100 (0x64)
40 14D047 constraints (1 bytes)
41 14D047 constraint_set0_flag: No
42 14D047 constraint_set1_flag: No
43 14D047 constraint_set2_flag: No
44 14D047 constraint_set3_flag: No
45 14D047 constraint_set4_flag: No
46 14D047 constraint_set5_flag: No
47 14D047 reserved_zero_2bits: 0 (0x0)
48 14D048 level_idc: 40 (0x28) - (8 bits)
49 14D049 seq_parameter_set_id: 0 (0x0)
50 14D049 high profile specific (1 bytes)
51 14D049 chroma_format_idc: 1 (0x1) - 4:2:0
52 14D049 bit_depth_luma_minus8: 0 (0x0)
53 14D049 bit_depth_chroma_minus8: 0 (0x0)
54 14D049 qpprime_y_zero_transform_bypass_flag: No
55 14D049 seq_scaling_matrix_present_flag: No
56 14D04A log2_max_frame_num_minus4: 0 (0x0)
57 14D04A pic_order_cnt_type: 0 (0x0)
58 14D04A log2_max_pic_order_cnt_lsb_minus4: 2 (0x2)
59 14D04A max_num_ref_frames: 3 (0x3)
60 14D04B gaps_in_frame_num_value_allowed_flag: No
61 14D04B pic_width_in_mbs_minus1: 119 (0x077)
62 14D04D pic_height_in_map_units_minus1: 49 (0x031)
63 14D04E frame_mbs_only_flag: Yes
64 14D04E direct_8x8_inference_flag: Yes
65 14D04E frame_cropping_flag: No
66 14D04E vui_parameters_present_flag (17 bytes)
67 14D04E vui_parameters_present_flag: Yes
68 14D04E aspect_ratio_info_present_flag (2 bytes)
69 14D04E aspect_ratio_info_present_flag: Yes
70 14D04F aspect_ratio_idc: 1 (0x01) - (8 bits) - 1.000
71 14D050 overscan_info_present_flag: No
72 14D050 video_signal_type_present_flag (3 bytes)
73 14D050 video_signal_type_present_flag: Yes
74 14D050 video_format: 5 (0x5) - (3 bits) -
75 14D050 video_full_range_flag: 0 (0x0) - (1 bits) - Limited
76 14D050 colour_description_present_flag (3 bytes)
77 14D050 colour_description_present_flag: Yes
78 14D050 colour_primaries: 1 (0x01) - (8 bits) - BT.709
79 14D051 transfer_characteristics: 1 (0x01) - (8 bits) - BT.709
80 14D052 matrix_coefficients: 1 (0x01) - (8 bits) - BT.709
81 14D053 chroma_loc_info_present_flag: No
82 14D054 timing_info_present_flag (8 bytes)
83 14D054 timing_info_present_flag: Yes
84 14D054 num_units_in_tick: 1 (0x00000001) -(32 bits)
85 14D058 time_scale: 48 (0x00000030) -(32 bits)
86 14D05C fixed_frame_rate_flag: Yes
87 14D05C nal_hrd_parameters_present_flag: No
88 14D05C vcl_hrd_parameters_present_flag: No
89 14D05C pic_struct_present_flag: No
90 14D05C bitstream_restriction_flag (3 bytes)
91 14D05C bitstream_restriction_flag: Yes
92 14D05C motion_vectors_over_pic_boundaries_flag: Yes
93 14D05D max_bytes_per_pic_denom: 0 (0x0)
94 14D05D max_bits_per_mb_denom: 0 (0x0)
95 14D05D log2_max_mv_length_horizontal: 11 (0x0B)
96 14D05E log2_max_mv_length_vertical: 11 (0x0B)
97 14D05F max_num_reorder_frames: 2 (0x2)
98 14D05F max_dec_frame_buffering: 4 (0x4)
99 14D061 pic_parameter_set count: 1 (0x01)
100 14D062 pic_parameter_set (6 bytes)
101 14D062 Size: 5 (0x0005)
102 14D064 nal_ref_idc: 3 (0x3) - (2 bits)
103 14D064 nal_unit_type: 8 (0x8) - (5 bits)
104 14D065 pic_parameter_set_id: 0 (0x0)
105 14D065 seq_parameter_set_id: 0 (0x0)
106 14D065 entropy_coding_mode_flag: Yes
107 14D065 bottom_field_pic_order_in_frame_present_flag: No
108 14D065 num_slice_groups_minus1: 0 (0x0)
109 14D065 num_ref_idx_l0_default_active_minus1: 3 (0x3)
110 14D066 num_ref_idx_l1_default_active_minus1: 0 (0x0)
111 14D066 weighted_pred_flag: No
112 14D066 weighted_bipred_idc: 2 (0x2) - (2 bits)
113 14D066 pic_init_qp_minus26: 0 (0x0)
114 14D067 pic_init_qs_minus26: 0 (0x0)
115 14D067 chroma_qp_index_offset: 0 (0x0)
116 14D067 deblocking_filter_control_present_flag: Yes
117 14D067 constrained_intra_pred_flag: No
118 14D067 redundant_pic_cnt_present_flag: No
119 14D067 transform_8x8_mode_flag: Yes
120 14D067 pic_scaling_matrix_present_flag: No
121 14D067 second_chroma_qp_index_offset: 0 (0x0)
122 14D068 -------------------------
123 14D068 --- AVC, accepted ---
124 14D068 -------------------------
125 14D069 Padding?: (4 bytes)
126 14D06D Pixel Aspect Ratio (16 bytes)
127 14D06D Header (8 bytes)
128 14D06D Size: 16 (0x00000010)
129 14D071 Name: pasp
130 14D075 hSpacing: 1 (0x00000001)
131 14D079 vSpacing: 1 (0x00000001)
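Several SPS fields in the dump above can be cross-checked against the container-level values. The sketch below derives the coded picture size and the frame rate from them. It assumes no cropping (frame_cropping_flag is No in this stream) and a fixed frame rate signalled in the VUI timing info, where the frame rate of frame-coded video is time_scale / (2 × num_units_in_tick). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Coded width in pixels: macroblocks are 16 pixels wide. */
unsigned sps_width(unsigned pic_width_in_mbs_minus1)
{
    return (pic_width_in_mbs_minus1 + 1) * 16;
}

/* Coded height in pixels; field-coded streams (flag == 0) double the map units. */
unsigned sps_height(unsigned pic_height_in_map_units_minus1,
                    int frame_mbs_only_flag)
{
    assert(frame_mbs_only_flag == 0 || frame_mbs_only_flag == 1);
    return (2u - (unsigned)frame_mbs_only_flag) *
           (pic_height_in_map_units_minus1 + 1) * 16;
}

/* Frames per second from the VUI timing info (fixed frame rate assumed). */
double sps_frame_rate(uint32_t time_scale, uint32_t num_units_in_tick)
{
    assert(num_units_in_tick != 0);
    return (double)time_scale / (2.0 * (double)num_units_in_tick);
}
```

With the values from the dump (119, 49, frame_mbs_only_flag set, time_scale 48, num_units_in_tick 1) this gives 1920×800 at 24 frames per second, which agrees with the avc1 Width and Height fields and with the tkhd track size.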
Audio stsd
Analyzed with Hexinator. It contains the audio-related information, such as the sample rate and the number of channels.

stts
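Time-to-Sample Box. It defines the duration of each sample. A Time-To-Sample table entry is laid out as follows:

sample count: the number of samples
sample duration: the duration of each of those samples
Consecutive samples with the same duration can be folded into a single entry to save space.
The video stts is shown first. Note that Number of entries is not the number of samples: the real sample count is obtained by adding up the sample count of every entry.
In the example below, the first sample lasts 3720 ticks. The unit is given by the mdhd time scale, which is 90000 for the video track, so in seconds this is 3720/90000 = 0.0413333333333333 s.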
1 14D07D Time to Sample (664 bytes)
2 14D07D Header (8 bytes)
3 14D07D Size: 664 (0x00000298)
4 14D081 Name: stts
5 14D085 Version: 0 (0x00)
6 14D086 Flags: 0 (0x000000)
7 14D089 Number of entries: 81 (0x00000051)
8 14D08D Sample Count: 1 (0x00000001)
9 14D091 Sample Duration: 3720 (0x00000E88)
10 14D095 Sample Count: 1 (0x00000001)
11 14D099 Sample Duration: 3780 (0x00000EC4)
12 14D09D Sample Count: 1 (0x00000001)
13 14D0A1 Sample Duration: 3690 (0x00000E6A)
14 14D0A5 Sample Count: 2 (0x00000002)
15 14D0A9 Sample Duration: 3780 (0x00000EC4)
16 14D0AD Sample Count: 1 (0x00000001)
17 14D0B1 Sample Duration: 3690 (0x00000E6A)
18 14D0B5 Sample Count: 2 (0x00000002)
19 14D0B9 Sample Duration: 3780 (0x00000EC4)
20 14D0BD Sample Count: 1 (0x00000001)
21 14D0C1 Sample Duration: 3690 (0x00000E6A)
22 14D0C5 Sample Count: 2 (0x00000002)
23 14D0C9 Sample Duration: 3780 (0x00000EC4)
24 14D0CD Sample Count: 1 (0x00000001)
25 14D0D1 Sample Duration: 3690 (0x00000E6A)
26 14D0D5 Sample Count: 2 (0x00000002)
27 14D0D9 Sample Duration: 3780 (0x00000EC4)
28 14D0DD Sample Count: 1 (0x00000001)
29 14D0E1 Sample Duration: 3690 (0x00000E6A)
30 ........
31 14D305 Sample Count: 2 (0x00000002)
32 14D309 Sample Duration: 3780 (0x00000EC4)
33 14D30D Sample Count: 1 (0x00000001)
34 14D311 Sample Duration: 3750 (0x00000EA6)
35 14D315 end position
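Here is the audio stts as well. The only difference is the mdhd time scale: as seen earlier it is 44100 for audio, so the duration of the first sample is
1024/44100 = 0.0232199546485261 s.
Excerpt of the audio stts contents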
1 14B4C4 Time to Sample (1048 bytes)
2 14B4C4 Header (8 bytes)
3 14B4C4 Size: 1048 (0x00000418)
4 14B4C8 Name: stts
5 14B4CC Version: 0 (0x00)
6 14B4CD Flags: 0 (0x000000)
7 14B4D0 Number of entries: 129 (0x00000081)
8 14B4D4 Sample Count: 1 (0x00000001)
9 14B4D8 Sample Duration: 1024 (0x00000400)
10 14B4DC Sample Count: 1 (0x00000001)
11 14B4E0 Sample Duration: 1025 (0x00000401)
12 14B4E4 Sample Count: 2 (0x00000002)
13 14B4E8 Sample Duration: 1024 (0x00000400)
14 14B4EC Sample Count: 1 (0x00000001)
15 14B4F0 Sample Duration: 1023 (0x000003FF)
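The run-length structure of stts can be sketched in a few lines of C: one function adds up the sample counts, and another walks the entries to find which sample is being decoded at a given timestamp (expressed in the track's time scale). `SttsEntry` and the function names are hypothetical, and the entries are assumed to be already read into host-order integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One stts table entry, already converted to host byte order. */
typedef struct {
    uint32_t sample_count;
    uint32_t sample_delta;
} SttsEntry;

/* Total number of samples = sum of all sample_count fields. */
uint64_t stts_total_samples(const SttsEntry *e, size_t n)
{
    assert(e != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += e[i].sample_count;
    return total;
}

/* Returns the 1-based number of the sample whose decode interval contains
 * `ts`, or 0 if `ts` lies at or beyond the end of the track. */
uint64_t stts_sample_at(const SttsEntry *e, size_t n, uint64_t ts)
{
    assert(e != NULL || n == 0);
    uint64_t start = 0; /* decode time of the first sample of entry i */
    uint64_t first = 1; /* number of the first sample of entry i */
    for (size_t i = 0; i < n; i++) {
        uint64_t span = (uint64_t)e[i].sample_count * e[i].sample_delta;
        if (ts < start + span)   /* span > 0 here, so sample_delta > 0 */
            return first + (ts - start) / e[i].sample_delta;
        start += span;
        first += e[i].sample_count;
    }
    return 0;
}
```

Using the first five video entries shown earlier, (1, 3720), (1, 3780), (1, 3690), (2, 3780), (1, 3690), the table covers 6 samples, and timestamp 14970 falls in sample 5.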
stss
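Sync Sample Box. This is the sync sample table: it stores the list of key frames, which exist to support random access.
Each stss table entry is the number of one sync sample.

In the example below, the video track has 3 key frames: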
1 14D315 Sync Sample (28 bytes)
2 14D315 Header (8 bytes)
3 14D315 Size: 28 (0x0000001C)
4 14D319 Name: stss
5 14D31D Version: 0 (0x00)
6 14D31E Flags: 0 (0x000000)
7 14D321 entry-count: 3 (0x00000003)
8 14D325 number: 1 (added by the author; mediainfo did not parse the entries)
9 14D329 number: 54 (added by the author)
10 14D32D number: 103 (added by the author)
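Random access with this table usually means: given a target sample, find the closest key frame at or before it and start decoding there. A minimal sketch under the assumption that the stss entries are sorted in ascending order (as they are above) and already decoded into an array; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the largest sync sample number <= `sample`, or 0 if there is none.
 * `sync` holds the 1-based stss entries in ascending order. */
uint32_t stss_sync_at_or_before(const uint32_t *sync, size_t n, uint32_t sample)
{
    assert(sync != NULL || n == 0);
    size_t lo = 0, hi = n;
    /* Binary search for the first entry greater than `sample`. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sync[mid] <= sample)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo ? sync[lo - 1] : 0;
}
```

With the three key frames above (1, 54, 103), seeking to sample 88 means decoding from sample 54 onward.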

stsc
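Sample-To-Chunk Box. This is the sample-to-chunk mapping table. As mentioned above, MP4 normally packs samples into chunks, and one chunk may contain one or several samples. A Sample-To-Chunk table entry is laid out as follows:

First chunk: the number of the first chunk that uses this entry
Samples per chunk: how many samples each chunk that uses this entry contains
Sample description ID: the index of the stsd entry that the chunks using this entry refer to
In the example, the video track has a single stsc entry that covers chunks 1 to x, with one sample per chunk.
In other words, every chunk here holds exactly one sample (although in general a chunk can hold several).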
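Because each stsc entry applies to a run of chunks (from its first chunk up to the chunk before the next entry's first chunk), mapping a sample number to a chunk means walking those runs. The sketch below does that; `StscEntry` and `stsc_locate` are hypothetical names, and the total chunk count is assumed to come from stco.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One stsc table entry, already converted to host byte order. */
typedef struct {
    uint32_t first_chunk;
    uint32_t samples_per_chunk;
    uint32_t sample_description_index;
} StscEntry;

/* Finds the 1-based chunk holding 1-based `sample`, and the sample's
 * 0-based position inside that chunk. Returns 0 on success, -1 if the
 * sample is out of range or the table is malformed. */
int stsc_locate(const StscEntry *e, size_t n, uint32_t total_chunks,
                uint32_t sample, uint32_t *chunk, uint32_t *index_in_chunk)
{
    assert(chunk != NULL && index_in_chunk != NULL);
    if (sample == 0)
        return -1;
    uint64_t first_sample = 1; /* first sample covered by entry i */
    for (size_t i = 0; i < n; i++) {
        uint32_t next_first = (i + 1 < n) ? e[i + 1].first_chunk
                                          : total_chunks + 1;
        if (next_first <= e[i].first_chunk || e[i].samples_per_chunk == 0)
            return -1;
        uint64_t chunks  = next_first - e[i].first_chunk;
        uint64_t samples = chunks * e[i].samples_per_chunk;
        if (sample < first_sample + samples) {
            uint64_t rel = sample - first_sample;
            *chunk = e[i].first_chunk + (uint32_t)(rel / e[i].samples_per_chunk);
            *index_in_chunk = (uint32_t)(rel % e[i].samples_per_chunk);
            return 0;
        }
        first_sample += samples;
    }
    return -1;
}
```

For the video track of the sample file (a single entry with first_chunk 1 and samples_per_chunk 1, and 117 chunks), sample 86 maps to chunk 86 at position 0. A made-up table such as {(1, 3), (3, 1)} with 4 chunks would place sample 5 in chunk 2 at position 1.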

stsz
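Sample Size Box. It specifies the size of each sample. The Sample Size Atom contains the total number of samples and a table holding the size of every sample.
Each entry of the sample size table is the size of one sample in bytes.

In the example below, the video stream has 117 samples in total. As the stsz listing later in this article shows, the first sample is 172818 bytes and the second is 20829 bytes.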
1 14D705 Sample Size (488 bytes)
2 14D705 Header (8 bytes)
3 14D705 Size: 488 (0x000001E8)
4 14D709 Name: stsz
5 14D70D Version: 0 (0x00)
6 14D70E Flags: 0 (0x000000)
7 14D711 Sample Size: 0 (0x00000000)
8 14D715 Number of entries: 117 (0x00000075)

stco
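Chunk Offset Box. It specifies where each chunk is located in the file, and this table is the key to locating every sample in the file. The box contains the number of chunks and a table with the file offset of each chunk. Each table entry is one chunk offset (32 bits in stco, 64 bits in co64).

Note that stco only gives the file offset of each chunk, not the file offset of each sample. To obtain a sample's offset, the Sample Size Box (stsz) and the Sample-To-Chunk Box (stsc) have to be combined in a calculation.
In this example, the first chunk of the video stream is at file offset 7544 (0x1D78). Since every chunk here holds a single sample, the first sample also starts at 7544, and its size comes from stsz: the first sample size is 172818.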
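How to compute a sample's offset
As noted above, stco alone cannot give the offset of a particular sample. The following shows how to find the file position of the sample that corresponds to a given pts.
Roughly, these steps are needed:
1. Convert the pts into the time coordinate system of the media (its time scale).
2. Use stts ((decoding) time-to-sample) to work out which sample number the pts falls in.
3. Use stsc (sample-to-chunk) to work out which chunk stores that sample.
4. Use stco (chunk offset) to get the file offset of that chunk.
5. Use stsz to get the sample's offset inside the chunk and add it to the offset from step 4, which gives the sample's offset in the file.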
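Steps 4 and 5 above come down to one small loop: start from the chunk's file offset and add the sizes of the samples that precede the target inside the same chunk. The sketch below assumes the stsz sizes and the chunk offset are already decoded, and that the number of the first sample in the chunk is known from the stsc walk; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the file offset of 1-based `sample`.
 * chunk_offset:          stco/co64 value of the chunk holding the sample
 * sizes[]:               stsz table, sizes[0] belongs to sample 1
 * first_sample_in_chunk: 1-based number of the first sample in that chunk */
uint64_t sample_file_offset(uint64_t chunk_offset, const uint32_t *sizes,
                            size_t sample_count,
                            uint32_t first_sample_in_chunk, uint32_t sample)
{
    assert(sizes != NULL);
    assert(first_sample_in_chunk >= 1);
    assert(first_sample_in_chunk <= sample && sample <= sample_count);
    uint64_t offset = chunk_offset;
    /* Skip over the samples stored before the target in the same chunk. */
    for (uint32_t s = first_sample_in_chunk; s < sample; s++)
        offset += sizes[s - 1];
    return offset;
}
```

With one sample per chunk, as in the sample file, the loop never runs and the sample offset equals the chunk offset (7544 for sample 1). If, hypothetically, samples 1 to 3 shared the chunk at 7544, sample 3 would start at 7544 + 172818 + 20829 = 201191.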
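For example, suppose we want the file position of the video sample at 3.64 s (using the course file 2_audio_track_5s.mp4):
1. Use the time scale to convert 3.64 s to the video timeline. The video track's time scale is 90000, so the timestamp is 3.64 × 90000 = 327600. (On a millisecond timeline the same instant would be 3640.)
2. Walk through all stts entries listed below, accumulating sample_count × sample_delta, until the running total reaches 327600.
3. The exact accumulation is 3720×1 + 3780×1 + 3690×1 + 3780×2 + ... and so on. As a shortcut, this walk-through takes 3780 as the average delta: 327600/3780 = 86.66666666666667, rounded down to 86. This is only a rough estimate: the true average delta is 438750/117 = 3750, and 327600/3750 = 87.36, so an exact accumulation lands on sample 88 (counting from 1). Sample 86 is kept in the steps below to illustrate the remaining lookups.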
1 type stts
2 size 664
3 flags 0
4 version 0
5 sample_counts 1,1,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,1,1,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,1,1,1,2,1,2,1,2,1,2,1
6 sample_deltas 3720,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3750,3720,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3690,3780,3750,3720,3780,3690,3780,3690,3780,3690,3780,3690,3780,3750
4. Look up the stsc entries shown below: sample 86 is in chunk 86, and it is the first sample in that chunk (because this stream stores exactly one sample per chunk).
1 Property name Property value
2 type stsc
3 size 28
4 flags 0
5 version 0
6 first_chunk 1
7 samples_per_chunk 1
8 sample_description_index 1
5. Look up the stco entries shown below: chunk 86 is at file offset 1004678 (read with Hexinator).
1 Property name Property value
2 type stco
3 size 484
4 flags 0
5 version 0
6 chunk_offsets 7544,182562,204381,206907,209520,236820,240924,242781,..............省略
6. Look up the stsz entries shown below: sample 86 has a size of 20934. The sample data for this walk-through is therefore located at
offset: 1004678+0 = 1004678
size: 20934
1 Property name Property value
2 type stsz
3 size 488
4 flags 0
5 version 0
6 sample_sizes 172818,20829,722,567,25207,1946,822,674,23828,2141,824,974,22426,2794..省略
7 sample_count 117
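Verification: open the mp4 file in a hex editor and go to file offset 1004678.
The sample begins with a NAL unit of type 09 (access unit delimiter), which occupies 6 bytes here: a 4-byte length plus 2 bytes of data. The real picture data follows, and its first 4 bytes are again a NALU length, 0x000051bc = 20924.
Total number of bytes: 4+2+4+20924 = 20934, which matches the size from stsz.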
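The byte accounting above reflects how H.264 is stored in MP4: each NAL unit is preceded by a big-endian length field instead of a start code, and the field width is given by avcC ("Size of NALU length minus 1: 3", so 4 bytes in this file). The sketch below walks such a sample and checks that the lengths add up exactly to the sample size; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walks the length-prefixed NAL units of one sample.
 * length_size is avcC's "NALU length size minus 1" plus 1 (1 to 4 bytes).
 * Stores the nal_unit_type of the first `max_types` units in nal_types.
 * Returns the number of NAL units, or -1 if the lengths are inconsistent. */
int avcc_walk(const uint8_t *data, size_t size, unsigned length_size,
              uint8_t *nal_types, size_t max_types)
{
    assert(data != NULL || size == 0);
    assert(length_size >= 1 && length_size <= 4);
    size_t pos = 0;
    int count = 0;
    while (pos < size) {
        if (size - pos < length_size)
            return -1;                      /* truncated length field */
        uint32_t nal_len = 0;
        for (unsigned i = 0; i < length_size; i++)
            nal_len = (nal_len << 8) | data[pos + i];
        pos += length_size;
        if (nal_len == 0 || nal_len > size - pos)
            return -1;                      /* NAL runs past the sample */
        if ((size_t)count < max_types)
            nal_types[count] = data[pos] & 0x1F; /* low 5 bits of NAL header */
        pos += nal_len;
        count++;
    }
    return count;
}
```

On the sample examined above, such a walk would report two NAL units: a 2-byte type 9 unit and a 20924-byte slice, for 4+2+4+20924 = 20934 bytes. The test data used to check the sketch is a made-up 13-byte sample with the same shape.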

References
https://www.cnblogs.com/ranson7zop/p/7889272.html
Online MP4 parsing tool: MP4Box.js (gpac.github.io)
QuickTime File Format Specification: Introduction (apple.com)
For the case where one chunk contains several samples, see the following link:
Key points of the MP4 file format: https://www.jianshu.com/p/44c9567d8fcb
Example: demux-mp4
.pro
TEMPLATE = app
CONFIG += console
CONFIG -= app_bundle
CONFIG -= qt
SOURCES += main.c
win32 {
INCLUDEPATH += $$PWD/ffmpeg-4.2.1-win32-dev/include
LIBS += $$PWD/ffmpeg-4.2.1-win32-dev/lib/avformat.lib \
$$PWD/ffmpeg-4.2.1-win32-dev/lib/avcodec.lib \
$$PWD/ffmpeg-4.2.1-win32-dev/lib/avdevice.lib \
$$PWD/ffmpeg-4.2.1-win32-dev/lib/avfilter.lib \
$$PWD/ffmpeg-4.2.1-win32-dev/lib/avutil.lib \
$$PWD/ffmpeg-4.2.1-win32-dev/lib/postproc.lib \
$$PWD/ffmpeg-4.2.1-win32-dev/lib/swresample.lib \
$$PWD/ffmpeg-4.2.1-win32-dev/lib/swscale.lib
}
main.c
#include <stdio.h>
#include <stdint.h>
#include "libavutil/log.h"
#include "libavformat/avformat.h"
#define ERROR_STRING_SIZE 1024
#define ADTS_HEADER_LEN 7
// Sampling frequencies indexed by the ADTS sampling_frequency_index field
const int sampling_frequencies[] = {
    96000,  // 0x0
    88200,  // 0x1
    64000,  // 0x2
    48000,  // 0x3
    44100,  // 0x4
    32000,  // 0x5
    24000,  // 0x6
    22050,  // 0x7
    16000,  // 0x8
    12000,  // 0x9
    11025,  // 0xa
    8000    // 0xb
    // 0xc, 0xd, 0xe, 0xf are reserved
};
// Fills the 7-byte ADTS header for one AAC frame of data_length payload bytes.
// Returns 0 on success, -1 if the sample rate is not in the table above.
int adts_header(char * const p_adts_header, const int data_length,
                const int profile, const int samplerate,
                const int channels)
{
    int sampling_frequency_index = 3; // default to 48000 Hz
    int adtsLen = data_length + 7;    // frame length includes the header
    int frequencies_size = sizeof(sampling_frequencies) / sizeof(sampling_frequencies[0]);
    int i = 0;
    for (i = 0; i < frequencies_size; i++)
    {
        if (sampling_frequencies[i] == samplerate)
        {
            sampling_frequency_index = i;
            break;
        }
    }
    if (i >= frequencies_size)
    {
        printf("unsupported samplerate:%d\n", samplerate);
        return -1;
    }
    p_adts_header[0] = 0xff;          // syncword 0xfff, high 8 bits
    p_adts_header[1] = 0xf0;          // syncword 0xfff, low 4 bits
    p_adts_header[1] |= (0 << 3);     // MPEG version: 0 for MPEG-4, 1 for MPEG-2, 1 bit
    p_adts_header[1] |= (0 << 1);     // layer: 0, 2 bits
    p_adts_header[1] |= 1;            // protection absent: 1, 1 bit
    p_adts_header[2] = (profile) << 6;                          // profile, 2 bits
    p_adts_header[2] |= (sampling_frequency_index & 0x0f) << 2; // sampling frequency index, 4 bits
    p_adts_header[2] |= (0 << 1);                               // private bit: 0, 1 bit
    p_adts_header[2] |= (channels & 0x04) >> 2;                 // channel configuration, high 1 bit
    p_adts_header[3] = (channels & 0x03) << 6;                  // channel configuration, low 2 bits
    p_adts_header[3] |= (0 << 5);     // original: 0, 1 bit
    p_adts_header[3] |= (0 << 4);     // home: 0, 1 bit
    p_adts_header[3] |= (0 << 3);     // copyright id bit: 0, 1 bit
    p_adts_header[3] |= (0 << 2);     // copyright id start: 0, 1 bit
    p_adts_header[3] |= ((adtsLen & 0x1800) >> 11);        // frame length, high 2 bits
    p_adts_header[4] = (uint8_t)((adtsLen & 0x7f8) >> 3);  // frame length, middle 8 bits
    p_adts_header[5] = (uint8_t)((adtsLen & 0x7) << 5);    // frame length, low 3 bits
    p_adts_header[5] |= 0x1f;         // buffer fullness 0x7ff, high 5 bits
    p_adts_header[6] = 0xfc;          // 11111100: buffer fullness 0x7ff, low 6 bits,
                                      // then number_of_raw_data_blocks_in_frame (2 bits) = 0
    // An ADTS frame carries number_of_raw_data_blocks_in_frame + 1 raw AAC frames.
    return 0;
}
// Usage: <program> input.mp4 out.h264 out.aac
int main(int argc, char **argv)
{
    // Check the arguments
    if (argc != 4) {
        printf("usage: app input.mp4 out.h264 out.aac\n");
        return -1;
    }
    char *in_filename = argv[1];
    char *h264_filename = argv[2];
    char *aac_filename = argv[3];
    FILE *aac_fd = NULL;
    FILE *h264_fd = NULL;
    AVFormatContext *ifmt_ctx = NULL;
    const AVBitStreamFilter *bsfilter = NULL;
    AVBSFContext *bsf_ctx = NULL;
    AVPacket *pkt = NULL;
    int video_index = -1;
    int audio_index = -1;
    int ret = 0;
    int exit_code = -1;
    char errors[ERROR_STRING_SIZE + 1]; // buffer for FFmpeg error strings

    h264_fd = fopen(h264_filename, "wb");
    if (!h264_fd) {
        printf("fopen %s failed\n", h264_filename);
        goto failed;
    }
    aac_fd = fopen(aac_filename, "wb");
    if (!aac_fd) {
        printf("fopen %s failed\n", aac_filename);
        goto failed;
    }
    ifmt_ctx = avformat_alloc_context();
    if (!ifmt_ctx) {
        printf("avformat_alloc_context failed\n");
        goto failed;
    }
    ret = avformat_open_input(&ifmt_ctx, in_filename, NULL, NULL);
    if (ret < 0) {
        av_strerror(ret, errors, ERROR_STRING_SIZE);
        printf("avformat_open_input failed:%d\n", ret);
        printf("avformat_open_input failed:%s\n", errors);
        goto failed;
    }
    // av_find_best_stream returns a negative AVERROR code on failure
    video_index = av_find_best_stream(ifmt_ctx, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    if (video_index < 0) {
        printf("av_find_best_stream video_index failed\n");
        goto failed;
    }
    audio_index = av_find_best_stream(ifmt_ctx, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);
    if (audio_index < 0) {
        printf("av_find_best_stream audio_index failed\n");
        goto failed;
    }
    // h264_mp4toannexb rewrites length-prefixed NAL units into Annex B start codes
    bsfilter = av_bsf_get_by_name("h264_mp4toannexb"); // plays the role of a class
    if (!bsfilter) {
        printf("av_bsf_get_by_name h264_mp4toannexb failed\n");
        goto failed;
    }
    ret = av_bsf_alloc(bsfilter, &bsf_ctx); // bsf_ctx plays the role of an instance
    if (ret < 0) {
        av_strerror(ret, errors, ERROR_STRING_SIZE);
        printf("av_bsf_alloc failed:%s\n", errors);
        goto failed;
    }
    ret = avcodec_parameters_copy(bsf_ctx->par_in, ifmt_ctx->streams[video_index]->codecpar);
    if (ret < 0) {
        av_strerror(ret, errors, ERROR_STRING_SIZE);
        printf("avcodec_parameters_copy failed:%s\n", errors);
        goto failed;
    }
    ret = av_bsf_init(bsf_ctx);
    if (ret < 0) {
        av_strerror(ret, errors, ERROR_STRING_SIZE);
        printf("av_bsf_init failed:%s\n", errors);
        goto failed;
    }
    pkt = av_packet_alloc(); // already returns an initialized packet
    if (!pkt) {
        printf("av_packet_alloc failed\n");
        goto failed;
    }
    while (1) {
        // av_read_frame does not release the packet buffer for us;
        // the caller must unref it, otherwise memory leaks
        ret = av_read_frame(ifmt_ctx, pkt);
        if (ret < 0) {
            if (ret != AVERROR_EOF) {
                av_strerror(ret, errors, ERROR_STRING_SIZE);
                printf("av_read_frame failed:%s\n", errors);
            }
            break;
        }
        // A packet was read successfully, so its buffer must be released below
        if (pkt->stream_index == video_index) {
            // Video: the filter takes ownership of the buffer we pass in
            ret = av_bsf_send_packet(bsf_ctx, pkt);
            if (ret < 0) { // rarely happens
                av_strerror(ret, errors, ERROR_STRING_SIZE);
                printf("av_bsf_send_packet failed:%s\n", errors);
                av_packet_unref(pkt);
                continue;
            }
            // No av_packet_unref(pkt) here: the buffer now belongs to the filter
            while (1) {
                ret = av_bsf_receive_packet(bsf_ctx, pkt);
                if (ret != 0) {
                    break;
                }
                size_t size = fwrite(pkt->data, 1, pkt->size, h264_fd);
                if (size != (size_t)pkt->size)
                {
                    av_log(NULL, AV_LOG_DEBUG, "h264 warning, length of written data isn't equal pkt->size(%d, %d)\n",
                           (int)size,
                           pkt->size);
                }
                av_packet_unref(pkt);
            }
        } else if (pkt->stream_index == audio_index) {
            // Audio: MP4 stores raw AAC frames, so prepend an ADTS header.
            // Not applicable to TS input: packets demuxed from TS already carry one.
            char adts_header_buf[7] = {0};
            ret = adts_header(adts_header_buf, pkt->size,
                              ifmt_ctx->streams[audio_index]->codecpar->profile,
                              ifmt_ctx->streams[audio_index]->codecpar->sample_rate,
                              ifmt_ctx->streams[audio_index]->codecpar->channels);
            if (ret == 0) {
                fwrite(adts_header_buf, 1, 7, aac_fd);                 // write the ADTS header
                size_t size = fwrite(pkt->data, 1, pkt->size, aac_fd); // write the AAC payload
                if (size != (size_t)pkt->size)
                {
                    av_log(NULL, AV_LOG_DEBUG, "aac warning, length of written data isn't equal pkt->size(%d, %d)\n",
                           (int)size,
                           pkt->size);
                }
            }
            av_packet_unref(pkt);
        } else {
            av_packet_unref(pkt); // release the buffer of streams we ignore
        }
    }
    printf("while finish\n");
    exit_code = 0;
failed:
    // Single cleanup path; every function below tolerates a NULL pointer
    if (h264_fd) {
        fclose(h264_fd);
    }
    if (aac_fd) {
        fclose(aac_fd);
    }
    if (pkt)
        av_packet_free(&pkt);
    if (bsf_ctx)
        av_bsf_free(&bsf_ctx);
    if (ifmt_ctx)
        avformat_close_input(&ifmt_ctx);
    return exit_code;
}
Build the project
Put source.200kbps.768x320.mp4 into …
Settings

Run
Play
cd E:\Ovoice\av_media\online\07-ffmpeg-decode\build-07-08-demux-mp4-Desktop_Qt_5_10_1_MinGW_32bit-Debug
ffplay out.aac
ffplay out.h264