Each sample represents the amplitude of the wave, or the displacement of the membrane, as a percentage of the maximum possible value. The advantage of using percentages is hardware independence. Half is half, regardless of what you’re halving.
Unlike integers, in which larger numbers require more digits to express, fractions need more digits to express smaller numbers. Writing 100 takes more digits than writing 10, but writing [1/100] takes more digits than writing [1/10]. A sample’s bit depth is the number of digits it has. If the difference between two sounds is smaller (as a percentage of the maximum possible displacement) than the sample has digits to express, that difference is lost.
The bit depth (measured in bits per sample) multiplied by the sample rate (measured in samples per second) results in the bit rate (measured in bits per second). That is the number of bits required to describe 1 second of audio. A higher bit rate gives a higher-quality recording but also means more bits for the hardware to store and process.
The fundamental problem of digital fidelity lies in making the best approximations, given the limits of the hardware. Every different format is a different solution set of compromises to solve this problem. This is true not only in digital audio, but generally. Digital photography, for example, has the same sorts of problems, with the same alphabet soup of formats, each offering its own solution.
1. Each sample expresses the membrane's current displacement as a fraction of the maximum amplitude. If the smallest unit a sample can express is 1/100, then a displacement of 1/100 + 1/200 loses the 1/200 part: it falls below the sample's resolution.
2. A smallest step of 1/100 needs more bits per sample than a smallest step of 1/10, because representing 100 distinct levels takes at least 7 bits, while 10 levels takes only 4.
3. Bit depth (bits per sample) × sample rate (samples per second) = bit rate, the number of bits needed to describe one second of audio. A larger value means a higher sample rate or a greater bit depth, and therefore a higher-quality recording, but it also means more bits to store and process (see the sketch after this list).
4. Given the limits of the hardware, you have to balance bit rate against fidelity. Every format is a different set of compromises for solving this problem.
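To make that arithmetic concrete, here is a minimal C sketch using the CD-quality values quoted later in this section; the variable names are our own:

#include <stdio.h>

int main(void) {
    // CD-quality parameters
    const int channels   = 2;
    const int bitDepth   = 16;     // bits per sample
    const int sampleRate = 44100;  // samples per second

    // bit rate = channels x bit depth x sample rate
    const int bitRate = channels * bitDepth * sampleRate;
    printf("bit rate: %d bits/sec (about %d kbps)\n", bitRate, bitRate / 1000);

    // Quantization: a 16-bit sample has 65,536 steps, so any detail
    // smaller than one step is lost. Here, "half plus half a step"
    // quantizes to the same value as exactly half.
    short a = (short)((0.5 + 1.0 / 131072.0) * 32767);
    short b = (short)(0.5 * 32767);
    printf("quantized: %d vs. %d\n", a, b);   // prints the same number twice
    return 0;
}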
One niggling implementation detail is that this metaphor works only with a grayscale image. Computers can’t actually display color pixels the way humans see them. Each color pixel consists of red, green, and blue lights. Each requires its own set of data, called a channel. Most image formats combine one sample from each channel into a bundle representing a single pixel.
Digital audio shares these issues. A monaural sound wave is like a grayscale image, but many sound systems have multiple speakers. Just as a pixel requires channels for red, blue, and green, stereo sound requires channels for left and right. Surround sound adds extra channels. A typical 5.1 surround-sound signal has six channels: left and right channels for the front and rear, a center channel, and a nondirectional channel for low-frequency effects, known to aficionados simply as bass.
As with their graphical brethren, audio formats typically combine one sample per channel into a bundle, called a frame. Whereas a pixel represents all color channels in one area in space, a frame represents all audio channels at one moment in time. So for mono sound, a frame has one sample; for stereo, it has two. If you put multiple channels of sound in one stream, you call the channels interleaved. This is common for playback: Because you want to read all the channels at the same time, it makes sense to arrange the data to do this easily.
1. Each color pixel is made of red, green, and blue; each color needs its own set of data, called a channel. Most image formats bundle one sample from each channel into a package representing a single pixel.
2. Audio is similar: stereo sound needs left and right channels, and surround sound adds more. A typical 5.1 surround signal needs six channels: left and right for the front and rear, a center channel, and a nondirectional channel for low-frequency effects, known to aficionados as bass.
3. As with images, audio combines one sample per channel into a bundle, called a frame. A pixel represents all the color channels at one area in space, while a frame represents all the audio channels at one moment in time. So a mono frame has one sample, and a stereo frame has two. When you put multiple channels of sound into one stream, the channels are said to be interleaved, laid out as sketched below. This is common for playback: you want to read all the channels at the same time, so it pays to arrange the data accordingly.
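A small sketch (our own, not from the book) of how interleaved 16-bit stereo lays out in memory: each frame holds one sample per channel, left then right:

#include <stdint.h>

#define CHANNELS 2   // stereo: left and right

// Interleaved layout: frame i starts at index i * CHANNELS.
int16_t samples[] = {
    100, -100,   // frame 0: left, right
    200, -200,   // frame 1
    300, -300,   // frame 2
};

// Fetch one channel's sample from a given frame.
int16_t sampleAt(int frame, int channel) {
    return samples[frame * CHANNELS + channel];
}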
Some audio formats combine multiple frames as packets. This concept is entirely a creation of a given audio format and represents an indivisible unit within that format. LPCM doesn’t use packets, but compressed audio formats do because they use mathematical techniques with a group of samples to approximate their values. A given sample can often be predicted to some degree by those around it, so compressed formats can use groups of samples, arranged in frames, to produce a similar (if not identical) wave from much less data than the uncompressed LPCM original.
1. Multiple frames combined are called a packet.
2. LPCM doesn't use packets; that is, uncompressed formats have no need for them.
3. Compressed formats do use packets. The advantage of compression is that it can apply mathematical techniques to a group of samples to approximate their values: a given sample can be predicted to some degree from its neighbors, so a similar (if not identical) wave can be produced from far less data than the uncompressed LPCM original (a toy illustration follows).
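The book doesn't spell out the math, but a toy delta encoding shows why neighboring samples compress well: if each sample is predicted from the previous one, only the small differences need to be stored. This is purely an illustration, not the scheme any real codec uses:

#include <stdint.h>
#include <stddef.h>

// Toy predictor: guess that each sample equals the previous one and
// store only the difference. Smooth waves change slowly, so the
// differences are small and fit in fewer bits than the samples do.
void deltaEncode(const int16_t *in, int8_t *out, size_t n) {
    int16_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = (int8_t)(in[i] - prev);  // assumes each delta fits in 8 bits
        prev = in[i];
    }
}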
We mentioned the bit rate earlier, the amount of data over a given period of time to represent audio in some format. For PCM, the bit rate is constant: CD-quality audio has a bit rate of 1,411,200 bits per second, or 1,411 kbps, because it has 2 channels × 16 bits × 44,100 samples per second. PCM is said to have a constant bit rate because this data rate never changes for a given combination of channel count, bit depth, and sample rate. Compressed formats often use a variable bit rate, meaning that the amount of data needed to compress any particular part of the data changes. Core Audio supports variable frame rate formats: The amount of data for any given frame may vary, but the packets remain the same size. Core Audio also supports variable packet rate, in which even the packet rate may change. However, currently no variable packet rate formats are in widespread use.
One reason to care about this distinction is that constant bit rate (CBR) data sometimes employs simpler API calls than variable bit rate (VBR). For example, when you read audio data from a file or stream, VBR data supplies you with a block of data and an array of packet descriptions that help you figure out what samples go with what times. For CBR, this is unnecessary; the amount of data for every frame is constant, so you can find a given time’s samples by multiplying the frame size by the time.
PCM: pulse-code modulation
LPCM: linear pulse-code modulation
Both are ways of encoding an analog audio signal as a digital one; they are lossless, uncompressed encodings.
For PCM, bit rate = channel count × bit depth × sample rate, so if those three are fixed, the bit rate is constant.
Compressed formats usually use a variable bit rate, meaning the amount of data for any given frame may change while the packets stay the same size; an unchanging packet size means an unchanging packet rate. Core Audio also supports a variable packet rate, in which even the packet rate may change.
Constant-bit-rate (CBR) data sometimes uses simpler API calls than variable-bit-rate (VBR) data. For example, when you read audio data from a file or stream, VBR data hands you a block of data plus an array of packet descriptions that tell you which samples belong to which times. For CBR this is unnecessary: every frame holds the same amount of data, so you find a given time's samples by multiplying the frame size by the time (see the sketch below).
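A sketch of that CBR arithmetic, plus the real Core Audio struct that VBR reads hand back; the function here is our own illustration, while the struct comes from CoreAudioTypes.h:

#include <CoreAudio/CoreAudioTypes.h>

// CBR: a given time's samples are found by simple multiplication;
// no packet descriptions are needed.
SInt64 byteOffsetForTime(double seconds, double sampleRate,
                         UInt32 bytesPerFrame) {
    SInt64 frameIndex = (SInt64)(seconds * sampleRate);
    return frameIndex * bytesPerFrame;
}

// VBR: read calls also fill out an array of these, one per packet,
// so you can map packets back to byte positions and times:
//   struct AudioStreamPacketDescription {
//       SInt64 mStartOffset;            // where the packet starts
//       UInt32 mVariableFramesInPacket; // 0 unless frames per packet vary
//       UInt32 mDataByteCount;          // size of this packet
//   };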
The repetition frequency determines the pitch you hear; the waveform shapes the character, or timbre, of the sound.
Two sounds generated at the same frequency can still sound different, because their timbres differ.
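A quick sketch of that point: both loops below generate a 440 Hz tone, identical in pitch, but the sine is smooth and pure while the square wave, with the same repetition rate, sounds harsh and buzzy:

#include <math.h>
#include <stdint.h>

#define SAMPLE_RATE 44100
#define FREQUENCY   440.0   // the same pitch for both waveforms

void fillSine(int16_t *buf) {
    for (int i = 0; i < SAMPLE_RATE; i++)
        buf[i] = (int16_t)(32767 * sin(2.0 * M_PI * FREQUENCY * i / SAMPLE_RATE));
}

void fillSquare(int16_t *buf) {
    double halfCycle = SAMPLE_RATE / FREQUENCY / 2.0;  // samples per half cycle
    for (int i = 0; i < SAMPLE_RATE; i++)
        buf[i] = (fmod(i, 2.0 * halfCycle) < halfCycle) ? 32767 : -32768;
}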
Buffers
You may have noticed that writing the book's example out to a file takes several seconds, which is not great for generating just 5 seconds of audio.
Writing one sample at a time is very inefficient: AudioFileWriteBytes() can write many bytes in a single call, and each function call carries overhead, so in practice you build a buffer holding a batch of samples and write them all to the file at once. Nor is this only a file-writing problem; producing sound at runtime raises the same issue. Simply put, different parts of the computer run at different speeds. The audio hardware produces or consumes a packet of audio data in far less time than it takes to move that packet in and out of memory; this slowdown is known as the von Neumann bottleneck.
When the audio hardware has nothing to do, it emits noise and breaks the magic. To prevent that, you pass audio data around in buffers. Plenty of large buffers mean that any hiccup in the computer's slow parts can be resolved before the fast parts notice. When one buffer of audio data runs dry, the next has (you hope) already arrived.
There is a downside, though:
Aside from further increasing the complexity of the subject, buffering has the unpleasant side effect of making it harder to get the hardware’s attention when you actually want it. Think of it as data-level bureaucracy—some of it’s necessary, but too much of it makes everything take too long.
// When writing a compressed format, you must use the more elaborate
// AudioFileWritePackets() instead.
// audioFile:       the file to write to
// false:           whether to cache the data (no cache needed here)
// sampleCount * 2: the byte offset to write at (each 16-bit sample is 2 bytes)
// &bytesToWrite:   the number of bytes to write
// &sample:         the bytes to be written
audioErr = AudioFileWriteBytes(audioFile,
                               false,
                               sampleCount * 2,
                               &bytesToWrite,
                               &sample);
assert(audioErr == noErr);
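As the note above says, one call per sample is wasteful. A hedged sketch of the buffered alternative, reusing the example's audioFile, sampleCount, and audioErr; the buffer size and fill loop are our own:

#define BUFFER_FRAMES 4096            // arbitrary batch size for illustration

SInt16 buffer[BUFFER_FRAMES];
for (int i = 0; i < BUFFER_FRAMES; i++)
    buffer[i] = 0;                    // fill with computed samples here

UInt32 bytesToWrite = sizeof(buffer);
audioErr = AudioFileWriteBytes(audioFile,
                               false,            // don't cache
                               sampleCount * 2,  // byte offset to write at
                               &bytesToWrite,    // in/out byte count
                               buffer);          // the whole batch in one call
assert(audioErr == noErr);
sampleCount += BUFFER_FRAMES;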
Another problem is latency: the delay between starting an action and seeing its result. For example, iPhone and iPod touch hardware latency is 15-30 ms, depending on the model.
Buffers and latency are a delicate balancing act: bigger buffers mean higher latency, but if you push your luck chasing lower latency, you can exhaust your buffers and hear dropouts (silence) or noise.
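The tradeoff is easy to quantify: one buffer contributes its length in frames divided by the sample rate. For instance:

// One buffer's worth of latency = frames / sample rate:
// a 1,024-frame buffer at 44,100 Hz adds about 23 ms.
double latencySeconds = 1024.0 / 44100.0;   // ≈ 0.023 s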
Audio formats:
Describing audio and storing it on a file system are entirely different problems. A data format describes the audio itself; a file format solves the problem of storing it on a file system. You can think of an audio file as a container for audio data.
Some file formats are tailored to a specific data format: an MP3 file, for example, cannot contain PCM data.
The same goes for Windows Media files, which hold only their own data format.
AIFF files hold PCM data, which must be big-endian.
WAV files hold PCM data, which must be little-endian.
The MP4 file format can contain data in several formats, including AAC, PCM, and AC3.
CAF: a CAF file can contain any audio format Core Audio supports: MP3, AAC, Apple Lossless, you name it. That makes CAF the best choice of container format for an application's internal audio, such as background music or sound effects.
CAF also uses a number of tricks to improve performance. One example involves MP3 audio: because MP3 has a variable bit rate, to jump to an arbitrary point in time an MP3 file must decompress data from the current playback position until it reaches the target time; there is no other way to know which part of the file represents that time. This costs a great deal of I/O and CPU and usually takes considerable time to perform.
CAF, by contrast, keeps an internal lookup table mapping times to samples, so it can seek almost instantly.
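A minimal sketch of picking CAF as the container with Audio File Services; the output path and the 16-bit stereo LPCM description are our assumptions, and error handling is elided:

#include <AudioToolbox/AudioToolbox.h>

int main(void) {
    // Describe 16-bit stereo LPCM at 44.1 kHz.
    AudioStreamBasicDescription asbd = {0};
    asbd.mSampleRate       = 44100.0;
    asbd.mFormatID         = kAudioFormatLinearPCM;
    asbd.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    asbd.mBitsPerChannel   = 16;
    asbd.mChannelsPerFrame = 2;
    asbd.mBytesPerFrame    = 4;   // 2 channels x 2 bytes
    asbd.mFramesPerPacket  = 1;   // PCM: one frame per packet
    asbd.mBytesPerPacket   = 4;

    CFURLRef fileURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
        CFSTR("/tmp/output.caf"), kCFURLPOSIXPathStyle, false);

    AudioFileID audioFile;
    OSStatus err = AudioFileCreateWithURL(fileURL,
                                          kAudioFileCAFType,        // the CAF container
                                          &asbd,
                                          kAudioFileFlags_EraseFile,
                                          &audioFile);
    if (err == noErr) AudioFileClose(audioFile);
    CFRelease(fileURL);
    return 0;
}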