Information entropy is also related to (lossless) data compression:
For example, it quantifies the limit of how much (lossless) compression of data can be achieved, a problem in information theory known as source coding.

Specifically:
Entropy can be interpreted as the average length of the shortest encoding of a sequence of random events, which here are repeated draws (with replacement) of colored marbles from an urn. (Top) With equal probability of each color, the shortest binary recording assigns a two-digit binary string code to each event. The entropy is the average number of bits per event of a typical sequence: 2 bits. (Bottom) When some colors are more likely than others, the more probable ones can be recorded as shorter binary strings to save space. This gives a lower entropy: 1.75 bits.
Computing the entropy of the two probability distributions directly confirms this: the entropy of the first case is 2 bits, and of the second case 1.75 bits.
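As a quick check, here is a minimal Python sketch. The second distribution is an assumption: probabilities of 1/2, 1/4, 1/8, 1/8 for the four colors, the standard example that yields 1.75 bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Case 1: four equally likely colors -> 2.0 bits
print(entropy([0.25, 0.25, 0.25, 0.25]))

# Case 2: skewed distribution (assumed: 1/2, 1/4, 1/8, 1/8) -> 1.75 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))
```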
In particular, regarding the second case:
This is an example of a prefix code, which can be uniquely decoded even if the codewords are concatenated one after another, since no codeword is a prefix of another.
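To make the prefix property concrete, here is a small sketch. The codewords (0, 10, 110, 111) and color names are assumptions matching the 1.75-bit example above; decoding proceeds greedily, reading bits until a complete codeword is matched, with no separators needed.

```python
# Hypothetical prefix code: more probable colors get shorter codewords.
CODE = {"red": "0", "blue": "10", "green": "110", "yellow": "111"}
DECODE = {v: k for k, v in CODE.items()}

def encode(events):
    return "".join(CODE[e] for e in events)

def decode(bits):
    """Greedy decoding: since no codeword is a prefix of another,
    the first complete match is always the right one."""
    events, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:
            events.append(DECODE[buf])
            buf = ""
    return events

msg = ["red", "blue", "red", "green"]
bits = encode(msg)          # "0100110"
assert decode(bits) == msg  # uniquely decodable without separators
```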
If the same fixed-length encoding from the first case were used here instead, the entropy would not change, but the average code length would grow to 2 bits:
In other words, this code would introduce redundancy.
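A quick worked comparison (under the same assumed probabilities) makes the redundancy explicit: the variable-length prefix code achieves an average of 1.75 bits per event, equal to the entropy, while the fixed 2-bit code wastes 0.25 bits per event.

```python
probs = {"red": 0.5, "blue": 0.25, "green": 0.125, "yellow": 0.125}

# Variable-length prefix code (assumed): codeword lengths 1, 2, 3, 3
var_len = {"red": 1, "blue": 2, "green": 3, "yellow": 3}
avg_var = sum(probs[c] * var_len[c] for c in probs)   # 1.75 bits = entropy

# Fixed-length code from the first case: every codeword is 2 bits
avg_fixed = sum(probs[c] * 2 for c in probs)          # 2.0 bits

print(avg_var, avg_fixed, avg_fixed - avg_var)        # redundancy: 0.25 bits
```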