美文网首页
17. The Huffman code problem

17. The Huffman code problem

作者: 何大炮 | 来源:发表于2017-09-25 09:46 被阅读0次

    The principle behind this technique is to design a binary character code for each character, using a variable number of bits to represent each character, so as to reduce the total length of the document.

    To allow easy, unambiguous decoding, we require the code to be a prefix code: no code is a prefix of another

    For example, if the codewords are {0, 01, 11, 001}, the decoding of string like 001 is ambiguous. You can interpret it as 0-01, or 001.

    Solution ---- insisting on the prefix-free property: no codeword can be a prefix of another codeword, where codeword is a string of digits to represent a character in the encoding.

    Details

    Given a character set C and the frequency f (c) of each character c belongs to C in the document A, the problem is to find a prefix code that minimizes the total length of the document.
    The tree representing such an optimal code is called a Huffman tree T . We aim to minimize

    B(T,A) = sum of f(c)d(c)

    ——d (c) is the depth of character c in the tree T (numbers of bits used to encode character c), or the length of the codeword of character c.
    ——B(T,A) is the total number of bits for document A based on the Huffman tree T.

    The problem is to design a tree T to minimize B(T,A).

    1. There are no nodes with only a single child.
    2. Rearranging characters within the same level of the tree T does not change B(T,A).
    3. Swapping a character at a higher level with another less-frequent character reduces B(T,A), where the level of the tree root is the lowest level

    We have a greedy strategy for the Huffman coding problem:

    Process

    For efficient implementation of a Huffman tree, we can use a priority queue data structure - a minimum heap

    The running time is O(n) +O(n) + O(nlogn) = O(nlogn)

    相关文章

      网友评论

          本文标题:17. The Huffman code problem

          本文链接:https://www.haomeiwen.com/subject/argnsxtx.html