Python: The collections Module Explained by Example


Author: 心智万花筒 | Published 2016-08-16 08:56

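    The collections module

    The collections module provides extensions to Python's built-in data types, such as OrderedDict, defaultdict, namedtuple, deque, and Counter. They are simple and practical, and well worth learning. The listings in this article use Python 2 syntax.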

    import collections
    

    1. OrderedDict

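    As the name suggests, an OrderedDict is a dictionary that remembers order. In the Python 2 used here, a regular dict does not record insertion order, so iterating over it yields items in an arbitrary order. An OrderedDict records the order in which items were inserted, and the difference shows up during iteration.

    Iteration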

    print 'Regular dictionary:'
    d = {}
    d['a'] = 'A'
    d['b'] = 'B'
    d['c'] = 'C'
    
    for key, value in d.items():
        print key, value
    
    Regular dictionary:
    a A
    c C
    b B
    
    print 'OrderedDict:'
    d = collections.OrderedDict()
    d['a'] = 'A'
    d['b'] = 'B'
    d['c'] = 'C'
    
    for key, value in d.items():
        print key, value
    
    OrderedDict:
    a A
    b B
    c C
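
On Python 3.7 and later a plain dict also preserves insertion order, so the listing above would print a, b, c for both containers there. OrderedDict still has order-specific methods of its own. The short Python 3 sketch below is an addition, not part of the original article; it uses move_to_end(), which a plain dict lacks.

```python
from collections import OrderedDict

# Plain dict: insertion order is guaranteed from Python 3.7 onward.
plain = {}
plain['a'] = 'A'
plain['b'] = 'B'
plain['c'] = 'C'
plain_keys = list(plain)

# OrderedDict can also reorder existing keys in place.
ordered = OrderedDict(plain)
ordered.move_to_end('a')               # move 'a' to the right end
ordered_keys = list(ordered)
ordered.move_to_end('c', last=False)   # move 'c' to the left end
reordered_keys = list(ordered)

print(plain_keys)
print(ordered_keys)
print(reordered_keys)
```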
    

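    Equality comparison

    When two dictionaries are compared for equality, regular dicts look only at their contents: equal contents mean equal dicts. An OrderedDict also takes into account the order in which the items were added.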

    print 'dict       :',
    d1 = {}
    d1['a'] = 'A'
    d1['b'] = 'B'
    d1['c'] = 'C'
    
    d2 = {}
    d2['b'] = 'B'
    d2['a'] = 'A'
    d2['c'] = 'C'
    
    print d1 == d2
    
    dict       : True
    
    print 'OrderedDict:',
    d1 = collections.OrderedDict()
    d1['a'] = 'A'
    d1['b'] = 'B'
    d1['c'] = 'C'
    
    d2 = collections.OrderedDict()
    d2['b'] = 'B'
    d2['a'] = 'A'
    d2['c'] = 'C'
    
    print d1 == d2
    
    
    OrderedDict: False
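
One detail worth checking: order only matters when both operands are OrderedDict instances. The Python 3 sketch below (an addition to the article) compares an OrderedDict with a plain dict, which falls back to the order-insensitive dict comparison.

```python
from collections import OrderedDict

d1 = OrderedDict([('a', 'A'), ('b', 'B'), ('c', 'C')])
d2 = OrderedDict([('b', 'B'), ('a', 'A'), ('c', 'C')])

both_ordered = (d1 == d2)           # two OrderedDicts: order is compared
mixed = (d1 == dict(d2))            # OrderedDict vs plain dict: order ignored
as_plain = (dict(d1) == dict(d2))   # two plain dicts: order ignored

print(both_ordered, mixed, as_plain)
```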
    

    2. defaultdict

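    With a regular dict, accessing a key that does not exist raises a KeyError. A defaultdict lets you supply a default value up front, which is especially handy when the default feeds an accumulation or aggregation (counting, for example). defaultdict takes a default_factory argument, a callable that returns the default value. It can be a custom function or a built-in such as list (returns []), set (returns set()), or int (returns 0). The examples below make this concrete.

    defaultdict is a subclass of dict that adds a __missing__(key) method. dict calls this method when a d[key] lookup fails, instead of raising KeyError straight away; defaultdict's version calls default_factory, stores the result under the key, and returns it.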
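
To make the __missing__ hook concrete, here is a minimal Python 3 sketch of a dict subclass that imitates defaultdict(list). The class name ListDefaultDict is hypothetical and chosen only for illustration; the real defaultdict is implemented in C and takes the factory as a parameter.

```python
class ListDefaultDict(dict):
    """Hypothetical dict subclass imitating defaultdict(list)."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is absent.
        value = []
        self[key] = value   # store the default so later lookups find it
        return value


d = ListDefaultDict()
d['colors'].append('red')    # triggers __missing__, then appends
d['colors'].append('blue')   # key exists now, __missing__ is not called
via_get = d.get('shapes')    # get() bypasses __missing__ entirely

print(d, via_get)
```

Note that only the d[key] form goes through __missing__; get() and the in operator behave exactly as on a regular dict.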

    def default_factory():
        return 'This is default string value'
    d = collections.defaultdict(default_factory)
    print d['foo']
    
    This is default string value
    

    Here d['foo'] was never assigned, yet it can be read and returns a value. Now for something more powerful!

    list

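    Setting default_factory to list makes it easy to group a sequence of key-value pairs. A missing key starts out as an empty list; the example below groups the values that share a key.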

    s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
    d = collections.defaultdict(list)
    for k, v in s:
        d[k].append(v)
        # simpler and faster than d.setdefault(k, []).append(v)
    d.items()
    
    [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
    

    int

    Especially convenient for counting, for example to tally how many times each key occurs.

    s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
    d = collections.defaultdict(int)
    for k, v in s:
        d[k] += 1
    d.items()
    
    [('blue', 2), ('red', 1), ('yellow', 2)]
    
    s = 'mississippi'
    d = collections.defaultdict(int)
    for k in s:
        d[k] += 1
    d.items()
    
    [('i', 4), ('p', 2), ('s', 4), ('m', 1)]
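
As a cross-check, the character tally above can be compared with collections.Counter, which is covered in section 5. This is a small Python 3 sketch added for illustration, not code from the original article.

```python
from collections import Counter, defaultdict

s = 'mississippi'

tally = defaultdict(int)
for ch in s:
    tally[ch] += 1   # a missing key starts at int() == 0

counter = Counter(s)   # builds the same tally in one call

same = (dict(tally) == dict(counter))
print(dict(tally), same)
```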
    

    set

    Works like list, but the default is set(), so duplicate values are dropped.

    s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
    d = collections.defaultdict(set)
    for k, v in s:
        d[k].add(v)
    d.items()
    
    [('blue', {2, 4}), ('red', {1, 3})]
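
One caveat applies to every factory shown above: merely reading a missing key with d[key] inserts it. The Python 3 sketch below (an addition to the article) shows the side effect and two ways to probe a key without creating it.

```python
from collections import defaultdict

d = defaultdict(list)
d['yellow'].append(1)

exists = 'green' in d       # membership test does not insert
fallback = d.get('green')   # get() does not insert either
size_before = len(d)

_ = d['green']              # plain indexing calls the factory and stores []
size_after = len(d)

print(exists, fallback, size_before, size_after)
```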
    

    3. namedtuple

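    A plain tuple is indexed by number, whereas a namedtuple can also be accessed by field name. This is useful when there are many fields, or when the tuple is created far from where it is used.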

    bob = ('Bob', 30, 'male')
    print 'Representation:', bob
    
    jane = ('Jane', 29, 'female')
    print '\nField by index:', jane[0]
    
    print '\nFields by index:'
    for p in [ bob, jane ]:
        print '%s is a %d year old %s' % p
    
    Representation: ('Bob', 30, 'male')
    
    Field by index: Jane
    
    Fields by index:
    Bob is a 30 year old male
    Jane is a 29 year old female
    

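    Each namedtuple type has its own set of fields, so it has to be defined separately before use. Its fields can then be accessed by name (access by index still works).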

    # define namedtuple
    Person = collections.namedtuple('Person','name age gender')
    
    print 'Type of Person:', type(Person)
    bob = Person(name='Bob', age=30, gender='male')
    print '\nRepresentation:', bob
    
    bob = Person('Bob',30,'male') # also supported
    print 'Representation:', bob
    
    jane = Person(name='Jane', age=29, gender='female')
    print '\nField by name:', jane.name
    print 'Field by index:', jane[0]
    
    Type of Person: <type 'type'>
    
    Representation: Person(name='Bob', age=30, gender='male')
    Representation: Person(name='Bob', age=30, gender='male')
    
    Field by name: Jane
    Field by index: Jane
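
A namedtuple is still an immutable tuple, which affects how it is used in practice. The Python 3 sketch below is an addition to the article; it relies on the documented helpers _replace(), _asdict(), and _fields.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age gender')
bob = Person(name='Bob', age=30, gender='male')

# Fields cannot be reassigned; the instance is immutable like any tuple.
try:
    bob.age = 31
    mutated = True
except AttributeError:
    mutated = False

older_bob = bob._replace(age=31)   # returns a new instance, bob is unchanged
as_dict = bob._asdict()            # mapping of field name -> value
name, age, gender = bob            # still unpacks like a regular tuple

print(mutated, older_bob, dict(as_dict), Person._fields)
```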
    

    4. deque

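    A deque is a double-ended queue that supports adding and removing elements at either end. Ordinary stacks and queues are degenerate forms of a deque, in which insertion and removal are each restricted to a single end.

    A deque is still a sequence, so some list-like operations are supported as well.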

    d = collections.deque('abcdefg')
    print 'Deque:', d
    print 'Length:', len(d)
    print 'Left end:', d[0]
    print 'Right end:', d[-1]
    
    d.remove('c')
    print 'remove(c)', d
    
    Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
    Length: 7
    Left end: a
    Right end: g
    remove(c) deque(['a', 'b', 'd', 'e', 'f', 'g'])
    

    populating

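    Push elements onto the deque. Note that extendleft() inserts the items one at a time at the left end, so they end up in reverse order.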

    import collections
    
    # Add to the right
    d = collections.deque()
    d.extend('abcdefg') # append with elements from the iterable
    print 'extend    :', d
    d.append('h')
    print 'append    :', d
    
    # Add to the left
    d = collections.deque()
    d.extendleft('abcdefg')
    print 'extendleft:', d
    d.appendleft('h')
    print 'appendleft:', d
    
    extend    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
    append    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
    extendleft: deque(['g', 'f', 'e', 'd', 'c', 'b', 'a'])
    appendleft: deque(['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'])
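
The reversed result of extendleft() in the output above follows from how it works: it behaves like calling appendleft() once per item. A small Python 3 sketch, added here for illustration:

```python
from collections import deque

bulk = deque()
bulk.extendleft('abc')

one_by_one = deque()
for ch in 'abc':
    one_by_one.appendleft(ch)   # each new item goes in front of the previous ones

print(bulk, one_by_one)
```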
    

    consuming

    Pop elements from either end of the deque.

    print 'From the right:'
    d = collections.deque('abcdefg')
    while True:
        try:
            print d.pop(),
        except IndexError:
            break
    
    From the right:
    g f e d c b a
    
    print '\nFrom the left:'
    d = collections.deque('abcdefg')
    while True:
        try:
            print d.popleft(),
        except IndexError:
            break
    
    From the left:
    a b c d e f g
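
With pop() and popleft() in hand, the earlier remark that stacks and queues are degenerate deques can be sketched directly. The Python 3 code below is an addition to the article; it also tests the deque for emptiness in the loop condition rather than catching IndexError, which is an equivalent alternative to the loops above.

```python
from collections import deque

# Stack (LIFO): push and pop at the same (right) end.
stack = deque()
for item in 'abc':
    stack.append(item)
lifo = []
while stack:                 # an empty deque is falsy
    lifo.append(stack.pop())

# Queue (FIFO): push on the right, pop from the left.
queue = deque()
for item in 'abc':
    queue.append(item)
fifo = []
while queue:
    fifo.append(queue.popleft())

print(lifo, fifo)
```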
    

    5. Counter

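    A counter, as the name suggests. The constructor accepts the following forms for initialization: a sequence of items, a dict mapping items to counts, or keyword arguments.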

    print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
    print collections.Counter({'a':2, 'b':3, 'c':1})
    print collections.Counter(a=2, b=3, c=1)
    
    Counter({'b': 3, 'a': 2, 'c': 1})
    Counter({'b': 3, 'a': 2, 'c': 1})
    Counter({'b': 3, 'a': 2, 'c': 1})
    

    update

    c = collections.Counter()
    print 'Initial :', c
    
    c.update('abcdaab')
    print 'Sequence:', c
    
    c.update({'a':1,'d':5}) # increase, not replace
    print 'Dict    :', c # add to a and d
    
    Initial : Counter()
    Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
    Dict    : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})
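
The contrast with dict.update() is easy to miss, so here is a side-by-side Python 3 sketch (an addition to the article). It also uses Counter.subtract(), the documented inverse of update().

```python
from collections import Counter

plain = {'a': 3, 'b': 2}
plain.update({'a': 1})              # dict.update replaces: 'a' becomes 1

counts = Counter({'a': 3, 'b': 2})
counts.update({'a': 1})             # Counter.update adds: 'a' becomes 4
after_update = counts['a']

counts.subtract({'a': 2, 'b': 5})   # subtracts counts; results may go negative

print(plain, after_update, counts)
```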
    

    Access

    Access uses the same API as a dict, but a missing key does not raise an exception; its count is reported as 0.

    c = collections.Counter('abcdaab')
    for letter in 'abcde':
        print '%s : %d' % (letter, c[letter])
    
    a : 3
    b : 2
    c : 1
    d : 1
    e : 0
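
Unlike defaultdict, a Counter does not store a missing key when it is merely read. The Python 3 sketch below (added for illustration) checks this.

```python
from collections import Counter

c = Counter('abcdaab')

missing = c['e']            # reports 0 for an unseen key
stored = 'e' in c           # the lookup did not create the key
size_before = len(c)

c['e'] += 1                 # an explicit assignment does create it
size_after = len(c)

print(missing, stored, size_before, size_after)
```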
    

    elements

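    Returns an iterator over all elements, each repeated as many times as its count. Elements whose count is below one are skipped, which is why 'z' is absent from the list below.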

    c = collections.Counter('China')
    c['z'] = 0
    print c
    print list(c.elements())
    
    Counter({'a': 1, 'C': 1, 'i': 1, 'h': 1, 'n': 1, 'z': 0})
    ['a', 'C', 'i', 'h', 'n']
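
Because every count in the example above is 0 or 1, the repetition is not visible there. A Python 3 sketch with larger and negative counts, added for illustration:

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1, d=1)

# Each element appears `count` times; counts below one are skipped.
# sorted() is used only to make the result independent of iteration order.
expanded = sorted(c.elements())

print(expanded)
```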
    

    most_common()

    Returns the n most common elements with their counts, most frequent first.

    c = collections.Counter('abcdaab')
    c.most_common(2)
    
    [('a', 3), ('b', 2)]
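
When called without an argument, most_common() returns every element, ordered from most to least frequent. A Python 3 sketch, added for illustration:

```python
from collections import Counter

c = Counter('abcdaab')

top_one = c.most_common(1)
everything = c.most_common()          # no argument: all items, highest count first
counts = [n for _, n in everything]   # just the counts, in returned order

print(top_one, everything)
```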

