Python Performance Optimization Guide

Author: openex | Published 2018-04-04 16:48


This article collects performance tips I have picked up while developing in Python.
Continuously updated...

1. Strings

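In Python, string objects are immutable, so concatenating two strings creates a new string.
When a long string is built up iteratively, adding pieces one at a time not only slows things down but also costs extra memory for the intermediate strings (compare Java's StringBuffer). However, when only a few strings need to be combined, join is not necessarily the best choice.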

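Use join where it fits (many pieces)
For a few strings, use %-formatting, e.g. "%s%s" % (a, b)
str.format is comparatively slow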

Avoid:

s = ""
for x in list1:
    s += x
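A minimal sketch contrasting the two approaches above. The function names `concat_loop` and `concat_join` and the sample data are hypothetical, chosen for illustration. Timings depend on the interpreter (CPython can sometimes grow a string in place for `s += x`), so no numbers are claimed; run it on your own setup.

```python
import timeit

def concat_loop(parts):
    # Repeated += may allocate a new string on each iteration
    s = ""
    for x in parts:
        s += x
    return s

def concat_join(parts):
    # join computes the total length once and builds the result in one pass
    return "".join(parts)

parts = [str(i) for i in range(10_000)]  # sample data

# For just a couple of values, %-formatting is a simple alternative to join
greeting = "%s, %s!" % ("Hello", "world")

if __name__ == "__main__":
    for fn in (concat_loop, concat_join):
        t = timeit.timeit(lambda: fn(parts), number=100)
        print(f"{fn.__name__}: {t:.3f}s")
```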

2. Loops

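The Python interpreter incurs considerable overhead when executing an explicit for loop, so where possible use a list comprehension instead. Generator expressions (iterators) also reduce memory use, because items are produced lazily instead of being stored in a full list, and map can speed things up when an existing function is applied to every element.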

newlist = [s.upper() for s in oldlist]

iterator = (s.upper() for s in oldlist)
newlist = list(iterator)
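A short sketch of the map and generator-expression techniques, assuming `oldlist` is a list of strings (the sample data is made up). `map` with a built-in method such as `str.upper` applies it to each element directly; a generator expression passed straight to a consumer like `sum` never builds the intermediate list.

```python
oldlist = ["alpha", "beta", "gamma"]  # sample data for illustration

# map applies an existing function to each element; list() collects the results
upper_map = list(map(str.upper, oldlist))

# List comprehension producing the same result
upper_comp = [s.upper() for s in oldlist]

# Generator expression: each length is produced and consumed by sum one at a time,
# so no list of lengths is ever stored in memory
total_len = sum(len(s) for s in oldlist)

print(upper_map)   # ['ALPHA', 'BETA', 'GAMMA']
print(total_len)   # 14
```

Note that wrapping a generator in `list()` is usually no faster than a list comprehension; the memory saving appears only when the consumer does not need the whole list at once.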

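Inside loops, avoid the . (dot) operator where possible: every attribute lookup adds per-iteration overhead, so bind frequently used methods to local names before the loop.
Avoid: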

newlist = []
for x in oldlist:
    newlist.append(x.upper())

Recommended:

newlist = []
append = newlist.append
upper = str.upper
for x in oldlist:
    append(upper(x))
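A hedged benchmark sketch comparing the two loops above; `with_dots`, `with_local_aliases` and the sample data are hypothetical names for illustration. Results vary by Python version: newer CPython releases (3.11+) specialize attribute lookups at runtime, which may narrow the gap.

```python
import timeit

oldlist = ["word%d" % i for i in range(10_000)]  # sample data

def with_dots():
    newlist = []
    for x in oldlist:
        newlist.append(x.upper())  # two attribute lookups on every iteration
    return newlist

def with_local_aliases():
    newlist = []
    append = newlist.append  # look up the bound method once, before the loop
    upper = str.upper        # look up the unbound method once, before the loop
    for x in oldlist:
        append(upper(x))
    return newlist

if __name__ == "__main__":
    for fn in (with_dots, with_local_aliases):
        print(fn.__name__, timeit.timeit(fn, number=200))
```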

3. Variables

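Inside functions, prefer local variables. In CPython, a local is accessed by index in the function's frame, whereas a global name needs a dictionary lookup in the module's globals (and then in builtins if it is not found there), so names used inside a hot loop are worth binding locally.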

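Avoid:

# tool1 stands for any per-item helper function defined at module level
def f1():
    newlist = []
    append = newlist.append
    # oldlist is read from the global scope (no `global` statement is needed just to read it)
    for x in oldlist:
        append(tool1(x))  # tool1 is resolved in the globals dict on every iteration
    return newlist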

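Recommended:

def f2(oldlist):
    # oldlist is a parameter, i.e. a local variable
    newlist = []
    append = newlist.append
    tool = tool1  # one global lookup here instead of one per item
    for x in oldlist:
        append(tool(x))
    return newlist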
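A minimal timing sketch of global versus local name lookups. The helper `tool1` is defined here only as a hypothetical stand-in (the original does not show it), and the default-argument trick `tool=tool1` is one common way to bind a global to a local name at definition time. Running `dis.dis` on each function would show `LOAD_GLOBAL` versus `LOAD_FAST` for these names.

```python
import timeit

def tool1(s):
    # Hypothetical per-item helper standing in for tool1 in the text
    return s.upper()

oldlist = ["item%d" % i for i in range(10_000)]  # module-level (global) sample data

def uses_globals():
    newlist = []
    for x in oldlist:
        newlist.append(tool1(x))  # tool1 looked up in the globals dict each time
    return newlist

def uses_locals(items, tool=tool1):
    # items and tool are parameters, so both are local variables
    newlist = []
    append = newlist.append
    for x in items:
        append(tool(x))
    return newlist

if __name__ == "__main__":
    print("globals:", timeit.timeit(uses_globals, number=200))
    print("locals: ", timeit.timeit(lambda: uses_locals(oldlist), number=200))
```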

4. Dictionaries

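Optimizing the case where a key has no initial value yet (the gain is modest, around 20%, and it somewhat hurts readability). The try/except version only pays off when most keys already exist, because raising and catching KeyError is comparatively expensive.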

Original approach:

def f5(oldlist):
    d = {}
    for string in oldlist:
        if string not in d:
            d[string] = 1  # first occurrence counts as 1
        else:
            d[string] += 1
    return d

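Optimized approaches:

def f6(oldlist):
    d = {}
    for string in oldlist:
        try:
            d[string] += 1
        except KeyError:  # raised only on a key's first occurrence
            d[string] = 1
    return d

from collections import defaultdict

def f8(oldlist):
    d = defaultdict(int)  # missing keys start at int() == 0
    for string in oldlist:
        d[string] += 1
    return d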
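The standard library also offers `collections.Counter`, a dict subclass built for exactly this counting pattern; it is not part of the original list and is shown only as an alternative. In CPython its constructor typically does the counting loop in C, but measure before relying on it. The function names and sample data below are illustrative.

```python
from collections import Counter, defaultdict

def count_defaultdict(oldlist):
    # Same technique as f8 above
    d = defaultdict(int)
    for string in oldlist:
        d[string] += 1
    return d

def count_counter(oldlist):
    # Counter counts hashable items from any iterable in one call
    return Counter(oldlist)

words = ["a", "b", "a", "c", "a", "b"]  # sample data
print(count_counter(words))  # Counter({'a': 3, 'b': 2, 'c': 1})
```

`Counter` also provides helpers such as `most_common()`, which the plain-dict versions would have to reimplement.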