Notes on String Comparison in Python 2.7.x


Author: LaxChan | Published 2018-11-16 18:28

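    String prefixes

    • u prefix
      Unicode string (type unicode in Python 2)
    • b prefix
      Byte string (type str in Python 2)
    • No prefix
      The default: in Python 2 a byte string (type str), stored in the source file's encoding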
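
The prefix rules above can be checked directly on whichever interpreter is in use. The following is a minimal sketch rather than part of the original article; the helper name `literal_kinds` is made up for illustration.

```python
import sys

def literal_kinds():
    """Return the type names of the three literal forms on this interpreter."""
    return {
        "u": type(u"abc").__name__,
        "b": type(b"abc").__name__,
        "none": type("abc").__name__,
    }

kinds = literal_kinds()
print("Python %d: %s" % (sys.version_info[0], kinds))
# On Python 2.7: u -> unicode, b -> str,   none -> str
# On Python 3:   u -> str,     b -> bytes, none -> str
```

The point that matters for the rest of the article is that the u form and the b form are always two different types, on either major version.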

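    The problem

    • Intersecting two lists of strings is slow
      The two lists hold the same strings, a little over 3k entries each
      Computing the intersection by looping takes nearly 5 s
    • String types
      The strings in one list carry the u prefix; those in the other do not
    • Code sample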
    a12 = [s for s in a1 if s in a2]
    

    Initial approach

    • Encode the elements of the u-prefixed list
    a12 = [s for s in a1 if s.encode("utf-8") in a2]
    
    • Result
    ('diff|loop speed ', 4998.8330078125ms, '| encode loop speed ', 123.25ms)
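
Even with the encode step, the loop still scans the whole of a2 for every element of a1. A possible refinement, sketched here under the assumption that a1 holds text strings and a2 holds UTF-8 byte strings, is to build a set from a2 once and test membership against it. The function name `ordered_common` and the sample data are hypothetical, not taken from the article.

```python
def ordered_common(text_items, byte_items, encoding="utf-8"):
    """Return the elements of text_items whose encoded form occurs in byte_items.

    Unlike a set intersection, this keeps the order and duplicates of text_items.
    """
    lookup = set(byte_items)  # built once; membership tests are O(1) on average
    return [s for s in text_items if s.encode(encoding) in lookup]

# Hypothetical sample data: a1 holds text strings, a2 holds UTF-8 byte strings.
a1 = [u"item%d" % i for i in range(5)]
a2 = [("item%d" % i).encode("utf-8") for i in range(2, 8)]
a12 = ordered_common(a1, a2)
print(a12)
```

This variant is worth considering when the order of a1 or its duplicate entries must be preserved, which a plain set intersection does not guarantee.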
    

    Using the intersection function

    • Code
    a13 = list(set(a1).intersection(set(a2)))
    
    • Result
    ('diff|loop speed ', 4998.8330078125ms, '| encode loop speed ', 123.25ms, '|intersection speed : ', 3.626953125ms)
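
Two properties of the set-based approach are worth keeping in mind: the result has no defined order and duplicates collapse, and `intersection()` accepts any iterable, so wrapping the second list in `set()` is optional. A small sketch with made-up data:

```python
a1 = ["b", "a", "c", "a"]
a2 = ["a", "c", "d"]

# intersection() accepts any iterable, so a2 can be passed as a plain list.
a13 = list(set(a1).intersection(a2))

# Sort only to get a stable display order; the set itself is unordered.
print(sorted(a13))
```

The duplicate "a" in a1 appears once in the result, which differs from the list comprehension used earlier.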
    

    Preliminary conclusion

    • To intersect two lists, prefer the intersection function

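    Comparing the approaches

    • What is compared
      Lists with the same encoding and lists with different encodings, intersected by looping, by looping with encoding, and with the intersection function (with and without encoding first)
    • Test code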
    def loopComp(a,b):
        c=[s for s in a if s in b]
        print('loopComp ret size : ',len(c))
    
    def intersectionComp(a,b):
        c=list(set(a).intersection(set(b)))
        print('intersectionComp ret size : ',len(c))
    
    def encodeIntersectionComp(a,b):
        a1=[s.encode("utf-8") for s in a]
        c=list(set(a1).intersection(set(b)))
        print('encodeIntersectionComp ret size : ',len(c))
    
    def encodeloopComp(a,b):
        c=[s for s in a if s.encode("utf-8") in b]
        print('encodeloopComp ret size : ',len(c))
    
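    # %time is an IPython magic, so the timing lines below run in IPython/Jupyter.
    # a1, a2 and a3 are the test lists; their construction is not shown here.
    print('==========same encode list==========')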
    %time loopComp(a1,a2)
    %time encodeloopComp(a1,a2)
    %time intersectionComp(a1,a2)
    %time encodeIntersectionComp(a1,a2)
    
    print('==========diff encode list==========')
    %time loopComp(a1,a3)
    %time encodeloopComp(a1,a3)
    %time intersectionComp(a1,a3)
    %time encodeIntersectionComp(a1,a3)
    
    • Results
    ==========same encode list==========
    ('loopComp ret size : ', 3559)
    CPU times: user 172 ms, sys: 3.1 ms, total: 175 ms
    Wall time: 167 ms
    ('encodeloopComp ret size : ', 3559)
    CPU times: user 4.79 s, sys: 4.86 ms, total: 4.8 s
    Wall time: 4.82 s
    ('intersectionComp ret size : ', 3559)
    CPU times: user 920 µs, sys: 0 ns, total: 920 µs
    Wall time: 851 µs
    ('encodeIntersectionComp ret size : ', 3559)
    CPU times: user 4.97 ms, sys: 0 ns, total: 4.97 ms
    Wall time: 4.88 ms
    ==========diff encode list==========
    ('loopComp ret size : ', 3559)
    CPU times: user 4.81 s, sys: 7.46 ms, total: 4.82 s
    Wall time: 4.83 s
    ('encodeloopComp ret size : ', 3559)
    CPU times: user 125 ms, sys: 0 ns, total: 125 ms
    Wall time: 126 ms
    ('intersectionComp ret size : ', 3559)
    CPU times: user 3.53 ms, sys: 0 ns, total: 3.53 ms
    Wall time: 3.54 ms
    ('encodeIntersectionComp ret size : ', 3559)
    CPU times: user 2.34 ms, sys: 0 ns, total: 2.34 ms
    Wall time: 2.32 ms
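
The `%time` lines only work inside IPython. Outside it, a comparable measurement can be taken with the standard `timeit` module. The sketch below uses hypothetical stand-in data rather than the article's lists, and the snake_case function names are illustrative counterparts of `loopComp` and `intersectionComp`.

```python
import timeit

def loop_comp(a, b):
    """Intersect by scanning list b for every element of a: O(len(a) * len(b))."""
    return [s for s in a if s in b]

def intersection_comp(a, b):
    """Intersect via hashing: roughly O(len(a) + len(b))."""
    return list(set(a).intersection(b))

# Hypothetical data standing in for the article's lists (not the original data).
a1 = [u"key-%d" % i for i in range(500)]
a2 = list(reversed(a1))

for func in (loop_comp, intersection_comp):
    # number=1 mirrors a single %time run.
    seconds = timeit.timeit(lambda: func(a1, a2), number=1)
    print("%-18s size=%d  %.3f ms" % (func.__name__, len(func(a1, a2)), seconds * 1000))
```

Absolute timings will differ from the figures above, since they depend on the machine, the interpreter version and the data.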
    

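    Conclusion

    • For lists with the same encoding, the intersection function gives the best performance
    • When the encoding of the lists is uncertain, always use the intersection function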
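
One caveat, offered as an assumption rather than something the measurements above establish: in Python 2 a unicode string and a byte string only compare and hash as equal when the content is pure ASCII. For data that may contain non-ASCII characters, or for code that must also run on Python 3, a more defensive option is to normalize both lists to text before intersecting. The helpers `to_text` and `text_intersection` below are hypothetical, and UTF-8 is assumed as the byte encoding.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; return text values unchanged."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value

def text_intersection(a, b, encoding="utf-8"):
    """Intersect two lists that may mix text and UTF-8 byte strings."""
    return set(to_text(s, encoding) for s in a) & set(to_text(s, encoding) for s in b)

# Hypothetical sample data mixing both string types, including non-ASCII text.
a1 = [u"alpha", u"caf\u00e9", u"gamma"]
a3 = [b"alpha", u"caf\u00e9".encode("utf-8"), b"delta"]
common = text_intersection(a1, a3)
print(len(common))
```

The decoding pass costs one extra traversal of each list, which is still linear and far cheaper than the nested loop.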

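    Extension

    • Environment: Python 3
      Lists of strings with and without the u prefix perform the same when intersected
      The intersection function again performs best
    • Run results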
    
    ==========same encode list==========
    loopComp ret size :  3559
    CPU times: user 129 ms, sys: 1.32 ms, total: 130 ms
    Wall time: 129 ms
    encodeloopComp ret size :  0
    CPU times: user 253 ms, sys: 122 µs, total: 253 ms
    Wall time: 253 ms
    intersectionComp ret size :  3559
    CPU times: user 605 µs, sys: 0 ns, total: 605 µs
    Wall time: 706 µs
    encodeIntersectionComp ret size :  0
    CPU times: user 1.31 ms, sys: 0 ns, total: 1.31 ms
    Wall time: 1.32 ms
    ==========diff encode list==========
    loopComp ret size :  3559
    CPU times: user 123 ms, sys: 0 ns, total: 123 ms
    Wall time: 122 ms
    encodeloopComp ret size :  0
    CPU times: user 248 ms, sys: 0 ns, total: 248 ms
    Wall time: 249 ms
    intersectionComp ret size :  3559
    CPU times: user 0 ns, sys: 0 ns, total: 0 ns
    Wall time: 689 µs
    encodeIntersectionComp ret size :  0
    CPU times: user 1.47 ms, sys: 0 ns, total: 1.47 ms
    Wall time: 1.3 ms
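
The two encode-based variants report a result size of 0 in these Python 3 runs. A likely explanation is that `encode()` returns a bytes object in Python 3, and bytes never compare equal to str there. A minimal Python 3 sketch of that behaviour:

```python
text = "abc"
encoded = text.encode("utf-8")

# On Python 3, encode() produces bytes, which never equal a str.
print(type(encoded).__name__)
print(encoded == text)
print(encoded in [text])

# Decoding back to text restores equality.
print(encoded.decode("utf-8") == text)
```

So the encode step that helped in Python 2.7 should simply be dropped when both lists already hold str objects in Python 3.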
    


    Article link: https://www.haomeiwen.com/subject/ydtvfqtx.html