Matching Newlines with Python Regular Expressions

Author: SeanCheney | Published: 2019-12-12 10:59

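    By default, the dot (.) in a Python regular expression does not match a newline. So what do you do with a JavaScript string like the one below, which spans several lines?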

    js2py, used below, is a library that executes JavaScript from Python. Here it evaluates the extracted snippet to assemble the real URL.

    import re
    import js2py
    
    txt = '''
    (new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';
    
        setTimeout(function () {
            var url = '';
            url += 'http://mp.w';
            url += 'eixin.qq.co';
            url += 'm/s?src=11&';
            url += 'timestamp=1';
            url += '576115412&v';
            url += 'er=2029&sig';
            url += 'nature=3OfX';
            url += 'g*vTl0xc6Uv';
            url += 'afcTMAEg9B8';
            url += 'Ed0UQLlh744';
            url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
            url.replace("@", "");
            window.location.replace(url)
        },100);      
    '''
    
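    # `.*?` is used here in an attempt to match across newlines.
    # Without re.DOTALL, `.` stops at each line break, so re.search returns None
    # and the .group(1) call raises AttributeError.
    url_var = re.search(r'(var url.*?url\.replace\("@", ""\);)', txt).group(1)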
    url_rendered = js2py.eval_js(url_var)
    print(url_rendered)
    

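    Run as written, the code above fails: the dot cannot cross a line break, so re.search returns None and calling .group(1) on it raises AttributeError.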
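
    The failure can be reproduced in isolation, without js2py. A minimal sketch, assuming a made-up two-line string (sample and error_message are illustrative names, not from the original):

```python
import re

# Hypothetical two-line sample, used only to illustrate the behaviour.
sample = "var url = '';\nurl.replace(\"@\", \"\");"

# Without re.DOTALL, `.` stops at the newline, so nothing matches.
match = re.search(r'var url.*?url\.replace', sample)
print(match)  # None

# Calling .group() on None is what raises the error.
error_message = None
try:
    match.group(0)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

    Checking the result of re.search against None before calling .group() avoids the exception, although the pattern still needs fixing to find anything.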

    The first solution is to replace .*? with [\s\S]*?, because [\s\S] matches any character, newlines included.

    import re
    import js2py
    
    txt = '''
    (new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';
    
        setTimeout(function () {
            var url = '';
            url += 'http://mp.w';
            url += 'eixin.qq.co';
            url += 'm/s?src=11&';
            url += 'timestamp=1';
            url += '576115412&v';
            url += 'er=2029&sig';
            url += 'nature=3OfX';
            url += 'g*vTl0xc6Uv';
            url += 'afcTMAEg9B8';
            url += 'Ed0UQLlh744';
            url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
            url.replace("@", "");
            window.location.replace(url)
        },100);      
    '''
    
    # `[\s\S]*?` is used here to match across newlines
    url_var = re.search(r'(var url[\s\S]*?url\.replace\("@", ""\);)', txt).group(1)
    url_rendered = js2py.eval_js(url_var)
    print(url_rendered)
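
    Why [\s\S] works: \s matches whitespace and \S matches everything else, so together they cover every character regardless of flags. A small sketch on a hypothetical three-character string (the names sample, dot_match and class_match are assumptions for illustration):

```python
import re

# Hypothetical sample containing a single line break.
sample = "a\nb"

# `.` refuses the newline, so the whole-string match fails.
dot_match = re.fullmatch(r'a.b', sample)

# `[\s\S]` accepts the newline, so the whole string matches.
class_match = re.fullmatch(r'a[\s\S]b', sample)

print(dot_match)
print(class_match is not None)
```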
    

    The second solution is to set the re.DOTALL flag, which makes the dot (.) match newlines as well:

    import re
    
    txt = '''
    (new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';
    
        setTimeout(function () {
            var url = '';
            url += 'http://mp.w';
            url += 'eixin.qq.co';
            url += 'm/s?src=11&';
            url += 'timestamp=1';
            url += '576115412&v';
            url += 'er=2029&sig';
            url += 'nature=3OfX';
            url += 'g*vTl0xc6Uv';
            url += 'afcTMAEg9B8';
            url += 'Ed0UQLlh744';
            url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
            url.replace("@", "");
            window.location.replace(url)
        },100);      
    '''
    
    # re.DOTALL lets `.` match newlines, so `.*?` can span several lines
    pattern = re.compile(r'(var url.*?url\.replace\("@", ""\);)', re.DOTALL)
    res = pattern.search(txt).group(1)
    print(res)
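
    The same flag can also be supplied without re.compile. A hedged sketch of two equivalent spellings, using a shortened, hypothetical version of the JavaScript above (sample, by_flag and by_inline are illustrative names):

```python
import re

# Hypothetical sample, shortened from the article's JavaScript.
sample = "var url = '';\nurl += 'http://mp.w';\nurl.replace(\"@\", \"\");"

pattern = r'(var url.*?url\.replace\("@", ""\);)'

# Flag passed straight to re.search, without compiling first.
by_flag = re.search(pattern, sample, flags=re.DOTALL)

# Inline flag (?s) at the very start of the pattern has the same effect.
by_inline = re.search('(?s)' + pattern, sample)

# re.S is the short alias of re.DOTALL.
print(re.S == re.DOTALL)
print(by_flag.group(1) == by_inline.group(1))
```

    The inline form is handy when the pattern is stored as a plain string, for example in a config file, and there is no place to pass a flags argument.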
    

    Article link: https://www.haomeiwen.com/subject/idbzgctx.html