Python Regular Expression Notes

Author: cynthia猫 | Published 2018-07-22 13:26 | Read 50 times


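Regular expressions are the kind of thing you forget every time you read about them, so here are some notes.
In Python, regular expressions are provided by the re module.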

re.match

re.match tries to match the pattern starting at the very beginning of the string.

Literal matching
Not much to say here: a pattern without special characters simply matches itself.
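Since the note gives no snippet for the literal case, here is a minimal sketch of what it might look like. The sample string is reused from the later examples, and the variable names are only illustrative.

```python
import re

content = 'hello 1234567 demo'

# A pattern with no special characters matches itself, character for character.
at_start = re.match('hello', content)
print(at_start.group())  # hello
print(at_start.span())   # (0, 5)

# re.match only tries position 0, so a literal that occurs later is not found.
not_at_start = re.match('demo', content)
print(not_at_start)      # None
```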
Wildcard matching

import re

content = 'hello 1234567 demo'
result = re.match(r'hello.*demo$', content)
# . matches any character except a newline; * repeats it zero or more times

Capturing a target

content = 'hello 1234567 demo'
result = re.match(r'hello\s(\d+)\sdemo$', content)
print(result.group(1))

> 1234567
# Wrap the target in () to capture it as a group
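One thing the snippet above glosses over: re.match returns None when nothing matches, and calling group on None raises AttributeError. A small sketch of guarding against that follows; the helper name extract_number is made up for illustration.

```python
import re

def extract_number(text):
    """Return the digits between 'hello' and 'demo', or None if text does not match."""
    result = re.match(r'hello\s(\d+)\sdemo$', text)
    if result is None:
        # No match: avoid calling .group() on None.
        return None
    return result.group(1)

print(extract_number('hello 1234567 demo'))  # 1234567
print(extract_number('goodbye 42 demo'))     # None
```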

Greedy matching

content = 'hello 1234567 demo'
result = re.match(r'h.*(\d+).*o$', content)
print(result.group(1))

>7
# The greedy .* swallows every digit before the 7, so (\d+) is left with only the single 7

Non-greedy matching

content = 'hello 1234567 demo'
result = re.match(r'h.*?(\d+).*o$', content)
print(result.group(1))

>1234567
# The added ? makes .* non-greedy. Prefer this form so the characters you want are not swallowed by the wildcard
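To see the two behaviours side by side, the sketch below runs both patterns on the same string and prints where group 1 starts and ends. The variable names greedy and lazy are just labels chosen here.

```python
import re

content = 'hello 1234567 demo'

greedy = re.match(r'h.*(\d+).*o$', content)
lazy = re.match(r'h.*?(\d+).*o$', content)

# span(1) is the (start, end) index pair of the first group.
print(greedy.group(1), greedy.span(1))  # 7 (12, 13)
print(lazy.group(1), lazy.span(1))      # 1234567 (6, 13)
```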

Match flags

content = 'hello 1234567 \ndemo'
result = re.match(r'h.*?(\d+).*o$', content, re.S)
print(result.group(1))

>1234567
# Without a flag, . does not match a newline; passing re.S (re.DOTALL) lets . match newlines too
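A quick way to convince yourself of the comment above is to run the same pattern with and without the flag. This is only a sketch; the variable names are illustrative.

```python
import re

content = 'hello 1234567 \ndemo'
pattern = r'h.*?(\d+).*o$'

without_flag = re.match(pattern, content)     # . stops at the newline
with_flag = re.match(pattern, content, re.S)  # . may cross the newline

print(without_flag)        # None
print(with_flag.group(1))  # 1234567
```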

Escaping

content = 'hello price is $5.00'
result = re.match(r'h.*?(\$5\.00)$', content, re.S)
print(result.group(1))

>$5.00
# Put a \ in front of any character that has a special meaning in a pattern
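When the literal text comes from a variable, escaping by hand gets tedious. The standard library's re.escape can add the backslashes instead. A minimal sketch, assuming the price string is the literal we want to find:

```python
import re

content = 'hello price is $5.00'
price = '$5.00'  # literal text we want to find

# Unescaped, '$' acts as an end-of-string anchor, so this pattern cannot match.
unescaped = re.match('h.*?(' + price + ')$', content, re.S)
print(unescaped)  # None

# re.escape inserts the backslashes for us.
escaped = re.match('h.*?(' + re.escape(price) + ')$', content, re.S)
print(escaped.group(1))  # $5.00
```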

re.search

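re.search scans the whole string and returns the first successful match.
For convenience, prefer search over match whenever you can, since search does not require the match to begin at the start of the string.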
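The original gives no snippet for search, so here is a small sketch of the difference. The sample string and pattern are assumptions modelled on the earlier examples.

```python
import re

content = 'strings hello 1234567 demo'
pattern = r'hello\s(\d+)'

# match is anchored at position 0, where the text is 'strings', so it fails.
matched = re.match(pattern, content)
print(matched)            # None

# search tries every position and returns the first hit.
found = re.search(pattern, content)
print(found.group(1))     # 1234567

# To get the same result from match, the pattern has to consume the prefix itself.
prefixed = re.match(r'.*?' + pattern, content)
print(prefixed.group(1))  # 1234567
```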

re.findall

content = " \n<li data-view = '1' singer='art1'>song1</li>\n<li data-view = '2' singer='art2'>song2</li>\n<li data-view = '13' singer='art3'>song3</li>\n<li data-view = '7' singer='art4'>song4</li>\n"
results = re.findall('<li.*?singer=(.*?)>(.*?)</li>', content, re.S)
print(results)

>[("'art1'", 'song1'), ("'art2'", 'song2'), ("'art3'", 'song3'), ("'art4'", 'song4')]
# findall returns every non-overlapping match; with two groups, each item is a tuple of the two captures
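The shape of what findall returns depends on how many groups the pattern has. The sketch below uses a shortened version of the HTML above (an assumption made for brevity) to show the three cases. Note that moving the quotes outside the group also strips them from the result.

```python
import re

content = "<li singer='art1'>song1</li>\n<li singer='art2'>song2</li>"

# No group: a list of whole matches.
no_group = re.findall(r'<li.*?</li>', content, re.S)
# One group: a list of that group's text.
one_group = re.findall(r'<li.*?>(.*?)</li>', content, re.S)
# Two or more groups: a list of tuples.
two_groups = re.findall(r"<li.*?singer='(.*?)'>(.*?)</li>", content, re.S)

print(no_group)    # ["<li singer='art1'>song1</li>", "<li singer='art2'>song2</li>"]
print(one_group)   # ['song1', 'song2']
print(two_groups)  # [('art1', 'song1'), ('art2', 'song2')]
```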

re.sub

re.sub replaces every matched substring and returns the resulting string.

content = 'strings hello 1234567 this is a demo'
content = re.sub(r'\d+', '', content)
print(content)
>strings hello  this is a demo

content = 'strings hello 1234567 this is a demo'
content = re.sub(r'\d+', 'replace', content)
print(content)
>strings hello replace this is a demo

content = 'strings hello 1234567 this is a demo'
content = re.sub(r'(\d+)', r'\1 987', content)
print(content)
>strings hello 1234567 987 this is a demo
# The r prefix marks a raw string, so the backslash reaches re unchanged; \1 inserts the text captured by the first group
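Two related details of re.sub that the backreference example brushes against: \g<1> is an unambiguous spelling of \1 (useful when a digit follows the reference), and the count argument limits how many matches get replaced. A short sketch with a made-up input string:

```python
import re

content = 'id 12, id 34'

# r'\10' would be read as group 10; \g<1>0 means "group 1, then a literal 0".
appended = re.sub(r'(\d+)', r'\g<1>0', content)
print(appended)    # id 120, id 340

# count=1 replaces only the first match, scanning left to right.
first_only = re.sub(r'\d+', 'N', content, count=1)
print(first_only)  # id N, id 34
```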

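content = '<a href = "1.jpg"><li data="1">dakfjalflal</li>abode</a>'
content = re.sub(r'<a.*?>|</a>', '', content)
print(content)
><li data="1">dakfjalflal</li>abode
# Strips the opening and closing <a> tags and keeps what was inside them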

re.compile

re.compile compiles a regular expression string into a pattern object so that the same pattern can be reused.

content = 'hello 1234567 \ndemo'
pattern = re.compile('hello.*demo', re.S)
result = pattern.match(content)  # equivalent to re.match(pattern, content)
print(result)
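The payoff of compiling shows up when one pattern is applied to many strings: the flags are baked into the pattern object, so they do not need to be repeated on each call. A minimal sketch, with an assumed list of inputs:

```python
import re

# Compile once; re.S is stored inside the pattern object.
pattern = re.compile(r'hello.*?(\d+).*?demo', re.S)

texts = ['hello 1234567 \ndemo', 'hello 42 demo', 'no match here']
numbers = []
for text in texts:
    result = pattern.match(text)
    if result:  # skip strings that do not match
        numbers.append(result.group(1))

print(numbers)  # ['1234567', '42']
```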


Article link: https://www.haomeiwen.com/subject/ciplmftx.html
