Regular Expressions and Methods

Author: 豆豆_50dd | Published: 2018-03-26 10:34

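Regular expressions -- common symbols

.   : matches any single character except the newline \n
*   : matches the preceding character zero or more times
?   : matches the preceding character zero or one time
.*  : greedy match (consumes as much text as possible)
.*? : non-greedy match (consumes as little text as possible)
()  : only the text matched inside the parentheses is returned as the result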
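
The difference between greedy matching, non-greedy matching and parentheses is easiest to see side by side. The following is a minimal sketch; the sample string `text` is made up for illustration and is not from the original article.

```python
import re

# Hypothetical sample text with two 'xx...xx' segments
text = 'xxIxx and xxlovexx'

# Greedy: .* extends to the last 'xx' that still lets the pattern match
greedy = re.findall(r'xx.*xx', text)

# Non-greedy: .*? stops at the nearest following 'xx'
lazy = re.findall(r'xx.*?xx', text)

# Parentheses: only the captured part of each match is returned
captured = re.findall(r'xx(.*?)xx', text)

print(greedy)    # ['xxIxx and xxlovexx']
print(lazy)      # ['xxIxx', 'xxlovexx']
print(captured)  # ['I', 'love']
```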

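Regular expressions -- common functions

findall: finds every match of the pattern and returns the results as a list
search: finds the first match of the pattern and returns a match object (or None if nothing matches)
sub: replaces every match of the pattern and returns the resulting string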
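
As a quick comparison of the three functions, here is a small sketch on one sample string. The variable names are chosen for illustration only.

```python
import re

text = 'asdfxxIxx123xxlovexxdfd'

# findall returns a list with every captured group
all_matches = re.findall(r'xx(.*?)xx', text)

# search returns a match object for the first match only
first = re.search(r'xx(.*?)xx', text)

# search returns None when the pattern does not occur at all
missing = re.search(r'zzz', text)

# sub returns a new string with every match replaced
replaced = re.sub(r'\d+', '#', text)

print(all_matches)      # ['I', 'love']
print(first.group(1))   # I
print(missing)          # None
print(replaced)         # asdfxxIxx#xxlovexxdfd
```

Because search can return None, it is safer to check the result before calling group() on text you do not control.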

Regular expressions -- common tips

import re
from re import *
from re import findall, search, sub, S
There is no need to call compile
Use \d+ to match runs of digits
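
The tip about compile can be checked directly: the module-level functions compile the pattern internally (and cache it), so for a short script both forms give the same result. A minimal sketch, with a made-up sample string:

```python
import re

# Hypothetical sample text
text = 'order 66, room 101'

# Module-level function: the pattern string is passed directly
direct = re.findall(r'\d+', text)

# Explicitly compiled pattern: same result, one extra step
compiled = re.compile(r'\d+').findall(text)

print(direct)    # ['66', '101']
print(compiled)  # ['66', '101']
```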

Code:

import re
# from re import findall, search, sub, S    # a bare S is hard to recognize, so "import re" is recommended
secret_code = 'hadkfalifexxIxxfasdjifjal134xxlovexx23345sdfxxyouxx8dfse'

# Example of .
# a = 'xz123'
# b = re.findall('x.', a)    # '.' acts as a placeholder for any one character
# print(b)

# Example of *
# a = 'xyxy123'
# b = re.findall('x*', a)
# print(b)

# Example of ?
# a = 'xy123'
# b = re.findall('x?', a)
# print(b)

# Example of .*
# b = re.findall('xx.*xx', secret_code)
# print(b)
# # Example of .*?
# c = re.findall('xx.*?xx', secret_code)
# print(c)

# # Difference between using and not using parentheses
# d = re.findall('xx(.*?)xx', secret_code)
# print(d)
# for each in d:
#   print(each)

# s = '''sdfxxhello
# xxfsdfxxworldxxasdfd'''   # there is a newline after hello

# d = re.findall('xx(.*?)xx', s, re.S)    # re.S lets '.' match newlines as well
# print(d)

# Compare findall and search
# s2 = 'asdfxxIxx123xxlovexxdfd'
# # f = re.search('xx(.*?)xx123xx(.*?)xx', s2).group(2)
# # print(f)

# f2 = re.findall('xx(.*?)xx123xx(.*?)xx', s2)
# print(f2[0][1])

# Example of sub
# s = '123abcssfasdsfsdax123'
# output = re.sub('123(.*?)123', '123%d123' % 789, s)
# print(output)

# compile is not needed
# pattern = 'xx(.*?)xx'
# new_pattern = re.compile(pattern, re.S)
# output = re.findall(new_pattern, secret_code)
# print(output)

# Match digits
a = 'asfasdafads123456789sadfsd5555fvas'
b = re.findall(r'(\d+)', a)  # raw string so that \d is not treated as a string escape
print(b)
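
The effect of re.S is worth running once, since without it '.' refuses to cross a line break. This sketch reuses the two-line string from the commented example above; the variable names `without_flag` and `with_flag` are illustrative.

```python
import re

# Same two-line text as in the example: a newline follows 'hello'
s = 'sdfxxhello\nxxfsdfxxworldxxasdfd'

# Without re.S, '.' cannot match '\n', so 'hello' is never closed by 'xx'
without_flag = re.findall(r'xx(.*?)xx', s)

# With re.S, '.' also matches '\n', so the first capture spans the line break
with_flag = re.findall(r'xx(.*?)xx', s, re.S)

print(without_flag)  # ['fsdf']
print(with_flag)     # ['hello\n', 'world']
```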

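Application examples of regular expressions

1. Use findall and search to pull the content of interest out of a large amount of text
2. Use sub to implement page turning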

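Application examples of regular expressions -- matching multiple pieces of content

 Use findall and search flexibly
 Grab the large block first, then the small pieces inside it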
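
"Large block first, then small pieces" can be sketched as follows. The HTML snippet is invented for illustration; on a real page the patterns would need to be adapted to the actual markup.

```python
import re

# Hypothetical page fragment: one link outside the list, two inside it
html = '''<div><a href="/ignored">Outside</a></div>
<ul>
<li><a href="/a">First</a></li>
<li><a href="/b">Second</a></li>
</ul>'''

# Matching directly on the whole page also picks up the unwanted link
everything = re.findall(r'">(.*?)</a>', html, re.S)

# Step 1 (large): isolate the <ul> block
block = re.search(r'<ul>(.*?)</ul>', html, re.S).group(1)

# Step 2 (small): extract the link texts inside that block only
titles = re.findall(r'">(.*?)</a>', block, re.S)

print(everything)  # ['Outside', 'First', 'Second']
print(titles)      # ['First', 'Second']
```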

Application examples of regular expressions -- implementing page turning

Test URL: http://www.jikexueyuan.com/course/android/?pageNum=12
Core code: re.sub(r'pageNum=\d+', 'pageNum=%d' % i, old_url)

# -*- coding: UTF-8 -*-
import re
old_url = 'http://www.jikexueyuan.com/course/android/?pageNum=12'
total_page = 20

# f = open('text.txt','r')
# html = f.read()

# f.close()
# with open('text.txt','r+',encoding='utf-8') as f:
#   html=f.read()

# # Extract the title
# In a regular expression, () defines a group and group() returns the text captured by that group
# title = re.search('<title>(.*?)</title>', html, re.S).group(1)
# print(title)

# Extract the links from the page
# links = re.findall('href="(.*?)"', html, re.S)
# for each in links:
#   print(each)

# Extract part of the text: large block first, then small pieces
# text_fied = re.findall('<ul>(.*?)</ul>', html, re.S)[0]  # note: findall returns a list
# the_text = re.findall('">(.*?)</a>', text_fied, re.S)
# for every_text in the_text:
#   print(every_text)

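# Implement page turning with sub
for i in range(2, total_page + 1):
    # No flag is needed here; the 4th positional argument of re.sub is count, not flags
    new_link = re.sub(r'pageNum=\d+', 'pageNum=%d' % i, old_url)
    print(new_link)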
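
The same idea can be wrapped in a small helper that returns the links instead of printing them. This is only a sketch: `build_page_urls` is a hypothetical name, not part of the original code. The pattern contains no '.', so re.S would have no effect and is left out.

```python
import re

def build_page_urls(url, first_page, last_page):
    """Return one URL per page by rewriting the pageNum query parameter."""
    return [
        re.sub(r'pageNum=\d+', 'pageNum=%d' % page, url)
        for page in range(first_page, last_page + 1)
    ]

old_url = 'http://www.jikexueyuan.com/course/android/?pageNum=12'
urls = build_page_urls(old_url, 2, 4)
for link in urls:
    print(link)
# http://www.jikexueyuan.com/course/android/?pageNum=2
# http://www.jikexueyuan.com/course/android/?pageNum=3
# http://www.jikexueyuan.com/course/android/?pageNum=4
```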

Article link: https://www.haomeiwen.com/subject/pafkcftx.html
