1. BeautifulSoup
Initialization
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
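BeautifulSoup is best suited to pages with a simple, clear structure; the two tools below both cope well with more complex pages.
2. XPath
Initialization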
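As a minimal sketch of what the initialization above leads to, the snippet below parses a small page and pulls out a few values. The sample markup and the variable names (`title`, `hrefs`, `labels`) are made up for illustration and are not from the original notes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example.
html = """
<html><body>
  <h1 id="title">Demo page</h1>
  <ul class="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag; get_text() returns its text content.
title = soup.find('h1', id='title').get_text()

# find_all() returns every matching tag; attributes are read like dict keys.
hrefs = [a['href'] for a in soup.find_all('a')]

# select() accepts a CSS selector instead of tag/attribute arguments.
labels = [a.get_text() for a in soup.select('ul.links a')]

print(title, hrefs, labels)
```

`find` and `find_all` are usually enough for flat pages like this one; `select` is handy once the lookup depends on nesting.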
from lxml import etree
html = etree.HTML(text)
Basic rules
![](https://img.haomeiwen.com/i11434751/c2ac44d0d304a8ad.png)
result = html.xpath('/')
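To make the basic rules more concrete, here is a small sketch that uses a few common XPath expressions with lxml. The sample markup and the variable names are hypothetical; only `etree.HTML` and `.xpath()` come from the notes above.

```python
from lxml import etree

# Hypothetical sample markup, invented for this example.
text = """
<html><body>
  <h1 id="title">Demo page</h1>
  <ul class="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

html = etree.HTML(text)

# '//' matches at any depth; text() selects the text nodes of the element.
titles = html.xpath('//h1/text()')

# [@attr="value"] filters by attribute; '/' steps to direct children;
# a trailing @href returns the attribute values themselves.
hrefs = html.xpath('//ul[@class="links"]/li/a/@href')

# [n] picks the n-th matching sibling; XPath positions start at 1.
second = html.xpath('//li[2]/a/text()')

print(titles, hrefs, second)
```

Note that `.xpath()` returns a list, even when only one node matches.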
3. PyQuery
Initialization
from pyquery import PyQuery as pq
doc = pq(html)
CSS selectors
doc.find(selector)
A quick way to get an XPath or CSS selector
![](https://img.haomeiwen.com/i11434751/04e270d586672f93.png)