BeautifulSoup

Author: harukou_ou | Published: 2018-02-06 21:47 | Views: 15

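Crawler Task 2
Basic Usage of BeautifulSoup (BS4)
BeautifulSoup Basics
beautifulsoup Tutorial
HTML Parsing in Python
beautifulsoup4 Tag Selectors
Analyzing Web Page Elements with beautifulsoup
Scraping Huaban Image URLs with Python
HTML Parsing
Python Crawler Basics | Notes on Web Scraping with Python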

from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>data</p>', 'html.parser')

What BeautifulSoup does

Tag structure

[Figure: the BeautifulSoup class]

BeautifulSoup parsers

Basic elements of the BeautifulSoup class

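Tag: a tag, opened with <> and closed with </>
Name: the tag's name; the name of <p>...</p> is p. Access: <tag>.name
Attributes: the tag's attributes, as a dict. Access: <tag>.attrs
NavigableString: the string between <> and </>. Access: <tag>.string
Comment: the comment part of a string inside a tag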
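The five element types above can be checked on a small document. This is a minimal sketch; the markup and the variable names are made up for illustration and are not from the demo page.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup, chosen only to exercise each element type.
html = '<p class="title" id="intro">Hello</p><b><!--a comment--></b>'
soup = BeautifulSoup(html, "html.parser")

p = soup.p                  # Tag: the first <p> in the document
tag_name = p.name           # Name: the string "p"
tag_attrs = p.attrs         # Attributes: a dict; class is stored as a list
tag_text = p.string         # NavigableString: the text between the tags
comment = soup.b.string     # Comment: a NavigableString subclass

print(isinstance(p, Tag), tag_name, tag_attrs)
print(type(tag_text).__name__, repr(tag_text))
print(type(comment).__name__, repr(comment))
```

Because .string also returns comments, checking isinstance(x, Comment) is one way to tell a comment from ordinary text.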

Understanding bs
https://python123.io/ws/demo.html

[Figure: basic HTML structure]

[Figure: downward traversal of the tag tree]

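>>> import requests
>>> from bs4 import BeautifulSoup as BS
>>> # demo is assumed to hold the HTML of the demo page linked above
>>> demo = requests.get('https://python123.io/ws/demo.html').text
>>> soup = BS(demo, 'html.parser')
>>> soup.head
<head><title>This is a python demo page</title></head>
>>> soup.head.contents  # direct children, as a list
[<title>This is a python demo page</title>]
>>> soup.body.contents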
['\n', <p class="title"><b>The demo python introduces several python courses.</b></p>, '\n', <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:

<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>, '\n']
>>> len(soup.body.contents)
5
>>> soup.body.contents[1]
<p class="title"><b>The demo python introduces several python courses.</b></p>
>>> soup.body.children
<list_iterator object at 0x00000000039D2748>
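The session above uses .contents and .children. A small sketch of all three downward attributes, including .descendants, on a made-up snippet (the markup is an assumption, not the demo page):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup without whitespace between tags, to keep the counts simple.
html = "<body><p><b>bold</b></p><p>plain</p></body>"
soup = BeautifulSoup(html, "html.parser")

# .contents: direct children as a list
n_children = len(soup.body.contents)

# .children: the same direct children, but as an iterator
child_names = [child.name for child in soup.body.children]

# .descendants: every node below <body>, depth first, strings included
descendant_tags = [node.name for node in soup.body.descendants
                   if isinstance(node, Tag)]

print(n_children, child_names, descendant_tags)
```

On real pages the newlines between tags show up as string children, which is why len(soup.body.contents) is 5 for the demo page even though <body> holds only two <p> tags.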

[Figure: upward traversal methods]

[Figure: upward traversal]
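Upward traversal goes through .parent (one step up) and .parents (every ancestor). A minimal sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = "<html><body><p><b>bold</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

b = soup.b
parent_name = b.parent.name                          # the enclosing <p>
ancestor_names = [anc.name for anc in b.parents]     # walks up to the soup object

print(parent_name)
print(ancestor_names)
print(soup.parent)   # the BeautifulSoup object itself has no parent
```

The last ancestor is the BeautifulSoup object, whose name is '[document]', so a loop over .parents usually needs to allow for it.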


[Figure: sibling traversal]
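Sibling traversal stays within one parent and uses .next_sibling, .previous_sibling and their plural iterator forms. A minimal sketch, with markup loosely modelled on the two links of the demo page (the exact snippet is an assumption):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links separated by plain text.
html = "<p><a id='link1'>one</a> and <a id='link2'>two</a></p>"
soup = BeautifulSoup(html, "html.parser")

first = soup.a
nxt = first.next_sibling                 # the text node between the links
following = list(first.next_siblings)    # everything after the first link
second = soup.find(id="link2")
prev = second.previous_sibling           # the same text node, seen from the right

print(repr(nxt), following, repr(prev))
print(first.previous_sibling)            # nothing precedes the first link
```

Siblings are not always tags: the text between two tags is a sibling too, so the result may be a NavigableString.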

[Figure: traversal in bs]

print(soup.prettify())  # prettify() returns the markup re-indented, one node per line
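A quick way to see what prettify() does is to run it on a one-line snippet. This is a sketch on made-up markup; the exact indentation may differ between bs4 versions, so no output is shown.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup.
soup = BeautifulSoup("<p><b>bold</b></p>", "html.parser")

pretty = soup.prettify()     # returns a str; the soup itself is unchanged
print(pretty)

# prettify() also works on a single tag
print(soup.b.prettify())
```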

[Figure: getting started with bs]

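soup.find_all(name, attrs, recursive, string, **kwargs)  # returns a list (a ResultSet, which subclasses list)

name: a string to match against tag names
attrs: a string to match against tag attribute values; a specific attribute can be named, e.g. id='link1'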
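The parameters of find_all can be tried one at a time. A minimal sketch using a shortened copy of the course paragraph from the demo page (the trimmed markup and variable names are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical copy of the demo page's course paragraph.
html = (
    '<body><p class="course">'
    '<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>'
    ' and '
    '<a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>'
    '</p></body>'
)
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find_all("a")                        # name: one tag name
by_names = soup.find_all(["a", "p"])                # name: a list of tag names
by_class = soup.find_all("p", "course")             # attrs as a string matches the class
by_id = soup.find_all(id="link1")                   # keyword argument names the attribute
shallow = soup.body.find_all("a", recursive=False)  # direct children of <body> only
by_string = soup.find_all(string="Basic Python")    # string: match text nodes

print(len(by_name), [t.name for t in by_names], len(by_class))
print(by_id[0]["href"], shallow, by_string)
```

With recursive=False the search stops at the direct children, so the links nested inside <p> are not found.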

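import re

# Count the tags whose src contains 'jpg' and whose class contains 'image'
len(soup.find_all(src=re.compile('jpg'), class_=re.compile('image')))

# Print the src attribute of each matching tag
for tag in soup.find_all(src=re.compile('jpg'), class_=re.compile('image')):
    print(tag.attrs['src'])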
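The regex filters above can be tried on a few made-up <img> tags. This is only a sketch: the class names and paths are hypothetical and do not come from any real page.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: only the first image satisfies both filters.
html = (
    '<div>'
    '<img class="image-main" src="/pics/a.jpg">'
    '<img class="image-thumb" src="/pics/b.png">'
    '<img class="icon" src="/pics/c.jpg">'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# A compiled pattern is applied with search(), so it matches anywhere in the value.
matches = soup.find_all(src=re.compile("jpg"), class_=re.compile("image"))
srcs = [tag["src"] for tag in matches]

print(len(matches), srcs)
```

Both keyword filters must hold for a tag to match; class_ is spelled with a trailing underscore because class is a reserved word in Python.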

Article link: https://www.haomeiwen.com/subject/xbsnzxtx.html
