Python: Using BeautifulSoup to Extract Links from HTML Pages

Author: LeeMin_Z | Published 2018-04-27 13:33 | 270 reads

A note on the Beautiful Soup assignment from the py4e course. I wrote this myself, but it really only counts as semi-original.

Task: find the link at a given position on a web page, follow it, and repeat this a given number of times.

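Step by step:

Import the required libraries
Ignore SSL certificate errors
Open the page and let BS4 pull out the links directly
Repeat, each time opening the link found at the given position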

import urllib.request
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors: skip hostname checking and certificate verification
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter url - ')
# url = 'http://py4e-data.dr-chuck.net/known_by_Elita.html'
position = int(input('Enter position - '))  # 1-based position of the link to follow
times = int(input('Enter times - '))        # how many pages to open

for _ in range(times):
    # Download the current page and parse it
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')

    # soup('a') is shorthand for soup.find_all('a'); collect every href
    get_urls = [tag.get('href', None) for tag in soup('a')]

    # The link at the requested position becomes the next page to open
    url = get_urls[position - 1]
    print(url)
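To check the link-following logic without hitting the network, here is a rough sketch that separates fetching from following. It makes a few assumptions that are not part of the original. It swaps BeautifulSoup for the standard-library html.parser.HTMLParser so it runs without bs4 installed. The names LinkCollector, extract_hrefs, follow_links and the fetch callable are all hypothetical. It also resolves relative hrefs with urllib.parse.urljoin, whereas the original uses each href as-is.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value)
        if tag == 'a':
            # Like tag.get('href', None) in BeautifulSoup: None when there is no href
            self.hrefs.append(dict(attrs).get('href'))


def extract_hrefs(html):
    """Return the hrefs of all <a> tags in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def follow_links(start_url, position, times, fetch):
    """Open `times` pages, each time following the link at 1-based `position`.

    `fetch` is any callable mapping a URL to its HTML text (an assumption made
    so the loop can run offline). Returns the followed URLs in order, i.e. the
    same sequence the original script prints.
    """
    url = start_url
    followed = []
    for _ in range(times):
        hrefs = extract_hrefs(fetch(url))
        # Resolve a relative link against the page it came from
        url = urljoin(url, hrefs[position - 1])
        followed.append(url)
    return followed


# Two fake pages standing in for real downloads
pages = {
    'http://example.com/a.html': '<a href="b.html">B</a><a href="c.html">C</a>',
    'http://example.com/c.html': '<a href="a.html">A</a><a href="d.html">D</a>',
}
print(follow_links('http://example.com/a.html', 2, 2, pages.__getitem__))
# prints ['http://example.com/c.html', 'http://example.com/d.html']
```

For real pages, fetch could be something like `lambda u: urllib.request.urlopen(u, context=ctx).read().decode()`, using the same SSL context as the script above.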

2018.4.27

Article link: https://www.haomeiwen.com/subject/dgaxhftx.html