Writing a Simple Web Crawler in Python

Author: 呆呆的初行者 | Published 2019-01-01 14:57

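This article shows how to write a simple crawler that scrapes the images on a web page and downloads them automatically, using https://www.educoder.net as the example site.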

1. First, fetch the page source from its URL.

The URL-handling module (library):

    import urllib.request as req

Create a file-like object that represents the remote URL:

    req.urlopen(' ')

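```python
from urllib import request
import re


def getreq(url):
    # Open the URL and return a file-like response object.
    urlrqe = request.urlopen(url)
    return urlrqe


rqe1 = getreq("https://www.educoder.net")
# Read the raw bytes and decode them as UTF-8 text.
data = rqe1.read().decode('utf-8')
```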
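The fetch step above hard-codes UTF-8 and relies on urllib's default request headers. A slightly more defensive variant is sketched here, assuming that a browser-like `User-Agent` header and a 10-second timeout suit this site (neither is stated in the original). The names `fetch_html` and `DEFAULT_HEADERS` are hypothetical.

```python
from urllib import request

# Assumed header value; some sites reject urllib's default User-Agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, timeout=10):
    """Return the body of `url` as text."""
    req = request.Request(url, headers=DEFAULT_HEADERS)
    # The with-block closes the response even if read() fails.
    with request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
        # Use the charset declared in Content-Type, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if a few bytes are invalid in that charset.
    return raw.decode(charset, errors="replace")
```

With this helper, `data = fetch_html("https://www.educoder.net")` would stand in for the `getreq(...)` and `.read().decode('utf-8')` pair.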

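2. Analyze the fetched page source and extract the image links.

    def getjpg(data):
        # Non-greedy match from "/images" to the first ".png";
        # the dot is escaped so that it matches a literal ".".
        jpglist = re.findall(r'/images.+?\.png', data)
        return jpglist

    link = getjpg(data)
    print(link)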
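A regular expression like the one in `getjpg()` matches any text that looks like an image path, wherever it appears in the source. As an alternative sketch, the standard library's `html.parser` can collect only the `src` attribute of `<img>` tags. This is a different technique from the article's, and it will miss image paths that appear in CSS or JavaScript. The names `ImgSrcParser` and `get_png_links` are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lowercased.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def get_png_links(html):
    parser = ImgSrcParser()
    parser.feed(html)
    # Keep only .png sources, mirroring the filter used in getjpg().
    return [src for src in parser.sources if src.lower().endswith(".png")]
```

Calling `get_png_links(data)` on the page source returns a list that can be used in place of `link`.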

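3. Write the download code.

    def download(jpgurl, n):
        try:
            # Save the image as D:\images\<n>.png; the D:\images folder must already exist.
            request.urlretrieve(jpgurl, r'D:\images\%s.png' % n)
        except Exception as e:
            print(e)
        finally:
            # Runs whether or not the download succeeded.
            print('Image %s: download attempt finished' % n)

    n = 1
    ul = 'https://www.educoder.net'
    for jpgurl in link:
        # The scraped links are relative, so prepend the site root.
        download(ul + jpgurl, n)
        n = n + 1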

The image links in the page source are relative rather than complete URLs, so 'https://www.educoder.net' has to be prepended to each one.
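Plain string concatenation works when every link is relative. If the page mixes relative and absolute links, `urllib.parse.urljoin` handles both cases. The sketch below also drops duplicate links so the same image is not downloaded twice. The names `BASE_URL` and `absolutize` are hypothetical, and the deduplication is an addition that the original code does not perform.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.educoder.net"


def absolutize(links, base=BASE_URL):
    """Turn relative links into absolute URLs, dropping duplicates."""
    seen = set()
    result = []
    for link in links:
        full = urljoin(base, link)  # absolute links pass through unchanged
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result
```

For example, `absolutize(["/images/a.png"])` yields `["https://www.educoder.net/images/a.png"]`.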

Running the code automatically downloads every image with a .png suffix from the https://www.educoder.net homepage.
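The `download()` function writes to a fixed folder, and `urlretrieve` fails if that folder is missing. A minimal sketch of a variant that creates the target folder first and reports whether each download succeeded might look like this. The name `download_to` and its `folder` parameter are hypothetical.

```python
import os
from urllib import request


def download_to(folder, url, n):
    """Save `url` as <folder>/<n>.png and return the path, or None on failure."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(folder, "%s.png" % n)
    try:
        request.urlretrieve(url, path)
    except Exception as e:  # broad on purpose, like the original download()
        print("image %s failed: %s" % (n, e))
        return None
    return path
```

A loop such as `for n, url in enumerate(urls, start=1): download_to("images", url, n)` would then number the files the same way as the original counter.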

Article link: https://www.haomeiwen.com/subject/nvfnlqtx.html