Python Web Scraping: Crawling Taobao

Author: 安晓生 | Published 2020-06-10 18:00


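Below is example code for a crawler that searches Taobao for a keyword and prints the matching products sorted by price: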


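import json
import re

import requests


def getHTMLText(url, params=None):
    """Download a page and return its text, or "" if the request fails."""
    headers = {
        # Taobao search generally requires a logged-in session. This cookie
        # belongs to the original author's 2020 session and is unlikely to
        # still be valid; replace it with the cookie from your own browser.
        "cookie": "enc=rhkdBuATegC%2Bei%2FOyoznNhbQnMfVx%2Fmwc1WI%2BanFMOku5X39Cr7U2tOYm5ddcg5%2FEq9rpBgkEGwD%2FFh4RDNCRQ%3D%3D;",
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    }
    try:
        r = requests.get(url, params=params, headers=headers, timeout=30)
        r.raise_for_status()  # treat HTTP 4xx/5xx responses as failures
        return r.text
    except requests.RequestException:
        return ""


def parsePage(ilt, html):
    """Append [price, title] pairs found in the page's embedded JSON to ilt."""
    # Capture the price digits directly; the value is a quoted decimal string.
    plt = re.findall(r'"view_price":"([\d.]+)"', html)
    # Capture the full JSON string literal, escaped quotes included, and decode
    # it safely with json.loads. Matching the whole literal also keeps titles
    # that contain ':' intact.
    tlt = re.findall(r'"raw_title":("(?:[^"\\]|\\.)*")', html)
    # zip() stops at the shorter list, so mismatched counts cannot raise IndexError.
    for price, title in zip(plt, tlt):
        try:
            ilt.append([price, json.loads(title)])
        except ValueError:  # skip titles that are not valid JSON strings
            continue


def printGoodsList(ilt):
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Title"))
    for count, g in enumerate(ilt, start=1):
        print(tplt.format(count, g[0], g[1]))


def main():
    goods = '移动硬盘'  # search keyword ("portable hard drive")
    depth = 20          # number of result pages to fetch
    start_url = 'https://s.taobao.com/search'
    infoList = []
    for i in range(depth):
        # "s" is the item offset; the script steps it by 44 items per page.
        # requests URL-encodes the Chinese keyword passed in params.
        html = getHTMLText(start_url, params={'q': goods, 's': 44 * i})
        parsePage(infoList, html)
    # Sort once by numeric price after all pages are collected; float()
    # handles prices such as "299.00", which int() would reject.
    infoList.sort(key=lambda x: float(x[0]))
    printGoodsList(infoList)


if __name__ == '__main__':
    main()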
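Because the live search page needs a valid login cookie, the parsing step is easiest to check offline. The sketch below applies capture-group regular expressions to a hypothetical fragment shaped like the JSON Taobao embedded in its search page. The field names `view_price` and `raw_title` come from the script above; `SAMPLE_HTML`, `parse_page`, and the product values are invented for illustration. It also shows how a naive `".*?"` pattern combined with `split(':')` truncates titles that contain an escaped quote or a colon.

```python
import json
import re

# Hypothetical fragment mimicking the embedded search JSON; values are made up.
SAMPLE_HTML = (
    '{"raw_title":"2TB \\"slim\\" drive","view_price":"459.00"},'
    '{"raw_title":"USB3.0 portable drive: 1TB","view_price":"299.00"}'
)


def parse_page(html):
    """Return [price, title] pairs using capture groups and json.loads."""
    prices = re.findall(r'"view_price":"([\d.]+)"', html)
    # Match the whole JSON string literal, allowing backslash escapes inside.
    titles = re.findall(r'"raw_title":("(?:[^"\\]|\\.)*")', html)
    return [[p, json.loads(t)] for p, t in zip(prices, titles)]


items = parse_page(SAMPLE_HTML)
items.sort(key=lambda x: float(x[0]))  # sort by numeric price
for item in items:
    print(item)

# The naive pattern stops at the first quote, even an escaped one ...
old_matches = re.findall(r'\"raw_title\"\:\".*?\"', SAMPLE_HTML)
print(old_matches[0])
# ... and splitting on ':' cuts a title that itself contains a colon.
print(old_matches[1].split(':')[1])
```

Decoding with `json.loads` rather than `eval` also means no scraped text is ever executed as Python code.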
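The crawl loop pages through results by increasing the `s` query parameter in steps of 44, the per-page item count implied by the script. As a standard-library-only sketch, the hypothetical helper `page_urls` below builds the first few search URLs with `urllib.parse.urlencode`, so they can be inspected without contacting Taobao. It assumes Taobao still accepts the `q` and `s` parameters the way it did in 2020.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://s.taobao.com/search"
ITEMS_PER_PAGE = 44  # offset step used by the crawl loop


def page_urls(keyword, depth):
    """Return the search URL for each of the first `depth` result pages."""
    return [
        BASE_URL + "?" + urlencode({"q": keyword, "s": ITEMS_PER_PAGE * i})
        for i in range(depth)
    ]


urls = page_urls("移动硬盘", 3)
for url in urls:
    # urlencode percent-encodes the Chinese keyword as UTF-8.
    print(url)

# Decoding a query string recovers the original parameters.
query = parse_qs(urlsplit(urls[2]).query)
print(query["q"][0], query["s"][0])
```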