Scraping JD.com Product Listings: An Example


Author: haokeed | Published 2019-05-20 17:02
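import random

# Pool of browser User-Agent strings; one is picked at random per request
# so the traffic does not always carry the same client signature.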
uas = ["Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
        "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11", 
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6", 
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6", 
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1", 
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5", 
        "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5", 
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3", 
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3", 
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3", 
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3", 
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3", 
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3", 
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3", 
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3", 
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3", 
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24", 
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"
    ]

ua=random.choice(uas)  # pick one User-Agent at random
print(ua)


import re

import requests
from lxml import etree

url='https://list.jd.com/list.html?cat=9987,653,655'
this_ua=random.choice(uas)
head={"user-agent":this_ua}  # send a browser-like User-Agent
res=requests.get(url,headers=head)
res.encoding='utf-8'  # decode the page text as UTF-8
print(res)
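Each price and comment request further down opens a fresh connection with `requests.get`, and only the list-page request carries a User-Agent. A minimal sketch, assuming you want one `requests.Session` reused for all calls so connections are kept alive and the randomly chosen User-Agent is sent every time. `make_session` and the two-entry `UA_POOL` are illustrative names, not part of the original code.

```python
import random

import requests

# Illustrative pool; in the article this role is played by the `uas` list.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
]


def make_session(pool):
    """Return a Session whose default headers include a random User-Agent."""
    session = requests.Session()
    # Session-level headers are merged into every request sent through it.
    session.headers.update({"User-Agent": random.choice(pool)})
    return session


session = make_session(UA_POOL)
print(session.headers["User-Agent"])
```

With this in place, `res = session.get(url, timeout=10)` would replace `requests.get(url, headers=head)`; the `timeout` stops one stalled request from hanging the whole crawl.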

# Scrape the basic fields: product name and shop name
root=etree.HTML(res.text)

# Product names (surrounding whitespace stripped)
name=[n.strip() for n in root.xpath('//div[@class="p-name"]/a/em/text()')]
print(name)

# Shop names (stored in the data-shop_name attribute)
shop=root.xpath('//div[@class="p-shop"]/@data-shop_name')
print(shop)


# SKU (product ID) of every item on the page
sku=root.xpath('//li[@class="gl-item"]/div/@data-sku')
print(sku)
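Name, shop and SKU are collected as three independent lists, so a single product without a `data-shop_name` attribute would shift every later shop name onto the wrong row. A sketch of per-item extraction, assuming the same selectors as above but evaluated relative to each `li.gl-item`; the sample markup is hypothetical and much smaller than the real JD page.

```python
from lxml import etree

# Hypothetical markup that mirrors the selectors used above.
SAMPLE_HTML = """
<ul>
  <li class="gl-item">
    <div data-sku="1001">
      <div class="p-name"><a><em> Phone A </em></a></div>
      <div class="p-shop" data-shop_name="Shop X"></div>
    </div>
  </li>
  <li class="gl-item">
    <div data-sku="1002">
      <div class="p-name"><a><em> Phone B </em></a></div>
      <div class="p-shop"></div>
    </div>
  </li>
</ul>
"""


def parse_items(html):
    """Return one dict per product so the fields stay aligned row by row."""
    root = etree.HTML(html)
    items = []
    for li in root.xpath('//li[@class="gl-item"]'):
        sku = li.xpath('./div/@data-sku')
        shop = li.xpath('.//div[@class="p-shop"]/@data-shop_name')
        items.append({
            "sku": str(sku[0]) if sku else None,
            # string() joins every text node inside <em>, even nested tags
            "name": str(li.xpath('string(.//div[@class="p-name"]/a/em)')).strip(),
            "shop": str(shop[0]) if shop else None,  # None if attribute missing
        })
    return items


for item in parse_items(SAMPLE_HTML):
    print(item)
```

The second item keeps its own SKU and name and simply gets `None` for the shop, instead of borrowing the next product's value.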


# Prices: one request per SKU; the reply is wrapped in a JSONP callback,
# so the "p" (price) field is pulled out with a regex
price=[]
for this_sku in sku:
    price_url="https://p.3.cn/prices/mgets?callback=jQuery8966120&ext=11101100&pin=&type=1&area=1_72_4137_0&skuIds=J_"+this_sku+"&pdbp=0&pdtk=&pdpin=&pduid=825425048&source=list_pc_front&_=1558338909272"
    price_res=requests.get(price_url)
    pat='"p":"(.*?)"'
    price.extend(re.findall(pat,price_res.text))

print(price)
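The price URL is assembled by string concatenation and the reply is scanned with a regex. A sketch that builds the identical query with `urllib.parse.urlencode` (parameters copied from the article's URL) and decodes the JSONP reply with `json` instead. The response shape, a list of objects with `"id"` and `"p"` keys, is an assumption inferred from the regex `'"p":"(.*?)"'`; `price_url`, `strip_jsonp` and `parse_prices` are hypothetical helper names.

```python
import json
import re
from urllib.parse import urlencode

PRICE_API = "https://p.3.cn/prices/mgets"


def price_url(sku):
    # Same parameters, in the same order, as the article's literal URL.
    params = {
        "callback": "jQuery8966120", "ext": "11101100", "pin": "",
        "type": "1", "area": "1_72_4137_0", "skuIds": "J_" + sku,
        "pdbp": "0", "pdtk": "", "pdpin": "", "pduid": "825425048",
        "source": "list_pc_front", "_": "1558338909272",
    }
    return PRICE_API + "?" + urlencode(params)


def strip_jsonp(text):
    """Remove a `callback( ... );` wrapper and return the decoded JSON."""
    match = re.search(r"^\s*[\w$.]+\((.*)\)\s*;?\s*$", text, re.S)
    return json.loads(match.group(1) if match else text)


def parse_prices(text):
    """Map each 'id' in the (assumed) price reply to its 'p' string."""
    return {entry["id"]: entry["p"] for entry in strip_jsonp(text)}


# Hypothetical reply body.
sample = 'jQuery8966120([{"id":"J_1001","p":"1999.00"}]);'
print(price_url("1001"))
print(parse_prices(sample))  # → {'J_1001': '1999.00'}
```

Keying the result by SKU also makes it easy to tell which product a price belongs to if a request fails and returns nothing.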

# Comment counts: one request per SKU; "ShowCount" is the displayed count
commentcount=[]
for this_sku in sku:
    commenturl="https://club.jd.com/comment/productCommentSummaries.action?referenceIds="+this_sku+"&callback=jQuery345302&_=1558338910413"
    commentres=requests.get(commenturl)
    commentpat='"ShowCount":(.*?),'
    commentcount.extend(re.findall(commentpat,commentres.text))
print(commentcount)
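The regex `'"ShowCount":(.*?),'` depends on a comma following the value, so it would miss the count if `ShowCount` happened to be the last key in its object. A sketch that decodes the JSONP reply and walks the result for every `ShowCount` key. The sample body, including the `CommentsCount` and `SkuId` keys, is hypothetical; `strip_jsonp` and `find_key` are illustrative helpers.

```python
import json
import re


def strip_jsonp(text):
    """Remove a `callback( ... );` wrapper and return the decoded JSON."""
    match = re.search(r"^\s*[\w$.]+\((.*)\)\s*;?\s*$", text, re.S)
    return json.loads(match.group(1) if match else text)


def find_key(obj, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            else:
                yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


# Hypothetical reply body in which "ShowCount" is the last key of its object.
sample = 'jQuery345302({"CommentsCount":[{"SkuId":1001,"ShowCount":2345}]});'

print(re.findall('"ShowCount":(.*?),', sample))          # → []
print(list(find_key(strip_jsonp(sample), "ShowCount")))  # → [2345]
```

A side benefit is that the JSON route returns integers directly, so no string-to-number conversion is needed later.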
# Scrape several pages in a loop
# (reuses the uas list defined at the top)
import random
import re
import time

import pandas as pd
import requests
from lxml import etree
from pandas import DataFrame

pInfoAll=DataFrame()

for page in range(1,3):  # pages 1 and 2
    url="https://list.jd.com/list.html?cat=9987,653,655&page="+str(page)
    
    this_ua=random.choice(uas)
    head={"user-agent":this_ua}
    res=requests.get(url,headers=head)
    res.encoding='utf-8'
    root=etree.HTML(res.text)

    # Product names
    name=[n.strip() for n in root.xpath('//div[@class="p-name"]/a/em/text()')]

    # Shop names
    shop=root.xpath('//div[@class="p-shop"]/@data-shop_name')

    # SKU of every product on the page
    sku=root.xpath('//li[@class="gl-item"]/div/@data-sku')

    # Prices (one request per SKU)
    price=[]
    for this_sku in sku:
        price_url="https://p.3.cn/prices/mgets?callback=jQuery8966120&ext=11101100&pin=&type=1&area=1_72_4137_0&skuIds=J_"+this_sku+"&pdbp=0&pdtk=&pdpin=&pduid=825425048&source=list_pc_front&_=1558338909272"
        price_res=requests.get(price_url)
        pat='"p":"(.*?)"'
        price.extend(re.findall(pat,price_res.text))
    
    # Comment counts (one request per SKU)
    commentcount=[]
    for this_sku in sku:
        commenturl="https://club.jd.com/comment/productCommentSummaries.action?referenceIds="+this_sku+"&callback=jQuery345302&_=1558338910413"
        commentres=requests.get(commenturl)
        commentpat='"ShowCount":(.*?),'
        commentcount.extend(re.findall(commentpat,commentres.text))

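    # One row per product; name the columns before appending to the result
    pInfo=DataFrame([sku,name,shop,price,commentcount]).T
    pInfo.columns=['sku','name','shop','price','comments']
    pInfoAll=pd.concat([pInfoAll,pInfo],ignore_index=True)
    time.sleep(2)  # pause between pages to go easy on the server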

print(pInfoAll.head())
print(pInfoAll.describe())
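Every scraped field arrives as text (XPath strings and `re.findall` matches), so `describe()` on this table only reports count, unique, top and freq. A sketch, assuming you want numeric statistics for price and comment count, that converts those columns with `pd.to_numeric`; the rows below are hypothetical stand-ins for scraped data.

```python
import pandas as pd

# Hypothetical rows in the shape the scraper produces: every field is text.
rows = [
    ["1001", "Phone A", "Shop X", "1999.00", "2345"],
    ["1002", "Phone B", "Shop Y", "2999.00", "120"],
    ["1003", "Phone C", None, "", "87"],
]
df = pd.DataFrame(rows, columns=["sku", "name", "shop", "price", "comments"])

# Convert text to numbers; anything unparsable (such as "") becomes NaN.
for col in ["price", "comments"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
print(df.describe())  # now reports count/mean/std/min/quartiles/max
```

`errors="coerce"` is chosen so one malformed value does not abort the conversion of the whole column; the resulting NaN shows up as a lower `count` in `describe()`.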


Article link: https://www.haomeiwen.com/subject/zegzaqtx.html