
JD Product Search Crawler

Author: 周周周__ | Published 2019-03-24 21:37
    The goal is to scrape JD's product-search interface and process the data it returns.

    Main tools:
    Python's requests library and XPath (via lxml)
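
    The whole crawler follows one pattern: fetch the page with requests, build an lxml element tree, and extract fields with XPath. A minimal sketch of that pattern (the URL and selector below are placeholders, not JD's real markup):

    import requests
    from lxml import etree

    # Placeholder URL and XPath, only to illustrate the requests + lxml pattern
    res = requests.get('https://example.com/list')
    doc = etree.HTML(res.content.decode('utf-8', 'ignore'))
    titles = doc.xpath('//li[@class="item"]/a/text()')  # list of text strings
    print(titles)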

    Page Analysis

    Go to the JD home page, press F12, and start capturing traffic in the Network panel.
    Enter a product name in the search box: 小米手机 (Xiaomi phone).
    Analyze the capture by inspecting each request's Response.



    The request address: https://search.jd.com/Search?keyword=%E5%B0%8F%E7%B1%B3%E6%89%8B%E6%9C%BA&enc=utf-8&suggest=1.his.0.0&wq=&pvid=922913c28d854d979bb187fa68bfef4e
    Link analysis: https://search.jd.com/Search?keyword=%E5%B0%8F%E7%B1%B3%E6%89%8B%E6%9C%BA&enc=utf-8& is the effective part; the trailing suggest, wq, and pvid parameters can be dropped.
    Parameter analysis: keyword is the term we are searching for.
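
    The %E5%B0%8F%E7%B1%B3%E6%89%8B%E6%9C%BA blob is simply 小米手机 after URL encoding, so a search URL for any keyword can be built with the standard library. A quick check:

    from urllib.parse import quote, unquote

    print(unquote('%E5%B0%8F%E7%B1%B3%E6%89%8B%E6%9C%BA'))  # -> 小米手机
    keyword = '小米手机'
    url = 'https://search.jd.com/Search?keyword={}&enc=utf-8&'.format(quote(keyword))
    print(url)

    requests performs the same encoding automatically if you pass params={'keyword': keyword, 'enc': 'utf-8'} instead of formatting the query string by hand.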


    Writing the Crawler

    import requests
    from lxml import etree
    from urllib.parse import quote
    import chardet
    import pymysql

    # Connect to the database
    conn = pymysql.connect(
        host='localhost',
        user='root',
        password='123456',
        database='shopping',
        charset='utf8'
    )
    cur = conn.cursor()

    keyword = '小米手机'  # the search term
    m = 1                 # goodstype_id attached to every inserted product
    # Initial address
    url = 'https://search.jd.com/Search?keyword={}&enc=utf-8&'.format(quote(keyword))
    # Request headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Safari/537.36'
    }

    # Send the request
    res = requests.get(url, headers=headers)
    html = res.content
    # Detect the encoding and decode the raw bytes
    encoding = chardet.detect(html).get('encoding') or 'utf-8'
    html = html.decode(encoding, 'ignore')

    # Build an element tree and clean the data with XPath
    docs = etree.HTML(html)
    good_list = docs.xpath('//li[@class="gl-item"]/div')  # all product nodes on the page
    print(len(good_list))

    # Loop over the list, extracting cleaner fields from each node
    i = 0  # running product id, reused as the image file name
    for good in good_list:
        try:
            title = good.xpath('./div/a/em//text()')[:3]                             # title
            price = good.xpath('.//div/strong/i/text()')                             # price
            cover = good.xpath('./div[@class="p-img"]/a/img/@source-data-lazy-img')  # lazy-loaded image
            intro = good.xpath('./div[@class="p-img"]/a/@title')                     # description
            # Join the node lists into plain strings
            title = ''.join(title)
            price = ''.join(price)
            cover = 'https:' + ''.join(cover)
            intro = ''.join(intro)

            # Some prices are filled in by JavaScript and missing from the static HTML
            if price == '':
                continue
            price = float(price)
            i += 1

            # Save the image locally
            pic = requests.get(url=cover, headers=headers)
            path = 'E:\\biyesheji\\mall1\\mall1\\static\\images\\goods\\{}.jpg'.format(i)
            with open(path, 'wb') as f:
                f.write(pic.content)

            cover = 'static/images/goods/{}.jpg'.format(i)
            print(title)
            print(price)
            print(cover)
            print(intro)
            print('#' * 100)

            sql1 = ('insert into goods_goods(id, name, price, stock, count, intro, '
                    'goodstype_id, stores_id, creatTime) '
                    'values(%s, %s, %s, %s, %s, %s, %s, %s, %s)')
            cur.execute(sql1, (i, title, price, 10, 0, intro, m, 1, '2019-03-24 01:45:32.227014'))

            sql2 = ('insert into goods_goodsimage(id, path, status, intro, goods_id) '
                    'values(%s, %s, %s, %s, %s)')
            cur.execute(sql2, (i, cover, 0, intro, i))
            conn.commit()
        except Exception:
            # Skip any product whose markup does not match the XPath above
            continue

    cur.close()
    conn.close()
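
    The two INSERT statements assume the goods_goods and goods_goodsimage tables already exist in the shopping database; their definitions are not shown in this post. A guessed schema that matches the columns used above (the column types are assumptions, and the creatTime spelling is kept from the original SQL):

    import pymysql

    conn = pymysql.connect(host='localhost', user='root', password='123456',
                           database='shopping', charset='utf8')
    cur = conn.cursor()
    # Hypothetical table definitions inferred from the INSERT statements above
    cur.execute("""
        create table if not exists goods_goods(
            id int primary key,
            name varchar(255),
            price decimal(10, 2),
            stock int,
            count int,
            intro varchar(255),
            goodstype_id int,
            stores_id int,
            creatTime datetime(6)
        )
    """)
    cur.execute("""
        create table if not exists goods_goodsimage(
            id int primary key,
            path varchar(255),
            status int,
            intro varchar(255),
            goods_id int
        )
    """)
    conn.commit()
    conn.close()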
    
    

    This is only a simple analysis of JD's search page.

    Author's QQ group: 832785950 (mention where you found this when joining)
