Scraping Lagou Job Listings with Python and Writing Them to Excel


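Author: longsan0918 | Published: 2018-02-08 16:52
    Nearly two months of 2018 have slipped by and Spring Festival is just around the corner. If you are anything like me, you can hardly focus on work right now. It is year-end: did your bonus come through? Haha. If it did not, don't lose heart; keep at it, because a wave of job openings will be waiting after the holiday. With some free time on my hands, here is how to scrape job data with Python. The example pulls Python positions in Hangzhou from Lagou. Enough talk, straight to the code: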
    
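    import requests     # HTTP client used to send the requests
    import xlsxwriter   # writes the .xlsx output file
    import time         # used to pause between page requests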
    
    headers = {
        'User-Agent':"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
        'Referer':"https://www.lagou.com/jobs/list_python?px=default&city=%E6%9D%AD%E5%B7%9E",
        'Cookie':'user_trace_token=20170814104005-8c79ba9a-88b2-49f0-b55a-84283730e16a; LGUID=20170814104015-e71d7b3c-8099-11e7-af6d-525400f775ce; _ga=GA1.2.1544472457.1502678407; _gid=GA1.2.413489957.1517997531; JSESSIONID=ABAAABAACEFAACG097350D52438FF8E73B79D2C95CD3671; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1517997531; index_location_city=%E5%85%A8%E5%9B%BD; isCloseNotice=0; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1518053302; LGRID=20180208092823-5a881396-0c6f-11e8-af9a-5254005c3644'
    }
    
    url = 'https://www.lagou.com/jobs/positionAjax.json?px=default&city=%E6%9D%AD%E5%B7%9E&needAddtionalResult=false&isSchoolJob=0'
    
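    # Work out how many pages of results there are
    def get_Job_page():
        res = requests.post(
            # request URL
            url=url,
            headers = headers,
            data = {
            'first': 'false',
            'pn': 1,
            'kd': 'Python'
            }
        )
        result = res.json()     # decode the JSON body of the response
        all_count = result['content']['positionResult']['totalCount']
        single_pagecount = result['content']['positionResult']['resultSize']
        print(all_count,single_pagecount)
        if single_pagecount == 0:     # no results, so there are no pages to fetch
            return 0
        # Round up, so an exact multiple does not produce an empty extra page
        return (all_count + single_pagecount - 1) // single_pagecount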
    
    # Fetch the job entries shown in the listing for one page
    def getJobList(page):
        res = requests.post(
            # request URL
            url=url,
            headers = headers,
            data = {
            'first': 'false',
            'pn': page,
            'kd': 'Python'
            }
        )
        result = res.json()     # decode the JSON body of the response
        jobsInfo = result['content']['positionResult']['result']
        return jobsInfo
    
    
    workbook = xlsxwriter.Workbook('lagou.xlsx')
    worksheet = workbook.add_worksheet()
    worksheet.set_column('A:A',20)
    
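    # Write one row; the defaults are the column titles, so write_excel(0) writes the header
    def write_excel(row = 0,positionName='Position',salary='Salary',city='Location',
                    education='Education',workYear='Experience',companyFullName='Company'):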
        worksheet.write(row,0,positionName)
        worksheet.write(row,1,salary)
        worksheet.write(row,2,city)
        worksheet.write(row,3,education)
        worksheet.write(row,4,workYear)
        worksheet.write(row,5,companyFullName)
    
    write_excel(0)
    row = 1
    pages = get_Job_page()+1
    
    # Fetch the pages one by one, pausing between requests
    for page in range(1,pages):
        for job in getJobList(page=page):
            write_excel(row,positionName = job['positionName'],
                        salary = job['salary'],
                        city = job['city'],
                        education = job['education'],
                        workYear = job['workYear'],
                        companyFullName = job['companyFullName'])
            row += 1
        print('Page %d has been written'%page)
        time.sleep(0.5)
    
    
    print('All pages have been written')
    
    workbook.close()
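
The script relies on two things that are easy to get wrong: turning a total count into a number of pages, and digging the job list out of the decoded JSON. Below is a minimal sketch of both as small standalone helpers. The names `page_count` and `extract_jobs` are hypothetical and not part of the original script, and the idea that a rejected request returns JSON without a `content` key is an assumption about the site's behaviour, not something the article confirms.

```python
def page_count(total_count, page_size):
    """Return how many pages are needed for total_count items at page_size per page."""
    if page_size <= 0:
        # Nothing sensible to compute; treat it as "no pages"
        return 0
    # Integer ceiling division: 30 items at 15 per page is 2 pages, 31 items is 3
    return (total_count + page_size - 1) // page_size


def extract_jobs(result):
    """Return the job list from a decoded response, or [] if any level is missing.

    Assumption: an error response is still a dict but lacks the
    'content' -> 'positionResult' -> 'result' path.
    """
    content = result.get('content') or {}
    position_result = content.get('positionResult') or {}
    return position_result.get('result') or []
```

Rounding up with integer arithmetic avoids requesting an empty extra page when the total is an exact multiple of the page size, and returning an empty list lets the page loop skip a bad response instead of stopping with a `KeyError`.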
    

    Figure: the scraped data as displayed in Excel (image.png)
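
The spreadsheet has one column per job field, in a fixed order. As a rough sketch, that order can be kept in a single list and each job dictionary turned into a row from it. `COLUMNS` and `job_to_row` are hypothetical names introduced for illustration; the field names are the ones the script reads from each job entry, and treating a field as possibly absent is an assumption.

```python
# Field names in the order they appear as spreadsheet columns
COLUMNS = ['positionName', 'salary', 'city', 'education', 'workYear', 'companyFullName']


def job_to_row(job, columns=COLUMNS, missing=''):
    """Return the job's values in column order, using `missing` for absent fields."""
    return [job.get(key, missing) for key in columns]
```

With XlsxWriter, a row built this way can be written in one call with `worksheet.write_row(row, 0, job_to_row(job))`, which places the list items in consecutive cells starting at column 0.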
