Scraping job postings from Tencent's experienced-hire careers site with Python (2020.05.25)

Author: f11905ddb57f | Source: published 2020-05-25 15:00 | Read 0 times


Python source code for scraping Tencent's experienced-hire job postings (likes are welcome):

import json
import random
import time

import requests


class TencentSpider(object):
    def __init__(self):
        self.headers = {'User-Agent': 'Mozilla/5.0'}
        # First-level URL (paginated job list); {} takes the page index.
        # Adjacent string literals are joined, so indentation cannot leak into the URL.
        self.one_url = (
            'https://careers.tencent.com/tencentcareer/api/post/Query?'
            'timestamp=1563912271089&countryId=&cityId=&bgIds=&productId='
            '&categoryId=&parentCategoryId=&attrId=&keyword='
            '&pageIndex={}&pageSize=10&language=zh-cn&area=cn'
        )
        # Second-level URL (job detail); {} takes the postId.
        self.two_url = (
            'https://careers.tencent.com/tencentcareer/api/post/ByPostId?'
            'timestamp=1563912374645&postId={}&language=zh-cn'
        )

    # Request function (both page levels use it)
    def get_page(self, url):
        res = requests.get(url, headers=self.headers)
        res.encoding = 'utf-8'
        # json.loads() converts the response body into Python data types
        return json.loads(res.text)

    # Extract the data (title, location, responsibilities, requirements)
    def get_data(self, html):
        # Walk the 10 postings on the first-level page, then use each
        # postId to build the second-level URL.
        # html['Data']['Posts']: [{posting 1}, {}, {}, {}, {}]
        for job in html['Data']['Posts']:
            job_info = {}
            # Job title
            job_info['title'] = job['RecruitPostName']
            # postId: used to build the second-level URL
            post_id = job['PostId']
            two_url = self.two_url.format(post_id)
            # Send the request and parse out the posting date, location,
            # responsibilities and requirements
            (job_info['posted'], job_info['location'],
             job_info['responsibilities'],
             job_info['requirements']) = self.parse_two_page(two_url)
            print(job_info['title'], job_info['location'], job_info['posted'])
            print(job_info['responsibilities'])
            print('\n')

    # Parse the second-level page (date, location, responsibilities, requirements)
    def parse_two_page(self, two_url):
        two_html = self.get_page(two_url)
        # Posting date (not named `time`, which would shadow the time module)
        posted = two_html['Data']['LastUpdateTime']
        # Location
        location = two_html['Data']['LocationName']
        # Responsibilities
        duty = two_html['Data']['Responsibility']
        # Requirements
        require = two_html['Data']['Requirement']
        return posted, location, duty, require

    def main(self):
        # Page range
        for index in range(1, 2):
            url = self.one_url.format(index)
            # Response content of the first-level page
            one_html = self.get_page(url)
            self.get_data(one_html)
            # Random pause between pages
            time.sleep(random.uniform(0.5, 2))


if __name__ == '__main__':
    spider = TencentSpider()
    spider.main()
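
The first-level URL in the script is one long hand-written string with a fixed timestamp. As a rough sketch only, the same query string could be assembled from a dictionary with the standard library's `urlencode`, which makes the page index, page size and keyword easier to change. The names `build_list_url` and `page_urls` are hypothetical helpers, not part of the original script, and it is an assumption that the API accepts the current time in milliseconds in place of the hard-coded timestamp.

```python
import time
from urllib.parse import urlencode

# Endpoint copied from the script's first-level URL.
QUERY_ENDPOINT = 'https://careers.tencent.com/tencentcareer/api/post/Query'


def build_list_url(page_index, page_size=10, keyword='', timestamp_ms=None):
    """Return the first-level (job list) URL for one page.

    Hypothetical helper: the parameter names mirror the query string
    used in the script, nothing more.
    """
    if timestamp_ms is None:
        # Assumption: the API accepts the current time in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    params = {
        'timestamp': timestamp_ms,
        'countryId': '',
        'cityId': '',
        'bgIds': '',
        'productId': '',
        'categoryId': '',
        'parentCategoryId': '',
        'attrId': '',
        'keyword': keyword,
        'pageIndex': page_index,
        'pageSize': page_size,
        'language': 'zh-cn',
        'area': 'cn',
    }
    # urlencode percent-encodes values and keeps empty ones as "name=".
    return QUERY_ENDPOINT + '?' + urlencode(params)


def page_urls(first_page, last_page, **kwargs):
    """Yield list URLs for pages first_page..last_page, inclusive."""
    for index in range(first_page, last_page + 1):
        yield build_list_url(index, **kwargs)
```

With helpers like these, the loop in `main` could iterate over `page_urls(1, 5)` instead of formatting `self.one_url` by hand.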

(Screenshot of part of the job-scraping source code)

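The `get_data` method prints only the job title, work location, posting date and responsibilities. The requirements are fetched as well but not printed, so adjust the output to suit your needs. The range of pages to scrape (the `range(1,2)` in `main`) can be changed in the same way.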
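
One way to adjust the output is to choose which fields to keep and write them to a CSV file instead of printing them. The following is a minimal sketch under stated assumptions: `make_row` and `write_rows` are hypothetical helpers, the field names are the API keys the script already reads, and the sample record is made-up illustration data rather than real scraped output.

```python
import csv
import io

# API keys read by the script (first-level post plus second-level detail).
DEFAULT_FIELDS = [
    'RecruitPostName',
    'LocationName',
    'LastUpdateTime',
    'Responsibility',
    'Requirement',
]


def make_row(post, detail, fields=DEFAULT_FIELDS):
    """Merge a first-level post with its second-level detail record
    and keep only the requested fields (missing ones become '')."""
    merged = {**post, **detail}
    return {name: merged.get(name, '') for name in fields}


def write_rows(rows, fileobj, fields=DEFAULT_FIELDS):
    """Write the rows as CSV with a header line to an open text file."""
    writer = csv.DictWriter(fileobj, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample data, for illustration only.
sample_post = {'RecruitPostName': 'Backend Engineer', 'PostId': '123'}
sample_detail = {
    'LocationName': 'Shenzhen',
    'LastUpdateTime': '2020-05-25',
    'Responsibility': 'Build services',
    'Requirement': 'Python',
}

buffer = io.StringIO()
write_rows([make_row(sample_post, sample_detail)], buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open('jobs.csv', 'w', newline='', encoding='utf-8')` and pass that object in place of the `StringIO` buffer.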

The output of a run is shown below:

If you found this useful, please give it a like~


Article link: https://www.haomeiwen.com/subject/ikyyahtx.html
