Introduction
"Life is short, I use Python." Python's classic slogan is all about saving time, and the language took third place in the September TIOBE index.
![](https://img.haomeiwen.com/i14140970/9e1c940547912db5.png)
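Today I'll try scraping Python and Java job postings from the Boss Zhipin site, compare the pay prospects of the two directions, and offer a small suggestion to undergraduates choosing a career path.
Scraping
On the job site, filter directly by "bachelor's degree" plus "java" or "python", using Guangzhou as the example.
The code below scrapes the summary information of each posting: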
from bs4 import BeautifulSoup
import requests
import pymongo

client = pymongo.MongoClient('localhost', 27017)
zhipin = client['zhipin']
zhipin_java = zhipin['zhipin_java']
zhipin_python = zhipin['zhipin_python']

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}
total_page = 11  # range(1, total_page) below visits pages 1 to 10


def get_info(param, data_table):
    '''
    Scrape postings for one job direction (java, python, ...) and store them in the database.
    :param param: job direction, used as the search keyword
    :param data_table: MongoDB collection to write to
    :return: None
    '''
    for i in range(1, total_page):
        url = 'https://www.zhipin.com/c101280100/d_203-h_101280100/?query={0}&page={1}'.format(
            param, i)
        web_data = requests.get(url, headers=headers)
        soup = BeautifulSoup(web_data.content, 'lxml')
        for item in soup.select('#main > div > div.job-list > ul > li'):
            # Job requirements
            job_title = item.select('.job-title')[0].text  # position
            salary = item.select('.red')[0].text  # salary
            person_info = item.select('.info-primary p')[0].text  # applicant requirements
            # Company details
            company = item.select('.info-company h3 a')[0].text  # company
            company_info = item.select('.info-company p')[0].text  # company information
            data = {
                'job_title': job_title,
                'salary': salary,
                'person_info': person_info,
                'company': company,
                'company_info': company_info,
            }
            # Insert into the database (insert_one replaces the deprecated Collection.insert)
            data_table.insert_one(data)
            print(data)
        print('*' * 100)
    print('\n' * 5)


if __name__ == '__main__':
    param_list = ['java', 'python']
    table_list = [zhipin_java, zhipin_python]
    for param, table in zip(param_list, table_list):
        get_info(param, table)
All of the scraped information is stored in MongoDB, which makes the later analysis easier.
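The paging loop above formats the keyword straight into the URL. As a minimal sketch, assuming the same base path as the scraper, the page URLs could also be built with `urllib.parse.urlencode` so that keywords containing special characters are escaped. The helper name `build_page_urls` is hypothetical and not part of the original code.

```python
from urllib.parse import urlencode

# Base listing path copied from the scraper above.
BASE_URL = 'https://www.zhipin.com/c101280100/d_203-h_101280100/'


def build_page_urls(keyword, total_page=11):
    """Return the listing URLs for pages 1 .. total_page - 1 (hypothetical helper)."""
    return [
        '{0}?{1}'.format(BASE_URL, urlencode({'query': keyword, 'page': page}))
        for page in range(1, total_page)
    ]


urls = build_page_urls('python')
print(len(urls))  # 10, because range(1, 11) stops before 11
print(urls[0])
```

Each URL can then be passed to `requests.get` exactly as in `get_info`.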
Data cleaning
For the data processing step I defined a few functions, each handling one kind of content.
1. Initialize variables
import pymongo
from collections import Counter
# This import path is the pyecharts 0.x one; from 1.0 on the charts live in pyecharts.charts
from pyecharts import Bar, Line, Pie

client = pymongo.MongoClient('localhost', 27017)
zhipin = client['zhipin']
zhipin_java = zhipin['zhipin_java']
zhipin_python = zhipin['zhipin_python']
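2. Get the district distribution
import re


def get_zone(collection=zhipin_java):
    '''Extract the district from each posting.'''
    zone_list = []
    real_list = []
    for item in collection.find():
        text = item['person_info'][3:6]
        zone_list.append(text)
    for i in zone_list:
        j = re.sub(r' \d-', '', i)
        real_list.append(j)
    # Drop entries that ended up empty
    return [z for z in real_list if z]


zone = dict(Counter(get_zone()))
# py_zone is used by the chart code but was not defined in the original listing;
# this line is the presumed Python counterpart of zone.
py_zone = dict(Counter(get_zone(zhipin_python)))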
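When two series share one category axis, both value lists need to follow the same district order. The following is a minimal sketch of that alignment using only `collections.Counter`; the district lists are hypothetical sample data standing in for the output of `get_zone()`, not the scraped results.

```python
from collections import Counter

# Hypothetical district lists, standing in for get_zone() output per collection.
java_zones = ['Tianhe', 'Panyu', 'Panyu', 'Haizhu']
python_zones = ['Tianhe', 'Tianhe', 'Tianhe', 'Yuexiu']

java_counts = Counter(java_zones)
python_counts = Counter(python_zones)

# One shared, sorted category axis covering every district seen in either list.
districts = sorted(set(java_counts) | set(python_counts))

# Counter returns 0 for a missing key, so absent districts need no special case.
java_values = [java_counts[d] for d in districts]
python_values = [python_counts[d] for d in districts]

print(districts)      # ['Haizhu', 'Panyu', 'Tianhe', 'Yuexiu']
print(java_values)    # [1, 2, 1, 0]
print(python_values)  # [0, 0, 3, 1]
```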
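3. Tidy the job-posting data
def del_key_1():
    '''Remove positions that were advertised only once.'''
    # job_dict is a module-level dict that is not defined in the code shown here;
    # presumably it maps job title -> number of postings.
    li = []
    for key in job_dict.keys():
        if job_dict[key] == 1:
            li.append(key)
    for i in li:
        del job_dict[i]
    print(job_dict)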
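The article does not show how `job_dict` is built. As a sketch under the assumption that it maps each job title to its number of postings, it could be produced with `Counter`, and the one-off titles filtered with a dict comprehension instead of deleting keys in place. The titles below are hypothetical sample data.

```python
from collections import Counter

# Hypothetical job titles, standing in for the 'job_title' field in MongoDB.
titles = ['Python engineer', 'Data analyst', 'Python engineer',
          'Operations engineer', 'Data analyst', 'Python engineer']

job_dict = dict(Counter(titles))

# Build a new dict rather than deleting keys from the one being inspected.
frequent_jobs = {title: n for title, n in job_dict.items() if n > 1}

print(frequent_jobs)  # {'Python engineer': 3, 'Data analyst': 2}
```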
4. Tidy the salary data
def get_salary():
    '''Collect the salary range of each posting.'''
    min_list = []  # starting salary
    max_list = []  # top salary
    job_title = []  # position
    for item in zhipin_java.find():
        job_title.append(item['job_title'])
        salary = item['salary']
        # Assumes strings such as '10k-15k': split on '-' and drop the trailing 'k'
        min_list.append(int(salary.split('-')[0][:-1]))
        max_list.append(int(salary.split('-')[1][:-1]))
    return min_list, max_list, job_title
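The slicing in `get_salary()` raises an exception if a salary string does not look like `10k-15k`. A more defensive variant, sketched here with a regular expression, returns `None` for anything it cannot parse. The salary format and the helper name `parse_salary` are assumptions for illustration, not something the article confirms.

```python
import re

# Matches ranges such as '10k-15k' or '8K-12K'; the format is an assumption
# based on how get_salary() slices the string.
SALARY_RE = re.compile(r'^\s*(\d+)\s*[kK]\s*-\s*(\d+)\s*[kK]')


def parse_salary(text):
    """Return (low, high) in thousands, or None if the text does not match."""
    match = SALARY_RE.match(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_salary('10k-15k'))     # (10, 15)
print(parse_salary('8K-12K'))      # (8, 12)
print(parse_salary('negotiable'))  # None
```

Postings that return `None` can then be skipped or logged instead of stopping the whole run.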
Data visualization
With the district distribution data tidied up, plot it using pyecharts:
bar = Bar("Java and Python jobs by district")
bar.add("java", list(zone.keys()), list(zone.values()), mark_line=['min', 'max'], is_toolbox_show=True, is_more_utils=True)
bar.add("python", list(py_zone.keys()), list(py_zone.values()), mark_line=['min', 'max'], is_toolbox_show=True, is_more_utils=True)
bar  # in a Jupyter notebook, the bare name renders the chart
![](https://img.haomeiwen.com/i14140970/2c068aafb0fd9848.png)
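The closer a district is to the city center, the more postings it has and the better the odds of landing a job. Panyu and Tianhe differ the most: Tianhe has nearly 8 times more Python postings than Java postings, while in Panyu Java is more popular than Python and favored by companies. The other districts differ little.
Most-advertised positions
Positions in the Python direction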
![](https://img.haomeiwen.com/i14140970/b00f356f89012778.png)
Share of each Python position
![](https://img.haomeiwen.com/i14140970/4196db86a875ee11.png)
The top five by share are:
- Python engineer
- Data analyst
- Operations engineer
- Big data development engineer
- Game AI algorithm engineer
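A ranking like the one above can be derived from the title counts. The following is a minimal sketch, assuming the input is a plain list of job titles; the helper `top_shares` and the sample titles are hypothetical and do not reproduce the scraped data.

```python
from collections import Counter


def top_shares(titles, k=5):
    """Return the k most common titles with their share of all postings."""
    counts = Counter(titles)
    total = sum(counts.values())
    return [(title, n / total) for title, n in counts.most_common(k)]


# Hypothetical titles; the real input would be the scraped 'job_title' values.
sample = ['Python engineer'] * 3 + ['Data analyst'] * 2 + ['Operations engineer']

for title, share in top_shares(sample):
    print('{0}: {1:.1%}'.format(title, share))
```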
Java positions for comparison
![](https://img.haomeiwen.com/i14140970/5ea9daf3eff3e842.png)
![](https://img.haomeiwen.com/i14140970/1fab16339d9b69d4.png)
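Few senior engineers are being hired; most postings are for junior and mid-level engineers. It really is a case of "three cobblers still can't beat one Zhuge Liang" (:
Companies hiring for Python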
![](https://img.haomeiwen.com/i14140970/ce0abc04f88b3f56.png)
Companies hiring for Java
![](https://img.haomeiwen.com/i14140970/16ccd6a4bdc4c49e.png)
The question everyone cares about most: pay prospects
Highest salary
![](https://img.haomeiwen.com/i14140970/09823ea72ecda1bb.png)
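It looks like Python's reputation is not just hype: its top-of-range salaries are mostly higher than Java's. For Java, the top-of-range salary averages 19.24K, with a minimum of 3K and a maximum of 50K; for Python it averages 21.16K, with a minimum of 3K and a maximum of 60K.
Lowest salary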
![](https://img.haomeiwen.com/i14140970/1ec099e9fd752518.png)
Python's starting salaries are also mostly higher than Java's: the average starting salary is 11.42K for Java and 12.08K for Python.
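Averages, minimums, and maximums like those quoted above can be computed directly from the lists returned by `get_salary()`. This is a minimal sketch with made-up numbers (in thousands); the helper `summarize` is hypothetical and the values are not the article's data.

```python
from statistics import mean


def summarize(values):
    """Return the mean (rounded to 2 decimals), minimum, and maximum of a list."""
    return {'mean': round(mean(values), 2), 'min': min(values), 'max': max(values)}


# Hypothetical salary bounds in thousands, standing in for get_salary() output.
min_list = [8, 10, 15, 12]
max_list = [12, 15, 30, 20]

print(summarize(min_list))  # {'mean': 11.25, 'min': 8, 'max': 15}
print(summarize(max_list))  # {'mean': 19.25, 'min': 12, 'max': 30}
```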
Word clouds for the two positions
![](https://img.haomeiwen.com/i14140970/3f3903ebb08fb132.png)
![](https://img.haomeiwen.com/i14140970/a991f2e881fae207.png)
Follow the WeChat official account "Python绿洲" and reply [ boss ] to get the source code.