1
Preface
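The previous post mentioned crawling my school's job-information site. My school runs on a potato of a server, and I did not want to cause it any trouble, so I never wrote a crawler for my own school's recruitment listings.
Still, the listings on a school job-information site matter a great deal to students hunting for work in graduation season. So this post covers the topic in detail, using another school's job-information site instead (a sudden twinge of guilt, evil grin).
Here is an overview of what the code in this post does:
- Crawl the school's job-fair listings
- Do some simple data cleaning
- Save the result as an Excel file
- Email it to a list of recipients
- Run on a schedule with Linux crontab (so it runs every morning, which is easy to set up. You wake up to a tidy list of job fairs. Nice.)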
2
The "wheels" we need
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import json
import datetime
import requests
from bs4 import BeautifulSoup
import pandas
import smtplib  # sends mail over SMTP
from email.mime.text import MIMEText  # builds the message body
from email.mime.multipart import MIMEMultipart  # builds a message that can carry attachments
</pre>
3
Code walkthrough, section by section
Set up the required pieces
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># Build the request headers
# so that the request looks like it comes from a real browser
headers = {
    "Host": "jy.51uns.com:8022",
    "Connection": "keep-alive",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    "Referer": "http://jobsys.xpu.edu.cn/Pro_StudentEmploy/StudentJobFair/JobFairSearch.aspx?searchKey=",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cookie": "ASP.NET_SessionId=ldyzg0ktrmlzaedggpfvbbmm",
}
# Build a formatted date
# of the form 2019-01-01
time1 = (datetime.datetime.now() + datetime.timedelta(days=0)).strftime("%Y-%m-%d")
# Build the request URL
url = "http://jy.51uns.com:8022/Frame/Data/jdp.ashx?rnd=1558070330555&fn=GetJobFairListToWeb&StartDate={}&EndDate={}&SearchKey=&InfoState=1&start=0&limit=999&IsOpen=1"
url1 = url.format(time1, time1)
# Collected records
jobs_total = []
</pre>
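To make the query string easier to read, the list URL can also be assembled from a dictionary of parameters. This is only a sketch: `build_list_url` and `BASE` are names made up for illustration, and the parameter values are copied from the URL above without knowing what the server actually requires.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL used in this post
BASE = "http://jy.51uns.com:8022/Frame/Data/jdp.ashx"


def build_list_url(day):
    """Return the job-fair list URL for one day given as YYYY-MM-DD (hypothetical helper)."""
    params = {
        "rnd": "1558070330555",
        "fn": "GetJobFairListToWeb",
        "StartDate": day,   # first day of the query window
        "EndDate": day,     # same day, so the window covers a single date
        "SearchKey": "",
        "InfoState": "1",
        "start": "0",
        "limit": "999",
        "IsOpen": "1",
    }
    return BASE + "?" + urlencode(params)


example_url = build_list_url("2019-01-01")
print(example_url)
```

Passing tomorrow's date instead of today's is then a one-argument change.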
Define a function that fetches the job-fair links
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def getJobLinks(url_zong):
    # Uses the get method of requests
    res = requests.get(url_zong, headers=headers)
    res.encoding = 'utf-8'
    jd = json.loads(res.text)
    jd1 = jd['rows']
    i12 = len(jd1)
    print(i12)
    # A length above 0 means there are job fairs on this date; otherwise there are none today.
    if i12 > 0:
        for z1 in range(i12):
            id1 = jd1[z1]['Id']
            # URL of the job fair's detail page
            url2 = "http://jy.51uns.com:8022/Frame/Data/jdp.ashx?rnd=1558071078419&fn=GetOneJobFair&Id={}&StartDate=2000-01-01"
            joburl = url2.format(id1)
            # print('List: ' + url1)
            # Pass the parameters on to getJobDetail_if, which stores the result in jobs_total
            getJobDetail_if(url_zong, joburl)
        id1 = len(jobs_total)
        print('Number of job fairs:')
        print(id1)
        # pandas is used to save the data as Excel
        df = pandas.DataFrame(jobs_total, columns=['CompanyName', 'MeetDate', 'TimesName', 'SitusName', 'CompanyLinkmen', 'CompanyTel', 'CompanyEmail', 'PostDemand', 'Specialty', 'Plannumber'])
        # Output file name
        file = './' + time1 + '.xlsx'
        # Export to an Excel file with pandas (to_excel returns None, so keep the path in file)
        df.to_excel(file)
        # Send the email
        toemail()
    else:
        print('No job fairs today')
</pre>
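The step from the list response to the detail-page URLs can be tried offline. The sketch below assumes only what the function above relies on, namely a JSON object with a `rows` list whose items carry an `Id`. The sample payload and the name `detail_urls` are invented for illustration and are not real data from the site.

```python
import json

# Detail-page URL template copied from the function above
DETAIL_URL = ("http://jy.51uns.com:8022/Frame/Data/jdp.ashx"
              "?rnd=1558071078419&fn=GetOneJobFair&Id={}&StartDate=2000-01-01")


def detail_urls(list_response_text):
    """Turn the list endpoint's JSON text into one detail URL per row (hypothetical helper)."""
    data = json.loads(list_response_text)
    rows = data.get("rows") or []  # a missing or empty list means no job fairs that day
    return [DETAIL_URL.format(row["Id"]) for row in rows]


# Made-up payload in the assumed shape
sample = '{"rows": [{"Id": "a1"}, {"Id": "b2"}]}'
urls = detail_urls(sample)
for u in urls:
    print(u)
```

Keeping this step free of network calls makes it easy to check before pointing the script at the real server.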
Define a function that passes parameters to getJobDetail, receives what it returns, and stores it in jobs_total
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def getJobDetail_if(url_zong, joburl):
    print('Detail page: ' + joburl)
    # Fetch the details of this job fair.
    # getJobDetail makes the HTTP request itself, so no second request is needed here.
    job001 = getJobDetail(joburl)
    # print(job001)
    # Append the record to jobs_total
    jobs_total.append(job001)
    return jobs_total
</pre>
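Since every job fair costs one extra request, a short pause between detail requests keeps the load on the school's server low. The sketch below is one way to do that. `fetch_all`, its parameters, and the one-second default are assumptions for illustration and are not part of the original script.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive calls (hypothetical helper)."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results


# Offline demonstration: str.upper stands in for the real request function,
# and pauses.append records the requested delays instead of sleeping.
pauses = []
demo = fetch_all(["u1", "u2", "u3"], fetch=str.upper, delay_seconds=0.5, sleep=pauses.append)
print(demo, pauses)
```

In the script itself, `fetch` would be `getJobDetail` and `sleep` would stay at its default.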
Define a function that fetches the details of one job fair
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def getJobDetail(url1):
    res = requests.get(url1, headers=headers)
    res.encoding = 'utf-8'
    jobs = {}
    # print(res.text)
    js = json.loads(res.text)
    arr = js['Data']['RecruitContent']
    soup1 = BeautifulSoup(arr, 'html.parser')
    # jobs['RecruitContent'] = soup1.get_text('\n', strip=True)
    jobs['Plannumber'] = js['Data']['Plannumber']
    jobs['MeetDate'] = js['Data']['MeetDate'][0:10]
    jobs['TimesName'] = js['Data']['TimesName']
    # Data cleaning: replace empty values with a placeholder
    ct = js['Data']['CompanyType']
    if ct == '' or ct is None:
        jobs['CompanyType'] = 'Not provided'
    else:
        jobs['CompanyType'] = ct
    # soup3 = BeautifulSoup(sc1, 'html.parser')
    # sc = soup3.get_text('/', ',')
    sc1 = js['Data']['Specialty']
    if sc1 == '' or sc1 is None or sc1 == 'null':
        jobs['Specialty'] = 'Not provided'
    else:
        jobs['Specialty'] = sc1.replace('/', '、')
    pd1 = js['Data']['PostDemand']
    # Check for a missing value before calling replace, which would fail on None
    if pd1 == '' or pd1 is None:
        jobs['PostDemand'] = 'Not provided'
    else:
        jobs['PostDemand'] = pd1.replace('/', '、')
    # arr2 = js['Data']['CompanyAbout']
    # soup2 = BeautifulSoup(arr2, 'html.parser')
    # jobs['CompanyAbout'] = soup2.get_text('\n', strip=True)
    jobs['SitusName'] = js['Data']['SitusName']
    ce = js['Data']['CompanyEmail']
    if ce == '' or ce is None:
        jobs['CompanyEmail'] = 'Not provided'
    else:
        jobs['CompanyEmail'] = ce
    cw = js['Data']['CompanyWeblink']
    if cw == '' or cw is None:
        jobs['CompanyWeblink'] = 'Not provided'
    else:
        jobs['CompanyWeblink'] = cw
    jobs['CompanyTel'] = js['Data']['CompanyTel']
    jobs['CompanyLinkmen'] = js['Data']['CompanyLinkmen']
    jobs['CompanyName'] = js['Data']['CompanyName']
    return jobs
</pre>
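The same "empty means not provided" check repeats for several fields above, so it could be folded into one small function. This is a sketch rather than the code the script uses: `clean_field` and its arguments are made-up names, and treating the literal string `'null'` as empty for every field is an assumption (the function above does that only for Specialty).

```python
def clean_field(value, default="Not provided", replace_slash=False):
    """Normalise one field of the detail response (hypothetical helper)."""
    if value is None:
        return default
    text = str(value).strip()
    if text == "" or text == "null":  # assumption: 'null' counts as empty everywhere
        return default
    if replace_slash:
        text = text.replace("/", "、")  # same separator swap as in getJobDetail
    return text


print(clean_field(None))
print(clean_field("A/B", replace_slash=True))
```

With such a helper, a line like `jobs['Specialty'] = clean_field(js['Data']['Specialty'], replace_slash=True)` would replace a four-line if/else.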
Email
Settings
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">email_host = 'smtp.qq.com'  # mail server address
email_user = 'xxx@qq.com'  # sender account
email_pwd = 'xxx'  # the sender "password" is the mailbox authorization code, not the login password
maillist = "xxx@qq.com"  # maillist may hold several addresses, separated by commas
file = './' + time1 + '.xlsx'
</pre>
Define a function that builds the message and sends it
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from email.mime.application import MIMEApplication  # wraps binary attachments

def toemail():
    new_msg = MIMEMultipart()  # a message object that can carry attachments
    new_msg.attach(MIMEText("Xi'an xx University " + time1 + ' job fair listings'))  # message body
    new_msg['Subject'] = "Xi'an xx University " + time1 + ' job fair listings'  # subject
    new_msg['From'] = email_user  # sender account
    new_msg['To'] = maillist  # recipient list
    # Read the workbook and attach it as a binary file; the with block closes the file
    with open(file, 'rb') as f:
        att = MIMEApplication(f.read(), _subtype='octet-stream')
    att.add_header('Content-Disposition', 'attachment', filename=time1 + '.xlsx')
    new_msg.attach(att)
    smtp = smtplib.SMTP_SSL(email_host, port=465)  # connect to the mail server over SSL on port 465
    smtp.login(email_user, email_pwd)  # sender account and authorization code
    # maillist.split(',') is the key to sending to several people.
    # The arguments are the sender, the recipients, and the message converted to a string.
    smtp.sendmail(email_user, maillist.split(','), new_msg.as_string())
    smtp.quit()  # close the SMTP session once the message is sent
    print('Email sent!')
</pre>
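Before sending anything, it helps to build the message and inspect it locally. The sketch below uses the standard library's `EmailMessage` class, which is a different API from the `MIMEMultipart` approach above and is shown only as an alternative. The addresses, the placeholder bytes, and the name `build_message` are made up for illustration.

```python
from email.message import EmailMessage


def build_message(sender, recipients, day, attachment_bytes):
    """Build a message with one .xlsx attachment without sending it (hypothetical helper)."""
    msg = EmailMessage()
    msg["Subject"] = "Job fair listings for " + day
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("The job fair listings for " + day + " are attached.")
    # Attach the raw bytes as a generic binary file named after the date
    msg.add_attachment(attachment_bytes, maintype="application",
                       subtype="octet-stream", filename=day + ".xlsx")
    return msg


msg = build_message("sender@example.com",
                    ["a@example.com", "b@example.com"],
                    "2019-01-01",
                    b"placeholder bytes")  # stands in for the real workbook contents
attachment_names = [part.get_filename() for part in msg.iter_attachments()]
print(attachment_names)
```

A message built this way can be handed to `smtplib.SMTP_SSL(...).send_message(msg)` once the login step has succeeded.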
Run
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">if __name__ == "__main__":
    getJobLinks(url1)
</pre>
4
Complete code
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import json
import datetime
import requests
from bs4 import BeautifulSoup
import pandas
import smtplib  # the module that sends mail
from email.mime.text import MIMEText  # builds the message body
from email.mime.multipart import MIMEMultipart  # builds a message that can carry attachments
from email.mime.application import MIMEApplication  # wraps binary attachments

# Build the request headers
headers = {
    "Host": "jy.51uns.com:8022",
    "Connection": "keep-alive",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    "Referer": "http://jobsys.xpu.edu.cn/Pro_StudentEmploy/StudentJobFair/JobFairSearch.aspx?searchKey=",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cookie": "ASP.NET_SessionId=ldyzg0ktrmlzaedggpfvbbmm",
}
# Build a formatted date
time1 = (datetime.datetime.now() + datetime.timedelta(days=0)).strftime("%Y-%m-%d")
# Build the request URL
url = "http://jy.51uns.com:8022/Frame/Data/jdp.ashx?rnd=1558070330555&fn=GetJobFairListToWeb&StartDate={}&EndDate={}&SearchKey=&InfoState=1&start=0&limit=999&IsOpen=1"
url1 = url.format(time1, time1)
# Collected records
jobs_total = []


# Fetch the job-fair links
def getJobLinks(url_zong):
    res = requests.get(url_zong, headers=headers)
    res.encoding = 'utf-8'
    jd = json.loads(res.text)
    jd1 = jd['rows']
    i12 = len(jd1)
    print(i12)
    # A length above 0 means there are job fairs on this date; otherwise there are none today.
    if i12 > 0:
        for z1 in range(i12):
            id1 = jd1[z1]['Id']
            # URL of the job fair's detail page
            url2 = "http://jy.51uns.com:8022/Frame/Data/jdp.ashx?rnd=1558071078419&fn=GetOneJobFair&Id={}&StartDate=2000-01-01"
            joburl = url2.format(id1)
            # print('List: ' + url1)
            # Pass the parameters on to getJobDetail_if, which stores the result in jobs_total
            getJobDetail_if(url_zong, joburl)
        id1 = len(jobs_total)
        print('Number of job fairs:')
        print(id1)
        # pandas is used to save the data as Excel
        df = pandas.DataFrame(jobs_total, columns=['CompanyName', 'MeetDate', 'TimesName', 'SitusName', 'CompanyLinkmen', 'CompanyTel', 'CompanyEmail', 'PostDemand', 'Specialty', 'Plannumber'])
        # Output file name
        file = './' + time1 + '.xlsx'
        # Export to an Excel file with pandas (to_excel returns None, so keep the path in file)
        df.to_excel(file)
        # Send the email
        toemail()
    else:
        print('No job fairs today')


# Pass parameters to getJobDetail, receive what it returns, and store it in jobs_total
def getJobDetail_if(url_zong, joburl):
    print('Detail page: ' + joburl)
    # getJobDetail makes the HTTP request itself, so no second request is needed here
    job001 = getJobDetail(joburl)
    # print(job001)
    jobs_total.append(job001)
    return jobs_total


# Fetch the details of one job fair
def getJobDetail(url1):
    res = requests.get(url1, headers=headers)
    res.encoding = 'utf-8'
    jobs = {}
    # print(res.text)
    js = json.loads(res.text)
    arr = js['Data']['RecruitContent']
    soup1 = BeautifulSoup(arr, 'html.parser')
    # jobs['RecruitContent'] = soup1.get_text('\n', strip=True)
    jobs['Plannumber'] = js['Data']['Plannumber']
    jobs['MeetDate'] = js['Data']['MeetDate'][0:10]
    jobs['TimesName'] = js['Data']['TimesName']
    # Data cleaning: replace empty values with a placeholder
    ct = js['Data']['CompanyType']
    if ct == '' or ct is None:
        jobs['CompanyType'] = 'Not provided'
    else:
        jobs['CompanyType'] = ct
    # soup3 = BeautifulSoup(sc1, 'html.parser')
    # sc = soup3.get_text('/', ',')
    sc1 = js['Data']['Specialty']
    if sc1 == '' or sc1 is None or sc1 == 'null':
        jobs['Specialty'] = 'Not provided'
    else:
        jobs['Specialty'] = sc1.replace('/', '、')
    pd1 = js['Data']['PostDemand']
    # Check for a missing value before calling replace, which would fail on None
    if pd1 == '' or pd1 is None:
        jobs['PostDemand'] = 'Not provided'
    else:
        jobs['PostDemand'] = pd1.replace('/', '、')
    # arr2 = js['Data']['CompanyAbout']
    # soup2 = BeautifulSoup(arr2, 'html.parser')
    # jobs['CompanyAbout'] = soup2.get_text('\n', strip=True)
    jobs['SitusName'] = js['Data']['SitusName']
    ce = js['Data']['CompanyEmail']
    if ce == '' or ce is None:
        jobs['CompanyEmail'] = 'Not provided'
    else:
        jobs['CompanyEmail'] = ce
    cw = js['Data']['CompanyWeblink']
    if cw == '' or cw is None:
        jobs['CompanyWeblink'] = 'Not provided'
    else:
        jobs['CompanyWeblink'] = cw
    jobs['CompanyTel'] = js['Data']['CompanyTel']
    jobs['CompanyLinkmen'] = js['Data']['CompanyLinkmen']
    jobs['CompanyName'] = js['Data']['CompanyName']
    return jobs


# Email
email_host = 'smtp.qq.com'  # mail server address
email_user = 'xxx@qq.com'  # sender account
email_pwd = 'xxxxxxx'  # the sender "password" is the mailbox authorization code, not the login password
maillist = "xxx@qq.com"  # maillist may hold several addresses, separated by commas
file = './' + time1 + '.xlsx'


def toemail():
    new_msg = MIMEMultipart()  # a message object that can carry attachments
    new_msg.attach(MIMEText("Xi'an xx University " + time1 + ' job fair listings'))  # message body
    new_msg['Subject'] = "Xi'an xx University " + time1 + ' job fair listings'  # subject
    new_msg['From'] = email_user  # sender account
    new_msg['To'] = maillist  # recipient list
    # Read the workbook and attach it as a binary file; the with block closes the file
    with open(file, 'rb') as f:
        att = MIMEApplication(f.read(), _subtype='octet-stream')
    att.add_header('Content-Disposition', 'attachment', filename=time1 + '.xlsx')
    new_msg.attach(att)
    smtp = smtplib.SMTP_SSL(email_host, port=465)  # connect to the mail server over SSL on port 465
    smtp.login(email_user, email_pwd)  # sender account and authorization code
    # maillist.split(',') is the key to sending to several people.
    # The arguments are the sender, the recipients, and the message converted to a string.
    smtp.sendmail(email_user, maillist.split(','), new_msg.as_string())
    smtp.quit()  # close the SMTP session once the message is sent
    print('Email sent!')


if __name__ == "__main__":
    getJobLinks(url1)
</pre>
The output looks like this:
[image]