webdriver

Author: 一只烟酒僧 | Published: 2021-02-25 23:36

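webdriver: an R package similar to RSelenium. It can scrape dynamic web pages and simulate actions such as clicking, paging, and logging in. GitHub: https://github.com/rstudio/webdriver
In my experience it is more pleasant to use than RSelenium. At the very least, there is no need to keep a browser window open alongside, because it drives a headless PhantomJS process.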

######################################################## 
#-------------------------------------------------------
# Topic: Scraping meiosisonline with webdriver
# Author:
# Date:Fri Feb 26 10:36:37 2021
# Mail:
#-------------------------------------------------------
########################################################


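# Load webdriver plus the HTML-parsing helpers
library(webdriver)
library(rvest)    # also provides the %>% pipe
library(stringr)
# Start a headless PhantomJS process (install it once with webdriver::install_phantomjs())
pjs <- run_phantomjs()
session <- Session$new(port = pjs$port)
session$go("https://mcg.ustc.edu.cn/bsc/meiosis/search.php?search_tag=species&search_input=Stra8&sub_species=Mus+musculus&sub_function_gender=Male&sub_function_stage=Anaphase+I&sub_reproduction=Infertile&sub_complex=CDK2-CCNA2&sub_location=Cell+junction&sub_tissue=Brain&sub_method=cKO&browse_submit=Submit")
# Accumulator for the rows scraped from every page
my_info <- data.frame()
# Total page count shown in the grid pager; strip thousands separators such as "1,234"
max_page <- read_html(session$getSource()) %>% html_node(xpath = '//span[@id="sp_1_gridPager"]') %>% html_text()
max_page <- as.numeric(str_replace_all(max_page, ",", ""))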
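for (i in seq_len(max_page)) {
  print(paste0("Scraping page ", i))
  # Scrape: parse the grid as currently rendered
  page <- read_html(session$getSource())
  my_info_sub <- html_node(page, xpath = '//*[@id="gridTable"]') %>% html_table()
  my_info <- rbind(my_info, my_info_sub)
  # Simulate paging = locate the "next" button + click it (skipped on the last page)
  if (i < max_page) {
    fanye <- session$findElement(xpath = '//span[@class="ui-icon ui-icon-seek-next"]')
    fanye$click()
    Sys.sleep(1)  # brief pause so the grid can load the next page; tune as needed
  }
}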
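# Tidy the result: drop rows whose first column is empty,
# then take the column names from the grid's separate header table
my_info <- my_info[my_info$X1 != "", ]
colnames(my_info) <- colnames(read_html(session$getSource()) %>% html_node(xpath = '//*[@id="gview_gridTable"]/div[2]/div/table') %>% html_table())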
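
One thing to watch in the script above: after `fanye$click()` the grid fetches the next page asynchronously, so reading `session$getSource()` straight away may still return the previous page. Below is a minimal sketch of a polling helper that could guard against this. `wait_until` is a hypothetical name of my own, not part of webdriver, and the "page source has changed" check in the closing comments is only an assumption about how this particular site behaves.

```r
# Poll `condition` (a zero-argument function) until it returns TRUE or the
# timeout (in seconds) expires. Returns TRUE on success, FALSE on timeout.
# NOTE: `wait_until` is a hypothetical helper, not a webdriver function.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Offline demonstration: the condition becomes TRUE on the third check.
checks <- 0
ok <- wait_until(function() {
  checks <<- checks + 1
  checks >= 3
}, timeout = 5, interval = 0.1)
print(ok)

# Possible use inside the scraping loop (assumption: the page source
# changes once the next page of the grid has been rendered):
#   old_source <- session$getSource()
#   fanye$click()
#   wait_until(function() !identical(session$getSource(), old_source))
```

Polling until a condition holds is usually more robust than a fixed pause, since it returns as soon as the page is ready and gives up cleanly when it never is.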
