Crawling scroll-loaded pages with Selenium WebDriver

Author: 风一样的存在 | Published: 2019-06-28 00:48

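    Scenario: more elements load as the page is scrolled, and sometimes a "Show more" button appears that must be clicked before loading continues (for example, the app information pages in the Google Play store).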

    [Figure: installed app information]

    Implementation:

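        import java.util.concurrent.TimeUnit;

        import org.openqa.selenium.By;
        import org.openqa.selenium.JavascriptExecutor;
        import org.openqa.selenium.WebDriver;
        import org.openqa.selenium.WebElement;
        import org.openqa.selenium.interactions.Actions;
        import org.openqa.selenium.support.ui.ExpectedConditions;
        import org.openqa.selenium.support.ui.WebDriverWait;

        // TaskContext, XPATH_SHOW_MORE and click(...) come from the author's project and are not shown in this post.

        /**
         * Scrolls to the bottom of the page repeatedly until the page stops growing,
         * clicking the "show more" button whenever it appears.
         * @param context task context that holds the WebDriver
         * @throws InterruptedException if the thread is interrupted while sleeping
         */
        private void crawlApps(TaskContext context) throws InterruptedException {
            WebDriver webDriver = context.getWebDriver();
            JavascriptExecutor jsExecutor = (JavascriptExecutor) webDriver;
            Actions actions = new Actions(webDriver);
            long checkHeight = ((Number) jsExecutor.executeScript("return document.body.scrollHeight;")).longValue();
            boolean flag = true;
            while (flag) {
                // Pause for 1 second before each scroll
                TimeUnit.SECONDS.sleep(1);
                // Scroll down by the full document height, i.e. to the bottom of the page
                jsExecutor.executeScript("window.scrollBy(0,document.body.scrollHeight)");
                long nextHeight = ((Number) jsExecutor.executeScript("return document.body.scrollHeight;")).longValue();
                if (nextHeight > checkHeight) {
                    // The page grew, so more content was loaded
                    checkHeight = nextHeight;
                    // Check whether a "show more" button is present
                    if (expectBy(context, By.xpath(XPATH_SHOW_MORE), 5)) {
                        WebElement element = webDriver.findElement(By.xpath(XPATH_SHOW_MORE));
                        TimeUnit.SECONDS.sleep(1);
                        actions.moveToElement(element).build().perform();
                        click(context, By.xpath(XPATH_SHOW_MORE));
                    }
                } else {
                    // Height unchanged: nothing more to load, stop scrolling
                    flag = false;
                }
                // Alternative way to scroll: actions.sendKeys(Keys.PAGE_DOWN).perform();
            }
        }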
    
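        /**
         * Waits for an expected element to appear.
         * @param context task context that holds the WebDriver
         * @param by      locator of the expected element
         * @param seconds maximum time to wait, in seconds
         * @return whether the element appeared (became clickable) in time
         */
        protected boolean expectBy(TaskContext context, By by, long seconds) {
            try {
                // Selenium 3 constructor; Selenium 4 takes Duration.ofSeconds(seconds) instead
                new WebDriverWait(context.getWebDriver(), seconds).until(ExpectedConditions.elementToBeClickable(by));
                return true;
            } catch (Exception e) {
                // Timed out (or the lookup failed): treat the element as absent
                return false;
            }
        }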
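
The loop in crawlApps stops only when the page height no longer grows, so on a feed that never ends it would run forever. The following is a minimal sketch of the same "scroll until the height is stable" logic with an upper bound on the number of rounds. It is not part of the original code: ScrollablePage, FakePage, scrollUntilStable and maxRounds are hypothetical names introduced for illustration, and the fake page replaces the browser so that the logic can run without Selenium. In a real crawler, height() would presumably wrap the `return document.body.scrollHeight;` script and scrollToBottom() the `window.scrollBy` call.

```java
public class ScrollLoopSketch {

    /** Hypothetical abstraction over the browser; this is not a Selenium type. */
    interface ScrollablePage {
        long height();          // e.g. the value of document.body.scrollHeight
        void scrollToBottom();  // e.g. window.scrollBy(0, document.body.scrollHeight)
    }

    /**
     * Scrolls until the page height stops growing or maxRounds scrolls were made.
     * @return the number of scrolls after which the page had grown
     */
    static int scrollUntilStable(ScrollablePage page, int maxRounds) {
        long lastHeight = page.height();
        int grown = 0;
        for (int round = 0; round < maxRounds; round++) {
            page.scrollToBottom();
            long newHeight = page.height();
            if (newHeight <= lastHeight) {
                break; // height unchanged: nothing new was loaded
            }
            lastHeight = newHeight;
            grown++;
        }
        return grown;
    }

    /** Fake page that grows by 1000px per scroll for a fixed number of batches. */
    static final class FakePage implements ScrollablePage {
        private long height = 1000;
        private int remainingBatches;

        FakePage(int batches) {
            this.remainingBatches = batches;
        }

        @Override
        public long height() {
            return height;
        }

        @Override
        public void scrollToBottom() {
            if (remainingBatches > 0) {
                remainingBatches--;
                height += 1000;
            }
        }
    }

    public static void main(String[] args) {
        // The page has 3 more batches: the loop stops on the 4th scroll
        System.out.println(scrollUntilStable(new FakePage(3), 50));   // prints 3
        // The page keeps growing: the loop stops at the cap
        System.out.println(scrollUntilStable(new FakePage(100), 10)); // prints 10
    }
}
```

The cap trades completeness for a guaranteed exit. If every item must be collected, a larger maxRounds or a time budget may suit better than a fixed small number.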
    
