Crawling scroll-loaded pages with Selenium WebDriver

Author: 风一样的存在 | Published: 2019-06-28 00:48

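Scenario: the page loads more elements as it is scrolled, and sometimes a "Show more" button appears that has to be clicked before loading continues (for example, the app information pages of the Google Play store).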

[Figure: information about installed apps]

Implementation:

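    // Imports used by this snippet (Selenium Java bindings plus java.util.concurrent).
    import java.util.concurrent.TimeUnit;
    import org.openqa.selenium.By;
    import org.openqa.selenium.JavascriptExecutor;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.interactions.Actions;

    // TaskContext, click(...) and XPATH_SHOW_MORE come from the author's project and are not shown in this post.

    /**
     * Scrolls to the bottom of the page repeatedly until no more content is loaded,
     * clicking the "show more" button whenever it appears.
     * @param context task context that provides the WebDriver
     * @throws InterruptedException if the thread is interrupted while sleeping
     */
    private void crawlApps(TaskContext context) throws InterruptedException {
        WebDriver webDriver = context.getWebDriver();
        JavascriptExecutor jsExecutor = (JavascriptExecutor) webDriver;
        Actions actions = new Actions(webDriver);
        long checkHeight = (Long) jsExecutor.executeScript("return document.body.scrollHeight;");
        boolean flag = true;
        while (flag) {
            // Pause for 1 second before each scroll
            TimeUnit.SECONDS.sleep(1);
            // Scroll down by one full page height, i.e. to the bottom of the page
            jsExecutor.executeScript("window.scrollBy(0,document.body.scrollHeight)");
            long nextHeight = (Long) jsExecutor.executeScript("return document.body.scrollHeight;");
            if (nextHeight > checkHeight) {
                // The page grew, so new content was loaded; remember the new height
                checkHeight = nextHeight;
                // Check whether a "show more" button is present (waits up to 5 seconds)
                if (expectBy(context, By.xpath(XPATH_SHOW_MORE), 5)) {
                    WebElement element = webDriver.findElement(By.xpath(XPATH_SHOW_MORE));
                    TimeUnit.SECONDS.sleep(1);
                    actions.moveToElement(element).build().perform();
                    click(context, By.xpath(XPATH_SHOW_MORE));
                }
            } else {
                // Height unchanged: nothing more to load, so stop
                flag = false;
            }
            //actions.sendKeys(Keys.PAGE_DOWN).perform();
        }
    }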
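
The termination logic of the loop above (scroll, re-measure the height, stop once it no longer grows) can be separated from the WebDriver calls. That also makes it easy to cap the number of scrolls, so an endless feed cannot keep the loop running forever. The following is a minimal sketch, not the author's code: `ScrollLoop`, `scrollUntilStable`, and the `maxScrolls` parameter are hypothetical names introduced for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not from the original post): scroll until the page height stops growing. */
final class ScrollLoop {

    /**
     * @param height     supplies the current page height, e.g. by evaluating
     *                   "return document.body.scrollHeight;" through a JavascriptExecutor
     * @param scroll     performs one scroll step (including any waiting or button clicks)
     * @param maxScrolls safety cap so an endless feed cannot loop forever
     * @return number of scroll steps after which the page had grown
     */
    static int scrollUntilStable(LongSupplier height, Runnable scroll, int maxScrolls) {
        long lastHeight = height.getAsLong();
        int grown = 0;
        for (int i = 0; i < maxScrolls; i++) {
            scroll.run();
            long newHeight = height.getAsLong();
            if (newHeight <= lastHeight) {
                break; // nothing new was loaded: assume the end of the list
            }
            lastHeight = newHeight;
            grown++;
        }
        return grown;
    }
}
```

With Selenium, `height` would presumably wrap the `executeScript` call that returns `document.body.scrollHeight`, and `scroll` would wrap the `window.scrollBy` call plus the pause. Because `Runnable` cannot throw checked exceptions, the sleep would need to handle `InterruptedException` inside the lambda.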

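    // Imports used by this snippet (in addition to org.openqa.selenium.By).
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    /**
     * Waits for an expected element to become clickable.
     * @param context task context that provides the WebDriver
     * @param by locator of the element
     * @param seconds maximum time to wait, in seconds
     * @return true if the element became clickable within the timeout, false otherwise
     */
    protected boolean expectBy(TaskContext context, By by, long seconds) {
        try {
            // Selenium 3 constructor (timeout in seconds); Selenium 4 takes a java.time.Duration instead
            (new WebDriverWait(context.getWebDriver(), seconds)).until(ExpectedConditions.elementToBeClickable(by));
            return true;
        } catch (Exception e) {
            // Usually a TimeoutException: the element did not become clickable in time
            return false;
        }
    }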
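
`WebDriverWait.until` re-evaluates its condition until it succeeds or the timeout elapses, and `expectBy` turns the resulting timeout into a `false` return value. The same pattern in plain Java, as a rough sketch only: `Poll` and `waitFor` with its parameters are hypothetical names for illustration and are not part of Selenium.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper illustrating the "wait until true or time out" pattern. */
final class Poll {

    /**
     * @param condition      check to repeat, e.g. "is the element clickable?"
     * @param timeoutMillis  maximum total time to wait
     * @param intervalMillis pause between two checks
     * @return true as soon as the condition holds, false on timeout or interruption
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

Returning a boolean instead of throwing keeps the calling code a simple `if`, which is how `expectBy` is used in `crawlApps`.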

Article link: https://www.haomeiwen.com/subject/vvljcctx.html