WeChat Article Scraping


Author: 秦汉邮侠 | Source: published 2019-08-05 19:44
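// Imports this method needs (WebMagic plus the JDK):
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.selector.Html;

    // Belongs in a class that implements us.codecraft.webmagic.processor.PageProcessor.
    @Override
    public void process(Page page) {
        String rawText = page.getRawText();
        Html html = page.getHtml();

        // Look for a fragment of the form ,s="yyyy-MM-dd"; in the raw page source
        // and keep only the date captured by group 1.
        Pattern datePattern = Pattern.compile(",s=\"(\\d{4}-\\d{2}-\\d{2})\";");
        Matcher matcher = datePattern.matcher(rawText);
        if (matcher.find()) {
            page.putField("date", matcher.group(1));
        }

        String title = html.xpath("//h2[@class='rich_media_title']/text()").toString();
        // The trailing space in the class value is kept from the original selector;
        // an exact @class comparison has to mirror the page's attribute character for character.
        String content = html.xpath("//div[@class='rich_media_content ']").toString();
        page.putField("title", title);
        page.putField("content", content);

        // Collect image URLs from the data-src attribute of each <img> in the article body.
        // Skip this step when the body was not found, since new Html(null) would fail.
        if (content != null) {
            List<String> imageList = new Html(content).xpath("//img/@data-src").all();
            page.putField("images", imageList);
        }
    }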
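
The date-matching step in process() can be tried on its own, without WebMagic. The sketch below is only an illustration: the class name ArticleDateExtractor, the method extractDate, and the sample string are hypothetical and do not come from the original post, and the sample merely imitates the ,s="yyyy-MM-dd"; shape that the regular expression expects.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArticleDateExtractor {
    // Same shape as the pattern in process(): ,s="yyyy-MM-dd";
    // Group 1 captures the date, so no split on quotes is needed.
    private static final Pattern DATE_PATTERN =
            Pattern.compile(",s=\"(\\d{4}-\\d{2}-\\d{2})\";");

    // Returns the first date found in the text, or an empty Optional if none matches.
    public static Optional<String> extractDate(String rawText) {
        Matcher matcher = DATE_PATTERN.matcher(rawText);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample text, written only to exercise the pattern.
        String sample = "var t=\"1564976640\",n=\"1564976640\",s=\"2019-08-05\";";
        System.out.println(extractDate(sample).orElse("not found"));
        System.out.println(extractDate("<html></html>").orElse("not found"));
    }
}
```

Returning an Optional makes the "no match" case explicit, so the caller decides what to do when the page layout changes and the fragment disappears.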


Article link: https://www.haomeiwen.com/subject/ovjvdctx.html