Crawling WeChat Articles

Author: 秦汉邮侠 | Published: 2019-08-05 19:44
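    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import us.codecraft.webmagic.Page;
    import us.codecraft.webmagic.Site;
    import us.codecraft.webmagic.processor.PageProcessor;
    import us.codecraft.webmagic.selector.Html;

    public class WechatArticleProcessor implements PageProcessor {

        // Publish date as it appears in the page source, e.g. ,s="2019-08-05";
        // Group 1 captures only the yyyy-MM-dd part.
        private static final Pattern DATE_PATTERN =
                Pattern.compile(",s=\"(\\d{4}-\\d{2}-\\d{2})\";");

        private final Site site = Site.me().setRetryTimes(3).setSleepTime(1000);

        @Override
        public void process(Page page) {
            String rawText = page.getRawText();
            Html html = page.getHtml();

            // Publish date (the original stored it in an unused local variable)
            Matcher matcher = DATE_PATTERN.matcher(rawText);
            if (matcher.find()) {
                page.putField("publishDate", matcher.group(1));
            }

            // Article title; toString() returns the first match, or null if none
            String title = html.xpath("//h2[@class='rich_media_title']/text()").toString();
            page.putField("title", title == null ? null : title.trim());

            // Article body; the trailing space in the class value is kept from the original XPath
            String content = html.xpath("//div[@class='rich_media_content ']").toString();
            if (content == null) {
                page.setSkip(true); // no article body found, so skip the pipelines
                return;
            }
            page.putField("content", content);

            // Re-parse the body fragment and collect image URLs from the data-src attribute
            List<String> imageList = new Html(content).xpath("//img/@data-src").all();
            page.putField("images", imageList);
        }

        @Override
        public Site getSite() {
            return site;
        }
    }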
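The publish-date step extracts `yyyy-MM-dd` from a fragment such as `,s="2019-08-05";` in the raw page source. The sketch below compares two ways to pull out just the date: a capturing group, and the split-on-quote approach from the post. It runs with the JDK alone. The class name `PublishDateExtractor` and the sample string are hypothetical, and the `,s="..."` fragment reflects the page layout the post targeted, which may since have changed.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PublishDateExtractor {

    // Date pattern with a capturing group around yyyy-MM-dd
    private static final Pattern DATE = Pattern.compile(",s=\"(\\d{4}-\\d{2}-\\d{2})\";");

    // The post's original pattern: the date itself is not captured
    private static final Pattern DATE_NO_GROUP = Pattern.compile("(,s=\")\\d{4}-\\d{2}-\\d{2}(\";)");

    // Capturing-group version: returns only the date, if present
    static Optional<String> extract(String rawText) {
        Matcher m = DATE.matcher(rawText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Split version: take the whole match and split it on the double quote
    static Optional<String> extractBySplit(String rawText) {
        Matcher m = DATE_NO_GROUP.matcher(rawText);
        if (!m.find()) {
            return Optional.empty();
        }
        String[] parts = m.group().split("\""); // {",s=", "2019-08-05", ";"}
        return Optional.of(parts[1]);
    }

    public static void main(String[] args) {
        // Hypothetical fragment of page source; real pages contain much more
        String sample = "var t=\"x\",n=\"y\",s=\"2019-08-05\";";
        System.out.println(extract(sample).orElse("not found"));
        System.out.println(extractBySplit(sample).orElse("not found"));
    }
}
```

Both methods return the same date. The capturing group avoids relying on the position of the quote characters, so it is a little less brittle if the surrounding text changes.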
    
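The image step reads `@data-src` rather than `@src`. WeChat article pages appear to lazy-load images, so the real URL sits in `data-src` while `src` may be empty or a placeholder. The JDK-only sketch below shows which values that selection yields. It deliberately swaps in a plain regular expression for the post's XPath, only so it can run without WebMagic. It handles only double-quoted attributes and is not a substitute for a real HTML parser. The class name `DataSrcDemo` and the example.com URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DataSrcDemo {

    // Rough stand-in for //img/@data-src: a data-src="..." attribute inside an <img ...> tag
    private static final Pattern IMG_DATA_SRC =
            Pattern.compile("<img\\b[^>]*?\\sdata-src=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the data-src value of every <img> tag in the fragment, in document order
    static List<String> imageUrls(String htmlFragment) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_DATA_SRC.matcher(htmlFragment);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical fragment: src is empty, data-src holds the real URL
        String fragment = "<p><img src=\"\" data-src=\"https://example.com/a.png\"></p>"
                + "<p><img data-src=\"https://example.com/b.jpg\" class=\"x\"></p>";
        System.out.println(imageUrls(fragment));
    }
}
```

An `<img>` that only has `src` produces nothing here, which matches how the XPath `//img/@data-src` behaves on such tags.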

Link to this article: https://www.haomeiwen.com/subject/ovjvdctx.html