Crawling WeChat Articles
Author: 秦汉邮侠 | Published:
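2019-08-05 19:44

import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.selector.Html;

// process() method of a WebMagic PageProcessor implementation.
public void process(Page page) {
    String rawText = page.getRawText();
    Html html = page.getHtml();

    // Look for a date embedded in the raw page as ,s="yyyy-MM-dd";
    // and read it from capture group 1 instead of splitting on quotes.
    Pattern p = Pattern.compile(",s=\"(\\d{4}-\\d{2}-\\d{2})\";");
    Matcher matcher = p.matcher(rawText);
    String publishDate = matcher.find() ? matcher.group(1) : null;

    String title = html.xpath("//h2[@class='rich_media_title']/text()").toString();
    // The trailing space in the class value is kept from the original selector.
    String content = html.xpath("//div[@class='rich_media_content ']").toString();

    // Image URLs are read from data-src; guard against a missing content node.
    List<String> imageList = content == null
            ? Collections.<String>emptyList()
            : new Html(content).xpath("//img/@data-src").all();

    // Hand the extracted values to the pipeline instead of printing debug text.
    page.putField("publishDate", publishDate);
    page.putField("title", title);
    page.putField("content", content);
    page.putField("imageList", imageList);
}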
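The date-matching step in process() can be tried on its own, without WebMagic. The following is only a minimal sketch using the standard java.util.regex package: the class name PublishDateExtractor, the method extract, and the sample string are all hypothetical and are not part of the original post.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
final class PublishDateExtractor {
    // The pattern from process(), with the date in a capture group
    // so it can be read with group(1).
    private static final Pattern DATE_PATTERN =
            Pattern.compile(",s=\"(\\d{4}-\\d{2}-\\d{2})\";");

    // Returns the first yyyy-MM-dd date found after ,s=" or empty if none.
    static Optional<String> extract(String rawText) {
        if (rawText == null) {
            return Optional.empty();
        }
        Matcher m = DATE_PATTERN.matcher(rawText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up fragment shaped like the text the pattern expects.
        String sample = "var t=\"1564976640\",s=\"2019-08-05\";";
        System.out.println(extract(sample).orElse("not found"));          // prints 2019-08-05
        System.out.println(extract("<html></html>").orElse("not found")); // prints not found
    }
}
```

Returning an Optional makes the "no date in the page" case explicit, so a caller cannot accidentally use a value that was never set.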
Article link: https://www.haomeiwen.com/subject/ovjvdctx.html