Writing a Web Crawler (node)

Author: heheheyuanqing | Published 2017-09-23 10:30


At a classmate's suggestion, I started learning how to fetch a web page and extract the information on it.

https://book.douban.com/ [Douban Books]

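Getting the book titles

Implementation with the https and cheerio modules

Fetch the entire HTML page with the https module
// send the request with the get method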

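var https = require('https');
var url = 'https://book.douban.com/';

https.get(url, function (res) {
    var html = '';
    res.setEncoding('utf8'); // decode as UTF-8 so multi-byte characters are not split across chunks
    res.on('data', function (data) {
        html += data; // append each chunk as it arrives
    });
    res.on('end', function () {
        console.log(html); // the whole page has been received
    });
}).on('error', function (err) {
    console.log('Failed to fetch the page: ' + err.message);
});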
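
The data/end pattern above can also be wrapped in a Promise so that the caller receives the HTML instead of only logging it. The following is a minimal sketch, not the author's code: the helper names collectBody and fetchHtml are hypothetical, and rejecting on any status code other than 200 is an assumption about how errors should be handled.

```javascript
// Sketch only: collectBody and fetchHtml are hypothetical helper names.
const https = require('https');

// Accumulate all chunks of a readable stream into one string.
function collectBody(stream) {
  return new Promise(function (resolve, reject) {
    let body = '';
    // Decode as UTF-8 so a character split across two chunks is not corrupted.
    stream.setEncoding('utf8');
    stream.on('data', function (chunk) { body += chunk; });
    stream.on('end', function () { resolve(body); });
    stream.on('error', reject);
  });
}

// Request a page over HTTPS and resolve with its HTML.
function fetchHtml(url) {
  return new Promise(function (resolve, reject) {
    https.get(url, function (res) {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('Unexpected status code: ' + res.statusCode));
        return;
      }
      collectBody(res).then(resolve, reject);
    }).on('error', reject);
  });
}
```

A caller could then write fetchHtml('https://book.douban.com/').then(crawleChapter), assuming crawleChapter is the parsing function defined later in this post.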

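Analyze the information to extract

Two of the titles shown on the page are 警察 and 父亲的失乐园.
Inspecting the page source shows that every book title sits in an a tag inside a div whose class is title.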

Parse the fetched HTML with the cheerio module
// wrapped in the crawleChapter function

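var cheerio = require('cheerio');

function crawleChapter(html) {
    var $ = cheerio.load(html);
    var books = $('.title'); // select every element whose class is title
    var data = [];

    // cheerio's each() calls the callback with (index, element)
    books.each(function (index, element) {
        var book = $(element);
        var bookName = book.find('a').text().trim(); // the a tag's text is the book title
        data.push(bookName);
    });
    console.log(data);
    return data;
}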
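
If a matched .title element contains no a tag, its text comes back as an empty string, and the same title may also appear more than once on the page. The sketch below is an optional tidy-up step under those assumptions; cleanTitles is a hypothetical helper and is not part of the original post.

```javascript
// Sketch only: cleanTitles is a hypothetical helper name.
function cleanTitles(titles) {
  const seen = new Set();
  const result = [];
  titles.forEach(function (title) {
    // Collapse runs of whitespace and trim both ends.
    const name = String(title).replace(/\s+/g, ' ').trim();
    // Skip empty strings and titles that were already collected.
    if (name !== '' && !seen.has(name)) {
      seen.add(name);
      result.push(name);
    }
  });
  return result;
}

console.log(cleanTitles(['警察', '  父亲的失乐园 ', '', '警察']));
```

Passing the array produced by crawleChapter through this function leaves only the distinct, non-empty titles, in their original order.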

The superagent module

The superagent module can also be used to talk to the server.

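var superagent = require('superagent');

superagent.get(url).end(function (err, res) {
    if (err) return console.log('Request failed: ' + err.message);
    // once the request succeeds, parse the HTML held in res.text
    crawleChapter(res.text);
});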

Article link: https://www.haomeiwen.com/subject/hmblsxtx.html
