前嗅 ForeSpider Script Tutorial: Channel Scripts

Author: 前嗅大数据 | Published 2019-03-25 10:56

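A channel script is a script set in a channel's configuration. Once a channel script is configured, it takes over the entire collection process for that channel.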

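I. Available global objects (read-only)

EXTRACT: the current collection engine [object type: extractor]

DATADB: the currently connected database [object type: dataBase]

RESULT: the current result set object [object type: result]

II. The this pointer

The current channel node object [object type: channel]

III. Script return value

None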

Example 1: Creating a list of collection sources with a script

1. The following script generates 31 links, from http://xjrb.xjrb.com/xjrb/20141201/index.htm to http://xjrb.xjrb.com/xjrb/20141231/index.htm:

url u;
for(i=1;i<=31;i++)
{
    u.entryid = this.id; // channel ID
    u.tmplid = 1; // template ID
    u.urlname = "http://xjrb.xjrb.com/xjrb/201412" + i.Dim(2) + "/index.htm"; // link URL
    u.title = "test";
    RESULT.AddLink(u); // add to the final result
}
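As a cross-check outside ForeSpider, the 31 URLs this loop is meant to produce can be listed in plain Python. This is only a sketch, not ForeSpider script: it assumes that `i.Dim(2)` zero-pads the day to two digits, which the first URL in the stated range (`.../20141201/...`) suggests, and the function name `build_december_urls` is made up for illustration.

```python
def build_december_urls():
    """Return the 31 daily index URLs for December 2014."""
    base = "http://xjrb.xjrb.com/xjrb/201412"
    # f"{day:02d}" zero-pads the day, the assumed effect of i.Dim(2)
    return [f"{base}{day:02d}/index.htm" for day in range(1, 32)]


if __name__ == "__main__":
    urls = build_december_urls()
    print(len(urls))
    print(urls[0])
    print(urls[-1])
```

Comparing this list with the links the channel actually collects is a quick way to spot a padding or off-by-one mistake in the loop bounds.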

2. The following script generates one link per day for ten days, counting back from the current date:

url u;
time t1;
for(i=0;i<10;i++)
{
    u.title = "test"; // link title
    u.entryid = this.id; // channel ID
    u.tmplid = 1; // template ID
    pre = t1.Preday(i); // count back i days
    u.urlname = "http://www.cdrb.com.cn/html/" + pre.year + "-" + pre.month + "/" + pre.day + "/content_2155799.htm"; // link URL
    RESULT.AddLink(u); // add to the final result
}
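The date arithmetic in this loop can be sketched in plain Python for comparison. This is not ForeSpider script, and it rests on two assumptions the article does not confirm: that `Preday(0)` is the current day itself, and that month and day are zero-padded in the URL. The function name `build_recent_urls` is hypothetical.

```python
from datetime import date, timedelta


def build_recent_urls(today, days=10):
    """Return one URL per day, starting at `today` and stepping back."""
    urls = []
    for i in range(days):
        pre = today - timedelta(days=i)  # i = 0 is `today` itself
        # Zero-padding of month and day is an assumption of this sketch
        urls.append(
            f"http://www.cdrb.com.cn/html/{pre.year}-{pre.month:02d}/"
            f"{pre.day:02d}/content_2155799.htm"
        )
    return urls


if __name__ == "__main__":
    for url in build_recent_urls(date.today()):
        print(url)
```

Passing the start date as a parameter keeps the function easy to check against a known date, including ranges that cross a month boundary.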

3. The following script builds links from keywords:

url u;var keys=["前嗅","爬虫"];for(i=0;i

Example 2: Collecting data with a script

1. The following script finds a table and extracts its data:

gdoc = EXTRACT.OpenDoc(this,"http://gk.sjtu.edu.cn/index.php/list/fellow/2015-10-30-15-02-59/241-2015-11-18-02-21-01",0);
if(gdoc)
{
    dm = gdoc.GetDom();
    record rec;
    if(dm)
    {
        tab = dm.FindName("table");
        if(tab)
        {
            tr = dm.FindName("tr", tab);
            while(tr)
            {
                name = dm.FindName("td", tr);
                if(name)
                { // data found
                    posd = 0; corp = 0; fund = 0;
                    rec.name = dm.GetTextAll(name); // name
                    posd = name.next;
                    if(posd)
                    {
                        corp = posd.next;
                        rec.position = dm.GetTextAll(posd);
                    }
                    if(corp)
                    {
                        fund = corp.next;
                        rec.company = dm.GetTextAll(corp);
                    }
                    if(fund)
                    {
                        rec.fund = dm.GetTextAll(fund);
                    }
                    RESULT.AddRec(rec,3);
                }
                tr = tr.next;
            }
        }
    }
    EXTRACT.CloseDoc(gdoc);
}
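The row-by-row walk above (find each `tr`, take its `td` cells in order, map them to name, position, company, and fund) can be sketched in plain Python with the standard library's `html.parser`. This is an illustration of the same idea, not ForeSpider code; the class `TableRows`, the function `extract_records`, and the `FIELDS` tuple are names invented for this sketch.

```python
from html.parser import HTMLParser

# Column order taken from the record fields used in the script above
FIELDS = ("name", "position", "company", "fund")


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any <td>, such as header rows
                self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Map each data row to a dict; short rows simply yield fewer fields."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return [dict(zip(FIELDS, row)) for row in parser.rows]
```

As in the script, a row with fewer than four cells still produces a record, with only the fields that are present.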

2. The following script requests JSON data from a server and stores it in a record:

gdoc = EXTRACT.OpenDoc(this,"http://www.w3school.com.cn//example/jquery/demo_ajax_json.js",0);
if(gdoc)
{
    jScript js;
    record rec;
    data = js.RunJson(gdoc.GetDom().GetSource());
    rec.name = data.firstName;
    rec.family = data.lastName;
    rec.age = data.age;
    schea = EXTRACT.GetSchema("schemaName"); // get the form ID
    if(schea) sId = schea.id;
    else sId = 1;
    RESULT.AddRec(rec,sId);
    EXTRACT.CloseDoc(gdoc);
}
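The field mapping in this script (firstName to name, lastName to family, age to age) can be sketched in plain Python as a reference for what the stored record should contain. This is not ForeSpider code; `json_to_record` is a hypothetical helper, and the sample payload is invented because the article does not show the actual response body.

```python
import json


def json_to_record(source):
    """Parse a JSON document and map its fields to record columns."""
    data = json.loads(source)
    return {
        "name": data["firstName"],
        "family": data["lastName"],
        "age": data["age"],
    }


if __name__ == "__main__":
    # Hypothetical payload; the real response body is not shown in the article
    sample = '{"firstName": "Ada", "lastName": "Lovelace", "age": 36}'
    print(json_to_record(sample))
```

Indexing with `data["firstName"]` raises a `KeyError` when a field is missing, which makes a changed response format visible immediately rather than storing an empty value.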

Article link: https://www.haomeiwen.com/subject/mnfvvqtx.html
