Scenario: intercepting a certain application's network packets
Responses from two kinds of requests need to be captured:
1. The list request (its URL is not recorded here; it is the request containing group-correlations that the Fiddler script below matches). It returns all correlation pairs in list view, including the quadruple information and the index information.
2. https://xxxx.xxxx.com/static/pc/api/v1/document/1594/html_segment?entity_type={type of the identified quadruple's original source}&entity_index={id of the specific quadruple}
This request returns a quadruple's original source in the document; this capture only takes sources with type=paragraph.
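For reference, the entity_type and entity_index parameters can be pulled out of such a URL's query string with the standard library. A minimal sketch (the URL below is a made-up placeholder following the html_segment pattern above):

```python
from urllib.parse import urlparse, parse_qs

# placeholder URL following the html_segment pattern above
url = ("https://xxxx.xxxx.com/static/pc/api/v1/document/1594/html_segment"
       "?entity_type=PARAGRAPH&entity_index=123")
params = parse_qs(urlparse(url).query)
entity_type = params["entity_type"][0]
entity_index = params["entity_index"][0]
print(entity_type, entity_index)  # PARAGRAPH 123
```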
Difficulties with pure page scraping:
1. The page does not fully display the quadruples.
2. The correlated entries have no fixed format.
Difficulties with Python scraping:
1. Due to network restrictions, requests could never get through.
2. After setting up a mitmproxy proxy, it was blocked by all kinds of exceptions.
Final approach:
1. Use Fiddler as a man-in-the-middle proxy between the client and the server, and use JScript.NET to write each response body to a local file as it arrives.
2. Use Python to read the locally saved packets (JSON strings) and parse them. (The parsing could also be done in the JScript.NET script, but since I am not familiar with it I chose Python.)
3. Design a UiPath workflow that keeps triggering the page to send requests until all correlation-pair requests have been covered, running the Python script once on each page turn.
Notes:
1. Fiddler's proxy setup is flexible: it starts proxying as soon as it is launched. The packet-saving code goes into the OnBeforeResponse method of the CustomRules.js script opened via Rules > Customize Rules, as follows:
In CustomRules.js:
static function OnBeforeResponse(oSession: Session) {
    if (m_Hide304s && oSession.responseCode == 304) {
        oSession["ui-hide"] = "true";
    }
    // List request: dump the whole group-correlations response to json.txt
    if (oSession.HostnameIs("autodoc_mvip.paodingai.com") && oSession.url.Contains("group-correlations")) {
        var jsonString = oSession.GetResponseBodyAsString();
        var responseJSON = Fiddler.WebFormats.JSON.JsonDecode(jsonString);
        //FiddlerApplication.Log.LogString(" message from OnBeforeResponse: " + responseJSON.JSONObject["data"].JSONObject["data"].length);
        // Save the file locally
        var fso;
        var file;
        fso = new ActiveXObject("Scripting.FileSystemObject");
        // 2 = ForWriting, true = create if missing, -2 = system default encoding
        file = fso.OpenTextFile("D:\\Users\\{my name}\\data\\json.txt", 2, true, -2);
        file.writeLine(jsonString);
        file.writeLine("\n");
        file.close();
    }
    // Segment request: save each PARAGRAPH html_segment response as <entity_id>.txt
    if (oSession.HostnameIs("autodoc_mvip.paodingai.com") && oSession.url.Contains("html_segment") && oSession.url.Contains("PARAGRAPH")) {
        var requestString = oSession.url;
        FiddlerApplication.Log.LogString("Intercepted request: " + requestString);
        // entity_index is the second query parameter: ...&entity_index=<id>
        var vars = requestString.split("&");
        var entity_id = vars[1].split("=")[1];
        FiddlerApplication.Log.LogString("Intercepted packet for entity_id=" + entity_id);
        var jsonString = oSession.GetResponseBodyAsString();
        // Save the file locally
        var fso;
        var file;
        fso = new ActiveXObject("Scripting.FileSystemObject");
        file = fso.OpenTextFile("D:\\Users\\{my name}\\data\\" + entity_id + ".txt", 2, true, -2);
        file.writeLine(jsonString);
        file.writeLine("\n");
        file.close();
    }
}
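The hook above leaves captures in a flat layout that the Python step later reads back: one json.txt holding the latest group-correlations response, plus one <entity_id>.txt per paragraph segment. A sketch of that file contract, with made-up payloads (only the file names come from the scripts; the JSON bodies are placeholders):

```python
import json
import os
import tempfile

def write_capture(data_dir, list_payload, segments):
    # json.txt holds the list response; each segment goes to <entity_id>.txt
    os.makedirs(data_dir, exist_ok=True)
    with open(os.path.join(data_dir, "json.txt"), "w", encoding="utf-8") as f:
        f.write(json.dumps(list_payload))
    for entity_id, seg in segments.items():
        with open(os.path.join(data_dir, str(entity_id) + ".txt"), "w", encoding="utf-8") as f:
            f.write(json.dumps(seg))

data_dir = tempfile.mkdtemp()
write_capture(data_dir,
              {"data": {"data": {}}},
              {123: {"data": {"entity": "sample paragraph text"}}})
print(sorted(os.listdir(data_dir)))  # ['123.txt', 'json.txt']
```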
2. A very simple Python parsing script; the logic is based entirely on my reading of the returned packet format:
extract.py
import json
import csv

def formateQuadruple(q, t):
    # Formulas are skipped; q is just the sentinel string in that case
    if t == 'formula (skipped)':
        return q
    output = ""
    # attributes
    attributes = "attributes:"
    isFirst = True
    try:
        for a in q['attributes']:
            if isFirst:
                attributes += a['text']
                isFirst = False
            else:
                attributes += ',' + a['text']
    except Exception:
        print('no attributes detected')
    output += attributes
    # preattributes / head_attributes
    if t == 'paragraph':
        preattributes = "preattributes:"
        isFirst = True
        try:
            for a in q['preattributes']:
                if isFirst:
                    preattributes += a['text']
                    isFirst = False
                else:
                    preattributes += ',' + a['text']
        except Exception:
            print('no preattributes detected')
        output += " " + preattributes
    if t == 'table':
        head_attributes = "head_attributes:"
        isFirst = True
        try:
            for a in q['head_attributes']:
                if isFirst:
                    head_attributes += a['text']
                    isFirst = False
                else:
                    head_attributes += ',' + a['text']
        except Exception:
            print('no head_attributes detected')
        output += " " + head_attributes
    # value
    value = "value:"
    isFirst = True
    try:
        for a in q['value']:
            if isFirst:
                value += a['text']
                isFirst = False
            else:
                value += ',' + a['text']
    except Exception:
        print('no value detected')
    output += " " + value
    # time
    time = "time:"
    isFirst = True
    try:
        for a in q['time']:
            if isFirst:
                time += a['text']
                isFirst = False
            else:
                time += ',' + a['text']
    except Exception:
        print('no time detected')
    output += " " + time
    return output

def findRawContent(entity_id):
    content = "not captured"
    try:
        data = open("./data/" + str(entity_id) + ".txt", "r").read()
        obj = json.loads(data)
        content = obj['data']['entity']
    except Exception:
        print('no txt file captured for entity=' + str(entity_id))
    return content

if __name__ == '__main__':
    print('begin to extract data packages from current page...')
    data = open("./data/json.txt", "r").read()
    f = open('q.csv', 'a+', encoding='gbk', newline='')
    csv_writer = csv.writer(f)
    obj = json.loads(data)
    # all correlation pairs on the current page
    keys = list(obj['data']['data'].keys())
    for key in keys:
        # iterate the correlations within one pair; main_correlation_item is the same in each of them
        for relation in obj['data']['data'][key]:
            main_entity = relation['main_correlation_item']['data']['entity']
            main_page = relation['main_correlation_item']['page']
            main_q = relation['main_correlation_item']['data']['entity']['quadruple']
            main_origin_content = ""
            if main_entity['type'] == 'paragraph':
                main_origin_content = findRawContent(main_entity['id'])
            else:
                main_origin_content = "original source not captured for tables/formulas"
            matching_degree = relation['matching_degree']
            # multiple correlation_items mean a formula; skipped for now
            if len(relation['correlation_items']) > 1:
                print('formula, skipped')
                correlate_entity = {}
                correlate_entity['id'] = 'formula (skipped)'
                correlate_entity['type'] = 'formula (skipped)'
                correlate_page = 'formula (skipped)'
                correlate_q = 'formula (skipped)'
                correlate_origin_content = 'formula (skipped)'
            else:
                if len(relation['correlation_items']) == 1:
                    correlate_entity = relation['correlation_items'][0]['data']['entity']
                    correlate_page = relation['correlation_items'][0]['page']
                    correlate_q = relation['correlation_items'][0]['data']['entity']['quadruple']
                    correlate_origin_content = ""
                    if correlate_entity['type'] == 'paragraph':
                        correlate_origin_content = findRawContent(correlate_entity['id'])
                    else:
                        correlate_origin_content = "original source not captured for tables/formulas"
                else:
                    correlate_entity = {}
                    correlate_entity['id'] = 'no correlation'
                    correlate_entity['type'] = 'no correlation'
                    correlate_page = 'no correlation'
                    correlate_q = 'no correlation'
                    correlate_origin_content = 'no correlation'
            csv_writer.writerow([main_entity['id'], main_entity['type'],
                                 formateQuadruple(main_q, main_entity['type']),
                                 main_page,
                                 main_origin_content,
                                 matching_degree,
                                 correlate_entity['id'], correlate_entity['type'],
                                 formateQuadruple(correlate_q, correlate_entity['type']),
                                 correlate_page,
                                 correlate_origin_content])
    f.close()
    print('done.')
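To illustrate what the quadruple formatter produces, here is a self-contained rewrite of its inner loop applied to a made-up quadruple (the field names mirror the script above; all values are invented):

```python
def format_field(name, items):
    # join the 'text' of every entry under one field, comma-separated
    return name + ":" + ",".join(a["text"] for a in items)

quadruple = {
    "attributes": [{"text": "net profit"}, {"text": "consolidated"}],
    "value": [{"text": "1,234"}],
    "time": [{"text": "2020"}],
}
row = " ".join(format_field(k, quadruple[k]) for k in ("attributes", "value", "time"))
print(row)  # attributes:net profit,consolidated value:1,234 time:2020
```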
3. UiPath workflow file: (start the Fiddler proxy before running; during execution it clicks each correlation pair one by one, and calls the Python script above once before each page turn)