Errors
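1. UnicodeEncodeError: 'gbk' codec can't encode character '\xXX' in position XX
Cause: an encoding problem. The console's default encoding (gbk) cannot represent the character being printed.
Fix: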
.py
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')  # change the default encoding of standard output
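A minimal sketch of why switching to gb18030 helps. The character in the error above is elided, so this assumes a non-breaking space ('\xa0'), which often shows up in scraped HTML. gbk has no mapping for it, while gb18030 covers all of Unicode. The helper name can_encode is only illustrative.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# '\xa0' (non-breaking space) is an assumed example character
sample = "price:\xa0100"

gbk_ok = can_encode(sample, "gbk")
gb18030_ok = can_encode(sample, "gb18030")
print("gbk:", gbk_ok, "gb18030:", gb18030_ok)
```

If rewrapping sys.stdout is not an option, checking the text this way before printing at least shows which strings would trigger the error.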
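2. elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x00000000039C9828>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x00000000039C9828>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it)
Cause: the problem is on the database side. Nothing accepted the connection at the configured address, so the connection failed (for example, the Elasticsearch service is not running or the host/port is wrong).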
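Before digging into the client code, it can help to check whether anything is listening on the server port at all. The sketch below uses only the standard socket module. The name port_is_open is illustrative, and localhost:9200 is an assumption (9200 is the default Elasticsearch HTTP port); substitute the address your client is configured with.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (WinError 10061 on Windows) and timeouts
        return False


if __name__ == "__main__":
    # Assumed address; use the host and port passed to your Elasticsearch client
    if not port_is_open("localhost", 9200):
        print("Elasticsearch is not reachable: start the service or check host/port")
```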
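3. IndexError: list index out of range
Cause: a problem with the list. The code indexes a position that does not exist, often because the list is empty. See https://blog.csdn.net/huacode/article/details/79759205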
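In scraping code this error tends to come from taking [0] of a result list that turned out to be empty. A minimal sketch of guarding against that; the helper name first_or_default is hypothetical.

```python
def first_or_default(items, default=None):
    """Return the first element, or default when the list is empty."""
    return items[0] if items else default


matches = []  # e.g. a search that found nothing
year = first_or_default(matches, default="")

# Equivalent form with try/except
try:
    year_alt = matches[0]
except IndexError:
    year_alt = ""
```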
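4. If a crawl stops partway through because of an exception, wrap the failing step in try/except so the program skips the bad item and keeps running.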
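A minimal sketch of that pattern. The names crawl_all and fetch are placeholders for whatever function downloads and parses a single page in your crawler.

```python
def crawl_all(urls, fetch):
    """Call fetch(url) for every URL, skipping the ones that raise."""
    results = []
    failed = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:
            # Broad on purpose: one bad page should not stop the whole crawl
            failed.append((url, exc))
            continue
    return results, failed
```

Collecting the failed URLs instead of silently dropping them makes it possible to retry or inspect them afterwards.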
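5. SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Cause: an ordinary string literal contains \U (typically a Windows path such as "C:\Users\..."), which Python reads as the start of a Unicode escape.
Fix: use a raw string (r"..."), double the backslashes, or use forward slashes. See https://www.cnblogs.com/hfdkd/p/7902530.html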
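A short sketch of the usual ways around this error, assuming it comes from a Windows path written as a normal string literal. The path itself is made up for illustration.

```python
# Raw string: backslashes stay literal, so \U is not parsed as an escape
path_raw = r"C:\Users\name\data.txt"
# Same path with doubled backslashes, or with forward slashes
path_escaped = "C:\\Users\\name\\data.txt"
path_forward = "C:/Users/name/data.txt"


def compiles(source):
    """Return True if the given source text is valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Source text  p = "C:\Users"  reproduces the error; the raw form does not
broken_source = 'p = "C:\\Users"'
fixed_source = 'p = r"C:\\Users"'
print(compiles(broken_source), compiles(fixed_source))
```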
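6. _compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object
https://www.cnblogs.com/areyouready/p/9032251.html
Note: the argument passed to findall() must be decoded first with decode('utf-8'), so that a str pattern is matched against str and not bytes.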
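import re
infor = p.text.strip().encode('utf-8').decode('utf-8')  # round-trip through utf-8 here so that looking up the index of '主演' below does not raise an error
ya = re.findall(r'[0-9]+.*\/?', infor)[0]  # regex extracts the year and region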
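The TypeError can be reproduced and avoided in isolation as sketched below. The sample bytes and the helper name find_numbers are made up for illustration; the point is only that a str pattern needs str input, so bytes are decoded before calling findall().

```python
import re

# Made-up sample standing in for undecoded bytes such as a raw response body
raw = "1994 / USA / Crime Drama".encode("utf-8")


def find_numbers(data):
    """Return all digit runs; accepts either bytes or str."""
    if isinstance(data, bytes):
        data = data.decode("utf-8")  # a str pattern needs str input
    return re.findall(r"[0-9]+", data)


try:
    re.findall(r"[0-9]+", raw)  # str pattern on bytes
    raised = False
except TypeError:
    raised = True

print(raised, find_numbers(raw))
```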