Three Thousand Garbled Characters, Take Only the Chinese

Author: torrent_lsl | Published: 2019-01-24 19:02


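弱水三千，只取一瓢饮 (Of three thousand waters, I take only one ladle to drink)
繁华三千，只为一人饮尽悲欢 (Of three thousand splendors, I drink down every joy and sorrow for one person alone)

Though the waters run three thousand strong, I take only one ladle to drink.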

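Extracting Chinese from garbled text: if you have a pile of strings and only want the Chinese characters in them, Python's re module can do the job.

Here is the code: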

# Python 3
import re

# Sample input: HTML markup with Chinese characters scattered through it.
html = """<div class="ie-fix"><p class="reader-word-layer reader-liusil
word-s1-0" style="width:42px;heightasd:181px;line-heigx;left:3285px;z-index:2;font-family:simsun;"> </p><p class="reader-word-layer reader-word-s1-0" style="width:42px;height:181px;line-height:181px;top:1376px;left:3370px;z-index:3;font-family:simsun;"> </p><p class="reader-word-layer reader-word-s1-0" style="width:42px;height:181px;line-height:181px;top:1376px;left:3454px;z-inde弱水x:4;font-family:simsun;三"> 
</p><p 千class="reader-word-layer reader-word-s1-3" style="width:72px;height:312只取一px;line-t:3551px;z-index:5;fo瓢nt-family:'Times New Roman Bold','7e4b9f2a59010饮20207409c940020001','Times New Roman Bold';font-family:simsun;"> </p><p class="reader-word繁华三-layer reader-word-s1-3 reader-word-s1-4" style="width:2621px;height:312px;line-height:312千px;top:1272p;false"></p><p class="reader只为一人饮-word-layer reader-word-s1-0" style="width:42px;height:181px;line-height:181px;top:13尽悲欢76px;left:6306px;z-index:7;font-family:simsun;"> 
"""

# [\u4e00-\u9fa5]+ matches each run of one or more Chinese characters.
regex = re.compile(r"[\u4e00-\u9fa5]+")
result = regex.findall(html)    # list of runs, in the order they appear
chinese_str = "".join(result)   # glue the runs back together
print(chinese_str)

Running the code prints:

弱水三千只取一瓢饮繁华三千只为一人饮尽悲欢
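
The same idea can be wrapped in a small reusable helper. The following is only a minimal sketch: the names `extract_chinese`, `chinese_runs`, and the `sep` parameter are made up for illustration and do not come from the original post. It uses the same `\u4e00-\u9fa5` range as the code above.

```python
import re

# Same character range as the article's pattern.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fa5]+")


def chinese_runs(text):
    """Return each run of Chinese characters as a separate list item."""
    return CHINESE_RUN.findall(text)


def extract_chinese(text, sep=""):
    """Return all Chinese runs in text, joined by sep (hypothetical helper)."""
    return sep.join(chinese_runs(text))


sample = '<p class="a弱水b" style="x:三千">只取一瓢饮</p>'
print(extract_chinese(sample))       # 弱水三千只取一瓢饮
print(extract_chinese(sample, "|"))  # 弱水|三千|只取一瓢饮
```

Passing a separator keeps the boundaries between fragments visible, which may help when the fragments come from unrelated parts of the markup.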

Commonly used Unicode ranges for English, Chinese, Japanese, and Korean text are as follows:

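# Python 3: the ur"" prefix exists only in Python 2, so plain raw strings are used.
# The stray leading/trailing "." is dropped so each pattern matches only its own script.
epre = re.compile(r"[\s\w]+", re.ASCII)  # English: ASCII word characters and whitespace
chre = re.compile(r"[\u4E00-\u9FA5]+")  # Chinese: CJK Unified Ideographs
jpre = re.compile(r"[\u3040-\u30FF\u31F0-\u31FF]+")  # Japanese: hiragana and katakana
hgre = re.compile(r"[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]+")  # Korean: Hangul jamo and syllables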
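
As a rough illustration of how the ranges listed above might be used together, the sketch below splits a mixed string by script. The table `SCRIPT_PATTERNS` and the function `split_by_script` are hypothetical names, not part of the original post, and the English pattern is simplified to `[A-Za-z]+` (an assumption made here so that whitespace and digits are not captured).

```python
import re

# Hypothetical lookup table; the CJK, kana, and Hangul ranges are the ones
# listed in the article. The English pattern is a simplification.
SCRIPT_PATTERNS = {
    "english": re.compile(r"[A-Za-z]+"),
    "chinese": re.compile(r"[\u4E00-\u9FA5]+"),
    "japanese": re.compile(r"[\u3040-\u30FF\u31F0-\u31FF]+"),
    "korean": re.compile(r"[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]+"),
}


def split_by_script(text):
    """Return a dict mapping each script name to the runs found in text."""
    return {name: pattern.findall(text) for name, pattern in SCRIPT_PATTERNS.items()}


mixed = "Hello 你好 こんにちは 안녕하세요"
result = split_by_script(mixed)
for name, runs in result.items():
    print(name, runs)
```

Note that the Japanese range covers only kana, so kanji in Japanese text fall inside the Chinese range and would be reported under "chinese".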


Article link: https://www.haomeiwen.com/subject/jduujqtx.html
