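Of three thousand rivers, I take just one ladleful to drink (弱水三千,只取一瓢饮)
Of three thousand splendors, I drain joy and sorrow for one person alone (繁华三千,只为一人饮尽悲欢)
Let the river run three thousand miles; I take only one ladleful to drink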
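Extracting Chinese from garbled text: if you have a pile of strings and only want the Chinese characters in them, Python's re module can do it.
Straight to the code: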
# -*- coding:utf-8 -*-
import re

# Sample input: HTML debris with Chinese characters scattered through it.
text = """<div class="ie-fix"><p class="reader-word-layer reader-liusil
word-s1-0" style="width:42px;heightasd:181px;line-heigx;left:3285px;z-index:2;font-family:simsun;"> </p><p class="reader-word-layer reader-word-s1-0" style="width:42px;height:181px;line-height:181px;top:1376px;left:3370px;z-index:3;font-family:simsun;"> </p><p class="reader-word-layer reader-word-s1-0" style="width:42px;height:181px;line-height:181px;top:1376px;left:3454px;z-inde弱水x:4;font-family:simsun;三">
</p><p 千class="reader-word-layer reader-word-s1-3" style="width:72px;height:312只取一px;line-t:3551px;z-index:5;fo瓢nt-family:'Times New Roman Bold','7e4b9f2a59010饮20207409c940020001','Times New Roman Bold';font-family:simsun;"> </p><p class="reader-word繁华三-layer reader-word-s1-3 reader-word-s1-4" style="width:2621px;height:312px;line-height:312千px;top:1272p;false"></p><p class="reader只为一人饮-word-layer reader-word-s1-0" style="width:42px;height:181px;line-height:181px;top:13尽悲欢76px;left:6306px;z-index:7;font-family:simsun;">
"""

# One or more consecutive characters in the common Chinese range U+4E00 to U+9FA5.
pattern = r"[\u4e00-\u9fa5]+"
regex = re.compile(pattern)
result = regex.findall(text)  # list of Chinese runs, in order of appearance
china_str = "".join(result)   # concatenate the runs into a single string
print(china_str)
Running the code prints:
弱水三千只取一瓢饮繁华三千只为一人饮尽悲欢
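The same idea can be wrapped in a small reusable function. This is only a minimal sketch for Python 3: the names `extract_chinese` and `CHINESE_RUN` and the `sep` parameter are assumptions made for illustration and are not part of the original code.

```python
import re

# Same range as in the script above: U+4E00 to U+9FA5.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fa5]+")


def extract_chinese(text, sep=""):
    """Return every run of Chinese characters in text, joined by sep."""
    return sep.join(CHINESE_RUN.findall(text))


# Hypothetical sample input, much shorter than the HTML above.
sample = '<p class="a弱水">三千</p> only 取一瓢 text'
print(extract_chinese(sample))           # 弱水三千取一瓢
print(extract_chinese(sample, sep="|"))  # 弱水|三千|取一瓢
```

Passing a non-empty `sep` keeps the boundaries between runs visible, which can help when checking where each fragment came from.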
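For English, Chinese, Japanese, and Korean, commonly used Unicode character ranges are listed below. The patterns are written for Python 3, where `\w` also matches CJK characters unless `re.ASCII` is set.
- English: epre = re.compile(r"[\s\w]+", re.ASCII)
- Chinese: chre = re.compile(r"[\u4E00-\u9FA5]+")
- Japanese: jpre = re.compile(r"[\u3040-\u30FF\u31F0-\u31FF]+")
- Korean: hgre = re.compile(r"[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]+")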
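To see these ranges in use, here is a hedged sketch for Python 3 that pulls the runs of each script out of a mixed string. The `SCRIPT_PATTERNS` table and the `split_by_script` function are hypothetical names introduced for this example. The character ranges are the ones listed above, and `re.ASCII` is an added assumption so that the English pattern does not also match CJK characters.

```python
import re

# Hypothetical lookup table built from the ranges listed above.
SCRIPT_PATTERNS = {
    "english": re.compile(r"[\s\w]+", re.ASCII),
    "chinese": re.compile(r"[\u4E00-\u9FA5]+"),
    "japanese": re.compile(r"[\u3040-\u30FF\u31F0-\u31FF]+"),
    "korean": re.compile(r"[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]+"),
}


def split_by_script(text):
    """Map each script name to the non-blank runs of that script found in text."""
    found = {}
    for name, pattern in SCRIPT_PATTERNS.items():
        runs = [run.strip() for run in pattern.findall(text)]
        # The English pattern also matches bare whitespace, so drop empty runs.
        found[name] = [run for run in runs if run]
    return found


mixed = "Hello world 弱水三千 こんにちは カタカナ 한국어"
for script, runs in split_by_script(mixed).items():
    print(script, runs)
```

Note that the Chinese range does not cover Japanese kana, but kanji used in Japanese text fall inside it, so these ranges separate scripts rather than languages.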
Hope you enjoy your life
Mr. Hamster