A First Look at Python's pyttsx3 Module, with a Hands-On Project (Scrape a Novel, Then Read It Aloud)

Author: 也然君 | Published: 2018-11-15 22:59 | Read 11,240 times

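Hands-on project: scrape a chapter from a novel website with urllib, then read it aloud.
Module used: pyttsx3 (install with pip install pyttsx3).
Environment: PyCharm + Python 3.6.3.
Before starting, set up your environment and confirm that the machine can actually produce speech. If the hardware has no working audio output, nothing below will be audible.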

# Use the pywin32 package to call Microsoft's speech API (SAPI); Windows only
# Install with: pip3 install pywin32
import win32com.client
# Create the SAPI voice COM object
speaker = win32com.client.Dispatch("SAPI.SpVoice")
speaker.Speak("你好，小姐姐，能加个微信吗？")

Run it and you should hear the sentence spoken aloud.
Next, a quick look at how the pyttsx3 module is used.

import pyttsx3
engine = pyttsx3.init()
engine.say('君不见，黄河之水天上来，奔流到海不复回。')
engine.say('君不见，高堂明镜悲白发，朝如青丝暮成雪。')
# Process the queued utterances and block until speech finishes
engine.runAndWait()

Now let's look at the source of pyttsx3.init():

[Image: screenshot of the pyttsx3.init() source]
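Judging from the method signature, the first parameter names the speech driver, which is closely tied to the underlying operating system:
driverName: loaded by the pyttsx3.driver module according to the operating system; if omitted, the best driver available on the current platform is used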
sapi5 - SAPI5 on Windows
nsss - NSSpeechSynthesizer on Mac OS X
espeak - eSpeak on every other platform
debug: the second parameter controls whether debug output is enabled; setting it to True during development is recommended
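
The driver list above can be expressed as a small lookup. This is a minimal sketch, not the library's own code: `default_driver_name` is a hypothetical helper that simply mirrors the three cases listed, keyed on the value of `sys.platform`.

```python
import sys


def default_driver_name(platform=None):
    """Return the driver name from the list above for a sys.platform value.

    Hypothetical helper for illustration only; pyttsx3 chooses its own
    default when init() is called without a driver name.
    """
    if platform is None:
        platform = sys.platform
    if platform == "win32":
        return "sapi5"
    if platform == "darwin":
        return "nsss"
    # Every other platform falls back to eSpeak in the list above.
    return "espeak"
```

The result could be passed explicitly, for example pyttsx3.init(default_driver_name()), although calling pyttsx3.init() with no arguments should select the same driver.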
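Next, some common pyttsx3 operations.
getProperty # get the current value of a property on the engine instance
setProperty # set the value of a property on the engine instance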

# Change the voice (timbre)
import pyttsx3
engine = pyttsx3.init()
voices = engine.getProperty('voices')
print(len(voices))
for voice in voices:
    engine.setProperty('voice', voice.id)
    engine.say('I will always love you ')
    engine.runAndWait()

The printed length of voices is 2, meaning this system ships with two voices by default.
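
Rather than cycling through every voice, it can be handy to pick one by name. The sketch below is one possible approach, assuming each voice object exposes `id` and `name` attributes as pyttsx3 voices do. `pick_voice_id` is a hypothetical helper, and the sample voices are made-up stand-ins so the function can be tried without a speech engine.

```python
from types import SimpleNamespace


def pick_voice_id(voices, keyword, default=None):
    """Return the id of the first voice whose name contains keyword."""
    needle = keyword.lower()
    for voice in voices:
        name = getattr(voice, "name", "") or ""
        if needle in name.lower():
            return voice.id
    return default


# Stand-in voice objects; the ids and names are invented for illustration.
sample_voices = [
    SimpleNamespace(id="voice-0", name="Example Chinese Voice"),
    SimpleNamespace(id="voice-1", name="Example English Voice"),
]
print(pick_voice_id(sample_voices, "english"))  # prints voice-1
```

With a real engine, the list would come from engine.getProperty('voices'), and the returned id would be passed to engine.setProperty('voice', ...) after checking that it is not None.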

# Control the speaking rate (words per minute)
import pyttsx3
engine = pyttsx3.init()
rate = engine.getProperty('rate')
engine.setProperty('rate', rate-150)
engine.say('I will always love you')
engine.runAndWait()

Run this and listen: doesn't it sound a bit like a sloth? Haha.

# Control the volume (a float from 0.0 to 1.0)
import pyttsx3
engine = pyttsx3.init()
volume = engine.getProperty('volume')
# engine.setProperty('volume', volume-0.25)  # the difference is barely audible
engine.setProperty('volume', volume-0.75)
engine.say('I will always love you')
engine.runAndWait()
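
The rate and volume snippets above add or subtract a fixed amount from the current value, which can push a setting out of its sensible range. The helpers below are a minimal sketch of clamping the result first. The names are hypothetical, the 0.0 to 1.0 bounds follow the documented volume range, and the minimum rate of 50 is an arbitrary floor chosen for illustration.

```python
def adjust_rate(current, delta, minimum=50):
    """Return current + delta, but never below minimum (an assumed floor)."""
    return max(minimum, current + delta)


def adjust_volume(current, delta):
    """Return current + delta clamped to the 0.0-1.0 volume range."""
    return min(1.0, max(0.0, current + delta))


print(adjust_rate(200, -150))     # prints 50
print(adjust_volume(1.0, -0.75))  # prints 0.25
print(adjust_volume(1.0, 0.25))   # prints 1.0
```

The returned values can then be handed to engine.setProperty('rate', ...) and engine.setProperty('volume', ...).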

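With those pieces in place, let's put them together and read the contents of a txt file aloud.
Here is the code; if file handling is unfamiliar, look it up first.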

import pyttsx3
# The file is small, so read it in one go
with open('2.txt', 'r', encoding='utf-8') as f:
    text = f.read()
engine = pyttsx3.init()
# Slow the speaking rate down a little
rate = engine.getProperty('rate')
engine.setProperty('rate', rate - 50)
# Raise the volume, capped at the 1.0 maximum
volume = engine.getProperty('volume')
engine.setProperty('volume', min(1.0, volume + 0.25))
engine.say(text)
engine.runAndWait()

The text being read is Li Bai's 《将进酒》. One shortcoming: polyphonic characters are not handled (interested readers can try adding a fix).
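
One simple way to attempt the polyphonic-character fix is to replace known problem phrases with homophones before the text reaches the engine. The sketch below is only an illustration: `POLYPHONE_FIXES` and `fix_polyphones` are hypothetical names, and the single mapping entry is an assumed example (将 in the title 《将进酒》 is traditionally read qiāng, so it is swapped for the homophone 枪). Whether a given voice then pronounces it as intended has to be checked by ear.

```python
# Hypothetical table: phrase as written -> spelling that a TTS voice is
# more likely to read with the intended pronunciation. Extend it by ear.
POLYPHONE_FIXES = {
    "将进酒": "枪进酒",  # assumed example: 将 is read qiang1 in this title
}


def fix_polyphones(text, fixes=POLYPHONE_FIXES):
    """Return text with each known phrase replaced by its spoken spelling."""
    # Replace longer phrases first so a short key cannot break a longer one.
    for phrase in sorted(fixes, key=len, reverse=True):
        text = text.replace(phrase, fixes[phrase])
    return text
```

The substituted string is meant only for speech, for example engine.say(fix_polyphones(text)); the original text on disk stays unchanged.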

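Now for today's hands-on project: scrape a novel and read it aloud.
Before starting, break the task into steps:
Step 1: fetch the page from the novel website
Step 2: extract the content with XPath
Step 3: write the content to a txt file
Step 4: process the file contents
Step 5: read the text aloud
Here is the code.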

from urllib import request
import pyttsx3
from lxml import etree
# Novel: 《大医凌然》 by 志鸟村
url = 'https://read.qidian.com/chapter/Y8j5WWawoqd1C4AOuV6yIg2/oG-HexlEuhG2uJcMpdsVgA2'
headers = {
    "Referer": "https://read.qidian.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
}
req = request.Request(url=url, headers=headers)
response = request.urlopen(req)
content = response.read().decode('utf-8')
# XPath of one paragraph, copied from the page's HTML in the browser:
# //*[@id="chapter-406583253"]/div/div[2]/p[1]
# print(content)
xpath_content = etree.HTML(content)
# Generalized XPath: the text of every paragraph in the chapter
new_content = xpath_content.xpath('//*[@id="chapter-406583253"]/div/div/p/text()')
# print(new_content)
# Write one stripped paragraph per line
with open('3.txt', 'w', encoding='utf-8') as f:
    for paragraph in new_content:
        f.write(paragraph.strip() + '\n')
# The file is closed at this point, so it can be read back immediately
with open('3.txt', 'r', encoding='utf-8') as f:
    text = f.read()
engine = pyttsx3.init()
volume = engine.getProperty('volume')
# Raise the volume, capped at the 1.0 maximum
engine.setProperty('volume', min(1.0, volume + 0.25))
engine.say(text)
engine.runAndWait()

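When the script finishes scraping, it starts reading aloud.
Comparing the scraped text with the novel's original content shows that they match.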
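
The comparison above was done by eye. The sketch below shows one way the cleaning, saving, and read-back steps could be factored into functions so the round trip can be checked in code. The function names and the temporary file are assumptions for illustration, and the sample paragraphs are placeholders, not text from the novel.

```python
import os
import tempfile


def clean_paragraphs(raw_paragraphs):
    """Strip surrounding whitespace and drop paragraphs that end up empty."""
    cleaned = (p.strip() for p in raw_paragraphs)
    return [p for p in cleaned if p]


def save_paragraphs(paragraphs, path):
    """Write one paragraph per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for paragraph in paragraphs:
            f.write(paragraph + "\n")


def load_paragraphs(path):
    """Read the file back as a list of lines without trailing newlines."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


# Placeholder input imitating xpath() output, including full-width indents.
raw = ["\u3000\u3000first paragraph ", "\n", "\u3000\u3000second paragraph"]
paragraphs = clean_paragraphs(raw)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "3.txt")
    save_paragraphs(paragraphs, path)
    round_trip = load_paragraphs(path)

print(round_trip == paragraphs)  # prints True
```

In the real script, the list returned by xpath() would take the place of raw, and "\n".join(paragraphs) could be passed to engine.say().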