缘起:师兄发起印经,需要给word电子版的经书全文加注音。
(增加新版exe程序;
比vba快,大文档用vba的能跑好几天,
新版程序的几秒就处理完,还不会因为中文标点出错;
另有修改指定词组读音等程序;
请参看:
word自带的加注音只能一次加30个字,一本经书有两万四千多字。网上找了下,好像没有现成的软件,但是python可以很简单的把汉字转换成带声调的拼音。再结合office的vba宏录制,摸索了一下,弄出了个批量给word加注音的脚本。分享如下:
效果一、python汉字转拼音
请参考:https://blog.csdn.net/r_coder/article/details/79419318
1.首先需要下载、安装python环境。我安装的python3.8版本。这一步很简单,搜索一下,下载安装就可以了。【记得安装过程中,把“加入环境变量”这个选项勾上。】
2.安装python的拼音库
网上有两个版本的python汉字转拼音的demo,一个是基于pypinyin库的,另一个是基于xpinyin库的。
基于xpinyin库的试出来拼音不带声调,后来用的pypinyin库。
安装完python之后,打开命令行(CMD),输入如下安装命令:
pip install pypinyin
然后等待安装完成。这个过程需要联网。
3.将下面代码复制下来,保存到D盘下,名为wordTOPinyin.py,即:(d:/wordToPinyin.py)。
(保存路径可以是其它,但需要同时修改后面VBA代码中脚本的路径)
import sys, getopt
import pypinyin
word = sys.argv[1:]
s = ''
for i in pypinyin.pinyin(word):
s = s + ''.join(i) + " "
print(s)
二、VBA宏,给汉字加注音
MS office默认就安装了VBA开发环境,用Alt+F11打开VBA编辑器。
WPS需要单独安装一个VBA_for_WPS的包,需要网上找一下。也是用Alt+F11打开VBA编辑器。
VBA代码:
Attribute VB_Name = "wordToPinyinByBlockSpecified"
Option Explicit
Public Declare Function timeGetTime Lib "winmm.dll" () As Long
Public g_manualCharSet As Object
Public g_specifiedCharSet As Object
Public g_exceptionCharSet As Object
Function isNeedZhuyin(ByVal tmpChar) As Boolean
' 非汉字,跳出
If LenB(StrConv(tmpChar, vbFromUnicode)) <> 2 Then
isNeedZhuyin = False
Exit Function
End If
' 手动指定字符 和 例外字符 两种模式,有点像黑白名单,没有做成优先级的方式,简单点,二选一就可以了
If True Then
' 特殊符号不需要加注音,跳出
If "" <> g_exceptionCharSet(tmpChar) Then
isNeedZhuyin = False
Exit Function
End If
isNeedZhuyin = True
Exit Function
Else
'是指定的字才加注音
If "" <> g_specifiedCharSet(tmpChar) Then
isNeedZhuyin = True
Exit Function
End If
isNeedZhuyin = False
Exit Function
End If
End Function
Function initExceptionCharSet()
Set g_exceptionCharSet = CreateObject("Scripting.Dictionary")
'不需要加注音的字符,加到这个集合里
g_exceptionCharSet.Add "。", "ok"
g_exceptionCharSet.Add ",", "ok"
g_exceptionCharSet.Add "、", "ok"
g_exceptionCharSet.Add ";", "ok"
g_exceptionCharSet.Add "…", "ok"
g_exceptionCharSet.Add "?", "ok"
g_exceptionCharSet.Add "(", "ok"
g_exceptionCharSet.Add ")", "ok"
g_exceptionCharSet.Add "(", "ok"
g_exceptionCharSet.Add ")", "ok"
g_exceptionCharSet.Add ":", "ok"
g_exceptionCharSet.Add ":", "ok"
g_exceptionCharSet.Add " ", "ok"
g_exceptionCharSet.Add "[", "ok"
g_exceptionCharSet.Add "]", "ok"
g_exceptionCharSet.Add "-", "ok"
g_exceptionCharSet.Add "+", "ok"
g_exceptionCharSet.Add "*", "ok"
g_exceptionCharSet.Add "《", "ok"
g_exceptionCharSet.Add "》", "ok"
g_exceptionCharSet.Add "【", "ok"
g_exceptionCharSet.Add "】", "ok"
g_exceptionCharSet.Add "“", "ok"
g_exceptionCharSet.Add "”", "ok"
g_exceptionCharSet.Add "—", "ok"
g_exceptionCharSet.Add Chr$(9), "ok"
g_exceptionCharSet.Add Chr$(10), "ok"
g_exceptionCharSet.Add Chr$(13), "ok"
'下面是手动注音的字,多音字可以放到这里先不注音,后面再逐一手动注音
'g_exceptionCharSet.Add "夫", "ok"
'g_exceptionCharSet.Add "还", "ok"
End Function
Function initManualCharSet()
Set g_manualCharSet = CreateObject("Scripting.Dictionary")
'如果一个字,不是所有出现的地方都相同发音,那就需要后期手动修改
'g_manualCharSet.Add "南", "ná"
'g_manualCharSet.Add "无", "mó"
'g_manualCharSet.Add "唵", "ong"
'g_manualCharSet.Add "尽", "jìn"
g_manualCharSet.Add "藏", "zàng"
g_manualCharSet.Add "佛", "fó"
g_manualCharSet.Add "刹", "chà"
g_manualCharSet.Add "甚", "shèn"
g_manualCharSet.Add "牟", "móu"
g_manualCharSet.Add "散", "sǎn"
g_manualCharSet.Add "调", "tiáo"
g_manualCharSet.Add "行", "xínɡ"
End Function
Function initSpecifiedCharSet()
Set g_specifiedCharSet = CreateObject("Scripting.Dictionary")
g_specifiedCharSet.Add "伽", "ok"
g_specifiedCharSet.Add "梵", "ok"
'g_specifiedCharSet.Add "乐", "ok"
g_specifiedCharSet.Add "苾", "ok"
g_specifiedCharSet.Add "刍", "ok"
End Function
Function getPinyin(ByVal text As String) As String
Dim Result As String
Dim cmd As String
Dim wExec As Object
Dim shellObj As Object
Set shellObj = CreateObject("WScript.Shell")
'cmd = "C:\Users\taihang\appdata\local\programs\python\python37\pythonw.exe D:\wordToPinyin.py "
cmd = "pythonw D:\wordToPinyin.py " & text
Set wExec = shellObj.Exec(cmd)
Result = wExec.StdOut.ReadAll
'如果运行时同时做其它操作,这里有时候会中段,可能是什么信号,懒得处理了,F5继续运行就可以了
'去掉空格和后面的换行,否则注音显示会歪向左边
' spaceIdx = InStr(Result, " ")
' If 0 <> spaceIdx Then
' Result = Mid(Result, 1, spaceIdx - 1)
' End If
getPinyin = Result
Set wExec = Nothing
Set shellObj = Nothing
End Function
Function isExceptionChar(ByVal char As String) As Boolean
'有些字符不需要加注音
If "" <> g_exceptionCharSet(tmpChar) Then
isExceptionChar = True
Else
isExceptionChar = False
End If
End Function
Function getManualPinyin(ByRef tmpChar As String, ByRef charPinyin As String)
Dim tmpPinyin As String
tmpPinyin = g_manualCharSet(tmpChar)
If "" <> tmpPinyin Then
charPinyin = tmpPinyin
End If
End Function
Function countChar(areaStart As Long, areaEnd As Long)
'手动算字数
Dim charNum As Long
'全选时要修正一下结束位置,否则全选时会死循环结束不了
Selection.EndKey unit:=wdStory
docEnd = Selection.End
If docEnd < areaEnd Then
areaEnd = docEnd
End If
'恢复光标到起始位置
Selection.Start = areaStart
Selection.End = areaStart
charNum = 0
Do While Selection.End < areaEnd
If charNum Mod 256 = 0 Then
' 防止假死
DoEvents
'Debug.Print "正在计算字数: " & Selection.End & "/" & areaEnd & " 百分比:" & Format((Selection.End - areaStart) / (areaEnd - areaStart), "Percent") & "%"
End If
Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
charNum = charNum + 1
Loop
countChar = charNum
End Function
Function showDebugLog(ByVal prompt As String, _
ByVal allBeginTimeSec As Double, ByVal allDoneCount As Long, ByVal allNum As Long, _
ByVal phaseBeginTimeSec As Double, ByVal phaseDoneCount As Long, ByVal phaseNum As Long)
Dim nowTimeSec As Double
Dim allAvrCharCount As Double
Dim allCostTimeSec As Double
Dim phaseAvrCharCount As Double
Dim phaseCostTimeSec As Double
Dim tmpMS As Long
'tmpMS = timeGetTime()
If (0 = allDoneCount) Or (0 = phaseDoneCount) Then
Exit Function
End If
nowTimeSec = Timer()
allCostTimeSec = nowTimeSec - allBeginTimeSec
If 0 <> allCostTimeSec Then
allAvrCharCount = allDoneCount / allCostTimeSec
Else
allAvrCharCount = 1
End If
phaseCostTimeSec = nowTimeSec - phaseBeginTimeSec
If 0 <> phaseCostTimeSec Then
phaseAvrCharCount = phaseDoneCount / (nowTimeSec - phaseBeginTimeSec)
Else
phaseAvrCharCount = 1
End If
Debug.Print prompt & " all:" & Format(allCostTimeSec / 24 / 3600, "hh:mm:ss") & _
" " & Format((allNum - allDoneCount) / allAvrCharCount / 24 / 3600, "hh:mm:ss") & _
" " & allDoneCount & "/" & allNum & _
" " & Format(allAvrCharCount, "Standard") & "words/second" & _
" " & Format(allDoneCount / allNum, "Percent") & _
" phase:" & Format(phaseCostTimeSec / 24 / 3600, "hh:mm:ss") & _
" " & Format((phaseNum - phaseDoneCount) / phaseAvrCharCount / 24 / 3600, "hh:mm:ss") & _
" " & phaseDoneCount & "/" & phaseNum & _
" " & Format(phaseAvrCharCount, "Standard") & "words/count" & _
" " & Format(phaseDoneCount / phaseNum, "Percent")
End Function
Sub addPinyin()
'给选中区域加注音
'避免被系统信号打断,好像有点效果
Application.EnableCancelKey = False
Dim blockCharLimit As Long
'可自行修改每次处理的字符数量的大小
blockCharLimit = 100
Dim textForPinyin As String
Dim tmpChar As String
Dim pinyin As String
Dim blockText As String
Dim cursor1 As Long
Dim cursor2 As Long
Dim areaStart As Long
Dim areaEnd As Long
Dim docEnd As Long
Dim lastPos As Long
Dim blockAddedCharCount As Long
Dim charPinyin As String
Dim pinyinNum As Long
Dim blockStart As Long
Dim blockEnd As Long
Dim blockReadCharCount As Long
Dim blockReadMovedCharCount As Long
Dim blockAddMovedCharCount As Long
Dim allReadMovedCharCount As Long
Dim charNum As Long
Dim allAddedMovedCharCount As Long
Dim costTimeSec As Double
Dim avrCharCount As Double
Dim mainLoopCount As Long
Dim logContent As String
Dim appBeginTimeSec As Double
Dim endTimeSec As Double
Dim tmpTimeSec As Double
Dim blockBeginTimeSec As Double
Dim addBeginTimeSec As Double
'charNum =countChar(areaStart, areaEnd)
charNum = Selection.Characters.Count
areaStart = Selection.Start
areaEnd = Selection.End
appBeginTimeSec = Timer()
initManualCharSet '手动指定的读音
initSpecifiedCharSet '需要加读音的字
initExceptionCharSet ' 例外字符
Dim pinyinArr() As String
Debug.Print ""
' test
'textForPinyin = getPinyin("】")
' 收集需要加注音的字
textForPinyin = ""
blockText = ""
allReadMovedCharCount = 0
mainLoopCount = 0
'blockStart = areaStart
allAddedMovedCharCount = 0
blockEnd = areaStart
Selection.Start = areaStart
Selection.End = areaStart
Do While (allReadMovedCharCount < charNum)
blockBeginTimeSec = Timer()
'防止假死
DoEvents
mainLoopCount = mainLoopCount + 1
Debug.Print "begin to collect words"
blockStart = Selection.End
Selection.Start = blockStart
'blockLoopCount = 0
blockText = ""
blockReadCharCount = 0
blockReadMovedCharCount = 0
Do While (allReadMovedCharCount < charNum)
'防止假死
DoEvents
'进度显示
If allReadMovedCharCount Mod 64 = 0 Then
showDebugLog "step1/2 collect words", appBeginTimeSec, allReadMovedCharCount + 1, charNum, blockBeginTimeSec, blockReadCharCount + 1, blockCharLimit
End If
' 找句末,非汉字,或者不需要加注音的符号
'Do While (allReadMovedCharCount < charNum) And (blockReadCharCount < blockCharLimit)
'防止假死
'DoEvents
lastPos = Selection.End
Selection.Start = Selection.End
'光标往下移一个字
Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
Selection.Start = lastPos
allReadMovedCharCount = allReadMovedCharCount + 1
blockReadMovedCharCount = blockReadMovedCharCount + 1
tmpChar = Selection.text
If True = isNeedZhuyin(tmpChar) Then
blockText = blockText + tmpChar
blockReadCharCount = blockReadCharCount + 1
ElseIf (blockCharLimit <= blockReadCharCount) Then
' 达到限制后,遇到句末才结束。python汉字转语音模块,多音字会根据词自动选择相应的读音
Exit Do
End If
'Loop
Loop
blockEnd = Selection.End
pinyin = getPinyin(blockText)
pinyinArr = Split(pinyin, " ")
pinyinNum = UBound(pinyinArr) + 1
Debug.Print "begin to add pinyin"
addBeginTimeSec = Timer()
' add pinyin word by word
charPinyin = ""
Selection.Start = blockStart
Selection.End = blockStart
blockAddedCharCount = 0
blockAddMovedCharCount = 0
Do While (blockAddMovedCharCount < blockReadMovedCharCount)
'防止假死
DoEvents
'进度显示
If blockAddedCharCount Mod 16 = 0 Then
showDebugLog "step2/2 add pinyin", appBeginTimeSec, allAddedMovedCharCount + 1, charNum, addBeginTimeSec, blockAddMovedCharCount + 1, blockReadMovedCharCount
End If
allAddedMovedCharCount = allAddedMovedCharCount + 1
blockAddMovedCharCount = blockAddMovedCharCount + 1
Selection.Start = Selection.End
lastPos = Selection.End
Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
Selection.Start = lastPos
tmpChar = Selection.text
'Debug.Print "加注音:" & blockAddedCharCount & "/" & blockReadCharCount & " " & tmpChar
If True = isNeedZhuyin(tmpChar) Then
charPinyin = pinyinArr(blockAddedCharCount)
'如果是手动指定读音的字,就用手动指定的读音
getManualPinyin tmpChar, charPinyin
' 不需要加注音,跳出
'If False = isExceptionChar(tmpChar) Then
'注音操作后,字的大小会增长,当前Selection.End等属性会自动变化
cursor1 = lastPos
cursor2 = Selection.End
'cursor2 = cursor1 + 1
With Selection
'.text = ""
.SetRange Start:=cursor1, End:=cursor2
.Range.PhoneticGuide text:=charPinyin, Alignment:=wdPhoneticGuideAlignmentCenter, Raise:=19, FontSize:=10, FontName:="微软雅黑 Light"
'.SetRange Start:=cursor1, End:=cursor2
End With
'End If
blockAddedCharCount = blockAddedCharCount + 1
End If
Loop
' '如果add中移动光标的数量与read中的不一致,就需要修正光标位置
' If (blockReadMovedCharCount <> blockAddMovedCharCount) Then
' Selection.Start = Selection.End
' Selection.MoveRight unit:=wdCharacter, Count:=blockReadMovedCharCount - blockAddMovedCharCount, Extend:=wdMove
' End If
Loop
Set g_manualCharSet = Nothing
Set g_specifiedCharSet = Nothing
Set g_exceptionCharSet = Nothing
' 恢复选中区域
Selection.Start = areaStart
endTimeSec = Timer()
logContent = "all cost time:" & Format((endTimeSec - appBeginTimeSec) / 24 / 3600, "hh:mm:ss") & " all words count:" & charNum & _
" average:" & Format(charNum / (endTimeSec - appBeginTimeSec), "Standard") & "word/second"
Debug.Print logContent
MsgBox logContent
End Sub
如果直接复制网页网页上的代码到VB编辑器中,汉字会乱码。先把网页代码复制到记事本(notepad),然后另存为ANSI编码的文件,后缀名为.bas。否则中文字符在vba编辑器中会变成乱码。然后修改,在VB编辑器中导入文件,这样比较省事。
打开VB编辑器的【视图】->【立即窗口】可以查看日志和大概进度。
踩过的坑:汉字加上注音后,当前字的范围会变化,加上描述属性后,大小会增加,Selection.Start和Selection.End都会变动,后面内容的位置也会因此变化,用代码+1的方式来移动光标会出现各种乱码。(最新测试好像又可以了。)
三、运行
有一些边界条件没有测试,常用的就是选中一段文字,也可直接Crtl+a全选,然后
WPS选择【开发工具】->【宏】(MS选择【视图】->【宏】),
选择 addPinyin 宏,运行,就可以看到word中在一个字一个字的加注音。
运行期间,最好不要操作word(可以做切换到其他的应用),不要切换到word的其他文档去,不要点击文档里的内容。否则就会变成给当前查看的文档、或者点击处的内容加注音,而且可能导致死循环。(如果死循环了,可以杀掉进程重新运行,或者在VBA编辑器里选择【运行】->【重新设置】以重置。代码里没有加自动保存的内容,重置可以保留已经处理过的结果。)
运行时会看到光标会移动两遍,第一遍是遍历选中内容,这个比较快;第二遍是加注音,慢一些。
想找一下看看能不能多文档并行处理的方法,不知道怎么在不用select的情况下光标MoveRight,也没找到文档(懒)。想要多开,就把文档拆分成多个,用虚拟机或者多台电脑。
四、收尾,人工校对,重新调整排版
加了注音后,WPS Word里看排版会变化,很多换行,不知道怎么设置,用MS Word打开就好很多。转成常用的pdf格式即可。
五、定制
initManualCharSet 和 initExceptionCharSet 两个函数中的内容可以根据需要自行修改。
一般来说,文档里的中文字符都需要加到initExceptionCharSet中。可以先把未加拼音的原文档备份一下,先试几页,大概就知道需要排除那些符号了。
需要自定义拼音的格式,就先按要求修改一个例子,录制成宏,然后把代码中.Range.PhoneticGuide这一行后面的属性修改成录制的宏里面对应的属性。
文档中的汉字号,需要加到initExceptionCharSet中去,不然会出现注音一行也会带符号的情况。
加上注音后,WPS Word中无法搜索,MS Word中可以搜索单个字。
已经有注音的字,会被判断为不是汉字,已有注音不会被改变。
第一个版本是一个字一个字的转换的,效率比较慢,两万四千个字大概两个小时加完。后来改成批量的,时间缩短到四十分钟左右。
有师兄有更高效率的方案,请告诉我一下,非常感激。
感谢给python库做贡献的前辈们!
感谢写示例demo的前辈们!
补:专业加拼音软件 adobe indesign
好像有个“拼音居士”的word插件,一次可以加3万多字,没有亲自用过,有兴趣的可以去试试。
内容里不能有跳转链接,不然会出问题。
如果经常用,可以自定义菜单,给某个菜单卡新建一个组,把这个宏命令加进去,成为一个固定按钮。
后来遇到一个问题,文档字很多,10万+,运行了一两天都没有结束,原因是运行到几百页就变得非常慢。经过尝试,大概是前面字多了,光标定位效率就低了。
解决办法:运行注音用一个文档,保存已经注音的内容用另外一个文档;每次都注音几十页到100页的内容,这一部分注音结束就剪切追加到到已注音文档后面保存,这样效率高一些。就是有些地方的格式可能会被打乱一点,只能全都结束了之后再调一下了。
附:
网友评论