美文网首页
[DIY]给word中的汉字批量加、修改拼音(word全篇加拼音

[DIY]给word中的汉字批量加、修改拼音(word全篇加拼音

作者: 姜附 | 来源:发表于2019-03-15 08:34 被阅读0次

    缘起:师兄发起印经,需要给word电子版的经书全文加注音。

    (增加新版exe程序;

    比vba快,大文档用vba的能跑好几天,

    新版程序的几秒就处理完,还不会因为中文标点出错;

    另有修改指定词组读音等程序;

    请参看:

    word文档全篇加拼音(批量注音)+批量修改读音


    word自带的加注音只能一次加30个字,一本经书有两万四千多字。网上找了下,好像没有现成的软件,但是python可以很简单的把汉字转换成带声调的拼音。再结合office的vba宏录制,摸索了一下,弄出了个批量给word加注音的脚本。分享如下:

    效果

    一、python汉字转拼音

    请参考:https://blog.csdn.net/r_coder/article/details/79419318

    1.首先需要下载、安装python环境。我安装的python3.8版本。这一步很简单,搜索一下,下载安装就可以了。【记得安装过程中,把“加入环境变量”这个选项勾上。】

    2.安装python的拼音库
    网上有两个版本的python汉字转拼音的demo,一个是基于pypinyin库的,另一个是基于xpinyin库的。
    基于xpinyin库的试出来拼音不带声调,后来用的pypinyin库。

    安装完python之后,打开命令行(CMD),输入如下安装命令:
    pip install pypinyin
    然后等待安装完成。这个过程需要联网。

    3.将下面代码复制下来,保存到D盘下,名为wordTOPinyin.py,即:(d:/wordToPinyin.py)。
    (保存路径可以是其它,但需要同时修改后面VBA代码中脚本的路径)

    import sys, getopt
    import pypinyin
    
    word = sys.argv[1:]
    
    s = ''
    for i in pypinyin.pinyin(word):
        s = s + ''.join(i) + " "
    
    print(s)
    
    

    二、VBA宏,给汉字加注音

    MS office默认就安装了VBA开发环境,用Alt+F11打开VBA编辑器。
    WPS需要单独安装一个VBA_for_WPS的包,需要网上找一下。也是用Alt+F11打开VBA编辑器。

    VBA代码:

    Attribute VB_Name = "wordToPinyinByBlockSpecified"
    Option Explicit
    
    Public Declare Function timeGetTime Lib "winmm.dll" () As Long
    
    Public g_manualCharSet As Object
    Public g_specifiedCharSet As Object
    Public g_exceptionCharSet As Object
    
    Function isNeedZhuyin(ByVal tmpChar) As Boolean
                
                ' 非汉字,跳出
                If LenB(StrConv(tmpChar, vbFromUnicode)) <> 2 Then
                        isNeedZhuyin = False
                        Exit Function
                End If
                
                ' 手动指定字符 和 例外字符 两种模式,有点像黑白名单,没有做成优先级的方式,简单点,二选一就可以了
                If True Then
                        ' 特殊符号不需要加注音,跳出
                        If "" <> g_exceptionCharSet(tmpChar) Then
                                isNeedZhuyin = False
                                Exit Function
                        End If
                        
                        isNeedZhuyin = True
                        Exit Function
                        
                Else
                        '是指定的字才加注音
                        If "" <> g_specifiedCharSet(tmpChar) Then
                                isNeedZhuyin = True
                                Exit Function
                        End If
                        
                        isNeedZhuyin = False
                        Exit Function
                    
                End If
                
    End Function
    
    Function initExceptionCharSet()
    
            Set g_exceptionCharSet = CreateObject("Scripting.Dictionary")
    
            '不需要加注音的字符,加到这个集合里
            
            g_exceptionCharSet.Add "。", "ok"
            g_exceptionCharSet.Add ",", "ok"
            g_exceptionCharSet.Add "、", "ok"
            g_exceptionCharSet.Add ";", "ok"
            g_exceptionCharSet.Add "…", "ok"
            g_exceptionCharSet.Add "?", "ok"
            g_exceptionCharSet.Add "(", "ok"
            g_exceptionCharSet.Add ")", "ok"
            g_exceptionCharSet.Add "(", "ok"
            g_exceptionCharSet.Add ")", "ok"
            g_exceptionCharSet.Add ":", "ok"
            g_exceptionCharSet.Add ":", "ok"
            g_exceptionCharSet.Add " ", "ok"
            g_exceptionCharSet.Add "[", "ok"
            g_exceptionCharSet.Add "]", "ok"
            g_exceptionCharSet.Add "-", "ok"
            g_exceptionCharSet.Add "+", "ok"
            g_exceptionCharSet.Add "*", "ok"
            g_exceptionCharSet.Add "《", "ok"
            g_exceptionCharSet.Add "》", "ok"
            g_exceptionCharSet.Add "【", "ok"
            g_exceptionCharSet.Add "】", "ok"
            g_exceptionCharSet.Add "“", "ok"
            g_exceptionCharSet.Add "”", "ok"
            g_exceptionCharSet.Add "—", "ok"
            g_exceptionCharSet.Add Chr$(9), "ok"
            g_exceptionCharSet.Add Chr$(10), "ok"
            g_exceptionCharSet.Add Chr$(13), "ok"
    
      
            '下面是手动注音的字,多音字可以放到这里先不注音,后面再逐一手动注音
            'g_exceptionCharSet.Add "夫", "ok"
            'g_exceptionCharSet.Add "还", "ok"
        
        
    End Function
    
    Function initManualCharSet()
    
            Set g_manualCharSet = CreateObject("Scripting.Dictionary")
            
            '如果一个字,不是所有出现的地方都相同发音,那就需要后期手动修改
        
            'g_manualCharSet.Add "南", "ná"
            'g_manualCharSet.Add "无", "mó"
            'g_manualCharSet.Add "唵", "ong"
            'g_manualCharSet.Add "尽", "jìn"
            g_manualCharSet.Add "藏", "zàng"
            g_manualCharSet.Add "佛", "fó"
            g_manualCharSet.Add "刹", "chà"
            g_manualCharSet.Add "甚", "shèn"
            g_manualCharSet.Add "牟", "móu"
            g_manualCharSet.Add "散", "sǎn"
            g_manualCharSet.Add "调", "tiáo"
            g_manualCharSet.Add "行", "xínɡ"
        
    End Function
    
    Function initSpecifiedCharSet()
    
            Set g_specifiedCharSet = CreateObject("Scripting.Dictionary")
    
            g_specifiedCharSet.Add "伽", "ok"
            g_specifiedCharSet.Add "梵", "ok"
            'g_specifiedCharSet.Add "乐", "ok"
            g_specifiedCharSet.Add "苾", "ok"
            g_specifiedCharSet.Add "刍", "ok"
    
    End Function
    
    Function getPinyin(ByVal text As String) As String
    
            Dim Result As String
            Dim cmd As String
            
            Dim wExec As Object
            Dim shellObj As Object
            Set shellObj = CreateObject("WScript.Shell")
            
            'cmd = "C:\Users\taihang\appdata\local\programs\python\python37\pythonw.exe D:\wordToPinyin.py "
            cmd = "pythonw D:\wordToPinyin.py " & text
            
            Set wExec = shellObj.Exec(cmd)
            Result = wExec.StdOut.ReadAll
        
            '如果运行时同时做其它操作,这里有时候会中段,可能是什么信号,懒得处理了,F5继续运行就可以了
            
            '去掉空格和后面的换行,否则注音显示会歪向左边
        '    spaceIdx = InStr(Result, " ")
        '    If 0 <> spaceIdx Then
        '        Result = Mid(Result, 1, spaceIdx - 1)
        '    End If
            
            getPinyin = Result
        
            Set wExec = Nothing
            Set shellObj = Nothing
        
    End Function
    
    Function isExceptionChar(ByVal char As String) As Boolean
        '有些字符不需要加注音
        
        If "" <> g_exceptionCharSet(tmpChar) Then
                isExceptionChar = True
        Else
                isExceptionChar = False
        End If
        
    End Function
    
    Function getManualPinyin(ByRef tmpChar As String, ByRef charPinyin As String)
    
            Dim tmpPinyin As String
            tmpPinyin = g_manualCharSet(tmpChar)
            If "" <> tmpPinyin Then
                charPinyin = tmpPinyin
            End If
            
    End Function
    
    Function countChar(areaStart As Long, areaEnd As Long)
        '手动算字数
        
        Dim charNum As Long
                
                '全选时要修正一下结束位置,否则全选时会死循环结束不了
                Selection.EndKey unit:=wdStory
                docEnd = Selection.End
                If docEnd < areaEnd Then
                        areaEnd = docEnd
                End If
        
                '恢复光标到起始位置
                Selection.Start = areaStart
                Selection.End = areaStart
                
                charNum = 0
                Do While Selection.End < areaEnd
            
                        If charNum Mod 256 = 0 Then
                                ' 防止假死
                                DoEvents
                                'Debug.Print "正在计算字数: " & Selection.End & "/" & areaEnd & "    百分比:" & Format((Selection.End - areaStart) / (areaEnd - areaStart), "Percent") & "%"
                        End If
                
                        Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
                        charNum = charNum + 1
            
                Loop
                
                countChar = charNum
                
    End Function
    
    Function showDebugLog(ByVal prompt As String, _
            ByVal allBeginTimeSec As Double, ByVal allDoneCount As Long, ByVal allNum As Long, _
            ByVal phaseBeginTimeSec As Double, ByVal phaseDoneCount As Long, ByVal phaseNum As Long)
    
            Dim nowTimeSec As Double
            
            Dim allAvrCharCount As Double
            Dim allCostTimeSec As Double
            
            Dim phaseAvrCharCount As Double
            Dim phaseCostTimeSec As Double
            
            Dim tmpMS As Long
            'tmpMS = timeGetTime()
            
            If (0 = allDoneCount) Or (0 = phaseDoneCount) Then
                    Exit Function
            End If
            
            nowTimeSec = Timer()
            
            allCostTimeSec = nowTimeSec - allBeginTimeSec
            If 0 <> allCostTimeSec Then
                    allAvrCharCount = allDoneCount / allCostTimeSec
            Else
                    allAvrCharCount = 1
            End If
            
            phaseCostTimeSec = nowTimeSec - phaseBeginTimeSec
            If 0 <> phaseCostTimeSec Then
                    phaseAvrCharCount = phaseDoneCount / (nowTimeSec - phaseBeginTimeSec)
            Else
                    phaseAvrCharCount = 1
            End If
            
            Debug.Print prompt & "    all:" & Format(allCostTimeSec / 24 / 3600, "hh:mm:ss") & _
                    "    " & Format((allNum - allDoneCount) / allAvrCharCount / 24 / 3600, "hh:mm:ss") & _
                    "    " & allDoneCount & "/" & allNum & _
                    "    " & Format(allAvrCharCount, "Standard") & "words/second" & _
                    "    " & Format(allDoneCount / allNum, "Percent") & _
                    "    phase:" & Format(phaseCostTimeSec / 24 / 3600, "hh:mm:ss") & _
                    "    " & Format((phaseNum - phaseDoneCount) / phaseAvrCharCount / 24 / 3600, "hh:mm:ss") & _
                    "    " & phaseDoneCount & "/" & phaseNum & _
                    "    " & Format(phaseAvrCharCount, "Standard") & "words/count" & _
                    "    " & Format(phaseDoneCount / phaseNum, "Percent")
    
    End Function
    
    Sub addPinyin()
            '给选中区域加注音
            
            '避免被系统信号打断,好像有点效果
            Application.EnableCancelKey = False
            
            Dim blockCharLimit As Long
            '可自行修改每次处理的字符数量的大小
            blockCharLimit = 100
        
            Dim textForPinyin As String
            Dim tmpChar As String
            Dim pinyin As String
            Dim blockText As String
            
            Dim cursor1 As Long
            Dim cursor2 As Long
            Dim areaStart As Long
            Dim areaEnd As Long
            Dim docEnd As Long
            Dim lastPos As Long
            Dim blockAddedCharCount As Long
            Dim charPinyin As String
            Dim pinyinNum As Long
            
            Dim blockStart As Long
            Dim blockEnd As Long
            Dim blockReadCharCount As Long
            Dim blockReadMovedCharCount As Long
            Dim blockAddMovedCharCount As Long
            
            Dim allReadMovedCharCount As Long
            Dim charNum As Long
            Dim allAddedMovedCharCount As Long
            
            Dim costTimeSec As Double
            Dim avrCharCount As Double
            
            Dim mainLoopCount As Long
            
            Dim logContent As String
            
            Dim appBeginTimeSec As Double
            Dim endTimeSec As Double
            Dim tmpTimeSec As Double
            Dim blockBeginTimeSec As Double
            Dim addBeginTimeSec As Double
            
            'charNum =countChar(areaStart, areaEnd)
            charNum = Selection.Characters.Count
            areaStart = Selection.Start
            areaEnd = Selection.End
            
            appBeginTimeSec = Timer()
            
            initManualCharSet       '手动指定的读音
            initSpecifiedCharSet     '需要加读音的字
            initExceptionCharSet    ' 例外字符
            
            Dim pinyinArr() As String
            
            Debug.Print ""
            
            
            ' test
            'textForPinyin = getPinyin("】")
            
    
            ' 收集需要加注音的字
            textForPinyin = ""
            blockText = ""
            allReadMovedCharCount = 0
            
            mainLoopCount = 0
            'blockStart = areaStart
            allAddedMovedCharCount = 0
            blockEnd = areaStart
            Selection.Start = areaStart
            Selection.End = areaStart
            Do While (allReadMovedCharCount < charNum)
            
                    blockBeginTimeSec = Timer()
                    
                    '防止假死
                    DoEvents
                    
                    mainLoopCount = mainLoopCount + 1
            
                    Debug.Print "begin to collect words"
                    
                    blockStart = Selection.End
                    Selection.Start = blockStart
                    
                    'blockLoopCount = 0
                    
                    blockText = ""
                    
                    blockReadCharCount = 0
                    blockReadMovedCharCount = 0
                    
                    Do While (allReadMovedCharCount < charNum)
                    
                            '防止假死
                            DoEvents
                            
                            '进度显示
                            If allReadMovedCharCount Mod 64 = 0 Then
                                    showDebugLog "step1/2    collect words", appBeginTimeSec, allReadMovedCharCount + 1, charNum, blockBeginTimeSec, blockReadCharCount + 1, blockCharLimit
                            End If
                            
                            ' 找句末,非汉字,或者不需要加注音的符号
                            'Do While (allReadMovedCharCount < charNum) And (blockReadCharCount < blockCharLimit)
                            
                                    '防止假死
                                    'DoEvents
                                    
                                    lastPos = Selection.End
                                    Selection.Start = Selection.End
                                    '光标往下移一个字
                                    Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
                                    Selection.Start = lastPos
                                    
                                    allReadMovedCharCount = allReadMovedCharCount + 1
                                    blockReadMovedCharCount = blockReadMovedCharCount + 1
                                    
                                    tmpChar = Selection.text
                                    
                                    If True = isNeedZhuyin(tmpChar) Then
                                            blockText = blockText + tmpChar
                                            blockReadCharCount = blockReadCharCount + 1
                                    ElseIf (blockCharLimit <= blockReadCharCount) Then
                                            ' 达到限制后,遇到句末才结束。python汉字转语音模块,多音字会根据词自动选择相应的读音
                                            Exit Do
                                    End If
                            
                            'Loop
    
                    Loop
                    
                    blockEnd = Selection.End
                    
                    pinyin = getPinyin(blockText)
                    pinyinArr = Split(pinyin, " ")
                    pinyinNum = UBound(pinyinArr) + 1
                    
                    Debug.Print "begin to add pinyin"
                    
                    addBeginTimeSec = Timer()
                    
                    ' add pinyin word by word
                    charPinyin = ""
                    Selection.Start = blockStart
                    Selection.End = blockStart
                    blockAddedCharCount = 0
                    blockAddMovedCharCount = 0
                    Do While (blockAddMovedCharCount < blockReadMovedCharCount)
                    
                            '防止假死
                            DoEvents
                            
                            '进度显示
                            If blockAddedCharCount Mod 16 = 0 Then
                                    showDebugLog "step2/2    add pinyin", appBeginTimeSec, allAddedMovedCharCount + 1, charNum, addBeginTimeSec, blockAddMovedCharCount + 1, blockReadMovedCharCount
                            End If
                            
                            allAddedMovedCharCount = allAddedMovedCharCount + 1
                            blockAddMovedCharCount = blockAddMovedCharCount + 1
                            
                            Selection.Start = Selection.End
                            lastPos = Selection.End
                            Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdMove
                            Selection.Start = lastPos
                            tmpChar = Selection.text
                            
                            'Debug.Print "加注音:" & blockAddedCharCount & "/" & blockReadCharCount & "    " & tmpChar
                            
                            If True = isNeedZhuyin(tmpChar) Then
                            
                                    charPinyin = pinyinArr(blockAddedCharCount)
                                    
                                    '如果是手动指定读音的字,就用手动指定的读音
                                    getManualPinyin tmpChar, charPinyin
                                    
                                    
                                     ' 不需要加注音,跳出
                                    'If False = isExceptionChar(tmpChar) Then
                                        
                                            '注音操作后,字的大小会增长,当前Selection.End等属性会自动变化
                                            cursor1 = lastPos
                                            cursor2 = Selection.End
                                            'cursor2 = cursor1 + 1
                                            With Selection
                                                    '.text = ""
                                                    .SetRange Start:=cursor1, End:=cursor2
                                                    .Range.PhoneticGuide text:=charPinyin, Alignment:=wdPhoneticGuideAlignmentCenter, Raise:=19, FontSize:=10, FontName:="微软雅黑 Light"
                                                    '.SetRange Start:=cursor1, End:=cursor2
                                            End With
                                            
                                    'End If
                                    
                                    blockAddedCharCount = blockAddedCharCount + 1
                                    
                            End If
                            
                    Loop
                    
                    
    '                '如果add中移动光标的数量与read中的不一致,就需要修正光标位置
    '                If (blockReadMovedCharCount <> blockAddMovedCharCount) Then
    '                        Selection.Start = Selection.End
    '                        Selection.MoveRight unit:=wdCharacter, Count:=blockReadMovedCharCount - blockAddMovedCharCount, Extend:=wdMove
    '                End If
                    
            Loop
            
            Set g_manualCharSet = Nothing
            Set g_specifiedCharSet = Nothing
            Set g_exceptionCharSet = Nothing
            
            ' 恢复选中区域
            Selection.Start = areaStart
            
            endTimeSec = Timer()
            
            logContent = "all cost time:" & Format((endTimeSec - appBeginTimeSec) / 24 / 3600, "hh:mm:ss") & "    all words count:" & charNum & _
                "    average:" & Format(charNum / (endTimeSec - appBeginTimeSec), "Standard") & "word/second"
            
            Debug.Print logContent
            MsgBox logContent
            
    
    End Sub
    
    

    如果直接复制网页网页上的代码到VB编辑器中,汉字会乱码。先把网页代码复制到记事本(notepad),然后另存为ANSI编码的文件,后缀名为.bas。否则中文字符在vba编辑器中会变成乱码。然后修改,在VB编辑器中导入文件,这样比较省事。

    打开VB编辑器的【视图】->【立即窗口】可以查看日志和大概进度。

    踩过的坑:汉字加上注音后,当前字的范围会变化,加上描述属性后,大小会增加,Selection.Start和Selection.End都会变动,后面内容的位置也会因此变化,用代码+1的方式来移动光标会出现各种乱码。(最新测试好像又可以了。)

    三、运行

    有一些边界条件没有测试,常用的就是选中一段文字,也可直接Crtl+a全选,然后
    WPS选择【开发工具】->【宏】(MS选择【视图】->【宏】),
    选择 addPinyin 宏,运行,就可以看到word中在一个字一个字的加注音。

    运行期间,最好不要操作word(可以做切换到其他的应用),不要切换到word的其他文档去,不要点击文档里的内容。否则就会变成给当前查看的文档、或者点击处的内容加注音,而且可能导致死循环。(如果死循环了,可以杀掉进程重新运行,或者在VBA编辑器里选择【运行】->【重新设置】以重置。代码里没有加自动保存的内容,重置可以保留已经处理过的结果。)

    运行时会看到光标会移动两遍,第一遍是遍历选中内容,这个比较快;第二遍是加注音,慢一些。

    想找一下看看能不能多文档并行处理的方法,不知道怎么在不用select的情况下光标MoveRight,也没找到文档(懒)。想要多开,就把文档拆分成多个,用虚拟机或者多台电脑。

    四、收尾,人工校对,重新调整排版

    加了注音后,WPS Word里看排版会变化,很多换行,不知道怎么设置,用MS Word打开就好很多。转成常用的pdf格式即可。

    五、定制

    initManualCharSet 和 initExceptionCharSet 两个函数中的内容可以根据需要自行修改。
    一般来说,文档里的中文字符都需要加到initExceptionCharSet中。可以先把未加拼音的原文档备份一下,先试几页,大概就知道需要排除那些符号了。

    需要自定义拼音的格式,就先按要求修改一个例子,录制成宏,然后把代码中.Range.PhoneticGuide这一行后面的属性修改成录制的宏里面对应的属性。

    文档中的汉字号,需要加到initExceptionCharSet中去,不然会出现注音一行也会带符号的情况。

    加上注音后,WPS Word中无法搜索,MS Word中可以搜索单个字。
    已经有注音的字,会被判断为不是汉字,已有注音不会被改变。

    第一个版本是一个字一个字的转换的,效率比较慢,两万四千个字大概两个小时加完。后来改成批量的,时间缩短到四十分钟左右。

    有师兄有更高效率的方案,请告诉我一下,非常感激。

    感谢给python库做贡献的前辈们!
    感谢写示例demo的前辈们!

    补:专业加拼音软件 adobe indesign
    好像有个“拼音居士”的word插件,一次可以加3万多字,没有亲自用过,有兴趣的可以去试试。

    内容里不能有跳转链接,不然会出问题。
    如果经常用,可以自定义菜单,给某个菜单卡新建一个组,把这个宏命令加进去,成为一个固定按钮。

    后来遇到一个问题,文档字很多,10万+,运行了一两天都没有结束,原因是运行到几百页就变得非常慢。经过尝试,大概是前面字多了,光标定位效率就低了。
    解决办法:运行注音用一个文档,保存已经注音的内容用另外一个文档;每次都注音几十页到100页的内容,这一部分注音结束就剪切追加到到已注音文档后面保存,这样效率高一些。就是有些地方的格式可能会被打乱一点,只能全都结束了之后再调一下了。


    附:

    寻亲方法。寻找失散亲人的方法。家人团聚的方法。打拐。寻找被拐儿童的方法。
    大富大贵大慈大悲的药师佛法
    药师七佛灯(各地供奉情况汇总)

    相关文章

      网友评论

          本文标题:[DIY]给word中的汉字批量加、修改拼音(word全篇加拼音

          本文链接:https://www.haomeiwen.com/subject/zwykmqtx.html