Paper translation helper: calling the clipboard and Google Translate from Python 3

Author: 铁佛爷 | Published: 2018-10-24 20:50

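    My English is poor and reading papers is a struggle, so Google Translate and the Eudic dictionary (欧陆词典) are my best friends.
    Copying a paragraph out of a PDF and pasting it into Google Translate is the thing I do most often.
    Cleaning up the stray line breaks by hand every time is a nuisance, though.
    So I wrote a small tool in Python.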

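    What it does: reads the copied text from the Windows clipboard, cleans up the formatting, calls the Google Translate API, and prints the result.
    Environment: Windows 10, Python 3.5.2 |Anaconda 4.2.0 (64-bit)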

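    The main program is Clipboard.py; run the tool from here. It reads and writes the clipboard and does the formatting.
    First, mind the exception handling around the clipboard: once it has been opened it must be closed again with cb.CloseClipboard(), otherwise copy and paste stop working in other programs (if copy and paste do stop working, closing Python fixes it).
    Second, mind the encoding. In Python 3 every str is Unicode, so when reading from the clipboard pass win32con.CF_UNICODETEXT as the format, not win32con.CF_TEXT. CF_TEXT returns bytes, and converting those to str causes all sorts of trouble.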
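
    The "always close what you opened" rule can also be packaged as a context manager. This is only a minimal sketch: open_clipboard and read_text are names invented here, and the clipboard module is passed in as an argument (an assumption made so the logic can be exercised without Windows).

```python
from contextlib import contextmanager


@contextmanager
def open_clipboard(cb):
    """Open the clipboard through `cb` and always close it again."""
    cb.OpenClipboard()
    try:
        yield cb
    finally:
        # Runs on success and on error, so the clipboard is never left open.
        cb.CloseClipboard()


def read_text(cb, text_format):
    """Return the clipboard text, or None if the clipboard holds no text."""
    with open_clipboard(cb) as clip:
        try:
            return clip.GetClipboardData(text_format)
        except TypeError:
            # The requested format is not available on the clipboard.
            return None


# On Windows this would be called roughly like this:
#   import win32clipboard, win32con
#   text = read_text(win32clipboard, win32con.CF_UNICODETEXT)
```

    Because the close call sits in a finally block, an exception while reading cannot leave the clipboard locked.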

    # -*- coding: utf-8 -*-
    """
    Created on Fri Oct 19 10:48:45 2018
    
    @author: BigFly
    """
    import win32clipboard as cb
    import win32con
    from translate import google_translate
    
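    def gettext():
        """Return the clipboard text, or None if the clipboard holds no text."""
        cb.OpenClipboard()
        try:
            return cb.GetClipboardData(win32con.CF_UNICODETEXT)
        except TypeError:
            # Raised when the clipboard has no data in the requested format.
            print("There is no text in the clipboard.")
            return None
        finally:
            # Always release the clipboard, or copy/paste breaks elsewhere.
            cb.CloseClipboard()
    
    def settext(aString):
        """Replace the clipboard contents with aString."""
        cb.OpenClipboard()
        try:
            cb.EmptyClipboard()
            cb.SetClipboardData(win32con.CF_UNICODETEXT, aString)
        except Exception as exc:
            print("Error in settext():", exc)
        finally:
            cb.CloseClipboard()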
        
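    # Remove parenthesized citations such as "(He et al., 2016)".
    def deletBracket(source,flags,pad_sym=chr(0)):
        code={"(":1, ")":-1}
        index = [i for i in range(len(source)) if source[i]=="(" or source[i]==")"]
        match,start=0,-1
        for i in index:
            # Ignore a stray ")" that has no opening "(" before it.
            match = max(match + code[source[i]], 0)
            if start<0 and match==1:
                start = i
            if match==0 and start>=0:
                content=source[start: i+1]
                # Drop the group only if it contains one of the citation markers.
                if any(flag in content for flag in flags):
                    # Pad with pad_sym so the indices computed above stay valid.
                    source=source.replace(content,pad_sym*len(content),1)
                start=-1
        return source.replace(pad_sym,"")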
        
    source= gettext()
    if source:
        source= source.replace(chr(0),"")
        # Join the hard line breaks copied from the PDF.
        source=source.replace("\r","")
        source=source.replace("\n"," ")
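        # Split into sentences; protect abbreviations first so that their
        # periods are not mistaken for the end of a sentence.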
        pad_sym=chr(0)
        source=source.replace("e.g. ","e.g."+pad_sym)
        source=source.replace("i.e. ","i.e."+pad_sym)
        source=source.replace("Eq. ","Eq."+pad_sym)
        source=source.replace("Mr. ","Mr."+pad_sym)
        
        source=source.replace(". ",". \r\n")
        source=source.replace(pad_sym," ")
        # Remove bracketed citations.
        source=deletBracket(source,["et al.", ", 201", ", 200", ", 199"],pad_sym)
        source=source.replace("  "," ")
        
        settext(source)
        print(source)
        print("[ %d ]"%(len(source)))
        print(google_translate(source))
    
    '''
    
    Our architectures
    will have only one representation at one resolution besides
    the pooling layers and the convolutional layers that initialize
    the needed numbers of channels. Take the architecture in
    Table 1 as an example. There are two processes for each
    resolution. The first one is the transition process, which
    computes the initial features with the dimensions of the next
    resolution, then down samples it to 1=4 using a 2×2 average
    pooling. A convolutional operation is needed here because
    F is assumed to have the same input and output sizes. The
    next process is using GUNN to update this feature space
    gradually. Each channel will only be updated once, and all
    channels will be updated after this process. Unlike most of
    the previous networks, after this two processes, the feature
    transformations at this resolution are complete. There will
    be no more convolutional layers or blocks following this feature representation, i.e., one resolution, one representation.
    Then, the network will compute the initial features for the
    next resolution, or compute the final vector representation of
    the entire image by a global average pooling. By designing
    networks in this way, SUNN networks usually have about
    20 layers before converting to GUNN-based networks.
    '''
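
    The formatting steps above can also be written as small pure functions, which makes them easy to try out without touching the clipboard. The following is a sketch under my own assumptions, not the original program: the function names are invented here, and citations are dropped before the sentence split (the original does it the other way round).

```python
import re

ABBREVIATIONS = ("e.g.", "i.e.", "Eq.", "Mr.")
CITATION_HINTS = ("et al.", ", 201", ", 200", ", 199")
PAD = "\0"  # placeholder character that should not occur in normal text


def join_lines(text):
    """Turn the hard line breaks of a PDF copy into spaces."""
    return text.replace("\r", "").replace("\n", " ")


def drop_citations(text, hints=CITATION_HINTS):
    """Remove top-level (...) groups that contain one of the hints."""
    kept, group, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        if depth > 0:
            group.append(ch)   # inside a bracket group
        else:
            kept.append(ch)    # ordinary text, including a stray ")"
        if ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                chunk = "".join(group)
                if not any(hint in chunk for hint in hints):
                    kept.append(chunk)  # not a citation, keep it
                group = []
    kept.extend(group)  # unbalanced "(": keep the remaining text unchanged
    return "".join(kept)


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Put each sentence on its own line, leaving abbreviations intact."""
    for abbr in abbreviations:
        text = text.replace(abbr + " ", abbr + PAD)
    text = text.replace(". ", ".\n")
    return text.replace(PAD, " ")


def tidy(text):
    """Join lines, drop citations, collapse spaces, split sentences."""
    text = drop_citations(join_lines(text))
    text = re.sub(r" {2,}", " ", text)
    return split_sentences(text)


print(tidy("Deep nets work well\r\n(He et al., 2016). They keep\r\n"
           "going deeper, e.g. 1001 layers."))
```

    As in the demo output of the original tool, a space is left in front of the period where a citation was removed.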
    
    

    The program that calls Google Translate is existing code I found online and modified slightly.
    Original: https://blog.csdn.net/yingshukun/article/details/53470424

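    translate.py
    I changed how the returned data is handled:
    The response is a list of length 9. Its first element holds the translation; the rest (alternative translations and so on) is not needed.
    That first element, which the code calls result, is itself a list whose length is the number of lines or sentences plus one; the last entry is the romanization (pinyin) of the translation.
    Joining the translated text of the entries in result[:-1] gives exactly what we want.
    This file can be run directly to test the translation.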
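
    To make the nesting concrete, here is a small sketch of that extraction step run on a hand-made stand-in for the response. The sample data is invented for illustration and only mimics the shape described above (a real response has nine top-level elements); extract_translation is a name made up here.

```python
def extract_translation(payload):
    """Join the translated segments found in the first element of payload."""
    segments = payload[0]
    # The last entry holds the romanization, so it is skipped.
    return "".join(segment[0] or "" for segment in segments[:-1])


# Hypothetical, hand-written sample; NOT a captured server response.
sample = [
    [
        ["你好。", "Hello. "],               # [translated text, source text]
        ["再见。", "Goodbye."],
        [None, None, "Nǐ hǎo. Zàijiàn."],    # romanization entry
    ],
    None,
    "en",
]

print(extract_translation(sample))
```

    The `or ""` guard is a defensive assumption in case a segment carries no translated text.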

    # -*- coding: utf-8 -*-
    """
    Created on Tue Oct 23 18:58:26 2018
    
    @author: BigFly
    """
    
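    import requests
    from HandleJs import Py4Js
    
    js=Py4Js()
    
    # Query parameters other than tk and q; dt is repeated once per value.
    BASE_PARAMS = {
        'client': 't', 'sl': 'en', 'tl': 'zh-CN', 'hl': 'zh-CN',
        'dt': ['at', 'bd', 'ex', 'ld', 'md', 'qca', 'rw', 'rm', 'ss', 't'],
        'ie': 'UTF-8', 'oe': 'UTF-8', 'clearbtn': 1, 'otf': 1, 'pc': 1,
        'srcrom': 0, 'ssel': 0, 'tsel': 0, 'kc': 2,
    }
    
    def google_translate(content):
        if len(content) > 4891:
            print("The text is too long to translate!")
            return None
        # tk is a token computed from the text being translated.
        param = dict(BASE_PARAMS, tk=js.getTk(content), q=content)
        result = requests.get("http://translate.google.cn/translate_a/single",
                              params=param).json()[0]
        # The response is JSON that parses into a nested list; the last entry
        # of result is the romanization, so it is left out.
        return "".join([text[0] for text in result[:-1]])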
    
    if __name__ == "__main__":    
        content = """An old woman had a cat. 
    The cat was very old; she could not run quickly, and she could not bite, because she was so old. 
    One day the old cat saw a mouse; she jumped and caught the mouse. 
    But she could not bite it; so the mouse got out of her mouth and ran away, because the cat could not bite it.
    Then the old woman became very angry because the cat had not killed the mouse. 
    She began to hit the cat. The cat said, "Do not hit your old servant. 
    I have worked for you for many years, and I would work for you still, but I am too old. 
    Do not be unkind to the old, but remember what good work the old did when they were young."""
        print(google_translate(content))
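
    google_translate() simply gives up when the text is longer than 4891 characters. One possible way around that, sketched here as an assumption and not part of the original tool, is to cut the formatted text (one sentence per line) into chunks that each stay under the limit and translate them one by one. chunk_sentences is a name invented for this sketch.

```python
def chunk_sentences(text, limit=4891):
    """Split text at line boundaries into chunks of at most `limit` chars."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if len(line) > limit:
            raise ValueError("a single line is longer than the limit")
        if len(current) + len(line) > limit:
            # The next line would not fit: close the current chunk.
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


# Small limit used only to make the behaviour visible.
print(chunk_sentences("aaaa\nbbbb\ncccc\n", limit=10))
```

    Each chunk could then be passed to google_translate() and the results joined; splitting only at line breaks keeps every sentence in one piece.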
    

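    HandleJs.py
    This part uses JavaScript to generate the tk code. The tk value is computed from the text submitted for translation and seems to act as a kind of checksum; I have not looked into how it works.
    Note that the package providing the execjs module is called PyExecJS: pip install PyExecJS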

    # -*- coding: utf-8 -*-
    """
    Created on Tue Oct 23 18:57:54 2018
    
    @author: BigFly
    """
    import execjs
     
    class Py4Js():
        def __init__(self):
            self.ctx = execjs.compile("""
            function TL(a) {
            var k = "";
            var b = 406644;
            var b1 = 3293161072;
            
            var jd = ".";
            var $b = "+-a^+6";
            var Zb = "+-3^+b+-f";
        
            for (var e = [], f = 0, g = 0; g < a.length; g++) {
                var m = a.charCodeAt(g);
                128 > m ? e[f++] = m : (2048 > m ? e[f++] = m >> 6 | 192 : (55296 == (m & 64512) && g + 1 < a.length && 56320 == (a.charCodeAt(g + 1) & 64512) ? (m = 65536 + ((m & 1023) << 10) + (a.charCodeAt(++g) & 1023),
                e[f++] = m >> 18 | 240,
                e[f++] = m >> 12 & 63 | 128) : e[f++] = m >> 12 | 224,
                e[f++] = m >> 6 & 63 | 128),
                e[f++] = m & 63 | 128)
            }
            a = b;
            for (f = 0; f < e.length; f++) a += e[f],
            a = RL(a, $b);
            a = RL(a, Zb);
            a ^= b1 || 0;
            0 > a && (a = (a & 2147483647) + 2147483648);
            a %= 1E6;
            return a.toString() + jd + (a ^ b)
        };
        function RL(a, b) {
            var t = "a";
            var Yb = "+";
            for (var c = 0; c < b.length - 2; c += 3) {
                var d = b.charAt(c + 2),
                d = d >= t ? d.charCodeAt(0) - 87 : Number(d),
                d = b.charAt(c + 1) == Yb ? a >>> d: a << d;
                a = b.charAt(c) == Yb ? a + d & 4294967295 : a ^ d
            }
            return a
        }
        """)
            
        def getTk(self,text):
            return self.ctx.call("TL",text)
        
    

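    Demo:

    Select text in the PDF and copy it.
    Run Clipboard.py and both the English and the Chinese results are printed. Sentences are split onto separate lines and the bracketed citations are removed, which reads much cleaner.

    The formatted English text is also put back on the clipboard so it can be pasted elsewhere directly (handy for making slides):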

    Deep neural networks have become the state-of-the-art systems for image recognition as well as other vision tasks .
    The architectures keep going deeper, e.g., from five convolutional layers to 1001 layers .
    The benefit of deep architectures is their strong learning capacities because each new layer can potentially introduce more non-linearities and typically uses larger receptive fields .
    In addition, adding certain types of layers will not harm the performance theoretically since they can just learn identity mapping.
    This makes stacking up layers more appealing in the network designs.

    Hmm, I should still study English properly rather than rely on this.
