跨语言大数据驱动未来：翻译行业巨头讲述心得

作者: 甜菜的垂耳兔 | 发表于2017-12-13 10:23

TAUS spoke with Eric Yu, CEO of GTCOM, Global Tone Communication Technology Co., Ltd, the world’s largest services provider in the combined field of translation, big data and artificial intelligence.

spoke with 对话

TAUS 翻译自动化用户协会 (Translation Automation User Society)

Technology Co., Ltd 科技股份有限公司

TAUS（翻译自动化用户协会）对话中译语通科技股份有限公司（“中译语通”）CEO于洋。中译语通是行业领先的语言、大数据和人工智能服务供应商。

At the core of the conversation lies this question: “What is the single biggest lesson that you have learnt about the translation industry?”

对话首先围绕一个核心问题展开—TAUS：“您对翻译行业最重要的心得体会是什么？”

What I have learnt most about the translation industry as such is that it can be considered as just one part of a much more all-encompassing “industry” that we call cross-language big data.

我认为最主要的是：翻译行业是一个综合性“行业”的一部分，这个“行业”就是我们所说的“跨语言大数据”。

I come from a background deeply rooted in the translation industry. Having majored in conference interpreting, I worked for China Translation Corporation, served as the Head of Conference Interpreting, CMO, and Assistant President and then Vice President of CTC /(此处可断句翻译)which is the largest LSP in China. I was appointed as the CEO of GTCOM when it was incorporated as a subsidiary of CTC in 2013. What I have realized in this career is that today many related language phenomena can converge in this concept of language as data in a world with artificial intelligence.

我本身跟翻译行业渊源颇深(come from a background deeply rooted in the translation industry.)。我学的是会议口译专业，毕业后加入中国对外翻译有限公司（“中译”），先后担任(当过很多职位，可以用先后)首席同传、会议口译部主任、市场总监、总经理助理和副总经理等职务。中译是中国最大的语言服务供应商(LSP language service provider)，2013年，中译成立子公司(subsidiary)中译语通，我被任命为CEO。这些年中( in this career)，我认识到许多相关的语言现象都可以作为语言数据归结(converge in this concept of language )到人工智能这个行业中。(补范畴词)

As we all know, machine translation is one of the most complicated parts of NLP and artificial intelligence. We call it “the jewel in the crown” of NLP. We started to invest deeply in R&D for machine translation in 2014, giving us a chance to better understand NLP, speech recognition, big data and artificial intelligence.

众所周知，机器翻译是自然语言处理(natural language processing)（NLP）和人工智能（AI）最复杂的部分，我们称之为“皇冠上的宝石”。2014年(时间在前)，我们开始投入大量资金(invest deeply)进行机器翻译研发(research and development)，因此对NLP、语音识别(speech recognition)、大数据和人工智能有了(giving us a chance)更深刻的认识(better understand )。

Then in October 2015, we put forward the “cross-language big data” concept, which basically involves managing data on an Internet scale. For example, instead of searching in English or Russian or Chinese using input terms and finding content only in those languages, what if we simply let people input a word in their language but(翻译顺序发生改变) eliminated all those language labels? That way we would be able to have access to all relevant results in any language - Chinese, English, Russian, German and so on. And what if we were then able to analyze all those data quantitatively and qualitatively? That is the essence of our cross-language big data concept.

2015年10月，我们提出了“跨语言大数据”的概念，这一概念基本上包含了网络规模上的数据管理。如果(instead of……only find 并没有)我们用英语、俄语或中文进行搜索(using input terms)，得到的只能是相应语种的内容，那么如果消除所有语言标签，只是简单地让大家输入( a word in their language)他们自己的语言，结果会怎么样呢？这样我们可以获取(have access to )中文、英语、俄语、德语(中文先分后总)及其他任意一种语言的所有相关结果。如果我们能够定量、定性(quantitatively and qualitatively)地分析所有这些数据，又会怎么样呢？这就是我们跨语言大数据概念的实质所在(essence)。
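The "search without language labels" idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny `LEXICON` stands in for a real-time machine translation engine, `DOCS` is an invented four-language mini-corpus, and the function names are hypothetical. It is not GTCOM's implementation.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real-time MT engine.
LEXICON = {
    "translation": {
        "en": "translation",
        "zh": "翻译",
        "ru": "перевод",
        "de": "Übersetzung",
    },
}

# Hypothetical mini-corpus; language is kept only as metadata, not as a filter.
DOCS = [
    {"id": 1, "lang": "en", "text": "Machine translation is improving quickly."},
    {"id": 2, "lang": "zh", "text": "机器翻译是自然语言处理的重要部分。"},
    {"id": 3, "lang": "ru", "text": "Машинный перевод развивается."},
    {"id": 4, "lang": "de", "text": "Die Übersetzung wurde gestern geliefert."},
    {"id": 5, "lang": "en", "text": "Big data platforms need storage."},
]


def expand_query(term):
    """Return the query term plus its known equivalents in other languages."""
    variants = LEXICON.get(term.lower(), {})
    return set(variants.values()) | {term}


def cross_language_search(term, docs=DOCS):
    """Match documents in any language against every variant of the term."""
    variants = {v.lower() for v in expand_query(term)}
    return [d for d in docs if any(v in d["text"].lower() for v in variants)]


def count_by_language(hits):
    """A very small quantitative view: number of hits per language."""
    return Counter(d["lang"] for d in hits)


hits = cross_language_search("translation")
print([d["id"] for d in hits])
print(dict(count_by_language(hits)))
```

The user types one word in one language, and results come back from the English, Chinese, Russian and German documents alike. The per-language count is only a stand-in for the quantitative analysis the interview mentions.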

Naturally this involves using real-time machine translation. But that is only a kind of “language switch.” I began to wonder what we could achieve if we could analyze all the texts discovered from one search term, or from one piece of news returned to the user – analyze the persons involved, the time, location, entities, and all the other knowledge and information contained in that news item or document? What if we then extended this search for data by being able to grasp all the data from the past ten or twenty years and analyze them in the same way - qualitatively and quantitatively? That capability is lacking, so we are trying to offer it through this concept.

从本质上来说(Naturally)，这涉及到实时机器翻译的应用(using 将动词翻译为名词，灵活使用)，但这仅仅是一种“语言转换”。我们开始思考(wonder)，假如能够从一个术语搜索到所有相关文本(analyze all the texts discovered from one search (名词做动词，因此可以不译discovered from，抓强势动词) term)，或者能够根据用户收到(returned to)的一条新闻来分析相关人员、时间、位置、实体以及包含的其他知识或信息(contained in that news item or document?(这个重复了上述的新闻，英文重复性强。因此要适当删减))，会有怎样的结果？如果我们能够扩大数据搜索范围(扩大后面搭配为范围，可以说是范畴词，而不能说扩大搜索，注意动宾搭配)(extended this search for data)，获取近10年或20年的所有数据，并以同样的方式对其进行定量定性分析，又会有怎样的结果？目前这个领域是空白(capability is lacking,(能力是欠缺的，翻译出来不顺。可以译为领域空白，上文就是围绕这个领域开展))，所以我们试图来实现(offer it(赋予这个能力就是实现，注意灵活性))这一概念。

How does your background and interest in interpretation fit into this vision?

您是如何将口译的(状语可以当成修饰词提前)专业背景和兴趣与这一理念联系到一起(fit into (fit into原来意思是融入一体))的呢？

In early 2013, we launched our Global Multilingual Call Center. Previously in 2008 when I was a member of the Global Advisory Committee on language line services, I proposed that we try to see whether simultaneous interpreting could be provided over the phone. At that time it seemed like a new idea, and it still is.

2013年初，我们推出了全球多语言呼叫中心。早在2008年(Previously in 2008)，我在奥运会全球顾问委员会(Global Advisory Committee on language line services)担任委员(a member of 中文喜欢用动词，因此不说我是……的成员，而是担任……职务)的时候，就提出我们应该试试通过电话进行同声传译(simultaneous interpreting could be provided over the phone. 被动改主动)。那时候这个想法很超前(a new idea)，事实上现在也是。

So we started to build our multilingual call center in 2012 and when we first started GTCOM as an independent company, we launched our Global Multilingual Call Center. Following recent developments, I think this is now the largest Call Center in China. It provides services in 12 languages and receives an average of 500,000 minutes of calls per month from all over the world. For example, we have more than 400 interpreters working exclusively for China UnionPay. Moreover, we also provide big data analysis. Every month, we analyze all the call big data and generate reports, so this Center is no longer just a call center in the traditional sense, but increasingly a big-data center.

因此，2012年，我们开始筹备(过渡的词语)成立多语言呼叫中心(成立中心而不是建立中心)。2013年初，中译语通作为独立公司成立(when we first started GTCOM as an independent company,此处省略了we，中心作为主语，也是中文的常见翻译法)，我们设立了全球多语言呼叫中心。近几年，它已经发展为中国最大的呼叫中心(Following recent developments 时刻记住词性改变)，提供12种语言服务，平均每月收到全球各地客户50万分钟的呼叫。比如，我们有400余位口译员专门(exclusively)为中国银联(China UnionPay)提供服务。此外，我们还为客户提供大数据分析。每个月我们都对所有呼叫大数据进行分析(analyze)并生成报告(generate reports 不是产生报告而是生成报告)，从这点上讲(这是中文另外加入的一点，让表达更加通畅)，呼叫中心已经不是传统意义上的呼叫中心，而是正在发展为一个大数据中心。

In terms of size and volume how would you compare GTCOM to its closest competitors?

从规模(size and volume 大小和数量可以归结为规模)上讲(In terms of(翻译成从……上讲))，您如何将中译语通与你们最强劲的对手进行对比？

As I said, GTCOM is now a big data company and far bigger than any other translation company in China – and maybe in many other countries!

如我所说，中译语通现在是一家大数据公司，规模及业务范围(英文中省略的主语在中文中要补充)远远超过(far bigger than 不要简单翻译大的多)中国任何一家翻译公司，也许还超过许多其他国家和地区的翻译公司。

We started working on machine translation, for example, in 2014 and on big data in 2015. Each year we have invested on average about USD 30 million on R&D in machine translation. We have already filed a dozen patents for MT-related technologies. And our machine translation supports a total of 33 different languages, including Chinese, English, German, French, Japanese, Arabic, Portuguese, Russian, Korean, etc. Among them, about 25 languages can be translated with our own Neural Machine Translation engine.

例如，我们从2014年开始致力于(working on )机器翻译，2015年开始开展大数据业务(英文中常常用介词来代替动词。因此翻译成中文时要具体根据名词来选择搭配选择，此处的开展业务就是一例)，平均每年投入3000万美元用于研发。公司已经申请了十几项机器翻译技术(MT-related technologies)的专利(filed a dozen patents)。目前，我们的机器翻译支持33种语言，包括中文、英语、德语、法语、日语、阿拉伯语、葡萄牙语、俄语、韩语等；其中，25种语言采用了我们自主研发的(own)神经网络机器翻译(Neural Machine Translation engine.)技术(with 看到介词要注意是否翻译成动词)。

In addition to our translation and Call Center activities, we also provide video localization services, which are part of our translation services in China. About 85% of the work in this area is carried out using our YeeCaption toolkit, a one-stop smart subtitle translation software.

除了(In addition to)翻译业务和呼叫中心(activities)，我们还提供视频本地化(video localization)服务，其中(in this area)约85%的内容是通过我们的一站式智能字幕(后面补充语，可以用破折号来解释)翻译工具——字幕通（YeeCaption）完成的。

Let me give you an example of a video job from March 2017. Our client planned to launch a short video clip business on a platform rather like YouTube /which introduced huge amounts of videos from overseas into China. So we mobilized nearly 700 translators and localized about 830 hours of multilingual videos from a wide variety of content categories in just ten days. On top of the localization, we have become an IP provider, signing up the IP copyright partnerships with 37 of the biggest IP providers overseas, making us the exclusive operator for their video content in China. So once again, we have decided to go beyond the technical art of localization, and work in distribution, and production as well.

我来举个视频翻译(video job，英语为了避免重复常用代替的词语。要注意辨别)的例子。2017年3月，我们的客户从海外引进海量短视频，计划在类似YouTube的平台上推出短视频业务(launch a short video clip business)。这些视频种类多样、内容丰富，并且涉及多个语种。我们启用了将近700位翻译，仅用10天时间就完成了时长约830个小时的视频本地化翻译(localized)。此外(On top of the localization)，我们已经成为IP供应商，与海外37家最大的IP供应商签署了版权合作协议(signing up the IP copyright partnerships )，成为他们在中国的独家(exclusive)经营商。因此，我们已经超越了本地化的技术服务(technical art)，进一步向发行与制作(distribution and production)方面进军(in又是介词改动词)。

Your ambitions are clearly far greater than just providing a translation service. Do you feel ready to take on technology companies such as Google yet?

中译语通追求的(Your ambitions are clearly ……的雄心壮志就等于追求的)已经不仅仅(far greater than)是提供翻译服务了。您认为你们已经做好准备与Google这样的技术公司进行竞争(take on 说实话，有点不明觉厉)了吗？

Unlike Google and other MT technology providers, we provide a domain-specific MT engine. In certain fields, our MT delivers higher quality as it can be tailored to news domains such as financial news, military news and others. In addition to our machine translation, we can collect data in about 65 different languages from 200+ countries, and keep updating this daily. Globally, we regularly update about 30 million articles and 500 million social media messages daily. According to data analytics firm Palantir, this is estimated to be worth $20 billion.

我们(每句话先想一想主语)与Google和其他机器翻译技术供应商不同，我们提供特定领域(domain-specific)的机器翻译引擎。比如在金融、军事等领域可以进行定制(be tailored to)，以便提供更高质量的服务。除了机器翻译，我们还能收集全球200多个国家和地区的65种语言，并且每天进行更新。全球范围内(Globally)，我们的新闻日更新和处理能力达3000多万篇，社交数据日更新和处理能力达5亿条。(data analytics firm 数据分析公司)

For us, cross-language big data is unstructured open-source data which are all related to open source news and social media such as Twitter, Facebook, and WeChat and Weibo in China. We can analyze each piece of unstructured data. Our current line of big data products includes JoveBird, which is designed to take advantage of big data and AI technologies to offer financial investment solutions. Using a set of financial analysis models and powerful cross-language big data processing capability, it helps investors analyze stock-price trends and strategize their investments.

对我们而言，跨语言大数据是非结构化开源数据(unstructured open-source data )，包括开源新闻数据以及Twitter、Facebook、微信、微博等社交媒体数据。我们可以分析每一条(each piece of)非结构化数据。比如我们的大数据产品JoveBird，通过一系列金融分析模型和强大的跨语言大数据处理能力，可以帮助投资人分析股价趋势并制定投资战略(strategize their investments. 制定战略 动词改名词)。

We are indeed seeking new partners and targets, for example in localization. We’re also looking at advertising, consulting, and big data companies. China is already a huge market, and we are able to respond to the huge local-market demand of our customers. On the other hand, we have more than 10 products in our big data line-up, with the big data analytical platform for social media, and a news toolkit. We also have an industrial big data platform, a mining platform and a series of other big data platforms - all leading big data technologies. So what we hope is that together with our target partners, we will be able to provide them with cutting-edge data technologies, and help them first explore the Chinese market and later the global market.

事实上，我们很希望与全球伙伴合作(seeking new partners and targets)，包括本地化(localization)公司、广告公司、咨询公司和大数据公司等等，帮助他们在中国市场发展。中国是一个很大的市场，我们能够很好地对客户千变万化的市场需求做出回应。(we are able to respond to (很好的做出回应) the huge (翻译成了千变万化) local-market demand of our customers.)我们现在拥有10余个(more than 10 products 超过十个不怎么顺溜，用十余个比较好)大数据产品，面向(for 翻译成面向 social media)政府、企业、新闻媒体等行业的大数据分析平台，还拥有工业大数据、数据采集平台(mining platform)及其他大数据平台。我们希望能与我们合作伙伴一起，运用我们最先进的数据技术(cutting-edge data technologies)，来帮助他们打开(first explore)中国市场并扩展(later 一个later也要保证动词，英文常常省略)全球市场。

Is translation slowly becoming a smaller part of your business while big data and AI grow bigger?

你们的大数据和人工智能业务发展越来越大(while断句，一般后面为前提，先翻译后面的)，翻译业务占据的比例是否会逐渐缩小(slowly becoming a smaller part of 英语常用名词，中文的话可以翻译成动词)？

Yes, our language services are playing a smaller role in overall GTCOM revenue, but at the same time, this market is set to grow significantly. Language services are at a completely different market level compared to big data and AI which are growing much faster. What I’d like to highlight is that the language industry should be developed in an artificial intelligence direction, not in the traditional way. As I said in my keynote at the FIT congress in Australia, we must be open to AI and all work and grow together with it for the future of the language industry. We work, for example, with such large players as Haier, GE, Alibaba and many other industry clients. Beneath our entire big data platform, we now have a language technology infrastructure. So our big data platform can handle our key asset - cross-language big data.

目前，语言服务在中译语通整体收入中的确只占据很小一部分(are playing a smaller role 同上也是名词改动词的方法)，但未来，这个市场会有极大的发展。与发展迅速的大数据和人工智能相比，语言服务市场同前沿技术的融合相对缓慢(are at a completely different market level 这里难以理解，我只会翻译成两者的水平并不在同一个层面上)。我想强调的是语言行业应该借助人工智能，而不是继续遵循传统方式( in the traditional way 动词和名词转换)。就像我在澳大利亚的世界翻译大会（FIT）主题演讲中提到的一样，我们必须迎接( be open to 迎接就是开放怀抱)人工智能，与其共同成长，为语言服务创建更好的未来(for the创建(动词) future of the language industry.)。我们之所以可以与海尔、通用电气(GE general electric)、阿里巴巴等这样的行业巨头进行合作(clients 有客户即与……合作)，是因为(之所以……是因为，是根据原文逻辑关系推出的)我们拥有整个大数据平台背后的语言技术，也正因为如此(so)，我们的平台才能处理跨语言大数据，这就是(破折号的作用，顺序相反)我们的核心价值。(key asset - cross-language big data.)

That’s the future. Take our industrial big data platform as an example, all those language technologies have been automatically embedded in the big data platform. Underneath the platform we have the language technology infrastructure. In this way, we have established a link between language and data, initiating a brand new future for the language service industry.

我们可以预见未来(That’s the future.)，以我们的大数据行业平台为例，所有这些语言技术已经自动嵌入(embedded in)大数据平台。通过这种方式，我们已经将语言和数据连接起来(established a link 不说建立联系，不像中国话，名词改动词)，为语言服务行业打造全新的未来(initiating a brand new future (打造未来，原意为开始，要灵活翻译))。
