Bilingual: Conversational Computing
Published: 6 July 2018
Posted by: nanyuzi

Conversational Computing

会话式计算

Voice technology is making computers less daunting and more accessible

语音技术让计算机不那么令人生畏,且更易接近

Any sufficiently advanced technology, noted Arthur C. Clarke, a British science-fiction writer, is indistinguishable from magic. The fast-emerging technology of voice computing proves his point. Using it is just like casting a spell: say a few words into the air, and a nearby device can grant your wish.

英国科幻作家亚瑟·克拉克曾说过,任何足够先进的科技看起来都与魔法无异。迅速兴起的语音计算技术证明了他的观点。使用语音就如念出咒语:对着空中说几个词,身旁的设备就能满足你的愿望。

The Amazon Echo, a voice-driven cylindrical computer that sits on a table top and answers to the name Alexa, can call up music tracks and radio stations, tell jokes, answer trivia questions and control smart appliances; even before Christmas it was already resident in about 4% of American households. Voice assistants are proliferating in smartphones, too: Apple’s Siri handles over 2bn commands a week, and 20% of Google searches on Android-powered handsets in America are input by voice. Dictating e-mails and text messages now works reliably enough to be useful. Why type when you can talk?

亚马逊Echo是一台置于桌面、由语音驱动的圆柱形电脑,会对Alexa这个名字做出回应。它能播放音乐和广播、讲笑话、回答冷知识问题,还能控制智能家电;圣诞节还没到它就已经入住了约4%的美国家庭。智能手机里的语音助手也在激增:苹果的Siri每周处理超过20亿条指令,美国安卓手机上20%的谷歌搜索由语音输入。如今,口述电子邮件和短信已经可靠到足以实用。能说话的时候为什么还要打字呢?

This is a huge shift. Simple though it may seem, voice has the power to transform computing, by providing a natural means of interaction. Windows, icons and menus, and then touchscreens, were welcomed as more intuitive ways to deal with computers than entering complex keyboard commands. But being able to talk to computers abolishes the need for the abstraction of a “user interface” at all. Just as mobile phones were more than existing phones without wires, and cars were more than carriages without horses, so computers without screens and keyboards have the potential to be more useful, powerful and ubiquitous than people can imagine today.

这是个巨大的转变。尽管看似简单,但通过提供一种自然的互动方式,语音有能力改变计算的形态。视窗、图标和菜单,以及后来的触摸屏,都因为比输入复杂的键盘命令更为直观而受到欢迎。但是,能对计算机说话,就彻底消除了对“用户界面”这一抽象概念的需要。正如手机远不只是去掉了电线的电话,汽车远不只是无马的马车,没有了显示屏和键盘的电脑有潜力变得比人们今天所能想象的更有用、更强大且无处不在。

Voice will not wholly replace other forms of input and output. Sometimes it will remain more convenient to converse with a machine by typing rather than talking (Amazon is said to be working on an Echo device with a built-in screen). But voice is destined to account for a growing share of people’s interactions with the technology around them, from washing machines that tell you how much of the cycle they have left to virtual assistants in corporate call-centres. However, to reach its full potential, the technology requires further breakthroughs – and a resolution of the tricky questions it raises around the trade-off between convenience and privacy.

语音不会完全取代其他形式的输入和输出。和机器交流,有时打字仍然会比说话更方便(据说亚马逊正在研发带嵌入屏幕的Echo)。但是在人们与身边科技设备的互动中,语音必将占据越来越大的份额,无论是与告诉你还需要多久洗完衣物的洗衣机互动,还是和企业热线的虚拟助手交谈。不过,要充分发挥潜能,这项技术还需要更多突破,而且必须解决由它引出的棘手问题——在便利性和隐私之间权衡。

Alexa, what is deep learning?

Alexa,深度学习是什么?

Computer-dictation systems have been around for years. But they were unreliable and required lengthy training to learn a specific user’s voice. Computers’ new ability to recognise almost anyone’s speech dependably without training is the latest manifestation of the power of “deep learning”, an artificial-intelligence technique in which a software system is trained using millions of examples, usually culled from the internet. Thanks to deep learning, machines now nearly equal humans in transcription accuracy, computerised translation systems are improving rapidly and text-to-speech systems are becoming less robotic and more natural-sounding. Computers are, in short, getting much better at handling natural language in all its forms.

计算机语音识别系统已出现多年。但在以前它并不可靠,而且需要漫长的训练才能学会识别特定使用者的语音。如今计算机无需训练即能可靠识别几乎任何人的语音,这一新能力是“深度学习”力量的最新体现。深度学习是一种人工智能技术,用通常来自互联网的数百万个范例来训练某个软件系统。正因为有了深度学习,现在的机器将语音转为文字的准确度才堪比人类。计算机翻译系统也正迅速改进,而把文字转为语音的系统也变得越来越不那么机器腔,听起来更加自然。简而言之,计算机在处理各种形式的自然语言时表现大幅提升。

Although deep learning means that machines can recognise speech more reliably and talk in a less stilted manner, they still don’t understand the meaning of language. That is the most difficult aspect of the problem and, if voice-driven computing is truly to flourish, one that must be overcome. Computers must be able to understand context in order to maintain a coherent conversation about something, rather than just responding to simple, one-off voice commands, as they mostly do today (“Hey, Siri, set a timer for ten minutes”). Researchers in universities and at companies large and small are working on this very problem, building “bots” that can hold more elaborate conversations about more complex tasks, from retrieving information to advising on mortgages to making travel arrangements. (Amazon is offering a $1m prize for a bot that can converse “coherently and engagingly” for 20 minutes.)

尽管深度学习让机器能更可靠地识别语音、说话也不那么生硬,但它们仍然无法理解语言的含义。这是这个问题中最棘手的部分,而如果语音驱动的计算要真正蓬勃发展,就必须克服这一难关。要围绕某个话题进行连贯的对话,计算机必须能够理解上下文,而不是像如今大多数时候那样,仅仅对简单的一次性语音指令做出回应(比如,“Hey,Siri,设一个十分钟的计时器”)。各大院校和大小公司的研究人员都在钻研这一问题,努力开发能就更复杂的任务进行更详尽对话的“机器人”,无论是做信息检索、房贷咨询还是安排旅行。(亚马逊悬赏一百万美元,征集能够“连贯生动地”交谈20分钟的机器人。)

When spells replace spelling

当咒语取代拼写

Consumers and regulators also have a role to play in determining how voice computing develops. Even in its current, relatively primitive form, the technology poses a dilemma: voice-driven systems are most useful when they are personalised, and are granted wide access to sources of data such as calendars, e-mails and other sensitive information. That raises privacy and security concerns.

在决定语音计算如何发展上,消费者和监管机构也将扮演一定的角色。即便是在目前相对初级的阶段,这一技术也已陷入了进退两难的窘境:语音驱动系统若要发挥最大的作用,就得个人化并能获准访问各种数据源,如日历、电子邮件和其他敏感的信息。这引发了对隐私和安全的担忧。

To further complicate matters, many voice-driven devices are always listening, waiting to be activated. Some people are already concerned about the implications of internet-connected microphones listening in every room and from every smartphone. Not all audio is sent to the cloud – devices wait for a trigger phrase before they start relaying the user’s voice to the servers that actually handle the requests – but when it comes to storing audio, it is unclear who keeps what and when.

让情况变得更加复杂的是,很多语音驱动设备始终在聆听,等待被启动。有些人已经担心在每个房间里、每台智能手机上时刻倾听的联网麦克风将产生怎样的影响。并非所有声音都会被传到云端:设备要等听到“触发短语”之后,才开始把用户的语音传送给真正处理请求的服务器。但说到储存声音,谁在何时保存了什么,目前并不清楚。

Police investigating a murder in Arkansas, which may have been overheard by an Amazon Echo, have asked the company for access to any audio that might have been captured. Amazon has refused to co-operate, arguing (with the backing of privacy advocates) that the legal status of such requests is unclear. The situation is analogous to Apple’s refusal in 2016 to help FBI investigators unlock a terrorist’s iPhone; both cases highlight the need for rules that specify when and what intrusions into personal privacy are justified in the interests of security.

阿肯色州的警方正在调查一宗谋杀案,案发过程可能被一台亚马逊Echo听到,因此要求亚马逊提供可能捕捉到的任何录音。亚马逊拒绝合作,并(在隐私保护倡导者的支持下)辩称此类要求的法律地位尚不明确。这一情况与2016年苹果拒绝帮助FBI调查员解锁一名恐怖分子的iPhone相似。两起事件都突显出需要制定规则,明确出于安全考虑,在何时、以何种方式侵入个人隐私才算正当。

Consumers will adopt voice computing even if such issues remain unresolved. In many situations voice is far more convenient and natural than any other means of communication. Uniquely, it can also be used while doing something else (driving, working out or walking down the street). It can extend the power of computing to people unable, for one reason or another, to use screens and keyboards. And it could have a dramatic impact not just on computing, but on the use of language itself. Computerised simultaneous translation could render the need to speak a foreign language irrelevant for many people; and in a world where machines can talk, minor languages may be more likely to survive. The arrival of the touchscreen was the last big shift in the way humans interact with computers. The leap to speech matters more.

即便这样的问题尚未解决,消费者仍会接纳语音计算。在很多场合下,语音比其他任何交流方式都方便得多也自然得多。而且可以在做其他事的同时(如开车、健身或者走在路上时)使用语音,这一点独一无二。它可以让计算的力量惠及因种种原因无法使用屏幕和键盘的人。而且,它的巨大影响不仅限于计算,还会波及语言使用本身。计算机同声传译可能让很多人不再需要会说外语;在一个机器会说话的世界里,小语种或许更有可能幸存下来。触摸屏的出现是人机交互方式的上一次重大转变。向语音的飞跃意义更为重大。


Download the English and Chinese versions: http://www.yingyushijie.com/shop/source/detail/id/557.html