
A Detailed Explanation of How CAPTCHAs Work

I'm not a robot

Have you ever been surfing the internet and come across one of these boxes that says “I’m not a robot”?
So you check the box and go on your way.
But how the heck does this box know whether you’re a robot or not, and why does it matter?
Well, to answer that, we actually have to start with these:
They’re called CAPTCHAs: Completely Automated Public Turing test to tell Computers and Humans Apart.
They were invented in 2003 by Luis von Ahn and his team of researchers at Carnegie Mellon University.
The whole point of these distorted pieces of text was to stop spam on the internet, like preventing scalpers from writing a computer program that buys every ticket in a fraction of a second.
They worked because humans could read the distorted text, yet computers and bots couldn’t.
[You shall not pass!]
So if you want to stop bots from buying concert tickets or setting up email addresses, you just have to make filling out a CAPTCHA part of the process.
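To make "part of the process" concrete, here is a minimal sketch of a purchase step gated behind a CAPTCHA. Everything in it is an assumption for illustration: the names `issue_challenge` and `buy_ticket` and the in-memory store are hypothetical, and a real system would render the challenge text as a distorted image rather than return it directly.

```python
import secrets
import string

# Hypothetical in-memory store mapping a challenge id to its expected text.
_pending = {}

def issue_challenge(length=6):
    """Create a random challenge; a real site would show `text` as a distorted image."""
    alphabet = string.ascii_uppercase + string.digits
    text = "".join(secrets.choice(alphabet) for _ in range(length))
    challenge_id = secrets.token_hex(8)
    _pending[challenge_id] = text
    return challenge_id, text

def buy_ticket(challenge_id, typed_answer):
    """Complete the purchase only if the typed answer matches the challenge."""
    # pop() makes each challenge single-use, so a solved one cannot be replayed.
    expected = _pending.pop(challenge_id, None)
    if expected is None or typed_answer.strip().upper() != expected:
        return "rejected"
    return "ticket purchased"
```

The single-use lookup is the design choice worth noting: without it, a bot could solve one challenge and reuse it for every ticket.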
So fast forward, and now millions of CAPTCHAs are being solved every single day by internet users, and von Ahn started to think: can we do something useful with all this great power?
[And the answer to that is yes, and this is what we’re doing now.]
So they decided to use that brainpower to digitize every single physical book we have.
The way to do that is to take real physical books, scan them, and then use optical character recognition software to translate the words into digital text.
What they did was take any words that were too hard for the computer to decipher and upload them into the reCAPTCHA database.
So going forward, instead of showing random distorted text, CAPTCHAs started to show words from books that computers couldn’t understand.
When enough people on the internet solving these CAPTCHAs wrote the same word for a piece of text shown, that word would be confirmed and uploaded to an ebook database.
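The confirmation step can be sketched as a simple vote. This is only an illustration under stated assumptions: the video says "enough people" without giving numbers, so the thresholds `min_votes` and `min_share` and the function name `confirm_word` are hypothetical.

```python
from collections import Counter

def confirm_word(answers, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if there is no consensus yet.

    min_votes and min_share are made-up thresholds standing in for
    "enough people wrote the same word".
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return word
    return None
```

For example, `confirm_word(["morning", "Morning ", "morning", "mourning"])` returns `"morning"`, while two conflicting answers return `None` and the word would keep being shown to more people.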
Von Ahn called this project reCAPTCHA.
Their slogan was “stop spam, read books”.
At this point, a hundred million reCAPTCHAs were being solved every day, the equivalent of 2.5 million books a year.
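As a quick sanity check on those two figures, the arithmetic below works out how many solved reCAPTCHAs they imply per book. It only uses the numbers quoted in the video. The result counts solves, not distinct words, since each word is shown to several people before it is confirmed.

```python
solves_per_day = 100_000_000   # "a hundred million reCAPTCHAs ... every day"
books_per_year = 2_500_000     # "2.5 million books a year"

solves_per_year = solves_per_day * 365
solves_per_book = solves_per_year / books_per_year

print(f"{solves_per_year:,} solves per year")
print(f"{solves_per_book:,.0f} solves per digitized book")
```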
So Google was like: “Let’s acquire reCAPTCHA.”
And they did, in 2009, and they used that brainpower to digitize all the New York Times archives since the 1800s, as well as all of Google Books.
And when they ran out of those, Google started giving people street numbers from Google Street View to help label Google Maps.
So everything worked out happily ever after. But not really, because there were a couple of problems.
The first is that even though reCAPTCHAs worked, they weren’t very accessible, so blind people had a much harder time filling out forms and signing up for things on the internet.
So they made audio reCAPTCHAs as well, which sound like this:
[Six]
[Four]
[Zero]
[Nine]
But regardless, reCAPTCHAs became a burden for people with dyslexia, poor hearing, poor sight, as well as other sensory impairments.
The other problem was that paid services started popping up that solved CAPTCHAs for you.
The services worked because they took your CAPTCHAs and shipped them off to CAPTCHA farms in third-world countries, where workers would be paid dirt cheap to solve your CAPTCHAs and ship them back to you, the client.
And the last problem, which is perhaps the most important, was that computer vision technology was becoming so good that bots were starting to solve these CAPTCHAs and get through.
So engineers got to thinking: “Why not make CAPTCHAs harder to solve?”
So they made CAPTCHAs have more twists and turns, added some noise, and threw in random lines, but as time went on the technology caught up and bots were once again getting through.
So Google decided to do some research, and they found that humans got these complex CAPTCHAs right only about 33 percent of the time, while their advanced computer technology at Google was getting them right 99.8 percent of the time.
Shoot, that computer vision technology was on a
[WHOLE ‘NOTHER LEVEL]
So Google decided to change things. They got rid of the distorted text CAPTCHA and they came up with this:
And they called it “No CAPTCHA reCAPTCHA”.
When you click it, it sends an HTTP request to Google with a whole bunch of useful information.
Things like your IP address, your country, a timestamp.
Information from your browser, such as the way you move your cursor just moments before entering the checkbox, how you were scrolling the page before the click, the time interval between different browser events, and many other variables that Google keeps secret.
All these criteria are then processed by a machine learning risk analysis engine at Google, and most of the time the information can tell the difference between a human and a bot.
But if the risk analysis engine still isn’t sure, then a small percentage of users will have to complete an additional challenge: an image recognition CAPTCHA.
Something like picking all images with a storefront or picking all the sections of an image that show a street sign.
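The flow described here (collect signals, score them, fall back to an image challenge when unsure) can be sketched as below. This is a toy illustration, not Google’s method: the video says the real variables and model are kept secret, so the `ClickSignals` fields, the hand-picked weights, the `threshold`, and the function names are all assumptions, and a few fixed rules stand in for the machine learning engine.

```python
from dataclasses import dataclass
from statistics import pstdev

@dataclass
class ClickSignals:
    """Hypothetical bundle of the signals mentioned in the video."""
    ip_address: str
    country: str
    timestamp: float
    cursor_path: list           # (x, y) points recorded just before the click
    scrolled_before_click: bool
    event_intervals_ms: list    # gaps between browser events, in milliseconds

def risk_score(s):
    """Toy score in [0, 1]; higher means more bot-like. Weights are made up."""
    score = 0.0
    if len(s.cursor_path) < 3:
        score += 0.4            # cursor reached the checkbox with almost no movement
    if not s.scrolled_before_click:
        score += 0.2            # no scrolling before the click
    if len(s.event_intervals_ms) < 2 or pstdev(s.event_intervals_ms) < 5.0:
        score += 0.4            # event timing is suspiciously regular
    return min(score, 1.0)

def decide(s, threshold=0.5):
    """Let low-risk clicks through; send uncertain ones to an image challenge."""
    return "pass" if risk_score(s) < threshold else "image_challenge"
```

A click with a wandering cursor path, some scrolling, and irregular event timing scores 0.0 and passes, while a click with a single cursor point and evenly spaced events is sent to the image challenge.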
And if you prove that you’re a human once this way, then chances are Google’s engine will remember.
And next time, after clicking that checkbox, you’ll be able to pass right through with ease.
[A WHOLE ‘NOTHER LEVEL]


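Translation credits
Video summary

“Are you a human or a robot?” How are we humans supposed to answer that question? As it turns out, you have already come across the answer. Don’t believe it? Click through and see.

Transcriber

Collected from the internet

Translator

泪之彩虹

Reviewer

嘉言先森

Video source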

https://www.youtube.com/watch?v=jCr6rNaZ9EU
