Have you ever been surfing the internet when you come across one of these boxes that says: “I’m not a robot.”?
So you check the box and go on your way.
But how the heck does this box know whether you’re a robot or not and why does it matter?
Well, to answer that, we actually have to start with these:
They’re called CAPTCHAs: Completely Automated Public Turing Test to tell Computers and Humans Apart.
They were invented in 2003 by Luis von Ahn and his team of researchers at Carnegie Mellon University.
The whole point of these distorted pieces of text was to stop spam on the internet,
like preventing scalpers from writing a computer program that buys every ticket in a fraction of a second.
They work because humans can read the distorted text, but computers and bots can’t.
[You shall not pass!]
So if you want to stop bots from buying concert tickets or setting up email addresses,
we just have to make filling out a CAPTCHA part of the process.
So fast forward and now millions of CAPTCHAs are being solved every single day by internet users and
von Ahn started to think: can we do something useful with all this brain power?
[And the answer to that is, yes, and this is what we’re doing now.]
So they decided to use that brain power to digitize every single physical book we have and the
way to do that is to take real physical books, scan them, and then use optical character recognition
software to translate the words into digital text. What they did was take any words that were too hard for the computer to decipher and
upload them into the reCAPTCHA database.
So going forward, instead of showing random distorted text,
CAPTCHA started to show words from books that computers couldn’t understand and when enough people on the internet solving these CAPTCHAs
wrote the same word for a piece of text shown, that word would be confirmed and uploaded to an ebook database.
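The "enough people wrote the same word" idea can be sketched in a few lines of Python. This is only an illustration, not the actual reCAPTCHA implementation: the function name `confirm_word` and the thresholds `min_votes` and `min_agreement` are assumptions made up for this example.

```python
from collections import Counter


def confirm_word(answers, min_votes=3, min_agreement=0.75):
    """Return the agreed transcription, or None if there is no consensus yet.

    `min_votes` and `min_agreement` are hypothetical thresholds chosen for
    illustration; the real system's values are not given in the source.
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough people have answered yet

    # Find the most common answer and how many people typed it.
    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_agreement:
        return word  # enough people agree: treat the word as confirmed
    return None


print(confirm_word(["morning", "Morning ", "morning", "mourning"]))
print(confirm_word(["upon", "upon"]))
```

In this sketch the first call confirms the word because three of the four answers match, and the second returns `None` because too few people have answered.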
Von Ahn called this project
reCAPTCHA. Their slogan was “stop spam, read books”. At this point, a hundred million reCAPTCHAs were being solved every day, the equivalent of
2.5 million books a year.
So Google was like: “let’s acquire reCAPTCHA”.
And they did, in 2009, and they used that brainpower to digitize all the New York Times archives since
the 1800s, as well as all of Google Books.
And when they ran out of those, Google started giving people street numbers from Google Street View to help label Google Maps.
So everything worked out happily ever after, but not really, because there were a couple of problems.
The first is that even though reCAPTCHAs worked, they weren’t very accessible, so blind people had a much harder time
filling out forms and signing up for things on the internet.
So they made audio reCAPTCHAs as well that sound like this:
But regardless, reCAPTCHAs became a burden for people with dyslexia, poor hearing, poor sight, as well as other sensory impairments.
The other problem was that paid services started popping up that solved CAPTCHAs for you.
These services worked by taking your CAPTCHAs and shipping them off to CAPTCHA farms in third-world countries
where workers would be paid dirt cheap to solve your CAPTCHAs and ship them back to you, the client.
And the last problem, which is perhaps the most important, was that
computer vision technology was becoming so good that bots were starting to solve these CAPTCHAs and get through.
So engineers got to thinking: “why not make CAPTCHAs harder to solve?”
So they made CAPTCHAs have more twists and turns and added some noise and threw in random lines,
but as time went on the technology caught on and bots were once again getting through.
So Google decided to do some research and they found that humans got these complicated CAPTCHAs right
only about 33 percent of the time and
their advanced computer technology at Google was getting them right 99.8 percent of the time.
Shoot, that computer vision technology was on a
[WHOLE ‘NOTHER LEVEL]
So Google decided to change things. They got rid of the distorted text CAPTCHA and they came up with this:
And they called it
“No CAPTCHA reCAPTCHA”. When you click it, it sends an HTTP request to Google with a whole bunch of useful information.
Things like your IP address, your country, a timestamp.
Information from your browser, such as the way you move your cursor just moments before clicking the checkbox.
How you were scrolling the page before the click, the time interval between different browser events, and many other
variables that Google will keep secret.
All these criteria are then processed by a machine
learning risk analysis engine at Google and most of the time the information can tell the difference between a human and a bot.
But if the risk analysis engine still isn’t sure, then a small percentage of users will have to complete an additional challenge:
An image recognition CAPTCHA. Something like picking all images with a storefront or picking all the sections of an image that show a street sign.
And if you prove that you’re a human once this way
then chances are Google’s engine will remember. And next time after clicking that check box, you’ll be able to pass right through with ease.
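To make that flow a little more concrete, here is a rough Python sketch of how a checkbox click could be scored and then either passed or sent to an image challenge. It is a toy stand-in, not Google's engine: the real system uses machine learning and keeps its variables secret, so the `ClickSignals` fields, the hand-picked weights, and the `threshold` value are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ClickSignals:
    """Hypothetical signals collected around the checkbox click."""
    cursor_path_points: int        # cursor positions recorded before the click
    scrolled_before_click: bool    # did the user scroll the page first?
    ms_between_events: float       # average gap between browser events
    passed_challenge_before: bool  # did this user solve a challenge earlier?


def risk_score(s: ClickSignals) -> float:
    """Return a score from 0.0 (looks human) to 1.0 (looks like a bot).

    The weights are made up; a real engine would learn them from data.
    """
    score = 0.0
    if s.cursor_path_points < 5:
        score += 0.4  # almost no cursor movement before the click
    if not s.scrolled_before_click:
        score += 0.2  # no scrolling at all
    if s.ms_between_events < 20:
        score += 0.4  # events fire faster than a person usually acts
    if s.passed_challenge_before:
        score -= 0.3  # the engine "remembers" an earlier success
    return min(1.0, max(0.0, score))


def decide(s: ClickSignals, threshold: float = 0.5) -> str:
    """Let low-risk clicks through; send the rest to an image challenge."""
    return "image_challenge" if risk_score(s) >= threshold else "pass"


human = ClickSignals(40, True, 180.0, False)
bot = ClickSignals(0, False, 2.0, False)
print(decide(human), decide(bot))
```

The design choice worth noticing is the two-step decision: most clicks are settled by the score alone, and only the uncertain or suspicious ones cost the user an extra challenge.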
[A WHOLE ‘NOTHER LEVEL]