哈喽 我是克里斯蒂安·鲁德尔
Hello, my name is Christian Rudder,
我是丘比特约会网站的创始人之一
and I was one of the founders of OkCupid.
我们的网站现在已经成为美国最大的约会网站之一了
It’s now one of the biggest dating sites in the United States.
像网站中的很多人一样 我是主修数学的
Like most everyone at the site, I was a math major.
如同你猜测的一样 我们将数学分析方法带入了恋爱中
As you may expect, we’re known for the analytic approach we take to love.
我们称它为匹配算法
We call it our matching algorithm.
简单来说 丘比特的匹配算法帮助我们决定
Basically, OkCupid’s matching algorithm helps us decide
两个人是否适合约会
whether two people should go on a date.
我们所有的业务都是围绕它展开的
We built our entire business around it.
如今 算法成为了一个高端的词
Now, algorithm is a fancy word,
人们像讨论大事件一样讨论它
and people like to drop it like it’s this big thing.
可是实际上 算法不过是个系统化的
But really, an algorithm is just a systematic,
逐步解决问题的方法
step-by-step way to solve a problem.
完全没有多么的高端
It doesn’t have to be fancy at all.
在这次的视频中
Here in this lesson,
我将会展现我们是如何得到这一特别的算法的
I’m going to explain how we arrived at our particular algorithm,
这样你就能看到它是如何实现的了
so you can see how it’s done.
现在来说 为什么算法如此重要?
Now, why are algorithms even important?
制作这个视频的意义是什么
Why does this lesson even exist?
嗯 我要重提之前说过的一个非常重要的短语了
Well, notice one very significant phrase I used above:
它们是一种逐步解决问题的方法
they are a step-by-step way to solve a problem,
就像大家大概都知道的 电脑就是采用逐步的处理方式
and as you probably know, computers excel at step-by-step processes.
没有算法的电脑
A computer without an algorithm
也只能成为一块昂贵的镇纸
is basically an expensive paperweight.
所以当电脑充斥着我们的日常生活时
And since computers are such a pervasive part of everyday life,
算法便无处不在
algorithms are everywhere.
丘比特的匹配算法用到的数学计算惊人的简单
The math behind OkCupid’s matching algorithm is surprisingly simple.
不过是一些加法、乘法和一点开根式
It’s just some addition, multiplication, a little bit of square roots.
设计上的难点在于
The tricky part in designing it
想办法把某种神秘的东西
was figuring out how to take something mysterious,
也就是人与人之间的吸引力
human attraction,
拆分成电脑可以处理的组成部分
and break it into components that a computer can work with.
要为人们配对 我们首先需要的是数据
The first thing we needed to match people up was data,
这是运用算法的必要条件
something for the algorithm to work with.
得到信息最好的方法就是直接问
The best way to get data quickly from people is to just ask for it.
于是我们决定 丘比特网站需要问用户一些问题
So we decided that OkCupid should ask users questions,
像是“你以后想要孩子吗?”
stuff like, “Do you want to have kids one day?”
“你通常多久刷一次牙?”
“How often do you brush your teeth?”
“你喜欢看恐怖电影吗?”
“Do you like scary movies?”
还有一些更严肃的问题“你相信上帝吗?”
And big stuff like, “Do you believe in God?”
现在 很多问题都适合用来让相似的人相互配对
Now, a lot of the questions are good for matching like with like,
这便是 两个人的答案相同时
that is, when both people answer the same way.
举个例子 两个人都喜欢恐怖电影
For example, two people who are both into scary movies
比一个人喜欢而另一人不喜欢的情况契合度更高
are probably a better match than one person who is and one who isn’t.
可是像这样的问题呢
But what about a question like,
“你喜欢成为关注的焦点吗?”
“Do you like to be the center of attention?”
如果两个相关联的人都说喜欢
If both people in a relationship are saying yes to this,
他们之间会出现很多的问题
they’re going to have massive problems.
我们很早便了解了这一点
We realized this early on,
于是我们觉得对于每一个问题我们都需要一些更详细的信息
and so we decided we needed a bit more data from each question.
我们要求大家不仅要给出自己的答案
We had to ask people to specify not only their own answer,
同样有他们对另一半的期望
but the answer they wanted from someone else.
这个方法很好用
That worked really well.
只是需要多一次划分
But we needed one more dimension.
有些问题更能区分一个人的特性
Some questions tell you more about a person than others.
比如说 一个有关政治的问题 像是
For example, a question about politics, something like,
你觉得点燃一本书与点燃一面旗帜相比哪个更糟?
“Which is worse: book burning or flag burning?”
相较于一个人看电影的品味来说更能揭示一个人的本性
might reveal more about someone than their taste in movies.
对所有事情赋予同等的权重是不合理的
And it doesn’t make sense to weigh all things equally,
于是我们增加了最后一个数据点
so we added one final data point.
对于丘比特网提出的任何问题
For everything that OkCupid asks you,
你都可以标注出它对你的重要性
you have a chance to tell us the role it plays in your life.
这个等级从“不相关”到“必须要求”
And this ranges from irrelevant to mandatory.
那么现在 对于每一个问题 我们都有了三个计算量
So now, for every question, we have three things for our algorithm:
首先 你自己的答案
first, your answer;
然后 你希望别人——你潜在的对象——怎么回答
second, how you want someone else — your potential match — to answer;
最后 这些问题对你来说有多重要
and third, how important the question is to you at all.
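The three data points per question can be pictured as a small record. This is only a sketch: the class name, the field names, and the use of plain strings are illustrative assumptions, not how OkCupid actually stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionResponse:
    """One user's response to one question (all names here are hypothetical)."""
    own_answer: str      # first: the user's own answer
    wanted_answer: str   # second: the answer they want from a potential match
    importance: str      # third: how much the question matters to them


# Example: a user who is neat, wants a neat partner, and cares a lot about it.
response = QuestionResponse(
    own_answer="very organized",
    wanted_answer="very organized",
    importance="very important",
)
print(response)
```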
有了这所有的信息
With all this information,
丘比特网便能计算出两个人的契合度
OkCupid can figure out how well two people will get along.
算法通过对这些数据的运算得出结论
The algorithm crunches the numbers and gives us a result.
作为一个实用性的例子来说
As a practical example,
我们来计算一下你和其他人的匹配度
let’s look at how we’d match you with another person.
我们暂且称他为“B”
Let’s call him “B.”
你与B的匹配度取决于你们都回答过的问题
Your match percentage with B is based on questions you’ve both answered.
我们称它为问题集“s”
Let’s call that set of common questions “s.”
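One simple way to picture the set “s” is as the intersection of the questions each person has answered. The question labels below are made up for illustration, and the two extra questions ("scary_movies", "kids") are assumed only to show that questions answered by just one person drop out.

```python
# Questions each person has answered (labels are hypothetical).
your_questions = {"messiness", "center_of_attention", "scary_movies"}
b_questions = {"messiness", "center_of_attention", "kids"}

# s: the questions both people have answered.
s = your_questions & b_questions
print(sorted(s))
```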
作为一个简单的例子 我们用一个元素少的“s”
As a very simple example, we use a small set “s”
只有两个元素的集合
with just two questions in common,
用它们来计算匹配度
and compute a match from that.
这里有两个示例问题
Here are our two example questions.
第一个 我们假设是 “你是个多么混乱的人?”
The first one, let’s say, is, “How messy are you?”
答案可选有
And the answer possibilities are:
非常混乱、一般或非常有条理
very messy, average and very organized.
我们假设你的答案是“非常有条理”
And let’s say you answered “very organized,”
你希望其他人的回答也是“非常有条理”
and you’d like someone else to answer “very organized,”
这个问题对你来说很重要
and the question is very important to you.
基本上 你就是有洁癖
Basically, you’re a neat freak.
你总是保持整洁 你希望别人也能保持整洁 就是这样
You’re neat, you want someone else to be neat, and that’s it.
我们假设B有些不一样
And let’s say B is a little bit different.
他自己的回答是“非常有条理”
He answered “very organized” for himself,
而对他人的期望是“一般”就可以了
but “average” is OK with him as an answer from someone else,
这个问题对他来说只是有一点重要
and the question is only a little important to him.
我们再来看看第二个问题 来自我们之前提到过的问题
Let’s look at the second question, from our previous example:
“你喜欢成为关注的焦点吗?”
“Do you like to be the center of attention?”
选项是“是”或是“否”
The answers are “yes” and “no.”
你的回答是“否” 你希望别人的回答也是“否”
You’ve answered “no,” you want someone else to answer “no,”
这个问题对你来说只是有一点重要
and the question is only a little important to you.
现在来看B 他自己的回答是“是”
Now B, he’s answered “yes.”
他希望别人的回答是“否”
He wants someone else to answer “no,”
因为他希望自己是公众关注的焦点
because he wants the spotlight on him,
而且这个问题对他来说是有些重要的
and the question is somewhat important to him.
好了 现在我们来计算这些问题
So, let’s try to compute all of this.
第一步 因为我们使用了电脑来处理这些
Our first step is, since we use computers to do this,
我们就要给他们赋予权值
we need to assign numerical values
对于“比较重要”和“非常重要”这样的选项
to ideas like “somewhat important” and “very important,”
因为计算机需要所有东西都变成数字
because computers need everything in numbers.
丘比特网站决定用这样的数值
We at OkCupid decided on the following scale:
“不相关”值为0
“Irrelevant” is worth 0.
“有一点重要”值为1
“A little important” is worth 1.
“有些重要”值为10
“Somewhat important” is worth 10.
“非常重要”值为50
“Very important” is 50.
“必须要求”值为250
And “absolutely mandatory” is 250.
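The scale just described can be written down as a simple lookup table. The point values come from the talk; the dictionary name and the lowercase label strings are assumptions made for this sketch.

```python
# Importance level -> points, as listed in the talk.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "mandatory": 250,
}

print(IMPORTANCE_POINTS["very important"])
```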
接下来 运用算法进行两个简单计算
Next, the algorithm makes two simple calculations.
首先:你对B的回答的满意度是多少?
The first is: How much did B’s answers satisfy you?
也就是 在你的标准下B能拿到多少分?
That is, how many possible points did B score on your scale?
那么 你明确指出了B对于第一个问题的回答限制
Well, you indicated that B’s answer to the first question,
关于混乱的
about messiness,
它对你非常重要
was very important to you.
它有50分并且B得到了
It’s worth 50 points and B got that right.
第二个问题只值1分
The second question is worth only 1,
因为你说它对你来说只是有一点重要
because you said it was only a little important.
B没有得到这一分
B got that wrong,
那么你对B的回答满意度为50/51
so B’s answers earned 50 out of 51 possible points.
98%的满意度 很不错
That’s 98% satisfactory. Pretty good.
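The first calculation can be sketched in a few lines. The function name, the dictionary layout, and the question labels are hypothetical; the weights and answers are the ones from the example, and returning 0.0 when no points are at stake is an assumption of this sketch.

```python
def satisfaction(wanted, weights, other_answers):
    """Fraction of possible points that the other person's answers earn."""
    possible = sum(weights.values())
    if possible == 0:
        return 0.0  # assumption: nothing at stake counts as zero satisfaction
    earned = sum(
        points
        for question, points in weights.items()
        if other_answers[question] == wanted[question]
    )
    return earned / possible


# What you want from a match, and how many points each question is worth to you.
your_wanted = {"messiness": "very organized", "attention": "no"}
your_weights = {"messiness": 50, "attention": 1}

# B's own answers.
b_answers = {"messiness": "very organized", "attention": "yes"}

b_satisfies_you = satisfaction(your_wanted, your_weights, b_answers)
print(f"{b_satisfies_you:.0%}")  # prints 98%
```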
第二个算法问题便是:你对B来说满意度为多少?
The second question the algorithm looks at is: How much did you satisfy B?
嗯 B对于混乱的问题设置了1分
Well, B placed 1 point on your answer to the messiness question
第二个问题中设置了10分
and 10 on your answer to the second.
这11分即是1分加10分 你得到了其中10分——
Of those 11, that’s 1 plus 10, you earned 10 —
因为你只在第二个问题上满足了他的期望
you satisfied him on the second question, but not on the first.
于是你得到10/11等于91%的满意度
So your answers earned 10 out of 11 points, which makes them 91 percent satisfactory to B.
听起来不坏
That’s not bad.
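Written out as arithmetic, with the messiness question earning you 0 of B's 1 point and the attention question earning all 10, the second calculation looks like this (the symbol name is just a label chosen for this note):

```latex
\[
\text{satisfaction}_{\text{you} \to B}
  = \frac{0 + 10}{1 + 10}
  = \frac{10}{11}
  \approx 0.909
  \approx 91\%
\]
```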
最后一步是将你们两人的满意度
The final step is to take these two match percentages
计算得到你们共同的满意度数值
and get one number for the both of you.
实现这个 算法是将两数相乘
To do this, the algorithm multiplies your scores,
然后开n次方根
then takes the nth root,
这里的“n”是问题的数目
where “n” is the number of questions.
因为我们的问题集“s”中
Because the number of questions in our set “s”
只有2个问题
is only 2,
我们便得到 将百分数开平方根
we have: match percentage equals the square root
即98%与91%的乘积的平方根
of 98 percent times 91 percent.
得到94%
That equals 94 percent.
这个94%就是你和B之间的匹配度
That 94 percent is your match percentage with B.
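The final step, as described in the talk, can be checked with a short sketch. The function and variable names are hypothetical, and the sketch leaves out any small-sample correction the real site may apply.

```python
def match_percentage(score_a, score_b, n_questions):
    """Multiply the two satisfaction scores, then take the nth root."""
    return (score_a * score_b) ** (1 / n_questions)


b_satisfies_you = 50 / 51   # about 98%
you_satisfy_b = 10 / 11     # about 91%

match = match_percentage(b_satisfies_you, you_satisfy_b, n_questions=2)
print(f"{match:.0%}")  # prints 94%
```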
这是对你们相处时能有多么幸福的数学表达
It’s a mathematical expression of how happy you’d be with each other,
基于我们了解到的信息
based on what we know.
那么 我们为什么要用乘法呢?
Now, why does the algorithm multiply,
反过来说 为什么不是算两个百分数的平均数
as opposed to, say, average the two match scores together,
而且开根式有什么意义?
and do the square-root business?
通俗来说 这个计算公式称作求几何平均数
In general, this formula is called the geometric mean.
是一个建立两个跨度较大的数值间关系的有效方法
It’s a great way to combine values that have wide ranges
并能显现出各自非常不同的特性
and represent very different properties.
换句话来说 是完美的浪漫主义匹配法
In other words, it’s perfect for romantic matching.
你有相当大的选择范围而且你有一堆的不同数据点
You’ve got wide ranges and you’ve got tons of different data points,
就像我说的 关于电影的、政治的、宗教的——所有的一切
like I said, about movies, politics, religion — everything.
同样 直观上也好理解
Intuitively, too, this makes sense.
两个有50%满意度的人
Two people satisfying each other 50 percent
将比两个0%和100%的人更适合在一起
should be a better match than two others who satisfy each other 0 and 100 percent,
因为情感是相互的
because affection needs to be mutual.
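A quick comparison shows why a plain average would not capture this. The helper names below are made up for the sketch; the two pairs are the ones from the talk.

```python
import math


def arithmetic_mean(a, b):
    return (a + b) / 2


def geometric_mean(a, b):
    return math.sqrt(a * b)


balanced = (0.5, 0.5)   # each person satisfies the other 50 percent
lopsided = (0.0, 1.0)   # one person is fully satisfied, the other not at all

# The plain average cannot tell the two pairs apart.
print(arithmetic_mean(*balanced), arithmetic_mean(*lopsided))  # 0.5 0.5

# The geometric mean drops to zero when either side is unsatisfied.
print(geometric_mean(*balanced), geometric_mean(*lopsided))    # 0.5 0.0
```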
在加上一点误差范围的纠正
After adding a little correction for margin of error,
在我们仅使用一小部分问题的情况下产生
in the case where we have a small number of questions,
像是我们举例的那个
like we do in this example,
就可以得到准确数值了
we’re good to go.
不论何时丘比特网站对两人配对时
Any time OkCupid matches two people,
都是经过我们刚刚执行的步骤
it goes through the steps we just outlined.
首先收集你的答案信息
First it collects data about your answers,
然后将你的选择与其他人的比较
then it compares your choices and preferences to other people’s
用非常简单的数学方法
in simple, mathematical ways.
而这 将现实世界的现象记录下来
This, the ability to take real-world phenomena
并使其能被电子芯片理解的能力
and make them something a microchip can understand,
我认为 是如今任何人所能拥有的最重要的能力
is, I think, the most important skill anyone can have these days.
就像你用语言告诉别人一个故事
Like you use sentences to tell a story to a person,
你使用算法告诉电脑一个故事
you use algorithms to tell a story to a computer.
只要你学会了这种语言 你就可以去讲述自己的故事了
If you learn the language, you can go out and tell your stories.
我希望这个视频可以帮助你们做到
I hope this will help you do that.
