Peter Donnelly: How stats fool juries – 译学馆

As other speakers have said, it’s a rather daunting experience,
a particularly daunting experience, to be speaking in front of this audience.
But unlike the other speakers, I’m not going to tell you about
the mysteries of the universe, or the wonders of evolution,
or the really clever, innovative ways people are attacking
the major inequalities in our world.
Or even the challenges of nation-states in the modern global economy.
My brief, as you’ve just heard, is to tell you about statistics,
and, to be more precise, to tell you some exciting things about statistics.
And that’s...
(Laughter)
...that’s rather more challenging
than all the speakers before me and all the ones coming after me.
(Laughter)
One of my senior colleagues told me, when I was a youngster in this profession,
rather proudly, that statisticians were people who liked figures
but didn’t have the personality skills to become accountants.
(Laughter)
And there’s another in-joke among statisticians, and that’s,
“How do you tell the introverted statistician from the extroverted statistician?”
To which the answer is,
“The extroverted statistician’s the one who looks at the other person’s shoes.”
(Laughter)
But I want to tell you something useful, and here it is, so concentrate now.
This evening, there’s a reception in the University’s Museum of Natural History.
And it’s a wonderful setting, as I hope you’ll find,
and a great icon to the best of the Victorian tradition.
It’s very unlikely, in this special setting and this collection of people,
but you might just find yourself talking to someone you’d rather you weren’t.
So here’s what you do.
When they say to you, “What do you do?”, you say, “I’m a statistician.”
(Laughter)
Well, except they’ve been pre-warned now, and they’ll know you’re making it up.
And then one of two things will happen.
They’ll either discover their long-lost cousin in the other corner of the room
and run over and talk to them.
Or they’ll suddenly become parched and/or hungry (and often both)
and sprint off for a drink and some food.
And you’ll be left in peace to talk to the person you really want to talk to.
It’s one of the challenges in our profession to try and explain what we do.
We’re not top on people’s lists for dinner party guests and conversations and so on.
And it’s something I’ve never really found a good way of doing.
But my wife, who was then my girlfriend,
managed it much better than I’ve ever been able to.
Many years ago, when we first started going out, she was working for the BBC in Britain,
and I was, at that stage, working in America.
I was coming back to visit her.
She told this to one of her colleagues, who said, “Well, what does your boyfriend do?”
Sarah thought quite hard about the things I’d explained,
and she concentrated, in those days, on listening.
(Laughter)
Don’t tell her I said that.
And she was thinking about the work I did developing mathematical models
for understanding evolution and modern genetics.
So when her colleague said, “What does he do?”
she paused and said, “He models things.”
(Laughter)
Well, her colleague suddenly got much more interested than I had any right to expect
and went on and said, “What does he model?”
Well, Sarah thought a little bit more about my work and said, “Genes.”
(Laughter)
“He models genes.”
That is my first love, and that’s what I’ll tell you a little bit about.
What I want to do more generally is to get you thinking about
the place of uncertainty and randomness and chance in our world,
and how we react to that, and how well we do or don’t think about it.
So you’ve had a pretty easy time up till now in the talks to date:
a few laughs, and all that kind of thing.
Now you’ve got to think, and I’m going to ask you some questions.
So here’s the scene for the first question I’m going to ask you.
Can you imagine tossing a coin successively?
And for some reason, which shall remain rather vague,
we’re interested in a particular pattern.
Here’s one: a head, followed by a tail, followed by a tail.
So suppose we toss a coin repeatedly.
Then the pattern, head-tail-tail, that we’ve suddenly become fixated with happens here.
And you can count: one, two, three, four, five, six, seven, eight, nine, 10;
it happens after the 10th toss.
So you might think there are more interesting things to do, but humor me for the moment.
Imagine this half of the audience each get out coins, and they toss them
until they first see the pattern head-tail-tail.
The first time they do it, maybe it happens after the 10th toss, as here.
The second time, maybe it’s after the fourth toss.
The next time, after the 15th toss.
So you do that lots and lots of times, and you average those numbers.
That’s what I want this side to think about.
The other half of the audience doesn’t like head-tail-tail;
they think, for deep cultural reasons, that it’s boring,
and they’re much more interested in a different pattern: head-tail-head.
So, on that side, you get out your coins, and you toss and toss and toss.
And you count the number of tosses until the pattern head-tail-head appears
and you average them. OK?
So on this side, you’ve got a number
(you’ve done it lots of times, so you get it accurately),
which is the average number of tosses until head-tail-tail.
On that side, you’ve got a number: the average number of tosses until head-tail-head.
So here’s a deep mathematical fact:
if you’ve got two numbers, one of three things must be true.
Either they’re the same, or this one’s bigger than that one,
or that one’s bigger than this one.
So what’s going on here?
So you’ve all got to think about this, and you’ve all got to vote,
and we’re not moving on.
And I don’t want to end up in the two-minute silence
to give you more time to think about it, until everyone’s expressed a view. OK.
So what you want to do is compare the average number of tosses until we first see
head-tail-head with the average number of tosses until we first see head-tail-tail.
Who thinks that A is true,
that, on average, it’ll take longer to see head-tail-head than head-tail-tail?
Who thinks that B is true, that on average they’re the same?
Who thinks that C is true, that, on average, it’ll take less time
to see head-tail-head than head-tail-tail?
OK, who hasn’t voted yet? Because that’s really naughty; I said you had to.
(Laughter)
OK. So most people think B is true.
And you might be relieved to know even rather distinguished mathematicians think that.
It’s not. A is true here.
It takes longer, on average.
In fact, the average number of tosses until head-tail-head is 10
and the average number of tosses until head-tail-tail is eight.
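The averaging experiment the audience is asked to imagine is easy to run on a computer. What follows is a minimal Monte Carlo sketch in Python, not something from the talk: the function names, the fixed seed and the 50,000 trials per pattern are arbitrary choices for illustration, and the coin is assumed to be fair.

```python
import random


def tosses_until(pattern: str, rng: random.Random) -> int:
    """Toss a fair coin until `pattern` (e.g. "HTH") first appears; return the toss count."""
    recent = ""  # holds only the last len(pattern) outcomes
    count = 0
    while True:
        recent = (recent + rng.choice("HT"))[-len(pattern):]
        count += 1
        if recent == pattern:
            return count


def average_wait(pattern: str, trials: int = 50_000, seed: int = 1) -> float:
    """Repeat the experiment `trials` times and average the toss counts."""
    rng = random.Random(seed)
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for pattern in ("HTH", "HTT"):
        print(pattern, round(average_wait(pattern), 2))
```

With this many trials the two estimates should come out close to 10 and 8, the averages quoted in the talk.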
How could that be?
Is there anything different about the two patterns?
There is. Head-tail-head overlaps itself.
If you get head-tail-head-tail-head, you can cunningly get two occurrences
of the pattern in only five tosses.
You can’t do that with head-tail-tail.
That turns out to be important.
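The overlap idea can be turned into an exact calculation. For patterns built from independent, equally likely symbols there is a standard result (often presented via John Conway’s “leading numbers”): the expected waiting time is the sum of m to the power k over every length k at which the pattern’s first k symbols equal its last k symbols, where m is the alphabet size. The Python sketch below assumes a fair coin by default; `expected_wait` is a name chosen here for illustration.

```python
def expected_wait(pattern: str, alphabet_size: int = 2) -> int:
    """Expected number of draws until `pattern` first appears.

    Assumes each draw is independent and every symbol is equally likely.
    For every k where the first k symbols equal the last k symbols,
    alphabet_size ** k is added; k = len(pattern) always qualifies.
    """
    n = len(pattern)
    return sum(
        alphabet_size ** k
        for k in range(1, n + 1)
        if pattern[:k] == pattern[n - k:]
    )


print(expected_wait("HTH"))  # 10
print(expected_wait("HTT"))  # 8
```

For head-tail-head both k = 1 and k = 3 count, giving 2 + 8 = 10; head-tail-tail only matches itself at k = 3, giving 8. The extra term is exactly the self-overlap described in the talk.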
There are two ways of thinking about this.
I’ll give you one of them.
So let’s suppose we’re doing it.
On this side, remember, you’re excited about head-tail-tail;
on that side, you’re excited about head-tail-head.
We start tossing a coin, and we get a head,
and you start sitting on the edge of your seat
because something great and wonderful, or awesome, might be about to happen.
The next toss is a tail, and you get really excited.
The champagne’s on ice just next to you; you’ve got the glasses chilled to celebrate.
You’re waiting with bated breath for the final toss.
And if it comes down a head, that’s great.
You’re done, and you celebrate.
If it’s a tail, well, rather disappointedly, you put the glasses away
and put the champagne back.
And you keep tossing, waiting for the next head to get excited about.
On this side, there’s a different experience.
It’s the same for the first two parts of the sequence.
You’re a little bit excited with the first head;
you get rather more excited with the next tail.
Then you toss the coin.
If it’s a tail, you crack open the champagne.
If it’s a head, you’re disappointed,
but you’re still a third of the way to your pattern again.
And that’s an informal way of presenting it; that’s why there’s a difference.
Another way of thinking about it:
if we tossed a coin eight million times,
then we’d expect a million head-tail-heads
and a million head-tail-tails, but the head-tail-heads could occur in clumps.
So if you want to put a million things down amongst eight million positions
and you can have some of them overlapping, the clumps will be further apart.
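This second intuition can also be checked by counting. The Python sketch below uses 800,000 simulated tosses rather than eight million to keep it quick; the seed is arbitrary and the coin is assumed fair. A zero-width regular-expression lookahead is used so that overlapping occurrences are all counted.

```python
import random
import re


def count_occurrences(seq: str, pattern: str) -> int:
    """Count every occurrence of `pattern` in `seq`, overlapping ones included."""
    # A zero-width lookahead lets matches overlap (HTHTH contains HTH twice).
    return len(re.findall(f"(?={re.escape(pattern)})", seq))


rng = random.Random(7)
n = 800_000  # scaled down from the talk's eight million tosses
tosses = "".join(rng.choices("HT", k=n))

hth = count_occurrences(tosses, "HTH")
htt = count_occurrences(tosses, "HTT")
# Back-to-back HTH pairs (HTHTH) are the "clumps": about 1 in 4 HTH occurrences
# is followed by another one starting two tosses later.
clumped = count_occurrences(tosses, "HTHTH")

print(f"HTH: {hth}, HTT: {htt}, expected about {n // 8} each")
print(f"HTH occurrences overlapping the next one: {clumped}")
```

Both patterns should turn up roughly 100,000 times, but about a quarter of the head-tail-head occurrences overlap the next one, so they bunch together and the gaps between bunches are longer.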
It’s another way of getting the intuition.
What’s the point I want to make?
It’s a very, very simple example, an easily stated question in probability,
which everybody (you’re in good company) gets wrong.
This is my little diversion into my real passion, which is genetics.
There’s a connection between head-tail-heads and head-tail-tails in genetics,
and it’s the following.
When you toss a coin, you get a sequence of heads and tails.
When you look at DNA, there’s a sequence not of two things, heads and tails,
but of four letters: As, Gs, Cs and Ts.
And there are little chemical scissors, called restriction enzymes,
which cut DNA whenever they see particular patterns.
And they’re an enormously useful tool in modern molecular biology.
And instead of asking the question, “How long until I see a head-tail-head?”,
you can ask, “How big will the chunks be when I use a restriction enzyme
which cuts whenever it sees G-A-A-G, for example?
How long will those chunks be?”
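The restriction-enzyme version of the question can be sketched the same way. The Python below assumes, purely for illustration, a random DNA sequence in which each base is independent and equally likely (real genomes are not like this), uses the talk’s example site G-A-A-G, and treats every occurrence as one cut at a fixed position; `mean_fragment_length` is a hypothetical helper, not a bioinformatics library function.

```python
import random
import re


def mean_fragment_length(dna: str, site: str) -> float:
    """Average spacing between successive occurrences of `site` in `dna`.

    Simplification: each occurrence is treated as one cut at a fixed offset
    (here, its first base), so the spacing between cuts is a fragment length.
    """
    starts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", dna)]
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps)


rng = random.Random(2024)
# Assumption: bases are independent and A, C, G, T are equally likely.
dna = "".join(rng.choices("ACGT", k=2_000_000))
avg = mean_fragment_length(dna, "GAAG")
print(round(avg, 1))
```

Under these assumptions the average fragment comes out close to 4 to the power 4, or 256 bases, because a given four-letter site appears at any particular position with probability 1/256.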
That’s a rather trivial connection between probability and genetics.
There’s a much deeper connection, which I don’t have time to go into,
and that is that modern genetics is a really exciting area of science.
And we’ll hear some talks later in the conference specifically about that.
But it turns out that a key part of unlocking the secrets in the information generated by modern
experimental technologies has to do with fairly sophisticated
(you’ll be relieved to know that I do something useful in my day job,
rather more sophisticated than the head-tail-head story),
quite sophisticated computer modeling and mathematical modeling
and modern statistical techniques.
And I will give you two little snippets, two examples,
of projects we’re involved in in my group in Oxford,
both of which I think are rather exciting.
You know about the Human Genome Project.
That was a project which aimed to read one copy of the human genome.
The natural next step after you’ve done that is this project, the International HapMap Project,
which is a collaboration between labs in five or six different countries.
Think of the Human Genome Project as learning what we’ve got in common,
and the HapMap Project is trying to understand
where there are differences between different people.
Why do we care about that?
Well, there are lots of reasons.
The most pressing one is that we want to understand how some differences
make some people susceptible to one disease (type-2 diabetes, for example)
and other differences make people more susceptible to heart disease,
or stroke, or autism and so on.
That’s one big project.
There’s a second big project,
recently funded by the Wellcome Trust in this country,
involving very large studies:
thousands of individuals with each of eight different diseases,
common diseases like type-1 and type-2 diabetes, coronary heart disease,
bipolar disease and so on, to try and understand the genetics.
To try and understand what it is about genetic differences that causes the diseases.
Why do we want to do that?
Because we understand very little about most human diseases.
We don’t know what causes them.
And if we can get in at the bottom and understand the genetics,
we’ll have a window on the way the disease works,
and a whole new way of thinking about disease therapies
and preventative treatment and so on.
So that’s, as I said, the little diversion into my main love.
Back to some of the more mundane issues of thinking about uncertainty.
Here’s another quiz for you.
Now suppose we’ve got a test for a disease
which isn’t infallible, but it’s pretty good.
It gets it right 99 percent of the time.
And I take one of you, or I take someone off the street,
and I test them for the disease in question.
Let’s suppose there’s a test for HIV, the virus that causes AIDS,
and the test says the person has the disease.
What’s the chance that they do?
The test gets it right 99 percent of the time.
So a natural answer is 99 percent.
Who likes that answer?
Come on, everyone’s got to get involved.
Don’t think you don’t trust me anymore.
(Laughter)
Well, you’re right to be a bit skeptical, because that’s not the answer.
That’s what you might think.
It’s not the answer, and it’s not because it’s only part of the story.
It actually depends on how common or how rare the disease is.
So let me try and illustrate that.
Here’s a little caricature of a million individuals.
So let’s think about a disease that affects...
it’s pretty rare; it affects one person in 10,000.
Amongst these million individuals, most of them are healthy
and some of them will have the disease.
And in fact, if this is the prevalence of the disease,
about 100 will have the disease and the rest won’t.
So now suppose we test them all.
What happens?
Well, amongst the 100 who do have the disease,
the test will get it right 99 percent of the time, and 99 will test positive.
Amongst all these other people who don’t have the disease,
the test will get it right 99 percent of the time.
It’ll only get it wrong one percent of the time.
But there are so many of them that there’ll be an enormous number of false positives.
Put that another way:
of all of them who test positive (so here they are, the individuals involved),
fewer than one in 100 actually have the disease.
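The counting argument can be written out exactly. This Python sketch uses the talk’s numbers and assumes, as the walkthrough does, that the 99 percent accuracy applies both to people who have the disease and to people who don’t; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

population = 1_000_000
prevalence = Fraction(1, 10_000)  # "affects one person in 10,000"
accuracy = Fraction(99, 100)      # assumed to hold for sick and healthy people alike

sick = population * prevalence              # 100 people
healthy = population - sick                 # 999,900 people
true_positives = sick * accuracy            # 99 sick people test positive
false_positives = healthy * (1 - accuracy)  # 9,999 healthy people test positive

share_sick = true_positives / (true_positives + false_positives)
print(share_sick, float(share_sick))  # 1/102, just under 1 percent
```

About 10,000 false positives swamp the 99 true ones, so only 1 in 102 of the positive results belongs to someone who actually has the disease.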
So even though we think the test is accurate, the important part of the story is
that there’s another bit of information we need.
Here’s the key intuition.
What we have to do, once we know the test is positive,
is to weigh up the plausibility, or the likelihood, of two competing explanations.
Each of those explanations has a likely bit and an unlikely bit.
One explanation is that the person doesn’t have the disease
(that’s overwhelmingly likely, if you pick someone at random)
but the test gets it wrong, which is unlikely.
The other explanation is that the person does have the disease (that’s unlikely)
but the test gets it right, which is likely.
And the number we end up with,
the number which is a little bit less than one in 100,
has to do with how likely one of those explanations is relative to the other.
Each of them, taken as a whole, is unlikely.
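Weighing two competing explanations in this way is Bayes’ rule in odds form: the prior odds of one explanation against the other, multiplied by the ratio of how likely each makes the evidence. The Python sketch below is illustrative; the function name is made up here, and the inputs are the disease-test figures from the talk.

```python
from fractions import Fraction


def posterior_odds(prior: Fraction, p_evidence_if_a: Fraction,
                   p_evidence_if_b: Fraction) -> Fraction:
    """Odds of explanation A against explanation B after seeing the evidence.

    `prior` is the probability of A beforehand; B is the only alternative.
    """
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_evidence_if_a / p_evidence_if_b
    return prior_odds * likelihood_ratio


# A = has the disease, B = healthy; the evidence is a positive test.
odds = posterior_odds(Fraction(1, 10_000), Fraction(99, 100), Fraction(1, 100))
probability = odds / (1 + odds)
print(odds, probability)  # 1/101 1/102
```

The prior odds are 1 to 9,999 and a positive result is 99 times more likely if the person is ill, so the posterior odds are 1 to 101, a probability of 1/102.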
Here’s a more topical example of exactly the same thing.
Those of you in Britain will know about what’s become rather a celebrated case
of a woman called Sally Clark, who had two babies who died suddenly.
And initially, it was thought that they died of what’s known informally as “cot death,”
and more formally as “Sudden Infant Death Syndrome.”
For various reasons, she was later charged with murder.
And at the trial, her trial, a very distinguished pediatrician gave evidence
that the chance of two cot deaths, innocent deaths, in a family like hers
(which was professional and non-smoking) was one in 73 million.
To cut a long story short, she was convicted at the time.
Later, and fairly recently, she was acquitted on appeal; in fact, on the second appeal.
And just to set it in context, you can imagine how awful it is for someone
to have lost one child, and then two, if they’re innocent,
to be convicted of murdering them.
To be put through the stress of the trial, convicted of murdering them,
and to spend time in a women’s prison, where all the other prisoners
think you killed your children: that is a really awful thing to happen to someone.
And it happened in large part here because the expert got the statistics
horribly wrong, in two different ways.
So where did he get the one in 73 million number?
He looked at some research, which said the chance of one cot death in a family
like Sally Clark’s is about one in 8,500.
So he said, “I’ll assume that if you have one cot death in a family,
the chance of a second child dying from cot death isn’t changed.”
So that’s what statisticians would call an assumption of independence.
It’s like saying, “If you toss a coin and get a head the first time,
that won’t affect the chance of getting a head the second time.”
So if you toss a coin twice, the chance of getting a head twice is a half
(that’s the chance the first time) times a half (the chance the second time).
So he said, “Here,
I’ll assume that these events are independent.
When you multiply 8,500 by itself,
you get about 73 million.”
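The expert’s calculation amounts to the multiplication rule for independent events. The Python sketch below (the function name is hypothetical) reproduces both the coin example and the cot-death figure, using the talk’s approximate one-in-8,500 rate.

```python
from fractions import Fraction


def both_if_independent(p_first: Fraction, p_second: Fraction) -> Fraction:
    """Chance that both events happen; valid only if they are independent."""
    return p_first * p_second


two_heads = both_if_independent(Fraction(1, 2), Fraction(1, 2))
two_cot_deaths = both_if_independent(Fraction(1, 8_500), Fraction(1, 8_500))
print(two_heads)       # 1/4
print(two_cot_deaths)  # 1/72250000
```

8,500 squared is 72,250,000, close to the “about 73 million” quoted, since 8,500 is itself an approximate figure. The multiplication is only valid if the independence assumption holds.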
And none of this was stated to the court as an assumption
or presented to the jury that way.
Unfortunately here, and really regrettably,
first of all, in a situation like this you’d have to verify it empirically.
And secondly, it’s palpably false.
There are lots and lots of things that we don’t know about sudden infant deaths.
It might well be that there are environmental factors that we’re not aware of,
and it’s pretty likely to be the case that there are
genetic factors we’re not aware of.
So if a family suffers from one cot death, you’d put them in a high-risk group.
They’ve probably got these environmental risk factors
and/or genetic risk factors we don’t know about.
And to argue, then, that the chance of a second death is as if you didn’t know
that information is really silly.
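Without the independence assumption, the probability of two deaths is the probability of the first multiplied by the probability of the second given the first. The sketch below shows how sensitive the result is to that conditional probability. The risk multipliers are purely hypothetical: the talk gives no estimate of how much a first cot death raises the risk of a second, only that such a family belongs in a higher-risk group.

```python
from fractions import Fraction


def both(p_first: Fraction, p_second_given_first: Fraction) -> Fraction:
    """P(first and second) = P(first) * P(second | first); no independence assumed."""
    return p_first * p_second_given_first


p_one = Fraction(1, 8_500)  # the talk's figure for one cot death

# HYPOTHETICAL multipliers: how many times more likely a second cot death is
# in a family that has already had one. The talk gives no such estimate.
for multiplier in (1, 10, 100):
    p_second = min(Fraction(1), multiplier * p_one)
    p_two = both(p_one, p_second)
    print(f"multiplier {multiplier}: about 1 in {round(1 / p_two):,}")
```

The combined probability scales directly with the conditional risk, so each tenfold rise in that risk makes two deaths ten times more likely than the independence calculation suggests.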
It’s worse than silly; it’s really bad science.
Nonetheless, that’s how it was presented, and at trial nobody even argued it.
That’s the first problem.
The second problem is, what does the number of one in 73 million mean?
So after Sally Clark was convicted
(you can imagine, it made rather a splash in the press),
one of the journalists from one of Britain’s more reputable newspapers wrote that
what the expert had said was,
“The chance that she was innocent was one in 73 million.”
Now, that’s a logical error.
It’s exactly the same logical error as thinking that
after the disease test, which is 99 percent accurate,
the chance of having the disease is 99 percent.
In the disease example, we had to bear in mind two things,
one of which was the possibility that the test got it right or not.
And the other one was the chance, a priori, that the person had the disease or not.
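This error is commonly known as the prosecutor’s fallacy, or transposing the conditional: treating the probability of the evidence given innocence as if it were the probability of innocence given the evidence. Since the talk gives complete figures only for the disease example, the Python sketch below uses those to show how far apart the two conditional probabilities can be; the variable names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(1, 10_000)
p_pos_if_sick = Fraction(99, 100)
p_pos_if_healthy = Fraction(1, 100)  # the test's error rate on healthy people

# Total chance of a positive result, then Bayes' rule for the reversed question.
p_pos = prevalence * p_pos_if_sick + (1 - prevalence) * p_pos_if_healthy
p_healthy_if_pos = (1 - prevalence) * p_pos_if_healthy / p_pos

print("P(positive | healthy) =", p_pos_if_healthy)  # 1/100
print("P(healthy | positive) =", p_healthy_if_pos)  # 101/102
```

A healthy person tests positive only 1 percent of the time, yet a person who tests positive is healthy with probability 101/102, about 99 percent. In the same way, one in 73 million (even if it were right) would describe the chance of the deaths given innocence, not the chance of innocence given the deaths.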
It’s exactly the same in this context.
There are two things involved: two parts to the explanation.
We want to know how likely, or relatively how likely, two different explanations are.
One of them is that Sally Clark was innocent,
which is, a priori, overwhelmingly likely:
most mothers don’t kill their children.
And the second part of the explanation
is that she suffered an incredibly unlikely event.
Not as unlikely as one in 73 million, but nonetheless rather unlikely.
The other explanation is that she was guilty.
Now, we probably think a priori that’s unlikely.
And we certainly should think in the context of a criminal trial
that that’s unlikely, because of the presumption of innocence.
And then if she were trying to kill the children, she succeeded.
So the chance that she’s innocent isn’t one in 73 million.
We don’t know what it is.
It has to do with weighing up the strength of the other evidence against her
and the statistical evidence.
We know the children died.
What matters is how likely or unlikely, relative to each other,
the two explanations are.
And they’re both implausible.
That’s a situation where errors in statistics had really profound
and really unfortunate consequences.
In fact, there are two other women who were convicted on the basis of the
evidence of this pediatrician, who have subsequently been released on appeal.
Many cases were reviewed.
And it’s particularly topical because he’s currently facing a disrepute charge
at Britain’s General Medical Council.
So just to conclude: what are the take-home messages from this?
Well, we know that randomness and uncertainty and chance
are very much a part of our everyday life.
It’s also true that, although you, as a collective, are very special in many ways,
you’re completely typical in not getting the examples I gave right.
It’s very well documented that people get things wrong.
They make errors of logic in reasoning with uncertainty.
We can cope with the subtleties of language brilliantly
(and there are interesting evolutionary questions about how we got here),
but we are not good at reasoning with uncertainty.
That’s an issue in our everyday lives.
As you’ve heard from many of the talks, statistics underpins an enormous amount
of research in science (in social science, in medicine)
and indeed quite a lot of industry.
All of quality control, which has had a major impact on industrial processing,
is underpinned by statistics.
It’s something we’re bad at doing.
At the very least, we should recognize that, and we tend not to.
To go back to the legal context, at the Sally Clark trial
all of the lawyers just accepted what the expert said.
So if a pediatrician had come out and said to a jury,
“I know how to build bridges. I’ve built one down the road.
Please drive your car home over it,”
they would have said, “Well, pediatricians don’t know how to build bridges.
That’s what engineers do.”
On the other hand, he came out and effectively said, or implied,
“I know how to reason with uncertainty. I know how to do statistics.”
And everyone said, “Well, that’s fine. He’s an expert.”
So we need to understand where our competence is and isn’t.
Exactly the same kinds of issues arose in the early days of DNA profiling,
when scientists, lawyers and, in some cases, judges
routinely misrepresented evidence.
Usually, one hopes, innocently, but they misrepresented evidence.
Forensic scientists said, “The chance that this guy’s innocent is one in three million.”
Even if you believe the number, just like the 73 million to one,
that’s not what it meant.
And there have been celebrated appeal cases
in Britain and elsewhere because of that.
And just to finish, in the context of the legal system:
it’s all very well to say, “Let’s do our best to present the evidence.”
But more and more, in cases of DNA profiling (this is another one),
we expect juries, who are ordinary people
(and it’s documented they’re very bad at this),
to be able to cope with the sorts of reasoning that go on.
In other spheres of life (well, except possibly for politics),
if people argued illogically,
we’d say that’s not a good thing.
We sort of expect it of politicians and don’t hope for much more.
In the case of uncertainty, we get it wrong all the time,
and at the very least, we should be aware of that,
and ideally, we might try and do something about it.
Thanks very much.

Translation credits
Transcriber: collected from the web
Translator: collected from the web
Reviewer: approved automatically

Video source: https://www.youtube.com/watch?v=kLmzxmRcUTo