
人脸检测(维奥拉·琼斯算法) – 译学馆

人脸检测(维奥拉·琼斯算法)

Detecting Faces (Viola Jones Algorithm) - Computerphile

我想讲一下人脸检测
I’d like to talk about face detection
好 它是这么一回事
All right. So this is the idea or
如果你有一张照片
if you’ve got a picture with one face
里面有一张或多张脸
in it or many faces in it
那我们怎么找出这些脸来
how do we find those faces and
标准做法是 “啊 我们只要用深度学习就可以啊”
The standard approach is “Ah, we’ll just use deep learning”
如今你的确可以用深度学习来检测人脸
Now you can use deep learning to find faces
但实际上大家都在用的方法不是深度学习
But actually the approach that everyone uses isn’t deep learning
而是在本世纪初提出的一种方法
and it was developed in the early 2000s
让我们回到深度学习可以做任何事之前
So back before deep learning did everything
你需要自己想出这些算法
Em, you kind of had to come up
对不对 机器学习仍然还是有很大用处的
with these algorithms yourself, right? Machine learning was still a thing.
因此大家依然用机器学习的方法
So people still used machine learning
但是他们用的
but they used it
是一些人工的特征 小的神经网络以及其他分类器
with handcrafted features and small neural networks and other kinds of classifiers
他们试着用这些方法来进行人脸检测
that they tried to use to do these things
现在你知道人脸检测
Now, face detection was, you
在那时就已经开始了研究
know, ongoing research at this time
2002年 保罗·维奥拉和迈克尔·琼斯
In 2002, Paul Viola and Michael Jones came up
发表了一篇论文 名为
with this paper here called
“利用简单特征的增强串联进行快速目标检测”
“Rapid object detection using a boosted cascade of simple features”
这是一篇非常好的论文
and this is a very very good paper
它已被引用了约17000次
It’s been cited some 17,000 times
尽管深度学习几乎已经接管了一切
And despite the fact that deep learning has kind of taken over everything.
在人脸检测领域 这篇论文的方法依然完全有效
In face detection, this still performs absolutely fine, right
这个算法非常快 并且如果你有任何一个
It’s incredibly quick and if you’ve got any kind
可以进行人脸检测的相机
of camera that does some kind of face detection
这个相机用的算法会跟这个算法十分相似
It’s going to be using something very similar to this, right?
那么这个算法是怎么工作的?我来讲一下
So what does it do? Let’s talk about that.
问题这样定义 对吧
The problem is, right,
对于人脸检测有几个问题 其中一个
There are a few problems with face detection. One is
是我们不知道待检测的人脸有多大
that we don’t know how big the face is going to be
它可能很大也可能很小
So it could be very big could be very small,
另一个问题是
and another is, you know,
或许你有一张高分辨率的照片
Maybe you’ve got a very high-resolution image.
我们想要做的是
We want to be doing this
每秒运行特别多次
lots and lots of times a second
那我们要怎样做?
So what are we going to do?
仔细观察照片的每一个像素点
Look over every tiny bit of image
重复很多很多次?这也太复杂了
and lots of times? Complicated. Um.
机器学习告诉你这是一张脸吗?
Machine learning, that says, you know, is this a face?
这不是一张脸吗?
is this not a face?
在速度和准确率之间 在假正例和假负例之间都会有权衡
There’s a trade-off between speed and accuracy and false-positives and false-negatives.
这样一想完全没有头绪
It’s a total mess
要想快速检测出人脸是很难的 对吧?
It’s very difficult to find faces quickly, right?
这个算法也考虑到了这点
This is also considering that, you know,
我们人可以分为不同的族群
we have different ethnic groups, young and
年轻的 年长的 戴眼镜的 等等
old people, people who’ve got glasses on, things like this
所有这些汇合成了一个相当难的问题
So all of this adds up to quite a difficult problem,
但它不再是一个
and yet it’s not a problem
我们担心的问题了
we worry about anymore
我们可以解决它 是因为论文的作者
because we can do it and we can do it because of these guys
提出了一种分类器 它使用一些
They came up with a classifier that uses very
非常简单的特征 从一张图的一个比特点
very simple features, one bit of an image
减去另一个比特点
subtracted from another bit of an image.
仅靠这一个点的特征 效果不会特别好
On its own, that’s not very good,
但是如果你有很多很多这样的像素点的话
but if you have thousands and thousands of those
你会发现或许这就是一张脸
all giving you a clue that maybe this is a face
你可以开始进行合适的判断
you could start to come up with a proper decision
场外音:这个算法是在找面部特征吗
[offscreen] Is this looking for facial features then? Is it
只是简单地找鼻子啊眼睛啊之类的?
as simple as looking for a nose and an eye and etc?
实际上不是的
So no, not really, right.
深度学习是这样做的 对吧?
So deep learning kind of does that right?
它把物体的轮廓以及
It takes edges and
其他特征组合成物体
other features and it combines them together into objects
通过逐层组合 或许就可以检测到脸
you know, in a hierarchy and then maybe it finds faces.
而我们讲的这个算法是要对脸是什么样子
What this is doing is making very quick decisions about
做一个十分快速的判断 举个例子
what it is to be a face. So, for example,
如果我们在看一张灰度图
if we’re just looking at a grayscale image, right,
我的眼睛肯定要比我的额头暗的 对吧?
my eye is arguably slightly darker than my forehead, right?
比如说阴影以及
In terms of shadowing and the
瞳孔更暗之类的
pupil’s darker and things like this
所以如果你只是用这张图的这个比特点
So if you just do this bit
减去这张图的这个比特点
of image minus this bit of image
我的眼睛与这个黑板相比
My eye is going to produce a different response
在大多数时候 会有不同的响应 对吧
from this blackboard, right, most of the time
现在 如果你只是这样做
Now, if you do that on its own,
这显然不是个好的分类器 对吧
that’s not a very good classifier, right?
这样做将会检测到很多的脸
It’ll get quite a lot of the faces
但它也会找到很多其他的
But it’ll also find a load of other
也是恰巧同时
stuff as well where something happens to be darker
比其他东西暗的东西
than something else that happens all the time
所以问题来了 我们可以一次性找出许多
so the question is “can we produce a lot of these things
这种暗的东西 之后再像那样做决策吗
all at once and make a decision that way?”
这篇论文的作者提出了这些十分简单的矩形特征
They proposed these very very simple rectangular features
那就是 用一张图片的一部分
which are just one part
减去这张图片的另一部分
of an image subtracted from another part of an image
所以就有了几种这样的特征
So there are a few types of these features.
其中一种是两个矩形特征
One of them is a two-rectangle feature
于是我们有一张图
So we have a block
在这张图里用一边减去另一边
of image where we subtract one side from the other side
他们的方法是一种基于机器学习的算法
Their approach is a machine learning-based approach
通常来说 在机器学习里
Normally, what you would do
你要做的东西是提取(特征)
in machine learning is you would extract —
你不能把整张图进行输入
You can’t put the whole image
因为有可能这张图里有500张脸
in maybe there’s five hundred faces in this image
所以我们是要把从图中
So we put in something
计算得到的特征作为算法的输入
we’ve calculated from the image some features
之后再用一些机器学习的方法
and then we use machine learning to try and classify
去对图中的点或者整张图进行分类 诸如此类
bits of the image or the whole image or something like this.
这篇论文的贡献是用一种非常快的方式
Their contribution was a very quick way to
计算这些特征并且将它们用于人脸分类
calculate these features and use them to make a face classification
也就是判断在图的这块区域中有没有脸
To say there is a face in this block of image or there isn’t
他们用的这些特征是不是超级简单?
And the features they use are super simple, right?
他们只是像这样计算矩形特征
So they’re just rectangular features like this
用这个方法 我们就可以得到相邻的两个矩形
So we’ve got two rectangles next to each other which,
也就是一些像素点的集合
you know are some amount of pixels
所以有可能这边9个像素点 另一边9个像素点
so maybe it’s, it’s nine pixels here and nine pixels here or
或者每一边都是1个像素点 或者每一边都是100个
just one pixel and one pixel, or a hundred pixels and a hundred pixels
这不重要
It’s not really important.
之后我们是要用一边的像素点值减去另一边的 对吧?
and we do one subtract the other right?
所以本质上来说 我们是在找
So essentially we’re looking
一张图中哪些像素点比其他的点更暗或者更亮
for bits of an image where one bit is darker or brighter than another bit
这是一个双矩形的特征
This is a two rectangle feature.
这个特征还可以像这样进一步调整
It can also be oriented the other way so you know like this
我们还可以有三矩形特征
We also have three rectangle features which are like
这时你就可以这样做
this where you’re doing sort of maybe the
用中间的矩形减去两边的或反过来
middle subtract the outside or vice versa
并且我们也可以计算四个矩形的特征
And we have four-rectangle features, which are
来找到类似于对角线啊边角之类的东西
going to be kind of finding diagonal sort of corner things
就像这样
So something like this
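As a rough illustration of the rectangle features just described, here is a minimal Python sketch that computes them by summing pixels directly. The function names, the list-of-rows image format and the sample image are assumptions made for this example, and the choice of which side is subtracted from which is arbitrary.

```python
# Naive Haar-like rectangle features on a grayscale image stored as a list of rows.
# All names here are illustrative; none of them come from the paper.

def region_sum(img, top, left, height, width):
    """Sum the pixels in the rectangle whose top-left corner is (top, left)."""
    return sum(sum(row[left:left + width]) for row in img[top:top + height])

def two_rect_horizontal(img, top, left, height, width):
    """Left half minus right half of a (height x 2*width) block."""
    return (region_sum(img, top, left, height, width)
            - region_sum(img, top, left + width, height, width))

def two_rect_vertical(img, top, left, height, width):
    """Top half minus bottom half of a (2*height x width) block."""
    return (region_sum(img, top, left, height, width)
            - region_sum(img, top + height, left, height, width))

def three_rect_horizontal(img, top, left, height, width):
    """Middle strip minus the two outer strips of a (height x 3*width) block."""
    outer = (region_sum(img, top, left, height, width)
             + region_sum(img, top, left + 2 * width, height, width))
    middle = region_sum(img, top, left + width, height, width)
    return middle - outer

def four_rect(img, top, left, height, width):
    """One diagonal pair minus the other in a (2*height x 2*width) block."""
    a = region_sum(img, top, left, height, width)                   # top-left
    b = region_sum(img, top, left + width, height, width)           # top-right
    c = region_sum(img, top + height, left, height, width)          # bottom-left
    d = region_sum(img, top + height, left + width, height, width)  # bottom-right
    return (a + d) - (b + c)

# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [
    [1, 1, 9, 9],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
print(two_rect_horizontal(image, 0, 0, 4, 2))  # -64: the left side is much darker
print(two_rect_vertical(image, 0, 0, 2, 4))    # 0: top and bottom look the same
```

The horizontal feature responds strongly to the vertical edge in this image while the vertical feature does not, which is the sense in which each feature is a weak clue about local structure.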
即使你的图片很小
Even if your image is small right
你也可以有很多不同的特征
you’re going to have a lot of different possible features
甚至提到的这四种特征
even of these four types
这四个矩形可以每个都只有一个像素点
So this four rectangle feature could just be one pixel each
也可以每个都是图的一半
or each of these could be half the image
它的大小可以伸缩 位置可以移动
It can scale, you know, and move around
布兰迪:这个大小是怎么决定的?
Brady: What determines that?
迈克:呃 他们把不同大小的都用上
Mike: Um, so they do all of them, right?
或至少看一下不同大小的特征矩形
Or at least they look at all of them originally
看看哪个大小的
And they learn which ones are the most
对于检测脸更有用
useful for finding a face
覆盖整张图的脸并不能表示一般的
This over a whole image of a face isn’t hugely
脸在图中什么样 对吧
representative of what a face looks like right?
没有谁的脸
No one’s face has
是这个角比另外的角暗的
two corners darker than the other two corners
这没道理 对吧
That doesn’t make sense, right?
但可能覆盖他们的眼睛会更有道理
but maybe over their eye, maybe that makes more sense
我不知道 这需要再考虑
I don’t know, that’s the kind of the idea.
所以他们有一个训练过程
So they have a training process that whittles down
来决定哪一个特征更有用
which of these features are useful,
另一个问题是 在一张图里
the other problem we’ve got is that on an image,
进行大批量的像素点运算
calculating large groups of pixels
并把它们加起来 这是个很费时的过程
and summing them up is quite a slow process
所以论文作者提出了个漂亮的想法
So they came up with a really nifty idea
名叫积分图 可以使得上述运算大大加快
called an integral image which makes this way way faster
现在想象我们有一张图
So let’s imagine we have an image right? and so think — consider while
考虑到我们正在讨论的是
we’re talking about this that we want
想计算图中的像素点
to kind of calculate these bits of image
并减去图中其他像素点 对吧
but minus some other bit of image, right?
想象我们有一张
So let’s imagine we have
漂亮但是很小的图片
an image which is nice and small
这张图太小以至于我在上面都写不下东西
It’s too small for me to write on
但我们不要担心这个
but let’s not worry about it
好 之后画一些像素值
right and then let’s draw in some pixel values.
过程快进
Fast forward.
看下现在的状态
Look at the state of that.
完全是一片混乱
That’s that’s a total total shambles
这是个可擦除笔吧?我的天
This is a rubbable-out pen, right? For goodness sake
好的好的
Right right okay okay so all right so
想象一下这就是我们的输入图片
Let’s imagine this is our input image.
我们要在里面检测人脸
We’re trying to find a face in it
现在我还没有看到
Now I can’t see one
但显然这张脸可能会相当大
But obviously this could be quite a
我们想要计算
lot bigger and we want to calculate
我们的双矩形特征之一
let’s say one of our two rectangle features
那或许我们是想要
So maybe we want to do
拿上面的这四个像素点
these four pixels up in the top
减去下面的这四个像素点
Minus the four pixels below it
现在就只是几个加法运算了 (7+7+1+2)
now that’s only a few additions: (7 + 7 + 1 + 2)
减去 (8+3+1+2)
minus (8 + 3 + 1 + 2)
但如果你对图片的很多区域
But if you’re doing this over large sections of image
都重复多次这个操作 试着去检测人脸
and thousands and thousands of times to try and find faces
那肯定是不行的
That’s not gonna work
所以维奥拉·琼斯想出来了办法
So what Viola and Jones came up with
就是我们预先在整张图上
was this integral image where we pre-compute
计算一些东西
some of this arithmetic for us,
把计算结果存储为中间形式
store it in an intermediate form,
之后我们就能够
and then we can calculate
非常容易地计算矩形相减
rectangles minus other rectangles really easily
据此 我们先过一遍这个图
So we do one pass over the image,
每一个新的像素值是它的上边 左边
and every new pixel is the sum of all the pixels
和它自己的像素值之和
above and to the left of it, including it
对吧?就像这样
Right? So this will be something like this
于是 1+7等于8
So, 1 + 7 is 8,
所以这个像素点就是这两个像素点值之和
so this pixel is the sum of these two pixels
这个像素点就是这三个像素点值之和
and this pixel is going to be all these three
所以就会是12 14 23……
So that’s going to be 12… 14… 23
现在我们快进一下
and now we fast forward
我在自己脑中算一遍
while I do a bit of math in my head
8 17…或许我算得比某些人快…24
8, 17… maybe I did some of these earlier… 24…
计算机算起这个来会非常快
On a computer this is much much faster
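The single pass described above can be sketched in Python as follows. The first row of the sample image uses pixel values consistent with the running totals read out in the video (1, 8, 12, 14, 23); the other two rows and the function name are made up for illustration.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of all pixels in rows 0..y and columns 0..x."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_total = 0  # running sum of the current row up to and including column x
        for x in range(width):
            row_total += img[y][x]
            above = ii[y - 1][x] if y > 0 else 0  # everything in the rows above, up to column x
            ii[y][x] = row_total + above
    return ii

image = [
    [1, 7, 4, 2, 9],  # consistent with the totals spoken in the video
    [7, 5, 3, 8, 2],  # hypothetical values
    [2, 3, 1, 6, 4],  # hypothetical values
]
for row in integral_image(image):
    print(row)
# [1, 8, 12, 14, 23]
# [8, 20, 27, 37, 48]
# [10, 25, 33, 49, 64]
```

The bottom-right entry is the sum of the whole image, which is the role the value 113 plays in the drawing.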
所有像素点的值加起来是113
The sum of all the pixels is 113.
举个例子 这个4×4的像素块之和是68
For example, the sum of this 4×4 block is 68
这样做有用 原因是 等我一下
Now the reason this is useful, bear with me here
如果我们想要计算
But if we want to work out what,
这一块区域的像素点值之和
let’s say, the sum of this region
我们要做的就是
is what we do is we take this one
用这个113减去这个值64
113 we subtract this one, minus 64
再减去这个值71 这样就去掉了
alright? and this one minus 71
这一块和这一块区域的值
and that’s taken off all of that and all of that
之后我们还要加上这一块区域的值
and then we have to add this bit in
因为这块区域被减了两次
because it’s been taken off twice
所以再加上40
so plus 40.
好 所以一共是读了4个数
All right, so that’s four reads.
有趣的是 这是一块4×4的区域
Now funnily enough this is a 4 by 4 block
所以看起来不算什么
So I’ve achieved nothing
但是如果这是张很大很大的图
But if this was a huge huge image,
我就可以节省大量的时间
I’ve saved a huge amount of time
我们刚才的计算结果是18
and the answer to this is 18
就正好等于6+6+5+1
which is 6 plus 6 plus 5 plus 1
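The four-read lookup can be sketched like this. This version pads the integral image with a row and a column of zeros, an implementation choice assumed here rather than something stated in the video, so that rectangles touching the top or left edge need no special case. The image values and function names are hypothetical.

```python
def padded_integral_image(img):
    """Integral image with an extra row and column of zeros at the top and left.

    ii[y][x] holds the sum of all pixels strictly above row y and left of column x.
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle from exactly four reads of the integral image."""
    bottom, right = top + height, left + width
    # whole area, minus the part above, minus the part to the left,
    # plus the top-left overlap that was subtracted twice
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

image = [
    [3, 1, 4, 1],
    [5, 9, 2, 6],
    [5, 3, 5, 8],
    [9, 7, 9, 3],
]
ii = padded_integral_image(image)
print(rect_sum(ii, 2, 2, 2, 2))  # 25, i.e. 80 - 31 - 42 + 18, the same as 5 + 8 + 9 + 3
```

The cost of `rect_sum` does not depend on the rectangle's size, which is where the saving comes from on large regions.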
所以有个假设是
So the assumption is that
我不打算只看一遍
I’m not just going to be looking at these pictures
这些图片 是不是?
one time to do this, right?
有可能人脸会出现在很多地方
There’s lots of places a face could be I’ve got to look
我要在不同的区域考虑不同的像素组合
at lots of combinations of pixels and different regions
所以我会做很多次像素值的加减运算
So I’m going to be doing huge amounts of pixel addition and subtraction
所以就让我们对整张图计算一次
So let’s calculate this integral image once and then
之后以这个作为基础 后续计算就会快很多
use that as a base to do really quick
对区域的像素值进行加减操作 对吧?
Adding and subtracting of regions, right?
这样做的话 举个例子 4个矩形区域
and so I think for example a 4 rectangle region
就需要读取9次存储结果
is going to take something like nine reads
诸如此类 再做几次加法
or something like that and a little bit of addition.
这非常简单
It’s very simple
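To see where a figure like nine reads comes from, here is a hedged sketch of a four-rectangle feature computed from the nine grid corners of a zero-padded integral image. The padding, the function names and the test image are assumptions for this example.

```python
def padded_integral_image(img):
    """Integral image with a leading row and column of zeros."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def four_rect_feature(ii, top, left, height, width):
    """(top-left + bottom-right) minus (top-right + bottom-left) for a 2x2 grid
    of height x width cells, using nine reads of the integral image."""
    y0, y1, y2 = top, top + height, top + 2 * height
    x0, x1, x2 = left, left + width, left + 2 * width
    # Nine reads: one per corner of the 2x2 grid of cells.
    p00, p01, p02 = ii[y0][x0], ii[y0][x1], ii[y0][x2]
    p10, p11, p12 = ii[y1][x0], ii[y1][x1], ii[y1][x2]
    p20, p21, p22 = ii[y2][x0], ii[y2][x1], ii[y2][x2]
    top_left = p11 - p01 - p10 + p00
    top_right = p12 - p02 - p11 + p01
    bottom_left = p21 - p11 - p20 + p10
    bottom_right = p22 - p12 - p21 + p11
    return (top_left + bottom_right) - (top_right + bottom_left)

# Hypothetical checkerboard-like image: bright on one diagonal, dark on the other.
checker = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
print(four_rect_feature(padded_integral_image(checker), 0, 0, 2, 2))  # 64
```

Adjacent cells share corners, so the four rectangle sums need nine distinct lookups rather than sixteen.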
好了 现在我们考虑怎么把这个算法
All right. So now how do we turn this
变成可以实际工作的人脸检测器
into a working face detector?
我们想象下有一张人脸图片
Let’s imagine we have a picture of a face
又到展现我画技的时候了
which is going to be one of my good drawings again
现在 在这个算法中
Now in this particular algorithm
他们要看这个24×24的像素区域
they look at 24 by 24 pixel regions
但他们也可以稍微增大或减小区域大小
but they can also scale up and down a little bit
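A minimal sketch of how such windows might be enumerated over an image. The 24-pixel base size comes from the transcript; the scale factor of 1.25, the step rule and the function name are assumptions chosen for illustration.

```python
def sliding_windows(image_height, image_width, base=24, scale_step=1.25):
    """Yield (top, left, size) for square windows at increasing scales."""
    size = base
    while size <= min(image_height, image_width):
        # Move larger windows in proportionally larger steps.
        step = max(1, round(size / base))
        for top in range(0, image_height - size + 1, step):
            for left in range(0, image_width - size + 1, step):
                yield top, left, size
        # Grow the window; max() guards against a scale_step too small to change the size.
        size = max(size + 1, int(size * scale_step))

windows = list(sliding_windows(48, 48))
print(len(windows))  # number of candidate regions in a tiny 48x48 image
```

Even a 48 by 48 image produces on the order of a thousand candidate regions under these assumptions, which is why the per-region cost matters so much.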
我们假设在这里有一张脸
So let’s imagine there’s a face here
这张脸有眼睛 鼻子
which has, you know eyes, a nose
和一张嘴 还有一些头发
and a mouth right and some hair
好的 像我之前讲到的
Okay. Good. Now as I mentioned earlier,
可能会有一些特征
there are probably some features
对于检测这张脸不那么有用
that don’t make a lot of sense on this
举个例子 如果我拿我的笔
So subtracting, for example, if I take my red pen
用这一半图像减去另一半
subtracting this half of image from this half.
这不会表示大多数情况下的人脸
It’s not going to represent most faces
除非其中一边光线很亮
It may be when there’s a lot of lighting on one side,
但是它没法很好地区分
but it’s not very good at distinguishing
包含人脸的照片
images that have faces in and
和不含人脸的照片
images that don’t have faces in
所以 他们的做法是
So what they do,
他们计算这张24×24图片的所有的特征
is they calculate all of the features, right for a 24 by 24 image,
计算所有的18万种可能的组合
they calculate all 180,000 possible combinations
包括2个 3个 4个矩形特征 之后他们算出
of 2, 3 and 4 rectangle features and they work out which one,
对于给定的包含脸的和不包含脸的图片数据集
for a given data set of faces and not faces,
哪一个特征能够更好地区分正负样本
which one best separates the positives from the negatives, right?
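The size of this feature pool can be checked by brute-force counting. The sketch below assumes five base shapes (two orientations of the two-rectangle feature, two of the three-rectangle feature, and one four-rectangle feature), each scaled by whole-number factors and placed at every position in the window. Under that convention the count is 162,336 rather than the roughly 180,000 quoted in the video; the total depends on exactly which shapes and scalings are included.

```python
def count_features(window=24,
                   base_shapes=((1, 2), (2, 1), (1, 3), (3, 1), (2, 2))):
    """Count every size and position of each base shape (rows, cols) in a square window."""
    total = 0
    for base_h, base_w in base_shapes:
        # Heights and widths grow in multiples of the base shape so the
        # rectangles inside a feature stay equal in size.
        for h in range(base_h, window + 1, base_h):
            for w in range(base_w, window + 1, base_w):
                total += (window - h + 1) * (window - w + 1)
    return total

print(count_features())  # 162336
```

Either way, the number of candidate features is far larger than the 576 pixels in the window, which is why the selection step matters.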
我们假定你有1万张带人脸的图片
So let’s say you have 10,000 pictures of faces
1万张只有背景的图片 哪个特征更好地
10,000 pictures of background which one feature best
告诉你“这里有人脸 这里没有”
says “this is a face, this is not a face”? Right, bearing in mind
记住 如果只靠一个特征
Nothing is going to get it completely right
不会百分之百正确
with just one feature
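Picking the single most useful feature can be sketched as an exhaustive search over features and thresholds. This is a simplification: the paper's title refers to boosting, which (as in AdaBoost) also weights the training samples and repeats the search many times, whereas the version below counts unweighted errors once. The data layout and names are hypothetical.

```python
def best_stump(feature_values, labels):
    """Pick the feature, threshold and polarity with the fewest training errors.

    feature_values[i][j] is feature j evaluated on training sample i;
    labels[i] is 1 for a face and 0 for a non-face.
    Returns (errors, feature_index, threshold, polarity).
    """
    best = None
    n_features = len(feature_values[0])
    for j in range(n_features):
        column = [row[j] for row in feature_values]
        for threshold in sorted(set(column)):
            # polarity 1 predicts "face" when value >= threshold,
            # polarity -1 predicts "face" when value <= threshold
            for polarity in (1, -1):
                errors = 0
                for value, label in zip(column, labels):
                    predicted = 1 if polarity * value >= polarity * threshold else 0
                    errors += predicted != label
                candidate = (errors, j, threshold, polarity)
                if best is None or candidate < best:
                    best = candidate
    return best

# Four hypothetical training samples with two features each.
feature_values = [[5, 20], [6, 3], [7, 18], [4, 2]]
labels = [1, 0, 1, 0]
print(best_stump(feature_values, labels))  # (0, 1, 18, 1)
```

Here feature 1 separates the two classes perfectly at a threshold of 18, while feature 0 cannot, so feature 1 is chosen.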
所以它采用的第一个特征
So the first one it looks at,
是类似这样的东西
it turns out is something like this
是一个双矩形的区域
It’s a two rectangle region,
但是在眼睛和脸颊区域
that works out the difference between
得到的结果会有区别
the area of the eyes and the area of the cheeks
就是说 如果是正常的人脸
So it’s saying if on a normal face
你的脸颊一般会比你的眼睛更亮或者更暗
your cheeks are generally brighter or darker than your eyes
所以他们所做的 就是他们提到……
So what they do is they say, okay Well,
呃……我们从只用这一个特征
let’s start a classifier
构造的分类器说起 看下效果
with just that feature right and see how good it is
这是我们第一个特征 特征一
This is our first feature feature number one,
我们有一个很低的阈值
and we have a pretty relaxed threshold
所以如果这个区域中有任何稍微看起来
so if there’s anything plausible in this region
像人脸的 我们都判为正例
we’ll let it through right which is going
这样就会把所有的人脸
to let through all of the faces
以及一堆其他的我们不需要
and a bunch of other stuff as well
的东西都通过了
that we don’t want right.
就是这样 这样也可以 对吧?
So this is yes. That’s okay, right?
如果分类器判为负例 那么我们立刻就
That’s okay if it’s a no then we immediately
可以判断这块区域不含有人脸 对吧?
fail that region of image right?
所以我们运行一次试验
So we’ve done one test which is
如之前讲的 四次加法
as we know about four additions
我们就可以说 对于图像的这块区域
So we’ve said for this region of image
如果分类结果通过的话
if this passes
就可以继续进行下一阶段 对吧
we’ll let it through to the next stage, right?
如果该区域可能有人脸 就可以到下一阶段
And we’ll say okay it definitely could be a face
如果没有人脸 就不可以 这样有道理吧
It’s not not-a-face. Does that make sense? Yeah, okay
好 我们来看下一个特征
So let’s look at the next feature
下一个特征是这样的
The next feature is this one
是一个三区域的特征
So it’s a three region feature
它测定了鼻子 鼻梁
and it measures the difference between the nose and
和眼睛之间的不同
the bridge and the eyes, right?
它们会不会更暗或者更亮
which may or may not be darker or lighter.
好 这里会有不同
All right, so there’s a difference there
这就是特征二
So this is feature number two,
我在这里标记上 二号特征
so I’m going to draw that in here number two
如果这个也通过了 我们就看下一个特征
And if that passes we go to the next feature
这样就构成了二分决策
so this is a sort of binary
他们把这个称为“退化决策树”
they call it a “degenerate decision tree”, right?
因为决策树是一个二叉树
well because the decision tree is a binary tree
而我们刚才所说不是严格的二叉树
This is not really because you immediately
因为你在这里就停住不往下了
stop here, you don’t go any further.
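The early-exit behaviour of this chain can be sketched as below. Each stage here is a single feature with a threshold, matching the drawing in the video; in the actual detector a stage may combine several features. The feature functions, thresholds and window values are invented for the example.

```python
def run_cascade(stages, window):
    """Return (is_face, stages_evaluated) for one window.

    stages is an ordered list of (feature_function, threshold) pairs.
    The window is rejected at the first stage whose feature falls below its threshold.
    """
    for evaluated, (feature, threshold) in enumerate(stages, start=1):
        if feature(window) < threshold:
            return False, evaluated  # fail immediately, skip all later stages
    return True, len(stages)

# Hypothetical average brightness of a few regions inside a window.
stages = [
    (lambda w: w["cheeks"] - w["eyes"], 10),       # feature 1: cheeks brighter than eyes
    (lambda w: w["nose_bridge"] - w["eyes"], 15),  # feature 2: bridge brighter than eyes
]
face_like = {"eyes": 40, "cheeks": 90, "nose_bridge": 80}
blank_wall = {"eyes": 70, "cheeks": 70, "nose_bridge": 70}

print(run_cascade(stages, face_like))   # (True, 2)
print(run_cascade(stages, blank_wall))  # (False, 1)
```

The blank region is discarded after one feature evaluation, while only the plausible region pays for the second.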
关键点在于 每次我们计算
The argument is that every time we calculate one
其中一个特征时 会花一点时间
of these features it takes a little bit of time
越快得出“这里绝对没有人脸” 越好
The quicker we can say “no, definitely not a face in there”, the better.
我们需要计算所有特征或者其中最好特征的
And the only time we ever need to look at all the features,
唯一情况 是我们觉得
or all of the good ones, is when we think, “
好 这里真的可能是一张脸
okay, that actually could be a face here”
所以我们用越来越精细
So we have less and less general,
越来越具体的特征
more and more specific features going forward
直到这个数量
right up to about the number
他们论文里最终用了大概6000个特征
I think it’s about six thousand they end up using.
好 我们先问:第一个特征通过了吗
All right, so we say: does the first one pass?
通过了 那第二个特征通过了吗
Yes. Does the second one pass?
通过了 就一直这样下去
Yes. And we keep going until we
直到分类器判该区域为一个负例
get a fail and if we get all
如果我们一直这样下去没有得到负例结果
the way to the end and nothing fails
那就说明这是一张人脸 这个算法妙在
that’s a face, right and the beauty of this, is that
对于图片的绝大多数区域不需要计算
for the vast majority of the image, there’s no computation at all.
我们只需要看一下
We just take one look at it
如果特征一失败了 “啊 不是人脸”
first feature fails, “Nah, it’s not a face”
他们设计了一个非常巧妙的方法
They designed a really good way of
来计算图像不同区域之间的加减运算
adding and subtracting different regions of the image
之后他们训练了像这样的
And then they trained a classifier
分类器来找到最好的特征
like this to find the best features
以及用这些特征的最佳顺序
and the best order to apply those features
这在总是检测到人脸
which was a nice compromise between
和假正例之间 以及在速度上
always detecting the faces that are there
做到了很好的权衡 对吧?
and false positives and speed right?
并且我认为在那个时候
And at the time, this was running on,
这个算法也可以让你感受到
I think to give you some idea of
2002年计算机的性能是什么样
what the computational technology was like in 2002
在700兆赫兹的奔腾3处理器上
This was presented on a 700 megahertz Pentium 3
这个算法可以每秒处理15帧
and ran at 15 frames a second
这在那时是闻所未闻的 对吧?
which was totally unheard of back then, right?
人脸检测在那时不是实时进行的
Face detection was kind of offline, you know,
而这个算法在那时却可以
it was okay at that time
所以这是个特别棒的算法
So this is a really, really cool algorithm
并且它也十分有效
and it’s so effective
直到现在你还可以看到
that you still see it used in,
在你的手机相机里
you know, in your camera phone
在这个相机里都还在用这个算法
and in this camera and so on,
你看到镜头里的人脸周围会有个框
when you just get a little bounding box around the face.
这个算法仍然很有用
And this is still really useful
因为虽然你可能用深度学习
because you might be doing deep learning on something
做了如人脸识别 人脸ID之类的任务
like face recognition, face ID something like this
但部分过程的第一步还是要确定人脸在哪
But part of that process is firstly working out
既然这个算法很有效 就可以直接用啊
where the face is, and why reinvent the
为什么还要再去做重复工作呢
wheel when this technique works really really well

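Translation credits
Video overview

This video introduces a face detection algorithm proposed around 2002 that finds which regions of a photo contain a face. The algorithm is fast and effective, and it is still used today in applications such as phone cameras. It rests on two main ideas. The first is the integral image, which precomputes sums of pixel regions so that later calculations are cheap. The second is a degenerate decision tree, which applies a classifier to one simple feature at a time so that it can quickly decide whether a region contains a face.

Transcription

Collected from the web

Translation

Imagist

Reviewer

Reviewer T

Video source

https://www.youtube.com/watch?v=uEJ71VlUmMQ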
