Dear Fellow Scholars,
this is Two Minute Papers with Károly Zsolnai-Fehér.
This episode is about a really nice new paper on pose estimation.
Pose estimation means
that we have an image or video of a human as an input,
and the output should be this skeleton that you see here,
which shows us how this person's body is currently positioned.
Sounds alright, but what are the applications of this, really?
Well, it has a huge swath of applications.
For instance, many of you have often heard about
motion capture for video games and animation movies,
but it is also used in medical applications for finding abnormalities in a patient’s posture,
animal tracking, understanding sign language,
pedestrian detection for self-driving cars,
and much, much more.
So if we can do something like this in real time,
that’s hugely beneficial for many, many applications.
However, this is a very challenging task,
because humans vary greatly in appearance,
images can be taken from all kinds of viewpoints,
and as a result, the algorithm also has to deal with occlusions, where parts of the body are hidden.
This is particularly hard; have a look here.
In these two cases, we don’t see the left elbow,
so it has to be inferred from seeing the remainder of the body.
We have the reference solution on the right,
and as you see here, this new method is significantly closer to it
than any of the previous works.
The main idea in this paper is that
it works out the poses both in 2D and 3D
and contains neural networks
that can convert in both directions between these representations
while retaining the consistency between them.
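To make the idea of 2D-3D consistency a bit more concrete, here is a minimal sketch of one common way to measure it: project the 3D joints onto the image with a simple pinhole camera and compare them to the 2D keypoints. This is an illustration under our own assumptions (the joint format, the pinhole model, and the names `project` and `reprojection_error` are hypothetical), not the networks described in the paper.

```python
import math

# Hypothetical joint format, not taken from the paper:
# 3D joints are (x, y, z) in camera coordinates, 2D keypoints are (u, v).
Joint3D = tuple[float, float, float]
Joint2D = tuple[float, float]


def project(joints_3d: list[Joint3D], focal: float = 1.0) -> list[Joint2D]:
    """Pinhole projection of 3D joints onto the image plane (assumes z > 0)."""
    return [(focal * x / z, focal * y / z) for x, y, z in joints_3d]


def reprojection_error(joints_3d: list[Joint3D],
                       keypoints_2d: list[Joint2D],
                       focal: float = 1.0) -> float:
    """Mean Euclidean distance between projected 3D joints and 2D keypoints.

    Zero means the 3D pose and the 2D pose agree perfectly.
    """
    projected = project(joints_3d, focal)
    dists = [math.dist(p, k) for p, k in zip(projected, keypoints_2d)]
    return sum(dists) / len(dists)
```

For example, with `focal = 1`, a joint at (0.2, -0.4, 2.0) projects to (0.1, -0.2), so a 2D estimate reporting exactly that keypoint gives an error of zero.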
First, the technique comes up with an initial guess,
and follows up by using these pose transformer networks
to further refine this initial guess.
This makes all the difference.
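The "initial guess, then refine" pattern can be illustrated with a classical stand-in: nudging an initial 3D pose with gradient descent until its projection matches the observed 2D keypoints. To be clear, this is a swapped-in textbook technique, not the paper's pose transformer networks, which learn the refinement instead; the function name `refine`, the fixed depth, and the step size are all hypothetical choices for a tiny, self-contained sketch.

```python
# Hypothetical setup, not from the paper: a 3D pose is a list of (x, y, z)
# joints, the observed 2D keypoints are (u, v), and depth z is held fixed.


def refine(initial_3d, keypoints_2d, focal=1.0, lr=0.5, steps=200):
    """Refine an initial 3D guess by gradient descent on squared reprojection error.

    Only x and y are updated; each joint keeps its initial depth z.
    """
    pose = [list(joint) for joint in initial_3d]
    for _ in range(steps):
        for joint, (u, v) in zip(pose, keypoints_2d):
            x, y, z = joint
            # Residuals between the projected joint and the observed keypoint.
            ru = focal * x / z - u
            rv = focal * y / z - v
            # Gradient of ru**2 w.r.t. x is 2 * ru * focal / z (same for y).
            joint[0] -= lr * 2 * ru * focal / z
            joint[1] -= lr * 2 * rv * focal / z
    return [tuple(joint) for joint in pose]


initial_guess = [(0.0, 0.0, 2.0), (0.0, 0.0, 2.0)]
observed = [(0.0, 0.0), (0.1, -0.2)]
refined = refine(initial_guess, observed)
```

Starting from a crude guess where both joints sit at the origin, the second joint moves to roughly (0.2, -0.4, 2.0), the position whose projection matches its keypoint. A learned refinement network would replace the hand-written update step, which is where methods like the one in this paper get their advantage.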
And not only does it lead to high-quality results,
but it also takes way less time than previous algorithms:
we can expect to obtain a predicted pose in about 51 milliseconds,
which is almost 20 frames per second.
This is close to real time,
and is more than enough for many of the applications
we’ve talked about earlier.
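As a quick sanity check on the "almost 20 frames per second" figure, converting the per-frame time into a frame rate gives:

```latex
\text{frame rate} = \frac{1000\ \text{ms/s}}{51\ \text{ms/frame}} \approx 19.6\ \text{frames per second}
```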
In the age of rapidly improving hardware,
these are already fantastic results
both in terms of quality and performance.
And not only the hardware,
but the papers are also improving at a remarkable pace.
What a time to be alive.
The paper contains an exhaustive evaluation section.
It is measured against a variety of high-quality solutions.
I recommend having a look; the paper is available in the video description.
I hope nobody is going to install a system
in my lab that starts beeping every time I slouch a little,
but I am really looking forward to benefitting from these other applications.
Thanks for watching and for your generous support,
and I’ll see you next time!