• #### 科普

SCIENCE

#### 英语

ENGLISH

#### 科技

TECHNOLOGY

MOVIE

FOOD

#### 励志

INSPIRATIONS

#### 社会

SOCIETY

TRAVEL

#### 动物

ANIMALS

KIDS

#### 卡通

CARTOON

#### 计算机

COMPUTER

#### 心理

PSYCHOLOGY

#### 教育

EDUCATION

#### 手工

HANDCRAFTS

#### 趣闻

MYSTERIES

CAREER

GEEKS

#### 时尚

FASHION

• 精品课
• 公开课
• 欢迎下载我们在各应用市场备受好评的APP

点击下载Android最新版本

点击下载iOS最新版本

扫码下载译学馆APP

#### AI学习3D面部重构

AI Learns 3D Face Reconstruction | Two Minute Papers #198

Dear Fellow Scholars,

this is Two Minute Papers with Károly Zsolnai-Fehér.

Now that facial recognition is becoming more and more of a hot topic,

let’s talk a bit about 3D face reconstruction!

This is a problem where we have a 2D input photograph,

or a video of a person,

and the goal is to create a piece of 3D geometry from it.

To accomplish this, previous works often required

a combination of proper alignment of the face,

multiple photographs and dense correspondences,

which is a fancy name for additional data

that identifies the same regions across these photographs.

But this new formulation is the holy grail

of all possible versions of this problem,

because it requires nothing else but one 2D photograph.

The weapon of choice for this work was a Convolutional Neural Network,

and the dataset the algorithm was trained on couldn’t be simpler:

it was given a large database of 2D input image

and 3D output geometry pairs.

This means that the neural network can look at a lot of these pairs

and learn how these input photographs are mapped to 3D geometry.

And as you can see, the results are absolutely insane,

especially given the fact that it works for arbitrary face positions

and many different expressions, and even with occlusions.

However, this is not your classical Convolutional Neural Network,

because as we mentioned, the input is 2D and the output is 3D.

So the question immediately arises:

what kind of data structure should be used for the output?

The authors went for a 3D voxel array,

which is essentially a cube in which we build up

the face from small, identical Lego pieces.

This representation is similar to the terrain in the game Minecraft,

only the resolution of these blocks is finer.

The process of guessing how these voxel arrays should look

based the input photograph is referred to

in the research community as volumetric regression.

This is what this work is about.

And now comes the best part!

An online demo is also available

where we can either try some prepared images,

or, we can also upload our own.

So while I run my own experiments,

don’t leave me out of the good stuff

and make sure you post your results in the comments section!

The source code is also available for you fellow tinkerers out there.
3D技术的局限性包括 不能检测
The limitations of this technique includes the inability of detecting expressions that

are very far away from the ones seen in the training set,

and as you can see in the videos,

temporal coherence could also use some help.

This means that if we have video input,

the reconstruction has some tiny differences in each frame.

Maybe a Recurrent Neural Network,

like some variant of Long Short Term Memory

could address this in the near future.

However, netr more resources properly.

Very excited to see how these solutions evolve,

and of course, Two Minute Papers is going to be here for you

Very excited to see how these solutions evolve,

Thanks for watching and for your generous support,

and I’ll see you next time!

Taoyasa