
AI Learns to Synthesize Obama’s Mouth Shapes from Audio – 译学馆


Audio To Obama: AI Learns Lip Sync from Audio | Two Minute Papers #194

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
This work is doing something truly remarkable: if we have a piece of audio of a real person speaking and some target video footage, it will retime and change the video so that the target person appears to be uttering these words.
Whoa!
This is different from what we saw a few episodes ago, where scientists at NVIDIA worked on synthesizing lip sync geometry for digital characters relying solely on audio.
The results were quite amazing. Have a look.
This was great for animating digital characters when all we have is sound.
But this time around, we’re interested in reanimating the footage of real, existing people.
A prerequisite for doing this with a learning algorithm is to have a ton of data to train on, which we have in our possession, as there are many hours of footage of the former president speaking during his weekly address.
This is done using a recurrent neural network.
Recurrent neural networks are learning algorithms where the inputs and outputs can be sequences of data.
So here, in the first part, the input can be a piece of audio with the person saying something, and the network is able to synthesize the appropriate mouth shapes and their evolution over time to match the audio.
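
To make the sequence-in, sequence-out idea concrete, here is a minimal sketch of a recurrent network's forward pass in plain Python. It is only an illustration, not the architecture from the paper: the weights are random and untrained, and the layer sizes and all names (`rnn_forward`, `N_AUDIO`, `N_HIDDEN`, `N_MOUTH`) are assumptions made up for this example.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(audio_frames, w_in, w_rec, w_out):
    """Map a sequence of audio feature vectors to a sequence of mouth-shape vectors.

    The hidden state carries context from earlier frames, so each output
    depends on the audio heard so far, not only on the current frame.
    """
    hidden = [0.0] * len(w_rec)
    mouth_shapes = []
    for frame in audio_frames:
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]
        mouth_shapes.append(matvec(w_out, hidden))
    return mouth_shapes


rng = random.Random(0)
N_AUDIO, N_HIDDEN, N_MOUTH = 4, 8, 3  # hypothetical sizes, chosen for the example
w_in = make_matrix(N_HIDDEN, N_AUDIO, rng)
w_rec = make_matrix(N_HIDDEN, N_HIDDEN, rng)
w_out = make_matrix(N_MOUTH, N_HIDDEN, rng)

# Five fake audio frames stand in for real audio features.
audio = [[rng.random() for _ in range(N_AUDIO)] for _ in range(5)]
shapes = rnn_forward(audio, w_in, w_rec, w_out)
print(len(shapes), len(shapes[0]))  # one mouth-shape vector per audio frame
```

The point of the sketch is the loop: one output per input step, with the hidden state linking the steps together. A real system would train the weights on many hours of footage, as described above.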
The next step is creating an actual mouth texture from this rough shape that comes from the learning algorithm, which is then used as an input to the synthesizer.
Furthermore, the algorithm is also endowed with an additional pose matching module to make sure that the synthesized mouth texture aligns properly with the posture of the head.
The final retiming step makes sure that the head motions follow the speech correctly.
If you have any doubts about whether this is required, here are some results with and without the retiming step.
Without retiming, he moves randomly and appears unnatural.
You can see that this indeed substantially enhances the realism of the final footage.
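
As a rough intuition for why retiming helps, the toy sketch below scores how badly head motion and speech disagree and picks the best-matching stretch of target video. This is a deliberately simplified stand-in and not the procedure from the paper: the real step warps the video in time, whereas this example only slides a window. The per-frame flags and the function names `mismatch` and `best_offset` are hypothetical.

```python
def mismatch(speaking, moving):
    """Count frames where the head moves although nobody is speaking."""
    return sum(1 for s, m in zip(speaking, moving) if m and not s)


def best_offset(speaking, moving):
    """Return the start frame in the target video with the fewest mismatches."""
    n = len(speaking)
    if n > len(moving):
        raise ValueError("target video is shorter than the audio")
    candidates = range(len(moving) - n + 1)
    return min(candidates, key=lambda start: mismatch(speaking, moving[start:start + n]))


# Hypothetical per-frame flags: is there speech, and is the head moving?
speaking = [False, True, True, False]
moving = [True, True, False, True, True, False, False]
print(best_offset(speaking, moving))  # 2
```

Starting at frame 2, the head moves only while the person is speaking, so that window has zero mismatches. Playing the audio against a window that starts anywhere else leaves the head moving during silence, which is the unnatural effect shown in the comparison.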
Even better, when combined with Google DeepMind’s WaveNet, given enough training data, we could skip the audio footage altogether and just write a piece of text, making Obama, or someone else, say what we’ve written.
There are also a ton of other details to be worked out. For instance, there are cases where the mouth moves before the person starts to speak, which has to be taken into consideration.
The dreaded “umm”s and “ahh”s are classic examples of that.
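
One way to let a sequence model handle a mouth that moves before the sound starts is to delay its output, so that it has already heard a little "future" audio when it has to commit to a mouth shape. The sketch below shows how such training pairs could be built. Treat it as an assumption for illustration: the transcript does not say how the paper handles this, and the function name `make_delayed_pairs` and the `delay` value are made up here.

```python
def make_delayed_pairs(audio_frames, mouth_shapes, delay):
    """Pair the audio at step t with the mouth shape at step t - delay.

    When the model must output the mouth shape for step t - delay, it has
    already heard `delay` further audio frames, so it can begin opening
    the mouth before the sound itself begins.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if len(audio_frames) != len(mouth_shapes):
        raise ValueError("sequences must have the same length")
    return [(audio_frames[t], mouth_shapes[t - delay])
            for t in range(delay, len(audio_frames))]


# String labels stand in for real feature vectors.
audio = ["a0", "a1", "a2", "a3", "a4"]
mouth = ["m0", "m1", "m2", "m3", "m4"]
pairs = make_delayed_pairs(audio, mouth, delay=2)
print(pairs)  # [('a2', 'm0'), ('a3', 'm1'), ('a4', 'm2')]
```

With a delay of two frames, the model sees audio up to `a2` before it has to produce `m0`, at the cost of losing the last two mouth shapes as training targets.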
There is also an important jaw correction step, and more.
This is a brilliant piece of work with many non-trivial decisions that are described in the paper. Make sure to have a look at it for details; as always, a link to it is available in the video description.
The results are also compared to the Face2Face paper from last year that we also covered in the series.
It is absolutely insane to see this rate of progress over the span of only one year.
If you have enjoyed this episode and you feel that eight of these videos a month is worth a dollar, please consider supporting us on Patreon.
You can pick up some really cool perks there, and it is also a great deal of help for us to make better videos for you in the future.
Earlier I also wrote a few words about the changes we were able to make because of your amazing support.
Details are available in the description.
Thanks for watching and for your generous support, and I’ll see you next time!


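Translation credits
Video overview

Given a piece of audio and a video of a real person speaking, the method synthesizes synchronized mouth shapes so that the person in the video appears to be saying the words in the audio.

Transcription

Collected from the web

Translation

朦胧星海

Reviewer

审核团O

Video source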

https://www.youtube.com/watch?v=nsuAQcvafCs
