AI Creates Facial Animation From Audio | Two Minute Papers #185

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.

[Polished speech animation with supporting motions]

This work is about creating facial animation from speech in real time.
This means that after recording the audio footage of us speaking, we give it to a learning algorithm, which creates a high-quality animation depicting our digital characters uttering these words. This learning algorithm is a Convolutional Neural Network, which was trained on as little as 3 to 5 minutes of footage per actor, and was able to generalize its knowledge from this training data to a variety of real-world expressions and words.
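To make this more concrete, here is a minimal sketch, in PyTorch, of what such a network could look like: a 1-D convolutional network that maps a short window of audio features to the facial animation parameters of one frame. Every layer size and the choice of output (blendshape weights or vertex offsets) are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class AudioToFaceCNN(nn.Module):
    """Illustrative CNN: a short window of audio features -> one animation frame."""
    def __init__(self, n_audio_feats=32, n_anim_params=100):
        super().__init__()
        # 1-D convolutions over the time axis of the audio-feature window.
        self.conv = nn.Sequential(
            nn.Conv1d(n_audio_feats, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Map the pooled features to per-frame animation parameters
        # (hypothetically blendshape weights or 3D vertex offsets).
        self.head = nn.Linear(256, n_anim_params)

    def forward(self, audio_window):
        # audio_window: (batch, n_audio_feats, n_frames)
        h = self.conv(audio_window)   # (batch, 256, n_frames // 8)
        h = h.mean(dim=-1)            # pool over time
        return self.head(h)           # (batch, n_anim_params)

model = AudioToFaceCNN()
window = torch.randn(1, 32, 64)   # one 64-step window of audio features
frame = model(window)             # animation parameters for one frame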
And if you think you've seen everything, you should watch until the end of the video, as it gets better than that for two reasons. Reason number one: it not only takes audio input, but we can also specify an emotional state that the character should express when uttering these words.
[Mining and applying emotional states]
[Training footage]
[The character in a sad or worried state]
[New performance] [The same emotion applied to different audio]
[Training footage]
[The character showing mild surprise]
[New performance] [The same emotion applied to different audio]
[Training footage]
[The character in pain]
[New performance] [The same emotion applied to different audio]
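One plausible way to wire in such an emotional state, sketched below under the assumption that each emotion is a small learned vector concatenated with the audio features before the output layer (all indices and dimensions here are made up for illustration):

import torch
import torch.nn as nn

class EmotionConditionedHead(nn.Module):
    """Illustrative output head mixing audio features with an emotion vector."""
    def __init__(self, audio_dim=256, emotion_dim=16, n_emotions=8, n_anim_params=100):
        super().__init__()
        # One learnable vector per emotional state (sad, surprised, in pain, ...).
        self.emotion_table = nn.Embedding(n_emotions, emotion_dim)
        self.out = nn.Linear(audio_dim + emotion_dim, n_anim_params)

    def forward(self, audio_features, emotion_id):
        e = self.emotion_table(emotion_id)    # (batch, emotion_dim)
        return self.out(torch.cat([audio_features, e], dim=-1))

head = EmotionConditionedHead()
audio_features = torch.randn(1, 256)   # pooled audio features from the CNN
sad = torch.tensor([2])                # hypothetical index of a "sad" state
frame = head(audio_features, sad)      # same audio, sad delivery

Swapping the emotion index changes the delivery without touching the audio, which is exactly the behavior the clips above demonstrate.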
Number two, and this is the best part: we can also combine this with DeepMind's WaveNet, which synthesizes audio from our text input. It basically synthesizes a believable human voice and says whatever text we write down. And then that sound clip can be used with this technique to make a digital character say what we've written. So we can go from text to speech with WaveNet, and put the speech onto a virtual actor with this work.
[Driving the neural network with synthesized audio: generated with WaveNet, with different speakers specified]
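Put together, the pipeline is just function composition. The sketch below uses hypothetical placeholder functions (neither name is a real API) purely to show how the two systems would chain:

def wavenet_tts(text):
    """Placeholder for a WaveNet-style text-to-speech synthesizer."""
    raise NotImplementedError

def audio_to_animation(audio, emotion_id):
    """Placeholder for the audio-driven facial animation network."""
    raise NotImplementedError

def text_to_talking_character(text, emotion_id):
    speech = wavenet_tts(text)                     # no voice actor needed
    return audio_to_animation(speech, emotion_id)  # no motion capture needed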
This way, we get a whole pipeline that works by learning and does everything for us in the most convenient way. No actors needed for voiceovers. No motion capture for animations. This is truly incredible. And if you look at the left side, you can see that in their video, there is some Two Minute Papers action going on. How cool is that?
Make sure to have a look at the paper to see the three-way loss function the authors came up with to make sure that the results work correctly for longer animations.
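As a rough illustration of why a multi-term loss helps longer animations, here is a hedged sketch of what a three-term loss could look like: a per-frame position term, a motion term on frame-to-frame differences for temporal stability, and a regularization term on the learned emotion vectors. The weights and exact terms are assumptions for illustration; see the paper for the authors' actual formulation.

import torch

def three_way_loss(pred, target, emotion_vectors, w_motion=10.0, w_reg=0.01):
    # pred, target: (batch, time, n_anim_params)
    position = ((pred - target) ** 2).mean()
    # Penalize errors in frame-to-frame motion, not just absolute pose,
    # so long sequences stay temporally stable.
    motion = ((pred[:, 1:] - pred[:, :-1]
               - (target[:, 1:] - target[:, :-1])) ** 2).mean()
    # Keep the learned emotion vectors small and well-behaved.
    reg = (emotion_vectors ** 2).mean()
    return position + w_motion * motion + w_reg * reg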
And of course, in research, we have to prove that our results are better than previous techniques. To accomplish this, there are plenty of comparisons in the supplementary video. But we need more than that. Since these results cannot be boiled down to a mathematical theorem that we need to prove, we have to do it some other way. The ultimate goal is that a human being would judge these videos as real with a higher chance than videos made with a previous technique. This is the core idea behind the user study carried out in the paper. We bring in a bunch of people, present them with videos made with the old and the new technique without telling them which is which, and ask them which one they feel to be more natural.
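A study like this is simple to score. The sketch below tallies blind A/B choices and runs a binomial test against chance; the counts are made up for illustration.

from scipy.stats import binomtest

prefer_new = 164   # hypothetical: trials where the new method was chosen
total = 200        # hypothetical: total blind A/B trials

# Is the preference for the new method significantly above 50/50 chance?
result = binomtest(prefer_new, total, p=0.5, alternative='greater')
print(f"preference rate: {prefer_new/total:.1%}, p-value: {result.pvalue:.2g}")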
And the result was not even close: the new method is not only better overall, but I haven't found a single case, scenario, or language where it didn't come out ahead. And that's extremely rare in research. Typically, in a maturing field, new techniques introduce a different kind of tradeoff; for instance, less execution time at the cost of higher memory consumption is a classical case. But here, it's simply better in every regard. Excellent.
[Final rendering in a game engine: Remedy Entertainment's Northlight engine, with advanced procedural controls driving the eyes]
[Left: video-based capture] [Right: our audio-based result]
If you enjoyed this episode, and would like to help us make better videos in the future, please consider supporting us on Patreon. You can pick up cool perks like watching these episodes in early access. Details are available in the video description. Beyond telling these important research stories, we're also using part of these funds to empower other research projects. I just made a small write-up about this, which is available on our Patreon page. That link is in the video description, so make sure to have a look. Thanks for watching and for your generous support, and I'll see you next time!


Translation info

Video overview: This new technique not only turns audio into facial animation, it also outperforms previous methods in every respect, a major breakthrough.

Transcript: collected from the web
Translation: 吾家黄姑娘
Review: 审核团O
Source video: https://www.youtube.com/watch?v=ZtP3gl_2kBM
