Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
We have talked about some awesome previous works
where we used learning algorithms to teach digital creatures
to navigate in complex environments.
The input is a terrain and a set of joints, feet, and movement types,
and the output has to be a series of motions that maximizes some kind of reward.
This previous technique borrowed smaller snippets of movements from a previously existing database
of motions and learned to stitch them together in a way that looks natural.
And as you can see, these results are phenomenal.
The selling point of this new technique, which you might say looks less elaborate, is that it synthesizes these motions from scratch.
This problem is typically solved via reinforcement learning, which is a technique that comes
up with a series of decisions to maximize a prescribed score.
This score typically needs to be something reasonably complex, otherwise the algorithm
is given too much freedom to maximize it.
For instance, we may want to teach a digital character to run or jump hurdles, but if our objective is too simple, for instance just maximizing the distance from the starting point, it may start crawling instead, which still satisfies that objective completely.
To alleviate this, we typically resort to reward engineering, which means that we add additional terms to this reward function to regularize the behavior of these creatures.
For instance, we can specify that throughout these motions, the body has to remain upright, which likely favors locomotion-type solutions.
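The idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual reward; the state fields and the weight are hypothetical names chosen for clarity:

```python
# Hypothetical sketch of reward engineering for a locomotion task.
# State fields and weights are illustrative assumptions.

def naive_reward(state):
    """Distance from the start: crawling maximizes this just as well as running."""
    return state["distance_from_start"]

def engineered_reward(state, upright_weight=0.5):
    """Distance term plus a regularizer favoring an upright torso,
    which steers the optimizer toward locomotion-like solutions."""
    # torso_uprightness is 1.0 when fully upright, 0.0 when lying flat.
    return state["distance_from_start"] + upright_weight * state["torso_uprightness"]

running = {"distance_from_start": 10.0, "torso_uprightness": 1.0}
crawling = {"distance_from_start": 10.0, "torso_uprightness": 0.1}

# Under the naive reward the two behaviors tie; the engineered one prefers running.
assert naive_reward(running) == naive_reward(crawling)
assert engineered_reward(running) > engineered_reward(crawling)
```

The upright-posture term is exactly the kind of hand-tuned addition the next paragraph argues we would rather avoid.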
However, one of the main advantages of machine learning is that we can reuse our solutions
for a large set of problems.
If we have to specialize our algorithm for all terrain and motion types, and different
kinds of games, we lose out on one of the biggest advantages of learning techniques.
So researchers at DeepMind decided to solve this problem with a reward function that is nothing else but forward progress.
The further we get, the higher score we obtain.
This is amazing because it doesn’t require any specialized reward function but at the
same time, there are a ton of different solutions that get us far in these terrains.
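A "forward progress" reward can be sketched as the change in forward position per timestep, with no other handcrafted terms. This is a simplified illustration under assumed names, not the paper's exact formulation:

```python
# Minimal sketch of a forward-progress reward: score the change in the agent's
# forward position each timestep. No other handcrafted terms are added.

def forward_progress_reward(prev_x, curr_x):
    """The further the agent gets this step, the higher the reward."""
    return curr_x - prev_x

# Summing the per-step rewards over an episode telescopes to the total distance
# covered, so any gait that covers ground scores well, whatever it looks like.
positions = [0.0, 0.3, 0.9, 1.4]
total = sum(forward_progress_reward(a, b) for a, b in zip(positions, positions[1:]))
assert abs(total - 1.4) < 1e-9
```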
And as you see here, beyond bipeds, a bunch of different agent types are supported.
The key to making this happen is to apply two modifications to the original reinforcement learning algorithm.
One makes the learning process more robust and less dependent on what parameters we choose,
and the other one makes it more scalable, which means that it is able to efficiently
deal with larger problems.
Furthermore, the training process itself happens on a rich, carefully selected set of challenging levels.
Make sure to have a look at the paper for details.
A byproduct of this kind of problem formulation is, as you can see, that even though this humanoid does its job with its lower body well, in the meantime it is flailing its arms like a madman.
The reason is likely because there is not much of a difference in the reward between
different arm motions.
This means that we most likely get through a maze or a heightfield even when flailing,
therefore the algorithm doesn’t have any reason to favor more natural looking movements for
the upper body.
It will probably choose a random one, which is highly unlikely to be a natural motion.
This creates high quality, albeit amusing, results that I am sure some residents of the internet will honor with a sped-up remix video set to some Benny Hill music.
In summary: no precomputed motion database and no handcrafting of rewards. Everything is learned from scratch with a few small modifications to the reinforcement learning algorithm. Highly remarkable work.
If you’ve enjoyed this episode and would like to help us and support the series, have a
look at our Patreon page.
Details and cool perks are available in the video description, or just click the letter
P at the end of this video.
Thanks for watching and for your generous support, and I’ll see you next time!