

DeepMind's AI Learns Locomotion From Scratch | Two Minute Papers #190

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.

We have talked about some awesome previous works where we used learning algorithms to teach digital creatures to navigate in complex environments. The input is a terrain and a set of joints, feet, and movement types, and the output has to be a series of motions that maximizes some kind of reward. This previous technique borrowed smaller snippets of movement from a previously existing database of motions and learned to stitch them together in a way that looks natural. And as you can see, these results are phenomenal.

The selling point of this new one, which, you might say, looks less elaborate, is that it synthesizes these motions from scratch.

This problem is typically solved via reinforcement learning, a technique that comes up with a series of decisions to maximize a prescribed score.
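To make "a series of decisions that maximizes a prescribed score" concrete, here is a minimal, hypothetical sketch of the reinforcement learning loop; the env and agent objects and their methods are invented for illustration, not an interface from the paper.

    # Hypothetical sketch: the agent repeatedly picks actions and is judged
    # only by the total score (reward) it accumulates over an episode.
    def run_episode(env, agent, max_steps=1000):
        state = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            action = agent.act(state)           # one decision in the series
            state, reward, done = env.step(action)
            agent.learn(state, reward)          # adjust behavior to raise the score
            total_reward += reward
            if done:
                break
        return total_reward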
This score typically needs to be something reasonably complex, otherwise the algorithm is given too much freedom in how it maximizes it. For instance, we may want to teach a digital character to run or jump hurdles, but if our objective is too simple, say just maximizing the distance from the starting point, it may start crawling instead, which satisfies that objective just as well.

To alleviate this, we typically resort to reward engineering, which means that we add additional terms to the reward function to regularize the behavior of these creatures. For instance, we can specify that throughout these motions the body has to remain upright, which likely favors locomotion-type solutions.
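As an illustration of what reward engineering looks like, a hand-crafted reward might combine the task term with extra regularizing terms; the specific terms and weights below are made up for this sketch and are not taken from the paper.

    # Invented example of an engineered reward: the task term (distance covered)
    # plus extra terms that encourage an upright posture and discourage thrashing.
    def engineered_reward(forward_progress, torso_uprightness, energy_used):
        w_progress, w_upright, w_energy = 1.0, 0.5, 0.01
        return (w_progress * forward_progress
                + w_upright * torso_uprightness   # regularizes toward staying upright
                - w_energy * energy_used)         # penalizes wasteful flailing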
However, one of the main advantages of machine learning is that we can reuse our solutions for a large set of problems. If we have to specialize our algorithm for every terrain, motion type, and kind of game, we lose one of the biggest advantages of learning techniques.

So researchers at DeepMind decided to solve this problem with a reward function that is nothing else but forward progress. That's it. The further we get, the higher score we obtain.
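In contrast to the engineered reward above, the reward described in the video is just forward progress. A minimal sketch of that idea (the position variable is hypothetical):

    # The reward described in the video: nothing but forward progress.
    # x is the character's position along the course; the further it moves
    # since the previous step, the more reward it receives.
    def forward_progress_reward(x_before, x_after):
        return x_after - x_before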
This is amazing because it doesn't require any specialized reward function, but at the same time, there are a ton of different solutions that get us far in these terrains. And as you can see here, beyond bipeds, a bunch of different agent types are supported.

The key to making this happen is to apply two modifications to the original reinforcement learning algorithm. One makes the learning process more robust and less dependent on what parameters we choose, and the other makes it more scalable, which means that it is able to efficiently deal with larger problems.
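The video does not spell out the two modifications, so as one example of the "more robust, less parameter-dependent" kind of change, policy updates can be clipped so that a single step cannot move the policy too far from the previous one, in the spirit of proximal policy optimization. The sketch below is an illustration of that general idea, not code from the paper.

    # Illustration only: clip how far the new policy may move from the old one
    # in a single update, which makes learning less sensitive to step-size choices.
    import numpy as np

    def clipped_policy_objective(new_probs, old_probs, advantages, clip=0.2):
        ratio = new_probs / old_probs                      # how much the policy changed
        clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)   # limit that change
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))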
Furthermore, the training process itself happens on a rich, carefully selected set of challenging levels. Make sure to have a look at the paper for details.
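One way to picture such a set of challenging levels is a generator that varies the terrain type and difficulty from episode to episode; the terrain kinds and parameters below are invented purely for illustration.

    # Invented illustration: sample a varied terrain for each training episode
    # so the agent keeps facing a rich set of challenges.
    import random

    def sample_training_terrain(difficulty):
        kind = random.choice(["hurdles", "gaps", "slopes", "platforms"])
        return {
            "kind": kind,
            "obstacle_height": 0.1 + 0.4 * difficulty * random.random(),
            "obstacle_spacing": max(1.0, 4.0 - 3.0 * difficulty),
        }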
A byproduct of this kind of problem formulation is, as you can see, that even though this humanoid does its job with its lower body well, it is meanwhile flailing its arms like a madman. The reason is likely that there is not much of a difference in reward between different arm motions. This means that we most likely get through a maze or a heightfield even when flailing, therefore the algorithm doesn't have any reason to favor more natural-looking movements for the upper body. It will probably choose a random one, which is highly unlikely to be a natural motion. This creates high-quality, albeit amusing, results that I am sure some residents of the internet will honor with a sped-up remix video set to some Benny Hill music.

In summary: no precomputed motion database, no handcrafting of rewards, and no additional wizardry needed. Everything is learned from scratch with a few small modifications to the reinforcement learning algorithm. Highly remarkable work.

If you've enjoyed this episode and would like to help us and support the series, have a look at our Patreon page. Details and cool perks are available in the video description, or just click the letter P at the end of this video. Thanks for watching and for your generous support, and I'll see you next time!


Translation Info

Video Overview

With a small change to the reward function for a digital creature's locomotion, it can learn to run and jump freely across different terrains; a side effect is that some of the resulting motions look rather comical. Degrees of freedom: the number of independent ways a body or joint can move. Actuators: the motors that drive the joints. Heightfield: a surface defined by a height value at every point, like terrain or rippled water. Vestibular receptors: the sensory organs that give the nervous system its sense of balance.

Transcription

Collected from the web

Translator

[B]刀子

Reviewer

审核团1024

Video source

https://www.youtube.com/watch?v=14zkfDTN_qo
