
DeepMind's AI Learns Imagination-Based Planning | Two Minute Papers #178

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
A bit more than two years ago, the DeepMind guys implemented an algorithm that could play Atari Breakout on a superhuman level by looking at the video feed that you see here.
And the news immediately took the world by storm.
This original paper is a bit more than two years old and has already been referenced in well over a thousand other research papers.
That is one powerful paper!
This algorithm was based on a combination of a neural network and reinforcement learning.
The neural network was used to understand the video feed, and reinforcement learning is there to come up with the appropriate actions.
This is the part that plays the game.
Reinforcement learning is very suitable for tasks where we are in a changing environment and we need to choose an appropriate action based on our surroundings to maximize some sort of score.
This score can be, for instance, how far we've gotten in a labyrinth, how many collisions we have avoided with a helicopter, or any sort of score that reflects how well we're currently doing.
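The learning loop described above can be sketched in a few lines. This is a minimal toy illustration of reinforcement learning in general (tabular Q-learning on a one-dimensional corridor), not the neural-network agent from the paper; all names and the environment are made up for the example.

```python
import random

random.seed(0)  # for reproducibility of this sketch

# Toy environment: cells 0..4 on a corridor; reaching cell 4 earns reward 1.
N_STATES = 5
ACTIONS = [-1, +1]            # step left or step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Apply an action; reward only arrives at the goal cell."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

for _ in range(500):          # training episodes
    s = 0
    while s != N_STATES - 1:
        # Mostly act greedily, but sometimes try something else (exploration).
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Keep doing what worked: nudge Q toward the observed return.
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy steps right in every non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The agent starts with no knowledge, tries actions, and reinforces the ones that led to reward, which is exactly the trial-and-error scheme the narration describes.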
And this algorithm works similarly to how an animal learns new things.
It observes the environment, tries different actions, and sees if they worked well.
If yes, it will keep doing that; if not, well, let's try something else.
Pavlov's dog with the bell is an excellent example of that.
There are many existing works in this area, and it performs remarkably well for a number of problems and computer games, but only if the reward comes relatively quickly after the action.
For instance, in Breakout, if we miss the ball, we lose a life immediately, but if we hit it, we'll almost immediately break some bricks and increase our score.
This is more than suitable for a well-built reinforcement learning algorithm.
However, this earlier work didn't perform well on games that required long-term planning.
If Pavlov gave his dog a treat for something that it did two days ago, the animal would have no clue as to which action led to this tasty reward.
And this work's subject is a game where we control this green character and our goal is to push the boxes onto the red dots.
This game is particularly difficult, not only for algorithms but even for humans, for two important reasons: one, it requires long-term planning, which, as we know, is a huge issue for reinforcement learning algorithms.
Just because a box is next to a dot doesn't mean that it is the one that belongs there.
This is a particularly nasty property of the game.
And two, some mistakes we make are irreversible; for instance, pushing a box into a corner can make it impossible to complete the level.
If we have an algorithm that tries a bunch of actions and sees if they stick, well, that's not going to work here!
It is now hopefully easy to see that this is an obscenely difficult problem, and the DeepMind guys came up with Imagination-Augmented Agents as a solution for it.
So what is behind this really cool name?
The interesting part about this novel architecture is that it uses imagination: a routine to cook up not just one action but entire plans consisting of several steps, and finally choose the one with the greatest expected reward over the long term.
It takes information about the present, imagines possible futures, and chooses the one with the most handsome reward.
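The core idea of imagining futures and picking the most rewarding plan can be sketched as a brute-force rollout planner. This is only an illustration of the concept, assuming a toy hand-written environment model; the paper's actual agents learn the model with neural networks and do not enumerate plans exhaustively.

```python
import itertools

def imagine_best_plan(state, model, actions, depth):
    """Simulate every action sequence of length `depth` inside the model
    ("imagination", no real environment steps) and return the plan with
    the greatest total imagined reward."""
    best_plan, best_reward = None, float("-inf")
    for plan in itertools.product(actions, repeat=depth):
        s, total = state, 0.0
        for a in plan:
            s, r = model(s, a)        # imagined transition and reward
            total += r
        if total > best_reward:
            best_plan, best_reward = plan, total
    return best_plan, best_reward

# Toy model: the state is a position on a line; reward grows as we
# approach position 3.
def toy_model(s, a):
    s2 = s + a
    return s2, -abs(s2 - 3)

plan, reward = imagine_best_plan(0, toy_model, [-1, 0, 1], depth=3)
print(plan)
```

Because every candidate plan is evaluated before a single real action is taken, an irreversible mistake, like pushing a box into a corner, shows up as a low imagined reward and the plan containing it is never executed.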
And as you can see, this is only the first paper on this new architecture, and it can already solve a problem with seven boxes.
This is just unreal.
Absolutely amazing work.
And please note that this is a fairly general algorithm that can be used for a number of different problems.
This particular game was just one way of demonstrating the attractive properties of this new technique.
The paper contains more results and is a great read; make sure to have a look.
Also, if you've enjoyed this episode, please consider supporting Two Minute Papers on Patreon.
Details are available in the video description, have a look!
Thanks for watching and for your generous support, and I'll see you next time!

Translation info

Video summary: The DeepMind team developed an algorithm that handles tasks requiring long-term planning.

Transcript source: collected from the web

Translator: [B]白菜

Reviewer: 审核团O

Video source: https://www.youtube.com/watch?v=xp-YOPcjkFw
