
DeepMind Publishes StarCraft II Learning Environment | Two Minute Papers #182

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
This topic has been perhaps the most highly anticipated by you Fellow Scholars, and I am extremely excited to show you the first joint paper between DeepMind and Blizzard on creating an AI program to play StarCraft II.
Hell yeah!
And fortunately, we have a paper where every detail is meticulously described, so there’s much less room for misunderstandings.
And before we start, note that this is preliminary work, so please don’t expect superhuman performance.
However difficult you thought this problem was, you’ll see in a minute that it’s way more complex than most people would think.
But first, what is StarCraft II?
It is a highly technical strategy game, and writing a formidable AI for it is a huge challenge for three reasons.
One, we have imperfect information with a partially observed map.
If we wish to see what the opponent is up to, we have to devote resources to scouting, which may or may not be successful depending on the vigilance of the other player.
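To make partial observability a bit more concrete, here is a minimal toy sketch. It is not how the game or the paper implements fog of war; the grid, the `observe` function and the square sight radius are all assumptions made up for illustration. Cells that no friendly unit can see are simply reported as `None`.

```python
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (x, y) grid coordinates; a made-up representation


def visible_cells(units: List[Cell], sight: int, width: int, height: int) -> Set[Cell]:
    """Return every cell within `sight` squares of any friendly unit."""
    seen: Set[Cell] = set()
    for ux, uy in units:
        for x in range(max(0, ux - sight), min(width, ux + sight + 1)):
            for y in range(max(0, uy - sight), min(height, uy + sight + 1)):
                seen.add((x, y))
    return seen


def observe(true_map: List[List[str]], units: List[Cell], sight: int) -> List[List[Optional[str]]]:
    """Mask the true map: unseen cells become None (hypothetical fog of war)."""
    height, width = len(true_map), len(true_map[0])
    seen = visible_cells(units, sight, width, height)
    return [
        [true_map[y][x] if (x, y) in seen else None for x in range(width)]
        for y in range(height)
    ]


# "U" is our unit, "E" is an enemy in the far corner.
true_map = [
    [".", ".", ".", ".", "E"],
    [".", ".", ".", ".", "."],
    ["U", ".", ".", ".", "."],
]

# Without scouting, the enemy corner stays hidden.
home_view = observe(true_map, units=[(0, 2)], sight=1)
# Sending a scout to (3, 1) costs a unit's attention but reveals the enemy.
scout_view = observe(true_map, units=[(0, 2), (3, 1)], sight=1)
```

In this toy setting the enemy only shows up in `scout_view`, which is the trade-off described above: information has to be paid for with units.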
Two, we need to select and control hundreds of units under heavy time pressure.
One wrong decision, and we can quickly lose most of our units and become unable to recover from it.
And three, perhaps the most important part: long-term strategies need to be developed, where a poor decision in the early game can lead to a crushing defeat several thousand actions later.
These cases are especially difficult to identify and learn.
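A small worked example may help show why such delayed consequences are hard to learn from. It assumes the common discounted-return formulation of reinforcement learning, which is not something stated in this video; the discount factor of 0.99 and the delay of 3000 actions are illustrative values only.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where a reward k steps in the future is scaled by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Assumed scenario: an early mistake leads to a loss (-1) that only
# arrives 3000 actions later; every step in between gives no reward.
rewards = [0.0] * 3000 + [-1.0]
signal = discounted_return(rewards, gamma=0.99)

# `signal` equals -(0.99 ** 3000), a vanishingly small number, so the
# early decision receives almost no learning signal from the final defeat.
```

Under these assumptions the defeat is scaled by 0.99 raised to the 3000th power by the time it is credited to the early decision, which is one way to see why such cases are difficult to identify.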
However, we have run a bit too far ahead to the gameplay part.
What needs to be emphasized is that there is a step number one before that.
And that step number one is making sure that the AI is able to communicate and interact with the game, which requires a significant engineering effort.
In this paper, a Python-based interface is described to make this happen.
It is great to have companies like DeepMind and OpenAI who are devoted to laying down the foundations for such an interface, which is a herculean task.
This work would likely never have seen the light of day if AI research took place only in academia.
Huge respect and many thanks to the DeepMind guys for making this happen.
To play the game, Deep Reinforcement Learning is used, which you heard about earlier in the series.
This is a powerful learning algorithm where a neural network is used to process the video input and is combined with a reinforcement learner.
With reinforcement learning, we observe the environment around us and choose the next action to maximize a score or reward.
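The observe, act, reward loop can be sketched in a few lines. This is only a toy illustration under stated assumptions: `ToyEnv` and `EpsilonGreedyAgent` are hypothetical names, the environment has a single state and two actions, and the neural network is swapped for a plain table of average rewards, so this is not deep reinforcement learning and not the paper's agent.

```python
import random


class ToyEnv:
    """Hypothetical one-state environment: action 1 pays 1.0, action 0 pays 0.0."""

    def reset(self):
        return 0  # the only observation

    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        return 0, reward


class EpsilonGreedyAgent:
    """Keeps a running average reward per action and mostly picks the best one."""

    def __init__(self, n_actions, epsilon, rng):
        self.estimates = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = rng

    def best_action(self):
        return max(range(len(self.estimates)), key=self.estimates.__getitem__)

    def act(self, observation):
        # Try every action once before trusting the estimates.
        for action, count in enumerate(self.counts):
            if count == 0:
                return action
        # Occasionally explore a random action; otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.estimates))
        return self.best_action()

    def learn(self, action, reward):
        # Incremental update of the average reward seen for this action.
        self.counts[action] += 1
        self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]


env = ToyEnv()
agent = EpsilonGreedyAgent(n_actions=2, epsilon=0.1, rng=random.Random(0))
observation = env.reset()
total_reward = 0.0
for _ in range(200):
    action = agent.act(observation)           # choose the next action
    observation, reward = env.step(action)    # observe the result
    agent.learn(action, reward)               # update towards higher reward
    total_reward += reward
```

After the loop, the agent's estimate for action 1 is higher than for action 0, so it prefers the rewarding action. In the real setting the table would be replaced by a network that reads the game's observations.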
However, defining the score was very easy in Atari Breakout, because we knew that if the number of our lives dropped to zero, we lost, and if we broke a lot of bricks, our score improved.
Simple.
Not so much in StarCraft II, because how do we know exactly if we’re winning?
What is the score we’re trying to maximize?
In this work, there are two definitions for score: the first is one that we get to know only at the very end, and it describes whether we won, had a tie, or lost.
This is the score that ultimately matters.
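As a rough sketch of what such an end-of-game score looks like to a learner, the snippet below gives zero reward on every step and a single value at the very end. The numeric encoding of +1 for a win, 0 for a tie and -1 for a loss, as well as the names `Outcome` and `episode_rewards`, are assumptions for illustration rather than details given in this video.

```python
from enum import Enum


class Outcome(Enum):
    # Assumed numeric encoding of the final result.
    LOSS = -1
    TIE = 0
    WIN = 1


def episode_rewards(num_steps, outcome):
    """Zero reward on every step except the last, which carries the outcome."""
    if num_steps < 1:
        raise ValueError("an episode needs at least one step")
    rewards = [0.0] * num_steps
    rewards[-1] = float(outcome.value)
    return rewards


# A 1000-step game that ends in a loss: 999 zeros followed by -1.0.
sparse = episode_rewards(1000, Outcome.LOSS)
```

Almost every entry in `sparse` is zero, which illustrates how little feedback this score gives while the game is still being played.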
However, this information is not available throughout the game to drive the reinforcement learner, so there is an intermediate score that is referred to as the Blizzard score in the paper, which involves a weighted sum of current resources and upgrades, as well as our units and buildings.
This sounds good for a first approximation, since it increases if we win battles and manage our resources well, and decreases when we’re losing.
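A weighted sum of this kind could look roughly like the sketch below. The weights, the `GameState` fields and the idea of rewarding the change in score between two steps are all assumptions for illustration; the actual formula is not given in this video.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    resources: float
    upgrades: int
    units: int
    buildings: int


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_WEIGHTS = {
    "resources": 1.0,
    "upgrades": 100.0,
    "units": 50.0,
    "buildings": 150.0,
}


def intermediate_score(state, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted sum of resources, upgrades, units and buildings."""
    return (
        weights["resources"] * state.resources
        + weights["upgrades"] * state.upgrades
        + weights["units"] * state.units
        + weights["buildings"] * state.buildings
    )


def step_reward(previous, current):
    """Assumed per-step reward: how much the score changed since the last step."""
    return intermediate_score(current) - intermediate_score(previous)


before = GameState(resources=400, upgrades=1, units=20, buildings=5)
after_good_fight = GameState(resources=450, upgrades=1, units=22, buildings=5)
after_bad_fight = GameState(resources=450, upgrades=1, units=8, buildings=5)

gain = step_reward(before, after_good_fight)  # score goes up
loss = step_reward(before, after_bad_fight)   # score goes down
```

With these made-up numbers, gathering minerals and adding two units raises the score, while losing twelve units lowers it, which matches the behavior described above.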
However, there are many matches where the player with more resources does not have enough time to spend them and ultimately loses a deciding encounter.
It remains to be seen whether this is exactly what we need to maximize to beat a formidable human player.
There are also non-trivial engineering decisions on how to process the video stream.
The current system uses a set of feature layers, which encode relevant information for the AI, such as terrain height, the camera location, hit points for the units on the screen and much, much more.
There is a huge amount of information that the convolutional neural network has to make sense of.
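One way to picture feature layers is as a stack of same-sized grids, one grid per kind of information, much like the color channels of an image. The sketch below uses plain Python lists; the layer names, the unit dictionaries and the helper functions are hypothetical and are not the identifiers used by the actual environment.

```python
from typing import Dict, List

Grid = List[List[float]]


def empty_layer(width: int, height: int, fill: float = 0.0) -> Grid:
    return [[fill] * width for _ in range(height)]


def build_feature_layers(terrain_height: Grid, units: List[dict]) -> Dict[str, Grid]:
    """Encode the screen as named 2D grids (layer names are made up)."""
    height, width = len(terrain_height), len(terrain_height[0])
    layers = {
        "terrain_height": [row[:] for row in terrain_height],
        "unit_hit_points": empty_layer(width, height),
        "is_enemy": empty_layer(width, height),
    }
    for unit in units:
        x, y = unit["x"], unit["y"]
        layers["unit_hit_points"][y][x] = float(unit["hit_points"])
        layers["is_enemy"][y][x] = 1.0 if unit["enemy"] else 0.0
    return layers


def stack_layers(layers: Dict[str, Grid], order: List[str]) -> List[Grid]:
    """Fix a channel order so a convolutional network always sees the same layout."""
    return [layers[name] for name in order]


terrain = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]
units = [
    {"x": 1, "y": 2, "hit_points": 45, "enemy": False},
    {"x": 3, "y": 0, "hit_points": 80, "enemy": True},
]
layers = build_feature_layers(terrain, units)
stacked = stack_layers(layers, ["terrain_height", "unit_hit_points", "is_enemy"])
# `stacked` has 3 channels, each a 3-row by 4-column grid.
```

Each channel answers one question about every screen position, and the network has to combine all of them to make sense of the scene.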
And I think it is now easy to see that throwing the AI in at the deep end and expecting it to perform well on a full one versus one match is, at this point, a forlorn effort.
The paper describes a set of minigames, where the algorithm can learn different aspects of the game in isolation, such as picking up mineral shards scattered around the map, defeating enemy units in small skirmishes, building units or harvesting resources.
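To give a feel for how small such a minigame can be, here is a toy version of collecting mineral shards on a grid. Everything in it is an assumption for illustration: the grid, the one-point-per-shard scoring and the scripted "walk to the nearest shard" policy, which is a hand-written baseline rather than a learned agent.

```python
def nearest_shard(agent, shards):
    """Pick the shard with the smallest Manhattan distance (ties broken by position)."""
    return min(shards, key=lambda s: (abs(s[0] - agent[0]) + abs(s[1] - agent[1]), s))


def step_toward(agent, target):
    """Move one square toward the target, horizontally first, then vertically."""
    x, y = agent
    if x != target[0]:
        x += 1 if target[0] > x else -1
    elif y != target[1]:
        y += 1 if target[1] > y else -1
    return (x, y)


def run_episode(start, shards, max_steps):
    """Return how many shards the scripted policy collects within the step limit."""
    agent = start
    remaining = set(shards)
    score = 0
    for _ in range(max_steps):
        if not remaining:
            break
        agent = step_toward(agent, nearest_shard(agent, remaining))
        if agent in remaining:
            remaining.remove(agent)
            score += 1  # assumed reward: one point per shard
    return score


shards = [(2, 0), (2, 2), (0, 3)]
full_score = run_episode(start=(0, 0), shards=shards, max_steps=20)
rushed_score = run_episode(start=(0, 0), shards=shards, max_steps=3)
```

Because the task has a clear score and a short horizon, it isolates one skill, moving units efficiently, without any of the scouting or long-term planning of a full match.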
In these minigames, the AI has reached the level of a novice human player, which is quite amazing given the magnitude and the complexity of the problem.
The authors also encourage the community to create more minigames for the AI to train on.
I really love the openness and the community effort aspects of this work!
And we’ve only just scratched the surface: there is so much more in the paper, with a lot more non-trivial design decisions and a database with tens of thousands of recorded games.
And the best part is that the source code for this environment is available right now for the fellow tinkerers out there.
I’ve put a link to this in the video description.
This is going to be one heck of a challenge for even the brightest AI researchers of our time.
I can’t wait to get my hands on the code, and I am also very excited to read some follow-up papers on this.
I expect there will be many of those in the following months.
In the meantime, as we know, OpenAI is also working on DOTA with remarkable results, and there’s lots of discussion about whether a DOTA 5 versus 5 or a StarCraft II 1 versus 1 game is more complex for the AI to learn.
If you have an opinion on this, make sure to leave a comment below this video.
Which is more complex?
Why?
This also signals that there’s going to be tons of fun to be had with AI and video games this year.
Stay tuned!
Thanks for watching and for your generous support, and I’ll see you next time!


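Translation credits
Video overview

This video introduces the paper from DeepMind and Blizzard on an AI that plays StarCraft II. For the source code and project links, see the description of the original video.

Transcription

Collected from the web

Translation

[B]刀子

Reviewer

审核团O

Video source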

https://www.youtube.com/watch?v=St5lxIxYGkI
