OpenAI's Bot Beats DOTA World Champion Dendi | Two Minute Papers #180

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
It is time for some minds to be blown.
DOTA 2 is a multiplayer online battle arena game with a huge cult following and world
championship events with a prize pool of over 20 million dollars.
In this game, players form two teams and control a hero each and use their strategy and special
abilities to defeat the other team.
OpenAI recently created an AI for this game that is so good that they challenged the best
players in the world.
Now note that this program is not playing the full feature set of the game, but a version
that is limited to one versus one encounters with several other elements of the game disabled.
Lots of strategy is involved, and as we always discuss in these episodes, long-term planning
is the Achilles heel of these learning algorithms.
A small blunder in the early game can often snowball out of control by the end of the
match, and it is hard for the AI, and sometimes even for humans, to identify these cases.
And this game is a huge challenge because unlike chess and Go, it has lots of incomplete
information, and even the simplified one versus one mode involves a reasonable amount of long-term planning.
It also involves attacks, trickery and deceiving an opponent and can be imagined as a strategy
game that also requires significant technical prowess to pull off the most spectacular moves.
This game is also designed in a way that new and unfamiliar situations come up all the
time, which require lots of experience and split-second decision-making to master.
This is a true test for any kind of AI.
And note that this AI wasn’t told anything about the game, not even the rules, and was
just instructed to try to find a way to win.
The algorithm was trained in 24 hours, and during this time, it not only learned the
rules and objectives of the game, but also learned to pull off remarkable tactics.
For instance, other players were very surprised that the bot didn’t take the bait. Baiting typically
means a smart tactic that involves giving up a smaller battle in favor of winning a bigger objective.
The AI has a ton of experience playing the game and typically sees through these shenanigans.
In this game, there are also computer-controlled units that we call creeps.
When killed, they grant precious gold and experience to our opponent, so we typically
try to deny that.
If these units encounter an obstacle, they go around it, so players developed a technique
called creep blocking, which is the art of holding them up with the hero character to
minimize the distance they travel per unit of time.
And the AI has not only learned about the existence of this technique by itself, but
it also executes it with stunning precision, which is quite remarkable.
And again, during the training phase, it had never seen any human play the game and do
something like this.
The other remarkable thing is that when a player disappears in the darkness, the AI
predicts what he could be doing, plans around it, and strikes where the player is expected
to show up.
If you remember, DeepMind’s initial Go algorithm contained a bootstrapping step where it was
fed a large number of games played by humans to grasp the basics.
The truly remarkable thing is that none of that happened here.
This algorithm was trained for only 24 hours and it only played against itself.
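To make the idea of learning purely from self-play a bit more concrete, here is a minimal sketch on a toy game. It is not OpenAI's method, which had not been published in detail at this point. It uses fictitious play, a classic scheme, on rock-paper-scissors, and every name in it (`PAYOFF`, `best_response`, `self_play`) is made up for illustration. The agent is told only the payoffs and repeatedly picks the best reply to its own past moves.

```python
# Toy self-play via fictitious play on rock-paper-scissors.
# Assumption: this is an illustration only, NOT the training method of the DOTA 2 bot.

ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[a][b] is the reward for playing action a against action b.
PAYOFF = (
    (0, -1, 1),   # rock
    (1, 0, -1),   # paper
    (-1, 1, 0),   # scissors
)


def best_response(counts):
    """Return the action index with the highest total payoff against past play."""
    scores = [
        sum(PAYOFF[a][b] * counts[b] for b in range(3))
        for a in range(3)
    ]
    return scores.index(max(scores))  # ties go to the lowest index


def self_play(rounds):
    """Play against a copy of the agent's own history; return action frequencies."""
    counts = [1, 0, 0]  # arbitrary starting habit: rock was played once
    for _ in range(rounds):
        counts[best_response(counts)] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    for name, freq in zip(ACTIONS, self_play(10_000)):
        print(f"{name}: {freq:.3f}")
```

No human games are involved at any point: the only opponent is the agent's own record, and the action frequencies drift toward the balanced one-third mix that cannot be exploited.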
When it finally played against Dendi, a former world champion, the first match was
such a treat, and I was shocked to see that the AI had outplayed him.
In the second game, the player tried to create a situation that he thought the AI hadn’t
encountered before by giving up some creeps to it.
The program ruthlessly took advantage of this mistake and defeated him almost immediately.
OpenAI’s bot not only won, but apparently also broke the will of Dendi, who tapped out
after two matches.
I feel like I’ve been hit by a sledgehammer.
I didn’t even know this was being worked on!
This is such a remarkable achievement.
Usually, the first argument I hear is that of course, the AI can play non-stop without
bathroom breaks or sleep.
While, admittedly, this is also true for some players, the algorithm was only trained for
24 hours.
Note that this still means a stupendous amount of games played, but in terms of training
time, given that these algorithms typically take from weeks to months to train properly,
24 hours is nothing.
The second argument that I often hear is that the AI should of course win every time, because
it has close to zero reaction time and can perform thousands of actions every second.
For instance, if we played a game where the goal is to perform the greatest number of
actions per minute, clearly, humans with biological limitations would stand no chance against
a computer program.
However, in this case, the number of actions that this algorithm performs in a minute is
comparable to that of a human player.
This means that these results stem from superior technical abilities and planning, and not
from the fact that we’re talking about a computer.
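One way to picture a bot whose action rate stays in human territory is a simple cap on actions per minute. The sketch below is only an assumption about how such a cap could look, not a description of how OpenAI's bot is actually limited. The class name `ActionRateLimiter` and the figure of 180 actions per minute are hypothetical, since the exact number is not given here.

```python
from collections import deque


class ActionRateLimiter:
    """Sliding-window cap on actions per minute (hypothetical helper)."""

    def __init__(self, max_actions_per_minute):
        self.max_actions = max_actions_per_minute
        self.timestamps = deque()  # times of actions allowed in the last 60 s

    def allow(self, now):
        """Return True and record the action if the cap permits it at time `now` (seconds)."""
        # Forget actions that have fallen out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] >= 60.0:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_actions:
            self.timestamps.append(now)
            return True
        return False


if __name__ == "__main__":
    # Assumed, human-like figure; not taken from the source.
    limiter = ActionRateLimiter(max_actions_per_minute=180)
    # The bot "wants" to act 1000 times during one simulated second.
    allowed = sum(limiter.allow(t / 1000.0) for t in range(1000))
    print(f"actions allowed: {allowed}")
```

With a cap like this, a program that could issue thousands of commands per second is held to the same budget as a fast human, so any advantage has to come from which actions it chooses.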
We can look at this result from two different directions.
One, we could say, well, no big deal, because this is only a highly limited and hamstrung
version of the game, which is way less complex than a fully-fledged 5 versus 5 team match.
Or, two, we could say that the algorithm has shown a remarkable aptitude for learning highly
sophisticated technical maneuvers and longer-term strategy in a difficult game.
And the rest is only a matter of time.
In fact, in 5 versus 5, there is even more room for a highly intelligent program to shine
and create new tactics that we’ve never thought of.
I would bet that if anything, we’re going to be even more surprised by the 5 versus
5 results later.
We are still lacking in details a bit, but I have contacted the OpenAI guys who noted
that there will be more information available in the next few days.
Whenever something new appears, I’ll be here to cover it for you, Fellow Scholars.
Thanks for watching and for your generous support, and I’ll see you next time!