Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
It is time for some minds to be blown.
DOTA 2 is a multiplayer online battle arena game with a huge cult following and world
championship events with a prize pool of over 20 million dollars.
In this game, players form two teams and control a hero each and use their strategy and special
abilities to defeat the other team.
OpenAI recently created an AI for this game that is so good that they challenged the best
players in the world.
Now note that this program is not playing the full feature set of the game, but a version
that is limited to one versus one encounters with several other elements of the game disabled.
Lots of strategy is involved, and as we often discuss in these episodes, long-term planning
is the Achilles heel of these learning algorithms.
A small blunder in the early game can often snowball out of control by the end of the
match, and it is hard for the AI, and sometimes even for humans, to identify these cases.
And this game is a huge challenge because unlike chess and Go, it has lots of incomplete
information, and even the simplified one versus one mode involves a reasonable amount of long-term planning.
It also involves attacks, trickery and deceiving an opponent and can be imagined as a strategy
game that also requires significant technical prowess to pull off the most spectacular moves.
This game is also designed in a way that new and unfamiliar situations come up all the
time, which require lots of experience and split-second decision-making to master.
This is a true test for any kind of AI.
And note that this AI wasn’t told anything about the game, not even the rules, and was
just instructed to try to find a way to win.
The algorithm was trained in 24 hours, and during this time, it not only learned the
rules and objectives of the game, but also learned to pull off remarkable tactics.
For instance, other players were very surprised that the bot didn’t take the bait, which typically
means a smart tactic involving giving up a smaller battle in favor of winning a bigger objective.
The AI has a ton of experience playing the game and typically sees through these shenanigans.
In this game, there are also neutral units that we call creep.
When killed, they grant precious gold and experience to our opponent, so we typically
try to deny that.
If these units encounter an obstacle, they go around it, so players developed a technique
called creep blocking, which is the art of holding them up with the hero character to
minimize the distance they travel in a unit of time.
And the AI has not only learned about the existence of this technique by itself, but
it also executes it with stunning precision, which is quite remarkable.
And again, during the training phase, it had never seen any human play the game and do
something like this.
The other remarkable thing is that when a player disappears in the darkness, the AI
predicts what he could be doing, plans around it, and strikes where the player is expected
to show up.
If you remember, DeepMind’s initial Go algorithm contained a bootstrapping step where it was
fed a large number of games played by humans to grasp the basics.
The truly remarkable thing is that none of that happened here.
This algorithm was trained for only 24 hours and it only played against itself.
When it finally played against Dendi, a former world champion, the first match was
such a treat, and I was shocked to see that the AI had outplayed him.
In the second game, the player tried to create a situation that he thought the AI hadn’t
encountered before by giving up some creeps to it.
The program ruthlessly took advantage of this mistake and defeated him almost immediately.
OpenAI’s bot not only won, but apparently also broke the will of Dendi, who tapped out
after two matches.
I feel like someone who has been hit by a sledgehammer.
I didn’t even know this was being worked on!
This is such a remarkable achievement.
Usually, the first argument I hear is that of course, the AI can play non-stop without
bathroom breaks or sleep.
While, admittedly, this is also true for some players, the algorithm was only trained for 24 hours.
Note that this still means a stupendous number of games played, but in terms of training
time, given that these algorithms typically take from weeks to months to train properly,
24 hours is nothing.
The second argument that I often hear is that the AI should of course win every time, because
it has close to 0 reaction time and can perform thousands of actions every second.
For instance, if we played a game where the goal is to perform the most
actions per minute, clearly, humans with biological limitations would stand no chance against
a computer program.
However, in this case, the number of actions that this algorithm performs in a minute is
comparable to that of a human player.
This means that these results stem from superior technical abilities and planning, and not
from the fact that we’re talking about a computer.
We can look at this result from two different directions.
One, we could say, well, no big deal, because this is only a highly limited and hamstrung
version of the game, which is way less complex than a fully-fledged 5 versus 5 team match.
Or, two, we could say that the algorithm has shown a remarkable aptitude for learning highly
sophisticated technical maneuvers and longer-term strategy in a difficult game.
And the rest is only a matter of time.
In fact, in 5 versus 5, there is even more room for a highly intelligent program to shine
and create new tactics that we’ve never thought of.
I would bet that if anything, we’re going to be even more surprised by the 5 versus
5 results later.
We are still a bit short on details, but I have contacted the OpenAI team, who noted
that there will be more information available in the next few days.
Whenever something new appears, I’ll be here to cover it for you, Fellow Scholars.
If you are new to the series and enjoyed this episode, make sure to subscribe and click
the bell icon for two super fun science videos a week.
And if you find yourself interested in DOTA 2, and admittedly, it’s hard not to, and would
like to catch up a bit on the basics, make sure to visit Day9’s channel, which has a really
nice playlist about the fundamentals of the game.
There is a link in the description for it, check it out.
If you go to his channel, make sure to leave him a kind scholarly comment.
Let the world see how courteous the Two Minute Papers listeners are.
Thanks for watching and for your generous support, and I’ll see you next time!