Deep Learning From Human Preferences | Two Minute Papers #196

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
In this new age of AI, there is no shortage of articles and discussions about AI safety, and rightfully so: these new learning algorithms have started solving, in quick succession, problems that were previously thought to be impossible.
Only ten years ago, if we had told someone about half of the things covered in the last few Two Minute Papers episodes, we’d have been declared insane.
And of course, having such powerful algorithms, we have to make sure that they are used for good.
This work is a collaboration between OpenAI and DeepMind’s safety team, and it is about introducing more human control into reinforcement learning problems.
The goal was to learn to perform a backflip through reinforcement learning.
Reinforcement learning is a kind of algorithm that tries to perform a series of actions to maximize a score.
It is kind of like playing computer games.
For instance, in Atari Breakout, if we break a lot of bricks, we get a high score, so we know we did something well.
If we see that happening, we keep doing what led to this result; if not, we go back to the drawing board and try something new.
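The "keep doing what worked, otherwise try something new" loop described above can be sketched as a tiny trial-and-error learner. This is only an illustration, not the algorithm from the paper: the function name `run_trial_and_error`, the three made-up actions, and their payout probabilities are all assumptions chosen for the example.

```python
import random

def run_trial_and_error(payouts, steps=2000, explore=0.1, seed=0):
    """Epsilon-greedy loop: mostly repeat the best-scoring action, sometimes try another."""
    rng = random.Random(seed)
    totals = [0.0] * len(payouts)  # summed score collected per action
    counts = [0] * len(payouts)    # how often each action was tried
    for _ in range(steps):
        if rng.random() < explore or 0 in counts:
            # Back to the drawing board: try an action at random.
            action = rng.randrange(len(payouts))
        else:
            # Keep doing what led to the best average score so far.
            averages = [t / c for t, c in zip(totals, counts)]
            action = averages.index(max(averages))
        # Hypothetical game: the action scores 1 with probability payouts[action].
        score = 1.0 if rng.random() < payouts[action] else 0.0
        totals[action] += score
        counts[action] += 1
    return counts

# Three made-up actions; the last one scores most often.
counts = run_trial_and_error([0.1, 0.3, 0.9])
print(counts)
```

After enough attempts, the learner spends most of its tries on the action that scores best, which is the behavior the Breakout example describes.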
But this work is about no ordinary reinforcement learning algorithm, because the score to be maximized comes from a human supervisor, and we’re trying to teach a digital creature to perform a backflip.
I particularly like the choice of the backflip here, because we can tell when we see one, but a mathematical specification of it in terms of movement actions is rather challenging.
This is a problem formulation in which humans can oversee and control the learning process, which is going to be an increasingly important aspect of learning algorithms in the future.
The feedback option is very simple: we just specify whether this sequence of motions achieved our prescribed goal or not.
Did it fall, or did it perform the backflip successfully?
After around 700 pieces of human feedback, the algorithm was able to learn the concept of a backflip, which is quite remarkable given that these binary yes/no scores are extremely difficult to use for any sort of learning.
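One way to picture how yes/no judgments can become a score to maximize is to fit a small model that predicts the human's answer and then use its prediction as the reward. The sketch below does this with logistic regression, under loud assumptions: each clip is reduced to two invented features (`rotation` and `upright`, both between 0 and 1), and the human is replaced by a hypothetical rule, `human_says_yes`. The published method, as far as can be told from the paper, has people compare pairs of short clips rather than grade a single one; this example follows the simpler yes/no description given here.

```python
import math
import random

def human_says_yes(rotation, upright):
    """Stand-in for the human supervisor (hypothetical rule, not from the paper)."""
    return 1 if 0.7 * rotation + 0.3 * upright > 0.75 else 0

def train_reward_predictor(clips, labels, epochs=100, lr=0.5):
    """Fit a logistic model p(yes | clip) with plain stochastic gradient ascent."""
    w_rot, w_up, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (rotation, upright), label in zip(clips, labels):
            logit = w_rot * rotation + w_up * upright + bias
            p_yes = 1.0 / (1.0 + math.exp(-logit))
            error = label - p_yes  # gradient of the log-likelihood w.r.t. the logit
            w_rot += lr * error * rotation
            w_up += lr * error * upright
            bias += lr * error
    return w_rot, w_up, bias

def predicted_reward(model, rotation, upright):
    """Learned score: the model's estimate that a human would say yes."""
    w_rot, w_up, bias = model
    return 1.0 / (1.0 + math.exp(-(w_rot * rotation + w_up * upright + bias)))

rng = random.Random(0)
# 700 labeled clips, echoing the amount of feedback mentioned above.
clips = [(rng.random(), rng.random()) for _ in range(700)]
labels = [human_says_yes(r, u) for r, u in clips]
model = train_reward_predictor(clips, labels)

print(predicted_reward(model, 1.0, 1.0))  # full rotation, clean landing
print(predicted_reward(model, 0.0, 0.0))  # no rotation, fell over
```

Once trained, `predicted_reward` can score clips the human never looked at, which is what lets a reinforcement learner keep practicing between rare pieces of feedback.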
In an earlier episode, we illustrated a similar case with a careless teacher who refuses to give out points for each problem on a written exam and only announces whether we have failed or passed.
This clearly makes for a dreadful learning experience, and it is incredible that the algorithm is still able to learn from such feedback.
We provide feedback on less than 1% of the actions the algorithm makes, and it can still learn difficult concepts from these extremely sparse and vague rewards.
Low-quality teaching leads to high-quality learning.
How about that!?
This is significantly more complex than what other techniques were able to learn with human feedback.
And it works with other games too!
A word about the collaboration itself.
When a company hires a bunch of super smart scientists and spends a ton of money on research, it is understandable that they want to get an edge through these projects, which often means keeping the results for themselves.
This leads to excessive secrecy and a lack of collaboration with other groups, as everyone wants to keep their cards close to their chest.
The fact that such a collaboration can happen between these two AI research giants is a testament to how devoted they are to working together and sharing their findings with everyone, free of charge, for the greater good.
Awesome.
As the media is all up in arms about the demise of the human race, I feel that it is important to show the other side of the coin as well.
We have top people working on AI safety right now.
If you wish to help us tell these stories to more people, please consider supporting us on Patreon.
Details are available in the video description, or just click the letter P that appears on the screen in a moment.
Thanks for watching and for your generous support, and I’ll see you next time!


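Translation credits

Video summary: An accessible introduction to a deep learning algorithm that, given only judgments of whether an attempt was correct, learns the concept of a backflip through trial and error.

Transcription: collected from the web
Translation: 朦胧星海
Review: 审核团1024
Video source: https://www.youtube.com/watch?v=WT0WtoYz2jE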
