DeepMind's AI Learns Audio And Video Concepts By Itself | Two Minute Papers #184

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
In our earlier episodes, when it came to learning techniques,
we almost always talked about supervised learning.
This means that we give the algorithm a bunch of images, and some additional information,
for instance, that these images depict dogs or cats.
Then, the learning algorithm is exposed to new images that it had never seen before and
has to be able to classify them correctly.
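
The supervised setup described here can be sketched in a few lines. This is only a toy illustration, not anything from the video or the paper: it assumes each image has already been reduced to a short list of numeric features, and it uses a nearest-centroid classifier as a stand-in for a real learning algorithm. The function names and the feature values are hypothetical.

```python
def train_centroids(examples):
    """Average the feature vectors of each label.

    examples: list of (features, label) pairs, where features is a list of floats.
    Returns a dict mapping each label to its mean feature vector (centroid).
    """
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical labeled data: the labels are the "additional information".
labeled = [
    ([0.9, 0.1], "dog"),
    ([0.8, 0.2], "dog"),
    ([0.1, 0.9], "cat"),
    ([0.2, 0.7], "cat"),
]
model = train_centroids(labeled)

# A new example that was not part of the training data.
prediction = classify(model, [0.7, 0.3])
print(prediction)
```

The important point is that `train_centroids` cannot run without the label attached to every example, which is exactly the annotation cost discussed next.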
It is kind of like a teacher sitting next to a student, providing supervision.
Then, the exam comes with new questions.
This is supervised learning, and as you have seen from more than 180 episodes of Two Minute Papers,
there is no doubt that it is an enormously successful field of research.
However, this means that we have to label our datasets,
so we have to add some additional information to every image we have.
This is a very laborious task, which is typically performed by researchers or through crowdsourcing,
both of which take a lot of funding and hundreds of work hours.
But if we think about it, we have a ton of videos on the internet;
you always hear these mind-melting new statistics
on how many hours of video footage are uploaded to YouTube every day.
Of course, we could hire all the employees in the world
to annotate these videos frame by frame to tell the algorithm
that this is a guitar, this is an accordion, or a keyboard,
and we would still not be able to learn on most of what’s uploaded.
But it would be so great to have an algorithm that can learn on unlabeled data.
However, there are learning techniques in the field of unsupervised learning,
which means that the algorithm is given a bunch of images, or any media,
and is instructed to learn on it without any additional information.
There is no teacher to supervise the learning.
The algorithm learns by itself.
And in this work, the objective is to learn both visual
and audio-related tasks in an unsupervised manner.
So for instance, if we look at this layer of the visual subnetwork,
we’ll find neurons that get very excited when they see,
for instance, someone playing an accordion.
And each of the neurons in this layer belongs to a different object class.
I surely have something like this for papers.
And here comes the Károly goes crazy part one:
this technique not only classifies the frames of the videos,
but it also creates semantic heatmaps,
which show us which part of the image is responsible for the sounds that we hear.
This is insanity!
To accomplish this, they ran a vision subnetwork on the video part,
and a separate audio subnetwork to learn about the sounds,
and at the last step, all this information is fused together
to obtain Károly goes crazy part two:
this makes the network able to guess
whether the audio and the video stream correspond to each other.
It looks at a man with a fiddle, listens to a sound clip
and will say whether the two correspond to each other.
Wow!
The audio subnetwork also learned the concept of human voices,
the sound of water, wind, music, live concerts and much, much more.
And the answer is yes, it is remarkably close to human-level performance on sound classification.
And all this is provided by the two networks that were trained from scratch,
and no supervision is required.
We don’t need to annotate these videos.
Nailed it.
And please don’t get this wrong,
it’s not like DeepMind has suddenly invented unsupervised learning, not at all.
This is a field that has been actively researched for decades;
it’s just that we rarely see really punchy results like these ones here.
Truly incredible work.
If you enjoyed this episode, and you feel that 8 of these videos a month is worth a dollar,
please consider supporting us on Patreon.
Details are available in the video description.
Thanks for watching and for your generous support, and I’ll see you next time!


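Translation credits
Video summary

The AI learns audio and video concepts on its own through two subnetworks, one for audio and one for video, and uses them for recognition and classification.

Transcription

Collected from the web

Translator

One静茹

Reviewer

审核团O

Video source

https://www.youtube.com/watch?v=mL3CzZcBJZU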
