
This AI Shows Us the Sound of Pixels

Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
This is a neural network-based method
that is able to show us the sound of pixels.
What this means is that it separates and localizes audio signals in videos.
The two keywords are separation and localization,
so let’s take a look at these one by one.
Localization means that we can pick a pixel in the image
and it shows us the sound that comes from that location,
and the separation part means that ideally,
we only hear that particular sound source.
Let’s have a look at an example.
Here is an input video.
[Music]
And now, let’s try to separate the sound of the cello
and see if it knows where it comes from.
[Cello playing]
Same with the guitar.
[Guitar playing]
Now for a trickier question…
Even though there are sound reverberations off the walls,
the walls don’t directly emit sound themselves,
so I am hoping to hear nothing now, let’s see…
[Silence]
Flat signal, great!
So, how does this work?
It is a neural network-based solution
that has watched 60 hours of musical performances
to be able to pull this off,
and it learns that a change in sound
can often be traced back to a change in the video footage
as a musician is playing an instrument.
As a result, get this, no supervision is required.
This means that we don’t need to label this data,
or in other words, we don’t need to specify how each pixel sounds,
it learns to infer all this information from the video and sound signals by itself.
This is huge; otherwise,
just imagine how many work-hours it would take to annotate all this data.
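
To make the "no labels needed" point concrete, here is a minimal sketch of one common way to get training targets for free: add together the audio of two unlabeled clips, so the original tracks become the ground truth for separation. The video does not spell out the training procedure, so this setup is an assumption, and `tone`, `make_training_example`, `mse`, and `naive_separator` are hypothetical names. A real system would work on spectrograms and use the video frames to decide which source to extract; the placeholder model below ignores video entirely.

```python
import math

def tone(freq_hz, n_samples, sample_rate=8000):
    """Pure sine tone, used here as a stand-in for a clip's audio track."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def make_training_example(audio_a, audio_b):
    """Mix two unlabeled clips; the unmixed originals serve as targets."""
    mixture = [a + b for a, b in zip(audio_a, audio_b)]
    return mixture, (audio_a, audio_b)

def mse(estimate, target):
    """Mean squared error between an estimated and a known track."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)

def naive_separator(mixture):
    """Placeholder model (assumption): each source is half the mixture."""
    half = [0.5 * m for m in mixture]
    return half, half

clip_a = tone(440.0, 800)   # hypothetical audio of video A
clip_b = tone(660.0, 800)   # hypothetical audio of video B
mixture, targets = make_training_example(clip_a, clip_b)

# The loss needs no human annotation: the targets came from the data itself.
estimates = naive_separator(mixture)
loss = sum(mse(e, t) for e, t in zip(estimates, targets))
perfect_loss = sum(mse(t, t) for t in targets)
```

The loss of the placeholder is nonzero while a perfect separator would reach zero, which is the signal a network could be trained on.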
And, another cool application is that if we can separate these signals,
then we can also independently adjust the sound of these instruments.
Have a look.
[Music – flute louder than xylophone]
[Music – xylophone louder than flute]
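
Once the tracks are separated, adjusting each instrument independently amounts to a weighted sum. The sketch below assumes the separated tracks are already available as equal-length lists of samples. The sine tones stand in for the flute and xylophone, and `sine`, `remix`, and `rms` are hypothetical helper names, not part of the method shown in the video.

```python
import math

def sine(freq_hz, n_samples, sample_rate=8000):
    """Pure tone as a list of floats in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def remix(stems, gains):
    """Weighted sum of separated tracks.

    stems: dict mapping instrument name to a list of samples
    gains: dict mapping instrument name to a linear gain (default 1.0)
    """
    length = len(next(iter(stems.values())))
    if any(len(s) != length for s in stems.values()):
        raise ValueError("all stems must have the same length")
    out = [0.0] * length
    for name, samples in stems.items():
        g = gains.get(name, 1.0)
        for i, x in enumerate(samples):
            out[i] += g * x
    return out

def rms(samples):
    """Root-mean-square level of a track."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

# Stand-ins for two separated tracks (assumption: real stems would
# come from the separation model).
stems = {"flute": sine(880.0, 8000), "xylophone": sine(440.0, 8000)}

flute_forward = remix(stems, {"flute": 1.0, "xylophone": 0.25})
xylophone_forward = remix(stems, {"flute": 0.25, "xylophone": 1.0})
flute_removed = remix(stems, {"flute": 0.0})  # mute one instrument
```

Setting a gain to zero removes that instrument entirely, which only works cleanly if the separation itself is clean.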
Now, clearly, it is not perfect
as some frequencies may bleed over from one instrument to the other,
and there are also other methods to separate audio signals,
but this particular one does not require any expertise,
so I see a great value proposition there.
If you wish to create a separate version of a video clip
and use it for karaoke,
or just subtract the guitar and play it yourself,
I would look no further.
Also, you know the drill,
this will be way better a couple papers down the line.
So, what do you think?
What possible applications do you envision for this?
Where could it be improved?
Let me know below in the comments.
Thanks for watching and for your generous support,
and I’ll see you next time!


Translation credits

Transcription

Collected from the web

Translation

🍀趁年轻🌴

Reviewer

审核员B

Video source

https://www.youtube.com/watch?v=o-LU_Dja6Ks
