Dear Fellow Scholars,
this is Two Minute Papers with Károly Zsolnai-Fehér.
With today’s graphics technology,
we can enjoy many really smooth videos
that were created at 60 frames per second.
We love it too, and we hope you noticed
that our last hundred or maybe even more episodes
have been available at 60 fps.
However, it oftentimes happens that
we're given videos that run at anything from 20 to
30 frames per second.
This means that if we play them on a 60 fps timeline,
half or even more of these frames
will not provide any new information.
As we try to slow down these videos for some nice slow-motion action,
this ratio gets even worse,
creating an extremely choppy output video.
Fortunately, there are techniques that are able to guess
what happens in these intermediate frames and give them to us.
This is what we call frame interpolation.
We have had some previous experiments in this area
where we tried to create an amazing slow
motion version of a video with some bubbles merging.
A simple and standard way of doing frame interpolation
is called frame blending,
which averages the two closest known frames.
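To make that concrete, here is a minimal sketch of frame blending,
assuming the frames arrive as NumPy arrays of equal shape
(the function name and the parameter t are mine, not from the video):

    import numpy as np

    def blend_frames(prev_frame, next_frame, t=0.5):
        # Weighted average of the two closest known frames;
        # t is the temporal position of the new frame between them.
        prev_f = prev_frame.astype(np.float32)
        next_f = next_frame.astype(np.float32)
        blended = (1.0 - t) * prev_f + t * next_f
        return blended.astype(prev_frame.dtype)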
More advanced techniques are based on optical flow,
a method that estimates what motions
happened between the two frames
and creates new images based on that knowledge,
leading to higher-quality results in most cases.
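As a rough illustration of this flow-based idea – not the paper's method –
here is a sketch that estimates Farneback flow with OpenCV and warps
the previous frame a fraction t of the way toward the next one;
a real interpolator would also warp the next frame backward
and handle occluded regions:

    import cv2
    import numpy as np

    def flow_interpolate(prev_frame, next_frame, t=0.5):
        # Estimate dense motion from the previous frame to the next
        # (frames are assumed to be BGR images, as OpenCV decodes them).
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

        # Shift the sampling grid along a fraction t of the flow and
        # warp the previous frame toward the intermediate point in time.
        h, w = prev_gray.shape
        grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (grid_x - t * flow[..., 0]).astype(np.float32)
        map_y = (grid_y - t * flow[..., 1]).astype(np.float32)
        return cv2.remap(prev_frame, map_x, map_y, cv2.INTER_LINEAR)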
This technique uses a convolutional neural network
to accomplish something similar,
but in the end, it doesn't give us an image –
instead, it gives us a set of convolution kernels.
This is a transformation that is applied to the previous
and the next frame to produce the intermediate image.
It is not the image itself,
but a recipe of how to produce it, if you will.
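To show what such a recipe might look like in practice,
here is a minimal sketch of applying per-pixel kernels
to grayscale frames; kernels_prev and kernels_next merely
stand in for the network's output – how the real network
parameterizes and predicts them is the paper's contribution:

    import numpy as np

    def apply_kernels(prev_frame, next_frame, kernels_prev, kernels_next):
        # prev_frame, next_frame: (H, W) grayscale images.
        # kernels_prev, kernels_next: (H, W, k, k) per-pixel
        # kernels, assumed to come from the network.
        h, w = prev_frame.shape
        k = kernels_prev.shape[-1]
        pad = k // 2
        prev_p = np.pad(prev_frame.astype(np.float32), pad, mode='edge')
        next_p = np.pad(next_frame.astype(np.float32), pad, mode='edge')
        out = np.zeros((h, w), dtype=np.float32)
        for y in range(h):
            for x in range(w):
                # Each output pixel is a weighted sum over patches
                # taken from both the previous and the next frame.
                out[y, x] = (
                    np.sum(kernels_prev[y, x] * prev_p[y:y + k, x:x + k])
                    + np.sum(kernels_next[y, x] * next_p[y:y + k, x:x + k]))
        return out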
We’ve had a ton of fun with convolutions earlier,
where we used them to create beautiful subsurface
scattering effects for translucent materials in real time,
and our more loyal Fellow Scholars remember that at some point,
I also pulled out my guitar and showed
what it would sound like inside a church
using a convolution-based reverberation technique.
The links are available in the video description,
make sure to check them out!
Since we have a neural network over here,
it goes without saying that the training takes place
on a large number of before-after image pairs,
so that the network learns to produce
these convolution kernels.
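If you are curious what one such training step might look like,
here is a hedged sketch using PyTorch (just one possible framework;
the paper's own code may differ) – 'model' is a hypothetical
stand-in for the full network that predicts the kernels
and applies them, and the L1 loss is one reasonable choice:

    import torch.nn.functional as F

    def training_step(model, optimizer, frame_prev, frame_next, frame_mid):
        # One optimization step on two known frames and the withheld
        # middle frame that serves as the ground truth target.
        optimizer.zero_grad()
        prediction = model(frame_prev, frame_next)
        loss = F.l1_loss(prediction, frame_mid)
        loss.backward()
        optimizer.step()
        return loss.item()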
Of course, to validate this algorithm,
we also need access to a ground truth
reference to compare against –
we can accomplish this by withholding
a few intermediate frames,
so we have the true images
which the algorithm has to reproduce
without ever seeing them.
Kind of like giving a test to a student
when we already know the answers.
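One simple way to set this up, assuming frames is a list
of decoded video frames, is to keep only every second frame
as input and treat the skipped ones as the hidden answers
(the helper name is mine, for illustration only):

    def make_validation_triplets(frames):
        # The even-indexed frames play the role of the known frames;
        # the odd-indexed frame between them is the withheld ground
        # truth that the algorithm has to reproduce without seeing it.
        triplets = []
        for i in range(0, len(frames) - 2, 2):
            triplets.append((frames[i], frames[i + 2], frames[i + 1]))
        return triplets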
You can see such a comparison here.
And now, let’s have a look at these results!
As you can see, they are extremely smooth,
and the technique retains a lot of high-frequency
details in these images.
The videos also seem temporally coherent,
which means that they are devoid of the annoying flickering effect
where the reconstruction turns out
differently in each subsequent frame.
None of that happens here,
which is an excellent property of this technique.
The Python source code for this technique is available
and is free for non-commercial use.
I've put a link in the description;
if you give it a try and have some results of your own,
make sure to post them in the comments section
or in our subreddit discussion,
which is also linked in the description.
Thanks for watching and for your generous support,
and I’ll see you next time!