
《了解深度学习》#4 如何轻松运用数学

How to Do Mathematics Easily - Intro to Deep Learning #4

嘿 你好吗
Hey, are you okay?
Siraj 你能跟他们讲讲深度学习背后的数学吗?
Siraj, can you show them the math behind deep learning?
当然可以
Totally.
大家好 我是Siraj
Hello world! It’s Siraj.
接下来我们来了解与深度学习有关的数学
And let’s learn about the math needed to do deep learning.
数学存在于万物之中
Math is in everything,
不仅仅渗透到理工科的每个领域
not just every field of engineering and science.
而且也体现在音乐的乐符间
It’s between every note in a piece of music
藏在绘画的结构中
and hidden in the textures of a painting.
深度学习也不例外
Deep learning is no different.
数学帮助我们定义规则
Math helps us define rules
对于神经网络来说 我们从数据中获得知识
for our neural network so we can learn from our data.
你可以在不懂数学的情况下使用深度学习
If you wanted to, you could use deep learning without ever knowing anything about math.
现在有大量的易用的API接口可供使用
There are a bunch of readily available APIs
应用在譬如
for tasks like computer
视觉识别和机器翻译的领域
vision and language translation
但是如果你想用一个库
but if you want to use a library
比如用TensorFlow定制模型来解决问题
like TensorFlow to make a custom model to solve a problem
了解一些你可能遇到的数学术语的含义
knowing what math terms mean when you see them pop
是很有益的 同时如果你希望
up is helpful and if you want to advance the
通过研究的方式掌握深度学习 就不要错过
field through research, don’t even trip!
你一定需要了解数学
You definitely need to know the math.
深度学习算法主要来源于数学的三个分支:
Deep learning mainly pulls from three branches of math:
线性代数 统计学和微积分
linear algebra, statistics and calculus.
如果你不了解这些科目 我推荐
If you don’t know any of the topics,
你看一下这些科目里重要概念的大纲
I recommend a cheat sheet of the important concepts
在视频描述里可以找到三份大纲的链接
and I’ve linked to one for each in the description
接下来我们通过四步 来阐述深度学习的过程
so let’s go over the four-step process of building a deep learning pipeline
我们会谈论数学在每个步骤发挥什么作用
and talk about how math is used at each step
一旦我们获得有用的数据集 就要进行处理
Once we’ve got a dataset that we want to use, we want to process it.
我们需要做清理工作 删除没有值的数据和没有用的特征
we can clean the data of any empty values, remove features that are not necessary
这些工作不需要数学
but these steps don’t require math.
不过有一个步骤需要用到数学 它叫做归一化
A step that does, though, is called normalization.
这是一个可选的步骤 它有利于使模型收敛
This is an optional step that can help our model reach convergence,
也就是预测误差达到尽可能低的那个点 而且收敛得更快
which is that point when our prediction gives us the lowest error possible, faster
因为所有值都在同样的尺度上处理
since all the values operate on the same scale.
这个方法来自统计学
This idea comes from statistics.
你有17.4%的概率拿到顺子
You have a 17.4 percent chance of making a straight.
有好几种方法对数据归一化
there are several strategies to normalize data,
最流行的一种叫做线性函数归一化
although a popular one is called min max scaling.
如果有一组给定数据
If we have some given data
我们可以用下面的方程进行归一化
we can use the following equation to normalize it.
对表中的每个数据减掉 它们的最小值
We take each value in the list and subtract the minimum value from it,
然后除以最大值和最小值的差
then divide that result by the maximum value minus the min value.
我们就得到了新的一组数据 在[0, 1]的范围内
we then have a new list of data within the range of 0 to 1
我们对每个特征都采取同样的归一化操作
and we do this for every feature we have so they’re all on the same scale
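To make the formula concrete, here is a minimal NumPy sketch of min-max scaling applied per feature; the example matrix and its values are illustrative assumptions, not taken from the video's code.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are features (illustrative values only)
X = np.array([[3.0, 200.0],
              [1.0, 150.0],
              [5.0, 400.0]])

# Min-max scaling per feature: (x - min) / (max - min)
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)

print(X_scaled)  # every value now falls in the range [0, 1]
```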
这是为了确保
After normalizing our data, we have to ensure that
数据以神经网络能够接受的格式呈现
it’s in a format that our neural network will accept.
接着就要用到线性代数
This is where linear algebra comes in.
这里要提到线性代数的四个贯穿的概念
There are four terms in linear algebra that show up consistently.
它们是标量 向量 矩阵和张量
Scalars, vectors, matrices and tensors.
标量就是一个单纯的数字
A scalar is just a single number.
向量是由数字组成的一维序列
A vector is a one-dimensional array of numbers.
矩阵是数字组成的二维序列
A matrix is a two-dimensional array of numbers.
张量是数字组成的N维序列
And a tensor is an N dimensional array of numbers.
所以矩阵 标量 向量和007幽灵党 不对 没有007幽灵党
So a matrix, scalar, vector and spectre, wait not spectre,
都可以用一个张量表达
can all be represented as a tensor.
想要把来自图像 文本 视频的数据转换成张量
We want to convert our data, whatever form it’s in, be that images, words, videos, into tensors, where N is
数据里特征的数目决定了张量的维度
the number of features our data has, which defines the dimensionality of our tensor.
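As a quick illustration of those four terms in NumPy (the variable names here are just for demonstration):

```python
import numpy as np

scalar = np.array(5.0)                     # a single number (0-dimensional)
vector = np.array([1.0, 2.0, 3.0])         # 1-dimensional array of numbers
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])            # 2-dimensional array of numbers
tensor = np.zeros((2, 3, 4))               # N-dimensional array (here N = 3)

print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # prints: 0 1 2 3
```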
一个三层前馈神经网络
Let’s use a three-layer feed-forward neural network
给定输入 它可以输出预测的二进制数
capable of predicting a binary output
这是一个基础的例子
given an input as our base example
为下面更多的数学概念提供例子
to illustrate some more math concepts going forward
我们什么时候用数学和深度学习
When do we use math and deep learning?
对数据进行归一化时
When we normalize during processing.
搜索模型的参数时
Learn a model’s parameters by searching.
初始化随机权重时
And random weights be initializing.
张量在流动 从输入到输出
Tensors flow… From input to out
测了误差 又测可靠性
Then measure the error to measure the doubt
它告诉我们真实值是什么 期待值是什么
It gives us what’s real and what’s expected.
误差反向传播 最优化损失
Back propagate to get cost corrected.
我们导入我们唯一的依赖库 Numpy
We’ll import our only dependency, Numpy,
然后将输入和输出的数据初始化成矩阵
then initialize our input data and output data as matrices.
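A sketch of that setup, following the widely used tiny-NumPy-network version of this example; the particular values of X and y below are assumptions for illustration, not taken verbatim from the video.

```python
import numpy as np

# Input data: 4 samples with 3 features each (a 4x3 matrix)
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

# Expected binary output for each sample (a 4x1 matrix)
y = np.array([[0],
              [1],
              [1],
              [0]])
```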
一旦数据正确的格式化后
Once our data is in the right format,
就可以搭建深度神经网络了
we want to build our deep neural network.
深度网络有超参数
Deep nets have what are called hyperparameters. These are the
它们是我们定义的网络的高层调节旋钮
high level tuning knobs of the network that
它们控制着 比如模型运行的速度
we define and they help decide things like how fast our model runs, how many
每层神经元的数量 隐藏层的层数
neurons per layer, how many hidden layers.
基本上 神经网络越复杂
Basically the more complex your neural
就需要设置更多的超参数
network gets, the more hyperparameters you’ll have.
你可以根据对问题理解 手动的调节超参数
You can tune these manually using knowledge you have about the problem you’re solving
猜测可能的值 观察结果
to guess probable values and observe the result.
基于结果 你可以修正相应的超参数
Based on the result, you can tweak them accordingly and
如此重复这个过程
repeat that process iteratively.
另一个方法是随机搜索
But another strategy you could use is random search.
你可以对每个超参数设定范围
You can identify ranges for each,
然后写一个随机搜索的算法
then you can create a search algorithm
从设定的范围随机选取值
that picks values from those ranges at random
选取的值应该符合均匀分布
from a uniform distribution of possibilities
这意味着所有的值 都有相同的可能性被选中
which means all possible values have the same probability of being chosen.
这个过程一直重复直到找到最优的超参数
This process repeats until it finds the optimal hyperparameters.
统计学真棒
Yay for statistics!
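A minimal sketch of random search over hyperparameter ranges; the ranges, the num_epochs and hidden_units names, and the stand-in evaluate function are all assumptions for illustration, not part of the video's code.

```python
import random

# Hypothetical search ranges for two hyperparameters
ranges = {
    "num_epochs": (10_000, 100_000),
    "hidden_units": (2, 16),
}

def random_search(n_trials, evaluate):
    """Draw each hyperparameter uniformly from its range and keep the best trial."""
    best_params, best_error = None, float("inf")
    for _ in range(n_trials):
        params = {name: random.randint(lo, hi) for name, (lo, hi) in ranges.items()}
        error = evaluate(params)  # in practice: train the network and return its final error
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

# Toy usage with a stand-in objective; replace with real training and error measurement
best, err = random_search(20, lambda p: abs(p["hidden_units"] - 8) + p["num_epochs"] / 1e6)
print(best, err)
```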
我们只有num_epochs的值作为超参数
We only have number of epochs as our hyperparameter,
毕竟这是一个很简单的神经网络
since we have a very simple neural network
类似的我们利用概率来决定权重的值
We’ll use probability to decide our weight values, too.
通常的做法是随机初始化样本的权重
One common method is randomly initializing samples of each weight
值来自一个小偏差的正态分布
from a normal distribution with a low deviation,
这意味着值都处于接近的范围
meaning the values are pretty close together.
我们利用它建立一个3*4的权重矩阵
We’ll use it to create a weight matrix with a dimension of three by four, since
维度判断的依据是输入的大小
that’s the size of our input.
输入层中的每个节点 都和下一层的每个节点连接在一起
So every node in the input layer is connected to every node in the next layer.
权重的范围会落在[-1,1]之间
The weight values will be in the range from -1 to 1.
因为一共有三层 我们要初始化两个权重矩阵
Since we have three layers, we’ll initialize two weight matrices.
下一个权重矩阵的维度是1*4
The next set of weights has a dimension four by one
判断的根据是输出的大小
which is the size of our output.
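A sketch of those two weight matrices, continuing the earlier snippet; syn0 and syn1 are assumed names from the common NumPy version of this example, and the values here are drawn uniformly from [-1, 1) rather than from a low-deviation normal distribution, either of which fits the description above.

```python
import numpy as np

np.random.seed(1)  # make the random initialization reproducible

# 2 * random - 1 shifts uniform [0, 1) samples into the range [-1, 1)
syn0 = 2 * np.random.random((3, 4)) - 1   # input layer (3 features) -> hidden layer (4 nodes)
syn1 = 2 * np.random.random((4, 1)) - 1   # hidden layer (4 nodes) -> output layer (1 node)
```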
当数据在神经网络中向前传播时
As data propagates forward in a neural network
每一层对数据进行各自的操作
each layer applies its own respective operation to it,
通过某种方式进行变换 直到最终输出一个预测
transforming it in some way, until it eventually outputs a prediction.
实际上这些都是线性代数
This is all linear algebra.
实际上这些都是张量的运算
It’s all tensor math.
初始化一个for循环 对这个网络进行6万次训练
We’ll initialize a for loop to train our network for 60,000 iterations.
然后初始化三层神经网络的参数
Then we’ll want to initialize our layers.
第一层是输入 获得输入的数据
The first layer, our input, gets input data.
下一层 计算第一层 和第一个矩阵的内积
The next layer computes the dot product of the first layer and the first weight matrix.
两个矩阵的乘积运算
When we multiply two matrices together,
类似于给输入数据加上权重
like in the case of applying weight values to input data,
我们称之为内积
we call that the dot product.
然后做一个非线性运算
Then it applies a non-linearity to the result which we
我们决定用sigmoid函数
decided is going to be a sigmoid.
它输入一个任意实数
It takes a real value number
输出一个[0, 1]的数
and squashes it into a range between 0 and 1.
这是第一层进行的操作
So that’s the operation that occurs in layer 1,
第二层也进行同样的操作
and the same occurs in the next layer.
我们把第一层的值 向前传播给第二层
We’ll take that value from layer 1 and propagate it forward to layer 2,
计算它的内积和下一个权重函数
computing the dot product of it and the next weight matrix,
然后用非线性运算转化成输出概率
then squashing it into output probabilities with our non-linearity.
因为只有三层 输出的值就是预测的结果
Since we only have three layers, this output value is our prediction.
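Continuing the same sketch (using the assumed X, syn0, and syn1 from the earlier snippets), the forward pass might look like this; nonlin is an assumed helper name for the sigmoid.

```python
import numpy as np

def nonlin(x, deriv=False):
    """Sigmoid squashing function; with deriv=True, returns its slope
    (x is then assumed to already be a sigmoid output)."""
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# Forward propagation through the three layers
l0 = X                          # layer 0: the input data
l1 = nonlin(np.dot(l0, syn0))   # layer 1: dot product with the first weights, squashed to (0, 1)
l2 = nonlin(np.dot(l1, syn1))   # layer 2: the network's prediction
```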
使预测更准确的方式 即网络学习的方式
The way we improve this prediction, the way our network learns,
是不断地对网络进行优化
is by optimizing our network over time.
我们该怎样进行优化呢?
So how do we optimize it?
微积分登场了
Enter calculus.
我们模型做出来的第一个预测是不准确的
The first prediction our model makes will be inaccurate.
为了改进它
To improve it,
首先量化这个预测值的误差
we first need to quantify exactly how wrong our prediction is.
我们通过测量误差或损失来实现
We’ll do this by measuring the error, or cost.
误差描述了预测值和真实值的差距
The error specifies how far off the predicted output is from the expected output.
一旦得到了误差
Once we have the error value we
我们希望减小这个误差
want to minimize it because the smaller
因为误差越小 预测就越准确
the error the better our prediction.
训练模型就是不断减小误差的过程
Training a neural network means minimizing the error over time.
无法改变输入的数据
We don’t want to change our input data
但是我们可以改变权重 使得误差最小
but we can change our weights to help minimize this error.
如果我们暴力遍历所有权重的可能性来获得最精确的预测
If we just brute forced all the possible weights to see what gave us the most accurate prediction,
计算会花费 非常长的时间 但是
it would take a very long time to compute.
我们希望得到一个方向
Instead, we want some sense of direction for
关于我们怎样更新权重使得
how we can update our weights such that
下一轮循环训练的输出会更加精确
in the next round of training our output is more accurate.
为了获得这个方向
To get this direction
我们根据的是权重 计算误差的梯度
we’ll want to calculate the gradient of our error with respect to our weight values.
我们能通过微分来计算梯度
We can calculate this by using what’s called the derivative in calculus.
当设置非线性函数的deriv为真
When we set deriv to true for our nonlin function,
它就计算sigmoid的微分
it’ll calculate the derivative of a sigmoid.
结果就是sigmoid函数一个点的斜率
That means the slope of a sigmoid at a given point,
其实就是第二层的预测值
which is the prediction values we give it from l2.
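Continuing the same sketch (y, l2, and nonlin come from the earlier snippets), the error and the sigmoid's slope at the prediction might be computed as:

```python
# How far off the prediction is from the expected output
l2_error = y - l2

# Slope of the sigmoid at the predicted values (nonlin with deriv=True)
l2_slope = nonlin(l2, deriv=True)
```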
我们希望尽可能地减小误差
We want to minimize our error as much as possible,
可以凭直觉想象扔一个球到一个碗里
and we can intuitively think of this process as dropping a ball into a bowl
误差最小的地方就在碗底
where the smallest error value is at the bottom of the bowl.
一旦我们扔了小球进去 我们计算 每一个位置的梯度
Once we drop the ball in, we’ll calculate the gradient at each of those positions,
如果梯度是负数
and if the gradient is negative,
我们就把小球向右移
we’ll move the ball to the right.
如果是正数 就将小球向左移
If it’s positive, we’ll move the ball to the left.
每次计算的梯度都会用来更新相应的权重
And we’re using the gradient to update our weights accordingly each time.
我们重复这个过程直到梯度的值为零
We’ll keep repeating the process until eventually the gradient is zero,
这时就达到了最小误差
which will give us the smallest error value.
这个过程叫做梯度下降
This process is called gradient descent,
因为我们使梯度不断变小 直到零为止
because we are descending our gradient to approach zero
同时在迭代中不断更新权重的值
and using it to update our weight values iteratively.
我全部理解了
I understand everything now.
我依然理解
Still understand everything.
编程实现上 我们乘上预测误差的微分
So to do this programmatically, we’ll multiply the derivative we calculated for our prediction by the error.
就获得了误差权重微分 我们称之为l2_delta
This gives us our error weighted derivative which we’ll call l2_delta
这是一个矩阵 每个预测输出对应一个值 它给出了一个方向
This is a matrix of values, one for each predicted output, and gives us a direction.
接着我们用这个方向更新层里对应的权重
We’ll later use this direction to update this layer’s associated weight values.
计算特定层的误差的过程
This process of calculating the error at a given layer
和利用它计算误差加权的梯度
and using it to help calculate the error weighted gradient
使得我们朝正确的方向更新权重
so that we can update our weights in the right direction
如此不断迭代更新每一层
will be done recursively for every layer
不断地从末尾回到开头
starting from the last back to the first.
在向前传播预测后 反向地传播误差
We are propagating our error backwards after we’ve computed our prediction by propagating forward.
我们称之为反向传播算法
This is called back propagation.
l2_delta乘上相应的权值矩阵的转置
So we’ll multiply the l2_delta values by the transpose of its associated weight matrix
获得了上一层的误差
to get the previous layer’s error,
然后用这个误差做和前面一样的操作
then use that error to do the same operation as before,
获得一个方向值更新相应层的权重
to get direction values to update the associated layers’ weights,
所以误差就被最小化了
so the error is minimized.
最终 我们通过乘上各自的deltas 更新每一层的权重矩阵
Lastly, we’ll update the weight matrices for each associated layer by multiplying them by their respective deltas.
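One possible version of that backward pass and weight update, continuing the same sketch; these lines would sit inside the 60,000-iteration training loop mentioned earlier.

```python
# Error-weighted derivative for the output layer: error times the sigmoid's slope
l2_delta = l2_error * nonlin(l2, deriv=True)

# Propagate the error backwards: how much did layer 1 contribute to the layer-2 error?
l1_error = l2_delta.dot(syn1.T)
l1_delta = l1_error * nonlin(l1, deriv=True)

# Update each weight matrix using its layer's activations and its delta
syn1 += l1.T.dot(l2_delta)
syn0 += l0.T.dot(l1_delta)
```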
当我们运行代码 我们看到误差不断减小
When we run our code we can see that the error values decreased over time,
预测最终变得非常准确
and our prediction eventually became very accurate.
总结一下
So, to break it down.
深度学习来自数学的三个分支
Deep learning borrows from three branches of math,
线性代数 统计学和微积分
linear algebra, statistics and calculus.
一个神经网络对一个输入张量进行一系列操作
A neural net performs a series of operations on an input tensor
来计算预测
to compute a prediction
我们通过梯度下降算法来优化预测
and we can optimize a prediction by using gradient descent
来递归地反向传播误差
to back propagate our errors recursively,
训练中对每一层更新权重
updating our weight values for every layer during training.
上一期代码挑战的赢家是Jovian Lin
The coding challenge winner from the last video is Jovian Lin.
Jovian用大量的模型预测游戏评论中的情感
Jovian tried out a bunch of different models to predict sentiment from a dataset of video game reviews.
本周最佳魔法师!
Wizard of the week!
亚军是Vishal Batchu
And the runner-up is Vishal Batchu.
他测试了几种不同的递归神经网络
He tested out several different recurrent nets and
并在README文件中做了详细的记录
eloquently recorded his experiment in his ReadMe.
这一期的代码挑战是训练一个神经网络
The coding challenge for this video is to train a deep neural net
来预测一场地震的震级
to predict the magnitude of an earthquake
并使用一种策略来学习最佳的超参数
and use a strategy to learn the optimal hyperparameters.
把详细内容写在README文件中 然后把你的Github链接贴在评论区
Details are in the ReadMe. Post your GitHub link in the comments,
下一期视频我会宣布优胜者
and I’ll announce the winner next video.
如果喜欢本视频请订阅我的频道
Please subscribe if you wanna see more videos like this.
看看我其他相关的视频
Check out this related video,
现在我该去把运算增加到一百万次了
and for now I got to get my math turned up to a million.
感谢观看
So thanks for watching!


Translation information
Video summary

This video uses a three-layer neural network as an example to introduce the key mathematical concepts behind deep learning.

Transcriber

Collected from the web

Translator

颜木林

Reviewer

审核员@AI

Video source

https://www.youtube.com/watch?v=N4gDikiec8E
