
《机器学习之数学》#1 简介

Intro - The Math of Intelligence

大家好!我是西拉杰
Hello World! It’s Siraj.
欢迎来到“智能的数学”
And welcome to “The Math of Intelligence”
【智能的数学】
[The Math of Intelligence]
【编导:西拉杰】
[Director: Siraj]
在接下来3个月 我们将开启一段旅程
For the next 3 months, we’re going to take a journey
来学习机器学习中最重要的数学概念
through the most important math concepts that underlie machine learning.
这意味着从各个伟大学科中了解各种所需概念
That means all the concepts you need from the great disciplines
比如微积分 线性代数 概率论和统计学
of calculus, linear algebra, probability theory, and statistics.
学这门课前需要了解基本的Python语法和代数
The prerequisites are knowing basic python syntax and algebra.
我们写的每一个算法都
Every single algorithm we code
不会使用任何流行的机器学习库来完成
will be done without using any popular machine learning library,
因为这门课程的目的是帮助您
because the point of this course is to help you
培养一种扎实的关于建立算法的数学直觉
build a solid mathematical intuition around building algorithms
这些算法可以从数据中学习
that can learn from data.
我的意思是让我们面对它吧
I mean let’s face it,
你可以只使用黑盒子API来调用这一切
you could just use a black box API for all this stuff,
但如果你有直觉
but if you have the intuition
你将确切地知道使用哪种算法
you’ll know exactly which algorithm to use for the job.
甚至从零开始定制自己的工具
Or even how to make your own from scratch.
作为人类 我们不断地通过五种感官接收数据
As humans, we are constantly receiving data through our five senses
与此同时 我们必须理解这些混乱的输入
and somehow we’ve got to make sense of all this chaotic input
才能生存下来
so that we can survive.
感谢这个进化过程
Thanks to the evolutionary process
我们的大脑已经进化到能做到这一点了
we’ve developed brains capable of doing this.
我们已经得到了宇宙中最宝贵的资源——智能
We’ve got the most precious resource in the universe: intelligence,
一种学习和运用知识的能力
the ability to learn and apply knowledge.
有一种将我们的智能与动物界
One way to measure our intelligence
其他智能进行比较的方式是使用一个阶梯
against the rest of the animal kingdom is using a ladder.
我们的智能的确是最通用的
Ours is indeed the most generalized type of intelligence,
它的适用范围最广
capable of being applied to the widest variety of tasks.
但是 这并不意味着我们就一定是最好的一种智能
But that doesn’t mean that we are necessarily the best kind of intelligence.
在1960年代 灵长类动物学者
In the 1960s, a primate researcher named
简·古德尔博士总结到
Dr. Jane Goodall, concluded that
黑猩猩已经在森林里生活了几十万年
chimpanzees had been living in the forest for hundreds of thousands of years
却没有过度繁殖或者破坏他们的生活环境
without overpopulating or destroying their environment at all.
逆戟鲸可以每次只用一个大脑半球进行睡眠
Orcas have the ability to sleep with one hemisphere of their brain at a time,
这使得他们能在休养生息的同时
which allows them to recuperate,
也能感知到周围世界
while being aware of their surroundings.
在某些方面动物比我们更聪明
In some ways animals are more intelligent than us.
智能包括许多方面
Intelligence consists of many dimensions.
它就像是一种由可能性构成的多维空间
Think of it like a multi-dimensional space of possibility.
当建立一个AI
When building an AI,
人的大脑是一个伟大的路线图
the human brain is a great road map,
毕竟 神经网络已在无数任务中
after all, neural networks have achieved
取得了最先进的性能
state-of-the-art performance in countless tasks,
但它不是唯一的路线图
but it’s not the only road map,
我们能够并且将会创造很多其他可能的智能类型
there are many possible types of intelligence out there that we can and will create.
有些和我们很像 有些差别很大
Some will seem familiar to us, and some very alien.
用我们从未有过的方式思考
Thinking in a way we’ve never done before.
就像 AlphaGo 下的第37手
Like when AlphaGo played move 37.
即使是世界上最好的棋手都在这步棋上惊呆了
Even the best Go players in the world were stunned at the move.
它违背了我们几千年来在围棋实践中所收获的策略
It went against everything we’ve learned about the game from millennia of practice,
但它却是一个更好的策略 这也促成了它的胜利
but it turned out to be an objectively better strategy that led to its win.
许多不同类型的智能就像交响乐
The many different types of intelligence are like symphonies,
需要由不同的乐器组成
each comprising different instruments, and
这些乐器各不相同 不只是在他们的力度中
these instruments vary, not just in their dynamics
也在他们的音调 节奏 色彩和旋律中
but in their pitch and tempo and color and melody.
我们正在产生的数据量增长非常快
The amount of data that we’re generating is growing really fast.
不对 我是说非常 非常快!
No I mean really, REALLY fast!
你开始看这部视频到现在
In the time since you started watching this video
所产生的数据就够你一辈子来进行分析了
enough data was generated for you to spend an entire lifetime analyzing.
而这只是全部数据的0.5%
And that’s only 0.5% of all the data out there.
创造智能不只是一件不错的事情 还是一种需要
Creating intelligence isn’t just a nice to have, it’s a necessity.
正确地使用AI 它会帮助我们解决问题
Put in the right hands, it will help us solve problems
解决我们做梦也没想到能解决的问题
we never dreamed could be possible to solve.
那么我们从何入手呢?
So where do we start?
机器学习的核心是数学优化
At its core, machine learning is all about mathematical optimization.
这是一种思维方式
This is a way of thinking.
每个问题都可以分解为优化问题
Every single problem can be broken down into an optimization problem.
一旦有了充当输入的一些数据集
Once we have some data set that acts as our input,
我们将建立一个模型 利用这些数据来优化一个目标
we’ll build a model that uses that data to optimize for an objective,
一个我们想要达到的目标
a goal that we want to reach.
实现它的方式是最小化我们定义的误差值
And the way it does this is by minimizing some error value that we define.
比如“我今天应该穿什么?”
One example problem could be, “what should I wear today?”
我将按照时尚来进行优化 而不是舒适
I could frame this as optimizing for stylishness, instead of say, comfort,
然后定义一个我想要最小化的误差值
then define an error that I want to minimize
即一组人给我的差评数量
as the number of negative ratings a group of people give me.
甚至什么是我iOS上app主页的最佳设计
Or even what’s the best design for my iOS app’s homepage.
与其在某些元素上硬编码
Rather than hardcoding in some elements,
我选择从用户那获取app设计和它们评分的数据集
I could find a data set of app designs and their ratings from users.
如果我想为最高评分的设计进行优化
If I want to optimize for a design that would be the highest rated
我需要了解设计风格和评级之间的映射
I would learn the mapping between design styles and ratings.
这将是将来建立每一层堆栈的方式
This is the way that every single layer of the stack will be built in the future.
有时我们的数据已经被标记
Sometimes our data is labeled,
有时不是
sometimes it isn’t,
我们可以使用不同的技术来找到这些数据中的模式
there are different techniques we can use to find patterns in this data.
有时 对目标的优化
And sometimes optimizing for an objective can happen
不是通过模式识别的框架来实现的
not through the frame of pattern recognition but
而是通过探索许多可能性 看什么有效 什么无效
through the exploration of many possibilities and seeing what works and what doesn’t.
有许多方法可以构建学习过程
There are many ways that we can frame the learning process,
但最简单的学习方法是使用标记的数据
but the easiest way to learn is when we use labelled data.
从数学上来说 我们有一些输入
Mathematically speaking we have some input.
有一个域X 其中的每个点都具有我们观察到的特征
There’s a domain, X, where every point of X has features that we observe.
然后我们有一个标签组Y
Then we have a label set Y.
因此由一组带标记例子组成的数据 可以这样表示
So the data consists of a set of labeled examples that we can denote this way.
输出 此时 就是一个预测规则
The output, then, would be a prediction rule.
也就是给出一个新的X值 对应的Y值应是多少?
So given a new X value, what’s its associated Y value?
必须学习这个映射 也就是X的未知分布
We’ve gotta learn this mapping, which is an unknown distribution over X,
才能回答这个问题
to be able to answer this.
因此我们必须测量一些作为性能指标的误差函数
So we have to measure some error function that acts as a performance metric.
那么要做的就是从许多可能的模型中选择
So what we’d do is choose from a number of possible models
来表示这个函数
to represent this function.
我们首先设置一些参数值来表示映射
We’ll initially set some parameter values to represent the mapping,
然后我们评估初始结果
then we’d evaluate the initial result,
测量误差 更新参数
measure the error, update the parameters,
并重复此过程来一次又一次优化模型
and repeat this process optimizing the model again and again
直到完全了解映射
until it fully learns the mapping.
是凸函数还是凹函数更容易优化?我认为是凸函数
Was it convex or concave functions that were easier to optimize? I think convex.
我真希望我的实验室合作伙伴是优化高手
I really hope my lab partner is epic at optimization.
我想我应该感激
I guess I should be thankful,
没有多少数据科学家得到了CERN的资助去找希格斯玻色子
not many data scientists get a grant from CERN to detect the Higgs boson.
她叫什么来着? 我记得好像是埃洛伊斯
What was her name again? Eloise, I think.
嗯 她在ICML中得了一个奖 她是不是很可爱?
Yup, she did win an award at ICML. I wonder if she’s cute?
不 这并不重要 这次我不打算把工作娱乐混为一谈
No, that doesn’t matter. I am not going to mix business and pleasure, not this time.
假设我有一堆数据点
Suppose I’ve got a bunch of data points.
这些只是些简单的示例数据点
These are just toy data points,
就像苹果公司可能用来训练Siri的一样
like what Apple probably trained Siri on.
它们都是x-y值对 其中x代表一个人骑单车的距离
They’re all x-y value pairs where x represents the distance a person bikes,
而y代表他们消耗的热量
and y represents the amount of calories they lost.
我们可以在图表上这样绘制它们
We can just plot them on a graph like so.
我们希望能够在已知骑车距离时预测
We want to be able to predict the calories lost for a new person
另一个人所消耗的热量
given their biking distance.
我们该怎么做呢?
How should we do this?
我们可以尝试画出一条经过所有数据点的线
Well we could try to draw a line that fits through all the data points
但是似乎点相隔太远
but it seems like our points are too spaced out for
直线没法经过所有的点
a straight line to pass through all of them.
因此 我们可以绘制一条最佳拟合线
So we can settle for drawing the line of best fit,
这条线尽可能多地经过数据点
a line that goes through as many data points as possible.
代数告诉我们 直线的方程是y=mx+b的形式
Algebra tells us that the equation for a straight line is of the form y = mx + b.
其中m表示线的斜率或陡度
Where m represents the slope or steepness of the line
b表示y轴截距点
and b represents its y-axis intercept point.
我们想找到b和m的最优值
We want to find the optimal values for b and m such that
使得直线拟合尽可能多的点 这样给定任何新的x值
the line fits as many points as possible, so given any new x value,
可以将其插入到方程式中 并输出最有可能的y值
we can plug it into our equation and it’ll output the most likely y value.
误差值是一种接近程度 我们可以这样定义
Our error metric can be a measure of closeness, which we can define like this.
从一个随机的b和m值开始绘制该图像
So let’s start off with a random b and m value and plot this line.
对于每个数据点
For every single data point we have,
计算其相关的Y值
let’s calculate its associated y value.
然后 减去实际的Y值 来测量两者的间距
Then we’ll subtract the actual y value from it to measure the distance between the two.
为了使下一步更容易 再把这个误差值平方
We’ll want to square this error to make our next steps easier.
等我们对所有这些值求和 就得到一个值
Once we sum all these values we get a single value
这个值代表了刚才画的线的误差值
that represents our error given that line we just drew.
现在 若反复进行此过程 比如说666次
Now if we did this process repeatedly, say 666 times,
对于一堆不同的随机绘制的线
for a bunch of different randomly drawn lines,
我们可以创建一个3D图
we could create a 3D graph
显示每个关联的b和m值的误差值
that shows the error value for every associated b and m value.
注意这个图表中有一个山谷
Notice how there is a valley in this graph.
在这个山谷的底部 误差最小
At the bottom of this valley, the error is at its smallest.
因此 相关的B和M值将是最佳拟合的线
And so the associated b and m values would be the line of best fit,
在这里所有数据点和线之间的距离将是最小的!
where the distance between all our data points and our line would be the smallest!
但要怎么找到呢?
But how do we find it?
需要尝试一堆不同的线来创建这个3D图
Well we’ll need to try out a bunch of different lines to create this 3D graph.
但是 相对于一遍遍随机而无目的地绘制
But rather than just randomly drawing lines over and over again with no signal,
我们可以以更有效的方式做到这一点
what if we could do it in a more efficient way,
比如让绘制的每一条连续的线更靠近
such that each successive line we draw brings us closer and closer
这个山谷的底部
to the bottom of this valley.
为此需要一个方向 一个下降到谷底的方法
We need a direction, a way to descend this valley.
比如对于给定函数 可在给定点找到斜率
What if for a given function, we could find the slope of it at a given point.
而斜率将指向特定方向 朝向图的最小值
Then that slope would point in a certain direction, towards the minima of the graph.
当一遍遍重绘路线时
And when we re-draw our line over and over again
我们可以用斜率作为指南针
we could do so using the slope as our compass,
作为如何最好地进行重画的指南
as our guide on how best to redraw as we
(走过死亡之荫的山谷)
(walk through the valley of the shadow of death)
直到最低点斜率为0
towards the minima until our slope approaches 0.
在微积分中把这个斜率称为函数的导数
In calculus, we call this slope the derivative of a function.
由于正在更新b和m这2个值
Since we are updating 2 values, b and m,
要计算关于这两个变量的导数 即偏导数
we want to calculate the derivative with respect to both of them, the partial derivative.
一个变量的偏导数
The partial derivative with respect to a variable
意味着计算该变量的导数
means that we calculate the derivative of that variable
而保持其它变量恒定
while holding the others constant.
所以我们先计算b的偏导数
So we’ll compute the partial derivative with respect to b.
然后是m的偏导数
Then the partial derivative with respect to m.
这需要用到幂函数求导公式
To do this we use the power rule.
在系数前乘以指数 并从指数中减去1
We multiply the exponent by the coefficient and subtract 1 from the exponent.
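Applying that rule to the squared error from earlier gives the two partial derivatives. As a sketch, writing the error as a mean E over N points (dividing by N just rescales the gradient):

```latex
E(b, m) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr)^2

\frac{\partial E}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr)

\frac{\partial E}{\partial m} = -\frac{2}{N}\sum_{i=1}^{N} x_i\bigl(y_i - (m x_i + b)\bigr)
```

The exponent 2 comes down as a coefficient and drops to 1, and the chain rule contributes the extra factor of x_i in the derivative with respect to m.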
有了这两个值 就可以在函数中更新参数
Once we have these 2 values we can update both of these parameters
从现有的b和m值中减去它们
from our function by subtracting them from our existing b and m values.
然后按照预先定义的迭代次数继续这样做
And we just keep doing that for a set number of iterations that we pre-define.
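Putting the two partial derivatives and the repeated update together, a minimal gradient-descent sketch might look like this (the variable names, toy data, and learning rate are my own choices, not from the video):

```python
def step(b, m, points, learning_rate):
    """One gradient-descent update for the line y = m*x + b."""
    n = len(points)
    # Partial derivatives of the mean squared error.
    grad_b = sum(-2 * (y - (m * x + b)) for x, y in points) / n
    grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
    # Subtract the gradients: move downhill in the error valley.
    return b - learning_rate * grad_b, m - learning_rate * grad_m

points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data lying on y = 2x
b, m = 0.0, 0.0                                 # initial line
for _ in range(1000):                           # pre-defined number of iterations
    b, m = step(b, m, points, 0.05)
print(round(m, 2), round(b, 2))  # should approach m = 2, b = 0
```

Each call to step redraws the line slightly closer to the bottom of the valley, so after enough iterations the parameters stop changing appreciably.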
刚刚执行的这种优化技术称为梯度下降
So this optimization technique that we just performed is called gradient descent
它是机器学习中最受欢迎的一种算法
and it’s the most popular one in machine learning.
你需要从这个视频中记住什么? 3点
So what do you need to remember from this video? 3 points.
导数是函数给定点的斜率
The derivative is the slope of a function at a given point,
偏导数是相对于该函数中一个变量的斜率
the partial derivative is the slope with respect to one variable in that function.
我们可以使用它们来组成一个梯度
We can use them to compose a gradient,
其负方向指向函数的局部最小值
whose negative points in the direction of a local minimum of the function.
梯度下降是机器学习中很受欢迎的优化策略
And gradient descent is a very popular optimization strategy in machine learning
它使用梯度来实现优化
that uses the gradient to do this.
【介绍编程挑战赛的获胜者】
[Announcing the winner of the coding challenge]
现在到你了 我为你准备了一个编码挑战
Now it’s your turn. I’ve got a coding challenge for you.
在提供的不同数据集上自己实现梯度下降
Implement gradient descent on your own on a different dataset that I’ll provide.
请点击GitHub的链接获取详情
Check out the GitHub link for details,
获胜者会在一周内公布
the winner will be announced in a week.
订阅就能获得更多节目视频哦
Please subscribe for more programming videos
现在要开始记忆幂函数求导公式了
and for now I’ve gotta go memorize the power rule
感谢收看:)
so thanks for watching 🙂

Translation Info
Video Summary

Welcome to The Math of Intelligence! In this course, we’ll learn the most fundamental math concepts in machine learning. In this first lesson, we introduce a very popular optimization technique called gradient descent to help us predict how many calories a cyclist burns given their biking distance. We’ll also follow the story of two data scientists trying to find the Higgs boson through anomaly detection. Code for the video: https://github.com/llSourcell/Intro_to_the_Math_of_intelligence

Transcriber

土土

Translator

土土

Reviewer

审核员_MZ

Video Source

https://www.youtube.com/watch?v=xRJCOz3AfYY
