
Regression: How It Works - Practical Machine Learning Tutorial with Python p.7

What is going on, everybody, and welcome to part seven of our machine learning with Python tutorial series.
In this video, we are going to begin breaking down linear regression
and then build it back up in pure Python code, from scratch, ourselves.
Before we get started, we have to break down linear regression theoretically,
so that we actually know what to program.
By the time we're done here, you'll start to understand, one, how linear regression can be threaded,
but also, most importantly, why it works with what's known as continuous data.
This isn't just happenstance; it is fundamental to how linear regression actually works.
So let's go ahead and cover a couple of examples.
Let's say you have a dataset like this, with expert visualization by Sentdex.
Okay, so when you look at these data points, do you see any sort of correlation?
Well, you probably see a line that goes something like that.
And let's consider another dataset.
This dataset will look something like this. Not the most professional dots, but you get the point, right?
So you can see how you might draw a line through this.
Right? No problem, and in theory this should be a straight line.
But then what about a dataset that looks like this,
ignoring this ugly plot over here?
Right? Does this dataset have a best-fit line? Yes. Does it have a correlation?
Not really. I mean, it probably has some sort of correlation; if anything, it looks like it has a slight negative correlation.
But for the most part, you could draw a best-fit line, but would it be a good fit line?
The answer, of course, is no.
Okay, so the question we can ask right away is: is there a relationship between, in our case, x and y?
And here it doesn't look like there is a valid relationship between them.
So in the case where you don't have a relationship between x and y,
doing something like linear regression is not going to be very beneficial.
And yes, x could in theory be continuous, and y could be continuous data,
but we can see there's really no relationship here.
Okay, so this data just would not work with linear regression.
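One way to put a number on "is there a relationship between x and y" is the Pearson correlation coefficient. The video only eyeballs the plots, so this is a swapped-in check rather than anything shown on screen, and the function name and both datasets below are made up for illustration. A minimal sketch:

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Values near +1 or -1 suggest a strong linear relationship;
    values near 0 suggest a straight line will not describe the data well.
    """
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


# Hypothetical data: the first set lies exactly on a line,
# the second is scattered, like the last plot described above.
linear_xs = [1, 2, 3, 4, 5, 6]
linear_ys = [2, 4, 6, 8, 10, 12]
scattered_xs = [1, 2, 3, 4, 5, 6]
scattered_ys = [5, 1, 6, 2, 6, 1]

print(pearson_r(linear_xs, linear_ys))
print(pearson_r(scattered_xs, scattered_ys))
```

The scattered set comes out weakly negative, which matches the "slight negative correlation" impression: a best-fit line exists, but it would not be a good fit.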
So let's say, though, that you've got a graph
and you've got some data.
And you decide, yeah, you sure can do linear regression, just from eyeballing it or something.
Okay, so we have this line, this pink line. This is our best-fit line,
and we'll assume it's a straight line.
What is the definition? How do you define that line?
Well, I will take a journey back to
middle school, where we remember that y equals mx plus b.
Right? And obviously, where you've got two values
next to each other, they're being multiplied.
So we know the equation of a line: y equals mx plus b.
So at any point along x, let's say you've got some x,
you would just plug that x into here.
Then you need the values of m and b,
and then you will just get the answer for whatever y is.
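As a tiny illustration of plugging x into y = mx + b, here is a sketch in Python. The function name and the values of m and b are hypothetical, picked only to show the arithmetic; they are not from the video.

```python
def line_y(m, b, x):
    """Return the y value on the line y = m*x + b for a given x."""
    return m * x + b


# Hypothetical slope and intercept, chosen only for illustration.
m = 2
b = 1

for x in [0, 1, 2, 3]:
    print(x, line_y(m, b, x))
```

Once m and b are known, every x gives exactly one y, which is why the rest of the work is about finding m and b.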
So with the equation y equals mx plus b,
we know that we'll be able to figure out whatever x is, since x is the input we plug in.
But we are left trying to figure out what m is and what b is.
So first let's talk about what m is. We know m is the slope; that's the slope of the line.
And we know that b is the y-intercept.
Okay, so first let's consider the slope.
Here we are going to address just m, which is our slope.
Keep in mind that the equation for m here is specifically for a best-fit line.
The way that you figure that out is m equals,
and this is going to be the mean of x times the mean of y.
When you draw a straight line over a value, that's just the mean; these bars are supposed to be straight lines.
So it's the mean of all the xs times the mean of all the ys, minus the mean of all the xs times the ys.
Okay, so it's important to understand the difference between that first multiplication and the second multiplication:
the first multiplies the two means together, while the second multiplies each x by its y and then takes the mean of those products.
Then it's all of that over the mean of your xs, squared, minus the mean of your x squareds.
Right? So the denominator is the mean of all your xs to the power of 2, minus the mean of each x raised to the power of 2.
And that's over all of your xs.
Written out: m = (mean(x) * mean(y) - mean(x*y)) / (mean(x)^2 - mean(x^2)).
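The slope formula just described can be sketched in plain Python. The function name and the sample data are assumptions made for illustration; the actual implementation is left to the next tutorial.

```python
from statistics import mean


def best_fit_slope(xs, ys):
    """Slope m of the best-fit line through the points (xs[i], ys[i]).

    m = (mean(x) * mean(y) - mean(x*y)) / (mean(x)**2 - mean(x**2))

    Note: the denominator is zero when every x is identical,
    in which case no slope can be computed.
    """
    mean_x = mean(xs)
    mean_y = mean(ys)
    # Second multiplication: multiply each x by its y, then take the mean.
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    # Mean of the squared xs, as opposed to the squared mean of the xs.
    mean_x_squared = mean([x * x for x in xs])
    return (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x_squared)


# Hypothetical points lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
print(best_fit_slope(xs, ys))
```

Keeping `mean_x * mean_y` and `mean_xy` as separate variables makes the difference between the two multiplications explicit.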
Now we are going to talk about the y-intercept, or the best-fit y-intercept.
So that's just b, and that is your y-intercept. I'm going to write y int.
And this one is actually a much easier equation.
It is simply b equals
the mean of the ys minus m times the mean of the xs.
Right, so that m is the slope we just calculated over there.
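The intercept equation translates almost word for word. This is a sketch with a hypothetical function name; it assumes the slope m has already been computed with the formula above and is passed in.

```python
from statistics import mean


def best_fit_intercept(xs, ys, m):
    """Y-intercept b of the best-fit line, given its slope m.

    b = mean(y) - m * mean(x)
    """
    return mean(ys) - m * mean(xs)


# Hypothetical points lying exactly on y = 2x + 1, so the slope m is 2.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
print(best_fit_intercept(xs, ys, 2))
```

Because b depends on m, the slope always has to be calculated first.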
So that's how you can calculate m and b.
So again, the line is simply y equals mx plus b.
We have our m here, and we've got our b here.
So now you're ready to go with the calculation of the best-fit line, given any xs and ys.
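Putting the two equations together, a best-fit line for any xs and ys might be computed as in the sketch below. The function name, the sample points, and the x value used for the prediction are all assumptions for illustration, not the code from the next tutorial.

```python
from statistics import mean


def best_fit_slope_and_intercept(xs, ys):
    """Return (m, b) for the best-fit line y = m*x + b."""
    mean_x, mean_y = mean(xs), mean(ys)
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    mean_x_squared = mean([x * x for x in xs])
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x_squared)
    b = mean_y - m * mean_x
    return m, b


# Hypothetical sample points that roughly follow an upward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [5, 4, 6, 5, 6, 7]

m, b = best_fit_slope_and_intercept(xs, ys)

# The y value of the best-fit line at each x in the data.
regression_line = [m * x + b for x in xs]

# Plug a new x (not in the data) into y = mx + b.
predict_x = 8
predict_y = m * predict_x + b

print(m, b)
print(regression_line)
print(predict_y)
```

Returning m and b together keeps the dependency between them in one place, and the same two values are then reused both for drawing the line and for predicting y at a new x.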
Now again, this is simply for
two-dimensional data. As you increase the dimensions in your vector space, this can get much more complex,
whereas this is a very simple regression example.
So anyway, that's the math that goes behind it. It basically boils down to this:
y equals mx plus b is the major algorithm that you are using, and to find the values of m and b
you're using this algorithm and this algorithm.
And that can be very simply translated to Python code, so that's what we're going to start doing.
In the next tutorial, we'll be converting these into actual Python code
and then applying it to some actual data.
So if you have any questions, comments, concerns, whatever, leave them below. Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.