《机器学习Python实践》#10 R平方理论 – 译学馆

R Squared Theory - Practical Machine Learning Tutorial with Python p.10

大家好 欢迎来到机器学习系列教程第10讲
Hello everybody and welcome to part 10 of our machine learning tutorial series.
这一节我们来讲讲
In this part what we’re going to be talking about
之前讲过了的线性回归
is we’ve been talking about linear regression
还讲了怎么用 Python 来算出最优拟合线
and we’ve got to the point where we could calculate the best fit line in our Python code.
那问题来了 最优拟合线到底
But now the question is how good of a fit
拟合的好不好?我们怎么来确定精确度?
is our best fit line. How do we determine the accuracy, right?
那我们用来确定精确度的方法就是 R平方
So the way that we determine accuracy is through R squared or
或者叫决定系数
the coefficient of determination.
决定系数是用我们所熟知的平方误差来计算的
And the coefficient of determination is calculated using what’s known as squared error.
所以我们还是先来了解一下什么是平方误差吧
So first we need to figure out what the heck squared error is.
然后才能计算 R平方 或者说决定系数
And then we can calculate R squared or the coefficient of determination.
先来举个例子 这里有两张图
So to exemplify this consider you know you’ve got two graphs.
图上有一些点 然后我们想要
And then you’ve got some plots on those graphs. And then what you want to do is
画出最优拟合线 是吧 就像这样
draw the best fit line. Okay. So something like this
然后……还有比如说是这样 可以吧?
and then…I have no idea something like this, right? And
如果我问你哪条线拟合得更好
if I asked you which one is a better fit?
你肯定会说是右边这条
You would say the one on the right.
那我要再问为什么是这条?你可能就得想一会了
And then if I asked why that is the better fit, you might think for a moment, but
你也许会说 这些点更接近这条线
you would probably come up with…well, the dots are closer to the line
就是右边的点更接近线 总之比左边那条更接近
on the one on the right…closer than they are on the one on the left, anyway.
当然现在我们的坐标轴上还没有写单位
Now of course we don’t have any ticks on our axis here.
也许左边的图的比例太大了
And so I might say that the one on the left were actually zoomed in really far.
那有可能就是左边的点更接近线了 也说不定
And they’re actually much closer so you don’t really know.
那到底如何确定一条线是不是拟合得好呢?
But really the question is: how good of a fit is it?
拟合得多好才是最优拟合线?
How good of a fit is the best fit line?
你的模型对数据集拟合得到底有多好?
How good of a fit is your model to your data set?
这当然和你的数据集关系很大
So it’s very relative to your data set.
不过我们一会再讲这个
And you’ll see more why in just a moment.
衡量方式就是距离 那么……我们如何计算距离?
So we know it’s the distance. So what…how do we actually calculate this.
我们可以用平方误差 图在这里
Well, we use squared error so we’ve got a graph here.
我们有一些数据点 很漂亮的数据点 然后我们算出了最优拟合线
And then we got some data points, some beautiful data points. And we have our best fit line.
那计算平方误差的方法就是
And the way that we calculate squared error is we say the error
误差其实就是数据点和最优拟合线的距离
is the distance between the point and the line…and the best fit line.
但我们不仅要计算误差
And then what we say is it’s not just error.
我们还要把误差的值平方 所以我们想要的平方误差就是
We want to square that value. So what we want, squared error, is, you know,
e² 好的 你也许会问 为啥要平方呢?
e squared. Okay. So you might ask why are we squaring it, right? Well
在这里距离……一种情况下距离会是正的
In this case the distance right…In one case, the distance might be positive.
另外一种情况距离可能是负的
And in another case the distance might be negative.
所以我们用平方的一个原因就是把值都变成正的
So one reason why we square it is so that we’re only dealing with positive values.
那你可能会问了 为啥不用绝对值
You might then ask why is it e squared and not like absolute value of e.
那就是用平方的另一个原因 也许会有这么一个点
Well, we want to square it because what if you had a point that was like
跑到了这个地方 这就算是个异常值
way out here. That would be an outlier and
线性数据一般会排除掉异常值
your linear data set should not have an outlier
因为线性回归当然是要线性数据
because we only want to do linear regression on linear data.
这样才有意义 那我们这里用平方误差
Okay. That only makes sense. So we square the error because
是因为……
we want to…
我们要惩罚异常值 你也许会问了 那为啥不用4次方
We want to penalize for outliers. So then you might ask, well, why not use a power of 4,
或者6次方18次方之类的
or 6 or 18 for that matter.
你当然可以用这些来惩罚异常值
You could use these other ones if you want to penalize for outliers.
为了惩罚有力你甚至可以用更大的次方
You can use a bigger value there if you want to penalize even heavier for outliers.
只是恰巧平方成为了使用标准而已
It just so happens the standard is going to be squared error.
如果你不用平方
And if you’re not using squared error,
可能你想发表论文 或者
And maybe you’re publishing something publicly either in a paper or maybe you’ve got
你有一些数据的模型 或者 Python 模型之类的
some data, some sort of module in Python or something you’d want to
那你就得告诉别人 你并没有按照常规的方法做
alert people to the fact that you are not doing it the way that
好的 这就是平方误差
most people do it. Okay. So that is squared error.
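The points made above about squaring can be checked with a small Python sketch. The data values and the variable names here are hypothetical, chosen only so that one point is an obvious outlier; the function name `squared_error` is an assumption for illustration, not something shown in the video so far.

```python
def squared_error(ys_orig, ys_line):
    """Sum of squared vertical distances between the data points and a line."""
    return sum((y_line - y) ** 2 for y, y_line in zip(ys_orig, ys_line))

# Hypothetical data: the line passes through 5, 6, 7, 8;
# the last data point (18) sits far from the line, so it is an outlier.
ys_orig = [5, 7, 6, 18]
ys_line = [5, 6, 7, 8]

# Signed errors: some are positive and some are negative.
errors = [y_line - y for y, y_line in zip(ys_orig, ys_line)]

signed_total = sum(errors)                       # the +1 and -1 cancel out
absolute_total = sum(abs(e) for e in errors)     # every error counts, linearly
squared_total = squared_error(ys_orig, ys_line)  # large errors count far more

print(errors)          # [0, -1, 1, -10]
print(signed_total)    # -10
print(absolute_total)  # 12
print(squared_total)   # 102
```

With absolute values the outlier contributes 10 of the 12 total; with squares it contributes 100 of the 102, which is the heavier penalty on outliers described above.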
接下来我们来计算决定系数 或者说 R平方
Now how do we calculate the coefficient of determination or R squared.
R平方要计算这些东西
So R squared is calculating the following so R squared equals.
R平方 就等于
And it is one minus
1减去平方误差 一般把它写作 SE
And it’s one minus the squared error and generally you’re going to see squared error denoted as SE.
这里就是 y帽线的方差 y帽线是什么?还记得么?
So it’s the squared error of the y hat line. What the heck is the y hat line? Remember?
y帽线 最优拟合 回归线 都是一回事
y hat, best fit, regression line, all the same thing.
除以
Divided by the squared error
y均值 的平方误差
of the mean of the y.
就是数据集 ys 的均值
That’s the mean of the ys of your dataset.
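Written out, the calculation just described looks like the following. SE, the y hat line, and the mean of the ys come from the explanation above; the index i and the count n are notation assumed here for illustration.

```latex
R^2 = 1 - \frac{SE(\hat{y})}{SE(\bar{y})},
\qquad
SE(\hat{y}) = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2,
\qquad
SE(\bar{y}) = \sum_{i=1}^{n} (\bar{y} - y_i)^2
```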
看起来就像是这样 这就是 ys 的均值
So what might that actually look like. The mean of your ys might be that.
这就是一条简单的直线 我们要做的是
So it’s just a simple straight line. And what we’re trying to do
比较这条直线和最优拟合线的准确度
is compare the accuracy of that line to the accuracy of like the best fit line.
讲道理 最优拟合线基本上会比 ys 均值线
And honestly the best fit line is almost certainly going to be better
的准确度高 但我们希望尽量高 是吧
than the mean of the y’s. But we want it to be like way better. Okay.
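The comparison between the best fit line and the mean-of-the-ys line can be sketched in Python as follows. This is a minimal sketch rather than the code from the series: the function names, the data, and the line values are assumptions made for illustration (the line values are the least-squares fit for x = 0, 1, 2, 3).

```python
from statistics import mean

def squared_error(ys_orig, ys_line):
    """Sum of squared vertical distances between the data points and a line."""
    return sum((y_line - y) ** 2 for y, y_line in zip(ys_orig, ys_line))

def coefficient_of_determination(ys_orig, ys_line):
    """R squared: 1 - SE(best fit line) / SE(mean-of-ys line)."""
    # The mean line is flat: the same value (the mean of the ys) at every point.
    y_mean_line = [mean(ys_orig)] * len(ys_orig)
    se_regression = squared_error(ys_orig, ys_line)
    se_y_mean = squared_error(ys_orig, y_mean_line)
    return 1 - se_regression / se_y_mean

# Hypothetical data and the corresponding best fit line values.
ys_orig = [1, 3, 2, 4]
ys_line = [1.3, 2.1, 2.9, 3.7]

r_squared = coefficient_of_determination(ys_orig, ys_line)
print(r_squared)  # approximately 0.64
```

Here the squared error of the line is about 1.8 and the squared error of the mean line is 5, so the line is better than the mean, but not dramatically so.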
所以再来看看 R平方 和对 R平方 的计算
So looking at R squared and the calculation of R squared.
怎么算是一个好的结果呢?
What’s like a…what’s a good value, right?
我们怎么判断结果是好是坏呢?
What do we think might be a good value versus what do we think might be a bad value.
比如我们有一个值 R平方
So let’s consider a value like…Let’s say R squared
等于0.8
equals 0.8.
我们怎么算出0.8的?我们知道
How would we arrive at 0.8? Well we would know that
R平方 等于0.8 那么
in order for R squared to be 0.8. It would have to be
y帽线的平方误差 除以
the squared error of the y hat line divided by
ys均值的平方误差
the squared error of the mean of the ys would have to be…
通过这个等式计算 就等于0.2
like this equation here would have to be equal to 0.2 like…
这样我们才能得到最后的0.8
That’s the only way that 1 minus something could come out to 0.8.
那这个例子中我们如何得到
So how…what would be an example of this equation here
这个0.2呢?我们需要算出这个平方误差
being 0.2. Well we would find…that we would need the squared error. Let’s say of y
比如这个y帽线的方差可以是2
maybe the squared error of the y hat line is
然后ys均值的平方误差就是10 等于10
2 and the squared error of the mean of the ys is 10. Okay equals.
那如果是这样
So if that was the case
那我们就可以明显看出来 y帽线的平方误差远小于
We were saying you know the squared error of the y hat line is actually significantly lower
y均值的平方误差 这是好事还是坏事?
than the squared error of the mean of the y. Is that a good thing or a bad thing.
当然是极好的 我们还想让它更低点呢 不过这样也不错啦
Well that’s a pretty good thing. We would prefer it to be even lower than that. But you know that’s pretty good.
这么看来数据就挺线性的了 那么
So that means this data is probably pretty linear, right? So it’s…
R平方值等于0.8就挺不错
So an R squared of 0.8 is pretty good.
那如果R平方值等于0.3呢?
What if your R squared was like 0.3 for example.
假设 R平方值等于0.3
So if R squared for example was 0.3.
那这个等式会是怎么样的呢
How would we arrive maybe at that equation?
我们得让这个平方误差
Well we would need the squared error
除以……就是y帽线方差除以ys均值的平方误差
divided by…the squared error of the y hat, divided by the squared error of the mean of the ys.
我们需要这里等于0.7 对吧?
We would need that to be 0.7, right?
可以用 7÷10 得到
And we could get that by you know 7 over 10.
然后 y帽线 的平方误差就
And now the squared error of the y hat is
就非常接近 ys均值 的平方误差了
a lot closer to the squared error of the mean of the ys.
那很明显这个值不是很好 我们想让这个 R平方 的值
So obviously this is a worse result. So we want the R squared value
高一些
to be high.
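The two cases just discussed, using the example squared errors from above (2 and 10, then 7 and 10), work out as:

```latex
R^2 = 1 - \frac{SE(\hat{y})}{SE(\bar{y})} = 1 - \frac{2}{10} = 1 - 0.2 = 0.8
\qquad\text{and}\qquad
R^2 = 1 - \frac{7}{10} = 1 - 0.7 = 0.3
```

The closer the squared error of the y hat line gets to the squared error of the mean line, the closer the ratio gets to 1 and the closer R squared gets to 0.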
多高由你自己来决定
How high is kind of determined by you.
那我们这个模型的准确度
But the accuracy in this case of our model
就是……比如说是0.8
is…Let’s say we call it 0.8.
这就是 R平方 的值 并不是一个百分比的准确度
That is the R squared value. So it’s not a percent accuracy.
就是 R平方 决定系数
It is the R squared. It’s the coefficient of determination.
就是这个值 现在我们知道了 R平方
That is the value. So now that we know what the calculation
计算了什么 我们也知道了平方误差是什么
for R squared is. And we know what squared error is.
我们还知道了怎么算 y帽线 这里已经算过了
And we know how to calculate the y hat line. We’ve already done that.
我们知道怎么去计算 ys的均值 我们并不是一定要算这个
We know how to calculate the mean of the ys. We haven’t done that necessarily.
不过其实我们已经算过了 因为这是最优拟合线计算的一部分
Actually we have done that, because that was part of our best fit line calculation. So we’ve done that.
我们已经充分了解了这些 接下来我们可以用Python来实现了
We know how to do everything here. So we can definitely calculate this in Python.
这个放在下一节来讲
So that is what we are going to be doing in the next video.
如果你们有任何问题或者评论 就在下方留言吧
If you have questions, comments, or concerns up to this point, please feel free to leave them below.
感谢各位的收看支持和订阅 我们下次见
Otherwise as always thanks for watching. Thanks for all the support and subscriptions and until next time.

译制信息
视频概述

这一讲介绍了如何通过R平方来计算拟合线的准确度。同时讲解了为什么使用R平方来计算准确度。

听录译者

[B]刀子

翻译译者

[B]刀子

审核员

审核团1024

视频来源

https://www.youtube.com/watch?v=-fgYp74SNtk