
Practical Machine Learning with Python #11: Programming R Squared – 译学馆

Programming R Squared - Practical Machine Learning Tutorial with Python p.11

Hello everybody, and welcome to part 11 of our machine learning tutorial series. This video builds on the last one, where we learned how to calculate the R squared value, or coefficient of determination. That value measures how good a fit our best fit line is.
Okay, so that is the equation. Now, how do we actually go about calculating it?
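For reference, the equation being pointed at here is the one spelled out later in the video, with SE denoting squared error:

    R² = 1 - SE(ŷ) / SE(ȳ)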
A big part of this equation is the squared error, so we're going to create a new function that calculates squared error.
We'll just add it down here: def squared_error(ys_orig, ys_line). The squared error is the difference between the original ys, which we'll call ys_orig, and the ys of the line, ys_line. Recall that the squared error is the y-distance between whatever line is in question and the actual data points; that distance is the error, and then we square it. So we need the actual points and the corresponding spots on the line. All the function has to do is return the sum of (ys_line - ys_orig), squared. That gives you the squared error for the entire line.
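As a minimal sketch of the function being described, assuming ys_orig and ys_line are numpy arrays or array-like so the subtraction is element-wise:

    import numpy as np

    def squared_error(ys_orig, ys_line):
        # sum of the squared y-distances between the line and the original points
        return np.sum((np.asarray(ys_line) - np.asarray(ys_orig)) ** 2)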
This expression is relatively simple, but I want to wrap it in a squared_error function just because I think it's a little easier to call; feel free to do it however you want. You can get the original data points simply as ys_orig, and you can get the line because we already calculated m and b earlier, so we just plug each x into mx + b to get the predicted y for every original point. In our case that's exactly what we plotted, since we built our regression line as (m * x) + b for x in xs. Anyway, so that's our squared error.
Now we need to calculate the coefficient of determination, which, again, is 1 minus the squared error of the regression line (the y-hat line) divided by the squared error of the mean of the ys. So we can calculate that: def coefficient_of_determination(ys_orig, ys_line), with a comma between the two parameters.
Inside it we say y_mean_line = [mean(ys_orig) for y in ys_orig]; we need brackets here to build a list. That makes our y_mean_line: a list with one entry for every y in the original data, and each entry is just the mean of the original ys. So that's our y mean line.
Then squared_error_of_regression_line = squared_error(ys_orig, ys_line); I'm just going to copy and paste that call rather than typing it out. Next, squared_error_y_mean = squared_error(ys_orig, y_mean_line); it's the same call, but with y_mean_line in place of ys_line. Finally, we return 1 - (squared_error_of_regression_line / squared_error_y_mean), right? One minus the squared error of the regression line divided by the squared error of the mean line. Great, so there's our coefficient of determination.
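Continuing the sketch, and assuming the squared_error function from the snippet above plus mean from the statistics module, the function described here might look like:

    from statistics import mean

    def coefficient_of_determination(ys_orig, ys_line):
        # a flat line at the mean of the original ys, one entry per data point
        y_mean_line = [mean(ys_orig) for y in ys_orig]
        squared_error_regr = squared_error(ys_orig, ys_line)
        squared_error_y_mean = squared_error(ys_orig, y_mean_line)
        # R squared: 1 - SE(regression line) / SE(mean line)
        return 1 - (squared_error_regr / squared_error_y_mean)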
So now all we have to do is come down here, somewhere after the regression line is built, and say r_squared, or you could call it coefficient_of_determination, equals coefficient_of_determination(ys, regression_line). The first argument is the original ys; the second is the line we're curious about, and we want to know the R squared value of that regression line. So we print r_squared, save, and run it. The value we get here is 0.58.
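A rough end-to-end sketch of that wiring, using the two functions sketched above; the data here is made up for illustration, so don't expect the exact 0.58 from the video, and the slope/intercept formula is the standard least-squares one used earlier in this series:

    from statistics import mean
    import numpy as np

    # hypothetical sample data, just to make the sketch runnable
    xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
    ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)

    # least-squares slope and intercept, as derived in the earlier parts of the series
    m = ((mean(xs) * mean(ys)) - mean(xs * ys)) / (mean(xs) ** 2 - mean(xs ** 2))
    b = mean(ys) - m * mean(xs)

    # predicted y for every x, i.e. the regression line
    regression_line = [(m * x) + b for x in xs]

    r_squared = coefficient_of_determination(ys, regression_line)
    print(r_squared)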
Just as a note: if the regression line were only as good as the y mean line, the value here would be 0, right? It would basically be 1 minus 1. You can't give a blanket rule like "anything above 50% is accurate", but anything above 0 means the regression line fits better than the mean line.
From there you kind of have to make your own determination of what coefficient of determination you're looking for. In this case we get 0.58, which is obviously significantly better than the mean line, because to get 0.58 the squared-error ratio has to be 0.42, basically 42 out of 100, so the regression line's squared error is much smaller than the mean line's, right?
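To make that arithmetic concrete, plugging the ratio into the formula from earlier:

    R² = 1 - SE(ŷ) / SE(ȳ) = 1 - 0.42 = 0.58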
Anyway, squared error and the coefficient of determination are not the only ways to calculate how accurate the best fit line is, but they are one way of calculating how good a fit the best fit line is. In the next tutorial we're going to build some sample data and test everything we've got so far, our whole algorithm and all that. There's a lot of math involved here; it's basic math, but there's a lot of it, and we need some way to figure out whether everything is right, like that 0.58. Something could be totally wrong and we wouldn't really have any way to figure out where, other than maybe doing it by hand. So in the next tutorial we'll be talking about testing all of our assumptions with sample data and so on. If you have questions or comments, leave them below. Otherwise, until next time.


Translation info
Video summary

This lesson shows how to compute the coefficient of determination (R squared) in Python.

Transcriber

[B]刀子

Translator

[B]刀子

Reviewer

审核团1024

Video source

https://www.youtube.com/watch?v=QUyAFokOmow
