
Practical Machine Learning with Python #11: Programming R Squared – 译学馆 (Yixueguan)


Programming R Squared - Practical Machine Learning Tutorial with Python p.11

Hello everybody, and welcome to part 11 of our machine learning tutorial series.
In this video, we’re going to build on the last one, where we learned how to calculate the R squared value, also called the coefficient of determination, and this time we’ll implement it in code. This value measures how good a fit our best-fit line is.
Okay. So that is the equation. Now, how do we actually go about calculating it?
So a big part of this equation is actually squared error.
So we’re going to create a new function that calculates squared error.
We’ll just add it down here, and we’ll call it squared_error.
The squared error is based on the difference between the original ys, which we’ll call ys_orig, and ys_line. Recall that the error is the distance between whichever line is in question and the points, measured along y, and then we square that error. So we need the actual points and the corresponding spots on the line.
So what the function needs to do is return the sum of the squared differences: sum((ys_line - ys_orig) ** 2). Each difference has to be squared before summing; otherwise errors above and below the line would cancel out. That gives you the squared error for the entire line.
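A minimal sketch of squared_error in Python. It assumes NumPy is available and converts both inputs to float arrays with np.asarray; that conversion is an addition for robustness, since subtracting two plain Python lists raises a TypeError.

```python
import numpy as np


def squared_error(ys_orig, ys_line):
    """Sum of squared vertical (y) distances between the points and a line."""
    # Convert to float arrays so element-wise subtraction also works for plain lists.
    ys_orig = np.asarray(ys_orig, dtype=float)
    ys_line = np.asarray(ys_line, dtype=float)
    # Square each point's error before summing so errors above and below the line don't cancel.
    return float(np.sum((ys_line - ys_orig) ** 2))


# Errors of +1, -1 and +2 give 1 + 1 + 4 = 6.
print(squared_error([1, 2, 3], [2, 1, 5]))  # 6.0
```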
This equation is relatively simple, but I like giving squared error its own function because it’s a little easier to call upon; feel free to do whatever you prefer. You get the original data points from ys_orig. For the line, we already computed m and b earlier, so we just plug each x into mx + b to get the point on the line that corresponds to each original y.
In our case, that’s what we’re already plotting, since we created regression_line from the xs alone, with one value of mx + b for x in xs.
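The slope m and intercept b were computed in earlier parts of the series. To keep this sketch self-contained, it recomputes them with the standard least-squares formulas inside a hypothetical helper, best_fit_slope_and_intercept, on made-up sample data, and then builds regression_line from the xs only.

```python
import numpy as np


def best_fit_slope_and_intercept(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b (hypothetical helper)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    m = ((np.mean(xs) * np.mean(ys)) - np.mean(xs * ys)) / (np.mean(xs) ** 2 - np.mean(xs ** 2))
    b = np.mean(ys) - m * np.mean(xs)
    return m, b


# Hypothetical sample data, for illustration only.
xs = np.array([1, 2, 3, 4, 5, 6], dtype=float)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=float)

m, b = best_fit_slope_and_intercept(xs, ys)
# The regression line is built from the xs alone: one y value per x.
regression_line = [(m * x) + b for x in xs]
print(m, b)
```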
So that’s our squared error.
Now we need to calculate the coefficient of determination, which, again, is just 1 minus the squared error of the y-hat (regression) line divided by the squared error of the mean of the ys.
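Written as a formula, where SE is the summed squared error against the original points y_i, ŷ is the regression line, and ȳ is the mean of the ys:

```latex
SE(\text{line}) = \sum_{i} \left(\text{line}_i - y_i\right)^2,
\qquad
R^2 = 1 - \frac{SE(\hat{y})}{SE(\bar{y})}
```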
So let’s define coefficient_of_determination(ys_orig, ys_line). Note that there should be a comma between the two parameters.
Inside it, we start with y_mean_line. We need square brackets here, because this is a list comprehension: y_mean_line = [mean(ys_orig) for y in ys_orig].
That builds our y_mean_line: a list with one entry for every y in the original data, where every entry is the same value, the mean of the original ys. In effect, it is a horizontal line at the mean.
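A quick check of what that list comprehension produces, using NumPy’s np.mean in place of whichever mean function the tutorial imports (an assumption) and hypothetical data:

```python
import numpy as np

# Hypothetical original data.
ys_orig = np.array([5, 4, 6, 5, 6, 7], dtype=float)

# One entry per original y; every entry is the same value, the mean of ys_orig.
# Together they describe a horizontal line at y = mean(ys_orig).
y_mean_line = [np.mean(ys_orig) for y in ys_orig]
print(len(y_mean_line), y_mean_line[0])
```

The loop variable y is never used; it only makes the list the same length as ys_orig, so the mean line can be compared point by point with the original data.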
The squared error of the regression line is squared_error_of_regression_line = squared_error(ys_orig, ys_line). Rather than typing the next line out, I’ll copy and paste that one and edit it: squared_error_y_mean = squared_error(ys_orig, y_mean_line), with y_mean_line in place of ys_line.
Finally, we return 1 - (squared_error_of_regression_line / squared_error_y_mean), right? That is, 1 minus the squared error of the regression line divided by the squared error of the mean line.
Great. So there’s our coefficient of determination.
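Putting the pieces together, a minimal sketch of the full function, assuming NumPy, with the squared_error helper repeated so the block runs on its own:

```python
import numpy as np


def squared_error(ys_orig, ys_line):
    """Sum of squared y distances between the original points and a line."""
    ys_orig = np.asarray(ys_orig, dtype=float)
    ys_line = np.asarray(ys_line, dtype=float)
    return float(np.sum((ys_line - ys_orig) ** 2))


def coefficient_of_determination(ys_orig, ys_line):
    """R squared: 1 - SE(regression line) / SE(mean line)."""
    # Horizontal "line" at the mean of the original ys, one value per point.
    y_mean_line = [np.mean(ys_orig) for y in ys_orig]
    squared_error_of_regression_line = squared_error(ys_orig, ys_line)
    squared_error_y_mean = squared_error(ys_orig, y_mean_line)
    # Raises ZeroDivisionError if all ys are identical (the mean line then has zero error).
    return 1 - (squared_error_of_regression_line / squared_error_y_mean)
```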
Now all we have to do is come down here and write r_squared = coefficient_of_determination(ys, regression_line). We’ll call the result r_squared, since R squared is another name for the coefficient of determination. The first argument holds the original ys, and regression_line is the line we’re curious about. We want to know the R squared value of that regression line, so we print r_squared.
Save and run it.
And the value we get here is 0.58.
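An end-to-end sketch of that call, assuming NumPy and a small hypothetical dataset (not necessarily the data used in the video). For this particular data the result happens to round to 0.584.

```python
import numpy as np


def squared_error(ys_orig, ys_line):
    return float(np.sum((np.asarray(ys_line, dtype=float) - np.asarray(ys_orig, dtype=float)) ** 2))


def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [np.mean(ys_orig) for y in ys_orig]
    return 1 - squared_error(ys_orig, ys_line) / squared_error(ys_orig, y_mean_line)


# Hypothetical sample data (not necessarily the data used in the video).
xs = np.array([1, 2, 3, 4, 5, 6], dtype=float)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=float)

# Least-squares slope and intercept, then the regression line over the xs.
m = ((np.mean(xs) * np.mean(ys)) - np.mean(xs * ys)) / (np.mean(xs) ** 2 - np.mean(xs ** 2))
b = np.mean(ys) - m * np.mean(xs)
regression_line = [(m * x) + b for x in xs]

r_squared = coefficient_of_determination(ys, regression_line)
print(round(r_squared, 3))  # 0.584 for this particular data
```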
Just so you know: if the regression line were only as good as the mean line, the value here would be 0, right? The ratio of the two squared errors would be 1, and 1 minus 1 is 0.
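A small sketch, on hypothetical data, of the reference points: a line no better than the mean line scores 0, a perfect fit scores 1, and (a property not covered in the video) a line that fits worse than the mean line scores below 0.

```python
import numpy as np


def squared_error(ys_orig, ys_line):
    return float(np.sum((np.asarray(ys_line, dtype=float) - np.asarray(ys_orig, dtype=float)) ** 2))


def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [np.mean(ys_orig) for y in ys_orig]
    return 1 - squared_error(ys_orig, ys_line) / squared_error(ys_orig, y_mean_line)


ys = np.array([5, 4, 6, 5, 6, 7], dtype=float)  # hypothetical data

# A "regression line" that is no better than the mean line: 1 - 1 = 0.
r_mean = coefficient_of_determination(ys, [np.mean(ys)] * len(ys))
# A line through every point exactly: 1 - 0 = 1.
r_perfect = coefficient_of_determination(ys, ys)
# A line that fits worse than the mean line gives a negative value.
r_worse = coefficient_of_determination(ys, ys + 2)

print(r_mean, r_perfect)  # 0.0 1.0
print(r_worse < 0)        # True
```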
You can’t simply say that anything above 0.5 (50%) is accurate, but anything above 0 means the regression line is more accurate than the mean line. Beyond that, you have to decide for yourself what coefficient of determination is good enough for what you’re doing.
In this case we get 0.58, which is significantly more accurate than the mean line: to get 0.58, the ratio of the regression line’s squared error to the mean line’s squared error has to be 0.42, or 42 out of 100. So the regression line’s squared error is much smaller, right?
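To see that ratio directly, this sketch prints the ratio of the two squared errors next to R squared. It uses np.polyfit(xs, ys, 1) to get the slope and intercept, swapped in for brevity in place of the hand-written formulas from the series, along with hypothetical data:

```python
import numpy as np


def squared_error(ys_orig, ys_line):
    return float(np.sum((np.asarray(ys_line, dtype=float) - np.asarray(ys_orig, dtype=float)) ** 2))


# Hypothetical sample data.
xs = np.array([1, 2, 3, 4, 5, 6], dtype=float)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=float)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
m, b = np.polyfit(xs, ys, 1)
regression_line = m * xs + b
y_mean_line = [np.mean(ys) for y in ys]

se_regression = squared_error(ys, regression_line)
se_mean = squared_error(ys, y_mean_line)
ratio = se_regression / se_mean

print(round(ratio, 2), round(1 - ratio, 2))  # 0.42 0.58 for this data
```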
So anyway, the coefficient of determination, built on squared error, is not the only measure of how accurate a best-fit line is, but it is one measure of how good a fit it is.
In the next tutorial, we’re going to build some sample data and test everything we have so far: our algorithm and all that. There’s a lot of math involved here. It’s basic math, but it’s a lot of it, so we need some way to figure out whether a result like 0.58 is actually right. Something could be totally wrong, and we wouldn’t really have any way to find out how, other than working it out by hand.
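One way to catch mistakes without redoing everything by hand is to compare against independent calculations. This sketch uses hypothetical data and is not necessarily how the series does its testing. It checks the hand-written slope and intercept against np.polyfit, and checks R squared against the squared Pearson correlation from np.corrcoef; the two should match for a simple straight-line least-squares fit.

```python
import numpy as np


def squared_error(ys_orig, ys_line):
    return float(np.sum((np.asarray(ys_line, dtype=float) - np.asarray(ys_orig, dtype=float)) ** 2))


def coefficient_of_determination(ys_orig, ys_line):
    y_mean_line = [np.mean(ys_orig) for y in ys_orig]
    return 1 - squared_error(ys_orig, ys_line) / squared_error(ys_orig, y_mean_line)


# Hypothetical sample data.
xs = np.array([1, 2, 3, 4, 5, 6], dtype=float)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=float)

# Hand-written least-squares slope and intercept.
m = ((np.mean(xs) * np.mean(ys)) - np.mean(xs * ys)) / (np.mean(xs) ** 2 - np.mean(xs ** 2))
b = np.mean(ys) - m * np.mean(xs)
r_squared = coefficient_of_determination(ys, [(m * x) + b for x in xs])

# Check 1: NumPy's own least-squares fit should find the same slope and intercept.
m_np, b_np = np.polyfit(xs, ys, 1)
# Check 2: for a straight-line least-squares fit with an intercept,
# R squared equals the square of Pearson's correlation between xs and ys.
r = np.corrcoef(xs, ys)[0, 1]

print(np.isclose(m, m_np), np.isclose(b, b_np), np.isclose(r_squared, r ** 2))  # True True True
```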
So the next tutorial is about testing all of our assumptions with sample data and things like that.
If you have questions or comments, leave them below. Otherwise, until next time.



This section explains how to compute the coefficient of determination (R squared) in Python.