#### How to program the Best Fit Slope - Practical Machine Learning Tutorial with Python p.8

What is going on everybody and welcome to part eight of our machine learning tutorial series.

In this part we’re gonna start working on

creating a simple linear regression algorithm from scratch in Python.

So to start we know that the definition of the line is y = mx + b.

And x we already know, simply because it’s on the x-axis.

But we still need to know m and b.

And m is going to be our best fit slope.

And then b is that y-intercept.

So first we’re going to calculate for m, the slope.

And I’ll pull up the equation again just as a reminder. So that’s

m equals the mean of the x values times the mean of the y values, minus the mean of the xs times the ys.

All of that is divided by the square of the mean of the xs, minus

the mean of the squared xs. Okay.
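Written out, the formula described in words above looks as follows. The overline notation for "mean over all data points" is an assumption made here for readability; the transcript only reads the formula aloud.

$$
m = \frac{\overline{x}\cdot\overline{y} - \overline{xy}}{(\overline{x})^{2} - \overline{x^{2}}}
$$

Here $\overline{xy}$ is the mean of the element-wise products $x_i y_i$, and $\overline{x^{2}}$ is the mean of the squared values $x_i^2$, which is not the same thing as $(\overline{x})^{2}$.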

Easy enough. So now we’re going to be translating that into Python code.

So the first thing that we’re going to do is from statistics we’re gonna import mean.

And then we’re going to import numpy as np.

So you should be able to guess why we’re bringing in mean.

There are quite a few uses of mean there.

Also, just for the record, a regression line is just a straight line.

So for example, let me just pull up an image here, right.

This is some data points and then you’ve got this straight red line. That’s your regression line.

That’s also your best fit line.

And you might even hear people call it the Y hat line if you’re talking to a statistician. Anyway.

Now we’re going to define some simple values.

You can get to the point where you’re using real data.

But I think the easiest thing to do is to just define some simple data.

So we’re just gonna say xs = [1, 2, 3, 4, 5, 6].

And then for the ys we’ll say [5, 4, 6, 5, 6, 7].
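As a minimal sketch, the sample data at this point in the video are still plain Python lists. The two `mean` calls are a sanity check added here, not something done in the video at this step.

```python
from statistics import mean

# Sample data as read out in the video.
xs = [1, 2, 3, 4, 5, 6]
ys = [5, 4, 6, 5, 6, 7]

# Sanity check (an addition): the two means the slope formula will need.
x_mean = mean(xs)
y_mean = mean(ys)
print(x_mean, y_mean)  # 3.5 5.5
```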

You don’t have to do this part but we’re going to visualize this data real quick.

So import matplotlib.pyplot as plt.

And I’m just gonna…we’re not gonna make it pretty or anything. I’m just gonna say plt.plot(xs, ys),

plt.show(). Just so we can see the data we’re working with here. Okay.

So this is the data, but of course plt.plot just drew it as a line. So let’s make it a scatter plot. Okay.

So that’s the data that we’re working with.
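The plotting step can be sketched roughly like this. The helper name `scatter_data` is hypothetical and only there to keep the example reusable; the video types the calls directly.

```python
import matplotlib.pyplot as plt


def scatter_data(xs, ys):
    """Draw each (x, y) pair as its own marker and return the plot artist.

    plt.plot(xs, ys) would instead join the points with line segments.
    """
    return plt.scatter(xs, ys)


if __name__ == "__main__":
    scatter_data([1, 2, 3, 4, 5, 6], [5, 4, 6, 5, 6, 7])
    plt.show()  # opens a window when an interactive backend is available
```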

It’s just a simple scatter plot, but you can probably see already that

this is positively correlated data.

And you can probably picture the best fit line being something like this, you know, sloping up. Anyway,

so that’s our data and I’m gonna move this over. Okay.

Now I’m gonna get rid of showing the graph.

Now, if you recall the

previous examples, our data was not actually a Python list.

It wasn’t a Python array either, because there’s no built-in array type; instead it was a NumPy array.

So we’re gonna change this to np.array.

And you can just wrap the list in parentheses,

like that. So that’s basically converting this to

a NumPy array. And then we’re gonna also set the data type.

So we’re gonna say dtype=np.float64. For a list of whole numbers like this NumPy would otherwise pick an integer type, so this makes the values floats.

And we’re mainly doing this because we’re probably going to be revisiting

linear regression.

And it’ll be in a time where the data type actually matters. For now,

you don’t have to put the data type there.

We’re just being very explicit there with the data type.
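A short sketch of the conversion, assuming NumPy is installed. The `xs_default` variable is added here only for comparison: for a list of Python integers, NumPy infers an integer dtype unless `dtype` is given.

```python
import numpy as np

# Without dtype, NumPy infers an integer type from a list of Python ints.
xs_default = np.array([1, 2, 3, 4, 5, 6])

# With dtype=np.float64 the same values are stored as 64-bit floats.
xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)

print(xs_default.dtype.kind)  # i (a signed integer type)
print(xs.dtype)               # float64

# Arrays multiply element by element, which the slope formula relies on for
# its "mean of the xs times the ys" term. Plain lists cannot do this.
products = xs * ys
print(products)
```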

So now we want to get the

best fit slope. So let’s say we’re going to define a function

that generates the best fit slope.

Okay. So def best_fit_slope.

And we know we pass the xs and ys through.

And then eventually we want to get to the point where we return m.

And then we would just be like m = best_fit_slope(xs, ys),

right?

We’re done! Right?

But anyway, that’s a nice skeleton function there. So

the first order of business is

we do the mean of the xs times the mean of the ys, right?

So mean x times the mean of the ys.

So how do we calculate that?

Well we can start off by saying m

equals…now this is not complete of course, but we’re gonna say

m equals, at first, the mean of the xs

multiplied by the mean of the ys.

OK.

So far so good. What was that next step?

Mean of the xs times the mean of the ys,

and then it’s minus the mean of the xs

times the ys.

So now what we need to do is add that. So

the mean of the xs times the mean of the ys, that’s one of the,

I hate to say variables,

one of the parts. So then what we’re going to do is put another set of parentheses around it, plus a space.

You don’t have to add the space there. That’s definitely not PEP 8.

I’m just making it easier to read, because this is going to be a long one.

Anyway, minus, and that was the mean of the xs

times the ys, right?

Okay. So now going back to our equation here.

We have basically done the entire top of this fraction.
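The top of the fraction on its own can be sketched like this, assuming the NumPy arrays defined earlier in the video. The variable name `numerator` is introduced here for illustration; the video builds the whole expression in one statement.

```python
from statistics import mean
import numpy as np

xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)

# Top of the fraction: mean(x) * mean(y) - mean(x * y).
# xs * ys multiplies the arrays element by element before averaging.
numerator = (mean(xs) * mean(ys)) - mean(xs * ys)
print(numerator)  # -1.25
```

With these numbers the two parts are 3.5 * 5.5 = 19.25 and 123 / 6 = 20.5, so the numerator is negative. That is fine, because the denominator turns out negative too.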

So now we need to do the next layer. So the next thing that we’re gonna do is coming back

to the code. We’re going to add a third parenthesis here.

So I’m gonna add another parenthesis

and a space. You’ve got these parentheses here.

I’m going to add a space

and a slash. So this is our division sign.

And then I’m gonna hit enter.

And the reason why I’m able to hit enter

is because of this open parenthesis here.

Okay. So anyways, that’s why we’re encasing all this in parentheses.
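The line-break trick works because Python keeps reading an expression across lines while a parenthesis is still open. A tiny illustration with made-up numbers (not the tutorial's data):

```python
# Inside open parentheses Python continues the expression on the next line,
# so a long formula can be split right after an operator such as "/".
ratio = ( (10 - 4) /
          (5 - 2) )
print(ratio)  # 2.0

# Without the outer parentheses, ending a line with "/" is a SyntaxError.
```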

Just for the record. So now going back to the bottom.

We need to do the mean of the xs to the power of 2.

Okay. So how might we do that?

Well, first of all,

let’s consider what the power of 2 actually is.

Let’s say it’s the mean of the xs to the power of 2.

That’s the mean of the xs times the mean of the xs. Okay.

So in Python there’s a few different things that we can do. You can do something like…

Let’s do this.

A lot of times you might want to write mean

of the xs, caret, 2. Like that: mean(xs)^2.

But let’s run that really quick and we’ll see that

we get this unsupported operand for

the data type we’re using. Okay. Another option

is the double asterisk: mean(xs)**2.

And it looks like that one is acceptable.

Another option is mean of x’s literally times the mean of the xs.

Okay. Both of those will give you what you’re looking for.
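The reason the caret fails is that in Python `^` is the bitwise XOR operator, which is not defined for floats; exponentiation is `**`. A small sketch, using a plain list rather than the NumPy array so it needs only the standard library (that substitution is an assumption for this example):

```python
from statistics import mean

xs = [1, 2, 3, 4, 5, 6]

try:
    mean(xs) ^ 2  # ^ is bitwise XOR, not "to the power of"
    xor_failed = False
except TypeError as err:
    xor_failed = True
    print("TypeError:", err)

# Both of these square the mean.
squared_pow = mean(xs) ** 2
squared_mul = mean(xs) * mean(xs)
print(squared_pow, squared_mul)  # 12.25 12.25
```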

Now finally it was

the mean of the xs, squared, minus the

mean of the squared xs.

So for us to do that, and that was minus, right?

So this is all one term. So at this point we’re gonna go minus

mean of the…and this is the mean of the squared xs. So this could be…

you could…a lot of times you can get away with this, but I don’t think we can…

because of our data type. Let’s try that. Yeah. So…

Oh, and a typo on the 2. What an amateur.

Anyway, all right. So that’s not gonna work out for us.

So another option will be like that, **2. Okay, that’s acceptable.

And then there’s of course xs*xs, like this. Okay.

So you can do whichever one makes you feel better and sleep better at night.

Regardless, there is our m, and then we can even go and print out m.

Okay, so m is negative 15.26… is what we’re getting here.

So let me see here…what did I…

I’m not content with that. Let’s see…that does close all that off. Slash…

So I’m getting myself confused now. So let’s see…

Let me make sure we close this off right. I don’t think that should be what we’re getting there.

That’s about right, and then we wanted this

mean of…

because this whole thing needs to be divided by this whole thing.

So the mean of the xs, like this.

Let’s run that one more time.

That looks to be more along the lines of what I’m looking for. Okay.

So welcome to the world of PEMDAS, all right?

This is the order of operations, right? Parentheses, exponents, multiplication, division, addition, subtraction.

So if you want to get around

the order of operations you need to use parentheses correctly. My issue

was that

what we needed to do was divide by this

minus this, both of those things together.

But instead division was occurring

before subtraction.

So we were actually doing this divided by this, and then subtracting this from the final answer,

which was why we were getting such a large negative number there.
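To make the precedence issue concrete, here is a sketch of the finished function next to a version without the parentheses around the denominator. `slope_missing_parens` is a hypothetical reconstruction of the mistake, based only on the explanation above; the exact code typed in the video is not shown in the transcript.

```python
from statistics import mean
import numpy as np

xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)


def best_fit_slope(xs, ys):
    # Numerator and denominator each get their own parentheses,
    # so both subtractions happen before the division.
    m = ( ((mean(xs) * mean(ys)) - mean(xs * ys)) /
          ((mean(xs) ** 2) - mean(xs ** 2)) )
    return m


def slope_missing_parens(xs, ys):
    # Division binds tighter than the final subtraction, so only
    # mean(xs) ** 2 ends up as the divisor.
    return ((mean(xs) * mean(ys)) - mean(xs * ys)) / (mean(xs) ** 2) - mean(xs ** 2)


print(best_fit_slope(xs, ys))        # about 0.4286
print(slope_missing_parens(xs, ys))  # about -15.27
```

The correct value is -1.25 / (12.25 - 91/6), which is exactly 3/7. The positive slope matches the upward trend in the scatter plot.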

Okay. So again, this is the slope of a line, so

a slope of negative 15 is…

First of all, negative 15 is kind of weird.

Oh, I’m printing out PEMDAS.

It’s weird to have a negative slope for the line

where you clearly have positively correlated data.

But anyways. So we have

our m.

And now we need just one more thing and that is our b.

So that’s what we’re going to be working on in the next tutorial is calculating b.

And once we have that we can do linear regression.

So anyways, stay tuned for the next video. If you’ve got questions, comments, concerns, whatever up to this point,

please feel free to leave them below. Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.

[B]刀子
