Regression forecasting and predicting - Practical Machine Learning Tutorial with Python p.5

What’s going on everybody!

Welcome to the fifth Machine Learning and fourth regression tutorial.

In this tutorial we’ll be building on the last one

where we created this linear regression algorithm.

We found that it got great accuracy and all that.

And now we’re ready to actually predict

out into the unknown. All right?

So it turns out that we actually already do have some unknown data,

simply because of the shift we’re forecasting out, right, which is about 30 days.

So we can actually work with that.

So what we’re gonna do is, where we define our…

our Xs…

Let’s do the following.

Let’s actually cut this…come down here…

Paste.

And we’re gonna take this dropna, cut, and paste.

And now I am reminded why I was doing that negative forecast_out.

So…er…

So what we’re gonna do is

X equals X up to negative forecast_out: X = X[:-forecast_out].

Let’s see…negative forecast_out…up to the point of negative forecast_out.

And then we’re gonna say…

We’re gonna do X_lately

equals X from minus forecast_out, colon: X_lately = X[-forecast_out:].

And then we’re gonna drop

the missing data when we go to create the labels.

This way we have both our Xs and our X_lately defined.
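
A minimal sketch of that slicing, using a small made-up array in place of the stock features. The name X_all and the value forecast_out = 3 are assumptions for illustration (the video forecasts about 30 days). This sketch takes X_lately from the full array before truncating X, on the assumption that X_lately is meant to be the most recent forecast_out rows:

```python
import numpy as np

forecast_out = 3  # hypothetical value; the video uses roughly 30 days

# Toy feature matrix: 10 rows (days), 2 feature columns.
X_all = np.arange(20, dtype=float).reshape(10, 2)

# Take the most recent rows first, from the full array...
X_lately = X_all[-forecast_out:]
# ...then keep everything before them for training and testing.
X = X_all[:-forecast_out]

print(X.shape, X_lately.shape)
```

The order matters if both slices are taken from the same variable: slicing X_lately after X has already been truncated would pick rows from the truncated array rather than the newest ones.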

So X_lately is basically the stuff we’re gonna actually predict against.

So we have the Xs.

And we just need to figure out

what m and b are, right? For y = mx + b.

We get the answer for y.

We’ve done the linear regression.

So…so we’re gonna predict against this X_lately

that we actually don’t have a y-value for.

Which is why we were not training or testing on that data.

So now we have X_lately.

So the next thing we’re gonna go ahead and do is basically

we’ll come down here and actually just run this really quick,

just to make sure we’re still getting

the accuracy and we don’t have an incorrect number of values.

OK, no, we don’t. So we’re good.

So we’ve got 96% accuracy.

Awesome.

So we come down. We’ll comment this out.

We don’t really need that anymore.

And now to predict stuff.

What we also need to do is make sure we scale…

So

what we need to do is take this…

Almost made a mistake there. And this…

Now let’s run that one more time.

I wanna make sure I don’t screw things up.

Good. OK.

So now what we’re gonna do is come down here.

And…

We need to predict based on the X data.

So once you have a classifier,

doing a prediction is super easy. So…

We’re gonna say forecast_set equals clf.predict().

And in here you can pass a single value,

or you can pass in, like,

an array of values, and it makes a prediction per value in that array.

And that’s what we’re gonna do, right?

We’ve got these 30 days of data, basically, right here.

So…er…

X_lately, rather, so 30 days here. OK.

So the last 30 days. So X_lately…

We want to create that with X_lately.

So then we have forecast_set. So now…

We can do…We can print…

er…forecast_set,

confidence, and forecast_out,

just so we know how many days we’re forecasting out here.
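
A rough sketch of this prediction step, assuming scikit-learn's LinearRegression and synthetic data in place of the stock features. The data, the 80/20 split done by hand, and forecast_out = 5 are all made up for illustration. Note that score() on a regressor returns the R² value, which is what the video calls accuracy or confidence:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
forecast_out = 5  # hypothetical; the video uses roughly 30 days

# Synthetic stand-in for the features: the label is a noisy linear combination.
X_all = rng.normal(size=(100, 2))
y_all = 3.0 * X_all[:, 0] - 2.0 * X_all[:, 1] + 10.0 + rng.normal(scale=0.1, size=100)

# The newest rows have no label yet, so they are held back for forecasting.
X_lately = X_all[-forecast_out:]
X = X_all[:-forecast_out]
y = y_all[:-forecast_out]

# Simple manual train/test split (first 80% train, rest test).
split = int(0.8 * len(X))
clf = LinearRegression()
clf.fit(X[:split], y[:split])
accuracy = clf.score(X[split:], y[split:])  # R^2 on the held-out rows

# One prediction per row of X_lately.
forecast_set = clf.predict(X_lately)
print(forecast_set, accuracy, forecast_out)
```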

Uh-oh…

confidence…do we…er…I’m sorry. I changed that; I usually use confidence…

So accuracy…try again…

Pull this up…

And yeah, so there we go…

So we’ve got our predicted values.

So these are

basically the next 30 days of unknown values for us.

These are just straight-up stock prices,

which is pretty cool, right?

Because we…you know…that whole scaling part

is also playing a major role here, and it’s still outputting,

you know…stock prices there

of decent value to us.

Anyway, I think it’s cool.

So these are the next 30 days’ prices.

So then let’s say you want to graph that.

So what are we gonna do? We’re gonna come to the top.

And again, we’re gonna just blast through graphing this.

If you’re confused or whatever, I have Matplotlib tutorials.

But otherwise we’re gonna import, and in fact…

er…datetime.

And then we’re gonna

import matplotlib.pyplot as plt,

from matplotlib import style,

and we’re gonna say style.use('ggplot').

pyplot is just to plot stuff.

style is how we make it look decent.

And style.use is how you specify which decent-looking style you want.

So now what we’re gonna say is…we’ll come down…

er…Let’s come down here…And we’re gonna say

df['Forecast'] equals np.nan.

This just specifies that the entire column

is full of not-a-number data, and you’ll see why in a moment.

We’ll actually put some information there shortly.

Now we need to find out what the last day was.

This may not be the best way to do something like this,

but this is how we’re gonna actually plot this on the graph.

So we say last_date equals df.iloc…

Oops, [-1]. So this is the very last day, and we’ll get the name of that: df.iloc[-1].name.

And we’re gonna say the last_unix value is equal to last_date.timestamp().

And then one_day.

This is how many seconds are in a day.

So you can just do the math there if you want.

But it’s 86400.

And then the next_unix

would be, like, the next day, right?

And these are…we know these are daily prices.

So we’re just gonna kind of hard-code this part of it,

just so we can create the graph.

So next_unix is last_unix + one_day.

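
A small sketch of the date bookkeeping just described, assuming a pandas data frame indexed by daily dates. The toy frame and its "price" column are hypothetical stand-ins for the stock data:

```python
import pandas as pd

# Toy daily price frame with a DatetimeIndex.
df = pd.DataFrame(
    {"price": [10.0, 10.5, 11.0]},
    index=pd.date_range("2016-01-04", periods=3, freq="D"),
)

last_date = df.iloc[-1].name       # index label of the last row (a pandas Timestamp)
last_unix = last_date.timestamp()  # that date as seconds since the Unix epoch
one_day = 86400                    # 24 * 60 * 60 seconds
next_unix = last_unix + one_day    # Unix time of the first forecast day

print(last_date, next_unix)
```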
So when you do a prediction

the prediction has no idea like

what date that’s for…right?

So remember,

when you’re doing machine learning, X and y do not necessarily correspond to, like,

the axes on the graph.

In this case, they don’t. X is the features,

y is the label. It just so happens the label is the price,

so y is correct.

But is the X correct? No, because the date is not a feature.

So that’s why we have to have this workaround here,

because we actually don’t have the date values.

I have lost my mouse…there we go

Anyway that’s unix. OK.

So now we get the dates.

And now we actually populate the data frame

with the new dates

and the forecast value. So…

the way we’re gonna do that is we’re gonna say

for i in forecast_set:

next_date equals datetime.datetime.fromtimestamp

of next_unix.

And now we’re just gonna say

next_unix plus-equals that value of one_day.

So one_day.

And then df.loc,

and then next_date…oops…next_date,

equals,

and then we’re gonna do, like, a one-liner for loop here.

So we’re gonna say np.nan for something we don’t care about

in range()

of len(df.columns),

er…let’s see…minus 1.

And then plus i.

So what we’re doing is iterating through the forecast_set,

taking each forecast and day,

and then setting those as the values

in the data frame,

basically making the features,

the future features, not a number. OK.

And then the last line just takes all of the first columns,

sets them to not-a-numbers, and the final column is

whatever i is, which is the forecast in this case.
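
Put together, the loop described above might look like the following sketch. The toy data frame, its column names, and the forecast values are made up; the loop body follows the steps as spoken, with i wrapped in a list so it can be concatenated onto the list of NaNs:

```python
import datetime

import numpy as np
import pandas as pd

# Toy stand-in for the stock data frame, indexed by daily dates.
df = pd.DataFrame(
    {"price": [10.0, 10.5, 11.0], "volume": [100.0, 120.0, 90.0]},
    index=pd.date_range("2016-01-04", periods=3, freq="D"),
)
forecast_set = np.array([11.2, 11.4])  # made-up predictions
df["Forecast"] = np.nan                # new column, empty for the known days

one_day = 86400
next_unix = df.iloc[-1].name.timestamp() + one_day

for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += one_day
    # NaN for every column except the last, then the forecast itself.
    df.loc[next_date] = [np.nan for _ in range(len(df.columns) - 1)] + [i]

print(df.tail())
```

One caveat: fromtimestamp() converts to local time, so the new index values can be offset from midnight by the machine's UTC offset. Adding pd.Timedelta(days=1) to the last date would be one way to avoid that.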

So now what we’re gonna go ahead and do is

we’re gonna say df…

And then we’re gonna say df['Forecast'],

dot plot: df['Forecast'].plot().

And then we’re just gonna do plt.legend,

and we’ll put that in the fourth location.

That’s just, like, the bottom right.

And then we’re gonna say plt.xlabel,

and we’ll say that’s the date.

And plt.ylabel, that’s the price.

And finally, plt.show().
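
A sketch of the plotting calls, assuming a data frame that already holds known prices and forecasts. The toy frame and its "price" column name are hypothetical. The Agg backend is selected here only so the sketch runs without a display; in an interactive session you would skip that line and call plt.show() at the end:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import style

style.use("ggplot")

# Known prices for the first days, forecasts for the last two.
idx = pd.date_range("2016-01-04", periods=6, freq="D")
df = pd.DataFrame(
    {
        "price": [10.0, 10.5, 11.0, 10.8, np.nan, np.nan],
        "Forecast": [np.nan, np.nan, np.nan, np.nan, 11.2, 11.4],
    },
    index=idx,
)

ax = df["price"].plot()
df["Forecast"].plot(ax=ax)  # draw on the same axes
plt.legend(loc=4)           # location code 4 is the lower right
plt.xlabel("Date")
plt.ylabel("Price")
# plt.show()  # uncomment when running interactively
```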

OK. So we zoom to that. Hopefully that for loop is gonna work out

We’ll find out shortly.

See…and the graph here…OK

So this is our actual graph of the data here

Pull this up…

And as you can see, this is the known data here.

And then over here is our predicted data.

So let me zoom in to that spot

So this is like the future prediction here, the forecast. OK.

So it’s just like a really quick way…

to visualize the data.

And really the complex part, the reason why we had all this nasty crap in here,

was just simply so you can have dates on the axes.

Because that’s how I am. I want to have the dates there.

Anyway…Oh yeah…

So that’s how you can actually forecast out

the data and actually do a prediction.

But the crux of doing prediction

with scikit-learn is right here

And just remember you can pass a single value

Or you can pass an array of values and it will

just output in the same order of the array of values
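
To illustrate the single-value versus array point, here is a small sketch on made-up data that follows y = 2x + 1 exactly. Note that current scikit-learn versions expect a 2D array, so even a single sample is passed as one row:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1, one feature per sample.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
clf = LinearRegression().fit(X, y)

# A single sample is still passed as a 2D array with one row.
single = clf.predict(np.array([[4.0]]))

# Several samples at once: one prediction per row.
batch = clf.predict(np.array([[6.0], [4.0], [5.0]]))

print(single)
print(batch)
```

The predictions in batch come back in the same order as the input rows, which is what lets each forecast be matched to a date afterwards.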

And then from there we just use logic

to know…because each increment is a day,

right? Each price report was one day.

So then that just means

that each forecast was, like, one day later, right?

So we just kind of use our brains for that one.

So…anyway…

And I guess the other thing to think about is df.loc…just in case…

I’m not sure we actually covered that in Pandas.

But what happens there is, basically, .loc

is gonna be referencing the index

for the data frame.

Basically what that’s saying is that next_date

is a date stamp, right?

And that next_date is the…

index of the data frame. So…

Maybe it’ll help…just…

If you’re not confused about that at this point,

feel free to carry on to the next video. We’ll be talking about pickling.

But if you’re confused about that for loop, I just want to explain

that for loop, just so no one is like, “What the hell?”

So anyway…So yeah…So here…

Right, the date is the index. So when we say

df.loc[next_date], we’re saying the index.

And if that index doesn’t exist,

it’s gonna be created. And if it did exist, we’re just gonna replace it.
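
A quick sketch of that create-or-replace behavior of df.loc, using a hypothetical two-column frame indexed by date:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"price": [10.0, 10.5], "Forecast": [np.nan, np.nan]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05"]),
)

new_day = pd.Timestamp("2016-01-06")

# The label is not in the index yet, so a new row is created.
df.loc[new_day] = [np.nan, 11.2]
rows_after_create = len(df)

# The label exists now, so the same assignment replaces the row.
df.loc[new_day] = [np.nan, 11.9]
rows_after_replace = len(df)

print(df)
```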

OK. Then we’re saying np.nan for underscore in range(len(df.columns)-1).

What the heck is that?!

Well, that is just a list of values that are np.nan.

So basically we’re saying it’s np.nan for adjusted close, high-low percent, percent change.

All this stuff is just not a number, right?

Because this is in the future. We don’t have information on that data.

OK. Then…let’s come back down here.

Then we say + i.

Remember, i is the forecast, right?

i in forecast_set.

So basically it’s just the list

plus one value. So here it’s just, like, a huge list.

Well, not that huge…it’s just this many columns, right?

And we just add the forecast at the very end.

So that’s just our super hacky way

of doing the following. I said head there, but it’s probably more useful to use tail,

so you can see the end of this data frame.

These are all those np.nans, and then finally it’s just the forecast, OK?
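
The row-building expression on its own can be sketched in plain Python. The column names below are hypothetical, and math.nan stands in for np.nan so the sketch has no dependencies:

```python
import math

# Hypothetical column names; the last one holds the forecast.
columns = ["price", "hl_pct", "pct_change", "volume", "Forecast"]
i = 11.2  # one made-up forecast value

# One NaN per column except the last, then the forecast appended.
# i is wrapped in a list because list + list concatenates.
row = [math.nan for _ in range(len(columns) - 1)] + [i]

print(row)
```

This prints a list of four nan values followed by 11.2, which is the shape of each future row assigned through df.loc.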

So that’s all that is.

Sorry if that was just a little confusing. Hopefully the explanation worked. If not,

feel free to ask questions wherever; I’ll be happy to clarify.

Now…there…

There is actually one more thing I want to show you all before

we dive into regression and actually write a regression algorithm all on our own,

and that’s pickling. The reason why you want to pickle is, imagine,

rather than training a classifier on this…you know,

relatively small data set. We just have daily values for the last few years.

You know, if you saved this to a file, it’s probably, like,

you know…500 kilobytes or something…who knows.

But let’s say you have, like, intraday data. You’ve got, like, two gigabytes’ worth of data.

That’s gonna take a while to train the classifier on that data.

So wouldn’t it be nice if, every time you want to make a prediction…

So just consider

making a prediction using future data.

Consider if every time you want to make a prediction you have to train the classifier.

Is that not just crazy sounding?

So yes, that’s crazy sounding.

So in the next tutorial we’re gonna be talking about pickling,

which will let you save your classifier and then just quickly load it in

without any training time.

So definitely very useful with machine learning classifiers.

So anyway, that’s what we’ll talk about in the next video.

Questions, comments, leave them below. Otherwise, as always, thanks for watching,

thanks for the support and subscriptions, and until next time.

[B]刀子
