Practical Machine Learning with Python #6: Pickling and Scaling – 译学馆


#### Pickling and Scaling - Practical Machine Learning Tutorial with Python p.6

What’s going on everybody and welcome to the sixth machine learning tutorial.

In this tutorial we are going to be talking about pickling, a little bit about scaling

and then we are going to move on into diving into the inner workings

of linear regression and of course the other algorithms.

So pickling doesn’t really have anything to do with regression;

it’s simply a good thing to have at your disposal so you can save yourself a lot of time.

So first of all, what is pickle? Pickle is just serialization of any Python object.

So this could be a dictionary or in our case could be a classifier or a whole host of other things.
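As a minimal sketch of that idea (the dictionary and its keys here are made up purely for illustration):

```python
import pickle

# Any ordinary Python object can be pickled; a dictionary is the simplest case.
config = {'ticker': 'GOOGL', 'forecast_out': 35}

# Serialize the object to a byte string...
data = pickle.dumps(config)

# ...and deserialize it back into an equal (but distinct) object.
restored = pickle.loads(data)
print(restored == config)  # → True
```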

So first of all, what we are going to do is import,

not in caps: we are going to import pickle.

Okay.

Also, I noticed here that for some reason I was double-defining y as the label.

My mortality is showing.

Anyway, just get rid of that if you did the same;

hopefully nobody followed me into that mistake.

But anyway, there is that.

Now, the way pickle works is like a file. Right? You open it, you save to it.

And then when you want to use it, you open it and you read it.

So we are going to come down.

Basically, you know, at what point would you want to save the classifier?

Right, so like thinking logically when should we save the classifier.

Should we just say we save it here?

Probably not, I mean you could.

You could save that classifier, but that’s an untrained classifier.

So we probably save it here. Right, so like what is the purpose of saving a classifier?

It’s to avoid doing the training step.

Because that’s a very tedious step. Right, that’s what’s going to take the most time.

In our case it doesn’t really take that long because we don’t have that much data and we were threading it.

But you can imagine if you had gigabytes and gigabytes

or even terabytes of data that you were doing this on.

Every time you want to make a prediction, you would not want to

have to retrain an algorithm so instead you can save it.

Now of course you save it,

you know, you might want to retrain it like once a month.

So something like this, but you don’t have to retrain it every time you want to use it.

So what we’re going to say here is “with open”,

and we’re going to say “linearregression.pickle”

with “wb”, “as f”. So we’re going to open this file

with the intention to write bytes, and we’re going to just use the temporary variable f.

And then we’re going to say “pickle.dump”.

What are we dumping? We’re going to dump that trained classifier, so “clf”.

Where are we dumping it? “f”.

So that dumps the classifier.

And then to use the classifier.

All you would actually do is you would say

something like pickle_in equals open

and then we would open this file,

open with “rb”.

And then we would say clf equals pickle.load of pickle_in.

What I do here is copy this, paste.
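Put together, the save-and-load round trip described above might look like this sketch. It assumes scikit-learn’s LinearRegression, as in the earlier parts of this series, and the tiny X and y are made-up stand-ins for the real stock features:

```python
import pickle

from sklearn.linear_model import LinearRegression

# Made-up stand-in data; in the tutorial, X and y come from the stock features.
X = [[0], [1], [2], [3]]
y = [0, 2, 4, 6]

clf = LinearRegression()
clf.fit(X, y)  # the slow training step we only want to do once

# Save: open the file to write bytes ('wb') and dump the trained classifier.
with open('linearregression.pickle', 'wb') as f:
    pickle.dump(clf, f)

# Load: open the file to read bytes ('rb') and read the classifier back.
pickle_in = open('linearregression.pickle', 'rb')
clf = pickle.load(pickle_in)
pickle_in.close()

# The reloaded classifier predicts without any retraining.
print(clf.predict([[10]]))
```

On later runs you would skip the fit-and-dump half entirely and just load the pickle, which is the whole point: the training cost is paid once.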

Okay, so now let’s go ahead and just run this and see how we do.

Okay, so everything works at this point, whether or not you noticed the fact

that we have actually redefined the classifier here.

So now what would happen if we did this?

Right now there is no definition of classifier. We’re not saving it as a pickle.

We are simply reading a pickle.

So if I refresh it so we reload this, what we get is…

There we go, so you get this information here

and the pickle is saved in the directory that we are working in.

So that would be… if I could find it here

Open it up, right, there is your pickle data.

So in our case, the pickled classifier is actually really, really small.

That’s not a big model.

But anyway we save ourselves the time of actually training that classifier.

Also I said it in the introduction video, but I’ll say it again.

Remember that we live in a time where you can spin up a server

for a very short amount of time

and you could do that. So if you have like a slower computer,

maybe your only computer is a relatively slow laptop,

you have got, I don’t know, one of those old netbooks or something.

If you are one of those people, you can spin up a GPU cluster, you can just spin up a regular server.

You can spin up a very powerful computer

and basically rent it for a few dollars an hour on Linode, DigitalOcean, or Amazon Web Services.

And if you do that, here’s generally what I do if I’m going to use a big server.

Pretty much all these hosts work the same way: you take your data,

you put it on their server,

so you’ll set everything up and transfer all your data there.

You’ll set up all the code you want to run on it,

and while you’re doing that, you’re using the smallest version they have,

so you’re paying like half a penny an hour

to rent that version of the server.

Then, when everything is ready, you scale that server up.

And then you’re paying maybe a few dollars an hour, maybe even ten dollars an hour if you are crazy.

Scale it up, run your operation,

take your classifier, save it to a pickle,

scale back down the server, destroy the server, whatever and you are done.

So, just a couple of quick pointers,

because we’re not really going to be talking much about scale for a while.

But you can actually scale this linear regression algorithm

very well, so just keep that in mind.

So anyways that’s really all I have to say about pickling and scaling

and now what we are going to actually be doing is writing our own linear regression algorithm

to learn how that actually works and all that. So very exciting stuff.

If you have questions, comments, concerns, whatever, leave them below.

Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.

##### Translation Info

Practical Machine Learning with Python, lecture 6 (and part 5 of the regression tutorial): Pickling and Scaling.

[B]hugue