
Practical Machine Learning with Python #6: Pickling and Scaling – Yixueguan (译学馆)


Pickling and Scaling - Practical Machine Learning Tutorial with Python p.6

What's going on, everybody, and welcome to the sixth machine learning tutorial.
In this tutorial we're going to talk about pickling and a little bit about scaling,
and then we'll move on to diving into the inner workings
of linear regression and, of course, the other algorithms.
Pickling doesn't really have anything to do with regression;
it's simply a good thing to have at your disposal, because it can save you a lot of time.
So first of all, what is pickle? Pickle is serialization for Python objects.
That could be a dictionary or, in our case, a classifier, or a whole host of other things.
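To make "serialization" concrete, here is a minimal sketch that round-trips a dictionary through pickle in memory. The dictionary contents are hypothetical values chosen for illustration; they are not from the video.

```python
import pickle

# Hypothetical settings dictionary, used only to have something to serialize.
settings = {"forecast_col": "Adj. Close", "test_size": 0.2}

# dumps() turns the object into a bytes string ...
blob = pickle.dumps(settings)

# ... and loads() rebuilds an equal, but separate, copy of the object.
restored = pickle.loads(blob)
```

The same two steps apply to a classifier: the object goes in, bytes come out, and the bytes can later be turned back into an equivalent object.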
So first of all, we're going to import pickle, all lowercase: import pickle.
Okay.
Also, I noticed that up here, for some reason, I defined y, the label, twice.
My mortality is showing.
Anyway, just get rid of the duplicate if you did that too;
hopefully nobody followed me into that mistake.
Now, pickle works like a file: you open it and you save to it,
and then when you want to use it, you open it and read it.
So we're going to come down here.
At what point would you want to save the classifier?
Thinking logically, when should we save it? Should we save it here?
Probably not. You could, but that would be an untrained classifier.
So we save it here, once it is trained. What is the purpose of saving a classifier?
It's to avoid doing the training step,
because that's the tedious step; that's what takes the most time.
In our case it doesn't take that long, because we don't have much data and we were threading it.
But imagine you had gigabytes and gigabytes, or even terabytes, of data.
Every time you wanted to make a prediction, you would not want to retrain the algorithm, so instead you save it.
Of course, you might still want to retrain it, say, once a month,
but you don't have to retrain it every time you want to use it.
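One way to act on "retrain once a month, not every time" is to reuse the saved pickle while it is fresh and retrain only when it is missing or too old. The sketch below is an assumption about how that could look, not the code from the video: `train`, `load_or_train`, and the 30-day threshold are hypothetical, and a small dictionary stands in for the trained classifier so the example needs only the standard library.

```python
import os
import pickle
import tempfile
import time

# Assumption: "once a month" is approximated as 30 days.
MAX_AGE_SECONDS = 30 * 24 * 60 * 60


def train():
    """Placeholder for the slow training step; returns a stand-in model."""
    return {"slope": 2.0, "intercept": 1.0}


def load_or_train(path, max_age=MAX_AGE_SECONDS):
    """Return (model, retrained), reusing a fresh pickle when one exists."""
    fresh = (
        os.path.exists(path)
        and (time.time() - os.path.getmtime(path)) < max_age
    )
    if fresh:
        with open(path, "rb") as f:
            return pickle.load(f), False
    model = train()
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model, True


# A temporary directory keeps the example from touching real files.
path = os.path.join(tempfile.mkdtemp(), "linearregression.pickle")
first = load_or_train(path)   # no pickle yet, so this call trains and saves
second = load_or_train(path)  # a fresh pickle exists, so this call just loads
```

The design choice here is to let the file's modification time decide; a real project might instead retrain on a schedule or when new data arrives.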
So here we're going to say with open, then "linearregression.pickle", then "wb", as f.
That opens the file with the intention to write, in binary mode, and f is just a temporary variable for it.
Then we say pickle.dump.
What are we dumping? The trained classifier, so clf.
Where are we dumping it? Into f.
So that dumps the classifier.
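Written out, the save step described above looks roughly like the sketch below. The `with open(...)` and `pickle.dump(clf, f)` lines follow the narration; everything else is an assumption added so the example runs on its own: `fit_line` is a hypothetical pure-Python least-squares fit standing in for the trained classifier, the data is made up, and the file goes into a temporary directory rather than the working directory.

```python
import os
import pickle
import tempfile


def fit_line(xs, ys):
    """Least-squares fit for one feature; a stand-in for the training step."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


# Train first: pickling before this point would save an untrained object.
clf = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # made-up points on y = 2x + 1

path = os.path.join(tempfile.mkdtemp(), "linearregression.pickle")

# "wb" opens the file for writing in binary mode; dump() writes clf into it.
with open(path, "wb") as f:
    pickle.dump(clf, f)
```

With a real classifier object in place of the dictionary, the last two lines stay the same.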
And then, to use the classifier,
all you do is say something like pickle_in equals open,
open that same file with "rb",
and then clf equals pickle.load, and we load pickle_in.
I'll just copy this and paste it.
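The load step can be sketched the same way. The `pickle_in = open(..., "rb")` and `clf = pickle.load(pickle_in)` lines mirror what is said above; the setup that first writes a stand-in model (a dictionary with a hypothetical slope and intercept), the temporary directory, the explicit `close()`, and the prediction arithmetic are assumptions added to keep the example self-contained.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "linearregression.pickle")

# Setup only: write a stand-in "trained model" so there is something to load.
with open(path, "wb") as f:
    pickle.dump({"slope": 2.0, "intercept": 1.0}, f)

# "rb" opens the file for reading in binary mode; load() rebuilds the object.
pickle_in = open(path, "rb")
clf = pickle.load(pickle_in)
pickle_in.close()  # not mentioned in the narration, but good practice

# The reloaded object is used exactly as if it had just been trained.
prediction = clf["slope"] * 5 + clf["intercept"]
```

One caution from the pickle documentation is worth keeping in mind: loading a pickle can execute arbitrary code, so only load files you created or trust.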
Okay, so now let's go ahead and run this and see how we do.
Everything works at this point, and, whether or not you noticed,
we have actually redefined the classifier here.
So now what would happen if we did this?
Now there is no definition of the classifier, and we're not saving it as a pickle;
we are simply reading a pickle.
So if I rerun this so that it reloads, what we get is...
There we go: you get this information here,
and the pickle is saved in the directory we're working in.
If I can find it here... open it up, and there's your pickle data.
In our case the pickled classifier is really, really small; it's not a big model.
But anyway, we save ourselves the time of actually training that classifier.
Also, I said this in the introduction video, but I'll say it again.
Remember that we live in a time when you can spin up a server for a very short amount of time.
So if you have a slower computer,
maybe your only computer is a relatively slow laptop,
or, I don't know, one of those old netbooks,
you can spin up a GPU cluster, or just a regular server.
You can spin up a very powerful computer
and basically rent it for a few dollars an hour on Linode, DigitalOcean, or Amazon Web Services.
If you do that, here's what I generally do when I'm going to use a big server.
Pretty much all of these hosts work the same way: you take your data and put it on their server,
and you set everything up there, including all the code you want to run.
While you're doing that, you use the smallest version they have,
so you're paying something like half a penny an hour to rent that version of the server.
Once you're ready, scale that server up.
Then you're paying maybe a few dollars an hour, maybe even ten dollars an hour if you're crazy.
Scale it up, run your operation,
save your classifier to a pickle,
take your classifier with you,
scale the server back down or destroy it, whatever, and you're done.
So, just a couple of quick pointers,
because we're not really going to be talking much about scale for a while: you can actually scale linear regression very well, so keep that in mind.
Anyway, that's really all I have to say about pickling and scaling.
Next, we're going to write our own linear regression algorithm
to learn how it actually works. Very exciting stuff.
If you have questions, comments, concerns, whatever, leave them below.
Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.


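Translation credits
Video summary

Part 6 of Practical Machine Learning with Python and part 5 of the regression tutorial: pickling and scaling.

Transcriber

[B]hugue

Translator

[B]hugue

Reviewer

Review Team 1024 (审核团1024)

Video source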

https://www.youtube.com/watch?v=za5s7RB_VLw
