


Creating Our K Nearest Neighbors Algorithm - Practical Machine Learning with Python p.16

What is going on everybody, welcome to part 16 of our machine learning with Python tutorial series. In this tutorial, and actually since the previous tutorial, we've been talking about K nearest neighbors. We covered the intuition, which is basically that the class of any given point is decided by the vote of its K closest points: say K equals 3, then you look at the three closest data points and see which classes they belong to. We also covered in the previous tutorial that closeness is measured by Euclidean distance, and we tried K nearest neighbors out and saw that its overall accuracy is actually very impressive.
So in this tutorial we're now getting to the point where we write our own K nearest neighbors algorithm. The first thing we're going to do is remove the old code here and add some new imports. First, import numpy as np. We're going to be using numpy in this part, in particular to rewrite the Euclidean distance formula in numpy form.
It turns out numpy actually has a built-in function that will calculate Euclidean distance for us. The reason I chose not to use it before is that it's not as obvious as writing it out by hand: the square root of (x1 - x2)² + (y1 - y2)² makes what's happening very clear, whereas with numpy it's not so obvious. But we're going to use numpy because it's faster.
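As a sketch of the trade-off being described: the hand-written formula and numpy's built-in vector norm give the same distance. The video doesn't name the exact numpy call here, so treat np.linalg.norm as an assumption about which built-in is meant:

```python
import numpy as np

# Two 2-D points
p1 = [1, 3]
p2 = [2, 5]

# The explicit formula from the tutorial: sqrt((x1 - x2)^2 + (y1 - y2)^2)
explicit = ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5

# The numpy version: the norm of the difference vector is the same distance
fast = np.linalg.norm(np.array(p1) - np.array(p2))
```

Both give sqrt(5); the numpy version also generalizes to any number of dimensions without changing the code.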
But again, the whole reason I wanted to break the formula down for you was so you'd have a better understanding of how the algorithm actually works; using numpy right out of the gate, you probably wouldn't have understood it as easily. Anyway, that's why I did it that way, and we'll bring numpy in here pretty quickly.
Next, import matplotlib.pyplot as plt, and from matplotlib import style. From the collections module (note the plural: collections, not collection) we import Counter, which is how we're going to tally the votes. Then style.use('fivethirtyeight'). We're also going to import warnings, which is so we can warn the user when they're attempting to use a bad value for K.
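Collected in one place, the imports just described look like this; the Counter lines at the end are a small added demo (not in the video) of the vote-tallying it will be used for:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
from collections import Counter  # note: "collections", plural
import warnings

style.use('fivethirtyeight')

# Counter is what will tally the class votes later on, e.g.:
votes = ['r', 'k', 'r']
top = Counter(votes).most_common(1)  # 'r' wins with 2 votes
```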
Now let's get rid of the old code. We could leave euclidean_distance here for now, but... actually, I don't know what I want to do with it, so let's just delete it; we'll rewrite it anyway.
Now let's say we have a data set, and we're going to make it a dictionary. I'm going to cheat a little bit here to make things easier down the line. The first key is a class; we'll say it's the class 'k', and its value is a list of lists. The inner lists are coordinates... actually, let's not call them coordinates, I apologize. These are features; we're going to call them features, since this is a machine learning tutorial (though I am thinking about plotting them, too). So these are two-dimensional features: the features of class 'k' are [1, 2], [2, 3], and [3, 1]. Then we'll have another value, 'r'. That's the class, that's the label, and it likewise corresponds to the same number of features, three feature pairs for 'r': [6, 5], [7, 7], and [8, 6]. So we have two classes here and their features.
Then let's say down the line we have a new point, or rather new features; I'll probably forget that later on, but 'new features' is the better wording. It's [5, 6], so the new point is [5, 6]. Looking at this you can probably already surmise which class it better belongs to. Take a wild guess, and don't worry: soon enough we'll write the algorithm and then apply it to a more realistic data set. This is just a simple data set for now.
Next I'm going to write a one-line for loop that graphs this with plt.scatter. Actually, you know what, first let me write it out long-hand. We have for i in dataset:, then for ii in dataset[i]:. So we're iterating through the data set; i corresponds to the class labels 'k' and 'r', and then for ii in dataset[i] goes over each feature pair in that class. Inside, we call plt.scatter(ii[0], ii[1], s=100, color=i), and that's why I used those keys: 'k' and 'r' also work as matplotlib colors. So how might you condense that back down into a one-liner? You basically take the plt.scatter(...) call with color=i and put the loops after it: first for ii in dataset[i], then for i in dataset. That's usually how I build a one-liner anyway, so this should do what we want. Then plt.show() to see our data. Where are you... there it is, okay. So there we have our data. In fact we can even add another plt.scatter() for the new point, passing its [0] and [1] elements. That came out pretty small; we should have set s=100, but you get the point.
Just looking at it visually, we're probably thinking: okay, that point belongs to the red group, right? So we have our data, and we scattered it just so we could visualize it. Now let's get started on defining the K nearest neighbors algorithm. I'm going to comment the plotting out for now, but we at least wanted to see our data.
We know that eventually we want a k_nearest_neighbors function. To calculate K nearest neighbors we need to pass in data, right? That's basically the training data. Then we need to pass in whatever we're trying to predict, and then we need a value for K. For now we'll default k to 3, which I believe is the same default as scikit-learn's K nearest neighbors (it might be 5, but we'll look at that when we come to it). Then, for example, we know we'll eventually want a check: if len(data) is greater than or equal to k. We're going to pass in a dictionary, right? And the length of a dictionary like the one above is 2; it basically counts the keys. So if the number of classes is greater than or equal to the k we chose, we'll send the user a warning with warnings.warn(). For the message we could just say 'you're dumb'... never mind; we'll say something like 'K is set to a value less than total voting groups'. Idiot! Just kidding, okay. Anyway, word it however you like. Then comes the actual knn algorithm part, and then we return the result. It's probably going to be a vote result, so we'll call it vote_result. So that's the starting point of our K nearest neighbors algorithm.
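The starting point just described, as a runnable stub. The vote_result = None placeholder is an addition so the sketch executes; the actual distance-and-vote logic is written in the next tutorial:

```python
import warnings

def k_nearest_neighbors(data, predict, k=3):
    # data is a dict of class -> list of feature lists, so len(data) counts the classes.
    # Warn when k is not larger than the number of voting groups, since a tie is then possible.
    if len(data) >= k:
        warnings.warn('K is set to a value less than total voting groups!')
    # knn algorithm goes here (filled in next tutorial)
    vote_result = None  # placeholder until the vote logic exists
    return vote_result

dataset = {'k': [[1, 2], [2, 3], [3, 1]],
           'r': [[6, 5], [7, 7], [8, 6]]}
result = k_nearest_neighbors(dataset, [5, 6], k=3)  # k=3 > 2 classes, so no warning
```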
I'm going to go ahead and cut it off here. In the next tutorial we'll continue building this to the point where, hopefully, we can pass in data and actually get that vote result, and probably bring at least this part to a close. Then we can test it on real data. Anyway, if you have questions, comments, concerns, whatever, post them below. Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.


Translation Info
Transcriber: [B]刀子
Translator: [B]刀子
Reviewer: 审核员1024
Video source: https://www.youtube.com/watch?v=n3RqsMz3-0A
