
#### Creating Our K Nearest Neighbors Algorithm - Practical Machine Learning with Python p.16

What is going on everybody. Welcome to part 16 of our machine learning with Python tutorial series.

In this tutorial…well, actually in the previous tutorial, we’ve been talking about K nearest neighbors.

We talked about the intuition, which is basically that the class of any given point is based on the vote of K. Let’s say K equals 3, so the three closest data points to it.

And then we talked about in the previous tutorial

that closeness is measured by Euclidean distance. And then also we showed

that the overall accuracy of K nearest neighbors is actually very impressive.

So in this tutorial

we’re actually now getting to the point where we’re going to write our own K nearest neighbors algorithm.

So the first thing we’re going to go ahead and do is

I’m going to remove this stuff here.

And we’re going to add some new imports here.

So first I’m going to say import numpy as np.

And we’re going to be using numpy. Especially here

we’re going to actually change this Euclidean distance formula to use numpy.
numpy it turns out actually has a built-in function that will

calculate Euclidean distance for us.

The reason I chose not to use it is

it’s not as obvious as this.

This is very clear: the square root of (x1 - x2)² + (y1 - y2)².

Right? Well that is…it’s very obvious this way but

with numpy it’s not so obvious but we’re going to use numpy because it’s faster.
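To make that concrete, here is a small sketch (with made-up example points) comparing the explicit, obvious formula against numpy’s built-in norm, which computes the same Euclidean distance:

```python
import numpy as np

# Two made-up 2-D feature points
p = np.array([1, 2])
q = np.array([4, 6])

# The explicit formula: sqrt((x1 - x2)^2 + (y1 - y2)^2)
explicit = np.sqrt((p[0] - q[0])**2 + (p[1] - q[1])**2)

# numpy's built-in version: the Euclidean norm of the difference vector
builtin = np.linalg.norm(p - q)

print(explicit, builtin)  # both give 5.0
```

Both produce the same number; the `np.linalg.norm` form is just less obvious at a glance, which is exactly the point being made here.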

But again, the whole reason I wanted to break down the formulas for you all

was so you have a better understanding of how it actually works

Using numpy right out of the gate, you probably wouldn’t understand how it works as easily.

So anyways, that’s why I did it that way but we’ll keep that there for now.

But we are going to bring in numpy here pretty quickly.

Then I’m going to say import matplotlib.pyplot as plt.

from matplotlib we’re going to import style.
from the collections we’re going to import Counter which is how we’re going to do the votes basically.

Then we’re going to do style.use and we’re gonna say ‘fivethirtyeight’.

Also let’s…we’re going to import warnings.

And this is so we can warn the user when they’re attempting to use a dumb number for K.
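Collected in one place, the imports described so far would look something like this:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
from collections import Counter  # for tallying the votes later
import warnings  # to warn the user about a bad choice of K

style.use('fivethirtyeight')
```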

So now we’re gonna get rid of this and we’ll leave euclidean_distance here for now but…

In fact let’s just…

I don’t know what I want to do with it. I think I was going to delete it for now. We’ll rewrite it anyway.

So now let’s say we have a data set

where…and we’re going to make this a dictionary.

And I’m going to kind of cheat a little bit

so we can make things actually kind of easier down the line. But we’re going to say

this is a class. We’re going to say the class of k.
The k class is the following. It is a list of lists.

The list in here is actually coordinates

or…I hate…we’re not going to call them coordinates. I apologize.

These are features. We’re going to call them features. This is a machine learning tutorial.

I’m thinking about plotting them. Anyways.

So this one has…these are going to be two-dimensional features.

So the k…the features of k are 1,2 2,3 and 3,1.

Okay. So those are features that correspond to the class of k.

Now we’re going to have another value which will be ‘r’.
That’s the class. That’s the label.

And that label…whoops…is not a dictionary.

It corresponds, again, to the same number of features

of 3 features for r. And those features are 6,5

we’ll do 7,7 and 8,6. Okay.

So we have two classes here and their features.

Okay. So and then let’s say down the line we have a new point

or new…we’ll say new features.

I’ll probably forget that later on, but new features is probably better worded. 5,6. So that’s the new point: 5,6.
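So, written out, the toy data set and the new point described above come to this:

```python
# Two classes, 'k' and 'r', each mapping to a list of lists:
# three 2-D feature sets per class
dataset = {'k': [[1, 2], [2, 3], [3, 1]],
           'r': [[6, 5], [7, 7], [8, 6]]}

# The new point we want to classify
new_features = [5, 6]
```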

Now looking at this visually you can probably already surmise

which one this…this better belongs to.

So you can take a wild guess but…and don’t worry

soon enough we’ll actually…we’ll write this algorithm and then we’ll apply it to a more realistic data set.

This is just a simple data set for now.

And I’m going to write a

one-line for loop that’s going to graph this.

So let’s say we’re going to say plt.scatter

and we’re going to scatter…Let’s say plt.scatter

and we’re going to say (ii[0]).

Actually you know what? I’m gonna do…First let me write it out.

So let’s say you had for i in dataset:

for ii in dataset[i]: so we’re iterating through

you know for i in dataset, i corresponds to k and r.

Then for ii in dataset[i] that would be for each

feature set basically.

So for ii we’re going to do plt.scatter(ii[0],ii[1], s=100, color=i)

And that is why I use those.

So how might you

break this back down to be a one-liner for loop?

You would just…you could do basically this.

So plt.scatter(ii[0], ii[1], s=100, color=i), and then do this: for ii in dataset[i], and then for i in dataset. Okay.

That’s usually how I build a one-liner anyway. So this should do what we want to do.

And then let’s go ahead and plt.show() to see our data.

Where are you? There it is. Okay.

So there we have our data.

And again, it’s pretty simple to see kind of what we’re working with.

And in fact we can even do plt.scatter() here for the new point. I think we’ll have to do new_features[0] and new_features[1]. Okay. That was pretty small. We should have done s=100, but you get the point.
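Putting that plotting step together, the one-liner loop plus the new point looks roughly like this (the data set is repeated here so the snippet runs on its own):

```python
import matplotlib.pyplot as plt
from matplotlib import style

style.use('fivethirtyeight')

dataset = {'k': [[1, 2], [2, 3], [3, 1]],
           'r': [[6, 5], [7, 7], [8, 6]]}
new_features = [5, 6]

# One-liner version of the nested for loop: scatter every feature set,
# colored by its class name ('k' = black, 'r' = red)
[[plt.scatter(ii[0], ii[1], s=100, color=i) for ii in dataset[i]] for i in dataset]

# Scatter the new point too, with s=100 this time so it's easy to see
plt.scatter(new_features[0], new_features[1], s=100)
plt.show()
```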

So just looking at it visually. We’re probably thinking

Okay, that belongs to the red group, right?

So we have our data. We are scattering that data just so we could visualize it.

But now let’s get started with defining the K nearest neighbors algorithm.

So I’m going to comment this out for now

but we at least wanted to see our data.

So we know that eventually we want to have k_nearest_neighbors, which we’ll make a function.

And in order to calculate K nearest neighbors we need to pass through data, right?

This is the training data basically.

And then we need to pass through whatever we’re trying to predict, right?

And then we need a value for K.

Okay. And for now we’re going to default k to equal to 3.
which I believe is the same as scikit-learn’s K nearest neighbors default. It might be 5

but we’ll look at that when we come to it.

So then what we’re going to say is for example

we know that eventually what we want to have is

if len(data) is greater than or equal to k. If we have more voting groups than K,

because we’re going to pass through a dictionary, right?

And the length of a dictionary like the length of this dictionary here

is 2. It basically has two keys. Okay?

So that would be 2.

So if the length is greater than or equal to the value of k that we choose, we’re going to just send the user a warning. So we’re gonna say warnings.warn().

And then we’re just going to say you’re dumb. Never mind…

We’re gonna say k is set to a value

less than total voting groups something like that.

Idiot! Just kidding…Okay

So anyway, you can leave that there if you like. And then…

And then we’re going to do, you know, the special knnalgos part.

Then we’re going to return the results. It’s probably going to be a vote result. So we’ll just say vote_result. Okay.

So that’s the starting point of our K nearest neighbors algorithm.
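So far, then, the skeleton of the function looks something like this; the distance and voting logic is still just a placeholder until the next part:

```python
import warnings

def k_nearest_neighbors(data, predict, k=3):
    # data: dict of class -> list of feature sets (the training data)
    # predict: the feature set we want to classify
    # len(data) is the number of classes, i.e. the number of voting groups
    if len(data) >= k:
        warnings.warn('K is set to a value less than total voting groups!')
    # knnalgos: the distance calculations and voting go here next time
    vote_result = None  # placeholder until the voting logic is written
    return vote_result
```

Calling it with our two-class data set and the default k=3 stays quiet; calling it with k=2 triggers the warning, since two voting groups is not less than K.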

And I’m going to go ahead and cut it off here. And in the next tutorial

we’ll continue building this

to the point where hopefully we can pass our data and actually get that vote result

and probably bring at least this one to a close.

And then we can test it on real data.

Anyways, if you have questions, comments, concerns, whatever, post them below.

Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.
