What is going on everybody and welcome to part 15 of the machine learning with Python tutorial series.
In this tutorial we’re going to be building on the last couple,
in which we’ve been talking about K nearest neighbors.
We talked about the intuition. It’s basically
how close is this point to the K closest points.
And whatever the majority class of those K closest points is,
we say this new point is that class.
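As a rough illustration of that intuition, here is a minimal sketch of the voting step only. It assumes the distances have already been computed; the function name `vote` and the example data are made up for illustration and are not from the series, which builds its own version in later parts.

```python
from collections import Counter

def vote(neighbors, k=3):
    """neighbors: list of (distance, label) pairs.
    Returns the majority label among the k closest pairs."""
    # Sort by distance and keep only the k nearest.
    closest = sorted(neighbors, key=lambda pair: pair[0])[:k]
    labels = [label for _, label in closest]
    # most_common(1) gives [(label, count)] for the most frequent label.
    return Counter(labels).most_common(1)[0][0]

# Made-up distances and class labels, purely for illustration.
example = [(0.5, "a"), (2.0, "b"), (0.7, "a"), (1.1, "b"), (3.0, "b")]
print(vote(example, k=3))  # a
```

With k=3 the three closest points are two "a" and one "b", so the new point is classed as "a".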
And in the previous tutorial what we did was exactly that:
apply K nearest neighbors to a real-world data set.
We found that it’s actually fairly accurate, which was very cool.
So now we’re going to break down the K nearest neighbors algorithm
and rewrite it ourselves from scratch in code.
But first we have to cover what everything hinges on, right?
It hinges on this distance.
So what is that distance? It is Euclidean distance.
So what is Euclidean distance?
It is of course named after Euclid, the famous mathematician
popularly referred to as the father of geometry.
He definitely wrote the book on it, right? Euclid’s Elements,
which is arguably the Bible for mathematicians and scientists.
Also, a fun fact: you know,
whenever someone would set up a printing press,
the first thing they’d start popping out was the Bible, of course.
And then the second thing was most likely Euclid’s Elements.
So anyways what is Euclidean distance?
First, what we have is the sum to n.
And in this case ‘n’ represents the number of dimensions in your data.
So just think of n in this case as the dimensions,
but really this means sum to n, where i
starts off as being equal to 1.
OK. So it really just means i starts at 1 and goes up to n, where i actually
indexes your dimensions.
So if you just have one dimension, you would just do this one time.
And it’s the sum of what?
Let’s do parentheses here. It is going to be
(q_i – p_i)²
And then we take the square root of this entire calculation.
And this is Euclidean distance.
So i is just the dimension index, q is one point, and p is a different point, right?
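Written out on one line, the formula just described looks like the following. This is only a restatement of the spoken definition, with q and p as the two points and n as the number of dimensions; the name d for the distance is a label added here.

```latex
d(q, p) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}
```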
So in theory, if you got rid of n and i,
if you got rid of the whole sum and just left the parentheses part...
I hate to circle things like this and mess it up.
But if you just left the whole parenthesis squared, under the square root,
that would be the calculation for
a one-dimensional distance between two points in Euclidean space, anyways.
But now let’s actually break this down into simple mathematics.
I always like to do things by hand, at least for some things; we won’t always do it by hand.
We won’t actually do the calculation, but I’ll show you how you would plug it in at least.
So we’ll start off and say q is equal to (1, 3).
So these are the coordinates for our data point.
And then the coordinates for p, x and y, so this is two dimensions,
are (2, 5). Those are the coordinates.
So then how would we calculate the Euclidean distance?
Well it’s going to be the square root of
basically a couple of things. So we know we have two dimensions.
So we know basically what it’s going to start off as. It will be something like…
you know, the square root, without that dot there.
A square root, and we know we’re going to have at least two of these parentheses.
Right? Because we’ve got two dimensions.
And recall it’s the summation of these, so there’ll be a plus here.
And then this will be squared and this will be squared. And then we just need to fill them in.
So initially it’ll be q1 – p1. So it would be 1 – 2, right?
So 1 here, and then over here we’ll just put a 3. And then it’s minus, minus, 2 and 5.
So that’s the square root of (1 – 2)² + (3 – 5)².
And that would be the Euclidean distance.
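The calculation is not carried through by hand in the video, but finishing the arithmetic for q = (1, 3) and p = (2, 5) is short, and it agrees with the value printed later in Python:

```latex
\sqrt{(1 - 2)^2 + (3 - 5)^2} = \sqrt{(-1)^2 + (-2)^2} = \sqrt{1 + 4} = \sqrt{5} \approx 2.236
```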
Okay. So simple enough. Let’s head over to Python and actually create this.
So in Python here,
let’s just recreate exactly what we just did by hand.
So instead of q and p let’s say plot1
equals…and we’ll do [1, 3].
And plot2 equals [2, 5]. Okay?
Now we’re going to…Let’s go up to the top and say from math import sqrt
which is just importing the square root function.
So coming back down here.
Converting this to Euclidean distance or basically
calculating the Euclidean distance between these two plots
is the following.
So euclidean_distance = sqrt
So remember it’s the square root of the sum,
over each of the dimensions, of one plot’s value minus the other plot’s value in that same dimension,
for the two plots. Really, you’re going to calculate the distance between two plots.
So in this case it would be for example
plot1’s zeroth element, so the x of plot1, minus the x of plot2, right?
So minus plot2’s zeroth element. Okay? So that’s one.
And remember it was the sum of all of these. So it would be that plus
and then basically the exact same thing only instead of the 0 it would be the 1, all right? So 0,1
So you can think of these as your dimensions, right? So this is dimension 0
and this is dimension 1. So this is two dimensions as indeed it is.
So that would be the i in that equation. Just for the record.
So anyways Euclidean distance. Boom! Done. Let’s go ahead and…
Oh, these also need to be squared.
So that squared and this squared.
Right? So that is squared, this is squared, and then for the entire operation
we grab the square root of that. So now let’s print euclidean_distance.
So we get 2.2360 and so on.
But basically that is your Euclidean distance.
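Put together, the code typed out in this section would look roughly like the following. The names plot1, plot2 and euclidean_distance come from the video; storing the points as lists is an assumption, since the transcript only says "1,3" and "2,5".

```python
from math import sqrt

# The two points from the by-hand example (q and p in the formula).
plot1 = [1, 3]
plot2 = [2, 5]

# Square the difference in each dimension (index 0 is x, index 1 is y),
# add the squares together, then take the square root of the total.
euclidean_distance = sqrt((plot1[0] - plot2[0])**2 + (plot1[1] - plot2[1])**2)

print(euclidean_distance)  # 2.23606797749979
```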
So now that we know how to calculate Euclidean distance,
we basically have the crux of everything we need
to do K nearest neighbors.
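The two-dimensional version hard-codes index 0 and index 1. For data with more dimensions, the same sum can be written as a loop over i. This is a minimal sketch, not the code from the series; the function name `euclidean_distance_nd` is hypothetical.

```python
from math import sqrt

def euclidean_distance_nd(q, p):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(q) != len(p):
        raise ValueError("points must have the same number of dimensions")
    # zip pairs up q_i with p_i; the generator plays the role of the sum over i.
    return sqrt(sum((qi - pi) ** 2 for qi, pi in zip(q, p)))

print(euclidean_distance_nd([1, 3], [2, 5]))        # same two points as before
print(euclidean_distance_nd([0, 0, 0], [1, 2, 2]))  # 3.0
```

On Python 3.8 and later, the standard library’s math.dist computes the same quantity, so a hand-written version like this is mainly useful for seeing how the formula maps to code.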
But we still have kind of a lot of framework to create regardless.
So that’s what we are going to do in the next tutorial:
creating the framework that will take a data set and use K nearest neighbors to classify points.
So if you have any questions or comments up to this point,
feel free to leave them below. Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.