
#### What Makes a Good Feature? - Machine Learning Recipes #3

JOSH GORDON: Classifiers are only

as good as the features you provide.

That means coming up with good features

is one of your most important jobs in machine learning.

But what makes a good feature, and how can you tell?

If you’re doing binary classification,

then a good feature makes it easy to decide

between two different things.

For example, imagine we wanted to write a classifier

to tell the difference between two types of dogs– greyhounds and Labradors.

Here we’ll use two features– the dog’s height in inches

and their eye color.

Just for this toy example, let’s make a couple assumptions

about dogs to keep things simple.

First, we’ll say that greyhounds are usually taller than Labradors.

Next, we’ll pretend that dogs have only two eye

colors– blue and brown.

And we’ll say the color of their eyes

doesn’t depend on the breed of dog.

This means that one of these features is useful

and the other tells us nothing.

To understand why, we’ll visualize them using a toy

dataset I’ll create.

Let’s begin with height.

How useful do you think this feature is?

Well, on average, greyhounds tend

to be a couple inches taller than Labradors, but not always.

There’s a lot of variation in the world.

So when we think of a feature, we

have to consider how it looks for different values

in a population.

Let’s head into Python for a programmatic example.

I’m creating a population of 1,000 dogs.

I’ll give each of them a height.

For this example, we’ll say that greyhounds

are on average 28 inches tall and Labradors are 24.

Now, all dogs are a bit different.

Let’s say that height is normally distributed,

so we’ll make both of these plus or minus 4 inches.

This will give us two arrays of numbers,

and we can visualize them in a histogram.

I’ll add a parameter so greyhounds are in red and Labradors are in blue.

Now we can run our script.
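The script described in the narration might look like this. This is a sketch, not the video's exact code: the breed counts and the stacked-histogram styling are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy population: 500 greyhounds and 500 Labradors (counts assumed).
greyhounds = 500
labs = 500

# Heights are normally distributed: 28 +/- 4 inches for greyhounds,
# 24 +/- 4 inches for Labradors.
grey_height = 28 + 4 * np.random.randn(greyhounds)
lab_height = 24 + 4 * np.random.randn(labs)

# Histogram of heights: greyhounds in red, Labradors in blue.
plt.hist([grey_height, lab_height], stacked=True, color=['r', 'b'])
plt.show()
```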

This shows how many dogs in our population have a given height.

There’s a lot of data on the screen,

so let’s simplify it and look at it piece by piece.

Let’s start with dogs on the far left of the distribution– say, those who are about 20 inches tall.

Imagine I asked you to predict whether a dog with this height

was a lab or a greyhound.

What would you do?

Well, you could figure out the probability of each type

of dog given their height.

Here, it’s more likely the dog is a lab.

On the other hand, if we go all the way

to the right of the histogram and look

at a dog who is 35 inches tall, we

can be pretty confident they’re a greyhound.

Now, what about a dog in the middle?

You can see the graph gives us less information

here, because the probability of each type of dog is close.

So height is a useful feature, but it’s not perfect.
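The reasoning above can be made concrete by counting dogs of each breed near a given height in the simulated population. This is an illustrative sketch; the helper name and the one-inch tolerance are assumptions.

```python
import numpy as np

# Simulated heights, as in the example above.
grey_height = 28 + 4 * np.random.randn(500)
lab_height = 24 + 4 * np.random.randn(500)

def p_greyhound(height, tolerance=1.0):
    """Estimate P(greyhound | height) by counting dogs of each
    breed within `tolerance` inches of the given height."""
    greys = np.sum(np.abs(grey_height - height) <= tolerance)
    labs = np.sum(np.abs(lab_height - height) <= tolerance)
    total = greys + labs
    return greys / total if total else 0.5  # no nearby dogs: no information

# A 20-inch dog is most likely a Labrador (value well below 0.5),
# a 35-inch dog is almost certainly a greyhound (well above 0.5),
# and a 26-inch dog in the middle could be either.
print(p_greyhound(20), p_greyhound(35), p_greyhound(26))
```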

That’s why in machine learning, you almost always

need multiple features.

Otherwise, you could just write an if statement

instead of bothering with the classifier.

To figure out what types of features you should use,

do a thought experiment.

Pretend you’re the classifier.

If you were trying to figure out if this dog is

a lab or a greyhound, what other things would you want to know?

Maybe how fast they can run, or how much they weigh.

Exactly how many features you should use

is more of an art than a science,

but as a rule of thumb, think about how many you’d

need to solve the problem.

Now let’s look at another feature like eye color.

Just for this toy example, let’s imagine

dogs have only two eye colors, blue and brown.

And let’s say the color of their eyes

doesn’t depend on the breed of dog.

Here’s what a histogram might look like for this example.

For most values, the distribution is about 50/50.

So this feature tells us nothing,

because it doesn’t correlate with the type of dog.
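That 50/50 histogram is easy to reproduce: if eye color is assigned at random to both breeds, the per-breed proportions come out nearly equal. A sketch (the RNG seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Eye color is independent of breed in this toy example:
# assign blue or brown at random, 50/50, to 500 dogs of each breed.
grey_eyes = rng.choice(['blue', 'brown'], size=500)
lab_eyes = rng.choice(['blue', 'brown'], size=500)

# Both breeds come out near 50/50 for each color, so eye color
# carries no information about the breed.
for color in ('blue', 'brown'):
    print(color,
          (grey_eyes == color).mean(),
          (lab_eyes == color).mean())
```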

Including a useless feature like this in your training

data can hurt your classifier’s accuracy.

That’s because there’s a chance it might appear useful purely

by accident, especially if you have only a small amount

of training data.

You also want your features to be independent.

And independent features give you

different types of information.

Imagine we already have a feature– height in inches–

in our dataset.

Would it be useful if we added another feature, like height in centimeters?

No, because it’s perfectly correlated with one we already have.

It’s good practice to remove highly correlated features from your training data.

That’s because a lot of classifiers

aren’t smart enough to realize that height in inches

and height in centimeters are the same thing,

so they might double count how important this feature is.
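A quick check makes the point: converting inches to centimeters is a linear rescaling, so the correlation between the two "features" is exactly 1. A sketch over a simulated sample:

```python
import numpy as np

# Simulated heights in inches for 500 dogs.
height_in = 24 + 4 * np.random.randn(500)

# "New" feature: the same heights, rescaled to centimeters.
height_cm = height_in * 2.54

# A linear rescaling is perfectly correlated with the original
# feature, so it adds no new information.
corr = np.corrcoef(height_in, height_cm)[0, 1]
print(corr)
```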

Last, you want your features to be easy to understand.

For a new example, imagine you want

to predict how many days it will take

to mail a letter between two different cities.

The farther apart the cities are, the longer it will take.

A great feature to use would be the distance

between the cities in miles.

A much worse pair of features to use

would be the cities’ locations given by their latitude

and longitude.

And here’s why.

I can look at the distance and make

a good guess of how long it will take the letter to arrive.

But learning the relationship between latitude, longitude,

and time is much harder and would require many more training examples.
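To see why, note that distance is itself a nontrivial function of the two coordinate pairs; given only raw coordinates, a model would effectively have to learn something like the haversine formula on its own. A sketch of that derivation (the city coordinates are approximate, for illustration):

```python
from math import radians, sin, cos, asin, sqrt

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3959 * asin(sqrt(a))  # Earth radius is roughly 3,959 miles

# New York to Los Angeles (approximate coordinates):
# a single number a person, or a model, can reason about directly.
d = distance_miles(40.71, -74.01, 34.05, -118.24)
print(round(d))  # roughly 2,450 miles
```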

Now, there are techniques you can

use to figure out exactly how useful your features are,

and even what combinations of them are best,

so you never have to leave it to chance.

We’ll get to those in a future episode.

Coming up next time, we’ll continue building our intuition

for supervised learning.

We’ll show how different types of classifiers

can be used to solve the same problem and dive a little bit

deeper into how they work.

Thanks very much for watching, and I’ll see you then.