
《边学Python边学数据科学》#1 简介 – 译学馆

《边学Python边学数据科学》#1 简介

Introduction - Learn Python for Data Science #1

大家好 我是Siraj
Hello, world! It’s Siraj.
欢迎来到边学Python边学数据科学
And welcome to the Learn Python for Data Science Series.
在这个视频里 我们要搭建Python开发环境
In this video, we’re gonna set up our Python environment
并写出10行脚本
and write a 10-line script.
根据身体测量数据
That can classify anyone as male or female,
将所有人进行性别分类
given just their body measurements.
数据科学是对数据的研究
Data science is the study of data.
数据科学家是通过研究数据
And a data scientist is someone who
来解决问题的人
solves problems by studying data.
所以 基本上所有科学都是数据科学
So, pretty much, all science is data science.
我们会进行观察 做出预测
We observe. We make predictions.
做实验 然后更新我们的理论
We test and we update our ideas.
所以如果我们有过去十年间
So if we were given a data set of
陨石坠落的数据
meteorite landings over the past 10 years,
我们就能想出一些
we could come up with questions that
数据有可能帮助解决的问题
we think the data might help us solve.
比如“哪个区域被陨石撞击的可能性最大?”
Like “What area is most likely to get hit?”
或者“大气压力是如何影响到陨石坠落轨迹的?”
or “How does atmospheric pressure affect meteorite trajectory?”
然后我们就可以编写些代码
Then we could write a little code
来让机器学习数据模型
that trains a machine learning model on that data
并预测出答案
and predicts the answer.
我们可以用一个现存的模型
We can use an existing model.
目前有很多已存的 或者也可以自己建造一个
And there are a lot of them. Or we can build our own.
#我以训练它们为终身事业#
#To train them is my cause#
传统上 你需要有个博士学位才能做到
Traditionally, you need a PhD for this stuff.
但如今世界上的数据每两年就翻一倍
But with the world’s data doubling every 2 years,
计算机学习算法更加强大
and machine learning algorithms getting more powerful,
任何人都能成为数据科学家
anyone can become a data scientist.
你只需要拥有时间和动力
You just need time and motivation.
如果你这两样都有
If you have those two things,
你就能完成一堆数据科学项目
you’d be able to complete a bunch of data science projects
并把它们上传到你的GitHub上
and upload them to your GitHub.
GitHub就是新的简历
GitHub is the new resume.
这上面不是说你学历多高
It’s not about how many degrees you have.
而是你能做什么
It’s about what you can do.
计算机学习让科学发现大众化了
Machine learning democratizes scientific discovery.
我正在跟你说呢
Ok, I’m talking to you.
对 就是你 坐在前面的
Yes, you, sitting right there.
你也能成为一个数据科学家
You can be a data scientist.
任何人都能!哈哈哈哈哈
Anyone can! Hahahahaha…
我们用来从数据中学习的工具就是
And the tool we’re gonna use to help us learn from our data
Python编程语言
is the Python programming language.
我要教给你的是Python
I’m gonna teach you Python.
但不仅是通过讲授语法
But not just by talking about syntax.
你会在实践中学习
You’ll learn it by doing.
在每期视频里
In each episode,
我们都会专注于不同的数据科学项目
we’re gonna focus on a different data science project.
我会在视频末尾
I’ll give you a coding challenge at the end
给你个拓展项目的编程问题
that extends that project
你就能通过这些实践学习Python了
and you’ll learn Python along the way.
我选择Python有以下两个原因
I’m picking Python for two reasons:
非常具有可读性 并且适用于通用性目的
it’s designed for readability, and it’s general purpose.
看看这个语音识别应用程序
Check out this speech recognition app.
它用了一个叫斯芬克斯的库文件
It uses a library called Sphinx
来读取音频文件
to read an audio file,
将其转换成文本 然后打印出来
convert it to text, and print it out.
这就只有5行代码
That’s just 5 lines of code.
我们依旧能看得出来这是做什么的
We can still read what it’s doing
因为每个单词都很简洁 具有描述性
since every word is descriptive and compact.
现在 来看看用C++语言写的一个相似应用
Now, let’s look at a similar app in C++.
大概有100行
That’s about 100 lines.
#我以为我爱你 我爱你#
#I thought I love you. I love you.#
太感人了
So beautiful.
要建立起性别分类应用
To build up our gender classification app,
一共有四步
there are 4 steps.
安装Python 搭建开发环境
We’ll install Python, set up our environment,
安装依赖包 写出Python脚本
install our dependencies and write the Python script.
我们先从安装Python开始
Let’s start by installing Python.
如果你有Mac或者Linux系统
If you own a Mac or Linux machine,
Python已经预安装好了
Python comes pre-installed.
如果你用Windows 则没有Python
If you’re on Windows, it doesn’t.
唷!什么玩意儿
Yo! What the f…
不管怎样 你要是想下载Python最新版本
Regardless, you’ll want to download the latest version of Python,
截止今天的是3.5.2版本
3.5.2 as of today.
在Mac系统上 你可以下载安装包
On Mac, you can download the Installer package
经过安装的必要步骤
and go through the necessary steps to install it.
就能从终端编写你的脚本了
Then you’ll be able to run your scripts from the terminal
通过像这样的Python关键词
using the python command like so.
在Linux系统上 你可以下载源
On Linux, you can download the source.
然后在终端输入三条命令来安装它
Then in terminal type in three commands to install it.
然后你就能通过Python关键词运行Python脚本了
You’ll then be able to run Python scripts using the python command.
在Windows系统里 你可以去下载安装程序
On Windows, you can go download the Installer,
要确保Python安装程序的路径
make sure the “Add python.exe to Path” option
被设置成安装到本地硬盘上
is set to be installed on your local hard drive.
一旦安装完成 你就能从命令行
And once it’s finished, you can run Python
运行Python了
right from command line.
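A quick way to confirm the install worked is to run a tiny script with the new interpreter. The sketch below is only an illustration; the file name `check_python.py` and the helper name `describe_interpreter` are assumptions, not something shown in the video.

```python
# check_python.py (hypothetical file name)
# Run it from a terminal with: python check_python.py
import sys


def describe_interpreter():
    """Return a short string naming the version of the running interpreter."""
    v = sys.version_info
    return "Python {}.{}.{}".format(v.major, v.minor, v.micro)


print(describe_interpreter())
```

If the terminal reports a 2.x version, the `python` command is probably pointing at an older interpreter; on many systems `python3` selects the newer one.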
现在我们已经安装好Python了
Now that we have Python installed,
接着我们来搭建环境吧
let’s set up our environment.
我们这堂课会用到的文本编辑器
The text editor we’ll be using for this course
就是Sublime Text
is Sublime Text.
因为用起来超级简单
Since it’s super simple to use.
那Emacs怎么样?
But what about Emacs?
不要
No.
Mac和Windows里都有安装程序
Both Mac and Windows have an installer
可用于安装
that you can use to install it.
至于Linux 你可以在apt-get包管理器中
For Linux, you can install it via the apt-get package manager
输入这三条命令来安装
with these three commands.
一旦环境搭建完成 我们就能在这里写Python代码
Once we have it installed, we can type our Python code in there
并通过将Python编译器指向我们的脚本
and run it from the terminal
在终端进行编译
by pointing our Python interpreter to our script.
就这样
That’s it.
我们只需要终端和文本编辑器来运行我们的脚本
We only need terminal and our text editor to run our scripts.
那么我们已经搭建好环境了
So we’ve got our environment set up.
让我们来安装依赖包吧
Let’s move on to installing our dependencies.
依赖包是我们代码所依赖的包
Dependencies are packages that our code depends on.
我们在每个脚本的最顶端都会引入
We import them at the top of each script
写上引入语句
we write with the import statement.
比如说 任何程序员都能编写一个包的程序
Any programmer can write a package
在一千行代码中
to, say, figure out who shot Harambe
找出是谁射杀了哈兰贝
in a thousand lines of code.
将依赖包上传到Python包服务器
Upload it to the Python package server and
我们就能够下载并用一行代码引入
we could download it and call it with a single line of code.
所有代码都是一个更大整体的一部分
All code is part of a greater whole.
这些代码通过依赖关系连接起来
It’s all linked together in a grand chain of dependencies.
就像建一座房子
It’s like building a house.
如果你已经有了依赖包
In order for you to be able to build the roof of a house,
就能让你更方便地建造出房顶
it’d be nice if you already had the dependencies.
Python包管理器pip
The Python package manager pip
能帮我们安装好依赖包
helps us install dependencies
直接在命令行就能用pip
and we’ll use it right from command line.
你可以为Python3安装pip
You can install pip for Python 3
不管你是什么系统
using these commands for whichever
都能用这些命令来安装
operating system you’re using.
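The exact commands shown on screen are not captured in this transcript. As a rough sketch, once pip is available a package is typically installed with `pip install <package-name>`, and a script can check whether a dependency is importable before using it. The helper name `is_installed` below is an assumption made for illustration.

```python
# Check whether a package can be imported, without actually importing it.
# Typical install command, typed in a terminal rather than in Python:
#   pip install scikit-learn
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; "sklearn" is the import name of scikit-learn.
for name in ("json", "sklearn"):
    print(name, "installed" if is_installed(name) else "missing")
```

Note that the install name (`scikit-learn`) and the import name (`sklearn`) differ, which is a common source of confusion.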
我们这个视频里用来搭建性别分类程序的
The only dependency we’ll be using in this video
唯一依赖包就是Scikit-learn
to build our gender classifier is scikit-learn,
这是一个机器学习程序包
a machine learning package with a bunch of
里面有很多已经搭建好的模型可供使用
pre-built models for us to use. Dope.
酷毙了 我们现在已经安装好依赖包
We have our dependencies installed
是时候来编写脚本了
and now we’re ready to write our script.
首先从引入开始
We’ll start by importing it first,
所有依赖包都要这样做
as we should for all dependencies.
我们将会用Scikit-learn里
We’re going to use a specific submodule
一个叫做“树”的子模块
of scikit-learn called tree.
这能让我们建起一个叫“决策树”的机器学习模型
That will let us build a machine learning model called a Decision Tree.
决策树就像一个储存数据的流程图
A Decision Tree is like a flowchart that stores data,
它会向每一个由它衍生的标签数据点
it asks each labeled data point it receives
问一个“是否”问题
a yes-or-no question.
它是否包含X?
Does it contain X or not?
如果答案为“是” 数据就流向一个方向
If the answer is “Yes”, the data moves one direction.
如果答案为“否” 则流向另一个方向
If the answer is “No”, it moves in the other.
它接收到越多数据点
It’ll build more nodes in the tree
就会在树上生成每个分岔的节点
the more data points it receives.
然后 当我们有了一个新的未标签数据点时
Then when we have a new unlabeled data point,
我们就能补充到树上
we can feed it to the tree,
它会问一系列问题直到它被贴上标签了
it’ll ask it a series of questions until it labels it.
那个标签就是我们的分类
That label is our classification.
我们用于训练这个模型的数据越多
The more data we train it on,
分类就越准确
the more accurate the classification.
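To make the flowchart idea concrete, here is a hand-written sketch of what a very small decision tree amounts to: a chain of yes-or-no questions ending in a label. The thresholds and the function name `classify` are invented for illustration; a trained model picks its own questions from the data.

```python
def classify(height_cm, weight_kg, shoe_size):
    """Label one person by asking yes-or-no questions, like a tiny decision tree.

    The thresholds are hypothetical, chosen by hand for this illustration.
    height_cm is accepted but never asked about, since a tree may ignore a feature.
    """
    # Question 1: is the shoe size at least 42?
    if shoe_size >= 42:
        return "male"
    # Question 2: is the weight at least 65?
    if weight_kg >= 65:
        return "male"
    # Both answers were "no".
    return "female"


print(classify(181, 80, 44))
print(classify(160, 60, 38))
```

Each `if` is one node of the tree, and each `return` is a leaf holding a label.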
让我们用编程方式来建造我们的数据库吧
Let’s start by creating our data set programmatically.
我们先写上变量X 作为名单的集合
We’ll write our first variable X as a list of lists.
变量就是一个可变的值
A variable is a value that can change and
我们会存储一个集合的名单
we’ll store a list of lists in it.
名单是Python里的一种能存储值序列的数据类型
A list is a data type in Python that can store a sequence of values.
在这里 每一个值本身就是一个名单
Here, each value is a list itself
该名单包含三个分别代表身高
that contains three numbers that represent the height,
体重和鞋子码数的数字
weight and shoe size of a person.
我们写出11个 即我们的数据库只包含11人
We’ll write 11 of these, so our data set size is only 11 people.
写多一个变量Y来存储一个标签名单
We’ll write one more variable called Y to store a list of labels.
每个标签都是一种性别
Each label is a gender and is associated with
且都与之前名单中的身体数据相关联
a list of body metrics in the previous list.
那么 将它们写成字符串这种数据类型
We’ll write them as strings, which is a data type
用来代表文本 而不是数字
used to represent text instead of numbers.
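The data set described above might look like the sketch below. The transcript does not record the actual numbers typed in the video, so these values are illustrative, and the units (centimetres, kilograms, European shoe sizes) are an assumption.

```python
# X: a list of lists, one inner list per person: [height, weight, shoe size].
# Values are illustrative; units (cm, kg, EU shoe size) are assumed.
X = [
    [181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37],
    [166, 65, 40], [190, 90, 47], [175, 64, 39], [177, 70, 40],
    [159, 55, 37], [171, 75, 42], [181, 85, 43],
]

# Y: one string label per person, in the same order as X.
Y = [
    "male", "male", "female", "female",
    "male", "male", "female", "female",
    "female", "male", "male",
]

# The i-th label belongs to the i-th list of measurements.
for measurements, label in zip(X, Y):
    print(measurements, label)
```

Keeping X and Y the same length and in the same order is what ties each label to its measurements.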
现在我们已经有了数据库
Now that we have our data set
接下来要定义一个变量来存储我们的决策树模型
I want to define a variable to store our Decision Tree model.
我们就命名为“clf” 就是分类器的简写
Let’s call it clf, short for classifier and
这个变量会存储我们的决策树分类器
it’ll store our Decision Tree classifier.
我们就能通过调用该变量 来直接引用树依赖包
We can reference our tree dependency directly by calling it here,
然后通过在树对象中
then initialize the Decision Tree,
调用决策树方法 来将决策树初始化
by calling the Decision Tree method on the tree object.
现在已经有了树的变量
Now that we have our tree variable,
我们就能用数据库来训练这个树了
we can train it on our data set.
在分类器变量中调用fit方法
We’ll call the fit method on the classifier variable
这需要两个参数
which takes two arguments.
我们将变量X和Y存储为参数
We’ll pass our X and Y variables as the arguments
结果就会被存储在更新了的clf变量中
and the result will be stored in the updated clf variable.
这个fit方法用我们的数据库训练了决策树
The fit method trains the Decision Tree on our data set.
让我们来根据给出的具体身体测量数据
Let’s test it by classifying the gender of someone
进行性别分类测试吧
given a new list of body metrics.
我们要建一个叫做预测的变量
We’ll create a variable called prediction
用来存储结果
to store the result and call
调用我们决策树中的预测方法
the predict method of our Decision Tree to
来预测根据三个值系列得出的性别
predict the gender given these three values in a list.
然后用print命令
Then we can print it out
将结果输出到终端
to the terminal with the print command.
通过把脚本存储为demo.py文件
We can run the script in terminal by saving it as
并用Python demo.py命令运行
demo.py and running it
就能在终端运行脚本了
via the python demo.py command.
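Putting the steps together, a script along the lines described could look like the sketch below. It assumes scikit-learn is installed; the measurements and the test sample are illustrative values rather than the exact ones from the video.

```python
# demo.py: a sketch of the script walked through above.
from sklearn import tree

# [height, weight, shoe size] per person (illustrative values, assumed units).
X = [
    [181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37],
    [166, 65, 40], [190, 90, 47], [175, 64, 39], [177, 70, 40],
    [159, 55, 37], [171, 75, 42], [181, 85, 43],
]
Y = [
    "male", "male", "female", "female",
    "male", "male", "female", "female",
    "female", "male", "male",
]

# Create the decision tree classifier, then train it on the data set.
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

# predict expects a list of samples, hence the double brackets.
prediction = clf.predict([[190, 70, 43]])

print(prediction)
```

The printed result is an array holding a single label. With only 11 training samples, the label for an unseen person should not be treated as reliable.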
那么 为了更好理解
So, to break it down,
数据科学家用数据解决问题
data scientists solve problems using data.
因为用机器学习库非常简便
And because it’s easy to use machine learning libraries
且大量数据都随处可取
and abundant data are now available everywhere
你也能成为数据科学家
you can become one.
Python是一种同时适合初学者和专家
Python is a programming language for
并着重于可读性的编程语言
both beginners and experts and emphasizes readability.
决策树是给数据进行分类的模型
And a Decision Tree is a model
它通过为每个可能出现的结果创造分支来实现
that classifies data by creating branches for every possible outcome.
本期视频的难题是
The challenge for this video is to use any
用同一组数据库 训练Scikit-learn程序包中
three different classifiers from the scikit-learn package
三个不同的分类器
on this same data set,
比较它们的结果
compare their results,
然后最终输出最符合的那一个
then print the name of the best one.
有时你得多试几个模型
Sometimes you have to try a few models to see
才能看出哪个模型预测最准确
what gives you the most accurate predictions.
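One possible shape for the challenge is sketched below; it is not the video's solution. The choice of these three classifiers is an assumption, the data values are illustrative, and accuracy is measured on the training data itself only to keep the example short. A held-out test set would give a fairer comparison.

```python
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# [height, weight, shoe size] per person (illustrative values, assumed units).
X = [
    [181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37],
    [166, 65, 40], [190, 90, 47], [175, 64, 39], [177, 70, 40],
    [159, 55, 37], [171, 75, 42], [181, 85, 43],
]
Y = [
    "male", "male", "female", "female",
    "male", "male", "female", "female",
    "female", "male", "male",
]

# Three classifiers picked for illustration; any scikit-learn classifiers would do.
classifiers = {
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=3),
    "GaussianNB": GaussianNB(),
}

# Train each model and score it on the same data it was trained on.
scores = {}
for name, model in classifiers.items():
    model.fit(X, Y)
    scores[name] = accuracy_score(Y, model.predict(X))

# Pick the name with the highest score; ties go to the first one listed.
best = max(scores, key=scores.get)
print(best, scores[best])
```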
在评论区发出你的GitHub链接
Post your GitHub link in the comments.
我会在一周内选出最佳者
I’ll pick a winner within one week
在下期视频中点名表扬
and mention them in the next video.
如果你喜欢本期视频 请转发
Please share this video if you liked it
关注我 就能看到更多编程视频
and subscribe for more programming videos.
至于现在 我得去喝点Soylent了
For now, I’ve got to drink some Soylent.
那么 感谢收看
So, thanks for watching.


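Translation credits
Video overview

A brief introduction to Python and its installation steps. Starting from installing and configuring the programming environment, the video builds a small gender classification script in Python.

Transcriber

徘徊的小孩

Translator

One静茹

Reviewer

Reviewer W

Video source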

https://www.youtube.com/watch?v=T5pRlIbr6gg&t=11s
