
#### Testing Assumptions - Practical Machine Learning Tutorial with Python p.12

What is going on subscribers and others.

Welcome to part 12 of our machine learning tutorial series.

In this tutorial what we’re gonna be talking about is

testing our assumptions. So

up until this point it’s been…I would say rather hand-wavy

in the sense that I have just said hey these are the algorithms

and whatever the output. These are the answers to those algorithms and

we have done linear regression and R squared all this.

And so the question is we need to actually kind of test

all of these assumptions. So we’ve got actually

two major algorithms. One is the equation for the best fit line

and the other one is the R squared or coefficient of determination.

So we’ve got these two major algorithms that are also comprised of many other algorithms

as we even saw just a few videos ago.

The misplacement of a single parenthesis changes everything and completely ruins the entire thing.

So we need to be able to test to make sure things are working as intended.

So in the world of programming this is…

There’s a similar kind of field and structure called

unit testing, where we, you know, test each little small unit that we can

in a program, and this kind of helps keep us from getting into trouble.

Now this is not going to be unit testing but the idea is fairly similar.

We’ve got a lot of ideas. We’ve got a lot of inner working parts.

and we want to at least test them to make sure.

The easiest way we can do that is by working with sample data.

And by sample data, I mean data that we have the power to change.

So that we can create a data set that is a more linear data set.

Or at least where the relationship is more linear.

And then we can test to make sure: is R squared higher, right?

And then also just test our best fit line.

But for the most part we’re actually going to be testing R squared.

And if we make the data less linear, more spread apart,

R squared should be lower, and so on. So anyways.

Let’s go ahead and do that.

And we can also confirm visually that the best fit line is indeed working just by looking at it

and seeing whether or not it is indeed what looks to be a best fit line.

So first what we’re going to go ahead and do is
import random. Because we’re going to be using random numbers.

Everybody, the obligatory ‘pseudo-random’.

If you don’t say it’s pseudo-random, someone absolutely feels the desire and urge to comment and say:

‘But it’s not real random.’ So anyways, pseudo-random, there you go.

You nitpickers. Okay. So

what we’re going to do is just right under here.

Let’s…We’re going to say define create_dataset().

Then here we’re going to have…we’re going to pass some parameters.

First is hm, as in how many data points do we actually want to create here.

And then we’re going to say we’ll pass variance.

And this will be how variable do we want this data set to be.

Then we’re gonna pass step.

And step will just be how far on average
to step up the y value per point.

And we’ll assign a default value there.

And then finally we’re gonna do correlation.

And this is where we can just pass a value and say we want correlation to be positive

negative or none and

What we’re gonna do here is correlation or… hold on. So correlation will either be true or false.

And then if it is true, to get a positive correlation, step

will just be some positive number, right? Because that’s changing y.

And to be a negative correlation you would just change this to a negative number, right?

So…And in fact another way we could do it is we could actually say correlation is positive

or negative, and if it’s negative you negate the step.

That’s actually probably a better way to do it. Either way would work but we’ll do that way actually.

So the first thing that we’re going to do is…

Well, we would want to be able to… At the end of this… I always like to build the skeleton function first.

So at the end of it, what is the objective? And that would be to return the numpy array

of the xs, and for now, again, we will specify the data type.

So we don’t forget this later on. Because it’s probably going to be useful later on.

So we’ll say float64.

So that returns the x’s and then we also need to return
the y values. So ys, and then dtype equals

np.float64. Okay.

So that’s the objective that we want to do and then now what we want to do

is create some…start creating at least some random values.

So the first thing we’re gonna say is we’re going to start with val equals 1.

So that’s just going to be the first value for y basically.

And then we’re just going to say ys is this: an empty list.

And then we’re going to…we could say something like for i in range of…

And how many… what should this range be? Well, it should be hm, for how many, right?

So for range hm what are we going to do. Well we’re going to say y
equals the val plus random.randrange().

And it should be random.randrange from the negative variance to the positive variance.

So some range in there is what we want to do first.

And then we’re going to say ys.append that y.

So here we would just be iterating through the range using that how much variable.

And then we’re just appending that current value plus a random one.

So this would give us data, but really no correlation, if that’s the data we actually wanted.

So then what we would ask is…So keep in mind that
y is literally the val. So it’s just that starting value.

And then our variance from that starting value.

So this would be pretty worthless at the moment.

It would just be somewhat varied but not by much.

Well it depends on what you said the variance was. Anyway.

So then what you could say now is
if correlation and correlation equals

positive. What we could do is val plus equals step
which in this case would default to 2.

And then elif correlation and correlation equals

negative. What do we want to do? Well val minus equals step.

Finally at the end of the day what all we’re going to do is now we’ve got the y’s so we just need some x’s.

So you could say something like x’s equals

And we’ll just do a one line for loop. i for i in range of what
the len of ys.

That’s good enough, or you could use hm there for that matter. Anyway.

So now we’ve got what we need and we’re returning some x’s and y’s.
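Putting the steps above together, here's a minimal sketch of the create_dataset function as described in the narration (the 'pos'/'neg' strings and the default step of 2 follow what was said; everything else is a straightforward reconstruction):

```python
import random
import numpy as np

def create_dataset(hm, variance, step=2, correlation=False):
    """Build hm sample points: y starts at val=1 and wanders within
    +/- variance; if correlation is 'pos'/'neg', val trends up/down
    by step per point."""
    val = 1
    ys = []
    for _ in range(hm):
        # current baseline plus some (pseudo-)random noise
        y = val + random.randrange(-variance, variance)
        ys.append(y)
        if correlation and correlation == 'pos':
            val += step
        elif correlation and correlation == 'neg':
            val -= step
    # xs are just the indices 0..hm-1
    xs = [i for i in range(len(ys))]
    return np.array(xs, dtype=np.float64), np.array(ys, dtype=np.float64)
```

With a small variance and a positive correlation, the ys should trend clearly upward from start to finish.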

So to create a sample data set we could do something

like…and for example let’s…

We can leave this here for now but

I’m gonna comment it out just so we know that we’re working with our new data instead.

So underneath this you could create a new data set, but I guess we’ll create it

down here, underneath all these other functions.

So you could say something like xs, ys equals create_dataset.

And then, let’s see… recall that the parameters are hm, the variance,

the step, and the correlation.

So let’s say we said we want 40 data points

with variance of 40. The step will be 2.

And correlation we’ll make that positive.

So now we have x’s, y’s. We can print R squared and all that fun stuff.

And let’s go ahead and run that real quick. And in fact

are we still…We’re still graphing that prediction. So…

Let’s… we’ll get rid of the prediction. Actually, we could leave the prediction; that might be kind of interesting.

For now…

we might run into trouble. I’m not really sure if

we’re gonna get in trouble for that or not. But we’ll just do that.

And let’s run it and see.

We might have to change something else but I think that would be everything we would change.

Awesome.

So here’s our data set and sure enough there’s a nice best fit line for us.

And we see that

We would kind of agree with that visually.

Let’s go and graph that other plot though, that one.

And this will be the prediction, in green.

I don’t even see it. It was for x equals 8.

I guess it would be right on the line.

And then we’re plotting the regression line. So

I’m guessing the line is just going right over it probably. It’s just being drawn over it.

Still not seeing it however.

It was x equals 8, right? So it should be that…It’s probably this little plot right here. I’ll zoom in.

It’s there. I don’t know if you’ll be able to see that on the video.

But there is indeed a plot there, and in fact we could do something like

I think with scatter it’ll be s equals. And then let’s try 100.

So this is like for the size.

And indeed there is a huge green dot there. So okay.

Anyway. So there’s our prediction as you should expect it’s perfectly on the line.

So we’re going to close this out and…

So now how would we test our assumption?

Well recall that we’ve got how much and then variance.

So if I said…If I took variance which is currently 40.

And we saw that it was like 0.5, I think, for R squared. Let’s look at it again.

Well, since it’s random data, this time it was 0.6. Okay. So in theory

if we decrease the variance. What should happen?

Well what should happen is that number should go down

pretty significantly so long as we decrease variance significantly. So let’s do it.

Let’s do 10. We can save and run that.

And as you can see it’s much tighter. Everything’s there and sure enough

the coefficient of determination is very very strong. It’s 0.92 much better than before.

What if we change this to an 80 now. It should be less than 0.6.

And sure enough it is less than 0.6. And so

what you can begin to do is automatically

write a program that simply calculates the coefficient of determination

for just a sample dataset.

And you would just make sure for example that you’d start with 40.

Save that number and then you would change that to 10.

And hopefully the coefficient of determination was less than this initial number.

And then if you went greater it should be greater and so on.

That would be a way to test just that. We’ll call it a unit.

In theory you could build a unit test out of this. But this isn’t quite yet a unit test. But anyway.
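As a sketch of that idea: using the create_dataset function built above, plus the best-fit-line and R squared functions from the earlier parts of the series (reconstructed here so the snippet stands alone), we can assert that tighter data produces a higher coefficient of determination than looser data. The specific hm, step, and variance values are just illustrative choices:

```python
import random
import numpy as np

def create_dataset(hm, variance, step=2, correlation=False):
    val, ys = 1, []
    for _ in range(hm):
        ys.append(val + random.randrange(-variance, variance))
        if correlation == 'pos':
            val += step
        elif correlation == 'neg':
            val -= step
    xs = np.arange(len(ys), dtype=np.float64)
    return xs, np.array(ys, dtype=np.float64)

def best_fit_slope_and_intercept(xs, ys):
    # least-squares slope and intercept, in the form derived earlier
    m = (((np.mean(xs) * np.mean(ys)) - np.mean(xs * ys)) /
         ((np.mean(xs) ** 2) - np.mean(xs ** 2)))
    b = np.mean(ys) - m * np.mean(xs)
    return m, b

def coefficient_of_determination(ys, ys_line):
    # r^2 = 1 - SE(regression line) / SE(mean of ys)
    se_line = np.sum((ys - ys_line) ** 2)
    se_mean = np.sum((ys - np.mean(ys)) ** 2)
    return 1 - se_line / se_mean

def r_squared_for(variance):
    xs, ys = create_dataset(100, variance, step=5, correlation='pos')
    m, b = best_fit_slope_and_intercept(xs, ys)
    return coefficient_of_determination(ys, m * xs + b)

random.seed(0)  # pseudo-random, so the comparison is repeatable
r2_tight = r_squared_for(10)
r2_loose = r_squared_for(200)
# tight data should fit its best fit line far better than loose data
assert r2_tight > r2_loose
```

This is the assumption test in miniature: lower variance should mean higher R squared, and the assert fails if it ever doesn't.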

So you can test that and then sure enough the other thing you could do is…

Well, we had a positive correlation. If we change this to false.

We should get quite an ugly data set. Sure enough we do.

And the coefficient of determination is almost zero.

which is absolutely not surprising.

Because that almost looks like a completely flat line.

And sure enough this data is completely non-linear.

So if you did have a data set and you were trying to run linear regression on this data set.

And you came back with an R squared that was this number that’s like 0.0007.

You would probably be smart enough to decide: ‘hey my data is actually not linear’.

We can’t quite do linear regression with this data.

That said you can do other forms of classification with the data

or not just classification.

But you know, other forms of machine learning. I’m thinking classification,

where your data doesn’t necessarily have to be linear.

And in fact a lot of classification is…should be linear in some way. But we’ll get there.

Anyway that’s enough for now I think.

But just kind of keep in mind that when you create

big scripts like we have here, and big programs that are kind of based on a lot of things,

you want to make sure that everything is about right.

We could check the best fit line ourselves kind of visually.

But R squared we could not really totally test that.

But you could definitely program something that would go through.

like I was saying

check to make sure R squared was acting

according to our assumption or our knowledge of how it ought to act.

So we’re basically done with regression.

But I want to make a quick edit to this video to cover two pretty important things.

One is a fundamental aspect of machine learning

that might be getting overlooked in the really simple example that we’ve used here.

And then two I made an error that I think is bad enough that we want to cover it

plus I think you can learn a little bit from the mistake that I made.

So let’s pop over to the code and address these two things.

Hopefully pretty quickly.

So first of all, looking at the data.

I’m going to change this from 1% basically to 10% now.

We’re going to run that.

And we’re going to see that it’s basically an exact copy of

the data leading up to it, just shifted in price a bit, right?

So coming over here.

It’s basically the same.

This version is squished up a little bit.

And that’s just because the blue line is the prediction line

that plots even on the weekends and holidays.

Whereas over here the stock price only occurs during Monday to Friday and not on holidays as well.

So anyways basically an exact match.

Just higher in price.

And the reason is kind of twofold.

One we’ve created a linear model that is going to attempt to do this.

But then also we’ve made a mistake.

So we’ll address kind of both. But anyway.

The first thing is the biggest mistake.

Actually there were two mistakes.

One I noticed in the video just going back over it.

I’m pretty sure it was here. There was also a colon at the end of the X slice.

I don’t know why that was there. No one actually pointed that one out.

I just happened to see it right before filming this one anyway.

That basically is X equals X, right?

All that says is X up to forecast_out and then finish the whole thing, right?

That doesn’t do anything.

So that was just a typo.

But then you get to this point.

And we’re still kind of in a world of hurt.

Because X…What we were intending to do is say X is

the first…Let’s say in this case it’s 10%.

Yeah. So we’re saying X is the first 90% of the data.

This is the stuff we’re going to train against.

And then we’re saying X_lately and our objective here was to say
X_lately is the last 10%.

But instead what we’ve done is we’ve sliced X and redefined X here.

And then sliced X after it’s already been redefined.

So this is actually minus forecast_out of the 90%.

So obviously simplifying things a little bit.

This is basically up to 90%. And this is the last

10% of that 90%. It’s a little bit more, but anyway.

So that was just…that’s a failure in logic.

Okay. So really the fix is you just cut that and paste it up there.

And there you have it. Now this is still going to create a model that’s relatively

akin and very similar to what we’ve already seen.
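The slicing bug and its fix can be sketched like this; the array and the forecast_out value here are stand-ins for the tutorial's real variables:

```python
import numpy as np

X = np.arange(20.0)   # stand-in for the real feature array
forecast_out = 5

# Buggy order: X is truncated first, so X_lately ends up being the
# last forecast_out rows of the *already truncated* array.
X_bad = X[:-forecast_out]
X_lately_bad = X_bad[-forecast_out:]   # rows 10..14, not 15..19

# Fixed order: slice X_lately from the full array before truncating X.
X_lately = X[-forecast_out:]           # the last chunk we predict on
X = X[:-forecast_out]                  # the earlier chunk we train against
```

The fix is purely about ordering: take the "lately" slice from the full array before redefining X.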

And again this is because we’re using linear regression.

It’s going to create a linear model that resembles what we’ve already seen.

So again you’ve got some jaggedness, then you’ve got the jump up, and then the price.

It’s a little different.

But it’s very very similar. Okay. So anyway. That’s just

given what we’ve done and how we’ve trained it. That’s going to happen.

So now let’s talk about the last thing

which is the fundamentals of…

you know what kind of features should you train against.

So what was the objective here?

First of all, let me just say the reason why we did it this way is

just for simplicity’s sake.

We’re just trying to do a really simple regression example.

But let’s say, you know, regardless of whether or not you’re interested in stock investing,

this problem, like every machine learning problem, is likely going to be a somewhat complex problem.

So you have to think pretty logically about the features that you choose to use.

So looking at this.

Which of these things hinges directly on price or will directly impact price, right?
Obviously Adj. Close does. What about HL_PCT? Does it have anything to do with price?

No. It’s a percent, right? It’s a normalized value.

So that doesn’t have anything to do with price.

This is maybe volatility, right?

Maybe magnitude; same thing with the high-low percent: volatility, and like direction maybe.

But not price. What about Volume?

No, not price, right?

This is just magnitude. Kind of fluctuation maybe. Stuff like that volatility.

So the only thing that really hinges on price is just Adj. Close.

To illustrate that

despite training on a future value that is indeed price.

What we can do is we can actually drop Adj. Close.

from the features.
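As a sketch, with a hypothetical DataFrame shaped like the tutorial's feature set (the column names follow the earlier parts of the series; the numbers are made up), dropping Adj. Close from the features looks something like this:

```python
import pandas as pd

# Hypothetical frame mirroring the tutorial's features; values are made up
df = pd.DataFrame({
    'Adj. Close':  [50.0, 51.0, 52.5],
    'HL_PCT':      [1.2, 0.8, 1.5],
    'PCT_change':  [0.3, -0.1, 0.6],
    'Adj. Volume': [1e6, 1.2e6, 9e5],
})

# Keep every feature except the one that directly carries price
features = df.drop(['Adj. Close'], axis=1)
```

Training on `features` then tests how much predictive power the remaining, price-free columns actually carry.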

What do you think when we drop this? What do you think is going to happen before we graph it?

Is that going to create a similar line that follows price?

Is it going to be a falling price, upward price, flat line? What’s it going to create for the prediction?

So think about that we run it really quick. And we’ll get our answer.

And the answer is not going to be probably what we were hoping for, right?

It’s just more of a flatline and why do we get this, right? Well.

The HL_PCT.

the high minus low percent was about the same back when the price was \$400, \$600, \$800, right?

Not big differences.

The only thing that might be sort of impactful is the Adj. Volume.

Since probably less people

are quickly flipping an \$800 stock as opposed to a \$50 stock or something like that.

But regardless

These just aren’t the greatest features.

So thinking about your problem. In this case it was stock investing.

What is it…What is a stock price indicative of?

It’s indicative of the entire company’s value.

Let’s think of Google for example. Like 500 billion dollars I think.

Why is Google worth 500 billion dollars?

No! Come on! Be logical about it. You know that’s not the case.

There are people who believe in pattern recognition and stuff like this. But…

Or at least you know chart patterns in stocks.

Sorry. But it’s been tested. There’s plenty of research done. That doesn’t work. But anyway.

Some people still believe it.

But fundamentally why is Google worth 500 billion dollars? It’s not because of this stuff.

Fundamentally Google’s worth 500 billion dollars

Because of things like its quarterly earnings,

its price to earnings, its price-to-earnings growth, its book value, and so on.

These are the things that value the company.

So if you wanted to predict stock price

You would use features that attempt to predict the company’s overall value.

Then from there you can divide that by outstanding shares and get a specific share price for the company.

But anyway. This was just meant to be a very simple example.

If you want to see a more complex example of

doing investing with features and fundamental features of companies.

I do have a tutorial series out for that.

It’s like 30-something videos if I recall or maybe 20 or something.

But it’s kind of tedious.

Because you got quarterly earnings which is every quarter.

Then you’ve got things like price to earnings to growth

which you could measure all the time.

Book value, price to book. You can measure all the time and so on.

So a lot of these things, and also just the entire company’s value,

you know, change throughout the day. So anyway.

It can get really complex really quick.

So we just wanted to use a really simple example. But

if you are looking for a more complex version. I do have one.

But anyways. That’s it with regression.

Hopefully you can learn from my mistakes down here.

I’ll probably continue making mistakes and you’ll probably make mistakes too.

And that’s just like part of it honestly.

So luckily we could visualize this and we could catch it.

But a lot of times you’re not going to be able to visually catch it.

But still you’re going to make mistakes. So…

Unless you’re a robot or something. So anyways. Hopefully you can learn from my mistakes.

Otherwise we’re going to be leaving regression behind now.

And traversing into classification.

So stay tuned for that. As always thanks for watching.
