
Do you know where histograms come from?

StatQuest: Histograms, Clearly Explained

My cat does stats
while she sleeps.
I like to do stats, how about you,
when I’m awake.
StatQuest!
Hello and welcome to StatQuest.
StatQuest is brought to you by the friendly folks in the genetics department
at the University of North Carolina at Chapel Hill.
Today we’re gonna be talking about histograms, and they’re gonna be clearly explained.
Imagine we went out and measured someone,
and they were this tall.
Then we measured someone else.
Then we measured a whole bunch of people.
We’ve measured so many people that the dots overlap.
Some dots are completely hidden.
We could try to make it easier to see the hidden measurements
by stacking any that are exactly the same.
But measurements that are exactly the same are rare,
and a lot of the hidden measurements are still hidden.
So instead of stacking measurements that are exactly the same,
we divide the range of values into bins
and stack the measurements that fall in the same bin.
This, my friends, is a histogram.
Bam
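
The binning step can be sketched in a few lines of Python. This is only an illustration: the heights are made-up numbers, and the function name `histogram`, the bin width of 10 cm, and the starting point of 150 cm are assumptions, not anything from the video.

```python
import math
from collections import Counter

def histogram(measurements, bin_width, origin=0.0):
    """Count how many measurements fall into each bin.

    Bin k covers the half-open range
    [origin + k * bin_width, origin + (k + 1) * bin_width).
    """
    counts = Counter()
    for x in measurements:
        k = math.floor((x - origin) / bin_width)
        counts[k] += 1
    return counts

# Hypothetical heights in centimeters (invented for illustration).
heights = [152.1, 158.4, 161.0, 163.7, 164.2, 166.9,
           167.3, 168.8, 171.5, 174.0, 179.6, 188.2]

counts = histogram(heights, bin_width=10, origin=150)

# Draw each bin as a stack of stars, one star per measurement.
for k in sorted(counts):
    low = 150 + k * 10
    print(f"{low}-{low + 10} cm: " + "*" * counts[k])
```

Each star is one measurement, so the longest row is the tallest stack.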

The taller the stack within a bin,
the more measurements we made that fall into that bin.
Duh

We can use the histogram to predict the probability
of getting future measurements.
I would be willing to bet that the next measurement we make is somewhere in this range.
Measurements out here are rarer
and less likely to happen in the future.
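
One rough way to turn that bet into numbers is to divide each bin’s count by the total number of measurements. The sketch below assumes made-up bin counts (keys are the lower edge of each 10 cm bin); the relative frequencies are only estimates and get shakier when there are few measurements.

```python
# Hypothetical bin counts for 12 height measurements (invented numbers).
# Each key is the lower edge of a 10 cm bin.
bin_counts = {150: 2, 160: 6, 170: 3, 180: 1}

total = sum(bin_counts.values())

# Relative frequency of each bin: a rough estimate of the chance
# that the next measurement lands in that bin.
bin_probability = {low: count / total for low, count in bin_counts.items()}

# The bin with the tallest stack is where we would place our bet.
most_likely_bin = max(bin_probability, key=bin_probability.get)

for low, p in bin_probability.items():
    print(f"{low}-{low + 10} cm: estimated probability {p:.2f}")
print("Most likely bin starts at", most_likely_bin, "cm")
```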
If you want to use a distribution to approximate your data or future measurements,
histograms are a good way to justify your decision.
By the way,
if you don’t know what a distribution is,
there’s a StatQuest for that.
In this case,
we might use a normal distribution
to approximate the data and future measurements.
If the data look like this,
we might use an exponential distribution
to approximate this data and future measurements.
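
A hedged sketch of how a histogram might back up the choice of a normal distribution: fit a normal curve to the data, then compare the fraction of measurements in each bin with the fraction the fitted curve predicts. The heights and bin edges are made up, and fitting with the standard library’s `statistics.NormalDist` is an assumption for illustration, not the method used in the video.

```python
from statistics import NormalDist

# Hypothetical heights in centimeters (invented for illustration).
heights = [152.1, 158.4, 161.0, 163.7, 164.2, 166.9,
           167.3, 168.8, 171.5, 174.0, 179.6, 188.2]

# Fit a normal distribution using the sample mean and standard deviation.
fitted = NormalDist.from_samples(heights)

# Assumed bin edges: four 10 cm bins from 150 cm to 190 cm.
edges = [150, 160, 170, 180, 190]

observed_fractions = []
expected_fractions = []
for low, high in zip(edges, edges[1:]):
    # Fraction of the measurements that actually fall in this bin.
    observed = sum(low <= x < high for x in heights) / len(heights)
    # Fraction the fitted normal curve assigns to this bin.
    expected = fitted.cdf(high) - fitted.cdf(low)
    observed_fractions.append(observed)
    expected_fractions.append(expected)
    print(f"{low}-{high} cm: observed {observed:.2f}, normal curve {expected:.2f}")
```

If the two columns are far apart, the normal distribution may be a poor approximation. For data piled up near zero with a long right tail, the same comparison could be made against an exponential curve instead.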
Note:
figuring out how wide to make the bins is tricky.
If the bins are too narrow, then they are not much help.
In this case, the bins are so narrow
that pretty much every measurement gets its own bin.
This doesn’t give us much more insight than what we had before,
so it’s not very useful.
And if the bins are too wide,
they are not much help.
In this case, the bins are so wide
that the measurements are split 50/50.
All this tells us is how many measurements are above the average
and how many are below.
This is more insight than before,
but we can do better.
Sometimes you have to try a bunch of different bin widths
before you get a clear picture.
In other words,
don’t rely on the default setting of whatever program you’re using to draw the histogram.
You’ve got to try a bunch of different settings
before you’re sure that you’ve got the best histogram you can draw.
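
Trying several bin widths is easy to automate. The sketch below redraws the same made-up heights with four assumed widths (1, 5, 10, and 40 cm); the helper name `bin_counts` is hypothetical, and bins here start at multiples of the width.

```python
import math
from collections import Counter

# Hypothetical heights in centimeters (invented for illustration).
heights = [152.1, 158.4, 161.0, 163.7, 164.2, 166.9,
           167.3, 168.8, 171.5, 174.0, 179.6, 188.2]

def bin_counts(data, width):
    """Map the lower edge of each bin to the number of values inside it."""
    return Counter(math.floor(x / width) * width for x in data)

# Redraw the histogram with several candidate bin widths.
for width in (1, 5, 10, 40):
    counts = bin_counts(heights, width)
    print(f"bin width {width} cm: {len(counts)} non-empty bins")
    for low in sorted(counts):
        print(f"  {low:>5}-{low + width:<5} " + "*" * counts[low])
```

With 1 cm bins every measurement sits alone, and with 40 cm bins almost everything is lumped together. The widths in between show the shape of the data more clearly.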
Hooray!
We’ve made it to the end of another exciting StatQuest.
If you like this StatQuest and wanna see more like it,
please subscribe. It’s really easy.
And if you have any suggestions for future StatQuests,
just let me know in the comments below.
Until next time,
quest on!


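Translation information
Video overview

A detailed explanation of how a histogram is built, how it works and is used, and what to watch out for when drawing one, showing that histograms play a role in statistical data analysis and decision-making.

Transcriber

Collected from the internet

Translator

YXG-4e45d

Reviewer

Reviewer CR

Video source

https://www.youtube.com/watch?v=qBigTkBLU6g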
