ADM-201 dump PMP dumps pdf SSCP exam materials CBAP exam sample questions

数据结构:哈希表 – 译学馆
最新评论 (0)


Data Structures: Hash Tables

大家好 我是《程序员面试金典》的作者 盖尔·拉克曼·麦克道尔
Hi, I’m Gayle Laakmann McDowell, author of Cracking the Coding Interview.
今天的主题 哈希表 你一定不想错过
Today’s topic is one that you really don’t want to miss – hash tables.
A hash table is possibly the most useful data structure for interview questions.
无论是面试 还是实际工作中都免不了要使用哈希表
It comes up all the time both in interviews and in real life.
In fact one technique I often tell people is just,
如果遇到问题了 首先考虑试试用哈希表来解决
for any problem, have hash tables at the top of your mind as a possible technique to solve the problem.
So let’s talk a bit about what a hash table is.
从大面上来说 哈希表是键值对查找集合
At a high level a hash table is a key value lookup
即 在哈希表中可以利用 键 来快速查找 值
So it gives you a way of, given a key, associating a value with it for very very quick lookups.
假设 你要利用某人姓名存储并索引相关信息
So suppose you had some situation where you needed to associate somebody’s name with some set of information about them.
那么 哈希表就是最好的选择
A hash table would be the perfect solution for this problem because
you can just put this into the hash table and then you can say,
而当需要某人的数据(比如玛丽的)时 能够立刻查找出来
okay give me the data associated with Mary and then boom we can get that information immediately.
哈希表中 键 值 都可以是任何数据结构类型
So in a hash table the key as well as the value can be basically any type of data structure.
通常使用字符串 但是圆形 方形 人等等类型都是可以的
A string is often used but it could be a circle, a square, a person,
只要实现了哈希函数 都可以
pretty much anything, as long as you have a hash function.
So what does that mean? Well let’s turn to the implementation to talk about that.
总得来说 我们需要把存储的对象存进数组 我们画个图看下
So at a high-level we do want to store the objects in an array.So let’s picture that.
那我们怎么通过 键 的值获取对应的数组下标呢?
But how do we actually jump from a string to a particular index in the array?
Well that’s what a hash function does.
哈希函数接受字符串作为参数 将其转换为整数
So a hash function takes in a string, converts into some sort of integer,
然后重映射这个整数 从而确定数组下标
and then remaps that integer into an index into that array.
And then that’s where we can find the person we’re looking for.
要注意的是 哈希值并不是数组中的实际下标
So it’s important to understand that the hash code is not actually the index in this array.
我们利用 键 的值生成了哈希值 又用哈希值生成下标
We map actually from the key to the hashcode and then over to the index.
这么做的原因是 哈希表中存储数据的数组
And one of the reasons for this is that the array that actually stores the data from the hash table
might be much much smaller than all the available potential hash codes.
不可能因为有 30 亿哈希值我们就建 30 亿长的数组
And so we don’t want to have an array of three billion just because there’s three billion potential hash codes.
We actually remap it into something smaller.
要注意的是 两个不同的字符串可能有相同的哈希值
Now note here that two different strings could actually have the same hash code
因为不同字符串有无穷多个 但哈希值的数量是有限的
and that’s because there are an infinite number of strings out there but a finite number of hash codes.
所以理论上来说 艾利克斯 和 萨拉 的哈希值可能是一样的
So it’s theoretically possible for Alex and Sarah to actually have the same hash code.
此外 因为哈希值被我们用某种方式映射为更小的数值
Additionally since we’re remapping the hashcode into an even smaller index,
two things with different hash codes could actually wind up mapped to the same index.
So what do we do when this happens, which is called the collision?
事实上有很多种处理方式 强烈建议你们花点时间自己研究下
There are different ways of resolving collisions and I really do encourage you to look this up on your own time,
我只讲其中的一种 称之为 拉链法
but I’ll just talk about one of them which is called chaining.
拉链法应该是最常用的 也很简单
And chaining is possibly the most common one and it’s very simple.
拉链法是指 当发生碰撞时 将重复的元素存储在链表中
But it basically means is just, hey when there is collisions just store them in a linked list.
存储结构不再是存储有人信息的数组 而是数组中每个元素指向一个存有人信息的链表
So rather than this being an array of people it’s actually going to be an array of a linked list of people.
也就是说 当你想要取得 艾利克斯 的值时
And so that means though that when you get a call to say hey get the value of Alex,
you actually need to walk through all the values in that linked list, and pull out the value for Alex.
需要注意 这里的链表不仅要存储 人的信息
Now as you’ll note here this linked list contains not just the actual person objects
还要存储原来的 键 (即人名)
but the actual original keys as well.
这么做的原因是 如果你只存储 人的信息 那么遍历的时候就会遍历到同一链表的所有 人的信息
And the reason for that is that if you only store the person object you’d see all these people who mapped to this index
这样就不知道所属信息是哪个人的 所以存储 键(人名) 的值十分必要
but you wouldn’t know which one they are and so you actually just store the keys with them.
当查找 艾利克斯 的信息时 实际上是在调用哈希函数
So when you get a call to say get me the value for Alex, then you actually call this hash function,
然后取得哈希值 映射哈希值获得数组下标 再通过下标遍历该数组元素指向的链表 获得 艾利克斯 的信息
you get a hash code back, you map over to this index and then you walk through and look for the thing with the key of Alex.
这就是哈希表的工作原理 我们再从头到尾实现一遍
So that is basics of how a hash table operates. So let’s walk through this from start to end.
新建一个哈希表的类 并定义一个数组来存储数据
We’re going to have this hash table class that underneath it has a array that actually holds the data.
数组中不会存储实际的值 而是存储各个存有实际值的链表首地址
And the array isn’t going to be an array of the actual values it’s going to be an array of the linked list of the values.
如果要存储 艾利克斯 的信息 即将艾利克斯这个名字映射到这个人的信息
Then when someone says put the value of Alex, put the value Alex mapping to this person,
we call this hash code function that gets us back the hashcode.
然后映射哈希值获得一个索引 即数组下标的值
Then we map from this hash code over to an index and that gets us over to this index in the array
并把要存的键值对存进该数组元素指向的链表 如果之前没有元素 那么就是单元素的链表
and then we put it into this linked list. If there’s nothing else, great, then we’re just going to have a one element linked list.
但有时链表中已经有多个元素 那就要遍历它们
But there could be multiple values in there, in which case we walk through it.
如果已经有 艾利克斯 的信息了怎么办?
Now what if there is already a value of Alex in there?
Well then we just fix that value immediately.
下面讲讲 哈希表的运行时间
So let’s talk now about the runtime of a hash table.
运行时间取决于使用条件与 我们所做的设定
Well it really depends on what we’re talking about and what assumptions we make.
大多数情况下 假定哈希表的哈希函数都能很好地分布哈希值
In many cases we assume that we have a good hash table and a good hash function that really distributes our values well
而目前为了面试需要 我们通常假设
and so for the purpose of an interview, we often just summarize this and say okay,
getting and setting in a hash table is constant time.
而实际情况是 有时会有未知情况发生 比如哈希函数不可靠等等
In reality if something weird happens we have a bad hash function blah blah blah,
we could also say it is linear time in the very worst case.
But for the purpose of most problems we generally talk about constant time
因为实际上 绝大部分情况下哈希表工作的都很好
because in the real world we’re going to make pretty sure hopefully that we have a good hash table.
So this is a bit of a high level view of what a hash table is and how it works.
There’s a whole lot of complexity you can dive into here about different ways of handling collisions
此外 当哈希表
and exactly what happens when a hash table,
中存储元素过多 哈希表是如何扩容的?
you start to put a ton of elements in the hash table, how does it grow with new elements?
Does it can be resized? All that sort of stuff.
I really do encourage you to go look at this stuff on your own time.
It’s a fantastic thing to really understand a lot of the complexity here.
But for the purpose of a lot of interview questions
we just sort of simplify all this down and summarize things as,
前面的讨论都是基于哈希表是完美的 插入查找与删除都是常数时间复杂度
let’s assume we have a good hash table and so we get this beautiful constant-time insert find and delete.
So now that you understand the basics,
go ahead and do try to learn these more complex details of hash tables,
but also go practice some of these basics on your own time on your own problems.
Good luck