最新评论 (0)

数据结构:#4 数组vs链表

Data Structures: Arrays vs Linked Lists

In our previous lesson, we introduced you to linked list data structure
上节课 我们介绍了链表这种数据结构
and we saw how linked lists solve some of the problems that we have with arrays.
So now the obvious question would be which one is better – an array or a linked list.
那么 数组和链表哪个更好呢?
Well, there is no such thing as one data structure is better than another data structure.
其实 不能说某种数据结构要比另一种更好
One data structure may be really good for one kind of requirement,
while another data structure can be really good for another kind of requirement.
So, it all depends upon factor like what is the most frequent operation that you want
这取决于一些因素 比如对数据最频繁的操作是什么
to perform with the data structure or what is the size of the data
and there can be other factors as well.
So, in this lesson, we will compare these two data structures based on some parameters,
那么 本节课 我们将根据参数和操作复杂度
based on the cost of operations that we have with these data structures.
So, all in all we will comparatively study the advantages and disadvantages and try to
总之 我们会用对比的方式来研究两种结构的优缺点
understand in which scenario we should use an array
and in which scenario we should use a linked list.
So, I will draw two columns here, one for array and another for linked list
我来画一个双栏对比图 一栏数组 一栏链表
and the first parameter that we want to talk about is the cost of accessing an element.
Irrespective of the size of an array
it takes constant time to access an element in the array.
This is because an array is stored as one contiguous block of memory.
So, if we know the starting address or the base address of this block of memory.
只要我们知道内存块的起始地址 即基地址
Let us say what we have here is an integer array and base address is 200.
假设有一个整型数组 基地址是200
The first byte in this array is at address 200.
Then let’s say if we want to calculate the address of element at index i,
then it will be equal to 200 plus i into size of an integer in bytes.
So, size of an integer in bytes is typically 4 bytes.
So, it will be 200 + 4*i.
那么 就是200 + 4*i
So, if 0th element is at address 200
if we want to calculate the address for element
at index 6, it will be 200 plus 6 into 4 which will be equal to 224.
So, knowing address of any element in an array is just this calculation for our application.
因此 任意元素的位置都可以通过计算得到
In terms of big-oh notation, constant time is also called O(1).
在大O表示法中 常数时间又记作O(1)
So, accessing an element in an array is O(1) in terms of time complexity.
因此 访问数组元素的时间复杂度是O(1)
If you are not aware of big-oh notation
check the description of this video for a tutorial on time complexity analysis.
Now, in a linked list, data is not stored in a contiguous block of memory.
而对于链表 数据并不是储存在连续的内存块中
So, if we have a linked list something like this, let’s say we have a linked list of integers here,
如果我们有一个这样的链表 假定它是个int型链表
then we have multiple blocks of memory at different addresses.
Each block in the linked list is called a node and each node has two fields,
每个内存块称为一个结点 每个结点包含两个数据域
one to store the data and one to store the address of the next node.
一个储存数据 一个储存下一结点的地址
So, we call the second field, the link field.
The only information that we keep with us about a linked list is the address of the first node
which is also called the head node.
And this is what we keep passing to all the functions also, the address of the head node.
To access an element in the linked list at a particular position
we first need to start at the head node or the first node
需要从头结点 或第一个结点开始遍历
then we go to the second node and see the address of the third node.
到第二个结点 然后得到第三个结点的地址
In the worst case, to access the last element in the list,
we will be traversing all the elements in the list.
In the average case, we will be accessing the middle element may be.
平均情况下 我们要访问链表中间的元素
So, if n is the size of the linked list, n is the number of elements in the linked list,
假设链表的大小为n 即链表有n个元素
then we will traverse n/2 elements.
So, the time taken in the average case
因此 平均情况时所需的时间
also is proportional to the number of elements in the linked list.
So, we can say that the time complexity in average case is O(n).
因此 平均情况的时间复杂度是O(n)
So, on this parameter, cost of accessing an element, array scores heavily over linked list.
对于访问元素的复杂度 数组远优于链表
So if you have a requirement where you want to access elements in the list all the time,
then definitely array is a better choice.
Now, the second parameter that we want to talk about is
接下来 我们要讨论的第二个参数是
memory requirement or memory usage.
内存需求 或称内存使用情况
with an array, we need to know the size of the array before creating it,
创建数组之前 我们需要知道它的大小
because arrays is created as one contiguous block of memory.
So, array is of fixed size.
因此 数组大小是固定不变的
What we typically do is create, we create a large enough array
and some part of the array stores our list
and some part of the array is vacant or empty so that we can add more elements in the list.
其他部分空着 以备存放后续数据
For example, we have an array of 7 integers here and we have only 3 integers in the list.
例如 一个大小为7的整型数组 里面只有3个整型变量
Rest 4 positions are unused.
There would be some garbage value there.
With linked list, lets say we have, let’s say we have this linked list of integers,
对于链表 我们可以说
there is no unused memory.
We ask memory for one node at a time, so we do not keep any reserved space.
每次只申请一个结点的内存 无需预留
But we use extra memory for pointer variables
and this extra memory requirement for a pointer variable in a linked list can not be ignored.
In a typical architecture let’s say
integer is stored in 4 bytes and pointer variable also takes 4 bytes.
整型数据占用4字节 指针也是4字节
So, if you see, the memory requirement for this array of 7 integers is 28 bytes.
And the memory requirement for this linked list would be 8*3,
where 8 is the size of each node, 4 for integer and 4 bytes for the pointer variable.
其中 8是每个结点的大小 即4字节整数和4字节指针
So, this is also 24 bytes.
If we add one more element to the list in the array, we will just use one more position,
如果增加一个元素 只需占用数组一个空位
while in linked list we will create one more node, and will take another 8 bytes,
而链表中 得创建新结点 又需要8个字节
so this will be 32 bytes.
Linked list would fetch us a lot of advantage if the data, the data part is large in size.
如果要存放大量数据 链表会占很大优势
So, in this case, we had a linked list of integers, so integer is only 4 bytes.
What if we had a linked list in which the data part was
那如果有一个链表 数据域更为复杂
some complex type that took 16 bytes.
占用了16个字节 又会怎样呢?
So, 4 bytes for the link and 16 bytes for the data, each node would have been 20 bytes.
4字节指针 16字节数据 每个结点共20字节
An array of 7 elements for 16 bytes of data would be
16 byte for each element would be 112 bytes.
每个元素16字节 共112字节
And linked list of 4 would be only 80 bytes.
So, it all depends.
因此 要视情况而定
If the data part of a list takes a lot of memory,
linked list will definitely consume lot less memory.
Otherwise, it depends what strategy we are choosing to decide the size of the array.
反之 所占内存多少由数组大小决定
At any time how much array are we keep unused.
不论何时 数组都需要预留内存
Now, one more point with memory allocation
另外 关于内存分配还要补充一个知识点
because arrays are created as one contiguous block of memory,
sometimes when we may want to create a really really large array, then
maybe memory may not be available as one large block,
but if we are using linked list, memory may be available as multiple small blocks.
但用链表的话 数据可以储存在零散的内存块中
So, we will have this problem of fragmentation in the memory.
Sometimes, we may get many small units of memory
but we may not get one large block of memory.
This may be a rare phenomenon, but this is a possibility.
这种情况或许不常见 但存在这种可能性
So, this is also where linked list scores.
因此 这也是链表的优势
Because arrays have fixed size, once array gets filled and we need more memory, then
由于数组的大小不变 一旦填满 需要更多内存时
there is no other option than to create a new array of larger size
and copy the content from the previous array into the new array.
So, this is also one cost which is not there with linked list.
So, we need to keep these constraints and these requirements in mind when we want to
因此 根据需求选用数据结构时
decide for one of these data structures for our requirement.
Now, the third parameter that we want to talk about is
接着 我们要讨论的第三个参数是
cost of inserting an element in the list.
Remember when we are talking about arrays here, we are also talking about
请记住 这里我们讨论的数组包括
the possible use of array as dynamic list.
So, there can be 3 scenarios in insertion.
First scenario will be when we want to insert an element at the beginning of the list.
Let’s see we want to insert number 3 at the beginning of the list.
In the case of arrays,
we will have to shift each element by one position towards the higher index.
So, the time taken will be proportional to the size of the list.
因此 所需时间与数组大小成正比
So, this will be O(n).
Let’s say n is the size of the list.
This will be O(n) in terms of time complexity.
In the case of linked list, inserting a node in the beginning
而对于链表而言 要在首部插入一个结点
will mean only creating a new node and adjusting the head pointer and the link of this new node.
只需创建新的结点 调整头结点和新结点的指针
So, the time taken will not depend upon the size of the list, it will be constant.
因此 所需时间是常数 与列表大小无关
So, for linked list
所以 对链表来说
inserting an element at the beginning is O(1) in terms of time complexity.
Inserting an element at end for an array, let’s say we are talking about dynamic array,
如果在尾部插入元素 这里讨论的是动态数组
a dynamic list in which we create a new array if it gets filled.
If there is space in the array, we just write to the next higher index of the list.
如果数组中还有空位 只需继续写入元素
So, it will be constant time.
So, time complexity is O(1) if array is not full.
If array is full, we will have to create a new array and copy all the previous content
若数组已满 需新建数组并复制原有数据
into new array which will take O(n) time where n is the size of the list.
若数组大小为n 时间复杂度就是O(n)
In the case of linked list, adding an element, inserting an element at the end will mean
对链表来说 在末尾添加一个元素意味着
traversing the whole list and then creating a new node and adjusting the links.
遍历整个链表 新建一个结点并调整指针
So, time taken will be proportional to n.
因此 所需时间与n成正比
I will use this color coding for linked list.
Here n is the number of elements in the list.
Now, the third case will be when we want to insert in the middle of the list
at some nth position or may be some ith position.
比如在第n个位置 或者第i个位置
Again in the case of arrays, we will have to shift elements.
对数组而言 仍需将元素依次后移
For the average case, we may want to insert at the mid position in the array.
平均情况下 需要在数组正中间插入元素
So, will have to shift n/2 where n is the number of elements in the list.
因此 数组大小为n 则需要后移n/2个元素
So, the time taken is definitely proprotional to n in average case.
平均情况下 所需时间与n成正比
So, complexity will be O(n).
因此 时间复杂度为O(n)
For linked list also we will have to traverse
对链表而言 仍然需要遍历
till that position and then only we can adjust the links.
直到找出位置 然后仅需调整指针
Even though we will not have any shifting, we will have to traverse till that point and
虽然不需要后移 但也要遍历到那个位置
in the average case, time taken will be proportional to n
平均情况下 所需时间也和n成正比
and the time complexity will be O(n).
If you see , deleting an element will also have these 3 sceanrios
显然 删除一个元素 也分为3种形式
and the time comeplxity for deleting for these 3 sceanrios will also be the same.
And the final point, the final parameter that I want to talk about
is which one is easy to use and implement.
An array definitely is a lot easier to use.
显然 数组更容易使用
Linked list implemetation especially in C or C++ is more prone to errors
链表更容易出现错误 尤其是在C或C++中
like segmentation fault and memory leaks.
It takes good care to work with linked lists.
使用链表时 需要多加注意
So, this was arrays vs linked lists.
In our next lesson, we will implement linked list in C or C++.
下节课 我们将在C或C++中应用链表
We will get our hands dirty with some real code.
So this is it for this lesson.
Thanks for Watching !