Hi, I’m Gayle Laakmann McDowell, author of Cracking the Coding Interview.
Today I want to talk about memoization and dynamic programming. This is a super important topic.
It’s a bit challenging at first, but once you get the hang of it, it gets a lot easier.
Fibonacci’s pretty much the classic recursive example.
Fib of n is fib of n minus 1 plus fib of n minus 2.
And then we need some starting conditions, like fib of 0 is 0 and fib of 1 is 1.
And we can very easily write out the first few cases here.
But now let’s think about the code.
So translating this into code is very straightforward as well.
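The on-screen code isn’t included in this transcript, but a straightforward Python version of that recursion might look like this:

```python
def fib(n):
    # Base cases: fib(0) = 0 and fib(1) = 1
    if n <= 1:
        return n
    # Each call branches into two more recursive calls
    return fib(n - 1) + fib(n - 2)
```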
But there’s a problem when you think about the runtime here.
So if we call this for, say, fib of 6: fib of 6 will call fib of 5 and fib of 4, fib of 5 will call fib of 4 and fib of 3, fib of 4 will call fib of 3 and fib of 2, and so on.
Then we finish a single branch and we still need to do all the other branches.
So that’s going to get very very inefficient, specifically if we imagine calling this for say fib of n,
we’re going to call fib of n, and that’s going to call fib of n minus 1, fib of n minus 2, fib of n minus 3, all the way down, to roughly n levels.
So we’ll have n levels in our tree. The first level will have one node, the second level will have two nodes.
Each node has two children, so we’ll have four nodes at the next level, then 8, and so on.
So if you take one and double it n times, because we double the number of nodes at every single level,
we’re gonna wind up with roughly 2 to the n nodes in the tree.
So this gives us a runtime of O of 2 to the n,
and if you remember I was able to compute this by hand fairly quickly and so there’s a bit of a mismatch here.
Why is this algorithm so inefficient?
And the reason is that we keep doing things repeatedly.
We compute the fib of 4 twice, here and here.
Fib of 3 gets computed 3 times here, here, and here.
Fib of 2 gets computed five times, here, here, here, here, and here.
Now obviously these values haven’t changed, fib of 4 is always the same thing, fib of 3 is always the same thing.
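We can verify those counts with a quick instrumented version of the naive function (the counter here is just for illustration, not part of the original code):

```python
from collections import Counter

calls = Counter()  # tracks how many times fib(k) is invoked, per k

def fib(n):
    calls[n] += 1
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

fib(6)
# Sure enough: fib(4) runs twice, fib(3) three times, fib(2) five times
```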
So what we should be able to do is just compute these values once
and if we ever have to repeat them again just return the stored value.
So we store the value each time rather than computing it from scratch.
And that’s what the most basic implementation of a memoization algorithm is.
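In Python, that most basic memoized version might look like this (a dictionary memo table is one of several reasonable choices):

```python
def fib(n, memo=None):
    if memo is None:
        memo = {}  # maps n -> fib(n) for values we've already computed
    if n <= 1:
        return n   # base cases
    if n not in memo:
        # Compute once, store it, and reuse it on every later request
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]
```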
So now if we walk through what this code does, we’re still going to go and compute this whole path down here,
but when we do that we’re going to build up this table.
So then when we get down to computing fib of 1, that’s a base case. Fib of 2? Hey, that’s already been computed, so we just return the value. Fib of 3, already computed; fib of 4, already computed.
So although we still recurse just as deeply
all our other work has been done by that point and we’re just returning stored values.
And now we get a linear runtime instead of exponential which is of course an enormous improvement.
So now let’s talk about space complexity.
Space complexity comes from any memory you use,
and that includes things like creating arrays, but it also includes the call stack.
So let’s think about the space complexity of the non memoized solution.
A lot of people describe this as O of 2 to the n, and that’s actually a mistake.
People are thinking, well, gee, all O of 2 to the n things go on the call stack, so we need all of that memory.
Yes but they’re not all on the call stack at the same time.
One way to think about space complexity is, if I were to give you a chunk of memory to run your program
how big would that chunk of memory have to be?
Although all O of 2 to the n things wind up on the call stack over time,
they’re not all sitting here at the same time.
The biggest your call stack ever gets, the most memory you have used at any one point in time, is O of n.
There’s only O of n things that wind up on the call stack at any one given time.
So the space complexity of the non-memoized solution is O of n.
The memoized solution also has a space complexity of O of n,
because we need O of n memory for the call stack still and we need another O of n memory for the memo table.
But O of n plus O of n is still O of n. So both of these have a space complexity of O of n.
Now let’s try this out on a new problem.
Suppose you’re given an n-by-n grid and you can only move down or to the right and certain cells are blocked off in this grid.
The question is, how many paths are there from the top left to the bottom right?
As you might guess, we want to approach this recursively, so let’s break this down into smaller problems.
The overall problem is: how many paths are there from the top left to the bottom right?
Well let’s think about this green guy in the top left corner.
The first decision he needs to make is, well am I going to go down or to the right. Those are my only options.
So let’s picture each of those options.
There are only those two ways of going.
So the number of paths from the start to the end
is the number of paths from position A to the end plus the number of paths from position B to the end.
So now we’ve broken our problem down into other subproblems.
So now what happens?
Well person A has to go either down or to the right again.
Same with position B. So now the number of paths from A to the end can be broken down into C and D.
It’s the number of paths from C to the end plus the number of paths from D to the end.
Same thing with B: the number of paths from B to the end is the number of paths from C to the end plus the number of paths from E to the end.
This recursion stops when we have a definitive answer and that comes when you’re actually at the end.
The number of paths from the end to the end is just 1.
You stand exactly where you are. So that forms our recursion base cases.
Translating this into code is relatively straightforward.
We just traverse down and to the right, obviously making sure we check that we’re not out of bounds or in a blocked-off square,
but then we just sum up the number of ways of going down plus the number of ways of going to the right.
And when we get down to the actual end we return 1.
This is a pretty straight forward implementation of that algorithm,
valid square does both the boundary checking as well as making sure we’re not in a blocked-off square.
If we are, then we return 0 because there’s no path; if we’re at the end, we return 1 because there’s just the one path, you stay right there.
Otherwise, the number of paths from here to the end is the number of paths going down to the end plus the number of paths going right to the end.
One little tip, by the way: whenever I do a matrix problem, I like to use rows and columns for my variable names rather than x and y.
And the reason is that the x-coordinate actually translates to columns and the y-coordinate translates to rows.
But people tend to instinctively write grid of X comma Y, and that’s backwards.
It should be grid of Y comma X. It just leads to a lot of confusion.
So take it from me: don’t use x and y when you deal with matrices; use rows and columns instead.
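Putting that together, a Python sketch of the recursive counting (using rows and columns, with illustrative names since the actual code isn’t in the transcript) might be:

```python
def count_paths(grid, row=0, col=0):
    """Count the paths from (row, col) to the bottom-right corner,
    moving only down or right. grid[row][col] is True if that
    square is blocked off."""
    n = len(grid)
    # Out of bounds or a blocked-off square: no paths through here
    if row >= n or col >= n or grid[row][col]:
        return 0
    # At the end: exactly one path (stay right where you are)
    if row == n - 1 and col == n - 1:
        return 1
    # Paths going down plus paths going to the right
    return count_paths(grid, row + 1, col) + count_paths(grid, row, col + 1)
```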
Well one thing we might notice is that in the example above we compute the number of paths from C to the end twice, and that’s unnecessary.
What we should be doing is saying, hey if I’ve already computed something just return that stored value.
So every single thing I compute I want to just store that value and return it if I’m ever asked for it again.
And that’s a memoization approach.
Now instead we just compute it once and then if we ever need that value again
just return that stored value and that’s all we have to do here.
The runtime now is O of n squared, because we have n squared possible cells that we have to compute this for,
but each time we compute it, it’s constant time, given that the other work has already been done.
So that gives us a runtime of O of n squared.
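A memoized version just adds the table on top of the same recursion (again a sketch with illustrative names):

```python
def count_paths(grid, row=0, col=0, memo=None):
    if memo is None:
        memo = {}  # maps (row, col) -> number of paths to the end
    n = len(grid)
    if row >= n or col >= n or grid[row][col]:
        return 0   # out of bounds or blocked: no paths
    if row == n - 1 and col == n - 1:
        return 1   # at the end: just the one path
    if (row, col) not in memo:
        # Compute each cell once; later calls return the stored value
        memo[(row, col)] = (count_paths(grid, row + 1, col, memo)
                            + count_paths(grid, row, col + 1, memo))
    return memo[(row, col)]
```

With the memo table in place, even a large open grid is fast, where the naive version would take exponential time.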
So if we want to we can flip this around and do more of a traditional dynamic programming approach.
So let’s take a look at how this would work.
If we think about what the recursive approach does, the very first concrete values it gets other than the base cases are these two cells.
So that gives us two concrete answers.
Now where can we go from here?
Well, if we take that recursion one step up, the next thing it does is fill in one of these three cells.
So let’s start with this one here.
Well we know the answer to that cell and that cell and that cell and that cell and that cell.
All of these cells have just one path from their cell to the end, which is this path here.
Now what about this cell here?
Well, if there’s one path going down and one path going to the right, the actual answer is their sum.
So there’s two paths there. Now here again one path there.
Now when we go to this cell, we can go down or to the right, and the result is the sum of those two cells.
And this one here, well we can go down or go to the right.
There are two paths if we go to the right, one path if we go down.
So the actual result for the number of paths from this cell to the end is three paths.
And we could double-check that if we want.
One path is going down and then all the way over, another path is going to the right, down, and then all the way over,
or to the right, to the right again, down, and all the way over.
So sure enough, yes, there are three paths.
And we can do this exact approach going through the entire grid,
starting from the bottom right and moving all the way to the left, then up one row and all the way to the left, up another row and all the way to the left, filling in these values as we go.
So we got a 1, a 1, actually it’s 0 here,
because there’s actually no paths from this cell down to the right.
And here there’ll be 3, because we can only go down. Here 3 again, because we can only go to the right.
4, 1, 1, 1, 1, 1, and then 3 here, 7 here, 1, 2, 1, 2
and we can go through and fill in the entire matrix this way.
Our runtime is still O of n squared, and so is our space complexity, but we’ve slightly reduced the actual amount of memory used because we no longer need the call stack.
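The bottom-up walkthrough above could be sketched like this in Python, filling the table from the bottom-right corner:

```python
def count_paths(grid):
    n = len(grid)
    # paths[row][col] = number of paths from that cell to the end
    paths = [[0] * n for _ in range(n)]
    # Fill from the bottom-right, moving left across each row, then up
    for row in range(n - 1, -1, -1):
        for col in range(n - 1, -1, -1):
            if grid[row][col]:
                continue  # blocked-off square stays at 0 paths
            if row == n - 1 and col == n - 1:
                paths[row][col] = 1  # the end: just the one path
            else:
                down = paths[row + 1][col] if row + 1 < n else 0
                right = paths[row][col + 1] if col + 1 < n else 0
                paths[row][col] = down + right
    return paths[0][0]
```

No recursion at all here, so there is no deep call stack; only the table itself uses memory.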
So these are a couple different ways of approaching problems.
You can tackle things recursively; that’s often a pretty good way to get a handle on the problem in the first place.
Then you might implement a memoization approach and just throw in a memo table,
maybe that’s a hash set, a hash table, maybe just a simple integer array will work.
And sometimes you can easily flip that around and implement an iterative traditional dynamic programming approach.
So try these problems out, I really recommend before your interview practicing a whole bunch of these.
These problems are actually pretty formulaic and once you get the hang of them they can be very very easy to tackle.
But when you start off first they can be kind of tricky. So do your best, practice these, and good luck.