The name is largely a marketing construct. Here is the inventor of the term, Richard Bellman, on how it came about:
"I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes. An interesting question is, Where did the name, dynamic programming, come from? The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word research. I'm not using the term lightly; I'm using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term research in his presence. You can imagine how he felt, then, about the term mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word "programming". I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying. I thought, let's kill two birds with one stone. Let's take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that it's impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities."
Note that Bellman's claim that "dynamic" cannot be used pejoratively is surely false: most people would not favor "dynamic ethnic cleansing"!
Nothing special here about steel rods: the algorithm applies to any good that can be sub-divided, but only in multiples of some unit, like lumber, or meat, or cloth.
The naive recursive algorithm keeps calculating the same cuts again and again, much like
naive, recursive Fibonacci.
Running time is exponential in n. Why?
Our textbook gives us the equation:
T(n) = 1 + sum_{j=0}^{n-1} T(j)
This is equivalent to:
T(n) = 1 + T(n - 1) + T(n - 2) + ... + T(1) + T(0)
For n = 1, there are 2^0 ways to solve the problem.
For n = 2, there are 2^1 ways to solve the problem.
For n = 3, there are 2^2 ways to solve the problem.
Each additional foot of rod gives us 2 * (previous number of ways of solving problem), since we have all the previous solutions, either with a cut of one foot for the new extension, or without a cut there. (Similar to why each row of Pascal's triangle gives us the next power of two.)
So, we have the series:
2^(n-1) + 2^(n-2) + 2^(n-3) + ... + 2^0 + 1
And this equals 2^n. Why?
Example: 2^4 = 2^3 + 2^2 + 2^1 + 2^0 + 1
Or, 16 = 8 + 4 + 2 + 1 + 1
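The 2^n growth can also be checked empirically. The following is a minimal sketch, not the course's own code: it assumes a price list where prices[i] is the price of a piece of length i + 1 (the sample values are the ones commonly used with the textbook's rod-cutting example), and the counter argument is a hypothetical helper added only to count calls.

```python
def cut_rod(prices, n, counter):
    """Naive recursive rod cutting; prices[i] is the price of a piece of length i + 1."""
    counter[0] += 1  # count every invocation, including the n == 0 base case
    if n == 0:
        return 0
    best = float("-inf")
    for i in range(1, n + 1):
        # Sell a first piece of length i, then solve the remainder recursively.
        best = max(best, prices[i - 1] + cut_rod(prices, n - i, counter))
    return best


prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # assumed sample prices

for n in range(1, 8):
    calls = [0]
    revenue = cut_rod(prices, n, calls)
    print(f"n={n}  revenue={revenue}  calls={calls[0]}  2^n={2 ** n}")
```

Each extra unit of length doubles the number of calls, which matches T(n) = 2^n.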
Much like we did with the naive, recursive Fibonacci, we
can "memoize" the recursive rod-cutting algorithm and
achieve huge time savings.
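A memoized version might look like the sketch below. It is an illustration rather than the course's listing: the function name, the dictionary used as the memo, and the sample prices are all assumptions.

```python
def memoized_cut_rod(prices, n, memo=None):
    """Top-down rod cutting; prices[i] is the price of a piece of length i + 1."""
    if memo is None:
        memo = {0: 0}  # a rod of length 0 earns nothing
    if n in memo:
        return memo[n]  # already solved: no recomputation
    best = max(
        prices[i - 1] + memoized_cut_rod(prices, n - i, memo)
        for i in range(1, n + 1)
    )
    memo[n] = best
    return best


prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # assumed sample prices
print(memoized_cut_rod(prices, 4))
```

Each length from 1 to n is solved once, and solving length j takes j steps, so the total work is on the order of n^2 instead of 2^n.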
That is an efficient top-down approach. But we can also do a bottom-up approach, which will have the same run-time order but may be slightly faster due to fewer function calls. (The algorithm uses an additional loop instead of recursion to do its work.)
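As a rough sketch of the bottom-up idea (again with an assumed function name and assumed sample prices), the table of best revenues can be filled in from the shortest rod upward:

```python
def bottom_up_cut_rod(prices, n):
    """Bottom-up rod cutting; prices[i] is the price of a piece of length i + 1."""
    revenue = [0] * (n + 1)  # revenue[j] = best revenue for a rod of length j
    for j in range(1, n + 1):
        # Every revenue[j - i] needed here was computed in an earlier iteration.
        revenue[j] = max(prices[i - 1] + revenue[j - i] for i in range(1, j + 1))
    return revenue[n]


prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # assumed sample prices
print([bottom_up_cut_rod(prices, n) for n in range(1, 11)])
```

The outer loop replaces the recursion: sub-problems are solved in order of size, so no memo lookup or function call is needed.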
The above is the Fibonacci sub-problem graph for fib(5). As you can see, F5 must solve F4 and F3. But F4 must also solve F3. It also must solve F2, which F3 must solve as well. And so on.
This is the sort of graph we want to see if dynamic programming is going to be a good approach: a recursive solution involves repeatedly solving the same problems.
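To see the repetition concretely, one can count how often each sub-problem is solved. This is a small sketch with hypothetical helper names; the memoized variant uses functools.lru_cache from the standard library.

```python
from collections import Counter
from functools import lru_cache


def fib_naive(n, calls):
    """Naive recursive Fibonacci that records how often each argument is seen."""
    calls[n] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recursion, but each sub-problem is computed only once."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


calls = Counter()
fib_naive(5, calls)
for k in sorted(calls, reverse=True):
    print(f"F{k} solved {calls[k]} time(s)")
print(fib_memo(50))
```

For fib(5), F3 is solved twice, F2 three times, and F1 five times: exactly the overlap that memoization removes.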
This is quite different than, say, a parser, where the code sub-problems are very unlikely to be the same chunks of code again and again, unless we are parsing the code of a very bad programmer who doesn't understand functions!
In this section, we see how to record the solution we arrived at, rather than simply return the optimal revenue possible. The owner of Serling Enterprises will surely be much more pleased with this code than the earlier versions.
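One way to record the solution is to remember, for each length, the size of the first piece in an optimal cut. The sketch below illustrates the idea only; it is not the course's ext_bottom_up_cut_rod(), and its names, return values, and sample prices are assumptions.

```python
def extended_cut_rod_sketch(prices, n):
    """Return (revenue, first_cut) tables for rods of length 0..n."""
    revenue = [0] * (n + 1)
    first_cut = [0] * (n + 1)  # first_cut[j] = length of the first piece to cut off
    for j in range(1, n + 1):
        best = float("-inf")
        for i in range(1, j + 1):
            candidate = prices[i - 1] + revenue[j - i]
            if candidate > best:
                best = candidate
                first_cut[j] = i
        revenue[j] = best
    return revenue, first_cut


def cut_lengths(first_cut, n):
    """Follow the recorded choices to list the pieces of an optimal cut."""
    pieces = []
    while n > 0:
        pieces.append(first_cut[n])
        n -= first_cut[n]
    return pieces


prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # assumed sample prices
revenue, first_cut = extended_cut_rod_sketch(prices, 7)
print(revenue[7], cut_lengths(first_cut, 7))
```

Storing one choice per sub-problem costs only O(n) extra space, and the walk in cut_lengths() rebuilds the full list of pieces without redoing any optimization.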
In the console below, type or paste:
!git clone https://gist.github.com/80d2a774f08f686f675f8a9254570da0.git
from dynamic_programming import *
Now let's run our ext_bottom_up_cut_rod() code.
(Link to full source code below.)
Type or paste:
(revs, cuts, max_rev) = ext_bottom_up_cut_rod(p4, 4)
You can go explore more, by designing your own price
arrays! Just type in:
my_name = [x, y, z...]
where 'my_name' is whatever name you want to give your price array, and x, y, z, etc. are the prices for a cut of length 1, 2, 3, etc.
There are many ways to parenthesize a series of matrix multiplications. For instance, if we are parenthesizing A1 * A2 * A3 * A4, we could parenthesize this in the following ways:
(A1 (A2 (A3 A4)))
(A1 ((A2 A3) A4))
((A1 A2) (A3 A4))
((A1 (A2 A3)) A4)
(((A1 A2) A3) A4)
Which way we choose to do so can make a huge difference in run-time!
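A quick calculation shows how large the difference can be. The sketch assumes the standard cost model, in which multiplying a p x q matrix by a q x r matrix takes p * q * r scalar multiplications, and uses the three-matrix example from later in these notes (A1 is 10 x 100, A2 is 100 x 5, A3 is 5 x 50).

```python
def multiply_cost(rows, inner, cols):
    """Scalar multiplications needed for a (rows x inner) times (inner x cols) product."""
    return rows * inner * cols


# ((A1 A2) A3): first 10x100 times 100x5, then the 10x5 result times 5x50.
cost_left = multiply_cost(10, 100, 5) + multiply_cost(10, 5, 50)

# (A1 (A2 A3)): first 100x5 times 5x50, then 10x100 times the 100x50 result.
cost_right = multiply_cost(100, 5, 50) + multiply_cost(10, 100, 50)

print(cost_left, cost_right)
```

The two orders produce the same matrix, but the second takes ten times as many scalar multiplications (75000 versus 7500).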
Why is this different than rod cutting? Think about this for a moment, and see if you can determine why the problems are not the same.
In rod cutting, a cut of 4-2-2 is the same cut as a
cut of 2-2-4, and the same as a cut of 2-4-2.
That is not at all the case for matrix parenthesization.
The number of solutions is exponential in n, so brute force is a bad technique for solving this problem.
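The count itself follows a simple recurrence: a chain of one matrix has a single parenthesization, and a longer chain is split at some point k into two independently parenthesized halves. A small sketch (the function name is an assumption):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parenthesizations(n):
    """Number of ways to fully parenthesize a chain of n matrices."""
    if n == 1:
        return 1
    # Split after the k-th matrix; the two sides are parenthesized independently.
    return sum(
        count_parenthesizations(k) * count_parenthesizations(n - k)
        for k in range(1, n)
    )


print([count_parenthesizations(n) for n in range(1, 11)])
```

For n = 4 this gives 5, matching the five parenthesizations listed earlier, and the values grow quickly from there (4862 for n = 10).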
For any place at level n where we place
parentheses, we must have optimal parentheses
at level n + 1.
Otherwise, we could substitute
in the optimal n + 1
level parentheses, and level n would be better!
If we know the optimal place to split A1...An (call it k), then the optimal solution is that split, plus the optimal solution for A1...Ak and the optimal solution for Ak+1...An. Since we don't know k, we try each possible k in turn, compute the optimal sub-problem for each such split, and see which pair of optimal sub-problems yields the optimal (minimum, in this case) total cost.
"For example, if we have four matrices ABCD, we compute the cost required to find each of (A)(BCD), (AB)(CD), and (ABC)(D), making recursive calls to find the minimum cost to compute ABC, AB, CD, and BCD. We then choose the best one." (https://en.wikipedia.org/wiki/Matrix_chain_multiplication)
An easy way to understand this:
Let's say we need to get from class at NYU Tandon to a ballgame at Yankee Stadium in the Bronx as fast as possible. If we choose Grand Central Station as the optimal high-level split, we must also choose the optimal ways to get from NYU to Grand Central, and from Grand Central to Yankee Stadium. It won't do to choose Grand Central, and then walk from NYU to Grand Central, and CitiBike from Grand Central to Yankee Stadium: there are faster ways to do each sub-problem!
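The "try every split" idea for matrix chains can be written directly as a recursion. The following is a sketch under assumptions: the dimensions are given as a list p in which matrix Ai is p[i - 1] x p[i], and the function name is made up for illustration. It is deliberately unmemoized, so it takes exponential time.

```python
def recursive_matrix_chain(p, i, j):
    """Minimum scalar multiplications to compute Ai...Aj, where Ai is p[i-1] x p[i]."""
    if i == j:
        return 0  # a single matrix needs no multiplication
    best = float("inf")
    for k in range(i, j):
        # Optimal left part, optimal right part, plus the cost of the final product.
        cost = (
            recursive_matrix_chain(p, i, k)
            + recursive_matrix_chain(p, k + 1, j)
            + p[i - 1] * p[k] * p[j]
        )
        best = min(best, cost)
    return best


print(recursive_matrix_chain([10, 100, 5, 50], 1, 3))
```

Note that each candidate split k is scored using the best solutions of its two halves, never arbitrary ones.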
CLRS does not offer a recursive version here (they do later in the chapter); they go straight to the bottom-up approach of storing each lowest-level result in a table, avoiding recomputation, and then combine those lower-level results into higher-level ones. The indexing here is very tricky and hard to follow in one's head, but it is worth trying to trace out what is going on by following the code. I have as usual included some print statements to help.
Finally, we use the results computed in step 3 to provide the optimal solution itself, by determining where the parentheses go.
Here is the code from our textbook, implemented in Python, running on the example where A1 is 10 x 100, A2 is 100 x 5, and A3 is 5 x 50:
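A minimal sketch of the bottom-up procedure and the reconstruction step follows. It is written independently of the course code, so the function names and the string format of the output are assumptions; p is the dimension list, with Ai being p[i - 1] x p[i], and row and column 0 of the tables are unused so that the indices match the 1-based textbook notation.

```python
def matrix_chain_order(p):
    """Fill m (minimum costs) and s (best split points) for chains of growing length."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # length of the chain Ai...Aj
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):           # try every split point
                cost = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if cost < m[i][j]:
                    m[i][j] = cost
                    s[i][j] = k
    return m, s


def optimal_parens(s, i, j):
    """Rebuild the optimal parenthesization of Ai...Aj from the split table s."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return f"({optimal_parens(s, i, k)} {optimal_parens(s, k + 1, j)})"


m, s = matrix_chain_order([10, 100, 5, 50])
print(m[1][3], optimal_parens(s, 1, 3))
```

On this example the best order is ((A1 A2) A3), at a cost of 7500 scalar multiplications.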
The structure of m:
We can memoize the recursive version and change its run time from Ω(2^n) to O(n^3).
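As a sketch of what that memoization could look like (names and the dictionary-based memo are assumptions, and at least one matrix is assumed), with p again listing the dimensions so that Ai is p[i - 1] x p[i]:

```python
def memoized_matrix_chain(p):
    """Top-down matrix-chain cost with a memo keyed by the sub-chain (i, j)."""
    n = len(p) - 1
    memo = {}

    def lookup(i, j):
        if i == j:
            return 0
        if (i, j) not in memo:
            memo[(i, j)] = min(
                lookup(i, k) + lookup(k + 1, j) + p[i - 1] * p[k] * p[j]
                for k in range(i, j)
            )
        return memo[(i, j)]

    return lookup(1, n)


print(memoized_matrix_chain([10, 100, 5, 50]))
```

There are only about n^2 / 2 distinct (i, j) pairs, and each is filled in once with at most n candidate splits, which is where the O(n^3) bound comes from.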
A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems.
The problem space must be "small," in that a recursive algorithm visits the same sub-problems again and again, rather than continually generating new subproblems. The recursive Fibonacci is an excellent example of this!
Storing our choices in a table as we make them allows quick and simple reconstruction of the optimal solution.
As mentioned above, recursion with memoization is often a viable alternative to the bottom-up approach. Which to choose depends on several factors, one of which is that a recursive approach is often easier to understand. If our algorithm is going to handle small data sets, or not run very often, a recursive approach with memoization may be the right answer.
'Let X be "XMJYAUZ" and Y be "MZJAWXU".
The longest common subsequence between X and Y is "MJAU".'
Brute force solution runs in exponential time: not so good!
But the problem has an optimal substructure:
X = gregorsamsa
Y = reginaldblack
Our match on the last 'a' is at position X11 and Y11. The previous result string ('rega') must have been the LCS before X11 and Y11: otherwise, we could substitute in that actual LCS for 'rega' and have a longer overall LCS.
Caution: here some sub-problems are ruled out! If the last characters x_i and y_j are different, we consider the sub-problems of finding the LCS for X_i and Y_(j-1) and for X_(i-1) and Y_j, but not the sub-problem for X_(i-1) and Y_(j-1) that we would use on a match. Why not? Well, if x_i and y_j aren't equal, they can't both be the endpoint of an LCS.
The solution here proceeds much like the earlier ones: find an LCS in a bottom-up fashion, using tables to store intermediate results and information for reconstructing the optimal solution.
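A compact sketch of that bottom-up approach is shown below. It is an illustration with assumed names: it keeps a single length table c and rebuilds the subsequence by walking back through it, rather than storing a separate table of arrows.

```python
def lcs(x, y):
    """Return one longest common subsequence of the strings x and y."""
    m, n = len(x), len(y)
    # c[i][j] = length of an LCS of the first i characters of x and first j of y.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Walk back from the bottom-right corner to recover the characters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("XMJYAUZ", "MZJAWXU"))
print(lcs("gregorsamsa", "reginaldblack"))
```

Both the table fill and the memory use are proportional to len(x) * len(y); the walk back adds only len(x) + len(y) steps.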
We could eliminate a table here, reduce asymptotic
run-time a bit there. But is the code more confusing?
Do we lose an ability (reconstructing the solution) we
might actually need later?
An important principle: Don't optimize unless it is needed!
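To make the trade-off concrete, here is a sketch of one such optimization (the function name is an assumption): it computes only the LCS length and keeps just two rows of the table. It saves memory, but it can no longer tell us which characters make up the subsequence.

```python
def lcs_length(x, y):
    """Length of an LCS of x and y, using O(min(len(x), len(y))) extra space."""
    if len(y) > len(x):
        x, y = y, x  # keep the rows as short as possible
    prev = [0] * (len(y) + 1)  # the previous row of the table
    for ch in x:
        curr = [0]  # the row being built for this character of x
        for j, other in enumerate(y, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(y)]


print(lcs_length("XMJYAUZ", "MZJAWXU"))
```

If only the length is ever needed, this is a reasonable saving; if the subsequence itself is needed later, the full table is worth keeping.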
If a binary search tree is optimally constructed, then both its left and right sub-trees must be optimally constructed. The usual "cut-and-paste" argument applies.
As usual, this is straightforward, but too slow.
Very much like the matrix-chain-order code. Working code coming soon!
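Pending that, the following is one possible sketch of the textbook-style bottom-up procedure, not the course's code. It assumes 1-based key probabilities p[1..n] (p[0] is an unused placeholder) and dummy-key probabilities q[0..n]; the function name, table layout, and sample probabilities are assumptions chosen for illustration.

```python
def optimal_bst(p, q):
    """Return (e, root): expected search costs and chosen roots for keys i..j."""
    n = len(p) - 1
    # e[i][j] = expected cost of an optimal tree on keys i..j (i in 1..n+1, j in 0..n).
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    # w[i][j] = total probability of keys i..j and dummy keys i-1..j.
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):
        e[i][i - 1] = q[i - 1]  # empty range: just one dummy key
        w[i][i - 1] = q[i - 1]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            e[i][j] = float("inf")
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):  # try each key as the root
                cost = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if cost < e[i][j]:
                    e[i][j] = cost
                    root[i][j] = r
    return e, root


p = [0.0, 0.15, 0.10, 0.05, 0.10, 0.20]   # assumed sample key probabilities
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]  # assumed sample dummy-key probabilities
e, root = optimal_bst(p, q)
print(round(e[1][5], 2))
```

The triple loop has the same shape as the matrix-chain code: sub-ranges are solved in order of length, and every candidate root is scored using already-computed optimal sub-trees.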