Design and Analysis of Algorithms Midterm
Problems
- Write pseudo-code for a double hashing scheme. You may simply
call your hashing functions h1() and h2(). (In other words, you
needn't actually write the hashing functions.)
Example code here.
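One possible sketch, in Python standing in for pseudo-code. The hash functions h1 and h2 here are placeholder division-method choices (any suitable pair works, as the problem allows); the table is a fixed-size array of (key, value) slots.

```python
# Double hashing: probe sequence (h1(key) + i * h2(key)) % m, i = 0, 1, 2, ...

def h1(key, m):                    # primary hash (placeholder choice)
    return key % m

def h2(key, m):                    # secondary hash: never 0, and with m
    return 1 + key % (m - 1)       # prime it is relatively prime to m

def insert(table, key, value):
    m = len(table)
    for i in range(m):
        slot = (h1(key, m) + i * h2(key, m)) % m
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("hash table is full")   # open addressing can fill up

def search(table, key):
    m = len(table)
    for i in range(m):
        slot = (h1(key, m) + i * h2(key, m)) % m
        if table[slot] is None:
            return None            # hit an empty slot: key is absent
        if table[slot][0] == key:
            return table[slot][1]
    return None

table = [None] * 11
insert(table, 24, "a")
insert(table, 35, "b")   # 35 % 11 == 24 % 11 == 2: collides, re-probes
```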
- What operations do we want in the ADT iterator?
Consider including rewind() in the ADT: is it always going to
be possible to implement it? If "no," in which situations
would it not be possible?
- Get iterator.
- Get next value.
- Detect end of iterable.
Can't rewind if data is gone, e.g., live stock feed.
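A minimal sketch of the ADT in Python; the method names (has_next, next, rewind) are illustrative, not prescribed.

```python
# Iterator ADT over a list source.
class ListIterator:
    def __init__(self, data):
        self._data = data
        self._pos = 0

    def has_next(self):            # detect end of iterable
        return self._pos < len(self._data)

    def next(self):                # get next value
        value = self._data[self._pos]
        self._pos += 1
        return value

    def rewind(self):              # possible here because the data persists;
        self._pos = 0              # not possible for, e.g., a live stock feed

it = ListIterator([10, 20, 30])
values = [it.next() for _ in range(3)]   # [10, 20, 30]
it.rewind()
```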
- Let us say we have an algorithm that runs in 8(lg n) + 10 time.
We want to show its order of complexity is θ(lg n). Give
an example of a set of k1, k2,
and n0 that would do this, so that
k1 * lg n < 8(lg n) + 10 < k2 * lg n for all n ≥ n0.
Example values:
k1 = 7
k2 = 9
n0 = 2048
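A quick numeric check of these constants (note k1 must fall below the leading constant 8 and k2 above it; 9 lg n exceeds 8 lg n + 10 once lg n > 10):

```python
import math

# Verify k1 * lg n < 8 * lg n + 10 < k2 * lg n over a range of n >= n0.
k1, k2, n0 = 7, 9, 2048
ok = all(
    k1 * math.log2(n) < 8 * math.log2(n) + 10 < k2 * math.log2(n)
    for n in range(n0, n0 + 10000)
)
```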
- Imagine you have an 8-sided die, numbered 1 through 8. Use an
indicator random variable to show how many 8s we expect to get
in 100 rolls.
See here for example.
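A sketch of the indicator-variable argument, worked in Python: let X_i be 1 when roll i shows an 8 and 0 otherwise, so E[X_i] = 1/8, and by linearity of expectation E[X] = sum of the E[X_i] over 100 rolls.

```python
from fractions import Fraction

# Indicator variable X_i: 1 if roll i shows an 8, else 0.
# E[X_i] = P(roll i is an 8) = 1/8 for a fair 8-sided die.
# Linearity of expectation: E[X] = E[X_1] + ... + E[X_100] = 100 * 1/8.
p_eight = Fraction(1, 8)
expected_eights = sum(p_eight for _ in range(100))   # 100 * 1/8 = 12.5
```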
- Draw the binary search tree that arises from inserting, in
order:
24, 32, 10, 6, 88, 20, 28, 22, 1, 99
- In the above tree, describe the process by which we find the
successor of 22.
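Both answers can be sketched together in Python: build the tree with parent pointers, then find the successor of 22. Since 22 has no right child, we climb until we leave a right subtree (22 is the right child of 20, 20 the right child of 10, and 10 the left child of 24), so the successor is 24.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None

def insert(root, key):
    parent, cur = None, root
    while cur is not None:             # walk down to find the insert point
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    node = Node(key, parent)
    if parent is None:
        return node
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root

def minimum(node):
    while node.left is not None:
        node = node.left
    return node

def successor(node):
    if node.right is not None:         # successor is min of right subtree
        return minimum(node.right)
    up = node.parent                   # otherwise climb until we come
    while up is not None and node is up.right:   # out of a LEFT subtree
        node, up = up, up.parent
    return up

root = None
for k in [24, 32, 10, 6, 88, 20, 28, 22, 1, 99]:
    root = insert(root, k)

# Resulting shape:
#             24
#        10        32
#      6    20   28   88
#     1       22         99

node22 = root.left.right.right         # 24 -> 10 -> 20 -> 22
succ = successor(node22)               # climbs 22 -> 20 -> 10, lands on 24
```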
- Use an indicator random variable to show how many numbers we
have to pick from a set of n numbers before we are
more likely than not to have drawn the same number twice.
See here for example.
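One way to make this concrete: compute the smallest number of draws k (uniform, with replacement, from n values) for which the probability of a repeat exceeds 1/2, using P(all k distinct) = ∏ (n − i)/n for i = 0 .. k−1. A sketch:

```python
from fractions import Fraction

# Smallest k such that P(some value repeats among k draws from n) > 1/2.
def picks_for_likely_repeat(n):
    p_distinct = Fraction(1)
    k = 0
    while p_distinct > Fraction(1, 2):
        p_distinct *= Fraction(n - k, n)   # next draw avoids the k seen so far
        k += 1
    return k

picks_for_likely_repeat(365)   # classic birthday problem: 23
```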
- Write pseudo-code for a hash table (you can just assume a hash
function of h) that resolves collisions by chaining, and that
looks up a (possibly chained) key and returns the
associated data.
Example code here.
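A Python sketch of one such table. The hash function h here is a stand-in (division method, as the problem allows), and each slot holds its chain as a list of (key, value) pairs.

```python
# Hash table with chaining: colliding keys share a slot's chain.
def h(key, m):                         # stand-in hash function
    return key % m

def make_table(m):
    return [[] for _ in range(m)]      # each slot starts as an empty chain

def put(table, key, value):
    chain = table[h(key, len(table))]
    for i, (k, _) in enumerate(chain):
        if k == key:                   # key already present: overwrite
            chain[i] = (key, value)
            return
    chain.append((key, value))         # otherwise append to the chain

def lookup(table, key):
    for k, v in table[h(key, len(table))]:   # walk the (possibly chained) slot
        if k == key:
            return v
    return None                        # key not found

t = make_table(7)
put(t, 3, "three")
put(t, 10, "ten")    # 10 % 7 == 3 % 7: chains in the same slot as key 3
```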
- Write pseudo-code to search a BST for a particular key.
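A minimal iterative version in Python; the Node class is just scaffolding for the example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def bst_search(node, key):
    # Walk down: go left when the key is smaller, right when larger.
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node            # the matching node, or None if the key is absent

root = Node(24, Node(10, Node(6), Node(20)), Node(32))
```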
- The logistic equation is
x_n = r * x_(n-1) * (1 - x_(n-1)).
Write memoized pseudo-code that uses recursion, but also
uses direct addressing to store results.
Remember to both store value and retrieve the value!
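One way this could look in Python: the memo is a plain array indexed directly by n (direct addressing), and the recursive step both stores the computed value and retrieves it on later calls.

```python
# Memoized logistic recurrence x_n = r * x_(n-1) * (1 - x_(n-1)).
def make_logistic(r, x0, max_n):
    memo = [None] * (max_n + 1)        # direct-address table: slot n holds x_n
    memo[0] = x0

    def x(n):
        if memo[n] is None:                 # not yet computed?
            prev = x(n - 1)                 # recurse on x_(n-1)...
            memo[n] = r * prev * (1 - prev) # ...and STORE the result
        return memo[n]                      # RETRIEVE the stored value

    return x

x = make_logistic(3.5, 0.5, 100)
x(10)
```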
Multiple Choice
- Compared to chaining, open addressing has the downside that:
- It is very slow.
- The hash table can fill up. *
- It can't resolve collisions.
- The hash table takes a huge amount of space.
- We have one algorithm for processing customer records
with run time of O(n), and another with
run time of O(lg n) + 1000. In what circumstances might we want
to choose the O(n) algorithm?
- No circumstances.
- If our programmers are really bad.
- We believe our program will always be dealing with a
number of records less than one thousand. *
- If n is very large.
- Under what circumstances might we most reasonably expect to
receive worst-case input to our binary search tree?
- We are loading the tree from an existing data store
which is sorted. *
- We insert data into the tree as it comes along to us.
- The data has been randomly shuffled.
- All of the above.
- When we randomize an algorithm, we then speak of its
- necessary running time.
- best-case running time.
- expected running time. *
- unbounded running time.
- The linearity of expectations means that, for two random variables X
and Y, the expectation of their sum X + Y equals
- the product of the expectation of X
and the expectation of Y
- the difference of the expectation of X
and the expectation of Y
- the square of the expectation of X
and the expectation of Y
- the sum of the expectation of X and
the expectation of Y *
- The worst-case running time of a search on a binary search
tree is
- θ(n^2)
- θ(n) *
- θ(lg n)
- θ(1)
- If we have a binary search tree of height h, then
all searching functions such as min, max, and successor
will run in
- O(h) time. *
- O(h^2) time.
- O(ln h) time.
- O(lg h) time.
- According to the binary search tree property
- for all nodes y higher in the tree than x,
x.key ≤ y.key.
- for all nodes y in the right subtree of x,
x.key ≤ y.key. *
- for all nodes y lower in the tree than x,
x.key ≤ y.key.
- all nodes in the right subtree of x will
be the successors of x.
- The expected value of a random variable X is:
- the weighted sum of all its values, weighted
according to the probability that they happen. *
- the single value that occurs most often.
- the value in the middle of its probability
distribution.
- all of the above, as they are equivalent.
- Θ-notation applied to a function implies
it is
- a function we know little about.
- asymptotically bound from above and below. *
- asymptotically bound from below only.
- asymptotically bound from above only.
- For large n, an algorithm will run the slowest
if its time complexity is:
- O(20n lg n)
- O(n!) *
- O(n^3)
- O(2^n)
- An algorithm is most like:
- a carefully designed recipe. *
- a car engine.
- an organic molecule.
- a literary work.
- O-notation applied to a function implies
- it is a function we know little about.
- it is asymptotically bound from above and below.
- only that it is asymptotically bound from below.
- only that it is asymptotically bound from above. *
- For large n, an algorithm will run the fastest
if its time complexity is:
- O((1.1)^n)
- O(n!)
- O(n^2) *
- O(n^3)
- Besides running time, we can also measure algorithm performance
by:
- disk usage
- memory usage
- power consumed
- all of the above *
- If f(n) = ω(g(n)), that means
- g dominates f asymptotically
- f is bounded below by g asymptotically
- f is bounded above by g asymptotically
- f dominates g asymptotically *
- Which of these functions grows most slowly?
- lg n
- lg* n *
- ln n
- log_10 n
- Ω-notation applied to a function implies
- it is a function we know little about.
- it is asymptotically bound from above and below.
- only that it is asymptotically bound from below. *
- only that it is asymptotically bound from above.
- In algorithm analysis, we usually analyze algorithms
in terms of
- actual running time in nanoseconds.
- the number of disk operations.
- the number of basic operations. *
- CPU time used.
- In algorithm analysis, we usually look at
- worst-case performance. *
- average-case performance.
- best-case performance.
- median-case performance.
- The chief advantage of a doubly-linked list over a singly-linked list is
- we can walk the list in either direction. *
- inserts are faster.
- deletes are faster.
- less memory is used.
- A good use of direct-address tables would be
- memoization of a numeric algorithm.
- coding the Sieve of Eratosthenes.
- marking zipcodes seen in a mailing application.
- all of the above. *
- The worst case of hashing with chaining occurs when
- the input comes in sorted already.
- all inputs hash to different values.
- the input is purely random.
- all inputs hash to the same value. *
- The "clustering" problem with open addressing with linear
probing means that
- "clusters" of keys will all hash to the same value.
- "clusters" of keys will build up in a linked list.
- the hash table clusters around its central value.
- once inputs start to cluster in one area of the
hash table, they will become more likely to do so. *
- A random variable is actually
- the mean of all possible values.
- an algorithm.
- a precise real number value.
- a function. *