Spring 2023

## Analysis of Algorithms

• In the previous week, we looked into
• difficulty of merge-sort (upper bound)
• difficulty of sorting (lower bound)
• But we have many more questions that can be asked:
• How long will an implementation take on a particular computer?
• How does merge-sort compare to other $$O(N\log N)$$ algorithms?
• How does merge-sort compare to algorithms that are fast on average but slow in the worst case?
• How does it compare to algorithms that are not based on comparison?

## Steps of Analysis

• Implement the algorithm completely
• Determine the time required for each basic operation
• Identify unknown quantities that can be used to describe the frequency of execution of the basic operations.
• Develop a realistic model for the input to the program.
• Analyze the unknown quantities, assuming the modeled input.
• Calculate the total running time by multiplying the time by the frequency for each operation, then adding all the products.
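
The last step above can be sketched in a few lines of Java. This is only an illustration: the class name `CostModel`, the method `totalTime`, and the numbers in `main` are hypothetical and not taken from any measurement.

```java
final class CostModel {
    /** Total running time: sum over all operations of (unit time x execution frequency). */
    static double totalTime(double[] unitTime, double[] frequency) {
        if (unitTime.length != frequency.length) {
            throw new IllegalArgumentException("one unit time per frequency is required");
        }
        double total = 0.0;
        for (int k = 0; k < unitTime.length; k++) {
            total += unitTime[k] * frequency[k];
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical values, chosen only for illustration.
        double[] unitTime  = {4.0, 11.0};    // time units per compare and per exchange
        double[] frequency = {100.0, 20.0};  // how often each operation is executed
        System.out.println(totalTime(unitTime, frequency)); // 4*100 + 11*20 = 620.0
    }
}
```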

## Implementation

• program: a careful implementation of an algorithm
• one algorithm may correspond to many programs
• a program gives an object to study
• a program provides useful experimental data
• do not over-emphasize efficiency too early
• analysis will result in better efficiency

## Estimation of operation times

• Can be done quite easily in most cases
• profilers help a lot
• We will be mostly interested in machine-independent implementations in this course

## Identify frequency of execution

• Study branching structure
• Compute execution frequencies as unknowns
• Concentrate on high frequency/cost items

## Develop input model

• Input size determines the unknown quantities computed in the previous step
• We usually refer to the size of the input as $$N$$
• By “model” we mean the characteristics of the input
• A classic example for sorting is: “a randomly ordered, distinct array of numbers of size $$N$$”
• An alternative is: “a random array of integers between 1 and 1000”
• The algorithm's behaviour and performance may change with the input model
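
As a rough sketch, the two input models mentioned above could be generated as follows. The class and method names are assumptions made for this example. The first model is realized with a Fisher-Yates shuffle of the keys 1..N, which is one common way to obtain a uniformly random ordering.

```java
import java.util.Random;

final class InputModels {
    /** Model 1: a uniformly random ordering of the distinct keys 1..n (Fisher-Yates shuffle). */
    static int[] randomPermutation(int n, Random rng) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i + 1;
        for (int i = n - 1; i > 0; i--) {
            int r = rng.nextInt(i + 1);              // uniform index in [0, i]
            int t = a[i]; a[i] = a[r]; a[r] = t;
        }
        return a;
    }

    /** Model 2: n independent integers drawn uniformly from 1..1000 (duplicates are likely). */
    static int[] randomSmallIntegers(int n, Random rng) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 1 + rng.nextInt(1000);
        return a;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println(java.util.Arrays.toString(randomPermutation(10, rng)));
        System.out.println(java.util.Arrays.toString(randomSmallIntegers(10, rng)));
    }
}
```

Under the second model equal keys appear as soon as N is moderately large, so an analysis that assumes distinct keys no longer applies directly.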

## Analyze the quantities based on the input

• For average case analysis,
• compute average frequencies
• multiply with operation costs
• sum all up
• Worst case,
• compute maximum frequencies
• multiply with operation costs
• sum all up

## Approximations

• This scheme can give very precise results in most cases
• However, the details are daunting
• So, we usually seek approximate models that can be used to estimate costs
• Computing each operational cost can be a really tedious task
• Alternatively, we only focus on inner operations
• For example, for sorting we only count compares

## Average Case Analysis

• Our focus in this course
• Formulate a reasonable input model
• Analyze the expected time based on inputs from this model
• Effective for two reasons:
• straightforward models of randomness are often extremely accurate
• we can often inject randomness to the problem instance

## Random Models

• How to compute the mean?

## Distributional Approach

• Let $$\Pi_N$$ be the number of possible inputs of size $$N$$
• Let $$\Pi_{Nk}$$ be the number of inputs of size $$N$$ that cause the algorithm to have cost $$k$$
• Therefore, $$\Pi_N=\sum_k{\Pi_{Nk}}$$
• Probability that the cost is $$k$$: $\Pi_{Nk}/\Pi_N$
• Expected cost of the algorithm: $\frac{1}{\Pi_N}\sum_k{k\Pi_{Nk}}$

## Cumulative Approach

• Let $$\Sigma_N$$ be the total (or cumulated) cost of the algorithm on all inputs of size $$N$$
• That is, $$\Sigma_N = \sum_k{k\Pi_{Nk}}$$
• Then, the average cost is $$\Sigma_N / \Pi_N$$
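
The following sketch computes the average both ways for a small $$N$$ by enumerating all permutations. The cost function (number of inversions) is a stand-in chosen only for illustration, and all names are hypothetical.

```java
final class AverageCost {
    /** Hypothetical cost function used only for illustration: the number of inversions. */
    static int cost(int[] a) {
        int inv = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] > a[j]) inv++;
        return inv;
    }

    /** Adds to counts[k] the number of orderings of a[pos..] (prefix fixed) whose cost is k. */
    static void tally(int[] a, int pos, long[] counts) {
        if (pos == a.length) { counts[cost(a)]++; return; }
        for (int i = pos; i < a.length; i++) {
            int t = a[pos]; a[pos] = a[i]; a[i] = t;   // choose a[i] for position pos
            tally(a, pos + 1, counts);
            t = a[pos]; a[pos] = a[i]; a[i] = t;       // undo the choice
        }
    }

    /** Returns {distributional average, cumulative average} over all permutations of 1..n. */
    static double[] averages(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i + 1;
        long[] counts = new long[n * (n - 1) / 2 + 1]; // cost ranges over 0..n(n-1)/2
        tally(a, 0, counts);

        long inputs = 0;      // Pi_N: number of inputs of size n
        long cumulated = 0;   // total cost over all inputs of size n
        for (int k = 0; k < counts.length; k++) {
            inputs += counts[k];
            cumulated += (long) k * counts[k];
        }
        double expected = 0.0;  // sum over k of k * (Pi_Nk / Pi_N)
        for (int k = 0; k < counts.length; k++) {
            expected += k * ((double) counts[k] / inputs);
        }
        return new double[] { expected, (double) cumulated / inputs };
    }

    public static void main(String[] args) {
        double[] avg = averages(4);
        System.out.println("distributional: " + avg[0] + ", cumulative: " + avg[1]);
    }
}
```

Both numbers agree, as they must. The cumulative form avoids computing the individual probabilities.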

## Quicksort: a Java Implementation

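    private void quicksort(int[] a, int lo, int hi)
    {
        if (hi <= lo) return;                 // 0 or 1 element: nothing to do
        int i = lo-1, j = hi;                 // scanning pointers
        int t, v = a[hi];                     // v is the partitioning element
        while (true)
        {
            while (a[++i] < v) ;              // scan right; a[hi] == v stops the scan
            while (v < a[--j]) if (j == lo) break;   // scan left, stopping at lo
            if (i >= j) break;                // pointers have crossed
            t = a[i]; a[i] = a[j]; a[j] = t;  // exchange the out-of-place pair
        }
        t = a[i]; a[i] = a[hi]; a[hi] = t;    // move v into its final position
        quicksort(a, lo, i-1);
        quicksort(a, i+1, hi);
    }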

## Quicksort: Analysis

• Identify resource requirements
• while (a[++i] < v) ; translates, on a typical machine, into something like

    LOOP  INC I,1      # increment i
          CMP V,A(I)   # compare v with A(i)
          BL  LOOP     # branch if less

• Each iteration costs about 4 time units, one per memory access
• The other while loop is similar

## Identify Frequencies

• A – the number of partitioning stages

• B – the number of exchanges

• C – the number of compares

• On a typical computer, the total running time is about $4C + 11B + 35A$ time units

• The exact coefficients depend on the compiler and the computer architecture

• The coefficient of C is significantly lower compared to mergesort
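
The three frequencies can be measured by instrumenting the implementation. The sketch below keeps the partitioning code unchanged and adds counters. The class name, the helper `less`, and the choice to count only exchanges inside the partitioning loop (the final pivot swap is attributed to the stage) are assumptions of this example.

```java
final class QuicksortCounter {
    long stages;     // A: number of partitioning stages
    long exchanges;  // B: exchanges inside the partitioning loop (an assumption of this sketch)
    long compares;   // C: number of key compares

    private boolean less(int x, int y) { compares++; return x < y; }

    void sort(int[] a) { quicksort(a, 0, a.length - 1); }

    private void quicksort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        stages++;
        int i = lo - 1, j = hi;
        int t, v = a[hi];
        while (true) {
            while (less(a[++i], v)) ;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            t = a[i]; a[i] = a[j]; a[j] = t;
            exchanges++;
        }
        t = a[i]; a[i] = a[hi]; a[hi] = t;   // pivot swap, not counted in B here
        quicksort(a, lo, i - 1);
        quicksort(a, i + 1, hi);
    }

    /** Cost estimate 4C + 11B + 35A, using the coefficients quoted above. */
    long estimatedCost() { return 4 * compares + 11 * exchanges + 35 * stages; }

    public static void main(String[] args) {
        int[] a = {3, 1, 2};
        QuicksortCounter q = new QuicksortCounter();
        q.sort(a);
        System.out.println("A=" + q.stages + " B=" + q.exchanges + " C=" + q.compares
                + " cost=" + q.estimatedCost());
    }
}
```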

## Quicksort Analysis

Theorem: Quicksort uses, on average,

• $$(N - 1)/2$$ partitioning stages,
• $$2(N + 1) (H_{N+1} - 3/2) \approx 2N\ln N - 1.846N$$ compares, and
• $$(N + 1) (H_{N+1} - 3)/3 + 1 \approx .333N\ln N - .865N$$ exchanges

to sort an array of $$N$$ randomly ordered distinct elements, where $$H_N = \sum_{1\leq k\leq N}{1/k}$$ is the $$N$$-th harmonic number.

Proof: Full proof in the book, if you are interested.

## Quicksort Analysis

$C_N = N+1 + \frac{1}{N}\sum_{1\leq j\leq N}{(C_{j-1}+C_{N-j})}$

• $$N+1$$ compares in the first partitioning stage
• $$C_{j-1}+C_{N-j}$$ compares for the two subarrays when the partitioning element is the $$j^{th}$$ smallest
• each value of $$j$$ occurs with probability $$1/N$$, hence the factor $$1/N$$ in front of the sum

## Quicksort Analysis

$C_N = N+1 + \frac{1}{N}\sum_{1\leq j\leq N}{(C_{j-1}+C_{N-j})}$

• Note that the two parts of $$\displaystyle\sum_{1\leq j\leq N}{(C_{j-1}+C_{N-j})}$$ are equal: as $$j$$ runs from 1 to $$N$$, $$N-j$$ takes the same values as $$j-1$$ in reverse order
• $$\displaystyle\sum_{1\leq j\leq N}{C_{j-1}} = \sum_{1\leq j\leq N}{C_{N-j}}$$
• So, $$\displaystyle\sum_{1\leq j\leq N}{(C_{j-1}+C_{N-j})} = 2\sum_{1\leq j\leq N}{C_{j-1}}$$

## Quicksort Analysis

$C_N = N+1 + \frac{2}{N}\sum_{1\leq j\leq N}{C_{j-1}}$

• Multiply both sides by $$N$$:

$NC_N = N^2+N + 2\sum_{1\leq j\leq N}{C_{j-1}}$

• Write the same equation for $$N-1$$:

$(N-1)C_{N-1} = (N-1)^2+N-1 + 2\sum_{1\leq j\leq N-1}{C_{j-1}}$

## Quicksort Analysis

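• Subtract the equation for $$N-1$$ from the equation for $$N$$; the polynomial terms differ by $$2N$$ and the sums differ by the single term $$C_{N-1}$$

$NC_N - (N-1)C_{N-1}= 2N+2C_{N-1}$

• Rearrange:

$NC_N = (N+1)C_{N-1} + 2N$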

• divide both sides by $$N(N+1)$$

$\frac{C_N}{N+1} = \frac{C_{N-1}}{N} + \frac{2}{N+1}$

## Quicksort Analysis

• Iterating gives:

$\frac{C_N}{N+1} = \frac{C_1}{2} + 2\sum_{3\leq k\leq N+1}{1/k}$

• $$C_1=0$$ and $$\displaystyle\sum_{3\leq k\leq N+1}{1/k} = H_{N+1}-3/2$$, so

$C_N = 2(N+1)(H_{N+1}-3/2) \approx 2N\ln N - 1.846N$

• The approximation uses $$H_N \approx \ln N + \gamma$$, where $$\gamma \approx 0.57721$$ is the Euler–Mascheroni constant.
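
As a numerical sanity check, the sketch below iterates the telescoped recurrence $$C_N/(N+1) = C_{N-1}/N + 2/(N+1)$$ from $$C_1 = 0$$ and compares the result with the closed form and with the approximation. The class and method names are hypothetical. Floating-point arithmetic is used, so agreement is only up to rounding.

```java
final class QuicksortCompares {
    /** C_N obtained by iterating C_N/(N+1) = C_{N-1}/N + 2/(N+1), starting from C_1 = 0. */
    static double byRecurrence(int n) {
        double ratio = 0.0;                  // C_1 / 2
        for (int m = 2; m <= n; m++) {
            ratio += 2.0 / (m + 1);          // ratio is now C_m / (m + 1)
        }
        return (n + 1) * ratio;
    }

    /** Closed form 2(N+1)(H_{N+1} - 3/2). */
    static double closedForm(int n) {
        double h = 0.0;                      // harmonic number H_{n+1}
        for (int k = 1; k <= n + 1; k++) h += 1.0 / k;
        return 2.0 * (n + 1) * (h - 1.5);
    }

    /** Approximation 2N ln N - 1.846N. */
    static double approximation(int n) {
        return 2.0 * n * Math.log(n) - 1.846 * n;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 100000; n *= 10) {
            System.out.println(n + ": " + byRecurrence(n) + " " + closedForm(n)
                    + " " + approximation(n));
        }
    }
}
```

The first two columns coincide up to rounding, and the relative gap to the approximation shrinks as $$N$$ grows.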

## How to Improve Quicksort?

• Small subarrays: A small array can be sorted with simpler techniques much faster
• An array of size 2 requires one compare and one potential exchange
• Use insertion sort for small arrays
• When to switch from quicksort recursion to insertion sort?
• Median-of-three: take a small sample and use the median as a pivot
• Radix-exchange sort: Consider the keys to be bit strings and partition bit by bit
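
A minimal sketch of the first two improvements, assuming the same partitioning scheme as the implementation shown earlier. The class name and the cutoff value `CUTOFF = 8` are hypothetical; a suitable cutoff depends on the machine and should be found by experiment. Radix-exchange sort is not covered by this sketch.

```java
final class TunedQuicksort {
    static final int CUTOFF = 8;   // hypothetical threshold for switching to insertion sort

    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) { insertionSort(a, lo, hi); return; }

        // Median-of-three: order a[lo], a[mid], a[hi], then move the median to a[hi].
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) swap(a, lo, mid);
        if (a[hi] < a[lo]) swap(a, lo, hi);
        if (a[hi] < a[mid]) swap(a, mid, hi);
        swap(a, mid, hi);          // a[hi] now holds the median of the three samples

        int i = lo - 1, j = hi;
        int v = a[hi];             // partitioning element
        while (true) {
            while (a[++i] < v) ;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, hi);            // move v into its final position
        sort(a, lo, i - 1);
        sort(a, i + 1, hi);
    }

    /** Sorts a[lo..hi] by insertion; does nothing when the range is empty. */
    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0, 11, 10};
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```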