Chapter 6: Search Analysis


February 9th, 2018

Search Analysis

How long does a linear search take? If we assume that the element v is present in the array a, then the average search visits n/2 elements, where n is the length of the array. If it is not present, then all n elements must be inspected to verify the absence. Either way, a linear search is an O(n) algorithm.
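
For reference, here is a minimal linear search in Java; the class and method names are my own, not from the text. In the worst case (v absent), every element is visited, which is the O(n) behavior described above.

public class LinearSearchDemo
{
    // Returns the index of the first occurrence of v in a, or -1 if v is not present.
    public static int linearSearch(int[] a, int v)
    {
        for (int i = 0; i < a.length; i++)
        {
            if (a[i] == v) { return i; }
        }
        return -1;
    }

    public static void main(String[] args)
    {
        int[] a = { 8, 3, 17, 5, 42, 11 };
        System.out.println(linearSearch(a, 42));  // prints 4
        System.out.println(linearSearch(a, 7));   // prints -1
    }
}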

A binary search is an O(log(n)) algorithm.
That result makes intuitive sense. Suppose that n is 100. Then after each step, the size of the search range is cut in half, to 50, 25, 12, 6, 3, and 1. After seven comparisons we are done. This agrees with our formula, because log2(100) ≈ 6.64386, and indeed the next larger power of 2 is 2^7 = 128.
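
For comparison, the sketch below shows an iterative binary search in Java (again, the names are illustrative, and the array is assumed to be sorted in ascending order). Each pass through the loop cuts the remaining range roughly in half, which is where the log2(n) bound comes from.

public class BinarySearchDemo
{
    // Returns the index of v in the sorted array a, or -1 if v is not present.
    public static int binarySearch(int[] a, int v)
    {
        int low = 0;
        int high = a.length - 1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2;  // midpoint of the current range
            if (a[mid] == v) { return mid; }
            else if (a[mid] < v) { low = mid + 1; }   // continue in the upper half
            else { high = mid - 1; }                  // continue in the lower half
        }
        return -1;
    }

    public static void main(String[] args)
    {
        int[] a = { 3, 5, 8, 11, 17, 42 };
        System.out.println(binarySearch(a, 17));  // prints 4
        System.out.println(binarySearch(a, 7));   // prints -1
    }
}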

Selection Sort
Let us count the number of operations that the program must carry out to sort an array with the selection sort algorithm. We don’t actually know how many machine operations are generated for each Java instruction, or which of those operations are more time-consuming than others, but we can make a simplification. We will simply count how often an array element is visited. Each visit is accompanied by about the same amount of work from other operations, such as incrementing subscripts and comparing values.
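
To make the counting concrete, here is a plain selection sort in Java (a sketch with illustrative names). The element visits counted below correspond to the scan over the remaining elements when looking for the minimum, plus the two elements touched by each swap.

public class SelectionSortDemo
{
    // Sorts a in ascending order. In step i we scan positions i..n-1 for the
    // smallest remaining element and then swap it into position i.
    public static void selectionSort(int[] a)
    {
        for (int i = 0; i < a.length - 1; i++)
        {
            int minPos = i;
            for (int j = i + 1; j < a.length; j++)
            {
                if (a[j] < a[minPos]) { minPos = j; }
            }
            int temp = a[i];       // swap: two element visits
            a[i] = a[minPos];
            a[minPos] = temp;
        }
    }

    public static void main(String[] args)
    {
        int[] a = { 11, 9, 17, 5, 12 };
        selectionSort(a);
        System.out.println(java.util.Arrays.toString(a));  // [5, 9, 11, 12, 17]
    }
}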

Let n be the size of the array. First, we must find the smallest of n numbers. To achieve that, we must visit n array elements. Then we swap the elements, which takes two visits. (You may argue that there is a certain probability that we don’t need to swap the values. That is true, and one can refine the computation to reflect that observation. As we will soon see, doing so would not affect the overall conclusion.) In the next step, we need to visit only n − 1 elements to find the minimum. In the following step, n − 2 elements are visited to find the minimum. The last step visits two elements to find the minimum. Each step requires two visits to swap the elements. Therefore, the total number of visits is

n + 2 + (n − 1) + 2 + … + 2 + 2

= n + (n − 1) + … + 2 + (n − 1) ⋅ 2

= 2 + … + (n − 1) + n + (n − 1) ⋅ 2

= n(n + 1)/2 − 1 + (n − 1) ⋅ 2

Multiplying out and collecting terms in n, the total number of visits is

(1/2)n^2 + (5/2)n − 3
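
As a quick sanity check on the algebra, this small sketch (not part of the original derivation) adds up the visits step by step, exactly as counted above, and compares the total with the closed form.

public class VisitCountCheck
{
    public static void main(String[] args)
    {
        // Check n = 2, 20, 200, 2000 against the closed form.
        for (int n = 2; n <= 2000; n *= 10)
        {
            long visits = 0;
            // k visits to find the minimum of k remaining elements (k = n, n - 1, ..., 2),
            // plus 2 visits for the swap in each of the n - 1 steps.
            for (int k = n; k >= 2; k--)
            {
                visits = visits + k + 2;
            }
            // n(n + 5)/2 - 3 is the same as (1/2)n^2 + (5/2)n - 3.
            long formula = (long) n * (n + 5) / 2 - 3;
            System.out.println(n + ": counted " + visits + ", formula " + formula);
        }
    }
}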

Let’s simplify the analysis further:

When you plug in a large value for n (say, 1,000 or 2,000), the term (1/2)n^2 is 500,000 or 2,000,000, whereas the term (5/2)n − 3 is only 2,497 or 4,997. Next to 500,000 or 2,000,000, those numbers make hardly any difference to the magnitude of the result, so we can simply ignore them for the purpose of the analysis.
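
A few lines of Java make the same point for several values of n; the numbers printed are just the two terms of the formula evaluated directly.

public class DominantTermDemo
{
    public static void main(String[] args)
    {
        int[] sizes = { 1000, 2000, 10000, 100000 };
        for (int n : sizes)
        {
            long quadratic = (long) n * n / 2;   // (1/2) n^2
            long linear = 5L * n / 2 - 3;        // (5/2) n - 3
            System.out.println(n + ": (1/2)n^2 = " + quadratic + ", (5/2)n - 3 = " + linear);
        }
    }
}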

Furthermore, we will show that the factor of 1/2 in the quadratic term does not have much of an effect on our analysis either.

We are not interested in the actual count of visits for a single n. We want to compare the ratios of counts for different values of n.

If we compare the number of visits required to sort an array of 1,000 elements with the number required for an array of 2,000 elements, by what factor does the number of visits increase?

Ignoring the lower-order terms, the ratio is approximately

(1/2 ⋅ 2,000^2) / (1/2 ⋅ 1,000^2) = (2,000 / 1,000)^2 = 4

The factor of 1/2 cancels out in comparisons of this kind. We will simply say, “The number of visits is of order n^2.” That way, we can easily see that the number of visits increases fourfold when the size of the array doubles: (2n)^2 = 4n^2.
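
As a check that ignoring the lower-order terms was harmless, the exact counts from the formula give

(1/2 ⋅ 2,000^2 + 5/2 ⋅ 2,000 − 3) / (1/2 ⋅ 1,000^2 + 5/2 ⋅ 1,000 − 3) = 2,004,997 / 502,497 ≈ 3.99

which is already very close to the predicted factor of 4.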

To indicate that the number of visits is on the order of n^2, we will use the big-Oh notation: O(n^2).

The big-Oh notation captures only the fastest-growing term, n^2, and ignores its constant coefficient, no matter how large or small it may be.

It is safe to say that the number of machine operations, and the actual amount of time that the computer spends on them, is approximately proportional to the number of element visits. For argument’s sake, suppose there are about 10 machine operations for every element visit. Then the number of machine operations is in the neighborhood of 10 ⋅ (1/2)n^2 = 5n^2. Using the big-Oh notation, the number of machine operations required to sort is on the order of n^2, or O(n^2), and so is the sorting time.

If the size of the array increases by a factor of 100, the sorting time increases by a factor of 10,000. Sorting an array of a million elements takes 10,000 times as long as sorting an array of 10,000 elements. If 10,000 elements can be sorted in about half a second, then sorting one million elements requires well over an hour.
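
To see this growth empirically, you can time selection sort on random arrays of increasing size. The sketch below repeats the selectionSort method shown earlier and times it with System.nanoTime; the exact numbers depend on the machine and JVM warm-up, but the time should grow by roughly a factor of 4 each time n doubles.

import java.util.Random;

public class SortTimingDemo
{
    // The same selection sort as before, repeated so that this class compiles on its own.
    static void selectionSort(int[] a)
    {
        for (int i = 0; i < a.length - 1; i++)
        {
            int minPos = i;
            for (int j = i + 1; j < a.length; j++)
            {
                if (a[j] < a[minPos]) { minPos = j; }
            }
            int temp = a[i];
            a[i] = a[minPos];
            a[minPos] = temp;
        }
    }

    public static void main(String[] args)
    {
        Random generator = new Random(42);
        for (int n = 10000; n <= 40000; n *= 2)
        {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) { a[i] = generator.nextInt(); }

            long start = System.nanoTime();
            selectionSort(a);
            long elapsed = System.nanoTime() - start;

            // Each doubling of n should multiply the time by roughly 4.
            System.out.println(n + " elements: " + elapsed / 1000000 + " ms");
        }
    }
}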

1. If you increase the size of a data set tenfold, how much longer does it take to sort it with the selection sort algorithm?

2. How large does n need to be so that (1/2)n^2 is bigger than (5/2)n − 3?