# Sorting Algorithms An algorithm is a finite set of instructions that are well defined. They can be implemented by a computer and typically solve a particular type of problem. They are used to explicitly define specifications for computer functionality. Found in mathematics as well, they are not limited to computer science. Basically they are a set of rules that a computer or mathematician {sometimes there is little difference} uses to define a sequence of operations.

Sorting algorithms put elements of a list into a defined order. This could be numerical, alphabetical, or any type of ordering. Sorting is also used in canonicalization of data. Algorithms are typically named with the word “sort” after them, unless the word “sort” is in the name. Sorting has been researched since before computer science existed. Adding computers to the mix speeds up the process but adds complexity to the mix.

An algorithm must meet two criteria to be a sorting algorithm. First, the result cannot be in decreasing order. No element can be smaller than the previous element, unless you are sorting in decreasing order then it’s flipped. The other is that the result must be a permutation of the input. It is a reordering of what is input, new elements cannot be added.

## Episode Breakdown

### Understanding Sorting

#### Properties of Sorting Algorithms

Sorting may be internal or external. Internal sorts use local memory to store data while so Data that cannot be placed in memory to be sorted must use external sorting. Typically a merge sort or other sort that breaks the data into smaller chunks is used. An external storage device such as a hard disk or flash drive are used to store the data while sorting.

In-place algorithms are ones that do not use an extra data structure to hold temporary data. The changes happen in the actual data structure itself. A good though example is swapping the values (integers) of two variables without using a third one.

• a = 10, b = 20
• a = a + b :. 10 + 20 = 30
• b = a – b :. 30 – 20 = 10
• a = a – b :. 30 – 10 = 20

Sorting algorithms can be classified in several ways. First off is computational complexity. This is based on worst, average, and best cases. It is divided into the amount of time it takes to run an algorithm and the amount of space that process takes up in active memory.

Sort algorithms may be classified based on their general method of sorting: insertion, exchange, selection, merging, etc. Bubble and quicksort would be types of exchanges algorithms. Heapshot would be a selection algorithm. Algorithms may be recursive, non-recursive, or have properties of both.

They may also be classified based on their adaptability. If an algorithm takes into account the presorted order of the input it is considered adaptive. Algorithms that do not take into account the order of input are not adaptive. Finally almost all sorting algorithms are serial but they could be parallel.

#### Stability in Sorting

An algorithm is stable if items of the same sort value in the input are in the same order in the output. Data being sorted can be represented as a tuple of values. The part of the datum used for sorting is the key. The other part of the datum is irrelevant when it comes to sorting.

When sorting a set of data only part of the data is used for the sort key. Stability is not a concern if all parts of the datum are used for the key. It is also not a concern if each of the datum used for sorting in the set is unique. If the part of the datum which you are using as key has no repeated values then stability is not an issue.

It can be formally defined:

Let A be an array, and let &lt be a strict weak ordering on the elements of A. A sorting algorithm is stable if i &lt j and A[i] is the equivalent of A[j] implies pi(i) &lt pi(j) where pi is the sorting permutation (sorting moves A[i] to position pi(i)).

It basically means that elements retain their relative position after sorting. You have the data set of classrooms and sections of a class (10A 20C 30A 20B 10C 10B). To stably sort the set by room number you would output (10A 10C 10B 20C 20B 30A). To stably sort the set by section letter you would output (10A 30A 20B 10B 20C 10C).

• Original Set: 10A 20C 30A 20B 10C 10B
• Sort by Room: 10A 10C 10B 20C 20B 30A
• Sort by Class: 10A 30A 20B 10B 20C 10C

Stability becomes important with nested sorting. Nested sorting occurs when you want to sort the same data set by multiple criteria. This creates a pseudo grouping of the data. For example, if you have a data set containing student name, class, and grade and want to see grades in descending order per student you would need to sort by both students and grades.

Nesting requires backward thinking on how your are sorting. It requires starting the the final sort criteria and then working backward to the top level. If you’ve ever used Excel or other spreadsheets this is how you would group in them. Using the student example, first you would need to sort by the grades to get all the records in the set in order of grade. Then you would want to sort by student name preserving the previous order so the output will be grouped by student name and in order of grade. You may also want to preserve a previous sort order when running a new sort.

Any algorithm can be modified to be stable. The key comparison can be artificially extended. Comparison between two keys of equal value use the original input as the deciding factor. This will require adding more complexity to the sorting algorithm.

Another option is to use a primary and secondary key. In this the secondary key would be something like position. You could then use recursive or nested sorting to first compare primary then secondary keys.

#### Comparisons in Sorting

Comparison algorithms compare keys at each step of the algorithm. This is used to decide in what order to place the keys being compared. Generally they are easier to implement than non-comparison sorts. They read the list of keys through a single abstract comparison. This is usually a less than or equal to operation. It could also be a three way comparison.

The comparison operation must form a total preorder over the data. This involves using the transitive property from algebra. If a is related to b by a property, and b is related to c by the same property, then a is related to c by that property. If a &lt b and b &lt c, then a &lt c.

Comparison algorithms are connex relationships. A connex relationship is a homogenous relation where all the pairs relate in some way. For all instances a and b, a &le b or b &le a. This is for every possible pair of a and b in a set.

It is possible for two keys to equal one another. Mathematically a &le b and b &le a when a = b. In this case either key may be first in a sorted output. However, in a stable sort the input order determines which comes first in the output.

On average comparison sorts cannot be faster than O(n*log(n)). This is averaging the best possible and the worst possible run times. A set that is already sorted or has some unique factor will run faster. There is only one version of the list that is sorted. There are n!n! possible versions of a list. Chances are very low that the list will be input already sorted.

Integer or Counting sorts do not make comparisons. With these all the keys are integers. An integer sort determines how many keys are less than the given key.
a. If there are 14 elements less than the key then it will be placed at position 15. This places each element immediately in the correct slot without rearranging the list. Integer sorts are not limited to an average time of O(n*log(n)).

#### Complexity and Efficiency

It is important to know how well an algorithm will run when choosing which to use. All sorting algorithms output a sorted list. However the way they go about sorting affects the efficiency of the algorithm. Some work better with larger data sets but not as well with smaller ones. Others may be more efficient with smaller data sets but get hung up on larger ones.

Knowing the worst case is important for systems that need guaranteed performance. This will tell you how your system will work under real-time load. Also it can be used to mitigate DDOS and other attacks that try to overwhelm a system.

The running time of an algorithm is how many operations it must do before it completes. Each operations takes a set amount of time so this is a description of how long it takes an algorithm to complete.

Big O notation is used to describe algorithms time efficiency. Big O notations describes the worst case running time meaning the algorithm will never be slower than that time. Big Omega describes best case running times meaning the algorithm will never be faster than that time. Big Theta describes the case when Big O and Big Omega are the same. There also exists an average case running time meaning that on average the algorithm will not be slower than that time. For example, the worst case from comparison algorithms is O(n*log(n)) so that will be the slowest it will run.

Space complexity is how much space in memory or external storage is needed to run an algorithm. In-place algorithms have the smallest space requirement O(1) because they do not need any extra memory. Space complexity also uses Big O notations. For example, if an algorithm needs a list of size n and in the process of sorting creates a second list of size n then that algorithm needs n^2 space in memory.

#### Problems with Sorting Data

The way that sorting algorithms use (or overuse) memory can become a problem. This could be from very large data sets. When you know you’re going to be sorting large sets of data you can choose your algorithm to suit. However if you do not expect large data sets and get them you may run out of space in memory.

When running out of memory the algorithm may use slower hard disk or external storage. Context switching becomes an issue if you have to swap to and from the disk. This can drastically slow the process down. When working with complex records creating an index and sorting by the index reduces the amount of memory needed to sort.

“Expect the best, plan for the worst, and prepare to be surprised.” ~ Dennis Waitley

Various algorithms have different time costs to run them. There is generally a trade off between memory and time when choosing an algorithm. With real-time systems you want to run only the most necessary processes and algorithms. It is best practice to assume the worst case scenario on time for an algorithm. Not doing this can cause latency and other timing issues when you do get unexpected data to sort. To reduce live sorting time presort or batch longer sort processes.

Sorting speeds up searching but isn’t necessary. When searching a large data set it is faster to sort first. However the sorting will take time. You either sacrifice in the search or the sort. If all that is needed is an unfiltered search result then searching alone may be faster. Determining when to sort and when not to sort can be confusing. A lot of factors are involved in that process.

### Common Sorting Algorithms

#### Simple Sorting Algorithms

These algorithms are the easiest to implement and maintain. They work best on small data sets due to low overhead. However efficiency drops quickly as the set gets large.

Insertion Sort

Insertion sort is an efficient algorithm for small and mostly sorted lists. Typically it is used in combination with other sort algorithms. Insertion sorts by taking an element from the list and inserting it in the correct position. This takes up extra memory because it requires a new list of sorted elements. The new and old list can take up the same space but requires moving elements over by one.

Insertion sort doesn’t change the relative order so it is stable. Because it builds the new list one item at a time is O(n^2) and requires constant memory O(1). The sort performs at O(kn) where k is the greatest distance between two elements. Because they can be at opposite ends of the list then in the worst case k = n.

Selection Sort

Selection sort compares elements without using another list or memory space. It is best with smaller data sets though it is not as efficient as insertion sort. It has a time complexity of O(n^2).

Selection sort divides the input into two parts. The first is a list of sorted elements. This is built at the front of the full list. If using arrays then it will start at index 0. The rest of the list is the unsorted elements.

As the algorithm runs the list of unsorted elements dwindles as the list of sorted grows. It’s advantage comes when there is limited memory for other sorts.

#### Practical or Efficient Sorting Algorithms

Efficiency of sorting comes from a reduced time complexity. Almost all of these algorithms will have an average time complexity of O(n*log(n)). That means they are all comparison sorts. In most cases the Big Theta will be O(n*log(n)).

The cost comes in the space complexity. This can range from O(n) and best to O(n^2) at worst. The space issues are often resolved by combining these with other algorithms. These algorithms are likely to be used in several languages and standard libraries.

Merge Sort

Merge sort relies on the ease or merging previously sorted lists. It is used in several programming languages. In Perl it is the standard sort routine. In Python and Java it is part of the standard routine. It is also used in combination with Insertion in the complex Timsort algorithm.

It works by breaking the data down into smaller lists. It starts by comparing every two elements and making lists for each one. It then combines (merges) the lists of two elements into lists of four. This continues until the last two lists are merged.

It works well with large data sets having a Big O of time complexity of O(n*log(n)). However it sacrifices in the memory because it has to create so many sublists. These take up lots of extra space. The space complexity is O(n).

Quicksort

Quicksort works through use of multi-branched recursion. This is called a divide and conquer algorithm. It recursively breaks down the problem into smaller problems. It works backward from the merge sort which starts with the smallest comparison and builds up.

It uses a partition operation that partitions an array on a pivot. All elements smaller than the pivot are moved in front of it. All elements greater are moved behind it.

The tricky part of Quicksort is choosing the pivot element. Constantly choosing on either end of the sort causes the time complexity to drift toward O(n^2). However this is a rare occurrence and the typical performance is closer to the average O(n*log(n)). Because it doesn’t have to create separate lists for move the elements around it is another in-place algorithm with a space complexity of O(log n).

Heapsort

Heapsort is an efficient version of Selection sort. Like selection sort it breaks the data down into two parts, sorted and unsorted. Time efficiency is improved by using a heap. This is a special type of binary tree structure. The root node of the tree is the extreme element, largest or smallest depending on the direction of the sort. That node is removed and placed at the end of the list. The next node becomes the root. This speeds the process of finding the next element making the Big Theta O(n*log(n)).

Shellsort

Shellsort is a more efficient version of Insertion sort. It’s name comes from it’s inventor Donald Shell.

It moves the out of order element more than one position at a time. This first sorts elements farther away from each other. The gap between elements progressively shrinks causing the algorithm to get faster as it runs. This means that time complexity cannot be exactly calculated.

It runs more efficiently on mostly sorted data sets. Like Insertion it is an in-place algorithm and doesn’t require extra memory.

#### Distribution Sorts

In distributed sorts the data is distributed into intermediate structures before being combined. It is used for handling very large amounts of data that needs to be sorted. Data too large to be stored in memory is able to be broken down and stored externally while other pieces of the data are being sorted. This can happen on a single processor or it can be broken down and processes on several different processors.

Counting Sort

Counting sort is used when you know the input is within a particular set S. It is extremely fast, especially the smaller the set.

It works by creating an array of size of set S. It then counts the occurrences of a particular member of the set and places them in their own bin. The inputs are then counted by incrementing the associated bin. The counting array is then looped to put all the inputs in order.

It has a time complexity of O(|S|{Size of set} + n). The memory required is just the size of the set so O(|S|).

Bucket Sort

Bucket sort is another divide and conquer algorithm. It builds on Counting sort to allow for larger data sets.

Bucket sort divides an array into a finite amount of buckets or bins. Each one is sorted individually. This can be using a different algorithm or by recursion.

When using recursion it is similar to Merge sort. The exception that Merge sort doesn’t have a set number of buckets or sublists. Also Bucket sort doesn’t necessarily break them down into two element comparisons.

It works best when the data is evenly distributed. Efficiency is determined by the algorithm used to sort each bucket.

#### Bubble Sorts

Bubble sort and it’s variants are simple to learn but very inefficient. They are mostly seen in textbooks and tutorial. While they are used for studying sorting they are rarely used in the wild due to their inefficiency.

Bubble Sort

Bubble sort is one of the simplest sorting algorithms. This is the most commonly taught sorting algorithm because it’s easy to understand.

It starts by comparing the first two elements in a list. If needed it swaps these elements. Then it moves to the next two elements in the list and compares them. Once it has reached the end of the list it starts over again at the top of the list. It only stops when no swaps occur in the last run.

Bubble sort is plagued by rabbits and turtles. The distance and direction each element must move affects the performance of the algorithm. Rabbits are elements moving to the back of the list move faster because they win every comparison. Turtles are elements at the back of the list that need to be moved forward. They can only move one position per loop through the list.

It has a Big Theta of O(n^2). It’s not used often with larger data sets. It is most efficient with semi-sorted data when the lower efficiency is not a problem.

Cocktail Shaker Sort

Cocktail shaker sort is a variant of the Bubble sort. It is a stable comparison sorting algorithm that attempts to deal with turtles in Bubble sort.

Unlike Bubble sort which is unidirectional Cocktail sorts in both directions. It functions basically the same as the Bubble sort. Instead of restarting at the beginning of the list it moves backward through the list making the opposite comparison to put elements in order.

Even though it works in both directions it is only slightly more efficient than Bubble sort. The time complexity has a Big O of O(n^2) and a Big Omega of O(n), though it averages at O(n^2). Since it is in-place the Big Theta space complexity is O(1).

Comb Sort

Comb sort is another simple algorithm based on Bubble sort. It’s purpose is to eliminate turtles from Bubble sort.

It starts off by swapping elements at a certain distance from one another. This is opposed only swapping elements side-by-side in Bubble sort. The distance between elements gets smaller until it is doing a Bubble sort. Comb sort is to Bubble sort as Shellsort is to Insertion sort.

### Book Club

#### Art Matters: Because Your Imagination Can Change The World

Neil Gaiman

“The world always seems brighter when you’ve made something that wasn’t there before.” ~ Neil Gaiman