Author: Jack Robbins
This project contains a C program that uses a hexadecimal radix sort to sort signed integers taken in from standard input. A simple runner script is also provided for user convenience and ease of testing.
Most generic sorting algorithms like insertion sort, shell sort, quicksort, etc., work by comparing two values in an array and swapping them accordingly. Some methods of doing this are more efficient than others. Algorithms like Bubble Sort, the oldest and most basic algorithm, have a runtime of
This is because, for an algorithm to be generic, at some point the algorithm must compare two items and swap them. An interesting question then arises if we remove the requirement that the algorithm must be generic. When this requirement is dropped, it turns out that there are more efficient ways of sorting some kinds of data, specifically primitive kinds of data, whose binary represenation in memory we can manipulate. This is where Radix sort comes in.
Before answering this question it is important to recall how data is organized in memory. At its most fundamental level, all data are 1's and 0's. Both signed and unsigned integers are 4 Bytes, or 4*8 bits, of contiguous 1's and 0's in memory. With this we know that for each bit, there are only 2 possible values, 1 or 0, of which 1 is the largest and 0 is the smallest. We can leverage this fact to arrange data into "buckets", based on their values at a certain bit, Here is a simplified example, sorting 4 nibbles(half bytes):
By being consistent in how we move from the Least Significant Bit(LSB) up to the Most Significant Bit(MSB), and in how we update the array by merging both buckets together after we sort, the data arranges itself in order. The example above only goes bit by bit, and therefore needs two buckets. However, the program in this project impelements a hexadecimal radix sort. As opposed to going bit by bit, the program radix_sort.c goes nibble(4 bits) by nibble through each 32 bit integer, and uses 16 buckets, since for each nibble, there are
Although this example is very basic and simple, its basic principle of "bucket sorting" based on bits can be applied to 32 bit values like floats, integers, and even 64 bit values like doubles. This program only implements radix sort on integers, but there is another repository that does the same for floating point numbers.
Since Radix sort requires us to go through the array of integers 8 times for a hexadecimal radix sort, since there are
The above description of radix sort leaves out the fact that, for signed integers, the Most Significant Bit is always 1. For this reason, since radix sort looks at all values from an unsigned perspective, all of the negative values will be considered to be greater than the positive ones, and appear after the positive values if no corrections are made.
Without a correction, sorting this 9 -1 2 -11 23
would yield 2 9 23 -11 -1
. Luckily, this is very easy to fix in an efficient way, simply by grabbing the negative subarray and rearranging the positive one to be after it. Since both positive and negative subsets of integers do come out properly sorted, no additional sorting is needed, only some rearranging.
The function:
void to_signed_order(unsigned int* arr, int size);
fixes this problem in radix_sort.c.
The program radix_sort.c implements a hexadecimal radix sort of signed integers. The program itself is very well documented, so I will not reiterate its functionality here. I would encourage anyone who is interested to step through the code to understand it.
The runner script run.sh provides a convenient way to run the radix_sort.c program. The script takes in a positive integer value and generates the appropriate number of integers, saving them into a file named integers
. It will then pass the
values in this file into radix_sort.c and into the standard sort
command in Bash. The results of both of these procedures are stored in individual files, and compared using the diff
command to demonstrate the correct functionality of radix_sort.c. An example of how to use this script is given below.
Note
Be sure to grant executable permissions to run.sh
by running chmod +x run.sh
example@bash:~$ ./run.sh 500
Warning: using 500 integers will generate 3 files each of approximate size 2KB
Are you sure you want to proceed?[Y/N]: Y
# Program output follows this