The purpose of this module is to test the performance of the different sort (and related) algorithms in Julia. See https://github.com/kmsquire/SortPerf.jl/raw/master/sortperf.pdf for an example output from Version 0.3.0-prerelease+125.
Run with:
std_sort_tests(;sort_algs=SortPerf.sort_algs, # [InsertionSort, HeapSort, MergeSort,
# QuickSort, RadixSort, TimSort]
types=SortPerf.std_types, # [Int32, Int64, Int128, Float32, Float64, String]
range=6:20, # Array size 2^6 through 2^20, by powers of 2
replicates=3, #
lt::Function=isless, # \
by::Function=identity, # | sort(...) options
rev::Bool=false, # |
order::Ordering=Forward, # /
save::Bool=false, # create and save timing tsv and pdf plot
prefix="sortperf") # prefix for saved files
You can also test individual algorithms with
sortperf(Algorithm(s), data, [size,] [replicates=xxx])
Some examples:
sortperf(QuickSort, Int, 10_000) # Test QuickSort on 10,000 random ints
sortperf(MergeSort, [Float32, String], 6:2:10) # Test MergeSort on 2^6, 2^8, and 2^10 float 32s and strings
sortperf([QuickSort, MergeSort, TimSort], # Test QuickSort, MergeSort, and TimSort on
[Int, Float32, Float64, String], # Arrays of Int, Float32, Float64, and String
6:20; # ranging from 2^6 elements to 2^20 elements, by
replicates=5) # powers of 2, and run each test 5 times
Ordering parameters accepted by sort!(...) will be passed through.
The actual tests run include sorting arrays with the following characteristics:
- random
- sorted
- reversed
- sorted, but with 3 random exchanges
- sorted, with 10 random values appended
- 4 unique values
- all equal
- quicksort median killer: first half descending, second half ascending
The tests were inspired by similar tests used by sortperf in Python. See http://svn.python.org/projects/python/trunk/Objects/listsort.txt for more details.
Here is a table and some notes on the Julia implementations of the
various algorithms. The table indicates the recommended sort
algorithm for the given size (small < ~2^12 (=8,192) items < large
)
and type (string, floating point, or integer) of data.
- Random means that the data is permuted randomly.
- Structured here means that the data contains partially sorted runs (such as when adding random data to an already sorted array).
- Few unique indicates that the data only contains a few unique values.
(Un)stable (small) | Stable (small) | (Un)stable (large) | Stable (large) | In-place (large) | |
---|---|---|---|---|---|
Strings | |||||
- Random | M | M | M | M | Q |
- Structured | M | M | T | T | Q |
- Few Unique | Q | M | Q | M | Q |
Float64 | |||||
- Random | Q | M | R | R | Q |
- Structured | M | M | T | T | Q |
- Few Unique | Q | M | Q | R | Q |
Int64 | |||||
- Random | Q | M | R | R | Q |
- Structured | Q | M | uT | R/T | Q |
- Few Unique | Q | M | R | R | Q |
Key:
Symbol | Algorithm |
---|---|
H | HeapSort |
I | InsertionSort |
M | MergeSort |
Q | QuickSort |
T | TimSort |
uT | TimSortUnstable |
R | RadixSort |
-
Except for pathological cases, small arrays are sorted best with
QuickSort
(unstable) or `MergeSort`` (stable) -
When sorting large arrays with sections of already-sorted data, use
TimSort
. The only structured case it does not handle well is reverse-sorted data with large numbers of repeat elements. An unstable version ofTimSort
(to be contributed to Julia soon) will handle this case -
For numerical data (Ints or Floats) without structure,
RadixSort
is the best choice, except for 1) 128-bit values, or 2) 64-bit integers which span the full range of values. -
When memory is tight,
QuickSort
is the best in-place algorithm. If there is concern about pathological cases, useHeapSort
. All stable algorithms use additional memory, butTimSort
is (probably) the most frugal. -
Composite types may behave differently. If sorting is important to your application, you should test the different algorithms on your own data. This package facilitates that.