-
Notifications
You must be signed in to change notification settings - Fork 58
Measures of presortedness
Also known as measures of disorder, the measures of presortedness are algorithms used to tell how much a sequence is already sorted, or how much disorder there is in it. Some adaptative sorting algorithms are known to take advantage of the order already present in a sequence, and happen to be "optimal" with regard to some measures of presortedness.
In cpp-sort, measures of presortedness are implemented as instances of some specific function objects; they take an iterable or a pair of iterators and return how much disorder there is in the sequence according to the measure. Just like sorters, measures of presortedness can handle custom comparison and projection functions, and with the same degree of freedom when it comes to how they can be called:
using namespace cppsort;
auto a = probe::inv(collection);
auto b = probe::rem(vec.begin(), vec.end());
auto c = probe::ham(li, std::greater<>{});
auto d = probe::runs(integers, std::negate<>{});
Note however that these algorithms can be expensive. Using them before an actual sorting algorithm has no interest at all; they are meant to be profiling tools: when sorting is a critical part of your application, you can use these measures on typical data and check whether it is mostly sorted according to one measure or another, then you may be able to find a sorting algorithm known to be optimal with regard to this specific measure.
Every measure of presortedness lives in the subnamespace cppsort::probe
. Even though all of them are available in their own header, it is possible to include all of them at once with the following include:
#include <cpp-sort/probes.h>
Measures of presortedness are pretty formalized, so the names of the functions in the library are short and correspond to the ones used in the litterature. As per this litterature, we will use the symbols X to represent the analyzed sequence, and n to represent the size of that sequence.
#include <cpp-sort/probes/dis.h>
Computes the maximum distance determined by an inversion. Its worst case returns n - 1 when X is sorted in reverse order.
Complexity | Memory | Iterators |
---|---|---|
n² | 1 | Forward |
Warning: this algorithm might be noticeably slower when the passed iterable is not random-access.
#include <cpp-sort/probes/enc.h>
Computes the number of encroaching lists that can be extracted from X minus one (see Skiena's Encroaching lists as a measure of presortedness). Its worst case returns n / 2 - 1 when the values already extracted from X constitute stronger bounds than the values yet to be extracted (for example the sequence {0 9 1 8 2 7 3 6 4 5} will trigger the worst case).
Complexity | Memory | Iterators |
---|---|---|
n² | n | Forward |
#include <cpp-sort/probes/exc.h>
Computes the minimum number of exchanges required to sort X, which corresponds to n minus the number of cycles in the sequence. A cycle corresponds to a number of elements in a sequence that need to be rotated to be in their sorted position; for example, let {2, 4, 0, 6, 3, 1, 5} be a sequence, the cycles are {0, 2} and {1, 3, 4, 5, 6} so Exc(X) = n - 2 = 5. There is always at least one cycle in any sequence, so Exc has a worst case of n - 1 when every element in X is one element away from its sorted position.
Complexity | Memory | Iterators |
---|---|---|
n log n | n | Forward |
Warning: this algorithm might be noticeably slower when the passed iterable is not random-access.
#include <cpp-sort/probes/ham.h>
Computes the number of elements in X that are not in their sorted position, which corresponds to the Hamming distance between X and its sorted permutation. Its worst case returns n when every element in X is one element away from its sorted position.
Complexity | Memory | Iterators |
---|---|---|
n log n | n | Forward |
#include <cpp-sort/probes/inv.h>
Computes the number of inversions in X, where an inversion corresponds to a pair (a, b) of elements not in order. For example, the sequence {2, 1, 3, 0} has 4 inversions: (2, 1), (2, 0), (1, 0) and (3, 0). Its worst case returns n * (n - 1) / 2 when X is sorted in reverse order.
Complexity | Memory | Iterators |
---|---|---|
n log n | n | Forward |
n² | 1 | Forward |
inv
will first try to allocate enough memory to use the linearithmic algorithm. If it fails to allocate enough memory, it falls back to the quadratic algorithm.
#include <cpp-sort/probes/max.h>
Computes the maximum distance an element in X must travel to find its sorted position. Its worst case returns n - 1 when X is sorted in reverse order.
Complexity | Memory | Iterators |
---|---|---|
n log n | n | Forward |
Warning: this algorithm might be noticeably slower when the passed iterable is not random-access.
#include <cpp-sort/probes/osc.h>
Computes the Oscillation measure described by Levcopoulos and Petersson in Adaptative Heapsort. Its worst case returns (n * (n - 2) - 1) / 2 when the values in X are strongly oscillating.
Complexity | Memory | Iterators |
---|---|---|
n² | 1 | Forward |
#include <cpp-sort/probes/par.h>
Computes the Par measure described by Estivill-Castro and Wood in A New Measure of Presortedness as follows:
Par(X) = min { p | X is p-sorted }
The following definition is also given to determine whether a sequence is p-sorted:
X is p-sorted iff for all i, j ∈ {1, 2, ..., n}, i - j > p implies Xj ≤ Xi.
Its worst case returns (n - 1) when the last element of X is smaller than the first one.
Complexity | Memory | Iterators |
---|---|---|
n² log n | 1 | Random-access |
#include <cpp-sort/probes/rem.h>
Computes the minimum number of elements that must be removed from X to obtain a sorted subsequence, which corresponds to n minus the size of the longest ascending run in X. Its worst case returns n - 1 when X is sorted in reverse order.
Complexity | Memory | Iterators |
---|---|---|
n | 1 | Forward |
Warning: this algorithm might be noticeably slower when the passed iterable is not random-access.
#include <cpp-sort/probes/runs.h>
Computes the number of ascending runs in X minus one. Its worst case returns n - 1 when X is sorted in reverse order.
Complexity | Memory | Iterators |
---|---|---|
n | 1 | Forward |
- Home
- Quickstart
- Sorting library
- Comparators and projections
- Miscellaneous utilities
- Tutorials
- Tooling
- Benchmarks
- Changelog
- Original research