Text distance metrics

This section lists all available text distance metrics along with their IDs for command-line use.

The weighted N-gram score is computed as the weighted sum of the N-grams shared between the two texts. It ensures that:

Shared N-gram instances near the interval bounds (depending on the situation) are rated higher than those near the center or the opposite end
Long shared N-gram instances are weighted higher than short ones

--align-min-ngram-size <SIZE> sets the start (minimum) N-gram size

--align-max-ngram-size <SIZE> sets the final (maximum) N-gram size

--align-ngram-size-factor <FACTOR> sets a weight factor for the size preference

--align-ngram-position-factor <FACTOR> sets a weight factor for the position preference
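To make the interplay of these four options more concrete, here is a minimal Python sketch of a weighted N-gram score. It is not the tool's actual implementation: the function name `weighted_ngram_score`, the `prefer_start` switch, and both weight formulas are assumptions chosen only to illustrate the two properties listed above (position preference and size preference).

```python
from collections import Counter


def ngrams(text, size):
    """Return all character N-grams of the given size, in order."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def weighted_ngram_score(a, b, min_size=1, max_size=3,
                         size_factor=1.0, position_factor=1.0,
                         prefer_start=True):
    """Hypothetical weighted N-gram score (illustration only).

    min_size / max_size mirror --align-min-ngram-size / --align-max-ngram-size,
    size_factor mirrors --align-ngram-size-factor, and position_factor mirrors
    --align-ngram-position-factor. The weight formulas are assumptions.
    """
    score = 0.0
    for size in range(min_size, max_size + 1):
        counts_b = Counter(ngrams(b, size))
        grams_a = ngrams(a, size)
        last = len(grams_a) - 1
        for i, gram in enumerate(grams_a):
            if counts_b[gram] == 0:
                continue  # N-gram is not shared with the other text
            # Relative position inside `a`: 0.0 at the start, 1.0 at the end.
            rel = i / last if last > 0 else 0.0
            closeness = (1.0 - rel) if prefer_start else rel
            # Longer N-grams and N-grams near the preferred bound weigh more.
            size_weight = 1.0 + size_factor * (size - min_size)
            position_weight = 1.0 + position_factor * closeness
            score += size_weight * position_weight
    return score
```

In this sketch, setting `size_factor` or `position_factor` to 0 switches the corresponding preference off, which leaves a plain count of shared N-gram instances.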

Jaro-Winkler is an edit distance metric described here.
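For orientation, the textbook Jaro-Winkler computation can be sketched in Python as follows. This is an illustrative version, not necessarily the code the tool uses. It assumes the common defaults of a prefix scale of 0.1 and a maximum prefix length of 4, and it returns a similarity in the range 0 to 1; a distance is commonly derived as 1 minus that value.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings (1.0 means identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only match if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    similarity = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return similarity + prefix * prefix_scale * (1.0 - similarity)
```

For the classic example pair "MARTHA" and "MARHTA", this yields a Jaro similarity of about 0.944 and a Jaro-Winkler similarity of about 0.961.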

Editex is a phonetic text distance algorithm described here.

Levenshtein is an edit distance metric described here.
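As a quick reminder of what this metric measures, the following Python sketch computes the Levenshtein distance with the standard dynamic-programming recurrence: the minimum number of single-character insertions, deletions, and substitutions needed to turn one text into the other. It is a generic illustration with a function name chosen here, not the tool's own code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning `a` into `b`."""
    # Keep the shorter string in `b` so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # prints 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter text rather than with the product of both lengths.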

The "Match rating approach" is a phonetic text distance algorithm described here.