Commit
Improve hash calculation to ensure a list of 128,660 cities are all uniquely hashable

- Grab more bytes if possible when calculating the hash. This helps prevent collisions and also improves speed
- Move find_cities/1 to after spawning workers. It can run in the background as it only takes a very short amount of time
- Read fewer bytes in find_cities/1 (2MB to be precise)
- In-line do_process_line/2 and add_to_state/3
- Do not generate home-grown, half-baked city names in the PropEr test; use an input file taken from: https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000
- Save test files generated by PropEr so they can be replayed from a shell for debugging rather than being lost upon shrinking
- Do not shrink at all, as it never really leads to anything anyway
- Add eunit test which tries to hash all 128,660 cities and ensures uniqueness
- Add runtime results on Mac M1 Pro to README
- Add build badge to README
1 parent e5b023d, commit d25f91e
Showing 6 changed files with 128,768 additions and 87 deletions.
@@ -1,16 +1,20 @@
 -ifndef(_HASH_HRL_).
 -define(_HASH_HRL_, true).
 
+
+%-define(HASH(Acc, Char), (((Acc band (Char + ?MASK)) + (Char + ?PRIME)) + (((Acc + Char) bxor Char) * (?PRIME band Char) band ?MASK))).
+
 %% This is a modified version of the FNV64a hashing code found at:
 %% https://github.com/leostera/erlang-hash/blob/a1b9101189e115b4eabbe941639f3c626614e986/src/hash_fnv.erl#L98
 %%
-%% The reason is that PropEr kept finding conflicting keys such as for example: <<"JiÔk">> & <<"næðl">>
-%% which both produce the same FNV64a Hash.
+%% The reason is that PropEr kept finding conflicting keys
+%% which both produce the same FNV Hash. Hence, we test that all cities from the test file containing 128.660 cities
+%% are hasheable with below hash. See brc.erl eunit test for that.
 %%
-%% So we'll call this the 1BRC hash :-)
+%% So we'll call this just hash :-)
 -define(PRIME, 16777619).
 -define(INIT, 2166136261).
 -define(MASK, 16#FFFFFFFF).
--define(HASH(Acc, Char), ((Char * Char + Char) + (Acc bxor Char) * ?PRIME) band ?MASK).
+-define(HASH(Acc, Char), (((Acc bxor Char) * ?PRIME + 1)) band ?MASK).
 
 -endif.
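
To make the effect of the later `?HASH` definition concrete, here is a minimal sketch of how such a macro could be folded over a binary and checked for uniqueness. The module name `hash_sketch` and the functions `hash/1` and `all_unique/1` are hypothetical and are not taken from the repository. The commit message says the real code grabs more bytes per step where possible; this sketch assumes plain byte-at-a-time folding for clarity.

```erlang
-module(hash_sketch).
-export([hash/1, all_unique/1]).

%% Constants and macro copied from the header shown in the diff.
-define(PRIME, 16777619).
-define(INIT, 2166136261).
-define(MASK, 16#FFFFFFFF).
-define(HASH(Acc, Char), (((Acc bxor Char) * ?PRIME + 1)) band ?MASK).

%% Hypothetical helper: fold ?HASH over every byte, starting from ?INIT.
%% Assumption: one byte per step (the real code may consume larger chunks).
hash(Bin) when is_binary(Bin) ->
    hash(Bin, ?INIT).

hash(<<Char, Rest/binary>>, Acc) ->
    hash(Rest, ?HASH(Acc, Char));
hash(<<>>, Acc) ->
    Acc.

%% Hypothetical helper: true when no two names in the list share a hash.
%% The input list itself must not contain duplicate names.
all_unique(Names) ->
    Hashes = [hash(Name) || Name <- Names],
    length(Hashes) =:= length(lists:usort(Hashes)).
```

A check in the spirit of the eunit test mentioned in the commit message would read the city names from the input file into a list of binaries and assert that `hash_sketch:all_unique(Names)` returns `true`.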