Improve hash calculation to ensure a list of 128,660 cities is uniquely hashable #5

Merged
onno-vos-dev merged 1 commit into main from minor-improvements on Feb 4, 2024

onno-vos-dev (Owner) commented Feb 4, 2024

  • Grab more bytes at a time, where possible, when calculating the hash. This helps prevent collisions and also improves speed
  • Move find_cities/1 to after spawning the workers. It can run in the background, as it takes only a very short time
  • Read fewer bytes in find_cities/1 (2MB to be precise)
  • Inline do_process_line/2 and add_to_state/3
  • Do not generate home-grown, half-baked city names in the PropEr test; use an input file taken from: https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000
  • Save test files generated by PropEr so that they can be replayed from a shell for debugging, rather than losing them upon shrinking
  • Do not shrink at all, as it never really leads to anything anyway
  • Add an EUnit test that hashes all 128,660 cities and ensures uniqueness
  • Add runtime results on Mac M1 Pro to README
  • Add build badge to README
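
For readers who want a feel for the first bullet and the uniqueness test, here is a minimal sketch in Python. It is not the project's Erlang code. The names `chunked_hash` and `find_collisions` are hypothetical, and the FNV-1a style mixing is an assumption, since the description does not say which hash function is used. The idea shown is only "consume 8 bytes per step when available, then fall back to single bytes" and "group names by hash to spot collisions".

```python
from collections import defaultdict

# Assumed constants: the 64-bit FNV-1a offset basis and prime.
# The PR does not state which hash the project uses.
MASK = (1 << 64) - 1
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3


def chunked_hash(name: bytes) -> int:
    """Hash a name, mixing in 8 bytes per step while enough bytes remain."""
    h = FNV_OFFSET
    i = 0
    n = len(name)
    # Wide steps: one mixing round per 8-byte chunk instead of per byte.
    while n - i >= 8:
        chunk = int.from_bytes(name[i:i + 8], "big")
        h = ((h ^ chunk) * FNV_PRIME) & MASK
        i += 8
    # Tail: fewer than 8 bytes left, mix them in one at a time.
    while i < n:
        h = ((h ^ name[i]) * FNV_PRIME) & MASK
        i += 1
    return h


def find_collisions(names, hash_fn):
    """Return {hash: sorted names} for every hash shared by 2+ distinct names."""
    buckets = defaultdict(set)
    for name in names:
        buckets[hash_fn(name)].add(name)
    return {h: sorted(group) for h, group in buckets.items() if len(group) > 1}
```

Calling `find_collisions(names, chunked_hash)` on a list of city names returns an empty dict when every distinct name gets its own hash, which is the property the new test checks for the full city list. Passing a deliberately weak function such as `len` as `hash_fn` shows what a collision report looks like.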

@onno-vos-dev onno-vos-dev merged commit 96e07e4 into main Feb 4, 2024
2 checks passed
onno-vos-dev added a commit to onno-vos-dev/1brc_erl_ex_test that referenced this pull request Feb 4, 2024
@onno-vos-dev onno-vos-dev deleted the minor-improvements branch February 4, 2024 14:15
IceDragon200 added a commit to IceDragon200/1brc_erl_ex_test that referenced this pull request Feb 4, 2024
…to-latest-main

Bump onno-vos-dev submodule to latest merge of onno-vos-dev/1brc#5