Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Bitset instead of Array<boolean> #25

Open
rocketraman opened this issue Apr 4, 2019 · 1 comment
Open

Use Bitset instead of Array<boolean> #25

rocketraman opened this issue Apr 4, 2019 · 1 comment

Comments

@rocketraman
Copy link

rocketraman commented Apr 4, 2019

The current implementation uses a boolean[] as an input. Use of a BitSet (https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html) would be a lot more efficient.

For example, if dictionary size is Integer.MAX_INT, as it would be with the "hashing shingles" approach given in 3.2.3 of Ullman et al, I need to allocate 2GB of memory to store an array of booleans. With BitSet, I can store that in approximately 8 times less space.

@tdebatty tdebatty changed the title Use Bitset instead of Array<Boolean> Use Bitset instead of Array<boolean> Apr 9, 2019
@tdebatty
Copy link
Owner

tdebatty commented Apr 9, 2019

Hi,

Would be interesting, but only for very large dictionaries (probably 100 million entries or more):
https://stackoverflow.com/a/605451/4770918

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants