add categorical equality fast paths #6057
Comments
We've been moving a lot of data into Categorical cols recently, so this sounds fantastic :)
Tried the smaller rev-map, but that seems more trouble than it is worth. We cannot determine fast uniques in such a case and get regressions in string comparisons. I will go with equality fast paths.
Wouldn't a bloom filter be a good alternative for a fast in set reduction approach?
Yes, but we also need to map from global idx to local one. Don't want to do a linear search once the bloom filter says yes.
Will go for a btree. They are much more condensed and likely to be much faster on
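To illustrate the point about needing the mapping and not just a membership answer, here is a minimal Python sketch. All names (`local_to_global`, `present`, `global_to_local`) are hypothetical and not taken from the project; a plain `set` stands in for the Bloom filter, and a `dict` stands in for whatever map structure is actually chosen.

```python
from typing import Optional

# Hypothetical data: global category indices referenced by one local array.
# The local index is simply the position in this list.
local_to_global = [7, 42, 3, 19]

# A membership-only structure (a plain set stands in for a Bloom filter here)
# can answer "is it present?" but not "where is it?".
present = set(local_to_global)

# An explicit map answers both questions with a single lookup.
global_to_local = {g: i for i, g in enumerate(local_to_global)}


def local_index_membership_only(global_idx: int) -> Optional[int]:
    """Membership check first, then a linear scan to find the position."""
    if global_idx not in present:
        return None
    # After a "yes" we still have to search for the position: O(n).
    return local_to_global.index(global_idx)


def local_index_mapped(global_idx: int) -> Optional[int]:
    """Single lookup that returns the local index, or None if absent."""
    return global_to_local.get(global_idx)
```

Both functions return the same answers; the difference is only that the first one pays for a linear scan whenever the membership test succeeds.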
A few easy perf/memory wins.
- `cat == str` should first check the rev map.
- `cat.is_in` should first check the rev_map if `is_in` is reasonably sized.
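The two fast paths proposed above might look roughly like the following Python sketch. It is not the project's implementation: it assumes the column is stored as integer codes and that a lookup from string to code is available (the hypothetical `rev_map` dict below), and it ignores nulls.

```python
from typing import Dict, Iterable, List


def cat_eq_str(codes: List[int], rev_map: Dict[str, int], needle: str) -> List[bool]:
    """Compare a categorical column (integer codes) against one string."""
    code = rev_map.get(needle)
    if code is None:
        # Fast path: the string is not a known category, so nothing matches.
        return [False] * len(codes)
    # Otherwise compare integers instead of strings.
    return [c == code for c in codes]


def cat_is_in(codes: List[int], rev_map: Dict[str, int], needles: Iterable[str]) -> List[bool]:
    """Membership test that resolves the strings to codes once, up front."""
    wanted = {rev_map[s] for s in needles if s in rev_map}
    if not wanted:
        # Fast path: none of the requested strings is a known category.
        return [False] * len(codes)
    return [c in wanted for c in codes]
```

In both cases the strings are resolved against the categories once, so the per-row work is an integer comparison and unknown strings short-circuit to an all-false mask.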