-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make hash collisions in the registry much less likely #657
Conversation
Signed-off-by: beorn7 <beorn@grafana.com>
This makes the collisions a bit less likely by XOR'ing descIDs rather than adding them up for the collectorID. Signed-off-by: beorn7 <beorn@grafana.com>
This is a much stronger hash function than fnv64a and comparably fast (with super-fast assembly implementation for amd64). Performance is not critical here anyway. The old fnv64a is kept for vectors, where collision detection is in place and the weakness of the hashing doesn't matter that much. I implemented a vector version with xxhash and found that xxhash is slower in all cases except very very high cardinality (where it is only slightly faster). Also, ``xxhash.New`` comes with an allocation of 80 bytes. Thus, to keep vectors alloc-free, we needed to add a `sync.Pool`, which would have an additional performance overhead. Signed-off-by: beorn7 <beorn@grafana.com>
@awilliams this will be of interest for you. |
Thanks @beorn7, this looks like a great bridge between now and v2. Nice to see the collision test case as well! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks for having a look. |
@@ -24,6 +24,8 @@ import ( | |||
|
|||
const separatorByte byte = 255 | |||
|
|||
var separatorByteSlice = []byte{255} // For convenient use with xxhash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not that it's likely to change, but should this not be []byte{separatorByte}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's better. (Originally, I wanted to remove separatorByte altogether, but then realized I want to keep the old hashing for vectors.) Will sent you a PR.
full diff: prometheus/client_golang@v0.9.4...v1.1.0 Using v1.1.0, because version v1.2.0 and up use versioned import paths for the github.com/cespare/xxhash/v2 dependency (prometheus/client_golang#657), which causes vendoring with vndr to break due to the v2 in the import-path. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: prometheus/client_golang@v0.9.4...v1.1.0 Using v1.1.0, because version v1.2.0 and up use versioned import paths for the github.com/cespare/xxhash/v2 dependency (prometheus/client_golang#657), which causes vendoring with vndr to break due to the v2 in the import-path. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: prometheus/client_golang@v0.9.4...v1.1.0 Using v1.1.0, because version v1.2.0 and up use versioned import paths for the github.com/cespare/xxhash/v2 dependency (prometheus/client_golang#657), which causes vendoring with vndr to break due to the v2 in the import-path. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> Upstream-commit: 34a65cb3bad3cfc626bf0b15715680ed9633e455 Component: engine
full diff: prometheus/client_golang@v0.9.4...v1.1.0 Using v1.1.0, because version v1.2.0 and up use versioned import paths for the github.com/cespare/xxhash/v2 dependency (prometheus/client_golang#657), which causes vendoring with vndr to break due to the v2 in the import-path. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Sadly, that's not yet a complete solution (like the one planned for v2, see last item in #222). However, using xxhash for the registry improves things a lot. It also moves from summing to XOR'ing for the collectorID. (Hashing the hashes would be an option, too, but it would require sorting of the hashes. I didn't want to go down that rabbit hole and rather stick with a commutative and associative operation.)
(Interestingly, just moving from sum to XOR solved #633 already, but of course, other hash collisions might happen in practice despite the XOR. In fact, hash collisions might still occur. This PR is mostly there to make this sufficiently unlikely so that the world will survive until v2 is released.)