[ML] Result document IDs are not sufficiently unique #50613
Comments
Pinging @elastic/ml-core (:ml) |
This might be lost to history, but why aren't we using |
I was just working on a change this afternoon to use The biggest question is now what happens if someone upgrades from a version with the old IDs to a version with the new ones. I haven’t yet checked if this makes duplicate results likely in any situation. |
Switch from a 32-bit Java hash to a 128-bit Murmur hash for creating document IDs from by/over/partition field values. The 32-bit Java hash was not sufficiently unique and could produce identical numbers for relatively common combinations of by/partition field values, such as L018/128 and L017/228. Fixes elastic#50613
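The collision named in the commit message can be reproduced with the JDK alone. The sketch below assumes the old 32-bit hash was computed as `Objects.hash(byFieldValue, overFieldValue, partitionFieldValue)` with a null over field; that shape is an assumption based on the description above, not a copy of the actual code, and the class and method names are hypothetical.

```java
import java.util.Objects;

public final class LegacyHashCollision {
    private LegacyHashCollision() {}

    // Assumed shape of the old 32-bit values hash (hypothetical helper).
    // Objects.hash folds the values as result = 31 * result + hashCode,
    // so the by value ends up scaled by 31^2 = 961 relative to the partition value.
    public static int legacyValuesHash(String by, String over, String partition) {
        return Objects.hash(by, over, partition);
    }

    public static void main(String[] args) {
        int first = legacyValuesHash("L018", null, "128");
        int second = legacyValuesHash("L017", null, "228");
        // "L018" hashes exactly 1 higher than "L017" (contributing +961),
        // while "128" hashes exactly 961 lower than "228", so the two cancel.
        System.out.println(first == second);
    }
}
```

Under this assumption, any pair of by values whose hashes differ by 1 collides whenever the partition hashes differ by 961 in the opposite direction, which explains why sequential-looking codes are affected.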
More variation is needed in the document IDs. However, before making the change, we need to analyze the possibilities for duplicate documents that the change itself could cause. We need to consider all the places where we assume that duplicate results will overwrite one another, and how likely each case is to occur.
The problem affects at least model plot, forecast and anomaly record results. Other types of results, for example, influencer results, contain a hash of just one string. We should consider consistency in the way IDs are created across the different types.
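One way to make ID creation consistent across result types is a single helper that takes any number of field values and returns a wide digest. The sketch below is illustrative only: it uses MD5 from the JDK as a stand-in 128-bit hash because the JDK does not ship a Murmur implementation, and the class name, method name, and length-prefixed encoding are assumptions, not what the linked change does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class WideValuesHash {
    private WideValuesHash() {}

    // Hypothetical helper: hashes one or more field values to 128 bits.
    // MD5 is a stand-in for a 128-bit Murmur hash, used here only for its width.
    public static String hash128(String... values) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (String value : values) {
            if (value == null) {
                // A length of -1 marks a missing value, distinct from an empty string.
                digest.update(ByteBuffer.allocate(4).putInt(-1).array());
            } else {
                byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                // The length prefix keeps ("ab", "c") distinct from ("a", "bc").
                digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                digest.update(bytes);
            }
        }
        StringBuilder hex = new StringBuilder(32);
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

A result type with a single string, such as an influencer result, would call `hash128(value)`, while record-style results would call `hash128(by, over, partition)`, so every ID suffix has the same width and format.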