Move away from hashing #78
Here's a proposal for how to make our hashing collision-proof. We've been caching hashes, rather than just defining a hash function recursively and using a dictionary structure to avoid collisions, so that we don't traverse the tree multiple times in recursive calls to hash when we've constructed deeply nested expressions, e.g. (+, (x[1], (+, x[2], (+, x[3], ...)))). Consider the following procedure. Maintain a global dictionary unique_exprs of (hash, expr) pairs for each expression constructed so far. When a new expression
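A minimal sketch of that procedure (in Python for illustration; `Expr`, `make_expr`, and the shape of `unique_exprs` are hypothetical names, not Convex.jl's API). The key point is that each expression's hash is computed once from its already-interned children, and structural comparison is only needed within a single hash bucket:

```python
unique_exprs = {}  # hash -> list of expressions with that hash

class Expr:
    def __init__(self, head, children):
        self.head = head
        self.children = children
        # Children are already interned, so hashing by their identities
        # costs O(#children), not O(size of the whole tree).
        self.h = hash((head, tuple(id(c) for c in children)))

def make_expr(head, children):
    """Return an existing structurally-equal expression if one was
    already constructed; otherwise register and return a new one."""
    candidate = Expr(head, children)
    for e in unique_exprs.get(candidate.h, []):
        # A hash collision is resolved by an explicit structural check;
        # interned children can be compared by identity.
        if (e.head == head
                and len(e.children) == len(children)
                and all(a is b for a, b in zip(e.children, children))):
            return e
    unique_exprs.setdefault(candidate.h, []).append(candidate)
    return candidate
```

With this, two separately constructed but structurally identical expressions come back as the same object, so object identity can safely stand in for structural equality.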
You can prove that this works by induction. (Variables and constants have hashes that are unique because we use their
If you have a dictionary where the keys are the expressions themselves rather than the hashes, the dictionary's internal logic will handle hash collisions. That could be a simpler solution.
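A sketch of that alternative (Python for illustration; the toy `Expr` type and `intern` helper are hypothetical, not Convex.jl code). Defining `__hash__` and `__eq__` lets the dictionary's own bucket logic resolve collisions, while caching the hash at construction time keeps hashing cheap:

```python
class Expr:
    def __init__(self, head, children):
        self.head = head
        self.children = tuple(children)
        # Children are constructed (and interned) first, so this uses
        # their cached hashes and does not re-traverse the whole tree.
        self._hash = hash((head, self.children))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        return (isinstance(other, Expr)
                and self.head == other.head
                and self.children == other.children)

unique_exprs = {}

def intern(expr):
    """Return the canonical copy of expr; equality is only checked
    against entries in the same hash bucket, so collisions are safe."""
    return unique_exprs.setdefault(expr, expr)
```

Since interned children are shared objects, the tuple comparison in `__eq__` usually short-circuits on identity rather than recursing.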
Yes, but we'll still need to define
`expr.children)...)); unique_exprs[expr] = hash)`

On Tue, May 12, 2015 at 7:38 PM, Miles Lubin <notifications@github.com> wrote:

Madeleine Udell
I don't know enough about the Convex.jl internals to understand the performance effects of hashing the same thing multiple times. Does caching hashes change the algorithmic complexity, i.e. how many times the nodes in an expression need to be traversed? Don't you have to traverse each expression at least once, and at most a constant number of times?
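A toy experiment (Python, illustrative only, using plain `(head, children)` tuples rather than Convex.jl's types) suggests the complexity does change: if every intermediate expression is hashed as it is constructed, a naive recursive hash re-traverses the whole subtree at each level, costing a quadratic number of node visits in the nesting depth, while cached hashes visit each node once, which is linear overall.

```python
visits = 0  # global counter of node visits

def naive_hash(node):
    """Recursively hash a (head, children) tuple with no caching,
    counting every node visit along the way."""
    global visits
    visits += 1
    head, children = node
    return hash((head, tuple(naive_hash(c) for c in children)))

def build_and_hash(depth):
    """Construct a nested expression one level at a time, hashing each
    intermediate expression as it is created, and return total visits."""
    global visits
    visits = 0
    e = ('x', ())
    naive_hash(e)
    for _ in range(depth):
        e = ('+', (('x', ()), e))
        naive_hash(e)  # re-traverses the entire subtree each time
    return visits
```

The expression at depth d has 2d+1 nodes, so with caching the total work is 2d+1 visits; without it the totals come out to (d+1)^2, e.g. 36 visits at depth 5 and 121 at depth 10.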
The only time it seems likely to happen is in deeply nested expressions involving indexing. For example, the expression

In fact, we try to be mildly more clever to avoid making deeply nested expressions. The expression is condensed every time, so e.g. at the last step

[Actually, looking through the code it's somewhat concerning to me that we've ended up with two constructs for the same thing:]
Hashing may result in collisions, and since we use hashes to determine object equality, we should probably move away from doing so.