Removed uuidtools #344

iainbeeston · 2016-06-30T12:17:18Z

Right now, we cache schemas, and the key we use to cache them is either the URL we got them from, or if there's no URL, we generate a UUID from the json text of the schema. We're basically using a v5 UUID as a hashing function. What's more, json-schema includes it's own UUID library for doing this.

I've swapped that out, for generating a SHA1 hash instead. It should be fast, although in my own benchmarks, the speed is comparable to the current implementation (real world speed may depend on the size of the schemas being cached). At the very least, I believe it's clearer to use SHA1 (rather than UUID) and it also means we don't need an external ruby file copied into the project.

iainbeeston · 2016-10-05T14:29:26Z

@RST-J Do you have any thoughts about this? It will allow us to remove another dependency (although this one is included inside json-schema itself)

iainbeeston · 2016-10-05T14:30:25Z

lib/json-schema/util/uri.rb

@@ -54,6 +55,11 @@ def self.file_uri(uri)
        Addressable::URI.convert_path(parsed_uri.path)
      end

+      def self.fake_uri(string)


Probably better to name this hash_uri

I agree that it should rather have hash in the name than fake. But actually I would not call it hash_uri because the method itself does not hash URIs but is used to hash a schema. So maybe hash_schema would be a better name. But now it actually does not care about whether the input is a schema or not, it takes a string and returns a URI created from the hash of that string. So maybe uri_from_sha1 oder uri_from_sha1_hash or something like that would be more precise. But also more verbose. I'm personally getting more and more a fan of being as precise as possible with naming methods as it eases reasoning about code (we likely have some worse candidates here). What do you think?

RST-J · 2016-10-05T20:35:54Z

Removing the dependency is a good idea I think, we didn't use anything else from that and your replacement fits the purpose.
In terms of speed, MD5 is faster, so maybe we could take just this? I mean, there is no security concern here (or am I missing something?). But there is a comment in the removed library which suggests not to do that:

UUID generation using SHA1. Recommended over create_md5.

As I dont't know why and there is no reason given, I'm not sure whether this performance improvement is worthwhile or not. What do you know and think about that?

iainbeeston · 2016-10-06T07:20:00Z

I'm actually starting to have doubts about this now.

Perhaps we should be using String.hash, which would be an order of magnitude faster. So far as I'm aware String.hash should realistically not have collisions (although I'm struggling to find good evidence of that)

Also, it might be better to generate not just a hex string but a uri with a fake scheme that is immediately identifiable as an internally calculated value (something like "ref://af562bbc" or "internal://539daf54dd"?)

iainbeeston · 2016-10-06T07:21:46Z

Incidentally, @mjc did suggest using String.hash in a previous pull request (any credit for the idea should go to him!)

iainbeeston · 2016-10-06T07:45:39Z

I've now changed my mind entirely - I think we should simply use String.hash.to_s(16) (with an appropriate scheme to make it look like a URI).

String.hash in MRI ruby uses siphash-2-4, which should be good enough for our needs and much faster than SHA1

RST-J · 2016-11-08T21:02:33Z

I think, using a custom scheme is a good idea. It prevents accidental collisions, no matter how improbable they are.
And, if String.hash turns out to be not collision resistant enough, again no matter how improbable, one can always circumvent the immediate problem by assigning an URI and we can mitigate the problem if it ever gets reported. As this is internal, I assume no one should rely on the details.

iainbeeston force-pushed the remove-uuidtools branch from 638b6fb to 2abbc19 Compare June 30, 2016 15:19

Removed uuidtools

109ecb1

iainbeeston force-pushed the remove-uuidtools branch from 2abbc19 to 109ecb1 Compare September 29, 2016 14:34

iainbeeston commented Oct 5, 2016

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removed uuidtools #344

Removed uuidtools #344

iainbeeston commented Jun 30, 2016

iainbeeston commented Oct 5, 2016

iainbeeston Oct 5, 2016

RST-J Oct 5, 2016

RST-J commented Oct 5, 2016 •

edited

Loading

iainbeeston commented Oct 6, 2016

iainbeeston commented Oct 6, 2016

iainbeeston commented Oct 6, 2016

RST-J commented Nov 8, 2016

Removed uuidtools #344

Are you sure you want to change the base?

Removed uuidtools #344

Conversation

iainbeeston commented Jun 30, 2016

iainbeeston commented Oct 5, 2016

iainbeeston Oct 5, 2016

Choose a reason for hiding this comment

RST-J Oct 5, 2016

Choose a reason for hiding this comment

RST-J commented Oct 5, 2016 • edited Loading

iainbeeston commented Oct 6, 2016

iainbeeston commented Oct 6, 2016

iainbeeston commented Oct 6, 2016

RST-J commented Nov 8, 2016

RST-J commented Oct 5, 2016 •

edited

Loading