Creating lots of validator classes uses lots of memory #868
Hi. Is your actual code doing what your timing example does? It's unclear why there are different memory characteristics, so perhaps there's something to look into regardless, but it'd be quite odd to call `extend_with_default` more than once; I'd call it a single time and, if you always validate with the same schema (which seems to be the case from the example, but unclear whether that matches your real code), create a validator instance only once as well:
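A minimal sketch of that pattern, assuming the docs-style `extend_with_default` recipe mentioned in the report and a placeholder schema, would be roughly:

```python
from jsonschema import Draft4Validator, validators

def extend_with_default(validator_class):
    # Docs-style recipe: fill in "default" values while validating "properties".
    validate_properties = validator_class.VALIDATORS["properties"]

    def set_defaults(validator, properties, instance, schema):
        for prop, subschema in properties.items():
            if "default" in subschema:
                instance.setdefault(prop, subschema["default"])
        yield from validate_properties(validator, properties, instance, schema)

    return validators.extend(validator_class, {"properties": set_defaults})

# Create the extended class and a validator instance once, outside the hot loop.
schema = {"properties": {"foo": {"default": "bar"}}}  # placeholder schema
DefaultValidatingDraft4Validator = extend_with_default(Draft4Validator)
validator = DefaultValidatingDraft4Validator(schema)

for _ in range(1_000_000):
    validator.validate({})  # only the per-call validation work runs each iteration
```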
of course with whatever your real schema is. With that, on my MacBook Air, I get around 28 million iterations on 4.1.2 (I didn't try yet with 3.2). On PyPy, without warming the JIT, that number goes up to ~140 million iterations. Happy to look into performance issues for pathological cases as well, but I want to make sure you observe the issue in a real scenario too?
Hey! Thanks for your answer. It was actually close to what we had in our codebase 🙈. Creating the class once indeed gets rid of the memory issue. It's certainly more efficient to do it this way, but in theory, creating a class that has no remaining references after the line is executed shouldn't leave anything behind memory-wise. So as you said, there may still be something worth checking out :-) Iterations on my machine:
Yep indeed, it still may be worth looking into, just way lower priority :)
This is a bug in attrs. Upgrading to attrs 21.3.0 fixes it; consider making the next jsonschema release depend on that version.
A PR adding the pin from below would certainly be welcome. Though I'm still unconvinced any reasonable code should encounter this issue, so if you have some that does I'd love to see that too.
Before pinning the version I think we should wait for confirmation by the OP. I also wonder if it fixes #853 (I had both issues open and accidentally replied there at first. Sorry.)

Here's my use case. I admit it's probably not reasonable :) I'm already working on fixes. In my codebase, APIs have schemas of different versions, as I slowly moved up from draft 3. The request JSON is wrapped in a case-insensitive dict, so in the old jsonschema that still supported `types=`, the code was simply:

```python
api_schema_json.setdefault('$schema', 'http://json-schema.org/draft-03/schema#')
# This one-liner works for any schema version, and has no equivalent now that `types=` is removed.
jsonschema.validate(request_json, api_schema_json, types={'object': (dict, CaseInsensitiveDict)})
```

When migrating to the new TypeChecker, I wrote this code instead:

```python
# Find the correct version to use
original_validator = jsonschema.validators.validator_for(api_schema_json, default=jsonschema.validators.Draft3Validator)
# Extend the type checker
Validator = jsonschema.validators.extend(
    original_validator,
    type_checker=original_validator.TYPE_CHECKER.redefine('object', lambda _, x: isinstance(x, (dict, CaseInsensitiveDict))))
# Run validation
Validator(schema=api_schema_json).validate(request_json)
```

Regardless of the attrs bug, I will now make sure to only create one "type-checker-patched" validator per version, and will also call
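For illustration, a minimal sketch of that caching, using a hypothetical `patched_validator_class` helper and assuming requests' `CaseInsensitiveDict` (the thread doesn't say which implementation is used), could look like:

```python
from functools import lru_cache

import jsonschema
from requests.structures import CaseInsensitiveDict  # assumption: the thread doesn't say which CaseInsensitiveDict is used

@lru_cache(maxsize=None)
def patched_validator_class(original_validator):
    # Build the type-checker-patched class once per underlying draft class.
    return jsonschema.validators.extend(
        original_validator,
        type_checker=original_validator.TYPE_CHECKER.redefine(
            'object', lambda _, x: isinstance(x, (dict, CaseInsensitiveDict))),
    )

def validate_request(request_json, api_schema_json):
    # Resolve the draft from the schema, falling back to draft 3 as above.
    original = jsonschema.validators.validator_for(
        api_schema_json, default=jsonschema.validators.Draft3Validator)
    patched_validator_class(original)(schema=api_schema_json).validate(request_json)
```

Keying the cache on the resolved draft class means each extended class is built only once per process, which is the part identified above as the source of the memory growth.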
I see.
On a related note, I wonder if the docs should strongly encourage creating and storing validator objects per schema when used in production code. I suspect it's not uncommon for users to execute
I wouldn't be so sure; I'd suspect users who have found the extension APIs mostly know what they're doing. Way more likely are users who just use `jsonschema.validate`, whose docs already point in that direction
(though it didn't always say that IIRC.) But I'm certainly open to adding it regardless of what I think if you can come up with some short wording in a PR; specifically pointing it out can't hurt, at worst.
I can have a look at this with my original snippet early in the new year when I'm back from vacation. Thank you for looking at this even though it stemmed from an unreasonable usage 😊
Hi there! As part of a major upgrade, I've also just attempted to run the initial example, and here at least I don't observe any memory growth after ~5 minutes of runtime. Going to close this as hopefully fixed, either directly via the attrs bump or via some other change since this was filed (sorry it took so long), but if you observe this behavior again feel free to follow up with a new reproducer.
We're using jsonschema to run validation on payloads coming into our web services, and we've been noticing a steady memory increase in those services since upgrading to version 4+ of jsonschema. There also seems to be a performance drain, but that appears to be already reported in #853. Further investigation revealed that the problem seems to stem from a piece of code that is heavily based on the jsonschema documentation. I've narrowed the memory leak down to this snippet that reproduces the problem:
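A rough sketch of that kind of snippet, assuming the docs-style `extend_with_default` recipe named further down and a placeholder schema (the real schema and timing harness aren't shown here), might look like:

```python
from jsonschema import Draft4Validator, validators

def extend_with_default(validator_class):
    # Docs-style recipe: fill in "default" values while validating "properties".
    validate_properties = validator_class.VALIDATORS["properties"]

    def set_defaults(validator, properties, instance, schema):
        for prop, subschema in properties.items():
            if "default" in subschema:
                instance.setdefault(prop, subschema["default"])
        yield from validate_properties(validator, properties, instance, schema)

    return validators.extend(validator_class, {"properties": set_defaults})

schema = {"properties": {"foo": {"default": "bar"}}}  # placeholder schema

iterations = 0
while True:
    # A brand-new validator class (and instance) is built on every pass,
    # which is the pattern the memory growth was narrowed down to.
    extend_with_default(Draft4Validator)(schema).validate({})
    iterations += 1
```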
With jsonschema==3.2.0, this manages to do 1436757 iterations on my machine and shows this memory consumption:
However, when updating to 4.1.2 we only get 6763 iterations done and have steadily rising memory:
I've been getting similar results on Python 3.7, 3.8, 3.9, and 3.10, as well as when using `Draft7Validator` instead of `Draft4Validator`. When we don't wrap the validator with `extend_with_default`, the memory consumption remains steady also on 4.1.2 and we get 2324680 iterations done (that function seems to be a performance drain also in 3.2.0, but memory-wise it was alright):