Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve date parsing #722

Merged
merged 8 commits into from
Aug 20, 2020
Merged

Improve date parsing #722

merged 8 commits into from
Aug 20, 2020

Conversation

helrond
Copy link
Contributor

@helrond helrond commented Aug 13, 2020

Attempts to use fromisoformat to parse date strings. The (buggy) strptime method is used as a fallback for Python versions < 3.7

fixes #685

@codecov
Copy link

codecov bot commented Aug 13, 2020

Codecov Report

Merging #722 into master will decrease coverage by 0.17%.
The diff coverage is 54.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #722      +/-   ##
==========================================
- Coverage   96.18%   96.01%   -0.18%     
==========================================
  Files          17       17              
  Lines        2648     2658      +10     
  Branches      308      310       +2     
==========================================
+ Hits         2547     2552       +5     
- Misses         84       87       +3     
- Partials       17       19       +2     
Impacted Files Coverage Δ
jsonschema/tests/test_jsonschema_test_suite.py 93.02% <50.00%> (-6.98%) ⬇️
jsonschema/_format.py 77.27% <60.00%> (-0.57%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff1ddcf...d9d5fe1. Read the comment docs.

@Julian
Copy link
Member

Julian commented Aug 18, 2020

@helrond now that these are merged upstream think you can now unskip the relevant tests and help get this over the finish line?

Appreciated!

@helrond
Copy link
Contributor Author

helrond commented Aug 18, 2020

@Julian sure - what exactly do I need to do?

@Julian
Copy link
Member

Julian commented Aug 18, 2020

@helrond so I basically pulled in the changes you made to the test suite, so now those tests are in here, but being skipped on master (where they'd otherwise fail).

Essentially what needs to be done now is to unskip them (delete e.g. these lines: https://github.com/Julian/jsonschema/blob/master/jsonschema/tests/test_jsonschema_test_suite.py#L103-L111), confirm that the tests now pass in this branch, and then we'd be good.

Lemme know if that makes sense!

@helrond
Copy link
Contributor Author

helrond commented Aug 18, 2020

OK, I think I did that right, but tests are failing on everything < Python 3.6, which I believe is a consequence of the implementation we chose, which falls back to the buggy strptime method if fromisoformat isn't available.

@Julian
Copy link
Member

Julian commented Aug 18, 2020

Got it so... probably the thing to do is to re-add a skip that just skips those tests on 3.6 then? Have a look at e.g. the narrow_unicode skipping thing, it should show you how to do that (though yours will now predicate on sys.version_info)

@helrond
Copy link
Contributor Author

helrond commented Aug 19, 2020

OK, so tests are passing now but I'm getting an error on test coverage. Any pointers on how to fix that?

@Julian
Copy link
Member

Julian commented Aug 19, 2020

Cool! We can likely ignore that, it's due to the fact that we don't yet combine coverage across all runs, so codecov is confused and thinks coverage has gone down, it won't block merging though.

Let me review! Will leave any comments.

@@ -323,7 +323,10 @@ def is_regex(instance):
def is_date(instance):
if not isinstance(instance, str):
return True
return datetime.datetime.strptime(instance, "%Y-%m-%d")
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but might as well -- it'll save doing the "wrong" thing on each call on Py36 if we do this only once rather than each time we validate.

I.e. if we move this into a try/except, and do:

if hasattr(datetime.date, "fromisoformat"):
     _is_date = datetime.date.fromisoformat
else:
    def _is_date(instance):
        return datetime.datetime.strptime(instance, "%Y-%m-%d")

def is_date(...):
    return _is_date(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully got what you were after here, but happy to take another crack at it if I messed up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been less lazy in writing it out :) -- so the point is (which is true of your commit here too) -- as is, every time the is_date validation is applied, it checks to see which implementation should be used. So on Py36, that means you first try fromisoformat, then that fails, then you try the old way, but then you "forget" about the fact that it didn't work and the next time the function is called, you'll again try the fromisoformat, again find it doesn't work, etc.

Instead, what I was trying to say was to move the check outside of the is_date function entirely. In other words, globally at the top level in the module:

if hasattr(datetime.date, "fromisoformat"):
     _is_date = datetime.date.fromisoformat
else:
    def _is_date(instance):
        return datetime.datetime.strptime(instance, "%Y-%m-%d")

@_checks_drafts(draft3="date", draft7="date", raises=ValueError)
def is_date(instance):
    if not isinstance(instance, str):
        return True
    return _is_date(instance)

Now, we only check once on import which implementation should be used, and then if you call that function 1000 times, we've already figured out the right implementation ahead of time.

Does that make more sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, got it. If I'd looked around a bit more in that file I'd have seen you doing that kind of thing in other spots. This one should be resolved now.

Copy link
Member

@Julian Julian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor nitpicks on the implementation that I should have pointed out earlier.

After that though think we'll be good to merge.

Much appreciated again...

@Julian
Copy link
Member

Julian commented Aug 19, 2020

All good, this looks great now, thanks again!

@Julian Julian merged commit 756de12 into python-jsonschema:master Aug 20, 2020
@helrond helrond deleted the issue-685 branch August 20, 2020 00:17
Julian added a commit to sloanlance/jsonschema that referenced this pull request Aug 29, 2020
* origin/master:
  Remove docformatter, it's too unwieldy.
  Minor doc formatting.
  Sigh, this appears to be a regex, not exact match, so it was ignoring everything.
  Update pre-commit hooks.
  And two last ones...
  Remove the rest of the notebooks.ai related resources.
  Remove demo links from README file
  Move the latest validator CLI test to where it must go now.
  Simplify the schemas in the tests for CLI validator detection.
  add test case
  Delete the failed test case
  Fix the bug of processing arguments[validator] in parse_args
  use correct format check globally
  renames message and function
  better handling for Python 3.6
  fix formatting error
  skip tests on python 3.6
  unskip tests
  Temporarily skip the tests that will be unskipped by python-jsonschema#722.
  Squashed 'json/' changes from 86f52b87..fce9e9b3
  Validate IP addresses using the ipaddress module.
  Skip more tests for python-jsonschema#686.
  Squashed 'json/' changes from ea415537..86f52b87
  improve date parsing
  Bump the version in the notebooks.io requirements.
  pre-commit setup.
  Trailing whitespace
  Kill Py2 in the sphinx role.
  isorted.
  Unbreak non-format-setuptools-extra-installed jsonschema.
  Run CI against all setuptools extras separately.
  Pick a format checker that has no external dep now that fqdn exists.
  regenerated the requirements.txt
  Modify the code based on review comments
  fix bug about hostname by import fqdn
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

YYYY-M-D is incorrectly allowed as a format for dates
2 participants