Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix STACObject inheritance #451

Merged

Conversation

duckontheweb
Copy link
Contributor

@duckontheweb duckontheweb commented Jun 17, 2021

Related Issue(s): #

Description:

Updates from_dict and from_file methods for STACObject and all sub-classes so that any classes inheriting from them will be returned by those methods. This makes it easier for downstream libraries like pystac-client to create custom sub-classes. See these tests for an example.

Also moves the logic from serialization.stac_object_from_dict into stac_io to simplify the chain of functions that is called. If it seems like downstream libraries are using this function directly we can go through a deprecation process, but it seemed unlikely. Once STAC_IO is removed in 1.0.0 this code will only be used in one place in the StacIO class, so it seemed to be simpler to just move it in there.

Supersedes #443

PR Checklist:

  • Code is formatted (run pre-commit run --all-files)
  • Tests pass (run scripts/test)
  • This PR maintains or improves overall codebase code coverage.
  • Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.

@duckontheweb
Copy link
Contributor Author

@matthewhanson I believe this will close #410, but would like to get your confirmation that it covers the needs in that issue.

@lossyrob
Copy link
Member

@duckontheweb Want to clarify before review - can you talk about why STACObject.from_file uses logic that seems to be duplicated in StacIO.read_stac_object, when it was just a straight call before? It seems like this introduces that logic in a couple of different places (e.g. also pystac.read_file), where it could be consolidated into the StacIO method. I agree with the move out of serialization into StacIO, just not sure why the identify, migrate etc checks are duplicated in other places.

@duckontheweb
Copy link
Contributor Author

@duckontheweb Want to clarify before review - can you talk about why STACObject.from_file uses logic that seems to be duplicated in StacIO.read_stac_object, when it was just a straight call before? It seems like this introduces that logic in a couple of different places (e.g. also pystac.read_file), where it could be consolidated into the StacIO method. I agree with the move out of serialization into StacIO, just not sure why the identify, migrate etc checks are duplicated in other places.

Yeah, definitely a good question. StacIO.read_stac_object calls StacIO.stac_object_from_dict, which has hard-coded classes (pystac.Catalog, pystac.Collection, pystac.Item) that it calls from_dict methods on to get a STACObject sub-class instance. For custom subclasses (e.g. class API(pystac.Catalog)) this breaks the ability to get back the original calling class in from_file.

I originally considered passing the cls argument from that method down through the chain, but it seemed like it was going to change the signatures of too many functions, and I didn't want to introduce that change.

I agree the duplication of logic is not ideal, and makes maintenance trickier. I can take another look at this and see if there is a better solution that doesn't involve altering the StacIO method signatures and gives us some DRY-er code.

@codecov-commenter
Copy link

codecov-commenter commented Jun 17, 2021

Codecov Report

Merging #451 (b73ac4d) into main (a729cd2) will increase coverage by 0.16%.
The diff coverage is 91.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #451      +/-   ##
==========================================
+ Coverage   91.31%   91.47%   +0.16%     
==========================================
  Files          40       40              
  Lines        5204     5234      +30     
==========================================
+ Hits         4752     4788      +36     
+ Misses        452      446       -6     
Impacted Files Coverage Δ
pystac/serialization/__init__.py 100.00% <ø> (+13.04%) ⬆️
pystac/stac_io.py 77.19% <83.33%> (+0.48%) ⬆️
pystac/stac_object.py 94.70% <87.50%> (-0.35%) ⬇️
pystac/__init__.py 100.00% <100.00%> (ø)
pystac/catalog.py 94.70% <100.00%> (+1.19%) ⬆️
pystac/collection.py 95.49% <100.00%> (+1.76%) ⬆️
pystac/item.py 96.98% <100.00%> (+0.31%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a729cd2...b73ac4d. Read the comment docs.

@matthewhanson
Copy link
Member

this looks good, I think the consolidating the duplicated functions addresses the concern @lossyrob had.

Will test this out with pystac-client in a few.

Comment on lines 75 to 77
cat = pystac.Catalog.from_file(href)
cat = pystac.read_file(href)
if not isinstance(cat, pystac.Catalog):
raise pystac.STACTypeError(f"File at {href} is not a Catalog.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lossyrob @matthewhanson One consequence of adding the STACObject.identify methods is that Catalog.from_file will now fail on a Collection file, which is why these lines changed. Judging from discussion in radiantearth/stac-spec#1082 I think this is the right behavior, but let me know if you feel otherwise. If we feel it should succeed, I can update Catalog.identify accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good, though we should include the type information of what it is so that if someone does expect Catalog.form_file to work with collections, it will say "File at {href} is a {cat.STAC_OBJECT_TYPE } not a Catalog"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the right behavior. A Collection is not a Catalog, so should not be able to be opened up as such

Copy link
Member

@lossyrob lossyrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes suggested, LGTM otherwise!

pystac/__init__.py Outdated Show resolved Hide resolved
Comment on lines 75 to 77
cat = pystac.Catalog.from_file(href)
cat = pystac.read_file(href)
if not isinstance(cat, pystac.Catalog):
raise pystac.STACTypeError(f"File at {href} is not a Catalog.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good, though we should include the type information of what it is so that if someone does expect Catalog.form_file to work with collections, it will say "File at {href} is a {cat.STAC_OBJECT_TYPE } not a Catalog"

pystac/catalog.py Outdated Show resolved Hide resolved
@lossyrob
Copy link
Member

@duckontheweb mind if I merge? Some changes I'm working on now for #453 should be based on these changes.

@duckontheweb
Copy link
Contributor Author

No, go for it!

Copy link
Member

@lossyrob lossyrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually just merged in the changes from this branch, and had a question about STACObject.from_file, comment inline

if stac_io is None:
stac_io = pystac.StacIO.default()

if not is_absolute_href(href):
href = make_absolute_href(href)

o = stac_io.read_stac_object(href)
d = stac_io.read_json(href)
o = cls.from_dict(d, href=href, migrate=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Couldn't this remain stac_io.read_stac_object for one codepath to handle href -> STACObject logic?

The type will be STACObject as far as type hinting goes, but the underlying class will be either a Catalog, Collection or Item. The casting done in the implementations should work out fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just made this change and saw that some tests break with custom STACObjects. disregard.

@lossyrob lossyrob dismissed their stale review June 17, 2021 22:42

Answered my own question

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants