This issue documents some slowness on moderately large queries. In the snippet below we fetch 8,234 items. It takes about a minute to construct the results.
We spend roughly two-thirds of our time in `stac_io.get_pages`, which includes I/O: waiting on the endpoint (and maybe parsing the JSON into Python objects?).
We spend the other third in `item_collection.from_dict`.
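snakeviz visualizes cProfile output, so the same breakdown can be captured directly with the standard library. This is a minimal sketch with a stand-in workload in place of `list(search.items())`:

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in for list(search.items()); replace with the real search call.
    return [i * i for i in range(100_000)]

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time, which is what shows get_pages vs. from_dict splits.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # top 5 entries by cumulative time
report = stream.getvalue()
```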
Some ideas for optimization:

- Most of the time in `item_collection.from_dict` is spent on a `deepcopy` in `pystac.Item.from_dict`. It might be safe to skip that copy (since these dicts are coming straight off the network with no other references) and provide a `copy=False` flag to `pystac.Item.from_dict` to allow it to mutate the incoming `dict`.
- Maybe `pystac_client.Client` or `.search` could provide a `raw=True/False` flag to allow skipping the construction of pystac Items?
- Maybe some kind of async magic would speed up the reads? Hard to say, since I don't know how much time is spent waiting for results vs. parsing JSON. I also don't know if it's a good idea to parse JSON on the asyncio event loop.
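The `copy=False` idea above can be sketched in miniature. Note that `copy=False` is a proposed flag, not an existing pystac parameter, and `item_from_dict` here is a stand-in for `pystac.Item.from_dict`, not the real implementation:

```python
import copy

def item_from_dict(d, copy_input=True):
    # copy_input=True mimics today's behavior: the caller's dict is untouched.
    # copy_input=False is the proposal: take ownership of the incoming dict
    # and mutate it in place, skipping the deepcopy entirely.
    data = copy.deepcopy(d) if copy_input else d
    data["parsed"] = True  # stand-in for pystac's real parsing work
    return data

page_dict = {"type": "Feature", "id": "item-1", "properties": {}}

item = item_from_dict(page_dict, copy_input=False)
assert item is page_dict  # no copy: the network dict is reused directly

item2 = item_from_dict(page_dict)  # default: defensive deepcopy
assert item2 is not page_dict
```

The tradeoff is that with `copy_input=False` the caller must not reuse the dict afterwards, which is why it would need to be opt-in.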
Thanks for this @TomAugspurger, I think you're right that the deepcopy can probably be skipped.
I also like the idea of not converting to PySTAC Items, or separating that step out; it would be useful for the case of just fetching and saving the results.
Async could help, but it would require some sweeping changes in PySTAC and pystac-client (there's an experimental async branch of PySTAC, but it's outdated now). More importantly, the STAC spec currently only requires next links for paging, so you have to fetch pages sequentially.
Will look into the other suggestions; I'll be doing some work on this next week.
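The sequential-paging constraint is worth making concrete: each page's `next` link is only known after that page has been fetched, so requests can't be issued concurrently. A minimal sketch (the names `get_pages` and the fake endpoint are illustrative, not the pystac-client API):

```python
def get_pages(fetch, first_url):
    """Yield pages one at a time, following rel="next" links."""
    url = first_url
    while url is not None:
        page = fetch(url)
        yield page
        # The next URL comes from the page we just received, so the
        # following request cannot start until this one has finished.
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )

# Fake three-page endpoint standing in for a STAC API.
pages = {
    "p1": {"features": [1, 2], "links": [{"rel": "next", "href": "p2"}]},
    "p2": {"features": [3], "links": [{"rel": "next", "href": "p3"}]},
    "p3": {"features": [4], "links": []},
}

items = [f for page in get_pages(pages.__getitem__, "p1") for f in page["features"]]
```

If the API also advertised the total number of matches and supported offset- or page-number-based requests, the remaining pages could be fetched in parallel; with only `next` links, they can't.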
I ran `list(search.items())` under snakeviz and came up with this result: https://gistcdn.rawgit.org/TomAugspurger/fb5b3bde8cee09d2d9aa2f7215edf2b2/94e4ec2ae97bec2169f9263e8f41183418e885d9/mosaic-static.html