Issues/37 Add function for returning an iterator instead of sequence #91

Merged: 4 commits, Sep 25, 2024
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,14 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## [Unreleased]

+### Added
+
+- Add method `Query.results` for returning results as an iterator instead of a sequence ([#37](https://github.com/nasa/python_cmr/issues/37))
+
+### Changed
+
+- Deprecate methods `Query.get` and `Query.get_all` in favor of the new `Query.results` method. These deprecated methods will likely be removed for the 1.0.0 release. ([#37](https://github.com/nasa/python_cmr/issues/37))
+
 ## [0.13.0]

 ### Added
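For callers migrating off the deprecated methods, the following is a minimal sketch of how the old call patterns might map onto an iterator. `fake_results` is a hypothetical stand-in for `Query.results()` so that the example runs without network access; it is not part of python_cmr.

```python
from itertools import islice
from typing import Any, Iterator, List


def fake_results() -> Iterator[Any]:
    """Hypothetical stand-in for `Query.results()`; yields entries lazily."""
    for i in range(10):
        yield {"id": f"G{i}"}


# Roughly what `query.get(3)` used to provide: at most 3 entries.
first_three: List[Any] = list(islice(fake_results(), 3))

# Roughly what `query.get_all()` used to provide: every entry, as a list.
everything: List[Any] = list(fake_results())
```

Note that this mapping only limits individual results when the query uses the `"json"` format. For other formats the iterator yields whole pages of text, so `islice` would limit the number of pages instead.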
61 changes: 59 additions & 2 deletions cmr/queries.py
@@ -7,6 +7,8 @@
 from datetime import date, datetime, timezone
 from inspect import getmembers, ismethod
 from re import search
+from typing import Iterator
+
 from typing_extensions import (
     Any,
     List,
@@ -20,7 +22,7 @@
     Tuple,
     TypeAlias,
     Union,
-    override,
+    override, deprecated,
 )
 from urllib.parse import quote

@@ -58,6 +60,7 @@ def __init__(self, route: str, mode: str = CMR_OPS):
         self.concept_id_chars: Set[str] = set()
         self.headers: MutableMapping[str, str] = {}

+    @deprecated("Use the 'results' method instead, but note that it produces an iterator.")
     def get(self, limit: int = 2000) -> Sequence[Any]:
         """
         Get all results up to some limit, even if spanning multiple pages.
@@ -115,6 +118,7 @@ def hits(self) -> int:

         return int(response.headers["CMR-Hits"])

+    @deprecated("Use the 'results' method instead, but note that it produces an iterator.")
     def get_all(self) -> Sequence[Any]:
         """
         Returns all of the results for the query. This will call hits() first to determine how many
@@ -123,8 +127,61 @@ def get_all(self) -> Sequence[Any]:

         :returns: query results as a list
         """
+
+        return list(self.get(self.hits()))
+
+    def results(self, page_size: int = 2000) -> Iterator[Any]:
+        """
+        Return an iterator (generator) of all results matching the query
+        criteria.
+
+        Because a query may produce a large number of results (perhaps
+        10s or 100s of thousands), such results are fetched using
+        multiple CMR requests, each returning a "page" of results, as
+        returning all results in a single request would be impractical.
+        The size of each page (in terms of the number of results
+        in a page) is controlled by the `page_size` parameter. A smaller
+        page size means fewer items in memory (per page), requiring
+        more CMR queries to fetch all results (if desired). Conversely,
+        a larger page size means more items in memory (per page)
+        and fewer CMR queries.
+
+        When the query is configured to use the `"json"` format, each
+        element produced by the returned iterator is an element of the
+        "feed.entry" array (see
+        <https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#json>).
+        In this case, the iterator may produce as many elements as there
+        are results matching the query criteria.
+
+        For all other formats, each element produced by the returned
+        iterator is an unparsed (text) page of results (i.e., the caller
+        is responsible for parsing the page of results into individual
+        elements). In this case, the iterator will produce only as many
+        pages as required (based on `page_size`) to produce all results
+        matching the query criteria.
+
+        :param page_size: maximum number of results per page (min 1,
+            max 2000 [default]) requested from the CMR
+        :returns: query results as an iterator (generator)
+        """
+
+        url = self._build_url()
+        headers = dict(self.headers or {})
+        params = {"page_size": min(max(1, page_size), 2000)}
+
+        while True:
+            response = requests.get(url, headers=headers, params=params)
+            response.raise_for_status()
+
+            if self._format == "json":
+                yield from response.json()["feed"]["entry"]
+            else:
+                yield response.text
+
+            if not (cmr_search_after := response.headers.get("cmr-search-after")):
+                break

-        return self.get(self.hits())
+            headers["cmr-search-after"] = cmr_search_after

     def parameters(self, **kwargs: Any) -> Self:
         """
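The paging loop in `results` can be tried out without contacting the CMR. The sketch below mirrors its control flow under stated assumptions: `fetch_page` is a hypothetical callable standing in for the `requests.get` round trip, and `make_fake_fetch` is an invented in-memory fake whose token is simply an offset (the real `cmr-search-after` value is opaque to the client). Neither name exists in python_cmr.

```python
from typing import Any, Callable, Dict, Iterator, List, Mapping, Tuple

# Hypothetical stand-in for one HTTP round trip: takes the request headers
# and returns (entries, response_headers).
FetchPage = Callable[[Mapping[str, str]], Tuple[List[Any], Mapping[str, str]]]


def iter_results(fetch_page: FetchPage) -> Iterator[Any]:
    """Yield entries page by page, following the cmr-search-after header."""
    headers: Dict[str, str] = {}

    while True:
        entries, response_headers = fetch_page(headers)
        yield from entries

        search_after = response_headers.get("cmr-search-after")
        if not search_after:
            break  # no token in the response: nothing further to fetch

        # Send the token back so the next request returns the next page.
        headers["cmr-search-after"] = search_after


def make_fake_fetch(items: List[Any], page_size: int) -> FetchPage:
    """Build an in-memory fetcher that imitates paging (assumed behaviour)."""
    page_size = min(max(1, page_size), 2000)  # same clamp as in `results`

    def fetch_page(headers: Mapping[str, str]) -> Tuple[List[Any], Mapping[str, str]]:
        start = int(headers.get("cmr-search-after", "0"))
        page = items[start:start + page_size]
        # Assumption: a token is returned only while the page is non-empty.
        response_headers = {"cmr-search-after": str(start + len(page))} if page else {}
        return page, response_headers

    return fetch_page


all_items = list(iter_results(make_fake_fetch(list(range(5)), page_size=2)))
```

Because `iter_results` is a generator, no page is fetched until the caller starts iterating, and a caller that stops early never requests the remaining pages.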

Large diffs are not rendered by default.

@@ -8,13 +8,11 @@ interactions:
       - gzip, deflate
       Connection:
       - keep-alive
-      User-Agent:
-      - python-requests/2.31.0
     method: GET
     uri: https://cmr.earthdata.nasa.gov/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=0
   response:
     body:
-      string: '{"feed":{"updated":"2023-08-14T17:02:36.801Z","id":"https://cmr.earthdata.nasa.gov:443/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=0","title":"ECHO
+      string: '{"feed":{"updated":"2024-09-24T21:02:25.663Z","id":"https://cmr.earthdata.nasa.gov:443/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=0","title":"ECHO
         granule metadata","entry":[]}}'
     headers:
       Access-Control-Allow-Origin:
@@ -25,37 +23,41 @@ interactions:
       CMR-Hits:
       - '163'
       CMR-Request-Id:
-      - 5855d714-8aff-4d0f-b4cc-e556f02ef96a
+      - a22de087-6217-4845-b19f-734cad960bce
       CMR-Took:
-      - '52'
+      - '275'
       Connection:
       - keep-alive
+      Content-MD5:
+      - 4c37cbd504ace09da5a3997968626ea5
+      Content-SHA1:
+      - 5416d1b30f3c052f18b3d65dc42b5ce01d69739c
       Content-Type:
       - application/json;charset=utf-8
       Date:
-      - Mon, 14 Aug 2023 17:02:36 GMT
+      - Tue, 24 Sep 2024 21:02:25 GMT
       Server:
       - ServerTokens ProductOnly
       Strict-Transport-Security:
-      - max-age=31536000
+      - max-age=31536000; includeSubDomains; preload
       Transfer-Encoding:
       - chunked
       Vary:
       - Accept-Encoding, User-Agent
       Via:
-      - 1.1 cc58556a6e846289f4d3105969536e4c.cloudfront.net (CloudFront)
+      - 1.1 4208ca8c7c521bdbe71d5b0a82523074.cloudfront.net (CloudFront)
       X-Amz-Cf-Id:
-      - qj9VuAc1JQu-rnMVDg3mGwstR-jGQA4rd7MKVRAEpXeTDbKZT5p5jg==
+      - 5SXRX7_nObc-1MC-o1YvNCocGFg33JIsTWClbrYt375rj1K5YAqxpg==
       X-Amz-Cf-Pop:
-      - SFO53-C1
+      - LAX50-C1
       X-Cache:
       - Miss from cloudfront
       X-Content-Type-Options:
       - nosniff
       X-Frame-Options:
       - SAMEORIGIN
       X-Request-Id:
-      - qj9VuAc1JQu-rnMVDg3mGwstR-jGQA4rd7MKVRAEpXeTDbKZT5p5jg==
+      - 5SXRX7_nObc-1MC-o1YvNCocGFg33JIsTWClbrYt375rj1K5YAqxpg==
       X-XSS-Protection:
       - 1; mode=block
       content-length:
@@ -72,13 +74,11 @@ interactions:
       - gzip, deflate
       Connection:
       - keep-alive
-      User-Agent:
-      - python-requests/2.31.0
     method: GET
     uri: https://cmr.earthdata.nasa.gov/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=163
   response:
     body:
-      string: '{"feed":{"updated":"2023-08-14T17:02:40.416Z","id":"https://cmr.earthdata.nasa.gov:443/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=163","title":"ECHO
+      string: '{"feed":{"updated":"2024-09-24T21:02:26.149Z","id":"https://cmr.earthdata.nasa.gov:443/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=163","title":"ECHO
         granule metadata","entry":[{"boxes":["-89.5 0.5 89.5 180","-89.5 -180 89.5
         -0.5"],"time_start":"2002-04-04T00:00:00.000Z","updated":"2023-04-17T15:27:21.022Z","dataset_id":"JPL
         TELLUS GRACE Level-3 Monthly Land Water-Equivalent-Thickness Surface Mass
@@ -2045,39 +2045,43 @@ interactions:
       CMR-Hits:
       - '163'
       CMR-Request-Id:
-      - 60eb29b2-95e1-453c-8efe-6e59cf649eb5
+      - a2563e97-3528-4d5a-9849-4dd17e703065
       CMR-Search-After:
       - '["pocloud",1495497600000,2658328520]'
       CMR-Took:
-      - '4959'
+      - '332'
       Connection:
       - keep-alive
+      Content-MD5:
+      - 5199240b27f047a4f32ada5932c96a1b
+      Content-SHA1:
+      - f74df0ad201ceea97aaae8228a500c0392625b11
       Content-Type:
       - application/json;charset=utf-8
       Date:
-      - Mon, 14 Aug 2023 17:02:42 GMT
+      - Tue, 24 Sep 2024 21:02:26 GMT
       Server:
       - ServerTokens ProductOnly
       Strict-Transport-Security:
-      - max-age=31536000
+      - max-age=31536000; includeSubDomains; preload
       Transfer-Encoding:
       - chunked
       Vary:
       - Accept-Encoding, User-Agent
       Via:
-      - 1.1 44933b72098305e9c31fc50b2e6554a0.cloudfront.net (CloudFront)
+      - 1.1 924eb6575c2679d663c17bd1e792d09a.cloudfront.net (CloudFront)
       X-Amz-Cf-Id:
-      - 9TJ3JRMGc6mUxKegR4f2HSLC_1Cfwei5QHZuicg_aLsWEJS3T6XCNg==
+      - dMoCuQG8wTJ2-UTtNbyJ-2SJPskyH6fndTldIhVmZodoXMVr1BugXg==
       X-Amz-Cf-Pop:
-      - SFO53-C1
+      - LAX50-C1
       X-Cache:
       - Miss from cloudfront
       X-Content-Type-Options:
       - nosniff
       X-Frame-Options:
       - SAMEORIGIN
       X-Request-Id:
-      - 9TJ3JRMGc6mUxKegR4f2HSLC_1Cfwei5QHZuicg_aLsWEJS3T6XCNg==
+      - dMoCuQG8wTJ2-UTtNbyJ-2SJPskyH6fndTldIhVmZodoXMVr1BugXg==
       X-XSS-Protection:
       - 1; mode=block
       content-length:
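The recording stores header names in mixed case (`CMR-Hits`, `CMR-Search-After`), while `results` looks up `cmr-search-after` in lower case. That works because `requests` exposes response headers through a case-insensitive mapping. The sketch below imitates that lookup with a hypothetical `header` helper (not part of python_cmr or `requests`), using values copied from the recording and flattened from one-element lists to plain strings.

```python
import json
from typing import Dict, Mapping, Optional


def header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Look up a header by name, ignoring case (hypothetical helper)."""
    lowered: Dict[str, str] = {key.lower(): value for key, value in headers.items()}
    return lowered.get(name.lower())


# Values copied from the recorded response above.
recorded = {
    "CMR-Hits": "163",
    "CMR-Search-After": '["pocloud",1495497600000,2658328520]',
}

hits = int(header(recorded, "cmr-hits") or 0)
token = header(recorded, "cmr-search-after")

# In this recording the token happens to parse as a JSON array. Client code
# need not rely on that and can send the string back unchanged.
token_parts = json.loads(token) if token else []
```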