fix: error on collection with no items TDE-1216 #1000

Merged
merged 8 commits into from
Jul 29, 2024

Conversation

amfage
Contributor

@amfage amfage commented Jul 16, 2024

Motivation

The create-collection task of the standardising Argo workflow, which calls collection_from_items.py, does not fail when no items are added to the collection. Instead it creates an invalid collection.json file that fails STAC validation (it has no item links or extents).

This happened recently when a standardising workflow was retried: the retry created a new collection ID, but the target scratch directory already contained standardised images and STAC. We do not overwrite existing standardised files, so the collection ID of the new collection did not match the collection ID recorded in the existing items.

If the collection has no items, the create-collection workflow task should fail with an error so that we know about it.
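
To make "invalid" concrete: in the STAC specification a collection must carry a spatial and a temporal extent, and its items are referenced through links whose rel is "item". The following is a minimal sketch of a check for those two symptoms. The function name collection_problems is hypothetical, and this is not the validator that the workflow runs.

```python
from typing import Any


def collection_problems(collection: dict[str, Any]) -> list[str]:
    """List the symptoms described above for a parsed collection.json (hypothetical helper)."""
    problems = []
    links = collection.get("links") or []
    # Items are referenced from a collection through links whose rel is "item".
    if not any(link.get("rel") == "item" for link in links):
        problems.append("no item links")
    extent = collection.get("extent") or {}
    # The STAC Collection spec requires both a spatial bbox and a temporal interval.
    if not (extent.get("spatial") or {}).get("bbox"):
        problems.append("no spatial extent")
    if not (extent.get("temporal") or {}).get("interval"):
        problems.append("no temporal extent")
    return problems
```

An empty result only means these particular symptoms are absent; it does not replace full STAC validation.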

Modifications

  • Check that the collection ID and the collection ID recorded in each item are the same, and log a warning if they are not.
  • Check that each file being added is a STAC item, as a collection.json file may sometimes exist in the source directory.
  • Raise an Exception if the collection has no items.
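
The three checks above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this Pull Request: the function name select_items, its signature, and the logger name are hypothetical. It relies only on two facts from the STAC specification: an item has type "Feature", and it records its parent collection ID in the "collection" field.

```python
import logging
from typing import Any, Iterable

logger = logging.getLogger("collection_from_items_sketch")  # hypothetical logger name


def select_items(collection_id: str, documents: Iterable[tuple[str, dict[str, Any]]]) -> list[dict[str, Any]]:
    """Keep the STAC items that belong to collection_id; raise if none remain.

    documents yields (path, parsed JSON) pairs. The name and signature are
    hypothetical and are not taken from collection_from_items.py.
    """
    items: list[dict[str, Any]] = []
    for path, document in documents:
        # A STAC item is a GeoJSON Feature, whereas collection.json has type "Collection".
        if document.get("type") != "Feature":
            logger.warning("skipping %s: not a STAC item", path)
            continue
        # A STAC item records its parent collection ID in the "collection" field.
        item_collection_id = document.get("collection")
        if item_collection_id != collection_id:
            logger.warning("skipping %s: %s and %s do not match", path, item_collection_id, collection_id)
            continue
        items.append(document)
    if not items:
        # Fail loudly so the workflow task errors instead of writing an invalid collection.json.
        raise Exception(f"Collection {collection_id} has no items")
    return items
```

Skipping mismatched items one by one and raising only at the end means a single stray file does not abort the run, while a directory in which every item belongs to another collection still fails the task.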

Examples of logging/Exception:
If the Item Collection ID and Collection ID do not match:
{"file": "afage/collection_from_items_id_mismatch/AZ34_5000_0402.json", "action": "collection_from_items", "reason": "skip", "level": 40, "time": 1721257664445, "v": 1, "pid": 1, "hostname": "d5df8d86499f", "id": "01J31F777EZD0M0WN8HZY1BVZ6", "msg": "skipping: 01J1XDYXSC1GC9ASDBXNT3A93T and 01J1XG0GJF2AK110287P265HJM do not match"}

If the collection has no items:
{"level": 50, "time": 1721258802092, "v": 1, "pid": 1, "hostname": "1807f10cfe48", "id": "01J31G9XHEFA79K827P0VQBYP9", "msg": "Collection 01J1XG0GJF2AK110287P265HJM has no items"}
Traceback (most recent call last):
  File "/app/scripts/collection_from_items.py", line 187, in <module>
    main()
  File "/app/scripts/collection_from_items.py", line 172, in main
    raise Exception(f"Collection {collection_id} has no items")
Exception: Collection 01J1XG0GJF2AK110287P265HJM has no items

If there is a non-item file in the source directory:
{"file": "afage/collection_from_items_id_mismatch/collection.json", "action": "collection_from_items", "reason": "skip", "level": 40, "time": 1721258457687, "v": 1, "pid": 1, "hostname": "2dbdb92c7ac4", "id": "01J31FZDP3AVF0B4SVTW3MGJBE", "msg": "skipping: not a STAC item"}

Verification

collection_from_items.py should be refactored so that individual functions, including this functionality, can be unit tested. This is outside the scope of this Pull Request.
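
As a rough idea of what such a refactor could enable, here is a sketch of a small pure function with unit tests. The helper skip_reason and the test class are hypothetical and do not exist in the repository; the messages mirror the log examples above.

```python
import unittest
from typing import Any, Optional


def skip_reason(collection_id: str, document: dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: return why a document is skipped, or None if it should be added."""
    if document.get("type") != "Feature":
        return "not a STAC item"
    if document.get("collection") != collection_id:
        return f"{document.get('collection')} and {collection_id} do not match"
    return None


class SkipReasonTest(unittest.TestCase):
    def test_matching_item_is_added(self) -> None:
        self.assertIsNone(skip_reason("C1", {"type": "Feature", "collection": "C1"}))

    def test_collection_file_is_skipped(self) -> None:
        self.assertEqual(skip_reason("C1", {"type": "Collection", "id": "C1"}), "not a STAC item")

    def test_mismatching_item_is_skipped(self) -> None:
        self.assertEqual(
            skip_reason("C1", {"type": "Feature", "collection": "C2"}),
            "C2 and C1 do not match",
        )
```

Saved as a module, the tests can be run with python -m unittest.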

Testing was done by running the collection_from_items.py command twice: once with a matching collection ID and once with a mismatching one.

Example command (for ID mismatch, change --collection-id to something like 0123456789):
❯ docker build -t topo-imagery . && docker run -v ${HOME}/tmp/:/tmp/:rw topo-imagery python /app/scripts/collection_from_items.py --uri /home/dev/collection_from_items_id_mismatch/ --collection-id 01J1XG0GJF2AK110287P365HJM --category urban-aerial-photos --region hawkes-bay --gsd 1m --start-date 2023-09-20 --end-date 2023-12-21 --lifecycle ongoing --producer "Placeholder" --licensor "Placeholder" --concurrency 25

@amfage amfage marked this pull request as ready for review July 18, 2024 00:01
@amfage amfage requested a review from a team as a code owner July 18, 2024 00:01
@amfage amfage requested review from l0b0, blacha, MDavidson17, paulfouquet and dwsilk and removed request for l0b0 and MDavidson17 July 18, 2024 00:01
@paulfouquet paulfouquet added this pull request to the merge queue Jul 29, 2024
Merged via the queue into master with commit d351d85 Jul 29, 2024
2 checks passed
@paulfouquet paulfouquet deleted the fix/invalid-collection-tde-1216 branch July 29, 2024 22:43
@github-actions github-actions bot mentioned this pull request Jul 29, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 12, 2024
🤖 I have created a release *beep* *boop*
---


## [4.10.0](v4.9.0...v4.10.0) (2024-09-12)


### Features

* add derived_from links to item TDE-1251 ([#1043](#1043)) ([077f14f](077f14f))
* determine start and end datetimes when item include derived TDE-1258 ([#1052](#1052)) ([62d571a](62d571a))
* make start_datetime and end_datetime optional when includeDerived is True TDE-1258 ([#1065](#1065)) ([57f088c](57f088c))
* More types ([#989](#989)) ([3f7cd57](3f7cd57))


### Bug Fixes

* capture area winding order TDE-1205 ([#1008](#1008)) ([61be096](61be096))
* error on collection with no items TDE-1216 ([#1000](#1000)) ([d351d85](d351d85))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
3 participants