fix: error on collection with no items TDE-1216 #1000
Merged
Motivation
The create-collection standardising Argo workflow task, which calls collection-from-items.py, does not fail if no items are added to the collection. Instead it creates an invalid collection.json file that fails STAC validation (it has no item links or extents).
This happened recently when a standardising workflow was retried, creating a new collection ID while the target scratch directory already contained standardised images and STAC. We do not overwrite existing standardised files, so the ID of the new collection did not match the collection ID in the existing items.
We want to know when the collection has no items, and the create-collection workflow task should fail with an error.
Modifications
collection.json file may exist.
Examples of logging/Exception:
If the Item Collection ID and Collection ID do not match:
{"file": "afage/collection_from_items_id_mismatch/AZ34_5000_0402.json", "action": "collection_from_items", "reason": "skip", "level": 40, "time": 1721257664445, "v": 1, "pid": 1, "hostname": "d5df8d86499f", "id": "01J31F777EZD0M0WN8HZY1BVZ6", "msg": "skipping: 01J1XDYXSC1GC9ASDBXNT3A93T and 01J1XG0GJF2AK110287P265HJM do not match"}
If the collection has no items:
{"level": 50, "time": 1721258802092, "v": 1, "pid": 1, "hostname": "1807f10cfe48", "id": "01J31G9XHEFA79K827P0VQBYP9", "msg": "Collection 01J1XG0GJF2AK110287P265HJM has no items"}
Traceback (most recent call last):
  File "/app/scripts/collection_from_items.py", line 187, in <module>
    main()
  File "/app/scripts/collection_from_items.py", line 172, in main
    raise Exception(f"Collection {collection_id} has no items")
Exception: Collection 01J1XG0GJF2AK110287P265HJM has no items
If there is a non-item file in the source directory:
{"file": "afage/collection_from_items_id_mismatch/collection.json", "action": "collection_from_items", "reason": "skip", "level": 40, "time": 1721258457687, "v": 1, "pid": 1, "hostname": "2dbdb92c7ac4", "id": "01J31FZDP3AVF0B4SVTW3MGJBE", "msg": "skipping: not a STAC item"}
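The three cases above can be sketched as a single filtering step. This is a minimal illustration, not the actual code in collection_from_items.py: the function names is_stac_item and select_items are hypothetical, logging is replaced with print, and it assumes STAC Items are identified by "type": "Feature" and carry their parent ID in the "collection" field, as in the STAC Item specification.

```python
import json
from typing import Dict, Iterable, List, Tuple


def is_stac_item(document: Dict) -> bool:
    """Assumption: a STAC Item is a GeoJSON Feature; a Collection has type 'Collection'."""
    return document.get("type") == "Feature"


def select_items(documents: Iterable[Tuple[str, str]], collection_id: str) -> List[Dict]:
    """Return the STAC Items that belong to collection_id.

    `documents` is an iterable of (path, JSON text) pairs (hypothetical input shape).
    Raises an Exception if no item ends up in the collection.
    """
    items = []
    for path, content in documents:
        document = json.loads(content)

        # Case: a non-item file (for example an existing collection.json) is skipped.
        if not is_stac_item(document):
            print(f"skipping {path}: not a STAC item")
            continue

        # Case: the item belongs to a different collection and is skipped.
        item_collection_id = document.get("collection")
        if item_collection_id != collection_id:
            print(f"skipping {path}: {item_collection_id} and {collection_id} do not match")
            continue

        items.append(document)

    # Case: nothing was added, so fail instead of writing an invalid collection.
    if not items:
        raise Exception(f"Collection {collection_id} has no items")
    return items
```

Raising after the loop, rather than on the first skipped file, means a single stray collection.json or mismatched item does not stop the task; only an empty result does.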
Verification
collection-from-items.py should be refactored so that unit tests can be written for individual functions, including this functionality. This is outside the scope of this Pull Request.
Testing was done by running the collection-from-items.py command with a matching ID and with a mismatching ID.
Example command (for an ID mismatch, change --collection-id to something like 0123456789):
❯ docker build -t topo-imagery . && docker run -v ${HOME}/tmp/:/tmp/:rw topo-imagery python /app/scripts/collection_from_items.py --uri /home/dev/collection_from_items_id_mismatch/ --collection-id 01J1XG0GJF2AK110287P365HJM --category urban-aerial-photos --region hawkes-bay --gsd 1m --start-date 2023-09-20 --end-date 2023-12-21 --lifecycle ongoing --producer "Placeholder" --licensor "Placeholder" --concurrency 25