Merge pull request #18 from crim-ca/better-queryables-and-summaries
- Make queryables and summaries automatically updatable

  Previously, this app implemented a custom /queryables endpoint that crawled the database to display information about the
  items stored in the database. This method had two limitations:

  - It only worked for individual collections, not for queryables across all collections.
  - It was slow because it had to inspect the entire database every time the endpoint was called.

  This change improves on that method by introducing postgres functions that collect the same queryables information from the database and store it in the queryables table. Caching the queryables information this way allows the default /queryables endpoint to return it quickly for a single collection or for all collections.

  A similar strategy is also implemented here to ensure that the collection summaries and extents are kept up to date.

- Update README.md to document the new functionality described above.

- Add `PATCH /queryables` endpoint to update queryables to reflect the current items stored in the database.
  This endpoint takes the optional parameter `minimal`. If `minimal` is True, only "minimal"
  queryables will be set. Minimal queryables are those whose values are scalar JSON types; non-scalar JSON types
  (objects and arrays) are omitted.

- Add `PATCH /summaries` endpoint to update collection summaries to reflect the current items associated with
  all collections.

- Moved source code to the `src/` folder to improve code organization.

- Introduced `ruff` as a linter and formatter used by `pre-commit`.

- Only build docker images for published tags.
mishaschwartz authored Feb 6, 2025
2 parents 40cad1a + 3370ea4 commit 040d350
Showing 17 changed files with 1,094 additions and 391 deletions.
17 changes: 17 additions & 0 deletions .bumpversion.cfg
@@ -0,0 +1,17 @@
[bumpversion]
current_version = 1.0.0
commit = True
tag = False

[bumpversion:file:CHANGES.md]
search =
[Unreleased](https://github.com/crim-ca/stac-app/tree/master)
------------------------------------------------------------------------------------------------------------------
replace =
[Unreleased](https://github.com/crim-ca/stac-app/tree/master)
------------------------------------------------------------------------------------------------------------------

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)

[{new_version}](https://github.com/crim-ca/stac-app/tree/{new_version})
------------------------------------------------------------------------------------------------------------------
6 changes: 4 additions & 2 deletions .dockerignore
@@ -1,2 +1,4 @@
-env.local
-env.docker
+*
+!requirements.txt
+!src/
+src/**/__pycache__
45 changes: 15 additions & 30 deletions .github/workflows/release.yml
@@ -1,11 +1,8 @@
 name: Release Docker image
 
 on:
-  push:
-    tags:
-      - "*"
-    branches:
-      - "*"
+  release:
+    types: [published]
 
 env:
   REGISTRY: ghcr.io
@@ -17,35 +14,23 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v2
-      - name: Get Tag Version
-        id: version
-        shell: bash
-        run: |
-          if [[ "${GITHUB_REF}" == "refs/heads/master" ]]; then
-            echo "::set-output name=TAG_VERSION::latest"
-          else
-            echo "::set-output name=TAG_VERSION::${GITHUB_REF##*/}"
-          fi
-      - name: Extract branch name
-        id: extract_branch
-        shell: bash
-        run: echo "##[set-output name=branch;]$(echo ${GITHUB_REF#refs/heads/})"
-      - name: Log in to the container registry
-        uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
+        uses: actions/checkout@v4
+      - name: Log in to the Container registry
+        uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
         with:
           registry: ${{ env.REGISTRY }}
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
-      # - name: Build and push image using tag
-      #   uses: docker/build-push-action@v3
-      #   with:
-      #     context: .
-      #     push: true
-      #     tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.version.outputs.TAG_VERSION }}
-      - name: Build and push image using branch name
-        uses: docker/build-push-action@v3
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+      - name: Build and push Docker image
+        id: push
+        uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
         with:
           context: .
           push: true
-          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.extract_branch.outputs.branch }}
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
6 changes: 5 additions & 1 deletion .gitignore
@@ -2,7 +2,8 @@
 .idea/
 .vscode/
 
-## Project
+## linters/formatters
+.ruff_cache/
 
 # credentials file
 env.local
@@ -20,3 +21,6 @@ env.docker
 
 # Python
 __pycache__/
+
+# virtual environments
+venv/
32 changes: 8 additions & 24 deletions .pre-commit-config.yaml
@@ -1,25 +1,9 @@
repos:
# - repo: https://github.com/pre-commit/pre-commit-hooks
# rev: v4.3.0
# hooks:
# - id: trailing-whitespace
# - id: end-of-file-fixer
# - id: check-yaml
# - id: check-added-large-files
- repo: https://github.com/psf/black
rev: 22.6.0
hooks:
- id: black
language_version: python3
# - repo: https://github.com/pre-commit/mirrors-autopep8
# rev: v2.0.0
# hooks:
# - id: autopep8
# - repo: https://github.com/PyCQA/flake8
# rev: 6.0.0
# hooks:
# - id: flake8
# - repo: https://github.com/PyCQA/isort
# rev: 5.10.1
# hooks:
# - id: isort
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.9.2
hooks:
# Run the linter.
- id: ruff
# Run the formatter.
- id: ruff-format
47 changes: 47 additions & 0 deletions CHANGES.md
@@ -0,0 +1,47 @@
# Changes

[Unreleased](https://github.com/crim-ca/stac-app/tree/master)
------------------------------------------------------------------------------------------------------------------

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)

[1.0.0](https://github.com/crim-ca/stac-app/tree/1.0.0)
------------------------------------------------------------------------------------------------------------------

# Changed

- Make queryables and summaries automatically updatable

Previously, this app implemented a custom /queryables endpoint that crawled the database to display information about the
items stored in the database. This method had two limitations:

  - It only worked for individual collections, not for queryables across all collections.
  - It was slow because it had to inspect the entire database every time the endpoint was called.

This change improves on that method by introducing postgres functions that collect the same queryables information from the database and store it in the queryables table. Caching the queryables information this way allows the default /queryables endpoint to return it quickly for a single collection or for all collections.

A similar strategy is also implemented here to ensure that the collection summaries and extents are kept up to date.

- Update README.md to document the new functionality described above.

- Add `PATCH /queryables` endpoint to update queryables to reflect the current items stored in the database.
This endpoint takes the optional parameter `minimal`. If `minimal` is True, only "minimal"
queryables will be set. Minimal queryables are those whose values are scalar JSON types; non-scalar JSON types
(objects and arrays) are omitted.

- Add `PATCH /summaries` endpoint to update collection summaries to reflect the current items associated with
all collections.

- Moved source code to the `src/` folder to improve code organization.

- Introduced `ruff` as a linter and formatter used by `pre-commit`.

- Only build docker images for published tags.

Prior Versions
------------------------------------------------------------------------------------------------------------------

All versions prior to [1.0.0](https://github.com/crim-ca/stac-app/1.0.0) were not officially tagged.
It is strongly recommended to use a tagged version to ensure better traceability of changes that could impact behavior
and potential issues.
The docker image for the version directly prior to 1.0.0 is tagged as [version 0.0.0](https://github.com/crim-ca/stac-app/pkgs/container/stac-app/113480762?tag=0.0.0).
39 changes: 6 additions & 33 deletions Dockerfile
@@ -1,39 +1,12 @@
FROM python:3.12-slim

FROM python:3.8-slim as base
# see .dockerignore file for which files are included
COPY ./requirements.txt /requirements.txt

FROM base as builder
# Any python libraries that require system libraries to be installed will likely
# need the following packages in order to build
RUN apt-get update && apt-get install -y build-essential git
RUN python -m pip install -r /requirements.txt

ENV CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
ENV PATH=$PATH:/install/bin

ARG install_dev_dependencies=true

# TODO : temporary fix
RUN git clone https://github.com/stac-utils/stac-fastapi.git

WORKDIR /stac-fastapi

# TODO : checkout to working November 25 2022 version of stac-fastapi, where pgstac was bundled in stac-fastapi (now `pip install pypgstac`)
RUN git checkout d53e792

RUN pip install \
-e stac_fastapi/api \
-e stac_fastapi/types \
-e stac_fastapi/extensions
RUN pip install -e stac_fastapi/pgstac

RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
postgresql-client

COPY . /app
COPY ./src/ /app

WORKDIR /app

RUN pip install -r requirements.txt

CMD ["uvicorn", "stac_app:app", "--reload", "--host", "0.0.0.0", "--port", "8000", "--root-path", ""]
CMD ["uvicorn", "stac_app:app", "--host", "0.0.0.0", "--port", "8000", "--root-path", ""]
99 changes: 97 additions & 2 deletions README.md
@@ -1,10 +1,105 @@
# STAC API implementation for PAVICS (https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse)
# STAC API implementation for [Birdhouse](https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse)

This implementation extends [stac-fastapi-pgstac](https://github.com/stac-utils/stac-fastapi-pgstac) by providing the following additional features:

- [Custom Queryables](#custom-queryables)
- [Custom Collection Summaries](#custom-collection-summaries)
- [Settable Router Prefix](#settable-router-prefix)
- [Settable OpenAPI paths](#settable-openapi-paths)

## CONTRIBUTING
#### Custom Queryables

The [`/queryables` endpoints](https://github.com/stac-api-extensions/filter?tab=readme-ov-file#queryables) enabled by [stac-fastapi-pgstac](https://github.com/stac-utils/stac-fastapi-pgstac) only provide basic information about the STAC items. This includes the property type (string, array, number, etc.) but not much else.

This implementation adds postgres functions that discover more detailed queryables information, including minimums and maximums for range properties and enum values for discrete properties.
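
As a rough illustration only (the real work is done by postgres functions, which are not shown here), the following Python sketch derives the same kind of information from a list of item dictionaries: `minimum`/`maximum` for numeric (range) properties and `enum` for string (discrete) properties. It also skips objects and arrays when `minimal` is true, mirroring the `minimal` option of `PATCH /queryables`. The helper names `json_type`, `summarize_property`, and `build_queryables` and the exact schema layout are hypothetical, not this app's implementation.

```python
from typing import Any, Dict, List, Optional


def json_type(value: Any) -> str:
    """Map a value decoded from JSON to its JSON Schema type name."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def summarize_property(values: List[Any], minimal: bool = False) -> Optional[Dict[str, Any]]:
    """Build a queryable schema fragment for one item property (hypothetical helper)."""
    present = [v for v in values if v is not None]
    types = {json_type(v) for v in present}
    if len(types) != 1:
        return None  # no values or mixed types: skipped in this sketch
    kind = types.pop()
    if kind == "number":
        # range property: report its bounds
        return {"type": "number", "minimum": min(present), "maximum": max(present)}
    if kind == "string":
        # discrete property: report the distinct values
        return {"type": "string", "enum": sorted(set(present))}
    if kind == "boolean":
        return {"type": "boolean"}
    # objects and arrays are not scalar, so "minimal" mode leaves them out
    return None if minimal else {"type": kind}


def build_queryables(items: List[Dict[str, Any]], minimal: bool = False) -> Dict[str, Any]:
    """Collect one schema per property name across the `properties` of STAC items."""
    values: Dict[str, List[Any]] = {}
    for item in items:
        for name, value in item.get("properties", {}).items():
            values.setdefault(name, []).append(value)
    properties = {}
    for name, vals in sorted(values.items()):
        schema = summarize_property(vals, minimal)
        if schema is not None:
            properties[name] = schema
    return {"type": "object", "properties": properties}
```

For two items with `eo:cloud_cover` values `12.5` and `3`, the sketch reports `minimum: 3` and `maximum: 12.5`; an array-valued property such as `tags` appears as `{"type": "array"}` only when `minimal` is false.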

> [!Note]
> Dates are formatted as RFC 3339 strings, and JSON Schema only supports minimum/maximum for numeric types, so minimum and maximum dates are provided both as epoch seconds (in the "minimum" and "maximum" fields) and as RFC 3339 strings (in the "description" field).

This also adds a helper route, `PATCH /queryables`, which updates the
queryables stored in the database with up-to-date information from all items stored
in the database.

We recommend that you update the queryables after you add/remove/update any items in the database.
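
The date handling described in the note above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, timestamps without an offset are treated as UTC, and the exact wording of the `description` field is invented for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2020-01-01T00:00:00Z' into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so replace it with an explicit UTC offset first
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return parsed


def datetime_queryable(values: List[str]) -> Dict[str, Any]:
    """Summarize RFC 3339 strings as epoch-second bounds plus a readable description."""
    parsed = [parse_rfc3339(v) for v in values]
    earliest, latest = min(parsed), max(parsed)
    return {
        "type": "string",
        "format": "date-time",
        # JSON Schema minimum/maximum only apply to numbers, hence epoch seconds
        "minimum": earliest.timestamp(),
        "maximum": latest.timestamp(),
        # the human-readable bounds go in the description as RFC 3339 strings
        "description": f"minimum: {earliest.isoformat()}, maximum: {latest.isoformat()}",
    }
```

Comparing parsed `datetime` objects rather than raw strings keeps the bounds correct even when items use different UTC offsets.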

Custom queryables are enabled by default. To disable this feature and only use the
queryables provided by [stac-fastapi-pgstac](https://github.com/stac-utils/stac-fastapi-pgstac), set the `STAC_DEFAULT_QUERYABLES` environment variable to `1`.

```sh
export STAC_DEFAULT_QUERYABLES=1
```

#### Custom Collection Summaries

STAC strongly recommends that collections provide [summaries](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#summaries) and [extents](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#extents) of the items they contain. These include the temporal and spatial extents of the whole collection as well as the minimums and maximums for range properties and enum values for discrete properties of items.

These values are not updated automatically, so this implementation adds postgres functions that keep collection summaries and extents up to date.

This also adds a helper route, `PATCH /summaries`, which updates the
collection summaries and extents stored in the database with up-to-date information from all items stored
in the database.
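
Both helper routes can be called from a script after items are added, removed, or updated. The following is a minimal standard-library sketch; it assumes that `minimal` is passed as a query string parameter (for example `?minimal=true`), and the function names `refresh_request` and `refresh_all` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def refresh_request(base_url: str, route: str, minimal: Optional[bool] = None) -> Request:
    """Build a PATCH request for one of the helper routes ('queryables' or 'summaries')."""
    if route not in ("queryables", "summaries"):
        raise ValueError(f"unknown route: {route}")
    url = f"{base_url.rstrip('/')}/{route}"
    if minimal is not None:
        # assumption: `minimal` is passed as a query string parameter
        url += "?" + urlencode({"minimal": str(minimal).lower()})
    return Request(url, method="PATCH")


def refresh_all(base_url: str, minimal: bool = False) -> None:
    """Refresh cached queryables, then collection summaries and extents (needs a running server)."""
    for route, flag in (("queryables", minimal), ("summaries", None)):
        with urlopen(refresh_request(base_url, route, flag)) as response:
            print(f"PATCH /{route}: HTTP {response.status}")
```

For example, `refresh_all("http://localhost:8000")` targets the port used by the `uvicorn` command in the Dockerfile. If a router prefix is configured, it has to be part of `base_url`.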

> [!Note]
> These functions only update the first extent value, which defines the extent of the whole collection; additional extents that describe subsets of the collection are not modified.

Custom summaries are enabled by default. To disable this feature, set the `STAC_DEFAULT_SUMMARIES` environment variable
to `1`:

```sh
export STAC_DEFAULT_SUMMARIES=1
```
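
The "first extent only" rule from the note above can be sketched in Python as follows. This illustration is not the postgres implementation. It assumes 2D bounding boxes that do not cross the antimeridian, and it assumes that every item has a non-null `datetime` written in UTC with a `Z` suffix at the same precision, so that plain string sorting matches chronological order. `update_first_extent` is a hypothetical name.

```python
from typing import Any, Dict, List


def union_bbox(bboxes: List[List[float]]) -> List[float]:
    """Smallest 2D bbox [west, south, east, north] that covers every input bbox."""
    # simplification: assumes 2D bboxes that do not cross the antimeridian
    return [
        min(b[0] for b in bboxes),
        min(b[1] for b in bboxes),
        max(b[2] for b in bboxes),
        max(b[3] for b in bboxes),
    ]


def update_first_extent(collection: Dict[str, Any], items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Recompute only the first bbox and interval (the whole collection), in place."""
    bbox = union_bbox([item["bbox"] for item in items])
    # assumption: non-null UTC 'Z' datetimes of equal precision, so string order is time order
    times = sorted(item["properties"]["datetime"] for item in items)
    extent = collection["extent"]
    extent["spatial"]["bbox"][0] = bbox
    extent["temporal"]["interval"][0] = [times[0], times[-1]]
    # any further bbox/interval entries (subsets of the collection) are left untouched
    return collection
```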

#### Settable Router Prefix

To set a custom router prefix, set the `ROUTER_PREFIX` environment variable.

For example, the following requests access the same route:

With no router prefix set:

```
GET /collections
```

With a custom router prefix set to `/my-prefix`:

```
GET /my-prefix/collections
```
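
Clients must include the prefix in every request path. The following hypothetical helper, `route_path`, shows one way to normalize a prefix with or without leading and trailing slashes. It assumes the prefix is read from `ROUTER_PREFIX` and is empty when that variable is unset, and it does not describe how this app registers its routes internally.

```python
import os
from typing import Optional


def route_path(path: str, prefix: Optional[str] = None) -> str:
    """Return the public path of a route, e.g. '/collections' -> '/my-prefix/collections'."""
    if prefix is None:
        # assumption: the prefix comes from ROUTER_PREFIX and is empty when unset
        prefix = os.environ.get("ROUTER_PREFIX", "")
    prefix = prefix.strip("/")
    return ("/" + prefix if prefix else "") + "/" + path.lstrip("/")
```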

#### Settable OpenAPI paths

To set custom paths for the OpenAPI routes, set the following environment variables:

- `OPENAPI_URL`
- default: `/api`
- returns a description of this API in JSON format
- `DOCS_URL`
- default: `/api.html`
- returns a description of this API in HTML format
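
The defaults listed above can be resolved with a small helper such as the hypothetical `openapi_settings` below. FastAPI's `FastAPI` constructor accepts `openapi_url` and `docs_url` keyword arguments, so a dictionary like this could be passed as `FastAPI(**openapi_settings())`. Whether this app wires the values exactly this way is an assumption.

```python
import os
from typing import Dict, Mapping, Optional


def openapi_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve the OpenAPI paths, falling back to this README's documented defaults."""
    env = os.environ if env is None else env
    return {
        "openapi_url": env.get("OPENAPI_URL", "/api"),  # JSON description of the API
        "docs_url": env.get("DOCS_URL", "/api.html"),  # HTML description of the API
    }
```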

> [!NOTE]
> Other environment variables can be used to configure additional settings, as described in the [FastAPI documentation](https://fastapi.tiangolo.com/advanced/settings/#settings-and-environment-variables) and the
> [STAC-FastAPI documentation](https://stac-utils.github.io/stac-fastapi/tips-and-tricks/#set-api-title-description-and-version).

## Contributing

Install the pre-commit hooks so that your code changes are checked against
the expected style for this project:

```
pip install pre-commit
pre-commit install
```

## Releasing

Before making a new release:

```
pip install bump2version
bump2version <part>
```

Where `<part>` is one of `major`, `minor`, or `patch`, depending on which part of the version number should be incremented.
This project uses [semantic versioning](https://semver.org/).
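
The following sketch shows the plain `MAJOR.MINOR.PATCH` arithmetic that a `bump2version <part>` call applies to `current_version` in `.bumpversion.cfg`, where lower parts reset to zero. It ignores pre-release and build suffixes and edits no files. The real tool also rewrites the `[Unreleased]` section of `CHANGES.md` and commits the change (`commit = True`, `tag = False`). `bump` is a hypothetical name.

```python
def bump(version: str, part: str) -> str:
    """Return the next semantic version after incrementing `part` ('major', 'minor' or 'patch')."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # lower parts reset to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")
```

For example, starting from `current_version = 1.0.0`, `bump("1.0.0", "minor")` gives `1.1.0`.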