
Add dockerized development environment; pypgstac testing; CI - fix issue with ingest #18

Merged
merged 17 commits into main from tests/rde/ingest-issue
Jul 19, 2021

Conversation

lossyrob
Member

Related Issues: #6

This PR adds a dockerized development environment that allows consistent testing across dev environments.

This follows the "Scripts To Rule Them All" pattern for scripts that control the development environment. It makes it very easy for developers to clone the repository and get started working with the project.

A docker-compose file defines the development database (which uses the existing Dockerfile in the repo) and a dev container (which uses a new Dockerfile.dev), and all scripts run through those docker containers.
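A hedged sketch of that layout might look like the following (service names, the mount path, and credentials are illustrative assumptions, not the actual file):

```yaml
# Hypothetical docker-compose.yml sketch of the layout described above.
version: "3"
services:
  database:
    build:
      context: .
      dockerfile: Dockerfile          # existing pgstac database image
    environment:
      - POSTGRES_USER=username
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=postgis
    ports:
      - "5432:5432"
  dev:
    build:
      context: .
      dockerfile: Dockerfile.dev      # new dev container
    volumes:
      - .:/opt/src                    # mount source for live edits
    depends_on:
      - database
```

The scripts then target these services, e.g. something like `docker-compose run --rm dev pytest`.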

This also adds black as a formatter, mypy as a type checker, and flake8 as a linter to the python project. This set of dev tools is becoming standard across python projects in stac-utils.

The following modifications were made to pypgstac:

  • Added the "pgready" command. This command has been useful in downstream projects and I think it's a nice fit to include in pypgstac directly. It is used when bringing up the development environment to ensure the database is ready to accept connections before doing other things, like running migrations.
  • Added type information and refactored some of the code to make mypy happy. These type annotations and refactors shouldn't change behavior.
  • Fixed an issue with loading ndjson files of items. The fix was discovered by @bitner. I implemented a test to ensure test data loading works; it was failing prior to that fix.
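The pgready behavior described above amounts to a retry loop around a connection attempt. A minimal, hypothetical sketch of that logic (the function name and parameters are illustrative, not pypgstac's actual API):

```python
import time


def wait_for_db(connect, retries=5, delay=0.1):
    """Retry `connect` until it succeeds or retries are exhausted.

    Hypothetical sketch of a pgready-style check: `connect` is any
    callable that raises while the database is not yet accepting
    connections (e.g. a psycopg/asyncpg connection attempt).
    """
    for _ in range(retries):
        try:
            connect()
            return True
        except Exception:
            time.sleep(delay)
    return False
```

In a dev-environment script, a check like this would run before migrations, so the database has time to come up inside its container.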

There are no database schema updates, but I did change pgstac.sql to run sql/999_version.sql against new databases. Otherwise there's no entry in the migration table, so a pypgstac migrate command will assume it's a new database and fail on creation of tables. I also fixed that SQL to include pypgstac in the SEARCH_PATH.
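The failure mode described above comes down to how the migration plan is chosen from the recorded version. A hypothetical sketch of that decision (function and file names are illustrative, not pypgstac's real code):

```python
def plan_migration(recorded_version, target_version):
    """Decide which SQL file to run based on the migration table.

    Hypothetical sketch: if the migration table has no entry, the
    database looks brand-new and the full base SQL is run -- which
    fails if the tables already exist. That is why 999_version.sql
    must be applied to new databases, so an entry is always recorded.
    """
    if recorded_version is None:
        return "pgstac.sql"  # treated as a fresh database
    if recorded_version == target_version:
        return None  # already up to date
    return f"pgstac.{recorded_version}-{target_version}.sql"
```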

GitHub Workflows are created to perform CI on pull requests and on the main branch, and to publish pypgstac to PyPI on tags. This will allow us to publish only via tagging releases.
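A publish-on-tag workflow of the kind described might look roughly like this (the file name, job layout, build commands, and secret name are assumptions, not the actual workflow):

```yaml
# .github/workflows/ci.yml -- hypothetical sketch, not the actual workflow
name: CI
on:
  push:
    branches: [main]
    tags: ["v*"]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: scripts/setup   # build the dockerized dev environment
      - run: scripts/test    # run the test suite in the dev container
  publish:
    needs: test
    if: startsWith(github.ref, 'refs/tags/')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: pip install build twine && python -m build && twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
```

The `if` condition on the publish job is what restricts releases to tag pushes only.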

I was unable to get test/test.sh working - it seems like there are some missing files (e.g. test/test.sql) that perhaps were not committed.

I didn't touch the Makefile, but I suggest removing it after this PR goes in. The only thing not covered by the new scripts and CI is publishing pgstac images to Docker Hub, which should be added in a follow-up PR.

Testing:

  • Clone a fresh repo
  • Run scripts/setup to build a new docker dev environment
  • Run scripts/test to run all the tests
  • Run scripts/console to drop into a dev console, which will allow you to run pypgstac against the development db from the command line
  • Run scripts/console --db to drop into a psql interactive session to inspect the database
  • Run scripts/format to format the python code
  • Run scripts/migrate to migrate the db (which should be a no-op)
  • Run scripts/update to rebuild the docker containers.

@lossyrob lossyrob requested a review from bitner July 16, 2021 00:23
@bitner
Collaborator

bitner commented Jul 16, 2021

@lossyrob thanks for pulling this all together. I definitely like this approach.

One thing that I don't have a strong opinion on: I had been using Poetry to build and publish to PyPI. In the Makefile, the setup.py was actually generated from the Poetry build artifacts in the make command. If we stick with Poetry, we should add the dev requirements to the pyproject.toml rather than a requirements-dev.txt, and we should use Poetry for the build/publishing steps. If we don't want to use Poetry, we should get rid of the pyproject.toml and just use setup.py + requirements files.

I was also doing a couple steps in the Makefile to set the version in a few places and to concatenate the sql scripts into the pgstac-x.x.x.sql file that gets distributed with pypgstac. Not sure I was handling that in the best way, but we should make sure that we are getting the version number set appropriately in all the right places.
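The concatenation step described here could live in a small script. A hypothetical sketch of what it might do (paths, file naming, and the function itself are assumptions, not the existing Makefile logic):

```python
from pathlib import Path


def build_migration(sql_dir, version, out_dir):
    """Concatenate the numbered sql/ files into a single
    pgstac-<version>.sql distribution file (hypothetical sketch).

    Relies on the numeric prefixes (001_, ..., 999_) so that a
    lexicographic sort yields the intended execution order.
    """
    parts = sorted(Path(sql_dir).glob("*.sql"))
    out = Path(out_dir) / f"pgstac-{version}.sql"
    out.write_text("\n".join(p.read_text() for p in parts))
    return out
```

A script like this would also be the natural place to stamp the version number into the files that need it.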

This causes the required egg-info to be created in the source directory, which is mounted in when running the dev container and can cause issues. Because the source is on the PYTHONPATH, any edits mounted into the container will be used over the installed source.
@lossyrob
Member Author

> One thing that I don't have a strong opinion on. I had been using Poetry to build & publish to PyPi. In the Makefile the setup.py was actually generated from the Poetry build artifacts in the make command. If we stick with Poetry, we should add the dev requirements to the pyproject.toml rather than as a requirements-dev.txt and we should use Poetry for the build/publishing steps. If we don't want to use Poetry, we should get rid of the pyproject.toml and just use setup.py + requirements files.

I don't have experience with poetry, but am cool with learning it and integrating it if that's the preference. @geospatial-jeff had mentioned he had opinions on the use of poetry in stac-fastapi, would be curious of your thoughts on which direction to go here Jeff.

> I was also doing a couple steps in the Makefile to set the version in a few places and to concatenate the sql scripts into the pgstac-x.x.x.sql file that gets distributed with pypgstac. Not sure I was handling that in the best way, but we should make sure that we are getting the version number set appropriately in all the right places.

Makes sense - perhaps we can either keep the Makefile or put this logic into a script, and then have these generated files committed as part of a PR during the release process? Usually right before a release there's a version change and a CHANGELOG turnover PR that's the last step before a release (e.g. stac-utils/pystac#532), so we could have those files generated as part of a release PR like that?

@bitner
Collaborator

bitner commented Jul 16, 2021

> I was also doing a couple steps in the Makefile to set the version in a few places and to concatenate the sql scripts into the pgstac-x.x.x.sql file that gets distributed with pypgstac. Not sure I was handling that in the best way, but we should make sure that we are getting the version number set appropriately in all the right places.
>
> Makes sense - perhaps we can either keep the Makefile or put this logic into a script, and then have these generated files committed as part of a PR during the release process? Usually right before a release there's a version change and a CHANGELOG turnover PR that's the last step before a release (e.g. stac-utils/pystac#532), so we could have those files generated as part of a release PR like that?

For consistency, I'd be inclined to add to the scripts. I can do that as a follow on PR and then once we get that and the docker push in, we can get rid of the makefile.

Collaborator

@bitner bitner left a comment


There are some follow-on tasks that we can address now or in future PRs, but this is definitely a great step forward for manageability.

Things we should follow up on:

  • Docker push CI
  • Version tagging script / process documentation, including a checklist to create version-to-version migrations and the changelog
  • Decision and cleanup on whether to use Poetry
  • Get pgTAP SQL tests running/passing in CI

@geospatial-jeff

Having a way to lock down dependencies (poetry/pipenv) beyond what vanilla pip provides is really nice for dependency management. Ever since using poetry/pipenv I've never really had the "it works on my machine" sort of problem that you can run into with vanilla pip. The biggest difference is that pipenv is slow to lock but fast to install while poetry is fast to lock but slow to install. I think poetry is a little more ergonomic as well.

I'd recommend poetry over pipenv for this project.

@kylebarron
Contributor

> I'd recommend poetry over pipenv for this project.

Just a +1: I've had a good experience with Poetry lately. The next version of Poetry (or the 1.2.0 alpha) will make local development easier with support for editable installs like pip install -e .: python-poetry/poetry#3940

@bitner
Collaborator

bitner commented Jul 19, 2021

> I'd recommend poetry over pipenv for this project.
>
> Just +1 I've had a good experience with Poetry lately. The next version of Poetry (or the 1.2.0 alpha) will make local development easier with support for editable installs like pip install -e .: python-poetry/poetry#3940

Oh, that's awesome! That's why the Makefile had a weird step where it extracted the setup.py from the build.

@lossyrob
Member Author

Made follow-up issues, merging.

@lossyrob lossyrob merged commit 16c222c into main Jul 19, 2021
@lossyrob lossyrob deleted the tests/rde/ingest-issue branch July 19, 2021 13:54