
Add dockerized development environment; pypgstac testing; CI - fix issue with ingest #18

Merged
merged 17 commits into main from tests/rde/ingest-issue
Jul 19, 2021

Conversation

lossyrob
Member

Related Issues: #6

This PR adds a dockerized development environment that allows consistent testing across dev environments.

This follows the "Scripts To Rule Them All" pattern for scripts that control the development environment. It makes it very easy for developers to clone the repository and get started working with the project.

A docker-compose file defines the development database (which uses the existing Dockerfile in the repo) and a dev container (which uses a new Dockerfile.dev), and all scripts run through those docker containers.
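A hedged sketch of that layout might look like the following (service names, the mount path, and credentials are illustrative assumptions, not the actual file):

```yaml
# Hypothetical docker-compose.yml sketch of the layout described above.
version: "3"
services:
  database:
    build:
      context: .
      dockerfile: Dockerfile          # existing pgstac database image
    environment:
      - POSTGRES_USER=username
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=postgis
    ports:
      - "5432:5432"
  dev:
    build:
      context: .
      dockerfile: Dockerfile.dev      # new dev container
    volumes:
      - .:/opt/src                    # mount source for live edits
    depends_on:
      - database
```

The scripts then target these services, e.g. something like `docker-compose run --rm dev pytest`.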

This also adds black as a formatter, mypy as a type checker, and flake8 as a linter to the python project. This set of dev tools is becoming standard across python projects in stac-utils.

The following modifications were made to pypgstac:

  • Added the "pgready" command. This command has been useful in downstream projects and I think it's a nice fit to include in pypgstac directly. It is used when bringing up the development environment to ensure the database is ready to accept connections before doing other things, like running migrations.
  • Added type information and refactored some of the code to make mypy happy. These type annotations and refactors shouldn't change behavior.
  • Fixed an issue with loading ndjson files of items. The fix was discovered by @bitner. I implemented a test to ensure test data loading works; it was failing prior to that fix.
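The pgready behavior described above amounts to a retry loop around a connection attempt. A minimal, hypothetical sketch of that logic (the function name and parameters are illustrative, not pypgstac's actual API):

```python
import time


def wait_for_db(connect, retries=5, delay=0.1):
    """Retry `connect` until it succeeds or retries are exhausted.

    Hypothetical sketch of a pgready-style check: `connect` is any
    callable that raises while the database is not yet accepting
    connections (e.g. a psycopg/asyncpg connection attempt).
    """
    for _ in range(retries):
        try:
            connect()
            return True
        except Exception:
            time.sleep(delay)
    return False
```

In a dev-environment script, a check like this would run before migrations, so the database has time to come up inside its container.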

There are no database schema updates, but I did change pgstac.sql to run sql/999_version.sql against new databases. Otherwise there's no entry in the migration table, so a pypgstac migrate command will assume it's a new database and fail on creation of tables. I also fixed that SQL to include pypgstac in the SEARCH_PATH.
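The failure mode described above comes down to how the migration plan is chosen from the recorded version. A hypothetical sketch of that decision (function and file names are illustrative, not pypgstac's real code):

```python
def plan_migration(recorded_version, target_version):
    """Decide which SQL file to run based on the migration table.

    Hypothetical sketch: if the migration table has no entry, the
    database looks brand-new and the full base SQL is run -- which
    fails if the tables already exist. That is why 999_version.sql
    must be applied to new databases, so an entry is always recorded.
    """
    if recorded_version is None:
        return "pgstac.sql"  # treated as a fresh database
    if recorded_version == target_version:
        return None  # already up to date
    return f"pgstac.{recorded_version}-{target_version}.sql"
```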

GitHub Workflows are created to perform CI on pull requests and on the main branch, and to publish pypgstac to PyPI on tags. This will allow us to publish only via tagging releases.
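A publish-on-tag workflow of the kind described might look roughly like this (the file name, job layout, build commands, and secret name are assumptions, not the actual workflow):

```yaml
# .github/workflows/ci.yml -- hypothetical sketch, not the actual workflow
name: CI
on:
  push:
    branches: [main]
    tags: ["v*"]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: scripts/setup   # build the dockerized dev environment
      - run: scripts/test    # run the test suite in the dev container
  publish:
    needs: test
    if: startsWith(github.ref, 'refs/tags/')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: pip install build twine && python -m build && twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
```

The `if` condition on the publish job is what restricts releases to tag pushes only.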

I was unable to get test/test.sh working - it seems like there are some missing files (e.g. test/test.sql) that perhaps were not committed.

I didn't touch the Makefile, but I suggest removing it after this PR goes in. The only thing not covered by the new scripts and CI is publishing pgstac images to Docker Hub, which should be added in a follow-up PR.

Testing:

  • Clone a fresh repo
  • Run scripts/setup to build a new docker dev environment
  • Run scripts/test to run all the tests
  • Run scripts/console to drop into a dev console, which will allow you to run pypgstac against the development db from the command line
  • Run scripts/console --db to drop into a psql interactive session to inspect the database
  • Run scripts/format to format the python code
  • Run scripts/migrate to migrate the db (which should be a no-op)
  • Run scripts/update to rebuild the docker containers.

@lossyrob lossyrob requested a review from bitner July 16, 2021 00:23
@bitner
Collaborator

bitner commented Jul 16, 2021

@lossyrob thanks for pulling this all together. I definitely like this approach.

One thing that I don't have a strong opinion on: I had been using Poetry to build and publish to PyPI. In the Makefile, the setup.py was actually generated from the Poetry build artifacts in the make command. If we stick with Poetry, we should add the dev requirements to the pyproject.toml rather than a requirements-dev.txt, and we should use Poetry for the build/publishing steps. If we don't want to use Poetry, we should get rid of the pyproject.toml and just use setup.py + requirements files.

I was also doing a couple steps in the Makefile to set the version in a few places and to concatenate the sql scripts into the pgstac-x.x.x.sql file that gets distributed with pypgstac. Not sure I was handling that in the best way, but we should make sure that we are getting the version number set appropriately in all the right places.
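The concatenation step described here could live in a small script. A hypothetical sketch of what it might do (paths, file naming, and the function itself are assumptions, not the existing Makefile logic):

```python
from pathlib import Path


def build_migration(sql_dir, version, out_dir):
    """Concatenate the numbered sql/ files into a single
    pgstac-<version>.sql distribution file (hypothetical sketch).

    Relies on the numeric prefixes (001_, ..., 999_) so that a
    lexicographic sort yields the intended execution order.
    """
    parts = sorted(Path(sql_dir).glob("*.sql"))
    out = Path(out_dir) / f"pgstac-{version}.sql"
    out.write_text("\n".join(p.read_text() for p in parts))
    return out
```

A script like this would also be the natural place to stamp the version number into the files that need it.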

This causes the required egg-info to be created in the source directory, which is mounted in when running the dev container and can cause issues. Because the source is on the PYTHONPATH, any edits mounted into the container will be used over the installed source.
@lossyrob
Member Author

> One thing that I don't have a strong opinion on. I had been using Poetry to build & publish to PyPi. In the Makefile the setup.py was actually generated from the Poetry build artifacts in the make command. If we stick with Poetry, we should add the dev requirements to the pyproject.toml rather than as a requirements-dev.txt and we should use Poetry for the build/publishing steps. If we don't want to use Poetry, we should get rid of the pyproject.toml and just use setup.py + requirements files.

I don't have experience with poetry, but am cool with learning it and integrating it if that's the preference. @geospatial-jeff had mentioned he had opinions on the use of poetry in stac-fastapi, would be curious of your thoughts on which direction to go here Jeff.

> I was also doing a couple steps in the Makefile to set the version in a few places and to concatenate the sql scripts into the pgstac-x.x.x.sql file that gets distributed with pypgstac. Not sure I was handling that in the best way, but we should make sure that we are getting the version number set appropriately in all the right places.

Makes sense - perhaps we can either keep the Makefile or put this logic into a script, and then have these generated files committed as part of a PR during the release process? Usually right before a release there's a version change and a CHANGELOG turnover PR that's the last step before a release (e.g. stac-utils/pystac#532), so we could have those files generated as part of a release PR like that?

@bitner
Collaborator

bitner commented Jul 16, 2021

> I was also doing a couple steps in the Makefile to set the version in a few places and to concatenate the sql scripts into the pgstac-x.x.x.sql file that gets distributed with pypgstac. Not sure I was handling that in the best way, but we should make sure that we are getting the version number set appropriately in all the right places.
>
> Makes sense - perhaps we can either keep the Makefile or put this logic into a script, and then have these generated files committed as part of a PR during the release process? Usually right before a release there's a version change and a CHANGELOG turnover PR that's the last step before a release (e.g. stac-utils/pystac#532), so we could have those files generated as part of a release PR like that?

For consistency, I'd be inclined to add to the scripts. I can do that as a follow on PR and then once we get that and the docker push in, we can get rid of the makefile.

Collaborator

@bitner bitner left a comment


There are some follow-on tasks that we can address now or in future PRs, but this is definitely a great step forward for manageability.

Things we should follow up on:

  • Docker push CI
  • Version tagging script / process documentation, including a checklist to create version-to-version migrations and the changelog
  • Decision and cleanup on whether to use Poetry
  • Get pgTAP SQL tests running/passing in CI

@geospatial-jeff

Having a way to lock down dependencies (poetry/pipenv) beyond what vanilla pip provides is really nice for dependency management. Ever since using poetry/pipenv I've never really had the "it works on my machine" sort of problem that you can run into with vanilla pip. The biggest difference is that pipenv is slow to lock but fast to install while poetry is fast to lock but slow to install. I think poetry is a little more ergonomic as well.

I'd recommend poetry over pipenv for this project.

@kylebarron
Contributor

> I'd recommend poetry over pipenv for this project.

Just a +1: I've had a good experience with Poetry lately. The next version of Poetry (or the 1.2.0 alpha) will make local development easier with support for editable installs like pip install -e .: python-poetry/poetry#3940

@bitner
Collaborator

bitner commented Jul 19, 2021

> I'd recommend poetry over pipenv for this project.
>
> Just +1 I've had a good experience with Poetry lately. The next version of Poetry (or the 1.2.0 alpha) will make local development easier with support for editable installs like pip install -e .: python-poetry/poetry#3940

Oh, that's awesome! That's why the Makefile had a weird step where it extracted the setup.py from the build.

@lossyrob
Member Author

Made follow-up issues, merging.

@lossyrob lossyrob merged commit 16c222c into main Jul 19, 2021
@lossyrob lossyrob deleted the tests/rde/ingest-issue branch July 19, 2021 13:54