From 833a8f860e620c3f5b8c7eadf7d0b11763d6b22b Mon Sep 17 00:00:00 2001
From: lcard
Date: Wed, 6 Sep 2023 15:42:07 +0100
Subject: [PATCH] Add changelog and migration instructions

---
 Makefile          |  6 ++++
 api/batect.yml    |  6 ++++
 docs/changelog.md | 26 ++++++++++++++++
 docs/migration.md | 78 +++++++++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml        |  1 +
 5 files changed, 117 insertions(+)
 create mode 100644 docs/migration.md

diff --git a/Makefile b/Makefile
index 5aaf084..90d1778 100644
--- a/Makefile
+++ b/Makefile
@@ -187,3 +187,9 @@ release:
 	@python get_latest_release_changelog.py
 	@gh release create ${version} -F latest_release_changelog.md
 	@rm -rf latest_release_changelog.md
+
+
+# Migration --------------------
+##
+migrate-v7:		## Run the migration
+	@cd api/; ./batect migrate-v7 -- "--layer ${layer} --all-layers ${all-layers}"
diff --git a/api/batect.yml b/api/batect.yml
index 68fd7f0..ddc5772 100755
--- a/api/batect.yml
+++ b/api/batect.yml
@@ -121,3 +121,9 @@ tasks:
     run:
       container: service-image
       command: 'python get_latest_release_changelog.py'
+
+  migrate-v7:
+    description: Run the rAPId migration script
+    run:
+      container: service-image
+      command: 'python migrations/scripts/v7_layer_migration.py'
diff --git a/docs/changelog.md b/docs/changelog.md
index e69de29..946df62 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -0,0 +1,26 @@
+# Changelog
+
+## v7.0.0 - _2023-09-06_
+
+See [v7.0.0] changes.
+
+### Major Changes
+
+- Layers have been introduced to rAPId. These are now the highest level of grouping for your data. They allow you to separate your data into areas that relate to the layers in your data architecture, e.g. `raw`, `curated`, `presentation`. You will need to specify your layers when you create or migrate a rAPId instance.
+- All the code is now in this monorepo. The previous [Infrastructure](https://github.com/no10ds/rapid-infrastructure), [UI](https://github.com/no10ds/rapid-ui) and [API](https://github.com/no10ds/rapid-api) repos are now deprecated. This will ease the use and development of rAPId.
+- Schemas are now stored in DynamoDB, rather than S3. This offers speed and usability improvements, as well as making rAPId easier to extend.
+- Code efficiency improvements. There were several areas in rAPId where we were executing costly operations that caused performance to degrade at scale. We've fixed these inefficiencies, taking us from O(n²) -> O(n) in these areas.
+- Glue Crawlers have been removed, with Athena tables now created directly by the API instead. Data is now available to query immediately after it is uploaded, rather than after the previous wait (approximately 3 mins) while crawlers ran. It also offers scalability benefits because without crawlers we are not dependent on the number of free IPs within the subnet.
+- Improved UI testing with Playwright.
+
+### Breaking Changes
+
+- All dataset endpoints will be prefixed with `layer`, typically going from `domain/dataset` to `layer/domain/dataset`.
+- All SDK functions that interact with datasets will now require an argument for layer.
+
+### Migration
+
+- See the [migration doc](migration.md) for details on how to migrate to v7 from v6.
+
+[Unreleased changes]: https://github.com/no10ds/rapid/compare/v7.0.0...HEAD
+[v7.0.0]: https://github.com/no10ds/rapid/v7.0.0
diff --git a/docs/migration.md b/docs/migration.md
new file mode 100644
index 0000000..9b8b5f6
--- /dev/null
+++ b/docs/migration.md
@@ -0,0 +1,78 @@
+# Migration
+
+## Migrating to v7 from v6
+
+All of the datasets need to be moved to a layer as part of the v7 migration.
+
+The migration script carries this out, along with other operations.
+
+To execute it, you'll need to decide:
+
+1. Which layer the existing datasets should be moved to.
+2. What the full complement of layers in your rAPId instance should be.
+
+### Prerequisites
+
+#### Infrastructure changes
+
+The v7.0.0 infrastructure changes need to be applied to your rAPId instance.
+
+Update the version of the rAPId terraform module that you are using and apply the terraform.
+
+#### Local requirements
+
+You will need the ability to run `Batect`, the requirements for which are listed [here](https://batect.dev/docs/getting-started/requirements/).
+
+### Steps:
+
+#### Clone the repo
+
+To do this, run:
+
+`git clone -b v7.0.0 git@github.com:no10ds/rapid.git`
+
+#### Set your environment variables
+
+Within the rAPId repo, set the following variables in the `.env` file to match those of your rAPId instance and AWS account:
+
+```
+# rAPId instance variables
+- AWS_REGION=
+- DATA_BUCKET=
+- RESOURCE_PREFIX=
+
+# AWS environment variables
+- AWS_ACCESS_KEY_ID=
+- AWS_SECRET_ACCESS_KEY=
+- AWS_SESSION_TOKEN=
+```
+
+#### Run the migration script
+
+You can now run the script and specify your layer configuration. Two examples are below:
+
+##### Example 1:
+
+You do not wish to use the layer functionality:
+
+- The existing datasets can be moved to a `default` layer.
+- The full complement of layers can just consist of one, called `default`.
+
+To do this, you would run:
+
+```
+make migrate-v7 layer=default all-layers=default
+```
+
+##### Example 2:
+
+You wish to use the layer functionality and largely have raw data already in your rAPId instance:
+
+- The existing datasets can be moved to a `raw` layer.
+- The full complement of layers in your rAPId instance can mirror your architecture and be: `raw`, `curated` and `presentation`.
+
+To do this, you would run:
+
+```
+make migrate-v7 layer=raw all-layers=raw,curated,presentation
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index e538102..6733073 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -47,6 +47,7 @@ nav:
       - Patterns:
         - sdk/api/patterns/data.md
   - Releases: changelog.md
+  - Migration: migration.md
   - Contributing: contributing.md
 
 plugins: