Merge pull request #33 from no10ds/feature/changelog
Feature/changelog
TobyDrane authored Sep 6, 2023
2 parents 17869ad + 833a8f8 commit 5f5b155
Showing 5 changed files with 117 additions and 0 deletions.
6 changes: 6 additions & 0 deletions Makefile
@@ -187,3 +187,9 @@ release:
	@python get_latest_release_changelog.py
	@gh release create ${version} -F latest_release_changelog.md
	@rm -rf latest_release_changelog.md


# Migration --------------------
##
migrate-v7: ## Run the migration
	@cd api/; ./batect migrate-v7 -- "--layer ${layer} --all-layers ${all-layers}"
6 changes: 6 additions & 0 deletions api/batect.yml
@@ -121,3 +121,9 @@ tasks:
    run:
      container: service-image
      command: 'python get_latest_release_changelog.py'

  migrate-v7:
    description: Run the rAPId migration script
    run:
      container: service-image
      command: 'python migrations/scripts/v7_layer_migration.py'
26 changes: 26 additions & 0 deletions docs/changelog.md
@@ -0,0 +1,26 @@
# Changelog

## v7.0.0 - _2023-09-06_

See [v7.0.0] changes.

### Major Changes

- Layers have been introduced to rAPId. These are now the highest level of grouping for your data. They allow you to separate your data into areas that relate to the layers in your data architecture, e.g. `raw`, `curated`, `presentation`. You will need to specify your layers when you create or migrate a rAPId instance.
- All the code is now in this monorepo. The previous [Infrastructure](https://github.com/no10ds/rapid-infrastructure), [UI](https://github.com/no10ds/rapid-ui) and [API](https://github.com/no10ds/rapid-api) repos are now deprecated. This will ease the use and development of rAPId.
- Schemas are now stored in DynamoDB, rather than S3. This offers speed and usability improvements, as well as making rAPId easier to extend.
- Code efficiency improvements. There were several areas in rAPId where we were executing costly operations that caused performance to degrade at scale. We've fixed these inefficiencies, taking us from O(n²) -> O(n) in these areas.
- Glue Crawlers have been removed; Athena tables are now created directly by the API instead. Data is available to query immediately after it is uploaded, rather than after the previous wait (approximately 3 minutes) while crawlers ran. This also offers scalability benefits, because without crawlers we are no longer dependent on the number of free IPs within the subnet.
- Improved UI testing with Playwright.

### Breaking Changes

- All dataset endpoints are now prefixed with the layer, typically going from `domain/dataset` to `layer/domain/dataset`.
- All SDK functions that interact with datasets now require an argument for the layer.
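
As a rough illustration of the endpoint change, client code that built `domain/dataset` paths for v6 might be adapted along the following lines. The helper name `to_v7_dataset_path` is hypothetical and is not part of the rAPId SDK; it only sketches the path change described above.

```python
def to_v7_dataset_path(v6_path: str, layer: str) -> str:
    """Prefix a v6 `domain/dataset` path with a layer (hypothetical helper)."""
    parts = v6_path.strip("/").split("/")
    # A v6 dataset path is assumed to have exactly two non-empty segments.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'domain/dataset', got {v6_path!r}")
    if not layer or "/" in layer:
        raise ValueError(f"invalid layer name: {layer!r}")
    return "/".join([layer, *parts])
```

For example, `to_v7_dataset_path("sales/orders", "raw")` gives `raw/sales/orders`, where `sales` and `orders` are placeholder names.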

### Migration

- See the [migration doc](migration.md) for details on how to migrate to v7 from v6.

[Unreleased changes]: https://github.com/no10ds/rapid/compare/v7.0.0...HEAD
[v7.0.0]: https://github.com/no10ds/rapid/v7.0.0
78 changes: 78 additions & 0 deletions docs/migration.md
@@ -0,0 +1,78 @@
# Migration

## Migrating to v7 from v6

All of the datasets need to be moved to a layer as part of the v7 migration.

The migration script carries this out, along with other operations.

To execute it, you'll need to decide:

1. Which layer the existing datasets should be moved to.
2. What the full complement of layers in your rAPId instance should be.

### Prerequisites

#### Infrastructure changes

The v7.0.0 infrastructure changes need to be applied to your rAPId instance.

Update the version of the rAPId terraform module that you are using and apply the terraform.

#### Local requirements

You will need the ability to run `Batect`, the requirements for which are listed [here](https://batect.dev/docs/getting-started/requirements/).

### Steps:

#### Clone the repo

To do this, run:

`git clone -b v7.0.0 git@github.com:no10ds/rapid.git`

#### Set your environment variables

Within the rAPId repo, set the following variables in the `.env` file to match those of your rAPId instance and AWS account:

```
# rAPId instance variables
- AWS_REGION=
- DATA_BUCKET=
- RESOURCE_PREFIX=
# AWS environment variables
- AWS_ACCESS_KEY_ID=
- AWS_SECRET_ACCESS_KEY=
- AWS_SESSION_TOKEN=
```
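
Before running the migration, it may help to confirm that none of these variables has been left blank. The sketch below is not part of rAPId; it assumes the `.env` file uses plain `KEY=value` lines, and the function name `missing_env_vars` is made up for illustration.

```python
# Variable names taken from the list above.
REQUIRED_VARS = (
    "AWS_REGION",
    "DATA_BUCKET",
    "RESOURCE_PREFIX",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def missing_env_vars(env_text: str) -> list:
    """Return the required variables that are absent or empty in the .env text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [name for name in REQUIRED_VARS if not values.get(name)]
```

One way to use it is `missing_env_vars(open(".env").read())`, fixing any names it returns before continuing.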

#### Run the migration script

You can now run the script, specifying your layer configuration. Two examples are given below:

##### Example 1:

You do not wish to use the layer functionality:

- The existing datasets can be moved to a `default` layer.
- The full complement of layers can just consist of one, called `default`.

To do this, you would run:

```
make migrate-v7 layer=default all-layers=default
```

##### Example 2:

You wish to use the layer functionality and largely have raw data already in your rAPId instance:

- The existing datasets can be moved to a `raw` layer.
- The full complement of layers in your rAPId instance can mirror your architecture and be: `raw`, `curated` and `presentation`.

To do this, you would run:

```
make migrate-v7 layer=raw all-layers=raw,curated,presentation
```
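
If you script the migration, the `make` invocation from the examples above could be assembled as in this sketch. The function `migrate_command` is hypothetical, and the check that the target layer appears in the full list of layers is an assumption made here, not a documented rule of the migration script.

```python
def migrate_command(layer: str, all_layers: list) -> list:
    """Build the `make migrate-v7` argument list (illustrative helper only)."""
    # Assumption: the layer receiving existing datasets should be one of the
    # layers configured for the instance.
    if layer not in all_layers:
        raise ValueError(f"{layer!r} is not in all-layers: {all_layers}")
    return [
        "make",
        "migrate-v7",
        f"layer={layer}",
        f"all-layers={','.join(all_layers)}",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the root of the repo.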
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -48,6 +48,7 @@ nav:
- Patterns:
- sdk/api/patterns/data.md
- Releases: changelog.md
- Migration: migration.md
- Contributing: contributing.md

plugins: