Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

History of applied migrations #179

Open
jabbors opened this issue Feb 26, 2019 · 24 comments
Open

History of applied migrations #179

jabbors opened this issue Feb 26, 2019 · 24 comments

Comments

@jabbors
Copy link

jabbors commented Feb 26, 2019

Problem
A use case that have come up with a customer would require us to keep a history of when migrations have been applied.

Solution
Most migration system I've seen include history of all applied migrations in the schema_migrations table. Instead of just containing the last applied version and dirty state I would propose that each migration applied would end up as a new row in the table, which includes the number, name and date applied. The table structure could be something like.

create table schema_migrations(
    id serial primary key not null,
    number int not null,
    name text not null,
    dirty bool not null default false,
    applied_at timestamp not null default now()
);

Which would yield something like this when migrations are applied

select * from schema_migrations;
 id | number |     name     | dirty |         applied_at
----+--------+--------------+-------+----------------------------
  1 |      1 | table bar    | f     | 2019-02-26 07:39:27.302288
  3 |      2 | table bar    | f     | 2019-02-26 07:40:21.939487
  4 |      3 | table foobar | t     | 2019-02-26 07:40:54.243851
@dhui
Copy link
Member

dhui commented Feb 26, 2019

A use case that have come up with a customer would require us to keep a history of when migrations have been applied.

Why do they want this?

Also see: #65

I'm hesitant to add this functionality to migrate unless there's a good reason since the required work is non-trivial. e.g. need to make a schema change to the schema version table for every DB driver and what dates should be used to backfill existing migrations?

It may be easier to add lifecycle hooks to migrate and have the consumer track the state for each migration...

@jabbors
Copy link
Author

jabbors commented Feb 27, 2019

It's not the customer that requires it explicitly, it's more of an internal compliance towards customers' requirements where a history would be useful.

It can be accomplished with SQL functions, e.g. a trigger to copy every add/update to a new table with a timestamp column. The important part is the number and date.

I wanted to bring up this question if other users would find it useful and since it hasn't been asked before. #65 is similar but not what I'm looking for.

The implementation details can be decided upon later if this issue is accepted and put on the roadmap. I understand that it might not be trivial to implement and some decision must be made for regarding existing migrations etc.

@dhui
Copy link
Member

dhui commented Feb 27, 2019

I wanted to bring up this question if other users would find it useful and since it hasn't been asked before.

I'll leave this issue open to gather feedback

It can be accomplished with SQL functions, e.g. a trigger to copy every add/update to a new table with a timestamp column.

IDK if every DB supported by migrate supports triggers. That being said, each schema version table is independently managed by the driver. e.g. the schema version table doesn't need to support the same structure across different DB drivers unless we want to change the driver interface to support history

In the meanwhile, so you're not blocked, it sounds like you could start tracking history yourself by adding a trigger.

@jabbors
Copy link
Author

jabbors commented Feb 28, 2019

Yeah, in the mean time we can track history with a trigger. I plan on creating the table and trigger with a migration. I can share the final result when it's done should anyone else finds it useful.

@deuill
Copy link

deuill commented Mar 18, 2019

As an additional data-point, we're running a service where database migrations are applied on startup with Migration.Up(), and never noticed having missed some migrates until a few days later where a database migration that referenced a table created in an earlier, missed migration, was failing. We run automated deploys on merges, and this was on staging instances of the service, thankfully.

This, of course, comes down to the fact that certain pull-requests introducing migrates were merged out-of-order, and we should've been more diligent with at least merging these in-order, if not re-numbering for the right order.

However, the fact still remains that the migration system did not error out with these missing migrates, as it does when a version stored in the database does not have a corresponding migrate file (this sometimes happens when testing changes locally, and we're jumping between branches).

It would be wonderful if we could apply these missing migrates in their versioned order, but throwing an error is imperative in catching these issues early, and tracking the history is, I believe, mandatory in implementing this.

I can try and work up some reasonable implementation for at least the subset of database engines I can test for, if you'd be willing to entertain the idea.

@dhui
Copy link
Member

dhui commented Mar 23, 2019

@deuill I'm glad you caught the issue before going to production!

Have you seen my comment here? #65 (comment)

I'm interested in solving the problem, but I'm not sure what the best solution for this problem (detecting changed/missing migrations) is.
The earlier the missing migration is detected in the dev process, the better. The problem is that everyone/team has a different dev and deploy process.

We should discuss solutions for this problem before implementing anything.

Goals for any solution (ordered by importance):

  1. Correctly catch missing migrations e.g. no false positives or false negatives
  2. Seamlessly integrate with existing users of migrate
    • migrate doesn't use migrate to manage it's schema verison table, but could start packing migrations in it's code.
  3. Support everyone's workflow
    • Would need to explicitly list workflows we support
    • May need overrides similar to migrate force
  4. Catch the missing migration as early as possible

Nice to have for solutions:

  • Detect modified migrations
    • Also support overriding/ignoring flags if someone intentionally modifies a migration

Current solutions:

  • Track history of applied migrations in DB (schema version table)
  • Manage manifest file of migration sources

@OscarHanzely
Copy link

OscarHanzely commented Apr 2, 2019

I very much agree with @deuill
This is first time I see migration system that is so much unsafe.
Ina team of multiple people with dependencies we expect that migrations will be applied in order merged in the code, not necesarily based on the developer's file creation time since some features could take longer to develop. Even if we would agree to the time sensitive order, we still have to track the files not just the head of the state. This way it is too dangerous to ignore/skip migration file. Of course it also breaks the option to run migration manually from psql,sql consoles command, but that's the point of migrations. The package should be only authority managing it.

I don;t see much benefit from maintaining/detecting modifications on migration files, once merged with code or release it should not be touched at all. History is essential through to catch possible production issues especially with dependencies.

Apparently this issue has been pointed two years ago mattes/migrate#237 with no resolution agreed on.

@dahu33
Copy link

dahu33 commented Apr 4, 2019

100% agree as well. Having an history of the migrations and being able to apply missing migrations (out of order) are a must. Maybe a feature for v5?

@jabbors
Copy link
Author

jabbors commented Apr 4, 2019

A migration (for PostgresSQL) to track history of applied migrations.

BEGIN;

SET ROLE 'postgres';

CREATE TABLE schema_migrations_history (
    id SERIAL PRIMARY KEY NOT NULL,
    version BIGINT NOT NULL,
    applied_at timestamptz NOT NULL DEFAULT NOW()
);

CREATE OR REPLACE FUNCTION track_applied_migration()
RETURNS TRIGGER AS $$
DECLARE _current_version integer;
BEGIN
    SELECT COALESCE(MAX(version),0) FROM schema_migrations_history INTO _current_version;
    IF new.dirty = 'f' AND new.version > _current_version THEN
        INSERT INTO schema_migrations_history(version) VALUES (new.version);
    END IF;
    RETURN NEW;
END;
$$ language 'plpgsql';

-- TRIGGER
CREATE TRIGGER track_applied_migrations AFTER INSERT ON schema_migrations FOR EACH ROW EXECUTE PROCEDURE track_applied_migration();

COMMIT;

This adds a new row to table schema_migrations_history when a row, with a greater version number, is inserted to table schema_migrations. It solves our use case where we use sequence numbers as versions instead of epoch time. Our migrations are also tested as part of our CI/CD pipeline where we can spot and prevent adding migrations with conflicting (or smaller) version numbers that would result in unapplied migrations.

@OscarHanzely
Copy link

@jabbors good implementation of history for a use case like mentioned above that the contract with client required to know the timestamp. It does not prevent the issues with skipping the source by migration tool, nor it updates history on migrate down, but it is great database implementation of tracking. We were thinking about something like that just for sake fo checking if any error occurs.

I am most afraid of the skipped migration file from source, we will be adding a policy that upon code merge the migration file timestamp version must be the current time to avoid issues. Otherwise, the newer timestamp merged/released before would cause any migration file with an old timestamp to never being applied.

@dhui
Copy link
Member

dhui commented Aug 16, 2019

Turns out pressly/goose has a different solution to this problem, which is to use timestamped migrations in dev and sequential migrations in prod and to explicitly serialize the migrations by converting the timestamped migrations to be sequential before deploy. I'm not a fan of this approach because of the following shortcomings:

  1. Doesn't address the issue that the migrations created in one order but merged in a different order may have incompatible schema changes.
  2. The developer needs to be aware of migration file names differing between the different environments which increases the amount of knowledge/context necessary during deploy and can complicate debugging any issues.

@dahu33
Copy link

dahu33 commented Aug 30, 2019

Probably one of the best migration tool out there is FlywayDB. Why not getting a bit of inspiration from them? https://flywaydb.org/documentation/migrations#overview

Something I really like with FlywayDB is they let you manage your migration the "way you want". For example, down migrations are optional and migrations can be run "out of order" (optional):
https://flywaydb.org/documentation/commandline/migrate#outOfOrder

I would love to have similar features and flexibility with golang-migrate. That would make the tool less opinionated on "how migration should be or should not be run" and it would most likely adapt better to different type of workflow/environments.

Just my 2cents :)

@tsauvajon
Copy link

This would be a great addition to go-migrate indeed, it's hard to keep track of what has been applied and when.

@nazar-pc
Copy link

We have the same issue with multiple migrations being merged not necessarily in the same order as they were created and expectation is that non-conflicting migrations will still be applied. So there should be a history of applied migrations and missing ones should be running. Non-conflicting merges are, obviously, task for developers to ensure they merge what should be merged and when.

@NickOttrando
Copy link

If this isn't going to be implemented with the suggested approach, i'd recommend changing the table name from schema_migrations to current_schema_version. The current name is very misleading when I apply multiple migrations and only see the most recent one in the table. Documentation around how this table works would also be very helpful.

@dhui
Copy link
Member

dhui commented Nov 8, 2020

Will this proposed design (#470) meet most of your needs? e.g. everything except for auditability

@jabbors
Copy link
Author

jabbors commented Nov 8, 2020

We have several production environments and it's common that some environments fall behind both in regards of deploying application upgrades and new migrations. For this reason, auditability is preferred in case we need to do some troubleshooting and track when a migration has been applied. But we have also solved that in Postgres by applying the above trigger function as an initial migration.

As for the proposed designed, it will be a great addition and it will help us with some issues relating to CI.

@amanangira
Copy link

Problem A use case that have come up with a customer would require us to keep a history of when migrations have been applied.

Solution Most migration system I've seen include history of all applied migrations in the schema_migrations table. Instead of just containing the last applied version and dirty state I would propose that each migration applied would end up as a new row in the table, which includes the number, name and date applied. The table structure could be something like.

create table schema_migrations(
    id serial primary key not null,
    number int not null,
    name text not null,
    dirty bool not null default false,
    applied_at timestamp not null default now()
);

Which would yield something like this when migrations are applied

select * from schema_migrations;
 id | number |     name     | dirty |         applied_at
----+--------+--------------+-------+----------------------------
  1 |      1 | table bar    | f     | 2019-02-26 07:39:27.302288
  3 |      2 | table bar    | f     | 2019-02-26 07:40:21.939487
  4 |      3 | table foobar | t     | 2019-02-26 07:40:54.243851

@jabbors we are planning to implement a similar approach. Is there any references that you can share for the implementation that you have made? I am still new to go-lang-migrate and it seems most of the migration logic is part of the package.

@jabbors
Copy link
Author

jabbors commented Sep 30, 2021

@amanangira this is the implementation that we are using #179 (comment).

It meets our requirements and have been working so far. Below is a sample from the table in a live system. Note: we are using a sequential version number in our migrations files {version}_{title}.up.{extension}, where a file would look something like 0165_create_table_foorbar.up.sql

SELECT * FROM schema_migrations_history ORDER BY applied_at DESC LIMIT 10;
 id  | version |          applied_at
-----+---------+-------------------------------
 168 |     182 | 2021-09-28 15:06:02.894574+00
 167 |     181 | 2021-09-28 12:59:10.771168+00
 166 |     180 | 2021-09-28 10:10:01.494816+00
 165 |     179 | 2021-06-11 09:38:54.363129+00
 164 |     178 | 2021-06-10 07:34:28.292296+00
 163 |     177 | 2021-06-08 08:31:25.304139+00
 162 |     176 | 2021-06-03 10:53:32.802223+00
 161 |     175 | 2021-06-01 07:35:30.812291+00
 160 |     174 | 2021-05-26 09:13:36.25021+00
 159 |     173 | 2021-05-25 06:13:45.053859+00
(10 rows)

@jabbors
Copy link
Author

jabbors commented Sep 30, 2021

A good practice that we have adapted is to create the trigger as the very first migration for new databases, ie. 0000_initial.up.sql

@amanangira
Copy link

@amanangira this is the implementation that we are using #179 (comment).

It meets our requirements and have been working so far. Below is a sample from the table in a live system. Note: we are using a sequential version number in our migrations files {version}_{title}.up.{extension}, where a file would look something like 0165_create_table_foorbar.up.sql

SELECT * FROM schema_migrations_history ORDER BY applied_at DESC LIMIT 10;
 id  | version |          applied_at
-----+---------+-------------------------------
 168 |     182 | 2021-09-28 15:06:02.894574+00
 167 |     181 | 2021-09-28 12:59:10.771168+00
 166 |     180 | 2021-09-28 10:10:01.494816+00
 165 |     179 | 2021-06-11 09:38:54.363129+00
 164 |     178 | 2021-06-10 07:34:28.292296+00
 163 |     177 | 2021-06-08 08:31:25.304139+00
 162 |     176 | 2021-06-03 10:53:32.802223+00
 161 |     175 | 2021-06-01 07:35:30.812291+00
 160 |     174 | 2021-05-26 09:13:36.25021+00
 159 |     173 | 2021-05-25 06:13:45.053859+00
(10 rows)

I probably missed this in the thread. This looks interesting. I will spend more time putting the approach against our use case. Thank you @jabbors

@dillonstreator
Copy link

As a workaround, my team has opted to use a custom required GitHub action which runs on new PRs. This also requires enforcing the strict status check branch protection rule so that the head branch is up to date with the base branch (ensuring the script compares against the latest migrations).

# .github/workflows/migration-validation.yml

name: migration-validation
on:
  pull_request:
    types: [opened, synchronize, edited, reopened]
    branches:
      - main
jobs:
  migration-validation:
    name: migration-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: update checkout
        run: git fetch --prune
      - name: migration-validation
        run: scripts/migration-validation.sh ${{ github.event.pull_request.base.ref }}
        shell: bash
# scripts/migration-validation.sh

BASE_BRANCH=$1

OLDEST_NEW_MIGRATION_FILE=$(git diff --name-only origin/$BASE_BRANCH --diff-filter=d | grep -m1 db/migrations/)

if [[ -z $OLDEST_NEW_MIGRATION_FILE ]]; then
    echo "no new migrations"
    exit 0
fi

NEWEST_EXISTING_MIGRATION_FILE=$(git ls-tree -r origin/$BASE_BRANCH --name-only | grep db/migrations/ | tail -1)

if [[ -z $NEWEST_EXISTING_MIGRATION_FILE ]]; then
    echo "no existing migrations"
    exit 0
fi

echo "oldest new migration $OLDEST_NEW_MIGRATION_FILE"
echo "newest existing migration $NEWEST_EXISTING_MIGRATION_FILE"

EXISTING_TIMESTAMP="$(basename $NEWEST_EXISTING_MIGRATION_FILE | cut -d '_' -f 1)"

NEW_TIMESTAMP="$(basename $OLDEST_NEW_MIGRATION_FILE | cut -d '_' -f 1)"

if [[ $EXISTING_TIMESTAMP -ge $NEW_TIMESTAMP ]]; then
    echo "existing migration timestamp is greater than or equal to incoming migration timestamp. please update your migrations timestamp."
    exit 1
fi

echo "new migration(s) are safe to merge"
exit 0

Running this as a Github action provides a good feedback loop and seems like a decent hack until a solution is settled on and implemented.

@lzap
Copy link

lzap commented May 6, 2022

Just an idea - how about checking the filenames before loading them for processing?

package db

import (
	"embed"
	"log"
	"strconv"

	migrate "github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/database/postgres"
	"github.com/golang-migrate/migrate/v4/source/iofs"
)

//go:embed migrations
var fs embed.FS

func Migrate() {
	d, err := iofs.New(fs, "migrations")
        // ...
}

// Checks that migration files are in proper format and index has no gaps or reused numbers.
func init() {
	dir, err := fs.ReadDir("migrations")
	if err != nil {
		log.Fatal("Unable to open migrations embedded directory")
	}
	if len(dir)%2 != 0 {
		log.Fatal("Migration files must be even")
	}
	// count migration prefixes
	checks := make([]int, len(dir)/2)
	for _, de := range dir {
		ix, err := strconv.Atoi(de.Name()[:5])
		if err != nil {
			log.Fatalf("Migration %s does not start with an integer?", de.Name())
		}
		if ix-1 > len(checks)-1 {
			log.Fatalf("Is there a gap in migration numbers? Number %d is way too high", ix)
		}
		checks[ix-1]++
	}
	// check expected result
	for i, x := range checks {
		if x != 2 {
			log.Fatalf("There are not exactly two migration files with index %05d, found: %d", i+1, x)
		}
	}
}

@amenowanna
Copy link

A migration (for PostgresSQL) to track history of applied migrations.

BEGIN;

SET ROLE 'postgres';

CREATE TABLE schema_migrations_history (
    id SERIAL PRIMARY KEY NOT NULL,
    version BIGINT NOT NULL,
    applied_at timestamptz NOT NULL DEFAULT NOW()
);

CREATE OR REPLACE FUNCTION track_applied_migration()
RETURNS TRIGGER AS $$
DECLARE _current_version integer;
BEGIN
    SELECT COALESCE(MAX(version),0) FROM schema_migrations_history INTO _current_version;
    IF new.dirty = 'f' AND new.version > _current_version THEN
        INSERT INTO schema_migrations_history(version) VALUES (new.version);
    END IF;
    RETURN NEW;
END;
$$ language 'plpgsql';

-- TRIGGER
CREATE TRIGGER track_applied_migrations AFTER INSERT ON schema_migrations FOR EACH ROW EXECUTE PROCEDURE track_applied_migration();

COMMIT;

This adds a new row to table schema_migrations_history when a row, with a greater version number, is inserted to table schema_migrations. It solves our use case where we use sequence numbers as versions instead of epoch time. Our migrations are also tested as part of our CI/CD pipeline where we can spot and prevent adding migrations with conflicting (or smaller) version numbers that would result in unapplied migrations.

Thank you for sharing this solution this is what we chose to do to address this gap in the tool. Elegant and reverse compatible with the way the tool currently works. Would encourage that a similar solution is added to the tool to address all of the concerns on this issue. Similarly as others have noted here this seems like a big gap when compared to other migration tools I have used.

amenowanna added a commit to mergestat/mergestat that referenced this issue Sep 28, 2022
Adding a Table, Function, and Trigger to track the history of our schema
migrations. Currently the `schema_migrations` table only stores the
latest migration so it is difficult to track what migrations were
applied and when and this is an issue when troubleshooting failed or
missed migrations.

The solution was found in an
[issue](golang-migrate/migrate#179 (comment))
for our migration tool

Resolves #330 

Example output of the table on a clean database where all migrations ran

![image](https://user-images.githubusercontent.com/10135546/192883450-a90ec30b-0db3-45b5-9ac9-42f2259e8dfb.png)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests