sequential numbering with postgres or redshift schemas #63
Comments
I also prefer the timestamp naming convention. I'd suggest that both are offered in goose. If you prefer sequential, then default to that, but offer UTC timestamps with |
@rkulla sounds like a bug. The goose binary should see the subdirectories and give you

I'm strongly against timestamp-based versioning, since it causes much worse and less predictable schema conflicts than sequential versioning. See the discussion in #27.

In short: given two developers who work on different git branches and both create migrations that change the schema in

Sequential versions

0005_first_developer_account_changes.sql
0005_second_developer_different_account_changes.sql

Thus, the second dev will need to resolve the conflict (rename the migration) and will probably rebase against master to update his PR.

Timestamp-based versions

2017...08_first_developer_account_changes.sql
2017...03_second_developer_different_account_changes.sql

** Note that this is how goose has worked by design since its very beginnings in the https://bitbucket.org/liamstask/goose/ repo.

As you can see, sequential versions provide more safety during the development cycle (even though they might seem annoying and unnecessary at first). More importantly, they provide a predictable order of schema migrations. The predictable order has more benefits, though:
Hope this makes sense. I'm not saying sequential versioning is perfect, but it has some real benefits over timestamp-based versions. I'd prefer that we think about some tooling around conflicts (e.g. bumping the version of all files that were not applied yet to resolve conflicts, or something similar), rather than falling back to timestamps. |
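To make the collision case concrete, here is a minimal Go sketch of how a tool could detect two migration files that share the same sequential version. The helpers `parseVersion` and `findCollisions` are hypothetical names made up for this illustration, not part of goose's API; the file names are the ones from the example above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// parseVersion extracts the numeric prefix before the first underscore,
// e.g. "0005_first_developer_account_changes.sql" -> 5.
// Hypothetical helper for this sketch, not a goose function.
func parseVersion(name string) (int64, error) {
	base := filepath.Base(name)
	idx := strings.Index(base, "_")
	if idx <= 0 {
		return 0, fmt.Errorf("no version prefix in %q", base)
	}
	return strconv.ParseInt(base[:idx], 10, 64)
}

// findCollisions groups file names by version and keeps only the
// versions that are claimed by more than one file.
func findCollisions(files []string) (map[int64][]string, error) {
	byVersion := make(map[int64][]string)
	for _, f := range files {
		v, err := parseVersion(f)
		if err != nil {
			return nil, err
		}
		byVersion[v] = append(byVersion[v], f)
	}
	for v, names := range byVersion {
		if len(names) < 2 {
			delete(byVersion, v) // unique version, not a collision
			continue
		}
		sort.Strings(names) // stable output for reporting
	}
	return byVersion, nil
}

func main() {
	files := []string{
		"0005_first_developer_account_changes.sql",
		"0005_second_developer_different_account_changes.sql",
	}
	collisions, err := findCollisions(files)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for version, names := range collisions {
		fmt.Printf("version %d is used by %d files: %v\n", version, len(names), names)
	}
}
```

A check like this could run in CI so that a duplicate version fails the build before anything reaches a database. How the conflict is then resolved (renaming by hand or bumping unapplied files automatically) is a separate decision.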
@VojtechVitek I see. Thanks for the quick response. I'm not particularly adamant about timestamps if the sequential versioning would work with sub-dirs. I do have some older timestamp-based migrations that were made with liamstask/goose, which would now leave me with a mix of timestamps and sequential numbers. Do you think that would pose a problem if I don't rework them as sequential numbers, for people "migrating" from the old goose to this fork? |
@rkulla it will still work. But if you want to switch to sequential versions, you can always bump the latest migration to something like 10000000001 (something higher than the latest timestamp) and go from there. |
@VojtechVitek IMO timestamp-based migrations are a lot better than sequential ones. The major issue with sequential ones is collisions after a merge. Why can't you check all timestamp versions against goose_db_version? In #27 someone used the initial migration as an example argument, but in the real world that case is minor, while conflicts after a merge are a real problem. Logic and the storage schema should be discussed between developers, not migration numbers. |
@pawelma please read my comment at #63 (comment). The timestamp-based approach doesn't guarantee the order in which the migrations get run. I think of DB schema migrations as

The problem is not when you merge two branches together, but in which order you run the DB migrations from these branches. Version-based naming prevents any conflicts, as it errors out on version collisions. I totally understand how annoying it is to rename files manually. I don't like it either. But we don't have anything better right now. I wish we could just pick up the order from

I'll give you one more example with the timestamp-based approach: |
@VojtechVitek I read it earlier and it didn't convince me :) We can use any prefix which provides uniqueness and some sort of order. The order should matter, but it should not take over the implementation. The order should matter over a longer period, but in the short term it shouldn't be forced. As you mentioned, the git model is a tree; history there can be linear, but by default it isn't, and developers take care to resolve conflicts or make the history linear. I agree with the problem described in your example, but sequential versioning doesn't solve it. Imagine the same issue with sequential versioning. A developer tries to merge his changes into master and the CI tests fail because there is another migration with the same number, so the developer fixes the number, and I can bet that in 90% (or even more) of cases he doesn't check the code. If there is no 'merge only when tests pass' restriction, he may merge broken code. Let's assume that there is such a restriction, or that he waits for another test run: the result is red again, because the original error was hidden by the migration framework (instead of 1 debugging step he had to go through 2). What I want to say is: you are not able to protect developers from mistakes, but you can make their lives easier. Good practices like well-written tests, working CI, etc. should solve such problems. |
Yea, it would be nice if we could get the version number automatically somehow, like in git. Technically, goose binary could look into git repository and recognize number of commits since the initial commit for each migration merged into master. Sequential versioning enforces consistency of the DB schema on each environment. It's a stability decision. You probably want to have the same DB state on your localhost, in your CI tests, and then on staging/production.
I can rely on CI tests that enforce a consistent and predictable DB schema, thus protecting developers and production devops from issues on production. That's imho more important than forcing them to rename a bunch of files. Yes, it is annoying. But how can we do it better? Unfortunately, timestamps are not the answer. However, if you still prefer the timestamp-based approach, you can simply edit this line: https://github.com/pressly/goose/blob/master/create.go#L22 and build your custom goose binary. |
Could a hinting solution be considered? Django uses sequentially numbered migrations, but it also allows one to specify dependencies. Therefore by adding one line:
goose could know how to arrange the history. |
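As a rough illustration of what such a hint could enable, the Go sketch below orders migrations by declared dependencies using a topological sort (Kahn's algorithm). Everything here is an assumption for the sake of the example: goose has no such dependency field in this discussion, and the function `orderByDependencies` and the migration names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// orderByDependencies returns the migration names so that every migration
// comes after the migrations it depends on. deps maps a migration name to
// the names it depends on. Hypothetical helper, not a goose API.
func orderByDependencies(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int)
	dependents := make(map[string][]string)
	for name, parents := range deps {
		if _, ok := indegree[name]; !ok {
			indegree[name] = 0
		}
		for _, p := range parents {
			if _, ok := deps[p]; !ok {
				return nil, fmt.Errorf("%s depends on unknown migration %s", name, p)
			}
			indegree[name]++
			dependents[p] = append(dependents[p], name)
		}
	}

	// Start with migrations that depend on nothing.
	var ready []string
	for name, n := range indegree {
		if n == 0 {
			ready = append(ready, name)
		}
	}

	var ordered []string
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic tie-break between independent migrations
		next := ready[0]
		ready = ready[1:]
		ordered = append(ordered, next)
		for _, d := range dependents[next] {
			indegree[d]--
			if indegree[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(ordered) != len(deps) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return ordered, nil
}

func main() {
	// Two branches both created an "0002" migration on top of 0001.
	deps := map[string][]string{
		"0001_create_users": nil,
		"0002_add_cars":     {"0001_create_users"},
		"0002_add_trains":   {"0001_create_users"},
	}
	order, err := orderByDependencies(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

Note that independent migrations still need a tie-break rule (alphabetical here), so dependencies alone do not fully determine the run order across environments.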
I am running into serious problems when we have development going on on multiple branches. Scripts get created with the same sequence number on multiple branches. Then there is a conflict after a merge and deployment fails. And even worse if a different branch gets deployed with different numbering the state is hard to decipher. Not sure we will continue using goose. |
@croaker8 : May I suggest that you add a timestamp into the sql/go file automatically so you can resolve the sequence problem. Even though you would have the time based sequences there is no guarantee that the migrations would be in order. And you would still run into the same problems e.g manually solving the order of them except there would be no merge conflicts. Now on the other hand what @toudi suggested would solve this problem partially as the migrations would still be numbered out of the order and there would still be conflicts with merge however they might not be run out of order. |
We could use the full file name as the version. It gives more information about which revision you are applying, and the risk of creating the same file name twice is very low, so we wouldn't need a timestamp in the file name. As for the prefix used for sorting, it's not a problem: we would have keys like '0001_create_user' and '0001_create_friends', and people can't create a dependent migration without its parent migration anyway. |
And don't forget: before a release, all migrations and branches are merged into the release branch and applied on staging step by step. If developers catch an error at this step, they will roll back and try to resolve the problem locally. And goose has a good API

I think goose needs to give people simple tools for creating migrations and show clear logic for how they work. It doesn't need to solve the 0.0001% of cases that people have to resolve by hand because they used goose and git incorrectly. |
Or, if you don't want to break the logic of the numeric prefix, goose needs to panic (exit 1) if it sees 2 version

At the moment, you just ignore different migrations from different people with different file names, but with |
Hi guys. I created a proof of concept with migration dependencies support and revisions (where the revision can be anything) and two types of inputs - SQL and fizz. Could this be a possible solution to this problem? I'm quite new to golang so I'd love your insight. There's one hardcoded path in there, but I just wanted to know what you'd think about the API |
We came up with an interesting approach internally in Pressly with @1vn & @diogogmt :
We're working on a proof of concept right now, and we might merge it back here if there's enough interest. It's a tool that can be run either manually on the master branch, or automatically in the CI/CD pipeline or via a "merge bot". |
And what about CVS, Subversion, Bazaar and Mercurial ?) |
I think timestamp (or version number) + file name would be an easy, good solution to the problem. And you don't need to (and shouldn't) depend on other tools. No? |
@eaglemoor The solution we're thinking of doesn't depend on |
The timestamp migration order problem explained:
+migrations/2018-09-15-12:00:00_joe_1.sql
+migrations/2018-10-10-12:00:00_joe_2.sql
migrations/2018-09-15-12:00:00_joe_1.sql
+migrations/2018-10-01-12:00:00_alice_1.sql
migrations/2018-10-10-12:00:00_joe_2.sql
+migrations/2018-10-18-12:00:00_alice_2.sql
^ Note the order of these migrations. Goose orders migration files alphabetically.
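The ordering problem in the listing above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the function `outOfOrder` is hypothetical, and the "applied" set assumes that Joe's two migrations had already been run before Alice's branch was merged.

```go
package main

import (
	"fmt"
	"sort"
)

// outOfOrder returns the pending migrations that sort alphabetically
// before the newest migration that has already been applied.
// Hypothetical helper for this sketch, not a goose function.
func outOfOrder(all []string, applied map[string]bool) []string {
	sorted := append([]string(nil), all...) // copy so the caller's slice is untouched
	sort.Strings(sorted)

	latest := ""
	for _, name := range sorted {
		if applied[name] {
			latest = name // last applied entry in sorted order wins
		}
	}

	var late []string
	for _, name := range sorted {
		if !applied[name] && name < latest {
			late = append(late, name)
		}
	}
	return late
}

func main() {
	all := []string{
		"migrations/2018-09-15-12:00:00_joe_1.sql",
		"migrations/2018-10-01-12:00:00_alice_1.sql",
		"migrations/2018-10-10-12:00:00_joe_2.sql",
		"migrations/2018-10-18-12:00:00_alice_2.sql",
	}
	// Assumption: Joe's migrations were applied before Alice's were merged.
	applied := map[string]bool{
		"migrations/2018-09-15-12:00:00_joe_1.sql": true,
		"migrations/2018-10-10-12:00:00_joe_2.sql": true,
	}
	fmt.Println(outOfOrder(all, applied))
	// prints [migrations/2018-10-01-12:00:00_alice_1.sql]
}
```

alice_1 sorts before joe_2, which is already applied, so an environment that ran Joe's migrations first ends up with a different history than a fresh database that runs all four files in alphabetical order.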
Now, the above story can be even worse if you have multiple production environments, as we do at Pressly. We have separate databases in Germany, Canada, the US etc. because of data residency regulations. It's impossible to keep all the environments in sync if we don't use sequential ordering. The migration order mismatch can cause both
|
Fixed in #120. |
Released as v2.4.0. |
IMO this would be better solved with migrations depending on another migration. |
@mvrhov I disagree. Dependencies would be much more complex to implement and also harder to maintain from the user's perspective. Dependencies between migrations would also lock you in, since you wouldn't be able to remove any older migration files that are referenced from newer migrations. You'd have to think about all of that when maintaining the code. At Pressly, we remove old migrations from our codebase every couple of months and leave only the latest ~5 migrations. Regular clean-up keeps our codebase small and clean. |
Well, removing older migrations might not be an option. It depends on how many copies of your product you have. You also need the older migrations for someone to set up a dev environment, unless you are scrubbing sensitive data from the database and your devs are then using something close to a production database. But if you run a large SaaS, then this is not an option. |
We run a large SAAS. We have five productions in five different AWS regions and millions of users in each copy of the product. And yes, we still do remove old migrations. Out of ~600 migrations, we have about 50 in our codebase right now.
Let me explain what we do in order to remove the old migrations in Pressly: We maintain two DB schema files in our codebase that are the "source" of truth for the initial state of the DB:
Let me know if you have any questions. This could be probably worth a blog post :) |
What is the significance of points 5 and 6? Wouldn't the "versioning problem" still be present without this bit? Just want to make sure I understand this correctly as we're finally getting around to updating our CI/CD to use hybrid versioning. |
@VojtechVitek Is the |
I just started using this fork because it has support for Redshift. However, I use schemas to organize tables in databases like redshift and postgres, and the new sequential numbering of files, instead of timestamps, means that I have to put all my migration files in the same directory; otherwise the counter will start again at 00001 for each directory.
Say I have a single database in redshift or postgres, and it has multiple schemas `CREATE SCHEMA foo;` and `CREATE SCHEMA bar;`. If I wanted to organize my goose migrations by folders, I might have:

If I then ran `cd foo && goose create add_cars sql` then `cd ../bar && goose create add_trains sql`, goose will create `foo/00001_add_cars.sql` and `bar/00001_add_trains.sql`. So running:

$ cd foo
$ goose postgres "user=rkulla dbname=postgres sslmode=disable" up
goose: no migrations to run. current version: 1
$ cd ../bar
$ goose postgres "user=rkulla dbname=postgres sslmode=disable" up
goose: no migrations to run. current version: 1
would only apply the migration under foo/ but not the one under bar/, because both migrations start with `00001_` and postgres or redshift only get ONE goose_db_version table to share amongst the different schema names. This makes them harder to organize unless I do `goose create add_cars_foo sql` and `goose create add_trains_bar sql`, then run commands like `ls *_foo.sql` and `ls *_bar.sql`.

It's not that big of a deal right now I guess, but I'm wondering if there's a better way and if it's really worth not using timestamps. IIRC, Ruby on Rails's migration feature used to use sequential numbers, but developers complained because it caused a lot of conflicts when 2 different developers working on the same project made separate changes but both generated a migration with the same number. So Rails switched to UTC timestamps.