RFC #1 - questions about supporting test database syncing better #324

zilkey opened this issue Nov 6, 2015 · 4 comments

zilkey commented Nov 6, 2015

Loading / Dumping Schema

A scenario I come across fairly often is the need to drop old migrations. For example, on a large codebase there might be dozens or hundreds of migrations. In these cases, I find it helpful to just delete the old ones (taking care to make sure all environments, including dev environments, have already run them). After doing this, when new developers join the team, they just create an empty database from a structure.sql file (like in Rails) to get set up.

What do you think about adding the ability for node-db-migrate to dump a schema file to a directory specified in database.json after every migration? This script would:

  • create a database dump
  • add SQL statements to insert migration rows (so future db-migrate up commands would work). This would copy the migrations table from the dev database

Here's a simple example of this kind of script (minus the migrations piece):

// Load environment variables (DATABASE_URL, TEST_DATABASE_URL) from a .env file if present.
require('dotenv').load({silent: true})
var child_process = require('child_process');

// pg_dump flags: -s schema only, -x skip privileges, -c emit DROP statements,
// -O skip ownership, -f write to the given file.
var dumpOptions = [
  '-s',
  '-x',
  '-c',
  '-O',
  '-f',
  'db/structure.sql',
  process.env.DATABASE_URL
]

// psql flags: -q quiet, -f run the given file against the test database.
var loadOptions = [
  '-q',
  '-f',
  'db/structure.sql',
  process.env.TEST_DATABASE_URL,
]

// Pass the parent's stdin and inherit stdout/stderr so the tools' output is visible.
var stdOptions = { stdio: [ 0, 'inherit', 'inherit' ] }

function prepareDatabase() {
  return new Promise(function (resolve, reject) {
    // Dump the dev schema, then load it into the test database.
    var pgDump = child_process.spawn('pg_dump', dumpOptions, stdOptions);
    pgDump.on('exit', function (code) {
      if (code !== 0) return reject(new Error('pg_dump exited with ' + code));
      var psql = child_process.spawn('psql', loadOptions, stdOptions);
      psql.on('exit', function (code) {
        if (code !== 0) return reject(new Error('psql exited with ' + code));
        resolve(code)
      })
    });
  })
}

module.exports = prepareDatabase

if (require.main === module) {
  prepareDatabase().then(function (code) {
    console.log('Exited with ' + code);
  }).catch(function (err) {
    console.error(err);
    process.exit(1);
  })
}
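
The migrations piece isn't shown above. As a rough sketch (assuming Postgres, db-migrate's default migrations table name, and the same environment variables as in the script), the already-run migration rows could be appended to the same file as INSERT statements:

// Hypothetical continuation of the script above: append the dev database's
// migration rows to db/structure.sql, so that a freshly loaded database is
// treated as fully migrated by future db-migrate up runs.
var fs = require('fs');
var child_process = require('child_process');

var migrationRowOptions = [
  '-a',                 // data only
  '-t', 'migrations',   // only the migrations table (db-migrate's default name)
  '--inserts',          // emit INSERT statements instead of COPY
  process.env.DATABASE_URL
]

function appendMigrationRows() {
  return new Promise(function (resolve, reject) {
    var dump = child_process.spawn('pg_dump', migrationRowOptions);
    dump.stdout.pipe(fs.createWriteStream('db/structure.sql', { flags: 'a' }));
    dump.on('exit', function (code) {
      if (code !== 0) return reject(new Error('pg_dump exited with ' + code));
      resolve(code)
    })
  })
}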

Thoughts?

Auto-migrate more than one env

Another helpful feature could be the ability to specify multiple environments, so that when I run a migration I could target more than one environment (or maybe it just defaults to always trying to migrate the test environment?).

The use case here is that when I'm testing and I migrate, I always want to prepare the test database before running the tests again.
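
A rough sketch of what I mean (hypothetical helper, not an existing db-migrate feature; the env names dev and test are just placeholders for entries in database.json):

// Run the same migrations against several environments by invoking the
// db-migrate CLI once per environment via its -e/--env flag.
var child_process = require('child_process');

['dev', 'test'].forEach(function (env) {
  child_process.execSync('node_modules/.bin/db-migrate up -e ' + env, { stdio: 'inherit' });
});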

Abort on pending migrations

When running tests, it would be nice to be able to quickly diff the test database's migrations with those in the dev database. This would be less necessary if running migrations auto-migrated the test database, but still useful as a quick check.

I imagine this would be available to developers as a module, rather than through the CLI.
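
A rough sketch of what such a module might look like (hypothetical; it assumes Postgres, the pg package, and a name column in db-migrate's default migrations table):

var pg = require('pg');

// Returns the names of the migrations recorded in the given database's
// migrations table.
function executedMigrations(connectionString) {
  return new Promise(function (resolve, reject) {
    pg.connect(connectionString, function (err, client, done) {
      if (err) return reject(err);
      client.query('SELECT name FROM migrations ORDER BY name', function (err, result) {
        done();
        if (err) return reject(err);
        resolve(result.rows.map(function (row) { return row.name; }));
      });
    });
  });
}

// Resolves to the migrations that have run in dev but not yet in test.
function pendingInTest() {
  return Promise.all([
    executedMigrations(process.env.DATABASE_URL),
    executedMigrations(process.env.TEST_DATABASE_URL)
  ]).then(function (results) {
    var test = results[1];
    return results[0].filter(function (name) { return test.indexOf(name) === -1; });
  });
}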

Thoughts on any / all of the above? If I wrote these features, what are the chances they'd be merged?



wzrdtales changed the title from "questions about supporting test database syncing better" to "RFC #1 - questions about supporting test database syncing better" on Nov 7, 2015
wzrdtales self-assigned this on Nov 7, 2015
@wzrdtales (Member) commented:

I'm in a hurry and haven't read everything yet, but I will leave some initial thoughts right now.

The problem of migrations becoming outdated and unneeded is known to me; we have also switched to placing an up-to-date SQL file in a known location for all developers instead of rerunning the whole migration set. That change had several other reasons as well, for example that the live database contains much more data and involves big ETL processes. But I know the pain.

I would, however, not drop migrations; that is not good, consistent behavior. Maybe it makes sense to archive some of them when you're moving forward to the next major version of your product. Doing that would leave you with a much smaller set of migrations than before, but you would also need to reconcile the changed migrations table in some way. Maybe the right way is to think of tagged migrations: the new migrations would carry a tag at the new position from which the migrator actually needs to continue, and from that it could build the information it needs to rebuild the migrations table.
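
Just to make the idea concrete (purely hypothetical, no such feature exists today, and the tag key is made up), a migration at the new baseline could carry a tag that tells the migrator where to continue from:

// Hypothetical: the first migration of the archived baseline marks the
// position from which the migrator continues and rebuilds the migrations table.
exports._meta = {
  tag: 'v2-baseline' // made-up key; marks where the archived migrations were collapsed
};

exports.up = function (db, callback) {
  // recreate the schema as of the baseline here
  callback();
};

exports.down = function (db, callback) {
  callback();
};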

Thank you for your ideas so far; I highly appreciate them! I will take the time to complete my answer with my thoughts later on, maybe tomorrow.

@wzrdtales (Member) commented:

Your auto-migration use case is not a common one, and also not a common practice, so things like migrating the test environment by default are out of the question. Wouldn't it also make more sense to chain your test suite together with the migrations?
However, executing multiple environments one after another could be added to db-migrate, although I do not really see much use in it. Obviously this is just my opinion, and as long as it does not get confusing for the user and is helpful to those who need it, it might make sense. Maybe one could think of definable groups/sets of environments that are executed one after another; from my perspective that would also open up a use case I would support: migrating different databases at the same time. I have worked, and am working, on projects with several databases in parallel for different specialities, most of the time a combination of a NoSQL or NewSQL store and an RDBMS like MariaDB. Defining groups or sets could make the task of migrating both of these a bit more comfortable. Not to forget that in such setups mostly different sets of migrations are going to be executed, so the migrations folder would have to be configurable per environment.
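
Just to illustrate (nothing of this exists yet; the groups key and the per-environment migrations-dir are purely hypothetical), a database.json for such a setup could look roughly like this:

{
  "dev": { "driver": "pg", "database": "myapp_dev" },
  "analytics": {
    "driver": "mongodb",
    "database": "myapp_analytics",
    "migrations-dir": "migrations/analytics"
  },
  "groups": {
    "default": ["dev", "analytics"]
  }
}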

@wzrdtales (Member) commented:

The dump idea does not yet address being generic. Simply dumping the database is not enough; it would not be generic across all the other databases, and that is not something db-migrate should do. A user can do whatever he/she wants in userspace, but it is not good practice for an abstraction layer for migrations to offer features that do not work on all databases in the end.

There would be the option to create JSON dumps instead, though that still needs some design work on our side.

Apart from that, it is questionable whether a dump is really enough without the data inside those tables. When data gets written through db-migrate (in the best case through the new seeders), we need to recognize this and execute the seeders that would have been executed. Dumping this data to JSON dumps or something similar would again be an option, but that also needs design work on our side.

@wzrdtales (Member) commented:

On the last point: basically, one instance of db-migrate has one connection at a time. Even with multiple environments, they would execute chained one after another, so diffing executed migrations across different connections might be a problem here. Maybe we can provide an API function that filters the local migrations against a provided list of migrations, and an API function that returns the actually executed migrations. A developer who wants to diff two databases could then combine those two functions to achieve the result you described.
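
Purely illustrative (neither function exists yet; the names are made up), a developer could combine the two roughly like this:

// Hypothetical API sketch: one db-migrate instance per environment/connection.
// getExecutedMigrations() would return the migrations recorded as executed on
// that connection; filterMigrations(list) would return the entries of list
// that the connection has not executed yet.
function diffEnvironments(devInstance, testInstance) {
  return devInstance.getExecutedMigrations().then(function (executedInDev) {
    return testInstance.filterMigrations(executedInDev);
  });
}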
