This repository has been archived by the owner on Feb 19, 2022. It is now read-only.

Add parallel support #6

Merged
merged 2 commits into from
May 21, 2020

Conversation

davidkohn88
Contributor

Add parallel support for both dump and restore.

Adds support for parallel restores by restoring in several parts,
which avoids the parallel-restore failures caused by circular
foreign key constraints in the TimescaleDB catalog.
@davidkohn88 davidkohn88 requested review from cevian and antekresic May 15, 2020 10:42
@davidkohn88
Contributor Author

I should probably be running these tests in parallel as well. Can someone tell me how?

@antekresic antekresic left a comment

Looks mostly good. A little refactoring would make the complicated logic more readable.

Comment on lines 31 to 69

//In order to support parallel restores, we have to first do a pre-data
//restore, then restore only the data for the _timescaledb_catalog schema,
//which has circular foreign key constraints and can't be restored in
//parallel, then restore (in parallel) the data for everything else and the
//post-data (also in parallel; this includes building indexes and the like,
//so it can be significantly faster that way)
var baseArgs = []string{fmt.Sprintf("--dbname=%s", cf.DbURI), "--format=directory"}

if cf.Verbose {
	baseArgs = append(baseArgs, "--verbose")
}
//Now just the pre-data section
preDataArgs := append(baseArgs, "--section=pre-data")

err = runRestore(restorePath, cf.PgDumpDir, preDataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed in pre-data section: %w", err)
}
//How does error handling in deferred things work?
defer postRestoreTimescale(cf.DbURI, tsInfo)

//Now data for just the _timescaledb_catalog schema
catalogArgs := append(baseArgs, "--section=data", "--schema=_timescaledb_catalog")
err = runRestore(restorePath, cf.PgDumpDir, catalogArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed while restoring _timescaledb_catalog: %w", err)
}

//Now the data for everything else, first time to add our parallel jobs
dataArgs := append(baseArgs, "--section=data", "--exclude-schema=_timescaledb_catalog")
if cf.Jobs > 0 {
	dataArgs = append(dataArgs, fmt.Sprintf("--jobs=%d", cf.Jobs))
}
err = runRestore(restorePath, cf.PgDumpDir, dataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed while restoring user data: %w", err)
}

//Now the full post-data run, which should also be in parallel
postDataArgs := append(baseArgs, "--section=post-data")
if cf.Jobs > 0 {
	postDataArgs = append(postDataArgs, fmt.Sprintf("--jobs=%d", cf.Jobs))
}
err = runRestore(restorePath, cf.PgDumpDir, postDataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed during post-data step: %w", err)
}
return err

This part has custom logic (explained in the comment) that is fairly repetitive. It could be refactored into a helper to make it more readable.

@antekresic

@davidkohn88 you need to call t.Parallel() at the start of a test to make it run in parallel with other tests that do the same.

https://golang.org/pkg/testing/#T.Parallel

Add support for parallel dumps, which allows large databases to be
dumped considerably faster. Add tests for parallel dumps and restores,
and make sure that non-parallel dumps/restores still work.
@davidkohn88 davidkohn88 force-pushed the add-parallel-support branch from 4722555 to 0180962 Compare May 21, 2020 12:23
@davidkohn88 davidkohn88 requested a review from antekresic May 21, 2020 12:24
@davidkohn88
Contributor Author

Fixes #4

@davidkohn88 davidkohn88 merged commit b2283eb into master May 21, 2020
@davidkohn88 davidkohn88 deleted the add-parallel-support branch May 21, 2020 12:53