This repository has been archived by the owner on Feb 19, 2022. It is now read-only.
Conversation
Adds support for parallel restores by restoring in several parts, avoiding the issues with parallel restore that were caused by circular foreign key constraints in the TimescaleDB catalog.
I should probably be running these tests in parallel as well, can someone tell me how?
antekresic suggested changes on May 18, 2020
Looks mostly good; a little refactoring is needed to make the complicated logic a bit more readable.
pkg/restore/restore.go (Outdated)
Comment on lines 31 to 69
//In order to support parallel restores, we have to first do a pre-data
//restore, then restore only the data for the _timescaledb_catalog schema,
//which has circular foreign key constraints and can't be restored in
//parallel, then restore (in parallel) the data for everything else and the
//post-data section (also in parallel; this includes building indexes and
//the like, so it can be significantly faster that way).
var baseArgs = []string{fmt.Sprintf("--dbname=%s", cf.DbURI), "--format=directory"}

if cf.Verbose {
	baseArgs = append(baseArgs, "--verbose")
}

// First, just the pre-data section.
preDataArgs := append(baseArgs, "--section=pre-data")
err = runRestore(restorePath, cf.PgDumpDir, preDataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed in pre-data section: %w", err)
}
//How does error handling in deferred things work?
defer postRestoreTimescale(cf.DbURI, tsInfo)

// Now the data for just the _timescaledb_catalog schema.
catalogArgs := append(baseArgs, "--section=data", "--schema=_timescaledb_catalog")
err = runRestore(restorePath, cf.PgDumpDir, catalogArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed while restoring _timescaledb_catalog: %w", err)
}

// Now the data for everything else; this is the first step that adds
// our parallel jobs.
dataArgs := append(baseArgs, "--section=data", "--exclude-schema=_timescaledb_catalog")
if cf.Jobs > 0 {
	dataArgs = append(dataArgs, fmt.Sprintf("--jobs=%d", cf.Jobs))
}
err = runRestore(restorePath, cf.PgDumpDir, dataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed while restoring user data: %w", err)
}

// Now the full post-data run, which can also be parallel.
postDataArgs := append(baseArgs, "--section=post-data")
if cf.Jobs > 0 {
	postDataArgs = append(postDataArgs, fmt.Sprintf("--jobs=%d", cf.Jobs))
}
err = runRestore(restorePath, cf.PgDumpDir, postDataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed during post-data step: %w", err)
}
return err
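On the "how does error handling in deferred things work?" question in the code above: a plain `defer f()` silently discards f's return value. The usual Go pattern is a named return value captured by a deferred closure, so a cleanup failure can still surface. A minimal sketch (the `cleanup` function here is a stand-in, not the PR's `postRestoreTimescale`):

```go
package main

import (
	"errors"
	"fmt"
)

// cleanup stands in for a post-restore step that can fail.
func cleanup() error { return errors.New("cleanup failed") }

// restore uses a named return (err) so the deferred closure can
// overwrite it when the cleanup step fails after the main work
// succeeded. A bare `defer cleanup()` would drop the error.
func restore() (err error) {
	defer func() {
		if cerr := cleanup(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	return nil // main work succeeded; cleanup error still propagates
}

func main() {
	fmt.Println(restore()) // prints "cleanup failed"
}
```

As written in the diff, `defer postRestoreTimescale(cf.DbURI, tsInfo)` would discard any error that call returns.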
This part has some custom logic (explained in the comment) which is kinda repetitive and could be refactored to be more readable.
@davidkohn88 you need to run
Add support for parallel dumps; this allows large databases to be dumped considerably faster. Add tests for parallel dumps and restores, as well as making sure that non-parallel dumps/restores still work.
davidkohn88 force-pushed the add-parallel-support branch from 4722555 to 0180962 on May 21, 2020
Fixes #4
antekresic approved these changes on May 21, 2020
Add parallel support for both dump and restore.