This repository has been archived by the owner on Feb 19, 2022. It is now read-only.
Conversation
Adds support for parallel restores by restoring in several parts, avoiding the issues with parallel restore that were caused by circular foreign key constraints in the TimescaleDB catalog.
I should probably be running these tests in parallel as well, can someone tell me how?
antekresic suggested changes on May 18, 2020
Looks mostly good; a little refactoring is needed to make the complicated logic a bit more readable.
pkg/restore/restore.go (Outdated)
Comment on lines 31 to 69
//In order to support parallel restores, we have to first do a pre-data
//restore, then restore only the data for the _timescaledb_catalog schema,
//which has circular foreign key constraints and can't be restored in
//parallel, then restore (in parallel) the data for everything else and the
//post-data section (also in parallel; this includes building indexes and
//the like, so it can be significantly faster that way).
var baseArgs = []string{fmt.Sprintf("--dbname=%s", cf.DbURI), "--format=directory"}

if cf.Verbose {
	baseArgs = append(baseArgs, "--verbose")
}

// First, just the pre-data section.
preDataArgs := append(baseArgs, "--section=pre-data")
err = runRestore(restorePath, cf.PgDumpDir, preDataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed in pre-data section: %w", err)
}
//How does error handling in deferred things work?
defer postRestoreTimescale(cf.DbURI, tsInfo)

// Now the data for just the _timescaledb_catalog schema.
catalogArgs := append(baseArgs, "--section=data", "--schema=_timescaledb_catalog")
err = runRestore(restorePath, cf.PgDumpDir, catalogArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed while restoring _timescaledb_catalog: %w", err)
}

// Now the data for everything else; this is the first step that adds
// our parallel jobs.
dataArgs := append(baseArgs, "--section=data", "--exclude-schema=_timescaledb_catalog")
if cf.Jobs > 0 {
	dataArgs = append(dataArgs, fmt.Sprintf("--jobs=%d", cf.Jobs))
}
err = runRestore(restorePath, cf.PgDumpDir, dataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed while restoring user data: %w", err)
}

// Now the full post-data run, which can also be parallel.
postDataArgs := append(baseArgs, "--section=post-data")
if cf.Jobs > 0 {
	postDataArgs = append(postDataArgs, fmt.Sprintf("--jobs=%d", cf.Jobs))
}
err = runRestore(restorePath, cf.PgDumpDir, postDataArgs)
if err != nil {
	return fmt.Errorf("pg_restore run failed during post-data step: %w", err)
}
return err
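On the "how does error handling in deferred things work?" question in the code above: a plain `defer f()` silently discards f's return value. The usual Go pattern is a named return value captured by a deferred closure, so a cleanup failure can still surface. A minimal sketch (the `cleanup` function here is a stand-in, not the PR's `postRestoreTimescale`):

```go
package main

import (
	"errors"
	"fmt"
)

// cleanup stands in for a post-restore step that can fail.
func cleanup() error { return errors.New("cleanup failed") }

// restore uses a named return (err) so the deferred closure can
// overwrite it when the cleanup step fails after the main work
// succeeded. A bare `defer cleanup()` would drop the error.
func restore() (err error) {
	defer func() {
		if cerr := cleanup(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	return nil // main work succeeded; cleanup error still propagates
}

func main() {
	fmt.Println(restore()) // prints "cleanup failed"
}
```

As written in the diff, `defer postRestoreTimescale(cf.DbURI, tsInfo)` would discard any error that call returns.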
This part has some custom logic (explained in the comment) which is kinda repetitive and could be refactored to be more readable.
@davidkohn88 you need to run
Add support for parallel dumps; this allows large databases to be dumped considerably faster. Add tests for parallel dumps and restores, as well as making sure that non-parallel dumps/restores still work.
davidkohn88 force-pushed the add-parallel-support branch from 4722555 to 0180962 on May 21, 2020
Fixes #4
antekresic approved these changes on May 21, 2020
Add parallel support for both dump and restore.