Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to use on-conflict-do-nothing flag due to other missing flags that cannot be supplied #153

Closed
Tracked by #149
bjorngoossens opened this issue Aug 9, 2024 · 6 comments · Fixed by #157
Closed
Tracked by #149
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@bjorngoossens
Copy link

Hi all, great tool you're working on!

Using Greenmask for dumping/restoring full databases works well for us. However, we'd really like to be able to restore multiple times and skip any data that causes conflicts. The underlying pg_dump command has a feature flag for this: on-conflict-do-nothing. This is also listed in the available flags in the documentation.

However, when I try to run a dump with this flag enabled, pg_dump gives me this output:
pg_dump: error: option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts

I cannot figure out how to supply any of these additional flags to Greenmask, and I also don't see them listed as possible options in the documentation linked above. I also don't see any reference to these flags in the file that is used to supply the on-conflict-do-nothing flag.

Am I missing something to make on-conflict-do-nothing work nonetheless? Or should these options/flags be added to Greenmask to make it work. Thanks in advance for your help!

@bjorngoossens bjorngoossens changed the title Unable to supply on-conflict-do-nothing flag due to other missing flags that cannot be added Unable to use on-conflict-do-nothing flag due to other missing flags that cannot be supplied Aug 9, 2024
@wwoytenko wwoytenko self-assigned this Aug 9, 2024
@wwoytenko wwoytenko added the enhancement New feature or request label Aug 9, 2024
@wwoytenko
Copy link
Contributor

Hi! Thanks for sharing your feedback. I'm glad to hear that Greenmask is helping you.

By the way, I agree it's a bit controversial that not all pg_dump/pg_restore options are supported.

I believe it's necessary to implement such options, and here’s a list of tasks we need to address:

  • Implement an INSERT-like dump when the --inserts option is provided.
  • Implement --on-conflict-do-nothing for the --inserts option.
  • Ensure that when restoring a dump in INSERT-like format, all constraints are deployed first to prevent unexpected errors when data violates constraints.
  • Develop a strategy for restoring tables topologically. We need to carefully plan the deployment, as parallel jobs can only restore tables if their dependent tables have already been restored. Initially, this might require a single job.

@wwoytenko
Copy link
Contributor

I will add it to the backlog. It might be done at the end of this quarter or in the beginning of the Q4 2024.

@wwoytenko wwoytenko mentioned this issue Aug 9, 2024
11 tasks
@bjorngoossens
Copy link
Author

bjorngoossens commented Aug 12, 2024

@wwoytenko thanks for your input, and great to hear that you're planning on picking this up. To give a bit more insight into our intended use-case: we mainly want to use Greenmask for our dev team to quickly get up-and running locally with an anonymized but true-to-life db. Greenmask fulfills this need really well. The feedback I get from our team is that it would be even better to allow them to re-run the restoration process periodically to get new data. Now they have to set up a new, empty database for this. It would be really nice if they could just run a restore operation again.

Happy to provide you with some input when needed. Thanks!

@wwoytenko
Copy link
Contributor

@bjorngoossens thank you so much for the details.

There are a few points we need to clarify to determine the best implementation approach based on the requirements.

Description:

According to the PostgreSQL docs

The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error.

When using --on-conflict-do-nothing, it generates the following SQL statement (example from a test database):

INSERT INTO person.password VALUES ('2811', 'M/FE9Z+3Jj1EKOLYSlfuUEnLfMlfFt91yy6qLfGguTA=', 'Tu7nuZo=', '4d6140e8-33e6-494f-8ea6-c56afc5d50ec', '2014-01-29 00:00:00') ON CONFLICT DO NOTHING;

However, this approach only works for unique or exclusion constraints. If a foreign key is not present in the referenced table, the insertion will still fail. For example:

transformed=# INSERT INTO person.password VALUES ('2811', 'M/FE9Z+3Jj1EKOLYSlfuUEnLfMlfFt91yy6qLfGguTA=', 'Tu7nuZo=', '4d6140e8-33e6-494f-8ea6-c56afc5d50ec', '2014-01-29 00:00:00') ON CONFLICT DO NOTHING;
ERROR:  insert or update on table "password" violates foreign key constraint "FK_Password_Person_BusinessEntityID"
DETAIL:  Key (businessentityid)=(2811) is not present in table "person".

How critical is this issue? Do you want to rely on other constraints like check constraints or foreign keys?

In my opinion, it might be acceptable to ignore other constraints as well, at least as an option. In this particular case, we could resolve the issue by inserting data using a transaction per statement. If the driver returns an error indicating a constraint violation, we can choose to skip that insertion. The downside is that the target table may become significantly bloated.

@wwoytenko
Copy link
Contributor

I've made MR #157 which has to close this feat request

@wwoytenko wwoytenko added this to the v0.2b2 milestone Aug 16, 2024
@wwoytenko wwoytenko moved this to In progress in Engineering Aug 16, 2024
@wwoytenko
Copy link
Contributor

It is closed by #157

@wwoytenko wwoytenko moved this from In progress to Done in Engineering Aug 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants