Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new replication gem #18686

Merged
merged 23 commits into from
May 15, 2019
Merged

Conversation

carbonin
Copy link
Member

@carbonin carbonin commented Apr 24, 2019

This PR moves us from the pg-pglogical gem to the pg-logical_replication one.

This means that we will be utilizing the built-in logical replication feature new in PostgreSQL 10.

A few things have changed between the two, but the general concepts are compatible. There will still be subscriptions configured in the global region and a replication slot in the remote region.

Major changes:

  • The concept of user-editable excludes has been removed. This is because the user would need to take action on both the remote and global databases to change the list of excluded tables and because this was originally only needed because of the performance issues of rubyrep.
  • There is no-longer a concept of the extension being enabled/disabled or a "node". Most of the replication operations are atomic, requiring just a single command.
  • replication specs can now be run on any local machine running PG 10

Terminology changes:

  • replication set => publication

Gemfile Outdated Show resolved Hide resolved
lib/miq_pglogical.rb Outdated Show resolved Hide resolved
lib/miq_pglogical.rb Outdated Show resolved Hide resolved
lib/miq_pglogical.rb Outdated Show resolved Hide resolved
lib/miq_pglogical.rb Outdated Show resolved Hide resolved
@Fryguy
Copy link
Member

Fryguy commented Apr 26, 2019

Awesome... love all the 🔴 red 🔴

@carbonin carbonin force-pushed the use_new_replication_gem branch 3 times, most recently from f12ded7 to 8666f5d Compare April 29, 2019 20:55
carbonin added 12 commits May 7, 2019 15:26
Most notably, the concept of a node goes away and we can set the
tables in a publication directly rather than adding and removing
them one by one.

This also eliminates the need for a lock around the publication
because the operation is atomic.
No one is using this and the concept of a "node" is gone now
Previously it was a separate suite, but now that we're using the
built-in replication technology we can put it next to the other
lib specs
This is needed to allow the subscription to pick up on tables that
were newly added to the publication in the remote region.
@carbonin
Copy link
Member Author

carbonin commented May 9, 2019

@Fryguy I think this is ready to go, the gems are released and I tested out all the edge cases I could think of.

I don't want to un-wip just yet though because I think merging this will break the UI specs if we don't also merge ManageIQ/manageiq-ui-classic#5546 rather quickly afterwards and that hasn't been reviewed yet.

@himdel
Copy link
Contributor

himdel commented May 10, 2019

Consider the UI bit approved 👍 :)

@carbonin
Copy link
Member Author

Thanks @himdel. @Fryguy can you take another look through this?

@carbonin carbonin changed the title [WIP] Use new replication gem Use new replication gem May 10, 2019
def safe_delete
pglogical.drop_subscription(id, true)
rescue PG::InternalError => e
raise unless e.message =~ /could not connect to publisher/ || e.message =~ /replication slot .* does not exist/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone has a non-English postgres installation, will this still work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not? I can try to change the appliance locale and see if things break, but the alternative would be to just remove this conditional and allow the next command to raise if it was some other error so I feel like it makes sense to just leave it in given that we don't have a more specific error to rescue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I didn't know if it was a problem if the code tries to move forward or not. I know in the past that message text checking has been brittle in particular with non-English databases, but I assume this was last resort for you because there wasn't something more specific.

# Sets the tables that should be used to create the replication set using refresh_excludes
def configured_excludes=(new_excludes)
@configured_excludes = new_excludes | ALWAYS_EXCLUDED_TABLES
@pg_connection = ApplicationRecord.connection.raw_connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this, and could this create issues with ActiveRecord connection pooling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this were to checkout a new connection I guess it could, but otherwise it's using the same connection that the worker process would be using anyway, right? I guess the exception would be if we're using any features provided specifically by AR like locking or something, but if you think it could be an issue I can grab the conn info out of this object and open up a new connection. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm just concerned that we are holding a handle to the raw connection, which may also be given out again when someone else calls ApplicationRecord.connection. I'm not sure if 2 threads using the same raw_connection is a problem or not. When we were using the ActiveRecord connection instead of the raw connection I was less concerned because that was the object being pooled. (TBH, I don't understand how the pooling works when you call ApplicationRecord.connection, since I don't know what puts it back into the pool)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely a valid concern. @jrafanie any thoughts here?

Copy link
Member

@jrafanie jrafanie May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pooling is thread based. My concern would be that we consume the connection for the life of this client and could cause us to exhaust the pool if these clients could ever run at the same time. Additionally, I don't think connection will explicitly checkin the connection so I'm not sure when this would be checked in, otherwise, it would wait until when the pool is full and the reaper starts checking for dead threads holding connections.

If you use the connection configuration to establish the connection with the PG gem directly, it feels like you'll get the parts of of activerecord you want without the risk of peeing in the pool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's thread based, and the lifetime of this MiqPglogical object is on-demand, then we are probably ok and can merge. On the other hand, if we keep a cached instance of the MiqPglogical instance, then we would leak the underlying connection, and that would be an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I was saying before about checking out a new connection. But I think it's just using the one that's already assigned to the thread.

And as far as I can tell the only place that's even saving an instance past one invocation is a rake task:

app/models/miq_region.rb:146:    MiqPglogical.new.provider?
app/models/miq_region.rb:150:    MiqPglogical.new.subscriber?
app/models/miq_region.rb:167:    MiqPglogical.new.destroy_provider   if current_type == :remote
app/models/miq_region.rb:169:    MiqPglogical.new.configure_provider if desired_type == :remote
lib/tasks/evm_dbsync.rake:24:      pgl = MiqPglogical.new

@Fryguy
Copy link
Member

Fryguy commented May 13, 2019

My only other general question is how much of this could be pushed down into the gem? There are a handful of things here that seems like they could be made into a method in the gem. Even so, I'd be ok with doing those later as a follow up.

This is the situation where the remote server is unreachable so
we need to set the replication slot to "NONE" in order to drop the
subscription without error. But we don't want to end up in an
in-between case where the subscription is disabled, but not deleted
so wrap the three commands in a transaction.
…ibutes

We have a .subscription_status method that actually returns the
status of a subscription based on the number of workers and the
enabled attribute.

The instance method is actually returning a new copy of the current
subscription's attirubtes for fetching things like the current wal
location or the password so we can more accurately name it
subscription_attributes
@carbonin
Copy link
Member Author

@Fryguy I'm not sure there's a whole lot I would want to push down into the gem, but like you said, we can catch those as we move forward from here.

@miq-bot
Copy link
Member

miq-bot commented May 13, 2019

Checked commits carbonin/manageiq@8cec170~...eed3a50 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
11 files checked, 0 offenses detected
Everything looks fine. ⭐

@Fryguy Fryguy merged commit 13051cf into ManageIQ:master May 15, 2019
@Fryguy Fryguy added this to the Sprint 112 Ending May 27, 2019 milestone May 15, 2019
@carbonin carbonin deleted the use_new_replication_gem branch August 16, 2019 15:54
jrafanie added a commit to jrafanie/manageiq-schema that referenced this pull request Jan 14, 2022
As of ManageIQ/manageiq#18686 we moved from pglogical
to the pg-logical_replication using postgresql's native logical replication.

We switched as of ivanchuk.  It should be safe to drop this now.

Funny enough, this actually helps with getting ruby 3.0 support because the AR
extension signature for drop_table needs to change to work with ruby 2.x and
3.x.
jrafanie added a commit to jrafanie/manageiq-postgres_ha_admin that referenced this pull request Sep 14, 2022
We switched to the logical replication handler in 2019, orphaning this handler.

See: ManageIQ/manageiq#18686
and specifically this commit: ManageIQ/manageiq@81ddc80
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants