Segv when wrapped in a Timeout.timeout #586

Closed
BuonOmo opened this issue Sep 5, 2024 · 7 comments · Fixed by #589

Comments

BuonOmo commented Sep 5, 2024

Here's the failure (and ruby/system versions):

https://github.com/rgeo/activerecord-postgis-adapter/actions/runs/10715263199/job/29710385170

I'm not so used to reading segv messages.

I'll try to see if I can get a stable reproduction.


I could not reproduce it, but I found another issue that is likely related (if it turns out not to be, I'll split this into two issues later).

The reset method calls resolve_hosts and reassigns the new iopts to the newly created connection. But with a localhost host, this gets wild quite fast if you reset the connection a few times:

require "pg"

iopts = { host: "localhost", port: "5432" }

loop do
	p iopts # the host and port entry counts double every time
	iopts = PG::Connection.send(:resolve_hosts, iopts)
end

A more complete reproduction would be:

conn = PG.connect(host: "localhost", dbname: "postgres")
10.times { conn.reset }
# The initial connect resolves "localhost" to two entries, and each of the
# 10 resets doubles the list: 2 * 2**10 = 2048 hosts, hence 2047 commas.
conn.conninfo_hash[:host].count(",") # expected 0, obtained 2047

BuonOmo added a commit to cockroachdb/activerecord-cockroachdb-adapter that referenced this issue Sep 5, 2024

larskanis (Collaborator) commented Sep 5, 2024

The reset method calls resolve_hosts and reassigns the new iopts to the newly created connection. But with a localhost host, this gets wild quite fast if you reset the connection a few times:

It is normal and expected to get several hosts in conn.conninfo_hash[:host] when the host resolves to several IP addresses. In your case, localhost seems to resolve to 2048 IP addresses, which is uncommon.

What is the result of conn.conninfo_hash[:hostaddr]?

See also #490
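
For reference, a quick way to inspect both values on a local setup (the host and dbname below are assumptions):

require "pg"

conn = PG.connect(host: "localhost", dbname: "postgres")
p conn.conninfo_hash[:host]     # e.g. "localhost,localhost" when localhost resolves to two addresses
p conn.conninfo_hash[:hostaddr] # e.g. "::1,127.0.0.1"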

larskanis (Collaborator):

From your stacktrace above:

/usr/lib/x86_64-linux-gnu/libpq.so.5(0x7fcc29418742) [0x7fcc29418742]
/usr/lib/x86_64-linux-gnu/libpq.so.5(PQgetCancel+0x104) [0x7fcc29419fb4]
[...]/pg-1.5.7/lib/pg_ext.so(pgconn_backend_key+0x1c) [0x7fcc294780ec]
[...]/pg-1.5.7/ext/pg_connection.c:1017

So Rails is trying to cancel all running queries, but this segfaults at the C level.
Internally ruby-pg calls PQgetCancel(conn) here to allocate a cancel object. Only one parameter is passed from ruby-pg to libpq: conn, and pg_get_pgconn() checks that it is non-NULL before the call. So I have no idea what ruby-pg could be doing wrong here.

connection.rb:597: [BUG] Segmentation fault at 0x0000000000000000

For me it looks like a segfault due to some out-of-memory condition in PQgetCancel. It should return NULL in that case without a segfault, but it seems not to.
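
For context, the pattern that reaches this code path looks roughly like the following. This is a speculative sketch of the scenario (host, dbname and query are assumptions), not a verified reproduction of the crash:

require "pg"
require "timeout"

conn = PG.connect(host: "localhost", dbname: "postgres")

begin
  Timeout.timeout(0.1) do
    conn.exec("SELECT pg_sleep(5)")
  end
rescue Timeout::Error
  # Cancelling the running query goes through PQgetCancel in libpq.
  conn.cancel
end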

BuonOmo (Author) commented Sep 5, 2024

It is normal and expected to get several hosts in conn.conninfo_hash[:host] when the host resolves to several IP addresses. In your case, localhost seems to resolve to 2048 IP addresses, which is uncommon.

It is not normal. What happens is that #resolve_hosts takes localhost as host, and since this resolves to ::1,127.0.0.1 as hostaddr, it returns localhost,localhost as host so that host and hostaddr have the same number of entries. The next call then loops twice over localhost, giving ::1,127.0.0.1,::1,127.0.0.1, which yields 4 * localhost, and so on.
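
To make the doubling concrete, here is a simplified model of that behavior (illustrative only, not the actual PG::Connection.resolve_hosts implementation):

# Each "localhost" entry resolves to two addresses, and the host list is
# repeated so it has as many entries as hostaddr.
def resolve_hosts_model(host)
  entries  = host.split(",")
  hostaddr = entries.flat_map { ["::1", "127.0.0.1"] }.join(",")
  host     = entries.flat_map { |h| [h, h] }.join(",")
  [host, hostaddr]
end

host = "localhost"
3.times do
  host, _hostaddr = resolve_hosts_model(host)
  p host # "localhost,localhost", then 4 entries, then 8 ...
end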

The reproduction makes it quite obvious if you want to try it locally!

The issue you are pointing to references the same bug, but if #resolve_hosts is called only once, this is fine. More than once and it starts burning the house down 🔥

BuonOmo (Author) commented Sep 5, 2024

For me it looks like a segfault due to some out-of-memory condition in PQgetCancel. It should return NULL in that case without a segfault, but it seems not to.

That would make sense, since the conninfo string gets twice as big after every reconnect!

larskanis (Collaborator):

The next call then loops twice over localhost, giving ::1,127.0.0.1,::1,127.0.0.1, which yields 4 * localhost, and so on.

Now it makes sense to me! This is a regression in pg-1.5.6 introduced by #559. I'll prepare a fix.

larskanis added a commit to larskanis/ruby-pg that referenced this issue Sep 5, 2024
Commit 016c17c introduced a regression that leads to host list duplication every time conn.reset is used.

Since the hosts are duplicated when resolve_hosts is called, conn.conninfo_hash can not be used.
So the fix is to store the original hash of connection options for later use in conn.reset.

Fixes ged#586
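
A rough sketch of the idea in this fix (illustrative only, not the actual ruby-pg code): keep the caller's original, unresolved options around and resolve hosts from that copy on every reset, instead of feeding the already expanded options back into the resolver.

# Illustrative model of the fix idea; "resolve" is a stand-in for host
# resolution, not pg API.
class ConnectionModel
  attr_reader :current_opts

  def initialize(opts)
    @original_opts = opts.dup          # keep the caller's unresolved options
    @current_opts  = resolve(@original_opts)
  end

  def reset
    # Always resolve from the original hash, never from @current_opts.
    @current_opts = resolve(@original_opts)
  end

  private

  # Stand-in resolver: "localhost" expands to two entries.
  def resolve(opts)
    hosts = opts[:host].split(",").flat_map { |h| [h, h] }
    opts.merge(host: hosts.join(","))
  end
end

conn = ConnectionModel.new(host: "localhost", port: "5432")
10.times { conn.reset }
p conn.current_opts[:host] # "localhost,localhost" -- no longer doubles per reset
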
BuonOmo (Author) commented Sep 6, 2024

Any plans to make this part of a release soonish? I'd rather upgrade the dependency in the PostGIS adapter to be sure people don't get their connections bloated (#reset occurs frequently enough in Rails for this to be an issue).

larskanis (Collaborator):

pg-1.5.8 is out with this fix included.
