Version 11 terminates unexpectedly #520

Closed
chranmat opened this issue Oct 30, 2018 · 20 comments

@chranmat

Today, while building and running my application, I discovered that Postgres terminates unexpectedly during the application's initialization.

On further investigation, I can see that the latest Postgres image is now version 11. (I hadn't pinned a specific version in my Dockerfile.)

My application does a bunch of initialization tasks on start, and Postgres terminates during one of these.

Going back to version 10.5 solves the issue for me, but there is certainly an issue either with Postgres itself or with the default pg11 configuration in the Docker image.
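Because an untagged `postgres` reference resolves to the `latest` tag, a surprise major upgrade like this one can be caught by checking image references for an explicit tag. The following is a minimal sketch under simplified parsing assumptions; the function name `is_pinned` is hypothetical and not part of Docker or this image.

```python
def is_pinned(image_ref):
    """Return True if the image reference names an explicit tag or digest.

    Simplified parsing (an assumption, not Docker's full reference grammar):
    - "postgres"                -> False (implicitly "latest")
    - "postgres:latest"         -> False
    - "postgres:10.5"           -> True
    - "localhost:5000/postgres" -> False (the colon belongs to the registry port)
    """
    if "@" in image_ref:
        # Digest references such as name@sha256:... are always fixed.
        return True
    # Only the last path component can carry a tag.
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False
    tag = last_component.split(":", 1)[1]
    return tag not in ("", "latest")
```

Note that a major-only tag such as `postgres:11` still moves between minor releases, so it counts as pinned here even though its contents can change.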

Here is my log output:

api_1 | Exception: SQLSTATE[HY000]: General error: 7 server closed the connection unexpectedly
api_1 | This probably means the server terminated abnormally
api_1 | before or while processing the request.
api_1 | Exception 'PDOException' with message 'SQLSTATE[HY000]: General error: 7 no connection to the server'
db_1 | 2018-10-30 09:48:50.610 UTC [1] LOG: server process (PID 71) was terminated by signal 11: Segmentation fault
db_1 | 2018-10-30 09:48:50.610 UTC [1] DETAIL: Failed process was running: DELETE FROM "dashboard_layouts" WHERE "position"=$1
db_1 | 2018-10-30 09:48:50.610 UTC [1] LOG: terminating any other active server processes
db_1 | 2018-10-30 09:48:50.610 UTC [67] WARNING: terminating connection because of crash of another server process
db_1 | 2018-10-30 09:48:50.610 UTC [67] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
db_1 | 2018-10-30 09:48:50.610 UTC [67] HINT: In a moment you should be able to reconnect to the database and repeat your command.
db_1 | 2018-10-30 09:48:50.613 UTC [1] LOG: all server processes terminated; reinitializing
db_1 | 2018-10-30 09:48:50.627 UTC [72] LOG: database system was interrupted; last known up at 2018-10-30 09:48:46 UTC
db_1 | 2018-10-30 09:48:50.944 UTC [72] LOG: database system was not properly shut down; automatic recovery in progress
db_1 | 2018-10-30 09:48:50.950 UTC [72] LOG: redo starts at 0/1654800
db_1 | 2018-10-30 09:48:50.971 UTC [72] LOG: invalid record length at 0/17C72F8: wanted 24, got 0
db_1 | 2018-10-30 09:48:50.971 UTC [72] LOG: redo done at 0/17C72D0
db_1 | 2018-10-30 09:48:50.971 UTC [72] LOG: last completed transaction was at log time 2018-10-30 09:48:50.600994+00
db_1 | 2018-10-30 09:48:51.054 UTC [1] LOG: database system is ready to accept connections
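For anyone triaging similar output, here is a minimal sketch of pulling the crashed PID, the signal, and the failing statement out of log lines like the ones above. The function name `find_crashes` and both regular expressions are illustrative assumptions based only on the lines quoted in this thread.

```python
import re

# Patterns are assumptions derived from the log lines quoted in this thread.
CRASH_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by signal (\d+): (.+)$"
)
DETAIL_RE = re.compile(r"DETAIL:\s+Failed process was running: (.+)$")


def find_crashes(log_text):
    """Return one dict per crash: pid, signal, reason, statement (or None)."""
    crashes = []
    for line in log_text.splitlines():
        crash = CRASH_RE.search(line)
        if crash:
            crashes.append({
                "pid": int(crash.group(1)),
                "signal": int(crash.group(2)),
                "reason": crash.group(3).strip(),
                "statement": None,
            })
            continue
        detail = DETAIL_RE.search(line)
        # Attach the DETAIL line to the most recent crash that lacks one.
        if detail and crashes and crashes[-1]["statement"] is None:
            crashes[-1]["statement"] = detail.group(1).strip()
    return crashes
```

Feeding it the output of `docker-compose logs db` (or any saved copy of the log) gives a quick list of which statements were running when a backend died.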

@wglambert added the "question" label (Usability question, not directly related to an error with the image) on Oct 30, 2018
@wglambert

Can you give all the commands you ran and any relevant contextual information or files for reproducing the issue?

@yosifkit
Member

The most likely culprit would be that the data volume you had for postgres was created by version 10.5; PostgreSQL cannot read data directories of older versions and is unable to auto-upgrade. Related to and duplicate of #37.

@chranmat
Author

chranmat commented Oct 31, 2018

@yosifkit, there was no previous data volume attached to the container. I've read issue #37 that you refer to; how can you possibly assume this and close the case?

@chranmat
Author

@wglambert, I will see if I can reproduce it without providing all sources :)

@erikdstock

erikdstock commented Nov 2, 2018

Edit: postgres:10.5 worked for me as well.
I'm having what seems to be the exact same issue.

db_1   | 2018-11-02 01:00:22.248 UTC [1] LOG:  server process (PID 57) was terminated by signal 11: Segmentation fault
db_1   | 2018-11-02 01:00:22.248 UTC [1] DETAIL:  Failed process was running: COMMIT
db_1   | 2018-11-02 01:00:22.249 UTC [1] LOG:  terminating any other active server processes
db_1   | 2018-11-02 01:00:22.249 UTC [52] WARNING:  terminating connection because of crash of another server process
db_1   | 2018-11-02 01:00:22.249 UTC [52] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
db_1   | 2018-11-02 01:00:22.249 UTC [52] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
db_1   | 2018-11-02 01:00:22.254 UTC [1] LOG:  all server processes terminated; reinitializing
db_1   | 2018-11-02 01:00:22.281 UTC [58] LOG:  database system was interrupted; last known up at 2018-11-02 00:52:29 UTC
db_1   | 2018-11-02 01:00:22.282 UTC [59] FATAL:  the database system is in recovery mode
db_1   | 2018-11-02 01:00:22.758 UTC [58] LOG:  database system was not properly shut down; automatic recovery in progress
db_1   | 2018-11-02 01:00:22.762 UTC [58] LOG:  redo starts at 0/183B4E8
db_1   | 2018-11-02 01:00:22.763 UTC [58] LOG:  redo done at 0/183B4E8
db_1   | 2018-11-02 01:00:22.785 UTC [1] LOG:  database system is ready to accept connections

I can try to provide more context around this but here are a few basics:

  • I had recently moved a django/wagtail CMS from sqlite/docker to postgres/docker-compose.
  • The error occurs in a django bootstrapping script after build: the manage.py migrate and createadminuser tasks work fine, but a custom task fails on its first db operation.
  • It worked fine on a handful of development machines (all OS X).
  • It did not work on another 2 machines, which gave the segfault above. These two machines were building later and have a different image id.
  • I also tried removing all associated volumes/images/containers and beginning from a freshly cloned repo. This included attempting it with a clean checkout in a new directory, since I was worried some bind mount was shadowing files on the running container.

@izakp

izakp commented Nov 2, 2018

Same problem here - getting terminated by signal 11: Segmentation fault when trying to load data, starting with no initial data or volume mounted to the /var/lib/postgresql/data directory. As with @chranmat the postgres:10.5 image works as expected. @yosifkit I don't think this is a duplicate of #37 either... will you reopen?

@wglambert

So @izakp, you're getting this error from a blank startup with no data mounted? I think your example would be the most concise for reproducing the issue. Could you post any relevant files that you have?

@wglambert reopened this on Nov 2, 2018
@erikdstock

(Updated my comment to point out that 10.5 did work for me. I double-checked it after seeing izakp's comment.)

@wglambert added the "Issue" label and removed the "question" label (Usability question, not directly related to an error with the image) on Nov 2, 2018
@raarts

raarts commented Nov 5, 2018

I don't know if I have the exact same problem, but it's also in a DELETE query, and I can reproduce it with odoo on my mac:

docker run --name postgres -d postgres:11-alpine -c log_min_duration_statement=0

Enter the container and create an odoo user:

CREATE ROLE odoo with LOGIN CREATEDB PASSWORD 'odoo';

and now run odoo:

docker run -it --link postgres:postgres -p 8000:8069 -e DB_PORT_5432_TCP_ADDR=postgres -d odoo

Now:

  1. connect to localhost:8000
  2. create a database (fill in some email address/password, and choose English & United States, no demo data)
  3. install the 'Website' module (it's at the top).

This fires a lot of queries, and after a while postgres crashes. The odoo container shows the query it crashed on, and issuing that query by hand repeats the crash.

The crash does not happen with v10.

EDIT: I found that only deleting very long rows crashed the server.

@tianon
Member

tianon commented Nov 5, 2018 via email

@izakp

izakp commented Nov 5, 2018

@wglambert replying to your questions above...

> you're getting this error from a blank startup with no data mounted?

Sorry, not quite this... Postgres successfully initializes in the container from a blank data directory with nothing mounted. Once it is up and running, it segfaults when I try to batch INSERT data into the server. Unfortunately, I can't post the data here as it's sensitive.

> I think your example would be the most concise for reproducing the issue, could you post any relevant files that you have

I'll try to narrow it down to the specific query that topples it during the import.

@labkey-tchad

This appears to be a known issue in Postgres 11.0.
It is fixed in 11.1.

@yosifkit
Member

yosifkit commented Nov 9, 2018

11.1 will be built and pushed once docker-library/official-images#5054 merges.

You can test early by building the current 11 context: https://github.com/docker-library/postgres/tree/64bec4b1617291e3646e4e7dbbae1174404c3fd9/11.

@raarts

raarts commented Nov 9, 2018

Thanks, will test and report back.

@tedivm

tedivm commented Nov 10, 2018

I'm seeing the same issue, and the new 11.1 image has not resolved it.

@yosifkit
Member

I am unable to reproduce the crash on 11.1. I used the query from the linked bug. It does reliably crash 11.0 but not 11.1:

create table foo (a int primary key, b int);
create table bar (a int references foo on delete cascade, b int);
insert into foo values (1, 1);
insert into foo values (2, 2);
alter table foo add c int;
alter table foo drop c;
delete from foo;

Source: https://www.postgresql.org/message-id/9cb4aa1c-12ba-59c3-fd75-545fa90fb92f%40lab.ntt.co.jp

So the linked bug does fix one crash. If you are having other crashes, then it is still probably an upstream bug and you should report a minimal reproducer to them.
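To check quickly whether a given server is on the release this thread identifies as crashing (11.0, fixed in 11.1), the version string can be compared in code. This is a minimal sketch: the function names are hypothetical, the input is assumed to look like the output of `SHOW server_version`, pre-release strings such as betas are not handled, and it covers only the one crash discussed here, not other possible crashes.

```python
import re


def parse_server_version(text):
    """Extract (major, minor) from a string such as '11.0' or '11.0 (Debian ...)'."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (text,))
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor)


def has_known_delete_crash(version_text):
    """True only for 11.0, the release reported in this thread as crashing."""
    return parse_server_version(version_text) == (11, 0)
```

A result of `False` only means the server is not on 11.0; it says nothing about crashes with a different cause.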

@raarts

raarts commented Nov 12, 2018

Can confirm that the odoo DELETE bug is also fixed in 11.1. Thanks!

@tedivm

tedivm commented Nov 12, 2018

Thanks all. We've downgraded back to 10.5 and our issues have gone away. When I can properly replicate it I'll report it upstream.

@tianon
Member

tianon commented Nov 21, 2018

👍

@tianon closed this as completed on Nov 21, 2018
JaneJeon added a commit to JaneJeon/objection-authorize that referenced this issue Dec 25, 2021
@JaneJeon

At least on GitHub Actions, v13 and v14 reliably fail with this issue (I have confirmed that v10.5 through v12 work). Just stand up a container with a small app that connects to it as "test"; it will always fail on CI. However, it doesn't fail locally, which makes debugging all the more tricky.


10 participants