
how to run as an intermediate container in a multistage Dockerfile? #661

Closed · the-vampiire opened this issue Jan 9, 2020 · 4 comments
Labels: question (Usability question, not directly related to an error with the image)

Comments

@the-vampiire

Given that this isn't an error with the image, you might have better responses asking over at the Docker Community Forums, the Docker Community Slack, or Stack Overflow, as these repositories are for issues with the image and not necessarily for questions of usability.

I have followed the recommendation and asked this question on Reddit, Stack Overflow, and the Docker community Slack, and have not received any help. I am turning to the maintainers because I am at a loss.

use case

I am looking for some advice on the best way to accomplish a build. I have it working manually, but I would like to set it up as a multistage Docker build to automate it.

goal: a Docker image (from postgres:9.4) that has a fully set-up database and users, for testing and distribution within my team.

The setup scripts take a long time to run, which is why I'd like the final image to contain just the resultant data rather than re-executing the scripts every time the container is run.

I would then like to extend this pre-built image by adding additional user scripts. This is another issue I am having (with the manually created image), because initdb does not execute if the data dir is already populated, and I run into the same issue as in the multistage build (needing to run an intermediate container to execute the scripts). If I can learn how the multistage version works, I can figure out the extension problem.

manual steps

[image 1] write a Dockerfile that builds from postgres:9.4 and copies over the custom scripts

run the container (to execute the scripts and set up the users / dbs) with a mounted volume to store the postgres data on the host

[image 2] write another Dockerfile that builds from postgres:9.4 and copies in the data from the mounted volume

run a container from the final image (2), which now has all the data ready to go without needing to execute the scripts
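Roughly, as shell commands (a sketch; the image tags, password, and Dockerfile.final name are placeholders I made up):

# [image 1] build the image that carries the setup scripts
docker build -t pg-setup .

# run it once so the entrypoint executes the scripts; the generated
# cluster lands in the host directory. stop the container (ctrl-c)
# once initialization finishes.
docker run --rm -e POSTGRES_PASSWORD=secret \
  -v "$PWD/pgdata:/var/lib/postgresql/data" pg-setup

# [image 2] build the final image whose Dockerfile COPYs pgdata/ in
docker build -t pg-prebuilt -f Dockerfile.final .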

Is this possible to do in a multistage Dockerfile so it can be automated?

multistage automated attempt

FROM postgres:9.4 AS builder
COPY custom-scripts/ /docker-entrypoint-initdb.d/
COPY data/ /tmp
# run the postgres process in the container?
# i saw this as the CMD for the postgres image
# also tried running /usr/local/bin/docker-entrypoint.sh
RUN postgres
FROM postgres:9.4
COPY --from=builder /var/lib/postgresql/data /var/lib/postgresql/data
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
  CMD pg_isready -U "$DB_USER" -d "$DB_NAME"

what is happening

"root" execution of the PostgreSQL server is not permitted.
The server must be started under an unprivileged user ID to prevent
possible system security compromise. See the documentation for
more information on how to properly start the server.

So I tried adding USER postgres before this directive and got the following error:

postgres cannot access the server configuration file "/var/lib/postgresql/data/postgresql.conf": No such file or directory
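(Note: even if the server did start here, it would not help. The postgres image declares VOLUME /var/lib/postgresql/data, and anything a RUN step writes into a declared volume path is discarded when that build step finishes, so the initialized data would never survive into the builder stage. This is why the answer below moves PGDATA outside the volume.)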
@wglambert added the "question" label Jan 9, 2020
@yosifkit (Member) commented Jan 9, 2020

Ok, so this Dockerfile should give you a fully set-up DB, with the data stored in the image:

FROM postgres
# set pgdata to some path outside the VOLUME that is declared in the image
ENV PGDATA /var/lib/postgresql/custom
# TODO: make sure the postgres user owns PGDATA and has access to any new directories
USER postgres
COPY custom-scripts/ /docker-entrypoint-initdb.d/
COPY custom-entrypoint.sh /usr/local/bin/
RUN custom-entrypoint.sh postgres
ENTRYPOINT [ "custom-entrypoint.sh" ]
CMD [ "postgres" ]

And use the custom entrypoint described at the end of the comment here: #496 (comment). Just make sure to put an exit 0 at the end of the if block, so that it exits once initialized. Then you can put anything you want added at startup inside /always-initdb.d/.
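Presumably something like this (a sketch based on the entrypoint from the linked comment, not a verbatim copy), with the exit 0 closing out the first-run branch:

if [ -z "$DATABASE_ALREADY_EXISTS" ]; then
	docker_init_database_dir
	pg_setup_hba_conf
	docker_temp_server_start "$@"
	docker_setup_db
	docker_process_init_files /docker-entrypoint-initdb.d/*
	docker_temp_server_stop
	exit 0   # during the image-build RUN step, stop here instead of falling through to exec postgres
fi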

@yosifkit closed this as completed Jan 9, 2020
@the-vampiire (Author) commented Jan 10, 2020

@yosifkit thank you so much, man. This has been bugging me for a while and I was out of approaches to solving it.

I had to make a few changes to your suggestion, but I finally got it working!

I now have a pre-built image with the data loaded, and I have successfully extended that pre-built image with "always init" scripts to create another one. So seriously, thank you again for your help.

I will post the files I ended up using below, in case someone else comes across this. But I have some questions about why these changes were needed, if you have a minute.

questions

  • Why do we run custom-entrypoint.sh postgres, which leads to exec postgres postgres and an error?
    • Is that just to make it fail so the process stops in the RUN command?
  • Where should the exit 0 statement have been added in the script? I was only able to get it working with RUN custom-entrypoint.sh postgres || exit 0 to catch the error and exit 0
    • Nothing within the script would resolve the fatal error, and the build would fail
  • Is there anything concerning about replacing CMD ["postgres"] with CMD ["-c", "max_locks_per_transaction=512"] to avoid having to append the option when running a container?
    • Leaving CMD ["postgres"] would otherwise cause the container to fail with the same fatal error as above
    • Is there anything else I can use to make it work without requiring an explicit container run command override?

changes

using the custom entrypoint script in a RUN command:

No matter where I put the exit 0 in the custom entrypoint script, the build would still fail. I believe this is because the final exec postgres "$@" runs outside the if blocks you suggested the exit statement be written under, so the exit never had an effect.

it failed with

+ exec postgres postgres
postgres: invalid argument: "postgres"
# your version, with exit 0 attempted in all possible locations of the script file
RUN custom-entrypoint.sh postgres

# what i had to do to make it work
RUN custom-entrypoint.sh postgres || exit 0

the root CMD

While the image would build fine with the CMD you suggested, the container would fail at run time (with the same postgres: invalid argument error as above).

I saw that the docker run command suggested in the linked thread, which appends custom-entrypoint.sh -c max_locks_per_transaction=512, worked. But I did not want to have to append that every time, as team members are likely to hit the error if they forget.

I changed the CMD so the setting is applied automatically whenever the container is run. This works now; the only downside is that every psql command run against the container triggers a SET for the setting. This is completely tolerable, but for the record I could not find anything else to put in CMD that would make it work.

# your version (when running the container normally): postgres: invalid argument: "postgres"
CMD [ "postgres" ]

# what i did to make it work, only downside is SET command run on every psql request
CMD [ "-c", "max_locks_per_transaction=512" ]

resolving permissions

I created the always-init dir as root, and copied / ran everything else as the postgres user.

# create entrypoint and always init dirs with ownership to root
RUN mkdir "${ALWAYS_INIT_DIR}"

USER postgres

# all the script COPYs and the RUN custom-entrypoint.sh step come after this, with RUN executing as the postgres user

@the-vampiire (Author)

In case someone else comes across this, here is the final solution I used:

creating a base image with pre-built data

/project
  Dockerfile
  custom-entrypoint.sh
  data/ <--- optional, initial CSV data i used to load in
  data-init-scripts/ <--- scripts to initialize the pre-built database
  always-init-scripts/ <--- scripts you want to run every time the container is started

In my case always-init-scripts/ was empty for the base image, and extended images added scripts there.

Dockerfile

# set to your tag as needed
FROM postgres:9.4

ENV PGDATA=/var/lib/postgresql/custom
ENV ALWAYS_INIT_DIR=/always-initdb.d

# -- RUNNING AS: root --

# remove this RUN command if you do not use postgis
# install postgis extensions
RUN apt-get update \
 && apt-get install --no-install-recommends -y postgresql-9.4-postgis-2.3 postgresql-9.4-postgis-2.3-scripts \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# create entrypoint and always init dirs with ownership to root
RUN mkdir "${ALWAYS_INIT_DIR}"

# -- RUNNING AS: postgres --

# switch to postgres for the later RUN steps (note: COPY still writes files as root unless --chown is used; the scripts only need to be readable)
USER postgres

# remove this line if you are not loading any data / change the destination as needed
COPY data/ /tmp/

COPY custom-entrypoint.sh /usr/local/bin/
COPY data-init-scripts/ /docker-entrypoint-initdb.d/
COPY always-init-scripts/ "${ALWAYS_INIT_DIR}"/

# create custom data dir with ownership to postgres user
RUN mkdir "${PGDATA}"

# execute custom entry point to initialize with scripts and build the data dir
RUN custom-entrypoint.sh postgres || exit 0

ENTRYPOINT [ "custom-entrypoint.sh" ]

# using this CMD means docker run ... doesn't require a command override;
# the only downside is that it calls SET for this parameter on every psql issued to the container
CMD [ "-c", "max_locks_per_transaction=512" ]

custom entrypoint

from the thread linked above (#496 (comment))

#!/usr/bin/env bash
set -Eeo pipefail

source "$(which docker-entrypoint.sh)"

docker_setup_env
docker_create_db_directories

if [ -z "$DATABASE_ALREADY_EXISTS" ]; then
	docker_verify_minimum_env
	docker_init_database_dir
	pg_setup_hba_conf
	
	# only required for '--auth[-local]=md5' on POSTGRES_INITDB_ARGS
	export PGPASSWORD="${PGPASSWORD:-$POSTGRES_PASSWORD}"
	
	docker_temp_server_start "$@" -c max_locks_per_transaction=256
	docker_setup_db
	docker_process_init_files /docker-entrypoint-initdb.d/*
	docker_temp_server_stop
else
	docker_temp_server_start "$@"
	docker_process_init_files /always-initdb.d/*
	docker_temp_server_stop
fi

exec postgres "$@"
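Note: sourcing docker-entrypoint.sh like this pulls in its helper functions (docker_setup_env, docker_temp_server_start, etc.) without starting the server, because the official script only runs its main logic when executed directly, not when sourced.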

@the-vampiire (Author)

extending the base image with the pre-built data

This uses the pre-built image as a base, then adds scripts to the always-init-scripts dir to customize it with more additions on top of the existing database.

/project
  Dockerfile
  always-init-scripts/ <--- put new scripts to run on top of pre-built data image

Dockerfile

# replace with the name of your pre-built image
FROM pre-built-image-name

COPY always-init-scripts/ /always-initdb.d/
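Build and run it the same way (names are placeholders):

docker build -t pg-extended .
docker run --rm -p 5432:5432 pg-extended
# on startup the entrypoint takes the "database already exists" branch
# and runs whatever is in /always-initdb.d/ before serving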
