Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-20915][docker] Move docker entrypoint to distribution #14630

Closed
wants to merge 8 commits into from

Conversation

zentol
Copy link
Contributor

@zentol zentol commented Jan 13, 2021

This PR moves most of the docker-entrypoint.sh logic into a new script within the distribution to reduce the maintenance burden and release issues.
Everything but the jmalloc switch and dropping of privileges has been moved.

Some minor adjustments have been made to the file;

  • _FLINK_HOME_DETERMINED is now always set to true, not just in the native-kubernetes/generic modes
  • bin/config.sh is now always sourced, not just in the native-kubernetes/generic modes
  • use more specific directories (e.g., FLINK_OPT_DIR) where possible.

This branch works against a dev flink-docker branch for CI purposes.

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 73767fc (Wed Jan 13 10:35:35 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Jan 13, 2021

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

Copy link
Contributor

@rmetzger rmetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether it makes sense to rename the script. docker-entrypoint.sh is usually the name of the "main method" of a Docker Image.

maybe we need to establish the concept of flink-docker/docker-entrypoint.sh being the "wrapper" that sets up the environment, and this one being the "docker-entrypoint-runner.sh" ?
Just thoughts ...

flink-end-to-end-tests/test-scripts/common_docker.sh Outdated Show resolved Hide resolved
###############################################################################

COMMAND_STANDALONE="standalone-job"
# Deprecated, should be remove in Flink release 1.13
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Deprecated, should be remove in Flink release 1.13
# Deprecated, should be removed in Flink release 1.13

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is against master, which will eventually become release 1.13?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only moving the script, any logic/API changes should be relegated to follow-ups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the master branch, the native K8s will not need the native-k8s command anymore. It will generate the command and arguments as followings.

  - args:
    - bash
    - -c
    - $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH -Xmx1073741824 -Xms1073741824
      -XX:MaxMetaspaceSize=268435456 -Dlog.file=/opt/flink/log/jobmanager.log -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
      -Dlog4j.configuration=file:/opt/flink/conf/log4j2.xml -Dlog4j.configurationFile=file:/opt/flink/conf/log4j2.xml
      org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
      ...
    command:
    - /docker-entrypoint.sh

We already have a ticket FLINK-20676 to track this.

@@ -0,0 +1,161 @@
#!/usr/bin/env bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script shows up in build-target/bin. If we keep it there, we need to add a message that it shall not be used manually.

Maybe, it makes sense to move the script to opt/docker/docker-entrypoint.sh to not further pollute bin/? (This is really just a thought for discussion)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a lot of stuff in bin/ at this point that users should never use directly (bash-java-utils, config.sh, flink-console.sh, flink-daemon.sh, find-flink-home.sh), that I'm not too concerned about adding one more.
I see the point of trying to hide it, but this whole "put-into-opt-but-load-it-automatically-in-some-scenarios" business that the table-api introduced is rubbing me the wrong way.
The overall semantics for bin/opt seem rather ill-defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We should clean up the bin directory in a separate effort.

@zentol
Copy link
Contributor Author

zentol commented Jan 13, 2021

Yeah I wasn't sure either about the naming business. I'm not too bothered by the current name, and think that entrypoint-runner is a bit misleading since I would would assume the runner would call the entrypoint.
Maybe docker-utils.sh would do the trick.

COMMAND_HISTORY_SERVER="history-server"

args=("$@")
echo "${args[@]}"
Copy link
Contributor

@wangyang0918 wangyang0918 Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not print any text in the pass-through mode. Otherwise, the PR for docker-library/official-images will fail since CI will run the this test[1].

[1]. https://github.com/docker-library/official-images/blob/master/test/tests/override-cmd/run.sh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, this was for debugging.

Copy link
Contributor

@wangyang0918 wangyang0918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this PR.
The only concern I have is about the counterpart change in flink-docker project. I am not sure using such a wrapper is reasonable for those docker guys. In my opinion, now the docker image is more like a blackbox and I have to check the docker-entrypoint.sh in flink project for how to use it.

exec $(drop_privs_cmd) "${FLINK_HOME}/bin/docker-entrypoint.sh" "${args[@]}"

@zentol
Copy link
Contributor Author

zentol commented Jan 14, 2021

@wangyang0918 I have outlined this approach and already gotten their approval. docker-library/official-images#9249 (comment)
(note that I had the same hunch as you though, and was actually surprised that they were fine with it)

Copy link
Contributor

@rmetzger rmetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this.

I assume you've tested the changes manually. I haven't.

@zentol
Copy link
Contributor Author

zentol commented Jan 14, 2021

I've only run CI; which should at least cover all branches. I think I broke the jmalloc switch, because it checks the wrong argument. Of course we don't have tests for that; why should you...

@zentol
Copy link
Contributor Author

zentol commented Jan 14, 2021

@flinkbot run azure

@tianon
Copy link

tianon commented Jan 15, 2021

@wangyang0918 I have outlined this approach and already gotten their approval. docker-library/official-images#9249 (comment)
(note that I had the same hunch as you though, and was actually surprised that they were fine with it)

To clarify - I did not mean simply moving the entrypoint script into Flink. I meant moving the functionality into Flink. Many of the things that happen in that script are things that probably make sense for other (non-container) users of Flink as well, and it feels wrong for those implementations to happen in the Dockerization scripts.

@wangyang0918
Copy link
Contributor

It seems that we are still not on the same page.

@zentol
Copy link
Contributor Author

zentol commented Jan 15, 2021

@tianon This is just a first step to facilitate easier refactorings. Migrating more (ideally all) of the docker-specific logic into the scripts is the long-term goal, which would be hampered if we'd have to move things bit by bit from one repo (flink-docker) to another (flink).
The concern that @wangyang0918 had was that the docker image is now a blackbox (because it does not define in any way what it accepts), and this is what I believe we agreed on to be fine (how else could we move the logic to the distribution).

In any case moving this script into the distribution should already be an overall plus because this logic is now at least accessible even outside docker use-cases.

@tianon
Copy link

tianon commented Jan 15, 2021

I agree that the Docker image being a complete black box is a bad thing, which is exactly why I'm confused by this PR. Let me try to rephrase what I'm suggesting:

Take a look at the docker-entrypoint.sh as it stands today, and identify what problems it solves that are not currently solved by the existing flink wrapper script(s) that are shipped with the upstream Flink distribution.

Determine what makes sense to add to the Flink distribution to help with those use cases that are currently not being serviced, and use the integration of those changes into docker-entrypoint.sh as a "barometer" of sorts for whether you're succeeding at decreasing the amount of above-and-beyond scripting necessary to accomplish these core Flink use cases (because at the end of the day, the length of docker-entrypoint.sh is demonstrating that there are gaps in how Flink is expected to be run).

I am not suggesting that the code in the current entrypoint script should be simply moved elsewhere, because that doesn't actually solve the underlying problem that the way the Flink distribution officially expects to be run and the way it's being run have diverged.

As a concrete example, if someone downloads the binary distribution of Flink and tries to run it on a VM, are they expected to set an appropriate CLASSPATH variable? Why or why not? How can we improve the experience for them in such a way that it also improves the experience for Docker? (Alternatively, if they are not, why does the Docker image need to, and how can we bring those two usages into a more symbiotic alignment?)

@zentol
Copy link
Contributor Author

zentol commented Jan 17, 2021

I'm confused

Well that makes 2 of us.

docker-library/official-images#9249 (comment)

If I understand you correctly, then it would be fine for us to move everything in the docker-entrypoint.sh into a new scripts within distribution, and just call that script from docker-entrypoint.sh with all passed arguments.

docker-library/official-images#9249 (comment)

Yes, I'd love to see this.

@zentol
Copy link
Contributor Author

zentol commented Jan 17, 2021

if someone downloads the binary distribution of Flink and tries to run it on a VM, are they expected to set an appropriate [FLINK_]CLASSPATH variable?

To answer your example: No, it is not required.

As for why the kubernetes mode does it: It assembles a plain java ... command, and uses ${FLINK_CLASSPATH} as a placeholder for the jvm classpath parameter (because the actual value is determined within the container based on it's contents, not on the client-side), which is populated by the script.

@zentol
Copy link
Contributor Author

zentol commented Jan 18, 2021

@tianon Note that the stance against blackboxes and against kubernetes-specific branches creates quite a few issues, and at this point I honestly do not know what is acceptable and what isn't.

a)
If deferring the entire logic to some script outside the image is a problem, then so should any approach that allows an arbitrary command to be run (i.e., what was introduced in apache/flink-docker#49). Both do something unspecified, having some unspecified requirements. I cannot comprehend why the latter was seemingly accepted though; we could now remove all the other branches and route everything through the generic path, leaving us in eventually the same state that we are in with this PR.
b)
The reality is we need some hook to activate some kubernetes-specific code; that's the whole point of our native kubernetes integration (you can still run things on kubernetes without it, this is just an option). We can't have kubernetes-specific branches in the image, and with generic branches being ruled out by a) the only alternative is adding a new isKubernetes flag argument to existing scripts in the distribution. This flag would not be mentioned in the image at all (it would just one of many arguments passed as parameter to the bin/ scripts), effectively being a hidden kubernetes branch.
I feel like that shouldn't be acceptable because we'd just be tip-toeing around the issue and the scripts using that flag, from the perspective of the image, could be considered a black box.
To be specific, the jobmanager.sh would then either start a generic jobmanager (one that would be used outside docker environments) or one specific to kubernetes based on a given argument. But if that were allowed, then I'd think a script that starts either a jobmanager or taskmanager based on some argument should also be allowed. You can apply this logic a few more times and end up with a single scripts starting some process, which was already rejected.

I understand the goal of wanting the image to be run anywhere and not containing kubernetes-specific code.
But then, how exactly are applications meant to implement kubernetes-specific features?

@wangyang0918
Copy link
Contributor

Actually, I am also confused about what is acceptable to keep in docker-entrypoint.sh and what is not. Maybe at the very begging we should not introduce the native K8s related logics to docker-entrypoint.sh.

For how to implement kubernetes-specific features, I think we could have a dedicated entrypoint for native Kubernetes integration kubernetes-entrypoint.sh. After then, docker-entrypoint.sh is used only for standalone containerized deployment. Both kubernetes-entrypoint.sh and docker-entrypoint.sh could share some common functions(e.g. copy_plugins_if_required , disable_jemalloc , etc.).

@tianon
Copy link

tianon commented Jan 19, 2021

Ok, let me try to phrase this in a simpler way.

https://github.com/apache/flink-docker/blob/d2ff00e0176e09cdbbde76412026054016a8551c/1.12/scala_2.12-java11-debian/docker-entrypoint.sh is almost 200 lines long with a lot of little bits of duplicated code.

How much of that is actually specific to using Flink in Docker, and how much of that is something that users outside Docker might be interested in / use?

It seems like a lot of it is just (explicit, separate) wrappers around other existing Flink scripts, and even many of those are sharing common behavior (copy_plugins_if_required, for example).

Could they be combined in some way such that the "Docker specific" behavior happens once and the upstream scripts are then just standard fallthrough behavior?

For example, right now users do commands like docker run ... flink history-server ... in order to run historyserver.sh. What I would expect instead is that $FLINK_HOME/bin is in PATH, that docker-entrypoint.sh looks at $1 to do any extra Docker-specific initialization for historyserver.sh, and users then do docker run ... flink historyserver.sh ... (where now, the "command" users are running in the container matches the exact name of the same command they would run for a standard install of Flink).

Rough mockup:

copy_plugins_if_required

case "$1" in
	standalone-job.sh) ... ;;
	historyserver.sh) ... ;;
	...
esac

exec "$@"

If having something that allows users to run something "prettier" like flink history-server ... is a desired goal, shouldn't that be a standard wrapper script in the upstream distribution, rather than something created specifically just for the Docker image? That could even be implemented as a set of symlinks to the official scripts with their "prettier" preferred names, if you want to keep the maintenance burden low.


As another explicit example of complexity, instead of disabling jemalloc via an explicit "command" which then has to be stripped and is order-dependent, if that were set via environment variable the logic would be much simpler:

if [ -z "${FLINK_DISABLE_JEMALLOC:-}" ]; then
	export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
fi

(To make this part of an "official" script in the actual distribution you'll likely want to make it more forgiving / detecting of the path to libjemalloc.so, but you get the idea.)


This isn't as much a matter of "acceptable" vs "unacceptable" but rather that the script is consistently growing in complexity and I'm trying to understand why it is not simpler and/or which parts of it are actually Docker-specific and necessary, because it currently reads as a whole new launcher script for Flink and that seems like something that ought to be part of the Flink distribution instead of something Docker-specific.

@zentol
Copy link
Contributor Author

zentol commented Jan 19, 2021

How much of that is actually specific to using Flink in Docker[...]?

As it stands, I'd say everything. Let's go through it shall we:

  1. Setting of certain options:
  • jobmanager.rpc.address: Outside of docker this must be set manually depending on the way you deploy Flink, but we can make use of knowledge about dockers networking to simplify the setup.
  • *.port: Outside of docker these are determined randomly by default, but we set them to static values here since we know that with docker they cannot conflict with others, and it simplifies the network rules setup for users.
  • taskmanager.numberOfTaskSlots: This is essentially legacy behavior that we inherited. Outside of docker there are set manually by users based on their resources/requirements.
  1. copying plugins
  • It was recommended to us to allow any modifications that are usually done manually be made possible via environment variables. This is one such example. There are no plans to move this upstream.
  1. FLINK_PROPERTIES
  • Similarly to copying plugins, this was added as a convenience for modifying the configuration. Outside of docker this file is modified manually, and there is little benefit making this generally available.
  1. envsubst stuff: Again some inherited stuff we can't throw out as of yet. There is a current discussion on the Flink dev mailing list to support this in Flink itself.
  2. jemalloc stuff
  • This switch was introduced due to a problem that is specific to the distribution used by docker image. While it is something that is potentially useful in other cases we have no interest in accommodating all possible platforms to such a degree.
  1. drop_privs_cmd
  • Relies on the existence of a user account that is setup in the Dockerfile.
  1. wrapping of commands
  • primarily exists to set the start-foreground flag, as by default Flink processes run in the background. This is easier to do if there is some abstraction layer in between the user and Flink; in your model we'd have to inject a new parameter into the script arguments.

There are some things we can certainly simplify (like deduplicating the copy_plugins calls, or jemalloc being controlled by an environment variable (that is in fact already implemented and just needs to be merged)).
The idea to have users call scripts directly is generally a good one, but it does bear the risk of users using functionality that needs some docker-specific logic that we have yet to set up. Ultimately, our intend neither is nor ever was was to provide an image that allows everything in the distribution to be used, but to only ease the setup of singular Flink processes. Maybe this is where some of the dissonance comes from.
Nevertheless I'd be interested in trying this out and checking in with others on what they think about it.

This isn't as much a matter of "acceptable" vs "unacceptable".

I'd beg to differ, given that docker-library/official-images#9249 has now been sitting around for over a month, forcing us to set up a secondary docker image distribution channel.

@tianon
Copy link

tianon commented Jan 19, 2021

I'd beg to differ, given that docker-library/official-images#9249 has now been sitting around for over a month, forcing us to set up a secondary docker image distribution channel.

Our experience with "we'll fix this in the next version" is that it doesn't happen.

The reason we're pushing back on 1.12 is that this functionality is introduced there, and as far as the image is concerned, isn't released yet, so I'm having a hard time understanding why things like the jemalloc usage can't be fixed in 1.12 before the images are released? I get that the kubernetes functionality is unfortunately already baked in to the 1.12 upstream release, so we can't really "fix" that until 1.13.

As a compromise (and to show good faith), if we can clean up the jemalloc usage (or instead remove/revert it pending 1.13), I'm happy to merge 1.12 and get that moving under the shared understanding that we find a better usage pattern for kubernetes in 1.13 (cc @yosifkit).

@zentol
Copy link
Contributor Author

zentol commented Jan 25, 2021

Our experience with "we'll fix this in the next version" is that it doesn't happen.

I can understand that mindset.

We're happy to take you up on that compromise; we are cleaning up the script right now with the last PR to be merged today, and will update the official-images PR shortly.
The follow-up regarding Kubernetes will be tracked in FLINK-21128.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants