Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FLUME-3415 - Provide Docker image for Flume #351

Open
wants to merge 23 commits into
base: trunk
Choose a base branch
from

Conversation

tmgstevens
Copy link
Contributor

Draft docker build and integration tests.

@busbey
Copy link
Contributor

busbey commented Mar 7, 2022

is there a jira that describes the goals of this PR?

@tmgstevens tmgstevens changed the title Docker FLUME-3415 - Provide Docker image for Flume Mar 28, 2022
@tmgstevens
Copy link
Contributor Author

Reverting the now duplicated HBase client changes

tmgstevens and others added 3 commits May 11, 2022 19:21
Comment on lines +78 to +85
- name: Deploy Docker (Linux)
if: runner.os == 'Linux' && github.ref == 'refs/heads/trunk'
timeout-minutes: 90
shell: bash
run: |
./mvnw deploy -Ddockerfile.username=$${{ secrets.DOCKERHUB_USER }} \
-Ddockerfile.password=... $${{ secrets.DOCKERHUB_TOKEN }}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be publishing snapshot builds of trunk to DockerHub. Can we make sure we're only publishing released artifacts?

LABEL org.opencontainers.image.authors="${MAINTAINER}"
LABEL site="https://flume.apache.org"

COPY target/apache-flume-${ASSEMBLY_VERSION}-bin/apache-flume-${ASSEMBLY_VERSION}-bin/ /etc/flume/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't be putting executables into /etc. Since we're making a flume user can we put the executables into that user's home directory?

Comment on lines +95 to +98
<groupId>com.spotify</groupId>
<artifactId>dockerfile-maven-plugin</artifactId>
<version>${dockerfile.version}</version>
<!-- Wire up to the default build phases -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this plugin is in "public archived" status. we should treat it as EOL and should not rely on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +100 to +126

@After
public void tearDown() throws Exception {
DockerInstall.getInstance().stopAgent();
for (File tempResource : tempResources) {
tempResource.delete();
}
agentProps = null;
}

/**
* File channel in/out test. Verifies that all events inserted into the
* file channel are received by the sink in order.
* <p>
* The EXEC source creates 100 events where the event bodies have
* sequential numbers. The source puts those events into the file channel,
* and the FILE_ROLL The sink is expected to take all 100 events in FIFO
* order.
*
* @throws Exception
*/
@Test
public void testInOut() throws Exception {
LOGGER.debug("testInOut() started.");


DockerInstall.getInstance().startAgent("a1", agentProps, tempResources);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a code smell here due to the agent start being in the test while the agent stop is in an after block. either start and stop should both be in the test, or the start should be in a before.

Comment on lines +47 to +49
@Before
public void setUp() throws Exception {
/* Create 3 temp dirs, each used as value within agentProps */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should add a JUnit Assert.assumeTrue to skip the test cases when Docker isn't available on the system.

ImmutableList.Builder<String> builder = new ImmutableList.Builder<>();
builder.add("/bin/sh");
builder.add("-c");
builder.add("'docker kill " + containerId + "'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of the cases where docker is invoked here should be an environment variable or some such that defaults to "docker" and allows whoever is running the test to specify a fully qualified path to the docker executable.

@rgoers
Copy link
Member

rgoers commented Oct 31, 2022

Now that I look at this again I don't think I really understand how it helps.

  1. It packages all the artifacts whether they are required or not.
  2. Unless I am mistaken, it is getting the distribution tar and using the configuration located within that. I fail to see how useful that will be as I would expect most users would have a custom configuration.

To be honest, I added Spring Boot support to simplify a LOT of this. Spring Boot packages up just the dependencies you need, uses the standard application.yml to configure Flume, and allows all the wiring of Sinks, Sources, and Channels to be performed in standard Spring Java config source files. Once packaged with the Spring Boot Maven plugin startup consists of either

  1. java -Dspring.profiles.active={envName} -jar my-flume.jar
    2../my-flume.jar -Dspring.profiles.active={envName}

@busbey
Copy link
Contributor

busbey commented Nov 1, 2022

Container images make deployment to a variety of environments much, much simpler. So I definitely agree with Tristan that we need something like this. For example, it would make it about an order of magnitude easier for me to evaluate release candidates.

Additionally, the project having an opinion on what a reasonable starting point for a container image looks like will make conversation I have about deployment in modern production environments easier.

I think starting from "everything goes in the bucket" to match our historic deployment shape is fine. I agree that eventually we need to get to a more modular approach that provides easy examples for folks building just the parts they need.

There's some tooling around for easier container image building based on spring boot applications, e.g. Jib. Maybe we should take an approach that leverages that?

@rgoers
Copy link
Member

rgoers commented Nov 1, 2022

I certainly don't have an issue against providing a Docker image. I just am unclear as to how it will be useful, but I suppose if it is just used as a starting point it could work out. But the approach you mention sounds much better in the long run. I just don't want to have to support that a) I won't have any use for, and b) we will have to keep compatibility with even though we want to do something much better.

To be clear, I use the FileChannel, which pretty much requires fast disk (i.e. SSDs or equivalent) to perform well. It also requires a dedicated "disk" so that data isn't lost on restart. This doesn't really work well with Docker/Kubernetes so we don't use it for Flume.

@tmgstevens
Copy link
Contributor Author

So a couple of points in here:

It packages all the artifacts whether they are required or not.

True - packaging is something that I think we need to think about going forwards. For people who want a lightweight deployment, should we offer different profiles, the flipside being that if you're combining two components from different modules (e.g. syslog and kafka, HTTP and HDFS etc) then actually do you ever get the benefit of the modularity, or does it just ramp up your complexity (complexity = adoption blocker in my mind).

Unless I am mistaken, it is getting the distribution tar and using the configuration located within that. I fail to see how useful that will be as I would expect most users would have a custom configuration.

It does bundle the default conf directory, but it is anticipated that a user would re-map that or pass config in via environment variables (which could then include secrets). Both designed to work in docker and kubernetes. There's an example of doing that here: https://github.com/apache/flume/blob/d2bd7812dbacd86459726c0fd3dc774272ce0222/flume-ng-tests/src/test/java/org/apache/flume/test/util/DockerInstall.java#L137-L153

I think starting from "everything goes in the bucket" to match our historic deployment shape is fine. I agree that eventually we need to get to a more modular approach that provides easy examples for folks building just the parts they need.

+1

There's some tooling around for easier container image building based on spring boot applications, e.g. Jib. Maybe we should take an approach that leverages that?

Personally I'd rather not re-write the docker deployment right now given that what's there works pretty well. We could look to move away from the spotify plugin to something else, but I don't want to re-architect the whole packaging of Flume at the moment.

To be clear, I use the FileChannel, which pretty much requires fast disk (i.e. SSDs or equivalent) to perform well. It also requires a dedicated "disk" so that data isn't lost on restart. This doesn't really work well with Docker/Kubernetes so we don't use it for Flume.

I actually think this would be fine - you can have persistent disks in Kubernetes assigned to pods, same applies to docker where you can mount a volume. Would I necessarily recommend that you re-write your deployment model to use containers? No. But for example, in @busbey 's world where he might be moving from a previously managed deployment to something that needs additional orchestration, using Docker or Kubernetes and deploying agents across many nodes, this could make things much easier to manage and maintain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants