This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-1860 new developer option for ansible in docker to deploy to vagrant #1261

Closed
wants to merge 36 commits

Conversation

ottobackwards
Contributor

@ottobackwards ottobackwards commented Nov 13, 2018

The goal of this PR is to provide a new "full_dev" option, for new and old users alike, that does not require as much setup and version matching to try Metron's full dev environment.

Currently, the vagrant up command runs Ansible locally, on the host machine, to build and deploy Metron. This means that the user must not only have Vagrant, VirtualBox and Docker, but must also have all the tools necessary to build Metron (Maven, Java, C++11, etc.) and run Ansible (Python and others). Version and setup problems with these tools have been a common source of trouble for new users getting started with Metron.

This PR introduces a new metron-deployment/development option which tries to address this problem and make it possible for the user to run full dev with only Vagrant, VirtualBox and Docker (along with a local copy of the source).

The new option starts the Vagrant VM, but does not run Ansible in it. Instead, it runs a Docker container which contains all the necessary tools and versions, and that container is what runs Ansible.
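A minimal sketch of that idea follows; the image name (metron-build) and the inventory/playbook paths are illustrative placeholders, not the exact ones used by this PR:

    # Run a container that already has the build toolchain (Java, Maven, Ansible,
    # Python, a C++11 compiler), mount the source tree and the host's .m2 cache,
    # and let the container run Ansible against the Vagrant VM.
    docker run -it --rm \
      -v "${METRON_SRC_ROOT}:/root/metron" \
      -v ~/.m2:/root/.m2 \
      metron-build \
      ansible-playbook -i /root/metron/path/to/inventory /root/metron/path/to/playbook.yml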

Testing

Have the correct versions of Vagrant, VirtualBox and Docker installed and running.

cd $METRON_SRC_ROOT/metron-deployment/development/centos6_docker_build
./build_and_run.sh

Answer yes to building the Vagrant box.
Answer yes to building the Docker machine.
Go grab a coffee.

The end result should be full dev running in the Vagrant instance.
The logs directory will have a log for each run.

If you run a second time, you can say no to building the Docker machine.

Differences

  • This does not support skip tags passed on the CLI
  • This does not support provision (vagrant provision)

For all changes:

  • Is there a JIRA ticket associated with this PR? If not, one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • [-] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • [-] Have you written or updated unit tests and or integration tests to verify your changes?

  • [-] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • [-] Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, run the following commands and then verify the changes via site-book/target/site/index.html:

…docker ( and a copy of the metron codebase ) are required to run the Metron

full-dev vm with its default setup.

This is the initial work, there will be refactorings
@nickwallen
Contributor

Valiant effort @ottobackwards. I am just wondering how much easier this really makes it?

Have you thought about just publishing a Metron Demo image to Vagrant Cloud? Does that scratch the same itch?

  • Everything is already pre-installed (removes one source of potential problems for new users.)
  • All a user needs is Vagrant and VirtualBox.

@mmiklavc
Contributor

@ottobackwards thanks for the submission. Per recent community comments, it sounds like we could use some improvements to how our build/deploy dependency versions interact with other tooling that may require different versions of things.

Are these the correct dependency listings for each host/container?

The Docker container will be pre-configured with:

  • Java 8
  • Ansible 2.4.0+
  • Python 2.7
  • Maven 3.3.9
  • C++11 compliant compiler, like GCC

And the developer host machine would now need to manage versions of:

  • Vagrant 2.0+
  • Vagrant Hostmanager Plugin
  • Virtualbox 5.0+
  • Docker
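As a quick sanity check of the host-side list above, these standard Vagrant/VirtualBox/Docker commands report the installed versions (the expected minimums come from the list, not from this PR):

    # Verify the host-side prerequisites; only Vagrant, VirtualBox and Docker
    # (plus the hostmanager plugin) need to match the versions listed above.
    vagrant --version                         # expect 2.0+
    vagrant plugin list | grep hostmanager    # expect vagrant-hostmanager
    VBoxManage --version                      # expect 5.0+
    docker --version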

The deployment goes through these steps:

  1. You mount the m2 repo and your code dir in the Docker instance.
  2. build_and_run optionally spins up Vagrant in VirtualBox.
  3. build_and_run optionally creates Docker instance with pre-reqs for building and deploying Metron.
  4. build_and_run runs the build and then calls the Ansible deployment scripts from within Docker.

At the end, you have an ephemeral Docker instance + Vagrant instance that has the running Metron instance?
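A rough bash outline of those four steps (an approximation for illustration, not the actual build_and_run.sh):

    #!/usr/bin/env bash
    # Approximate flow of the steps above (illustrative only).
    read -r -p "Build the Vagrant box? [y/n] " build_vm
    [ "$build_vm" = "y" ] && vagrant up --no-provision          # step 2: start the VM, no Ansible in it

    read -r -p "Build the Docker machine? [y/n] " build_docker
    [ "$build_docker" = "y" ] && docker build -t metron-build . # step 3: image with build/deploy pre-reqs

    # steps 1 and 4: mount the source and ~/.m2 into the container (see the earlier
    # docker run sketch) and let it run the Maven build plus the Ansible deployment.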

@ottobackwards
Contributor Author

@nickwallen I did not think of that. I was improving the process that stands today. I think in a world where the posted image exists, we would still want the ability to try the latest ( to verify a fix pre-release etc ).

@ottobackwards
Contributor Author

ottobackwards commented Nov 14, 2018

@mmiklavc That is basically correct, except that the Ansible version is 2.5; since it only applies to this build, that allows for the YAML log formatting.

Also, in the latest version, Ansible once again does the clean and build, as opposed to the script. I had a lot of problems getting the C++ build picked up from Ansible and moved the build out of it for a time, but the idea was always to have Ansible run the metron_build, and that has returned.

The reasoning for the prompts to build the Vagrant box and the Docker machine:

  • If you are using this during development, i.e. we are working ON Ansible or ON Docker, you may fail in the Docker or Ansible stage without modifying the VM, and thus not need to vagrant up again.
  • Likewise, you may not need to rebuild the Docker machine if you have not made changes, or you may in fact need to. I added these flags as I developed.

@ottobackwards
Contributor Author

The integration test failure has to do with the profiler tests and seems unrelated.

@nickwallen
Contributor

If we go with this approach, why not just replace the existing "Full Dev" environments (both CentOS and Ubuntu) rather than add new environments to support, test, and keep in sync?

@nickwallen
Contributor

How does the build time compare to the current approach (mvn clean install -DskipTests)? Does it change at all since it is now run in a Docker container?

How does the time it takes to get Full Dev running compare (whatever is equivalent to vagrant up)?

@JonZeolla
Member

I'm going to take a stab at a further look next week. For now I gave it a quick run-up and it was successful.

@ottobackwards
Contributor Author

@nickwallen That is an option, but not something I would pick as the goal from the outset if you know what I mean.

@JonZeolla
Member

If we want to replace full dev, we would need to get skip tags passed in appropriately; I use that a lot. That said, I'm not 100% sure that we need to do that all at once.

@ottobackwards
Contributor Author

We could also use more tags. For example, I may want to skip building the Java but not skip building the RPMs. Think of a dev flow: I make my change, run my local tests, and want to spin up full dev. The Java is already built, but it needs the RPMs; I should be able to make Ansible skip the compile/package of the Java and still do the RPMs/DEBs (see the sketch below).
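For context, Ansible's tag mechanism is what would drive this. The tag names and paths below are hypothetical, not tags that exist in the Metron playbooks today:

    # Run everything except the Java compile/package steps (hypothetical tag name).
    ansible-playbook -i path/to/inventory path/to/metron_full_dev.yml \
      --skip-tags "build-java"

    # Or run only the packaging steps (again, hypothetical tag names).
    ansible-playbook -i path/to/inventory path/to/metron_full_dev.yml \
      --tags "build-rpms,build-debs"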

@ottobackwards
Contributor Author

Anyone have ideas on the best way to time these things?
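One low-tech option, assuming the run is kicked off from the centos6_docker_build directory (the log file name here is illustrative): the shell's time builtin gives a wall-clock figure for the whole run.

    # Time the full run and keep a copy of the output alongside the per-run logs.
    time ./build_and_run.sh 2>&1 | tee "logs/timed-run-$(date +%Y%m%d-%H%M%S).log"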

@justinleet
Contributor

In the interest of pure (and almost certainly incredibly ignorant) speculation, along the lines of the Vagrant Cloud idea: is it theoretically possible to save off a base image that has the Hadoop and non-Metron installs done? Then, if said base image is missing (or has changed properties, e.g. HDP version), rebuild it; if it is found, just build Metron and install the MPack. Or alternatively, don't upload the end result to Vagrant Cloud, just upload the base image we install on top of, and only update it when the base image changes.

What I'm getting at is that it would be nice to be able to build off of master, in a way that doesn't require an external dependency, and still lets us cache the majority of the install (setting up HDP and Ambari). At that point, full dev would essentially be (for most builds) "build Metron, build RPMs, run up the image, do the Metron install and setup", which is probably 20 minutes faster and would make me a substantially happier person.
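A sketch of the caching half of that idea, with illustrative box and file names: Vagrant can package a provisioned VM into a reusable local base box without involving Vagrant Cloud at all.

    # After the HDP/Ambari layer is provisioned (but before Metron is installed),
    # snapshot the VM as a local base box that later runs can build on.
    vagrant package --output metron-base-hdp.box
    vagrant box add --name metron/base-hdp metron-base-hdp.box

    # Later Vagrantfiles would point config.vm.box at "metron/base-hdp", so only
    # the Metron build, RPMs, and MPack install remain to be done per run.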

@ottobackwards
Contributor Author

ottobackwards commented Nov 21, 2018

It is possible to imagine a number of scenarios, including that one, but also needing to build with new Hadoop versions (we can't lose the ability to build from scratch).

There are a number of things we can do down the road.

I think this work is going to help people enough in the near term to land it, while we discuss longer term refactoring and workflow.

@ottobackwards
Contributor Author

If you create an issue for your Vagrant base machine with our Hadoop/Ambari already in it, you can assign it to me. @justinleet

@nickwallen
Contributor

nickwallen commented Nov 21, 2018

hey @ottobackwards - I am still wrapping my head around this, but one small nit is that I don't like all the confirmation prompts. With the prompts I have to constantly check back to ensure it is not stuck on another prompt waiting for me to do something.

As much as possible it should just be fire-and-forget, so I can run it, work on something else, and hopefully come back some time later with a functioning Metron install.

I think you could get the same flexibility if you either used command-line switches or moved all the prompts to the beginning of the script.
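A minimal sketch of the switch-based approach, with hypothetical flag names (the script currently uses interactive prompts instead):

    #!/usr/bin/env bash
    # Parse optional switches up front so the rest of the run is fire-and-forget.
    BUILD_VM=true
    BUILD_DOCKER=true
    for arg in "$@"; do
      case "$arg" in
        --skip-vagrant-up)   BUILD_VM=false ;;
        --skip-docker-build) BUILD_DOCKER=false ;;
      esac
    done

    [ "$BUILD_VM" = true ]     && vagrant up --no-provision
    [ "$BUILD_DOCKER" = true ] && docker build -t metron-build .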

@ottobackwards
Contributor Author

@nickwallen, yeah, I did prompts as I went along debugging. I was thinking that folks may not like them.
I'll parameterize things.

@nickwallen
Contributor

@JonZeolla: Is it your CPU/Memory preferences in docker?

Doubt it. CPUs: 6, Mem: 12 GB, Swap 3 GB

Is it running sufficiently fast for you @JonZeolla ?

@nickwallen
Contributor

nickwallen commented May 1, 2019

@mmiklavc: ...and it takes 10x as long for me when I've got a Vagrant instance running along with a bunch of browser tabs.

I was thinking about this the other day, actually. Since with these changes we no longer rely on Vagrant to kick off Ansible, we could alter the steps so that the build and packaging complete first, and only after that is done do we launch the VM. That way the two aren't fighting for memory at the same time.

First off, does that sound like a reasonable thing to do @ottobackwards? If so, is it a heavy lift we should hold off on for a follow-on PR? Or is it just a simple re-ordering of some of the actions in metron-up.sh?

Edit: We should probably just get the basics working here before changing anything else drastically. Ignore what I said.

@ottobackwards
Contributor Author

ottobackwards commented May 1, 2019

./metron-up.sh 6.30s user 3.09s system 0% cpu 1:20:47.77 total

@nickwallen I think that is definitely worth doing! I'll get on it and let you know what it will take.

@nickwallen
Contributor

nickwallen commented May 1, 2019

Can we do it as a follow-on PR?

EDIT: Looks like it took about 80 minutes for you also. Hmm.

@ottobackwards
Contributor Author

ottobackwards commented May 1, 2019

I think it will take me a couple of hours to make the change.

@ottobackwards
Contributor Author

Ok, I'm going to do that as a follow on

@nickwallen
Contributor

Hey @ottobackwards - What are your thoughts on the duration of the build? Do you have any thoughts on how we could improve that?

@ottobackwards
Contributor Author

You mean other than not running the VM?

@nickwallen
Contributor

I don't know where this stands. Do you think this should be merged in spite of the time it takes to spin-up? Or are you looking at ways to improve the build time? What is the path forward here?

@ottobackwards
Contributor Author

I have a secondary branch and I'm working through building without the VM running.
I'll have time comparisons soon; let's see how that pans out.

@ottobackwards
Contributor Author

This latest merge contains a refactoring that makes the resource use much better by delaying the Vagrant machine start until after the builds are done.

This involves refactoring the playbooks and the scripts.
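In rough outline, the reordering amounts to something like this (the image name, paths, and commands are illustrative, not the exact scripts in this PR):

    # 1. Build and package Metron inside the build container first,
    #    while no VM is competing for CPU and memory.
    docker run -t --rm -v "${METRON_SRC_ROOT}:/root/metron" -v ~/.m2:/root/.m2 \
      metron-build bash -c 'cd /root/metron && mvn -q clean package -DskipTests'

    # 2. Only then start the Vagrant VM.
    vagrant up --no-provision

    # 3. Finally, run the Ansible deployment (again from the container) against the VM.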

[Screenshot, 2019-05-06 15:51: build/deploy timing comparison]

@ottobackwards
Contributor Author

Some thoughts for next steps.

  • try setting the volume sync option to :delegated (I tried it and it didn't make much difference, but we could look more; see the sketch after this list)

  • build the Docker image, but add a last layer with a COPY of all the source into the image. That way there is no volume penalty, and caching will keep the docker build fast.

  • a new Vagrant base image that has the cluster already installed, just without Metron, to make deploys faster
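A sketch of the first bullet; the image name is illustrative, and delegated is the Docker for Mac volume-consistency option that relaxes host/container consistency in exchange for faster container-side writes:

    # Same mounts as the build container uses today, with ":delegated" appended.
    docker run -d -t --name MetronBuild \
      -v "${METRON_SRC_ROOT}:/root/metron:delegated" \
      -v ~/.m2:/root/.m2:delegated \
      metron-build bash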

@mmiklavc
Contributor

mmiklavc commented May 7, 2019

@ottobackwards Thanks for that comparison info; it makes the differences really easy to see.

It's a fractionally small part of the build, but what's going on with clean taking 8x as long?

Those time drops are pretty substantial - is that purely resource contention?

Thinking through this build process a bit more, just curious: are you sharing the local m2 repo from the host? I'm assuming this doesn't pull every dependency down locally to the Docker container. That's probably worth adding to the diagram for clarity.

@ottobackwards
Contributor Author

I believe the issue with clean is due to slowness with Docker doing writes to macOS-hosted volumes.
I think resource contention is a big part of this, on my machine at least.
I do share the repo:
https://github.com/apache/metron/pull/1261/files#diff-80771b861babcdefd8d7bbc81c412ff1R110

    DOCKER_CMD="bash"
    DOCKER_CMD_BASE[0]="docker run -d -t --name MetronBuild "
    DOCKER_CMD_BASE[1]="-v \"${VAGRANT_PATH}/../../..:/root/metron\" "
    DOCKER_CMD_BASE[2]="-v ~/.m2:/root/.m2 "

@ottobackwards
Contributor Author

@mmiklavc Someone else needs to try it, obviously.

@ottobackwards
Contributor Author

@mmiklavc As I stated above, if we need to, another change would be to create a Docker layer that is a copy of the source, so that we don't need to use the volumes at all... but I'd have to do some research on that as a follow-on.

@nickwallen
Contributor

@ottobackwards I ran your latest code up a few times and, unfortunately, I am not able to replicate your results.

This is the overall duration to go from source code to a fully deployed dev environment.

Your results differ significantly from mine...

(1) Any ideas why we are getting such different results? I have allocated Docker 12G of RAM, 6 cores, and 3G of swap. Are there any other settings that you have adjusted on your build machine?

(2) Which logs did you use to break out the build, clean, and package times? Since Ansible runs in Docker, we no longer have that log persisted on the host. Once the Docker container is shut down, the logs are lost, unless I am missing something.

@ottobackwards
Contributor Author

In this PR, the Ansible log goes to the /logs directory.

@mmiklavc
Contributor

@ottobackwards - I want to spend some time in the next week resurrecting this and taking another look to see where we're at.

@ottobackwards
Contributor Author

@mmiklavc I think we are doing Docker wrong with respect to building.
I think we may want to have the Docker build context be at the root of the source and use a .dockerignore to set that up; that way the src is automagically in the image, etc.
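A sketch of that layout, with illustrative file names and ignore entries: build with the repository root as the context so the source lands in the image via COPY, and let .dockerignore keep the build context small.

    # .dockerignore at the source root (illustrative entries) keeps the context small.
    printf '%s\n' '.git' '**/target' 'metron-deployment/development/*/logs' > .dockerignore

    # Build with the repository root as the context; the Dockerfile (path illustrative)
    # would COPY the source into the image instead of relying on volume mounts.
    docker build -t metron-build \
      -f metron-deployment/development/centos6_docker_build/Dockerfile .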

@luozhenwei left a comment

Hello, may I ask you a question? For testing purposes, I compiled Metron from the source code on GitHub.
The reference link is: https://github.com/apache/metron/tree/master/metron-deployment/packaging/docker/ansible-docker.
The last command executed was mvn clean package -DskipTests.
The result was successful, showing BUILD SUCCESS.
How many jar packages are generated by this operation?
How do I start Metron in Docker? Or could you describe the next steps? I can't find the corresponding information in the community anymore. Thank you very much!

@mmiklavc
Contributor

mmiklavc commented Sep 3, 2019

@luozhenwei - it's generally best to ask these questions on the user or dev list, but here's the list of jars we depend on in our full dev environment (e.g. https://github.com/apache/metron/tree/master/metron-deployment/development/centos6):

[root@node1 ~]# ls -1 /usr/metron/0.7.2/lib
metron-common-0.7.2.jar
metron-data-management-0.7.2.jar
metron-elasticsearch-storm-0.7.2-uber.jar
metron-enrichment-common-0.7.2-uber.jar
metron-enrichment-storm-0.7.2-uber.jar
metron-maas-service-0.7.2-uber.jar
metron-management-0.7.2.jar
metron-parsers-0.7.2-uber.jar
metron-parsers-common-0.7.2-uber.jar
metron-parsing-storm-0.7.2-uber.jar
metron-pcap-backend-0.7.2.jar
metron-performance-0.7.2.jar
metron-profiler-repl-0.7.2.jar
metron-profiler-spark-0.7.2.jar
metron-profiler-storm-0.7.2-uber.jar
metron-rest-0.7.2.jar
metron-solr-storm-0.7.2-uber.jar

17 jars. The Docker deploy you linked to hasn't been maintained/updated in quite some time, so I'm not entirely sure what its current state is. @merrimanr may have more detail on this. If you're just trying to explore Metron, I would run up full dev on centos6 via the instructions in the link I provided above.
