
[AIRFLOW-2952] Splits CI into k8s + docker-compose #3797

Closed
wants to merge 10 commits into from

Conversation

dimberman
Contributor

@dimberman dimberman commented Aug 23, 2018

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Since using docker-compose for everything was causing k8s integration
tests to die silently, this will determine whether a CI test is in k8s
or docker-compose mode
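
Such a dispatcher might look roughly like the following sketch. The `KUBERNETES_MODE` variable name and the commented-out script paths are illustrative assumptions, not code from this PR:

```shell
#!/usr/bin/env bash
# Sketch of a CI entrypoint that branches between the two test modes.
# KUBERNETES_MODE and the commented script names are assumptions.
set -e

run_ci() {
  if [ "${KUBERNETES_MODE:-}" = "true" ]; then
    echo "running kubernetes integration tests"
    # scripts/ci/kubernetes/run_tests.sh would go here
  else
    echo "running docker-compose tests"
    # docker-compose run airflow-testing would go here
  fi
}

run_ci
```

With a switch like this, the k8s job can no longer silently fall through to the docker-compose path, which is the failure mode described above.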

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@dimberman
Contributor Author

@bolkedebruin @Fokko @kaxil PTAL. I'm going to try to get this all working tonight (though I assume the tests should pass, based on the Travis run on my branch).

Kubernetes tests are silently running non-kubernetes airflow tests. This means that it will show up as passing as long as the non-kubernetes tests pass.

@dimberman
Contributor Author

ede6729#diff-354f30a63fb0907d4ad57269548329e3R43

Looks like this might not be so simple. Getting errors based on:
IOError: [Errno 13] Permission denied: u'/home/travis/.wheelhouse/bleach-2.1.4-py2.py3-none-any.whl'

@gerardo any idea how to get around these permission issues?

I have to run to an event but will revisit later tonight.

@gerardo
Contributor

gerardo commented Aug 24, 2018

@dimberman I'll have a look now

@gerardo
Contributor

gerardo commented Aug 24, 2018

@dimberman Given that the Kubernetes CI scripts run outside Docker, this line should be sudo chown -R travis.travis . $HOME/.wheelhouse/ $HOME/.cache/pip instead.
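
The suggested fix is a single recursive chown over the checkout and both caches. A runnable illustration of the pattern, using throwaway directories in place of the real Travis paths (on Travis the command runs under sudo as travis.travis):

```shell
# Illustration of the one-shot ownership fix. The temp directories stand
# in for the Travis checkout, $HOME/.wheelhouse and $HOME/.cache/pip.
set -e
demo="$(mktemp -d)"
mkdir -p "$demo/wheelhouse" "$demo/pip-cache"
# On Travis: sudo chown -R travis.travis . $HOME/.wheelhouse/ $HOME/.cache/pip
chown -R "$(id -u):$(id -g)" "$demo" "$demo/wheelhouse" "$demo/pip-cache"
# After the chown, pip can write wheels into the cache again:
touch "$demo/wheelhouse/bleach-2.1.4-py2.py3-none-any.whl"
echo "ok"
```

This addresses the `IOError: [Errno 13] Permission denied` on the wheelhouse path quoted earlier in the thread.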

I think we can expose minikube as just another service container inside the docker-compose setup, but for the sake of getting the K8S tests back up, it looks good.

@dimberman
Contributor Author

@gerardo No luck with that. Any other potential culprits?

@bolkedebruin
Contributor

Travis sometimes has this issue. Just make "sudo rm -rf " part of the script for one run. Also, don't rely on "travis" as the user name; use the right env var.

@gerardo
Contributor

gerardo commented Aug 29, 2018

Hey @dimberman, after your change, the build started failing in a different place: sudo: kadmin: command not found. This means tox is running the 2-setup-kdc.sh script.

I'm not sure what the best way to do this with tox is, but at this point we need to skip these scripts and run only the package installation steps, 5-run-tests.sh, 6-check-license.sh, and codecov.
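
One way to express that filtering, sketched with the script names from this thread (the loop and the SKIP list are assumptions about how a runner might be wired, not code from the PR):

```shell
# Sketch: iterate over the numbered CI scripts but skip KDC setup in
# Kubernetes mode. Only the script names come from the discussion above.
set -e
SKIP="2-setup-kdc.sh"
result=""
for script in 2-setup-kdc.sh 5-run-tests.sh 6-check-license.sh; do
  case " $SKIP " in
    *" $script "*) result="$result skip:$script" ;;   # kadmin unavailable here
    *)             result="$result run:$script" ;;
  esac
done
echo "$result"
```

A real integration would invoke the scripts instead of recording them, but the skip list is the essential part: it keeps tox from reaching `kadmin` on hosts that don't have it.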

@gerardo
Contributor

gerardo commented Aug 29, 2018

@dimberman I was trying to run the tests as-is inside our Docker image, but so far minikube doesn't seem to want to run inside Docker.

@gerardo
Contributor

gerardo commented Aug 29, 2018

For future reference. This looks good: https://github.com/kubernetes-sigs/kubeadm-dind-cluster

If you're an application developer, you may be better off with Minikube because it's more mature and less dependent on the local environment, but if you're feeling adventurous you may give kubeadm-dind-cluster a try, too. In particular you can run kubeadm-dind-cluster in CI environment such as Travis without having issues with nested virtualization.

@Fokko
Contributor

Fokko commented Aug 29, 2018

Nice one @gerardo

I'm also working on getting rid of tox, since we now have both docker-compose and tox, and they both act as a virtualisation layer.

@dimberman
Contributor Author

@gerardo Minikube definitely will not run inside Docker (there is such a thing as "Docker in Docker", but it's a rabbit hole we should avoid by any means necessary). Let me see if I can remove those earlier tasks.

Interesting! That looks really cool. I think that would be a great idea for a future PR to switch off of minikube.

@dimberman
Contributor Author

@Fokko @gerardo Quick update. I've still been running into weird minikube issues and have been unable to get the CI to build properly. This is now blocking me from implementing/PRing fixes for the k8s executor, and the bug reports are starting to pile up. Could we revert the dockerized CI and then re-merge it once we get it working with k8s?

I'm working with the k8s-kubeadm-dind guys as I think the best way forward might be to switch to that.

@gerardo
Contributor

gerardo commented Aug 30, 2018

@dimberman I can take a stab at making this work in a separate branch if you want. This is definitely a blocker, but reverting sounds like even more work.

@dimberman
Contributor Author

@gerardo I agree that it would be a pain, but it's going to REALLY hurt if we merge PRs for a couple of weeks and then can't track down what broke the k8s executor when it restarts. Definitely please try on a different branch.

.travis.yml Outdated
- TOX_ENV=py27-backend_sqlite-env_docker
- TOX_ENV=py27-backend_postgres-env_docker
- TOX_ENV=py35-backend_mysql-env_docker PYTHON_VERSION=3
- TOX_ENV=py35-backend_sqlite-env_ddocker PYTHON_VERSION=3
Contributor

There's a typo here

@dimberman
Contributor Author

@Fokko @bolkedebruin @gerardo I was able to get kubeadm to work with a local registry (that was a rough experience lol). I'm still running into some weird tox issues (like being unable to find Python 3.5), but progress!

@dimberman
Contributor Author

cc: @feng-tao @kaxil Just a warning: any PR merged right now is not being tested against Kubernetes.

@dimberman
Contributor Author

@gerardo OK, it's now solidly back in the court of "getting tox to work". Kubeadm is able to build and deploy. PTAL and let me know how we can get these to pass.

@gerardo
Contributor

gerardo commented Aug 31, 2018

@dimberman I'm trying to figure out the simplest changes that can get this to work. So far:

@gerardo
Contributor

gerardo commented Aug 31, 2018

@dimberman
Contributor Author

@gerardo OK, further progress. The main issue left is that it keeps attempting to compile the S3 tests even though it claims there is no moto (this is after I attempted to install moto both in tox and in Travis).

@gauthiermartin
Contributor

@dimberman Currently I'm having an issue while running ./docker/build.sh locally. There still seems to be an issue with SLUGIFY_USES_TEXT_UNIDECODE=yes when running the script locally. I know you added that env var in the travis-ci.yml file, but it is also required when running the script locally. Should we export it in the build.sh file?
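
If it does go into build.sh, a default-if-unset export would make local runs behave like CI without clobbering a value the caller already set. A sketch (only the variable name comes from this thread; the guard pattern is a suggestion):

```shell
# Sketch: default SLUGIFY_USES_TEXT_UNIDECODE to "yes" when build.sh is
# run outside Travis, preserving any value the caller already exported.
unset SLUGIFY_USES_TEXT_UNIDECODE   # simulate a fresh local shell here
export SLUGIFY_USES_TEXT_UNIDECODE="${SLUGIFY_USES_TEXT_UNIDECODE:-yes}"
echo "$SLUGIFY_USES_TEXT_UNIDECODE"
```

The `${VAR:-default}` expansion means the Travis config can still override the value while local invocations of the script get a working default.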

@dimberman
Contributor Author

@Fokko @barni is going to investigate how we can make the Airflow API tests work today. We should hopefully have this working soon.

@Fokko
Contributor

Fokko commented Sep 12, 2018

Awesome work @dimberman

# specific language governing permissions and limitations
# under the License.

set -x
Member

set -e too?

except Exception as e:
print(e)
from .s3_to_hive_operator import *
pass
Member

This looks like debugging code?

Contributor Author

@ashb This is more something I wanted to ask how to solve. I ran into issues with the S3 tests where they don't skip even though moto shows up as "None". This was preventing the k8s tests from running at all, since tests were failing at import.

Contributor Author

Here's an example build where it fails because it's attempting to run the moto decorator even though it shouldn't be able to: https://travis-ci.org/bloomberg/airflow/jobs/423162910#L4949

@@ -57,10 +58,12 @@ passenv = *
commands =
pip wheel --progress-bar off -w {homedir}/.wheelhouse -f {homedir}/.wheelhouse -e .[devel_ci]
pip install --progress-bar off --find-links={homedir}/.wheelhouse --no-index -e .[devel_ci]
env_kubernetes: pip install boto3
env_kubernetes: pip install moto
Member

Do we not want boto3 and moto always? Also aren't these already installed as test_requires from setup.py? Why do we need to specify them directly here? (My tox is hazy, so there may be a reason)

Contributor Author

Related to the problem above: something is making the S3 tests attempt to run a "None" function, blocking all testing.

# specific language governing permissions and limitations
# under the License.

set -x
Member

set -e

AIRFLOW_ROOT="$DIRNAME/../.."

# Fix file permissions
sudo chown -R travis.travis . $HOME/.wheelhouse/ $HOME/.cache/pip
Member

What if I want to be able to run kube-based tests locally, what does the workflow for that look like?


$DIRNAME/minikube/start_minikube.sh
#rm /etc/docker/daemon.json
#sudo cp $DIRNAME/daemon.json /etc/docker/
Member

Remove commented code please

@@ -0,0 +1,50 @@
#!/usr/bin/env bash
Member

set -e

.travis.yml Outdated
- pip install tox
- pip install codecov
- pip install boto3
- pip install moto
Member

These are in setup.py - why do we need them here?

@dimberman
Contributor Author

@Fokko @bolkedebruin @ashb I am seeing multiple errors with the Kubernetes executor once it's running:

  1. The API calls are failing because airflow thinks that orm_dag is None
[2018-09-12 20:41:01,730] ERROR in app: Exception on /api/experimental/dags/example_kubernetes_annotation/paused/false [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python2.7/dist-packages/airflow/api/auth/backend/default.py", line 32, in decorated
    return function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/www_rbac/api/experimental/endpoints.py", line 156, in dag_paused
    orm_dag.is_paused = False
AttributeError: 'NoneType' object has no attribute 'is_paused'
  2. When I log into the webserver (both through k-d-c and through minikube), I am unable to see any DAGs, even though the logs show them as loaded and the files are in the correct directory.

  3. When I attempt to ping through k-d-c, it just hangs indefinitely.

For these reasons I think this is a high-priority bug, especially since every PR we add from here on just makes the k8s executor more and more broken. I realize reverting the docker-compose CI would be a pain, but if we don't either do that or fix these bugs in the short term, I fear the k8s executor work will become even more broken.

@gerardo
Contributor

gerardo commented Sep 12, 2018

If it's becoming too hard, I agree with reverting the docker-compose CI changes. I could go back and create a new branch with those changes and work on fixing the k8s setup.

@Fokko
Contributor

Fokko commented Sep 17, 2018

We should be able to set up a separate branch beside the docker-compose one, which installs minikube and spins up the Airflow Docker image, right?

@odracci
Contributor

odracci commented Sep 19, 2018

I created #3922 as an alternative solution for this issue. /cc @dimberman @gerardo @Fokko

@dimberman
Contributor Author

@odracci I like your solution better as a short-term fix (switching to k-d-c should ideally be done later when the build is stable). Let me know when it's ready to review :).

@Fokko
Contributor

Fokko commented Sep 21, 2018

I've merged @odracci's PR.

@Fokko Fokko closed this Sep 21, 2018