[AIRFLOW-2952] Splits CI into k8s + docker-compose #3797
Conversation
@bolkedebruin @Fokko @kaxil PTAL. I'm going to try to get this all working tonight (though I assume the tests should pass, based on the Travis run on my branch). The Kubernetes tests are currently silently running non-Kubernetes Airflow tests, which means the build will show up as passing as long as the non-Kubernetes tests pass.
ede6729#diff-354f30a63fb0907d4ad57269548329e3R43 Looks like this might not be so simple. Getting errors based on: @gerardo any idea how to get around these permission issues? I have to run to an event but will revisit later tonight.
@dimberman I'll have a look now
@dimberman given that the Kubernetes CI scripts run outside Docker, this line should be changed. I think we can expose minikube as just another service container inside the docker-compose setup, but for the sake of getting the K8S tests back up, it looks good.
@gerardo No luck with that. Any other potential culprits?
Travis sometimes has this issue. Just make "sudo rm -rf " part of the script for one run. Also, don't rely on "travis" as a user; use the right env var instead.
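A minimal sketch of the env-var part of that suggestion (the exact command and target paths were not spelled out in the comment, so the target below is a placeholder):

```bash
# Derive the CI user from the environment instead of hard-coding "travis".
CI_USER="${USER:-$(id -un)}"
# Placeholder target; point this at whatever the containers left behind as root.
sudo chown -R "${CI_USER}:${CI_USER}" .
```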
Hey @dimberman, after your change, the build started failing in a different place: Not sure what's the best way to do this with tox, but at this point we need to skip these scripts and only run the package installation steps.
@dimberman I was trying to run the tests as-is inside our Docker image, but so far minikube doesn't seem to like running inside Docker.
For future reference. This looks good: https://github.com/kubernetes-sigs/kubeadm-dind-cluster
Nice one @gerardo. I'm also working on getting rid of tox, since we now have both docker-compose and tox, which each act as a virtualisation layer.
@gerardo Minikube definitely will not run inside Docker (there is such a thing as "Docker in Docker", but it's a really bad rabbit hole that we should avoid by all means necessary). Let me see if I can remove those earlier tasks. Interesting! That looks really cool. I think that would be a great idea for a future PR, to switch off of minikube.
@Fokko @gerardo Quick update. I've still been running into weird minikube issues and have been unable to get the CI to build properly. This has become a blocker for implementing and PRing fixes for the KubernetesExecutor, and the bug reports are starting to pile up. Could we revert the dockerized CI and then re-merge it once we get it working with k8s? I'm working with the kubeadm-dind-cluster folks, as I think the best way forward might be to switch to that.
@dimberman I can take a stab at making this work in a separate branch if you want. This is definitely a blocker, but reverting sounds like even more work.
@gerardo I agree that it would be a pain, but it's going to REALLY hurt if we merge PRs for a couple of weeks and then can't track down what broke the k8s executor when the k8s CI is restored. Definitely please try it on a different branch.
.travis.yml
Outdated
- TOX_ENV=py27-backend_sqlite-env_docker
- TOX_ENV=py27-backend_postgres-env_docker
- TOX_ENV=py35-backend_mysql-env_docker PYTHON_VERSION=3
- TOX_ENV=py35-backend_sqlite-env_ddocker PYTHON_VERSION=3
There's a typo here
@Fokko @bolkedebruin @gerardo I was able to get kubeadm to work with a local registry (that was a rough experience, lol). I'm still running into some weird tox issues (like being unable to find Python 3.5), but progress!
@gerardo OK, it's now solidly back in the court of "getting tox to work". Kubeadm is able to build and deploy. PTAL and let me know how we can get these to pass.
@dimberman I'm trying to figure out the simplest changes that can get this to work. So far:
@gerardo OK, further progress. The main remaining issue is that it keeps attempting to compile the S3 tests even though it's claiming there is no moto (this is after I attempted to install moto both in tox and in Travis).
@dimberman Currently I'm having an issue while running ./docker/build.sh locally. There still seems to be an issue with SLUGIFY_USES_TEXT_UNIDECODE=yes when running the script locally. I know you have added that env var in the .travis.yml file, but it is also required when running the script locally. Should we export it in the build.sh file?
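One possible way to handle that (just a sketch of the idea, not necessarily how this PR resolved it) is to default the variable inside build.sh so local runs behave like the Travis build:

```bash
# Default the env var when the caller has not set it; Travis already exports it
# via .travis.yml, so this only affects local runs of docker/build.sh.
export SLUGIFY_USES_TEXT_UNIDECODE="${SLUGIFY_USES_TEXT_UNIDECODE:-yes}"
```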
Awesome work @dimberman
scripts/ci/run-ci-docker.sh
Outdated
# specific language governing permissions and limitations
# under the License.

set -x
set -e too?
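For reference, a common strict-mode preamble for CI shell scripts (a general sketch, not necessarily what this PR ended up with) looks like:

```bash
#!/usr/bin/env bash
# Abort on errors and unset variables, fail pipelines on the first error,
# and echo each command so the Travis log shows exactly what ran.
set -euo pipefail
set -x
```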
except Exception as e:
print(e)
from .s3_to_hive_operator import *
pass
This looks like debugging code?
@ashb this is more something I wanted to ask how to solve. I ran into issues with the S3 tests where they don't skip even though moto shows up as "None". This was preventing the k8s tests from running at all, since tests were failing at import time.
Here's an example build where it fails because it's attempting to run the moto decorator even though it shouldn't be able to: https://travis-ci.org/bloomberg/airflow/jobs/423162910#L4949
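A common pattern for this kind of problem (a sketch with hypothetical test names, not code from this PR) is to import moto conditionally and skip the tests when it is missing, so a missing dependency never turns into an import-time failure:

```python
import unittest

try:
    from moto import mock_s3
except ImportError:
    mock_s3 = None


@unittest.skipIf(mock_s3 is None, "moto not installed; skipping S3 tests")
class TestS3ToHiveTransfer(unittest.TestCase):

    def test_transfer(self):
        # mock_s3 is only touched inside the test body, so merely importing
        # this module never calls a missing decorator.
        with mock_s3():
            pass  # exercise the S3-to-Hive transfer against the fake S3 here
```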
@@ -57,10 +58,12 @@ passenv = *
commands =
    pip wheel --progress-bar off -w {homedir}/.wheelhouse -f {homedir}/.wheelhouse -e .[devel_ci]
    pip install --progress-bar off --find-links={homedir}/.wheelhouse --no-index -e .[devel_ci]
    env_kubernetes: pip install boto3
    env_kubernetes: pip install moto
Do we not want boto3 and moto always? Also, aren't these already installed as tests_require from setup.py? Why do we need to specify them directly here? (My tox is hazy, so there may be a reason.)
Related to the problem above. Something is making the S3 tests attempt to run a "None" function, blocking all testing.
scripts/ci/run-ci-kubernetes.sh
Outdated
# specific language governing permissions and limitations
# under the License.

set -x
set -e
AIRFLOW_ROOT="$DIRNAME/../.."

# Fix file permissions
sudo chown -R travis.travis . $HOME/.wheelhouse/ $HOME/.cache/pip
What if I want to be able to run the kube-based tests locally? What does the workflow for that look like?
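One way to keep a local run usable (a sketch assuming Travis's standard CI=true and USER variables, not necessarily what the PR settled on) is to only fix ownership when running on CI:

```bash
# Only adjust ownership on CI, where the containers may have written files as
# root; leave a local working tree and caches untouched.
if [[ "${CI:-}" == "true" ]]; then
    sudo chown -R "${USER}:${USER}" . "$HOME/.wheelhouse/" "$HOME/.cache/pip"
fi
```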
$DIRNAME/minikube/start_minikube.sh
#rm /etc/docker/daemon.json
#sudo cp $DIRNAME/daemon.json /etc/docker/
Remove commented code please
@@ -0,0 +1,50 @@
#!/usr/bin/env bash |
set -e
.travis.yml
Outdated
- pip install tox
- pip install codecov
- pip install boto3
- pip install moto
These are in setup.py - why do we need them here?
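For context, test-only dependencies are normally declared once in setup.py and pulled in through an extra; the snippet below is a generic, hypothetical sketch of that pattern, not the actual Airflow setup.py:

```python
from setuptools import setup

# Hypothetical excerpt: declare test-only deps in an extra so CI can install
# them with `pip install -e .[devel_ci]` instead of ad-hoc `pip install`s.
devel_ci_requires = [
    'boto3',
    'moto',
]

setup(
    name='example-project',
    version='0.0.0',
    extras_require={'devel_ci': devel_ci_requires},
)
```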
Since using docker-compose for everything was causing k8s integration tests to die silently, this will determine whether a CI test is in k8s or docker-compose mode
@Fokko @bolkedebruin @ashb I am seeing multiple errors with the Kubernetes executor once it's running:
For these reasons I think this is a high-priority bug, especially since every PR we add from here on just makes the k8s executor more and more broken. I realize reverting the docker-compose CI would be a pain, but if we don't either do that or fix these bugs in the short term, I fear the KubernetesExecutor work will become even more broken.
If it's becoming too hard, I agree with reverting the docker-compose CI changes. I could go back and create a new branch with those changes and work on fixing the k8s setup.
We should be able to set up a separate branch beside the docker-compose one, which installs minikube and spins up the Airflow Docker image, right?
I created #3922 as an alternative solution for this issue. /cc @dimberman @gerardo @Fokko
@odracci I like your solution better as a short-term fix (switching to k-d-c should ideally be done later when the build is stable). Let me know when it's ready to review :).
I've merged the one from @odracci.
Make sure you have checked all steps below.
Jira
Description
Since using docker-compose for everything was causing k8s integration tests to die silently, this will determine whether a CI test is in k8s or docker-compose mode.
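As an illustration of that mechanism (a hypothetical sketch with made-up paths, not the exact contents of this PR's scripts), the Travis matrix entry names the environment and the CI entry point branches on it:

```bash
#!/usr/bin/env bash
# Hypothetical dispatcher: choose the CI path based on the tox environment name.
set -e

if [[ "${TOX_ENV}" == *"env_kubernetes"* ]]; then
    # Kubernetes mode: start minikube on the host and run the tests against it.
    ./scripts/ci/kubernetes/minikube/start_minikube.sh
    tox -e "${TOX_ENV}"
else
    # Default mode: run the suite inside the docker-compose environment.
    docker-compose -f scripts/ci/docker-compose.yml run airflow-testing tox -e "${TOX_ENV}"
fi
```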
Tests
Commits
Documentation
Code Quality
git diff upstream/master -u -- "*.py" | flake8 --diff