-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support configuration key to toggle using IPs or hostnames #7
Conversation
* Background We're patching the socket library in order to use IP addresses instead of hostnames, because Lyft hostnames are special and don't actually resolve. (See https://github.com/lyft/etl/pull/5560 for context) `service_venv` sets the \$PATH such that `/usr/local/lib/service_venv/bin/` takes precedence over `srv/service/current/bin/`, so it's not enough to simply put a patched version of `airflow` in the ETL repository. We're ensuring that our patched version of `airflow` takes precedence by overwriting `/usr/local/lib/service_venv/bin/airflow` with a script that just calls our patched version at `/srv/service/current/bin/airflow` (from the ETL repository). See this salt-state [1]. That doesn't work anymore, because with the rollout of frozen_venvs we can't count on `airflow` being at a hardcoded path. https://github.com/lyft/etl/pull/6073 was one attempt to solve this by finding the service_venv root dynamically. That approach doesn't work, because it depends on `service_venv` being available by the time the Jinja template is rendered, but `service_venv` won't be available until it's created in the lyft-python state (which requires that the template is rendered). We can avoid this chicken and egg problem if we simply patch the Airflow code directly, rather than trying to overwrite it on deployment. We apply our patch iff the `prefer_ip_over_hostname` key under the `lyft` configuration section is set. * Rollout On the sharedairflowworker (which is only running test DAGs so far), we will stop overwriting the airflow command with our patched version and set the `prefer_ip_over_hostname` key. If this works, then we can take the same approach for the other ASGs. Until then, we will continue overwriting the `airflow` command under service_venv/ with our patched version from ETL. [1] https://github.com/lyft/etl/blob/master/ops/config/states/etl/init.sls#L50-L59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall makes sense to me. Minor suggestion
from airflow import configuration | ||
from airflow.bin.cli import CLIFactory | ||
|
||
def get_private_ip(name=''): | ||
r = requests.get("http://169.254.169.254/latest/meta-data/local-ipv4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "http://169.254.169.254/latest/meta-data/local-ipv4" also be a part of configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but I see no reason to make it configurable unless it could change. This is the static address from which EC2 instances vend metadata - see the AWS docs for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wish boto3
had a nice wrapper around this, but boto/boto3#313 has been open for a while. There is boto.utils.get_instance_metadata()
though:
>>> import boto.utils
>>> instance_metadata = boto.utils.get_instance_metadata()
>>> print instance_metadata['local-ipv4']
10.0.24.67
But that is a dependency to another library, which may be available though.
Overall looks like the hostname the node returns:
PRODUCTION: amalakar@etl-production-iad-d532cf46:~$ hostname
etl-production-iad-d532cf46
Doesn't seem to have a dns entry or an entry in /etc/hosts
PRODUCTION: amalakar@etl-production-iad-d532cf46:~$ host etl-production-iad-d532cf46
Host etl-production-iad-d532cf46 not found: 3(NXDOMAIN)
I have seen lot of systems do depend on the gethostname
to resolve correctly on a DNS lookup. For example hadoop cli would fail when this is not satisfied. I think any instance that gets provisioned should have these pieces working which is an assumption for lot of systems out there.
@astahlman should we create a ticket against #provisioning to take care of this? We may end up patching/hacking startup script of many other systems otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, provisioning is aware of it, and unfortunately it seems that this is by design. Lyft uses the hostname to encode some information about the host that's used for monitoring purposes, see the previous discussions in these two Slack threads:
👍 lgtm |
💨 |
Background
We're patching the socket library in order to use IP addresses instead of hostnames, because Lyft hostnames are special and don't actually resolve. (See https://github.com/lyft/etl/pull/5560 for context)
service_venv
sets the $PATH such that/usr/local/lib/service_venv/bin/
takes precedence oversrv/service/current/bin/
, so it's not enough to simply put a patched version ofairflow
in the ETL repository.We're ensuring that our patched version of
airflow
takes precedence by overwriting/usr/local/lib/service_venv/bin/airflow
with a script that just calls our patched version at/srv/service/current/bin/airflow
(from the ETL repository). See this salt-state [1].That doesn't work anymore, because with the rollout of frozen_venvs we can't count on
airflow
being at a hardcoded path. https://github.com/lyft/etl/pull/6073 was one attempt to solve this by finding the service_venv root dynamically. That approach doesn't work, because it depends onservice_venv
being available by the time the Jinja template is rendered, butservice_venv
won't be available until it's created in the lyft-python state (which requires that the template is rendered).Solution
We can avoid this chicken and egg problem if we simply patch the Airflow code directly, rather than trying to overwrite it on deployment. We apply our patch iff the
prefer_ip_over_hostname
key under thelyft
configuration section is set.
Rollout
On the sharedairflowworker (which is only running test DAGs so far), we will stop overwriting the airflow command with our patched version and set the
prefer_ip_over_hostname
key.If this works, then we can take the same approach for the other ASGs. Until then, we will continue overwriting the
airflow
command under service_venv/ with our patched version from ETL.cc @lyft/data-platform
[1] https://github.com/lyft/etl/blob/master/ops/config/states/etl/init.sls#L50-L59