Support configuration key to toggle using IPs or hostnames #7

Merged (1 commit) on Jun 26, 2017
airflow/bin/airflow: 21 additions & 0 deletions
@@ -1,9 +1,30 @@
#!/usr/bin/env python
import logging
import os
import socket
import requests
from airflow import configuration
from airflow.bin.cli import CLIFactory

def get_private_ip(name=''):
    r = requests.get("http://169.254.169.254/latest/meta-data/local-ipv4")

Reviewer:

Should "http://169.254.169.254/latest/meta-data/local-ipv4" also be part of the configuration?

Author:

We could, but I see no reason to make it configurable unless it could change. This is the static address from which EC2 instances vend metadata - see the AWS docs for details.
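For reference, a minimal sketch of querying that same instance metadata endpoint; the fetch_metadata helper, the explicit timeout, and the extra instance-id key are illustrative additions here, not part of this PR:

import requests

METADATA_BASE = "http://169.254.169.254/latest/meta-data"

def fetch_metadata(key, timeout=2):
    # Link-local address served by EC2 instance metadata (IMDSv1-style);
    # the timeout keeps non-EC2 hosts from hanging on the request.
    return requests.get("%s/%s" % (METADATA_BASE, key), timeout=timeout).text

print(fetch_metadata("local-ipv4"))   # e.g. 10.0.24.67
print(fetch_metadata("instance-id"))  # e.g. i-0123456789abcdef0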

Reviewer:

Wish boto3 had a nice wrapper around this, but boto/boto3#313 has been open for a while. There is boto.utils.get_instance_metadata() though:

>>> import boto.utils
>>> instance_metadata = boto.utils.get_instance_metadata()
>>> print instance_metadata['local-ipv4']
10.0.24.67

But that adds a dependency on another library, which may already be available, though.

Overall, it looks like the hostname the node returns:

PRODUCTION: amalakar@etl-production-iad-d532cf46:~$ hostname
etl-production-iad-d532cf46

doesn't seem to have a DNS entry or an entry in /etc/hosts:

PRODUCTION: amalakar@etl-production-iad-d532cf46:~$ host etl-production-iad-d532cf46
Host etl-production-iad-d532cf46 not found: 3(NXDOMAIN)

I have seen a lot of systems depend on gethostname() resolving correctly via a DNS lookup. For example, the hadoop CLI fails when this is not satisfied. I think any instance that gets provisioned should have these pieces working, since that is an assumption a lot of systems out there make.
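
A quick standard-library check of that assumption (a sketch, not part of the PR) reproduces the failure shown above:

import socket

hostname = socket.gethostname()
try:
    print("%s resolves to %s" % (hostname, socket.gethostbyname(hostname)))
except socket.gaierror:
    # This is the NXDOMAIN case above, and the reason this PR patches
    # socket.gethostname/getfqdn to return the private IP instead.
    print("%s does not resolve" % hostname)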

@astahlman should we create a ticket against #provisioning to take care of this? Otherwise we may end up patching/hacking the startup scripts of many other systems.

Author:

Yeah, provisioning is aware of it, and unfortunately it seems that this is by design. Lyft uses the hostname to encode some information about the host that's used for monitoring purposes; see the previous discussion in these two Slack threads:

  1. https://lyft.slack.com/archives/C3ASS377S/p1493405313083642
  2. https://lyft.slack.com/archives/C2U75K0H5/p1484849237000002

    return str(r.text)


def getfqdn(name=''):
    return get_private_ip()


should_patch_socket = False
try:
    should_patch_socket = configuration.getboolean('lyft', 'prefer_ip_over_hostname')
except configuration.AirflowConfigException:
    pass  # Default to False if not configured

if should_patch_socket:
    logging.info("Using IP addresses instead of hostnames.")
    socket.gethostname = socket.getfqdn = getfqdn

if __name__ == '__main__':

    if configuration.get("core", "security") == 'kerberos':
        # ... (remainder of the file collapsed in the diff view: "Expand Down")
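
For completeness, the toggle above corresponds to an airflow.cfg entry along these lines; the section and key names come from the configuration.getboolean('lyft', 'prefer_ip_over_hostname') call in the diff, while the exact snippet and the behaviour demo are assumptions:

# Assumed airflow.cfg entry enabling the patch:
#
#   [lyft]
#   prefer_ip_over_hostname = True
#
# Once airflow/bin/airflow has applied the patch, anything that asks for the
# host's name gets the instance's private IP instead of the non-resolvable
# hostname, e.g.:
import socket

print(socket.getfqdn())      # e.g. "10.0.24.67" instead of "etl-production-iad-d532cf46"
print(socket.gethostname())  # same value; both names now point at the patched getfqdn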