Improve/surface errors when attempting to read S3 logs #9118

Closed
ashb opened this issue Jun 3, 2020 · 5 comments

Labels
kind:feature Feature Requests

Comments

@ashb
Member

ashb commented Jun 3, 2020

Description

As mentioned in #8212, if you have configured S3 logs but something goes wrong, the error is never surfaced to the UI (nor to the webserver logs), making this very hard to debug.

All you see in the UI is this:

*** Log file does not exist: /usr/local/airflow/logs/MY_DAG_NAME/MY_TASK_NAME/2020-04-07T20:59:19.312402+00:00/6.log
*** Fetching from: http://MY_DAG_NAME-0dde5ff5a786437cb14234:8793/log/MY_DAG_NAME/MY_TASK_NAME/2020-04-07T20:59:19.312402+00:00/6.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='MY_DAG_NAME-0dde5ff5a786437cb14234', port=8793): Max retries exceeded with url: /log/MY_DAG_NAME/MY_TASK_NAME/2020-04-07T20:59:19.312402+00:00/6.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f708332fc90>: Failed to establish a new connection: [Errno -2] Name or service not known'))

In one such case I was debugging, I found this error when attempting to communicate with S3:

>>> from airflow.configuration import conf
[2020-06-03 08:26:00,253] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=11106
>>> from airflow.hooks.S3_hook import S3Hook
>>> h = S3Hook(aws_conn_id=conf.get('core', 'remote_log_conn_id'))
>>> c = h.get_conn()
>>> c.list_buckets()
[2020-06-03 08:27:24,662] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.amazonaws.com
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python3.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records.

We should at least add a *** line showing that we attempted to fetch the logs from S3, along with the error it failed with. Right now the S3TaskHandler is totally silent on error. This is bad.
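
A rough sketch of what I mean (the names follow the 1.10-era airflow.utils.log.s3_task_handler.S3TaskHandler, but treat the exact signature and hook call as illustrative, not the actual current code):

# Sketch: surface the failure in the same "*** ..." style the UI already
# uses for local log errors, instead of swallowing the exception silently.
def s3_read(self, remote_log_location, return_error=False):
    try:
        return self.hook.read_key(remote_log_location)
    except Exception as e:
        msg = '*** Failed to read remote log from {}: {}'.format(
            remote_log_location, e)
        self.log.error(msg)
        if return_error:
            return msg  # ends up in the log text shown in the UI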

@dimon222
Contributor

dimon222 commented Jun 9, 2020

Any clue which component would need to be touched for this? I'm having trouble finding the code that swallows the exceptions. I'd really like to investigate and resolve this S3 logs issue.

@YevhenKv

I used the code from the example above and was able to list the log files Airflow had written to my S3 bucket: no permission issues, no errors. However, Airflow seems to ignore the remote logging configuration when it reads logs. I also tried an airflow_local_settings.py and set a logging class path in the config file, but no luck; the S3TaskHandler seems to be ignored by Airflow (the config keys involved are sketched below, after the environment details).

OS: Fedora 26
Python: 3.7.5
Airflow: 1.10.10
AWS EC2 instance with proper role permissions.
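
For reference, these are the remote logging settings involved, as a sketch for Airflow 1.10.x where they live under [core] (the bucket path and connection id are placeholders):

[core]
remote_logging = True
remote_base_log_folder = s3://my-bucket/airflow/logs
remote_log_conn_id = my_s3_conn
# only needed if you use a custom logging config:
# logging_config_class = log_config.LOGGING_CONFIG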

@Siddharthk

@ashb @JPonte I am getting the error below. Looks like a bug:

>>> from airflow.configuration import conf
>>> from airflow.hooks.S3_hook import S3Hook
>>> h = S3Hook(aws_conn_id=conf.get('core', 'remote_log_conn_id'))
>>> c = h.get_conn()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 44, in get_conn
    return self.get_client_type('s3')
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 176, in get_client_type
    session, endpoint_url = self._get_credentials(region_name)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 102, in _get_credentials
    connection_object = self.get_connection(self.aws_conn_id)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 84, in get_connection
    conn = random.choice(list(cls.get_connections(conn_id)))
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 80, in get_connections
    return secrets.get_connections(conn_id)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/secrets/__init__.py", line 52, in get_connections
    conn_list = secrets_backend.get_connections(conn_id=conn_id)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/secrets/base_secrets.py", line 69, in get_connections
    conn = Connection(conn_id=conn_id, uri=conn_uri)
  File "<string>", line 4, in __init__
  File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/orm/state.py", line 433, in _initialize_instance
    manager.dispatch.init_failure(self, args, kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
    exc_value, with_traceback=exc_tb,
  File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/orm/state.py", line 430, in _initialize_instance
    return manager.original_init(*mixed[1:], **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/connection.py", line 119, in __init__
    self.parse_from_uri(uri)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/connection.py", line 144, in parse_from_uri
    self.port = uri_parts.port
  File "/usr/local/lib/python3.6/urllib/parse.py", line 169, in port
    port = int(port, 10)
ValueError: invalid literal for int() with base 10: 'abcd12'

My AWS secret key is (dummy): abcd12/ef34578fgt

Looks like when a '/' appears in the secret key, the connection cannot be created.
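
The failure is reproducible with just the standard library (a sketch: the aws:// URI shape mirrors how Airflow serializes connections, and the key/secret are the dummy values above):

>>> from urllib.parse import urlparse
>>> # the raw '/' in the secret ends the netloc early, so 'abcd12'
>>> # lands in the port slot and int('abcd12', 10) fails
>>> urlparse('aws://AKIADUMMYKEY:abcd12/ef34578fgt@').port
ValueError: invalid literal for int() with base 10: 'abcd12'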

@ashb
Member Author

ashb commented Aug 2, 2020

@Siddharthk you probably need to URL-encode the secret key: / becomes %2F
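
For example (quote is from the standard library; the secret is the dummy one from above):

>>> from urllib.parse import quote
>>> quote('abcd12/ef34578fgt', safe='')
'abcd12%2Fef34578fgt'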

@ashb
Member Author

ashb commented Nov 13, 2020

Closed by #9908

@ashb ashb closed this as completed Nov 13, 2020