[AIRFLOW-3159] Update GCS logging docs for latest code #3952

Merged
merged 2 commits into from
Oct 5, 2018
Changes from 1 commit
64 changes: 12 additions & 52 deletions docs/howto/write-logs.rst
@@ -89,54 +89,21 @@ Writing Logs to Google Cloud Storage

Follow the steps below to enable Google Cloud Storage logging.

#. Airflow's logging system requires a custom .py file to be located in the ``PYTHONPATH``, so that it's importable from Airflow. Start by creating a directory to store the config file. ``$AIRFLOW_HOME/config`` is recommended.
#. Create empty files called ``$AIRFLOW_HOME/config/log_config.py`` and ``$AIRFLOW_HOME/config/__init__.py``.
#. Copy the contents of ``airflow/config_templates/airflow_local_settings.py`` into the ``log_config.py`` file that was just created in the step above.
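   For example, a minimal sketch of these first three steps, assuming a pip-installed Airflow and the default ``$AIRFLOW_HOME`` (the template path is resolved from the installed package rather than hard-coded):

   .. code-block:: bash

       # Create the config package so it is importable from the PYTHONPATH
       mkdir -p $AIRFLOW_HOME/config
       touch $AIRFLOW_HOME/config/__init__.py

       # Seed log_config.py from the template that ships with Airflow
       AIRFLOW_PKG_DIR=$(python -c "import os, airflow; print(os.path.dirname(airflow.__file__))")
       cp "$AIRFLOW_PKG_DIR/config_templates/airflow_local_settings.py" "$AIRFLOW_HOME/config/log_config.py"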
#. Customize the following portions of the template:

.. code-block:: bash

# Add this variable to the top of the file. Note the trailing slash.
GCS_LOG_FOLDER = 'gs://<bucket where logs should be persisted>/'

# Rename DEFAULT_LOGGING_CONFIG to LOGGING_CONFIG
LOGGING_CONFIG = ...

# Add a GCSTaskHandler to the 'handlers' block of the LOGGING_CONFIG variable
'gcs.task': {
'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
'formatter': 'airflow.task',
'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
'gcs_log_folder': GCS_LOG_FOLDER,
'filename_template': FILENAME_TEMPLATE,
},

# Update the airflow.task and airflow.task_runner blocks to be 'gcs.task' instead of 'file.task'.
'loggers': {
'airflow.task': {
'handlers': ['gcs.task'],
...
},
'airflow.task_runner': {
'handlers': ['gcs.task'],
...
},
'airflow': {
'handlers': ['console'],
...
},
}

#. Make sure a Google Cloud Platform connection hook has been defined in Airflow. The hook should have read and write access to the Google Cloud Storage bucket defined above in ``GCS_LOG_FOLDER``.

#. Update ``$AIRFLOW_HOME/airflow.cfg`` to contain:
To enable this feature, ``airflow.cfg`` must be configured as in this
example:

.. code-block:: bash

task_log_reader = gcs.task
logging_config_class = log_config.LOGGING_CONFIG
remote_log_conn_id = <name of the Google cloud platform hook>
[core]
# Airflow can store logs remotely in Google Cloud Storage. Users must supply
# a remote location URL (starting with 'gs://...') and an Airflow connection
# id that provides access to the storage location.
remote_logging = True
remote_base_log_folder = gs://my-bucket/path/to/logs
remote_log_conn_id = MyGCSConn
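
The same values can also be supplied through Airflow's ``AIRFLOW__{SECTION}__{KEY}`` environment variables instead of editing ``airflow.cfg``; a sketch using the example bucket and connection id from above:

.. code-block:: bash

    # Equivalent to the [core] settings above; export before starting
    # the webserver and scheduler
    export AIRFLOW__CORE__REMOTE_LOGGING=True
    export AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=gs://my-bucket/path/to/logs
    export AIRFLOW__CORE__REMOTE_LOG_CONN_ID=MyGCSConn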

#. Install the ``gcp_api`` package first, like so: ``pip install apache-airflow[gcp_api]``.
#. Make sure a Google Cloud Platform connection hook has been defined in Airflow. The hook should have read and write access to the Google Cloud Storage bucket defined above in ``remote_base_log_folder``.
#. Restart the Airflow webserver and scheduler, and trigger (or wait for) a new task execution.
#. Verify that logs are showing up for newly executed tasks in the bucket you've defined.
#. Verify that the Google Cloud Storage viewer is working in the UI. Pull up a newly executed task, and verify that you see something like:
@@ -150,10 +117,3 @@
[2017-10-03 21:57:51,306] {base_task_runner.py:98} INFO - Subtask: [2017-10-03 21:57:51,306] {models.py:186} INFO - Filling up the DagBag from /airflow/dags/example_dags/example_bash_operator.py

Note the top line that says it's reading from the remote log file.
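
You can also confirm outside the UI by listing the bucket path directly, assuming the example bucket above and that the Cloud SDK's ``gsutil`` is installed and authorized:

.. code-block:: bash

    # Recursively list the remote log folder; logs for newly executed
    # tasks should appear here shortly after they run
    gsutil ls -r gs://my-bucket/path/to/logs/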

Please be aware that if you were persisting logs to Google Cloud Storage
using the old-style airflow.cfg configuration method, the old logs will no
longer be visible in the Airflow UI, though they'll still exist in Google
Cloud Storage. This is a backwards incompatible change. If you are unhappy
with it, you can change the ``FILENAME_TEMPLATE`` to reflect the old-style
log filename format.