Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make graphviz dependency optional #36647

Merged
merged 1 commit into from
Jan 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 7 additions & 7 deletions CONTRIBUTING.rst
Original file line number Diff line number Diff line change
Expand Up @@ -856,13 +856,13 @@ arangodb, asana, async, atlas, atlassian.jira, aws, azure, cassandra, celery, cg
cncf.kubernetes, cohere, common.io, common.sql, crypto, databricks, datadog, dbt.cloud,
deprecated_api, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc, doc_gen, docker,
druid, elasticsearch, exasol, fab, facebook, ftp, gcp, gcp_api, github, github_enterprise, google,
google_auth, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes,
ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql,
mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty,
pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs,
salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake,
spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, weaviate,
webhdfs, winrm, yandex, zendesk
google_auth, graphviz, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos,
kubernetes, ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo,
mssql, mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel,
pagerduty, pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq,
redis, s3, s3fs, salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp,
snowflake, spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv,
weaviate, webhdfs, winrm, yandex, zendesk
.. END EXTRAS HERE
Provider packages
Expand Down
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
# much smaller.
#
# Use the same builder frontend version for everyone
ARG AIRFLOW_EXTRAS="aiobotocore,amazon,async,celery,cncf.kubernetes,common.io,docker,elasticsearch,ftp,google,google_auth,grpc,hashicorp,http,ldap,microsoft.azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,virtualenv"
ARG AIRFLOW_EXTRAS="aiobotocore,amazon,async,celery,cncf.kubernetes,common.io,docker,elasticsearch,ftp,google,google_auth,graphviz,grpc,hashicorp,http,ldap,microsoft.azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,virtualenv"
ARG ADDITIONAL_AIRFLOW_EXTRAS=""
ARG ADDITIONAL_PYTHON_DEPS=""

Expand Down
14 changes: 7 additions & 7 deletions INSTALL
Original file line number Diff line number Diff line change
Expand Up @@ -101,13 +101,13 @@ arangodb, asana, async, atlas, atlassian.jira, aws, azure, cassandra, celery, cg
cncf.kubernetes, cohere, common.io, common.sql, crypto, databricks, datadog, dbt.cloud,
deprecated_api, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc, doc_gen, docker,
druid, elasticsearch, exasol, fab, facebook, ftp, gcp, gcp_api, github, github_enterprise, google,
google_auth, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes,
ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql,
mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty,
pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs,
salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake,
spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, weaviate,
webhdfs, winrm, yandex, zendesk
google_auth, graphviz, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos,
kubernetes, ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo,
mssql, mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel,
pagerduty, pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq,
redis, s3, s3fs, salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp,
snowflake, spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv,
weaviate, webhdfs, winrm, yandex, zendesk
# END EXTRAS HERE

# For installing Airflow in development environments - see CONTRIBUTING.rst
Expand Down
15 changes: 14 additions & 1 deletion airflow/utils/dot_renderer.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,14 @@
"""Renderer DAG (tasks and dependencies) to the graphviz object."""
from __future__ import annotations

import warnings
from typing import TYPE_CHECKING, Any

import graphviz
try:
import graphviz
except ImportError:
warnings.warn("Could not import graphviz. Rendering graph to the graphical format will not be possible.")
graphviz = None

from airflow.exceptions import AirflowException
from airflow.models.baseoperator import BaseOperator
Expand Down Expand Up @@ -151,6 +156,10 @@ def render_dag_dependencies(deps: dict[str, list[DagDependency]]) -> graphviz.Di
:param deps: List of DAG dependencies
:return: Graphviz object
"""
if not graphviz:
raise AirflowException(
"Could not import graphviz. Install the graphviz python package to fix this error."
)
dot = graphviz.Digraph(graph_attr={"rankdir": "LR"})

for dag, dependencies in deps.items():
Expand Down Expand Up @@ -179,6 +188,10 @@ def render_dag(dag: DAG, tis: list[TaskInstance] | None = None) -> graphviz.Digr
:param tis: List of task instances
:return: Graphviz object
"""
if not graphviz:
raise AirflowException(
"Could not import graphviz. Install the graphviz python package to fix this error."
)
dot = graphviz.Digraph(
dag.dag_id,
graph_attr={
Expand Down
1 change: 1 addition & 0 deletions dev/breeze/src/airflow_breeze/global_constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -432,6 +432,7 @@ def get_airflow_extras():
"ftp",
"google",
"google_auth",
"graphviz",
"grpc",
"hashicorp",
"http",
Expand Down
2 changes: 2 additions & 0 deletions docs/apache-airflow/extra-packages-ref.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,8 @@ python dependencies for the provided package.
+---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
| google_auth | ``pip install 'apache-airflow[google_auth]'`` | Google auth backend |
+---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
| graphviz | ``pip install 'apache-airflow[graphviz]'`` | Graphviz renderer for converting DAG to graphical output |
+---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
| kerberos | ``pip install 'apache-airflow[kerberos]'`` | Kerberos integration for Kerberized services (Hadoop, Presto, Trino) |
+---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
| ldap | ``pip install 'apache-airflow[ldap]'`` | LDAP authentication for users |
Expand Down
1 change: 1 addition & 0 deletions docs/docker-stack/build-arg-ref.rst
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ List of default extras in the production Dockerfile:
* ftp
* google
* google_auth
* graphviz
* grpc
* hashicorp
* http
Expand Down
1 change: 1 addition & 0 deletions docs/spelling_wordlist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -687,6 +687,7 @@ googleapiclient
GoogleDisplayVideo
gpu
gpus
graphviz
greenlet
Groupalia
groupId
Expand Down
2 changes: 1 addition & 1 deletion images/breeze/output_prod-image_build.txt
Original file line number Diff line number Diff line change
@@ -1 +1 @@
aa383b195f2991035d5333a6f37bebaa
f7a753d66923772bfb6250d5b87d1f51
23 changes: 23 additions & 0 deletions newsfragments/36647.significant.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
Graphviz dependency is now an optional one, not required one.

The ``graphviz`` dependency has been problematic as Airflow required dependency - especially for
ARM-based installations. Graphviz packages require binary graphviz libraries - which is already a
limitation, but they also require to install graphviz Python bindings to be build and installed.
This does not work for older Linux installation but - more importantly - when you try to install
Graphviz libraries for Python 3.8, 3.9 for ARM M1 MacBooks, the packages fail to install because
Python bindings compilation for M1 can only work for Python 3.10+.

This is not a breaking change technically - the CLIs to render the DAGs is still there and IF you
already have graphviz installed, it will continue working as it did before. The only problem when it
does not work is where you do not have graphviz installed it will raise an error and inform that you need it.

Graphviz will remain to be installed for most users:

* the Airflow Image will still contain graphviz library, because
it is added there as extra
* when previous version of Airflow has been installed already, then
graphviz library is already installed there and Airflow will
continue working as it did

The only change will be a new installation of new version of Airflow from the scratch, where graphviz will
need to be specified as extra or installed separately in order to enable DAG rendering option.
1 change: 0 additions & 1 deletion setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,6 @@ install_requires =
flask-wtf>=0.15
fsspec>=2023.10.0
google-re2>=1.0
graphviz>=0.12
gunicorn>=20.1.0
httpx
importlib_metadata>=1.7;python_version<"3.9"
Expand Down
3 changes: 3 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -318,12 +318,14 @@ def write_version(filename: str = str(AIRFLOW_SOURCES_ROOT / "airflow" / "git_ve
]
doc_gen = [
"eralchemy2",
"graphviz>=0.12",
]
flask_appbuilder_oauth = [
"authlib>=1.0.0",
# The version here should be upgraded at the same time as flask-appbuilder in setup.cfg
"flask-appbuilder[oauth]==4.3.10",
]
graphviz = ["graphviz>=0.12"]
kerberos = [
"pykerberos>=1.1.13",
"requests_kerberos>=0.10.0",
Expand Down Expand Up @@ -593,6 +595,7 @@ def get_unique_dependency_list(req_list_iterable: Iterable[list[str]]):
"deprecated_api": deprecated_api,
"github_enterprise": flask_appbuilder_oauth,
"google_auth": flask_appbuilder_oauth,
"graphviz": graphviz,
"kerberos": kerberos,
"ldap": ldap,
"leveldb": leveldb,
Expand Down