Allow configuration to be contributed by providers (#32604)
* Allow configuration to be contributed by providers

The changes implemented:

* provider.yaml files for providers can optionally contribute extra
  configuration. The configuration is exposed via the "get_provider_info"
  entrypoint, which allows Airflow to discover it both from sources
  (in Breeze and local development) and from installed packages

* Provider configurations are lazily loaded, only for commands that
  actually need them

* Documentation for configuration contributed by providers is
  generated as part of the provider documentation. It is also discoverable
  through a "core-extension" page that lists all community providers
  contributing their own configuration.

* Celery configuration (and in the future Kubernetes configuration) is
  linked directly from the Airflow documentation. The providers are
  preinstalled, which means that Celery (and Kubernetes in the future)
  configuration is considered important enough to be mentioned
  and linked directly from the core. Similarly, Celery and Kubernetes executor
  documentation remains in the core documentation (though configuration
  options are detailed only in the provider documentation and only
  linked from the core).

* Configuration writing happens in "main", not during configuration
  initialization, and we always execute provider configuration
  initialization. This makes sure that the generated configuration
  contains configuration for the providers as well.

* Related documentation about custom and community providers has been
  updated and somewhat refactored. I realized that some of it was quite
  out-of-date and some of it was really "developer" rather than user docs.
  The docs are restructured a bit and cleaned up, missing information is
  added, and old or irrelevant parts are removed.
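
The first two points in the list above can be illustrated with a minimal sketch. It is not the Airflow implementation: `get_provider_info` is the entrypoint named in the commit message, but the `example` section, its option, the package name, and the `LazyProviderConfigs` class are hypothetical names. They only show the rough shape of a contributed configuration and how lazy loading defers discovery until a command asks for it.

```python
from __future__ import annotations

from functools import cached_property
from typing import Callable


def get_provider_info() -> dict:
    """Hypothetical provider entrypoint; the "config" key carries extra options."""
    return {
        "package-name": "example-provider",  # illustrative name, not a real package
        "config": {
            "example": {
                "description": "Options contributed by the example provider.",
                "options": {
                    "worker_concurrency": {"default": "16", "type": "integer"},
                },
            },
        },
    }


class LazyProviderConfigs:
    """Collects provider configuration only when it is first requested."""

    def __init__(self, entrypoints: list[Callable[[], dict]]) -> None:
        self._entrypoints = entrypoints
        self.load_count = 0  # counts how many times discovery actually ran

    @cached_property
    def provider_configs(self) -> dict[str, dict]:
        self.load_count += 1
        configs = {}
        for entrypoint in self._entrypoints:
            info = entrypoint()
            if "config" in info:  # contributing configuration is optional
                configs[info["package-name"]] = info["config"]
        return configs


manager = LazyProviderConfigs([get_provider_info])
assert manager.load_count == 0  # nothing is loaded until a command needs it
defaults = manager.provider_configs["example-provider"]["example"]["options"]
```

Accessing `provider_configs` a second time reuses the cached result, so commands that never touch provider options do not pay the discovery cost.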

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* Update airflow/configuration.py

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

---------

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
potiuk and jedcunningham committed Jul 21, 2023
1 parent 8156551 commit 73b90c4
Showing 132 changed files with 1,556 additions and 945 deletions.
3 changes: 2 additions & 1 deletion TESTING.rst
@@ -75,7 +75,7 @@ as default for all tests you should add the value to this file.
You can also of course override the values in individual test by patching environment variables following
the usual ``AIRFLOW__SECTION__KEY`` pattern or ``conf_vars`` context manager.
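
As a rough illustration of the ``AIRFLOW__SECTION__KEY`` pattern mentioned above, the sketch below patches an environment variable for the duration of a block, the way an individual test might. The helpers `env_var_name` and `read_option` and the `example`/`option` names are hypothetical and are not Airflow APIs; real code would read values through Airflow's configuration object or the ``conf_vars`` context manager.

```python
import os
from unittest import mock


def env_var_name(section: str, key: str) -> str:
    """Build the override variable name (hypothetical helper, not an Airflow API)."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def read_option(section: str, key: str, default: str) -> str:
    """Return the environment override when present, otherwise the default."""
    return os.environ.get(env_var_name(section, key), default)


# Patch the environment only inside the block, as an individual test would.
with mock.patch.dict(os.environ, {env_var_name("example", "option"): "4"}):
    overridden = read_option("example", "option", default="32")

# Outside the block the patch is undone and the default applies again.
restored = read_option("example", "option", default="32")
```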

.. note::
.. note:: Previous way of setting the test configuration

The test configuration for Airflow before July 2023 was automatically generated in a file named
``AIRFLOW_HOME/unittest.cfg``. The template for it was stored in "config_templates" next to the yaml file.
@@ -87,6 +87,7 @@ the usual ``AIRFLOW__SECTION__KEY`` pattern or ``conf_vars`` context manager.

The unittest.cfg file generated in {AIRFLOW_HOME} will no longer be used and can be removed.


Airflow test types
------------------

7 changes: 7 additions & 0 deletions airflow/__main__.py
@@ -45,6 +45,13 @@ def main():
parser = cli_parser.get_parser()
argcomplete.autocomplete(parser)
args = parser.parse_args()

# Here we ensure that the default configuration is written if needed before running any command
# that might need it. This used to be done during configuration initialization but having it
# in main ensures that it is not done during tests and other ways airflow imports are used
from airflow.configuration import write_default_airflow_configuration_if_needed

write_default_airflow_configuration_if_needed()
args.func(args)
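
The idea behind this hunk, writing the default configuration from the entry point rather than as a side effect of importing the configuration module, can be sketched as follows. This is an assumption-laden illustration, not the body of `write_default_airflow_configuration_if_needed`: the helper name, its signature, and the file contents here are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_default_config_if_needed(path: Path, default_text: str) -> bool:
    """Write the default file only when none exists; report whether it was written."""
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(default_text)
    return True


def main(config_path: Path) -> list[bool]:
    # The write happens here, at the entry point, not when a module is imported,
    # so tests and other importers never create the file by accident.
    first = write_default_config_if_needed(config_path, "[core]\n")
    second = write_default_config_if_needed(config_path, "[core]\n")
    return [first, second]


with tempfile.TemporaryDirectory() as tmp:
    results = main(Path(tmp) / "home" / "airflow.cfg")
```

The second call is a no-op because the file already exists, which is what makes the helper safe to run before every command.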


34 changes: 29 additions & 5 deletions airflow/cli/cli_config.py
@@ -330,6 +330,11 @@ def string_lower_type(val):
help="Comment out all configuration options. Useful as starting point for new installation",
action="store_true",
)
ARG_EXCLUDE_PROVIDERS = Arg(
("-p", "--exclude-providers"),
help="Exclude provider configuration (they are included by default)",
action="store_true",
)
ARG_DEFAULTS = Arg(
("-a", "--defaults"),
help="Show only defaults - do not include local configuration, sources,"
@@ -818,6 +823,16 @@ def string_lower_type(val):
action="store_true",
)

# IMPORTANT NOTE! ONLY FOR CELERY ARGUMENTS
#
# Celery configs below have explicit fallback values because celery provider defaults are not yet loaded
# via provider at the time we parse the command line, so in case it is not set, we need to have manual
# fallback. After ProvidersManager.initialize_providers_configuration() is called, the fallbacks are
# not needed anymore and everywhere where you access configuration in provider-specific code and when
# you are sure that providers configuration has been initialized, you can use conf.get() without fallbacks.
#
# DO NOT REMOVE THE FALLBACKS in args parsing even if you are tempted to.
# TODO: possibly move the commands to providers but that could be big performance hit on the CLI
# worker
ARG_QUEUES = Arg(
("-q", "--queues"),
@@ -828,7 +843,7 @@ def string_lower_type(val):
("-c", "--concurrency"),
type=int,
help="The number of worker processes",
default=conf.get("celery", "worker_concurrency"),
default=conf.getint("celery", "worker_concurrency", fallback=16),
)
ARG_CELERY_HOSTNAME = Arg(
("-H", "--celery-hostname"),
@@ -855,22 +870,24 @@ def string_lower_type(val):
ARG_BROKER_API = Arg(("-a", "--broker-api"), help="Broker API")
ARG_FLOWER_HOSTNAME = Arg(
("-H", "--hostname"),
default=conf.get("celery", "FLOWER_HOST"),
default=conf.get("celery", "FLOWER_HOST", fallback="0.0.0.0"),
help="Set the hostname on which to run the server",
)
ARG_FLOWER_PORT = Arg(
("-p", "--port"),
default=conf.get("celery", "FLOWER_PORT"),
default=conf.getint("celery", "FLOWER_PORT", fallback=5555),
type=int,
help="The port on which to run the server",
)
ARG_FLOWER_CONF = Arg(("-c", "--flower-conf"), help="Configuration file for flower")
ARG_FLOWER_URL_PREFIX = Arg(
("-u", "--url-prefix"), default=conf.get("celery", "FLOWER_URL_PREFIX"), help="URL prefix for Flower"
("-u", "--url-prefix"),
default=conf.get("celery", "FLOWER_URL_PREFIX", fallback=""),
help="URL prefix for Flower",
)
ARG_FLOWER_BASIC_AUTH = Arg(
("-A", "--basic-auth"),
default=conf.get("celery", "FLOWER_BASIC_AUTH"),
default=conf.get("celery", "FLOWER_BASIC_AUTH", fallback=""),
help=(
"Securing Flower with Basic Authentication. "
"Accepts user:password pairs separated by a comma. "
@@ -1848,6 +1865,12 @@ class GroupCommand(NamedTuple):
func=lazy_load_command("airflow.cli.commands.provider_command.executors_list"),
args=(ARG_OUTPUT, ARG_VERBOSE),
),
ActionCommand(
name="configs",
help="Get information about provider configuration",
func=lazy_load_command("airflow.cli.commands.provider_command.config_list"),
args=(ARG_OUTPUT, ARG_VERBOSE),
),
ActionCommand(
name="status",
help="Get information about provider initialization status",
@@ -2038,6 +2061,7 @@ class GroupCommand(NamedTuple):
ARG_INCLUDE_SOURCES,
ARG_INCLUDE_ENV_VARS,
ARG_COMMENT_OUT_EVERYTHING,
ARG_EXCLUDE_PROVIDERS,
ARG_DEFAULTS,
ARG_VERBOSE,
),
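
The note on Celery arguments in this file explains why the argument defaults carry explicit fallbacks: provider defaults are not yet loaded when the command line is parsed. The sketch below shows the same `fallback` keyword using the standard library `configparser` instead of Airflow's own configuration object, so it only demonstrates the general behaviour. The values 32 and 8 are made up for the illustration; 16 is the fallback from the diff.

```python
from configparser import ConfigParser

conf = ConfigParser()
conf.read_string("[core]\nparallelism = 32\n")  # note: no [celery] section yet

# Before provider defaults exist, the explicit fallback keeps this from raising.
early = conf.getint("celery", "worker_concurrency", fallback=16)

# Simulate provider configuration being loaded later on.
conf.read_dict({"celery": {"worker_concurrency": "8"}})

# Once the section is present, the lookup succeeds without a fallback.
late = conf.getint("celery", "worker_concurrency")
```

Without `fallback`, the first lookup would raise `NoSectionError`, which is the failure the manual fallbacks in the argument definitions guard against.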
9 changes: 9 additions & 0 deletions airflow/cli/commands/config_command.py
@@ -37,6 +37,7 @@ def show_config(args):
include_descriptions=args.include_descriptions or args.defaults,
include_sources=args.include_sources and not args.defaults,
include_env_vars=args.include_env_vars or args.defaults,
include_providers=not args.exclude_providers,
comment_out_everything=args.comment_out_everything or args.defaults,
only_defaults=args.defaults,
)
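
The `include_providers=not args.exclude_providers` line turns an opt-out flag into a positive keyword argument. A small standalone sketch with `argparse` shows the inversion. The flag name and help text come from the diff, while the parser itself and the `include_providers` function are hypothetical stand-ins for Airflow's `Arg` wrapper and config writer.

```python
from __future__ import annotations

import argparse

parser = argparse.ArgumentParser(prog="config-list-sketch")
parser.add_argument(
    "-p",
    "--exclude-providers",
    action="store_true",  # absent flag means False, so providers stay included
    help="Exclude provider configuration (they are included by default)",
)


def include_providers(argv: list[str]) -> bool:
    """Translate the opt-out flag into the positive keyword used by the writer."""
    args = parser.parse_args(argv)
    return not args.exclude_providers


default_behaviour = include_providers([])
opted_out = include_providers(["--exclude-providers"])
```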
@@ -48,6 +49,14 @@ def show_config(args):

def get_value(args):
"""Get one value from configuration."""
# while this will make get_value quite a bit slower we must initialize configuration
# for providers because we do not know what sections and options will be available after
# providers are initialized. Theoretically Providers might add new sections and options
# but also override defaults for existing options, so without loading all providers we
# cannot be sure what is the final value of the option.
from airflow.providers_manager import ProvidersManager

ProvidersManager().initialize_providers_configuration()
if not conf.has_option(args.section, args.option):
raise SystemExit(f"The option [{args.section}/{args.option}] is not found in config.")

12 changes: 12 additions & 0 deletions airflow/cli/commands/provider_command.py
@@ -183,6 +183,18 @@ def executors_list(args):
)


@suppress_logs_and_warning
def config_list(args):
"""Lists all configurations at the command line."""
AirflowConsole().print_as(
data=list(ProvidersManager().provider_configs),
output=args.output,
mapper=lambda x: {
"provider_config": x,
},
)


@suppress_logs_and_warning
def status(args):
"""Informs if providers manager has been initialized.