Failure during "airflow db init" on fresh airflow 2.0 installation #13149

Closed
hankehly opened this issue Dec 18, 2020 · 14 comments
Labels
invalid kind:bug This is a clearly a bug


@hankehly
Contributor

Apache Airflow version: 2.0

Environment:

  • Cloud provider or hardware configuration: Local development environment using docker
  • OS (e.g. from /etc/os-release): macOS Catalina (10.15.7)
  • Kernel (e.g. uname -a): Darwin
  • Install tools: poetry
  • Others: docker 20.10.0, postgres, Python 3.7.9

What happened:

A fresh installation of Airflow 2.0 fails on "airflow db init" with what looks like a third-party library exception (see the traceback below). I searched GitHub and Google for related issues but didn't find anything useful.

What you expected to happen:

Since I couldn't find any useful information online, I suspect the problem is in my environment. I will keep looking into it, but in the meantime I'd like to put this out there in case anyone else has had a similar issue.

How to reproduce it:

1. Install the Poetry package manager

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

2. Create a docker-compose.yml file inside a new directory

version: "3.9"
services:
  db:
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
      POSTGRES_PORT: 5432
    image: postgres
    ports:
      - 5432:5432

3. Create a pyproject.toml file in the same directory as in step 2

[tool.poetry]
name = "airflow-2-docker-example"
version = "0.1.0"
description = ""
authors = ["name <name@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"
apache-airflow = "^2.0.0"
psycopg2-binary = "^2.8.6"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

4. Install python packages and bring up the database

poetry install
docker-compose up -d

5. Initialize airflow database

AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow poetry run airflow db init
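Step 5 relies on Airflow's convention of reading any config option from an environment variable named AIRFLOW__{SECTION}__{KEY}; here the section is core and the key is sql_alchemy_conn. A minimal Python sketch of how that variable name and the connection URL fit together, assuming the credentials from the docker-compose.yml above. The helper names are hypothetical, not part of Airflow.

```python
from urllib.parse import quote


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment-variable name Airflow checks for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def postgres_conn_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a SQLAlchemy URL for the psycopg2 driver, percent-encoding the credentials."""
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )


if __name__ == "__main__":
    # Values taken from the docker-compose.yml in step 2.
    name = airflow_env_var("core", "sql_alchemy_conn")
    url = postgres_conn_url("airflow", "airflow", "localhost", 5432, "airflow")
    print(f"{name}={url}")
```

Percent-encoding matters only if the password contains characters such as "@" or "/"; with the plain "airflow" credentials the URL is identical to the one in step 5.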

Anything else we need to know:

Traceback
hankehly ~/src/airflow-2-docker-example $ AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow poetry run airflow db init
DB: postgresql+psycopg2://airflow:***@localhost:5432/airflow
[2020-12-18 19:23:20,344] {db.py:678} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 849da589634d -> 2c6edca13270, Resource based permissions.
Traceback (most recent call last):
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/cli/commands/db_command.py", line 31, in initdb
    db.initdb()
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/utils/db.py", line 549, in initdb
    upgradedb()
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/utils/db.py", line 688, in upgradedb
    command.upgrade(config, 'heads')
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/alembic/command.py", line 298, in upgrade
    script.run_env()
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/alembic/script/base.py", line 489, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/alembic/util/compat.py", line 184, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/migrations/env.py", line 108, in <module>
    run_migrations_online()
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/migrations/env.py", line 102, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/alembic/runtime/environment.py", line 846, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/alembic/runtime/migration.py", line 522, in run_migrations
    step.migration_fn(**kw)
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/migrations/versions/2c6edca13270_resource_based_permissions.py", line 310, in upgrade
    remap_permissions()
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/migrations/versions/2c6edca13270_resource_based_permissions.py", line 287, in remap_permissions
    appbuilder = create_app(config={'FAB_UPDATE_PERMS': False}).appbuilder
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/airflow/www/app.py", line 74, in create_app
    flask_app.config.from_pyfile(settings.WEBSERVER_CONFIG, silent=True)
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/flask/config.py", line 132, in from_pyfile
    exec(compile(config_file.read(), filename, "exec"), d.__dict__)
  File "/Users/hankehly/airflow/webserver_config.py", line 21, in <module>
    from flask_appbuilder.security.manager import AUTH_DB
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/flask_appbuilder/security/manager.py", line 13, in <module>
    from flask_openid import OpenID
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/flask_openid.py", line 26, in <module>
    from openid.store.filestore import FileOpenIDStore
  File "/Users/hankehly/src/airflow-2-docker-example/.venv/lib/python3.7/site-packages/openid/__init__.py", line 52, in <module>
    if len(version_info) != 3:
TypeError: object of type 'map' has no len()
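The last frame points at the root cause: the openid package that got imported calls len() on version_info, which is a map object. On Python 3, map() returns a lazy iterator with no length, so this Python 2-era module fails at import time. Later comments in this thread trace that to the Python 2-only python-openid distribution being installed instead of python3-openid. A minimal reproduction, assuming version_info is built with map(int, ...) as the error message suggests; the function names and the "2.2.5" version string are placeholders.

```python
def load_version_info_py2_style(version: str):
    # Python 2 idiom: map() returned a list there, so len() worked.
    version_info = map(int, version.split("."))
    if len(version_info) != 3:  # Python 3 raises TypeError: map objects have no len()
        raise RuntimeError("unexpected version format")
    return version_info


def load_version_info_py3(version: str) -> tuple:
    # Materializing the iterator restores the Python 2 behaviour.
    version_info = tuple(map(int, version.split(".")))
    if len(version_info) != 3:
        raise RuntimeError("unexpected version format")
    return version_info


if __name__ == "__main__":
    try:
        load_version_info_py2_style("2.2.5")
    except TypeError as exc:
        print(f"TypeError: {exc}")  # object of type 'map' has no len()
    print(load_version_info_py3("2.2.5"))  # (2, 2, 5)
```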
@hankehly hankehly added the kind:bug This is a clearly a bug label Dec 18, 2020
@boring-cyborg

boring-cyborg bot commented Dec 18, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk
Member

potiuk commented Dec 18, 2020

Hello. We do not support installing Airflow via Poetry.

The official way of installing Airflow reproducibly is described here (using pip 20.2.4 and the constraint mechanism):

http://airflow.apache.org/docs/apache-airflow/stable/installation.html#getting-airflow

Also, the problems with pip 20.3 have apparently been solved in pip 20.3.3 (we are still verifying this), so installing with pip 20.3.3 should also work.

Can you please verify that the problem persists if you follow the official installation mechanism?

If you still want to stay with Poetry, the list of "consistent" and working constraints for Airflow 2.0.0 is available at https://github.com/apache/airflow/tree/constraints-2.0.0 (separately for each supported Python version). In case Poetry resolves the requirements differently, I encourage you to make it follow the constraints we publish.

If you see any further problems, a simple comparison of your installed dependency versions with the ones in our constraints should help you resolve the problem and make Poetry install the right versions. I am not sure how this can be done, as I do not know Poetry that well, but if you find out how to make this work with Poetry, it would be great if you could contribute back a description of how to make our constraint mechanism work with Poetry-driven installations.
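The comparison suggested above can be scripted. A minimal Python sketch, assuming you have downloaded the constraints file for your Python version (for example constraints-3.7.txt from the constraints-2.0.0 branch) and run the script inside the Poetry virtualenv with poetry run. It uses importlib.metadata, which needs Python 3.8+ (the importlib_metadata backport covers 3.7); the script and function names are hypothetical. Because a constraints file only restricts packages that actually get installed, it reports only installed packages whose version differs from the pin.

```python
import re
import sys
from importlib.metadata import distributions  # Python 3.8+

# Matches "name==version" pins, ignoring trailing environment markers and comments.
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of '-', '_' and '.' become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_constraints(text: str) -> dict:
    """Map normalized package name -> pinned version."""
    pins = {}
    for line in text.splitlines():
        match = _PIN.match(line)
        if match:
            name, pinned = match.groups()
            pins[normalize(name)] = pinned
    return pins


def installed_versions() -> dict:
    """Map normalized name -> version for every distribution in this environment."""
    return {
        normalize(dist.metadata["Name"]): dist.version
        for dist in distributions()
        if dist.metadata["Name"]
    }


def find_mismatches(pins: dict, installed: dict) -> dict:
    """Installed packages whose version differs from the pin: {name: (pinned, installed)}."""
    return {
        name: (pins[name], actual)
        for name, actual in installed.items()
        if name in pins and actual != pins[name]
    }


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: poetry run python compare_constraints.py constraints-3.7.txt")
    else:
        with open(sys.argv[1], encoding="utf-8") as fh:
            pins = parse_constraints(fh.read())
        for name, (pinned, actual) in sorted(find_mismatches(pins, installed_versions()).items()):
            print(f"{name}: constraint {pinned}, installed {actual}")
```

Each reported line is a candidate to pin explicitly in pyproject.toml so that Poetry resolves the same version as the constraints file.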

We have a whole set of CI tests in place to make sure that the list of "valid" constraints is up to date and automatically verified, so following the constraints we produce is the best way to make sure your installation is smooth and works.

If you are interested in why we chose this path, you can read more about why it works this way and how it actually works here.

I am closing it as invalid now, but if you try to match the constraints and you see that you still have the same problems even with the same versions of dependencies installed, feel free to add extra comment here.

@potiuk potiuk added the invalid label Dec 18, 2020
@potiuk potiuk closed this as completed Dec 18, 2020
@hankehly
Contributor Author

Thank you for the quick reply.

Can you please verify that the problem persists if you follow the official installation mechanism?

DB initialization after installing packages with pip (even without specifying constraints) succeeds on my local environment.

If you see any further problems, a simple comparison of your installed versions of dependencies with the ones provided by our constraints should help you to resolve the problem and make poetry install the right versions. I am not sure how this can be done, I do not know poetry that well, but if you find how to make this work with poetry, it would be great if you could contribute back the description on how to make our constraint mechanisms work with poetry-driven installations.

Thanks. It looks like poetry does not support constraint files out of the box. If I come across anything helpful regarding installation via poetry I'll share it with the community.

@hankehly
Contributor Author

For those seeking a workaround, I was eventually able to initialize the database with a poetry installation by adding python3-openid to my list of dependencies (see python-poetry/poetry#1287 for details)

[tool.poetry]
name = "airflow-2-docker-example"
version = "0.1.0"
description = ""
authors = ["name <name@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"
apache-airflow = "^2.0.0"
psycopg2-binary = "^2.8.6"
python3-openid = "^3.2.0"  # newly added dependency

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
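After adding python3-openid, it can help to confirm which openid distribution actually ended up in the virtualenv. A minimal Python sketch, assuming Python 3.8+ for importlib.metadata; the function names and messages are hypothetical. As far as I know, python-openid and python3-openid both install the same top-level openid package, so if both are present they overwrite each other's files, and removing the legacy one may also require reinstalling python3-openid.

```python
import re
from importlib.metadata import distributions  # Python 3.8+

PY2_ONLY = "python-openid"   # legacy Python 2 distribution (fails with the TypeError above)
PY3_PORT = "python3-openid"  # Python 3 port; provides the same top-level "openid" package


def normalize(name: str) -> str:
    """PEP 503 name normalization so 'python3_openid' matches 'python3-openid'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def diagnose(installed_names) -> str:
    """Report which openid distribution is installed, given distribution names."""
    names = {normalize(n) for n in installed_names}
    if PY2_ONLY in names:
        return f"{PY2_ONLY} is installed; uninstall it and reinstall {PY3_PORT}"
    if PY3_PORT in names:
        return f"OK: only {PY3_PORT} is installed"
    return "no openid distribution is installed"


if __name__ == "__main__":
    # Run inside the project environment, e.g. with "poetry run python".
    names = [d.metadata["Name"] for d in distributions() if d.metadata["Name"]]
    print(diagnose(names))
```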

@potiuk
Member

potiuk commented Dec 21, 2020

For those seeking a workaround, I was eventually able to initialize the database with a poetry installation by adding python3-openid to my list of dependencies (see python-poetry/poetry#1287 for details)

Cool. Do you think you could write up repeatable instructions on how to install Airflow with Poetry? It would be great to add such instructions to the Airflow docs for those who try it.

UPDATE: Ah, I see your response about it. I would love to hear back if you find something there :). It is very possible that installation will work even without the constraints initially. The problem we experienced in the past was that at some point transitive dependency changes broke the Airflow installation; that's why we got those "fixed" constraint files that make a reproducible installation.

@maxcountryman
Contributor

...that's why we got those "fixed" constraint files that make a reproducible installation.

Why not use a lock file for this purpose? Poetry should support reproducible builds via the lock file.

@potiuk
Member

potiuk commented Dec 23, 2020

We did consider checking out Poetry in the past, so maybe this is a good time to try. Would it be possible for you to make a POC and check whether the poetry.lock approach can be used in a similar way to how we use constraint files? Happy to help with review and guide you, @maxcountryman.

@potiuk
Member

potiuk commented Dec 24, 2020

Just one comment here, @maxcountryman, if you are following that route. There are certain use cases that would need to be solved if we supported Poetry:

  1. Installing from PyPI or a wheel using the 'constraints': one reason we keep constraints in our repo is so that we can offer the "recommended" installation method to our users. See https://github.com/apache/airflow/blob/master/docs/apache-airflow/installation.rst#getting-airflow. For example, the user can run (without having the source code):
pip install apache-airflow[EXTRAS]==1.10.14 \
   --constraint https://raw.githubusercontent.com/apache/airflow/constraints-1.10.14/constraints-3.6.txt

And get a fully reproducible installation of Airflow 1.10.14, no matter what transitive dependencies have been released since we released Airflow. The poetry.lock file is not available via PyPI, and I wonder whether we can direct users to install Airflow from PyPI in a similar way using poetry.lock. Note that having to download such a file before installation is not a good solution: it should be one simple command.

  2. We keep constraint files in orphan branches in our repo, where we can update them independently of releasing Airflow. That allows us to fix any kind of constraint issue without having to update the source code.
  3. We keep separate constraints for each Python major/minor combination we support (2.7, 3.5, 3.6, 3.7 and 3.8 for 1.10; 3.6, 3.7 and 3.8 for 2.0, with 3.9 in progress). There are subtle (but significant) differences between those, and I wonder whether a single poetry.lock can handle this or whether we would need separate poetry-3.6.lock, poetry-3.7.lock, etc. Is there any support in Poetry for that, or would we again have to add our own tooling around it?
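Points 1 and 3 together mean the constraints URL depends on both the Airflow version and the running Python's major.minor version. A minimal Python sketch of how that URL and the single pip command can be assembled, assuming the raw.githubusercontent.com layout shown above; the moving per-minor branch name (constraints-2-0) follows the scheme used in the Dockerfile later in this thread. The function names are hypothetical.

```python
import sys
from typing import Optional

BASE_URL = "https://raw.githubusercontent.com/apache/airflow"


def constraints_url(airflow_version: str, python_version: Optional[str] = None,
                    moving: bool = False) -> str:
    """Build the raw URL of an Airflow constraints file.

    moving=False -> branch "constraints-<full version>", e.g. constraints-1.10.14
    moving=True  -> branch "constraints-<major>-<minor>", e.g. constraints-2-0
    """
    if python_version is None:
        # One constraints file per Python major.minor version.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    if moving:
        major, minor = airflow_version.split(".")[:2]
        branch = f"constraints-{major}-{minor}"
    else:
        branch = f"constraints-{airflow_version}"
    return f"{BASE_URL}/{branch}/constraints-{python_version}.txt"


def pip_install_command(airflow_version: str, extras: str = "",
                        python_version: Optional[str] = None) -> str:
    """The one-command install described in point 1."""
    spec = f"apache-airflow[{extras}]" if extras else "apache-airflow"
    url = constraints_url(airflow_version, python_version)
    return f'pip install "{spec}=={airflow_version}" --constraint {url}'


if __name__ == "__main__":
    print(pip_install_command("2.0.0"))
```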

Those are the three major challenges I see if we want to go the Poetry route at some point. I wonder, @maxcountryman (or anyone else with Poetry experience), whether you have experience that could help us see if those cases can be handled with it?

@Limess

Limess commented Jan 4, 2021

Not to ignore the above discussion, but for the benefit of anyone else using poetry on top of airflow in a workflow similar to the following who is experiencing issues with flask-openid:

  1. Use the official Airflow docker image
  2. Export requirements from poetry so we can use the Airflow constraints file
  3. Globally install on top of the base Airflow dependencies using the airflow constraints file to ensure we get what the Airflow team deems 'golden'

I had to add python3-openid to our pyproject.toml as above, but I also had to remove python-openid from the exported requirements.txt; otherwise it was still floating around and causing issues. As the Poetry issue (python-poetry/poetry#1287) says, this is a problem with flask-openid, not Poetry:

RUN export PYTHON_MAJOR_MINOR_VERSION=$(python -c 'import sys; print("%s.%s"% (sys.version_info.major, sys.version_info.minor))') \
  && AIRFLOW_MINOR_VERSION=$(echo "$AIRFLOW_VERSION" | cut -d "." -f 1)-$(echo "$AIRFLOW_VERSION" | cut -d "." -f 2) \
  && curl -sSL "https://raw.githubusercontent.com/apache/airflow/constraints-$AIRFLOW_MINOR_VERSION/constraints-$PYTHON_MAJOR_MINOR_VERSION.txt" > ./airflow-constraints.txt \
  && poetry export --without-hashes -f requirements.txt -o ./requirements.txt \
  # flask-openid does not correctly specify version constraints https://github.com/python-poetry/poetry/issues/1287
  && echo "remove python-openid from poetry packages as it's pulled in incorrectly by flask-openid" \
  && sed -i '/^python-openid==/d' ./requirements.txt \
  && pip install --user --no-cache-dir --upgrade pip==${PIP_VERSION} \
  && pip install --user --no-cache-dir --no-warn-script-location -r ./requirements.txt --constraint ./airflow-constraints.txt \
  && rm -rf ~/.cache ./requirements.txt ./airflow-constraints.txt
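The sed step above deletes every line that starts with "python-openid==" from the exported requirements.txt. The same filtering as a minimal Python sketch, for environments without sed; the script and function names are hypothetical. Note that a "python3-openid==" line is kept, because it does not start with the "python-openid==" prefix.

```python
import sys
from pathlib import Path


def drop_pinned(lines, name: str) -> list:
    """Keep every line except pins of `name`, like sed '/^name==/d'."""
    prefix = f"{name}=="
    return [line for line in lines if not line.startswith(prefix)]


def filter_requirements_file(path: str, name: str = "python-openid") -> None:
    """Rewrite the requirements file in place without the unwanted pin."""
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines(keepends=True)
    file.write_text("".join(drop_pinned(lines, name)), encoding="utf-8")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        filter_requirements_file(sys.argv[1])
    else:
        print("usage: python drop_openid.py requirements.txt")
```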

@potiuk
Member

potiuk commented Jan 4, 2021

Not to ignore the above discussion, but for the benefit of anyone else using poetry on top of airflow in a workflow similar to the following who is experiencing issues with flask-openid:

Just wondering (I would love to understand it): what benefit does Poetry bring in this particular case?
Is there any reason why one could not simply use pip alone rather than Poetry here?

I once considered switching Airflow to Poetry (that was a long time ago; see my mail from October 2018: https://lists.apache.org/thread.html/23a598f54eda27311544fbdb9503305cd214b27b211699bd37689f46%40%3Cdev.airflow.apache.org%3E), but after trying it out, I noticed that it lacked a lot of what we wanted (I also checked pip-tools then, but it wasn't good enough either). So I really wonder what benefits people get from using Poetry rather than other tools, and how it fits into the workflow you already have.

@Limess

Limess commented Jan 4, 2021

We started using Poetry to build an Airflow Docker image, and have since moved to the official image to minimize incompatibilities, but we haven't moved away from the tool.

I think the main reasons are:

  • reasonably sane virtualenv management if you set it up correctly with pyenv
  • it's easy to just run poetry install, after which IDE tools work pretty well if you keep the Poetry deps file up to date. We've had horrible issues setting up raw requirements files in other Python projects, and we're not a Python shop, so it's not a strong skill set for us
  • we have a monorepo with airflow config, a dbt project, and several small libraries (mainly singer taps/targets) which we install directly in virtualenvs and use poetry to manage deps so it's consistent

@potiuk
Member

potiuk commented Jan 5, 2021

I see. I perfectly understand what 'horrible issues' means. It's unfortunately yet another 'dependency hell' kind of situation. Good luck with that. For now I prefer to stick with pip install, but if you would like to make a PR to our contributing documentation on how to use our constraint files when installing Airflow with Poetry, in the chapter following this one: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pinned-constraint-files, that would be awesome, @Limess. I know there are people using Poetry for various reasons, and I would love to be able to tell them "if you use Poetry, you can follow this" rather than "do not use Poetry".

potiuk added a commit to PolideaInternal/airflow that referenced this issue Jan 16, 2021
Seems that python3-openid dependency is not properly solved by tools
like poetry (it is properly resolved by pip). The result is
that an old version of python3-openid is installed when poetry is
used, which causes errors when initdb is run.

While we do not use poetry as an official installation mechanism
this happens frequently enough and it is easy enough to fix
that we can add this dependency to make it easier for
poetry users.

Related to apache#13711 apache#13558 apache#13149
potiuk added a commit that referenced this issue Jan 16, 2021
* Adds python3-openid requirement

Seems that python3-openid dependency is not properly solved by tools
like poetry (it is properly resolved by pip). The result is
that an old version of python3-openid is installed when poetry is
used, which causes errors when initdb is run.

While we do not use poetry as an official installation mechanism
this happens frequently enough and it is easy enough to fix
that we can add this dependency to make it easier for
poetry users.

Related to #13711 #13558 #13149

* Update setup.cfg
kaxil pushed a commit that referenced this issue Jan 21, 2021
(cherry picked from commit df73edf)
@petobens

I had to fork flask-openid to make it work:

[tool.poetry]
name = "airflow-new"
version = "0.1.0"
description = ""
authors = ["petobens <foo@bar.com>"]

[tool.poetry.dependencies]
python = "^3.8"
flask-openid = {git = "https://github.com/petobens/flask-openid.git"}
apache-airflow = "^2.0.1"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Basically, I removed the sys.version_info check that made Poetry install the Python 2 dependency instead of the correct Python 3 one. Since Airflow 2.0 only supports Python >= 3.6, could Airflow use either my fork or a new flask-openid fork with this workaround? (The official flask-openid library hasn't been updated in almost 5 years.)
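One plausible reading of why a sys.version_info check misleads a resolver: in setup.py the branch is evaluated once, by whichever interpreter happens to run it, so the resulting dependency list carries no trace of the condition. Declaring the same choice with PEP 508 environment markers keeps the condition in the metadata, so the installer can evaluate it for the target interpreter. The sketch below is hypothetical and does not reproduce flask-openid's actual setup.py; to stay dependency-free it hand-evaluates only markers of the form python_version >= "N" or < "N".

```python
import sys


def imperative_install_requires(version_info=sys.version_info) -> list:
    """Hypothetical setup.py-style logic: the branch is fixed when this code runs."""
    requires = ["Flask"]
    if version_info >= (3,):
        requires.append("python3-openid")
    else:
        requires.append("python-openid")
    return requires


# The same choice declared with environment markers: both entries stay in the
# metadata, and the installer picks one per target interpreter.
DECLARATIVE_REQUIRES = [
    "Flask",
    'python3-openid; python_version >= "3"',
    'python-openid; python_version < "3"',
]


def select_for(requires, python_version: tuple) -> list:
    """Toy marker evaluation: supports only python_version >= / < "MAJOR"."""
    selected = []
    for req in requires:
        name, _, marker = (part.strip() for part in req.partition(";"))
        if marker:
            _, op, value = marker.split()
            target = (int(value.strip('"')),)
            matches = python_version >= target if op == ">=" else python_version < target
            if not matches:
                continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    print(imperative_install_requires())
    print(select_for(DECLARATIVE_REQUIRES, tuple(sys.version_info[:2])))
```

In real packaging code, the packaging library's Marker class handles the full marker grammar instead of this toy parser.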

@potiuk
Member

potiuk commented Feb 22, 2021

@petobens, did you try making a PR to the flask-openid library? The change is not huge, and given that Python 2 reached EOL more than a year ago, maybe the authors will merge and release it.

I am afraid we cannot release anything to PyPI that refers to a GitHub repository, even if we wanted to: PyPI does not work with dependencies from GitHub.


5 participants