Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't install SQLAlchemy/Pendulum adapters for other DBs #18745

Merged

Conversation

ashb
Copy link
Member

@ashb ashb commented Oct 5, 2021

This stops the MySQL libs being imported "unnecessarily" when Postgres is in use -- and there have been a few confusing reports of the mysql client libs causing problems in rare cases, so lets avoid the import if we can.

This doesn't fix the underlying problem causing #17546 but it might stop it affecting users who aren't using mysql. (Our docker image contains mysql lib, so the import is successful.)


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@ashb ashb added this to the Airflow 2.2.0 milestone Oct 5, 2021
@ashb ashb requested review from potiuk, kaxil and uranusjr October 5, 2021 13:42
@uranusjr
Copy link
Member

uranusjr commented Oct 5, 2021

Where have the checks gone 🤔

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would break some PythonOperator edge cases where people depend on these adapters. But even if so it's OK to break those.

@ashb
Copy link
Member Author

ashb commented Oct 5, 2021

Where have the checks gone thinking

Github can take some time to trigger the start of the action sometimes, and until it does it just show Green with no pending checks. It's a bit annoying in that regard.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Oct 5, 2021
@github-actions
Copy link

github-actions bot commented Oct 5, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk
Copy link
Member

potiuk commented Oct 5, 2021

Github can take some time to trigger the start of the action sometimes, and until it does it just show Green with no pending checks. It's a bit annoying in that regard.

There was a github Actions outage - that's why it took longer than usual, but yeah - I even have a ticket opened to GH to fix this "green even if we know there are other checks to be scheduled" case.

@potiuk
Copy link
Member

potiuk commented Oct 5, 2021

Good idea with the mysql thingie. I was wondering where it came from, but now I know

@ashb
Copy link
Member Author

ashb commented Oct 5, 2021

I wonder if this would break some PythonOperator edge cases where people depend on these adapters. But even if so it's OK to break those.

ahaha. It did:

tests/core/test_core_to_contrib.py::TestMovingCoreToContrib::test_is_subclass_393_airflow_providers_google_cloud_transfers_mysql_to_gcs_MySQLToGCSOperator: NameError: name '_mysql' is not defined
tests/core/test_core_to_contrib.py::TestMovingCoreToContrib::test_is_subclass_412_airflow_providers_apache_hive_transfers_mysql_to_hive_MySqlToHiveOperator: NameError: name '_mysql' is not defined
tests/core/test_core_to_contrib.py::TestMovingCoreToContrib::test_is_subclass_421_airflow_providers_mysql_transfers_vertica_to_mysql_VerticaToMySqlOperator: NameError: name '_mysql' is not defined
tests/core/test_core_to_contrib.py::TestMovingCoreToContrib::test_warning_on_import_393_airflow_providers_google_cloud_transfers_mysql_to_gcs_MySQLToGCSOperator: NameError: name '_mysql' is not defined
tests/core/test_core_to_contrib.py::TestMovingCoreToContrib::test_warning_on_import_412_airflow_providers_apache_hive_transfers_mysql_to_hive_MySqlToHiveOperator: NameError: name '_mysql' is not defined
tests/core/test_core_to_contrib.py::TestMovingCoreToContrib::test_warning_on_import_421_airflow_providers_mysql_transfers_vertica_to_mysql_VerticaToMySqlOperator: NameError: name '_mysql' is not defined

@ashb
Copy link
Member Author

ashb commented Oct 6, 2021

The same problem people are reporting in that thread is happening in this CI build now, and also for me locally

  _ TestMovingCoreToContrib.test_is_subclass_393_airflow_providers_google_cloud_transfers_mysql_to_gcs_MySQLToGCSOperator _
  
  thing = <module 'airflow.providers.google.cloud.transfers' from '/opt/airflow/airflow/providers/google/cloud/transfers/__init__.py'>
  comp = 'mysql_to_gcs'
  import_path = 'airflow.providers.google.cloud.transfers.mysql_to_gcs'
  
      def _dot_lookup(thing, comp, import_path):
          try:
  >           return getattr(thing, comp)
  E           AttributeError: module 'airflow.providers.google.cloud.transfers' has no attribute 'mysql_to_gcs'
  
  /usr/local/lib/python3.6/unittest/mock.py:1075: AttributeError
  
  During handling of the above exception, another exception occurred:
  
      """
      
      try:
          from MySQLdb.release import version_info
  >       from . import _mysql
  E       ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: cannot allocate memory in static TLS block
  
  /usr/local/lib/python3.6/site-packages/MySQLdb/__init__.py:18: ImportError
  
  During handling of the above exception, another exception occurred:
  
  a = (<test_core_to_contrib.TestMovingCoreToContrib testMethod=test_is_subclass_393_airflow_providers_google_cloud_transfers_mysql_to_gcs_MySQLToGCSOperator>,)
  
      @wraps(func)
      def standalone_func(*a):
  >       return func(*(a + p.args), **p.kwargs)

@potiuk
Copy link
Member

potiuk commented Oct 6, 2021

The same problem people are reporting in that thread is happening in this CI build now, and also for me locally

OOOH. Nice! REPRODUCIBILITY!

I will take a look as soon as take are of some urgent stuff!

@potiuk
Copy link
Member

potiuk commented Oct 6, 2021

Hey @ashb - I have an idea and need your help - if you can reproduce it locally.

Can you please try:

  ./breeze --github-image-id 2057b5bacee40841f7c7255e457876de04f54940 --backend mssql --python 3.6 --db-reset --skip-mounting-local-sources --test-type Always shell

and then:

pytest --with-db-init tests/always

This is what runs on CI and generates this error. However when I run it locally, I get NO PROBLEM.

If you can reproducibly get failure for it on your machine, then I believe I know the root cause.

My Hypothesis is that I have some CPU extension or more likely kernel feature (kernel is shared between host and container), that you do not have, and it makes compilation go different route and use the TLS for optimisation - but then it might behave differently (and crash) when that CPU extension/kernel feature is not available.

If we can consistently reproduce it, we can compare our CPU extensions/kernel versions and possibly we will be able to disable it when I generate the image.

@ashb
Copy link
Member Author

ashb commented Oct 6, 2021

I'll see if I can still reproduce it

@ashb
Copy link
Member Author

ashb commented Oct 7, 2021

I can't reproduce this with that image, but my locally built Python 3.8 image does have that.

@potiuk
Copy link
Member

potiuk commented Oct 7, 2021

I can't reproduce this with that image, but my locally built Python 3.8 image does have that.

Can you please push that image somewhere so that I can take a look :) ? I'd love to get to the bottom of it

@ashb
Copy link
Member Author

ashb commented Oct 7, 2021

@potiuk Pushing up ashb/airflow-test:py3-8-with-libcpperror now. Pushed now.

It seems to fairly relibably happen for the first time I run this command in the image

pytest tests/core/test_core_to_contrib.py::TestMovingCoreToContrib::test_is_subclass_393_airflow_providers_google_cloud_transfers_mysql_to_gcs_MySQLToGCSOperator

(Tested against psql 9.6, but I don't think that is relevant.)

@potiuk
Copy link
Member

potiuk commented Oct 10, 2021

@potiuk Pushing up ashb/airflow-test:py3-8-with-libcpperror now. Pushed now.

It seems to fairly relibably happen for the first time I run this command in the image

pytest tests/core/test_core_to_contrib.py::TestMovingCoreToContrib::test_is_subclass_393_airflow_providers_google_cloud_transfers_mysql_to_gcs_MySQLToGCSOperator

(Tested against psql 9.6, but I don't think that is relevant.)

Used your image, and I cannot reproduce it on my PC.

I will try on the EC2 instance, but in the meantime I have this:

Linux Hyperion 5.4.0-88-generic #99-Ubuntu SMP Thu Sep 23 17:29:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

My theory is that this will only show up for newer kernels for the host, that's why we started to get reports recently (nothing really changed in those libraries and compilations). Others reported such errors with higher kelner versions: https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1633680297064600?thread_ts=1633442559.410900&cid=CCQ7EGB1P - 5.4.120
@ashb - what's your kernel ?

@potiuk
Copy link
Member

potiuk commented Oct 10, 2021

Kernel on our runner's instance's is newer (and we saw it happening there):
Linux ip-172-31-31-24 5.11.0-1019-aws #20~20.04.1-Ubuntu SMP Tue Sep 21 10:40:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

@potiuk
Copy link
Member

potiuk commented Oct 10, 2021

I could not reproduce it with Ash's image but the good news I have it nicely reproducible on our Runner EC2 instance 🤞

@potiuk
Copy link
Member

potiuk commented Oct 10, 2021

I could not reproduce it with Ash's image but the good news I have it nicely reproducible on our Runner EC2 instance crossed_fingers

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 also reliably fixes the problem there

@ashb ashb force-pushed the only-register-mysql-pendulm-under-mysql branch from 2057b5b to 6f05853 Compare October 12, 2021 09:20
@ashb
Copy link
Member Author

ashb commented Oct 12, 2021

For reference my current kernel is 5.13.13-arch1-1

@ashb
Copy link
Member Author

ashb commented Oct 12, 2021

Do you think I should add export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 do the ci entrypoint @potiuk ?

@potiuk
Copy link
Member

potiuk commented Oct 12, 2021

I would like to continue to try to find a more 'neutral' fix for it later today - now i can reproduce it and will try to do it later today. LD_PRELOAD has the side effect that it slows down loading of everything

This stops the MySQL libs being imported "unnecessarily" when Postgres
is in use -- and there have been a few confusing reports of the mysql
client libs causing problems in rare cases, so lets avoid the import if
we can.
@jedcunningham jedcunningham force-pushed the only-register-mysql-pendulm-under-mysql branch from 6f05853 to e90ed55 Compare October 20, 2021 17:19
@jedcunningham jedcunningham merged commit d75cf4d into apache:main Oct 20, 2021
@jedcunningham jedcunningham deleted the only-register-mysql-pendulm-under-mysql branch October 20, 2021 18:56
jedcunningham pushed a commit that referenced this pull request Oct 20, 2021
This stops the MySQL libs being imported "unnecessarily" when Postgres
is in use -- and there have been a few confusing reports of the mysql
client libs causing problems in rare cases, so lets avoid the import if
we can.

(cherry picked from commit d75cf4d)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Oct 26, 2021
This stops the MySQL libs being imported "unnecessarily" when Postgres
is in use -- and there have been a few confusing reports of the mysql
client libs causing problems in rare cases, so lets avoid the import if
we can.

(cherry picked from commit d75cf4d)
sharon2719 pushed a commit to sharon2719/airflow that referenced this pull request Oct 27, 2021
This stops the MySQL libs being imported "unnecessarily" when Postgres
is in use -- and there have been a few confusing reports of the mysql
client libs causing problems in rare cases, so lets avoid the import if
we can.
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Oct 27, 2021
This stops the MySQL libs being imported "unnecessarily" when Postgres
is in use -- and there have been a few confusing reports of the mysql
client libs causing problems in rare cases, so lets avoid the import if
we can.

(cherry picked from commit d75cf4d)
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants