OpenMetadata in airgapped environment #18137
We are just spinning up a proof of concept, and we have things running in Docker, using an existing Airflow instance for the ingestion. Ingesting from databases works, as does ingesting dashboards, and we are now looking into classification of personal data. However, when running a profiling job, we get an error that points to a missing Python library. Snippet from the run log:
Now, searching the issues in this repo, I could only find some things related to models being downloaded on the fly, and I do not know how to address this problem: I have no internet access in the running environment, so everything needs to be downloaded when building my image. Following the documentation, trying to install the models with: ..causes the Airflow ingestion API to fail with an error that pydantic can't be imported. Airflow version 2.9.1. Does anyone have input on how to solve this?
Replies: 1 comment
Replying to myself here, as I got it working. The issue is twofold.

First,

```
pip --trusted-host github.com --trusted-host objects.githubusercontent.com install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.5.0/en_core_web_md-3.5.0.tar.gz
```

installs pydantic 1.x, causing havoc on Airflow. Second, not only spacy needs to be installed, but also presidio-analyzer.

So, if you are running OpenMetadata 1.5.6 against an existing Airflow 2.9.1, ensure that the following is installed in your Airflow image:
And ensure that v3.7 of the spacy models is used, NOT 3.5. This got me to a working state.
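For reference, a minimal sketch of the relevant build layer for an airgapped Airflow image under the assumptions above (OpenMetadata 1.5.6, Airflow 2.9.1). The exact en_core_web_md 3.7.x patch release, the `pii-processor` extra name, and the `--trusted-host` mirror hosts are assumptions to adapt to your own environment and proxy setup:

```dockerfile
# Sketch: bake the PII-classification dependencies into the image at build
# time, so nothing needs to be downloaded in the airgapped runtime.
FROM apache/airflow:2.9.1

# presidio-analyzer is required in addition to spacy itself.
# The spacy model must come from the 3.7 line; the 3.5 tarball drags in
# pydantic 1.x, which breaks the Airflow 2.9 ingestion API.
RUN pip install --no-cache-dir \
        presidio-analyzer \
        "openmetadata-ingestion[pii-processor]==1.5.6" && \
    pip install --no-cache-dir \
        --trusted-host github.com \
        --trusted-host objects.githubusercontent.com \
        https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1.tar.gz
```

After building, you can sanity-check inside the image with `python -c "import en_core_web_md, presidio_analyzer"` before shipping it to the airgapped environment.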