Issue: Capital One The DataProfiler doesn't collect labels #25

Closed
RamanDamayeu opened this issue Apr 4, 2024 · 1 comment

@RamanDamayeu

Running odd-collector-profiler v0.2.1 against PostgreSQL currently produces no labels.
Log from profiler_demo:

odd-profiler-1      | 2024-04-04 11:28:10.144 | DEBUG    | odd_collector_profiler.profiler_sdk:__send_request:64 - [postgres_profiler] Start calculating statistics
odd-profiler-1      | 2024-04-04 11:28:10.205 | DEBUG    | odd_collector_profiler.datasource.database.repository:get_tables:33 - schema: public
odd-profiler-1      | 2024-04-04 11:28:10.210 | DEBUG    | odd_collector_profiler.datasource.database.repository:get_tables:35 - schema='public' table='fakes'
odd-profiler-1      | /app/.venv/lib/python3.9/site-packages/dataprofiler/profilers/profile_builder.py:706: RuntimeWarning: 
odd-profiler-1      | 
odd-profiler-1      | !!! WARNING Partial Profiler Failure !!!
odd-profiler-1      | 
odd-profiler-1      | Profiling Type: data_labeler
odd-profiler-1      | Exception: ModuleNotFoundError
odd-profiler-1      | Message: No module named 'tensorflow'
odd-profiler-1      | 
odd-profiler-1      | For labeler errors, try installing the extra ml requirements via:
odd-profiler-1      | 
odd-profiler-1      | $ pip install dataprofiler[ml] --user
odd-profiler-1      | 
odd-profiler-1      | 
odd-profiler-1      |   utils.warn_on_profile("data_labeler", e)
odd-profiler-1      | 2024-04-04 11:28:10.759 | SUCCESS  | odd_collector_profiler.profiler_sdk:__send_request:68 - [postgres_profiler] Metadata ingested
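The warning above says the data_labeler step was skipped because `tensorflow` could not be imported. A minimal sketch for checking this inside the container before running the profiler, using only the standard library (the helper name `missing_modules` is made up for illustration and is not part of odd-collector-profiler or dataprofiler):

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # 'tensorflow' is the module named in the ModuleNotFoundError above.
    missing = missing_modules(["tensorflow"])
    if missing:
        print("data_labeler will likely be skipped; missing:", ", ".join(missing))
    else:
        print("tensorflow is importable")
```

This only confirms whether the module can be located; it does not verify that the labeler model itself loads.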

Updating the dataprofiler dependency in pyproject.toml to the latest version with the "ml" extra:

dataprofiler = {version = "0.10.9", extras = ["ml"]}

This leads to a mismatch between pandas and SQLAlchemy when reading the table:

odd-profiler-1      | 2024-04-04 10:17:40.881 | DEBUG    | odd_collector_profiler.profiler_sdk:__send_request:64 - [postgres_profiler] Start calculating statistics
odd-profiler-1      | 2024-04-04 10:17:40.941 | DEBUG    | odd_collector_profiler.datasource.database.repository:get_tables:33 - schema: public
odd-profiler-1      | 2024-04-04 10:17:40.947 | DEBUG    | odd_collector_profiler.datasource.database.repository:get_tables:35 - schema='public' table='fakes'
odd-profiler-1      | /app/odd_collector_profiler/data_frame_reader.py:36: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
odd-profiler-1      |   return pd.read_sql_table(
odd-profiler-1      | 2024-04-04 10:17:40.955 | DEBUG    | odd_collector_profiler.data_frame_reader:read_table:42 - Traceback (most recent call last):
odd-profiler-1      |   File "/app/odd_collector_profiler/data_frame_reader.py", line 36, in read_table
odd-profiler-1      |     return pd.read_sql_table(
odd-profiler-1      |   File "/app/.venv/lib/python3.9/site-packages/pandas/io/sql.py", line 385, in read_sql_table
odd-profiler-1      |     if not pandas_sql.has_table(table_name):
odd-profiler-1      |   File "/app/.venv/lib/python3.9/site-packages/pandas/io/sql.py", line 2863, in has_table
odd-profiler-1      |     return len(self.execute(query, [name]).fetchall()) > 0
odd-profiler-1      |   File "/app/.venv/lib/python3.9/site-packages/pandas/io/sql.py", line 2670, in execute
odd-profiler-1      |     cur = self.con.cursor()
odd-profiler-1      | AttributeError: 'Connection' object has no attribute 'cursor'
odd-profiler-1      | 
odd-profiler-1      | 2024-04-04 10:17:40.955 | ERROR    | odd_collector_profiler.data_frame_reader:read_table:43 - Getting data frame, 'Connection' object has no attribute 'cursor'
odd-profiler-1      | 2024-04-04 10:17:41.282921: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
odd-profiler-1      | 2024-04-04 10:17:41.284993: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
odd-profiler-1      | 2024-04-04 10:17:41.320439: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
odd-profiler-1      | 2024-04-04 10:17:41.320768: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
odd-profiler-1      | To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
odd-profiler-1      | 2024-04-04 10:17:43.543678: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
odd-profiler-1      | 2024-04-04 10:17:46.325626: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype string and shape [?,?]
odd-profiler-1      |    [[{{node inputs}}]]
odd-profiler-1      | 2024-04-04 10:17:46.328369: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder' with dtype string and shape [?,?]
odd-profiler-1      |    [[{{node Placeholder}}]]
odd-profiler-1      | 2024-04-04 10:17:46.448 | SUCCESS  | odd_collector_profiler.profiler_sdk:__send_request:68 - [postgres_profiler] Metadata ingested

Suggested fix: update pandas and SQLAlchemy to their latest versions.
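In the traceback above, pandas does not treat the connection as a SQLAlchemy connectable and falls back to its DBAPI2 code path, which calls `.cursor()`. To see which versions actually end up installed in the image, a small sketch like the following could be run in the container. It uses only `importlib.metadata` from the standard library; the helper name `installed_versions` is an assumption for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Distribution is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names taken from the packages discussed in this issue.
    for name, ver in installed_versions(["pandas", "SQLAlchemy", "dataprofiler"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing the printed versions against the SQLAlchemy requirement of the installed pandas release should show whether the pair is compatible.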

@ValeriyWorld
Contributor

Closed with #26
