RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected #1461

raajpackt · 2021-08-27T13:36:13Z

Requesting a databuilder example dag for Redshift.
The RedshiftMetadataExtractor is present in the list of extracters.
An example dag for sample metadata for redshift would be helpful. The postgres example databuilder dag does not seem to bring in views. Using RedshiftMetadataExtractor in place of PostgresMetadataExtractor in the sample postgres dag does not solve this problem. They still do not load the views although running fine.

The following is the trace of a successful dag running in airflow which still does not load views in amundsen somehow.

        SELECT
            *
        FROM (
            SELECT
              CURRENT_DATABASE() as cluster,
              c.table_schema as schema,
              c.table_name as name,
              pgtd.description as description,
              c.column_name as col_name,
              c.data_type as col_type,
              pgcd.description as col_description,
              ordinal_position as col_sort_order
            FROM INFORMATION_SCHEMA.COLUMNS c
            INNER JOIN
              pg_catalog.pg_statio_all_tables as st on c.table_schema=st.schemaname and c.table_name=st.relname
            LEFT JOIN
              pg_catalog.pg_description pgcd on pgcd.objoid=st.relid and pgcd.objsubid=c.ordinal_position
            LEFT JOIN
              pg_catalog.pg_description pgtd on pgtd.objoid=st.relid and pgtd.objsubid=0

            UNION

            SELECT
              CURRENT_DATABASE() as cluster,
              view_schema as schema,
              view_name as name,
              NULL as description,
              column_name as col_name,
              data_type as col_type,
              NULL as col_description,
              ordinal_position as col_sort_order
            FROM
                PG_GET_LATE_BINDING_VIEW_COLS()
                    COLS(view_schema NAME, view_name NAME, column_name NAME, data_type VARCHAR, ordinal_position INT)

            UNION

            SELECT
              CURRENT_DATABASE() AS cluster,
              schemaname AS schema,
              tablename AS name,
              NULL AS description,
              columnname AS col_name,
              external_type AS col_type,
              NULL AS col_description,
              columnnum AS col_sort_order
            FROM svv_external_columns
        )

        
        ORDER by cluster, schema, name, col_sort_order ;
        
[2021-08-27 13:53:50,582] {task.py:53} INFO - Running a task
[2021-08-27 13:53:50,584] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 0)
[2021-08-27 13:53:50,587] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Column', 1)
[2021-08-27 13:53:50,587] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Database', 2)
[2021-08-27 13:53:50,588] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Cluster', 2)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Schema', 2)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Schema', 'Table', 'TABLE', 3)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 'Column', 'COLUMN', 3)
[2021-08-27 13:53:50,590] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Database', 'Cluster', 'CLUSTER', 3)
[2021-08-27 13:53:50,590] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Cluster', 'Schema', 'SCHEMA', 3)
[2021-08-27 13:53:51,338] {task.py:72} INFO - Extracted 500 records so far
[2021-08-27 13:53:51,707] {task.py:72} INFO - Extracted 1000 records so far
[2021-08-27 13:53:51,971] {task.py:72} INFO - Extracted 1500 records so far
[2021-08-27 13:53:52,456] {task.py:72} INFO - Extracted 2000 records so far
[2021-08-27 13:53:52,689] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Description', 4)
[2021-08-27 13:53:52,689] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 'Description', 'DESCRIPTION', 3)
[2021-08-27 13:53:52,745] {task.py:72} INFO - Extracted 2500 records so far
[2021-08-27 13:53:52,917] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Column', 'Description', 'DESCRIPTION', 3)
[2021-08-27 13:53:52,981] {task.py:72} INFO - Extracted 3000 records so far
[2021-08-27 13:53:53,070] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Column_Description_DESCRIPTION.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Table_Description_DESCRIPTION.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Description_4.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Cluster_Schema_SCHEMA.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Database_Cluster_CLUSTER.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Table_Column_COLUMN.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Schema_Table_TABLE.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Schema_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Cluster_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Database_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Column_1.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Table_0.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:159} INFO - Publishing Node csv files ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv'], and Relation CSV files ['/var/tmp/amundsen/table_metadata/relationships/Table_Column_COLUMN.csv', '/var/tmp/amundsen/table_metadata/relationships/Column_Description_DESCRIPTION.csv', '/var/tmp/amundsen/table_metadata/relationships/Database_Cluster_CLUSTER.csv', '/var/tmp/amundsen/table_metadata/relationships/Cluster_Schema_SCHEMA.csv', '/var/tmp/amundsen/table_metadata/relationships/Table_Description_DESCRIPTION.csv', '/var/tmp/amundsen/table_metadata/relationships/Schema_Table_TABLE.csv']
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:182} INFO - Creating indices using Node files: ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv']
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,137] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Schema if not exist: 
            CREATE CONSTRAINT ON (node:Schema) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,196] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,196] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,221] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Table if not exist: 
            CREATE CONSTRAINT ON (node:Table) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,223] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,223] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,226] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Cluster if not exist: 
            CREATE CONSTRAINT ON (node:Cluster) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,227] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,227] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,767] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Column if not exist: 
            CREATE CONSTRAINT ON (node:Column) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,787] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,787] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,791] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Database if not exist: 
            CREATE CONSTRAINT ON (node:Database) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,792] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,792] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,795] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Description if not exist: 
            CREATE CONSTRAINT ON (node:Description) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,796] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,796] {neo4j_csv_publisher.py:186} INFO - Publishing Node files: ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv']

EDIT : Is this because of late binding views only being added and not normal views?

The text was updated successfully, but these errors were encountered:

delwaterman · 2021-08-31T14:59:15Z

I tried the sample_postgres_loader.py on postgres database and its not working at all. Running the services with docker-compose I can't even see logs of NEO4j receiving updates

delwaterman · 2021-08-31T16:13:14Z

@raajpackt I think I found the bug with the script. See #1466

raajpackt · 2021-09-01T07:43:27Z

@delwaterman does it now handle the views as well after adding the config?

I tried to use the RedshiftMetadataExtractor in this script but it still would not load views.
Added the config line you added in your commit but still does not fetch views.
Could we have a working redshift sample dag example which would be helpful really?

https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/dags/postgres_sample_dag.py

delwaterman · 2021-09-01T13:32:21Z

As I read the example it's only fetching tables from Redshift based off the query in the extractor: https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/redshift_metadata_extractor.py

You might have to add/create your own custom extractor to pull out views.

The fix I did was to make sure that elascticsearch was properly updated.

raajpackt · 2021-09-01T13:50:24Z

@delwaterman Yes, figured that we might need an extractor for that. I could not figure out how to build one hence posted this as a feature request to have views in. Surprisingly there is a redshift extractor that supports late binding views and still does not support normal views which was a bit non intuitive (mentioned in the above issue).

stale · 2021-09-15T15:29:21Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale · 2021-10-06T16:33:05Z

This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new pull request or reopen this one.

raajpackt changed the title ~~Feature Proposal~~ RedshiftMetadataExtractor example sample DAG Aug 27, 2021

raajpackt changed the title ~~RedshiftMetadataExtractor example sample DAG~~ RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected Aug 27, 2021

stale bot added the stale label Sep 15, 2021

stale bot closed this as completed Oct 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected #1461

RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected #1461

raajpackt commented Aug 27, 2021 •

edited

Loading

delwaterman commented Aug 31, 2021

delwaterman commented Aug 31, 2021

raajpackt commented Sep 1, 2021 •

edited

Loading

delwaterman commented Sep 1, 2021

raajpackt commented Sep 1, 2021

stale bot commented Sep 15, 2021

stale bot commented Oct 6, 2021

RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected #1461

RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected #1461

Comments

raajpackt commented Aug 27, 2021 • edited Loading

delwaterman commented Aug 31, 2021

delwaterman commented Aug 31, 2021

raajpackt commented Sep 1, 2021 • edited Loading

delwaterman commented Sep 1, 2021

raajpackt commented Sep 1, 2021

stale bot commented Sep 15, 2021

stale bot commented Oct 6, 2021

raajpackt commented Aug 27, 2021 •

edited

Loading

raajpackt commented Sep 1, 2021 •

edited

Loading