Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected #1461

Closed
raajpackt opened this issue Aug 27, 2021 · 7 comments

Comments

@raajpackt
Copy link

raajpackt commented Aug 27, 2021

Requesting a databuilder example dag for Redshift.
The RedshiftMetadataExtractor is present in the list of extracters.
An example dag for sample metadata for redshift would be helpful. The postgres example databuilder dag does not seem to bring in views. Using RedshiftMetadataExtractor in place of PostgresMetadataExtractor in the sample postgres dag does not solve this problem. They still do not load the views although running fine.

The following is the trace of a successful dag running in airflow which still does not load views in amundsen somehow.

        SELECT
            *
        FROM (
            SELECT
              CURRENT_DATABASE() as cluster,
              c.table_schema as schema,
              c.table_name as name,
              pgtd.description as description,
              c.column_name as col_name,
              c.data_type as col_type,
              pgcd.description as col_description,
              ordinal_position as col_sort_order
            FROM INFORMATION_SCHEMA.COLUMNS c
            INNER JOIN
              pg_catalog.pg_statio_all_tables as st on c.table_schema=st.schemaname and c.table_name=st.relname
            LEFT JOIN
              pg_catalog.pg_description pgcd on pgcd.objoid=st.relid and pgcd.objsubid=c.ordinal_position
            LEFT JOIN
              pg_catalog.pg_description pgtd on pgtd.objoid=st.relid and pgtd.objsubid=0

            UNION

            SELECT
              CURRENT_DATABASE() as cluster,
              view_schema as schema,
              view_name as name,
              NULL as description,
              column_name as col_name,
              data_type as col_type,
              NULL as col_description,
              ordinal_position as col_sort_order
            FROM
                PG_GET_LATE_BINDING_VIEW_COLS()
                    COLS(view_schema NAME, view_name NAME, column_name NAME, data_type VARCHAR, ordinal_position INT)

            UNION

            SELECT
              CURRENT_DATABASE() AS cluster,
              schemaname AS schema,
              tablename AS name,
              NULL AS description,
              columnname AS col_name,
              external_type AS col_type,
              NULL AS col_description,
              columnnum AS col_sort_order
            FROM svv_external_columns
        )

        
        ORDER by cluster, schema, name, col_sort_order ;
        
[2021-08-27 13:53:50,582] {task.py:53} INFO - Running a task
[2021-08-27 13:53:50,584] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 0)
[2021-08-27 13:53:50,587] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Column', 1)
[2021-08-27 13:53:50,587] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Database', 2)
[2021-08-27 13:53:50,588] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Cluster', 2)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Schema', 2)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Schema', 'Table', 'TABLE', 3)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 'Column', 'COLUMN', 3)
[2021-08-27 13:53:50,590] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Database', 'Cluster', 'CLUSTER', 3)
[2021-08-27 13:53:50,590] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Cluster', 'Schema', 'SCHEMA', 3)
[2021-08-27 13:53:51,338] {task.py:72} INFO - Extracted 500 records so far
[2021-08-27 13:53:51,707] {task.py:72} INFO - Extracted 1000 records so far
[2021-08-27 13:53:51,971] {task.py:72} INFO - Extracted 1500 records so far
[2021-08-27 13:53:52,456] {task.py:72} INFO - Extracted 2000 records so far
[2021-08-27 13:53:52,689] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Description', 4)
[2021-08-27 13:53:52,689] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 'Description', 'DESCRIPTION', 3)
[2021-08-27 13:53:52,745] {task.py:72} INFO - Extracted 2500 records so far
[2021-08-27 13:53:52,917] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Column', 'Description', 'DESCRIPTION', 3)
[2021-08-27 13:53:52,981] {task.py:72} INFO - Extracted 3000 records so far
[2021-08-27 13:53:53,070] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Column_Description_DESCRIPTION.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Table_Description_DESCRIPTION.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Description_4.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Cluster_Schema_SCHEMA.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Database_Cluster_CLUSTER.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Table_Column_COLUMN.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Schema_Table_TABLE.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Schema_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Cluster_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Database_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Column_1.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Table_0.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:159} INFO - Publishing Node csv files ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv'], and Relation CSV files ['/var/tmp/amundsen/table_metadata/relationships/Table_Column_COLUMN.csv', '/var/tmp/amundsen/table_metadata/relationships/Column_Description_DESCRIPTION.csv', '/var/tmp/amundsen/table_metadata/relationships/Database_Cluster_CLUSTER.csv', '/var/tmp/amundsen/table_metadata/relationships/Cluster_Schema_SCHEMA.csv', '/var/tmp/amundsen/table_metadata/relationships/Table_Description_DESCRIPTION.csv', '/var/tmp/amundsen/table_metadata/relationships/Schema_Table_TABLE.csv']
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:182} INFO - Creating indices using Node files: ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv']
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,137] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Schema if not exist: 
            CREATE CONSTRAINT ON (node:Schema) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,196] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,196] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,221] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Table if not exist: 
            CREATE CONSTRAINT ON (node:Table) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,223] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,223] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,226] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Cluster if not exist: 
            CREATE CONSTRAINT ON (node:Cluster) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,227] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,227] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,767] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Column if not exist: 
            CREATE CONSTRAINT ON (node:Column) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,787] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,787] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,791] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Database if not exist: 
            CREATE CONSTRAINT ON (node:Database) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,792] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,792] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,795] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Description if not exist: 
            CREATE CONSTRAINT ON (node:Description) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,796] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,796] {neo4j_csv_publisher.py:186} INFO - Publishing Node files: ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv']

EDIT : Is this because of late binding views only being added and not normal views?

@raajpackt raajpackt changed the title Feature Proposal RedshiftMetadataExtractor example sample DAG Aug 27, 2021
@raajpackt raajpackt changed the title RedshiftMetadataExtractor example sample DAG RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected Aug 27, 2021
@delwaterman
Copy link
Contributor

I tried the sample_postgres_loader.py on postgres database and its not working at all. Running the services with docker-compose I can't even see logs of NEO4j receiving updates

@delwaterman
Copy link
Contributor

@raajpackt I think I found the bug with the script. See #1466

@raajpackt
Copy link
Author

raajpackt commented Sep 1, 2021

@delwaterman does it now handle the views as well after adding the config?

I tried to use the RedshiftMetadataExtractor in this script but it still would not load views.
Added the config line you added in your commit but still does not fetch views.
Could we have a working redshift sample dag example which would be helpful really?

https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/dags/postgres_sample_dag.py

@delwaterman
Copy link
Contributor

As I read the example it's only fetching tables from Redshift based off the query in the extractor: https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/redshift_metadata_extractor.py

You might have to add/create your own custom extractor to pull out views.

The fix I did was to make sure that elascticsearch was properly updated.

@raajpackt
Copy link
Author

@delwaterman Yes, figured that we might need an extractor for that. I could not figure out how to build one hence posted this as a feature request to have views in. Surprisingly there is a redshift extractor that supports late binding views and still does not support normal views which was a bit non intuitive (mentioned in the above issue).

@stale
Copy link

stale bot commented Sep 15, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Sep 15, 2021
@stale
Copy link

stale bot commented Oct 6, 2021

This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new pull request or reopen this one.

@stale stale bot closed this as completed Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants