This project has a set of executable scripts that:
- List the tables in BigQuery;
- Download the query code of views and scheduled queries;
- Create a flowchart (using pygraphviz) of the BigQuery table relations.
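As a rough illustration of how table relations could be derived from the downloaded queries, the sketch below scans each query for table references and emits Graphviz DOT text. It is not the project's actual code: the function names, the regular expression, and the assumption that tables are written as backtick-quoted `project.dataset.table` names are all hypothetical.

```python
import re

# Assumption: table references appear as backtick-quoted
# `project.dataset.table` names inside the SQL text.
TABLE_REF = re.compile(r"`([\w-]+\.\w+\.[\w$]+)`")


def find_dependencies(sql):
    """Return the sorted, de-duplicated table names referenced in a query."""
    return sorted(set(TABLE_REF.findall(sql)))


def to_dot(queries):
    """Build Graphviz DOT text with one edge per (source table -> target) pair.

    `queries` maps a view (or destination table) name to its SQL text.
    """
    lines = ["digraph tables {"]
    for target, sql in sorted(queries.items()):
        for source in find_dependencies(sql):
            if source != target:  # skip self-references
                lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Made-up example data, for illustration only.
example = {
    "proj.ds.daily_view": (
        "SELECT * FROM `proj.ds.raw_events` JOIN `proj.ds.users` USING (id)"
    ),
}
print(to_dot(example))
```

The resulting DOT text can be handed to Graphviz for layout and rendering; a real SQL parser would be more robust than a regular expression.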
NOTE: Besides Python, this script requires the shell command gcloud to get an access token for BigQuery.
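One way to obtain such a token from Python is to call `gcloud auth print-access-token` in a subprocess. The sketch below assumes gcloud is installed and already authenticated; the helper names `get_access_token` and `auth_header` are illustrative and may differ from what the scripts actually do.

```python
import subprocess


def get_access_token(command=("gcloud", "auth", "print-access-token")):
    """Run a command that prints an access token and return it stripped.

    The command is a parameter so it can be swapped out, e.g. in tests.
    Raises subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        list(command), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def auth_header(token):
    """Build the HTTP header that carries an OAuth 2.0 bearer token."""
    return {"Authorization": f"Bearer {token}"}


# Typical use (requires a working gcloud installation):
#   headers = auth_header(get_access_token())
```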
This script can print a list of the BigQuery tables to the screen, save that list to a file, and download the query code of views.
This script downloads the code of scheduled queries that are not disabled.
The script can be executed with an input parameter (a JSON config file) or without any parameters (in which case it uses a hard-coded config).
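The choice between the two modes can be pictured as below. This is only a sketch: `load_config` and `DEFAULT_CONFIG` are hypothetical names, and the default values shown are limited to what this README states about the hard-coded config.

```python
import json

# Hypothetical stand-in for the hard-coded config: do not re-download
# anything, just build the flowchart from previously saved queries.
DEFAULT_CONFIG = {
    "get_views": False,
    "get_scheduled": False,
    "flowchart": True,
    "flowchart_file": "this_file.pdf",
}


def load_config(argv):
    """Return the config from the JSON file in argv[1], else the default."""
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8") as handle:
            return json.load(handle)
    return dict(DEFAULT_CONFIG)  # copy, so callers cannot mutate the default


# With no extra argument the default is used; in a script one would
# typically pass sys.argv instead of a literal list.
config = load_config(["script.py"])
print(config["flowchart_file"])
```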
The hard-coded config reads the previously downloaded views and scheduled queries and creates a flowchart in 'this_file.pdf'. The configuration passed to the script can require views and scheduled queries to be updated (i.e. downloaded) first. In this case, old queries that are no longer present in BigQuery are deleted.
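The update-and-delete behaviour can be sketched as follows. The function `sync_queries` is a hypothetical helper, not taken from the scripts; it assumes each query is stored as `<name>.sql` and that query names are valid file names.

```python
from pathlib import Path


def sync_queries(folder, queries):
    """Mirror `queries` (name -> SQL text) into `folder` as .sql files.

    Saved .sql files whose name is no longer in `queries` are deleted.
    Returns the sorted list of deleted file names.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    wanted = {f"{name}.sql" for name in queries}

    removed = []
    # Materialize the listing before deleting anything.
    for path in sorted(folder.glob("*.sql")):
        if path.name not in wanted:
            path.unlink()
            removed.append(path.name)

    for name, sql in queries.items():
        (folder / f"{name}.sql").write_text(sql, encoding="utf-8")
    return removed
```

Only files ending in `.sql` are touched, so other files kept in the same folder survive an update.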
The configuration file is a JSON file with the following keywords:
- credentials (str): path to JSON file containing Google Cloud credentials;
- printout (bool): whether or not to print on screen information about the tables;
- table_list_file (str): path to file where to save the table information above (can be set to null to avoid saving this to a file);
- get_views (bool): whether or not to update the local copy of the views (and delete old saved views);
- views_path (str): path to a folder where to save view queries as .sql files;
- get_scheduled (bool): whether or not to update the local copy of the scheduled queries (and delete old saved ones);
- scheduled_path (str): path to a folder where to save scheduled queries as .sql files;
- flowchart (bool): whether or not to produce a flowchart of the tables and views dependencies;
- flowchart_file (str): path to a PDF file where to save the flowchart.
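For illustration, the snippet below builds a config with these keywords, checks it against the types listed above, and prints it as JSON. All paths are placeholders, and `validate_config` is a hypothetical helper rather than part of the scripts.

```python
import json

# Expected type(s) for each keyword, as documented in the list above.
EXPECTED_TYPES = {
    "credentials": (str,),
    "printout": (bool,),
    "table_list_file": (str, type(None)),  # null disables saving the list
    "get_views": (bool,),
    "views_path": (str,),
    "get_scheduled": (bool,),
    "scheduled_path": (str,),
    "flowchart": (bool,),
    "flowchart_file": (str,),
}


def validate_config(config):
    """Return a list of problems; an empty list means the config looks fine."""
    problems = []
    for key, types in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], types):
            problems.append(f"wrong type for key: {key}")
    return problems


# Placeholder values only; adjust every path to your own setup.
example_config = {
    "credentials": "path/to/credentials.json",
    "printout": True,
    "table_list_file": None,
    "get_views": True,
    "views_path": "path/to/views",
    "get_scheduled": True,
    "scheduled_path": "path/to/scheduled",
    "flowchart": True,
    "flowchart_file": "path/to/flowchart.pdf",
}

print(json.dumps(example_config, indent=2))
print(validate_config(example_config))
```

Saving the printed JSON to a file gives a config that asks for the views and scheduled queries to be refreshed before the flowchart is drawn.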
Henrique S. Xavier @hsxavier