feat: Python SDK example: write query result to BigQuery using Cloud Functions #908

Merged (6 commits) on Dec 7, 2021
examples/python/README.md (3 changes: 2 additions & 1 deletion)
## Full Applications

- [Flask full app demo](lookersdk-flask)
- [Google Cloud Function & Google Sheets: Create new users by reading email addresses from a Google Sheet](cloud-function-user-provision)
- [Google Cloud Function & BigQuery: Run a query in Looker and write the result to a BigQuery table](cloud-function-write-to-bigquery)

## Connection: Manage Database Connections


examples/python/cloud-function-write-to-bigquery/README.md (13 changes: 13 additions & 0 deletions)
# Run a Looker query and write the result to a BigQuery table using a Cloud Function

This repository contains a [Google Cloud Function](https://cloud.google.com/functions) that uses the Looker Python SDK and the BigQuery Python client to fetch the result of a Looker query and load it into a BigQuery table.

A potential use case is to pull data from Looker's System Activity and write it to BigQuery (currently, Looker's System Activity stores at most 100k rows, or 90 days, of historical query and event data). The resulting BigQuery tables can then be registered as a connection in Looker for additional LookML data modeling. For more flexibility on System Activity, consider using [Elite System Activity](https://docs.looker.com/admin-options/system-activity/elite-system-activity).

Cloud Functions are easy to set up and well suited to lightweight, on-the-fly tasks. For heavy ETL/ELT workloads, consider using Looker's native actions (such as sending results to Google Cloud Storage) or dedicated ETL/ELT tools (such as Dataflow on Google Cloud).

## Demo

<p align="center">
 <img src="https://storage.googleapis.com/tutorials-img/Cloud%20Function%20Write%20to%20BQ%20from%20Looker.gif" alt="Setting environment variables in the Cloud Functions UI">
</p>
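A deployment could look something like the following `gcloud` invocation, run from this example's directory. This is a sketch only: the function name, runtime, and credential values are placeholders, and the `LOOKERSDK_*` environment variables are the ones `looker_sdk.init40()` reads its configuration from.

```shell
# Sketch only: substitute your own function name, project, and Looker credentials.
gcloud functions deploy looker-to-bigquery \
  --runtime python39 \
  --entry-point main \
  --trigger-http \
  --set-env-vars "LOOKERSDK_BASE_URL=https://yourcompany.looker.com:19999,LOOKERSDK_CLIENT_ID=YOUR_CLIENT_ID,LOOKERSDK_CLIENT_SECRET=YOUR_CLIENT_SECRET"
```

The service account the function runs as also needs BigQuery write permissions on the target dataset.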
examples/python/cloud-function-write-to-bigquery/main.py (68 changes: 68 additions & 0 deletions)
"""This Cloud Function accomplishes the following tasks:
1. Get data from a Looker query in CSV format
2. Transform columns' names by replacing a white space with an underscore
("User Name" to "User_Name") since BigQuery does not accept a white space inside columns' names
3. Write the modified column name and data to a CSV file stored in Cloud Functions' temporary disk
4. Load the CSV file to a BigQuery table

Last modified: November 2021
"""

from google.cloud import bigquery
import looker_sdk
client = bigquery.Client()
sdk = looker_sdk.init40()

def main(request):
get_data_from_looker()
write_to_file()
load_to_bq()
return("Successfully loaded data from Looker to BigQuery")

def get_data_from_looker(query_id=1):
query = sdk.run_query(
query_id=query_id,
result_format="csv",
limit= 5000
)
print("Successfully retrieved data from Looker")
return query

def write_to_file():
data = get_data_from_looker()
# Transform the columns' name (i.e: "User ID" to become "User_ID") because
# BigQuery does not accept a white space inside columns' name
cnt = 0 # cnt is to find the index of the character after the last character of columns'names
for i in data:
if i == "\n":
break
else:
cnt += 1
header = data[:cnt]
header_to_write = header.replace(" ", "_")
data_to_write = data[cnt:]
# Write header and data to temporary disk
with open('/tmp/table.csv', "w") as csv: # Files can only be modified/written inside tmp/
csv.write(header_to_write)
csv.write(data_to_write)
print("Successfully wrote data to a CSV file stored in temporary disk")

def load_to_bq():
# Set up the table inside BQ in advance: The names and types of columns in BQ must match the
# names and types of the query result from Looker (for example: User_ID, type: Integer).
# Optionally, write additional logic to make an empty table with matching columns' names
# Example: https://github.com/googleapis/python-bigquery/blob/main/samples/create_table.py
table_id = "myproject.myschema.mytable"
job_config = bigquery.LoadJobConfig(
source_format=bigquery.SourceFormat.CSV, skip_leading_rows=1, autodetect=True,
)
with open("/tmp/table.csv", "rb") as source_file:
job = client.load_table_from_file(source_file, table_id, job_config=job_config)
job.result() # Wait for the job to complete.
table = client.get_table(table_id) # Make an API request.
print(
"Loaded {} rows and {} columns to {}".format(
table.num_rows, len(table.schema), table_id
)
)
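As a standalone illustration of the header transform performed in `write_to_file`, the logic can be sketched with `str.partition`. `rename_csv_header` is a hypothetical helper name, not part of the deployed function:

```python
def rename_csv_header(csv_text: str) -> str:
    """Replace spaces with underscores in the header row only,
    leaving the data rows untouched."""
    header, sep, body = csv_text.partition("\n")
    return header.replace(" ", "_") + sep + body

print(rename_csv_header("User ID,User Name\n1,Alice\n"))
```

Because only the text before the first newline is rewritten, spaces inside data values (such as a user's full name) are preserved.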

examples/python/cloud-function-write-to-bigquery/requirements.txt (7 additions & 0 deletions)
# Function dependencies, for example:
# package>=version
looker_sdk
google-api-python-client==1.7.9
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.4.0
google-cloud-bigquery
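
Before deploying, the function can be exercised locally with the Functions Framework for Python. This is a sketch, assuming the code above lives in `main.py` in the current directory and the `LOOKERSDK_*` environment variables are set:

```shell
# Install dependencies plus the local test server.
pip install -r requirements.txt functions-framework

# Serve the function over HTTP on port 8080, then invoke it.
functions-framework --target main --port 8080 &
curl http://localhost:8080
```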