Source Google Sheets: exposes row batch size config (#15107)
* exposes row batch size config to the connector

* review changes

* bump connector version

* auto-bump connector version [ci skip]

Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
3 people authored Aug 3, 2022
1 parent 392c4f6 commit 2a0c57b
Showing 6 changed files with 19 additions and 5 deletions.

@@ -367,7 +367,7 @@
- name: Google Sheets
sourceDefinitionId: 71607ba1-c0ac-4799-8049-7f4b90dd50f7
dockerRepository: airbyte/source-google-sheets
-dockerImageTag: 0.2.16
+dockerImageTag: 0.2.17
documentationUrl: https://docs.airbyte.io/integrations/sources/google-sheets
icon: google-sheets.svg
sourceType: file

@@ -3365,7 +3365,7 @@
oauthFlowOutputParameters:
- - "access_token"
- - "refresh_token"
-- dockerImage: "airbyte/source-google-sheets:0.2.16"
+- dockerImage: "airbyte/source-google-sheets:0.2.17"
spec:
documentationUrl: "https://docs.airbyte.io/integrations/sources/google-sheets"
connectionSpecification:
@@ -3383,6 +3383,12 @@
description: "Enter the link to the Google spreadsheet you want to sync"
examples:
- "https://docs.google.com/spreadsheets/d/1hLd9Qqti3UyLXZB2aFfUWDT7BG-arw2xy4HR3D-dwUb/edit"
+row_batch_size:
+type: "integer"
+title: "Row Batch Size"
+description: "Number of rows fetched when making a Google Sheet API call.\
+\ Defaults to 200."
+default: 200
credentials:
type: "object"
title: "Authentication"

@@ -34,5 +34,5 @@ COPY google_sheets_source ./google_sheets_source
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

-LABEL io.airbyte.version=0.2.16
+LABEL io.airbyte.version=0.2.17
LABEL io.airbyte.name=airbyte/source-google-sheets

@@ -132,6 +132,7 @@ def read(
sheet_to_column_name = Helpers.parse_sheet_and_column_names_from_catalog(catalog)
spreadsheet_id = Helpers.get_spreadsheet_id(config["spreadsheet_id"])

+row_batch_size = config.get("row_batch_size", ROW_BATCH_SIZE)
logger.info(f"Starting syncing spreadsheet {spreadsheet_id}")
# For each sheet in the spreadsheet, get a batch of rows, and as long as there hasn't been
# a blank row, emit the row batch
@@ -146,13 +147,13 @@ def read(
# if the last row of the interval goes outside the sheet - this is normal, we will return
# only the real data of the sheet and in the next iteration we will loop out.
while row_cursor <= sheet_row_counts[sheet]:
-range = f"{sheet}!{row_cursor}:{row_cursor + ROW_BATCH_SIZE}"
+range = f"{sheet}!{row_cursor}:{row_cursor + row_batch_size}"
logger.info(f"Fetching range {range}")
row_batch = SpreadsheetValues.parse_obj(
client.get_values(spreadsheetId=spreadsheet_id, ranges=range, majorDimension="ROWS")
)

-row_cursor += ROW_BATCH_SIZE + 1
+row_cursor += row_batch_size + 1
# there should always be one range since we requested only one
value_ranges = row_batch.valueRanges[0]
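
To make the effect of `row_batch_size` concrete, here is a minimal sketch of the range arithmetic used in the loop above. The helper name `batch_ranges` and its `first_row` parameter are hypothetical, and the value `200` for `ROW_BATCH_SIZE` is assumed from the `default: 200` in the spec change; the row at which the connector actually starts its cursor is not visible in this hunk.

```python
from typing import Iterator

# Assumed fallback value; the spec added in this commit documents a default of 200.
ROW_BATCH_SIZE = 200


def batch_ranges(sheet: str, row_count: int, config: dict, first_row: int = 1) -> Iterator[str]:
    """Yield the A1 row ranges that the read loop would request for one sheet.

    `first_row` is a hypothetical parameter for illustration only.
    """
    # Same lookup as in the diff: use the configured value, else the constant.
    row_batch_size = config.get("row_batch_size", ROW_BATCH_SIZE)
    row_cursor = first_row
    while row_cursor <= row_count:
        # Both ends of an A1 row range are inclusive, so this spans
        # row_batch_size + 1 rows.
        yield f"{sheet}!{row_cursor}:{row_cursor + row_batch_size}"
        # Advance past the last row just requested.
        row_cursor += row_batch_size + 1


ranges = list(batch_ranges("Sheet1", row_count=1000, config={"row_batch_size": 400}))
print(ranges)
```

Because both ends of the range are inclusive, each request covers `row_batch_size + 1` rows, which is why the cursor advances by `row_batch_size + 1`. The last range may extend past the end of the sheet, as the comment in the diff notes.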

@@ -15,6 +15,11 @@ connectionSpecification:
Enter the link to the Google spreadsheet you want to sync
examples:
- https://docs.google.com/spreadsheets/d/1hLd9Qqti3UyLXZB2aFfUWDT7BG-arw2xy4HR3D-dwUb/edit
+row_batch_size:
+type: integer
+title: Row Batch Size
+description: Number of rows fetched when making a Google Sheet API call. Defaults to 200.
+default: 200
credentials:
type: object
title: Authentication
2 changes: 2 additions & 0 deletions docs/integrations/sources/google-sheets.md
@@ -20,6 +20,7 @@ To set up Google Sheets as a source in Airbyte Cloud:
- **(Recommended)** To authenticate your Google account via OAuth, click **Sign in with Google** and complete the authentication workflow.
- To authenticate your Google account via Service Account Key Authentication, enter your [Google Cloud service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys) in JSON format. Make sure the Service Account has the Project Viewer permission. If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, [give your Service account access to your spreadsheet](https://youtu.be/GyomEw5a2NQ).
6. For Spreadsheet Link, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click **Share** in the top right corner, and click **Copy Link**.
+7. For Row Batch Size, define the number of records you want the Google API to fetch at a time. The default value is 200.
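
As a rough guide to choosing a value, the sketch below estimates how many API requests a sync would make for one sheet at a given batch size. The function `estimated_requests` is hypothetical; it assumes paging starts at the first row and that each request covers `row_batch_size + 1` rows, as in the read loop changed by this commit. Treat the result as an approximation.

```python
import math


def estimated_requests(row_count: int, row_batch_size: int = 200) -> int:
    """Approximate number of value requests needed to page through one sheet.

    Hypothetical helper: assumes each request spans row_batch_size + 1 rows.
    """
    if row_batch_size < 0:
        raise ValueError("row_batch_size must not be negative")
    return math.ceil(row_count / (row_batch_size + 1))


# Compare a few batch sizes for a sheet with 100,000 rows.
for size in (200, 1000, 5000):
    print(size, estimated_requests(100_000, size))
```

A larger batch size means fewer requests per sync, at the cost of larger responses per request.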

### For Airbyte OSS

@@ -70,6 +71,7 @@ The [Google API rate limit](https://developers.google.com/sheets/api/limits) is

| Version | Date | Pull Request | Subject |
|---------|------------|------------------------------------------------------------|-------------------------------------------------------------------------------|
+| 0.2.17 | 2022-08-03 | [15107](https://github.com/airbytehq/airbyte/pull/15107) | Expose Row Batch Size in Connector Specification |
| 0.2.16 | 2022-07-07 | [13729](https://github.com/airbytehq/airbyte/pull/13729) | Improve configuration field description |
| 0.2.15 | 2022-06-02 | [13446](https://github.com/airbytehq/airbyte/pull/13446) | Retry requests resulting in a server error |
| 0.2.13 | 2022-05-06 | [12685](https://github.com/airbytehq/airbyte/pull/12685) | Update CDK to v0.1.56 to emit an `AirbyteTraceMessage` on uncaught exceptions |