Adds s3_parallel_dataframe_load example #570

elijahbenizzy · 2023-11-28T23:52:53Z

This downloads data from S3 in parallel. It has a few limitations, but overall it is easy to adapt or modify. This is a hub contribution.
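For readers who want a picture of the general pattern, here is a minimal sketch of fetching several objects concurrently and concatenating the results. It is not the code in this PR. The names `load_in_parallel`, `parse_csv`, `fake_fetch`, and `FAKE_BUCKET` are hypothetical, and the in-memory dict stands in for a real bucket so the sketch runs without credentials.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def parse_csv(payload: bytes) -> List[Row]:
    """Parse the bytes of one CSV object into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(payload.decode("utf-8"))))


def load_in_parallel(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> List[Row]:
    """Fetch every key concurrently, then concatenate the rows in key order."""
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch on worker threads and yields results
        # in the same order as key_list.
        payloads = list(pool.map(fetch, key_list))
    rows: List[Row] = []
    for payload in payloads:
        rows.extend(parse_csv(payload))
    return rows


# Hypothetical stand-in for a bucket: object key -> object bytes.
FAKE_BUCKET = {
    "part-0.csv": b"id,value\n1,a\n2,b\n",
    "part-1.csv": b"id,value\n3,c\n",
}


def fake_fetch(key: str) -> bytes:
    """Assumption: a real version would call an S3 client here instead."""
    return FAKE_BUCKET[key]


rows = load_in_parallel(sorted(FAKE_BUCKET), fake_fetch)
```

Against a real bucket, `fetch` would presumably wrap an S3 client call, and the row dicts would be turned into a dataframe. Threads suit this workload because each download spends most of its time waiting on the network.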

[Summary of contribution]

For new dataflows:

Do you have the following?

For existing dataflows -- what has changed?

N/A

How I tested this

Ran locally against a custom S3 bucket.

Notes

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Dataflow documentation has been updated if adding/changing functionality.

sweep-ai · 2023-11-28T23:53:57Z

Apply Sweep Rules to your PR?

Apply: All new business logic should have corresponding unit tests.
Apply: Refactor large functions to be more modular.
Apply: Add docstrings to all functions and file headers.

elijahbenizzy force-pushed the s3-parallel-dataframe-load branch from 2a35977 to 09e4f1b Compare November 28, 2023 23:54

elijahbenizzy temporarily deployed to github-pages November 28, 2023 23:54 — with GitHub Actions Inactive

elijahbenizzy requested review from skrawcz and zilto November 29, 2023 00:21

Adds s3_parallel_dataframe_load example

0d12311

This downloads data from S3 in parallel. It has a few limitations, but overall it is easy to adapt or modify. This is a hub contribution.

elijahbenizzy force-pushed the s3-parallel-dataframe-load branch from 09e4f1b to 0d12311 Compare November 29, 2023 00:22

elijahbenizzy temporarily deployed to github-pages November 29, 2023 00:22 — with GitHub Actions Inactive

skrawcz approved these changes Nov 29, 2023

skrawcz merged commit b207db7 into main Nov 29, 2023
2 checks passed

skrawcz deleted the s3-parallel-dataframe-load branch November 29, 2023 01:40
