Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor python sources #1331

Merged
merged 9 commits into from
Dec 16, 2020
Merged

Refactor python sources #1331

merged 9 commits into from
Dec 16, 2020

Conversation

ChristopheDuong
Copy link
Contributor

@ChristopheDuong ChristopheDuong commented Dec 15, 2020

What

Closes #1321
Closes #1322

How

  • Move ConfigContainer to be specific for Sources based on SingerSource only.
  • Python Sources does not refer to ConfigContainer anymore
  • In addition, configs, catalogs, and state objects are passed to the different methods that need to be re-implemented instead of passing file paths
  • Add typing to help understand what is expected in each method

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images

Recommended reading order

  1. airbyte-integrations/bases/base-python/base_python/integration.py
  2. airbyte-integrations/bases/base-singer/base_singer/source.py
  3. the rest

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Dec 15, 2020

/test connector=all

🕑 all https://github.com/airbytehq/airbyte/actions/runs/423696759
❌ all https://github.com/airbytehq/airbyte/actions/runs/423696759

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Much better.

I don't think a developer should have to know about ConfigContainer at all. While you have removed it from the base python source, it is now exposed in the base singer source. I don't think it is necessary. See some comments below / let me know if I'm missing something.

)

# can be overridden to change an input catalog
def transform_catalog(self, catalog_path: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what transform_catalog and transform_state needed for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are overridden in singer to return path strings instead of actual catalogs and state objects

Copy link
Contributor

@michel-tricot michel-tricot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a lot better now!
% killing ConfigContainer completely

@ChristopheDuong ChristopheDuong force-pushed the chris/move-ConfigContainer branch from 33f2520 to 0b7ebf5 Compare December 16, 2020 10:07
@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Dec 16, 2020

/test connector=all

🕑 all https://github.com/airbytehq/airbyte/actions/runs/425443340
❌ all https://github.com/airbytehq/airbyte/actions/runs/425443340

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Dec 16, 2020

/test connector=all

🕑 all https://github.com/airbytehq/airbyte/actions/runs/425580018
❌ all https://github.com/airbytehq/airbyte/actions/runs/425580018

since some singer source may need to persist extra temp file
from tap_google_analytics import GAClient

CREDENTIALS_FILE = os.path.join(tempfile.gettempdir(), "credentials.json")
Copy link
Contributor Author

@ChristopheDuong ChristopheDuong Dec 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tempfile is never deleted so it might cause problems as more and more sync are run using this source... (plus it is writing some sensitive credentials information)

Instead, a new method on the source abstraction called configure allows source to override and add extra temp file to persist in the Temp_dir context. This is automatically deleted when the source is done running

@ChristopheDuong
Copy link
Contributor Author

ChristopheDuong commented Dec 16, 2020

/test connector=all

🕑 all https://github.com/airbytehq/airbyte/actions/runs/425853242
❌ all https://github.com/airbytehq/airbyte/actions/runs/425853242
🕑 all https://github.com/airbytehq/airbyte/actions/runs/425853242
✅ all https://github.com/airbytehq/airbyte/actions/runs/425853242

@ChristopheDuong ChristopheDuong merged commit e92e9aa into master Dec 16, 2020
@ChristopheDuong ChristopheDuong deleted the chris/move-ConfigContainer branch December 16, 2020 16:52
@ChristopheDuong
Copy link
Contributor Author

Publish Docker images

I will open new PR to bump versions and publish separately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants