Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source connector tutorial #1428

Merged
merged 18 commits into from
Dec 25, 2020
Merged

Conversation

sherifnada
Copy link
Contributor

@sherifnada sherifnada commented Dec 22, 2020

What

  1. Adds a new stock ticker source alongside a step by step guide for creating it.
  2. A potentially controversial decision I made here was to create the source using vanilla python entirely from scratch and with basically no dependencies. This is to minimize the chances of it becoming outdated and to illustrate the core Airbyte concepts to anyone, even non-python programmers, without any "noise" introduced by helper libraries.
  3. This does not yet implement incremental sync. Will add this in a future PR but this was getting pretty large so I want feedback.
  4. I'm pretty concerned at how unmaintainable this is because of the duplication that goes into maintaining a tutorial like this. Just calling it out. I have some ideas to solve but it is not solved in this PR.

Reading Order

  1. docs/tutorials/building-a-new-connector.md <-- the tutorial
  2. source.py <-- the final state of the connector built in the tutorial

@sherifnada sherifnada force-pushed the sherifnada/source-connector-tutorial branch from aa75ba0 to 635bcd6 Compare December 22, 2020 22:55
@sherifnada sherifnada marked this pull request as ready for review December 22, 2020 23:02
@sherifnada sherifnada requested review from michel-tricot, jrhizor and cgardens and removed request for michel-tricot and jrhizor December 22, 2020 23:02
Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we walked through this together. it is good! sherif is still processing some feedback but approving to unblock for when he is ready.

@michel-tricot
Copy link
Contributor

michel-tricot commented Dec 23, 2020

.2 is very controversial, I strongly disagree with it. Especially if the tutorial is "building a new connector". If this becomes the reference, it is counter-productive with what we are thriving to become.

If it was a tutorial about Airbyte protocol then why not. But if it is building a new connector it should use the recommended way instead of starting from scratch.

Regarding the maintainability, should we link to files instead?, to minimize the drift that will happen?

Copy link
Contributor

@michel-tricot michel-tricot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment about it

@sherifnada
Copy link
Contributor Author

sherifnada commented Dec 23, 2020

@michel-tricot would it change your mind if we:

  1. publish our helpers as python modules that can be installed in setup.py just like any other python module instead of the current inheritance via Docker
  2. autogenerate python documentation (equivalent to javadocs) for those + good READMEs that describe the helpers available in those libraries
  3. At the end of this tutorial have a section on how to edit the vanilla source to leverage the language-specific helpers
    ?

The tradeoff of using a vanilla python implementation is that it is the lowest possible barrier to entry across all languages (including Python because you only need to absorb the one file being edited throughout the tutorial) at the cost of not being representative of a "prod" connector in any language.

If we go the other way (fully use python helpers and inherit base docker images), for a newcomer to Airbyte the barrier to entry is higher because they need to understand the python & docker dependencies in addition to everything else in the vanilla tutorial.

I think I would be persuaded to do a more productionized Python version if our current dependencies were visible to the user. But right now they are too transparent, specifically:

  1. The Docker entrypoint (specifically that it is specified 2/3 layers above the connector Docker image) is very inscrutable
  2. The way we depend on monorepo python deps (docker inheritance) is inscrutable
  3. We don't have pydocs for those modules (e.g: a full README and a clear entrypoint to learning the helpers/framework)

@cgardens
Copy link
Contributor

I really don't want to lose this tutorial. You can definitely argue whether this should be the "official python tutorial" or the "learning to develop with airbyte" or "getting started". I think this tutorial does a good job of meeting a developer at their existing mental models.

This is what I wasn't able to successfully communicate on Friday about being thoughtful about abstraction. Abstraction in this case helps us (airbyters) write code faster and more maintainably. But for my money, if you have not worked with Airbyte before, the tutorial Sherif has written is more approachable and faster for going from 0 to something that works than trying to induct someone immediately into all of our abstraction.

There is a lot of power in showing how much you can do with airbyte and just the python standard library. This is where can meet the contributor closest to their expectations. Let's not lose it, even if we want to promote a different tutorial using our abstraction. I think we can better take int account how abstraction affects superusers writing maintainable code faster and new contributors learning us from.

Some examples of things we get here versus the abstracted version is:

  • easy to see that the objects that are passed around are just json.
  • understand that there is no magic in between the python script and the CLI and docker.
  • the source.py python script is directly runnable without intermediaries.
  • don't need to look at any other libraries outside of the python standard lib

@michel-tricot
Copy link
Contributor

Agreed with your point. It is more of a "anatomy of a connector". Don't want to lose it either and we should move forward with it. It is just a matter of finding the proper name for this tutorial

on the following page, click the "add destination" button then "add new destination":
![](../.gitbook/assets/newsourcetutorial_add_destination.png)

Configure a local CSV destination as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the local Json here, local CSV is very hard to leverage with all the quoting logic

Copy link
Contributor

@michel-tricot michel-tricot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% name

@sherifnada sherifnada merged commit ef3b046 into master Dec 25, 2020
@sherifnada sherifnada deleted the sherifnada/source-connector-tutorial branch December 25, 2020 06:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants