Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOCS] Choosing and configuring DataConnectors #3533

Merged
merged 71 commits into from
Nov 2, 2021

Conversation

NathanFarmer
Copy link
Contributor

@NathanFarmer NathanFarmer commented Oct 12, 2021

Changes proposed in this pull request:

  • DEVREL-213
  • Migrate "How to configure an InferredAssetDataConnector" and put under test
  • Migrate "How to configure a ConfiguredAssetDataConnector" and put under test
  • Migrate "How to configure a RuntimeDataConnector" and put under test
  • Migrate "How to choose which DataConnector to use" and put under test

Notable changes not in this pull request:

  • Examples for SQL (everything here is for file-system-like)

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

@netlify
Copy link

netlify bot commented Oct 12, 2021

✔️ Deploy Preview for niobium-lead-7998 ready!

🔨 Explore the source changes: 4033abc

🔍 Inspect the deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/61816c132ab0fd0007b22542

😎 Browse the preview: https://deploy-preview-3533--niobium-lead-7998.netlify.app

@github-actions
Copy link
Contributor

HOWDY! This is your friendly 🤖 CHANGELOG bot 🤖

Please don't forget to add a clear and succinct description of your change under the Develop header in docs_rtd/changelog.rst, if applicable. This will help us with the release process. See the Contribution checklist in the Great Expectations documentation for the type of labels to use!

Thank you!

Copy link
Contributor

@talagluck talagluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome overall! I think we might need some more explanation on some specific topics. We may also need a bit more detail explaining when to use Configured vs Inferred. Is it just that you can use Inferred when Data Assets are not divided equally in the same location, but where there's inconsistency, you need to use Configured? Inferred are actually more configurable than I had expected, so making this more explicit would be helpful.

@NathanFarmer NathanFarmer changed the title [DOCS] Choosing and configuring Data Connectors [DOCS] Choosing and configuring DataConnectors Oct 20, 2021
Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Some comments inside.
  2. I did not see any examples that use the Batch filtering capabilities (including the use of a custom Python based filter query.

Copy link
Contributor

@talagluck talagluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall! Just a few more minor nits. I haven't gone through the testing piece in detail either.

Comment on lines +22 to +29
| InferredAssetDataConnectors | ConfiguredAssetDataConnectors |
| --- | --- |
| InferredAssetFilesystemDataConnector | ConfiguredAssetFilesystemDataConnector |
| InferredAssetFilePathDataConnector | ConfiguredAssetFilePathDataConnector |
| InferredAssetAzureDataConnector | ConfiguredAssetAzureDataConnector |
| InferredAssetGCSDataConnector | ConfiguredAssetGCSDataConnector |
| InferredAssetS3DataConnector | ConfiguredAssetS3DataConnector |
| InferredAssetSqlDataConnector | ConfiguredAssetSqlDataConnector |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this table.


The configuration would also work with a regex capturing the entire filename (e.g. `pattern: (.*)\.csv`). However, capturing the month on its own allows for `batch_identifiers` to be used to retrieve a specific Batch of the Data Asset. For more information about capture groups, refer to the Python documentation on [regular expressions](https://docs.python.org/3/library/re.html#re.Match.group).

Later on we could retrieve the data in `yellow_tripdata_2019-02.csv` of `yellow_tripdata` as its own batch using `context.get_validator()` by specifying `{"month": "2019-02"}` as the `batch_identifier`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wild - I hadn't realized that you could specify batch_identifiers at the top level of the get_validator - I've only seen it specified within the batch_request.

@NathanFarmer
Copy link
Contributor Author

NathanFarmer commented Oct 27, 2021

@alexsherstinsky

  • I did not see any examples that use the Batch filtering capabilities (including the use of a custom Python based filter query.

Is this referring to the data_connector_query parameter of BatchRequest?

If so, I have added an example to how_to_configure_an_inferredassetdataconnector.md lines 268-291.

Copy link
Contributor

@talagluck talagluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one question/possible update, but I don't want it to gate. Amazing work!

Copy link
Contributor

@talagluck talagluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one question/possible update, but I don't want it to gate. Amazing work!

Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (strong work!)

@NathanFarmer NathanFarmer enabled auto-merge (squash) November 2, 2021 15:24
Copy link
Contributor

@Shinnnyshinshin Shinnnyshinshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks wonderful :)

@NathanFarmer NathanFarmer merged commit 948d6ed into develop Nov 2, 2021
@NathanFarmer NathanFarmer deleted the docs/DEVREL-213/everything-dataconnectors branch November 2, 2021 17:28
Shinnnyshinshin pushed a commit that referenced this pull request Nov 3, 2021
…-tests-for-docs

* develop:
  DevRel - docs fixes (#3498)
  Remove --no-spark flag from docs tests (#3625)
  [DOCS] Choosing and configuring DataConnectors (#3533)
  [BUGFIX] Be able to use spark execution engine with spark reuse flag (#3541)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants