Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file-based CDK: Configurable strings_can_be_null #29298

Merged
merged 34 commits into from
Aug 14, 2023
Merged

Conversation

girarda
Copy link
Contributor

@girarda girarda commented Aug 9, 2023

What

How

When converting values to null:

  1. If the value is not in the null_values set -> value is not None
  2. If the value is in the null_values set AND its type is not "string" -> value is None
  3. If the value is in the null_values set AND its type is "string" and strings_can_be_null" is True -> value is None
  4. If the value is in the null_values set AND its type is "string" and strings_can_be_null" is False -> value is not None

Recommended reading order

  1. airbyte-cdk/python/airbyte_cdk/sources/file_based/config/csv_format.py
  2. airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/csv_parser.py

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Aug 9, 2023
@@ -210,13 +230,6 @@ def _cast_types(
prop_type = property_types.get(key)
cast_value: Any = value

if isinstance(prop_type, list):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved outside the function so the deduped property types can also be used in _to_nullable

@@ -43,6 +43,18 @@
"description": "Used during spec; allows the developer to configure the cloud provider specific options\nthat are needed when users configure a file-based source.",
"type": "object",
"properties": {
"start_date": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 100% sure why this is being re-ordered, but properties is a map so this shouldn't matter

@girarda girarda changed the title first pass file-based: Configurable strings_can_be_null Aug 9, 2023
@girarda girarda changed the title file-based: Configurable strings_can_be_null file-based CDK: Configurable strings_can_be_null Aug 9, 2023
@girarda girarda marked this pull request as ready for review August 9, 2023 19:40
@girarda girarda requested a review from a team as a code owner August 9, 2023 19:40
@girarda girarda requested review from maxi297 and clnoll August 9, 2023 19:40
Copy link
Contributor

@clnoll clnoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @girarda! One small question.

Base automatically changed from issue-28893/infer-schema-csv to master August 14, 2023 19:14
@girarda girarda merged commit b512fa4 into master Aug 14, 2023
11 checks passed
@girarda girarda deleted the alex/strings_can_be_null branch August 14, 2023 19:51
@girarda girarda restored the alex/strings_can_be_null branch August 14, 2023 20:39
harrytou pushed a commit to KYVENetwork/airbyte that referenced this pull request Sep 1, 2023
* [ISSUE airbytehq#28893] infer csv schema

* [ISSUE airbytehq#28893] align with pyarrow

* Automated Commit - Formatting Changes

* [ISSUE airbytehq#28893] legacy inference and infer only when needed

* [ISSUE airbytehq#28893] fix scenario tests

* [ISSUE airbytehq#28893] using discovered schema as part of read

* [ISSUE airbytehq#28893] self-review + cleanup

* [ISSUE airbytehq#28893] fix test

* [ISSUE airbytehq#28893] code review part #1

* [ISSUE airbytehq#28893] code review part #2

* Fix test

* formatcdk

* first pass

* [ISSUE airbytehq#28893] code review

* fix mypy issues

* comment

* rename for clarity

* Add a scenario test case

* this isn't optional anymore

* FIX test log level

* Re-adding failing tests

* [ISSUE airbytehq#28893] improve inferrence to consider multiple types per value

* Automated Commit - Formatting Changes

* [ISSUE airbytehq#28893] remove InferenceType.PRIMITIVE_AND_COMPLEX_TYPES

* Code review

* Automated Commit - Formatting Changes

* fix unit tests

---------

Co-authored-by: maxi297 <maxime@airbyte.io>
Co-authored-by: maxi297 <maxi297@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants