Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#1945 Issue: update SingerHelper read method #2053

Merged
merged 8 commits into from
Feb 15, 2021

Conversation

yevhenii-ldv
Copy link
Contributor

@yevhenii-ldv yevhenii-ldv commented Feb 12, 2021

What

It was found that for all Singer-based connectors, the data read from the console can be not always quite correct. The problem was that we use two IOTextWrappers: one for stdout, and the second for stderror. In the code, we exit the loop and stop receiving data from the console as soon as one of the IOTextWrappers gives us an empty string (this means the end of the file). However, we need both IOTextWrappers to return an empty string and only then stop reading from the output console. Thus, the previous code sometimes exited the loop earlier and did not read a certain number of records and states, this was clearly seen on the "users" stream for the Slack connector..

How

Describe the solution

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images

Recommended reading order

  1. test.java
  2. component.ts
  3. the rest

@yevhenii-ldv
Copy link
Contributor Author

I left this PR as a WIP, as I want to work on it a little more and optimize it, removing such a large number of code nestings for method read.

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch 👍

logger.log_by_prefix(*std_data)

@staticmethod
def _read_std_rows(process: subprocess.Popen, is_message, transform) -> Generator[AirbyteMessage, None, None]:
Copy link
Contributor

@eugene-kulak eugene-kulak Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _read_std_rows(process: subprocess.Popen, is_message, transform) -> Generator[AirbyteMessage, None, None]:
def _read_std_rows(process: subprocess.Popen, is_message, transform) -> Iterator[Tuple[str, IOTextWrapper]:

@@ -163,48 +163,67 @@ def get_catalogs(logger, shell_command: str, sync_mode_overrides: Dict[str, Sync
@staticmethod
def read(logger, shell_command, is_message=(lambda x: True), transform=(lambda x: x)) -> Generator[AirbyteMessage, None, None]:
Copy link
Contributor

@eugene-kulak eugene-kulak Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove transform, I don't see its usage it is not transparent enough why it is here

yield line, "ERROR"

@staticmethod
def _classify_and_convert_out_json_to_airbyte_message(out_json: Dict, transform) -> AirbyteMessage:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _classify_and_convert_out_json_to_airbyte_message(out_json: Dict, transform) -> AirbyteMessage:
def _airbyte_message_from_json(out_json: Mapping[str, Ayy], transform) -> Optional[AirbyteMessage]:

@yevhenii-ldv
Copy link
Contributor Author

yevhenii-ldv commented Feb 15, 2021

/test connector=source-slack-singer

🕑 source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/568626988
✅ source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/568626988

@yevhenii-ldv
Copy link
Contributor Author

yevhenii-ldv commented Feb 15, 2021

/test connector=source-slack-singer

🕑 source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/568657352
✅ source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/568657352

@yevhenii-ldv
Copy link
Contributor Author

yevhenii-ldv commented Feb 15, 2021

/test connector=source-slack-singer

🕑 source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/568686164
✅ source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/568686164

@yevhenii-ldv
Copy link
Contributor Author

yevhenii-ldv commented Feb 15, 2021

/test connector=source-slack-singer

🕑 source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/568729712
✅ source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/568729712

@yevhenii-ldv
Copy link
Contributor Author

yevhenii-ldv commented Feb 15, 2021

/test connector=source-googleanalytics-singer

🕑 source-googleanalytics-singer https://github.com/airbytehq/airbyte/actions/runs/568752763
✅ source-googleanalytics-singer https://github.com/airbytehq/airbyte/actions/runs/568752763

@yevhenii-ldv
Copy link
Contributor Author

yevhenii-ldv commented Feb 15, 2021

/test connector=source-googleanalytics-singer

🕑 source-googleanalytics-singer https://github.com/airbytehq/airbyte/actions/runs/568798150
✅ source-googleanalytics-singer https://github.com/airbytehq/airbyte/actions/runs/568798150

@yevhenii-ldv yevhenii-ldv changed the title WIP: #1945 Issue: update SingerHelper read method #1945 Issue: update SingerHelper read method Feb 15, 2021
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment about EOF, once it's addressed we can merge. Good catch! 🎉

if empty_line_counter >= len(selects_list):
eof = True

try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should only run this section of code (the try except then if process.returncode etc.. if EOF has not been reached, so this should go outside the while loop

@sherifnada
Copy link
Contributor

sherifnada commented Feb 15, 2021

/publish connector=connectors/source-slack-singer

🕑 connectors/source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/569955859
✅ connectors/source-slack-singer https://github.com/airbytehq/airbyte/actions/runs/569955859

@sherifnada
Copy link
Contributor

sherifnada commented Feb 15, 2021

/publish connector=connectors/source-googleanalytics-singer

🕑 connectors/source-googleanalytics-singer https://github.com/airbytehq/airbyte/actions/runs/569956150
✅ connectors/source-googleanalytics-singer https://github.com/airbytehq/airbyte/actions/runs/569956150

@sherifnada sherifnada merged commit 6a19aa7 into master Feb 15, 2021
@sherifnada sherifnada deleted the ykurochkin/bug-read-method-singerhelper branch February 15, 2021 23:33
This was linked to issues Feb 15, 2021

@staticmethod
def _airbyte_message_from_json(transformed_json: Mapping[str, Any]) -> Optional[AirbyteMessage]:
if transformed_json is None or transformed_json.get("type") == "SCHEMA" or transformed_json.get("type") == "ACTIVATE_VERSION":
Copy link
Contributor

@eugene-kulak eugene-kulak Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if transformed_json is None or transformed_json.get("type") in {"SCHEMA", "ACTIVATE_VERSION"}:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Google Analytics failing CI test. Slack failing CI build
3 participants