Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standard Tests: Implement v2 test suites #2181 #2792

Merged
merged 10 commits into from
Apr 19, 2021

Conversation

keu
Copy link
Contributor

@keu keu commented Apr 7, 2021

closes #2181

What

Describe what the change is solving
It helps to add screenshots if it affects the frontend.

How

Describe the solution

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images

Recommended reading order

  1. test.java
  2. component.ts
  3. the rest

@keu keu self-assigned this Apr 7, 2021
@auto-assign auto-assign bot requested review from jrhizor and sherifnada April 7, 2021 22:40
@keu keu linked an issue Apr 8, 2021 that may be closed by this pull request
@keu keu changed the title WIP: Implement v2 test suites #2181 Standard Tests: Implement v2 test suites #2181 Apr 8, 2021
records_3 = filter_output(output, type_=Type.RECORD)
states_3 = filter_output(output, type_=Type.STATE)

assert not records_3, "There should be no records after second incremental read"
Copy link
Contributor Author

@keu keu Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, here should be a better check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this verifies the incremental sync invariant described here

records_3 = filter_output(output, type_=Type.RECORD)
states_3 = filter_output(output, type_=Type.STATE)

assert not records_3, "There should be no records after second incremental read"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this verifies the incremental sync invariant described here

@@ -51,6 +50,13 @@ def test_check(self, connector_config, docker_runner: ConnectorRunner):
assert len(con_messages) == 1, "Connection status message should be emitted exactly once"
assert con_messages[0].connectionStatus.status == Status.SUCCEEDED

def test_check_without_network(self, connector_config, docker_runner: ConnectorRunner):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my guess is that this check will work well for most sources but sources which don't always require internet like source-file might not need it. I see a couple of ways around this:

  1. add a flag to the config requires_internet that can be used to disable this check
  2. don't use this check and rely on the other ones which malform the configs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for №2

Copy link
Contributor

@vitaliizazmic vitaliizazmic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few comments but feel free to merge once they are addressed

@@ -1,6 +1,6 @@
# Running Standard Source Tests
```bash
PYTHONPATH=. pytest standard_test.py --standard_test_config=/connectors/source-hubspot/ -vvv
python -m pytest standard_test/tests --standard_test_config=../../connectors/source-hubspot/ -vvv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the hubspot bit be removed?

@@ -51,6 +52,13 @@ def test_check(self, connector_config, docker_runner: ConnectorRunner):
assert len(con_messages) == 1, "Connection status message should be emitted exactly once"
assert con_messages[0].connectionStatus.status == Status.SUCCEEDED

# def test_check_without_network(self, connector_config, docker_runner: ConnectorRunner):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we delete instead of comment out?



@pytest.mark.timeout(10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only problem with this is if the docker image takes a long time to download. Maybe we should have a fixture that downloads the docker image first so downstream tests can set reasonable timeouts like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed with pull_docker_image

@@ -69,9 +78,9 @@ def test_discover(self, connector_config, catalog, docker_runner: ConnectorRunne
assert catalog_messages[0].catalog == catalog, "Catalog should match the one that was provided"


@pytest.mark.timeout(300)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very short timeout given that some sources might have lots of streams or be reading lots of data. WDYT? Maybe this method shouldn't have a timeout? or maybe the timeout should be configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not forcing timeout defeats the purpose of it. Another aspect is that we want to keep tests fast and scalable. They shouldn't take more than 20 minutes. If a connector can't guarantee this time either it is defective or it's test account configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in principle, but pragmatically speaking I worry this may be difficult/full of friction to do in practice, so my suggestion is borne out of practicality rather than philosophy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make it configurable? 300 second timeout sounds fine in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as agreed with @sherifnada I will create a ticket for this, for now we will move on



@pytest.fixture(name="cursor_paths")
def cursor_paths_fixture(inputs, configured_catalog_for_incremental):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put return type hints for all these fixtures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

result = {}

for stream in configured_catalog_for_incremental.streams:
path = cursor_paths.get(stream.stream.name, [stream.cursor_field[-1]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cursor field could be nested so we should return the full cursor field instead of just the last element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, this is intentional, the default behavior in states don't use nested cursors, wherein records it is often the case.
I think the chance of events is the following:
(flat cursor in records) > (nested cursor in records) => (flat cursor in state) > (nested cursor in states)
so default behaviour here should appear more frequently

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see 👍🏼

records = filter_output(output, type_=Type.RECORD)
states = filter_output(output, type_=Type.STATE)

assert states, "Should produce at least one state"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this test be merged with the one below? I understand that ideally it's not, but we have to be pragmatic because these tests can take a long time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

return catalog


@pytest.mark.timeout(300)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as basic read about timeout -- this will be too short for some sources

helper = JsonSchemaHelper(schema=stream.stream.json_schema)
record_value = helper.get_cursor_value(record=record.record.data, cursor_path=stream.cursor_field)
state_value = helper.get_state_value(state=latest_state[stream_name], cursor_path=cursor_paths[stream_name])
assert record_value <= state_value, "First incremental sync should produce records younger or equal to cursor value from the state"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea with this check!

@keu keu force-pushed the keu/standard_python-tests/test-suites branch from dc82433 to a825b25 Compare April 18, 2021 17:56
result = {}

for stream in configured_catalog_for_incremental.streams:
path = cursor_paths.get(stream.stream.name, [stream.cursor_field[-1]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see 👍🏼

@pytest.fixture(name="configured_catalog_for_incremental")
def configured_catalog_for_incremental_fixture(configured_catalog) -> ConfiguredAirbyteCatalog:
catalog = incremental_only_catalog(configured_catalog)
for stream in catalog.streams:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, we already do this filtering elsewhere? I must have missed it

@keu keu merged commit 0247670 into keu/standard-python-tests Apr 19, 2021
@keu keu deleted the keu/standard_python-tests/test-suites branch April 19, 2021 14:56
keu added a commit that referenced this pull request Apr 20, 2021
)

* Standard Tests: Implement v2 test suites #2181 (#2792)
* rename standard tests (v2) to source_acceptance_test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement v2 test suites
4 participants