
Migrate the test suite to vcrpy #541

Merged
28 commits merged into GeneralMills:master on Oct 31, 2022

Conversation

@Terseus (Collaborator) commented Oct 30, 2022

Fixes #503

Using vcrpy we can now deterministically test the backend responses without making any real requests.
This has a lot of advantages:

  • Removes false negatives and skipped tests caused by 429 responses.
  • The tests are much faster now; all 23 tests run in about 1 second on my PC.
  • Allows refactoring anything that doesn't involve a change in the requests made.
  • If the backend response changes, we can regenerate the vcrpy cassettes with pytest --record-mode=rewrite.

There are some gotchas though:

  • The backend can be unpredictable when asked for shorter timeframes (e.g. 10 days), so we chose to do yearly requests in almost all the tests.
  • Because of that, the results to check can be quite big, so we don't compare the full result; instead we test the first and last rows and the length of the result.
  • There is a lot of YAML now from the generated cassettes. I've tried to make GitHub ignore them by marking them with linguist-vendored in .gitattributes (see the linguist doc and this Stackoverflow response), but we won't know if it works until the change is merged and some days have passed.
    • Alternatively, we can merge only the first two commits in a separate PR, let some time pass, and come back to this PR to see if it worked.
  • Writing new tests requires some knowledge that isn't documented right now (see the sketch after this list):
    • By default pytest-recording runs with --record-mode=none, so any new test without a cassette fails; this is intended to prevent generating cassettes inadvertently. To generate a new cassette you need to use --record-mode=once.
    • The cassette must be deleted after any change that requires modifying the requests made to the backend, and the error message may not communicate this clearly enough to anyone who doesn't know vcrpy.
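
For illustration, a minimal vcrpy-backed test under pytest-recording could look like the sketch below. The test name, keyword, and expected length are made up; `@pytest.mark.vcr` and the record-mode flags are pytest-recording's actual interface.

```python
import pytest

from pytrends.request import TrendReq


# pytest-recording replays the matching cassette from tests/cassettes/.
@pytest.mark.vcr
def test_interest_over_time_year():
    pytrends = TrendReq()
    # A yearly timeframe keeps the recorded response deterministic.
    pytrends.build_payload(["pizza"], timeframe="2021-01-01 2021-12-31")
    df = pytrends.interest_over_time()
    # Compare the length and boundary rows instead of the full frame.
    assert len(df) == 52
```

To record the cassette for a new test, run `pytest --record-mode=once`; after a backend change, regenerate existing cassettes with `pytest --record-mode=rewrite` and commit the updated YAML.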

This has some advantages:
* prevents accidentally importing the test files, which may have
  different dependencies than the runtime files.
* tests are no longer shipped inside the package, which reduces the
  package size.
We're going to use vcrpy via pytest-recording, which will record all
HTTP interactions in YAML files inside the `tests/cassettes` path.

Since we don't want to show them in diffs or mark YAML as the main
project language, we configure the `.gitattributes` accordingly.

The `linguist-vendored` attribute has been taken from the GitHub
linguist docs:
https://github.com/github/linguist/blob/master/docs/overrides.md#vendored-code
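
The exact pattern isn't quoted in this PR; based on the description, the entry would be something like the following (the path glob is an assumption):

```gitattributes
# Assumed pattern: mark recorded cassettes as vendored so linguist
# excludes them from language stats and diffs collapse them by default.
tests/cassettes/** linguist-vendored
```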
Now we can test the expected generated content at `token_payload`.
Now we can test the expected payload at `related_queries_widget_list`.
The downside is that the expected payload is a little bit of a monster.

Also we start defining explicit timeframes in the tests where the timeframe
determines the response, so we get deterministic responses.
Now we can test the value returned by the backend; however, we don't
compare the _full_ response because it's a `pd.DataFrame` with 52 rows.
Instead we just compare the length, the first 3 rows, and the last 3
rows, and from that we can infer that the result is correct.

Later we can add more fine-grained tests if we want, but I think this is
a good starting point.
Also, the packages are now defined in `pyproject.toml`; otherwise it
fails when there is an HTML coverage report in `htmlcov`.
As for why pip takes `html` as a package, I have no clue.
We introduce two tools to reduce the ever-growing test boilerplate
(sketched below):
* `build_interest_over_time_df`.
* `ExpectedResult`, a dataclass that encapsulates everything that's tested
  against a `pd.DataFrame`.
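
The PR doesn't reproduce these helpers; a minimal sketch of what `ExpectedResult` might encapsulate, going by the description above (field and method names are assumptions):

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ExpectedResult:
    """Everything asserted against a result DataFrame (names assumed)."""

    length: int
    head: pd.DataFrame  # expected first rows
    tail: pd.DataFrame  # expected last rows

    def assert_matches(self, df: pd.DataFrame) -> None:
        # Compare the size and the boundary rows instead of the full frame.
        assert len(df) == self.length
        pd.testing.assert_frame_equal(df.head(len(self.head)), self.head)
        pd.testing.assert_frame_equal(df.tail(len(self.tail)), self.tail)
```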
We split the original `test_related_topics` into 3 tests: one to test
the result dict shape, one to test the 'top' results, and one to test
the 'rising' results.
There should be a special circle of hell for the programmers who design
this kind of response in an API.
Here we took a different approach because there is no observable
behavior in the result; in this case it's in the requests made.
We switch to the responses library to test the exact requests made (see
the sketch below).

Also we do it with `trending_searches` because it's the easiest of all
the calls to mock, and because all the code passes through `_get_data`,
which is the one that makes the requests.

Also we wrote just one test instead of two; there's no point in testing
only two methods, since we can either test only one and hope for the
best or test them all.
While testing them all would be better, it's also a time-consuming chore
with very little value.
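
As a sketch of this technique with the responses library: the endpoint below is hypothetical and a plain `requests.get` stands in for the pytrends call, since the PR doesn't quote either, but `responses.add` and `responses.calls` are the library's real interface.

```python
import requests
import responses

# Hypothetical endpoint for illustration; the real URL lives inside pytrends.
API_URL = "https://example.com/trends/api/trending"


@responses.activate
def test_exact_request_made():
    responses.add(responses.GET, API_URL, json={"default": {}}, status=200)

    # Stand-in for the code under test, which funnels through _get_data.
    requests.get(API_URL, params={"hl": "en-US", "tz": "360"})

    # responses records every HTTP call, so we can assert the exact request.
    assert len(responses.calls) == 1
    assert responses.calls[0].request.url == f"{API_URL}?hl=en-US&tz=360"
```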
They're not needed anymore because we now test all the response fields
and index dtypes.
Now tests in CI cannot use the network, so no test without a vcrpy
cassette or responses patching should silently slip through.
* remove the `tool.coverage.run.source` directive and the `html` section in
  `pyproject.toml`; they were useless.
* add coverage by default in the pytest config.
@Terseus (Collaborator, Author) commented Oct 30, 2022

@emlazzarin
Please note, I don't consider this 100% finished until we add some docs on how to write/maintain/use the test suite. However, I don't know if we should simply add it to README.md or create a CONTRIBUTING.md file along with more content; what do you prefer?

@emlazzarin (Collaborator) commented Oct 30, 2022 via email

@Terseus (Collaborator, Author) commented Oct 30, 2022

@emlazzarin

> Is comparing the full results very slow? Is that why we only check the first+last+size?

No, it's because the expected result variables can become quite big, since we ask for a year of data; I did that because the daily data from the API is not trustworthy and I fear it might change.
Maybe I was paranoid and we can change the tests to ask for a week of data or so and compare the full results; then we could get rid of the `ExpectedResult` class.
What do you think?

> Let’s link from the README to a new CONTRIBUTING file we can start from scratch, just so the doc has the freedom to be as expansive as necessary.

Perfect, do you want me to start it with info about working with the test suite in this PR?

@Terseus (Collaborator, Author) commented Oct 30, 2022

Ok, I reduced the date ranges in the tests where we can use daily data, but that was only 5 tests; for all the others I don't think it's worth comparing the full result, since it would mean hardcoding DataFrames of between 25 and 250 rows.
I think it's okay right now.

Looks like GitHub doesn't support splitting a command across more than
one line.
@emlazzarin emlazzarin merged commit de31fb0 into GeneralMills:master Oct 31, 2022
@Terseus Terseus deleted the vcr-based-unit-tests branch November 1, 2022 08:36