Migrate the test suite to vcrpy #541
Conversation
This has some advantages:

* It prevents accidentally importing the test files, which may have different dependencies than the runtime files.
* Tests are no longer shipped inside the package, which reduces the package size.
We're going to use vcrpy via pytest-recording, which will record all HTTP interactions in YAML files inside the `tests/cassettes` path. Since we don't want to show them in diffs or mark YAML as the main project language, we configure `.gitattributes` accordingly. The `linguist-vendored` attribute is taken from the GitHub linguist overrides doc: https://github.com/github/linguist/blob/master/docs/overrides.md#vendored-code
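For illustration, the `.gitattributes` entries could look something like this (the exact path and attribute combination here are an assumption based on the description above, not the PR's verbatim file):

```
# Hide cassette churn in diffs and exclude it from language statistics.
tests/cassettes/** -diff linguist-vendored
```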
Now we can test the expected generated content at `token_payload`.
Now we can test the expected payload at `related_queries_widget_list`. The downside is that the expected payload is a bit of a monster. We also start defining explicit timeframes in the tests: since the timeframe determines the response, this keeps the responses deterministic.
Now we can test the value returned by the backend. However, we don't compare the _full_ response, because it's a `pd.DataFrame` with 52 rows; instead we compare the length, the first 3 rows, and the last 3 rows, and from that we can infer that the result is correct. We can add more fine-grained tests later if we want, but I think this is a good starting point.
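A minimal sketch of that comparison strategy (the helper and column names here are hypothetical, not pytrends' actual code):

```python
import pandas as pd

def assert_df_summary(df, expected_length, expected_head, expected_tail):
    """Check a large DataFrame by its length plus first/last rows only."""
    assert len(df) == expected_length
    pd.testing.assert_frame_equal(df.head(len(expected_head)), expected_head)
    pd.testing.assert_frame_equal(df.tail(len(expected_tail)), expected_tail)

# Example: a year of weekly data, 52 rows.
df = pd.DataFrame({"interest": range(52)})
assert_df_summary(df, 52, df.iloc[:3], df.iloc[-3:])
```

The trade-off is that a regression in the middle rows would slip through, which is why the comment above calls it a starting point rather than a complete check.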
The packages are now defined explicitly in `pyproject.toml`; otherwise the build fails when an HTML coverage report is present in `htmlcov`. As for why pip treats `html` as a package, I have no clue.
We introduce two tools to reduce the ever-growing test boilerplate:

* `build_interest_over_time_df`.
* `ExpectedResult`, a dataclass that encapsulates everything that's tested against a `pd.DataFrame`.
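A sketch of what such a dataclass could look like (the field and method names are assumptions, not the PR's exact implementation):

```python
from dataclasses import dataclass

import pandas as pd

@dataclass
class ExpectedResult:
    """Bundles everything we assert about a result DataFrame."""
    length: int
    head: pd.DataFrame
    tail: pd.DataFrame

    def assert_equals(self, df: pd.DataFrame) -> None:
        assert len(df) == self.length
        pd.testing.assert_frame_equal(df.head(len(self.head)), self.head)
        pd.testing.assert_frame_equal(df.tail(len(self.tail)), self.tail)

# Usage: build the expectation once, reuse it across tests.
df = pd.DataFrame({"interest": range(52)})
expected = ExpectedResult(length=52, head=df.iloc[:3], tail=df.iloc[-3:])
expected.assert_equals(df)
```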
We split the original `test_related_topics` into 3 tests: one for the shape of the result dict, one for the 'top' results, and one for the 'rising' results.
There should be a special circle of hell for the programmers who design this kind of API response.
Here we took a different approach, because the observable behavior isn't in the result but in the requests made. We switch to the responses library to test the exact requests. We do it with `trending_searches` because it's the easiest of all the calls to mock, and because all the code goes through `_get_data`, which is what makes the requests. We also wrote just one test instead of two: there's no point in testing only two methods, we can either test just one and hope for the best, or test all of them. While testing all of them would be better, it's also a time-consuming chore with very little value.
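The idea can be sketched with the stdlib's `unittest.mock` (the PR itself uses the `responses` library; the URL, parameters, and `get_data` helper below are hypothetical stand-ins):

```python
from unittest import mock

def get_data(session, url, params):
    """Stand-in for the single choke point all backend calls go through."""
    return session.get(url, params=params)

def test_trending_searches_request():
    session = mock.Mock()
    get_data(session, "https://trends.example/api/trending", {"geo": "US"})
    # There is no meaningful return value to inspect, so the observable
    # behavior we assert on is the exact request that was made.
    session.get.assert_called_once_with(
        "https://trends.example/api/trending", params={"geo": "US"}
    )

test_trending_searches_request()
```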
They're not needed anymore because we now test all the response fields and index dtypes.
Tests in CI can no longer use the network, so no test lacking a vcrpy cassette or responses-patching can silently slip through.
* Remove the `tool.coverage.run.source` directive and the `html` section in `pyproject.toml`; they were useless.
* Add coverage by default in the pytest config.
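Put together, the relevant `pyproject.toml` section might look like this (the exact values are illustrative, not the PR's verbatim config):

```toml
[tool.pytest.ini_options]
# Block the network in CI and enable coverage by default.
addopts = "--record-mode=none --cov"
```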
@emlazzarin
Looks good. I think the additional size is acceptable.
Is comparing the full results very slow? Is that why we only check the first+last+size?
Let’s link from the README to a new CONTRIBUTING file we can start from scratch, just so the doc has the freedom to be as expansive as necessary.
> On Sun, Oct 30 2022 at 12:14, David Caro wrote:
> @emlazzarin
> Please note, I don't consider this 100% finished until we add some docs on
> how to write/maintain/use the test suite. However, I don't know if we
> should simply add it to the README.md or if we want to create a CONTRIBUTING.md
> file along with more content. Which do you prefer?
No, it's because the expected result variables can become quite big, since we ask for a year of data. I did that because the daily data from the API is not trustworthy and I fear it might change.
Perfect, do you want me to start it with info about working with the test suite in this PR?
Ok, I reduced the date ranges in the tests where we can use daily data, but that was only 5 tests. For all the others, I don't think it's worth hardcoding DataFrames of between 25 and 250 rows just to compare the full result.
Looks like GitHub doesn't support splitting a command across multiple lines.
Fixes #503

Using vcrpy, we can now deterministically test the backend responses without making any real request. This has a lot of advantages. To regenerate all the cassettes, run `pytest --record-mode=rewrite`.

There are some gotchas though:

* The cassettes are marked `linguist-vendored` in `.gitattributes` (see the linguist doc and this Stack Overflow response), but we will not know if it works until the change is merged and some days have passed.
* `pytest-recording` is executed with `--record-mode=none`, so any new test without a cassette fails. This is intended to prevent generating cassettes inadvertently; to generate a new cassette you need to use `--record-mode=once`.