API tests rely on pre-existing data #725
Comments
@sarayourfriend this is true for unit tests, but integration tests should use the actual underlying database so that we can ensure the overall behaviour works as expected. In my view, there shouldn't be any server-side mocking, cleanup, etc. in the integration tests; they should be conducted from the PoV of a consumer of the API.
In that case, should they just use production data?
Do you mean copy some production data? Or interact with the production database?
Interacting with the production database for running tests seems wrong. Why not just use the sample entries set up by
I guess my concern is over tests like the ones being introduced in WordPress/openverse-api#553, where the test relies on specific data that isn't indicated by the tests themselves (namely, the list of providers). Maybe there's a different way to write the tests that would request the provider list first and then use that, rather than hard-coding the provider list? Idk, I've never used integration tests like this before; when I did, it was on apps with user-generated content, so the integration tests would go through the full flow of creating and then consuming data from various users. I've never worked on this kind of catalog-type data set, so sorry if I'm projecting incorrect understandings about how the tests should behave!
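The "request the provider list first" idea above could be sketched roughly as follows. This is a plain-Python illustration only: `fetch_providers` and the response shape are hypothetical stand-ins for whatever endpoint and client the project actually uses, not the real API.

```python
# Sketch: derive test parameters from the API itself instead of hard-coding
# a provider list that can drift out of sync with the sample data.

def fetch_providers(api_response):
    """Extract provider names from a stats-style API response.

    `api_response` is a hypothetical list of dicts; the real response
    shape would come from the project's actual stats endpoint.
    """
    return [entry["source_name"] for entry in api_response]


def build_provider_queries(providers):
    """Build one search query per discovered provider."""
    return [{"source": provider} for provider in providers]


# Example with a stubbed response standing in for a real API call:
response = [{"source_name": "flickr"}, {"source_name": "met"}]
providers = fetch_providers(response)
queries = build_provider_queries(providers)
```

The point is that the test derives its expectations from the running system, so changing the sample data can no longer silently break a hard-coded list.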
Oh no @sarayourfriend, I completely agree that tests like the ones in WordPress/openverse-api#553 are not ideal. But as it stands right now, the API completely relies on the sample data loaded into it at the start for its complete functionality, so the integration tests keep breaking whenever we change the sample data.
Right, I guess I just want to find a way to clarify how to fix that for the tests. Is the expectation that we update the sample data in the CSVs? Is there an easy way to generate them? CSV is a fine format for what it's doing currently, but trying to maintain the data in there would be a pain. Adding demo waveforms, for example, would be very annoying, and related models, like the AudioAddOn, don't seem to have a clear place in the current sample data, at least not one that wouldn't require manually cross-referencing IDs across multiple CSVs.

I can't find documentation about how they're created or maintained in the repo. I suspect they're maybe extracted from production data somehow, or maybe from a local run of the ingestion process? Could we change the way our fixture data is created to use Python factories and define our fixture data in code rather than purely declaratively like we have now? Or should we try to refresh them on a regular basis somehow?

It also seems like a fix for the problem we ran into with the audio waveforms is to automatically load the sample data before the tests run. That's essentially what happens in CI already, because the tests run immediately after
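The "define fixture data in code rather than declaratively" idea above could look something like the sketch below, which generates sample-data CSVs from Python instead of maintaining them by hand. The field names and schema here are purely illustrative, not the project's actual sample-data schema.

```python
import csv
import io

# Sketch: generate sample-data CSVs from code. Field names are
# illustrative only; a real version would mirror the actual table schema.

def make_rows(n):
    """Produce n sample records with predictable, self-describing values."""
    return [
        {"identifier": f"sample-{i}", "title": f"Sample {i}", "provider": "flickr"}
        for i in range(n)
    ]


def write_sample_csv(rows):
    """Serialize the records to CSV text, ready to load into the database."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["identifier", "title", "provider"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


csv_text = write_sample_csv(make_rows(3))
```

With something like this, regenerating the sample data after a schema change (say, adding a waveform column) becomes a code change rather than hand-editing CSVs.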
Hi @dhruvkb, I want to contribute to fixing this issue.
@Prakharkarsh1 please go ahead. Feel free to comment here or, alternatively, ask in the #openverse channel in the Making WordPress Slack workspace if you need help.
Yeah sure @dhruvkb thank you! |
Description
The API tests rely on data pre-existing in the database being tested against.
This is not good practice as it relies on information that is neither controlled nor indicated by the tests themselves.
Instead, we should use factories (introduced in WordPress/openverse-api#530) to generate new data for each test case and use
@pytest.mark.django_db
to ensure that the tests occur in a single transaction that is rolled back after the test case finishes. Essentially, we want fresh data for every test. Ideally the database instance used for testing isn't even the one shared by the local running instance of the API, but that can introduce more infrastructure complexity than we necessarily want to deal with. This will prevent issues like the one described here.
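The per-test factory pattern described above can be illustrated in plain Python. This is only a sketch of the idea: the real implementation would use the model factories from WordPress/openverse-api#530 together with `@pytest.mark.django_db`, and the field names below are hypothetical.

```python
import itertools

# Minimal sketch of the factory pattern in plain Python. In the real project
# this would be a factory_boy model factory used inside a test decorated with
# @pytest.mark.django_db, so each test's rows are rolled back afterwards.

_counter = itertools.count(1)


def media_factory(**overrides):
    """Create a fresh, self-describing record for a single test case."""
    n = next(_counter)
    record = {
        "identifier": f"test-media-{n}",  # unique per call
        "title": f"Test media {n}",
        "provider": "test_provider",      # hypothetical default provider
    }
    record.update(overrides)
    return record


# Each test creates exactly the data it needs, instead of assuming
# pre-existing rows in a shared database:
a = media_factory()
b = media_factory(provider="flickr")
```

Because every test constructs (and, under `django_db`, rolls back) its own rows, no test depends on data it did not itself create.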
Reproduction
Expectation
Test data should be isolated to specific test cases, not leak across, and not rely on external data that can change arbitrarily.
Resolution