
API tests rely on pre-existing data #725

Open
sarayourfriend opened this issue Mar 8, 2022 · 10 comments
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🛠 goal: fix Bug fix 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API

Comments

@sarayourfriend
Collaborator

Description

The API tests rely on data pre-existing in the database being tested against.

This is not good practice as it relies on information that is neither controlled nor indicated by the tests themselves.

Instead, we should use factories (introduced in WordPress/openverse-api#530) to generate new data for each test case and use `@pytest.mark.django_db` to run each test in a transaction that is rolled back after the test case finishes. Essentially, we want fresh data for every test. Ideally the database instance used for testing wouldn't even be the one shared by the local running instance of the API, but that can introduce more infrastructure complexity than we want to take on.
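The "fresh data for every test" idea can be sketched in miniature without Django. In the real suite this role would be played by factory_boy factories plus the `@pytest.mark.django_db` marker (which wraps each test in a rolled-back transaction); every name and field below is illustrative only:

```python
# Framework-free sketch of "fresh data for every test". In the actual API
# tests this would be a factory_boy factory and @pytest.mark.django_db;
# make_audio and its fields are hypothetical stand-ins.
import itertools

_counter = itertools.count()


def make_audio(**overrides):
    """Build a fresh audio record for a single test, the way a factory would."""
    record = {
        "identifier": f"audio-{next(_counter)}",
        "title": "Generated title",
        "provider": "test_provider",
    }
    record.update(overrides)
    return record


def test_detail_view():
    # The test creates exactly the data it needs; nothing is assumed to
    # pre-exist, and with transaction rollback nothing leaks to the next test.
    audio = make_audio(title="Example track")
    assert audio["title"] == "Example track"
```

The point of the pattern is that every piece of data a test asserts against is visible in the test itself, so changing the sample data (or forgetting to load it) cannot break the suite.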

This will prevent issues like the one described here.

Reproduction

  1. Run `just recreate && just init && just api-tests` and ensure they pass
  2. Run `just recreate && just init && just dj "generatewaveforms --max_records=3" && just api-tests` and note that they fail because of unexpected data in the database under test.

Expectation

Test data should be isolated to specific test cases, should not leak across tests, and should not rely on external data that can change arbitrarily.

Resolution

  • 🙋 I would be interested in resolving this bug.
@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 🛠 goal: fix Bug fix 🤖 aspect: dx Concerns developers' experience with the codebase labels Mar 8, 2022
@dhruvkb
Member

dhruvkb commented Mar 9, 2022

@sarayourfriend this is true for unit tests but integration tests should be using the actual underlying database so that we can ensure that the overall behaviour works as expected.

In my view, there shouldn't be any server-side mocking/cleanup etc. in the integration tests and they should be conducted from the PoV of a consumer of the API.

@sarayourfriend
Collaborator Author

In that case should they just use production data?

@AetherUnbound
Collaborator

In that case should they just use production data?

Do you mean copy some production data? Or interact with the production database?

@dhruvkb
Member

dhruvkb commented Mar 10, 2022

Interacting with the production database for running tests seems wrong. Why not just use the sample entries set up by `just init` for the tests?

@sarayourfriend
Collaborator Author

I guess my concern is over tests like the ones being introduced in WordPress/openverse-api#553, which rely on specific data that isn't indicated by the tests themselves (namely, the list of providers). Maybe there's a different way to write the tests that would request the provider list first and then use that, rather than hard-coding the provider list?
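One way to sketch that suggestion: the test first asks the API which providers exist and asserts against that response, so it adapts to whatever data happens to be loaded. The endpoint path and `source_name` field below are assumptions for illustration, not the actual Openverse API contract:

```python
# Sketch: derive the provider list from the API itself instead of
# hard-coding it in the test. The /v1/audio/stats/ path and the
# "source_name" key are illustrative assumptions.


def fetch_providers(api_get):
    """api_get: any callable taking a path and returning parsed JSON."""
    return {entry["source_name"] for entry in api_get("/v1/audio/stats/")}


def test_search_covers_known_providers():
    # Stub standing in for a real HTTP client against the running API.
    def fake_api_get(path):
        return [{"source_name": "jamendo"}, {"source_name": "wikimedia_audio"}]

    providers = fetch_providers(fake_api_get)
    # The assertion is now driven by the API's own answer, so the test no
    # longer breaks whenever the sample data changes.
    assert providers == {"jamendo", "wikimedia_audio"}
```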

Idk, I've never used integration tests like this before. When I did, it was on apps with user-generated content, so the integration tests would go through the full flow of creating and then consuming data from various users. I've never worked on this kind of catalog-type data set, so sorry if I'm projecting incorrect understandings about how the tests should behave!

@dhruvkb
Member

dhruvkb commented Mar 10, 2022

Oh no @sarayourfriend I completely agree that tests like the ones in WordPress/openverse-api#553 are not ideal. But as it stands right now, the API completely relies on the sample data loaded into it at the start for the complete functionality so integration tests keep breaking whenever we change the sample data.

@sarayourfriend
Collaborator Author

the API completely relies on the sample data loaded into it at the start for the complete functionality so integration tests keep breaking whenever we change the sample data.

Right, I guess I just want to find a way to clarify how to fix that for the tests. Is the expectation that we update the sample data in the CSVs? Is there an easy way to generate them? CSV is a fine format for what it's doing currently, but maintaining the data in there would be a pain: adding demo waveforms, for example, would be very annoying, and related models like `AudioAddOn` don't seem to have a clear place in the current sample data, at least not one that wouldn't require manually cross-referencing IDs across multiple CSVs. The `load_sample_data.sh` script is currently simple (which is good), but it would be hard to extend it to include cross-referenced data without maintaining duplicated UUIDs across multiple files (in this case just two, but possibly more in the future). Is there a simple way that I'm not seeing for us to fix that now that we have "side tables" present?

I can't find documentation about how they're created or maintained in the repo. I suspect they're maybe extracted from production data somehow, or maybe a local run of the ingestion process?

Could we change the way our fixture data is created to use Python factories and define our fixture data in code rather than purely declaratively like we have now? Or should we try to refresh them on a regular basis somehow?
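As a concrete sketch of the factory idea: a small code-defined generator could produce the cross-referenced rows in one place, so the shared UUID is generated once instead of hand-maintained in two CSVs. The table and field names here are illustrative, not the real schema:

```python
# Sketch: define sample data in code rather than static CSVs, so rows that
# reference each other (e.g. an audio record and its add-on) share a single
# generated UUID. Field names are hypothetical.
import uuid


def build_sample_rows(n=3):
    """Generate n audio rows plus matching add-on rows sharing identifiers."""
    audio_rows, addon_rows = [], []
    for i in range(n):
        identifier = str(uuid.uuid4())  # generated once, referenced in both "tables"
        audio_rows.append({"identifier": identifier, "title": f"Sample audio {i}"})
        addon_rows.append({"audio_identifier": identifier, "waveform_peaks": "[]"})
    return audio_rows, addon_rows
```

The output could then be serialized to the existing CSV layout (or loaded directly), so `load_sample_data.sh` stays simple while the cross-references are guaranteed consistent by construction.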

It also seems like a fix for the problem we ran into with the audio waveforms is to automatically load the sample data before the tests run. That's essentially what happens in CI already, because the tests run immediately after `just init`, but locally you can run into the edge case we saw with the waveform peaks.

@Prakharkarsh1

Hi @dhruvkb, I want to contribute to fixing this issue.

@dhruvkb
Member

dhruvkb commented Sep 10, 2022

@Prakharkarsh1 please go ahead. Feel free to comment here or alternatively ask in the #openverse channel in the Making WordPress Slack workspace if you need help.

@Prakharkarsh1

Yeah sure @dhruvkb thank you!

@obulat obulat transferred this issue from WordPress/openverse-api Feb 22, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Feb 23, 2023
@obulat obulat added 🧱 stack: api Related to the Django API and removed 🧱 stack: backend labels Mar 20, 2023
Projects
Status: 📋 Backlog
Development

No branches or pull requests

5 participants