Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IQSS/10379 Fix Metrics API Bugs #10865

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Sep 19, 2024

What this PR does / why we need it: This PR addresses a bug reported in #10379 where a cached value did not include the choice of whether to report local, remote (harvested), or all datasets for the /datasets and /datasets/monthly endpoints, resulting in incorrect values when different values were used and a cached value was available. The PR uses the existing mechanism to cache the results with the local/remote/all dataLocation choices to avoid this.

The PR also addresses the issue raised in #10710 (comment) where the old queries to find the latest relevant version were broken if/when the minor version was greater than 9. The new queries rely on ordering the results by both the major and minor version numbers and using DISTINCT ON to pick the first entry (the one with the highest major version, and among those, the highest minor version.

Which issue(s) this PR closes:

Closes #10379

Special notes for your reviewer: I did some checking of the results from the old/new queries in sql: e.g. in simplest form, comparing
select DISTINCT ON (datasetversion.dataset_id) datasetversion.id from datasetversion join dataset on dataset.id = datasetversion.dataset_id where versionstate='RELEASED' order by datasetversion.dataset_id, datasetversion.versionnumber desc, datasetversion.minorversionnumber desc;

with
select datasetversion.dataset_id || ':' || max(datasetversion.versionnumber || datasetversion.minorversionnumber) from datasetversion join dataset on dataset.id = datasetversion.dataset_id where versionstate='RELEASED' group by dataset_id;

and they appear to give the same results. In going through the queries for different metrics, I realized that the string value being returned was never used, so I've simplified the selects to just return just the dataset id (when all we're doing is count(*)) or the datasetversion id (when we're matching against the version to, for example, find filemetadatas in that version).

Suggestions on how to test this:
Compare the results of any/all of the metrics APIs with the changed code. A potential shortcut that covers most endpoints would be to install/run the https://github.com/gdcc/dv-metrics app and view it's webpage and then compare the graphs produced before and after the PR. Whether testing manually or via the app, you need to truncate metric; to remove the cached values or call the /clearMetricsCache api call.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: included.

Additional documentation:

@qqmyers qqmyers added the Size: 10 A percentage of a sprint. 7 hours. label Sep 19, 2024
@coveralls
Copy link

Coverage Status

coverage: 20.71% (-0.001%) from 20.711%
when pulling 35c9b5a on QualitativeDataRepository:IQSS/10379
into d4e9a4f on IQSS:develop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 10 A percentage of a sprint. 7 hours.
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Metrics caching doesn't handle dataLocation parameter
2 participants