standardize image url #10855

Merged
merged 7 commits into from
Sep 24, 2024

Conversation

stevenwinship
Contributor

What this PR does / why we need it: Currently, image_url returns base64 URLs for files and dataverses, while it returns a regular URL for datasets. The goal is to standardize all URLs to use the same format. We have chosen to use regular URLs instead of base64 for all cases.

Which issue(s) this PR closes:#10831

Closes #10831

Special notes for your reviewer:

Suggestions on how to test this: SearchIT has a test for datafiles. The collection case can be tested manually by adding a logo and calling a search. Verify that the image_url is present and reachable.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Yes, since this changes the body of the search response

Additional documentation:

@stevenwinship stevenwinship self-assigned this Sep 17, 2024
@stevenwinship stevenwinship added labels on Sep 17, 2024: Type: Feature (a feature request); User Role: API User (makes use of APIs); Size: 30 (a percentage of a sprint, 21 hours; formerly size:33); SPA (these changes are required for the Dataverse SPA); GREI Re-arch (issues related to the GREI Dataverse rearchitecture); FY25 Sprint 6

coveralls commented Sep 17, 2024

Coverage Status

coverage: 20.711% (-0.003%) from 20.714%
when pulling d215221 on 10831-standardize-image-url-of-search-api
into ec882e3 on develop.

@stevenwinship stevenwinship removed their assignment Sep 18, 2024

@qqmyers qqmyers self-assigned this Sep 18, 2024
----

- ** /api/search?q=**: Json values for image_url in DataFiles and Collections have changed from Base64URL ("data:image/png;base64,...) to "/api/access/datafile/{identifier}?imageThumb=true" and "/api/access/dvCardImage/{identifier}" respectively. This was done to match the image_url of Dataset.

v6.3
----

Member

In #10811 the following line was added:

"Note that the image_url field, if exists, will be returned as a regular URL for Datasets, while for Files and Dataverses, it will be returned as a Base64 URL. We plan to standardize this behavior so that the field always returns a regular URL. (See: #10831)"

Can we remove it?

Contributor Author

removed. also modified the example json
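
For readers comparing the old and new formats, here is a minimal sketch of the two URL shapes named in the release note above. It is not the PR's code: the class `ImageUrlSketch`, its method names, and the `baseUrl` parameter are hypothetical, and only the two paths are taken from the release note text.

```java
// Sketch only: ImageUrlSketch and its methods are hypothetical, not Dataverse classes.
final class ImageUrlSketch {

    private ImageUrlSketch() {
    }

    // Path quoted in the release note for DataFiles.
    // baseUrl is an assumption; the note only shows the path portion.
    static String dataFileImageUrl(String baseUrl, String identifier) {
        return baseUrl + "/api/access/datafile/" + identifier + "?imageThumb=true";
    }

    // Path quoted in the release note for Collections (Dataverses).
    static String collectionImageUrl(String baseUrl, String identifier) {
        return baseUrl + "/api/access/dvCardImage/" + identifier;
    }

    // True for the old Base64 form ("data:image/png;base64,..."),
    // false for a regular URL or a missing value.
    static boolean isBase64DataUrl(String imageUrl) {
        return imageUrl != null && imageUrl.startsWith("data:");
    }
}
```

A client that previously embedded the Base64 value directly could use a check like `isBase64DataUrl` to tell which form a given installation returns.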

Member

@qqmyers qqmyers left a comment

Changes look good overall. The PR is currently minimalistic (roughly per discussion), so I think the issue owner (@GPortas) should verify that it meets the need and either request changes or add new issues as required. Things I see that are relevant:

  • Both the new datafile and dataverse/collection links work when an image exists. They were existing API calls that have just been added to the search results in this PR.

  • When an image doesn't exist, the file call returns 404 and the collection call returns 204 (No Content). Should they be changed to be the same? Should an image_url not be sent when there isn't one?

  • There is currently no API call to add a Dataverse image, which, as @stevenwinship pointed out, means there's no automated test of the dataverse/collection download. Manual testing shows that it works, so it's probably not a big issue. That said, when is an upload API call going to be needed? Should that be added to this PR or be a new issue?

  • I'm guessing the file API call, which uses a query param, won't be cached by the browser. Is that something that should be fixed now/later (i.e. by providing a URL w/o a query param)? Is it even an issue given that in many cases, that call will end up redirecting to a signed S3 download URL for the thumbnail (does the browser cache answers from redirects?).

Given that I think the PR works as is, I'm going to approve it, but if anyone thinks more scope should be added before it moves forward, just note the change and move it back to in progress.
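
To make the 404 versus 204 discrepancy concrete, here is a rough sketch of what an API consumer would need in order to treat both responses as "no image". The class `ThumbnailStatus` and its `Outcome` enum are hypothetical client-side helpers, not part of Dataverse; only the status codes come from the review above.

```java
// Sketch only: ThumbnailStatus is a hypothetical client-side helper.
final class ThumbnailStatus {

    enum Outcome { IMAGE, NO_IMAGE, OTHER }

    private ThumbnailStatus() {
    }

    // Maps the HTTP status of a request to an image_url onto a single outcome.
    static Outcome classify(int httpStatus) {
        if (httpStatus == 200) {
            return Outcome.IMAGE;
        }
        // Per the review: the file call answers 404 and the collection call
        // answers 204 (No Content) when no image exists.
        if (httpStatus == 204 || httpStatus == 404) {
            return Outcome.NO_IMAGE;
        }
        // Redirects (for example to a signed download URL) and errors
        // are left for the caller to handle.
        return Outcome.OTHER;
    }
}
```

Every consumer would have to carry a mapping like this for as long as the two endpoints disagree, which is the cost the questions above point at.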

Member

pdurbin commented Sep 18, 2024

I'm putting it back to "in progress" for this: #10855 (comment)

@pdurbin pdurbin assigned stevenwinship and unassigned qqmyers Sep 18, 2024
@stevenwinship stevenwinship removed their assignment Sep 18, 2024

Comment on lines 10 to 13
v6.4
----

- ** /api/search?q=**: Json values for image_url in DataFiles and Collections have changed from Base64URL ("data:image/png;base64,...) to "/api/access/datafile/{identifier}?imageThumb=true" and "/api/access/dvCardImage/{identifier}" respectively. This was done to match the image_url of Dataset.
Member

Do we still need this release note if we merge this before we release 6.4?

Contributor Author

removed for now. can be re-added if this doesn't make 6.4

@pdurbin pdurbin added this to the 6.4 milestone Sep 18, 2024
Member

pdurbin commented Sep 18, 2024

As discussed in Slack, I'm giving this the 6.4 milestone (thanks for the 👍 @cmbz) because it fixes something (image_url) in #10811 which has already been merged and will be part of 6.4. Basically, since we're shipping an update to image_url, we should ship the fixed version.

@pdurbin pdurbin removed the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Sep 19, 2024
@pdurbin pdurbin added the Size: 3 A percentage of a sprint. 2.1 hours. label Sep 19, 2024
@GPortas GPortas self-assigned this Sep 19, 2024
Contributor

@GPortas GPortas left a comment

  1. image_url field when no image is available.

I find it a bit confusing and not very practical for the API consumer that, when no image exists, a URL is still returned in the image_url field, which results in 404 or 204 codes.

I think it’s more appropriate to prevent this scenario directly from the backend, rather than requiring all consumers to handle the logic of checking whether images actually exist.

After updating this, we should also mention the optionality of the field in the documentation.

  2. API call to add a Dataverse image

This API call will indeed be necessary for the SPA when we implement the functionality to add images to a collection.

Normally, when I extend the API, it often happens that I have to implement write endpoints just for the setup of the API tests for the read endpoints, as is the case here.

The good thing is that these endpoints will already be implemented by the time the related UI feature needs to be implemented in the SPA.

If it's not too big to implement now (given the upcoming 6.4 release), I would implement the endpoint in this PR. If not, we can keep it as is and implement it later along with the pending automated tests.

  3. Caching and query params & redirects

I would probably address this point in a future iteration, as I see it more as an optimization than a critical aspect for the SPA and other consumers to operate.

I also don't believe that switching from returning URLs with query parameters to using a different structure with path parameters is a backwards compatibility issue to worry about.


For Dataverse:

- "image_url" (optional)
Contributor

If the image_url is always returned, even though the URL returns a 404 for files and a 204 for collections, then it is not an optional field. Anyway, check my final review comment, where I suggest not adding the fields to the payload in these cases.

Contributor

@GPortas what if, when no image is available, the endpoint just returns image_url: null?
That way we would know that image_url can be either a string (image exists) or null.

Contributor Author

removed image_url if none exists
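
The outcome of this thread (leave image_url out of the payload when there is no image, rather than sending null or a URL that fails) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `SearchItemSketch`, `toJsonMap`, and the `name` and `type` keys are hypothetical, and a plain `Map` stands in for whatever JSON builder the real code uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: SearchItemSketch is hypothetical and uses a Map in place of a JSON builder.
final class SearchItemSketch {

    private SearchItemSketch() {
    }

    // Builds one search result; image_url is added only when an image exists.
    static Map<String, Object> toJsonMap(String name, String type, String imageUrlOrNull) {
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("name", name);
        json.put("type", type);
        if (imageUrlOrNull != null) {
            // Optional field: absent (not null) when there is no image.
            json.put("image_url", imageUrlOrNull);
        }
        return json;
    }
}
```

With this shape a consumer only needs to test for the key's presence, for example `item.containsKey("image_url")`, and never has to probe the URL to find out whether an image exists.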

@GPortas GPortas assigned stevenwinship and unassigned GPortas Sep 20, 2024


if (result.getEntity() == null) {
return null;
if (!result.isHarvested() && result.getEntity() != null && (!((DataFile)result.getEntity()).isRestricted()
Member

This should probably use the logic in

private boolean isAccessAuthorized(User requestUser, DataFile df) {
or
    if (dataFile.isRestricted()
            || !dataFile.isReleased()
            || FileUtil.isActivelyEmbargoed(dataFile)
            || FileUtil.isRetentionExpired(dataFile)) {
        return false;
    }
to address embargo and retention

Contributor Author

@qqmyers Wouldn't the Access call using the image_url go through the isAccessAuthorized() method anyway? If we supply the URL, the user still has to make the call to get the image, which would fail if they don't have access. I don't want to duplicate the code, so I would need to make it public and call it from the wrapper to prevent sending the URL.

Member

Yes, and it would make sense to change the code to public or move the relevant parts to a util class to avoid duplicating code.

Contributor Author

@qqmyers I went with the second option of copying the code from DatasetPage
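
As a rough illustration of the check discussed in this thread, the conditions quoted above can be collected into a single predicate that decides whether a thumbnail URL should be advertised at all. Everything here is a stand-in: `ThumbnailUrlPolicy` and `FileState` are hypothetical, and the boolean fields replace the `DataFile` and `FileUtil` calls quoted in the review so that the sketch runs on its own.

```java
// Sketch only: ThumbnailUrlPolicy and FileState are hypothetical stand-ins.
final class ThumbnailUrlPolicy {

    // Flattened view of the file state; in the quoted snippet these values come from
    // isRestricted(), isReleased(), isActivelyEmbargoed() and isRetentionExpired().
    static final class FileState {
        final boolean restricted;
        final boolean released;
        final boolean activelyEmbargoed;
        final boolean retentionExpired;

        FileState(boolean restricted, boolean released,
                  boolean activelyEmbargoed, boolean retentionExpired) {
            this.restricted = restricted;
            this.released = released;
            this.activelyEmbargoed = activelyEmbargoed;
            this.retentionExpired = retentionExpired;
        }
    }

    private ThumbnailUrlPolicy() {
    }

    // Returns true only when a thumbnail URL may be included in the search result.
    static boolean mayAdvertiseThumbnail(boolean harvested, FileState file) {
        if (harvested || file == null) {
            return false;
        }
        if (file.restricted
                || !file.released
                || file.activelyEmbargoed
                || file.retentionExpired) {
            return false;
        }
        return true;
    }
}
```

Keeping the conditions in one place is the point made in the review: the search wrapper and the access endpoint could then share the rule instead of each holding a copy.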

@GPortas GPortas added the SPA.Q3.1 Collection page results of all types label Sep 23, 2024
📦 Pushed preview images as

ghcr.io/gdcc/dataverse:10831-standardize-image-url-of-search-api
ghcr.io/gdcc/configbaker:10831-standardize-image-url-of-search-api

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@stevenwinship stevenwinship removed their assignment Sep 23, 2024
Contributor

@GPortas GPortas left a comment

Works good. Merging.

@GPortas GPortas merged commit de88c00 into develop Sep 24, 2024
20 checks passed
@GPortas GPortas deleted the 10831-standardize-image-url-of-search-api branch September 24, 2024 10:02
Labels
FY25 Sprint 6; GREI Re-arch (issues related to the GREI Dataverse rearchitecture); Original size: 30; Size: 3 (a percentage of a sprint, 2.1 hours); SPA.Q3.1 (Collection page results of all types); SPA (these changes are required for the Dataverse SPA); Type: Feature (a feature request); User Role: API User (makes use of APIs)
Projects
Status: Done 🧹
6 participants