-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
standardize image url #10855
standardize image url #10855
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
---- | ||
|
||
- ** /api/search?q=**: Json values for image_url in DataFiles and Collections have changed from Base64URL ("data:image/png;base64,...) to "/api/access/datafile/{identifier}?imageThumb=true" and "/api/access/dvCardImage/{identifier}" respectively. This was done to match the image_url of Dataset. | ||
|
||
v6.3 | ||
---- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #10811 the following line was added:
"Note that the image_url field, if exists, will be returned as a regular URL for Datasets, while for Files and Dataverses, it will be returned as a Base64 URL. We plan to standardize this behavior so that the field always returns a regular URL. (See: #10831)"
Can we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed. also modified the example json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good overall. The PR is currently minimalistic (~per discussion) so I think the issue owner (@GPortas ) should verify that it meets the need and either request changes or add new issues as required. Things I see that are relevant:
-
Both the new datafile and dataverse/collection links work when an image exists. They were existing API calls that have just been added to the search results in this PR.
-
When an image doesn't exist, the file call returns 404 and the collection returns 204(No Content). Should they be changed to be the same? Should an image_url not be sent when there isn't one?
-
There is currently no API call to add a Dataverse image, which, as @stevenwinship pointed out, means there's no automated test of the dataverse/collection download. Manual testing shows that it works, so probably not a big issues. That said - when is an upload API call going to be needed? Should that be added to this PR or be a new issue?
-
I'm guessing the file API call, which uses a query param, won't be cached by the browser. Is that something that should be fixed now/later (i.e. by providing a URL w/o a query param)? Is it even an issue given that in many cases, that call will end up redirecting to a signed S3 download URL for the thumbnail (does the browser cache answers from redirects?).
Given that I think the PR works as is, I'm going to approve it, but if anyone thinks more scope should be added before it moves forward, just note the change and move it back to in progress.
I'm putting it back to "in progress" for this: #10855 (comment) |
This comment has been minimized.
This comment has been minimized.
v6.4 | ||
---- | ||
|
||
- ** /api/search?q=**: Json values for image_url in DataFiles and Collections have changed from Base64URL ("data:image/png;base64,...) to "/api/access/datafile/{identifier}?imageThumb=true" and "/api/access/dvCardImage/{identifier}" respectively. This was done to match the image_url of Dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this release note if we merge this before we release 6.4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed for now. can be re-added if this doesn't make 6.4
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
image_url
field when no image is available.
I find it a bit confusing and not very practical for the API consumer that, when no image exists, a URL is still returned in the image_url
field, which results in 404 or 204 codes.
I think it’s more appropriate to prevent this scenario directly from the backend, rather than requiring all consumers to handle the logic of checking whether images actually exist.
After updating this, we should also mention the optionality of the field in the documentation.
- API call to add a Dataverse image
This API call will indeed be necessary for the SPA when we implement the functionality to add images to a collection.
Normally, when I extend the API, it often happens that I have to implement write endpoints just for the setup of the API tests for the read endpoints, as is the case here.
The good thing is that these endpoints will already be implemented by the time the related UI feature need to be implemented in the SPA.
If it’s not too big to implement now (Given the upcoming 6.4
release), I would implement the endpoint in this PR. If not, we can keep it as is and implement it later along with the pending automated tests.
- Caching and query params & redirects
I would probably address this point in a future iteration, as I see it more as an optimization than a critical aspect for the SPA and other consumers to operate.
I also don't believe that switching from returning URLs with query parameters to using a different structure with path parameters is a backwards compatibility issue to worry about.
|
||
For Dataverse: | ||
|
||
- "image_url" (optional) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the image_url is always returned, even though the URL returns a 404 for files and a 204 for collections, then it is not an optional field. Anyway, check my final review comment, where I suggest not adding the fields to the payload in these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GPortas what if when no image is available the endpoint just returns image_url: null
?
In that way we will know that image_url
could be either a string
(image exists) or null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed image_url if none exists
This comment has been minimized.
This comment has been minimized.
|
||
if (result.getEntity() == null) { | ||
return null; | ||
if (!result.isHarvested() && result.getEntity() != null && (!((DataFile)result.getEntity()).isRestricted() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably use the logic in
private boolean isAccessAuthorized(User requestUser, DataFile df) { |
dataverse/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
Lines 5831 to 5836 in 9a1e494
if(dataFile.isRestricted() | |
|| !dataFile.isReleased() | |
|| FileUtil.isActivelyEmbargoed(dataFile) | |
|| FileUtil.isRetentionExpired(dataFile)){ | |
return false; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qqmyers Wouldn't the Access call using the image_url go through the isAccessAuthorized() method anyways? So if we supply the url the user then needs to make the call to get the image which would fail is they don't have access. I don't want to duplicate the code so I would need to make it public and be able to call it from the wrapper to prevent sending the url.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and it would make sense to change the code to public or move the relevant parts to a util class to avoid duplicating code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qqmyers I went with the second option of copying the code from DatasetPage
📦 Pushed preview images as
🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works good. Merging.
What this PR does / why we need it: Currently, image_url returns base64 URLs for files and dataverses, while it returns a regular URL for datasets. The goal is to standardize all URLs to use the same format. We have chosen to use regular URLs instead of base64 for all cases.
Which issue(s) this PR closes:#10831
Closes #10831
Special notes for your reviewer:
Suggestions on how to test this: SearchIT has a test for datafiles. Collection test can be done manually by adding a logo and calling a search. verify that the image_url is there and is reachable.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: No
Is there a release notes update needed for this change?: Yes, since this changes the body of the search response
Additional documentation: