Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GDCC/9005 replace files api call #9018

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Oct 3, 2022

What this PR does / why we need it: This PR adds a /replaceFiles api call to allow bulk direct upload/out-of-band upload replace operations. It also include

  • some minor cleanup,
  • a bug fix for /addFiles where an error with one file would stop processing of further files and falsely report the same error for all the skipped files
  • minor improvement to report the bulk edit operations in the edit-drafts logs (as the single file versions do).

Which issue(s) this PR closes:

Closes #9005
Closes #5924

Special notes for your reviewer: No new tests - while this could be tested within an overall direct upload test (only on S3 with upload-redirect=true), it is tricky to test as a single call since one has to be able to place files in storage directly prior to running the /replaceFiles call. If that can be done in an IT test, then this and /addFiles could be tested this way.

Suggestions on how to test this: @janvanmansum /DANS should be able to test this as part of their migration activity so their review would be useful. Otherwise, manual testing can be done - easiest on an S3 store with direct-upload enabled. (One can temporarily add upload-redirect=true on a file store but this breaks upload in the UI). The examples in the docs should give a good idea of how to test. W.r.t. to regression, this does make minor changes to the overall add/replace mechanism so retesting the single file add/replace apis would be useful.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: added a small one

Additional documentation: included.

refactor to make multifile a separate boolean
remove unused LicenseBean from constructor
updated /addFiles logic to use clone
refactored steps 70/80 to work for multi-replace. i.e. by tracking
filesToDelete and the physical files to delete.
replace local Json readers with JsonUtil method
move sanity check on file deletes to DataFileServiceBean
hasError is not  cleared where it was being used causing one error to
skip all further add/replace calls and report that error for all
subsequent files
the title Add File Metadata has been misunderstood to mean the call can
change the metadata for existing files which it can't. The entry was
also in the File section when it is a dataset-level call
@qqmyers qqmyers added the GDCC: DANS related to GDCC work for DANS label Oct 3, 2022
@coveralls
Copy link

coveralls commented Oct 5, 2022

Coverage Status

Coverage decreased (-0.03%) to 19.948% when pulling 54871d1 on GlobalDataverseCommunityConsortium:GDCC/9005-replaceFiles_api_call into 11abccf on IQSS:develop.

@mreekie
Copy link

mreekie commented Dec 13, 2022

The crux is that we can now replace multiple files at once.
It is functionality that we have, just in a different API call.

Discussion:
DANS wanted this and may be that they have tested this already.
If we rely on their work then we would only need to do regression tests.
It would make us feel more confident if we can still do some QA
There are 9 files, but some are documentation which makes review easier
Jim will resolve the merge conflicts as part of the review.
Bulk add is already existing. DVUploader supports this using the DirectUpload API.
This is not a bulk add. It is a direct upload.
For bulk add, you can do it with a direct upload.
This does a bulk replace. It only works with a direct upload.
Will need S3 setup to test this. The test can be built but it will need to be built out to do 3 things.

It looks like this may close #5924. It may not as they might not be using direct upload. Does Amber using S3?

This gets a 33.

Next step:

@mreekie mreekie added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Dec 13, 2022
@mreekie
Copy link

mreekie commented Dec 14, 2022

added to sprint Dec 15, 2022

@sekmiller sekmiller self-assigned this Dec 19, 2022
@@ -2366,48 +2373,6 @@ The fully expanded example above (without environment variables) looks like this
Note: The ``id`` returned in the json response is the id of the file metadata version.



Adding File Metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you're removing this doc since the endpoint still exists and it seems like we would want to have the ability to add files' metadata, not just replace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint doesn't work as ~implied by this section and this placement in the guides. You can't change the metadata of existing files with it. It is a way to add new files that have been uploaded by direct S3, out-of-band means, or when using the remoteOverlayStore. This call is documented already at https://guides.dataverse.org/en/latest/developers/s3-direct-upload-api.html?highlight=addfiles#to-add-multiple-uploaded-files-to-the-dataset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. thanks for the explanation

Copy link
Contributor

@sekmiller sekmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of old methods in the add replace helper need to be updated. also a question about addFiles.

Copy link
Contributor

@sekmiller sekmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@sekmiller sekmiller removed their assignment Dec 20, 2022
@kcondon kcondon self-assigned this Dec 21, 2022
@kcondon
Copy link
Contributor

kcondon commented Jan 3, 2023

Issues found:
[x] 1. extra buttons on upload page appear during replace file workflow in UI if file types differ and you select delete.
[x] 2. example json for multiple file upload needs different filetoreplace ids.
[x] 3. JSON_DATA for multiple file replace needs to switch all " with ' and vice versa
4. Replacing 2 files removes first file but not replaced, correctly replaces the second file, no server.log error. Command line result says 2 files replaced: Result":{"Total number of files":2,"Number of files successfully replaced":2}

@kcondon kcondon assigned qqmyers and unassigned kcondon Jan 3, 2023
In the UI doing a file replace and, in the case where the mimetype does
not match the original, selecting delete could result in a broken upload
tab. This only occured with direct upload enabled and was due to the
delete call not reinitializing the upload tab for direct upload.
The replaceFiles call keeps one dataset/draft version open for all file
replacements and then calls the update version command. At some point
prior to this commit changes started causing the dataset being used to
be replaced by a fresh copy from the database for each file, losing
changes for all but the last file.
@qqmyers
Copy link
Member Author

qqmyers commented Jan 5, 2023

@kcondon - hopefully ready for QA again. 1) was not related to the PR, just to replacing when using direct upload, but I included a fix. Doc changes for 2),3) have been made, and I fixed 4) - looks like a change along the way broke things and made only the last file in the list be changed (with earlier ones being deleted with no replacement).

@qqmyers qqmyers removed their assignment Jan 5, 2023
@kcondon kcondon self-assigned this Jan 9, 2023
@kcondon kcondon merged commit f63f0e8 into IQSS:develop Jan 10, 2023
@pdurbin pdurbin added this to the 5.13 milestone Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GDCC: DANS related to GDCC work for DANS Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Bulk replace files API Is there an option to batch file replace?
6 participants