-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DD-539: files.xml items hetzij in de manifest, hetzij in pre-staged.csv #107
Conversation
…th filepaths in metadata/pre-staged.csv
src/test/scala/nl.knaw.dans.easy.validatebag/rules/metadata/MetadataRulesSpec.scala
Outdated
Show resolved
Hide resolved
@@ -512,8 +518,8 @@ package object metadata extends DebugEnhancedLogging { | |||
else set.mkString("{", ", ", "}") | |||
} | |||
|
|||
lazy val onlyInBag = stringDiff(payloadPaths, pathsInFileXml) | |||
lazy val onlyInFilesXml = stringDiff(pathsInFileXml, payloadPaths) | |||
lazy val onlyInBag = stringDiff(payloadAndPreStagedFilePaths, pathsInFileXml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The messages are clearer when you split this check resulting in only in payload
and only in pre-staged.csv
Not caused by you but perhaps it is more efficient to read the manifest, than walk the file system (again, as it will have been walked when comparing the manifest with the data folder)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't really split the message to only in payload
and only in pre-staged.csv
because together they have to match the filepaths in filesXml.
I left the reading of payload files as it is now (walking the file system), because the code is much more simple that way. If we would read the manifest file, the code would be something like this:
t.tryBag.map(bag => bag.getPayLoadManifests.asScala.find(_.getAlgorithm == SHA1).map(manifest => manifest.getFileToChecksumMap.keySet().asScala.map(p => t.bagDir.relativize(p)).toSet))
@@ -527,6 +533,25 @@ package object metadata extends DebugEnhancedLogging { | |||
} | |||
} | |||
|
|||
// borrowed from easy-split-multi-deposit/MultiDepositParser.scala |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a more readable example to reuse:
https://github.com/DANS-KNAW/easy-convert-bag-to-deposit/blob/7e3bd6bdd5a9ab01dc646c580bb459653c27fc42/src/main/scala/nl.knaw.dans.easy.bag2deposit/UserTransformer.scala#L27-L34
With r.getRecordNumber
you won't need zip-with-index, what is info you are not using anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Fixes DD-539
When applied it will
filesXml
correspond to thepayload files
combined with filepaths inpre-staged.csv
Where should the reviewer @DANS-KNAW/easy start?
How should this be manually tested?
Related pull requests on github