OAI-ORE and BagIT development #4706
After discussions at the 2018 Dataverse meeting (thanks!), I've tried to identify a list of things to get to a minimum viable solution. Comments welcome.
ORE updates:
Bag updates:
DPN updates:
An update w.r.t. code: the URI and namespace for metadata blocks and support for SHA-256/512, including an API call to verify the existing hash and replace it, if the file is still valid, with one from the new algorithm, are both done/in the branch. I'm currently working on default URIs for the citation metadata block, and adapting the bag. John Crabtree and I had a good call with Dave Pcolar of DPN today and it appears that either sending individual files or an ~RDA bag, either of which would be wrapped in a DPN bag for preservation, should be doable today. Sending a DPN bag directly is something that DPN is willing to work on, but it is not currently supported. From the discussion, it appears that sending files directly could have higher performance (due to parallel transfer), but the idea of using an ~RDA bag as a general export, a common intermediate/standard across possible preservation systems, sounds compelling, and I think sending a bag is currently the consensus option. We had some discussion of potential next steps w.r.t. versioning (perhaps just publishing the version changes, given Dataverse's ability to identify them) and how to assure that variable-level metadata is included (by including the DDI metadata file and/or adding to the ORE map).
After a second discussion with DPN and Odum, I've gone ahead with a consensus plan to enable optional submission to DPN as a post-publication Dataverse workflow as a v1 effort. Based on how DPN works, including the fact that initial submission is synchronous and reversible while creating a 'snapshot' to archive a space can have a delay and is irreversible (except for manual removal), the workflow creates a space named after the dataset globalId and uploads a BagIt bag, named after the globalId + version, and a datacite.xml file to it. The success of this step is reported in a new column on the versions tab, visible only to admins, that reports failure or provides a link to find the data in the DPN DuraCloud admin GUI. A curator would click the button to create a snapshot and monitor progress from there. Once the snapshot exists, the space is automatically emptied and can be deleted. Publishing a new version of a dataset will recreate the space, and the process can be repeated with a snapshot of the new Bag and datacite.xml file. (Versions are therefore stored as different snapshots of the same space.) I've been testing this in a 4.9.2-based QDR branch and it works reliably, though I did hit a DPN bug at one point.
As a side effect of the main effort, the datacite.xml file can be made available as a metadata export format (and it may be worth looking at adding more fields to it, as we just did with the citation formats). I've removed the Bag generation from the metadata export menu, where I initially tested it, for several reasons: it's not just metadata, it includes restricted files and access to it should be restricted, and it's better to stream it to DPN/generate it on demand rather than caching it (as it's similar in size to the whole dataset). I have a few things to finish up before this is ready for review/QA:
If anyone would like to see it early, I'd be happy to demo/discuss.
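For context on the datacite.xml export mentioned above: once available, it is served through Dataverse's standard metadata export API. A hedged example (the exporter name and the DOI below are illustrative and may differ by version):
# Fetch DataCite-format metadata for a published dataset (exporter name is an assumption)
curl "http://localhost:8080/api/datasets/export?exporter=Datacite&persistentId=doi:10.5072/FK2/EXAMPLE"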
@qqmyers what would the label on the button be? If it's easy to provide a screenshot of what you have so far, can you please add it here? Today in a design meeting we were thinking about UI impact on the dataset page, and I mentioned that at one point you were planning to put a button under "Export", even though we might want to consider a different place and name for it.
@pdurbin - the button you reference is one in the DuraCloud admin webapp, not Dataverse. I was originally thinking an 'export button' on the dataset page would be good, but since it could contain restricted files and is version-specific, I've gone for the admin-only column in the version table, which is currently something like below. Non-admins would just see the normal table.
@qqmyers thanks, I've been talking to @mheppler a bit about your screenshot. I know you wrote extensive documentation about what you're up to in "Data and Metadata Packaging for Archiving", which is ultimately linked from the Google Group post in the description of this issue, but here's a direct link: https://github.com/QualitativeDataRepository/dataverse/wiki/Data-and-Metadata-Packaging-for-Archiving People should also check out the discussion about the plan in the Google Group thread: https://groups.google.com/d/msg/dataverse-community/NZydpK_zXO0/vuvhnHL7AQAJ Another resource is @qqmyers's talk at the 2018 Dataverse Community Meeting, "Dataverse and Preservation in the Qualitative Data Repository", at https://drive.google.com/open?id=1fVhtw-R3Jf7wO4tgkNxpk3Mm93bUIjXP
@pdurbin - FYI - I've extended the workflow mechanism as discussed on the community call to allow system settings and an apiKey for the current user to be sent to a workflow step and, after some EJB fun, I think I have DPN submission working as a workflow, along with the ability to submit past versions via the GUI. I have some cleanup to do, but I'm about ready to submit a PR (or PRs) and would like to ask: there are a few things, like the workflow changes and making the export mechanism use streaming data, that were needed for DPN submission but could be submitted and reviewed as separate PRs. Would it be helpful to do that? That could be a little extra work for me, but I don't think it's that much since I have to compare between QDR's 4.9.2-based branch and develop anyway. It may help with review, but there would be dependencies between the PRs too. Let me know what you all think. Thanks!
@scolapasta any thoughts/guidance re: the approach in the above comment from @qqmyers? hooray workflows!
@qqmyers yes, generally, separate, smaller PRs are easier for us to review, QA, and merge. So since it isn't too much work on your side, we would prefer that approach.
@qqmyers I saw you made pull request #5049 and I assume it's the main one for this issue, so I dragged it to code review at https://waffle.io/IQSS/dataverse
Hi @qqmyers - thanks for talking about this earlier this week. The other PRs are being reviewed. The workflow-based integration here will be extremely useful and fulfills a long-standing community need. I have some concerns about the UI piece here. We’ll have a lot of moving pieces on the dataset and file pages as part of the redesign effort, so we don’t want to add any additional UI elements to the page right now, even if it’s only for superusers. It doesn’t appear there are API endpoints for the archiving that’s done via the UI shown in the screenshot above. If these endpoints could be added, I think it would allow the desired functionality while not adding additional challenge to the design team's work in flight. Let me know if you have any thoughts on the above. Thanks for all the PRs!
@djbrooke - there is an API, I just missed it in the merge (and just added it). For QDR, I think the ability for curators to see the status and be able to send things via the GUI will be important, but I can pull that part from the PR.
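For reference, a hedged sketch of calling that archiving API as a superuser; the exact path, HTTP method, and the dataset id/version used here are assumptions, so check the admin docs added in the PR for the authoritative form:
# Ask Dataverse to generate and submit the Bag for dataset 1234, version 1.0 (illustrative ids)
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "http://localhost:8080/api/admin/submitDatasetVersionToArchive/1234/1.0"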
Thanks @qqmyers. I saw some commits come in on the associated PR over the weekend. Is this ready for code review? Let me know if you'd like us to take another look.
@qqmyers API for existing dataset still failing. I can hold off testing if you are still working on it; just wanted to give the new account one last try. Saw this in the server log:
[2019-01-16T14:37:22.284-0500] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=836 _ThreadName=Thread-50] [timeMillis: 1547667442284] [levelValue: 800] [[
[2019-01-16T14:37:22.369-0500] [glassfish 4.1] [INFO] [] [] [tid: _ThreadID=836 _ThreadName=Thread-8] [timeMillis: 1547667442369] [levelValue: 800] [[
[2019-01-16T14:37:22.369-0500] [glassfish 4.1] [INFO] [] [] [tid: _ThreadID=836 _ThreadName=Thread-8] [timeMillis: 1547667442369] [levelValue: 800] [[
[2019-01-16T14:46:22.632-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=837 _ThreadName=pool-63-thread-1] [timeMillis: 1547667982632] [levelValue: 900] [[
[2019-01-16T14:46:22.634-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=837 _ThreadName=pool-63-thread-1] [timeMillis: 1547667982634] [levelValue: 1000] [[
[2019-01-16T14:46:22.635-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=837 _ThreadName=Thread-9] [timeMillis: 1547667982635] [levelValue: 1000] [[
[2019-01-16T14:46:22.635-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=837 _ThreadName=pool-63-thread-1] [timeMillis: 1547667982635] [levelValue: 1000] [[
[2019-01-16T14:46:22.636-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand] [tid: _ThreadID=836 _ThreadName=Thread-50] [timeMillis: 1547667982636] [levelValue: 1000] [[
[2019-01-16T14:46:22.637-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=836 _ThreadName=Thread-9] [timeMillis: 1547667982637] [levelValue: 1000] [[
[2019-01-16T14:46:25.362-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand] [tid: _ThreadID=31 _ThreadName=http-listener-1(4)] [timeMillis: 1547667985362] [levelValue: 900] [[
[2019-01-16T14:46:25.365-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=31 _ThreadName=Thread-9] [timeMillis: 1547667985365] [levelValue: 1000] [[
[2019-01-16T14:46:25.365-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=31 _ThreadName=Thread-9] [timeMillis: 1547667985365] [levelValue: 1000] [[
@kcondon - thanks for sending it back. I'm also seeing a problem with the API, but not the workflow at the moment, so some debugging is needed. I'll let you know when I figure out what's changed.
@kcondon - just uploaded a fix for the API. It looks like at some point the fact that the API is a call to the server, which then triggers file retrieval calls via HTTP, caused some deadlock. I was mostly testing from our GUI, which was the same code except for the initial HTTP call, so I missed the issue. In any case, the new async mechanism that works like indexing - the API call just starts the process and returns - works for me. (The workflow part should have been working all along...) So - I think you can look at this again. I'll go back tomorrow to look at your comments on the docs and see if I can make those clearer, but I won't touch the code unless you find issues.
@kcondon - made some clarifications/corrections in the docs, including 1) the :ArchiverClassName doesn't need to be listed in :ArchiverSettings or the workflow definition, and 2) the DuraCloud port and context are optional since they have defaults, but setting them only works if they are also listed in :ArchiverSettings. (FWIW: this split is so things can stay generic - :ArchiverSettings tells the generic code which properties to send to the archive-specific class, and then the archive-specific class uses those settings.) Hopefully it's all good at this point...
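To make that split concrete, a hedged sketch of the settings as they might be configured via the admin settings API; the class name matches what appears in this thread's logs, while the host value and the exact :ArchiverSettings list are illustrative:
# Which archiver implementation the generic workflow step should instantiate
curl -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand" http://localhost:8080/api/admin/settings/:ArchiverClassName
# Which settings the generic code is allowed to pass through to that class
curl -X PUT -d ":DuraCloudHost, :DuraCloudPort, :DuraCloudContext" http://localhost:8080/api/admin/settings/:ArchiverSettings
# The archiver-specific settings themselves (port and context are optional; they have defaults)
curl -X PUT -d "example.duracloud.org" http://localhost:8080/api/admin/settings/:DuraCloudHost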
@qqmyers API is working now, thanks. Am having trouble with the workflow, but it's likely a simple config issue: I've added a workflow using the sample file, replacing "string" with the values that were mentioned in the API section. However, when I publish a dataset it fails with this log error:
[2019-01-22T15:01:25.073-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.util.ArchiverUtil] [tid: _ThreadID=142 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548187285073] [levelValue: 900] [[
[2019-01-22T15:01:25.074-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.workflow.internalspi.ArchivalSubmissionWorkflowStep] [tid: _ThreadID=142 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548187285074] [levelValue: 1000] [[
[2019-01-22T15:01:25.075-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.workflow.WorkflowServiceBean] [tid: _ThreadID=142 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548187285075] [levelValue: 900] [[
[2019-01-22T15:01:25.074-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=142 _ThreadName=Thread-9] [timeMillis: 1548187285074] [levelValue: 1000] [[
[2019-01-22T15:01:25.080-0500] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.workflow.WorkflowServiceBean] [tid: _ThreadID=142 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548187285080] [levelValue: 800] [[
@kcondon - The issue may be that the "string" entries in the requiredSettings part of the JSON file aren't meant to be substituted. They just specify the data type for that setting so it can be passed appropriately. The actual values will come from the named settings you've already set up for the API.
@qqmyers No luck, still seeing this error:
[2019-01-22T16:38:03.900-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=143 _ThreadName=Thread-9] [timeMillis: 1548193083900] [levelValue: 1000] [[
[2019-01-22T16:38:03.901-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.workflow.internalspi.ArchivalSubmissionWorkflowStep] [tid: _ThreadID=143 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548193083901] [levelValue: 1000] [[
[2019-01-22T16:38:03.902-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.workflow.WorkflowServiceBean] [tid: _ThreadID=143 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548193083902] [levelValue: 900] [[
[2019-01-22T16:38:03.903-0500] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.workflow.WorkflowServiceBean] [tid: _ThreadID=143 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548193083903] [levelValue: 800] [[
@kcondon - hmmm. One other thought - did you update the PostPublishDataset workflow: My workflow json looks as follows: If you have these two, I think you should get the class created OK. (You'll need the :ArchiverSettings and :DuraCloudHost set as well to make it all go, but those should be good already with the API working.) Could there be a typo somewhere? The way this works is that the requiredSettings listed in the json are the only settings the workflow step gets to see (so steps can't read settings outside the set an admin has allowed through the workflow definition). If the setting exists, and the workflow definition lists it, you shouldn't be getting a null. (I did a quick check between my 'good' branch and this one and don't see any differences in how the settings are read/passed, so I don't think it's a code issue. If nothing here helps, I'll look again.)
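As an illustration of the shape being discussed, a minimal sketch of such a workflow definition; the name, provider, and step type are assumptions, and only the colon-prefixed requiredSettings keys and their "string" type values reflect what is described in this thread:
# Illustrative workflow definition (not the exact file from this thread)
cat > archiver-workflow.json <<'EOF'
{
  "name": "Archival submission workflow (example)",
  "steps": [
    {
      "provider": ":internal",
      "stepType": "archiver",
      "parameters": { "stepName": "archive submission" },
      "requiredSettings": {
        ":ArchiverClassName": "string",
        ":ArchiverSettings": "string",
        ":DuraCloudHost": "string",
        ":DuraCloudPort": "string",
        ":DuraCloudContext": "string"
      }
    }
  ]
}
EOF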
@qqmyers Thanks for the specifics. It looks like for some reason the workflow was not created the same way as yours. I believe I did a wget on the raw sample file from github and then ran the add-workflow endpoint. This is what I get:
curl http://localhost:8080/api/admin/workflows/4 | jq .
I'm wondering whether, since I am defaulting on port and context, that is causing the create workflow to skip part of the file? I'll try deleting and re-adding, both as is and with settings explicitly set. Other setting:
@kcondon - got it. It looks like the code in JsonParser to parse the requiredSettings didn't make it into one of the earlier PRs (i.e. the general workflow update, since this isn't specific to the archiver). Hopefully that's the last issue. To get this to work, I think you'll have to rebuild and then resubmit the workflow and set the PostPublishDataset workflow to the new one. Thanks again for persevering!
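A hedged sketch of those re-registration steps, using the workflows admin API that the curl commands in this thread already exercise; the file name and the workflow id are illustrative (the id is whatever the list call reports for the newly added workflow):
# Register the rebuilt workflow definition
curl -X POST -H "Content-type: application/json" -d @archiver-workflow.json http://localhost:8080/api/admin/workflows
# Find the new workflow's id, then set it as the PostPublishDataset default
curl http://localhost:8080/api/admin/workflows | jq .
curl -X PUT -d 5 http://localhost:8080/api/admin/workflows/default/PostPublishDataset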
@qqmyers OK, rebuilt, re-added the workflow, set it as default. Still fails with a null archiver.
[2019-01-22T18:14:17.503-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.util.ArchiverUtil] [tid: _ThreadID=150 _ThreadName=__ejb-thread-pool12] [timeMillis: 1548198857503] [levelValue: 900] [[
curl http://localhost:8080/api/admin/workflows/5 | jq .
Ran a few quick tests on 581e5dc in this branch. Existing RSAL workflows (RSAL 0.1) continue to work as expected (at least for the success path).
@kcondon - Argghh - I think you need colons in front of the requiredSettings keys, which is not what the example in the branch has... but it is what's working for me (see earlier comment). I'll commit an update...
@qqmyers That did the trick!
@qqmyers Have found some weirdness in the API, need to narrow it down: I'm heading out now, so I will look at it again tomorrow. Thanks for the help and fixes.
@kcondon - not sure I get the full picture, so here's some possibly useful info:
@qqmyers Thanks for the detail. It was a combination of the archivalCopyLocation value and the enabled workflow. I was not fully clearing out the prior entry. It's working, merged.
@kcondon - Great - thanks for all the work on this - it was interesting breaking this into multiple PRs and trying to keep them all in sync. Unfortunately, I think you ended up being the one who found problems when I didn't keep things straight. With the merge, can I delete the qdr.duracloud account we created for testing? I don't think there's any issue with recreating it if/when needed, but since it needs admin privs to create spaces, I'd like to close it down if you're not planning further testing/demoing with it.
@qqmyers Sure, feel free to delete the account.
This is an issue to track feedback related to developing a way to archive published datasets in DPN (http://dpn.org). I've done some proof-of-concept work to generate an OAI-ORE map file and BagIt bag (which uses and includes the ORE map file) for published datasets that I hope can form the basis for a DPN submission.
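A hedged example of retrieving the ORE map for a published dataset once this work is in place, via the same metadata export API used for the other formats; the exporter name and DOI are illustrative:
# Fetch the OAI-ORE map (JSON-LD) describing the dataset version and its files
curl "http://localhost:8080/api/datasets/export?exporter=OAI_ORE&persistentId=doi:10.5072/FK2/EXAMPLE"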
From https://groups.google.com/forum/#!topic/dataverse-community/NZydpK_zXO0 :