OAI-ORE and BagIT development #4706

Closed
qqmyers opened this issue May 23, 2018 · 79 comments
@qqmyers
Member

qqmyers commented May 23, 2018

This is an issue to track feedback related to developing a way to archive published datasets in DPN (http://dpn.org). I've done some proof-of-concept work to generate an OAI-ORE map file and BagIt bag (which uses and includes the ORE map file) for published datasets that I hope can form the basis for a DPN submission.
From https://groups.google.com/forum/#!topic/dataverse-community/NZydpK_zXO0 :

I've posted some documentation which describes the use case and design rationale and has a run-down of some of the choices we've made to get to a proof-of-concept, plus some open issues. That documentation links to two example files - a json-ld ORE map and a BagIt bag for a test dataset. (FWIW: We're developing in the QDR fork of Dataverse at https://github.com/QualitativeDataRepository/dataverse/tree/feature/QDR-953.)

@qqmyers
Member Author

qqmyers commented Jul 3, 2018

After discussions at the 2018 Dataverse meeting (thanks!), I've tried to identify a list of things to get to a minimum viable solution. Comments welcome.

ORE updates:

  • add a URI (and/or a block-level namespace) to the *.tsv import/api/metadata block model and use that URI, if it exists, instead of a generated one in the ORE map (keeping a generated one as the default for tsv files that have not been updated)
  • add sha256 (and sha512) as checksum options in Dataverse, and create a way to generate them for existing files (e.g. with a check against the existing md5)?
  • decide on the mapping of terms to URIs for internal DV metadata and citation.tsv metadata, the options being DDI (Disco?), DCTerms, schema.org, and/or custom Dataverse terms (as the current code has). If there are custom terms, create a page at the URL used.
  • submit the OAI-ORE exporter to Dataverse (i.e. make it available as an option in the export menu) - this will probably include an update to allow exporters to stream instead of generating a string in memory. (A sketch of the kind of map in question follows this list.)
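For orientation, here is a minimal sketch of the kind of JSON-LD ORE map under discussion. The overall shape (a ResourceMap describing an Aggregation that aggregates the dataset's files) follows the OAI-ORE model; the specific term URIs, the dvcore prefix, and the checksum representation are illustrative assumptions, not the exporter's exact output:

{
  "@context": {
    "ore": "http://www.openarchives.org/ore/terms/",
    "dcterms": "http://purl.org/dc/terms/",
    "dvcore": "https://dataverse.org/schema/core#"
  },
  "@type": "ore:ResourceMap",
  "dcterms:modified": "2018-07-03",
  "ore:describes": {
    "@id": "doi:10.5072/FK2/EXAMPLE",
    "@type": "ore:Aggregation",
    "dcterms:title": "A Test Dataset",
    "ore:aggregates": [
      {
        "@id": "https://demo.dataverse.org/api/access/datafile/1234",
        "dcterms:title": "data.csv",
        "dvcore:checksum": "SHA-256:0a1b2c..."
      }
    ]
  }
}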

Bag updates:

DPN updates:

  • Still TBD - planning discussion with them to decide whether to submit an ~RDA bag to be wrapped in a DPN bag, to submit a DPN bag directly, or perhaps to use the ORE map file to construct a bag via DPN's API.

@qqmyers
Member Author

qqmyers commented Aug 3, 2018

An update: w.r.t. code - the URI and namespace for metadata blocks and the support for sha256/512, including an api call to verify the existing hash and replace it, if the file is still valid, with one from the new algorithm, are both done/in the branch. I'm currently working on default URIs for the citation metadata block, and on adapting the bag.

John Crabtree and I had a good call with Dave Pcolar of DPN today and it appears that either sending individual files or an ~RDA bag, either of which would be wrapped in a DPN bag for preservation, should be doable today. Sending a DPN bag directly is something that DPN is willing to work on, but is not currently supported. From the discussion, it appears that sending files directly could have higher performance (due to parallel transfer) but the idea of using an ~RDA bag as a general export, common intermediate/standard across possible preservation systems sounds compelling and I think sending a bag is currently the consensus option. We had some discussion of potential next steps w.r.t. versioning (perhaps just publishing the version changes given Dataverse's ability to identify them) and how to assure that variable level metadata is included (by including the ddi metadata file and/or adding to the ORE map).
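For readers unfamiliar with the format, an ~RDA-conformant BagIt bag of the kind discussed here would be laid out roughly as follows. The exact bag and tag-file names (in particular the ORE map's name under metadata/) are assumptions based on the BagIt spec and the RDA recommendations, not the final layout:

doi-10.5072-FK2-EXAMPLEv1.0/
  bagit.txt                (BagIt version and tag-file encoding declaration)
  bag-info.txt             (source organization, bagging date, payload size)
  manifest-sha256.txt      (checksum and path for every payload file)
  tagmanifest-sha256.txt   (checksums for the tag files themselves)
  data/                    (the dataset's files, i.e. the payload)
    data.csv
  metadata/
    oai-ore.jsonld         (the ORE map, describing the aggregation)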

@qqmyers
Member Author

qqmyers commented Aug 22, 2018

After a second discussion with DPN and Odum, I've gone ahead with a consensus plan to enable optional submission to DPN as a post-publication Dataverse workflow, as a v1 effort. Based on how DPN works - in particular, the fact that initial submission is synchronous and reversible, while creating a 'snapshot' to archive a space can have a delay and is irreversible (except for manual removal) - the workflow creates a space, named after the dataset globalId, and uploads to it a BagIt bag, named after the globalId + version, and a datacite.xml file (see the sketch below). The success of this step is reported in a new column on the versions tab, visible only to admins, that reports failure or provides a link to find the data in the DPN DuraCloud admin GUI. A curator would click the button to create a snapshot and monitor progress from there. Once the snapshot exists, the space is automatically emptied and can be deleted. Publishing a new version of a dataset will recreate the space, and the process can be repeated with a snapshot of the new Bag and datacite.xml file. (Versions are therefore stored as different snapshots of the same space.)
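Concretely, once the workflow has run for version 1.0 of a dataset, the DuraCloud space would contain something like the following (the space and zip names match the pattern that shows up in the server logs later in this thread; the datacite file name is an assumption):

doi-10-5072-fk2-c0w2t8              (space, named after the dataset globalId)
  doi-10-5072-fk2-c0w2t8v1.0.zip    (the BagIt bag for version 1.0)
  datacite.v1.0.xml                 (the DataCite metadata for that version)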

I've been testing this in a 4.9.2-based QDR branch and it works reliably, though I did hit a DPN bug at one point. As a side-effect of the main effort, the datacite.xml file can be made available as a metadata export format (and it may be worth looking at adding more fields to it, as we just did with the citation formats). I've removed the Bag generation from the metadata export menu where I initially tested it, for several reasons: it's not just metadata; it includes restricted files, so access to it should be restricted; and it's better to stream it to DPN/generate it on demand rather than caching it (as it's similar in size to the whole dataset).

I have a few things to finish up before this is ready for review/QA:

  • suggest vocabulary matches for terms in citation.tsv
  • add a place for admins to download a Bag if/when the workflow fails
  • add a way to archive already-published datasets
  • add the datacite.xml file to the bag and check for any other RDA-recommended changes
  • make the update available as a PR off the current develop branch

If anyone would like to see it early, I'd be happy to demo/discuss.

@pdurbin
Member

pdurbin commented Aug 22, 2018

A curator would click the button to create a snapshot

@qqmyers what would the label on the button be? If it's easy to provide a screenshot of what you have so far, can you please add it here? Today in a design meeting we were thinking about UI impact on the dataset page, and I mentioned that at one point you were planning to put a button under "Export", even though we might want to consider a different place and name for it.

@qqmyers
Member Author

qqmyers commented Aug 23, 2018

@pdurbin - the button you reference is one in the DuraCloud admin webapp, not Dataverse. I was originally thinking an 'export' button on the dataset page would be good, but since the Bag could contain restricted files and is version specific, I've gone for the admin-only column in the version table, which currently looks something like the screenshot below. Non-admins would just see the normal table.
All that said, it's all just 'proposed' and I'm definitely interested in feedback w.r.t. the GUI, as well as who should be able to access the 'retry' functionality, etc.

[screenshot: dataset versions table with the proposed admin-only archival status column]

@pdurbin added the "UX & UI: Design" label Aug 23, 2018
@pdurbin
Member

pdurbin commented Aug 24, 2018

@qqmyers thanks, I've been talking to @mheppler a bit about your screenshot.

I know you wrote extensive documentation about what you're up to in "Data and Metadata Packaging for Archiving", which is ultimately linked from the Google Group post in the description of this issue, but here's a direct link: https://github.com/QualitativeDataRepository/dataverse/wiki/Data-and-Metadata-Packaging-for-Archiving

People should also check out the discussion about the plan in the Google Group thread: https://groups.google.com/d/msg/dataverse-community/NZydpK_zXO0/vuvhnHL7AQAJ

Another resource is @qqmyers's talk at the 2018 Dataverse Community Meeting, "Dataverse and Preservation in the Qualitative Data Repository": https://drive.google.com/open?id=1fVhtw-R3Jf7wO4tgkNxpk3Mm93bUIjXP

@qqmyers
Member Author

qqmyers commented Sep 4, 2018

@pdurbin - FYI - I've extended the workflow mechanism as discussed on the community call, to allow system settings and an apiKey for the current user to be sent to a workflow step, and, after some EJB fun, I think I have DPN submission working as a workflow, along with the ability to submit past versions via the GUI. I have some cleanup to do, but I'm about ready to submit a PR (or PRs) and would like to ask: there are a few things, like the workflow changes and making the export mechanism use streaming data, that were needed for DPN submission but could be submitted and reviewed as separate PRs. Would it be helpful to do that? It could be a little extra work for me, but I don't think it's that much, since I have to compare between QDR's 4.9.2-based branch and develop anyway. It may help with review, but there would be dependencies between the PRs too. Let me know what you all think. Thanks!

@djbrooke
Contributor

djbrooke commented Sep 5, 2018

@scolapasta any thoughts/guidance re: the approach in the above comment from @qqmyers?

hooray workflows!

@scolapasta
Contributor

@qqmyers yes, generally, having separate, smaller PRs makes it easier for us to review, QA, and merge. So since it isn't too much work on your side, we would prefer that approach.

@pdurbin
Member

pdurbin commented Sep 12, 2018

@qqmyers I saw you made pull request #5049 and I assume it's the main one for this issue so I dragged it to code review at https://waffle.io/IQSS/dataverse

@djbrooke
Contributor

Hi @qqmyers - thanks for talking about this earlier this week. The other PRs are being reviewed. The workflow-based integration here will be extremely useful and is a fulfillment of a long-standing community need.

I have some concerns about the UI piece here. We'll have a lot of moving pieces on the dataset and file pages as part of the redesign effort, so we don't want to add any additional UI elements to the page right now, even if it's only for superusers. It doesn't appear there are API endpoints for the archiving functionality shown in the screenshot above. If these endpoints could be added, I think it would allow the desired functionality while not adding additional challenge to the design team's work in flight.

Let me know if you have any thoughts on the above. Thanks for all the PRs!

@qqmyers
Member Author

qqmyers commented Sep 14, 2018

@djbrooke - there is an api; I just missed it in the merge (and have just added it). For QDR, I think the ability for curators to see the status and be able to send things via the GUI will be important, but I can pull that part from the PR.
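For anyone following along, the call would look something like the sketch below. The method name (submitDatasetVersionToArchive) is visible in the server logs later in this thread; the exact URL shape and the {dataset-id}/{version} placeholders are assumptions, not confirmed syntax:

curl -X POST -H "X-Dataverse-key: $API_TOKEN" \
  "http://localhost:8080/api/admin/submitDatasetVersionToArchive/{dataset-id}/{version}"

Like other admin endpoints, this would normally be invoked from localhost by an administrator.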

@djbrooke
Contributor

Thanks @qqmyers. I saw some commits come in on the associated PR over the weekend. Is this ready for code review? Let me know if you'd like us to take another look.

@kcondon assigned qqmyers and unassigned kcondon Jan 16, 2019
@kcondon
Contributor

kcondon commented Jan 16, 2019

@qqmyers The API for an existing dataset is still failing. I can hold off testing if you are still working on it; just wanted to give the new account one last try. Saw this in the server log:

[2019-01-16T14:37:22.284-0500] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=836 _ThreadName=Thread-50] [timeMillis: 1547667442284] [levelValue: 800] [[
Generating: Bag to the Future!]]

[2019-01-16T14:37:22.369-0500] [glassfish 4.1] [INFO] [] [] [tid: _ThreadID=836 _ThreadName=Thread-8] [timeMillis: 1547667442369] [levelValue: 800] [[
Using index: 0]]

[2019-01-16T14:37:22.369-0500] [glassfish 4.1] [INFO] [] [] [tid: _ThreadID=836 _ThreadName=Thread-8] [timeMillis: 1547667442369] [levelValue: 800] [[
Using index: 1]]

[2019-01-16T14:46:22.632-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=837 _ThreadName=pool-63-thread-1] [timeMillis: 1547667982632] [levelValue: 900] [[
Attempt# 5 : Unable to retrieve file: https://dataverse-internal.iq.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.5072/FK2/C0W2T8/ZBNKAL
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:313)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at edu.harvard.iq.dataverse.util.bagit.BagGenerator$3.get(BagGenerator.java:987)
at org.apache.commons.compress.archivers.zip.ZipArchiveEntryRequest.getPayloadStream(ZipArchiveEntryRequest.java:62)
at org.apache.commons.compress.archivers.zip.ScatterZipOutputStream.addArchiveEntry(ScatterZipOutputStream.java:95)
at org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator$2.call(ParallelScatterZipCreator.java:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
]]

[2019-01-16T14:46:22.634-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=837 _ThreadName=pool-63-thread-1] [timeMillis: 1547667982634] [levelValue: 1000] [[
Final attempt failed for https://dataverse-internal.iq.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.5072/FK2/C0W2T8/ZBNKAL]]

[2019-01-16T14:46:22.635-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=837 _ThreadName=Thread-9] [timeMillis: 1547667982635] [levelValue: 1000] [[
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:313)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at edu.harvard.iq.dataverse.util.bagit.BagGenerator$3.get(BagGenerator.java:987)
at org.apache.commons.compress.archivers.zip.ZipArchiveEntryRequest.getPayloadStream(ZipArchiveEntryRequest.java:62)
at org.apache.commons.compress.archivers.zip.ScatterZipOutputStream.addArchiveEntry(ScatterZipOutputStream.java:95)
at org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator$2.call(ParallelScatterZipCreator.java:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)]]

[2019-01-16T14:46:22.635-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.util.bagit.BagGenerator] [tid: _ThreadID=837 _ThreadName=pool-63-thread-1] [timeMillis: 1547667982635] [levelValue: 1000] [[
Could not read: https://dataverse-internal.iq.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.5072/FK2/C0W2T8/ZBNKAL]]

[2019-01-16T14:46:22.636-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand] [tid: _ThreadID=836 _ThreadName=Thread-50] [timeMillis: 1547667982636] [levelValue: 1000] [[
Error creating bag: java.lang.NullPointerException]]

[2019-01-16T14:46:22.637-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=836 _ThreadName=Thread-9] [timeMillis: 1547667982637] [levelValue: 1000] [[
java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator.writeTo(ParallelScatterZipCreator.java:245)
at edu.harvard.iq.dataverse.util.bagit.BagGenerator.writeTo(BagGenerator.java:730)
at edu.harvard.iq.dataverse.util.bagit.BagGenerator.generateBag(BagGenerator.java:312)
at edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand$2.run(DuraCloudSubmitToArchiveCommand.java:125)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.commons.compress.archivers.zip.StreamCompressor.deflate(StreamCompressor.java:183)
at org.apache.commons.compress.archivers.zip.ScatterZipOutputStream.addArchiveEntry(ScatterZipOutputStream.java:96)
at org.apache.commons.compress.archivers.zip.ParallelScatterZipCreator$2.call(ParallelScatterZipCreator.java:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more]]

[2019-01-16T14:46:25.362-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand] [tid: _ThreadID=31 _ThreadName=http-listener-1(4)] [timeMillis: 1547667985362] [levelValue: 900] [[
Error attempting to add content 'doi-10-5072-fk2-c0w2t8v1.0.zip' in 'doi-10-5072-fk2-c0w2t8' due to: null]]

[2019-01-16T14:46:25.365-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=31 _ThreadName=Thread-9] [timeMillis: 1547667985365] [levelValue: 1000] [[
org.duracloud.error.ContentStoreException: Error attempting to add content 'doi-10-5072-fk2-c0w2t8v1.0.zip' in 'doi-10-5072-fk2-c0w2t8' due to: null
at org.duracloud.client.ContentStoreImpl.doAddContent(ContentStoreImpl.java:661)
at org.duracloud.client.ContentStoreImpl.addContent(ContentStoreImpl.java:586)
at edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand.performArchiveSubmission(DuraCloudSubmitToArchiveCommand.java:134)
at edu.harvard.iq.dataverse.engine.command.impl.AbstractSubmitToArchiveCommand.execute(AbstractSubmitToArchiveCommand.java:52)
at edu.harvard.iq.dataverse.engine.command.impl.AbstractSubmitToArchiveCommand.execute(AbstractSubmitToArchiveCommand.java:21)
at edu.harvard.iq.dataverse.EjbDataverseEngine.submit(EjbDataverseEngine.java:232)
at sun.reflect.GeneratedMethodAccessor2270.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1081)
at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1153)
at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4786)
at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:656)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:64)
at org.jboss.weld.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
at sun.reflect.GeneratedMethodAccessor1545.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
at sun.reflect.GeneratedMethodAccessor1546.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:369)
at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4758)
at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4746)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:88)
at com.sun.proxy.$Proxy671.submit(Unknown Source)
at edu.harvard.iq.dataverse._EJB31_Generated__EjbDataverseEngine__Intf__Bean.submit(Unknown Source)
at edu.harvard.iq.dataverse.api.Admin.submitDatasetVersionToArchive(Admin.java:1321)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1081)
at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1153)
at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4786)
at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:656)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:64)
at org.jboss.weld.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
at sun.reflect.GeneratedMethodAccessor1545.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
at sun.reflect.GeneratedMethodAccessor1546.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:369)
at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4758)
at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4746)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:88)
at com.sun.proxy.$Proxy600.submitDatasetVersionToArchive(Unknown Source)
at edu.harvard.iq.dataverse.api._EJB31_Generated__Admin__Intf__Bean.submitDatasetVersionToArchive(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:387)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:331)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:103)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:372)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:344)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1682)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:344)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:226)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at edu.harvard.iq.dataverse.api.ApiBlockingFilter$3.doBlock(ApiBlockingFilter.java:65)
at edu.harvard.iq.dataverse.api.ApiBlockingFilter.doFilter(ApiBlockingFilter.java:157)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:30)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.apache.catalina.core.ApplicationDispatcher.doInvoke(ApplicationDispatcher.java:873)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:739)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:575)
at org.apache.catalina.core.ApplicationDispatcher.doDispatch(ApplicationDispatcher.java:546)
at org.apache.catalina.core.ApplicationDispatcher.dispatch(ApplicationDispatcher.java:428)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:378)
at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:34)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:316)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:174)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:412)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:282)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:459)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:167)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:201)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:175)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:235)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:561)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.http.client.ClientProtocolException
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.duracloud.common.web.RestHttpHelper.executeRequest(RestHttpHelper.java:292)
at org.duracloud.common.web.RestHttpHelper.put(RestHttpHelper.java:192)
at org.duracloud.client.ContentStoreImpl.doAddContent(ContentStoreImpl.java:625)
... 145 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:108)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
... 149 more
Caused by: java.io.IOException: Pipe broken
at java.io.PipedInputStream.read(PipedInputStream.java:321)
at java.io.PipedInputStream.read(PipedInputStream.java:377)
at java.security.DigestInputStream.read(DigestInputStream.java:161)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:133)
at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:121)
at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
... 151 more]]

@qqmyers
Member Author

qqmyers commented Jan 16, 2019

@kcondon - thanks for sending it back. I'm also seeing a problem with the API, but not the workflow, at the moment, so some debugging is needed. I'll let you know when I figure out what's changed.

@qqmyers
Member Author

qqmyers commented Jan 17, 2019

@kcondon - just uploaded a fix for the api. It looks like, at some point, the fact that the api call goes to the server, which then triggers file retrieval calls via http, started causing a deadlock. I was mostly testing from our GUI, which ran the same code except for the initial http call, so I missed the issue. In any case, the new async mechanism, which works like indexing - the api call just starts the process and returns - works for me. (The workflow part should have been working all along...)

So - I think you can look at this again. I'll go back tomorrow to look at your comments on the docs and see if I can make those clearer, but I won't touch the code unless you find issues.

@qqmyers
Member Author

qqmyers commented Jan 18, 2019

@kcondon - made some clarifications/corrections in the docs, including: 1) the :ArchiverClassName doesn't need to be listed in :ArchiverSettings or the workflow definition, and 2) the DuraCloud port and context are optional since they have defaults, but setting them only works if they are also listed in :ArchiverSettings. (FWIW: this split is so things can be generic - :ArchiverSettings tells the generic code which properties to send to the archive-specific class, and then the archive-specific class uses those settings.) Hopefully it's all good at this point...
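Putting that together, a minimal DuraCloud configuration along the lines described would look roughly like this (the settings API calls are standard; the :ArchiverSettings value as a comma-separated list of setting names and the example host are assumptions based on this comment, not confirmed syntax):

curl -X PUT -d edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand http://localhost:8080/api/admin/settings/:ArchiverClassName
curl -X PUT -d ':DuraCloudHost, :DuraCloudPort, :DuraCloudContext' http://localhost:8080/api/admin/settings/:ArchiverSettings
curl -X PUT -d test.duracloud.org http://localhost:8080/api/admin/settings/:DuraCloudHost

Since :DuraCloudPort and :DuraCloudContext have defaults, they only need to be set when overriding them, and per the above, the override only takes effect when they are also listed in :ArchiverSettings.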

@kcondon self-assigned this Jan 22, 2019
@kcondon
Contributor

kcondon commented Jan 22, 2019

@qqmyers API is working now, thanks. Am having trouble with the workflow, but it's likely a simple config issue: I've added a workflow using the sample file, replacing "string" with the values that were mentioned in the api section. However, when I publish a dataset it fails with this log error:

[2019-01-22T15:01:25.073-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.util.ArchiverUtil] [tid: _ThreadID=142 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548187285073] [levelValue: 900] [[
Unable to instantiate an Archiver of class: null]]

[2019-01-22T15:01:25.074-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.workflow.internalspi.ArchivalSubmissionWorkflowStep] [tid: _ThreadID=142 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548187285074] [levelValue: 1000] [[
No Archiver instance could be created for name: null]]

[2019-01-22T15:01:25.075-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.workflow.WorkflowServiceBean] [tid: _ThreadID=142 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548187285075] [levelValue: 900] [[
Workflow 541c4e8b-4261-4cd1-8f91-a7c3ffbc758f failed: No Archiver]]

[2019-01-22T15:01:25.074-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=142 _ThreadName=Thread-9] [timeMillis: 1548187285074] [levelValue: 1000] [[
java.lang.NullPointerException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at edu.harvard.iq.dataverse.util.ArchiverUtil.createSubmitToArchiveCommand(ArchiverUtil.java:25)
at edu.harvard.iq.dataverse.workflow.internalspi.ArchivalSubmissionWorkflowStep.run(ArchivalSubmissionWorkflowStep.java:52)
at edu.harvard.iq.dataverse.workflow.WorkflowServiceBean.runStep(WorkflowServiceBean.java:256)
at edu.harvard.iq.dataverse.workflow.WorkflowServiceBean.executeSteps(WorkflowServiceBean.java:221)
at edu.harvard.iq.dataverse.workflow.WorkflowServiceBean.forward(WorkflowServiceBean.java:161)
at edu.harvard.iq.dataverse.workflow.WorkflowServiceBean.start(WorkflowServiceBean.java:102)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1081)
at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1153)
at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4786)
at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:656)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:73)
at org.jboss.weld.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
at sun.reflect.GeneratedMethodAccessor76751.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
at sun.reflect.GeneratedMethodAccessor76752.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:369)
at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4758)
at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4746)
at com.sun.ejb.containers.EjbAsyncTask.call(EjbAsyncTask.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)]]

[2019-01-22T15:01:25.080-0500] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.workflow.WorkflowServiceBean] [tid: _ThreadID=142 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548187285080] [levelValue: 800] [[
Removing workflow lock]]

@qqmyers
Member Author

qqmyers commented Jan 22, 2019

@kcondon - The issue may be that the "string" entries in the requiredSettings part of the json file aren't meant to be substituted. They just specify the data type for that setting so it can be passed appropriately. The actual values come from the named settings you've already set up for the API.

@kcondon
Contributor

kcondon commented Jan 22, 2019

@qqmyers No luck, still seeing this error:
[2019-01-22T16:38:03.892-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.util.ArchiverUtil] [tid: _ThreadID=143 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548193083892] [levelValue: 900] [[
Unable to instantiate an Archiver of class: null]]

[2019-01-22T16:38:03.900-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=143 _ThreadName=Thread-9] [timeMillis: 1548193083900] [levelValue: 1000] [[
java.lang.NullPointerException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at edu.harvard.iq.dataverse.util.ArchiverUtil.createSubmitToArchiveCommand(ArchiverUtil.java:25)
at edu.harvard.iq.dataverse.workflow.internalspi.ArchivalSubmissionWorkflowStep.run(ArchivalSubmissionWorkflowStep.java:52)
at edu.harvard.iq.dataverse.workflow.WorkflowServiceBean.runStep(WorkflowServiceBean.java:256)
at edu.harvard.iq.dataverse.workflow.WorkflowServiceBean.executeSteps(WorkflowServiceBean.java:221)
at edu.harvard.iq.dataverse.workflow.WorkflowServiceBean.forward(WorkflowServiceBean.java:161)
at edu.harvard.iq.dataverse.workflow.WorkflowServiceBean.start(WorkflowServiceBean.java:102)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1081)
at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1153)
at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4786)
at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:656)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:73)
at org.jboss.weld.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
at sun.reflect.GeneratedMethodAccessor131.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:369)
at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4758)
at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4746)
at com.sun.ejb.containers.EjbAsyncTask.call(EjbAsyncTask.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)]]

[2019-01-22T16:38:03.901-0500] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.workflow.internalspi.ArchivalSubmissionWorkflowStep] [tid: _ThreadID=143 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548193083901] [levelValue: 1000] [[
No Archiver instance could be created for name: null]]

[2019-01-22T16:38:03.902-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.workflow.WorkflowServiceBean] [tid: _ThreadID=143 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548193083902] [levelValue: 900] [[
Workflow ba91b595-3a05-47c5-bf14-10e56f4b8797 failed: No Archiver]]

[2019-01-22T16:38:03.903-0500] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.workflow.WorkflowServiceBean] [tid: _ThreadID=143 _ThreadName=__ejb-thread-pool1] [timeMillis: 1548193083903] [levelValue: 800] [[
Removing workflow lock]]

@qqmyers
Member Author

qqmyers commented Jan 22, 2019

@kcondon - hmmm. One other thought - did you update the PostPublishDataset default, i.e. does
curl http://localhost:8080/api/admin/workflows/default/PostPublishDataset
return the id of a good workflow (ids autoincrement)? If that's not it...

My workflow json looks as follows:
curl http://localhost:8080/api/admin/workflows/10
{
  "status": "OK",
  "data": {
    "name": "Archive submission workflow",
    "id": 10,
    "steps": [
      {
        "stepType": "archiver",
        "provider": ":internal",
        "parameters": {
          "stepName": "archive submission"
        },
        "requiredSettings": {
          ":ArchiverClassName": "string",
          ":ArchiverSettings": "string",
          ":DuraCloudPort": "string",
          ":DuraCloudContext": "string",
          ":DuraCloudHost": "string"
        }
      }
    ]
  }
}
and my :ArchiverClassName is set as:
curl http://localhost:8080/api/admin/settings/:ArchiverClassName
{"status":"OK","data":{"message":"edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand"}}

If you have these two, I think you should get the class created OK. (You'll need :ArchiverSettings and :DuraCloudHost set as well to make it all go, but those should be good already since the API is working.)

Could there be a typo somewhere? The way this works is that the requiredSettings listed in the json are the only settings the workflow step gets to see (so steps can't read settings outside the set an admin has allowed through the workflow definition). If the setting exists and the workflow definition lists it, you shouldn't be getting a null. (I did a quick check between my 'good' branch and this one and don't see any differences in how the settings are read/passed, so I don't think it's a code issue. If nothing here helps, I'll look again.)

@kcondon
Contributor

kcondon commented Jan 22, 2019

@qqmyers Thanks for the specifics. It looks like for some reason the workflow was not created the same way as yours. I believe I did a wget on the raw sample file from github and then ran the add workflow endpoint. This is what I get:

curl http://localhost:8080/api/admin/workflows/4 | jq .
{
  "data": {
    "steps": [
      {
        "requiredSettings": {},
        "parameters": {
          "stepName": "archive submission"
        },
        "provider": ":internal",
        "stepType": "archiver"
      }
    ],
    "id": 4,
    "name": "Archive submission workflow"
  },
  "status": "OK"
}

I'm wondering whether, since I am defaulting on port and context, that is causing the create workflow to skip part of the file? I'll try deleting and re-adding, both as-is and with the settings explicitly set.

Other setting:
curl http://localhost:8080/api/admin/settings/:ArchiverClassName
{"status":"OK","data":{"message":"edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand"}}

@qqmyers
Member Author

qqmyers commented Jan 22, 2019

@kcondon - got it. It looks like the code in JsonParser that parses the requiredSettings didn't make it into one of the earlier PRs (i.e. the general workflow update, since this isn't specific to the archiver). Hopefully that's the last issue. To get this to work, I think you'll have to rebuild, then resubmit the workflow and set the PostPublishDataset workflow to the new one. Thanks again for persevering!

@kcondon
Contributor

kcondon commented Jan 22, 2019

@qqmyers OK, rebuilt, re-added the workflow, set it as default. Still fails with a null archiver.

[2019-01-22T18:14:17.503-0500] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.util.ArchiverUtil] [tid: _ThreadID=150 _ThreadName=__ejb-thread-pool12] [timeMillis: 1548198857503] [levelValue: 900] [[
Unable to instantiate an Archiver of class: null]]

curl http://localhost:8080/api/admin/workflows/5 | jq .
{
  "data": {
    "steps": [
      {
        "requiredSettings": {
          "DuraCloudPort": "string",
          "ArchiverClassName": "string",
          "DuraCloudContext": "string",
          "ArchiverSettings": "string",
          "DuraCloudHost": "string"
        },
        "parameters": {
          "stepName": "archive submission"
        },
        "provider": ":internal",
        "stepType": "archiver"
      }
    ],
    "id": 5,
    "name": "Archive submission workflow"
  },
  "status": "OK"
}

@pameyer
Contributor

pameyer commented Jan 22, 2019

Ran a few quick tests on 581e5dc in this branch. Existing RSAL workflows (RSAL 0.1) continue to work as expected (at least for the success path).

@qqmyers
Member Author

qqmyers commented Jan 22, 2019

@kcondon - Argghh - I think you need colons in front of the requiredSettings names, which is not what the example in the branch has... but it is what's working for me (see the earlier comment). I'll commit an update...

@kcondon
Contributor

kcondon commented Jan 22, 2019

@qqmyers That did the trick!

@kcondon
Contributor

kcondon commented Jan 22, 2019

@qqmyers Have found some weirdness in the API; need to narrow it down:
Using the API, I tried to archive v2.0 of a dataset (not yet archived) using a key from a user with no particular perms. It says error, version already archived. It turns out it did archive it, even though it had not been archived before.

I'm heading out now so will look at it again tomorrow. Thanks for the help and fixes.

@qqmyers
Member Author

qqmyers commented Jan 23, 2019

@kcondon - not sure I get the full picture, so here's some possibly useful info:

  • any user able to publish the dataset should be allowed to use the api on that dataset. As far as I can tell, a permission error should get an initial OK response and then a log message about "Unexpected Exception calling submit archive command..."
  • the BAD_REQUEST, "Version already archived at: ..." message should only occur if the archivalCopyLocation for that dataset version is not null in the db.
  • versions can all be archived independently - no order is required, and the test for whether something has been archived is for the specified version
  • archiving any version of a dataset should fail if a prior version was archived (went far enough to create the space on DuraCloud) and the space has not been deleted. (All versions use the same space and, to avoid overwriting a version that was submitted but not yet manually transferred to Chronopolis, neither the api nor the workflow can write a new version until the space is deleted.)

@kcondon
Contributor

kcondon commented Jan 23, 2019

@qqmyers Thanks for the detail. It was a combination of the archivalCopyLocation value and the enabled workflow. I was not fully clearing out the prior entry. It's working, merged.

@qqmyers
Member Author

qqmyers commented Jan 23, 2019

@kcondon - Great - thanks for all the work on this - it was interesting breaking this into multiple PRs and trying to keep them all in sync. Unfortunately, I think you ended up being the one who found the problems when I didn't keep things straight.

With the merge, can I delete the qdr.duracloud account we created for testing? I don't think there's any issue with recreating it if/when needed, but since it needs admin privs to create spaces, I'd like to close it down if you're not planning further testing/demoing with it.

@kcondon
Contributor

kcondon commented Jan 23, 2019

@qqmyers Sure, feel free to delete the account.
