Harvesting : message "javax.ejb.EJBTransactionRolledbackException, Exception thrown from bean: java.lang.NullPointerException" #9318

Open
arnaumevi opened this issue Jan 24, 2023 · 3 comments

Comments

@arnaumevi
arnaumevi commented Jan 24, 2023

Hi,
I'm having trouble with harvesting clients in Dataverse 5.11.1.
I get the message "javax.ejb.EJBTransactionRolledbackException, Exception thrown from bean: java.lang.NullPointerException" in the server log.

Client configurations:

  • Alias : UAB
  • Server URL : https://ddd.uab.cat/oai2d
  • OAI Set : datasets
  • Metadata Format : oai_dc
  • Archive type : Generic OAI archive

Results :
SUCCESS; 0 harvested, 0 deleted, 78 failed.
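
To inspect by hand what the server returns for this configuration, the settings above map onto a standard OAI-PMH request. The following is a minimal sketch, assuming the plain ListRecords verb from the OAI-PMH specification; it is not necessarily the exact sequence of requests Dataverse issues, and the helper name list_records_url is made up for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL (hypothetical helper, not Dataverse code)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # "set" is optional in OAI-PMH; omit it to request the whole archive.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Values taken from the client configuration in this report.
print(list_records_url("https://ddd.uab.cat/oai2d", "oai_dc", "datasets"))
# → https://ddd.uab.cat/oai2d?verb=ListRecords&metadataPrefix=oai_dc&set=datasets
```

Opening the resulting URL in a browser shows the raw records, including their dc:identifier fields.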

Here is the log for the attempt :
harvest_UAB_2023-01-24T13-21-32.log

Thank you in advance for your time,
Best Regards,
Arnau

@landreev
Contributor

Thank you.
Just to confirm, you WERE able to harvest from this OAI archive successfully, before upgrading to 5.11.1, correct?

@landreev
Contributor

A quick followup:
This isn't mentioned in this issue here, but the original report in the Google group suggests that these failures started happening after the upgrade to 5.11.1. Having looked at this OAI server and the failures, I don't think these OAI_DC records would have been imported successfully by any version of Dataverse. So if you were able to harvest from this archive previously, they must have changed their record format on the server side since then.

The short answer is that Dataverse can't import these OAI_DC records because they don't have persistent identifiers in any of the <dc:identifier> fields, for example:

  <dc:identifier>https://ddd.uab.cat/record/166606</dc:identifier>
  <dc:identifier>urn:oai:ddd.uab.cat:166606</dc:identifier>
  <dc:identifier>urn:10.5565/ddd.uab.cat/166606</dc:identifier>
  <dc:identifier>urn:articleid:14712202</dc:identifier>

i.e. Dataverse wants one of these fields to contain either a DOI or a Handle identifier.
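
As a rough illustration of that check, here is a sketch that scans the dc:identifier fields of a record for something that looks like a DOI or a Handle. This is not Dataverse's actual import code: the list of accepted prefixes and the function names are assumptions made for the example, and the real rules may differ.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumption: common ways a DOI or Handle is written. Illustrative only.
PID_PREFIXES = (
    "doi:",
    "hdl:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "https://hdl.handle.net/",
    "http://hdl.handle.net/",
)


def looks_like_pid(identifier: str) -> bool:
    """Return True if the value starts with one of the assumed PID prefixes."""
    return identifier.strip().lower().startswith(PID_PREFIXES)


def find_persistent_ids(record_xml: str) -> list[str]:
    """Collect the dc:identifier values that look like a DOI or a Handle."""
    root = ET.fromstring(record_xml)
    return [
        el.text.strip()
        for el in root.iter(f"{{{DC_NS}}}identifier")
        if el.text and looks_like_pid(el.text)
    ]


# The identifiers quoted above, wrapped in a minimal oai_dc container.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>https://ddd.uab.cat/record/166606</dc:identifier>
  <dc:identifier>urn:oai:ddd.uab.cat:166606</dc:identifier>
  <dc:identifier>urn:10.5565/ddd.uab.cat/166606</dc:identifier>
  <dc:identifier>urn:articleid:14712202</dc:identifier>
</oai_dc:dc>"""

print(find_persistent_ids(SAMPLE))  # → []
```

None of the four values qualifies under these assumed prefixes: the third one carries a DOI-like string, but it is wrapped in a urn: prefix rather than doi: or a resolver URL.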

This is our fault, in more than one way:

  1. It obviously shouldn't fail in such a confusing, unclear manner. (There's nothing informative in that harvesting log, and a mess of stack traces is left in the main server.log.)
  2. We may not really need to enforce the requirement that a dataset must have a persistent id on harvested datasets (as opposed to "real", local datasets). All we need is some working url that we can use to redirect the Dataverse user back to the archival location of the data, and the first of the identifiers in the record above is a valid url that we could use for that. Without persistent ids it becomes more difficult, and less reliable, to ensure that we are not importing duplicate copies of the same data record; but then again, duplicates are probably much less of a problem with harvested datasets.
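
To make the idea in item 2 concrete, a fallback could pick the first identifier that parses as an http(s) url and use it as the redirect target. This is only a sketch of the proposal under discussion, not something Dataverse does today; the function name first_usable_url is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def first_usable_url(identifiers: list[str]) -> Optional[str]:
    """Return the first identifier that is an http(s) url with a host, else None."""
    for value in identifiers:
        candidate = value.strip()
        parsed = urlparse(candidate)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return candidate
    return None


# The dc:identifier values from the record quoted above.
identifiers = [
    "https://ddd.uab.cat/record/166606",
    "urn:oai:ddd.uab.cat:166606",
    "urn:10.5565/ddd.uab.cat/166606",
    "urn:articleid:14712202",
]

print(first_usable_url(identifiers))  # → https://ddd.uab.cat/record/166606
```

This only checks that the value is shaped like a url; it does not verify that the url resolves, and it does nothing about the duplicate-detection concern.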

We have an open issue for improving the client-side harvesting functionality that should address item 1 above: we'll make more and better diagnostics visible to the admin. I'm hoping that it will be prioritized and addressed soon.
As for item 2, I have brought this up with the dev team, and we have at least started talking about it.

Unfortunately, this is not something we can fix for you right away, nor something you can fix with a configuration change.

@pdurbin added the "Type: Bug" and "User Role: API User" labels Oct 9, 2023
@tjouneau

tjouneau commented Jan 9, 2024

This is related to the previous issue :

Projects
Status: ⚠️ Needed/Important
Development

No branches or pull requests

4 participants