Harvest: Json format harvest fails with unknown field when field exists on client. #7075
Comments
@kcondon I'm trying to reproduce the error, but I can't seem to add the custom field mraCollection in a way that reproduces the error. Do you have a TSV file for me so I can add this custom field to Dataverse?
Hi, I believe all I did was run the custom script for Harvard metadata in the dvinstall file: https://raw.githubusercontent.com/IQSS/dataverse/develop/scripts/api/setup-optional-harvard.sh
I ran the script to add the Harvard metadata successfully. When I run the harvest, I get the same error as you, but with less info and a null in the message. However, I decided not to dig deeper into this difference yet and to focus on Solr instead. I think the problem is that the new fields were added to the database and the UI, but not to Solr. So I checked the new fields that have to be added to Solr using the I received an error: "Dataverse responded with empty file. When running on K8s: did you bootstrap yet?". This failure happens here:
I think the writer of this script intended to check whether the number of lines in the file with the new fields is 3 or fewer, but that piece of code doesn't work: the file actually has 455 lines, yet the script still returns the empty file error. If my assumption is correct, I have a fix for this using this first line which I can push in a PR:
When I make this change, the update script runs successfully, but schema.xml still doesn't contain the new fields such as "mraCollection". Not sure where to go next.
@JingMa87 Thanks for looking into this! I can check with the team tomorrow when I'm back at work, but I wonder whether @poikilotherm has any insight into the problem and your proposed fix, since I think it is an area he is familiar with?
Hi @JingMa87, I'll go ahead and try to reproduce with
Hi @poikilotherm, I'm running this script on my Mac using zsh. I have a full local dev environment of Dataverse running in Glassfish4. But even after I adjust the script and run it, schema.xml doesn't update to contain the new fields.
Hi @JingMa87, I tried to reproduce this with latest Obviously, you can still gather the fields from the API endpoint and put them in the Solr schema files manually. Please reach out on IRC for more talk and help 😄
@poikilotherm My colleague @mderuijter has actually had a similar problem. I decided to add the fields manually for now. @kcondon Do you run Dataverse on Payara5? That might explain why my error message is less complete than yours. I added the custom fields such as mraCollection to Solr and then ran a harvest on the Princeton set, and every record succeeded. Do you have the mraCollection field in Solr? I presume its absence is the problem.
@JingMa87 Yes, Payara 5 is now our target platform.
@kcondon Did you add the custom fields such as mraCollection to Solr yet? Their absence is probably what causes the problem.
@JingMa87 Hi, that was the problem. Apologies for not catching that. This field used to be part of schema.xml, but the behavior has changed to separate those fields out.
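Since the root cause turned out to be a custom field missing from the Solr schema, a quick pre-harvest check could catch it earlier. The following is only a minimal sketch: the function name `missing_solr_fields` and the sample schema are hypothetical, and it assumes the field definitions appear as plain `<field name="...">` elements in the XML text being parsed. It ignores `<dynamicField>` patterns and does not follow fields pulled in from separate files.

```python
import xml.etree.ElementTree as ET


def missing_solr_fields(schema_xml, required_fields):
    """Return the names in required_fields that have no <field name="..."> in the schema."""
    root = ET.fromstring(schema_xml)
    # Collect every explicitly declared field, wherever it sits in the tree.
    defined = {field.get("name") for field in root.iter("field")}
    return [name for name in required_fields if name not in defined]


# Trimmed-down, hypothetical schema for illustration only.
SAMPLE_SCHEMA = """<schema name="collection1" version="1.5">
  <field name="title" type="text_en" indexed="true" stored="true"/>
  <field name="authorName" type="text_en" indexed="true" stored="true" multiValued="true"/>
</schema>"""

print(missing_solr_fields(SAMPLE_SCHEMA, ["title", "mraCollection"]))
# prints ['mraCollection']
```

Any name reported here would be expected to trigger the same "unknown field" error at index time until it is added to the schema.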
For reference, see PR #7057
In this instance the fields do exist on the client side and, surprisingly, the failing datasets are not the same as those indicated in the ticket for the case where the fields do not exist on the client. The failure appears to be the same (unknown field mraCollection), but it is coming from Solr?
grep Error harvest_test_n99_2020-07-13T21-28-05.log
Exception processing getRecord(), oaiUrl=https://dataverse.harvard.edu/oai, identifier=doi:10.7910/DVN/B6OJKG, edu.harvard.iq.dataverse.api.imports.ImportException, Failed to import harvested dataset: class edu.harvard.iq.dataverse.engine.command.exception.CommandException (Command [DatasetCreate dataset:49321] failed: Exception thrown from bean: javax.ejb.EJBTransactionRolledbackException: Exception thrown from bean: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/collection1: ERROR: [doc=dataset_49321] unknown field 'mraCollection')
Exception processing getRecord(), oaiUrl=https://dataverse.harvard.edu/oai, identifier=doi:10.7910/DVN/I5O6OS, edu.harvard.iq.dataverse.api.imports.ImportException, Failed to import harvested dataset: class edu.harvard.iq.dataverse.engine.command.exception.CommandException (Command [DatasetCreate dataset:51066] failed: Exception thrown from bean: javax.ejb.EJBTransactionRolledbackException: Exception thrown from bean: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/collection1: ERROR: [doc=dataset_51066] unknown field 'mraCollection')
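The grep output above can be reduced to the three pieces of information that matter here: the OAI identifier, the Solr document id, and the unknown field. This is a rough sketch, not part of Dataverse: the function name `find_unknown_field_errors` is hypothetical, and the pattern is based only on the two log lines quoted in this issue, so other versions may word the message differently.

```python
import re

# Pattern derived from the log lines quoted in this issue (assumption:
# the identifier comes before the "[doc=...] unknown field '...'" part).
UNKNOWN_FIELD = re.compile(
    r"identifier=(?P<identifier>[^,\s]+),"
    r".*?\[doc=(?P<doc>[^\]]+)\] unknown field '(?P<field>[^']+)'"
)


def find_unknown_field_errors(log_text):
    """Return (OAI identifier, Solr doc id, field name) for each matching log line."""
    results = []
    for line in log_text.splitlines():
        match = UNKNOWN_FIELD.search(line)
        if match:
            results.append((match["identifier"], match["doc"], match["field"]))
    return results
```

Run against the harvest log, this would list each failing record together with the field that Solr rejected, which makes it easier to see whether every failure points at the same missing field.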
The two failing datasets are:
You are now connected to database "thedata_alt" as user "postgres".
select id, identifier, dtype from dvobject where id=49321;
id | identifier | dtype
-------+------------+---------
49321 | DVN/A9VJVR | Dataset
(1 row)
select id, identifier, dtype from dvobject where id=51066;
id | identifier | dtype
-------+------------+---------
51066 | DVN/3XMK0W | Dataset
In the original PR, the failing datasets were:
The controversial datasets are https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.7910/DVN/B6OJKG and https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.7910/DVN/I5O6OS.