Support for ISO-639-3 language codes #7690

Merged
merged 6 commits into from
Mar 22, 2021

landreev
Contributor

@landreev landreev commented Mar 16, 2021

What this PR does / why we need it:

This will allow us to import metadata where the "language" field is populated not by a literal value ("French", "English") but by a three-letter ISO-639-3 code ("fra", "eng").
This problem was encountered by a remote installation when harvesting Dublin Core records from Zenodo. DC documentation suggests that using these codes to specify the language is an acceptable practice.

In this PR these codes are added as "alternate values" for the corresponding controlled vocabulary entries (more info in the issue). Once a record with a field like this (for example, <dc:language>fra</dc:language>) is imported, it becomes a controlled vocabulary entry French in our metadata.
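To illustrate the idea of alternate values, here is a minimal sketch in Python. It is not the Dataverse implementation: the names `ALTERNATES`, `build_lookup` and `resolve_language` are hypothetical, the table holds only the two languages mentioned above, and the case-insensitive matching is an assumption made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical miniature vocabulary: canonical value -> alternate values.
# Only the two entries mentioned in this PR are included.
ALTERNATES: Dict[str, List[str]] = {
    "English": ["eng"],
    "French": ["fra"],
}


def build_lookup(vocab: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every canonical value and every alternate to the canonical value."""
    lookup: Dict[str, str] = {}
    for canonical, alternates in vocab.items():
        lookup[canonical.lower()] = canonical
        for alt in alternates:
            lookup[alt.lower()] = canonical
    return lookup


LOOKUP = build_lookup(ALTERNATES)


def resolve_language(raw: str) -> Optional[str]:
    """Return the controlled vocabulary entry for a raw value, or None."""
    # Assumption: matching ignores case and surrounding whitespace.
    return LOOKUP.get(raw.strip().lower())
```

With this table, both `resolve_language("fra")` and `resolve_language("French")` resolve to the same entry, while an unknown value resolves to `None`.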

Which issue(s) this PR closes:

Closes #7638

Special notes for your reviewer:

Note that our existing metadata block update API was not updating these "alternate values" found in the TSV. Those were only populated on the initial import. So I had to change that.
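The behavioral change can be pictured with a simplified in-memory model. This is only a sketch under stated assumptions: `load_vocabulary` is a hypothetical name, the `(value, alternates)` row shape is invented for the example, and whether the real update adds to or replaces existing alternates is not stated here (the sketch adds).

```python
from typing import Dict, Iterable, List, Set, Tuple


def load_vocabulary(
    existing: Dict[str, Set[str]],
    rows: Iterable[Tuple[str, List[str]]],
) -> Dict[str, Set[str]]:
    """Apply TSV-like rows to a vocabulary, on first import or on update.

    existing: canonical value -> set of alternate values already stored.
    rows: (canonical value, alternates) pairs; a hypothetical row shape.
    """
    for value, alternates in rows:
        # Create the entry if it is new; reuse it if it already exists.
        entry = existing.setdefault(value, set())
        # Assumption: an update adds alternates to entries that already
        # exist, instead of only setting them when the entry is created.
        entry.update(alt for alt in alternates if alt)
    return existing
```

Reloading a row for an entry that already exists, such as `("French", ["fra"])`, then leaves `"fra"` attached to `French`.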

Suggestions on how to test this:

Using the example from the linked issue:
Once the branch is built and deployed, update the citation block:

wget https://raw.githubusercontent.com/IQSS/dataverse/7638-iso-639-3-language-codes/scripts/api/data/metadatablocks/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

Create a harvest from the original issue:

harvestUrl: https://www.zenodo.org/oai2d
metadataFormat: oai_dc
set: user-couperin

(This OAI server offers hundreds of sets. This PR has an extra improvement, unrelated to languages - the sets will appear sorted in the pull down menu, making it easier to use)

The harvest should be able to import all 21 records in the set, including the 16 that have these language codes in the metadata and were failing previously.
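For a quick manual check of what the OAI server returns for this set, the harvest parameters above can be turned into a standard OAI-PMH `ListRecords` request. The sketch below only builds the URL; `list_records_url` is a hypothetical helper, and the requests Dataverse itself sends may differ in detail.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str, metadata_prefix: str, set_spec: Optional[str] = None
) -> str:
    """Build an OAI-PMH ListRecords URL for a base URL, format and optional set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # "set" restricts the request to a single OAI set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


url = list_records_url("https://www.zenodo.org/oai2d", "oai_dc", "user-couperin")
print(url)
```

Opening the resulting URL in a browser or with curl shows the raw Dublin Core records, including any `<dc:language>` elements.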

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@landreev landreev assigned landreev and unassigned landreev Mar 16, 2021
@sekmiller sekmiller self-assigned this Mar 17, 2021
Contributor

@sekmiller sekmiller left a comment

Looks good

@sekmiller sekmiller removed their assignment Mar 17, 2021
@kcondon
Contributor

kcondon commented Mar 22, 2021

Followed the instructions: deployed, applied citation.tsv, and created a harvest client. It did harvest 99 datasets (there was no subset available), but it failed with an unknown error and nothing in the logs. I then attempted a basic harvest against Harvard, and it failed outright with no indication of why, just a GET request failure. I had been using oai_dc and the generic OAI client for the initial test, but switched to dataverse_4+ against Harvard once generic failed, with no luck.

@landreev
Contributor Author

If it can't harvest anything, it sounds like something is wrong with this system that's unrelated to this PR. You definitely want to harvest this particular set; if it's not seeing any sets, something is wrong. Let's take a closer look.

@kcondon
Contributor

kcondon commented Mar 22, 2021

As discussed, I can try testing against the develop branch. Harvesting worked partially against the test case.
OK, there were a few things that made it confusing:

  1. Zenodo has a ton of sets, so the set list appears to take a while to populate. Leonid waited for the sets and selected the right one. It worked.
  2. Zenodo has a lot of studies outside the specified set that likely contain data we don't like, hence the failure.
  3. I have a browser autocomplete that types http rather than https when I type the Harvard OAI server; that can contact the server but not complete transactions. Changing to https works.

Thanks for the assistance @landreev

Successfully merging this pull request may close these issues.

Harvesting client fails : Zenodo "Couperin" community / error with "language" field mapping.
5 participants