Support for ISO-639-3 language codes #7690
Looks good
Followed the instructions: deployed, applied citation.tsv, and created a harvest client. It does harvest 99 datasets (there was no subset available), but then it failed with an unknown error and nothing in the logs. I then attempted a basic harvest against Harvard, and it failed outright with no indication of why, just a GET request failure. I had been using oai_dc and the generic OAI client for the initial test, but switched to dataverse_4+ against Harvard once the generic one failed, with no luck.
If it can't harvest anything, it sounds like there's something wrong with this system that's unrelated to this PR. You definitely want to harvest this particular set; if it's not seeing any sets, something is wrong. Let's take a closer look.
As discussed, I can try testing against the develop branch. Harvesting worked partially against the test case.
Thanks for the assistance @landreev
What this PR does / why we need it:
This will allow us to import metadata where the "language" field is populated not by a literal value ("French", "English") but by three-letter ISO-639-3 codes ("fra", "eng").
This problem was encountered by a remote installation when harvesting Dublin Core records from Zenodo. DC documentation suggests that using these codes to specify the language is an acceptable practice.
In this PR these codes are added as "alternate values" for the corresponding controlled vocabulary entries (more info in the issue). Once a record with a field like this (for example, `<dc:language>fra</dc:language>`) is imported, it becomes the controlled vocabulary entry `French` in our metadata.
Which issue(s) this PR closes:
Closes #7638
Special notes for your reviewer:
Note that our existing metadata block update API was not updating these "alternate values" from the TSV; they were only populated on the initial import, so I had to change that.
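To illustrate how an alternate value lets an imported code resolve to an existing entry, here is a minimal Python sketch. It is not Dataverse's implementation (which is written in Java): the rows below are a simplified, hypothetical stand-in for the controlled vocabulary section of citation.tsv (field, value, identifier, display order, then alternate values), `build_lookup`, `resolve` and `languages_from_dc` are illustrative names, and the case-insensitive matching is a choice made for this sketch only.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, simplified controlled-vocabulary rows (tab-separated):
# field, value, identifier, display order, then zero or more alternate values.
CV_ROWS = [
    "language\tEnglish\t\t0\teng",
    "language\tFrench\t\t1\tfra",
]

# A minimal oai_dc record carrying an ISO-639-3 code instead of a language name.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example</dc:title>
  <dc:language>fra</dc:language>
</oai_dc:dc>"""


def build_lookup(rows):
    """Map (field, lowercased value or alternate value) to the canonical value."""
    lookup = {}
    for row in rows:
        cols = row.split("\t")
        field, value, alternates = cols[0], cols[1], cols[4:]
        for key in [value, *alternates]:
            if key:  # skip empty columns
                lookup[(field, key.lower())] = value
    return lookup


def resolve(lookup, field, raw):
    """Return the canonical vocabulary entry for a raw value, or None if unknown."""
    return lookup.get((field, raw.strip().lower()))


def languages_from_dc(xml_text):
    """Collect the text of every dc:language element in an oai_dc record."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter(f"{{{DC_NS}}}language")]


if __name__ == "__main__":
    lookup = build_lookup(CV_ROWS)
    for raw in languages_from_dc(RECORD):
        print(raw, "->", resolve(lookup, "language", raw))  # fra -> French
```

Because each alternate value points back to the same canonical entry, a harvested record ends up with `French` whether the source says `French` or `fra`. This is also why reloading a metadata block has to pick up the alternate-value columns, not just the primary values.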
Suggestions on how to test this:
Using the example from the linked issue, once the branch is built and deployed:
Update the citation block with the new citation.tsv.
Create a harvesting client for the set from the original issue.
(This OAI server offers hundreds of sets. This PR also includes an extra improvement unrelated to languages: the sets now appear sorted in the pull-down menu, making it easier to use.)
The harvest should import all 21 records in the set, including the 16 that have these language codes in their metadata and were failing previously.
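The command used for the "Update the citation block" step was not preserved on this page. A hedged Python sketch of that step follows, assuming a local installation at `http://localhost:8080` and the `/api/admin/datasetfield/load` endpoint that the Dataverse guides describe for loading metadata block TSV files (check the guide for your version); `build_load_request` and `load_block` are hypothetical helper names.

```python
import urllib.request

# Assumed admin endpoint for (re)loading a metadata block TSV; see lead-in.
LOAD_PATH = "/api/admin/datasetfield/load"


def build_load_request(base_url, tsv_bytes):
    """Build a POST request that uploads a metadata block TSV file."""
    return urllib.request.Request(
        base_url.rstrip("/") + LOAD_PATH,
        data=tsv_bytes,
        headers={"Content-Type": "text/tab-separated-values"},
        method="POST",
    )


def load_block(base_url, tsv_path):
    """Upload the TSV at tsv_path and return the HTTP status and response body."""
    with open(tsv_path, "rb") as f:
        req = build_load_request(base_url, f.read())
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")
```

For example, `load_block("http://localhost:8080", "citation.tsv")` would be run on the server itself, since admin endpoints are often restricted to localhost. Building the request separately from sending it keeps the URL and headers easy to inspect before anything is uploaded.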
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: