Feature Request/Idea: Sanitize languages controlled vocabulary values #8243
Comments
While extending the list of languages is a good idea, this seems like it may be better handled through an external controlled vocabulary, using the new CV management capabilities. The list of languages here seems to be a relatively small subset of possible ISO 639-3 languages. A quick Wikipedia search (in the absence of access to the ISO standard) suggests a broader list of ISO-specified languages exists (https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Languages/List_of_ISO_639-3_language_codes_(2019)). As an example, for Australia, we would like to be able to use Aboriginal language groups.
@jeromeroucou sure, this looks good. Please go ahead. Sorry for the slow reply.
@stevenmce yes, I agree in principle. Maybe someday we'll move the language field to an external controlled vocabulary. But for now it probably makes sense to improve the existing internal field. Belt and suspenders, perhaps. 😄 @jeromeroucou if you're not familiar with this new-ish feature, please see https://guides.dataverse.org/en/5.12/admin/metadatacustomization.html#using-external-vocabulary-services
People are requesting extra ISO language codes to be added as legitimate controlled vocabulary values (this is just a matter of adding extra values to citation.tsv). These are NOT duplicates: different things are being requested in the issues below, but it makes sense to get all 3 out of the way at the same time. Added back the label: NIH OTA: 1.4.1. Need to touch base with Leonid on this.
2023/12/19: Prioritized during meeting on 2023/12/18. Added to Needs Sizing.
I've added the label "Metadata". This is really a metadata issue, but Harvesting is affected, so it is not unreasonable to treat it as a harvesting issue as well. I've given it size 10, under the assumption that this is only about modifying the citation block and producing a flyway script for any adjustment to the existing values that may be needed. Please note that the change to the metadata block is going to be less trivial than replacing the existing controlled vocabulary values there with the copy-and-pasted "proposed values" above. The two will need to be merged, carefully. The list above contains some ISO abbreviations absent from the current block, but the opposite is true for some languages as well: we already support some codes that are not on the list above.
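The careful merge described in the comment above could be sketched roughly as follows. This is only an illustration, not the approach actually taken in the project: `merge_vocab` is a hypothetical helper, the two dictionaries are made-up sample data rather than the real contents of the citation block or of the proposed list, and matching entries by language name is an assumption.

```python
def merge_vocab(existing, proposed):
    """Union two {language name: [codes]} mappings without dropping anything.

    Entries that appear only in `existing` are kept, entries that appear
    only in `proposed` are added, and codes for shared names are combined
    with the existing codes listed first.
    """
    merged = {name: list(codes) for name, codes in existing.items()}
    for name, codes in proposed.items():
        target = merged.setdefault(name, [])
        for code in codes:
            if code not in target:  # avoid duplicate alternate values
                target.append(code)
    return merged


# Hypothetical sample data (NOT the real block or the real proposal).
existing = {"Abkhaz": ["ab"], "Bihari": ["bh"]}
proposed = {"Abkhaz": ["abk", "ab"], "Afar": ["aar", "aa"]}

merged = merge_vocab(existing, proposed)
for name, codes in merged.items():
    print(name, codes)
```

A real merge would also have to handle languages whose display names differ between the two lists, which a name-keyed union like this one does not attempt.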
Hello, the initial proposal (Nov 16, 2021) is no longer up to date. Here is a new one with the associated PR, adapted to Dataverse 6.1 and linked more precisely to this commit: 991c5f9. You can provide your comments directly in the PR.
Following up on @stevenmce's earlier comment, #7377 is also related.
Just noticed this went into the current sprint. Please feel free to take over the draft PR, or let us know what to add if you want us to modify something ;)
I am leaning towards just merging what's in #10197, with just minor additions, possibly.
I got press-ganged into mostly working on something else for the past couple of weeks, but I am still determined to move this along ASAP. I am planning to make a new PR, from my own branch, instead of the draft PR #10197, but I may ask more questions there.
…er conversation with requestor. #8243
… - I'm leaving the main name intact (so that the block update will still work), but adding both versions as extra alternative names, so that either is importable. #8243
…he ISO 639-1 and -2 codes for Nuosu. #8243
…the order in which they are listed in the current ISO 639-3 table) #8243
…dates easier. Used the first 3-letter code as the identifier for each of the 185 supported languages. #8243
Overview of the Feature Request
In order to improve the proposed languages as a list of controlled values, and to be able to expose them with an identifier later on, we want to modify them by adding the ISO 639-3 code to each as an alternative value.
Before making a pull request, we would like to have feedback from you on our proposal.
Please note that the language "Bihari" does not have an ISO 639-3 code, but only ISO 639-2 / 5.
A modified data migration script will be required.
Below are the proposed values:
What kind of user is the feature intended for?
API User, Curator, Depositor, and Guest
What inspired the request?
Requirement of archive language metadata
What existing behavior do you want changed?
Improve the language list to be more compliant with the ISO standard
Any brand new behavior do you want to add to Dataverse?
None
Any related open or closed issues to this feature request?
Pull request #7690
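As a rough illustration of the idea in the overview (adding the ISO 639-3 code as an alternative value), the sketch below builds tab-separated controlled vocabulary rows. The helper `vocab_row` is hypothetical, and the column layout (empty first column, field name, value, identifier, display order, then any alternate values) is an assumption about how such rows are laid out in citation.tsv; it should be checked against the actual file. Bihari is shown with its ISO 639-2 code only, since it has no ISO 639-3 code.

```python
def vocab_row(field, value, identifier, display_order, alternates=()):
    """Build one tab-separated controlled vocabulary row.

    Assumed layout: empty leading column, dataset field name, display
    value, identifier, display order, then zero or more alternate values.
    """
    cols = ["", field, value, identifier, str(display_order), *alternates]
    return "\t".join(cols)


# Illustrative rows only; the identifiers and ordering are assumptions.
rows = [
    vocab_row("language", "Abkhaz", "abk", 0, ["abk", "ab"]),
    vocab_row("language", "Bihari", "bih", 1, ["bih", "bh"]),  # no ISO 639-3 code
]

for row in rows:
    print(row)
```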