Recent (non-MARC) imports are adding deprecated language codes (presumably via language name lookups, not just old codes in the import data) #9504
Comments
@hornc can you propose a priority for this based on your use cases? Is this happening at a large scale (e.g. how many records are affected)? Is this blocking one of our systems/processes? This would help us prioritize accordingly.
@scottbarnes can you assign this issue to me?
I have assigned this to you, @AbhinavKRN. Please ask any questions if you get stuck anywhere.
Sure @scottbarnes, on it.
So, I think this is a relatively low priority issue because I have a bot task that runs weekly to correct deprecated language codes to their current codes (if one exists). To do this properly, we might want to think a bit about what is supposed to happen in the various cases. What should happen in the following cases:
I was hoping someone would find and link the related "duplicate languages in dropdowns" issue, as that has similar requirements for extending the language code model, which I think is necessary to add this functionality. Optional language fields we might need to add:
Note: some deprecated codes may not have a clear single value for I'm not completely happy with the
I think #8145 was perhaps the issue I remember, which touches on duplicate names. Is there a clearer one?
@hornc, I had hoped we could discuss this during the Monday ABC call, but somehow it was missed during triage. I added this to the agenda for the coming week.
I just found this code that already translates deprecated language codes: openlibrary/openlibrary/catalog/marc/parse.py Lines 288 to 317 in 4471420
I had been thinking this (and the related removing deprecated language codes from the edition edit dropdown) required an update to the
It looks like MARC imports use the hardcoded deprecated language code tables in openlibrary/openlibrary/catalog/marc/parse.py, but imports from other sources do not. #9809 is an attempt to consolidate the deprecations into the language code type, so there should be an opportunity to consolidate the imports, and perhaps remove the special-case translations?
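To make the consolidation idea concrete, here is a minimal sketch of a single normalization step that every import path (MARC or otherwise) could call. It is not the code in parse.py or in #9809: the names `REPLACED_BY` and `normalize_language_code` are hypothetical, and the table holds only a few sample entries, including a made-up placeholder for a deprecated code with no single replacement.

```python
from typing import Optional

# Hypothetical consolidated table: deprecated code -> current code.
# None marks a deprecated code with no clear single replacement.
# Sample entries only; "xx0" is a made-up placeholder, not a real code.
REPLACED_BY: dict[str, Optional[str]] = {
    "eth": "gez",
    "scc": "srp",
    "xx0": None,
}


def normalize_language_code(code: str) -> str:
    """Return the current code for `code`.

    Codes that are not deprecated, or that have no single replacement,
    are returned unchanged (after trimming and lowercasing).
    """
    cleaned = code.strip().lower()
    replacement = REPLACED_BY.get(cleaned)
    return replacement if replacement else cleaned
```

With one shared function like this, the special-case translation in the MARC parser and the lookups used by other import sources could draw on the same data. What to do with codes that have no single replacement is left open here, as it is in the discussion.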
@AbhinavKRN, are you still interested in working on this issue? If not I will open it back for others who may wish to work on it.
Problem
https://openlibrary.org/books/OL51818714M/Yederasiw_Mastawesha
is a recently imported item that picked up the deprecated Ethiopian language code (the metadata has since been updated). It looks like the language code lookups, which convert a language name to a code, use a list of codes that includes deprecated duplicates, so the resulting code may be the deprecated one (probably arbitrarily, depending on which is listed first).
How to fix:
The Name -> code lookup list should only contain current item codes.
This relates to the 'duplicates in the language drop down list' issue that I thought I saw recently, but cannot find it now. The dropdown and import translation list should both only contain current language codes.
Perhaps the language code config should have a deprecated parameter, and these can be excluded as needed.
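A rough sketch of the proposed fix, assuming each language entry carries a `deprecated` flag. The `Language` class, the `build_name_lookup` helper, and the sample records are all hypothetical illustrations, not Open Library's actual language model or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    code: str
    name: str
    deprecated: bool = False  # hypothetical flag, as proposed above


# Illustrative records: two codes share one name, one of them deprecated.
LANGUAGES = [
    Language("eth", "Ethiopic", deprecated=True),
    Language("gez", "Ethiopic"),
    Language("eng", "English"),
]


def build_name_lookup(languages):
    """Build a name -> code mapping from current (non-deprecated) codes only."""
    lookup = {}
    for lang in languages:
        if lang.deprecated:
            continue  # deprecated codes never enter the lookup
        # Keep the first current code seen for each case-insensitive name.
        lookup.setdefault(lang.name.casefold(), lang.code)
    return lookup


NAME_TO_CODE = build_name_lookup(LANGUAGES)
```

Because deprecated entries are filtered out before the mapping is built, the result no longer depends on which duplicate happens to be listed first. The same filter could feed the dropdown list.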
Relates to #9002 in that the example shows at least BWB-sourced imports are using language lookups.
The specific code to change is:
https://github.com/internetarchive/openlibrary/pull/9488/files