Update MARC importer language mapping table #9344
@@ -279,14 +279,39 @@ def read_edition_name(rec: MarcBase) -> str:
'end': 'eng',
'enk': 'eng',
'ent': 'eng',
'cro': 'chu',
'jap': 'jpn',
'fra': 'fre',
'gwr': 'ger',
'sze': 'slo',
'fr ': 'fre',
'fle': 'dut', # Flemish -> Dutch
# 2 character to 3 character codes
'fr ': 'fre',
'it ': 'ita',
# LOC MARC Deprecated code updates
'cam': 'khm', # Khmer
'esp': 'epo', # Esperanto
Looks like

I only added the codes which had a clear one-to-one correct mapping -- your list was super helpful showing the corrected mappings. Many deprecated codes don't have a single obvious mapping, which prevents this kind of automated fix, and there seems to be a range of reasons why a code is deprecated. Some seem technical dialect vs language factors like I think your advice on what is needed to correct the ~217

It might be worth tossing in a comment about
'eth': 'gez', # Ethiopic
'far': 'fao', # Faroese
'fri': 'fry', # Frisian
'gae': 'gla', # Scottish Gaelic
'gag': 'glg', # Galician
'gal': 'orm', # Oromo
'gua': 'grn', # Guarani
'int': 'ina', # Interlingua (International Auxiliary Language Association)
'iri': 'gle', # Irish
'lan': 'oci', # Occitan (post 1500)
'lap': 'smi', # Sami
'mla': 'mlg', # Malagasy
'mol': 'rum', # Romanian
'sao': 'smo', # Samoan
'scc': 'srp', # Serbian
'scr': 'hrv', # Croatian
'sho': 'sna', # Shona
'snh': 'sin', # Sinhalese
'sso': 'sot', # Sotho
'swz': 'ssw', # Swazi
'tag': 'tgl', # Tagalog
'taj': 'tgk', # Tajik
'tar': 'tat', # Tatar
'tsw': 'tsn', # Tswana
}
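
For readers following the diff, here is a minimal sketch of how a table like this might be applied to a raw language code. The helper name `normalize_lang` and the lowercasing step are assumptions made for illustration, not the importer's actual code, and `LANG_MAP` is only a small subset of the entries shown in the diff.

```python
# Illustrative subset of the mapping table from the diff.
LANG_MAP = {
    'enk': 'eng',
    'fr ': 'fre',  # 2 character code padded with a space
    'it ': 'ita',
    'cam': 'khm',  # Khmer
    'scc': 'srp',  # Serbian
}


def normalize_lang(code: str) -> str:
    """Hypothetical helper: map a legacy or malformed code to its current form.

    Codes that are not in the table are returned unchanged (lowercased).
    Whitespace is deliberately not stripped, because the 2 character keys
    rely on the trailing space.
    """
    key = code.lower()  # assumption: codes are compared case-insensitively
    return LANG_MAP.get(key, key)


print(normalize_lang('cam'))
print(normalize_lang('fr '))
print(normalize_lang('eng'))
```

Falling back to the input when there is no entry keeps already-valid codes untouched, which matters if a key in the table ever collides with a valid code.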

`gwr` looks like it could be a typo fix, but `cro` and `sze` seem unlikely. There's a pretty big archive of MARC records which have been imported, so it should be possible to see how frequently (if at all) these are used.
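
A frequency check like the one suggested here could look roughly like the sketch below. It assumes the language codes have already been extracted from the imported records into an iterable of strings. The function name `count_suspect_codes` and the sample data are made up for illustration and say nothing about what the real archive contains.

```python
from collections import Counter
from typing import Iterable

# The three codes questioned in this review comment.
SUSPECT_CODES = {'cro', 'gwr', 'sze'}


def count_suspect_codes(codes: Iterable[str]) -> Counter:
    """Count how often each suspect code occurs in a stream of raw codes."""
    return Counter(code for code in codes if code in SUSPECT_CODES)


# Made-up sample, not real archive data.
sample = ['eng', 'gwr', 'fre', 'gwr', 'sze']
print(count_suspect_codes(sample))
```

A code that never shows up in the counts would be a reasonable candidate for removal from the mapping.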

`sze` looks like it could be an ISO code for the Seze language (https://en.wikipedia.org/wiki/Seze_language), which has nothing to do with `slo`, and I think `cro` -> `chu` has a similar ambiguity. Without comments, I'm not sure what those mappings are protecting against, but to me it looks like they are more likely to re-assign non-MARC language codes to unrelated languages. I imagine there were some historical records that those changes worked for, but this code can only protect against systematic and likely codes we might encounter through regular older catalog imports.
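
One way to surface this kind of ambiguity is to check whether any key in the mapping is itself a code in another scheme. The sketch below is only illustrative: `OTHER_SCHEME` is a hypothetical, deliberately incomplete lookup whose single entry comes from the comment above, and `ambiguous_keys` is not part of the importer.

```python
# The three undocumented mappings discussed in this thread.
LEGACY_MAP = {'cro': 'chu', 'gwr': 'ger', 'sze': 'slo'}

# Hypothetical lookup of codes from another scheme (incomplete on purpose).
OTHER_SCHEME = {'sze': 'Seze'}


def ambiguous_keys(mapping: dict, other: dict) -> dict:
    """Return keys that the mapping rewrites but that also exist in `other`.

    Each value is a pair: (target code in the mapping, language in `other`).
    """
    return {
        key: (target, other[key])
        for key, target in mapping.items()
        if key in other
    }


print(ambiguous_keys(LEGACY_MAP, OTHER_SCHEME))
```

With a fuller lookup table, any key reported here would be rewritten to an unrelated language if a record used the other scheme.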

I guess my natural bias is towards not changing things I don't understand, particularly since the code represents (or should) decades of accumulated knowledge, but I'd be hard pressed to argue for preserving such an ancient bit of cruft.