Fix ImportBot to import Archive.org works w/o MARCs #459
Labels
Module: Import
Issues related to the configuration or use of importbot and other bulk import systems. [managed]
Priority: 1
Do this week, receiving emails, time sensitive. [managed]
Currently, many works digitized by Internet Archive are not making it into Open Library. The root cause is an overly restrictive policy around repub-status values and the requirement for the archive.org item to have a MARC.
sudo -u openlibrary /olsystem/bin/olenv HOME=. OPENLIBRARY_RCFILE=/olsystem/etc/olrc-importbot python scripts/manage-imports.py --config /olsystem/etc/openlibrary.yml import-all
This queries for new IA items (added in the last day) that must have MARCs and a repub-status of 4, which is too strict. As part of my fix, I have removed the repub-status check and also removed the requirement for a MARC to be present.
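The effect of relaxing the query can be sketched as a before/after filter. This is only an illustration: the function name `make_item_filter`, the field names `repub_state` and `has_marc`, and the sample items are all hypothetical and are not taken from manage-imports.py.

```python
# Hypothetical sketch: field names and sample data are assumptions,
# not the actual schema or query used by scripts/manage-imports.py.

def make_item_filter(required_repub_state=None, require_marc=False):
    """Return a predicate deciding whether a new IA item is queued for import."""
    def accept(item):
        # Old behaviour: only one repub state was accepted.
        if required_repub_state is not None and item.get("repub_state") != required_repub_state:
            return False
        # Old behaviour: items without a MARC record were skipped.
        if require_marc and not item.get("has_marc", False):
            return False
        return True
    return accept


# Made-up items, for illustration only.
items = [
    {"identifier": "item-a", "repub_state": 4, "has_marc": True},
    {"identifier": "item-b", "repub_state": 4, "has_marc": False},
    {"identifier": "item-c", "repub_state": 19, "has_marc": True},
]

strict = make_item_filter(required_repub_state=4, require_marc=True)
relaxed = make_item_filter()  # no repub-status check, no MARC requirement

strict_ids = [i["identifier"] for i in items if strict(i)]
relaxed_ids = [i["identifier"] for i in items if relaxed(i)]
```

With the strict filter only `item-a` would be queued; with the relaxed filter all three sample items are.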
A batch is created for all these items (for efficiency's sake), and then the items are enumerated and "processed".
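A minimal sketch of the batch-then-process pattern, assuming a batch is just a named list of pending items. The helpers `make_batch` and `process_batch`, the status strings, and the `fake_import` callback are hypothetical and do not mirror the real import code.

```python
# Hypothetical sketch of "create a batch, then enumerate and process it".
# Helper names and status values are assumptions for illustration.

def make_batch(name, identifiers):
    """Group item identifiers into one batch, each starting as 'pending'."""
    return {
        "name": name,
        "items": [{"identifier": i, "status": "pending"} for i in identifiers],
    }


def process_batch(batch, import_one):
    """Run import_one on every pending item and record the outcome."""
    for item in batch["items"]:
        if item["status"] != "pending":
            continue
        try:
            import_one(item["identifier"])
            item["status"] = "created"
        except Exception as exc:  # keep going; one bad item should not stop the batch
            item["status"] = "failed"
            item["error"] = str(exc)
    return batch


def fake_import(identifier):
    """Stand-in for the real import call; fails for one made-up identifier."""
    if identifier == "item-b":
        raise ValueError("bad-repub-state")


batch = process_batch(make_batch("new-scans", ["item-a", "item-b"]), fake_import)
statuses = {i["identifier"]: i["status"] for i in batch["items"]}
```

Recording a per-item status rather than aborting makes it possible to see afterwards which items failed and why.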
Processing entails delegating to openlibrary/core/ia.py, which uses get_item_status() as a check to ensure the IA item meets all criteria. This is currently failing at bad-repub-state. As part of my in-progress fix, this check is removed because the query in step #2 (the new-item query above) has been relaxed.
During processing, the script makes a POST to openlibrary.org to log in and then a POST to the /api/import/ia API endpoint (which under the hood routes to openlibrary/plugins/importapi/code.py, namely ia_importapi).
Within the ia_importapi POST handler, the metadata for the item (used to create an OL work/edition) is requested from ia.get_metadata(key). This is currently failing because no MARC exists (see "case 4" in code.py's ia_importapi).
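The two POSTs described above could look roughly like the following. The sketch only builds the requests and does not send them. The /api/import/ia path comes from the description above; the login path `/account/login`, the form field names (`username`, `password`, `identifier`), and the cookie handling are assumptions, and the helper names are hypothetical.

```python
# Hypothetical sketch: builds (but does not send) the two POST requests.
# Login path, form field names, and cookie handling are assumptions.
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://openlibrary.org"


def build_login_request(username, password, base_url=BASE_URL):
    """POST that logs the import bot in (assumed path and field names)."""
    body = urlencode({"username": username, "password": password}).encode("utf-8")
    return Request(base_url + "/account/login", data=body, method="POST")


def build_import_request(identifier, session_cookie, base_url=BASE_URL):
    """POST asking the import API to import one archive.org item."""
    body = urlencode({"identifier": identifier}).encode("utf-8")
    return Request(
        base_url + "/api/import/ia",
        data=body,
        headers={"Cookie": session_cookie},
        method="POST",
    )


login_req = build_login_request("ImportBot", "not-a-real-password")
import_req = build_import_request("example-item", "session=placeholder")
```

Passing either request to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch has no network side effects.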