Research why `printdisabled` Archive.org items not in Open Library #1047
Comments
Hey, @mekarpeles I'm not certain if it's related, but searching The same is true for [Revised 16 August] I'm not sure if you consider the 13 to be "orphans", as they are under extant-but-unnecessary work records.
Similar to #732, but this is for non-inlibrary items.
IA client commands and OL datadump checking:
I'm converting this to a research task for #732.
We have ~755,879 (admin) / 78,704 (incognito) archive.org items in `printdisabled` which don't have `openlibrary` or `openlibrary_edition` IDs: https://archive.org/search.php?query=collection%3Aprintdisabled%20AND%20-openlibrary_edition%3A%2A%20AND%20-openlibrary%3A%2A
We need to update or create the Open Library editions that are missing these Archive.org IDs.
Updated query:
Since not all `printdisabled` items are necessarily published books with good-quality metadata, limiting the scope to items with ISBNs will give us better-quality imports into Open Library: https://archive.org/search.php?query=collection%3Aprintdisabled%20AND%20NOT%20collection%3Ainlibrary%20AND%20NOT%20openlibrary_edition%3A%2A%20AND%20isbn%3A%2A
This gives ~330,110 items (admin) that are printdisabled only and that should be imported into, or linked to, Open Library records.
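For anyone who wants to regenerate or tweak these searches, here is a minimal sketch that rebuilds the two search URLs from their plain-text queries using only the Python standard library. The constant and function names (`ORIGINAL_QUERY`, `UPDATED_QUERY`, `search_url`) are made up for illustration; only the query strings and the `search.php` endpoint come from the links above.

```python
from urllib.parse import quote

# Base search endpoint, as used in the links in this issue.
SEARCH_BASE = "https://archive.org/search.php?query="

# Plain-text forms of the two queries (decoded from the URLs in this issue).
ORIGINAL_QUERY = "collection:printdisabled AND -openlibrary_edition:* AND -openlibrary:*"
UPDATED_QUERY = (
    "collection:printdisabled AND NOT collection:inlibrary "
    "AND NOT openlibrary_edition:* AND isbn:*"
)


def search_url(query: str) -> str:
    """Percent-encode a query string and append it to the search endpoint."""
    # safe="" makes quote() encode ':' and '*' as well, matching the links above.
    return SEARCH_BASE + quote(query, safe="")


if __name__ == "__main__":
    print(search_url(ORIGINAL_QUERY))
    print(search_url(UPDATED_QUERY))
```

Keeping the queries in plain text makes it easier to see what each clause does (for example, that the updated query also excludes `inlibrary` items) before encoding them.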
See wiki page https://github.com/internetarchive/openlibrary/wiki/archive.org-%E2%86%94-Open-Library-synchronisation for information on IA ↔ OL synchronisation.
Solution
Loop over the archive.org items whose `ocaid`s are missing from Open Library, take their ISBNs and/or titles, and search for them in Open Library. If a corresponding Open Library edition exists for that ISBN, write the `ocaid` onto the Open Library edition. If the Open Library edition is an orphan, do a dummy edit so that a work is created, and then perform a writeback to Archive.org so that `openlibrary_edition` and `openlibrary_work` are created.
We should do this using the Open Library Client (not the import bot).
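To make the proposed steps concrete, here is a rough sketch of the per-item decision logic only. It performs no reads or writes against Open Library or Archive.org: the `Edition` class, `plan_actions`, and the injected `find_edition_by_isbn` lookup are all hypothetical names, and a real script would replace them with calls through the Open Library Client. It follows the wording above literally, so the writeback step is only planned in the orphan case, and title-based matching is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Edition:
    """Hypothetical stand-in for an Open Library edition record."""
    olid: str
    ocaid: Optional[str] = None
    work_olid: Optional[str] = None  # None marks an orphan (no work record)


def plan_actions(
    ocaid: str,
    isbns: Iterable[str],
    find_edition_by_isbn: Callable[[str], Optional[Edition]],
) -> List[str]:
    """Return the ordered steps for one archive.org item; performs no writes."""
    for isbn in isbns:
        edition = find_edition_by_isbn(isbn)
        if edition is None:
            continue  # try the next ISBN on the item
        steps = [f"set ocaid={ocaid} on {edition.olid}"]
        if edition.work_olid is None:
            # Orphan edition: a dummy edit creates the work, then write back.
            steps.append(f"dummy-edit {edition.olid} to create a work")
            steps.append(
                f"write back openlibrary_edition and openlibrary_work to {ocaid}"
            )
        return steps
    return []  # no ISBN matched; title search is not covered by this sketch


if __name__ == "__main__":
    # Toy in-memory catalog with made-up identifiers, standing in for a lookup.
    catalog = {"9780000000002": Edition(olid="OL1M")}
    for step in plan_actions("exampleitem00", ["9780000000002"], catalog.get):
        print(step)
```

Separating the planning from the writes means the plan can be dry-run over the whole item list and reviewed before anything is edited.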