Create ONIX Ingestion Pipeline #860
Comments
I'm all in to get started on this! Do we have documentation on how to import these feeds from a developer's perspective, or should that be added to the To-Do as well?
The example CUP feed doesn't appear to have any strong identifiers for authors. What's the proposal for reconciling author name and affiliation (which it appears is all that is there) with the Open Library author records? I am very leery of making the messy Open Library data even messier.
As a first pass, we could identify which books we already have on OL and add edition/work data only to those (without creating any new authors).
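To make the first-pass idea concrete, here is a minimal sketch of how it might look, not an existing Open Library importer. It assumes the feed uses ONIX reference tag names (`Product`, `ProductIdentifier`, `ProductIDType`, `IDValue`) rather than short tags, and that the set of ISBNs already on OL is available from some lookup. The `known_isbns` set and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ONIX code list 5: 02 = ISBN-10, 15 = ISBN-13
ISBN_TYPES = {"02", "15"}


def local(tag):
    """Strip any XML namespace so namespaced and plain feeds both match."""
    return tag.rsplit("}", 1)[-1]


def product_isbns(onix_xml):
    """Yield one list of ISBNs per <Product> in an ONIX message."""
    root = ET.fromstring(onix_xml)
    for product in root.iter():
        if local(product.tag) != "Product":
            continue
        isbns = []
        for ident in product:
            if local(ident.tag) != "ProductIdentifier":
                continue
            fields = {local(c.tag): (c.text or "").strip() for c in ident}
            if fields.get("ProductIDType") in ISBN_TYPES and fields.get("IDValue"):
                isbns.append(fields["IDValue"].replace("-", ""))
        yield isbns


def split_by_known(onix_xml, known_isbns):
    """Partition products into those already on file and those that are not.

    known_isbns is a hypothetical stand-in for a lookup against OL.
    """
    matched, unmatched = [], []
    for isbns in product_isbns(onix_xml):
        bucket = matched if any(i in known_isbns for i in isbns) else unmatched
        bucket.append(isbns)
    return matched, unmatched
```

Only the `matched` products would receive edition/work updates in this first pass. The `unmatched` ones would be set aside until the author-identity question is settled.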
Cory (from Bibliometa) writes:
Only adding editions would definitely be a lower risk option and would allow starting to get familiar with the ONIX standard, but it wouldn't achieve what I understood to be the primary goal of expanding and modernizing the corpus of cataloged books.
@tfmorris you're right, though if we can get a parser in place and figure out a solution for author authority IDs, there are ~1M records with ISBNs that Cory can see about getting us. Yes, it does raise the question: how do we get and verify the author identities?
In addition to the two bug fixes mentioned up top, there is a whole stack of other things that need to be cleaned up, since this code hasn't been touched in 9 years. It may even be better to treat the current code as a specification and reimplement it. Some of the things I notice at a glance:
@mekarpeles It's long past time to put in place a simple principle: no new author record should be machine-created without links to an established authority record. When neither a VIAF nor an ISNI record is found, it is extremely likely that the author name is in error. Let's not further pollute the commons. Beyond the name itself, there should be at least one matching date (and none conflicting), or else a matching coauthor, work title, or publisher at a minimum. Simply matching on name is not enough.
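The principle above could be sketched roughly as follows. This is one reading of the proposal, not code from the Open Library codebase. The `Author` record, its field names, and both functions are hypothetical. The sketch treats any date conflict as disqualifying even when other evidence matches, which is an assumption about how "none conflicting" should apply.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Author:
    # Hypothetical record shape, for illustration only.
    name: str
    viaf: Optional[str] = None
    isni: Optional[str] = None
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    coauthors: set = field(default_factory=set)
    titles: set = field(default_factory=set)
    publishers: set = field(default_factory=set)


def may_create(candidate):
    """Allow machine creation only when an authority ID (VIAF or ISNI) is present."""
    return bool(candidate.viaf or candidate.isni)


def is_same_author(candidate, existing):
    """Match on name plus at least one corroborating piece of evidence."""
    if candidate.name.casefold() != existing.name.casefold():
        return False
    date_pairs = [
        (candidate.birth_year, existing.birth_year),
        (candidate.death_year, existing.death_year),
    ]
    # Any conflicting date rules the match out (assumed reading of the rule).
    if any(a is not None and b is not None and a != b for a, b in date_pairs):
        return False
    # One agreeing date is enough corroboration.
    if any(a is not None and a == b for a, b in date_pairs):
        return True
    # Otherwise require a shared coauthor, work title, or publisher.
    return bool(
        candidate.coauthors & existing.coauthors
        or candidate.titles & existing.titles
        or candidate.publishers & existing.publishers
    )
```

Under this sketch a bare name match returns `False`, so an import would neither attach the record to an existing author nor create a new one unless `may_create` also passes.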
@LeadSongDog I don't understand the relevance of your comment. The notes from Mar 20 explicitly say no new authors at all.
Yes @tfmorris it’s true that @mekarpeles said that in the “as a first pass” context, but I’m arguing for a more general principle. Getting the urine out of the swimming pool is rather more work than getting it in.
Closing this for now; @hornc is driving the MARC and amz imports.
Bibliometa has ONIX feeds which we can import into Open Library:
Issues related to ONIX parsing: #2, #3
link to metadata for the publisher's website
The items containing the files to import into Open Library are:
cc: @salman-bhai, @hornc