Extend BWB Monthly Cron to archive + import covers #7691
Labels
Affects: Operations
Lead: @scottbarnes
Module: Cover Service
Module: Import
Needs: Breakdown
Needs: Detail
Priority: 2
Type: Feature Request
Related to #6822
Describe the problem that you'd like solved
NB: Some of the details of this issue are intentionally internal as they deal with partner data sources.
Currently, in the middle of each month, we receive book data from BWB, which we archive and then import. As part of this process we also download book covers, but unlike the other book data, these covers sit on disk and have been neither (a) archived nor (b) imported.
The purpose of this issue is to extend the BWB mid-monthly importer cron job so that when the cron runs:
olsystem
documentation / README
Import strategy
We've already tested that BWBCoverBot/main.py works for importing these covers. To reduce overhead, we want to skip covers we have already imported. Before running the cover importer, we will use the Open Library monthly editions dump to build a normalized set of ISBN-13s for all editions that already have a cover (see the cover_id field), and store that set in a SQLite table of ISBNs the importer does not need to process.
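The pre-filter step described above could look roughly like the sketch below. It is not the BWBCoverBot implementation. It assumes the editions dump is tab-separated with the JSON record in the fifth column, that ISBNs live in `isbn_13` / `isbn_10` lists, and that the cover field is named `covers` (the issue calls it cover_id, so the field name is left as a parameter). The table name `covered_isbn13` and all function names are hypothetical.

```python
import json
import sqlite3
from typing import Iterable, Iterator, Optional


def to_isbn13(raw: str) -> Optional[str]:
    """Normalize an ISBN-10 or ISBN-13 string to 13 bare digits, else None."""
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if not isbn.isascii():
        return None
    if len(isbn) == 13 and isbn.isdigit():
        return isbn
    if len(isbn) == 10 and isbn[:9].isdigit():
        # ISBN-10 -> ISBN-13: prefix 978, drop old check digit, recompute.
        core = "978" + isbn[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def isbns_with_covers(
    dump_lines: Iterable[str], cover_field: str = "covers"
) -> Iterator[str]:
    """Yield normalized ISBN-13s for dump records that already have a cover.

    Assumption: each line has five tab-separated columns, JSON in the last.
    """
    for line in dump_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 5:
            continue  # skip malformed lines
        record = json.loads(parts[4])
        if not record.get(cover_field):
            continue  # no cover yet, so the importer should still consider it
        for raw in record.get("isbn_13", []) + record.get("isbn_10", []):
            isbn13 = to_isbn13(raw)
            if isbn13:
                yield isbn13


def build_skip_table(conn: sqlite3.Connection, dump_lines: Iterable[str]) -> None:
    """Store ISBN-13s that already have covers in a (hypothetical) skip table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS covered_isbn13 (isbn13 TEXT PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO covered_isbn13 (isbn13) VALUES (?)",
        ((isbn,) for isbn in isbns_with_covers(dump_lines)),
    )
    conn.commit()
```

The importer could then check each incoming cover's ISBN-13 against this table and skip any match. Using a primary key with `INSERT OR IGNORE` keeps the table deduplicated when several editions share an ISBN.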
Stakeholders
@bfalling @cdrini @cclauss @jimchamp