Extend BWB Monthly Cron to archive + import covers #7691

Open
3 tasks
mekarpeles opened this issue Mar 21, 2023 · 1 comment
Labels: Affects: Operations; Lead: @scottbarnes; Module: Cover Service; Module: Import; Needs: Breakdown; Needs: Detail; Priority: 2; Type: Feature Request

Comments

mekarpeles commented Mar 21, 2023

Related to #6822

Describe the problem that you'd like solved

NB: Some of the details of this issue are intentionally internal as they deal with partner data sources.

Currently, in the middle of each month, we receive book data from BWB, which we archive and then import. As part of this process we also download book covers, but unlike the rest of the book data, these covers sit on disk and have (a) not yet been archived and (b) not yet been imported.

The purpose of this issue is to extend the BWB mid-monthly importer cron job so that when the cron runs:

Import strategy

We've already tested that BWBCoverBot/main.py works for importing these covers. To reduce overhead, we want to skip covers we have already imported. Before running the cover importer, we will use the Open Library monthly editions dump to build a normalized set of ISBN-13s for all editions that already have a cover (see the cover_id field), then add that set to a SQLite "don't need" table that the importer consults during the run.
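A minimal sketch of the "don't need" step, not the actual cron code. Assumptions: each dump row is tab-separated with the edition JSON in the last column, ISBNs live in `isbn_10` / `isbn_13` lists, and an edition counts as having a cover when a `covers` or `cover_id` field is non-empty. The table name `dont_need` and all function names are hypothetical.

```python
import json
import sqlite3


def to_isbn13(raw):
    """Return a normalized ISBN-13 string, or None if raw is not a usable ISBN."""
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if len(isbn) == 13 and isbn.isdigit():
        return isbn
    if len(isbn) == 10 and isbn[:9].isdigit():
        # Convert ISBN-10: prefix 978, drop the old check digit, recompute.
        core = "978" + isbn[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def isbns_with_covers(lines, cover_fields=("covers", "cover_id")):
    """Yield ISBN-13s for dump rows whose record has a non-empty cover field.

    The field names in cover_fields are an assumption about the dump schema.
    """
    for line in lines:
        record = json.loads(line.rstrip("\n").split("\t", 4)[-1])
        if not any(record.get(field) for field in cover_fields):
            continue
        for raw in record.get("isbn_13", []) + record.get("isbn_10", []):
            isbn13 = to_isbn13(raw)
            if isbn13:
                yield isbn13


def build_dont_need(conn, lines):
    """Fill the (hypothetical) dont_need table with ISBN-13s to skip."""
    conn.execute("CREATE TABLE IF NOT EXISTS dont_need (isbn13 TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO dont_need (isbn13) VALUES (?)",
        ((isbn,) for isbn in isbns_with_covers(lines)),
    )
    conn.commit()
```

The importer could then check each incoming cover's ISBN-13 against `dont_need` and skip any match. Because `isbns_with_covers` is a generator, the dump can be streamed from an open file rather than loaded into memory.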

Stakeholders

@bfalling @cdrini @cclauss @jimchamp

mekarpeles added the Type: Feature Request, Module: Cover Service, Module: Import, Needs: Detail, Needs: Breakdown, Priority: 2, Affects: Operations, and Lead: @scottbarnes labels on Mar 21, 2023
mekarpeles added this to the 2023 milestone on Mar 21, 2023
mekarpeles changed the title from "Extend BWB Cron: Import Covers" to "Extend BWB Monthly Cron to archive + import covers" on Sep 22, 2023
mekarpeles modified the milestones: 2023, Sprint 2023-11 on Nov 6, 2023
mekarpeles commented May 16, 2024

I noticed a small bug in our current partner cover FTP downloader: months are off by one, so Dec 2023 is in the directory for 2024.
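One possible reading of the off-by-one, sketched under the assumption that data for a given month lands in the directory named for the following month. The function name and the (year, month) return shape are hypothetical, and the real directory layout is not described in this issue.

```python
def delivery_period(data_year, data_month):
    """Return the (year, month) whose directory is assumed to hold the given month's data."""
    if not 1 <= data_month <= 12:
        raise ValueError(f"month out of range: {data_month}")
    if data_month == 12:
        # December rolls over into January of the next year.
        return data_year + 1, 1
    return data_year, data_month + 1
```

Under that assumption, `delivery_period(2023, 12)` gives `(2024, 1)`, which matches Dec 2023 showing up under 2024.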