Extend BWB Monthly Cron to archive + import covers #7691

mekarpeles · 2023-03-21T01:00:16Z

Related to #6822

Describe the problem that you'd like solved

NB: Some of the details of this issue are intentionally internal as they deal with partner data sources.

Currently, around the middle of each month, we receive certain book data from BWB, which we archive and then import. As part of this process, we download book covers, but unlike the other book data, these covers reside on disk and have (a) not yet been archived and (b) not yet been imported.

The purpose of this issue is to extend the BWB mid-monthly importer cron job so that when the cron runs:

- Covers are archived on archive.org
  - The cron already has code to archive book data; we'll need a separate block with similar logic for covers. This is likely to be the easy step.
- Covers are imported into Open Library (see DRAFT: Batch add Better World Books covers #6706)
  - Importing these records depends on sufficient free space on the host (via Roadmap for fixing cover archival #7257)
- Provide internal olsystem documentation / README

Import strategy

We've already tested that BWBCoverBot/main.py works for importing these covers. To reduce overhead, we want to skip covers we have already imported. Before running the cover importer, we will first use the Open Library monthly editions dump to build a normalized set of ISBN-13s for all editions that already have covers (see the cover_id field), then add this set to a SQLite "don't need" table that the importer consults when it runs.
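A rough sketch of the skip-table step described above, to make the idea concrete. This is not the BWBCoverBot code. It assumes the editions dump is tab-separated with the JSON record in the fifth column, and that an edition's covers appear as a `covers` list of positive ids alongside `isbn_10` / `isbn_13` lists. The table name `covers_not_needed` and all function names are hypothetical.

```python
import json
import re
import sqlite3
from typing import Iterable, Iterator, Optional


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13: 978 prefix plus a recomputed check digit."""
    core = "978" + isbn10[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn13(raw: str) -> Optional[str]:
    """Strip separators and return an ISBN-13, or None if the shape is wrong."""
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{13}", isbn):
        return isbn
    if re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return isbn10_to_isbn13(isbn)
    return None


def isbns_with_covers(dump_lines: Iterable[str]) -> Iterator[str]:
    """Yield normalized ISBN-13s for editions that already have a cover."""
    for line in dump_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:  # assumed layout: type, key, revision, modified, JSON
            continue
        try:
            record = json.loads(fields[4])
        except json.JSONDecodeError:
            continue
        covers = record.get("covers") or []  # assumed field name
        if not any(isinstance(c, int) and c > 0 for c in covers):
            continue
        for raw in (record.get("isbn_13") or []) + (record.get("isbn_10") or []):
            isbn = normalize_isbn13(raw) if isinstance(raw, str) else None
            if isbn:
                yield isbn


def build_skip_table(conn: sqlite3.Connection, dump_lines: Iterable[str]) -> None:
    """Fill the hypothetical "don't need" table with ISBNs that already have covers."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS covers_not_needed (isbn13 TEXT PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO covers_not_needed (isbn13) VALUES (?)",
        ((isbn,) for isbn in isbns_with_covers(dump_lines)),
    )
    conn.commit()


def needs_cover(conn: sqlite3.Connection, isbn13: str) -> bool:
    """Return True if the ISBN is absent from the skip table."""
    row = conn.execute(
        "SELECT 1 FROM covers_not_needed WHERE isbn13 = ?", (isbn13,)
    ).fetchone()
    return row is None
```

In the cron, the dump could be streamed with `gzip.open(path, "rt")` and passed straight to `build_skip_table`, so the full file never has to sit in memory. The importer would then call something like `needs_cover` per ISBN before uploading.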

Stakeholders

@bfalling @cdrini @cclauss @jimchamp


mekarpeles · 2024-05-16T16:14:44Z

I noticed a small bug in our current partner cover FTP downloader: months are off by one, so Dec 2023 ends up in the directory for 2024.

mekarpeles added the Type: Feature Request, Module: Cover Service, Module: Import, Needs: Detail, Needs: Breakdown, Priority: 2, Affects: Operations, and Lead: @scottbarnes labels Mar 21, 2023

mekarpeles added this to the 2023 milestone Mar 21, 2023

mekarpeles mentioned this issue Mar 21, 2023

Cover Service Improvements #6822

Open

7 tasks

mekarpeles mentioned this issue Sep 9, 2023

Add monthly cron to run Cover Server Archival #8278

Closed

mekarpeles changed the title ~~Extend BWB Cron: Import Covers~~ Extend BWB Monthly Cron to archive + import covers Sep 22, 2023

mekarpeles modified the milestones: 2023, Sprint 2023-11 Nov 6, 2023

jimchamp assigned scottbarnes Nov 20, 2023

mekarpeles modified the milestones: Sprint 2023-11, Sprint 2023-12, 2024 (provisional, requires discussion) Nov 27, 2023

mekarpeles modified the milestones: 2024 (provisional, requires discussion), Sprint 2024-05 May 10, 2024

mekarpeles modified the milestones: Sprint 2024-05, Sprint 2024-06 May 30, 2024

mekarpeles modified the milestones: Sprint 2024-06, Sprint 2024-07 Jul 1, 2024

mekarpeles modified the milestones: Sprint 2024-07, Sprint 2024-09 Aug 2, 2024

mekarpeles modified the milestones: Sprint 2024-09, 2024 (provisional, requires discussion) Aug 30, 2024
