Design error handling mechanism #7

Closed

simonw opened this issue Nov 5, 2023 · 7 comments

simonw commented Nov 5, 2023

Refs:

I added an error_count column, and I have the idea that the job should automatically be cancelled if more than X errors occur (default X = 5).

But... how does that actually work at the code level?

simonw added the research label Nov 5, 2023

simonw commented Nov 5, 2023

Currently batches are executed like this:

# Enrich batch
await self.enrich_batch(
    db, job["table_name"], rows, json.loads(job["config"])
)

That enrich_batch() method is provided by the embedding subclass.

Having that method raise an exception isn't ideal: if you hand it 100 rows and it works fine for 99 of them but errors on one, we don't want to assume the entire batch failed.

Instead, it could return a list of errors - something like this:

    errors = await self.enrich_batch(
        db, job["table_name"], rows, json.loads(job["config"])
    )

That returned list would need to map row primary keys to errors though - and the information about what a primary key is has not really been dealt with yet.

Maybe that list is [None] * 100 for the case where everything worked, and otherwise the errors are strings that correspond position-by-position to the incoming list of rows? Then the calling code could count them and write them to the database.
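
On the calling side that could look something like this (just a sketch - it assumes db is the usual Datasette database object and that _enrichment_jobs has the error_count column mentioned above):

    errors = await self.enrich_batch(
        db, job["table_name"], rows, json.loads(job["config"])
    )
    # Count the failures and bump the job's error_count column
    error_count = sum(1 for error in errors if error is not None)
    if error_count:
        await db.execute_write(
            "update _enrichment_jobs set error_count = error_count + ? where id = ?",
            (error_count, job["id"]),
        )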

Bit of an ugly thing to have to implement each time in the embeddings though.

Unless... there could be an Embedding subclass called OneAtATimeEmbedding or similar, where you implement just a single enrich_single(self, row) method which is allowed to raise any errors it likes, and those errors will then be stringified automatically by the default enrich_batch() method.

Then if you want to do fancy batches - like with the OpenAI embeddings API - you can implement enrich_batch() but most embedding implementations can just implement enrich_single() instead.
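
A rough sketch of that shape - the Embedding base class and the enrich_single(self, row) signature just follow the idea above, not a final API:

    class OneAtATimeEmbedding(Embedding):
        "Implement enrich_single(); the default enrich_batch() collects errors."

        async def enrich_single(self, row):
            raise NotImplementedError

        async def enrich_batch(self, db, table, rows, config):
            errors = []
            for row in rows:
                try:
                    await self.enrich_single(row)
                    errors.append(None)
                except Exception as ex:
                    # Stringify so the errors list lines up with the input rows
                    errors.append(str(ex))
            return errors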

simonw commented Nov 5, 2023

Then I can have a table called _enrichment_errors that looks like this:

create table _enrichment_errors (
    id integer primary key,
    job_id integer references _enrichment_jobs(id),
    row_pk text, -- JSON encoded, can be integer or string or ["compound", "pk"]
    error text,
    created_at text
)
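
The JSON encoding means row_pk can hold any of the three primary key shapes, for example:

    import json

    json.dumps(5)                   # '5' - integer primary key
    json.dumps("abc")               # '"abc"' - string primary key
    json.dumps(["compound", "pk"])  # '["compound", "pk"]' - compound primary key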

simonw added this to the First alpha milestone Nov 13, 2023

simonw commented Nov 28, 2023

I think it's on the plugins themselves to catch errors and write them to this table.

I'll add an await self.log_error(job_id, row_ids, error) method.
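
Something along these lines (a sketch only - whether it also takes the database object, and the exact timestamp handling, are still open):

    async def log_error(self, db, job_id, row_ids, error):
        # One _enrichment_errors row per failed primary key,
        # with the pk JSON-encoded so compound keys work too
        for row_id in row_ids:
            await db.execute_write(
                "insert into _enrichment_errors (job_id, row_pk, error, created_at)"
                " values (?, ?, ?, datetime('now'))",
                (job_id, json.dumps(row_id), str(error)),
            )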

simonw commented Nov 28, 2023

I've been thinking about having an optional RowWiseEnrichment subclass which applies the enrichment one row at a time using an enrich_row() method - I could implement that and have any exceptions it raises automatically logged.

simonw commented Nov 28, 2023

Or I could just have a default .enrich_batch() method which calls .enrich_row() - then the class author can decide which of those methods to override.

If you override .enrich_batch() then it's on you to record errors.
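
A sketch of that default (Enrichment is a placeholder class name, and row["id"] is purely a stand-in since primary key handling is still undecided):

    class Enrichment:
        async def enrich_row(self, db, table, row, config):
            raise NotImplementedError

        async def enrich_batch(self, db, job_id, table, rows, config):
            # Default: one row at a time, logging any exception it raises
            for row in rows:
                try:
                    await self.enrich_row(db, table, row, config)
                except Exception as ex:
                    # row["id"] is a placeholder for whatever the real pk is
                    await self.log_error(db, job_id, [row["id"]], str(ex))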

simonw commented Nov 29, 2023

New idea: instead of this:

async def log_error(self, db: "Database", job_id: int, id: Any, error: str):

I'll do this:

async def log_error(self, db: "Database", job_id: int, ids: List[Union[int, str, Tuple[Union[int, str], ...]]], error: str):

So any time you record an error you can record it against a LIST of row primary keys:

from typing import Union, Tuple, List

# Define the type for the elements in the list
ElementType = Union[int, str, Tuple[Union[int, str], ...]]

async def log_error(..., ids: List[ElementType]):
    pass

simonw commented Nov 29, 2023

That way I can have a default error logging mechanism where, if your enrich_batch() method throws an error, I can record that along with the IDs of all of the items in that batch.
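
In the execution loop that could look like this (a sketch; pks_for_rows() is a hypothetical helper for pulling out each row's primary key):

    try:
        await self.enrich_batch(
            db, job["table_name"], rows, json.loads(job["config"])
        )
    except Exception as ex:
        # The whole batch failed: log the error against every row in it
        await self.log_error(db, job["id"], pks_for_rows(rows), str(ex))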

simonw added a commit that referenced this issue Nov 29, 2023
* Error table and log_error() method, refs #7
* Log errors if batch raises exception, refs #7
* log_traceback=True option to log tracebacks in errors
simonw added a commit that referenced this issue Nov 29, 2023
simonw closed this as completed Nov 29, 2023