
Run import tasks in the background and show people a progress indicator while they run #4

Closed
simonw opened this issue Feb 26, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

simonw commented Feb 26, 2024

Currently blocks for several seconds while the job runs, which isn't ideal.

simonw added the enhancement label on Feb 26, 2024
simonw commented Feb 26, 2024

Can't easily show a progress indicator in terms of 0-100%, but showing parsed results as they are extracted would be neat.

simonw commented Feb 26, 2024

Two ways to do this:

  • A database table with a record for each job, which can be polled for status
  • An in-memory mechanism for storing progress

I like the table method, since that makes things easier to debug in the future. It could also be used to help power future features like remembering the last used column descriptions.
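To make the table approach concrete, here's a minimal sketch of a job row that a UI can poll for status. The table and function names (`_jobs`, `start_job`, `poll_job`) are illustrative, not the plugin's actual API:

```python
import sqlite3

def create_jobs_table(db):
    db.execute(
        """
        create table if not exists _jobs (
            id integer primary key,
            status text,        -- pending, running, finished
            done_count integer  -- rows processed so far
        )
        """
    )

def start_job(db):
    return db.execute(
        "insert into _jobs (status, done_count) values ('pending', 0)"
    ).lastrowid

def poll_job(db, job_id):
    status, done = db.execute(
        "select status, done_count from _jobs where id = ?", (job_id,)
    ).fetchone()
    return {"status": status, "done": done}

db = sqlite3.connect(":memory:")
create_jobs_table(db)
job_id = start_job(db)
# The background worker updates the row as it makes progress:
db.execute("update _jobs set status = 'running', done_count = 5 where id = ?", (job_id,))
print(poll_job(db, job_id))  # {'status': 'running', 'done': 5}
```

Because the state lives in a table, it survives restarts and can be inspected with ordinary SQL when debugging.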

simonw commented Feb 26, 2024

Database table will be similar to the one used for Datasette enrichments: https://github.com/datasette/datasette-enrichments/blob/eeded4173fff0614e2efe559022a9023e5ff8640/datasette_enrichments/__init__.py#L35-L54

create table if not exists _enrichment_jobs (
    id integer primary key,
    status text, -- pending, running, cancelled, finished
    enrichment text, -- slug of enrichment
    database_name text,
    table_name text,
    filter_querystring text, -- querystring used to filter rows
    config text, -- JSON dictionary of config
    started_at text, -- ISO8601 when added
    finished_at text, -- ISO8601 when completed or cancelled
    cancel_reason text, -- null or reason for cancellation
    next_cursor text, -- next cursor to fetch
    row_count integer, -- number of rows to enrich at start
    error_count integer, -- number of rows with errors encountered
    done_count integer, -- number of rows processed
    actor_id text, -- optional ID of actor who created the job
    cost_100ths_cent integer -- cost of job so far in 1/100ths of a cent
)

simonw commented Feb 26, 2024

I'm torn on whether I should persist the full input text to the model. I think yes, for debuggability.

simonw commented Feb 26, 2024

Maybe this as a starting point:

create table if not exists _extract_jobs (
    id integer primary key,
    database_name text,
    table_name text,
    input text,
    custom_prompt text, -- for custom system prompt
    columns text, -- JSON function definition
    started_at text, -- ISO8601 when added
    finished_at text, -- ISO8601 when completed or cancelled
    row_count integer, -- starts at 0
    error text
);
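As a rough sketch, creating that table and recording a new job might look like this (the `start_extract_job` helper is hypothetical, shown only to illustrate the schema in use):

```python
import json
import sqlite3
from datetime import datetime, timezone

EXTRACT_JOBS_SQL = """
create table if not exists _extract_jobs (
    id integer primary key,
    database_name text,
    table_name text,
    input text,
    custom_prompt text, -- for custom system prompt
    columns text,       -- JSON function definition
    started_at text,    -- ISO8601 when added
    finished_at text,   -- ISO8601 when completed or cancelled
    row_count integer,  -- starts at 0
    error text
)
"""

def start_extract_job(db, database_name, table_name, input_text, columns):
    db.execute(EXTRACT_JOBS_SQL)
    return db.execute(
        "insert into _extract_jobs"
        " (database_name, table_name, input, columns, started_at, row_count)"
        " values (?, ?, ?, ?, ?, 0)",
        (
            database_name,
            table_name,
            input_text,
            json.dumps(columns),  # persist column definitions as JSON text
            datetime.now(timezone.utc).isoformat(),
        ),
    ).lastrowid

db = sqlite3.connect(":memory:")
job_id = start_extract_job(
    db, "data", "people", "some pasted text", {"name": "str", "age": "int"}
)
```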

simonw commented Feb 27, 2024

Options for polling for extracted records to display to the user:

  1. Poll the database table that is being populated (though that's tricky as we don't want to show any previously existing records - could use rowid for inserted ones but that won't help for replace-by-pk if we implement that)
  2. Poll against an in-memory data structure that exists just for the duration of the task
  3. Append the JSON to either a column in _extract_jobs or a separate table and poll that
  4. Instead of polling use WebSockets or SSE

I'm going to try option 2 first because I think it will be easiest to get working - can very easily change how that works later.
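Option 2 could be as simple as a module-level dict keyed by job ID, with a poll function that takes a cursor so each request only returns rows the client hasn't seen yet. This is a sketch with made-up names (`progress`, `record_row`, `poll`), not the plugin's actual implementation:

```python
# job_id -> {"done": bool, "rows": [...]}; lives only in the server process,
# so it disappears when the task (or server) ends - that's the trade-off.
progress = {}

def start(job_id):
    progress[job_id] = {"done": False, "rows": []}

def record_row(job_id, row):
    # Called by the background task each time a record is extracted
    progress[job_id]["rows"].append(row)

def finish(job_id):
    progress[job_id]["done"] = True

def poll(job_id, since=0):
    state = progress[job_id]
    return {
        "rows": state["rows"][since:],  # only rows the client hasn't seen
        "next": len(state["rows"]),     # cursor to pass on the next poll
        "done": state["done"],
    }
```

The browser polls in a loop, passing back `next` each time, until `done` is true.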

simonw commented Feb 27, 2024

I'm going to display the extracted data to the user as a raw JSON array. I considered a more human-friendly format like a table or definition list, but I think JSON is a better representation of what's actually happening, plus it gives them the chance to copy and paste it into another tool if they are so inclined.

simonw commented Feb 27, 2024

Should the rows be inserted one at a time as they are extracted, or should they all be inserted at once at the end of the operation?

  • One at a time means a server crash won't lose work-so-far.
  • At the end provides a chance to let the user preview what has been extracted and approve/deny the operation.

I'll try one at a time first and see how that feels.
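The one-at-a-time approach might look like this sketch, which commits after each record so a crash mid-job keeps everything processed so far (the helper name is illustrative; a real implementation would also need to validate the table and column names before interpolating them into SQL):

```python
import sqlite3

def insert_rows_as_extracted(db, table, extracted):
    # Write each record as soon as the model emits it, committing per row,
    # so partial progress survives a server crash.
    for record in extracted:
        columns = ", ".join(record)
        placeholders = ", ".join("?" for _ in record)
        db.execute(
            f"insert into [{table}] ({columns}) values ({placeholders})",
            tuple(record.values()),
        )
        db.commit()

db = sqlite3.connect(":memory:")
db.execute("create table people (name text, age integer)")
insert_rows_as_extracted(db, "people", [{"name": "Cleo", "age": 5}])
```

The preview-and-approve alternative would instead accumulate records in memory (or in the jobs table) and run a single batched insert once the user confirms.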

simonw commented Feb 27, 2024

I generated a new loading indicator using https://loading.io/ (unrestricted public domain license).

simonw closed this as completed in 4bcd623 on Feb 27, 2024
simonw commented Feb 27, 2024

Demo:

[animated GIF: projects-demo]

simonw added a commit that referenced this issue Mar 7, 2024
Refs #1, #2, #3, #4, #5, #6, #7, #12, #13