Initial plugin design #1
Comments
Assorted ideas:
Most basic version: you select an existing table (hence avoiding the need to implement a schema editing tool) and paste text into a textarea. I'll build that first.
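A minimal sketch of what "select an existing table" could look like, assuming a plain SQLite file accessed through Python's `sqlite3` module rather than Datasette's own database API. The helper names `table_columns` and `insert_extracted` are hypothetical, as is the shape of the extracted items (one dict per row).

```python
import sqlite3


def quote_identifier(name):
    """Quote a SQLite identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def table_columns(conn, table):
    """Return the column names of an existing table, in declaration order."""
    rows = conn.execute(
        "PRAGMA table_info({})".format(quote_identifier(table))
    ).fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk)
    return [row[1] for row in rows]


def insert_extracted(conn, table, items):
    """Insert extracted dicts, ignoring any keys that are not table columns."""
    columns = table_columns(conn, table)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        quote_identifier(table),
        ", ".join(quote_identifier(c) for c in columns),
        ", ".join("?" for _ in columns),
    )
    # Missing keys become NULL; the transaction commits when the block exits
    with conn:
        conn.executemany(sql, [[item.get(c) for c in columns] for item in items])
```

Reading the columns from the table itself is what lets this version skip a schema editor: the table defines the shape, and the extraction step only has to fill it.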
It's going to need a description for each column - it can guess in some cases, but the option to give it clues will help a lot.
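One way the per-column descriptions could be passed along, sketched under assumptions: the function definition follows the OpenAI function-calling format (`name`, `description`, `parameters` as JSON Schema), every column is treated as a string, and rows are wrapped in an `items` array to line up with the `items.item` prefix used with ijson in this thread. The name `build_extraction_function` and the function name `extract_data` are hypothetical.

```python
import json


def build_extraction_function(columns, descriptions=None):
    """Build a function definition whose arguments are a list of row objects.

    `descriptions` maps column name -> optional hint for the model.
    """
    descriptions = descriptions or {}
    properties = {}
    for column in columns:
        prop = {"type": "string"}  # assumption: treat every column as text
        hint = descriptions.get(column)
        if hint:
            # Only columns the user annotated get a description
            prop["description"] = hint
        properties[column] = prop
    return {
        "name": "extract_data",
        "description": "Extract rows matching the table columns from the text",
        "parameters": {
            "type": "object",
            "properties": {
                "items": {
                    "type": "array",
                    "items": {"type": "object", "properties": properties},
                }
            },
            "required": ["items"],
        },
    }


if __name__ == "__main__":
    schema = build_extraction_function(
        ["name", "age"], {"age": "Age in years, digits only"}
    )
    print(json.dumps(schema, indent=2))
```

Columns without a hint are still included, so the model can guess from the column name alone, while annotated columns carry the extra clue.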
I got this working, but it was really slow, because the OpenAI APIs take a while to stream back all of that JSON. I had a note about that in https://til.simonwillison.net/gpt3/openai-python-functions-data-extraction. So I spent some time and figured out how to parse the JSON incrementally with ijson as it streams in.

Short version:

```python
import json
import time

import ijson

# `chunks` is assumed to be an iterable of str fragments streamed back from the API
events = ijson.sendable_list()
# Emit each element of the top-level "items" array as soon as it is complete
coro = ijson.items_coro(events, "items.item")
seen_events = set()
for chunk in chunks:
    coro.send(chunk.encode("utf-8"))
    if events:
        # Any we have not seen yet?
        unseen_events = [e for e in events if json.dumps(e) not in seen_events]
        if unseen_events:
            for event in unseen_events:
                seen_events.add(json.dumps(event))
                print(json.dumps(event))
    # Pause for a second before handling the next chunk
    time.sleep(1)
```
The goal of this plugin is to provide a UI for extracting structured data from unstructured text, using the trick described in https://til.simonwillison.net/gpt3/openai-python-functions-data-extraction
Datasette is all about tables, so a plugin which makes it as easy as possible to turn unstructured data into table data makes a ton of sense.