Currently, RAG is done simply by embedding each screenshot and comparing those embeddings to the query embedding. A time decay is applied to the score to favor more recent results.
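A minimal sketch of that scoring step, assuming cosine similarity over precomputed embeddings and an exponential decay; the half-life constant is a hypothetical parameter, not a value from the actual implementation:

```python
import math
import time

import numpy as np

# Hypothetical decay rate: a frame's score halves every hour.
HALF_LIFE_SECONDS = 3600.0


def time_adjusted_score(frame_embedding: np.ndarray,
                        query_embedding: np.ndarray,
                        frame_timestamp: float) -> float:
    """Cosine similarity to the query, discounted by the frame's age."""
    similarity = float(
        np.dot(frame_embedding, query_embedding)
        / (np.linalg.norm(frame_embedding) * np.linalg.norm(query_embedding))
    )
    age_seconds = max(0.0, time.time() - frame_timestamp)
    decay = math.exp(-math.log(2) * age_seconds / HALF_LIFE_SECONDS)
    return similarity * decay
```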
Retrieved frames are then deduplicated: for each group of frames falling within 2 minutes of one another, only the most recent frame is kept. Of those deduplicated frames, the top 3 by time-adjusted embedding score are selected.
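A sketch of the deduplication step under the same assumptions; here frames are chained into a group whenever they fall within 2 minutes of the previous frame, which is one plausible reading of the grouping rule. The `Frame` type is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float  # unix seconds
    score: float      # time-adjusted embedding score


def deduplicate_and_rank(frames: list[Frame],
                         window_seconds: float = 120.0,
                         top_k: int = 3) -> list[Frame]:
    # Walk frames chronologically, starting a new group whenever the gap
    # to the previous frame exceeds the 2-minute window.
    ordered = sorted(frames, key=lambda f: f.timestamp)
    groups: list[list[Frame]] = []
    for frame in ordered:
        if groups and frame.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(frame)
        else:
            groups.append([frame])
    # Keep only the most recent frame of each group, then take the top 3
    # by time-adjusted score.
    latest = [group[-1] for group in groups]
    return sorted(latest, key=lambda f: f.score, reverse=True)[:top_k]
```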
Those three frames are then converted to text by grouping the OCR output by block_num and ordering it by y position. The application package name and the local time of the screenshot are prepended to the text from the screenshot.
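A sketch of the text reconstruction, assuming Tesseract-style OCR rows with a `block_num` and a `y` coordinate per word; the field names and the header format are illustrative, not the project's actual schema:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OcrWord:
    block_num: int
    y: int
    text: str


def frame_to_text(app_package: str, local_time: str,
                  words: list[OcrWord]) -> str:
    # Group words by OCR block, then order each block top-to-bottom.
    blocks: dict[int, list[OcrWord]] = defaultdict(list)
    for word in words:
        blocks[word.block_num].append(word)
    lines = [f"App: {app_package}", f"Time: {local_time}"]
    for block_num in sorted(blocks):
        block = sorted(blocks[block_num], key=lambda w: w.y)
        lines.append(" ".join(w.text for w in block))
    return "\n".join(lines)
```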
This context from the 3 screenshots is passed to the LLM along with the original query.
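How the final prompt might be assembled; the wording and the separator are hypothetical:

```python
def build_prompt(query: str, frame_texts: list[str]) -> str:
    # frame_texts would be the frame_to_text output for the 3 frames.
    context = "\n\n---\n\n".join(frame_texts)
    return (
        "Answer the question using the following screenshot text.\n\n"
        f"{context}\n\n"
        f"Question: {query}"
    )
```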
Some ideas to improve:
- Improve OCR parsing to better include spatial information (could try retaining some positional data; see the sketch after this list)
- Create some app-specific parsing (such as for messages, to know which message came from whom)
- Could try integrating some UI object detection
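As a hypothetical sketch of the first idea, coarse positional data could survive the flattening by quantizing each word's x coordinate into indentation, so columns and chat bubbles keep some visual structure; `px_per_indent` is an assumed tuning knob:

```python
def line_with_indent(x: int, text: str, px_per_indent: int = 100) -> str:
    # Two spaces of indent per ~100 px of horizontal offset (assumed scale).
    return "  " * (x // px_per_indent) + text
```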