
Improve Querying #4

Open
cparish312 opened this issue Nov 19, 2024 · 0 comments

@cparish312 (Owner)

Currently, RAG (retrieval-augmented generation) is done simply by embedding each screenshot and comparing those embeddings to the query embedding. A time-decay term is added to the similarity score so that more recent results rank higher.
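The scoring step can be sketched roughly as below. The issue does not specify the decay function, so the additive, exponentially decaying recency bonus and the `decay_weight` and `half_life_s` parameters are hypothetical choices made for illustration, as is the `Frame` structure.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    frame_id: int
    timestamp: float         # seconds since epoch
    embedding: list[float]   # screenshot embedding (hypothetical structure)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def time_adjusted_score(query_embedding, frame, now, decay_weight=0.05, half_life_s=86400.0):
    """Similarity plus a recency bonus that halves every half_life_s seconds.

    The additive exponential-decay form is an assumption, not the project's formula.
    """
    similarity = cosine_similarity(query_embedding, frame.embedding)
    age_s = max(0.0, now - frame.timestamp)
    recency_bonus = decay_weight * 0.5 ** (age_s / half_life_s)
    return similarity + recency_bonus
```

With identical embeddings, a frame captured just now scores slightly higher than one captured a day earlier, so recency only breaks near-ties unless `decay_weight` is raised.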

Retrieved frames are grouped into 2-minute windows, and only the most recent frame from each group is kept. From these deduplicated frames, the top 3 by time-adjusted embedding score are selected.
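A minimal sketch of this deduplication and top-3 selection follows. The issue does not say exactly how the 2-minute groups are formed; here a group is assumed to start at its earliest frame and span 120 seconds. `ScoredFrame` and `select_context_frames` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScoredFrame:
    frame_id: int
    timestamp: float  # seconds since epoch
    score: float      # time-adjusted embedding score


def select_context_frames(frames, window_s=120.0, top_k=3):
    """Keep the most recent frame from each time window, then the top_k by score."""
    ordered = sorted(frames, key=lambda f: f.timestamp)
    representatives = []
    group_start = None
    for frame in ordered:
        # Assumption: a new group begins once a frame falls outside window_s of the group's first frame.
        if group_start is None or frame.timestamp - group_start > window_s:
            group_start = frame.timestamp
            representatives.append(frame)
        else:
            # Frames are in ascending time order, so the latest frame replaces the group's representative.
            representatives[-1] = frame
    representatives.sort(key=lambda f: f.score, reverse=True)
    return representatives[:top_k]
```

Note that a group is represented by its most recent frame even when an earlier frame in the same window scored higher, which matches the behavior described above.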

Those three frames are then converted to text by grouping the OCR output by block_num and ordering the blocks by y position. The app's package name and the local time of the screenshot are prepended to the text from each screenshot.
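The conversion can be sketched as follows, assuming the OCR output is a list of word dicts with Tesseract-style `block_num`, `top`, and `text` fields (the exact input format and the header layout are assumptions). Words inside a block keep the order the OCR engine emitted them in.

```python
from collections import defaultdict
from datetime import datetime


def frame_to_text(ocr_words, app_package, timestamp):
    """Turn OCR word boxes into text: one line per block, blocks ordered top to bottom.

    ocr_words: list of dicts with "block_num", "top", and "text" keys (assumed format).
    """
    blocks = defaultdict(list)
    for word in ocr_words:
        if word["text"].strip():  # skip empty OCR tokens
            blocks[word["block_num"]].append(word)
    # Order blocks by the y position (top edge) of their highest word.
    ordered_blocks = sorted(blocks.values(), key=lambda ws: min(w["top"] for w in ws))
    lines = [" ".join(w["text"] for w in ws) for ws in ordered_blocks]
    # datetime.fromtimestamp converts to the machine's local time zone.
    local_time = datetime.fromtimestamp(timestamp).strftime("%Y-%m-%d %H:%M")
    header = f"App: {app_package} | Time: {local_time}"  # hypothetical header format
    return "\n".join([header, *lines])
```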

The context from these 3 screenshots is then passed to the LLM along with the original query.
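One way to assemble the final prompt is sketched below. The template wording and the `build_prompt` name are hypothetical; the issue only states that the screenshot context and the original query are sent together.

```python
def build_prompt(query, contexts):
    """Combine per-screenshot context strings with the user's query (hypothetical template)."""
    sections = [f"Screenshot {i}:\n{text}" for i, text in enumerate(contexts, start=1)]
    return (
        "Answer the question using the screenshot text below.\n\n"
        + "\n\n".join(sections)
        + f"\n\nQuestion: {query}"
    )
```

Numbering each screenshot section gives the model a simple way to tell which app and time a given piece of text came from.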

Some ideas to improve:

  1. Improve OCR parsing to better preserve spatial information (e.g., by leaving in some positional data)
  2. Add app-specific parsing (e.g., for messaging apps, to determine who sent each message)
  3. Try integrating UI object detection
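As a rough illustration of ideas 1 and 2, a coarse horizontal-position tag could be kept for each OCR block. In many messaging apps, sent messages are right-aligned and received ones left-aligned, so such a tag might help the LLM attribute messages. The `(left, top, width, text)` block format, the `margin` threshold, and both function names are assumptions.

```python
def tag_alignment(left, width, screen_width, margin=0.15):
    """Return a coarse horizontal position tag for a text block (hypothetical heuristic)."""
    center = (left + width / 2) / screen_width
    if center < 0.5 - margin:
        return "[left]"
    if center > 0.5 + margin:
        return "[right]"
    return "[center]"


def annotate_blocks(blocks, screen_width):
    """blocks: list of (left, top, width, text) tuples, one per OCR block (assumed format)."""
    ordered = sorted(blocks, key=lambda b: b[1])  # top to bottom by y position
    return [
        f"{tag_alignment(left, width, screen_width)} {text}"
        for left, _top, width, text in ordered
    ]
```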