tweaks
sshh12 committed Sep 28, 2024
1 parent 218554c commit cd9a247
Showing 3 changed files with 70 additions and 2 deletions.
66 changes: 65 additions & 1 deletion README.md
@@ -1 +1,65 @@
# screen-tab-ai
# screen-complete

> Screen complete is a proof-of-concept, universal, screenshot-based text completion tool. Inspired by tools like Cursor and GitHub Copilot, it lets you fill in arbitrary selected text on your screen using a hot key.
## Quick Start (Windows/macOS)

1. Download the latest version from the [releases page](https://github.com/sshh12/screen-complete/releases)
2. In the same directory, create a config file `screen_complete.yml` containing `openai_api_key: ...`
3. Run `screen-complete` (macOS may require enabling screen recording permissions)

The UI/UX is extremely minimal. On a page where you want to fill in text:
1. Place your text cursor where you want to type (or select the text you want to replace)
2. Move your mouse (without clicking) to the top-left corner of the applicable window
3. Hold down `Ctrl+Q`
4. Move your mouse to the bottom-right corner of the applicable window
5. Release `Ctrl+Q`
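
The steps above yield two pointer positions: one when `Ctrl+Q` is pressed and one when it is released. The following is a minimal Go sketch of turning those two points into a capture region. The `Rect` type and `rectFromCorners` helper are hypothetical names used for illustration and are not taken from this repository. The swap logic is an added assumption so that dragging in the "wrong" direction still produces a valid region.

```go
package main

import "fmt"

// Rect is a hypothetical capture region: top-left origin plus size.
type Rect struct {
	X, Y, W, H int
}

// rectFromCorners builds a region from the pointer position at key press
// (x1, y1) and at key release (x2, y2). The corners are swapped if needed,
// so the width and height are never negative.
func rectFromCorners(x1, y1, x2, y2 int) Rect {
	if x2 < x1 {
		x1, x2 = x2, x1
	}
	if y2 < y1 {
		y1, y2 = y2, y1
	}
	return Rect{X: x1, Y: y1, W: x2 - x1, H: y2 - y1}
}

func main() {
	// Pointer moved from bottom-right to top-left; the result is normalized.
	r := rectFromCorners(800, 600, 100, 50)
	fmt.Printf("%+v\n", r) // prints {X:100 Y:50 W:700 H:550}
}
```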

## Examples

| Description | Image |
|-------------|-------|
| Writing text in a Google Doc | ![chrome_lOtDKw9Vsd](https://github.com/user-attachments/assets/eb7e2a84-52e5-480c-9fd8-b744c3b04f13) |
| Filling in the title of a GitHub issue | ![chrome_rLIIqjoeJE](https://github.com/user-attachments/assets/257c990f-fcfd-46cf-b0c8-d0c6a28e6137) |
| Drafting a Reddit comment | ![chrome_lnoue13hYT](https://github.com/user-attachments/assets/0f3a263c-8ab6-42da-8985-2369a2fadd5e) |


## Configuration

This tool currently supports OpenAI and Azure OpenAI. You only need to set the fields for one of the two providers.

### Via Environment Variables

```
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_DEPLOYMENT=...
OPENAI_API_KEY=...
OPENAI_MODEL=... (optional)
```
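
As a rough illustration of the "Azure OR OpenAI" rule, the Go sketch below picks a provider from the variables listed above. It is not the tool's actual loading code. The `Provider` type, the `selectProvider` function, and the choice to prefer Azure when both sets of variables are present are all assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Provider is a hypothetical summary of which backend is configured.
type Provider struct {
	Name  string // "azure" or "openai"
	Model string // Azure deployment name, or the optional OpenAI model
}

// selectProvider reads the documented variables through getenv.
// Assumption: Azure wins when all three Azure variables are set.
func selectProvider(getenv func(string) string) (Provider, error) {
	azureKey := getenv("AZURE_OPENAI_API_KEY")
	azureEndpoint := getenv("AZURE_OPENAI_ENDPOINT")
	azureDeployment := getenv("AZURE_OPENAI_DEPLOYMENT")
	if azureKey != "" && azureEndpoint != "" && azureDeployment != "" {
		return Provider{Name: "azure", Model: azureDeployment}, nil
	}
	if getenv("OPENAI_API_KEY") != "" {
		// OPENAI_MODEL is optional, so Model may be empty here.
		return Provider{Name: "openai", Model: getenv("OPENAI_MODEL")}, nil
	}
	return Provider{}, errors.New("set the three AZURE_OPENAI_* variables or OPENAI_API_KEY")
}

func main() {
	p, err := selectProvider(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("using provider:", p.Name)
}
```

Passing `getenv` as a parameter keeps the function easy to exercise with a map-backed lookup instead of the real process environment.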

### Via `screen_complete.yml`

```yaml
azure_openai_api_key: ...
azure_openai_endpoint: ...
azure_openai_deployment: ...
openai_api_key: ...
openai_model: ... (optional)
```
## Building
### Windows
1. Download MinGW, then add `C:\mingw64\bin` to the system `Path` environment variable
2. `go build -o screen-complete.exe cmd\screen_complete\main.go`

### macOS

1. `xcode-select --install`
2. `go build -o screen-complete cmd/screen_complete/main.go`

## Privacy

Your screen is only captured and sent to OpenAI when you release the hot key.
4 changes: 4 additions & 0 deletions cmd/screen_complete/main.go
@@ -43,6 +43,7 @@ func run() {
timeSinceLastEvent = time.Now()
img := screenshot.CaptureBounds(x1, y1, x2, y2)

fmt.Println("Generating...")

ocr := llm.PromptImage(img, llm.SystemAnalyzeScreenshot, llm.UserAnalyzeScreenshot)
fmt.Println("> ", ocr)
@@ -52,6 +53,9 @@ func run() {

result = strings.ReplaceAll(result, "\n", " ")
result = strings.TrimSpace(result)
if strings.HasPrefix(result, "\"") && strings.HasSuffix(result, "\"") {
result = result[1 : len(result)-1]
}

robotgo.TypeStr(result)
}
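
The post-processing added in this hunk can be sketched as a standalone Go function. The name `cleanCompletion` is hypothetical and does not appear in the commit. The `len(result) >= 2` guard is also an addition for this sketch: without it, a response consisting of a single `"` satisfies both `HasPrefix` and `HasSuffix`, and the slice `result[1:0]` would panic.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanCompletion mirrors the steps in the hunk above: flatten newlines,
// trim surrounding whitespace, then drop one pair of wrapping double quotes.
func cleanCompletion(result string) string {
	result = strings.ReplaceAll(result, "\n", " ")
	result = strings.TrimSpace(result)
	// Require at least two characters so a lone `"` is left untouched
	// instead of producing an invalid slice range.
	if len(result) >= 2 && strings.HasPrefix(result, "\"") && strings.HasSuffix(result, "\"") {
		result = result[1 : len(result)-1]
	}
	return result
}

func main() {
	fmt.Println(cleanCompletion("\"Hello\nworld\"\n")) // prints Hello world
}
```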
2 changes: 1 addition & 1 deletion pkg/llm/prompts.go
@@ -13,7 +13,7 @@ Respond in the format (without the quotes):
SystemComplete = `You are a typing assistant that works by responding to prompts within screenshots.
## Notes
- Respond only with text that the user wants you to elaborate or answer
- Respond ONLY with text that the user wants you to elaborate or answer (no "Sure, ...", etc)
- Do not just repeat the text, instead treat it as a prompt
- You may not use new lines or special formatting
- Keep in mind the formatting of the text near the highlighted content