Voice Transcription Server

A real-time voice transcription system that processes audio files using Whisper. The application functions as both a server and client, requiring HTTPS with self-signed certificates for secure communication.

Setup & Usage

Run x_setup.sh to install required dependencies
Execute the following scripts in order:
- x_run_server.sh
- x_run_client.sh
- x_ws.sh

Features

Audio processing using Whisper for accurate voice-to-text transcription
Real-time file watching system that monitors for new audio recordings
Built-in audio player for reviewing recorded files
Automatic FFmpeg preprocessing of audio files for optimal transcription
WebSocket endpoint for real-time transcription updates

Storage Structure

Recordings are organized hierarchically:

recordings/
└── YYYYMMDD/
└── UUID/
├── audio_files
└── transcriptions

API Documentation

WebSocket Endpoint

`/ws/{clientID}`

Method: WebSocket Connection
Description: Establishes a real-time WebSocket connection for receiving transcription updates
Parameters:
- clientID: Valid UUID of the client
Notes:
- Implements ping/pong with 60-second timeout
- Automatically disconnects on extended silence
- Validates UUID format

REST Endpoints

`/api/clients`

Method: GET
Description: Lists all active clients and their most recent transcription from the current day
Response: JSON map of client IDs to their latest TranscriptionMessage
Example Response:

{
    "client-uuid-1": {
        "timestamp": "2024-01-23T15:04:05Z",
        "text": "Latest transcription..."
    }
}

`/api/clients/{clientID}`

Method: GET
Description: Retrieves the most recent transcription for a specific client from the current day
Parameters:
- clientID: UUID of the client
Response: Single TranscriptionMessage object
Status Codes:
- 200: Success
- 404: Client not found or no messages for today

Static File Serving

Path: /
Description: Serves static files from the scribe/static directory
Note: All non-API routes default to static file serving

Development Notes:

Right now the client is what causes this to be as slow as it is from the Direct speech feel. The reason for this is that the client establishes a background noise level Wait for a detection level VAD that will indicate if someone is speaking. Then will happen is it will wait until there is a specific amount of time of silence before it then ships the data off to the server. Inside the server right now there is some commented out code that will take one second snapshots of a buffer that's being built as it's been transmitted in, And that makes it fast enough to wear it it looks like the user speech is coming right out.

But the problem is is that sometimes cuts up the person speech. The way that we could solve this is by having alternating frames on the server side such that we Snip the the audio is as it's being transmitted in and then a half a second later we do it again so we have a staggered offset that's being done then when we have as it as it comes out we could then analyze the two streams and merge them back together correctly, essentially zipping them back up. The problem with this is that it takes a little intuition of what context what's going on potentially I actually don't know I've never done this before, the way I intend solving this issue later is by by multiple client on the same device turn the internal of the application and then we'll just have a Tulpa who soul existence in this world will be to merge the streams and something coherent based on the context

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
audio		audio
client		client
scribe		scribe
server		server
whisper.cpp @ 0fbaac9		whisper.cpp @ 0fbaac9
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
demo.gif		demo.gif
demo.tape		demo.tape
gen_keys.sh		gen_keys.sh
go.mod		go.mod
go.sum		go.sum
main.go		main.go
x_run_client.sh		x_run_client.sh
x_run_server.sh		x_run_server.sh
x_setup.sh		x_setup.sh
x_ws.sh		x_ws.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice Transcription Server

Setup & Usage

Features

Storage Structure

API Documentation

WebSocket Endpoint

`/ws/{clientID}`

REST Endpoints

`/api/clients`

`/api/clients/{clientID}`

Static File Serving

Development Notes:

About

Releases

Packages

Languages

License

bosley/libas

Folders and files

Latest commit

History

Repository files navigation

Voice Transcription Server

Setup & Usage

Features

Storage Structure

API Documentation

WebSocket Endpoint

/ws/{clientID}

REST Endpoints

/api/clients

/api/clients/{clientID}

Static File Serving

Development Notes:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`/ws/{clientID}`

`/api/clients`

`/api/clients/{clientID}`

Packages