When I am using real-time transcription and I am not talking, it seems to parse random text. #4
This can be fixed with VAD (voice activity detection) support, but VAD is not yet implemented.
I am trying to apply VAD in the C++ source of my project, getting ideas from this file: https://github.com/vilassn/whisper_android/blob/master/app/src/main/cpp/silent_detection.cpp I tried calculating the dB level for each input audio chunk (of BUFFER_SIZE samples), then keeping only the chunks that contain speech and inserting them into outputBuffer. Then I use this vector to compute log_mel_spectrogram(...). However, the test results gave me a completely different sentence from the original. These are the results with thresholds of -45.0, -40.0, and -35.0 (screenshots followed here):
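For reference, here is a minimal Kotlin sketch of the dB-gating approach described above. This is not the project's actual code: `chunkDb` and `gateChunks` are hypothetical names, and the samples are assumed to be floats normalized to [-1.0, 1.0].

```kotlin
import kotlin.math.log10
import kotlin.math.sqrt

// Compute the RMS level of one PCM chunk in dBFS (0 dB = full scale).
fun chunkDb(samples: FloatArray): Double {
    val rms = sqrt(samples.map { it.toDouble() * it }.average())
    return if (rms > 0.0) 20.0 * log10(rms) else -160.0 // floor for digital silence
}

// Keep only chunks whose level exceeds the threshold (e.g. -40.0 dB).
fun gateChunks(chunks: List<FloatArray>, thresholdDb: Double): List<FloatArray> =
    chunks.filter { chunkDb(it) > thresholdDb }
```

One plausible reason for the garbled transcriptions, regardless of the threshold chosen: splicing non-adjacent chunks together before computing the log-mel spectrogram destroys the temporal continuity Whisper expects, so even correctly detected speech chunks can decode into a different sentence.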
Yeah, but I don't understand how VAD can fix the random text being detected. I will check what audio is recorded and report back.
@heromanofe 512 samples are taken as a window to determine silence over 31.25 ms. If there is a sequence of silent windows, say 16 in a row, then consider there to be no voice activity (i.e. silence). In short, check for 500 ms of silence instead of 31.25 ms; 500 ms means 16 windows in sequence. I hope this works. I should check this too.
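The 500 ms rule above can be sketched like this in Kotlin. This is a hypothetical illustration, not code from the repo: `isSilent` is a naive peak-amplitude check and its threshold is an assumed value.

```kotlin
val WINDOW_SIZE = 512          // samples per window (~31 ms at 16 kHz)
val SILENT_WINDOWS_NEEDED = 16 // 16 consecutive windows ~= 500 ms

// Naive per-window silence check: no sample exceeds the amplitude threshold.
fun isSilent(window: FloatArray, threshold: Float = 0.01f): Boolean =
    window.all { kotlin.math.abs(it) < threshold }

// Returns true if the buffer ends with at least 16 consecutive silent windows.
fun endsInSilence(samples: FloatArray): Boolean {
    var run = 0
    var i = 0
    while (i + WINDOW_SIZE <= samples.size) {
        run = if (isSilent(samples.copyOfRange(i, i + WINDOW_SIZE))) run + 1 else 0
        i += WINDOW_SIZE
    }
    return run >= SILENT_WINDOWS_NEEDED
}
```

The key point is that the run counter resets on any speech window, so a single 31.25 ms dip in energy is not mistaken for end-of-utterance.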
I've noticed an interesting thing: I have a multi-lang model, and it translates my speech when I think it shouldn't.
@heromanofe Yes, this is the default behaviour for other languages: if the input language is other than English, it translates to English. We need to regenerate the model with the required configuration.
Speaking of which, I would be interested in generating those .bin and .tflite files myself, or at least having some place where I can download other models. I will check in 1-2 hours what Whisper receives from the recorder.
https://1drv.ms/u/s!AgXqUQNVnl-xmZ07Nq71pVUibaZUOg?e=blb6zR <-- OneDrive link; if you want, I can send the file another way. 2023-12-07 18:18:47.100 16170-16184
Okay, you were so right :D I remembered that I had looked into VAD before. I implemented https://github.com/gkonovalov/android-vad with `VadYamnet vad = Vad.builder()` ... `SoundCategory soundCategory = vad.classifyAudio(samples);` and the result is this: 2023-12-07 19:27:45.835 7830-8027 Recorder com. Here is the OneDrive link to the file:
Has your problem been solved?
It was a VAD problem, though I wouldn't be celebrating just yet. I noticed there is some speech it detected as silence instead :D I need to fine-tune it, but then it's working 100% :P Thanks for your work.
Can you guide me on how to run the project from the repo https://github.com/gkonovalov/android-vad? I ran it, but when I clicked record, even though I was talking, the result was "Noise detected". I don't understand how it works.
Quick update on my situation: I decided to write Kotlin code for real-time recognition. It works very simply: I take your recording system and just leave out the 1-second-chunks part, and then in my code I have a system for tracking a timeout.

2023-12-11 19:53:26.300 30867-31012 WHISPER: New State com.ERPStudio.ErpDroid W READY

I am making a 2bl app, and I need both: TTS, which like here can be slow, and commands (like "start X, do Y"), which ideally should be very quick; but this 3-second delay is too much for me. What can you suggest for speed optimization? Keep in mind I am currently using whisper-tiny.tflite, i.e. the multi-lang model. Would using the English-only model speed things up?
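The timeout tracking described above could look something like the following sketch. `ChunkFlusher` is an invented name, the 500 ms value is an assumption, and the per-window VAD decision is assumed to come from elsewhere (e.g. the android-vad library mentioned earlier).

```kotlin
// Accumulate audio windows and flush the whole utterance to the transcriber
// once speech has been seen and then gone quiet for `timeoutMs`.
class ChunkFlusher(
    private val timeoutMs: Long,
    private val onFlush: (List<FloatArray>) -> Unit
) {
    private val buffer = mutableListOf<FloatArray>()
    private var lastSpeechMs = -1L // -1 means no speech seen since last flush

    // Call once per audio window with a VAD decision and a timestamp.
    fun feed(window: FloatArray, isSpeech: Boolean, nowMs: Long) {
        buffer.add(window)
        if (isSpeech) lastSpeechMs = nowMs
        // Flush only after speech occurred and then stayed quiet long enough.
        if (lastSpeechMs >= 0 && nowMs - lastSpeechMs >= timeoutMs) {
            onFlush(buffer.toList())
            buffer.clear()
            lastSpeechMs = -1L
        }
    }
}
```

Flushing whole utterances instead of fixed 1-second chunks avoids cutting words in half, which is one of the things that makes fixed chunking produce odd transcriptions.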
Transcription time varies from device to device. On a high-end device, transcription time will be lower. You can debug what is taking the most time.
Hi, first of all, thanks for the hard work. Is there a solution to the silence issue? I don't speak, there is complete silence, and words still come back to me.
@matanel-6over6 scroll up for screenshots; here is the library: https://github.com/gkonovalov/android-vad
@heromanofe Thanks for the quick reply. What should I take from the project I mentioned into Vilassn's project?
You add that library as a dependency in Gradle.
Do I need to add what you marked to the recorder class?
It's in the screenshot above; the implementation line goes in Gradle (app/build.gradle).
@heromanofe Yes, I understand, thank you very much.
@heromanofe Working great. Thank you very much.
I was able to set up the model and it works really great. My code is:
```kotlin
private fun testAudio() {
    // Initialize Whisper
    val mWhisper = Whisper(this) // Create Whisper instance

    // Load model and vocabulary for Whisper
    val basePath = Global.fileOperations.getOutputDirectory("/Models", this)!!.path
    val modelPath = basePath + "/whisper-tiny.tflite" // Provide model file path

    // Set a listener for Whisper to handle updates and results
    // Set a listener for Recorder to handle updates and audio data
    mRecorder.setListener(object : IRecorderListener {
        override fun onUpdateReceived(message: String) {
            // Handle Recorder status updates
        }
        // (rest of the listener was cut off in the original comment)
    })
}
```
It seemed to return:

[audioRecordData][fine] 5s(f:5014 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1
I'll make a hole in the hole.

then 2 times this:

[audioRecordData][fine] 10s(f:10000 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1

then:

I'll be back with a little .... <== repeated a lot
Thanks for your hard work :P