doc:update/refine doc in README.md or source code (#94)

* continue to try a new finetune method which introduced in ggerganov/whisper.cpp#1951 after sync with upstream whispercpp * Update FAQ.md
zhouwg · Mar 23, 2024 · d241d99
1 parent e8846da
commit d241d99
Showing 5 changed files with 159 additions and 43 deletions.
diff --git a/README.md b/README.md
@@ -149,7 +149,7 @@ modify <a href="https://github.com/cdeos/kantv/blob/master/build/envsetup.sh#L52
 
 pay attention <a href="https://github.com/cdeos/kantv/blob/master/external/whispercpp/CMakeLists.txt#L54">here and modify it accordingly</a> if build-target is kantv-android and running Android device is NOT Xiaomi 14
 
-TIP: a VERY powerful Linux PC / Linux workstation is HIGHLY recommended for this step.
+a VERY powerful Linux PC / Linux workstation is HIGHLY recommended for this step.
 
 ```
 . build/envsetup.sh
@@ -248,9 +248,9 @@ Report issue in various Android-based phone or even submit PR to this project is
 - [How to integrate proprietary/open source codes to project KanTV for personal/proprietary/commercial R&D activity](https://github.com/cdeos/kantv/issues/74)
 - [How to use whisper.cpp and ffmpeg to add subtitle to video](./docs/how-to-use-whispercpp-ffmpeg-add-subtitle-to-video.md)
 - [Acknowledgement](./docs/acknowledgement.md)
+- [F.A.Q](./docs/FAQ.md)
+
 
-- Please do not send e-mail to me for technical question. Public technical discussion on github is preferred.
-- feel free to submit issues or new features(focus on Android at the moment), volunteer support would be provided if time permits.
 
 ### ChangeLog
 

diff --git a/docs/FAQ.md b/docs/FAQ.md
@@ -0,0 +1,56 @@
+
+- Should I use/reference kantv in my project?
+
+Project KanTV is a personal/hobby project. It does not strive to provide a production-ready implementation. The main goals of the implementation are to be educational and hackable. There are no guarantees that the implementation is correct and bug-free, and things can break at any point in the future. Support and updates will depend mostly on contributions, since over time I will move on and won't dedicate much time to the project.
+
+If you plan to use kantv in your own project, keep in mind the above.
+
+My advice is not to put all your eggs in the kantv basket, although I hope this project can meet your needs if you use or reference it.
+
+<hr>
+
+- Is there any IPR concern/risk in kantv?
+
+Project KanTV has been developed almost entirely by myself since 05-2021, and there is NO IPR concern/risk. Some parts with IPR risk (the implementations of ChinaDRM, Widevine, WisePlay, TEE...) have been removed accordingly, because I received good IPR compliance training when I was a full-time employee at an MNC.
+
+
+<hr>
+
+- What's the relationship between kantv and whisper.cpp?
+
+
+Project KanTV has NO relationship (including NO personal relationship) with whisper.cpp; it only uses the source code of whisper.cpp as its ASR engine.
+
+That said, GGML's whisper.cpp is a truly excellent and amazing open source AI project and very helpful for C/C++ programmers. The original author of GGML is the only person I know of who is, at the same time, an AI expert, a modern C++ master, and familiar with iOS (app / native), Android (app / native), and Linux (app / native) software development. (I know a few programmers who are familiar with iOS, Android, and Linux (app / native) software development, but they know very little about real AI tech.) The original author of GGML has made a huge contribution to our planet.
+
+
+<hr>
+
+
+- Can I contact you by e-mail?
+
+  * Please do not send me e-mail about technical or non-technical questions. Public technical discussion on GitHub is preferred.
+
+  * Feel free to submit issues or feature requests (the focus is on Android at the moment); volunteer support will be provided if time permits.
+
+
+<hr>
+
+
+- Can I sponsor project KanTV?
+
+In Sep 2022, after I left my last employer, I became an unpaid/freelance programmer for various complex reasons. I started writing code to solve some technical problems in my personal project KanTV (which was launched in 05/2021) and also to practice my C/C++/Java programming. Just for fun, I implemented an online-TV recording feature in 12/2023, and I implemented a <a href="https://github.com/zhouwg/kantv/issues/64">device-side AI PoC on Xiaomi 14</a> with the great, excellent, and amazing <a href="https://github.com/ggerganov/whisper.cpp">whisper.cpp</a> in 03/2024, something I did not expect at all.
+
+I have to say I heard about whisper.cpp too late. If there were no GFW (I have spent about RMB 10000 (USD 1500-1600) since 2019 to fix network issues caused by the GFW), I would have heard about Georgi Gerganov's great whisper.cpp earlier. Of course, many programmers and AI researchers in China heard about whisper.cpp much earlier; that is also a fact.
+
+With personal time and effort (I personally purchased a Dell PC and a Xiaomi 14 for software development, and a cloud server to set up a dedicated proxy to cross the GFW so that GitHub is reachable more stably and Google becomes available), the project grew, and now I want to seek external resources to help this project grow.
+
+I don't have an overseas phone number, so I could not create a GitHub Sponsors account. I only have a WeChat account, so I put my personal WeChat reward (aka "赞赏" in Chinese or "donation" in English) QR code here. In other words, sponsorship of this project can ONLY be done through WeChat Pay (this is also to comply with China's compliance policy). A tip: a lot of personal private information (including face identification) might be required to open a WeChat/WeChat Pay account, and as is well known we (including Tencent) are used to that because of China's compliance policy; if you mind that, please ignore this sponsorship info. Thanks for your understanding. Of course, I will list received sponsorships and how they are used from time to time.
+
+![zhouwg-reward](https://github.com/zhouwg/kantv/assets/6889919/7832ef0e-1091-4a82-8f3a-eb78afae500b)
+
+Still, if you do decide to sponsor me, the money will most likely go towards buying [various high-end powerful Android phones](./docs/high-end-android-phone.md) for device-side AI software development and paying for the cloud server, or buying a coffee or a meal for potential volunteer programmers who participate in the project's development.
+
+Contributing PRs/code is the best sponsorship for project KanTV.
+
+Thanks!
diff --git a/external/whispercpp/jni/whispercpp-jni-impl.cpp b/external/whispercpp/jni/whispercpp-jni-impl.cpp
@@ -1,6 +1,26 @@
 /*
  * Copyright (c) 2024- KanTV Authors. All Rights Reserved.
  *
+ * Copyright (c) zhou.weiguo(zhouwg2000@gmail.com), this clean-room implementation is for
+ *
+ * PoC(https://github.com/zhouwg/kantv/issues/64) in project KanTV. the initial implementation was done
+ *
+ * from 03-05-2024 to 03-16-2024. the initial implementation could be found at:
+ *
+ * https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6727
+
+ * https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620
+
+ * https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c
+
+ * https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java
+ *
+ *
+ * in short, it is a very concise implementation and the method here was never seen in any other similar
+ *
+ * (whisper.cpp related) open-source project before 03-05-2024.
+ *
+ *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
  * You may obtain a copy of the License at
@@ -13,8 +33,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  *
- * The above statement and notice must be included in corresponding files
- * in derived project
+ * The above statement and notice must be included in corresponding files in derived project
  */
 
 #define RUAPU_IMPLEMENTATION
@@ -54,9 +73,6 @@
 #include <random>
 #include <functional>
 
-
-
-
 extern "C" {
 #include <inttypes.h>
 #include <math.h>
@@ -99,20 +115,11 @@ extern "C" {
 
 }
 
+#define MAX_PATH_LEN                    512
 
-// forward function declaration
-static bool whisper_abort_callback(void * data);
-
-
-//------------------------------------ added by zhou.weiguo(https://github.com/zhouwg) since 03-05-2024(2024-03-05) -----------------------------------------
-//for PoC(https://github.com/cdeos/kantv/issues/64) in project KanTV
-//
-//I should follow coding style of GGML
-
+#define MAX_SAMPLE_SIZE                 (1024 * 8 * 32)
 
-#define MAX_SAMPLE_SIZE  (1024 * 8 * 32)
-#define MAX_PATH_LEN     512
-#define MAX_WHISPER_IN_BUFFER_SIZE (1024 * 1024 * 5)
+#define MAX_WHISPER_IN_BUFFER_SIZE      (1024 * 1024 * 5)
 
 class whisper_asr;
 
@@ -123,6 +130,9 @@ typedef struct {
     char  sz_model_path[MAX_PATH_LEN];
     size_t n_threads;
 
+    //03-20-2024,referenced by:https://github.com/futo-org/whisper-acft
+    size_t n_decoding_mode;                         // 0:WHISPER_SAMPLING_GREEDY 1:WHISPER_SAMPLING_BEAM_SEARCH
+
     size_t n_asr_mode;                              // 0: normal transcription  1: asr pressure test 2:benchmark 3: transcription + audio record
     size_t n_benchmark_type;                        // what to benchmark: 0: asr, 1: memcpy 2: mulmat  3: whisper_encode/whisper full benchmark
     bool   b_use_gpu;
@@ -846,7 +856,9 @@ class whisper_asr {
             n_end_time = ggml_time_us();
             n_durtion = (n_end_time - n_begin_time) / 1000;
 
-            if (n_durtion > 1000) { // 1 seconds, very good on Xiaomi 14, about 500-700 ms with GGML model ggml-tiny.en-q8_0.bin
+            // 1 second, very good on Xiaomi 14, about 500-700 ms with GGML model ggml-tiny.en-q8_0.bin
+            // 300-900 ms both work with the latest upstream whisper.cpp (as of 03-22-2024), but whisper.cpp may produce sketchy/incorrect/repeated tokens
+            if (n_durtion > 300) {
                 LOGGD("duration of audio data gathering is: %d milliseconds\n", n_durtion);
                 LOGGD("size of gathered audio data: %d\n", _n_whisper_in_size);
                 LOGGD("total audio sample counts %d\n", _n_total_sample_counts);
@@ -900,15 +912,15 @@ class whisper_asr {
                     continue;
                 }
 
-                LOGGD("got resampled samples:%d, total samples:%d \n", result, _n_total_sample_counts);
+                //LOGGD("got resampled samples:%d, total samples:%d \n", result, _n_total_sample_counts);
                 while (1) {
                     p_samples += (result * sizeof(float));
                     result = swr_convert(_swr_ctx,
                                          (uint8_t **) (&p_samples),
                                          _n_total_sample_counts,
                                          NULL,
                                          0);
-                    LOGGD("got resampled samples:%d, total samples:%d \n", result, _n_total_sample_counts);
+                    //LOGGD("got resampled samples:%d, total samples:%d \n", result, _n_total_sample_counts);
                     if (0 == result) {
                         break;
                     }
@@ -979,7 +991,6 @@ class whisper_asr {
 };
 
 
-// TODO: remove the mutex
 /**
  *
  * @param  opaque          uncompressed pcm data, presented as AVFrame
@@ -1010,7 +1021,7 @@ static const char * whisper_asr_callback(void * opaque) {
     if ((NULL == p_asr_ctx))
         return NULL;
 
-    if (1 == p_asr_ctx->n_asr_mode) { //ASR pressure test
+    if (1 == p_asr_ctx->n_asr_mode) { //ASR pressure test during online-TV playback
         static std::string test_info;
 
         test_info = whisper_get_time_string() + "\n" +
@@ -1020,14 +1031,13 @@ static const char * whisper_asr_callback(void * opaque) {
         return test_info.c_str();
     }
 
-    if (2 == p_asr_ctx->n_asr_mode) { //benchmark
+    if (2 == p_asr_ctx->n_asr_mode) { //ASR benchmark in standalone ASRResearchFragment.java
         return NULL;
     }
 
-    //pthread_mutex_lock(&p_asr_ctx->mutex);
+    //pthread_mutex_lock(&p_asr_ctx->mutex); // remove the mutex since 03-19-2024, crash would happen before 03-19 without mutex
 
     audioframe  = (AVFrame *) opaque;
-    p_samples   = p_asr_ctx->p_sample_buffer;
     num_samples = audioframe->nb_samples;
 
     frame_size = av_samples_get_buffer_size(NULL, audioframe->channels, audioframe->nb_samples,
@@ -1134,7 +1144,7 @@ static const char * whisper_asr_callback(void * opaque) {
         }
     }
 
-    //pthread_mutex_unlock(&p_asr_ctx->mutex);
+    //pthread_mutex_unlock(&p_asr_ctx->mutex); // remove the mutex since 03-19-2024, crash would happen before 03-19 without mutex
 
     return NULL;
 }
@@ -1185,6 +1195,36 @@ static const char * whisper_asr_audio_to_text(const float * pf32_audio_buffer, i
 
     begin_time = ggml_time_ms();
     whisper_reset_timings(p_asr_ctx->p_context);
+
+    //03-20-2024, ref:https://github.com/futo-org/whisper-acft
+    p_asr_ctx->p_params->max_tokens        = 256;
+    p_asr_ctx->p_params->temperature_inc   = 0.0f;
+    //03-22-2024, don't use this new fine-tune method because it brings a side effect: the app crashes randomly
+    //p_asr_ctx->p_params->audio_ctx         = std::min(1500, (int)ceil((double)num_samples / (double)(320.0)) + 16);
+
+    //replaced with default value, ref: https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h#L499
+    p_asr_ctx->p_params->audio_ctx         = 0;
+
+    //p_asr_ctx->p_params->initial_prompt    = "\" English online TV \"";
+    /*
+    p_asr_ctx->p_params->abort_callback_user_data = p_asr_ctx;
+    p_asr_ctx->p_params->abort_callback = [](void * user_data) -> bool {
+        auto *asr_ctx = reinterpret_cast<whisper_asr_context*>(user_data);
+        return true;
+    };
+    */
+
+    p_asr_ctx->n_decoding_mode  = WHISPER_SAMPLING_GREEDY;
+    if (WHISPER_SAMPLING_GREEDY == p_asr_ctx->n_decoding_mode) {
+        p_asr_ctx->p_params->strategy = WHISPER_SAMPLING_GREEDY;
+        p_asr_ctx->p_params->greedy.best_of         = 1;    //ref: https://github.com/openai/whisper/blob/f82bc59f5ea234d4b97fb2860842ed38519f7e65/whisper/transcribe.py#L264
+    } else {
+        p_asr_ctx->p_params->strategy               = WHISPER_SAMPLING_BEAM_SEARCH;
+        p_asr_ctx->p_params->beam_search.beam_size  = 5;    //ref: https://github.com/openai/whisper/blob/f82bc59f5ea234d4b97fb2860842ed38519f7e65/whisper/transcribe.py#L265
+        p_asr_ctx->p_params->greedy.best_of         = 5;
+    }
+    LOGGD("decoding_mode=%d, audio_ctx=%d\n", p_asr_ctx->n_decoding_mode, p_asr_ctx->p_params->audio_ctx);
+
     result = whisper_full(p_asr_ctx->p_context, *p_asr_ctx->p_params, pf32_audio_buffer, num_samples);
     if (0 != result) {
         LOGW("whisper inference failure, pls check why?\n");
@@ -1194,7 +1234,7 @@ static const char * whisper_asr_audio_to_text(const float * pf32_audio_buffer, i
     end_time = ggml_time_ms();
 
     LOGGW("whisper inference cost %d ms\n", end_time - begin_time);
-    //whisper_print_timings(p_asr_ctx->p_context);
+    //whisper_print_timings(p_asr_ctx->p_context); // DO NOT uncomment this line
 
     num_segments = whisper_full_n_segments(p_asr_ctx->p_context);
     for (index = 0; index < num_segments; index++) {
@@ -1258,7 +1298,7 @@ int whisper_asr_init(const char * sz_model_path, int n_threads, int n_asrmode) {
      }
 
 
-     // dynamic ISA dectect by RUAPU
+     // dynamic ISA detection by RUAPU, in preparation for SIMD optimization on Android devices; not used currently
      ruapu_init();
      const char* const* supported = ruapu_rua();
      while (*supported) {
@@ -1339,7 +1379,7 @@ int whisper_asr_init(const char * sz_model_path, int n_threads, int n_asrmode) {
      params.print_special           = false;
      params.translate               = false; //first step is transcription, the second step is English -> Chinese
      //params.initial_prompt        = "hello,whisper.cpp";
-     params.language                = "en";
+     //params.language                = "en";
      params.n_threads               = n_threads;;
      params.offset_ms               = 0;
      params.no_context              = true;
@@ -1348,6 +1388,14 @@ int whisper_asr_init(const char * sz_model_path, int n_threads, int n_asrmode) {
 
      params.speed_up                = false;
      params.debug_mode              = false;
+     params.audio_ctx               = 0;
+
+     params.suppress_blank              = false;
+     params.suppress_non_speech_tokens  = false;
+
+     //03-20-2024, ref:https://github.com/futo-org/whisper-acft
+     p_asr_ctx->n_decoding_mode         = WHISPER_SAMPLING_BEAM_SEARCH;
+
 
      //params.tdrz_enable                  = false;//whisper complain failed to compute log mel spectrogram when this flag was enabled
      //params.suppress_blank               = true;
@@ -1357,9 +1405,7 @@ int whisper_asr_init(const char * sz_model_path, int n_threads, int n_asrmode) {
 
      p_asr_ctx->b_pre_convert = p_asr_ctx->b_enable_dump_16k_data = false;
 
-
-
-     LOGGV("leave kantv_asr_init\n");
+     LOGGV("leave whisper_asr_init\n");
 
      return result;
 
@@ -1491,4 +1537,4 @@ int whisper_asr_reset(const char * sz_model_path, int n_threads, int n_asrmode)
 
     LOGGD("leave asr reset\n");
     return result;
-}
+}
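
The commented-out `audio_ctx` expression in `whisper_asr_audio_to_text` above sizes the encoder context from the number of input samples instead of using the default. A minimal standalone sketch of that arithmetic follows, assuming 16 kHz mono input, so that one encoder position covers 320 samples (20 ms) and 1500 positions cover Whisper's 30-second window. The helper name `compute_audio_ctx` and the named constants are hypothetical and introduced only for illustration; they are not part of KanTV or whisper.cpp.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical constants mirroring the literals in the commented-out expression.
constexpr int kSamplesPerEncoderPosition = 320;   // assumption: 16 kHz input, 20 ms per position
constexpr int kMaxAudioCtx               = 1500;  // upper bound used in the expression (30 s window)
constexpr int kPaddingPositions          = 16;    // safety margin added in the expression

// Hypothetical helper: same arithmetic as
//   std::min(1500, (int)ceil((double)num_samples / 320.0) + 16)
int compute_audio_ctx(int num_samples) {
    // Round up so that a partially filled position is still counted.
    const int positions = static_cast<int>(
        std::ceil(static_cast<double>(num_samples) / kSamplesPerEncoderPosition));
    // Add the margin, then cap at the full-window size.
    return std::min(kMaxAudioCtx, positions + kPaddingPositions);
}
```

For a 1-second clip (16000 samples) this yields 50 + 16 = 66 positions, and a full 30-second clip (480000 samples) is capped at 1500. The commit itself keeps `audio_ctx = 0` (the whisper.cpp default), because the shortened context is reported in the diff comment to crash the app randomly.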
diff --git a/external/whispercpp/jni/whispercpp-jni.h b/external/whispercpp/jni/whispercpp-jni.h
@@ -1,6 +1,25 @@
 /*
  * Copyright (c) 2024- KanTV Authors. All Rights Reserved.
  *
+ * Copyright (c) zhou.weiguo(zhouwg2000@gmail.com), this clean-room implementation is for
+ *
+ * PoC(https://github.com/zhouwg/kantv/issues/64) in project KanTV. the initial implementation was done
+ *
+ * from 03-05-2024 to 03-16-2024.the initial implementation could be found at:
+ *
+ * https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6727
+
+ * https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620
+
+ * https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c
+
+ * https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java
+ *
+ *
+ * in short, it is a very concise implementation and the method here was never seen in any other similar
+ *
+ * (whisper.cpp related) open-source project before 03-05-2024.
+ *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
  * You may obtain a copy of the License at
@@ -13,8 +32,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  *
- * The above statement and notice must be included in corresponding files
- * in derived project
+ * The above statement and notice must be included in corresponding files in derived project
  */
 
 #ifndef WHISPER_JNI_H
@@ -29,11 +47,7 @@
 extern "C" {
 #endif
 
-    // =================================================================================================
-    //
-    // the following is for PoC(https://github.com/cdeos/kantv/issues/64) in project KanTV
-    //
-    // =================================================================================================
+
     // JNI helper function for benchmark
     int          whisper_get_cpu_core_counts(void);
     void         whisper_set_benchmark_status(int b_exit_benchmark);

diff --git a/toolchain/.gitignore → prebuilts/toolchain/.gitignore b/toolchain/.gitignore → prebuilts/toolchain/.gitignore