doc:update/refine doc in README.md or source code (#94)
* continue to try a new finetune method which introduced in ggerganov/whisper.cpp#1951 after sync with upstream whispercpp

* Update FAQ.md

zhouwg authored Mar 23, 2024
1 parent e8846da commit d241d99
Showing 5 changed files with 159 additions and 43 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -149,7 +149,7 @@ modify <a href="https://github.com/cdeos/kantv/blob/master/build/envsetup.sh#L52

pay attention <a href="https://github.com/cdeos/kantv/blob/master/external/whispercpp/CMakeLists.txt#L54">here and modify it accordingly</a> if build-target is kantv-android and running Android device is NOT Xiaomi 14

TIP: a VERY powerful Linux PC / Linux workstation is HIGHLY recommended for this step.
a VERY powerful Linux PC / Linux workstation is HIGHLY recommended for this step.

```
. build/envsetup.sh
@@ -248,9 +248,9 @@ Report issue in various Android-based phone or even submit PR to this project is
- [How to integrate proprietary/open source codes to project KanTV for personal/proprietary/commercial R&D activity](https://github.com/cdeos/kantv/issues/74)
- [How to use whisper.cpp and ffmpeg to add subtitle to video](./docs/how-to-use-whispercpp-ffmpeg-add-subtitle-to-video.md)
- [Acknowledgement](./docs/acknowledgement.md)
- [F.A.Q](./docs/FAQ.md)


- Please do not send e-mail to me for technical question. Public technical discussion on github is preferred.
- feel free to submit issues or new features(focus on Android at the moment), volunteer support would be provided if time permits.

### ChangeLog

56 changes: 56 additions & 0 deletions docs/FAQ.md
@@ -0,0 +1,56 @@

- Should I use/reference kantv in my project?

Project KanTV is a personal/hobby project. It does not strive to provide a production-ready implementation. The main goals of the implementation are to be educational and hackable. There are no guarantees that the implementation is correct and bug-free, and things can break at any point in the future. Support and updates will depend mostly on contributions, since with time I will move on and won't dedicate too much time to the project.

If you plan to use kantv in your own project, keep in mind the above.

My advice is not to put all your eggs in the kantv basket, although I hope that using or referencing this project meets your needs.

<hr>

- Is there any IPR concern/risk in kantv?

Project KanTV has been developed almost entirely by myself since 05-2021, and there is NO IPR concern/risk. Some parts with IPR risk (the implementations of ChinaDRM, Widevine, WisePlay, TEE...) have been removed accordingly, because I received good IPR compliance training when I was a full-time employee at an MNC.


<hr>

- What's the relationship between kantv and whisper.cpp?


Project KanTV has NO relationship (including NO personal relationship) with whisper.cpp; it only uses the source code of whisper.cpp as its ASR engine.

That said, GGML's whisper.cpp is a truly excellent and amazing open source AI project, and very helpful for C/C++ programmers. The original author of GGML is the only person I know of who is an AI expert and a modern C++ master and is also familiar with iOS (app / native), Android (app / native) and Linux (app / native) software development at the same time. (I know a few programmers who are familiar with iOS (app / dev), Android (app / native) and Linux (app / native) software development, but they know very little about real AI tech.) I have to say that the original author of GGML has made a huge contribution to our planet.


<hr>


- Could I contact you by e-mail?

* Please do not send e-mail to me for technical/non-technical questions. Public technical discussion on GitHub is preferred.

* Feel free to submit issues or new features (focus on Android at the moment); volunteer support will be provided if time permits.


<hr>


- Can I sponsor project KanTV?

In Sep 2022, after I left my last employer, I became an unpaid/freelance programmer for various complex reasons. I started writing some code to solve some technical problems in a personal project, KanTV (which was launched in 05/2021), and also to practice my C/C++/Java programming. Just for fun, I implemented an online-TV recording feature in 12/2023, and I implemented a <a href="https://github.com/zhouwg/kantv/issues/64">device-side AI PoC on Xiaomi 14</a> with the great, excellent and amazing <a href="https://github.com/ggerganov/whisper.cpp">whisper.cpp</a> in 03/2024 - something I did not expect at all.

I have to say I heard of whisper.cpp too late. If there were no GFW (I have spent about RMB 10000 (USD 1500-1600) fixing network issues caused by the GFW since 2019), I would have heard of Georgi Gerganov's great whisper.cpp earlier. Of course, it is also a fact that many programmers and AI researchers from China heard of whisper.cpp very early.

With personal time and effort (I personally purchased a Dell PC and a Xiaomi 14 for software development, and I personally purchased a cloud server to set up a dedicated proxy to cross the GFW, so that GitHub can be accessed more stably and Google is available accordingly...), the project grew, and now I want to seek external resources to help this project grow.

I don't have an overseas phone number, so I could not create a GitHub Sponsors account. I only have a WeChat account, so I put my personal WeChat reward (aka "赞赏" in Chinese or "donation" in English) QR code here. In other words, sponsorship of this project can ONLY be done through WeChat Pay (this is also to comply with China's compliance policy). A TIP here: a lot of personal private information (including face identification) might be required to open a WeChat/WeChat Pay account, and, as is well known, we (including Tencent) are used to that because of China's compliance policy; if you mind that, please ignore this sponsorship info. Thanks for your understanding. Of course, I will list received sponsorships and how they are used from time to time.

![zhouwg-reward](https://github.com/zhouwg/kantv/assets/6889919/7832ef0e-1091-4a82-8f3a-eb78afae500b)

Still, if you do decide to sponsor me, the money will most likely go towards buying [various high-end powerful Android phones](./docs/high-end-android-phone.md) for device-side AI software development, paying for the cloud server, or buying some coffee or a meal for potential volunteer programmers who participate in the project's development.

Contributing PRs/code is the best sponsorship for project KanTV.

Thanks!
112 changes: 79 additions & 33 deletions external/whispercpp/jni/whispercpp-jni-impl.cpp
@@ -1,6 +1,26 @@
/*
* Copyright (c) 2024- KanTV Authors. All Rights Reserved.
*
* Copyright (c) zhou.weiguo(zhouwg2000@gmail.com), this clean-room implementation is for
*
* PoC(https://github.com/zhouwg/kantv/issues/64) in project KanTV. the initial implementation was done
*
* from 03-05-2024 to 03-16-2024. the initial implementation could be found at:
*
* https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6727
* https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620
* https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c
* https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java
*
*
* in short, it is a very concise implementation and the method here was never seen in any other similar
*
* (whisper.cpp related) open-source project before 03-05-2024.
*
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
@@ -13,8 +33,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*
* The above statement and notice must be included in corresponding files
* in derived project
* The above statement and notice must be included in corresponding files in derived project
*/

#define RUAPU_IMPLEMENTATION
@@ -54,9 +73,6 @@
#include <random>
#include <functional>




extern "C" {
#include <inttypes.h>
#include <math.h>
@@ -99,20 +115,11 @@ extern "C" {

}

#define MAX_PATH_LEN 512

// forward function declaration
static bool whisper_abort_callback(void * data);


//------------------------------------ added by zhou.weiguo(https://github.com/zhouwg) since 03-05-2024(2024-03-05) -----------------------------------------
//for PoC(https://github.com/cdeos/kantv/issues/64) in project KanTV
//
//I should follow coding style of GGML

#define MAX_SAMPLE_SIZE (1024 * 8 * 32)

#define MAX_SAMPLE_SIZE (1024 * 8 * 32)
#define MAX_PATH_LEN 512
#define MAX_WHISPER_IN_BUFFER_SIZE (1024 * 1024 * 5)
#define MAX_WHISPER_IN_BUFFER_SIZE (1024 * 1024 * 5)

class whisper_asr;

@@ -123,6 +130,9 @@ typedef struct {
char sz_model_path[MAX_PATH_LEN];
size_t n_threads;

//03-20-2024,referenced by:https://github.com/futo-org/whisper-acft
size_t n_decoding_mode; // 0:WHISPER_SAMPLING_GREEDY 1:WHISPER_SAMPLING_BEAM_SEARCH

size_t n_asr_mode; // 0: normal transcription 1: asr pressure test 2:benchmark 3: transcription + audio record
size_t n_benchmark_type; // what to benchmark: 0: asr, 1: memcpy 2: mulmat 3: whisper_encode/whisper full benchmark
bool b_use_gpu;
@@ -846,7 +856,9 @@ class whisper_asr {
n_end_time = ggml_time_us();
n_durtion = (n_end_time - n_begin_time) / 1000;

if (n_durtion > 1000) { // 1 seconds, very good on Xiaomi 14, about 500-700 ms with GGML model ggml-tiny.en-q8_0.bin
// 1 second, very good on Xiaomi 14, about 500-700 ms with GGML model ggml-tiny.en-q8_0.bin
// 300-900 ms are both OK with the latest upstream whisper.cpp (as of 03-22-2024), but whisper.cpp would produce sketchy/incorrect/repeated tokens
if (n_durtion > 300) {
LOGGD("duration of audio data gathering is: %d milliseconds\n", n_durtion);
LOGGD("size of gathered audio data: %d\n", _n_whisper_in_size);
LOGGD("total audio sample counts %d\n", _n_total_sample_counts);
@@ -900,15 +912,15 @@ class whisper_asr {
continue;
}

LOGGD("got resampled samples:%d, total samples:%d \n", result, _n_total_sample_counts);
//LOGGD("got resampled samples:%d, total samples:%d \n", result, _n_total_sample_counts);
while (1) {
p_samples += (result * sizeof(float));
result = swr_convert(_swr_ctx,
(uint8_t **) (&p_samples),
_n_total_sample_counts,
NULL,
0);
LOGGD("got resampled samples:%d, total samples:%d \n", result, _n_total_sample_counts);
//LOGGD("got resampled samples:%d, total samples:%d \n", result, _n_total_sample_counts);
if (0 == result) {
break;
}
@@ -979,7 +991,6 @@ class whisper_asr {
};


// TODO: remove the mutex
/**
*
* @param opaque uncompressed pcm data, presented as AVFrame
@@ -1010,7 +1021,7 @@ static const char * whisper_asr_callback(void * opaque) {
if ((NULL == p_asr_ctx))
return NULL;

if (1 == p_asr_ctx->n_asr_mode) { //ASR pressure test
if (1 == p_asr_ctx->n_asr_mode) { //ASR pressure test during online-TV playback
static std::string test_info;

test_info = whisper_get_time_string() + "\n" +
@@ -1020,14 +1031,13 @@ static const char * whisper_asr_callback(void * opaque) {
return test_info.c_str();
}

if (2 == p_asr_ctx->n_asr_mode) { //benchmark
if (2 == p_asr_ctx->n_asr_mode) { //ASR benchmark in standalone ASRResearchFragment.java
return NULL;
}

//pthread_mutex_lock(&p_asr_ctx->mutex);
//pthread_mutex_lock(&p_asr_ctx->mutex); // remove the mutex since 03-19-2024, crash would happen before 03-19 without mutex

audioframe = (AVFrame *) opaque;
p_samples = p_asr_ctx->p_sample_buffer;
num_samples = audioframe->nb_samples;

frame_size = av_samples_get_buffer_size(NULL, audioframe->channels, audioframe->nb_samples,
@@ -1134,7 +1144,7 @@ static const char * whisper_asr_callback(void * opaque) {
}
}

//pthread_mutex_unlock(&p_asr_ctx->mutex);
//pthread_mutex_unlock(&p_asr_ctx->mutex); // remove the mutex since 03-19-2024, crash would happen before 03-19 without mutex

return NULL;
}
@@ -1185,6 +1195,36 @@ static const char * whisper_asr_audio_to_text(const float * pf32_audio_buffer, i

begin_time = ggml_time_ms();
whisper_reset_timings(p_asr_ctx->p_context);

//03-20-2024, ref:https://github.com/futo-org/whisper-acft
p_asr_ctx->p_params->max_tokens = 256;
p_asr_ctx->p_params->temperature_inc = 0.0f;
//03-22-2024, don't use this new fine-tune method because it brings a side effect: the app crashes randomly
//p_asr_ctx->p_params->audio_ctx = std::min(1500, (int)ceil((double)num_samples / (double)(320.0)) + 16);

//replaced with default value, ref: https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h#L499
p_asr_ctx->p_params->audio_ctx = 0;

//p_asr_ctx->p_params->initial_prompt = "\" English online TV \"";
/*
p_asr_ctx->p_params->abort_callback_user_data = p_asr_ctx;
p_asr_ctx->p_params->abort_callback = [](void * user_data) -> bool {
auto *asr_ctx = reinterpret_cast<whisper_asr_context*>(user_data);
return true;
};
*/

p_asr_ctx->n_decoding_mode = WHISPER_SAMPLING_GREEDY;
if (WHISPER_SAMPLING_GREEDY == p_asr_ctx->n_decoding_mode) {
p_asr_ctx->p_params->strategy = WHISPER_SAMPLING_GREEDY;
p_asr_ctx->p_params->greedy.best_of = 1; //ref: https://github.com/openai/whisper/blob/f82bc59f5ea234d4b97fb2860842ed38519f7e65/whisper/transcribe.py#L264
} else {
p_asr_ctx->p_params->strategy = WHISPER_SAMPLING_BEAM_SEARCH;
p_asr_ctx->p_params->beam_search.beam_size = 5; //ref: https://github.com/openai/whisper/blob/f82bc59f5ea234d4b97fb2860842ed38519f7e65/whisper/transcribe.py#L265
p_asr_ctx->p_params->greedy.best_of = 5;
}
LOGGD("decoding_mode=%d, audio_ctx=%d\n", p_asr_ctx->n_decoding_mode, p_asr_ctx->p_params->audio_ctx);

result = whisper_full(p_asr_ctx->p_context, *p_asr_ctx->p_params, pf32_audio_buffer, num_samples);
if (0 != result) {
LOGW("whisper inference failure, pls check why?\n");
@@ -1194,7 +1234,7 @@ static const char * whisper_asr_audio_to_text(const float * pf32_audio_buffer, i
end_time = ggml_time_ms();

LOGGW("whisper inference cost %d ms\n", end_time - begin_time);
//whisper_print_timings(p_asr_ctx->p_context);
//whisper_print_timings(p_asr_ctx->p_context); // DO NOT uncomment this line

num_segments = whisper_full_n_segments(p_asr_ctx->p_context);
for (index = 0; index < num_segments; index++) {
@@ -1258,7 +1298,7 @@ int whisper_asr_init(const char * sz_model_path, int n_threads, int n_asrmode) {
}


// dynamic ISA detect by RUAPU
// dynamic ISA detect by RUAPU, prepare for SIMD optimization on Android device. but not used currently
ruapu_init();
const char* const* supported = ruapu_rua();
while (*supported) {
@@ -1339,7 +1379,7 @@ int whisper_asr_init(const char * sz_model_path, int n_threads, int n_asrmode) {
params.print_special = false;
params.translate = false; //first step is transcription, the second step is English -> Chinese
//params.initial_prompt = "hello,whisper.cpp";
params.language = "en";
//params.language = "en";
params.n_threads = n_threads;;
params.offset_ms = 0;
params.no_context = true;
@@ -1348,6 +1388,14 @@ int whisper_asr_init(const char * sz_model_path, int n_threads, int n_asrmode) {

params.speed_up = false;
params.debug_mode = false;
params.audio_ctx = 0;

params.suppress_blank = false;
params.suppress_non_speech_tokens = false;

//03-20-2024, ref:https://github.com/futo-org/whisper-acft
p_asr_ctx->n_decoding_mode = WHISPER_SAMPLING_BEAM_SEARCH;


//params.tdrz_enable = false;//whisper complain failed to compute log mel spectrogram when this flag was enabled
//params.suppress_blank = true;
@@ -1357,9 +1405,7 @@ int whisper_asr_init(const char * sz_model_path, int n_threads, int n_asrmode) {

p_asr_ctx->b_pre_convert = p_asr_ctx->b_enable_dump_16k_data = false;



LOGGV("leave kantv_asr_init\n");
LOGGV("leave whisper_asr_init\n");

return result;

@@ -1491,4 +1537,4 @@ int whisper_asr_reset(const char * sz_model_path, int n_threads, int n_asrmode)

LOGGD("leave asr reset\n");
return result;
}
}
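
The commented-out `audio_ctx` formula in `whisper_asr_audio_to_text` above can be checked in isolation. This is a minimal sketch, assuming 16 kHz mono input so that 320 samples correspond to one encoder position and 1500 positions cover Whisper's 30-second window. The helper name `audio_ctx_for_samples` and the constant names are hypothetical and do not appear in the commit.

```cpp
#include <algorithm>
#include <cmath>

// Assumed constants (not defined in the commit):
// 320 samples per encoder position at 16 kHz (20 ms), 1500 positions = 30 s.
constexpr int k_samples_per_audio_ctx = 320;
constexpr int k_max_audio_ctx         = 1500;
constexpr int k_audio_ctx_padding     = 16;

// Hypothetical helper mirroring the commented-out expression:
//   std::min(1500, (int)ceil((double)num_samples / 320.0) + 16)
static int audio_ctx_for_samples(int num_samples) {
    const int positions =
        (int) std::ceil((double) num_samples / (double) k_samples_per_audio_ctx);
    return std::min(k_max_audio_ctx, positions + k_audio_ctx_padding);
}
```

Under these assumptions, 1 second of audio (16000 samples) maps to 66, 5 seconds to 266, and anything at or beyond 30 seconds is clamped to 1500. The commit itself keeps `audio_ctx = 0` (the default) because the dynamic value caused random crashes.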
28 changes: 21 additions & 7 deletions external/whispercpp/jni/whispercpp-jni.h
@@ -1,6 +1,25 @@
/*
* Copyright (c) 2024- KanTV Authors. All Rights Reserved.
*
* Copyright (c) zhou.weiguo(zhouwg2000@gmail.com), this clean-room implementation is for
*
* PoC(https://github.com/zhouwg/kantv/issues/64) in project KanTV. the initial implementation was done
*
* from 03-05-2024 to 03-16-2024. the initial implementation could be found at:
*
* https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.cpp#L6727
* https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/whisper.h#L620
* https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/external/whispercpp/jni/whispercpp-jni.c
* https://github.com/cdeos/kantv/blob/kantv-poc-with-whispercpp/cdeosplayer/cdeosplayer-lib/src/main/java/org/ggml/whispercpp/whispercpp.java
*
*
* in short, it is a very concise implementation and the method here was never seen in any other similar
*
* (whisper.cpp related) open-source project before 03-05-2024.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
@@ -13,8 +32,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*
* The above statement and notice must be included in corresponding files
* in derived project
* The above statement and notice must be included in corresponding files in derived project
*/

#ifndef WHISPER_JNI_H
@@ -29,11 +47,7 @@
extern "C" {
#endif

// =================================================================================================
//
// the following is for PoC(https://github.com/cdeos/kantv/issues/64) in project KanTV
//
// =================================================================================================

// JNI helper function for benchmark
int whisper_get_cpu_core_counts(void);
void whisper_set_benchmark_status(int b_exit_benchmark);
File renamed without changes.
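
A note on the decoding-mode selection added to `whisper_asr_audio_to_text` in `whispercpp-jni-impl.cpp`: the branch can be read as a small pure function. The sketch below uses stand-in types (`sampling_strategy`, `decode_params`) and a hypothetical helper `apply_decoding_mode` so that it compiles without `whisper.h`; only the field names `strategy`, `greedy.best_of` and `beam_search.beam_size` and the values 1 and 5 come from the diff.

```cpp
// Stand-in types for illustration only; the real enum and parameter struct
// are defined in whisper.h and carry many more fields.
enum sampling_strategy { SAMPLING_GREEDY = 0, SAMPLING_BEAM_SEARCH = 1 };

struct decode_params {
    sampling_strategy strategy;
    struct { int best_of; }   greedy;
    struct { int beam_size; } beam_search;
};

// Hypothetical helper: same branch structure as the committed code.
static void apply_decoding_mode(decode_params & params, int n_decoding_mode) {
    if (SAMPLING_GREEDY == n_decoding_mode) {
        params.strategy       = SAMPLING_GREEDY;
        params.greedy.best_of = 1;   // single candidate per step
    } else {
        params.strategy              = SAMPLING_BEAM_SEARCH;
        params.beam_search.beam_size = 5;
        params.greedy.best_of        = 5;
    }
}
```

As committed, `n_decoding_mode` is assigned `WHISPER_SAMPLING_GREEDY` immediately before the check, so the beam-search branch is not taken at run time even though `whisper_asr_init` sets the mode to beam search.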
