
eval-callback: Example how to use eval callback for debugging #6576

Merged: 25 commits merged into master from hp/ggml/debug on Apr 11, 2024

Conversation

phymbert (Collaborator) commented Apr 10, 2024

Motivation

It can be useful to debug the inference graph, for example to compare it against the original PyTorch implementation.
It is also useful for retrieving intermediate tensor data, e.g. for imatrix or other advanced NLP use cases.
This example shows how to use the inference (eval) callback.
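
For context, a minimal sketch of how such a callback is wired up (not the exact code in this example; the callback signature and the cb_eval fields are from the ggml/llama.cpp headers of this period):

#include "llama.h"
#include "ggml.h"

#include <cstdio>

// The scheduler calls this twice per graph node: first with ask == true to
// query whether we want to observe this tensor, then with ask == false once
// its data is available to read.
static bool debug_cb(struct ggml_tensor * t, bool ask, void * user_data) {
    (void) user_data;
    if (ask) {
        return true; // observe every node
    }
    printf("%s = (%s) %s\n", t->name, ggml_type_name(t->type), ggml_op_name(t->op));
    return true; // returning true continues the graph evaluation
}

// Registered through llama_context_params before creating the context:
//   llama_context_params cparams = llama_context_default_params();
//   cparams.cb_eval           = debug_cb;
//   cparams.cb_eval_user_data = nullptr;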

Suggestions from @slaren have been incorporated, thanks.

Changes

  • Add a simple example ggml-debug that prints each operation and outputs tensors' data.

Example

eval-callback \
  --hf-repo ggml-org/models \
  --hf-file phi-2/ggml-model-q4_0.gguf \
  --model phi-2-q4_0.gguf \
  --prompt hello \
  --seed 42 \
  -ngl 33
Output
ggml_debug:                 inp_embd = (f32)   GET_ROWS(token_embd.weight{2560, 51200, 1, 1}, inp_tokens{1, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [ -0.0181,   0.0272,   0.0272, ...],
                                      ],
                                     ]
ggml_debug:                   norm-0 = (f32)       NORM(CUDA0#inp_embd#0{2560, 1, 1, 1}, }) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [ -0.6989,   1.0636,   1.0636, ...],
                                      ],
                                     ]
ggml_debug:                 norm_w-0 = (f32)        MUL(norm-0{2560, 1, 1, 1}, blk.0.attn_norm.weight{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [ -0.1800,   0.2817,   0.2632, ...],
                                      ],
                                     ]
ggml_debug:              attn_norm-0 = (f32)        ADD(norm_w-0{2560, 1, 1, 1}, blk.0.attn_norm.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                   wqkv-0 = (f32)    MUL_MAT(blk.0.attn_qkv.weight{2560, 7680, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {7680, 1, 1, 1} 
# redacted
ggml_debug:                   bqkv-0 = (f32)        ADD(wqkv-0{7680, 1, 1, 1}, blk.0.attn_qkv.bias{7680, 1, 1, 1}}) = {7680, 1, 1, 1} 
# redacted
ggml_debug:            bqkv-0 (view) = (f32)       VIEW(bqkv-0{7680, 1, 1, 1}, }) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                   Qcur-0 = (f32)       CONT(bqkv-0 (view){2560, 1, 1, 1}, }) = {2560, 1, 1, 1} 
# redacted
ggml_debug:        Qcur-0 (reshaped) = (f32)    RESHAPE(Qcur-0{2560, 1, 1, 1}, }) = {80, 32, 1, 1} 
# redacted
ggml_debug:                   Qcur-0 = (f32)       ROPE(Qcur-0 (reshaped){80, 32, 1, 1}, CUDA0#inp_pos#0{1, 1, 1, 1}}) = {80, 32, 1, 1} 
                                     [
                                      [
                                       [ -1.1135,   1.4604,  -1.9226, ...],
                                       [ -0.3608,   0.5076,  -1.8866, ...],
                                       [  1.7643,   0.0273,  -2.1065, ...],
                                       ...
                                      ],
                                     ]
...
ggml_debug:        kq_soft_max_ext-0 = (f32)   SOFT_MAX(kq-0{32, 1, 32, 1}, CUDA0#KQ_mask#0{32, 1, 1, 1}}) = {32, 1, 32, 1}
                                     [
                                      [
                                       [  1.0000,   0.0000,   0.0000, ...],
                                      ],
                                      [
                                       [  1.0000,   0.0000,   0.0000, ...],
                                      ],
                                      [
                                       [  1.0000,   0.0000,   0.0000, ...],
                                      ],
                                     ...
                                     ]
ggml_debug:                    kqv-0 = (f32)    MUL_MAT(v-0{32, 80, 32, 1}, kq_soft_max_ext-0{32, 1, 32, 1}}) = {80, 1, 32, 1} 
                                     [
                                      [
                                       [ -0.2136,  -0.2137,   0.3335, ...],
                                      ],
                                      [
                                       [ -0.2139,   0.2949,  -0.0338, ...],
                                      ],
                                      [
                                       [ -0.4204,  -0.0442,  -0.6392, ...],
                                      ],
                                     ...
                                     ]
ggml_debug:             kqv_merged-0 = (f32)    PERMUTE(kqv-0{80, 1, 32, 1}, }) = {80, 32, 1, 1} 
                                     [
                                      [
                                       [ -0.2136,  -0.2137,   0.3335, ...],
                                       [ -0.2139,   0.2949,  -0.0338, ...],
                                       [ -0.4204,  -0.0442,  -0.6392, ...],
                                       ...
                                      ],
                                     ]
ggml_debug:        kqv_merged_cont-0 = (f32)       CONT(kqv_merged-0{80, 32, 1, 1}, }) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                 kqv_wo-0 = (f32)    MUL_MAT(blk.0.attn_output.weight{2560, 2560, 1, 1}, kqv_merged_cont-0{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                kqv_out-0 = (f32)        ADD(kqv_wo-0{2560, 1, 1, 1}, blk.0.attn_output.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                 ffn_up-0 = (f32)    MUL_MAT(blk.0.ffn_up.weight{2560, 10240, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {10240, 1, 1, 1} 
# redacted
ggml_debug:               ffn_up_b-0 = (f32)        ADD(ffn_up-0{10240, 1, 1, 1}, blk.0.ffn_up.bias{10240, 1, 1, 1}}) = {10240, 1, 1, 1} 
# redacted
ggml_debug:               ffn_gelu-0 = (f32)      UNARY(ffn_up_b-0{10240, 1, 1, 1}, }) = {10240, 1, 1, 1} 
# redacted
ggml_debug:               ffn_down-0 = (f32)    MUL_MAT(blk.0.ffn_down.weight{10240, 2560, 1, 1}, ffn_gelu-0{10240, 1, 1, 1}}) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                ffn_out-0 = (f32)        ADD(ffn_down-0{2560, 1, 1, 1}, blk.0.ffn_down.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [  0.3379,  -0.0695,  -0.1994, ...],
                                      ],
                                     ]
# redacted
ggml_debug:                node_1218 = (f32)   GET_ROWS(l_out-30{2560, 1, 1, 1}, CUDA0#inp_out_ids#0{1, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [  5.1697,   0.5750,   2.1958, ...],
                                      ],
                                     ]
ggml_debug:                 l_out-31 = (f32)        ADD(l_out-31{2560, 1, 1, 1}, node_1218{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [  5.3609,   0.9790,   2.5628, ...],
                                      ],
                                     ]
# redacted
ggml_debug:    result_output_no_bias = (f32)    MUL_MAT(output.weight{2560, 51200, 1, 1}, result_norm{2560, 1, 1, 1}}) = {51200, 1, 1, 1} 
# redacted
ggml_debug:            result_output = (f32)        ADD(result_output_no_bias{51200, 1, 1, 1}, output.bias{51200, 1, 1, 1}}) = {51200, 1, 1, 1} 
                                     [
                                      [
                                       [ 10.6199,   8.5389,  11.3658, ...],
                                      ],
                                     ]
@phymbert phymbert added the labels "model" (Model specific) and "demo" (Demonstrate some concept or idea, not intended to be merged) on Apr 10, 2024
@phymbert phymbert requested review from ggerganov and slaren April 10, 2024 01:57
slaren (Collaborator) commented Apr 10, 2024

Pretty cool. Some notes:

  • There is no need to use a mutex; the callback will never be called from multiple threads.
  • You should at least check the type of the tensors: if only float is supported, ignore the other types (a sketch follows after the loop below).
  • To work correctly with non-contiguous tensors you need to take the strides into account. This is the general pattern:
ggml_tensor * t;   // the tensor passed to the callback
char * data;       // pointer to the (host-accessible) tensor data

// iterate using the byte strides in t->nb so that non-contiguous tensors
// (views, permutations) are read correctly
for (int64_t i3 = 0; i3 < t->ne[3]; i3++) {
    for (int64_t i2 = 0; i2 < t->ne[2]; i2++) {
        for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
            for (int64_t i0 = 0; i0 < t->ne[0]; i0++) {
                size_t i = i3*t->nb[3] + i2*t->nb[2] + i1*t->nb[1] + i0*t->nb[0];
                float v = *(float *)(data + i);
            }
        }
    }
}
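
A hedged sketch of the type check from the second note (f32 only here; f16 and quantized types would need to be converted before reading):

// skip tensors we cannot read directly as floats
static bool printable_type(const struct ggml_tensor * t) {
    return t->type == GGML_TYPE_F32;
}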


@phymbert phymbert marked this pull request as draft April 10, 2024 02:36
@phymbert phymbert marked this pull request as ready for review April 10, 2024 07:57
phymbert (Collaborator, Author) commented:

  • To work correctly with non-contiguous tensors you need to take into account the strides.

Looks better, cool!

@phymbert phymbert changed the title from "gguf-debug: Example how to use callback for debugging" to "ggml-debug: Example how to use callback for debugging" on Apr 10, 2024
@phymbert phymbert changed the title from "ggml-debug: Example how to use callback for debugging" to "ggml-debug: Example how to use eval callback for debugging" on Apr 10, 2024
phymbert (Collaborator, Author) commented:

@slaren Could you have a quick look at this when you have time, please?

@phymbert phymbert requested a review from slaren April 10, 2024 21:45
phymbert (Collaborator, Author) commented:

@slaren @ggerganov Can I merge?

ggerganov (Owner) left a comment


Nice tool! I've found that printing the total sum of the elements in a tensor is also a useful metric to look at when debugging (a sketch follows below).

Merge after slaren approves
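
A hedged sketch of that metric, reusing the stride-aware loop from earlier in the thread (assumes an f32 tensor whose data is host-accessible):

#include "ggml.h"

// sum of all elements of an f32 tensor, honoring the byte strides in t->nb
static double tensor_sum_f32(const struct ggml_tensor * t, const char * data) {
    double sum = 0.0;
    for (int64_t i3 = 0; i3 < t->ne[3]; i3++) {
        for (int64_t i2 = 0; i2 < t->ne[2]; i2++) {
            for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
                for (int64_t i0 = 0; i0 < t->ne[0]; i0++) {
                    const size_t i = i3*t->nb[3] + i2*t->nb[2] + i1*t->nb[1] + i0*t->nb[0];
                    sum += *(const float *)(data + i);
                }
            }
        }
    }
    return sum;
}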

slaren (Collaborator) commented Apr 11, 2024

Not a big deal, but I think that ggml-debug will be a confusing name for this example. Maybe something like eval-callback would be clearer.

@phymbert phymbert changed the title from "ggml-debug: Example how to use eval callback for debugging" to "eval-callback: Example how to use eval callback for debugging" on Apr 11, 2024
@phymbert phymbert merged commit b804b1e into master Apr 11, 2024
58 of 61 checks passed
@phymbert phymbert deleted the hp/ggml/debug branch April 11, 2024 12:51
A Collaborator commented on this printf call in the example:

printf("%s: %24s = (%s) %10s(%s{%s}, %s}) = {%s}\n", __func__,
    t->name, ggml_type_name(t->type), ggml_op_name(t->op),
I forgot to mention that you can use ggml_op_desc instead of ggml_op_name to get proper names for unary ops too instead of just UNARY.
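
A hedged sketch of the substitution (the format string is shortened here for illustration):

// ggml_op_desc(t) resolves GGML_OP_UNARY nodes to the specific unary op
// name (e.g. "GELU") and falls back to the regular op name otherwise
printf("%s: %24s = (%s) %10s(...)\n", __func__,
       t->name, ggml_type_name(t->type), ggml_op_desc(t));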

HanClinto (Collaborator) commented:

I'm not sure, but it looks like this PR is behind the currently failing CI builds on macOS, and I'm not currently smart enough to figure out why.

Example: https://github.com/ggerganov/llama.cpp/actions/runs/8651988935/job/23723903645#step:5:4773

Also, I could be wrong, but this PR might have snuck in a libcurl dependency at a level that we aren't comfortable with; I don't have a full handle on this yet either.

slaren (Collaborator) commented Apr 11, 2024

Maybe adding -ngl 0 to the test to avoid using Metal would work; the Metal backend doesn't work reliably in the GitHub runners.
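
For example, the test could reuse the invocation from the PR description with GPU offload disabled (model path hypothetical):

eval-callback \
  --model phi-2-q4_0.gguf \
  --prompt hello \
  --seed 42 \
  -ngl 0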

HanClinto added a commit to HanClinto/llama.cpp that referenced this pull request Apr 11, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
eval-callback: Example how to use eval callback for debugging (ggerganov#6576)

* gguf-debug: Example how to use ggml callback for debugging

* gguf-debug: no mutex, verify type, fix stride.

* llama: cv eval: move cb eval field in common gpt_params

* ggml_debug: use common gpt_params to pass cb eval.
Fix get tensor SIGV random.

* ggml_debug: ci: add tests

* ggml_debug: EOL in CMakeLists.txt

* ggml_debug: Remove unused param n_batch, no batching here

* ggml_debug: fix trailing spaces

* ggml_debug: fix trailing spaces

* common: fix cb_eval and user data not initialized

* ci: build revert label

* ggml_debug: add main test label

* doc: add a model: add a link to ggml-debug

* ggml-debug: add to make toolchain

* ggml-debug: tests add the main label

* ggml-debug: ci add test curl label

* common: allow the warmup to be disabled in llama_init_from_gpt_params

* ci: add curl test

* ggml-debug: better tensor type support

* gitignore : ggml-debug

* ggml-debug: printing also the sum of each tensor

* ggml-debug: remove block size

* eval-callback: renamed from ggml-debug

* eval-callback: fix make toolchain

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>