
eval-callback: Example how to use eval callback for debugging #6576

Merged: 25 commits merged into master from hp/ggml/debug on Apr 11, 2024

Conversation

phymbert (Collaborator) commented Apr 10, 2024

Motivation

It can be useful to debug the inference graph, for example to compare it against the original PyTorch implementation.
It is also useful for retrieving intermediate tensor data, e.g. for imatrix or other advanced NLP use cases.
This example shows how to use the inference (eval) callback.
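
For context, a minimal sketch of how such a callback is wired up (not the exact code in this example; the callback signature and the cb_eval fields are from the ggml/llama.cpp headers of this period):

#include "llama.h"
#include "ggml.h"

#include <cstdio>

// The scheduler calls this twice per graph node: first with ask == true to
// query whether we want to observe this tensor, then with ask == false once
// its data is available to read.
static bool debug_cb(struct ggml_tensor * t, bool ask, void * user_data) {
    (void) user_data;
    if (ask) {
        return true; // observe every node
    }
    printf("%s = (%s) %s\n", t->name, ggml_type_name(t->type), ggml_op_name(t->op));
    return true; // returning true continues the graph evaluation
}

// Registered through llama_context_params before creating the context:
//   llama_context_params cparams = llama_context_default_params();
//   cparams.cb_eval           = debug_cb;
//   cparams.cb_eval_user_data = nullptr;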

Suggestions from @slaren have been incorporated, thanks.

Changes

  • Add a simple example ggml-debug that prints each operation and outputs tensors' data.

Example

eval-callback \
  --hf-repo ggml-org/models \
  --hf-file phi-2/ggml-model-q4_0.gguf \
  --model phi-2-q4_0.gguf \
  --prompt hello \
  --seed 42 \
  -ngl 33
Output
ggml_debug:                 inp_embd = (f32)   GET_ROWS(token_embd.weight{2560, 51200, 1, 1}, inp_tokens{1, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [ -0.0181,   0.0272,   0.0272, ...],
                                      ],
                                     ]
ggml_debug:                   norm-0 = (f32)       NORM(CUDA0#inp_embd#0{2560, 1, 1, 1}, }) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [ -0.6989,   1.0636,   1.0636, ...],
                                      ],
                                     ]
ggml_debug:                 norm_w-0 = (f32)        MUL(norm-0{2560, 1, 1, 1}, blk.0.attn_norm.weight{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [ -0.1800,   0.2817,   0.2632, ...],
                                      ],
                                     ]
ggml_debug:              attn_norm-0 = (f32)        ADD(norm_w-0{2560, 1, 1, 1}, blk.0.attn_norm.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                   wqkv-0 = (f32)    MUL_MAT(blk.0.attn_qkv.weight{2560, 7680, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {7680, 1, 1, 1} 
# redacted
ggml_debug:                   bqkv-0 = (f32)        ADD(wqkv-0{7680, 1, 1, 1}, blk.0.attn_qkv.bias{7680, 1, 1, 1}}) = {7680, 1, 1, 1} 
# redacted
ggml_debug:            bqkv-0 (view) = (f32)       VIEW(bqkv-0{7680, 1, 1, 1}, }) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                   Qcur-0 = (f32)       CONT(bqkv-0 (view){2560, 1, 1, 1}, }) = {2560, 1, 1, 1} 
# redacted
ggml_debug:        Qcur-0 (reshaped) = (f32)    RESHAPE(Qcur-0{2560, 1, 1, 1}, }) = {80, 32, 1, 1} 
# redacted
ggml_debug:                   Qcur-0 = (f32)       ROPE(Qcur-0 (reshaped){80, 32, 1, 1}, CUDA0#inp_pos#0{1, 1, 1, 1}}) = {80, 32, 1, 1} 
                                     [
                                      [
                                       [ -1.1135,   1.4604,  -1.9226, ...],
                                       [ -0.3608,   0.5076,  -1.8866, ...],
                                       [  1.7643,   0.0273,  -2.1065, ...],
                                       ...
                                      ],
                                     ]
...
ggml_debug:        kq_soft_max_ext-0 = (f32)   SOFT_MAX(kq-0{32, 1, 32, 1}, CUDA0#KQ_mask#0{32, 1, 1, 1}}) = {32, 1, 32, 1}
                                     [
                                      [
                                       [  1.0000,   0.0000,   0.0000, ...],
                                      ],
                                      [
                                       [  1.0000,   0.0000,   0.0000, ...],
                                      ],
                                      [
                                       [  1.0000,   0.0000,   0.0000, ...],
                                      ],
                                     ...
                                     ]
ggml_debug:                    kqv-0 = (f32)    MUL_MAT(v-0{32, 80, 32, 1}, kq_soft_max_ext-0{32, 1, 32, 1}}) = {80, 1, 32, 1} 
                                     [
                                      [
                                       [ -0.2136,  -0.2137,   0.3335, ...],
                                      ],
                                      [
                                       [ -0.2139,   0.2949,  -0.0338, ...],
                                      ],
                                      [
                                       [ -0.4204,  -0.0442,  -0.6392, ...],
                                      ],
                                     ...
                                     ]
ggml_debug:             kqv_merged-0 = (f32)    PERMUTE(kqv-0{80, 1, 32, 1}, }) = {80, 32, 1, 1} 
                                     [
                                      [
                                       [ -0.2136,  -0.2137,   0.3335, ...],
                                       [ -0.2139,   0.2949,  -0.0338, ...],
                                       [ -0.4204,  -0.0442,  -0.6392, ...],
                                       ...
                                      ],
                                     ]
ggml_debug:        kqv_merged_cont-0 = (f32)       CONT(kqv_merged-0{80, 32, 1, 1}, }) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                 kqv_wo-0 = (f32)    MUL_MAT(blk.0.attn_output.weight{2560, 2560, 1, 1}, kqv_merged_cont-0{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                kqv_out-0 = (f32)        ADD(kqv_wo-0{2560, 1, 1, 1}, blk.0.attn_output.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                 ffn_up-0 = (f32)    MUL_MAT(blk.0.ffn_up.weight{2560, 10240, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {10240, 1, 1, 1} 
# redacted
ggml_debug:               ffn_up_b-0 = (f32)        ADD(ffn_up-0{10240, 1, 1, 1}, blk.0.ffn_up.bias{10240, 1, 1, 1}}) = {10240, 1, 1, 1} 
# redacted
ggml_debug:               ffn_gelu-0 = (f32)      UNARY(ffn_up_b-0{10240, 1, 1, 1}, }) = {10240, 1, 1, 1} 
# redacted
ggml_debug:               ffn_down-0 = (f32)    MUL_MAT(blk.0.ffn_down.weight{10240, 2560, 1, 1}, ffn_gelu-0{10240, 1, 1, 1}}) = {2560, 1, 1, 1} 
# redacted
ggml_debug:                ffn_out-0 = (f32)        ADD(ffn_down-0{2560, 1, 1, 1}, blk.0.ffn_down.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [  0.3379,  -0.0695,  -0.1994, ...],
                                      ],
                                     ]
# redacted
ggml_debug:                node_1218 = (f32)   GET_ROWS(l_out-30{2560, 1, 1, 1}, CUDA0#inp_out_ids#0{1, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [  5.1697,   0.5750,   2.1958, ...],
                                      ],
                                     ]
ggml_debug:                 l_out-31 = (f32)        ADD(l_out-31{2560, 1, 1, 1}, node_1218{2560, 1, 1, 1}}) = {2560, 1, 1, 1} 
                                     [
                                      [
                                       [  5.3609,   0.9790,   2.5628, ...],
                                      ],
                                     ]
# redacted
ggml_debug:    result_output_no_bias = (f32)    MUL_MAT(output.weight{2560, 51200, 1, 1}, result_norm{2560, 1, 1, 1}}) = {51200, 1, 1, 1} 
# redacted
ggml_debug:            result_output = (f32)        ADD(result_output_no_bias{51200, 1, 1, 1}, output.bias{51200, 1, 1, 1}}) = {51200, 1, 1, 1} 
                                     [
                                      [
                                       [ 10.6199,   8.5389,  11.3658, ...],
                                      ],
                                     ]
@phymbert phymbert added the labels "model" (Model specific) and "demo" (Demonstrate some concept or idea, not intended to be merged) on Apr 10, 2024
@phymbert phymbert requested review from ggerganov and slaren April 10, 2024 01:57
slaren (Collaborator) commented Apr 10, 2024

Pretty cool. Some notes:

  • There is no need to use a mutex; the callback will never be called from multiple threads.
  • You should at least check the type of the tensors: if only float is supported, ignore the other types (a sketch follows after the loop below).
  • To work correctly with non-contiguous tensors you need to take the strides into account. This is the general pattern:
ggml_tensor * t;   // the tensor passed to the callback
char * data;       // pointer to the (host-accessible) tensor data

// iterate using the byte strides in t->nb so that non-contiguous tensors
// (views, permutations) are read correctly
for (int64_t i3 = 0; i3 < t->ne[3]; i3++) {
    for (int64_t i2 = 0; i2 < t->ne[2]; i2++) {
        for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
            for (int64_t i0 = 0; i0 < t->ne[0]; i0++) {
                size_t i = i3*t->nb[3] + i2*t->nb[2] + i1*t->nb[1] + i0*t->nb[0];
                float v = *(float *)(data + i);
            }
        }
    }
}
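
A hedged sketch of the type check from the second note (f32 only here; f16 and quantized types would need to be converted before reading):

// skip tensors we cannot read directly as floats
static bool printable_type(const struct ggml_tensor * t) {
    return t->type == GGML_TYPE_F32;
}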


@phymbert phymbert marked this pull request as draft April 10, 2024 02:36
@phymbert phymbert marked this pull request as ready for review April 10, 2024 07:57
phymbert (Collaborator, Author) commented:

  • To work correctly with non-contiguous tensors you need to take into account the strides.

Looks better, cool!

@phymbert phymbert changed the title from "gguf-debug: Example how to use callback for debugging" to "ggml-debug: Example how to use callback for debugging" on Apr 10, 2024
@phymbert phymbert changed the title from "ggml-debug: Example how to use callback for debugging" to "ggml-debug: Example how to use eval callback for debugging" on Apr 10, 2024
phymbert (Collaborator, Author) commented:

@slaren Could you have a quick look at this when you have time, please?

@phymbert phymbert requested a review from slaren April 10, 2024 21:45
phymbert (Collaborator, Author) commented:

@slaren @ggerganov Can I merge?

ggerganov (Owner) left a comment


Nice tool! I've found that printing the total sum of the elements in a tensor is also a useful metric to look at when debugging (a sketch follows below).

Merge after slaren approves
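
A hedged sketch of that metric, reusing the stride-aware loop from earlier in the thread (assumes an f32 tensor whose data is host-accessible):

#include "ggml.h"

// sum of all elements of an f32 tensor, honoring the byte strides in t->nb
static double tensor_sum_f32(const struct ggml_tensor * t, const char * data) {
    double sum = 0.0;
    for (int64_t i3 = 0; i3 < t->ne[3]; i3++) {
        for (int64_t i2 = 0; i2 < t->ne[2]; i2++) {
            for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
                for (int64_t i0 = 0; i0 < t->ne[0]; i0++) {
                    const size_t i = i3*t->nb[3] + i2*t->nb[2] + i1*t->nb[1] + i0*t->nb[0];
                    sum += *(const float *)(data + i);
                }
            }
        }
    }
    return sum;
}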

slaren (Collaborator) commented Apr 11, 2024

Not a big deal, but I think that ggml-debug will be a confusing name for this example. Maybe something like eval-callback would be clearer.

@phymbert phymbert changed the title from "ggml-debug: Example how to use eval callback for debugging" to "eval-callback: Example how to use eval callback for debugging" on Apr 11, 2024
@phymbert phymbert merged commit b804b1e into master Apr 11, 2024
58 of 61 checks passed
@phymbert phymbert deleted the hp/ggml/debug branch April 11, 2024 12:51
A Collaborator commented on this printf call in the example:

printf("%s: %24s = (%s) %10s(%s{%s}, %s}) = {%s}\n", __func__,
    t->name, ggml_type_name(t->type), ggml_op_name(t->op),
I forgot to mention that you can use ggml_op_desc instead of ggml_op_name to get proper names for unary ops too instead of just UNARY.
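
A hedged sketch of the substitution (the format string is shortened here for illustration):

// ggml_op_desc(t) resolves GGML_OP_UNARY nodes to the specific unary op
// name (e.g. "GELU") and falls back to the regular op name otherwise
printf("%s: %24s = (%s) %10s(...)\n", __func__,
       t->name, ggml_type_name(t->type), ggml_op_desc(t));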

HanClinto (Collaborator) commented:

I'm not sure, but it looks like this PR is behind the currently failing CI builds on macOS, and I'm not currently smart enough to figure out why.

Example: https://github.com/ggerganov/llama.cpp/actions/runs/8651988935/job/23723903645#step:5:4773

Also, I could be wrong, but this PR might have snuck in a libcurl dependency at a level that we aren't comfortable with; I don't have a full handle on this yet either.

slaren (Collaborator) commented Apr 11, 2024

Maybe adding -ngl 0 to the test to avoid using Metal would work; the Metal backend doesn't work reliably in the GitHub runners.
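
For example, the test could reuse the invocation from the PR description with GPU offload disabled (model path hypothetical):

eval-callback \
  --model phi-2-q4_0.gguf \
  --prompt hello \
  --seed 42 \
  -ngl 0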

HanClinto added a commit to HanClinto/llama.cpp that referenced this pull request Apr 11, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
eval-callback: Example how to use eval callback for debugging (ggerganov#6576)

* gguf-debug: Example how to use ggml callback for debugging

* gguf-debug: no mutex, verify type, fix stride.

* llama: cv eval: move cb eval field in common gpt_params

* ggml_debug: use common gpt_params to pass cb eval.
Fix get tensor SIGV random.

* ggml_debug: ci: add tests

* ggml_debug: EOL in CMakeLists.txt

* ggml_debug: Remove unused param n_batch, no batching here

* ggml_debug: fix trailing spaces

* ggml_debug: fix trailing spaces

* common: fix cb_eval and user data not initialized

* ci: build revert label

* ggml_debug: add main test label

* doc: add a model: add a link to ggml-debug

* ggml-debug: add to make toolchain

* ggml-debug: tests add the main label

* ggml-debug: ci add test curl label

* common: allow the warmup to be disabled in llama_init_from_gpt_params

* ci: add curl test

* ggml-debug: better tensor type support

* gitignore : ggml-debug

* ggml-debug: printing also the sum of each tensor

* ggml-debug: remove block size

* eval-callback: renamed from ggml-debug

* eval-callback: fix make toolchain

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>