Encoder is broken when CUBLAS is ON #1688
Regarding the encoder's final output: I compared the CUDA output with the CPU output. This output reflects the behavior of a specific audio sample, and it's important to note that various audio samples can exhibit different behaviors — for instance, some audio files yield distinct results only with specific models. I dumped the embedding with the following code:

```cpp
ggml_tensor * tensor = wctx->state->embd_enc;
std::vector<float> tensor_data(ggml_nelements(tensor));
ggml_backend_tensor_get(tensor, tensor_data.data(), 0, ggml_nbytes(tensor));

std::ofstream outFile("encoder_embedding.json");
outFile << "[";
for (uint64_t i = 0; i < tensor_data.size() - 1; i++) {
    outFile << tensor_data[i] << ", ";
}
outFile << tensor_data[tensor_data.size() - 1] << "]";
outFile.close();
return 0;
```

Tiny: Base: Small: Medium: Large: (per-model comparison attachments not reproduced here)
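To quantify the difference between two such dumps (one from a CPU build, one from a CUDA build), a small standalone checker along these lines can be used — a sketch only, with both file names as placeholders:

```cpp
#include <cstdio>
#include <fstream>
#include <vector>

// Reads a dump in the "[v, v, ..., v]" format written by the snippet above.
static std::vector<float> load_dump(const char * path) {
    std::vector<float> v;
    std::ifstream in(path);
    char c;
    in >> c; // consume '['
    float x;
    while (in >> x) {
        v.push_back(x);
        in >> c; // consume ',' or the closing ']'
    }
    return v;
}

int main() {
    // placeholder file names - adjust to wherever the two dumps were written
    const std::vector<float> a = load_dump("encoder_embedding_cpu.json");
    const std::vector<float> b = load_dump("encoder_embedding_cuda.json");
    if (a.empty() || a.size() != b.size()) {
        fprintf(stderr, "dump size mismatch (%zu vs %zu)\n", a.size(), b.size());
        return 1;
    }
    // normalized mean squared error: sum((a-b)^2) / sum(a^2)
    double num = 0.0;
    double den = 0.0;
    for (size_t i = 0; i < a.size(); i++) {
        num += ((double)a[i] - b[i]) * ((double)a[i] - b[i]);
        den += (double)a[i] * a[i];
    }
    printf("NMSE = %g\n", num / den);
    return 0;
}
```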
Do you know how we could fix this? @slaren
We would need to find the op that is producing wrong results in CUDA. The easiest way to do this is by using `ggml_backend_compare_graph_backend`.
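For reference, this is a minimal sketch of how the comparison is typically wired up, assuming a built graph `gf` and a GPU backend handle `backend_gpu` (names here are placeholders; the signatures match `ggml-backend.h` from around this period and may differ in other versions):

```cpp
// Invoked after each graph node has been evaluated on both backends:
// t1 holds the result from the first backend, t2 from the second.
// Returning true continues the comparison with the next node.
static bool compare_node(int node_index, ggml_tensor * t1, ggml_tensor * t2, void * user_data) {
    printf("node %d: %s\n", node_index, ggml_op_desc(t1));
    (void) t2; (void) user_data;
    return true;
}

// ... then, wherever the graph is normally computed:
ggml_backend_t backend_cpu = ggml_backend_cpu_init();
ggml_backend_compare_graph_backend(backend_gpu, backend_cpu, gf, compare_node, /*user_data=*/NULL);
ggml_backend_free(backend_cpu);
```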
Seems like we're dealing with a nightmare here. I'll do my best to pinpoint the operation at the heart of the issue : )
CPU backend vs CUDA backend:
My modification: I copied a section of code from `test-backend-ops`:

```cpp
static std::vector<float> tensor_to_float(const ggml_tensor * t) {
    std::vector<float> tv;
    tv.reserve(ggml_nelements(t));

    std::vector<uint8_t> buf(ggml_nbytes(t));
    ggml_backend_tensor_get(t, buf.data(), 0, ggml_nbytes(t));

    ggml_type_traits_t tt = ggml_internal_get_type_traits(t->type);
    size_t bs = ggml_blck_size(t->type);
    std::vector<float> vq(ggml_blck_size(t->type));
    bool quantized = ggml_is_quantized(t->type);

    // access elements by index to avoid gaps in views
    for (int64_t i3 = 0; i3 < t->ne[3]; i3++) {
        for (int64_t i2 = 0; i2 < t->ne[2]; i2++) {
            for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
                for (int64_t i0 = 0; i0 < t->ne[0]; i0 += bs) {
                    size_t i = i3*t->nb[3] + i2*t->nb[2] + i1*t->nb[1] + i0/bs*t->nb[0];
                    if (t->type == GGML_TYPE_F16) {
                        tv.push_back(ggml_fp16_to_fp32(*(ggml_fp16_t*)&buf[i]));
                    } else if (t->type == GGML_TYPE_F32) {
                        tv.push_back(*(float *) &buf[i]);
                    } else if (t->type == GGML_TYPE_I32) {
                        tv.push_back((float)*(int32_t *) &buf[i]);
                    } else if (quantized) {
                        tt.to_float(&buf[i], vq.data(), bs);
                        tv.insert(tv.end(), vq.begin(), vq.end());
                    } else {
                        GGML_ASSERT(false);
                    }
                }
            }
        }
    }

    return tv;
}
static bool isinf_or_max(float f) {
    return std::isinf(f) || f == FLT_MAX || f == -FLT_MAX;
}
static double nmse(const float * a, const float * b, size_t n) {
    double mse_a_b = 0.0;
    double mse_a_0 = 0.0;

    for (size_t i = 0; i < n; i++) {
        float a_i = a[i];
        float b_i = b[i];

        mse_a_b += (a_i - b_i) * (a_i - b_i);
        mse_a_0 += a_i * a_i;
    }

    return mse_a_b / mse_a_0;
}
static bool whisper_encode_internal(
        whisper_context & wctx,
        whisper_state & wstate,
        const int mel_offset,
        const int n_threads,
        whisper_abort_callback abort_callback,
        void * abort_callback_data) {
    const int64_t t_start_us = ggml_time_us();

    struct callback_userdata {
        bool   ok;
        double max_err;
    };

    callback_userdata ud {
        true,
        1e-7,
    };

    auto callback = [](int index, ggml_tensor * t1, ggml_tensor * t2, void * user_data) -> bool {
        callback_userdata * ud = (callback_userdata *) user_data;

        if (t1->op == GGML_OP_NONE) {
            // sentinels must be unchanged
            std::vector<uint8_t> t1_data(ggml_nbytes(t1));
            std::vector<uint8_t> t2_data(ggml_nbytes(t2));
            ggml_backend_tensor_get(t1, t1_data.data(), 0, ggml_nbytes(t1));
            ggml_backend_tensor_get(t2, t2_data.data(), 0, ggml_nbytes(t2));

            if (memcmp(t1_data.data(), t2_data.data(), ggml_nbytes(t1)) != 0) {
                printf("sentinel mismatch: %s ", t1->name);
                ud->ok = false;
                return true;
            }
        }

        std::vector<float> f1 = tensor_to_float(t1);
        std::vector<float> f2 = tensor_to_float(t2);

        for (size_t i = 0; i < f1.size(); i++) {
            // check for nans
            if (std::isnan(f1[i]) || std::isnan(f2[i])) {
                printf("[%s] NaN at index %zu (%f %f) ", ggml_op_desc(t1), i, f1[i], f2[i]);
                ud->ok = false;
                return true;
            }
            // check for infs: both must be inf of the same sign, or both must be finite
            if (isinf_or_max(f1[i]) || isinf_or_max(f2[i])) {
                if (isinf_or_max(f1[i]) && isinf_or_max(f2[i])) {
                    if (std::signbit(f1[i]) != std::signbit(f2[i])) {
                        printf("[%s] inf sign mismatch: %f %f ", ggml_op_desc(t1), f1[i], f2[i]);
                        ud->ok = false;
                        return true;
                    }
                } else {
                    printf("[%s] inf mismatch: %f %f ", ggml_op_desc(t1), f1[i], f2[i]);
                    ud->ok = false;
                    return true;
                }
            }
        }

        double err = nmse(f1.data(), f2.data(), f1.size());
        if (err > ud->max_err) {
            printf("[%s] NMSE = %f ", ggml_op_desc(t1), err);
            //for (int i = 0; i < f1.size(); i++) {
            //    printf("%5d %9.6f %9.6f, diff = %9.6f\n", i, f1[i], f2[i], f1[i] - f2[i]);
            //}
            //printf("\n");
            //exit(1);
            ud->ok = false;
        }

        return true;

        GGML_UNUSED(index);
    };

    ggml_backend_t backend_cpu = ggml_backend_cpu_init();

    // conv
    {
        auto & alloc = wstate.alloc_conv.alloc;

        ggml_allocr_reset(alloc);

        ggml_cgraph * gf = whisper_build_graph_conv(wctx, wstate, mel_offset);

        ggml_allocr_alloc_graph(alloc, gf);

        ud = {true, 1e-7};
        ggml_backend_compare_graph_backend(wstate.backend, backend_cpu, gf, callback, &ud);
        if (ud.ok) {
            printf("\033[1;32mOK\033[0m\n");
        } else {
            printf("\033[1;31mFAIL\033[0m\n");
        }

        // if (!whisper_encode_external(wstate)) {
        //     ggml_graph_compute_helper(wstate.backend, gf, n_threads);
        // }
    }

    // encoder
    if (!whisper_encode_external(wstate)) {
        auto & alloc = wstate.alloc_encode.alloc;

        ggml_allocr_reset(alloc);

        ggml_cgraph * gf = whisper_build_graph_encoder(wctx, wstate);

        ggml_allocr_alloc_graph(alloc, gf);

        ud = {true, 1e-7};
        ggml_backend_compare_graph_backend(wstate.backend, backend_cpu, gf, callback, &ud);
        if (ud.ok) {
            printf("\033[1;32mOK\033[0m\n");
        } else {
            printf("\033[1;31mFAIL\033[0m\n");
        }

        // ggml_graph_compute_helper(wstate.backend, gf, n_threads);
    }

    // cross
    {
        auto & alloc = wstate.alloc_cross.alloc;

        ggml_allocr_reset(alloc);

        ggml_cgraph * gf = whisper_build_graph_cross(wctx, wstate);

        ggml_allocr_alloc_graph(alloc, gf);

        ud = {true, 1e-7};
        ggml_backend_compare_graph_backend(wstate.backend, backend_cpu, gf, callback, &ud);
        if (ud.ok) {
            printf("\033[1;32mOK\033[0m\n");
        } else {
            printf("\033[1;31mFAIL\033[0m\n");
        }

        return 0;

        // ggml_graph_compute_helper(wstate.backend, gf, n_threads);
    }

    wstate.t_encode_us += ggml_time_us() - t_start_us;
    wstate.n_encode++;

    return !(abort_callback && abort_callback(abort_callback_data));
}
```
Which NVIDIA card are you using? This seems like an issue that occurs only on very old hardware (CC <= 6). It's hard to fix because I don't have the means to reproduce it.
I’m currently using an RTX 3060, which is still fairly recent. |
By the way, do you have any idea how to get my test code running properly? As far as I know, the …
The test code looks good. It's not impossible that the issue is in the …
Ah, now I understand why I'm encountering issues when comparing the second and third graphs.

Line 1785 in 37a709f: if I remove this code and replace it with `struct ggml_tensor * cur = wstate.embd_conv;`, and line 2034 in 37a709f: if I remove this code and replace it with `struct ggml_tensor * cur = wstate.embd_enc;`, then it functions as anticipated, especially when I employ `ggml_backend_compare_graph_backend`.
Ah I see, I didn't understand the issue. I think that views of external tensors were not being copied properly to the other backend. This should fix it:

```diff
diff --git a/ggml-backend.c b/ggml-backend.c
index 526ce732..e9cfffbe 100644
--- a/ggml-backend.c
+++ b/ggml-backend.c
@@ -1312,6 +1312,7 @@ static void graph_init_tensor(struct ggml_hash_set hash_set, struct ggml_tensor
     struct ggml_tensor * dst = node_copies[id];
     if (dst->view_src != NULL) {
+        graph_init_tensor(hash_set, node_copies, node_init, src->view_src);
         ggml_backend_view_init(dst->view_src->buffer, dst);
     }
     else {
```
My hypothesis is that there must be a relatively large error occurring somewhere, and these errors keep accumulating as the computation goes on, leading to problems in the results. This also explains why, with different audio files, only specific models encounter issues: different models have different weights and embeddings, so the errors might cancel each other out during the computation process.
It might be because matrix multiplications are performed in FP16. You can force FP32 by using:

```cpp
for (int i = 0; i < gf->n_nodes; i++) {
    if (gf->nodes[i]->op == GGML_OP_MUL_MAT) ggml_mul_mat_set_prec(gf->nodes[i], GGML_PREC_F32);
}
```
Have I made a mistake here? This seems to have worsened the NMSE.

Before:

After:
```cpp
// conv
{
    auto & alloc = wstate.alloc_conv.alloc;

    ggml_allocr_reset(alloc);

    ggml_cgraph * gf = whisper_build_graph_conv(wctx, wstate, mel_offset);

    for (int i = 0; i < gf->n_nodes; i++) {
        if (gf->nodes[i]->op == GGML_OP_MUL_MAT) ggml_mul_mat_set_prec(gf->nodes[i], GGML_PREC_F32);
    }

    ggml_allocr_alloc_graph(alloc, gf);

    ud = {true, 1e-7};
    ggml_backend_compare_graph_backend(wstate.backend, backend_cpu, gf, callback, &ud);
    if (ud.ok) {
        printf("\033[1;32mOK\033[0m\n\n");
    } else {
        printf("\033[1;31mFAIL\033[0m\n\n");
    }

    // if (!whisper_encode_external(wstate)) {
    //     ggml_graph_compute_helper(wstate.backend, gf, n_threads);
    // }
}
```
That's odd, maybe there is a bug with …
@bobqianic Maybe it would be better if you open a PR with the changes that you have made and steps to reproduce. Also, try the latest sync #1691 that will be merged soon and see if the issues still persist there.
Lines 7396 to 7397 in 37a709f
OK |
There is a check for …

All of these need to be true to use FP16 matrix multiplication: …

Note that the …
By the way, do you have any idea why using the …? It's not just me: several other users have noticed the same thing, which is quite odd, and I've been able to replicate this issue on my machine as well. As far as I know, the …

Lines 1056 to 1089 in 37a709f
The CUDA backend is always used automatically with large matrix multiplications. At the moment, the only way to disable it completely is to build without CUDA.

Lines 8243 to 8246 in 37a709f
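For context, that check looked roughly like the following; this is an illustrative reconstruction of `ggml_cuda_can_mul_mat` from memory, not an exact quote of the lines referenced above, so the thresholds and types may differ at this commit:

```cpp
// Illustrative: with a CUBLAS build, ggml offloads a mul_mat to CUDA whenever
// the operand types match and the matrices are large enough - no runtime flag
// is consulted here, which is why it can only be disabled at build time.
static bool ggml_cuda_can_mul_mat(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst) {
    const int64_t ne10 = src1->ne[0];
    const int64_t ne0  = dst->ne[0];
    const int64_t ne1  = dst->ne[1];

    return (src0->type == GGML_TYPE_F32 || src0->type == GGML_TYPE_F16 || ggml_is_quantized(src0->type)) &&
           src1->type == GGML_TYPE_F32 &&
            dst->type == GGML_TYPE_F32 &&
           (ne0 >= 32 && ne1 >= 32 && ne10 >= 32);
}
```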
There is also the option to run with …

Lines 6643 to 6647 in 37a709f
This occurs when using the tiny, small, base, medium, and large models. All models used are not quantized.
CUDA:
CPU:
encoder_embedding_conv.zip