Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Topic partition backlog grows after bookies restart #23908

Open
2 of 3 tasks
szkoludasebastian opened this issue Jan 29, 2025 · 22 comments
Open
2 of 3 tasks

[Bug] Topic partition backlog grows after bookies restart #23908

szkoludasebastian opened this issue Jan 29, 2025 · 22 comments
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments

@szkoludasebastian
Copy link

szkoludasebastian commented Jan 29, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Pulsar client >= 3.3.2
Pulsar server >= 3.3.2

We made our tests on client/server version: 3.3.1, 3.3.2, 4.0.1, 4.0.2. We have not noticed such a problem on version 3.3.1.

Minimal reproduce step

  1. Create partitioned topic A with 100 partitions
  2. Connect 4 consumers to topic A (Subscription type does not matter, we noticed this problem for shared and key_shared subscriptions)
  3. Start sending data to topic A with a throughput of about 5,000 messages per second
  4. Restart bookies (in our case we had like 6 bookies and 6 brokers) - we restarted it by killing k8s pods
  5. After all bookies restarted and are up and running wait for 30 seconds and stop sending data
  6. Check pulsar admin topic stats that backlog is only growing. We were observing backlog for every partition in grafana so we knew for which partition check stats-internal. In our stats-internal we usually see such situation:
{
  "entriesAddedCounter" : 1843,
  "numberOfEntries" : 1867,
  "totalSize" : 14761813,
  "currentLedgerEntries" : 964,
  "currentLedgerSize" : 10095321,
  "lastLedgerCreatedTimestamp" : "2025-01-29T09:52:26.278Z",
  "lastLedgerCreationFailureTimestamp" : "2025-01-29T09:52:26.059Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "1755285:963",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 1742277,
    "entries" : 23,
    "size" : 26553,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1742802,
    "entries" : 1,
    "size" : 4272,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1743592,
    "entries" : 35,
    "size" : 21963,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1744071,
    "entries" : 31,
    "size" : 19453,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1744244,
    "entries" : 43,
    "size" : 26981,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1744474,
    "entries" : 44,
    "size" : 27611,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1744955,
    "entries" : 25,
    "size" : 15688,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1745168,
    "entries" : 48,
    "size" : 30127,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1745373,
    "entries" : 45,
    "size" : 28250,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1745594,
    "entries" : 38,
    "size" : 23882,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1746041,
    "entries" : 36,
    "size" : 22628,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1746268,
    "entries" : 34,
    "size" : 21362,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1746487,
    "entries" : 34,
    "size" : 21370,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1746710,
    "entries" : 443,
    "size" : 4184621,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1749679,
    "entries" : 23,
    "size" : 191731,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 1755285,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "realTimeDocStreamSubscription" : {
      "markDeletePosition" : "1739559:1480",
      "readPosition" : "1755285:964",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 3097,
      "cursorLedger" : 1755643,
      "cursorLedgerLastEntry" : 72,
      "individuallyDeletedMessages" : "[(1739559:1481..1739559:1482],(1739559:1483..1739559:1488],(1739559:1489..1739559:1490],(1739559:1494..1739559:1498],(1739559:1501..1739559:1502],(1739559:1503..1739559:1504],(1739559:1507..1739559:1510],(1739559:1511..1739559:1513],(1739559:1518..1739559:1519],(1739559:1521..1739559:1528],(1739559:1530..1739559:1531],(1739559:1532..1739559:1533],(1739559:1534..1739559:1535],(1739559:1536..1739559:1541],(1739559:1542..1739559:1545],(1739559:1546..1739559:1549],(1739559:1550..1739559:1561],(1739559:1562..1739559:1563],(1739559:1564..1739559:1571],(1739559:1572..1739559:1574],(1739559:1575..1739559:1586],(1739559:1587..1739559:1588],(1739559:1589..1739559:1591],(1739559:1592..1739559:1594],(1739559:1598..1739559:1599],(1739559:1600..1739559:1604],(1739559:1605..1739559:1607],(1739559:1609..1739559:1611],(1739559:1612..1739559:1614],(1739559:1615..1739559:1621],(1739559:1622..1739559:1627],(1739559:1628..1739559:1630],(1739559:1631..1739559:1643],(1739559:1644..1739559:1647],(1739559:1648..1739559:1655],(1739559:1657..1739559:1658],(1739559:1659..1739559:1663],(1739559:1664..1739559:1665],(1739559:1668..1739559:1672],(1739559:1673..1739559:1674],(1739559:1675..1739559:1678],(1739559:1679..1739559:1681],(1739559:1682..1739559:1683],(1739559:1684..1739559:1687],(1739559:1688..1739559:1697],(1739559:1698..1739559:1704],(1739559:1705..1739559:1706],(1739559:1707..1739559:1711],(1739559:1712..1739559:1713],(1739559:1715..1739559:1717],(1739559:1718..1739559:1719],(1739559:1720..1739559:1725],(1739559:1726..1739559:1729],(1739559:1730..1739559:1738],(1739559:1739..1739559:1751],(1739559:1755..1739559:1765],(1739559:1766..1739559:1768],(1739559:1771..1739559:1773],(1739559:1776..1739559:1779],(1739559:1781..1739559:1788],(1739559:1789..1739559:1794],(1739559:1795..1739559:1796],(1739559:1797..1739559:1803],(1739559:1806..1739559:1808],(1739559:1809..1739559:1810],(1739559:1812..1739559:1817],(1739559:1818..1739559:1828],(1739559:1830..1739559:1832],(1739559:1833..1739559:1838],(1739559:1841..1739559:1843],(1739559:1844..1739559:1849],(1739559:1851..1739559:1857],(1739559:1858..1739559:1859],(1739559:1860..1739559:1864],(1739559:1865..1739559:1870],(1739559:1871..1739559:1873],(1739559:1875..1739559:1876],(1739559:1877..1739559:1879],(1739559:1881..1739559:1887],(1739559:1890..1739559:1893],(1739559:1894..1739559:1896],(1739559:1897..1739559:1899],(1739559:1901..1739559:1908],(1739559:1911..1739559:1913],(1739559:1914..1739559:1918],(1739559:1919..1739559:1922],(1739559:1923..1739559:1925],(1739559:1927..1739559:1939],(1739559:1940..1739559:1945],(1739559:1946..1739559:1956],(1739559:1958..1739559:1959],(1739559:1960..1739559:1968],(1739559:1969..1739559:1974],(1739559:1975..1739559:1977],(1739559:1978..1739559:1983],(1739559:1985..1739559:1992],(1739559:1993..1739559:1996],(1739559:1998..1739559:1999],(1739559:2001..1739559:2008],(1739559:2009..1739559:2010],(1739559:2011..1739559:2016],(1739559:2017..1739559:2021],(1739559:2022..1739559:2023],(1739559:2024..1739559:2025],(1739559:2026..1739559:2030],(1739559:2031..1739559:2032],(1739559:2033..1739559:2037],(1739559:2038..1739559:2042],(1739559:2043..1739559:2044],(1739559:2046..1739559:2048],(1739559:2050..1739559:2056],(1739559:2058..1739559:2060],(1739559:2062..1739559:2063],(1739559:2064..1739559:2066],(1739559:2068..1739559:2070],(1739559:2071..1739559:2072],(1739559:2073..1739559:2078],(1739559:2079..1739559:2082],(1739559:2084..1739559:2085],(1739559:2086..1739559:2088],(1739559:2089..1739559:2098],(1739559:2102..1739559:2104],(1739559:2105..1739559:2111],(1739559:2113..1739559:2115],(1739559:2116..1739559:2124],(1739559:2125..1739559:2129],(1739559:2130..1739559:2135],(1739559:2137..1739559:2142],(1739559:2143..1739559:2144],(1739559:2145..1739559:2147],(1739559:2148..1739559:2152],(1739559:2155..1739559:2157],(1739559:2158..1739559:2161],(1739559:2163..1739559:2166],(1739559:2167..1739559:2168],(1739559:2170..1739559:2172],(1739559:2173..1739559:2179],(1739559:2181..1739559:2185],(1739559:2187..1739559:2190],(1739559:2192..1739559:2199],(1739559:2201..1739559:2204],(1739559:2207..1739559:2212],(1739559:2213..1739559:2214],(1739559:2215..1739559:2226],(1739559:2230..1739559:2231],(1739559:2232..1739559:2233],(1739559:2234..1739559:2235],(1739559:2236..1739559:2239],(1739559:2240..1739559:2241],(1739559:2242..1739559:2243],(1739559:2244..1739559:2250],(1739559:2255..1739559:2256],(1739559:2258..1739559:2263],(1739559:2264..1739559:2265],(1739559:2266..1739559:2269],(1739559:2270..1739559:2271],(1739559:2272..1739559:2277],(1739559:2278..1739559:2282],(1739559:2283..1739559:2288],(1739559:2289..1739559:2293],(1739559:2294..1739559:2304],(1739559:2305..1739559:2306],(1739559:2308..1739559:2312],(1739559:2313..1739559:2314],(1739559:2315..1739559:2323],(1739559:2324..1739559:2327],(1739559:2328..1739559:2331],(1739559:2332..1739559:2335],(1739559:2336..1739559:2337],(1739559:2338..1739559:2341],(1739559:2346..1739559:2351],(1739559:2352..1739559:2354],(1739559:2355..1739559:2356],(1739559:2357..1739559:2361],(1739559:2363..1739559:2367],(1739559:2368..1739559:2371],(1739559:2372..1739559:2376],(1739559:2377..1739559:2383],(1739559:2384..1739559:2389],(1739559:2393..1739559:2396],(1739559:2397..1739559:2401],(1739559:2403..1739559:2410],(1739559:2411..1739559:2412],(1739559:2414..1739559:2418],(1739559:2419..1739559:2421],(1739559:2423..1739559:2424],(1739559:2425..1739559:2428],(1739559:2429..1739559:2431],(1739559:2433..1739559:2434],(1739559:2435..1739559:2440],(1739559:2441..1739559:2442],(1739559:2443..1739559:2445],(1739559:2446..1739559:2449],(1739559:2450..1739559:2451],(1739559:2452..1739559:2456],(1739559:2457..1739559:2462],(1739559:2464..1739559:2470],(1739559:2471..1739559:2475],(1739559:2476..1739559:2482],(1739559:2483..1739559:2489],(1739559:2492..1739559:2494],(1739559:2496..1739559:2499],(1739559:2502..1739559:2504],(1739559:2505..1739559:2506],(1739559:2508..1739559:2513],(1739559:2514..1739559:2519],(1739559:2520..1739559:2522],(1739559:2524..1739559:2525],(1739559:2526..1739559:2528],(1739559:2529..1739559:2530],(1739559:2532..1739559:2533],(1739559:2535..1739559:2536],(1739559:2538..1739559:2542],(1739559:2546..1739559:2548],(1739559:2550..1739559:2552],(1739559:2553..1739559:2554],(1739559:2555..1739559:2563],(1739559:2566..1739559:2569],(1739559:2570..1739559:2574],(1739559:2575..1739559:2577],(1739559:2578..1739559:2580],(1739559:2581..1739559:2590],(1739559:2591..1739559:2596],(1739559:2597..1739559:2598],(1739559:2604..1739559:2606],(1739559:2611..1739559:2613],(1739559:2616..1739559:2620],(1739559:2622..1739559:2625],(1739559:2626..1739559:2630],(1739559:2633..1739559:2635],(1739559:2637..1739559:2640],(1739559:2641..1739559:2642],(1739559:2644..1739559:2646],(1739559:2648..1739559:2650],(1739559:2655..1739559:2661],(1739559:2663..1739559:2669],(1739559:2670..1739559:2672],(1739559:2674..1739559:2675],(1739559:2676..1739559:2678],(1739559:2679..1739559:2682],(1739559:2683..1739559:2688],(1739559:2690..1739559:2692],(1739559:2694..1739559:2695],(1739559:2696..1739559:2699],(1739559:2701..1739559:2703],(1739559:2705..1739559:2706],(1739559:2707..1739559:2708],(1739559:2709..1739559:2717],(1739559:2718..1739559:2721],(1739559:2722..1739559:2728],(1739559:2729..1739559:2730],(1739559:2731..1739559:2740],(1739559:2743..1739559:2745],(1739559:2746..1739559:2747],(1739559:2749..1739559:2754],(1739559:2756..1739559:2758],(1739559:2762..1739559:2764],(1739559:2765..1739559:2768],(1739559:2772..1739559:2773],(1739559:2774..1739559:2775],(1739559:2779..1739559:2784],(1739559:2785..1739559:2787],(1739559:2789..1739559:2791],(1739559:2793..1739559:2794],(1739559:2796..1739559:2799],(1739559:2800..1739559:2801],(1739559:2802..1739559:2803],(1739559:2805..1739559:2807],(1739559:2809..1739559:2810],(1739559:2813..1739559:2814],(1739559:2817..1739559:2818],(1739559:2821..1739559:2822],(1739559:2823..1739559:2824],(1739559:2825..1739559:2831],(1739559:2832..1739559:2833],(1739559:2835..1739559:2838],(1739559:2840..1739559:2842],(1739559:2843..1739559:2844],(1739559:2846..1739559:2850],(1739559:2851..1739559:2852],(1739559:2853..1739559:2854],(1739559:2855..1739559:2861],(1739559:2862..1739559:2864],(1739559:2866..1739559:2867],(1739559:2871..1739559:2875],(1739559:2876..1739559:2879],(1739559:2881..1739559:2884],(1739559:2885..1739559:2886],(1739559:2887..1739559:2889],(1739559:2890..1739559:2891],(1739559:2892..1739559:2893],(1739559:2895..1739559:2899],(1739559:2904..1739559:2905],(1739559:2906..1739559:2909],(1739559:2911..1739559:2914],(1739559:2915..1739559:2917],(1739559:2919..1739559:2921],(1739559:2923..1739559:2928],(1739559:2929..1739559:2930],(1739559:2931..1739559:2934],(1739559:2935..1739559:2939],(1739559:2940..1739559:2944],(1739559:2947..1739559:2951],(1739559:2954..1739559:2955],(1739559:2960..1739559:2961],(1739559:2965..1739559:2966],(1739559:2969..1739559:2970],(1739559:2972..1739559:2974],(1739559:2975..1739559:2976],(1739559:2977..1739559:2985],(1739559:2987..1739559:2988],(1739559:2991..1739559:2994],(1739559:2995..1739559:2996],(1739559:2997..1739559:3000],(1739559:3001..1739559:3002],(1739559:3003..1739559:3006],(1739559:3012..1739559:3014],(1739559:3016..1739559:3019],(1739559:3022..1739559:3023],(1739559:3024..1739559:3025],(1739559:3032..1739559:3035],(1739559:3036..1739559:3038],(1739559:3040..1739559:3043],(1739559:3046..1739559:3047],(1739559:3048..1739559:3049],(1739559:3050..1739559:3051],(1739559:3052..1739559:3053],(1739559:3060..1739559:3061],(1739559:3062..1739559:3065],(1739559:3075..1739559:3076],(1739559:3077..1739559:3083],(1739559:3084..1739559:3085],(1739559:3086..1739559:3089],(1739559:3091..1739559:3093],(1739559:3094..1739559:3095],(1739559:3097..1739559:3098],(1739559:3103..1739559:3104],(1739559:3106..1739559:3107],(1739559:3113..1739559:3116],(1739559:3118..1739559:3119],(1739559:3122..1739559:3123],(1739559:3125..1739559:3126],(1739559:3127..1739559:3129],(1739559:3133..1739559:3136],(1739559:3138..1739559:3150],(1739559:3151..1739559:3153],(1739559:3154..1739559:3158],(1739559:3159..1739559:3166],(1739559:3167..1739559:3172],(1739559:3173..1739559:3175],(1739559:3176..1739559:3178],(1739559:3179..1739559:3180],(1739559:3181..1739559:3185],(1739559:3186..1739559:3187],(1739559:3188..1739559:3190],(1739559:3191..1739559:3193],(1739559:3195..1739559:3197],(1739559:3199..1739559:3200],(1739559:3205..1739559:3206],(1739559:3207..1739559:3208],(1739559:3211..1739559:3212],(1739559:3215..1739559:3216],(1739559:3217..1739559:3218],(1739559:3219..1739559:3224],(1739559:3225..1739559:3226],(1739559:3227..1739559:3229],(1739559:3231..1739559:3232],(1739559:3233..1739559:3234],(1739559:3235..1739559:3236],(1739559:3237..1739559:3238],(1739559:3241..1739559:3243],(1739559:3245..1739559:3246],(1739559:3247..1739559:3248],(1739559:3253..1739559:3258],(1739559:3260..1739559:3261],(1739559:3262..1739559:3263],(1739559:3265..1739559:3266],(1739559:3267..1739559:3269],(1739559:3274..1739559:3276],(1739559:3277..1739559:3279],(1739559:3281..1739559:3284],(1739559:3285..1739559:3286],(1739559:3287..1739559:3288],(1739559:3290..1739559:3292],(1739559:3293..1739559:3295],(1739559:3297..1739559:3298],(1739559:3300..1739559:3301],(1739559:3302..1739559:3303],(1739559:3307..1739559:3308],(1739559:3310..1739559:3312],(1739559:3313..1739559:3316],(1739559:3322..1739559:3323],(1739559:3327..1739559:3328],(1739559:3332..1739559:3334],(1739559:3336..1739559:3338],(1739559:3343..1739559:3344],(1739559:3345..1739559:3346],(1739559:3351..1739559:3352],(1739559:3353..1739559:3354],(1739559:3359..1739559:3360],(1739559:3361..1739559:3362],(1739559:3368..1739559:3369],(1739559:3380..1739559:3381],(1739559:3382..1739559:3383],(1739559:3386..1739559:3388],(1739559:3390..1739559:3392],(1739559:3395..1739559:3396],(1739559:3397..1739559:3398],(1739559:3400..1739559:3402],(1739559:3403..1739559:3404],(1739559:3406..1739559:3407],(1739559:3408..1739559:3409],(1739559:3413..1739559:3414],(1739559:3415..1739559:3416],(1739559:3421..1739559:3423],(1739559:3424..1739559:3426],(1739559:3429..1739559:3430],(1739559:3431..1739559:3432],(1739559:3433..1739559:3436],(1739559:3442..1739559:3444],(1739559:3447..1739559:3448],(1739559:3454..1739559:3455],(1739559:3461..1739559:3462],(1739559:3467..1739559:3468],(1739559:3473..1739559:3475],(1739559:3482..1739559:3486],(1739559:3487..1739559:3488],(1739559:3489..1739559:3490],(1739559:3491..1739559:3492],(1739559:3494..1739559:3495],(1739559:3496..1739559:3497],(1739559:3502..1739559:3503],(1739559:3505..1739559:3506],(1739559:3508..1739559:3509],(1739559:3515..1739559:3516],(1739559:3523..1739559:3524],(1739559:3528..1739559:3529],(1739559:3532..1739559:3533],(1739559:3539..1739559:3540],(1739559:3545..1739559:3546],(1739559:3555..1739559:3556],(1739559:3559..1739559:3560],(1739559:3562..1739559:3564],(1739559:3567..1739559:3568],(1739559:3570..1739559:3572],(1739559:3573..1739559:3574],(1739559:3579..1739559:3582],(1739559:3583..1739559:3584],(1739559:3585..1739559:3587],(1739559:3596..1739559:3598],(1739559:3599..1739559:3600],(1739559:3602..1739559:3603],(1739559:3606..1739559:3607],(1739559:3608..1739559:3609],(1739559:3614..1739559:3615],(1742277:-1..1742277:22],(1742802:-1..1742802:0],(1743592:-1..1743592:34],(1744071:-1..1744071:30],(1744244:-1..1744244:42],(1744474:-1..1744474:43],(1744955:-1..1744955:24],(1745168:-1..1745168:47],(1745373:-1..1745373:44],(1745594:-1..1745594:37],(1746041:-1..1746041:35],(1746268:-1..1746268:33],(1746487:-1..1746487:33],(1746710:-1..1746710:442],(1749679:-1..1749679:22],(1755285:-1..1755285:963]]",
      "lastLedgerSwitchTimestamp" : "2025-01-29T06:11:35.773Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1868,
      "totalNonContiguousDeletedMessagesRange" : 471,
      "subscriptionHavePendingRead" : true,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
  "schemaLedgers" : [ ],
  "compactedLedger" : {
    "ledgerId" : -1,
    "entries" : -1,
    "size" : -1,
    "offloaded" : false,
    "underReplicated" : false
  }
}

so readPosition is moving but markDeletePosition stuck. Sometimes restarting consumers or brokers makes that markDeletePosition is moving forward and backlog goes down.

We also tried resetting the cursor to markDeletePosition to read the messages again, and that also helped. We did that by this command:

./pulsar-admin topics reset-cursor persistent://topic/prefix/topic-partition-48 -s subscription -m 1739559:1480

What did you expect to see?

No partition stuck and backlog does not grow

What did you see instead?

Backlog grows because markDeletePosition does not move

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@szkoludasebastian szkoludasebastian added the type/bug The PR fixed a bug or issue reported a bug label Jan 29, 2025
@lhotari
Copy link
Member

lhotari commented Jan 31, 2025

Thanks for reporting @szkoludasebastian . Would you be able to share also topic stats? It would be preferred to have this for a supported version (4.0.2) since that's what we are maintaining actively and there's also msgInReplay information since 3.0.7/3.3.2/4.0.0 with #23224 changes. The reason why msgInReplay would be relevant is that I'm interested to see if this is similar to #23845 .

  1. After all bookies restarted and are up and running wait for 30 seconds and stop sending data

It could be similar to Issue 23845 since I understood based on the above description that in your case you stop sending data and the backlog remains after that.
@szkoludasebastian Some questions that would help isolate the issue:

  • Does the backlog get consumed if you produce at least one message to each partition later?
  • Does the problem reproduce when you configure dispatcherDispatchMessagesInSubscriptionThread=false?

@szkoludasebastian
Copy link
Author

  1. New messages are consumed but the backlog is still growing then because markDeletePosition is not moving due to the fact that for some reason there is no continuity in acknowledgments

@szkoludasebastian
Copy link
Author

szkoludasebastian commented Jan 31, 2025

I will check that with dispatcherDispatchMessagesInSubscriptionThread=false. Where to configure that?

@lhotari
Copy link
Member

lhotari commented Jan 31, 2025

I will check that with dispatcherDispatchMessagesInSubscriptionThread=false. Where to configure that?

@szkoludasebastian In broker.conf (reference). In Apache Pulsar Helm chart k8s deployments, you add dispatcherDispatchMessagesInSubscriptionThread: "false" under the broker.configData key in values.yaml.

@lhotari
Copy link
Member

lhotari commented Jan 31, 2025

  1. New messages are consumed but the backlog is still growing then because markDeletePosition is not moving due to the fact that for some reason there is no continuity in acknowledgments

@szkoludasebastian That means that the broker hasn't received the acknowledgements or lost them. It's indeed very useful information if you can consistently reproduce the issue with 3.3.2 and not with 3.3.1 . In that case, it should be possible to find the commit that introduced the regression. Since I don't have the reproducer, I cannot do this on behalf of you. For finding the commit in v3.3.1...v3.3.2 (release notes), one possible solution would be to do interactive git-bisecting (or just manual bisecting) where you first pick the commit in the middle of v3.3.1...v3.3.2 and build a Pulsar binary and test that. If it fails, you know that the regression was introduced before that commit. You keep splitting the commits until you find the commit that introduced the regression. In Pulsar, the docker images are built with the commands documented here: https://pulsar.apache.org/contribute/release-process/#release-pulsar-30-and-later (prerequisite is https://pulsar.apache.org/contribute/release-process/#build-release-artifacts). Would you be able to perform this task so that we'd find the commit that introduced the regression?
I guess you have already checked whether the client version matters so that we could be sure that it's a broker side issue?

@lhotari
Copy link
Member

lhotari commented Jan 31, 2025

Is the issue that you are facing similar to #22709? What's different this time?

@szkoludasebastian
Copy link
Author

Is the issue that you are facing similar to #22709? What's different this time?

it appeared that it was problem on our side.

@lhotari
Copy link
Member

lhotari commented Jan 31, 2025

@szkoludasebastian Just confirming: do you happen to use Pulsar transactions?

@szkoludasebastian
Copy link
Author

Partition-4 stats-internal:

{
  "entriesAddedCounter" : 285,
  "numberOfEntries" : 409,
  "totalSize" : 4481595,
  "currentLedgerEntries" : 285,
  "currentLedgerSize" : 3521965,
  "lastLedgerCreatedTimestamp" : "2025-01-31T09:46:39.629Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "2044129:284",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 2043391,
    "entries" : 124,
    "size" : 959630,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 2044129,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "microbatcher" : {
      "markDeletePosition" : "2043391:112",
      "readPosition" : "2044129:285",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 273,
      "cursorLedger" : 2044305,
      "cursorLedgerLastEntry" : 44,
      "individuallyDeletedMessages" : "[(2043391:122..2043391:123],(2044129:-1..2044129:282]]",
      "lastLedgerSwitchTimestamp" : "2025-01-31T09:46:39.675Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 297,
      "totalNonContiguousDeletedMessagesRange" : 2,
      "subscriptionHavePendingRead" : true,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
  "schemaLedgers" : [ ],
  "compactedLedger" : {
    "ledgerId" : -1,
    "entries" : -1,
    "size" : -1,
    "offloaded" : false,
    "underReplicated" : false
  }
}

Topic partition-4 stats:

{
  "msgRateIn" : 0.016666664188055923,
  "msgThroughputIn" : 16.849997494124537,
  "msgRateOut" : 0.016666664151389267,
  "msgThroughputOut" : 16.84999745705455,
  "bytesInCounter" : 3522976,
  "msgInCounter" : 832,
  "systemTopicBytesInCounter" : 0,
  "bytesOutCounter" : 3658127,
  "msgOutCounter" : 863,
  "bytesOutInternalCounter" : 0,
  "averageMsgSize" : 1011.0,
  "msgChunkPublished" : false,
  "storageSize" : 4482606,
  "backlogSize" : 3608212,
  "backlogQuotaLimitSize" : -1,
  "backlogQuotaLimitTime" : -1,
  "oldestBacklogMessageAgeSeconds" : 956,
  "oldestBacklogMessageSubscriptionName" : "microbatcher",
  "publishRateLimitedTimes" : 0,
  "earliestMsgPublishTimeInBacklogs" : 0,
  "offloadedStorageSize" : 0,
  "lastOffloadLedgerId" : 0,
  "lastOffloadSuccessTimeStamp" : 0,
  "lastOffloadFailureTimeStamp" : 0,
  "ongoingTxnCount" : 0,
  "abortedTxnCount" : 0,
  "committedTxnCount" : 0,
  "publishers" : [ {
    "accessMode" : "Shared",
    "msgRateIn" : 0.016666664188055923,
    "msgThroughputIn" : 16.849997494124537,
    "averageMsgSize" : 1011.0,
    "chunkedMessageRate" : 0.0,
    "producerId" : 5,
    "supportsPartialProducer" : false,
    "producerName" : "integ-pulsar-1162-18",
    "address" : "/10.10.178.13:38552",
    "connectedSince" : "2025-01-31T09:46:39.801991145Z",
    "clientVersion" : "Pulsar-Java-v4.0.2",
    "metadata" : { }
  }, {
    "accessMode" : "Shared",
    "msgRateIn" : 0.0,
    "msgThroughputIn" : 0.0,
    "averageMsgSize" : 0.0,
    "chunkedMessageRate" : 0.0,
    "producerId" : 5,
    "supportsPartialProducer" : false,
    "producerName" : "integ-pulsar-1162-21",
    "address" : "/10.10.178.13:38530",
    "connectedSince" : "2025-01-31T09:46:39.802282863Z",
    "clientVersion" : "Pulsar-Java-v4.0.2",
    "metadata" : { }
  } ],
  "waitingPublishers" : 0,
  "subscriptions" : {
    "microbatcher" : {
      "msgRateOut" : 0.016666664151389267,
      "msgThroughputOut" : 16.84999745705455,
      "bytesOutCounter" : 3658127,
      "msgOutCounter" : 863,
      "msgRateRedeliver" : 0.0,
      "messageAckRate" : 0.049999992325001186,
      "chunkedMessageRate" : 0.0,
      "msgBacklog" : 10,
      "backlogSize" : 3608212,
      "earliestMsgPublishTimeInBacklog" : 0,
      "msgBacklogNoDelayed" : 10,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "msgInReplay" : 0,
      "unackedMessages" : 26,
      "type" : "Key_Shared",
      "msgRateExpired" : 0.0,
      "totalMsgExpired" : 0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 1738316912547,
      "lastConsumedTimestamp" : 1738317704211,
      "lastAckedTimestamp" : 1738317707396,
      "lastMarkDeleteAdvancedTimestamp" : 0,
      "consumers" : [ {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "bytesOutCounter" : 0,
        "msgOutCounter" : 0,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.0,
        "chunkedMessageRate" : 0.0,
        "consumerName" : "microbatcher-consumer-27b750cf-9118-4585-9136-34ecb6a90ffe",
        "availablePermits" : 500,
        "unackedMessages" : 0,
        "avgMessagesPerEntry" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "drainingHashesCount" : 0,
        "drainingHashesClearedTotal" : 0,
        "drainingHashesUnackedMessages" : 0,
        "drainingHashes" : [ ],
        "address" : "/10.10.179.78:52838",
        "connectedSince" : "2025-01-31T09:46:39.803306204Z",
        "clientVersion" : "Pulsar-Java-v4.0.2",
        "lastAckedTimestamp" : 0,
        "lastConsumedTimestamp" : 0,
        "lastConsumedFlowTimestamp" : 1738316799814,
        "keyHashRangeArrays" : [ [ 1515, 2038 ], [ 3849, 4299 ], [ 4464, 4577 ], [ 5188, 5234 ], [ 5449, 6708 ], [ 7236, 7239 ], [ 8129, 8361 ], [ 8914, 8926 ], [ 9655, 9829 ], [ 9959, 10040 ], [ 10788, 11077 ], [ 11782, 11963 ], [ 13985, 14384 ], [ 15106, 15200 ], [ 15782, 16093 ], [ 16122, 16315 ], [ 16335, 16759 ], [ 17059, 17076 ], [ 18068, 18300 ], [ 18488, 18578 ], [ 19599, 19681 ], [ 20131, 20189 ], [ 20514, 21012 ], [ 21891, 22294 ], [ 22450, 22468 ], [ 22776, 22891 ], [ 23152, 23237 ], [ 24151, 24397 ], [ 24642, 25135 ], [ 25570, 25656 ], [ 26289, 26293 ], [ 26423, 26843 ], [ 27974, 28209 ], [ 28860, 29208 ], [ 29779, 29816 ], [ 30007, 30317 ], [ 31403, 31610 ], [ 32887, 33005 ], [ 33401, 33717 ], [ 33891, 34068 ], [ 34205, 34262 ], [ 35101, 35158 ], [ 36566, 37028 ], [ 37881, 38931 ], [ 39256, 39782 ], [ 40111, 40142 ], [ 40267, 40604 ], [ 40715, 41186 ], [ 41403, 41667 ], [ 43168, 43394 ], [ 43405, 44313 ], [ 44795, 44960 ], [ 46331, 46365 ], [ 46706, 47175 ], [ 47590, 48125 ], [ 48549, 48675 ], [ 49501, 49517 ], [ 49831, 50064 ], [ 50422, 50460 ], [ 52018, 52338 ], [ 52529, 52991 ], [ 53814, 53849 ], [ 54740, 54763 ], [ 55544, 55616 ], [ 55732, 56176 ], [ 56501, 56621 ], [ 56727, 56815 ], [ 57158, 57334 ], [ 57860, 58229 ], [ 58242, 58476 ], [ 59275, 59361 ], [ 59970, 60019 ], [ 61151, 61210 ], [ 62030, 62291 ], [ 62590, 62610 ], [ 63450, 63735 ], [ 64945, 65168 ] ],
        "metadata" : { },
        "lastAckedTime" : "1970-01-01T00:00:00Z",
        "lastConsumedTime" : "1970-01-01T00:00:00Z"
      }, {
        "msgRateOut" : 0.016666664151389267,
        "msgThroughputOut" : 16.84999745705455,
        "bytesOutCounter" : 3649823,
        "msgOutCounter" : 851,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.016666664146667048,
        "chunkedMessageRate" : 0.0,
        "consumerName" : "microbatcher-consumer-e02c3d33-2ff6-4fcb-9fdc-3e8f3773ae36",
        "availablePermits" : 399,
        "unackedMessages" : 26,
        "avgMessagesPerEntry" : 2,
        "blockedConsumerOnUnackedMsgs" : false,
        "drainingHashesCount" : 0,
        "drainingHashesClearedTotal" : 0,
        "drainingHashesUnackedMessages" : 0,
        "drainingHashes" : [ ],
        "address" : "/10.10.179.105:36012",
        "connectedSince" : "2025-01-31T09:46:39.803747416Z",
        "clientVersion" : "Pulsar-Java-v4.0.2",
        "lastAckedTimestamp" : 1738317707396,
        "lastConsumedTimestamp" : 1738317704211,
        "lastConsumedFlowTimestamp" : 1738316912547,
        "keyHashRangeArrays" : [ [ 406, 441 ], [ 661, 1185 ], [ 3706, 3848 ], [ 4300, 4463 ], [ 4578, 5029 ], [ 6709, 6722 ], [ 6833, 6877 ], [ 7301, 7326 ], [ 7506, 7595 ], [ 9503, 9572 ], [ 10041, 10427 ], [ 10536, 10583 ], [ 10605, 10787 ], [ 11078, 11101 ], [ 11115, 11166 ], [ 11448, 11781 ], [ 12319, 13683 ], [ 14385, 14534 ], [ 14706, 15105 ], [ 15201, 15651 ], [ 15659, 15781 ], [ 17077, 17309 ], [ 17568, 18019 ], [ 18805, 19188 ], [ 19256, 19353 ], [ 19484, 19598 ], [ 20072, 20078 ], [ 20190, 20213 ], [ 21106, 21458 ], [ 21676, 21731 ], [ 21753, 21890 ], [ 23238, 24150 ], [ 25136, 25343 ], [ 26294, 26422 ], [ 27172, 27880 ], [ 27944, 27973 ], [ 28653, 28859 ], [ 30373, 30610 ], [ 31110, 31402 ], [ 32104, 32194 ], [ 32350, 32436 ], [ 33006, 33097 ], [ 35454, 35941 ], [ 36289, 36430 ], [ 37029, 37034 ], [ 37596, 37679 ], [ 37806, 37880 ], [ 38946, 39181 ], [ 39783, 40110 ], [ 40605, 40714 ], [ 41187, 41232 ], [ 41314, 41402 ], [ 42094, 42417 ], [ 42549, 42795 ], [ 44462, 44609 ], [ 46240, 46322 ], [ 46366, 46422 ], [ 46581, 46705 ], [ 48893, 49220 ], [ 49518, 49601 ], [ 49621, 49830 ], [ 50749, 51596 ], [ 52339, 52528 ], [ 53770, 53813 ], [ 53850, 54079 ], [ 54222, 54320 ], [ 54764, 55360 ], [ 56177, 56219 ], [ 56622, 56650 ], [ 56867, 56896 ], [ 58230, 58241 ], [ 58491, 58830 ], [ 60644, 60926 ], [ 61211, 61299 ], [ 62611, 62962 ], [ 63344, 63449 ], [ 64566, 64944 ], [ 65169, 65177 ] ],
        "metadata" : { },
        "lastAckedTime" : "2025-01-31T10:01:47.396Z",
        "lastConsumedTime" : "2025-01-31T10:01:44.211Z"
      }, {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "bytesOutCounter" : 0,
        "msgOutCounter" : 0,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.0,
        "chunkedMessageRate" : 0.0,
        "consumerName" : "microbatcher-consumer-4ddf9389-86d0-42a0-9b99-eac0c9374d16",
        "availablePermits" : 500,
        "unackedMessages" : 0,
        "avgMessagesPerEntry" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "drainingHashesCount" : 0,
        "drainingHashesClearedTotal" : 0,
        "drainingHashesUnackedMessages" : 0,
        "drainingHashes" : [ ],
        "address" : "/10.10.178.13:41010",
        "connectedSince" : "2025-01-31T09:46:39.803606989Z",
        "clientVersion" : "Pulsar-Java-v4.0.2",
        "lastAckedTimestamp" : 0,
        "lastConsumedTimestamp" : 0,
        "lastConsumedFlowTimestamp" : 1738316799808,
        "keyHashRangeArrays" : [ [ 1, 376 ], [ 501, 525 ], [ 1186, 1502 ], [ 2039, 2287 ], [ 2524, 2794 ], [ 3588, 3705 ], [ 6723, 6832 ], [ 6878, 7235 ], [ 7596, 8128 ], [ 8446, 8913 ], [ 8927, 9502 ], [ 9647, 9654 ], [ 9830, 9958 ], [ 10428, 10535 ], [ 10584, 10604 ], [ 11102, 11114 ], [ 15652, 15658 ], [ 16094, 16121 ], [ 17310, 17567 ], [ 18301, 18487 ], [ 20214, 20513 ], [ 22357, 22449 ], [ 22469, 22775 ], [ 22892, 22961 ], [ 24398, 24537 ], [ 25968, 26288 ], [ 26969, 27171 ], [ 27881, 27943 ], [ 28210, 28400 ], [ 29461, 29778 ], [ 29817, 30006 ], [ 30356, 30372 ], [ 30611, 30811 ], [ 31611, 32103 ], [ 32437, 32886 ], [ 33718, 33841 ], [ 34069, 34185 ], [ 34263, 35100 ], [ 35159, 35453 ], [ 37203, 37595 ], [ 37680, 37805 ], [ 38932, 38945 ], [ 39182, 39255 ], [ 40143, 40157 ], [ 41668, 42093 ], [ 42488, 42548 ], [ 44375, 44445 ], [ 44616, 44781 ], [ 44961, 45485 ], [ 45867, 45916 ], [ 46423, 46580 ], [ 47176, 47442 ], [ 50065, 50071 ], [ 50461, 50748 ], [ 51619, 51944 ], [ 52003, 52017 ], [ 54321, 54609 ], [ 55361, 55543 ], [ 55617, 55731 ], [ 56220, 56500 ], [ 56651, 56726 ], [ 56897, 57002 ], [ 57343, 57388 ], [ 59362, 59672 ], [ 60335, 60643 ], [ 60927, 61068 ], [ 61399, 61454 ], [ 61698, 62029 ], [ 62292, 62384 ], [ 62535, 62539 ], [ 62963, 63028 ], [ 63736, 63765 ], [ 64276, 64565 ], [ 65205, 65535 ] ],
        "metadata" : { },
        "lastAckedTime" : "1970-01-01T00:00:00Z",
        "lastConsumedTime" : "1970-01-01T00:00:00Z"
      }, {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "bytesOutCounter" : 8304,
        "msgOutCounter" : 12,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.033333328178334135,
        "chunkedMessageRate" : 0.0,
        "consumerName" : "microbatcher-consumer-7e5d4f42-6697-446e-bbb8-6eeb3349269c",
        "availablePermits" : 488,
        "unackedMessages" : 0,
        "avgMessagesPerEntry" : 1,
        "blockedConsumerOnUnackedMsgs" : false,
        "drainingHashesCount" : 0,
        "drainingHashesClearedTotal" : 0,
        "drainingHashesUnackedMessages" : 0,
        "drainingHashes" : [ ],
        "address" : "/10.10.179.78:36344",
        "connectedSince" : "2025-01-31T09:46:39.803620032Z",
        "clientVersion" : "Pulsar-Java-v4.0.2",
        "lastAckedTimestamp" : 1738317673406,
        "lastConsumedTimestamp" : 1738317657199,
        "lastConsumedFlowTimestamp" : 1738316799808,
        "keyHashRangeArrays" : [ [ 377, 405 ], [ 442, 500 ], [ 526, 660 ], [ 1503, 1514 ], [ 2288, 2523 ], [ 2795, 3587 ], [ 5030, 5187 ], [ 5235, 5448 ], [ 7240, 7300 ], [ 7327, 7505 ], [ 8362, 8445 ], [ 9573, 9646 ], [ 11167, 11447 ], [ 11964, 12318 ], [ 13684, 13984 ], [ 14535, 14705 ], [ 16316, 16334 ], [ 16760, 17058 ], [ 18020, 18067 ], [ 18579, 18804 ], [ 19189, 19255 ], [ 19354, 19483 ], [ 19682, 20071 ], [ 20079, 20130 ], [ 21013, 21105 ], [ 21459, 21675 ], [ 21732, 21752 ], [ 22295, 22356 ], [ 22962, 23151 ], [ 24538, 24641 ], [ 25344, 25569 ], [ 25657, 25967 ], [ 26844, 26968 ], [ 28401, 28652 ], [ 29209, 29460 ], [ 30318, 30355 ], [ 30812, 31109 ], [ 32195, 32349 ], [ 33098, 33400 ], [ 33842, 33890 ], [ 34186, 34204 ], [ 35942, 36288 ], [ 36431, 36565 ], [ 37035, 37202 ], [ 40158, 40266 ], [ 41233, 41313 ], [ 42418, 42487 ], [ 42796, 43167 ], [ 43395, 43404 ], [ 44314, 44374 ], [ 44446, 44461 ], [ 44610, 44615 ], [ 44782, 44794 ], [ 45486, 45866 ], [ 45917, 46239 ], [ 46323, 46330 ], [ 47443, 47589 ], [ 48126, 48548 ], [ 48676, 48892 ], [ 49221, 49500 ], [ 49602, 49620 ], [ 50072, 50421 ], [ 51597, 51618 ], [ 51945, 52002 ], [ 52992, 53769 ], [ 54080, 54221 ], [ 54610, 54739 ], [ 56816, 56866 ], [ 57003, 57157 ], [ 57335, 57342 ], [ 57389, 57859 ], [ 58477, 58490 ], [ 58831, 59274 ], [ 59673, 59969 ], [ 60020, 60334 ], [ 61069, 61150 ], [ 61300, 61398 ], [ 61455, 61697 ], [ 62385, 62534 ], [ 62540, 62589 ], [ 63029, 63343 ], [ 63766, 64275 ], [ 65178, 65204 ] ],
        "metadata" : { },
        "lastAckedTime" : "2025-01-31T10:01:13.406Z",
        "lastConsumedTime" : "2025-01-31T10:00:57.199Z"
      } ],
      "isDurable" : true,
      "isReplicated" : false,
      "allowOutOfOrderDelivery" : false,
      "keySharedMode" : "AUTO_SPLIT",
      "consumersAfterMarkDeletePosition" : { },
      "drainingHashesCount" : 0,
      "drainingHashesClearedTotal" : 0,
      "drainingHashesUnackedMessages" : 0,
      "nonContiguousDeletedMessagesRanges" : 2,
      "nonContiguousDeletedMessagesRangesSerializedSize" : 70,
      "delayedMessageIndexSizeInBytes" : 0,
      "subscriptionProperties" : { },
      "filterProcessedMsgCount" : 0,
      "filterAcceptedMsgCount" : 0,
      "filterRejectedMsgCount" : 0,
      "filterRescheduledMsgCount" : 0,
      "durable" : true,
      "replicated" : false
    }
  },
  "replication" : { },
  "deduplicationStatus" : "Disabled",
  "nonContiguousDeletedMessagesRanges" : 2,
  "nonContiguousDeletedMessagesRangesSerializedSize" : 70,
  "delayedMessageIndexSizeInBytes" : 0,
  "compaction" : {
    "lastCompactionRemovedEventCount" : 0,
    "lastCompactionSucceedTimestamp" : 0,
    "lastCompactionFailedTimestamp" : 0,
    "lastCompactionDurationTimeInMills" : 0
  },
  "ownerBroker" : "integ-pulsar-broker-5.integ-pulsar-broker.str-integ.svc.cluster.local:8080"
}

Topic partitioned-stats:

{
  "msgRateIn" : 2.0333408764084946,
  "msgThroughputIn" : 1704.8063034094396,
  "msgRateOut" : 2.033340898884361,
  "msgThroughputOut" : 1704.8063223167255,
  "bytesInCounter" : 330665095,
  "msgInCounter" : 78321,
  "systemTopicBytesInCounter" : 0,
  "bytesOutCounter" : 338307079,
  "msgOutCounter" : 80097,
  "bytesOutInternalCounter" : 0,
  "averageMsgSize" : 612.0249999878398,
  "msgChunkPublished" : false,
  "storageSize" : 339672957,
  "backlogSize" : 55819242,
  "backlogQuotaLimitSize" : -1,
  "backlogQuotaLimitTime" : -1,
  "oldestBacklogMessageAgeSeconds" : 1202,
  "oldestBacklogMessageSubscriptionName" : "microbatcher",
  "publishRateLimitedTimes" : 0,
  "earliestMsgPublishTimeInBacklogs" : 0,
  "offloadedStorageSize" : 0,
  "lastOffloadLedgerId" : 0,
  "lastOffloadSuccessTimeStamp" : 0,
  "lastOffloadFailureTimeStamp" : 0,
  "ongoingTxnCount" : 0,
  "abortedTxnCount" : 0,
  "committedTxnCount" : 0,
  "publishers" : [ {
    "msgRateIn" : 1.0333367134601146,
    "msgThroughputIn" : 874.5694844573518,
    "averageMsgSize" : 420.425,
    "chunkedMessageRate" : 0.0,
    "producerId" : 0,
    "supportsPartialProducer" : false
  }, {
    "msgRateIn" : 1.000004162948381,
    "msgThroughputIn" : 830.2368189520878,
    "averageMsgSize" : 387.9766666666666,
    "chunkedMessageRate" : 0.0,
    "producerId" : 0,
    "supportsPartialProducer" : false
  } ],
  "waitingPublishers" : 0,
  "subscriptions" : {
    "microbatcher" : {
      "msgRateOut" : 2.033340898884361,
      "msgThroughputOut" : 1704.8063223167255,
      "bytesOutCounter" : 338307079,
      "msgOutCounter" : 80097,
      "msgRateRedeliver" : 0.0,
      "messageAckRate" : 2.050007899466562,
      "chunkedMessageRate" : 0.0,
      "msgBacklog" : 131,
      "backlogSize" : 55819242,
      "earliestMsgPublishTimeInBacklog" : 0,
      "msgBacklogNoDelayed" : 131,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "msgInReplay" : 0,
      "unackedMessages" : 214,
      "type" : "Key_Shared",
      "msgRateExpired" : 0.0,
      "totalMsgExpired" : 0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 0,
      "lastConsumedTimestamp" : 0,
      "lastAckedTimestamp" : 0,
      "lastMarkDeleteAdvancedTimestamp" : 0,
      "consumers" : [ {
        "msgRateOut" : 0.600002054159385,
        "msgThroughputOut" : 473.685136454395,
        "bytesOutCounter" : 64861170,
        "msgOutCounter" : 15575,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.6166689861685504,
        "chunkedMessageRate" : 0.0,
        "availablePermits" : 46675,
        "unackedMessages" : 95,
        "avgMessagesPerEntry" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "drainingHashesCount" : 0,
        "drainingHashesClearedTotal" : 51,
        "drainingHashesUnackedMessages" : 0,
        "drainingHashes" : [ ],
        "lastAckedTimestamp" : 0,
        "lastConsumedTimestamp" : 0,
        "lastConsumedFlowTimestamp" : 0,
        "keyHashRangeArrays" : [ [ 1, 376 ], [ 501, 525 ], [ 1186, 1502 ], [ 2039, 2287 ], [ 2524, 2794 ], [ 3588, 3705 ], [ 6723, 6832 ], [ 6878, 7235 ], [ 7596, 8128 ], [ 8446, 8913 ], [ 8927, 9502 ], [ 9647, 9654 ], [ 9830, 9958 ], [ 10428, 10535 ], [ 10584, 10604 ], [ 11102, 11114 ], [ 15652, 15658 ], [ 16094, 16121 ], [ 17310, 17567 ], [ 18301, 18487 ], [ 20214, 20513 ], [ 22357, 22449 ], [ 22469, 22775 ], [ 22892, 22961 ], [ 24398, 24537 ], [ 25968, 26288 ], [ 26969, 27171 ], [ 27881, 27943 ], [ 28210, 28400 ], [ 29461, 29778 ], [ 29817, 30006 ], [ 30356, 30372 ], [ 30611, 30811 ], [ 31611, 32103 ], [ 32437, 32886 ], [ 33718, 33841 ], [ 34069, 34185 ], [ 34263, 35100 ], [ 35159, 35453 ], [ 37203, 37595 ], [ 37680, 37805 ], [ 38932, 38945 ], [ 39182, 39255 ], [ 40143, 40157 ], [ 41668, 42093 ], [ 42488, 42548 ], [ 44375, 44445 ], [ 44616, 44781 ], [ 44961, 45485 ], [ 45867, 45916 ], [ 46423, 46580 ], [ 47176, 47442 ], [ 50065, 50071 ], [ 50461, 50748 ], [ 51619, 51944 ], [ 52003, 52017 ], [ 54321, 54609 ], [ 55361, 55543 ], [ 55617, 55731 ], [ 56220, 56500 ], [ 56651, 56726 ], [ 56897, 57002 ], [ 57343, 57388 ], [ 59362, 59672 ], [ 60335, 60643 ], [ 60927, 61068 ], [ 61399, 61454 ], [ 61698, 62029 ], [ 62292, 62384 ], [ 62535, 62539 ], [ 62963, 63028 ], [ 63736, 63765 ], [ 64276, 64565 ], [ 65205, 65535 ] ],
        "lastAckedTime" : "1970-01-01T00:00:00Z",
        "lastConsumedTime" : "1970-01-01T00:00:00Z"
      }, {
        "msgRateOut" : 0.5166690860734111,
        "msgThroughputOut" : 405.385061886289,
        "bytesOutCounter" : 80591750,
        "msgOutCounter" : 19173,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.5500023426171868,
        "chunkedMessageRate" : 0.0,
        "availablePermits" : 46577,
        "unackedMessages" : 61,
        "avgMessagesPerEntry" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "drainingHashesCount" : 0,
        "drainingHashesClearedTotal" : 11,
        "drainingHashesUnackedMessages" : 0,
        "drainingHashes" : [ ],
        "lastAckedTimestamp" : 0,
        "lastConsumedTimestamp" : 0,
        "lastConsumedFlowTimestamp" : 0,
        "keyHashRangeArrays" : [ [ 377, 405 ], [ 442, 500 ], [ 526, 660 ], [ 1503, 1514 ], [ 2288, 2523 ], [ 2795, 3587 ], [ 5030, 5187 ], [ 5235, 5448 ], [ 7240, 7300 ], [ 7327, 7505 ], [ 8362, 8445 ], [ 9573, 9646 ], [ 11167, 11447 ], [ 11964, 12318 ], [ 13684, 13984 ], [ 14535, 14705 ], [ 16316, 16334 ], [ 16760, 17058 ], [ 18020, 18067 ], [ 18579, 18804 ], [ 19189, 19255 ], [ 19354, 19483 ], [ 19682, 20071 ], [ 20079, 20130 ], [ 21013, 21105 ], [ 21459, 21675 ], [ 21732, 21752 ], [ 22295, 22356 ], [ 22962, 23151 ], [ 24538, 24641 ], [ 25344, 25569 ], [ 25657, 25967 ], [ 26844, 26968 ], [ 28401, 28652 ], [ 29209, 29460 ], [ 30318, 30355 ], [ 30812, 31109 ], [ 32195, 32349 ], [ 33098, 33400 ], [ 33842, 33890 ], [ 34186, 34204 ], [ 35942, 36288 ], [ 36431, 36565 ], [ 37035, 37202 ], [ 40158, 40266 ], [ 41233, 41313 ], [ 42418, 42487 ], [ 42796, 43167 ], [ 43395, 43404 ], [ 44314, 44374 ], [ 44446, 44461 ], [ 44610, 44615 ], [ 44782, 44794 ], [ 45486, 45866 ], [ 45917, 46239 ], [ 46323, 46330 ], [ 47443, 47589 ], [ 48126, 48548 ], [ 48676, 48892 ], [ 49221, 49500 ], [ 49602, 49620 ], [ 50072, 50421 ], [ 51597, 51618 ], [ 51945, 52002 ], [ 52992, 53769 ], [ 54080, 54221 ], [ 54610, 54739 ], [ 56816, 56866 ], [ 57003, 57157 ], [ 57335, 57342 ], [ 57389, 57859 ], [ 58477, 58490 ], [ 58831, 59274 ], [ 59673, 59969 ], [ 60020, 60334 ], [ 61069, 61150 ], [ 61300, 61398 ], [ 61455, 61697 ], [ 62385, 62534 ], [ 62540, 62589 ], [ 63029, 63343 ], [ 63766, 64275 ], [ 65178, 65204 ] ],
        "lastAckedTime" : "1970-01-01T00:00:00Z",
        "lastConsumedTime" : "1970-01-01T00:00:00Z"
      }, {
        "msgRateOut" : 0.48333491160967085,
        "msgThroughputOut" : 424.8514676772335,
        "bytesOutCounter" : 78352903,
        "msgOutCounter" : 18496,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.5166687138351619,
        "chunkedMessageRate" : 0.0,
        "availablePermits" : 46754,
        "unackedMessages" : 10,
        "avgMessagesPerEntry" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "drainingHashesCount" : 0,
        "drainingHashesClearedTotal" : 3,
        "drainingHashesUnackedMessages" : 0,
        "drainingHashes" : [ ],
        "lastAckedTimestamp" : 0,
        "lastConsumedTimestamp" : 0,
        "lastConsumedFlowTimestamp" : 0,
        "keyHashRangeArrays" : [ [ 1515, 2038 ], [ 3849, 4299 ], [ 4464, 4577 ], [ 5188, 5234 ], [ 5449, 6708 ], [ 7236, 7239 ], [ 8129, 8361 ], [ 8914, 8926 ], [ 9655, 9829 ], [ 9959, 10040 ], [ 10788, 11077 ], [ 11782, 11963 ], [ 13985, 14384 ], [ 15106, 15200 ], [ 15782, 16093 ], [ 16122, 16315 ], [ 16335, 16759 ], [ 17059, 17076 ], [ 18068, 18300 ], [ 18488, 18578 ], [ 19599, 19681 ], [ 20131, 20189 ], [ 20514, 21012 ], [ 21891, 22294 ], [ 22450, 22468 ], [ 22776, 22891 ], [ 23152, 23237 ], [ 24151, 24397 ], [ 24642, 25135 ], [ 25570, 25656 ], [ 26289, 26293 ], [ 26423, 26843 ], [ 27974, 28209 ], [ 28860, 29208 ], [ 29779, 29816 ], [ 30007, 30317 ], [ 31403, 31610 ], [ 32887, 33005 ], [ 33401, 33717 ], [ 33891, 34068 ], [ 34205, 34262 ], [ 35101, 35158 ], [ 36566, 37028 ], [ 37881, 38931 ], [ 39256, 39782 ], [ 40111, 40142 ], [ 40267, 40604 ], [ 40715, 41186 ], [ 41403, 41667 ], [ 43168, 43394 ], [ 43405, 44313 ], [ 44795, 44960 ], [ 46331, 46365 ], [ 46706, 47175 ], [ 47590, 48125 ], [ 48549, 48675 ], [ 49501, 49517 ], [ 49831, 50064 ], [ 50422, 50460 ], [ 52018, 52338 ], [ 52529, 52991 ], [ 53814, 53849 ], [ 54740, 54763 ], [ 55544, 55616 ], [ 55732, 56176 ], [ 56501, 56621 ], [ 56727, 56815 ], [ 57158, 57334 ], [ 57860, 58229 ], [ 58242, 58476 ], [ 59275, 59361 ], [ 59970, 60019 ], [ 61151, 61210 ], [ 62030, 62291 ], [ 62590, 62610 ], [ 63450, 63735 ], [ 64945, 65168 ] ],
        "lastAckedTime" : "1970-01-01T00:00:00Z",
        "lastConsumedTime" : "1970-01-01T00:00:00Z"
      }, {
        "msgRateOut" : 0.43333484704189446,
        "msgThroughputOut" : 400.884656298808,
        "bytesOutCounter" : 114501256,
        "msgOutCounter" : 26853,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.36666785684566255,
        "chunkedMessageRate" : 0.0,
        "availablePermits" : 46647,
        "unackedMessages" : 48,
        "avgMessagesPerEntry" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "drainingHashesCount" : 0,
        "drainingHashesClearedTotal" : 0,
        "drainingHashesUnackedMessages" : 0,
        "drainingHashes" : [ ],
        "lastAckedTimestamp" : 0,
        "lastConsumedTimestamp" : 0,
        "lastConsumedFlowTimestamp" : 0,
        "keyHashRangeArrays" : [ [ 406, 441 ], [ 661, 1185 ], [ 3706, 3848 ], [ 4300, 4463 ], [ 4578, 5029 ], [ 6709, 6722 ], [ 6833, 6877 ], [ 7301, 7326 ], [ 7506, 7595 ], [ 9503, 9572 ], [ 10041, 10427 ], [ 10536, 10583 ], [ 10605, 10787 ], [ 11078, 11101 ], [ 11115, 11166 ], [ 11448, 11781 ], [ 12319, 13683 ], [ 14385, 14534 ], [ 14706, 15105 ], [ 15201, 15651 ], [ 15659, 15781 ], [ 17077, 17309 ], [ 17568, 18019 ], [ 18805, 19188 ], [ 19256, 19353 ], [ 19484, 19598 ], [ 20072, 20078 ], [ 20190, 20213 ], [ 21106, 21458 ], [ 21676, 21731 ], [ 21753, 21890 ], [ 23238, 24150 ], [ 25136, 25343 ], [ 26294, 26422 ], [ 27172, 27880 ], [ 27944, 27973 ], [ 28653, 28859 ], [ 30373, 30610 ], [ 31110, 31402 ], [ 32104, 32194 ], [ 32350, 32436 ], [ 33006, 33097 ], [ 35454, 35941 ], [ 36289, 36430 ], [ 37029, 37034 ], [ 37596, 37679 ], [ 37806, 37880 ], [ 38946, 39181 ], [ 39783, 40110 ], [ 40605, 40714 ], [ 41187, 41232 ], [ 41314, 41402 ], [ 42094, 42417 ], [ 42549, 42795 ], [ 44462, 44609 ], [ 46240, 46322 ], [ 46366, 46422 ], [ 46581, 46705 ], [ 48893, 49220 ], [ 49518, 49601 ], [ 49621, 49830 ], [ 50749, 51596 ], [ 52339, 52528 ], [ 53770, 53813 ], [ 53850, 54079 ], [ 54222, 54320 ], [ 54764, 55360 ], [ 56177, 56219 ], [ 56622, 56650 ], [ 56867, 56896 ], [ 58230, 58241 ], [ 58491, 58830 ], [ 60644, 60926 ], [ 61211, 61299 ], [ 62611, 62962 ], [ 63344, 63449 ], [ 64566, 64944 ], [ 65169, 65177 ] ],
        "lastAckedTime" : "1970-01-01T00:00:00Z",
        "lastConsumedTime" : "1970-01-01T00:00:00Z"
      } ],
      "isDurable" : true,
      "isReplicated" : false,
      "allowOutOfOrderDelivery" : false,
      "consumersAfterMarkDeletePosition" : { },
      "drainingHashesCount" : 0,
      "drainingHashesClearedTotal" : 65,
      "drainingHashesUnackedMessages" : 0,
      "nonContiguousDeletedMessagesRanges" : 29,
      "nonContiguousDeletedMessagesRangesSerializedSize" : 1001,
      "delayedMessageIndexSizeInBytes" : 0,
      "subscriptionProperties" : { },
      "filterProcessedMsgCount" : 0,
      "filterAcceptedMsgCount" : 0,
      "filterRejectedMsgCount" : 0,
      "filterRescheduledMsgCount" : 0,
      "replicated" : false,
      "durable" : true
    }
  },
  "replication" : { },
  "nonContiguousDeletedMessagesRanges" : 29,
  "nonContiguousDeletedMessagesRangesSerializedSize" : 1001,
  "delayedMessageIndexSizeInBytes" : 0,
  "compaction" : {
    "lastCompactionRemovedEventCount" : 0,
    "lastCompactionSucceedTimestamp" : 0,
    "lastCompactionFailedTimestamp" : 0,
    "lastCompactionDurationTimeInMills" : 0
  },
  "metadata" : {
    "partitions" : 100,
    "deleted" : false
  },
  "partitions" : { }
}

@szkoludasebastian
Copy link
Author

szkoludasebastian commented Jan 31, 2025

@szkoludasebastian Just confirming: do you happen to use Pulsar transactions?

Checked on broker: This Broker is not configured with transactionCoordinatorEnabled=true so we are not using pulsar transactions. Should I check it in some other way?

@lhotari
Copy link
Member

lhotari commented Jan 31, 2025

@szkoludasebastian Just confirming: do you happen to use Pulsar transactions?

Checked on broker: This Broker is not configured with transactionCoordinatorEnabled=true so we are not using pulsar transactions. Should I check it in some other way?

That's fine. No transactions involved. The application code would explicitly need to use transactions if it was used.
The reason why I asked is that acknowledgement handling is more complicated when using transactions.

@lhotari
Copy link
Member

lhotari commented Jan 31, 2025

In the v3.3.1...v3.3.2 diff, #23072 looks like the largest change related to acknowledgements. If you would be able to run the experiments mentioned in #23908 (comment), that would confirm many things. Building a fork off v3.3.2 with #23072 commit 1996c86 reverted would be one way to check if it's impacting your use case.
There are also a few other commits that could be causing the change in behavior such as #23189 & #23147.

@szkoludasebastian
Copy link
Author

I will check that with dispatcherDispatchMessagesInSubscriptionThread=false. Where to configure that?

@szkoludasebastian In broker.conf (reference). In Apache Pulsar Helm chart k8s deployments, you add dispatcherDispatchMessagesInSubscriptionThread: "false" under the broker.configData key in values.yaml.

Configured that but still able to reproduce issue

@szkoludasebastian
Copy link
Author

In the v3.3.1...v3.3.2 diff, #23072 looks like the largest change related to acknowledgements. If you would be able to run the experiments mentioned in #23908 (comment), that would confirm many things. Building a fork off v3.3.2 with #23072 commit 1996c86 reverted would be one way to check if it's impacting your use case. There are also a few other commits that could be causing the change in behavior such as #23189 & #23147.

Tried that but I have problem with building pulsar-all image for architecture linux/amd64. For linux/arm64 it works correctly but I need to build linux/amd64

@lhotari
Copy link
Member

lhotari commented Feb 3, 2025

Tried that but I have problem with building pulsar-all image for architecture linux/amd64. For linux/arm64 it works correctly but I need to build linux/amd64

What problem are you facing? The way Pulsar docker images are built during the release is documented at https://pulsar.apache.org/contribute/release-process/#release-pulsar-30-and-later . The prerequisite is to build artifacts before that.

@szkoludasebastian
Copy link
Author

I tried to build that locally with this instruction from Readme:

Image I've added this param: -Ddocker.platforms=linux/amd64 so it was like: mvn package -Pdocker,-main -am -pl docker/pulsar-all -DskipTests -Ddocker.platforms=linux/amd64 and the problem is: Image there is no such problem when I remove this param and image apachepulsar/pulsar and apachepulsar/pulsar-all is built locally, but I need linux/amd64 arch

@lhotari
Copy link
Member

lhotari commented Feb 3, 2025

I tried to build that locally with this instruction from Readme:

Unfortunately those instructions aren't up-to-date. Please check #23908 (comment) for the commands used while releasing.

@szkoludasebastian
Copy link
Author

still same issue when i use commands from this link

@lhotari
Copy link
Member

lhotari commented Feb 3, 2025

still same issue when i use commands from this link

It seems that you might be using the wrong commands. In the error message, you shouldn't see apachepulsar/pulsar at all if you are using this command:

mvn clean install -DskipTests
DOCKER_USER=my_docker_user
mvn install -pl docker/pulsar,docker/pulsar-all \
    -DskipTests \
    -Pmain,docker,docker-push \
    -Ddocker.platforms=linux/amd64,linux/arm64 \
    -Ddocker.organization=$DOCKER_USER \
    -Ddocker.noCache=true

@szkoludasebastian
Copy link
Author

still something wrong, on k8s I see such error:

Image

on my local docker images seems to be linux/amd64

Image

@lhotari
Copy link
Member

lhotari commented Feb 3, 2025

still something wrong, on k8s I see such error:

Image on my local docker images seems to be linux/amd64 Image

Did you use these commands to build a multi-platform image?

mvn clean install -DskipTests
DOCKER_USER=my_docker_user
mvn install -pl docker/pulsar,docker/pulsar-all \
    -DskipTests \
    -Pmain,docker,docker-push \
    -Ddocker.platforms=linux/amd64,linux/arm64 \
    -Ddocker.organization=$DOCKER_USER \
    -Ddocker.noCache=true

There are several gotchas in building and handling multi-platform images. I'm not exactly sure, but I think that unless you build a multi-platform image in one shot and push it directly to a docker repository from the build, you might not be able to push it separately with docker push. I have struggled with this issue in the past. You can use a custom docker repository as well, by prefixing the -Ddocker.organization= value with the docker repository.

You might not need a multi-platform image in your case, but if you'd like to build a similar image as the Pulsar images, it's better to stick to the same set of commands.

Another detail is that tagging multi-platform images contains gotchas. In Pulsar, we use the regctl image copy to achieve tagging directly in the repository. (examples). Using docker tag locally and then pushing with docker push could cause in unexpected results with multi-platform images. regctl is available at https://github.com/regclient/regclient .

@szkoludasebastian
Copy link
Author

In the v3.3.1...v3.3.2 diff, #23072 looks like the largest change related to acknowledgements. If you would be able to run the experiments mentioned in #23908 (comment), that would confirm many things. Building a fork off v3.3.2 with #23072 commit 1996c86 reverted would be one way to check if it's impacting your use case. There are also a few other commits that could be causing the change in behavior such as #23189 & #23147.

Finally I was able to build pulsar-all docker image. I tried to revert commits which you mentioned in above comment but still was able to reproduce issue. Do you know which other commit I can try?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

No branches or pull requests

2 participants