DotNext.Net.Cluster crash in production since I think version 5.4.0 #242

guillaume-chervet · 2024-06-07T19:55:43Z

We have crashes in production that lock up nodes.
The SlimFaas code did not change anything related to this part; we only updated libraries: AxaFrance/SlimFaas@b26e3bd
I am not sure, but I think it is linked to these changes:

DotNext.Net.Cluster 5.4.0

Changed binary file format for WAL for more efficient I/O. A new format is incompatible with all previous versions. To enable legacy format, set PersistentState.Options.UseLegacyBinaryFormat property to true
Introduced a new experimental binary format for WAL based on sparse files. Can be enabled with PersistentState.Options.MaxLogEntrySize property

We adopted the new default format, and we now get this new error, which happens intermittently and crashes the node:

fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
      An unhandled exception has occurred while executing the request.
      System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'fileOffset')
         at System.ThrowHelper.ThrowArgumentOutOfRangeException_NeedNonNegNum(String) + 0x30
         at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
         at System.IO.RandomAccess.WriteAsync(SafeFileHandle, IReadOnlyList`1, Int64, CancellationToken) + 0x3d
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.<WriteThroughAsync>d__17.MoveNext() + 0x2ad
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16) + 0x22
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendCachedAsync>d__44`1.MoveNext() + 0x740
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16) + 0x22
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAsync>d__47`1.MoveNext() + 0x4f2
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAndCommitSlowAsync>d__50`1.MoveNext() + 0x164
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x29
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.<AppendEntriesAsync>d__101`1.MoveNext() + 0x82b
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x21
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x5ac
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x7f7
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.<<Invoke>g__Awaited|10_0>d.MoveNext() + 0xac
fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
      An unhandled exception has occurred while executing the request.
      System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'fileOffset')
         at System.ThrowHelper.ThrowArgumentOutOfRangeException_NeedNonNegNum(String) + 0x30
         at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
         at System.IO.RandomAccess.WriteAsync(SafeFileHandle, IReadOnlyList`1, Int64, CancellationToken) + 0x3d
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.<WriteThroughAsync>d__17.MoveNext() + 0x2ad
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16) + 0x22
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendCachedAsync>d__44`1.MoveNext() + 0x740
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16) + 0x22
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAsync>d__47`1.MoveNext() + 0x4f2
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAndCommitSlowAsync>d__50`1.MoveNext() + 0x164
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x29
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.<AppendEntriesAsync>d__101`1.MoveNext() + 0x82b
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x21
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x5ac
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x7f7
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.<<Invoke>g__Awaited|10_0>d.MoveNext() + 0xac
fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
      An unhandled exception has occurred while executing the request.
      System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'fileOffset')
         at System.ThrowHelper.ThrowArgumentOutOfRangeException_NeedNonNegNum(String) + 0x30
         at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
         at System.IO.RandomAccess.WriteAsync(SafeFileHandle, IReadOnlyList`1, Int64, CancellationToken) + 0x3d
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.<WriteThroughAsync>d__17.MoveNext() + 0x2ad
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16) + 0x22
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendUncachedAsync>d__45`1.MoveNext() + 0x214
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAsync>d__47`1.MoveNext() + 0x2e5
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAndCommitSlowAsync>d__50`1.MoveNext() + 0x164
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x29
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.<AppendEntriesAsync>d__101`1.MoveNext() + 0x82b
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x21
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x5ac
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x7f7
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.<<Invoke>g__Awaited|10_0>d.MoveNext() + 0xac
fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
      An unhandled exception has occurred while executing the request.
      System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'fileOffset')
         at System.ThrowHelper.ThrowArgumentOutOfRangeException_NeedNonNegNum(String) + 0x30
         at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
         at System.IO.RandomAccess.WriteAsync(SafeFileHandle, IReadOnlyList`1, Int64, CancellationToken) + 0x3d
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.<WriteThroughAsync>d__17.MoveNext() + 0x2ad
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16) + 0x22
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendUncachedAsync>d__45`1.MoveNext() + 0x214
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAsync>d__47`1.MoveNext() + 0x2e5
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAndCommitSlowAsync>d__50`1.MoveNext() + 0x164
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x29
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.<AppendEntriesAsync>d__101`1.MoveNext() + 0x82b
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x21
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x5ac
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x7f7
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.<<Invoke>g__Awaited|10_0>d.MoveNext() + 0xac
fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
      An unhandled exception has occurred while executing the request.
      System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'fileOffset')
         at System.ThrowHelper.ThrowArgumentOutOfRangeException_NeedNonNegNum(String) + 0x30
         at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
         at System.IO.RandomAccess.WriteAsync(SafeFileHandle, IReadOnlyList`1, Int64, CancellationToken) + 0x3d
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.<WriteThroughAsync>d__17.MoveNext() + 0x2ad
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16) + 0x22
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendUncachedAsync>d__45`1.MoveNext() + 0x214
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAsync>d__47`1.MoveNext() + 0x2e5
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAndCommitSlowAsync>d__50`1.MoveNext() + 0x164
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x29
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.<AppendEntriesAsync>d__101`1.MoveNext() + 0x82b
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x21
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x5ac
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x7f7
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.<<Invoke>g__Awaited|10_0>d.MoveNext() + 0xac
fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
      An unhandled exception has occurred while executing the request.
      System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'fileOffset')
         at System.ThrowHelper.ThrowArgumentOutOfRangeException_NeedNonNegNum(String) + 0x30
         at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
         at System.IO.RandomAccess.WriteAsync(SafeFileHandle, IReadOnlyList`1, Int64, CancellationToken) + 0x3d
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.<WriteThroughAsync>d__17.MoveNext() + 0x2ad
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16) + 0x22
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendUncachedAsync>d__45`1.MoveNext() + 0x214
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAsync>d__47`1.MoveNext() + 0x2e5
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
         at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<AppendAndCommitSlowAsync>d__50`1.MoveNext() + 0x164
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x29
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.<AppendEntriesAsync>d__101`1.MoveNext() + 0x82b
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x16
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2d
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x21
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x5ac
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<AppendEntriesAsync>d__79.MoveNext() + 0x7f7
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e

Or like this:

at System.ThrowHelper.ThrowArgumentOutOfRangeException_NeedNonNegNum(String) + 0x30
at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.Initialize() + 0x2a2
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<.ctor>g__CreateTables|28_1(SortedSet`1, DirectoryInfo, Int32, Int32, PersistentState.BufferManager&, Int32, PersistentState.WriteMode, Int64) + 0x14f
at DotNext.Net.Cluster.Consensus.Raft.PersistentState..ctor(DirectoryInfo, Int32, PersistentState.Options) + 0x445
at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine..ctor(DirectoryInfo, Int32, MemoryBasedStateMachine.Options) + 0x57
at SlimData.SlimPersistentState..ctor(String path) + 0x10c
at SlimFaas!<BaseAddress>+0xe59cb0
at System.Reflection.DynamicInvokeInfo.InvokeWithFewArguments(IntPtr, Byte&, Byte&, Object[], BinderBundle, Boolean) + 0x84
at System.Reflection.DynamicInvokeInfo.Invoke(Object, IntPtr, Object[], BinderBundle, Boolean) + 0xf3
at Internal.Reflection.Execution.MethodInvokers.InstanceMethodInvoker.CreateInstance(Object[], BinderBundle, Boolean) + 0x43
at Internal.Reflection.Core.Execution.MethodBaseInvoker.CreateInstance(Object[], Binder, BindingFlags, CultureInfo) + 0x3e
at System.Reflection.Runtime.MethodInfos.RuntimePlainConstructorInfo`1.Invoke(BindingFlags, Binder, Object[], CultureInfo) + 0x5b
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite, RuntimeResolverContext) + 0x74
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument) + 0xa9
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite, ServiceProviderEngineScope) + 0x1d
at Microsoft.Extensions.DependencyInjection.ServiceProvider.CreateServiceAccessor(ServiceIdentifier serviceIdentifier) + 0xf7
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey, Func`2) + 0xf7
at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(ServiceIdentifier, ServiceProviderEngineScope) + 0x39
at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type) + 0x2d
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider, Type) + 0x51
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[T](IServiceProvider provider) + 0x2f
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitFactory(FactoryCallSite, RuntimeResolverContext) + 0xf
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite, RuntimeResolverContext) + 0x74
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument) + 0xa9
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitConstructor(ConstructorCallSite, RuntimeResolverContext) + 0x86
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite, RuntimeResolverContext) + 0x74
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument) + 0xa9
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite, ServiceProviderEngineScope) + 0x1d
at Microsoft.Extensions.DependencyInjection.ServiceProvider.CreateServiceAccessor(ServiceIdentifier serviceIdentifier) + 0xf7
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey, Func`2) + 0xf7
at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(ServiceIdentifier, ServiceProviderEngineScope) + 0x39
at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(Type) + 0xf
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider, Type) + 0x51
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[T](IServiceProvider provider) + 0x2f
at DotNext.Net.Cluster.Consensus.Raft.Http.ConfigurationExtensions.UseConsensusProtocolHandler(IApplicationBuilder) + 0x5d
at SlimData.Startup.Configure(IApplicationBuilder app) + 0x15
at Program.<Main>$(String[] args) + 0x10f5


sakno · 2024-06-08T08:05:28Z

Have you set UseLegacyBinaryFormat property to true?

guillaume-chervet · 2024-06-08T09:00:49Z

Yes, it works when I set it back to the old mode. I always erase the previous data storage when I test.
@sakno

sakno · 2024-06-08T15:54:40Z

Do you mean that it crashes on empty WAL with a new format?

guillaume-chervet · 2024-06-08T19:55:46Z

It sometimes happens with an empty WAL, and sometimes after a while with an existing WAL, @sakno.

sakno · 2024-06-09T08:26:50Z

Do you have a stable repro? I see that the second stack trace is from the tests in your repository.

guillaume-chervet · 2024-06-09T09:22:47Z

It is somewhat random behavior.
But once it starts happening, it does not stop.

The first logs come from our production environment.
The second comes from the dev environment, from one of 3 nodes at startup.
@sakno

sakno · 2024-06-09T09:31:27Z

It could happen if you are trying to open a WAL produced by a version < 5.4.0 with a new version >= 5.4.0 without UseLegacyBinaryFormat set to true. Are you sure that the dev environment starts clean, without older WAL files?
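
For reference, opting back into the legacy format might look like the minimal sketch below. UseLegacyBinaryFormat is the property named in the 5.4.0 release notes; creating PersistentState.Options directly, and the way the options object is later handed to the state machine, are assumptions made for illustration and may differ from how SlimFaas wires it.

```csharp
using DotNext.Net.Cluster.Consensus.Raft;

// Sketch only: opt back into the pre-5.4.0 WAL binary format.
// The property name comes from the 5.4.0 release notes; constructing
// PersistentState.Options directly here is an illustrative assumption.
var options = new PersistentState.Options
{
    UseLegacyBinaryFormat = true,
};
```

A WAL directory written in one format presumably cannot be reopened in the other, so the flag would need to stay consistent for the lifetime of the stored files.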

guillaume-chervet · 2024-06-09T09:35:17Z

Yes, I am sure. My test store was completely erased.
The same goes for the updated production environment. @sakno

guillaume-chervet · 2024-06-09T09:36:28Z

SlimFaas is compiled in AOT.

sakno · 2024-06-09T10:31:22Z

The second stack trace indicates that the WAL is trying to read existing files:

at System.IO.RandomAccess.ValidateInput(SafeFileHandle, Int64, Boolean) + 0x5f
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Table.Initialize() + 0x2a2
at DotNext.Net.Cluster.Consensus.Raft.PersistentState.<.ctor>g__CreateTables|28_1(SortedSet`1, DirectoryInfo, Int32, Int32, PersistentState.BufferManager&, Int32, PersistentState.WriteMode, Int64) + 0x14f

Here is the code for Initialize:

dotNext/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.Partition.cs

Lines 507 to 552 in cacf3e5

    
    internal override void Initialize()
    {
        using var handle = File.OpenHandle(FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, FileOptions.SequentialScan);
        // read header
        if (RandomAccess.Read(Handle, header.Span, fileOffset: 0L) < HeaderSize)
        {
            header.Span.Clear();
        }
        else if (IsSealed)
        {
            // partition is completed, read table
            var tableStart = RandomAccess.GetLength(Handle);
            RandomAccess.Read(Handle, footer.Span, tableStart - footer.Length);
        }
        else
        {
            // read sequentially every log entry
            int footerOffset;
            long fileOffset;
            if (PartitionNumber is 0L)
            {
                footerOffset = LogEntryMetadata.Size;
                fileOffset = HeaderSize + LogEntryMetadata.Size;
            }
            else
            {
                footerOffset = 0;
                fileOffset = HeaderSize;
            }
            for (Span<byte> metadataBuffer = this.metadataBuffer.Span, metadataTable = footer.Span; ; footerOffset += LogEntryMetadata.Size)
            {
                var count = RandomAccess.Read(Handle, metadataBuffer, fileOffset);
                if (count < LogEntryMetadata.Size)
                    break;
                fileOffset = LogEntryMetadata.GetEndOfLogEntry(metadataBuffer);
                if (fileOffset <= 0L)
                    break;
                metadataBuffer.CopyTo(metadataTable.Slice(footerOffset, LogEntryMetadata.Size));
            }
        }
    }

To get an exception like the one in your stack trace, the program needs to go into the second or third if branch. That is possible only if there is a file in the file system.
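
To make the failure mode concrete, here is a small standalone sketch that uses only System.IO and does not touch dotNext. In the snippet above, the sealed-partition branch computes its read offset as the file length minus the footer length. If the file on disk is shorter than the footer, that difference is negative and RandomAccess.Read rejects it with the same 'fileOffset' parameter name seen in the stack traces. The sizes and the helper names (FooterOffset, TryReadFooter) are made up for illustration, and this is not confirmed as the root cause of the crash.

```csharp
using System;
using System.IO;
using Microsoft.Win32.SafeHandles;

// Hypothetical footer size, chosen for illustration; not dotNext's real value.
const int footerLength = 64;

// Same arithmetic as the sealed-partition branch: file length minus footer size.
static long FooterOffset(long fileLength, int footerSize) => fileLength - footerSize;

// Returns the name of the rejected parameter, or "ok" if the read was accepted.
static string TryReadFooter(string path, int footerSize)
{
    using SafeFileHandle handle = File.OpenHandle(path, FileMode.Open, FileAccess.Read);
    var buffer = new byte[footerSize];
    try
    {
        RandomAccess.Read(handle, buffer, FooterOffset(RandomAccess.GetLength(handle), footerSize));
        return "ok";
    }
    catch (ArgumentOutOfRangeException e)
    {
        return e.ParamName ?? "unknown";
    }
}

string path = Path.GetTempFileName();
try
{
    File.WriteAllBytes(path, new byte[16]);   // shorter than the footer: offset is -48
    Console.WriteLine(TryReadFooter(path, footerLength)); // prints "fileOffset"

    File.WriteAllBytes(path, new byte[128]);  // long enough: offset is 64
    Console.WriteLine(TryReadFooter(path, footerLength)); // prints "ok"
}
finally
{
    File.Delete(path);
}
```

So a truncated or partially written partition file would be enough to trigger the exception during Initialize, independently of which format version produced it.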

guillaume-chervet · 2024-06-09T12:14:25Z

Forget the latest logs, @sakno; I may have made a mistake in our dev Kubernetes environment.

Here are the logs my colleagues sent me from the crash in production. It occurs with the new protocol (after a random lapse of time, around 48 hours) and does not happen with the old one. I think it handles about 400,000 write operations per day.
slimfaas-1-slimfaas.log
slimfaas-2-slimfaas.log
slimfaas-0-slimfaas.log

I do not know where the negative number can come from.

sakno · 2024-06-09T16:40:05Z

How is the WAL configured? How many records per partition, parallel I/O, etc.? What's the target architecture, x86_64?

guillaume-chervet · 2024-06-09T18:35:58Z

Target architecture is x86_64.
I do not know what the other options are. Here is the SlimData persistent state constructor: https://github.com/AxaFrance/SlimFaas/blob/2ca3a8c7589b87dcd560164d7ed643f8f17aa89b/src/SlimData/SlimPersistentState.cs#L19

Thank you @sakno for your help

sakno · 2024-06-09T20:13:30Z

It's hard to say what the root cause of the problem is because there is no stable repro. I can only guess. Possibly it happens because of network timeouts leading to cancellation of the token used by the WAL internally to perform I/O. Some I/O was done in a way that is not safe for cancellation. I've prepared a potential fix, but I can't release it right now.

sakno · 2024-06-11T17:24:19Z

Did you have a chance to check the fix?

guillaume-chervet · 2024-06-11T17:41:30Z

Hi @sakno, do you have a way to publish an alpha?

guillaume-chervet · 2024-06-11T17:43:04Z

My level in C# is not the best 😜
It is my favorite language, but unfortunately I do not code in it a lot.

sakno · 2024-06-12T08:55:45Z

You can reference the project directly from your csproj file without a published alpha.

sakno · 2024-06-20T21:20:18Z

Release 5.7.0 has been published.

guillaume-chervet · 2024-06-21T04:53:31Z

Thank you @sakno. I will test it today and tell you if it fixes the problem.

sakno · 2024-07-09T17:38:52Z

@guillaume-chervet , please use 5.7.3 release

guillaume-chervet · 2024-07-09T18:14:22Z

I will test it tomorrow morning, @sakno.
I did not take the time to send you feedback, but the previous version crashed after 1 minute at most.

guillaume-chervet · 2024-07-09T18:14:44Z

Thank you so much for your help and work @sakno

guillaume-chervet · 2024-07-11T15:21:40Z

@sakno locally it works a lot better! A few hours of use and no problem. I will deploy it to our dev environment to test with more calls and time. I will give you some feedback!

Thank you @sakno !

sakno self-assigned this Jun 8, 2024

guillaume-chervet mentioned this issue Jun 8, 2024

fix(slimfaas): remove crash by setting back raft to previous protocol AxaFrance/SlimFaas#61

Merged

sakno added the bug Something isn't working label Jun 9, 2024

sakno closed this as completed in 6be595d Jun 20, 2024

sakno mentioned this issue Jun 25, 2024

DotNext.Net.Cluster: System.ArgumentOutOfRangeException: Non-negative number required. (Parameter 'length') #244

Closed

github-project-automation bot added this to Cluster Aug 28, 2024

github-project-automation bot moved this to Closed in Cluster Aug 28, 2024
