PersistentSubscription silently drops #112

Closed
kivancguckiran opened this issue Jan 6, 2021 · 8 comments · Fixed by #135
Labels
kind/bug: Issues which are a software defect

Comments

@kivancguckiran
Contributor

We are experiencing consumer drops from persistent subscriptions that do not raise any error, so the consumer service hangs indefinitely. The drop is only visible in the EventStoreDB web UI, and it happens roughly weekly.

Is this a known issue? We suspect it might be a server-side drop, but it should still raise an error on the client side so that we can restart the consumer service.

We are currently using the 0.0.0-alpha.7 release.

@George-Payne added the kind/bug label Jan 7, 2021
@George-Payne
Member

Hi @kivancguckiran

We are aware of some setups that can cause this to happen. Can you tell us a bit about the environment you are running this in?

The issue may come from network infrastructure between the client and the server mishandling gRPC keep-alive messages.

We are working on a server-side fix for these deployment configurations, but in the meantime you can work around the issue by periodically reading from $all to keep the connection alive. More information about your setup would help us pinpoint the issue.

@kivancguckiran
Contributor Author

Thanks for the swift reply.

We are running EventStoreDB 20.6.1 on a dedicated VM in Azure. Our services are deployed on AKS (Azure Kubernetes Service), and the EventStoreDB VM is on the same virtual network as the cluster. The base image for our service's Docker image is node:12-alpine.

Our persistent subscription is to a $by_category projection stream (i.e. one of the form $ce-Name), and we are using the default settings for the subscription.

We are planning to add this snippet on startup:

import { FORWARDS, START } from '@eventstore/db-client';

// Every minute, read a single event to keep the gRPC connection alive.
setInterval(async () => {
  await client.readStream('$all', 1, {
    direction: FORWARDS,
    fromRevision: START,
  });
}, 60000);

Thanks.

@kivancguckiran
Contributor Author

Situation update: we are having a possibly unrelated and odd problem. The snippet I provided above throws an error like this:

Unhandled Rejection at: [object Promise], reason: Error: $all not found

We are working around it by reading from another projection stream instead, but this behavior is strange.

@kaancfidan

Hi @kivancguckiran

> We are aware of some setups that can cause this to happen. Can you tell us a bit about the environment you are running this in?
>
> The issue may come from network infrastructure between the client and the server mishandling gRPC keep-alive messages.
>
> We are working on a server-side fix for these deployment configurations, but in the meantime you can work around the issue by periodically reading from $all to keep the connection alive. More information about your setup would help us pinpoint the issue.

Shouldn't the client raise some kind of error even if the problem is with the infrastructure? The issue here is not that the subscription is dropping, but that we have no way of knowing that it dropped.

@George-Payne
Member

> The snippet I provided above throws an error like this:
>
> Unhandled Rejection at: [object Promise], reason: Error: $all not found

Hi @kivancguckiran

To read from $all you need to use client.readAll:

await client.readAll(1, {
  direction: FORWARDS,
  fromPosition: START,
});

(This is because $all is addressed by position rather than stream revision, so it has its own read method.)

@George-Payne
Member

> Shouldn't the client raise some kind of error even if the problem is with the infrastructure?

Unfortunately, neither the client nor the server is made aware that the connection has been dropped.

You can read more on the upstream issue here:

grpc/grpc-dotnet#770

> it's possible for one side of an idle TCP connection to disconnect without the other side observing it. Without any keep alives or timeouts a connection can stay in this state indefinitely.
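
For illustration, here is a minimal sketch of how keep-alive pings can be enabled at the gRPC channel level with @grpc/grpc-js; this is not the @eventstore/db-client API of the time, and the endpoint and timing values are assumptions:

// Sketch only: enable HTTP/2 keep-alive pings on a raw @grpc/grpc-js client
// so a half-open connection is detected instead of idling forever.
import { Client, credentials } from '@grpc/grpc-js';

const grpcClient = new Client('esdb.example.com:2113', credentials.createInsecure(), {
  // Send a keep-alive ping after 10 seconds of inactivity on the connection.
  'grpc.keepalive_time_ms': 10000,
  // Consider the connection dead if a ping goes unacknowledged for 5 seconds.
  'grpc.keepalive_timeout_ms': 5000,
  // Allow pings while no calls are in flight, which is exactly the idle
  // subscription case discussed in this issue.
  'grpc.keepalive_permit_without_calls': 1,
});

Later EventStoreDB gRPC clients expose equivalent settings through the keepAliveInterval and keepAliveTimeout connection string parameters.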

@George-Payne added the blocked label Jan 11, 2021
@kaancfidan

kaancfidan commented Jan 11, 2021

Thanks for clarifying the issue for us. We had honestly suspected it could be a problem with gRPC itself, so it's nice to have our suspicions confirmed.

On a related note, we also have .NET services that connect to our EventStoreDB using the .NET client, and we have yet to upgrade them to gRPC (they still use the TCP interface).

Should we expect the same issue when we upgrade those to gRPC as well? And should I ask that question in the .NET client's issue tracker?

@YoEight
Member

YoEight commented Feb 8, 2021

@kaancfidan you won't see that behaviour with the .NET 5.0 gRPC client; it has a keep-alive setting that prevents it.
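
(In the .NET 5 gRPC client, those keep-alive pings are configured on the channel's SocketsHttpHandler via its KeepAlivePingDelay and KeepAlivePingTimeout properties.)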

@George-Payne removed the blocked label Feb 9, 2021