PersistentSubscription silently drops #112

Closed
kivancguckiran opened this issue Jan 6, 2021 · 8 comments · Fixed by #135
Labels
kind/bug: Issues which are a software defect

Comments

@kivancguckiran
Contributor

We are experiencing consumer drops from persistent subscriptions that do not raise any error, so the consumer service hangs indefinitely. The drop is only visible in the EventStoreDB web UI, and it happens roughly weekly.

Is this a known issue? We suspect it might be a server-side drop, but it should still raise an error on the client side so that we can restart the consumer service.

We are currently using the 0.0.0-alpha.7 release.

@George-Payne added the kind/bug label Jan 7, 2021
@George-Payne
Member

Hi @kivancguckiran

We are aware of some setups that can cause this to happen. Can you tell us a bit about the environment you are running this in?

The issue may come from network infrastructure between the client and the server mishandling gRPC keep-alive messages.

We are working on a server-side fix for these deployment configurations, but in the meantime you can work around the issue by periodically reading from $all to keep the connection alive. More information about your setup would help us pinpoint the issue.

@kivancguckiran
Contributor Author

Thanks for the swift reply.

We are running EventStoreDB 20.6.1 on a dedicated VM in Azure. Our services are deployed on AKS (Azure Kubernetes Service), and the EventStoreDB VM is on the same virtual network as the cluster. The base image for our service's Docker image is node:12-alpine.

Our persistent subscription is to a $by_category projection stream (i.e. one of the form $ce-Name), and we are using the default settings for the subscription.

We are planning to add this snippet on startup:

import { FORWARDS, START } from '@eventstore/db-client';

// Every minute, read a single event to keep the gRPC connection alive.
setInterval(async () => {
  await client.readStream('$all', 1, {
    direction: FORWARDS,
    fromRevision: START,
  });
}, 60000);

Thanks.

@kivancguckiran
Contributor Author

Situation update: we are having a possibly unrelated and odd problem. The snippet I provided above throws an error like this:

Unhandled Rejection at: [object Promise], reason: Error: $all not found

We are working around it by reading from another projection stream instead, but this behavior is strange.

@kaancfidan

Hi @kivancguckiran

> We are aware of some setups that can cause this to happen. Can you tell us a bit about the environment you are running this in?
>
> The issue may come from network infrastructure between the client and the server mishandling gRPC keep-alive messages.
>
> We are working on a server-side fix for these deployment configurations, but in the meantime you can work around the issue by periodically reading from $all to keep the connection alive. More information about your setup would help us pinpoint the issue.

Shouldn't the client raise some kind of error even if the problem is with the infrastructure? The issue here is not that the subscription is dropping, but that we have no way of knowing that it dropped.

@George-Payne
Member

> The snippet I provided above throws an error like this:
>
> Unhandled Rejection at: [object Promise], reason: Error: $all not found

Hi @kivancguckiran

To read from $all you need to use client.readAll:

await client.readAll(1, {
  direction: FORWARDS,
  fromPosition: START,
});

(This is because $all is addressed by position rather than stream revision, so it has its own read method.)

@George-Payne
Member

> Shouldn't the client raise some kind of error even if the problem is with the infrastructure?

Unfortunately, neither the client nor the server is made aware that the connection has been dropped.

You can read more on the upstream issue here:

grpc/grpc-dotnet#770

> it's possible for one side of an idle TCP connection to disconnect without the other side observing it. Without any keep alives or timeouts a connection can stay in this state indefinitely.
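
For illustration, here is a minimal sketch of how keep-alive pings can be enabled at the gRPC channel level with @grpc/grpc-js; this is not the @eventstore/db-client API of the time, and the endpoint and timing values are assumptions:

// Sketch only: enable HTTP/2 keep-alive pings on a raw @grpc/grpc-js client
// so a half-open connection is detected instead of idling forever.
import { Client, credentials } from '@grpc/grpc-js';

const grpcClient = new Client('esdb.example.com:2113', credentials.createInsecure(), {
  // Send a keep-alive ping after 10 seconds of inactivity on the connection.
  'grpc.keepalive_time_ms': 10000,
  // Consider the connection dead if a ping goes unacknowledged for 5 seconds.
  'grpc.keepalive_timeout_ms': 5000,
  // Allow pings while no calls are in flight, which is exactly the idle
  // subscription case discussed in this issue.
  'grpc.keepalive_permit_without_calls': 1,
});

Later EventStoreDB gRPC clients expose equivalent settings through the keepAliveInterval and keepAliveTimeout connection string parameters.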

@George-Payne added the blocked label Jan 11, 2021
@kaancfidan

kaancfidan commented Jan 11, 2021

Thanks for clarifying the issue for us. We had honestly suspected it could be a problem with gRPC itself, so it's nice to have our suspicions confirmed.

On a related note, we also have .NET services that connect to our EventStoreDB using the .NET client, and we have yet to upgrade them to gRPC (they still use the TCP interface).

Should we expect the same issue when we upgrade those to gRPC as well? And should I ask that question in the .NET client's issue tracker?

@YoEight
Member

YoEight commented Feb 8, 2021

@kaancfidan you won't see that behaviour with the .NET 5.0 gRPC client; it has a keep-alive setting that prevents it.
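
(In the .NET 5 gRPC client, those keep-alive pings are configured on the channel's SocketsHttpHandler via its KeepAlivePingDelay and KeepAlivePingTimeout properties.)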

@George-Payne removed the blocked label Feb 9, 2021