[bug] Crashing ingress and dispatcher pods when connections are disrupted in a 3 node cluster #764
Comments
/kind bug
I have not created a script, but I can try my best to outline the steps I take to reproduce the issue. As shown above, this can result in delivery blockages.
Overview
Resources
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  annotations:
    eventing.knative.dev/broker.class: RabbitMQBroker
spec:
  config:
    apiVersion: rabbitmq.com/v1beta1
    kind: RabbitmqCluster
    name: rabbitmq
    namespace: knative-eventing
  delivery:
    retry: 5
---
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq
  namespace: knative-eventing
  annotations:
    rabbitmq.com/topology-allowed-namespaces: "*"
spec:
  replicas: 3
  resources:
    requests:
      memory: 2Gi
    limits:
      memory: 2Gi
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - rabbitmq
          topologyKey: kubernetes.io/hostname
Kk! 🤔 Definitely a scenario we haven't tried so I'll be taking a look next week, thanks @andrewwebber :)!
I was performing some chaos engineering exercises after having reliability issues with the broker.
Of course, this could be a configuration missing on the cluster side (replicated queues?); however, 1.4.0 does not seem to recover when a 3-node RabbitMQ cluster is being rescheduled (maintenance or autoscaling + PodDisruptionBudget).
The dispatchers seem to crash and the broker-ingress never reconnects. The situation is only resolved by restarting the broker, at which point successful retry logic is initiated within the broker.
In the meantime, I have a custom operator that scrapes the logs of the dispatchers and brokers and restarts them when appropriate.
Just wondered if this was of interest.
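For anyone trying to reproduce the rescheduling scenario mentioned above, a PodDisruptionBudget along the following lines could be used so that node maintenance or autoscaling evicts the RabbitMQ pods one at a time. This is only a minimal sketch and not the manifest actually used in the report: the name `rabbitmq-pdb` and the `maxUnavailable: 1` value are assumptions, and the selector simply reuses the `app.kubernetes.io/name: rabbitmq` label from the anti-affinity rule in the RabbitmqCluster resource.

```yaml
# Hypothetical PodDisruptionBudget for the 3-replica RabbitmqCluster.
# The name and maxUnavailable value are assumptions, not taken from the report.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: rabbitmq-pdb
  namespace: knative-eventing
spec:
  # Allow at most one RabbitMQ pod to be voluntarily evicted at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      # Same label the podAntiAffinity rule of the cluster matches on.
      app.kubernetes.io/name: rabbitmq
```

With a budget like this in place, draining the nodes one after another should roll the RabbitMQ pods across the cluster, which is roughly the kind of connection disruption described here.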
Originally posted by @andrewwebber in #760 (comment)