Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AddRabbitMQ method with volume mounts does not persist queues across restarts by default #2247

Closed
pedershk opened this issue Feb 15, 2024 · 15 comments · Fixed by #3152
Closed
Assignees
Labels
area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication

Comments

@pedershk
Copy link

RabbitMQ uses its HOSTNAME to set up paths for queues on its internal storage. As the DCP generates a random hostname across restarts, this changes with every invocation of the container. This results in unexpected behavior from the developer's point of view - queues are not persisted across restarts even with persistent storage configured for the container.

This is an example of a statement that will NOT persist queues:

 var rabbitMq = builder
     .AddRabbitMQContainer("rabbitmq-host", 5672, password: rabbitMqPassword)
     .WithVolumeMount("rabbitmq-host-vol", "/var/lib/rabbitmq", VolumeMountType.Named);

If a NODENAME environment variable is set, RabbitMQ uses that to override the HOSTNAME on the container for internal configuration purposes. So directly specifiying the NODENAME variable With i.e. WithEnvironment("NODENAME", "rabbit@localhost") resolves the issue, as in the following statement:

 var rabbitMq = builder
     .AddRabbitMQContainer("rabbitmq-host", 5672, password: rabbitMqPassword)
     .WithEnvironment("NODENAME", "rabbit@localhost")
     .WithVolumeMount("rabbitmq-host-vol", "/var/lib/rabbitmq", VolumeMountType.Named);

Suggestion : set NODENAME by default to a fixed value for RabbitMQ hosts (it's only used internally, so won't affect other RabbitMQ containers running in the same docker engine) - but still allow the user to override by setting it specifically like above. Alternatively, as a minimum, update the RabbitMQ container documentation for Aspire to mention this specifically.

@eerhardt
Copy link
Member

@mitchdenny - what do you think about having a "UsePersistence()" extension method in RabbitMQ that does this? Similar to the Storage container:

var storage = builder.AddAzureStorage("storage").UseEmulator(container =>
{
container.UsePersistence();
});

@mitchdenny
Copy link
Member

Need to think it through all the way to deployment, but definatley open to this idea. The difference between this and the storage resource is that Rabbit MQ is already a container so no need for the callback.

@mitchdenny mitchdenny added area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication and removed area-integrations Issues pertaining to Aspire Integrations packages untriaged labels Feb 24, 2024
@mitchdenny mitchdenny added this to the preview 5 (Apr) milestone Feb 24, 2024
@danmoseley danmoseley added the bug label Feb 26, 2024
@davidfowl davidfowl changed the title RabbitMQ .WithRabbitMQContainer method even with volume mounts does not persist queues across restarts by default AddRabbitMQ method with volume mounts does not persist queues across restarts by default Feb 28, 2024
@joperezr joperezr added enhancement and removed bug labels Mar 1, 2024
@joperezr
Copy link
Member

joperezr commented Mar 1, 2024

@mitchdenny were you planning on taking this one according to the above?

@davidfowl
Copy link
Member

davidfowl commented Mar 9, 2024

@sebastienros assigning this to you. There's work to add helpers for containers for persistence, you might be able to build on top #1317

cc @DamianEdwards

@davidfowl
Copy link
Member

@DamianEdwards does this work now?

@DamianEdwards
Copy link
Member

Let me try it out.

@DamianEdwards
Copy link
Member

DamianEdwards commented Mar 23, 2024

Ok an update. I tried the scenario out with the new WithDataVolume() method and can confirm what @pedershk reported that RabbitMQ internally lays out storage based on the node name which in the case of a changing port is not stable, so using a volume just results in a new sub-dir structure being created in it each time. I updated the WithDataVolume() and WithDataBindMount() to set a stable node name via the RABBITMQ_NODENAME environment variable (docs) and that indeed resulted in the same directory being used each time, but then a stable password is required (logins fail otherwise, same as #1151). So I created a stable password via a parameter and then ran into issues with password escaping in the connection string (#3117) which I worked around.

I then set about trying to create the scenario where message are enqueued but not consumed in one dev session, and then in the next dev session they are consumed, but I haven't been able to do so. I used the TestShop app to do this as it already has separate projects that produce and consume messages, so once modified to use persistence for RabbitMQ, I can just start a dev session with the resource that consumes messages (orderprocessor) commented out, add some items to the basket and then click the basket to force an order to be created and thus a message produced. Then stop the session, uncomment the resource, then start a new dev session. I'm expecting to see the orderprocessor receive the unprocessed messages when it starts but it does not. The logs for the RabbitMQ container seem to indicate that no queue restoration took place when it started.

RabbitMQ container logs
2024-03-22 17:39:08 2024-03-23 00:39:08.492506+00:00 [info] <0.248.0> Running boot step recovery defined by app rabbit
2024-03-22 17:39:08 2024-03-23 00:39:08.506201+00:00 [info] <0.474.0> Making sure data directory '/var/lib/rabbitmq/mnesia/messaging@localhost/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2024-03-22 17:39:08 2024-03-23 00:39:08.522707+00:00 [info] <0.474.0> Starting message stores for vhost '/'
2024-03-22 17:39:08 2024-03-23 00:39:08.532842+00:00 [info] <0.484.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2024-03-22 17:39:08 2024-03-23 00:39:08.534894+00:00 [info] <0.474.0> Started message store of type transient for vhost '/'
2024-03-22 17:39:08 2024-03-23 00:39:08.535094+00:00 [info] <0.488.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2024-03-22 17:39:08 2024-03-23 00:39:08.535539+00:00 [warning] <0.488.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2024-03-22 17:39:08 2024-03-23 00:39:08.537064+00:00 [info] <0.474.0> Started message store of type persistent for vhost '/'
2024-03-22 17:39:08 2024-03-23 00:39:08.537213+00:00 [info] <0.474.0> Recovering 0 queues of type rabbit_classic_queue took 29ms
2024-03-22 17:39:08 2024-03-23 00:39:08.537259+00:00 [info] <0.474.0> Recovering 0 queues of type rabbit_quorum_queue took 0ms
2024-03-22 17:39:08 2024-03-23 00:39:08.537300+00:00 [info] <0.474.0> Recovering 0 queues of type rabbit_stream_queue took 0ms

My changes so far are in https://github.com/dotnet/aspire/tree/damianedwards/rabbitmq-volume

@pedershk any thoughts on what might be going on?

@pedershk
Copy link
Author

pedershk commented Mar 23, 2024

Hi @DamianEdwards. I've observed the same behavior with passwords - this forced me to not use persistence with RabbitMQ when developing locally (we use ASB in our deployed test environment, so not an issue there). I was a bit flummoxed that the password parameter was removed from the AddRabbitMQ method, to be honest (atleast in P4 - I'm not currently using nightlies) - what's the reasoning behind this change? I understand it can be set via a parameter now, but it's not at all obvious - and the sum of this makes it hard to hook up things like the RMQ admin interface.

I haven't set the password to a fixed value, so haven't been able to observe the non-stateful behavior you've been seeing. I CAN confirm however that previously, when password was being set via the AddRabbitMQ method parameter, persistence was working properly and queues persisted across restarts with the addition of the RABBITMQ_NODENAME environment variable to the container.

Sorry I can't be of more help - but it would be good to get this back in a functional state.

@davidfowl
Copy link
Member

I was a bit flummoxed that the password parameter was removed from the AddRabbitMQ method, to be honest (atleast in P4 - I'm not currently using nightlies) - what's the reasoning behind this change? I understand it can be set via a parameter now, but it's not at all obvious - and the sum of this makes it hard to hook up things like the RMQ admin interface.

P4 still had lots of inconsistency. When we unified the container-based methods, we didn't add the password configuration back and that was an oversight. It lead to this issue which we resolved in P5.

@DamianEdwards
Copy link
Member

@pedershk I'm completely new to RabbitMQ so it's possible I'm doing something wrong. Can you look at the code in the TestShop.BasketService (producer) and TestShop.OrderProcessor (consumer) projects and confirm that the code, coupled with the procedure I documented above, should be sufficient to test the scenario?

@pedershk
Copy link
Author

@pedershk I'm completely new to RabbitMQ so it's possible I'm doing something wrong. Can you look at the code in the TestShop.BasketService (producer) and TestShop.OrderProcessor (consumer) projects and confirm that the code, coupled with the procedure I documented above, should be sufficient to test the scenario?

Afraid I'm not going to be much help - I've only ever used RabbitMQ through abstractions such as MassTransit. But I'm not entirely sure that the C# SDK creates queues with persistence by default - RabbitMQ also has non-persisted queue types.

MassTransit creates queues with persistence by default, however.

I can download a nightly build of Aspire and test with your AddRabbitMQ in the TestShop Apphost against my own MassTransit code - but that may not be sufficient to validate your scenario?

@davidfowl
Copy link
Member

https://stackoverflow.com/a/40748606/45091?

@DamianEdwards
Copy link
Member

That was it! Confirmed it's working now. I'll send a PR.

image

image

@davidfowl
Copy link
Member

@DamianEdwards can you backport

@DamianEdwards
Copy link
Member

@davidfowl #3186

@github-actions github-actions bot locked and limited conversation to collaborators Apr 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication
Projects
None yet
8 participants