
Messages can be lost on upgrade from Erlang/OTP 19 (or earlier) to Erlang/OTP 20 #1243

Closed
hairyhum opened this issue Jun 1, 2017 · 11 comments

@hairyhum
Contributor

hairyhum commented Jun 1, 2017

OTP-20 uses a different term_to_binary format, probably changed in OTP-14337.
Because queue index directory names are generated from term_to_binary output, they differ between OTP-20 and earlier releases. Any data created before the Erlang upgrade will therefore be lost, because RabbitMQ deletes all queue index directories it does not recognize.

The milestone says 3.7.0, but together with some backwards-compatibility-restoring changes by the Erlang/OTP team, we hope to support OTP 20 in 3.6.11 as well.

See RabbitMQ Erlang/OTP 20 compatibility thread on rabbitmq-users for more details and updates.

@hairyhum hairyhum added the bug label Jun 1, 2017
@michaelklishin michaelklishin added this to the 3.7.0 milestone Jun 1, 2017
@michaelklishin
Member

It's not clear if we can make Erlang upgrades from 19 to 20 work for 3.6.x (that is, without breaking changes in RabbitMQ itself), so milestone => 3.7.0.

@michaelklishin michaelklishin changed the title from "Data store lost on upgrade to OTP-20" to "Data store lost on upgrade from Erlang/OTP 19 (or earlier) to Erlang/OTP 20" Jun 1, 2017
@hairyhum
Contributor Author

hairyhum commented Jun 1, 2017

In that case there is not much difference between 3.6 and 3.7, but limiting the fix to 3.7 still makes sense.

The problem is in this line: https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_queue_index.erl#L694
We call term_to_binary on a record that contains atoms and then compute an MD5 hash of the result.
OTP-20 can decode binaries encoded by OTP-19, but it doesn't encode them the same way. And because we only keep the MD5 hash, we cannot decode the original term in order to re-encode it.
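A minimal sketch of why the MD5 step makes this unrecoverable, assuming (as the legacy encoder later in this thread suggests) that the relevant difference is how atoms are tagged. It hand-builds two encodings of the atom queue: the Latin-1 ATOM_EXT form (tag 100) and the UTF-8 SMALL_ATOM_UTF8_EXT form (tag 119). The module name is hypothetical.

```erlang
-module(encoding_demo_sketch).
-export([compare/0]).

%% Two hand-built external-format encodings of the same atom, 'queue':
%%   <<131, 100, Len:16, Chars>>  version byte + ATOM_EXT (Latin-1, 2-byte length)
%%   <<131, 119, Len:8, Chars>>   version byte + SMALL_ATOM_UTF8_EXT (UTF-8)
compare() ->
    Latin1 = <<131, 100, 0, 5, "queue">>,
    Utf8 = <<131, 119, 5, "queue">>,
    %% Decoding accepts both forms and yields the same term...
    SameTerm = binary_to_term(Latin1) =:= binary_to_term(Utf8),
    %% ...but the bytes differ, so the MD5 digests differ as well, and a
    %% digest cannot be turned back into the term to re-encode it.
    SameDigest = erlang:md5(Latin1) =:= erlang:md5(Utf8),
    {SameTerm, SameDigest}.
```

Calling encoding_demo_sketch:compare() returns {true, false}: the same term, but two different hashes and therefore two different directory names.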
I can see 3 options for us:

  • Migrate directory names by generating the old binary format ourselves (on OTP-20)
  • Require OTP-19 to migrate them
  • Convince the Erlang core devs to bring back a backwards-compatibility option.

The third option can be hard due to the release candidate process, but it could make sense not only for us but for other Erlang users as well.
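The first option, migrating directory names ourselves, could look roughly like the following hypothetical sketch. It assumes a directory name is the MD5 digest of the encoded queue name rendered in base 36, and that the caller supplies both the legacy and the current encoding of each queue name the broker knows about. The module and function names are illustrative, not RabbitMQ's code.

```erlang
-module(qi_migrate_sketch).
-export([dir_name/1, plan/3, migrate/3]).

%% Assumed naming scheme: MD5 of the encoded name, rendered in base 36.
dir_name(Encoded) when is_binary(Encoded) ->
    <<Num:128>> = erlang:md5(Encoded),
    integer_to_list(Num, 36).

%% Pure step: where a queue's index directory lives under each encoding.
%% Returns 'unchanged' when both encodings hash to the same name.
plan(BaseDir, LegacyEncoded, CurrentEncoded) ->
    From = filename:join(BaseDir, dir_name(LegacyEncoded)),
    To = filename:join(BaseDir, dir_name(CurrentEncoded)),
    case From =:= To of
        true -> unchanged;
        false -> {From, To}
    end.

%% Side-effecting step: move the legacy directory, if present, so that it
%% is not treated as unknown (and deleted) after the upgrade.
migrate(BaseDir, LegacyEncoded, CurrentEncoded) ->
    case plan(BaseDir, LegacyEncoded, CurrentEncoded) of
        unchanged ->
            ok;
        {From, To} ->
            case filelib:is_dir(From) of
                true -> file:rename(From, To);
                false -> ok
            end
    end.
```

Keeping plan/3 free of side effects makes it possible to log or dry-run the renames before touching the data directory.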

@hairyhum
Contributor Author

hairyhum commented Jun 2, 2017

Discussion in the Erlang bug tracker: https://bugs.erlang.org/browse/ERL-431

@hairyhum
Contributor Author

hairyhum commented Jun 2, 2017

The OTP team does not want to add options to term_to_binary and recommends that we hand-craft the "old format" values.

@hairyhum
Contributor Author

hairyhum commented Jun 2, 2017

The critical parts are queue and vhost names. Several places across plugins also use term_to_binary, but they may be safe (needs verification):

  • rabbit_pbe
  • rabbitmq_clusterer
  • federation status
  • bindings management
  • stomp subscription_queue_name

@hairyhum
Contributor Author

hairyhum commented Jun 2, 2017

The STOMP and management cases are not node-local, which makes it impossible to apply the changes in 3.6, so it's definitely 3.7.

@michaelklishin
Member

If we can reproduce the old algorithm in our own function, why can't that go into 3.6.x?

@hairyhum
Contributor Author

hairyhum commented Jun 2, 2017

As a workaround for 3.6.x, we can reproduce the old term_to_binary. There are no breaking changes in MD5 itself; we just use MD5 to obfuscate the binaries.

@hairyhum
Contributor Author

hairyhum commented Jun 2, 2017

For queue names alone, the function should be relatively simple. Something like:

term_to_binary_legacy({resource, Vhost, queue, Name}) ->
    VLength = byte_size(Vhost),
    NLength = byte_size(Name),
    <<131,104,4,                              %% 4-element tuple
      100,0,8,114,101,115,111,117,114,99,101, %% 'resource' atom
      109,VLength:32,Vhost/binary,            %% VHost binary
      100,0,5,113,117,101,117,101,            %% 'queue' atom
      109,NLength:32,Name/binary>>.           %% Name binary
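The hard-coded atom bytes above can also be derived from the atoms themselves. A minimal sketch, assuming Latin-1 atoms and binary vhost and queue names; the module and function names are hypothetical, and accepting any resource kind (not only queue) is an illustrative generalization, not something proposed in the thread.

```erlang
-module(legacy_t2b_sketch).
-export([encode_resource/1]).

%% Pre-OTP-20 style ATOM_EXT: tag 100, 16-bit length, Latin-1 characters.
%% atom_to_binary/2 raises badarg for atoms outside Latin-1.
atom_ext(Atom) when is_atom(Atom) ->
    Chars = atom_to_binary(Atom, latin1),
    <<100, (byte_size(Chars)):16, Chars/binary>>.

%% BINARY_EXT: tag 109, 32-bit length, raw bytes.
binary_ext(Bin) when is_binary(Bin) ->
    <<109, (byte_size(Bin)):32, Bin/binary>>.

%% Hypothetical generalization: any {resource, VHost, Kind, Name} tuple.
encode_resource({resource, VHost, Kind, Name}) ->
    <<131, 104, 4,                    %% version byte, SMALL_TUPLE_EXT, arity 4
      (atom_ext(resource))/binary,
      (binary_ext(VHost))/binary,
      (atom_ext(Kind))/binary,
      (binary_ext(Name))/binary>>.
```

For {resource, <<"/">>, queue, <<"q1">>} this produces the same bytes as the hand-written function above, and binary_to_term/1 decodes the result back to the original tuple.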

@hairyhum
Contributor Author

Since erlang/otp@48e67f5 the old term_to_binary format has been brought back, so presumably we won't have problems during upgrades. This still needs investigation.

3.6.11 should now use term_to_binary/2 to generate binaries in the fixed, old format.
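A minimal sketch of pinning the encoding with term_to_binary/2. It assumes the pinned option is {minor_version, 1}, which encodes atoms that fit in Latin-1 in the Latin-1 form; whether 3.6.11 uses exactly this option, and the module and function names below, are assumptions.

```erlang
-module(pinned_t2b_sketch).
-export([encode/1, digest/1]).

%% Pin the external format instead of relying on the default of
%% term_to_binary/1; minor_version 1 keeps Latin-1 atoms in ATOM_EXT form.
encode(Term) ->
    term_to_binary(Term, [{minor_version, 1}]).

%% Hypothetical: the stable input from which a directory name is derived.
digest(Term) ->
    erlang:md5(encode(Term)).
```

Passing the option explicitly keeps directory names stable even if a later release changes the default of term_to_binary/1.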

@michaelklishin michaelklishin changed the title from "Data store lost on upgrade from Erlang/OTP 19 (or earlier) to Erlang/OTP 20" to "Data store can be lost on upgrade from Erlang/OTP 19 (or earlier) to Erlang/OTP 20" Jun 28, 2017
@michaelklishin
Member

Changed the title to be less alarming as OTP 20 GA ended up including a "mostly compatible" term_to_binary/1.

@michaelklishin michaelklishin changed the title from "Data store can be lost on upgrade from Erlang/OTP 19 (or earlier) to Erlang/OTP 20" to "Messages can be lost on upgrade from Erlang/OTP 19 (or earlier) to Erlang/OTP 20" Jun 28, 2017

2 participants