Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial RabbitMQ 4.0 support (Mnesia only) #291

Merged
merged 3 commits into from
Oct 1, 2024

Conversation

gomoripeti
Copy link
Contributor

Proposed Changes

Minimal changes for a plugin version that can run with RabbitMQ 4.0 (with Mnesia metadata store)

Related to #290

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

If this is a relatively large or complex change, kick off the discussion by
explaining why you chose the solution you did and what alternatives you
considered, etc.

Copied from rabbitmq-server @ 19af2f649c aka v4.0.2
@michaelklishin
Copy link
Member

@gomoripeti can you please tweak the wording to say something like

This plugin currently only supports Mnesia for metadata store (do not use it with Khepri).

I don't see why this plugin cannot start Mnesia and create its tables even when Khepri is used in general, so we should explain that this is the current state of things, not a fundamental incompatibility.

@gomoripeti
Copy link
Contributor Author

sure I will update the text. I also meant to write what you say that this is the current state only

(the CI tests fail with the sam bazel reason as previous nightly tests, unfortunately I don't know bazel so don't know how to address)

@michaelklishin
Copy link
Member

@gomoripeti this repo hasn't migrated off of BuildBuddy but that's not super relevant. CI will move back to Make fairly soon, including for RabbitMQ itself, for reasons that have nothing to do with Bazel or its capabilities.

@michaelklishin
Copy link
Member

@gomoripeti the conclusion on our team is that this plugin should switch to schema API modules such as rabbit_db_exchange and rabbit_db_queue.

Can you please look into that before we merge this PR?

The implementation is taken from `rabbit_exchange_type_consistent_hash`
@gomoripeti
Copy link
Contributor Author

I updated the functions used by recover/0 with implementation taken from rabbit_exchange_type_consistent_hash.

However I don't fully understand the reason for this recovery. As far as I can trace, rabbit_binding:recover_semi_durable_route already calls the add_binding callback of the exchange type. So I don't see in which case there is a need for this additional recovery.
(What I tested: commented out the recover function and did a restart app on a single node cluster with bindings between a durable delayed exchange with x-delayed-type = x-consistent-hash and 2 durable queues. The hash ring was correctly restored.)

@michaelklishin
Copy link
Member

@gomoripeti rabbit_exchange_type_consistent_hash is very sensitive to ring state changes, and it can be recovered concurrently with things like definition import.

@michaelklishin michaelklishin changed the title RabbitMQ 4.0 support RabbitMQ 4.0 support (Mnesia only) Oct 1, 2024
@michaelklishin
Copy link
Member

With this change, the plugin works against 4.0 with Mnesia and that's fine to merge this part alone.

However, as soon as I enable Khepri, routing specifically fails with an exception:

2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0> Error on AMQP connection <0.4169.0> (127.0.0.1:63451 -> 127.0.0.1:5672, vhost: '/', user: 'guest', state: running), channel 2:
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>  {{{aborted,{no_exists,rabbit_delayed_messagerabbit@sunnyside_index}},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>   {gen_server,call,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>       [rabbit_delayed_message,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>        {delay_message,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>            {exchange,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                {resource,<<"/">>,exchange,<<"dlx.1">>},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                'x-delayed-message',true,false,false,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                [{<<"x-delayed-type">>,longstr,<<"fanout">>}],
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                undefined,undefined,undefined,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                {[],[]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                #{user => <<"guest">>}},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>            {mc,mc_amqpl,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                {content,60,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                    {'P_basic',<<"application/octet-stream">>,undefined,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                        [{<<"x-delay">>,long,1000}],
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                        2,0,undefined,undefined,undefined,undefined,undefined,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                        undefined,undefined,undefined,undefined},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                    <<184,0,24,97,112,112,108,105,99,97,116,105,111,110,47,111,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                      99,116,101,116,45,115,116,114,101,97,109,0,0,0,17,7,120,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                      45,100,101,108,97,121,108,0,0,0,0,0,0,3,232,2,0>>,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                    rabbit_framing_amqp_0_9_1,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                    [<<"1727741164">>]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>                #{p => 0,x => <<"dlx.1">>,rk => [<<>>],rts => 1727741164440}},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>            1000},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>        infinity]}},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>  [{gen_server,call,3,[{file,"gen_server.erl"},{line,419}]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>   {rabbit_exchange_type_delayed_message,route,3,
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>       [{file,"rabbit_exchange_type_delayed_message.erl"},{line,46}]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>   {rabbit_exchange,route1,4,[{file,"rabbit_exchange.erl"},{line,396}]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>   {rabbit_exchange,route,3,[{file,"rabbit_exchange.erl"},{line,376}]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>   {rabbit_channel,handle_method,3,[{file,"rabbit_channel.erl"},{line,1212}]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>   {rabbit_channel,handle_cast,2,[{file,"rabbit_channel.erl"},{line,629}]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>   {gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1056}]},
2024-09-30 20:06:04.448162-04:00 [error] <0.4169.0>   {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,251}]}]}
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>     supervisor: {<0.4183.0>,rabbit_channel_sup}
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>     errorContext: child_terminated
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>     reason: {{{aborted,
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                   {no_exists,rabbit_delayed_messagerabbit@sunnyside_index}},
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>               {gen_server,call,
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                   [rabbit_delayed_message,
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                    {delay_message,
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                        {exchange,
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                            {resource,<<"/">>,exchange,<<"dlx.1">>},
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                            'x-delayed-message',true,false,false,
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                            [{<<"x-delayed-type">>,longstr,<<"fanout">>}],
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                            undefined,undefined,undefined,
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                            {[],[]},
2024-09-30 20:06:04.447946-04:00 [error] <0.4183.0>                            #{user => <<"guest">>}},

@michaelklishin michaelklishin changed the title RabbitMQ 4.0 support (Mnesia only) Initial RabbitMQ 4.0 support (Mnesia only) Oct 1, 2024
@michaelklishin michaelklishin merged commit 1187327 into rabbitmq:main Oct 1, 2024
3 of 4 checks passed
@michaelklishin
Copy link
Member

@gomoripeti I'd like to see Khepri support land before cutting a new release, at least unless we discover something really difficult to address.

In the exception above, it's the table that does not exist. The plugin must create it (ensure it exists) on boot. It's a reasonable requirement to restart the nodes or disable and re-enable the plugin after enabling Khepri, or not enable it before moving to Khepri.

@gomoripeti
Copy link
Contributor Author

EDIT: after re-reading your message I see you do mention the reasonable requirements.

I think there are two issues to address related to khepri (and I intentionally reduced the scope of this PR to when khepri_db feature flag is disabled)

  1. on a fresh 4.0 RabbitMQ enable khepri_db and then enable this plugin: the delayed plugin will try to create its Mnesia tables but Mnesia is not running, so it needs to start and initialise Mnesia itself (this is the smaller change)

  2. on a 4.0 RabbitMQ (maybe upgraded from earlier version) this plugin is enabled then khepri_db feature flag is enabled: the post-migration step deletes all Mnesia tables including those of this plugin and stops Mnesia. The plugin is unaware. I reported this here Allow Khepri post-migration step to not delete certain files or dirs rabbitmq-server#11304 (this will need change in RabbitMQ as well. The fact that Mnesia (unlike khepri) does not support running multiple DBs in the same node does not help either)

I can imagine a limited solution (only addressing 1.) as a next step where it is documented that one needs to disable this plugin before khepri migration and enable it after with delayed messages lost. As you mention this is acceptable I will shortly submit a followup PR for this

@michaelklishin
Copy link
Member

@gomoripeti we can add a list of table-exceptions that won't be removed by the Khepri migration function but at least for now, re-enabling the plugin or a cluster restart sounds acceptable to me.

@dumbbell
Copy link
Member

dumbbell commented Oct 1, 2024

The list of Mnesia tables to migrate is explicit and exhaustive; see the rabbit_mnesia_tables_to_khepri_db module attributes. rabbit_khepri isn’t supposed to migrate all existing tables.

However, after the migration, Mnesia is stopped and its files are deleted in the case the node rejoins a Mnesia-based cluster. We would need to figure out a way to reset just enough of Mnesia to not interfere with other users of Mnesia.

@michaelklishin
Copy link
Member

@dumbbell fair enough. Given the single-node nature of this plugin and the fact that the future of this feature is in the commercial edition (read: not this plugin), the above upgrade recommendations (which are/will be part of #292) sounds reasonable and pragmatic.

@michaelklishin michaelklishin added this to the 4.0.2 milestone Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants