amorin | felixhuettner[m]: I just came across the discussion | 08:41 |
---|---|---|
amorin | about the rabbit stuff you were talking about, the transient queue timeout default value is definitely an issue in my opinion | 08:42 |
amorin | I can't find a good explanation of why it should be 30 minutes | 08:42 |
amorin | and the fact that it's not deleted correctly on shutdown is a bug, so big +1 on your bug report if you file it | 08:42 |
amorin | about the HA/durable stuff for reply queues | 08:43 |
amorin | we had a mail thread about this like a year ago | 08:43 |
amorin | the main reason on my side NOT to enable durable/HA on those transient queues is to reduce the load on the rabbit cluster | 08:43 |
amorin | because, as soon as you enable durable, your load will increase a lot | 08:44 |
amorin | s/durable/ha/ | 08:44 |
amorin | and there is not much benefit in replicating such messages, in my opinion | 08:45 |
amorin | if one gets lost, a new API call has to be made | 08:45 |
amorin | I accept that, but I agree it's not perfect | 08:45 |
opendevreview | Merged openstack/large-scale master: Reformat "how to contribute" page https://review.opendev.org/c/openstack/large-scale/+/854419 | 08:48 |
tobias-urdin | join the effort to replace rabbit ;) it's election time in Sweden right now but I'd rather push an agenda for replacing rabbit than a political party | 08:50 |
felixhuettner[m] | amorin: that sounds like a good point. Maybe it would make sense to make it configurable | 08:59 |
felixhuettner[m] | our rabbit clusters now run a lot more smoothly after we changed their scheduler behaviour to leave CPU placement to the Linux kernel scheduler | 08:59 |
amorin | tobias-urdin :) | 09:14 |
amorin | you are talking about moving to NATS, right? Are you running it in production? | 09:15 |
amorin | felixhuettner[m]: nice! Will you share the tuning you've been doing on that part? | 09:15 |
felixhuettner[m] | sure, basically it's starting rabbitmq with `+stbt u` | 09:16 |
felixhuettner[m] | details can be found here: https://gitlab.com/yaook/operator/-/issues/405 | 09:17 |
felixhuettner[m] | but it should only have an effect in two cases | 09:18 |
felixhuettner[m] | 1. you run multiple erlang processes on the same host | 09:18 |
felixhuettner[m] | 2. you run some other service on the same host as the erlang process and use cpu pinning for that | 09:18 |
felixhuettner[m] | also the erlang docs in this section were quite helpful: https://www.erlang.org/doc/man/erl.html#+sbt | 09:19 |
amorin | ok, thanks! | 09:20 |
amorin | got it, you have multiple clusters running on the same hardware and they were all using the same subset of cores | 09:20 |
felixhuettner[m] | yep, and the erlang scheduler tries to use the same cores on all of the processes | 09:21 |
felixhuettner[m] | because they all prefer to run things on the lowest core number | 09:21 |
felixhuettner[m] | i'm not sure how much it changes things if you don't run multiple erlang processes on the same host | 09:22 |
felixhuettner[m] | but i would be interested if it helps you | 09:22 |
amorin | ack | 09:22 |
amorin | I will read that and see whether I change anything | 09:23 |
amorin | we use dedicated hardware for our biggest clusters and it works quite well | 09:23 |
amorin | so, for now, we don't need to change anything :) | 09:23 |
felixhuettner[m] | :) | 09:37 |
felixhuettner[m] | but i would add this to the recommendation about having potentially multiple clusters | 09:37 |
felixhuettner[m] | then others don't need to search for months :D | 09:37 |
tobias-urdin | amorin: yeah, no unfortunately not yet :( | 09:43 |
amorin | ok | 10:02 |
amorin | any of you already running rabbits with quorum in production? | 10:02 |
amorin | I am really thinking about this solution | 10:02 |
amorin | now that oslo.messaging is supposed to support it | 10:03 |
amorin | but I never tried it | 10:03 |
amorin | and yes, thank you in advance if you can push a patchset with your doc, that would be amazing for people :) | 10:04 |
felixhuettner[m] | we run it at the moment in our staging environment and it works quite well. | 11:58 |
felixhuettner[m] | but we did not yet test it under real load | 11:58 |
amorin | nice, I suppose the move from classic queues to quorum queues needs a massive restart of agents? | 12:39 |
opendevreview | Felix Huettner proposed openstack/large-scale master: rabbitmq: add erlang scheduler recommendation https://review.opendev.org/c/openstack/large-scale/+/855508 | 12:54 |
opendevreview | Felix Huettner proposed openstack/large-scale master: rabbitmq: add erlang scheduler recommendation https://review.opendev.org/c/openstack/large-scale/+/855508 | 12:56 |
felixhuettner[m] | i guess so as well, we are actually not migrating an existing environment but rather using it for a new one | 12:57 |
felixhuettner[m] | but we did something similar and migrated to durable queues on a running system | 12:57 |
amorin | ack | 12:58 |
amorin | when we had to migrate to durable, we had to shutdown everything | 12:58 |
felixhuettner[m] | you can do it online if you teach oslo.messaging/kombu not to care whether the queue settings mismatch | 12:59 |
felixhuettner[m] | https://gitlab.com/yaook/images/cinder/-/raw/fcd0967c3eed674903a987d99a45720ae6928b4f/files/patch_amqp_declare | 12:59 |
felixhuettner[m] | we used this patch for that | 12:59 |
felixhuettner[m] | so we needed to roll it out first on all systems | 12:59 |
felixhuettner[m] | then change the settings on all systems | 12:59 |
felixhuettner[m] | and then remove the patch again everywhere | 12:59 |
felixhuettner[m] | but it was fully online | 12:59 |
amorin | oh nice! | 13:00 |
amorin | I wasn't even aware that this was possible :( | 13:00 |
amorin | anyway, we took that as an opportunity to switch our rabbit cluster to new hardware and do other stuff | 13:01 |
felixhuettner[m] | yep, we took a while to find that solution | 13:04 |
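
The queue settings that come up in this log can be sketched in Python, the language of oslo.messaging and kombu. This is a minimal, hypothetical model under stated assumptions, not oslo.messaging or kombu code: `queue_spec`, `redeclare_ok` and `tolerant_redeclare` are made-up names. It assumes the 30-minute transient default is oslo.messaging's `rabbit_transient_queues_ttl` option (1800 seconds), which RabbitMQ receives as the `x-expires` queue argument in milliseconds, and it simplifies the broker's redeclare check to comparing only the durable flag and the arguments. `tolerant_redeclare` is just one plausible reading of the "do not care about mismatches" patch linked above; the real patch may work differently.

```python
"""Hypothetical model of RabbitMQ queue settings (not oslo.messaging/kombu code)."""
from typing import Any, Dict

QueueSpec = Dict[str, Any]


def queue_spec(transient: bool, quorum: bool = False,
               durable: bool = False, ttl_seconds: int = 1800) -> QueueSpec:
    """Return an illustrative declaration: durable flag plus x-arguments."""
    arguments: Dict[str, Any] = {}
    if transient:
        # Reply/fanout queues: the broker deletes a queue that stays unused
        # for x-expires milliseconds; 1800 s is 30 minutes.
        arguments["x-expires"] = ttl_seconds * 1000
    if quorum:
        # Quorum queues are always durable and are selected via x-queue-type.
        durable = True
        arguments["x-queue-type"] = "quorum"
    return {"durable": durable, "arguments": arguments}


def redeclare_ok(existing: QueueSpec, requested: QueueSpec) -> bool:
    """Simplified equivalence check done when a queue is declared again.

    A real broker answers a mismatching declare with PRECONDITION_FAILED
    (406); this model only compares the durable flag and the arguments.
    """
    return existing == requested


def tolerant_redeclare(existing: QueueSpec, requested: QueueSpec) -> QueueSpec:
    """Hypothetical mismatch-tolerant client: keep using the queue that
    already exists instead of failing on the 406."""
    return requested if redeclare_ok(existing, requested) else existing


if __name__ == "__main__":
    classic = queue_spec(transient=False, durable=True)
    quorum = queue_spec(transient=False, quorum=True)
    print(queue_spec(transient=True)["arguments"])          # {'x-expires': 1800000}
    print(redeclare_ok(classic, quorum))                    # False (a 406 on a real broker)
    print(tolerant_redeclare(classic, quorum) is classic)   # True
```

In this model, switching an existing classic queue to `x-queue-type: quorum` fails the equivalence check. That matches why changing queue settings on a live system either needs the queues recreated (for example during a full shutdown) or, as described in the log, a client that tolerates the mismatch while the new settings are rolled out.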