Thursday, 2022-09-01

<amorin> felixhuettner[m]: I just came across the discussion [08:41]
<amorin> about the rabbit stuff you were talking about, the transient timeout default value is definitely an issue in my opinion [08:42]
<amorin> I can't find a good explanation of why it should be 30 minutes [08:42]
<amorin> and the fact that it's not deleted correctly on shutdown is a bug, so big +1 on your bug report if you do it [08:42]
<amorin> about the HA/durable stuff for reply queues [08:43]
<amorin> we had a mail thread about this like a year ago [08:43]
<amorin> the main reason on my side to NOT enable durable/HA on those transient queues is to reduce the load on the rabbit cluster [08:43]
<amorin> because, as soon as you enable durable, your load will increase a lot [08:44]
<amorin> s/durable/ha/ [08:44]
<amorin> and there is not that much benefit in replicating such messages, in my opinion [08:45]
<amorin> if it gets lost, a new API call should be done [08:45]
<amorin> I accept that, but I agree it's not perfect [08:45]
<opendevreview> Merged openstack/large-scale master: Reformat "how to contribute" page  https://review.opendev.org/c/openstack/large-scale/+/854419 [08:48]
<tobias-urdin> join the effort to replace rabbit ;) it's election time in Sweden right now but I'd rather push an agenda for replacing rabbit than a political party [08:50]
<felixhuettner[m]> amorin: that sounds like a good point. Maybe it would make sense to make it configurable [08:59]
<felixhuettner[m]> our rabbit clusters now run a lot smoother after we changed their scheduler behaviour to use the linux kernel scheduler [08:59]
<amorin> tobias-urdin :) [09:14]
<amorin> you are talking about moving to NATS, right? Are you running it in production? [09:15]
<amorin> felixhuettner[m]: nice! Will you share the tuning you've been doing on that part? [09:15]
<felixhuettner[m]> sure, basically it's starting rabbitmq with `+stbt u` [09:16]
<felixhuettner[m]> details can be found here: https://gitlab.com/yaook/operator/-/issues/405 [09:17]
<felixhuettner[m]> but it should only have effect in two cases [09:18]
<felixhuettner[m]> 1. you run multiple erlang processes on the same host [09:18]
<felixhuettner[m]> 2. you run some service on the same host as the erlang process and you use cpu pinning for that [09:18]
<felixhuettner[m]> also the erlang docs in this section were quite helpful: https://www.erlang.org/doc/man/erl.html#+sbt [09:19]
<amorin> ok, thanks! [09:20]
<amorin> got it, you have multiple clusters running on the same hardware and they were all using the same subset of cores [09:20]
<felixhuettner[m]> yep, and the erlang scheduler tries to use the same cores on all of the processes [09:21]
<felixhuettner[m]> because they all prefer to run things on the lowest core number [09:21]
<felixhuettner[m]> i'm not sure how much it changes things if you don't run multiple erlang processes on the same host [09:22]
<felixhuettner[m]> but i would be interested if it helps you [09:22]
<amorin> ack [09:22]
<amorin> I will read that and see if I will change something or not [09:23]
<amorin> we use dedicated hardware for our biggest clusters and it works quite well [09:23]
<amorin> so, for now, we don't need to change anything :) [09:23]
<felixhuettner[m]> :) [09:37]
<felixhuettner[m]> but i would add this to the recommendation about having potentially multiple clusters [09:37]
<felixhuettner[m]> then others don't need to search for months :D [09:37]
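
A minimal sketch of how the scheduler setting discussed above might be applied. It assumes the broker is started from a shell environment and uses RabbitMQ's `RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS` variable to pass extra Erlang VM flags; neither detail is stated in the conversation, so check the linked yaook issue for the setup actually used. The Erlang emulator flag is written `+stbt u` (try to set the scheduler bind type to unbound).

```shell
# Assumption: extra Erlang VM flags are handed to RabbitMQ through this
# environment variable before the broker starts.
# "+stbt u" asks the Erlang VM to leave its scheduler threads unbound, so the
# Linux kernel scheduler decides which core each thread runs on.
export RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+stbt u"

# Print the value so it can be checked before starting the broker.
echo "extra erl args: ${RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS}"
```

As noted in the discussion, this is only expected to matter when several Erlang processes share a host, or when another service on the same host uses CPU pinning.
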
<tobias-urdin> amorin: yeah, no unfortunately not yet :( [09:43]
<amorin> ok [10:02]
<amorin> any of you already running rabbits with quorum in production? [10:02]
<amorin> I am really thinking about this solution [10:02]
<amorin> now that oslo.messaging is supposed to support it [10:03]
<amorin> but I never tried it [10:03]
<amorin> and yes, thank you if you can do a patchset with your doc, that would be amazing for people :) [10:04]
<felixhuettner[m]> we run it at the moment in our staging environment and it works quite well. [11:58]
<felixhuettner[m]> but we did not yet test it under real load [11:58]
<amorin> nice, I suppose the move from classic queues to quorum queues needs a massive restart of agents? [12:39]
<opendevreview> Felix Huettner proposed openstack/large-scale master: rabbitmq: add erlang scheduler recommendation  https://review.opendev.org/c/openstack/large-scale/+/855508 [12:54]
<opendevreview> Felix Huettner proposed openstack/large-scale master: rabbitmq: add erlang scheduler recommendation  https://review.opendev.org/c/openstack/large-scale/+/855508 [12:56]
<felixhuettner[m]> i guess so as well, we are actually not migrating an existing environment but rather using it for a new one [12:57]
<felixhuettner[m]> but we did something similar and migrated to durable queues on a running system [12:57]
<amorin> ack [12:58]
<amorin> when we had to migrate to durable, we had to shut down everything [12:58]
<felixhuettner[m]> you can do it online if you teach oslo.messaging/kombu to not care if the queue settings mismatch [12:59]
<felixhuettner[m]> https://gitlab.com/yaook/images/cinder/-/raw/fcd0967c3eed674903a987d99a45720ae6928b4f/files/patch_amqp_declare [12:59]
<felixhuettner[m]> we used this patch for that [12:59]
<felixhuettner[m]> so we needed to roll it out first on all systems [12:59]
<felixhuettner[m]> then change the settings on all systems [12:59]
<felixhuettner[m]> and then remove the patch again everywhere [12:59]
<felixhuettner[m]> but it was fully online [12:59]
<amorin> oh nice! [13:00]
<amorin> I wasn't even aware that this was possible :( [13:00]
<amorin> anyway we took that as an opportunity to switch our rabbit cluster to new hardware and do other stuff [13:01]
<felixhuettner[m]> yep, we took a while to find that solution [13:04]
