Thursday, 2022-09-01

amorin	felixhuettner[m]: I just came across the discussion	08:41
amorin	about the rabbit stuff you were talking about, the transient timeout default value is definitely an issue in my opinion	08:42
amorin	I can't find a good explanation of why it should be 30 minutes	08:42
amorin	and the fact that it's not deleted correctly on shutdown is a bug, so big +1 on your bug report if you do it	08:42
amorin	about the HA/durable stuff for reply queues	08:43
amorin	we had a mail thread about this like a year ago	08:43
amorin	the main reason on my side to NOT enable durable/HA on those transient queues is to reduce the load on the rabbit cluster	08:43
amorin	because, as soon as you enable durable, your load will increase a lot	08:44
amorin	s/durable/ha/	08:44
amorin	and there is not much benefit in replicating such messages in my opinion	08:45
amorin	if one gets lost, a new API call should be made	08:45
amorin	I accept that, but I agree it's not perfect	08:45
opendevreview	Merged openstack/large-scale master: Reformat "how to contribute" page https://review.opendev.org/c/openstack/large-scale/+/854419	08:48
tobias-urdin	join the effort to replace rabbit ;) it's election time in Sweden right now but I'd rather push an agenda for replacing rabbit than a political party	08:50
felixhuettner[m]	amorin: that sounds like a good point. Maybe it would make sense to make it configurable	08:59
felixhuettner[m]	our rabbit clusters now run a lot more smoothly after we changed their scheduler behaviour to use the Linux kernel scheduler	08:59
amorin	tobias-urdin :)	09:14
amorin	you are talking about moving to NATS, right? Are you running it in production?	09:15
amorin	felixhuettner[m]: nice! Will you share the tuning you've been doing on that part?	09:15
felixhuettner[m]	sure, basically it's starting rabbitmq with `+stbt u`	09:16
felixhuettner[m]	details can be found here: https://gitlab.com/yaook/operator/-/issues/405	09:17
felixhuettner[m]	but it should only have effect in two cases	09:18
felixhuettner[m]	1. you run multiple erlang processes on the same host	09:18
felixhuettner[m]	2. you run some service on the same host as the erlang process and you use cpu pinning for that	09:18
felixhuettner[m]	also the erlang docs in this section were quite helpful: https://www.erlang.org/doc/man/erl.html#+sbt	09:19
amorin	ok, thanks!	09:20
amorin	got it, you have multiple clusters running on the same hardware and they were all using the same subset of cores	09:20
felixhuettner[m]	yep, and the erlang scheduler tries to use the same cores on all of the processes	09:21
felixhuettner[m]	because they all prefer to run things on the lowest core number	09:21
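
The collision described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the function names are made up for illustration, "bound" means every VM pins scheduler i to core i, and "unbound" stands in for an idealised kernel scheduler that always picks the least-loaded core. Real Erlang and Linux scheduling is more involved.

```python
from collections import Counter


def bound_placement(num_vms, schedulers_per_vm):
    """Every VM binds scheduler i to core i, so all VMs pick the same low cores."""
    load = Counter()
    for _ in range(num_vms):
        for core in range(schedulers_per_vm):
            load[core] += 1
    return load


def unbound_placement(num_vms, schedulers_per_vm, num_cores):
    """Idealised kernel scheduler: each thread lands on the least-loaded core."""
    load = Counter({core: 0 for core in range(num_cores)})
    for _ in range(num_vms * schedulers_per_vm):
        # Break ties by lowest core number to keep the result deterministic.
        core = min(load, key=lambda c: (load[c], c))
        load[core] += 1
    return load


# Hypothetical host: 16 cores, 4 Erlang VMs with 4 scheduler threads each.
bound = bound_placement(num_vms=4, schedulers_per_vm=4)
unbound = unbound_placement(num_vms=4, schedulers_per_vm=4, num_cores=16)

print("busiest core, bound:  ", max(bound.values()))
print("busiest core, unbound:", max(unbound.values()))
```

In this model the bound case stacks all four VMs on cores 0 to 3 while the other twelve cores stay idle, which matches the symptom discussed here.
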
felixhuettner[m]	i'm not sure how much it changes things if you don't run multiple erlang processes on the same host	09:22
felixhuettner[m]	but i would be interested if it helps you	09:22
amorin	ack	09:22
amorin	I will read that and see if I will change something or not	09:23
amorin	we use dedicated hardware for our biggest clusters and it works quite well	09:23
amorin	so, for now, we don't need to change anything :)	09:23
felixhuettner[m]	:)	09:37
felixhuettner[m]	but i would add this to the recommendation about having potentially multiple clusters	09:37
felixhuettner[m]	then others don't need to search for months :D	09:37
tobias-urdin	amorin: yeah, no unfortunately not yet :(	09:43
amorin	ok	10:02
amorin	any of you already running rabbits with quorum in production?	10:02
amorin	I am really thinking about this solution	10:02
amorin	now that oslo.messaging is supposed to support it	10:03
amorin	but I never tried it	10:03
amorin	and yes, thank you if you can do a patchset with your doc, that would be amazing for people :)	10:04
felixhuettner[m]	we run it at the moment in our staging environment and it works quite well.	11:58
felixhuettner[m]	but we did not yet test it under real load	11:58
amorin	nice, I suppose the move from classic queues to quorum queues needs a massive restart of agents?	12:39
opendevreview	Felix Huettner proposed openstack/large-scale master: rabbitmq: add erlang scheduler recommendation https://review.opendev.org/c/openstack/large-scale/+/855508	12:54
opendevreview	Felix Huettner proposed openstack/large-scale master: rabbitmq: add erlang scheduler recommendation https://review.opendev.org/c/openstack/large-scale/+/855508	12:56
felixhuettner[m]	i guess so as well, we are actually not migrating an existing environment but rather using it for a new one	12:57
felixhuettner[m]	but we did something similar and migrated to durable queues on a running system	12:57
amorin	ack	12:58
amorin	when we had to migrate to durable, we had to shut down everything	12:58
felixhuettner[m]	you can do it online if you teach oslo.messaging/kombu not to care if the queue settings mismatch	12:59
felixhuettner[m]	https://gitlab.com/yaook/images/cinder/-/raw/fcd0967c3eed674903a987d99a45720ae6928b4f/files/patch_amqp_declare	12:59
felixhuettner[m]	we used this patch for that	12:59
felixhuettner[m]	so we needed to roll it out first on all systems	12:59
felixhuettner[m]	then change the settings on all systems	12:59
felixhuettner[m]	and then remove the patch again everywhere	12:59
felixhuettner[m]	but it was fully online	12:59
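
The ordering of those three steps can be sketched with a small model. Everything here is hypothetical and is not the linked patch: `Agent`, `can_declare` and the rollout functions are illustrative names, and the model assumes an agent fails whenever its requested durability differs from the existing queue unless it is marked tolerant. How the queue itself ends up durable is not described above, so the model simply checks both possibilities during the transition and assumes a durable queue at the end.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    durable: bool           # durability the agent requests when declaring the queue
    tolerant: bool = False  # True while the "ignore mismatch" patch is installed


def can_declare(agent, queue_durable):
    """An agent attaches if its settings match or it tolerates a mismatch."""
    return agent.tolerant or agent.durable == queue_durable


def fleet_ok(agents, queue_durable):
    return all(can_declare(a, queue_durable) for a in agents)


def naive_rollout(n):
    """Flip the setting on one agent with no patch: some agent always mismatches."""
    agents = [Agent(durable=False) for _ in range(n)]
    agents[0].durable = True
    return fleet_ok(agents, False) or fleet_ok(agents, True)


def patched_rollout(n):
    """Patch everywhere, change settings one by one, then remove the patch."""
    agents = [Agent(durable=False) for _ in range(n)]
    healthy = []
    for a in agents:            # step 1: roll out the patch on all systems
        a.tolerant = True
    for a in agents:            # step 2: change the settings agent by agent
        a.durable = True
        healthy.append(fleet_ok(agents, False) and fleet_ok(agents, True))
    for a in agents:            # step 3: remove the patch again
        a.tolerant = False
    healthy.append(fleet_ok(agents, True))  # assumes the queue is durable by now
    return all(healthy)


print("naive rollout stays healthy:  ", naive_rollout(3))
print("patched rollout stays healthy:", patched_rollout(3))
```

The point of the model is only that the patch has to be present on every system before the first setting changes, and can be removed only after the last one has changed.
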
amorin	oh nice!	13:00
amorin	I wasn't even aware that this was possible :(	13:00
amorin	anyway we took that as an opportunity to switch our rabbit cluster to new hardware and do other stuff	13:01
felixhuettner[m]	yep, we took a while to find that solution	13:04