Friday, 2022-06-24

06:50 <felixhuettner[m]> stalking is always nice :)
06:50 <felixhuettner[m]> we run individual RabbitMQ clusters for each OpenStack service
06:50 <felixhuettner[m]> we are honestly not sure what is causing all the messages
06:51 <felixhuettner[m]> but we mostly see issues when we restart the rabbit nodes one by one
06:51 <felixhuettner[m]> the setup is a little "non-standard": we run rabbit in k8s and use TLS for all connections
06:52 <felixhuettner[m]> the replication policy is set according to the SIG recommendation
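
As a rough sketch of what such a mirroring policy looks like when applied with rabbitmqctl (the policy name, queue pattern, and parameter values here are illustrative placeholders, not necessarily the values the Large Scale SIG recommends):

    # illustrative only: mirror every non-amq.* queue to 2 nodes and sync new mirrors automatically
    rabbitmqctl set_policy --apply-to queues ha-two '^(?!amq\.).*' \
        '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}'
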
06:53 <felixhuettner[m]> but we try to do a rolling restart each week
06:53 <felixhuettner[m]> and during this time our nova and neutron rabbit clusters break (when we restart the last of the 3 nodes)
06:54 <felixhuettner[m]> the patches you found have definitely improved the behaviour (the nova cluster now only dies half the time)
06:54 <felixhuettner[m]> but a colleague of mine is now working on that to get this to a better state
07:07 <amorin> ok!
07:08 <amorin> we noticed on our side that each time we restart a node, the connections from agents (neutron mostly) are dispatched to the other nodes, so the load is split across 2 nodes instead of 2
07:08 <amorin> *instead of 3
07:08 <amorin> and the load is never dispatched again over the 3 nodes in an automated way
07:09 <amorin> the only way to perform this is to restart the agents
07:09 <felixhuettner[m]> oooh, that is also interesting
07:09 <amorin> so if for some reason 2 nodes out of 3 are down, the last node will handle all the load
07:10 <amorin> and it will stay like that forever
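
One possible way to see how the agent connections are spread over the cluster is the RabbitMQ management API, whose connection objects include the node they terminate on; the host, port, and credentials below are placeholders for the deployment's own values:

    # count client connections per cluster node (placeholder host and credentials)
    curl -s -u admin:secret http://rabbit.example.org:15672/api/connections \
        | jq -r '.[].node' | sort | uniq -c
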
07:10 <felixhuettner[m]> the issue we see is with the failover of mirrored queues (that might also fit your issue)
07:10 <felixhuettner[m]> it seems like, when failing over, these queues select the next master based on the "oldest" node
07:10 <amorin> have you tried the quorum queues?
07:10 <felixhuettner[m]> so if you have restarted 2 out of 3 nodes then the last node will have all the mirrored queues, since it is the oldest
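
Depending on the RabbitMQ version in use (3.8 and later), queue masters/leaders can also be redistributed across the running nodes without restarting the clients, roughly like this; whether that addresses the failover pile-up described above would need to be verified in the affected clusters:

    # redistribute queue leaders across the currently running cluster nodes
    rabbitmq-queues rebalance all
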
07:11 <felixhuettner[m]> not yet
07:11 <felixhuettner[m]> we are still a little stuck on queens :D
07:11 <felixhuettner[m]> do you have any experience there?
07:11 <amorin> not yet either :(
07:12 <amorin> but we are running our own patched version of oslo.messaging, so we might be able to upgrade it to a later version that enables quorum queues
07:12 <amorin> that's something we will try for sure
07:12 <felixhuettner[m]> we are currently planning a new cluster on yoga and will use quorum queues there. So maybe I can share something there in a few months
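
For reference, recent oslo.messaging releases (around the Yoga timeframe) expose a rabbit driver option for quorum queues; the snippet below is a sketch, and the option name and minimum version should be checked against the oslo.messaging release notes for the deployed release:

    [oslo_messaging_rabbit]
    # switch the RPC/notification queues to quorum queues (assumed option name)
    rabbit_quorum_queue = true
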
07:12 <amorin> that would be amazing!
07:42 <opendevreview> Ramona Rautenberg proposed openstack/large-scale master: Transfer configure page to rst format  https://review.opendev.org/c/openstack/large-scale/+/847535
08:03 <opendevreview> Ramona Rautenberg proposed openstack/large-scale master: Transfer configure page to rst format  https://review.opendev.org/c/openstack/large-scale/+/847535
08:42 <opendevreview> Ramona Rautenberg proposed openstack/large-scale master: Transfer configure page to rst format  https://review.opendev.org/c/openstack/large-scale/+/847535
10:14 <opendevreview> Ramona Rautenberg proposed openstack/large-scale master: Transfer configure page to rst format  https://review.opendev.org/c/openstack/large-scale/+/847535
15:43 <opendevreview> Merged openstack/large-scale master: Transfer configure page to rst format  https://review.opendev.org/c/openstack/large-scale/+/847535
