Saturday, 2021-08-07

<opendevreview> Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 03:34
<opendevreview> Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 03:37
<opendevreview> Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 05:22
<depasquale> ciao everyone. I am facing this error executing a stable/wallaby OSA https://paste.opendev.org/show/807941/ 15:15
<depasquale> it is something about os-ceilometer.yml playbook 15:15
<depasquale> any idea? 15:15
<admin1> never seen this error .. my galera install on 22.2.0 fails on "haproxy_endpoints : Set haproxy service state" with a FileNotFound error .. logs here https://gist.githubusercontent.com/a1git/9895beefd1c8680b7dea311781fa1637/raw/ebe44836b87e74e544b51a8bc18ed55a025f12a4/gistfile1.txt 16:59
<admin1> i have done a couple of 22.2.0 installs .. but this one is new to me, and i have no idea how to solve this or what even is wrong 16:59
<admin1> any help .. pointers appreciated .. 17:00
<admin1> rest all playbooks of setup-host and setup-infra ran just fine 17:00
<admin1> i have deleted and re-created the galera containers a few times .. no help 19:53
<jrosser> depasquale: you need this fix on the ceilometer ansible role https://github.com/openstack/openstack-ansible-os_ceilometer/commit/87fdc3a17a211a3f896dc20c6090021bfa5c10ef 20:36
<jrosser> admin1: you have to look at the information it gives you about the stack trace in the error message 20:47
<jrosser> that points to here https://github.com/ansible-collections/community.general/blob/main/plugins/modules/net_tools/haproxy.py#L264 20:48
<jrosser> so you can see that it’s failed when trying to open the haproxy unix socket inside the haproxy ansible module 20:48
<jrosser> that suggests that the socket isn’t there, which in turn would make me think haproxy has failed to start properly 20:49
<jrosser> so start with the haproxy journal 20:49
<jrosser> I don’t really think it’s anything to do with galera at all 20:50
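The diagnosis above comes down to whether the haproxy admin socket can be opened at all. A minimal Python sketch of that kind of check follows. It assumes the caller passes in the stats socket path; the path and the function name `haproxy_command` are illustrative and not taken from the Ansible module. It sends one runtime command such as `show info` and returns the reply. If the socket file is missing, `connect` raises `FileNotFoundError`, the same error class admin1 reported.

```python
import socket


def haproxy_command(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Send one command to a haproxy-style admin socket and return the reply.

    socket_path is an assumption supplied by the caller; use the path from
    the 'stats socket' line of the local haproxy configuration.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        # Raises FileNotFoundError if nothing exists at socket_path.
        client.connect(socket_path)
        client.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = client.recv(4096)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

Running this on each haproxy node with that node's configured socket path shows quickly which node has no usable socket.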
<admin1> jrosser, thanks for replying this late 20:56
<admin1> i mean on a weekend 20:56
<admin1> the haproxy playbooks run well, haproxy seems to be working 20:56
<admin1> wouldn't the playbooks fix this ? 20:57
<jrosser> a bunch of the other roles interact with haproxy, setting backends as active/not active as needed 20:57
<jrosser> that’s what’s failing 20:57
<admin1> ok 20:57
<jrosser> in this case it’s in the galera role, but the failed tasks are delegated to the haproxy node 20:58
<jrosser> it needs to be able to access the socket to communicate with it 20:58
<jrosser> and I am thinking that is what’s giving you the “file not found” error 20:58
<jrosser> so either haproxy is broken, or the path to the socket is incorrect for some reason 20:59
<admin1> the socket will be in the active haproxy server only right ? 20:59
<jrosser> no I think they all have one 20:59
<jrosser> haproxy itself doesn’t know if it’s the active one or not 21:00
<admin1> oh yes .. its in all 21:00
<admin1> hmm.. my external LB where cloud.domain.com points to , is only active in 1 node, 21:01
<admin1> so in that case, haproxy is only active in 1 node 21:01
<admin1> in the rest, it does not bind 21:01
<admin1> coz the ip does not exist 21:01
<admin1> i see .. 21:02
<admin1> its up in one and down in another one ( both non master ) 21:02
<admin1> i mean in the servers where the bind ip is not present by keepalive 21:02
<admin1> ok .. so in my case, the external bind is on c2 . and in c3 the haproxy is running fine, but it was down on c1 .. and i think the playbook tried to use the one in c1 .. 21:04
<admin1> one more question .. why does this come "galera_server : Fail if galera_cluster_name doesnt match provided value" -- when we re-run galera role .. with zero changes ? 21:04
<jrosser> it’s not really to do with the ip binding 21:05
<admin1> i understood that now 21:05
<jrosser> it’s a unix domain socket (looks like a file) 21:05
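Since a unix domain socket is a filesystem entry, its state can be inspected without any reference to IP addresses. The following Python sketch separates the cases discussed here: nothing at the path, a path that is not a socket, a stale socket file with no listener, and a socket that accepts connections. The function name `classify_socket_path` and the returned labels are made up for illustration.

```python
import os
import socket
import stat


def classify_socket_path(path: str) -> str:
    """Describe what, if anything, is listening at a unix socket path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # Nothing at this path: the daemon never created the socket,
        # or the path being checked is the wrong one.
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        try:
            client.connect(path)
        except ConnectionRefusedError:
            # The socket file exists but no process is listening on it.
            return "no-listener"
    return "listening"
```

A result of "missing" or "no-listener" on one node would fit the case where haproxy was down on that node even though the playbooks had run without error elsewhere.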
<jrosser> I think your deleting the galera containers and recreating has confused the state 21:06
<jrosser> there is data written to the nodes I think which says if the cluster is bootstrapped - that will have been done for the first deployment 21:07
<jrosser> but if you delete them all then the state is wrong, the expectation in a cloud is that once bootstrapped you try to keep the cluster valid 21:08
<jrosser> there are some vars in the galera role to force a re-bootstrap which might help 21:09
<admin1> my thought was deleting all the containers and redoing it again was equivalent to doing a fresh install 21:10
<admin1> i mean all the "galera" containers 21:10
<jrosser> it may be 21:11
<jrosser> the bootstrap flag is a fact 21:11
<jrosser> which may get cached..... blah blah blah 21:11
<admin1> this time i rm -rf the facts :) 21:11
<admin1> before running the galera playbook 21:11
<admin1> hopefully this works this time 21:11
<jrosser> fingers crossed - good luck :) 21:12
<admin1> rest of the infra playbooks ran fine .. was only stuck on this 21:12
<admin1> it passed the "RUNNING HANDLER [haproxy_endpoints : Set haproxy service state]" step .. and no errors :) 21:14
<admin1> thank you jrosser .. you rock \o/ 21:14
