open-noob | Hi all - noob question - I am trying to install openstack on my ubuntu vm with 16384 MB (16 GB) RAM and a 150GB disk, using openstack-ansible 2024.2. I keep getting this bunch of notifications: haproxy[152240]: backend repo_all-back has no server available! Broadcast message from systemd-journald@aio1 (Tue 2025-04-08 18:01:31 EDT): haproxy[155832]: backend galera-back has no server available! | 00:04 |
---|---|---|
open-noob | What could the problem be? I'm using only the configuration generated by scripts/aio.sh | 00:04 |
f0o | Quite a lot of traffic here recently :) | 06:13 |
jrosser | it’s good! has been really quiet for a while | 06:26 |
f0o | jrosser: can you have a look at https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 ? | 06:32 |
f0o | all other sysctl changes made it but this one seems stuck. Should I rebase? | 06:33 |
f0o | I looked at the Zuul failures but they seem entirely unrelated to my change | 06:35 |
f0o | (Same for my recent errorfiles changes) | 06:35 |
darkhackernc | https://paste.centos.org/view/05bb9025 | 06:53 |
darkhackernc | rabbitmq is broken and no users in the rabbitmqctl list_users | 06:54 |
darkhackernc | anyone can guide/help for the fix please | 06:54 |
darkhackernc | its a live env | 06:54 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-haproxy_server master: Remove extra whitespace delimiter to satisfy ansible-lint https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939606 | 07:12 |
jrosser | f0o: i think that this needs a manual rebase https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601/6 | 07:13 |
f0o | Can do! | 07:13 |
jrosser | can you try that? its not sufficiently trivial that gerrit will do it for you in the UI | 07:13 |
jrosser | cool, thanks | 07:13 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable. Defaults to /etc/sysctl.conf to retain current behavior. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 | 07:14 |
f0o | I think the rebase was needed because `yes` became `true` somewhere | 07:15 |
jrosser | ah right indeed we have done a round of getting ansible-lint happy | 07:16 |
f0o | makes sense :) | 07:16 |
jrosser | darkhackernc: is yours a 'metal' deployment, without LXC containers for rabbitmq? | 07:16 |
jrosser | its slightly surprising that you run rabbitmqctl on the controller host | 07:17 |
darkhackernc | jrosser, metal | 07:18 |
darkhackernc | not lxc | 07:18 |
noonedeadpunk | f0o: I commented on it | 07:27 |
noonedeadpunk | I wonder when rocky will publish relevant ovn/ovs | 07:28 |
noonedeadpunk | ah, ovs is there... not ovn... | 07:30 |
noonedeadpunk | NeilHanlon: ^ :) https://zuul.opendev.org/t/openstack/build/a47da36e1e4c42958940aaa7da0d92b0 | 07:30 |
f0o | noonedeadpunk: Oh release note... I will have to read up on that | 07:31 |
noonedeadpunk | it's kinda `pip install reno; reno new haproxy_sysctl_location` | 07:32 |
f0o | it generated a boilerplate releasenotes/notes/haproxy_sysctl_location-e18310fd96597a6f.yaml but I wonder what the contents should be like, I guess everything but `features:` goes away? | 07:38 |
f0o | sorry for the silly questions :) | 07:39 |
jrosser | yes, delete everything that's not relevant | 07:40 |
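For illustration, a features-only reno note for this change might look roughly like the following; the wording is an assumption based on the commit message above, and the hash in the filename is simply whatever `reno new` generated:

```yaml
# releasenotes/notes/haproxy_sysctl_location-e18310fd96597a6f.yaml
features:
  - |
    The path of the sysctl configuration file used by the haproxy_server
    role can now be overridden. It defaults to ``/etc/sysctl.conf`` to
    retain the previous behaviour.
```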
f0o | https://paste.opendev.org/show/biPnMS84QhwVGLHlLxmw/ is this enough or is it too brief? | 07:46 |
noonedeadpunk | f0o: it's fine, just drop prelude please | 08:09 |
f0o | okidoki | 08:09 |
noonedeadpunk | as it's gonna render as release highlight more or less | 08:09 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable. Defaults to /etc/sysctl.conf to retain current behavior. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 | 08:09 |
noonedeadpunk | like first thing for the release: https://docs.openstack.org/releasenotes/openstack-ansible/2024.2.html#prelude | 08:10 |
f0o | aaah | 08:10 |
darkhackernc | thanks noonedeadpunk jrosser for the help | 08:19 |
darkhackernc | cluster is good now | 08:20 |
darkhackernc | how about adding this to rabbitmq | 08:45 |
darkhackernc | RABBITMQ_SERVER_ERL_ARGS="+K true +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<15000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<15000:64/native>>}]" | 08:45 |
darkhackernc | will this enhance rabbitmq capability | 08:45 |
noonedeadpunk | darkhackernc: I'd suggest using `rabbitmq_erlang_extra_args` variable to set this up | 09:03 |
noonedeadpunk | it will propagate into the template: https://opendev.org/openstack/openstack-ansible-rabbitmq_server/src/branch/master/templates/rabbitmq-env.j2#L10-L12 | 09:04 |
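As a rough sketch of how that could be wired up in user_variables, assuming `rabbitmq_erlang_extra_args` takes a plain string (check the rabbitmq-env.j2 template linked above for how exactly it is rendered); the Erlang flag values are copied verbatim from the question:

```yaml
# /etc/openstack_deploy/user_variables.yml -- a sketch, not verified against a live deployment
rabbitmq_erlang_extra_args: >-
  +K true +P 1048576
  -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<15000:64/native>>}]
  -kernel inet_default_listen_options [{raw,6,18,<<15000:64/native>>}]
```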
noonedeadpunk | about the values - really no idea what they are doing | 09:04 |
darkhackernc | noonedeadpunk++ | 09:04 |
noonedeadpunk | eventually HA queues create a huge load on the cluster | 09:06 |
noonedeadpunk | I think it was caracal, where we did upgrade of rabbitmq and oslo.messaging got modern enough to support streams and quorum queues | 09:07 |
noonedeadpunk | which should improve rabbit capacity indeed | 09:08 |
noonedeadpunk | disabling ha queues can be an option as well | 09:08 |
noonedeadpunk | I wonder if we should set rocky to NV right now, until ovn/ovs is sorted out... | 09:12 |
noonedeadpunk | or just wait... | 09:12 |
darkhackernc | noonedeadpunk, thank you so much | 09:17 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 | 10:17 |
f0o | when having additional repos in /etc/apt/sources.list.d/ how can I copy over the gpg keyrings into the lxc as well? is it just a matter of placing the keyring in the same folder? Right now the lxc caching step will always fail because it cannot get past `apt update` as the additional sources' gpg keyring is missing | 10:48 |
jrosser | f0o: these things should be copied over i think https://github.com/openstack/openstack-ansible-lxc_hosts/blob/master/vars/debian.yml#L23-L33 | 10:52 |
jrosser | the idea is that if the host has apt repo config then that should be copied over into the image | 10:53 |
f0o | jrosser: thanks! we had them in /usr/share/keyrings/ but it's easier to just move them | 10:53 |
f0o | worked like a breeze :) | 11:00 |
derekokeeffe | Hi again all, noonedeadpunk I forgot to actually thank you for that script in the first place. Got around to running it and the result is here https://paste.openstack.org/show/bQonP6K4XKc6K8WDBacC/ - not sure what the failure is, you might have some insight if you have time please | 11:01 |
darkhackernc | noonedeadpunk, https://bugs.launchpad.net/openstack-ansible/+bug/2106614 | 11:01 |
noonedeadpunk | derekokeeffe: eh.... that looks like some weird thing in openstack_user_config | 11:05 |
noonedeadpunk | as `eth11` is not really expected where it's placed | 11:06 |
derekokeeffe | Hmm ok, I'd better take a look at that config file so | 11:15 |
noonedeadpunk | but at least your management network feels fine | 11:16 |
noonedeadpunk | So I think that mariadb is broken, but not because of this | 11:17 |
noonedeadpunk | have you checked haproxy backends? | 11:18 |
noonedeadpunk | are they healthy regarding mariadb backends? | 11:18 |
derekokeeffe | Sorry for my ignorance but how would I check that? | 11:23 |
noonedeadpunk | echo "show stat" | nc -U /run/haproxy.stat | grep galera | 11:27 |
noonedeadpunk | or using hatop | 11:28 |
f0o | am I correct in assuming that all of these are manual steps that I'm required to do on each storage host? https://docs.openstack.org/openstack-ansible-os_swift/latest/configure-swift-devices.html | 11:40 |
f0o | I couldn't find any fstab ansible shenanigans in os_swift either | 11:41 |
f0o | just wondering what the ops workflow here is... having to manually go through ~50 hosts and run mkfs as well as editing fstab is gonna be tedious X_X | 11:41 |
noonedeadpunk | so for mounting part - you should be able to leverage systemd_mount role | 11:42 |
noonedeadpunk | but it does not provide formatting of drives | 11:42 |
f0o | I shall look into this `systemd_mount role` | 11:43 |
noonedeadpunk | So we do have it included in openstack_hosts | 11:43 |
f0o | the formatting is not an issue actually, I think we got that covered with the pxe template already... So mounting/fstab is the only tedious part left | 11:44 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-openstack_hosts/src/branch/master/tasks/openstack_hosts_systemd.yml#L38-L44 | 11:44 |
f0o | ty! | 11:44 |
noonedeadpunk | format of openstack_hosts_systemd_mounts looks like this: https://opendev.org/openstack/ansible-role-systemd_mount/src/branch/master/defaults/main.yml#L46 | 11:44 |
noonedeadpunk | so you pretty much define `openstack_hosts_systemd_mounts` as host/group_vars | 11:44 |
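A minimal group_vars sketch of what that could look like for swift drives follows; the file path, group name, device names, filesystem options and key names here are assumptions, and the authoritative entry format is the systemd_mount defaults linked above:

```yaml
# /etc/openstack_deploy/group_vars/swift_hosts/mounts.yml  (path and group name are assumptions)
openstack_hosts_systemd_mounts:
  - what: /dev/sdb
    where: /srv/node/sdb
    type: xfs
    options: noatime,nodiratime,logbufs=8
    state: started
    enabled: true
  - what: /dev/sdc
    where: /srv/node/sdc
    type: xfs
    options: noatime,nodiratime,logbufs=8
    state: started
    enabled: true
```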
f0o | so I can just add a group_vars for the swift_hosts and toss all the sdxy into it | 11:45 |
f0o | neato | 11:45 |
f0o | <3 | 11:45 |
noonedeadpunk | and when running openstack_hosts - the role will configure the mounts for you | 11:45 |
noonedeadpunk | yeah | 11:45 |
f0o | starting to like this ansible-voodoo | 11:45 |
noonedeadpunk | you can also do same with networks :) | 11:45 |
jrosser | f0o: a bunch of these "utility" roles for mounts / systemd services / systemd network are designed to be able to be used also outside openstack-ansible | 11:47 |
noonedeadpunk | fwiw, today in new envs I just leverage systemd_networkd for configuring networks on hosts for me. But tbh - coming up with the initial config (which I can copy/paste for any new env) took some time | 11:47 |
jrosser | i use them a bunch for deploying other things and they are super handy | 11:47 |
noonedeadpunk | ++ | 11:47 |
jrosser | noonedeadpunk: do you have some approach for (hah) consistent device naming for network setup? | 11:48 |
jrosser | like udev or something to make the interfaces have some useful names | 11:49 |
f0o | jrosser: enpXsY has been working great for us, it's predictable since X is the PCIe slot and Y is the interface on the card | 11:50 |
jrosser | until you upgrade your OS | 11:50 |
jrosser | then its similar, but different | 11:50 |
f0o | havent had an issue with that yet | 11:50 |
f0o | for our routers we use mac encoding | 11:51 |
f0o | enx1234567abc | 11:51 |
f0o | becomes tedious to type but it's guaranteed to stay | 11:51 |
jrosser | things became enpXsYfZnpA on noble hosts | 11:53 |
f0o | then your card has subslots | 11:54 |
f0o | like 40G with 4x10G breakouts | 11:54 |
f0o | at least that's what we observed | 11:55 |
jrosser | that is an ancient connectx-4 | 11:55 |
jrosser | so i suspect it is more that the driver in noble understands things like subslots now and everything seems to be expressed that way, even if the hardware cant do it | 11:56 |
jrosser | i don't think it's until cx-7 that you can breakout a nic from the host perspective | 11:56 |
f0o | maybe it uses the same naming as newer connectx's... `fZ` is usually for sr-iov functions | 11:56 |
f0o | all our sr-iovs use enpXsYf[0-2048]np[0-1] | 11:57 |
f0o | network naming is fun (: | 11:57 |
jrosser | at the moment we have not needed to record all the mac addresses in our inventory, so it's not trivial to drop udev config to make some actually persistent names | 11:58 |
jrosser | we only have the mac for the interface the host pxeboots from | 11:58 |
derekokeeffe | noonedeadpunk this is the output https://paste.openstack.org/show/blaQI2nfgsLWyk1iqqdA/ | 11:58 |
noonedeadpunk | well :) | 11:59 |
derekokeeffe | This sounds promising :) what have I done!!! | 11:59 |
noonedeadpunk | what is haproxy_keepalived_internal_vip_cidr ? | 12:01 |
derekokeeffe | haproxy_keepalived_external_vip_cidr: "193.1.8.6/32" | 12:01 |
derekokeeffe | haproxy_keepalived_internal_vip_cidr: "172.29.236.6/32" | 12:01 |
derekokeeffe | I changed it from a 193 address yesterday | 12:02 |
derekokeeffe | Same as the container mgmt net | 12:02 |
f0o | I think there's an issue with the os_swift 'Create the swift system user' task. There's a chicken-and-egg problem in the swift-proxy container because it tries to execute ssh-keygen before installing it first | 12:03 |
noonedeadpunk | I assume you've changed DNS as well, right? | 12:03 |
f0o | so swift_pre_install creates the user with `generate_ssh_key: "yes"` but openssh-client is only installed in swift_install ... that seems like a chicken-egg issue | 12:05 |
derekokeeffe | I haven't set up any DNS entries yet, should I have? | 12:05 |
f0o | Just verified, if I manually install openssh-client in the swift-proxy lxc the playbook passes the step | 12:06 |
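Until the role itself is fixed, a throwaway play along these lines would avoid installing the package by hand on every proxy; this is entirely hypothetical - the group name and the idea of running it just before the swift playbooks are assumptions, adjust to your inventory:

```yaml
# workaround-openssh-client.yml -- hypothetical ad-hoc play, not part of openstack-ansible
- hosts: swift_proxy  # assumption: use whatever group holds your swift proxy containers/hosts
  gather_facts: false
  become: true
  tasks:
    - name: Ensure openssh-client exists before os_swift generates the swift user's ssh key
      ansible.builtin.apt:
        name: openssh-client
        state: present
        update_cache: true
```

Run from the deployment host before os-swift-install.yml, e.g. with `openstack-ansible workaround-openssh-client.yml`.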
f0o | I have a feeling os_swift is broken in many ways - it's trying to upload the rings from the deployment host but didn't add the deployment host's ssh key to authorized_keys... am I missing something? | 12:13 |
noonedeadpunk | frankly - I hardly ever used swift | 12:36 |
noonedeadpunk | know nothing about it, except that it passes CI somehow | 12:36 |
f0o | I start to doubt myself so much right now... I think my config is correct... but I get a lot of odd errors. So I just did the lxc-destroy playbook to remove the proxy containers and start from scratch | 12:38 |
f0o | I've also had issues with the actual storage hosts (not containers) not finding the br-storage, but it's right there in `ip l` with IP and all because nova uses it for cinder :D | 12:38 |
noonedeadpunk | but I think you might be right about race conditions, as we test in AIO | 12:40 |
noonedeadpunk | and multinode can have quirks | 12:41 |
noonedeadpunk | also I wonder if we're still using regular ssh keys for swift... as I'd expect that to have been replaced with ssh certs | 12:41 |
f0o | I can live with the openssh-client one, I can install that on the proxies real quick. But the rsync from deployment host to proxies for the rings is a much bigger issue for me - and I have no idea where to even start with the storage hosts not finding br-storage | 12:42 |
noonedeadpunk | but we could miss swift when doing keystone/nova conversion to certs | 12:42 |
darkhackernc | noonedeadpunk, https://bugs.launchpad.net/openstack-ansible/+bug/2106625 | 12:46 |
darkhackernc | why do instances lose connectivity after a nova-compute service restart? a restart of openvswitch restores the connectivity | 12:47 |
darkhackernc | any known issues? | 12:48 |
jrosser | f0o: i don't think i ever looked at the swift role when we did SSH CAs | 13:08 |
jrosser | if there is code there that tries to generate and distribute ssh keys across all the swift hosts then we can totally do better than that | 13:08 |
f0o | yay :D | 13:09 |
f0o | at least it means I'm not going insane | 13:10 |
jrosser | i am really not sure anyone has used the os_swift role for real in a very long time | 13:11 |
jrosser | most people are ceph/radosgw | 13:11 |
f0o | ceph seems like a very big overhead to just add cheap object storage... I get it if ceph is also the cinder backend | 13:12 |
f0o | but yeah it seems that os_swift is nonfunctional outside of AIO | 13:12 |
jrosser | i'm really just saying that the swift role almost certainly needs some work | 13:12 |
jrosser | and i am currently completely -ETIME for looking at it, plus my clouds don't use it | 13:13 |
jrosser | if you are interested to see how it should work, the keystone role uses rsync to copy the fernet keys between hosts and we set that up nicely so that you don't need a "full mesh" of installed public keys | 13:14 |
noonedeadpunk | f0o: I could look at it but after the PTG week only | 13:14 |
jrosser | the same approach should be lift/shift into swift | 13:14 |
noonedeadpunk | and given that some more information is provided on issues | 13:14 |
f0o | I'll have a crack at it. There might be some Qs here and there at points where I'm lost (namely `Swift storage address not found` when it absolutely does exist) but at least the user-add and ssh-ca I should be able to handle | 13:16 |
f0o | will have a go tomorrow, worst case I learn some ansible :P | 13:18 |
f0o | but now it's Lil Lordag as we say, time for a beer on a wednesday | 13:18 |
noonedeadpunk | f0o: so that "failure" boils down to this variable: https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L287 | 13:18 |
f0o | noonedeadpunk: https://paste.opendev.org/show/bqJJUFVI4dcxT6Iko9Rp/ | 13:20 |
noonedeadpunk | and that's what we have in AIO: https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/conf.d/swift.yml.aio#L2 | 13:20 |
f0o | yup that's what I took as template | 13:21 |
noonedeadpunk | did you get failure for the LXC or bare metal host? | 13:21 |
f0o | noonedeadpunk: do I require the container_vars>swift_vars>... objects? I assumed they would be flat for hosts (non-lxc deployment) | 13:21 |
noonedeadpunk | um, no you don't | 13:22 |
noonedeadpunk | I also think that zone/drives can be moved to group_vars easily | 13:23 |
f0o | probably, this was just a short copy-paste. the zones do change down the line as the hosts are in different racks | 13:24 |
noonedeadpunk | and also I think they should be under `swift` variable | 13:24 |
f0o | no matter, I think my plan right now is to solve the user creation & SSH-CA first; then deal with the storage address and then iterative with the rest. | 13:25 |
noonedeadpunk | as in that example, just having all that stuff inside of a single var: https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L287 | 13:25 |
f0o | you mean h4_2>swift_vars>zone:123 ? | 13:25 |
jrosser | it is probably good to generally put things more in /openstack_deploy/group_vars than in user_variables | 13:25 |
noonedeadpunk | So when you mix up user_variables with stuff from openstack_user_config - you're likely to have conflicts | 13:25 |
noonedeadpunk | yeah | 13:26 |
noonedeadpunk | I think you need all that to be part of `swift` variable like described in the role | 13:26 |
jrosser | i would avoid totally where possible putting any vars in openstack_user_config | 13:26 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L287 | 13:26 |
noonedeadpunk | yeah, it's a weird pattern we support for some reason... | 13:27 |
noonedeadpunk | from really early days I believe | 13:27 |
jrosser | right - i do remember the time the group/host_vars became possible | 13:27 |
jrosser | so it might be a thing that was for the same purpose directly in the inventory before that | 13:27 |
f0o | :D | 13:28 |
noonedeadpunk | f0o: also just to double check - you have seen https://docs.openstack.org/openstack-ansible-os_swift/latest/configure-swift.html ? | 13:28 |
f0o | I have, which led to a few more Qs than As | 13:28 |
noonedeadpunk | yeah, right | 13:28 |
f0o | because I'm not 100% sure on the meaning of container_vars when you use baremetal deployment for the storage hosts | 13:29 |
noonedeadpunk | I already see it's outdated... | 13:29 |
noonedeadpunk | and I see where you took the approach you took | 13:29 |
f0o | also really uncertain if `swift:{}` needs to be in global_overrides or not | 13:30 |
f0o | just adds to the Qs | 13:30 |
f0o | I now wonder what the compute overhead of ceph is for "simple" object storage... | 13:30 |
f0o | :D | 13:31 |
noonedeadpunk | no, just place it in user_variables or group_vars | 13:32 |
noonedeadpunk | `swift` is a regular variable | 13:32 |
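Roughly, then, something like the following in group_vars; the keys mirror the role defaults linked earlier, while the path, group name, device names, part_power and the single policy are placeholders rather than a recommendation:

```yaml
# /etc/openstack_deploy/group_vars/swift_all/swift.yml  (path and group name are assumptions)
swift:
  storage_network: br-storage
  part_power: 8
  mount_point: /srv/node
  drives:
    - name: sdb
    - name: sdc
  storage_policies:
    - policy:
        name: default
        index: 0
        default: True
```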
f0o | but alright, my plan is to delve into the ssh-ca taking keystone fernet keys as example and also how the user is created elsewhere. I should be able to create a patch fixing that part at least by tmr afternoon. | 13:32 |
noonedeadpunk | and the docs there are indeed written for times where group_vars were not possible | 13:33 |
noonedeadpunk | so it tried to account for different variable content based on the group | 13:33 |
noonedeadpunk | so that `swift` was different for different groups | 13:33 |
noonedeadpunk | (or different hosts) | 13:34 |
f0o | alright the sun is out and I'm off for that lil lordag beer. Thanks for the pointers and I'll try to fix it myself tomorrow but worst case ask a bunch of silly Qs again (: | 13:35 |
noonedeadpunk | I can try adapting the docs for the modern usecase, but then I have very little idea about what it should be as I've close to never actually used swift | 13:35 |
f0o | I'll have a beer for you guys as well :) | 13:35 |
f0o | catch you all later :) | 13:36 |
noonedeadpunk | yeah, just shout the question | 13:36 |
noonedeadpunk | if any, I'd be glad to help with sorting things out and giving some love to our roles/docs | 13:36 |
mgariepy | well my issue with placement and host-aggregate is due to hypervisor_hostname being the fqdn for the hosts in my 10 year old production cluster. | 13:38 |
noonedeadpunk | I _think_ it's generally not an issue..... | 13:38 |
noonedeadpunk | we have aggregates working with fqdn in hypervisors and short hostname in the compute service list | 13:38 |
noonedeadpunk | (and then in the aggregate they somehow match) | 13:39 |
noonedeadpunk | also in quite old envs... | 13:39 |
noonedeadpunk | so _generally_ I'd expect different hostnames not being a deal breaker today | 13:39 |
mgariepy | the aggregates did not migrate in my production system, i had to add all the computes to the correct aggregate | 13:42 |
mgariepy | when i look at my testbed with short names for everything the placement aggregate is there. | 13:46 |
mgariepy | for the other one, none of the fqdn hosts were added to aggregates in placement. | 13:46 |
mgariepy | might be fixed in newer release i guess. | 13:46 |
noonedeadpunk | yeah, indeed it could be just old release... | 14:16 |
mgariepy | maybe indeed | 14:46 |
noonedeadpunk | Folks, PTG session is live right now: https://meetpad.opendev.org/apr2025-ptg-os-ansible | 15:03 |
noonedeadpunk | so don't be shy to jump in :) | 15:03 |
jrosser | on my way sorry | 15:03 |
jrosser | back-to-back meetings | 15:03 |
NeilHanlon | on me way | 15:03 |
noonedeadpunk | I clean forgot to send a ML with info this time :( | 15:04 |