Wednesday, 2025-04-09

open-noobHi all - noob question - I am trying to install openstack on my ubuntu vm, with 16384 MB RAM, 150GB disk. Using openstack-ansible 2024.2. I keep getting this bunch of notifications: haproxy[152240]: backend repo_all-back has no server available!   Broadcast message from systemd-journald@aio1 (Tue 2025-04-08 18:01:31 EDT):  haproxy[155832]: backend galera-back has no server available!00:04
open-noobWhat could the problem be? I'm using only the configuration generated by scripts/aio.sh00:04
f0oQuite a lot of traffic here recently :)06:13
jrosserit’s good! has been really quiet for a while06:26
f0ojrosser: can you have a look at https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 ?06:32
f0oall other sysctl changes made it but this one seems stuck. Should I rebase?06:33
f0oI looked at the Zuul failures but they seem entirely unrelated to my change06:35
f0o(Same for my recent errorfiles changes)06:35
darkhackernchttps://paste.centos.org/view/05bb902506:53
darkhackerncrabbitmq is broken and no users in the rabbitmqctl list_users06:54
darkhackernccan anyone guide/help with the fix please06:54
darkhackerncit's a live env06:54
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-haproxy_server master: Remove extra whitespace delimiter to satisfy ansible-lint  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/93960607:12
jrosserf0o: i think that this needs a manual rebase https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601/607:13
f0oCan do!07:13
jrossercan you try that? its not sufficiently trivial that gerrit will do it for you in the UI07:13
jrossercool, thanks07:13
opendevreviewDaniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable Defaults to /etc/sysctl.conf to retain current behavior  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/93960107:14
f0oI think the rebase was needed because `yes` became `true` somewhere07:15
jrosserah right indeed we have done a round of getting ansible-lint happy07:16
f0omakes sense :)07:16
jrosserdarkhackernc: is yours a 'metal' deployment, without LXC containers for rabbitmq?07:16
jrosserits slightly surprising that you run rabbitmqctl on the controller host07:17
darkhackerncjrosser, metal07:18
darkhackerncnot lxc07:18
noonedeadpunkf0o: I commented on it07:27
noonedeadpunkI wonder when rocky will publish relevant ovn/ovs07:28
noonedeadpunkah, ovs is there... not ovn...07:30
noonedeadpunkNeilHanlon: ^ :) https://zuul.opendev.org/t/openstack/build/a47da36e1e4c42958940aaa7da0d92b007:30
f0onoonedeadpunk: Oh release note... I will have to read up on that07:31
noonedeadpunkit's kinda `pip install reno; reno new haproxy_sysctl_location`07:32
f0oit generated a boilerplate releasenotes/notes/haproxy_sysctl_location-e18310fd96597a6f.yaml but I wonder what the contents should be like, I guess everything but `features:` goes away?07:38
f0osorry for the silly questions :)07:39
jrosseryes delete everything thats not relevant07:40
f0ohttps://paste.opendev.org/show/biPnMS84QhwVGLHlLxmw/ is this enough or is it too brief?07:46
noonedeadpunkf0o: it's fine, just drop prelude please08:09
f0ookidoki08:09
noonedeadpunkas it's gonna render as release highlight more or less08:09
opendevreviewDaniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable Defaults to /etc/sysctl.conf to retain current behavior  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/93960108:09
noonedeadpunklike first thing for the release: https://docs.openstack.org/releasenotes/openstack-ansible/2024.2.html#prelude08:10
f0oaaah08:10
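For reference, with the prelude dropped the note ends up as a small YAML file containing only a features section, roughly along these lines (the wording is illustrative and paraphrases the commit message, not f0o's actual paste):

    # releasenotes/notes/haproxy_sysctl_location-e18310fd96597a6f.yaml
    features:
      - |
        The location that the haproxy_server role writes its sysctl settings to
        is now configurable. It defaults to /etc/sysctl.conf to retain the
        previous behaviour.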
darkhackerncthanks noonedeadpunk jrosser for the help08:19
darkhackernccluster is good now08:20
darkhackernchow about adding this to rabbitmq08:45
darkhackerncRABBITMQ_SERVER_ERL_ARGS="+K true +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<15000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<15000:64/native>>}]"08:45
darkhackerncwill this enhance rabbitmq capability08:45
noonedeadpunkdarkhackernc: I'd suggest using `rabbitmq_erlang_extra_args` variable to set this up09:03
noonedeadpunkit will propagate into the template: https://opendev.org/openstack/openstack-ansible-rabbitmq_server/src/branch/master/templates/rabbitmq-env.j2#L10-L1209:04
noonedeadpunkabout the values - really no idea what they are doing09:04
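A sketch of how that could look in user_variables.yml, assuming `rabbitmq_erlang_extra_args` takes a plain string that the rabbitmq-env.j2 template linked above renders into the environment file (the Erlang flags are simply the ones pasted earlier, not a recommendation):

    # /etc/openstack_deploy/user_variables.yml (sketch)
    rabbitmq_erlang_extra_args: >-
      +K true +P 1048576
      -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<15000:64/native>>}]
      -kernel inet_default_listen_options [{raw,6,18,<<15000:64/native>>}]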
darkhackerncnoonedeadpunk++09:04
noonedeadpunkeventually HA queues create a huge load on the cluster09:06
noonedeadpunkI think it was Caracal where we upgraded rabbitmq and oslo.messaging got modern enough to support streams and quorum queues09:07
noonedeadpunkwhich should improve rabbit capacity indeed09:08
noonedeadpunkdisabling ha queues can be an option as well09:08
noonedeadpunkI wonder if we should set rocky to NV right now, until ovn/ovs is sorted out...09:12
noonedeadpunkor just wait...09:12
darkhackerncnoonedeadpunk, thank you so much09:17
opendevreviewDaniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/93960110:17
f0owhen having additional repos in /etc/apt/sources.list.d/ how can I copy over the gpg keyrings into the lxc as well? is it just a matter of placing the keyring in the same folder? Right now the lxc caching step will always fail because it cannot get past `apt update` as the additional sources' gpg keyring is missing10:48
jrosserf0o: these things should be copied over i think https://github.com/openstack/openstack-ansible-lxc_hosts/blob/master/vars/debian.yml#L23-L3310:52
jrosserthe idea is that if the host has apt repo config then that should be copied over into the image10:53
f0ojrosser: thanks! we had them in /usr/share/keyrings/ but it's easier to just move them10:53
f0oworked like a breeze :)11:00
derekokeeffeHi again all, noonedeadpunk I forgot to actually thank you for that script in the first place. Got around to running it and the result is here https://paste.openstack.org/show/bQonP6K4XKc6K8WDBacC/ - not sure what the failure is, you might have some insight if you have time please11:01
darkhackerncnoonedeadpunk, https://bugs.launchpad.net/openstack-ansible/+bug/210661411:01
noonedeadpunkderekokeeffe: eh.... that looks like some weird thing in openstack_user_config11:05
noonedeadpunkas `eth11` is not really expected where it's placed11:06
derekokeeffeHmm ok, I'd better take a look at that config file so11:15
noonedeadpunkbut at least your management network feels fine11:16
noonedeadpunkSo I think mariadb being broken is not down to this11:17
noonedeadpunkhave you checked haproxy backends?11:18
noonedeadpunkare they healthy regarding mariadb backends?11:18
derekokeeffeSorry for my ignorance but how would I check that?11:23
noonedeadpunkecho "show stat" | nc -U /run/haproxy.stat | grep galera11:27
noonedeadpunkor using hatop11:28
f0oam I correct in assuming that all of these are manual steps that I'm required to do on each storage host? https://docs.openstack.org/openstack-ansible-os_swift/latest/configure-swift-devices.html11:40
f0oI couldn't find any fstab ansible shenanigans in os_swift either11:41
f0ojust wondering what the ops approach here is... having to manually go through ~50 hosts and run mkfs as well as editing fstab is gonna be tedious X_X11:41
noonedeadpunkso for mounting part - you should be able to leverage systemd_mount role11:42
noonedeadpunkbut it does not provide formatting of drives11:42
f0oI shall look into this `systemd_mount role`11:43
noonedeadpunkSo we do have it included in openstack_hosts11:43
f0othe formatting is not an issue actually, I think we got that covered with the pxe template already... So mounting/fstab is the only tedious part left11:44
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-openstack_hosts/src/branch/master/tasks/openstack_hosts_systemd.yml#L38-L4411:44
f0oty!11:44
noonedeadpunkformat of openstack_hosts_systemd_mounts looks like this: https://opendev.org/openstack/ansible-role-systemd_mount/src/branch/master/defaults/main.yml#L4611:44
noonedeadpunkso you pretty much define `openstack_hosts_systemd_mounts` as host/group_vars11:44
f0oso I can just add a group_vars for the swift_hosts and toss all the sdxy into it11:45
f0oneato11:45
f0o<311:45
noonedeadpunkand when running openstack_hosts - role will configure mounts for you11:45
noonedeadpunkyeah11:45
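A rough sketch of such a group_vars file, following the openstack_hosts_systemd_mounts format linked above (device names, filesystem and mount options are illustrative assumptions taken from the swift device docs, not a verified layout):

    # /etc/openstack_deploy/group_vars/swift_hosts.yml (sketch)
    openstack_hosts_systemd_mounts:
      - what: /dev/sdc
        where: /srv/node/sdc
        type: xfs
        options: noatime,nodiratime,logbufs=8
        state: started
        enabled: true
      - what: /dev/sdd
        where: /srv/node/sdd
        type: xfs
        options: noatime,nodiratime,logbufs=8
        state: started
        enabled: true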
f0ostarting to like this ansible-voodoo11:45
noonedeadpunkyou can also do same with networks :)11:45
jrosserf0o: a bunch of these "utility" roles for mounts / systemd services / systemd network are designed to be able to be used also outside openstack-ansible11:47
noonedeadpunkfwiw, today in new envs I just leverage systemd_networkd for configuring networks on hosts for me. But tbh - coming up with an initial config (which I can copy/paste for any new env) took some time11:47
jrosseri use them a bunch for deploying other things and they are super handy11:47
noonedeadpunk++11:47
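For context, a heavily hedged sketch of the kind of host network config noonedeadpunk describes; the variable names and entry structure are assumptions about the openstack_hosts / systemd_networkd roles and should be checked against the role defaults before use:

    # group_vars for the hosts in question (sketch; names and structure assumed)
    openstack_hosts_systemd_networkd_devices:
      - NetDev:
          Name: br-mgmt
          Kind: bridge
    openstack_hosts_systemd_networkd_networks:
      - interface: eno1
        bridge: br-mgmt
      - interface: br-mgmt
        address: 172.29.236.11
        netmask: 255.255.252.0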
jrossernoonedeadpunk: do you have some approach for (hah) consistent device naming for network setup?11:48
jrosserlike udev or something to make the interfaces have some useful names11:49
f0ojrosser: enpXsY has been working great for us, it's predictable since X is the PCIe port and Y is the interface on the card11:50
jrosseruntil you upgrade your OS11:50
jrosserthen its similar, but different11:50
f0ohavent had an issue with that yet11:50
f0ofor our routers we use mac encoding11:51
f0oenx1234567abc11:51
f0obecomes tedious to type but it's guaranteed to stay11:51
jrosserthings became enpXsYfZnpA on noble hosts11:53
f0othen your card has subslots11:54
f0olike 40G with 4x10G breakouts11:54
f0oat least that's what we observed11:55
jrosserthat is an ancient connectx-411:55
jrosserso i suspect it is more that the driver in noble understands things like subslots now and everything seems to be expressed that way, even if the hardware cant do it11:56
jrosseri don't think it's until cx-7 that you can breakout a nic from the host perspective11:56
f0omaybe it uses the same naming as newer connectx's... `fZ` is usually for sr-iov functions11:56
f0oall our sr-iovs use enpXsYf[0-2048]np[0-1]11:57
f0onetwork naming is fun (:11:57
jrosserat the moment we have not needed to record all the mac addresses in our inventory, so it's not trivial to drop udev config to make some actually persistent names11:58
jrosserwe only have the mac for the interface the host pxeboots from11:58
derekokeeffenoonedeadpunk this is the output https://paste.openstack.org/show/blaQI2nfgsLWyk1iqqdA/11:58
noonedeadpunkwell :)11:59
derekokeeffeThis sounds promising :) what have I done!!!11:59
noonedeadpunkwhat is haproxy_keepalived_internal_vip_cidr ?12:01
derekokeeffehaproxy_keepalived_external_vip_cidr: "193.1.8.6/32"12:01
derekokeeffehaproxy_keepalived_internal_vip_cidr: "172.29.236.6/32"12:01
derekokeeffeI changed it from a 193 address yesterday12:02
derekokeeffeSame as the container mgmt net12:02
f0oI think there's an issue with the os_swift 'Create the swift system user' task. There's a chicken-and-egg problem in the swift-proxy container because it tries to execute ssh-keygen before the ssh client is installed12:03
noonedeadpunkI assume you've changed DNS as well, right?12:03
f0oso swift_pre_install creates the user with `generate_ssh_key: "yes"` but openssh-client is only installed in swift_install ... that seems like a chicken-egg issue12:05
derekokeeffeI haven't set up any DNS entries yet, should I have?12:05
f0oJust verified, if I manually install openssh-client in the swift-proxy lxc the playbook passes the step12:06
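A minimal sketch of the kind of fix this suggests - installing the ssh client package ahead of the existing 'Create the swift system user' task in swift_pre_install; the task wording below is an assumption about how a patch might look, not the actual role code:

    # sketch: would run before the 'Create the swift system user' task
    - name: Install openssh-client so generate_ssh_key can run ssh-keygen
      ansible.builtin.package:
        name: openssh-client
        state: present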
f0oI have a feeling os_swift is broken in many ways - it's trying to upload the rings from the deployment host but didn't set the deployment host's ssh key into authorized_keys... am I missing something?12:13
noonedeadpunkfrankly - I hardly ever used swift12:36
noonedeadpunkknow nothing about it, except that it passes CI somehow12:36
f0oI start to doubt myself so much right now... I think my config is correct... but I get a lot of odd errors. So I just did the lxc-destroy playbook to remove the proxy containers and start from scratch12:38
f0oI've also had issues with the actual storage hosts (not containers) not finding the br-storage, but it's right there in `ip l` with IP and all because nova uses it for cinder :D12:38
noonedeadpunkbut I think you might be right about race conditions, as we test in AIO12:40
noonedeadpunkand multinode can have quirks12:41
noonedeadpunkalso I wonder if we're still using regular ssh keys for swift... as I'd expect that to be replaced with ssh certs12:41
f0oI can live with the openssh-client one, I can install that on the proxies real quick. But the rsync from the deployment host to the proxies for the rings is a much bigger issue for me - and I have no idea where to even start with the storage hosts not finding br-storage12:42
noonedeadpunkbut we could miss swift when doing keystone/nova conversion to certs12:42
darkhackerncnoonedeadpunk, https://bugs.launchpad.net/openstack-ansible/+bug/210662512:46
darkhackerncwhy do instances lose connectivity after a nova-compute service restart? a restart of openvswitch restores the connectivity12:47
darkhackerncany known issues?12:48
jrosserf0o: i don't think i ever looked at the swift role when we did SSH CAs13:08
jrosserif there is code there that tries to generate and distribute ssh keys across all the swift hosts then we can totally do better than that13:08
f0oyay :D13:09
f0oat least it means I'm not going insane13:10
jrosseri am really not sure anyone has used the os_swift role for real in a very long time13:11
jrossermost people are ceph/radosgw13:11
f0oceph seems like a very big overhead to just add cheap object storage... I get it if ceph is also the cinder backend13:12
f0obut yeah it seems that os_swift is nonfunctional outside of AIO13:12
jrosseri'm really just saying that the swift role almost certainly needs some work13:12
jrosserand i am currently completely -ETIME for looking at it, plus my clouds don't use it13:13
jrosserif you are interested to see how it should work, the keystone role uses rsync to copy the fernet keys between hosts and we set that up nicely so that you don't need a "full mesh" of installed public keys13:14
noonedeadpunkf0o: I could look at it but after the PTG week only13:14
jrosserthe same approach should be a straight lift/shift into swift13:14
noonedeadpunkand given that some more information is provided on issues13:14
f0oI'll have a crack at it. There might be some Qs here and there at points where I'm lost (namely `Swift storage address not found` when it absolutely does exist) but at least the user-add and ssh-ca I should be able to handle13:16
f0owill have a go tomorrow, worst case I learn some ansible :P13:18
f0obut now it's Lil Lordag as we say, time for a beer on a wednesday13:18
noonedeadpunkf0o: so that "failure" boils to this variable: https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L28713:18
f0onoonedeadpunk: https://paste.opendev.org/show/bqJJUFVI4dcxT6Iko9Rp/13:20
noonedeadpunkand that's what we have in AIO: https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/conf.d/swift.yml.aio#L213:20
f0oyup that's what I took as template13:21
noonedeadpunkdid you get failure for the LXC or bare metal host?13:21
f0onoonedeadpunk: do I require the container_vars>swift_vars>... objects? I assumed they would be flat for hosts (non-lxc deployment)13:21
noonedeadpunkum, no you don't13:22
noonedeadpunkI also think that zone/drives can be moved to group_vars easily13:23
f0oprobably, this was just a short copy-paste. the zones do change down the line as the hosts are in different racks13:24
noonedeadpunkand also I think they should be under `swift` variable13:24
f0ono matter, I think my plan right now is to solve the user creation & SSH-CA first; then deal with the storage address and then iterative with the rest.13:25
noonedeadpunkas in that example, just having all that stuff inside of a single var: https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L28713:25
f0oyou mean h4_2>swift_vars>zone:123 ?13:25
jrosserit is probably good to generally put things more in /openstack_deploy/group_vars than in user_variables13:25
noonedeadpunkSo when you mix up user_variables with stuff from openstack_user_config - you're likely to get conflicts13:25
noonedeadpunkyeah13:26
noonedeadpunkI think you need all that to be part of `swift` variable like described in the role13:26
jrosseri would avoid totally where possible putting any vars in openstack_user_config13:26
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L28713:26
noonedeadpunkyeah, it's a weird pattern we support for some reason...13:27
noonedeadpunkfrom really early days I believe13:27
jrosserright - i do remember the time the group/host_vars became possible13:27
jrosserso it might be a thing that was for the same purpose directly in the inventory before that13:27
f0o:D13:28
noonedeadpunkf0o: also just to double check - you have seen https://docs.openstack.org/openstack-ansible-os_swift/latest/configure-swift.html ?13:28
f0oI have which lead to a few more Qs than As13:28
noonedeadpunkyeah, right13:28
f0obecause I'm not 100% sure on the meaning of container_vars when you use baremetal deployment for the storage hosts13:29
noonedeadpunkI already see it's outdated...13:29
noonedeadpunkand I see where you took the approach you took13:29
f0oalso really uncertain if `swift:{}` needs to be in global_overrides or not13:30
f0ojust adds to the Qs13:30
f0oI now wonder what the compute overhead of ceph is for "simple" object storage...13:30
f0o:D13:31
noonedeadpunkno, just place it in user_variables or group_vars13:32
noonedeadpunk`swift` is a regular variable13:32
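Putting the pieces together, a sketch of what such a group_vars file could contain, loosely following the `swift` variable layout in the role defaults linked above (part power, network, drive names and the policy are illustrative assumptions, not a verified multi-node config):

    # /etc/openstack_deploy/group_vars/swift_all.yml (sketch)
    swift:
      part_power: 8
      storage_network: br-storage
      mount_point: /srv/node
      drives:
        - name: sdc
        - name: sdd
      storage_policies:
        - policy:
            name: default
            index: 0
            default: True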
f0obut alright, my plan is to delve into the ssh-ca, taking the keystone fernet keys as an example, and also how the user is created elsewhere. I should be able to create a patch fixing that part at least by tomorrow afternoon.13:32
noonedeadpunkand docs there are indeed written for times when group_vars were not possible13:33
noonedeadpunkso it tried to account for different variable content based on the group13:33
noonedeadpunkso that `swift` was different for different groups13:33
noonedeadpunk(or different hosts)13:34
f0oalright the sun is out and I'm off for that lil lordag beer. Thanks for the pointers and I'll try to fix it myself tomorrow but worst case ask a bunch of silly Qs again (:13:35
noonedeadpunkI can try adapting the docs for the modern use case, but then I have very little idea about what it should be, as I've close to never actually used swift13:35
f0oI'll have a beer for you guys as well :)13:35
f0ocatch you all later :)13:36
noonedeadpunkyeah, just shout the question13:36
noonedeadpunkif any, I'd be glad to help with sorting things out and giving some love to our roles/docs13:36
mgariepywell my issue with placement and host-aggregates is due to the hosts having fqdns in hypervisor_hostname in my 10 year old production cluster.13:38
noonedeadpunkI _think_ it's generally not an issue.....13:38
noonedeadpunkwe have aggregates working with fqdn in hypervisors and short hostnames in the compute service list13:38
noonedeadpunk(and then in the aggregate they somehow match)13:39
noonedeadpunkalso in quite old envs...13:39
noonedeadpunkso _generally_ I'd expect different hostnames not being a deal breaker today13:39
mgariepythe aggregates did not migrate in my production system, i had to add all the computes to the correct aggregate13:42
mgariepywhen i look at my testbed with short names for everything, the placement aggregate is there.13:46
mgariepyfor the other one, none of the fqdn hosts were added to aggregates in placement.13:46
mgariepymight be fixed in a newer release i guess.13:46
noonedeadpunkyeah, indeed it could be just old release...14:16
mgariepymaybe indeed14:46
noonedeadpunkFolks, PTG session is live right now: https://meetpad.opendev.org/apr2025-ptg-os-ansible15:03
noonedeadpunkso don't be shy to jump in :)15:03
jrosseron my way sorry15:03
jrosserback-to-back meetings15:03
NeilHanlonon me way15:03
noonedeadpunkI clean forgot to send a ML with info this time :(15:04

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!