open-noob | Hi all - noob question - I am trying to install openstack on my ubuntu vm with 16384 MB (16 GB) RAM and a 150GB disk, using openstack-ansible 2024.2. I keep getting this bunch of notifications: haproxy[152240]: backend repo_all-back has no server available! Broadcast message from systemd-journald@aio1 (Tue 2025-04-08 18:01:31 EDT): haproxy[155832]: backend galera-back has no server available! | 00:04 |
---|---|---|
open-noob | What could the problem be? I'm using only the configuration generated by scripts/aio.sh | 00:04 |
f0o | Quite a lot of traffic here recently :) | 06:13 |
jrosser | it’s good! has been really quiet for a while | 06:26 |
f0o | jrosser: can you have a look at https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 ? | 06:32 |
f0o | all other sysctl changes made it but this one seems stuck. Should I rebase? | 06:33 |
f0o | I looked at the Zuul failures but they seem entirely unrelated to my change | 06:35 |
f0o | (Same for my recent errorfiles changes) | 06:35 |
darkhackernc | https://paste.centos.org/view/05bb9025 | 06:53 |
darkhackernc | rabbitmq is broken and no users in the rabbitmqctl list_users | 06:54 |
darkhackernc | anyone can guide/help for the fix please | 06:54 |
darkhackernc | its a live env | 06:54 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-haproxy_server master: Remove extra whitespace delimiter to satisfy ansible-lint https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939606 | 07:12 |
jrosser | f0o: i think that this needs a manual rebase https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601/6 | 07:13 |
f0o | Can do! | 07:13 |
jrosser | can you try that? its not sufficiently trivial that gerrit will do it for you in the UI | 07:13 |
jrosser | cool, thanks | 07:13 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable. Defaults to /etc/sysctl.conf to retain current behavior. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 | 07:14 |
f0o | I think the rebase was needed because `yes` became `true` somewhere | 07:15 |
jrosser | ah right indeed we have done a round of getting ansible-lint happy | 07:16 |
f0o | makes sense :) | 07:16 |
jrosser | darkhackernc: is yours a 'metal' deployment, without LXC containers for rabbitmq? | 07:16 |
jrosser | its slightly surprising that you run rabbitmqctl on the controller host | 07:17 |
darkhackernc | jrosser, metal | 07:18 |
darkhackernc | not lxc | 07:18 |
noonedeadpunk | f0o: I commented on it | 07:27 |
noonedeadpunk | I wonder when rocky will publish relevant ovn/ovs | 07:28 |
noonedeadpunk | ah, ovs is there... not ovn... | 07:30 |
noonedeadpunk | NeilHanlon: ^ :) https://zuul.opendev.org/t/openstack/build/a47da36e1e4c42958940aaa7da0d92b0 | 07:30 |
f0o | noonedeadpunk: Oh release note... I will have to read up on that | 07:31 |
noonedeadpunk | it's kinda `pip install reno; reno new haproxy_sysctl_location` | 07:32 |
f0o | it generated a boilerplate releasenotes/notes/haproxy_sysctl_location-e18310fd96597a6f.yaml but I wonder what the contents should be like, I guess everything but `features:` goes away? | 07:38 |
f0o | sorry for the silly questions :) | 07:39 |
jrosser | yes, delete everything that's not relevant | 07:40 |
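For illustration, a features-only reno note for this change might look roughly like the following; the wording is an assumption based on the commit message above, and the hash in the filename is simply whatever `reno new` generated:

```yaml
# releasenotes/notes/haproxy_sysctl_location-e18310fd96597a6f.yaml
features:
  - |
    The path of the sysctl configuration file used by the haproxy_server
    role can now be overridden. It defaults to ``/etc/sysctl.conf`` to
    retain the previous behaviour.
```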
f0o | https://paste.opendev.org/show/biPnMS84QhwVGLHlLxmw/ is this enough or is it too brief? | 07:46 |
noonedeadpunk | f0o: it's fine, just drop prelude please | 08:09 |
f0o | okidoki | 08:09 |
noonedeadpunk | as it's gonna render as release highlight more or less | 08:09 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable. Defaults to /etc/sysctl.conf to retain current behavior. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 | 08:09 |
noonedeadpunk | like first thing for the release: https://docs.openstack.org/releasenotes/openstack-ansible/2024.2.html#prelude | 08:10 |
f0o | aaah | 08:10 |
darkhackernc | thanks noonedeadpunk jrosser for the help | 08:19 |
darkhackernc | cluster is good now | 08:20 |
darkhackernc | how about adding this to rabbitmq | 08:45 |
darkhackernc | RABBITMQ_SERVER_ERL_ARGS="+K true +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<15000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<15000:64/native>>}]" | 08:45 |
darkhackernc | will this enhance rabbitmq capability | 08:45 |
noonedeadpunk | darkhackernc: I'd suggest using `rabbitmq_erlang_extra_args` variable to set this up | 09:03 |
noonedeadpunk | it will propagate into the template: https://opendev.org/openstack/openstack-ansible-rabbitmq_server/src/branch/master/templates/rabbitmq-env.j2#L10-L12 | 09:04 |
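As a rough sketch of how that could be wired up in user_variables, assuming `rabbitmq_erlang_extra_args` takes a plain string (check the rabbitmq-env.j2 template linked above for how exactly it is rendered); the Erlang flag values are copied verbatim from the question:

```yaml
# /etc/openstack_deploy/user_variables.yml -- a sketch, not verified against a live deployment
rabbitmq_erlang_extra_args: >-
  +K true +P 1048576
  -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<15000:64/native>>}]
  -kernel inet_default_listen_options [{raw,6,18,<<15000:64/native>>}]
```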
noonedeadpunk | about the values - really no idea what they are doing | 09:04 |
darkhackernc | noonedeadpunk++ | 09:04 |
noonedeadpunk | eventually HA queues create a huge load on the cluster | 09:06 |
noonedeadpunk | I think it was caracal, where we did upgrade of rabbitmq and oslo.messaging got modern enough to support streams and quorum queues | 09:07 |
noonedeadpunk | which should improve rabbit capacity indeed | 09:08 |
noonedeadpunk | disabling ha queues can be an option as well | 09:08 |
noonedeadpunk | I wonder if we should set rocky to NV right now, until ovn/ovs is sorted out... | 09:12 |
noonedeadpunk | or just wait... | 09:12 |
darkhackernc | noonedeadpunk, thank you so much | 09:17 |
opendevreview | Daniel Preussker proposed openstack/openstack-ansible-haproxy_server master: Make sysctl configuration path configurable https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/939601 | 10:17 |
f0o | when having additional repos in /etc/apt/sources.list.d/ how can I copy over the gpg keyrings into the lxc as well? is it just a matter of placing the keyring in the same folder? Right now the lxc caching step will always fail because it cannot get past `apt update` as the additional sources' gpg keyring is missing | 10:48 |
jrosser | f0o: these things should be copied over i think https://github.com/openstack/openstack-ansible-lxc_hosts/blob/master/vars/debian.yml#L23-L33 | 10:52 |
jrosser | the idea is that if the host has apt repo config then that should be copied over into the image | 10:53 |
f0o | jrosser: thanks! we had them in /usr/share/keyrings/ but it's easier to just move them | 10:53 |
f0o | worked like a breeze :) | 11:00 |
derekokeeffe | Hi again all, noonedeadpunk I forgot to actually thank you for that script in the first place. Got around to running it and the result is here https://paste.openstack.org/show/bQonP6K4XKc6K8WDBacC/ - not sure what the failure is, you might have some insight if you have time please | 11:01 |
darkhackernc | noonedeadpunk, https://bugs.launchpad.net/openstack-ansible/+bug/2106614 | 11:01 |
noonedeadpunk | derekokeeffe: eh.... that looks like some weird thing in openstack_user_config | 11:05 |
noonedeadpunk | as `eth11` is not really expected where it's placed | 11:06 |
derekokeeffe | Hmm ok, I'd better take a look at that config file so | 11:15 |
noonedeadpunk | but at least your management network feels fine | 11:16 |
noonedeadpunk | So I think that mariadb is broken, but not because of this | 11:17 |
noonedeadpunk | have you checked haproxy backends? | 11:18 |
noonedeadpunk | are they healthy regarding mariadb backends? | 11:18 |
derekokeeffe | Sorry for my ignorance but how would I check that? | 11:23 |
noonedeadpunk | echo "show stat" | nc -U /run/haproxy.stat | grep galera | 11:27 |
noonedeadpunk | or using hatop | 11:28 |
f0o | am I correct in assuming that all of these are manual steps that I'm required to do on each storage host? https://docs.openstack.org/openstack-ansible-os_swift/latest/configure-swift-devices.html | 11:40 |
f0o | I couldn't find any fstab ansible shenanigans in os_swift either | 11:41 |
f0o | just wondering what the ops workflow here is... having to manually go through ~50 hosts and run mkfs as well as editing fstab is gonna be tedious X_X | 11:41 |
noonedeadpunk | so for mounting part - you should be able to leverage systemd_mount role | 11:42 |
noonedeadpunk | but it does not provide formatting of drives | 11:42 |
f0o | I shall look into this `systemd_mount role` | 11:43 |
noonedeadpunk | So we do have it included in openstack_hosts | 11:43 |
f0o | the formatting is not an issue actually, I think we got that covered with the pxe template already... So mounting/fstab is the only tedious part left | 11:44 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-openstack_hosts/src/branch/master/tasks/openstack_hosts_systemd.yml#L38-L44 | 11:44 |
f0o | ty! | 11:44 |
noonedeadpunk | format of openstack_hosts_systemd_mounts looks like this: https://opendev.org/openstack/ansible-role-systemd_mount/src/branch/master/defaults/main.yml#L46 | 11:44 |
noonedeadpunk | so you pretty much define `openstack_hosts_systemd_mounts` as host/group_vars | 11:44 |
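A minimal group_vars sketch of what that could look like for swift drives follows; the file path, group name, device names, filesystem options and key names here are assumptions, and the authoritative entry format is the systemd_mount defaults linked above:

```yaml
# /etc/openstack_deploy/group_vars/swift_hosts/mounts.yml  (path and group name are assumptions)
openstack_hosts_systemd_mounts:
  - what: /dev/sdb
    where: /srv/node/sdb
    type: xfs
    options: noatime,nodiratime,logbufs=8
    state: started
    enabled: true
  - what: /dev/sdc
    where: /srv/node/sdc
    type: xfs
    options: noatime,nodiratime,logbufs=8
    state: started
    enabled: true
```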
f0o | so I can just add a group_vars for the swift_hosts and toss all the sdxy into it | 11:45 |
f0o | neato | 11:45 |
f0o | <3 | 11:45 |
noonedeadpunk | and when running openstack_hosts - the role will configure the mounts for you | 11:45 |
noonedeadpunk | yeah | 11:45 |
f0o | starting to like this ansible-voodoo | 11:45 |
noonedeadpunk | you can also do same with networks :) | 11:45 |
jrosser | f0o: a bunch of these "utility" roles for mounts / systemd services / systemd network are designed to be able to be used also outside openstack-ansible | 11:47 |
noonedeadpunk | fwiw, today in new envs I just leverage systemd_networkd for configuring networks on hosts for me. But tbh - coming up with the initial config (which I can copy/paste for any new env) took some time | 11:47 |
jrosser | i use them a bunch for deploying other things and they are super handy | 11:47 |
noonedeadpunk | ++ | 11:47 |
jrosser | noonedeadpunk: do you have some approach for (hah) consistent device naming for network setup? | 11:48 |
jrosser | like udev or something to make the interfaces have some useful names | 11:49 |
f0o | jrosser: enpXsY has been working great for us, it's predictable since X is the PCIe slot and Y is the interface on the card | 11:50 |
jrosser | until you upgrade your OS | 11:50 |
jrosser | then its similar, but different | 11:50 |
f0o | havent had an issue with that yet | 11:50 |
f0o | for our routers we use mac encoding | 11:51 |
f0o | enx1234567abc | 11:51 |
f0o | becomes tedious to type but it's guaranteed to stay | 11:51 |
jrosser | things became enpXsYfZnpA on noble hosts | 11:53 |
f0o | then your card has subslots | 11:54 |
f0o | like 40G with 4x10G breakouts | 11:54 |
f0o | at least that's what we observed | 11:55 |
jrosser | that is an ancient connectx-4 | 11:55 |
jrosser | so i suspect it is more that the driver in noble understands things like subslots now and everything seems to be expressed that way, even if the hardware cant do it | 11:56 |
jrosser | i don't think it's until cx-7 that you can breakout a nic from the host perspective | 11:56 |
f0o | maybe it uses the same naming as newer connectx's... `fZ` is usually for sr-iov functions | 11:56 |
f0o | all our sr-iovs use enpXsYf[0-2048]np[0-1] | 11:57 |
f0o | network naming is fun (: | 11:57 |
jrosser | at the moment we have not needed to record all the mac addresses in our inventory, so it's not trivial to drop udev config to make some actually persistent names | 11:58 |
jrosser | we only have the mac for the interface the host pxeboots from | 11:58 |
derekokeeffe | noonedeadpunk this is the output https://paste.openstack.org/show/blaQI2nfgsLWyk1iqqdA/ | 11:58 |
noonedeadpunk | well :) | 11:59 |
derekokeeffe | This sounds promising :) what have I done!!! | 11:59 |
noonedeadpunk | what is haproxy_keepalived_internal_vip_cidr ? | 12:01 |
derekokeeffe | haproxy_keepalived_external_vip_cidr: "193.1.8.6/32" | 12:01 |
derekokeeffe | haproxy_keepalived_internal_vip_cidr: "172.29.236.6/32" | 12:01 |
derekokeeffe | I changed it from a 193 address yesterday | 12:02 |
derekokeeffe | Same as the container mgmt net | 12:02 |
f0o | I think there's an issue with the os_swift 'Create the swift system user' task. There's a chicken-and-egg problem in the swift-proxy container because it tries to execute ssh-keygen before installing it first | 12:03 |
noonedeadpunk | I assume you've changed DNS as well, right? | 12:03 |
f0o | so swift_pre_install creates the user with `generate_ssh_key: "yes"` but openssh-client is only installed in swift_install ... that seems like a chicken-egg issue | 12:05 |
derekokeeffe | I haven't set up any DNS entries yet, should I have? | 12:05 |
f0o | Just verified, if I manually install openssh-client in the swift-proxy lxc the playbook passes the step | 12:06 |
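Until the role itself is fixed, a throwaway play along these lines would avoid installing the package by hand on every proxy; this is entirely hypothetical - the group name and the idea of running it just before the swift playbooks are assumptions, adjust to your inventory:

```yaml
# workaround-openssh-client.yml -- hypothetical ad-hoc play, not part of openstack-ansible
- hosts: swift_proxy  # assumption: use whatever group holds your swift proxy containers/hosts
  gather_facts: false
  become: true
  tasks:
    - name: Ensure openssh-client exists before os_swift generates the swift user's ssh key
      ansible.builtin.apt:
        name: openssh-client
        state: present
        update_cache: true
```

Run from the deployment host before os-swift-install.yml, e.g. with `openstack-ansible workaround-openssh-client.yml`.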
f0o | I have a feeling os_swift is broken in many ways - it's trying to upload the rings from the deployment host but didn't add the deployment host's ssh key to authorized_keys... am I missing something? | 12:13 |
noonedeadpunk | frankly - I hardly ever used swift | 12:36 |
noonedeadpunk | know nothing about it, except that it passes CI somehow | 12:36 |
f0o | I start to doubt myself so much right now... I think my config is correct... but I get a lot of odd errors. So I just did the lxc-destroy playbook to remove the proxy containers and start from scratch | 12:38 |
f0o | I've also had issues with the actual storage hosts (not containers) not finding the br-storage, but it's right there in `ip l` with IP and all because nova uses it for cinder :D | 12:38 |
noonedeadpunk | but I think you might be right about race conditions, as we test in AIO | 12:40 |
noonedeadpunk | and multinode can have quirks | 12:41 |
noonedeadpunk | also I wonder if we're still using regular ssh keys for swift... as I'd expect that to have been replaced with ssh certs | 12:41 |
f0o | I can live with the openssh-client one, I can install that on the proxies real quick. But the rsync from deployment host to proxies for the rings is a much bigger issue for me - and I have no idea where to even start with the storage hosts not finding br-storage | 12:42 |
noonedeadpunk | but we could miss swift when doing keystone/nova conversion to certs | 12:42 |
darkhackernc | noonedeadpunk, https://bugs.launchpad.net/openstack-ansible/+bug/2106625 | 12:46 |
darkhackernc | why do instances lose connectivity after a nova-compute service restart? a restart of openvswitch restores the connectivity | 12:47 |
darkhackernc | any known issues? | 12:48 |
jrosser | f0o: i don't think i ever looked at the swift role when we did SSH CAs | 13:08 |
jrosser | if there is code there that tries to generate and distribute ssh keys across all the swift hosts then we can totally do better than that | 13:08 |
f0o | yay :D | 13:09 |
f0o | at least it means I'm not going insane | 13:10 |
jrosser | i am really not sure anyone has used the os_swift role for real in a very long time | 13:11 |
jrosser | most people are ceph/radosgw | 13:11 |
f0o | ceph seems like a very big overhead to just add cheap object storage... I get it if ceph is also the cinder backend | 13:12 |
f0o | but yeah it seems that os_swift is nonfunctional outside of AIO | 13:12 |
jrosser | i'm really just saying that the swift role almost certainly needs some work | 13:12 |
jrosser | and i am currently completely -ETIME for looking at it, plus my clouds don't use it | 13:13 |
jrosser | if you are interested to see how it should work, the keystone role uses rsync to copy the fernet keys between hosts and we set that up nicely so that you don't need a "full mesh" of installed public keys | 13:14 |
noonedeadpunk | f0o: I could look at it but after the PTG week only | 13:14 |
jrosser | the same approach should be lift/shift into swift | 13:14 |
noonedeadpunk | and given that some more information is provided on issues | 13:14 |
f0o | I'll have a crack at it. There might be some Qs here and there at points where I'm lost (namely `Swift storage address not found` when it absolutely does exist) but at least the user-add and ssh-ca I should be able to handle | 13:16 |
f0o | will have a go tomorrow, worst case I learn some ansible :P | 13:18 |
f0o | but now it's Lil Lordag as we say, time for a beer on a wednesday | 13:18 |
noonedeadpunk | f0o: so that "failure" boils down to this variable: https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L287 | 13:18 |
f0o | noonedeadpunk: https://paste.opendev.org/show/bqJJUFVI4dcxT6Iko9Rp/ | 13:20 |
noonedeadpunk | and that's what we have in AIO: https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/conf.d/swift.yml.aio#L2 | 13:20 |
f0o | yup that's what I took as template | 13:21 |
noonedeadpunk | did you get failure for the LXC or bare metal host? | 13:21 |
f0o | noonedeadpunk: do I require the container_vars>swift_vars>... objects? I assumed they would be flat for hosts (non-lxc deployment) | 13:21 |
noonedeadpunk | um, no you don't | 13:22 |
noonedeadpunk | I also think that zone/drives can be moved to group_vars easily | 13:23 |
f0o | probably, this was just a short copy-paste. the zones do change down the line as the hosts are in different racks | 13:24 |
noonedeadpunk | and also I think they should be under `swift` variable | 13:24 |
f0o | no matter, I think my plan right now is to solve the user creation & SSH-CA first; then deal with the storage address and then iterative with the rest. | 13:25 |
noonedeadpunk | as in that example, just having all that stuff inside of a single var: https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L287 | 13:25 |
f0o | you mean h4_2>swift_vars>zone:123 ? | 13:25 |
jrosser | it is probably good to generally put things more in /openstack_deploy/group_vars than in user_variables | 13:25 |
noonedeadpunk | So when you mix up user_variables with stuff from openstack_user_config - you're likely to have conflicts | 13:25 |
noonedeadpunk | yeah | 13:26 |
noonedeadpunk | I think you need all that to be part of `swift` variable like described in the role | 13:26 |
jrosser | i would avoid totally where possible putting any vars in openstack_user_config | 13:26 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L273-L287 | 13:26 |
noonedeadpunk | yeah, it's a weird pattern we support for some reason... | 13:27 |
noonedeadpunk | from really early days I believe | 13:27 |
jrosser | right - i do remember the time the group/host_vars became possible | 13:27 |
jrosser | so it might be a thing that was for the same purpose directly in the inventory before that | 13:27 |
f0o | :D | 13:28 |
noonedeadpunk | f0o: also just to double check - you have seen https://docs.openstack.org/openstack-ansible-os_swift/latest/configure-swift.html ? | 13:28 |
f0o | I have, which led to a few more Qs than As | 13:28 |
noonedeadpunk | yeah, right | 13:28 |
f0o | because I'm not 100% sure on the meaning of container_vars when you use baremetal deployment for the storage hosts | 13:29 |
noonedeadpunk | I already see it's outdated... | 13:29 |
noonedeadpunk | and I see where you took the approach you took | 13:29 |
f0o | also really uncertain if `swift:{}` needs to be in global_overrides or not | 13:30 |
f0o | just adds to the Qs | 13:30 |
f0o | I now wonder what the compute overhead of ceph is for "simple" object storage... | 13:30 |
f0o | :D | 13:31 |
noonedeadpunk | no, just place it in user_variables or group_vars | 13:32 |
noonedeadpunk | `swift` is a regular variable | 13:32 |
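Roughly, then, something like the following in group_vars; the keys mirror the role defaults linked earlier, while the path, group name, device names, part_power and the single policy are placeholders rather than a recommendation:

```yaml
# /etc/openstack_deploy/group_vars/swift_all/swift.yml  (path and group name are assumptions)
swift:
  storage_network: br-storage
  part_power: 8
  mount_point: /srv/node
  drives:
    - name: sdb
    - name: sdc
  storage_policies:
    - policy:
        name: default
        index: 0
        default: True
```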
f0o | but alright, my plan is to delve into the ssh-ca taking keystone fernet keys as example and also how the user is created elsewhere. I should be able to create a patch fixing that part at least by tmr afternoon. | 13:32 |
noonedeadpunk | and the docs there are indeed written for times where group_vars were not possible | 13:33 |
noonedeadpunk | so it tried to account for different variable content based on the group | 13:33 |
noonedeadpunk | so that `swift` was different for different groups | 13:33 |
noonedeadpunk | (or different hosts) | 13:34 |
f0o | alright the sun is out and I'm off for that lil lordag beer. Thanks for the pointers and I'll try to fix it myself tomorrow but worst case ask a bunch of silly Qs again (: | 13:35 |
noonedeadpunk | I can try adapting the docs for the modern usecase, but then I have very little idea about what it should be as I've close to never actually used swift | 13:35 |
f0o | I'll have a beer for you guys as well :) | 13:35 |
f0o | catch you all later :) | 13:36 |
noonedeadpunk | yeah, just shout the question | 13:36 |
noonedeadpunk | if any, I'd be glad to help with sorting things out and giving some love to our roles/docs | 13:36 |
mgariepy | well my issue with placement and host-aggregate is due to hypervisor_hostname being the fqdn for the hosts in my 10 year old production cluster. | 13:38 |
noonedeadpunk | I _think_ it's generally not an issue..... | 13:38 |
noonedeadpunk | we have aggregates working with fqdn in hypervisors and short hostname in the compute service list | 13:38 |
noonedeadpunk | (and then in the aggregate they somehow match) | 13:39 |
noonedeadpunk | also in quite old envs... | 13:39 |
noonedeadpunk | so _generally_ I'd expect different hostnames not being a deal breaker today | 13:39 |
mgariepy | the aggregates did not migrate in my production system, i had to add all the computes to the correct aggregate | 13:42 |
mgariepy | when i look at my testbed with short names for everything the placement aggregate is there. | 13:46 |
mgariepy | for the other one, none of the fqdn hosts were added to aggregates in placement. | 13:46 |
mgariepy | might be fixed in newer release i guess. | 13:46 |
noonedeadpunk | yeah, indeed it could be just old release... | 14:16 |
mgariepy | maybe indeed | 14:46 |
noonedeadpunk | Folks, PTG session is live right now: https://meetpad.opendev.org/apr2025-ptg-os-ansible | 15:03 |
noonedeadpunk | so don't be shy to jump in :) | 15:03 |
jrosser | on my way sorry | 15:03 |
jrosser | back-to-back meetings | 15:03 |
NeilHanlon | on me way | 15:03 |
noonedeadpunk | I clean forgot to send a ML with info this time :( | 15:04 |