Friday, 2023-11-10

opendevreviewMagnus Lööf proposed openstack/kolla-ansible master: Enable TLS backend for designate  https://review.opendev.org/c/openstack/kolla-ansible/+/866524 07:42
jovialspatel: I think you'll need to use indirection, since playbook group_vars have precedence over inventory group_vars: enable_neutron_sriov: "{{ enable_neutron_sriov_override | default(false) }}" and put enable_neutron_sriov_override in the inventory group_vars. The other option is to customize the inventory. 09:28
jovialsylvr-remote: What was the latest error? That issue with generating the seed inventory? I might need to see some more of the ansible output to work out what is going on there.09:30
jovialsylvr: The only thing I can think of is if the ansible variable kolla_config_path was overridden to something that didn't match the environment variable: KOLLA_CONFIG_PATH09:36
opendevreviewMark Goddard proposed openstack/kayobe stable/2023.1: Fix setting kolla_admin_openrc_cacert  https://review.opendev.org/c/openstack/kayobe/+/900584 10:05
opendevreviewMark Goddard proposed openstack/kayobe stable/zed: Fix setting kolla_admin_openrc_cacert  https://review.opendev.org/c/openstack/kayobe/+/900585 10:06
opendevreviewMark Goddard proposed openstack/kayobe stable/yoga: Fix setting kolla_admin_openrc_cacert  https://review.opendev.org/c/openstack/kayobe/+/900586 10:06
opendevreviewMark Goddard proposed openstack/kayobe stable/yoga: Fix setting kolla_admin_openrc_cacert  https://review.opendev.org/c/openstack/kayobe/+/900586 10:09
opendevreviewMark Goddard proposed openstack/kayobe stable/yoga: Fix setting kolla_admin_openrc_cacert  https://review.opendev.org/c/openstack/kayobe/+/900586 10:10
opendevreviewMark Goddard proposed openstack/kayobe stable/2023.1: Generate local Kolla Ansible config in check mode  https://review.opendev.org/c/openstack/kayobe/+/900587 10:13
opendevreviewMark Goddard proposed openstack/kayobe stable/zed: Generate local Kolla Ansible config in check mode  https://review.opendev.org/c/openstack/kayobe/+/900588 10:13
opendevreviewMark Goddard proposed openstack/kayobe stable/yoga: Generate local Kolla Ansible config in check mode  https://review.opendev.org/c/openstack/kayobe/+/900589 10:13
opendevreviewMark Goddard proposed openstack/kayobe stable/2023.1: dev: Improve error checking for config check functions  https://review.opendev.org/c/openstack/kayobe/+/900590 10:14
opendevreviewMark Goddard proposed openstack/kayobe stable/zed: dev: Improve error checking for config check functions  https://review.opendev.org/c/openstack/kayobe/+/900591 10:14
opendevreviewMark Goddard proposed openstack/kayobe stable/yoga: dev: Improve error checking for config check functions  https://review.opendev.org/c/openstack/kayobe/+/900592 10:15
opendevreviewMark Goddard proposed openstack/kayobe stable/2023.1: Improve neutron images regex  https://review.opendev.org/c/openstack/kayobe/+/900593 10:16
opendevreviewMark Goddard proposed openstack/kayobe stable/zed: Improve neutron images regex  https://review.opendev.org/c/openstack/kayobe/+/900594 10:17
opendevreviewWill Szumski proposed openstack/kayobe master: Adds initial support for vGPUs  https://review.opendev.org/c/openstack/kayobe/+/887200 11:00
janguttermnasiadka: I've just spent a short time digging into the ironic failures. I've followed the thread to Ironic not finding the tenks node...  https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/kolla/ironic/ironic-api-wsgi.txt#806 11:59
jangutterI'll come back to this in a bit (need to figure out what tenks is first), but if this rings a bell for someone more familiar with it, it might be a clue. 12:00
jovialjangutter, possibly that happens when tenks is checking if the node is already registered? We seem to see it in the baremetal node list output, but there is a problem with inspector: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/kolla/ironic-inspector/ironic-inspector.txt#811 12:27
jovialinspector not happy: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/ansible/test-ironic#71-74 12:27
opendevreviewVerification of a change to openstack/kayobe master failed: veth: Remove support for EL8 / network-scripts  https://review.opendev.org/c/openstack/kayobe/+/899888 12:29
jangutterjovial: ah, I messed up the time sequence: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/kolla/ironic-inspector/ironic-inspector.txt#773 <-- ironic-inspector isn't chatting with ironic either.12:52
jovialjangutter, timestamps seem to indicate that ironic-inspector started before ironic-api. I wonder if we need some kind of RestartPolicy, as it seems to just shut down right away and never start back up. 12:56
jangutterAnd in the one that works, ironic-inspector restarted.12:57
jovialInteresting, could it have hung on shutdown? 12:58
jangutterworking: https://zuul.opendev.org/t/openstack/build/a6a4297bf72e455680e7aae1d81f6793/log/primary/logs/container_logs/ironic_inspector.txt , broken: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/container_logs/ironic_inspector.txt 12:58
jangutterAaah, that triggered during 'reconfigure' on the working (ubuntu) side13:00
jangutter(think it's a red herring maybe)13:01
jangutteryeah, on ubuntu ironic-inspector started slightly after ironic wsgi, and vice versa on the ones that failed.13:21
jangutterjovial: https://etherpad.opendev.org/p/kolla-ansible-rocky9-ironic-debug <--- keeping my notes here13:30
opendevreviewMerged openstack/kayobe stable/zed: Improve neutron images regex  https://review.opendev.org/c/openstack/kayobe/+/900594 13:30
opendevreviewJan Gutter proposed openstack/kolla-ansible master: Test Ironic  https://review.opendev.org/c/openstack/kolla-ansible/+/900616 13:45
opendevreviewVerification of a change to openstack/kayobe stable/zed failed: Fix setting kolla_admin_openrc_cacert  https://review.opendev.org/c/openstack/kayobe/+/900585 13:50
opendevreviewMerged openstack/kayobe stable/yoga: dev: Improve error checking for config check functions  https://review.opendev.org/c/openstack/kayobe/+/900592 14:00
opendevreviewMerged openstack/kayobe stable/zed: dev: Improve error checking for config check functions  https://review.opendev.org/c/openstack/kayobe/+/900591 14:10
opendevreviewMerged openstack/kayobe stable/2023.1: Generate local Kolla Ansible config in check mode  https://review.opendev.org/c/openstack/kayobe/+/900587 14:18
opendevreviewMerged openstack/kayobe stable/2023.1: Improve neutron images regex  https://review.opendev.org/c/openstack/kayobe/+/900593 14:19
opendevreviewMerged openstack/kayobe stable/2023.1: Fix setting kolla_admin_openrc_cacert  https://review.opendev.org/c/openstack/kayobe/+/900584 14:31
opendevreviewVerification of a change to openstack/kayobe stable/2023.1 failed: dev: Improve error checking for config check functions  https://review.opendev.org/c/openstack/kayobe/+/900590 14:31
jovialjangutter, This looks suspect: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/system_logs/docker-info.txt#6775-6778 14:52
jovialnormally, Name would be "unless-stopped"14:53
jangutterooh yeah, likely that's the one.14:53
jovialbut I'm not sure if the move to systemd units changed anything?14:54
jovialDoes systemd handle the restart policy these days? 14:56
jovialEither way, it is in the exited state :D14:57
jovialI didn't know we did this: https://github.com/openstack/kolla-ansible/blob/6a737b19686c821c32778bb847c6548d51eef002/tests/templates/globals-default.j2#L5 15:02
jangutterI'm suffering a bit of PTSD myself this week on hearing "all services must be restartable". 15:05
jangutterI think the service cannot determine at this stage whether it's a bad config, or an intermittent error, so it _must_ retry. Problem is to avoid thundering herd when you do maintenance.15:07
sylvrHi ! I'm here again about my issue : Kolla inventory ostack/src/kayobe-config/etc/kayobe/kolla/inventory/seed is invalid: Path does not exist when running `kayobe seed service deploy`15:45
jovialsylvr: Could you post the output of `kayobe configuration dump -l localhost --var-name kolla_config_path`?15:46
sylvr[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details {     "localhost": "ostack/src/kayobe-config/etc/kayobe/kolla" }15:47
spatelmmalchuk Hey are you there?15:47
spatelI have very stupid question and didn't find answer from google or anywhere :)15:48
spatelHow do I get rid of openvswitch-agent on sriov compute nodes?15:48
jovialspatel: I think you'll need to use indirection, since playbook group_vars have precedence over inventory group_vars: enable_neutron_sriov: "{{ enable_neutron_sriov_override | default(false) }}" and put enable_neutron_sriov_override in the inventory group_vars. The other option is to customize the inventory. 15:49
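A minimal sketch of the indirection jovial describes. It assumes the top-level definition lives in a file that takes precedence over inventory group_vars (globals.yml is used here only as an example) and that the SR-IOV hosts sit in an inventory group named sriov_compute. The file locations and the group name are illustrative assumptions, not something stated in the channel.

```yaml
# globals.yml (assumed location of the higher-precedence definition)
# Resolves to the per-group override when one is set, otherwise to false.
enable_neutron_sriov: "{{ enable_neutron_sriov_override | default(false) }}"
---
# inventory/group_vars/sriov_compute.yml (hypothetical path and group name)
# Only hosts in this group see the override, so only they get SR-IOV enabled.
enable_neutron_sriov_override: true
```

The point of the extra variable is that the inventory never has to win a precedence fight over enable_neutron_sriov itself; it only supplies a value that the higher-precedence definition reads.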
jovialsylvr: looks like a relative path ... how did that happen? Did you source kayobe-env?15:50
sylvryes ! 15:50
spateljovial Thank you for the reply. But how do I tell kolla to not deploy neutron-openvswitch-agent on SRIOV nodes? 15:51
SvenKieskespatel: I read that over on the mailing list. question: how does your network setup work without openvswitch-agent? what are you using?15:51
spatelI have sriov compute nodes they don't need openvswitch-agent 15:52
sylvrI can try to remove the conf, but updating it fixed a previous issue, I could try tho15:52
spatelsriov pass PCI bus directly to vm 15:52
spatelI only need neutron-sriov-agent on all my sriov compute nodes (there are no role of any openvswitch agent)15:53
spatelI did remove by hand on all sriov compute and it works without issue... 15:53
spatelLook like kolla-ansible by default deploying openvswitch-agent on all compute no matter what. 15:54
SvenKieskethat assumes that you only schedule sriov enabled VMs on such a compute node15:55
spatelI wish there is a flag to tell kolla-ansible in host_vars or group_vars to not deploy openvswitch agent 15:55
spatelSvenKieske Yes you are correct. I have dedicated SRIOV nodes and they only run SRIOV workload 15:56
SvenKieskeI guess nobody has thought about that scenario just yet15:56
spatel:)15:56
SvenKieskeshould be possible to add a subgroup to compute_nodes that consist of sriov_exclusive nodes where ovs-agent will not be deployed...maybe :)15:57
spatelSvenKieske hmm! tell me how to do that.. :)15:57
SvenKieskethat is basically, from a 10.000 feet point of view, only ansible inventory refactoring and adding/changing a few variables15:57
SvenKieskethe thing is, I don't know how nova/neutron behaves if suddenly there is no ovs-agent on one compute node, if that is even supported, I think you may need to change some more configs there too..15:58
SvenKieskemaybe ask first in #openstack-neutron from the networking side15:59
SvenKieskeI really have no idea :)15:59
spatelinstead of ovs-agent there will be sriov-agent so what is the big deal?15:59
opendevreviewDawud proposed openstack/kolla-ansible master: Remove the `grafana` volume  https://review.opendev.org/c/openstack/kolla-ansible/+/899136 15:59
spatelI will ask in neutron channel 16:00
jovialsylvr: I don't quite get how you've managed to get a relative path. This: https://github.com/openstack/kayobe-config/blob/master/kayobe-env#L15-L19, should give you an absolute path. You haven't set the variable: kolla_config_path at all? 16:01
sylvrjovial: still not working, I don't think removing the line from the config file changed anything, I've already exported the env var16:01
sylvrooh, you should put an absolute path in the env var and the config?16:01
jovialsylvr: kayobe normally sets this to value of the environment variable16:02
jovialhttps://github.com/openstack/kayobe/blob/master/ansible/inventory/group_vars/all/kolla#L47 16:03
spatelSvenKieske lets see if someone replies. But it would be good if you reply to my mailing list thread so other people can chime in :) 16:03
jovialsylvr: I think any absolute path would work, but the environment variable must match the one defined in ansible 16:05
SvenKieskespatel: I see if I can make time for that. As this is really at first a neutron question it is also good to add the [neutron] tag to your subject line in such mails, so neutron people see it as well :)16:05
spatelLet me open new thread.. 16:05
spatelSvenKieske I did test removing all neutron-openvswitch-agent from compute nodes by hand and my cloud still functional without any issue or error :) just for your info that there are no dependency of that agent anymore. 16:06
sylvrjovial: I'll try with absolute path16:06
spatelAgain this is very special case.. (where you only looking for sriov workload and no VxLAN or any other neutron features) 16:07
spatelReason I am looking for this option is because I have 300 compute nodes and now imagine I have 600 network agents showing up. It just puts extra pressure on my neutron-servers :( 16:08
jovialsylvr: Normally I wouldn't set the kolla_config_path and would just use what is configured with kayobe-env16:09
sylvryep, that broke some things16:20
sylvrrunning `kayobe control host bootstrap` returns this error : https://pastebin.com/PgdjSm5p 16:22
sylvrand kayobe env isn't set to anything 16:38
jovialsylvr, I'm confused. Setting kolla_config_path was the only change?16:41
sylvrI had both shell var Kolla config and kayobe config paths16:42
jangutterbtw jovial: looks like there's more than one type of failure in that ironic job :-/16:43
sylvrI modified kayobe/globals.yml for the kayobe config path too16:45
sylvrand the only issue was when trying to deploy service on the seed16:45
jovialsylvr: Could we try and leave those variables undefined in kayobe-config and use kayobe defaults? It looks like a templating issue, but it's not obvious why... 16:52
jovialjangutter, yikes, looks like this time it failed trying to power on the ironic node: https://zuul.opendev.org/t/openstack/build/8b8efd7c1db947a889448fd1d9d87559/log/primary/logs/kolla/ironic/ironic-conductor.txt#2196 17:44
jangutteryeah, and this time it started after ironic conductor.17:45
jovialvbmc logs are unhelpfully empty: https://zuul.opendev.org/t/openstack/build/8b8efd7c1db947a889448fd1d9d87559/log/primary/logs/system_logs/tenks/vbmc-tk0.txt 17:46
jovialThere is this: libvirt:  error : Cannot set interface flags on 'macvtap1': Value too large for defined data type17:49
jovialWe needed this in kayobe: https://github.com/openstack/kayobe/commit/990370a3673b8bdf4882816926868dd7b422db60 17:51
joviallibvirt_vm_trust_guest_rx_filters: false17:51
jovialShould we try adding that to: https://github.com/openstack/kolla-ansible/blob/3e0014a7ea55651703eb03a5e8a513105a4de4aa/tests/templates/tenks-deploy-config.yml.j2#L4 ? 17:52
jovial^ Not that line, but in that file17:52
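A sketch of the override jovial is proposing, assuming it is added as a top-level variable in the Tenks deploy config template linked above. Where exactly it belongs in that file was left open in the discussion, so the placement shown is an assumption.

```yaml
# tests/templates/tenks-deploy-config.yml.j2 (position within the file is an assumption)
# Same setting as the Kayobe commit referenced above; tried here as a possible
# workaround for the "Cannot set interface flags on 'macvtap1'" libvirt error.
libvirt_vm_trust_guest_rx_filters: false
```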
spateljovial I have a different question: is there any performance difference between running VMs inside the libvirt container vs on metal itself? 17:53
jovialNot that I am aware of, the container uses the host PID namespace for qemu processes IIRC17:55
jovialSo I can't imagine there would be a significant difference17:58
opendevreviewMerged openstack/kayobe master: veth: Remove support for EL8 / network-scripts  https://review.opendev.org/c/openstack/kayobe/+/899888 18:01
