opendevreview | Magnus Lööf proposed openstack/kolla-ansible master: Enable TLS backend for designate https://review.opendev.org/c/openstack/kolla-ansible/+/866524 | 07:42 |
---|---|---|
jovial | spatel: I think you'll need to use indirection, since playbook group_vars have precedence over inventory group_vars: enable_neutron_sriov: "{{ enable_neutron_sriov_override | default(false) }}" and put enable_neutron_sriov_override in the inventory group_vars. The other option is customize the inventory. | 09:28 |
jovial | sylvr-remote: What was the latest error? That issue with generating the seed inventory? I might need to see some more of the ansible output to work out what is going on there. | 09:30 |
jovial | sylvr: The only thing I can think of is if the ansible variable kolla_config_path was overridden to something that didn't match the environment variable: KOLLA_CONFIG_PATH | 09:36 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/2023.1: Fix setting kolla_admin_openrc_cacert https://review.opendev.org/c/openstack/kayobe/+/900584 | 10:05 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/zed: Fix setting kolla_admin_openrc_cacert https://review.opendev.org/c/openstack/kayobe/+/900585 | 10:06 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/yoga: Fix setting kolla_admin_openrc_cacert https://review.opendev.org/c/openstack/kayobe/+/900586 | 10:06 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/yoga: Fix setting kolla_admin_openrc_cacert https://review.opendev.org/c/openstack/kayobe/+/900586 | 10:09 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/yoga: Fix setting kolla_admin_openrc_cacert https://review.opendev.org/c/openstack/kayobe/+/900586 | 10:10 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/2023.1: Generate local Kolla Ansible config in check mode https://review.opendev.org/c/openstack/kayobe/+/900587 | 10:13 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/zed: Generate local Kolla Ansible config in check mode https://review.opendev.org/c/openstack/kayobe/+/900588 | 10:13 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/yoga: Generate local Kolla Ansible config in check mode https://review.opendev.org/c/openstack/kayobe/+/900589 | 10:13 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/2023.1: dev: Improve error checking for config check functions https://review.opendev.org/c/openstack/kayobe/+/900590 | 10:14 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/zed: dev: Improve error checking for config check functions https://review.opendev.org/c/openstack/kayobe/+/900591 | 10:14 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/yoga: dev: Improve error checking for config check functions https://review.opendev.org/c/openstack/kayobe/+/900592 | 10:15 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/2023.1: Improve neutron images regex https://review.opendev.org/c/openstack/kayobe/+/900593 | 10:16 |
opendevreview | Mark Goddard proposed openstack/kayobe stable/zed: Improve neutron images regex https://review.opendev.org/c/openstack/kayobe/+/900594 | 10:17 |
opendevreview | Will Szumski proposed openstack/kayobe master: Adds initial support for vGPUs https://review.opendev.org/c/openstack/kayobe/+/887200 | 11:00 |
jangutter | mnasiadka: I've just spent a short time digging into the ironic failures. I've followed the thread to Ironic not finding the tenks node... https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/kolla/ironic/ironic-api-wsgi.txt#806 | 11:59 |
jangutter | I'll come back to this in a bit (need to figure out what tenks is first), but if this rings a bell for someone more familiar with it, it might be a clue. | 12:00 |
jovial | jangutter, possibly that happens when tenks is checking if the node is already registered? We seem to see it in the baremetal node list output, but there is a problem with inspector: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/kolla/ironic-inspector/ironic-inspector.txt#811 | 12:27 |
jovial | inspector not happy: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/ansible/test-ironic#71-74 | 12:27 |
opendevreview | Verification of a change to openstack/kayobe master failed: veth: Remove support for EL8 / network-scripts https://review.opendev.org/c/openstack/kayobe/+/899888 | 12:29 |
jangutter | jovial: ah, I messed up the time sequence: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/kolla/ironic-inspector/ironic-inspector.txt#773 <-- ironic-inspector isn't chatting with ironic either. | 12:52 |
jovial | jangutter, timestamps seem to indicate that ironic-inspector started before ironic-api. I wonder if we need some kind of RestartPolicy as it seems to just shut down right away and never start back up. | 12:56 |
jangutter | And in the one that works, ironic-inspector restarted. | 12:57 |
jovial | Interesting, could it have hung on shutdown? | 12:58 |
jangutter | working: https://zuul.opendev.org/t/openstack/build/a6a4297bf72e455680e7aae1d81f6793/log/primary/logs/container_logs/ironic_inspector.txt , broken: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/container_logs/ironic_inspector.txt | 12:58 |
jangutter | Aaah, that triggered during 'reconfigure' on the working (ubuntu) side | 13:00 |
jangutter | (think it's a red herring maybe) | 13:01 |
jangutter | yeah, on ubuntu ironic-inspector started slightly after ironic wsgi, and vice versa on the ones that failed. | 13:21 |
jangutter | jovial: https://etherpad.opendev.org/p/kolla-ansible-rocky9-ironic-debug <--- keeping my notes here | 13:30 |
opendevreview | Merged openstack/kayobe stable/zed: Improve neutron images regex https://review.opendev.org/c/openstack/kayobe/+/900594 | 13:30 |
opendevreview | Jan Gutter proposed openstack/kolla-ansible master: Test Ironic https://review.opendev.org/c/openstack/kolla-ansible/+/900616 | 13:45 |
opendevreview | Verification of a change to openstack/kayobe stable/zed failed: Fix setting kolla_admin_openrc_cacert https://review.opendev.org/c/openstack/kayobe/+/900585 | 13:50 |
opendevreview | Merged openstack/kayobe stable/yoga: dev: Improve error checking for config check functions https://review.opendev.org/c/openstack/kayobe/+/900592 | 14:00 |
opendevreview | Merged openstack/kayobe stable/zed: dev: Improve error checking for config check functions https://review.opendev.org/c/openstack/kayobe/+/900591 | 14:10 |
opendevreview | Merged openstack/kayobe stable/2023.1: Generate local Kolla Ansible config in check mode https://review.opendev.org/c/openstack/kayobe/+/900587 | 14:18 |
opendevreview | Merged openstack/kayobe stable/2023.1: Improve neutron images regex https://review.opendev.org/c/openstack/kayobe/+/900593 | 14:19 |
opendevreview | Merged openstack/kayobe stable/2023.1: Fix setting kolla_admin_openrc_cacert https://review.opendev.org/c/openstack/kayobe/+/900584 | 14:31 |
opendevreview | Verification of a change to openstack/kayobe stable/2023.1 failed: dev: Improve error checking for config check functions https://review.opendev.org/c/openstack/kayobe/+/900590 | 14:31 |
jovial | jangutter, This looks suspect: https://zuul.opendev.org/t/openstack/build/735e7425501d44819d221838998b69a5/log/primary/logs/system_logs/docker-info.txt#6775-6778 | 14:52 |
jovial | normally, Name would be "unless-stopped" | 14:53 |
jangutter | ooh yeah, likely that's the one. | 14:53 |
jovial | but I'm not sure if the move to systemd units changed anything? | 14:54 |
jovial | Does systemd handle the restart policy these days? | 14:56 |
jovial | Either way, it is in the exited state :D | 14:57 |
jovial | I didn't know we did this: https://github.com/openstack/kolla-ansible/blob/6a737b19686c821c32778bb847c6548d51eef002/tests/templates/globals-default.j2#L5 | 15:02 |
jangutter | I'm suffering a bit of PTSD myself this week on hearing "all services must be restartable". | 15:05 |
jangutter | I think the service cannot determine at this stage whether it's a bad config, or an intermittent error, so it _must_ retry. Problem is to avoid thundering herd when you do maintenance. | 15:07 |
sylvr | Hi ! I'm here again about my issue : Kolla inventory ostack/src/kayobe-config/etc/kayobe/kolla/inventory/seed is invalid: Path does not exist when running `kayobe seed service deploy` | 15:45 |
jovial | sylvr: Could you post the output of `kayobe configuration dump -l localhost --var-name kolla_config_path`? | 15:46 |
sylvr | [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details { "localhost": "ostack/src/kayobe-config/etc/kayobe/kolla" } | 15:47 |
spatel | mmalchuk Hey are you there? | 15:47 |
spatel | I have very stupid question and didn't find answer from google or anywhere :) | 15:48 |
spatel | How do I get rid of openvswitch-agent on sriov compute nodes? | 15:48 |
jovial | spatel: I think you'll need to use indirection, since playbook group_vars have precedence over inventory group_vars: enable_neutron_sriov: "{{ enable_neutron_sriov_override | default(false) }}" and put enable_neutron_sriov_override in the inventory group_vars. The other option is customize the inventory | 15:49 |
jovial | sylvr: looks like a relative path ... how did that happen? Did you source kayobe-env? | 15:50 |
sylvr | yes ! | 15:50 |
spatel | jovial Thank you for the reply. But how do I tell kolla to not deploy neutron-openvswitch-agent on SRIOV nodes? | 15:51 |
SvenKieske | spatel: I read that over on the mailing list. question: how does your network setup work without openvswitch-agent? what are you using? | 15:51 |
spatel | I have sriov compute nodes they don't need openvswitch-agent | 15:52 |
sylvr | I can try to remove the conf, but updating it fixed a previous issue, I could try tho | 15:52 |
spatel | sriov pass PCI bus directly to vm | 15:52 |
spatel | I only need neutron-sriov-agent on all my sriov compute nodes (there is no role for any openvswitch agent) | 15:53 |
spatel | I did remove by hand on all sriov compute and it works without issue... | 15:53 |
spatel | Looks like kolla-ansible by default deploys openvswitch-agent on all compute nodes no matter what. | 15:54 |
SvenKieske | that assumes that you only schedule sriov enabled VMs on such a compute node | 15:55 |
spatel | I wish there is a flag to tell kolla-ansible in host_vars or group_vars to not deploy openvswitch agent | 15:55 |
spatel | SvenKieske Yes you are correct. I have dedicated SRIOV nodes and they only run SRIOV workload | 15:56 |
SvenKieske | I guess nobody has thought about that scenario just yet | 15:56 |
spatel | :) | 15:56 |
SvenKieske | should be possible to add a subgroup to compute_nodes that consist of sriov_exclusive nodes where ovs-agent will not be deployed...maybe :) | 15:57 |
spatel | SvenKieske hmm! tell me how to do that.. :) | 15:57 |
SvenKieske | that is basically, from a 10.000 feet point of view, only ansible inventory refactoring and adding/changing a few variables | 15:57 |
SvenKieske | the thing is, I don't know how nova/neutron behaves if suddenly there is no ovs-agent on one compute node, if that is even supported, I think you may need to change some more configs there too.. | 15:58 |
SvenKieske | maybe ask first in #openstack-neutron from the networking side | 15:59 |
SvenKieske | I really have no idea :) | 15:59 |
spatel | instead of ovs-agent there will be sriov-agent so what is the big deal? | 15:59 |
opendevreview | Dawud proposed openstack/kolla-ansible master: Remove the `grafana` volume https://review.opendev.org/c/openstack/kolla-ansible/+/899136 | 15:59 |
spatel | I will ask in neutron channel | 16:00 |
jovial | sylvr: I don't quite get how you've managed to get a relative path. This: https://github.com/openstack/kayobe-config/blob/master/kayobe-env#L15-L19, should give you an absolute path. You haven't set the variable: kolla_config_path at all? | 16:01 |
sylvr | jovial: still not working, I don't think removing the line from the config file changed anything, I've already exported the env var | 16:01 |
sylvr | ooh, you should put an absolute path in the env var and the config? | 16:01 |
jovial | sylvr: kayobe normally sets this to value of the environment variable | 16:02 |
jovial | https://github.com/openstack/kayobe/blob/master/ansible/inventory/group_vars/all/kolla#L47 | 16:03 |
spatel | SvenKieske let's see if someone replies. But it would be good if you reply to my mailing list thread so other people can chime in :) | 16:03 |
jovial | sylvr: I think any absolute path would work, but the environment variable must match the one defined in ansible | 16:05 |
SvenKieske | spatel: I see if I can make time for that. As this is really at first a neutron question it is also good to add the [neutron] tag to your subject line in such mails, so neutron people see it as well :) | 16:05 |
spatel | Let me open new thread.. | 16:05 |
spatel | SvenKieske I did test removing all neutron-openvswitch-agent from compute nodes by hand and my cloud is still functional without any issue or error :) just for your info that there is no dependency on that agent anymore. | 16:06 |
sylvr | jovial: I'll try with absolute path | 16:06 |
spatel | Again this is very special case.. (where you only looking for sriov workload and no VxLAN or any other neutron features) | 16:07 |
spatel | Reason I am looking for this option is because I have 300 compute nodes and now imagine I have 600 network agents showing up. It just puts extra pressure on my neutron-servers :( | 16:08 |
jovial | sylvr: Normally I wouldn't set the kolla_config_path and would just use what is configured with kayobe-env | 16:09 |
sylvr | yep, that broke some things | 16:20 |
sylvr | running `kayobe control host bootstrap` return this error : https://pastebin.com/PgdjSm5p | 16:22 |
sylvr | and kayobe env isn't set to anything | 16:38 |
jovial | sylvr, I'm confused. Setting kolla_config_path was the only change? | 16:41 |
sylvr | I had both shell var Kolla config and kayobe config paths | 16:42 |
jangutter | btw jovial: looks like there's more than one type of failure in that ironic job :-/ | 16:43 |
sylvr | I modified kayobe/globals.yml for the kayobe config path too | 16:45 |
sylvr | and the only issue was when trying to deploy service on the seed | 16:45 |
jovial | sylvr: Could we try and leave those variables undefined in kayobe-config and use kayobe defaults? It looks like a templating issue, but it's not obvious why... | 16:52 |
jovial | jangutter, yikes, looks like this time it failed trying to power on the ironic node: https://zuul.opendev.org/t/openstack/build/8b8efd7c1db947a889448fd1d9d87559/log/primary/logs/kolla/ironic/ironic-conductor.txt#2196 | 17:44 |
jangutter | yeah, and this time it started after ironic conductor. | 17:45 |
jovial | vbmc logs are unhelpfully empty: https://zuul.opendev.org/t/openstack/build/8b8efd7c1db947a889448fd1d9d87559/log/primary/logs/system_logs/tenks/vbmc-tk0.txt | 17:46 |
jovial | There is this: libvirt: error : Cannot set interface flags on 'macvtap1': Value too large for defined data type | 17:49 |
jovial | We needed this in kayobe: https://github.com/openstack/kayobe/commit/990370a3673b8bdf4882816926868dd7b422db60 | 17:51 |
jovial | libvirt_vm_trust_guest_rx_filters: false | 17:51 |
jovial | Should we try adding that to: https://github.com/openstack/kolla-ansible/blob/3e0014a7ea55651703eb03a5e8a513105a4de4aa/tests/templates/tenks-deploy-config.yml.j2#L4? | 17:52 |
jovial | ^ Not that line, but in that file | 17:52 |
spatel | jovial I have different question, are there any performance difference running vms inside libvirt container vs metal itself? | 17:53 |
jovial | Not that I am aware of, the container uses the host PID namespace for qemu processes IIRC | 17:55 |
jovial | So I can't imagine there would be a significant difference | 17:58 |
opendevreview | Merged openstack/kayobe master: veth: Remove support for EL8 / network-scripts https://review.opendev.org/c/openstack/kayobe/+/899888 | 18:01 |
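
Note on the kolla_config_path thread above (jovial and sylvr, 15:46 onwards): the two conditions jovial describes are that the path should be absolute and that the ansible variable should match the KOLLA_CONFIG_PATH environment variable. A minimal sketch of that check follows. The helper name `check_kolla_config_path` is hypothetical and is not part of kayobe; it only assumes the JSON shape that `kayobe configuration dump -l localhost --var-name kolla_config_path` printed in the log, and it skips any warning text that precedes the JSON.

```python
import json
import posixpath


def check_kolla_config_path(dump_output, environ):
    """Return a list of problems with kolla_config_path (hypothetical helper).

    dump_output: text printed by `kayobe configuration dump ... --var-name
                 kolla_config_path`, i.e. a JSON object mapping host -> path,
                 possibly preceded by warning lines.
    environ:     mapping of environment variables, e.g. os.environ.
    """
    problems = []

    # Skip anything (such as an ansible [WARNING] line) before the JSON object.
    start = dump_output.find("{")
    if start == -1:
        raise ValueError("no JSON object found in dump output")
    values = json.loads(dump_output[start:])

    env_path = environ.get("KOLLA_CONFIG_PATH")
    if env_path is None:
        problems.append("KOLLA_CONFIG_PATH is not set in the environment")

    for host, path in values.items():
        # A relative path is what showed up in the log for localhost.
        if not posixpath.isabs(path):
            problems.append(f"{host}: kolla_config_path {path!r} is relative")
        # The ansible variable should match the environment variable.
        if env_path is not None and (
            posixpath.normpath(path) != posixpath.normpath(env_path)
        ):
            problems.append(
                f"{host}: kolla_config_path {path!r} does not match "
                f"KOLLA_CONFIG_PATH {env_path!r}"
            )
    return problems
```

It could be called as `check_kolla_config_path(dump_text, os.environ)` after sourcing kayobe-env; an empty list means neither condition was violated. This only mirrors the diagnosis discussed in the channel and does not explain why the path became relative in the first place.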