opendevreview | Stefan K proposed openstack/nova-specs master: Add Cloud Hypervisor support spec https://review.opendev.org/c/openstack/nova-specs/+/945549 | 09:22 |
---|---|---|
opendevreview | Stefan K proposed openstack/nova-specs master: Add Cloud Hypervisor support spec https://review.opendev.org/c/openstack/nova-specs/+/945549 | 09:32 |
opendevreview | Merged openstack/nova-specs master: tox: Drop envdir https://review.opendev.org/c/openstack/nova-specs/+/941184 | 11:00 |
opendevreview | Andre Aranha proposed openstack/nova master: Replace paramiko with ssh-python https://review.opendev.org/c/openstack/nova/+/946922 | 11:49 |
bbezak | Hi - I'd like to bring to the nova team's attention a pretty interesting (and somewhat convoluted to troubleshoot) bug - https://bugs.launchpad.net/nova/+bug/2104255. Namely, stripping the switchdev capability from an active SR-IOV VF on nova-compute service restart. Workaround is to restart nova-compute after the VF is detached | 12:50 |
sean-k-mooney | bbezak: note that nova does not support VF-LAG at all :) | 12:53 |
sean-k-mooney | at least not officially | 12:53 |
sean-k-mooney | mellanox never actually did the work to enable it in neutron or nova, they just figured out a hack to make it work | 12:54 |
bbezak | indeed. I like that patch though - https://review.opendev.org/c/openstack/nova/+/884439. now we don't need to add switchdev by hand to binding profiles ;) | 12:55 |
bbezak | and it works very well :) | 12:55 |
sean-k-mooney | that's a different feature | 12:55 |
sean-k-mooney | so you're using hardware offloaded ovs, not generic sriov | 12:55 |
bbezak | yes | 12:55 |
sean-k-mooney | bbezak: for what it's worth, the translation of capabilities to port binding was meant to be done before hardware offloaded ovs was merged | 12:56 |
sean-k-mooney | bbezak: it actually predates the creation of placement and got put on hold for like 6 years | 12:57 |
sean-k-mooney | bbezak: so it was always intended that you would not set switchdev | 12:57 |
sean-k-mooney | i.e. that nova would | 12:57 |
sean-k-mooney | bbezak: anyway looking at the bug report | 12:58 |
bbezak | thx for background info sean-k-mooney | 12:59 |
sean-k-mooney | it looks like the network capabilities are lost when an instance is "torn down" | 12:59 |
sean-k-mooney | does that mean the instance is deleted or stopped? | 12:59 |
bbezak | well. it happens on nova-compute service restart - on actively attached VFs | 13:00 |
bbezak | then when tearing vm down. the vfs are not usable | 13:00 |
sean-k-mooney | right, but this is because while it's attached to a vm | 13:00 |
sean-k-mooney | we cannot inspect its ethtool feature flags | 13:01 |
sean-k-mooney | i guess the problem is that we also cache this info on startup | 13:02 |
sean-k-mooney | so if the agent gets restarted while the vms are running | 13:02 |
bbezak | yes, it is attached to vm, then nova-compute restart is stripping the feature in the db (as it is not visible on the host). and then when the vm is removed the vf doesn't have the capability | 13:02 |
sean-k-mooney | it will no longer have the network capabilities info | 13:02 |
bbezak | exactly | 13:02 |
sean-k-mooney | which is not wrong per se | 13:03 |
sean-k-mooney | it's just wrong because it will never get updated again when it's detached | 13:03 |
bbezak | yeap | 13:03 |
sean-k-mooney | there are a couple of ways to fix that | 13:04 |
sean-k-mooney | they all have trade offs | 13:04 |
sean-k-mooney | we could clear the cache when we detach an interface | 13:04 |
sean-k-mooney | we could refuse to update the capabilities if the device is not in the available state | 13:05 |
bbezak | I'm wondering if we could look to existed.extra_info for capabilities | 13:07 |
bbezak | but maybe that is too narrow | 13:07 |
bbezak | thinking | 13:07 |
bbezak | :) | 13:07 |
bbezak | here https://github.com/openstack/nova/blob/1ad11b13884baeaa6ed9f8f5818f4d176f4d3134/nova/pci/manager.py#L271-L289 | 13:07 |
sean-k-mooney | you mean merge the values from the db with those from libvirt and have the db ones take precedence while it's in claimed or allocated | 13:07 |
gibi | Uggla: here is my eventlet removal summary https://gibizer.github.io/posts/Eventlet-Removal-Flamingo-PTG/ feel free to link to it in your PTG summary mail | 13:11 |
bbezak | sth like that | 13:11 |
bbezak | I'm not taking account other pci devices that may don | 13:11 |
bbezak | I'm not taking into account other pci devices that may not like that | 13:12 |
bbezak | I guess | 13:12 |
sean-k-mooney | bbezak: sorry on a call | 13:27 |
bbezak | no rush! | 13:28 |
sean-k-mooney | bbezak: we likely need to experiment with what is the correct approach. | 13:29 |
bbezak | yeah | 13:45 |
opendevreview | Balazs Gibizer proposed openstack/nova master: split monkey_patching form import https://review.opendev.org/c/openstack/nova/+/922425 | 14:07 |
opendevreview | sean mooney proposed openstack/nova master: Remove workaround for ovn live migration https://review.opendev.org/c/openstack/nova/+/946950 | 14:11 |
opendevreview | Pranali Deore proposed openstack/nova master: DNM: Test glance new location api https://review.opendev.org/c/openstack/nova/+/891207 | 14:28 |
opendevreview | Amit Uniyal proposed openstack/nova stable/2024.2: Libvirt: updates resource provider trait list https://review.opendev.org/c/openstack/nova/+/932522 | 15:00 |
opendevreview | Merged openstack/nova stable/2024.2: Libvirt: updates resource provider trait list https://review.opendev.org/c/openstack/nova/+/932522 | 22:59 |
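
The merge idea raised around 13:07 (keep the capabilities already stored in the db while a VF is claimed or allocated, and trust what the host reports only when the device is available) could be sketched roughly as below. This is an illustration only, not nova's code: the function name `merge_capabilities`, the `IN_USE_STATES` set, and the plain-dict shape of the capabilities are all assumptions made for the example, and the discussion itself ends without settling on an approach.

```python
# Hypothetical sketch of the "db takes precedence while in use" idea.
# None of these names come from nova; they exist only for illustration.

# Assumed status strings for a PCI device that is currently in use.
IN_USE_STATES = {"claimed", "allocated"}


def merge_capabilities(stored, reported, status):
    """Return the capabilities dict to persist for a device.

    stored   -- capabilities previously recorded in the db
    reported -- capabilities just discovered on the host; may be empty
                while the VF is attached to a guest
    status   -- current device status string
    """
    if status in IN_USE_STATES:
        # The host view is unreliable while the VF is attached, so any
        # key already in the db wins over the freshly reported value.
        return {**reported, **stored}
    # The device is free, so the host view is authoritative again.
    return dict(reported)
```

With this shape, a restart of the compute service while the VF is attached would leave a stored `{"network": ["switchdev"]}` entry untouched, while a later refresh in the available state would pick up whatever the host reports.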