Friday, 2022-02-11

*** Guest2 is now known as prometheanfire01:28
opendevreviewMerged openstack/nova stable/victoria: Reproduce bug 1953359  https://review.opendev.org/c/openstack/nova/+/82055802:05
opendevreviewMerged openstack/nova stable/victoria: Extend the reproducer for 1953359 and 1952915  https://review.opendev.org/c/openstack/nova/+/82085602:06
opendevreviewmelanie witt proposed openstack/nova master: Enforce api and db limits  https://review.opendev.org/c/openstack/nova/+/71214203:11
opendevreviewmelanie witt proposed openstack/nova master: Update quota_class APIs for db and api limits  https://review.opendev.org/c/openstack/nova/+/71214303:11
opendevreviewmelanie witt proposed openstack/nova master: Update limit APIs  https://review.opendev.org/c/openstack/nova/+/71270703:11
opendevreviewmelanie witt proposed openstack/nova master: Update quota sets APIs  https://review.opendev.org/c/openstack/nova/+/71274903:11
opendevreviewmelanie witt proposed openstack/nova master: Tell oslo.limit how to count nova resources  https://review.opendev.org/c/openstack/nova/+/71330103:11
opendevreviewmelanie witt proposed openstack/nova master: Enforce resource limits using oslo.limit  https://review.opendev.org/c/openstack/nova/+/61518003:11
opendevreviewmelanie witt proposed openstack/nova master: Add legacy limits and usage to placement unified limits  https://review.opendev.org/c/openstack/nova/+/71349803:11
opendevreviewmelanie witt proposed openstack/nova master: Update quota apis with keystone limits and usage  https://review.opendev.org/c/openstack/nova/+/71349903:11
opendevreviewmelanie witt proposed openstack/nova master: Add reno for unified limits  https://review.opendev.org/c/openstack/nova/+/71527103:11
opendevreviewmelanie witt proposed openstack/nova master: Enable unified limits in the nova-next job  https://review.opendev.org/c/openstack/nova/+/78996303:11
opendevreviewMerged openstack/nova stable/xena: Reproduce bug 1952941  https://review.opendev.org/c/openstack/nova/+/82786803:54
opendevreviewMerged openstack/nova stable/xena: Migrate RequestSpec.numa_topology to use pcpuset  https://review.opendev.org/c/openstack/nova/+/82786903:56
opendevreviewMerged openstack/nova stable/wallaby: Add functional test for bug 1937375  https://review.opendev.org/c/openstack/nova/+/80371704:03
*** clarkb is now known as Guest3205:22
opendevreviewMinghong Hou proposed openstack/nova master: fix VirtualInterface table can't be update  https://review.opendev.org/c/openstack/nova/+/82881906:17
*** amoralej|off is now known as amoralej07:24
*** hemna2 is now known as hemna07:37
nikparasyrhello, I have a question regarding shelving/unshelving. We have a flavor with `hw:cpu_policy='dedicated', hw:cpu_thread_policy='isolate'` and pci_passthrough 2 gpus. When we try to unshelve we get this error " Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology.". My question is to what extent does Nova require to 08:41
nikparasyrfind the  exact same cpu set available on the target host? We have enabled the PCIPassThrough filter for the scheduler but not the NUMATopologyFilter. If I understand well the NUMATopologyFilter will make sure that the scheduler picks a node that has the required topology available. Even so, if Nova requires the exact same cpu set to the target host we will still have an issue even with the numa filter... 08:41
nikparasyrSo, any idea to what extent does nova require the exact same cpu set when cpu pinning is enabled?08:41
kashyapgibi: Thanks for the link to the smaller repro; also check out Peter's response on that thread10:05
kashyapHe points out two possibilities:10:05
kashyap1) the guest OS didn't confirm the detach10:05
kashyap2) there was a recent bug in qemu triggered by using JSON syntax for -device10:05
kashyapgibi: That's it: this looks like it -- 10:10
kashyap"DEVICE_DELETED event is not delivered for device frontend if -device is configured via JSON"10:10
kashyaphttps://bugzilla.redhat.com/show_bug.cgi?id=203666910:10
kashyapBut based on the versions in the CI job, they should already have the fix:10:19
kashyap  - libvirt version: 8.0.0, package: 2.el9 10:19
kashyap  - qemu-kvm-6.2.0-5.el910:19
*** mdbooth3 is now known as mdbooth10:25
kashyapgibi: When you're around, to rule out the above bug, I wonder if we could try this workaround:10:27
kashyapOn compute nodes, in /etc/libvirt/qemu.conf:10:28
kashyap    capability_filters = [ "device.json" ]10:28
gibikashyap: hi!10:34
gibikashyap: sure, I will try to make that config change via devstack 10:35
gibikashyap: does it require a libvirtd restart?10:35
kashyapgibi: See my latest comment: https://bugs.launchpad.net/nova/+bug/1960346/10:35
kashyapgibi: Yeah, it is required, afraid10:35
gibiOK10:36
gibithanks10:36
gibikashyap: pushed new PS to https://review.opendev.org/c/openstack/devstack/+/828705 with the WA, lets see if it helps or not11:01
kashyapgibi: Thank you!  It will at least rule out the 2nd possibility above for sure.11:01
gibiI can try to look at the first 11:01
gibiwe can grab the console log 11:01
gibiafter the failed detach11:02
gibihm, we already grabbing it in tempest11:03
gibilet me find it11:03
kashyapI see, I need to be AFK for an hour-ish; will come back and check11:06
gibiadded the console log to the bug https://paste.opendev.org/show/bXXn63wbTPOwiCGC5xDI/11:15
gibinothing obviously wrong there 11:15
gibibut the guest is still in the state of getting an IP from DHCP11:16
gibiso maybe it is not fully booted when the detach was requested11:16
gibichateaulav: left some suggestions inline about the ovo backports11:25
opendevreviewManuel Bentele proposed openstack/nova master: libvirt: Add properties to set advanced QXL video RAM settings  https://review.opendev.org/c/openstack/nova/+/82867411:35
opendevreviewManuel Bentele proposed openstack/nova master: libvirt: Add configuration options to set SPICE compression settings  https://review.opendev.org/c/openstack/nova/+/82867512:03
chateaulavgibi: thanks for the follow up, that makes sense i was doing research and reading last night and had found references to `obj_relationships`. but your comments align to what i was trying yesterday, i just had brought in the exception aspect. appreciated12:24
gibichateaulav: cool12:25
opendevreviewManuel Bentele proposed openstack/nova master: libvirt: Add property to set number of screens per video adapter  https://review.opendev.org/c/openstack/nova/+/82867612:37
erlonsean-k-mooney: hey Sean, I believe I have finished implementing all suggested changes in the live migration rollback fix: https://review.opendev.org/q/topic:bug%252F194461912:40
erlonwhen you have a chance to give it a look, I'll appreciate it12:41
rosmaitabauzas: fyi, i will be raising the minima in requirements for os-brick release: http://lists.openstack.org/pipermail/openstack-discuss/2022-February/027192.html13:22
*** amoralej is now known as amoralej|lunch13:24
*** amoralej|lunch is now known as amoralej14:01
gibibauzas, rosmaita: I quickly checked the os_brick requirements patch I see no major bump in any deps so I think it is not a risky change. 14:16
rosmaitagibi: ty14:17
gibiand tempest is green so nova is co-installable with the new os_brick deps14:17
*** dasm|off is now known as dasm14:21
gibigmann, frickler, bauzas: about the centos-9-stream job failure https://bugs.launchpad.net/nova/+bug/1960346/ I concluded that the cirros guest is not fully booted when the volume detach happens and the guest OS does not release the device. We need https://review.opendev.org/q/topic:wait_until_sshable_pingable to solve this in general14:36
kashyapgibi: So cracked the prob!  It's the guest OS indeed - adding a delay helps here?14:49
kashyaps/So/So you/14:49
kashyapI think for now, going with the extra delay before the detach happens is fine.  That saves more time here, before the big Tempest series gets merged14:53
gibikashyap: I don't have brains any more today but next week I can put up a tempest patch with some selective delays. I'm not sure how well QA will appreciate it14:55
gibialso I can take a look at lyarwood's series and try to move that forward14:56
gibikashyap: thank you for your help!14:56
fricklergibi: thx for the update, do you know why this only occurs on c9s? is booting slower or did previous libvirts not care whether that release actually happens?15:04
gibifrickler: I think older libvirt let nova restart the detach process but newer libvirt simply rejects the retry as the original detach is still ongoing15:19
opendevreviewMerged openstack/nova stable/victoria: Add a WA flag waiting for vif-plugged event during reboot  https://review.opendev.org/c/openstack/nova/+/81855915:22
sean-k-mooneygibi: older qemu did not support restarting the detach but did not raise an error; then qemu started enforcing it15:39
gibisean-k-mooney: yeah, sorry, s/libvirt/qemu.15:40
elodillesmelwitt: whenever you have time, could you review this patch? https://review.opendev.org/c/openstack/nova/+/80562815:45
fricklersean-k-mooney: gibi: but then it sounds to me that the real fix would still be to make nova not retry the detach, just wait longer?15:45
elodillesmelwitt: i think it would reduce the number of rechecks in wallaby and victoria if it merges (and its devstack part)15:45
gibifrickler: if the detach happens while the cirros is booting then the guest OS never releases the device15:46
gibiso right now waiting more is not an option15:47
gibibut in general I agree to remove the retry loop from nova15:47
gibias it is pointless after qemu starts rejecting the retry15:47
*** Guest32 is now known as clarkb15:47
sean-k-mooneyreally the jobs should wait for the instance to be pingable/sshable 15:47
sean-k-mooneyand only detach then15:47
sean-k-mooneyand nova should not retry if the detach fails and just have the client retry15:48
sean-k-mooneyclient being tempest or enduser if a retry is needed15:48
gibisean-k-mooney: yepp15:50
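Editorial sketch of the ordering proposed above: wait until the guest answers, and only then issue a single detach with no retry loop on this side. This is a minimal illustration under stated assumptions, not Tempest's or nova's actual code. The names `port_is_open`, `wait_until`, and `detach_when_guest_ready` are hypothetical, and an open SSH port is used here as a stand-in for "pingable/sshable".

```python
import socket
import time


def port_is_open(host, port=22, connect_timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical readiness probe)."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_until(probe, timeout=300.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def detach_when_guest_ready(guest_ip, detach, probe=None, **wait_kwargs):
    """Call detach() exactly once, and only after the guest is reachable.

    `detach` is a caller-supplied callable (assumption: it wraps whatever
    API the client uses to request the volume detach). No retry happens here;
    a failure is left to the caller to handle.
    """
    if probe is None:
        probe = lambda: port_is_open(guest_ip)
    if not wait_until(probe, **wait_kwargs):
        raise TimeoutError(f"guest {guest_ip} not reachable; detach not attempted")
    detach()
```

The probe, sleep, and clock are injectable so the wait logic can be exercised without a real guest.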
gibion the other hand if ever a detach is issued by the client right after a boot then that detach will time out in nova, but I'm not sure it ever times out in qemu15:50
gibiso in that case a client retry will not help either15:51
sean-k-mooneywe might be able to explicitly cancel the job in qemu15:51
gibias qemu will say that the detach is in progress15:51
sean-k-mooneywhen we time out15:51
opendevreviewJonathan Race proposed openstack/nova master: object/notification for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82836915:51
opendevreviewJonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82205315:51
opendevreviewJonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82837215:51
gibisean-k-mooney: at least the doc did not mention a way to cancel via libvirt https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainDetachDeviceFlags15:54
sean-k-mooneywe could always issue a detach again :) since that will do an abort in qemu15:55
sean-k-mooneybut ya we should ask the libvirt folks although im on PTO today so im going to drop off irc again soon15:56
sean-k-mooneyso kashyap maybe you could follow up and see if there is a way to abort the detach or pass a timeout to qemu via libvirt15:56
gibisean-k-mooney: if we attach the detach again qemu reject it 15:57
gibisean-k-mooney: if we issue the detach again qemu reject it 15:58
gibias the previous one is still ongoing15:58
sean-k-mooneyyep15:59
sean-k-mooneybut we could catch the error15:59
gibiyepp, but that does not make the device actually detached :D15:59
sean-k-mooneyif it says device not found, well, presumably it finished before we sent the detach after the time out16:00
gibisean-k-mooney: it say detach is ongoing16:00
sean-k-mooneyright but the second detach will abort the detach16:00
sean-k-mooneythat is the new behavior in qemu16:00
gibireally?16:01
gibiI've only checked the first two detach16:01
gibiso you say the 3rd returns device not found?16:01
* gibi looks at the logs again...16:02
sean-k-mooneyno16:02
sean-k-mooneyim saying the second detach that returns "detach is ongoing" causes qemu to abort the detach16:02
sean-k-mooneyat least that is what i was told was the new behavior16:03
gmanngibi: ack, thanks. I will check that tempest patches. 16:09
gibisean-k-mooney: qemu rejects each of the 7 retries with the message that the unplug is in progress https://paste.opendev.org/show/bW5wXCyH5em5tNI34zwV/16:10
gibisean-k-mooney: I don't think the first detach job was aborted by the second detach16:10
gibigmann: they are WIP 16:11
gibigmann: how QA would feel about a 20second sleep in the tempest volume detach code? that would be a quick fix compared to the sshable series16:12
sean-k-mooneygibi: huh ok i was told it would but perhaps not.16:12
gibisean-k-mooney: if there would be a way to abort a detach then we could adapt now to it16:13
gibianyhow go enjoy your PTO, this will be an open issue on Monday too :)16:13
sean-k-mooneyack im currently trying to decide if i want to use brick or paving slabs in my garden  to make paths and beds16:15
gibinice problem :)16:15
sean-k-mooneyya im half tempted to just go with wood chip since its easier but a lot less permanent and i would have to do it every year16:16
gibiless permanent mean you can decide next year to replace it with brick or slab :)16:16
sean-k-mooneyhehe thats true too16:17
sean-k-mooneyalso cheaper16:17
gmanngibi: I think we can wait for sshable series as it is hitting only in centos9-stream16:22
gibigmann: ack16:22
opendevreviewMerged openstack/nova stable/xena: Avoid unbound instance_uuid var during delete  https://review.opendev.org/c/openstack/nova/+/81648816:24
opendevreviewBalazs Gibizer proposed openstack/nova stable/wallaby: Avoid unbound instance_uuid var during delete  https://review.opendev.org/c/openstack/nova/+/82883916:26
*** amoralej is now known as amoralej|off16:28
opendevreviewJonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82205316:52
opendevreviewJonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82837216:52
*** artom__ is now known as artom16:56
melwittelodilles: done18:49
*** carloss is now known as carloss|afk19:12
opendevreviewJonathan Race proposed openstack/nova master: object/notification for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82836919:49
opendevreviewJonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82205319:49
opendevreviewJonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82837219:49
*** dasm is now known as dasm|off21:49
*** carloss|afk is now known as carloss21:58
