*** Guest2 is now known as prometheanfire | 01:28 | |
opendevreview | Merged openstack/nova stable/victoria: Reproduce bug 1953359 https://review.opendev.org/c/openstack/nova/+/820558 | 02:05 |
---|---|---|
opendevreview | Merged openstack/nova stable/victoria: Extend the reproducer for 1953359 and 1952915 https://review.opendev.org/c/openstack/nova/+/820856 | 02:06 |
opendevreview | melanie witt proposed openstack/nova master: Enforce api and db limits https://review.opendev.org/c/openstack/nova/+/712142 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Update quota_class APIs for db and api limits https://review.opendev.org/c/openstack/nova/+/712143 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Update limit APIs https://review.opendev.org/c/openstack/nova/+/712707 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Update quota sets APIs https://review.opendev.org/c/openstack/nova/+/712749 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Tell oslo.limit how to count nova resources https://review.opendev.org/c/openstack/nova/+/713301 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Enforce resource limits using oslo.limit https://review.opendev.org/c/openstack/nova/+/615180 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Add legacy limits and usage to placement unified limits https://review.opendev.org/c/openstack/nova/+/713498 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Update quota apis with keystone limits and usage https://review.opendev.org/c/openstack/nova/+/713499 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Add reno for unified limits https://review.opendev.org/c/openstack/nova/+/715271 | 03:11 |
opendevreview | melanie witt proposed openstack/nova master: Enable unified limits in the nova-next job https://review.opendev.org/c/openstack/nova/+/789963 | 03:11 |
opendevreview | Merged openstack/nova stable/xena: Reproduce bug 1952941 https://review.opendev.org/c/openstack/nova/+/827868 | 03:54 |
opendevreview | Merged openstack/nova stable/xena: Migrate RequestSpec.numa_topology to use pcpuset https://review.opendev.org/c/openstack/nova/+/827869 | 03:56 |
opendevreview | Merged openstack/nova stable/wallaby: Add functional test for bug 1937375 https://review.opendev.org/c/openstack/nova/+/803717 | 04:03 |
*** clarkb is now known as Guest32 | 05:22 | |
opendevreview | Minghong Hou proposed openstack/nova master: fix VirtualInterface table can't be update https://review.opendev.org/c/openstack/nova/+/828819 | 06:17 |
*** amoralej|off is now known as amoralej | 07:24 | |
*** hemna2 is now known as hemna | 07:37 | |
nikparasyr | hello, I have a question regarding shelving/unshelving. We have a flavor with `hw:cpu_policy='dedicated', hw:cpu_thread_policy='isolate'` and pci_passthrough of 2 gpus. When we try to unshelve we get this error: "Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology.". My question is: to what extent does Nova need to | 08:41 |
nikparasyr | find the exact same cpu set available on the target host? We have enabled the PCIPassThrough filter for the scheduler but not the NUMATopologyFilter. If I understand correctly, the NUMATopologyFilter will make sure that the scheduler picks a node that has the required topology available. Even so, if Nova requires the exact same cpu set on the target host we will still have an issue even with the numa filter... | 08:41 |
nikparasyr | So, any idea to what extent nova requires the exact same cpu set when cpu pinning is enabled? | 08:41 |
kashyap | gibi: Thanks for the link to the smaller repro; also check out Peter's response on that thread | 10:05 |
kashyap | He points out two possibilities: | 10:05 |
kashyap | 1) the guest OS didn't confirm the detach | 10:05 |
kashyap | 2) there was a recent bug in qemu triggered by using JSON syntax for -device | 10:05 |
kashyap | gibi: That's it: this looks like it -- | 10:10 |
kashyap | "DEVICE_DELETED event is not delivered for device frontend if -device is configured via JSON" | 10:10 |
kashyap | https://bugzilla.redhat.com/show_bug.cgi?id=2036669 | 10:10 |
kashyap | But based on the versions in the CI job, they should already have the fix: | 10:19 |
kashyap | - libvirt version: 8.0.0, package: 2.el9 | 10:19 |
kashyap | - qemu-kvm-6.2.0-5.el9 | 10:19 |
*** mdbooth3 is now known as mdbooth | 10:25 | |
kashyap | gibi: When you're around, to rule out the above bug, I wonder if we could try this workaround: | 10:27 |
kashyap | On compute nodes, in /etc/libvirt/qemu.conf: | 10:28 |
kashyap | capability_filters = [ "device.json" ] | 10:28 |
gibi | kashyap: hi! | 10:34 |
gibi | kashyap: sure, I will try to make that config change via devstack | 10:35 |
gibi | kashyap: does it require a libvirtd restart? | 10:35 |
kashyap | gibi: See my latest comment: https://bugs.launchpad.net/nova/+bug/1960346/ | 10:35 |
kashyap | gibi: Yeah, it is required, I'm afraid | 10:35 |
gibi | OK | 10:36 |
gibi | thanks | 10:36 |
gibi | kashyap: pushed new PS to https://review.opendev.org/c/openstack/devstack/+/828705 with the WA, lets see if it helps or not | 11:01 |
kashyap | gibi: Thank you! It will at least rule out the 2nd possibility above for sure. | 11:01 |
gibi | I can try to look at the first | 11:01 |
gibi | we can grab the console log | 11:01 |
gibi | after the failed detach | 11:02 |
gibi | hm, we are already grabbing it in tempest | 11:03 |
gibi | let me find it | 11:03 |
kashyap | I see, I need to be AFK for an hour-ish; will come back and check | 11:06 |
gibi | added the console log to the bug https://paste.opendev.org/show/bXXn63wbTPOwiCGC5xDI/ | 11:15 |
gibi | nothing obviously wrong there | 11:15 |
gibi | but the guest is still in the state of getting an IP from DHCP | 11:16 |
gibi | so maybe it was not fully booted when the detach was requested | 11:16 |
gibi | chateaulav: left some suggestions inline about the ovo backports | 11:25 |
opendevreview | Manuel Bentele proposed openstack/nova master: libvirt: Add properties to set advanced QXL video RAM settings https://review.opendev.org/c/openstack/nova/+/828674 | 11:35 |
opendevreview | Manuel Bentele proposed openstack/nova master: libvirt: Add configuration options to set SPICE compression settings https://review.opendev.org/c/openstack/nova/+/828675 | 12:03 |
chateaulav | gibi: thanks for the follow up, that makes sense i was doing research and reading last night and had found references to `obj_relationships`. but your comments align to what i was trying yesterday, i just had brought in the exception aspect. appreciated | 12:24 |
gibi | chateaulav: cool | 12:25 |
opendevreview | Manuel Bentele proposed openstack/nova master: libvirt: Add property to set number of screens per video adapter https://review.opendev.org/c/openstack/nova/+/828676 | 12:37 |
erlon | sean-k-mooney: hey Sean, I believe I have finished implementing all suggested changes in the live migration rollback fix: https://review.opendev.org/q/topic:bug%252F1944619 | 12:40 |
erlon | when you have a chance to take a look, I'd appreciate it | 12:41 |
rosmaita | bauzas: fyi, i will be raising the minima in requirements for os-brick release: http://lists.openstack.org/pipermail/openstack-discuss/2022-February/027192.html | 13:22 |
*** amoralej is now known as amoralej|lunch | 13:24 | |
*** amoralej|lunch is now known as amoralej | 14:01 | |
gibi | bauzas, rosmaita: I quickly checked the os_brick requirements patch I see no major bump in any deps so I think it is not a risky change. | 14:16 |
rosmaita | gibi: ty | 14:17 |
gibi | and tempest is green so nova is co-installable with the new os_brick deps | 14:17 |
*** dasm|off is now known as dasm | 14:21 | |
gibi | gmann, frickler, bauzas: about the centos-9-stream job failure https://bugs.launchpad.net/nova/+bug/1960346/ I concluded that the cirros guest is not fully booted when the volume detach happens and the guest OS does not release the device. We need https://review.opendev.org/q/topic:wait_until_sshable_pingable to solve this in general | 14:36 |
kashyap | gibi: So cracked the prob! It's the guest OS indeed - adding a delay helps here? | 14:49 |
kashyap | s/So/So you/ | 14:49 |
kashyap | I think for now, going with the extra delay before the detach happens is fine. That saves more time here, before the big Tempest series gets merged | 14:53 |
gibi | kashyap: I don't have brains any more today but next week I can put up a tempest patch with some selective delays. I'm not sure how well QA will appreciate it | 14:55 |
gibi | also I can take a look at lyarwood's series and try to move that forward | 14:56 |
gibi | kashyap: thank you for your help! | 14:56 |
frickler | gibi: thx for the update, do you know why this only occurs on c9s? is booting slower or did previous libvirts not care whether that release actually happens? | 15:04 |
gibi | frickler: I think older libvirt let nova restart the detach process but newer libvirt simply rejects the retry as the original detach is still ongoing | 15:19 |
opendevreview | Merged openstack/nova stable/victoria: Add a WA flag waiting for vif-plugged event during reboot https://review.opendev.org/c/openstack/nova/+/818559 | 15:22 |
sean-k-mooney | gibi: older qemu did not support restarting the detach but did not raise an error; then qemu started enforcing it | 15:39 |
gibi | sean-k-mooney: yeah, sorry, s/libvirt/qemu. | 15:40 |
elodilles | melwitt: whenever you have time, could you review this patch? https://review.opendev.org/c/openstack/nova/+/805628 | 15:45 |
frickler | sean-k-mooney: gibi: but then it sounds to me that the real fix would still be to make nova not retry the detach, just wait longer? | 15:45 |
elodilles | melwitt: i think it would reduce the number of rechecks in wallaby and victoria if it merges (and its devstack part) | 15:45 |
gibi | frickler: if the detach happens while the cirros is booting then the guest OS never releases the device | 15:46 |
gibi | so right now waiting more is not an option | 15:47 |
gibi | but in general I agree to remove the retry loop from nova | 15:47 |
gibi | as it is pointless after qemu starts rejecting the retry | 15:47 |
*** Guest32 is now known as clarkb | 15:47 | |
sean-k-mooney | really the jobs should wait for the instance to be pingable/sshable | 15:47 |
sean-k-mooney | and only detach then | 15:47 |
sean-k-mooney | and nova should not retry if the detach fails and just have the client retry | 15:48 |
sean-k-mooney | client being tempest or enduser if a retry is needed | 15:48 |
gibi | sean-k-mooney: yepp | 15:50 |
gibi | on the other hand, if a detach is ever issued by the client right after a boot then that detach will time out in nova, but I'm not sure it ever times out in qemu | 15:50 |
gibi | so in that case a client retry will not help either | 15:51 |
sean-k-mooney | we might be able to explicitly cancel the job in qemu | 15:51 |
gibi | as qemu will say that the detach is in progress | 15:51 |
sean-k-mooney | when we time out | 15:51 |
opendevreview | Jonathan Race proposed openstack/nova master: object/notification for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/828369 | 15:51 |
opendevreview | Jonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/822053 | 15:51 |
opendevreview | Jonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/828372 | 15:51 |
gibi | sean-k-mooney: at least the doc did not mention a way to cancel via libvirt https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainDetachDeviceFlags | 15:54 |
sean-k-mooney | we could always issue a detach again :) since that will do an abort in qemu | 15:55 |
sean-k-mooney | but ya we should ask the libvirt folks, although im on PTO today so im going to drop off irc again soon | 15:56 |
sean-k-mooney | so kashyap maybe you could follow up and see if there is a way to abort the detach or pass a timeout to qemu via libvirt | 15:56 |
gibi | sean-k-mooney: if we attach the detach again qemu reject it | 15:57 |
gibi | sean-k-mooney: if we issue the detach again qemu reject it | 15:58 |
gibi | as the previous one is still ongoing | 15:58 |
sean-k-mooney | yep | 15:59 |
sean-k-mooney | but we could catch the error | 15:59 |
gibi | yepp, but that does not make the device actually detached :D | 15:59 |
sean-k-mooney | if it says device not found, well, presumably it finished before we sent the detach after the time out | 16:00 |
gibi | sean-k-mooney: it says detach is ongoing | 16:00 |
sean-k-mooney | right but the second detach will abort the detach | 16:00 |
sean-k-mooney | that is the new behavior in qemu | 16:00 |
gibi | really? | 16:01 |
gibi | I've only checked the first two detach | 16:01 |
gibi | so you say the 3rd returns device not found? | 16:01 |
* gibi looks at the logs again... | 16:02 | |
sean-k-mooney | no | 16:02 |
sean-k-mooney | im saying the second detach that returns "detach is ongoing" causes qemu to abort the detach | 16:02 |
sean-k-mooney | at least that is what i was told was the new behavior | 16:03 |
gmann | gibi: ack, thanks. I will check those tempest patches. | 16:09 |
gibi | sean-k-mooney: qemu rejects each of the 7 retries with the message that the unplug is in progress https://paste.opendev.org/show/bW5wXCyH5em5tNI34zwV/ | 16:10 |
gibi | sean-k-mooney: I don't think the first detach job was aborted by the second detach | 16:10 |
gibi | gmann: they are WIP | 16:11 |
gibi | gmann: how would QA feel about a 20-second sleep in the tempest volume detach code? that would be a quick fix compared to the sshable series | 16:12 |
sean-k-mooney | gibi: huh ok i was told it would but perhaps not. | 16:12 |
gibi | sean-k-mooney: if there were a way to abort a detach then we could adapt to it now | 16:13 |
gibi | anyhow go enjoy your PTO, this will be an open issue on Monday too :) | 16:13 |
sean-k-mooney | ack im currently trying to decide if i want to use brick or paving slabs in my garden to make paths and beds | 16:15 |
gibi | nice problem :) | 16:15 |
sean-k-mooney | ya im half tempted to just go with wood chip since its easier but a lot less permanent and i would have to do it every year | 16:16 |
gibi | less permanent means you can decide next year to replace it with brick or slab :) | 16:16 |
sean-k-mooney | hehe thats true too | 16:17 |
sean-k-mooney | also cheaper | 16:17 |
gmann | gibi: I think we can wait for the sshable series as it is hitting only in centos-9-stream | 16:22 |
gibi | gmann: ack | 16:22 |
opendevreview | Merged openstack/nova stable/xena: Avoid unbound instance_uuid var during delete https://review.opendev.org/c/openstack/nova/+/816488 | 16:24 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/wallaby: Avoid unbound instance_uuid var during delete https://review.opendev.org/c/openstack/nova/+/828839 | 16:26 |
*** amoralej is now known as amoralej|off | 16:28 | |
opendevreview | Jonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/822053 | 16:52 |
opendevreview | Jonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/828372 | 16:52 |
*** artom__ is now known as artom | 16:56 | |
melwitt | elodilles: done | 18:49 |
*** carloss is now known as carloss|afk | 19:12 | |
opendevreview | Jonathan Race proposed openstack/nova master: object/notification for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/828369 | 19:49 |
opendevreview | Jonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/822053 | 19:49 |
opendevreview | Jonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support https://review.opendev.org/c/openstack/nova/+/828372 | 19:49 |
*** dasm is now known as dasm|off | 21:49 | |
*** carloss|afk is now known as carloss | 21:58 |
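
The approach discussed above (wait until the guest is reachable, then issue a single detach and leave any retry to the client) could be sketched roughly as follows. This is only an illustration under stated assumptions, not nova or tempest code: `wait_until`, `detach_when_ready`, `GuestNotReady`, and the `is_reachable` / `detach` callables are hypothetical names, and the default timeout is arbitrary.

```python
import time
from typing import Callable


class GuestNotReady(Exception):
    """The guest did not become reachable before the timeout expired."""


def wait_until(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll check() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while not check():
        if clock() >= deadline:
            raise GuestNotReady(f"guest not reachable after {timeout} seconds")
        sleep(interval)


def detach_when_ready(
    is_reachable: Callable[[], bool],  # hypothetical: e.g. a ping or SSH probe
    detach: Callable[[], None],        # hypothetical: issues one detach request
    timeout: float = 300.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Detach only after the guest has finished booting.

    The detach is requested exactly once. There is no retry loop here,
    since (per the discussion) qemu rejects a second request while the
    first unplug is still in progress.
    """
    wait_until(is_reachable, timeout, interval, clock, sleep)
    detach()
```

The `clock` and `sleep` parameters exist only so the sketch can be exercised without real waiting; a caller would normally leave them at their defaults.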