Thursday, 2023-03-23

*** ministry is now known as __ministry08:11
auniyalHi sean-k-mooney 09:16
auniyalcan you please review this - https://review.opendev.org/c/openstack/nova/+/790447/09:16
zigoHi there!09:44
zigoI'm currently doing a routine upgrade of compute nodes in a cluster (running Victoria), and I'm getting live-migration errors of VMs like this one:09:44
zigohttps://paste.opendev.org/show/bYlTfz7fxnQtzVhpVf91/09:44
zigoIs it known? Is there a way to fix? Is it related to the version of libvirt or qemu?09:44
bauzaslooking09:49
bauzaszigo: good question, I guess you've seen the libvirt error09:52
bauzas2023-03-23 09:37:52.682 3209246 INFO nova.compute.manager [req-5fda88d8-510a-4943-9703-9b47e865a89f - - - - -] [instance: 17672112-c416-494a-88f8-fd7cfa85453b] VM Resumed (Lifecycle Event) 2023-03-23 09:37:52.694 3209246 ERROR nova.virt.libvirt.driver [-] [instance: 17672112-c416-494a-88f8-fd7cfa85453b] Live Migration failure: internal error: qemu unexpectedly closed the monitor: 2023-03-23T09:37:52.215715Z qemu-system-x86_64: VQ09:52
bauzas 0 size 0x80 < last_avail_idx 0x0 - used_idx 0x44 2023-03-23T09:37:52.215742Z qemu-system-x86_64: Failed to load virtio-balloon:virtio 2023-03-23T09:37:52.215745Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'09:52
zigoYeah, I do. But then, where do I look?09:53
zigoLibvirt logs?09:53
bauzasit reminds me of this bug report https://bugs.launchpad.net/cloud-archive/+bug/184849709:54
zigoI haven't seen anything doing a tail of /var/log/libvirt/qemu/*.log09:55
zigoOn Bullseye, I'm running with qemu 1:5.2+dfsg-11+deb11u209:55
bauzasand you're not seeing anything in the qemu logs ?09:57
bauzasthe error is reported by the qemu process, not by libvirtd09:58
bauzasso I'd say check the qemu logs09:58
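As a rough aid for that log check, here is a minimal sketch that scans log text for the "error while loading state" message quoted above and pulls out the device name. The function name `find_load_failures` is made up for illustration, and the pattern only matches the wording seen in this paste; other qemu versions may phrase it differently.

```python
import re

# Matches the qemu message quoted in the channel; the wording is taken from
# that paste and is an assumption for any other qemu version.
LOAD_FAILURE = re.compile(
    r"error while loading state for instance (?P<instance>0x[0-9a-fA-F]+) "
    r"of device '(?P<device>[^']+)'"
)


def find_load_failures(log_text):
    """Return (instance_id, device) pairs for every state-load failure found."""
    return [
        (m.group("instance"), m.group("device"))
        for m in LOAD_FAILURE.finditer(log_text)
    ]


# Sample text copied from the error pasted in the channel.
sample = (
    "2023-03-23T09:37:52.215742Z qemu-system-x86_64: Failed to load "
    "virtio-balloon:virtio 2023-03-23T09:37:52.215745Z qemu-system-x86_64: "
    "error while loading state for instance 0x0 of device "
    "'0000:00:05.0/virtio-balloon'"
)
print(find_load_failures(sample))
```

The same function could be fed the contents of the files under /var/log/libvirt/qemu/ on the destination host, since that is where the qemu process reports the failure.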
bauzaszigo: so it seems a qemu migration *to* a node with a version of 4.0 or higher is problematic10:03
bauzaswhich qemu version is the source running ?10:03
bauzasI assume you're not mixing releases10:03
bauzasbut I wanted to double-check10:03
zigoSame version of qemu and libvirt in both source and dest.10:04
zigoIt's a plain Bullseye, so I use whatever is in Debian Stable (minus the security upgrades that I'm trying to perform).10:05
bauzasand what migration flags are you using ?10:06
bauzasare the vms paused ? 10:06
bauzasor suspended ?10:06
zigoThey are ACTIVE.10:08
zigoAre there migration flags I can set?!? :)10:08
zigoWhere do I look?10:08
zigoI've just done "nova host-evacuate-live <hostname>" ...10:09
zigo(not sure if there's a way to do this with python-openstackclient from Victoria...)10:10
zigoWhat's weird is that MANY VMs on the same host are live-migrating without a glitch. Then on average, 2 VMs on each compute can't live-migrate ...10:11
sean-k-mooneyzigo: there isn't and that's intentional10:11
sean-k-mooneyhost-evacuate-live is not something we recommend operators use10:12
zigosean-k-mooney: What should I use then?10:12
sean-k-mooneywe are intentionally not supporting it in osc10:12
bauzaswait10:12
bauzasevacuate or live-migrate ?10:12
bauzasI'm lost here10:12
sean-k-mooneyzigo: you should write your own code to live migrate all the vms from a host that actually has error handling10:12
bauzasoh10:12
bauzashost-evacuate-live10:12
bauzasdamn old unsupported CLIs10:13
sean-k-mooneytechnically deprecated rather than unsupported10:13
sean-k-mooneyuntil we can remove it in C/D10:13
sean-k-mooneywe need everyone to use the sdk first10:13
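The "write your own code with error handling" suggestion could look roughly like the sketch below. It is not nova or SDK code: `drain_host`, `migrate` and `get_state` are hypothetical names, and the two callables are expected to wrap whatever client you use. It assumes the server status reads MIGRATING while a live migration is in progress, and it treats a VM that ends up ACTIVE on the source host as a failure, which is what a failed live migration like the one above leaves behind.

```python
import time


def drain_host(source_host, server_ids, migrate, get_state,
               timeout=600, poll=5, sleep=time.sleep):
    """Live-migrate servers one at a time and record failures instead of stopping.

    migrate(server_id) requests a live migration (hypothetical callable).
    get_state(server_id) returns a (status, host) tuple (hypothetical callable).
    """
    results = {}
    for server_id in server_ids:
        try:
            migrate(server_id)
        except Exception as exc:  # keep going so one bad VM does not block the rest
            results[server_id] = f"request failed: {exc}"
            continue

        waited = 0
        status, host = get_state(server_id)
        # Poll until the migration leaves the MIGRATING status or we time out.
        while status == "MIGRATING" and waited < timeout:
            sleep(poll)
            waited += poll
            status, host = get_state(server_id)

        # ACTIVE on the source host means the migration was rolled back.
        if status == "ACTIVE" and host != source_host:
            results[server_id] = "ok"
        else:
            results[server_id] = f"not migrated (status={status}, host={host})"
    return results
```

Migrating one VM at a time is a deliberate choice here: it keeps the report easy to read and makes it simple to retry only the VMs that stayed on the source host.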
sean-k-mooneyzigo: it won't help now but you used to be able to set https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.mem_stats_period_seconds to 0 to disable the memory balloon10:15
sean-k-mooneycan you confirm if that is set to 0 on either host and that the vm has a memory balloon10:15
zigoIt's set to default on that host (ie: 10 ...).10:16
sean-k-mooneyack10:16
zigoShould I set it to zero and try again then?10:16
sean-k-mooneyi was wondering if having it enabled and disabled on different hosts could cause issues10:16
sean-k-mooneyi think this is one of those things that you can't change with running vms10:17
sean-k-mooneyzigo: i assume you can't just cold migrate them10:17
bauzasyup, or it would require a vm recycle10:17
zigoOh ... :/10:17
sean-k-mooney/recycle/restart/10:17
zigosean-k-mooney: Well, to cold-migrate, I must get in touch with customers to at least warn them about the operation, and let them know their VM will reboot.10:18
zigoThat's kind of very annoying with 50 computes and 2k+ VMs ...10:18
sean-k-mooneyto answer your original question, no, i'm not aware of live migration issues related to memory balloons 10:18
bauzassean-k-mooney: I remember we had some old qemu-4 live migration issues with the qemu balloons10:20
bauzasbut zigo isn't impacted10:21
sean-k-mooneyzigo: i assume this is persistent10:27
sean-k-mooneyi.e. a second live migration of the vm has the same error10:28
zigoRight.10:28
sean-k-mooneyhave you tried migrating to a different host?10:28
zigoI didn't try to specify the dest host, but I can try. I'll let you know...10:29
sean-k-mooneyi'm trying to figure out if this is a thing that is specific to the vm, the destination host, etc.10:29
sean-k-mooneyhttps://bugzilla.redhat.com/show_bug.cgi?id=192388110:32
sean-k-mooneythat sounds like it10:32
sean-k-mooneysimilar for virtio-blk https://bugs.launchpad.net/nova/+bug/173762510:34
sean-k-mooney""Dave notes that we get this "guest index inconsistent" error when the migrated RAM is inconsistent with the migrated 'virtio' device state. And a common case is where a 'virtio' device does an operation after the vCPU is stopped and after RAM has been transmitted."""10:34
sean-k-mooneyzigo: are you using post-copy or autoconverge by the way10:36
sean-k-mooneybauzas: this is the qemu 4.0 issue right https://lore.kernel.org/all/156517411102.26464.1302440989654328620.launchpad@gac.canonical.com/T/10:37
sean-k-mooneyhttps://bugs.launchpad.net/qemu/+bug/183856910:37
zigoNot using post-copy (it's set to default, ie: false)10:38
zigoSame for live_migration_permit_auto_converge (set to default: false)10:39
zigoI'm using TLS though...10:39
zigolibvirt over TLS.10:39
zigoI probably should set live_migration_permit_auto_converge to true though, as sometimes, I have to manually do a live-migration-force-complete ...10:40
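To compare the options discussed above across two hosts, a minimal sketch could read the `[libvirt]` section of each nova.conf and report what differs. The option names and defaults (10, false, false) are the ones stated in this conversation; the helper names `libvirt_migration_options` and `differing_options` are made up for illustration, and plain `configparser` is only an approximation of how oslo.config parses the file.

```python
import configparser

# Defaults as stated in the conversation above, kept as strings.
DEFAULTS = {
    "mem_stats_period_seconds": "10",
    "live_migration_permit_post_copy": "false",
    "live_migration_permit_auto_converge": "false",
}


def libvirt_migration_options(conf_text):
    """Return the [libvirt] options of interest, falling back to DEFAULTS."""
    # strict=False tolerates repeated keys, which nova.conf files may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    section = parser["libvirt"] if parser.has_section("libvirt") else {}
    return {name: section.get(name, default) for name, default in DEFAULTS.items()}


def differing_options(conf_a, conf_b):
    """Return {option: (value_a, value_b)} for options that differ between hosts."""
    a = libvirt_migration_options(conf_a)
    b = libvirt_migration_options(conf_b)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


# Hypothetical nova.conf fragment for one host.
sample = """
[libvirt]
live_migration_permit_auto_converge = true
"""
print(libvirt_migration_options(sample))
```

Calling `differing_options` with the nova.conf text of the source and destination hosts would show, for example, a balloon stats period that is enabled on one side and set to 0 on the other.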
bauzassean-k-mooney: unrelated, was it you who wrote https://etherpad.opendev.org/p/nova-bobcat-ptg#L75 ?10:40
sean-k-mooneyi don't remember that but i am interested in that10:44
sean-k-mooneymaybe dansmith 10:44
sean-k-mooneyor artom. i certainly asked that question or one like it in our internal meetings a few weeks ago10:44
sean-k-mooneymy understanding is the glance folks were going to also work on the nova patches10:45
bauzasI'm just mentioning glance will discuss this with cinder 10:45
sean-k-mooneyand we just needed to review10:45
bauzasand people will join10:45
sean-k-mooneyok are you suggesting we also join that at the same time10:48
bauzasyeah, if interested10:52
bauzasI wrote that in our etherpad10:52
bauzasbut we can ask for a specific glance/nova session if after that some questions remain10:53
bauzasI just arranged this with pdeore10:53
artomsean-k-mooney, bauzas, no, not me11:15
bauzask11:15
stephenfinCould I get another reviewer on that series to remove sqlalchemy-migrate, please? https://review.opendev.org/q/topic:sqlalchemy-20+project:openstack/nova+is:open12:28
sean-k-mooneyso you have two different patch changes there12:54
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/860829 and https://review.opendev.org/c/openstack/nova/+/87242812:54
sean-k-mooneyi reviewed the latter before and was ok to proceed with it12:54
sean-k-mooneyi have not looked at the former12:55
opendevreviewElod Illes proposed openstack/nova stable/victoria: DNM: gate test  https://review.opendev.org/c/openstack/nova/+/87838613:27
opendevreviewMerged openstack/nova stable/victoria: [stable-only][cve] Check VMDK create-type against an allowed list  https://review.opendev.org/c/openstack/nova/+/87169913:34
opendevreviewMerged openstack/nova stable/yoga: Reproducer for bug 1951656  https://review.opendev.org/c/openstack/nova/+/86615313:34
dansmithbauzas: sean-k-mooney wasn't me13:38
bauzasanyway, we'll see this next Thursday then13:38
bauzaselodilles: awesome \o/ https://review.opendev.org/c/openstack/nova/+/87169913:38
bauzasdansmith: gibi or sean-k-mooney: we need to merge this one now https://review.opendev.org/c/openstack/nova/+/875621 given grenade was modified13:39
dansmithbauzas: yep, got it13:41
bauzas++13:43
elodillesbauzas: wow! finally... \o/13:43
dansmithgmann: around yet?14:12
dansmithgmann: I'm not sure I understand the skip-level-always comment.. you made it gate/voting in the template and thus it needs to be in the gate pipeline in nova's zuul as well is that right?14:13
dansmithjust confused by the irrelevant-files part I guess14:13
sean-k-mooneyyou can override it in nova .zuul.yaml14:14
sean-k-mooneyif we wanted to14:14
sean-k-mooneythe in repo options take precedence14:14
sean-k-mooneyyou would just need to add the job by name to check/gate and set voting false. but i think you're really wondering if it should be voting?14:15
sean-k-mooneyor was this a zuul mechanics question14:16
opendevreviewDan Smith proposed openstack/nova master: Add grenade-skip-level-always to nova  https://review.opendev.org/c/openstack/nova/+/87577314:16
dansmithsean-k-mooney: I'm talking about a specific comment *on* nova's zuul.yaml from gmann 14:17
sean-k-mooneyoh hum well you have it in both check and gate and we have irrelevant-files in both14:18
sean-k-mooneyi think they were suggesting editing the project template14:19
sean-k-mooneymaybe14:19
dansmithmaybe we just wait and see what he meant :)14:20
sean-k-mooneysure but i'm not sure i really like the idea of having to have the irrelevant-files we use in the tempest repo14:21
sean-k-mooneysince that depends on the nova repo structure. granted that changes very infrequently but still14:21
sean-k-mooneyit does not feel like this should be there14:21
sean-k-mooneygranted i have also said in the past that i would prefer if the integrated-gate-compute template was in the nova repo but i know why the qa team want to keep those all in one repo14:22
dansmithare you talking about the policies-irrelevant list?14:23
dansmithI dunno why that's named that way, but AFAIK it's defined in this file, not in tempest14:24
opendevreviewDmitry Tantsur proposed openstack/nova master: ironic: clean up references to memory_mb/cpus/local_gb  https://review.opendev.org/c/openstack/nova/+/87841814:29
opendevreviewMerged openstack/nova stable/yoga: Handle mdev devices in libvirt 7.7+  https://review.opendev.org/c/openstack/nova/+/86615414:57
bauzaswhoami-rajat: it's quite late for your time, but it looks like we need a cinder-nova cross-project session, IIRC my emails :)14:59
bauzasfor the vPTG15:00
whoami-rajatbauzas, yeah i was just reading through it, I haven't finalized timing for cinder topics yet, do you have any time in mind?15:00
whoami-rajatwe're going to have a cross project with glance on thursday15:01
bauzasyup, I discussed that this morning with pdeore15:02
bauzaswhoami-rajat: Sofia was requesting the last Thursday slot or the first Wed slot15:04
bauzaswhoami-rajat: I can somehow set it for Wed 1300UTC, would that work for you ?15:05
whoami-rajatbauzas, hmm, the problem with first slots is we don't have full gathering, would thursday last slot work for you? 1600-1700 UTC15:05
bauzaswhoami-rajat: sure15:06
bauzaslet's take it15:06
whoami-rajatgreat!15:06
whoami-rajatthanks15:06
bauzaswhoami-rajat: would it work if that would be in our room ?15:06
whoami-rajatbauzas, sure, we can move there, is it on Zoom or any other platform?15:07
bauzaswhoami-rajat: we use the diablo room (zoom)15:07
whoami-rajatcool, we will be there15:08
bauzasI just added it in our etherpad https://etherpad.opendev.org/p/nova-bobcat-ptg#L9115:08
bauzaswhoami-rajat: and if you see other topics to discuss, please add them there15:08
whoami-rajatbauzas, sure, sounds good!15:09
bauzas++15:10
bauzasg'night15:10
dtantsurHey folks! Seeing this in the ironic grenade job: https://zuul.opendev.org/t/openstack/build/d90374e9b6554704a2f84b7fe8a9d411/log/controller/logs/screen-n-api.txt#418215:44
dtantsurrings any bells?15:44
dtantsurIt's quite possible that the ironic virt driver does not indeed support 'openstack console log show', but why is it called?15:45
dtantsurhmm, maybe a red herring. judging by https://opendev.org/openstack/grenade/commit/adcb563b185416451da419186a8d7773ffb6b913 it happens if ping fails.15:47
dtantsur(would be cool to check the virt driver before doing it)15:47
clarkbdtantsur: I think one of the responses to a failed tempest test is to dump the instance console log. This is often useful if there are networking issues because with a VM the console is accessed via libvirt and not the network and the logs often show you if dhcp failed etc15:48
dtantsurRight. Probably needs to exclude VIRT_DRIVER=ironic15:48
gmanndansmith: hi17:08
gmanndansmith: I mean to add irrelevant-files in the gate pipeline in the same way you did in the check pipeline, to avoid running grenade on doc-only changes etc17:08
sean-k-mooneygmann: that's in the current patch but i don't know if you left your comment on a previous version17:50
sean-k-mooneygmann: https://review.opendev.org/c/openstack/nova/+/875773/6/.zuul.yaml#79817:50
sean-k-mooneyah i see so in v5 it was being added to the gate pipeline via the template17:52
sean-k-mooneynot explicitly 17:52
sean-k-mooneyso ya v6 addressed your comment17:52
sean-k-mooneyhowever it failed in v6 for some reason when it ran17:53
sean-k-mooneytest_security_group_rules_create17:53
sean-k-mooneyweird, that looks unrelated17:53
sean-k-mooneyi wonder why that failed17:54
gmannsean-k-mooney: yes, already +2 on that17:55
sean-k-mooneyi'm just quickly checking the logs before rechecking and +2ing17:55
gmannohk17:56
gmannI checked it from the failing test log but did not go deep17:56
sean-k-mooney nova.api.openstack.wsgi.Fault: Instance 08070e41-68b0-4dd3-9eb6-1926c3082060 could not be found.17:56
sean-k-mooneythere are a bunch of faults like that in the nova api17:57
sean-k-mooneyalthough im not sure its the same test17:57
gmannapi log might be confusing on NotFound due to negative tests17:59
sean-k-mooneyya i was assuming that too17:59
sean-k-mooneybut i was just checking to see if there are any references to that test17:59
sean-k-mooneyah found the request id req-90bd91de-9909-4b0a-a16f-1de245c6783418:00
sean-k-mooneyreq-90bd91de-9909-4b0a-a16f-1de245c67834 tempest-SecurityGroupRulesTestJSON-617656270 tempest-SecurityGroupRulesTestJSON-617656270-project-member] 10.209.0.48 "POST /compute/v2.1/os-security-groups" status: 200 18:00
sean-k-mooneyso nova was happy with it18:01
sean-k-mooneyi'm not seeing issues on the neutron side so i'm pretty happy this is a one off failure18:03
sean-k-mooneyi just have not seen that test fail before at least not that stuck out in my memory18:04
sean-k-mooneyso i wanted to check it a little more deeply  in case it was a real failure18:04
opendevreviewMerged openstack/nova master: Add grenade-skip-level-always to nova  https://review.opendev.org/c/openstack/nova/+/87577321:06
*** promethe- is now known as prometheanfire23:08
*** seebaer is now known as seba23:19
