Friday, 2022-03-11

*** EugenMayer4 is now known as EugenMayer08:36
opendevreviewMaxim Korezkij proposed openstack/nova master: Handle volume attachments  https://review.opendev.org/c/openstack/nova/+/83323209:13
opendevreviewMaxim Korezkij proposed openstack/nova master: Handle volume attachments  https://review.opendev.org/c/openstack/nova/+/83323309:15
opendevreviewMaxim Korezkij proposed openstack/nova master: Handle volume attachments  https://review.opendev.org/c/openstack/nova/+/83323409:21
lajoskatonabauzas: Hi, do you know if there is any Nova/Neutron xproject topic? On the nova etherpad there's no at the moment, but perhaps you know more :-)09:28
bauzaslajoskatona: hey :)09:30
bauzaslajoskatona: I created the etherpad last week, so for the moment, we don't have a lot of topics09:30
bauzaslajoskatona: but you can surely add your own topics if you want and we can find some timeslot after ;)09:31
opendevreviewribaudr proposed openstack/nova master: Fix unit tests when they are run with OS_DEBUG=True  https://review.opendev.org/c/openstack/nova/+/83311509:31
lajoskatonabauzas: ok, thanks, I will check, but currently I don't know any09:31
bauzaslajoskatona: I can add a section, in case people want to discuss 09:32
lajoskatonabauzas: thanks, perhaps that's the best , and we will see if there will be concrete issues/features to discuss09:35
bauzaslajoskatona: just updating the etherpad09:36
*** gibi is now known as gibi_pto09:38
gibi_ptoI will be back around a bit Monday and Thursday09:39
elodillesgibi_pto: we will have national holiday on those days ;)09:42
gibi_ptonot on thursday :)09:42
elodillestrue :)09:43
gibi_ptoand my wife works on Monday so she cannot prevent me from looking at IRC09:43
kashyapgibi_pto: Get off IRC, you're polluting your PTO, really :)09:45
elodilles:]09:45
kashyapEmbrace JOMO (joy of missing out).09:46
opendevreviewOpenStack Release Bot proposed openstack/nova stable/yoga: Update .gitreview for stable/yoga  https://review.opendev.org/c/openstack/nova/+/83324109:54
opendevreviewOpenStack Release Bot proposed openstack/nova stable/yoga: Update TOX_CONSTRAINTS_FILE for stable/yoga  https://review.opendev.org/c/openstack/nova/+/83324209:54
opendevreviewOpenStack Release Bot proposed openstack/nova master: Update master for stable/yoga  https://review.opendev.org/c/openstack/nova/+/83324309:54
opendevreviewOpenStack Release Bot proposed openstack/nova master: Add Python3 zed unit tests  https://review.opendev.org/c/openstack/nova/+/83324409:54
opendevreviewribaudr proposed openstack/nova master: [WIP] Attach Manila shares via virtiofs(manila abstraction)  https://review.opendev.org/c/openstack/nova/+/83119410:02
opendevreviewribaudr proposed openstack/nova master: [WIP] Enable and use COMPUTE_STORAGE_VIRTIO_FS and COMPUTE_MEM_BACKING_FILE traits.  https://review.opendev.org/c/openstack/nova/+/83309010:02
opendevreviewElod Illes proposed openstack/nova stable/yoga: [stable-only] Update .gitreview for stable/yoga  https://review.opendev.org/c/openstack/nova/+/83324110:03
opendevreviewElod Illes proposed openstack/nova stable/yoga: [stable-only] Update TOX_CONSTRAINTS_FILE for stable/yoga  https://review.opendev.org/c/openstack/nova/+/83324210:03
bauzasgibi_pto: enjoy your time off 10:07
bauzaselodilles: so not around too on Monday and Tuesday ?10:09
elodillesbauzas: yepp, those are holidays here10:13
bauzascool, enjoy then !10:14
elodillesbauzas: thanks :)10:15
elodillesbauzas: though i also might be around, will look at IRC and participate on the weekly meeting :)10:16
bauzaselodilles: this is holidays10:16
elodillesbut i don't promise anything o:)10:16
elodillesbauzas: yepp :D10:16
bauzaselodilles: don't be around and rest 10:16
elodillesbauzas: will do that as well ;)10:16
bauzasin August, I'll be off for 3.5 weeks and I'd be back 2 days before FF...10:17
elodillesthat sounds like a brave thing :D10:17
elodillesbut the team is here, so there will be surely someone who can help in :)10:18
elodilleswill it be a big family - tesla tour around France? :)10:19
opendevreviewchangxin xiao proposed openstack/nova master: Fix openstack/nova git repo https://bugs.launchpad.net/nova/+bug/1964548  https://review.opendev.org/c/openstack/nova/+/83324810:24
bauzaselodilles: just the usual annual family holidays with 2 weeks in Corsica, indeed10:28
elodillesthat also sounds relaxing :)10:28
bauzaswith 2 kids ? not exactly relaxing 10:42
bauzasbut at least, it's called "holidays"10:42
bauzasthe only change is I won't dad taxi every day10:42
opendevreviewMaxim Korezkij proposed openstack/nova master: fixup! Handle volume attachments  https://review.opendev.org/c/openstack/nova/+/83325710:44
opendevreviewMaxim Korezkij proposed openstack/nova master: Handle volume attachments  https://review.opendev.org/c/openstack/nova/+/83323410:45
elodillesbauzas: let's say, different kind of relaxing :D11:57
opendevreviewMerged openstack/nova stable/yoga: [stable-only] Update .gitreview for stable/yoga  https://review.opendev.org/c/openstack/nova/+/83324112:46
opendevreviewMerged openstack/nova stable/yoga: [stable-only] Update TOX_CONSTRAINTS_FILE for stable/yoga  https://review.opendev.org/c/openstack/nova/+/83324212:51
opendevreviewErlon R. Cruz proposed openstack/nova master: Adds regression test for bug LP#1944619  https://review.opendev.org/c/openstack/nova/+/83316613:55
opendevreviewErlon R. Cruz proposed openstack/nova master: Fix pre_live_migration rollback  https://review.opendev.org/c/openstack/nova/+/81532413:55
bauzashuzzah, we now have master be Zed :)14:01
bauzasthanks elodilles14:01
chateaulavall hail Zed14:02
*** dasm|off is now known as dasm14:06
dansmithkashyap: I wonder if you might be interested in chasing a CI failure I've seen a couple times on centos jobs, where we ask libvirt to detach a volume, and it just never happens14:29
dansmithit's just, AFAICT, a simple file-based volume so I dunno if it's a guest refusing to let it go or what14:30
kashyapdansmith: Got a bug or a link?  (Detach volume sucks the marrow out of my life ... but got used to it :D)14:36
dansmithyeah, lemme get one14:36
kashyapIs it this one? - https://bugs.launchpad.net/nova/+bug/196034614:39
dansmithkashyap: https://zuul.opendev.org/t/openstack/build/87df2018e335440f830b08fe1a05bfb7/logs14:41
* kashyap clicks14:42
dansmithkashyap: ah, looks like it!14:42
kashyapOkay, I was hoping: "Oh, not again, not a new one" -- me and Gibi recently spent a few days chasing it down14:43
dansmithso what's the outcome there? it looks like just making a centos job non-voting :)14:43
* kashyap goes to check the email thread about it14:44
kashyapThere was a direct thread about it w/ libvirt folks.  We had two hypotheses:14:45
kashyap(Actually I noted both in the bug.  Lemme look what's the current status)14:45
kashyapdansmith: Okay, to summarize, our two hypotheses were these:14:45
kashyap1) the guest OS didn't confirm the detach14:45
kashyap2) there was a recent bug in QEMU triggered by using JSON syntax for `-device`14:45
kashyapIt turns out to be #114:46
dansmithbut why does this seem to only happen on the centos jobs? it's all the same cirros in the guest right?14:46
kashyapFor now, we've hacked around it by adding extra delay :-(14:46
dansmithare you talking about the wait_until=SSHABLE patches in terms of the delay?14:47
kashyapdansmith: Good question!  Damned if I know, why it's happening only on CentOS jobs.  (I'm tempted to say "something to do w/ virt-package versions")14:47
kashyapdansmith: No, that's the bigger thing that's not merged yet (IIRC).  There was a 120-sec delay patch from Gibi ... lemme look14:48
dansmithbecause I think it's still happening, AFAICT14:48
dansmithoh jeez, 120s?14:48
sean-k-mooneykashyap: the centos 9 stream jobs have a much, much newer version of libvirt and qemu vs ubuntu14:48
sean-k-mooneyi think sshable has merged or mostly merged in the last day or two14:49
dansmithright, which is why I'm not sure how we could ever really run on something so bleeding edge where we need stability14:49
sean-k-mooneyim not sure if all of the patches are landed but there was progress on them14:49
dansmithsean-k-mooney: that landed on 3/3 I think14:49
sean-k-mooneywell its the same version we will be releasing stable wallaby on downstream14:49
sean-k-mooneyalso ubuntu 22.04 will have similar version when it releases 14:50
sean-k-mooneyso its kind of good that centos 9 stream is catching this14:50
dansmithsean-k-mooney: doesn't stream9's version track upstream closer, like almost constantly moving?14:51
kashyapNot quite; it is the "upstream of RHEL"14:51
kashyapSo not as bleeding as Fedora, but not as "stable" as RHEL either14:52
sean-k-mooneydansmith: the packages are older than fedora but newer than rhel, but not by much14:52
kashyapSo you might get the worst of everything w/ Stream :D14:52
dansmithheh14:52
sean-k-mooneyeffectively stream is what would be in the next point release of rhel14:53
dansmithwell, I'd be concerned if we're adding sleeps to paper over something that would be a real problem with the host triggering the guest to release the block device14:53
dansmithseems like something must have changed if the guest is identical14:53
sean-k-mooneydansmith: the sleep was to see if it was related to the kernel booting14:53
sean-k-mooneynot an actual fix14:53
sean-k-mooneybefore the sshable series landed14:53
dansmithsean-k-mooney: ah okay I thought kashyap was suggesting that was the workaround14:54
sean-k-mooneyat least that was my understanding14:54
dansmith[06:46:25]  <kashyap> For now, we've hacked around it by adding extra delay :-(14:54
dansmiththis ^ but maybe that's not what he meant14:54
sean-k-mooneythe sleep i believe was in tempest not nova if its the patch im thinking of14:54
kashyapdansmith: Sorry, I should've been clearer; I don't see the 120sec patch in tree; but I swear Gibi mentioned it in a thread14:54
dansmithsean-k-mooney: yeah I assumed tempest14:55
kashyapsean-k-mooney: For the SSHable series to land, "someone" (a body) needs to shepherd it...Not sure who has the will for it14:55
sean-k-mooneykashyap: gmann took it over14:56
sean-k-mooneyi think its landed already14:56
dansmithyeah it's already merged14:56
dansmithand  I still see fails, like this one I think from five days later:14:56
dansmithhttps://zuul.opendev.org/t/openstack/build/ee63e247893c42a69e096e14f430585014:56
dansmithI haven't dug into that one yet but looks the same14:57
sean-k-mooneyya we did not know if the sshable would fix it, we just hoped it would reduce the issue. we saw some cases where tests were doing attach, detach and live migrate all before the kernel was at the login prompt from the initial boot14:58
sean-k-mooneyand the kernel then crashed during the migration14:58
dansmithokay, but nova retries things like the volume detach like ten times,14:58
dansmithso I would expect that would resolve that race for detach right?14:59
sean-k-mooneyya so we need to remove that retry14:59
sean-k-mooneygibi and i spoke about this a while ago14:59
sean-k-mooneykashyap: can probably verify but i think qemu now considers it an error to detach an already detaching volume14:59
dansmithis libvirt/qemu only requesting the guest drop it once? because subsequent ones seem to be a libvirt refusal to try again15:00
dansmithyeah I see that error in the logs after the first attempt15:00
sean-k-mooneyya so it used to be undefined behavior that on some releases aborted the detach, then qemu started rejecting it and 15:00
kashyapsean-k-mooney: Yes, that's right - a device that is already being unplugged, QEMU will consider another attempt at it an error15:00
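[Editor's note] The point made above (a second unplug request for a device that is already being unplugged is treated as an error) suggests issuing the detach request once and then only polling for the device to disappear. The following is a minimal sketch of that idea, not Nova's actual detach code: `detach_once_then_poll`, `DetachTimeout`, and `FakeGuest` are hypothetical names introduced purely for illustration, and the two callables stand in for whatever the virt driver would really call.

```python
import time


class DetachTimeout(Exception):
    """Raised when the device is still present after all polls (hypothetical)."""


def detach_once_then_poll(request_detach, device_present,
                          attempts=10, interval=1.0, sleep=time.sleep):
    """Send one detach request, then poll instead of re-requesting.

    request_detach and device_present are caller-supplied callables;
    they are assumptions standing in for the real hypervisor calls.
    """
    request_detach()  # issued exactly once, never repeated
    for _ in range(attempts):
        if not device_present():
            return True  # the guest released the device
        sleep(interval)
    raise DetachTimeout("device still attached after %d polls" % attempts)


class FakeGuest:
    """Toy guest that releases the device after a number of polls."""

    def __init__(self, polls_before_release):
        self.requests = 0
        self.polls_left = polls_before_release

    def request_detach(self):
        self.requests += 1

    def device_present(self):
        if self.polls_left > 0:
            self.polls_left -= 1
            return True
        return False
```

With a `FakeGuest(3)`, the helper returns after the fourth poll and `requests` stays at 1. This does not address the case discussed in the log where the guest never acknowledges the request at all; it only avoids the error from repeated requests.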
sean-k-mooneyim not really sure what the intended behavior is now15:01
dansmithso .. what to do? if the guest isn't ready to handle the signal we're just screwed until reboot or something?15:01
dansmiththat seems pretty broken15:01
sean-k-mooneyi dont know honestly. i dont think we want to do what we did in the past for snapshot which was stop and start the vm15:02
sean-k-mooneywhen live snapshots were not possible15:02
dansmithyeah :/15:02
sean-k-mooneyother than that i dont know of a way to force this from nova15:02
dansmithbut again, I'm still curious about why this seems to only happen on centos hosts15:02
dansmithunless you think that behavior changed between 20.04's libvirt and now?15:03
sean-k-mooneywe were seeing it on ubuntu too with q3515:03
kashyapdansmith: So, this was the DNM patch of 120sec delay: https://review.opendev.org/c/openstack/devstack/+/828705/8/.zuul.yaml15:03
sean-k-mooneyi dont think we have seen this with pc and ubuntu15:03
dansmithsean-k-mooney: oh, are the centos jobs all q35 by default?15:03
sean-k-mooneyno i dont think so15:04
kashyapThe thing is, CirrOS needs 10 sec to boot in our CI, but Nova returns "ACTIVE" when the guest spawns.15:04
dansmithkashyap: ah that's the timeout not a delay, and just runs more futile retries I guess?15:04
sean-k-mooneybut it could be a combination of the different way attach is done with q35 and the new version's treatment of pc15:04
kashyapRight, I read the timeout as "delay before the detach"15:04
sean-k-mooneydansmith: we could try using a debian or maybe ubuntu latest job to verify15:04
sean-k-mooneyill check nodepool quickly15:05
sean-k-mooneybut i think we have a debian 11 image available15:05
sean-k-mooneyand i think it will have a similar libvirt as c9s15:05
kashyapdansmith: Also, yes, Q35 does have some additional hidden special bugs with hot unplug.  I was told <cough> CentOS/RHEL 8.6 has better fixes in that area15:05
kashyap(Ouch, I should not use the c-word in these times)15:06
dansmithkashyap: may I suggest <ahem>15:06
kashyapHeh, sure15:07
sean-k-mooneydebian-bullseye is there so we could try recreating it on that15:07
kashyapdansmith: Incidentally, I was supposed to work with Red Hat QE today/Monday to test those bits15:08
dansmithsean-k-mooney: so what's the point of doing that? to see if it seems to be characteristic of new libvirt/qemu and not something else about stream9 itself?15:08
kashyap("those bits" == supposed fixes in QEMU from 8.6)15:09
dansmithkashyap: does that make it to stream9 at some point I hope?15:09
kashyapYes, definitely.  They should.15:09
kashyap"There were number of improvements for both native PCI-E (albeit it still slow to react (due to how it's implemented in guest OS) and now q35 supports ACPI base hotplug, can you check with latest machine type (which supposedly should use ACPI hotplug) and see if it resolved the issue."15:10
sean-k-mooneydansmith: yep basically15:10
sean-k-mooneyhttps://bugzilla.redhat.com/show_bug.cgi?id=200712915:10
kashyapThat's the comment from a PCI(e) dev from QEMU (from a RHT bug)15:10
sean-k-mooney^  that is the main bug right15:10
kashyap(Where "latest machine type" == 8.6 / 9)15:10
dansmithack15:12
sean-k-mooneyfrom that bug they say "This bug is related to some change in qemu-6.2 so it should not be there in RHEL 8.4/8.2,"15:12
sean-k-mooneyhowever we have had detach issue on 8.4 15:12
sean-k-mooneyanyway it might just be down to the use of 6.2+ on c9s15:12
*** hemna1 is now known as hemna15:27
opendevreviewAndre Aranha proposed openstack/nova master: Move FIPS jobs to experimental and periodic queue  https://review.opendev.org/c/openstack/nova/+/83343115:50
opendevreviewsean mooney proposed openstack/nova master: [WIP] enable block VDPA operations  https://review.opendev.org/c/openstack/nova/+/83233015:55
opendevreviewsean mooney proposed openstack/nova master: [WIP] enable blocked VDPA operations  https://review.opendev.org/c/openstack/nova/+/83233015:55
opendevreviewAndre Aranha proposed openstack/nova master: Test setting the nova job to centos-9-stream  https://review.opendev.org/c/openstack/nova/+/83184415:56
sean-k-mooneyi need to add a few unit tests and a release note but i think ^ that is basically done15:56
sean-k-mooneyi still want to test this with real hardware however before i do and i probably need to update the docs too15:57
* bauzas stops to work for this week15:58
bauzas\o15:58
sean-k-mooneyo/16:01
opendevreviewsean mooney proposed openstack/nova stable/xena: reenable greendns in nova.  https://review.opendev.org/c/openstack/nova/+/83341116:14
opendevreviewsean mooney proposed openstack/nova stable/wallaby: reenable greendns in nova.  https://review.opendev.org/c/openstack/nova/+/83343516:21
opendevreviewsean mooney proposed openstack/nova stable/victoria: reenable greendns in nova.  https://review.opendev.org/c/openstack/nova/+/83343616:22
opendevreviewsean mooney proposed openstack/nova stable/ussuri: reenable greendns in nova.  https://review.opendev.org/c/openstack/nova/+/83343716:23
opendevreviewsean mooney proposed openstack/nova stable/train: reenable greendns in nova.  https://review.opendev.org/c/openstack/nova/+/83343816:23
opendevreviewTakashi Natsume proposed openstack/nova master: Update min supported service version for Zed  https://review.opendev.org/c/openstack/nova/+/83344016:49
zigoI'm packaging nova RC1. I've seen that os-win is removed. Is that library useless now?17:05
zigoOh... setup.cfg ... :P17:06
opendevreviewTakashi Natsume proposed openstack/nova master: Update contributor guide for Zed  https://review.opendev.org/c/openstack/nova/+/83344117:06
sean-k-mooneyzigo: its an optional dep (and always was) so its now in extras17:21
sean-k-mooneyzigo: https://github.com/openstack/nova/commit/86d87be8db588cc3125d53cd92e271fb45b1a3aa for context17:22
sean-k-mooneyzigo: this was partly prompted by unmaintained packages that were breaking the gate17:23
zigosean-k-mooney: Is zVMCloudConnector completely gone?17:24
zigoOr will it stay ...17:24
zigoIn other words: should I ask for its removal from Debian and erase all traces of it?17:25
gmanndansmith: kashyap sean-k-mooney for detach failure/SSHable things, this last one needs to be merged, rescue negative test which this patch making SSHable was failing in reported bug. https://review.opendev.org/c/openstack/tempest/+/83160817:25
gmannit is not ready, need to debug on change failure though 17:25
dansmithgmann: ah are you saying that the sshable patches that already merged are working and that there are just a few remaining ones needing to be converted (in that patch)?17:29
sean-k-mooneyzigo: we still have it in tree https://github.com/openstack/nova/tree/master/nova/virt/zvm im not sure what its state is17:30
gmanndansmith: patches merged are few volume detach are made SSH-able but the failing test in centos9-stream was rescue negative which is in-progress in 83160817:30
gmanndansmith: those merged were not failing, may be due to the wait between server create and detach operation call. in rescue negative timing were playing key role 17:31
dansmithgmann: okay the latest failure I'm looking at includes test_rescued_vm_detach_volume but there's another in there, which may or may not be related17:32
dansmithbut yeah good to know17:32
sean-k-mooneydansmith: how often are you seeing the failure by the way17:34
sean-k-mooneyis it blocking the gate consistently17:34
gmanndansmith: yeah that test and in negative test just try the detach and assert expected failure as detach cannot be done on rescue server but later this test does un-rescue server and detach in cleanup there it stuck 17:34
dansmithsean-k-mooney: on centos, this was one very common one we were suffering in the glance job17:34
gmann* in that negative test17:34
sean-k-mooneydoes glance need to test this?17:35
dansmithgmann: do those tests actually ssh for some reason, or are we just using ssh to determine readiness?17:35
gmanndansmith: just to check readiness 17:35
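[Editor's note] The idea of using SSH purely as a readiness signal can be sketched as a loop that waits until the guest's SSH port accepts a TCP connection. This is a cruder check than the SSHABLE wait discussed in the log (an open port is not the same as a working login), and it is not tempest's implementation; `wait_for_port` and its parameters are hypothetical names chosen for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0, connect_timeout=1.0):
    """Return True once host:port accepts a TCP connection, else False.

    Hypothetical helper: retries until `timeout` seconds have elapsed,
    sleeping `interval` seconds between failed attempts.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test could call something like `wait_for_port(guest_ip, 22)` before attaching or detaching volumes, where `guest_ip` is whatever address the test has for the server.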
dansmithsean-k-mooney: yeah, this was a glance-cinder-multistore job which needs to run volume-related tests of course17:35
sean-k-mooneysome volume operations are certainly glance related but detach probably isnt17:35
dansmithgmann: we could use the login prompt via console instead to reduce the need for secgroups, if that's hard for some reason17:36
dansmithsean-k-mooney: it's a job that tests cinder-glance multistore arrangements17:36
gmanndansmith: that is one try if that fix it. but there might be some other issue. we will see if that patch (once pass gate) can pass centos9 job too17:36
sean-k-mooneydansmith: we removed using the console in an earlier patch17:36
dansmithsean-k-mooney: but the only reason we were running on centos was because we were trying to get a fips job and used one of our existing wide-coverage jobs to do that17:36
dansmithgmann: ack17:37
gmannyeah SSH-able was preferred than console check17:37
dansmithit's definitely good, it's just more than required for this but fair enough17:37
gmannlet me debug sec group thing today or monday. I compared and it was same as other test doing but i might be missing something 17:37
sean-k-mooneyi think the console check failed in a specific edgecase but i dont recall what it was exactly. 17:38
dansmithit's just out-of-band, so a little less fragile,17:38
dansmithbut it's cool if we're trying to stick to sshable as the indicator17:38
opendevreviewArtom Lifshitz proposed openstack/nova master: Add whitebox-devstack-multinode job to periodic  https://review.opendev.org/c/openstack/nova/+/83345318:10
*** artom__ is now known as artom18:10
opendevreviewDan Smith proposed openstack/nova master: Attempt to thin out nova-ceph-multistore  https://review.opendev.org/c/openstack/nova/+/83347021:46
*** dasm is now known as dasm|off22:27
