Monday, 2021-08-30

opendevreviewBalazs Gibizer proposed openstack/nova master: Add force kwarg to delete_allocation_for_instance  https://review.opendev.org/c/openstack/nova/+/68880206:11
gibimelwitt: ^^ removed the co-authored line as you requested06:11
gibilyarwood, stephenfin, bauzas: we are still pretty much blocking the openstack gate without ^^06:34
lyarwoodI'm out today, last public holiday of the year in the UK but I'll review from my phone now.06:45
lyarwoodOkay done, LGTM.06:49
gibilyarwood: thanks, enjoy your day off06:57
abhishekkgibi, py38 post failure for https://review.opendev.org/c/openstack/nova/+/688802, could you please add a recheck?08:39
elodillesgibi: could you please have a quick look at this placement release patch for stable/ussuri? (it's a generated patch to avoid release rush around EM transition): https://review.opendev.org/c/openstack/releases/+/80211010:01
gibielodilles: ack I will check10:07
gibiabhishekk feel free to recheck next time10:09
gibielodilles: done and thanks10:11
elodillesgibi: thanks \o/10:12
gibithe force kwargs patch https://review.opendev.org/c/openstack/nova/+/688802  bounced from the gate due to bug 1912310, I've requeued it12:21
sean-k-mooneywhat causes https://bugs.launchpad.net/nova/+bug/191231012:24
gibiI saw libvirt internal errors like12:25
gibi2021-07-30 08:56:25.528+0000: 57632: error : virProcessRunInFork:1159 : internal error: child reported (status=125): unable to open /dev/sda: No such device or address12:25
sean-k-mooneyok so it looks like its actually libvirt that is having issues, not nova connecting to it12:26
gibiyepp12:26
gibias far as I understand12:26
gibithere are also occasions with 12:27
gibivirKeepAliveTimerInternal:137 : internal error: connection closed due to keepalive timeout12:27
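When triaging failures like the two libvirt errors quoted above, it can help to split each log line into fields and bucket it by failure mode. The following is only a minimal sketch: the line layout is inferred from the single full sample pasted in this log, and the bucket names in `classify` are hypothetical labels, not anything libvirt or nova defines.

```python
import re

# Layout inferred from the sample line in this log:
# "<date> <time+tz>: <pid>: <level> : <function>:<line> : <message>"
LIBVIRT_LINE = re.compile(
    r"^(?P<timestamp>\S+ \S+): (?P<pid>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<lineno>\d+) : (?P<message>.*)$"
)


def parse_libvirt_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LIBVIRT_LINE.match(line.strip())
    return match.groupdict() if match else None


def classify(message):
    """Hypothetical buckets for the two failure modes mentioned in the chat."""
    if "keepalive timeout" in message:
        return "keepalive-timeout"
    if "child reported" in message:
        return "fork-child-error"
    return "other"


sample = (
    "2021-07-30 08:56:25.528+0000: 57632: error : virProcessRunInFork:1159 : "
    "internal error: child reported (status=125): unable to open /dev/sda: "
    "No such device or address"
)
fields = parse_libvirt_line(sample)
print(fields["func"], classify(fields["message"]))
```

Counting the buckets per job run would show whether the keepalive timeouts and the fork errors tend to appear together or independently.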
sean-k-mooneywe are not seeing any OOM events or anything else strange on the node at the time are we12:27
gibiI just checked like two occurrences and found no such thing12:27
gibithe nova-live-migration job was set to non-voting due to this12:28
gibibut it seems we can hit the same in nova-next too12:28
gibibut a lot less frequently12:28
sean-k-mooneyyep its failing in nova-next in this case12:29
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/f888b58ca23f49fc8f9046e9c2ad18a0/log/controller/logs/screen-dstat.txt12:29
gibiyes12:29
gibithat is basically the first time I've seen it in nova-next12:29
sean-k-mooneywe got down to 120MB a few times but i dont see any real evidence of memory issues so likely not the kernel randomly killing things12:29
gibiaround the time of the failure we were floating around 300MB free12:32
sean-k-mooneyya its unlikely to be the cause but we have seen OOM issues break libvirt and other processes in weird ways before.12:33
gibitrue, oom can cause weird things12:34
sean-k-mooneyill quickly check the cloud archive12:35
sean-k-mooneyperhaps there is a newer libvirt available we could use instead12:35
gibididn't we use the max available?12:36
sean-k-mooneywell im not sure we are using the xena cloud archive currently12:37
sean-k-mooneybut looking at it they are not shipping libvirt/qemu in the cloud archive currently12:37
sean-k-mooneywe typically dont use the most recent cloud archive version12:37
sean-k-mooneyso ya looks like we are using 6.x for ubuntu  "libvirt0:amd64                       6.0.0-0ubuntu8.12"  12:39
sean-k-mooneyon centos stream with the advanced virt module we would be using 7.x.y12:40
sean-k-mooneyits a long shot but we could enable this ppa as a test to see if that would resolve it. its the one i use when i need newer libvirt on ubuntu but dont want to build from source12:43
sean-k-mooneyhttps://launchpad.net/~jacob/+archive/ubuntu/virtualisation12:43
sean-k-mooneyalthough that still only provides  6.6.0-1ubuntu2~ppa0 12:43
sean-k-mooneynot 7.x12:44
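The version comparison above (distro package 6.0.0, PPA 6.6.0, neither in the 7.x series) can be checked mechanically. This is a rough sketch that only compares the upstream part of a Debian-style version string; it does not implement full dpkg ordering rules (for example `~` sorting), and the helper name `upstream_version` is made up for illustration.

```python
import re


def upstream_version(package_version):
    """Extract the upstream numeric tuple from a Debian-style version string.

    The Debian revision after the first "-" (for example "0ubuntu8.12")
    is ignored, as is an optional epoch prefix such as "1:".
    """
    upstream = package_version.split("-", 1)[0]
    upstream = upstream.split(":", 1)[-1]
    return tuple(int(part) for part in re.findall(r"\d+", upstream))


distro = upstream_version("6.0.0-0ubuntu8.12")   # version seen in the job
ppa = upstream_version("6.6.0-1ubuntu2~ppa0")    # version offered by the PPA

print(distro, ppa)
print(ppa > distro)        # is the PPA build newer than the distro package?
print(ppa >= (7, 0, 0))    # does it reach the 7.x series?
```

For real package ordering, `dpkg --compare-versions` is the authoritative tool; the tuple comparison here is only enough to tell the major and minor series apart.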
gibiI'm not sure how we can enable this in infra but feel free to go ahead. We can use the nova-live-migration job as a canary as that is now non-voting but still runs for almost all of our patches12:45
sean-k-mooneyya i might propose a DNM patch just to see if that works. if it does it means we need to talk to canonical about a missing backport12:45
sean-k-mooneyproblem is i have no idea what is missing12:46
gibicool, good idea12:46
sean-k-mooneythe other alternative would be to move from ubuntu 20.04 to 21.04 or to centos 8 on the affected jobs12:46
sean-k-mooneywell there is another alternative too which is to compile libvirt/qemu from source, which i have a devstack plugin to do, but i would prefer to avoid that mainly due to extra job time. its not hard to do but if we can just use distro packages in this case its nicer12:49
gibias we declare our supported distros ahead of the release I would go with trying to fix ubuntu 20.04 https://governance.openstack.org/tc/reference/runtimes/xena.html12:49
sean-k-mooneyyes although centos 8 stream is valid too. but ya ill see if i can look into this a little later today. ill propose a couple of different patches for different options.12:50
sean-k-mooneyenabling "sudo add-apt-repository ppa:jacob/virtualisation" in a pre playbook is simple as is changing the base os to centos 8 stream 12:51
sean-k-mooneythe other options are more complicated but doable12:51
opendevreviewMerged openstack/nova master: Functional tests removed direct post call  https://review.opendev.org/c/openstack/nova/+/76606813:06
sean-k-mooneygibi: by the way we maintain a tempest plugin called whitebox that looks at some of the internals of how nova works and asserts that it does the right thing. would you have any objection to me enabling that for a subset of nova changes, at least in a non-voting capacity initially?13:22
sean-k-mooneygibi: i was thinking of making it run on changes to the libvirt driver and hardware.py13:22
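Limiting a job to changes that touch the libvirt driver and hardware.py is usually expressed as a list of path regexes, where the job runs if any changed file matches any pattern. The sketch below illustrates that matching logic only; the two patterns are an assumption about where those files live in the nova tree, and `job_should_run` is a hypothetical helper, not Zuul code.

```python
import re

# Assumed path patterns for "the libvirt driver and hardware.py".
FILE_PATTERNS = [
    r"^nova/virt/libvirt/.*$",
    r"^nova/virt/hardware\.py$",
]


def job_should_run(changed_files, patterns=FILE_PATTERNS):
    """Return True if any changed file matches any of the path patterns."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return any(rx.match(path) for path in changed_files for rx in compiled)


print(job_should_run(["nova/virt/libvirt/driver.py"]))
print(job_should_run(["nova/api/openstack/compute/servers.py"]))
```

A matcher like this keeps the extra job off most patches, which speaks to the CI resource concern raised in the reply that follows.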
gibisean-k-mooney: I have no problem with it if it is actively maintained and won't take up much of the CI resources13:23
sean-k-mooneyyes its maintained and runs downstream, we also maintain the devstack support upstream13:25
sean-k-mooneygibi: upstream many of the tests are disabled because we dont have the hardware https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/whitebox_tempest_plugin/api/compute13:25
sean-k-mooneyi.e. we can't run the pmem, sriov or vgpu tests in the gate13:26
sean-k-mooneyi know we were looking to see if we could use this for third party ci but we still are having problems finding hardware internally to run it13:26
opendevreviewBalazs Gibizer proposed openstack/nova master: Add two new hacking rules  https://review.opendev.org/c/openstack/nova/+/80566813:27
gibijust based on the test file names even without special hardware this plugin has useful coverage13:28
sean-k-mooneyyep it has all the tests that were originally done by the intel third-party nfv ci in it but updated13:28
sean-k-mooneyand some other test coverage13:28
gibithen lets enable it 13:31
sean-k-mooneygibi: when the qe members of the compute team downstream write test automation that is not suitable for upstream tempest because it depends on specific configuration of the services, this is where we try to add the test coverage. 13:32
sean-k-mooneylike testing adding cpu flags which we can do https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuul.yaml#L53-L55 in the ci like this13:33
gibiI agree to have that coverage in our upstream gate13:34
sean-k-mooneythanks ill let artom know and see if he wants me to wait for the jobs to be split or not first. ill start on the WIP patch in any case13:34
artomHuh, happy coincidence, I was pondering proposing a periodic whitebox job for Nova13:43
artomSo, I think it's not yet stable enough for that, actually13:43
artomWe think we know the issue, and we're working on it, but until then I'm not sure it's ready yet13:43
artomEvery so often, depending on which order tests end up being executed, what we think happens is we attempt to reshape from cpu_dedicated_set to vcpu_pin_set, and that's not allowed, so there's a cascading failure. There are also issues around how we use admin clients and clean up after ourselves, that can also cause cascading failures13:45
gibiartom: nothing is urgent from upstream perspective. If upstream feedback helps then I'm OK to enable a non voting job13:45
artomgibi, I think even that's premature, as the solution to ^^^ is to change whitebox's own job a bit, so until that's done, let's not add it to nova13:46
sean-k-mooneyartom: ok the reason i was bringing this up was we did at one point plan to enable whitebox for wallaby13:46
sean-k-mooneythen we did not have time to actually get it stable in time13:47
sean-k-mooneyso i was hoping we could do that before the end of xena13:47
sean-k-mooneyif you think its not ready however we can hold off13:47
artomAh, probably not before the end of Xena13:47
artom... well, does end == FF?13:47
artomOr release?13:47
sean-k-mooneywell i guess i was thinking before RC1 when the stable branch is created13:48
sean-k-mooneyalthough if we were ok with backporting enabling the testing on the stable branch, end could be anytime before eol i guess13:48
sean-k-mooneyif we dont think its ready however no need to rush13:49
sean-k-mooneyi would just like to keep making progress on getting this test coverage enabled either first party or third party13:50
artomThird party I still haven't solved the hardware problem :)13:51
artomErr :(13:51
sean-k-mooneyartom: is bauzas  back today or is he returning tomorrow13:51
sean-k-mooneyartom: yep i know :)13:51
artomStill on PTO today, according to Workday13:52
gibiI'm personally OK with enabling new jobs on stable but I guess elodilles or lyarwood have more authority about that :)13:52
gibias for landing it on master, this is not a feature so RC1 is the cutoff date due to branching13:53
artomI can try to hurry it up, especially as jparker seems to have more time for this right about now, too13:53
sean-k-mooneygibi: before i recheck are there any gate blockers i should hold off for13:54
sean-k-mooneyi was just looking at the failure in bauzas mdev series which doesnt seem related13:55
gibisean-k-mooney: the "Add force kwarg to delete_allocation_for_instance" patch has not landed yet; the bug it fixes kills at least 1/4 of the tempest jobs all around the gate13:56
sean-k-mooneyah right13:56
gibiI don't know about any full blocker13:56
sean-k-mooneyok i was seeing the nova-ceph-multistore job fail in several patches but have not dug in to see if its the same issue 13:56
sean-k-mooneyoh "'Failed to delete allocations for consumer 2064788c-9fa0-474e-a66c-72cf97b45922. .."13:57
sean-k-mooneyya so its just that13:57
gibiyes13:58
sean-k-mooneyany idea why it would hit the multistore job more often13:59
sean-k-mooneyit looks like that is mostly the failure so ill hold off until the force patch lands13:59
sean-k-mooneythey dont have +w anyway so they can wait14:00
elodilleswell, i didn't exactly follow which job you were talking about but I am less concerned about enabling new CI jobs on stable than disabling one o:)14:00
gibiI've no idea about the increased frequency of  multistore failure14:00
gibielodilles: it would be a job running https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/whitebox_tempest_plugin/api/compute14:01
sean-k-mooneyelodilles: which is defined here https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuul.yaml#L31-L8114:01
sean-k-mooneygibi: artom is currently actually splitting it into two jobs, one that uses the old cpu pinning config and the main one will only use the new way14:02
gibiack14:02
sean-k-mooneygibi: right now if the reshape happens at the wrong time the job breaks14:02
sean-k-mooneyso we are just going to split it to avoid that14:02
gibisure makes sense14:02
elodillesgibi sean-k-mooney : I guess these would land on master and then be backported to the most recent stable branch, am I right?14:03
elodilles(hmmm, it looks quite heavy, according to its parent: tempest-multinode-full-py3)14:05
gibipersonally I would take it on master first14:07
sean-k-mooneyelodilles: well it needs 2 nodes but it does not run all the tempest tests because we use the regex to limit it14:16
sean-k-mooneytox_envlist: all14:16
sean-k-mooney      tempest_concurrency: 114:17
sean-k-mooney      tempest_test_regex: ^whitebox_tempest_plugin\.14:17
sean-k-mooneyso i just use that job to set up 2 node devstack with tempest then we just run the tests from the plugin14:17
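The effect of the `tempest_test_regex` value quoted above can be illustrated with a few lines of Python: only test IDs that start with the plugin's package name are kept, so the bulk of the regular tempest suite is skipped. The test IDs below are hypothetical placeholders, and this sketch only mimics the filtering idea, not how tempest itself applies the regex.

```python
import re

# Regex taken from the job definition pasted in the chat.
TEST_REGEX = re.compile(r"^whitebox_tempest_plugin\.")

# Hypothetical test IDs, for illustration only.
test_ids = [
    "whitebox_tempest_plugin.api.compute.test_example.ExampleTest.test_one",
    "tempest.api.compute.servers.test_example.ExampleTest.test_two",
    "tempest.scenario.test_example.ExampleScenario.test_three",
]

selected = [test_id for test_id in test_ids if TEST_REGEX.search(test_id)]
print(selected)
```

This is why the job is lighter than its `tempest-multinode-full-py3` parent suggests: the parent provides the two-node devstack setup, while the regex decides what actually runs.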
elodillesoh, i see, i missed that14:20
sean-k-mooneywe probably could inherit from something better to make that more obvious14:21
elodilleswell, when someone reviews it thoroughly i think it'll be obvious o:) but that's true that at first glance the tempest-multinode-*full*-py3 suggests some time and resource heavy test job o:)14:26
sean-k-mooneywe probably can just use devstack-tempest. ill see if there is a better job we can use in the future. that is a simple fix14:26
*** akekane_ is now known as abhishekk14:41
gansoelodilles, lyarwood: Hi! If you have a spare minute could you please take a quick look at the backport now for victoria? it is clean and same as the one for wallaby from last week. Thanks in advance! https://review.opendev.org/c/openstack/nova/+/80600415:20
opendevreviewGhanshyam proposed openstack/nova master: Convert features not supported error to HTTPBadRequest  https://review.opendev.org/c/openstack/nova/+/80629415:22
elodillesganso: +2'd. Thanks for the backport! (fyi, lyarwood is on holiday today)15:46
gansoelodilles: thanks! I will ping him tomorrow =)15:47
elodillesno problem :)15:50
opendevreviewMerged openstack/nova master: tests: Validate AZ values  https://review.opendev.org/c/openstack/nova/+/80152316:06
opendevreviewMerged openstack/nova master: Add force kwarg to delete_allocation_for_instance  https://review.opendev.org/c/openstack/nova/+/68880217:05
sean-k-mooney:)17:05
opendevreviewMerged openstack/nova master: Prevent deletion of a compute node belonging to another host  https://review.opendev.org/c/openstack/nova/+/69480217:15
opendevreviewMerged openstack/nova master: Fix inactive session error in compute node creation  https://review.opendev.org/c/openstack/nova/+/69518917:15
opendevreviewMerged openstack/nova master: Reduce mocking in test_reject_open_redirect for compat  https://review.opendev.org/c/openstack/nova/+/80309117:16
opendevreviewMerged openstack/nova master: extend_volume of libvirt/volume/iscsi should not use device_path  https://review.opendev.org/c/openstack/nova/+/80100317:16
opendevreviewsean mooney proposed openstack/nova stable/victoria: address open redirect with 3 forward slashes  https://review.opendev.org/c/openstack/nova/+/80662617:38
sean-k-mooneygibi: elodilles  by the way are we going to backport https://review.opendev.org/c/openstack/nova/+/68880217:39
opendevreviewsean mooney proposed openstack/nova stable/ussuri: address open redirect with 3 forward slashes  https://review.opendev.org/c/openstack/nova/+/80662817:56
opendevreviewsean mooney proposed openstack/nova stable/train: address open redirect with 3 forward slashes  https://review.opendev.org/c/openstack/nova/+/80662918:03
gibisean-k-mooney: I don't know. The consumer types feature is only on master so we have a smaller issue on stable. And that smaller issue has been there since stein if I remember correctly. 19:51
gmanngibi: added releasenotes in this https://review.opendev.org/c/openstack/nova/+/80629421:56