Thursday, 2022-11-17

<melwitt> bauzas: this patch looks relevant to your interests https://review.opendev.org/c/openstack/nova/+/864674 01:57
<opendevreview> Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host  https://review.opendev.org/c/openstack/nova/+/864812 05:59
<opendevreview> Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host  https://review.opendev.org/c/openstack/nova/+/864812 08:04
<opendevreview> Jorhson Deng proposed openstack/nova master: Optimize the small pagesize in numa_fit_instance_to_host  https://review.opendev.org/c/openstack/nova/+/864812 08:42
<johnthetubaguy> When you run functional tests on your dev box and get loads of errors due to no valid host, it probably means I am doing something stupid. Has anyone else hit that at all, please? 09:44
<bauzas> johnthetubaguy: which testcases? 10:09
<johnthetubaguy> good question, mostly the ones that use placement 10:11
<johnthetubaguy> there are loads of them in the functional suite, one example is: nova.tests.functional.wsgi.test_services.TestServicesAPI.test_resize_revert_after_deleted_source_compute 10:11
<johnthetubaguy> pretty sure it's my environment being broken, as so many fail 10:11
<johnthetubaguy> ah, so I think it was the default python3 being 3.8, oops! 10:19
<johnthetubaguy> curiously, unit tests are just fine, but functional tests, not so much 10:20
<bauzas> johnthetubaguy: sorry, was in a meeting 10:22
<johnthetubaguy> no worries, the fix was simple enough in the end 10:22
<bauzas> johnthetubaguy: yesterday I ran some reshape functional tests locally and I had no problem 10:22
<johnthetubaguy> yeah, it's my bad default python version, that's all 10:22
<bauzas> johnthetubaguy: oh, so due to py38? strange if so 10:22
<johnthetubaguy> yeah 10:23
<bauzas> I guess you recreated your venv too? 10:23
<johnthetubaguy> I suspect something had failed to start and I didn't notice that, all I noticed was lots of no valid host exceptions 10:23
<johnthetubaguy> so tox spotted it and did the re-create 10:23
<bauzas> and it worked then? 10:23
<johnthetubaguy> in tox.ini the base python is python3 rather than python3.9 I guess, which caused my "fun" 10:24
<johnthetubaguy> yeah, it's working fine now 10:24
<bauzas> I suspect some os-resource-classes or os-traits library wasn't updated 10:24
<bauzas> hence placement failing and then the NoValidHost errors eventually 10:24
<bauzas> doing the tox -r updated the deps 10:24
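The diagnosis above (an older default interpreter plus stale os-traits / os-resource-classes in the venv) could be checked with something like the following. This is only a sketch: the helper names are made up for illustration, and the (3, 9) floor is an assumption taken from the python3.9 mentioned in the chat, not read from tox.ini.

```python
import sys
from importlib import metadata


def python_too_old(min_python, version_info=None):
    """Return True if the interpreter is older than min_python, e.g. (3, 9)."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < tuple(min_python)


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed floor; adjust to whatever the project actually requires.
    if python_too_old((3, 9)):
        print("interpreter is older than 3.9:", sys.version.split()[0])
    # Library names come from the discussion above.
    for name, version in installed_versions(
            ["os-traits", "os-resource-classes"]).items():
        print(name, version or "not installed")
```

Run inside the tox virtualenv, this prints the versions that the functional tests would actually import, which makes a stale environment easier to spot before reaching for tox -r.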
<johnthetubaguy> ah... got you, very possible 10:24
<johnthetubaguy> actually, I remember seeing some os-traits error 10:25
<bauzas> yeah, we hardfail on the number of traits 10:25
<johnthetubaguy> it was a missing attribute error I think, which is basically the same thing 10:25
<bauzas> cool 10:25
<johnthetubaguy> well mystery solved, thank you! 10:25
<bauzas> np, glad you fixed it by yourself :D 10:26
<johnthetubaguy> (makes bashing with a hammer noises) 10:26
<bauzas> :) 10:27
<opendevreview> John Garbutt proposed openstack/nova master: Functional test test_boot_reschedule_with_proper_pci_device_count  https://review.opendev.org/c/openstack/nova/+/760354 10:44
<opendevreview> John Garbutt proposed openstack/nova master: Fix PCI passthrough race on reschedule (refresh)  https://review.opendev.org/c/openstack/nova/+/710848 10:44
<opendevreview> John Garbutt proposed openstack/nova master: Fix PCI passthrough race on reschedule (claims)  https://review.opendev.org/c/openstack/nova/+/710847 10:44
<johnthetubaguy> gibi: I think you reviewed those in the past. I am not sure it answers all your questions, but I put the functional test first, in an attempt to work out which patches are needed. It's a nasty bug: on re-schedule you try to get the wrong PCI device, but fail with in-use errors. 11:02
<sean-k-mooney> johnthetubaguy: so i was thinking about your ironic patch for reserving on schedule overnight 11:03
<sean-k-mooney> i think in general it's a good idea 11:04
<sean-k-mooney> there was some concern about whether cleaning is used or not in large clouds, since this extends the time nodes would be unavailable 11:04
<johnthetubaguy> yeah, gibi was mentioning that, and well, I don't disagree 11:05
<sean-k-mooney> if we wanted to cater for that we could make this configurable, but i think it's probably ok to reserve by default or unconditionally 11:05
<johnthetubaguy> yeah, it feels like a future workaround config, if it's a problem for people 11:05
<johnthetubaguy> interestingly, I think it fixes an extra case I should add to the commit... 11:05
<sean-k-mooney> oh what one 11:06
<johnthetubaguy> when you mark an in-use node as in maintenance mode because it's broken, the user gets to delete their instance when they are ready, and that goes into clean failed (depending on your ironic config); we don't hit the race with our placement updates any more either 11:07
<johnthetubaguy> we had the same window with that, once the allocation is removed when the instance is deleted 11:07
<sean-k-mooney> oh ok so clean failed happens because it's in maintenance 11:08
<sean-k-mooney> and can't actually start cleaning? 11:08
<johnthetubaguy> more that we start sending new instances to the node that is in maintenance, shortly after the user deletes their nova server 11:08
<johnthetubaguy> it's basically the same race condition, but with a slightly different reason 11:09
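The race described above can be illustrated with a toy model of placement inventory, where usable capacity is (total - reserved) * allocation_ratio. This is a sketch only: the Inventory class, node_inventory, and can_schedule are hypothetical names, and the rule for when a node counts as unavailable is an assumption for illustration, not nova's actual ironic driver logic.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Toy inventory record for a single-unit bare metal resource class."""
    total: int = 1
    reserved: int = 0
    allocation_ratio: float = 1.0

    @property
    def capacity(self):
        # Placement's capacity formula for an inventory.
        return int((self.total - self.reserved) * self.allocation_ratio)


def node_inventory(has_instance, in_maintenance, provision_state):
    """Reserve the node whenever it should not receive a new instance.

    Assumed rule: a node with an instance, in maintenance, or in any
    provision state other than "available" is reported as reserved.
    """
    unavailable = (
        has_instance or in_maintenance or provision_state != "available"
    )
    return Inventory(total=1, reserved=1 if unavailable else 0)


def can_schedule(inventory, used):
    """Return True if one more unit fits within the inventory's capacity."""
    return used + 1 <= inventory.capacity


# Without reserving: the instance was just deleted, so used drops to 0,
# but the inventory has not been updated yet. The scheduler sees room.
stale = Inventory(total=1, reserved=0)
print(can_schedule(stale, used=0))

# With reserving while the instance existed: the node stays unschedulable
# after the allocation is removed, through cleaning or maintenance.
held = node_inventory(
    has_instance=False, in_maintenance=True, provision_state="clean failed")
print(can_schedule(held, used=0))
```

In this model the window closes because the reservation is already in place before the allocation disappears, rather than being added afterwards by a later update.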
<sean-k-mooney> nice, i always like it when one fix fixes multiple bugs 11:09
<johnthetubaguy> totally 11:10
<sean-k-mooney> so between https://review.opendev.org/c/openstack/nova/+/842478 (the retry) and https://review.opendev.org/c/openstack/nova/+/864773 (reserving) we have two fixes. the retry is certainly backportable 11:11
<sean-k-mooney> reserving honestly probably is too, but to backport that i think we would need the workaround option 11:11
<johnthetubaguy> yeah, I think we need both, although the new one makes the older one less important 11:12
<johnthetubaguy> i.e. the older one only matters where available nodes become unavailable, and we don't spot it right away, since we remove the issue with automatic cleaning also causing that problem 11:12
<sean-k-mooney> yes, so i was about to approve the old patch and then soft -1 the second one, asking for the workaround option, if that works for you. 11:14
<sean-k-mooney> the only thing i was wondering about for the first patch is should it have a release note 11:14
<sean-k-mooney> although it's only a partial fix 11:14
<sean-k-mooney> the second patch should have one 11:14
<johnthetubaguy> yeah, second patch needs one for sure 11:15
<johnthetubaguy> the first one might be worth advertising via a release note I guess, it is handy 11:15
<johnthetubaguy> but merging is better for me, obviously :) 11:16
<sean-k-mooney> shall i hold +2w for you to add one or just go for it? 11:16
<sean-k-mooney> i think it's fine as is 11:16
<johnthetubaguy> yeah, let's get that first one in, I will add some workaround stuff in the second one 11:17
<sean-k-mooney> i like to have release notes for Closes-Bug 11:17
<sean-k-mooney> i treat them as optional for partials or related 11:17
<johnthetubaguy> ah, fair enough 11:17
<sean-k-mooney> cool, first one is on its way 11:17
<sean-k-mooney> i'll comment on the second 11:17
<johnthetubaguy> sweet, thank you 11:18
<sean-k-mooney> i owe you a review of the ironic spec too. i didn't get to it on the review day, so once i'm done with this i'll take a look at that next 11:19
<johnthetubaguy> Ah, that would be great, thank you. I am sure it needs some refinement. I have half a plan to do some POC work on that soon, but these other bugs keep distracting me. 11:24
<sean-k-mooney> you're looking at a third bug related to pci claims too, right? did you pick that up from mark? 11:31
<sean-k-mooney> ya it's https://review.opendev.org/q/topic:bug%252F1860555 11:32
<johnthetubaguy> sort of yes, trying to restack that on top of the functional test 11:32
<johnthetubaguy> I need to decide which of these we backport in various downstreams 11:33
<sean-k-mooney> i haven't looked at it in a while but i thought it would be backportable upstream 11:34
<sean-k-mooney> in case that helps. 11:35
<johnthetubaguy> yeah, I agree, I think it should be 11:36
<johnthetubaguy> I am not totally sure about the claims stuff, and how critical that is, it's very possible we leak PCI devices without that fix up 11:36
<sean-k-mooney> leak is not quite right 11:37
<sean-k-mooney> we can end up claiming more than we request 11:38
<sean-k-mooney> they will all get freed when the vm is deleted 11:38
<sean-k-mooney> but not before 11:38
<sean-k-mooney> without it 11:38
<johnthetubaguy> yeah, without that object refresh, that is certainly true 11:38
<sean-k-mooney> we have seen this persist even after the vm is shelved in downstream bug reports 11:39
<johnthetubaguy> the functional test probably needs more checks on the PCI claims I guess 11:39
<sean-k-mooney> not necessarily because of the reschedule; there were issues with resize in the past that had a similar effect 11:39
<johnthetubaguy> ah, good to know, certainly believable 11:39
<johnthetubaguy> ah, interesting 11:40
<sean-k-mooney> i have not looked. i added RP+1 to get it on my list. i probably won't get to it this week but i'll try and see if i can take a look again on monday 11:40
<johnthetubaguy> thank you 11:41
<opendevreview> Konrad Gube proposed openstack/nova-specs master: Add API for assisted volume extend  https://review.opendev.org/c/openstack/nova-specs/+/855490 11:43
<sean-k-mooney> elodilles: i'm planning to fix a trivial docs issue with one of our config options but i also want to test this in ci https://bugs.launchpad.net/nova/+bug/1996094 12:21
<sean-k-mooney> i chatted to bauzas about this a bit downstream and we were unsure how you and other stable cores would feel about it when it comes to backporting 12:22
<sean-k-mooney> elodilles: is it ok to do both in one patch or would you prefer we did not backport the ci change? 12:22
<sean-k-mooney> i would prefer to backport both, and if we are backporting both i would prefer to have it be in one patch 12:22
<sean-k-mooney> my current plan is to set heal_instance_info_cache_interval=0 in nova-next 12:23
<opendevreview> John Garbutt proposed openstack/nova master: Ironic nodes with instance reserved in placement  https://review.opendev.org/c/openstack/nova/+/864773 12:36
<johnthetubaguy> gibi sean-k-mooney I have added a release note and the workaround config, slightly more intrusive, but not by much I guess. 12:39
* gibi reads back 12:39
<sean-k-mooney> ack, most of the way through the spec; not much feedback so far beyond what is already there 12:40
<sean-k-mooney> just got to the nova manage command 12:40
<bauzas> sean-k-mooney: elodilles: yup, I just wondered if we were ok for backporting a .zuul file :) 12:48
<sean-k-mooney> i'm pretty sure i have done that before 12:49
<sean-k-mooney> backported a job change; we definitely did it to fix the train gate 12:50
<elodilles> sean-k-mooney bauzas: as far as i remember, traditionally, doc change backports were not accepted, but i think it is OK to backport them. about the CI I'm a bit hesitant, though. but it depends on the change, i would say 13:01
<gibi> johnthetubaguy: left reply in the pci re-schedule bugfix 13:01
<gibi> johnthetubaguy: that single instance.refresh() feels strange to me 13:01
<sean-k-mooney> elodilles: it literally is disabling a periodic task 13:01
<sean-k-mooney> elodilles: we periodically heal the network info cache, but that in practice should not be required as neutron tells us when something changes 13:02
<sean-k-mooney> elodilles: so the ci change is setting one config value to 0 13:02
<johnthetubaguy> gibi: agreed, it's crazy that fixes so much; it drops all the transient changes before a save, roughly. I want more of that claims stuff in the functional test I think 13:03
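The "drops all the transient changes before a save" idea can be sketched with a toy persisted object. Everything here is hypothetical: FakeInstance and failed_build_attempt are illustrative stand-ins, not nova's Instance object or its build path, and the sketch only covers the in-memory list on the instance, not the PciDevice records themselves.

```python
import copy


class FakeInstance:
    """Toy stand-in for a persisted object; not nova's Instance class."""

    def __init__(self, db, uuid):
        self._db = db
        self.uuid = uuid
        self.pci_devices = []
        self.refresh()

    def refresh(self):
        # Reload fields from the stored record, discarding unsaved changes.
        self.pci_devices = copy.deepcopy(self._db[self.uuid]["pci_devices"])

    def save(self):
        # Persist whatever the object currently holds in memory.
        self._db[self.uuid]["pci_devices"] = copy.deepcopy(self.pci_devices)


def failed_build_attempt(instance, device, refresh_before_save):
    """Simulate a claim that adds a device, followed by a failed build."""
    # The claim adds a device to the in-memory object only.
    instance.pci_devices.append(device)
    # The build fails on this host; optionally drop the transient state.
    if refresh_before_save:
        instance.refresh()
    # Cleanup saves the object before the request is rescheduled.
    instance.save()


db = {"u1": {"pci_devices": []}, "u2": {"pci_devices": []}}

without_refresh = FakeInstance(db, "u1")
failed_build_attempt(without_refresh, "0000:81:00.1", refresh_before_save=False)
print(db["u1"]["pci_devices"])  # the stale device was persisted

with_refresh = FakeInstance(db, "u2")
failed_build_attempt(with_refresh, "0000:81:00.1", refresh_before_save=True)
print(db["u2"]["pci_devices"])  # nothing from the failed attempt remains
```

In this toy model a device carried over from a failed attempt is how an instance could end up holding more devices than it requested after a reschedule.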
<sean-k-mooney> in one job, nova-next 13:03
<elodilles> sean-k-mooney: so it won't be a new CI job, but a config value change in 1(?) job as I understand then 13:03
<sean-k-mooney> yes, just one additional config override in the nova-next job 13:03
<gibi> johnthetubaguy: if the PciDevice.instance_uuid field is already updated to point to this instance then simply resetting the instance with instance.refresh() cannot be enough 13:03
<sean-k-mooney> to set the existing config option to 0 instead of the default 60 13:03
<elodilles> sean-k-mooney: and you state that it won't introduce any instability o:) 13:04
<sean-k-mooney> well that's why we want to have it running in ci 13:04
<sean-k-mooney> but i don't believe it will 13:04
<elodilles> maby let's see it first in master branch then :) 13:04
<elodilles> * maybe 13:05
<sean-k-mooney> ok so two patches: one for the docs change to correct the help text and a separate one for zuul 13:05
<sean-k-mooney> i can do that 13:05
<elodilles> sean-k-mooney: yep, that sounds good 13:05
<sean-k-mooney> cool, i'll do that then, thanks 13:09
<opendevreview> Amit Uniyal proposed openstack/nova stable/train: functional: Change order of two classes  https://review.opendev.org/c/openstack/nova/+/864672 13:36
<opendevreview> Amit Uniyal proposed openstack/nova stable/train: functional: Rework '_delete_server'  https://review.opendev.org/c/openstack/nova/+/864721 13:36
<opendevreview> Amit Uniyal proposed openstack/nova stable/train: functional: Unify '_build_minimal_create_server_request' implementations  https://review.opendev.org/c/openstack/nova/+/864713 13:36
<opendevreview> Amit Uniyal proposed openstack/nova stable/train: Extend NeutronFixture to allow live migration with ports  https://review.opendev.org/c/openstack/nova/+/864900 13:36
<opendevreview> Danylo Vodopianov proposed openstack/nova master: Napatech SmartNIC support  https://review.opendev.org/c/openstack/nova/+/859577 14:04
<opendevreview> Danylo Vodopianov proposed openstack/os-vif master: MTU support for DPDK port added  https://review.opendev.org/c/openstack/os-vif/+/859574 14:09
<opendevreview> John Garbutt proposed openstack/nova master: Ironic nodes with instance reserved in placement  https://review.opendev.org/c/openstack/nova/+/864773 14:11
<opendevreview> Merged openstack/nova stable/ussuri: add regression test case for bug 1978983  https://review.opendev.org/c/openstack/nova/+/862603 14:16
*** dasm|off is now known as dasm 14:22
<opendevreview> Merged openstack/nova stable/ussuri: For evacuation, ignore if task_state is not None  https://review.opendev.org/c/openstack/nova/+/862604 14:36
<opendevreview> Merged openstack/nova master: DOC update remote console access  https://review.opendev.org/c/openstack/nova/+/860687 14:37
<johnthetubaguy> gibi: I don't think the PCI device objects are getting saved in time, I am trying to prove that in the functional test at the moment 14:47
<opendevreview> Merged openstack/nova master: Replace "db archive" with "db archive_deleted_raws"  https://review.opendev.org/c/openstack/nova/+/847963 15:11
<opendevreview> sean mooney proposed openstack/nova stable/yoga: refactor: remove duplicated logic  https://review.opendev.org/c/openstack/nova/+/855022 15:15
<opendevreview> sean mooney proposed openstack/nova stable/yoga: Record SRIOV PF MAC in the binding profile  https://review.opendev.org/c/openstack/nova/+/855023 15:15
<opendevreview> sean mooney proposed openstack/nova stable/yoga: Remove double mocking  https://review.opendev.org/c/openstack/nova/+/855024 15:16
<opendevreview> sean mooney proposed openstack/nova stable/yoga: Remove double mocking... again  https://review.opendev.org/c/openstack/nova/+/855025 15:16
<opendevreview> sean mooney proposed openstack/nova stable/yoga: Add compute restart capability for libvirt func tests  https://review.opendev.org/c/openstack/nova/+/855026 15:16
<opendevreview> sean mooney proposed openstack/nova stable/yoga: enable blocked VDPA move operations  https://review.opendev.org/c/openstack/nova/+/855035 15:16
<sean-k-mooney> gibi: finally got around to fixing the commit message on the double mocking patch. i also rebased to the tip of stable yoga 15:16
<sean-k-mooney> i'm going to try backporting that to xena and wallaby now 15:16
<gibi> sean-k-mooney: I'm +2 on https://review.opendev.org/q/topic:bug%252F1970467 15:20
* sean-k-mooney first patch to backport to xena already has a conflict yeah... 15:25
<sean-k-mooney> ok including https://review.opendev.org/c/openstack/nova/+/829974 fixes it and that applies cleanly 15:28
<opendevreview> John Garbutt proposed openstack/nova master: Functional test test_boot_reschedule_with_proper_pci_device_count  https://review.opendev.org/c/openstack/nova/+/760354 16:24
<opendevreview> John Garbutt proposed openstack/nova master: Fix PCI passthrough race on reschedule (refresh)  https://review.opendev.org/c/openstack/nova/+/710848 16:24
<opendevreview> John Garbutt proposed openstack/nova master: Fix PCI passthrough race on reschedule (refresh)  https://review.opendev.org/c/openstack/nova/+/710848 16:30
<johnthetubaguy> gibi: I double checked with mgoddard, it turns out he found either patch will fix the issue, we just get to pick which one to merge. The functional tests seem to suggest mark is totally correct, i.e. either patch will fix it. I prefer the instance.refresh() one myself. 16:34
<gibi> johnthetubaguy: I'm wondering when we add the pci dev to the instance.pci_devices list. If we do that during the claim, then claim.abort should be the one to clean that up 16:58
<opendevreview> sean mooney proposed openstack/nova stable/xena: Fix migration with remote-managed ports & add FT  https://review.opendev.org/c/openstack/nova/+/864931 18:51
<opendevreview> sean mooney proposed openstack/nova stable/xena: refactor: remove duplicated logic  https://review.opendev.org/c/openstack/nova/+/864932 18:51
<opendevreview> sean mooney proposed openstack/nova stable/xena: Record SRIOV PF MAC in the binding profile  https://review.opendev.org/c/openstack/nova/+/864933 18:51
<opendevreview> sean mooney proposed openstack/nova stable/xena: Remove double mocking  https://review.opendev.org/c/openstack/nova/+/864934 19:08
<opendevreview> sean mooney proposed openstack/nova stable/xena: Remove double mocking... again  https://review.opendev.org/c/openstack/nova/+/864935 19:08
<opendevreview> sean mooney proposed openstack/nova stable/xena: Add compute restart capability for libvirt func tests  https://review.opendev.org/c/openstack/nova/+/864936 19:08
<opendevreview> sean mooney proposed openstack/nova stable/xena: enable blocked VDPA move operations  https://review.opendev.org/c/openstack/nova/+/864937 19:08
<opendevreview> Ghanshyam proposed openstack/nova master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/nova/+/861111 20:30
<opendevreview> Ghanshyam proposed openstack/nova master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/nova/+/861111 20:31
<opendevreview> Ghanshyam proposed openstack/placement master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/placement/+/861471 20:39
<opendevreview> Ghanshyam proposed openstack/placement master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/placement/+/861471 20:41
<opendevreview> Ghanshyam proposed openstack/os-traits master: Update python classifier for python 3.10  https://review.opendev.org/c/openstack/os-traits/+/861466 20:47
<opendevreview> Ghanshyam proposed openstack/python-novaclient master: Update python classifier for python 3.10  https://review.opendev.org/c/openstack/python-novaclient/+/861469 20:57
<opendevreview> Ghanshyam proposed openstack/osc-placement master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/osc-placement/+/861470 21:01
<opendevreview> Ghanshyam proposed openstack/os-vif master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/os-vif/+/861468 21:09
<opendevreview> Ghanshyam proposed openstack/nova master: Update gate jobs as per the 2023.1 cycle testing runtime  https://review.opendev.org/c/openstack/nova/+/861111 21:48
*** dasm is now known as dasm|off 22:47
