Tuesday, 2024-03-19

opendevreviewOpenStack Release Bot proposed openstack/nova stable/2024.1: Update .gitreview for stable/2024.1  https://review.opendev.org/c/openstack/nova/+/91363110:20
opendevreviewOpenStack Release Bot proposed openstack/nova stable/2024.1: Update TOX_CONSTRAINTS_FILE for stable/2024.1  https://review.opendev.org/c/openstack/nova/+/91363210:21
opendevreviewOpenStack Release Bot proposed openstack/nova master: Update master for stable/2024.1  https://review.opendev.org/c/openstack/nova/+/91363310:21
opendevreviewRocky proposed openstack/nova master: Fix the command to list hw_machine_type unset instances  https://review.opendev.org/c/openstack/nova/+/91365210:35
bauzasguys, please welcome our master branch that's now a Dalmatian :)10:48
bauzasso long Caracal and thanks for the fish10:48
bauzas(RC1 tag is set and caracal branch created)10:49
gibibauzas: \o/10:53
sean-k-mooney:)11:15
*** sfinucan is now known as stephenfin11:38
opendevreviewFabian Wiesel proposed openstack/nova master: Do not close returned image-chunk iterator & get_verifier early  https://review.opendev.org/c/openstack/nova/+/90947412:00
*** mklejn__ is now known as mklejn13:11
fwieselbauzas: I've reopened the bug #2055700 as #2058225. In my mind _rebuild_default_impl is implemented in a libvirt-specific way. I can implement driver.rebuild for vmwareapi to work around the bug, but I don't think that would be a great way forward.13:18
opendevreviewMerged openstack/nova stable/2023.2: Power on cores for isolated emulator threads  https://review.opendev.org/c/openstack/nova/+/91319513:21
bauzasfwiesel: cool13:40
fwieselBut I do not know enough how it is supposed to work in libvirt to make a good suggestion on how to implement it differently. I mean, I assume it makes sense that it is implemented the way it is.13:44
bauzasfwiesel: you're referring to https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4132-L413713:53
bauzasfwiesel: _rebuild_default_impl is a virt-agnostic solution for rebuild  which does delete the old guest and then create a new one https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3648-L373114:00
gibielodilles: thanks for the reviews so far. the parent of https://review.opendev.org/c/openstack/nova/+/913196 is merged now. So when you have time please check this backport too14:02
bauzasas you can read, we power off the guest, detach its volumes, (reimage the root disk if volume-backed), destroy the guest, attach the volumes again and spawn the guest14:02
sean-k-mooneyfwiesel: the way its currently implemented should work for vmware too14:14
sean-k-mooneyfwiesel: the only driver that does anything custom is ironic14:15
sean-k-mooneythe vmware driver can provide its own solution14:16
elodillesgibi: ACK, will do14:17
sean-k-mooneythe important thing to remember about rebuild is that rebuild is not a move operation; evacuate is a rebuild to a different host, but by default a normal rebuild should never change host. if you change image however you do need to call the scheduler with the current host and the existing flavor and new image to confirm that host is valid. that should all be handled for you in the14:18
sean-k-mooneygeneric compute manager code so the only thing the driver level rebuild needs to do is ensure the root disk is reimaged14:18
sean-k-mooneyif there is a more efficient way to do that in vmware than the generic approach then that is fine to use14:19
dansmithnot "should never change host" but "does not change host"14:20
dansmitheven the driver shouldn't be able to override that14:20
sean-k-mooneyright14:22
sean-k-mooneyim just trying to say that if there was to be a vmware version of rebuild in the vmware driver the only thing it should do is reimage the disk14:22
sean-k-mooneythe vm must remain on the same hypervisor host14:22
sean-k-mooneyi just mentioned evacuate because the driver code will have to work when called in that code path too14:23
dansmithyep14:23
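(For illustration, a rough sketch of the rebuild/evacuate semantics discussed above; all names below are hypothetical placeholders, not nova's actual API:)

    def schedule(flavor, image):
        return "some-other-host"          # placeholder: scheduler picks a new host

    def validate_host(host, flavor, image):
        print(f"scheduler confirms {host} still fits {flavor}/{image}")

    def reimage_root_disk(instance, image, host):
        print(f"reimaging root disk of {instance['name']} with {image} on {host}")

    def rebuild_instance(instance, new_image, evacuate=False):
        if evacuate:
            # evacuate = rebuild onto a different host chosen by the scheduler
            host = schedule(instance["flavor"], new_image)
        else:
            # a normal rebuild does not change host
            host = instance["host"]
            if new_image != instance["image"]:
                # image change: re-run the scheduler pinned to the current host
                # with the existing flavor and the new image
                validate_host(host, instance["flavor"], new_image)
        # the only driver-level responsibility is reimaging the root disk
        reimage_root_disk(instance, new_image, host)

    rebuild_instance({"name": "vm1", "host": "cn1", "flavor": "m1.small",
                      "image": "old"}, new_image="new")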
fwieselsean-k-mooney: The current implementation calls driver.destroy, and *then* calls detach volume. In my mind, that shouldn't work in the general case.14:27
bauzasno14:33
bauzasfwiesel: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3675-L367614:33
bauzaspoweroff, detach, destroy is the workflow14:33
fwieselDestroy: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3695-L3701 and then via https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3713-L3715 comes https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3589-L359514:34
fwieselbauzas: The root volume is being detached after the instance has been destroyed if reimage_boot_volume and is_volume_backed14:35
fwieselSo, poweroff, detach everything except the root-disk, destroy, detach the root_disk (in the case of reimage_boot_volume and is_volume_backed)14:36
bauzaswe still detach the volumes, including the root bdm in https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3675-L367614:36
fwieselSure, and after the destroy the _rebuild_volume_backed_instance -> _detach_root_volume 14:37
fwieselWell, we have `reimage_boot_volume=True` ergo `detach_root_bdm = not reimage_boot_volume = False`, so `detach_block_devices(..., detach_root_bdm=False)`.14:40
bauzasfwiesel: ah you're right, it's a bool14:51
bauzasokay, so the root volume isn't detached yet, my bad14:51
bauzasand ack, we rebuild the root volume in _rebuild_volume_backed_instance()14:53
bauzaswhere we detach the volume, delete the old attachment, and then reimage the volume thru cinder14:55
bauzasinterestingly enough, we destroy the guest indeed before the detach, but we skip its bdm when we call the driver destroy14:56
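(Condensed, hypothetical sketch of the flow just walked through; the real logic lives in nova/compute/manager.py and the names here are placeholders:)

    class FakeDriver:
        def power_off(self, inst): print("power off", inst)
        def destroy(self, inst, skip_root_bdm): print("destroy", inst, "- root bdm skipped:", skip_root_bdm)
        def spawn(self, inst, image): print("spawn", inst, "from", image)

    def detach_block_devices(inst, detach_root_bdm): print("detach volumes, root too:", detach_root_bdm)
    def reimage_root_volume(inst, image): print("detach root volume, delete old attachment, cinder reimage to", image)
    def attach_block_devices(inst): print("re-attach volumes")

    def rebuild_default_impl(driver, inst, image, is_volume_backed, reimage_boot_volume):
        driver.power_off(inst)
        # reimage_boot_volume=True -> detach_root_bdm=False, so everything
        # except the root volume is detached here
        detach_block_devices(inst, detach_root_bdm=not reimage_boot_volume)
        # the guest is destroyed before the root volume is detached, but the
        # root bdm is skipped when calling the driver destroy
        driver.destroy(inst, skip_root_bdm=reimage_boot_volume and is_volume_backed)
        if reimage_boot_volume and is_volume_backed:
            reimage_root_volume(inst, image)   # _rebuild_volume_backed_instance
        attach_block_devices(inst)
        driver.spawn(inst, image)

    rebuild_default_impl(FakeDriver(), "vm1", "new-image",
                         is_volume_backed=True, reimage_boot_volume=True)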
fwieselThere is probably a reason for that. I mean it is done very explicitly so. But as I said, I don't have any idea why. 15:06
opendevreviewFabian Wiesel proposed openstack/nova master: Do not close returned image-chunk iterator & get_verifier early  https://review.opendev.org/c/openstack/nova/+/90947415:21
opendevreviewElod Illes proposed openstack/nova stable/2024.1: [stable-only] Update .gitreview for stable/2024.1  https://review.opendev.org/c/openstack/nova/+/91363115:30
opendevreviewElod Illes proposed openstack/nova stable/2024.1: [stable-only] Update TOX_CONSTRAINTS_FILE for stable/2024.1  https://review.opendev.org/c/openstack/nova/+/91363215:30
bauzas#startmeeting nova16:00
opendevmeetMeeting started Tue Mar 19 16:00:45 2024 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.16:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:00
opendevmeetThe meeting name has been set to 'nova'16:00
bauzasheyho16:00
bauzas#link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting16:00
fwieselo/16:01
elodilleso/16:01
auniyalo/16:01
gibio/16:01
bauzasokay, let's softly start16:02
sean-k-mooneyo/16:02
grandchildo/16:02
bauzas#topic Bugs (stuck/critical) 16:03
bauzas#info No  Critical bug16:03
bauzas#link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 63 new untriaged bugs (-2 since the last meeting)16:03
bauzas#info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster16:04
bauzasgibi: could you please take the baton for this week ?16:04
gibibauzas: sorry I cannot promise to do meaningful triage this week either16:04
gibican we skip me?16:05
bauzasokay16:05
gibiI hope I will have more time in the coming months16:05
bauzasmelwitt is the next in the queue16:05
bauzasmelwitt: would you be okay ?16:05
bauzasokay, anyone else wanting to take the baton ? if not, I'll take it again... :)16:07
bauzasokay16:08
elodillesi'd rather wait until we release 2024.1 Caracal o:)16:08
bauzas#info bug baton is bauzas16:08
bauzasmoving on16:08
bauzas#topic Gate status 16:08
bauzas#link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:08
bauzas#link https://etherpad.opendev.org/p/nova-ci-failures-minimal16:09
bauzas#link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status16:09
bauzas#info Please look at the gate failures and file a bug report with the gate-failure tag.16:09
bauzasanything to discuss about the gate ?16:09
bauzasoh, one point16:09
bauzasI just saw slaweq's email on bare rechecks16:09
Ugglao/16:10
bauzasas a reminder, please avoid saying only "recheck" when you have a gate failure16:10
dansmithor "recheck unrelated"16:10
bauzasand please add a good comment explaining what the root problem is16:10
dansmithit can be unrelated, but at least say why16:10
bauzasand try to find a second to understand the reason16:10
bauzasyeah, or at least please tell in the comment which job fails and with which exception16:11
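(Hypothetical examples of the kind of recheck comment being asked for; the job, test and bug number below are made up for illustration:)

    bare (avoid):  recheck
    bare (avoid):  recheck unrelated
    better:        recheck nova-next failed: test_live_migration timed out waiting
                   for the server to become ACTIVE, looks like gate bug #1999999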
bauzas#link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/7L6DSFN6GB65FBVHQATTLSQ7HWGBZJHM/16:11
sean-k-mooneyhonestly i dont think calling this out here helps16:12
bauzas#info please avoid bare rechecks 16:12
dansmithsean-k-mooney: really? the tc has asked PTLs and other project leaders to help get this word out16:12
sean-k-mooneythe people that are here already know we should avoid it16:12
bauzasit's not only the folks discussing here who read our meetings :)16:13
sean-k-mooneydansmith: sure but doing this in the team meeting i dont think helps16:13
dansmithsean-k-mooney: it's exactly what we've asked for, AFAIK16:13
sean-k-mooneyok but we discussed stopping doing this here in the past16:13
bauzasI know some folks look at our minutes16:13
dansmithand I think it does help because as bauzas says, people might not be here but read the minutes16:13
sean-k-mooneyok but we stopped doing it in the meeting because we didnt think it was working16:13
dansmithsean-k-mooney: 15% on nova are still bare, and a higher proportion are "recheck unrelated" so there's still work to do16:14
sean-k-mooneywe can go back to doing it16:14
bauzas16% is a bad value IMHO16:14
dansmithbauzas: better than some projects for sure, but still work to do yeah16:14
bauzasso I'll then discuss this every meeting16:14
sean-k-mooneydo we have the list of those because i dont think just calling it out here is going to have an impact16:15
bauzas#action bauzas to tell about the bare rechecks every week in our meeting16:15
bauzassean-k-mooney: please look at the link above16:15
sean-k-mooneyi have the email open16:15
bauzaswe had 16 bare rechecks against 105 rechecks, so we have 15% 16:16
sean-k-mooneyi might see if i can pull out the bare rechecks and see what they were16:16
bauzasfwiw, I can find which changes have bare rechecks16:16
bauzasthis is quite simple16:16
bauzasbut I don't want to ping folks saying "dude, don't do it"16:17
sean-k-mooneythere are a few people that tend to only review +116:17
sean-k-mooneyand i wonder if its the same subset that are blind rechecking16:17
sean-k-mooneyi did ask one person to stop rechecking 16:17
sean-k-mooneybecause there was an actual issue16:18
sean-k-mooneyanyway lets move on we can chat about this after the meeting16:19
bauzasokay16:20
bauzas#topic Release Planning 16:20
bauzas#link https://releases.openstack.org/caracal/schedule.html#nova16:20
bauzas#info RC1 delivered this morning16:20
bauzas\o/16:20
bauzas#info master branch is now on Dalmatian16:21
elodilles\o/16:21
bauzasas a reminder, we won't accept backports for the caracal branch until GA unless they are regression fixes16:21
bauzasditto with other stable backports given you need to have it in Caracal too16:22
bauzas(unless of course the backports are from a caracal merged change)16:22
elodilles+116:22
bauzas#link https://etherpad.opendev.org/p/nova-caracal-rc-potential16:23
bauzasI'll continue to provide this etherpad for the possible new Caracal RC tags16:23
bauzasI just hope we won't find a regression16:23
bauzasany question about RC phases or Dalmatian ?16:24
bauzaslooks not, moving on16:25
bauzas#topic Review priorities 16:25
bauzas#action bauzas to provide a new etherpad for Dalmatian 16:26
bauzasfor the moment, we don't have priorities16:26
bauzasI'll start to look at some spec changes next week16:26
bauzas#topic Stable Branches 16:26
bauzaselodilles: go for it16:26
elodilleso/16:26
elodilles#info stable/2024.1 branch is open for bugfix backports16:26
elodillesof course as bauzas wrote: only rc critical16:27
elodillesfor this time16:27
elodilles#info stable gates seem to be OK16:27
bauzascool16:27
elodilles(as we discussed, there are jobs that fail intermittently :/)16:27
elodilles#info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci16:28
elodillesthat's it from me about stable branches16:28
bauzascool thanks16:29
bauzas#topic vmwareapi 3rd-party CI efforts Highlights 16:29
bauzasfwiesel: ?16:29
fwiesel#info CI comments on openstack/nova CR confirmed to be working.16:29
fwieselBest maybe seen here:16:30
fwiesel#link https://review.opendev.org/dashboard/3657116:30
fwiesel#info Fixed FIPs instability. Tests quite reproducible. Only one error left with all proposed patches applied.16:30
fwieselThe issue and fix were in retrospect easy... we have now a working pipeline.16:30
bauzas\o/16:31
fwieselJust with the new branches some instability came in, as it expects all services to actually have the branch.16:31
fwieselI.e. it now fails to check out stable/2024.1 for requirements, but we have it in nova. Not sure if I should do something about that one.16:32
fwieselMissed that one:16:32
fwiesel#link http://openstack-ci-logs.global.cloud.sap/sapcc-nova--nmllh/tempest.html16:32
fwieselThe tempest test with the last open problem.16:32
elodillesnow ~everything should have branched (well, requirements, devstack and grenade will be created around end of week)16:32
bauzasare you sure you use the requirements from the stable releases ?16:32
bauzasfwiesel: awesome about where you are16:32
fwieselWhat's missing now is hopefully only cleaning up the rough edges... We are still missing a nicer UI with linkable lines.16:33
bauzasokay, anything else ?16:34
fwieselNo, not from my side.16:34
fwieselQuestions?16:35
fwieselOne remark: A test run with fixes in place takes roughly 20 minutes in total.16:35
bauzasnope16:35
bauzaswow16:35
fwieselbauzas: Back to you then16:36
bauzascool, that will be short16:36
bauzas#topic Open discussion 16:36
bauzasanything anyone ?16:36
dansmitha tempest run takes 20 minutes? that would be shockingly fast16:37
fwieselWell, it is a hefty machine, and I run it as parallel as I can. 16:37
bauzasok guys, I think I can close the meeting16:38
bauzasthanks all16:38
bauzas#endmeeting16:38
opendevmeetMeeting ended Tue Mar 19 16:38:23 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:38
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2024/nova.2024-03-19-16.00.html16:38
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2024/nova.2024-03-19-16.00.txt16:38
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2024/nova.2024-03-19-16.00.log.html16:38
fwieselThank you too16:38
sean-k-mooneywell i guess it depends on how many tests you run. devstack can stack in about 8 mins if its not io bound16:38
elodillesthanks o/16:39
fwieselThe tempest run takes 10 minutes, by the way :)16:39
sean-k-mooneyyep but you're also only running 525 ish tests16:40
sean-k-mooneyat least based on 16:40
fwieselSure, are there tests I am missing? 16:40
sean-k-mooneyhttp://openstack-ci-logs.global.cloud.sap/openstack-nova-909474-f54wv/tempest.html16:40
sean-k-mooneyfwiesel: well you're only running the compute api tests16:40
sean-k-mooneyyou're not running the scenario tests or any of the tests for other services like cinder/neutron etc16:41
sean-k-mooneywe dont need you to run all of those16:41
sean-k-mooneybut i was just providing context for others 16:41
sean-k-mooneyi.e. its not tempest-full16:41
fwieselNo, indeed it isn't.16:42
fwieselI am not even running all compute.api-tests, only those that are implemented for the vmware api driver.16:43
sean-k-mooneyack that kind of makes sense16:43
sean-k-mooneyfor the hardware based cis they took a similar approach16:44
sean-k-mooneyi.e. the old intel ci just ran the cpu pinning/sriov/numa related subset 16:44
fwieselPossibly it would make sense to check what code-coverage the tempest test generates on the driver implementation, and adjust the tests accordingly. 16:46
sean-k-mooneydansmith: correct me if im wrong but we do not have the ability to ask placement for a host that does not have a resource type, right? like a forbidden trait; we would have to do that in a scheduler filter17:55
sean-k-mooneyrelated to our conversation about non numa instances 17:56
dansmithsean-k-mooney: I really don't know currently, but I'm pretty sure that was one of the early limitations.. we could ask for inclusion but not exclusion17:56
sean-k-mooneyack i think its still that way. i believe we could have a new filter that does the following check17:57
sean-k-mooneyfor a given instance look at the guest flavor/image to see if it is requesting numa aware memory i.e. hw:mem_page_size17:58
sean-k-mooneyif it is not, check the allocation candidates for the host to see if it has either the file backed memory trait or an inventory of PCPU17:58
sean-k-mooneyif it has either reject the host17:59
sean-k-mooneythat would allow operators to opt in/out by not using the filter18:00
sean-k-mooneywe can add it to the default filters to enforce the correct behavior. 18:00
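(A rough sketch of the check described above, with hypothetical inputs; a real implementation would be a nova scheduler filter fed by placement data, and the trait name below is only illustrative:)

    def host_passes(flavor_extra_specs, image_props, host_traits, host_inventories):
        wants_numa_memory = (
            'hw:mem_page_size' in flavor_extra_specs
            or 'hw_mem_page_size' in image_props)
        if wants_numa_memory:
            return True  # instance asked for NUMA-aware memory, nothing to reject
        # non-NUMA instance: reject hosts set up for NUMA/pinned workloads
        if 'FILE_BACKED_MEMORY_TRAIT' in host_traits:   # illustrative trait name
            return False
        if host_inventories.get('PCPU', 0) > 0:
            return False
        return True

    # example: a non-NUMA instance is rejected from a host exposing PCPU inventory
    print(host_passes({}, {}, set(), {'PCPU': 8}))   # -> False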
dansmithwell, that feels more formal of a feature opt-out than a workaround named I_like_to_oom_my_instances=True18:00
dansmithbut it would mean an abort in scheduling instead of compute, which would be better18:01
sean-k-mooneywell we could perhaps not make it a filter but a config option that would work like a filter18:02
sean-k-mooneyi.e. have a workaround to remove this mandatory filter18:02
dansmithsean-k-mooney: btw, I am still interested in the answer to my question here: https://review.opendev.org/c/openstack/nova/+/877773/9/nova/virt/libvirt/migration.py18:02
dansmithbecause it seems like it should always fail and I don't know why it's not18:02
sean-k-mooney reading that now, ill need to check but if i remember correctly we pop the value from the dict before it ever gets there18:03
dansmiththat's a bad solution IMHO if so because it would break computes that don't have this code yet,18:04
dansmithbut I also couldn't find any such pop18:04
sean-k-mooneyyour assertion is that calling int(a string containing a list of ints)18:04
dansmithsean-k-mooney: no it's calling int() on the fake key18:04
sean-k-mooneyshould always explode so either we are not hitting that path in the test or there is something else going on here18:04
dansmiththat dict is supposed to be keys of '1', '2', etc and so it int()'s the keys,18:04
dansmithbut now there's a key of "dst_cpu_whatever" in the dict18:05
dansmithand I don't know how it's not failing to int() that when it iterates the items18:05
dansmithand I'm wondering if it's because we don't have test coverage that was hitting that code path with the three conditions being true18:05
sean-k-mooneyoh right "guest_id, host_ids in info.cpu_pins.items()" so guest_id would be the key18:07
dansmithright18:08
dansmithso I don't know how that's not breaking18:08
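(The concern in isolation, as a runnable snippet with an assumed dict shape rather than the real migrate_data object:)

    # the loop int()s every key of cpu_pins, so a stashed non-numeric key like
    # "dst_cpu_shared_set_info" should blow up with ValueError
    cpu_pins = {'0': {2, 3}, 'dst_cpu_shared_set_info': '0-5'}
    for guest_id, host_ids in cpu_pins.items():
        print(int(guest_id), host_ids)   # raises ValueError on the stashed key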
sean-k-mooneyartom: https://review.opendev.org/c/openstack/nova/+/631053 and https://bugs.launchpad.net/nova/+bug/1811886 are related to our conversation about the issue relating to not requesting a page size.19:23
sean-k-mooneyits not the example i was giving when we talked but its a similar, equally wrong case.19:23
opendevreviewMerged openstack/nova master: Remove nova.wsgi module  https://review.opendev.org/c/openstack/nova/+/90268619:42
opendevreviewMerged openstack/nova master: Add new nova.wsgi module  https://review.opendev.org/c/openstack/nova/+/90268719:42
sean-k-mooneydansmith: i looked briefly but i dont know why that is not breaking things but my best guess is we are somehow mocking this and we dont have cpu_shared_set defined in any of the jobs19:48
sean-k-mooneythere is no reason why we cant have cpu_shared_set in one of our live migration jobs so i might try and enable that19:49
sean-k-mooneyi.e. 0-5 on one node and 2-7 on the other19:50
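(A minimal sketch of what that would look like in nova.conf on the two compute nodes; the exact ranges are just the ones suggested above:)

    # nova.conf on the first compute node
    [compute]
    cpu_shared_set = 0-5

    # nova.conf on the second compute node
    [compute]
    cpu_shared_set = 2-7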
sean-k-mooneythats pretty trivial to do so ill add a patch on top of rene's series to do that19:51
sean-k-mooneyin theory rene's code should be running even if that is not defined but we should be able to test this in the gate anyway19:52
opendevreviewStephen Finucane proposed openstack/nova master: Make overcommit check for pinned instance pagesize aware  https://review.opendev.org/c/openstack/nova/+/63105319:59
dansmithsean-k-mooney: yeah, I'm unsure either, but I'm positive that means we're just not running the code we think we're running20:01
opendevreviewribaudr proposed openstack/nova master: Fix [backport]: Live migrating to a host with cpu_shared_set configured will now update the VM's configuration accordingly.  https://review.opendev.org/c/openstack/nova/+/87777320:02
opendevreviewribaudr proposed openstack/nova master: Fix: Live migrating to a host with cpu_shared_set configured will now update the VM's configuration accordingly.  https://review.opendev.org/c/openstack/nova/+/90370620:02
sean-k-mooneydansmith: my guess right now is we are never populating the numa topology info in the migrate_data20:07
dansmithsean-k-mooney:  you mean not running a test that causes it to be populated right?20:08
dansmithI mean...not running a test _case_ where it would be populated and that code path exercised20:08
sean-k-mooneyno i meant not setting the object that had the new data at all but we appear to be20:10
artomsean-k-mooney, dansmith, hrmm, I wonder if we're fine because... let me type this out20:10
artomThe NUMA live migration code kicks in only if the instance has a NUMA topology20:11
artomUggla's patch _always_ adds that hidden item in the dst_numa_info20:11
sean-k-mooneyso it always uses the numa migration code?20:12
artomHold on20:12
artomNo, I thought I had it...20:13
sean-k-mooneythe fact we have to have this converstaion and its not obvious how its working after a few of us lokking at it is a code smell to me in either case20:16
sean-k-mooneythe code might be correct but its to complex to debug20:16
artomSo to trigger it, all that should be required is live migrating a NUMA VM to a dest with cpu_shared_set, right?20:18
artomThat would cause Uggla's patch to stash the cpu_shared_set in the "hidden" key in dst_numa_info, and *should* cause the update XML code to trip over and explode?20:19
artomWe've _definitely_ done that ^^ in whitebox, and nothing has exploded20:19
sean-k-mooneyartom: no20:19
sean-k-mooneyto trigger the bug you need to migrate a non numa vm20:19
sean-k-mooneyoh20:20
sean-k-mooneyyou mean the driver issue20:20
artomYeah, the thing that we're trying to understand how it even works20:20
sean-k-mooneyyes if the loop over the keys was broken then you would have to migrate a pinned instance20:20
sean-k-mooneyright but since this is always set https://review.opendev.org/c/openstack/nova/+/877773/10/nova/virt/libvirt/driver.py#1002820:22
sean-k-mooneyany numa instance should trigger this code path20:22
artomExactly20:22
artomI'm expecting migrate_data.dst_numa_info.cpu_pins["dst_cpu_shared_set_info"] to always be there20:23
artomOhhhhh20:25
artomWe just clobber the entire object for NUMA live migration: https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L1010620:25
sean-k-mooneybut again numa instance 20:26
artomAnd that ^^ happens later in the flow20:26
sean-k-mooneysorry that was in my buffer20:26
sean-k-mooneyya that would make sense20:26
sean-k-mooneyas that is where it would normally be created20:27
artomThat's after we make the claim on the dest, and Uggla's code was adding its thing earlier in check_can_live_migrate_destination20:27
artomSo it just gets overwritten20:27
sean-k-mooneyis it in check_can_live_migrate_destination? i thought it was in pre_live_migration (at dest)20:28
dansmithbut as sean-k-mooney said now and I said earlier, this is hiding demons in the details on so many levels, especially since this transforms to a real object property in the next patch20:28
dansmithartom: just a thought - are you running the tests you expect to pass now on the lower patch or only the stack? if the latter, then this code is "gone" by then right?20:29
artomsean-k-mooney, https://review.opendev.org/c/openstack/nova/+/877773/10/nova/virt/libvirt/driver.py#995820:29
sean-k-mooneyoh its in check_can_live_migrate_destination in the driver, i was thinking of the conductor method with a similar name20:29
artomdansmith, definitely the backportable one20:29
dansmithokay just checking20:30
artomdansmith, but I think we've figured it out :)20:30
dansmithyeah, but I haven't figured out yet if you're thinking that we're not covering the case where we would trip over that code, or if they're mutually exclusive somehow (not that it makes it okay)20:31
sean-k-mooneywell at a minimum this shows we are missing a functional test to demonstrate that the xml is updated properly between the source and destination hosts20:32
artomdansmith, they're not mutually exclusive, they definitely both run, and the NUMA live migration code runs _after_ the hack, so it clobbers the entire dst_numa_info object that was used as the hiding place for the "secret" cpu shared set info20:33
dansmithI don't understand how that can be20:34
artomdansmith lulz another call ^_^ ?20:34
sean-k-mooneyits because of the 3 conditions in the if20:34
dansmithwe have to put the dst info into the blob so it can be used to generate the xml right?20:34
sean-k-mooney   if ('cpu_pins' in info and20:34
sean-k-mooney          'cell_pins' in info and20:34
sean-k-mooney          'emulator_pins' in info):20:34
sean-k-mooneyin the non numa case only one of those 3 in checks is true20:34
sean-k-mooneyso we never enter the loop20:35
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/877773/9/nova/virt/libvirt/migration.py#18620:35
sean-k-mooneyin the numa case the numa live migration code clobbers the object that rene modified20:35
sean-k-mooneyso it just happens to work because we get lucky and in the case where its not clobbered we dont enter the loop20:36
dansmithand to be clear, that's always the case or because we're missing a test scenario where we could get unlucky?20:36
sean-k-mooneycell_pins and emulator_pins are not defined in the non numa case20:36
artomThe case where it's not clobbered is when a VM doesn't have a NUMA topology20:36
dansmithI guess the point is. yeah that ^20:36
sean-k-mooneythats always the case20:36
artomSo by pure luck we've made them functionally mutually exclusive20:37
sean-k-mooneywhen we dont have a numa guest cell_pins and emulator_pins are not set20:37
sean-k-mooneyand in the numa case the dst_numa_info is regenerated by the numa live migration code20:38
sean-k-mooneyso we never execute _update_numa_xml with cpu_pins that contains the stashed value20:39
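(Putting the two cases together, a minimal sketch of why it happens to work; assumed dict shapes, not the real migrate_data objects:)

    def _update_numa_xml(info):
        # guard from migration.py: all three keys must be present
        if 'cpu_pins' in info and 'cell_pins' in info and 'emulator_pins' in info:
            for guest_id, host_ids in info['cpu_pins'].items():
                int(guest_id)   # would explode on "dst_cpu_shared_set_info"

    # non-NUMA guest: only cpu_pins (with the stashed key) exists, the guard is
    # False, the loop never runs, so the bad key is never int()ed
    _update_numa_xml({'cpu_pins': {'dst_cpu_shared_set_info': '0-5'}})

    # NUMA guest: the NUMA live-migration code later regenerates dst_numa_info,
    # clobbering the stashed key, so by the time all three keys exist it is gone
    _update_numa_xml({'cpu_pins': {'0': {2, 3}}, 'cell_pins': {0: {0}},
                      'emulator_pins': {4, 5}})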
dansmithusually when I say "this will be a nightmare of complexity to debug in the future" I have to wait years to collect on that debt.. glad this time it was under 24 hours :D20:39
sean-k-mooneyme too because i probably would have been the one trying to figure this out in a few years :)20:39
dansmithyep20:40
artomIn a hypothetical few years wherein we've chosen to merge the haxx ;)20:41
dansmiths/we've/we'd/ yeah20:41
sean-k-mooneyUggla: have you had time to sync with artom on the change of direction20:41
sean-k-mooneyi need to go eat so ill call it a day there20:45
Ugglasean-k-mooney, more or less. But I guess Artom and I will sync tomorrow....20:45
sean-k-mooneyim glad we know why it was working im not happy with how it was working20:45
spatel_sean-k-mooney I believe I talked about this in the past. Just re-checking again. Does snapshotting support including the memory footprint? 20:51
