Monday, 2023-01-23

opendevreviewAmit Uniyal proposed openstack/nova stable/zed: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87141305:12
opendevreviewAmit Uniyal proposed openstack/nova stable/yoga: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87141405:14
opendevreviewAmit Uniyal proposed openstack/nova stable/xena: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87141505:15
opendevreviewAmit Uniyal proposed openstack/nova stable/wallaby: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87141605:16
opendevreviewAmit Uniyal proposed openstack/nova stable/victoria: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87141705:17
opendevreviewAmit Uniyal proposed openstack/nova stable/ussuri: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87141905:19
opendevreviewAmit Uniyal proposed openstack/nova stable/train: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87144406:09
opendevreviewAmit Uniyal proposed openstack/nova stable/train: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87144406:26
opendevreviewRajesh Tailor proposed openstack/nova master: Handle InstanceInvalidState exception  https://review.opendev.org/c/openstack/nova/+/86173807:52
opendevreviewRajesh Tailor proposed openstack/nova master: Handle InstanceInvalidState exception  https://review.opendev.org/c/openstack/nova/+/86173809:47
opendevreviewSahid Orentino Ferdjaoui proposed openstack/nova master: compute: enhance compute evacuate instance to support target state  https://review.opendev.org/c/openstack/nova/+/85838310:00
opendevreviewSahid Orentino Ferdjaoui proposed openstack/nova master: api: extend evacuate instance to support target state  https://review.opendev.org/c/openstack/nova/+/85838410:00
bauzasgibi: so, wants me to move https://blueprints.launchpad.net/nova/+spec/pci-device-tracking-in-placement to Implemented so ,10:07
bauzas?10:08
gibibauzas: yes please10:08
bauzasack10:08
opendevreviewRajesh Tailor proposed openstack/nova master: Handle InstanceInvalidState exception  https://review.opendev.org/c/openstack/nova/+/86173811:06
opendevreviewRajesh Tailor proposed openstack/nova stable/zed: Correct config help message related options  https://review.opendev.org/c/openstack/nova/+/87124712:35
kashyapgibi: If you have time, appreciate a look at this failure: https://zuul.opendev.org/t/openstack/build/42cccae5e7b64091a9a63ed0f808c980.  (I thought mocking _register_all_undefined_instance_details should suffice; but fails differently after that: https://paste.opendev.org/show/bNMZwe9Sh4vT5V9gA95t/)12:44
gibikashyap: you set up mocked_compare to return 2 but the code around self._host.compare_cpu in driver.py expect an libvirtError exception when failure happens12:59
kashyapgibi: You're talking about this test in the above paste-bin, yeah? - test__check_cpu_compatibility_advance_model13:00
gibiyepp13:00
gibiwith the extra mock form the paste applied13:00
kashyapHmm13:01
kashyapgibi: The extra mock is the first step here, at least right13:02
gibikashyap: yepp, I think so13:02
gibiwhat you mocked would call into the db and that is not allowed in that unit test hence the need for the mock13:02
kashyapRight; I see.13:03
kashyapgibi: Unless I'm being dense, the test is trying to raise an exception there, isn't it - line-12.  13:04
kashyap(In the pastebin)13:04
gibikashyap: nope, that point the test *expects* an exception is being raied by calling drvr.init_host13:04
kashyapgibi: Hmm, how would you suggest to fix this?  I'm a bit out of brain cells here13:06
kashyapI didn't paste the last line of test traceback, but probably you saw it in Zuul:  it's the "impl.MismatchError ... <bound method [...] returned None>"13:09
gibiso looking at https://github.com/openstack/nova/blob/d8b4b7bebdc0f55353cd99f372044b9e30315a6d/nova/virt/libvirt/driver.py#L9978-L999313:10
gibiyou need to return a negative integer from mock_compare to trigger a failure case OR you have to raise libvirtError from mock_compare13:11
gibiI'm not sure what the real libvirt behavior is, raise or return negative13:12
gibibut the code linked above handles both13:12
kashyapAaah, the "if ret <=0" bit13:12
gibione more thing13:15
gibiyou mock nova.virt.libvirt.host.libvirt.Connection.compareCPU13:15
gibibut I'm not sure that the actuall call reaches there in the unit test13:16
kashyapHmm13:16
gibiI think in that unit test the call reaches nova.tests.fixtures.libvirt.Connection.compareCPU13:16
gibiinstead13:16
kashyap(Aside: returning negative integer didn't help; I tried -  "mocked_compare.side_effect = -1")13:17
kashyapgibi: Hmm, it's odd that it is reaching the fixture in this case, rather than the host.libvirt13:18
gibilet me check...13:18
kashyap(Another test is failing the same way, actually.  If you want to pull the patch in: https://review.opendev.org/c/openstack/nova/+/870794/)13:18
gibihm, there is a bug in the code. We removed the first check and kept the second but actually the second check runs on each cpu_model_extra_flags flag one by one. If no flags defined there then no check runs at all13:23
gibithe latter happens in the unit test13:23
kashyapgibi: Hmm, I sure tried adding extra_flags to this test.  Maybe I mixed up, lemme try adding one in the test13:24
kashyapgibi: You mean "bug in the code" or bug in the unit test code -- to be modified to reflect the new reality13:25
kashyapgibi: Sure enough, adding this line succeeds the test:13:25
kashyap+                   cpu_model_extra_flags = ["-aes"],13:25
gibiit is more like a question. Do we want to run the check if on extra flags are configured?13:25
kashyapgibi: Yeah, we do want to check to run (_compare_cpu) when the extra_flags are configured -- is that what you ask?13:26
gibiscratch the nova.tests.fixtures.libvirt.Connection.compareCPU issue that was my mistake13:27
kashyapgibi: I had to add the above diff, _and_ also return a negative integer for mocked_compare13:27
kashyapgibi: So that's the fully modified test - https://paste.opendev.org/show/bPZk3eYZzuS4F7yazGqK/13:28
kashyapSee line-6 and line-913:28
gibikashyap: my question is: if there is no extra flags configured, should we still run the compare_cpu or is it OK to only run the compare_cpu if there are extra flags configured 13:28
gibikashyap: yepp tha paste matches what I did13:29
sean-k-mooneygibi: we should run the compare if there are no extra flags13:29
sean-k-mooneywe just dont need to modify the model13:29
gibithen we need to change the code up a bit 13:29
gibias this for loop is empty if no extra flags provided https://review.opendev.org/c/openstack/nova/+/870794/8/nova/virt/libvirt/driver.py#99213:29
sean-k-mooneythe code should treat it as an empty list of flags13:29
kashyapsean-k-mooney: We never modify the model ourselves13:29
gibisean-k-mooney: also I'm not sure you see that but this code compares the extra flag independently one by one, not all at once13:30
sean-k-mooneywell not the model but the cpu definition13:30
sean-k-mooneywhat we want the algortim to do is generate teh cpu xml from the modle then apply any flags that are present and then do one compare13:31
sean-k-mooneyit should do that for each model in cpu_models13:31
kashyapgibi: Can you summarize your observation in the comment, please?  (Also - the one-by-one flag comparison - I don't see an issue there)13:31
sean-k-mooneyand raise an error if any of them fail13:31
gibikashyap: sure13:31
sean-k-mooneydoing it flag by flag is an optimisation of that in that it will fail on the first flag that is not supported13:32
sean-k-mooneybut its not really sematicaly correct13:32
sean-k-mooneyfor example to disable TSX you need to remove two flags13:32
sean-k-mooneydoing it one at a time will fail in that case13:32
kashyapgibi: Also, I'm not sure I agree with sean-k-mooney on to run the _compare_cpu() when there are no flags there13:33
sean-k-mooneyno flags means do not modify the enabled model13:33
kashyapsean-k-mooney: If you run it w/o the flags, then that means it's the same as the first call that we removed13:33
sean-k-mooneyso we still need to check that for validity13:33
kashyapDoes what I say above that make sense?13:33
sean-k-mooneykashyap: the first call is correct where teh model is comaptible with the host13:33
sean-k-mooneyflags is optional and only needed fi teh default models shipped by lbivrt/qemu are not sufficent fro your usecase13:34
sean-k-mooneywhere they are you should not be requried to use it but we should still validate the model13:35
gibithere is a difference between the old check we removed and using the new check with the empty flag list13:35
sean-k-mooneyi mentioned in one of my reviews i was not sure the current compare was correct and the per flag iteration was one of my concerns13:36
kashyapsean-k-mooney: Huh, if we retain the second call when no flags are given, then there's no difference with the _removed_ call in this patch -- gibi: does this make sense?13:36
gibithe old check used cpu.model = self._get_cpu_model_mapping(model) as the the one to compare while the remaining check uses self._host.get_capabilities().host.cpu.model + the extra flags13:36
sean-k-mooneythe first call was only incorrect if you use extra flags13:36
kashyap[ What's the diff with the new (or rather the 2nd check) with an empty flag list?  Ah, you just write above]13:36
sean-k-mooneywell the second check is wrong because its not looping over al the models enabeld in cpu_models13:38
kashyap Hmm, I'm trying to minimize the "impact surface" here.13:38
gibiI guess we are back to the drawing board then13:38
kashyapgibi: The first check was wrong anyway: so that we can remove w/o doubt.13:38
sean-k-mooneyno not in all cases13:39
sean-k-mooneyit was only wrogn if you enabled a modle that is not compatible with the cpu13:39
gibithe two check uses a different cpu.model for comparision13:39
sean-k-mooneygibi: right and the second check is not looping over the lsit fo cpu_modles13:39
sean-k-mooneyso we need to 1 loop over them using self._get_cpu_model_mapping(model) and 2 apply all cpu flags before doing the compare13:40
sean-k-mooney3 the comapre should not be condtional on extra flags and should happen even when empty13:41
kashyapHm, that could make whole code flow to be changed and a lot more surface impact :-( - I wonder if we try it on _top_ of the workaround?13:41
gibibasically to see that each configred model with all the configured flags applied are still compatible with the hypervisor13:41
kashyapgibi: Yeah, that's what libvirt will be doing under the hood "by default".  13:42
gibikashyap: I don't think so. I guess libvirt will do the check during boot with the selected model + extra flags13:43
gibinot each model + flags at startup13:43
kashyap(I'm just thinking if we can "do this additional improvement" on _top_ of the current workaround + API replacement patch?)13:43
gibiI guess if we want the workaround to land independently then we need to add back the old check and make both check optional with the WA flag. 13:43
gibisean-k-mooney: ^^ what do you think?13:44
kashyapHmm, the first check is wrong unilaterally - sean-k-mooney agrees that too, IIUC13:44
kashyap(As based on his suggestion is the current revision)13:44
kashyap(And thanks for your patience so far, folks!)13:45
gibithen I'm confused, as we discussed above that the second check alone is not enough. 13:45
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/870794/8/nova/virt/libvirt/driver.py#99213:46
sean-k-mooneykashyap: actully i dont13:46
gibias it does nothing when no extra flags I configured13:46
sean-k-mooneykashyap: i said its condtionally wrong if and only if you use extra_flags13:46
sean-k-mooneyanyway i posted what i think is the correct code in the comment above ^13:47
kashyapsean-k-mooney: Well, I'm fine if you've refined your view _now_.  But a few days ago you did write here:13:48
kashyap(quote)13:48
kashyap< sean-k-mooney> but if you want to future proof then sure you can put the remain check under a workaroudn13:48
kashyap[...]13:48
kashyap< sean-k-mooney> or just delete the first check13:48
kashyap(/quote)13:48
kashyap(Doing "nothing when no extra flags" are configured is still fine - as it was already explicitly tested by a downstream bug reporter in a real env)13:49
sean-k-mooneyyes i did not express my concern at that point with the looping as i tought i has already disucsst that with you13:49
kashyapOkay.13:49
sean-k-mooneykashyap: its not fine because its an upstream regression13:49
kashyapgibi: sean-k-mooney: Also it was tested with the workaround as it stands: https://bugzilla.redhat.com/show_bug.cgi?id=2138381#c4513:50
kashyap(Unforunately, it's a private comment, so only RHTers can see it)13:50
kashyap(Hmm, perhaps ignore that test, I think he tested it with the first revision)13:51
gibikashyap: you will only see the changed behaivor if the test is configuring incompatible model without extra flags.13:51
kashyapgibi: True13:51
gibiin the baseline that will fail the startup of nova-compute in the new code it will let the nova-compute start13:51
gibibut fail all instance boot instead13:52
kashyapI'm trying to find a reasonable compromise here: so shall I put both the blocks back into the code and add the workaround?13:52
kashyap(s/blocks/calls/)13:52
kashyapgibi: Right, failing at instance boot is still "something" - as it still provides a libvirt error.  But I know we've already discussed and that's not 'acceptable'13:53
sean-k-mooneyi condier the bug you are introduceing by removign all cpu checks in the case that cpu flags is empty to be more serios then the bug you are trying to fix13:53
kashyapsean-k-mooney: Keeping the first call in tact will break upstream deployments whenever using Icelake and Cascadelake 13:53
sean-k-mooney[m]the code i put in a comment in the reveiw will work for that case13:54
kashyapsean-k-mooney: gibi: Okay: I'll keep both the checks in and wrap 'em around the workaround.13:54
sean-k-mooney[m]and provide the corect behavior in other cases13:54
kashyapDoes that suit?13:54
sean-k-mooney[m]have you looked at the code i provided13:54
sean-k-mooney[m]i would perfer if you used that13:54
kashyapNot yet; /me goes to look13:54
sean-k-mooneymy network went down for a minute or two so i read scrollback on matrix so i think im caught back up13:55
gibisean-k-mooney: does that change works for the *lake issue even if the compare_cpu is not replaced with the compare_hypervisor_cpu call?13:56
sean-k-mooneyoh am i ment to use the new api but i think it would yes13:56
kashyapsean-k-mooney: So you did this:13:56
kashyap- cpu.model = self._host.get_capabilities().host.cpu.model13:56
kashyap+ cpu.model = self._get_cpu_model_mapping(model)13:56
kashyapsean-k-mooney: (The new API patch comes on top of the workaround.)13:56
sean-k-mooneynot just that i also did a loop over teh enabled cpu models13:57
sean-k-mooneyyou added support for multiple models a few release ago13:57
sean-k-mooneybut the cpu_flags aware code path13:57
sean-k-mooneywas not updated to loop over the list of models13:57
sean-k-mooneyso that is an existing bug13:57
kashyapOh, wait - you wrapped it under a for loop: "for model in models:"13:58
sean-k-mooneyyes13:58
kashyapLemme try the tests to see what breaks; again - such a change will be done on _top_ of the workaround of both the calls13:58
sean-k-mooneyand i made sure the compare was not condtional on the flags13:58
sean-k-mooneykashyap: no13:58
sean-k-mooneythis need to be in the first patch13:58
sean-k-mooneywell13:59
kashyapsean-k-mooney: Well, we can't be 100% sure this works in all cases.  So that's too much of a risk to take13:59
sean-k-mooneyit depend on what you mean by on top of13:59
kashyapBecause we don't have the bandwidth to test all the possibilities13:59
sean-k-mooneyif you mena later in the series then no if you mean in addtion too then yes13:59
kashyapThe safest is to wrap the 2 calls in the workaround: and _then_ do any code-flow changes on top14:00
kashyapDoes that sound reasonable?  (Cc: gibi)14:00
sean-k-mooneykashyap: as it stand i cant really supprot backporting the changes you are proposing downstream or upstream14:00
kashyapsean-k-mooney: Please see above comment - do you see what I mean?14:00
sean-k-mooneyi do but i dont agree that is what we should do14:01
gibisome of the risk coming from the incomplete test covarege is mitigated by having the WA flag I assume14:02
kashyapsean-k-mooney: Shall we please arrive at a reasonable compromise, instead of "ideally situations?14:02
kashyapgibi: Exactly14:02
sean-k-mooneyyes but i dont think we can recommend that customer use that without closing some of those gaps and qe testing14:02
kashyapWe can't possibly test all cases.  So I'm not confident in any code-flow w/o absolute confidence that it's introducing unwanted regressions14:02
*** dasm}off is now known as dasm14:02
sean-k-mooneykashyap: i am confident that disablinbg the check is a regession and im not comfrotabel with use telling custoemr to do that14:03
kashyapsean-k-mooney: Addressing gaps and QE testing can come later.  We take step by step based on the available bandwidth.14:03
gibiwe don't have the coverage over the current code either, and we see a bug in it for the *lake CPUs. We want to fix the bug by removing a buggy check. But by removing the buggy check we remove the good part of the check too. 14:04
gibisean-k-mooney proposed a fix that does not remove the buggy check but improve it14:04
gibisure it has a testing risk as we have no coverage14:05
gibiso the we have to decide what is bigger the lost good part of the check if we remove it or the risk by fixing the check but not having full coverage14:05
kashyapOkay, let me try sean-k-mooney's suggestion from their comment14:05
kashyapAnd "see what happens" with tests14:06
kashyapI'll come back and comment on the change.14:06
sean-k-mooneywe can test it with the functional tests we just need to write a test for it14:06
sean-k-mooneybut i agree the test coverage is not complete and we might want to do that as a followup14:06
gibisean-k-mooney: we don't have the real libvirt code running in functional 14:06
gibiwe have the libvirt/driver but we don't have the compareCPU from libvirt itself14:07
sean-k-mooneywe dont but we could enhance the fixture if we really wanted14:07
sean-k-mooneyi was thinking if we could test this in ci via tempest but we dont really contole the enve enought to do that reliably14:07
sean-k-mooneyso i dont really want to go down that route14:08
gibiyeah, best would be to test this with real hardver and the TSX case but that is haaard14:08
gibi(should be possible downstream though)14:08
sean-k-mooneywith whitebox we coudl proably do it14:09
kashyapI'm running the UTs with Sean's suggestion on top this: https://review.opendev.org/c/openstack/nova/+/87079414:09
sean-k-mooneyhow we woululd do it upstream in tempest is selct a cpu_model that we knwo is unaviable14:10
kashyap(If that gives some confidence in a day or so.  I can wrap it in the workaround.)14:10
sean-k-mooneyand disable the cpu flag to make it look like nehalem14:10
sean-k-mooneythe problem is that a provider could add that model in teh future14:10
sean-k-mooneythe best way to do that would be to use a model that requests a remvoed feature like TSX14:10
sean-k-mooneyand disable tsk via the extra flags14:11
sean-k-mooneypresumable our ci provireder also dont have tsx avaible anymore14:11
sean-k-mooneyif they have been updating htere kernels/microcode14:11
kashyapYeah, we don't know that, and can't rely on it.14:12
gibiso my position is that let's take the risk and doing sean-k-mooney's fix but have the WA flag added. So if the change is good then all is golden, but if the change has some unforseen side effect then we can turn it off while we improve it. For me the risk of this scenario is acceptable. 14:15
kashyapRight, I'm testing Sean's suggestion with a workaroud wrapper14:16
sean-k-mooneyack14:16
kashyapgibi: Can you also review the code that sean-k-mooney suggested, please?  Two eyes are better :)14:16
sean-k-mooneyso you testing just one check the new one i pasted ya14:16
sean-k-mooneyand that wrapped in the workaround flag14:16
kashyapsean-k-mooney: Your comment on PS8, yeah14:16
gibikashyap: I checked it it looked OK to me at first glance14:16
sean-k-mooneycool14:16
sean-k-mooneyok i want to prepare for meeting ectra so ill be back in a bit14:18
kashyapSure; thanks, folks.14:18
kashyapgibi: sean-k-mooney: The first brush looks good :) I only had to modify the mocked_compare in 3 unit tests to return a negative integer.14:32
opendevreviewKashyap Chamarthy proposed openstack/nova master: libvirt: At start-up rework compareCPU() usage with a workaround  https://review.opendev.org/c/openstack/nova/+/87079414:43
opendevreviewKashyap Chamarthy proposed openstack/nova master: libvirt: Replace usage of compareCPU() with compareHypervisorCPU()  https://review.opendev.org/c/openstack/nova/+/86995014:43
kashyapsean-k-mooney:  --^ (gibi: On the UTs: with the negative integer added, the extra mock is not required even.)14:43
kashyapThanks for bearing with me!14:43
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: demo 1  https://review.opendev.org/c/openstack/nova/+/87148014:50
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: demo 2  https://review.opendev.org/c/openstack/nova/+/87148114:50
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: demo 2  https://review.opendev.org/c/openstack/nova/+/87148114:52
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: demo 1  https://review.opendev.org/c/openstack/nova/+/87148014:52
opendevreviewArtom Lifshitz proposed openstack/nova master: DNM: demo middle  https://review.opendev.org/c/openstack/nova/+/87148214:52
sahido/ - quick question I have to rebase on conflict a micro-version change, I have some tests failing but struggling to find the issue https://paste.ubuntu.com/p/RNw8yxQtj6/15:58
sahidany idea that can help me?15:59
sahidTemplate: ^2.95$16:01
sahidSample: 2.9416:01
sahidI don't see where this 2.94 is comming from16:01
bauzassahid: are you sure you updated all your API change files so it now uses 2.95 ?16:57
bauzassahid: commented your patch17:04
sahidbauzas: well I want to say yes but I have probably missed something :-)17:34
sahidI will double check17:34
bauzassahid: I tried to quick looked at the test to understand why it autogenerates 2.9417:34
bauzasto quickly look*17:34
bauzasbut I haven't found a lot, if you still have the problem, you could try to pdb it17:35
bauzasto find how it generates this template17:35
bauzasif you can't, I can offer my help tomorrow17:35
sahidno worries thanks to have looked at it. I will double check that tomorrow as-well, don't spend time on it I will ping you when it's ready :-)17:37
opendevreviewSofia Enriquez proposed openstack/nova master: WIP: Implement encryption on backingStore  https://review.opendev.org/c/openstack/nova/+/87001218:28
opendevreviewSahid Orentino Ferdjaoui proposed openstack/nova master: api: extend evacuate instance to support target state  https://review.opendev.org/c/openstack/nova/+/85838418:53
*** dasm is now known as dasm|off23:32

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!