Friday, 2022-07-15

auniyalHello05:25
auniyalplease review these 305:25
auniyalhttps://review.opendev.org/c/openstack/nova/+/84888605:26
auniyalhttps://review.opendev.org/c/openstack/nova/+/84910405:26
auniyalhttps://review.opendev.org/c/openstack/nova/+/84953205:26
*** akekane_ is now known as abhishekk08:04
sean-k-mooneystephenfin: if you get time can you fix the requirmetns file in https://review.opendev.org/c/openstack/nova/+/849867 then it should be good to merge10:03
opendevreviewBalazs Gibizer proposed openstack/nova master: Reproduce bug 1981813 in func env  https://review.opendev.org/c/openstack/nova/+/84998510:45
gibisean-k-mooney: ^^ this is the reproduction for vnic_type change bug10:45
opendevreviewMerged openstack/nova master: libvirt: remove default cputune shares value  https://review.opendev.org/c/openstack/nova/+/82404811:25
artomgibi, I left a comment, I'm... not sure?11:34
artomSmells like more of a unit test11:34
gibiartom: you probably found a bug in that test11:35
gibiit should only pass if _heal_instance_info_cache updates the vnic_type in our cache and then restart11:36
artomMy point is that the vnic change and heal cache isn't needed11:36
gibiprobably made a two broad mock 11:36
artomIf you just start the server with a macvtap and mock the PCI code to return NotFound, you'll still get the exact same service startup error11:36
gibiyeah11:36
artomAnd like, I know in real life the NotFound comes from changing the vnic_type11:37
artomOK, I think I got it.11:37
artomNo, wait.11:38
artomThe thing that Nova is doing wrong is attempting to find the wrong PCI device? Or because the device is already used by the instance, we can't find it?11:41
gibiit is used by the instance so we cannot find it afaik11:42
gibior more precisely 11:42
gibi/sys/bus/pci/devices/0000:19:0a.7/net is not exists11:43
gibias the pci device is attached to the instance there is no netdev on the host11:43
gibiafter the vnic_type changed to macvtap the vif plug code wants to call set_vf_interface_vlan on the pci device11:45
artomSo wait, shouldn't that also happen if you just restart nova-compute with any macvtap device?11:45
gibihm11:46
gibithat is a good question11:46
gibiwait11:46
gibiif it is consumed as a macvtap11:46
gibithen the VF is not consumed11:46
gibiand probably the netdev exists11:46
artomAh, right.11:47
artomSorry, creating confusion.11:47
gibithe problem is that we consumed the VF and then looking up the VF's netdev11:47
artomBingo.11:47
artomSo the functional test should probably be asserting *that*11:47
gibiso probably mocking get_ifname_by_pci_address is too much and I should inject the fault at set_vf_interface_vlan as that is the last specific call11:49
gibifor the macvtap plug11:49
artomAlso, this is in the plug logic, right?11:49
artomSo should also happen during instance hard reboot?11:50
artomWe don't do any PCI accounting update during reboot...11:50
gibiyes11:50
gibiprobably the plug fails at hard reboot too11:50
gibiI can try as I have real sriov hw right now11:50
gibistill the test will not be perfect as with mocking I can only simulate that the VF is consumed 11:52
gibiso if you move the port set before boot but keep the mock as is then it will still fail11:52
gibias it will still simulate that the VF is cosumed11:52
gibieven if in reality it would not be11:53
artomRight, there's no way around the mocking11:53
artomBut... I think I'd an asserting what what device we tried to look for11:54
artom*assertion11:54
artom"<gibi> the problem is that we consumed the VF and then looking up the VF's netdev"11:54
artom:)11:54
gibieven always try to look for the VF of the macvtap in both cases (a) when booting with a proper macvtap, b) when change the port to macvtap and then rebooting)11:55
gibis/even//11:55
artomSo the first case is legit, obviously11:55
gibiyes11:56
gibisecond shoudl fail11:56
gibibut I cannot distinguish on the mock level11:56
artomThe breakage in the second case from the fact that we didn't free up the device when the port type changed11:56
gibiin the func env we never consume VF the pci stuff is mocked out globally11:56
artomNo? We can't do something like assert_called_with(<netdev_path>)?11:57
gibiwe can do both in both case the path would be the same11:57
gibi*but11:57
gibithe difference is that the boot consumed someting in b) but not in a)11:58
gibibut we does not track consumption11:58
gibias that is on the host OS level11:58
artomLike, I'm not saying "stop asserting the service start failure"11:58
artomOh, I think I get it11:59
artomThere's no assertion we can make that would be different between "start with legit macvtap device" and "start with vnic_type changed macvtap"11:59
gibiyes11:59
gibias the path and the logic is the same11:59
gibithe diff happens during boot12:00
gibithis is like extrnal state that we don't modell in test12:00
gibithe boot changes the external state (the host OS) and the reboot will depend on that state12:00
gibibut we don't carry that state in the test env12:00
gibiwe could, but we don't today12:01
gibiwe could create a proper stub for the pci module and track pci devices12:01
artomThat smells like a lot of work... :)12:03
artomBut... you understand why I find the test weird, right? Like, we go through all these change vnic_type steps, but they're all moot because of the mock12:03
gibiyes, that would be a piece of work :)12:04
gibiand yes, I got you, I try to figure out something better...12:04
artomSorry for not bringing any better solution :P12:05
gibibasically the mock and the vnic_type change need to be coupled somehow...12:05
gibiartom: no worries, you have a valid point, and it was a good excersise to talk it through12:05
gibiartom: interestingly the hard reboot did not fail12:20
gibiit created a proper macvtap interface and passed to the instance12:20
gibiit is probably works because hard reboot unplugs the vif -> the VF will be freed the plug can lookup the netdev12:21
gibiand the accounting in nova side is actually correct as the macvtap dev needs to keep the parent VF allocated to the instance (and that does not change during the reboot)12:22
gibiso probably a direct -> direct-phyical change would not work 12:22
artomAh, right, because it's the host OS accounting12:29
artomSo as long as we unplug first, we're fine12:29
gibiin direct -> macvtap yes we are fine both on the OS level an in the nova PCI tracking12:31
gibiin case of direct -> direct-phyical the PCI tracking in nova will be inconsistent12:31
gibias the instance will have a VF allocated but consuming a PF instead (probably)12:31
gibiartom: I think I see a way to make the test more logical. We can check in the get_ifname_by_pci_address mock if the device being looked up is actaully consumed as a VF by the instance12:58
opendevreviewBalazs Gibizer proposed openstack/nova master: Reproduce bug 1981813 in func env  https://review.opendev.org/c/openstack/nova/+/84998513:00
gibiartom: ^^13:01
artomOooo, nice13:17
gibinow if you move the port update before the boot then the mock will not raise the exception and the test will fail13:18
gibiwe have the external state but it is stored the domain in the fake libvirt connectioin13:19
gibi* stored in the domain13:19
opendevreviewBalazs Gibizer proposed openstack/nova master: Gracefully ERROR in _init_instance vnic_type changed  https://review.opendev.org/c/openstack/nova/+/85000313:23
gibiand here is the "fix" (two error logs and a skip)13:23
gibiI have to leave early today so I will add unit test and reno to the patch on monday13:24
artomSure13:29
*** dasm|off is now known as dasm|ruck13:33
sean-k-mooneygibi: ack ill review that shortly i just got back form doctors appoinent they were runing 90 mins behind so took longer then planned13:43
gibithanks13:44
gibino rush, I will probably look at it only on Monday13:45
sean-k-mooneyack13:45
sean-k-mooneygibi: i have set the public security flag on https://bugs.launchpad.net/nova/+bug/1981813 by the way just to highlight it to the security team13:56
sean-k-mooneyit was a public bug downstream so that ship had already sailed when we triaged it13:57
opendevreviewStephen Finucane proposed openstack/nova master: Fix compatibility with jsonschema 4.x  https://review.opendev.org/c/openstack/nova/+/84986714:12
opendevreviewStephen Finucane proposed openstack/nova master: Remove unused requirement  https://review.opendev.org/c/openstack/nova/+/85000614:12
stephenfinsean-k-mooney: Sorry, I missed your ping earlier. Respun with an arbitrary version now ^14:12
sean-k-mooneycool14:14
sean-k-mooneywill look soon14:14
opendevreviewStephen Finucane proposed openstack/nova master: Bump jsonschema minimum to 4.0.0  https://review.opendev.org/c/openstack/nova/+/85002115:23
sean-k-mooneyartom: you are aware that what you sugesed in https://review.opendev.org/c/openstack/nova/+/849104 is not allowed or what you are ment to do right15:55
artomsean-k-mooney, what's not allowed?15:56
sean-k-mooneyusing _IntegratedTestBase in new regression tests15:56
artomo_O15:56
artomIs that a new thing?15:57
sean-k-mooneythe regression tests are explicatly not ment to use the full test infra and we are not ment to use _IntegratedTestBase for new test in genera15:57
artom...15:57
sean-k-mooneyartom: yes its docuemnted in the readme for the regression tests i linked it in the commend inline15:57
artom[artom@zoe nova]$ ag _IntegratedTestBase nova/tests/functional/regressions/ | wc -l15:57
artom1915:57
sean-k-mooneyyep there are cases where its used15:57
sean-k-mooneybut we are trying not to do ti any more15:57
artomAre we? Why? Because backports?15:58
sean-k-mooneybecause regression tests are ment to be independing of the rest fo the fucntest so that they are not impacted by refacortings15:58
sean-k-mooneyit also help with backprots but that is not the main reason15:59
artomI mean https://review.opendev.org/c/openstack/nova/+/81212615:59
artomThat's at the beginning of this year15:59
sean-k-mooneyyep i know that should not have used that15:59
artomLike, we can discuss this, and others can correct me if I'm wrong, but this is the first I hear of it, and I wasn't aware it was an established thing that all core reviewers had agreed on.16:00
sean-k-mooneyits in the readme16:00
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/tests/functional/regressions/README.rst#writing-regression-tests=16:00
artomAnd clearly at least *some* people have not read it :)16:01
sean-k-mooneyright this is a very long standing policy for that subdirectory16:01
artomOf not reading it? :D16:02
sean-k-mooneyit was intoduced when it was created https://github.com/openstack/nova/commit/5fe04c2ee8b721ecdaea4d06de54b4998a0df39516:02
sean-k-mooneyi had condiered asking gibi to rewrite there repoducer to follow it a few minute ago16:02
sean-k-mooneybut we dont require all repoduced to be added as freestandign regressions16:03
sean-k-mooneyso i did not16:03
sean-k-mooneybut if they are in that folder they should be freestanding like that16:03
artomGenuinely news to me.16:06
artomIf we want to make that a thing I won't stand in anyone's ways, but I disagree that it's already a thing.16:06
sean-k-mooneyit reallly is16:06
artomI mean in that case let's add a hacking rule16:06
artomIf it finds _IntegratedTestCase in the regressions folder, pep8 -116:07
artomIf we're *actually* going to be strict about it16:07
sean-k-mooneyartom: the other thing to keep in mind is that stephen had been trying to remvoe _IntegratedTestCase at one point16:07
artomstephenfin's been trying to remove *everything* at one point ;)16:07
sean-k-mooneyso we should not be directlly adding new tests that use it as a base16:07
sean-k-mooneyright but _IntegratedTestCase is considerd private16:08
sean-k-mooneyand new code should not inherit form it directly16:08
artomI feel like facts on the ground contradict that :)16:08
sean-k-mooneyits subclasses are fine but not that16:08
sean-k-mooneyi think lee used it for conveicne not because it was the most correct way to repoduce16:09
sean-k-mooneywe generally try to be pragmatic about this but i disagree that amit shoudl rewrite the test they created16:11
sean-k-mooneyProviderUsageBaseTestCase would be prefible if you require someing other then the mixins for new tests16:13
sean-k-mooneyor LibvirtProviderUsageBaseTestCase if you need libvirt16:14
*** akekane_ is now known as ahishekk16:14
opendevreviewArtom Lifshitz proposed openstack/nova master: hacking: forbid _IntegratedTestBase in regression tests  https://review.opendev.org/c/openstack/nova/+/85005317:03
artomsean-k-mooney ^^ Yes, I'm part-trolling :)17:03
artomAlso, it doens't actually work, which... is surprisingly annoying?17:03
artomLike, I thought I'd be able to hack it up quicker, so the ego takes a hit17:03
sean-k-mooneyits kind fo hard to do sicne its not about _IntegratedTestBase sepcificaly17:04
sean-k-mooneywe are alowed to use the mixins17:04
artomOh, like those will never get refactored :P17:06
sean-k-mooneywell they should not the can have methods added but they idaaly would be addditive changes only or mainly17:06
sean-k-mooneyim going to call it a day chat to you monday o/17:12
artomOh crap that remind me, I'm off next week17:26
* artom sends email17:26
*** tobias-urdin is now known as tobias-urdin_pto19:19
*** dasm|ruck is now known as dasm|off21:37

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!