auniyal | Hello | 05:25 |
---|---|---|
auniyal | please review these 3 | 05:25 |
auniyal | https://review.opendev.org/c/openstack/nova/+/848886 | 05:26 |
auniyal | https://review.opendev.org/c/openstack/nova/+/849104 | 05:26 |
auniyal | https://review.opendev.org/c/openstack/nova/+/849532 | 05:26 |
*** akekane_ is now known as abhishekk | 08:04 | |
sean-k-mooney | stephenfin: if you get time can you fix the requirmetns file in https://review.opendev.org/c/openstack/nova/+/849867 then it should be good to merge | 10:03 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce bug 1981813 in func env https://review.opendev.org/c/openstack/nova/+/849985 | 10:45 |
gibi | sean-k-mooney: ^^ this is the reproduction for vnic_type change bug | 10:45 |
opendevreview | Merged openstack/nova master: libvirt: remove default cputune shares value https://review.opendev.org/c/openstack/nova/+/824048 | 11:25 |
artom | gibi, I left a comment, I'm... not sure? | 11:34 |
artom | Smells like more of a unit test | 11:34 |
gibi | artom: you probably found a bug in that test | 11:35 |
gibi | it should only pass if _heal_instance_info_cache updates the vnic_type in our cache and then restart | 11:36 |
artom | My point is that the vnic change and heal cache isn't needed | 11:36 |
gibi | probably made a two broad mock | 11:36 |
artom | If you just start the server with a macvtap and mock the PCI code to return NotFound, you'll still get the exact same service startup error | 11:36 |
gibi | yeah | 11:36 |
artom | And like, I know in real life the NotFound comes from changing the vnic_type | 11:37 |
artom | OK, I think I got it. | 11:37 |
artom | No, wait. | 11:38 |
artom | The thing that Nova is doing wrong is attempting to find the wrong PCI device? Or because the device is already used by the instance, we can't find it? | 11:41 |
gibi | it is used by the instance so we cannot find it afaik | 11:42 |
gibi | or more precisely | 11:42 |
gibi | /sys/bus/pci/devices/0000:19:0a.7/net is not exists | 11:43 |
gibi | as the pci device is attached to the instance there is no netdev on the host | 11:43 |
gibi | after the vnic_type changed to macvtap the vif plug code wants to call set_vf_interface_vlan on the pci device | 11:45 |
artom | So wait, shouldn't that also happen if you just restart nova-compute with any macvtap device? | 11:45 |
gibi | hm | 11:46 |
gibi | that is a good question | 11:46 |
gibi | wait | 11:46 |
gibi | if it is consumed as a macvtap | 11:46 |
gibi | then the VF is not consumed | 11:46 |
gibi | and probably the netdev exists | 11:46 |
artom | Ah, right. | 11:47 |
artom | Sorry, creating confusion. | 11:47 |
gibi | the problem is that we consumed the VF and then looking up the VF's netdev | 11:47 |
artom | Bingo. | 11:47 |
artom | So the functional test should probably be asserting *that* | 11:47 |
gibi | so probably mocking get_ifname_by_pci_address is too much and I should inject the fault at set_vf_interface_vlan as that is the last specific call | 11:49 |
gibi | for the macvtap plug | 11:49 |
artom | Also, this is in the plug logic, right? | 11:49 |
artom | So should also happen during instance hard reboot? | 11:50 |
artom | We don't do any PCI accounting update during reboot... | 11:50 |
gibi | yes | 11:50 |
gibi | probably the plug fails at hard reboot too | 11:50 |
gibi | I can try as I have real sriov hw right now | 11:50 |
gibi | still the test will not be perfect as with mocking I can only simulate that the VF is consumed | 11:52 |
gibi | so if you move the port set before boot but keep the mock as is then it will still fail | 11:52 |
gibi | as it will still simulate that the VF is cosumed | 11:52 |
gibi | even if in reality it would not be | 11:53 |
artom | Right, there's no way around the mocking | 11:53 |
artom | But... I think I'd an asserting what what device we tried to look for | 11:54 |
artom | *assertion | 11:54 |
artom | "<gibi> the problem is that we consumed the VF and then looking up the VF's netdev" | 11:54 |
artom | :) | 11:54 |
gibi | even always try to look for the VF of the macvtap in both cases (a) when booting with a proper macvtap, b) when change the port to macvtap and then rebooting) | 11:55 |
gibi | s/even// | 11:55 |
artom | So the first case is legit, obviously | 11:55 |
gibi | yes | 11:56 |
gibi | second shoudl fail | 11:56 |
gibi | but I cannot distinguish on the mock level | 11:56 |
artom | The breakage in the second case from the fact that we didn't free up the device when the port type changed | 11:56 |
gibi | in the func env we never consume VF the pci stuff is mocked out globally | 11:56 |
artom | No? We can't do something like assert_called_with(<netdev_path>)? | 11:57 |
gibi | we can do both in both case the path would be the same | 11:57 |
gibi | *but | 11:57 |
gibi | the difference is that the boot consumed someting in b) but not in a) | 11:58 |
gibi | but we does not track consumption | 11:58 |
gibi | as that is on the host OS level | 11:58 |
artom | Like, I'm not saying "stop asserting the service start failure" | 11:58 |
artom | Oh, I think I get it | 11:59 |
artom | There's no assertion we can make that would be different between "start with legit macvtap device" and "start with vnic_type changed macvtap" | 11:59 |
gibi | yes | 11:59 |
gibi | as the path and the logic is the same | 11:59 |
gibi | the diff happens during boot | 12:00 |
gibi | this is like extrnal state that we don't modell in test | 12:00 |
gibi | the boot changes the external state (the host OS) and the reboot will depend on that state | 12:00 |
gibi | but we don't carry that state in the test env | 12:00 |
gibi | we could, but we don't today | 12:01 |
gibi | we could create a proper stub for the pci module and track pci devices | 12:01 |
artom | That smells like a lot of work... :) | 12:03 |
artom | But... you understand why I find the test weird, right? Like, we go through all these change vnic_type steps, but they're all moot because of the mock | 12:03 |
gibi | yes, that would be a piece of work :) | 12:04 |
gibi | and yes, I got you, I try to figure out something better... | 12:04 |
artom | Sorry for not bringing any better solution :P | 12:05 |
gibi | basically the mock and the vnic_type change need to be coupled somehow... | 12:05 |
gibi | artom: no worries, you have a valid point, and it was a good excersise to talk it through | 12:05 |
gibi | artom: interestingly the hard reboot did not fail | 12:20 |
gibi | it created a proper macvtap interface and passed to the instance | 12:20 |
gibi | it is probably works because hard reboot unplugs the vif -> the VF will be freed the plug can lookup the netdev | 12:21 |
gibi | and the accounting in nova side is actually correct as the macvtap dev needs to keep the parent VF allocated to the instance (and that does not change during the reboot) | 12:22 |
gibi | so probably a direct -> direct-phyical change would not work | 12:22 |
artom | Ah, right, because it's the host OS accounting | 12:29 |
artom | So as long as we unplug first, we're fine | 12:29 |
gibi | in direct -> macvtap yes we are fine both on the OS level an in the nova PCI tracking | 12:31 |
gibi | in case of direct -> direct-phyical the PCI tracking in nova will be inconsistent | 12:31 |
gibi | as the instance will have a VF allocated but consuming a PF instead (probably) | 12:31 |
gibi | artom: I think I see a way to make the test more logical. We can check in the get_ifname_by_pci_address mock if the device being looked up is actaully consumed as a VF by the instance | 12:58 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce bug 1981813 in func env https://review.opendev.org/c/openstack/nova/+/849985 | 13:00 |
gibi | artom: ^^ | 13:01 |
artom | Oooo, nice | 13:17 |
gibi | now if you move the port update before the boot then the mock will not raise the exception and the test will fail | 13:18 |
gibi | we have the external state but it is stored the domain in the fake libvirt connectioin | 13:19 |
gibi | * stored in the domain | 13:19 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Gracefully ERROR in _init_instance vnic_type changed https://review.opendev.org/c/openstack/nova/+/850003 | 13:23 |
gibi | and here is the "fix" (two error logs and a skip) | 13:23 |
gibi | I have to leave early today so I will add unit test and reno to the patch on monday | 13:24 |
artom | Sure | 13:29 |
*** dasm|off is now known as dasm|ruck | 13:33 | |
sean-k-mooney | gibi: ack ill review that shortly i just got back form doctors appoinent they were runing 90 mins behind so took longer then planned | 13:43 |
gibi | thanks | 13:44 |
gibi | no rush, I will probably look at it only on Monday | 13:45 |
sean-k-mooney | ack | 13:45 |
sean-k-mooney | gibi: i have set the public security flag on https://bugs.launchpad.net/nova/+bug/1981813 by the way just to highlight it to the security team | 13:56 |
sean-k-mooney | it was a public bug downstream so that ship had already sailed when we triaged it | 13:57 |
opendevreview | Stephen Finucane proposed openstack/nova master: Fix compatibility with jsonschema 4.x https://review.opendev.org/c/openstack/nova/+/849867 | 14:12 |
opendevreview | Stephen Finucane proposed openstack/nova master: Remove unused requirement https://review.opendev.org/c/openstack/nova/+/850006 | 14:12 |
stephenfin | sean-k-mooney: Sorry, I missed your ping earlier. Respun with an arbitrary version now ^ | 14:12 |
sean-k-mooney | cool | 14:14 |
sean-k-mooney | will look soon | 14:14 |
opendevreview | Stephen Finucane proposed openstack/nova master: Bump jsonschema minimum to 4.0.0 https://review.opendev.org/c/openstack/nova/+/850021 | 15:23 |
sean-k-mooney | artom: you are aware that what you sugesed in https://review.opendev.org/c/openstack/nova/+/849104 is not allowed or what you are ment to do right | 15:55 |
artom | sean-k-mooney, what's not allowed? | 15:56 |
sean-k-mooney | using _IntegratedTestBase in new regression tests | 15:56 |
artom | o_O | 15:56 |
artom | Is that a new thing? | 15:57 |
sean-k-mooney | the regression tests are explicatly not ment to use the full test infra and we are not ment to use _IntegratedTestBase for new test in genera | 15:57 |
artom | ... | 15:57 |
sean-k-mooney | artom: yes its docuemnted in the readme for the regression tests i linked it in the commend inline | 15:57 |
artom | [artom@zoe nova]$ ag _IntegratedTestBase nova/tests/functional/regressions/ | wc -l | 15:57 |
artom | 19 | 15:57 |
sean-k-mooney | yep there are cases where its used | 15:57 |
sean-k-mooney | but we are trying not to do ti any more | 15:57 |
artom | Are we? Why? Because backports? | 15:58 |
sean-k-mooney | because regression tests are ment to be independing of the rest fo the fucntest so that they are not impacted by refacortings | 15:58 |
sean-k-mooney | it also help with backprots but that is not the main reason | 15:59 |
artom | I mean https://review.opendev.org/c/openstack/nova/+/812126 | 15:59 |
artom | That's at the beginning of this year | 15:59 |
sean-k-mooney | yep i know that should not have used that | 15:59 |
artom | Like, we can discuss this, and others can correct me if I'm wrong, but this is the first I hear of it, and I wasn't aware it was an established thing that all core reviewers had agreed on. | 16:00 |
sean-k-mooney | its in the readme | 16:00 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/tests/functional/regressions/README.rst#writing-regression-tests= | 16:00 |
artom | And clearly at least *some* people have not read it :) | 16:01 |
sean-k-mooney | right this is a very long standing policy for that subdirectory | 16:01 |
artom | Of not reading it? :D | 16:02 |
sean-k-mooney | it was intoduced when it was created https://github.com/openstack/nova/commit/5fe04c2ee8b721ecdaea4d06de54b4998a0df395 | 16:02 |
sean-k-mooney | i had condiered asking gibi to rewrite there repoducer to follow it a few minute ago | 16:02 |
sean-k-mooney | but we dont require all repoduced to be added as freestandign regressions | 16:03 |
sean-k-mooney | so i did not | 16:03 |
sean-k-mooney | but if they are in that folder they should be freestanding like that | 16:03 |
artom | Genuinely news to me. | 16:06 |
artom | If we want to make that a thing I won't stand in anyone's ways, but I disagree that it's already a thing. | 16:06 |
sean-k-mooney | it reallly is | 16:06 |
artom | I mean in that case let's add a hacking rule | 16:06 |
artom | If it finds _IntegratedTestCase in the regressions folder, pep8 -1 | 16:07 |
artom | If we're *actually* going to be strict about it | 16:07 |
sean-k-mooney | artom: the other thing to keep in mind is that stephen had been trying to remvoe _IntegratedTestCase at one point | 16:07 |
artom | stephenfin's been trying to remove *everything* at one point ;) | 16:07 |
sean-k-mooney | so we should not be directlly adding new tests that use it as a base | 16:07 |
sean-k-mooney | right but _IntegratedTestCase is considerd private | 16:08 |
sean-k-mooney | and new code should not inherit form it directly | 16:08 |
artom | I feel like facts on the ground contradict that :) | 16:08 |
sean-k-mooney | its subclasses are fine but not that | 16:08 |
sean-k-mooney | i think lee used it for conveicne not because it was the most correct way to repoduce | 16:09 |
sean-k-mooney | we generally try to be pragmatic about this but i disagree that amit shoudl rewrite the test they created | 16:11 |
sean-k-mooney | ProviderUsageBaseTestCase would be prefible if you require someing other then the mixins for new tests | 16:13 |
sean-k-mooney | or LibvirtProviderUsageBaseTestCase if you need libvirt | 16:14 |
*** akekane_ is now known as ahishekk | 16:14 | |
opendevreview | Artom Lifshitz proposed openstack/nova master: hacking: forbid _IntegratedTestBase in regression tests https://review.opendev.org/c/openstack/nova/+/850053 | 17:03 |
artom | sean-k-mooney ^^ Yes, I'm part-trolling :) | 17:03 |
artom | Also, it doens't actually work, which... is surprisingly annoying? | 17:03 |
artom | Like, I thought I'd be able to hack it up quicker, so the ego takes a hit | 17:03 |
sean-k-mooney | its kind fo hard to do sicne its not about _IntegratedTestBase sepcificaly | 17:04 |
sean-k-mooney | we are alowed to use the mixins | 17:04 |
artom | Oh, like those will never get refactored :P | 17:06 |
sean-k-mooney | well they should not the can have methods added but they idaaly would be addditive changes only or mainly | 17:06 |
sean-k-mooney | im going to call it a day chat to you monday o/ | 17:12 |
artom | Oh crap that remind me, I'm off next week | 17:26 |
* artom sends email | 17:26 | |
*** tobias-urdin is now known as tobias-urdin_pto | 19:19 | |
*** dasm|ruck is now known as dasm|off | 21:37 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!