Monday, 2023-11-13

opendevreviewRajesh Tailor proposed openstack/nova master: Fix KeyError on assisted snapshot call  https://review.opendev.org/c/openstack/nova/+/90078310:35
opendevreviewDanylo Vodopianov proposed openstack/nova master: Packed virtqueue support was added.  https://review.opendev.org/c/openstack/nova/+/87607511:46
jkulikHi folks, we just stumbled over https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/5ASRKETLJQAZM2KJ2EDIJZB6XMDWAL4P/ which has removing vmware from Nova and other projects as the topic. We're still actively using VMware inside SAP through OpenStack. Is there any way we can reverse the decision to completely remove the vmwareapi infrastructure from Nova?12:01
stephenfinjkulik: I would suggest bringing this up on the Nova weekly meeting and likely openstack-discuss. The details and link to the agenda for the former can be found here https://meetings.opendev.org/#Nova_Team_Meeting12:12
stephenfinjkulik: the tl;dr: is that we have had no CI for any of this code for many years now, which means we have no ability to test it and ensure that it works/isn't broken by other changes12:15
opendevreviewSamuel Kunkel proposed openstack/nova master: fix: amd-sev handle missing img properties  https://review.opendev.org/c/openstack/nova/+/87426412:17
stephenfinjkulik: I would also suggest you need to bring this up internally rather quickly. I can't speak for anyone else but _my guess_ would be that the only way this decision would be reversed would be via a long-term commitment from some third party to maintain testing infrastructure for these components (the VMWare-related drivers in Nova, Cinder, Neutron etc.; oslo-vmware, ...)12:17
stephenfinjkulik: We previously got such a commitment from VMWare folks (about 2-3 years ago, iirc) but it wasn't followed up on, so there will likely be some hesitation at taking any commitment at face value without seeing e.g. a CI being established12:20
stephenfinHopefully that helps provide a bit of colour to the situation 12:20
jkulikthank you. indeed I remember the deprecation attempt 2-3 years ago when we got VMware to bring back some CI. we were not aware that it's broken again.12:23
sean-k-mooneyjkulik: the ci hase been absent for 2+ years12:34
jkulikiirc this was the time when we talked to VMware and they brougt back a CI https://review.opendev.org/c/openstack/nova/+/712101 - 2020-03 ... so there should have been some CI ~3 years ago again. https://review.opendev.org/c/openstack/nova/+/806391 in 2021-09 still contains a message from a "VMware NSX CI". I'm just wondering when it went away.12:39
sean-k-mooneyjkulik: presumably related to the pandmeic that ci went way again in 2021 and never returned12:47
jkulik2022-02 I found a message `ext-nova-zuul: NOT_REGISTERED` for that CI. sounds like a good hint when it went away. in 2021 it still seemed to work. Even in 2022-01-27 there's still a message for a Cinder review-request12:47
jkulikyeah, VMware doesn't have too much going for OpenStack. is there a way for a third-party, non-VMware user to get notified if the CI goes away?12:48
sean-k-mooneythere isnt, and evenupstream developer will only know if they noticed its not reporting12:49
sean-k-mooneyjkulik: they did not tell us they were nolonger maintianing it or removing it12:49
sean-k-mooneyit jus t disappeared silently12:49
sean-k-mooneyagain12:50
jkulikthey did not tell us either ... that's why I would have liked to get notified somehow to tell them to fix it12:50
sean-k-mooneyat this point i thik its beyond that12:50
sean-k-mooneythey have demonstarted that they will not maintain the ci or lib requried to supprot vmware12:51
jkulikCould someone else step up?12:51
jkulikWould that be acceptable, I mean?12:51
sean-k-mooneyhonestly i dont think so12:51
jkulikso we would be left with trying to maintain an out-of-tree driver, which is not supported, and keeping all the to-be-removed code in sync with upstream in our downstream fork ... :/12:52
sean-k-mooneyat this point i think we woudl neeed to see more then a token effort12:52
sean-k-mooneyjkulik: out of tree dirvers wont work12:53
jkulikyeah, I read that12:53
stephenfinsean-k-mooney: I don't see any reason why another third party couldn't maintain it. As you say though, it'd have to be more than a token effort12:53
sean-k-mooneythe core of vmware is opensouce right (EXSI(12:53
sean-k-mooneystephenfin: someone else could12:54
jkulikno, ESXi is not open source. not sure if that was sarcasm12:54
sean-k-mooneyi tought ther ewas an opensouce core12:54
sean-k-mooneyexi is not12:54
sean-k-mooneybut isnt esxi based on linux12:54
jkulikI haven't heard of any. The things we use are not open source.12:55
jkulikwhat would "more than a token effort" entail?12:55
sean-k-mooneyactully continutigi to the maintaicne of vmaware as well as first/tridpary ci12:56
sean-k-mooneyjkulik: i might of been thinking fo the fact there is a free version of esxi12:56
sean-k-mooneyits not opensouce but there are free lisnces12:57
stephenfinAt a minimum, CI coverage. Practically speaking though, there's mountains of tech debt in the VMWare driver that needs addressing if we're keeping it around long-term12:57
sean-k-mooneybasically i was considing if instead of third party ci we coudl do this first party somehow12:57
sean-k-mooneyright there is a lot of techdebt( lack of functional tests, old apis/deps) and large feature gaps12:58
jkuliksounds like it boils down to us (SAP) working more with upstream, cleaning the driver and by that recognizing CI going missing, if we want that driver to stay12:59
stephenfinjkulik: Really it would be a near-full-time job. The obvious answer here is that SAP put forward someone to maintain CI and work to get your downstream fork synced back upstream (or at least work to get equivalent functionality added upstream) but I don't know how SAP are for upstream development. It's definitely something that has to developed and nurtured be in the org/company culture. There's also a cost to it, though how 12:59
stephenfinthat cost compares to maintaining an out-of-tree driver would be interesting to study12:59
stephenfinjkulik: exactly12:59
jkulikwe're open to upstreaming, but given that we need the features downstream first, they often don't end up upstream due to time constraints and us not actively including ourselves in the upstream community. I feel that pain every time when we port our patches to a newer release. I'll bring that topic up internally with management.13:02
jkulikthank you both13:03
stephenfinjkulik: np. I'll also add that we're generally very open to adding additional cores, particularly when those cores are focused on a particular area and have competency there, but obviously that competency needs to be demonstrated for some time first13:05
jkulik:+1:13:06
sean-k-mooneywell keep in mind the job of a core is primarly to review code not write it13:06
sean-k-mooneyso while stephenfin  is correct we would be open to adding cores the copendncy we woudl use to make that determination is there review output not there code output13:07
jkulikreview output in Nova in general, I guess, not only the vmwareapi driver then? As I guess, that one doesn't see too many patches.13:08
sean-k-mooneysortof both. in general in the past we have expect a general knowladge of the entire codebase but most core have areas or specialisation where they spend most of there time13:09
stephenfinThat would be an interesting discussion to have as a community. As you say, that area doesn't see too many patches and there is very little expertise in that area among existing cores. Ensuring we had decent review throughput would be an interesting thing to figure out13:10
sean-k-mooneywell part of the expction would be that they would enforece the same feature design crita as we do for the rest of nova13:10
sean-k-mooneyin the past vmware have added funcitonlity in the out of tree drive and then tried to upstream it13:11
sean-k-mooneyor made changes that normaly woudl requrie spec or blueprints that got little or no discussion 13:11
sean-k-mooneyso if we were to add cores spcialised in vmware i woudl expoect the process to addign vmware features to be the saem as libvirt/ironic13:12
jkulikhm ... yeah, I'd expect upstream to work like upstream not like downstream :D13:12
sean-k-mooneythe problem we have had historically with driver that have out of tree vairiant13:12
sean-k-mooneyis they and just enouch funcitonality in the intree driver to enabel them to create the feature they want to have in teh out of tree one13:13
jkulikhm ... having a lot of patches downstream, I can see the point. the VMware driver manages a whole cluster, so things might work quite differently than in the reest of the drivers / Nova in general13:14
sean-k-mooneyi was plannign to bring this up next cycle but once the vmware/hyperv dirers were removed i was gong to propose removoing the ablit y to have out of tree dirvers entirly13:14
sean-k-mooneyjkulik: we dont offically support out of tree driers in nova but we dont explictly block it either13:15
jkulikI mean, we're having a fork basically, so our patches are not necessarily "out of tree" - they are on top of tree :D13:15
stephenfinyeah, that's how cloudbase were handling things with the hyper-v driver13:16
stephenfinthrough a combination of (I suspect) lack of upstream review throughput but also lack of investment in the time needed to drive consensus on larger design changes13:16
jkulikyeah, same here. whenever we try to upstream common code (as a basis or start to upstream more), those reviews hang around for a long time.13:17
stephenfinthe zVM driver takes a different tack, where the driver is essentially a shim over an external library where all the "real" functionality lives13:17
jkuliksince there's noone invovled with the vmware driver upstream, that's fully understandable, but doesn't help in reducing the technical debt either13:18
stephenfinjkulik: then that would be on us. If someone were to start maintaining a CI and the codebase it would be up to us to ensure things aren't hanging around for months13:18
jkulikpoint being no CI anymore for nearly 2 years. got it.13:20
sean-k-mooneyya so zvm is interesting because i also have not see it run as a third party ci13:20
sean-k-mooneybut we did nto mark it as experimental so we need to give them time to correct taht13:21
sean-k-mooneythe IBM PowerKVM CI does not test zvm13:21
sean-k-mooneyits thesting livbirt on power13:21
sean-k-mooneythat siad it may only run wehn you make chagnes to the zvm dirver13:22
sean-k-mooneyso we need to submit a test patch and determin if its still functional13:23
jkulikDoes it make sense to submit monthly patches just to check the different CIs still running?13:25
stephenfinCIs can be configured to run periodic jobs. That would probably be a better approach13:25
stephenfinand publish the URL where those reports can be found13:26
tkajinamo/ May I ask for reviews to merge https://review.opendev.org/c/openstack/nova/+/900538 to make sqlalchemy-2.x job voting again ?13:50
zigoAlso, would be cool to have this one merged: https://review.opendev.org/c/openstack/oslo.cache/+/900158 (ie: oslo.cache leaving sockets to memcached in CLOSE_WAIT state...)14:22
opendevreviewDanylo Vodopianov proposed openstack/nova master: Packed virtqueue support was added.  https://review.opendev.org/c/openstack/nova/+/87607514:26
sean-k-mooneyzigo: wrong channel14:32
zigook :)14:32
sean-k-mooneyim not sure about hte while true:14:32
zigoI am.14:32
sean-k-mooneythat might never exit14:32
zigoThere's a break ...14:32
sean-k-mooneythere is only if you get an indexError14:33
zigoWhich is going to happen when all connections have been closed.14:33
sean-k-mooneyno14:33
zigoqueue.pop() will return nothing...14:33
zigoI mean, will do an IndexError no?14:33
sean-k-mooneyit returns None by default no?14:33
sean-k-mooneywhich will cause an AtributeError when we defernce the connection14:34
zigotkajinam wrote it, he may tell.14:34
zigoAt first, I wrote:14:34
zigowhile ( conn = self.queue.pop().connection ) != None:14:34
sean-k-mooneyhttps://docs.python.org/3/library/collections.html#collections.deque.pop14:35
sean-k-mooneyok pop does raise indexerror if its empty14:35
sean-k-mooneyi was expeciting it to reutrn a default like get on a dict14:35
sean-k-mooneyzigo: i still dont partically like useign excptions as contolflow but i see why this is working14:37
*** ozzzo1 is now known as ozzzo14:37
zigoI was surprised at first too. Probably this only deserves some added code comments ...14:38
sean-k-mooneyzigo: i still feel like it might be better to have a timeout but i see why you want it to wait 14:38
zigoA timeout ?!?14:38
zigoYou mean wait until the CLOSE_WAIT times out ?14:38
zigoThat is a crazy idea. One of our server doesn't even have source port available because of this.14:38
zigoAlso, we reach the memcached max open sockets very fast.14:39
sean-k-mooneyno14:39
sean-k-mooneyi mean perhasp we shoudl not try to clsoe the connections indefintly14:39
zigoIf we don't, then there's some remaining sockets in CLOSE_WAIT state.14:40
tkajinamthe logic loops over all connections in the queue but it does not retry for a single one14:40
sean-k-mooneyoh your right14:40
tkajinamin case a connection closure fails then it leaves that one and process the next14:40
sean-k-mooneyif we cant close it its already poped14:40
tkajinamyeah14:40
sean-k-mooneyso we will at most try each connection once14:40
sean-k-mooneyok yep14:40
sean-k-mooneyso ya i agree the patch is correct then14:40
* sean-k-mooney should read it more closly14:40
zigotkajinam: Are you ok if I add some more comments in your patch?14:41
tkajinamyou can though we have number of similar implementations in the other places in that single file14:41
tkajinamanyway this is not the right channel and we can continue it in #openstack-oslo14:42
sean-k-mooneyi +1'd it for what its worth14:42
sean-k-mooneyand i +2'd the voting job changes14:43
tkajinamsean-k-mooney, thanks !14:43
* zigo added some comments in that code, as it's the 3rd time someone is surprised by it.14:47
sean-k-mooneywhile true: and excptions as contol flow are redflags for me14:48
sean-k-mooneyzigo: so if i see either i try to take a closer look and ensure the will exit in all cases14:48
sean-k-mooneyunless there i s a comment explinaing ny this shoudl run forever or simialr14:49
zigoAgreed, it's not very natural.14:49
zigoThough there's a lot of valid cases in this world.14:50
zigoExample: the wait for event loop in every single X-Window app ... :)14:50
sean-k-mooneyi alwasy implemnt those in terms of a var that is set to true but can be set to false by the app14:54
sean-k-mooneybut yes14:54
tkajinamzigo, as I said earlier the same snippet is used multiple times in the same file. If you add a comment then it should be added to the all places. If you fix it then you should fix all.15:06
dvo-plvhello, hello15:06
dvo-plvI wouldl ike to suggest this spec for review https://review.opendev.org/c/openstack/nova-specs/+/89592415:07
dvo-plvIf you will have a time15:07
tkajinamzigo, hmm. ignore it. it might be the only place where that hack is used for loop15:07
* tkajinam is moving to oslo ,really now. sorry for the noise !15:07
opendevreviewStephen Finucane proposed openstack/nova master: docs: Further tweaks to the CPU models document  https://review.opendev.org/c/openstack/nova/+/78406617:15
opendevreviewRobert Franzke proposed openstack/nova stable/zed: fix: server rebuild not working during zed upgrade  https://review.opendev.org/c/openstack/nova/+/90081117:22
opendevreviewElod Illes proposed openstack/nova stable/2023.2: Fix rebuild compute RPC API exception for rolling-upgrades  https://review.opendev.org/c/openstack/nova/+/90033818:49
opendevreviewElod Illes proposed openstack/nova stable/2023.2: Adding server actions tests to grenade-multinode  https://review.opendev.org/c/openstack/nova/+/90033918:49
opendevreviewmelanie witt proposed openstack/nova stable/yoga: Decorate only Flavor.get_* methods that execute queries  https://review.opendev.org/c/openstack/nova/+/90082119:23
opendevreviewmelanie witt proposed openstack/nova stable/xena: Decorate only Flavor.get_* methods that execute queries  https://review.opendev.org/c/openstack/nova/+/90082219:25
opendevreviewmelanie witt proposed openstack/nova stable/yoga: Decorate only Flavor.get_* methods that execute queries  https://review.opendev.org/c/openstack/nova/+/90082119:30
opendevreviewmelanie witt proposed openstack/nova stable/xena: Decorate only Flavor.get_* methods that execute queries  https://review.opendev.org/c/openstack/nova/+/90082223:05
opendevreviewJay Faulkner proposed openstack/nova master: WIP: [ironic] Partition & use cache for list_instance*  https://review.opendev.org/c/openstack/nova/+/90083123:25
opendevreviewmelanie witt proposed openstack/nova stable/wallaby: Decorate only Flavor.get_* methods that execute queries  https://review.opendev.org/c/openstack/nova/+/90083223:26
opendevreviewJay Faulkner proposed openstack/nova master: WIP: [ironic] Partition & use cache for list_instance*  https://review.opendev.org/c/openstack/nova/+/90083123:34

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!