Monday, 2024-03-25

*** mklejn__ is now known as mklejn05:49
opendevreviewSylvain Bauza proposed openstack/nova master: Update contributor guide for 2024.2 Dalmatian  https://review.opendev.org/c/openstack/nova/+/91348108:17
opendevreviewSahid Orentino Ferdjaoui proposed openstack/nova master: scheduler: AggregateMultitenancyIsolation to support unlimited tenant  https://review.opendev.org/c/openstack/nova/+/89651208:30
sahid bauzas: o/ any chance to make progress on that one?08:30
bauzassahid: good point, thanks for your resilience :)08:31
bauzassahid: I'm adding the blueprint to tomorrow's nova meeting for asking to be a specless one08:32
bauzas(added now https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting )08:34
sahidbauzas: thanks a lot :-) yes waiting for a while now, we have backported the patch anywy in our production but it may be useful for other users as it address a legit use-case09:28
sean-k-mooney[m]bauzas:  while i orginally was not in favor of sahid scheduler enhancement it became apparent that we would need to enchance palcement to do it any other way and that would be non trivial so im ok with proceeding with it as a specless blueprint again11:42
opendevreviewStephen Finucane proposed openstack/nova master: api: Add missing Controller inheritance  https://review.opendev.org/c/openstack/nova/+/91408311:51
opendevreviewStephen Finucane proposed openstack/nova master: api: Remove FlavorManageController  https://review.opendev.org/c/openstack/nova/+/91408411:51
sean-k-mooneybauzas: fyi this was my fun friday rabbit hole https://bugs.launchpad.net/nova/+bug/205892812:53
sean-k-mooneythe title says it all but we shoudl proably fix that12:53
bauzassean-k-mooney: looking13:30
artomsean-k-mooney, btw, I couldn't reproduce that vswitch bug... our internal BZ isn't great on steps to reproduce, but do you recall doing anything specific?13:31
bauzassean-k-mooney: so we're missing instance actions, right?13:31
sean-k-mooneybauzas: yep we never recored any in the compute manage to say if the attach/detach finished with or without an error13:31
sean-k-mooneybauzas: so we get an instance action record with no events13:31
sean-k-mooneyi tried to make tempest use teh compeltion of the instance ation by polling and it timed out because we never actully create the finish event13:32
sean-k-mooneywe do emit the notificaitons we just dont save the event compeltion/failure in the db13:32
sean-k-mooneywe already have the remotable methods to do that including a handy class modthod on the instance_action object13:33
sean-k-mooneywe just dont call it 13:33
sean-k-mooneyartom: i create a two node devstack and manually repoduced it13:34
bauzasokay, so I guess you created a bug report for making sure we could backport it then ?13:34
bauzasbut I'm afraid of the API so13:34
bauzascould be a interop issue13:34
sean-k-mooneyi created the bug to not loose track of this13:34
sean-k-mooneyi dont think it would be an interop thing but i wanted to discsss this at the ptg13:35
bauzasokay, I'm asking that because I don't know whether we could rather a blueprint instead of a bug13:35
sean-k-mooneyor get gmann's input13:35
sean-k-mooneybauzas: its not a blueprint13:35
bauzassean-k-mooney: I created a new etherpad 13:35
sean-k-mooneybauzas: we dont actully need to change the api13:35
bauzassean-k-mooney: could you add it there ? https://etherpad.opendev.org/p/nova-dalmatian-ptg13:35
bauzassean-k-mooney: no, I'm not saying it would change the API13:36
sean-k-mooneythere is a filed called events in teh the action record13:36
sean-k-mooneyits currently empty for at least some of the instance actions13:36
sean-k-mooneybut apprenly stop populates it13:36
sean-k-mooneyso we would jsut be populating that with event for the rest13:36
bauzasbut it would return different actions between different API versions13:36
sean-k-mooneythat said we may or may not want a microversion to signal that13:36
sean-k-mooneyno13:36
sean-k-mooneythe actions woudl not change13:36
bauzas(ie. Caracal wouldn't return some action while Dalmatian would)13:36
sean-k-mooneythe number of actions woudl be identiacal13:37
sean-k-mooneythe payload of the action woudl change13:37
sean-k-mooneyone sec ill get the example13:37
artomsean-k-mooney, I believe you - I guess I'm asking if it was 100% reproducible, and did you need to do anything else besides the hw:numa_nodes=1 flavor extra spec to trigger a NUMA topology13:37
sean-k-mooneybauzas: https://docs.openstack.org/api-ref/compute/#show-server-action-details13:37
bauzasI mean the action would have a finish event https://docs.openstack.org/api-ref/compute/#show-server-action-details13:38
bauzasso I don't know if this is an interop issue13:38
sean-k-mooneyyes within the events list in the action13:38
bauzasbut yeah we can discuss this with gmann sure13:38
bauzassean-k-mooney: can you add this bug to the PTG etherpad ?13:39
sean-k-mooneyits purly additive, populating an existing filed13:39
sean-k-mooneybauzas: sure13:39
bauzasthanks13:39
sean-k-mooneyartom: ya pretty much. not 100% but enough thaat it happend on the first try and several in a row13:39
bauzassean-k-mooney: yeah, my concern is that users would know which release is used for an API just by calling instance actions list :)13:39
bauzasif you can find a finish event, then it's a Dalmatian version13:40
sean-k-mooneyyes although that would only matter if we didnt have a microversion fro dalamtion13:40
sean-k-mooneyand since we hope to have at least 1 for the maniala shares13:40
sean-k-mooneyim not sure this is a concern13:40
bauzaswell, this is not a concern from me, rather an open question13:41
artomsean-k-mooney, so, NUMA placement is deterministic, right? (We don't promise it, but de facto it is)... So if this is not happening 100% of the time, it means there's a race or something similarly *non-deterministic*13:41
sean-k-mooneypersonally i think this is somethign that shoudl be backported once fixed but again we can caht about that13:41
bauzasI wonder #1 whether we need a microversion #2 whether we can backport it13:41
artomWhen you say "reproduced", you mean the logs for the fitting of network metadata not appearing?13:41
bauzaswhat I'd prefer is that #1 no and #yes :)13:42
sean-k-mooneyartom: deteministic in the sense that if you had only one host and you started with it empty and booted vms in the same sequence you would expect the same result yes13:42
sean-k-mooneyartom: obviouly in reality you have mutiple concurrent requests13:42
bauzassorry, meant : MHO : #1 no and #2 yes (no microversion, and yeah we can backport)13:42
sean-k-mooneyso you cant really prediect it 100%13:42
bauzasbut let's ask gmann and other folks13:42
sean-k-mooneyartom: but for functioanl tests it shoudl be prediable13:42
artomsean-k-mooney, but in the context of a functional test with one host and one (or two) instances, it's fully predictable13:43
sean-k-mooneyartom: it should be yes13:43
sean-k-mooneygive me a minute to finish up a few more things and if you want we can try pair programming a repoducer in a bit13:43
artomSo if I'm not reproducing it (barring the possibility of me doing something wrong with my test), it means the cause is something non-deterministic...13:44
sean-k-mooneyartom: no it could mean our test fixtures are maskign it13:44
sean-k-mooneyon firday i tought i knew how to repoduce it but i need to recall what that was.13:45
dansmithmnaser: do you have any sort of preprod environment you could test the eventlet hub switch-to-asyncio for us?15:06
dansmithI'm worried about performance and stability issues by just blindly making that change, and I don't think a small-node-count devstack tempest run is good enough to say it's reasonable15:06
mnaserdansmith: lol i was literally reading yesterday about the switch and searching the number of refs of eventlet in codesearch.opendev.org yesterday lol15:07
dansmithand/or some willing tenant that could take the hit in the name of science?15:07
dansmithah, excellent15:07
mnaseri assume this hits several services15:07
mnaserso not really a "run a few computes with the new code" for example15:07
dansmithno, we'd need coverage for really any service running eventlet in any form, which to me, means even nova-api which doesn't run eventlet proper, but which still depends on some of the plumbing15:08
dansmitha couple compute nodes would be a start, and could be enough to signal "yeah, no, not going to work out of the box"15:08
dansmithnecessary but not sufficient15:09
mnaserhow intrusive are the changes in the code right now (i guess i can go try and find the review)15:09
mnaseror since its the step 1 in the transition docs its just a one liner switching to the asyncio hub15:09
dansmithyeah, the hub one-liner is what I'm talking about here15:10
dansmithmaybe check the latest discussion from this morning on the patch for context15:10
dansmithhttps://review.opendev.org/c/openstack/governance/+/90258515:11
bauzasfor me, I still need to bounce on this governance patch 15:13
dansmithI don't know what that means15:13
mnaserdansmith: i wouldn't mind putting this in somewhere and testing but i think for me if there isn't anyone driving the work after the exercise ends up for no reason ;p15:14
bauzasI meant to say "I would want to review it" :)15:14
dansmithmnaser: okay, I think the point is that if it "just works" we'd start making that our default15:14
dansmithmnaser: this is probably going to be everyone's problem pretty soon here so I'm not sure I'm too worried about it not getting attention15:15
dansmithit already exploded and got some immediate attention a few months ago, which is why this spec exists15:15
mnaserdansmith: in that case, if it "just works" in devstack, i'll be happy to flip the switch over in some envs here and report on what we're seeing15:16
mnaseri'd have to double check what is the earliest version of eventlet that supports this change though15:17
mnaseri.e. how recent of an openstack cloud you'd need to test this out15:17
dansmithyeah, might need the recent stuff from eventlet I think15:17
dansmithI think we're expecting it will "just work" from the devstack perspective, but I'm worried that under load we'll see performance and/o deadlock type issues15:17
dansmithbut yeah, we can get a sniff test first for sure15:18
mnaserit seems we'll need to bump eventlet version15:19
mnaserhttps://github.com/eventlet/eventlet/commits/e1afe7b387a2452c47a98b596c9451f9d95f7bdc/eventlet/hubs/asyncio.py15:19
* dansmith nods15:19
mnaserseems like it was added since 0.35.2 and thats not even in bobcat so it might require a new env15:20
mnaserrally seems like a good idea, i'd be happy to help stand up and env to run rally against since one of the concerns i'd have is well -- if you need to run caracal for this i dont know if i can find something that's running that with enough of a workload :)15:21
dansmithokay15:21
mnaseri assume the challenge is not going straight to asyncio is the fact there are a lot of eventlet based dependencies such as oslo messaging etc?15:22
dansmithparse error15:22
JayFThat's the idea, we want to have a migration where openstack can run eventlet/asyncio side by side so we can migrate off eventlet (and onto asyncio, depending on the outcome of testing+the goal)15:22
dansmiththe challenge *with* going straight to asyncio/15:23
dansmithmnaser: asyncio is a vastly different programming model from eventlet15:23
clarkbasyncio is cooperative and eventlet pretends to not be15:23
clarkbya that15:23
dansmithmnaser: it's not a small change, it's a rewrite everything sort of deal15:23
mnaserah, so ctrl+f and looking at the number of 'eventlet' references is not an indicator of how challenging it might end up being15:24
dansmithwhat we're discussing is a hack to let eventlet use asyncio as its sort of thread-switching engine, but it's just one step15:24
dansmithmnaser: not even remotely close to that no :)15:24
sean-k-mooneymnaser: correct its not15:24
dansmithmnaser: ctrl-f and search for "def.*" would be more accurate :)15:24
dansmithswitching to real threads would be more of what you're describing, since the model is the same15:25
mnaserfair enough, so if this doesn't work out we'd be in a bit of trouble15:25
dansmithto extend what clarkb said, eventlet is the programming model of real threads but with the cooperative sort of behavior of asyncio15:25
mnaseras in, it'd have to be an all-or-nothing and that is not what anyoen wants15:26
dansmithtbh, I think we're in a pickle (maybe not trouble) regardless15:26
dansmithif this hub hack doesn't just magically work, then I think my recommendation for the path forward will likely change (and be different per project/service)15:26
dansmithif it does, all we really gain is kinda getting a chunk of the bottom end of eventlet out of the way, but it doesn't change the fact that we still need all the monkeypatching and top end of eventlet going forward15:27
JayFYeah, we'll still have a lot of eventlet-implicit-depedencies to fix -- e.g. like using asyncio entrypoints for sqlalchemy15:33
JayFI'm extremely concerned about how things will operate -- and if performance will be harmed -- in the middle of that part of migrating15:34
JayFWhen we have some portion of the stack on asyncio and some using implicit greening via eventlet15:34
dansmiththe actual mixing isn't so much the problem as it is the invisibility of the non-async call stacks above the asyncio hub I think15:35
dansmithhowever,15:35
dansmiththere's really no slow migration path for existing code.. you can't really inject asyncio stuff at the lower layer, you kinda have to start at the top15:35
dansmiththere are some hacks, like making a sync method start an asyncio event loop just to make one async call, etc, but it's pretty heavy and gross, IME15:36
sean-k-mooneysorry was on a call15:37
sean-k-mooneyso the hub just means wo can mix asyncio native code and eventlet15:38
sean-k-mooneybut without monkeypatching everything it buys us nothing15:39
dansmithyup15:39
sean-k-mooneyand to transtation nova to async io while supprotign both hubs we would need to wrap all code that is monkeypatch today behind facde apis15:39
sean-k-mooneyand dispatch to the relevent implemention 15:39
sean-k-mooneyuntil we could fully drop supprot for the old model15:40
sean-k-mooneywhich is a long and non trivial thing15:40
dansmithright, which I think I covered above in the "15:40
dansmithrewrite everything" description :)15:40
sean-k-mooneyyep more or less15:40
dansmithwhile everything won't be rewritten, knowing where those seams should go will require inspection on the order of "rewrite everything" IMHO.. we'll have to effectively consider everything needing a rewrite and exclude things we can avoid async'ing15:41
sean-k-mooneyyep and it will require use to also pessimies some code paths by adding in the check for "which hub am i using" 15:42
dansmithanyway, I have to run to an appointment soon, and I'm sure we'll get more PTG discussion on this..I think we need to sniff-test the switch with devstack in order to earn a report from mnaser with a more realistic set of nodes and load15:42
sean-k-mooneyplus the indrection 15:42
sean-k-mooneyi dont currently have a devstack to playwith but i can proably just hack in an export of the hub env var15:43
sean-k-mooneyjust to see if we can even deploy any of the existing services with it15:43
JayFI suspect upstream eventlet (which is mostly Herve and friends these days) would consider it a bug if not15:43
sean-k-mooneydoing the proper enablement will take more time15:43
sean-k-mooneyJayF: Sure but bug are definetly a thing so you know we need to try it an see15:44
dansmithright the point is, mnaser would like to know if it even works for a small deployment before spinning up a test cluster and rally run15:45
sean-k-mooneyuc currently clamps to eventlet===0.35.1 so its in 0.35.2 based on the docs ill need to check if eventlet===0.35.1 has it or ill need to bump uc in the patch15:46
sean-k-mooneyok it is in 0.35.115:46
sean-k-mooneyi have 13 mins before my next meeting give me a sec an ill see if i can hack a dnm patch15:47
dansmithI think you need the latest to get the dns fix IIRC15:48
sean-k-mooneyhttps://review.opendev.org/c/openstack/devstack/+/91410815:52
sean-k-mooneyso ^ is the minimal change i belive15:52
sean-k-mooneyit wont enabel it for cli tools like nova-mange15:52
sean-k-mooneybut they also dont need eventlet today even if they are patched15:53
sean-k-mooneyso we can see if that cannary explode or not and then do somethign better15:53
sean-k-mooneydansmith: so https://review.opendev.org/c/openstack/devstack/+/914108 exploded in keystone manage which is not even ment o use eventlet18:12
sean-k-mooneyso thats fun.18:12
sean-k-mooney| File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/green/thread.py", line 36, in get_ident18:12
sean-k-mooney2024-03-25 16:04:04.700 | AttributeError: 'NoneType' object has no attribute 'getcurrent'18:12
dansmithwell, this is why I think merging a resolution to do a thing before it is even proven to *run* is a bad idea18:14
sean-k-mooneyim pretty sure this si form oslo log or oslo context 18:16
sean-k-mooneyjust tryign to grab the compute id18:16
sean-k-mooney* thread id18:17
sean-k-mooneynot compute id18:17
sean-k-mooneybut apparnetly just defining the env_var has a sidefect fo doing some kind of patching18:17
sean-k-mooneywhich i woudl not expect18:17
dansmithif it's not even able to do this without crashing, I suspect we're a long ways from being a drop-in replacement with no stability or performance implications18:20
JayFThe first step on the journey tho, no matter how long, is to try it, find the issue, fix it, and get that cycle going. I know hberaud and others are engaged and invested on making this work; it'll be nice to give them fresh data to use towards that goal.18:21
dansmithsure, but whether or not the asyncio hub is a good idea depends on some tests, IMHO18:28
gmannbauzas: sean-k-mooney: read logs for instance action event.  I do not think it is API change and should not require microversion bump. there is no change in response fields and It is populating the response with data we have in DB.18:50
sean-k-mooneygmann: ya that was my sense too.18:50
sean-k-mooneywe can see if anyone else disagree tomorrow or the ptg 18:51
gmann From interop perspective, I think that is what this API is. "there are always possibilities of empty action and events".  so it is very easily discoverable items in list18:51
gmannsean-k-mooney: yeah, I will add my opinion in etherpad too and we can discuss in PTG18:51
bauzasgmann: thanks for the short answer, and I'm fine with it, I was opiniating on the same point19:29
opendevreviewMerged openstack/nova master: Update contributor guide for 2024.2 Dalmatian  https://review.opendev.org/c/openstack/nova/+/91348119:36
*** dasm is now known as Guest391520:22
*** Guest3915 is now known as dasm20:30

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!