Thursday, 2023-11-02

rpittaugood morning ironic! o/08:08
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Generic API for attaching/detaching virtual media  https://review.opendev.org/c/openstack/ironic/+/89491808:38
opendevreviewRiccardo Pittau proposed openstack/ironic master: Generic API for attaching/detaching virtual media  https://review.opendev.org/c/openstack/ironic/+/89491808:38
*** eandersson0 is now known as eandersson11:50
TheJuliastevebaker[m]: a few more examples, including one that says CDROM https://storyboard.openstack.org/#!/story/200876313:59
dtantsurTheJulia: Should I copy my downstream epic into a new upstream RFE?14:12
TheJuliaI'd likely just make a new RFE for out of band record removal14:36
TheJuliaTwo related, but a little different problems. One helps the other along as well, at least in some cases, but not all cases.14:37
dtantsurTheJulia: https://bugs.launchpad.net/ironic/+bug/204257015:00
dtantsurJayF: https://bugs.launchpad.net/ironic/+bug/204257515:22
JayFtyvm, if you're feeling extra spicy go link it in the workstreams, even just editing it into the right place woudl be helpful15:28
JayFbut otherwise i'll find and get it 15:28
dtantsurif you have the workstreams link handy15:29
* JayF just going to do it now, I have a minute16:05
JayFI'm a little double busy, I recruited a downstream GR-OSS python dev to help out with python 3.12 support in eventlet.16:05
JayFand we're walking through it together. The results are ... concerning to say the least?16:06
dtantsurNot a fun activity, I assume16:06
JayFhttps://etherpad.opendev.org/p/nov-2023-eventlet-gross is our very limited notes so far, we're digging in more in a bit16:06
* dtantsur loves the title and fully agrees with it16:06
JayFdtantsur: I am like, 50% convinced the ironic migrations unit tests issues last cycle may be root-caused with some of the stuff we're finding16:07
JayFdtantsur: it's a double-whammy: I work at GR-OSS (G-Research Open Source Software)16:07
dtantsurle sigh16:07
dtantsur:D16:07
dtantsurI still pondering every now and then how hard would it be to migrate off eventlet16:08
dtantsurThe biggest problem is monkey patching which makes it all-or-nothing16:08
JayFThat's basically the conversation I'm having with Itamar, is that over time we've basically dug this hole deeper because getting out of it is so intensely complex.16:08
JayFEven the simplest possible example: IPA. Extract eventlet monkey patching from it. Good luck!16:08
dtantsur"RLock reentrancy in the face of signals and (much more common) garbage collection callbacks" sounds like nightmare material to me16:09
JayFhttps://gist.github.com/itamarst/6883251d4d4e5196f42b90608ab49c3716:09
JayFwanna reproduce it?16:09
dtantsurJayF: Well, Actually (tm)16:09
JayFdtantsur: first of all, new wsgi server. oslo.service + ssl is going to croak16:09
dtantsurI think IPA could be an easy win. It does not do much and has a very low concurrency16:09
dtantsuryeah16:09
JayFyou are forgetting oslo libraries16:09
JayFI'm saying even for IPA, it'd be a lot of risk16:09
JayFand a lot of moved code16:09
dtantsurI'm not really, i'm just mentally prepared to burn them with fire16:09
JayFextrapolate that to Ironic. to Nova. to Neutron.16:10
dtantsurThese are problematic, yes16:10
JayFa direct migration path off eventlet is seemingly-impossible16:10
JayFbut there's a reason more people than me exist, there are smarter people and I have one of them who will rejoin me after eating lunch :D 16:10
dtantsurWhat I'm afraid of is getting cornered one day16:10
JayFdtantsur: I think we're in the corner16:11
dtantsurNot entirely, but on the way into it16:11
dtantsurIf we cannot make 4.12 work....16:11
dtantsur3.12, sorry16:11
JayFdtantsur: https://opendev.org/openstack/cyborg/src/branch/master/cyborg/common/utils.py#L23716:11
dtantsur(using OCP versions lol)16:11
JayFdtantsur: see that?16:11
TheJuliawheeeeeeee16:11
JayFdtantsur: https://opendev.org/openstack/freezer/src/branch/master/freezer/engine/engine.py#L25216:11
JayFsee that?16:11
JayFguess what. the freezer one is going to trigger bugs because it's monkey_patched eventlet, and using SimpleQueue, if run on py>3.816:11
dtantsurthe freezer gonna freeze \o/16:12
JayFand nobody knows, because that doesn't run integration tests16:12
JayFand we trust our libraries to not just... not have working CI and be generally broken16:12
JayFand eventlet doesn't have working CI and has bugs which I'd categorize as being "generally broken" (e.g. simplequeue and rlock race conditions as described in the etherpad)16:12
dtantsurStep one could be reaching out to oslo folks and asking for their opinion16:13
JayFwell, step 0 is where I'm at right now16:13
JayFwhich is make sure I know the full picture of what's going on16:13
dtantsurMaybe if someone can dedicate time to make oslo.service eventlet-agnostic, they just accept that16:13
dtantsurthen we're in a much better place16:13
rpittauJayF: hey I guess we'll have to repeat this https://review.opendev.org/c/openstack/ironic/+/899765 for all the other projects?16:13
JayFbut the more of the picture that gets painted, the less I see of "dog and coffee cup" and the more I see "room on fire"16:13
* dtantsur puts on his "This is fine" t-short16:13
JayFrpittau: yes-ish, but I wanna make sure it looks like what nova lands, just so we aren't different for different sake16:14
rpittauany plan to just switch entirely to pyproject.toml ?16:14
TheJuliadtantsur: did you bring enough t-shirts for everyone? :)16:14
JayFrpittau: I pushed that one up to get early CI, I just W-1'd it so we can wait for nova one to finalize, that one is failing CI so I think it might be working out some bugs16:14
JayFrpittau: none proactive afaict16:15
rpittauok, thanks!16:15
dtantsurTheJulia: I have two of them, I think my size should fit you :)16:15
dtantsurJayF: it's also possible that a lot of oslo stuff is actually implemented elsewhere, and we're wasting our time with oslo.service.16:17
* dtantsur needs to leave early today, happy to continue the discussion tomorrow16:18
TheJuliadtantsur: dunno, I need to go bra shopping again :\16:20
TheJuliaPerhaps a goal for the TC this cycle might be to move the needle16:21
TheJuliaw/r/t eventlet16:21
TheJuliaand dtantsur raises a really good point16:21
dtantsuryeah. but the tough question is: how to do it one step at a time?16:21
TheJuliaWith the monkey patching as-is... eek16:22
dtantsurexactly16:22
TheJuliawe almost just need to treat it as a band aid or a piece of tape16:22
* JayF raises percentage likelihood eventlet threading.RLock problems caused ironic migration unit test issues to 90%16:49
JayFwe're writing a reproducer right now.16:49
JayFbut we've found a potential bug that could cause the mutex around connection cleanup in sqlalchemy to be brokenish in eventlet-patched situations16:49
JayFsound familiar?16:49
TheJuliavery much so16:55
JayFnothing is proven until it is, though :)16:55
clarkbJayF: re IPA I'm completely naive to its design btu seems like it doesn't realy need multiple threads of execution? Just process requests from control one at a time? Its not like you'll clean the disk and update firmware at the same time on hardware right? That seems like a recipe for interesting behaviors :)17:11
JayFclarkb: disk image streaming17:12
JayFclarkb: while also heartbeating progress to ironic api17:12
JayFand disks can take days to shred17:12
clarkbok so you need two threads :)17:12
clarkba larger update then delete eventlet and remove monkey patching would be to just run in a single thread but maybe not terrible17:12
rpittaugood night! o/17:13
JayFthat is likely the route we'd go, but IPA was just an example for OpenStack*17:21
JayFbecause of the terrible we're discovering around eventlet17:21
JayFItamar is going to try to fix the RLock bug that has been reproducing in eventlet's CI for ages, but it's a tough one17:21
JayFgood news; if it's this bug; it only impacted unit tests17:31
opendevreviewJulia Kreger proposed openstack/sushy-tools master: Add Support testing for HttpBootUri  https://review.opendev.org/c/openstack/sushy-tools/+/89696317:42
JayFwe're going to dig on this rlock bug, I'm less convinced it caused our migrations issues just because the case it'd have to hit for this to be the root cause would be if sqlalchemy was imported, in a unit test, before eventlet was monkey patched (as far as we can tell)17:42
JayFthat's at least one of them17:43
opendevreviewJay Faulkner proposed openstack/ironic master: eventlet monkey patch in unit tests earlier  https://review.opendev.org/c/openstack/ironic/+/89997617:49
JayFTheJulia: dtantsur ^ I believe that is the workaround-fix for the migrations test root cause17:53
JayFTheJulia: dtantsur: monkey_patching after importing something that imports sqlalchemy means sqlalchemy created an ungreened RLock, we have proof that can be broken in eventlet17:53
TheJuliajoy17:54
JayFfor non-unit-test codepaths, we were already monkey patching earlier, so we shouldn't have seen it IRL17:54
JayFItamar is going to improve the upgrade code, including adding angry messages if an RLock gets missed during upgrades17:54
JayFwhich would allow us to roll that change back, hook unit tests back up, and prove that was the root cause17:54
JayFthis is the most satisfying find in years17:55
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/89800617:57
TheJuliaunfortunately it was also fairly hit and miss18:03
TheJuliabut it *would* prove to be a huge part of the unit test failure pain removed18:04
TheJuliaOkay, I've got a good patch for IPA to navigate the fun configuration drive bug18:19
TheJuliaAnd it nicely unpeels the confusion and asserts the right volume \o/18:20
TheJuliaand can be backported18:20
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/89800619:38
opendevreviewJulia Kreger proposed openstack/sushy-tools master: Add Support testing for HttpBootUri  https://review.opendev.org/c/openstack/sushy-tools/+/89696319:40
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/89800619:46
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/89800619:57
opendevreviewJulia Kreger proposed openstack/ironic master: WIP/DNM: Advanced vmedia deployment test ops  https://review.opendev.org/c/openstack/ironic/+/89801021:01
opendevreviewJulia Kreger proposed openstack/ironic-python-agent-builder master: Remove USE_PYTHON3 option  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/89824121:43
opendevreviewJulia Kreger proposed openstack/ironic-python-agent-builder master: Add Glean into TinyIPA image  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/89824221:45

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!