Tuesday, 2025-07-01

TheJuliaSweet, it appears to actually pass jobs00:05
TheJuliawell, at least some jobs00:06
rpittaugood morning ironic! o/06:18
queensly[m]Good morning06:53
mumesan[m]Good morning!06:59
opendevreviewYu Zou proposed openstack/ironic master: feat: add verify ca conf support for drivers  https://review.opendev.org/c/openstack/ironic/+/94754407:26
abongalegood morning ironic! o/08:20
opendevreviewcid proposed openstack/ironic master: WIP: Replace GreenThreadPoolExecutor in conductor  https://review.opendev.org/c/openstack/ironic/+/95293910:45
opendevreviewcid proposed openstack/ironic master: WIP: Replace GreenThreadPoolExecutor in conductor  https://review.opendev.org/c/openstack/ironic/+/95293910:50
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: Handle unresponsive BMC during Firmware Updates  https://review.opendev.org/c/openstack/ironic/+/93810811:45
rpittauFYI fedora is moving DC and that is impacting CS9/10 repos as well https://status.fedoraproject.org/12:57
rpittauthis ^ might impact our tools/CI12:57
opendevreviewHarald Jensås proposed openstack/ironic master: Move drain_shutdown_timeout to conductor cfg group  https://review.opendev.org/c/openstack/ironic/+/95387913:21
TheJuliadtantsur: I replied to some of your comments on https://review.opendev.org/c/openstack/ironic/+/953683. Your review might be a little early, but I think all of your questions should be answered. If you have any resulting questions, just let me know. Thanks!13:43
dtantsurTheJulia: so it's a different situation between "unused pool" and "empty pool", do I get it right?14:16
dtantsurI still don't think it should hang. I can take a careful look at futurist in a few14:16
TheJuliaYeah, I think so14:16
TheJuliaI think it's actually documented inside of the python threading docs; I just haven't gotten back to it yet since I was using gerrit as a staging location to pull a newer change down from my development machine14:17
TheJuliato my devstack machine14:17
dtantsurTheJulia: so, this is the part that is different, right? https://github.com/openstack/futurist/blob/master/futurist/_futures.py#L169-L17114:33
dtantsurI don't see an obvious reason why that would hang on an empty executor: the workers list should be empty14:36
dtantsurAt least the intention of the code seems to be for shutdown() to be no-op when the pool has never been used14:39
dtantsur(I'm also not sure why you're referring to Python's docs: shutdown() is not inherited)14:39
dtantsurHmm, did you test with normal threads already or still with green threads?14:40
TheJuliaYeah, that is where it seems to hang but I've not tried to figure out what is actually happening in futurist.14:43
TheJuliadtantsur: I mean regarding wait()14:43
TheJulianot shutdown()14:43
dtantsuryeah, but you're changing invocations to shutdown?14:44
TheJuliait always was shutdown, it's just whether we invoke the wait or not14:44
TheJuliadtantsur: Literal full threads, it was glorious.14:45
dtantsurTheJulia: I've just tried shutdown(wait=True) on a freshly created pool, and it does not hang for me14:45
TheJuliaAnyway, I only uploaded it to begin to also see if anything else in the mechanics seems to break beyond signaling14:45
dtantsurI also don't see any possibilities of it hanging judging by code alone.14:45
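
A minimal sketch of the behaviour dtantsur describes testing, assuming futurist is installed: shutdown(wait=True) on a pool that has never had work submitted has no worker threads to join and should return immediately.

```python
# Minimal sketch (assuming futurist is installed) of the case discussed
# above: shutdown(wait=True) on a pool that never ran any work has no
# worker threads to join and should return immediately.
import futurist

pool = futurist.ThreadPoolExecutor(max_workers=4)
pool.shutdown(wait=True)   # no-op on an unused pool; does not hang
print("shutdown returned")
```
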
TheJuliafun! okay, because every time I did try to figure out what was going on with the queue, it appeared to have no entries but then would hang14:45
TheJuliayou're doing what I was going to do this morning, except now I'm negotiating a product roadmap14:46
dtantsurTheJulia: does it happen all the time? Do you have an easy reproducer?14:46
TheJulialiterally every time in my testing locally14:46
dtantsurOn stopping the conductor with two WIP patches applied?14:47
TheJuliaI need to cleanup that latter wip14:47
TheJuliaI typoed some stuff from my devstack machine which I was developing on14:47
TheJuliadtantsur: my short term focus was to carry forward our SIGUSR1 and SIGUSR2 behavior, and when running SIGUSR2 I ran into all shutdowns hanging there14:50
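
A rough sketch of the kind of SIGUSR2 drain wiring being described; the class and attribute names are illustrative, not Ironic's actual handlers.

```python
# Rough sketch of SIGUSR2-style drain wiring; the names here are
# illustrative, not Ironic's actual handlers. The signal handler only
# flips a flag; the main thread performs the (potentially slow) drain.
import signal
import threading
from concurrent import futures


class Service:
    def __init__(self):
        self._drain = threading.Event()
        self._executor = futures.ThreadPoolExecutor(max_workers=8)
        signal.signal(signal.SIGUSR2, self._request_drain)

    def _request_drain(self, signum, frame):
        self._drain.set()

    def wait(self):
        self._drain.wait()
        # This is the call that hangs if any worker thread never exits.
        self._executor.shutdown(wait=True)


if __name__ == '__main__':
    Service().wait()   # blocks until SIGUSR2 arrives, then drains
```
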
* TheJulia switches gears to making a roadmap14:50
dtantsurRandom guess: is it possible that we're hitting the issue that normal threads cannot be cancelled14:50
TheJuliaThat is what I was sort of starting to suspect14:51
TheJuliaexcept, when measuring the contents, it was empty each time so I was like "huh?!?"14:51
dtantsurIf a thread is stuck (e.g. because we expect it to run all the time - keepalive?), the queue won't have it there14:52
TheJuliaI glanced at it and I didn't see the db keepalive heartbeat on that thread, but that was also where I was sort of starting to suspect14:53
TheJuliaInterestingly!14:53
dtantsurhmm, it does not look like the green shutdown() implementation relies on cancelling14:53
opendevreviewDmitry Tantsur proposed openstack/ironic master: WIP allow inspection to ignore certain interfaces  https://review.opendev.org/c/openstack/ironic/+/95388614:54
dtantsurgetting this ^^ out of my way temporarily14:55
TheJuliahttps://www.irccloud.com/pastebin/hRQnhNTz/14:55
TheJuliaI think I just realized it14:55
TheJuliaor, the issue per se14:55
TheJuliasee the paste and the inline note14:55
TheJuliathe warning towards the bottom, makes me think something else is going on that we're either picking up or holding on to14:56
dtantsurTheJulia: we call keepalive_halt() after del_host(), I wonder how it works even14:57
dtantsurit does not look like you're hitting this issue, but I still don't get it15:03
dtantsurOtherwise, I agree, it seems to be around the RPC server itself15:04
dtantsurHowever. If you set wait=False and it starts working, I wonder if something is actually running there still.15:08
TheJuliaBased upon the error and sitting down thinking about it now, I'm thinking somehow we're hung on a socket in select15:11
TheJuliaand somehow we deadlock with that, but it might just be behavior outside of our direct view as well15:12
opendevreviewcid proposed openstack/ironic master: Log executed steps during cleaning/servicing/deploy  https://review.opendev.org/c/openstack/ironic/+/95263715:13
dtantsurTheJulia: attach gdb to the conductor and see where it is? :)15:26
* dtantsur is curious if we can strace a running process..15:26
TheJuliamaybe later today, negotiating items ;)15:27
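
As an alternative to attaching gdb, a sketch using the stdlib faulthandler module: registered against an otherwise-unused signal, it dumps every thread's traceback on demand without stopping the process (the choice of SIGABRT here is arbitrary).

```python
# Sketch: dump every thread's traceback on demand instead of attaching gdb.
# faulthandler is stdlib; picking SIGABRT is an arbitrary choice for this
# example since SIGUSR1/SIGUSR2 are already spoken for.
import faulthandler
import signal
import sys

# Register once at startup; later, `kill -ABRT <pid>` prints the current
# stack of every thread to stderr and the process keeps running.
faulthandler.register(signal.SIGABRT, file=sys.stderr, all_threads=True)
```
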
TheJuliaThe other challenge is another thread will eventually just kill the entire process. I guess it is good reason to finish wiring up the knobs ;)15:45
TheJuliaAny thought regarding doing like nova and sort of keeping some non-explicit compatibility around as an escape hatch for the likes of the Metal3 job right now :)15:46
TheJulia(I still need to double check requirements, but I guess since it is optional I likely only have oslo.service invoked15:47
TheJulia)15:47
dtantsurTheJulia: are the patches breaking the metal3 job?15:59
dtantsurFrom the migration standpoint, I'd rather migrate soon15:59
TheJuliaOh yeah, It is simple from what I saw15:59
TheJuliaI'm not saying don't migrate, I'm more thinking out loud about whether to keep an escape hatch around or not16:00
dtantsurI guess the only argument for staged migration is the logic around the thread pool size?16:00
TheJuliabecause with an auto-escape hatch, it would just have worked16:00
dtantsurYou'd need to conditionally monkey-patch then?16:00
TheJuliaIt sure looks that way, tbh16:01
TheJulianot even monkeypatch, just let the eventlet defaults for the area workers get used16:01
dtantsurSo, combine threaded API/RPC with green threads in the conductor?16:01
TheJuliayes, the existing handler should just work if we condition the startup properly16:01
dtantsurI'm not just that is more robust than just migrating everything, at least as far as Metal3 is concerned16:01
dtantsurs/just/sure/ (wut)16:02
TheJuliasame, okay16:02
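
A rough sketch of the escape-hatch idea being floated: select the executor implementation from a config knob so the legacy green-thread path stays available. The [conductor]use_eventlet option name is hypothetical, not an existing Ironic knob.

```python
# Rough sketch of the escape-hatch idea; the [conductor]use_eventlet option
# name is hypothetical, not an existing Ironic knob.
import futurist
from oslo_config import cfg

CONF = cfg.CONF
CONF.register_opts([
    cfg.BoolOpt('use_eventlet', default=False,
                help='Keep the legacy green-thread executor (hypothetical).'),
], group='conductor')


def build_executor(max_workers):
    if CONF.conductor.use_eventlet:
        # Legacy behaviour: green threads, relies on eventlet monkey-patching
        # having happened at process start.
        return futurist.GreenThreadPoolExecutor(max_workers=max_workers)
    # New behaviour: real OS threads.
    return futurist.ThreadPoolExecutor(max_workers=max_workers)
```
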
dtantsur(In the defence of my brain, it's +36 outside and +29 here)16:02
TheJulia(eeek!)16:02
TheJuliaIt will be 41+ where I'll be next week16:03
dtantsurugh16:04
dtantsurI hope your A/C is working16:04
TheJulia(Why am I having service work done when it is that hot outside?!?)16:04
dtantsur:(16:04
TheJuliaYeah, it will be, and should one fail they will have spares there already (hopefully)16:04
dtantsurThat's really nice16:04
* dtantsur only has a portable one16:05
TheJuliaugh, yeah, and they only do so well when it comes to heat as well since they aren't really built for super efficient heat movement16:05
TheJulia(Next up, we'll have videos of a outlet learning how to build an Air Conditioning system.16:06
TheJulia)16:06
TheJuliaeerr, owllet16:06
TheJulia(at some point, you are on that path once you buy a vacuum pump and copper tubing. ;))16:06
dtantsur:D16:07
TheJuliaokay, my meeting is now tomorrow and I've likely appropriately angered everyone... soooo back to eradicating eventlet!16:09
dtantsur\o/16:09
TheJulia.... 3.5 hours after I started my day :(16:10
opendevreviewHarald Jensås proposed openstack/ironic master: Move drain_shutdown_timeout to conductor cfg group  https://review.opendev.org/c/openstack/ironic/+/95387916:19
cardoeTheJulia: so I commented on https://bugs.launchpad.net/ironic/+bug/2115471 some more... but I'm thinking about this... Do we really want the "inventory" to ever be free-form?16:20
cardoeWe've got some keys for "extra_hardware" which are defined as more free-form but do we want top level keys being free-form?16:21
cardoeI don't see how that could be a win for Ironic to be able to run operations safely against the data16:21
plestangHello team, currently the GlanceImageService only handles swift images as remote images; is there any will or WIP to support images backed by s3?16:31
dtantsurcardoe: extra_hardware is not a part of the inventory, right?16:33
dtantsurplestang: I believe you can use any backend but you won't be able to use such features as swift temporary URLs, i.e. the images will be downloaded to the conductor first16:33
cardoedtantsur: it is... it's a top level key that's defined as being an object which the hardware manager controls16:34
dtantsurcardoe: well, it's not. what you describe is because the inventory/plugin_data split has only happened on the Ironic side so far16:35
dtantsur(again, my bad; we learn as we grow)16:35
dtantsurso, the IPA output is still a bit of a garbage pile :)16:35
dtantsurI hope, eventually, to get the inventory/plugin_data split there as well16:35
dtantsurEssentially, the logic is as follows: everything under the inventory key is the inventory, and it's not free-form16:36
dtantsureverything else is what becomes plugin_data in Ironic16:36
cardoeso IPA submits an empty plugin_data16:37
dtantsurit does not submit any16:38
dtantsurplugin_data is a relatively new concept, it appeared during the migration from ironic-inspector16:38
dtantsurto specifically start solving the problem you're now solving16:38
dtantsurthen Ironic does this: https://opendev.org/openstack/ironic/src/branch/master/ironic/api/controllers/v1/ramdisk.py#L39516:38
dtantsurthis line separates inventory from plugin_data16:39
dtantsurTL;DR top-level keys are not a part of the inventory, only the 'inventory' top-level key is16:40
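
A minimal illustration of the split described above, not the actual ramdisk.py code: the 'inventory' key is lifted out, and every other top-level key IPA sent becomes plugin_data.

```python
# Minimal illustration of the split described above (not the actual
# ramdisk.py code): 'inventory' is the structured part, every other
# top-level key (e.g. extra_hardware) becomes plugin_data.
def split_ramdisk_data(data):
    data = dict(data)                  # don't mutate the caller's dict
    inventory = data.pop('inventory')  # schema-checked inventory
    plugin_data = data                 # free-form leftovers
    return inventory, plugin_data


inventory, plugin_data = split_ramdisk_data({
    'inventory': {'interfaces': [], 'cpu': {}},
    'extra_hardware': {'firmware': {}},
})
```
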
* dtantsur has so many regrets as he writes this explanation :D16:40
TheJuliaheh16:55
TheJuliaregrets are normal16:55
TheJulia;)16:55
opendevreviewMerged openstack/ironic stable/2025.1: fix: handle unexpected 'loop' in actions field  https://review.opendev.org/c/openstack/ironic/+/95346516:57
dtantsurHas anyone faced this problem:16:58
dtantsur"RNDIS and CDC enabled host interfaces to access USB over the network shows almost as a regular interface in the CoreOS. This is true to all vendors. In Super Micro in particular this is highly problematic because the same MAC address is used across all servers if they are in the same BMC FW version. This is something that cannot be changed. "16:58
dtantsurSo, when you inspect such nodes, you end up with duplicate MACs :(16:58
dtantsurWhich is why I'm working on an option to exclude certain interfaces, btw16:58
dtantsurBut I wonder if anyone has already hit this problem.16:59
opendevreviewDmitry Tantsur proposed openstack/ironic master: Allow inspection to ignore certain interfaces  https://review.opendev.org/c/openstack/ironic/+/95388617:16
dtantsurI'll read any responses tomorrow, have a good night o/17:19
TheJuliadtantsur: no, the advice I've generally given folks is to ensure those are off, because if someone hacks a node and then successfully gets a shell in a BMC, your environment is now owned by the attacker completely17:43
opendevreviewChris Krelle proposed openstack/ironic master: update Jinja2 to address BDSA-2024-2383  https://review.opendev.org/c/openstack/ironic/+/95390217:50
NobodyCambeen a long while, and the patch I wanted to put up... but someone may get value from this one.17:51
NobodyCams/and\ the/and\ not\ the/17:52
NobodyCamhehhee17:52
TheJuliaso, str(exception.message) likely is not required since the formatting is %(param)s18:00
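
A quick illustration of the %(param)s point: %s-style interpolation already calls str() on its argument, so an explicit str() wrapper around the exception adds nothing.

```python
# %s-style interpolation already str()-ifies the value, so wrapping the
# exception (or its message) in str() first is redundant.
exc = ValueError("template failed")

explicit = "Rendering failed: %(err)s" % {'err': str(exc)}
implicit = "Rendering failed: %(err)s" % {'err': exc}
assert explicit == implicit
```
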
TheJuliaIs there a reference for an actual CVE, and I guess not someone's tool repository?18:00
NobodyCamlet me get the ref 18:05
NobodyCamTheJulia: https://nvd.nist.gov/vuln/detail/CVE-2024-3406418:09
NobodyCamLet me update18:09
opendevreviewChris Krelle proposed openstack/ironic master: update Jinja2 to address BDSA-2024-2383  https://review.opendev.org/c/openstack/ironic/+/95390218:13
TheJuliaso, just a minimum version bump then18:17
NobodyCam3.0.0 to 3.1.6... the exc.message was the only change I found that was needed18:19
NobodyCam3.1.6 is what is in the upper constraints 18:20
plestangdtantsur: the swift temp url is exactly the issue, but we could get temp-url-like behaviour with presigned URLs in s3?18:29
plestangthis is why I was thinking that we could implement a generic generate_temp_url method which is able to handle multiple backend (at least swift and s3)18:30
opendevreviewChris Krelle proposed openstack/ironic master: update Jinja2 to address BDSA-2024-2383  https://review.opendev.org/c/openstack/ironic/+/95390218:30
TheJuliaNobodyCam: Uhh, the commit message should reference the actual CVE for fix awareness18:48
TheJuliaI ask because if you search for BDSA-2024-2383, you get a page with two links18:48
TheJuliaplestang: seems reasonable, just put it on a class or condition the logic as such18:49
TheJulia(and I've heard even integrated folks who would love to use S3 directly; I guess their data gravity is so heavy there that they just pay the bills and want to try and avoid the intermediate action)18:49
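
A sketch of the S3 side of that idea, assuming boto3 and credentials are available; the bucket and key names are hypothetical. A presigned GET URL plays the same role as a swift temporary URL.

```python
# Sketch, assuming boto3 and credentials are available; the bucket and key
# names are hypothetical. A presigned GET URL gives the node time-limited
# direct access, much like a swift temporary URL.
import boto3

s3 = boto3.client('s3')


def generate_temp_url(bucket, key, expires=3600):
    # Signing happens locally; no request is made to S3 here.
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires,
    )


url = generate_temp_url('glance-images', 'deploy-image.qcow2')
```
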
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Replace GreenThreadPoolExecutor in conductor  https://review.opendev.org/c/openstack/ironic/+/95293918:51
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Set the backend to threading.  https://review.opendev.org/c/openstack/ironic/+/95368318:51
TheJuliacid: I rebased yours, fwiw.18:51
TheJuliaI removed eventlet from the test runner as well18:52
TheJuliaand... we may have more work ahead of us. I'll get a clearer picture to work from via CI18:53
NobodyCamahh one from 2022 and one from 2024..18:54
cidTheJulia, More work ahead, that's always !good news, ++ 😁18:54
TheJuliacid: it's most likely just some stuff that needs to get re-sorted18:55
TheJuliaNobodyCam: huh?18:55
TheJuliaand by stuff, cid, I mean test modeling with tasks, since we may have some tests which don't happily work across real threads.18:56
plestangTheJulia: sure!18:56
TheJuliaNobodyCam: if it wasn't obvious, we're in the process of ripping eventlet out18:56
cidack'd18:57
opendevreviewChris Krelle proposed openstack/ironic master: update Jinja2 to address BDSA-2024-2383  https://review.opendev.org/c/openstack/ironic/+/95390218:57
TheJulias/BDSA-2024-2383/CVE-2024-34064/18:59
NobodyCamack18:59
TheJuliafwiw, that just raises the minimum, and upper-constraints tops it out. Locally I already default to 3.1.6, but I can see the point of moving up the version18:59
opendevreviewChris Krelle proposed openstack/ironic master: update Jinja2 to address CVE-2024-2383  https://review.opendev.org/c/openstack/ironic/+/95390219:00
NobodyCamTheJulia: yes, NS thank you!19:01
NobodyCamNS = and19:01
TheJulianp!19:02
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Set the backend to threading.  https://review.opendev.org/c/openstack/ironic/+/95368319:11
TheJuliahttps://45bbcce67481c2c3c580-e564cc116dd6d65d7c4300b6ac54c802.ssl.cf2.rackcdn.com/openstack/ea600759f5d3405fabb11e1d353cf19e/testr_results.html <-- I am wondering if for now we just skip trying to threadize our tests, since the stack setup is all built on eventlet green threading19:13
TheJuliadtantsur: I think you were right about the keepalive thread, explicitly stopping it seemed to do the trick20:06
TheJuliaalso, adding self._shutdown since we tie the long-running loops to that and multiple threads can be trying to read it20:09
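
A sketch of the shape being described; the attribute and method names are illustrative, not Ironic's exact code. A threading.Event is safe to read from many threads, and the keepalive loop is stopped and joined explicitly before the executor is shut down.

```python
# Sketch of the pattern described above; names are illustrative, not
# Ironic's exact attributes. The Event is safe to read from many threads,
# and the keepalive thread is stopped explicitly before the pool.
import threading


class ConductorBase:
    def __init__(self):
        self._shutdown = threading.Event()     # long-running loops check this
        self._keepalive_evt = threading.Event()
        self._keepalive = threading.Thread(target=self._keepalive_loop,
                                           daemon=True)
        self._keepalive.start()

    def _keepalive_loop(self):
        # wait() doubles as the sleep between heartbeats and wakes
        # immediately when asked to stop.
        while not self._keepalive_evt.wait(timeout=10):
            pass  # heartbeat work (e.g. touching the DB record) goes here

    def del_host(self):
        self._shutdown.set()
        self._keepalive_evt.set()
        self._keepalive.join()   # otherwise executor shutdown can hang
```
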
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Set the backend to threading.  https://review.opendev.org/c/openstack/ironic/+/95368322:39
TheJuliacid: dtantsur: I think we're going to need to revisit the db layer, because ultimately we're using enginefacade and the real problem is that sessions are simply not thread-safe. SQLAlchemy has some nice docs on this topic, but I'm thinking the db stuff is going to get retooled22:41
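
A sketch of the direction implied here, assuming oslo.db's enginefacade has been configured with a connection: each unit of work gets its own short-lived session from the reader/writer context managers instead of a Session object being shared across threads.

```python
# Sketch, assuming oslo.db's enginefacade has been configured with a
# connection: each unit of work opens its own short-lived session instead
# of sharing one Session across threads (Session is not thread-safe).
import sqlalchemy as sa
from oslo_db.sqlalchemy import enginefacade


@enginefacade.transaction_context_provider
class Context:
    """Carries the per-task transaction/session state."""


def count_nodes(context):
    # The session exists only for this block and this thread.
    with enginefacade.reader.using(context) as session:
        return session.execute(sa.text("SELECT COUNT(*) FROM nodes")).scalar()
```
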
NobodyCamGood afternoon folks I see that my patch failed one test: ironic-tempest-bios-redfish-pxe with POST_FAILURE in 9m 35s.... Looks like an issue downloading ubuntu/dists/noble-updates/main/binary-amd64/Packages.gz ... is the process still to recheck?23:04
opendevreviewDoug Goldstein proposed openstack/ironic master: fix port vif attach with networks with dynamic segments  https://review.opendev.org/c/openstack/ironic/+/95216823:40
cardoeNobodyCam: yes23:41
