TheJulia | Sweet, it appears to actually pass jobs | 00:05 |
---|---|---|
TheJulia | well, at least some jobs | 00:06 |
rpittau | good morning ironic! o/ | 06:18 |
queensly[m] | Good morning | 06:53 |
mumesan[m] | Good morning! | 06:59 |
opendevreview | Yu Zou proposed openstack/ironic master: feat: add verify ca conf support for drivers https://review.opendev.org/c/openstack/ironic/+/947544 | 07:26 |
abongale | good morning ironic! o/ | 08:20 |
opendevreview | cid proposed openstack/ironic master: WIP: Replace GreenThreadPoolExecutor in conductor https://review.opendev.org/c/openstack/ironic/+/952939 | 10:45 |
opendevreview | cid proposed openstack/ironic master: WIP: Replace GreenThreadPoolExecutor in conductor https://review.opendev.org/c/openstack/ironic/+/952939 | 10:50 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic master: Handle unresponsive BMC during Firmware Updates https://review.opendev.org/c/openstack/ironic/+/938108 | 11:45 |
rpittau | FYI fedora is moving DC and that is impacting CS9/10 repos as well https://status.fedoraproject.org/ | 12:57 |
rpittau | this ^ might impact our tools/CI | 12:57 |
opendevreview | Harald Jensås proposed openstack/ironic master: Move drain_shutdown_timeout to conductor cfg group https://review.opendev.org/c/openstack/ironic/+/953879 | 13:21 |
TheJulia | dtantsur: I replied to some of your comments on https://review.opendev.org/c/openstack/ironic/+/953683. Your review might be a little early, but I think all of your questions should be answered. If you have any resulting questions, just let me know. Thanks! | 13:43 |
dtantsur | TheJulia: so it's a different situation between "unused pool" and "empty pool", do I get it right? | 14:16 |
dtantsur | I still don't think it should hang. I can take a careful look at futurist in a few | 14:16 |
TheJulia | Yeah, I think so | 14:16 |
TheJulia | I think its actually documented inside of the python threading docs, I just haven't gotten back to it yet since I was using gerrit as a staging location to pull a newer change down from my development machine | 14:17 |
TheJulia | to my devstack machine | 14:17 |
dtantsur | TheJulia: so, this is the part that is different, right? https://github.com/openstack/futurist/blob/master/futurist/_futures.py#L169-L171 | 14:33 |
dtantsur | I don't see an obvious reason why that would hang on an empty executor: the workers list should be empty | 14:36 |
dtantsur | At least the intention of the code seems to be for shutdown() to be no-op when the pool has never been used | 14:39 |
dtantsur | (I'm also not sure why you're referring to Python's docs: shutdown() is not inherited) | 14:39 |
dtantsur | Hmm, did you test with normal threads already or still with green threads? | 14:40 |
TheJulia | Yeah, that is where it seems to hang but I've not tried to figure out what is actually happening in futurist. | 14:43 |
TheJulia | dtantsur: I mean regarding wait() | 14:43 |
TheJulia | not shutdown() | 14:43 |
dtantsur | yeah, but you're changing invocations to shutdown? | 14:44 |
TheJulia | it always was shutdown, its just if we invoke the wait or not | 14:44 |
TheJulia | dtantsur: Literal full threads, it was glorious. | 14:45 |
dtantsur | TheJulia: I've just tried shutdown(wait=True) on a freshly created pool, and it does not hang for me | 14:45 |
TheJulia | Anyway, I only uploaded it to begin to also see if anything else in the mechanics seems to break beyond signaling | 14:45 |
dtantsur | I also don't see any possibilities of it hanging judging by code alone. | 14:45 |
TheJulia | fun! okay, because every time I did try to figure out what was going on with the queue, it appeared to have no entrys but then woul dhang | 14:45 |
TheJulia | your doing what I was going to do this morning except now I'm negotiating a product roadmap | 14:46 |
dtantsur | TheJulia: does it happen all the time? Do you have an easy reproducer? | 14:46 |
TheJulia | literally every time in my testing locally | 14:46 |
dtantsur | On stopping the conductor with two WIP patches applied? | 14:47 |
TheJulia | I need to cleanup that latter wip | 14:47 |
TheJulia | I typoed some stuff from my devstack machine which I was developing on | 14:47 |
TheJulia | dtantsur: my short term focus was carry forward our SIGUSR1 and SIGUSR2 behavior and ran into all shutdowns when running SIGUSR2 hanging there | 14:50 |
* TheJulia switches gears to making a roadmap | 14:50 | |
dtantsur | Random guess: is it possible that we're hitting the issue that normal threads cannot be cancelled | 14:50 |
TheJulia | That is what I was sort of starting to suspect | 14:51 |
TheJulia | except, when measuring the contents, it was empty each time so I was like "huh?!?" | 14:51 |
dtantsur | If a thread is stuck (e.g. because we expect it to run all the time - keepalive?), the queue won't have it there | 14:52 |
TheJulia | I glanced at it and I didn't see the db keepalive heartbeat on that thread, but that was also where I was sort of starting to supsect | 14:53 |
TheJulia | Interesitngly! | 14:53 |
dtantsur | hmm, it does not look like the green shutdown() implementation relies on cancelling | 14:53 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: WIP allow inspection to ignore certain interfaces https://review.opendev.org/c/openstack/ironic/+/953886 | 14:54 |
dtantsur | getting this ^^ out of my way temporary | 14:55 |
TheJulia | https://www.irccloud.com/pastebin/hRQnhNTz/ | 14:55 |
TheJulia | I think I just realized it | 14:55 |
TheJulia | or, the issue per-say | 14:55 |
TheJulia | see the paste and the inline note | 14:55 |
TheJulia | the warning towards the bottom, makes me think something else is going on that we're either picking up or holding on to | 14:56 |
dtantsur | TheJulia: we call keepalive_halt() after del_host(), I wonder how it works even | 14:57 |
dtantsur | it does not look like you're hitting this issue, but I still don't get it | 15:03 |
dtantsur | Otherwise, I agree, it seems to be around the RPC server itself | 15:04 |
dtantsur | However. If you set wait=False and it starts working, I wonder if something is actually running there still. | 15:08 |
TheJulia | Based upon the error and sitting down thinking about it now, I'm thinking somehow we're hung on a socket in select | 15:11 |
TheJulia | and somehow we deadlock with that, but it might just be behavior outside of our direct view as well | 15:12 |
opendevreview | cid proposed openstack/ironic master: Log executed steps during cleaning/servicing/deploy https://review.opendev.org/c/openstack/ironic/+/952637 | 15:13 |
dtantsur | TheJulia: attach gdb to the conductor and see where it is? :) | 15:26 |
* dtantsur is curious if we can strace a running process.. | 15:26 | |
TheJulia | maybe later today, negotiating items ;) | 15:27 |
TheJulia | The other challenge is another thread will eventually just kill the entire process. I guess it is good reason to finish wiring up the knobs ;) | 15:45 |
TheJulia | Any thought regarding doing like nova and sort of keeping some non-explicit compatibility around for a escape hatch of the likes of the Metal3 job right now :) | 15:46 |
TheJulia | (I still need to double check requirements, but I guess since it is optional I likely only have oslo.service invoked | 15:47 |
TheJulia | ) | 15:47 |
dtantsur | TheJulia: are the patches breaking the metal3 job? | 15:59 |
dtantsur | From the migration standpoint, I'd rather migrate soon | 15:59 |
TheJulia | Oh yeah, It is simple from what I saw | 15:59 |
TheJulia | I'm not saying don't migrate, I'm more thinking outloud if keep an escape hatch around or not | 16:00 |
dtantsur | I guess the only argument for staged migration is the logic around the thread pool size? | 16:00 |
TheJulia | because with an auto-escape hatch, it would just have worked | 16:00 |
dtantsur | You'd need to conditionally monkey-patch then? | 16:00 |
TheJulia | It sure looks that way, tbh | 16:01 |
TheJulia | not even monkeypatch, just let the eventlet defaults for the area workers get used | 16:01 |
dtantsur | So, combine threaded API/RPC with green threads in the conductor? | 16:01 |
TheJulia | yes, the existing handler should just work if we condition the startup properly | 16:01 |
dtantsur | I'm not just that is more robust then just migrate everything, at least as far as Metal3 is concerned | 16:01 |
dtantsur | s/just/sure/ (wut) | 16:02 |
TheJulia | same, okay | 16:02 |
dtantsur | (In the defence of my brain, it's +36 outside and +29 here) | 16:02 |
TheJulia | (eeek!) | 16:02 |
TheJulia | It will be 41+ where I'll be next week | 16:03 |
dtantsur | ugh | 16:04 |
dtantsur | I hope your A/C is working | 16:04 |
TheJulia | (Why am I having service work done when it is that hot outside?!?) | 16:04 |
dtantsur | :( | 16:04 |
TheJulia | Yeah, it will be, and should one fail they will have spares there already (hopefully) | 16:04 |
dtantsur | That's really nice | 16:04 |
* dtantsur only has a portable one | 16:05 | |
TheJulia | ugh, yeah, and they only do so well when it comes to heat as well since they aren't really built for super efficient heat movement | 16:05 |
TheJulia | (Next up, we'll have videos of a outlet learning how to build an Air Conditioning system. | 16:06 |
TheJulia | ) | 16:06 |
TheJulia | eerr, owllet | 16:06 |
TheJulia | (at some point, you are on that path once you buy a vacuum pump and copper tubing. ;)) | 16:06 |
dtantsur | :D | 16:07 |
TheJulia | okay, my meeting is now tomorrow and I've likely appropaitely angered everyone... soooo back to eradicating eventlet! | 16:09 |
dtantsur | \o/ | 16:09 |
TheJulia | .... 3.5 hours after I started my day :( | 16:10 |
opendevreview | Harald Jensås proposed openstack/ironic master: Move drain_shutdown_timeout to conductor cfg group https://review.opendev.org/c/openstack/ironic/+/953879 | 16:19 |
cardoe | TheJulia: so I commented on https://bugs.launchpad.net/ironic/+bug/2115471 some more... but I'm thinking about this... Do we really want the "inventory" to ever be free-form? | 16:20 |
cardoe | We've got some keys for "extra_hardware" which are defined as more free-form but do we want top level keys being free-form? | 16:21 |
cardoe | I don't see how that could be a win for Ironic to be able to run operations safely against the data | 16:21 |
plestang | Hello team, currently the GlanceImageService only handle swift images as remote images is there any will or WIP to support images backed on s3? | 16:31 |
dtantsur | cardoe: extra_hardware is not a part of the inventory, right? | 16:33 |
dtantsur | plestang: I believe you can use any backend but you won't be able to use such features as swift temporary URLs, i.e. the images will be downloaded to the conductor first | 16:33 |
cardoe | dtantsur: it is... it's a top level key that's defined as being an object which the hardware manager controls | 16:34 |
dtantsur | cardoe: well, it's not. what you describe is because the inventory/plugin_data split has only happened on the Ironic side so far | 16:35 |
dtantsur | (again, my bad; we learn as we grow) | 16:35 |
dtantsur | so, the IPA output is still a bit of a garbage pile :) | 16:35 |
dtantsur | I hope, eventually, to get the inventory/plugin_data split there as well | 16:35 |
dtantsur | Essentially, the logic is as follows: everything under the inventory key is the inventory, and it's not free-form | 16:36 |
dtantsur | everything else is what becomes plugin_data in Ironic | 16:36 |
cardoe | so IPA submits an empty plugin_data | 16:37 |
dtantsur | it does not submit any | 16:38 |
dtantsur | plugin_data is a relatively new concept, it appeared during the migration from ironic-inspector | 16:38 |
dtantsur | to specifically start solving the problem you're now solving | 16:38 |
dtantsur | then Ironic does this: https://opendev.org/openstack/ironic/src/branch/master/ironic/api/controllers/v1/ramdisk.py#L395 | 16:38 |
dtantsur | this line separates inventory from plugin_data | 16:39 |
dtantsur | TL;DR top-level keys are not a part of the inventory, only the 'inventory' top-level key is | 16:40 |
* dtantsur has so many regrets as he writes this explanation :D | 16:40 | |
TheJulia | heh | 16:55 |
TheJulia | regrets are normal | 16:55 |
TheJulia | ;) | 16:55 |
opendevreview | Merged openstack/ironic stable/2025.1: fix: handle unexpected 'loop' in actions field https://review.opendev.org/c/openstack/ironic/+/953465 | 16:57 |
dtantsur | Has anyone faced this problem: | 16:58 |
dtantsur | "RNDIS and CDC enabled host interfaces to access USB over the network shows almost as a regular interface in the CoreOS. This is true to all vendors. In Super Micro in particular this is highly problematic because the same MAC address is used across all servers if they are in the same BMC FW version. This is something that cannot be changed. " | 16:58 |
dtantsur | So, when you inspect such nodes, you end up with duplicate MACs :( | 16:58 |
dtantsur | Which is why I'm working on an option to exclude certain interfaces, btw | 16:58 |
dtantsur | But I wonder if anyone has already hit this problem. | 16:59 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Allow inspection to ignore certain interfaces https://review.opendev.org/c/openstack/ironic/+/953886 | 17:16 |
dtantsur | I'll read any responses tomorrow, have a good night o/ | 17:19 |
TheJulia | dtantsur: no, the advise I've generally given folks is to ensure those are off because if someone hacks a node, and then successfully gets a shell in a BMC, your environment is now owned by the attacker completely | 17:43 |
opendevreview | Chris Krelle proposed openstack/ironic master: update Jinja2 to address BDSA-2024-2383 https://review.opendev.org/c/openstack/ironic/+/953902 | 17:50 |
NobodyCam | been a long while, and the patch I wanted to put up... but someone may get value from this one. | 17:51 |
NobodyCam | s/and\ the/and\ not\ the/ | 17:52 |
NobodyCam | hehhee | 17:52 |
TheJulia | so, str(exception.message) likely is not required since the formatting is %(param)s | 18:00 |
TheJulia | Is there a reference for an actual cVE and I guess not someone's tool repository? | 18:00 |
NobodyCam | let me get the ref | 18:05 |
NobodyCam | TheJulia: https://nvd.nist.gov/vuln/detail/CVE-2024-34064 | 18:09 |
NobodyCam | Let me update | 18:09 |
opendevreview | Chris Krelle proposed openstack/ironic master: update Jinja2 to address BDSA-2024-2383 https://review.opendev.org/c/openstack/ironic/+/953902 | 18:13 |
TheJulia | so, just a minimum version bump then | 18:17 |
NobodyCam | 3.0.0 to 3.1.6... the exc.message was the only change I found that was needed | 18:19 |
NobodyCam | 3.1.6 is what is in the upper constraints | 18:20 |
plestang | dtantsur: the swift temp url is exactly the issue but we could get a temp url like behaviour with presigned url in s3? | 18:29 |
plestang | this is why I was thinking that we could implement a generic generate_temp_url method which is able to handle multiple backend (at least swift and s3) | 18:30 |
opendevreview | Chris Krelle proposed openstack/ironic master: update Jinja2 to address BDSA-2024-2383 https://review.opendev.org/c/openstack/ironic/+/953902 | 18:30 |
TheJulia | NobodyCam: Uhh, commit message for the CVE related fix awareness | 18:48 |
TheJulia | I ask because if you search for BDSA-2024-2383, you get a page with two links | 18:48 |
TheJulia | plestang: seems reasonable, just put it on a class or condition the logic as such | 18:49 |
TheJulia | (and I've heard even integrated folks who would love to use S3 directly (I guess, their data gravity is so heavy there that they just pay the bills and want to try and avoid the intermediate action | 18:49 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP: Replace GreenThreadPoolExecutor in conductor https://review.opendev.org/c/openstack/ironic/+/952939 | 18:51 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP: Set the backend to threading. https://review.opendev.org/c/openstack/ironic/+/953683 | 18:51 |
TheJulia | cid: I rebased yours, fwiw. | 18:51 |
TheJulia | I removed eventlet from the test runner as well | 18:52 |
TheJulia | and... we may have more work ahead of us. I'll get a clearer picture to work from from CI | 18:53 |
NobodyCam | ahh one from 2022 and one from 2024.. | 18:54 |
cid | TheJulia, More work ahead, that's always !good news, ++ 😁 | 18:54 |
TheJulia | cid: its most likely just some stuff needs to get re-sorted | 18:55 |
TheJulia | NobodyCam: huh? | 18:55 |
TheJulia | and by stuff, cid, I mean test modeling with tasks since we may have some tests which don't happilly work across real threads. | 18:56 |
plestang | TheJulia: sure! | 18:56 |
TheJulia | NobodyCam: if it wasn't obvious, we're in the process of ripping eventlet out | 18:56 |
cid | ack'd | 18:57 |
opendevreview | Chris Krelle proposed openstack/ironic master: update Jinja2 to address BDSA-2024-2383 https://review.opendev.org/c/openstack/ironic/+/953902 | 18:57 |
TheJulia | s/BDSA-2024-2383/CVE-2023-34064/ | 18:59 |
NobodyCam | ack | 18:59 |
TheJulia | fwiw, that just pins it up for upper-constraints to top it out. Locally I already default to 3.1.6, but I can see the point to move up the version | 18:59 |
opendevreview | Chris Krelle proposed openstack/ironic master: update Jinja2 to address CVE-2024-2383 https://review.opendev.org/c/openstack/ironic/+/953902 | 19:00 |
NobodyCam | TheJulia: yes, NS thank you! | 19:01 |
NobodyCam | NS = and | 19:01 |
TheJulia | np! | 19:02 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP: Set the backend to threading. https://review.opendev.org/c/openstack/ironic/+/953683 | 19:11 |
TheJulia | https://45bbcce67481c2c3c580-e564cc116dd6d65d7c4300b6ac54c802.ssl.cf2.rackcdn.com/openstack/ea600759f5d3405fabb11e1d353cf19e/testr_results.html <-- I am wondering if for now we just skip trying to threadize our tests, since the stack setup is all built on eventlet green threading | 19:13 |
TheJulia | dtantsur: I think you were right about the keepalive thread, explicitly stopping it seemed to do the trick | 20:06 |
TheJulia | also, adding self._shutdown since we tie the long running loops with that and multiple threads can be trying to read it | 20:09 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP: Set the backend to threading. https://review.opendev.org/c/openstack/ironic/+/953683 | 22:39 |
TheJulia | cid: dtantsur: I think we're going to need to revisit the db layer because ultimately we're using enginefacade and the real problem is sessions are simply not thread safe. sqlalchemy has some nice docs on this topic but I'm thinking the db stuff is going to get retooled | 22:41 |
NobodyCam | Good afternoon folks I see that my patch failed one test: ironic-tempest-bios-redfish-pxe with POST_FAILURE in 9m 35s.... Looks like a issue downloading ubuntu/dists/noble-updates/main/binary-amd64/Packages.gz ... is the process still to recheck? | 23:04 |
opendevreview | Doug Goldstein proposed openstack/ironic master: fix port vif attach with networks with dynamic segments https://review.opendev.org/c/openstack/ironic/+/952168 | 23:40 |
cardoe | NobodyCam: yes | 23:41 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!