iurygregory | good morning ironic | 11:08 |
---|---|---|
opendevreview | Merged openstack/ironic master: Replace crypt module https://review.opendev.org/c/openstack/ironic/+/937173 | 13:08 |
opendevreview | Merged openstack/ironic master: doc/source/admin fixes part-2 https://review.opendev.org/c/openstack/ironic/+/937304 | 13:08 |
opendevreview | Merged openstack/bifrost master: Add support for Ubuntu 24.04 image download https://review.opendev.org/c/openstack/bifrost/+/934177 | 13:58 |
*** sfinucan is now known as stephenfin | 14:06 | |
TheJulia | good morning | 14:28 |
dtantsur | morning TheJulia, how was your break? | 14:31 |
TheJulia | Not long enough :) | 14:38 |
dtantsur | fair :) | 14:38 |
TheJulia | so many emails :) | 15:01 |
cid | Happy new year Ironic o/ | 15:05 |
masghar | Happy new year! | 15:05 |
cid | \o masghar | 15:08 |
masghar | o/ cid! Looks like we won't have our meeting today? | 15:09 |
* dtantsur assumes the same | 15:09 | |
cid | ++, holidays still in the air | 15:10 |
TheJulia | Yeah, Likely not awful in the grand scheme of things. I'm still in the middle of emails | 15:12 |
TheJulia | dtantsur: going back to the networking topic, still want to split the discussions? To focus a split discussion would you be able to type up a sentence or two as to what you feel the focus should be? | 15:23 |
* dtantsur desperately tries to recollect the context | 16:01 | |
dtantsur | TheJulia: I'm sorry, you'll have to remind me what I wanted :D | 16:02 |
* dtantsur has a memory of a butterfly and an attention span of a golden retriever | 16:02 | |
JayF | o/ | 16:04 |
JayF | Golden retrievers have surprisingly long memories. Find a sandwich at a bush one time five years ago? Sniff it daily now. | 16:04 |
TheJulia | Pattern establishment, my corgi is the same way | 16:05 |
dtantsur | I see. Should have borrowed the memory, not the attention part. | 16:05 |
TheJulia | "I must sniff this specific tree, and telecom stand *every* time I walk past..." | 16:05 |
TheJulia | heh | 16:05 |
* dtantsur is looking for a tree to sniff (just for a change) | 16:06 | |
TheJulia | dtantsur: so you were hoping to split the improve networking discussion into separate openstack and metal3 discussions, because so much of the openstack integrated discussion involves neutron and there is so much context backfilling which also needs to take place to sync state across all the humans | 16:06 |
dtantsur | TheJulia: hmm, yeah, I remember, thank you. I guess "split" is an imprecise word. Adam and I (and potentially other standalone users) could benefit from some instances of the meetings dedicated to topics we understand (which, btw, have a lot of overlap with the neutron world - like Kea) | 16:08 |
TheJulia | ahh, hmmmm | 16:09 |
TheJulia | okay | 16:09 |
* dtantsur may still be very unclear - I somehow cannot English any more after the break | 16:09 | |
TheJulia | you english'ed quite clearly in my mind | 16:10 |
dtantsur | good :) | 16:10 |
masghar | (I seem to have lost some english too xD) | 16:11 |
dtantsur | Maybe an unusual amount of speaking our native languages during the break? :) | 16:12 |
masghar | Absolutely =D | 16:12 |
TheJulia | could always be worse | 16:14 |
TheJulia | I forgot where I put my downstream repos | 16:14 |
dtantsur | maybe they're on your forehead? | 16:14 |
dtantsur | (sorry) | 16:14 |
TheJulia | lol | 16:14 |
masghar | xD | 16:15 |
masghar | For some reason, German also seems to be escaping me (will pick up more soon hopefully xD), but whenever I see something I don't understand my brain goes 'Was ist das' =D | 16:17 |
dtantsur | :D | 16:19 |
TheJulia | dtantsur: could the challenge be the contextual backfill and consensus finding process also creating a... signal to noise imbalance based upon use case focus? | 16:30 |
dtantsur | I'm not entirely following. If you're trying to say that we're simply getting lost in the information and fail to recognize the commonalities, the answer is "very likely". | 16:33 |
cardoe | We do something with computers right? :D | 16:35 |
dtantsur | Do we? | 16:35 |
* dtantsur looks at his with distrust | 16:35 | |
TheJulia | heh | 16:36 |
TheJulia | dtantsur: That is kind of what I'm suspecting | 16:36 |
TheJulia | since the conversation could easily drift all over the place too | 16:36 |
TheJulia | maybe a more distinct agenda model might be the answer?! | 16:36 |
TheJulia | enable the focusing and plan that focus upfront | 16:37 |
dtantsur | could be worth an attempt | 16:37 |
cardoe | Sorry my poor attempt at humor. Someone thought it'd be good to hit me with 2 1/2 hours of back to back meetings from 8am on the first day I'm back from PTO. I haven't a clue what I was working on. I couldn't even guess an answer for most folks. | 16:38 |
TheJulia | I'm a little frustrated, but the topic is easily falling off my radar as well | 16:38 |
TheJulia | I know what I was working on, but started my day going "uhh, what has appeared while I was on PTO which might require immediate action" | 16:39 |
dtantsur | I'll be in this state in February after my planned PTO (was working over the holidays) | 16:39 |
cardoe | So as far as the network side from me goes, we're still plodding along a little bit. There's a lot of newer hook points in neutron since the last time I looked at the guts around the train days which is really good. | 16:39 |
TheJulia | cardoe: joy, sounds like more context to spread as well | 16:41 |
cardoe | Ultimately from my side I think we're going to stick with neutron. | 16:41 |
cardoe | Oh there's 0 docs to this stuff. There's a multitude of ways to load "plugins" for each of these layers as well. | 16:41 |
TheJulia | ugh | 16:42 |
cardoe | Lemme write up a little description. | 16:44 |
TheJulia | __ | 17:03 |
TheJulia | err | 17:03 |
TheJulia | ++ | 17:03 |
TheJulia | so... What is the perception of our CI this new year? | 17:03 |
dtantsur | Just passed a recheck on your patch, this is promising | 17:04 |
TheJulia | my patch?! | 17:04 |
TheJulia | oh, the metalsmith default change? | 17:04 |
dtantsur | mmmm, was the metalsmith-legacy removal yours? I haven't paid attention | 17:04 |
TheJulia | yeah | 17:04 |
TheJulia | I was looking at a change from cardoe and it failed in the weirdest unrelated way I've seen recently: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7da/937271/2/check/ironic-tempest-ovn-uefi-ipmi-pxe/7da0107/testr_results.html | 17:05 |
dtantsur | I've definitely seen SSH timeouts relatively recently | 17:05 |
TheJulia | it wasn't a timeout though | 17:05 |
dtantsur | "Connection to the 172.24.5.172 via SSH timed out" is what your link points at, no? | 17:06 |
TheJulia | it failed to establish a session, not timed out | 17:07 |
TheJulia | it actually got the socket if its logging is at all sane | 17:07 |
TheJulia | .... which is always an open question | 17:07 |
TheJulia | https://www.irccloud.com/pastebin/eAiMKWQw/ | 17:07 |
TheJulia | we likely just need to keep an eye out, maybe there is some weird case we can fall into with networking | 17:09 |
dtantsur | Yeah, I don't get this part (I wish we had some verbose logging as in `ssh -v`) | 17:09 |
TheJulia | I have had an open ask for them to rip paramiko out for close to two years | 17:10 |
TheJulia | we've been talking about removing it for more than 5 years now | 17:11 |
dtantsur | Now it's a tradition! | 17:11 |
TheJulia | doh! | 17:11 |
dtantsur | (like planning to implement the graphical console) | 17:11 |
* dtantsur ducks | 17:11 | |
TheJulia | oh, we found a forward path there! | 17:11 |
dtantsur | again? :) | 17:11 |
TheJulia | oh yeah | 17:11 |
TheJulia | ... largely "eh, that was over engineered!" | 17:12 |
dtantsur | Just like our whole industry | 17:12 |
dtantsur | on this positive note I should probably go exercise a bit | 17:13 |
TheJulia | ++ | 17:14 |
cardoe | Sorry. I was writing and then got Zoomed. :/ | 17:33 |
cardoe | https://www.irccloud.com/pastebin/TyPtD0Kb/ | 17:35 |
cardoe | TheJulia: That's what I had written up | 17:35 |
cardoe | So that's where we got at the end of the year. | 17:35 |
cardoe | I've got some crappy Cisco ASAs in the 3 cabs of gear I stole before the end of the year and you can already do "openstack router create --flavor-id cisco-asa test-router" and have that work. | 17:36 |
TheJulia | so in your structure's case, I guess that kind of actually makes sense, sounds like the challenge is the full mapping hint for a vif bind which is just not a thing, right now | 17:48 |
TheJulia | whereas the whole set of networking cases for standalone users doesn't really apply to your case and needs | 17:48 |
cardoe | Yep. | 17:50 |
JayF | The one place where standalone/non-standalone networking cases overlap is the need to have a networking solution that doesn't require giving all the keys to the NOC to OpenStack | 17:51 |
JayF | that's the common thread between most folks I've talked to about advanced Ironic networking, in integrated and non-integrated contexts | 17:52 |
TheJulia | some of that is likely just mercury as framed in the spec document, at least trying to map a solution there | 17:56 |
opendevreview | Merged openstack/ironic master: CI: Remove legacy metalsmith job https://review.opendev.org/c/openstack/ironic/+/933152 | 18:06 |
opendevreview | Merged openstack/ironic master: apply line length rules to the doc directory https://review.opendev.org/c/openstack/ironic/+/937269 | 18:06 |
opendevreview | Verification of a change to openstack/ironic master failed: change ambiguous variable name https://review.opendev.org/c/openstack/ironic/+/937270 | 18:06 |
TheJulia | well, some stuff merging is a good sign | 18:08 |
TheJulia | looks like a packaging mirror update is in progress and caused the failure | 18:10 |
TheJulia | likely best thing to do is recheck in a little bit | 18:10 |
opendevreview | Verification of a change to openstack/ironic-python-agent-builder master failed: Move jobs and DIB builds to ubuntu noble https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/938115 | 18:12 |
cardoe | the variable name change failed cause metal3 got a CentOS 9 - Stream "baseos" checksum mismatch from all the mirrors. | 18:24 |
cardoe | JayF: Yep. That's my issue. The handing off of all the keys. | 18:25 |
cardoe | So maybe if we can somehow define that TheJulia. | 18:25 |
cardoe | I'm happy to help with the standalone user needs as well because I think any improvement of the networking story helps all the use cases. | 18:25 |
cardoe | I was thinking about ultimately documenting where some of these hook points can be. | 18:27 |
cardoe | Some of the places we've needed to create custom plugins are totally silly. Like the data for VXLANs is stored in the DB and it's acted on from the DB. But every time neutron starts up it uses the data from the config file to replace the data in the DB. There's no API endpoint in neutron to set that data either. | 18:30 |
cardoe | So our plugin nukes that force override and just listens on a RabbitMQ queue using oslo_messaging and takes the setup that way. | 18:31 |
cardoe | So I was thinking of basically refactoring that out so that on startup the default would be to read the config file and send that in. But if it's disabled then some other provider could provide that data. | 18:32 |
cardoe | VXLAN also doesn't support the physical network stuff that the VLAN one has. But those changes are all very small. I feel like that could be made common and shared between the two. It's just nobody has needed / asked for it. | 18:34 |
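
The refactor cardoe describes (keep the config file as the default source on startup, but let another provider supply the data when that is disabled) could be sketched roughly as below. This is only an illustration under stated assumptions: `ConfigFileProvider`, `ExternalProvider`, `sync_on_startup`, and the `vni_ranges` key are hypothetical names invented here, the "DB" is a plain dict, and none of this is neutron's actual code or API.

```python
from __future__ import annotations


class ConfigFileProvider:
    """Hypothetical default: VXLAN ranges come from the config file."""

    def __init__(self, configured: list[tuple[int, int]]):
        self._configured = configured

    def get_ranges(self) -> list[tuple[int, int]]:
        return list(self._configured)


class ExternalProvider:
    """Hypothetical stand-in for a provider fed from elsewhere,
    for example by messages consumed from a queue."""

    def __init__(self) -> None:
        self._ranges: list[tuple[int, int]] = []

    def push(self, start: int, end: int) -> None:
        self._ranges.append((start, end))

    def get_ranges(self) -> list[tuple[int, int]]:
        return list(self._ranges)


def sync_on_startup(db: dict, provider) -> None:
    """Replace the stored ranges only if a startup provider is enabled.

    Passing None models "config sync disabled": whatever is already
    stored is left untouched instead of being overwritten.
    """
    if provider is None:
        return
    db["vni_ranges"] = provider.get_ranges()


# Usage: the stored data survives a restart when the sync is disabled.
db = {"vni_ranges": [(1, 10)]}
sync_on_startup(db, None)
sync_on_startup(db, ConfigFileProvider([(100, 200)]))
```

The point of the sketch is that the "overwrite from config" step becomes one provider among several rather than an unconditional startup action.
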
opendevreview | Verification of a change to openstack/ironic master failed: change ambiguous variable name https://review.opendev.org/c/openstack/ironic/+/937270 | 19:50 |
TheJulia | cardoe: by chance did you read the mercury spec? Asking because the hope was that, at least at its core, it would provide a delineation of access control management to bridge some of the barrier | 19:51 |
cardoe | I did. It's been a little while. But I do remember the roles made sense to me. | 19:51 |
cardoe | I'll go back and look again however. | 19:52 |
cardoe | Still trying to kick off cobwebs over here. I cannot recall if I asked if it made sense for get_last_error to be blocked for a leased machine. I feel like there was a convo about the fact that the errors either did or didn't share sensitive data. | 19:53 |
TheJulia | So I think there are two delineations, bridge the gap for the lowest level base interactions required, and then also the informational context and state at the intermediate layers, sort of like your vxlan conundrum with neutron state (which sounds like a bug that it reads config, rewrites db... which also seems risky and problematic as well) | 19:54 |
TheJulia | Which is also disjointed from the addressing and the overlapping areas | 19:54 |
TheJulia | hmmmmmmmm | 19:55 |
TheJulia | you have nerd sniped my brain | 19:55 |
* TheJulia backs out of the rust folders and to an ironic folder to look | 19:56 | |
cardoe | ooo Rust? I'm in for a nerd snipe myself. | 19:57 |
TheJulia | cardoe: get_last_error only exists on cleaning and servicing | 19:58 |
TheJulia | or do you mean the last_error api field | 19:58 |
JayF | I think he means node.last_error | 19:58 |
JayF | which ends up proxied into the nova failure in integrated cases | 19:58 |
TheJulia | cardoe: I've been looking at bootc install stuff for bootable containers | 19:58 |
* JayF trying to dredge up the context from last year | 19:58 | |
cardoe | yeah it's node.last_error | 19:58 |
opendevreview | Steve Baker proposed openstack/ironic master: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/935992 | 19:58 |
JayF | I think my suggestion was to JFDI fix that; it should be visible to lessees if it's piped thru to nova | 19:59 |
TheJulia | and bootc is in rust as well as bootupd, been trying to figure out if we need to do any EFI record management on a deployment with bootc | 19:59 |
cardoe | If it fails to build via nova the default policy results in ** Value Redacted - Requires baremetal:node:get:last_error permission. ** | 19:59 |
JayF | cardoe: think about this case: "error: failed to blorp port X on switch hostname.domain.ld" | 20:00 |
TheJulia | is nova using the lessee user's privileges? | 20:00 |
JayF | cardoe: is that OK for a lessee to see? I think that's about as private as it gets (internal host name) | 20:00 |
TheJulia | i.e. is its ironic client spawned with the end user's rights | 20:01 |
cardoe | JayF: Yeah. I'm trying to remember what we were saying here. And I think it was something redacted should land in there. | 20:01 |
TheJulia | I could see loosening the policy by default in ironic as long as we publish a reno with the change | 20:01 |
TheJulia | the idea was to highly restrict what a lessee could get context/knowledge/state wise, but I don't think we anticipated the client from nova to ever be using the lessee's user's privileges | 20:02 |
TheJulia | which would be much more restricted out of the box | 20:02 |
TheJulia | under, at least, the current default RBAC policy | 20:03 |
cardoe | I stupidly left myself a note that just says "clarify baremetal:node:get_last_error usage and provide reasonable default message on build failure" | 20:03 |
cardoe | And I thought I had asked in here if we should make any changes or not. | 20:03 |
TheJulia | I honestly don't remember why I made it so restricted | 20:04 |
cardoe | I think cause of the reason JayF just said. | 20:08 |
JayF | TheJulia: so I think why I was so quick to yes | 20:09 |
JayF | is my original reading of what cardoe said, like weeks ago, was that *it showed up in nova errors already* | 20:09 |
TheJulia | yeah | 20:09 |
JayF | if the general happy path is "those errors end up in nova", we should allow lessees to see it, slam dunk | 20:09 |
JayF | I probably lean towards allowing it by default even if it's not allowed yet, but that at least would be a change in security level of that info | 20:10 |
TheJulia | I think the reason I restricted it is you can get like a bmc url or IP address in the text | 20:15 |
TheJulia | and for an admin, sure, makes sense | 20:15 |
TheJulia | for an endish user... maybe not | 20:15 |
TheJulia | but also sort of depends on your risk tolerance and desire | 20:15 |
TheJulia | and I could be totally onboard with lessening the restriction and just publishing a reno with maybe some prose as a note around the policy today | 20:15 |
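
The behaviour under discussion (a field replaced by a placeholder unless the caller holds a specific policy rule) can be illustrated with a minimal sketch. The rule name and placeholder text are taken from cardoe's message above; the function `redact_node`, the `FIELD_RULES` table, and the idea of passing the caller's rules as a set are assumptions made for illustration, not Ironic's implementation.

```python
from __future__ import annotations

# Placeholder text as quoted in the log; {rule} is filled in per field.
REDACTED = "** Value Redacted - Requires {rule} permission. **"

# Hypothetical mapping of sensitive fields to the rule guarding them.
FIELD_RULES = {"last_error": "baremetal:node:get:last_error"}


def redact_node(node: dict, allowed_rules: set[str]) -> dict:
    """Return a copy of node with guarded fields masked for this caller."""
    visible = dict(node)
    for field, rule in FIELD_RULES.items():
        if field in visible and rule not in allowed_rules:
            visible[field] = REDACTED.format(rule=rule)
    return visible


# Usage: a lessee without the rule sees the placeholder, an admin sees
# the raw text (which might include a BMC address or internal hostname).
node = {"name": "node-1", "last_error": "failed to reach BMC at 10.0.0.5"}
lessee_view = redact_node(node, allowed_rules=set())
admin_view = redact_node(node, {"baremetal:node:get:last_error"})
```

Loosening the default, as discussed, would amount to granting that rule to lessees, so the trade-off is exactly what kind of text ends up in the field.
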
JayF | I had an ask generally in this category from my downstream | 20:17 |
JayF | something along the lines of "count consecutive errors somewhere so you can trigger $action based on it" | 20:17 |
JayF | with the triggering being either internal to or external to ironic, depending on what we think is best | 20:17 |
JayF | I told them the hardest part of that is determining what "failure" is, because almost certainly a lessee armed with enough knowledge of our API to be dangerous could rack up failures lol | 20:18 |
TheJulia | indeed | 20:18 |
TheJulia | and on a plus side, there is a separate history for errors for admins :) | 20:18 |
cardoe | Yeah don't wanna give away too much but some things are weird like when I didn't have enough of that resource class | 20:19 |
TheJulia | last_error really should only get populated when there was a deployment failure | 20:20 |
TheJulia | not resource classing :) | 20:20 |
JayF | TheJulia: yeah, but node history doesn't track successes | 20:25 |
JayF | TheJulia: so you can't answer the question "how many failures has this node had in a row" | 20:26 |
JayF | ...is that the answer? track successes in node history optionally so all the info is there? | 20:26 |
TheJulia | JayF: that is easy to fix | 20:27 |
TheJulia | "record successful deploy" | 20:28 |
JayF | yeah, I'm going to take that back to them and see if it'll work | 20:28 |
JayF | if so that's JFDI levels of simple | 20:28 |
TheJulia | indeed | 20:28 |
TheJulia | node_history_record(blah, blah, "it worked said that crazy julia") | 20:28 |
JayF | I've asked, we'll see if that scratches their itch | 20:29 |
JayF | I could also see an option like "purge node history on successful deploy" | 20:29 |
JayF | as an option for pruning old history entries | 20:29 |
JayF | which would make node history only have "new" errors since a last success | 20:29 |
TheJulia | optional, maybe | 20:29 |
JayF | yeah, as an option alongside the other "when to prune history" options :) | 20:31 |
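
A small sketch of why recording successes helps: once a successful deploy is an entry in the history, "how many failures in a row" is just a count back from the newest entry, and "purge on success" is a slice. The `HistoryEntry` shape and the `"ERROR"`/`"INFO"` severity values are assumptions for illustration only, not a description of Ironic's node history schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HistoryEntry:
    event: str
    severity: str  # assumed values: "ERROR" for failures, "INFO" for successes


def consecutive_failures(history: list[HistoryEntry]) -> int:
    """Count failures since the most recent success (oldest entry first)."""
    count = 0
    for entry in reversed(history):
        if entry.severity != "ERROR":
            break
        count += 1
    return count


def prune_on_success(history: list[HistoryEntry]) -> list[HistoryEntry]:
    """Keep only the entries recorded after the most recent success."""
    for index in range(len(history) - 1, -1, -1):
        if history[index].severity != "ERROR":
            return history[index + 1:]
    return list(history)


# Usage: two failures have occurred since the last recorded success.
history = [
    HistoryEntry("deploy failed", "ERROR"),
    HistoryEntry("deploy succeeded", "INFO"),
    HistoryEntry("deploy failed", "ERROR"),
    HistoryEntry("deploy failed", "ERROR"),
]
streak = consecutive_failures(history)
recent = prune_on_success(history)
```

Without the success entry, the same history would be indistinguishable from three failures in a row, which is the gap JayF points out.
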
opendevreview | Merged openstack/ironic master: Trivial deprecation fixes. https://review.opendev.org/c/openstack/ironic/+/936411 | 20:45 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Remove dependency on ironic-lib https://review.opendev.org/c/openstack/ironic-python-agent/+/937743 | 20:49 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Migrate ironic_lib to ironic https://review.opendev.org/c/openstack/ironic/+/937948 | 22:02 |
opendevreview | Verification of a change to openstack/ironic master failed: change ambiguous variable name https://review.opendev.org/c/openstack/ironic/+/937270 | 22:36 |