Monday, 2025-01-06

iurygregorygood morning ironic11:08
opendevreviewMerged openstack/ironic master: Replace crypt module  https://review.opendev.org/c/openstack/ironic/+/93717313:08
opendevreviewMerged openstack/ironic master: doc/source/admin fixes part-2  https://review.opendev.org/c/openstack/ironic/+/93730413:08
opendevreviewMerged openstack/bifrost master: Add support for Ubuntu 24.04 image download  https://review.opendev.org/c/openstack/bifrost/+/93417713:58
*** sfinucan is now known as stephenfin14:06
TheJuliagood morning14:28
dtantsurmorning TheJulia, how was your break?14:31
TheJuliaNot long enough :)14:38
dtantsurfair :)14:38
TheJuliaso many emails :)15:01
cidHappy new year Ironic o/15:05
masgharHappy new year!15:05
cid\o masghar15:08
masgharo/ cid!  Looks like we won't have our meeting today?15:09
* dtantsur assumes the same15:09
cid++, holidays still in the air15:10
TheJuliaYeah, Likely not awful in the grand scheme of things. I'm still in the middle of emails15:12
TheJuliadtantsur: going back to the networking topic, still want to split the discussions? To focus a split discussion would you be able to type up a sentence or two as to what you feel the focus should be?15:23
* dtantsur desperately tries to recollect the context16:01
dtantsurTheJulia: I'm sorry, you'll have to remind me what I wanted :D16:02
* dtantsur has a memory of a butterfly and an attention span of a golden retriever16:02
JayFo/16:04
JayFGolden retrievers have surprisingly long memories. Find a sandwich at a bush one time five years ago? Sniff it daily now.16:04
TheJuliaPattern establishment, my corgi is the same way16:05
dtantsurI see. Should have borrowed the memory, not the attention part.16:05
TheJulia"I must sniff this specific tree, and telecom stand *every* time I walk past..."16:05
TheJuliaheh16:05
* dtantsur is looking for a tree to sniff (just for a change)16:06
TheJuliadtantsur: so you were hoping to split the improve networking discussion to be a separate openstack from metal3 discussion because so much of the openstack integrated discussion involves neutron and there is so much context backfilling which also needs to take place to sync state across all the humans16:06
dtantsurTheJulia: hmm, yeah, I remember, thank you. I guess "split" is an imprecise word. Adam and I (and potentially other standalone users) could benefit from some instances of the meetings dedicated to topics we understand (which, btw, have a lot of overlap with the neutron world - like Kea)16:08
TheJuliaahh, hmmmm16:09
TheJuliaokay16:09
* dtantsur may still be very unclear - I somehow cannot English any more after the break16:09
TheJuliayou english'ed quite clearly in my mind16:10
dtantsurgood :)16:10
masghar(I seem to have lost some english too xD)16:11
dtantsurMaybe an unusual amount of speaking our native languages during the break? :)16:12
masgharAbsolutely =D16:12
TheJuliacould always be worse16:14
TheJuliaI forgot where I put my downstream repos16:14
dtantsurmaybe they're on your forehead?16:14
dtantsur(sorry)16:14
TheJulialol16:14
masgharxD16:15
masgharFor some reason, German also seems to be escaping me (will pick up more soon hopefully xD), but whenever I see something I don't understand my brain goes 'Was ist das' =D16:17
dtantsur:D16:19
TheJuliadtantsur: could the challenge be the contextual backfill and consensus finding process also creating a... signal to noise imbalance based upon use case focus?16:30
dtantsurI'm not entirely following. If you're trying to say that we're simply getting lost in the information and fail to recognize the commonalities, the answer is "very likely".16:33
cardoeWe do something with computers right? :D16:35
dtantsurDo we?16:35
* dtantsur looks at his with distrust16:35
TheJuliaheh16:36
TheJuliadtantsur: That is kind of what I'm suspecting16:36
TheJuliasince the conversation could easily drift all over the place too16:36
TheJuliamaybe a more distinct agenda model might be the answer?!16:36
TheJuliaenable the focusing and plan that focus upfront16:37
dtantsurcould be worth an attempt16:37
cardoeSorry my poor attempt at humor. Someone thought it'd be good to hit me with 2 1/2 hours of back to back meetings from 8am on the first day I'm back from PTO. I haven't a clue what I was working on. I couldn't even guess an answer for most folks.16:38
TheJuliaI'm a little frustrated, but the topic is easily falling off my radar as well16:38
TheJuliaI know what I was working on, but started my day going "uhh, what has appeared while I was on PTO which might require immediate action"16:39
dtantsurI'll be in this state in February after my planned PTO (was working over the holidays)16:39
cardoeSo as far as the network side from me goes, we're still plodding along a little bit. There's a lot of newer hook points in neutron since the last time I looked at the guts around the train days which is really good.16:39
TheJuliacardoe: joy, sounds like more context to spread as well16:41
cardoeUltimately from my side I think we're going to stick with neutron.16:41
cardoeOh there's 0 docs to this stuff. There's a multitude of ways to load "plugins" for each of these layers as well.16:41
TheJuliaugh16:42
cardoeLemme write up a little description.16:44
TheJulia__17:03
TheJuliaerr17:03
TheJulia++17:03
TheJuliaso... What is the perception of our CI this new year?17:03
dtantsurJust passed a recheck on your patch, this is promising17:04
TheJuliamy patch?!17:04
TheJuliaoh, the metalsmith default change?17:04
dtantsurmmmm, was the metalsmith-legacy removal yours? I haven't paid attention17:04
TheJuliayeah17:04
TheJuliaI was looking at a change from cardoe and it failed in the weirdest unrelated way I've seen recently: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7da/937271/2/check/ironic-tempest-ovn-uefi-ipmi-pxe/7da0107/testr_results.html17:05
dtantsurI've definitely seen SSH timeouts relatively recently17:05
TheJuliait wasn't a timeout though17:05
dtantsur"Connection to the 172.24.5.172 via SSH timed out" is what your link points at, no?17:06
TheJuliait failed to establish a session, not timed out17:07
TheJuliait actually got the socket if its logging is at all sane17:07
TheJulia.... which is always an open question17:07
TheJuliahttps://www.irccloud.com/pastebin/eAiMKWQw/17:07
TheJuliawe likely just need to keep an eye out, maybe there is some weird case we can fall into with networking17:09
dtantsurYeah, I don't get this part (I wish we had some verbose logging as in `ssh -v`)17:09
TheJuliaI have had an open ask for them to rip paramiko out for close to two years17:10
TheJuliawe've been talking about removing it for more than 5 years now17:11
dtantsurNow it's a tradition!17:11
TheJuliadoh!17:11
dtantsur(like planning to implement the graphical console)17:11
* dtantsur ducks17:11
TheJuliaoh, we found a forward path there!17:11
dtantsuragain? :)17:11
TheJuliaoh yeah17:11
TheJulia... largely "eh, that was over engineered!"17:12
dtantsurJust like our whole industry17:12
dtantsuron this positive note I should probably go exercise a bit17:13
TheJulia++17:14
cardoeSorry. I was writing and then got Zoomed. :/17:33
cardoehttps://www.irccloud.com/pastebin/TyPtD0Kb/17:35
cardoeTheJulia: That's what I had written up17:35
cardoeSo that's where we got at the end of the year.17:35
cardoeI've got some crappy Cisco ASAs in the 3 cabs of gear I stole before the end of the year and you can already do "openstack router create --flavor-id cisco-asa test-router" and have that work.17:36
TheJuliaso in your structure in case, I guess that kind of actually makes sense, sounds like the challenge is the full mapping hint for a vif bind which is just not a thing, right now17:48
TheJuliawhere as the whole set of networking cases for standalone users don't really apply to your case and needs17:48
cardoeYep.17:50
JayFThe one place where standalone/non-standalone networking cases overlap is the need to have a networking solution that doesn't require giving all the keys to the NOC to OpenStack17:51
JayFthat's the common thread between most folks I've talked to about advanced Ironic networking, in integrated and non-integrated contexts17:52
TheJuliasome of that is likely just mercury as framed in the spec document, at least trying to map a solution there17:56
opendevreviewMerged openstack/ironic master: CI: Remove legacy metalsmith job  https://review.opendev.org/c/openstack/ironic/+/93315218:06
opendevreviewMerged openstack/ironic master: apply line length rules to the doc directory  https://review.opendev.org/c/openstack/ironic/+/93726918:06
opendevreviewVerification of a change to openstack/ironic master failed: change ambiguous variable name  https://review.opendev.org/c/openstack/ironic/+/93727018:06
TheJuliawell, some stuff merging is a good sign18:08
TheJulialooks like a packaging mirror update is in progress and caused the failure18:10
TheJulialikely best thing to do is recheck in a little bit18:10
opendevreviewVerification of a change to openstack/ironic-python-agent-builder master failed: Move jobs and DIB builds to ubuntu noble  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/93811518:12
cardoethe variable name change failed cause metal3 got a CentOS 9 - Stream "baseos" checksum mismatch from all the mirrors.18:24
cardoeJayF: Yep. That's my issue. The handing off of all the keys.18:25
cardoeSo maybe if we can somehow define that TheJulia.18:25
cardoeI'm happy to help with the standalone user needs as well because I think any improvement of the networking story helps all the use cases.18:25
cardoeI was thinking about ultimately documenting where some of these hook points can be.18:27
cardoeSome of the places we've needed to create custom plugins are totally silly. Like the data for VXLANs is stored in the DB and it's acted on from the DB. But every time neutron starts up it uses the data from the config file to replace the data in the DB. There's no API endpoint in neutron to set that data either.18:30
cardoeSo our plugin nukes that force override and just listens on a RabbitMQ queue using oslo_messaging and takes the setup that way.18:31
cardoeSo I was thinking of basically refactoring that out so that on startup the default would be to read the config file and send that in. But if it's disabled then some other provider could provide that data.18:32
cardoeVXLAN also doesn't support the physical network stuff that the VLAN one has. But those changes are all very small. I feel like that could be made common and shared between the two. It's just nobody has needed / asked for it.18:34
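
A minimal sketch of the refactoring cardoe describes above, under stated assumptions: none of these names exist in neutron. `VniRangeProvider`, `ConfigFileProvider` and `ranges_after_startup` are hypothetical, and the "min:max" string format is only assumed to resemble what the config file holds. The idea shown is that the config file becomes one provider among several, and disabling it leaves the stored data untouched so another source (for example a message queue listener, not shown) can supply it.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

VniRange = Tuple[int, int]


class VniRangeProvider(ABC):
    """Hypothetical interface: anything that can supply VXLAN VNI ranges."""

    @abstractmethod
    def load_ranges(self) -> List[VniRange]:
        ...


class ConfigFileProvider(VniRangeProvider):
    """Default behaviour: parse "min:max" strings taken from a config file."""

    def __init__(self, raw_ranges: List[str]) -> None:
        self._raw_ranges = raw_ranges

    def load_ranges(self) -> List[VniRange]:
        ranges = []
        for entry in self._raw_ranges:
            low, high = entry.split(":")
            ranges.append((int(low), int(high)))
        return ranges


def ranges_after_startup(
    stored: List[VniRange],
    provider: Optional[VniRangeProvider],
) -> List[VniRange]:
    """Return the ranges the store should hold once startup has finished.

    With a provider, its data replaces what is stored (the current
    behaviour as described in the discussion). With no provider, the
    stored data is kept so some other component can manage it.
    """
    if provider is None:
        return list(stored)
    return provider.load_ranges()
```

Passing `None` as the provider models the "disabled" case: whatever is already stored survives a restart instead of being overwritten from the config file.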
opendevreviewVerification of a change to openstack/ironic master failed: change ambiguous variable name  https://review.opendev.org/c/openstack/ironic/+/93727019:50
TheJuliacardoe: by chance did you read the mercury spec?  Asking because the hope was that, at least at the core of it, a delineation of access control management would bridge some of the barrier19:51
cardoeI did. It's been a little while. But I do remember the roles made sense to me.19:51
cardoeI'll go back and look again however.19:52
cardoeStill trying to kick off cobwebs over here. I cannot recall if I asked if it made sense for get_last_error to be blocked for a leased machine. I feel like there was a convo about the fact that the errors either did or didn't share sensitive data.19:53
TheJuliaSo I think there are two delineations, bridge the gap for the lowest level base interactions required, and then also the informational context and state at the intermediate layers, sort of like your vxlan conundrum with neutron state (which sounds like a bug that it reads config, rewrites db...  which also seems risky and problematic as well)19:54
TheJuliaWhich is also disjointed from the addressing and the overlapping areas19:54
TheJuliahmmmmmmmm19:55
TheJuliayou  have nerd sniped my brain19:55
* TheJulia backs out of the rust folders and to an ironic folder to look19:56
cardoeooo Rust? I'm in for a nerd snipe myself.19:57
TheJuliacardoe: get_last_error only exists on cleaning and servicing19:58
TheJuliaor do you mean the last_error api field19:58
JayFI think he means node.last_error19:58
JayFwhich ends up proxied into the nova failure in integrated cases19:58
TheJuliacardoe: I've been looking at bootc install stuff for bootable containers19:58
* JayF trying to dredge up the context from last year19:58
cardoeyeah it's node.last_error19:58
opendevreviewSteve Baker proposed openstack/ironic master: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/93599219:58
JayFI think my suggestion was to JFDI fix that; it should be visible to lessees if it's piped thru to nova19:59
TheJuliaand bootc is in rust as well as bootupd, been trying to figure out if we need to do any EFI record management on a deployment with bootc19:59
cardoeIf it fails to build via nova the default policy results in ** Value Redacted - Requires baremetal:node:get:last_error permission. **19:59
JayFcardoe: think about this case: "error: failed to blorp port X on switch hostname.domain.ld"20:00
TheJuliais nova using the lessee user's privileges?20:00
JayFcardoe: is that OK for a lessee to see? I think that's about as private as it gets (internal host name)20:00
TheJuliai.e. is its ironic client spawned with the end user's rights20:01
cardoeJayF: Yeah. I'm trying to remember what we were saying here. And I think it was something redacted should land in there.20:01
TheJuliaI could see loosening the policy by default in ironic as long as we publish a reno with the change20:01
TheJuliathe idea was to highly restrict what a lessee could get context/knowledge/state wise, but I don't think we anticipated the client from nova to ever be using the lessee's user's privileges20:02
TheJuliawhich would be much more restricted out of the box20:02
TheJuliaunder, at least, the current default RBAC policy20:03
cardoeI stupidly left myself a note that just says "clarify baremetal:node:get_last_error usage and provide reasonable default message on build failure"20:03
cardoeAnd I thought I had asked in here if we should make any changes or not.20:03
TheJuliaI honestly don't remember why I made it so restricted20:04
cardoeI think cause of the reason JayF just said.20:08
JayFTheJulia: so I think why I was so quick to yes20:09
JayFis my original reading of what cardoe said, like weeks ago, was that *it showed up in nova errors already*20:09
TheJuliayeah20:09
JayFif the general happy path is "those errors end up in nova", we should allow lessees to see slam dunk20:09
JayFI probably lean towards allowing it by default even if it's not allowed yet, but that at least would be a change in security level of that info20:10
TheJuliaI think the reason I restricted it is you can get like a bmc url or IP address in the text20:15
TheJuliaand for an admin, sure, makes sense20:15
TheJuliafor an endish user... maybe not20:15
TheJuliabut also sort of depends on your risk tolerance and desire20:15
TheJuliaand I could be totally onboard with lessening the restriction and just publishing a reno with maybe some prose as a note around the policy today20:15
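
A rough sketch of the redaction behaviour being discussed, for illustration only. The policy name and the placeholder text are the ones cardoe quotes above; the function `visible_node`, the plain-dict node and the set of granted policy names are hypothetical stand-ins and are not how ironic evaluates policy.

```python
from typing import Any, Dict, Set

# Policy name and placeholder text as quoted in the discussion.
LAST_ERROR_POLICY = "baremetal:node:get:last_error"
REDACTED = (
    "** Value Redacted - Requires "
    "baremetal:node:get:last_error permission. **"
)


def visible_node(node: Dict[str, Any], granted: Set[str]) -> Dict[str, Any]:
    """Return a copy of the node with last_error hidden if not permitted.

    'granted' is a hypothetical set of policy names the caller passes;
    a real deployment would consult its policy engine instead.
    """
    result = dict(node)
    hidden = LAST_ERROR_POLICY not in granted
    if hidden and result.get("last_error") is not None:
        result["last_error"] = REDACTED
    return result
```

Loosening the default, as suggested above, would correspond to lessees having the policy in their granted set, at which point text such as a BMC address in the error becomes visible to them.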
JayFI had an ask generally in this category from my downstream20:17
JayFsomething along the lines of "count consecutive errors somewhere so you can trigger $action based on it"20:17
JayFwith the triggering being either internal to or external to ironic, depending on what we think is best20:17
JayFI told them the hardest part of that is determining what "failure" is, because almost certainly a lessee armed with enough knowledge of our API to be dangerous could rack up failures lol20:18
TheJuliaindeed20:18
TheJuliaand on a plus side, there is a separate history for errors for admins :)20:18
cardoeYeah don't wanna give away too much but some things are weird like when I didn't have enough of that resource class20:19
TheJulialast_error really should only get populated when there was a deployment failure20:20
TheJulianot resource classing :)20:20
JayFTheJulia: yeah, but node history doesn't track successes20:25
JayFTheJulia: so you can't answer the question "how many failures has this node had in a row"20:26
JayF...is that the answer? track successes in node history optionally so all the info is there?20:26
TheJuliaJayF: that is easy to fix20:27
TheJulia"record successful deploy"20:28
JayFyeah, I'm going to take that back to them and see if it'll work 20:28
JayFif so that's JFDI levels of simple20:28
TheJuliaindeed20:28
TheJulianode_history_record(blah, blah, "it worked said that crazy julia")20:28
JayFI've asked, we'll see if that scratches their itch20:29
JayFI could also see an option like "purge node history on successful deploy"20:29
JayFas an option for pruning old history entries20:29
JayFwhich would make node history only have "new" errors since a last success20:29
TheJuliaoptional, maybe20:29
JayFyeah, as an option alongside the other "when to prune history" options :)20:31
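
A small sketch of the two ideas floated above, assuming successes were recorded in node history alongside failures. `HistoryEntry` and both functions are hypothetical and do not mirror ironic's node history model; they only show that "how many failures in a row" becomes answerable once successes are in the list, and what "purge on successful deploy" would leave behind.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryEntry:
    """Hypothetical history record, listed oldest first."""

    event: str
    success: bool


def consecutive_failures(history: List[HistoryEntry]) -> int:
    """Count failures since the most recent success (or since the start)."""
    count = 0
    for entry in reversed(history):
        if entry.success:
            break
        count += 1
    return count


def prune_on_success(history: List[HistoryEntry]) -> List[HistoryEntry]:
    """Keep only the entries recorded after the last success."""
    for index in range(len(history) - 1, -1, -1):
        if history[index].success:
            return history[index + 1:]
    return list(history)
```

With pruning enabled, the length of the remaining history is itself the consecutive failure count, which is the property JayF points out.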
opendevreviewMerged openstack/ironic master: Trivial deprecation fixes.  https://review.opendev.org/c/openstack/ironic/+/93641120:45
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Remove dependency on ironic-lib  https://review.opendev.org/c/openstack/ironic-python-agent/+/93774320:49
opendevreviewJay Faulkner proposed openstack/ironic master: Migrate ironic_lib to ironic  https://review.opendev.org/c/openstack/ironic/+/93794822:02
opendevreviewVerification of a change to openstack/ironic master failed: change ambiguous variable name  https://review.opendev.org/c/openstack/ironic/+/93727022:36
