Tuesday, 2025-04-01

opendevreviewMerged openstack/ironic master: Fix passing vtep fields to neutron  https://review.opendev.org/c/openstack/ironic/+/94596401:09
opendevreviewLennart Jern proposed openstack/sushy-tools master: Add config option SUSHY_EMULATOR_STORAGE_POOL  https://review.opendev.org/c/openstack/sushy-tools/+/94595905:59
Amarachi_OGood Morning Ironic, wishing you all a great day!06:45
rpittaugood morning ironic! o/07:01
ayo_Good morning rpittau07:03
freemanboss[m]Good morning guys 07:18
opendevreviewVasyl Saienko proposed openstack/ironic master: [devstack] Allow deploy environment with portgroups  https://review.opendev.org/c/openstack/ironic/+/94061108:01
AmarachiOrdor[m]Happy New Month Everyone!09:01
Ayo[m]Happy new month🥳09:02
jssfrback in the day, it was Happy Mailman Day09:09
AmarachiOrdor[m]That's interesting, why though ?09:16
jssfrbecause Mailman sent reminders about your current subscriptions on the first of each month :)09:28
freemanboss[m]@outreachy applicants I hope we are paying close attention to the project page on the Outreachy website?11:16
freemanboss[m]Just noticed our project has been updated, let's go through the update11:16
AmarachiOrdor[m]Yeah I noticed that too, thank you so much for this update Freeman Boss 11:17
freemanboss[m]AmarachiOrdor[m]: Great.11:18
freemanboss[m]You're welcome11:18
queensly[m]<freemanboss[m]> "First thing to avoid unnecessary..." <- Freeman Boss: Amarachi_O  I had to restart this whole process, I have reached the enrollment stage but I see this... (full message at <https://matrix.org/oftc/media/v1/media/download/ASDBsst36jxM-wj0PF8HW3laQJdRa1-ZDwPvCH904aw1wBN7cGmtS-3RZTIDI-l5ZbJ-0s2s-mkEBU76i6KSa8RCeWOUrpPwAG1hdHJpeC5vcmcvamJ0cENKY3NFRFRRZkRiZmpCS1FQcmR3>)11:52
AmarachiOrdor[m]I think the nodes already exist, try doing baremetal node list to see if they are powered off11:53
AmarachiOrdor[m]AmarachiOrdor[m]: queensly: 11:54
queensly[m]Alright. I did that , there are two nodes , and their power state is none.11:55
queensly[m]The provisioning state is enroll11:56
Ayo[m]Is this after using deploy? (Provisioning state)11:57
queensly[m]Ayo[m]: No after using enroll11:57
Ayo[m]Can you attach a screenshot?11:57
queensly[m]Ayo[m]: Please can you see this? https://imgur.com/UytD2jf11:59
Ayo[m]Use the baremetal node list command12:00
Ayo[m]I can’t really tell the state of what you enrolled from this output12:01
queensly[m]Ayo[m]: Yeah. Check this : https://imgur.com/K8rhzEm12:01
Ayo[m]The nodes have been enrolled12:02
AmarachiOrdor[m]So what I will advise is that you use baremetal --help to get different baremetal functions, in my situation my power state was off and I used baremetal power on testvm1 to switch it on12:02
queensly[m]AmarachiOrdor[m]: Alright. Let me do this and give you feedback.12:03
queensly[m]queensly[m]: the power state of testvm1 is on now. Amarachi Ordor 12:06
AmarachiOrdor[m]What does the provision state say12:07
queensly[m]AmarachiOrdor[m]: it's showing as "enroll"12:08
* AmarachiOrdor[m] uploaded an image: (3988KiB) < https://matrix.org/oftc/media/v1/media/download/AfI7Rz0NDzBLzcM3nGPs21JBtVmwmf2K7pldxoKX0OsJMXWpcFTfWTaT8gtfSUDJg5yH4sYgR0XnYKlHuYU-uuNCeWOV-H2QAG1hdHJpeC5vcmcvVm5zVkt4S0p2WGpmTHJic2ZkbkpPZXBK >12:14
queensly[m]I used the command baremetal node deploy testvm1 but had this output . It shows that since the state is in "enroll" I can't perform the deploy action.12:14
queensly[m]https://imgur.com/9fiatBf12:14
queensly[m]Using the "node manage" command, the provision state is now "verifying" . Do I have to wait for a while or continue with "node provide"?12:18
AmarachiOrdor[m]Honestly I don't know but if you wait a while and it doesn't change you can try node provide12:19
queensly[m]AmarachiOrdor[m]: Alright, no problem. The provision state is now "manageable" so I will try the node provide. 12:20
queensly[m]<queensly[m]> "Alright, no problem. The..." <- After using the "node provide" command, the provision state changed to cleaning. After some time, I checked again and the state is "available" . From the documentation https://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_states.html#:~:text=The%20manage%20action%20can%20be,How%20to%20Contribute12:28
queensly[m]this is the last step before deployment. I will deploy now and check the output.12:28
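The state walk described above (enroll, then verifying and manageable after "node manage", then cleaning and available after "node provide", and only then deploy) can be sketched as a small transition table. This is an illustrative model in plain Python, not Ironic's own state machine code: `TRANSITIONS`, `apply_action`, and `walk` are hypothetical names, and the `deploying` and `active` states are not mentioned in the discussion; they are included on the assumption that a deploy passes through them.

```python
# Hypothetical, simplified model of the provisioning walk described above.
# Each (stable state, action) pair maps to the states visited, ending on
# the next stable state. Transient states are verifying, cleaning, deploying.
TRANSITIONS = {
    ("enroll", "manage"): ["verifying", "manageable"],
    ("manageable", "provide"): ["cleaning", "available"],
    ("available", "deploy"): ["deploying", "active"],
}


def apply_action(state, action):
    """Return the states visited by `action`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not allowed from state {state!r}"
        ) from None


def walk(state, actions):
    """Apply actions in order and return every state visited, start included."""
    visited = [state]
    for action in actions:
        path = apply_action(state, action)
        visited.extend(path)
        state = path[-1]
    return visited
```

In this model, calling `apply_action("enroll", "deploy")` raises `ValueError`, which mirrors the refusal queensly saw when trying to deploy a node that was still in "enroll".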
AmarachiOrdor[m]queensly:  nice nice, let me know how it goes 12:29
opendevreviewHarald Jensås proposed openstack/sushy-tools master: os-vmedia: Add option to delay rebuild on eject  https://review.opendev.org/c/openstack/sushy-tools/+/94580012:43
freemanboss[m]<queensly[m]> "Freeman Boss: Amarachi_O  I..." <- > <@queensly:matrix.org> Freeman Boss: Amarachi_O  I had to restart this whole process, I have reached the enrollment stage but I see this... (full message at <https://matrix.org/oftc/media/v1/media/download/AWqC_RUyy_hDOs0M_Cp1BBcv4mSqYSQgn9XT95u6d1U5xgp4W0c5sW63qjETqy6ZhLa2vqbPQJ5og7QnCrpWP8NCeWOYMmiwAG1hdHJpeC5vcmcvV25XTEticFlpVHBiRGtublRxRlVaRk9l>)12:53
freemanboss[m]queensly: everything working fine now?12:55
freemanboss[m]* finish and when it  retried12:59
queensly[m]freemanboss[m]: I tried to deploy and had this error:12:59
queensly[m] Failed to validate deploy or power info for the node testvm1 : Node testvm1 failed to validate deploy image info. some parameters were missing. Missing are [instance_info.image_source] (HTTP 400)12:59
queensly[m]I have found the image source file and currently working on assigning an image to the instance_info since it showed empty. 12:59
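The validation failure above comes down to a required key being absent from the node's instance_info. A minimal sketch of that kind of check follows, in plain Python. It is not Ironic's validation code: `REQUIRED_INSTANCE_INFO` and `missing_instance_info` are hypothetical names, the node is modelled as a plain dict, and the image URL is a placeholder.

```python
# Hypothetical illustration of a "required parameter is missing" check.
# Only image_source is listed because it is the one named in the error.
REQUIRED_INSTANCE_INFO = ("image_source",)


def missing_instance_info(node):
    """Return dotted names of required instance_info keys that are unset."""
    info = node.get("instance_info") or {}
    return [
        f"instance_info.{key}"
        for key in REQUIRED_INSTANCE_INFO
        if not info.get(key)
    ]


# A node with an empty instance_info, as reported in the discussion.
empty_node = {"name": "testvm1", "instance_info": {}}

# The same node once an image has been assigned (placeholder URL).
ready_node = {
    "name": "testvm1",
    "instance_info": {"image_source": "http://example.invalid/image.qcow2"},
}
```

With these inputs, `missing_instance_info(empty_node)` lists `instance_info.image_source`, while `missing_instance_info(ready_node)` returns an empty list.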
freemanboss[m]send the result of baremetal node list13:00
queensly[m]queensly[m]: I want to see how this will turn out and give you the feedback.13:00
queensly[m]queensly[m]: I am also open to your contributions. 13:01
Ayo[m]queensly[m]: When you say that you’ve found the source file, what do you mean please13:02
Ayo[m]The data for the source files were pre installed during the installation stage when working on our cli13:03
Ayo[m]So are you using the deploy command or there’s a manual means that you’re going to use13:04
freemanboss[m]queensly: let's start from here and send the baremetal node list results 13:05
freemanboss[m]Also when you enroll did you see that it's successful?13:05
Ayo[m]freemanboss[m]: > <@freemanboss:matrix.org> queensly: let's start from here and send the baremetal node list results... (full message at <https://matrix.org/oftc/media/v1/media/download/AeRcp3ji1t4lFwssBjnxaeDcPsYeaDOsDRn_TCzsRv6bAeYSLxrrjne-46-6hFA7ROj27-QXmC80hNI61FWYiEdCeWOY6P7gAG1hdHJpeC5vcmcvTVRkVGhyRnlVamRDckNvREhzbkpST1lk>)13:06
queensly[m]Ayo[m]: I mean the file where one will find the images. You need to assign an image to the instance info. /var/lib/ironic/httpboot shows you the various images. In my case, I would need these two for my instance info: deployment_image.qcow2 and 13:07
queensly[m]deployment_image.qcow2-checksum.CHECKSUMS13:07
queensly[m]freemanboss[m]: https://imgur.com/K8yDtJL check this out13:08
freemanboss[m]queensly[m]: > <@queensly:matrix.org> I mean the file where one will find the images. you need to assign an image to the instance info. /var/lib/ironic/httpboot shows you the various images. In my case, I would nee... (full message at <https://matrix.org/oftc/media/v1/media/download/AWaVOMu9eYxHkWG1qbBcZTIUJc2HGIuGOsR_ZDhZ7nvIQ0aSlme-fJ5t9RXxn7VXO3Z7KR7DYkV5sSC6HX6z8eNCeWOZB4qQAG1hdHJpeC5vcmcvc0Ria2lid1F1aFRaaW1vY3NORVNZSFVo>)13:08
Ayo[m]Yep13:08
Ayo[m]That’s why I wanted to know the other means queen was going to use13:09
queensly[m]freemanboss[m]: But when I use the baremetal node show testvm1 command, it shows the instance info is empty {}13:09
Ayo[m]The deploy variable has these steps preconfigured13:09
Ayo[m]queensly[m]: The hardware outlined in the .json file has been provided with the node details13:10
freemanboss[m]queensly[m]: Good. Just run ./bifrost-cli enroll baremetal-inventory.json again 13:11
queensly[m]freemanboss[m]: Alright13:11
queensly[m]queensly[m]: Why enroll and not deploy since the provision state is now "available" 🤔13:13
freemanboss[m]It shouldn't enter the cleaning loop again but if it does lemme know 13:13
freemanboss[m]queensly[m]: Nope that's not it.13:14
freemanboss[m]If you check well, out of the 2 baremetals you only have one enrolled, so the first one, testvm1, will not allow the deployment to work fine13:14
Ayo[m]freemanboss[m]: > <@freemanboss:matrix.org> Nope that's not it.... (full message at <https://matrix.org/oftc/media/v1/media/download/AUK9nwBYp_vD92XdGk1jzNmoOwNa9_msNDpKrS2cG3kVYen27zrlwBBXP_bKTeoQSIiPYHy0ZhfHA13xld1DFmVCeWOZhB9AAG1hdHJpeC5vcmcveHpySGRVVmpCS0F3aENhQ0NwZWlRdWpO>)13:16
queensly[m]freemanboss[m]: Okay. So if I understand you correctly, both have to be in the same state, is that right?13:17
freemanboss[m]queensly[m]: Yes exactly13:17
queensly[m]Ayo[m]: That is what I'm trying to do. I was thinking I could deploy just one. 13:18
Ayo[m]queensly[m]: Just saw the attachments, my bad13:18
freemanboss[m]You previously provided the testvm1 for cleaning mode so the availability is for after cleaning not for after enrollment 13:18
queensly[m]freemanboss[m]: Okay, so does it mean the availability in this stage means available for enrollment? I thought the state of being available meant ready for deployment as I saw here https://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_states.html#:~:text=The%20manage%20action%20can%20be,How%20to%20Contribute13:21
freemanboss[m]Availability means it's ready for the next stage after a just concluded process13:23
* cardoe table flips and then realizes his coffee was on the table.13:24
cardoeThat's how my day feels like it's shaping up to be.13:24
queensly[m]freemanboss[m]: Alright. Does it mean the testvm2 should also be in the same "available" state before i use the ./bifrost-cli enroll baremetal-inventory.json ?13:25
freemanboss[m]queensly[m]: It should be in available mode already.13:26
freemanboss[m]You can try deploying too though if it works fine13:26
TheJuliaheh13:28
TheJuliaMailman day, love it.13:29
queensly[m]<freemanboss[m]> "You can try deploying too though..." <- I had to use the node manage and provide commands to also set testvm2 to "available". Since its state wasn't available, I had an error again when I ran ./bifrost-cli enroll baremetal-inventory.json. After it was set, I used the ./bifrost-cli enroll baremetal-inventory.json command again and had this output. Ironic has successfully deployed an OS onto the14:07
queensly[m]nodes. 14:07
queensly[m]https://imgur.com/QUoJzRQ14:07
AmarachiOrdor[m]queensly: okay thats nice 14:22
AmarachiOrdor[m]That's really good14:22
AmarachiOrdor[m]Glad you were able to resolve it14:22
freemanboss[m]<queensly[m]> "I had to use the node manage..." <- > <@queensly:matrix.org> I had to use the node manage, and provide commands to also set the testvm2 to 'available". Since it's state wasn't available, I had an... (full message at <https://matrix.org/oftc/media/v1/media/download/ARwm5AHomr-rtkWCzmEqRj9Phwhn020ALc84h0IP5uI-W_WEJ0BDnf34SImcgElvcmAh9O5IeI3_rgmgU_4RGf5CeWOddYAAAG1hdHJpeC5vcmcvaFNKR2ZBeXlweUJ6d1hreGZmRWdYc1ls>)14:25
Ayo[m]But why tho14:28
Ayo[m]I ran into the same issue when I tried to deploy the other .json that had node details in it14:30
Ayo[m]Only the inventory.json file works for deployment14:30
queensly[m]<AmarachiOrdor[m]> "Glad you were able to resolve it" <- Yeah. Thanks for the support. 😊14:42
queensly[m]<freemanboss[m]> "> <@queensly:matrix.org> I had..." <- Oh I see. Thanks, the other method I tried to use would have been a long way.  As you mentioned earlier, all those settings had already been done for easy deployment in the testenv.  14:46
freemanboss[m]queensly[m]: Yeah exactly 14:50
queensly[m]<Ayo[m]> "Only the inventory.Json file..." <- Yeah. Thank you for the assistance. I hope you have been able to deploy too. 14:50
cardoeTheJulia: https://bugs.launchpad.net/neutron/+bug/2105855 I made the ask15:13
TheJuliacardoe: hacks you guys did, or hacks existing in the ml2 plugins today ?15:15
TheJuliaI could see cisco needing physical network delineation15:15
cardoeSo we're exploring with our own ml2 mech right now. Currently the servers that sit on two separate fabrics automagically get the same VNIs on both sides. It's wasteful but we've got enough VNIs.15:18
cardoeUsing the network-segment-range neutron extension to create a VNI range and tenants create vxlan based networks.15:19
cardoeWe've set physical_network on the ironic baremetal ports to the HA ToR pair. So let's say cabinet X has a switch pair X-1 and X-2 for regular traffic (which portgroups can be created in as well). The physical_network field on those two ports would be "X". We then create a network-segment-range of type=VLAN and physical_network=X equal to a range of allowed VLANs.15:21
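The arrangement cardoe describes, where a port's physical_network names the switch pair and a VLAN range is scoped to that same physical_network, can be sketched as a lookup plus allocation. This is a plain Python data model for illustration only, not Neutron or networking-baremetal code: `SegmentRange`, `vlan_for_port`, and the numeric range are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentRange:
    """Hypothetical stand-in for a VLAN range tied to one physical network."""
    physical_network: str
    minimum: int
    maximum: int
    allocated: set = field(default_factory=set)

    def allocate(self):
        """Hand out the lowest unused VLAN ID in the range."""
        for vlan in range(self.minimum, self.maximum + 1):
            if vlan not in self.allocated:
                self.allocated.add(vlan)
                return vlan
        raise RuntimeError(
            f"no free VLANs left on physical network {self.physical_network!r}"
        )


def vlan_for_port(port_physical_network, ranges):
    """Pick a VLAN from the range matching the port's physical_network."""
    for segment_range in ranges:
        if segment_range.physical_network == port_physical_network:
            return segment_range.allocate()
    raise LookupError(
        f"no segment range defined for {port_physical_network!r}"
    )
```

For a cabinet "X" with `SegmentRange("X", 100, 101)`, two ports on "X" would receive VLANs 100 and 101 in this sketch, and a port on an undefined physical network would raise `LookupError`.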
TheJuliaI was mainly asking to see if there was a need to provide a bit more context which might need to be added to the RFE15:22
cardoeoh. Well lemme write this up there.15:23
TheJuliaAdds some weight to explain why you've routed around and what you did as well, just so they can frame it15:27
TheJulia(at the end of the day you want them to grok the business value and technical reasons why)15:34
opendevreviewJay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook  https://review.opendev.org/c/openstack/ironic/+/94525915:44
fricklerok, I wasn't sure when to come up with this, but maybe now is as bad as anytime: did anyone ever think about doing nodes with BGP based L3 connectivity? one obvious obstacle is that this is not a concept that neutron currently supports, but maybe it wouldn't be impossible to add, either?16:05
TheJuliafrickler: as in running bgp on the deployed node to provide it that baseline connectivity wired into the existing neutron fabric?16:07
opendevreviewMatt Anson proposed openstack/ironic-python-agent-builder master: Don't install biosdevname in arm64/aarch64 arches  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/94605716:07
fricklerTheJulia: first step might even be outside of neutron, but yes, providing basic connectivity to the node, via IPv6 unnumbered BGP ideally16:09
TheJuliaSo the individual deploying is also the infrastructure operator then ?16:11
frickleryes, in the easy case. but with some BGP filtering on the switches it might also support tenants. I haven't put too much brains yet into how to actually represent that in neutron16:12
TheJuliaso, each switch would need to be configured with the necessary configuration which would then need to be baked to the deployed node16:12
TheJuliaI think that is fundamental layer violation of expectation but only works when the deployer of the baremetal node is also the overall infrastructure operator and I *suspect* supporting other cases would be injection or need something to facilitate that configuration modeling and matching.16:14
fricklerwell (except for the filtering) the switch config wouldn't need to be node specific, so it could be preconfigured for all nodes that use this connectivity type16:14
TheJuliawouldn't you want credentials to manage the session16:14
TheJuliayes, you could do some level of ingress filtering on the session on the switch16:14
fricklerI don't believe much in md5 for bgp, but that's another topic. my current switch config would just look like this https://paste.opendev.org/show/bTPdzRsLveMD7gaY7Dw3/16:17
TheJuliaI'm not sure how that would map through since it seems like you'd need to establish a session to advertise the address space as the baremetal node. I guess I'd need both sides in my mind, largely because my BGP experience is largely rooted in routing upfront, not tunneling16:20
frickleryes, I'm thinking of pure L3 routing, nothing fancy as cardoe has in mind16:21
fricklerthe node would somehow need to know its loopback addresses and its ASN, that's all16:22
fricklerwhich would be part of network_info if neutron knew how to deal with that, but could be just user_data for now. and the switch would likely only announce default routes to the node16:23
fricklerbtw, bonus points for supporting the same setup in the IPA stage already, so the switch doesn't need reconfiguration during deployment16:24
opendevreviewJulia Kreger proposed openstack/ironic-specs master: WIP: Trait based port selection and dynamic portgroups  https://review.opendev.org/c/openstack/ironic-specs/+/94564216:25
fricklerfwiw, my current solution is to create a custom ipa + deploy image per node, next up would be trying to use the config-drive16:25
TheJuliadoable, I guess. What is the risk if someone guesses another private asn number available in the fabric? Thinking malicious user compromises a whole baremetal node16:30
JayFYeah this is not a config that could be multitenant safe unless you're doing some sort of filtering by physical port16:31
TheJuliaWell, something like this is super powerful if you want to do something like... run a root dns zone16:32
* TheJulia literally knows folks who did similar, but manually deployed hosts, for some of the root servers16:32
JayFoh, I'm not saying it's not possible, good, or we shouldn't do it16:32
JayFjust that it's not multitenant-friendly16:32
TheJuliayeah16:32
frickleryes, the multitenant case would include strict inbound filters on the switch for only the nodes' assigned addresses. both in BGP and as L3 ACL to avoid spoofing16:38
fricklerbut for now my usecase is just to deploy things as operator (to then run something like an overcloud on those nodes)16:39
fricklerfor now my question also only is whether this is a scenario that others might be interested in or whether I'm a weirdo (I don't expect that to be exclusive, you may ignore the latter part :-D)16:41
cardoeWell I updated my bug with what I would have answered Julia with16:42
JayFfrickler: I mean, the question is that but also how to arrange the separation of responsibilities :)16:42
JayFwhere does the ironic/neutron intervention end, and the computer start16:42
JayFhaving to bake in so much logic to the images is concerning to the idea of it being reproducible by others16:43
JayFit's almost like to do this ideally, we'd need a way to express BGP in network_data.json16:43
JayFthen get glean/cloud-init/etc to do the magic to make it work for each16:43
frickleroh, I do have a dib element that does this :)16:43
TheJuliaJayF: That is kind of the only way I could see it really be viable in a generic sense16:43
JayFYeah, but you need N images for N nodes, yeah?16:43
TheJuliacardoe: much appreciated16:44
JayFThat's not a pattern I'd be happy to endorse for most folks.16:44
cardoeI've also come up with another RFE that I dunno how it will fly. But the ability to filter mechanisms from receiving certain networks / ports.16:45
fricklercurrently yes. if I can use user-data, then only 1 per ... well per switch pair likely. which is where the similarity to cardoe's scenario comes into play16:45
frickleralthough the information the node needs to have about the switch could actually also be user data16:45
fricklercardoe: "filter mechanisms from receiving" like from the neutron API or where?16:46
cardoeWell now let's say I want OVN involved in some of my networks but not all.16:46
frickleroh, so you want neutron to use different backends in parallel? I think that that should be possible today16:47
frickleror do you want a noop backend for those other networks?16:48
cardoeNo different backends16:48
fricklerthen what should neutron do with those other networks, if their backend actually is ovn?16:49
cardoeWell some networks would be networking-generic-switch/networking-baremetal and some could be OVN16:52
JayFthat is a design I may have to embrace as well at some point16:54
JayFalthough today it'd be more s/OVN/OVS/ but I don't know how long that'll last16:54
fricklerI never looked at the former so far, but isn't that implemented as another backend? or how is n-g-s tied to neutron?16:54
JayFngs is an ml2 driver, it configures switches in "generic" ways, e.g. ssh commands16:55
cardoeI just figured I'd embrace OVN now since that's where things are trending. I've got no dog in the fight.16:57
JayFI don't think that's wrong for a new system16:57
cardoefrickler: I updated https://bugs.launchpad.net/neutron/+bug/2105855 with some more detailed examples.16:57
cardoeEssentially our custom mechanism I hope to marry up with an "NGS 2.0"16:58
cardoeWe're doing things very similar to NGS 2.0 but have a lot more operations to consider due to VNIs and VLANs and L2 gateways and blah.17:04
cardoeAll of which are things that other mechanisms like the Cisco, Juniper and Arista one support.17:05
cardoeCurrently it's written as an external service and each switch is bound to an agent to prevent multiple writers at the same time.17:06
cardoeBut we're gonna rework this as a neutron mech agent17:07
JayFwhat is "NGS 2.0"?17:07
TheJuliaSomething cardoe is cooking up downstream17:07
cardoeMy mythical hope of finding the commonality between what we're doing and what requests others have had around NGS become.17:07
TheJuliaAt least, that is my guess17:07
TheJuliaMaybe, be named something else :)17:08
JayF2.0 implies some massive reworking; I've only seen suggestions of incremental progress17:08
JayFwhich, to be clear, is my preference -- incremental improvement works :D 17:08
fricklerah, so I think backend was the wrong term, mechanism driver is what it really is17:08
fricklerif that is what you meant with "mechanisms" above, that would explain a lot ;)17:09
cardoeYes.17:09
frickleror maybe I just should've read that bug17:10
cardoeJayF: okay... 2026.1 :-D17:10
JayFfrickler: I am ... very bad about misusing terminology :)17:10
cardoeI'd just like to find the commonality between what we're doing and what others are doing or asking for and land that upstream.17:11
cardoeThat's what I was referring to as 2.017:12
cardoeSorry for the bad terminology.17:12
JayFhonestly I just worry that I had missed something bigger lol17:12
JayFlike the whole etcd locking stuff for ngs seemed to pop outta nowhere 17:12
TheJuliaI'm less convinced its really the needful at this point17:14
cardoedoesn't it already use it?17:16
TheJuliaIt is available, the code uses it for device locking *and* command pooling across multiple threads17:16
TheJuliain order to prevent concurrent writers and try to streamline changes17:16
TheJuliaI'm semi-convinced at this point that the runtime in neutron model is part of the problem there17:16
TheJuliaas opposed to a clean disjoint17:17
cardoeCurious what you mean about the neutron model issue17:18
TheJuliaso neutron basically owns the transaction, and you're trying to do it within the neutron api process's runtime17:19
TheJuliaso you're trying to do it basically on something which is behind a load balancer as well17:19
cardoeyeah it kinda needed to be a request with a job ID handed back17:21
cardoelike nova migration (which doesn't actually give you a job ID back)17:21
TheJuliaexactly17:21
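The "request with a job ID handed back" idea can be sketched with the standard library: the caller gets an identifier immediately and polls for the outcome, instead of the API process owning the whole transaction. This is a generic illustration, not Neutron or networking-generic-switch code; `JobRunner` and its methods are hypothetical names, and a single worker is used only to mimic one writer at a time.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobRunner:
    """Hypothetical runner: submit work, get a job ID, poll for the result."""

    def __init__(self, max_workers=1):
        # One worker serializes jobs, so only one "writer" runs at a time.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, func, *args):
        """Queue the work and return a job ID without waiting for it."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id):
        """Report 'pending', 'done', or 'failed' for a known job ID."""
        future = self._jobs[job_id]
        if not future.done():
            return "pending"
        return "failed" if future.exception() else "done"

    def result(self, job_id, timeout=None):
        """Block until the job finishes and return (or re-raise) its outcome."""
        return self._jobs[job_id].result(timeout=timeout)
```

A caller would keep the ID returned by `submit` and check `status` later, which decouples the slow switch-side work from the request that triggered it.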
TheJuliakind of what I wanted to solve with mercury was to disjoint the interaction/runner of action to beyond the line of control so you don't have to own/manage both.17:26
fricklerso you'd need neutron-conductor rather than the API doing RPC/other stuff directly?17:26
JayFSomething like that; this is another instance of what Ironic hits with many openstack services: the real world is slower and messier than the virtual one :)17:27
TheJuliaNeutron-conductor is just abstracting and trading problems17:27
TheJuliaThe real world messy issue we hear time and time again "my network team won't let me manage switch config"17:27
TheJuliait's a lack of trust issue, and how you solve that is by building trust17:28
frickleryes, that's also why I want to have a static switch config in my BGP scenario17:28
TheJuliaAnother operator who is in this very room "network team just won't provide credentials."17:28
TheJuliaAnd ultimately, there are many different needs and requirements17:29
TheJuliait's not about just IP traffic, even though in a classic cloud sense, that is all you need/want17:29
JayFlest we forget about the other shift that we've talked about that still is bubbling around: having DPUs do some of this work17:30
TheJuliaYeah, I've got folks who are now framing that as the ultimate cure-all17:30
JayFthere are a myriad of directions networking designs fork into at this state in our maturity, and I just don't think there's one answer for  everyone17:30
TheJulia"wait until you realize people can't just use a single dpu to meet their operational requirements"17:30
JayFTheJulia: I'm *extremely* concerned about security implications there17:30
TheJuliaJayF: agree 100%17:30
TheJuliaJayF: likewise, do you remember the giant warning I demanded on smartnic support?17:31
JayFTheJulia: to rely on DPU to be the security arbiter17:31
TheJuliayup, and in their benefit, Mellanox/Nvidia rightfully took a lot of our points and went back to their product designs and improved them17:32
JayFI think it's still safe to say you can't achieve a level of security with the DPU as the firewall that you can when you have a switch as a firewall17:32
TheJuliaOthers... not so much or just haven't gotten to that point in their product journey17:33
JayFhonestly even in an ideal world, it makes me nervous to tie together security+performance17:34
JayFbecause what happens if I need DPU firmware N+1 for security patches, but then that hurts network performance?17:34
JayFI wanna avoid that particular vice.17:34
TheJuliayeah, defense is always in depth17:34
TheJuliayou can't expect a single layer to be impervious17:34
* JayF has been burned by beta-quality switch security technologies in the past17:34
JayFcardoe: do you know ^^^ the story behind that?17:34
JayFcardoe: you can probably look it up in core :)17:35
cardoeI don't. I'm scared now17:35
JayFit was all OMv1 stuff. A security bug in the cisco security stuff which allowed tenants to see traffic from other tenants in some cases17:35
JayFand the person who reported it, well the case was maximum bad17:36
TheJuliaso I think the key takeaway is to embrace many models, and have giant warnings next to the appropriate risks17:40
JayFI think in order to fully secure a server, you need to disconnect it from all external power and network sources, and take it to a deserted tropical island paradise. I'm going to start researching this solution ;) 17:41
* TheJulia has an example warning being generated by an AI but it is thinking very hard17:42
TheJuliaJayF: Don't forget the concrete box17:43
JayFhey, that whole ironic.cmd -> ironic.command move was zero edit AI change  from cursor IDE17:43
JayFI was kinda impressed17:43
JayFI'm skeptical of 'vibe coding' as a thing, but integrating it into tools seems like it can help17:43
TheJuliaso did you just say "rename this module" ?17:44
JayFyeah17:44
JayFand it went step by step, explaining all the changes it made17:44
JayFI approved them, did some grepping to ensure it didn't miss anything17:44
JayFI asked it similarly to write unit tests for my runbooks-as-automated-cleaning change; less impressive overall but not worthless17:45
JayFI will say *cursor specifically* has been massively more capable than any copilot or jetbrains ai assistant I've tried17:45
TheJulianeat17:51
JayFThe rest of the chat was commands it was asking permission to run (mv cmd command) and diffs for me to approve around the codebase, including docs references. https://usercontent.irccloud-cdn.com/file/8ebxnWQa/image.png17:55
JayFI think this weekend, I'm going to take sushy, sushy-tools, and https://github.com/sipeed/NanoKVM and see how far I can get it to go in writing a redfish API service for this device17:57
TheJuliaso, when the board created the AI contribution policy, we explicitly sought folks to provide a bit of clarity behind how they got to that point in addition to what was used. That at least provides clarity to the comment I was seeking an answer to for that change, thanks!17:57
TheJuliaWe've clearly left the realm of "AI is just whipping content up" to "AI is a do-er"17:58
JayFLike, other products I've used/tried were a combo of a chat box with a separate coding assistant which was incapable of grokking more than a few lines of context17:58
JayFcursor being able to see the context of the whole repo you have loaded into the IDE is a massive game changer tbh17:58
JayFAIUI claude-code is similarly powerful, maybe even moreso -- a report I got from someone I trust a lot who managed to get it to do some crazy stuff is part of why I'm heading back down this path17:59
TheJuliaso... I guess we need to whip up our ptg schedule?18:03
* TheJulia gets out the PTG-Aid... (which is really a KitchenAid with a post-it note)18:12
TheJuliarpittau: I themed days together on the etherpad, not entirely a schedule but bucketed topics to help guide discussion.18:40
opendevreviewJay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook  https://review.opendev.org/c/openstack/ironic/+/94525920:34
cardoeAny reason why networking-baremetal fails tests always?21:05
cardoeLike should I figure out why or we know about it21:05
TheJuliacardoe: it should pass, but It would help to understand what is going on to frame the discussion21:05
TheJulia(a link would help)21:05
cardoeBeen trying to run this one https://review.opendev.org/c/openstack/networking-baremetal/+/94581821:06
cardoebaremetal multi-tenancy trunk fails with "Details: {'type': 'HTTPNotFound', 'message': 'The resource could not be found.', 'detail': ''}"21:06
TheJuliaFinally got my AI image about "WARNING: this feature may..." https://usercontent.irccloud-cdn.com/file/Okh5NSja/warning-this-feature-may.png21:06
TheJuliahmm.. what is going on with that job21:15
TheJuliaOh21:16
TheJuliait's the new test which was added21:16
TheJuliahttps://2b2204763b167555199d-100ff86b174195291db1bc21f098c4ef.ssl.cf1.rackcdn.com/openstack/d106f78853ce427e9259903035d75a06/testr_results.html21:16
TheJuliaThis is why we need to be stupidly careful about new tests in the tempest plugin21:16
JayFespecially scenario tests, really21:17
JayFI think we should have more straight API tests (as evidenced by the giant PR from Adam)21:17
TheJuliayeah, the test is trying to bind the trunk together and it blows up because it is not part of the test21:30
TheJuliaI'll switch laptops and just disable it since the feature is not loaded on that neutron21:30
opendevreviewJay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook  https://review.opendev.org/c/openstack/ironic/+/94525921:34
TheJuliasigh, no knob21:37
opendevreviewJay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook  https://review.opendev.org/c/openstack/ironic/+/94525921:42
* JayF hammers more on this in devstack21:43
JayFit's been a while since I wrote a feature that was this close to the RPC boundary, having to relearn some stuff21:43
opendevreviewJay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook  https://review.opendev.org/c/openstack/ironic/+/94525921:45
TheJuliaI see what is going on21:47
TheJuliathe ngs plugin has been changed to assume all usage of the plugin has trunk usage enabled21:47
TheJuliaThat is a flawed assumption as far as I'm concerned21:49
JayFyeah21:50
TheJulialooks like vasyl's devstack change sort of does the right thing21:50
TheJuliaso...21:50
* TheJulia fixes stuffs21:50
JayFApr 01 21:48:25 devstack-20250401 ironic-conductor[129213]: WARNING oslo_db.sqlalchemy.exc_filters [None req-8b2699fd-508d-4bf3-9e75-d7f324ba74ea None None] DB exception wrapped.: TypeError: Object of type datetime is not JSON serializable21:50
TheJuliarutro21:51
JayFit looks like things are blowing up while trying to update updated_at!?21:51
TheJuliawhere did you get that at?!21:51
JayFhttps://review.opendev.org/c/openstack/ironic/+/94525921:51
JayFtrying to get a runbook cleaning to work with this21:51
JayFApr 01 21:48:25 devstack-20250401 ironic-conductor[129213]: ERROR oslo_db.sqlalchemy.exc_filters oslo_db.exception.DBError: (builtins.TypeError) Object of type datetime is not JSON serializable21:51
JayFApr 01 21:48:25 devstack-20250401 ironic-conductor[129213]: ERROR oslo_db.sqlalchemy.exc_filters [SQL: UPDATE nodes SET driver_internal_info=%(driver_internal_info)s, updated_at=%(updated_at)s WHERE nodes.id = %(nodes_id)s]21:51
JayFobviously maximal wtf21:52
JayFbut I musta screwed up something21:52
JayFthe traceback is so obscured there's not even any ironic code in it :| 21:53
TheJuliaeek :\21:55
JayFwow the error is so obscured because it spirals trying to write to node.last_error21:55
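The TypeError in the pasted log can be reproduced outside Ironic. The sketch below is only an illustration, assuming the value being saved into a JSON-encoded column such as driver_internal_info still contains a datetime object; the step dict and the try_serialize helper are hypothetical and are not taken from the patch under review.

```python
import json
from datetime import datetime, timezone


def try_serialize(value):
    """Return (True, json_text) on success or (False, error_message) on failure."""
    try:
        return True, json.dumps(value)
    except TypeError as exc:
        return False, str(exc)


# Hypothetical clean step that still carries a timestamp from its source object.
step_with_timestamp = {
    "interface": "deploy",
    "step": "erase_devices_express",
    "args": {},
    "order": 1,
    "updated_at": datetime(2025, 4, 1, 21, 48, 25, tzinfo=timezone.utc),
}

# The standard json encoder has no default handling for datetime values,
# so serializing the structure fails with a TypeError.
ok, detail = try_serialize({"clean_steps": [step_with_timestamp]})
print(ok, detail)
```

Because the failure happens while the column value is being encoded, the message points at the serializer rather than at the code that put the datetime into the structure, which fits the traceback containing no Ironic frames.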
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: CI: Fix trunks enabled by default  https://review.opendev.org/c/openstack/networking-generic-switch/+/94608921:56
TheJuliaThat should fix it21:56
TheJuliaJayF: bravo? :)21:57
JayFwhere do I pick up my "good job breaking ironic so well" aware21:57
JayF*award21:57
JayFnot quite a cve, more of an lolsob21:57
TheJuliado I need to have coins minted?21:58
JayFthis error could happen21:59
JayFif I sent over something not-RPC-encoded properly by accident (?)21:59
JayFhmmm21:59
JayFhttps://review.opendev.org/c/openstack/ironic/+/945259/9/ironic/conductor/manager.py#1182 I suspect I need to do /something/ there to make it happy22:00
JayFcid: ^ if you're around, got any ideas?22:03
TheJuliaif I'm recalling correctly, clean steps get kicked back to the rpc bus22:06
TheJuliaso the task is ultimately released and picked up22:06
JayFmy hunch is the clean_steps get passed into do_node_clean22:06
JayFand when it tries to save the clean steps onto the object22:06
JayFboom because I haven't encoded them properly, or something like that22:06
JayFmanual cleaning might be a good place to look for stuff in this realm22:07
TheJuliapossibly, I suspect you'll need to add tons of extra logging to follow out where exactly we stop logging22:07
JayFI'm doing that now but it always feels like I'm losing at the mental game when I can't logic out the bug22:11
TheJuliaperhaps a sign to call it a day then and give it a fresh brain in the morning ?22:11
JayFne None] JAY: ABOUT TO SAVE NODE IN DO_NODE_CLEAN {{(pid=130998) do_node_clean /opt/stack/ironic/ironic/conductor/cleaning.py:86}}22:11
JayFnope, it's a sign that I knew exactly what was going wrong22:11
JayFhuzzah but also sad because I don't know how to fix it LOL22:11
TheJuliaa fresh brain :)22:16
JayFI think I have a solution!!22:17
JayFcid's code led me down the right path22:19
TheJulia\o/22:20
JayFhttps://opendev.org/openstack/ironic/src/branch/master/ironic/api/controllers/v1/utils.py#L160022:20
JayFI've been trying to answer the ? of why this method exists on the API side22:20
JayFnow I know: that effectively eliminates the datetime fields (e.g. created_at, updated_at, and friends)22:20
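A minimal sketch of the idea described above: drop datetime-valued fields (created_at, updated_at, and friends) from each step before it is stored in a JSON field. The helper name strip_datetime_fields and the sample runbook step are assumptions for illustration only; this is not the Ironic API method linked above and may differ from how it actually works.

```python
import json
from datetime import datetime, timezone


def strip_datetime_fields(step):
    """Return a copy of a step dict without any datetime-valued entries.

    Hypothetical helper, not Ironic code.
    """
    return {
        key: value
        for key, value in step.items()
        if not isinstance(value, datetime)
    }


# Hypothetical step as it might look when read from a stored runbook.
runbook_step = {
    "interface": "deploy",
    "step": "erase_devices_express",
    "args": {},
    "order": 1,
    "created_at": datetime(2025, 4, 1, 20, 0, 0, tzinfo=timezone.utc),
    "updated_at": None,
}

clean_steps = [strip_datetime_fields(runbook_step)]

# With the datetime removed, the structure encodes to JSON without error.
encoded = json.dumps({"clean_steps": clean_steps})
print(encoded)
```

Only values that are actual datetime objects are dropped here, so a None timestamp survives as null; filtering by field name instead would be an equally plausible design.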
opendevreviewJay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook  https://review.opendev.org/c/openstack/ironic/+/94525922:21
JayF| driver_internal_info   | {'clean_steps': [{'interface': 'deploy', 'step': 'erase_devices_express', 'args': {}, 'order': 1}],              |22:23
JayFit's woooorrrrkkkiiinnnnggggg22:23
JayFI can't leave something that broken at my EOD; I'll lose all context if I forget, and if I remember I'll dread it in the morning ;) 22:24
TheJuliawoot!22:26
JayFI will note it will make my development life easier, much easier, if we can land https://review.opendev.org/c/openstack/ironic/+/94599922:31
cid\o/22:36
opendevreviewcid proposed openstack/ironic-python-agent master: WIP: eventlet wsgi to Gunicorn  https://review.opendev.org/c/openstack/ironic-python-agent/+/94609123:00
opendevreviewJay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook  https://review.opendev.org/c/openstack/ironic/+/94525923:15
opendevreviewJay Faulkner proposed openstack/ironic master: Rename ironic cmd module  https://review.opendev.org/c/openstack/ironic/+/94599923:22
