opendevreview | Merged openstack/ironic master: Fix passing vtep fields to neutron https://review.opendev.org/c/openstack/ironic/+/945964 | 01:09 |
---|---|---|
opendevreview | Lennart Jern proposed openstack/sushy-tools master: Add config option SUSHY_EMULATOR_STORAGE_POOL https://review.opendev.org/c/openstack/sushy-tools/+/945959 | 05:59 |
Amarachi_O | Good Morning Ironic, wishing you all a great day! | 06:45 |
rpittau | good morning ironic! o/ | 07:01 |
ayo_ | Good morning rpittau | 07:03 |
freemanboss[m] | Good morning guys | 07:18 |
opendevreview | Vasyl Saienko proposed openstack/ironic master: [devstack] Allow deploy environment with portgroups https://review.opendev.org/c/openstack/ironic/+/940611 | 08:01 |
AmarachiOrdor[m] | Happy New Month Everyone! | 09:01 |
Ayo[m] | Happy new month🥳 | 09:02 |
jssfr | back in the day, it was Happy Mailman Day | 09:09 |
AmarachiOrdor[m] | That's interesting, why though ? | 09:16 |
jssfr | because Mailman sent reminders about your current subscriptions on the first of each month :) | 09:28 |
freemanboss[m] | @outreachy applicants I hope we are keeping close attention to the project page on outreachy website? | 11:16 |
freemanboss[m] | Just noticed our project has been updated let's go through the update | 11:16 |
AmarachiOrdor[m] | Yeah I noticed that too, thank you so much this update Freeman Boss | 11:17 |
freemanboss[m] | AmarachiOrdor[m]: Great. | 11:18 |
freemanboss[m] | You're welcome | 11:18 |
queensly[m] | <freemanboss[m]> "First thing to avoid unnecessary..." <- Freeman Boss: Amarachi_O I had to restart this whole process, I have reached the enrollment stage but I see this... (full message at <https://matrix.org/oftc/media/v1/media/download/ASDBsst36jxM-wj0PF8HW3laQJdRa1-ZDwPvCH904aw1wBN7cGmtS-3RZTIDI-l5ZbJ-0s2s-mkEBU76i6KSa8RCeWOUrpPwAG1hdHJpeC5vcmcvamJ0cENKY3NFRFRRZkRiZmpCS1FQcmR3>) | 11:52 |
AmarachiOrdor[m] | I think the nodes already exist try doing baremetal node list to see if they are powered off | 11:53 |
AmarachiOrdor[m] | AmarachiOrdor[m]: queensly: | 11:54 |
queensly[m] | Alright. I did that , there are two nodes , and their power state is none. | 11:55 |
queensly[m] | The provisioning state is enroll | 11:56 |
Ayo[m] | Is this after using deploy? (Provisioning state) | 11:57 |
queensly[m] | Ayo[m]: No after using enroll | 11:57 |
Ayo[m] | Can you attach a screenshot? | 11:57 |
queensly[m] | Ayo[m]: Please can you see this? https://imgur.com/UytD2jf | 11:59 |
Ayo[m] | Use the baremetal node list command | 12:00 |
Ayo[m] | I can’t really tell the state of what you enrolled from this output | 12:01 |
queensly[m] | Ayo[m]: Yeah. Check this : https://imgur.com/K8rhzEm | 12:01 |
Ayo[m] | The nodes have been enrolled | 12:02 |
AmarachiOrdor[m] | So what I will advise is that you use baremetal --help to get different baremetal functions, in my situation my power state was off and I used baremetal power on testvm1 to switch it on | 12:02 |
queensly[m] | AmarachiOrdor[m]: Alright. Let me do this and give you feedback. | 12:03 |
queensly[m] | queensly[m]: the power state of testvm1 is on now. Amarachi Ordor | 12:06 |
AmarachiOrdor[m] | What does the provision state say | 12:07 |
queensly[m] | AmarachiOrdor[m]: it's showing as "enroll" | 12:08 |
* AmarachiOrdor[m] uploaded an image: (3988KiB) < https://matrix.org/oftc/media/v1/media/download/AfI7Rz0NDzBLzcM3nGPs21JBtVmwmf2K7pldxoKX0OsJMXWpcFTfWTaT8gtfSUDJg5yH4sYgR0XnYKlHuYU-uuNCeWOV-H2QAG1hdHJpeC5vcmcvVm5zVkt4S0p2WGpmTHJic2ZkbkpPZXBK > | 12:14 | |
queensly[m] | I used the command baremetal node deploy testvm1 but had this output . It shows that since the state is in "enroll" I can't perform the deploy action. | 12:14 |
queensly[m] | https://imgur.com/9fiatBf | 12:14 |
queensly[m] | Using the "node manage" command, the provision state is now "verifying" . Do I have to wait for a while or continue with "node provide"? | 12:18 |
AmarachiOrdor[m] | Honestly I don't know but if you wait a while and it doesn't change you can try node provide | 12:19 |
queensly[m] | AmarachiOrdor[m]: Alright, no problem. The provision state is now "manageable" so I will try the node provide. | 12:20 |
queensly[m] | <queensly[m]> "Alright, no problem. The..." <- After using the "node provide" command, the provision state changed to cleaning. After some time, I checked again and the state is "available" . From the documentation https://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_states.html#:~:text=The%20manage%20action%20can%20be,How%20to%20Contribute | 12:28 |
queensly[m] | this is the last step before deployment. I will deploy now and check the output. | 12:28 |
AmarachiOrdor[m] | queensly: nice nice, let me know how it hours | 12:29 |
AmarachiOrdor[m] | s/hours/goes/ | 12:29 |
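The sequence queensly walked through (enroll, then `node manage`, then `node provide`, then deploy) can be sketched as a small state table. This is a simplified model limited to the states and verbs named in the conversation, not Ironic's actual state machine code; the `TRANSITIONS` table and both helper functions are hypothetical names for illustration.

```python
# Simplified subset of Ironic's provision states, limited to what the
# conversation mentions. Not Ironic code; names here are illustrative.
# (stable state, verb) -> (transient state, resulting stable state)
TRANSITIONS = {
    ("enroll", "manage"): ("verifying", "manageable"),
    ("manageable", "provide"): ("cleaning", "available"),
    ("available", "deploy"): ("deploying", "active"),
}


def apply_verb(state: str, verb: str) -> str:
    """Return the stable state reached after a successful transition."""
    try:
        _transient, target = TRANSITIONS[(state, verb)]
    except KeyError:
        raise ValueError(f"verb {verb!r} is not allowed in state {state!r}") from None
    return target


def path_to_active(state: str) -> list:
    """List the verbs needed to take a node from `state` to active."""
    verbs = []
    while state != "active":
        verb = next((v for (s, v) in TRANSITIONS if s == state), None)
        if verb is None:
            raise ValueError(f"no known path from state {state!r}")
        verbs.append(verb)
        state = apply_verb(state, verb)
    return verbs


steps = path_to_active("enroll")
```

In this model `apply_verb("enroll", "deploy")` raises, which mirrors the refusal queensly saw when deploying a node still in `enroll`.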
opendevreview | Harald Jensås proposed openstack/sushy-tools master: os-vmedia: Add option to delay rebuild on eject https://review.opendev.org/c/openstack/sushy-tools/+/945800 | 12:43 |
freemanboss[m] | <queensly[m]> "Freeman Boss: Amarachi_O I..." <- > <@queensly:matrix.org> Freeman Boss: Amarachi_O I had to restart this whole process, I have reached the enrollment stage but I see this... (full message at <https://matrix.org/oftc/media/v1/media/download/AWqC_RUyy_hDOs0M_Cp1BBcv4mSqYSQgn9XT95u6d1U5xgp4W0c5sW63qjETqy6ZhLa2vqbPQJ5og7QnCrpWP8NCeWOYMmiwAG1hdHJpeC5vcmcvV25XTEticFlpVHBiRGtublRxRlVaRk9l>) | 12:53 |
freemanboss[m] | queensly: everything working fine now? | 12:55 |
freemanboss[m] | * finish and when it retried | 12:59 |
queensly[m] | freemanboss[m]: I tried to deploy and had this error: | 12:59 |
queensly[m] | Failed to validate deploy or power info for the node testvm1 : Node testvm1 failed to validate deploy image info. some parameters were missing. Missing are [instance_info.image_source] (HTTP 400) | 12:59 |
queensly[m] | I have found the image source file and currently working on assigning an image to the instance_info since it showed empty. | 12:59 |
freemanboss[m] | send the result of baremetal node list | 13:00 |
queensly[m] | queensly[m]: I want to see how this will turn out and give you the feedback. | 13:00 |
queensly[m] | queensly[m]: I am also open to your contributions. | 13:01 |
Ayo[m] | queensly[m]: When you say that you’ve found the source file, what do you mean please | 13:02 |
Ayo[m] | The data for the source files were pre installed during the installation stage when working on our cli | 13:03 |
Ayo[m] | So are you using the deploy command or there’s a manual means that you’re going to use | 13:04 |
freemanboss[m] | queensly: let's start from here and send the baremetal node list results | 13:05 |
freemanboss[m] | Also when you enroll did you see that it's successful? | 13:05 |
Ayo[m] | freemanboss[m]: > <@freemanboss:matrix.org> queensly: let's start from here and send the baremetal node list results... (full message at <https://matrix.org/oftc/media/v1/media/download/AeRcp3ji1t4lFwssBjnxaeDcPsYeaDOsDRn_TCzsRv6bAeYSLxrrjne-46-6hFA7ROj27-QXmC80hNI61FWYiEdCeWOY6P7gAG1hdHJpeC5vcmcvTVRkVGhyRnlVamRDckNvREhzbkpST1lk>) | 13:06 |
queensly[m] | Ayo[m]: I mean the file where one will find the images. You need to assign an image to the instance info. /var/lib/ironic/httpboot shows you the various images. In my case, I would need these two for my instance info: deployment_image.qcow2 and deployment_image.qcow2-checksum.CHECKSUMS | 13:07 |
queensly[m] | freemanboss[m]: https://imgur.com/K8yDtJL check this out | 13:08 |
freemanboss[m] | queensly[m]: > <@queensly:matrix.org> I mean the file where one will find the images. you need to assign an image to the instance info. /var/lib/ironic/httpboot shows you the various images. In my case, I would nee... (full message at <https://matrix.org/oftc/media/v1/media/download/AWaVOMu9eYxHkWG1qbBcZTIUJc2HGIuGOsR_ZDhZ7nvIQ0aSlme-fJ5t9RXxn7VXO3Z7KR7DYkV5sSC6HX6z8eNCeWOZB4qQAG1hdHJpeC5vcmcvc0Ria2lid1F1aFRaaW1vY3NORVNZSFVo>) | 13:08 |
Ayo[m] | Yep | 13:08 |
Ayo[m] | That’s why I wanted to know the other means queen was going to use | 13:09 |
queensly[m] | freemanboss[m]: But when I use the baremetal node show testvm1 command, it shows the instance info is empty {} | 13:09 |
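The validation failure and the empty `instance_info` fit together: the deploy check looks for an image source on the node and finds none. A minimal sketch of that kind of check follows, assuming the parameter is `image_source` and that it is the only required key (the error above names only that one). The function, the `REQUIRED_INSTANCE_INFO` tuple, and the example URL are hypothetical and are not Ironic's implementation.

```python
# Assumption: only the key named in the error message is required here.
REQUIRED_INSTANCE_INFO = ("image_source",)


def missing_instance_info(node: dict) -> list:
    """Return the dotted names of required instance_info keys that are unset."""
    instance_info = node.get("instance_info") or {}
    return [
        f"instance_info.{key}"
        for key in REQUIRED_INSTANCE_INFO
        if not instance_info.get(key)
    ]


# A node as described in the chat: enrolled, but with instance_info == {}.
empty_node = {"name": "testvm1", "instance_info": {}}

# The same node once an image has been assigned (placeholder URL).
ready_node = {
    "name": "testvm1",
    "instance_info": {
        "image_source": "http://example.invalid/deployment_image.qcow2",
    },
}

missing_before = missing_instance_info(empty_node)
missing_after = missing_instance_info(ready_node)
```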
Ayo[m] | The deploy variable have these steps preconfigured | 13:09 |
Ayo[m] | queensly[m]: The hardware, outlines in the .json file has been provided with the node details | 13:10 |
freemanboss[m] | queensly[m]: Good. Just run ./bifrost-cli enroll baremetal-inventory.json again | 13:11 |
queensly[m] | freemanboss[m]: Alright | 13:11 |
queensly[m] | queensly[m]: Why enroll and not deploy since the provision state is now "available" 🤔 | 13:13 |
freemanboss[m] | It shouldn't enter the cleaning loop again but if it does lemme know | 13:13 |
freemanboss[m] | queensly[m]: Nope that's not it. | 13:14 |
freemanboss[m] | If you check well, out of the 2 baremetals you only have one enrolled so the first one testvm1 will not allow the deployment to work fine | 13:14 |
Ayo[m] | freemanboss[m]: > <@freemanboss:matrix.org> Nope that's not it.... (full message at <https://matrix.org/oftc/media/v1/media/download/AUK9nwBYp_vD92XdGk1jzNmoOwNa9_msNDpKrS2cG3kVYen27zrlwBBXP_bKTeoQSIiPYHy0ZhfHA13xld1DFmVCeWOZhB9AAG1hdHJpeC5vcmcveHpySGRVVmpCS0F3aENhQ0NwZWlRdWpO>) | 13:16 |
queensly[m] | freemanboss[m]: Okay. So if I understand you correctly, both have to be in the same state, is that right? | 13:17 |
freemanboss[m] | queensly[m]: Yes exactly | 13:17 |
queensly[m] | Ayo[m]: That is what I'm trying to do. I was thinking I could deploy just one. | 13:18 |
Ayo[m] | queensly[m]: Just saw the attachments, my bad | 13:18 |
freemanboss[m] | You previously provided the testvm1 for cleaning mode so the availability is for after cleaning not for after enrollment | 13:18 |
queensly[m] | freemanboss[m]: Okay, so does it mean the availability in this stage means available for enrollment? I thought the state of being available meant ready for deployment as I saw here https://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_states.html#:~:text=The%20manage%20action%20can%20be,How%20to%20Contribute | 13:21 |
freemanboss[m] | Availability means it's ready for the next stage after a just concluded process | 13:23 |
* cardoe table flips and then realizes his coffee was on the table. | 13:24 | |
cardoe | That's how my day feels like it's shaping up to be. | 13:24 |
queensly[m] | freemanboss[m]: Alright. Does it mean the testvm2 should also be in the same "available" state before i use the ./bifrost-cli enroll baremetal-inventory.json ? | 13:25 |
freemanboss[m] | queensly[m]: It should be in available mode already. | 13:26 |
freemanboss[m] | You can try deploying too though if it works fine | 13:26 |
TheJulia | heh | 13:28 |
TheJulia | Mailman day, love it. | 13:29 |
queensly[m] | <freemanboss[m]> "You can try deploying too though..." <- I had to use the node manage and provide commands to also set the testvm2 to "available". Since its state wasn't available, I had an error again when I ran the ./bifrost-cli enroll baremetal-inventory.json . After it was set, that I used the ./bifrost-cli enroll baremetal-inventory.json command again and had this output. Ironic has successfully deployed an OS onto the nodes. | 14:07 |
queensly[m] | https://imgur.com/QUoJzRQ | 14:07 |
queensly[m] | s/that// | 14:08 |
AmarachiOrdor[m] | queensly: okay thats nice | 14:22 |
AmarachiOrdor[m] | That's really good | 14:22 |
AmarachiOrdor[m] | Glad you were able to resolve it | 14:22 |
freemanboss[m] | <queensly[m]> "I had to use the node manage..." <- > <@queensly:matrix.org> I had to use the node manage, and provide commands to also set the testvm2 to 'available". Since it's state wasn't available, I had an... (full message at <https://matrix.org/oftc/media/v1/media/download/ARwm5AHomr-rtkWCzmEqRj9Phwhn020ALc84h0IP5uI-W_WEJ0BDnf34SImcgElvcmAh9O5IeI3_rgmgU_4RGf5CeWOddYAAAG1hdHJpeC5vcmcvaFNKR2ZBeXlweUJ6d1hreGZmRWdYc1ls>) | 14:25 |
Ayo[m] | But why tho | 14:28 |
Ayo[m] | I ran into the same issue when I tried to deploy the other .json that had node details in it | 14:30 |
Ayo[m] | Only the inventory.Json file worlds for deployment | 14:30 |
Ayo[m] | s/worlds/works/ | 14:37 |
queensly[m] | <AmarachiOrdor[m]> "Glad you were able to resolve it" <- Yeah. Thanks for the support. 😊 | 14:42 |
queensly[m] | <freemanboss[m]> "> <@queensly:matrix.org> I had..." <- Oh I see. Thanks, the other method I tried to use would have been a long way. As you mentioned earlier, all those settings had already been done for easy deployment in the testenv. | 14:46 |
freemanboss[m] | queensly[m]: Yeah exactly | 14:50 |
queensly[m] | <Ayo[m]> "Only the inventory.Json file..." <- Yeah. Thank you for the assistance. I hope you have been able to deploy too. | 14:50 |
cardoe | TheJulia: https://bugs.launchpad.net/neutron/+bug/2105855 I made the ask | 15:13 |
TheJulia | cardoe: hacks you guys did, or hacks existing in the ml2 plugins today ? | 15:15 |
TheJulia | I could see cisco needing physical network delineation | 15:15 |
cardoe | So we're exploring with our own ml2 mech right now. Currently the servers that sit on two separate fabrics automagically get the same VNIs on both sides. It's wasteful but we've got enough VNIs. | 15:18 |
cardoe | Using the network-segment-range neutron extension to create a VNI range and tenants create vxlan based networks. | 15:19 |
cardoe | We've set physical_network on the ironic baremetal ports to the HA ToR pair. So let's say cabinet X has a switch pair X-1 and X-2 for regular traffic (which portgroups can be created in as well). The physical_network field on those two ports would be "X". We then create a network-segment-range of type=VLAN and physical_network=X equal to a range of allowed VLANs. | 15:21 |
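The arrangement cardoe describes, where a port's `physical_network` names the ToR pair and each pair has its own allowed VLAN range, can be modelled roughly as below. This is only a sketch of the idea, not Neutron or network-segment-range code; the port names, network names, range bounds, and the `segment_for` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentRange:
    """Allowed VLAN IDs for one physical_network (one ToR pair)."""
    physical_network: str
    minimum: int
    maximum: int
    in_use: set = field(default_factory=set)

    def allocate(self) -> int:
        for segment_id in range(self.minimum, self.maximum + 1):
            if segment_id not in self.in_use:
                self.in_use.add(segment_id)
                return segment_id
        raise RuntimeError(f"no free VLAN left on {self.physical_network}")


# Hypothetical inventory: two ports behind switch pair "X", one behind "Y".
PORT_PHYSNET = {"port-x-1": "X", "port-x-2": "X", "port-y-1": "Y"}
RANGES = {
    "X": SegmentRange("X", 100, 199),
    "Y": SegmentRange("Y", 200, 299),
}
_segments = {}  # (network, physical_network) -> VLAN ID


def segment_for(network: str, port: str) -> int:
    """Return the VLAN used for `network` on the fabric the port sits in."""
    physnet = PORT_PHYSNET[port]
    key = (network, physnet)
    if key not in _segments:
        _segments[key] = RANGES[physnet].allocate()
    return _segments[key]
```

Two ports in the same cabinet share one VLAN per network, while a port in another cabinet draws from that cabinet's own range.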
TheJulia | I was mainly asking to see if there was a need to provide a bit more context which might need to be added to the RFE | 15:22 |
cardoe | oh. Well lemme write this up there. | 15:23 |
TheJulia | Adds some weight to explain why you've routed around and what you did as well, just so they can frame it | 15:27 |
TheJulia | (at the end of the day you want them to grok the business value and technical reasons why) | 15:34 |
opendevreview | Jay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook https://review.opendev.org/c/openstack/ironic/+/945259 | 15:44 |
frickler | ok, I wasn't sure when to come up with this, but maybe now is as bad as anytime: did anyone ever think about doing nodes with BGP based L3 connectivity? one obvious obstacle is that this is not a concept that neutron currently supports, but maybe it wouldn't be impossible to add, either? | 16:05 |
TheJulia | frickler: as in running bgp on the deployed node to provide it that baseline connectivity wired into the existing neutron fabric? | 16:07 |
opendevreview | Matt Anson proposed openstack/ironic-python-agent-builder master: Don't install biosdevname in arm64/aarch64 arches https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/946057 | 16:07 |
frickler | TheJulia: first step might even be outside of neutron, but yes, providing basic connectivity to the node, via IPv6 unnumbered BGP ideally | 16:09 |
TheJulia | So the individual deploying is also the infrastructure operator then ? | 16:11 |
frickler | yes, in the easy case. but with some BGP filtering on the switches it might also support tenants. I haven't put too much brains yet into how to actually represent that in neutron | 16:12 |
TheJulia | so, each switch would need to be configured with the necessary configuration which would then need to be baked to the deployed node | 16:12 |
TheJulia | I think that is fundamental layer violation of expectation but only works when the deployer of the baremetal node is also the overall infrastructure operator and I *suspect* supporting other cases would be injection or need something to facilitate that configuration modeling and matching. | 16:14 |
frickler | well (except for the filtering) the switch config wouldn't need to be node specific, so it could be preconfigured for all nodes that use this connectivity type | 16:14 |
TheJulia | wouldn't you want credentials to manage the session | 16:14 |
TheJulia | yes, you could do some level of ingress filtering on the session on the switch | 16:14 |
frickler | I don't believe much in md5 for bgp, but that's another topic. my current switch config would just look like this https://paste.opendev.org/show/bTPdzRsLveMD7gaY7Dw3/ | 16:17 |
TheJulia | I'm not sure how that would map through since it seems like you'd need to establish a session to advertise the address space as the baremetal node. I guess I'd need both sides in my mind, largely because my BGP experience is largely rooted in routing upfront, not tunneling | 16:20 |
frickler | yes, I'm thinking of pure L3 routing, nothing fancy as cardoe has in mind | 16:21 |
frickler | the node would somehow need to know its loopback addresses and its ASN, that's all | 16:22 |
frickler | which would be part of network_info if neutron knew how to deal with that, but could be just user_data for now. and the switch would likely only announce default routes to the node | 16:23 |
frickler | btw, bonus points for supporting the same setup in the IPA stage already, so the switch doesn't need reconfiguration during deployment | 16:24 |
opendevreview | Julia Kreger proposed openstack/ironic-specs master: WIP: Trait based port selection and dynamic portgroups https://review.opendev.org/c/openstack/ironic-specs/+/945642 | 16:25 |
frickler | fwiw, my current solution is to create a custom ipa + deploy image per node, next up would be trying to use the config-drive | 16:25 |
TheJulia | doable, I guess. What is the risk if someone guesses another private asn number available in the fabric? Thinking malicious user compromises a whole baremetal node | 16:30 |
JayF | Yeah this is not a config that could be multitenant safe unless you're doing some sort of filtering by physical port | 16:31 |
TheJulia | Well, something like this is super powerful if you want to do something like... run a root dns zone | 16:32 |
* TheJulia literally knows folks who did similar, but manually deployed hosts, for some of the root servers | 16:32 | |
JayF | oh, I'm not saying it's not possible, good, or we shouldn't do it | 16:32 |
JayF | just that it's not multitenant-friendly | 16:32 |
TheJulia | yeah | 16:32 |
frickler | yes, the multitenant case would include strict inbound filters on the switch for only the nodes' assigned addresses. both in BGP and as L3 ACL to avoid spoofing | 16:38 |
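The strict inbound filter frickler mentions amounts to accepting an announcement only if it exactly matches an address assigned to that node. A rough sketch of that decision using Python's `ipaddress` module is below; real filtering would live in switch configuration (prefix lists and ACLs), and the addresses shown are documentation prefixes chosen for illustration.

```python
import ipaddress


def announcement_allowed(prefix: str, assigned: list) -> bool:
    """Accept a route only if it exactly matches one of the node's assigned prefixes."""
    announced = ipaddress.ip_network(prefix)
    allowed = {ipaddress.ip_network(item) for item in assigned}
    return announced in allowed


# Hypothetical loopback addresses assigned to one node.
node_assigned = ["2001:db8::1/128", "192.0.2.1/32"]

own_loopback = announcement_allowed("2001:db8::1/128", node_assigned)
default_route = announcement_allowed("::/0", node_assigned)
other_node = announcement_allowed("2001:db8::2/128", node_assigned)
```

Exact matching is the strictest choice; a deployment that lets a node announce a whole assigned subnet would need a subnet comparison instead.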
frickler | but for now my usecase is just to deploy things as operator (to then run something like an overcloud on those nodes) | 16:39 |
frickler | for now my question also only is whether this is a scenario that others might be interested in or whether I'm a weirdo (I don't expect that to be exclusive, you may ignore the latter part :-D) | 16:41 |
cardoe | Well I updated my bug with what I would have answered Julia with | 16:42 |
JayF | frickler: I mean, the question is that but also how to arrange the separation of responsibilities :) | 16:42 |
JayF | where does the ironic/neutron intervention end, and the computer start | 16:42 |
JayF | having to bake in so much logic to the images is concerning to the idea of it being reproducible by others | 16:43 |
JayF | it's almost like to do this ideally, we'd need a way to express BGP in network_data.json | 16:43 |
JayF | then get glean/cloud-init/etc to do the magic to make it work for each | 16:43 |
frickler | oh, I do have a dib element that does this :) | 16:43 |
TheJulia | JayF: That is kind of the only way I could see it really be viable in a generic sense | 16:43 |
JayF | Yeah, but you need N images for N nodes, yeah? | 16:43 |
TheJulia | cardoe: much appreciated | 16:44 |
JayF | That's not a pattern I'd be happy to endorse for most folks. | 16:44 |
cardoe | I've also come up with another RFE that I dunno how it will fly. But the ability to filter mechanisms from receiving certain networks / ports. | 16:45 |
frickler | currently yes. if I can use user-data, then only 1 per ... well per switch pair likely. which is where the similarity to cardoe's scenario comes into play | 16:45 |
frickler | although the information the node needs to have about the switch could actually also be user data | 16:45 |
frickler | cardoe: "filter mechanisms from receiving" like from the neutron API or where? | 16:46 |
cardoe | Well now let's say I want OVN involved in some of my networks but not all. | 16:46 |
frickler | oh, so you want neutron to use different backends in parallel? I think that that should be possible today | 16:47 |
frickler | or do you want a noop backend for those other networks? | 16:48 |
cardoe | No different backends | 16:48 |
frickler | that what should neutron do with those other networks, if their backend actually is ovn? | 16:49 |
frickler | s/that/then/ | 16:49 |
cardoe | Well some networks would be networking-generic-switch/networking-baremetal and some could be OVN | 16:52 |
JayF | that is a design I may have to embrace as well at some point | 16:54 |
JayF | although today it'd be more s/OVN/OVS/ but I don't know how long that'll last | 16:54 |
frickler | I never looked at the former so far, but isn't that implemented as another backend? or how is n-g-s tied to neutron? | 16:54 |
JayF | ngs is an ml2 driver, it configures switches in "generic" ways, e.g. ssh commands | 16:55 |
cardoe | I just figured I'd embrace OVN now since that's where things are trending. I've got no dog in the fight. | 16:57 |
JayF | I don't think that's wrong for a new system | 16:57 |
cardoe | frickler: I updated https://bugs.launchpad.net/neutron/+bug/2105855 with some more detailed examples. | 16:57 |
cardoe | Essentially our custom mechanism I hope to marry up with an "NGS 2.0" | 16:58 |
cardoe | We're doing things very similar to NGS 2.0 but have a lot more operations to consider due to VNIs and VLANs and L2 gateways and blah. | 17:04 |
cardoe | All of which are things that other mechanisms like the Cisco, Juniper and Arista one support. | 17:05 |
cardoe | Currently it's written as an external service and each switch is bound to an agent to prevent multiple writers at the same time. | 17:06 |
cardoe | But we're gonna rework this as a neutron mech agent | 17:07 |
JayF | what is "NGS 2.0"? | 17:07 |
TheJulia | Something cardoe is cooking up downstream | 17:07 |
cardoe | My mythical hope of finding the commonality between what we're doing and what requests others have had around NGS become. | 17:07 |
TheJulia | At least, that is my guess | 17:07 |
TheJulia | Maybe, be named something else :) | 17:08 |
JayF | 2.0 implies some massive reworking; I've only seen suggestions of incremental progress | 17:08 |
JayF | which, to be clear, is my preference -- incremental improvement works :D | 17:08 |
frickler | ah, so I think backend was the wrong term, mechanism driver is what it really is | 17:08 |
frickler | if that is what you meant with "mechanisms" above, that would explain a lot ;) | 17:09 |
cardoe | Yes. | 17:09 |
frickler | or maybe I just should've read that bug | 17:10 |
cardoe | JayF: okay... 2026.1 :-D | 17:10 |
JayF | frickler: I am ... very bad about misusing terminology :) | 17:10 |
cardoe | I'd just like to find the commonality between what we're doing and what others are doing or asking for and land that upstream. | 17:11 |
cardoe | That's what I was referring to as 2.0 | 17:12 |
cardoe | Sorry for the bad terminology. | 17:12 |
JayF | honestly I just worry that I had missed something bigger lol | 17:12 |
JayF | like the whole etcd locking stuff for ngs seemed to pop outta nowhere | 17:12 |
TheJulia | I'm less convinced its really the needful at this point | 17:14 |
cardoe | doesn't it already use it? | 17:16 |
TheJulia | It is available, the code uses it for device locking *and* command pooling across multiple threads | 17:16 |
TheJulia | in order to prevent concurrent writers and try to streamline changes | 17:16 |
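The two ideas TheJulia names, one writer per device and pooling commands from several threads into a single change, can be sketched as follows. An in-process `threading.Lock` stands in for the etcd-backed lock purely for illustration; this is not networking-generic-switch code, and the class, device name, and command strings are hypothetical.

```python
import threading
from collections import defaultdict


class DeviceCommandQueue:
    """Batch commands per device and let only one flusher touch a device at a time."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the dicts below
        self._device_locks = defaultdict(threading.Lock)
        self._pending = defaultdict(list)
        self.sessions = []                        # (device, batch) pairs "sent"

    def submit(self, device: str, command: str) -> None:
        with self._guard:
            self._pending[device].append(command)

    def flush(self, device: str) -> list:
        with self._guard:
            device_lock = self._device_locks[device]
        with device_lock:                         # single writer per device
            with self._guard:
                batch = self._pending[device]
                self._pending[device] = []
                if batch:
                    # Stand-in for opening one session and sending the batch.
                    self.sessions.append((device, batch))
            return batch


queue = DeviceCommandQueue()
workers = [
    threading.Thread(target=queue.submit, args=("switch-1", f"vlan {vid}"))
    for vid in (100, 101, 102)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

first_batch = queue.flush("switch-1")
second_batch = queue.flush("switch-1")
```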
TheJulia | I'm semi-convinced at this point that the runtime in neutron model is part of the problem there | 17:16 |
TheJulia | as opposed to a clean disjoint | 17:17 |
cardoe | Curious what you mean about the neutron model issue | 17:18 |
TheJulia | so neutron basically owns the transaction, and you're trying to do it within the neutron api process's runtime | 17:19 |
TheJulia | so you're trying to do it basically on something which is behind a load balancer as well | 17:19 |
cardoe | yeah it kinda needed to be a request with a job ID handed back | 17:21 |
cardoe | like nova migration (which doesn't actually give you a job ID back) | 17:21 |
TheJulia | exactly | 17:21 |
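The "request with a job ID handed back" shape cardoe describes is the usual asynchronous job pattern: the caller gets an identifier immediately and polls for the outcome, so the slow switch work does not have to finish inside the API request. A minimal in-process sketch follows; the `JobRunner` class and its status strings are hypothetical and do not correspond to any Neutron or Nova API.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobRunner:
    """Accept work, hand back a job ID at once, and report status on request."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=2)
        self._jobs = {}

    def submit(self, func, *args) -> str:
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._executor.submit(func, *args)
        return job_id

    def status(self, job_id: str) -> str:
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "succeeded"

    def wait(self, job_id: str, timeout=None):
        """Block until the job finishes and return its result (or raise its error)."""
        return self._jobs[job_id].result(timeout=timeout)


def configure_port(port: str) -> str:
    # Stand-in for a slow switch operation.
    return f"{port} configured"


runner = JobRunner()
job = runner.submit(configure_port, "port-x-1")
result = runner.wait(job, timeout=5)
final_status = runner.status(job)
```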
TheJulia | kind of what I wanted to solve with mercury was to disjoint the interaction/runner of action to beyond the line of control so you don't have to own/manage both. | 17:26 |
frickler | so you'd need neutron-conductor rather than the API doing RPC/other stuff directly? | 17:26 |
JayF | Something like that; this is another instance of what Ironic hits with many openstack services: the real world is slower and messier than the virtual one :) | 17:27 |
TheJulia | Neutron-conductor is just abstracting and trading problems | 17:27 |
TheJulia | The real world messy issue we hear time and time again "my network team won't let me manage switch config" | 17:27 |
TheJulia | its a lack of trust issue, and how you solve that is by building trust | 17:28 |
frickler | yes, that's also why I want to have a static switch config in my BGP scenario | 17:28 |
TheJulia | Another operator who is in this very room "network team just won't provide credentials." | 17:28 |
TheJulia | And ultimately, there are many different needs and requirements | 17:29 |
TheJulia | its not about just IP traffic, even though in a classic cloud sense, that is all you need/want | 17:29 |
JayF | lest we not forget about the other shift that we've talked about that still is bubbling around: having DPUs do some of this work | 17:30 |
TheJulia | Yeah, I've got folks who are now framing that as the ultimate cure-all | 17:30 |
JayF | there are a myriad of directions networking designs fork into at this state in our maturity, and I just don't think there's one answer for everyone | 17:30 |
TheJulia | "wait until you realize people can't just use a single dpu to meet their operational requirements" | 17:30 |
JayF | TheJulia: I'm *extremely* concerned about security implications there | 17:30 |
TheJulia | JayF: agreee 100% | 17:30 |
TheJulia | JayF: likewise, do you remember the giant warning I demanded on smartnic support? | 17:31 |
JayF | TheJulia: to rely on DPU to be the security arbiter | 17:31 |
TheJulia | yup, and in their benefit, Mellanox/Nvidia rightfully took a lot of our points and went back to their product designs and improved them | 17:32 |
JayF | I think it's still safe to say you can't achieve a level of security with the DPU as the firewall that you can when you have a switch as a firewall | 17:32 |
TheJulia | Others... not so much or just haven't gotten to that point in their product journey | 17:33 |
JayF | honestly even in an ideal world, it makes me nervous to tie together security+performance | 17:34 |
JayF | because what happens if I need DPU firmware N+1 for security patches, but then that hurts network performance? | 17:34 |
JayF | I wanna avoid that particular vice. | 17:34 |
TheJulia | yeah, defense is always in depth | 17:34 |
TheJulia | you can't expect a single layer to be impervious | 17:34 |
* JayF has been burned by beta-quality switch security technologies in the past | 17:34 | |
JayF | cardoe: do you know ^^^ the story behind that? | 17:34 |
JayF | cardoe: you can probably look it up in core :) | 17:35 |
cardoe | I don't. I'm scared now | 17:35 |
JayF | it was all OMv1 stuff. A security bug in the cisco security stuff which allowed tenants to see traffic from other tenants in some cases | 17:35 |
JayF | and the person who reported it, well the case was maximum bad | 17:36 |
TheJulia | so I think the key takeaway, is to embrace many models, and have giant warnings next to the appropriate risks | 17:40 |
JayF | I think in order to fully secure a server, you need to disconnect it from all external power and network sources, and take it to a deserted tropical island paradise. I'm going to start researching this solution ;) | 17:41 |
* TheJulia has an example warning being generated by an AI but it is thinking very hard | 17:42 | |
TheJulia | JayF: Don't forget the concrete box | 17:43 |
JayF | hey, that whole ironic.cmd -> ironic.command move was zero edit AI change from cursor IDE | 17:43 |
JayF | I was kinda impressed | 17:43 |
JayF | I'm skeptical of 'vibe coding' as a thing, but integrating it into tools seems like it can help | 17:43 |
TheJulia | so did you just say "rename this module" ? | 17:44 |
JayF | yeah | 17:44 |
JayF | and it went step by step, explaining all the changes it made | 17:44 |
JayF | I approved them, did some grepping to ensure it didn't miss anything | 17:44 |
JayF | I asked it similarly to write unit tests for my runbooks-as-automated-cleaning change; less impressive overall but not worthless | 17:45 |
JayF | I will say *cursor specifically* has been massively more capable than any copilot or jetbrains ai assistant I've tried | 17:45 |
TheJulia | neat | 17:51 |
JayF | The rest of the chat was commands it was asking permission to run (mv cmd command) and diffs for me to approve around the codebase, including docs references. https://usercontent.irccloud-cdn.com/file/8ebxnWQa/image.png | 17:55 |
JayF | I think this weekend, I'm going to take sushy, sushy-tools, and https://github.com/sipeed/NanoKVM and see how far I can get it to go in writing a redfish API service for this device | 17:57 |
TheJulia | so, when the board created the AI contribution policy, we explicitly sought folks to provide a bit of clarity behind how they got to that point in addition to what was used. That at least provides clarity to the comment I was seeking an answer to for that change, thanks! | 17:57 |
TheJulia | We've clearly left the realm of "AI is just whipping content up" to "AI is a do-er" | 17:58 |
JayF | Like, other products I've used/tried were a combo of a chat box with a separate coding assistant which was incapable of grokking more than a few lines of context | 17:58 |
JayF | cursor being able to see the context of the whole repo you have loaded into the IDE is a massive game changer tbh | 17:58 |
JayF | AIUI claude-code is similarly powerful, maybe even moreso -- a report I got from someone I trust a lot who managed to get it to do some crazy stuff is part of why I'm heading back down this path | 17:59 |
TheJulia | so... I guess we need to whip up our ptg schedule? | 18:03 |
* TheJulia gets out the PTG-Aid... (which is really a KitchenAid with a post-it note) | 18:12 | |
TheJulia | rpittau: I themed days together on the etherpad, not entirely a schedule but bucketed topics to help guide discussion. | 18:40 |
opendevreview | Jay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook https://review.opendev.org/c/openstack/ironic/+/945259 | 20:34 |
cardoe | Any reason why networking-baremetal fails tests always? | 21:05 |
cardoe | Like should I figure out why or we know about it | 21:05 |
TheJulia | cardoe: it should pass, but It would help to understand what is going on to frame the discussion | 21:05 |
TheJulia | (a link would help | 21:05 |
cardoe | Been trying to run this one https://review.opendev.org/c/openstack/networking-baremetal/+/945818 | 21:06 |
cardoe | baremetal multi-tenancy trunk fails with "Details: {'type': 'HTTPNotFound', 'message': 'The resource could not be found.', 'detail': ''}" | 21:06 |
TheJulia | Finally got my AI image about "WARNING: this feature may..." https://usercontent.irccloud-cdn.com/file/Okh5NSja/warning-this-feature-may.png | 21:06 |
TheJulia | hmm.. what is going on with that job | 21:15 |
TheJulia | Oh | 21:16 |
TheJulia | it's the new test which was added | 21:16 |
TheJulia | https://2b2204763b167555199d-100ff86b174195291db1bc21f098c4ef.ssl.cf1.rackcdn.com/openstack/d106f78853ce427e9259903035d75a06/testr_results.html | 21:16 |
TheJulia | This is why we need to be stupidly careful about new tests in the tempest plugin | 21:16 |
JayF | especially scenario tests, really | 21:17 |
JayF | I think we should have more straight API tests (as evidenced by the giant PR from Adam) | 21:17 |
TheJulia | yeah, the test is trying to bind the trunk together and it blows up because it is not part of the test | 21:30 |
TheJulia | I'll switch laptops and just disable it since the feature is not loaded on that neutron | 21:30 |
opendevreview | Jay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook https://review.opendev.org/c/openstack/ironic/+/945259 | 21:34 |
TheJulia | sigh, no knob | 21:37 |
opendevreview | Jay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook https://review.opendev.org/c/openstack/ironic/+/945259 | 21:42 |
* JayF hammers more on this in devstack | 21:43 | |
JayF | it's been a while since I wrote a feature that was this close to the RPC boundary, having to relearn some stuff | 21:43 |
opendevreview | Jay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook https://review.opendev.org/c/openstack/ironic/+/945259 | 21:45 |
TheJulia | I see what is going on | 21:47 |
TheJulia | the ngs plugin has been changed to assume all usage of the plugin has trunk usage enabled | 21:47 |
TheJulia | That is a flawed assumption as far as I'm concerned | 21:49 |
JayF | yeah | 21:50 |
TheJulia | looks like vasyl's devstack change sort of does the right thing | 21:50 |
TheJulia | so... | 21:50 |
* TheJulia fixes stuffs | 21:50 | |
JayF | Apr 01 21:48:25 devstack-20250401 ironic-conductor[129213]: WARNING oslo_db.sqlalchemy.exc_filters [None req-8b2699fd-508d-4bf3-9e75-d7f324ba74ea None None] DB exception wrapped.: TypeError: Object of type datetime is not JSON serializable | 21:50 |
TheJulia | rutro | 21:51 |
JayF | it looks like things are blowing up while trying to update updated_at!? | 21:51 |
TheJulia | where did you get that at?! | 21:51 |
JayF | https://review.opendev.org/c/openstack/ironic/+/945259 | 21:51 |
JayF | trying to get a runbook cleaning to work with this | 21:51 |
JayF | Apr 01 21:48:25 devstack-20250401 ironic-conductor[129213]: ERROR oslo_db.sqlalchemy.exc_filters oslo_db.exception.DBError: (builtins.TypeError) Object of type datetime is not JSON serializable | 21:51 |
JayF | Apr 01 21:48:25 devstack-20250401 ironic-conductor[129213]: ERROR oslo_db.sqlalchemy.exc_filters [SQL: UPDATE nodes SET driver_internal_info=%(driver_internal_info)s, updated_at=%(updated_at)s WHERE nodes.id = %(nodes_id)s] | 21:51 |
JayF | obviously maximal wtf | 21:52 |
JayF | but I musta screwed up something | 21:52 |
JayF | the traceback is so obscured there's not even any ironic code in it :| | 21:53 |
TheJulia | eek :\ | 21:55 |
JayF | wow the error is so obscured because it spirals trying to write to node.last_error | 21:55 |
opendevreview | Julia Kreger proposed openstack/networking-generic-switch master: CI: Fix trunks enabled by default https://review.opendev.org/c/openstack/networking-generic-switch/+/946089 | 21:56 |
TheJulia | That should fix it | 21:56 |
TheJulia | JayF: bravo? :) | 21:57 |
JayF | where do I pick up my "good job breaking ironic so well" aware | 21:57 |
JayF | *award | 21:57 |
JayF | not quite a cve, more of an lolsob | 21:57 |
TheJulia | do I need to have coins minted? | 21:58 |
JayF | this error could happen | 21:59 |
JayF | if I sent over something not-RPC-encoded properly by accident (?) | 21:59 |
JayF | hmmm | 21:59 |
JayF | https://review.opendev.org/c/openstack/ironic/+/945259/9/ironic/conductor/manager.py#1182 I suspect I need to do /something/ there to make it happy | 22:00 |
JayF | cid: ^ if you're around, got any ideas? | 22:03 |
TheJulia | if I'm recalling correctly, clean steps get kicked back to the rpc bus | 22:06 |
TheJulia | so the task is ultimately released and picked up | 22:06 |
JayF | my hunch is the clean_steps get passed into do_node_clean | 22:06 |
JayF | and when it tries to save the clean steps onto the object | 22:06 |
JayF | boom because I haven't encoded them properly, or something like that | 22:06 |
JayF | manual cleaning might be a good place to look for stuff in this realm | 22:07 |
TheJulia | possibly, I suspect you'll need to add tons of extra logging to follow out where exactly we stop logging | 22:07 |
JayF | I'm doing that now but it always feels like I'm losing at the mental game when I can't logic out the bug | 22:11 |
TheJulia | perhaps a sign to call it a day then and give it a fresh brain in the morning ? | 22:11 |
JayF | ne None] JAY: ABOUT TO SAVE NODE IN DO_NODE_CLEAN {{(pid=130998) do_node_clean /opt/stack/ironic/ironic/conductor/cleaning.py:86}} | 22:11 |
JayF | nope, it's a sign that I knew exactly what was going wrong | 22:11 |
JayF | huzzah but also sad because I don't know how to fix it LOL | 22:11 |
TheJulia | a fresh brain :) | 22:16 |
JayF | I think I have a solution!! | 22:17 |
JayF | cid's code led me down the right path | 22:19 |
TheJulia | \o/ | 22:20 |
JayF | https://opendev.org/openstack/ironic/src/branch/master/ironic/api/controllers/v1/utils.py#L1600 | 22:20 |
JayF | I've been trying to answer the ? of why this method exists on the API side | 22:20 |
JayF | now I know: that effectively eliminates the datetime fields (e.g. created_at, updated_at, and friends) | 22:20 |
opendevreview | Jay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook https://review.opendev.org/c/openstack/ironic/+/945259 | 22:21 |
JayF | \| driver_internal_info \| {'clean_steps': [{'interface': 'deploy', 'step': 'erase_devices_express', 'args': {}, 'order': 1}], \| | 22:23 |
JayF | it's woooorrrrkkkiiinnnnggggg | 22:23 |
JayF | I can't leave something that broken at my EOD; I'll lose all context if I forget, and if I remember I'll dread it in the morning ;) | 22:24 |
TheJulia | woot! | 22:26 |
JayF | I will note it will make my development life easier, much easier, if we can land https://review.opendev.org/c/openstack/ironic/+/945999 | 22:31 |
cid | \o/ | 22:36 |
opendevreview | cid proposed openstack/ironic-python-agent master: WIP: eventlet wsgi to Gunicorn https://review.opendev.org/c/openstack/ironic-python-agent/+/946091 | 23:00 |
opendevreview | Jay Faulkner proposed openstack/ironic master: WIP: Automated cleaning by runbook https://review.opendev.org/c/openstack/ironic/+/945259 | 23:15 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Rename ironic cmd module https://review.opendev.org/c/openstack/ironic/+/945999 | 23:22 |
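
The `datetime is not JSON serializable` failure discussed above (21:50 to 22:20) can be reproduced outside ironic. The following is a minimal sketch, not ironic's actual code: the step payload and the `strip_datetime_fields` helper are hypothetical, and the assumption is only that step dictionaries carried timestamp fields such as `created_at` and `updated_at` into a JSON-encoded column like `driver_internal_info`.

```python
import json
from datetime import datetime, timezone

# Hypothetical step payload. The timestamp keys stand in for object
# metadata (created_at, updated_at) that leaked into the step dicts.
steps = [{
    "interface": "deploy",
    "step": "erase_devices_express",
    "args": {},
    "order": 1,
    "created_at": datetime(2025, 4, 1, 21, 48, 25, tzinfo=timezone.utc),
    "updated_at": None,
}]


def strip_datetime_fields(step, drop=("created_at", "updated_at")):
    """Return a copy of the step without the named timestamp keys.

    Hypothetical helper for illustration only.
    """
    return {key: value for key, value in step.items() if key not in drop}


# Encoding the raw steps fails, because json has no default encoder
# for datetime objects.
try:
    json.dumps(steps)
    raw_error = None
except TypeError as exc:
    raw_error = str(exc)

# Once the timestamp keys are dropped, the steps encode cleanly.
clean_steps = [strip_datetime_fields(step) for step in steps]
encoded = json.dumps(clean_steps)
```

Dropping the fields, rather than converting them to strings, matches the observation in the log that the API-side method effectively eliminates the datetime fields; whether that is the right fix for any given code path is a judgment call.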