JayF | dking and others interested: our IPA-Builder elements *do* use constraints to install: https://opendev.org/openstack/ironic-python-agent-builder/src/branch/master/dib/ironic-python-agent-ramdisk/install.d/ironic-python-agent-ramdisk-source-install/60-ironic-python-agent-ramdisk-install#L14 | 00:00 |
---|---|---|
JayF | We might want to add a bit to our docs/examples around hardware manager customization to demonstrate that you need to use U-C if you're setting up downstream HWM unit tests | 00:00 |
JayF | I am not going to take that action today because it's my EOD and my todo list is long; but that's something that hopefully someone could take care of (maybe an operator who experienced this firsthand :D ) | 00:01 |
opendevreview | Merged openstack/ironic master: Use quay.io registry image for metal3 job https://review.opendev.org/c/openstack/ironic/+/935895 | 00:08 |
opendevreview | Steve Baker proposed openstack/ironic master: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/935992 | 01:27 |
opendevreview | Steve Baker proposed openstack/ironic master: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/935992 | 01:36 |
rpittau | good morning ironic! happy friday! o/ | 07:11 |
rpittau | JayF: re bugfix cut: the plan was to release next week (last week of November) so we're more than good, I'll get the release request up ASAP | 07:12 |
opendevreview | Verification of a change to openstack/ironic master failed: Agent deploy: account for disable_power_off https://review.opendev.org/c/openstack/ironic/+/934637 | 10:15 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic master: Update Node Cache during Servicing https://review.opendev.org/c/openstack/ironic/+/936009 | 11:16 |
opendevreview | Merged openstack/ironic master: IPMI power: account for disable_power_off https://review.opendev.org/c/openstack/ironic/+/932624 | 13:01 |
cardoe | Anyone know if we can boot loop on BIOS? Thinking about booting to the internal UEFI shell. Essentially trying to solve for the gotcha we have in the docs where a BIOS reset might make the machine unable to boot the IPA until we change some BIOS settings. Which are queued up in my next step but can’t get there cause IPA now didn’t boot. | 13:44 |
cardoe | There’s no reason the setting of the BIOS settings needs to boot into IPA via redfish. That’s all out of band. The only requirement is that the redfish task only completes once the machine powers up. I have a suspicion that these Dells have some shenanigans with ExitBootServices / EnterRuntimeServices. Cause I tried to boot Xen on there and the BIOS setting job completed on a very different time frame than Linux. | 13:47 |
cardoe | I used Xen because I happened to have coded that behavior 10? Years ago and it left a scar on my soul so I know the behavior difference from Linux. | 13:49 |
cardoe | Today we poke redfish directly cause we just say that ironic cannot handle this. | 13:51 |
dtantsur | cardoe: janders and I really want to stop booting IPA so often during steps like firmware updates or BIOS settings. We haven't got to it yet. | 14:16 |
dtantsur | The reason IPA is booted before any steps run is because it may have its own clean/service steps that override the built-in ones (think, software RAID or some proprietary BIOS settings tool) | 14:16 |
dtantsur | Maybe we need to change this approach with Redfish and declare that in-band steps should not override out-of-band ones | 14:17 |
TheJulia | cardoe: set a machine to keep booting to bios? | 14:17 |
dtantsur | It feels like the solution may be to fix ironic, not to boot into bios? | 14:18 |
dtantsur | (otherwise, we could add a special clean/service step for that) | 14:18 |
cardoe | dtantsur: Yeah that would make sense. | 14:20 |
cardoe | TheJulia: I don't have TFTP anymore so can't boot BIOS. | 14:20 |
cardoe | But VirtualMedia doesn't work. Whatever Ironic sends results in a 400 error back. | 14:20 |
dtantsur | OMG | 14:21 |
dtantsur | We haven't seen this in our lab, which model is that? | 14:21 |
cardoe | So we're using redfish-https and/or ipxe-http | 14:21 |
cardoe | R7515, R7615 and R740 | 14:21 |
dtantsur | huh | 14:21 |
dtantsur | probably newer than what we have, right iurygregory? | 14:22 |
iurygregory | dtantsur, correct | 14:22 |
TheJulia | good morning everyone | 14:22 |
dtantsur | morning TheJulia | 14:22 |
cardoe | Give me a few minutes and I'll grab the bug (I think I made one) or at least pastebin the error. | 14:22 |
dtantsur | cardoe: is there any error message? Or just HTTP 400? | 14:22 |
iurygregory | may the new ones work uefi http boot lol :D | 14:22 |
dtantsur | right :) | 14:22 |
TheJulia | cardoe: I mean, not bios mode, but I mean into firmware bios settings ? | 14:23 |
dtantsur | iurygregory: if cardoe uses redfish-https, it works | 14:23 |
dtantsur | (which is yay! at least) | 14:23 |
iurygregory | NICE! | 14:23 |
cardoe | It's on my other machine and it's sooo far (gesturing to the other room). It's cold this morning so I'm in my chair with a blanket and not wanting to get up. :-D | 14:23 |
dtantsur | oh yeah, it's very chilly here as well | 14:23 |
* dtantsur hopes for a lot of snow in the mountains this season | 14:23 | |
* TheJulia feels the cold in her fingers | 14:23 | |
* TheJulia doesn't want to be up this morning | 14:24 | |
* iurygregory hopes for snow in Berlin between Dec 8 and Dec 12 :D | 14:24 | |
TheJulia | cardoe: maybe higher bandwidth, once caffeination soaks in, might help/enable getting on same page | 14:24 |
cardoe | So basically when we clean, we reset the BIOS and want to set some settings back. | 14:24 |
dtantsur | iurygregory: not very likely, but let's keep the fingers crossed :) | 14:25 |
iurygregory | dtantsur, yeah =) | 14:25 |
cardoe | My plan was a custom hardware manager to ensure that, which is likely what JayF did for Rackspace ages ago. | 14:25 |
dtantsur | ah, reset, interesting. we don't really do it in our lab. | 14:25 |
cardoe | But the BIOS reset happens and it reboots from the IPA and then it can't get back to IPA. | 14:25 |
cardoe | https://docs.openstack.org/ironic/latest/admin/drivers/idrac.html#pxe-reset-with-factory-reset-bios-clean-step | 14:26 |
cardoe | Today we have automated cleaning off and a pile of Python poking redfish directly. | 14:26 |
dtantsur | TheJulia: https://review.opendev.org/c/openstack/ironic/+/929904 could use your attention when you have a minute | 14:27 |
cardoe | We're using HTTP Push to push an ISO that's just a Linux ramdisk that sleeps forever | 14:28 |
cardoe | So I reset the BIOS and give it a one time boot of this sleep forever ISO. | 14:28 |
cardoe | That lets the Redfish Task complete. | 14:28 |
dtantsur | How is it different from booting IPA though? | 14:29 |
cardoe | Then I apply the BIOS settings and give it the sleep forever ISO. | 14:29 |
cardoe | Well IPA is a fetch and not a push. | 14:29 |
cardoe | Sushy doesn't yet have push in it. | 14:29 |
dtantsur | Is push a standard thing? I thought we were still discussing it.. | 14:29 |
cardoe | I mean "standard" | 14:29 |
cardoe | There's an advertised field in the Redfish manager which gives you the endpoint. | 14:30 |
cardoe | It's then vendor specific for the payload but I think a recent redfish update provided a suggested payload. | 14:30 |
dtantsur | cc janders ^^^ | 14:31 |
cardoe | If OEM == "Hpe" do_this() if OEM == "Dell" do_that() | 14:31 |
dtantsur | ugh | 14:31 |
cardoe | Which is part of my interest in the OEM drivers in sushy. | 14:31 |
dtantsur | Anyway. If we get a proper Task from sushy for BIOS settings, we can indeed boot into ~nothing (UEFI shell or whatever). | 14:32 |
cardoe | Well sushy unfortunately drops the task cause it doesn't look at the response except to check for a 200. | 14:32 |
dtantsur | If you fix this, we'll owe you a beer :) | 14:33 |
cardoe | When you issue a reboot you get something in the response, a flag that says "I'm gonna do something before I actually reboot". Then you hit the Task endpoint and you'll have a pending task. | 14:34 |
cardoe | I'm trying to figure out how to lift this into sushy. I'm wanting us to stop crafting our own redfish poking library and just use sushy. | 14:34 |
dtantsur | +++++ | 14:34 |
cardoe | The issue I've had is that the box boots and the task doesn't complete until the Linux kernel gets to a certain point. | 14:35 |
cardoe | My sleep forever ISO gets there pretty quick cause there's no init. | 14:35 |
dtantsur | Are you sure it's related to the kernel O_o | 14:35 |
cardoe | IPA gets there fairly quick and then Xen took its sweet time. | 14:35 |
dtantsur | I'm really surprised to hear it. Can it just take a while? | 14:35 |
cardoe | So this is where my jump to conclusions mat came out and made me think it's related to ExitBootServices / EnterRuntimeServices | 14:36 |
cardoe | So grub used to call EBS but then Linux changed like 10+ years ago to want something in there so grub stopped doing that. | 14:37 |
dtantsur | Hmmm | 14:37 |
TheJulia | dtantsur: my half caffeinated brain has looked at https://review.opendev.org/c/openstack/ironic/+/929904 and I see one tiny issue | 14:38 |
cardoe | For Xen I had to not call EBS until dom0 started up and I had to make a mapping of the memory to make it still work for dom0. | 14:38 |
cardoe | So that's where my timing guess is coming from. | 14:38 |
dtantsur | I thought that iDRAC creates a job, which is executed by its firmware during reboot, then the real reboot happens automatically | 14:38 |
dtantsur | but who knows | 14:38 |
TheJulia | dtantsur: no release note to indicate if anyone has hard coded their permitted formats and they permit raw, they might want to add gpt. | 14:38 |
TheJulia | as a follow-up is totally cool in my book, fwiw | 14:38 |
cardoe | Yeah iDRAC is creating the job for us. | 14:38 |
cardoe | But the job sits at like 33% complete until it starts to boot some OS. | 14:39 |
dtantsur | TheJulia: I can do it, although I take this case into account and add "gpt" for them (I think, it has been a while) | 14:39 |
dtantsur | cardoe: I'm speechless, iDRAC is even more complicated than I thought.... | 14:40 |
cardoe | I wish it would complete before something booted. | 14:40 |
dtantsur | so yeah, so far we're booting IPA, this is why, I guess, we haven't noticed this behavior | 14:40 |
TheJulia | dtantsur: ... I didn't see that, but a reno might be good anyway since we're also moving to the community library | 14:40 |
TheJulia | dtantsur: oh, idracs are stupidly complex under the hood | 14:40 |
cardoe | The internal UEFI shell completes the job too. | 14:40 |
* TheJulia needs to nom something to reach "OS Running" state | 14:41 | |
dtantsur | TheJulia: this is the logic: https://review.opendev.org/c/openstack/ironic/+/929904/6/ironic/common/images.py#870 | 14:41 |
dtantsur | happy to follow-up with a release note anyway | 14:41 |
cardoe | So I feel like the job isn't completing until the machine thinks it's moved on to an "OS Running" state. | 14:41 |
dtantsur | sigh | 14:41 |
dtantsur | We can do the UEFI shell for sure, it's one of the boot device overrides that sushy supports | 14:41 |
* dtantsur needs to finally get some lunch, brb | 14:42 | |
TheJulia | dtantsur: ahh, yes | 14:43 |
TheJulia | yeah, UEFI shell is totally a thing which can be requested | 14:43 |
cardoe | And that's why I was asking about BIOS cause this would make disable_ramdisk=True work for factory_reset and apply_configuration on UEFI. | 14:43 |
cardoe | But it wouldn't work if you were booting legacy. | 14:43 |
dtantsur | Do you care about legacy? I'm totally cool if we make disable_ramdisk conditional on UEFI. | 14:44 |
cardoe | I don't care about legacy. | 14:44 |
cardoe | Okay I'll go that route then. | 14:44 |
cardoe | I think we did the sleep forever ISO back when we used to care about legacy boot. | 14:45 |
cardoe | Cause UEFI shell works and I don't have to have a magical bespoke image to push. | 14:45 |
cardoe | Our push code is also horrifying cause it supports the vendor specific endpoint and not the redfish push one. | 14:46 |
TheJulia | When are we making "BIOS boot is dead" shirts? | 14:48 |
cardoe | Not soon enough. | 14:52 |
TheJulia | "BIOS boot is dead." with a subtext of "... At least on Bare Metal." | 15:23 |
rpittau | bye everyone have a great weekend! o/ | 15:38 |
JayF | \ | 15:41 |
JayF | \o | 15:41 |
opendevreview | Julia Kreger proposed openstack/ironic master: First pass on some strucutral context setting for networking https://review.opendev.org/c/openstack/ironic/+/936039 | 16:48 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: begin making a general networking document https://review.opendev.org/c/openstack/ironic/+/936040 | 16:48 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: change network setup steps into the commands https://review.opendev.org/c/openstack/ironic/+/936041 | 16:48 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: rewrite ml2 and update physnet context https://review.opendev.org/c/openstack/ironic/+/936042 | 16:48 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: final cleanup pass on networking https://review.opendev.org/c/openstack/ironic/+/936043 | 16:48 |
dtantsur | wow | 16:49 |
TheJulia | basically a complete rewrite of that file | 16:49 |
TheJulia | super bad shape, it was in | 16:49 |
TheJulia | and still needs more work, hence the final change :) | 16:49 |
dtantsur | +++ | 16:49 |
TheJulia | I *think* that better sets base context, to help delineate things apart | 16:50 |
TheJulia | and also focuses more on "the single thing" as opposed to "oh, you need your api version, and then you need to do this and that" | 16:50 |
TheJulia | which is just noise | 16:50 |
TheJulia | ... if you're at this point, at least. | 16:50 |
TheJulia | ... I think I'm still sort of missing some context around how to understand how vifs get applied, but it is much more theory for people to grok.. | 16:59 |
TheJulia | I guess, maybe it is... how to use the thing related which got lost in some of the early networking stuff | 17:00 |
TheJulia | like we know how many vifs a node can take | 17:00 |
TheJulia | it is based upon the ports and what gets asked for | 17:00 |
TheJulia | But hey, also opened a bug over preferring PXE ports... | 17:00 |
TheJulia | *WHY!* | 17:00 |
TheJulia | That makes sense for PXE, sure | 17:01 |
TheJulia | doesn't make sense for tenant workloads | 17:01 |
TheJulia | At least in my opinion | 17:01 |
TheJulia | </rant> | 17:01 |
dtantsur | Hmm, fair | 17:01 |
TheJulia | This concludes your daily ironic rant | 17:01 |
dtantsur | \o/ | 17:02 |
cardoe | should https://review.opendev.org/c/openstack/ironic/+/934065 be backported to 2024.2? | 17:08 |
cardoe | How would people feel if I got rid of pbr from the runtime of sushy? It's literally only used to parse the local version info to return the version as a major, minor, patch tuple. | 17:15 |
cardoe | pbr still doesn't have Python 3.12 support | 17:15 |
dtantsur | I'd appreciate that | 17:15 |
cardoe | dtantsur: https://paste.opendev.org/show/bpoxYEvFBbqHpsGTy1IF/ that's the virtual media issue. | 17:27 |
dtantsur | ahhhh | 17:28 |
dtantsur | we seem to be getting reports of these 'Virtual Media is detached or Virtual Media devices are already in use.' | 17:28 |
dtantsur | btw cardoe, this literal error may be shadowing some more precise error because of our retry logic | 17:29 |
cardoe | hmm that could make sense. | 17:29 |
dtantsur | yeah, I was going to file a bug but forgot | 17:29 |
dtantsur | we retry connection on HTTP 500, so chances are high that the first error was not the same as this one | 17:29 |
cardoe | TheJulia: https://bugs.launchpad.net/sushy/+bug/2041902 did you have a follow on in Ironic planned for that? | 17:31 |
cardoe | dtantsur: iurygregory has https://review.opendev.org/c/openstack/sushy/+/924020 wonder if that's affecting me as well. | 17:33 |
iurygregory | cardoe, this one I was working because of a Cisco Bug | 17:53 |
iurygregory | but it was a very old firmware that was EOL | 17:53 |
cardoe | You described Dell's latest firmware | 18:26 |
cardoe | dtantsur: was hoping you'd weigh in on https://review.opendev.org/c/openstack/ironic/+/933020 | 18:30 |
dtantsur | will check on Monday | 18:30 |
TheJulia | cardoe: no, no follow-up in ironic. I spot checked some machines and saw only partial support so reliance upon it is... sort of sketchy as the kids say | 18:33 |
TheJulia | *but* relying on it can help some stuff | 18:34 |
cardoe | Thanks Steve Buscemi. | 18:35 |
TheJulia | There *was* something I was thinking of where it would be super useful to check if present and then block/proceed with child nodes | 18:39 |
JayF | TheJulia: https://review.opendev.org/c/openstack/ironic/+/936039 did you want someone else to review the doc changes before they land? | 19:25 |
JayF | It's only been 3 hours and it has two +2s now, so just ensuring you weren't hoping someone wearing a maroon fedora wanted to look first :D | 19:25 |
TheJulia | JayF: I've got 5 changes stacked up, merge what we can and I'll fix them as I roll | 19:26 |
JayF | that is what I suspected, but just checking these weren't written with a specific audience in mind :D | 19:26 |
JayF | when was the last time a non-critical bugfix was merged in ... like half a day :) | 19:26 |
TheJulia | nope, largely to just clarify the docs and get things to reflect reality | 19:26 |
TheJulia | because now() doesn't reflect | 19:27 |
JayF | I'm only one patch in and it's 100% improved | 19:27 |
TheJulia | heh | 19:27 |
TheJulia | yeah, looks like I've got a warning in the next patch which means I need to pull a fix forward | 19:27 |
TheJulia | sooon | 19:27 |
JayF | well I'm going to go put some lunch on and then finish reviewing the stack :) thanks! | 19:28 |
TheJulia | I'll try and fix that issue and rebase the stack | 19:30 |
TheJulia | JayF: keep in mind, I recognize through the changes that I'm improving chunks at a time, so not everything is perfect, but just trying to leave it better with each pass | 19:31 |
JayF | my bar for reviewing is always "is it better than it was" with the exception of things that are potential security risks or user-facing-APIs that are hard to fix later | 19:31 |
TheJulia | heh, looks like I also addressed your first comment largely in the sections being rewritten | 19:33 |
TheJulia | mostly, at least | 19:33 |
JayF | I thought that would be possible | 19:33 |
* TheJulia leaves comments on the last patch | 19:34 | |
opendevreview | Doug Goldstein proposed openstack/sushy master: enable pycodestyle and pyflakes checks in ruff https://review.opendev.org/c/openstack/sushy/+/934915 | 19:37 |
cardoe | So I'm working on bringing the sushy-oem-idrac stuff in... I was going to put it on top of my pyupgrade to Python 3.9 patch. | 19:38 |
opendevreview | Merged openstack/ironic master: First pass on some strucutral context setting for networking https://review.opendev.org/c/openstack/ironic/+/936039 | 19:45 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: begin making a general networking document https://review.opendev.org/c/openstack/ironic/+/936040 | 19:51 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: change network setup steps into the commands https://review.opendev.org/c/openstack/ironic/+/936041 | 19:51 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: rewrite ml2 and update physnet context https://review.opendev.org/c/openstack/ironic/+/936042 | 19:51 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: final cleanup pass on networking https://review.opendev.org/c/openstack/ironic/+/936043 | 19:51 |
opendevreview | cid proposed openstack/ironic-specs master: Add a Kea DHCP backend https://review.opendev.org/c/openstack/ironic-specs/+/931025 | 20:05 |
opendevreview | cid proposed openstack/ironic-tempest-plugin master: Test double encoding of error message https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/935740 | 20:05 |
opendevreview | cid proposed openstack/ironic-tempest-plugin master: Fix test to not expect double-JSON-encoded errs https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/932544 | 20:06 |
cardoe | TheJulia: Do we send "extras" on ports as part of the binding profile? How can a VIF have multiple physical networks? | 20:19 |
adamcarthur5 | https://review.opendev.org/c/openstack/ironic/+/928919/7 and https://review.opendev.org/c/openstack/ironic/+/928920/5 should be ready to be reviewed, and have had +1 verified from after the merge of the shard tests in tempest (so we are testing that we get fails on the right microversions) | 20:21 |
JayF | so adamcarthur5 kinda dumb question about this change | 20:23 |
JayF | is this going to be the first time we've validated schema for shards requests? | 20:23 |
JayF | I'm trying to sniff out if there's any api change; e.g. an error message coming across different in a bad request workflow | 20:23 |
adamcarthur5 | I think it depends what you mean by "validated"? | 20:23 |
adamcarthur5 | Ah - but by that metric, I suspect there is likely to be a change in there somewhere, yes | 20:24 |
adamcarthur5 | It depends on whether the internal-function checks you have already for microversions are consistent. But I could see a 404/406 switcharoo being possible? | 20:24 |
iurygregory | crazy bug in servicing for firmware update, at least in a Dell machine it updates but for some weird reason the new information is not in the DB :facepalm: | 20:24 |
cardoe | Is that the fix you just submitted? | 20:25 |
JayF | adamcarthur5: yeah it just seems weird to me we had no schema on incoming requests before | 20:25 |
JayF | adamcarthur5: let's add a release note to that? Just indicating we're validating input schemas | 20:25 |
iurygregory | cardoe, yup the patch I pushed earlier today, but the fix isn't helping much at least in my testing :D | 20:26 |
JayF | I was waffling but if there's any chance at all it's not invisible, let's put something in there under "features" | 20:26 |
adamcarthur5 | Sounds good JayF, do you think any extra testing is required for the schema validation? | 20:26 |
adamcarthur5 | Like what we did for microversion-fail testing | 20:26 |
JayF | I think if it was my change, I'd want a solid answer to if that 404/406 switch is possible | 20:27 |
JayF | but that is also a very high bar :) | 20:27 |
adamcarthur5 | Yeah, at the very least, the first change is definitely tested and ready to go. I'll keep looking into the 2nd | 20:27 |
JayF | the first already had my +2 | 20:28 |
JayF | reapplied | 20:28 |
JayF | https://review.opendev.org/c/openstack/ironic/+/928919 would be nice to land if someone else who can is around and wants to /me nudges the two other cores watching IRC ;) | 20:29 |
cardoe | I'll look once I'm done reading this doc from Julia. | 20:30 |
cardoe | Do we have a syntax to refer to config options in rst? | 20:31 |
cardoe | or is it really ``[section]option``? I thought there was some other syntax. | 20:31 |
JayF | there is | 20:32 |
cardoe | I feel like I wanna draw some diagrams for some of this stuff too. | 20:32 |
cardoe | I +2'd it. I've been keeping up with the changes on it. | 20:37 |
JayF | awesome, I'll land it \o/ | 20:37 |
JayF | I suspect once adamcarthur5 gets his engine going on the tempest-validation of microversion + schema changes, we'll have a lot of em to review | 20:38 |
JayF | end state of good API test coverage enforcing microversions in tempest + more readable schemas is super exciting | 20:38 |
shermanm | regarding the recent flurry of redfish/virtualmediaboot stuff, I wanted to mention that I ran into someone from the redfish/DMTF forum, and was pretty strongly requested to submit any reports of vendors with misbehaving implementations so they could *encourage* a fix | 20:43 |
shermanm | to https://redfishforum.com/ | 20:44 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/allocations https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/935827 | 20:49 |
opendevreview | cid proposed openstack/ironic-specs master: Add a Kea DHCP backend https://review.opendev.org/c/openstack/ironic-specs/+/931025 | 20:58 |
JayF | Circling back around to the "ironic-ui wants to get /v1/conductors" thing; there *is* a /v1/drivers that appears to have that same info | 21:02 |
JayF | and maybe more structured to redact info from | 21:02 |
TheJulia | shermanm: yes, I believe some of that has occurred :) | 21:08 |
TheJulia | cardoe: we use a couple of different styles, my intent is to batch those sorts of changes up in the last change | 21:08 |
TheJulia | cardoe: also, regarding extras on ports. A VIF cannot have multiple physical networks, but can be mapped across multiple ones, potentially | 21:09 |
TheJulia | think of a physical network as a fabric | 21:10 |
TheJulia | it might be possible to be on physnet1 and physnet2, but not physnet3 | 21:10 |
TheJulia | physnet3 is Julia's secret network of doom plugged into the special hypervisor full of lolcats | 21:10 |
TheJulia | At least, that is my understanding | 21:12 |
TheJulia | a physical port can only be on a single physnet as well | 21:12 |
TheJulia | (the one it is attached to) | 21:12 |
TheJulia | .... Anyone have a delorean that can get up to 88? I need to go back in time and beat ourselves up about overusing "physnets" | 21:13 |
TheJulia | cardoe: since this is a lot of theory, drawings are always good | 21:14 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/936071 | 21:16 |
cardoe | Well I was just asking about the networks cause the wording implied multiple physical networks for the VIF. | 21:18 |
cardoe | fabric = physical network is literally what I'm preaching here. | 21:19 |
shermanm | so, this is a thing I've had a real headache describing to my operators and in our own docs. (caveat, from my perspective): Ideally physical-network == fabric. BUT, it's possible that a single network fabric may have multiple physnets on it, and/or not all of those physnets could be attached to a given port on that fabric | 21:21 |
shermanm | in our case, usually done to convince neutron to treat one set of vlans differently from another, even though they all coexist on the same fabric | 21:21 |
shermanm | so it's very possible that one "baremetal port" (cause I don't quite understand vif in this context yet), could have more than one neutron physical network attached to it, because they're just logical representations on the same underlying fabric | 21:24 |
shermanm | *potentially attached to it | 21:24 |
TheJulia | cardoe: ahhhhh! I could see that, if you can highlight it, it would be worthy of revising a little | 21:27 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/allocations https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/935827 | 21:33 |
shermanm | cardoe: ^ and please ignore my above if I misinterpreted your comment, totally agree that diagrams would be good. | 21:33 |
cardoe | shermanm: so I’m not a network guy at all so maybe I’m off. But a fabric is essentially a pile of interconnected stuff. I’m making VNIs on top through a vxlan type plugin (not the stock one since that’s OVN specific). | 21:33 |
cardoe | Today the ports are clueless about the specific VLANs they are getting in the racks to match up to those VNIs. | 21:34 |
cardoe | But we’re wanting to have the trunking extension. We’ll have to modify nova to special case those ports to create the correct network_data in cloud-init. | 21:35 |
TheJulia | cardoe: so I've done Ethernet and FibreChannel networks in the past, and thinking of them as fabric helps the interconnected nature of the things, they can be cross-connected with constraints, but at that point you might not let everything cross | 21:35 |
shermanm | cardoe: I'm mostly looking at this from the perspective of neutron + openvswitch + ngs with vlans, and that implementation. In that particular case, the only thing a neutron physnet means is which vlans, and which interface on the networking node, will be added to the ovs bridge. IIRC that part remains the same with OVN. So I'm primarily thinking of "fabric" == "domain within which l2 segment identifiers are valid" | 21:36 |
shermanm | but there's nothing that enforces that two different "logical domains" can't occupy the same physical domain, it's just up to the operator to make sure that they don't conflict | 21:37 |
JayF | I would be careful making that kinda assumption, there are places that do things like "this is 10.1/16 locally, and natted to 172.16/16 for other regions" at least if you mean what I think you mean | 21:37 |
JayF | unless you're explicitly saying, trying to draw a line, that it is two fabrics not one? | 21:37 |
JayF | (in that crazy case) | 21:38 |
cardoe | TheJulia: yeah we have piles of cross-connects. And on my roadmap are scheduler crazy for grokking cross-connects. But I’m honestly hoping for that to be scheduled around the heat death of the universe | 21:40 |
cardoe | Some of my gear has ports on a storage fabric and other ports on real network. | 21:45 |
TheJulia | shermanm: this is a valid way of looking at it as well. I guess the ultimate challenge is to frame, and then hope humans don't overcomplicate it more to make it seem simple | 21:45 |
TheJulia | cardoe: Heat death of the universe sounds about right for that level of complexity | 21:46 |
cardoe | Well I would say the person that comes after me but I don’t wish that on them. | 21:46 |
opendevreview | Merged openstack/ironic master: api: Introduce new mechanism for API versioning https://review.opendev.org/c/openstack/ironic/+/928919 | 21:47 |
TheJulia | heh | 21:49 |
shermanm | TheJulia: I do agree on both interpretations if starting from scratch, I mostly wanted to be clear on "do we mean the same thing that neutron does by physical network", and if not, what are the constraints. Even if just "we assume that physnet == fabric, and if you configure neutron to mean something else, you can still only attach one of them <for reasons>" | 21:49 |
TheJulia | it's a good thing to sort of just disambiguate upfront | 21:50 |
TheJulia | or at least, set the same shared context for the rest of the text to be digested with | 21:50 |
cardoe | So shermanm we're integrating to Nautobot to give another layer of info for the operators. Maybe we'll go back to NetBox but we'll see. | 21:50 |
cardoe | But in there we handle the switch templating so we're not using NGS. NGS also doesn't do VXLAN | 21:51 |
TheJulia | VTEP might be a way... | 21:53 |
TheJulia | but yeah | 21:53 |
shermanm | cardoe: I'm ultimately not talking too much about ngs here, everything just falls out of the neutron config for `network_vlan_ranges` in ml2_conf, and `bridge_mappings` in ovs-agent. | 22:05 |
shermanm | https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2_type_vlan.network_vlan_ranges | 22:05 |
shermanm | https://docs.openstack.org/neutron/latest/configuration/openvswitch-agent.html#ovs.bridge_mappings | 22:05 |
shermanm | agree that it doesn't really apply to the vxlan case though | 22:05 |
cardoe | TheJulia: I commented on the spot where it made me think multiple physical networks. And started reading the next patch... where you made a bunch of the changes I suggested in the prior one. So yeah I keep my +2 on the current "next in line" patch. | 22:07 |
cardoe | shermanm: yeah so I don't wanna get vlan ranges from a config file. And in fact that's something I brought up on the PTG call. I need that to be serviceable. | 22:07 |
TheJulia | ... that did come up... what was it | 22:08 |
TheJulia | it feels wrong to be in a config file | 22:08 |
cardoe | Because in all the cases here, I don't fully own the entirety of the fabric. | 22:08 |
shermanm | isn't there something with network-segment-ranges in neutron already? | 22:08 |
TheJulia | also wrong like not giving cats scritches | 22:08 |
TheJulia | cardoe: wasn't there an idea that the fabric should be able to get asked what is free? | 22:08 |
cardoe | yeah | 22:09 |
TheJulia | cardoe: if so, we should ask neutron for it in the form of a bug/rfe | 22:09 |
TheJulia | because... yeah | 22:09 |
TheJulia | it's a really bad pattern | 22:09 |
TheJulia | to rely upon config files | 22:09 |
cardoe | I had a follow up convo with someone who was giving me push back on the call. I forget who. But they ended up agreeing with me and said to make a bug/rfe. | 22:09 |
shermanm | but I also don't mean to debate the docs change and meaning of physnet to death here, I'm super happy with the changes already :) | 22:09 |
cardoe | I need to corner jamesdenton to help me write it. | 22:09 |
TheJulia | jamesdenton: dude, the needful calls! | 22:10 |
cardoe | Basically it's not possible to change a big running system. | 22:10 |
TheJulia | let's re-frame that | 22:10 |
cardoe | Cause they all need to change at the same time otherwise it's a coin flip if you get the old value or the new value. | 22:10 |
cardoe | That's what the neutron dev concluded. | 22:11 |
TheJulia | keeping it running is a herculean task. Re-configuring the planet while Hercules is holding it is a laughable idea. | 22:11 |
cardoe | Cause each server reads the state of the ini and then updates the network-segment-ranges like shermanm was saying and then uses that value going forward. there's some re-sync background RPC thing that causes one of the nodes to update the DB if the DB is different than the state it has and all the other nodes read the value from the DB | 22:13 |
cardoe | Like a quasi-leader/follower without any clear direction who the leader is. | 22:13 |
TheJulia | ugh | 22:15 |
shermanm | so, I know it used to be *only* ini-driven and somewhat implicit, but there is also an api/db driven neutron plugin: | 22:16 |
shermanm | https://docs.openstack.org/neutron/latest/admin/config-network-segment-ranges.html | 22:16 |
shermanm | and distinguishes between the ini-created ones, and the API created ones: | 22:16 |
shermanm | https://docs.openstack.org/neutron/latest/admin/config-network-segment-ranges.html#default-network-segment-ranges | 22:16 |
shermanm | buut I have no idea how complete it is | 22:16 |
cardoe | I honestly didn't dig that deep. I just said updating ini files across a fleet isn't a good management surface for us. And was told I needed to look into network-segment-ranges. | 22:18 |
cardoe | The dev came back with an in-depth email about the whole thing. | 22:20 |
cardoe | Ultimately I wanna use both cause we have provider networks and we've got self-service tenant networks. | 22:20 |
cardoe | Like jamesdenton is one of my tenants for example. And he'll make himself a network and all that jazz. | 22:20 |
shermanm | I'd be curious about what you conclude, totally agree that moving away from ini-based config here is the way to go. I've successfully used the plugin to adjust the valid segment range for creation of new self-service networks on existing physnets, but haven't tried anything with creation/deletion of the actual physnets. | 22:22 |
cardoe | So I dunno what the right answer is. a system scoped API endpoint? | 22:24 |
cardoe | ini-file OR API? | 22:25 |
shermanm | from what I understand, the current state is that it supports both, with the API providing the new mechanism, but also exposing existing ini config for backwards compatibility | 22:30 |
cardoe | so that is just for self-service segments. it does nothing for provider segments | 22:51 |
TheJulia | I'm stepping away shortly | 22:51 |
TheJulia | I should be around again tomorrow morning | 22:51 |
opendevreview | cid proposed openstack/ironic-specs master: Add a Kea DHCP backend https://review.opendev.org/c/openstack/ironic-specs/+/931025 | 23:32 |
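
For readers following the `network_vlan_ranges` / `bridge_mappings` part of the discussion, here is a minimal sketch of how the two ini values relate. It assumes the documented formats (`<physical_network>[:<vlan_min>:<vlan_max>]` and `<physical_network>:<bridge>`, both comma-separated). The function names and the sample values `physnet1`, `physnet2`, and `br-ex` are hypothetical and chosen only for illustration; this is not neutron's own parser.

```python
# Illustrative only: hypothetical helpers, not code from neutron.

def parse_network_vlan_ranges(value):
    """Parse 'physnet1:100:200,physnet2' into {physnet: [(min, max), ...]}."""
    ranges = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        physnet = parts[0]
        # A bare physnet name is valid: usable for provider networks,
        # but with no VLAN IDs available for self-service allocation.
        ranges.setdefault(physnet, [])
        if len(parts) == 3:
            vlan_min, vlan_max = int(parts[1]), int(parts[2])
            if not 1 <= vlan_min <= vlan_max <= 4094:
                raise ValueError(f"invalid VLAN range in {entry!r}")
            ranges[physnet].append((vlan_min, vlan_max))
        elif len(parts) != 1:
            raise ValueError(f"malformed entry {entry!r}")
    return ranges


def parse_bridge_mappings(value):
    """Parse 'physnet1:br-ex' into {physnet: bridge}."""
    mappings = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        physnet, sep, bridge = entry.partition(":")
        if not sep or not physnet or not bridge:
            raise ValueError(f"malformed entry {entry!r}")
        mappings[physnet] = bridge
    return mappings


def unmapped_physnets(vlan_ranges, bridge_mappings):
    """Physnets that have VLAN config but no bridge on this agent."""
    return sorted(set(vlan_ranges) - set(bridge_mappings))


vlan_ranges = parse_network_vlan_ranges("physnet1:100:200,physnet1:300:400,physnet2")
mappings = parse_bridge_mappings("physnet1:br-ex")
print(unmapped_physnets(vlan_ranges, mappings))
```

Because `bridge_mappings` is read per agent, a check like `unmapped_physnets` only says something about one host's file, which is part of why ini-driven configuration across a fleet came up as a pain point above.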