Wednesday, 2025-07-09

iurygregoryTheJulia, https://etherpad.opendev.org/p/ironic-idrac10-issues#L17 01:47
iurygregorylet me know if this is enough or if you need more details01:47
opendevreviewAdam McArthur proposed openstack/ironic master: Update firmware schema to require 'created_at' and 'updated_at' fields  https://review.opendev.org/c/openstack/ironic/+/95335202:14
opendevreviewOpenStack Proposal Bot proposed openstack/ironic-ui stable/2025.1: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-ui/+/95440203:08
opendevreviewOpenStack Proposal Bot proposed openstack/ironic-inspector master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-inspector/+/95440303:12
opendevreviewOpenStack Proposal Bot proposed openstack/ironic-ui stable/2024.1: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-ui/+/95440803:24
rpittaugood morning ironic! o/06:43
queensly[m]Good morning 08:39
masgharGood morning ironic!09:21
dtantsurTheJulia: metal3 folks from Ericsson have FakeIPA - a service imitating IPA without actually doing anything09:29
ContinuityMorning09:30
dtantsurTheJulia: blog https://metal3.io/blog/2024/10/24/Scaling-Kubernetes-with-Metal3-on-Fake-Node.html09:53
opendevreviewVerification of a change to openstack/ironic master failed: Handle unresponsive BMC during Firmware Updates  https://review.opendev.org/c/openstack/ironic/+/93810810:08
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Split hardware manager initialize out of evaluate_hardware_support  https://review.opendev.org/c/openstack/ironic-python-agent/+/95413910:41
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Graceful way for hardware managers to ignore certain devices  https://review.opendev.org/c/openstack/ironic-python-agent/+/95402410:41
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Trivial: avoid root logger in modules  https://review.opendev.org/c/openstack/ironic-python-agent/+/95424310:52
dtantsurJul 09 09:58:37.913976 np7c27e7c2f6dd4 ironic-conductor[109630]: ERROR ironic.common.glance_service.service_utils [None req-f686f925-c178-4e6a-86cb-b5b511371c39 None None] Unable to retrieve image members for image de91c3f0-d6b9-49eb-9936-a857812e575c: 'NoneType' object has no attribute 'image'We already have the code in IPA11:29
dtantsur(ignore the part after 'image')11:30
dtantsurI don't know if it's the root cause of the IPA CI outage, but seems possible11:30
dtantsurERROR ironic.conductor.utils [-] Deploy step deploy.write_image failed on node e9675a4a-a19d-4a04-9121-4f04af9b30d8. Download of image be19ea9a-80a3-4192-8187-d1ab3aa04e65 failed: Unable to write image to /tmp/be19ea9a-80a3-4192-8187-d1ab3aa04e65. Error: [Errno 28] No space left on device11:31
dtantsurSigh, this is the root cause11:31
dtantsurI'm curious why we don't stream the image11:33
dtantsur'disk_format': 'qcow2'11:34
dtantsurHave we broken force_raw?11:35
dtantsurI wonder if the first error causes the raw conversion to break11:36
dtantsurmmm, no, it's probably from downloading IPA or something like this. Does not seem directly related to instance images.11:40
opendevreviewVerification of a change to openstack/ironic master failed: Handle unresponsive BMC during Firmware Updates  https://review.opendev.org/c/openstack/ironic/+/93810811:54
opendevreviewMerged openstack/ironic-ui stable/2024.1: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-ui/+/95440812:07
opendevreviewMerged openstack/ironic-ui stable/2025.1: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-ui/+/95440212:13
TheJuliadtantsur: looks like it is http url based downloads in some cases, I'm planning on focusing on that today if nothing distracts me12:38
dtantsurthx!12:48
TheJuliadtantsur: also, it seems my last ipa-b fix to reduce the memory footprint of the ramdisk didn't work. I did raise a question a few days ago, if anyone objects for us to tune systemd so it also can't try and grab 10% of the ramdisk12:55
dtantsurI cannot imagine many objections (you can leave a way to opt out)12:55
TheJulia(we're basically chewing ~120M of ramdisk storage with $other stuff once it starts to boot)12:56
TheJuliaack12:56
* TheJulia resumes caffeination12:56
opendevreviewDmitry Tantsur proposed openstack/ironic-python-agent master: Hint at sector sizes when reporting an invalid written image  https://review.opendev.org/c/openstack/ironic-python-agent/+/95449813:59
dtantsurTheJulia: you'll "like" this one ^^13:59
TheJuliaoh noes ;)14:00
opendevreviewVerification of a change to openstack/ironic master failed: feat: add verify ca conf support for drivers  https://review.opendev.org/c/openstack/ironic/+/94754414:08
kubajjHello again dtantsur, TheJulia, I have tested the efibootmgr logging and it works well. Will prepare a change soon. I also discussed it with Arne and he still thinks that it would be nice to add logging before the changes happen so we can see what has changed for debugging. Is there anything similar to the collect_system_logs, but closer to the beginning of IPA's runtime? We also discussed that Ironic does not seem to 14:13
kubajjremove options other than duplicates from efibootmgr, is there a reason to leave them there? (In our QA node, we have almost 800 entries in the output.)14:13
dtantsurkubajj: (1) nothing special, just normal logging, (2) I wanted it done, just never got time to work on it (maybe I even filed an LP)14:16
kubajjdtantsur: (1) ok, will evaluate14:16
kubajjdtantsur: (2) would you say it would be difficult? I had a brief look at it and doesn't seem that bad - I am asking as I have a student coming to work on Ironic related stuff and need to give him a simple first task to do to get familiar with Ironic - could be a good fit14:18
dtantsurthat's something where TheJulia may have more opinions since she's spent some time fighting EFI records14:19
TheJuliaI mean, we've got logic to remove our duplicates if memory serves14:25
TheJulia800 entries is super impressive, I'm just worried something is missing or the version might not load it14:25
TheJuliaso I guess a good starting point is a really concise bug with some example efibootmgr output because maybe we're not taking the right path or maybe the machine *could* be ignoring removals14:26
TheJuliakubajj: look for ipa changes from steve baker a couple years ago, he put in code to try and keep the table cleaner, but I guess it is not working. It was in 2023.2 I believe14:26
kubajjTheJulia: I see the part which removes duplicates (we are running Caracal), but since we have so many entries, I thought it is supposed to do something else. Will dig deeper14:28
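For readers following the efibootmgr thread above, a minimal sketch of how duplicate boot entries could be spotted in captured efibootmgr output before filing the bug. This is not IPA's actual cleanup logic; the function names `parse_entries` and `find_duplicates` are hypothetical, and treating two entries as duplicates when they share the same label and device path is an assumption made for illustration. It only parses text that has already been collected and does not invoke efibootmgr itself.

```python
import re
from collections import defaultdict

# Matches entry lines such as "Boot0001* ubuntu<TAB>HD(...)".
# The "*" marks an active entry. Lines like "BootOrder:" or
# "BootCurrent:" do not match because four hex digits must follow "Boot".
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_entries(output):
    """Return (number, active, label, path) tuples from efibootmgr text."""
    entries = []
    for line in output.splitlines():
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue
        number, active, rest = match.groups()
        # Assumption: when a device path is printed, it follows the label
        # after a tab; otherwise the path is left empty.
        label, _, path = rest.partition("\t")
        entries.append((number.upper(), active == "*", label.strip(), path.strip()))
    return entries


def find_duplicates(entries):
    """Group entries by (label, path); return the extra entry numbers per group."""
    groups = defaultdict(list)
    for number, _active, label, path in entries:
        groups[(label, path)].append(number)
    # Keep the first entry of each group and report the rest as duplicates.
    return {key: nums[1:] for key, nums in groups.items() if len(nums) > 1}


if __name__ == "__main__":
    sample = (
        "BootCurrent: 0001\n"
        "BootOrder: 0001,0000,0002\n"
        "Boot0000* ubuntu\tHD(1,GPT)/File(\\EFI\\ubuntu\\shimx64.efi)\n"
        "Boot0001* UEFI: PXE IPv4\n"
        "Boot0002* ubuntu\tHD(1,GPT)/File(\\EFI\\ubuntu\\shimx64.efi)\n"
    )
    for (label, path), numbers in find_duplicates(parse_entries(sample)).items():
        print(label, path, numbers)
```

Running something like this over the output from a node with hundreds of entries would show whether the leftovers are true duplicates or distinct records, which is the distinction the bug report needs to make.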
guilhermespTheJulia: sorry just saw your reply now... i mean, maybe its my lack of xp with it, but im not seeing a way that i set bios.apply_configuration args during cleaning that would apply everything else + secure boot and wont fail on the next boot. I might revisit that after. What im focusing right now is get a set of baseline redfish commands ( with no secure boot ) the user requested to be applied with automated clean ( 14:39
guilhermesp?  ) and play with reset_bios once the system is released :P 14:39
TheJuliaguilhermesp: it may also be the overall high level secure boot flag and work dmitry did, just might not apply to the Lenovo SR650s14:51
TheJuliaAs I mentioned and you're learning, SR650s are *very* different14:51
guilhermesplol indeed... a simple bios.reset_factory is not going very well here. having a lot of fun right now :P 14:54
TheJuliaI guess Lenovo has some stream of specific customer(s) which have demands around the SR650 and its release versions, which drive a lot of it.15:14
TheJuliale-sigh, crashed laptop15:45
opendevreviewJulia Kreger proposed openstack/ironic-python-agent-builder master: Allow dib build to remove firmware in a structure.  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/95452415:58
TheJuliadtantsur: fyi ^16:01
TheJuliathat should help drop the size of the images which should make things a little better16:01
opendevreviewJulia Kreger proposed openstack/ironic master: ci: stabilize ironic-standalone-redfish  https://review.opendev.org/c/openstack/ironic/+/95430316:06
TheJuliadoh, I changed the wrong job16:06
opendevreviewJulia Kreger proposed openstack/ironic-python-agent-builder master: Allow dib build to remove firmware in a structure.  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/95452416:08
dtantsurwill check tomorrow, thanks16:14
opendevreviewChris Krelle proposed openstack/ironic master: update Jinja2 to address CVE-2024-2383  https://review.opendev.org/c/openstack/ironic/+/95390216:17
TheJuliaIs anyone able to login to launchpad?16:22
opendevreviewJulia Kreger proposed openstack/ironic-python-agent-builder master: set a maximum systemd journal size  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/95452716:23
abongaleTheJulia : tried  just now login unsuccessful, bad signature.16:25
TheJuliaokay, so it's not just me then16:25
TheJulia\o/16:25
TheJuliadtantsur: I posted some thoughts on https://review.opendev.org/c/openstack/ironic/+/954363, it might actually be okay so I'm okay with merging. Other reviewers, it would be good to go ahead and review it sooner rather than later so we can do the needful16:35
TheJuliaAnd ultimately stack things up so we can see if https://review.opendev.org/c/openstack/ironic/+/953683 is happy16:36
opendevreviewMerged openstack/ironic master: Handle unresponsive BMC during Firmware Updates  https://review.opendev.org/c/openstack/ironic/+/93810816:50
opendevreviewMerged openstack/ironic master: [docs] Automated cleaning by runbook  https://review.opendev.org/c/openstack/ironic/+/95190116:50
opendevreviewJulia Kreger proposed openstack/ironic-python-agent master: docs: remove tinyipa references  https://review.opendev.org/c/openstack/ironic-python-agent/+/95453416:53
TheJuliaiurygregory: looking at the idrac10 list, I'm thinking the "convert bug to epic" item might be good, we'll just have to figure out when and how we can allocate resourcing on that team to help.16:59
Sandzwerg[m]<TheJulia> "Sandzwerg: the other thing is..." <- I stopped the time it's about 12mins  for the download. I also measured 10mins without nofb but I'm not sure if that is because I measured more accurately or if the removal of nofb made a difference. With the remoteboard NIC is set to Gigabit, the MTU is "only" 1500 but still that could take way less. Next I'm trying to get ubuntu-minimal running. For some reason I seem to need the17:02
Sandzwerg[m]ubuntu-minimal build as root or I get a permission denied when "find" tries to access some apt/list in /tmp. But even as root it fails because it can't install ifupdown, python3-venv and ipmitool which is interesting because they are all available in noble and with jammy I get the same issue. Maybe I need to live with the big disk for now. Then I'll just look for a way to include a open (root) shell so it's possible to debug17:02
Sandzwerg[m]during deployments.17:02
TheJuliaSandzwerg[m]: nofb shouldn't be responsible for that much, but it could also be the IRQ of your "virtual usb device" interface is the same as the graphics adapter17:22
TheJuliaSandzwerg[m]: nomodeset will do some wicked things to the memory interaction and actually cause the kernel to lock the memory even as the host is starting too17:22
TheJuliaso likely, a little more accurate of a measurement, but also you were not angering the graphics interface as much. This is one of the areas where on servers, framebuffers at this point are actually good (and I realize that is counter intuitive, but I've had people benchmark it and prove all of the bad behavior where nomodeset and the graphics adapter tied to a server graphics output (through a fake matrox graphics card) 17:24
TheJuliajust doesn't mix17:24
TheJuliaYeah, that is not a bad idea in general. Truthfully, if you know how to take apart the ramdisk, your biggest memory footprint "bang for your buck" is the firmware binaries17:25
TheJulialike, on centos, that gets put at /usr/lib/firmware and can be hundreds of megabytes17:26
TheJulia(which is bonkers, but some device drivers need it, so our pruning code in the ramdisk build is a relatively light touch)17:26
adamcarthur5Hey TheJulia, you mentioned about reviewing the Ironic Spec (https://review.opendev.org/c/openstack/ironic-specs/+/952533), just wanted to quickly check in about it. Totally understand you have a lot on your plate so no worries if it's just a time constraint :) 17:37
opendevreviewMerged openstack/bifrost master: Default ansible to version 10.x  https://review.opendev.org/c/openstack/bifrost/+/94824517:48
Sandzwerg[m]TheJulia: I'll try to remove nomodeset as well. I assume both nofb and nomodeset are added by DIB? I also try to build centos (in this case 10) but that also fails to find some packages (gdisk in this case). Is that a DIB issue or because gdisk seems to be in the appstream repo of centos? Or Because centos 10 is not yet working with DIB?18:05
Sandzwerg[m]hmm no removing that as well doesn't improve the times. I now got ~10min:30sec and I measured relatively accurately 18:26
TheJuliaSandzwerg[m]: Yeah, they can be, but I *think* we also peeled them both out in more recent dib versions19:33
TheJuliaSandzwerg[m]: you may just need to, unfortunately, extend the timeouts19:33
TheJuliabut still worthwhile double checking the packets between the bmc and the remote server and just make sure nothing fishy is occurring, like retransmissions or dropped packets19:34
cardoeTheJulia: I'm still plugging away at the binding behaviors.21:21
TheJuliacardoe: no worries! I'm starting a new spec. Folks may want to be prepared to scream or hug the idea.21:22
cardoeThanks for the review. Which prompted me to finish testing the patch out on real hardware in a real environment. Which then fell down. But I do see where our existing test cases essentially bang away at every possible behavior in this tiny window of code and then the bigger picture of code is entirely untested.21:22
cardoeEssentially when we have a VXLAN underlay, you'll have VLAN segments to hook the actual machines up. So the behavior we've got is that if there's no created segment that matches today we supply the VXLAN. We have a custom ML2 plugin that dynamically creates the segments. If you look at the Arista, Cisco, and Juniper ML2 plugins which support VXLAN, they are all doing that.21:25
cardoeMy spec for neutron is to formalize that behavior. 21:26
cardoeWhich then Ironic could work with to ensure that the correct ports are hooked up. Because today if I have 6 ports defined in Ironic for example but only 4 actually hooked up. It will sometimes select the 2 un-hooked up ports.21:27
