iurygregory | TheJulia, https://etherpad.opendev.org/p/ironic-idrac10-issues#L17 | 01:47 |
---|---|---|
iurygregory | let me know if this is enough or if you need more details | 01:47 |
opendevreview | Adam McArthur proposed openstack/ironic master: Update firmware schema to require 'created_at' and 'updated_at' fields https://review.opendev.org/c/openstack/ironic/+/953352 | 02:14 |
opendevreview | OpenStack Proposal Bot proposed openstack/ironic-ui stable/2025.1: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-ui/+/954402 | 03:08 |
opendevreview | OpenStack Proposal Bot proposed openstack/ironic-inspector master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-inspector/+/954403 | 03:12 |
opendevreview | OpenStack Proposal Bot proposed openstack/ironic-ui stable/2024.1: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-ui/+/954408 | 03:24 |
rpittau | good morning ironic! o/ | 06:43 |
queensly[m] | Good morning | 08:39 |
masghar | Good morning ironic! | 09:21 |
dtantsur | TheJulia: metal3 folks from Ericsson have FakeIPA - a service imitating IPA without actually doing anything | 09:29 |
Continuity | Morning | 09:30 |
dtantsur | TheJulia: blog https://metal3.io/blog/2024/10/24/Scaling-Kubernetes-with-Metal3-on-Fake-Node.html | 09:53 |
opendevreview | Verification of a change to openstack/ironic master failed: Handle unresponsive BMC during Firmware Updates https://review.opendev.org/c/openstack/ironic/+/938108 | 10:08 |
opendevreview | Verification of a change to openstack/ironic-python-agent master failed: Split hardware manager initialize out of evaluate_hardware_support https://review.opendev.org/c/openstack/ironic-python-agent/+/954139 | 10:41 |
opendevreview | Verification of a change to openstack/ironic-python-agent master failed: Graceful way for hardware managers to ignore certain devices https://review.opendev.org/c/openstack/ironic-python-agent/+/954024 | 10:41 |
opendevreview | Verification of a change to openstack/ironic-python-agent master failed: Trivial: avoid root logger in modules https://review.opendev.org/c/openstack/ironic-python-agent/+/954243 | 10:52 |
dtantsur | Jul 09 09:58:37.913976 np7c27e7c2f6dd4 ironic-conductor[109630]: ERROR ironic.common.glance_service.service_utils [None req-f686f925-c178-4e6a-86cb-b5b511371c39 None None] Unable to retrieve image members for image de91c3f0-d6b9-49eb-9936-a857812e575c: 'NoneType' object has no attribute 'image'We already have the code in IPA | 11:29 |
dtantsur | (ignore the part after 'image') | 11:30 |
dtantsur | I don't know if it's the root cause of the IPA CI outage, but seems possible | 11:30 |
dtantsur | ERROR ironic.conductor.utils [-] Deploy step deploy.write_image failed on node e9675a4a-a19d-4a04-9121-4f04af9b30d8. Download of image be19ea9a-80a3-4192-8187-d1ab3aa04e65 failed: Unable to write image to /tmp/be19ea9a-80a3-4192-8187-d1ab3aa04e65. Error: [Errno 28] No space left on device | 11:31 |
dtantsur | Sigh, this is the root cause | 11:31 |
dtantsur | I'm curious why we don't stream the image | 11:33 |
dtantsur | 'disk_format': 'qcow2' | 11:34 |
dtantsur | Have we broken force_raw? | 11:35 |
dtantsur | I wonder if the first error causes the raw conversion to break | 11:36 |
dtantsur | mmm, no, it's probably from downloading IPA or something like this. Does not seem directly related to instance images. | 11:40 |
opendevreview | Verification of a change to openstack/ironic master failed: Handle unresponsive BMC during Firmware Updates https://review.opendev.org/c/openstack/ironic/+/938108 | 11:54 |
opendevreview | Merged openstack/ironic-ui stable/2024.1: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-ui/+/954408 | 12:07 |
opendevreview | Merged openstack/ironic-ui stable/2025.1: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-ui/+/954402 | 12:13 |
TheJulia | dtantsur: looks like it is http url based downloads in some cases, I'm planning on focusing on that today if nothing distracts me | 12:38 |
dtantsur | thx! | 12:48 |
TheJulia | dtantsur: also, it seems my last ipa-b fix to reduce the memory footprint of the ramdisk didn't work. I did raise a question a few days ago, asking whether anyone objects to us tuning systemd so it also can't try and grab 10% of the ramdisk | 12:55 |
dtantsur | I cannot imagine many objections (you can leave a way to opt out) | 12:55 |
TheJulia | (we're basically chewing ~120M of ramdisk storage with $other stuff once it starts to boot) | 12:56 |
TheJulia | ack | 12:56 |
* TheJulia | resumes caffeination | 12:56 |
opendevreview | Dmitry Tantsur proposed openstack/ironic-python-agent master: Hint at sector sizes when reporting an invalid written image https://review.opendev.org/c/openstack/ironic-python-agent/+/954498 | 13:59 |
dtantsur | TheJulia: you'll "like" this one ^^ | 13:59 |
TheJulia | oh noes ;) | 14:00 |
opendevreview | Verification of a change to openstack/ironic master failed: feat: add verify ca conf support for drivers https://review.opendev.org/c/openstack/ironic/+/947544 | 14:08 |
kubajj | Hello again dtantsur, TheJulia, I have tested the efibootmgr logging and it works well. Will prepare a change soon. I also discussed it with Arne and he still thinks that it would be nice to add logging before the changes happen so we can see what has changed for debugging. Is there anything similar to the collect_system_logs, but closer to the beginning of IPA's runtime? We also discussed that Ironic does not seem to remove options other than duplicates from efibootmgr, is there a reason to leave them there? (In our QA node, we have almost 800 entries in the output.) | 14:13 |
dtantsur | kubajj: (1) nothing special, just normal logging, (2) I wanted it done, just never got time to work on it (maybe I even filed an LP) | 14:16 |
kubajj | dtantsur: (1) ok, will evaluate | 14:16 |
kubajj | dtantsur: (2) would you say it would be difficult? I had a brief look at it and it doesn't seem that bad - I am asking as I have a student coming to work on Ironic-related stuff and need to give him a simple first task to get familiar with Ironic - could be a good fit | 14:18 |
dtantsur | that's something where TheJulia may have more opinions since she's spent some time fighting EFI records | 14:19 |
TheJulia | I mean, we've got logic to remove our duplicates if memory serves | 14:25 |
TheJulia | 800 entries is super impressive, I'm just worried something is missing or the version might not load it | 14:25 |
TheJulia | so I guess a good starting point is a really concise bug with some example efibootmgr output because maybe we're not taking the right path or maybe the machine *could* be ignoring removals | 14:26 |
TheJulia | kubajj: look for ipa changes from steve baker a couple years ago, he put in code to try and keep the table cleaner, but I guess it is not working. It was in 2023.2 I believe | 14:26 |
kubajj | TheJulia: I see the part which removes duplicates (we are running Caracal), but since we have so many entries, I thought it is supposed to do something else. Will dig deeper | 14:28 |
guilhermesp | TheJulia: sorry just saw your reply now... i mean, maybe its my lack of xp with it, but im not seeing a way that i set bios.apply_configuration args during cleaning that would apply everything else + secure boot and wont fail on the next boot. I might revisit that after. What im focusing right now is get a set of baseline redfish commands (with no secure boot) the user requested to be applied with automated clean (?) and play with reset_bios once the system is released :P | 14:39 |
TheJulia | guilhermesp: it may also be the overall high level secure boot flag and work dmitry did, just might not apply to the Lenovo SR650s | 14:51 |
TheJulia | As I mentioned and you're learning, SR650s are *very* different | 14:51 |
guilhermesp | lol indeed... a simple bios.reset_factory is not going very well here. having a lot of fun right now :P | 14:54 |
TheJulia | I guess Lenovo has some stream of specific customer(s) which have demands around the SR650 and its release versions, which drives a lot of it. | 15:14 |
TheJulia | le-sigh, crashed laptop | 15:45 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent-builder master: Allow dib build to remove firmware in a structure. https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/954524 | 15:58 |
TheJulia | dtantsur: fyi ^ | 16:01 |
TheJulia | that should help drop the size of the images which should make things a little better | 16:01 |
opendevreview | Julia Kreger proposed openstack/ironic master: ci: stabilize ironic-standalone-redfish https://review.opendev.org/c/openstack/ironic/+/954303 | 16:06 |
TheJulia | doh, I changed the wrong job | 16:06 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent-builder master: Allow dib build to remove firmware in a structure. https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/954524 | 16:08 |
dtantsur | will check tomorrow, thanks | 16:14 |
opendevreview | Chris Krelle proposed openstack/ironic master: update Jinja2 to address CVE-2024-2383 https://review.opendev.org/c/openstack/ironic/+/953902 | 16:17 |
TheJulia | Is anyone able to login to launchpad? | 16:22 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent-builder master: set a maximum systemd journal size https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/954527 | 16:23 |
abongale | TheJulia: tried just now, login unsuccessful, bad signature. | 16:25 |
TheJulia | okay, so it's not just me then | 16:25 |
TheJulia | \o/ | 16:25 |
TheJulia | dtantsur: I posted some thoughts on https://review.opendev.org/c/openstack/ironic/+/954363, it might actually be okay so I'm okay with merging. Other reviewers, it would be good to go ahead and review it sooner rather than later so we can do the needful | 16:35 |
TheJulia | And ultimately stack things up so we can see if https://review.opendev.org/c/openstack/ironic/+/953683 is happy | 16:36 |
opendevreview | Merged openstack/ironic master: Handle unresponsive BMC during Firmware Updates https://review.opendev.org/c/openstack/ironic/+/938108 | 16:50 |
opendevreview | Merged openstack/ironic master: [docs] Automated cleaning by runbook https://review.opendev.org/c/openstack/ironic/+/951901 | 16:50 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent master: docs: remove tinyipa references https://review.opendev.org/c/openstack/ironic-python-agent/+/954534 | 16:53 |
TheJulia | iurygregory: looking at the idrac10 list, I'm thinking the "convert bug to epic" item might be good, we'll just have to figure out when and how we can allocate resourcing on that team to help. | 16:59 |
Sandzwerg[m] | <TheJulia> "Sandzwerg: the other thing is..." <- I timed it: it's about 12 mins for the download. I also measured 10 mins without nofb, but I'm not sure if that is because I measured more accurately or if the removal of nofb made a difference. The remote board NIC is set to Gigabit and the MTU is "only" 1500, but still that could take way less. Next I'm trying to get ubuntu-minimal running. For some reason I seem to need to run the ubuntu-minimal build as root or I get a permission denied when "find" tries to access some apt/list in /tmp. But even as root it fails because it can't install ifupdown, python3-venv and ipmitool, which is interesting because they are all available in noble, and with jammy I get the same issue. Maybe I need to live with the big disk for now. Then I'll just look for a way to include an open (root) shell so it's possible to debug during deployments. | 17:02 |
TheJulia | Sandzwerg[m]: nofb shouldn't be responsible for that much, but it could also be the IRQ of your "virtual usb device" interface is the same as the graphics adapter | 17:22 |
TheJulia | Sandzwerg[m]: nomodeset will do some wicked things to the memory interaction and actually cause the kernel to lock the memory even as the host is starting too | 17:22 |
TheJulia | so likely, a little more accurate of a measurement, but also you were not angering the graphics interface as much. This is one of the areas where, on servers, framebuffers at this point are actually good (and I realize that is counterintuitive, but I've had people benchmark it and prove all of the bad behavior where nomodeset and the graphics adapter tied to a server graphics output (through a fake matrox graphics card) just don't mix) | 17:24 |
TheJulia | Yeah, that is not a bad idea in general. Truthfully, if you know how to take apart the ramdisk, your biggest memory footprint "bang for your buck" is the firmware binaries | 17:25 |
TheJulia | like, on centos, that gets put at /usr/lib/firmware and can be hundreds of megabytes | 17:26 |
TheJulia | (which is bonkers, but some device drivers need it, so our pruning code in the ramdisk build is a relatively light touch) | 17:26 |
adamcarthur5 | Hey TheJulia, you mentioned reviewing the Ironic Spec (https://review.opendev.org/c/openstack/ironic-specs/+/952533), just wanted to quickly check in about it. Totally understand you have a lot on your plate so no worries if it's just a time constraint :) | 17:37 |
opendevreview | Merged openstack/bifrost master: Default ansible to version 10.x https://review.opendev.org/c/openstack/bifrost/+/948245 | 17:48 |
Sandzwerg[m] | TheJulia: I'll try to remove nomodeset as well. I assume both nofb and nomodeset are added by DIB? I also tried to build centos (in this case 10) but that also fails to find some packages (gdisk in this case). Is that a DIB issue or because gdisk seems to be in the appstream repo of centos? Or because centos 10 is not yet working with DIB? | 18:05 |
Sandzwerg[m] | hmm no, removing that as well doesn't improve the times. I now got ~10min:30sec and I measured relatively accurately | 18:26 |
TheJulia | Sandzwerg[m]: Yeah, they can be, but I *think* we also peeled them both out in more recent dib versions | 19:33 |
TheJulia | Sandzwerg[m]: you may just need to, unfortunately, extend the timeouts | 19:33 |
TheJulia | but still worthwhile double checking the packets between the bmc and the remote server and just make sure nothing fishy is occurring like retransmissions or dropped packets | 19:34 |
cardoe | TheJulia: I'm still plugging away at the binding behaviors. | 21:21 |
TheJulia | cardoe: no worries! I'm starting a new spec. Folks may want to be prepared to scream or hug the idea. | 21:22 |
cardoe | Thanks for the review. Which prompted me to finish testing the patch out on real hardware in a real environment. Which then fell down. But I do see where our existing test cases essentially bang away at every possible behavior in this tiny window of code and then the bigger picture of code is entirely untested. | 21:22 |
cardoe | Essentially when we have a VXLAN underlay, you'll have VLAN segments to hook the actual machines up. So the behavior we've got is that if there's no created segment that matches today we supply the VXLAN. We have a custom ML2 plugin that dynamically creates the segments. If you look at the Arista, Cisco, and Juniper ML2 plugins which support VXLAN, they are all doing that. | 21:25 |
cardoe | My spec for neutron is to formalize that behavior. | 21:26 |
cardoe | Which then Ironic could work with to ensure that the correct ports are hooked up. Because today, if I have 6 ports defined in Ironic, for example, but only 4 actually hooked up, it will sometimes select the 2 un-hooked-up ports. | 21:27 |
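
Editor's note on the efibootmgr thread (14:13 to 14:28): kubajj's question about hundreds of leftover boot entries is easier to reason about with a concrete parse of the tool's output. The following is a minimal sketch, not IPA's actual clean-up logic. It assumes the common `efibootmgr` listing layout in which each entry line reads `BootXXXX`, then `*` (active) or a space (inactive), then a space, the label, and optionally a tab followed by the device path. The function name and the sample data are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout of an entry line: "Boot0001* label<TAB>device path".
# Header lines such as "BootCurrent:" or "BootOrder:" do not match,
# because "Boot" is not followed by four hex digits there.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*| ) (.*)$")


def find_duplicate_entries(output):
    """Group boot entries by (label, device path) and return the repeats.

    Returns a dict mapping (label, path) to the list of boot numbers
    sharing it, in the order they appear in the listing.
    """
    groups = defaultdict(list)
    for line in output.splitlines():
        match = ENTRY_RE.match(line)
        if not match:
            continue  # not a boot entry line
        number, _active, rest = match.groups()
        label, _, path = rest.partition("\t")
        groups[(label.strip(), path.strip())].append(number)
    return {key: nums for key, nums in groups.items() if len(nums) > 1}


# Made-up sample listing, for illustration only.
SAMPLE = "\n".join([
    "BootCurrent: 0000",
    "BootOrder: 0000,0001,0002",
    "Boot0000* ironic1\tHD(1,GPT,1111)",
    "Boot0001* PXE\tMAC(aabbccddeeff,0)",
    "Boot0002  ironic1\tHD(1,GPT,1111)",
])

if __name__ == "__main__":
    for (label, path), numbers in find_duplicate_entries(SAMPLE).items():
        print(label, path, numbers)
```

A listing like the 800-entry one mentioned above could be fed through such a helper to see whether the entries are true duplicates (same label and path) or distinct records, which is the kind of detail TheJulia asked for in a bug report.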