Tuesday, 2025-03-04

opendevreviewSatoshi Shirosaka proposed openstack/ironic master: Add ignore_project_check_for_admin_tasks config option  https://review.opendev.org/c/openstack/ironic/+/94302801:30
opendevreviewSatoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager  https://review.opendev.org/c/openstack/ironic-python-agent/+/94171401:55
opendevreviewMerged openstack/ironic master: doc: updates to anaconda deploy interface  https://review.opendev.org/c/openstack/ironic/+/94283904:48
rpittaugood morning ironic! o/08:00
rpittauTheJulia: I've started with the cycle highlights, I will publish the patch today08:00
mdfrgooood morning!09:18
rpittauwould like to release bifrost, just missing this small bit when any core has a moment https://review.opendev.org/c/openstack/bifrost/+/942767 thanks! :)09:53
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Fix the way qemu-img is called with prlimits  https://review.opendev.org/c/openstack/ironic-python-agent/+/94269010:02
opendevreviewVerification of a change to openstack/ironic-tempest-plugin master failed: CI: Dial back the non-voting jobs  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/94284610:44
opendevreviewDerek Higgins proposed openstack/sushy-tools master: Redefine libvirt domain on Restart  https://review.opendev.org/c/openstack/sushy-tools/+/94323911:03
opendevreviewMerged openstack/ironic master: Replace deprecated abc.abstractclassmethod  https://review.opendev.org/c/openstack/ironic/+/94311312:19
opendevreviewMerged openstack/ironic master: Drop direct dependency on iso8601  https://review.opendev.org/c/openstack/ironic/+/94302312:19
TheJuliaHey folks, I'm sure steve would love some reviews on https://review.opendev.org/q/topic:%22novnc_proxy%2214:10
rpittauTheJulia: I have some of them open, left a comment on the docs one and going to add reviews on the others now14:16
TheJuliaYeah, I'm less worried about the docs at the moment since those are easy to backport if it comes to it14:17
rpittauyep14:17
rpittauTheJulia: btw if you have time https://review.opendev.org/c/openstack/releases/+/943229 :)14:19
JayFWe should also try to get inspection rules over the line before we release14:20
rpittauJayF: that would be really nice14:20
TheJuliathe entry on line 35 was more-so a bug, and we can likely drop it14:20
cardoeSo I asked a question why the container image has to be 1:1 tied to the conductor?14:21
TheJuliacardoe: can you elaborate a little bit more, the coffee is not filling in the gaps yet :)14:21
TheJuliaso https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c90/942010/2/check/ironic-standalone-ipa-src/c90f882/ is a fairly clean example of the cirros vm starting, not logging anything, but failing to be pingable14:39
TheJuliaAnd guess what we have now: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c90/942010/2/check/ironic-standalone-ipa-src/c90f882/controller/logs/screen-q-dhcp.txt14:41
TheJuliaSearch for The process should not have died14:41
rpittauah!14:46
opendevreviewJulia Kreger proposed openstack/ironic master: CI: Change standalone jobs over to OVN  https://review.opendev.org/c/openstack/ironic/+/94332414:50
TheJuliaThe standalone jobs are where we're the most vulnerable next to the multinode jobs14:57
TheJuliaSo... I guess we'll see if OVN works14:57
TheJuliaAlso looks like Simon Kelley, the dnsmasq maintainer, is gearing up for 2.91's release14:57
TheJuliawhich has many crash fixes but also lots of churn14:58
cardoeTheJulia: ah so the tie up seems to be that "conductor X" will host the container for "node Y". And it's 1:1 so that you use the host IP of that conductor. Any reason why we shouldn't really say that's "conductor group"?15:20
TheJuliaI'm still not sure what you're talking about :(15:21
cardoee.g. if I wanna have a backend implementation that has 1 host be my VNC proxy box for a given conductor group.15:21
TheJuliaoh15:21
TheJuliaso, the host maps into a conductor group and each host's responsibility *should* be managed by the existence of the node being mapped to the conductor through the hash ring, and then the hash ring update would move responsibility based upon the hash ring15:22
cardoeokay so then that should be okay now.15:23
cardoeBasically I've got k8s running my ironic and I've got different conductor groups that run on specifically labeled nodes.15:23
cardoeSo I was thinking I would make an implementation that used the k8s service account token to be able to create a pod or something.15:25
cardoeLots of hand waving here.15:25
TheJuliaYes, so in your case, you'll likely want an operator (if steve's future operator doesn't fit your needs) to spin a container up for each node, so the conductor is still responsible, and that may move in the cluster, but you can re-spoke the vnc containers out on your cluster separately15:26
cardoeI didn't stay at a Holiday Inn Express last night so could all be non-sense out of my mouth.15:26
TheJuliayeah, for running on openshift, we're going to need an operator and likely appropriate naming so the container might live on and be managed entirely by the operator status updates15:27
TheJuliathe conductor will be responsible for doing the first step in interaction where the operator might do a thing, but this is the basis for it ending up being a pluggable interface.15:27
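
The hash ring behaviour described at 15:22 (a node maps to a conductor, and responsibility moves only when the ring changes) can be illustrated with a toy consistent-hash ring. This is a minimal sketch and not Ironic's actual implementation: the class name ToyHashRing, the conductor names, and the replica count are all hypothetical, chosen only for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ToyHashRing:
    """Illustrative consistent-hash ring (hypothetical, not Ironic's code)."""

    def __init__(self, conductors, replicas=32):
        if not conductors:
            raise ValueError("at least one conductor is required")
        # Each conductor gets several points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in conductors
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    def conductor_for(self, node_uuid: str) -> str:
        """Return the conductor owning the first ring point after the node."""
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._keys)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = ToyHashRing(["cond-a", "cond-b", "cond-c"])
    print(ring.conductor_for("node-0"))
```

The property that matters for the discussion is that dropping one conductor from the ring only reassigns the nodes that conductor owned; every other node keeps its conductor, so per-node resources such as a console container would only need to move for the affected nodes.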
opendevreviewMerged openstack/bifrost master: Remove ubuntu bionic support leftovers  https://review.opendev.org/c/openstack/bifrost/+/94276715:42
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Run metal3 integration job using UEFI boot (default)  https://review.opendev.org/c/openstack/ironic/+/93969416:06
opendevreviewVerification of a change to openstack/ironic master failed: Add systemd provider for console containers  https://review.opendev.org/c/openstack/ironic/+/94161416:16
opendevreviewVerification of a change to openstack/ironic master failed: Implement drivers redfish-graphical, fake-graphical  https://review.opendev.org/c/openstack/ironic/+/94161516:16
opendevreviewVerification of a change to openstack/ironic master failed: Add vnc-container image build  https://review.opendev.org/c/openstack/ironic/+/94201716:16
cardoewrt to https://review.opendev.org/c/openstack/ironic/+/940333 what do I need to do? I've been trying to poke dtantsur to review it for the past 2 team meetings.16:49
dtantsursorry, meetings recently tend to overlap with other stuff (and yesterday I was on a vacation)16:50
dtantsurI'll keep it open in front of me16:50
dtantsuryeah, W+1, sorry for the delay again16:55
opendevreviewNicholas Kuechler proposed openstack/ironic-python-agent-builder master: fix: Adds bsdextrautils package for Debian which provides hexdump  command  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/94333416:55
dtantsurConnecting to repo.tinycorelinux.net (128.127.66.77:80)17:07
dtantsurwget: download timed out17:07
dtantsursigh17:07
cardoedtantsur: thank you. this will at least get all the infra for the redfish in before 2025.1. so it can be developed out of band better.17:12
dtantsur++17:18
TheJuliaHas anyone looked at the metal3 job logs to see what is going on?17:33
TheJuliaugh, tinycore17:34
JayFTheJulia: it appears the dnsmasq downgrade is no longer running17:35
JayFhttps://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c90/942010/2/check/ironic-standalone-ipa-src/c90f882/job-output.txt at 2025-03-04 05:09:22.882915 17:35
JayFhmm 2.90-2build2 is in ubuntu17:36
JayFis it possible they pushed an update that /didn't/ include all the fixes our stuff was doing?17:36
JayF2025-03-04 05:02:54.198281 | controller | Get:94 https://mirror-int.ord.rax.opendev.org/ubuntu noble/main amd64 dnsmasq-base amd64 2.90-2build2 [375 kB]17:36
TheJuliaso, there is sort of a mixed bag going on here17:37
TheJuliasince we did also remove our explicit downgrade because the logic was also no longer checking, and we kept the downgrade wording even though it was... 2.89 we were downgrading to 2.86 but went to accepting 2.90-ubuntu-something17:38
JayFhttps://github.com/openstack/ironic/blob/master/devstack/lib/ironic#L3674 needs to not say 2.90 anymore, I think 17:38
JayFyep exactly17:38
JayFwe're at the same conclusion I think17:38
TheJuliaI simplified that check to just be 2.90 so we wouldn't try to invoke ubuntu specifics on every other os17:38
JayFthe ubuntu version changed, (I checked) it doesn't include the patch we need17:38
JayFso the missing dnsmasq downgrade is certainly hurting is17:38
TheJuliale sigh17:38
JayF**us17:39
TheJuliaThen we need to revise it, but maybe keep explicitly only run it on ubuntu17:39
JayFit's already under is_ubuntu17:39
TheJuliaahh, I think that was the other thing I did as well17:39
TheJuliaI honestly don't remember it at this point17:39
JayFhonestly given now in the build we remove the dnsmasq_dir 17:39
JayFI'm tempted to remove all the idempotence logic17:40
JayFsince I think we can compile every time we stack with no harm17:40
TheJuliaI guess17:40
TheJuliain any event, the standalone jobs all passed so I workflowed my default change anyhow17:40
TheJuliaas it is CI config only17:40
JayFI'm not looking for a specific change so much as was curious why dnsmasq started crashing again17:41
TheJuliayeah, no idea17:41
JayFI can try to whip up a fix for that to ensure our upgrades happen again?17:41
TheJuliabut if the specific crash fixes we expected in 2.90 are no longer in the package, who knows17:41
TheJuliasure, fwiw, looking at the change log17:41
TheJuliaI'd consider 2.91rc5 from git17:42
JayFhttps://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=f006be7842104a9f86fbf419326b7aad08ade61d 17:42
JayF(october) but 2.90-deb2 is from 12 months ago17:42
JayFyeah, I'm going to pull the idempotence17:42
TheJuliaugh17:42
JayFactually, even better idea incoming17:43
opendevreviewJay Faulkner proposed openstack/ironic master: Restore recompile of dnsmasq  https://review.opendev.org/c/openstack/ironic/+/94333917:46
* JayF goes back into his SCALE-talk-writing bunker17:47
opendevreviewJay Faulkner proposed openstack/ironic master: Restore recompile of dnsmasq  https://review.opendev.org/c/openstack/ironic/+/94333917:47
opendevreviewVerification of a change to openstack/ironic master failed: Add systemd provider for console containers  https://review.opendev.org/c/openstack/ironic/+/94161417:58
opendevreviewMerged openstack/ironic master: allow multiple inspection interfaces to load hooks  https://review.opendev.org/c/openstack/ironic/+/94033318:20
opendevreviewDoug Goldstein proposed openstack/ironic master: fix glance metadata layout  https://review.opendev.org/c/openstack/ironic/+/94249618:23
opendevreviewNicholas Kuechler proposed openstack/ironic-python-agent-builder master: fix: Adds bsdextrautils package for Debian which provides hexdump  command  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/94333419:05
opendevreviewJay Faulkner proposed openstack/ironic master: Restore recompile of dnsmasq  https://review.opendev.org/c/openstack/ironic/+/94333919:24
cardoecid: what do we need to do to get your rules stuff complete for the release?19:26
JayFa nonzero part of that has to be "fix CI for ironic", yeah?19:26
cardoeI mean ignoring that.19:26
cardoeI'm just looking at stuff we had as a priority and or close and seeing if we can get it over the line.19:27
cardoePlus I'm really hoping that with this ESXi non-sense in my rear view mirror I can get back to that.19:27
cidI think, mostly a review and votes :)19:27
cardoehttps://github.com/rackerlabs/esxi-img is ultimately what we did as an approach. Still need to fix up license and give Sandzwerg[m] some appropriate doots as he sees fit.19:28
cardoeI'll get it up on pypi.org shortly but you can basically do "uvx esxi-img gen-img path/to/esxi.iso" and you get a file dropped in your cwd that's an image that you can upload to glance and use the direct deploy interface with a config-drive and it'll "work on my box"19:30
cardoeThere's easily a half a dozen cases we've thought of that split out "patches welcome"19:32
cardoee.g. BIOS boot... we only did UEFI19:32
opendevreviewVerification of a change to openstack/ironic master failed: CI: Change standalone jobs over to OVN  https://review.opendev.org/c/openstack/ironic/+/94332419:33
opendevreviewJulia Kreger proposed openstack/ironic master: network testing: hooking in an external network simulator  https://review.opendev.org/c/openstack/ironic/+/94229819:43
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Add network simulator support for force10 os 10  https://review.opendev.org/c/openstack/ironic/+/94334519:43
opendevreviewJulia Kreger proposed openstack/ironic master: network testing: hooking in an external network simulator  https://review.opendev.org/c/openstack/ironic/+/94229819:43
JayFcardoe: you should blog that19:44
JayFcardoe: or get it written down /somewhere/ public.19:44
JayFI don't think it goes in openstack docs if it's that, but  that information should be preserved19:44
JayFmaybe good fodder for ironicbaremetal.org? IDK19:45
cardoeSure.19:45
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Add network simulator support for force10 os 10  https://review.opendev.org/c/openstack/ironic/+/94334519:45
TheJuliacardoe: bios boot... wut?! ;)19:46
TheJuliaironicbaremetal.org posts are a good idea19:47
cardoele sigh... metal3 test hates code again19:49
cardoeso btw I tried to install Fedora last night using the Anaconda driver and it 100% doesn't work on 2024.2. So the CERN folks must not have upgraded. Applying my glance metadata patch fixed it. So I stand by my backport request.19:50
cardoeI feel like the metal3 job needs to have a fast out.19:52
cardoeIt already tripped over its feet for reasons that aren't clear to me. But then it's doing TWO 2400 second waits cause maybe you wanna wait a while. If my math is right that's an HOUR AND TWENTY MINUTES when it's literally gonna fail.19:54
TheJulia... cardoe have you spotted anything with metal3 yet? I'm wondering if we just need to make the job non-voting, because it now seems to be in the 50% failure rate range for the day19:58
TheJuliaI just realized you clearly have19:58
TheJuliaYeah, that's no good :(19:58
cardoehttps://zuul.opendev.org/t/openstack/stream/b6fb81b0f354411b83e1c45f809b5207?logfile=console.log19:58
cardoenode-0 and node-1 weren't working as best as I can tell @ 19:03. The test then proceeds to wait 2400 seconds for node-0 to maybe become happy even though it's already recorded a failure and looking at the behavior it won't change that FAIL to OK after the fact so the wait is pointless. Now that it's done waiting for node-0, it decides to wait 2400 seconds for node-1.20:01
cardoeI've found some other test logs where node-0 failed it waited 40 minutes and then went to node-1 which worked and it finished running tests but still failed the overall CI job.20:02
cardoeAnd I get that node-0 is ipmi and node-1 is redfish so it's testing different code paths so it's good to see the results of both... but we've got other tests that test redfish and ipmi. So I feel like the metal3 job is really just a full integration test and if any one of the two nodes fail, just fast fail out. I'm not gonna be using the metal3 integration test to troubleshoot why a redfish thing broke. I'll use a redfish specific test job.20:03
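
The "fast out" asked for above can be sketched as a single shared wait that returns as soon as any node has recorded a failure, instead of running two sequential 2400 second waits (2 x 2400 s = 4800 s = 80 minutes). This is only an illustration under stated assumptions: wait_for_nodes, the get_state callback, and the "ok"/"fail" state strings are hypothetical and are not taken from the metal3 dev scripts.

```python
import time

PER_NODE_TIMEOUT = 2400  # seconds, the per-node wait mentioned in the log
SEQUENTIAL_WORST_CASE_MINUTES = 2 * PER_NODE_TIMEOUT // 60  # two waits in a row


def wait_for_nodes(get_state, nodes, timeout=PER_NODE_TIMEOUT, interval=10,
                   clock=time.monotonic, sleep=time.sleep):
    """Return True once every node is "ok"; return False on the first
    "fail" or when the single shared deadline passes.

    get_state is a hypothetical callback mapping a node name to a state
    string; any state other than "ok" or "fail" means "still waiting".
    """
    deadline = clock() + timeout
    pending = set(nodes)
    while pending:
        for node in sorted(pending):
            state = get_state(node)
            if state == "fail":
                return False  # fast out: do not wait on the other nodes
            if state == "ok":
                pending.discard(node)
        if not pending:
            break
        if clock() >= deadline:
            return False
        sleep(interval)
    return True


if __name__ == "__main__":
    states = {"node-0": "fail", "node-1": "pending"}
    print(wait_for_nodes(states.get, ["node-0", "node-1"]))
    print(SEQUENTIAL_WORST_CASE_MINUTES)
```

With a shared deadline the worst case is one timeout rather than one per node, and a node that has already failed ends the wait on the first poll.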
cardoeonly reason I've looked at it in detail is that my glance fix has failed 4 times on it20:06
JayFhave you found other patches with similar failures?20:06
JayFJust making sure it's not that your glance patch sneaky-breaks metal3 :D 20:07
cardoeI mean libvirt failed to start the VMs. I guess my patch changed how much CPU heat the test creates which threw off libvirt....20:08
JayFI try to leave my assumptions at the door with that metal3 job20:08
cardoeThe last time IPA generated a MASSIVE log file when it ran inspection and the test has a hard deadline on how long inspection can take and the big payload caused it to still be in continue_inspection() at the timeout20:09
TheJuliaLooks like metal3 is failing because of low disk space20:14
TheJuliahttps://www.irccloud.com/pastebin/y0lkLMp2/20:14
cardoeWhich log did ya find that in?20:18
TheJuliathe ironic.log file20:18
TheJuliahttps://fad549b2938ec7abe38a-5d9007d54f55ca0ff60ac5193693c3de.ssl.cf2.rackcdn.com/943324/1/gate/metal3-integration/de17864/controller/system/df--h.txt20:18
TheJuliawe can't run that job20:19
JayFI believe on some of the rackspace machines, there's some space you can mount up and use20:19
JayFthat isn't prepared by default20:19
JayFso it's likely possible but needs engineering20:20
TheJuliathat... doesn't appear to be the case on this machine20:20
cardoeohhhh before_pivoting... I never looked in there.20:20
JayFhttps://fad549b2938ec7abe38a-5d9007d54f55ca0ff60ac5193693c3de.ssl.cf2.rackcdn.com/943324/1/gate/metal3-integration/de17864/controller/system/lsblk--ap.txt 20:20
TheJuliaonly xvda as a device20:20
JayFxvde1 80G unmounted20:20
TheJuliaohh, yes20:20
cardoeI wish I had a clue what boxes those were20:21
TheJuliaxvde1 as a partition20:21
TheJuliadoesn't show on dmesg output...20:21
cardoeany idea what the machine UUID would be?20:22
cardoethe kernel set it from the VM UUID20:22
cardoehttps://fad549b2938ec7abe38a-5d9007d54f55ca0ff60ac5193693c3de.ssl.cf2.rackcdn.com/943324/1/gate/metal3-integration/de17864/controller/system/lsblk--ap.txt20:23
cardoeJay's right20:23
JayFeither way, given we're on fire, we should -nv the job and file a bug with this research20:24
JayFIMHO20:24
cardoeagreed20:24
cardoeIMHO the job needs a fast out cause this isn't the first time that, when it's broken, it's taken 2+ hours to free up the nodes.20:25
TheJuliayeah, agree20:25
TheJuliaLooking at the structure of the way the job runs and sets it up, we likely need someone who knows the ins and outs of the metal3 dev scripts to figure out how they would prefer to wire in using that space20:27
TheJuliahmm20:28
TheJuliait has a knob20:28
TheJuliaoh, it's for the VMs built20:29
opendevreviewJulia Kreger proposed openstack/ironic master: CI: Make metal3 non-voting  https://review.opendev.org/c/openstack/ironic/+/94334720:32
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Fix the way qemu-img is called with prlimits  https://review.opendev.org/c/openstack/ironic-python-agent/+/94269020:42
opendevreviewVerification of a change to openstack/ironic-python-agent-builder master failed: More reliable TinyIPA builds with network retries  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/94236920:55
opendevreviewSatoshi Shirosaka proposed openstack/ironic-python-agent-builder master: WIP Test Podman for container-based cleaning  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/94334821:16
opendevreviewVerification of a change to openstack/ironic master failed: CI: Change standalone jobs over to OVN  https://review.opendev.org/c/openstack/ironic/+/94332421:18
opendevreviewSatoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager  https://review.opendev.org/c/openstack/ironic-python-agent/+/94171421:20
opendevreviewSatoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager  https://review.opendev.org/c/openstack/ironic-python-agent/+/94171421:24
opendevreviewVerification of a change to openstack/ironic master failed: Add systemd provider for console containers  https://review.opendev.org/c/openstack/ironic/+/94161421:34
JayFGross. All the CI passed on my DNSmasq except for pep 822:24
JayFI'm going to clean it up and re-push. Might be a good idea for someone to pre-approve it if we want it to merge overnight22:24
opendevreviewJay Faulkner proposed openstack/ironic master: Restore recompile of dnsmasq  https://review.opendev.org/c/openstack/ironic/+/94333922:26
TheJuliaI just workflowed the metal3 to nv change as well and noted the consensus this morning22:49
opendevreviewVerification of a change to openstack/ironic-tempest-plugin master failed: Add retries while waiting for SSH on server  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/94200923:03
JayFthank you23:37
opendevreviewSatoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager  https://review.opendev.org/c/openstack/ironic-python-agent/+/94171423:49
opendevreviewMerged openstack/ironic master: CI: Change standalone jobs over to OVN  https://review.opendev.org/c/openstack/ironic/+/94332423:51
