opendevreview | Satoshi Shirosaka proposed openstack/ironic master: Add ignore_project_check_for_admin_tasks config option https://review.opendev.org/c/openstack/ironic/+/943028 | 01:30 |
---|---|---|
opendevreview | Satoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager https://review.opendev.org/c/openstack/ironic-python-agent/+/941714 | 01:55 |
opendevreview | Merged openstack/ironic master: doc: updates to anaconda deploy interface https://review.opendev.org/c/openstack/ironic/+/942839 | 04:48 |
rpittau | good morning ironic! o/ | 08:00 |
rpittau | TheJulia: I've started with the cycle highlights, I will publish the patch today | 08:00 |
mdfr | gooood morning! | 09:18 |
rpittau | would like to release bifrost, just missing this small bit when any core has a moment https://review.opendev.org/c/openstack/bifrost/+/942767 thanks! :) | 09:53 |
opendevreview | Verification of a change to openstack/ironic-python-agent master failed: Fix the way qemu-img is called with prlimits https://review.opendev.org/c/openstack/ironic-python-agent/+/942690 | 10:02 |
opendevreview | Verification of a change to openstack/ironic-tempest-plugin master failed: CI: Dial back the non-voting jobs https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/942846 | 10:44 |
opendevreview | Derek Higgins proposed openstack/sushy-tools master: Redefine libvirt domain on Restart https://review.opendev.org/c/openstack/sushy-tools/+/943239 | 11:03 |
opendevreview | Merged openstack/ironic master: Replace deprecated abc.abstractclassmethod https://review.opendev.org/c/openstack/ironic/+/943113 | 12:19 |
opendevreview | Merged openstack/ironic master: Drop direct dependency on iso8601 https://review.opendev.org/c/openstack/ironic/+/943023 | 12:19 |
TheJulia | Hey folks, I'm sure steve would love some reviews on https://review.opendev.org/q/topic:%22novnc_proxy%22 | 14:10 |
rpittau | TheJulia: I have some of them open, left a comment on the docs one and going to add reviews on the others now | 14:16 |
TheJulia | Yeah, I'm less worried about the docs at the moment since those are easy to backport if it comes to it | 14:17 |
rpittau | yep | 14:17 |
rpittau | TheJulia: btw if you have time https://review.opendev.org/c/openstack/releases/+/943229 :) | 14:19 |
JayF | We should also try to get inspection rules over the line before we release | 14:20 |
rpittau | JayF: that would be really nice | 14:20 |
TheJulia | the entry on line 35 was more-so a bug, and we can likely drop it | 14:20 |
cardoe | So I asked a question why the container image has to be 1:1 tied to the conductor? | 14:21 |
TheJulia | cardoe: can you elaborate a little bit more, the coffee is not filling in the gaps yet :) | 14:21 |
TheJulia | so https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c90/942010/2/check/ironic-standalone-ipa-src/c90f882/ is a fairly clean example of the cirros vm starting, not logging anything, but failing to be pingable | 14:39 |
TheJulia | And guess what we have now: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c90/942010/2/check/ironic-standalone-ipa-src/c90f882/controller/logs/screen-q-dhcp.txt | 14:41 |
TheJulia | Search for The process should not have died | 14:41 |
rpittau | ah! | 14:46 |
opendevreview | Julia Kreger proposed openstack/ironic master: CI: Change standalone jobs over to OVN https://review.opendev.org/c/openstack/ironic/+/943324 | 14:50 |
TheJulia | The standalone jobs are where we're the most vulnerable next to the multinode jobs | 14:57 |
TheJulia | So... I guess we'll see if OVN works | 14:57 |
TheJulia | Also looks like Simon Kelley, the dnsmasq maintainer, is gearing up for 2.91's release | 14:57 |
TheJulia | which has many crash fixes but also lots of churn | 14:58 |
cardoe | TheJulia: ah so the tie up seems to be that "conductor X" will host the container for "node Y". And it's 1:1 so that you use the host IP of that conductor. Any reason why we shouldn't really say that's "conductor group"? | 15:20 |
TheJulia | I'm still not sure what you're talking about :( | 15:21 |
cardoe | e.g. if I wanna have a backend implementation that has 1 host be my VNC proxy box for a given conductor group. | 15:21 |
TheJulia | oh | 15:21 |
TheJulia | so, the host maps into a conductor group and each host's responsibility *should* be managed by the existence of the node being mapped to the conductor through the hash ring, and then the hash ring update would move responsibility based upon the hash ring | 15:22 |
cardoe | okay so then that should be okay now. | 15:23 |
cardoe | Basically I've got k8s running my ironic and I've got different conductor groups that run on specifically labeled nodes. | 15:23 |
cardoe | So I was thinking I would make an implementation that used the k8s service account token to be able to create a pod or something. | 15:25 |
cardoe | Lots of hand waving here. | 15:25 |
TheJulia | Yes, so in your case, you'll likely want an operator (if steve's future operator doesn't fit your needs) to spin a container up for each node, so the conductor is still responsible, that may move in the cluster, but you can re-spoke the vnc containers out on your cluster separately | 15:26 |
cardoe | I didn't stay at a Holiday Inn Express last night so could all be non-sense out of my mouth. | 15:26 |
TheJulia | yeah, for running on openshift, we're going to need an operator and likely appropriately name so the container might live on and entirely be managed by the operator status updates | 15:27 |
TheJulia | the conductor will be responsible for doing the first step in interaction where the operator might do a thing, but this is the basis for it ending up being a pluggable interface. | 15:27 |
opendevreview | Merged openstack/bifrost master: Remove ubuntu bionic support leftovers https://review.opendev.org/c/openstack/bifrost/+/942767 | 15:42 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: [WIP] Run metal3 integration job using UEFI boot (default) https://review.opendev.org/c/openstack/ironic/+/939694 | 16:06 |
opendevreview | Verification of a change to openstack/ironic master failed: Add systemd provider for console containers https://review.opendev.org/c/openstack/ironic/+/941614 | 16:16 |
opendevreview | Verification of a change to openstack/ironic master failed: Implement drivers redfish-graphical, fake-graphical https://review.opendev.org/c/openstack/ironic/+/941615 | 16:16 |
opendevreview | Verification of a change to openstack/ironic master failed: Add vnc-container image build https://review.opendev.org/c/openstack/ironic/+/942017 | 16:16 |
cardoe | wrt to https://review.opendev.org/c/openstack/ironic/+/940333 what do I need to do? I've been trying to poke dtantsur to review it for the past 2 team meetings. | 16:49 |
dtantsur | sorry, meetings recently tend to overlap with other stuff (and yesterday I was on a vacation) | 16:50 |
dtantsur | I'll keep it open in front of me | 16:50 |
dtantsur | yeah, W+1, sorry for the delay again | 16:55 |
opendevreview | Nicholas Kuechler proposed openstack/ironic-python-agent-builder master: fix: Adds bsdextrautils package for Debian which provides hexdump command https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/943334 | 16:55 |
dtantsur | Connecting to repo.tinycorelinux.net (128.127.66.77:80) | 17:07 |
dtantsur | wget: download timed out | 17:07 |
dtantsur | sigh | 17:07 |
cardoe | dtantsur: thank you. this will at least get all the infra for the redfish in before 2025.1. so it can be developed out of band better. | 17:12 |
dtantsur | ++ | 17:18 |
TheJulia | Has anyone looked at the metal3 job logs to see what is going on? | 17:33 |
TheJulia | ugh, tinycore | 17:34 |
JayF | TheJulia: it appears the dnsmasq downgrade is no longer running | 17:35 |
JayF | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c90/942010/2/check/ironic-standalone-ipa-src/c90f882/job-output.txt at 2025-03-04 05:09:22.882915 | 17:35 |
JayF | hmm 2.90-2build2 is in ubuntu | 17:36 |
JayF | is it possible they pushed an update that /didn't/ include all the fixes our stuff was doing? | 17:36 |
JayF | 2025-03-04 05:02:54.198281 | controller | Get:94 https://mirror-int.ord.rax.opendev.org/ubuntu noble/main amd64 dnsmasq-base amd64 2.90-2build2 [375 kB] | 17:36 |
TheJulia | so, there is sort of a mixed bag going on here | 17:37 |
TheJulia | since we did also remove our explicit downgrade because the logic was also no longer checking, and we kept the downgrade wording even though it was... 2.89 we were downgrading to 2.86 but went to accepting 2.90-ubuntu-something | 17:38 |
JayF | https://github.com/openstack/ironic/blob/master/devstack/lib/ironic#L3674 needs to not say 2.90 anymore, I think | 17:38 |
JayF | yep exactly | 17:38 |
JayF | we're at the same conclusion I think | 17:38 |
TheJulia | I simplified that check to just be 2.90 so we wouldn't try to invoke ubuntu specifics on every other os | 17:38 |
JayF | the ubuntu version changed, (I checked) it doesn't include the patch we need | 17:38 |
JayF | so the missing dnsmasq downgrade is certainly hurting is | 17:38 |
TheJulia | le sigh | 17:38 |
JayF | **us | 17:39 |
TheJulia | Then we need to revise it, but maybe keep explicitly only run it on ubuntu | 17:39 |
JayF | it's already under is_ubuntu | 17:39 |
TheJulia | ahh, I think that was the other thing I did as well | 17:39 |
TheJulia | I honestly don't remember it at this point | 17:39 |
JayF | honestly given now in the build we remove the dnsmasq_dir | 17:39 |
JayF | I'm tempted to remove all the idempotence logic | 17:40 |
JayF | since I think we can compile every time we stack with no harm | 17:40 |
TheJulia | I guess | 17:40 |
TheJulia | in any event, the standalone jobs all passed so I workflowed my default change anyhow | 17:40 |
TheJulia | as it is CI config only | 17:40 |
JayF | I'm not looking for a specific change so much as was curious why dnsmasq started crashing again | 17:41 |
TheJulia | yeah, no idea | 17:41 |
JayF | I can try to whip up a fix for that to ensure our upgrades happen again? | 17:41 |
TheJulia | but if the specific crash fixes we expected in 2.90 are no longer in the package, who knows | 17:41 |
TheJulia | sure, fwiw, looking at the change log | 17:41 |
TheJulia | I'd consider 2.91rc5 from git | 17:42 |
JayF | https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=f006be7842104a9f86fbf419326b7aad08ade61d | 17:42 |
JayF | (october) but 2.90-deb2 is from 12 months ago | 17:42 |
JayF | yeah, I'm going to pull the idempotence | 17:42 |
TheJulia | ugh | 17:42 |
JayF | actually, even better idea incoming | 17:43 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Restore recompile of dnsmasq https://review.opendev.org/c/openstack/ironic/+/943339 | 17:46 |
* JayF | goes back into his SCALE-talk-writing bunker | 17:47 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Restore recompile of dnsmasq https://review.opendev.org/c/openstack/ironic/+/943339 | 17:47 |
opendevreview | Verification of a change to openstack/ironic master failed: Add systemd provider for console containers https://review.opendev.org/c/openstack/ironic/+/941614 | 17:58 |
opendevreview | Merged openstack/ironic master: allow multiple inspection interfaces to load hooks https://review.opendev.org/c/openstack/ironic/+/940333 | 18:20 |
opendevreview | Doug Goldstein proposed openstack/ironic master: fix glance metadata layout https://review.opendev.org/c/openstack/ironic/+/942496 | 18:23 |
opendevreview | Nicholas Kuechler proposed openstack/ironic-python-agent-builder master: fix: Adds bsdextrautils package for Debian which provides hexdump command https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/943334 | 19:05 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Restore recompile of dnsmasq https://review.opendev.org/c/openstack/ironic/+/943339 | 19:24 |
cardoe | cid: what do we need to do to get your rules stuff complete for the release? | 19:26 |
JayF | a nonzero part of that has to be "fix CI for ironic", yeah? | 19:26 |
cardoe | I mean ignoring that. | 19:26 |
cardoe | I'm just looking at stuff we had as a priority and/or close and seeing if we can get it over the line. | 19:27 |
cardoe | Plus I'm really hoping that with this ESXi non-sense in my rear view mirror I can get back to that. | 19:27 |
cid | I think, mostly a review and votes :) | 19:27 |
cardoe | https://github.com/rackerlabs/esxi-img is ultimately what we did as an approach. Still need to fix up license and give Sandzwerg[m] some appropriate doots as he sees fit. | 19:28 |
cardoe | I'll get it up on pypi.org shortly but you can basically do "uvx esxi-img gen-img path/to/esxi.iso" and you get a file dropped in your cwd that's an image that you can upload to glance and use the direct deploy interface with a config-drive and it'll "work on my box" | 19:30 |
cardoe | There's easily a half a dozen cases we've thought of that split out "patches welcome" | 19:32 |
cardoe | e.g. BIOS boot... we only did UEFI | 19:32 |
opendevreview | Verification of a change to openstack/ironic master failed: CI: Change standalone jobs over to OVN https://review.opendev.org/c/openstack/ironic/+/943324 | 19:33 |
opendevreview | Julia Kreger proposed openstack/ironic master: network testing: hooking in an external network simulator https://review.opendev.org/c/openstack/ironic/+/942298 | 19:43 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP: Add network simulator support for force10 os 10 https://review.opendev.org/c/openstack/ironic/+/943345 | 19:43 |
opendevreview | Julia Kreger proposed openstack/ironic master: network testing: hooking in an external network simulator https://review.opendev.org/c/openstack/ironic/+/942298 | 19:43 |
JayF | cardoe: you should blog that | 19:44 |
JayF | cardoe: or get it written down /somewhere/ public. | 19:44 |
JayF | I don't think it goes in openstack docs if it's that, but that information should be preserved | 19:44 |
JayF | maybe good fodder for ironicbaremetal.org? IDK | 19:45 |
cardoe | Sure. | 19:45 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP: Add network simulator support for force10 os 10 https://review.opendev.org/c/openstack/ironic/+/943345 | 19:45 |
TheJulia | cardoe: bios boot... wut?! ;) | 19:46 |
TheJulia | ironicbaremetal.org posts are a good idea | 19:47 |
cardoe | le sigh... metal3 test hates code again | 19:49 |
cardoe | so btw I tried to install Fedora last night using the Anaconda driver and it 100% doesn't work on 2024.2. So the CERN folks must not have upgraded. Applying my glance metadata patch fixed it. So I stand by my backport request. | 19:50 |
cardoe | I feel like the metal3 job needs to have a fast out. | 19:52 |
cardoe | It already tripped over its feet for reasons that aren't clear to me. But then it's doing TWO 2400 second waits cause maybe you wanna wait a while. If my math is right that's an HOUR AND TWENTY MINUTES when it's literally gonna fail. | 19:54 |
TheJulia | ... cardoe have you spotted anything with metal3 yet? I'm wondering if we just need to make the job non-voting, because it now seems to be in the 50% failure rate range for the day | 19:58 |
TheJulia | I just realized you clearly have | 19:58 |
TheJulia | Yeah, that's no good :( | 19:58 |
cardoe | https://zuul.opendev.org/t/openstack/stream/b6fb81b0f354411b83e1c45f809b5207?logfile=console.log | 19:58 |
cardoe | node-0 and node-1 weren't working as best as I can tell @ 19:03. The test then proceeds to wait 2400 seconds for node-0 to maybe become happy even though it's already recorded a failure and looking at the behavior it won't change that FAIL to OK after the fact so the wait is pointless. Now that it's done waiting for node-0, it decides to wait 2400 seconds for node-1. | 20:01 |
cardoe | I've found some other test logs where node-0 failed it waited 40 minutes and then went to node-1 which worked and it finished running tests but still failed the overall CI job. | 20:02 |
cardoe | And I get that node-0 is ipmi and node-1 is redfish so its testing different code paths so its good to see the results of both... but we've got other tests that test redfish and ipmi. So I feel like the metal3 job is really just a full integration test and if any one of the two nodes fail, just fast fail out. I'm not gonna be using the metal3 integration test to troubleshoot why a redfish thing broke. I'll use a redfish | 20:03 |
cardoe | specific test job. | 20:03 |
cardoe | only reason I've looked at it in detail is that my glance fix has failed 4 times on it | 20:06 |
JayF | have you found other patches with similar failures? | 20:06 |
JayF | Just making sure it's not that your glance patch sneaky-breaks metal3 :D | 20:07 |
cardoe | I mean libvirt failed to start the VMs. I guess my patch changed how much CPU heat the test creates which threw off libvirt.... | 20:08 |
JayF | I try to leave my assumptions at the door with that metal3 job | 20:08 |
cardoe | The last time IPA generated a MASSIVE log file when it ran inspection and the test has a hard deadline on how long inspection can take and the big payload caused it to still be in continue_inspection() at the timeout | 20:09 |
TheJulia | Looks like metal3 is failing because of low disk space | 20:14 |
TheJulia | https://www.irccloud.com/pastebin/y0lkLMp2/ | 20:14 |
cardoe | Which log did ya find that in? | 20:18 |
TheJulia | the ironic.log file | 20:18 |
TheJulia | https://fad549b2938ec7abe38a-5d9007d54f55ca0ff60ac5193693c3de.ssl.cf2.rackcdn.com/943324/1/gate/metal3-integration/de17864/controller/system/df--h.txt | 20:18 |
TheJulia | we can't run that job | 20:19 |
JayF | I believe on some of the rackspace machines, there's some space you can mount up and use | 20:19 |
JayF | that isn't prepared by default | 20:19 |
JayF | so it's likely possible but needs engineering | 20:20 |
TheJulia | that... doesn't appear to be the case on this machine | 20:20 |
cardoe | ohhhh before_pivoting... I never looked in there. | 20:20 |
JayF | https://fad549b2938ec7abe38a-5d9007d54f55ca0ff60ac5193693c3de.ssl.cf2.rackcdn.com/943324/1/gate/metal3-integration/de17864/controller/system/lsblk--ap.txt | 20:20 |
TheJulia | only xvda as a device | 20:20 |
JayF | xvde1 80G unmounted | 20:20 |
TheJulia | ohh, yes | 20:20 |
cardoe | I wish I had a clue what boxes those were | 20:21 |
TheJulia | xvde1 as a partition | 20:21 |
TheJulia | doesn't show on dmesg output... | 20:21 |
cardoe | any idea what the machine UUID would be? | 20:22 |
cardoe | the kernel set it from the VM UUID | 20:22 |
cardoe | https://fad549b2938ec7abe38a-5d9007d54f55ca0ff60ac5193693c3de.ssl.cf2.rackcdn.com/943324/1/gate/metal3-integration/de17864/controller/system/lsblk--ap.txt | 20:23 |
cardoe | Jay's right | 20:23 |
JayF | either way, given we're on fire, we should -nv the job and file a bug with this research | 20:24 |
JayF | IMHO | 20:24 |
cardoe | agreed | 20:24 |
cardoe | IMHO the job needs a fast out cause this isn't the first time when it's broken that it's taken 2+ hours to free up the nodes. | 20:25 |
TheJulia | yeah, agree | 20:25 |
TheJulia | Looking at the structure of the way the job runs and sets it up, we likely need someone who knows the ins and outs of the metal3 dev scripts to figure out how they would prefer to wire in using that space | 20:27 |
TheJulia | hmm | 20:28 |
TheJulia | it has a knob | 20:28 |
TheJulia | oh, its for the vms built | 20:29 |
opendevreview | Julia Kreger proposed openstack/ironic master: CI: Make metal3 non-voting https://review.opendev.org/c/openstack/ironic/+/943347 | 20:32 |
opendevreview | Verification of a change to openstack/ironic-python-agent master failed: Fix the way qemu-img is called with prlimits https://review.opendev.org/c/openstack/ironic-python-agent/+/942690 | 20:42 |
opendevreview | Verification of a change to openstack/ironic-python-agent-builder master failed: More reliable TinyIPA builds with network retries https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/942369 | 20:55 |
opendevreview | Satoshi Shirosaka proposed openstack/ironic-python-agent-builder master: WIP Test Podman for container-based cleaning https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/943348 | 21:16 |
opendevreview | Verification of a change to openstack/ironic master failed: CI: Change standalone jobs over to OVN https://review.opendev.org/c/openstack/ironic/+/943324 | 21:18 |
opendevreview | Satoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager https://review.opendev.org/c/openstack/ironic-python-agent/+/941714 | 21:20 |
opendevreview | Satoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager https://review.opendev.org/c/openstack/ironic-python-agent/+/941714 | 21:24 |
opendevreview | Verification of a change to openstack/ironic master failed: Add systemd provider for console containers https://review.opendev.org/c/openstack/ironic/+/941614 | 21:34 |
JayF | Gross. All the CI passed on my DNSmasq except for pep 8 | 22:24 |
JayF | I'm going to clean it up and re-push. Might be a good idea for someone to pre-approve it if we want it to merge overnight | 22:24 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Restore recompile of dnsmasq https://review.opendev.org/c/openstack/ironic/+/943339 | 22:26 |
TheJulia | I just workflowed the metal3 to nv change as well and noted the consensus this morning | 22:49 |
opendevreview | Verification of a change to openstack/ironic-tempest-plugin master failed: Add retries while waiting for SSH on server https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/942009 | 23:03 |
JayF | thank you | 23:37 |
opendevreview | Satoshi Shirosaka proposed openstack/ironic-python-agent master: WIP Add ContainerHardwareManager https://review.opendev.org/c/openstack/ironic-python-agent/+/941714 | 23:49 |
opendevreview | Merged openstack/ironic master: CI: Change standalone jobs over to OVN https://review.opendev.org/c/openstack/ironic/+/943324 | 23:51 |
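
The "fast out" cardoe asks for around 19:52 to 20:03 can be illustrated with a small sketch. This is not the metal3 job's actual code, which is not shown in the log. It assumes a hypothetical `get_state(node)` callable returning one of the made-up strings `"ok"`, `"failed"`, or anything else for "still pending". Instead of two sequential 2400-second waits (4800 seconds, i.e. the hour and twenty minutes mentioned above), all nodes share one deadline and the wait stops as soon as any node has recorded a failure.

```python
import time


def wait_for_nodes(nodes, get_state, timeout=2400.0, interval=10.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Wait for every node under ONE shared deadline, failing fast.

    get_state is a hypothetical callable (an assumption, not a metal3 API)
    returning "ok", "failed", or any other value while a node is pending.
    Returns a dict mapping node name to "ok", "failed", "skipped" or "timeout".
    """
    deadline = clock() + timeout
    pending = set(nodes)
    results = {}
    while pending:
        for node in sorted(pending):
            state = get_state(node)
            if state == "failed":
                # A recorded failure will not turn into a pass later,
                # so stop waiting on everything else right away.
                results[node] = "failed"
                for other in pending:
                    results.setdefault(other, "skipped")
                return results
            if state == "ok":
                results[node] = "ok"
        # Drop the nodes that finished during this pass.
        pending.difference_update(results)
        if not pending:
            break
        if clock() >= deadline:
            for node in pending:
                results[node] = "timeout"
            return results
        sleep(interval)
    return results
```

With node-0 already failed and node-1 still pending, this returns immediately with node-1 marked `"skipped"`, rather than holding the CI node for another 40 minutes per machine. Whether skipping the second node is acceptable is a design choice: it trades the extra IPMI-versus-Redfish signal for freeing the nodes sooner, which matches the argument that the job is a full integration test rather than a per-driver one.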