Wednesday, 2024-02-28

opendevreviewJulia Kreger proposed openstack/ironic stable/2023.1: Special case lenovo UEFI boot setup  https://review.opendev.org/c/openstack/ironic/+/91044700:04
opendevreviewJulia Kreger proposed openstack/ironic stable/zed: Special case lenovo UEFI boot setup  https://review.opendev.org/c/openstack/ironic/+/91044800:06
opendevreviewJulia Kreger proposed openstack/ironic stable/xena: Special case lenovo UEFI boot setup  https://review.opendev.org/c/openstack/ironic/+/91031600:17
opendevreviewJulia Kreger proposed openstack/ironic stable/wallaby: Special case lenovo UEFI boot setup  https://review.opendev.org/c/openstack/ironic/+/91031700:17
*** jph6 is now known as jph00:21
JayFTheJulia: landing the version of that on master00:25
JayFoh, I mean00:26
JayF+2 but no +A00:26
JayFgate is broken so no sense in +A00:26
JayFbut someone can feel free to when it's fixed00:26
TheJuliayeah00:26
TheJuliaI backported it so someone could hopefully pick it up off of wallaby and give it a spin00:26
JayFI +2'd back to stable/zed00:27
JayFI'll wait for that report to hit UM+others00:27
TheJuliaack00:27
* TheJulia has reached zero brain00:27
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: Invoke tests with fake interfaces  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/90993900:42
TheJuliadepends-on added for JayF's patch.00:42
TheJuliaWith that, I'm going to go call it a day00:42
TheJuliaoh heh, the changes don't stack cleanly01:30
TheJuliaoh well01:30
opendevreviewMerged openstack/ironic master: ci: Source install dnsmasq-2.87  https://review.opendev.org/c/openstack/ironic/+/88812101:43
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: Invoke tests with fake interfaces  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/90993903:41
TheJuliahjensas: 909939 fixes the tests, but I'll have to unbrick the gates to get it to merge. I guess all our downstream notes can get updated with the same cause.04:30
opendevreviewKaifeng Wang proposed openstack/python-ironicclient master: Client support port name  https://review.opendev.org/c/openstack/python-ironicclient/+/89606706:45
opendevreviewJacob Anders proposed openstack/sushy-tools master: [WIP] Add support for BIOS update emulation  https://review.opendev.org/c/openstack/sushy-tools/+/90950006:48
rpittaugood morning ironic! o/07:58
rpittauwe have 3 patches in ironicclient to merge before releasing, if anyone can have a look today that would be great09:01
rpittauhttps://review.opendev.org/c/openstack/python-ironicclient/+/90878809:01
rpittauhttps://review.opendev.org/c/openstack/python-ironicclient/+/90679409:01
rpittauhttps://review.opendev.org/c/openstack/python-ironicclient/+/90679609:01
rpittauTheJulia, JayF: not sure who we can ping for https://review.opendev.org/c/openstack/tempest/+/90872709:02
rpittauTheJulia: about https://review.opendev.org/c/openstack/ironic/+/910444 not sure if you tried with the latest pkg from jammy, it is indeed 2.90 https://paste.openstack.org/raw/bMvZMYa2mKsXzmWJbcwu/09:53
zigoIs there (was there) a problem with the Ironic gate? How come my patch failed gating twice with unrelated issues? https://review.opendev.org/c/openstack/ironic-python-agent/+/91020910:02
zigoOh, read TheJulia, so the gate really is broken ... :/10:03
rpittauzigo: there are multiple issues, trying to sort the unit tests at the moment10:20
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Fix commands order in partition tests  https://review.opendev.org/c/openstack/ironic-python-agent/+/91048010:27
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Fix commands order in partition tests  https://review.opendev.org/c/openstack/ironic-python-agent/+/91048010:34
fricklerrpittau: gmann and kopecmartin are the only remaining tempest people afaict. added them to your patch, you can also ping them in #openstack-qa if needed10:38
rpittaufrickler: thanks a lot! :)10:39
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Fix unit tests after ironic-lib changes  https://review.opendev.org/c/openstack/ironic-python-agent/+/91048011:22
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Add inspection PXE filter service  https://review.opendev.org/c/openstack/ironic/+/90799111:30
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Add inspection PXE filter service  https://review.opendev.org/c/openstack/ironic/+/90799111:36
opendevreviewMerged openstack/python-ironicclient master: [codespell] Fixing Spelling Mistakes  https://review.opendev.org/c/openstack/python-ironicclient/+/90679412:20
opendevreviewMerged openstack/python-ironicclient master: [codespell] Adding Tox Target for Codespell  https://review.opendev.org/c/openstack/python-ironicclient/+/90679512:20
opendevreviewVerification of a change to openstack/python-ironicclient master failed: [codespell] Adding CI target for Tox Codespell  https://review.opendev.org/c/openstack/python-ironicclient/+/90679612:26
opendevreviewJacob Anders proposed openstack/sushy-tools master: [WIP] Add support for BIOS update emulation  https://review.opendev.org/c/openstack/sushy-tools/+/90950013:12
opendevreviewMerged openstack/ironic-python-agent-builder master: Update tinyipa to tinycore 15.x  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/91016913:37
TheJuliagood morning14:12
TheJuliarpittau: I'd just keep the job non-voting, it has been left failing for cycles in the past, it's not a big deal in my book14:13
rpittauTheJulia: ack14:13
opendevreviewMerged openstack/python-ironicclient master: Force constraints when installing a package during tox test  https://review.opendev.org/c/openstack/python-ironicclient/+/90878814:27
TheJuliahttps://review.opendev.org/c/openstack/ironic/+/910444 is for stable/2023.2 and passes CI14:33
opendevreviewJulia Kreger proposed openstack/ironic master: ci: re-enable grenade test job  https://review.opendev.org/c/openstack/ironic/+/91051614:35
TheJuliaheh, ubuntu pulled their new dnsmasq into focal14:41
dtantsurso kind of them14:41
TheJuliasince it is quietly exiting, I'm going to pull the source install workaround down to focal as well14:42
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.1: ci: Source install dnsmasq-2.87  https://review.opendev.org/c/openstack/ironic/+/91051814:46
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Fix unit tests after ironic-lib changes  https://review.opendev.org/c/openstack/ironic-python-agent/+/91048014:53
opendevreviewJulia Kreger proposed openstack/ironic master: ci: update dnsmasq to 2.90 via source  https://review.opendev.org/c/openstack/ironic/+/91052115:00
TheJuliaugh https://9cb23ed83c623b0d0348-b8e5e06ecccd85e574a730a33d1fddb0.ssl.cf1.rackcdn.com/910518/1/check/ironic-tox-unit-with-driver-libs/805f681/testr_results.html15:38
* dtantsur blinks15:39
dtantsurhave they just switched to async/await? Oo15:39
TheJuliaThe version is 5.0.015:40
TheJuliait is not being constrained15:40
TheJuliait should... be constrained15:40
JayFpysnmp is a driver lib, is it in u-c?15:41
JayFor do we need to constrain in driver-requirements.txt15:41
dtantsurwe might consider using u-c always15:41
JayFyeah it's in u-c15:42
JayFhttps://github.com/openstack/requirements/blob/master/upper-constraints.txt#L39915:42
JayFthree snmp libs in there15:42
JayFhttps://opendev.org/openstack/ironic/src/branch/master/tox.ini#L3115:43
JayFwe don't use u-c for driver-libs jobs15:43
JayFthere's the answer15:43
TheJuliait is not in u-c on that branch15:43
TheJuliathis is stable/2023.115:43
rpittauwe probably need this https://review.opendev.org/c/openstack/ironic/+/90878315:43
JayFeither way, we don't use constraints for driver-libs15:43
JayFrpittau: we don't pass -c on the driver-libs job15:44
JayFit's more basic than that15:44
TheJuliayeah, we need to directly constrain it on that branch15:44
JayFI'd suggest for stable/ branches, updating requirements15:44
JayFand fixing master going forward15:44
JayFbut I'll +2 any reasonable fix15:44
TheJuliamaster is not broken by this right now15:44
JayFoh! 15:44
JayF{[testenv]deps} is inheritance, isn't it?15:45
JayFis that a new thing? New to me, anyway, and awesome15:45
rpittaunot very new15:45
TheJuliaon a plus side, I seem to be able to reproduce it  locally :)15:45
* TheJulia waits for tox to timeout15:45
TheJuliaerr, no, it passed15:47
TheJuliaugh15:47
TheJuliaoh, it is via proliantutils15:48
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.1: stable-only: pin proliantutils to prevent break  https://review.opendev.org/c/openstack/ironic/+/91052815:53
opendevreviewDmitry Tantsur proposed openstack/ironic master: Don't import sushy conditionally, it's a requirement  https://review.opendev.org/c/openstack/ironic/+/91052915:56
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.1: ci: Source install dnsmasq-2.87  https://review.opendev.org/c/openstack/ironic/+/91051815:56
JayFdtantsur: that's a fun change, I like it :D16:02
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.1: ci: Source install dnsmasq-2.87  https://review.opendev.org/c/openstack/ironic/+/91051816:03
JayFI just landed https://review.opendev.org/c/openstack/ironic/+/888297 as a single core, only change from last (2x+2) patchset is spelling fix16:06
JayFhttps://review.opendev.org/c/openstack/ironic/+/901090 needs another core review16:07
rpittauapproved16:08
JayFdtantsur: I went ahead and landed https://review.opendev.org/c/openstack/ironic/+/902801 (reserved workers pool), I don't like it, but I don't have better ideas or time to implement them16:08
JayFI am afraid my original -1 might have scared off other cores from approving it, too :)16:09
TheJuliaI tagged https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/909939 as a prio item for reviews... It makes the api tests explicit instead of relying upon a default new interface starting as "fake" && "noop", meaning before the change, you couldn't just run the api tests against a deployment with any operational defaults16:12
dtantsurJayF: thanks!16:26
TheJuliaugh, looks like we're going to need to recheck the stable/2023.2 fix. Metal3-integration failed on it16:35
TheJuliatimeout from quay.io16:35
JayFif metal3 itself doesn't do stable16:36
JayFwe should probably drop those jobs from stable branches16:36
JayF(yeah?)16:36
TheJulia....16:36
TheJuliaYou've jumped to an unrelated conclusion16:36
TheJuliathe root cause of the failure is the job timed out talking to quay.io16:36
JayFNo, I'm pulling in information from the slack context of a chat dtantsur and I had a couple weeks ago16:36
TheJuliawell, then yeah, we likely shouldn't have the jobs on stable branches16:36
JayFYeah, I'm basically musing related to that if we should run those jobs or not16:37
JayFdtantsur: does it make sense to run metal3 jobs on ironic?16:37
JayFdtantsur: er, stable branches16:37
dtantsurNot much, at least not at this point16:37
dtantsurand yes, quay is more or less reliable, but not 100%16:38
rpittauI agree on not running metal3 jobs on stable branches, doesn't really make sense for now16:39
JayFack; I'll push changes to that effect now()16:41
rpittauI'll have a look tomorrow if no one else gets to it, time to leave16:41
rpittaugood night! o/16:41
opendevreviewJulia Kreger proposed openstack/ironic master: Add note regarding metal3 ci job in CI config for stable runs  https://review.opendev.org/c/openstack/ironic/+/91053616:43
TheJuliaJayF: fyi^ just so we know we can prune them out later on 16:43
JayFDoes anyone know who panhongyin is in Gerrit?16:44
JayFThey've been providing more and more reviews but I've not met them16:44
JayFand want to thank them for being active16:44
JayFTheJulia: https://review.opendev.org/c/openstack/ironic/+/910436 needs to be abandoned, I think? (changing package url for dnsmasq) -- if so just say yeah  and I'll do it16:46
TheJuliayup, thanks16:46
JayFaight, I just did a run through priority reviews16:48
JayFSorry it's been about a week since I did a thorough one before today16:48
dtantsurTheJulia: you can also tell Zuul to only run a job on master16:55
dtantsurI think we've done it for some jobs somewhere..16:55
dtantsuryeah https://opendev.org/openstack/ironic-python-agent-builder/src/branch/master/zuul.d/project.yaml#L44-L5116:55
TheJuliaI think you can, but we already have to prune jobs as time passes anyway16:55
dtantsuryep, but this will be automagical16:56
TheJuliaso the pre-emptive grant of permission is not a bad thing16:56
TheJuliawe're really bad at pruning stuff, imho16:57
clarkbnote that isn't really automagical because https://opendev.org/openstack/ironic-python-agent-builder/src/branch/stable/2023.2/zuul.d/project.yaml#L40-L47 also exists16:58
opendevreviewMerged openstack/python-ironicclient master: [codespell] Adding CI target for Tox Codespell  https://review.opendev.org/c/openstack/python-ironicclient/+/90679616:58
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Force constraints when installing a package during tox test  https://review.opendev.org/c/openstack/ironic-python-agent/+/90878716:58
clarkbif you remove the job from master then all of a sudden that old job config will apply to master16:59
TheJuliaHeh, so pruning the config is likely for the best then17:00
dtantsurTIL!17:03
TheJuliano, today we learned!17:04
TheJulia:)17:04
dtantsurTrue :)17:04
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.2: ci: stable-only: remove metal3-integration ci job  https://review.opendev.org/c/openstack/ironic/+/91053717:09
TheJuliaheh, metal3-integration has failed on another job17:11
opendevreviewDmitry Tantsur proposed openstack/ironic master: Add inspection PXE filter service  https://review.opendev.org/c/openstack/ironic/+/90799117:14
*** dking is now known as Guest131617:15
*** Guest1316 is now known as dking17:16
dtantsurI assume we haven't done anything to fix https://bugs.launchpad.net/ironic/+bug/2052468 have we?17:19
TheJuliaafaik no17:19
TheJuliaI proposed a ptg topic on the subject, fwiw17:19
dtantsurTheJulia: ack. Next, could you confirm/deny https://bugs.launchpad.net/ironic/+bug/2053068/comments/1 ?17:21
* dtantsur OUCH @ https://bugs.launchpad.net/ironic/+bug/205459417:22
TheJuliayes, that is true17:23
TheJuliaoh jeeze17:23
dtantsurthe length issue? yeaaah17:24
TheJuliaThe length can be extended, just ugh17:24
TheJuliaI guess it is something like <federation><user> concatenated together17:25
dtantsurI wonder why we picked 3217:26
TheJulialength of a uuid17:27
TheJuliadidn't expect longer user ids17:28
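[Editor's sketch, not part of the original conversation] The "length of a uuid" remark refers to the dashless hex form of a UUID, which is 32 characters. The snippet below only illustrates why an identifier built from more than one such piece would overflow a 32-character field. FIELD_LENGTH, fits() and the concatenated long_id are hypothetical names and data made up for illustration; the actual format of the longer user IDs is not confirmed in the discussion.

```python
import uuid

# Hypothetical field width, taken from the "32" mentioned in the discussion.
FIELD_LENGTH = 32


def fits(user_id: str, limit: int = FIELD_LENGTH) -> bool:
    """Return True if user_id fits in a field of `limit` characters."""
    return len(user_id) <= limit


# A dashless UUID is 32 hex characters, so it fits exactly.
plain_id = uuid.uuid4().hex

# Illustrative only: two dashless UUIDs joined together, standing in for a
# longer federated-style identifier. The real format is an assumption.
long_id = uuid.uuid4().hex + uuid.uuid4().hex

print(len(plain_id), fits(plain_id))
print(len(long_id), fits(long_id))
```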
TheJuliaI'll start on a fix later today most likely, need to draft an agenda and send a few emails first17:28
dtantsurCould not import extension openstackdocstheme (exception: No module named 'distutils')17:31
dtantsurIncredible!17:31
opendevreviewDmitry Tantsur proposed openstack/python-ironicclient master: Add missing commands to the documentation  https://review.opendev.org/c/openstack/python-ironicclient/+/91054017:32
dtantsuris anyone aware of any movements around https://bugs.launchpad.net/ironic/+bug/1732534 ?17:39
TheJuliaI'm unsure it can be solved in ironic at all17:41
TheJuliaperhaps some documentation?17:41
clarkbuseradd on linux also limits usernames to 32 chars long17:41
opendevreviewVerification of a change to openstack/ironic stable/2023.2 failed: ci: Source install dnsmasq-2.87  https://review.opendev.org/c/openstack/ironic/+/91044417:42
JayFDoes someone have a suggestion for cid on places to learn more about base level tech that Ironic is built on? e.g. DHCP/PXE/Redfish/IPMI/etc17:48
dtantsurIronic source? :D </kidding>17:55
opendevreviewVerification of a change to openstack/ironic stable/2023.2 failed: ci: Source install dnsmasq-2.87  https://review.opendev.org/c/openstack/ironic/+/91044417:59
clarkbJayF: I learned a lot of that stuff in a lab environment on old sun workstation hardware that my university basically didn't want anymore. So we had a large closet that we hacked on stuff in.18:00
clarkbtl;dr I would suggest something similar if possible. Can probably virtualize much of it18:00
clarkbstart with a router and a switch, add a piece of hardware that you can dhcp and pxeboot and take it from there18:01
opendevreviewVerification of a change to openstack/ironic master failed: Detect ilo6 and redirect to redfish  https://review.opendev.org/c/openstack/ironic/+/88829718:06
JayFclarkb: That requires more local resources than he has access to18:07
JayFdtantsur: Where do you think the questions came from :D 18:07
JayFdtantsur: I got to butcher an explanation to cid on how driver composition works, I'm glad it's not recorded, you'd probably find it upsetting ;) 18:08
JayFlike a kid drawing the mona lisa as a stick figure :P 18:08
dtantsurLOL18:09
cidlol. didn't sound like that to me.18:09
dtantsurmeanwhile, I've added an "Ancient bugs" section to the bug dashboard, and it's kinda depressing...18:09
JayFI basically said node.driver gives you a menu of options to pick from 18:09
JayFwith defaults18:09
dtantsur2014-04-21 23:19:57 is the oldest open one18:09
JayFand you can use the interfaces to tweak it18:09
JayFlol18:09
JayFprobably from me18:09
JayFthat date is perfect for it to be an early jay bug18:10
dtantsurAeva https://bugs.launchpad.net/ironic/+bug/131084318:10
JayFand I wrote a lot of doozies lol18:10
* dtantsur hopes he hasn't caused conflicts for in-flight dashboard patches18:10
clarkbJayF: cid: it doesn't have to be much. If you have a home router it can probably be convinced to dhcp and pxe other things. Probably the biggest issue is finding something to pxeboot but that can just be a VM even18:11
clarkband then tcpdump all the things18:11
cidI heard at least 10 new terms today on the meet. do you have any resources? Like, I can't pretend to know what "tcpdump all the things" means.18:14
clarkbcid: tcpdump is a network capture tool that you can run on linux/unix type systems. Maybe windows too I'm not sure. But it allows you to see the actual network traffic flowing on the network which can be useful to better understand network protocols. Another higher level tool that is probably more user friendly is wireshark18:15
JayFMicrosoft has a tool akin to tcpdump which is stellar on windows18:15
JayFWireshark also has a gui for windows which can do captures18:16
clarkbnice. I usually start with tcpdump to figure out what I want to capture then dump that to a file and look at it in wireshark for more in depth digging/debugging18:16
cidI have used wireshark once18:18
cidBy the way JayF: I didn't get a mail for the meet18:20
JayFtcpdump is just a CLI tool that does wireshark-y things18:23
JayFin fact, wireshark will often take a dump file from tcpdump as input to "replay"18:23
JayFbut I'd even say ... you don't even really have to go that far18:24
JayFyou have a devstack w/Ironic18:24
JayFset to pxe boot fake baremetal nodes...18:24
JayFyou could potentially just tcpdump on the bridge while the pxe booting is happening and just watch it18:24
JayFI'm not 100% sure which interface, but figuring that part out could be part of the fun18:24
clarkb++18:25
clarkband really the reason to tcpdump is to "see" things in action to get a better understanding of the network protocols. Not strictly necessary but can be really useful when things go wrong18:25
JayFclarkb: tbh, I actually have an adage that network captures /usually/ are wild goose chases18:32
JayFonly exception being if you're working on /actual network software/ (e.g. the troubleshooting Julia was doing around OVN, of course you have to packet cap that stuff)18:33
clarkbfor me it makes things concrete in a way that makes them more understandable18:33
clarkba diagram for dhcp flow is super abstract but seeing the broadcasts and responses and renewals helps me18:33
JayFlook at too many packet captures and you realize how awful some dhcp clients are lol18:35
* TheJulia twitches18:38
TheJuliasigh, looks like standalone is broken now18:57
JayFI just rechecked my job, it was broken in a strange way19:00
JayFI was thinking (hoping?) it was a bit flip in transfers19:00
JayFbut if it's reproducing in other places :( 19:00
TheJulialooks like it might just be dnsmasq in general19:00
TheJuliaat least, that is my *guess* at the moment19:01
JayFWould you be +1 to marking it n-v while we troubleshoot?19:01
TheJuliadunno, looks like the redfish variant failed19:03
TheJulia(.... I thought I removed one of them... maybe that change just hasn't merged yet)19:03
opendevreviewVerification of a change to openstack/ironic master failed: Add redfish https boot CI job  https://review.opendev.org/c/openstack/ironic/+/90109019:03
JayFheh19:04
JayFnice timing, gerrit19:04
JayFthat's adding a job, not removing one19:04
TheJuliahmmm19:05
TheJulianothing definitive, no direct sign of dnsmasq19:05
JayFin the ironic-standalone failure I saw, I looked in console logs19:06
TheJuliadnsmasq respawning https://f4a78187a8f66e46939f-e2f3a8f1da38bd85104d6de65559a608.ssl.cf1.rackcdn.com/902801/2/gate/ironic-standalone/626a7a4/controller/logs/screen-q-dhcp.txt19:06
JayFand last ramdisk boot failed with errors around not getting proper size back19:06
JayFwe really need to find that .deb file19:07
JayFsomewhere19:07
JayFand host it to get our gate happy for now19:07
JayFand dig dnsmasq without a time crunch19:07
TheJuliaI don't even know where to start with https://f4a78187a8f66e46939f-e2f3a8f1da38bd85104d6de65559a608.ssl.cf1.rackcdn.com/902801/2/gate/ironic-standalone/626a7a4/controller/logs/ironic-bm-logs/node-2_console_log.txt19:08
JayFit's sorta the behavior I'd expect19:09
JayFif it's respawning 19:09
TheJuliawell, any new process launch would change the log19:09
TheJuliait is almost like qemu is kind of going "uhhh reset!"19:09
TheJuliaI can't help but feel like we might find something like: https://forums.fedoraforum.org/showthread.php?323033-kernel-Dazed-and19:11
JayFTheJulia: we don't need to fix the dnsmasq stuff somewhere else? e.g. bifrost, right?19:14
JayFThis looks devstack-y so I assume not, but we've validated it's actually running the version we think, right?19:14
TheJuliabifrost leverages static config19:14
TheJuliathe issue here is rooted in dnsmasq configuration getting updated19:14
JayFokay, confirmed the basic assumption, we're running the compiled version19:15
JayFgoing to check neutron-dhcp-agent for changes19:15
TheJulia... we could hold the next failure19:16
JayFtrying to rule out other possibilities before diving into C, which I don't know well19:16
JayFthe key is being able to reproduce19:16
JayFif I can get a reproducer, specifically locally, I can solve this problem19:16
JayFeither Jay-I or I can get resources from GR-OSS19:16
opendevreviewMerged openstack/ironic master: Handle jsonschema empty error message update  https://review.opendev.org/c/openstack/ironic/+/90959219:18
opendevreviewMerged openstack/ironic master: Force constraints when installing a package during tox test  https://review.opendev.org/c/openstack/ironic/+/90878319:18
opendevreviewMerged openstack/ironic-inspector master: Force constraints when installing a package during tox test  https://review.opendev.org/c/openstack/ironic-inspector/+/90878419:18
TheJuliawhoaw19:19
TheJuliathings merged19:19
TheJulia\o/19:19
JayFyeah, this is the other thing I was suspicious of19:19
JayFgate is as busy today as it ever is19:19
JayFif there were more rare edge cases in dnsmasq breakage, we'd see them here19:19
TheJuliaor the excess pressure is causing "other" things happening which are related19:19
JayFyep19:19
opendevreviewJay Faulkner proposed openstack/ironic master: Remove downgrade_dnsmasq; 2.90 is upstream now  https://review.opendev.org/c/openstack/ironic/+/91044519:20
JayFfixed the version number instead of rechecking19:20
JayFit did pass on your recheck19:20
JayFif we can verify this works at least as well as the custom compiled version, at least we've made some progress (and can operate testing on upstream ubuntu)19:20
JayFso I do think one thing we can/should do19:23
JayFis enable debugging symbols on our build (if possible)19:23
JayFtry to save the core file generated19:24
JayFand see why it's busted19:24
TheJuliadtantsur: so https://review.rdoproject.org/r/c/openstack/ironic-distgit/+/51902/2/openstack-ironic.spec  although I'm wondering if it should be recommended since it is purely for bios booting19:24
opendevreviewJay Faulkner proposed openstack/ironic master: Update nova instance instructions to use demo user  https://review.opendev.org/c/openstack/ironic/+/91054519:31
JayFcid: ^ as promised19:31
cidthat was fast19:31
JayFDoc updates are mostly a quick edit and push; once you get the workflow down pat it doesn't take so long19:32
cidtrue19:33
cidso you can provide a fix without create a bug first?19:33
cid*creating19:33
JayFSo there are two basic things that a bug does for us: 1) allows us to track details of an issue until we have time to track it down (or while we're tracking it down) or 2) allows a place for us to show users where things have changed19:34
TheJuliaso going back to the merged patches, https://review.opendev.org/c/openstack/ironic/+/909592?tab=change-view-tab-header-zuul-results-summary  :( 19:34
JayFah19:34
JayFin some projects, it also does a third: creating documentation that the bug has been fixed19:35
JayFIn Ironic, we use release notes for that last bit19:35
JayFso unless we need to track something longer term, or it's serious enough to want a bug to point at (e.g. a security issue), we mainly are concerned that things that need a release note get one19:35
JayFto summarize: we almost never ask for someone to explicitly create a bug unless it's for a serious issue19:35
cidcopy that19:36
JayFTheJulia: we don't use any dnsmasq lua features, right?19:42
opendevreviewVerification of a change to openstack/ironic master failed: Add a reserved workers pool (5% by default)  https://review.opendev.org/c/openstack/ironic/+/90280119:44
TheJuliaI'd have to look it up19:44
TheJuliaI highly doubt it19:44
JayFack, that's my assumption19:44
JayFI'm digging thru dnsmasq repo right now19:44
opendevreviewJay Faulkner proposed openstack/ironic master: [DNM/Science] Build master dnsmasq  https://review.opendev.org/c/openstack/ironic/+/91054619:47
opendevreviewJay Faulkner proposed openstack/ironic master: [DNM/Science] Build master dnsmasq  https://review.opendev.org/c/openstack/ironic/+/91054619:50
opendevreviewVerification of a change to openstack/ironic master failed: Log upon completion of power sync  https://review.opendev.org/c/openstack/ironic/+/89133420:29
opendevreviewJay Faulkner proposed openstack/ironic master: [ci] Temporarily disable standalone job voting  https://review.opendev.org/c/openstack/ironic/+/91054820:32
JayFTheJulia: ^ I understand if you wanna nack that :)20:33
TheJuliaI'm semi-tempted to recheck some of the failing items this evening before/while at/or after tonight's hockey game20:35
JayFI think that's a good data point, but still doesn't get us past the fact we can't merge things20:36
JayFand that we can have a reasonable amount of confidence it's because the job is broken, not because the code is broken20:36
JayFthe time pressure to resolve a gate issue one day before milestone-3 is going to make it hard to root cause and fix this properly20:37
JayFthat's my basic motivation20:37
TheJuliawhat worries me is stuff merged with only some tests being run20:45
JayFthose were only unit test changes20:48
JayFso they only ran unit tests20:48
TheJuliaI thought one had actual code20:48
TheJuliaanyway, distracted20:48
JayFI didn't check them all20:48
JayFbut my spot checks all were unit test only 20:48
* TheJulia goes back to google doc in other window20:48
JayFTheJulia: https://review.opendev.org/c/openstack/ironic/+/908959 merge conflict, your change to remove ironic-standalone job altogether20:54
JayFTheJulia: I will rebase20:55
TheJuliaoh yeah... there it is!20:55
TheJuliathanks!20:55
JayFoooooh it's stacked behind https://review.opendev.org/c/openstack/ironic/+/90895520:56
TheJuliaoh, doesn't need to be20:56
JayFyeah20:56
JayFI think I'm going to wait for my thing to land20:56
JayFthen start landing these behind it20:56
JayFthat's the better path20:56
TheJuliaI was just doing some cleanup to remove some excess stuff and it all ended up on a chain20:56
JayFhmm, I could squash em20:56
TheJuliasounds good20:56
JayFthat's probably better tbh20:57
opendevreviewMerged openstack/ironic-python-agent-builder master: Update link to ipmitool repository  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/91021621:00
opendevreviewJay Faulkner proposed openstack/ironic master: Multiple CI updates/improvements  https://review.opendev.org/c/openstack/ironic/+/90895521:02
JayFthis is going to merge conflict with my other landing change21:03
opendevreviewJay Faulkner proposed openstack/ironic master: Multiple CI updates/improvements  https://review.opendev.org/c/openstack/ironic/+/90895521:04
JayFnow it's stacked21:05
JayFso it can be landed21:05
JayFrpittau: This is a change you effectively already had a +2 on: https://review.opendev.org/c/openstack/ironic/+/908955 and is CI, you can probably land it as a single core21:05
opendevreviewVerification of a change to openstack/ironic-python-agent-builder master failed: Update ipmitool version to 1.8.19  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/91034421:07
TheJuliaJayF: ^^^ cirros didn't get a dhcp address :(21:09
TheJuliaI think we need to just ask for infra to hold the next standalone job failure21:09
TheJuliasince it has the highest odds of failing21:09
TheJuliagiven so many different dhcp reconfigurations occur there21:10
JayFI struggle to answer what I'd do with such a machine21:10
JayFI might take the approach of spinning up a VM and trying to beat up on dnsmasq and get a tight reproducer21:10
TheJulia1) verify the right version is being invoked 2) hunt for core files 3) try to reproduce exactly21:10
JayFif I can do that, the whole shape of this problem changes21:10
JayF1 is not something I had considered, is good21:11
TheJuliaso neutron does log it21:11
JayFinfra-root: Can someone please hold the next failure of `ironic-standalone` from literally anywhere21:11
* JayF wonders if that highlight works in all channels21:12
TheJulianeutron is logging "dnsmasq[63710]: started, version 2.87 cachesize 150"21:12
TheJuliaso #1, is the one we built21:12
JayFa core file really is what we need, I think21:12
JayFif we can identify where it breaks we can go hunting for the bug21:12
JayFI really suspect we're somehow getting it to reload config twice and it's stomping on itself somehow21:12
TheJuliaFeb 28 20:10:02.134411 np0036907166 dnsmasq[63741]: exiting on receipt of SIGTERM21:13
JayFbut that's not justified by any evidence other than spidey-sense lol21:13
TheJuliaohhhhh21:13
JayF:-?21:13
opendevreviewVerification of a change to openstack/ironic stable/2023.2 failed: ci: Source install dnsmasq-2.87  https://review.opendev.org/c/openstack/ironic/+/91044421:13
fungiJayF: it does, yep21:19
JayFfor access > ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILR/OLTS/VWHzE3vgBFCaTNBg2+MRCENOmDr9oEMxzhZ jay@jvf.cc 21:19
JayFI suspect TheJulia would want a key on there too21:20
JayFbut I can do that if it's a hassle  :)21:20
fungiJayF: i have to specify a project, so any openstack-ironic change which fails a job named ironic-standalone will get that build's node(s) held21:22
fungier, any openstack/ironic change21:23
TheJuliaany openstack/ironic change should do21:23
TheJuliajust ironic-standalone.*21:23
JayFTheJulia: what was the deal with the sigterm discovery?21:25
JayFTheJulia: conclusions, I mean21:25
TheJuliaI'm starting to wonder if we have a comedy of errors21:25
JayFjust making sure before I go deep on this I'm going deep on the right thing21:25
JayFIf so it needs a better sense of humor :|21:26
TheJuliaso restarted 12 times, only sigtermed twice on https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d5e/910546/2/check/ironic-standalone/d5ebf5a/controller/logs/screen-q-dhcp.txt21:27
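The restart-versus-SIGTERM tally above can be approximated with a short script. This is a minimal sketch, assuming the log contains lines shaped like the two dnsmasq lines quoted earlier in this log ("started, version ..." and "exiting on receipt of SIGTERM"); the function name and the returned keys are hypothetical and are not part of neutron or dnsmasq.

```python
import re
from collections import Counter

# Patterns modelled on the two log lines quoted in the channel (assumed format).
START_RE = re.compile(r"dnsmasq\[(\d+)\]: started, version (\S+)")
TERM_RE = re.compile(r"dnsmasq\[(\d+)\]: exiting on receipt of SIGTERM")


def summarize_dnsmasq_log(lines):
    """Count dnsmasq starts and SIGTERM exits, keyed by pid."""
    started = {}        # pid -> version string reported at startup
    terminated = set()  # pids that logged a SIGTERM exit
    for line in lines:
        match = START_RE.search(line)
        if match:
            started[match.group(1)] = match.group(2)
            continue
        match = TERM_RE.search(line)
        if match:
            terminated.add(match.group(1))
    # Pids that started but never logged a SIGTERM exit.
    no_sigterm = sorted(set(started) - terminated, key=int)
    return {
        "starts": len(started),
        "sigterms": len(terminated),
        "versions": dict(Counter(started.values())),
        "started_without_sigterm": no_sigterm,
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py screen-q-dhcp.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(summarize_dnsmasq_log(handle))
```

A pid listed under "started_without_sigterm" is only a candidate for an unexpected exit; it may also simply still be running when the log ends.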
JayFI have a 2pm-2:30pm meeting, if you wanted to look at this on a call after that I can21:28
TheJuliaenable and restart methods getting called by neutron21:28
JayFwould not hate making sure my knowledge is synced before I get in deep to trying to break dnsmasq in a VM21:28
TheJulia2-3 is a meeting for me, and 3:30 is mr corgi's pickup from the surgeon21:28
JayFoh, I hope things went well with your pup, my FIL's dog recently had that issue+surgery and is doing better as a result21:28
TheJuliaSurgeon called summer and said "so many stones... so very very many stones"21:29
JayFhopefully they'll be a lot better getting it fixed21:31
fungiTheJulia: JayF: wait, to be clear, zuul matches on an exact job name. i told it to hold the next failure of a job named "ironic-standalone"21:34
fungiis that an actual job?21:34
TheJuliawe have an ironic-standalone and an ironic-standalone-redfish job21:34
TheJuliaboth are unhappy21:34
JayFe.g. ironic-standalone https://zuul.opendev.org/t/openstack/build/b4274e6e026b4cb592ad9e211eb6917b : FAILURE in 1h 19m 09s21:34
fungiokay, i can create a separate autohold for ironic-standalone-redfish if you want21:35
JayFKangie: ^ Someone in #gentoo-chat said you might have ideas on how to troubleshoot this. Mainly I'm trying to figure out how, locally, in an attempt to reproduce the dnsmasq respawning issues, I might restrict I/O performance on a libvirt+kvm VM in order to emulate high contention in the gate (any other advice you have on this would be good, too)21:35
JayFKangie: basically we're seeing dnsmasq respawn, frequently, during config updates under high load, where we suspect "high load" in reality == bad I/O performance21:36
KangieIt looks like there might be IO limits in a qemu branch 21:36
JayFand I just reproduced the issue on HEAD of master of dnsmasq 21:36
Kangiehttps://repo.or.cz/w/qemu/qemu-dev-zwu.git/shortlog/refs/heads/io_limits_latest21:36
JayFooh good stuff21:36
KangieSadly a decade or more old 21:36
JayFoh, not good stuff LOL21:36
KangieBut if the patches still apply or got upstreamed....21:37
JayFyeah, I'll use that to search around21:37
KangieI guess other things - network filesystem 21:37
JayFam also slightly tempted to... oh, that's a good idea21:37
KangieMount it on a network FS and restrict bandwidth 21:37
JayFI was going to literally just put a spinning disk in the machine21:37
JayFand run bonnie++ on the same partition 21:37
Kangie(or point it at a clustered filesystem and mildly load it... :P)21:37
JayFnetwork fs + traffic shaping is a really, really stellar idea21:38
JayFmainly because I know how to do all those things already21:38
JayFlol21:38
JayFThanks; I knew you'd have some inspiration for me!21:38
KangieHey 21:38
Kangiehttps://blogs.igalia.com/berto/2015/08/14/io-limits-for-disk-groups-in-qemu-2-4/21:38
KangieIO limits seem to have landed :D21:38
JayFokay, this is excellent :D21:39
KangieI got lucky on my first google. :)21:39
KangieLemme know how it goes for you please!21:39
JayFI sometimes vapor-lock and don't even know what to look for21:39
JayFI would say hopefully I don't need it21:40
KangieHahah true 21:40
JayFbut my hope dispenser is dry and empty lol21:40
KangieI get some hardware to pilot all my new node management stuff 21:40
KangieI'm thinking ironic to image nodes and a Linux head node stack based on containers 21:40
JayFwell I'll state, for posterity, I don't think we've ever had this issue reported in the real world :D 21:40
JayF"linux head node stack" I know all those words separately21:41
KangieAny caveats about running ironic in a VM?21:41
JayFbut can't make sense of them together21:41
JayFOnly real caveat to running Ironic-anything-anywhere is that conductors need specific networking21:41
KangieLike if I wanted to bind an interface in and have ironic do dhco and stuff 21:41
KangieDhcp*21:41
KangieGreat. That might be a deployment option 21:41
JayFdepending on how you have it deployed, what services/etc and what you're using, you might also end up with neutron-dhcp-agent running your dhcp21:41
JayFI don't know how much stack you're going to stack21:42
JayFbut we should have this chat sometime when my brain isn't mushy from looking at CI logs for hours :)21:42
KangieHaha. I'll tidy up the slides that a guy gave a good talk on and share em at some point 21:42
JayFIn fact, I can put aside some time for you if you wanted to chat on it sync21:42
JayFjust lmk21:42
KangieI had to take photos from the back row and the company hasn't responded to my emails 21:42
KangieNo rush, I'll be snowed under with procurement until April.21:43
KangieGood luck with your VM issues21:43
JayFtime to riir dnsmasq /s21:43
JayFsomehow seems less daunting than tracking down a bug that it seems like our CI is the only thing in the world that can reproduce21:43
JayFwe get that a lot (I know nova has that issue r/n in CI with a kernel bug)21:43
clarkbif only we could convince people that these problems are actually problems and likely show up elsewhere people just don't bother to report it21:50
JayFWho is "people" in this case? Most folks working on OpenStack are corporate people, paid to make OSS better. I suspect that's not the case for the author of dnsmasq21:52
JayFlike, I agree with the sentiment, I just feel like in the case of dnsmasq ... we are the people21:52
JayFeven if we are ill-equipped to be lol21:52
fungii think he means ironic users are also seeing these problems and just not reporting them to you22:01
fungiand the feeling that they only ever occur in ci jobs is wishful thinking22:02
fungior nova users in the case of the aforementioned kernel oopses22:03
fungirather, the ci failures are likely indicative of bugs which also occur in production somewhere and nobody's brought it to the project's attention22:04
fungiwe often see bugs in our providers whose signatures/symptoms closely match bugs we "only saw in ci" a few cycles earlier22:05
clarkbfungi: yes exactly the infra team has found over and over again that issues in ci go into prod and then we find them there too22:07
clarkbits just that people don't report them up which makes people think they can avoid fixing them22:07
clarkband I was specifically thinking of the libvirt/kernel stuff22:07
fungithough not in these cases at least, since they're severely impacting the projects' abilities to test and merge changes22:08
clarkbthey push back a lot on us and its fair to a point to get enough info to make things actionable but then they go "oh this isn't the very latest version of libvirt can't help you"22:08
clarkbnevermind millions of hypervisors are running that version22:08
fungidefinitely more aggravating when bugs we "occasionally saw in ci" turn into bugs we "frequently see in production" after they make it into a major release and people start upgrading to it22:10
TheJuliai think I have an idea of what is going on22:23
JayFI'd be really interested to see what :D22:38
TheJuliaso, with 2.87, it *appears* when we get a second release, dnsmasq can lose its mind22:40
TheJuliait then seems to go poof looking at the dnsmasq systemd journal22:40
JayF'a second release' meaning what, exactly?22:40
TheJulialooking for that window again :)22:40
JayFDHCPRELEASE x2 for the same thing?22:40
JayFor something else?22:40
TheJuliahttps://www.irccloud.com/pastebin/gK0vFtNt/22:41
JayFwould we know via logs, for instance, if there was a simultaneous thing happening like a HUP22:41
JayFor do we know this is isolated from any intentional config changes22:41
TheJuliahttps://www.irccloud.com/pastebin/GkOhsBpS/22:42
TheJuliabut, I don't see that when we go to 2.9022:42
JayFbut instead we see other breakages22:43
TheJuliayeah22:43
TheJuliawhat if it is another symptom instead of the actual cause22:43
* TheJulia looks at a 59 second gap22:43
JayFhmmm is that consistent across *all* the services?22:44
TheJuliawhat is that you're asking about22:45
JayFe.g. if the VM itself hit a 59 second gap in execution for $busy_reasons, that would be a key insight into what's going on I suspect22:45
JayFI'm saying was the 60 second gap dnsmasq22:45
JayFor was it *ALL OF THE THINGS* (or at least most)22:45
TheJuliano, the gap more looks like just an artifact of my greps22:45
JayFtrying to differentiate between "the whole VM is out to lunch" and "dnsmasq is out to lunch"22:45
JayFack22:45
JayFright now I'm looking at the code in neutron dhcp agent22:46
JayFthat validates that dnsmasq is done with a hup22:46
JayFbasically pulling on the thread of "where are there potential races in low performance cases"22:46
TheJuliaI'm starting to suspect individual cores are going out to lunch22:46
clarkbyou should see that as cpu steal in top and similar22:47
clarkbwhats the new tool called copilot?22:47
clarkbthough if it is just a deadlock then that will be less visible22:47
TheJuliaapparently I'm on a call scheduled for an hour and everyone is talking how it will be 2 hours 8|22:48
JayFfungi: clarkb: I honestly hope you don't see "sweeping failures under the rug" in Ironic's culture (if you do, let's talk about what that looks like in a DM; I'm interested to know your perspective). We try pretty hard to isolate and fix the issues where they show up, since we have such a huge range of unknowns built in by the nature of there being so much crazy hardware in the world22:53
fungiJayF: i don't think so, no. it's more of a well-honed kneejerk reaction whenever people say things that might sound like they think a bug exists only in ci environments and not in the real world22:55
clarkbno it's usually people outside of openstack actually22:56
fungiturns out we do a surprisingly good job of emulating real-world environments and bugs people encounter in their ci jobs are almost certainly possible to hit in real deployments22:57
clarkbopenstack has the ci issue. Identifies a problem in a dependency. Those maintainers say "that's not a problem because it's a ci issue"22:57
clarkbthe implication being the problem is in our test environment and not in the software being tested22:57
clarkband it is often difficult to get people to understand that I'm not breaking their software it was already broken22:57
fungiyeah, i do think projects outside openstack have a much higher chance of using non-representative ci systems and assume by default bugs encountered in tests are not real bugs22:58
clarkbfor example: we have gotten pushback from qemu/libvirt/kernel types because of things like "no one should actually use qemu" nevermind it's useful for all kinds of stuff22:59
fungiwe tried from the beginning to test openstack as close to real usage as feasible in a ci system22:59
clarkband then when we say we can't use nested virt because it crashes all the time they push on that and say "oh well where are the bug reports" we can't actually provide them because we don't have insight into the host kernel22:59
clarkband when we by some miracle manage that we get told everything is too old and can't be helped22:59
clarkbat every turn we're doing things wrong and yet I did nothing to modify the software or use it beyond its intended scope23:00
JayFI see the most infuriating variant of this frequently in gentoo community: someone pushes a PR fixing a real issue, on real machines people use that they care enough to put PRs up for (things like; this is invalid C and only works on gcc version X) ... and they just get crapped on because the author often says "use GCC version X" and just doesn't care about the broken folks23:01
TheJuliaso! we're running low, but not horribly low on ram. we just start to tickle swap even though we have ~1+G free. At least according to the logs. I'm going to try and do a patch on dialing down concurrency a little so we only run one job but there is time to run the cleanup separately23:05
TheJuliaoh! I see a difference in our job config23:06
TheJulia1 cpu on one which is more fail happy, 2 on redfish. Just... "interesting"23:06
opendevreviewJulia Kreger proposed openstack/ironic master: DNM: Adjust standalone job concurrency  https://review.opendev.org/c/openstack/ironic/+/91055223:07
fungiJayF: i feel like that attitude from some upstreams is exactly why distros end up carrying so many downstream patches in their packaging23:16
fungiin debian they even have a term for them, "hostile upstreams"23:17
fungisome are just notoriously anti-packaging in general and have essentially been flagged as "do not contact"23:18
fungiothers tolerate the idea of downstream packaging but don't want to be bothered with any bugs outside their one documented and tightly-scoped deployment model/platform/environment23:20
TheJuliaEven packaging friendly teams can sometimes do things which can really shoot packagers in the foot.23:37
TheJuliaI suspect turning dstat on might be a good idea on the ironic standalone jobs, but still a pile of super weird23:39

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!