Monday, 2024-11-04

rpittaugood morning ironic! o/07:38
rpittauJayF re https://review.opendev.org/c/openstack/virtualpdu/+/933882 that looks good to me,  that's great :)08:31
rpittauJayF re Epoxy priorities: I was planning to do the work items as we usually do after every PTG like https://review.opendev.org/c/openstack/ironic-specs/+/916295 but I haven't started anything yet, if  you have a draft already feel free to submit that and I will review that as highest priority08:31
opendevreviewcid proposed openstack/ironic master: Allow special characters in patch field keys  https://review.opendev.org/c/openstack/ironic/+/93374309:03
opendevreviewcid proposed openstack/ironic master: Save ``configdrive`` in an auxiliary table  https://review.opendev.org/c/openstack/ironic/+/93362209:41
opendevreviewMerged openstack/ironic master: Correct duplicated names/entries in unit tests  https://review.opendev.org/c/openstack/ironic/+/93367910:04
opendevreviewMerged openstack/ironic master: Remove trailing whitespace  https://review.opendev.org/c/openstack/ironic/+/93368411:11
cidGood afternoon Ironic, rpittau.11:24
rpittauhey cid :)11:24
cidDoes networking-baremetal fall under the Ironic umbrella?11:25
rpittaucid: it does indeed!11:25
cidOkay great.11:25
cidA new bug that I might need help triaging was filed on it; I will be adding it to the meeting page.11:27
rpittauthanks cid, sounds good11:27
cid++11:27
opendevreviewyatin proposed openstack/ironic stable/2023.1: [Stable Only] pin virtualbmc to last released tag  https://review.opendev.org/c/openstack/ironic/+/93403612:42
opendevreviewcid proposed openstack/ironic master: Save ``configdrive`` in an auxiliary table  https://review.opendev.org/c/openstack/ironic/+/93362212:45
opendevreviewyatin proposed openstack/ironic stable/2023.1: [Stable Only] pin virtualbmc/ironic-tempest-plugin to last released tag  https://review.opendev.org/c/openstack/ironic/+/93403613:43
opendevreviewyatin proposed openstack/ironic stable/2023.1: [Stable Only] pin virtualbmc/sushy-tools/ironic-tempest-plugin to last released tag  https://review.opendev.org/c/openstack/ironic/+/93403614:51
rpittau#startmeeting ironic15:00
opendevmeetMeeting started Mon Nov  4 15:00:26 2024 UTC and is due to finish in 60 minutes.  The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'ironic'15:00
dtantsuro/15:00
rpittauHello everyone!15:00
rpittauWelcome to our weekly meeting!15:00
rpittauThe meeting agenda can be found here:15:00
rpittauhttps://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_Novembere_04.2C_202415:00
rpittaunot a lot to discuss today, but let's see also if we have quorum15:01
JayFo/15:01
cido/15:02
rpittauI will start with the announcements and see if more people join15:03
rpittau#topic Announcements/Reminders15:03
rpittau#info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio15:03
rpittau#link https://tinyurl.com/ironic-weekly-prio-dash15:03
rpittauseems most of the patches there are updated and need a second review15:04
rpittaucan probably have a look after the meeting15:04
rpittau#info 2025.1 Epoxy Release Schedule15:05
rpittau#link https://releases.openstack.org/epoxy/schedule.html15:05
rpittauwe're at R-21, nothing special; related to that, the next bugfix branches will happen in 3 weeks and we should consider having releases for other projects too15:05
rpittau#info Epoxy OpenInfra PTG was October 21-25, 202415:07
rpittau#link https://etherpad.opendev.org/p/ironic-ptg-october-202415:07
rpittauI'm leaving this still here for review15:07
rpittauJayF have you seen my message for the work items?15:07
JayFYes. I'll work on that today15:09
rpittauthanks15:09
rpittaumoving  on15:09
rpittau#topic Discussion topics15:09
rpittaudo we have anything to discuss today?15:09
rpittauonward then!15:10
rpittau#topic Bug Deputy Updates15:11
cidYea, there were two RFEs15:11
rpittaucid: I think https://bugs.launchpad.net/networking-baremetal/+bug/2086303 is on oslo and not on us15:11
rpittauor even kolla15:11
cidOh, okay, should I take any specific action(s)15:12
rpittauthere's a reference to another bug15:12
cardoetimezones... 15:13
rpittaulooks like the issue is due to the fix for that15:13
cidSo, invalid on networking-baremetal?15:14
rpittaucid: reading the other bug I'm not sure anymore, it may be valid on nbm15:15
cidI might just change the status from new, then we will wait to find out with time where exactly it's occurring15:15
rpittaucid: I would ask in the bug itself if they want to propose a fix in nbm15:16
JayFWhile we're talking about bugs, there was a patch posted over the weekend to NGS completely introducing a new actions model using an agent. I put a comment in that rfe requesting a spec and marking it as needs-spec15:17
JayFI don't have the link at hand to the bug, but it's the one associated with the NGS patch that was posted on Sunday15:17
rpittauJayF: thanks :)15:17
cidrpittau, I will drop a message in there.15:17
rpittauthanks cid 15:17
rpittaualso for https://bugs.launchpad.net/ironic/+bug/2064655 that is not a bug15:17
rpittauit's just a permission issue15:18
cidrpittau, I suspected same.15:18
rpittauRFEs both look ok based on feedback from PTG15:19
cid++15:19
cidOn 2064655, can I invalidate the bug?15:20
rpittauyep15:20
cidAlright15:20
rpittauabout that error in CI, any specific job that is failing for that?15:21
cidSeen it in one of mine.15:22
cid1 sec, let me get the link15:22
cidhttps://review.opendev.org/c/openstack/ironic/+/93374315:22
cidSame job and error failed here https://review.opendev.org/c/openstack/ironic/+/933622, though there are a lot of other failing jobs that are related to the change.15:24
rpittauok, so it's related to a patch? looks like in ironic-tempest-ovn-uefi-ipmi-pxe only15:25
rpittauit may be worth running a test-ci patch to see if it replicates15:25
JayFThe unit test failure on the config drive change is related to the patch based on a cursory look15:26
rpittauyeah15:26
rpittauthat's why we need a zero-change patch to verify15:26
rpittauanyway, moving on15:28
rpittauany volunteer for bug deputy for this week?15:29
cidI don't mind15:29
rpittaucid: thanks! 15:30
cidMy pleasure15:31
rpittauone thing I forgot to mention, next Monday I won't be able to chair the meeting, someone else will need to take my place15:31
rpittauany more topics for today?15:32
rpittaualright, thanks everyone!15:33
rpittau#endmeeting15:33
opendevmeetMeeting ended Mon Nov  4 15:33:20 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:33
opendevmeetMinutes:        https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-11-04-15.00.html15:33
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-11-04-15.00.txt15:33
opendevmeetLog:            https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-11-04-15.00.log.html15:33
JayFI should be able to run the meeting next week. I'll set myself a reminder15:37
rpittauthanks JayF :)15:43
TheJuliaI totally forgot about time change16:14
TheJuliaAlso, conflicting meeting for reasons16:14
rpittauand that's why we should abolish time change16:18
JayFLet's abolish all change.16:20
jamesdentonCan an image created with block-device-efi-lvm be resized post-deployment? Most examples i see have the root partition set to something like 25G. Also curious if config-2 will be problematic?16:39
cardoerpittau: I keep hoping that happens every year.16:43
cardoeSo skrobul noticed that "baremetal node set 1ec32b3b-6e5b-46de-a723-426162a22e3d set --no-automated-clean" has a policy error by default if the nodes are in a project. But if you use system scope like it suggests it can't find the nodes. Do we need a tweak to the default policy?16:45
cardoe"baremetal:node:disable_cleaning": "role:admin and system_scope:all" requires a scope of ['system'], request was made with project scope. (HTTP 500)16:45
cardoeI'm happy to make a bug report or maybe poke him to do so.16:45
JayFcardoe: no, that's accurate16:49
JayFcardoe: security disablement is not a project decision; it's a system decision16:50
JayFIf you're saying it doesn't work either way, I'd like to get some logs of it in both directions16:50
JayFjamesdenton: don't know the answer for you, btw16:51
JayFjamesdenton: might be worth cross-posting in #openstack-dib16:51
Pcmalih_Hi, I am facing OSError: [Errno 18] Invalid cross-device link: '/var/lib/ironic/master_iso_images/UUID' -> '/tmp/ahfgjsh/boot. Using a Redfish virtual media HTTP URL and a Glance UUID both hit this issue. Any suggestions?16:55
JayFPcmalih_: is /tmp on a separate partition?16:56
JayFPcmalih_: if yes; can you file a bug @ bugs.launchpad.net with this error, and the output of `mount` on your conductor16:57
rpittaugood night! o/16:57
Pcmalih_Looks like it. But I have hosted both on the same machine being used as controller16:57
JayFPcmalih_: and putting master_path on the same partition as /tmp would resolve it16:57
JayFPcmalih_: this looks a /lot/ like the error you'd get trying to hardlink across partitions, so I'm wondering if we have a bug/assumption that master_path and /tmp are the same partition16:57
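A minimal sketch of the failure mode JayF describes above, not Ironic's actual code: `os.link` raises `OSError` with `errno.EXDEV` (the "Invalid cross-device link" error) when the source and destination sit on different filesystems. The helper names `same_filesystem` and `link_or_copy` are hypothetical, and falling back to a copy is only one possible workaround.

```python
import errno
import os
import shutil


def same_filesystem(path_a, path_b):
    """Return True if both existing paths live on the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def link_or_copy(src, dst):
    """Hard-link src to dst, copying instead if they are on different filesystems.

    Hypothetical helper for illustration only.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as exc:
        # EXDEV is the "Invalid cross-device link" error quoted above.
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)
        return "copied"
```

Calling `same_filesystem` on the image cache directory and `/tmp` on the conductor would show whether the two are on separate partitions.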
Pcmalih_Any way to set master_path ?16:58
JayF[pxe]/instance_master_path https://docs.openstack.org/ironic/latest/configuration/sample-config.html16:59
JayFor 16:59
JayF[deploy]/iso_master_path16:59
JayFthat path looks more like the iso_master_path default value to me16:59
JayFI'll note: this dir is basically the place images get cached in Ironic, so you probably don't want *that dir* in /tmp17:00
Pcmalih_But i am using Kolla-Ansible17:00
JayFI can't speak to how kolla-ansible sets stuff up, they can help you with that piece, but please do file a bug as this is our issue if it's what I think it is17:01
JayFbut I need full logs to be sure17:01
Pcmalih_How to customise tmp address or ironic/master_iso_images using Kolla-Ansible?17:01
TheJuliajamesdenton: re block-device-efi-lvm, then take a look at the growvols ?element?17:02
TheJuliacardoe: likely, the restriction to it only being system scope should be fixed17:03
TheJuliacardoe: but the default policy rule is realistically, as jayf pointed out, a system decision17:03
JayFAre you saying we should allow project admin to do it?17:04
TheJuliawe should allow someone to set appropriate policy to meet their needs17:04
TheJuliato just error on forcing scope to be specific is wrong17:04
JayFokay, so we don't have the bits in place for someone to make a custom policy rule permitting project to do this17:04
TheJuliabecause then they don't get a real "you don't have access", we outright deny because you're asking with the wrong scope17:04
JayFand we wanna add that while keeping the default17:05
TheJuliaslightly nuanced difference17:05
JayFokay I am +1 to that change17:05
JayFas long as default policy still restricts to system admin17:05
JayFor system-whatever17:05
TheJuliayeah17:05
opendevreviewJulia Kreger proposed openstack/ironic master: trivial: Fix policy scope restriction for automated cleaning  https://review.opendev.org/c/openstack/ironic/+/93406517:15
TheJuliathat ^^17:15
jamesdentonThanks JayF TheJulia - gives me something to look at17:15
JayFDo we want a release note on that ?17:16
JayFSeems like it's operator facing17:16
TheJuliamildly operator facing17:16
opendevreviewJulia Kreger proposed openstack/ironic master: trivial: Fix policy scope restriction for automated cleaning  https://review.opendev.org/c/openstack/ironic/+/93406517:19
opendevreviewMichal Nasiadka proposed openstack/networking-generic-switch master: Add vlan aware VMs support  https://review.opendev.org/c/openstack/networking-generic-switch/+/92849017:28
cardoeSo am I doing it wrong by putting my hardware into a domain and project that's separate from the domain that my normal users/projects are in?17:58
opendevreviewJulia Kreger proposed openstack/networking-baremetal master: prevent break on communications failure  https://review.opendev.org/c/openstack/networking-baremetal/+/93314918:24
jamesdentonmnasiadka - with your NGS work, are you observing network_data.json reflecting the VLAN configuration on the deployed baremetal instance? Like, is cloud-init implementing the subinterface configuration in the OS?18:25
opendevreviewJulia Kreger proposed openstack/networking-baremetal stable/2024.2: avoid attribute error on bad password or config  https://review.opendev.org/c/openstack/networking-baremetal/+/93407018:26
opendevreviewJulia Kreger proposed openstack/networking-baremetal stable/2024.1: avoid attribute error on bad password or config  https://review.opendev.org/c/openstack/networking-baremetal/+/93407118:26
opendevreviewJulia Kreger proposed openstack/networking-baremetal stable/2023.2: avoid attribute error on bad password or config  https://review.opendev.org/c/openstack/networking-baremetal/+/93407218:26
opendevreviewJulia Kreger proposed openstack/networking-baremetal stable/2023.1: avoid attribute error on bad password or config  https://review.opendev.org/c/openstack/networking-baremetal/+/93407318:27
opendevreviewJulia Kreger proposed openstack/networking-baremetal master: prevent break on communications failure  https://review.opendev.org/c/openstack/networking-baremetal/+/93314918:36
TheJuliahjensas: so I've been thinking about maybe we should be raising https://docs.openstack.org/oslo.service/ocata/api/loopingcall.html#oslo_service.loopingcall.LoopingCallDone instead, if you can take a look/glance (hopefully you remember enough of the higher level interaction)18:38
mnasiadkajamesdenton: haven’t thought about it, I’ll check - but isn’t network_data.json a Nova metadata thing?18:39
jamesdentonindeed. just curious to see how tight the integrations were18:39
TheJuliacardoe: you may also have thoughts regarding my comment to hjensas above18:40
jamesdentonno worries and no rush18:40
TheJuliaI've thought about the metadata, and it likely has a couple issues with bonds/multiple ports18:40
jamesdentoni think you're right18:40
JayFmnasiadka: kinda, but also it's us...18:40
JayFlet me show you18:40
JayFhttps://opendev.org/openstack/nova/src/commit/0a59078935b753ba30518915e15189e63c0dfc66/nova/virt/ironic/driver.py#L104518:41
JayFso essentially, ironic's nova driver builds what goes in network_data in the configdrive18:41
JayFbut does *not* do the same for metadata service18:41
JayFso yes-ish it's coming from nova-the-service, but contents are code that is sorta jointly maintained (and is separate from other drivers)18:41
TheJuliacardoe: hjensas: at least calling LoopingCallDone would allow us to signal with -1... in theory18:43
hjensasTheJulia: will it fully stop if we raise LoopingCallDone? There are two "loops" running, one for reporting state, and the other "notify_agents" for the cluster of agents.18:57
opendevreviewMerged openstack/ironic-python-agent master: Cleanup usage of imported-from-ironic-lib disk_utils  https://review.opendev.org/c/openstack/ironic-python-agent/+/92846618:57
TheJuliaoh....18:57
TheJuliayeah, that would prevent it from ever exiting then18:57
TheJuliaeww18:57
TheJuliaso... hmmmmm18:57
TheJuliawell, calling stop then would at least exit it18:57
TheJuliain *theory*18:57
hjensasIt would be good to exit with an error return code ...18:58
TheJuliaI think the thing to do is to just have a "exit with a return code" to the definition of stop19:00
TheJulia.... and then force python to exit with it19:00
mnasiadkaJayF: thanks for the pointers19:03
TheJuliahjensas: we could call os.abort()19:04
TheJuliait could result in a core dump, and will result in an exit code... 255 I think19:06
TheJuliamodern linux distros don't save core dumps by default, so it should be fine...19:06
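A quick way to check what exit status `os.abort()` produces, sketched under the assumption of a POSIX host: the child process is killed by SIGABRT, which Python's `subprocess` reports as the negative signal number and a shell would typically report as 128 + 6 = 134. The function name `abort_exit_status` is hypothetical.

```python
import signal
import subprocess
import sys


def abort_exit_status():
    """Run a child interpreter that calls os.abort() and return its status."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = abort_exit_status()
    # On POSIX, a negative returncode means "terminated by that signal".
    print(status, status == -signal.SIGABRT)
```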
hjensasTheJulia: I wonder if we should have something similar in the _notify_peer_agents method, try to re-initialize the messaging transport? (But I think oslo messaging should provide quite good resiliency, with retries with back off etc already.)19:08
TheJuliaso, it only retries... 9 times19:08
TheJulia30 seconds apart19:09
TheJuliaso... I'm not sure how simple it would be to do that or not19:09
hjensasoh, it stops after 9 tries? I figured it would keep on trying with 30 second interval.19:13
TheJuliayeah, the default is a limited number of tries from my reading of the oslo.messaging code19:14
TheJuliagranted, I only skimmed it19:14
TheJuliaI think the whole idea is "how often does the message bus break"19:15
TheJuliahmm... I guess _notify_peer_agents could fail too19:15
TheJuliahmmm19:16
hjensasThen catching "Exception" there might be bad - MessageDeliveryFailure might be better?19:16
TheJuliaso the case we found the failed host in, it wasn't logging anything19:18
TheJuliait could be the overall process was long "wedged"19:18
TheJuliaI bet self.notifier.info() never returns and gets hung19:21
opendevreviewMerged openstack/ironic-python-agent master: Remove use of ironic_lib i18n module  https://review.opendev.org/c/openstack/ironic-python-agent/+/93008019:22
TheJuliaso, there is no timeout19:23
TheJuliaso the looping call could in theory just hang too19:23
TheJuliaI guess that is a uniquely separate way the agent can just sort of enter an apparent frozen state19:30
hjensasself.notifier.info() is the listener, it will run if a info message is on the topic.19:32
TheJulianope, there is a default timeout of 300 seconds, so *eventually* it should exit19:36
TheJuliayeah19:36
TheJuliamy nope was for my own perception of it hanging19:37
TheJuliaso the likely state was the self.notifier.info was still silently working19:37
TheJuliaso I bet the *actual* failure was more than likely the client breaking19:40
TheJuliaand I bet the message bus was in a slightly happier state19:40
TheJuliawhich I feel like it sort of aligns with the state, the only way to know otherwise was to look at network sockets I guess19:40
cardoehmm getting caught up sorry.19:43
cardoesounds like no good solution yet?19:43
opendevreviewJulia Kreger proposed openstack/networking-baremetal master: prevent break on communications failure  https://review.opendev.org/c/openstack/networking-baremetal/+/93314919:47
TheJuliaeh, I think ^ might cover it19:48
TheJuliait will cause it to self abort, looping call will cause things to eventually timeout19:49
TheJuliaso as long as loopingcall doesn't entirely go on vacation19:49
TheJuliasome failure under it should become detectable and the abort *should* get called19:50
TheJuliaThat is a few toooo many shoulds, but it feels sound enough19:50
TheJulia... I think19:50
JayFOK something is upside-down in CI19:52
JayFhttps://review.opendev.org/c/openstack/ironic-python-agent/+/911598 and ironic-lib changes both appear to not have tempest installed in the Devstack runs19:53
JayFor the ironic-tempest-plugin19:53
JayFhmmm maybe this is a different shaped failure than ironic-lib19:54
JayFI'm going to recheck and see, but I suspect we got busted by something19:54
TheJuliahmmmmmm19:55
JayFWe also have a stunning number of patches in ironic-week-prio19:57
JayFI'm going thru em now myself, but some nonzero % of them are mine19:57
JayFmnasiadka: https://review.opendev.org/c/openstack/networking-generic-switch/+/932541 just asking if you saw my comment here? Happy to help you get a release note on that if you want so we can land it19:58
JayFQuestion for the class: now that we have ironic-reviewers and ironic-approvers separated, how do we feel about self-approvals when the code review threshold is already met?20:00
JayFe.g. https://review.opendev.org/c/openstack/ironic/+/933685/1 has CID +2, Riccardo +2, but not landed (CID's review came second). Should I land it even though I was the author?20:01
JayFI'm leaning towards "yes/best judgement" but don't wanna make such a call unilaterally :)20:01
TheJuliaI generally refrain from doing so unless there is a needful aspect, i.e. "oh, trying to fix CI"20:02
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Vendor metrics library from Ironic-Lib & deprecate  https://review.opendev.org/c/openstack/ironic-python-agent/+/93306320:02
JayFhonestly the only rush here is wanting the tip of that chain to merge before something else new being linted gets landed and breaks it20:03
JayFsince I don't wanna have to redo that audit20:03
cardoeYeah that’s why I was trying to nudge that in.20:09
opendevreviewMerged openstack/python-ironicclient master: bump minimum pbr version for pip 23.1 support  https://review.opendev.org/c/openstack/python-ironicclient/+/93359720:41
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Correct invalid docstrings; s/Found/Error/  https://review.opendev.org/c/openstack/ironic-python-agent/+/91159820:50
opendevreviewMerged openstack/ironic-python-agent stable/2024.1: Warn when the provided checksum algorithm does not match the detected  https://review.opendev.org/c/openstack/ironic-python-agent/+/93310521:33
opendevreviewJay Faulkner proposed openstack/ironic-python-agent stable/2023.2: Warn when the provided checksum algorithm does not match the detected  https://review.opendev.org/c/openstack/ironic-python-agent/+/93409121:35
JayFfyi looks like we're getting some clean ci runs now21:58
JayFidk what was broken on all those patches I checked but it's gone21:58
opendevreviewJay Faulkner proposed openstack/ironic master: Use patched dnsmasq from PPA  https://review.opendev.org/c/openstack/ironic/+/93310422:10
JayF^^ that is ready for review, is idempotent, and should solve our issue (and we can back it out once upstream takes the backport)22:11
JayFwell, *ubuntu* takes the backport, which is upstream of us but downstream of dnsmasq22:11
opendevreviewJay Faulkner proposed openstack/ironic-specs master: WIP: 2025.1 priorities  https://review.opendev.org/c/openstack/ironic-specs/+/93409223:03
JayFrpittau: ^ I commented where I stopped off, I have a meeting to go to23:04
JayFrpittau: that should be a good start though23:04
opendevreviewDoug Goldstein proposed openstack/python-ironicclient master: fix port name in Port resource  https://review.opendev.org/c/openstack/python-ironicclient/+/93374623:34
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Correct invalid docstrings; s/Found/Error/  https://review.opendev.org/c/openstack/ironic-python-agent/+/91159823:58
