Tuesday, 2023-03-14

TheJulia\o/00:08
rpittaugood morning ironic! o/08:08
rpittauand whoopsies for ipmitool08:08
rpittausince we're in the mood recently maybe we can adopt it? :)08:08
dtantsurrpittau: a bunch of vendor-specific C code? I'd rather not...08:33
rpittauno worries, it was of course a joke, I'm not "that" crazy :D08:52
arne_wiebalckGood morning, rpittau dtantsur and Ironic!11:22
arne_wiebalckI am looking into rebuilding nodes with s/w RAID (since it does not work :-).11:23
arne_wiebalckThe reason is that after the image has been deployed, manage_uefi() tries to create the boot partitions which are already there.11:23
arne_wiebalckCan I see somehow in the IPA that this is a "rebuild" operation (with the idea to skip recreating the boot partitions)?11:24
arne_wiebalckAlternatively, I could probably force partition creation and overwrite the existing ones ... not sure I want to do that.11:25
arne_wiebalckOr I could catch the error and *assume* the existing partitions are ok ... not ideal either.11:27
arne_wiebalckrpittau: what happened to ipmitool?11:35
rpittauarne_wiebalck: https://www.phoronix.com/news/ipmitool-GitHub-Suspended11:39
dtantsurarne_wiebalck: IPA has access to the node, maybe it has access to provision_state?11:41
iurygregorygood morning Ironic12:39
arne_wiebalckhey iurygregory o/13:11
arne_wiebalckrpittau: thanks! hadn't heard this yet13:11
arne_wiebalckdtantsur: hmm ... isn't an `openstack server rebuild ...` just a deploying provisioning state in the end? does Ironic distinguish between deploy and rebuild of an instance anywhere?13:12
dtantsurmm, yeah, you're right13:13
iurygregoryarne_wiebalck, o/13:13
dtantsurarne_wiebalck: we can probably store something temporary in driver_internal_info..13:13
arne_wiebalckdtantsur: you mean set this out of band?13:14
arne_wiebalckdtantsur: or does Ironic know at some point?13:14
dtantsurarne_wiebalck: no, I mean ironic could do that on rebuild13:14
dtantsur'rebuild' is a provisioning verb13:14
arne_wiebalckdtantsur: sorry, when I say "rebuild" I mean `openstack server rebuild`13:14
arne_wiebalckdtantsur: will Ironic see this as a rebuild request?13:15
dtantsurarne_wiebalck: yeah, but I'd expect that to result in an ironic rebuild, not move through 'available'?13:15
dtantsurI hope so :)13:15
arne_wiebalckdtantsur: heh13:15
arne_wiebalckdtantsur: I will check13:15
arne_wiebalckdtantsur: if that is the case, we could make a mental note13:15
arne_wiebalckdtantsur: and check, and erase the note13:15
arne_wiebalckdtantsur: thanks, I will try13:16
dtantsurarne_wiebalck: I mean, I'm quite sure we don't drive nodes through cleaning on rebuild13:16
arne_wiebalckdtantsur: yes, that is correct13:24
arne_wiebalckdtantsur: the other option (for us) would be to set something out of band on the node: operations such as rebuild are often embedded in workflows where we could add a line13:26
arne_wiebalckdtantsur: the first thing I will try, though, is to see if skipping the creation of the boot partitions actually does the trick13:26
arne_wiebalckdtantsur: if it does, we can look into the logic when to do that13:27
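The "make a note, check it, erase it" idea discussed above could be sketched roughly as follows. This is only an illustration using plain dictionaries: the key name `rebuild_in_progress` and all three helper functions are hypothetical and are not part of Ironic or IPA, and how such a flag would actually reach the agent is left open.

```python
# Hypothetical sketch: none of these names exist in Ironic or IPA.
REBUILD_FLAG = "rebuild_in_progress"  # assumed key name, for illustration only


def mark_rebuild(driver_internal_info: dict) -> dict:
    """Return a copy of driver_internal_info with the temporary note set."""
    updated = dict(driver_internal_info)
    updated[REBUILD_FLAG] = True
    return updated


def clear_rebuild(driver_internal_info: dict) -> dict:
    """Return a copy of driver_internal_info with the note erased."""
    updated = dict(driver_internal_info)
    updated.pop(REBUILD_FLAG, None)
    return updated


def should_create_boot_partitions(driver_internal_info: dict) -> bool:
    """Skip boot partition creation only when the rebuild note is present."""
    return not driver_internal_info.get(REBUILD_FLAG, False)
```

With a sketch like this, a regular deploy (no note) would still create the boot partitions, while a rebuild would skip them until the note is erased.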
TheJuliagood morning13:35
opendevreviewDmitry Tantsur proposed openstack/ironic master: Refactoring: create ironic.conductor.inspection  https://review.opendev.org/c/openstack/ironic/+/87731714:41
dtantsurmorning TheJulia 14:41
opendevreviewDmitry Tantsur proposed openstack/ironic master: Refactoring: DRY in the root API controller  https://review.opendev.org/c/openstack/ironic/+/87738415:01
opendevreviewDmitry Tantsur proposed openstack/ironic master: Refactoring: clean up inspection data handlers  https://review.opendev.org/c/openstack/ironic/+/87738715:39
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Migrate the inspector's /continue API  https://review.opendev.org/c/openstack/ironic/+/87594415:45
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Migrate the inspector's /continue API  https://review.opendev.org/c/openstack/ironic/+/87594415:46
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.1: Clean out agent token even if power is already off  https://review.opendev.org/c/openstack/ironic/+/87739115:50
opendevreviewJulia Kreger proposed openstack/ironic stable/zed: Clean out agent token even if power is already off  https://review.opendev.org/c/openstack/ironic/+/87739215:50
jlvillalTheJulia, FYI: I posted a couple of patches for bifrost. 1) just a simple change to show the path if there is an error. https://review.opendev.org/c/openstack/bifrost/+/87712416:06
jlvillal2) The fix I did to get my system with IPv6 disabled to work: https://review.opendev.org/c/openstack/bifrost/+/87712216:06
opendevreviewMerged openstack/metalsmith stable/wallaby: list_instances - cache allocations  https://review.opendev.org/c/openstack/metalsmith/+/87667316:14
rpittaubye everyone, see you tomorrow! o/16:24
TheJuliajlvillal: interesting!16:27
jlvillalTheJulia, Which part? :)16:27
jlvillalI'll assume it is that incredible patch that adds printing the variable of the deploy image location ;)16:28
TheJuliajlvillal: I wonder if the socket code is smart enough to go "that is an ipv4 address, we need to use a v4 socket"16:30
jlvillalI'm not sure...  I figured since Bifrost is IPv4 only and doesn't support IPv6 it was okay to do this change.16:32
TheJuliajlvillal: looking at https://review.opendev.org/c/openstack/bifrost/+/877122/1//COMMIT_MSG#7 I'm a little confused since the file is ironic-inspector.conf.j216:32
jlvillalAh, duh!  Yes you are correct. It should be 'inspector'16:33
TheJuliasorry16:33
opendevreviewJohn L. Villalovos proposed openstack/bifrost master: chore: allow ironic-inspector to work with IPv6 disabled  https://review.opendev.org/c/openstack/bifrost/+/87712216:35
jlvillalTheJulia, Commit message corrected.16:36
jlvillalThanks.16:39
opendevreviewMerged openstack/ironic stable/zed: Do not recalculate checksum if disk_format is not changed  https://review.opendev.org/c/openstack/ironic/+/87725116:43
opendevreviewMerged openstack/ironic bugfix/21.3: Do not recalculate checksum if disk_format is not changed  https://review.opendev.org/c/openstack/ironic/+/87725016:43
opendevreviewMerged openstack/ironic stable/2023.1: Do not recalculate checksum if disk_format is not changed  https://review.opendev.org/c/openstack/ironic/+/87702916:43
opendevreviewDmitry Tantsur proposed openstack/ironic master: Refactoring: clean up inspection data handlers  https://review.opendev.org/c/openstack/ironic/+/87738717:18
opendevreviewDmitry Tantsur proposed openstack/ironic master: Refactoring: clean up inspection data handlers  https://review.opendev.org/c/openstack/ironic/+/87738717:27
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Migrate the inspector's /continue API  https://review.opendev.org/c/openstack/ironic/+/87594417:31
opendevreviewMerged openstack/ironic master: Document [fake] delay config values  https://review.opendev.org/c/openstack/ironic/+/86805318:02
opendevreviewMerged openstack/bifrost master: chore: provide the location of deploy_image_path if missing  https://review.opendev.org/c/openstack/bifrost/+/87712418:44
opendevreviewMerged openstack/networking-generic-switch master: config: Ignore unknown options starting with ngs_  https://review.opendev.org/c/openstack/networking-generic-switch/+/86830019:04
opendevreviewJulia Kreger proposed openstack/ironic-specs master: Add service steps framework  https://review.opendev.org/c/openstack/ironic-specs/+/87234919:05
opendevreviewMerged openstack/bifrost master: Restore discovery for dnsmasq dhcp provider  https://review.opendev.org/c/openstack/bifrost/+/86478719:13
opendevreviewMerged openstack/ironic master: Wipe Agent Token when cleaning timeout occcurs  https://review.opendev.org/c/openstack/ironic/+/87616119:27
opendevreviewMerged openstack/ironic bugfix/21.2: Configure CI for bugfix/21.2  https://review.opendev.org/c/openstack/ironic/+/87641019:27
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.1: Wipe Agent Token when cleaning timeout occcurs  https://review.opendev.org/c/openstack/ironic/+/87739419:30
opendevreviewJulia Kreger proposed openstack/ironic stable/zed: Wipe Agent Token when cleaning timeout occcurs  https://review.opendev.org/c/openstack/ironic/+/87739519:30
opendevreviewJulia Kreger proposed openstack/ironic stable/yoga: Wipe Agent Token when cleaning timeout occcurs  https://review.opendev.org/c/openstack/ironic/+/87739619:30
opendevreviewMerged openstack/bifrost master: chore: allow ironic-inspector to work with IPv6 disabled  https://review.opendev.org/c/openstack/bifrost/+/87712219:40
opendevreviewJulia Kreger proposed openstack/ironic-specs master: Add service steps framework  https://review.opendev.org/c/openstack/ironic-specs/+/87234920:28
opendevreviewJulia Kreger proposed openstack/ironic-specs master: Framework for DPU management/orchustration  https://review.opendev.org/c/openstack/ironic-specs/+/87418921:34
JayFTheJulia: how about "ansible" as a step :D 22:14
* JayF just thinking of that w/r/t your "pause" step ... makes me wonder if we could actually have ironic still-orchestrate the external stuff22:14
ashinclouds[m]Pause is needed for dpu programming aiui22:14
JayFI'm thinking in-addition-to not in-lieu-of22:15
TheJuliaOh22:15
JayFlike implementing an external step where conductor would e.g. shell out and do whatever you want (maybe configured via ironic.conf so it's not arb command exec)22:15
TheJuliaI guess the challenge is if we’re in a special ramdisk which say has the api gateway to do magical card programming22:16
JayFno, I'm saying conductor could do it22:16
TheJuliaWhich we can’t trigger directly (still undefined/nebulous at the moment)22:16
TheJuliaOh22:16
JayFsorta like a more easy way to do custom out-of-band step (really, really out of band)22:17
* JayF thinking back to previous companies where we had patches to e.g. call into a CMDB-like service during deployment22:17
TheJuliaMaybe, we have talked about that before and I’ve been shot down on command invocation-y things22:17
* TheJulia tries to fend off aggressive cat22:17
JayFwe'd have to dictate a strong, safe interface22:17
JayFuse nagios nrpe as a guide: only ever run commands that were submitted by an admin into the ironic.conf22:17
TheJuliaSo, I would almost rather just pass things to the agent22:18
TheJuliaNot on the conductor's shell22:18
JayFIn any use case I would've had for such automation, I needed it OOB22:18
JayFbut that is strongly influenced by the fact that any Ironic I've worked on ends up with a custom hw mgr for some reason so in-band is easier :D22:18
TheJuliaYeah, the challenge is we would have to kick out back to the workload and then back into a ramdisk if there was an issue22:19
TheJuliaWhich maybe that is fine22:19
JayFit's a thought that your comment spawned, I don't think we have to do it at all or if there's a technical hurdle22:20
TheJuliahttps://usercontent.irccloud-cdn.com/file/bdXT5EIF/1678832397.JPG22:20
JayFI was just sorta surprised we hadn't done it before22:20
TheJuliaHalp, boy cat wants scritches22:20
TheJuliaIt being commands?22:21
JayFyeah22:22
JayFarbitrary commands as a step22:23
JayFso ironic can orchestrate external things22:23
JayFwithout needing more patches22:23
TheJuliaYeah, I honestly would prefer if we could do a filter match conductor side and pass through to the agent to run the command locally22:23
TheJuliaWhich would then allow for a cleanish bolt on22:24
TheJuliaAnd it could have that local context22:24
JayFby "the agent" do you mean something other than the in-band agent?22:24
TheJuliaIke, ipa22:24
TheJuliaErr, like, not Ike22:24
TheJuliaThere are no Ike’s here…. Afaik.22:25
JayFWe already support that relatively trivially though, with custom hardware managers22:25
JayFit's quite a larger lift to create a custom hardware type to do similar22:25
TheJuliaYeah, just thinking a standard method22:25
JayFplus conductors generally are in a more privileged network space22:25
JayFso e.g. IPA in almost anywhere I've worked while booted for deploy/clean/rescue would have access to nearly-nothing22:25
JayFwhereas a conductor could, for instance, update an external CMDB that a node had been deployed22:26
TheJuliaYeah, but you could launch something there that does something and waits22:26
JayFI don't understand the line you're drawing with that comment?22:26
TheJuliaLikely best if we do execute with least privilege. And if a command doesn’t exit immediately it could continue to run22:27
TheJuliaAnd run say until exit on the agent side, at which point upon next heartbeat, the next step would run22:27
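The "run until exit, then continue on the next heartbeat" behaviour described above might look roughly like this. It is a minimal sketch only: `StepRunner` and `on_heartbeat` are hypothetical names, not IPA code, and a real agent would also need to handle failures and report results.

```python
import subprocess


class StepRunner:
    """Runs commands one at a time; on_heartbeat() stands in for a heartbeat."""

    def __init__(self, steps):
        self._pending = list(steps)  # argv lists that have not started yet
        self._current = None         # Popen object of the running step, if any
        self.finished = []           # return codes of completed steps, in order

    def on_heartbeat(self):
        """Advance the step queue; return True once every step has exited."""
        if self._current is not None:
            returncode = self._current.poll()
            if returncode is None:
                return False  # still running; look again on the next heartbeat
            self.finished.append(returncode)
            self._current = None
        if self._pending:
            # Start the next step without waiting for it to finish.
            self._current = subprocess.Popen(self._pending.pop(0))
            return False
        return True
```

No timeout is applied here, so a step that takes hours simply keeps the queue waiting until it exits.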
JayFI feel like my concern about IPA having no network privs, and the need to modify IPA ramdisk already to include the tool (at which point a thin HWM is trivial) was sorta skated over?22:28
JayFlike, are you arguing this is useful for IPA, or that it's too unsafe for conductors? or ??? 22:29
TheJuliaIPA network preferences are all over the place security wise22:29
TheJuliaFor conductors, I would be worried about too much access, and an inability to complete the needful22:30
JayFI think the latter half can be 10000% coded around22:30
TheJuliaBut maybe a case could be made for both sorts of options22:30
JayFand the first half is less of a concern as long as you ensure the only people who can determine *what commands run* (not when) are those who can edit ironic.conf22:30
JayFbut either way, we're just talking about a throwaway idea; I'm not sure this would reach the top of our lists for important stuff22:30
JayFjust when I hear "so people don't have to orchestrate through ironic", it makes me feel like we're missing something22:31
JayFgiven our entire thing is lifecycle management and hardware orchestration22:31
TheJuliaIt might be my primary priority if we can reach a framework which makes sense22:31
JayFyou familiar with nagios_nrpe_server?22:31
TheJuliaOh yea22:31
JayFthat's exactly the model I'm thinking22:31
TheJuliaI did awful things with nrpe22:31
JayFyou preconfigure a command, maybe put some args in it (or maybe we forbid it to be safer), and go22:32
TheJuliaI guess the question for those that l… ahem… misuse rescue, is would this work22:32
JayFif we wanted to misdirect it, e.g. conductor tells ironic-remote-command-agent (I'd be shocked if openstack doesn't have something in this category already)22:33
TheJuliaS/l…/un…/22:33
TheJuliaErr, um22:33
JayFbut I also think we could just build that model into the conductor and it'd be OK as long as we walled it off, aggressive timeouts, etc22:33
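The NRPE-like model described above (only commands preconfigured by an admin may run, callers pick one by name, and a timeout walls it off) could be sketched like this. It is an assumption-laden illustration, not an Ironic feature: `run_named_command`, `CommandNotAllowed` and the `allowed` mapping (standing in for entries an admin would put in ironic.conf) are all hypothetical.

```python
import shlex
import subprocess


class CommandNotAllowed(Exception):
    """Raised when a caller asks for a name the admin did not configure."""


def run_named_command(name, allowed, timeout=30.0):
    """Run a preconfigured command by name and return (returncode, stdout).

    `allowed` maps names to full command lines. Callers only choose the name
    and cannot supply arguments. If the timeout expires, subprocess.run kills
    the child and raises subprocess.TimeoutExpired.
    """
    try:
        command_line = allowed[name]
    except KeyError:
        raise CommandNotAllowed(name) from None
    argv = shlex.split(command_line)  # no shell is involved
    completed = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=False
    )
    return completed.returncode, completed.stdout
```

Forbidding caller-supplied arguments is the stricter of the two options mentioned in the discussion; allowing them would need careful validation.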
* TheJulia gives up on phone keyboard22:33
JayF(there's also no reason we couldn't do this in both directions; I just see it as much less useful with IPA because going over the line to "I build my own ramdisk" is the hard part)22:34
TheJuliawe don't have a dynamic passthrough, steps require extra internal stuffs, but they can have arguments22:34
TheJuliaoh, I don't think we can time this out, because for example, a process could be hours22:35
TheJuliaAt least, just thinking of some of the similar/related cases22:35
JayFyou have to give me an example of a process we'd run this way that'd take hours22:35
JayFjust to help me grasp it, if you can?22:35
TheJuliafor every device, cat $dev | gzip -c -9 | nc remote-server port22:36
JayFI shouldn't have asked22:36
TheJulialol22:36
JayFthat is SCREAMING to be an in-band step22:36
TheJuliaI've gotten a few messages of "this would help with snapshotting"22:36
JayFlike seriously22:36
TheJuliayeah22:36
TheJuliaagree, one step at a time, and credentials are the challenge there, but yeah22:37
JayFif that's the use case, then I see the problem as more "how do we make custom in-band steps easier" and arb commands don't really work for this unless we want the world's worst bash/python/perl oneliners ever in our config files22:37
TheJuliaThe challenge is always laying enough foundation that we can build a few things and not constantly look at the foundation and go "we have too much, or not enough"... and that we support dynamic loads in case the other sturdier looking tower collapses22:38
TheJuliawell, tbh, it could easily just be an embedded script22:38
TheJuliaoperators are so much more comfortable with changing up ramdisks than writing hardware managers22:38
JayFHm.22:38
TheJuliaissue is the latter is software development, the prior is just consult the docs22:39
JayFI'm not going to lie, I think that's a ... false belief22:39
JayFthat one is harder than the other regardless of background22:39
JayFbut software is easier than education so it doesn't matter22:39
TheJuliaI think it comes down to perception and skillset, and the resulting union22:40
JayFyeah, okay, you've got me in pondering land22:40
TheJuliaor the non-overlapping result of the union to be precise22:40
JayFlet me think about that for a bit22:40
TheJuliaponder!22:40
JayFI'll keep the "what's a better name for SERVICE stable state" on the back burner too22:40
TheJuliamy brain is cooked for the day22:41
JayFbecause if we need to do that, we can, but SERVICE vs SERVICING having semantic differences makes my head hurt22:41
TheJuliaif you think of one, lmk :)22:41
opendevreviewMerged openstack/metalsmith stable/zed: Get ports by 'binding:host_id' query filter  https://review.opendev.org/c/openstack/metalsmith/+/87358822:41
JayFOPERATOR22:41
JayFMANUAL_SERVICE ? 22:41
JayFoh no, I have it, I have it22:41
JayFTheJulia: EXTERNAL state22:41
TheJulia... hmm22:41
JayFTheJulia: or something that similarly describes that we've handed over control to an external system22:41
TheJuliaI'm going to need to sleep on it I think22:41
JayFyeah, same22:42