Nick | Message | Time |
---|---|---|
TheJulia | \o/ | 00:08 |
rpittau | good morning ironic! o/ | 08:08 |
rpittau | and whoopsies for ipmitool | 08:08 |
rpittau | since we're in the mood recently maybe we can adopt it? :) | 08:08 |
dtantsur | rpittau: a bunch of vendor-specific C code? I'd rather not... | 08:33 |
rpittau | no worries, it was of course a joke, I'm not "that" crazy :D | 08:52 |
arne_wiebalck | Good morning, rpittau dtantsur and Ironic! | 11:22 |
arne_wiebalck | I am looking into rebuilding nodes with s/w RAID (since it does not work :-). | 11:23 |
arne_wiebalck | The reason is that after the image has been deployed, manage_uefi() tries to create the boot partitions which are already there. | 11:23 |
arne_wiebalck | Can I see somehow in the IPA that this is a "rebuild" operation (with the idea to skip recreating the boot partitions)? | 11:24 |
arne_wiebalck | Alternatively, I could probably force partition creation and overwrite the existing ones ... not sure I want to do that. | 11:25 |
arne_wiebalck | Or I could catch the error and *assume* the existing partitions are ok ... not ideal either. | 11:27 |
arne_wiebalck | rpittau: what happened to ipmitool? | 11:35 |
rpittau | arne_wiebalck: https://www.phoronix.com/news/ipmitool-GitHub-Suspended | 11:39 |
dtantsur | arne_wiebalck: IPA has access to the node, maybe it has access to provision_state? | 11:41 |
iurygregory | good morning Ironic | 12:39 |
arne_wiebalck | hey iurygregory o/ | 13:11 |
arne_wiebalck | rpittau: thanks! hadn't heard this yet | 13:11 |
arne_wiebalck | dtantsur: hmm ... is a `openstack server rebuild ...`, not just a deploying provisioning state in the end? does Ironic make a difference between deploy and rebuild of an instance anywhere? | 13:12 |
dtantsur | mm, yeah, you're right | 13:13 |
iurygregory | arne_wiebalck, o/ | 13:13 |
dtantsur | arne_wiebalck: we can probably store something temporary in driver_internal_info.. | 13:13 |
arne_wiebalck | dtantsur: you mean set this out of band? | 13:14 |
arne_wiebalck | dtantsur: or does Ironic know at some point? | 13:14 |
dtantsur | arne_wiebalck: no, I mean ironic could do that on rebuild | 13:14 |
dtantsur | 'rebuild' is a provisioning verb | 13:14 |
arne_wiebalck | dtantsur: sorry, when I say "rebuild" I mean `openstack server rebuild` | 13:14 |
arne_wiebalck | dtantsur: will Ironic see this as a rebuild request? | 13:15 |
dtantsur | arne_wiebalck: yeah, but I'd expect that to result in an ironic rebuild, not move through 'available'? | 13:15 |
dtantsur | I hope so :) | 13:15 |
arne_wiebalck | dtantsur: heh | 13:15 |
arne_wiebalck | dtantsur: I will check | 13:15 |
arne_wiebalck | dtantsur: if that is the case, we could make a mental note | 13:15 |
arne_wiebalck | dtantsur: and check, and erase the note | 13:15 |
arne_wiebalck | dtantsur: thanks, I will try | 13:16 |
dtantsur | arne_wiebalck: I mean, I'm quite sure we don't drive nodes through cleaning on rebuild | 13:16 |
arne_wiebalck | dtantsur: yes, that is correct | 13:24 |
arne_wiebalck | dtantsur: the other option (for us) would be to set sth out of band on the node: operations such as rebuild are often embedded in workflows where we could add a line | 13:26 |
arne_wiebalck | dtantsur: the first thing I will try, though, is to see if skipping the creation of the boot partitions actually does the trick | 13:26 |
arne_wiebalck | dtantsur: if it does, we can look into the logic when to do that | 13:27 |
TheJulia | good morning | 13:35 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Refactoring: create ironic.conductor.inspection https://review.opendev.org/c/openstack/ironic/+/877317 | 14:41 |
dtantsur | morning TheJulia | 14:41 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Refactoring: DRY in the root API controller https://review.opendev.org/c/openstack/ironic/+/877384 | 15:01 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Refactoring: clean up inspection data handlers https://review.opendev.org/c/openstack/ironic/+/877387 | 15:39 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: [WIP] Migrate the inspector's /continue API https://review.opendev.org/c/openstack/ironic/+/875944 | 15:45 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: [WIP] Migrate the inspector's /continue API https://review.opendev.org/c/openstack/ironic/+/875944 | 15:46 |
opendevreview | Julia Kreger proposed openstack/ironic stable/2023.1: Clean out agent token even if power is already off https://review.opendev.org/c/openstack/ironic/+/877391 | 15:50 |
opendevreview | Julia Kreger proposed openstack/ironic stable/zed: Clean out agent token even if power is already off https://review.opendev.org/c/openstack/ironic/+/877392 | 15:50 |
jlvillal | TheJulia, FYI: I posted a couple of patches for bifrost. 1) just a simple change to show the path if there is an error. https://review.opendev.org/c/openstack/bifrost/+/877124 | 16:06 |
jlvillal | 2) The fix I did to get my system with IPv6 disabled to work: https://review.opendev.org/c/openstack/bifrost/+/877122 | 16:06 |
opendevreview | Merged openstack/metalsmith stable/wallaby: list_instances - cache allocations https://review.opendev.org/c/openstack/metalsmith/+/876673 | 16:14 |
rpittau | bye everyone, see you tomorrow! o/ | 16:24 |
TheJulia | jlvillal: interesting! | 16:27 |
jlvillal | TheJulia, Which part? :) | 16:27 |
jlvillal | I'll assume it is that incredible patch that adds printing the variable of the deploy image location ;) | 16:28 |
TheJulia | jlvillal: I wonder if the socket code is smart enough to go "that is an ipv4 address, we need to use a v4 socket" | 16:30 |
jlvillal | I'm not sure... I figured since Bifrost is IPv4 only and doesn't support IPv6 it was okay to do this change. | 16:32 |
TheJulia | jlvillal: looking at https://review.opendev.org/c/openstack/bifrost/+/877122/1//COMMIT_MSG#7 I'm a little confused since the file is ironic-inspector.conf.j2 | 16:32 |
jlvillal | Ah, duh! Yes you are correct. It should be 'inspector' | 16:33 |
TheJulia | sorry | 16:33 |
opendevreview | John L. Villalovos proposed openstack/bifrost master: chore: allow ironic-inspector to work with IPv6 disabled https://review.opendev.org/c/openstack/bifrost/+/877122 | 16:35 |
jlvillal | TheJulia, Commit message corrected. | 16:36 |
jlvillal | Thanks. | 16:39 |
opendevreview | Merged openstack/ironic stable/zed: Do not recalculate checksum if disk_format is not changed https://review.opendev.org/c/openstack/ironic/+/877251 | 16:43 |
opendevreview | Merged openstack/ironic bugfix/21.3: Do not recalculate checksum if disk_format is not changed https://review.opendev.org/c/openstack/ironic/+/877250 | 16:43 |
opendevreview | Merged openstack/ironic stable/2023.1: Do not recalculate checksum if disk_format is not changed https://review.opendev.org/c/openstack/ironic/+/877029 | 16:43 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Refactoring: clean up inspection data handlers https://review.opendev.org/c/openstack/ironic/+/877387 | 17:18 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Refactoring: clean up inspection data handlers https://review.opendev.org/c/openstack/ironic/+/877387 | 17:27 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: [WIP] Migrate the inspector's /continue API https://review.opendev.org/c/openstack/ironic/+/875944 | 17:31 |
opendevreview | Merged openstack/ironic master: Document [fake] delay config values https://review.opendev.org/c/openstack/ironic/+/868053 | 18:02 |
opendevreview | Merged openstack/bifrost master: chore: provide the location of deploy_image_path if missing https://review.opendev.org/c/openstack/bifrost/+/877124 | 18:44 |
opendevreview | Merged openstack/networking-generic-switch master: config: Ignore unknown options starting with ngs_ https://review.opendev.org/c/openstack/networking-generic-switch/+/868300 | 19:04 |
opendevreview | Julia Kreger proposed openstack/ironic-specs master: Add service steps framework https://review.opendev.org/c/openstack/ironic-specs/+/872349 | 19:05 |
opendevreview | Merged openstack/bifrost master: Restore discovery for dnsmasq dhcp provider https://review.opendev.org/c/openstack/bifrost/+/864787 | 19:13 |
opendevreview | Merged openstack/ironic master: Wipe Agent Token when cleaning timeout occcurs https://review.opendev.org/c/openstack/ironic/+/876161 | 19:27 |
opendevreview | Merged openstack/ironic bugfix/21.2: Configure CI for bugfix/21.2 https://review.opendev.org/c/openstack/ironic/+/876410 | 19:27 |
opendevreview | Julia Kreger proposed openstack/ironic stable/2023.1: Wipe Agent Token when cleaning timeout occcurs https://review.opendev.org/c/openstack/ironic/+/877394 | 19:30 |
opendevreview | Julia Kreger proposed openstack/ironic stable/zed: Wipe Agent Token when cleaning timeout occcurs https://review.opendev.org/c/openstack/ironic/+/877395 | 19:30 |
opendevreview | Julia Kreger proposed openstack/ironic stable/yoga: Wipe Agent Token when cleaning timeout occcurs https://review.opendev.org/c/openstack/ironic/+/877396 | 19:30 |
opendevreview | Merged openstack/bifrost master: chore: allow ironic-inspector to work with IPv6 disabled https://review.opendev.org/c/openstack/bifrost/+/877122 | 19:40 |
opendevreview | Julia Kreger proposed openstack/ironic-specs master: Add service steps framework https://review.opendev.org/c/openstack/ironic-specs/+/872349 | 20:28 |
opendevreview | Julia Kreger proposed openstack/ironic-specs master: Framework for DPU management/orchustration https://review.opendev.org/c/openstack/ironic-specs/+/874189 | 21:34 |
JayF | TheJulia: how about "ansible" as a step :D | 22:14 |
* JayF | just thinking of that w/r/t your "pause" step ... makes me wonder if we could actually have ironic still-orchestrate the external stuff | 22:14 |
ashinclouds[m] | Pause is needed for dpu programming aiui | 22:14 |
JayF | I'm thinking in-addition-to not in-lieu-of | 22:15 |
TheJulia | Oh | 22:15 |
JayF | like implementing an external step where conductor would e.g. shell out and do whatever you want (maybe configured via ironic.conf so it's not arb command exec) | 22:15 |
TheJulia | I guess the challenge is if we’re in a special ramdisk which say has the api gateway to do magical card programming | 22:16 |
JayF | no, I'm saying conductor could do it | 22:16 |
TheJulia | Which we can’t trigger directly (still undefined/nebulous at the moment) | 22:16 |
TheJulia | Oh | 22:16 |
JayF | sorta like a more easy way to do custom out-of-band step (really, really out of band) | 22:17 |
* JayF | thinking back to previous companies where we had patches to e.g. call into a CMDB-like service during deployment | 22:17 |
TheJulia | Maybe, we have talked about that before and I’ve been shot down on command invocation-y things | 22:17 |
* TheJulia | tries to fend off aggressive cat | 22:17 |
JayF | we'd have to dictate a strong, safe interface | 22:17 |
JayF | use nagios nrpe as a guide: only ever run commands that were submitted by an admin into the ironic.conf | 22:17 |
TheJulia | So, I would almost rather just pass thing to the agent | 22:18 |
TheJulia | Not on the conductor's shell | 22:18 |
JayF | In any use case I would've had for such automation, I needed it OOB | 22:18 |
JayF | but that is strongly influenced by the fact that any Ironic I've worked on ends up with a custom hw mgr for some reason so in-band is easier :D | 22:18 |
TheJulia | Yeah, the challenge is we would have to kick out back to the workload and then back into a ramdisk if there was an issue | 22:19 |
TheJulia | Which maybe that is fine | 22:19 |
JayF | it's a thought that your comment spawned, I don't think we have to do it at all or if there's a technical hurdle | 22:20 |
TheJulia | https://usercontent.irccloud-cdn.com/file/bdXT5EIF/1678832397.JPG | 22:20 |
JayF | I was just sorta surprised we hadn't done it before | 22:20 |
TheJulia | Halp, boy cat wants scritches | 22:20 |
TheJulia | It being commands? | 22:21 |
JayF | yeah | 22:22 |
JayF | arbitrary commands as a step | 22:23 |
JayF | so ironic can orchestrate external things | 22:23 |
JayF | without needing more patches | 22:23 |
TheJulia | Yeah, I honestly would prefer if we could do a filter match conductor side and pass through to the agent to run the command locally | 22:23 |
TheJulia | Which would then allow for a cleanish bolt on | 22:24 |
TheJulia | And it could have that local context | 22:24 |
JayF | by "the agent" do you mean something other than the in-band agent? | 22:24 |
TheJulia | Ike, ipa | 22:24 |
TheJulia | Err, like, not Ike | 22:24 |
TheJulia | There are no Ike’s here…. Afaik. | 22:25 |
JayF | We already support that relatively trivially though, with custom hardware managers | 22:25 |
JayF | it's quite a larger lift to create a custom hardware type to do similar | 22:25 |
TheJulia | Yeah, just thinking a standard method | 22:25 |
JayF | plus conductors generally are in a more privileged network space | 22:25 |
JayF | so e.g. IPA in almost anywhere I've worked while booted for deploy/clean/rescue would have access to nearly-nothing | 22:25 |
JayF | whereas a conductor could, for instance, update an external CMDB that a node had been deployed | 22:26 |
TheJulia | Yeah, but you could launch something there that does something and waits | 22:26 |
JayF | I don't understand the line you're drawing with that comment? | 22:26 |
TheJulia | Likely best if we do execute with least privilege. And if a command doesn’t exit immediately it could continue to run | 22:27 |
TheJulia | And run say until exit on the agent side, at which point upon next heartbeat, the next step would run | 22:27 |
JayF | I feel like my concern about IPA having no network privs, and the need to modify IPA ramdisk already to include the tool (at which point a thin HWM is trivial) was sorta skated over? | 22:28 |
JayF | like, are you arguing this is useful for IPA, or that it's too unsafe for conductors? or ??? | 22:29 |
TheJulia | IPA network preferences are all over the place security wise | 22:29 |
TheJulia | For conductors, I would be worried about too much access, and an inability to complete the needful | 22:30 |
JayF | I think the latter half can be 10000% coded around | 22:30 |
TheJulia | But maybe a case could be made for both sorts of options | 22:30 |
JayF | and the first half is less of a concern as long as you ensure the only people who can determine *what commands run* (not when) are those who can edit ironic.conf | 22:30 |
JayF | but either way, we're just talking about a throwaway idea; I'm not sure this would reach the top of our lists for important stuff | 22:30 |
JayF | just when I hear "so people don't have to orchestrate through ironic", it makes me feel like we're missing something | 22:31 |
JayF | given our entire thing is lifecycle management and hardware orchestration | 22:31 |
TheJulia | It might be my primary priority if we can reach a framework which makes sense | 22:31 |
JayF | you familiar with nagios_nrpe_server? | 22:31 |
TheJulia | Oh yea | 22:31 |
JayF | that's exactly the model I'm thinking | 22:31 |
TheJulia | I did awful things with nrpe | 22:31 |
JayF | you preconfigure a command, maybe put some args in it (or maybe we forbid it to be safer), and go | 22:32 |
TheJulia | I guess the question for those that l… ahem… misuse rescue, is would this work | 22:32 |
JayF | if we wanted to misdirect it, e.g. conductor tells ironic-remote-command-agent (I'd be shocked if openstack doesn't have something in this category already) | 22:33 |
TheJulia | S/l…/un…/ | 22:33 |
TheJulia | Err, um | 22:33 |
JayF | but I also think we could just build that model into the conductor and it'd be OK as long as we walled it off, aggressive timeouts, etc | 22:33 |
* TheJulia | gives up on phone keyboard | 22:33 |
JayF | (there's also no reason we couldn't do this in both directions; I just see it as much less useful with IPA because going over the line to "I build my own ramdisk" is the hard part) | 22:34 |
TheJulia | we don't have a dynamic passthrough, steps require extra internal stuffs, but they can have arguments | 22:34 |
TheJulia | oh, I don't think we can time this out, because for example, a process could be hours | 22:35 |
TheJulia | At least, just thinking of some of the similar/related cases | 22:35 |
JayF | you have to give me an example of a process we'd run this way that'd take hours | 22:35 |
JayF | just to help me grasp it, if you can? | 22:35 |
TheJulia | for every device, cat $dev \| gzip -c -9 \| nc remote-server port | 22:36 |
JayF | I shouldn't have asked | 22:36 |
TheJulia | lol | 22:36 |
JayF | that is SCREAMING to be an in-band step | 22:36 |
TheJulia | I've gotten a few messages of "this would help with snapshotting" | 22:36 |
JayF | like seriously | 22:36 |
TheJulia | yeah | 22:36 |
TheJulia | agree, one step at a time, and credentials are the challenge there, but yeah | 22:37 |
JayF | if that's the use case, then I see the problem as more "how do we make custom in-band steps easier" and arb commands don't really work for this unless we want the world's worst bash/python/perl oneliners ever in our config files | 22:37 |
TheJulia | The challenge is always laying enough foundation that we can build a few things and not constantly look at the foundation and go "we have too much, or not enough"... and that we support dynamic loads in case the other sturdier looking tower collapses | 22:38 |
TheJulia | well, tbh, it could easily just be an embedded script | 22:38 |
TheJulia | operators are so much more comfortable with changing up ramdisks than writing hardware managers | 22:38 |
JayF | Hm. | 22:38 |
TheJulia | issue is the latter is software development, the prior is just consult the docs | 22:39 |
JayF | I'm not going to lie, I think that's a ... false belief | 22:39 |
JayF | that one is harder than the other regardless of background | 22:39 |
JayF | but software is easier than education so it doesn't matter | 22:39 |
TheJulia | I think it comes down to perception and skillset, and the resulting union | 22:40 |
JayF | yeah, okay, you've got me in pondering land | 22:40 |
TheJulia | or the non-overlapping result of the union to be precise | 22:40 |
JayF | let me think about that for a bit | 22:40 |
TheJulia | ponder! | 22:40 |
JayF | I'll keep the "what's a better name for SERVICE stable state" on the back burner too | 22:40 |
TheJulia | my brain is cooked for the day | 22:41 |
JayF | because if we need to do that, we can, but SERVICE vs SERVICING having semantic differences makes my head hurt | 22:41 |
TheJulia | if you think of one, lmk :) | 22:41 |
opendevreview | Merged openstack/metalsmith stable/zed: Get ports by 'binding:host_id' query filter https://review.opendev.org/c/openstack/metalsmith/+/873588 | 22:41 |
JayF | OPERATOR | 22:41 |
JayF | MANUAL_SERVICE ? | 22:41 |
JayF | oh no, I have it, I have it | 22:41 |
JayF | TheJulia: EXTERNAL state | 22:41 |
TheJulia | ... hmm | 22:41 |
JayF | TheJulia: or something that similarly describes that we've handed over control to an external system | 22:41 |
TheJulia | I'm going to need to sleep on it I think | 22:41 |
JayF | yeah, same | 22:42 |
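
The NRPE-style model JayF describes above (an admin preconfigures named commands, a step may only refer to them by name, and each run is bounded by a timeout) could be sketched roughly as follows. This is a minimal illustration and not Ironic code: `ALLOWED_COMMANDS`, `CommandNotAllowed` and `run_configured_command` are hypothetical names, the allowlist is a plain dict standing in for whatever would be read from ironic.conf, and arguments are fixed up front (the stricter "forbid args" variant mentioned in the discussion).

```python
import subprocess
import sys

# Hypothetical allowlist. In the idea discussed it would be loaded from
# ironic.conf, so only someone who can edit that file decides WHAT can run.
# Each entry is a full argv list; callers cannot append their own arguments.
ALLOWED_COMMANDS = {
    # Stand-in for "tell an external CMDB the node was deployed".
    "notify_cmdb": [sys.executable, "-c", "print('node-deployed')"],
}


class CommandNotAllowed(Exception):
    """Raised when a step names a command that was not preconfigured."""


def run_configured_command(name, timeout=30):
    """Run a preconfigured command by name and return (exit code, stdout).

    The command is looked up by name only, is executed without a shell,
    and is killed if it runs longer than ``timeout`` seconds, in which
    case subprocess.TimeoutExpired propagates to the caller.
    """
    try:
        argv = ALLOWED_COMMANDS[name]
    except KeyError:
        raise CommandNotAllowed(name) from None

    result = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # report the exit code instead of raising
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    print(run_configured_command("notify_cmdb"))
```

A fixed timeout fits short out-of-band calls like the CMDB example; it would not fit the hours-long imaging case TheJulia raises, which is part of why that case was described in the channel as a better fit for an in-band step.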