Wednesday, 2025-05-21

iurygregoryI give up, 3 KP when trying to boot the iDRAC 10 is enough for one day =( 02:23
iurygregoryI'm wondering if I should pass root device hints to see if it would help, but I will figure it out tomorrow02:24
TheJuliaiurygregory: are you able to capture much of the kernel panic? are you able to reproduce it?03:11
*** jroll04 is now known as jroll007:17
opendevreviewNicolas Belouin proposed openstack/ironic-python-agent stable/2025.1: netutils: Use ethtool ioctl to get permanent mac address  https://review.opendev.org/c/openstack/ironic-python-agent/+/95048907:41
stephenfinTheJulia: When you're about, I wonder if you'd be able to take a look over a failure we're seeing in Gophercloud CI? https://github.com/gophercloud/gophercloud/actions/runs/15144029062/job/42575162709?pr=310809:50
stephenfinThat seems to be coming from code related to the network simulator stuff you've added since we branched. I haven't been able to reproduce on a local Ubuntu 24.04 host though, so I'm hoping you'll see something obvious09:52
Sandzwerg[m]Morning ironic. I'd like to test secure boot. Toggling it doesn't seem to work so far (I have the impression the toggling is not happening), and using our own IPA & ESP I get a secure boot error, so our IPA doesn't support secure boot. While I try to figure out what package or config is missing: is there an IPA & ESP I can use that should support secure boot out of the box? Or a way I can build something fast that should10:15
Sandzwerg[m]work?10:15
masghariurygregory: also wondering if the disks ever came back on the UI, how curious10:25
dtantsuriurygregory: I can assure you that root device hints have no effect when the ramdisk is booting (and why would they?)10:42
dtantsurSandzwerg[m]: what's your boot method and how do you build IPA?10:44
*** sfinucan is now known as stephenfin11:19
opendevreviewMerged openstack/ironic unmaintained/xena: [stable-only] Fix errors building docs  https://review.opendev.org/c/openstack/ironic/+/94926011:30
Sandzwerg[m]<dtantsur> "Sandzwerg: what's your boot..." <- In this case idrac-redfish. We build our IPA with mkosi and use Fedora 40 as the basis. We needed that some years ago because we had hardware issues with what we were using before. We probably could switch to something else like DIB by now, but before I invest the time I'd like to get something running to make sure it works at all and there isn't something else blocking11:41
dtantsurSandzwerg[m]: ah yeah. I recall building a secure boot capable ISO being quite annoying, trying to remember any details11:46
dtantsurSandzwerg[m]: this is what I did for metal3 back in the days: https://github.com/metal3-io/ironic-image/commit/f12f20511:48
Sandzwerg[m]currently the ISO is built on the fly with the ESP and the ramdisk & kernel, and I rebuild the ESP on Fedora 40 so it's the same as the IPA image itself; there was a note in the documentation that that was required, but it still failed. So I guess something in our Fedora image is missing. That's why I'd like to get a "known good", and if that works we can even switch. The main reason for our customized deployment is gone11:48
dtantsurtl;dr is to be careful what you put into the ESP and also configure Ironic to match that11:49
Sandzwerg[m]For the ESP we basically did https://docs.openstack.org/ironic/latest/install/configure-esp.html we only needed to adjust the size as the hardcoded value was too small for us11:51
Sandzwerg[m]That's why we rebuilt it with the matching distribution. Or could it even be that a change in the package leads to an error?11:52
dtantsurdo you configure the matching grub_config_path?11:56
dtantsurIt should work this way..11:57
dtantsurAlthough, granted, I've only tried secure boot on RHEL CoreOS11:57
Sandzwerg[m]yes, we adjusted the grub/shim paths, and the dest path, that was all we did. But there is no ESP available that fits, for example, the centos-based IPA images that are available?12:01
dtantsurSandzwerg[m]: we don't publish one, and someone told me that CentOS Stream is not properly signed with Microsoft's key12:02
Sandzwerg[m]hmpf. Okay is there a recommendation for what to use if one wants to do secure boot?12:04
dtantsurI think Fedora could be the right path. Debian might work too. I haven't dealt with this topic for ages, sorry.12:05
dtantsur(maybe TheJulia has more recent experience?)12:05
Sandzwerg[m]The issue we have with fedora is the frequent upgrades and changes. We're still on 40 because it's the last one that doesn't have python 3.12 as default; maybe that would work now, but back then it broke IPA I think. We might be able to circumvent that with uv or similar tools but we haven't looked into that yet12:09
* TheJulia waves from an uncaffeinated state12:57
TheJuliaso from my experience, Centos Stream's shim loader *is* signed by msft13:00
TheJuliaspecifically we had a bug appear in shim ages ago and it made its way into rhel because getting the shim binary re-signed is a brutal process13:03
TheJulia(which I've been copied on for that bug too....)13:03
Sandzwerg[m]Alright, I'll try that then. Thanks13:04
dtantsurokay, must have misunderstood something..13:04
TheJuliastephenfin: so, I'm wondering if it is newgrp which is causing the passwd prompt trigger, or if it is sudo. I guess we could "sudo newgrp"? For what it is worth, recent neutron devstack changes have torpedoed our gate, so we're sort of dead in the water at the moment while we try to figure that out as well13:22
rpittauTheJulia, dtantsur, JayF, cid, I think I found the "issue" with sushy/sushy-tools auth loop -> https://opendev.org/openstack/ironic/commit/5f7c7dcd041e95a7f1283ab12e9d708844fd097413:25
rpittauwe're now calling ironic.drivers.modules.redfish.utils and it does not detect the cached redfish session, causing the loop13:26
rpittauthis -> https://pastebin.com/1zckr36J13:27
rpittauwe should revert that change and look into sushy/sushy-tools to avoid the loop13:27
rpittauat least I could not find anything else :/13:29
TheJuliarpittau: it doesn't detect the unique session url already on hand, so it then tries again13:29
rpittauyeah, I mean that's the only kind of related change that I can see13:32
rpittaualthough it did get merged a week ago, it seems issues started later, so not 100% sure13:32
rpittauoh wait13:33
rpittaujust checking the actual patch, it did pass the first time on metal3 integration too, so I wonder if it's a race then13:34
rpittauno nvm it never passed on metal313:36
rpittauand the Python version does not make a difference13:36
opendevreviewQueensly Kyerewaa Acheampongmaa proposed openstack/sushy master: WIP: Add DateTime and DateTimeLocalOffset support to Manager resource  https://review.opendev.org/c/openstack/sushy/+/95053913:37
rpittauI'll do a revert patch just to try13:37
opendevreviewRiccardo Pittau proposed openstack/ironic master: Revert "Fix redfish driver URL parsing"  https://review.opendev.org/c/openstack/ironic/+/95054013:38
rpittauI will do another test in parallel in metal313:43
TheJuliak13:46
TheJuliatrying to figure out why our most advanced networking jobs are dead now :(13:47
TheJuliaokay, so NGS is just not working now13:56
TheJuliathat is why the jobs are failing with JayF's patch13:56
JayFDid they do the eventlet removal14:03
JayFif so, we probably need a patch similar to my NBM patch in NGS14:03
TheJuliawho is "they" in that statement? it seems like maybe we're in an odd setup state14:04
JayFneutron landed eventlet migration for l2 agents14:04
JayFwhich makes me wonder if we are breaky downstream for that reason14:04
JayFsince NGS/NBM are plugins14:04
JayFhttps://opendev.org/openstack/neutron/commit/9dc0d0fd2f44e348705804f1f99403086c138010 hmm not as dramatic as I thought14:05
JayFtiming doesn't match anyway14:05
TheJuliaoh, I think I see what is going on14:08
TheJuliaso, we have configuration loaded in the files14:08
TheJuliabut not in the running neutron API instance which is where ml2 plugins launch from14:08
TheJuliaif you compare old to new14:09
JayFit'd be interesting to understand where the restart was dropped, since my statement before about that code being dead is still true14:09
TheJuliaold, we restart neutron 2x14:09
TheJuliaand it gets the configuration14:09
JayFthat's what I've been struggling with; the diff is so small in devstack14:09
TheJuliain new, it never gets restarted14:09
JayFhm14:09
TheJuliayeah14:11
TheJuliathe way it gets *registered* only ever sets the config files parameter14:11
TheJuliahttps://opendev.org/openstack/devstack/src/branch/master/lib/neutron#L104814:11
TheJuliadog is demanding to go out, bbiab14:12
TheJuliaokay, in the working run, the genericswitch ini file is loaded on the initial start14:33
TheJuliain the non-working, it never gets added/loaded14:34
TheJuliai see the issue15:04
TheJuliawhen you use the wsgi launcher, the existing configuration modeling does *not* load up or respect the classical configuration patterns for neutron services15:05
TheJuliainstead, neutron looks for an environment variable to source the list15:05
TheJuliahttps://github.com/openstack/neutron/blob/master/neutron/server/__init__.py15:06
TheJuliaI've raised it in the neutron channel15:13
TheJuliaI'm guessing we're sort of shit out of luck15:13
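The failure mode TheJulia describes — a WSGI-launched neutron ignoring the classic --config-file arguments and instead building its config-file list from an environment variable — can be sketched roughly like this. The variable name, separator, and default below are illustrative assumptions; neutron/server/__init__.py linked above has the real implementation:

```python
import os

# Illustrative sketch only: the env var name ("NEUTRON_CONFIG_FILES"),
# the ";" separator, and the default list are assumptions, not neutron's
# actual interface (see neutron/server/__init__.py for the real one).
DEFAULT_CONFIG_FILES = ["/etc/neutron/neutron.conf"]

def config_files_from_env(environ=os.environ, var="NEUTRON_CONFIG_FILES"):
    """Return the config files a WSGI-launched server would load.

    Classic CLI launchers pass --config-file flags; under a pure-WSGI
    launcher there is no CLI invocation, so the list has to come from
    the environment instead. Anything devstack writes only into the
    service's command line (like the genericswitch ini) is never seen.
    """
    raw = environ.get(var, "")
    files = [p for p in raw.split(";") if p]
    return files or list(DEFAULT_CONFIG_FILES)
```

This is why the NGS workaround patch has to inject its ini file into the environment the WSGI app is launched with, rather than appending another --config-file flag.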
opendevreviewQueensly Kyerewaa Acheampongmaa proposed openstack/sushy master: Add DateTime and DateTimeLocalOffset support to Manager resource  https://review.opendev.org/c/openstack/sushy/+/95053915:37
opendevreviewVerification of a change to openstack/bifrost master failed: Add support for downloading CentOS Stream 10 image  https://review.opendev.org/c/openstack/bifrost/+/95028615:41
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: Workaround neutorn's move to uwsgi only  https://review.opendev.org/c/openstack/networking-generic-switch/+/95055915:49
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: Workaround neutron's move to uwsgi only  https://review.opendev.org/c/openstack/networking-generic-switch/+/95055915:50
TheJuliaI *think* ^^ might work, but we might need to disable ironic jobs first, and then land it15:50
TheJuliaif we can test it in15:50
TheJuliatime will tell15:50
* JayF just filed RFE https://bugs.launchpad.net/ironic/+bug/211143815:53
JayFI'm not 100% sure I have the shape of the solution right,  but I wanted to document the need/ask15:53
* JayF looks at Julia's change15:54
JayFTheJulia: I think start_neutron_service_and_check does some configuration stuff too15:54
* JayF double checks15:55
JayFyeah I suspect there'll be an issue around stop/start and hitting the right set of services15:56
JayFbut I am not certain enough to suggest a fix before seeing output15:56
TheJuliait happens after the ini file is written15:56
TheJuliahttps://e3fa69918ab3893f89a3-76ad47885070581f857a540cadaa6a6d.ssl.cf1.rackcdn.com/openstack/55cf2727b4c54f06b897353cf71ea0a3/controller/logs/etc/neutron/neutron-api-uwsgi.ini is what we get today15:56
JayFI mainly am wondering if we need to hit start_neutron too15:57
JayFso the agents come back up15:57
JayFI'm ... mostly sure we don't need to?15:57
JayFeither way, you have the science sciencing, we'll see if there's cake at the end15:57
TheJuliayeah, I'm curious what neutron folks will say... if they respond at all15:58
opendevreviewMerged openstack/ironic-python-agent master: Remove TinyIPA jobs  https://review.opendev.org/c/openstack/ironic-python-agent/+/95023616:06
JayFI guess IPA doesn't have any neutron jobs16:06
JayFin a surprising victory of sensibility in our CI16:06
JayFmassive irony about neutron devstack jobs being broken: cid and I have time on calendar today to work on step 0 of dynamic networking (contributor guide docs for complex networking devstack setups)16:07
JayFso I guess that gets pushed a week lol16:07
FreemanBoss[m]rpittau: please your review is required... (full message at <https://matrix.org/oftc/media/v1/media/download/AXmEi5gLrV1hye0GHOujXjSDi9m2WU7JWH3Y5IJ7dhg1bw_sPHJ485qws7uKRCKVkHL3c3c566IQ_GZSpRiPHdFCeXO7NK8gAG1hdHJpeC5vcmcvZFpVR09sZGdhdk9nVUVKYWJ6VlhHclVl>)16:08
masgharFreemanBoss: For changes ready for review, you can add the hashtag 'ironic-week-prio'16:33
masgharWill get more eyes on your change16:33
TheJuliaJayF: So, I think we need to disable voting on the jobs on your patch16:42
TheJuliamerge that, then try to get the n-g-s jobs fixed16:42
TheJuliaThe n-g-s job failed deep in the config, so hopefully that is the shortest path16:42
TheJuliaJayF: I can revise your patch if you want, or you can do so to mark the failing jobs as non-voting16:43
JayFI'm OK with that, but can we achieve the same result for science with Depends-On on the NGS patch?16:43
TheJuliabut we can't merge it16:43
TheJuliaand we would be blocked from doing so16:43
TheJuliaAnd regardless, we would need to make something non-voting someplace to merge a fix16:43
JayFlike I said I'm OK with it16:44
JayFyou can update patch or I will in a few minutes when I reach the end of my current train of thought16:45
TheJuliaI'm on it16:45
opendevreviewJulia Kreger proposed openstack/ironic master: ci: Remove code which has been long-dead  https://review.opendev.org/c/openstack/ironic/+/95046116:47
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: Workaround neutron's move to uwsgi only  https://review.opendev.org/c/openstack/networking-generic-switch/+/95055916:49
TheJuliaokay, that will allow us to semi-unblock and begin sciencing further16:51
TheJuliaSCIENCE!16:51
TheJuliaso, regarding https://bugs.launchpad.net/ironic/+bug/2111438, is the idea some sort of "power priority" and then to sort the power sync/status/etc stuff via the priority16:54
JayFWell, in my particular case that won't do the trick16:57
JayFprimarily because it's not specific *nodes* it's specific *instances*16:58
JayFnotice how the example script is keying on instance_info16:58
JayFthat's the reason I leaned towards an offline tool, because it fits into a DR recovery plan more sanely; 1) get DB up 2) set power priorities 3) online ironic and let it execute 4) use API clients to spin up the rest as needed16:58
JayFI could also potentially accept "just power all of them off" as an option, then API to turn on the ones we want after16:59
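A toy sketch of the selection step such an offline tool might perform — keying on instance_info rather than node properties, as JayF stresses. The field names and the "controller" pattern are assumptions for illustration, not the actual downstream script:

```python
import re

def select_instances(nodes, pattern=r"controller"):
    """Return UUIDs of nodes whose *instance* display_name matches.

    The point of keying on instance_info is that priority is a property
    of what happens to be deployed on a node, not of the hardware: any
    node in the fleet might be running a control-plane instance.
    """
    rx = re.compile(pattern)
    selected = []
    for node in nodes:
        # instance_info may be absent or None on undeployed nodes
        name = (node.get("instance_info") or {}).get("display_name", "")
        if rx.search(name):
            selected.append(node["uuid"])
    return selected
```

In the DR flow sketched above, the output of a selector like this would feed step 2 ("set power priorities") before the conductor comes back online.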
TheJuliahmm, fair17:08
TheJuliaI guess there are a few competing challenges:17:16
TheJulia1) Conductor should be powering everything up *anyhow*17:16
TheJulia2) Some sort of priority would make a lot of sense so as not to overload breakers with inrush current. I've never popped a distribution breaker, but Ironic has successfully popped some breakers in its history ;)17:16
TheJulia3) I guess it is fair to be able to "go turn on key nodes", and to be able to key that off something which is instance-provided.17:16
TheJuliaPerhaps a power_priority of 0 could be "go reference some config which could reach into or look at something", 1-98 could be in order, and add a knob at 99 by default17:17
TheJuliaThe whole thing about a DR recovery plan does sort of make sense, if you had everything powered down17:19
TheJuliabut ironic is going to try to return to prior state17:19
TheJuliaSo, if you did power everything off, then you have to power everything back up17:19
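TheJulia's 0 / 1-98 / 99 idea could look roughly like this in the power-sync ordering. The power_priority field is purely hypothetical here; nothing like it exists in ironic today:

```python
DEFAULT_PRIORITY = 99  # hypothetical knob: unset nodes sync last

def power_sync_order(nodes):
    """Order nodes for the initial power-state sync by a hypothetical
    power_priority field: lower values first, unset treated as the
    default. A value of 0 could additionally mean "consult external
    config", per the idea above; that lookup is out of scope here."""
    return sorted(nodes, key=lambda n: n.get("power_priority", DEFAULT_PRIORITY))
```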
*** darmach0 is now known as darmach17:27
-opendevstatus- NOTICE: Gerrit is being updated to the latest 3.10 bugfix release as part of early prep work for an eventual 3.11 upgrade. Gerrit will be offline momentarily while it restarts on the new version.17:34
adamcarthur5TheJulia Hey, I am not sure about a priority because it's really about instances AND nodes, which you kind of mention. I think just a way of saying "power on the nodes as you see fit, but these instances need to come online first"18:09
TheJuliathen I think if we could have a priority which somehow explains "go look at this"... maybe!?18:11
adamcarthur5Ah okay, I think I have misinterpreted what level you meant "priority" at. I agree, I think "go look at this" is the difficult thing because:18:12
adamcarthur51) It needs to be agnostic to how you create instances (i.e support more than just nova)18:13
adamcarthur52) It needs to live entirely in Ironic 18:13
TheJuliayeah, it doesn't solve the case though, if you explicitly shut everything down first18:14
TheJuliabecause then the saved power state supersedes18:14
adamcarthur5Is that about your point 1?18:19
TheJulianot really, more so it's how you get into a disaster in the first place18:20
TheJulia"oh no, the nuclear power plant is melting down" is entirely different from "data center is burning down"18:21
TheJuliai.e. is this a sudden disaster, or a slow rolling disaster18:21
TheJuliaare you coming back from nothing, or just a hard outage18:21
TheJuliawas that outage planned, or not18:21
adamcarthur5Are you mentioning this because we need to have a feature that covers many scenarios, or because you don't understand where this bug desc is coming from?18:23
TheJuliathere are many scenarios18:23
TheJuliaFor example, I ran a DR scenario once which was literally "The power plant nearby is melting down, we have to leave, the servers will keep running for an undeterminable amount of time" and an opposite test, "a tornado hit the data center, we're rebuilding from scratch"18:24
TheJuliathere is a whole spectrum in there18:24
TheJuliaso assuming the disaster is "Electrical Room Fire", then the prior power state is power_on when your conductor is back online18:25
adamcarthur5Yeah okay, I mean, is it acceptable for us to specifically only think about issues like ours? So "external factors knocked everything offline, for a temporary period", i.e. a power outage18:25
TheJuliabut assuming your disaster is uhhh... "UPS is in bypass, and we need to cut the slab to replace it" then you can shutdown the workloads and your state is "power off"18:26
adamcarthur5Your worry is getting everything to the power off state?18:27
TheJuliano, my worry is that we can't recover the power state because we start in a power state of power off if you did a staged shutdown of the data center18:28
TheJuliaso the starting place is a little weird18:29
TheJuliathe bug, seems to request the idea of "hey, explicitly power on" which is totally different as well18:29
adamcarthur5I think Jay is purely using that as an example18:30
adamcarthur5I.e. explicit power off is not a requirement18:30
adamcarthur5(before starting_18:30
TheJuliaso we need to "turn on a fleet", and it is almost like we need a tri-value field18:34
JayFwell18:34
JayFthe power_state field on those nodes is likely power_on18:34
JayFeven though the datacenter, in this case, went boom and they are all off18:34
JayFso the idea is just to try and get conductor to spin up, say, the 5% of nodes who (in my downstream case) have an instance name that indicates it's a controller node 18:34
JayFnote that it's **instance** name18:35
TheJuliaoh yeah18:35
JayFwe don't dedicate nodes for controllers18:35
JayFthink about it if you have a leader/followers kind of model with a dedicated leader, in any sort of app18:41
JayFwe mainly wanna ensure the leaders start first18:41
TheJuliaso funny thing is, we have a similar requirement/need which has been articulated by customers, but they've never been able to really articulate what is the driving force and what is the entry state18:41
TheJuliaoh, absolutely18:41
JayFGR wins again as the model upstream customer :D 18:41
JayFlol18:41
TheJuliaso if I was restarting a conductor, and forcing that initial power sync18:45
JayF1) accidentally a whole datacenter18:48
TheJuliaThat power sync could have a static priority check, and ... could we just have an ability to do some sort of query or suggestion from the config file?!?18:48
JayF2) manually power on an absolute minimal set of ironic control plane to get bootstrapped18:50
JayF3) use that smaller setup to get things booted in a proper order18:50
adamcarthur5I mean, how "nasty" can the query search be? In how many places could the useful instance info on nodes you might care about live? 18:56
JayFand the requirement I have that makes it suck: base those decisions on *instance* info not *node* info18:57
JayFsorry ^ that never got hit enter on18:57
adamcarthur5We probably want to support more than just display_name18:57
TheJuliayeah18:57
JayFyeah18:57
adamcarthur5Is it too nasty to say, convert the entire node object into a __dict__ and allow regexing on the whole thing 😅, I assume so?18:58
JayFtoo many secrets for that :D 18:58
JayF"why did my server with 'critical' in the middle of its ipmi_password get caught" /s18:58
JayFIt really depends on how it's oriented: if we do some kinda offline tool which mainly pokes the DB, we can do more nasty stuff18:59
JayFif we try to arrange a more api-centric way, that is probably not good18:59
JayFbut I am also skeptical of any solution that starts from "your ironic is working" as a starting point, because even getting *to* that point is nontrivial18:59
TheJuliaI'm largely thinking the right thing feels like "get your ironic conductor restarted" and it starts taking over, since ideally it should be18:59
TheJuliathe only challenge is if you powered anything off....18:59
TheJuliaanyway, stepping away19:00
TheJuliabbiab19:00
JayF"if you powered anything off" <-- we're talking DR-level recovery, from a full outage of a DC or computer room19:00
adamcarthur5Yeah JayF I'm right in saying the "manual powered off" isn't a requirement right? Because that is what the script in the bug does19:00
JayFthe script in the bug /tells ironic to power the node off/19:01
adamcarthur5And I think Julia is questioning whether that is a hard-requirement 19:01
JayFwhich is a noop when Ironic sees the node is already powered off19:01
adamcarthur5noop?19:01
JayFthe entire point of that script is to circumvent conductor powering them on before we are ready19:01
JayFno-op, as in, ironic sees it's already off and does nothing19:01
JayFjust updates the DB to be correct19:01
adamcarthur5But what about Julia's idea of messing with the conductor? I.e. get it online first and go from there 19:01
JayFthe whole crux of this is ironic thinks node.power_state=on, when the actual server is powered off 19:01
JayF(and it might not be ideal to have an entire datacenter of power hungry servers coming online all at once for physics/electric power reasons)19:02
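The inrush-current concern maps naturally onto batched power-on with a pause between batches; a minimal sketch, where the batch size, delay, and the power_on callable are all assumptions rather than anything ironic provides:

```python
import time

def staggered_power_on(node_uuids, power_on, batch=5, delay=30.0, sleep=time.sleep):
    """Power nodes on in small batches, pausing between batches so a
    room full of power-hungry servers does not hit the breakers with
    everyone's inrush current at once.

    power_on is any callable taking a node UUID (e.g. a client wrapper);
    sleep is injectable purely to make the function testable.
    """
    for i in range(0, len(node_uuids), batch):
        for uuid in node_uuids[i:i + batch]:
            power_on(uuid)
        if i + batch < len(node_uuids):  # no pause after the final batch
            sleep(delay)
```

The per-node delay is the crude version; the later point about machines drawing peak power for 2-3 minutes suggests the delay would really need to be per-rack and operator-tunable.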
JayFadamcarthur5: I don't dislike that idea, but I struggle with thinking of a way to model this where it works based on nova instance metadata for deployed images rather than a node-centric orientation19:02
JayFAt some places I worked, they had like, a set of servers that were "core" and they were always the same hardware, sometimes in a separate room/cage, etc19:03
JayFin that model; something on the node to mark those as special is trivial19:03
JayFin the model where what makes the node special is /some property inherent to the software installed on it/ (hence instance_info.display_name), it gets more complex19:03
adamcarthur5I like the idea of editing conductor behaviour (we can handle the whole "entire data center trying to power up at once" problem too)19:06
adamcarthur5And then I don't think getting instance information is impossible from there right? 19:06
adamcarthur5It seems better than powering them all off with one script, and then needing a other script to bring it back in a certain order? (I.e. if it's not a script, where would it live if you took this path?)19:07
JayFmaybe19:10
JayFhonestly I think about stuff like this in terms of API interfaces19:10
JayFand I'm having trouble visualizing how you'd configure behavior like this 19:10
JayFif you were able to articulate a config grammar (even if a separate yaml file like what we proposed for dynamic networking), it might be easier to understand19:10
JayFbut also may be too complex for a minor feature? IDK. 19:11
* TheJulia reappears from taking a break19:27
TheJuliaso, I think irc is doing a disservice to this discussion19:28
TheJuliathat being said, I think it has value, so I propose we jump on a call and talk through it because disasters also take many different shapes, and that is where I'm coming from. I'd like the conductor to be able to handle recoveries in general through a simple method.19:30
JayFsure, when do you wanna have that chat? cc: adamcarthur5 19:30
JayFI was about to grab lunch but can delay if all parties are here now()19:31
jphI have a deployment with both Redfish and ILO5 hardware and was wondering how the conductors should be configured to handle this. I have encountered errors where the conductors do not start because no default power interface satisfies both hardware types. Which leads me to believe that I need two separate conductor groups one for each hardware type. Is this correct?19:31
adamcarthur5I can call now 19:32
adamcarthur5JayF19:33
JayFso one conductor can handle pretty much any hardware type you want19:33
JayFlet me find the conf you need19:33
JayFTheJulia: you wanna have that chat now or some later point? 19:33
TheJuliaLet me make a cup of coffee and then I can chat19:34
TheJuliacoffee: brewing19:35
JayFhttps://docs.openstack.org/ironic/rocky/configuration/sample-config.html jph see enabled_hardware_types19:35
JayFyou can set that to a list, comma separated19:36
JayFoh screw that rocky link19:36
JayFsilly google19:36
TheJuliaheh19:36
JayFhttps://docs.openstack.org/ironic/latest/configuration/sample-config.html19:36
JayFthis example even has two in it! jph ^^^19:36
TheJuliaAnd generally have the same enabled interfaces across your conductors19:36
JayFI have seen environments that used conductor groups to separate devices of different types19:36
JayFbut I wouldn't do it that way19:36
jphOkay thanks, I will reconfigure ironic again and see if it is any different.19:37
TheJuliaCoffee: Acquired19:38
JayFjph: also note enabled_*_interfaces and default_*_interface. Those may also be useful in your case.19:39
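For the record, the kind of ironic.conf fragment being described — a single conductor serving both hardware types — looks roughly like this. The interface lists are illustrative assumptions; the sample config linked above documents the valid values for each hardware type:

```ini
[DEFAULT]
# Illustrative fragment only: one conductor can enable multiple
# hardware types side by side; the enabled_*_interfaces lists must
# cover what each enabled hardware type supports.
enabled_hardware_types = redfish,ilo5
enabled_power_interfaces = redfish,ilo
enabled_management_interfaces = redfish,ilo5
# Leaving default_power_interface unset lets each node's hardware
# type pick a compatible interface, as jph found below.
```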
JayFTheJulia: you wanna do a meet? or should I zoomzoom19:39
TheJuliahttps://meet.google.com/sui-uuhe-kyz19:39
JayFjph: to avoid having to explain it all again, https://www.youtube.com/watch?v=FUGB2e3XP0g#t=6m30s (6:30 timestamp) explains some of this19:40
TheJuliaa meeting we shall go, a meeting we shall go...19:40
* TheJulia loses all remaining sanity19:40
JayFadamcarthur5: ^19:41
jphThanks JayF and TheJulia, I dropped the default_*_interfaces from the conductor configuration, leaving it to the defaults, and the conductor is now up and running with both hardware types.20:10
TheJuliaJayF: by resource class, shard, owner, lessee, conductor group, and then just do everything for on or off20:50
TheJuliaRecovery power on delay is a huge variable too20:52
JayFadamcarthur5: ^20:52
JayFif you could ping adam in on these too it'd be awesome20:52
TheJuliaAlso, we'd need to be able to signal a soft off20:55
JayFI thought most power offs these days were soft->(poll)->(timeout)->hard20:56
JayFthat is what I had in mind in any event20:56
opendevreviewVerification of a change to openstack/ironic master failed: ci: Remove code which has been long-dead  https://review.opendev.org/c/openstack/ironic/+/95046120:59
JayFurgh21:00
JayFthe job was -nv'd but is still in gate, fixing21:00
opendevreviewJay Faulkner proposed openstack/ironic master: ci: Remove code which has been long-dead  https://review.opendev.org/c/openstack/ironic/+/95046121:02
TheJuliaPower off defaults to hard; you have to explicitly say soft if you want a soft off21:02
* JayF notes that in etherpad21:02
TheJuliaExplicitly discussing power-on recovery: it seems like less of a big hammer is needed, but avoiding hot spotting is super hard21:06
JayFyeah, i think the recovery side of it is a little fuzzier21:11
JayFI'll ponder it overnight and get more input and we'll see what squeezes out21:11
TheJuliaso in talking with another operator, they are *super* concerned about hotspotting for resumption and almost feel they will need to query external DCIM tooling data to figure out their preferred ordered lists21:14
TheJuliawhich makes it a little easier if recovery is also just explicitly slower too21:14
TheJuliaThe key thing they wanted to note is they have some machines which are at peak for ~2-3 minutes21:15
TheJuliaand only then can they then trigger the next one in the rack.21:15
TheJuliaI floated "EMERGENCY_OFF" as a power state, and they loved it; they envisioned it as "oh, I can then turn off all my active and powered-on nodes", that way they know the nodes they need to get back online to be back at the same place21:18
TheJulialots of concern about being able to send soft though, so whatever flow there gets a little weird21:18
TheJuliaJayF: doh, sorry about that regarding the -nv21:21
JayFnp it happens21:31
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: Workaround neutron's move to uwsgi only  https://review.opendev.org/c/openstack/networking-generic-switch/+/95055921:45
TheJuliadoh, yay typos.21:54
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: ci: workaround neutron's move to uwsgi only  https://review.opendev.org/c/openstack/networking-generic-switch/+/95055922:54
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: ci: workaround neutron's move to uwsgi only  https://review.opendev.org/c/openstack/networking-generic-switch/+/95055923:32

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!