iurygregory | I give up, 3 KP when trying to boot the iDRAC 10 is enough for one day =( | 02:23 |
iurygregory | I'm wondering if I should pass root device hints to see if it would help, but I will figure it out tomorrow | 02:24 |
TheJulia | iurygregory: are you able to capture much of the kernel panic? are you able to reproduce it? | 03:11 |
*** jroll04 is now known as jroll0 | 07:17 | |
opendevreview | Nicolas Belouin proposed openstack/ironic-python-agent stable/2025.1: netutils: Use ethtool ioctl to get permanent mac address https://review.opendev.org/c/openstack/ironic-python-agent/+/950489 | 07:41 |
stephenfin | TheJulia: When you're about, I wonder if you'd be able to take a look over a failure we're seeing in Gophercloud CI? https://github.com/gophercloud/gophercloud/actions/runs/15144029062/job/42575162709?pr=3108 | 09:50 |
stephenfin | That seems to be coming from code related to the network simulator stuff you've added since we branched. I haven't been able to reproduce on a local Ubuntu 24.04 host though, so I'm hoping you'll see something obvious | 09:52 |
Sandzwerg[m] | Morning ironic. I'd like to test secure boot. Toggling it doesn't seem to work so far (I have the impression the toggling is not happening), and using our own IPA & ESP I get a secure boot error, so our IPA doesn't support secure boot. While I try to figure out what package or config is missing: is there an IPA & ESP that I can use which should support secure boot out of the box? Or a way I can build something fast that should work? | 10:15 |
masghar | iurygregory: also wondering if the disks ever came back on the UI, how curious | 10:25 |
dtantsur | iurygregory: I can assure you that root device hints have no effect when the ramdisk is booting (and why would they?) | 10:42 |
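(For context: root device hints are consumed at deploy time to pick the disk that gets imaged; they play no part in how the ramdisk itself boots. A typical hint set as a node property might look like the following; the size threshold is illustrative.)

```shell
# Illustrative only: ask the deploy to pick a disk of at least 60 GiB as
# the root device. This has no influence on ramdisk booting.
openstack baremetal node set <node> --property root_device='{"size": ">= 60"}'
```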
dtantsur | Sandzwerg[m]: what's your boot method and how do you build IPA? | 10:44 |
*** sfinucan is now known as stephenfin | 11:19 | |
opendevreview | Merged openstack/ironic unmaintained/xena: [stable-only] Fix errors building docs https://review.opendev.org/c/openstack/ironic/+/949260 | 11:30 |
Sandzwerg[m] | <dtantsur> "Sandzwerg: what's your boot..." <- In this case idrac-redfish. We build our IPA with mkosi and use Fedora 40 as the basis. We needed that some years ago because we had hardware issues with what we were using before. We could probably switch to something else like DIB by now, but before I invest the time I'd like to get something running to make sure it works at all and there isn't something else blocking it | 11:41 |
dtantsur | Sandzwerg[m]: ah yeah. I recall building a secure boot capable ISO being quite annoying, trying to remember any details | 11:46 |
dtantsur | Sandzwerg[m]: this is what I did for metal3 back in the days: https://github.com/metal3-io/ironic-image/commit/f12f205 | 11:48 |
Sandzwerg[m] | currently the ISO is built on the fly with the ESP and the ramdisk & kernel, and I rebuilt the ESP on Fedora 40 so it's the same as the IPA image itself; there was a note in the documentation that that was required, but it still failed. So I guess something in our Fedora image is missing. That's why I'd like to get a "known good" one, and if that works we can even switch. The main reason for our customized deployment is gone | 11:48 |
dtantsur | tl;dr is to be careful what you put into the ESP and also configure Ironic to match that | 11:49 |
Sandzwerg[m] | For the ESP we basically followed https://docs.openstack.org/ironic/latest/install/configure-esp.html; we only needed to adjust the size, as the hardcoded value was too small for us | 11:51 |
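(For reference, the recipe in that document boils down to roughly the following; the Fedora shim/grub source paths and the enlarged image size are assumptions based on the discussion here, not verbatim from the doc.)

```shell
# A sketch of building an ESP image per configure-esp.html; paths assume a
# Fedora build host, and the image size is bumped beyond the doc's default.
DEST=/tmp/esp.img
dd if=/dev/zero of="$DEST" bs=4096 count=2048
mkfs.fat -s 4 -r 512 -S 4096 "$DEST"
sudo mount "$DEST" /mnt
sudo mkdir -p /mnt/EFI/BOOT
# shim, signed with the Microsoft key, becomes the default boot entry
sudo cp /boot/efi/EFI/fedora/shimx64.efi /mnt/EFI/BOOT/BOOTX64.efi
# grub, signed with the distro key that shim trusts
sudo cp /boot/efi/EFI/fedora/grubx64.efi /mnt/EFI/BOOT/GRUBX64.efi
sudo umount /mnt
```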
Sandzwerg[m] | That's why we rebuilt it with the matching distribution. Or could it even be that a change in the package leads to an error? | 11:52 |
dtantsur | do you configure the matching grub_config_path? | 11:56 |
dtantsur | It should work this way.. | 11:57 |
dtantsur | Although, granted, I've only tried secure boot on RHEL CoreOS | 11:57 |
Sandzwerg[m] | yes, we adjusted the grub/shim paths and the dest path, and that was all we did. But there is no ESP available that matches, for example, the CentOS-based IPA images that are published? | 12:01 |
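(A sketch of the matching ironic.conf, assuming the ESP layout sketched above; both values below are assumptions and must match the image contents, with grub_config_path pointing wherever the signed grub binary actually searches for its config, often under EFI/fedora/ for Fedora-signed builds.)

```ini
# Sketch only -- values must match the contents of the ESP image.
[conductor]
bootloader = file:///var/lib/ironic/esp.img

[DEFAULT]
# assumed value: where the signed grub looks for its configuration
grub_config_path = EFI/fedora/grub.cfg
```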
dtantsur | Sandzwerg[m]: we don't publish one, and someone told me that CentOS Stream is not properly signed by the Microsoft's key | 12:02 |
Sandzwerg[m] | hmpf. Okay is there a recommendation for what to use if one wants to do secure boot? | 12:04 |
dtantsur | I think Fedora could be the right path. Debian might work too. I haven't dealt with this topic for ages, sorry. | 12:05 |
dtantsur | (maybe TheJulia has more recent experience?) | 12:05 |
Sandzwerg[m] | The issue we have with Fedora is the frequent upgrades and changes. We're still on 40 because it's the last one that doesn't have Python 3.12 as the default; maybe that would work now, but back then it broke IPA, I think. We might be able to circumvent that with uv or similar tools, but we haven't looked into that yet | 12:09 |
* TheJulia waves from an uncaffeinated state | 12:57 | |
TheJulia | so from my experience, Centos Stream's shim loader *is* signed by msft | 13:00 |
TheJulia | specifically, we had a bug appear in shim ages ago and it made its way into RHEL, because getting the shim binary re-signed is a brutal process | 13:03 |
TheJulia | (which I've been copied on for that bug too....) | 13:03 |
Sandzwerg[m] | Alright, I'll try that then. Thanks | 13:04 |
dtantsur | okay, must have misunderstood something.. | 13:04 |
TheJulia | stephenfin: so, I'm wondering if it is newgrp which is causing the passwd prompt trigger, or if it is sudo. I guess we could "sudo newgrp"? For what it is worth, recent neutron devstack changes have torpedoed our gate, so we're sort of dead in the water at the moment while we try to figure that out as well | 13:22 |
rpittau | TheJulia, dtantsur, JayF, cid, I think I found the "issue" with sushy/sushy-tools auth loop -> https://opendev.org/openstack/ironic/commit/5f7c7dcd041e95a7f1283ab12e9d708844fd0974 | 13:25 |
rpittau | we're now calling ironic.drivers.modules.redfish.utils and it does not detect redfish cached session, causing the loop | 13:26 |
rpittau | this -> https://pastebin.com/1zckr36J | 13:27 |
rpittau | we should revert that change and look into sushy/sushy-tools to avoid the loop | 13:27 |
rpittau | at least I could not find anything else :/ | 13:29 |
TheJulia | rpittau: doesn't detect the unique session url already on hand so it then tries again | 13:29 |
rpittau | yeah, I mean that's the only kind of related change that I can see | 13:32 |
rpittau | although it did get merged a week ago, it seems issues started later, so not 100% sure | 13:32 |
rpittau | oh wait | 13:33 |
rpittau | just checking the actual patch, it did pass the first time on metal3 integration too, so I wonder if it's a race then | 13:34 |
rpittau | no nvm it never passed on metal3 | 13:36 |
rpittau | and the Python version does not make the difference | 13:36 |
opendevreview | Queensly Kyerewaa Acheampongmaa proposed openstack/sushy master: WIP: Add DateTime and DateTimeLocalOffset support to Manager resource https://review.opendev.org/c/openstack/sushy/+/950539 | 13:37 |
rpittau | I'll do a revert patch just to try | 13:37 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: Revert "Fix redfish driver URL parsing" https://review.opendev.org/c/openstack/ironic/+/950540 | 13:38 |
rpittau | I will do another test in parallel in metal3 | 13:43 |
TheJulia | k | 13:46 |
TheJulia | trying to figure out why our most advanced networking jobs are dead now :( | 13:47 |
TheJulia | okay, so NGS is just not working now | 13:56 |
TheJulia | that is why the jobs are failing with JayF's patch | 13:56 |
JayF | Did they do the eventlet removal | 14:03 |
JayF | if so, we probably need a patch similar to my NBM patch in NGS | 14:03 |
TheJulia | who is "they" in that statement? It seems like maybe we're in an odd setup state | 14:04 |
JayF | neutron landed eventlet migration for l2 agents | 14:04 |
JayF | which makes me wonder if we are breaky downstream for that reason | 14:04 |
JayF | since NGS/NBM are plugins | 14:04 |
JayF | https://opendev.org/openstack/neutron/commit/9dc0d0fd2f44e348705804f1f99403086c138010 hmm not as dramatic as I thought | 14:05 |
JayF | timing doesn't match anyway | 14:05 |
TheJulia | oh, I think I see what is going on | 14:08 |
TheJulia | so, we have configuration loaded in the files | 14:08 |
TheJulia | but not in the running neutron API instance which is where ml2 plugins launch from | 14:08 |
TheJulia | if you compare old to new | 14:09 |
JayF | it'd be interesting to understand where the restart was dropped, since my statement before about that code being dead is still true | 14:09 |
TheJulia | old, we restart neutron 2x | 14:09 |
TheJulia | and it gets the configuration | 14:09 |
JayF | that's what I've been struggling with; the diff is so small in devstack | 14:09 |
TheJulia | in new, it never gets restarted | 14:09 |
JayF | hm | 14:09 |
TheJulia | yeah | 14:11 |
TheJulia | the way it gets *registered* only ever sets the config files parameter | 14:11 |
TheJulia | https://opendev.org/openstack/devstack/src/branch/master/lib/neutron#L1048 | 14:11 |
TheJulia | dog is demanding to go out, bbiab | 14:12 |
TheJulia | okay, in working, the genericswitch ini file is on the initial start | 14:33 |
TheJulia | in the non-working, it never gets added/loaded | 14:34 |
TheJulia | i see the issue | 15:04 |
TheJulia | when you use the wsgi launcher, the existing configuration modeling does *not* load up or respect the classical configuration patterns for neutron services | 15:05 |
TheJulia | instead, neutron looks for an environment variable to source the list | 15:05 |
TheJulia | https://github.com/openstack/neutron/blob/master/neutron/server/__init__.py | 15:06 |
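(A hypothetical sketch of the pattern being described; the names below are illustrative, not neutron's actual identifiers — see the linked neutron/server/__init__.py for the real code.)

```python
# Hypothetical illustration: under a WSGI launcher there is no command line,
# so the application factory builds its config-file list from the environment
# instead of from --config-file arguments.
import os

from oslo_config import cfg


def _config_files_from_env(env=None):
    env = os.environ if env is None else env
    # illustrative variable name, not neutron's real one
    conf_dir = env.get('SOME_NEUTRON_CONFIG_DIR', '/etc/neutron')
    return [os.path.join(conf_dir, 'neutron.conf')]


def application_factory():
    cfg.CONF([], project='neutron',
             default_config_files=_config_files_from_env())
    # ml2 plugin config (e.g. the genericswitch ini devstack writes) never
    # makes this list unless the environment points at it -- the gap
    # described above
```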
TheJulia | I've raised it in the neutron channel | 15:13 |
TheJulia | I'm guessing we're sort of shit out of luck | 15:13 |
opendevreview | Queensly Kyerewaa Acheampongmaa proposed openstack/sushy master: Add DateTime and DateTimeLocalOffset support to Manager resource https://review.opendev.org/c/openstack/sushy/+/950539 | 15:37 |
opendevreview | Verification of a change to openstack/bifrost master failed: Add support for downloading CentOS Stream 10 image https://review.opendev.org/c/openstack/bifrost/+/950286 | 15:41 |
opendevreview | Julia Kreger proposed openstack/networking-generic-switch master: Workaround neutorn's move to uwsgi only https://review.opendev.org/c/openstack/networking-generic-switch/+/950559 | 15:49 |
opendevreview | Julia Kreger proposed openstack/networking-generic-switch master: Workaround neutron's move to uwsgi only https://review.opendev.org/c/openstack/networking-generic-switch/+/950559 | 15:50 |
TheJulia | I *think* ^^ might work, but we might need to disable ironic jobs first, and then land it | 15:50 |
TheJulia | if we can test it in | 15:50 |
TheJulia | time will tell | 15:50 |
* JayF just filed RFE https://bugs.launchpad.net/ironic/+bug/2111438 | 15:53 | |
JayF | I'm not 100% sure I have the shape of the solution right, but I wanted to document the need/ask | 15:53 |
* JayF looks at Julia's change | 15:54 | |
JayF | TheJulia: I think start_neutron_service_and_check does some configuration stuff too | 15:54 |
* JayF double checks | 15:55 | |
JayF | yeah I suspect there'll be an issue around stop/start and hitting the right set of services | 15:56 |
JayF | but I am not certain enough to suggest a fix before seeing output | 15:56 |
TheJulia | it happens after the ini file is written | 15:56 |
TheJulia | https://e3fa69918ab3893f89a3-76ad47885070581f857a540cadaa6a6d.ssl.cf1.rackcdn.com/openstack/55cf2727b4c54f06b897353cf71ea0a3/controller/logs/etc/neutron/neutron-api-uwsgi.ini is what we get today | 15:56 |
JayF | I mainly am wondering if we need to hit start_neutron too | 15:57 |
JayF | so the agents come back up | 15:57 |
JayF | I'm ... mostly sure we don't need to? | 15:57 |
JayF | either way, you have the science sciencing, we'll see if there's cake at the end | 15:57 |
TheJulia | yeah, I'm curious what neutron folks will say... if they respond at all | 15:58 |
opendevreview | Merged openstack/ironic-python-agent master: Remove TinyIPA jobs https://review.opendev.org/c/openstack/ironic-python-agent/+/950236 | 16:06 |
JayF | I guess IPA doesn't have any neutron jobs | 16:06 |
JayF | in a surprising victory of sensibility in our CI | 16:06 |
JayF | massive irony about neutron devstack jobs being broken: cid and I have time on calendar today to work on step 0 of dynamic networking (contributor guide docs for complex networking devstack setups) | 16:07 |
JayF | so I guess that gets pushed a week lol | 16:07 |
FreemanBoss[m] | rpittau: please your review is required... (full message at <https://matrix.org/oftc/media/v1/media/download/AXmEi5gLrV1hye0GHOujXjSDi9m2WU7JWH3Y5IJ7dhg1bw_sPHJ485qws7uKRCKVkHL3c3c566IQ_GZSpRiPHdFCeXO7NK8gAG1hdHJpeC5vcmcvZFpVR09sZGdhdk9nVUVKYWJ6VlhHclVl>) | 16:08 |
masghar | FreemanBoss: For changes ready for review, you can add the hashtag 'ironic-week-prio' | 16:33 |
masghar | Will get more eyes on your change | 16:33 |
TheJulia | JayF: So, I think we need to disable voting on the jobs on your patch | 16:42 |
TheJulia | merge that, then try to get the n-g-s jobs fixed | 16:42 |
TheJulia | The n-g-s job failed deep in the config, so hopefully that is the shortest path | 16:42 |
TheJulia | JayF: I can revise your patch if you want, or you can do so to mark the failing jobs as non-voting | 16:43 |
JayF | I'm OK with that, but can we achieve the same result for science with Depends-On on the NGS patch? | 16:43 |
TheJulia | but we can't merge it | 16:43 |
TheJulia | and we would be blocked from doing so | 16:43 |
TheJulia | And regardless, we would need to make something non-voting someplace to merge a fix | 16:43 |
JayF | like I said I'm OK with it | 16:44 |
JayF | you can update patch or I will in a few minutes when I reach the end of my current train of thought | 16:45 |
TheJulia | I'm on it | 16:45 |
opendevreview | Julia Kreger proposed openstack/ironic master: ci: Remove code which has been long-dead https://review.opendev.org/c/openstack/ironic/+/950461 | 16:47 |
opendevreview | Julia Kreger proposed openstack/networking-generic-switch master: Workaround neutron's move to uwsgi only https://review.opendev.org/c/openstack/networking-generic-switch/+/950559 | 16:49 |
TheJulia | okay, that will allow us to semi-unblock and begin sciencing further | 16:51 |
TheJulia | SCIENCE! | 16:51 |
TheJulia | so, regarding https://bugs.launchpad.net/ironic/+bug/2111438, is the idea some sort of "power priority", and then to sort the power sync/status/etc stuff via the priority? | 16:54 |
JayF | Well, in my particular case that won't do the trick | 16:57 |
JayF | primarily because it's not specific *nodes* it's specific *instances* | 16:58 |
JayF | notice how the example script is keying on instance_info | 16:58 |
JayF | that's the reason I leaned towards an offline tool, because it fits into a DR recovery plan more sanely; 1) get DB up 2) set power priorities 3) online ironic and let it execute 4) use API clients to spin up the rest as needed | 16:58 |
JayF | I could also potentially accept "just power all of them off" as an option, then API to turn on the ones we want after | 16:59 |
TheJulia | hmm, fair | 17:08 |
TheJulia | I guess there are a few competing challenges: | 17:16 |
TheJulia | 1) Conductor should be powering everything up *anyhow* | 17:16 |
TheJulia | 2) Some sort of priority would make a lot of sense, so as not to overload breakers with inrush current. I've never popped a distribution breaker, but Ironic has successfully popped some breakers in its history ;) | 17:16 |
TheJulia | 3) I guess it is fair to be able to "go turn on key nodes", and to be able to key that off something which is instance-provided. | 17:16 |
TheJulia | Perhaps a power_priority of 0 could be "go reference some config which could reach into or look at something", 1-98 could be explicit ordering, and 99 could be the default knob | 17:17 |
TheJulia | The whole thing about a DR plan recovery, does sort of make sense, if you had everything powered down | 17:19 |
TheJulia | but ironic is going to try to return to the prior state | 17:19 |
TheJulia | So, if you did power everything off, then you have to power everything back up | 17:19 |
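(A hypothetical sketch of the knob being floated here — none of this exists in Ironic; it just illustrates ordering the startup power sync by a per-node priority, with 0 deferring to an external lookup and 99 as the default.)

```python
# Hypothetical sketch only -- not existing Ironic code.
DEFAULT_PRIORITY = 99  # nodes without an explicit priority go last


def power_priority(node):
    prio = int(node.driver_info.get('power_priority', DEFAULT_PRIORITY))
    if prio == 0:
        # 0 = defer to some external source (config file, DCIM query, ...)
        prio = external_priority_lookup(node)  # hypothetical helper
    return prio


def startup_power_sync(nodes):
    # bring nodes back to their saved power state, highest priority first
    for node in sorted(nodes, key=power_priority):
        restore_power_state(node)  # hypothetical helper
```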
*** darmach0 is now known as darmach | 17:27 | |
-opendevstatus- NOTICE: Gerrit is being updated to the latest 3.10 bugfix release as part of early prep work for an eventual 3.11 upgrade. Gerrit will be offline momentarily while it restarts on the new version. | 17:34 | |
adamcarthur5 | TheJulia Hey, I am not sure about a priority because it's really about instances AND nodes, which you kind of mention. I think just a way of saying "power on the nodes as you see fit, but these instances need to come online first" | 18:09 |
TheJulia | then I think if we could have a priority which somehow explains "go look at this"... maybe!? | 18:11 |
adamcarthur5 | Ah okay, I think I have misinterpreted what level you meant "priority" at. I agree, I think "go look at this" is the difficult thing because: | 18:12 |
adamcarthur5 | 1) It needs to be agnostic to how you create instances (i.e. support more than just nova) | 18:13 |
adamcarthur5 | 2) It needs to live entirely in Ironic | 18:13 |
TheJulia | yeah, it doesn't solve the case, though, if you explicitly shut everything down first | 18:14 |
TheJulia | because then the saved power state supersedes | 18:14 |
adamcarthur5 | Is that about your point 1? | 18:19 |
TheJulia | not really, it's more about how you get into a disaster in the first place | 18:20 |
TheJulia | "oh no, the nuclear power plant is melting down" is entirely different from "data center is burning down" | 18:21 |
TheJulia | i.e. is this a sudden disaster, or a slow rolling disaster | 18:21 |
TheJulia | are you coming back from nothing, or just a hard outage | 18:21 |
TheJulia | was that outage planned, or not | 18:21 |
adamcarthur5 | Are you mentioning this because we need to have a feature that covers many scenarios, or because you don't understand where this bug desc is coming from? | 18:23 |
TheJulia | there are many scenarios | 18:23 |
TheJulia | For example, I ran a DR scenario once which was literally "The power plant nearby is melting down, we have to leave, the servers will keep running for an undeterminable amount of time" and an opposite test, "a tornado hit the data center, we're rebuilding from scratch" | 18:24 |
TheJulia | there is a whole spectrum in there | 18:24 |
TheJulia | so assuming the disaster is "Electrical Room Fire", then the prior power state is power_on when your conductor is back online | 18:25 |
adamcarthur5 | Yeah okay, I mean, is it acceptable for us to specifically only think about issues like ours? So "external factors knocked everything offline, for a temporary period", i.e. a power outage | 18:25 |
TheJulia | but assuming your disaster is uhhh... "UPS is in bypass, and we need to cut the slab to replace it", then you can shut down the workloads and your state is "power off" | 18:26 |
adamcarthur5 | Your worry is getting everything to the power off state? | 18:27 |
TheJulia | no, my worry is that we can't recover the power state because we start in a power state of power off if you did a staged shutdown of the data center | 18:28 |
TheJulia | so the starting place is a little weird | 18:29 |
TheJulia | the bug, seems to request the idea of "hey, explicitly power on" which is totally different as well | 18:29 |
adamcarthur5 | I think Jay is purely using that as an example | 18:30 |
adamcarthur5 | I.e. explicit power off is not a requirement | 18:30 |
adamcarthur5 | (before starting) | 18:30 |
TheJulia | so we need to "turn on a fleet", and it is almost like we need a tri-value field | 18:34 |
JayF | well | 18:34 |
JayF | the power_state field on those nodes is likely power_on | 18:34 |
JayF | even though the datacenter, in this case, went boom and they are all off | 18:34 |
JayF | so the idea is just to try and get conductor to spin up, say, the 5% of nodes who (in my downstream case) have an instance name that indicates it's a controller node | 18:34 |
JayF | note that it's **instance** name | 18:35 |
TheJulia | oh yeah | 18:35 |
JayF | we don't dedicate nodes for controllers | 18:35 |
JayF | think about it if you have a leader/followers kind of model with a dedicated leader, in any sorta app | 18:41 |
JayF | we mainly wanna ensure the leaders start first | 18:41 |
TheJulia | so funny thing is, we have a similar requirement/need which has been articulated by customers, but they've never been able to really articulate what is the driving force and what is the entry state | 18:41 |
TheJulia | oh, absolutely | 18:41 |
JayF | GR wins again as the model upstream customer :D | 18:41 |
JayF | lol | 18:41 |
TheJulia | so if I was restarting a conductor, and forcing that initial power sync | 18:45 |
JayF | 1) accidentally a whole datacenter | 18:48 |
TheJulia | That power sync could have a static priority check, and ... could we just have an ability to do some sort of query or suggestion from the config file?!? | 18:48 |
JayF | 2) manually power on an absolute minimal set of ironic control plane to get bootstrapped | 18:50 |
JayF | 3) use that smaller setup to get things booted in a proper order | 18:50 |
adamcarthur5 | I mean, how "nasty" can the query search be? How many places could the useful information, about which instances on which nodes you care about, actually live? | 18:56 |
JayF | and the requirement I have that makes it suck: base those decisions on *instance* info not *node* info | 18:57 |
JayF | sorry ^ that never got hit enter on | 18:57 |
adamcarthur5 | We probably want to support more than just display_name | 18:57 |
TheJulia | yeah | 18:57 |
JayF | yeah | 18:57 |
adamcarthur5 | Is it too nasty to say, convert the entire node object into a __dict__ and allow regexing on the whole thing 😅, I assume so? | 18:58 |
JayF | too many secrets for that :D | 18:58 |
JayF | "why did my server with 'critical' in the middle of its ipmi_password get caught" /s | 18:58 |
JayF | It really depends on how it's oriented: if we do some kinda offline tool which mainly pokes the DB, we can do more nasty stuff | 18:59 |
JayF | if we try to arrange a more api-centric way, that is probably not good | 18:59 |
JayF | but I am also skeptical of any solution that starts from "your ironic is working" as a starting point, because even getting *to* that point is nontrivial | 18:59 |
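(A hypothetical sketch of the offline-tool angle: with the conductor still down, read the DB directly and pick out instances whose display_name marks them as controllers. The table and column names match ironic's schema, where nodes.instance_info is JSON text, but the matching rule, connection string, and everything else is illustrative.)

```python
# Hypothetical offline sketch: ironic is NOT running; we only read its DB.
import json

import sqlalchemy as sa

# connection string is an assumption for illustration
engine = sa.create_engine('mysql+pymysql://ironic:secret@localhost/ironic')
with engine.connect() as conn:
    rows = conn.execute(sa.text('SELECT uuid, instance_info FROM nodes'))
    for uuid, instance_info in rows:
        info = json.loads(instance_info or '{}')
        # key on *instance* data, per the requirement above
        if 'controller' in (info.get('display_name') or ''):
            print(uuid)  # power these on first; everything else can wait
```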
TheJulia | I'm largely thinking the right thing feels like "get your ironic conductor restarted" and it starts taking over, since ideally it should be | 18:59 |
TheJulia | the only challenge is if you powered anything off.... | 18:59 |
TheJulia | anyway, stepping away | 19:00 |
TheJulia | bbiab | 19:00 |
JayF | "if you powered anything off" <-- we're talking DR-level recovery, from a full outage of a DC or computer room | 19:00 |
adamcarthur5 | Yeah JayF I'm right in saying the "manual powered off" isn't a requirement right? Because that is what the script in the bug does | 19:00 |
JayF | the script in the bug /tells ironic to power the node off/ | 19:01 |
adamcarthur5 | And I think Julia is questioning whether that is a hard-requirement | 19:01 |
JayF | which is a noop when Ironic sees the node is already powered off | 19:01 |
adamcarthur5 | noop? | 19:01 |
JayF | the entire point of that script is to circumvent conductor powering them on before we are ready | 19:01 |
JayF | no-op, as in, ironic sees it's already off and does nothing | 19:01 |
JayF | just updates the DB to be correct | 19:01 |
adamcarthur5 | But what about Julia's idea of messing with the conductor? I.e. get it online first and go from there | 19:01 |
JayF | the whole crux of this is ironic thinks node.power_state=on, when the actual server is powered off | 19:01 |
JayF | (and it might not be ideal to have an entire datacenter of power hungry servers coming online all at once for physics/electric power reasons) | 19:02 |
JayF | adamcarthur5: I don't dislike that idea, but I struggle with thinking of a way to model this where it works based on nova instance metadata for deployed images rather than a node-centric orientation | 19:02 |
JayF | At some places I worked, they had like, a set of servers that were "core" and they were always the same hardware, sometimes in a separate room/cage, etc | 19:03 |
JayF | in that model; something on the node to mark those as special is trivial | 19:03 |
JayF | in the model where what makes the node special is /some property inherent to the software installed on it/ (hence instance_info.display_name), it gets more complex | 19:03 |
adamcarthur5 | I like the idea of editing conductor behaviour (we can handle the whole "entire data center trying to power up at once" problem too) | 19:06 |
adamcarthur5 | And then I don't think getting instance information is impossible from there right? | 19:06 |
adamcarthur5 | It seems better than powering them all off with one script, and then needing another script to bring it back in a certain order? (I.e. if it's not a script, where would it live if you took this path?) | 19:07 |
JayF | maybe | 19:10 |
JayF | honestly I think about stuff like this in terms of API interfaces | 19:10 |
JayF | and I'm having trouble visualizing how you'd configure behavior like this | 19:10 |
JayF | if you were able to articulate a config grammar (even if a separate yaml file like what we proposed for dynamic networking), it might be easier to understand | 19:10 |
JayF | but also may be too complex for a minor feature? IDK. | 19:11 |
* TheJulia reappears from taking a break | 19:27 | |
TheJulia | so, I think IRC is doing a disservice to this discussion | 19:28 |
TheJulia | that being said, I think it has value, so I propose we jump on a call and talk through it because disasters also take many different shapes, and that is where I'm coming from. I'd like the conductor to be able to handle recoveries in general through a simple method. | 19:30 |
JayF | sure, when do you wanna have that chat? cc: adamcarthur5 | 19:30 |
JayF | I was about to grab lunch but can delay if all parties are here now() | 19:31 |
jph | I have a deployment with both Redfish and iLO5 hardware and was wondering how the conductors should be configured to handle this. I have encountered errors where the conductors do not start because no default power interface satisfies both hardware types. Which leads me to believe that I need two separate conductor groups, one for each hardware type. Is this correct? | 19:31 |
adamcarthur5 | I can call now | 19:32 |
adamcarthur5 | JayF | 19:33 |
JayF | so one conductor can handle pretty much any hardware type you want | 19:33 |
JayF | let me find the conf you need | 19:33 |
JayF | TheJulia: you wanna have that chat now or some later point? | 19:33 |
TheJulia | Let me make a cup of coffee and then I can chat | 19:34 |
TheJulia | coffee: brewing | 19:35 |
JayF | https://docs.openstack.org/ironic/rocky/configuration/sample-config.html jph see enabled_hardware_types | 19:35 |
JayF | you can set that to a list, comma separated | 19:36 |
JayF | oh screw that rocky link | 19:36 |
JayF | silly google | 19:36 |
TheJulia | heh | 19:36 |
JayF | https://docs.openstack.org/ironic/latest/configuration/sample-config.html | 19:36 |
JayF | this example even has two in it! jph ^^^ | 19:36 |
TheJulia | And generally have the same enabled interfaces across your conductors | 19:36 |
JayF | I have seen environments that used conductor groups to separate devices of different types | 19:36 |
JayF | but I wouldn't do it that way | 19:36 |
jph | Okay, thanks. I will reconfigure ironic again and see if it is any different. | 19:37 |
TheJulia | Coffee: Acquired | 19:38 |
JayF | jph: also note enabled_*_interfaces and default_*_interface. Those may also be useful in your case. | 19:39 |
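(A minimal ironic.conf sketch for jph's mixed fleet, assuming the stock redfish and ilo5 hardware types; note that no default_power_interface is forced, since no single default can satisfy both types — which is what kept the conductor from starting.)

```ini
[DEFAULT]
enabled_hardware_types = ilo5,redfish
enabled_power_interfaces = ilo,redfish
# deliberately no default_power_interface: each hardware type then falls
# back to its own supported default
```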
JayF | TheJulia: you wanna do a meet? or should I zoomzoom | 19:39 |
TheJulia | https://meet.google.com/sui-uuhe-kyz | 19:39 |
JayF | jph: to avoid having to explain it all again, https://www.youtube.com/watch?v=FUGB2e3XP0g#t=6m30s (6:30 timestamp) explains some of this | 19:40 |
TheJulia | a meeting we shall go, a meeting we shall go... | 19:40 |
* TheJulia loses all remaining sanity | 19:40 | |
JayF | adamcarthur5: ^ | 19:41 |
jph | Thanks JayF and TheJulia. I dropped the default_*_interfaces from the conductor configuration, leaving it to the defaults, and the conductor is now up and running with both hardware types. | 20:10 |
TheJulia | JayF: by resource class, shard, owner, lessee, conductor group, and then just do everything for on or off | 20:50 |
TheJulia | Recovery power on delay is a huge variable too | 20:52 |
JayF | adamcarthur5: ^ | 20:52 |
JayF | if you could ping adam in on these too it'd be awesome | 20:52 |
TheJulia | Also, we’d need to be able to signal a soft off | 20:55 |
JayF | I thought most power offs these days were soft->(poll)->(timeout)->hard | 20:56 |
JayF | that is what I had in mind in any event | 20:56 |
opendevreview | Verification of a change to openstack/ironic master failed: ci: Remove code which has been long-dead https://review.opendev.org/c/openstack/ironic/+/950461 | 20:59 |
JayF | urgh | 21:00 |
JayF | the job was -nv'd but is still in gate, fixing | 21:00 |
opendevreview | Jay Faulkner proposed openstack/ironic master: ci: Remove code which has been long-dead https://review.opendev.org/c/openstack/ironic/+/950461 | 21:02 |
TheJulia | Power off is hard, you have to explicitly say soft if you want to be soft | 21:02 |
* JayF notes that in etherpad | 21:02 | |
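(For reference, the explicit-soft flow Julia means: the states API accepts a "soft power off" target, and the CLI exposes it as a flag; the timeout value here is illustrative.)

```shell
# REST equivalent: PUT /v1/nodes/<node>/states/power
#   {"target": "soft power off", "timeout": 600}
openstack baremetal node power off --soft --power-timeout 600 <node>
```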
TheJulia | Explicitly discussing power-on recovery: it seems like less of a big hammer is needed, but avoiding hot-spotting is super hard | 21:06 |
JayF | yeah, i think the recovery side of it is a little fuzzier | 21:11 |
JayF | I'll ponder it overnight and get more input and we'll see what squeezes out | 21:11 |
TheJulia | so in talking with another operator, they are *super* concerned about hotspotting for resumption and almost feel they will need to query external DCIM tooling data to figure out their preferred ordered lists | 21:14 |
TheJulia | which makes it a little easier if recovery is also just explicitly slower too | 21:14 |
TheJulia | The key thing they wanted to note is they have some machines which are at peak for ~2-3 minutes | 21:15 |
TheJulia | and only then can they trigger the next one in the rack. | 21:15 |
TheJulia | I floated "EMERGENCY_OFF" as like a power state, and they loved it; they envisioned it as "oh, I can then turn off all my active and powered-on nodes", that way they know the nodes they need to get back online to be back at the same place | 21:18 |
TheJulia | lots of concern about being able to send soft though, so whatever flow there gets a little weird | 21:18 |
TheJulia | JayF: doh, sorry about that regarding the -nv | 21:21 |
JayF | np it happens | 21:31 |
opendevreview | Julia Kreger proposed openstack/networking-generic-switch master: Workaround neutron's move to uwsgi only https://review.opendev.org/c/openstack/networking-generic-switch/+/950559 | 21:45 |
TheJulia | doh, yay typos. | 21:54 |
opendevreview | Julia Kreger proposed openstack/networking-generic-switch master: ci: workaround neutron's move to uwsgi only https://review.opendev.org/c/openstack/networking-generic-switch/+/950559 | 22:54 |
opendevreview | Julia Kreger proposed openstack/networking-generic-switch master: ci: workaround neutron's move to uwsgi only https://review.opendev.org/c/openstack/networking-generic-switch/+/950559 | 23:32 |