Wednesday, 2022-12-07

vanou	good morning ironic	00:58
opendevreview	Aija Jauntēva proposed openstack/sushy master: Fix volume deletion on newer iDRACs https://review.opendev.org/c/openstack/sushy/+/864845	07:53
rpittau	good morning ironic! o/	08:36
kubajj	Good morning everyone	09:16
kubajj	rpittau: is it possible to recheck just one zuul pipeline?	11:26
rpittau	kubajj: unfortunately that's not possible	11:26
iurygregory	good morning Ironic	11:29
kubajj	good morning iurygregory	11:29
kubajj	If I want to recheck, do I just leave a comment, a review with just recheck in it or is there something special?	11:30
rpittau	kubajj: just add a comment with recheck at the beginning and possibly a reason for the recheck	11:31
rpittau	what's the patch ?	11:31
kubajj	https://review.opendev.org/c/openstack/ironic/+/866056	11:31
kubajj	Zuul says Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance	11:32
kubajj	and I don't think that I modified anything that should cause a problem with partition image	11:33
iurygregory	you can add "recheck zuul retries"	11:34
iurygregory	this would trigger recheck on the patch, and you provided the reason why you would like to do it =)	11:34
rpittau	kubajj: I agree, it doesn't look like the failure is due to your change	11:34
kubajj	thanks, will do	11:36
opendevreview	Jonathan Rosser proposed openstack/ironic master: Fix debug log message argument formatting https://review.opendev.org/c/openstack/ironic/+/866856	13:56
opendevreview	Riccardo Pittau proposed openstack/ironic master: Fix unit tests for Python 3.11 https://review.opendev.org/c/openstack/ironic/+/866861	14:12
janders	dtantsur sorry I missed the ping about https://review.opendev.org/c/openstack/sushy/+/866612 - missed the boat now, but LGTM, thanks!	14:23
janders	Out of curiosity, what hardware needed this fix?	14:23
dtantsur	janders: some ARM stuff, I pinged you downstream	14:24
janders	dtantsur ACK	14:25
kubajj	dtantsur: what are the links that the controllers return?	14:37
dtantsur	kubajj: these are for navigation between API resources. Since your new endpoint is leaf (nothing under it), I think you can omit links for now.	14:38
dtantsur	e.g. a node endpoint links to states (/v1/nodes/<node> to /v1/nodes/<node>/states)	14:39
kubajj	thanks	14:39
TheJulia	good morning folks	14:43
kubajj	dtantsur: just to make sure, I need just get_one and it should just return the data	14:50
dtantsur	kubajj: yep	14:50
rpittau	need to split, see ya tomorrow o/	16:02
kubajj	dtantsur: I think I figured out why we had the discussion about the get_by	16:13
kubajj	get_by_node_uuid in object definition	16:14
opendevreview	Jakub Jelinek proposed openstack/ironic master: WIP: API for node inventory https://review.opendev.org/c/openstack/ironic/+/866876	16:16
jrosser	TheJulia: following up from my difficulty with debug=True yesterday i found a suspicious debug log https://review.opendev.org/c/openstack/ironic/+/866856	16:44
jrosser	it was a little tricky to find because the exception handling here doesnt output the stack trace https://github.com/openstack/ironic/blob/b34d79e3f440c408520f24a2263dc587ac205ee2/ironic/drivers/modules/agent_base.py#L539-L545	16:44
JayF	I don't /think/ that should be a functional change	16:45
JayF	but imbw	16:45
jrosser	i'm now stuck in a bit of a loop where i can't get cleaning to succeed https://paste.opendev.org/show/bTFmRIKizVxJwO8E14Ww/	16:46
JayF	Are you using any kind of custom code in that?	16:47
JayF	Other than the above patch?	16:47
jrosser	no, this is stable/yoga	16:48
JayF	This looks extremely broken	16:48
JayF	Can you reproduce that with IPA in debug mode?	16:48
jrosser	is there a correct way to "start again" after getting to clean failed state	16:49
jrosser	i am wondering if i'm doing that wrong	16:49
JayF	So if it's in clean failed	16:50
JayF	you should be able to run	16:50
JayF	baremetal node blah manage	16:50
JayF	baremetal node blah provide	16:50
JayF	you have to go from:	16:50
JayF	clean failed -> managable	16:50
JayF	then from managable -> cleaning -> available	16:50
JayF	if you're working with automated cleaning	16:50
JayF	so just manage, provide	16:50
jrosser	hmm ok thats what i'm doing	16:50
JayF	So lets do a couple of things:	16:51
JayF	Do your "manage"	16:51
JayF	get a dump of the node object	16:51
JayF	(/v1/nodes/detail?uuid=blah or whatever the node-show-detail command is in OSC)	16:51
JayF	then do a provide	16:51
JayF	when it breaks, get a node detail again	16:51
JayF	toss all that into a pastebin and we should have the info we need to find the problem, I hope	16:51
jrosser	ok i'll try that	16:52
JayF	TheJulia: ^^ uh, it looks like cleaning could be crazy-broken on stable/yoga --- we are trying to call "wait" state verb on a node in cleanwait state	16:52
* TheJulia tries to digest		16:54
TheJulia	what the....	16:56
JayF	yeah, that's what I was thinking lol	16:56
TheJulia	yeah, we're going to need state data for the fields before/after I think	17:00
TheJulia	just so we're not spinning endlessly	17:00
* jrosser not sure how to get node detail		17:01
JayF	baremetal node show $node_uuid --detail # I think ?	17:01
opendevreview	Vishal Manchanda proposed openstack/ironic-ui master: [DNM] Test CI jobs status https://review.opendev.org/c/openstack/ironic-ui/+/866880	17:06
jrosser	JayF: before i run the 'provide' is this sufficient detail? https://paste.opendev.org/show/bkMzulFg0D4nm9zZqB41/	17:18
JayF	perfect	17:18
jrosser	i can't find a way to get detailed output from 'baremetal node show'	17:18
JayF	--long might be it	17:18
JayF	you have the fields I care about :D	17:18
JayF	TheJulia: ^^ is clean_steps supposed to be erased on the failed -> managable transition, or when cleaning restarts?	17:19
TheJulia	it is just baremetal node show	17:19
TheJulia	JayF: I believe it is	17:20
TheJulia	uhh... both I think	17:20
JayF	So I'll note; node.clean_steps in the above paste is still populated	17:21
JayF	we'll wait to see the output but that looks sus	17:21
TheJulia	That seems like it is a problem then	17:22
TheJulia	but it has been ~2 years since I was looking at that code	17:22
JayF	same, but longer	17:22
JayF	and I don't know what coulda busted it in Yoga	17:22
TheJulia	likewise	17:22
TheJulia	this seems super weird	17:22
TheJulia	because we run CI in debug too	17:23
JayF	jrosser: information about how you have installed Ironic would be useful too -- pip installed? Using packages? etc	17:23
TheJulia	but... maybe it got orphaned there earlier on and the presence short circuits things	17:23
TheJulia	which does feel like we've had that come up	17:23
jrosser	it is installed with pip from git using openstack-ansible	17:23
jrosser	this feels less directly related to debug than my trouble yesterday, but i may be wrong there	17:24
jrosser	unless i have some bogus state now from it encountering the exception	17:25
TheJulia	well, I think you hit an unexpected exception	17:26
TheJulia	lets see, what was that exception	17:26
jrosser	ok here is the node detail after i get clean failed https://paste.opendev.org/show/bPcYy3RTxwWFSThLUVVl/	17:27
TheJulia	also, sorry for not being really around late yesterday/early today. Board stuffs	17:28
TheJulia	ouch	17:31
TheJulia	so yeah	17:31
TheJulia	you hit a typeerror, which bombed out things really abdly	17:31
TheJulia	badly	17:31
JayF	If we can't recover from that in Ironic-proper; it's still our bug to fix that up, yeah?	17:31
JayF	manage/provide should be resetting enough context on the node to make it work	17:32
JayF	I just can't tell what is broken where, in the output	17:32
TheJulia	yeah, I'm not that far yet	17:34
TheJulia	mother in law just called	17:34
TheJulia	err	17:34
TheJulia	no, step mother	17:34
* TheJulia gets the two confused		17:34
TheJulia	oh!	17:44
TheJulia	I see what is going on	17:44
TheJulia	so clean step overrides are loaded	17:44
JayF	I'm not sure I follow?	17:45
TheJulia	https://github.com/openstack/ironic/blame/stable/yoga/ironic/conductor/steps.py#L187	17:45
JayF	I'm still not sure how that could cause the failures we're seeing	17:47
TheJulia	I think we just need a general exception catch in do_node_clean	17:47
jrosser	yes i have [conductor] clean_step_priority_override = deploy.erase_devices_express:5	17:48
JayF	So that's the original typeerror	17:49
JayF	but I don't think that explains why it's failing in a recurring method now	17:49
TheJulia	yeah, that is what I'm struggling with	17:54
TheJulia	do we have the latest error?	17:54
JayF	he showed a node log of the error we see in the last node	17:55
JayF	like well above	17:55
JayF	it appears to be failing reliably in the same way	17:55
jrosser	yes it is the same, this is from my last attempt just now https://paste.opendev.org/show/bUASWfgCz9kRVzd1fvRn/	17:56
jrosser	i can see in the console of the node that the express cleaning step complete with result: None	17:57
TheJulia	oh!	17:59
TheJulia	I think i know what is going on	17:59
TheJulia	jrosser: any chance we can get like the hundred lines before that in the logs ?	18:04
jrosser	sure	18:04
jrosser	this is from the point that the node was powered on https://pastebin.com/raw/Dn0AwziH	18:14
JayF	17:20:37	18:22
JayF	hmm, nope, I'm just reading multiple logs about the same thing as if it was multiple different logs	18:23
JayF	it was already in clean wait after it powered on, it never transitioned back to cleaning at any point, but yet wait was called	18:24
JayF	I wonder if this is some kind of weird edge case in DRAC cleaning + clean step override	18:24
jrosser	oh the drac stuff is just noise, there are a bunch of dells as well but this isnt one of them	18:24
* JayF burns his hypothesis		18:25
TheJulia	yeah, this is bizzar	18:33
JayF	jrosser: you 100000000% sure this is unpatched, up to date stable/yoga?	18:36
JayF	jrosser: maybe a pip freeze from inside the venv in the container, if you can do that?	18:37
JayF	17:20:37 is the only state transition	18:38
JayF	why it is trying to wait a cleanwait node???	18:38
jrosser	https://paste.opendev.org/show/baKpxWfnQ18WzAMBXvjl/	18:38
jrosser	it should be ironic from commit 5fc42c41	18:40
JayF	yeah, and that's the latest yoga release	18:40
JayF	I'm looking at git commit logs for something to blame lol	18:40
JayF	this is one of the most clearly broken things I've seen with Ironic, and I can't figure out why lol	18:41
jrosser	i need to fork ironic now anyway to apply my debug log patch repeatably, so i can redeploy all this fresh tomorrow	18:41
JayF	well, if it stopped breaking after you redeployed	18:41
JayF	I'd be even more confused and upset lol	18:41
TheJulia	eh, everything with weird cleaning stuff is... weird	18:43
JayF	https://github.com/openstack/ironic/commit/8034242c225f3293c08ca46dc588d00c5ad0e10a is the only commit I can even see in yoga that I'd be sus over	18:43
JayF	and I can't draw a line between those changes	18:43
jrosser	and just for completeness i tried again with debug=False and it's totally nothing to do with that	18:43
JayF	yeah my hunch is somehow garbage was left in the node cleaning information	18:44
JayF	from the typeerror	18:44
JayF	and that is causing consistent failures	18:44
JayF	so there are 2x things to fix: the root cause, the typeerror, and our recovery semantics around bad data in cleaning metadata	18:44
JayF	but I'd really like to track it through the code	18:44
jrosser	is there value in persisting with this, i'm happy to debug but i don't really know what i'm looking at	18:44
jrosser	alternatively i can delete / recreate the node but the learning opportunity may be lost then	18:45
TheJulia	so it is almost like an explicit call to resume is not getting recorded	18:45
TheJulia	and when it saves the object and refreshes, it gets the old data	18:45
JayF	TheJulia: do you think there's any value in jrosser NOT rebuilding this?	18:45
JayF	I think we have all the data, right?	18:45
TheJulia	jrosser: what version of sqlalchemy is in use?	18:45
JayF	even if we can't explain it	18:45
JayF	oooooh	18:45
JayF	SQLAlchemy==1.4.31	18:46
JayF	sqlalchemy-migrate==0.13.0	18:46
JayF	that looks right to me	18:46
TheJulia	bngo	18:46
JayF	that's not correct? for yoga?	18:46
TheJulia	drop down to 1.3.x or 1.2.x	18:46
JayF	oslo.db==11.2.0	18:46
TheJulia	we're just doing 1.4/2.0 stuff in master branch	18:46
JayF	are our requirements wrong? or is OSA ignoring them?	18:46
TheJulia	uhhhhhhh	18:46
TheJulia	11.2.0	18:46
* TheJulia goes and checks		18:47
JayF	SQLAlchemy>=1.2.19 # MIT	18:47
JayF	oslo.db>=9.1.0 # Apache-2.0	18:47
JayF	no upper bound, when clearly there needs to be one	18:47
TheJulia	upper comes from requirements repo	18:47
JayF	https://github.com/openstack/requirements/blob/stable/yoga/global-requirements.txt#L326	18:48
TheJulia	https://github.com/openstack/requirements/blob/stable/yoga/upper-constraints.txt#L164	18:48
TheJulia	yeah	18:48
TheJulia	so, I'd drop the version back to just see if it works	18:48
JayF	so that is incorrect?	18:48
JayF	1.4 drops autocommit?	18:48
TheJulia	I don't know, I've never seen this before	18:48
TheJulia	1.4 doesn't drop it, 2.0 does	18:48
JayF	let me ask the question in a more declarative way:	18:48
TheJulia	but... this feels like an autocommit issue	18:48
JayF	Would you have expected oslo.db 11.2.0 and sqlalchemy 1.4.31 to work?	18:49
JayF	(on stable/yoga Ironic)	18:49
TheJulia	yes, I would have, but I'm grasping at straws because 1.4 does have some major changes that perhaps we just didn't detect	18:49
JayF	OK; I understand where you're at now	18:50
jrosser	do you think i have something inconsistent with u-c	18:50
* jrosser tries to follow along		18:50
JayF	jrosser: the hypothesis is that there may be a currently-unknown-bug in newer sqlalchemy that we haven't identified yet	18:50
JayF	jrosser: canyou try with 1.3.x or 1.2.x	18:50
TheJulia	... I feel like maybe there could be something, because the only thing that explains this, is if we get an old row back for node upon changing some other fields	18:50
JayF	jrosser: if that fixes it; we'll have to either fix compat with sqla 1.4.x or drop the upper-constraint	18:50
JayF	jrosser: are you using anything exotic for your DB here?	18:51
jrosser	mariadb	18:51
JayF	e.g. some kind of not-quite-mysql cloud service, or read-only secondaries, etc	18:51
JayF	I would think new mariadb would be fine	18:51
TheJulia	the path is task.process_event('resume') not actually executing	18:51
TheJulia	at least, that what it seems	18:52
TheJulia	but it has no way to actually not do so to get as far as it does	18:52
* TheJulia hopes that makes sense		18:52
JayF	what line # do you think that's being called on?	18:52
TheJulia	looking for it again, give me a minute	18:52
JayF	just to make sure I'm looking in the right spot	18:52
JayF	ty	18:52
TheJulia	https://github.com/openstack/ironic/blob/master/ironic/conductor/task_manager.py#L613	18:53
JayF	yeah that's what I expected	18:54
TheJulia	the fact the target_provision_state is available, and not none, also points to something funky	18:54
TheJulia	Honestly, I'm kind of at the point where I'd want to try and reproduce this, but I don't see how we got on that track to begin with unless something "weird" occurd with db interaction	18:55
JayF	like I said above; I think the typeerror killed it in a bad spot	18:55
TheJulia	oh, I agree	18:55
JayF	and somehow gave us a node in a state that just screws up this over and over	18:55
TheJulia	oh	18:55
TheJulia	you know what... it could just be the fact target_provision_state is already set	18:55
JayF	like maybe that leftover target_prov_state is never getting zeroed?	18:55
TheJulia	I wonder if nuking that field int he db would clear this case up	18:55
TheJulia	because I think on failure, that does get reset	18:55
JayF	jrosser: mind trying something for uS?	18:55
jrosser	sure	18:56
TheJulia	or at least, with cleaning is should	18:56
JayF	jrosser: reset the node to managable state	18:56
jrosser	ok, done	18:56
JayF	nope, that hypothesis is bunk TheJulia	18:56
JayF	https://paste.opendev.org/show/bkMzulFg0D4nm9zZqB41/	18:56
TheJulia	openstack baremetal node show ?	18:56
JayF	target_provision_state is null in the before photo	18:56
JayF	dii['clean_steps'] is not though	18:57
JayF	which is what I think could be the culprit but I'm not sure	18:57
TheJulia	ugh	18:57
JayF	TheJulia: we can just have jrosser wipe all of dii, right?	18:57
TheJulia	yes	18:58
JayF	jrosser: so now you'll run this:	18:58
JayF	update nodes set driver_internal_info=None where uuid='THE NODES UUID' limit 1;	18:58
JayF	be 1000000% sure that's right or you'll break some other node	18:58
JayF	I do not believe there is a non-DB way to blank driver_internal_info :(	18:58
TheJulia	doing that will at least identify where this is going sideways since it is not standing out in the code	18:58
TheJulia	there is not	18:58
TheJulia	we purge most of it in error handling stuffs that make sense	18:59
TheJulia	but yeah...	18:59
JayF	jrosser: then, after you've wiped driver_internal_info on the node; lets do another before picture (node show -> pastebin) then do the provide again	18:59
jrosser	ERROR 1054 (42S22): Unknown column 'None' in 'field list'	19:02
jrosser	oh hmm	19:02
JayF	oh	19:02
JayF	null	19:02
JayF	not None	19:02
JayF	sorry, it's null in mysql, not `None`	19:02
jrosser	ok done	19:03
JayF	aight, then like I said, do a node show -> pastebin	19:03
JayF	so we can have a snapshot before state for this test	19:03
JayF	then do the provide	19:03
JayF	and it'll succeed this time. Maybe. We hope. Perhaps.	19:03
JayF	Cross your fingers :D	19:03
jrosser	https://paste.opendev.org/show/b853jZDgzdhsYrsrAHKN/	19:05
* TheJulia is hoping lots		19:14
jrosser	same error :(	19:18
TheJulia	Ugh	19:19
jrosser	i am kind if tired now and it's late	19:19
JayF	I'll be here in the morning :) Thanks for working withus	19:19
jrosser	i will do that again tomorrow, as first time the host dropped to the uefi shell and didnt boot	19:19
jrosser	and i can't be certain if that was my fault or not	19:19
TheJulia	jrosser: get some rest, perhaps log.debug(task.node) before the exception is raised	19:19
JayF	I don't see anything else in the "before" node show that would indicate to me that something in the node object is busted	19:20
JayF	so it's gotta be environmental :(	19:20
jrosser	thankyou very much for your help, i can't help feeling theres something really silly going on here	19:20
JayF	like TheJulia's scary hypothesis	19:20
JayF	jrosser: it shouldn't be possible to get Ironic in this broken of a state, even if misconfigured	19:20
JayF	jrosser: unless you find out it's been flapping between two or three DB servers mid-cleaning :P	19:20
JayF	lol	19:21
JayF	I'm sayin', almost certainly Ironic's fault.	19:21
opendevreview	Julia Kreger proposed openstack/ironic master: Catch any exception for Cleaning https://review.opendev.org/c/openstack/ironic/+/866933	19:58
TheJulia	stevebaker[m]: By chance have you looked at https://zuul.opendev.org/t/openstack/build/1172af245e204813a34d2fc3d61ef5c1 ?	20:07
TheJulia	rpittau: fyi https://review.opendev.org/c/openstack/ironic/+/866780 is failing, looks like job configuration on the branch	20:07
TheJulia	stevebaker[m]: rloo: hjensas: Reviews on https://review.opendev.org/c/openstack/ironic-specs/+/861803 would be appreciated	20:08
stevebaker[m]	TheJulia: that unit test fail looks very much related to the change. I'll take a look	20:09
TheJulia	thanks	20:09
TheJulia	yeah, I saw the ipmitool octets and went "rutro"	20:09
rloo	TheJulia: yes, I started it this morning and then got side-tracked. Have it down to do tomorrow morning (hopefully no interruptions tomorrow...)	20:12
TheJulia	okay	20:12
TheJulia	Thanks	20:12
stevebaker[m]	TheJulia: ah, those tests are testing the default boot mode, which is bios on xena	20:30
TheJulia	so question comes to mind, do we stop backports at that point then?	20:32
stevebaker[m]	TheJulia: the test tests default boot mode, explicit uefi, then explicit bios. The fix actually removes the test_management.py changes from the change	20:35
stevebaker[m]	So I think its ok to backport imo	20:35
opendevreview	Steve Baker proposed openstack/ironic stable/xena: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866627	20:36
TheJulia	stevebaker[m]: ack ack	20:38
opendevreview	Steve Baker proposed openstack/ironic stable/wallaby: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866628	20:39
opendevreview	Steve Baker proposed openstack/ironic stable/wallaby: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866628	20:41
JayF	TheJulia: are stevebaker[m] ^^ patches the new version of that one I originally NACK'd?	21:01
JayF	or is this yet-another-kinda-breaking-change? :(	21:01
* JayF put a -1 and a question on the first backport in that chain		21:07
JayF	As discussed in Monday's meeting, I've proposed a change to project-config to give access to a new group, ironic-release, to delete bugfix branches and create new tags (so we can tag bugfix-xy-eol and delete bugfix/x.y branch)	21:31
JayF	https://review.opendev.org/c/openstack/project-config/+/866937	21:31
TheJulia	JayF: different one	21:31
JayF	Ack; I mean, this is 2x in a week for iRMC driver?	21:32
JayF	Are we not having a high enough review bar for those changes?	21:32
TheJulia	they have a ton of changes in review right now	21:32
JayF	If we think this is OK to land, can someone make that case in the merge request in reponse to my comment	21:32
TheJulia	it is not about a bar, it is about addressing/fixing issues most appropriately	21:32
JayF	and lets get all the breaky stuff landed together and cut a minor-version-bump release	21:32
JayF	This particular default boot mode fix looks like a straight up bug that we should've caught in review.	21:34
JayF	and "we" is probably literal because there's a good chance I was a reviewer on it	21:34
TheJulia	my concious grok which is not deep was that they were not honoring the framework as the driver didn't get updated as ironic evolved	21:36
TheJulia	but.... I could be in left field, under some rocks for all I know right now	21:36
JayF	I'm still at google maps trying to get directions to the ballpark ;)	21:36
TheJulia	oh good, it is not just me	21:37
JayF	I hadn't even thought about how easy it is for bitrot to break these drivers	21:39
JayF	and we don't even track or think about that at all in any kind of structured way	21:40
JayF	this goes back to the discussion about iRMC driver having it's own bios deal, and not doing the bios interface, bceause it predates	21:40
stevebaker[m]	JayF: any existing irmc node with explicit boot mode will not be affected by this backport the irmc docs are clear about boot mode being explicitly set on the node. Not setting the boot mode on the node is has undefined behaviour, that is what the backport fixes	21:40
JayF	undefined meaning, undocumented or unreliable?	21:41
JayF	This is the way I posed the question to TheJulia about the other, similar kinda-breaky change: is there any chance a reasonable operator would have this working reliably the way they want it to and that the change could break it	21:42
JayF	sounds like in this case it would be:	21:42
JayF	- boot_mode unset on the node	21:42
JayF	- default_boot_mode set in config	21:42
JayF	- node is using !default boot_mode reliably	21:42
JayF	- operator wants the node to be using the nondefault boot_mode, despite never setting it explicitly	21:42
JayF	Do I understand it properly?	21:42
JayF	If so, I think this fits like the other one: as long as we have an extremely straightforward release note warning operators the behavior is changing, it should be OK. And I have a strong preference for us cutting a minor-version-bump release of any impacted branches as soon as this (and the other breaky bugfix that Julia was proposing) lands	21:43
JayF	stevebaker[m]: ^ can you ack my understanding is correct? Then one of us put this in the merge req so it's documnted and I'll flip my vote	21:43
TheJulia	I think the inverse, not explicitly stating it thus undefined result on the back end based upon ?last? or prior hardware state	21:44
TheJulia	sort of like what you get by default iwth IPMI if you don't do the raw magic	21:44
JayF	I don't like that scenario as much	21:44
JayF	because it means there's no way for an operator to get the existing behavior post-bugfix	21:44
TheJulia	ahh, but steve said the docs said the are supposed to be explicit	21:45
JayF	and I stopped assuming if certain configurations made sense after working at places that did things that only made sense in downstream context	21:45
TheJulia	this handles when the humand didn't do the thing	21:45
TheJulia	as I understand it	21:45
JayF	what I'm saying though, is right now, if you're running Zed	21:45
JayF	and you're in the impacted group	21:45
JayF	you're getting "we boot with the last boot mode" behavior. Which is not awesome IMO, but if it reliably behaves that way	21:45
TheJulia	well, if you didn't follow the docs, your impacted regardless	21:45
JayF	there's no way for an oper to restore that behavior	21:45
stevebaker[m]	JayF: It looks like defaults to bios when not explicit. default_boot_mode is bios for W, X, and uefi for Y, Z	21:46
JayF	e.g. if it was always bios or always uefi; we could at least dictate how to change it	21:46
JayF	OK; that is nicer then.	21:46
JayF	Can we ensure the release note explcitly tells an operator how to get existing behavior back if they want?	21:46
TheJulia	I'm not sure, if documented to work differently, that the existing behavior was valid in a sense.	21:46
TheJulia	without being set...	21:46
TheJulia	that is	21:47
JayF	I guess I should explicitly state	21:47
JayF	if someone is in this configuration	21:47
JayF	you have to assume they aren't reading docs, right?	21:47
TheJulia	the docs are dense, they could have just missed it	21:49
JayF	yeah, that's fair	21:49
JayF	I think what I said above and re: the previous change is still reasonable, make sure we have clear, directive release notes to operators who were desiring the previous behavior (or were in the uncertain config)	21:49
JayF	and then lets cut a minor version bump release as soon as those two pieces are in	21:50
stevebaker[m]	safest release note advice would be to set capablities boot_mode:bios explicitly for any node which doesn't have it set. And this note would only be required on Y, Z	21:50
JayF	and maybe, maybe consider not backporting this beyond the point where we can't "fit" in a new minor version :/	21:50
stevebaker[m]	JayF: W, X is actually safer because default_boot_mode is bios?	21:51
JayF	So really, Y and Z are the only ones that are changing behavior in a real way	21:51
JayF	I cringe b/c I don't think we can minor-version-bump Y	21:51
JayF	but that is just an unfortunate reality	21:51
JayF	I just noticed a misconfiguration in ironic-stable-maint; I'm going to be taking immediate action to rectify it, and will email the list.	21:54
JayF	We have core-reviewers-emeritus still on the ironic-stable-maint group	21:54
* TheJulia raises an eyebrow		21:54
JayF	I'm actually going to purge the whole list; we have ironic-stable-maint inheriting from ironic-core	21:55
JayF	so anything left in the list for ironic-stable-maint is just asking to be forgotten about	21:55
stevebaker[m]	JayF: So just clarifying, we can go ahead with the backports. Y, Z will get a clear warning in the release note to set the boot mode on existing nodes. W, X can have a softer warning because defaults result in no change?	21:56
JayF	I'm onboard with that	21:57
JayF	just make sure the release note says what you'd tell an operator if they were like "Steve, what happened to my node?" :D	21:57
stevebaker[m]	JayF: maybe in a new "upgrade:" release note section?	21:58
JayF	+++++	21:58
stevebaker[m]	STEVE, MY NODES!	21:58
TheJulia	JayF: when do you want to look at the RBAC stuffs?	21:59
JayF	uh	21:59
JayF	give me a minute to finish sending a message to the list about the stable stuff	21:59
JayF	then nowish is fine w/me?	21:59
TheJulia	uhh, maybe in a little bit	21:59
TheJulia	I'm reviewing the api change now	22:00
JayF	tomorrow is fine, a breadcrumb in the review about where to go look to run tests is fine	22:00
JayF	I know you're busy :D	22:00
TheJulia	sure	22:00
opendevreview	Steve Baker proposed openstack/ironic stable/zed: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866625	22:12
stevebaker[m]	JayF: let me know what you think of ^^ wording	22:12
JayF	TheJulia: hmm, could jrosser's issue be like, a stuck-open transaction	22:18
JayF	TheJulia: does that even make sense as a thing to suggest?	22:18
JayF	stevebaker[m]: +1'd it, only not a +2 because I only looked at it from an impact perspective right now	22:24
TheJulia	eh... I don't think so, but it is curious if the conductor has restarted or not	22:24
JayF	I was just thinking that an open transaction, modifying that node, might be an explainable reason why you'd be getting weird results back after you SHOULD see the node in a new position ... but I guess that hypothesis is 100% negated by the fact the node is in a decent state between runs	22:27
opendevreview	Verification of a change to openstack/ironic master failed: Fix debug log message argument formatting https://review.opendev.org/c/openstack/ironic/+/866856	22:28
opendevreview	Steve Baker proposed openstack/ironic stable/yoga: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866626	22:40
opendevreview	Steve Baker proposed openstack/ironic stable/xena: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866627	22:40
opendevreview	Steve Baker proposed openstack/ironic stable/wallaby: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866628	22:43
opendevreview	Steve Baker proposed openstack/ironic bugfix/20.2: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866780	22:44
opendevreview	Steve Baker proposed openstack/ironic bugfix/21.0: Align iRMC driver with Ironic's default boot_mode https://review.opendev.org/c/openstack/ironic/+/866656	22:45
opendevreview	Jay Faulkner proposed openstack/ironic master: DB & Object layer for node.shard https://review.opendev.org/c/openstack/ironic/+/864236	23:27
opendevreview	Jay Faulkner proposed openstack/ironic master: API support for CRUD node.shard https://review.opendev.org/c/openstack/ironic/+/866946	23:27
TheJulia	JayF: rutro https://review.opendev.org/c/openstack/ironic/+/866946/1	23:30
TheJulia	check the enitre commit message	23:31
JayF	lolsob	23:31
JayF	gimme a sec, I think I can "fix"	23:31
TheJulia	yeah, delete the new changeid line	23:31
opendevreview	Jay Faulkner proposed openstack/ironic master: DB & Object layer for node.shard https://review.opendev.org/c/openstack/ironic/+/864236	23:32
opendevreview	Jay Faulkner proposed openstack/ironic master: API support for CRUD node.shard https://review.opendev.org/c/openstack/ironic/+/866235	23:32
JayF	okay; fixed	23:32
JayF	ty for pointing that out before it was too long on the split brain lol	23:33
JayF	gerrit telling on me, and how I usually manage multi-patch setups	23:33
JayF	(fixup things patch-by-patch, then rebase them back into the correct commits)	23:33
TheJulia	I go patch by patch and just cherry-pick the next patch on top and go from there	23:36
JayF	Yeah; I think I am a little less scared of rebasing than most folks. I try to avoid telling people my git workflow because they either want to adopt it or change it.	23:39
JayF	lol	23:39

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!