Monday, 2024-12-02

opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629305:14
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629305:36
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629306:16
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629306:43
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629306:52
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629307:29
rpittaugood morning ironic! o/08:25
adam-metal3Hello ironic. Was there a baremetal networking working group meeting held on the 20th of November or any other day?09:21
adam-metal3I was on PTO and couldn't really check and looking at the e-mail thread I am not sure 09:22
rpittauadam-metal3: I believe so, last notes are here https://etherpad.opendev.org/p/ironic-networking09:33
adam-metal3rpittau, thanks!09:33
rpittaunp :)09:33
iurygregorygood morning Ironic10:52
opendevreviewMerged openstack/ironic master: Allow setting of disable_power_off via API  https://review.opendev.org/c/openstack/ironic/+/93474012:25
dtantsurderekh__: yay ^^12:27
iurygregory\o/12:43
derekh__nice :-) 12:46
opendevreviewVerification of a change to openstack/metalsmith master failed: CI: Remove metalsmith legacy jobs  https://review.opendev.org/c/openstack/metalsmith/+/93315413:13
opendevreviewVerification of a change to openstack/ironic stable/2024.2 failed: Use specific fix-commit from dnsmasq  https://review.opendev.org/c/openstack/ironic/+/93620514:11
rpittau#startmeeting ironic15:00
opendevmeetMeeting started Mon Dec  2 15:00:06 2024 UTC and is due to finish in 60 minutes.  The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'ironic'15:00
rpittaummm I wonder if we'll have quorum today15:00
rpittauanyway15:00
rpittauHello everyone!15:00
rpittauWelcome to our weekly meeting!15:00
rpittauThe meeting agenda can be found here:15:00
rpittauhttps://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_December_02.2C_202415:00
rpittaulet's give it a couple of minutes for people to join15:00
iurygregoryo/15:01
TheJuliao/15:01
TheJuliaWe likely need to figure out our holiday meeting schedule15:01
rpittauyeah, I was thinking the same15:01
kubajjo/15:02
cido/15:02
rpittauok let's start15:02
rpittau#topic Announcements/Reminders15:02
rpittau#topic Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio:15:02
rpittau#link https://tinyurl.com/ironic-weekly-prio-dash15:02
rpittauthere are some patches needing +W when any approver has a moment15:03
adam-metal3o/15:03
rpittau#topic 2025.1 Epoxy Release Schedule 15:04
rpittau#link https://releases.openstack.org/epoxy/schedule.html15:04
rpittauwe're at R-17, nothing to mention except I'm wondering if we need to do some releases15:04
rpittauwe had ironic and ipa last week, I will go through the other repos and see where we are15:04
TheJuliaI expect, given holidays and all, the next few weeks will largely be focus time for myself15:05
rpittauand I have one more thing, I won't be available for the meeting next week as I'm traveling, any volunteer to run the meeting?15:05
iurygregoryI can't since I'm also traveling 15:06
TheJuliaI might be able to15:06
JayFMy calendar is clear if you want me to15:06
TheJuliaI guess a question might be, how many folks will be available next monday?15:06
rpittauTheJulia, JayF, thanks, either of you is great :)15:07
rpittauoh yeah15:07
JayFI'll make a reminder to run it next Monday. Why wouldn't we expect many people around?15:07
rpittauI guess there will be at least 3 fewer people15:07
JayFOh that's a good point. But I wonder if it's our last chance to have a meeting before the holiday, and I think we're technically supposed to have at least one a month15:07
rpittauJayF: dtantsur, iurygregory and myself are traveling15:07
rpittauwe can have a last meeting the week after15:07
rpittauthen I guess we skip 2 meetings15:08
rpittauthe 23rd and the 30th15:08
rpittauand we get back to the 6th15:08
TheJuliaI'll note, while next week will be the ?9th?, the following week will be the 16th, and I might not be around15:08
JayFI will personally be out ... For exactly those two meetings15:08
dtantsurI'll be here on 23rd and 30th if anyone needs me15:08
dtantsurbut not the next 2 weeks15:08
TheJuliaSafe travels!15:09
rpittauthanks :)15:09
rpittauso tentative last meeting the 16th ?15:09
rpittauor the 23rd? I may be able to make it15:09
TheJuliaLets do the 16th15:10
TheJuliaI may partial week it, I dunno15:10
rpittauperfect15:11
rpittauI'll send an email out also as reminder/announcement15:11
JayFYeah I like the idea of just saying the 16th is our only remaining meeting of the month. +115:11
rpittaucool :)15:11
TheJuliaI really like that idea, skip next week, meet on the 16th, take over the world15:11
TheJuliaetc15:11
TheJuliaAlso, enables time for folks to focus on feature/work items they need to move forward15:12
rpittaualright moving on15:13
rpittau#topic Discussion topics15:13
rpittauI have only one for today15:13
rpittauwhich is more an announcement15:13
rpittau#info CI migration to ubuntu noble has been completed15:13
rpittauso far so good :D15:14
rpittauanything else to discuss today? 15:14
jandersI've got one item, if there is time/interest15:14
jandersservicing related15:14
janders(we also have a good crowd for this topic)15:14
rpittaujanders: please go ahead :)15:14
jandersgreat :)15:15
janders(in EMEA this week so easier to join this meeting)15:15
TheJuliao/ janders15:15
jandersso - iurygregory and I ran into some issues with firmware updates during servicing15:15
jandersthe kind of issues I wanted to talk about is related to BMC responsiveness issues during/immediately after15:15
TheJuliaOkay, what sort of issues?15:16
TheJuliaand what sort of firmware update?15:16
* iurygregory thanks HPE for saying servicing failed because the bmc wasn't accessible 15:16
jandersHTTP error codes in responses (400s, 500s, generally things making no sense)15:16
jandersI think BMC firmware update was the more problematic case (which makes sense)15:17
TheJuliai know idracs can start spewing 500s if the FQDN is not set properly15:17
jandersbut then what happens is the update succeeds but Ironic thinks it failed because it got a 400/500 response when the BMC was booting up and talking garbage in the process15:17
janders(if it remained silent and not responding it would have been OK)15:17
iurygregoryhttps://paste.opendev.org/show/bdrsgYzFECwvq5O3hQPb/15:18
iurygregorythis was the error in case someone is interested =) 15:18
jandersbut TL;DR I wonder if we should have some logic saying "during/after BMC firmware upgrade, disregard any 'bad' BMC responses for X seconds"15:18
TheJuliaThere is sort of a weird, similar issue NobodyCam has encountered with his downstream where after we power cycle, the BMCs sometimes also just seem to pack up and go on vacation for a minute or two15:18
iurygregoryin this case it was about 3min for me15:19
TheJuliaStep-wise, we likely need to... either implicitly or have an explicit step which is "hey, we're going to get garbage responses, let's hold off on the current action until the $thing is ready"15:19
iurygregorybut yeah, seems similar15:19
jandersit's not an entirely new problem but the impact of such BMC (mis)behaviour is way worse in day2 ops than day115:19
jandersit is annoying when it happens on a new node being provisioned15:19
TheJuliaor in service15:19
adam-metal3I have seen similar related to checking power states15:20
TheJuliabecause these are known workflows and... odd things happening are the beginning of the derailment15:20
jandersit is disruptive if someone has prod nodes in scheduled downtime (and overshoots the scheduled downtime due to this)15:20
TheJuliawe almost need a "okay, I've got a thing going on", give the node some grace or something flag15:20
TheJuliaor "don't take new actions, or... dunno"15:20
jandersTheJulia++15:21
TheJuliaI guess I'm semi-struggling to figure out how we would fit it into the model and avoid consuming a locking task, but maybe the answer *is* to lock it15:21
TheJuliaand hold a task15:21
janderslet me re-read the error Iury posted to see what Conductor was exactly trying to do when it crapped out15:21
TheJuliawe almost need a "it is all okay" once "xyz state is achived"15:22
jandersOK so in this case it seems like the call to the BMC came from within the step15:22
TheJulianobodycam's power issue makes me want to hold  a lock, and have a countdown timer of sorts15:22
jandersbut I wonder if it is possible that we hit issues with a periodic task or something15:22
TheJuliaWell, if the task holds a lock the entire time, the periodic task can't run.15:22
jandersTheJulia I probably need to give it some more thought but this makes sense to me15:23
TheJuliauntil the lock releases it15:23
jandersiurygregory dtantsur WDYT?15:23
TheJuliait can't really be a background periodic short of adding a bunch more interlocking complexity15:23
TheJuliabecause then step flows need to resume15:23
TheJuliawe kind of need to actually block in these "we're doing a thing" cases15:23
TheJuliaand in nobodycam's case we could just figure out some middle ground which could be turned on for power actions15:24
jandersyeah it doesn't sound unreasonable15:24
jandersto do this15:24
TheJuliaI *think* his issue is post-cleaning or just after deployment, like the very very very last step15:24
iurygregoryI think it makes sense15:24
TheJuliaI've got a bug in launchpad which lays out that issue15:24
TheJuliabut I think someone triaged it as incomplete and it expired15:24
iurygregoryoh =(15:25
jandersI think this time we'll need to get to the bottom of it cause when people start using servicing in anger (and also through metal3) this is going to cause real damage15:25
janders(overshooting maintenance windows for already-deployed nodes is first scenario that comes to mind but there will likely be others)15:26
TheJuliahttps://bugs.launchpad.net/ironic/+bug/206907415:26
TheJuliaOvershooting maintenance windows is inevitable15:26
TheJuliathe key is to keep the train of process from derailing15:26
TheJuliaThat way it is not the train which is the root cause15:27
janders"if a ironic is unable to connect to a nodes power source" - power source == BMC in this case?15:27
TheJuliayes15:27
TheJuliaI *think*15:27
jandersthis rings a bell, I think this is what crapped out inside the service_step when iurygregory and I were looking at it15:27
TheJuliathey also have SNMP PDUs in that environment, aiui15:27
TheJuliaoh, so basically same type of issue15:28
janders(this level of detail is hidden under that last_error)15:28
jandersyeah15:28
iurygregorynot during service, but cleaning in an HPE15:28
iurygregorybut yeah same type of issue indeed15:28
jandersthank you for clarifying iurygregory15:28
iurygregorynp =)15:28
TheJuliayeah, I think I semi-pinned it to the issue that I thought it was15:28
jandersyeah it feels like we're missing the "don't depend on responses from BMC while mucking around with its firmware" bit15:29
jandersin a few different scenarios15:29
TheJuliawell, or in cases where the bmc might also be taking a chance to reset/reboot itself15:29
TheJuliaat which point, it is no longer a stable entity until it returns to stability15:29
jandersok so from our discussion today it feels 1) the issue is real and 2) holding a lock could be a possible solution - am I right here?15:30
TheJuliaWell, holding a lock prevents things from moving forward15:30
TheJuliaand prevents others from making state assumptions15:30
TheJuliaor other conflicting instructions coming in15:30
jandersyeah15:31
TheJuliaiurygregory: was https://paste.opendev.org/show/bdrsgYzFECwvq5O3hQPb/'s 400 a result of power state checking?15:31
TheJuliapost-action15:31
TheJulia?15:31
iurygregoryNeed to double check15:32
iurygregoryI can re-run things later and gather some logs15:32
jandersso the lock would be requested by code inside the step performing firmware operation in this case (regardless of whether day1 or day2) and if BMC doesn't resume returning valid responses after X seconds we fail the step and release the lock?15:33
TheJuliaYeah, I think this comes down to a "we're in some action like a power state change in a workflow, we should be able to hold, or let the caller know we need to wait until we're getting a stable response"15:33
TheJuliajanders: the task triggering the step would already hold a lock (the node.reservation field) through the task.15:33
dtantsurI think we do a very similar thing with boot mode / secure boot15:34
TheJuliaYeah, if the BMC never returns from "lunch" we eventually fail15:34
jandersdtantsur are you thinking about the code we fixed together in sushy-oem-idrac?15:34
dtantsuryep15:34
jandersor is this in the more generic part?15:34
jandersOK, I understand, thank you15:35
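A rough sketch of the "hold the lock and wait for the BMC to come back" idea discussed above, assuming a hypothetical wait_for_bmc_ready() helper called from inside a service/clean step while the node lock (node.reservation) is already held; the endpoint, timeouts and retry policy here are illustrative, not Ironic's actual API:

```python
import time

import requests


def wait_for_bmc_ready(session, bmc_url, timeout=600, interval=15):
    # After a BMC firmware update the BMC may reboot and briefly answer
    # with 4xx/5xx garbage (or nothing at all). Rather than failing the
    # step on the first bad response, keep the task lock held and poll
    # until the Redfish service root answers cleanly, or give up after
    # `timeout` seconds.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            resp = session.get(bmc_url + '/redfish/v1/', timeout=10)
            if resp.status_code == 200:
                return
        except requests.RequestException:
            pass  # BMC still rebooting; treat as "not ready yet"
        time.sleep(interval)
    raise TimeoutError('BMC did not return to a stable state within '
                       '%s seconds' % timeout)
```

Usage would be something like wait_for_bmc_ready(requests.Session(), 'https://bmc.example.com') right after triggering the firmware update, before any power-state checks run.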
rpittaugreat :)15:36
rpittauanything more on the topic? or other topics to discuss?15:37
dtantsuriurygregory and I could use some ideas15:37
jandersI think this discussion really helped my understanding of this challenge and gives me some ideas going forward, thank you!15:37
jandersdtantsur yeah 15:37
dtantsurwhat on Earth could make IPA take 90 seconds to return any response for any API (including the static root)15:37
iurygregoryyeah, I'm racking my brain trying to figure out this one15:38
dtantsureven on localhost!15:38
jandershmm it's always the DNS right? :) 15:39
dtantsurit could be DNS..15:39
TheJuliadns for logging?15:39
dtantsuryeah, I recall this problem15:40
janderssaying that tongue-in-cheek since you said localhost but hey 15:40
jandersmaybe we're onto something15:40
TheJuliaaddress of the caller :)15:40
janderswhat would be default timeout on the DNS client in question?15:41
TheJuliaThis was also a thing which was "fixed" at one point ages ago by monkeypatching eventlet15:41
TheJuliaerr, using eventlet monkeypatching15:41
iurygregoryI asked them to check if the response time was the same using the name and the IP, and the problem always repeats; I also remember someone said some requests took 120 sec =)15:41
JayFThat problem was more or less completely excised, the one that was fixed with more monkey patching15:42
JayFI really like the hypothesis of inconsistent or non-working DNS. There might even be some differences in behavior between what distribution you're using for the ramdisk in those cases.15:43
dtantsurIt's a RHEL container inside CoreOS15:43
janderscould tcpdump help confirm this hypothesis?15:43
TheJuliajanders: likely15:43
janders(see if there are DNS queries on the wire)15:43
TheJuliajanders: at a minimum, worth a try15:43
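For context on the DNS-for-logging hypothesis: if the access-log path does a reverse lookup of the caller's address and the resolver is unreachable, every request pays the resolver timeout. A minimal, self-contained illustration of that failure mode, not IPA's actual logging code:

```python
import socket
import time


def log_client(addr):
    # Reverse-resolving the caller's address for an access-log line; with an
    # unreachable or slow resolver this blocks for the resolver timeout on
    # every request (whether 127.0.0.1 short-circuits this depends on the
    # nsswitch configuration of the ramdisk/container).
    start = time.monotonic()
    try:
        host = socket.gethostbyaddr(addr)[0]
    except OSError:
        host = addr
    print('client=%s resolved in %.1fs' % (host, time.monotonic() - start))


log_client('127.0.0.1')
```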
rpittauanything else to discuss? :)15:46
adam-metal3I have a question if I may15:46
rpittauadam-metal3: sure thing15:46
rpittauwe still have some time15:46
adam-metal3I have noticed an interesting behaviour with ProLiant DL360 Gen10 Plus servers; as you know, IPA registers a UEFI boot record under the name ironic-<somenumber> by default15:47
TheJuliaUnless there is a hint file, yeah15:48
TheJuliawhats going on?15:48
adam-metal3On the machine type I have mentioned, this record gets saved during every deployment, so if you deploy and clean 50 times you have 50 of these boot devices visible15:48
TheJuliaoh15:48
TheJuliaheh15:48
TheJuliauhhhhh15:48
TheJuliaSteve wrote a thing for this15:49
adam-metal3as far as tests done by downstream folks indicate, there is no serious issue15:49
adam-metal3but it was confusing a lot of my downstream folks15:49
* iurygregory needs to drop, lunch just arrived15:50
TheJuliahttps://review.opendev.org/c/openstack/ironic-python-agent/+/91456315:50
adam-metal3Okay so I assume then it is a known issue, that is good!15:50
TheJuliaYeah, so... Ideally the image you're deploying has a loader hint15:51
TheJuliain that case, the image can say what to use, because some shim loaders will try to do record injection as well15:51
TheJuliaand at one point, that was a super bad bug on some intel hardware15:51
TheJuliaor, triggered a bug... is the best way to describe it15:51
TheJuliaIronic, I *think*, should be trying to clean those entries up in general, but I guess it would help to better understand what you're seeing, and compare to a deployment log, since the code is *supposed* to dedupe those entries if memory serves15:52
TheJuliaadam-metal3: we can continue to discuss more as time permits15:52
adam-metal3sure15:53
TheJuliawe don't need to hold the meeting for this, I can also send you a pointer to the hint file15:53
adam-metal3thanks!15:53
rpittauwe have a couple of minutes left, anything else to discuss today? :(15:53
rpittauerrr15:53
rpittau:)15:53
rpittaualright I guess we can close here15:55
rpittauthanks everyone!15:55
rpittau#endmeeting15:55
opendevmeetMeeting ended Mon Dec  2 15:55:32 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:55
opendevmeetMinutes:        https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-12-02-15.00.html15:55
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-12-02-15.00.txt15:55
opendevmeetLog:            https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-12-02-15.00.log.html15:55
jandersthank you all o/15:55
jandersgreat to be able to join the meeting in real time for a change15:55
janders(and sorry for being few minutes late)15:55
dtantsursooo, folks. When running get_deploy_steps, we somehow end up running some real code in IPA. That involves 'udevadm settle'. That consistently takes 2 minutes on their machine.15:56
dtantsurwhat. the. hell.15:56
rpittauudevadm settle takes 2 minutes? wow15:57
janderscrazy - but an awesome find15:57
jandersgotta drop for a bit again, back a bit later15:57
TheJuliaadam-metal3: so shim, by default, looks for a BOOTX64.CSV file as a hint; I think it is expected in the folder shim is in, so on a Fedora machine it is /boot/efi/EFI/fedora/BOOTX64.CSV, and IPA will look for a file like this and use it as the basis for the records to set, replacing the ironic-$num behavior15:58
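For reference, a hedged sketch of reading such a hint file, assuming (as I recall) shim's BOOTX64.CSV layout of UTF-16-encoded, comma-separated "loader,label,extra…" records; this is not IPA's actual parser:

```python
def read_boot_hint(csv_path):
    # shim's BOOTXX.CSV hint files are (to the best of my knowledge)
    # UTF-16 encoded, one comma-separated record per line:
    # loader filename, boot entry label, then extra parameters/description.
    with open(csv_path, 'rb') as f:
        text = f.read().decode('utf-16')
    entries = []
    for line in text.splitlines():
        fields = line.split(',')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        entries.append({'loader': fields[0],
                        'label': fields[1],
                        'extra': fields[2:]})
    return entries
```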
rpittaudoes that mean that udevd is still syncing devices?15:58
TheJuliasyncing and waiting for settled device state15:58
TheJuliawhich might inform hardware managers what steps are actually available15:59
rpittauyep15:59
rpittaudtantsur: probably need systemd-udevd  logs to see what's taking that long16:01
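Worth noting that 'udevadm settle' blocks until the udev event queue drains, and its default timeout is (if memory serves) 120 seconds, which would line up with the consistent 2-minute stalls. A hedged sketch of capping the wait explicitly; illustrative only, not IPA's actual invocation:

```python
import subprocess


def settle_udev(timeout=15):
    # 'udevadm settle' waits for the udev event queue to empty. Without an
    # explicit --timeout it can block for the full default (reportedly 120s)
    # if events keep arriving or a rule hangs; capping it keeps step listing
    # responsive at the cost of possibly seeing a not-yet-settled device list.
    subprocess.check_call(['udevadm', 'settle', '--timeout', str(timeout)])
```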
adam-metal3TheJulia: thanks, I will check if we have that file and in general how this process works, so far I have only checked the uefi tooling that IPA uses to set the record16:02
dtantsuralso caching hardware managers does not seem to work..16:02
adam-metal3the strange thing for me is that in my case it is always ironic-1, but 50 times; I find it strange that the same name can be saved any number of times16:03
TheJuliaadam-metal3: that sounds like it is saving or sits regarding a delete16:06
TheJuliaadam-metal3: we have seen a thing on some Lenovo hardware where, if changes are made in a very particular order, the machine reverts back to the last known UEFI boot variables16:09
TheJuliaWe had to move the delete before the save in that case because originally the code was add then cleanup16:10
TheJuliadtantsur: … that sounds like what was an old bug is now a new bug again16:11
adam-metal3TheJulia: interesting, I will need to ask around whether I can get my hands on a machine that exhibits these symptoms, otherwise I am not sure how else to play around with the UEFI16:24
TheJuliaadam-metal3: I suspect that is the only path, it does sound like something is going "off the rails". If you can get us agent logs and the efibootmgr -v output after deployment, it should be easier to wrap our heads around this16:30
rpittaugood night! o/16:45
dtantsuraahhhh, JayF's recent patch fixes the reason why evaluate_hardware_support is called on each get_deploy_steps: https://review.opendev.org/c/openstack/ironic-python-agent/+/920153/7/ironic_python_agent/hardware.py#350416:54
dtantsurif we pull that downstream, things will no longer break because of udevadm settle16:54
JayFUh16:55
JayFyou know before that change we called it like 5 times instead of 1 iirc16:55
dtantsurnot just that, we also call it every time get_deploy_steps is called16:56
dtantsurwhich is.. rather often16:56
JayFare you 10000% sure that's not cached?16:56
JayFI think it is.16:56
dtantsurAfter your patch, it is cached.16:56
JayFget_managers_detail() is cached16:56
dtantsurSee the link, before your patch it was a naked call to dispatch_to_all_managers('evaluate_hardware_support')16:57
JayFso we should /not/ be running it on each call to get_deploy_steps() unless that's the bug16:57
JayFoh, oh, oh16:57
dtantsurYep, except that we don't have your patch in that version of OpenShift16:57
JayFI thought you were saying *the opposite*16:57
JayFwhich is why I was so confused16:57
dtantsur:)16:57
JayFyes, this is a gross bug, and it's nice to see it had more real world impact16:57
JayFthat's also why we don't have so much crap logging from IPA anymore :)16:57
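The shape of the fix being discussed is essentially "evaluate hardware support once per agent lifetime and reuse the result". A rough, self-contained sketch of that caching pattern (hypothetical names, not the actual IPA code):

```python
import functools
import time


class GenericHardwareManager:
    def evaluate_hardware_support(self):
        # Stand-in for expensive probing such as shelling out to
        # 'udevadm settle' on real hardware.
        time.sleep(2)
        return 1


_MANAGERS = [GenericHardwareManager()]


@functools.lru_cache(maxsize=1)
def cached_hardware_support():
    # Evaluate every manager once and cache the result, so repeated
    # get_deploy_steps-style calls don't re-run the probing each time.
    return tuple((m.__class__.__name__, m.evaluate_hardware_support())
                 for m in _MANAGERS)


cached_hardware_support()  # slow the first time
cached_hardware_support()  # instant afterwards
```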
TheJuliadtantsur: rpittau: going back to the question from last week, I think what I'm sort of thinking of works. Ironic should prefer qcow2 images. IPA, should it get a URL directly, could prefer applehv, but that is a bridge to try and cross when we get there.16:58
dtantsurapplehv == raw?17:00
TheJuliavery much appears to be the case17:04
JayFIt really weirds me out how much of this is just "yeah it looks raw" rather than there being a documented standard :| 17:12
JayFI know that's not our fault, but it feels like something that's going to eventually cause headaches (if not already causing it)17:12
TheJuliaI think we should refine it and make "raw" our standard ;)17:20
TheJuliatbh17:20
TheJuliabut, one step at a time17:20
TheJuliawe first must crawl, then walk, then run17:20
dtantsurI think the crawl would be 1 layer (image), detect the type using our new shiny detector17:24
dtantsurIt feels like many conversations around this effort happen because we're trying to do the next step already (a payload containing different image types per architecture, etc)17:25
JayFTheJulia: I honestly like the idea of calling what we usually call "raw" "gpt" similar to what the glance as defender spec lays out17:26
dtantsurLayers also have a MIME content type, I'm curious why podman did not use it..17:27
JayFhttps://review.opendev.org/c/openstack/glance-specs/+/92511117:27
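On the detection side, a minimal sketch of magic-byte sniffing that distinguishes qcow2 from a GPT-partitioned "raw" image, in the spirit of the detector and the "call raw gpt" idea mentioned above; this is an illustration, not the actual detector code:

```python
QCOW2_MAGIC = b'QFI\xfb'      # first four bytes of a qcow2 header
GPT_SIGNATURE = b'EFI PART'   # GPT header signature at the start of LBA 1


def sniff_disk_image(path, sector_size=512):
    # qcow2 carries an explicit header magic; a "raw" image has no magic of
    # its own, so the closest positive signal is a GPT signature in the
    # second sector (assuming 512-byte logical sectors).
    with open(path, 'rb') as f:
        if f.read(4) == QCOW2_MAGIC:
            return 'qcow2'
        f.seek(sector_size)
        if f.read(8) == GPT_SIGNATURE:
            return 'gpt'
    return 'unknown'
```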
TheJuliadtantsur: because OCI spec mandates specific types17:27
TheJuliaand so you cannot make assumptions and doing some sanity checking upfront allows for "hey, you gave us bad data" instead of just falling over being unable to deploy17:28
dtantsurwhich part of the OCI spec do you mean?17:29
TheJuliaOCI image spec17:30
dtantsurI don't think the OCI spec has any explicit mentions of qcow2/applehv (or does it?)17:31
TheJuliait does not17:31
dtantsurThen it cannot mandate them?17:31
TheJuliaBut it does, if my memory is serving me, explicitly note all attached manifest data layers are treated as layers with the mandated data types17:31
TheJuliait's a structural aspect which mandates layer modeling17:32
dtantsursorry, I don't get it17:32
dtantsurit's up to a tool how to treat a certain layer17:32
dtantsuraha, the spec even shows an explicit "artifactType", I did not see it initially17:35
dtantsurso, looks like we could have a layer with "artifactType": "application/x-qemu-disk"17:35
* dtantsur still curious why podman did not use all this17:35
TheJuliahttps://github.com/opencontainers/image-spec/blob/main/image-layout.md#indexjson-file <-- I think that is the start of it17:35
* TheJulia looks up artifactType17:35
TheJuliahttps://github.com/opencontainers/image-spec/blob/main/artifacts-guidance.md <-- uses should17:36
TheJuliahmmm I like ArtifactType17:37
dtantsurIt sounds like we could use mediaType as well17:37
dtantsurfor those following along: https://github.com/opencontainers/image-spec/commit/749ea9a27d1eb44b5369ee7e8e296c7e99e3d2e517:38
dtantsurAh, there are two different things called mediaType. Thank you, not confusing at all.17:38
TheJuliaIt *looks* like there might be a lower level where we'd be able to note/annotate it, but I suspect they did it one level up so they didn't have to walk all the way down and then back up17:39
TheJuliaat least, I suspect17:39
TheJulia"they" in that guess being podman17:40
dtantsurOkay, I finally got it. On the top level of the index.json, mediaType is its own media type (a constant), artifactType is an optional type of the contained artifact.17:42
TheJuliayup17:42
dtantsurThen each manifest can have its own mediaType, which can be, well, anything. application/x-qemu-disk17:42
TheJuliaEach manifest after you make a decision right?17:42
TheJuliaso not second level, but third level down right?17:43
TheJuliabecause second level is where you have all of the varying types and just the pointers to the final manifests17:43
dtantsureven first?17:43
dtantsurwhat is preventing me from having https://paste.opendev.org/show/btSiycm241tCqJDkXUJt/ ?17:43
dtantsur("no existing tooling can product that" is a plausible answer, but let's leave it aside just for a minute)17:44
dtantsurImagine, I have an image with this index.json and exactly one blob17:44
dtantsurAm I missing something?17:44
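A hypothetical index.json of the kind being described here (illustrative only, not the contents of the paste linked above), expressed as a Python literal: a top-level OCI image index whose single manifest entry points straight at a qcow2 blob via mediaType/artifactType instead of podman's "disktype" annotation; the digest and size are placeholders:

```python
import json

index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            # A descriptor pointing directly at the qcow2 blob rather than
            # at an intermediate manifest; digest/size are placeholders.
            "mediaType": "application/x-qemu-disk",
            "artifactType": "application/x-qemu-disk",
            "digest": "sha256:" + "0" * 64,
            "size": 1234,
            "platform": {"architecture": "amd64", "os": "linux"},
        }
    ],
}

print(json.dumps(index, indent=2))
```

Whether existing registry and client tooling accepts a non-manifest blob referenced straight from the index is exactly the open question in the exchange that follows.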
TheJuliabecause under existing data structures, that would be a lower-layer artifact, and if we're going to be alongside containers which may also be bootable, we ideally want to be mindful of fitting in with other aspects instead of trying to create something entirely different in the same upper-level modeling17:45
dtantsurthere may be more manifests with other types17:46
TheJuliaIf we're going to try and carve an entirely new path here, I might as well stop and punt on this, to be entirely honest17:46
TheJuliayes, but they can't be index.json at that point, they would need to be other containers17:46
dtantsurMmm, I'm trying NOT to carve a new path17:46
TheJuliaindex.json is top level, the whole of the representation of a container17:46
TheJuliaMy whole driver here is to have a streamlined path so I could eventually have a single container which has a qcow2 file attached, and a bootable container17:47
dtantsurI'm looking at https://github.com/opencontainers/image-spec/blob/main/image-layout.md#index-example, and in this example they have two "normal" images as well as some AppStream XML17:47
TheJuliaand the user could then just choose based upon the deploy interface17:47
dtantsurIn fact, it's a root index that points at another index, a simpler manifest and just an artifact, right?17:49
TheJuliayeah, the third entry in that example is a definite standalone file17:49
TheJuliasecond is a manifest reference pointer17:49
dtantsurSo that could be our qcow2 alongside the proper container stuffs?17:49
TheJuliathe first is another index reference17:50
TheJuliapotentially, the question is the platform field and whether it can exist at that level17:52
TheJuliathe spec doc walks through what podman did, so it is a top-layer primary manifest pointer for the container itself, and then an index17:53
TheJuliainside that index, each manifest entry has platform and annotations to signify what the files are17:53
dtantsurhttps://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions lists platform17:54
TheJuliaThey point to a separate manifest file17:54
TheJuliawhich then lists the contents as a single layer (why did they do that?!)17:54
dtantsurYeah, I'm also curious about this indirection17:55
dtantsurnothing in the spec is telling me that I cannot have top-level artifacts with different architectures17:55
dtantsurmaybe simply because of tooling support?17:56
TheJuliaso it could be since I think all layers are expected to be z-streamed17:56
TheJuliamaybe that is why?!17:56
dtantsuri.e. they could create layers with podman easily but they could not create what I describe?17:56
TheJuliaso sort of a tooling convenience and extra transparent compression?17:56
dtantsuryeaah17:56
TheJuliayeah17:56
TheJuliaI kind of suspect some of it is that compression, some of it was explicit modeling of indirection and also trying to not have to walk all the way down17:57
dtantsurThe cost they're paying, though, is the non-standard "disktype" annotations17:57
TheJuliayup17:57
dtantsurI guess the key question is whether we want to mimic that (keeping in mind that they themselves may pivot from it one day)17:57
dtantsurNeed to do some exercising now. Sorry, I'm afraid I caused more confusion than I solved..17:59
TheJuliaI think they expect to have to if docker decides to do anything else. Perhaps a question back to ?Arron? would be: why not do it at a top-ish level (assuming the tools support it)17:59
TheJuliathe whole thing that made me raise an eyebrow is the container reference being expected17:59
TheJuliaI bet that is a compatibility aspect on index.json.17:59
dtantsurThe spec is sometimes vague on what is required and what is not17:59
TheJuliawhich drives the mid-level index to manifest linking17:59
TheJuliawhat is then also weird is that the lower-level index for machine-os also circles back and points at the same container manifest as well18:00
dtantsurThe case described in https://github.com/opencontainers/image-spec/commit/749ea9a27d1eb44b5369ee7e8e296c7e99e3d2e5 is remotely similar to ours, and they accepted it as a valid case, so there is hope18:00
TheJuliafrom 2023, I wonder if that was the original focus and they maybe pivoted?18:01
TheJuliawe're making lots of guesses18:01
dtantsurthe author does not seem to be from podman18:01
dtantsuryeah18:01
TheJuliaoh, hmm18:01
TheJuliaon a plus side, I've not written any code at this layer. Still trying to wire in the overall higher level changes18:03
dtantsur++18:04
* dtantsur leaves for real o/18:04
opendevreviewcid proposed openstack/ironic master: [WIP] Save ``configdrive`` in an auxiliary table  https://review.opendev.org/c/openstack/ironic/+/93362218:14
cardoePeek at how helm and other tools like ORAS use OCI for storage.19:30
iurygregorytime to setup bifrost in a fresh OS to double check that I'm not going crazy with firmware updates "not working" on stable/2023.2 =( 20:27
-opendevstatus- NOTICE: Gerrit will have a short outage while we update to the latest 3.9 release in preparation for our 3.10 upgrade on Friday21:31
TheJuliacardoe: do you happen to have a specific link to aid us in this :)21:52
JayFI had given up ever seeing these. https://usercontent.irccloud-cdn.com/file/b4gwar2p/PXL_20241202_223226218.jpg22:39
JayFFive nanoKVMs, ready for action :D 22:39
JayFTook about 2 months, and I had given up on getting anything for my money, but they are here. Hopefully they work!22:39
JayFLikely will be what I use to take a first stab at redfish console behavior22:43
JayF(they also support IPMI, gross)22:43
iurygregorynice!23:11
