Monday, 2024-12-02

opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629305:14
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629305:36
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629306:16
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629306:43
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629306:52
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93629307:29
rpittaugood morning ironic! o/08:25
adam-metal3Hello ironic. Was there a baremetal networking working group meeting held on the 20th of November or any other day?09:21
adam-metal3I was on PTO and couldn't really check and looking at the e-mail thread I am not sure 09:22
rpittauadam-metal3: I believe so, last notes are here https://etherpad.opendev.org/p/ironic-networking09:33
adam-metal3rpittau, thanks!09:33
rpittaunp :)09:33
iurygregorygood morning Ironic10:52
opendevreviewMerged openstack/ironic master: Allow setting of disable_power_off via API  https://review.opendev.org/c/openstack/ironic/+/93474012:25
dtantsurderekh__: yay ^^12:27
iurygregory\o/12:43
derekh__nice :-) 12:46
opendevreviewVerification of a change to openstack/metalsmith master failed: CI: Remove metalsmith legacy jobs  https://review.opendev.org/c/openstack/metalsmith/+/93315413:13
opendevreviewVerification of a change to openstack/ironic stable/2024.2 failed: Use specific fix-commit from dnsmasq  https://review.opendev.org/c/openstack/ironic/+/93620514:11
rpittau#startmeeting ironic15:00
opendevmeetMeeting started Mon Dec  2 15:00:06 2024 UTC and is due to finish in 60 minutes.  The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'ironic'15:00
rpittaummm I wonder if we'll have quorum today15:00
rpittauanyway15:00
rpittauHello everyone!15:00
rpittauWelcome to our weekly meeting!15:00
rpittauThe meeting agenda can be found here:15:00
rpittauhttps://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_December_02.2C_202415:00
rpittaulet's give it a couple of minutes for people to join15:00
iurygregoryo/15:01
TheJuliao/15:01
TheJuliaWe likely need to figure out our holiday meeting schedule15:01
rpittauyeah, I was thinking the same15:01
kubajjo/15:02
cido/15:02
rpittauok let's start15:02
rpittau#topic Announcements/Reminders15:02
rpittau#topic Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio:15:02
rpittau#link https://tinyurl.com/ironic-weekly-prio-dash15:02
rpittauthere are some patches needing +W when any approver has a moment15:03
adam-metal3o/15:03
rpittau#topic 2025.1 Epoxy Release Schedule 15:04
rpittau#link https://releases.openstack.org/epoxy/schedule.html15:04
rpittauwe're at R-17, nothing to mention except I'm wondering if we need to do some releases15:04
rpittauwe had ironic and ipa last week, I will go through the other repos and see where we are15:04
TheJuliaI expect, given holidays and all, the next few weeks will largely be focus time for myself15:05
rpittauand I have one more thing, I won't be available for the meeting next week as I'm traveling, any volunteer to run the meeting?15:05
iurygregoryI can't since I'm also traveling 15:06
TheJuliaI might be able to15:06
JayFMy calendar is clear if you want me to15:06
TheJuliaI guess a question might be, how many folks will be available next monday?15:06
rpittauTheJulia, JayF, thanks, either of you is great :)15:07
rpittauoh yeah15:07
JayFI'll make a reminder to run it next Monday. Why wouldn't we expect many people around?15:07
rpittauI guess there will be at least 3 fewer people15:07
JayFOh that's a good point. But I wonder if it's our last chance to have a meeting before the holiday, and I think we're technically supposed to have at least one a month15:07
rpittauJayF: dtantsur, iurygregory and myself are traveling15:07
rpittauwe can have a last meeting the week after15:07
rpittauthen I guess we skip 2 meetings15:08
rpittauthe 23rd and the 30th15:08
rpittauand we get back to the 6th15:08
TheJuliaI'll note, while next week will be the ?9th?, the following week will be the 16th, and I might not be around15:08
JayFI will personally be out ... For exactly those two meetings15:08
dtantsurI'll be here on 23rd and 30th if anyone needs me15:08
dtantsurbut not the next 2 weeks15:08
TheJuliaSafe travels!15:09
rpittauthanks :)15:09
rpittauso tentative last meeting the 16th ?15:09
rpittauor the 23rd? I may be able to make it15:09
TheJuliaLets do the 16th15:10
TheJuliaI may partial week it, I dunno15:10
rpittauperfect15:11
rpittauI'll send an email out also as reminder/announcement15:11
JayFYeah I like the idea of just saying the 16th is our only remaining meeting of the month. +115:11
rpittaucool :)15:11
TheJuliaI really like that idea, skip next week, meet on the 16th, take over the world15:11
TheJuliaetc15:11
TheJuliaAlso, enables time for folks to focus on feature/work items they need to move forward15:12
rpittaualright moving on15:13
rpittau#topic Discussion topics15:13
rpittauI have only one for today15:13
rpittauwhich is more an announcement15:13
rpittau#info CI migration to ubuntu noble has been completed15:13
rpittauso far so good :D15:14
rpittauanything else to discuss today? 15:14
jandersI've got one item, if there is time/interest15:14
jandersservicing related15:14
janders(we also have a good crowd for this topic)15:14
rpittaujanders: please go ahead :)15:14
jandersgreat :)15:15
janders(in EMEA this week so easier to join this meeting)15:15
TheJuliao/ janders15:15
jandersso - iurygregory and I ran into some issues with firmware updates during servicing15:15
jandersthe kind of issues I wanted to talk about is related to BMC responsiveness issues during/immediately after15:15
TheJuliaOkay, what sort of issues?15:16
TheJuliaand what sort of firmware update?15:16
* iurygregory thanks HPE for saying servicing failed because the bmc wasn't accessible 15:16
jandersHTTP error codes in responses (400s, 500s, generally things making no sense)15:16
jandersI think BMC firmware update was the more problematic case (which makes sense)15:17
TheJuliai know idracs can start spewing 500s if the FQDN is not set properly15:17
jandersbut then what happens is the update succeeds but Ironic thinks it failed because it got a 400/500 response when the BMC was booting up and talking garbage in the process15:17
janders(if it remained silent and not responding it would have been OK)15:17
iurygregoryhttps://paste.opendev.org/show/bdrsgYzFECwvq5O3hQPb/15:18
iurygregorythis was the error in case someone is interested =) 15:18
jandersbut TL;DR I wonder if we should have some logic saying "during/after BMC firmware upgrade, disregard any 'bad' BMC responses for X seconds"15:18
TheJuliaThere is sort of a weird, similar issue NobodyCam has encountered with his downstream where after we power cycle, the BMCs sometimes also just seem to pack up and go on vacation for a minute or two15:18
iurygregoryin this case it was about 3min for me15:19
TheJuliaStep-wise, we likely need to... either implicitly or have an explicit step which is "hey, we're going to get garbage responses, let's hold off on the current action until the $thing is ready"15:19
iurygregorybut yeah, seems similar15:19
jandersit's not an entirely new problem but the impact of such BMC (mis)behaviour is way worse in day2 ops than day115:19
jandersit is annoying when it happens on a new node being provisioned15:19
TheJuliaor in service15:19
adam-metal3I have seen similar related to checking power states15:20
TheJuliabecause these are known workflows and... odd things happening are the beginning of the derailment15:20
jandersit is disruptive if someone has prod nodes in scheduled downtime (and overshoots the scheduled downtime due to this)15:20
TheJuliawe almost need a "okay, I've got a thing going on", give the node some grace or something flag15:20
TheJuliaor "don't take new actions, or... dunno"15:20
jandersTheJulia++15:21
TheJuliaI guess I'm semi-struggling to figure out how we would fit it into the model and avoid consuming a locking task, but maybe the answer *is* to lock it15:21
TheJuliaand hold a task15:21
janderslet me re-read the error Iury posted to see what Conductor was exactly trying to do when it crapped out15:21
TheJuliawe almost need a "it is all okay" once "xyz state is achived"15:22
jandersOK so in this case it seems like the call to the BMC came from within the step15:22
TheJulianobodycam's power issue makes me want to hold  a lock, and have a countdown timer of sorts15:22
jandersbut I wonder if it is possible that we hit issues with a periodic task or something15:22
TheJuliaWell, if the task holds a lock the entire time, the periodic task can't run.15:22
jandersTheJulia I probably need to give it some more thought but this makes sense to me15:23
TheJuliauntil the lock releases it15:23
jandersiurygregory dtantsur WDYT?15:23
TheJuliait can't really be a background periodic short of adding a bunch more interlocking complexity15:23
TheJuliabecause then step flows need to resume15:23
TheJuliawe kind of need to actually block in these "we're doing a thing" cases15:23
TheJuliaand in nobodycam's case we could just figure out some middle ground which could be turned on for power actions15:24
jandersyeah it doesn't sound unreasonable15:24
jandersto do this15:24
TheJuliaI *think* his issue is post-cleaning or just after deployment, like the very very very last step15:24
iurygregoryI think it makes sense15:24
TheJuliaI've got a bug in launchpad which lays out that issue15:24
TheJuliabut I think someone triaged it as incomplete and it expired15:24
iurygregoryoh =(15:25
jandersI think this time we'll need to get to the bottom of it cause when people start using servicing in anger (and also through metal3) this is going to cause real damage15:25
janders(overshooting maintenance windows for already-deployed nodes is first scenario that comes to mind but there will likely be others)15:26
TheJuliahttps://bugs.launchpad.net/ironic/+bug/206907415:26
TheJuliaOvershooting maintenance windows is inevitable15:26
TheJuliathe key is to keep the train of process from derailing15:26
TheJuliaThat way it is not the train which is the root cause15:27
janders"if a ironic is unable to connect to a nodes power source" - power source == BMC in this case?15:27
TheJuliayes15:27
TheJuliaI *think*15:27
jandersthis rings a bell, I think this is what crapped out inside the service_step when iurygregory and I were looking at it15:27
TheJuliathey also have SNMP PDUs in that environment, aiui15:27
TheJuliaoh, so basically same type of issue15:28
janders(this level of detail is hidden under that last_error)15:28
jandersyeah15:28
iurygregorynot during service, but cleaning in an HPE15:28
iurygregorybut yeah same type of issue indeed15:28
jandersthank you for clarifying iurygregory15:28
iurygregorynp =)15:28
TheJuliayeah, I think I semi-pinned it to the issue that I thought it was15:28
jandersyeah it feels like we're missing the "don't depend on responses from BMC while mucking around with its firmware" bit15:29
jandersin a few different scenarios15:29
TheJuliawell, or in cases where the bmc might also be taking a chance to reset/reboot itself15:29
TheJuliaat which point, it is no longer a stable entity until it returns to stability15:29
jandersok so from our discussion today it feels 1) the issue is real and 2) holding a lock could be a possible solution - am I right here?15:30
TheJuliaWell, holding a lock prevents things from moving forward15:30
TheJuliaand prevents others from making state assumptions15:30
TheJuliaor other conflicting instructions coming in15:30
jandersyeah15:31
TheJuliaiurygregory: was https://paste.opendev.org/show/bdrsgYzFECwvq5O3hQPb/'s 400 a result of power state checking?15:31
TheJuliapost-action15:31
TheJulia?15:31
iurygregoryNeed to double check15:32
iurygregoryI can re-run things later and gather some logs15:32
jandersso the lock would be requested by code inside the step performing firmware operation in this case (regardless of whether day1 or day2) and if BMC doesn't resume returning valid responses after X seconds we fail the step and release the lock?15:33
TheJuliaYeah, I think this comes down to a "we're in some action like a power state change in a workflow, we should be able to hold, or let the caller know we need to wait until we're getting a stable response"15:33
TheJuliajanders: the task triggering the step would already hold a lock (the node.reservation field) through the task.15:33
dtantsurI think we do a very similar thing with boot mode / secure boot15:34
TheJuliaYeah, if the BMC never returns from "lunch" we eventually fail15:34
jandersdtantsur are you thinking about the code we fixed together in sushy-oem-idrac?15:34
dtantsuryep15:34
jandersor is this in the more generic part?15:34
jandersOK, I understand, thank you15:35
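A rough sketch of the "hold the lock and wait for the BMC to come back" idea discussed above, assuming a hypothetical wait_for_bmc_ready() helper called from inside a service/clean step while the node lock (node.reservation) is already held; the endpoint, timeouts and retry policy here are illustrative, not Ironic's actual API:

```python
import time

import requests


def wait_for_bmc_ready(session, bmc_url, timeout=600, interval=15):
    # After a BMC firmware update the BMC may reboot and briefly answer
    # with 4xx/5xx garbage (or nothing at all). Rather than failing the
    # step on the first bad response, keep the task lock held and poll
    # until the Redfish service root answers cleanly, or give up after
    # `timeout` seconds.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            resp = session.get(bmc_url + '/redfish/v1/', timeout=10)
            if resp.status_code == 200:
                return
        except requests.RequestException:
            pass  # BMC still rebooting; treat as "not ready yet"
        time.sleep(interval)
    raise TimeoutError('BMC did not return to a stable state within '
                       '%s seconds' % timeout)
```

Usage would be something like wait_for_bmc_ready(requests.Session(), 'https://bmc.example.com') right after triggering the firmware update, before any power-state checks run.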
rpittaugreat :)15:36
rpittauanything more on the topic? or other topics to discuss?15:37
dtantsuriurygregory and I could use some ideas15:37
jandersI think this discussion really helped my understanding of this challenge and gives me some ideas going forward, thank you!15:37
jandersdtantsur yeah 15:37
dtantsurwhat on Earth could make IPA take 90 seconds to return any response for any API (including the static root)15:37
iurygregoryyeah, I'm racking my brain trying to figure out this one15:38
dtantsureven on localhost!15:38
jandershmm it's always the DNS right? :) 15:39
dtantsurit could be DNS..15:39
TheJuliadns for logging?15:39
dtantsuryeah, I recall this problem15:40
janderssaying that tongue-in-cheek since you said localhost but hey 15:40
jandersmaybe we're onto something15:40
TheJuliaaddress of the caller :)15:40
janderswhat would be default timeout on the DNS client in question?15:41
TheJuliaThis was also a thing which was "fixed" at one point ages ago by monkeypatching eventlet15:41
TheJuliaerr, using eventlet monkeypatching15:41
iurygregoryI asked them to check if the response time was the same using the name and the IP, and the problem always repeats; I also remember someone said some requests took 120 sec =)15:41
JayFThat problem was more or less completely excised, the one that was fixed with more monkey patching15:42
JayFI really like the hypothesis of inconsistent or non-working DNS. There might even be some differences in behavior between what distribution you're using for the ramdisk in those cases.15:43
dtantsurIt's a RHEL container inside CoreOS15:43
janderscould tcpdump help confirm this hypothesis?15:43
TheJuliajanders: likely15:43
janders(see if there are DNS queries on the wire)15:43
TheJuliajanders: at a minimum, worth a try15:43
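For context on the DNS-for-logging hypothesis: if the access-log path does a reverse lookup of the caller's address and the resolver is unreachable, every request pays the resolver timeout. A minimal, self-contained illustration of that failure mode, not IPA's actual logging code:

```python
import socket
import time


def log_client(addr):
    # Reverse-resolving the caller's address for an access-log line; with an
    # unreachable or slow resolver this blocks for the resolver timeout on
    # every request (whether 127.0.0.1 short-circuits this depends on the
    # nsswitch configuration of the ramdisk/container).
    start = time.monotonic()
    try:
        host = socket.gethostbyaddr(addr)[0]
    except OSError:
        host = addr
    print('client=%s resolved in %.1fs' % (host, time.monotonic() - start))


log_client('127.0.0.1')
```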
rpittauanything else to discuss? :)15:46
adam-metal3I have a question if I may15:46
rpittauadam-metal3: sure thing15:46
rpittauwe still have some time15:46
adam-metal3I have noticed an interesting behaviour with ProLiant DL360 Gen10 Plus servers; as you know, IPA registers a UEFI boot record under the name ironic-<somenumber> by default15:47
TheJuliaUnless there is a hint file, yeah15:48
TheJuliawhats going on?15:48
adam-metal3On the machine type I have mentioned, this record gets saved during every deployment, so if you deploy and clean 50 times you have 50 of these boot devices visible15:48
TheJuliaoh15:48
TheJuliaheh15:48
TheJuliauhhhhh15:48
TheJuliaSteve wrote a thing for this15:49
adam-metal3as far as tests done by downstream folks indicate, there is no serious issue15:49
adam-metal3but it was confusing a lot of my downstream folks15:49
* iurygregory needs to drop, lunch just arrived15:50
TheJuliahttps://review.opendev.org/c/openstack/ironic-python-agent/+/91456315:50
adam-metal3Okay so I assume then it is a known issue, that is good!15:50
TheJuliaYeah, so... Ideally the image you're deploying has a loader hint15:51
TheJuliain that case, the image can say what to use, because some shim loaders will try to do record injection as well15:51
TheJuliaand at one point, that was a super bad bug on some intel hardware15:51
TheJuliaor, triggered a bug... is the best way to describe it15:51
TheJuliaIronic, I *think*, should be trying to clean those entries up in general, but I guess it would help to better understand what you're seeing, and compare to a deployment log, since the code is *supposed* to dedupe those entries if memory serves15:52
TheJuliaadam-metal3: we can continue to discuss more as time permits15:52
adam-metal3sure15:53
TheJuliawe don't need to hold the meeting for this, I can also send you a pointer to the hint file15:53
adam-metal3thanks!15:53
rpittauwe have a couple of minutes left, anything else to discuss today? :(15:53
rpittauerrr15:53
rpittau:)15:53
rpittaualright I guess we can close here15:55
rpittauthanks everyone!15:55
rpittau#endmeeting15:55
opendevmeetMeeting ended Mon Dec  2 15:55:32 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:55
opendevmeetMinutes:        https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-12-02-15.00.html15:55
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-12-02-15.00.txt15:55
opendevmeetLog:            https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-12-02-15.00.log.html15:55
jandersthank you all o/15:55
jandersgreat to be able to join the meeting in real time for a change15:55
janders(and sorry for being few minutes late)15:55
dtantsursooo, folks. When running get_deploy_steps, we somehow end up running some real code in IPA. That involves 'udevadm settle'. That consistently takes 2 minutes on their machine.15:56
dtantsurwhat. the. hell.15:56
rpittauudevadm settle takes 2 minutes? wow15:57
janderscrazy - but an awesome find15:57
jandersgotta drop for a bit again, back a bit later15:57
TheJuliaadam-metal3: so shim, by default, looks for a BOOTX64.CSV file as a hint; I think it is expected in the folder shim is in, so on a Fedora machine it is /boot/efi/EFI/fedora/BOOTX64.CSV, and IPA will look for a file like this and use it as the basis for the records to set, replacing the ironic-$num behavior15:58
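For reference, a hedged sketch of reading such a hint file, assuming (as I recall) shim's BOOTX64.CSV layout of UTF-16-encoded, comma-separated "loader,label,extra…" records; this is not IPA's actual parser:

```python
def read_boot_hint(csv_path):
    # shim's BOOTXX.CSV hint files are (to the best of my knowledge)
    # UTF-16 encoded, one comma-separated record per line:
    # loader filename, boot entry label, then extra parameters/description.
    with open(csv_path, 'rb') as f:
        text = f.read().decode('utf-16')
    entries = []
    for line in text.splitlines():
        fields = line.split(',')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        entries.append({'loader': fields[0],
                        'label': fields[1],
                        'extra': fields[2:]})
    return entries
```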
rpittaudoes that mean that udevd is still syncing devices?15:58
TheJuliasyncing and waiting for settled device state15:58
TheJuliawhich might inform hardware managers what steps are actually available15:59
rpittauyep15:59
rpittaudtantsur: probably need systemd-udevd  logs to see what's taking that long16:01
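Worth noting that 'udevadm settle' blocks until the udev event queue drains, and its default timeout is (if memory serves) 120 seconds, which would line up with the consistent 2-minute stalls. A hedged sketch of capping the wait explicitly; illustrative only, not IPA's actual invocation:

```python
import subprocess


def settle_udev(timeout=15):
    # 'udevadm settle' waits for the udev event queue to empty. Without an
    # explicit --timeout it can block for the full default (reportedly 120s)
    # if events keep arriving or a rule hangs; capping it keeps step listing
    # responsive at the cost of possibly seeing a not-yet-settled device list.
    subprocess.check_call(['udevadm', 'settle', '--timeout', str(timeout)])
```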
adam-metal3TheJulia: thanks, I will check if we have that file and in general how this process works, so far I have only checked the uefi tooling that IPA uses to set the record16:02
dtantsuralso caching hardware managers does not seem to work..16:02
adam-metal3the strange thing for me is that in my case it is always ironic-1, but 50 times; I find it strange that the same name can be saved any number of times16:03
TheJuliaadam-metal3: that sounds like it is saving or sits regarding a delete16:06
TheJuliaadam-metal3: we have seen a thing on some Lenovo hardware where, if changes are made in a very particular order, the machine reverts back to the last known UEFI boot variables16:09
TheJuliaWe had to move the delete before the save in that case because originally the code was add then cleanup16:10
TheJuliadtantsur: … that sounds like what was an old bug is now a new bug again16:11
adam-metal3TheJulia: interesting, I will need to ask around whether I can get my hands on a machine that exhibits these symptoms, otherwise I am not sure how else to play around with the UEFI16:24
TheJuliaadam-metal3: I suspect that is the only path, it does sound like something is going "off the rails". If you can get us agent logs and the efibootmgr -v output after deployment, it should be easier to wrap our heads around this16:30
rpittaugood night! o/16:45
dtantsuraahhhh, JayF's recent patch fixes the reason why evaluate_hardware_support is called on each get_deploy_steps: https://review.opendev.org/c/openstack/ironic-python-agent/+/920153/7/ironic_python_agent/hardware.py#350416:54
dtantsurif we pull that downstream, things will no longer break because of udevadm settle16:54
JayFUh16:55
JayFyou know before that change we called it like 5 times instead of 1 iirc16:55
dtantsurnot just that, we also call it every time get_deploy_steps is called16:56
dtantsurwhich is.. rather often16:56
JayFare you 10000% sure that's not cached?16:56
JayFI think it is.16:56
dtantsurAfter your patch, it is cached.16:56
JayFget_managers_detail() is cached16:56
dtantsurSee the link, before your patch it was a naked call to dispatch_to_all_managers('evaluate_hardware_support')16:57
JayFso we should /not/ be running it on each call to get_deploy_steps() unless that's the bug16:57
JayFoh, oh, oh16:57
dtantsurYep, except that we don't have your patch in that version of OpenShift16:57
JayFI thought you were saying *the opposite*16:57
JayFwhich is why I was so confused16:57
dtantsur:)16:57
JayFyes, this is a gross bug, and it's nice to see it had more real world impact16:57
JayFthat's also why we don't have so much crap logging from IPA anymore :)16:57
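The shape of the fix being discussed is essentially "evaluate hardware support once per agent lifetime and reuse the result". A rough, self-contained sketch of that caching pattern (hypothetical names, not the actual IPA code):

```python
import functools
import time


class GenericHardwareManager:
    def evaluate_hardware_support(self):
        # Stand-in for expensive probing such as shelling out to
        # 'udevadm settle' on real hardware.
        time.sleep(2)
        return 1


_MANAGERS = [GenericHardwareManager()]


@functools.lru_cache(maxsize=1)
def cached_hardware_support():
    # Evaluate every manager once and cache the result, so repeated
    # get_deploy_steps-style calls don't re-run the probing each time.
    return tuple((m.__class__.__name__, m.evaluate_hardware_support())
                 for m in _MANAGERS)


cached_hardware_support()  # slow the first time
cached_hardware_support()  # instant afterwards
```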
TheJuliadtantsur: rpittau: going back to the question from last week, I think what I'm sort of thinking of works. Ironic should prefer qcow2 images. IPA, should it get a URL directly, could prefer applehv, but that is a bridge to try and cross when we get there.16:58
dtantsurapplehv == raw?17:00
TheJuliavery much appears to be the case17:04
JayFIt really weirds me out how much of this is just "yeah it looks raw" rather than there being a documented standard :| 17:12
JayFI know that's not our fault, but it feels like something that's going to eventually cause headaches (if not already causing it)17:12
TheJuliaI think we should refine it and make "raw" our standard ;)17:20
TheJuliatbh17:20
TheJuliabut, one step at a time17:20
TheJuliawe first must crawl, then walk, then run17:20
dtantsurI think the crawl would be 1 layer (image), detect the type using our new shiny detector17:24
dtantsurIt feels like many conversations around this effort happen because we're trying to do the next step already (a payload containing different image types per architecture, etc)17:25
JayFTheJulia: I honestly like the idea of calling what we usually call "raw" "gpt" similar to what the glance as defender spec lays out17:26
dtantsurLayers also have a MIME content type, I'm curious why podman did not use it..17:27
JayFhttps://review.opendev.org/c/openstack/glance-specs/+/92511117:27
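On the detection side, a minimal sketch of magic-byte sniffing that distinguishes qcow2 from a GPT-partitioned "raw" image, in the spirit of the detector and the "call raw gpt" idea mentioned above; this is an illustration, not the actual detector code:

```python
QCOW2_MAGIC = b'QFI\xfb'      # first four bytes of a qcow2 header
GPT_SIGNATURE = b'EFI PART'   # GPT header signature at the start of LBA 1


def sniff_disk_image(path, sector_size=512):
    # qcow2 carries an explicit header magic; a "raw" image has no magic of
    # its own, so the closest positive signal is a GPT signature in the
    # second sector (assuming 512-byte logical sectors).
    with open(path, 'rb') as f:
        if f.read(4) == QCOW2_MAGIC:
            return 'qcow2'
        f.seek(sector_size)
        if f.read(8) == GPT_SIGNATURE:
            return 'gpt'
    return 'unknown'
```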
TheJuliadtantsur: because OCI spec mandates specific types17:27
TheJuliaand so you cannot make assumptions and doing some sanity checking upfront allows for "hey, you gave us bad data" instead of just falling over being unable to deploy17:28
dtantsurwhich part of the OCI spec do you mean?17:29
TheJuliaOCI image spec17:30
dtantsurI don't think the OCI spec has any explicit mentions of qcow2/applehv (or does it?)17:31
TheJuliait does not17:31
dtantsurThen it cannot mandate them?17:31
TheJuliaBut it does, if my memory is serving me, explicitly note all attached manifest data layers are treated as layers with the mandated data types17:31
TheJuliait's a structural aspect which mandates layer modeling17:32
dtantsursorry, I don't get it17:32
dtantsurit's up to a tool how to treat a certain layer17:32
dtantsuraha, the spec even shows an explicit "artifactType", I did not see it initially17:35
dtantsurso, looks like we could have a layer with "artifactType": "application/x-qemu-disk"17:35
* dtantsur still curious why podman did not use all this17:35
TheJuliahttps://github.com/opencontainers/image-spec/blob/main/image-layout.md#indexjson-file <-- I think that is the start of it17:35
* TheJulia looks up artifactType17:35
TheJuliahttps://github.com/opencontainers/image-spec/blob/main/artifacts-guidance.md <-- uses should17:36
TheJuliahmmm I like ArtifactType17:37
dtantsurIt sounds like we could use mediaType as well17:37
dtantsurfor those following along: https://github.com/opencontainers/image-spec/commit/749ea9a27d1eb44b5369ee7e8e296c7e99e3d2e517:38
dtantsurAh, there are two different things called mediaType. Thank you, not confusing at all.17:38
TheJuliaIt *looks* like there might be a lower level where we'd be able to note/annotate it, but I suspect they did it one level up so they didn't have to walk all the way down and then back up17:39
TheJuliaat least, I suspect17:39
TheJulia"they" in that guess being podman17:40
dtantsurOkay, I finally got it. On the top level of the index.json, mediaType is its own media type (a constant), artifactType is an optional type of the contained artifact.17:42
TheJuliayup17:42
dtantsurThen each manifest can have its own mediaType, which can be, well, anything. application/x-qemu-disk17:42
TheJuliaEach manifest after you make a decision right?17:42
TheJuliaso not second level, but third level down right?17:43
TheJuliabecause second level is where you have all of the varying types and just the pointers to the final manifests17:43
dtantsureven first?17:43
dtantsurwhat is preventing me from having https://paste.opendev.org/show/btSiycm241tCqJDkXUJt/ ?17:43
dtantsur("no existing tooling can product that" is a plausible answer, but let's leave it aside just for a minute)17:44
dtantsurImagine, I have an image with this index.json and exactly one blob17:44
dtantsurAm I missing something?17:44
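A hypothetical index.json of the kind being described here (illustrative only, not the contents of the paste linked above), expressed as a Python literal: a top-level OCI image index whose single manifest entry points straight at a qcow2 blob via mediaType/artifactType instead of podman's "disktype" annotation; the digest and size are placeholders:

```python
import json

index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            # A descriptor pointing directly at the qcow2 blob rather than
            # at an intermediate manifest; digest/size are placeholders.
            "mediaType": "application/x-qemu-disk",
            "artifactType": "application/x-qemu-disk",
            "digest": "sha256:" + "0" * 64,
            "size": 1234,
            "platform": {"architecture": "amd64", "os": "linux"},
        }
    ],
}

print(json.dumps(index, indent=2))
```

Whether existing registry and client tooling accepts a non-manifest blob referenced straight from the index is exactly the open question in the exchange that follows.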
TheJuliabecause under existing data structures, that would be a lower-layer artifact, and if we're going to be alongside containers which may also be bootable, we ideally want to be mindful of fitting in with other aspects instead of trying to create something entirely different in the same upper-level modeling17:45
dtantsurthere may be more manifests with other types17:46
TheJuliaIf we're going to try and carve an entirely new path here, I might as well stop and punt on this, to be entirely honest17:46
TheJuliayes, but they can't be index.json at that point, they would need to be other containers17:46
dtantsurMmm, I'm trying NOT to carve a new path17:46
TheJuliaindex.json is top level, the whole of the representation of a container17:46
TheJuliaMy whole driver here is to have a streamlined path so I could eventually have a single container which has a qcow2 file attached, and a bootable container17:47
dtantsurI'm looking at https://github.com/opencontainers/image-spec/blob/main/image-layout.md#index-example, and in this example they have two "normal" images as well as some AppStream XML17:47
TheJuliaand the user could then just choose based upon the deploy interface17:47
dtantsurIn fact, it's a root index that points at another index, a simpler manifest and just an artifact, right?17:49
TheJuliayeah, the third entry in that example is a definite standalone file17:49
TheJuliasecond is a manifest reference pointer17:49
dtantsurSo that could be our qcow2 alongside the proper container stuffs?17:49
TheJuliathe first is another index reference17:50
TheJuliapotentially, the question is the platform field and whether it can exist at that level17:52
TheJuliathe spec doc walks through what podman did, so it is a top-layer primary manifest pointer for the container itself, and then an index17:53
TheJuliainside that index, each manifest entry has platform and annotations to signify what the files are17:53
dtantsurhttps://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions lists platform17:54
TheJuliaThey point to a separate manifest file17:54
TheJuliawhich then lists the contents as a single layer (why did they do that?!)17:54
dtantsurYeah, I'm also curious about this indirection17:55
dtantsurnothing in the spec is telling me that I cannot have top-level artifacts with different architectures17:55
dtantsurmaybe simply because of tooling support?17:56
TheJuliaso it could be since I think all layers are expected to be z-streamed17:56
TheJuliamaybe that is why?!17:56
dtantsuri.e. they could create layers with podman easily but they could not create what I describe?17:56
TheJuliaso sort of a tooling convenience and extra transparent compression?17:56
dtantsuryeaah17:56
TheJuliayeah17:56
TheJuliaI kind of suspect some of it is that compression, some of it was explicit modeling of indirection and also trying to not have to walk all the way down17:57
dtantsurThe cost they're paying, though, is the non-standard "disktype" annotations17:57
TheJuliayup17:57
dtantsurI guess the key question is whether we want to mimic that (keeping in mind that they themselves may pivot from it one day)17:57
dtantsurNeed to do some exercising now. Sorry, I'm afraid I caused more confusion than I solved..17:59
TheJuliaI think they expect to have to if docker decides to do anything else. Perhaps a question back to ?Arron? would be: why not do it at a top-ish level (assuming the tools support it)17:59
TheJuliathe whole thing that made me raise an eyebrow is the container reference being expected17:59
TheJuliaI bet that is a compatibility aspect on index.json.17:59
dtantsurThe spec is sometimes vague on what is required and what is not17:59
TheJuliawhich drives the mid-level index to manifest linking17:59
TheJuliawhat is then also weird is that the lower-level index for machine-os also circles back and points at the same container manifest as well18:00
dtantsurThe case described in https://github.com/opencontainers/image-spec/commit/749ea9a27d1eb44b5369ee7e8e296c7e99e3d2e5 is remotely similar to ours, and they accepted it as a valid case, so there is hope18:00
TheJuliafrom 2023, I wonder if that was the original focus and they maybe pivoted?18:01
TheJuliawe're making lots of guesses18:01
dtantsurthe author does not seem to be from podman18:01
dtantsuryeah18:01
TheJuliaoh, hmm18:01
TheJuliaon a plus side, I've not written any code at this layer. Still trying to wire in the overall higher level changes18:03
dtantsur++18:04
* dtantsur leaves for real o/18:04
opendevreviewcid proposed openstack/ironic master: [WIP] Save ``configdrive`` in an auxiliary table  https://review.opendev.org/c/openstack/ironic/+/93362218:14
cardoePeek at how helm and other tools like ORAS use OCI for storage.19:30
iurygregorytime to setup bifrost in a fresh OS to double check that I'm not going crazy with firmware updates "not working" on stable/2023.2 =( 20:27
-opendevstatus- NOTICE: Gerrit will have a short outage while we update to the latest 3.9 release in preparation for our 3.10 upgrade on Friday21:31
TheJuliacardoe: do you happen to have a specific link to aid us in this :)21:52
JayFI had given up ever seeing these. https://usercontent.irccloud-cdn.com/file/b4gwar2p/PXL_20241202_223226218.jpg22:39
JayFFive nanoKVMs, ready for action :D 22:39
JayFTook about 2 months, and I had given up on getting anything for my money, but they are here. Hopefully they work!22:39
JayFLikely will be what I use to take a first stab at redfish console behavior22:43
JayF(they also support IPMI, gross)22:43
iurygregorynice!23:11
