Monday, 2023-03-27

rpittaugood morning ironic! o/ Happy PTG week07:59
kaloyankmorning everyone o/08:08
dtantsurarne_wiebalck: morning, does https://storyboard.openstack.org/#!/story/2010479 ring any bells? (from what jrosser mentioned above)09:31
ade_leedtantsur, TheJulia hey - who is in charge of the PTG schedule?11:17
dtantsurade_lee: primarily JayF, but we may be able to help11:17
ade_leedtantsur, sorry for the late notice, but I added a topic to the end of the etherpad to talk about fips11:18
ade_leedtantsur, I was hoping maybe to add it to the schedule during the lightning rounds on wednesday?11:18
dtantsurade_lee:     Update on in-progress Antelope items 1400-1500 UTC11:19
dtantsurwill it fit there?11:19
dtantsuror at the end of Wednesday, yes11:19
dtantsurbut make sure to chat with TheJulia first, I know she has been looking into something-something FIPS11:19
ade_leelooking - certainly the topic she has about md5 is relevant11:19
dtantsurade_lee: the usual question about PTG items: is there something to discuss or is it more of Just Do It?11:20
ade_lee(a bit of both :))11:20
dtantsurade_lee: I've injected FIPS into the md5 topic, let's see if JayF agrees11:21
ade_leedtantsur, sounds good11:21
iurygregorygood morning Ironic11:56
opendevreviewDmitry Tantsur proposed openstack/ironic-specs master: Merge Inspector into Ironic  https://review.opendev.org/c/openstack/ironic-specs/+/87800113:22
opendevreviewDmitry Tantsur proposed openstack/ironic-specs master: Migrate inspection rules from Inspector  https://review.opendev.org/c/openstack/ironic-specs/+/87823013:22
dtantsurarne_wiebalck: also please check the rules one ^^^ since it may have impact on you13:22
opendevreviewDmitry Tantsur proposed openstack/ironic-specs master: Migrate inspection rules from Inspector  https://review.opendev.org/c/openstack/ironic-specs/+/87823013:42
TheJuliasooo many emails13:58
JayFdtantsur: ade_lee: I mean, that's more julia's topic. If she's OK sharing time, I am. 14:04
TheJuliaone is follow-up, quick and easy, the other is "adding another ci job"14:05
TheJuliathe latter might require the former though14:05
TheJuliaso... *shrugs*14:05
ade_leeJayF, TheJulia I'm open to talking fips stuff later in the week if we wont have enough time14:06
ade_leebut yeah, if we're going to trip up on md5 issues, then certainly the topics are related14:06
JayFIs there an openstack-wide goal of FIPS jobs? Or some Ironic user asking for them?14:07
opendevreviewMerged openstack/ironic master: Enables boot modes switching with Anaconda deploy for ilo driver  https://review.opendev.org/c/openstack/ironic/+/86082114:07
JayFJust trying to make sure I fully grasp the 'why'14:07
ade_leethis is the openstack-wide fips goal14:07
JayFIs there a spec or policy link I could read about it at?14:07
ade_leehttps://github.com/openstack/governance/blob/master/goals/selected/fips.rst14:08
JayFyeah we'll need the hashlib.md5(usedforsecurity=False) in a few places I'm sure14:09
JayFthanks for the link14:09
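A minimal sketch of the call JayF refers to above, assuming Python 3.9 or later, where `hashlib.md5` accepts the `usedforsecurity` keyword. The helper name `md5_checksum` and the fallback branch for older interpreters are illustrative assumptions, not Ironic code.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 hex digest of data, flagged as a non-security use.

    Hypothetical helper for illustration only.
    """
    try:
        # Python 3.9+: tells a FIPS-enabled build that MD5 is used here
        # only as a checksum, not for anything security-sensitive.
        digest = hashlib.md5(data, usedforsecurity=False)
    except TypeError:
        # Older interpreters do not know the keyword; on a FIPS-enabled
        # host this plain call may itself be rejected.
        digest = hashlib.md5(data)
    return digest.hexdigest()


print(md5_checksum(b"abc"))
```

The keyword does not change the digest; it only changes whether a restricted crypto backend allows the call.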
arne_wiebalcko/14:10
JayFTheJulia: i'll go to the UTC 1500 keystone meeting if you can clue me into concerns I should bring14:10
* arne_wiebalck realises he is not late but early, DST FTW14:12
JayFNo Ironic sessions until Weds anyway :)14:13
arne_wiebalckdtantsur: thanks for pointing me to the specs14:13
arne_wiebalckdtantsur: re ESP size, I remember we discussed this at the time and followed ESP expert advice to make it 550MiB14:20
TheJuliaJayF: no concerns really, just need to reach the same page14:20
* dtantsur dislikes magic numbers..14:20
arne_wiebalckdtantsur: dtantsur https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/raid_utils.py#L28-L3014:21
dtantsurJayF: I think previously we could not use usedforsecurity because of the Python version14:21
arne_wiebalckdtantsur: there seems to be no limit for the ESP, so it could always be bigger than what we set14:21
dtantsurarne_wiebalck: I wonder if we switch to 1G.. unless we can determine it in advance?14:21
dtantsurI wonder what they put there though.. my /boot/efi is 42M14:22
dtantsurmaybe it's just a large partition on the image because they never put any thought into it :)14:22
arne_wiebalckyes, exactly my thinking ... what is in there?14:22
dtantsureven if it's empty, dd'ing will fail anyway, right?14:22
dtantsurmaybe we need to rsync instead :D14:23
arne_wiebalckwe should probably make it 1025GB to be ultra-safe :-D14:23
arne_wiebalckI remember I was surprised to see one could use cp this way14:23
arne_wiebalckprobably rsync is even more efficient14:23
arne_wiebalcksince it copies 42MB, not 550MB14:24
dtantsurRight. But then we need to deal with selinux stuff..14:24
JayFhttps://storyboard.openstack.org/#!/story/2010479 while you all are looking at that14:24
dtantsurah, it's VFAT, so no selinux14:24
JayFit seems like we need to make the partition slightly larger in raid cases?14:24
dtantsurJayF: we're literally discussing this issue14:25
JayFokay, I wasn't 100% sure 14:25
dtantsurthe problem is: there is no reasonable upper limit14:25
JayFCan we inspect the image and glean it?14:25
dtantsursomeone can create an image with 10G EFI partition just because14:25
dtantsurstreaming :(14:25
JayFGPT is still in the first "X" bytes of the stream, right?14:26
dtantsuryeah, but we need to pre-create the partition before streaming the root device, I assume?14:26
JayFyeah, that's what I was thinking as I went down that path14:26
JayFCould we allow metadata to be set in glance or ironic (if no glance) indicating the esp side?14:26
JayF**size14:26
dtantsurthat's probably the way to go in the end..14:27
* dtantsur dislikes too many options, but too many options are chasing him anyway14:27
arne_wiebalckhow would metadata help in the case at hand? try, fail, get the size, set the meta data, retry?14:28
JayFIf you create an image in glance with a 42 gigabyte ESP, you set some metadata14:28
JayFthat gets passed down thru to the ipa partitioning code14:28
JayFand that metadata-value is used to size the ESP14:28
arne_wiebalckI get that14:29
JayF(+add enough room for a raid superblock)14:29
arne_wiebalckbut you would need to do this when you download/upload an image14:29
JayFIMO there are two problems being conflated here, which is OK since they are solved similarly:14:29
JayF1) we aren't adding in extra size for raid metadata when raid is being used (to ensure ESP size remains expected usable)14:29
dtantsurwell, a conductor can inspect the image14:29
JayF2) ESP size is hardcoded in Ironic making it impossible for larger ESPs 14:30
TheJulia... larger ESPs... eek14:30
arne_wiebalckTheJulia: ++14:30
arne_wiebalckthere should be a use case for a larger ESP 14:30
dtantsurimagine installers that only accept Gigabytes as units :D14:30
JayFI run all my machines with a 1GB combined ESP+Boot partition14:31
JayFit's not that weird of a setup for a desktop14:31
arne_wiebalckwell, you combine two use cases14:31
JayFWho's to say some operator won't build their images in the same way :) I sorta feel this is reversed: we're hardcoding a value, so we have to justify that as a project vs justifying odd values14:32
dtantsurIs it hard for us to inspect an image on the conductor side14:32
JayFwe are the project that allows you to literally write deployment steps to do crazy crap during your deployment, and >.5GB ESP is a bridge too far? LOL14:32
dtantsurI know virt-* stuff actually launches a VM, but maybe there is something more lightweight?14:32
JayFdtantsur: my concern would be more about if there's a security door opened doing that14:32
TheJuliaI guess my concern is increasing/changing the size is going to ripple. Steve changed some downstream image stuff only by a half gig and it has been a world of hurt14:33
dtantsurlike, buffer overflow in an image parsing code?14:33
JayFdtantsur: I bet we can find a partition reading program that'd work on an image, second best case would be qemu-nbd loopback mounting it and feeding that to a fdisk14:33
JayFdtantsur: more if whatever inspection method we used actually mounted it14:33
dtantsurI'd really, really want to stop having root operations in ironic14:33
jrosser^ this is exactly what we did when debugging it, using qemu-nbd + fdisk14:33
dtantsurjrosser: but that requires root, right?14:34
JayF`fdisk -l /path/to/disk.img` works14:34
JayFbut I assume maybe only works on raw images?14:34
dtantsurJayF: for raw images14:34
dtantsurright14:34
dtantsurwe do convert to raw images by default14:34
jrosseraccording to my notes we did it on a qcow214:35
dtantsurqemu-nbd will work, but will it work without root?14:35
* dtantsur is looking at qemu-img map14:35
dtantsurokay, so there is `qemu-img dd`, which we can probably use to extract the MBR/GPT metadata14:36
* dtantsur is messing around with CLI14:41
dtantsurI must say, the tools dislike only having the first part of the image..14:49
dtantsurfdisk outputs nonsense. gdisk outputs seemingly real information, but does not strictly match virt-filesystems Oo14:52
dtantsurtook the first 4MiB of a debian qcow2, got https://paste.opendev.org/show/bhgcMlkiCXPGe6Fs5c3m/14:56
dtantsurEF00 is the code of an EFI partition14:56
dtantsurits size is consistent with virt-filesystems btw. so we can use max(550, <this size>)14:59
dtantsurarne_wiebalck, jrosser ^^^15:01
dtantsurthe command was $ qemu-img dd -O raw count=100 if=debian-11-genericcloud-amd64.qcow2 of=/tmp/gpt15:02
dtantsur(so it was not 4M, I cannot count today)15:03
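The approach dtantsur describes (dump the first sectors of the image to a raw file, then read the partition table) can be sketched in Python. This assumes 512-byte logical sectors and the standard GPT on-disk layout; the function name `esp_size_bytes` is hypothetical, and this is not code from Ironic or ironic-python-agent.

```python
import struct
import uuid

SECTOR = 512  # assumption: 512-byte logical sectors
# Partition type GUID of an EFI System Partition (gdisk shows it as EF00).
ESP_TYPE = uuid.UUID("c12a7328-f81f-11d2-ba4b-00a0c93ec93b")


def esp_size_bytes(gpt_dump, sector=SECTOR):
    """Return the size of the first ESP found in a raw dump, or None.

    gpt_dump holds the first sectors of a disk image as raw bytes.
    """
    header = gpt_dump[sector:sector + 92]  # GPT header lives at LBA 1
    if len(header) < 92 or header[:8] != b"EFI PART":
        return None
    # Offset 72: LBA of the entry array, entry count, size of one entry.
    entries_lba, num_entries, entry_size = struct.unpack_from(
        "<QII", header, 72)
    for index in range(num_entries):
        start = entries_lba * sector + index * entry_size
        entry = gpt_dump[start:start + entry_size]
        if len(entry) < 48:
            break  # the dump ends before this entry
        # GUIDs are stored in mixed-endian form, which bytes_le decodes.
        if uuid.UUID(bytes_le=entry[:16]) == ESP_TYPE:
            first_lba, last_lba = struct.unpack_from("<QQ", entry, 32)
            return (last_lba - first_lba + 1) * sector  # last LBA inclusive
    return None
```

Reading the `/tmp/gpt` file produced by the `qemu-img dd` command above and passing its bytes to this function would give the number to compare against the 550 MiB default.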
jrosserdtantsur: here is the same thing from an image created with dib https://paste.opendev.org/show/bY9EmqAjeoMYUKH2yNtm/15:30
dtantsurjrosser: okay, so 550M. but it's not enough?15:32
jrosserfor the error we had it's not a case of enough, it's that 550M is greater than 550M - 64K15:33
jrosserand 550M is hardcoded into ironic and also dib15:33
jrosserso short of hacking the source code to one of those, it doesnt fit15:34
dtantsur64K - where does it come from?15:35
dtantsurwe even seem to be using 551M in reality15:37
jrosserwell i don't know - i'm assuming that it's some md superblock or something like that15:41
JayF64k is the size of the md raid superblock15:42
JayFso that's what I assumed15:42
dtantsurokay, so the first and foremost fix, we need to account for that15:47
JayFOne way is "if raid; add 64k" (really, probably just add a Meg or something to be safe) which could break existing assumptions from operators about what the math looks like on the rest of their drive15:49
JayFor we can raise the default ESP size slightly to avoid this particular error (quick fix), and then have some more complex logic for the metadata sizing and all moving forward like we talked about ^^ (long-term fix)15:49
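The two fixes discussed above (leave room for RAID metadata, and size the ESP from the image when it is larger than the default) amount to a small piece of arithmetic. A sketch under stated assumptions: the 550 MiB default comes from the discussion, the 1 MiB padding follows JayF's "add a Meg" suggestion, and the function and constant names are hypothetical rather than taken from ironic-python-agent.

```python
import math

MIB = 1024 * 1024
DEFAULT_ESP_MIB = 550   # hardcoded default mentioned in the discussion
RAID_PADDING_MIB = 1    # assumption: generous room for the md superblock


def esp_partition_size_mib(image_esp_bytes=None, on_software_raid=False):
    """Pick an ESP partition size in MiB.

    image_esp_bytes is the ESP size found in the image, if known.
    """
    size = DEFAULT_ESP_MIB
    if image_esp_bytes is not None:
        # Never go below the default; round partial MiB upwards.
        size = max(size, math.ceil(image_esp_bytes / MIB))
    if on_software_raid:
        # The RAID member loses a little space to metadata, so the
        # partition must be larger than the filesystem copied into it.
        size += RAID_PADDING_MIB
    return size


print(esp_partition_size_mib(550 * MIB, on_software_raid=True))
```

With a 550 MiB ESP in the image and software RAID in use, this yields 551 MiB, which avoids the "550M is greater than 550M - 64K" failure jrosser describes.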
clarkbfwiw I think dib lets you set all the partition sizes? The default is 550MiB per https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/block-device-efi/block-device-default.yaml but you can provide a different layout: https://docs.openstack.org/diskimage-builder/latest/user_guide/building_an_image.html#disk-image-layout15:55
clarkbthere is probably something that can be changed to make this a better user experience but ^ should work as a workaround in the interim?15:55
jrosserclarkb: oh my - all the time we spent trying to figure this out and didnt find that we could do that :/16:05
jrosseryou are right it really does boil down to needing a better user experience16:06
prometheanfireit looks like ramdisk deploys still require an image_source to be defined?  https://github.com/openstack/ironic/blob/master/ironic/common/pxe_utils.py#L685-L69217:36
prometheanfiredocs state it went away in xena17:36
JayFhttps://github.com/openstack/ironic/blob/master/ironic/common/pxe_utils.py#L713 image properties being unset shouldn't be fatal17:38
JayFdoes that raise if no image_source?17:38
prometheanfireJayF: I'm setting kernel and ramdisk17:38
TheJuliahttps://github.com/openstack/ironic/blob/master/ironic/common/pxe_utils.py#L708-L712 17:39
prometheanfireis it possible to ramdisk boot a node with images stored in glance?17:39
TheJuliashould be, I've never explicitly tried17:40
* TheJulia lays down for a little bit17:41
prometheanfireimage_type seems to get set to partition in instance_info, which seems odd as deploy_interface is ramdisk17:41
TheJuliaugh, that is likely a bug inferring everything is just a partition image if a kernel/ramdisk is present17:41
TheJuliapossibly due to naming as well17:42
TheJulianaming is hard, anyway, it shouldn't be fatal to not have an image source17:42
prometheanfiretrying to follow https://docs.openstack.org/ironic/latest/admin/ramdisk-boot.html17:42
* JayF adds this to the list of troubleshooting he owes prometheanfire17:43
prometheanfirelol, something to poke at17:43
prometheanfireI'll see if I can get a ramdisk boot to happen with glance, somehow17:44
prometheanfirehttps://paste.openstack.org/show/819356/17:45
JayFprometheanfire: can you edit code directly?17:46
JayFI'm pretty sure if you just fixed the keyerror it'd work17:46
prometheanfiresetting image source still has other errors it looks like, I'm guessing image_source is a dict17:46
prometheanfireya17:46
JayFhttps://github.com/openstack/ironic/blob/master/ironic/common/pxe_utils.py#L691 change that to image_properties = None 17:46
JayFand see if it works17:46
JayFI think that hard-requiring d_info to have an image_source may be the (only?) bug17:46
JayFit's certainly the first one lol17:46
prometheanfireno change, https://paste.openstack.org/show/819357 keep in mind the line numbers are for zed17:50
JayFack; I'll have to take a bigger look later17:51
prometheanfirecool, thanks much17:51
prometheanfireJayF: 685 tries to access d_info['image_source'] as well17:55
JayFIt'd be fun to see where/how it breaks if you fixed all the keyerrors18:00
prometheanfiresetting that to None, causes ImageRefValidationFailed :D18:00
prometheanfireScheme-less image href is not a UUID18:00
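A hypothetical illustration of the failure mode discussed here, not Ironic's actual `pxe_utils` code: indexing a dict with a missing key raises `KeyError`, while `dict.get` returns `None` so the caller can skip image validation when no `image_source` is set. The function names and the sample `d_info` contents are assumptions made for the example.

```python
def image_source_or_none(d_info):
    """Return the image_source entry if present, else None (no KeyError)."""
    return d_info.get("image_source")


def needs_image_validation(d_info):
    """Only validate an image reference when one was actually provided."""
    return image_source_or_none(d_info) is not None


# Hypothetical deploy info for a ramdisk boot: kernel and ramdisk only.
ramdisk_info = {"kernel": "file:///tmp/vmlinuz",
                "ramdisk": "file:///tmp/initrd"}

try:
    ramdisk_info["image_source"]        # direct indexing
except KeyError:
    print("direct indexing raises KeyError")

print(needs_image_validation(ramdisk_info))
```

Skipping validation entirely, rather than validating `None`, is what avoids the second error prometheanfire hit after substituting `None`.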
prometheanfireso maybe I'll look for a way to ramdisk boot via glance in the mean time18:00
JayFaight18:01
ashinclouds[m]I don’t remember what we do for tempest unfortunately18:01
* TheJulia fails at laying down and resting the brain for a few minutes18:02
* JayF succeeded in attending two hours of TC PTG without falling asleep once lol18:02
TheJuliaYay18:03
JayFvideo meetings of length are hard, even with periodic breaks18:03
knikollajust wait until the consecutive 4 hours on Thu-Fri :)18:22
JayFIronic has I think 3/4 hour long sessions, we made time for breaks but it's still brutal18:23
TheJuliaJayF: anything w/r/t oauth2?18:51
JayFTheJulia: I did not attend as the person replied on the mail list saying they were punting it to a time that was compatible18:56
JayFI think we can make 1500 UTC tomorrow work, actually18:57
JayFlooking at the Tues schedule and the Weds APAC schedule -- I booked firmware upgrades in 2x places18:58
TheJuliaOh, okay18:58
JayFwhich gives us an extra 30 minutes on Tuesday18:58
JayFhttps://etherpad.opendev.org/p/ironic-bobcat-ptg sanity check before I reply to list18:58
JayF?18:58
* JayF would put the OAUTH 2.0 chat at 1500-1520, then a break, then move everything else +30min19:00
JayFTheJulia: wdyt?19:00
TheJuliacalendaring19:01
TheJuliathat should work I think19:02
TheJuliaJayF: added some comments to the oauth2 etherpad20:44
JayFack20:46
JayFI have opinions along the lines of cheerleading "hooray this sounds awesome" but know very little about OAUTH2 and authentication in general20:47
JayFso happy to take a backseat in the discussion20:47
JayF(it's really nuts how little I know about OAUTH2 considering I used to work for Okta)20:47
TheJuliaack, the spec is very much in line with what is needed20:47
prometheanfireI don't see a way to ramdisk boot from glance20:47
TheJuliaI have concern over how to get the auth details into the client for the session 20:48
TheJuliasince an oauth2 provider can be a SSO portal20:48
* TheJulia sees the wife is creating what looks like a hallway on a UNN Battleship in 3d on the other computer in the office20:48
JayFThe app I worked on at Okta did exactly that20:48
JayFyou'd run a command to login (app auth login) 20:48
JayFit'd open a browser on some platforms; on others it'd just print a URL20:49
TheJuliaprometheanfire: well, that is a bug then :(20:49
JayFclick the URL, copy+paste the result, press enter (or in supported platforms, just interact with the popped-up browser)20:49
prometheanfireoauth2 support will be nice20:49
JayFwe'd have to probably have explicit support in a CLI client for it20:49
TheJuliaI'm pretty sure I've seen a "launch browser window, login, and the token gets extracted back to the caller"20:50
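The first half of the login flow JayF describes (open a browser where possible, otherwise print the URL) can be sketched with Python's standard `webbrowser` module. This is an assumption-laden illustration, not Okta's or any OpenStack client's implementation; `start_login`, its parameters, and the example URL are hypothetical, and the step that hands the token back to the caller is deliberately left out.

```python
import webbrowser


def start_login(auth_url, opener=webbrowser.open, out=print):
    """Send the user to auth_url and report how that was done.

    opener and out are injectable so the flow can be exercised
    without launching a real browser.
    """
    # webbrowser.open returns True when a browser could be launched.
    if opener(auth_url):
        return "browser"
    # No usable browser (for example a headless host): print the URL
    # so the user can open it elsewhere and paste the result back.
    out("Open this URL to log in: " + auth_url)
    return "printed"


# Example with a stand-in opener that always reports failure.
print(start_login("https://sso.example.invalid/authorize",
                  opener=lambda url: False))
```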
JayFprometheanfire: if you're interested, we're talking about it in a cross project session tomorrow @ 1500 UTC in folsom20:50
JayFTheJulia: yep, I've done that multiple places :)20:50
prometheanfireprobably too early for my blood (US/Central)20:50
* TheJulia looks at her US Pacific timezone and thinks she will be well caffeinated20:50
TheJulia:)20:50
TheJulia... as long as I make it to the supermarket after my doctor's appointment20:51
JayFprometheanfire: ...is 1500 UTC not 10am Central?20:51
prometheanfirehmm, actually not too bad, might make it20:51
TheJuliaHmm... the hallway could now be from the Anubis...20:51
TheJuliaAnyway, happy to fix the bug prometheanfire (which might have worked, that is a tricky area of the code unfortunately)20:52
TheJuliaJust might not be immediate20:53
JayFYeah, and also he reported that it seems we broke cleaning on devices that held a software raid20:53
JayFI gotta dig into that to see if it's a regression, a misconfiguration, or "yes"20:53
TheJulia... That I think is intentional20:53
JayFI am 99.9999% sure it had review feedback to be made the not-default20:54
TheJuliamaybe, that was a while ago unfortunately20:54
JayFmaybe I just made it optional?20:54
JayFyeah, it was20:54
sean-k-mooneyJayF: are you still planning on working on the shard key enabling this cycle?21:00
JayFsean-k-mooney: I'd like for that work to proceed (in nova driver) with John Garbutt leading it. I'm working on the backend to make sure he has time to do so. 21:00
sean-k-mooneyack if the code gets written we would be happy to review21:02
sean-k-mooneyi just wanted to make sure that it was still planned to proceed21:02
JayFthe hope is yes; I'll even work on it myself if I have to, but I don't think that'd be ideal :D 21:03
TheJuliadid the spec get approved on the nova side?21:06
sean-k-mooneyit did last cycle21:06
TheJuliak21:06
sean-k-mooneyit technically needs to be proposed again but we have a fast reapproval policy if it's not changed21:06
sean-k-mooneyother than the release names etcetera21:06
* prometheanfire tried to create an ironic bug but storyboard still hates me I think, here's the text I'd put in one https://paste.openstack.org/raw/bf1GtDV8GxUq9qdK59Ir/21:07
sean-k-mooneyso i dont think that will be a blocker just paper work21:07
prometheanfireI tried setting the kernel and ramdisk to be glance image IDs and that failed as well (set image_source in case as well), should that be a separate bug?21:08
sean-k-mooneyso its failng here https://github.com/openstack/ironic/blob/master/ironic/common/pxe_utils.py#L682-L68921:10
prometheanfireya, thereabouts21:10
sean-k-mooneyhttps://github.com/openstack/ironic/commit/7b47e09a385d388854a3c81bc13c333b36c18a3621:10
sean-k-mooneythats what last changed that21:11
sean-k-mooneybut thats the extent of my git foo21:12
prometheanfireI saw that, maybe I should revert more fully to test21:12
prometheanfireand maybe glance will work again when setting kernel=glance_uuid and ramdisk=glance_uuid (not sure if that ever worked though, being honest)21:13
sean-k-mooneyi think it did, at least i only ever remember using glance uuid with ironic21:14
prometheanfiresean-k-mooney: to boot ramdisks nodes (meaning https://docs.openstack.org/ironic/zed/admin/ramdisk-boot.html )?21:15
JayFyeah I think we probably introduced a bug around that21:16
JayFwhen improving anaconda support21:16
JayFlooking at the code, I think there's something there to fix21:17
JayFnot just bad docs21:17
sean-k-mooneyi have only ever really used ironic via nova or kolla-ansible/bifrost so i dont really know how this works in detail21:17
JayFramdisk deploy is one of our cooler features sean-k-mooney 21:18
JayFbasically instead of installing an OS, you just don't. and boot a ramdisk instead.21:18
JayFIt should be a ton simpler, but we managed to break it anyway. Bugs are like that LOL21:18
* prometheanfire wants to boot a harvester cluster, and one of their deploy methods is ramdisk, the other is iso...21:19
sean-k-mooneyah ok so using http boot or similar21:19
prometheanfirekinda, but not really, files still served over tftp21:20
JayFor not :) that's the magic of Ironic21:20
JayFprometheanfire: we *can* do that I believe 21:20
JayFbut you don't have to21:20
sean-k-mooneywell i have pxebooted home servers from a ramdisk before, i just assumed this is either using legacy pxe with tftp or redfish httpboot21:21
sean-k-mooneydepending on what your hardware supports and you have configured21:21
JayFI think we might even support ramdisk deployments where the "ramdisk" is inserted as a virtual media device21:21
sean-k-mooneylike the IPA ramdisk was always just booted over the network so now you're just booting something else right21:21
JayFpretty much-ish? I think there's some additional magic needed because of network switching but I'm not super familiar with the feature21:22
prometheanfireJayF: yep, I wanted to do that, but this server is finicky with ipxe21:22
sean-k-mooneymy issue with these features is the only servers i have that have bmcs at home were made around 201121:22
prometheanfireso new?21:23
sean-k-mooneyso bios only, with ssd connected over sata 221:23
JayFdid we deprecate legacy bios boot yet? I know it was a ptg topic at one point21:24
prometheanfireno, default changed to uefi21:24
sean-k-mooneystill got 2 8 core cpus and 48GB of ram so fine for dev but way too power inefficient at this point to keep running 21:24
JayFNext time I build a desktop, I'm taking my current one (5950x 16c/32t + 64GB ram) and turning it into a 24/7 dev VM host21:25
JayFif I didn't have to reboot into windows to do my streams, or kill VMs to play a game, I'd probably be using it now21:25
sean-k-mooneywhy not stream from linux21:26
prometheanfireJayF: sean-k-mooney with revert still getting the same image_source keyerror21:26
sean-k-mooneyalso i can recommend steam decks they work pretty well21:26
JayFI've encountered bugs, multiple times, where OBS Studio audio meters will be bouncing around like audio is going thru, but nothing is going out the stream21:26
JayFthere's even audio in the recording, but it's not going out to the stream21:26
JayFand it's not every time, but it's frequently enough that when I gotta do streams as part of my job, I can't justify it just randomly not working time to time21:27
JayFsean-k-mooney: I own a steam deck and do 99.999% of my gaming in linux; literally am only rebooting at this point for streaming21:28
* TheJulia whispers ‘instance_info/boot_iso’ to prometheanfire21:29
sean-k-mooneyprometheanfire: what about https://github.com/openstack/ironic/commit/4d653ac225a26dd83e81477f9d1152fe385c64aa#diff-04a54f43c1423729b0eab47a9f7ae102af6b1c68b3f897071362bf898fe3378821:29
prometheanfireTheJulia: work with ipmi driver?21:29
TheJuliasean-k-mooney: my gut feeling is that is where I broke it21:30
sean-k-mooneyim wondering if its related to " if not image_properties:" > " if not image_properties and not isap:"21:30
JayFyeah I think we broke it when we added isap21:30
TheJuliaprometheanfire: yes, as long as the boot_interface is ipxe21:30
JayF...which is the linked commit21:30
TheJuliaAnd there are not other files on the disk21:30
JayFthat was a scary one to review, I remember it still21:30
prometheanfireTheJulia: heh, I guess I'll test21:31
TheJuliaprometheanfire: if everything is in the ramdisk on the iso, it should fire up.21:31
prometheanfireI was having problems with ipxe21:31
sean-k-mooneyanyway im going to go eat and relax o/21:31
* TheJulia goes to meet new doctor21:32
prometheanfiresean-k-mooney:  yarp, I'll test reverting both patches21:32
* prometheanfire is about to go pick up baby21:32
prometheanfireI can confirm that with that commit reverted it's attempting to deploy21:50
prometheanfirenow, have to run before it finishes, it's downloading the ramfs now (to cache)21:51
TheJuliaJayF: deprecating legacybios was a question at one point, but we had some vendors say they will support it forever22:42
TheJuliaPlus there are downsides with the other hardware platforms22:45
JayFmakes sense23:00
TheJuliaprometheanfire: ack ack, thanks for the data point23:36
