opendevreview | Vanou Ishii proposed openstack/sushy master: Fix missing ETag when patching Redfish resource https://review.opendev.org/c/openstack/sushy/+/892113 | 02:28 |
---|---|---|
*** dmellado81918134 is now known as dmellado8191813 | 04:48 | |
rpittau | good morning ironic! o/ | 06:40 |
kubajj | Good morning Ironic | 07:29 |
opendevreview | Dmitry Tantsur proposed openstack/bifrost master: Remove Fedora from the CI https://review.opendev.org/c/openstack/bifrost/+/892123 | 07:46 |
dtantsur | FYI folks ^^^ | 07:46 |
rpittau | :( | 08:14 |
arne_wiebalck | Good morning, Ironic! | 08:25 |
rpittau | hey arne_wiebalck :) | 08:31 |
rpittau | hey kubajj :) | 08:31 |
arne_wiebalck | hey rpittau o/ | 08:31 |
arne_wiebalck | rpittau: do you think recheck'ing my patch is worth a try? otherwise I am a little lost how to tackle Zuul's lack of approval ... cookies worked in the past :thinking: | 08:32 |
arne_wiebalck | rpittau: https://review.opendev.org/c/openstack/ironic-python-agent/+/891609 | 08:33 |
rpittau | arne_wiebalck: let's try with another recheck for now | 08:42 |
rpittau | btw my CS9 patch for metalsmith passed, so moving to CS9 soon | 08:43 |
arne_wiebalck | rpittau: thanks! | 09:31 |
opendevreview | Mahnoor Asghar proposed openstack/ironic master: Add inspection (processing) hooks https://review.opendev.org/c/openstack/ironic/+/887554 | 11:00 |
iurygregory | good morning Ironic | 11:40 |
TheJulia | good morning | 13:13 |
iurygregory | good morning TheJulia | 13:14 |
arne_wiebalck | rpittau: same metalsmith jobs failing after a recheck ... I had another look, but do not see the connection to my patch ... bdist_wheel complaints and dropped connections are the errors I see in the logs | 14:01 |
arne_wiebalck | hey iurygregory and TheJulia o/ | 14:01 |
iurygregory | arne_wiebalck, o/ | 14:01 |
rpittau | arne_wiebalck: I think the problem is related to this message 'mount: can't setup loop device: No such device or address' but not sure why that's happening | 14:10 |
rpittau | although I think we saw that in the past | 14:10 |
TheJulia | dtantsur: is that going to be backported or just a new change to remove CI jobs in older branches? | 14:11 |
arne_wiebalck | rpittau: thanks for checking once more | 14:12 |
arne_wiebalck | rpittau: I only searched for error, not failure :) | 14:12 |
arne_wiebalck | rpittau: do other jobs suffer from this, or only my patch? | 14:12 |
rpittau | it's all patches AFAICS | 14:12 |
dtantsur | TheJulia: the fedora one? I'm afraid it will since nodepool changes affect all branches. | 14:14 |
dtantsur | Since the job is non-voting, we can of course just ignore it.. | 14:14 |
TheJulia | well, we ought to at least backport changes to remove it from our jobs/config/etc | 14:15 |
dtantsur | to be clear, the change does not remove any actual functionality, so if someone has it working, it will keep working for now | 14:15 |
TheJulia | okay, then just our jobs/config for CI then | 14:17 |
arne_wiebalck | rpittau: thanks ... and "phew" :-D | 14:18 |
opendevreview | Julia Kreger proposed openstack/ironic master: DNM Enable OVN https://review.opendev.org/c/openstack/ironic/+/885087 | 14:21 |
iurygregory | fyi I won't be able to join our weekly meeting today | 14:30 |
rpittau | arne_wiebalck: I think I found the issue, need to do one more test | 14:30 |
arne_wiebalck | rpittau: oh, rly? that would be great ofc ... what do you suspect as the cause? | 14:31 |
opendevreview | Riccardo Pittau proposed openstack/metalsmith master: Use jammy nodes to run CI jobs https://review.opendev.org/c/openstack/metalsmith/+/892146 | 14:32 |
rpittau | wellllll... this ^ :D | 14:32 |
arne_wiebalck | rpittau: :-D | 14:33 |
rpittau | we're still running on focal there and I'm afraid there could be some issues with the latest tinycore build we're using | 14:34 |
rpittau | or just focal nodes not working well in general | 14:34 |
arne_wiebalck | +1 ... will retry once this one is merged | 14:35 |
TheJulia | file under semi-crazy idea: https://bugs.launchpad.net/ironic/+bug/2032380 | 14:39 |
dtantsur | TheJulia: we may get some interest in that soon (will pm you a link) | 14:40 |
dtantsur | TheJulia: I'm +2 to the whole proposal, except that I'm rather +0 on the "nota bene" | 14:41 |
opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent master: [DNM] test ci metalsmith integration jobs https://review.opendev.org/c/openstack/ironic-python-agent/+/892147 | 14:41 |
rpittau | arne_wiebalck: if you keep an eye on this ^ should tell if it works | 14:42 |
opendevreview | Mahnoor Asghar proposed openstack/ironic master: Add inspection (processing) hooks https://review.opendev.org/c/openstack/ironic/+/887554 | 14:43 |
arne_wiebalck | rpittau: +1 | 14:43 |
arne_wiebalck | rpittau: thanks! | 14:43 |
opendevreview | Mahnoor Asghar proposed openstack/ironic master: Add inspection (processing) hooks https://review.opendev.org/c/openstack/ironic/+/887554 | 14:45 |
TheJulia | dtantsur: it is just a note, nothing more | 14:45 |
JayF | TheJulia: my only response is surprise we don't already do that | 14:47 |
JayF | which usually means it's a good suggestion :D | 14:47 |
TheJulia | we needed the ability to send a url ?which? I think lines up to 2019-ish timeframe | 14:47 |
TheJulia | I haven't looked at the dmtf schema rev dates recently though | 14:48 |
dtantsur | yeah, it was a later addition | 14:48 |
TheJulia | and 2020.... was... 2020 | 14:49 |
dtantsur | dark times, we're not talking about them here | 14:52 |
JayF | 2020, you mean that old ABC news show, right? | 14:55 |
JayF | there is no 2020 other than that I know of | 14:55 |
JayF | :D | 14:55 |
JayF | #startmeeting ironic | 15:00 |
opendevmeet | Meeting started Mon Aug 21 15:00:40 2023 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
opendevmeet | The meeting name has been set to 'ironic' | 15:00 |
TheJulia | o/ | 15:00 |
kubajj | o/ | 15:00 |
opendevreview | Julia Kreger proposed openstack/ironic master: DNM Enable OVN https://review.opendev.org/c/openstack/ironic/+/885087 | 15:01 |
JayF | Good morning Ironic'ers! | 15:01 |
JayF | A reminder we operate under the OpenInfra Foundation CoC https://openinfra.dev/legal/code-of-conduct | 15:01 |
JayF | #topic Announcements/Reminders | 15:01 |
dtantsur | o/ | 15:01 |
JayF | #note Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash | 15:01 |
JayF | I'm also going to note that | 15:01 |
JayF | #note Bobcat non-client library freeze is Thursday, Aug 24 | 15:01 |
JayF | Finally, one about the PTG | 15:02 |
JayF | #note PTG is virtual and taking place October 23-27 2023 | 15:02 |
JayF | #link https://etherpad.opendev.org/p/ironic-ptg-october-2023 | 15:02 |
JayF | please use the etherpad to chat about topics of interest for the PTG | 15:02 |
JayF | Any comments/questions on the announcements, or anything to add? | 15:03 |
* dtantsur has nothing | 15:04 | |
JayF | I'm going to skip the next item; we do not have action items from the last meeting to follow up on. | 15:04 |
JayF | #topic Review Ironic CI Status | 15:04 |
JayF | We have a couple of CI-related items on the agenda I wanna let folks know about before we get into general status | 15:04 |
JayF | frickler brought it to our attention in IRC Friday afternoon that Ironic is one of the projects left with the most zuul config errors | 15:05 |
TheJulia | so, apparently our power just dropped, I don't know how much longer we'll be on | 15:05 |
JayF | this is basically when CI is so broken that zuul can't even read the config (usually it means we haven't had any patches pass testing since the zuul queue change months maybe year+ ago) | 15:06 |
JayF | #link https://od42.de/ironic | 15:06 |
JayF | that link doesn't work for me, but if I use the filter manually it's obvious we have old bugfix and stable branches plagued by the issue | 15:06 |
rpittau | o/ (man I'm late) | 15:07 |
JayF | heck, it looks like python-ironicclient gates as recent as yoga are impacted | 15:07 |
JayF | dtantsur: TheJulia: Is it one of your teams that use bugfix/ branches downstream? | 15:07 |
JayF | That's one of the big pieces of info I'm missing: if these are bugfix branches which ones can we nuke | 15:07 |
dtantsur | JayF: we used to rely on them; no longer I think | 15:08 |
TheJulia | Mine does not | 15:08 |
dtantsur | not even for ancient releases, right rpittau? | 15:08 |
JayF | Ack, let me take this action then | 15:08 |
TheJulia | I *believe* there was a list made ~1 year ago which enumerated a ton of branches that could be dropped | 15:08 |
rpittau | JayF dtantsur: not anymore no | 15:08 |
JayF | #action JayF to audit zuul-config-errors, propose retirement of clearly-abandoned branches and try to fix broken ones | 15:08 |
frickler | JayF: seems I need to do some URL quoting on that redirect :-/ | 15:08 |
rpittau | we plan to use bugfix for metal3 upstream but only the latest 1-2 | 15:08 |
JayF | frickler: yeah, it happens but it's obvious what's broken :) | 15:08 |
dtantsur | okay, so as far as OCP is concerned, we're fine with going back to short-lived bugfix branches | 15:09 |
rpittau | yep | 15:09 |
dtantsur | metal3 - as rpittau said (thanks!) | 15:09 |
rpittau | btw I need to propose new bugfix branches this week :D | 15:09 |
JayF | Would someone who isn't me mind hitting the list with a "bugfix branch update" saying some of this | 15:09 |
JayF | with a proposal for how long they should live, etc? | 15:09 |
JayF | I don't wanna just guess and rugpull, but it's obvious right now we keep 'em up for too long | 15:09 |
rpittau | I thought we've updated already our docs ? | 15:09 |
dtantsur | We can simply get back to what I proposed in the spec back then | 15:09 |
dtantsur | and yeah, the docs | 15:09 |
JayF | ack | 15:10 |
frickler | JayF: fixed | 15:10 |
JayF | to summarize my understanding of the existing docs: bugfix branches are retired when their letter'd counterpart go out | 15:10 |
JayF | frickler: oooh very nice | 15:10 |
JayF | lol stable/pike NGS | 15:10 |
rpittau | the bugfix branches last for at most 6 months | 15:10 |
JayF | that screams "jay didn't retire this with the rest" :( | 15:11 |
rpittau | then they can get pulverized | 15:11 |
JayF | ack; that works for me | 15:11 |
JayF | So the other CI-related item we have is | 15:11 |
dtantsur | yeah, 6 months matches my recollection | 15:11 |
JayF | After the chat in IRC last week about janders's change and not getting tested on real hardware | 15:11 |
JayF | I reached out to the HPE team, they claim to have fixed HPE Third Party CI. | 15:11 |
JayF | https://review.opendev.org/c/openstack/ironic/+/889750 one of their examples, has a run on it | 15:12 |
JayF | #note HPE Third Party CI is functioning again. | 15:12 |
dtantsur | \o/ | 15:12 |
rpittau | nice | 15:12 |
JayF | Is there anything generically about CI we need to speak about? | 15:13 |
JayF | I think other than the endless sqlite locking battles we've been cleaner than usual? | 15:13 |
rpittau | the metalsmith src jobs are busted at the moment | 15:13 |
rpittau | this impacts ipa CI | 15:13 |
JayF | ack | 15:14 |
rpittau | I think it depends on the fact that they still use focal | 15:14 |
rpittau | so proposed https://review.opendev.org/c/openstack/metalsmith/+/892146 | 15:14 |
JayF | that makes sense to me, and is a forced migration anyway | 15:14 |
JayF | we shouldn't release "B" with it on focal anyway | 15:14 |
rpittau | yep | 15:14 |
JayF | Thanks for looking into that. | 15:15 |
JayF | Are there any other outstanding CI items? | 15:15 |
rpittau | also CS9 jobs in metalsmith -> https://review.opendev.org/c/openstack/metalsmith/+/869374 | 15:15 |
JayF | landed that one just now | 15:15 |
rpittau | great, thanks! | 15:15 |
JayF | Do we have a userbase for metalsmith? | 15:15 |
JayF | I feel like I only ever hear about it when CI is broken | 15:15 |
JayF | I assume CERN since arne_wiebalck has some activity on it? | 15:16 |
arne_wiebalck | nope | 15:16 |
rpittau | mmm not sure, maybe TheJulia or hjensas know ? | 15:16 |
arne_wiebalck | I just ran into it since zuul is not happy with my raid rebuild patch | 15:16 |
JayF | Interesting. OK. Maybe I'll add that to PTG topics. | 15:16 |
TheJulia | sorry what might I know? | 15:16 |
JayF | TheJulia: if we have any known users of metalsmith | 15:17 |
TheJulia | Just RHOSP | 15:17 |
TheJulia | afaik | 15:17 |
dtantsur | When I created it, I hoped that people start using it just in general, as a handy CLI | 15:17 |
dtantsur | Maybe I was naive, and we need something equal in ironicclient (and the backing API)... | 15:17 |
JayF | RHOSP is a pretty big user of it then :D | 15:17 |
JayF | dtantsur: it's possible for all those things to be true at the same time :D | 15:18 |
JayF | dtantsur: metalsmith can blaze a trail, we can use that to figure out how to make it work in primary clients/apis | 15:18 |
dtantsur | the planned but never implemented Deployment API was the next logical step for me | 15:18 |
JayF | that's good to know though, I just want someone to have a use case for it, RHOSP totally counts | 15:18 |
JayF | dtantsur: maybe toss that on PTG topics and we can res it? | 15:18 |
JayF | nobody is going to make time to do it if we don't talk about it and hype it up | 15:19 |
TheJulia | I added Metalsmith to the ptg topic list | 15:19 |
JayF | I can be your hype man dtantsur | 15:19 |
dtantsur | I don't see a point in that. We've had these discussions over and over again. | 15:19 |
dtantsur | heh | 15:19 |
dtantsur | Until someone has a vested interest, it just does not happen... | 15:19 |
TheJulia | Problem is, at least in my circles, it gets viewed as this "alternative to ironic" or "replacement of ironic" | 15:19 |
TheJulia | and people don't really grok that it is just a client | 15:19 |
dtantsur | lol | 15:19 |
TheJulia | yeah | 15:19 |
JayF | Maybe the answer from PTG is gonna be to make better docs out of it :) | 15:20 |
JayF | I was talking to kubajj this morning about how doing non-nova Ironic deploys is not very intuitive | 15:20 |
TheJulia | I think the real issue is tons of people don't know how to actually *use* ironic | 15:20 |
dtantsur | Or decide how we can decompose metalsmith into smaller bits and gradually merge | 15:20 |
JayF | and AFAICT we lack a directive doc on how to do it exactly | 15:20 |
TheJulia | even though there are videos, pages, everything else | 15:20 |
JayF | TheJulia: YES | 15:20 |
dtantsur | OH YEAH | 15:20 |
dtantsur | instance_info anyone? | 15:20 |
TheJulia | almost like we need a class | 15:20 |
dtantsur | Who does not like JSON fields without schema validation? | 15:20 |
TheJulia | most people I talk to don't even think along those lines, it is a big scary thing they just don't understand in general | 15:21 |
dtantsur | If the most basic thing Ironic is doing needs to be taught... we're losing :( | 15:21 |
JayF | Well, we don't always like to frame Ironic this way | 15:21 |
JayF | but Ironic deployments are super easy to do... if you have nova in front | 15:21 |
TheJulia | it might just be resistance to information because they have no need to touch it because that is not their primary role | 15:21 |
dtantsur | By the way, the outreachy season is coming. If we have an easy win, we can try proposing it. | 15:22 |
JayF | this is a secondary use case and we've always treated it as a secondary use case, whether that's right/wrong/etc | 15:22 |
JayF | dtantsur: I will have an MLH intern around that season too | 15:23 |
JayF | dtantsur: but we'd need rough docs to be able to get an intern to curate it into not-terrible docs | 15:23 |
dtantsur | We have rough docs, no? | 15:23 |
JayF | dtantsur: I'll note: incoming interns is also why I'm working on contributor guide updates now (probably Tues you'll see a post with a radical improvement on our ironic-in-devstack docs) | 15:23 |
JayF | dtantsur: if so I couldn't find them in 10 minutes while working /kuba | 15:23 |
JayF | *w/kuba | 15:23 |
dtantsur | is https://docs.openstack.org/ironic/latest/user/index.html what you mean? | 15:24 |
dtantsur | particularly, https://docs.openstack.org/ironic/latest/user/deploy.html | 15:24 |
JayF | that is exactly the doc I was looking for earlier | 15:24 |
JayF | dtantsur+++++ | 15:24 |
JayF | kubajj: ^^ fyi I'll also slack the link to you | 15:24 |
dtantsur | I've spent quite some time on this document, but I'm sure it can be improved much further | 15:25 |
dtantsur | especially the configdrive explanation is lacking | 15:25 |
JayF | yeah I got feedback about this stuff being confusing from a lost-potential-user the other day too | 15:25 |
kubajj | JayF: I was reading this in the morning but got some error, thought it might be just bifrost | 15:25 |
* TheJulia wonders what is the attention span window we should be focusing on | 15:25 | |
TheJulia | "what is documentation in the tiktok generation" might be another way of framing that mental musing | 15:26 |
dtantsur | heh | 15:27 |
rpittau | lol | 15:27 |
JayF | I think that is based on a flawed premise: we're not doing a good job of making docs discoverable for the altavista generation either ;) | 15:27 |
dtantsur | We have a lot of vague concepts. Like instance_info itself. | 15:27 |
rpittau | should we do a "bare metal deployment dance" ? | 15:27 |
JayF | we have a large number of docs, it's borderline impossible to know which one you need | 15:27 |
JayF | and the vague concepts like dtantsur points out makes it hard to know what to search for | 15:27 |
dtantsur | Well, dunno. I think "User Guide" is a pretty natural place to look | 15:28 |
* JayF was convinced at SCALE20x by a librarian that we need one | 15:28 | |
dtantsur | I'd be more worried that people run away screaming after reading the Installation Guide :D | 15:28 |
JayF | I'm going to add a note at PTG about this, maybe we can take a swing at it or at least think about it in the intervening time | 15:29 |
JayF | well Julia beat me to it, but it's on that doc :D | 15:29 |
JayF | moving on | 15:30 |
JayF | #topic Review ongoing 2023.2 workstreams | 15:30 |
TheJulia | doc topic added to ptg etherpad | 15:30 |
JayF | #link https://etherpad.opendev.org/p/IronicWorkstreams2023.2 | 15:30 |
JayF | It's too early to fully declare victory | 15:31 |
JayF | but this has been a crazy productive cycle it seems | 15:31 |
JayF | so much impactful stuff landing and in progress | 15:31 |
TheJulia | Where are we at on the nova side of shards key usage? | 15:32 |
JayF | testing and positive feedback to johnthetubaguy | 15:33 |
JayF | then I think he goes begging for reviews | 15:33 |
JayF | I was struggling with devstack I got it working, so I should have an env to test that in this week | 15:33 |
JayF | TheJulia: unless you have time and want to dedicate time to it, let me commit to doing that test on Tues | 15:34 |
JayF | TheJulia: then we can free that up for John and hopefully he lands it | 15:34 |
JayF | Any other questions/comments/discussions on in-progress work streams? | 15:35 |
TheJulia | I can re-review, I think the last time I looked at the code I had high confidence in it | 15:35 |
JayF | same | 15:35 |
JayF | I just want to actually test it | 15:35 |
TheJulia | Since it is all well walked pattern changes | 15:36 |
TheJulia | lets sync after the meeting on it | 15:36 |
JayF | ack, going to move on | 15:36 |
JayF | Nothing listed for RFE Review; skipping that section. | 15:36 |
JayF | #topic Open Discussion | 15:36 |
JayF | I had one item for here: | 15:36 |
JayF | PTL and TC nominations are open. I strongly encourage Ironic contributors to run for PTL and/or TC. If you're interested in being PTL talk to me. | 15:37 |
JayF | If nobody else has self-nominated for PTL by midweek, I will re-nominate myself for a third term. | 15:37 |
JayF | That's all I had, just wanted to draw attention there. | 15:38 |
JayF | Anything else for open discussion? | 15:38 |
dtantsur | Democracy is good, letting Jay get a break is even better! | 15:38 |
dtantsur | Go people go! | 15:38 |
dtantsur | So, yes, one funny bug | 15:38 |
dtantsur | https://bugs.launchpad.net/ironic/+bug/2032377 was brought to my attention by my fellow operator | 15:38 |
JayF | Eh, I don't mind being PTL tbh; I just appreciate that we cycle leadership and don't wanna break tradition :) | 15:38 |
dtantsur | it's stupidly simple, but I have no idea how to work around it cleanly | 15:38 |
JayF | we can't really leave cloud-init AND glean installed on the same IPA image | 15:39 |
JayF | that's the bug there, yeah? | 15:39 |
dtantsur | nope | 15:39 |
TheJulia | it *sounds* like the image has a pre-baked config drive | 15:40 |
TheJulia | and we don't find it | 15:40 |
dtantsur | Imagine we're cleaning a node that had a configdrive. And IPA has a configdrive. | 15:40 |
TheJulia | and we create a new one | 15:40 |
TheJulia | and *boom* | 15:40 |
TheJulia | oh | 15:40 |
JayF | dtantsur: ah, in IPA world we should never ever not ever respect config on disk | 15:40 |
JayF | that is potentially a security bug | 15:40 |
dtantsur | Right. But we do. | 15:40 |
TheJulia | so it is when the ramdisk boots, it finds/attaches the config drive data embedded in the iso | 15:41 |
JayF | we'd almost need glean to have an option to filter block devices and/or look for a different label | 15:41 |
TheJulia | and doesn't unmount it for operations it sounds like? | 15:41 |
dtantsur | TheJulia: still simpler. Glean is looking for a configdrive. There are two: the one it should use (in the CD) and the old one on disk. | 15:42 |
TheJulia | the one on the disk shouldn't be a block device... | 15:42 |
TheJulia | yet | 15:42 |
TheJulia | wut | 15:42 |
JayF | for cleaning | 15:42 |
JayF | yeah? | 15:42 |
dtantsur | TheJulia: how can it be NOT a block device? | 15:42 |
TheJulia | it would need to be attached to the loopback to become a device | 15:42 |
dtantsur | JayF: cleaning is the biggest problem; after the cleaning the rogue partition will be gone. | 15:42 |
JayF | or on a not-cleaned device doing a second deploy | 15:42 |
TheJulia | we need to look at the simple-init code and if we can get a ramdisk log that would be super helpful | 15:43 |
dtantsur | TheJulia: you're talking about a file; configdrive is a partition (on disk) or a whole device (CD) | 15:43 |
TheJulia | OH | 15:43 |
TheJulia | the whole CD is labeled config-2 | 15:43 |
TheJulia | wut | 15:43 |
dtantsur | yep, that's how DHCP-less works in Ironic | 15:43 |
JayF | that is how ISO-based configdrives work in VM | 15:43 |
JayF | as well | 15:43 |
TheJulia | I didn't realize that was how vmedia ramdisk worked | 15:44 |
dtantsur | TheJulia: not always, only when Node.network_data is used | 15:44 |
TheJulia | OH | 15:44 |
TheJulia | wheeeeeeeeeeeeee | 15:45 |
JayF | Yeah, I agree that's a nasty bug. | 15:46 |
JayF | I agree I don't know how we fix it without changes in glean. | 15:46 |
JayF | AND service steps will increase the scope of the bug | 15:46 |
dtantsur | yeah. maybe we should talk to ianw, I can drop him an email | 15:47 |
TheJulia | I think we don't have enough information to fully comprehend the bug since if they have a pre-existing configuration drive, and one based on the image itself, it is sort of a case we never expected | 15:47 |
TheJulia | ++ | 15:47 |
JayF | even like glean-use-part-uuid=AAAA-BBBB-CCCC-DDDD | 15:48 |
TheJulia | since we expected the node to be cleaned, but this is an instance as a ramdisk with its own config drive | 15:48 |
JayF | in the kernel command line | 15:48 |
dtantsur | TheJulia: we do actually. IPA has a configdrive because it's how DHCP-less works. The disk has a configdrive because it's cleaning after deployment. | 15:48 |
JayF | just some way for Ironic to signal to glean "use this one" | 15:48 |
JayF | yeah that's why the bug is tricky; both configdrives are valid just not in the same context | 15:48 |
TheJulia | yeah, but we know it at that point, they are doing it themselves outside of cleaning, which is how I'm reading the bug | 15:48 |
dtantsur | mmm? | 15:49 |
TheJulia | hmm | 15:49 |
TheJulia | we really need to talk with the filer of the bug and ask questions | 15:49 |
dtantsur | I talk to him all the time but I don't know which questions you have in mind | 15:50 |
dtantsur | We know what is going on, we don't know how to fix it | 15:50 |
TheJulia | Well, I'm confused on what exactly they are doing | 15:50 |
dtantsur | 1) Normal deployment with DHCP-less IPA; 2) Instance tear down; 3) Boom | 15:50 |
dtantsur | 3) *Boom with DHCP-less cleaning | 15:51 |
JayF | Configure node.network_data. Deploy the node. Clean the node. At clean time both the original deployed configdrive AND the IPA-node.network_data configdrive exists. | 15:51 |
TheJulia | so, our ipa ramdisk should have the smarts to know where to boot from, and not hunt to the OS disks when booted that way | 15:51 |
TheJulia | *should* being the keyword there | 15:52 |
JayF | yep, that's what dtantsur and I were talking about with glean | 15:52 |
JayF | b/c we'll need to tell glean exactly which partition to help it dedupe | 15:52 |
TheJulia | ... or just explicitly run it | 15:52 |
dtantsur | for contrast, that's the current glean's logic: https://opendev.org/opendev/glean/src/branch/master/glean/init/glean-early.sh#L34-L38 | 15:52 |
opendevreview | Merged openstack/bifrost master: Remove Fedora from the CI https://review.opendev.org/c/openstack/bifrost/+/892123 | 15:52 |
TheJulia | yeah, that is semi-problematic | 15:53 |
dtantsur | TheJulia: running Glean manually is non-trivial since it gets triggered from udev | 15:53 |
TheJulia | yeah | 15:53 |
JayF | I mean, couldn't we do something like: | 15:54 |
JayF | 1) Disable glean from autorun, via udev and everything else | 15:54 |
JayF | 2) On IPA startup, look for ipa-network-data=blah and if it exists, do some mounting then run LN 53 from glean-early.sh? | 15:55 |
TheJulia | glean uses integrated network interface enumeration | 15:55 |
JayF | so you can't run it late? | 15:55 |
TheJulia | not really | 15:55 |
JayF | yeah, then we need glean to get some sorta hint | 15:55 |
JayF | that says "no really, this configdrive" | 15:55 |
dtantsur | You probably can, but it's going to happen quite late | 15:55 |
dtantsur | e.g. currently IPA is After:network-online | 15:55 |
JayF | good point | 15:55 |
JayF | we'd basically have to write a separate unit, at which point we've reinvented the wheel | 15:56 |
dtantsur | We could have our own service that goes Before | 15:56 |
dtantsur | right | 15:56 |
* JayF votes for kernel cli or on-disk glean config that points it explicitly at a partition uuid | 15:56 | |
JayF | since we should know the partition uuid at create time, yeah? | 15:56 |
dtantsur | okay, lemme talk to Ian, maybe he has an opinion too | 15:56 |
dtantsur | JayF: we can use /dev/sr0 really.. | 15:57 |
JayF | dtantsur: that is going to potentially vary based on hardware | 15:57 |
dtantsur | also true | 15:57 |
JayF | dtantsur: which is why I'd prefer a uuid-based approach | 15:57 |
JayF | Aight, we have 3 minutes left | 15:57 |
JayF | any items remain for open discussion? | 15:57 |
kubajj | I have a quick question regarding the docs. Is there any preferred location where I should describe the hierarchy of kernel/ramdisk parameters? I did not find the current state described anywhere. | 15:57 |
clarkb | you can manually invoke the glean script with the right network info | 15:58 |
clarkb | thats all the udev systemd integration does | 15:58 |
JayF | kubajj: doc/source/install/configure-glance-images.rst | 15:58 |
JayF | kubajj: from a cursory running of rg deploy_ramdisk doc/ | 15:59 |
clarkb | you could also potentially use fancier udev rules to do what you want. udev is magic but also indecipherable | 15:59 |
dtantsur | clarkb: possibly, but then we need to stop using simple-init | 15:59 |
dtantsur | .. which may not be a terrible thing because then we can include whatever we invent in all ramdisks (currently it's opt-in) | 15:59 |
* JayF hears the bells ring for the top of the hour | 15:59 | |
JayF | #endmeeting | 15:59 |
opendevmeet | Meeting ended Mon Aug 21 15:59:54 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:59 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-21-15.00.html | 15:59 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-21-15.00.txt | 15:59 |
opendevmeet | Log: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-21-15.00.log.html | 15:59 |
JayF | We can keep chatting obviously but no need to log our solutioning :D | 16:00 |
JayF | fyi; I'm out of the office (well, at least out off-and-on) for the remainder of the day; if you need me for something send a message and I'll get around to it | 16:00 |
* TheJulia goes and looks for her voltmeter to check her electrical after this hurricane | 16:04 | |
rpittau | good night! o/ | 16:04 |
dtantsur | clarkb: do you think it would be acceptable for glean to optionally read the configdrive source from kernel params? | 16:05 |
dtantsur | a label or a UUID, dunno yet | 16:05 |
clarkb | currently glean looks for the right device labels. I could see making that configurable I guess | 16:06 |
clarkb | probably a good idea to cross check with cloud-init to see if they do something similar yet and mimic that if so | 16:07 |
dtantsur | yep, makes sense | 16:07 |
dtantsur | clarkb: https://cloudinit.readthedocs.io/en/latest/reference/datasources/nocloud.html seems to be quite relevant | 16:11 |
dtantsur | or https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v1.html#network-config-v1 | 16:13 |
clarkb | I don't think we want anything quite so flexible in glean. It very specifically does config drive and that is it | 16:13 |
clarkb | I think we would reject anything like that. config drive is the system. If we need to use different labels for coordination purposes that is probably fine though | 16:14 |
dtantsur | yeah, it's probably the easiest and the least invasive way. | 16:14 |
clarkb | it is interesting that cloud-init uses a different hard coded label though | 16:14 |
dtantsur | clarkb: it's for NoCloud. For ConfigDrive, they support the same labels. | 16:15 |
clarkb | rather than accepting a config option for that. I guess using the local fs may be too late though hence the kernel boot param? | 16:15 |
clarkb | right | 16:15 |
dtantsur | clarkb: the local fs is a bit problematic for us because we cannot affect it in runtime so easily in Ironic | 16:15 |
clarkb | aha. In that case checking boot params for a configdrivelabel attribute (or similar) seems reasonable. If not present we default to the current name(s) else use the supplied value | 16:16 |
dtantsur | yep | 16:16 |
opendevreview | Jakub Jelinek proposed openstack/ironic master: Introduce default kernel/ramdisks by arch https://review.opendev.org/c/openstack/ironic/+/890819 | 16:20 |
JayF | Can we, instead of label, specifically use partition uuid? | 16:23 |
JayF | I guess we could use a uuid as a label, but it just seems like we do not want that to be a static string that is guessable by the tenant on the instance | 16:23 |
clarkb | JayF: config drive currently explicitly uses labels. I'd like to stick to config drive as much as possible here | 16:23 |
clarkb | I mean you can check uuids but config drive isn't built that way | 16:23 |
clarkb | so its extra code to write and test in a difficult to test and debug part of these systems | 16:24 |
JayF | This is explicitly a case of disambiguation though, this would be two config drives, both validly labeled, and indicating which one is the one to use | 16:24 |
clarkb | right, maybe take that up with nova ? | 16:24 |
JayF | But either way, my only meaningful concern is that we're able to use a nonpredictable string as label or UUID or whatever | 16:24 |
dtantsur | JayF, clarkb, that's what I wrote on the bug: https://bugs.launchpad.net/ironic/+bug/2032377/comments/1. Makes sense? | 16:24 |
JayF | Nah this one is entirely our bug 🫥 | 16:25 |
JayF | Using config drive for IPA while there's also a config drive on the instance is not so fun | 16:25 |
clarkb | well nova says a config drive is a specific label | 16:25 |
opendevreview | Merged openstack/metalsmith master: Add centos9 based job https://review.opendev.org/c/openstack/metalsmith/+/869374 | 16:25 |
clarkb | which is ambiguous if two fses have the same label | 16:25 |
dtantsur | clarkb: note that one of the configdrives is on the service ramdisk ISO | 16:25 |
JayF | dtantsur: that looks good | 16:25 |
dtantsur | the final instance does not see it | 16:25 |
clarkb | I just selfishly want to avoid writing code that has to deal with uuids and labels because debugging glean is super painful | 16:26 |
dtantsur | clarkb: it's not too bad, just instead of LABEL=... here https://opendev.org/opendev/glean/src/branch/master/glean/init/glean-early.sh#L44 you'd have UUID= | 16:27 |
dtantsur | in the proposal, we're going to pass it for you even | 16:27 |
TheJulia | wheee looks like one of our solar arrays is down | 16:28 |
clarkb | I guess that isn't too bad since we lean on blkid. May still create debugging paths that are different but unlikely as long as system tools are reliable | 16:28 |
clarkb | mostly I want to be careful we don't diverge from what a config drive is too much | 16:29 |
clarkb | currently it is defined as a hardcoded label on a fs. Making that configurable isn't too much of a stretch | 16:29 |
JayF | One positive thing to note, is that ironic relies heavily on those system tools being reliable already so we should know if something happens that would likely break glean | 16:29 |
clarkb | basically glean is a highly opinionated config drive only instance configurator. If you need fancy features cloud-init is where you should look | 16:30 |
clarkb | I think "please use device label=foobar" isn't too much of a stretch here. And probably use this uuid isn't either given how straightforward that appears to be | 16:30 |
clarkb | TheJulia: hopefully flooding hasn't affected you too much? | 16:30 |
TheJulia | well, a train just derailed at the bridge nearby | 16:31 |
TheJulia | so....... | 16:31 |
TheJulia | yeah | 16:31 |
dtantsur | Oo | 16:31 |
clarkb | oof | 16:31 |
TheJulia | yeah, the highway became a literal river | 16:31 |
clarkb | ya I saw a river is running over I10 | 16:31 |
TheJulia | I'm not far from that actually | 16:32 |
dtantsur | There has been quite a bit of water in Moscow as well: https://t.me/ostorozhno_novosti/18847 https://t.me/sotaproject/64760 | 16:34 |
TheJulia | good news, and bad news | 16:42 |
TheJulia | Good: OVN DHCP w/v4 works \o/ Bad: Downloads through OVN stall out: https://d1260bf3e4063ee5a28c-2b5476c4783b3d94c184e1ce73ec8a2b.ssl.cf5.rackcdn.com/885087/48/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool/037d878/controller/logs/ironic-bm-logs/node-0_console_2023-08-21-16%3A02%3A49_log.txt | 16:42 |
opendevreview | Merged openstack/ironic master: Retool sqlite retries https://review.opendev.org/c/openstack/ironic/+/891333 | 16:43 |
hjensas | TheJulia: That is with https://review.opendev.org/c/openstack/ironic/+/885087 ? | 18:41 |
hjensas | On OVN topic, I talked to the neutron folks today, and agreed to have a go at testing https://review.opendev.org/c/openstack/neutron/+/890683 (DHCPv6)w/OVN. | 18:45 |
TheJulia | hjensas: yes, got a pcap and trying to figure out what exactly is going on. I've sort of got a hunch, but I'm trying to understand why I have like 16kB packets | 18:49 |
TheJulia | in my pcap | 18:49 |
hjensas | ok, I am going to use your patch locally - start with v4 and then switch it to v6 and add that neutron patch. | 18:53 |
hjensas | TheJulia: let me know if you figure out why it can't download the kernel/ramdisk. I'll dig as well if I see the same issue locally. | 18:55 |
TheJulia | pcap file https://usercontent.irccloud-cdn.com/file/b4v3L20Y/tap-node-0i1-a.pcap | 18:55 |
TheJulia | So it seems to be ovn, and eventually I lose a packet somehow | 18:56 |
TheJulia | ... I wonder if ovn is just overflowing a buffer | 18:56 |
TheJulia | hjensas: changed out the networking to libvirt bridge against brbm ... i.e. not vepa mode, and can reproduce the exact same behavior where the connection just stalls out | 22:19 |
TheJulia | it looks like we might be losing packets somewhere between the br-ex bridge and the actual port | 22:35 |
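
Editor's note on the config drive discussion (16:13 to 16:30): the idea floated there is to check the kernel boot parameters for an override, and fall back to the current hardcoded label when none is present. The sketch below illustrates that selection logic only. It is not glean's code. The parameter names `configdrivelabel` and `configdriveuuid` are hypothetical (the log only says "configdrivelabel attribute (or similar)"), and the default label `config-2` is assumed from the usual OpenStack config drive convention. The returned strings are in the `NAME=value` form that `blkid -t` accepts.

```python
# Hypothetical sketch: choose how to locate the config drive from the
# kernel command line. Parameter names below are assumptions, not an
# agreed interface.

DEFAULT_LABELS = ("config-2",)  # assumed default config drive label


def parse_cmdline(cmdline):
    """Return key=value pairs from a kernel command line as a dict.

    Bare flags such as "ro" or "quiet" are ignored.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def config_drive_selectors(cmdline, default_labels=DEFAULT_LABELS):
    """Return blkid-style selectors, most specific first.

    A UUID override wins over a label override; with neither present,
    the default label(s) are used.
    """
    params = parse_cmdline(cmdline)
    uuid = params.get("configdriveuuid")    # hypothetical parameter
    if uuid:
        return ["UUID=" + uuid]
    label = params.get("configdrivelabel")  # hypothetical parameter
    if label:
        return ["LABEL=" + label]
    return ["LABEL=" + name for name in default_labels]


if __name__ == "__main__":
    # On a real system the input would come from /proc/cmdline.
    print(config_drive_selectors("ro quiet configdrivelabel=example-label"))
```

Preferring the UUID over the label reflects JayF's concern that the identifier should not be a guessable static string; keeping the default path unchanged reflects clarkb's wish not to diverge from what a config drive is.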