Nick | Message | Time |
---|---|---|
TheJulia | so, the boot failed once | 00:00 |
TheJulia | in that grenade test | 00:00 |
TheJulia | which resulted in that console log | 00:01 |
TheJulia | and part of it I think is due to the default change to have blended v6 and v4 networking by default in a clean devstack | 00:01 |
TheJulia | I bet, if we just recheck it it might actually be happy | 00:01 |
TheJulia | It got something for v6, but not v4 | 00:02 |
TheJulia | the other network boot attempts just worked just fine | 00:02 |
JayF | ack, ty | 00:08 |
TheJulia | interesting | 00:08 |
JayF | I'm actively working on a CI improvement right now :D | 00:08 |
JayF | (the IPA gentoo ramdisk) | 00:08 |
JayF | I'd say it's going slightly worse than expected but I'll get there :D | 00:08 |
TheJulia | so, I can see in the logs when the dhcp config got setup, vm powered on | 00:08 |
TheJulia | it just didn't know what to do | 00:08 |
TheJulia | I think we're sort of looking at default weakness in network booting | 00:09 |
TheJulia | I don't see a dhcp request ever hitting neutron/dnsmasq | 00:10 |
TheJulia | for that last vm boot | 00:10 |
TheJulia | .... | 00:10 |
TheJulia | qemu powered the node on at 21:58:02.... neutron didn't process the dhcp update until a minute later | 00:14 |
JayF | It really is unfortunate we can't have ipxe automatically reboot and retry in those failure cases | 00:17 |
JayF | IRL that'd potentially be useful too | 00:17 |
JayF | I guess I'm assuming it'd pxe boot again | 00:17 |
TheJulia | we have code in ironic to cycle it, but this is inspector and it has no clue in those cases | 00:17 |
JayF | more reason to get that thing dead-dead this cycle | 00:17 |
opendevreview | Verification of a change to openstack/ironic master failed: Update Node Cache after Successful Clean/Service https://review.opendev.org/c/openstack/ironic/+/936009 | 00:17 |
JayF | cid I think has the inspector rules stuff ready for review so we're getting close | 00:17 |
TheJulia | Yeah, Ironic confirmed power on at 21:58:07 | 00:18 |
JayF | and thus one of the oldest bugs in Ironic+Neutron integrations rears its head again (that's the sorta thing getting real callbacks would fix, yes?) | 00:18 |
TheJulia | so | 00:19 |
TheJulia | this is actually ironic-conductor managed at this point, it might be able to have enough time set to power-cycle the node | 00:19 |
TheJulia | but we logged we had set neutron at 21:57:31 | 00:19 |
TheJulia | so.... neutron didn't complete the thing until 21:59:15... ugh | 00:20 |
TheJulia | Yeah, completing callbacks would totally fix this | 00:21 |
JayF | the sad thing is that's like, obviously a bug and a bad interaction. 100% chance it never happens in real life in any meaningful way | 00:21 |
JayF | how much more stable would our CI be if we made sushy-tools take 5 minutes to reboot instead of doing it instantly? | 00:21 |
JayF | it'd certainly reflect real-life behavior. | 00:21 |
TheJulia | Yeah, neutron logged port update behavior complete at 21:59:15 on req-c09d103e-13be-4af6-a5fb-602da729862f | 00:26 |
TheJulia | Best I can tell, neutron went semi-out-to-lunch | 00:26 |
TheJulia | it logged other stuff related to the test, a while later, but I haven't tried to figure out how far behind it was | 00:26 |
JayF | if one of us could write a high quality spec on that, I could probably get someone to look at it next cycle ... but that's basically saying "does someone else want to do half the work?" :D | 00:27 |
TheJulia | I think the challenge always was "how do I stop and resume at this particular point" | 00:28 |
TheJulia | we lose context | 00:28 |
TheJulia | and we need to restart | 00:28 |
TheJulia | uhhhhhhhh | 00:28 |
TheJulia | I can't design that right now | 00:28 |
JayF | I think Sean had some ideas in this direction | 00:28 |
TheJulia | it's actually trivial in the existing flow except we *have* to have all other backend service interaction work done before we drop the original request context on the floor | 00:29 |
TheJulia | so maybe we might need to move some image stuff up before dhcp, dunno | 00:29 |
JayF | you make that sound almost doable :) | 00:30 |
TheJulia | it is actually doable, just I can't spend spoons on it right now | 00:31 |
JayF | you may have just stirred up the snow enough to cause an avalanche :D | 00:31 |
JayF | I'm going to make a note on my board to look at this. Won't hurt to spend some time at least trying to understand the shape of the problem better. | 00:32 |
JayF | hah, got it. The first IPA DIB ramdisk that isn't the size of montana is gzipping right now. | 00:33 |
JayF | errored because it was looking in the wrong spot for a kernel, but 371M gentoo-ipa.initramfs | 00:34 |
JayF | that's gzipped | 00:34 |
JayF | not great, but not a horrible starting point given the only optimization I'm doing is uninstalling a few things | 00:34 |
JayF | and with that, I need to go spend some time with Vanessa. You have a good night o/ | 00:35 |
TheJulia | good plan, goodnight | 00:38 |
cardoe | Well now I remembered that I wanted to fix sushy’s pbr usage. | 01:13 |
opendevreview | Merged openstack/ironic master: Update Node Cache after Successful Clean/Service https://review.opendev.org/c/openstack/ironic/+/936009 | 02:42 |
opendevreview | Verification of a change to openstack/ironic master failed: dedup reboot request in redfish bios path https://review.opendev.org/c/openstack/ironic/+/933020 | 05:11 |
opendevreview | Merged openstack/ironic master: dedup reboot request in redfish bios path https://review.opendev.org/c/openstack/ironic/+/933020 | 08:25 |
rpittau | good morning ironic! o/ | 08:29 |
kubajj | good morning rpittau! o/ | 09:22 |
rpittau | hey kubajj :) | 09:22 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic stable/2024.2: Update Node Cache after Successful Clean/Service https://review.opendev.org/c/openstack/ironic/+/937107 | 10:32 |
iurygregory | good morning ironic o/ | 10:32 |
kubajj | Maybe I have a stupid question: How do the deploy steps in IPA work by default? I thought that priority 0 meant that the step is not executed, but at https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L2675-L2713 all the deploy steps (except the last one) have priority 0 🤷 | 12:19 |
rpittau | kubajj: that's just a template for the generic hardware manager, those steps can be requested though during deployment using deploy templates | 12:58 |
rpittau | this is also valid for clean steps btw on the cleaning phase | 12:58 |
rpittau | this guide gives some good info for custom steps https://docs.openstack.org/ironic/latest/admin/node-deployment.html | 12:58 |
kubajj | rpittau: thanks so much | 12:59 |
opendevreview | Takashi Kajinami proposed openstack/ironic master: Replace crypt module https://review.opendev.org/c/openstack/ironic/+/937173 | 13:00 |
opendevreview | Takashi Kajinami proposed openstack/ironic-python-agent master: Replace crypt module https://review.opendev.org/c/openstack/ironic-python-agent/+/937175 | 13:04 |
opendevreview | Takashi Kajinami proposed openstack/ironic-python-agent master: Replace crypt module https://review.opendev.org/c/openstack/ironic-python-agent/+/937175 | 13:04 |
opendevreview | Doug Goldstein proposed openstack/sushy master: drop runtime dependency on pbr https://review.opendev.org/c/openstack/sushy/+/937183 | 14:05 |
cardoe | I pinged a few of you on the above ^. I wanted to ping rpittau as well but gerrit was mad at me. | 14:08 |
cardoe | It's a behavior change to sushy version reporting. But I wanted feedback from everyone because I would make the same change to the rest of the ironic projects. It removes the need for pbr in runtime. Which removes the installation of setuptools into virtualenvs. Which today is not an explicit dependency in a lot of places so it makes some packaging quirks/issues. | 14:09 |
cardoe | The biggest difference I can see in the parsing code from what we do is that importlib.metadata.version doesn't strip a leading "v" or "V" in the version while pbr does. | 14:10 |
cardoe | So if we don't do that then we can just use that function. | 14:10 |
JayF | cardoe: what's the value of that change? I rather like that we outsource the details of that to a well-supported library | 14:15 |
JayF | I've written kind of the opposite of this change in an Oslo pr | 14:16 |
cardoe | So I can't find it but we're only using released versions of pbr somewhere. | 14:16 |
cardoe | And pbr doesn't have a release that works on Python 3.12 right now. | 14:16 |
cardoe | The problem is with dependencies and virtualenvs. pbr tries to import a couple of things without declaring them as dependencies. | 14:17 |
cardoe | Previously it was all magically installed but they're installing less and less into virtualenvs now. | 14:18 |
cardoe | At the end of the day the only thing in the runtime case of pbr that's used is to grab the version info. | 14:19 |
cardoe | After a few hoop jumps, if you're on Python 3.8 the resultant code is identical to calling that built in importlib function with 2 differences. | 14:20 |
cardoe | The first is that a leading "v" or "V" is stripped off and the second is that the version string is reassembled according to SemVer 3.0.0 (which isn't really a thing. pbr forked the SemVer 2.0.0 spec and made some changes to it and called it SemVer 3.0.0) | 14:25 |
cardoe | SemVer 2.0.0 had some incompatibilities with PEP-440 (Python's versioning PEP). But PEP-440 has been superseded in https://packaging.python.org/en/latest/specifications/version-specifiers/ | 14:25 |
cardoe | From my reading of SemVer 3.0.0, which calls out its differences, I'm able to find all those changes in the Python Packaging Version Specifiers spec. | 14:26 |
cardoe | importlib.metadata.version has always implemented the Python Packaging Version Specifiers behavior per the docs. | 14:27 |
cardoe | My mentality here is that "less is more". We (OpenStack community) have a big maintenance burden of items. If there's something available in the stdlib that's now functionally the same, let's use it. Especially when there's not even enough dev cycles to get pbr released. | 14:28 |
cardoe | </soapbox> | 14:29 |
cardoe | pbr is still great at building and packaging | 14:29 |
TheJulia | good morning | 14:36 |
* TheJulia | tries to wake up | 14:36 |
JayF | cardoe: there is a difference, I can't summon up what it is at 6:30 a.m. local time. I'll try to find it, but I think you're basically using only one source whereas PBR will look in two places | 14:39 |
JayF | Also, if PBR needs python 3.12 support, it should be added pretty quickly... | 14:39 |
cardoe | https://review.opendev.org/c/openstack/pbr/+/924216 | 14:42 |
cardoe | https://opendev.org/openstack/pbr/src/commit/46ff9dd96718cdefe72a1a01447e14491917217b/pbr/version.py#L467 is the place where it grabs the version info. | 14:43 |
cardoe | I'll also go the other way and fix up stuff around pbr. | 14:47 |
TheJulia | JayF: speaking of neutron and weirdness.... :) | 14:50 |
JayF | Yeah, I just struggle with changes like this which bring us less in line with how people generally use things... It means we're going to have to fix it ourselves if it changes in a later python version, versus getting it for free if we continue | 14:50 |
JayF | with PBR | 14:50 |
JayF | cardoe: just above that it'll use pkg_resources if needed too, right? | 14:55 |
rpittau | cardoe: I appreciate the effort, I'm aware of the challenge of PBR support for 3.12, but I'd rather wait for that to happen, looks like it will be soon | 15:06 |
rpittau | unless there's something really breaking or blocking us :) | 15:06 |
cardoe | JayF: yeah it'll use pkg_resources but there's no dependency on it so it blows itself up :) | 15:07 |
JayF | TheJulia: I was thinking you were going somewhere with that comment lol | 15:11 |
TheJulia | heh, see mailing list :) | 15:11 |
JayF | we need to do those changes ourselves :( | 15:13 |
JayF | anyone want me to hold approval of https://review.opendev.org/c/openstack/ironic-specs/+/931025 (kea dhcp) for additional approval? myself and riccardo are +2 | 15:14 |
TheJulia | I can take a look later today if you want | 15:22 |
TheJulia | I'm not so sure we need to | 15:36 |
JayF | I want it merged :) | 15:36 |
TheJulia | then merge it | 15:36 |
JayF | wfm | 15:36 |
JayF | just always give option for extra eyes with specs :D | 15:37 |
TheJulia | Our hash ring loops and is based upon conductor side entries, not time inside of wsgi | 15:45 |
TheJulia | conductor side entries in the db | 15:45 |
JayF | well I mean more generally migrating off eventlet-requiring wsgi stuuf | 15:45 |
JayF | **stuff | 15:45 |
JayF | I have resolved this cycle to get more things to 100% done and less things started and half-done, so I'm not touching it until I get less things partially done, but I'm makin' progress | 15:46 |
opendevreview | Merged openstack/ironic-specs master: Add a Kea DHCP backend https://review.opendev.org/c/openstack/ironic-specs/+/931025 | 15:47 |
JayF | So, question; I just had a feature ask from my downstream. Nova has an ability to setup population of vendor_data via an external web service. I was wondering if we would have any interest in an ironic-side implementation of that service and/or some hook in the virt driver to put some ironic-provided vendor data into place | 15:47 |
JayF | Example: the end-user needs to know what rack/switch they are connected to for performance tuning reasons to compare with other machines -- they don't need to dictate it, they just need to know. This information is inspected by Ironic, but needs to be exposed to the instance. | 15:48 |
TheJulia | I'd kind of be curious what the use case is | 15:48 |
JayF | My thought: either 1) an Ironic-adjacent project implements the rest service Nova wants or 2) we look at adding a method to the virt driver interface in nova allowing the driver to populate some vendor data as well | 15:49 |
TheJulia | That seems... semi-reasonable | 15:49 |
JayF | TheJulia: basically just that ^^ "expose to the instance user some useful data about the node found in inspection" | 15:49 |
JayF | in my case; it's all topological information we care about | 15:49 |
JayF | but I could easily see that including other information we could find in inspection, too | 15:49 |
TheJulia | They could find most of that out themselves | 15:50 |
TheJulia | but topological is a little harder | 15:50 |
JayF | yeah, topological is the piece we care about in this case | 15:50 |
JayF | and it's populated in inspection downstream | 15:50 |
JayF | honestly the more people use integrated openstack to build other cloud platforms on top the more useful this kinda feature is likely to be | 15:53 |
TheJulia | Indeed | 15:56 |
rpittau | good night! o/ | 17:03 |
cardoe | JayF: I'll pivot a different way then. | 17:03 |
JayF | yeah sorry for being fun police there, I hate nak'ing a patch someone already wrote :( | 17:04 |
JayF | we've just been bitten by that type of change in the past, basically taking on maintenance work for Ironic that's usually handled by the group | 17:04 |
cardoe | It's fine. I do believe ultimately all that should be retired. | 17:05 |
JayF | oof, just kernel modules on the dist-kernel for gentoo are like, 500 MB | 17:09 |
JayF | how the hell is tinyipa so small?! | 17:09 |
cardoe | you strip them? | 17:13 |
dtantsur | Because it barely includes these modules :) | 17:13 |
dtantsur | Same problem with any DIB-built images: a lot of modules and firmware | 17:13 |
JayF | I'm going to look and see what modules are in the tinyipa image | 17:13 |
JayF | and manually prune the stuff in gentoo image | 17:13 |
JayF | even ironic-python-agent itself, the venv, is somehow larger than the entire tinyipa image | 17:14 |
JayF | I don't know how that's even possible | 17:14 |
JayF | I'm now wondering if this whole project is folly :( | 17:15 |
JayF | the status quo is no good, but it's also created a bar that's nearly impossible to clear | 17:15 |
opendevreview | Julia Kreger proposed openstack/ironic master: docs: clarification around setting port llc data https://review.opendev.org/c/openstack/ironic/+/937193 | 17:16 |
JayF | oh, I'm also comparing uncompressed sizes to compressed sizes | 17:17 |
TheJulia | JayF: also, in IPA we prune out some stuff pretty aggressively on the firmware/modules side. | 17:18 |
TheJulia | err, with dib builds | 17:18 |
JayF | yeah, those are running on the gentoo build, too | 17:18 |
JayF | paths are the same I checked | 17:18 |
TheJulia | oh, goodie | 17:18 |
TheJulia | no need for huge 256MB compressed firmware blobs which actually just have a nested kernel inside of them | 17:18 |
JayF | got a good piece of advice from #gentoo-chat: I forgot the gentoo pdb contains a readable listing of installed files for a package | 17:19 |
JayF | so I should be able to write a function that, given a gentoo package name, removes it without package manager assistance | 17:19 |
JayF | ...meaning I can also remove the package manager :D | 17:19 |
TheJulia | hehehe | 17:19 |
JayF | yeah I am slightly heartened, I'll keep digging | 17:19 |
JayF | really the ideal way to do this is write something more debootstrap-style, with an external portage modifying the image inside, but that's more change than I wanna write today | 17:20 |
opendevreview | Merged openstack/ironic master: docs: note ipv6 is a good idea with neutron interface https://review.opendev.org/c/openstack/ironic/+/936951 | 17:38 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937206 | 20:14 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937206 | 20:45 |
cardoe | Hard | 20:54 |
cardoe | JayF: nothing to maintain. We are just reading and dumping out the VERSION in the pkg metadata. | 20:55 |
cardoe | But I’ll add the necessary dependencies so pbr works. | 20:56 |
JayF | I am still trying to remember | 21:00 |
JayF | there's something version-y that pbr does | 21:00 |
JayF | that other things don't | 21:00 |
JayF | maybe getting versions from git? | 21:00 |
JayF | or deriving them from git? | 21:00 |
JayF | adamcarthur5: are your i-t-p changes ready for review now? | 21:06 |
cardoe | Yes it derives them from git for the build. I’m a +1 on pbr for builds. But -1 on runtime. | 21:08 |
adamcarthur5 | JayF i-t-p? | 21:08 |
JayF | ironic-tempest-plugins | 21:08 |
JayF | egad; ironic-lib json-rpc uses an eventlet-based wsgi server | 21:09 |
JayF | another place we'll have to excise it :( | 21:09 |
adamcarthur5 | Ah, well, I have nothing ready for review at present | 21:10 |
adamcarthur5 | I'll stick ironic-prio-weekly on it when I do | 21:10 |
JayF | ah, too bad, that looked good, you just waiting on ci or something else | 21:10 |
cardoe | I need to install easy_install just to “import ironic” which is gross. | 21:10 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937206 | 21:11 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937206 | 21:12 |
adamcarthur5 | CI + I will abandon the conductor + node firmware ones and build on top of the generator one JayF | 21:13 |
adamcarthur5 | No point merging changes when I have a new pattern for them | 21:13 |
JayF | yeah that's what I was asking, if 937206 was r4r | 21:13 |
adamcarthur5 | r4r? :^) | 21:15 |
adamcarthur5 | Sorry I feel like I should be getting these but I have no idea. | 21:15 |
JayF | ready for review | 21:16 |
JayF | any number of these may just be stuff I have made up lol | 21:16 |
JayF | it's not your fault I'm speaking in riddles | 21:16 |
JayF | /nick Sphinx | 21:16 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/allocations https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937213 | 21:18 |
opendevreview | Adam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937214 | 21:22 |
TheJulia | I often speak in riddles of context. At some point it is a responsibility! | 21:31 |
adamcarthur5 | JayF I am leaving for the day, so let's just pray CI passes and say that the changes are good for review: https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/937206 | 21:32 |
adamcarthur5 | It's just one relation-chain now | 21:32 |
cardoe | Ugh. I think we are hitting https://bugs.launchpad.net/nova/+bug/2019977 | 21:54 |
cardoe | You recall Jay what you and Sean discussed on there? | 21:55 |
JayF | this, I believe, is what TheJulia was talking about yesterday with me, what's breaking the inspector grenade job | 22:09 |
JayF | and it sounds like from the list it's just being more-often-surfaced due to a neutron issue | 22:09 |
JayF | but I'm not 100% sure and cannot look in detail | 22:09 |
TheJulia | no | 22:11 |
TheJulia | inspector's grenade job was being broken by neutron going out to literal lunch for over a minute | 22:11 |
TheJulia | mark's bug is over the issue that nova insists on deletion of resources and value detachment ownership when Ironic has to do it as part of its flow to ensure it is done because nova is not the only user. The case can be hit when a node can't exit the initial lock quickly which breaks the flow on the nova side | 22:12 |
TheJulia | that can be a result of resource contention on the conductor, though | 22:12 |
TheJulia | or, maybe, yes | 22:13 |
TheJulia | depends on what, exactly, but there is a difference. That being said, if we're hitting underlying instance performance issues, we likely need to disjoint the issues | 22:14 |
cardoe | So if we delete in a test loop it hits and the second time it’s good. | 22:14 |
TheJulia | Anyway, I need to run to an in-person meeting | 22:14 |
TheJulia | well, there is a fundamental disagreement over responsibility | 22:15 |
TheJulia | and then we get into statements asserting ironic has no right to do a thing | 22:15 |
TheJulia | and... the conversation goes sideways because different worlds exist | 22:15 |
cardoe | :/ | 22:17 |
TheJulia | that being said, if we're seeing 2 minutes for the lock to get released, sounds like CI resource contention is near max | 22:19 |
TheJulia | we should check to see if that is actually the case, if that makes sense | 22:20 |
TheJulia | stepping away for appt | 22:20 |
cardoe | Oh not the project’s tests. This is one of our internal tests. | 22:22 |
TheJulia | oh, then I'm totally not understanding | 23:57 |
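
For readers following the pbr thread (cardoe, 14:09 to 14:29), here is a minimal sketch of the runtime lookup being described: read the version from installed package metadata with the stdlib `importlib.metadata.version`, and strip a leading "v" or "V" as pbr is said to do. The helper names `strip_leading_v` and `runtime_version`, and the `"unknown"` fallback, are hypothetical illustrations and are not taken from the sushy patch or from pbr. The SemVer 3.0.0 reassembly step mentioned in the log is not reproduced here.

```python
from importlib.metadata import PackageNotFoundError, version


def strip_leading_v(raw: str) -> str:
    """Drop one leading 'v' or 'V' (hypothetical helper mirroring the
    normalisation pbr is described as applying in the log)."""
    return raw[1:] if raw[:1] in ("v", "V") else raw


def runtime_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed version from package metadata only.

    No pbr and no git lookup is involved; if the distribution is not
    installed, fall back to `default` (an assumption of this sketch).
    """
    try:
        return strip_leading_v(version(dist_name))
    except PackageNotFoundError:
        return default
```

Calling `runtime_version("sushy")` in an environment where sushy is installed would return whatever version string its metadata records; deriving a version from git, which cardoe notes pbr does at build time, is outside what this sketch covers.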