Thursday, 2024-12-05

TheJuliaso, the boot failed once00:00
TheJuliain that grenade test00:00
TheJuliawhich resulted in that console log00:01
TheJuliaand part of it I think is due to the default change to have blended v6 and v4 networking by default in a clean devstack00:01
TheJuliaI bet, if we just recheck it it might actually be happy00:01
TheJuliaIt got something for v6, but not v400:02
TheJuliathe other network boot attempts just worked just fine00:02
JayFack, ty00:08
TheJuliainteresting00:08
JayFI'm actively working on a CI improvement right now :D 00:08
JayF(the IPA gentoo ramdisk)00:08
JayFI'd say it's going slightly worse than expected but I'll get there :D00:08
TheJuliaso, I can see in the logs when the dhcp config got setup, vm powered on00:08
TheJuliait just didn't know what to do00:08
TheJuliaI think we're sort of looking at default weakness in network booting00:09
TheJuliaI don't see a dhcp request ever hitting neutron/dnsmasq00:10
TheJuliafor that last vm boot00:10
TheJulia....00:10
TheJuliaqemu powered the node on at 21:58:02.... neutron didn't process the dhcp update until a minute later00:14
JayFIt really is unfortunate we can't have ipxe automatically reboot and retry in those failure cases00:17
JayFIRL that'd potentially be useful too00:17
JayFI guess I'm assuming it'd pxe boot again00:17
TheJuliawe have code in ironic to cycle it, but this is inspector and it has no clue in those cases00:17
JayFmore reason to get that thing dead-dead this cycle00:17
opendevreviewVerification of a change to openstack/ironic master failed: Update Node Cache after Successful Clean/Service  https://review.opendev.org/c/openstack/ironic/+/93600900:17
JayFcid I think has the inspector rules stuff ready for review so we're getting close00:17
TheJuliaYeah, Ironic confirmed power on at 21:58:0700:18
JayFand thus one of the oldest bugs in Ironic+Neutron integrations rears its head again (that's the sorta thing getting real callbacks would fix, yes?)00:18
TheJuliaso00:19
TheJuliathis is actually ironic-conductor managed at this point, it might be able to have enough time set to power-cycle the node00:19
TheJuliabut we logged we had set neutron at 21:57:3100:19
TheJuliaso.... neutron didn't complete the thing until 21:59:15... ugh00:20
TheJuliaYeah, completing callbacks would totally fix this00:21
JayFthe sad thing is that's like, obviously a bug and a bad interaction. 100% chance it never happens in real life in any meaningful way00:21
JayFhow much more stable would our CI be if we made sushy-tools take 5 minutes to reboot instead of doing it instantly?00:21
JayFit'd certainly reflect real-life behavior.00:21
TheJuliaYeah, neutron logged port update behavior complete at 21:59:15 on req-c09d103e-13be-4af6-a5fb-602da729862f00:26
TheJuliaBest I can tell, neutron went semi-out-to-lunch00:26
TheJuliait logged other stuff related to the test, a while later, but I haven't tried to figure out how far behind it was00:26
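Note: the timestamps quoted above can be lined up to see how far behind neutron was. This is only a sketch of the arithmetic. The event names are made-up labels, and it assumes all four timestamps come from the same clock on the same day.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps quoted in the conversation; the keys are illustrative labels.
events = {
    "ironic_set_neutron_dhcp": "21:57:31",
    "qemu_power_on": "21:58:02",
    "ironic_confirmed_power_on": "21:58:07",
    "neutron_port_update_complete": "21:59:15",
}


def seconds_between(start, end):
    """Whole seconds from start to end, both given as HH:MM:SS strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


done = events["neutron_port_update_complete"]
gaps = {
    name: seconds_between(stamp, done)
    for name, stamp in events.items()
    if name != "neutron_port_update_complete"
}

for name, gap in gaps.items():
    print(f"{name}: {gap}s before neutron finished the port update")
```

On these numbers the VM had been powered on for over a minute before the DHCP update completed, which is consistent with no DHCP request reaching dnsmasq during that boot.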
JayFif one of us could write a high quality spec on that, I could probably get someone to look at it next cycle ... but that's basically saying "does someone else want to do half the work?" :D 00:27
TheJuliaI think the challenge always was "how do I stop and resume at this particular point"00:28
TheJuliawe lose context00:28
TheJuliaand we need to restart00:28
TheJuliauhhhhhhhh00:28
TheJuliaI can't design that right now00:28
JayFI think Sean had some ideas in this direction00:28
TheJuliait's actually trivial in the existing flow except we *have* to have all other backend service interaction work done before we drop the original request context on the floor00:29
TheJuliaso maybe we might need to move some image stuff up before dhcp, dunno00:29
JayFyou make that sound almost doable :) 00:30
TheJuliait is actually doable, just I can't spend spoons on it right now00:31
JayFyou may have just stirred up the snow enough to cause an avalanche :D 00:31
JayFI'm going to make a note on my board to look at this. Won't hurt to spend some time at least trying to understand the shape of the problem better.00:32
JayFhah, got it. The first IPA DIB ramdisk that isn't the size of montana is gzipping right now.00:33
JayFerrored because it was looking in the wrong spot for a kernel, but 371M gentoo-ipa.initramfs00:34
JayFthat's gzipped00:34
JayFnot great, but not a horrible starting point given the only optimization I'm doing is uninstalling a few things00:34
JayFand with that, I need to go spend some time with Vanessa. You have a good night o/00:35
TheJuliagood plan, goodnight00:38
cardoeWell now I remembered that I wanted to fix sushy’s pbr usage.01:13
opendevreviewMerged openstack/ironic master: Update Node Cache after Successful Clean/Service  https://review.opendev.org/c/openstack/ironic/+/93600902:42
opendevreviewVerification of a change to openstack/ironic master failed: dedup reboot request in redfish bios path  https://review.opendev.org/c/openstack/ironic/+/93302005:11
opendevreviewMerged openstack/ironic master: dedup reboot request in redfish bios path  https://review.opendev.org/c/openstack/ironic/+/93302008:25
rpittaugood morning ironic! o/08:29
kubajjgood morning rpittau! o/09:22
rpittauhey kubajj :)09:22
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic stable/2024.2: Update Node Cache after Successful Clean/Service  https://review.opendev.org/c/openstack/ironic/+/93710710:32
iurygregorygood morning ironic o/10:32
kubajjMaybe I have a stupid question: How do the deploy steps in IPA work by default? I thought that priority 0 meant that the step is not executed, but at https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L2675-L2713 all the deploy steps (except the last one) have priority 0 🤷12:19
rpittaukubajj: that's just a template for the generic hardware manager, those steps can be requested though during deployment using deploy templates12:58
rpittauthis is also valid for clean steps btw on the cleaning phase12:58
rpittauthis guide gives some good info for custom steps https://docs.openstack.org/ironic/latest/admin/node-deployment.html12:58
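Note: the priority rule discussed above can be modelled in a few lines. This is a simplified sketch, not Ironic's or IPA's actual code: the step names are hypothetical placeholders, and the only behaviour modelled is that priority 0 means "not run by default" while an explicit request (for example from a deploy template) can give a step a non-zero priority.

```python
def steps_to_run(default_steps, requested_steps=()):
    """Return step names in execution order, highest priority first.

    default_steps: dicts with "step" and "priority", as a hardware manager
        might advertise them (hypothetical shape).
    requested_steps: dicts with the same keys that override the defaults,
        standing in for steps requested at deploy time.
    """
    priorities = {s["step"]: s["priority"] for s in default_steps}
    for s in requested_steps:
        priorities[s["step"]] = s["priority"]
    # Priority 0 means the step is disabled unless someone asks for it.
    enabled = [(prio, name) for name, prio in priorities.items() if prio > 0]
    return [name for prio, name in sorted(enabled, key=lambda e: -e[0])]


# Hypothetical step names, for illustration only.
defaults = [
    {"step": "example_configure_raid", "priority": 0},
    {"step": "example_inject_files", "priority": 0},
    {"step": "example_final_step", "priority": 10},
]
requested = [{"step": "example_configure_raid", "priority": 50}]

print(steps_to_run(defaults))
print(steps_to_run(defaults, requested))
```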
kubajjrpittau: thanks so much12:59
opendevreviewTakashi Kajinami proposed openstack/ironic master: Replace crypt module  https://review.opendev.org/c/openstack/ironic/+/93717313:00
opendevreviewTakashi Kajinami proposed openstack/ironic-python-agent master: Replace crypt module  https://review.opendev.org/c/openstack/ironic-python-agent/+/93717513:04
opendevreviewTakashi Kajinami proposed openstack/ironic-python-agent master: Replace crypt module  https://review.opendev.org/c/openstack/ironic-python-agent/+/93717513:04
opendevreviewDoug Goldstein proposed openstack/sushy master: drop runtime dependency on pbr  https://review.opendev.org/c/openstack/sushy/+/93718314:05
cardoeI pinged a few of you on the above ^. I wanted to ping rpittau as well but gerrit was mad at me.14:08
cardoeIt's a behavior change to sushy version reporting. But I wanted feedback from everyone because I would make the same change to the rest of the ironic projects. It removes the need for pbr in runtime. Which removes the installation of setuptools into virtualenvs. Which today is not an explicit dependency in a lot of places so it makes some packaging quirks/issues.14:09
cardoeThe biggest difference I can see in the parsing code from what we do is that importlib.metadata.version doesn't strip a leading "v" or "V" in the version while pbr does.14:10
cardoeSo if we don't do that then we can just use that function.14:10
JayFcardoe: what's the value of that change? I rather like that we outsource the details of that to a well-supported library14:15
JayFI've written kind of the opposite of this change in an Oslo pr14:16
cardoeSo I can't find it but we're only using released versions of pbr somewhere.14:16
cardoeAnd pbr doesn't have a release that works on Python 3.12 right now.14:16
cardoeThe problem is with dependencies and virtualenvs. pbr tries to import a couple of things without declaring them as dependencies.14:17
cardoePreviously it was all magically installed but they're installing less and less into virtualenvs now.14:18
cardoeAt the end of the day the only thing in the runtime case of pbr that's used is to grab the version info.14:19
cardoeAfter a few hoop jumps, if you're on Python 3.8 the resultant code is identical to calling that built in importlib function with 2 differences.14:20
cardoeThe first is that leading "v" and "V" is stripped off and the second is that the version string is reassembled according to SemVer 3.0.0 (which isn't really a thing. pbr forked the SemVer 2.0.0 spec and made some changes to it and called it SemVer 3.0.0)14:25
cardoeSemVer 2.0.0 had some incompatibilities with PEP-440 (Python's versioning PEP). But PEP-440 has been superseded in https://packaging.python.org/en/latest/specifications/version-specifiers/ 14:25
cardoeFrom my reading of SemVer 3.0.0, which calls out its differences, I'm able to find all those changes in the Python Packaging Version Specifiers spec.14:26
cardoeimportlib.metadata.version has always implemented the Python Packaging Version Specifiers behavior per the docs.14:27
cardoeMy mentality here is that "less is more". We (OpenStack community) have a big maintenance burden of items. If there's something available in the stdlib that's now functionally the same. Let's use it. Especially when there's not even enough dev cycles to get pbr released.14:28
cardoe</soapbox>14:29
cardoepbr is still great at building and packaging14:29
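Note: a minimal sketch of the runtime lookup described above, using only the standard library (Python 3.8+). It is not the code from the sushy patch. The helper names and the fallback value are assumptions, and only the leading "v"/"V" difference is modelled; pbr's SemVer-style reassembly is left out.

```python
from importlib.metadata import PackageNotFoundError, version


def strip_leading_v(raw):
    """Drop a single leading 'v' or 'V', the one normalisation modelled here."""
    return raw[1:] if raw[:1] in ("v", "V") else raw


def runtime_version(dist_name, default="0.0.0"):
    """Read the installed distribution's version from package metadata.

    Falls back to `default` (an arbitrary placeholder) when the distribution
    is not installed, for example when running from an unpackaged checkout.
    """
    try:
        return strip_leading_v(version(dist_name))
    except PackageNotFoundError:
        return default


if __name__ == "__main__":
    # Prints the installed version of sushy, or the placeholder if absent.
    print(runtime_version("sushy"))
```

If the leading-"v" handling is not needed, `runtime_version` reduces to a direct call to `importlib.metadata.version`.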
TheJuliagood morning14:36
* TheJulia tries to wake up14:36
JayFcardoe: there is a difference, I can't summon up what it is at 6:30 a.m. local time. I'll try to find it, but I think you're basically using only One source whereas PBR will look in two places14:39
JayFAlso, if PBR needs python 3.12 support, it should be added pretty quickly...14:39
cardoehttps://review.opendev.org/c/openstack/pbr/+/92421614:42
cardoehttps://opendev.org/openstack/pbr/src/commit/46ff9dd96718cdefe72a1a01447e14491917217b/pbr/version.py#L467 is the place where it grabs the version info.14:43
cardoeI'll also go the other way and fix up stuff around pbr.14:47
TheJuliaJayF: speaking of neutron and weirdness.... :)14:50
JayFYeah, I just struggle with changes like this which bring us less in line with how people generally use things... It means we're going to have to fix it ourselves if it changes in a later python version, versus getting it for free if we continue14:50
JayFwith PBR14:50
JayFcardoe: just above that it'll use pkg_resources if needed too, right?14:55
rpittaucardoe: I appreciate the effort, I'm aware of the challenge of PBR support for 3.12, but I'd rather wait for that to happen, looks like it will be soon15:06
rpittauunless there's something really breaking or blocking us :)15:06
cardoeJayF: yeah it'll use pkg_resources but there's no dependency on it so it blows itself up :)15:07
JayFTheJulia: I was thinking you were going somewhere with that comment lol15:11
TheJuliaheh, see mailing list :)15:11
JayFwe need to do those changes ourselves :(15:13
JayFanyone want me to hold approval of https://review.opendev.org/c/openstack/ironic-specs/+/931025 (kea dhcp) for additional approval? myself and riccardo are +215:14
TheJuliaI can take a look later today if you want15:22
TheJuliaI'm not so sure we need to15:36
JayFI want it merged :) 15:36
TheJuliathen merge it15:36
JayFwfm15:36
JayFjust always give option for extra eyes with specs :D15:37
TheJuliaOur hash ring loops and is based upon conductor side entries, not time inside of wsgi15:45
TheJuliaconductor side entries in the db15:45
JayFwell I mean more generally migrating off eventlet-requiring wsgi stuff15:45
JayFI have resolved this cycle to get more things to 100% done and less things started and half-done, so I'm not touching it until I get less things partially done, but I'm makin' progress15:46
opendevreviewMerged openstack/ironic-specs master: Add a Kea DHCP backend  https://review.opendev.org/c/openstack/ironic-specs/+/93102515:47
JayFSo, question; I just had a feature ask from my downstream. Nova has an ability to setup population of vendor_data via an external web service. I was wondering if we would have any interest in an ironic-side implementation of that service and/or some hook in the virt driver to put some ironic-provided vendor data into place15:47
JayFExample: the end-user needs to know what rack/switch they are connected to for performance tuning reasons to compare with other machines -- they don't need to dictate it, they just need to know. This information is inspected by Ironic, but needs to be exposed to the instance.15:48
TheJuliaI'd kind of be curious what the use case is15:48
JayFMy thought: either 1) an Ironic-adjacent project implements the rest service Nova wants or 2) we look at adding a method to the virt driver interface in nova allowing the driver to populate some vendor data as well15:49
TheJuliaThat seems... semi-reasonable15:49
JayFTheJulia: basically just that ^^ "expose to the instance user some useful data about the node found in inspection"15:49
JayFin my case; it's all topological information we care about15:49
JayFbut I could easily see that including other information we could find in inspection, too15:49
TheJuliaThey could find most of that out themselves15:50
TheJuliabut topological is a little harder15:50
JayFyeah, topological is the piece we care about in this case15:50
JayFand it's populated in inspection downstream 15:50
JayFhonestly the more people use integrated openstack to build other cloud platforms on top the more useful this kinda feature is likely to be15:53
TheJuliaIndeed15:56
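Note: the core of option 1 (a small service answering Nova's vendor data lookups) is a mapping from an instance to data found in inspection. The sketch below shows only that mapping as a pure function. Everything in it is an assumption for illustration: the "instance-id" request key should be checked against Nova's dynamic vendordata documentation, and the lookup table stands in for whatever store holds the inspected topology.

```python
def build_vendor_data(request_body, topology_by_instance):
    """Return the vendor data document for one instance.

    request_body: the decoded JSON body of the lookup request; the
        "instance-id" key is assumed, not confirmed.
    topology_by_instance: hypothetical mapping of instance UUID to
        inspected topology, e.g. {"rack": ..., "switch": ...}.
    """
    topology = topology_by_instance.get(request_body.get("instance-id"))
    if topology is None:
        # Unknown instance: expose nothing rather than guessing.
        return {}
    return {
        "topology": {
            "rack": topology["rack"],
            "switch": topology["switch"],
        }
    }


# Made-up inventory for illustration.
inventory = {"11111111-2222-3333-4444-555555555555": {"rack": "r12", "switch": "sw-3"}}
print(build_vendor_data({"instance-id": "11111111-2222-3333-4444-555555555555"}, inventory))
```

The instance only reads this data; nothing here lets the end user dictate placement.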
rpittaugood night! o/17:03
cardoeJayF: I'll pivot a different way then.17:03
JayFyeah sorry for being fun police there, I hate nak'ing a patch someone already wrote :( 17:04
JayFwe've just been bitten by that type of change in the past, basically taking on maintenance work for Ironic that's usually handled by the group17:04
cardoeIt's fine. I do believe ultimately all that should be retired.17:05
JayFoof, just kernel modules on the dist-kernel for gentoo are like, 500 MB 17:09
JayFhow the hell is tinyipa so small?!17:09
cardoeyou strip them?17:13
dtantsurBecause it barely includes these modules :)17:13
dtantsurSame problem with any DIB-built images: a lot of modules and firmware17:13
JayFI'm going to look and see what modules are in the tinyipa image17:13
JayFand manually prune the stuff in gentoo image17:13
JayFeven ironic-python-agent itself, the venv, is somehow larger than the entire tinyipa image17:14
JayFI don't know how that's even possible17:14
JayFI'm now wondering if this whole project is folly :( 17:15
JayFthe status quo is no good, but it's also created a bar that's nearly impossible to clear17:15
opendevreviewJulia Kreger proposed openstack/ironic master: docs: clarification around setting port llc data  https://review.opendev.org/c/openstack/ironic/+/93719317:16
JayFoh, I'm also comparing uncompressed sizes to compressed sizes17:17
TheJuliaJayF: also, in IPA we prune out some stuff pretty aggressively on the firmware/modules side.17:18
TheJuliaerr, with dib builds17:18
JayFyeah, those are running on the gentoo build, too17:18
JayFpaths are the same I checked17:18
TheJuliaoh, goodie17:18
TheJuliano need for huge 256MB compressed firmware blobs which actually just have a nested kernel inside of them17:18
JayFgot a good piece of advice from #gentoo-chat: I forgot the gentoo pdb contains a readable listing of installed files for a package17:19
JayFso I should be able to write a function that, given a gentoo package name, removes it without package manager assistance17:19
JayF...meaning I can also remove the package manager :D17:19
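Note: a rough sketch of the idea above, removing a package's files from its recorded file listing without calling the package manager. It assumes the listing is a Portage-style CONTENTS file (typically found under /var/db/pkg/<category>/<package-version>/) with `dir`, `obj` and `sym` lines. That layout and the function names are assumptions, not something confirmed in the conversation.

```python
from pathlib import Path


def parse_contents(contents_text):
    """Yield (kind, path) pairs from a CONTENTS-style listing.

    Assumed line formats:
        dir /path
        obj /path <md5> <mtime>
        sym /path -> target <mtime>
    """
    for line in contents_text.splitlines():
        kind, _, rest = line.partition(" ")
        if kind == "obj":
            yield kind, rest.rsplit(" ", 2)[0]  # drop md5 and mtime
        elif kind == "sym":
            yield kind, rest.split(" -> ", 1)[0]
        elif kind == "dir":
            yield kind, rest


def remove_package_files(contents_text, root):
    """Delete listed files and symlinks under root, then prune empty dirs."""
    root = Path(root)
    entries = list(parse_contents(contents_text))
    removed = []
    for kind, path in entries:
        target = root / path.lstrip("/")
        if kind in ("obj", "sym") and (target.is_symlink() or target.exists()):
            target.unlink()
            removed.append(path)
    # Longest paths first so nested directories go before their parents.
    for kind, path in sorted(entries, key=lambda e: len(e[1]), reverse=True):
        if kind == "dir":
            try:
                (root / path.lstrip("/")).rmdir()
            except OSError:
                pass  # not empty, shared with another package, or already gone
    return removed
```

Passing the image's mount point as `root` keeps the removal confined to the image being built rather than the build host.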
TheJuliahehehe17:19
JayFyeah I am slightly heartened, I'll keep digging17:19
JayFreally the ideal way to do this is write something more debootstrap-style, with an external portage modifying the image inside, but that's more change than I wanna write today17:20
opendevreviewMerged openstack/ironic master: docs: note ipv6 is a good idea with neutron interface  https://review.opendev.org/c/openstack/ironic/+/93695117:38
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93720620:14
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93720620:45
cardoeHard20:54
cardoeJayF: nothing to maintain. We are just reading and dumping out the VERSION in the pkg metadata.20:55
cardoeBut I’ll add the necessary dependencies so pbr works.20:56
JayFI am still trying to remember21:00
JayFthere's something version-y that pbr does21:00
JayFthat other things don't21:00
JayFmaybe getting versions from git?21:00
JayFor deriving them from git?21:00
JayFadamcarthur5: are your i-t-p changes ready for review now?21:06
cardoeYes it derives them from git for the build. I’m a +1 on pbr for builds. But -1 on runtime.21:08
adamcarthur5JayF i-t-p?21:08
JayFironic-tempest-plugin21:08
JayFegad; ironic-lib json-rpc uses an eventlet-based wsgi server21:09
JayFanother place we'll have to excise it :( 21:09
adamcarthur5Ah, well, I have nothing ready for review at present21:10
adamcarthur5I'll stick ironic-prio-weekly on it when I do21:10
JayFah, too bad, that looked good, you just waiting on ci or something else21:10
cardoeI need to install easy_install just to “import ironic” which is gross.21:10
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93720621:11
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Microversion Test Generator  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93720621:12
adamcarthur5CI + I will abandon the conductor + node firmware ones and build on top of the generator one JayF21:13
adamcarthur5No point merging changes when I have a new pattern for them21:13
JayFyeah that's what I was asking, if 937206 was r4r21:13
adamcarthur5r4r? :^)21:15
adamcarthur5Sorry I feel like I should be getting these but I have no idea.21:15
JayFready for review21:16
JayFany number of these may just be stuff I have made up lol21:16
JayFit's not your fault I'm speaking in riddles21:16
JayF  /nick Sphinx21:16
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/allocations  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93721321:18
opendevreviewAdam McArthur proposed openstack/ironic-tempest-plugin master: Testing bad microversions on v1/nodes/{uuid}/firmware  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93721421:22
TheJuliaI often speak in riddles of context. At some point it is a responsibility!21:31
adamcarthur5JayF I am leaving for the day, so lets just pray CI passes and say that the changes are good for review: https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/93720621:32
adamcarthur5It's just one relation-chain now21:32
cardoeUgh. I think we are hitting https://bugs.launchpad.net/nova/+bug/201997721:54
cardoeYou recall Jay what you and Sean discussed on there?21:55
JayFthis, I believe, is what TheJulia was talking about yesterday with me, what's breaking the inspector grenade job22:09
JayFand it sounds like from the list it's just being more-often-surfaced due to a neutron issue22:09
JayFbut I'm not 100% sure and cannot look in detail 22:09
TheJuliano22:11
TheJuliainspector's grenade job was being broken by neutron going out to literal lunch for over a minute22:11
TheJuliamark's bug is over the issue that nova insists on deletion of resources and value detachment ownership when Ironic has to do it as part of its flow to ensure it is done because nova is not the only user. The case can be hit when a node can't exit the initial lock quickly which breaks the flow on the nova side22:12
TheJuliathat can be a result of resource contention on the conductor, though22:12
TheJuliaor, maybe, yes22:13
TheJuliadepends on what, exactly, but there is a difference. That being said, if we're hitting underlying instance performance issues, we likely need to disjoint the issues22:14
cardoeSo if we delete in a test loop it hits and the second time it’s good.22:14
TheJuliaAnyway, I need to run to an in-person meeting22:14
TheJuliawell, there is a fundamental disagreement over responsibility22:15
TheJuliaand then we get into statements asserting ironic has no right to do a thing22:15
TheJuliaand... the conversation goes sideways because different worlds exist22:15
cardoe:/22:17
TheJuliathat being said, if we're seeing 2 minutes for the lock to get released, sounds like CI resource contention is near max22:19
TheJuliawe should check to see if that is actually the case, if that makes sense22:20
TheJuliastepping away for appt22:20
cardoeOh not the project’s tests. This is one of our internal tests.22:22
TheJuliaoh, then I'm totally not understanding23:57