Monday, 2025-07-14

*** cloudnull1097742 is now known as cloudnull10977403:02
gibiHi folks! Was there any infra reconfiguration that could explain why the nova-ceph-multistore job started consistently failing since 2025-07-11 08:33:54 with qemu-img info failing with 'failed to allocate memory for stack: Cannot allocate memory\n'. I filed a tracker bug https://bugs.launchpad.net/nova/+bug/211685213:14
gibiThe qemu package versions did not change I checked that13:14
fungigibi: can you link to a failing build result in the zuul dashboard?13:39
sean-k-mooneyhum13:40
gibihttps://zuul.opendev.org/t/openstack/build/3c87e32281df4101b698b01ae6b1f3d5 this is the first failure13:40
fungiah, the bug has one13:40
gibiand this is the list https://zuul.opendev.org/t/openstack/builds?job_name=nova-ceph-multistore&skip=013:40
fungi(note that those urls only have logs for 30 days, so recording them in a bug long-term isn't terribly useful)13:40
sean-k-mooneyis it only happening with ceph volumes or also local13:40
gibisingle tempest triggering it test_extend_attached_encrypted_volume_luksv113:41
gibionly the ceph job are affected13:41
sean-k-mooneyok thats a good data point13:42
fungihttps://zuul.opendev.org/t/openstack/build/3c87e32281df4101b698b01ae6b1f3d5/log/zuul-info/host-info.controller.yaml#474 indicates it's an 8gb node, so same as it's always been13:42
gibiyepp, the last succeeded and first failed run has the same amount of memory and similar amount of memory usage13:42
sean-k-mooneyi wonder if there were any ulimit or ceph changes13:43
sean-k-mooneythe recent qemu packaging issues were related to the librbd version although that might be unrelated13:44
fungialso added 8gb of swap at the start of the build: https://zuul.opendev.org/t/openstack/build/3c87e32281df4101b698b01ae6b1f3d5/console#4/0/15/controller13:44
sean-k-mooneyi dont think this is actually a vm memory issue13:44
fungiso i don't see anything that should make the test node have less total virtual memory than before13:44
sean-k-mooneystderr: 'failed to allocate memory for stack: Cannot allocate memory\n'13:44
sean-k-mooneyfailing to allocate enough memory for the stack sounds like a bug in qemu-img to me13:45
gibithe memtracker output did not show any bad numbers, the journal has no OOM activity13:45
gibisean-k-mooney: the qemu package version is exactly the same in the passing and in the failing runs13:46
gibi8.2.213:46
sean-k-mooneyright but what about the librbd or the crypto libs13:46
gibiI can check....13:46
frickleror kernel version? (just guessing though)13:46
sean-k-mooneyboth the luks and rbd support are dlls13:47
sean-k-mooneywell .so files i guess13:47
sean-k-mooneyfrickler: that possible but qemu-img should effectively be userspace only 13:47
gibibingo13:47
gibilibrdb 19.2.0 passes 19.2.1 fails13:47
gibis/bd/db/13:48
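The comparison that surfaced the librbd change can be sketched as a diff of the installed package versions from the last passing and the first failing run. This is only a minimal sketch: it assumes both runs have a `dpkg -l`-style listing available, the function names are hypothetical, and the sample version strings are placeholders rather than values taken from the job logs.

```python
def parse_dpkg_list(text):
    """Map package name -> version from `dpkg -l`-style output."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Installed packages appear on lines whose status column is "ii".
        if len(fields) >= 3 and fields[0] == "ii":
            name = fields[1].split(":")[0]  # drop a suffix such as ":amd64"
            versions[name] = fields[2]
    return versions


def changed_packages(passing_listing, failing_listing):
    """Return {name: (old_version, new_version)} for packages whose version differs."""
    before = parse_dpkg_list(passing_listing)
    after = parse_dpkg_list(failing_listing)
    return {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if before[name] != after[name]
    }


# Placeholder listings (assumption): real version strings are longer.
PASSING = """\
ii  librbd1:amd64  19.2.0-PLACEHOLDER  amd64  RBD client library
ii  qemu-utils     8.2.2-PLACEHOLDER   amd64  QEMU utilities
"""
FAILING = """\
ii  librbd1:amd64  19.2.1-PLACEHOLDER  amd64  RBD client library
ii  qemu-utils     8.2.2-PLACEHOLDER   amd64  QEMU utilities
"""

if __name__ == "__main__":
    for name, (old, new) in changed_packages(PASSING, FAILING).items():
        print(f"{name}: {old} -> {new}")
```

Restricting the diff to packages present in both runs keeps the output focused on upgrades, which is the case being chased here.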
gibido we have an easy way to control this lib version somewhere ?13:49
sean-k-mooneyyou could maybe test that by forcing a downgrade in the post-config devstack step or local.sh is actually simpler to test with13:50
sean-k-mooneyyou can drop in a pre playbook into that job which writes a local.sh and force install 19.2.013:50
sean-k-mooneyif that works i assume we could add a pre playbook to the job to blacklist that version? im not 100% sure how to do that in apt but i would be surprised if that is not a thing13:51
sean-k-mooneyyou can do it with a manual install and mark it to hold the package if nothing else13:51
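The "manual install and hold" idea could look roughly like the sketch below, which builds the two apt commands and prints them unless told to run them. The package name `librbd1` and the version string are assumptions (the log only says 19.2.0), the helper names are hypothetical, and related Ceph libraries may need the same version for the downgrade to resolve.

```python
import subprocess


def downgrade_and_hold_commands(package, version):
    """Build apt commands that install an exact version and then hold it."""
    return [
        # `pkg=version` selects an exact version; --allow-downgrades permits going backwards.
        ["apt-get", "install", "-y", "--allow-downgrades", f"{package}={version}"],
        # A held package is skipped by later upgrades.
        ["apt-mark", "hold", package],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder version (assumption): use the full version string from the passing run.
    run(downgrade_and_hold_commands("librbd1", "19.2.0-PLACEHOLDER"))
```

The dry-run default makes it easy to paste the printed commands into a local.sh for a first experiment before wiring them into a job.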
gibiOK I will try to craft something. Until that / or in parallel with that I have https://review.opendev.org/c/openstack/tempest/+/954949 to skip the test case13:54
sean-k-mooneyack makes sense. there is high overlap with the folks in #openstack-qa but probably best to share that there too13:54
gibiyepp, done13:59
gibihttps://github.com/ceph/ceph/pull/60133/commits/cdbe2c1bb4731395423ab77f3edd2ee4bd053148 this is between 19.2.0 and 19.2.1 and touching stack allocators 14:16
*** losulers is now known as losuler14:28
gibiand testing the package downgrade via https://review.opendev.org/c/openstack/nova/+/95495614:55
clarkbsean-k-mooney: gibi keep in mind that pinning packages like that doesn't make it work for anyone else16:30
clarkbthe purpose of the CI system is to let us know when things are broken. Papering over the problem may be appropriate temporarily to not hold up the factory floor but is unlikely to be beneficial for anyone outside of ci16:31
clarkbso getting that fixed in the package is important16:31
sean-k-mooneyclarkb: well the intent is to determine if its a distro packaging bug and avoid the buggy package16:35
sean-k-mooneyclarkb: if we can confirm its actually a bug in 19.2.1 we can report that to the ubuntu or librbd maintainers16:36
clarkbright but you cannot install old distro packages reliably (they disappear).16:36
clarkbso pinning necessarily is short term is my point16:37
sean-k-mooney ya, i was hoping we could just say !19.2.116:37
sean-k-mooneyhttps://github.com/ceph/ceph/pull/60133/commits/cdbe2c1bb4731395423ab77f3edd2ee4bd053148 looks plausible but my c++ knowledge is a little out of date.16:38
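One way to express "anything but 19.2.1" in apt is a preferences stanza with a negative pin priority, which stops apt from selecting the matching version. The sketch below only renders such a stanza as text; the package name `librbd1`, the file name in the comment, and the function name are assumptions, and the approach still depends on some other version remaining available in the archive.

```python
def exclude_version_stanza(package, version_glob):
    """Render an apt preferences stanza that blocks versions matching the glob."""
    return (
        f"Package: {package}\n"
        f"Pin: version {version_glob}\n"
        # A priority below zero prevents the matching version from being installed.
        "Pin-Priority: -1\n"
    )


if __name__ == "__main__":
    # Would typically be written to a file under /etc/apt/preferences.d/,
    # for example a hypothetical "no-librbd-19-2-1" file.
    print(exclude_version_stanza("librbd1", "19.2.1*"), end="")
```

Compared with holding an explicitly installed version, this leaves apt free to pick up a later fixed release once one is published.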
sean-k-mooneygibi: that also in ceph core so im not sure if that will impact librbd16:39
gibiclarkb: as soon as I have proof that the package version is the problem I will open a bug upstream16:39
gibito the package maintainer16:39
sean-k-mooneybut there is a redhater email there so perhaps we could reach out to them16:39
opendevreviewScott Little proposed openstack/project-config master: add ability for group starlingx-release to do merges on starlingx/docs  https://review.opendev.org/c/openstack/project-config/+/95503519:53
opendevreviewMerged openstack/project-config master: add ability for group starlingx-release to do merges on starlingx/docs  https://review.opendev.org/c/openstack/project-config/+/95503520:06
