gmann | dansmith: elodilles has patch up to remove all py38 jobs including focal job https://review.opendev.org/c/openstack/nova/+/881339 | 00:37 |
ykarel | dansmith, only merged in neutron repo, others are still unmerged, i pushed the revert | 04:49 |
bauzas | catching up this night's conversations | 06:55 |
* bauzas is about to cry in a corner | 06:55 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Revert "Temporary skip some volume detach test in nova-lvm job" https://review.opendev.org/c/openstack/nova/+/881389 | 07:17 |
gibi | bauzas: I need to leave early today so I will miss the nova meeting | 08:49 |
bauzas | gibi: ack, no worries | 09:04 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: Reproducer for dangling volumes https://review.opendev.org/c/openstack/nova/+/881457 | 10:01 |
opendevreview | Merged openstack/nova stable/yoga: Handle InstanceInvalidState exception https://review.opendev.org/c/openstack/nova/+/872117 | 11:27 |
dansmith | gmann: I don't see that that ever passed | 13:01 |
dansmith | using the alternate python interpreter on focal seems less good to me than just fixing the ceph job | 13:02 |
dansmith | and also not necessary if people don't prohibit 3.8 | 13:02 |
sean-k-mooney | dansmith: ya i suggested it because it used to work but it obviously got broken at some point | 13:03 |
sean-k-mooney | I was hoping it would be a quick fix to unblock the gate | 13:03 |
sean-k-mooney | just making the job use 22.04 would be my preference too | 13:03 |
bauzas | yup, let's try to use Jammy | 13:08 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api) https://review.opendev.org/c/openstack/nova/+/836830 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Check shares support https://review.opendev.org/c/openstack/nova/+/850499 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add metadata for shares https://review.opendev.org/c/openstack/nova/+/850500 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach notification https://review.opendev.org/c/openstack/nova/+/850501 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach notification https://review.opendev.org/c/openstack/nova/+/851028 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add shares to InstancePayload https://review.opendev.org/c/openstack/nova/+/851029 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to attach/detach shares https://review.opendev.org/c/openstack/nova/+/852085 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working. https://review.opendev.org/c/openstack/nova/+/852086 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add virt/libvirt error test cases https://review.opendev.org/c/openstack/nova/+/852087 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/854823 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/854824 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach_error notification https://review.opendev.org/c/openstack/nova/+/860282 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach_error notification https://review.opendev.org/c/openstack/nova/+/860283 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/860284 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860285 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares https://review.opendev.org/c/openstack/nova/+/860286 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part) https://review.opendev.org/c/openstack/nova/+/860287 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860288 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Docs about Manila shares API usage https://review.opendev.org/c/openstack/nova/+/871642 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Mounting the shares as part of the initialization process https://review.opendev.org/c/openstack/nova/+/880075 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Deletion of associated share mappings on instance deletion https://review.opendev.org/c/openstack/nova/+/881472 | 13:11 |
dansmith | bauzas: oh I didn't realize you rechecked my patch and it failed again with *another* uninstallable package | 13:16 |
dansmith | not sure if it's 3.9 related or not though | 13:17 |
bauzas | I haven't seen the logs yet | 13:17 |
dansmith | I rechecked again so we'll see | 13:17 |
ykarel | fwiw pysaml2 update is py3.9+ only and keystone installation is broken in focal or py38 | 13:18 |
dansmith | yeah that ^ | 13:19 |
dansmith | that's what I failed on | 13:19 |
dansmith | cripes | 13:19 |
ykarel | revert patch https://review.opendev.org/c/openstack/requirements/+/881466 | 13:19 |
bauzas | ykarel: last time I looked, this was due to the fact that py39 wanted UUIDs | 13:21 |
ykarel | bauzas, sorry which issue? /me can't relate to UUIDs | 13:35 |
dansmith | yeah not sure what the uuid thing is | 13:39 |
bauzas | it was when elodilles tried to update the py version for ceph-multinode | 13:41 |
bauzas | but I can be wrong | 13:41 |
dansmith | oh, unrelated to pysaml I see | 13:41 |
ykarel | ack | 13:41 |
bauzas | speaking of https://1742443592df6307c50f-d94079506d616d2cb38d2b70f09b441e.ssl.cf2.rackcdn.com/881339/3/check/nova-ceph-multistore/5f269ed/controller/logs/devstacklog.txt | 13:41 |
bauzas | 2023-04-24 11:48:22.720 | WARNING py.warnings [None req-518808ec-0a14-444d-b6d3-fb6383d03e9c None None] /usr/local/lib/python3.9/dist-packages/pycadf/identifier.py:71: UserWarning: Invalid uuid: RegionOne. To ensure interoperability, identifiers should be a valid uuid. | 13:42 |
dansmith | eharney: are you around to talk about the ceph job stuff? I think you were merging some of those cephadm changes | 13:43 |
ykarel | ack got it, it's a separate thing | 13:43 |
eharney | dansmith: yes | 13:45 |
eharney | dansmith: what's going on there? | 13:45 |
dansmith | eharney: so, is that how we should be moving the upstream jobs at this point? jammy has quincy ceph packages, as I understand it | 13:45 |
dansmith | and the cephadm jobs don't work either | 13:45 |
dansmith | and they're nonvoting so I assumed that meant they weren't "ready" | 13:47 |
dansmith | eharney: anyway, the context is, we desperately need to get the ceph jobs running on jammy | 13:47 |
eharney | i was trying to understand the situation with these the other day, i guess they were pinned to focal before because there weren't jammy packages, but maybe now there are? | 13:48 |
dansmith | eharney: yeah that's the reason for the pinning, but indeed now there seem to be base packages in jammy for quincy | 13:49 |
dansmith | I rechecked the unpin patch yesterday and it failed with qemu not having the rbd block driver | 13:49 |
dansmith | eharney: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315?tab=change-view-tab-header-zuul-results-summary | 13:49 |
eharney | hmm, looking, i'm not really up to speed on this | 13:50 |
dansmith | eharney: okay is there someone else we should bug? | 13:50 |
dansmith | eharney: because right now the ceph jobs are all totally wedged because a bunch of packages have just banned anything python<=3.8 | 13:51 |
eharney | dansmith: not sure who but i think some manila folks have been more active in devstack-plugin-ceph recently | 13:51 |
dansmith | reverts are in place or in flight for some of them, but we need to get it resolved | 13:51 |
eharney | ahh | 13:51 |
dansmith | because focal is not in our PTI right now... | 13:51 |
dansmith | eharney: so maybe gouthamr ? | 13:52 |
sean-k-mooney | dansmith: did you notice the ceph nfs job passed? https://zuul.opendev.org/t/openstack/build/3f0ec49fa9e048f6a73e6c03833fecc9/logs | 13:57 |
dansmith | sean-k-mooney: I assume that's not using the qemu block driver | 13:57 |
sean-k-mooney | perhaps although that is an odd issue | 13:57 |
dansmith | sean-k-mooney: the actual setup of ceph seemed to work on jammy, it's just that qemu doesn't have the block driver to load to talk to it (or something) | 13:57 |
sean-k-mooney | they may have changed that to a weak dep in the qemu packaging | 13:58 |
sean-k-mooney | so we might need to install it in devstack explicitly | 13:58 |
dansmith | it seemed like maybe those were provided in the ceph packages instead of qemu or something | 13:58 |
dansmith | sean-k-mooney: well, I looked and couldn't find any package in jammy for it | 13:58 |
dansmith | sean-k-mooney: but yes, hopefully it's something simple | 13:58 |
sean-k-mooney | I assumed it was compiled in, if I'm being honest, but didn't look | 13:58 |
dansmith | me too | 13:59 |
dansmith | until I saw the logs.. let me get you a link | 13:59 |
dansmith | sean-k-mooney: https://zuul.opendev.org/t/openstack/build/1435ec8b57be4b159c3d373e23673ab5/log/controller/logs/screen-n-cpu.txt#8437 | 13:59 |
dansmith | it's also weirdly named "block-block-rbd".. perhaps there's some change in jammy with something and someone is prefixing an extra... prefix? | 14:00 |
dansmith | although it says "unknown driver rbd" on the next line, which looks right | 14:00 |
sean-k-mooney | their nova package just depends on qemu-system https://packages.ubuntu.com/jammy/nova-compute-qemu | 14:00 |
bauzas | dansmith: sean-k-mooney: so, to clarify, once we merge https://review.opendev.org/c/openstack/nova/+/881409 | 14:00 |
bauzas | we will still have a problem with ceph-multistore due to keystone + cephadm right? | 14:01 |
dansmith | bauzas: yes | 14:01 |
sean-k-mooney | hum that is not linked against librbd | 14:01 |
sean-k-mooney | https://packages.ubuntu.com/jammy/qemu-system-x86 | 14:01 |
dansmith | bauzas: not "due to cephadm" | 14:01 |
sean-k-mooney | ah | 14:01 |
sean-k-mooney | https://packages.ubuntu.com/jammy/qemu-block-extra | 14:01 |
dansmith | bauzas: just the ceph job | 14:01 |
sean-k-mooney | dansmith: we are missing qemu-block-extra | 14:01 |
bauzas | because of keystone ? | 14:01 |
sean-k-mooney | that is what provides the rbd support | 14:01 |
* bauzas tries to correctly understand the problem | 14:02 |
dansmith | sean-k-mooney: ack, nice | 14:02 |
sean-k-mooney | and just checked its a recommended package | 14:02 |
eharney | ahh, cool, was just looking for that missing link myself | 14:02 |
sean-k-mooney | not a dep | 14:02 |
dansmith | sean-k-mooney: so we likely need a devstack ceph plugin change, let me push that up and try a dep | 14:02 |
sean-k-mooney | ya so we could also do this via bindep in nova as an optional dep. perhaps we should as a follow-up | 14:03 |
sean-k-mooney | I was just going to look at devstack to see how it installs qemu/libvirt | 14:03 |
sean-k-mooney | and add it there but doing it in the devstack plugin probably makes more sense | 14:03 |
eharney | it can go in devstack-plugin-ceph/devstack/files/debs i think | 14:03 |
dansmith | sean-k-mooney: is that package also in focal? | 14:03 |
sean-k-mooney | yep | 14:04 |
sean-k-mooney | well I'm waiting for the page to refresh | 14:04 |
dansmith | it is | 14:04 |
sean-k-mooney | but it looks like it should be there from bionic | 14:04 |
dansmith | I confirmed | 14:04 |
dansmith | https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/881479 | 14:05 |
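For context, the change linked above would simply add the missing package to the plugin's Ubuntu package list. A minimal sketch, assuming the list lives at devstack/files/debs/devstack-plugin-ceph as mentioned earlier in the conversation (existing entries are elided; only the qemu-block-extra line is the addition being discussed):

```
# devstack/files/debs/devstack-plugin-ceph — sketch, existing entries elided
xfsprogs
qemu-block-extra
```

qemu-block-extra is the jammy package that carries qemu's rbd (and iscsi) block drivers, which are only recommended rather than depended on by the core qemu packages.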
dansmith | I'll fix up the revert-focal devstack patch to depend on it | 14:05 |
sean-k-mooney | so they moved it from suggests to recommends between focal and jammy | 14:05 |
dansmith | ah | 14:06 |
sean-k-mooney | https://github.com/openstack/devstack/blob/master/lib/nova_plugins/functions-libvirt#L72 | 14:07 |
sean-k-mooney | so I was thinking of just adding it there to be honest | 14:07 |
sean-k-mooney | just always install qemu-block-extra in base devstack | 14:08 |
dansmith | the ceph plugin already has a list of specific-to-it deb packages.. so it needs to go there anyway, IMHO | 14:08 |
sean-k-mooney | sure | 14:08 |
sean-k-mooney | I'm just wondering what else is there | 14:09 |
sean-k-mooney | it looks like libiscsi is also there | 14:09 |
dansmith | xfsprogs | 14:09 |
dansmith | although I dunno why | 14:09 |
dansmith | anyway, let's see if this works | 14:09 |
sean-k-mooney | ah ok we don't need it for iscsi because we host mount | 14:09 |
sean-k-mooney | for multipath | 14:09 |
dansmith | ah, iscsi is in block-extra as well? | 14:09 |
sean-k-mooney | yes but only if you use qemu to directly connect to the iscsi backend, which we no longer do. I don't think we use glusterfs either | 14:10 |
dansmith | right | 14:10 |
dansmith | sean-k-mooney: eharney okay it installed the qemu-block-extra package successfully, so that's good.. we'll see if it can actually boot instances once it gets there :) | 14:35 |
dansmith | looks like it might be booting servers ... | 14:49 |
*** iurygregory_ is now known as iurygregory | 15:00 |
bauzas | reminder : nova meeting in 35 mins | 15:25 |
bauzas | sean-k-mooney: dansmith: I'm about to communicate back to the ML saying that the gate is unblocked, amirite ? | 15:34 |
dansmith | no | 15:34 |
sean-k-mooney | no | 15:34 |
sean-k-mooney | none of the patches are merged | 15:34 |
dansmith | we have a ways to go before that | 15:34 |
dansmith | I have to squash my fix into the main one, since it can't merge on its own | 15:34 |
dansmith | and we'll have to recheck the main one anyway since it failed more volume attach things | 15:34 |
dansmith | so we still got a bit | 15:35 |
bauzas | because other libs are capping >=3.9 ? | 15:35 |
* bauzas is confused | 15:35 | |
bauzas | I thought neutron did the revert | 15:35 |
dansmith | bauzas: so many more problems than that dude :) | 15:36 |
dansmith | now pysaml2 has gone >3.8 which is breaking everyone | 15:36 |
bauzas | oh | 15:36 |
dansmith | that revert may land and we might be okay, but there might be others too | 15:36 |
bauzas | yeah, I got the memo but I didn't read it correctly | 15:36 |
bauzas | okay, then I'll write some status email then | 15:37 |
sean-k-mooney | I think we are just going to move everything to jammy and drop the 3.8 testing | 15:37 |
bauzas | saying we can't guarantee that any of the deps are still able to use py3.8 | 15:37 |
sean-k-mooney | assuming the ceph job eventually passes without the detach error | 15:37 |
bauzas | sean-k-mooney: that was my thought | 15:37 |
bauzas | https://review.opendev.org/c/openstack/nova/+/881409 should help | 15:38 |
sean-k-mooney | https://review.opendev.org/q/topic:drop-py38 is the full set of fixes | 15:39 |
sean-k-mooney | we basically need to revert the py39 change out of https://review.opendev.org/c/openstack/nova/+/881339 and incorporate the ceph job fixes instead | 15:40 |
sean-k-mooney | so https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/881479 and https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315 need to be squashed | 15:41 |
sean-k-mooney | and https://review.opendev.org/c/openstack/nova/+/881339 needs to depend on the squashed commit | 15:41 |
sean-k-mooney | dansmith: those were the two patches you planned to squash right? | 15:42 |
bauzas | sean-k-mooney: thanks, I'll explain it | 15:42 |
dansmith | yes | 15:42 |
dansmith | I'm waiting for the run to finish so I can look at the failures | 15:42 |
dansmith | unfortunately it looks like maybe we OOMed or something because now we're failing to clean up neutron ports and various other things | 15:43 |
dansmith | hopefully the new ceph doesn't like use a lot more memory or something :/ | 15:43 |
sean-k-mooney | we might need to bump the swap or enable the mariadb thing | 15:43 |
sean-k-mooney | that's still not on by default in devstack, is it? | 15:43 |
dansmith | it is for the ceph job in nova, but yeah maybe not here not sure | 15:45 |
dansmith | we'll see when it finishes | 15:45 |
sean-k-mooney | the default is still false https://github.com/openstack/devstack/blob/4dfb67a831686279acd66f65e51beba42f675c91/stackrc#L207 | 15:45 |
dansmith | right | 15:46 |
sean-k-mooney | although it is enabled by default in zuul for multi-node jobs https://github.com/openstack/devstack/blob/2e607b0cbd91d9243c3e9424a500598c72ae34ad/.zuul.yaml#L701 | 15:46 |
sean-k-mooney | but I'm assuming this is single-node | 15:46 |
sean-k-mooney | anyway we have a few levers we can pull once we know why it failed | 15:46 |
opendevreview | Elod Illes proposed openstack/nova master: Drop py38 based zuul jobs https://review.opendev.org/c/openstack/nova/+/881339 | 15:55 |
opendevreview | Elod Illes proposed openstack/nova master: Drop py38 support from setup.cfg and tox.ini https://review.opendev.org/c/openstack/nova/+/881365 | 15:56 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Apr 25 16:00:02 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | hey everyone | 16:00 |
elodilles | o/ | 16:00 |
dansmith | o/ | 16:00 |
bauzas | we are quite busy today so let's try to be quick | 16:00 |
ykarel | o/ | 16:00 |
auniyal | o/ | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:00 |
bauzas | #topic Bugs (stuck/critical) | 16:00 |
bauzas | #info No Critical bug | 16:00 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 17 new untriaged bugs (+4 since the last meeting) | 16:00 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:01 |
bauzas | honestly, I think I forgot to tell melwitt to look at the bugs, so it's on me :) | 16:01 |
bauzas | the next person in the roster is artom | 16:01 |
artom | ohhai | 16:01 |
bauzas | artom: fancy triaging some upstream bugs if you like ? | 16:01 |
artom | Sure | 16:02 |
bauzas | thanks a lot | 16:02 |
artom | CLOSED RUSTWILLFIXIT | 16:02 |
bauzas | I'll also try to cherry-pick some | 16:02 |
bauzas | artom: well, we're on Launchpad | 16:02 |
bauzas | so it'd be 'Closed' only with a comment saying you'd think Rust would work | 16:03 |
bauzas | or even 'Wontfix' | 16:03 |
bauzas | anyway | 16:03 |
bauzas | #info bug baton is being passed to artom | 16:03 |
bauzas | #topic Gate status | 16:03 |
bauzas | grab your popcorns | 16:03 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:03 |
bauzas | #link https://etherpad.opendev.org/p/nova-ci-failures | 16:03 |
bauzas | but alas, | 16:03 |
bauzas | #link https://lists.openstack.org/pipermail/openstack-discuss/2023-April/033454.html Gate is blocked | 16:04 |
bauzas | last status : https://lists.openstack.org/pipermail/openstack-discuss/2023-April/033468.html | 16:04 |
bauzas | dansmith: sean-k-mooney: elodilles: wanting to add anything else on that ? | 16:05 |
dansmith | probably not | 16:05 |
sean-k-mooney | we are mainly waiting on ci | 16:05 |
sean-k-mooney | so no i think that fine | 16:05 |
elodilles | i've just updated the nova patches according to the discussions (if i did not miss anything) | 16:05 |
bauzas | we have two concurrent patches afaics https://review.opendev.org/c/openstack/nova/+/881409 and https://review.opendev.org/c/openstack/nova/+/881339 | 16:05 |
elodilles | if those will be the chosen ones :) | 16:05 |
bauzas | we may want to only use one :) | 16:06 |
dansmith | yeah | 16:06 |
sean-k-mooney | elodilles: you're missing the depends-on against the devstack patch, but those need to be squashed first | 16:06 |
bauzas | anyway, we'll sort that out of the meeting | 16:06 |
sean-k-mooney | so its fine for now | 16:06 |
elodilles | ++ | 16:07 |
bauzas | yup and thanks to dansmith and sean-k-mooney for working hard on the ceph patches | 16:07 |
sean-k-mooney | mainly dansmith | 16:08 |
bauzas | -EOLDGREEK to me | 16:08 |
sean-k-mooney | I have been busy with other things other than checking back every now and then | 16:08 |
bauzas | (I mean, I know what ceph does and all the things, but that is a different story) | 16:08 |
sean-k-mooney | anyway we can move on | 16:08 |
bauzas | cool | 16:08 |
bauzas | https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly | 16:08 |
bauzas | unsurprisingly they are all green | 16:09 |
bauzas | but when those ran we didn't have the py39 updates yet :) | 16:09 |
sean-k-mooney | good timing i guess | 16:09 |
bauzas | well, tooz upgraded to 4.0 the week before | 16:10 |
opendevreview | Merged openstack/nova stable/xena: Fix rescue volume-based instance https://review.opendev.org/c/openstack/nova/+/875343 | 16:10 |
bauzas | so that's fortunate we haven't merged the u-c patch during the weekend | 16:10 |
bauzas | anyway, moving on | 16:10 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:11 |
bauzas | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:11 |
bauzas | #topic Release Planning | 16:11 |
bauzas | #link https://releases.openstack.org/bobcat/schedule.html | 16:11 |
bauzas | #info Nova deadlines are set in the above schedule | 16:11 |
bauzas | #info Bobcat-1 is in 2 weeks | 16:11 |
bauzas | (we'll have a stable branch review day on Bobcat-1 but I'll explain this the next week) | 16:11 |
bauzas | #topic Review priorities | 16:12 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:12 |
bauzas | #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review | 16:12 |
bauzas | #topic Stable Branches | 16:12 |
bauzas | elodilles: take the mic | 16:12 |
elodilles | yepp | 16:12 |
elodilles | beyond the usual stuff, | 16:12 |
elodilles | (unblocked gate & many rechecks) | 16:13 |
elodilles | auniyal prepared release patches for yoga and zed | 16:13 |
elodilles | they haven't merged yet | 16:13 |
elodilles | and meanwhile 1-1 patches merged to stable/yoga and zed | 16:13 |
auniyal | yes, got one +2 so far :), thanks for the review elodilles | 16:14 |
elodilles | otherwise they are good as they are | 16:14 |
elodilles | auniyal: thanks for proposing the patches :) | 16:14 |
bauzas | cool | 16:14 |
bauzas | anything else ? | 16:15 |
elodilles | nope, i think that was all | 16:16 |
elodilles | from my side | 16:16 |
elodilles | sorry :) | 16:16 |
bauzas | cool, moving on | 16:17 |
bauzas | #topic Open discussion | 16:17 |
bauzas | (ykarel) Allow to add tb-cache size libvirt option for qemu, context https://bugs.launchpad.net/nova/+bug/1949606 | 16:17 |
bauzas | ykarel: go for it | 16:17 |
ykarel | hi | 16:19 |
ykarel | so qemu 5.0.0 (included in Ubuntu Jammy) updated the default tb-cache size to 1 GiB (from 32 MiB) for system-emulated guest VMs, and with that each guest VM uses much more system memory (1 GB+ per guest), resulting in oom-kill issues when creating multiple guest VMs concurrently in neutron scenario jobs that use Ubuntu guest VMs | 16:19 |
ykarel | libvirt-8.0.0 added an option to configure it per guest vm | 16:20 |
ykarel | Currently testing WIP nova patch https://review.opendev.org/c/openstack/nova/+/868419 in https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/881391 | 16:20 |
ykarel | should this be a specless RFE or can continue as a bug? | 16:20 |
sean-k-mooney | ykarel: did you file the specless blueprint I asked for | 16:20 |
sean-k-mooney | ah | 16:20 |
sean-k-mooney | its not a bug | 16:20 |
ykarel | sean-k-mooney, I just added it to the meeting agenda to get clarity | 16:21 |
ykarel | but if it's a way to go then i will file it | 16:21 |
sean-k-mooney | we might consider it to be a small enough workaround that we might want to backport it upstream | 16:21 |
sean-k-mooney | to enable ci testing | 16:21 |
ykarel | hmm mainly would be needed for releases running on jammy | 16:22 |
sean-k-mooney | the issue is that I don't think 22.04 has libvirt 8.0.0 | 16:22 |
ykarel | it has | 16:22 |
ykarel | that's what we testing | 16:22 |
sean-k-mooney | ah ok | 16:22 |
bauzas | we really need an update of https://docs.openstack.org/nova/latest/reference/libvirt-distro-support-matrix.html | 16:23 |
sean-k-mooney | yes | 16:23 |
sean-k-mooney | so your patch is missing a min libvirt version check | 16:23 |
bauzas | we're still having libvirt 6.0.0 as a min, right? | 16:23 |
sean-k-mooney | so that will need to be added | 16:23 |
ykarel | from job logs libvirt-daemon 8.0.0-1ubuntu7.4 | 16:23 |
sean-k-mooney | bauzas: yes and im hoping to bump that to 7.0.0 this release | 16:23 |
ykarel | sean-k-mooney, yes will include that in next update | 16:23 |
sean-k-mooney | but we will still need to check for 8.0.0 support | 16:24 |
sean-k-mooney | ack | 16:24 |
bauzas | sean-k-mooney: but we would still need to only set it if libvirt >=8 | 16:24 |
bauzas | do we want to schedule on it, or is it just a performance fix ? | 16:24 |
sean-k-mooney | ya either only set it or preferably check this in init_host | 16:24 |
sean-k-mooney | and raise an error if you set the config option on an old libvirt | 16:24 |
bauzas | will there be any config knob ? | 16:24 |
sean-k-mooney | yes there shoudl be | 16:25 |
ykarel | yes that's second question | 16:25 |
sean-k-mooney | to set the cache size | 16:25 |
ykarel | Default setting for this new option, unconfigured or set to defaults like 32MB or 128 MB etc? | 16:25 |
bauzas | ykarel: if that's a config option, that indeed needs to be a hard fail if the operator sets this config value | 16:26 |
bauzas | and libvirt isn't recent enough | 16:26 |
sean-k-mooney | it's not something we should hardcode | 16:26 |
ykarel | bauzas, sure will take care that in the patch | 16:26 |
bauzas | for that reason, I'm not happy with a default value except none | 16:26 |
sean-k-mooney | so it should be a config option; I'm ok with a low default or leaving it unset | 16:26 |
bauzas | I'd prefer it unset for upgrade reasons | 16:27 |
ykarel | ok thanks, will keep it like that: no default and let the user configure it | 16:27 |
sean-k-mooney | bauzas: to your point it would be nice to have a trait for this but not required | 16:27 |
bauzas | ykarel: s/user/operator but I think I get your point | 16:27 |
ykarel | so I will propose the blueprint and update the patch as per all the suggestions, thanks | 16:27 |
sean-k-mooney | in this case really it will be the zuul job | 16:27 |
ykarel | yes | 16:28 |
sean-k-mooney | this is of interest to people using qemu for emulation | 16:28 |
bauzas | sean-k-mooney: man, we could recommend the operators to provide custom traits for this, exactly like vgpu types | 16:28 |
bauzas | I mean, eventually all the computes will support that, right? | 16:28 |
sean-k-mooney | yes | 16:28 |
bauzas | after a couple of releases, once we cap libvirt to >=8 | 16:28 |
sean-k-mooney | so its not goign to break live migration | 16:29 |
bauzas | so I'm not a big fan of adding some scheduling thing for something that will eventually be supported mid-term | 16:29 |
sean-k-mooney | because we do not allow live migration from a newer to older microversion | 16:29 |
sean-k-mooney | and for cold migration we will regenerate the xml on the appropriate host | 16:29 |
sean-k-mooney | based on what it has available | 16:29 |
bauzas | s/microversion/libvirt version but yeah | 16:29 |
sean-k-mooney | so I don't think we need anything special here | 16:29 |
sean-k-mooney | so ya a custom trait would be fine with me | 16:30 |
sean-k-mooney | they can use provider.yaml to set that if they want | 16:30 |
bauzas | cool, so that only seems to need a config knob to add, a check on init_host to fail if it's set, and some magic in the driver to enable it | 16:30 |
bauzas | amirite ? | 16:30 |
sean-k-mooney | more or less | 16:30 |
bauzas | then, I'm OK for specless | 16:30 |
sean-k-mooney | + docs, tests etc. but it's effectively self-contained in the libvirt driver + the config tweak | 16:30 |
bauzas | we had precedents | 16:30 |
bauzas | + a relnote obviously | 16:31 |
bauzas | ykarel: do you agree with the direction ? | 16:31 |
ykarel | bauzas, yes | 16:31 |
sean-k-mooney | im ok with specless too assuming all of the above are done | 16:31 |
bauzas | anyone disagreeing ? | 16:31 |
bauzas | looks not | 16:32 |
ykarel | Thanks folks \o/ | 16:32 |
sean-k-mooney | the only other thing I would suggest is once this is done a devstack patch should be added to set this by default to say 32MB if qemu is used instead of kvm | 16:32 |
sean-k-mooney | that's out of scope for this meeting and could be done on a per-job basis too | 16:33 |
bauzas | #agreed enabling tb cache seems a specless blueprint, provided it only adds a config knob defaulting to unset, init_host failing on an older libvirt and just libvirt config tweak | 16:33 |
bauzas | #action ykarel to ping bauzas once the blueprint is created so that he can approve it | 16:34 |
bauzas | ykarel: and yeah, the scope of this feature can include devstack change and testing, for sure | 16:34 |
bauzas | like we could enable it in nova-next | 16:34 |
sean-k-mooney | bauzas: it will reduce the memory pressure in all our jobs, so once we know it does not have a negative impact we will probably want to have it enabled in all of them | 16:35 |
bauzas | anything else to add on this item ? | 16:35 |
sean-k-mooney | but we can take it slow like the mariadb reduced memory | 16:35 |
bauzas | sean-k-mooney: oh yeah, but I'd be in favor of testing it first | 16:36 |
bauzas | yah | 16:36 |
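As a rough illustration of the devstack/CI follow-up discussed above, a job could inject the new knob through devstack's post-config mechanism. This is only a sketch: the [libvirt]tb_cache_size option name and the 32 MB value are assumptions based on the WIP patch and the conversation, not a merged interface.

```yaml
# hypothetical zuul job tweak; devstack_local_conf/post-config is the usual
# way devstack-based jobs inject nova.conf settings
- job:
    name: nova-next
    vars:
      devstack_local_conf:
        post-config:
          $NOVA_CONF:
            libvirt:
              # restore something close to the pre-qemu-5.0 default (32 MiB)
              tb_cache_size: 32
```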
bauzas | ok, fwiw, I have another item | 16:36 |
bauzas | (bauzas) Can https://blueprints.launchpad.net/nova/+spec/cold-migrate-to-host-policy be specless ? | 16:36 |
bauzas | tl;dr: we discussed this at the PTG | 16:37 |
sean-k-mooney | assuming there is no change in the default policy then yes i think so | 16:37 |
bauzas | operators want a better granularity and maybe change the cold-migrate action to be admin_or_owner | 16:37 |
bauzas | but, here, we're just adding a new policy which is admin-only | 16:37 |
bauzas | so no API change, and no policy changes | 16:37 |
bauzas | it will just go check a separate policy if host is set | 16:38 |
bauzas | (literally a one-liner patch besides the policy file) | 16:38 |
bauzas | any objections to have it specless ? | 16:38 |
sean-k-mooney | so basically there will be two policies now for cold migration, one for migration with a host and one without, but admin-only by default | 16:38 |
sean-k-mooney | and then operators can choose | 16:39 |
sean-k-mooney | +1 | 16:39 |
bauzas | correct, like we have for os_compute_api:servers:create:forced_host | 16:39 |
bauzas | except I won't change the default rule for os_compute_api:os-migrate-server:migrate | 16:40 |
bauzas | there will be os_compute_api:os-migrate-server:migrate and os_compute_api:os-migrate-server:migrate:host | 16:40 |
bauzas | both being admin-only | 16:40 |
bauzas | (and operators can decide to open os_compute_api:os-migrate-server:migrate to endusers) | 16:40 |
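Purely as an illustration of what that split buys operators: the second rule name follows the discussion above, while the check strings are example choices an operator might make, not proposed defaults. A policy.yaml override could look like this:

```yaml
# hypothetical operator policy.yaml: open plain cold migration to project
# members while keeping the host-targeted variant admin-only
"os_compute_api:os-migrate-server:migrate": "role:member and project_id:%(project_id)s"
"os_compute_api:os-migrate-server:migrate:host": "role:admin"
```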
bauzas | so I reiterate, any objection to have it specless ? | 16:41 |
bauzas | looks not | 16:41 |
bauzas | if so, | 16:41 |
sean-k-mooney | as long as there is at least a blueprint I'm happy. I dislike changing policy without any tracker, so that works for me | 16:42 |
bauzas | #agreed https://blueprints.launchpad.net/nova/+spec/cold-migrate-to-host-policy accepted as a specless feature for Bobcat | 16:42 |
bauzas | sean-k-mooney: there is a blueprint, and there will be a relnote | 16:42 |
bauzas | and there will be functional tests covering this | 16:42 |
sean-k-mooney | yep all good. | 16:42 |
bauzas | I don't think we need a tempest change, do you think it's a nice to have ? | 16:43 |
sean-k-mooney | I don't think tempest should test non-default policy | 16:43 |
bauzas | yeah, that was my question | 16:43 |
bauzas | I'm not a QA expert | 16:43 |
bauzas | tempest is branchless | 16:43 |
bauzas | so that would be a bit hard to test it with tempest | 16:44 |
sean-k-mooney | I suspect you could reuse some of the existing tests with the right config if you needed to | 16:44 |
bauzas | anyway, I think we're done on this | 16:44 |
bauzas | about tempest, we could discuss this on the review time | 16:44 |
bauzas | thanks folks | 16:44 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Reproduce bug 1995153 https://review.opendev.org/c/openstack/nova/+/862967 | 16:44 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Save cell socket correctly when updating host NUMA topology https://review.opendev.org/c/openstack/nova/+/862964 | 16:44 |
bauzas | any other item to add before we end the meeting ? | 16:45 |
auniyal | small thing o/ | 16:45 |
auniyal | CI on yoga: this one keeps failing for different reasons, mostly | 16:45 |
auniyal | https://review.opendev.org/c/openstack/nova/+/839922 | 16:45 |
bauzas | shot | 16:45 |
auniyal | mostly volume tests | 16:45 |
sean-k-mooney | ya that's kind of a pain, I'm not sure there is anything we can do beyond recheck | 16:46 |
sean-k-mooney | is it the volume detach tests | 16:46 |
sean-k-mooney | auniyal: gibi found some tests that are not waiting properly | 16:46 |
bauzas | I think this is also tracked on the stable CI failures etherpad | 16:46 |
auniyal | yes, attach and detach , but they are always different | 16:46 |
auniyal | sometimes timeout | 16:47 |
sean-k-mooney | yoga is not EM right so its still using tempest master? | 16:47 |
auniyal | no | 16:47 |
auniyal | tbc no, its not EM | 16:47 |
sean-k-mooney | ok | 16:47 |
sean-k-mooney | so it still can get tempest fixes if we fix those tests | 16:47 |
bauzas | yup | 16:48 |
bauzas | are we done ? | 16:48 |
auniyal | sorry I didn't get it, any action on the above? | 16:48 |
auniyal | we need to fix tempest tests ? | 16:49 |
sean-k-mooney | I think just continue to recheck it. gibi found at least one test that is not waiting for sshable | 16:49 |
bauzas | no, we have some tempest patches up | 16:49 |
sean-k-mooney | and noticed others don't appear to be waiting, but I don't have the context | 16:49 |
bauzas | and yoga would benefit from those | 16:50 |
bauzas | since tempest is branchless | 16:50 |
sean-k-mooney | oh do you have a link? | 16:50 |
auniyal | ack thanks | 16:50 |
bauzas | I was referring to gibi's recent discoveries of testing gap for ssh wait | 16:51 |
bauzas | (sorry was looking at the -tc meeting) | 16:52 |
bauzas | -tc chan* | 16:52 |
bauzas | can we close this meeting now ? | 16:52 |
sean-k-mooney | its fine we can wrap this here and chat after | 16:52 |
bauzas | cool | 16:53 |
bauzas | thanks all | 16:53 |
bauzas | #endmeeting | 16:53 |
opendevmeet | Meeting ended Tue Apr 25 16:53:12 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:53 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.html | 16:53 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.txt | 16:53 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.log.html | 16:53 |
sean-k-mooney | https://review.opendev.org/c/openstack/tempest/+/880891 | 16:53 |
sean-k-mooney | seems to be related | 16:53 |
elodilles | thanks o/ | 16:53 |
sean-k-mooney | so that test is wrong | 17:01 |
sean-k-mooney | we do not support attaching or detaching ports or volumes from neutron or cinder | 17:01 |
frickler | so we run this test for 6 years, have issues with it time and again, and only now notice that it tests an unsupported scenario? cool | 17:02 |
sean-k-mooney | frickler: it has never been supported | 17:03 |
sean-k-mooney | I just noticed this existed because gmann has a dnm patch up | 17:03 |
gmann | which one ? | 17:04 |
frickler | yes, saw the comment in the patch. also didn't want to blame anyone, just enjoying the wondrous world of openstack once again | 17:04 |
sean-k-mooney | https://github.com/openstack/tempest/blob/master/tempest/api/volume/test_volumes_actions.py#L39-L55 | 17:04 |
frickler | https://review.opendev.org/c/openstack/tempest/+/881132/3 | 17:04 |
sean-k-mooney | it kind of depends on what self.volumes_client.attach_volume actually does | 17:05 |
frickler | ah, no, the one below | 17:05 |
sean-k-mooney | if its calling nova its fine | 17:05 |
sean-k-mooney | if its using the cinder attachments api directly its not | 17:05 |
sean-k-mooney | that looks like it's calling cinder https://github.com/openstack/tempest/blob/20e460dacfae6b4546903a9caaf9253330f27b5a/tempest/clients.py#L286 | 17:07 |
sean-k-mooney | frickler: this was actually added 11 years ago https://github.com/openstack/tempest/commit/a42fe441703084449107fabb15fe42938c02ba08 | 17:10 |
sean-k-mooney | that does not mean it has been correct or supported for all that time | 17:10 |
frickler | ah, I was only looking at the current blame, which says 2017 | 17:10 |
frickler | you can see the actual API calls in https://4ae644854fb3bf106e9b-6877b85dbe482cd2daa62a6731b06023.ssl.cf1.rackcdn.com/881132/3/check/tempest-full-py3/37d3ce7/controller/logs/tempest_log.txt | 17:10 |
gmann | frickler: sean-k-mooney ok, that one. those tests are meant for the cinder standalone case and they are not a valid scenario involving nova halfway | 17:12 |
sean-k-mooney | right, they are fine if you are using cinder standalone | 17:12 |
frickler | POST https://213.32.75.38/compute/v2.1/servers/2a24008b-6c93-4b83-a678-8d5b0be7b6a1/os-volume_attachments | 17:12 |
gmann | nobody ever tested whether passing a nova server id in an attachment via cinder will work from the nova perspective or not | 17:12 |
frickler | that looks like nova being used | 17:12 |
gmann | I was testing those to remove nova involvement from those tests and nova+cinder attachment anyways are tested in many other tests | 17:13 |
sean-k-mooney | yep nova should be removed from them | 17:13 |
gmann | frickler: the attachment is directly to cinder, not via nova, so nova does not know about the attachment but cinder thinks the server is attached to the volume so it makes it in-use | 17:13 |
sean-k-mooney | marking it in use is correct | 17:14 |
sean-k-mooney | but we should not see the volume attached to the vm | 17:14 |
gmann | yeah, i mean as nova does not know about attachment, adding server_id as valid attachment is not correct. | 17:14 |
gmann | that server_id can be invalid or can be deleted anytime | 17:14 |
gmann | without cinder knowing | 17:14 |
gmann | sean-k-mooney: yeah, VM does not know about volume | 17:15 |
sean-k-mooney | anyway I'm glad you are looking at it; you can ping me after the patch is out of DNM if you want me to review | 17:15 |
gmann | k | 17:16 |
sean-k-mooney | dansmith: you pinged me yesterday to look at a patch maybe related to the stable uuid stuff | 17:17 |
sean-k-mooney | do you remember what it was | 17:17 |
dansmith | sean-k-mooney: the rt stuff, but it's all blocked of course.. gibi gave it the +W so it's probably good for me to just fast-approve once the gate is unblocked | 17:17 |
sean-k-mooney | oh right ya that was it | 17:17 |
sean-k-mooney | I remember seeing it had a +w | 17:18 |
sean-k-mooney | cool | 17:18 |
dansmith | yep, thanks | 17:18 |
dansmith | I'll definitely hit you up if I need a re-review once things get unblocked | 17:18 |
dansmith | *if* they get unblocked I should say :) | 17:18 |
frickler | the pysaml revert mergen, so I think CI should be unblocked | 17:19 |
frickler | merged even | 17:19 |
dansmith | we'll see :) | 17:19 |
sean-k-mooney | I'm going to go get dinner. I might be around later but I'm mostly done for today | 17:20 |
dansmith | mmm, ceph job appears to be failing again in a similar way.. hope we don't have more work to do | 17:50 |
bauzas | dansmith: which patch are you checking for the job runs ? | 18:31 |
bauzas | so I can try to look over it tomorrow morning | 18:31 |
dansmith | https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315 | 18:32 |
bauzas | ack, will target it tomorrow morning | 18:32 |
dansmith | it only failed six tests this time instead of a timeout, so maybe it's better than I thought | 18:32 |
dansmith | but six is still a lot, and I haven't gone through the latest logs yet | 18:32 |
bauzas | I can try to dig into those later | 18:33 |
dansmith | the fails look all volume detach related | 18:44 |
dansmith | so perhaps it's not really a ceph problem | 18:44 |
dansmith | but it seems like a large number for a single run, so I'm not sure | 18:44 |
dansmith | melwitt: can you +W this? https://review.opendev.org/c/openstack/nova/+/881409/2 | 20:31 |
dansmith | eharney: gouthamr: Well, it installs and "works" on jammy, but something isn't happy: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315?tab=change-view-tab-header-zuul-results-summary | 20:38 |
dansmith | eharney: gouthamr I don't really know what I'm looking at, but I don't see any errors in the cinder or ceph stuff that I recognize, and just some "busy" messages from rbd around the failed detaches in the n-cpu log | 20:41 |
gouthamr | hey dansmith - /me is late to the party | 20:42 |
dansmith | gouthamr: we need to drop focal from the jobs and our gate has been blocked for two days because some did it early..we've reverted those things for the moment, but we need to get the ceph job working on jammy | 20:43 |
dansmith | gouthamr: the above patch unpins the jobs to let them run on jammy and they get pretty far, but some volume/ceph related failures are showing that something is not happy | 20:43 |
dansmith | gouthamr: are you the right person to get that working? | 20:44 |
melwitt | dansmith: just to confirm, you still want to remove after neutron has reverted? https://review.opendev.org/c/openstack/neutron/+/881430 | 20:44 |
dansmith | melwitt: yeah the neutron failure was a couple failures ago, and not even the only problem.. but as noted in the original patch, it was intended to only live for antelope and then be reverted, so we need to do it anyway | 20:45 |
gouthamr | dansmith: probably not; i'm not an expert on rbd or cinder... | 20:45 |
dansmith | gouthamr: oh.. who is then? | 20:45 |
gouthamr | eharney is my go to guy, probably jbernard | 20:46 |
dansmith | gouthamr: okay he said earlier today that he was not likely the guy to ask (unless I misunderstood) | 20:46 |
gouthamr | ah :) let me look at the logs and see if something pops out | 20:47 |
gouthamr | we've been burnt before by using distro packages for ceph because fixes took forever to land - so we shied away from them and looked upstream.. | 20:48 |
gouthamr | but, like you've discussed, the ceph community hasn't built jammy packages for the latest release (quincy) - they meant to, but they lost people/mindshare in recent months | 20:49 |
dansmith | gouthamr: yeah, but last I checked, there were not packages from ceph themselves for jammy | 20:49 |
dansmith | gouthamr: and the cephadm job is even more broken and marked n-v so I assume it's not healthier | 20:49 |
gouthamr | on the manila jobs, we pivoted to use centos-stream-9 because ceph folks continue to publish packages there | 20:49 |
gouthamr | s/there/for it | 20:49 |
gouthamr | dansmith: yep; on that change, i see there's a problem with podman... | 20:50 |
dansmith | gouthamr: yeah, but stream breaks us constantly | 20:50 |
gouthamr | oh | 20:50 |
dansmith | there's very little chance we're going to be able to have this job run on stream :) | 20:50 |
dansmith | gouthamr: yeah I see the podman thing, but the job is also marked non-voting | 20:51 |
dansmith | so I assume it doesn't have a long track record of stability :) | 20:51 |
gouthamr | yes; i don't think we've run the job long enough to test for stability: https://zuul.opendev.org/t/openstack/builds?job_name=devstack-plugin-ceph-cephfs-nfs | 20:52 |
dansmith | presumably the cephadm approach means we could run on jammy but with upstream ceph fixes | 20:52 |
gouthamr | yep | 20:52 |
dansmith | gouthamr: that's the cephfs job, but I assume that's not what nova needs | 20:52 |
dansmith | I'm looking at devstack-plugin-ceph-tempest-cephadm | 20:52 |
gouthamr | yes; just pointing out that job because it uses centos-9-stream | 20:53 |
dansmith | ah okay | 20:53 |
gouthamr | ack; | 20:53 |
gouthamr | "devstack-plugin-ceph-tempest-cephadm" on focal-fossa used a third party repo to get podman | 20:53 |
dansmith | ah, and podman is in jammy itself I think right? | 20:54 |
gouthamr | by the looks of it, yes | 20:54 |
dansmith | although it seems broken :) | 20:54 |
dansmith | anyway, I thought there was also some concern that the cephadm job didn't expose the ceph config that nova needed or something like that, but I heard that like 20th hand | 20:55 |
gouthamr | shouldn't be the case | 20:55 |
dansmith | okay | 20:55 |
gouthamr | i think we tested some of this without tempest in the picture - but that job ("devstack-plugin-ceph-tempest-cephadm") has never passed; we assumed someone working on nova/cinder/glance would help looking at it at some point | 20:56 |
gouthamr | sorry this feels disjointed - conversations happened on irc and gerrit, ptg and the ML iirc.. but its time to reprise this because it's urgent.. | 20:57 |
gouthamr | that's a tangent though, let me see if i can spot an issue with the package based job you're looking to fix | 20:57 |
dansmith | ugh, never passed? that's no good.. I wonder if it's for the same reason the non-cephadm job is failing? | 20:58 |
dansmith | I can try to get podman working on this to see if it's otherwise the same | 20:58 |
gouthamr | ++ | 20:58 |
dansmith | gouthamr: this is definitely disjointed, and I feel like I'm just flailing because nobody else is :/ | 20:58 |
gouthamr | you're doing godly work :D | 20:59 |
dansmith | $deitly work you mean :) | 20:59 |
dansmith | er, $deityly .. or something | 21:00 |
gouthamr | :P | 21:00 |
dansmith | okay I just pushed something that might get podman working based on that error message, so we'll see | 21:00 |
dansmith | melwitt: thanks for the +W.. I would have just removed that neutron reference and changed to "effing everything" but didn't want to have to make another trip through the jobs, as you can probably imagine :) | 21:01 |
dansmith | gouthamr: the cephadm that has never passed.. is that always on quincy (i.e. newer than what we were running in focal) or what? | 21:06 |
dansmith | I | 21:06 |
melwitt | dansmith: understandable :) | 21:06 |
gouthamr | dansmith: 5/6 failures are on volume detach timeouts; and the request never got to cinder afaict .... https://zuul.opendev.org/t/openstack/build/9ebda7c1ebf843209e57ef0eac13814f/log/controller/logs/screen-n-cpu.txt#61276-61329 | 21:07 |
dansmith | gouthamr: yeah but you see the rbd busy messages right? | 21:07 |
dansmith | I commented on an earlier patch | 21:08 |
gouthamr | ah; no i missed those | 21:08 |
dansmith | I'm not sure the cinder detach would have happened by this point by the way, because we haven't gotten the guest to let go yet | 21:08 |
gouthamr | yes | 21:09 |
gouthamr | might be my browser, but i don't see "rbd.ImageBusy: [errno 16] RBD image is busy (error removing image)" in the latest n-cpu logs | 21:10 |
gouthamr | should i be looking elsewhere? | 21:10 |
dansmith | yeah actually I don't think I see them in the latest either.. but that was just a recheck | 21:11 |
dansmith | almost identical set of failures though | 21:11 |
dansmith | so yeah.. weird | 21:11 |
dansmith | gouthamr: here's an example from the previous run: https://zuul.opendev.org/t/openstack/build/f2ecbdd78616419cb5c8c2b3f4a8b71a/log/controller/logs/screen-n-cpu.txt#55439 | 21:13 |
dansmith | that's all over those logs and absent from the latest.. bizarre | 21:15 |
dansmith | gouthamr: cephadm job made it past cephadm install phase, so.. progress I think | 21:17 |
gouthamr | very nice | 21:17 |
dansmith | and finished pool setup (ignorant guess from the commands it ran) | 21:20 |
gouthamr | https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315/11/devstack/files/debs/devstack-plugin-ceph breaks focal fossa jobs (cephfs-native) though - but, we can fix that up with some os version annotation, correct? | 21:22 |
dansmith | we can fix it by putting it in the code instead of those package lists | 21:23 |
dansmith | I just jammed it in there because it was hard to mess up and will make sure we get it installed from the distro | 21:23 |
gouthamr | ack; either that or i can just fix the native cephfs job to use jammy too | 21:24 |
dansmith | the current install_podman thing does checks for focal, so I'd just extend that | 21:24 |
dansmith | gouthamr: yeah, although the ceph jobs on stable have to use the plugin without branches don't they? | 21:24 |
gouthamr | no this repo is branched | 21:24 |
dansmith | ah okay | 21:24 |
dansmith | well, either way.. if we move this to only >=jammy for everything (along with the PTI for 2023.2) then we can just use this debs list thing and remove the focal-specific install stuff.. whatever you want | 21:25 |
dansmith | I just want it to work :) | 21:25 |
gouthamr | agree; lets see this work | 21:25 |
gouthamr | if the "rbd remove" thing fails again on this run with the error, i would suggest reporting a bug - we could _try_ this thing on centos-9-stream and see whether there's some weirdness in the distro packages | 21:27 |
gouthamr | but, i am nervous about that job's future with ubuntu since we've learned about the ceph community's stance | 21:27 |
dansmith | ack, well, based on how this is working, I'm hoping the cephadm will either "work" or "fail the same way" and then we can discount distro packages | 21:28 |
gouthamr | ++ | 21:28 |
dansmith | it's running tempest now and looking identical to the distro version so far (i.e. hasn't failed but hasn't run volume tests yet) | 21:28 |
dansmith | so that's majorly better. failing with upstream bits is a minor win over failing with distro bits :P | 21:28 |
eharney | i have some work in flight currently around rbd ImageBusy errors... i wonder if this job is running some tests that were previously disabled? | 21:29 |
dansmith | eharney: shouldn't be, and one run failed with a bunch of them and then a single recheck failed similarly, but with no rbdbusy errors | 21:29 |
gouthamr | eharney: interesting, are there librbd changes that you're having to work around? | 21:31 |
eharney | gouthamr: not new ones, just working on fixing the class of errors around rbd images that can't be deleted that we've always had | 21:32 |
gouthamr | eharney: oh.. this error seems to have occurred multiple times in the libvirt rbd "remove image" call.. | 21:34 |
gouthamr | i'm hoping no openstack code needs to change; we're hoping to support quincy with stable/wallaby downstream :D based on some testing of this stuff elsewhere | 21:36 |
dansmith | gouthamr: yeah I was going to ask earlier.. are we using quincy downstream such that we know it works? this set of failures has me concerned that it's something fundamental of course | 21:36 |
gouthamr | (same tests, different OS, full blown ceph cluster as opposed to our aio .. etc etc -- so there could be a number of things being issues) | 21:37 |
gouthamr | dansmith: yes, we're not testing quincy with openstack's trunk though... we trail downstream; but this stuff is working with zed last i checked, with the same tempest tests passing there | 21:38 |
dansmith | okay that's good | 21:38 |
dansmith | well, it's passing some volume tests at least | 21:56 |
gouthamr | ++ | 21:56 |
gouthamr | a couple of things we wanted to try in this cephadm job in the past: | 21:57 |
dansmith | gouthamr: so if this works magically, you're okay just making this drop support for focal as long as the other jobs here are set to run on jammy? | 21:57 |
gouthamr | (1) revert to default test concurrency --- the concurrency was set to 1 because we saw resource contention | 21:57 |
gouthamr | dansmith: yes | 21:58 |
dansmith | resource contention like memory/ | 21:58 |
gouthamr | yes, and disk | 21:58 |
dansmith | so there's a tweak in devstack for memory that has been helping a lot of jobs, and we run with it enabled in the nova ceph jobs | 21:58 |
gouthamr | oh? i'd love to know! | 21:58 |
dansmith | drops mysql usage by about half, which is ~400MiB on these jobs | 21:58 |
gouthamr | nice | 21:58 |
dansmith | gouthamr: problem is you have to pay me a royalty per job execution to use it | 21:58 |
dansmith | gouthamr: https://github.com/openstack/nova/blob/master/.zuul.yaml#L422 | 21:59 |
gouthamr | :D i'd be poor if i was betting on ceph using less memory ever | 21:59 |
dansmith | haha | 21:59 |
gouthamr | (2) there's also a way to turn off "cephadm" after the install -- we should set that option on the CI - there's no use to keep it running | 22:00 |
dansmith | so based on our success with that, if you're cool with it, I'd say we turn that on for these jobs anyway | 22:00 |
dansmith | gouthamr: meaning before we start running tempest? | 22:00 |
gouthamr | yes, the plugin will do that for you after the ceph cluster deployment is done: https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/cephadm#L28 | 22:01 |
dansmith | oh so why is that not being set? | 22:02 |
dansmith | I don't even know what that means.. cephadm is a tool I thought, but you're saying it stays running even though the services are started or something? | 22:02 |
gouthamr | (here's an example from the only good cephadm job at the moment: https://github.com/openstack/manila-tempest-plugin/blob/ad0db6359c8c51c4521ac6660c8014981b2f1dea/zuul.d/manila-tempest-jobs.yaml#L413) | 22:02 |
dansmith | ack | 22:02 |
gouthamr | yes; we need it for day2 operations on the cluster | 22:02 |
gouthamr | so if you're running this on your local devstack, it's useful | 22:03 |
dansmith | ah, sure, okay | 22:03 |
dansmith | we should set that in these jobs that everyone inherits from | 22:03 |
gouthamr | ++ | 22:03 |
dansmith | so maybe a follow-on to this to set that, the mysql thing, and make this job voting | 22:03 |
dansmith | you know, if and when it starts working :) | 22:04 |
gouthamr | ^^ +1 | 22:04 |
dansmith | hmm, gouthamr I just noticed that we're still set to release=pacific for the distro-package-based job | 22:06 |
dansmith | is that maybe setting stuff that is non-ideal for quincy that could be related? | 22:06 |
gouthamr | i think the plugin ignores it, let me check | 22:06 |
dansmith | okay | 22:06 |
dansmith | https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315/11/.zuul.yaml#46 | 22:06 |
dansmith | okay just used for package repos anyway it looks like | 22:08 |
gouthamr | ack; we should get that opt out of the job because its confusing | 22:08 |
gouthamr | https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/ceph#L981-L989 | 22:08 |
dansmith | ack, so I assume we're configuring a repo but just not installing anything from it on jammy, right? | 22:09 |
dansmith | I'll add it to my follow-on patch to add the optimization devstack vars | 22:10 |
dansmith | add .. the removal of it, I mean :) | 22:10 |
gouthamr | ack ty | 22:10 |
gouthamr | i may have messed this up in my last patch | 22:10 |
gouthamr | https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/ceph#L1067-L1094 | 22:10 |
gouthamr | we're not invoking that method at all for ubuntu anymore.. | 22:10 |
dansmith | ack | 22:11 |
dansmith | well, there's a bunch of focal-specific stuff to clean up in there regardless | 22:11 |
dansmith | hmm, I think it's about to fail a test | 22:14 |
dansmith | been stopped for going on five minutes, assuming retrying a detach | 22:15 |
gouthamr | yes.. it was a head-scratcher; think vkmc and i noticed that our override to use download.ceph.com stopped working with focal at some point since ubuntu default-enabled the ubuntu ceph repos.. so dropping it made no difference, we ended up using/testing with the distro provided packages | 22:15 |
gouthamr | oh | 22:15 |
gouthamr | test_rebuild_server_with_volume_attached? | 22:16 |
dansmith | dunno yet | 22:16 |
gouthamr | ah | 22:16 |
dansmith | but it's in the rebuld group | 22:16 |
dansmith | yep | 22:16 |
dansmith | test_rebuild_server_with_volume_attached [430.305393s] ... FAILED | 22:17 |
dansmith | ugh | 22:17 |
dansmith | so the other variable here is the version of qemu and qemu's block-rbd driver are different in jammy of course, compared to what we've been testing | 22:18 |
dansmith | so could be a bug in one of those, especially since it's related to the detach in the guest | 22:18 |
gouthamr | ack; another thing to try would be to bump the ceph image to the latest quincy: https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/cephadm#L32 | 22:19 |
dansmith | okay | 22:20 |
gouthamr | they've published v17.2.6 today, and v17.2.5 a month ago | 22:20 |
gouthamr | https://quay.io/repository/ceph/ceph?tab=tags | 22:21 |
dansmith | ack, I hate this sort of "version minesweeper" game.. if we're that sensitive to version, it feels like we're doing something wrong | 22:21 |
gouthamr | agreed; but since this stuff hasn't worked before on our ci, its worth a try | 22:22 |
dansmith | yeah for sure | 22:22 |
dansmith | volume resize passed | 22:23 |
dansmith | maybe we'll find that this is fewer fails or something | 22:23 |
dansmith | actually, that one didn't fail before | 22:25 |
gouthamr | okay, this may be good news? devstack-plugin-ceph-tempest-py3 is going to pass | 22:28 |
dansmith | no, really? | 22:28 |
dansmith | third recheck's a charm? | 22:28 |
gouthamr | :D | 22:28 |
dansmith | more fails on this cephadm job | 22:42 |
dansmith | so I guess it's not something fundamental, but maybe just massively less stable or we're hitting some race easier? | 22:42 |
dansmith | maybe it is memory-related and we're stressed more here | 22:43 |
dansmith | maybe I should try turning on the two optimizations here to see if that makes things more stable | 22:43 |
gouthamr | ++ | 22:43 |
gouthamr | "DISABLE_CEPHADM_POST_DEPLOY: true" and "MYSQL_REDUCE_MEMORY: true" for the rescue; we can iterate after with the concurrency if these failures reduce | 22:45 |
gouthamr | or go away | 22:45 |
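A sketch of the follow-up job change being discussed, using the two variables named above; the job name and exact structure are illustrative, and both flags are assumed to be plain devstack localrc settings consumed by the plugin and devstack respectively:

```yaml
# hypothetical devstack-plugin-ceph job tweak
- job:
    name: devstack-plugin-ceph-tempest-cephadm
    vars:
      devstack_localrc:
        # roughly halves mysql memory use, as already enabled in nova's ceph jobs
        MYSQL_REDUCE_MEMORY: true
        # stop the cephadm management pieces once the cluster is deployed;
        # CI runs never need day-2 operations
        DISABLE_CEPHADM_POST_DEPLOY: true
```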
opendevreview | Artom Lifshitz proposed openstack/nova master: Reproduce bug 1995153 https://review.opendev.org/c/openstack/nova/+/862967 | 22:46 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Save cell socket correctly when updating host NUMA topology https://review.opendev.org/c/openstack/nova/+/862964 | 22:46 |
dansmith | gouthamr: ack, will put those in here and see | 22:46 |
opendevreview | Merged openstack/nova master: Remove focal job for 2023.2 https://review.opendev.org/c/openstack/nova/+/881409 | 22:53 |
dansmith | gouthamr: okay, it's off and running with those flags | 23:25 |
dansmith | I'm burnt out so I'll circle back tomorrow | 23:25 |
gouthamr | dansmith++ works; good evening! :) | 23:27 |