Tuesday, 2023-04-25

gmann: dansmith: elodilles has a patch up to remove all py38 jobs, including the focal job: https://review.opendev.org/c/openstack/nova/+/881339  00:37
ykarel: dansmith, only merged in the neutron repo, the others are still unmerged; i pushed the revert  04:49
bauzascatching up this night's conversations06:55
* bauzas is about to cry in a corner06:55
opendevreview: Balazs Gibizer proposed openstack/nova master: DNM: Revert "Temporary skip some volume detach test in nova-lvm job"  https://review.opendev.org/c/openstack/nova/+/881389  07:17
gibibauzas: I need to leave early today so I will miss the nova meeting08:49
bauzasgibi: ack, no worries09:04
opendevreview: Amit Uniyal proposed openstack/nova master: WIP: Reproducer for dangling volumes  https://review.opendev.org/c/openstack/nova/+/881457  10:01
opendevreview: Merged openstack/nova stable/yoga: Handle InstanceInvalidState exception  https://review.opendev.org/c/openstack/nova/+/872117  11:27
dansmithgmann: I don't see that that ever passed13:01
dansmithusing the alternate python interpreter on focal seems less good to me than just fixing the ceph job13:02
dansmithand also not necessary if people don't prohibit 3.813:02
sean-k-mooney: dansmith: ya, i suggested it because it used to work, but it obviously got broken at some point  13:03
sean-k-mooney: i was hoping it would be a quick fix to unblock the gate  13:03
sean-k-mooney: just making the job use 22.04 would be my preference too  13:03
bauzasyup, let's try to use Jammy13:08
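For reference, switching a pinned job off focal is just a nodeset change on the Zuul side; a minimal sketch (job and nodeset names here are illustrative, not the exact devstack-plugin-ceph definitions):

    # sketch: run the ceph job on Ubuntu 22.04 (jammy) instead of the pinned focal nodeset
    - job:
        name: nova-ceph-multistore
        parent: devstack-plugin-ceph-tempest-py3
        nodeset: openstack-single-node-jammy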
opendevreview: ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api)  https://review.opendev.org/c/openstack/nova/+/836830  13:11
opendevreview: ribaudr proposed openstack/nova master: Check shares support  https://review.opendev.org/c/openstack/nova/+/850499  13:11
opendevreview: ribaudr proposed openstack/nova master: Add metadata for shares  https://review.opendev.org/c/openstack/nova/+/850500  13:11
opendevreview: ribaudr proposed openstack/nova master: Add instance.share_attach notification  https://review.opendev.org/c/openstack/nova/+/850501  13:11
opendevreview: ribaudr proposed openstack/nova master: Add instance.share_detach notification  https://review.opendev.org/c/openstack/nova/+/851028  13:11
opendevreview: ribaudr proposed openstack/nova master: Add shares to InstancePayload  https://review.opendev.org/c/openstack/nova/+/851029  13:11
opendevreview: ribaudr proposed openstack/nova master: Add helper methods to attach/detach shares  https://review.opendev.org/c/openstack/nova/+/852085  13:11
opendevreview: ribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working.  https://review.opendev.org/c/openstack/nova/+/852086  13:11
opendevreview: ribaudr proposed openstack/nova master: Add virt/libvirt error test cases  https://review.opendev.org/c/openstack/nova/+/852087  13:11
opendevreview: ribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part)  https://review.opendev.org/c/openstack/nova/+/854823  13:11
opendevreview: ribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/854824  13:11
opendevreview: ribaudr proposed openstack/nova master: Add instance.share_attach_error notification  https://review.opendev.org/c/openstack/nova/+/860282  13:11
opendevreview: ribaudr proposed openstack/nova master: Add instance.share_detach_error notification  https://review.opendev.org/c/openstack/nova/+/860283  13:11
opendevreview: ribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part)  https://review.opendev.org/c/openstack/nova/+/860284  13:11
opendevreview: ribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/860285  13:11
opendevreview: ribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares  https://review.opendev.org/c/openstack/nova/+/860286  13:11
opendevreview: ribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part)  https://review.opendev.org/c/openstack/nova/+/860287  13:11
opendevreview: ribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part)  https://review.opendev.org/c/openstack/nova/+/860288  13:11
opendevreview: ribaudr proposed openstack/nova master: Docs about Manila shares API usage  https://review.opendev.org/c/openstack/nova/+/871642  13:11
opendevreview: ribaudr proposed openstack/nova master: Mounting the shares as part of the initialization process  https://review.opendev.org/c/openstack/nova/+/880075  13:11
opendevreview: ribaudr proposed openstack/nova master: Deletion of associated share mappings on instance deletion  https://review.opendev.org/c/openstack/nova/+/881472  13:11
dansmithbauzas: oh I didn't realize you rechecked my patch and it failed again with *another* uninstallable package13:16
dansmithnot sure if it's 3.9 related or not though13:17
bauzasI haven't seen the logs yet13:17
dansmithI rechecked again so we'll see13:17
ykarel: fwiw the pysaml2 update is py3.9+ only, and keystone installation is broken on focal / py38  13:18
dansmith: yeah that ^  13:19
dansmith: that's what I failed on  13:19
dansmith: cripes  13:19
ykarel: revert patch: https://review.opendev.org/c/openstack/requirements/+/881466  13:19
bauzasykarel: last time I looked, this was due to the fact that py39 wanted UUIDs 13:21
ykarelbauzas, sorry which issue? /me can't relate to UUIDs13:35
dansmithyeah not sure what the uuid thing is13:39
bauzasit was when elodilles tried to update the py version for ceph-multinode13:41
bauzasbut I can be wrong13:41
dansmithoh, unrelated to pysaml I see13:41
ykarelack13:41
bauzasspeaking of https://1742443592df6307c50f-d94079506d616d2cb38d2b70f09b441e.ssl.cf2.rackcdn.com/881339/3/check/nova-ceph-multistore/5f269ed/controller/logs/devstacklog.txt13:41
bauzas: 2023-04-24 11:48:22.720 | WARNING py.warnings [None req-518808ec-0a14-444d-b6d3-fb6383d03e9c None None] /usr/local/lib/python3.9/dist-packages/pycadf/identifier.py:71: UserWarning: Invalid uuid: RegionOne. To ensure interoperability, identifiers should be a valid uuid.  13:42
dansmitheharney: are you around to talk about the ceph job stuff? I think you were merging some of those cephadm changes13:43
ykarelack got it, it's seperate thing13:43
eharneydansmith: yes13:45
eharneydansmith: what's going on there?13:45
dansmitheharney: so, is that how we should be moving the upstream jobs at this point? jammy has quincy ceph packages, as I understand it13:45
dansmithand the cephadm jobs don't work either13:45
dansmithand they're nonvoting so I assumed that meant they weren't "ready"13:47
dansmitheharney: anyway, the context is, we desperately need to get the ceph jobs running on jammy13:47
eharneyi was trying to understand the situation with these the other day, i guess they were pinned to focal before because there weren't jammy packages, but maybe now there are?13:48
dansmitheharney: yeah that's the reason for the pinning, but indeed now there seem to be base packages in jammy for quincy13:49
dansmithI rechecked the unpin patch yesterday and it failed with qemu not having the rbd block driver13:49
dansmitheharney: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315?tab=change-view-tab-header-zuul-results-summary13:49
eharneyhmm, looking, i'm not really up to speed on this13:50
dansmitheharney: okay is there someone else we should bug?13:50
dansmitheharney: because right now the ceph jobs are all totally wedged because a bunch of packages have just banned anything python<=3.813:51
eharneydansmith: not sure who but i think some manila folks have been more active in devstack-plugin-ceph recently13:51
dansmithreverts are in place or in flight for some of them, but we need to get it resolved13:51
eharneyahh13:51
dansmithbecause focal is not in our PTI right now...13:51
dansmitheharney: so maybe gouthamr ?13:52
sean-k-mooneydansmith: did you notice the ceph nfs job passed? https://zuul.opendev.org/t/openstack/build/3f0ec49fa9e048f6a73e6c03833fecc9/logs13:57
dansmithsean-k-mooney: I assume that's not using the qemu block driver13:57
sean-k-mooneyperhaps although that is an odd issue13:57
dansmithsean-k-mooney: the actual setup of ceph seemed to work on jammy, it's just that qemu doesn't have the block driver to load to talk to it (or something)13:57
sean-k-mooney: they may have changed that to a weak dep in the qemu packaging  13:58
sean-k-mooney: so we might need to install it in devstack explicitly  13:58
dansmithit seemed like maybe those were provided in the ceph packages instead of qemu or something 13:58
dansmithsean-k-mooney: well, I looked and couldn't find any package in jammy for it13:58
dansmithsean-k-mooney:  but yes, hopefully it's something simple13:58
sean-k-mooney: i assumed it was compiled in, if i'm being honest, but didn't look  13:58
dansmithme too13:59
dansmithuntil I saw the logs.. let me get you a link13:59
dansmithsean-k-mooney: https://zuul.opendev.org/t/openstack/build/1435ec8b57be4b159c3d373e23673ab5/log/controller/logs/screen-n-cpu.txt#843713:59
dansmithit's also weirdly named "block-block-rbd".. perhaps there's some change in jammy with something and someone is prefixing an extra... prefix/14:00
dansmithalthough it says "unknown driver rbd" on the next line, which looks right14:00
sean-k-mooney: the nova package just depends on qemu-system: https://packages.ubuntu.com/jammy/nova-compute-qemu  14:00
bauzasdansmith: sean-k-mooney: so, to clarify, once we merge https://review.opendev.org/c/openstack/nova/+/88140914:00
bauzaswe will still have a problem with ceph-multistore due to keystone + cephadm right?14:01
dansmithbauzas: yes14:01
sean-k-mooney: hum, that is not linked against librbd  14:01
sean-k-mooney: https://packages.ubuntu.com/jammy/qemu-system-x86  14:01
dansmith: bauzas: not "due to cephadm"  14:01
sean-k-mooney: ah  14:01
sean-k-mooney: https://packages.ubuntu.com/jammy/qemu-block-extra  14:01
dansmith: bauzas: just the ceph job  14:01
sean-k-mooney: dansmith: we are missing qemu-block-extra  14:01
bauzas: because of keystone ?  14:01
sean-k-mooney: that is what provides the rbd support  14:01
* bauzas tries to correctly understand the problem14:02
dansmithsean-k-mooney: ack, nice14:02
sean-k-mooneyand just checked its a recommended package14:02
eharneyahh, cool, was just looking for that missing link myself14:02
sean-k-mooneynot a dep14:02
dansmithsean-k-mooney: so we likely need a devstack ceph plugin change, let me push that up and try a dep14:02
sean-k-mooney: ya, so we could also do this via bindep in nova as an optional dep. perhaps we should as a follow up  14:03
sean-k-mooney: i was just going to look at devstack to see how it installs qemu/libvirt  14:03
sean-k-mooney: and add it there, but doing it in the devstack plugin probably makes more sense  14:03
eharney: it can go in devstack-plugin-ceph/devstack/files/debs i think  14:03
dansmith: sean-k-mooney: is that package also in focal?  14:03
sean-k-mooney: yep  14:04
sean-k-mooney: well, i'm waiting for the page to refresh  14:04
dansmith: it is  14:04
sean-k-mooney: but it looks like it should be there from bionic  14:04
dansmith: I confirmed  14:04
dansmith: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/881479  14:05
dansmith: I'll fix up the revert-focal devstack patch to depend on it  14:05
sean-k-mooney: so they moved it from suggests to recommends between focal and jammy  14:05
dansmithah14:06
sean-k-mooney: https://github.com/openstack/devstack/blob/master/lib/nova_plugins/functions-libvirt#L72  14:07
sean-k-mooney: so i was thinking of just adding it there, to be honest  14:07
sean-k-mooney: just always install qemu-block-extra in base devstack  14:08
dansmith: the ceph plugin already has a list of specific-to-it deb packages.. so it needs to go there anyway, IMHO  14:08
sean-k-mooney: sure  14:08
sean-k-mooney: i'm just wondering what else is there  14:09
sean-k-mooney: it looks like libiscsi is also there  14:09
dansmith: xfsprogs  14:09
dansmith: although I dunno why  14:09
dansmith: anyway, let's see if this works  14:09
sean-k-mooney: ah ok, we don't need it for iscsi because we host mount  14:09
sean-k-mooney: for multipath  14:09
dansmith: ah, iscsi is in block-extra as well?  14:09
sean-k-mooney: yes, but only if you use qemu to directly connect to the iscsi backend, which we no longer do. i don't think we use glusterfs either  14:10
dansmithright14:10
dansmithsean-k-mooney: eharney okay it installed the qemu-block-extra package successfully, so that's good.. we'll see if it can actually boot instances once it gets there :)14:35
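For context, the fix boils down to pulling in qemu-block-extra, which Jammy only marks as a "Recommends" of qemu-system-x86; dansmith's patch above adds the package to the plugin's files/debs list. The same effect done explicitly with devstack helpers would look roughly like this (a sketch, not the actual patch contents):

    # sketch: make sure qemu can open rbd volumes on Ubuntu, where the
    # rbd block driver (block-rbd.so) ships in qemu-block-extra rather
    # than being compiled into qemu-system-x86
    if is_ubuntu; then
        install_package qemu-block-extra
    fi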
dansmithlooks like it might be booting servers ...14:49
*** iurygregory_ is now known as iurygregory15:00
bauzasreminder : nova meeting in 35 mins15:25
bauzassean-k-mooney: dansmith: I'm about to communicate back to the ML saying that the gate is unblocked, amirite ?15:34
dansmithno15:34
sean-k-mooneyno15:34
sean-k-mooney: none of the patches are merged  15:34
dansmithwe have a ways to go before that15:34
dansmithI have to squash my fix into the main one, since it can't merge on its own15:34
dansmithand we'll have to recheck the main one anyway since it failed more volume attach things15:34
dansmithso we still got a bit15:35
bauzasbecause other libs are capping >=3.9 ?15:35
* bauzas is confused15:35
bauzasI thought neutron did the revert15:35
dansmithbauzas: so many more problems than that dude :)15:36
dansmithnow pysaml2 has gone >3.8 which is breaking everyone15:36
bauzasoh15:36
dansmiththat revert may land and we might be okay, but there might be others too15:36
bauzasyeah, I got the memo but I didn't read it correctly15:36
bauzasokay, then I'll write some status email then15:37
sean-k-mooney: i think we are just going to move everything to jammy and drop the 3.8 testing  15:37
bauzas: saying we can't guarantee that any of the deps are still able to use py3.8  15:37
sean-k-mooney: assuming the ceph job eventually passes without the detach error  15:37
bauzas: sean-k-mooney: that was my thought  15:37
bauzas: https://review.opendev.org/c/openstack/nova/+/881409 should help  15:38
sean-k-mooney: https://review.opendev.org/q/topic:drop-py38 is the full set of fixes  15:39
sean-k-mooney: we basically need to revert the py39 change out of https://review.opendev.org/c/openstack/nova/+/881339 and incorporate the ceph job fixes instead  15:40
sean-k-mooney: so https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/881479 and https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315 need to be squashed  15:41
sean-k-mooney: and https://review.opendev.org/c/openstack/nova/+/881339 needs to depend on the squashed commit  15:41
sean-k-mooney: dansmith: those were the two patches you planned to squash, right?  15:42
bauzassean-k-mooney: thanks, I'll explain it15:42
dansmithyes15:42
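For anyone following along, the cross-repo gating sean-k-mooney describes is expressed with a Depends-On footer in the nova commit message once the two devstack-plugin-ceph changes are squashed into 865315; a sketch of what the nova change would carry:

    Drop py38 based zuul jobs

    (commit message body elided)

    Depends-On: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315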
dansmithI'm waiting for the run to finish so I can look at the failures15:42
dansmith: unfortunately it looks like maybe we OOMed or something, because now we're failing to clean up neutron ports and various other things  15:43
dansmith: hopefully the new ceph doesn't, like, use a lot more memory or something :/  15:43
sean-k-mooney: we might need to bump the swap or enable the mariadb thing  15:43
sean-k-mooney: that's still not on by default in devstack, is it?  15:43
dansmith: it is for the ceph job in nova, but yeah maybe not here, not sure  15:45
dansmith: we'll see when it finishes  15:45
sean-k-mooney: the default is still false: https://github.com/openstack/devstack/blob/4dfb67a831686279acd66f65e51beba42f675c91/stackrc#L207  15:45
dansmith: right  15:46
sean-k-mooney: although it is enabled by default in zuul for multinode jobs: https://github.com/openstack/devstack/blob/2e607b0cbd91d9243c3e9424a500598c72ae34ad/.zuul.yaml#L701  15:46
sean-k-mooney: but i'm assuming this is single-node  15:46
sean-k-mooney: anyway, we have a few levers we can pull when we know why it failed  15:46
opendevreview: Elod Illes proposed openstack/nova master: Drop py38 based zuul jobs  https://review.opendev.org/c/openstack/nova/+/881339  15:55
opendevreview: Elod Illes proposed openstack/nova master: Drop py38 support from setup.cfg and tox.ini  https://review.opendev.org/c/openstack/nova/+/881365  15:56
bauzas#startmeeting nova16:00
opendevmeetMeeting started Tue Apr 25 16:00:02 2023 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.16:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:00
opendevmeetThe meeting name has been set to 'nova'16:00
bauzashey everyone16:00
elodilleso/16:00
dansmitho/16:00
bauzaswe are quite busy today so let's try to be quick16:00
ykarelo/16:00
auniyalo/16:00
bauzas#link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting16:00
bauzas#topic Bugs (stuck/critical) 16:00
bauzas#info No Critical bug16:00
bauzas#link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 17 new untriaged bugs (+4 since the last meeting)16:00
bauzas#info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster16:01
bauzashonestly, I think I forgot to tell melwitt to look at the bugs, so it's on me :)16:01
bauzasthe next person in the roster is artom16:01
artomohhai16:01
bauzasartom: fancy triaging some upstream bugs if you like ?16:01
artomSure16:02
bauzasthanks a lot16:02
artomCLOSED RUSTWILLFIXIT16:02
bauzasI'll also try to cherry-pick some16:02
bauzasartom: well, we're on Launchpad16:02
bauzasso it'd be 'Closed' only with a comment saying you'd think Rust would work16:03
bauzasor even 'Wontfix'16:03
bauzasanyway16:03
bauzas#info bug baton is being passed to artom16:03
bauzas#topic Gate status 16:03
bauzasgrab your popcorns16:03
bauzas#link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:03
bauzas#link https://etherpad.opendev.org/p/nova-ci-failures16:03
bauzasbut alas,16:03
bauzas#link https://lists.openstack.org/pipermail/openstack-discuss/2023-April/033454.html Gate is blocked16:04
bauzaslast status : https://lists.openstack.org/pipermail/openstack-discuss/2023-April/033468.html16:04
bauzasdansmith: sean-k-mooney: elodilles: wanting to add anything else on that ?16:05
dansmithprobably not16:05
sean-k-mooneywe are mainly waiting on ci16:05
sean-k-mooney: so no, i think that's fine  16:05
elodilles: i've just updated the nova patches according to the discussions (if i did not miss anything)  16:05
bauzas: we have two concurrent patches afaics: https://review.opendev.org/c/openstack/nova/+/881409 and https://review.opendev.org/c/openstack/nova/+/881339  16:05
elodillesif those will be the chosen ones :)16:05
bauzaswe may want to only use one :)16:06
dansmithyeah16:06
sean-k-mooney: elodilles: you're missing the depends-on against the devstack patch, but those need to be squashed first  16:06
bauzasanyway, we'll sort that out of the meeting16:06
sean-k-mooneyso its fine for now16:06
elodilles++16:07
bauzas: yup, and thanks to dansmith and sean-k-mooney for working hard on the ceph patches  16:07
sean-k-mooney: mainly dansmith  16:08
bauzas: -EOLDGREEK to me  16:08
sean-k-mooney: i have been busy with other things, other than checking back every now and then  16:08
bauzas: (I mean, I know what ceph does and all the things, but that is a different story)  16:08
sean-k-mooney: anyway, we can move on  16:08
bauzascool16:08
bauzashttps://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly16:08
bauzasunsurprisingly they are all green16:09
bauzas: but by their runs we didn't have the py39 updates :)  16:09
sean-k-mooneygood timing i guess16:09
bauzaswell, tooz upgraded to 4.0 the week before16:10
opendevreview: Merged openstack/nova stable/xena: Fix rescue volume-based instance  https://review.opendev.org/c/openstack/nova/+/875343  16:10
bauzasso that's fortunate we haven't merged the u-c patch during the weekend16:10
bauzasanyway, moving on16:10
bauzas#info Please look at the gate failures and file a bug report with the gate-failure tag.16:11
bauzas#info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures16:11
bauzas#topic Release Planning 16:11
bauzas#link https://releases.openstack.org/bobcat/schedule.html16:11
bauzas#info Nova deadlines are set in the above schedule16:11
bauzas#info Bobcat-1 is in 2 weeks16:11
bauzas(we'll have a stable branch review day on Bobcat-1 but I'll explain this the next week)16:11
bauzas#topic Review priorities 16:12
bauzas#link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)16:12
bauzas#info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review16:12
bauzas#topic Stable Branches 16:12
bauzaselodilles: take the mic16:12
elodillesyepp16:12
elodillesbeyond the usual stuff,16:12
elodilles(unblocked gate & many rechecks)16:13
elodillesauniyal prepared release patches for yoga and zed16:13
elodillesthey haven't merged yet16:13
elodillesand meanwhile 1-1 patches merged to stable/yoga and zed16:13
auniyalyes, got 1 +2 yet :), thanks for review elodilles16:14
elodillesotherwise they are good as they are16:14
elodillesauniyal: thanks for proposing the patches :)16:14
bauzascool16:14
bauzasanything else ?16:15
elodillesnope, i think that was all16:16
elodillesfrom my side16:16
elodillessorry :)16:16
bauzascool, moving on16:17
bauzas#topic Open discussion 16:17
bauzas(ykarel) Allow to add tb-cache size libvirt option for qemu, context https://bugs.launchpad.net/nova/+bug/1949606 16:17
bauzasykarel: go for it16:17
ykarelhi16:19
ykarel: so qemu 5.0.0 (included in ubuntu jammy) updated the default tb-cache size to 1 GiB (from 32 MiB) for system-emulated guest vms, and with that each guest VM uses much more system memory (1 GB+ per guest), resulting in oom-kill issues when creating multiple guest vms concurrently in neutron scenario jobs using ubuntu guest vms  16:19
ykarel: libvirt 8.0.0 added an option to configure it per guest vm  16:20
ykarel: Currently testing WIP nova patch https://review.opendev.org/c/openstack/nova/+/868419 in https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/881391  16:20
ykarelshould this be a specless RFE or can continue as a bug?16:20
sean-k-mooney: ykarel: did you file the specless blueprint i asked for  16:20
sean-k-mooney: ah  16:20
sean-k-mooney: its not a bug  16:20
ykarel: sean-k-mooney, i just added it to the meeting agenda to get clarity  16:21
ykarel: but if it's the way to go then i will file it  16:21
sean-k-mooney: we might consider it to be a small enough workaround that we might want to backport it upstream  16:21
sean-k-mooney: to enable ci testing  16:21
ykarel: hmm, mainly it would be needed for releases running on jammy  16:22
sean-k-mooney: the issue is that i dont think 22.04 has libvirt 8.0.0  16:22
ykarel: it has  16:22
ykarel: that's what we're testing  16:22
sean-k-mooneyah ok16:22
bauzaswe damn need update of https://docs.openstack.org/nova/latest/reference/libvirt-distro-support-matrix.html16:23
sean-k-mooneyyes16:23
sean-k-mooney: so your patch is missing a min libvirt version check  16:23
bauzaswe're still having libvirt 6.0.0 as a min, right?16:23
sean-k-mooneyso that will need to be added16:23
ykarelfrom job logs libvirt-daemon                        8.0.0-1ubuntu7.416:23
sean-k-mooneybauzas: yes and im hoping to bump that to 7.0.0 this release16:23
ykarelsean-k-mooney, yes will include that in next update16:23
sean-k-mooney: but we will still need to check for 8.0.0 support  16:24
sean-k-mooneyack16:24
bauzassean-k-mooney: but we would still need to only set it if libvirt >=816:24
bauzasdo we want to schedule on it, or is it just a performance fix ?16:24
sean-k-mooney: ya, either only set it, or preferably check this in init-host  16:24
sean-k-mooney: and raise an error if you set the config option on an old libvirt  16:24
bauzas: will there be any config knob ?  16:24
sean-k-mooney: yes, there should be  16:25
ykarelyes that's second question16:25
sean-k-mooneyto set the cache size16:25
ykarelDefault setting for this new option, unconfigured or set to defaults like 32MB or 128 MB etc?16:25
bauzas: ykarel: if that's a config option, that indeed needs to be a hard fail if the operator sets this config value  16:26
bauzas: and libvirt isn't recent enough  16:26
sean-k-mooney: it's not something we should hardcode  16:26
ykarel: bauzas, sure, will take care of that in the patch  16:26
bauzas: for that reason, I'm not happy with a default value except none  16:26
sean-k-mooney: so it should be a config option; i'm ok with a low default or leaving it unset  16:26
bauzas: I'd prefer it unset for upgrade reasons  16:27
ykarel: ok, thanks, will keep it like that: no default and let the user configure it  16:27
sean-k-mooney: bauzas: to your point, it would be nice to have a trait for this, but not required  16:27
bauzasykarel: s/user/operator but I think I get your point16:27
ykarelso will propose the blueprint and update the patch with as per all the suggestions, Thanks16:27
sean-k-mooneyin this case really it will be the zuul job16:27
ykarelyes16:28
sean-k-mooney: this is of interest to people using qemu for emulation  16:28
bauzassean-k-mooney: man, we could recommend the operators to provide custom traits for this, exactly like vgpu types16:28
bauzasI mean, eventually all the computes will support that, right?16:28
sean-k-mooneyyes16:28
bauzasafter a couple of releases, once we cap libvirt to >=816:28
sean-k-mooney: so it's not going to break live migration  16:29
bauzas: so I'm not a big fan of adding some scheduling thing for something that will eventually be supported mid-term  16:29
sean-k-mooney: because we do not allow live migration from a newer to an older microversion  16:29
sean-k-mooney: and for cold migration we will regenerate the xml on the appropriate host  16:29
sean-k-mooney: based on what it has available  16:29
bauzas: s/microversion/libvirt version but yeah  16:29
sean-k-mooney: so i don't think we need anything special here  16:29
sean-k-mooney: so ya, a custom trait would be fine with me  16:30
sean-k-mooney: they can use provider.yaml to set that if they want  16:30
bauzas: cool, so that only seems a config knob to add, a check on init_host to fail if set, and some magic in the driver to enable it  16:30
bauzas: amirite ?  16:30
sean-k-mooney: more or less  16:30
bauzas: then, I'm OK for specless  16:30
sean-k-mooney: + docs, tests, etc., but it's effectively self-contained in the libvirt driver + the config tweak  16:30
bauzaswe had precedents16:30
bauzas+ a relnote obviously16:31
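As a side note, the provider.yaml route sean-k-mooney mentioned would look roughly like this (a sketch of nova's provider config format; the trait name is invented for illustration):

    # /etc/nova/provider_config/tb-cache.yaml (sketch)
    meta:
      schema_version: '1.0'
    providers:
      - identification:
          name: $COMPUTE_NODE
        traits:
          additional:
            - CUSTOM_SMALL_TB_CACHE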
bauzasykarel: do you agree with the direction ?16:31
ykarelbauzas, yes16:31
sean-k-mooneyim ok with specless too assuming all of the above are done16:31
bauzasanyone disagreeing ?16:31
bauzaslooks not16:32
ykarelThanks folks \o/16:32
sean-k-mooney: the only other thing i would suggest is, once this is done, a devstack patch should be added to set this by default to, say, 32mb if qemu is used instead of kvm  16:32
sean-k-mooney: that's out of scope for this meeting and could be done on a per-job basis too  16:33
bauzas#agreed enabling tb cache seems a specless blueprint, provided it only adds a config knob defaulting to unset, init_host failing on an older libvirt and just libvirt config tweak16:33
bauzas#action ykarel to ping bauzas once the blueprint is created so that he can approve it16:34
bauzasykarel: and yeah, the scope of this feature can include devstack change and testing, for sure16:34
bauzaslike we could enable it in nova-next16:34
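To make the agreed shape concrete, a sketch of the knob and of the guest XML that libvirt >= 8.0.0 accepts (the option name follows the WIP review above and could still change; per the agreement, nova would fail in init_host if it is set on an older libvirt):

    # nova.conf on the compute node (sketch)
    [libvirt]
    virt_type = qemu
    # size in MiB; left unset by default so qemu keeps its own default
    tb_cache_size = 128

    <!-- resulting per-guest domain XML fragment (libvirt >= 8.0.0, qemu/TCG) -->
    <features>
      <tcg>
        <tb-cache unit='MiB'>128</tb-cache>
      </tcg>
    </features>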
sean-k-mooney: bauzas: it will reduce the memory pressure in all our jobs, so once we know it does not have a negative impact we will probably want to have it enabled in all of them  16:35
bauzas: anything else to add on this item ?  16:35
sean-k-mooney: but we can take it slow, like the mariadb reduced memory  16:35
bauzassean-k-mooney: oh yeah, but I'd be in favor of testing it first16:36
bauzasyah16:36
bauzasok, fwiw, I have another item16:36
bauzas(bauzas) Can https://blueprints.launchpad.net/nova/+spec/cold-migrate-to-host-policy be specless ?16:36
bauzastl;dr: we discussed this at the PTG16:37
sean-k-mooneyassuming there is no change in the default policy then yes i think so16:37
bauzasoperators want a better granularity and maybe change the cold-migrate action to be admin_or_owner16:37
bauzasbut, here, we're just adding a new policy which is admin-only16:37
bauzasso no API change, and no policy changes16:37
bauzasit will just go check a separate policy if host is set16:38
bauzas(literally a one-liner patch besides the policy file)16:38
bauzasany objections to have it specless ?16:38
sean-k-mooney: so basically there will be two policies now for cold migration, one for migration with a host and one without, but admin only by default  16:38
sean-k-mooneyand then operators can choose16:39
sean-k-mooney+116:39
bauzascorrect, like we have for os_compute_api:servers:create:forced_host16:39
bauzasexcept I won't change the defaut rule for os_compute_api:os-migrate-server:migrate16:40
bauzasthere will be os_compute_api:os-migrate-server:migrate and os_compute_api:os-migrate-server:migrate:host16:40
bauzasboth being admin-only16:40
bauzas(and operators can decide to open os_compute_api:os-migrate-server:migrate to endusers)16:40
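A sketch of the resulting default policy (the second rule name mirrors the forced_host convention mentioned above; the exact string is up to review):

    # policy.yaml sketch -- both rules default to admin-only; operators may relax
    # the first one so project users can cold-migrate without picking a host
    "os_compute_api:os-migrate-server:migrate": "rule:context_is_admin"
    "os_compute_api:os-migrate-server:migrate:host": "rule:context_is_admin"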
bauzasso I reiterate, any objection to have it specless ?16:41
bauzaslooks not16:41
bauzasif so, 16:41
sean-k-mooney: as long as there is at least a blueprint i'm happy. i dislike changing policy without any tracker, so that works for me  16:42
bauzas#agreed https://blueprints.launchpad.net/nova/+spec/cold-migrate-to-host-policy accepted as a specless feature for Bobcat16:42
bauzassean-k-mooney: there is a blueprint, and there will be a relnote16:42
bauzasand there will be functional tests covering this16:42
sean-k-mooneyyep all good. 16:42
bauzasI don't think we need a tempest change, do you think it's a nice to have ?16:43
sean-k-mooney: i don't think tempest should test non-default policy  16:43
bauzasyeah, that was my question16:43
bauzasI'm not a QA expert16:43
bauzastempest is branchless16:43
bauzasso that would be a bit hard to test it with tempest16:44
sean-k-mooney: i suspect you could reuse some of the existing tests with the right config if you needed to  16:44
bauzasanyway, I think we're done on this16:44
bauzasabout tempest, we could discuss this on the review time16:44
bauzasthanks folks16:44
opendevreview: Artom Lifshitz proposed openstack/nova master: Reproduce bug 1995153  https://review.opendev.org/c/openstack/nova/+/862967  16:44
opendevreview: Artom Lifshitz proposed openstack/nova master: Save cell socket correctly when updating host NUMA topology  https://review.opendev.org/c/openstack/nova/+/862964  16:44
bauzasany other item to add before we end the meeting ?16:45
auniyalsmall thing o/16:45
auniyal: CI on yoga: this one keeps failing for different reasons, mostly  16:45
auniyal: https://review.opendev.org/c/openstack/nova/+/839922  16:45
bauzasshot16:45
auniyalmostly volume tests16:45
sean-k-mooney: ya, that's kind of a pain; i'm not sure there is anything we can do beyond recheck  16:46
sean-k-mooneyis it the volume detach tests16:46
sean-k-mooneyauniyal: gibi found some tests that are not waiting properly16:46
bauzasI think this is also tracked on the stable CI failures etherpad16:46
auniyalyes, attach and detach , but they are always different16:46
auniyal: sometimes timeout  16:47
sean-k-mooneyyoga is not EM right so its still using tempest master?16:47
auniyalno 16:47
auniyaltbc no, its not EM16:47
sean-k-mooneyok16:47
sean-k-mooneyso it still can get tempest fixes if we fix those tests16:47
bauzasyup16:48
bauzasare we done ?16:48
auniyalsorry I didn't get, any action on above16:48
auniyalwe need to fix tempest tests ?16:49
sean-k-mooney: i think just continue to recheck it. gibi found at least one test that is not waiting for sshable  16:49
bauzas: no, we have some tempest patches up  16:49
sean-k-mooney: and noticed others don't appear to be waiting, but i don't have the context  16:49
bauzasand yoga would benefit from those16:50
bauzassince tempest is branchless16:50
sean-k-mooneyoh do you have a link?16:50
auniyalack thanks16:50
bauzasI was referring to gibi's recent discoveries of testing gap for ssh wait16:51
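For context, the gap gibi found is tests exercising volume attach/detach before the guest is actually up; the fix pattern on the tempest side looks roughly like this (a sketch, assuming the recently added wait_until='SSHABLE' support; validation credentials/resources elided):

    # tempest test sketch: create the server validatable and wait until it is
    # reachable over SSH before attaching/detaching volumes, so the guest OS
    # can actually acknowledge the device add/remove
    server = self.create_test_server(
        validatable=True,
        wait_until='SSHABLE')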
bauzas(sorry was looking at the -tc meeting)16:52
bauzas-tc chan*16:52
bauzascan we close this meeting now ?16:52
sean-k-mooneyits fine we can wrap this here and chat after16:52
bauzascool16:53
bauzasthanks all16:53
bauzas#endmeeting16:53
opendevmeetMeeting ended Tue Apr 25 16:53:12 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:53
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.html16:53
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.txt16:53
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.log.html16:53
sean-k-mooney: https://review.opendev.org/c/openstack/tempest/+/880891  16:53
sean-k-mooney: seems to be related  16:53
elodillesthanks o/16:53
sean-k-mooneyso that test is wrong17:01
sean-k-mooney: we do not support attaching or detaching ports or volumes from neutron or cinder  17:01
fricklerso we run this test for 6 years, have issues with it time and again, and only now notice that it tests an unsupported scenario? cool17:02
sean-k-mooneyfrickler: it has never been supported17:03
sean-k-mooneyi jsut notice this existed because gmann  has a dnm patch up17:03
gmannwhich one ?17:04
frickleryes, saw the comment in the patch. also didn't want to blame anyone, just enjoying the wondrous world of openstack once again17:04
sean-k-mooney: https://github.com/openstack/tempest/blob/master/tempest/api/volume/test_volumes_actions.py#L39-L55  17:04
frickler: https://review.opendev.org/c/openstack/tempest/+/881132/3  17:04
sean-k-mooney: it kind of depends on what self.volumes_client.attach_volume actually does  17:05
fricklerah, no, the one below17:05
sean-k-mooneyif its calling nova its fine17:05
sean-k-mooneyif its using the cinder attachments api directly its not17:05
sean-k-mooney: that looks like it's calling cinder: https://github.com/openstack/tempest/blob/20e460dacfae6b4546903a9caaf9253330f27b5a/tempest/clients.py#L286  17:07
sean-k-mooney: frickler: this was actually added 11 years ago: https://github.com/openstack/tempest/commit/a42fe441703084449107fabb15fe42938c02ba08  17:10
sean-k-mooney: that does not mean it has been correct or supported for all that time  17:10
fricklerah, I was only looking at the current blame, which says 201717:10
frickleryou can see the actual API calls in https://4ae644854fb3bf106e9b-6877b85dbe482cd2daa62a6731b06023.ssl.cf1.rackcdn.com/881132/3/check/tempest-full-py3/37d3ce7/controller/logs/tempest_log.txt17:10
gmann: frickler: sean-k-mooney: ok, that one. those tests are meant for the cinder standalone case, and they are not a valid scenario when they involve nova halfway  17:12
sean-k-mooney: right, they are fine if you are using cinder standalone  17:12
fricklerPOST https://213.32.75.38/compute/v2.1/servers/2a24008b-6c93-4b83-a678-8d5b0be7b6a1/os-volume_attachments17:12
gmann: nobody ever tested whether passing a nova server id in an attachment via cinder works from the nova perspective or not  17:12
fricklerthat looks like nova being used17:12
gmannI was testing those to remove nova involvement from those tests and nova+cinder attachment anyways are tested in many other tests17:13
sean-k-mooney: yep, nova should be removed from them  17:13
gmann: frickler: the attachment is directly to cinder, not via nova, so nova does not know about the attachment, but cinder thinks the server is attached to the volume so it marks it in-use  17:13
sean-k-mooney: marking it in use is correct  17:14
sean-k-mooney: but we should not see the volume attached to the vm  17:14
gmann: yeah, i mean as nova does not know about the attachment, adding server_id as a valid attachment is not correct.  17:14
gmann: that server_id can be invalid or can be deleted anytime  17:14
gmann: without cinder knowing  17:14
gmannsean-k-mooney: yeah, VM does not know about volume17:15
sean-k-mooney: anyway, i'm glad you are looking at it; you can ping me after the patch is out of DNM if you want me to review  17:15
gmannk17:16
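To spell the distinction out: the test drives the attachment purely through cinder's volume-actions API, so nova never learns about it, whereas the supported path for a running server goes through nova, which then manages the cinder attachment itself. Roughly (a sketch; request bodies abbreviated):

    # cinder-standalone style (what the tempest test does today) -- nova never knows:
    POST /volume/v3/{project_id}/volumes/{volume_id}/action
    {"os-attach": {"instance_uuid": "<server uuid>", "mountpoint": "/dev/vdb"}}

    # supported path when a nova server is involved:
    POST /compute/v2.1/servers/{server_id}/os-volume_attachments
    {"volumeAttachment": {"volumeId": "<volume uuid>"}}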
sean-k-mooney: dansmith: you pinged me yesterday to look at a patch, maybe related to the stable uuid stuff  17:17
sean-k-mooney: do you remember what it was  17:17
dansmith: sean-k-mooney: the rt stuff, but it's all blocked of course.. gibi gave it the +W so it's probably good for me to just fast-approve once the gate is unblocked  17:17
sean-k-mooney: oh right, ya, that was it  17:17
sean-k-mooney: i remember seeing it had a +w  17:18
sean-k-mooneycool17:18
dansmithyep, thanks17:18
dansmithI'll definitely hit you up if I need a re-review once things get unblocked17:18
dansmith*if* they get unblocked I should say :)17:18
fricklerthe pysaml revert mergen, so I think CI should be unblocked17:19
fricklermerged even17:19
dansmithwe'll see :)17:19
sean-k-mooneyim going to go get dinner. i might be around later but im mostly done for today17:20
dansmithmmm, ceph job appears to be failing again in a similar way.. hope we don't have more work to do17:50
bauzasdansmith: which patch are you checking for the job runs ? 18:31
bauzasso I can try to look over it tomorrow morning18:31
dansmithhttps://review.opendev.org/c/openstack/devstack-plugin-ceph/+/86531518:32
bauzasack, will target it tomorrow morning18:32
dansmithit only failed six tests this time instead of a timeout, so maybe it's better than I thought18:32
dansmithbut six is still a lot, and I haven't gone through the latest logs yet18:32
bauzasI can try to dig into those later18:33
dansmiththe fails look all volume detach related18:44
dansmithso perhaps it's not really a ceph problem18:44
dansmithbut it seems like a large number for a single run, so I'm not sure18:44
dansmithmelwitt: can you +W this? https://review.opendev.org/c/openstack/nova/+/881409/220:31
dansmitheharney: gouthamr: Well, it installs and "works" on jammy, but something isn't happy: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315?tab=change-view-tab-header-zuul-results-summary20:38
dansmitheharney: gouthamr I don't really know what I'm looking at, but I don't see any errors in the cinder or ceph stuff that I recognize, and just some "busy" messages from rbd around the failed detaches in the n-cpu log20:41
gouthamrhey dansmith - /me is late to the party20:42
dansmithgouthamr: we need to drop focal from the jobs and our gate has been blocked for two days because some did it early..we've reverted those things for the moment, but we need to get the ceph job working on jammy20:43
dansmithgouthamr: the above patch unpins the jobs to let them run on jammy and they get pretty far, but some volume/ceph related failures are showing that something is not happy20:43
dansmithgouthamr: are you the right person to get that working?20:44
melwittdansmith: just to confirm, you still want to remove after neutron has reverted? https://review.opendev.org/c/openstack/neutron/+/88143020:44
dansmithmelwitt: yeah the neutron failure was a couple failures ago, and not even the only problem.. but as noted in the original patch, it was intended to only live for antelope and then be reverted, so we need to do it anyway20:45
gouthamrdansmith: probably not; i'm not an expert on rbd or cinder... 20:45
dansmithgouthamr: oh.. who is then?20:45
gouthamreharney is my go to guy, probably jbernard 20:46
dansmithgouthamr: okay he said earlier today that he was not likely the guy to ask (unless I misunderstood)20:46
gouthamrah :) let me look at the logs and see if something pops out 20:47
gouthamrwe've been burnt before by using distro packages for ceph because fixes took forever to land - so we shied away from them and looked upstream.. 20:48
gouthamrbut, like you've discussed, the ceph community hasn't built jammy packages for the latest release (quincy) - they meant to, they lost people/mindshare in the recent months 20:49
dansmithgouthamr: yeah, but last I checked, there were not packages from ceph themselves for jammy20:49
dansmithgouthamr: and the cephadm job is even more broken and marked n-v so I assume it's not healthier20:49
gouthamron the manila jobs, we pivoted to use centos-stream-9 because ceph folks continue to publish packages there 20:49
gouthamrs/there/for it20:49
gouthamrdansmith: yep; on that change, i see there's a problem with podman...20:50
dansmithgouthamr: yeah, but stream breaks us constantly20:50
gouthamroh20:50
dansmiththere's very little chance we're going to be able to have this job run on stream :)20:50
dansmithgouthamr: yeah I see the podman thing, but the job is also marked non-voting20:51
dansmithso I assume it doesn't have a long track record of stability :)20:51
gouthamryes; i don't think we've run the job long enough to test for stability: https://zuul.opendev.org/t/openstack/builds?job_name=devstack-plugin-ceph-cephfs-nfs20:52
dansmithpresumably the cephadm approach means we could run on jammy but with upstream ceph fixes20:52
gouthamryep20:52
dansmithgouthamr: that's the cephfs job, but I assume that's not what nova needs20:52
dansmithI'm looking at devstack-plugin-ceph-tempest-cephadm20:52
gouthamryes; just pointing out that job because it uses centos-9-stream20:53
dansmithah okay20:53
gouthamrack; 20:53
gouthamr"devstack-plugin-ceph-tempest-cephadm" on focal-fossa used a third party repo to get podman20:53
dansmithah, and podman is in jammy itself I think right?20:54
gouthamrby the looks of it, yes20:54
dansmithalthough it seems broken :)20:54
dansmithanyway, I thought there was also some concern that the cephadm job didn't expose the ceph config that nova needed or something like that, but I heard that like 20th hand20:55
gouthamrshouldn't be the case 20:55
dansmithokay20:55
gouthamri think we tested some of this without tempest in the picture - but that job ("devstack-plugin-ceph-tempest-cephadm") has never passed; we assumed someone working on nova/cinder/glance would help looking at it at some point20:56
gouthamrsorry this feels disjointed - conversations happened on irc and gerrit, ptg and the ML iirc.. but its time to reprise this because it's urgent.. 20:57
gouthamrthat's a tangent though, let me see if i can spot an issue with the package based job you're looking to fix20:57
dansmithugh, never passed? that's no good.. I wonder if for the same reason the non-cephadm job is failing/20:58
dansmithI can try to get podman working on this to see if it's otherwise the same20:58
gouthamr++20:58
dansmithgouthamr: this is definitely disjointed, and I feel like I'm just flailing because nobody else is :/20:58
gouthamryou're doing godly work :D 20:59
dansmith$deitly work you mean :)20:59
dansmither, $deityly .. or soething21:00
gouthamr:P21:00
dansmithokay I just pushed something that might get podman working based on that error message, so we'll see21:00
dansmithmelwitt: thanks for the +W.. I would have just removed that neutron reference and changed to "effing everything" but didn't want to have to make another trip through the jobs, as you can probably imagine :)21:01
dansmithgouthamr: the cephadm that has never passed.. is that always on quincy (i.e. newer than what we were running in focal) or what?21:06
dansmithI21:06
melwittdansmith: understandable :)21:06
gouthamrdansmith: 5/6 failures are on volume detach timeouts; and the request never got to cinder afaict .... https://zuul.opendev.org/t/openstack/build/9ebda7c1ebf843209e57ef0eac13814f/log/controller/logs/screen-n-cpu.txt#61276-6132921:07
dansmithgouthamr: yeah but you see the rbd busy messages right?21:07
dansmithI commented on an earlier patch21:08
gouthamrah; no i missed those21:08
dansmithI'm not sure the cinder detach would have happened by this point by the way, because we haven't gotten the guest to let go yet21:08
gouthamryes21:09
gouthamrmight be my browser, but i don't see "rbd.ImageBusy: [errno 16] RBD image is busy (error removing image)" in the latest n-cpu logs21:10
gouthamrshould i be looking elsewhere?21:10
dansmithyeah actually I don't think I see them in the latest either.. but that was just a recheck21:11
dansmithalmost identical set of failures though21:11
dansmithso yeah.. weird21:11
dansmithgouthamr: here's an example from the previous run: https://zuul.opendev.org/t/openstack/build/f2ecbdd78616419cb5c8c2b3f4a8b71a/log/controller/logs/screen-n-cpu.txt#5543921:13
dansmiththat's all over those logs and absent from the latest.. bizarre21:15
dansmithgouthamr: cephadm job made it past cephadm install phase, so.. progress I think21:17
gouthamrvery nice21:17
dansmithand finished pool setup (ignorant guess from the commands it ran)21:20
gouthamrhttps://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315/11/devstack/files/debs/devstack-plugin-ceph breaks focal fossa jobs (cephfs-native) though - but, we can fix that up with some os version annotation, correct? 21:22
dansmithwe can fix it by putting it in the code instead of those package lists21:23
dansmithI just jammed it in there because it was hard to mess up and will make sure we get it installed from the distro21:23
gouthamrack; either that or i can just fix the native cephfs job to use jammy too 21:24
dansmiththe current install_podman thing does checks for focal, so I'd just extend that21:24
dansmithgouthamr: yeah, although the ceph jobs on stable have to use the plugin without branches don't they?21:24
gouthamrno this repo is branched21:24
dansmithah okay21:24
dansmith: well, either way.. if we move this to only >=jammy for everything (along with the PTI for 2023.2) then we can just use this debs list thing and remove the focal-specific install stuff.. whatever you want  21:25
dansmithI just want it to work :)21:25
gouthamragree; lets see this work21:25
gouthamrif the "rbd remove" thing fails again on this run with the error, i would suggest reporting a bug - we could _try_ this thing on centos-9-stream and see whether there's some weirdness in the distro packages21:27
gouthamrbut, i am nervous about that job's future with ubuntu since we've learned about the ceph community's stance 21:27
dansmithack, well, based on how this is working, I'm hoping the cephadm will either "work" or "fail the same way" and then we can discount distro packages21:28
gouthamr++21:28
dansmithit's running tempest now and looking identical to the distro version so far (i.e. hasn't failed but hasn't run volume tests yet)21:28
dansmithso that's majorly better. failing with upstream bits is a minor win over failing with distro bits :P21:28
eharneyi have some work in flight currently around rbd ImageBusy errors... i wonder if this job is running some tests that were previously disabled?21:29
dansmitheharney: shouldn't be, and one run failed with a bunch of them and then a single recheck failed similarly, but with no rbdbusy errors21:29
gouthamreharney: interesting, are there librbd changes that you're having to work around? 21:31
eharneygouthamr: not new ones, just working on fixing the class of errors around rbd images that can't be deleted that we've always had21:32
gouthamreharney: oh.. this error seems to have occurred multiple times in the libvirt rbd "remove image" call..21:34
gouthamr i'm hoping no openstack code needs to change; we're hoping to support quincy with stable/wallaby downstream :D based on some testing of this stuff elsewhere21:36
dansmithgouthamr: yeah I was going to ask earlier.. are we using quincy downstream such that we know it works? this set of failures has me concerned that it's something fundamental of course21:36
gouthamr(same tests, different OS, full blown ceph cluster as opposed to our aio .. etc etc -- so there could be a number of things being issues)21:37
gouthamrdansmith: yes, we're not testing quincy with openstack's trunk though... we trail downstream; but this stuff is working with zed last i checked, with the same tempest tests passing there21:38
dansmithokay that's good21:38
dansmithwell, it's passing some volume tests at least21:56
gouthamr++21:56
gouthamra couple of things we wanted to try in this cephadm job in the past: 21:57
dansmithgouthamr: so if this works magically, you're okay just making this drop support for focal as long as the other jobs here are set to run on jammy?21:57
gouthamr(1) revert to default test concurrency --- the concurrency was set to 1 because we saw resource contention  21:57
gouthamrdansmith: yes21:58
dansmithresource contention like memory/21:58
gouthamryes, and disk 21:58
dansmithso there's a tweak in devstack for memory that has been helping a lot of jobs, and we run with it enabled in the nova ceph jobs21:58
gouthamroh? i'd love to know!21:58
dansmithdrops mysql usage by about half, which is ~400MiB on these jobs21:58
gouthamrnice21:58
dansmithgouthamr: problem is you have to pay me a royalty per job execution to use it21:58
dansmithgouthamr: https://github.com/openstack/nova/blob/master/.zuul.yaml#L42221:59
gouthamr:D i'd be poor if i was betting on ceph using less memory ever21:59
dansmithhaha21:59
gouthamr(2) there's also a way to turn off "cephadm" after the install -- we should set that option on the CI - there's no use to keep it running 22:00
dansmithso based on our success with that, if you're cool with it, I'd say we turn that on for these jobs anyway22:00
dansmithgouthamr: meaning before we start running tempest?22:00
gouthamryes, the plugin will do that for you after the ceph cluster deployment is done: https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/cephadm#L2822:01
dansmithoh so why is that not being set?22:02
dansmithI don't even know what that means.. cephadm is a tool I thought, but you're saying it stays running even though the services are started or something?22:02
gouthamr(here's an example from the only good cephadm job at the moment: https://github.com/openstack/manila-tempest-plugin/blob/ad0db6359c8c51c4521ac6660c8014981b2f1dea/zuul.d/manila-tempest-jobs.yaml#L413)22:02
dansmithack22:02
gouthamryes; we need it for day2 operations on the cluster22:02
gouthamrso if you're running this on your local devstack, it's useful22:03
dansmithah, sure, okay22:03
dansmithwe should set that in these jobs that everyone inherits from22:03
gouthamr++22:03
dansmithso maybe a follow-on to this to set that, the mysql thing, and make this job voting22:03
dansmithyou know, if and when it starts working :)22:04
gouthamr^^ +122:04
dansmithhmm, gouthamr I just noticed that we're still set to release=pacific for the distro-package-based job22:06
dansmithis that maybe setting stuff that is non-ideal for quincy that could be related?22:06
gouthamri think the plugin ignores it, let me check22:06
dansmithokay22:06
dansmithhttps://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315/11/.zuul.yaml#4622:06
dansmithokay just used for package repos anyway it looks like22:08
gouthamrack; we should get that opt out of the job because its confusing22:08
gouthamrhttps://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/ceph#L981-L98922:08
dansmithack, so I assume we're configuring a repo but just not installing anything from it on jammy, right?22:09
dansmithI'll add it to my follow-on patch to add the optimization devstack vars22:10
dansmithadd .. the removal of it, I mean :)22:10
gouthamrack ty22:10
gouthamri may have messed this up in my last patch22:10
gouthamrhttps://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/ceph#L1067-L109422:10
gouthamrwe're not invoking that method at all for ubuntu anymore.. 22:10
dansmithack22:11
dansmithwell, there's a bunch of focal-specific stuff to clean up in there regardless22:11
dansmithhmm, I think it's about to fail a test22:14
dansmithbeen stopped for going on five minutes, assuming retrying a detach22:15
gouthamryes.. it was a head-scratcher; think vkmc and i noticed that our override to use download.ceph.com stopped working with focal at some point since ubuntu default-enabled the ubuntu ceph repos.. so dropping it made no difference, we ended up using/testing with the distro provided packages 22:15
gouthamroh22:15
gouthamrtest_rebuild_server_with_volume_attached? 22:16
dansmithdunno yet22:16
gouthamrah22:16
dansmithbut it's in the rebuld group22:16
dansmithyep22:16
dansmithtest_rebuild_server_with_volume_attached [430.305393s] ... FAILED22:17
dansmithugh22:17
dansmithso the other variable here is the version of qemu and qemu's block-rbd driver are different in jammy of course, compared to what we've been testing22:18
dansmithso could be a bug in one of those, especially since it's related to the detach in the guest22:18
gouthamrack; another thing to try would be to bump the ceph image to the latest quincy: https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/cephadm#L32 22:19
dansmithokay22:20
gouthamrthey've published v17.2.6 today, and v17.2.5 a month ago 22:20
gouthamrhttps://quay.io/repository/ceph/ceph?tab=tags22:21
dansmithack, I hate this sort of "version minesweeper" game.. if we're that sensitive to version, it feels like we're doing something wrong22:21
gouthamragreed; but since this stuff hasn't worked before on our ci, its worth a try22:22
dansmithyeah for sure22:22
dansmithvolume resize passed22:23
dansmithmaybe we'll find that this is fewer fails or something22:23
dansmithactually, that one didn't fail before22:25
gouthamrokay, this may be good news? devstack-plugin-ceph-tempest-py3 is going to pass22:28
dansmithno, really?22:28
dansmiththird recheck's a charm?22:28
gouthamr:D22:28
dansmithmore fails on this cephadm job22:42
dansmithso I guess it's not something fundamental, but maybe just massively less stable or we're hitting some race easier?22:42
dansmithmaybe it is memory-related and we're stressed more here22:43
dansmithmaybe I should try turning on the two optimizations here to see if that makes things more stable22:43
gouthamr++22:43
gouthamr"DISABLE_CEPHADM_POST_DEPLOY: true" and "MYSQL_REDUCE_MEMORY: true" for the rescue; we can iterate after with the concurrency if these failures reduce22:45
gouthamror go away22:45
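A sketch of how that follow-up could look in the job definition (variable names as discussed above; whether they belong on the base job or a child job is up for review):

    # sketch: enable the two memory/cleanup tweaks on the ceph jobs
    - job:
        name: devstack-plugin-ceph-tempest-cephadm
        vars:
          devstack_localrc:
            MYSQL_REDUCE_MEMORY: true
            DISABLE_CEPHADM_POST_DEPLOY: true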
opendevreview: Artom Lifshitz proposed openstack/nova master: Reproduce bug 1995153  https://review.opendev.org/c/openstack/nova/+/862967  22:46
opendevreview: Artom Lifshitz proposed openstack/nova master: Save cell socket correctly when updating host NUMA topology  https://review.opendev.org/c/openstack/nova/+/862964  22:46
dansmithgouthamr: ack, will put those in here and see22:46
opendevreview: Merged openstack/nova master: Remove focal job for 2023.2  https://review.opendev.org/c/openstack/nova/+/881409  22:53
dansmithgouthamr: okay, it's off and running with those flags23:25
dansmithI'm burnt out so I'll circle back tomorrow23:25
gouthamrdansmith++ works; good evening! :) 23:27
