gmann | dansmith: elodilles has patch up to remove all py38 jobs including focal job https://review.opendev.org/c/openstack/nova/+/881339 | 00:37 |
ykarel | dansmith, only merged in neutron repo, others are still unmerged, i pushed the revert | 04:49 |
bauzas | catching up this night's conversations | 06:55 |
* bauzas is about to cry in a corner | 06:55 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Revert "Temporary skip some volume detach test in nova-lvm job" https://review.opendev.org/c/openstack/nova/+/881389 | 07:17 |
gibi | bauzas: I need to leave early today so I will miss the nova meeting | 08:49 |
bauzas | gibi: ack, no worries | 09:04 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: Reproducer for dangling volumes https://review.opendev.org/c/openstack/nova/+/881457 | 10:01 |
opendevreview | Merged openstack/nova stable/yoga: Handle InstanceInvalidState exception https://review.opendev.org/c/openstack/nova/+/872117 | 11:27 |
dansmith | gmann: I don't see that that ever passed | 13:01 |
dansmith | using the alternate python interpreter on focal seems less good to me than just fixing the ceph job | 13:02 |
dansmith | and also not necessary if people don't prohibit 3.8 | 13:02 |
sean-k-mooney | dansmith: ya i suggested it because it used to work but it obviously got broken at some point | 13:03 |
sean-k-mooney | I was hoping it would be a quick fix to unblock the gate | 13:03 |
sean-k-mooney | just making the job use 22.04 would be my preference too | 13:03 |
bauzas | yup, let's try to use Jammy | 13:08 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api) https://review.opendev.org/c/openstack/nova/+/836830 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Check shares support https://review.opendev.org/c/openstack/nova/+/850499 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add metadata for shares https://review.opendev.org/c/openstack/nova/+/850500 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach notification https://review.opendev.org/c/openstack/nova/+/850501 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach notification https://review.opendev.org/c/openstack/nova/+/851028 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add shares to InstancePayload https://review.opendev.org/c/openstack/nova/+/851029 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to attach/detach shares https://review.opendev.org/c/openstack/nova/+/852085 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working. https://review.opendev.org/c/openstack/nova/+/852086 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add virt/libvirt error test cases https://review.opendev.org/c/openstack/nova/+/852087 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/854823 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/854824 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach_error notification https://review.opendev.org/c/openstack/nova/+/860282 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach_error notification https://review.opendev.org/c/openstack/nova/+/860283 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/860284 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860285 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares https://review.opendev.org/c/openstack/nova/+/860286 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part) https://review.opendev.org/c/openstack/nova/+/860287 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860288 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Docs about Manila shares API usage https://review.opendev.org/c/openstack/nova/+/871642 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Mounting the shares as part of the initialization process https://review.opendev.org/c/openstack/nova/+/880075 | 13:11 |
opendevreview | ribaudr proposed openstack/nova master: Deletion of associated share mappings on instance deletion https://review.opendev.org/c/openstack/nova/+/881472 | 13:11 |
dansmith | bauzas: oh I didn't realize you rechecked my patch and it failed again with *another* uninstallable package | 13:16 |
dansmith | not sure if it's 3.9 related or not though | 13:17 |
bauzas | I haven't seen the logs yet | 13:17 |
dansmith | I rechecked again so we'll see | 13:17 |
ykarel | fwiw pysaml2 update is py3.9+ only and keystone installation is broken in focal or py38 | 13:18 |
dansmith | yeah that ^ | 13:19 |
dansmith | that's what I failed on | 13:19 |
dansmith | cripes | 13:19 |
ykarel | revert patch https://review.opendev.org/c/openstack/requirements/+/881466 | 13:19 |
bauzas | ykarel: last time I looked, this was due to the fact that py39 wanted UUIDs | 13:21 |
ykarel | bauzas, sorry which issue? /me can't relate to UUIDs | 13:35 |
dansmith | yeah not sure what the uuid thing is | 13:39 |
bauzas | it was when elodilles tried to update the py version for ceph-multinode | 13:41 |
bauzas | but I can be wrong | 13:41 |
dansmith | oh, unrelated to pysaml I see | 13:41 |
ykarel | ack | 13:41 |
bauzas | speaking of https://1742443592df6307c50f-d94079506d616d2cb38d2b70f09b441e.ssl.cf2.rackcdn.com/881339/3/check/nova-ceph-multistore/5f269ed/controller/logs/devstacklog.txt | 13:41 |
bauzas | 2023-04-24 11:48:22.720 | WARNING py.warnings [None req-518808ec-0a14-444d-b6d3-fb6383d03e9c None None] /usr/local/lib/python3.9/dist-packages/pycadf/identifier.py:71: UserWarning: Invalid uuid: RegionOne. To ensure interoperability, identifiers should be a valid uuid. | 13:42 |
dansmith | eharney: are you around to talk about the ceph job stuff? I think you were merging some of those cephadm changes | 13:43 |
ykarel | ack got it, it's a separate thing | 13:43 |
eharney | dansmith: yes | 13:45 |
eharney | dansmith: what's going on there? | 13:45 |
dansmith | eharney: so, is that how we should be moving the upstream jobs at this point? jammy has quincy ceph packages, as I understand it | 13:45 |
dansmith | and the cephadm jobs don't work either | 13:45 |
dansmith | and they're nonvoting so I assumed that meant they weren't "ready" | 13:47 |
dansmith | eharney: anyway, the context is, we desperately need to get the ceph jobs running on jammy | 13:47 |
eharney | i was trying to understand the situation with these the other day, i guess they were pinned to focal before because there weren't jammy packages, but maybe now there are? | 13:48 |
dansmith | eharney: yeah that's the reason for the pinning, but indeed now there seem to be base packages in jammy for quincy | 13:49 |
dansmith | I rechecked the unpin patch yesterday and it failed with qemu not having the rbd block driver | 13:49 |
dansmith | eharney: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315?tab=change-view-tab-header-zuul-results-summary | 13:49 |
eharney | hmm, looking, i'm not really up to speed on this | 13:50 |
dansmith | eharney: okay is there someone else we should bug? | 13:50 |
dansmith | eharney: because right now the ceph jobs are all totally wedged because a bunch of packages have just banned anything python<=3.8 | 13:51 |
eharney | dansmith: not sure who but i think some manila folks have been more active in devstack-plugin-ceph recently | 13:51 |
dansmith | reverts are in place or in flight for some of them, but we need to get it resolved | 13:51 |
eharney | ahh | 13:51 |
dansmith | because focal is not in our PTI right now... | 13:51 |
dansmith | eharney: so maybe gouthamr ? | 13:52 |
sean-k-mooney | dansmith: did you notice the ceph nfs job passed? https://zuul.opendev.org/t/openstack/build/3f0ec49fa9e048f6a73e6c03833fecc9/logs | 13:57 |
dansmith | sean-k-mooney: I assume that's not using the qemu block driver | 13:57 |
sean-k-mooney | perhaps although that is an odd issue | 13:57 |
dansmith | sean-k-mooney: the actual setup of ceph seemed to work on jammy, it's just that qemu doesn't have the block driver to load to talk to it (or something) | 13:57 |
sean-k-mooney | they may have changed that to a weak dep in the qemu packaging | 13:58 |
sean-k-mooney | so we might need to install it in devstack explicitly | 13:58 |
dansmith | it seemed like maybe those were provided in the ceph packages instead of qemu or something | 13:58 |
dansmith | sean-k-mooney: well, I looked and couldn't find any package in jammy for it | 13:58 |
dansmith | sean-k-mooney: but yes, hopefully it's something simple | 13:58 |
sean-k-mooney | I assumed it was compiled in, if I'm being honest, but didn't look | 13:58 |
dansmith | me too | 13:59 |
dansmith | until I saw the logs.. let me get you a link | 13:59 |
dansmith | sean-k-mooney: https://zuul.opendev.org/t/openstack/build/1435ec8b57be4b159c3d373e23673ab5/log/controller/logs/screen-n-cpu.txt#8437 | 13:59 |
dansmith | it's also weirdly named "block-block-rbd".. perhaps there's some change in jammy with something and someone is prefixing an extra... prefix? | 14:00 |
dansmith | although it says "unknown driver rbd" on the next line, which looks right | 14:00 |
sean-k-mooney | their nova package just depends on qemu-system https://packages.ubuntu.com/jammy/nova-compute-qemu | 14:00 |
bauzas | dansmith: sean-k-mooney: so, to clarify, once we merge https://review.opendev.org/c/openstack/nova/+/881409 | 14:00 |
bauzas | we will still have a problem with ceph-multistore due to keystone + cephadm right? | 14:01 |
dansmith | bauzas: yes | 14:01 |
sean-k-mooney | hum that is not linked against librbd | 14:01 |
sean-k-mooney | https://packages.ubuntu.com/jammy/qemu-system-x86 | 14:01 |
dansmith | bauzas: not "due to cephadm" | 14:01 |
sean-k-mooney | ah | 14:01 |
sean-k-mooney | https://packages.ubuntu.com/jammy/qemu-block-extra | 14:01 |
dansmith | bauzas: just the ceph job | 14:01 |
sean-k-mooney | dansmith: we are missing qemu-block-extra | 14:01 |
bauzas | because of keystone ? | 14:01 |
sean-k-mooney | that is what provides the rbd support | 14:01 |
* bauzas tries to correctly understand the problem | 14:02 |
dansmith | sean-k-mooney: ack, nice | 14:02 |
sean-k-mooney | and just checked its a recommended package | 14:02 |
eharney | ahh, cool, was just looking for that missing link myself | 14:02 |
sean-k-mooney | not a dep | 14:02 |
dansmith | sean-k-mooney: so we likely need a devstack ceph plugin change, let me push that up and try a dep | 14:02 |
sean-k-mooney | ya so we could also do this via bindep in nova as an optional dep. perhaps we should as a follow-up | 14:03 |
sean-k-mooney | I was just going to look at devstack to see how it installs qemu/libvirt | 14:03 |
sean-k-mooney | and add it there but doing it in the devstack plugin probably makes more sense | 14:03 |
eharney | it can go in devstack-plugin-ceph/devstack/files/debs i think | 14:03 |
dansmith | sean-k-mooney: is that package also in focal? | 14:03 |
sean-k-mooney | yep | 14:04 |
sean-k-mooney | well I'm waiting for the page to refresh | 14:04 |
dansmith | it is | 14:04 |
sean-k-mooney | but it looks like it should be there from bionic | 14:04 |
dansmith | I confirmed | 14:04 |
dansmith | https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/881479 | 14:05 |
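For context, the change linked above would simply add the missing package to the plugin's Ubuntu package list. A minimal sketch, assuming the list lives at devstack/files/debs/devstack-plugin-ceph as mentioned earlier in the conversation (existing entries are elided; only the qemu-block-extra line is the addition being discussed):

```
# devstack/files/debs/devstack-plugin-ceph — sketch, existing entries elided
xfsprogs
qemu-block-extra
```

qemu-block-extra is the jammy package that carries qemu's rbd (and iscsi) block drivers, which are only recommended rather than depended on by the core qemu packages.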
dansmith | I'll fix up the revert-focal devstack patch to depend on it | 14:05 |
sean-k-mooney | so they moved it from suggests to recommends between focal and jammy | 14:05 |
dansmith | ah | 14:06 |
sean-k-mooney | https://github.com/openstack/devstack/blob/master/lib/nova_plugins/functions-libvirt#L72 | 14:07 |
sean-k-mooney | so I was thinking of just adding it there to be honest | 14:07 |
sean-k-mooney | just always install qemu-block-extra in base devstack | 14:08 |
dansmith | the ceph plugin already has a list of specific-to-it deb packages.. so it needs to go there anyway, IMHO | 14:08 |
sean-k-mooney | sure | 14:08 |
sean-k-mooney | I'm just wondering what else is there | 14:09 |
sean-k-mooney | it looks like libiscsi is also there | 14:09 |
dansmith | xfsprogs | 14:09 |
dansmith | although I dunno why | 14:09 |
dansmith | anyway, let's see if this works | 14:09 |
sean-k-mooney | ah ok we don't need it for iscsi because we host mount | 14:09 |
sean-k-mooney | for multipath | 14:09 |
dansmith | ah, iscsi is in block-extra as well? | 14:09 |
sean-k-mooney | yes but only if you use qemu to directly connect to the iscsi backend, which we no longer do. I don't think we use glusterfs either | 14:10 |
dansmith | right | 14:10 |
dansmith | sean-k-mooney: eharney okay it installed the qemu-block-extra package successfully, so that's good.. we'll see if it can actually boot instances once it gets there :) | 14:35 |
dansmith | looks like it might be booting servers ... | 14:49 |
*** iurygregory_ is now known as iurygregory | 15:00 |
bauzas | reminder : nova meeting in 35 mins | 15:25 |
bauzas | sean-k-mooney: dansmith: I'm about to communicate back to the ML saying that the gate is unblocked, amirite ? | 15:34 |
dansmith | no | 15:34 |
sean-k-mooney | no | 15:34 |
sean-k-mooney | none of the patches are merged | 15:34 |
dansmith | we have a ways to go before that | 15:34 |
dansmith | I have to squash my fix into the main one, since it can't merge on its own | 15:34 |
dansmith | and we'll have to recheck the main one anyway since it failed more volume attach things | 15:34 |
dansmith | so we still got a bit | 15:35 |
bauzas | because other libs are capping >=3.9 ? | 15:35 |
* bauzas is confused | 15:35 | |
bauzas | I thought neutron did the revert | 15:35 |
dansmith | bauzas: so many more problems than that dude :) | 15:36 |
dansmith | now pysaml2 has gone >3.8 which is breaking everyone | 15:36 |
bauzas | oh | 15:36 |
dansmith | that revert may land and we might be okay, but there might be others too | 15:36 |
bauzas | yeah, I got the memo but I didn't read it correctly | 15:36 |
bauzas | okay, then I'll write some status email then | 15:37 |
sean-k-mooney | I think we are just going to move everything to jammy and drop the 3.8 testing | 15:37 |
bauzas | saying we can't guarantee that any of the deps are still able to use py3.8 | 15:37 |
sean-k-mooney | assuming the ceph job eventually passes without the detach error | 15:37 |
bauzas | sean-k-mooney: that was my thought | 15:37 |
bauzas | https://review.opendev.org/c/openstack/nova/+/881409 should help | 15:38 |
sean-k-mooney | https://review.opendev.org/q/topic:drop-py38 is the full set of fixes | 15:39 |
sean-k-mooney | we basically need to revert the py39 change out of https://review.opendev.org/c/openstack/nova/+/881339 and incorporate the ceph job fixes instead | 15:40 |
sean-k-mooney | so https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/881479 and https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315 need to be squashed | 15:41 |
sean-k-mooney | and https://review.opendev.org/c/openstack/nova/+/881339 needs to depend on the squashed commit | 15:41 |
sean-k-mooney | dansmith: those were the two patches you planned to squash right? | 15:42 |
bauzas | sean-k-mooney: thanks, I'll explain it | 15:42 |
dansmith | yes | 15:42 |
dansmith | I'm waiting for the run to finish so I can look at the failures | 15:42 |
dansmith | unfortunately it looks like maybe we OOMed or something because now we're failing to clean up neutron ports and various other things | 15:43 |
dansmith | hopefully the new ceph doesn't like use a lot more memory or something :/ | 15:43 |
sean-k-mooney | we might need to bump the swap or enable the mariadb thing | 15:43 |
sean-k-mooney | that's still not on by default in devstack, is it? | 15:43 |
dansmith | it is for the ceph job in nova, but yeah maybe not here not sure | 15:45 |
dansmith | we'll see when it finishes | 15:45 |
sean-k-mooney | the default is still false https://github.com/openstack/devstack/blob/4dfb67a831686279acd66f65e51beba42f675c91/stackrc#L207 | 15:45 |
dansmith | right | 15:46 |
sean-k-mooney | although it is enabled by default in zuul for multi-node jobs https://github.com/openstack/devstack/blob/2e607b0cbd91d9243c3e9424a500598c72ae34ad/.zuul.yaml#L701 | 15:46 |
sean-k-mooney | but I'm assuming this is single-node | 15:46 |
sean-k-mooney | anyway we have a few levers we can pull once we know why it failed | 15:46 |
opendevreview | Elod Illes proposed openstack/nova master: Drop py38 based zuul jobs https://review.opendev.org/c/openstack/nova/+/881339 | 15:55 |
opendevreview | Elod Illes proposed openstack/nova master: Drop py38 support from setup.cfg and tox.ini https://review.opendev.org/c/openstack/nova/+/881365 | 15:56 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Apr 25 16:00:02 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | hey everyone | 16:00 |
elodilles | o/ | 16:00 |
dansmith | o/ | 16:00 |
bauzas | we are quite busy today so let's try to be quick | 16:00 |
ykarel | o/ | 16:00 |
auniyal | o/ | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:00 |
bauzas | #topic Bugs (stuck/critical) | 16:00 |
bauzas | #info No Critical bug | 16:00 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 17 new untriaged bugs (+4 since the last meeting) | 16:00 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:01 |
bauzas | honestly, I think I forgot to tell melwitt to look at the bugs, so it's on me :) | 16:01 |
bauzas | the next person in the roster is artom | 16:01 |
artom | ohhai | 16:01 |
bauzas | artom: fancy triaging some upstream bugs if you like ? | 16:01 |
artom | Sure | 16:02 |
bauzas | thanks a lot | 16:02 |
artom | CLOSED RUSTWILLFIXIT | 16:02 |
bauzas | I'll also try to cherry-pick some | 16:02 |
bauzas | artom: well, we're on Launchpad | 16:02 |
bauzas | so it'd be 'Closed' only with a comment saying you'd think Rust would work | 16:03 |
bauzas | or even 'Wontfix' | 16:03 |
bauzas | anyway | 16:03 |
bauzas | #info bug baton is being passed to artom | 16:03 |
bauzas | #topic Gate status | 16:03 |
bauzas | grab your popcorns | 16:03 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:03 |
bauzas | #link https://etherpad.opendev.org/p/nova-ci-failures | 16:03 |
bauzas | but alas, | 16:03 |
bauzas | #link https://lists.openstack.org/pipermail/openstack-discuss/2023-April/033454.html Gate is blocked | 16:04 |
bauzas | last status : https://lists.openstack.org/pipermail/openstack-discuss/2023-April/033468.html | 16:04 |
bauzas | dansmith: sean-k-mooney: elodilles: wanting to add anything else on that ? | 16:05 |
dansmith | probably not | 16:05 |
sean-k-mooney | we are mainly waiting on ci | 16:05 |
sean-k-mooney | so no i think that fine | 16:05 |
elodilles | i've just updated the nova patches according to the discussions (if i did not miss anything) | 16:05 |
bauzas | we have two concurrent patches afaics https://review.opendev.org/c/openstack/nova/+/881409 and https://review.opendev.org/c/openstack/nova/+/881339 | 16:05 |
elodilles | if those will be the chosen ones :) | 16:05 |
bauzas | we may want to only use one :) | 16:06 |
dansmith | yeah | 16:06 |
sean-k-mooney | elodilles: you're missing the depends-on against the devstack patch, but those need to be squashed first | 16:06 |
bauzas | anyway, we'll sort that out of the meeting | 16:06 |
sean-k-mooney | so its fine for now | 16:06 |
elodilles | ++ | 16:07 |
bauzas | yup and thanks to dansmith and sean-k-mooney for working hard on the ceph patches | 16:07 |
sean-k-mooney | mainly dansmith | 16:08 |
bauzas | -EOLDGREEK to me | 16:08 |
sean-k-mooney | I have been busy with other things other than checking back every now and then | 16:08 |
bauzas | (I mean, I know what ceph does and all the things, but that is a different story) | 16:08 |
sean-k-mooney | anyway we can move on | 16:08 |
bauzas | cool | 16:08 |
bauzas | https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly | 16:08 |
bauzas | unsurprisingly they are all green | 16:09 |
bauzas | but when those ran we didn't have the py39 updates yet :) | 16:09 |
sean-k-mooney | good timing i guess | 16:09 |
bauzas | well, tooz upgraded to 4.0 the week before | 16:10 |
opendevreview | Merged openstack/nova stable/xena: Fix rescue volume-based instance https://review.opendev.org/c/openstack/nova/+/875343 | 16:10 |
bauzas | so that's fortunate we haven't merged the u-c patch during the weekend | 16:10 |
bauzas | anyway, moving on | 16:10 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:11 |
bauzas | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:11 |
bauzas | #topic Release Planning | 16:11 |
bauzas | #link https://releases.openstack.org/bobcat/schedule.html | 16:11 |
bauzas | #info Nova deadlines are set in the above schedule | 16:11 |
bauzas | #info Bobcat-1 is in 2 weeks | 16:11 |
bauzas | (we'll have a stable branch review day on Bobcat-1 but I'll explain this the next week) | 16:11 |
bauzas | #topic Review priorities | 16:12 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:12 |
bauzas | #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review | 16:12 |
bauzas | #topic Stable Branches | 16:12 |
bauzas | elodilles: take the mic | 16:12 |
elodilles | yepp | 16:12 |
elodilles | beyond the usual stuff, | 16:12 |
elodilles | (unblocked gate & many rechecks) | 16:13 |
elodilles | auniyal prepared release patches for yoga and zed | 16:13 |
elodilles | they haven't merged yet | 16:13 |
elodilles | and meanwhile 1-1 patches merged to stable/yoga and zed | 16:13 |
auniyal | yes, got one +2 so far :), thanks for the review elodilles | 16:14 |
elodilles | otherwise they are good as they are | 16:14 |
elodilles | auniyal: thanks for proposing the patches :) | 16:14 |
bauzas | cool | 16:14 |
bauzas | anything else ? | 16:15 |
elodilles | nope, i think that was all | 16:16 |
elodilles | from my side | 16:16 |
elodilles | sorry :) | 16:16 |
bauzas | cool, moving on | 16:17 |
bauzas | #topic Open discussion | 16:17 |
bauzas | (ykarel) Allow to add tb-cache size libvirt option for qemu, context https://bugs.launchpad.net/nova/+bug/1949606 | 16:17 |
bauzas | ykarel: go for it | 16:17 |
ykarel | hi | 16:19 |
ykarel | so qemu 5.0.0 (included in Ubuntu Jammy) updated the default tb-cache size to 1 GiB (from 32 MiB) for system-emulated guest VMs, and with that each guest VM uses much more system memory (1 GB+ per guest), resulting in oom-kill issues when creating multiple guest VMs concurrently in neutron scenario jobs that use Ubuntu guest VMs | 16:19 |
ykarel | libvirt-8.0.0 added an option to configure it per guest vm | 16:20 |
ykarel | Currently testing WIP nova patch https://review.opendev.org/c/openstack/nova/+/868419 in https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/881391 | 16:20 |
ykarel | should this be a specless RFE or can continue as a bug? | 16:20 |
sean-k-mooney | ykarel: did you file the specless blueprint I asked for | 16:20 |
sean-k-mooney | ah | 16:20 |
sean-k-mooney | its not a bug | 16:20 |
ykarel | sean-k-mooney, I just added it to the meeting agenda to get clarity | 16:21 |
ykarel | but if it's a way to go then i will file it | 16:21 |
sean-k-mooney | we might consider it to be a small enough workaround that we might want to backport it upstream | 16:21 |
sean-k-mooney | to enable ci testing | 16:21 |
ykarel | hmm mainly would be needed for releases running on jammy | 16:22 |
sean-k-mooney | the issue is that I don't think 22.04 has libvirt 8.0.0 | 16:22 |
ykarel | it has | 16:22 |
ykarel | that's what we testing | 16:22 |
sean-k-mooney | ah ok | 16:22 |
bauzas | we really need an update of https://docs.openstack.org/nova/latest/reference/libvirt-distro-support-matrix.html | 16:23 |
sean-k-mooney | yes | 16:23 |
sean-k-mooney | so your patch is missing a min libvirt version check | 16:23 |
bauzas | we're still having libvirt 6.0.0 as a min, right? | 16:23 |
sean-k-mooney | so that will need to be added | 16:23 |
ykarel | from job logs libvirt-daemon 8.0.0-1ubuntu7.4 | 16:23 |
sean-k-mooney | bauzas: yes and im hoping to bump that to 7.0.0 this release | 16:23 |
ykarel | sean-k-mooney, yes will include that in next update | 16:23 |
sean-k-mooney | but we will still need to check for 8.0.0 support | 16:24 |
sean-k-mooney | ack | 16:24 |
bauzas | sean-k-mooney: but we would still need to only set it if libvirt >=8 | 16:24 |
bauzas | do we want to schedule on it, or is it just a performance fix ? | 16:24 |
sean-k-mooney | ya either only set it or preferably check this in init_host | 16:24 |
sean-k-mooney | and raise an error if you set the config option on an old libvirt | 16:24 |
bauzas | will there be any config knob ? | 16:24 |
sean-k-mooney | yes there shoudl be | 16:25 |
ykarel | yes that's second question | 16:25 |
sean-k-mooney | to set the cache size | 16:25 |
ykarel | Default setting for this new option, unconfigured or set to defaults like 32MB or 128 MB etc? | 16:25 |
bauzas | ykarel: if that's a config option, that indeed needs to be a hard fail if the operator sets this config value | 16:26 |
bauzas | and libvirt isn't recent enough | 16:26 |
sean-k-mooney | it's not something we should hardcode | 16:26 |
ykarel | bauzas, sure will take care that in the patch | 16:26 |
bauzas | for that reason, I'm not happy with a default value except none | 16:26 |
sean-k-mooney | so it should be a config option; I'm ok with a low default or leaving it unset | 16:26 |
bauzas | I'd prefer it unset for upgrade reasons | 16:27 |
ykarel | ok thanks, will keep it like that: no default and let the user configure it | 16:27 |
sean-k-mooney | bauzas: to your point it would be nice to have a trait for this but not required | 16:27 |
bauzas | ykarel: s/user/operator but I think I get your point | 16:27 |
ykarel | so I will propose the blueprint and update the patch as per all the suggestions, thanks | 16:27 |
sean-k-mooney | in this case really it will be the zuul job | 16:27 |
ykarel | yes | 16:28 |
sean-k-mooney | this is of interest to people using qemu for emulation | 16:28 |
bauzas | sean-k-mooney: man, we could recommend the operators to provide custom traits for this, exactly like vgpu types | 16:28 |
bauzas | I mean, eventually all the computes will support that, right? | 16:28 |
sean-k-mooney | yes | 16:28 |
bauzas | after a couple of releases, once we cap libvirt to >=8 | 16:28 |
sean-k-mooney | so its not goign to break live migration | 16:29 |
bauzas | so I'm not a big fan of adding some scheduling thing for something that will eventually be supported mid-term | 16:29 |
sean-k-mooney | because we do not allow live migration from a newer to older microversion | 16:29 |
sean-k-mooney | and for cold migration we will regenerate the xml on the appropriate host | 16:29 |
sean-k-mooney | based on what it has available | 16:29 |
bauzas | s/microversion/libvirt version but yeah | 16:29 |
sean-k-mooney | so I don't think we need anything special here | 16:29 |
sean-k-mooney | so ya a custom trait would be fine with me | 16:30 |
sean-k-mooney | they can use provider.yaml to set that if they want | 16:30 |
bauzas | cool, so that only seems to need a config knob to add, a check on init_host to fail if it's set, and some magic in the driver to enable it | 16:30 |
bauzas | amirite ? | 16:30 |
sean-k-mooney | more or less | 16:30 |
bauzas | then, I'm OK for specless | 16:30 |
sean-k-mooney | + docs, tests etc. but it's effectively self-contained in the libvirt driver + the config tweak | 16:30 |
bauzas | we had precedents | 16:30 |
bauzas | + a relnote obviously | 16:31 |
bauzas | ykarel: do you agree with the direction ? | 16:31 |
ykarel | bauzas, yes | 16:31 |
sean-k-mooney | im ok with specless too assuming all of the above are done | 16:31 |
bauzas | anyone disagreeing ? | 16:31 |
bauzas | looks not | 16:32 |
ykarel | Thanks folks \o/ | 16:32 |
sean-k-mooney | the only other thing I would suggest is once this is done a devstack patch should be added to set this by default to say 32MB if qemu is used instead of kvm | 16:32 |
sean-k-mooney | that's out of scope for this meeting and could be done on a per-job basis too | 16:33 |
bauzas | #agreed enabling tb cache seems a specless blueprint, provided it only adds a config knob defaulting to unset, init_host failing on an older libvirt and just libvirt config tweak | 16:33 |
bauzas | #action ykarel to ping bauzas once the blueprint is created so that he can approve it | 16:34 |
bauzas | ykarel: and yeah, the scope of this feature can include devstack change and testing, for sure | 16:34 |
bauzas | like we could enable it in nova-next | 16:34 |
sean-k-mooney | bauzas: it will reduce the memory pressure in all our jobs, so once we know it does not have a negative impact we will probably want to have it enabled in all of them | 16:35 |
bauzas | anything else to add on this item ? | 16:35 |
sean-k-mooney | but we can take it slow like the mariadb reduced memory | 16:35 |
bauzas | sean-k-mooney: oh yeah, but I'd be in favor of testing it first | 16:36 |
bauzas | yah | 16:36 |
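As a rough illustration of the devstack/CI follow-up discussed above, a job could inject the new knob through devstack's post-config mechanism. This is only a sketch: the [libvirt]tb_cache_size option name and the 32 MB value are assumptions based on the WIP patch and the conversation, not a merged interface.

```yaml
# hypothetical zuul job tweak; devstack_local_conf/post-config is the usual
# way devstack-based jobs inject nova.conf settings
- job:
    name: nova-next
    vars:
      devstack_local_conf:
        post-config:
          $NOVA_CONF:
            libvirt:
              # restore something close to the pre-qemu-5.0 default (32 MiB)
              tb_cache_size: 32
```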
bauzas | ok, fwiw, I have another item | 16:36 |
bauzas | (bauzas) Can https://blueprints.launchpad.net/nova/+spec/cold-migrate-to-host-policy be specless ? | 16:36 |
bauzas | tl;dr: we discussed this at the PTG | 16:37 |
sean-k-mooney | assuming there is no change in the default policy then yes i think so | 16:37 |
bauzas | operators want a better granularity and maybe change the cold-migrate action to be admin_or_owner | 16:37 |
bauzas | but, here, we're just adding a new policy which is admin-only | 16:37 |
bauzas | so no API change, and no policy changes | 16:37 |
bauzas | it will just go check a separate policy if host is set | 16:38 |
bauzas | (literally a one-liner patch besides the policy file) | 16:38 |
bauzas | any objections to have it specless ? | 16:38 |
sean-k-mooney | so basically there will be two policies now for cold migration, one for migration with a host and one without, but admin-only by default | 16:38 |
sean-k-mooney | and then operators can choose | 16:39 |
sean-k-mooney | +1 | 16:39 |
bauzas | correct, like we have for os_compute_api:servers:create:forced_host | 16:39 |
bauzas | except I won't change the default rule for os_compute_api:os-migrate-server:migrate | 16:40 |
bauzas | there will be os_compute_api:os-migrate-server:migrate and os_compute_api:os-migrate-server:migrate:host | 16:40 |
bauzas | both being admin-only | 16:40 |
bauzas | (and operators can decide to open os_compute_api:os-migrate-server:migrate to endusers) | 16:40 |
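Purely as an illustration of what that split buys operators: the second rule name follows the discussion above, while the check strings are example choices an operator might make, not proposed defaults. A policy.yaml override could look like this:

```yaml
# hypothetical operator policy.yaml: open plain cold migration to project
# members while keeping the host-targeted variant admin-only
"os_compute_api:os-migrate-server:migrate": "role:member and project_id:%(project_id)s"
"os_compute_api:os-migrate-server:migrate:host": "role:admin"
```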
bauzas | so I reiterate, any objection to have it specless ? | 16:41 |
bauzas | looks not | 16:41 |
bauzas | if so, | 16:41 |
sean-k-mooney | as long as there is at least a blueprint I'm happy. I dislike changing policy without any tracker, so that works for me | 16:42 |
bauzas | #agreed https://blueprints.launchpad.net/nova/+spec/cold-migrate-to-host-policy accepted as a specless feature for Bobcat | 16:42 |
bauzas | sean-k-mooney: there is a blueprint, and there will be a relnote | 16:42 |
bauzas | and there will be functional tests covering this | 16:42 |
sean-k-mooney | yep all good. | 16:42 |
bauzas | I don't think we need a tempest change, do you think it's a nice to have ? | 16:43 |
sean-k-mooney | I don't think tempest should test non-default policy | 16:43 |
bauzas | yeah, that was my question | 16:43 |
bauzas | I'm not a QA expert | 16:43 |
bauzas | tempest is branchless | 16:43 |
bauzas | so that would be a bit hard to test it with tempest | 16:44 |
sean-k-mooney | I suspect you could reuse some of the existing tests with the right config if you needed to | 16:44 |
bauzas | anyway, I think we're done on this | 16:44 |
bauzas | about tempest, we could discuss this on the review time | 16:44 |
bauzas | thanks folks | 16:44 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Reproduce bug 1995153 https://review.opendev.org/c/openstack/nova/+/862967 | 16:44 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Save cell socket correctly when updating host NUMA topology https://review.opendev.org/c/openstack/nova/+/862964 | 16:44 |
bauzas | any other item to add before we end the meeting ? | 16:45 |
auniyal | small thing o/ | 16:45 |
auniyal | CI on yoga: this one keeps failing for different reasons, mostly | 16:45 |
auniyal | https://review.opendev.org/c/openstack/nova/+/839922 | 16:45 |
bauzas | shot | 16:45 |
auniyal | mostly volume tests | 16:45 |
sean-k-mooney | ya that's kind of a pain, I'm not sure there is anything we can do beyond recheck | 16:46 |
sean-k-mooney | is it the volume detach tests | 16:46 |
sean-k-mooney | auniyal: gibi found some tests that are not waiting properly | 16:46 |
bauzas | I think this is also tracked on the stable CI failures etherpad | 16:46 |
auniyal | yes, attach and detach , but they are always different | 16:46 |
auniyal | sometimes timeout | 16:47 |
sean-k-mooney | yoga is not EM right so its still using tempest master? | 16:47 |
auniyal | no | 16:47 |
auniyal | tbc no, its not EM | 16:47 |
sean-k-mooney | ok | 16:47 |
sean-k-mooney | so it still can get tempest fixes if we fix those tests | 16:47 |
bauzas | yup | 16:48 |
bauzas | are we done ? | 16:48 |
auniyal | sorry I didn't get it, any action on the above? | 16:48 |
auniyal | we need to fix tempest tests ? | 16:49 |
sean-k-mooney | I think just continue to recheck it. gibi found at least one test that is not waiting for sshable | 16:49 |
bauzas | no, we have some tempest patches up | 16:49 |
sean-k-mooney | and noticed others don't appear to be waiting, but I don't have the context | 16:49 |
bauzas | and yoga would benefit from those | 16:50 |
bauzas | since tempest is branchless | 16:50 |
sean-k-mooney | oh do you have a link? | 16:50 |
auniyal | ack thanks | 16:50 |
bauzas | I was referring to gibi's recent discoveries of testing gap for ssh wait | 16:51 |
bauzas | (sorry was looking at the -tc meeting) | 16:52 |
bauzas | -tc chan* | 16:52 |
bauzas | can we close this meeting now ? | 16:52 |
sean-k-mooney | its fine we can wrap this here and chat after | 16:52 |
bauzas | cool | 16:53 |
bauzas | thanks all | 16:53 |
bauzas | #endmeeting | 16:53 |
opendevmeet | Meeting ended Tue Apr 25 16:53:12 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:53 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.html | 16:53 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.txt | 16:53 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-04-25-16.00.log.html | 16:53 |
sean-k-mooney | https://review.opendev.org/c/openstack/tempest/+/880891 | 16:53 |
sean-k-mooney | seems to be related | 16:53 |
elodilles | thanks o/ | 16:53 |
sean-k-mooney | so that test is wrong | 17:01 |
sean-k-mooney | we do not support attaching or detaching ports or volumes from neutron or cinder | 17:01 |
frickler | so we run this test for 6 years, have issues with it time and again, and only now notice that it tests an unsupported scenario? cool | 17:02 |
sean-k-mooney | frickler: it has never been supported | 17:03 |
sean-k-mooney | I just noticed this existed because gmann has a dnm patch up | 17:03 |
gmann | which one ? | 17:04 |
frickler | yes, saw the comment in the patch. also didn't want to blame anyone, just enjoying the wondrous world of openstack once again | 17:04 |
sean-k-mooney | https://github.com/openstack/tempest/blob/master/tempest/api/volume/test_volumes_actions.py#L39-L55 | 17:04 |
frickler | https://review.opendev.org/c/openstack/tempest/+/881132/3 | 17:04 |
sean-k-mooney | it kind of depends on what self.volumes_client.attach_volume actually does | 17:05 |
frickler | ah, no, the one below | 17:05 |
sean-k-mooney | if its calling nova its fine | 17:05 |
sean-k-mooney | if its using the cinder attachments api directly its not | 17:05 |
sean-k-mooney | that looks like it's calling cinder https://github.com/openstack/tempest/blob/20e460dacfae6b4546903a9caaf9253330f27b5a/tempest/clients.py#L286 | 17:07 |
sean-k-mooney | frickler: this was actually added 11 years ago https://github.com/openstack/tempest/commit/a42fe441703084449107fabb15fe42938c02ba08 | 17:10 |
sean-k-mooney | that does not mean it has been correct or supported for all that time | 17:10 |
frickler | ah, I was only looking at the current blame, which says 2017 | 17:10 |
frickler | you can see the actual API calls in https://4ae644854fb3bf106e9b-6877b85dbe482cd2daa62a6731b06023.ssl.cf1.rackcdn.com/881132/3/check/tempest-full-py3/37d3ce7/controller/logs/tempest_log.txt | 17:10 |
gmann | frickler: sean-k-mooney ok, that one. those tests are meant for the cinder standalone case and they are not a valid scenario involving nova halfway | 17:12 |
sean-k-mooney | right, they are fine if you are using cinder standalone | 17:12 |
frickler | POST https://213.32.75.38/compute/v2.1/servers/2a24008b-6c93-4b83-a678-8d5b0be7b6a1/os-volume_attachments | 17:12 |
gmann | nobody ever tested whether passing a nova server id in an attachment via cinder will work from the nova perspective or not | 17:12 |
frickler | that looks like nova being used | 17:12 |
gmann | I was testing those to remove nova involvement from those tests and nova+cinder attachment anyways are tested in many other tests | 17:13 |
sean-k-mooney | yep nova should be removed from them | 17:13 |
gmann | frickler: the attachment is directly to cinder, not via nova, so nova does not know about the attachment but cinder thinks the server is attached to the volume so it makes it in-use | 17:13 |
sean-k-mooney | marking it in use is correct | 17:14 |
sean-k-mooney | but we should not see the volume attached to the vm | 17:14 |
gmann | yeah, i mean as nova does not know about attachment, adding server_id as valid attachment is not correct. | 17:14 |
gmann | that server_id can be invalid or can be deleted anytime | 17:14 |
gmann | without cinder knowing | 17:14 |
gmann | sean-k-mooney: yeah, VM does not know about volume | 17:15 |
sean-k-mooney | anyway I'm glad you are looking at it; you can ping me after the patch is out of DNM if you want me to review | 17:15 |
gmann | k | 17:16 |
sean-k-mooney | dansmith: you pinged me yesterday to look at a patch maybe related to the stable uuid stuff | 17:17 |
sean-k-mooney | do you remember what it was | 17:17 |
dansmith | sean-k-mooney: the rt stuff, but it's all blocked of course.. gibi gave it the +W so it's probably good for me to just fast-approve once the gate is unblocked | 17:17 |
sean-k-mooney | oh right ya that was it | 17:17 |
sean-k-mooney | I remember seeing it had a +w | 17:18 |
sean-k-mooney | cool | 17:18 |
dansmith | yep, thanks | 17:18 |
dansmith | I'll definitely hit you up if I need a re-review once things get unblocked | 17:18 |
dansmith | *if* they get unblocked I should say :) | 17:18 |
frickler | the pysaml revert mergen, so I think CI should be unblocked | 17:19 |
frickler | merged even | 17:19 |
dansmith | we'll see :) | 17:19 |
sean-k-mooney | I'm going to go get dinner. I might be around later but I'm mostly done for today | 17:20 |
dansmith | mmm, ceph job appears to be failing again in a similar way.. hope we don't have more work to do | 17:50 |
bauzas | dansmith: which patch are you checking for the job runs ? | 18:31 |
bauzas | so I can try to look over it tomorrow morning | 18:31 |
dansmith | https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315 | 18:32 |
bauzas | ack, will target it tomorrow morning | 18:32 |
dansmith | it only failed six tests this time instead of a timeout, so maybe it's better than I thought | 18:32 |
dansmith | but six is still a lot, and I haven't gone through the latest logs yet | 18:32 |
bauzas | I can try to dig into those later | 18:33 |
dansmith | the fails look all volume detach related | 18:44 |
dansmith | so perhaps it's not really a ceph problem | 18:44 |
dansmith | but it seems like a large number for a single run, so I'm not sure | 18:44 |
dansmith | melwitt: can you +W this? https://review.opendev.org/c/openstack/nova/+/881409/2 | 20:31 |
dansmith | eharney: gouthamr: Well, it installs and "works" on jammy, but something isn't happy: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315?tab=change-view-tab-header-zuul-results-summary | 20:38 |
dansmith | eharney: gouthamr I don't really know what I'm looking at, but I don't see any errors in the cinder or ceph stuff that I recognize, and just some "busy" messages from rbd around the failed detaches in the n-cpu log | 20:41 |
gouthamr | hey dansmith - /me is late to the party | 20:42 |
dansmith | gouthamr: we need to drop focal from the jobs and our gate has been blocked for two days because some did it early..we've reverted those things for the moment, but we need to get the ceph job working on jammy | 20:43 |
dansmith | gouthamr: the above patch unpins the jobs to let them run on jammy and they get pretty far, but some volume/ceph related failures are showing that something is not happy | 20:43 |
dansmith | gouthamr: are you the right person to get that working? | 20:44 |
melwitt | dansmith: just to confirm, you still want to remove after neutron has reverted? https://review.opendev.org/c/openstack/neutron/+/881430 | 20:44 |
dansmith | melwitt: yeah the neutron failure was a couple failures ago, and not even the only problem.. but as noted in the original patch, it was intended to only live for antelope and then be reverted, so we need to do it anyway | 20:45 |
gouthamr | dansmith: probably not; i'm not an expert on rbd or cinder... | 20:45 |
dansmith | gouthamr: oh.. who is then? | 20:45 |
gouthamr | eharney is my go to guy, probably jbernard | 20:46 |
dansmith | gouthamr: okay he said earlier today that he was not likely the guy to ask (unless I misunderstood) | 20:46 |
gouthamr | ah :) let me look at the logs and see if something pops out | 20:47 |
gouthamr | we've been burnt before by using distro packages for ceph because fixes took forever to land - so we shied away from them and looked upstream.. | 20:48 |
gouthamr | but, like you've discussed, the ceph community hasn't built jammy packages for the latest release (quincy) - they meant to, but they lost people/mindshare in recent months | 20:49 |
dansmith | gouthamr: yeah, but last I checked, there were not packages from ceph themselves for jammy | 20:49 |
dansmith | gouthamr: and the cephadm job is even more broken and marked n-v so I assume it's not healthier | 20:49 |
gouthamr | on the manila jobs, we pivoted to use centos-stream-9 because ceph folks continue to publish packages there | 20:49 |
gouthamr | s/there/for it | 20:49 |
gouthamr | dansmith: yep; on that change, i see there's a problem with podman... | 20:50 |
dansmith | gouthamr: yeah, but stream breaks us constantly | 20:50 |
gouthamr | oh | 20:50 |
dansmith | there's very little chance we're going to be able to have this job run on stream :) | 20:50 |
dansmith | gouthamr: yeah I see the podman thing, but the job is also marked non-voting | 20:51 |
dansmith | so I assume it doesn't have a long track record of stability :) | 20:51 |
gouthamr | yes; i don't think we've run the job long enough to test for stability: https://zuul.opendev.org/t/openstack/builds?job_name=devstack-plugin-ceph-cephfs-nfs | 20:52 |
dansmith | presumably the cephadm approach means we could run on jammy but with upstream ceph fixes | 20:52 |
gouthamr | yep | 20:52 |
dansmith | gouthamr: that's the cephfs job, but I assume that's not what nova needs | 20:52 |
dansmith | I'm looking at devstack-plugin-ceph-tempest-cephadm | 20:52 |
gouthamr | yes; just pointing out that job because it uses centos-9-stream | 20:53 |
dansmith | ah okay | 20:53 |
gouthamr | ack; | 20:53 |
gouthamr | "devstack-plugin-ceph-tempest-cephadm" on focal-fossa used a third party repo to get podman | 20:53 |
dansmith | ah, and podman is in jammy itself I think right? | 20:54 |
gouthamr | by the looks of it, yes | 20:54 |
dansmith | although it seems broken :) | 20:54 |
dansmith | anyway, I thought there was also some concern that the cephadm job didn't expose the ceph config that nova needed or something like that, but I heard that like 20th hand | 20:55 |
gouthamr | shouldn't be the case | 20:55 |
dansmith | okay | 20:55 |
gouthamr | i think we tested some of this without tempest in the picture - but that job ("devstack-plugin-ceph-tempest-cephadm") has never passed; we assumed someone working on nova/cinder/glance would help looking at it at some point | 20:56 |
gouthamr | sorry this feels disjointed - conversations happened on irc and gerrit, ptg and the ML iirc.. but its time to reprise this because it's urgent.. | 20:57 |
gouthamr | that's a tangent though, let me see if i can spot an issue with the package based job you're looking to fix | 20:57 |
dansmith | ugh, never passed? that's no good.. I wonder if it's for the same reason the non-cephadm job is failing? | 20:58 |
dansmith | I can try to get podman working on this to see if it's otherwise the same | 20:58 |
gouthamr | ++ | 20:58 |
dansmith | gouthamr: this is definitely disjointed, and I feel like I'm just flailing because nobody else is :/ | 20:58 |
gouthamr | you're doing godly work :D | 20:59 |
dansmith | $deitly work you mean :) | 20:59 |
dansmith | er, $deityly .. or something | 21:00 |
gouthamr | :P | 21:00 |
dansmith | okay I just pushed something that might get podman working based on that error message, so we'll see | 21:00 |
dansmith | melwitt: thanks for the +W.. I would have just removed that neutron reference and changed to "effing everything" but didn't want to have to make another trip through the jobs, as you can probably imagine :) | 21:01 |
dansmith | gouthamr: the cephadm that has never passed.. is that always on quincy (i.e. newer than what we were running in focal) or what? | 21:06 |
dansmith | I | 21:06 |
melwitt | dansmith: understandable :) | 21:06 |
gouthamr | dansmith: 5/6 failures are on volume detach timeouts; and the request never got to cinder afaict .... https://zuul.opendev.org/t/openstack/build/9ebda7c1ebf843209e57ef0eac13814f/log/controller/logs/screen-n-cpu.txt#61276-61329 | 21:07 |
dansmith | gouthamr: yeah but you see the rbd busy messages right? | 21:07 |
dansmith | I commented on an earlier patch | 21:08 |
gouthamr | ah; no i missed those | 21:08 |
dansmith | I'm not sure the cinder detach would have happened by this point by the way, because we haven't gotten the guest to let go yet | 21:08 |
gouthamr | yes | 21:09 |
gouthamr | might be my browser, but i don't see "rbd.ImageBusy: [errno 16] RBD image is busy (error removing image)" in the latest n-cpu logs | 21:10 |
gouthamr | should i be looking elsewhere? | 21:10 |
dansmith | yeah actually I don't think I see them in the latest either.. but that was just a recheck | 21:11 |
dansmith | almost identical set of failures though | 21:11 |
dansmith | so yeah.. weird | 21:11 |
dansmith | gouthamr: here's an example from the previous run: https://zuul.opendev.org/t/openstack/build/f2ecbdd78616419cb5c8c2b3f4a8b71a/log/controller/logs/screen-n-cpu.txt#55439 | 21:13 |
dansmith | that's all over those logs and absent from the latest.. bizarre | 21:15 |
dansmith | gouthamr: cephadm job made it past cephadm install phase, so.. progress I think | 21:17 |
gouthamr | very nice | 21:17 |
dansmith | and finished pool setup (ignorant guess from the commands it ran) | 21:20 |
gouthamr | https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315/11/devstack/files/debs/devstack-plugin-ceph breaks focal fossa jobs (cephfs-native) though - but, we can fix that up with some os version annotation, correct? | 21:22 |
dansmith | we can fix it by putting it in the code instead of those package lists | 21:23 |
dansmith | I just jammed it in there because it was hard to mess up and will make sure we get it installed from the distro | 21:23 |
gouthamr | ack; either that or i can just fix the native cephfs job to use jammy too | 21:24 |
dansmith | the current install_podman thing does checks for focal, so I'd just extend that | 21:24 |
dansmith | gouthamr: yeah, although the ceph jobs on stable have to use the plugin without branches don't they? | 21:24 |
gouthamr | no this repo is branched | 21:24 |
dansmith | ah okay | 21:24 |
dansmith | well, either way.. if we move this to only >=jammy for everything (along with the PTI for 2023.2) then we can just use this debs list thing and remove the focal-specific install stuff.. whatever you want | 21:25 |
dansmith | I just want it to work :) | 21:25 |
gouthamr | agree; lets see this work | 21:25 |
gouthamr | if the "rbd remove" thing fails again on this run with the error, i would suggest reporting a bug - we could _try_ this thing on centos-9-stream and see whether there's some weirdness in the distro packages | 21:27 |
gouthamr | but, i am nervous about that job's future with ubuntu since we've learned about the ceph community's stance | 21:27 |
dansmith | ack, well, based on how this is working, I'm hoping the cephadm will either "work" or "fail the same way" and then we can discount distro packages | 21:28 |
gouthamr | ++ | 21:28 |
dansmith | it's running tempest now and looking identical to the distro version so far (i.e. hasn't failed but hasn't run volume tests yet) | 21:28 |
dansmith | so that's majorly better. failing with upstream bits is a minor win over failing with distro bits :P | 21:28 |
eharney | i have some work in flight currently around rbd ImageBusy errors... i wonder if this job is running some tests that were previously disabled? | 21:29 |
dansmith | eharney: shouldn't be, and one run failed with a bunch of them and then a single recheck failed similarly, but with no rbdbusy errors | 21:29 |
gouthamr | eharney: interesting, are there librbd changes that you're having to work around? | 21:31 |
eharney | gouthamr: not new ones, just working on fixing the class of errors around rbd images that can't be deleted that we've always had | 21:32 |
gouthamr | eharney: oh.. this error seems to have occurred multiple times in the libvirt rbd "remove image" call.. | 21:34 |
gouthamr | i'm hoping no openstack code needs to change; we're hoping to support quincy with stable/wallaby downstream :D based on some testing of this stuff elsewhere | 21:36 |
dansmith | gouthamr: yeah I was going to ask earlier.. are we using quincy downstream such that we know it works? this set of failures has me concerned that it's something fundamental of course | 21:36 |
gouthamr | (same tests, different OS, full blown ceph cluster as opposed to our aio .. etc etc -- so there could be a number of things being issues) | 21:37 |
gouthamr | dansmith: yes, we're not testing quincy with openstack's trunk though... we trail downstream; but this stuff is working with zed last i checked, with the same tempest tests passing there | 21:38 |
dansmith | okay that's good | 21:38 |
dansmith | well, it's passing some volume tests at least | 21:56 |
gouthamr | ++ | 21:56 |
gouthamr | a couple of things we wanted to try in this cephadm job in the past: | 21:57 |
dansmith | gouthamr: so if this works magically, you're okay just making this drop support for focal as long as the other jobs here are set to run on jammy? | 21:57 |
gouthamr | (1) revert to default test concurrency --- the concurrency was set to 1 because we saw resource contention | 21:57 |
gouthamr | dansmith: yes | 21:58 |
dansmith | resource contention like memory/ | 21:58 |
gouthamr | yes, and disk | 21:58 |
dansmith | so there's a tweak in devstack for memory that has been helping a lot of jobs, and we run with it enabled in the nova ceph jobs | 21:58 |
gouthamr | oh? i'd love to know! | 21:58 |
dansmith | drops mysql usage by about half, which is ~400MiB on these jobs | 21:58 |
gouthamr | nice | 21:58 |
dansmith | gouthamr: problem is you have to pay me a royalty per job execution to use it | 21:58 |
dansmith | gouthamr: https://github.com/openstack/nova/blob/master/.zuul.yaml#L422 | 21:59 |
gouthamr | :D i'd be poor if i was betting on ceph using less memory ever | 21:59 |
dansmith | haha | 21:59 |
gouthamr | (2) there's also a way to turn off "cephadm" after the install -- we should set that option on the CI - there's no use to keep it running | 22:00 |
dansmith | so based on our success with that, if you're cool with it, I'd say we turn that on for these jobs anyway | 22:00 |
dansmith | gouthamr: meaning before we start running tempest? | 22:00 |
gouthamr | yes, the plugin will do that for you after the ceph cluster deployment is done: https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/cephadm#L28 | 22:01 |
dansmith | oh so why is that not being set? | 22:02 |
dansmith | I don't even know what that means.. cephadm is a tool I thought, but you're saying it stays running even though the services are started or something? | 22:02 |
gouthamr | (here's an example from the only good cephadm job at the moment: https://github.com/openstack/manila-tempest-plugin/blob/ad0db6359c8c51c4521ac6660c8014981b2f1dea/zuul.d/manila-tempest-jobs.yaml#L413) | 22:02 |
dansmith | ack | 22:02 |
gouthamr | yes; we need it for day2 operations on the cluster | 22:02 |
gouthamr | so if you're running this on your local devstack, it's useful | 22:03 |
dansmith | ah, sure, okay | 22:03 |
dansmith | we should set that in these jobs that everyone inherits from | 22:03 |
gouthamr | ++ | 22:03 |
dansmith | so maybe a follow-on to this to set that, the mysql thing, and make this job voting | 22:03 |
dansmith | you know, if and when it starts working :) | 22:04 |
gouthamr | ^^ +1 | 22:04 |
dansmith | hmm, gouthamr I just noticed that we're still set to release=pacific for the distro-package-based job | 22:06 |
dansmith | is that maybe setting stuff that is non-ideal for quincy that could be related? | 22:06 |
gouthamr | i think the plugin ignores it, let me check | 22:06 |
dansmith | okay | 22:06 |
dansmith | https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/865315/11/.zuul.yaml#46 | 22:06 |
dansmith | okay just used for package repos anyway it looks like | 22:08 |
gouthamr | ack; we should get that opt out of the job because its confusing | 22:08 |
gouthamr | https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/ceph#L981-L989 | 22:08 |
dansmith | ack, so I assume we're configuring a repo but just not installing anything from it on jammy, right? | 22:09 |
dansmith | I'll add it to my follow-on patch to add the optimization devstack vars | 22:10 |
dansmith | add .. the removal of it, I mean :) | 22:10 |
gouthamr | ack ty | 22:10 |
gouthamr | i may have messed this up in my last patch | 22:10 |
gouthamr | https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/ceph#L1067-L1094 | 22:10 |
gouthamr | we're not invoking that method at all for ubuntu anymore.. | 22:10 |
dansmith | ack | 22:11 |
dansmith | well, there's a bunch of focal-specific stuff to clean up in there regardless | 22:11 |
dansmith | hmm, I think it's about to fail a test | 22:14 |
dansmith | been stopped for going on five minutes, assuming retrying a detach | 22:15 |
gouthamr | yes.. it was a head-scratcher; think vkmc and i noticed that our override to use download.ceph.com stopped working with focal at some point since ubuntu default-enabled the ubuntu ceph repos.. so dropping it made no difference, we ended up using/testing with the distro provided packages | 22:15 |
gouthamr | oh | 22:15 |
gouthamr | test_rebuild_server_with_volume_attached? | 22:16 |
dansmith | dunno yet | 22:16 |
gouthamr | ah | 22:16 |
dansmith | but it's in the rebuld group | 22:16 |
dansmith | yep | 22:16 |
dansmith | test_rebuild_server_with_volume_attached [430.305393s] ... FAILED | 22:17 |
dansmith | ugh | 22:17 |
dansmith | so the other variable here is the version of qemu and qemu's block-rbd driver are different in jammy of course, compared to what we've been testing | 22:18 |
dansmith | so could be a bug in one of those, especially since it's related to the detach in the guest | 22:18 |
gouthamr | ack; another thing to try would be to bump the ceph image to the latest quincy: https://github.com/openstack/devstack-plugin-ceph/blob/563cb5deeb21815ce0c62fa30249e85e886c783a/devstack/lib/cephadm#L32 | 22:19 |
dansmith | okay | 22:20 |
gouthamr | they've published v17.2.6 today, and v17.2.5 a month ago | 22:20 |
gouthamr | https://quay.io/repository/ceph/ceph?tab=tags | 22:21 |
dansmith | ack, I hate this sort of "version minesweeper" game.. if we're that sensitive to version, it feels like we're doing something wrong | 22:21 |
gouthamr | agreed; but since this stuff hasn't worked before on our ci, its worth a try | 22:22 |
dansmith | yeah for sure | 22:22 |
dansmith | volume resize passed | 22:23 |
dansmith | maybe we'll find that this is fewer fails or something | 22:23 |
dansmith | actually, that one didn't fail before | 22:25 |
gouthamr | okay, this may be good news? devstack-plugin-ceph-tempest-py3 is going to pass | 22:28 |
dansmith | no, really? | 22:28 |
dansmith | third recheck's a charm? | 22:28 |
gouthamr | :D | 22:28 |
dansmith | more fails on this cephadm job | 22:42 |
dansmith | so I guess it's not something fundamental, but maybe just massively less stable or we're hitting some race easier? | 22:42 |
dansmith | maybe it is memory-related and we're stressed more here | 22:43 |
dansmith | maybe I should try turning on the two optimizations here to see if that makes things more stable | 22:43 |
gouthamr | ++ | 22:43 |
gouthamr | "DISABLE_CEPHADM_POST_DEPLOY: true" and "MYSQL_REDUCE_MEMORY: true" for the rescue; we can iterate after with the concurrency if these failures reduce | 22:45 |
gouthamr | or go away | 22:45 |
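A sketch of the follow-up job change being discussed, using the two variables named above; the job name and exact structure are illustrative, and both flags are assumed to be plain devstack localrc settings consumed by the plugin and devstack respectively:

```yaml
# hypothetical devstack-plugin-ceph job tweak
- job:
    name: devstack-plugin-ceph-tempest-cephadm
    vars:
      devstack_localrc:
        # roughly halves mysql memory use, as already enabled in nova's ceph jobs
        MYSQL_REDUCE_MEMORY: true
        # stop the cephadm management pieces once the cluster is deployed;
        # CI runs never need day-2 operations
        DISABLE_CEPHADM_POST_DEPLOY: true
```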
opendevreview | Artom Lifshitz proposed openstack/nova master: Reproduce bug 1995153 https://review.opendev.org/c/openstack/nova/+/862967 | 22:46 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Save cell socket correctly when updating host NUMA topology https://review.opendev.org/c/openstack/nova/+/862964 | 22:46 |
dansmith | gouthamr: ack, will put those in here and see | 22:46 |
opendevreview | Merged openstack/nova master: Remove focal job for 2023.2 https://review.opendev.org/c/openstack/nova/+/881409 | 22:53 |
dansmith | gouthamr: okay, it's off and running with those flags | 23:25 |
dansmith | I'm burnt out so I'll circle back tomorrow | 23:25 |
gouthamr | dansmith++ works; good evening! :) | 23:27 |