Tuesday, 2023-10-17

*** han-guangyu is now known as Guest366101:48
*** han-guangyu_ is now known as han-guangyu01:51
opendevreviewHanGuangyu proposed openstack/nova master: Unify ServersController._flavor_id_from_req_data param to server_dict  https://review.opendev.org/c/openstack/nova/+/89831502:14
opendevreviewHanGuangyu proposed openstack/nova master: Unify ServersController._flavor_id_from_req_data param to server_dict  https://review.opendev.org/c/openstack/nova/+/89831505:19
opendevreviewHanGuangyu proposed openstack/nova master: Unify ServersController._flavor_id_from_req_data param to server_dict  https://review.opendev.org/c/openstack/nova/+/89831505:40
opendevreviewHanGuangyu proposed openstack/nova master: Unify ServersController._flavor_id_from_req_data param to server_dict  https://review.opendev.org/c/openstack/nova/+/89831508:34
opendevreviewPavlo Shchelokovskyy proposed openstack/nova master: Forbid use_cow_images together with flat images_type  https://review.opendev.org/c/openstack/nova/+/89822908:44
opendevreviewAlexey Stupnikov proposed openstack/nova master: Translate VF network capabilities to port binding  https://review.opendev.org/c/openstack/nova/+/88443913:28
opendevreviewAlexey Stupnikov proposed openstack/nova master: Translate VF network capabilities to port binding  https://review.opendev.org/c/openstack/nova/+/88443913:30
opendevreviewAlexey Stupnikov proposed openstack/nova master: Translate VF network capabilities to port binding  https://review.opendev.org/c/openstack/nova/+/88443913:31
opendevreviewTobias Urdin proposed openstack/nova master: [WIP] Handle scaling of cputune.shares  https://review.opendev.org/c/openstack/nova/+/89832613:40
sean-k-mooneytobias-urdin: this is not something we can do in nova13:41
sean-k-mooneytobias-urdin: we can discuss it again but we considered that option and rejected it before13:41
sean-k-mooneytobias-urdin: it's not just the cpu_shares that would need to be adjusted13:42
opendevreviewTobias Urdin proposed openstack/nova master: [WIP] Handle scaling of cputune.shares  https://review.opendev.org/c/openstack/nova/+/89832614:00
tobias-urdinsean-k-mooney: are you thinking about quota:cpu_shares flavor extra spec as well?14:01
sean-k-mooneytobias-urdin: yes14:02
tobias-urdini don't understand the reasoning for "libvirt broke us let's remove the default completely"14:02
sean-k-mooneyso on master (since zed) we no longer generate cpu_shares implicitly14:02
sean-k-mooneytobias-urdin: it was not libvirt14:02
sean-k-mooneythis was caused by your kernel being compiled with cgroups_v214:03
sean-k-mooneyit was broken by the kernel team changing the allowed ranges between api versions14:03
sean-k-mooneytobias-urdin: neither nova nor libvirt provided any normalisation of values, meaning the admin is responsible for selecting values that are allowed by their kernel cgroup version14:05
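
For context on the ranges being discussed: cgroup v1's cpu.shares accepts 2-262144, while its cgroup v2 replacement cpu.weight accepts 1-10000, which is why a value that was legal before a host's kernel switched to cgroups v2 can be rejected afterwards. A minimal sketch of per-version validation, purely illustrative (nova ships no such helper; the function name is invented):

    # Illustrative only: the valid range depends on which cgroup version
    # the compute host's kernel exposes, so no single bound fits all hosts.
    def cpu_weighting_in_range(value: int, cgroups_version: int) -> bool:
        if cgroups_version == 1:
            return 2 <= value <= 262144   # cgroup v1 cpu.shares range
        return 1 <= value <= 10000        # cgroup v2 cpu.weight range
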
tobias-urdinwhile that is true and libvirt never normalized the values, i don't agree, libvirt is an abstraction and should've handled it14:09
sean-k-mooneytobias-urdin: we made that argument to the libvirt maintainer and they disagreed14:09
sean-k-mooneytobias-urdin: https://bugs.launchpad.net/nova/+bug/1960840 is related14:12
sean-k-mooneywe considered extending the flavor validation https://review.opendev.org/c/openstack/nova/+/82906414:12
sean-k-mooneyhowever part of the problem is the ranges depend on the virt driver in use, and the cgroup version in the libvirt case14:13
sean-k-mooneyin that case actually the limit is a tc one rather than a cgroup one, but we considered deprecating and removing these as a result14:14
sean-k-mooneythat's simpler to do with the vif quotas since they generally no longer function14:15
sean-k-mooneyand have a neutron replacement that does14:15
tobias-urdinas an operator i basically have two options 1) apply patch https://review.opendev.org/c/openstack/nova/+/824048 and just live with the behaviour change where larger instances do not get favored for oversubscription, and i have to manually update cpu_shares or backport that patch to yoga to move to zed with live migration, or 2) scale the value and14:17
tobias-urdinset it correctly on all current instances, live migrate them over and make sure new instances also get a scaled value14:17
tobias-urdinfor an operator where touching the code is an issue this might be very messy14:17
tobias-urdini opted for adding it as a workaround that fixes the values upon nova-compute startup, exactly because of the upgrade issue14:18
sean-k-mooneyso im not against backporting the disabling of the implicit cpu_shares request, for what it's worth14:18
sean-k-mooneytobias-urdin: we have done this back to wallaby downstream because we considered it a release blocker for our product14:18
sean-k-mooneytobias-urdin: we don't have any other code that modifies a guest on startup like that and im not sure that is a pattern we should follow in general 14:19
sean-k-mooneytobias-urdin: if we were to backport https://review.opendev.org/c/openstack/nova/+/824048 to yoga would that solve your issue14:21
tobias-urdinit feels like kind of a limbo, if libvirt maintainers indeed informed that this will never change then the only way forward is to fix applications or drop it, i'm just surprised we went for drop it for a default value that has been there for years(?)14:22
tobias-urdinpersonally i would like it backported but i dont know if stable backport policy covers performance impact as well?14:23
opendevreviewAmit Uniyal proposed openstack/nova-specs master: WIP: Enforce console session timeout  https://review.opendev.org/c/openstack/nova-specs/+/89855314:27
opendevreviewTobias Urdin proposed openstack/nova stable/yoga: libvirt: remove default cputune shares value  https://review.opendev.org/c/openstack/nova/+/89855414:29
tobias-urdini guess ^ and then hope that operators take the proper action to update cpu_shares to default value or live migrate to get rid of it, annoying that nothing with this is optimal14:31
tobias-urdina shame that libvirt didn't even try introducing a cputune.cpu_weight, deprecating cputune.cpu_shares and scaling the value if required, like systemd did; systemd took backward compatibility more seriously14:33
tobias-urdin / end of rant :p14:33
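
The scaling tobias-urdin describes can be done with a ratio-preserving conversion in the spirit of systemd's CPUShares= to CPUWeight= translation (default 1024 shares maps to default weight 100). A hedged sketch, not anything nova or libvirt ships:

    # Sketch of a shares -> weight conversion that preserves the ratio to
    # the defaults (1024 shares ~ weight 100) and clamps to the cgroup v2
    # range; the exact rounding systemd applies may differ.
    def shares_to_weight(shares: int) -> int:
        weight = shares * 100 // 1024
        return min(max(weight, 1), 10000)
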
bauzasreminder : nova meeting in 1 hour and 10 mins here 14:50
opendevreviewTakashi Kajinami proposed openstack/nova master: Fix python shebang  https://review.opendev.org/c/openstack/nova/+/89859415:43
opendevreviewMerged openstack/nova stable/wallaby: Accept both 1 and Y as AMD SEV KVM kernel param value  https://review.opendev.org/c/openstack/nova/+/84393915:46
bauzas#startmeeting nova16:01
opendevmeetMeeting started Tue Oct 17 16:01:18 2023 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.16:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:01
opendevmeetThe meeting name has been set to 'nova'16:01
bauzashey folks16:01
dansmitho/16:01
elodilleso/16:01
bauzas#link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting16:01
* bauzas is currently hit by a big bus, so I'm sorry to not be around like I should16:02
auniyal0/\16:02
auniyalo/16:02
bauzasokay, let's start16:02
bauzas#topic Bugs (stuck/critical) 16:03
bauzas#info No Critical bug16:03
bauzas#link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 47 new untriaged bugs (+1 since the last meeting)16:03
bauzas#info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster16:03
Ugglao/16:03
bauzasartom: you're next in the roster list, fancy trying to look at some upstream bugs ?16:03
bauzasartom seems to be offline, let's move on and we'll see16:04
bauzas#info bug baton is artom16:04
bauzas#undo16:04
opendevmeetRemoving item from minutes: #info bug baton is artom16:04
artomEh? No I'm here16:04
bauzas #info bug baton is tentatively artom16:04
bauzasartom: I was just asking you whether you were happy to go looking at Launchpad16:05
artom(to my own surprise, I should say - I've been trying a new wayland-native IRC client, and it got lost somewhere on my 9 workspaces)16:05
bauzastrust me, this is a happy place compared to some other bug reporting tools I know 16:05
artomYep, I'll launch all the triage pads16:05
artomAnd pad all the triage launches16:06
bauzasI give you my pad, bro16:06
bauzasanyway, moving on16:06
artomPad accepted, brah16:06
bauzas#topic Gate status 16:06
bauzas#link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:06
bauzasthis was a funny week16:07
sean-k-mooneydansmith: have you seen that grenade failure more then once16:07
bauzasbut afaict, the gate postfailure is now fixed ?16:07
dansmithsean-k-mooney: yeah16:07
dansmithlooks like failure to ssh to the test instance though16:07
dansmithI haven't dug in deep yet16:07
sean-k-mooneyok so that might be another gate blocker16:08
bauzasshit16:08
sean-k-mooneyya it looks like an ssh issue but im not sure if its consistent or intermittent16:08
dansmithidk, but it's blocking the fix for another blocker :)16:08
bauzasI don't know about you folks, but I have the impression that the universe is after me16:08
sean-k-mooneyfor context https://github.com/openstack/grenade/blob/master/projects/70_cinder/resources.sh#L240 is failing16:09
sean-k-mooneybut have not looked at it properly either16:09
bauzasack16:09
bauzasI guess the cinder team is fully aware of the situation ?16:10
dansmithI doubt it's a cinder problem16:10
bauzasbut since we ssh to the guest, this is our mud, right ?16:10
sean-k-mooneyif they run grenade maybe but i just saw it this morning16:10
sean-k-mooneyim logging in to opensearch to see how common it is now16:11
bauzasand no guest console saying anything ?16:11
sean-k-mooneyagain i haven't debugged it so im not sure16:11
dansmithno guest console dump in grenade16:11
dansmiththat's a tempest thing16:11
dansmithlets not debug here16:11
bauzasI quite agree with the fact that we shouldn't debug now16:12
bauzasmoving on so16:12
bauzas#link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status16:12
sean-k-mooneylooks like 100 failures in the last 30 days16:12
sean-k-mooneybut ya lets move on16:12
bauzassome reds, but the gate isn't happy these days, we'll see next week16:13
bauzas#info Please look at the gate failures and file a bug report with the gate-failure tag.16:13
bauzas#topic Release Planning 16:13
bauzas#link https://releases.openstack.org/caracal/schedule.html16:14
bauzas#info Nova deadlines are not yet defined and will be once the PTG happens16:14
bauzas#info Caracal-1 milestone in 4 weeks16:14
bauzaswe'll discuss about spec review days next week at the PTG16:14
bauzasbut it is a good idea to propose your specs this week or the week after, since we'll be around at the same time16:14
bauzasmessage is sent, moving on16:15
bauzas#topic Caracal vPTG planning 16:15
bauzas#info Sessions will be held virtually October 23-2716:15
bauzaswhich is next week, basically16:15
bauzaslet's be honest, I'm late in terms of preparing this PTG16:15
opendevreviewTakashi Kajinami proposed openstack/nova master: Drop remaining deprecated upgrade_levels option for nova-cert  https://review.opendev.org/c/openstack/nova/+/89861316:15
bauzasbut we will already have a nova-cinder x-p session16:16
bauzasFYI16:16
bauzasfeel free to add any cinder-related topics in the nova ptg etherpad, I'll move them to the right place in order for them to be discussed16:16
bauzasno other PTLs came to me until now16:16
bauzasbut I guess this will come soon16:17
bauzasI also haven't seen yet any cross-project-ish topic in the nova ptg etherpad, but if so, I'll do the liaison16:17
bauzas#info Register yourselves on https://ptg2023.openinfra.dev/ even if the event is free16:17
bauzasthis is free, tbc.16:17
bauzas#link https://etherpad.opendev.org/p/nova-caracal-ptg PTG etherpad16:18
bauzasthat's the etherpad I was referring to, seconds ago16:18
sean-k-mooneywe can probably just join their room if they have one or two topics 16:18
bauzasand yet the reminder16:18
bauzas#info add your own topics into the above etherpad if you want them to be discussed at the PTG16:18
bauzassean-k-mooney: yeah that's the plan, the nova-cinder x-p session will be in their room16:18
bauzasthe exact timing of the nova-cinder session is already written in the etherpad (wed 5pm IIRC)16:19
bauzasor maybe thur, my brain is playing with me16:19
bauzasthat reminds me, we shall cancel next team meeting16:20
bauzasanybody disagrees ?16:20
bauzasI take your silence as no16:20
* sean-k-mooney nods16:21
dansmithobvious :)16:21
bauzas#agreed next Nova weekly meeting on Oct 24 is CANCELLED, go join the PTG instead, you'll have fun16:21
* bauzas doesn't know what to do in order to incentivize operators to join16:21
bauzasI should try to juggle with balls, maybe16:22
bauzasoh, last point16:22
bauzasshall we run again this marvelous and successful experience that is the operator-hour ?16:22
bauzasI mean, I can just unbook some nova slot and officially pretend this is an operator hour16:23
dansmithI think we should try yeah16:23
bauzasand if nobody steps up, which would be sad and unfortunately expected, we could just consume the regular nova etherpad16:23
bauzasokay, then I'll do the flip for tuesday 4pm16:24
bauzas4pm UTC allows us to get EU and US east-coast ops16:24
bauzasand we could continue to discuss at 5pm if we really have audience and hot topics16:25
bauzas#action bauzas to set up some operator hour, preferably Tuesday around 4pm UTC16:25
bauzas#topic Review priorities 16:26
bauzas#link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)16:26
bauzas#info As a reminder, people eager to review changes can +1 to indicate their interest, +2 for asking cores to also review16:26
bauzasyet again, taking the action to propose a Gerrit dash, once I have 5 mins of my time for doing this easy peasy16:26
bauzas#topic Stable Branches 16:26
bauzaselodilles: ?16:26
elodillesyepp16:27
elodillesi'm not aware of any stable gate issues16:27
bauzasthe universe is smiling at me then16:27
elodillesthough nova-ceph-multistore is suspicious on stable/victoria16:27
elodillesbut need more check16:27
elodillesotherwise gates should be OK16:28
elodillesalso, some bug fixes landed already on stable/2023.1 (antelope), so i'll propose a release patch, if that's OK for people16:28
bauzaselodilles: thanks16:28
bauzaselodilles: and yeah, sounds cool to me16:29
elodilles++16:29
elodillesand the usual:16:29
elodilles#info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci16:29
elodillesadd stable gate issues there ^^^16:29
bauzasfwiw, we'll discuss the State of Wallaby and Ussuri16:29
elodillesif you encounter any16:29
bauzasat the PTG*16:29
elodillesbauzas: ACK16:29
elodilles(wallaby, victoria, ussuri, you mean?)16:30
elodillesanyway, we'll discuss at PTG :)16:30
* bauzas doesn't know why but I always skip victoria16:30
bauzasthis is like the thanos blip16:30
bauzasI pretend victoria never existed16:31
bauzaselodilles: heard any other projects besides Cinder pursuing the idea to drop those releases ?16:31
elodillesbauzas: well, some projects have eol'd their xena already16:32
elodillesbauzas: let me check quickly16:32
bauzassee, the ship has sailed then16:32
bauzasshould be a very quick discussion at the PTG then16:32
elodilleskolla & magnum, otherwise i think projects are quiet about it yet16:32
elodillesso no other projects yet16:33
*** ralonsoh is now known as ralonsoh_ooo16:33
bauzasokay, we'll see at the PTG16:33
elodilles+116:34
bauzasthanks16:34
elodilles++16:34
bauzas#topic Open discussion 16:34
bauzasI have nothing in the wikipage16:34
bauzasanything anyone ?16:34
bauzaslooks not16:35
bauzashave a good day everyone and thanks all16:35
bauzas#endmeeting16:35
opendevmeetMeeting ended Tue Oct 17 16:35:22 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:35
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-17-16.01.html16:35
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-17-16.01.txt16:35
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-17-16.01.log.html16:35
elodillesthanks bauzas o/16:35
sean-k-mooneybauzas: dansmith  here is a short url to the grenade failures https://tinyurl.com/2a63sy7e16:38
dansmithsean-k-mooney: the fix patch passed grenade (or that phase of grenade) this time, so clearly not a hard fail17:08
sean-k-mooneyack 17:21
sean-k-mooneyit looks like it started around october 5th17:22
auniyalfrom this log https://3594ebcd65d47df3e70b-6ec9504d1ecc47a9ef6950d383ea355d.ssl.cf1.rackcdn.com/898435/1/check/nova-grenade-multinode/a3b1d83/controller/logs/grenade.sh_log.txt17:23
auniyalI think it's this https://github.com/openstack/grenade/blob/master/projects/70_cinder/resources.sh#L245 i.e. connecting to test_vm via ssh to verify that verify.txt has the test string, because it failed at `2023-10-17 13:59:13.930`17:23
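
In essence that resources.sh step SSHes into the pre-upgrade test instance and checks that a marker file survived; a rough Python rendering of the bash check (user, key path and the exact assertion are illustrative, not grenade's actual values):

    import subprocess

    def instance_data_survived(ip: str, key_path: str) -> bool:
        # SSH into the test_vm and read back verify.txt, which was written
        # before the upgrade; the grenade failure above is this step failing.
        result = subprocess.run(
            ["ssh", "-i", key_path, "-o", "StrictHostKeyChecking=no",
             "cirros@" + ip, "cat", "verify.txt"],
            capture_output=True, text=True, timeout=60)
        return result.returncode == 0 and "test" in result.stdout
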
noonedeadpunkHey folks. I'm not really getting how to satisfy this check: https://opendev.org/openstack/nova/commit/27f384b7ac4f19ffaf884d77484814a220b2d51d18:02
noonedeadpunkAs eventually, you have a query for _compute_node_select where filter is `{"service_id": None}`18:03
noonedeadpunk(as it's always being passed None as service_id here https://opendev.org/openstack/nova/src/branch/master/nova/db/main/api.py#L674)18:04
noonedeadpunkAnd that eventually does not work at all for upgrades in our CI, and is quite reproducible https://paste.openstack.org/show/bWODASkcA0PkKHUHf3xG/18:05
noonedeadpunkAnd that's on current HEAD of 2023.218:07
noonedeadpunkdansmith: I know you're on a tc meeting now, but maybe you have some thoughts on that18:07
dansmithnoonedeadpunk: have you done the upgrade before you run that check?18:08
* noonedeadpunk need to check logs18:09
dansmithnoonedeadpunk: perhaps you've done the upgrade but haven't started all the computes yet so they haven't fixed their records?18:10
dansmith(as noted in the error message)18:10
noonedeadpunkYes, online data migrations were run on N-1 as well according to the logs 18:11
noonedeadpunkand that's the first thing I've checked - that all is running https://paste.openstack.org/show/b8ZgQRT6N2qmGRRkdEZz/18:11
dansmiththe computes do their own migrations of this, so running online_data_migrations won't change it18:12
dansmithhang on18:12
noonedeadpunkwell, talking about that - computes were not restarted yet after the upgrade of API18:13
noonedeadpunk(I guess)18:13
dansmithwell, that'd be why then18:14
noonedeadpunkbut um... I guess then I'm confused at what point nova-status upgrade check should run?18:15
noonedeadpunkAs that is to ensure that things are ready for upgrade? and like - cancel upgrade if they're not?18:16
noonedeadpunkSo, in case of upgrade, you should now upgrade computes first and only then api/conductor/scheduler?18:16
noonedeadpunkhttps://docs.openstack.org/nova/latest/cli/nova-status.html#upgrade `Performs a release-specific readiness check before restarting services with new code`18:18
noonedeadpunkSo exactly like I did - run check, it failed, services were not restarted with the new code...18:18
dansmithnoonedeadpunk: okay maybe that should have been a warning I guess, idk18:18
sean-k-mooneynoonedeadpunk: nova-status is meant to be run before you do any upgrades18:18
noonedeadpunkahaa..... that what can be wrong18:19
noonedeadpunkas I'm running it with 2023.1 code before restart18:19
noonedeadpunksorry18:19
noonedeadpunk2023.2 code18:19
dansmithsean-k-mooney: running the new nova-status before you upgrade right?18:20
noonedeadpunkso 1. upgrade code. 2. run check 3. restart services if it passes rollback if it's not18:20
dansmithso we shouldn't have made that an error because without having run computes, you can't have had the uuids added18:20
noonedeadpunk(it's current flow)18:20
sean-k-mooneyso to upgrade to 2023.2 from 2023.1 you would run the 2023.2 version of nova-status before doing the upgrade18:20
noonedeadpunkyeah, I think that what I do actually18:21
noonedeadpunkLike code is upgraded but services still running old code18:21
sean-k-mooneydansmith: am i right about running the 2023.2 nova status when upgrading to 2023.2 or should it be the 2023.1 version 18:22
sean-k-mooneyi feel like im wrong about that18:22
sean-k-mooneywe are meant to deprecate things at least one release early so we normally put the status check in with the deprecation as a warning18:23
dansmithhonestly, I think it's supposed to be "am I done with my upgrade yet"18:23
noonedeadpunkbut how with old code you can be aware if it's safe to upgrade to the new one?18:23
dansmithwhich before you upgrade will be "no"18:23
dansmithbut maybe error should only be raised if you have to do something before upgrading, idk18:23
dansmithI wrote that at your request (IIRC) and don't recall any review comments about it :)18:24
sean-k-mooney:)18:24
sean-k-mooneyso the currnt  check woudl only pass after all compute are fully upgraded but it also should only be an error if th min compute service version is above the version where we  stated doing that18:25
sean-k-mooneynoonedeadpunk: is this currently blocking you from upgrading by the way18:26
sean-k-mooneyor were you asking how to make the check happy18:26
sean-k-mooneyif the latter then you need to start the computes with the bobcat code18:26
sean-k-mooneyand it should pass after they have all started18:26
noonedeadpunkWell. It makes our upgrade code fail in osa18:26
noonedeadpunkAnd I'm looking how to make our upgrade jobs happy as well as users that would perform upgrade18:27
sean-k-mooneyi would need to check our docs for which version of nova status we expect to be run and where18:27
noonedeadpunkIt says nothing after Xena....18:28
noonedeadpunkif you're about https://docs.openstack.org/nova/latest/cli/nova-status.html#upgrade18:28
sean-k-mooneynova-status on n-1 tells you about the things that are deprecated that we know might break you in the future18:28
sean-k-mooneyso before upgrade you should fix those with the n-1 version18:28
sean-k-mooneythen do the upgrade.18:29
noonedeadpunkmhm. Ok, I see, so it should run not on the upgraded code18:29
sean-k-mooneyok so reading the doc text18:29
sean-k-mooneyit says "Performs a release-specific readiness check before restarting services with new code."18:30
sean-k-mooneythat would imply that the new check should be a warning in 2023.2 and an error in 2024.1 i think18:30
noonedeadpunkWell, I read it in a way - upgrade code, run test, restart services18:30
dansmithsean-k-mooney: yep18:30
sean-k-mooneyalthough with slurp it should be a warning in 2024.1 as well?18:31
noonedeadpunkI'm not sure it should be a warning even....18:31
noonedeadpunkand yeah, error only in 2024.218:31
noonedeadpunkAs that is really expected, that you won't have that at this stage, so what to warn about...18:31
sean-k-mooneywell it may mean you have old non upgraded compute service records18:32
sean-k-mooneythat is pretty common where people scale in or remove say ironic18:32
noonedeadpunkbut you 100% have them until you restart computes, which will happen only afterwards?18:32
sean-k-mooneyand forget to remove the compute service records18:32
dansmithnoonedeadpunk: it needs to go from some state to "all good" after you're done with the upgade18:32
sean-k-mooneythat should be caught by the "Check: Older than N-1 computes" check18:32
dansmithI don't think slurp has anything to do with this18:33
noonedeadpunkwell, it does, as otherwise you can't jump 2023.1 -> 2024.1?18:33
noonedeadpunkas you have to do 2023.2 regardless ?18:33
noonedeadpunkOr I'm totally lost why slurp even a thing...18:34
sean-k-mooneyso i think given where this is run in the upgrade it would only be valid to be an error if the min compute service version was above that in 2023.218:34
dansmithslurp has nothing to do with it because it shouldn't be erroring out before you'e done whatever upgrade will result in the new ids18:34
dansmithsean-k-mooney: I think we could move it to an error and version check before we start relying on these, but no need to make it an error right now I think18:35
noonedeadpunkwell, docs for the command say, that it should be run BEFORE service restart. So you can't be done with upgrade, it's pre-upgrade check basically18:35
sean-k-mooneydansmith: i agree 18:35
sean-k-mooneynoonedeadpunk: yes it is18:36
noonedeadpunkbut tight it to the compute version is indeed a good idea18:36
sean-k-mooneythe command was intended to tell you before you upgrade that "x will break because you forgot to do something"18:36
noonedeadpunkas if it's n-2 for 2024.1 -> warning, n-1 - error18:36
sean-k-mooneynoonedeadpunk: so 2024.1 will be the first release to fully support n-218:37
noonedeadpunkyeah, but well, if the feature has appeared afterwards - you really can't do that before?18:37
dansmithnoonedeadpunk: can you file a bug so we can backport?18:37
noonedeadpunk++18:37
sean-k-mooneydansmith: we just need to change this to warning correct https://github.com/openstack/nova/blob/master/nova/cmd/status.py#L291-L30218:38
dansmithdoing it now18:38
noonedeadpunkjust to state this one more time - the warning will fire in 100% of cases, right?18:38
sean-k-mooneyhum yes until the upgrade is complete18:39
dansmithyes18:39
noonedeadpunkAnd what the docs say - `At least one check encountered an issue and requires further investigation.` So everyone who does an upgrade will spend some time finding out that they should just proceed, since that's not implied by the command's nature?18:39
dansmithit wouldn't make much sense to say "you pass this test with OK" and then after you restart it goes to warning or error18:39
dansmithif we had a way to skip a test because it's irrelevant, then we'd do that, otherwise I think warning makes sense18:39
noonedeadpunkit does on 2024.118:39
noonedeadpunkif you upgrade from 2023.218:40
sean-k-mooneydansmith: if we modify the check to do a min compute version check it might but then perhaps i was wrong to ask for this as it's not as useful as i thought18:40
dansmithdoes what? make sense? I disagree18:40
sean-k-mooneynoonedeadpunk: basically i asked for this to catch the case where you are upgrading again and had not started the computes, i.e. to 2024.2 i guess18:41
noonedeadpunkbut then I'm really not getting what the expectations on me as an operator are to perform the upgrade from 2023.1 to 2023.2, and what I should do when I see that warning18:41
dansmithsaying we checked a thing and everything is good when we really checked and see that you're not but we're ignoring the done-ness of it because you're in the middle of an upgrade doesn't seem to fit very well to me18:42
dansmithlike I say if we had "skipped" or "not due yet" then that'd make sense18:42
noonedeadpunkyeah, I kind of already got what it does and why exist :) just trying to say that raise warning for expected behaviour is also... meh?18:42
sean-k-mooneyi wonder if it would be better to remove this check.  i know i asked for it originally but im wondering if its adding benefit. im also wondering if there is another bug18:45
sean-k-mooneywe are doing " cn_no_service = main_db_api.compute_nodes_get_by_service_id(ctx, None)18:46
sean-k-mooney"18:46
sean-k-mooneyso cn_no_service is the list of all compute nodes that have service id set to None18:46
dansmithsean-k-mooney: it's really more applicable to the release where we start requiring it18:46
sean-k-mooneydansmith: ya it is. im wondering what happens for ironic in this case16:47
dansmithidk. I have a big "if ironic: return" in my head over all that stuff :P18:47
sean-k-mooneyi think this will always fail if you have ironic18:48
sean-k-mooneybut i am also pretty sure we have no test coverage18:48
noonedeadpunkhttps://bugs.launchpad.net/nova/+bug/203959718:49
sean-k-mooneyas in, in the compute manager i think you're correct, we have "if ironic: return" so i dont think we set the service id for ironic computes18:50
dansmithsean-k-mooney: no I meant I have that conditional in my _brain_ :)18:50
sean-k-mooneyoh hehe18:50
dansmithokay well, I have the FAIL->WARN thing queued, I can either submit that or change to a revert18:51
sean-k-mooneygiven this conversation im inclined to revert18:53
* noonedeadpunk still thinks that WARN is not perfect18:53
sean-k-mooneyand then when we actually start depending on this we can decide if we should add a status check18:53
sean-k-mooneyas you suggested before18:53
noonedeadpunkI mean, I can ignore WARNs and just make them pass.... But what's the value of warns then...18:53
noonedeadpunk(or depending on the source version for upgrade for 2024.118:54
sean-k-mooneynoonedeadpunk: no, warnings are meant to be "this thing is deprecated and you should fix it before upgrading again"18:54
dansmithsean-k-mooney:  well, maybe comment on that in the bug and we can get bauzas to opine tomorrow. ISTR he was pro check as well, but maybe only because you said it and it "seems right" :)18:55
noonedeadpunksean-k-mooney: then it's worth fixing docs to state that18:55
sean-k-mooneyif we keep the check we need to filter out ironic compute nodes too and/or add the min compute version check18:55
sean-k-mooneybut sure ill comment on the bug18:56
sean-k-mooneynoonedeadpunk: to be honest we try to avoid requiring the operator to do things to be able to upgrade18:56
noonedeadpunkAs right now reading the doc it sounds a bit different: it's fine now - but will not be next time18:56
sean-k-mooneyso we very rarely add nova-status checks because we try to make sure not to design something to require them18:57
noonedeadpunkand that said, you're doing a great job18:58
sean-k-mooneyif you look at the first couple they are things like you must have placement, or cells v218:59
noonedeadpunkAnd I'm not insisting that we're doing things right - it's just we're doing them how it's written more or less... And if you say we should do things differently - I'm really fine with that, and can propose update to docs as well18:59
sean-k-mooneyand more recently the service user token for the cve18:59
noonedeadpunkYeah, I guess one "but" for the service user token - is that it's not since 24.0.0, as it was added during Zed in fact and then backported... But anyway - it's a small detail19:00
noonedeadpunkI'm also wondering how nobody has raised that thing until now... As we've seen failures in CI right after the coordinated release, but nobody had time to look into the root cause...19:03
sean-k-mooneyright so the upgrade check is meant to help catch this before you deploy to production19:05
sean-k-mooneywhat this is also telling me is we don't actually fail on error in our grenade jobs.19:06
sean-k-mooneythat would have caught this before it merged i think19:06
dansmithwell, tbh I think we're running it after upgrade to see if you've got everything done19:10
dansmithand thus it's passing19:10
dansmithit's supposed to be used for that as well.. basically "am I done with my homework"19:10
sean-k-mooneyya that makes sense i guess although this would still fail in the multi node job in that case19:11
sean-k-mooneysince we only upgrade the controller node and not the extra compute19:12
noonedeadpunkSorry, I think I have one more question. Today I got a quite stupid idea, but wanna confirm how stupid it is :)19:26
noonedeadpunkHow bad is it to run `nova-manage cell_v2 discover_hosts --by-service` if you don't have ironic19:26
noonedeadpunkI see there's a performance penalty of doing that in the docs...19:27
noonedeadpunkBut that would simplify some logic quite a lot on the other hand...19:27
noonedeadpunkSo wondering how bad that trade-off might be from the nova perspective19:30
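
For reference, the command being weighed is a normal nova-manage invocation; a trivial wrapper a deployment tool might use (the flag is real, the wrapper itself is illustrative):

    import subprocess

    def discover_hosts(by_service: bool = False) -> None:
        # --by-service discovers hosts by compute service rather than by
        # compute node record; the docs note a performance cost, which is
        # the trade-off being asked about above.
        cmd = ["nova-manage", "cell_v2", "discover_hosts"]
        if by_service:
            cmd.append("--by-service")
        subprocess.run(cmd, check=True)
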
opendevreviewAlexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1994983  https://review.opendev.org/c/openstack/nova/+/86341619:57
opendevreviewAlexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1994983  https://review.opendev.org/c/openstack/nova/+/86341619:59
opendevreviewAlexey Stupnikov proposed openstack/nova master: Log some InstanceNotFound exceptions from libvirt  https://review.opendev.org/c/openstack/nova/+/86366519:59
colby__Hey Guys. We are in the process of upgrading our hypervisors to Yoga. We are on centos8 stream. Once I update openstack and qemu/libvirt packages instances are not able to be live migrated to the hypervisor. Im seeing the following:20:18
colby__qemu-kvm: Missing section footer for 0000:00:01.3/piix4_pm#0122023-10-17T19:07:20.896349Z qemu-kvm: load of migration failed: Invalid argument20:18
colby__did the commands to migrate change at all with yoga release? Where can I see that in the code? Im wondering if this is a qemu update issue20:19
opendevreviewMerged openstack/nova master: Install lxml before we need it in post-run  https://review.opendev.org/c/openstack/nova/+/89843520:50
*** haleyb is now known as haleyb|out22:30
