Tuesday, 2022-03-22

*** dasm|afk is now known as dasm|off00:14
*** hemna0 is now known as hemna02:28
*** hemna9 is now known as hemna02:45
*** clarkb is now known as Guest279003:17
*** bhagyashris is now known as bhagyashris|PTO05:43
opendevreviewStephen Finucane proposed openstack/os-resource-classes master: setup: Update Python testing classifiers  https://review.opendev.org/c/openstack/os-resource-classes/+/83464310:17
opendevreviewStephen Finucane proposed openstack/os-resource-classes master: setup: Replace dashes with underscores, add links  https://review.opendev.org/c/openstack/os-resource-classes/+/83464410:17
*** sfinucan is now known as stephenfin10:18
zigoIs there a way to evacuate a host that has 3 VMs that have affinity? Can I somehow tell nova "migrate them together" ?10:51
*** prometheanfire is now known as Guest011:48
*** ChanServ changes topic to "This channel is for Nova development. For support of Nova deployments, please use #openstack"11:55
*** osmanlicilegi is now known as Guest211:59
opendevreviewanguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration  https://review.opendev.org/c/openstack/nova/+/83467712:41
opendevreviewanguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration  https://review.opendev.org/c/openstack/nova/+/83467712:47
stephenfinsean-k-mooney: This isn't hugely important, but could you look at https://review.opendev.org/c/openstack/nova/+/723572/ and https://review.opendev.org/c/openstack/nova/+/723573/ today?12:50
opendevreviewanguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration  https://review.opendev.org/c/openstack/nova/+/83467712:54
opendevreviewStephen Finucane proposed openstack/nova master: objects: Don't use generic 'Field' container  https://review.opendev.org/c/openstack/nova/+/73823912:58
opendevreviewStephen Finucane proposed openstack/nova master: objects: Remove unnecessary type aliases, exceptions  https://review.opendev.org/c/openstack/nova/+/73824012:58
opendevreviewStephen Finucane proposed openstack/nova master: objects: Use imports instead of type aliases  https://review.opendev.org/c/openstack/nova/+/73801812:58
opendevreviewStephen Finucane proposed openstack/nova master: objects: Remove wrappers around ovo mixins  https://review.opendev.org/c/openstack/nova/+/73801912:58
opendevreviewStephen Finucane proposed openstack/nova master: WIP: add ovo-mypy-plugin to type hinting o.vos  https://review.opendev.org/c/openstack/nova/+/75885112:58
sean-k-mooneystephenfin: sure, i'll take a look at them now while i have context; they look reasonably short and i see gmann has already reviewed them13:00
sean-k-mooneygetting rid of the dict compat layer has been long overdue13:00
sean-k-mooneyit would be nice not to have to review for new usages of them as a dict13:00
opendevreviewStephen Finucane proposed openstack/nova master: doc: Remove useless contributor/api-2 doc  https://review.opendev.org/c/openstack/nova/+/82859913:02
EugenMayerWhen deploying via terraform and changing a flavor (thus replacing it), it seems like the old flavor was removed but not yet 'removed from the instance it was used by', and then it all failed. Now i'm stuck with: Unable to retrieve instance size information. Details Flavor 384bc436-a0cb-4e4a-80d1-26dd03743061 could not be found. (HTTP 404)13:54
EugenMayer(Request-ID: req-7c68445d-a8b5-4ef6-a11d-6f037402d92a) - so basically one of my instances references a flavor that no longer exists. Is there a way to somehow fix this?13:54
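For context: nova caches the full flavor in the instance record (as noted later in this log), so the guest itself keeps running; only lookups of the deleted flavor return 404. A hedged sketch to confirm the dangling reference, where <server> is a placeholder for the instance name or UUID:

```shell
# Show the flavor the instance still references; the embedded flavor data
# displays fine, but looking the deleted flavor up by ID fails.
openstack server show <server> -c flavor -c status
openstack flavor show 384bc436-a0cb-4e4a-80d1-26dd03743061   # expect HTTP 404
```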
*** dasm|off is now known as dasm14:20
artomAnyone able to run functional tests on ussuri?14:48
artomTrying to figure out if it's something local to me, or more widespread14:48
artomSeems to be hanging/timing out on:14:48
artomfunctional installdeps: -chttps://releases.openstack.org/constraints/upper/ussuri, -r/home/artom/src/nova/requirements.txt, -r/home/artom/src/nova/test-requirements.txt, openstack-placement>=1.0.014:48
* artom strace's14:49
sean-k-mooneyi can try it one sec14:50
artomSeems to be doing... something?14:50
artomLooping on https://paste.opendev.org/show/b45jbgPA429f5iKFJSEq/14:50
sean-k-mooneylooks like we are missing a fixture14:52
sean-k-mooneyfrom that trace14:52
sean-k-mooneywe should not be doing ioctl calls in general14:52
sean-k-mooneylike that implies we are doing file io or network configuration14:53
sean-k-mooneyits running fine for me14:56
sean-k-mooneywere you having a failing test?14:56
sean-k-mooneyor just would not install14:56
sean-k-mooneyi did locally change psycopg2 to psycopg2-binary in my test-requirements.txt but that is just because i don't have or want postgres installed on my laptop14:57
sean-k-mooneyso i don't have the headers to build psycopg2 from source14:58
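For reference, a sketch of that local-only tweak (not something to commit):

```shell
# Swap psycopg2 for the prebuilt wheel so installdeps doesn't need the
# PostgreSQL dev headers, then recreate the tox env so the change is picked up.
sed -i 's/^psycopg2\b/psycopg2-binary/' test-requirements.txt
tox -r -e functional
```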
artomSo there is a backport in progress15:01
* artom tries on a pristine ussuri repo15:01
*** Guest2790 is now known as clarkb15:01
artomBut... it's not running any tests (yet), it's on installdeps...15:01
sean-k-mooneygot the gerrit link? i can try that explicitly if you want15:02
artomsean-k-mooney, only local for now15:03
artomBackporting https://review.opendev.org/c/openstack/nova/+/796907/2/nova/tests/functional/libvirt/test_pci_sriov_servers.py#73 to ussuri15:04
sean-k-mooneyi had one failure 15:08
sean-k-mooneyFileNotFoundError: [Errno 2] No such file or directory: 'openssl'15:08
sean-k-mooneywhich is likely just down to the fact i'm running this on nixos15:08
artomSeems to be the same problem with a pristine ussuri...15:09
artomI should try on Ubuntu I guess?15:10
artomAlthough func tests should be platform-independent15:10
sean-k-mooneyhaving run them on macos last night, not as much as you would think15:15
sean-k-mooneywe have a bunch that fail because they detect it's not linux15:15
sean-k-mooneymaybe pass -r15:15
sean-k-mooneyor delete the .tox dir15:16
sean-k-mooneyin case you have some leftover issue from a previous run15:16
artomYep, tried with -r, same15:17
sean-k-mooneyodd. what distro are you currently using?15:17
sean-k-mooneyi can try on ubuntu if you like; i also have a centos 9 vm15:17
artomF3515:18
bauzasreminder : nova meeting in 41 mins here at #openstack-nova15:19
bauzasfwiw, DST is not impacting our meeting, as we use UTC 15:19
clarkbartom: sean-k-mooney: pip installs taking forever likely indicates a dependency resolver problem15:20
clarkbwe've seen that happen when the solver can't find a valid answer. However, constraints tend to fix that, and you supply constraints, so maybe it's not that15:20
sean-k-mooneyclarkb: i dont think it was the resolver15:20
sean-k-mooneyclarkb: i think artom is getting stack traces15:20
artomsean-k-mooney, no, just spinning in the void15:21
sean-k-mooneyoh have you added -v15:21
artomThe paste was a `strace -p` output15:21
sean-k-mooneyso you can see what's actually happening15:21
artomsean-k-mooney, *facepalm* lemme try that15:23
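A few hedged ways to see what a hung installdeps step is doing (log paths vary by tox version):

```shell
tox -r -vv -e functional                   # recreate the env with verbose output
ls .tox/functional/log/                    # pip's per-step logs (tox 3.x layout)
strace -f -p "$(pgrep -f 'pip install' | head -n1)"   # attach to the pip process
```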
sean-k-mooneyartom: f35 has a much newer gcc, libffi and kernel than ussuri was developed with, by the way. that strace was referring to ffi, presumably as part of compiling some of the c python modules, so there might be issues with trying to install ussuri on f35 to run the func tests15:35
zigoI just noticed that if a host is over its CPU ratio (because it has been reduced), then live-migrations are silently failing (only the scheduler gives a clue). Is this known? Is this considered a bug? Should I file the bug?15:42
zigoThe workaround is obviously to temporarily bump the CPU overcommit ratio, but that's still kind of annoying to do.15:43
sean-k-mooneyzigo: yes, it's a known issue15:43
sean-k-mooneyit has to do with how placement currently validates allocation candidates15:43
sean-k-mooneyif it's the issue i think it is15:43
zigoThanks.15:45
sean-k-mooneyif i remember correctly it also affects evacuate15:48
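For anyone hitting this, the temporary-bump workaround can be applied against placement directly; a minimal sketch using the osc-placement plugin, where <source-host> is the over-committed compute node. Note that nova's resource tracker may reset the ratio on its next periodic update unless the matching nova.conf ratio is raised too:

```shell
# Temporarily raise the VCPU allocation ratio on the source host so its
# allocation is valid again, then migrate/evacuate and restore the old value.
RP=$(openstack resource provider list --name <source-host> -f value -c uuid)
openstack resource provider inventory set "$RP" \
    --resource VCPU:allocation_ratio=8.0 --amend
```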
bauzaslast reminder : nova meeting in 9 mins15:51
bauzas#startmeeting nova16:00
opendevmeetMeeting started Tue Mar 22 16:00:16 2022 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.16:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:00
opendevmeetThe meeting name has been set to 'nova'16:00
bauzashey ho16:00
elodilleso/16:00
chateaulav\o16:00
bauzas#link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting16:00
gmanno/16:00
dansmitho/16:00
artom~o~16:01
bauzasok, let's start16:01
bauzas#topic Bugs (stuck/critical) 16:01
bauzas#info No Critical bug16:01
bauzas#link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 28 new untriaged bugs (+0 since the last meeting)16:01
bauzas#help Nova bug triage help is appreciated https://wiki.openstack.org/wiki/Nova/BugTriage16:01
bauzas#link https://storyboard.openstack.org/#!/project/openstack/placement 26 open stories (0 since the last meeting) in Storyboard for Placement 16:01
bauzasany bug in particular to discuss ?16:02
bauzasI triaged a few of them but I need to create some env for verifying some others16:02
bauzasok, looks not16:03
bauzasnext,16:03
bauzas#topic Gate status 16:03
bauzas#link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:03
bauzas#link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status 16:03
bauzas#info Please look at the gate failures and file a bug report with the gate-failure tag.16:03
bauzasI haven't seen any new problem16:03
gmannone update for centos9 stream volume detach failure16:04
gmannit is fixed now as the SSH-able series is merged #link https://review.opendev.org/q/(topic:bug/1960346+OR+topic:wait_until_sshable_pingable)+status:merged16:04
gmannI have made centos9-stream a voting job in the tempest gate16:04
bauzas\o/16:04
dansmithgmann: really, that makes it all pass reliably?16:05
gmannand proposed to be voting on the devstack side too #link https://review.opendev.org/c/openstack/devstack/+/83454616:05
gmanndansmith: for now yes:)16:05
dansmithcool16:05
dansmithfips job in glance was still failing this morning I think, but I will look and see if it ran against that or not16:05
gmannand we will monitor it carefully now as we made it voting. n-v jobs always get ignored somehow16:05
dansmithyeah cool16:05
artomSo I wonder, is there anything else at the guest:host interaction level that would explain why Ubuntu doesn't need to wait for SSHABLE?16:06
dansmithartom: I'm super curious as well, as this seems like an odd thing to have changed with just newer libvirt/qemu, although certainly possible16:06
dansmithwe'll see if more weirdness comes out of running it in the full firehose16:06
gmanndansmith: yeah, you can try with recheck. this patch fixed the last test #link https://review.opendev.org/c/openstack/tempest/+/83160816:07
bauzasagreed, it's weird but ok16:07
dansmithas I was seeing other problems (on stream 8 mind you) when we were running it voting16:07
bauzasthanks gmann btw. for having worked on it :)16:07
gmannnp! just carried lyarwood's work in this.16:07
bauzascan we move ?16:07
gmannyeah16:08
bauzaskk16:08
bauzas#topic Release Planning 16:08
bauzas#link https://releases.openstack.org/yoga/schedule.html#y-rc1 RC1 is past now16:08
bauzas#link https://etherpad.opendev.org/p/nova-yoga-rc-potential Etherpad for RC tracking16:09
bauzas#link https://bugs.launchpad.net/nova/+bugs?field.tag=yoga-rc-potential RC potential tags16:09
bauzasthis is Regression chasing time !16:09
bauzaswe only have 2 days to provide a RC2 if we find a regression16:09
bauzasfor the moment, we haven't seen any of them16:09
bauzas#info RC2 deadline is in 2 days, so we can only fix regressions before16:10
bauzasactually, this is RC-deadline16:10
bauzasnot really a specific RC216:10
bauzaswe could have an RC2 release tomorrow and then an RC3 on Thursday16:10
* dansmith watches where he steps in here16:11
bauzasit's just: either we find regressions before Thursday and merge the fixes before then, or we'd have a Yoga GA release with some known issue and could only fix the regression in a later stable release16:11
bauzasbut, as you can see https://bugs.launchpad.net/nova/+bugs?field.tag=yoga-rc-potential is empty16:12
bauzasanyway16:12
bauzasthat's it for me16:13
bauzasany question or discussion for Yoga before we go to the next topic ?16:13
bauzaslooks not16:14
bauzas#topic PTG preparation 16:14
bauzas#link https://etherpad.opendev.org/p/nova-zed-ptg Nova Zed PTG etherpad16:14
bauzasnothing to say here; please add the topics you would like to discuss16:15
bauzasthe PTG will be in 2 weeks, so I'd prefer to see all the topics before the end of next week16:16
bauzasfor the moment, we only have a few of them16:16
bauzasanything to discuss about the PTG ?16:16
bauzasreminder, PTG will be April 4 - 8, 202216:17
Ugglabauzas, sorry for the noob question, will we review bp/specs for zed ?16:17
bauzasUggla: no worries, it's your first PTG16:18
Ugglashould we put the bp/specs in the agenda ?16:18
bauzasUggla: in general, we discuss specs when people have something they'd like the community to reach consensus on16:18
bauzasUggla: we don't generally look at all the open specs16:18
bauzaspeople can also go and discuss something they'd like to see or work on, without having a spec yet16:19
bauzasUggla: look at the Xena PTG we had so you'll see what we discussed https://etherpad.opendev.org/p/nova-xena-ptg16:19
Ugglabauzas, I will have a look, thanks.16:20
bauzasok, moving on, then16:21
bauzas#topic Review priorities 16:21
bauzas#link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B116:21
artom(No osc/sdk in there?)16:22
bauzasI have seen new changes16:22
artom(What with moving towards deprecation of the novaclient CLI)16:22
bauzasartom: nope16:22
bauzasartom: osc is another community but I understand your point16:23
bauzasartom: it's just that this label is only supported for our repos16:23
artomAh, right16:24
bauzas(AFAIK)16:24
sean-k-mooneyartom: we deprecated the novaclient cli already16:24
artomYeah, I wasn't sure16:24
bauzasartom: but if you want us to look at OSC changes, we can do this by some etherpad16:24
sean-k-mooneythe python bindings are still allowed to be extended16:24
bauzasartom: but you know what ? let's discuss this at the PTG to see how the nova community can review those OSC changes :)16:25
bauzasartom: hopefully you'll provide a topic, right? 16:25
bauzas:)16:25
artomShould've kept my fat mouth shut :P16:26
* artom will16:26
bauzasartom: :p16:26
bauzasmoving on16:26
bauzas#topic Stable Branches 16:26
bauzaselodilles: your point16:26
elodilles#info xena branch seems to be blocked by nova-tox-functional-centos8-py36 job - https://zuul.opendev.org/t/openstack/builds?job_name=nova-tox-functional-centos8-py3616:26
elodilles#info pike branch is blocked - fix: https://review.opendev.org/c/openstack/nova/+/83366616:26
elodillesand finally a reminder:16:27
elodillesVictoria Extended Maintenance transition is due ~ in a month (2022-04-27)16:27
bauzaswow, time flies16:27
elodillesyes yes16:27
elodillesthat's it i think16:28
bauzaselodilles: can we make the centos8 job non-voting ?16:28
elodillesbauzas: that's an option16:28
bauzasdoes someone already look at the issue ?16:28
elodillesi had a quick look only16:28
artomSeems to be spurious... 16:29
bauzaselodilles: ping me tomorrow morning and we'll jump onto it16:29
artomThe last few runs passed16:29
elodillesit seems to be related to some mirror issue, but not sure16:29
bauzasartom: not the stable/xena branch16:29
gmannyeah, seems like a mirror issue; otherwise we would see the same version conflict in other places also16:29
elodillesbauzas: sure, thanks16:29
artom... then which? stephenfin has a fix up for the pike one, looks like...16:30
artomSo 'INFO: pip is looking at multiple versions of openstack-placement' is new, no?16:30
bauzasfor the pike branch, agreed on reviewing the fix16:30
artomOn my laptop, for stable/ussuri, it's taking forever16:30
gmannelodilles: let's wait for a few more runs.16:30
bauzasI don't want us to dig into the job resolution for now16:31
bauzasbut people can start looking at it after the meeting if they want16:31
elodillesgmann: ack16:31
bauzasit's just that I don't want this branch held up because of one single job16:31
bauzasgmann: elodilles: I'd appreciate some DNM patches to make sure we don't hit this on every change16:32
bauzaslooks like we discussed all the thingies by now16:33
*** Guest0 is now known as prometheanfire16:33
bauzascan we move ?16:33
gmanndid recheck on 828413, let's see16:33
bauzasgmann: ++16:33
elodillesyes, thanks, let's move on16:34
bauzaslast topic then16:35
bauzas#topic Open discussion 16:35
bauzasI have one16:35
bauzas(bauzas) Upgrade our minimum service check https://review.opendev.org/c/openstack/nova/+/83344016:35
bauzastakashi kindly provided a change for bumping our min version support16:35
bauzasbefore merging it, I'd like to make sure all people here agree on it 16:36
dansmithso one thing we might want to consider,16:36
bauzas(that said, there is a grenade issue on its change, so even with +Wing it...)16:36
dansmithis a PTG topic about the check (and the problems with it that we didn't foresee) to see if there's any better way we could or should be doing that whole thing16:36
dansmithand just punt on the patch until we have that discussion16:36
bauzasI already opened a PTG topic 16:37
bauzasI'll add the service check in it16:37
dansmithokay16:37
bauzasjust done16:39
bauzaspeople agree with this plan ?16:39
bauzaseither way, as said the change itself has grenade issues that need to be fixed16:39
bauzasand I don't see any reason for rushing on it being merged16:39
bauzaswe have the whole zed timeframe for this16:39
elodilles(grenade issue might be because devstack does not have yet stable/yoga)16:40
elodilles(so that should be OK in 1 or 2 days)16:40
bauzaswe haven't released stable/yoga16:40
bauzasthis will be done on next Wed16:40
bauzaselodilles: but yeah, sounds like it16:41
elodilles++16:41
gmannyeah, we should do that soon; neutron faced the same issue.16:41
gmannelodilles: I will discuss in release channel16:41
elodillesgmann: ack16:41
bauzasok, I guess we're done then16:43
artomOh, can we chat about https://review.opendev.org/c/openstack/nova/+/833453?16:43
bauzas#agreed let's hold https://review.opendev.org/c/openstack/nova/+/833440 until we correctly discuss this at the PTG16:43
* bauzas clicks on artom's patch16:44
artomReally only bringing it up here because, as a periodic, we'd have to check up on the status, presumably here16:44
artomHere == the meeting16:44
bauzasartom: yeah, that's my point16:45
bauzaswe already do a few checks during the gate topic16:45
bauzasbut I wonder whether it wouldn't be better if we could agree on this at the PTG16:45
EugenMayeris it possible to set the flavor of an instance manually using the api?16:46
EugenMayerOh - sorry. Still meeting time. Ignore me.16:46
artombauzas, doesn't seem controversial, but OK :)16:46
bauzasartom: yup, I don't disagree16:47
bauzasdo people have concerns with adding a periodic check on whitebox ?16:47
artomI guess the downside is CI resource usage, but... one nightly job seems OK?16:47
bauzasI heard news of some CI resource shortage, but I'm not in the TC16:47
artomYet ;)16:48
bauzasdansmith: gmann: can we just add a periodic job without being concerned ?16:48
artomdansmith said someone is pulling out16:48
artom(phrasing </archer>)16:48
dansmithperiodic is probably not a big deal I would imagine16:48
dansmithI think we're going to need to trim down nova's per-patch jobs too, as it's getting pretty heavy16:48
bauzasyeah, I don't think this is a big thing if we add a periodic16:49
bauzasdansmith: adding a PTG topic about it fwiw16:49
gmannyeah, and periodic also we can see if daily or weekly? 16:49
bauzastbh, the only question is how often we'll check its status, and that will be weekly (during the team meeting)16:50
gmannbauzas: artom along with periodic, add it in the experimental pipeline too for manual triggering. that helps avoid adding it in the check/gate pipeline if anyone wants to run it manually16:51
artombauzas, yep, no point in making it daily if we're only checking the status weekly16:51
artomgmann, ack, can do16:51
gmann+116:51
dansmithyeah daily seems excessive16:51
bauzasartom: update this change with the weekly period time and mention in the commit msg we'll need to verify it during weekly meetings16:53
* artom will have to find an example of a weekly periodic to figure out the correct Zuul magic words16:53
bauzaslook at the placement ones16:53
artomOh yeah!16:53
gmannartom: https://github.com/openstack/placement/blob/master/.zuul.yaml#L6416:53
gmannyeah16:54
artomHah, that was easy16:54
bauzasthis is another pipeline IIRC16:54
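For the archive, the placement example gmann linked boils down to a stanza like this in .zuul.yaml (a sketch; "whitebox-devstack-multinode" is a placeholder job name, and gmann's suggested experimental trigger is included):

```yaml
# Hypothetical fragment for nova's .zuul.yaml, mirroring placement's
# periodic-weekly pipeline; the job name is a placeholder.
- project:
    periodic-weekly:
      jobs:
        - whitebox-devstack-multinode
    experimental:
      jobs:
        - whitebox-devstack-multinode
```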
sean-k-mooneyby the way i think weekly jobs in general suit us better as we can review them in the weekly meeting16:54
sean-k-mooneyif we have a nightly we probably won't look at it every day16:54
bauzasoh yeah16:54
bauzasI just hope this meeting won't transform into some CI meeting16:54
chateaulavartom: nova zuul has an example of weekly periodic now16:54
bauzasif we start adding more periodics16:55
artomI mean, feel free to nack the idea entirely :)16:55
sean-k-mooneybauzas: well it should just be "are they green? no? we should look at X after the meeting"16:55
artomI'll obviously try to debate/convince you16:55
bauzasartom: nah, I like the idea, I just want us to buy into it16:55
artomBut if we think whitebox doesn't bring value to Nova CI, let's just not do it :)16:55
bauzaswe're approaching meeting's end time16:56
artomEnd times are nigh16:56
bauzasany other item to mention before we close ?16:56
sean-k-mooney:)16:56
* artom gets raptured16:56
sean-k-mooneyah i actually had two blueprints i wanted to raise16:56
sean-k-mooneywe deferred updating the defaults for allocation ratios16:56
bauzassean-k-mooney: oh I forgot to mention I changed Launchpad to reflect zed as the active series16:57
sean-k-mooneyshall we proceed with that or discuss at ptg16:57
sean-k-mooneyalso kashyap's blueprint for using the new libvirt apis16:57
bauzaswe're a bit short in time for reapproving specless bps by now16:57
sean-k-mooneycan we retarget both to zed16:57
sean-k-mooneyack16:57
bauzasbut we can look at them during next meeting16:57
sean-k-mooneywe can discuss it next week or at the ptg16:57
bauzaswell, Zed is open 16:58
bauzasI'm OK with approving things by now16:58
bauzasand the specs repo is ready16:58
bauzassean-k-mooney: just propose your two blueprints for the next meeting so we'll reapprove them (unless there are concerns, of course)16:58
sean-k-mooneyack16:59
bauzasfwiw, I leave the non-implemented blueprints in Deferred state16:59
bauzasonce we start reapproving some, I'd change back their state17:00
bauzasbut anyway, we're on time17:00
bauzasthanks all17:00
bauzas#endmeeting17:00
opendevmeetMeeting ended Tue Mar 22 17:00:16 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)17:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.html17:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.txt17:00
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.log.html17:00
elodillesthanks bauzas o/17:00
bauzaswas a productive meeting, after all17:00
EugenMayerIs there any 'good way' to set the task-state of an instance that has been stuck in 'image backup' due to an issue in glance? i.e. the field OS-EXT-STS:task_state is set to "image_backup"17:03
*** tosky is now known as Guest3817:04
*** tosky_ is now known as tosky17:04
EugenMayeri see there is 'nova set --state' or 'nova reset-state' but both seem to operate on the instance power state (OS-EXT-STS:power_state) or OS-EXT-STS:vm_state - but not the task-state17:05
zigosean-k-mooney: Yeah, this was an evacuate operation.17:07
dansmithzigo: I thought you said live migrate?17:07
sean-k-mooneyzigo: ok, the reason this breaks is that for evacuation we only have 1 allocation in placement against both hosts17:07
sean-k-mooneyand since the source host is over capacity because you reduced the allocation ratio, the entire allocation is considered invalid17:08
sean-k-mooneywe discussed this at the ptg 1 or 2 ptgs ago17:08
sean-k-mooneyi can't recall if we said we should fix this after consumer types, but i don't think we had a workaround other than temporarily increasing the allocation ratio so it's no longer overcommitted17:09
dansmithsean-k-mooney: we could also solve it the way we do for cold migration, which is to hold the allocation on the source with the migration uuid, right?17:09
sean-k-mooneydansmith: yes we could, that was one of the options17:10
sean-k-mooneyim trying to find the launchpad bug17:10
bauzasdansmith: sean-k-mooney: yeah, the Migration uuid for evacuate seems the better and cleaner approach17:11
sean-k-mooneybauzas: that is what we were proposing doing17:12
sean-k-mooneybut i dont think anyone has worked on it since17:12
bauzas:-)17:13
sean-k-mooneyhttps://bugs.launchpad.net/nova/+bug/194319117:13
sean-k-mooneythat might be it17:13
EugenMayerI'am looking on https://wiki.openstack.org/wiki/CrashUp/Recover_From_Nova_Uncontrolled_Operations to understand how to recover from the crashed task state 'image_backup' but i'am not sure how to actual act upon that. Should i use the nova api?17:13
sean-k-mooneyand https://bugs.launchpad.net/nova/+bug/192412317:14
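For anyone debugging this, the single shared allocation described above is visible directly in placement; a hedged sketch with the osc-placement plugin, where the consumer is the instance UUID:

```shell
# During an evacuation that hits this bug, the instance's one allocation
# spans both the source and destination resource providers.
openstack resource provider allocation show <instance-uuid>
```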
bauzassean-k-mooney: some people expect bugs to be fixed automatically :)17:14
bauzaswe don't yet have AI bots smart enough to close the gaps17:14
sean-k-mooneyEugenMayer: the wiki is basically unmaintained17:14
EugenMayeri see. Thank you17:15
sean-k-mooney in the early days of openstack we used the wiki for specs and project-created docs (docs not by the docs team)17:16
EugenMayerI'm really not sure how to recover from the failed task the proper way. The only way i know of yet, which is heavy, is: reset the state, then restart the compute host the vm is hosted on so the state is somewhat recovered17:16
sean-k-mooneythere is no way to recover from it really beyond that17:17
sean-k-mooneywe don't provide an api to allow tasks to be restarted17:17
dansmithreset state and reboot the vm is what I'd try first,17:17
sean-k-mooneyyep same17:18
dansmithnot restarting the compute I'd hope17:18
sean-k-mooneyya, that normally should not be required17:18
sean-k-mooneyi guess it would depend on why it failed17:18
dansmithdefinitely not expected for anything like a glance thing17:18
EugenMayertrying that. AFAIR i had to restart the entire compute last time. Anyway, trying that17:18
sean-k-mooneydo you recall why?17:19
EugenMayerdansmith well this happens for the 4th time. A stuck glance image backup task leaves the task_state of the instance in a broken state17:19
dansmithhonestly restarting the compute shouldn't even do anything, AFAIK17:19
sean-k-mooneyi wonder if the main thread of the compute agent was blocked on an io operation17:19
sean-k-mooneythat is the only thing i can think of that would be fixed by an agent restart17:20
sean-k-mooneywe were not using a thread pool on some of the older releases for those17:20
dansmithsean-k-mooney: compute is the thing that "consumes" the task_state and turns it into a vm_state, so to speak, so maybe we clear task_state in init_host in some cases?17:20
EugenMayerwell i'm on xena, so not really old17:20
dansmithbut either way, reset_state to error is supposed to let you clear everything by enabling force reboot I think17:21
dansmithor that's the intent17:21
sean-k-mooneydansmith: i think we do yes but not sure about this case17:21
EugenMayerdansmith it is clear, software-wise, that there is more than one misconception in the microservice and task callstack. I'm not sure if glance is required to call a webhook on success or error (not sure how the result is propagated) but this is simply not the right design.17:22
EugenMayershould the task crash on glance, neither success nor error is ever called, and there seems to be nothing to recover from that17:22
dansmithEugenMayer: none of that :)17:22
dansmitheverything is nova->glance17:22
sean-k-mooneyi believe this is a blocking call to do the upload to glance17:23
sean-k-mooneyif it were async then either nova would poll17:23
sean-k-mooneyor we would get an external event from glance17:23
dansmithso depending on the failure, nova should clean up whatever it can.. an upload to glance for sure should be recoverable on our end, so that's likely its own bug if we're missing something17:23
sean-k-mooneybut i think image upload is blocking17:23
dansmithsean-k-mooney: none of that with glance17:23
sean-k-mooneyright, we don't do polling or external events, right17:24
sean-k-mooneywe just do two blocking calls17:24
EugenMayerif it is a blocking task, well, the blocking call should clean up - which it seems to not do17:24
sean-k-mooneyone for creating the image and the second for the data upload17:24
dansmithEugenMayer: if you can repro the problem that's definitely a bug candidate17:24
sean-k-mooneyEugenMayer: yes, it should clean up if we get an error from glance17:24
dansmiththere are some situations where it might not make sense to clean up, but I would think a glance thing would always be something we can handle17:24
EugenMayerdansmith i have reproduced this 4 times now. If you tell me what to gather, i will grab the logs you need the 5th time - which will happen17:25
dansmithEugenMayer: logs17:25
EugenMayerwhich logs to get?17:25
dansmithall of them? :)17:25
sean-k-mooneydansmith: i would expect the vm to go back to active or error if we dont clean up right17:25
dansmithnova-compute, nova-api at least17:25
dansmithsean-k-mooney: error, yeah17:25
EugenMayervm is in active state, power is on, task_state is image_backup 17:25
dansmiththat said, reset_state resets task_state so that should be the way to get out here17:26
sean-k-mooneyyou can reset state to active17:26
EugenMayerreset-state --active + reboot seems to recover just right. Also viewing the console works (which is one of the problems with a partial state recovery)17:26
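For the archive, the recovery sequence that worked here, roughly (admin credentials assumed; <server> is a placeholder):

```shell
# Clear the stuck task_state (image_backup -> None) and force vm_state back
# to active, then hard-reboot only if the guest actually needs it.
nova reset-state --active <server>
openstack server reboot --hard <server>   # optional, per the discussion above
```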
dansmithEugenMayer: we're saying that what we would expect is vm_state=ERROR,task_state=None17:26
sean-k-mooneyrather than error, and potentially just trigger the backup/snapshot again17:26
EugenMayerdansmith that never happened yet17:26
dansmithEugenMayer: I know, I'm saying that's what we expect nova should be doing17:27
sean-k-mooneyEugenMayer: do you know why the glance operation is failing?17:27
sean-k-mooneydansmith: i could see an argument to be made that we would have vm_state=Active task_state=None but the snapshot action marked as error in the server event log17:28
sean-k-mooneyif the vm was indeed still running properly, depending on how it failed17:28
EugenMayerthere is so much one can break right now. e.g. another topic is using terraform and rescaling a flavor. In 2 of 5 cases the following happens (i cannot tell you exactly): the old flavor is deleted (too early), the new one is created, then the instance is fetched; this fails since the flavor_id of the old flavor is still set and cannot be found. TF17:28
EugenMayercancels and that's it17:28
dansmithsean-k-mooney: the problem is one of signaling, which is why we (originally as designed) went to error,None for everything and then you do a start (which does nothing) to reset back to active as sort of "ack"17:28
EugenMayerstuck again - so stuck that one now needs to shelve the instance and restore it from glance using the 'new flavor'17:29
EugenMayeri did not yet check the tf openstack provider implementation to see what they have implemented and how that is a timing issue in the first place (since it does not happen every time) .. but if i look at the openstack rest api / nova api .. swapping flavors is not designed at all.17:30
sean-k-mooneyEugenMayer: well, flavors are intended to be immutable, so you ideally would not delete them until all instances using them are resized17:30
sean-k-mooneywe do cache the flavor17:30
sean-k-mooneyin the instance17:30
dansmithEugenMayer: are you describing two issues or one? if the former, then let's not complicate diagnosing this one17:30
sean-k-mooneybut really you should try to avoid removing flavors or images that are in use17:30
EugenMayerwell i cannot tell why the tf openstack provider deletes the flavor too early or whatever happens in detail (i did not check the sequence in the code yet)17:31
sean-k-mooneyEugenMayer: it should not delete it at all17:31
EugenMayerdansmith sorry, my bad. second issue (the latter one with the flav)17:31
sean-k-mooneyit sounds like they are implementing the hacky workflow that horizon used to have17:31
dansmithEugenMayer: yeah, not helping :)17:31
EugenMayerdansmith sorry. my bad.17:32
sean-k-mooneywhere they allowed you to update a flavor by deleting and recreating it, but ya, let's not talk about that issue now17:32
EugenMayerwell if you ask me about the error state - one should not mark the instance as 'error' if an image_backup task failed - there is no reason for that. Creating a glance image does not require the instance to shut down or similar; this said, i assume both tasks (the instance running) and the creation of the image can work in parallel and are independent17:35
dansmithEugenMayer: going to error state is just the nova convention (in most places)17:35
EugenMayerso this said, if the image_backup task has failed, the task_id no longer exists, or whatever, nova should not block 'restarting the instance'17:36
dansmithand if the issue wasn't critical, then a start operation will clear the error state without requiring a reboot of the actual instance17:36
EugenMayerdansmith well it is the 'better safe than sorry' convention i guess17:36
dansmithEugenMayer: we're agreeing with you that we do not expect that this is something that should be so jammed up and that there's probably some missing error handling in this case17:37
dansmithI'm describing what the usual nova error procedure is, regarding going to error state to signal to the user that their thing didn't happen17:37
dansmithit's not great, it's just the convention17:37
sean-k-mooneyEugenMayer: creating the glance image might require the instance to be shut down, by the way17:37
dansmithbecause if you do a backup, and the instance goes to active, you assume it worked, but it didn't17:37
sean-k-mooneysnapshots are not guaranteed to be live17:37
dansmithright17:38
EugenMayerif nova is the task owner, which i understood is the case, it should implement a proper state machine for the task (which i understood is blocking via REST, so very fragile). The task could complete as failed or succeeded; it could also never complete, or even be deleted (on the glance side)17:38
sean-k-mooneyEugenMayer: there was an effort to do that at one point, but this is also a distributed systems problem17:38
dansmithEugenMayer: there's no task17:38
EugenMayerunderstood, but i assume the sequence is: shutdown/sleep instance, create snapshot, start/resume instance, upload snapshot to glance .. (do task tracking)17:38
sean-k-mooneyEugenMayer: right, yes, but there may be cleanup to be done on the compute node or storage backend if the upload fails17:39
EugenMayerno task means: it's blocking only. Understood, there is no task_id or similar, just a blocking http call. So as you both suggested, this blocking call needs to clean up in all cases: 200, 500 and also 408 and others.17:40
sean-k-mooneysuch as deleting the file we created that was not uploaded17:40
dansmithEugenMayer: we're saying exactly that.. we should, assuming we can17:41
sean-k-mooneyEugenMayer: yep, and nova should check the response code and start cleaning up if it failed17:41
EugenMayerthough i have seen the glance image task under images, which i was able to delete, but since the blocking request disconnected long ago, no cleanup happened on the nova side17:41
dansmithEugenMayer: there are cases that are more complicated, such as with ceph where we might not be able to recover at all, depending on what happened, but in general we agree17:41
EugenMayeragreed17:41
sean-k-mooneywell, recovery in ceph might be to squash/merge the ceph snapshot back into the previous volume, for example17:42
dansmithdepends on the failure of course17:42
sean-k-mooneywhereas for qcow we would mirror the file on disk, then upload, and if it failed delete the copy17:42
sean-k-mooneyEugenMayer: if you have logs and/or a reproducer, please file a bug and we can see if we can figure out why nova is not cleaning up as expected17:43
opendevreviewStephen Finucane proposed openstack/nova master: mypy: Add nova.cmd, nova.conf, nova.console  https://review.opendev.org/c/openstack/nova/+/70565717:52
opendevreviewStephen Finucane proposed openstack/nova master: mypy: Add type annotations to top-level modules  https://review.opendev.org/c/openstack/nova/+/70565817:52
opendevreviewStephen Finucane proposed openstack/nova master: trivial: Clean manager.Manager, service.Service signatures  https://review.opendev.org/c/openstack/nova/+/76480617:52
EugenMayersean-k-mooney dansmith will do, thank you both for your time17:58
admin1hi all .. i am hitting this bug, https://bugs.launchpad.net/glance/+bug/1916482 , but have no idea how to solve it .. i am using openstack-ansible and the latest tag 24.0.1 18:22
admin1nova is local disk, glance is rbd 18:22
opendevreviewStephen Finucane proposed openstack/nova master: objects: Remove unnecessary type aliases, exceptions  https://review.opendev.org/c/openstack/nova/+/73824018:22
opendevreviewStephen Finucane proposed openstack/nova master: objects: Use imports instead of type aliases  https://review.opendev.org/c/openstack/nova/+/73801818:22
opendevreviewStephen Finucane proposed openstack/nova master: objects: Remove wrappers around ovo mixins  https://review.opendev.org/c/openstack/nova/+/73801918:22
opendevreviewStephen Finucane proposed openstack/nova master: WIP: add ovo-mypy-plugin to type hinting o.vos  https://review.opendev.org/c/openstack/nova/+/75885118:22
opendevreviewGhanshyam proposed openstack/nova stable/xena: DNM: testing centos8 py36 job  https://review.opendev.org/c/openstack/nova/+/83476518:36
opendevreviewGhanshyam proposed openstack/nova stable/wallaby: DNM: testing centos8 py36 job  https://review.opendev.org/c/openstack/nova/+/83472118:38
*** dasm is now known as dasm|off22:18
