Wednesday, 2022-03-30

opendevreviewMerged openstack/nova master: Fix pre_live_migration rollback  https://review.opendev.org/c/openstack/nova/+/815324 00:33
*** dasm|afk is now known as dasm01:17
*** dasm is now known as dasm|gone03:41
*** efried1 is now known as efried05:08
opendevreviewMerged openstack/nova-specs master: Remove setup.py and setup.cfg  https://review.opendev.org/c/openstack/nova-specs/+/835759 09:14
opendevreviewMerged openstack/nova-specs master: Move implemented specs for the Yoga release  https://review.opendev.org/c/openstack/nova-specs/+/835272 09:31
bauzasmelwitt: thanks for the help ^09:33
ihti[m]Hi, we are facing a bug with volume attachments (https://bugs.launchpad.net/nova/+bug/1964576). We have a proposed fix for it. If someone has time to review the bug/fix, it would be great. Thanks!10:13
EugenMayersean-k-mooney: I now have the issue: glance shows a 'queued' status for an image, while nova / the instance shows 'image backup'. I cannot see any tasks using 'glance task-list' or 'glance image-tasks <id>'11:45
EugenMayerWhich nova logs could be interesting? The glance logs do not show anything interesting or error-like, and neither does nova-api-error.log11:47
sean-k-mooneythe nova compute agent log is the only place that might have an error11:48
sean-k-mooneybut if the image is queued I think that means that nova has already finished uploading it11:48
opendevreviewkiran pawar proposed openstack/nova master: VMware: Early fail spawn if memory is not multiple of 4.  https://review.opendev.org/c/openstack/nova/+/835739 11:48
sean-k-mooneyand glance should be processing it11:49
sean-k-mooneyhave you checked the glance API host to see if it actually has the image on disk11:49
EugenMayersean-k-mooney: so nova-compute could have logs, and the glance API host could show whether the image is there. The question is why the status is queued and how to find out why it is that way11:50
EugenMayerThere are no errors in nova-compute.log in the last 30 days (nothing has been logged at all; the last entry is from 23 March)11:52
sean-k-mooneyso I think this is using the glance interoperable import pipeline, where the image is uploaded and then queued to become active after the import pipeline has finished processing the image11:53
sean-k-mooneyEugenMayer: that kind of sounds like the agent is hung11:53
sean-k-mooneyyou should at least see the periodics11:53
EugenMayerSo my image id (that is queued) is 57850bd9-dfdc-45bd-bd9f-cec297f3fdae - checking the storage folder i see images, but this one is not present11:53
sean-k-mooneyack11:53
sean-k-mooneydansmith: do you know when the image would be in the queued state?11:54
sean-k-mooneyit only enters that state after the upload has happened, right?11:54
sean-k-mooneyor am I misremembering that11:54
EugenMayerusing 'glance image-tasks <id>' does not show any tasks, neither 'glance task-list'11:54
EugenMayerInteresting: all those backup tasks on compute3 are broken. The other computes finished; only the compute3 backups did not (none of them did). This kind of tells me that the compute is somehow flaky, but why, and what?11:56
EugenMayerIf you have any idea how to trace this, I'm happy to look at it. Currently I'm not sure where to look at all12:20
sean-k-mooneythe only thing that comes to mind is that the agent is exhausting the thread pool or has made a blocking call on the main thread that was not monkey patched by eventlet12:23
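A minimal sketch of why a single blocking call can make the whole agent look hung, including the periodic tasks mentioned earlier. It uses Python's `asyncio` as a stand-in for eventlet, so it is an analogy rather than nova code: asyncio needs an explicit `await`, while eventlet monkey patches the standard library so the yields happen implicitly. The `heartbeat`, `cooperative_io` and `blocking_io` coroutines are hypothetical.

```python
import asyncio
import time


async def heartbeat(log: list, ticks: int = 3) -> None:
    # Stand-in for a periodic task: record a tick, then yield to the loop.
    for _ in range(ticks):
        log.append("tick")
        await asyncio.sleep(0.01)


async def cooperative_io(log: list) -> None:
    # Like a monkey-patched IO call: waiting here yields to other tasks.
    await asyncio.sleep(0.05)
    log.append("io done")


async def blocking_io(log: list) -> None:
    # Like an un-patched blocking call: time.sleep never yields,
    # so nothing else on the loop can run until it returns.
    time.sleep(0.05)
    log.append("io done")


async def run_with(worker) -> list:
    log: list = []
    # The worker is scheduled first, then the "periodic" heartbeat.
    await asyncio.gather(worker(log), heartbeat(log))
    return log


def ticks_before_io_finished(log: list) -> int:
    # How many heartbeat ticks got to run while the IO was still pending.
    return log[: log.index("io done")].count("tick")


if __name__ == "__main__":
    print("cooperative:", ticks_before_io_finished(asyncio.run(run_with(cooperative_io))))
    print("blocking:", ticks_before_io_finished(asyncio.run(run_with(blocking_io))))
```

With the cooperative worker the heartbeat runs while the IO is pending; with the blocking worker it cannot tick at all until `time.sleep` returns. In nova, the equivalent of `blocking_io` would be a call into a library that eventlet did not monkey patch.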
sean-k-mooneygenerating a guru meditation report might shed some light on that12:23
sean-k-mooneybut that will basically crashdump the process so you will have to restart the agent after you do the sig_hup12:25
sean-k-mooneyactually not sig_hup12:25
sean-k-mooneysig_usr212:25
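A small Linux-only sketch of that step: list candidate nova-compute PIDs, then send the Guru Meditation Report signal to the one you picked. It assumes the service was started with oslo.reports' Guru Meditation Report enabled (as far as I know, nova services set this up) and that SIGUSR2 is the trigger, as stated above; `find_pids` and `request_guru_meditation_report` are hypothetical helpers. The shell equivalent is `kill -USR2 <pid>`. By default the report goes to the service's stderr, so it usually ends up in the journal or the service log.

```python
import os
import signal
from typing import List


def find_pids(pattern: str) -> List[int]:
    """Return PIDs whose command line contains `pattern` (Linux /proc only)."""
    pids: List[int] = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            # The process exited while we were scanning, or we lack permission.
            continue
        if pattern in cmdline:
            pids.append(int(entry))
    return pids


def request_guru_meditation_report(pid: int) -> None:
    """Send SIGUSR2, the Guru Meditation Report trigger, to one process."""
    os.kill(pid, signal.SIGUSR2)


if __name__ == "__main__":
    # Only list candidates: substring matching also finds things like
    # "tail -f nova-compute.log", and SIGUSR2 terminates processes that
    # do not install a handler for it.
    for pid in find_pids("nova-compute"):
        print(pid)
```

Check each listed PID (for example with `ps -fp <pid>`) before calling `request_guru_meditation_report` on it.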
EugenMayerwho holds the state in general right now?12:27
sean-k-mooneywhen nova calls glance, eventlet yields from the greenthread and the state is stored in memory in the greenthread's local variables12:31
sean-k-mooneysame when we do any io like copying the image for snapshot12:32
sean-k-mooneywe yield 12:32
sean-k-mooneyand when the IO operation completes, eventlet resumes the greenthread12:32
EugenMayerNot sure what a greenthread is. If the state is a memory state, restarting the service (whatever that is) would reset the state for nova, right?12:48
sean-k-mooneythe threading model in nova is to use implicit coroutines by using userspace threads12:51
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/doc/source/reference/threading.rst 12:51
sean-k-mooneyso every time we do IO, eventlet yields execution of the current function and starts running the next greenthread12:52
EugenMayeri see, this is the nova-compute process then, right?12:52
sean-k-mooneythen when the IO completes, the previous greenthread is added to the queue to be resumed12:52
sean-k-mooneyyes12:52
sean-k-mooneynova-compute, but also conductor and scheduler12:52
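To make the yield/resume cycle concrete, here is a minimal sketch using `asyncio` as a stand-in for eventlet. It is an analogy: eventlet does the same switching implicitly inside monkey-patched IO calls, and its hub is not asyncio's event loop. Each hypothetical `greenthread` keeps its in-flight state in local variables while it is suspended, which is where the in-memory state discussed above lives.

```python
import asyncio


async def greenthread(name: str, trace: list) -> None:
    # State for this "request" lives in the coroutine's local variables.
    step = f"{name}: snapshot"
    trace.append(f"{step} started")
    # With eventlet this yield happens implicitly inside monkey-patched IO;
    # asyncio.sleep(0) makes the same "yield to the hub" explicit.
    await asyncio.sleep(0)
    # When the IO "completes", the coroutine is resumed with `step` intact.
    trace.append(f"{step} resumed after IO")


async def main() -> list:
    trace: list = []
    await asyncio.gather(greenthread("A", trace), greenthread("B", trace))
    return trace


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

This prints `A: snapshot started`, `B: snapshot started`, `A: snapshot resumed after IO`, `B: snapshot resumed after IO`: A yields at its IO point, B runs until its own IO point, then each is resumed in turn from the ready queue.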
sean-k-mooneytechnically nova-api is monkey patched too, but the way it's run with Apache means it only processes one request per worker process12:53
sean-k-mooneybecause Apache queues the request before it gets to the API application12:53
EugenMayeroh holy moly.12:57
EugenMayerI mean, my day job is being a software engineer. Yes, with bigger EE software, with microservices, distributed and all that. But this really is very weird to me, or it is simply too complex for me to play through in my head, since I do not know any of the components properly and have no safe haven to return to / start thinking from12:58
EugenMayerthank you for elaborating on that12:59
EugenMayerI am left with 2 things: glance has a task status 'queued' for a task that does not exist, and it is unclear where this state comes from. Second, why does my compute3 (out of 4) fail to create any backups when all the others can?13:00
EugenMayerAh, now I understand: it is not a task that is 'queued', the image is queued, without any task. So it is the image state13:00
sean-k-mooneya very long time ago, around the Cactus release, OpenStack moved from twisted to eventlet to remove the need for people to explicitly think about multithreading and concurrency most of the time. However, there are still cases where you have to use locks etc. to ensure there are no data races. So for the most part eventlet simplifies the common code path when you are IO bound, which tends to be13:01
sean-k-mooneythe case for nova13:01
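Cooperative scheduling only rules out interleaving between yield points. If a greenthread reads shared state, yields for IO, and then writes, another greenthread can run in between, which is why locks are still needed. A minimal sketch of that lost update, again with `asyncio` standing in for eventlet; the `Counter` class is hypothetical, and nova's own code generally uses oslo.concurrency lock helpers rather than anything asyncio-based.

```python
import asyncio


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self.lock = asyncio.Lock()


async def unsafe_increment(counter: Counter) -> None:
    current = counter.value      # read
    await asyncio.sleep(0)       # yield point (e.g. an IO call) between read and write
    counter.value = current + 1  # write back a possibly stale value


async def safe_increment(counter: Counter) -> None:
    # Only one coroutine at a time may be inside the read-yield-write section.
    async with counter.lock:
        current = counter.value
        await asyncio.sleep(0)
        counter.value = current + 1


async def final_value(increment) -> int:
    counter = Counter()
    await asyncio.gather(increment(counter), increment(counter))
    return counter.value


if __name__ == "__main__":
    print("without lock:", asyncio.run(final_value(unsafe_increment)))
    print("with lock:", asyncio.run(final_value(safe_increment)))
```

Without the lock both coroutines read 0 before either writes, so the result is 1 instead of 2; with the lock the second coroutine waits and the result is 2.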
sean-k-mooneyyes the image is queued13:02
sean-k-mooneynot a task13:02
sean-k-mooneyhttps://docs.openstack.org/glance/latest/user/statuses.html13:05
sean-k-mooneyqueued13:05
sean-k-mooneyThe image identifier has been reserved for an image in the Glance registry. No image data has been uploaded to Glance and the image size was not explicitly set to zero on creation.13:05
sean-k-mooneyok so queued means we have created the image but not uploaded it13:05
sean-k-mooneywhich I guess makes sense since nova is not currently in the image_uploading task_state13:06
sean-k-mooneyso that likely means that nova is failing to create the snapshot via libvirt/qemu 13:06
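The reasoning in the last few messages, that a `queued` image combined with an instance that never reached `image_uploading` points at the snapshot step on the compute host, can be written down as a small triage helper. This is a hypothetical sketch: the status descriptions are paraphrased from the Glance image statuses page linked above, the task_state names are the ones nova uses around snapshot and backup, and the hints only encode what was said in this conversation.

```python
from typing import Optional

# Glance image statuses, paraphrased from the Glance "Image Statuses" page.
GLANCE_STATUSES = {
    "queued": "image ID reserved, no image data uploaded yet",
    "saving": "image data is currently being uploaded",
    "uploading": "data is being staged for an interoperable import",
    "importing": "import has been called, image not active yet",
    "active": "image data is fully available",
    "killed": "an error occurred while uploading; image is not readable",
}

# Nova task_states an instance passes through before the upload to glance starts.
PRE_UPLOAD_TASK_STATES = {
    "image_snapshot",
    "image_snapshot_pending",
    "image_backup",
    "image_pending_upload",
}


def triage(image_status: str, task_state: Optional[str]) -> str:
    """Hypothetical helper: turn (glance status, nova task_state) into a hint."""
    if image_status == "queued":
        if task_state == "image_uploading":
            return ("upload started but no data in glance yet: follow the request "
                    "in the nova-compute and glance-api logs")
        if task_state in PRE_UPLOAD_TASK_STATES:
            return ("stuck before upload: check the snapshot step (libvirt/qemu) "
                    "in the nova-compute log")
        return ("queued image but no snapshot/backup task on the instance: check the "
                "nova-compute log for the request that created it")
    return GLANCE_STATUSES.get(image_status, "unknown status: see the Glance docs")


if __name__ == "__main__":
    print(triage("queued", "image_backup"))
```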
EugenMayeri see, thank you so much sean!13:07
EugenMayeri removed the broken (queued) images now13:08
EugenMayerreset the instances and restarted them13:08
sean-k-mooneyack13:08
EugenMayerI will check the logs of that compute once again and then restart it. Usually those restarts fix 9 out of 10 issues I have with openstack13:08
sean-k-mooneywhat you likely should do is try to find the request-id for the backup call and see if you can find the last log for that operation13:08
EugenMayerWhich obviously is not a good sign, sure13:08
sean-k-mooneyto see where it got stuck13:08
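A minimal sketch of that suggestion: collect the request IDs that appear on lines mentioning the instance in the nova-compute log, then print the last line logged for each, which is roughly where the operation stopped. It assumes request IDs appear in the `req-<uuid>` form that nova logs used at the time; the helper names and the log path in the usage comment are hypothetical.

```python
import re
import sys
from typing import Optional, Sequence, Set

# Request IDs as they appear in nova logs of this era, e.g. "req-<uuid>".
REQUEST_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def request_ids_for_instance(lines: Sequence[str], instance_uuid: str) -> Set[str]:
    """Return every request ID that appears on a line mentioning the instance."""
    ids: Set[str] = set()
    for line in lines:
        if instance_uuid in line:
            ids.update(REQUEST_ID_RE.findall(line))
    return ids


def last_line_for_request(lines: Sequence[str], request_id: str) -> Optional[str]:
    """Return the last log line carrying this request ID, or None."""
    last = None
    for line in lines:
        if request_id in line:
            last = line.rstrip("\n")
    return last


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_last.py /var/log/nova/nova-compute.log <instance-uuid>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        log_lines = f.readlines()
    for req_id in sorted(request_ids_for_instance(log_lines, sys.argv[2])):
        print(req_id, "->", last_line_for_request(log_lines, req_id))
```

`last_line_for_request` deliberately matches on the request ID alone, so it also finds later lines of the same operation that no longer mention the instance UUID.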
EugenMayerthis is SO hard to trace for me considering the amount of subsystems, proxies and systems involved13:09
EugenMayerDebugging this kind of thing is usually my bread and butter in our software stacks. But well, as I must learn, the reason I can do it there is that I know that software a lot better13:10
sean-k-mooneyya there is a lot of context to grok13:22
*** dasm|gone is now known as dasm13:26
viks__hi, with `soft-anti-affinity`, whenever I create 2 instances together via horizon, they go onto 2 different hosts. But when I create one after the other with `soft-anti-affinity`, they go onto the same host. Is that expected?14:42
sean-k-mooneyviks__: the behavior in the second case will depend on whether you wait for the first one to go active before you create the second15:01
sean-k-mooneybut yes, it is expected: in the second case you have a race to update the instance.host before the second VM is scheduled15:02
sean-k-mooneywe provide no affinity guarantees in this case unless you enable the affinity upcall15:02
viks__sean-k-mooney: what do I need to set in nova.conf for the affinity upcall?15:05
sean-k-mooneythe workaround config option in the compute nova.conf, and the api database in the conductor nova.conf15:24
opendevreviewAlexey Stupnikov proposed openstack/nova stable/xena: Test aborting queued live migration  https://review.opendev.org/c/openstack/nova/+/835853 15:39
opendevreviewAlexey Stupnikov proposed openstack/nova stable/xena: Add functional tests to reproduce bug #1960412  https://review.opendev.org/c/openstack/nova/+/835854 15:40
opendevreviewAlexey Stupnikov proposed openstack/nova stable/xena: Clean up when queued live migration aborted  https://review.opendev.org/c/openstack/nova/+/835855 15:41
viks__sean-k-mooney: ok.. but even with `[workaround]/disable_group_policy_check_upcall = false`, i get the same behaviour15:53
sean-k-mooneythe second instance will still race and get sent to the same host15:53
sean-k-mooneybut it should then be rejected15:53
sean-k-mooneyand rescheduled to a different host15:53
sean-k-mooneyfrom the alternate host list15:54
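The option lives in the `[workarounds]` section (plural) and defaults to `False`, meaning the upcall is enabled unless you turn it off; oslo.config ignores sections it does not know, so a `[workaround]` block has no effect. A quick check of the two files can rule out the simple mistakes. A minimal sketch using Python's `configparser`; `check_upcall_config` is a hypothetical helper, and it only inspects the file contents, not what the running services actually loaded.

```python
import configparser
from typing import List


def _parse(text: str) -> configparser.ConfigParser:
    # strict=False tolerates repeated options (nova.conf allows multi-valued opts);
    # interpolation=None keeps '%' in passwords or URLs from being interpreted.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    return parser


def check_upcall_config(compute_conf: str, conductor_conf: str) -> List[str]:
    """Return a list of likely problems with the affinity upcall settings."""
    problems: List[str] = []

    compute = _parse(compute_conf)
    if compute.has_section("workaround"):
        problems.append("compute: [workaround] is not a nova section; use [workarounds]")
    if compute.getboolean("workarounds", "disable_group_policy_check_upcall", fallback=False):
        problems.append("compute: disable_group_policy_check_upcall is true, upcall disabled")

    conductor = _parse(conductor_conf)
    if not conductor.get("api_database", "connection", fallback="").strip():
        problems.append("conductor: [api_database]/connection is not set")

    return problems


if __name__ == "__main__":
    with open("/etc/nova/nova.conf") as f:  # hypothetical path; pass each host's file
        text = f.read()
    print(check_upcall_config(text, text))
```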
sean-k-mooneynoonedeadpunk: by the way, https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/flavor-extra-spec-validators.html should have caught your typo if you use the correct microversion17:09
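The validation described in that spec only applies when the client requests a new enough compute API microversion (2.86, if I recall the Ussuri change correctly); as far as I know, older microversions still accept unknown extra spec keys. A minimal sketch that builds, but does not send, the `POST /flavors/{flavor_id}/os-extra_specs` request with the microversion header; the endpoint, token and flavor ID are placeholders, and `build_extra_specs_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Dict


def build_extra_specs_request(
    compute_endpoint: str,
    token: str,
    flavor_id: str,
    extra_specs: Dict[str, str],
    microversion: str = "2.86",  # assumed minimum for extra spec validation
) -> urllib.request.Request:
    """Build (but do not send) a request that sets flavor extra specs."""
    body = json.dumps({"extra_specs": extra_specs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{compute_endpoint.rstrip('/')}/flavors/{flavor_id}/os-extra_specs",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,
            # Without this header nova uses the minimum microversion (2.1),
            # which does not validate extra spec keys.
            "OpenStack-API-Version": f"compute {microversion}",
        },
    )


if __name__ == "__main__":
    # A typo such as "hw:cpu_polcy" should be rejected at 2.86+ rather than stored.
    req = build_extra_specs_request(
        "https://compute.example.com/v2.1", "TOKEN", "FLAVOR_ID", {"hw:cpu_polcy": "dedicated"}
    )
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

The CLI route is the same idea: pass a compute API version of at least that microversion when running `openstack flavor set --property ...`.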
opendevreviewmelanie witt proposed openstack/nova-specs master: Repropose spec for ephemeral storage encryption  https://review.opendev.org/c/openstack/nova-specs/+/835877 17:14
opendevreviewmelanie witt proposed openstack/nova-specs master: Repropose spec for ephemeral storage encryption  https://review.opendev.org/c/openstack/nova-specs/+/835877 17:36
*** dasm is now known as dasm|off21:43
*** ianw_pto is now known as ianw22:24
