Thursday, 2020-08-13

*** gyee has quit IRC00:21
*** rcernin has quit IRC01:11
*** rcernin has joined #openstack-glance01:11
*** Liang__ has joined #openstack-glance01:23
*** eandersson has quit IRC01:29
*** eandersson has joined #openstack-glance01:29
*** Liang__ has quit IRC03:01
*** rcernin has quit IRC03:06
*** Liang__ has joined #openstack-glance03:14
*** rcernin has joined #openstack-glance03:52
*** Liang__ has quit IRC04:02
*** Liang__ has joined #openstack-glance04:05
*** CeeMac has quit IRC04:26
*** evrardjp has joined #openstack-glance04:33
*** Liang__ has quit IRC04:34
*** ratailor has joined #openstack-glance04:34
*** CeeMac has joined #openstack-glance06:04
*** nikparasyr has joined #openstack-glance06:09
*** Liang__ has joined #openstack-glance06:46
*** Liang__ has quit IRC07:23
*** Liang__ has joined #openstack-glance07:24
*** belmoreira has joined #openstack-glance08:30
*** rcernin has quit IRC09:05
*** Liang__ has quit IRC09:24
*** rcernin has joined #openstack-glance09:57
* abhishekk will be back at 1400 UTC09:58
*** rcernin has quit IRC10:40
*** rcernin has joined #openstack-glance10:46
*** rcernin has quit IRC11:00
*** ratailor has quit IRC13:14
*** tkajinam has quit IRC13:32
abhishekkjokke, rosmaita, smcginnis, dansmith glance weekly meeting in 5 minutes at #openstack-meeting13:57
*** k_mouza has joined #openstack-glance14:54
* abhishekk dinner break14:58
* abhishekk back from break15:28
abhishekksmcginnis, when you have time, https://review.opendev.org/74579615:28
*** k_mouza has quit IRC15:30
jokkedansmith: comment on the lock patch ... hopefully it makes sense at least on the level where you can point out what failed in my logic15:32
*** belmoreira has quit IRC15:42
dansmithjokke: replied15:46
*** priteau has joined #openstack-glance15:46
jokkedansmith: was "success" bustable state? I thought it wasn't, that's first part of my confusion15:48
dansmithjokke: https://review.opendev.org/#/c/743597/16/glance/api/v2/images.py@14615:49
jokkeyes, that makes it little more possible for us to get there.15:51
dansmiththe task status transitions aren't enforced at the DB layer, AFAICT,15:52
dansmithwhich means you can have two transactions competing to set a task to conflicting statuses and both will complete15:52
dansmithmeaning, I can have a task.fail() and task.succeed() both complete if they're interleaved properly because the starting state isn't checked at the DB layer, only in python15:53
dansmithunless I missed where that's happening15:53
*** nikparasyr has left #openstack-glance15:53
jokkeAnd honestly I think we should not revert it at that point anymore if the task.succeed() fails. So couple of reasons for that. The task status is actually not publicly announced anywhere and in any configuration we already do have active image that has locations which might be used by nova already. We might end up to a state where our housekeeping causes us to go and delete image data which we15:53
jokkealready told Nova is there15:54
dansmithbut, we do that for all_stores_must_succeed=True, so I don't know why we wouldn't revert and de-location the image here as well15:56
dansmiththat kinda blows the whole ability to believe the task information at all, no?15:57
dansmithIMHO, as rosmaita suggested, exposing tasks to users would be a real improvement going forward, but we need to be reliable about that status and information otherwise it's really not useful15:57
jokkeWelll ... So say this is not copy-image job and we have all_stores_must_succeed=True ... what actualy will happen (and btw this is bug I just realized happening) we will go all the way back to the very first store that finished, we deleted all the locations before that, but the last delete will fail, failing the revert chain and we're left with image that is active and has 1 location15:59
jokkewith the all_stores_must_succeed=True it only is truly revertable until the point the last store succeeds and the image goes active or gets reverted on that _VerifyImageStatus stage.16:01
jokkewe never can truly back out once we have got to the _CompleteTask16:01
jokkedansmith: I'd say the best approach to that is allow task status transition form failed to succeed and let the _CompleteTask do the housekeeping even if someone manages to come and bust that lock in the very last moment16:03
jokkebut I think all that is outside of the scope of this locking for the race condition16:04
jokkeWe've already done all the cleanup, we've active image with locations. At that point we should not be backing out. As it's only matter of our own bookkeeping failing and nothing ot do with the actual operation.16:08
jokkeabhishekk: what do you think ^^16:08
abhishekkI am inclined towards saying we should not revert if something fails in _CompleteTask16:11
abhishekkBut also this should be handled in a different patch16:13
dansmithI have a part of a patch to add more testing for the terminal states of the tasks which is a ways from being finished, so we could do it around that patch as we can assert how this goes16:19
*** gyee has joined #openstack-glance16:40
dansmithhmm, if a revert() raises an exception, it looks like taskflow won't continue reverting the rest of the flow.. is that right?17:00
abhishekkI think so17:02
abhishekknever came to that scenario in practical17:02
dansmithlooks like it goes into REVERT_FAILED and stops trying to call the other revert()s17:03
dansmithwhich seems weird to me, given the goal17:03
abhishekkin that case we need to handle that failure with appropriate messages to complete the revert cycle17:06
jokkedansmith: I think so yes, unless you take care of that exception as usual17:09
* abhishekk signing out early today17:10
jokkeI'm not 100% sure but I think you can pass your own error handler to it, in which case you can do something like except Exception: log.17:10
*** priteau has quit IRC17:36
*** priteau has joined #openstack-glance17:45
dansmithjokke: so, was just working on some more tests,17:47
dansmithand I think updating of image.locations is also not done safely17:47
dansmithif I hit two glance workers, and and importing to two different stores, both may succeed but the last one to img_repo.save() will "win" with the list of locations and we'll lose others that have been added in the meantime17:48
dansmithisn't there some API that allows nova to update the locations directly when it does a snapshot? I'm wondering if that is similarly leaky17:49
*** priteau has quit IRC17:53
jokkedansmith: yeah, it's the locations api18:51
*** jmlowe has quit IRC21:42
*** jmlowe has joined #openstack-glance21:43
*** rcernin has joined #openstack-glance22:00
*** tkajinam has joined #openstack-glance22:58

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!