*** gyee has quit IRC | 00:21 | |
*** rcernin has quit IRC | 01:11 | |
*** rcernin has joined #openstack-glance | 01:11 | |
*** Liang__ has joined #openstack-glance | 01:23 | |
*** eandersson has quit IRC | 01:29 | |
*** eandersson has joined #openstack-glance | 01:29 | |
*** Liang__ has quit IRC | 03:01 | |
*** rcernin has quit IRC | 03:06 | |
*** Liang__ has joined #openstack-glance | 03:14 | |
*** rcernin has joined #openstack-glance | 03:52 | |
*** Liang__ has quit IRC | 04:02 | |
*** Liang__ has joined #openstack-glance | 04:05 | |
*** CeeMac has quit IRC | 04:26 | |
*** evrardjp has joined #openstack-glance | 04:33 | |
*** Liang__ has quit IRC | 04:34 | |
*** ratailor has joined #openstack-glance | 04:34 | |
*** CeeMac has joined #openstack-glance | 06:04 | |
*** nikparasyr has joined #openstack-glance | 06:09 | |
*** Liang__ has joined #openstack-glance | 06:46 | |
*** Liang__ has quit IRC | 07:23 | |
*** Liang__ has joined #openstack-glance | 07:24 | |
*** belmoreira has joined #openstack-glance | 08:30 | |
*** rcernin has quit IRC | 09:05 | |
*** Liang__ has quit IRC | 09:24 | |
*** rcernin has joined #openstack-glance | 09:57 | |
* abhishekk will be back at 1400 UTC | 09:58 | |
*** rcernin has quit IRC | 10:40 | |
*** rcernin has joined #openstack-glance | 10:46 | |
*** rcernin has quit IRC | 11:00 | |
*** ratailor has quit IRC | 13:14 | |
*** tkajinam has quit IRC | 13:32 | |
abhishekk | jokke, rosmaita, smcginnis, dansmith glance weekly meeting in 5 minutes at #openstack-meeting | 13:57 |
*** k_mouza has joined #openstack-glance | 14:54 | |
* abhishekk dinner break | 14:58 | |
* abhishekk back from break | 15:28 | |
abhishekk | smcginnis, when you have time, https://review.opendev.org/745796 | 15:28 |
*** k_mouza has quit IRC | 15:30 | |
jokke | dansmith: comment on the lock patch ... hopefully it makes sense at least on the level where you can point out what failed in my logic | 15:32 |
*** belmoreira has quit IRC | 15:42 | |
dansmith | jokke: replied | 15:46 |
*** priteau has joined #openstack-glance | 15:46 | |
jokke | dansmith: was "success" a bustable state? I thought it wasn't; that's the first part of my confusion | 15:48
dansmith | jokke: https://review.opendev.org/#/c/743597/16/glance/api/v2/images.py@146 | 15:49 |
jokke | yes, that makes it a little more possible for us to get there. | 15:51
dansmith | the task status transitions aren't enforced at the DB layer, AFAICT, | 15:52 |
dansmith | which means you can have two transactions competing to set a task to conflicting statuses and both will complete | 15:52 |
dansmith | meaning, I can have a task.fail() and task.succeed() both complete if they're interleaved properly because the starting state isn't checked at the DB layer, only in python | 15:53 |
dansmith | unless I missed where that's happening | 15:53 |
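The race dansmith describes here is a classic lost compare-and-swap: if the status precondition is only checked in Python, two transactions can both read "processing" and both write a terminal state. A minimal sketch (not Glance code; the table and column names are hypothetical) of enforcing the transition at the DB layer with a conditional UPDATE:

```python
import sqlite3

def setup():
    # Hypothetical stand-in for the tasks table.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
    db.execute("INSERT INTO tasks (id, status) VALUES (1, 'processing')")
    db.commit()
    return db

def transition(db, task_id, old, new):
    # The WHERE clause makes the transition conditional on the current
    # status, so two competing writers cannot both "win".
    cur = db.execute(
        "UPDATE tasks SET status = ? WHERE id = ? AND status = ?",
        (new, task_id, old),
    )
    db.commit()
    return cur.rowcount == 1  # True only if this writer moved the row

db = setup()
# The first writer wins the race...
print(transition(db, 1, "processing", "success"))   # True
# ...and the interleaved task.fail() now loses instead of clobbering it.
print(transition(db, 1, "processing", "failure"))   # False
```

With this pattern, a `task.fail()` and `task.succeed()` interleaved as described above cannot both complete; the loser sees `rowcount == 0` and can react instead of silently overwriting.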
*** nikparasyr has left #openstack-glance | 15:53 | |
jokke | And honestly I think we should not revert at that point anymore if the task.succeed() fails. Couple of reasons for that: the task status is actually not publicly announced anywhere, and in any configuration we already have an active image with locations which might be in use by nova already. We might end up in a state where our housekeeping goes and deletes image data which we | 15:53
jokke | already told Nova is there | 15:54
dansmith | but, we do that for all_stores_must_succeed=True, so I don't know why we wouldn't revert and de-location the image here as well | 15:56 |
dansmith | that kinda blows the whole ability to believe the task information at all, no? | 15:57 |
dansmith | IMHO, as rosmaita suggested, exposing tasks to users would be a real improvement going forward, but we need to be reliable about that status and information otherwise it's really not useful | 15:57 |
jokke | Well ... so say this is not a copy-image job and we have all_stores_must_succeed=True ... what will actually happen (and btw this is a bug I just realized is happening) is we will go all the way back to the very first store that finished; we deleted all the locations before that, but the last delete will fail, failing the revert chain, and we're left with an image that is active and has 1 location | 15:59
jokke | with all_stores_must_succeed=True it is only truly revertable up to the point where the last store succeeds and the image goes active, or gets reverted at that _VerifyImageStatus stage. | 16:01
jokke | we never can truly back out once we have got to the _CompleteTask | 16:01 |
jokke | dansmith: I'd say the best approach to that is to allow the task status transition from failed to succeed and let the _CompleteTask do the housekeeping even if someone manages to come and bust that lock at the very last moment | 16:03
jokke | but I think all that is outside of the scope of this locking for the race condition | 16:04 |
jokke | We've already done all the cleanup; we have an active image with locations. At that point we should not be backing out, as it's only a matter of our own bookkeeping failing and has nothing to do with the actual operation. | 16:08
jokke | abhishekk: what do you think ^^ | 16:08 |
abhishekk | I am inclined towards saying we should not revert if something fails in _CompleteTask | 16:11 |
abhishekk | But also this should be handled in a different patch | 16:13 |
dansmith | I have a part of a patch to add more testing for the terminal states of the tasks which is a ways from being finished, so we could do it around that patch as we can assert how this goes | 16:19 |
*** gyee has joined #openstack-glance | 16:40 | |
dansmith | hmm, if a revert() raises an exception, it looks like taskflow won't continue reverting the rest of the flow.. is that right? | 17:00 |
abhishekk | I think so | 17:02 |
abhishekk | never ran into that scenario in practice | 17:02
dansmith | looks like it goes into REVERT_FAILED and stops trying to call the other revert()s | 17:03 |
dansmith | which seems weird to me, given the goal | 17:03 |
abhishekk | in that case we need to handle that failure with appropriate messages to complete the revert cycle | 17:06 |
jokke | dansmith: I think so yes, unless you take care of that exception as usual | 17:09 |
* abhishekk signing out early today | 17:10 | |
jokke | I'm not 100% sure but I think you can pass your own error handler to it, in which case you can do something like except Exception: log. | 17:10 |
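The behaviour being discussed, that an exception escaping a `revert()` halts the rest of the rollback, can be modelled outside taskflow. This is a simplified, hypothetical sketch (plain Python, not taskflow's actual engine; the task names are invented) of jokke's suggestion: any revert that should not abort the chain catches its own exception and just logs.

```python
def run_reverts(reverts):
    """Run reverts last-to-first, swallowing failures so the
    whole chain still executes (unlike an unhandled exception,
    which would leave the flow half reverted)."""
    completed = []
    for name, fn in reversed(reverts):
        try:
            fn()
        except Exception as exc:
            # Log and continue instead of re-raising; re-raising here
            # is what stops the remaining reverts from running.
            print(f"revert {name} failed: {exc!r}")
        completed.append(name)
    return completed

def failing_revert():
    raise RuntimeError("boom")

# Hypothetical flow: two store imports followed by activation.
reverts = [
    ("import_store_a", lambda: None),
    ("import_store_b", failing_revert),
    ("set_image_active", lambda: None),
]
print(run_reverts(reverts))  # all three run, despite the failure
```

In taskflow terms this corresponds to wrapping each `revert()` body in its own try/except rather than letting the exception propagate to the engine.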
*** priteau has quit IRC | 17:36 | |
*** priteau has joined #openstack-glance | 17:45 | |
dansmith | jokke: so, was just working on some more tests, | 17:47 |
dansmith | and I think updating of image.locations is also not done safely | 17:47 |
dansmith | if I hit two glance workers, importing to two different stores, both may succeed but the last one to img_repo.save() will "win" with its list of locations and we'll lose the others that were added in the meantime | 17:48
dansmith | isn't there some API that allows nova to update the locations directly when it does a snapshot? I'm wondering if that is similarly leaky | 17:49 |
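The locations hazard dansmith describes is a textbook read-modify-write lost update. A toy sketch (hypothetical names, not Glance's actual data model) showing how saving the whole list back clobbers a concurrent writer, and how an append-only update avoids it:

```python
# Two workers each snapshot image.locations, append their own store's
# location, and save the full list back; the last save "wins".
image = {"locations": ["store_a"]}

def save_whole_list(snapshot, new_loc):
    # read-modify-write: overwrites anything added since the snapshot
    image["locations"] = snapshot + [new_loc]

snap1 = list(image["locations"])  # worker 1 reads
snap2 = list(image["locations"])  # worker 2 reads concurrently
save_whole_list(snap1, "store_b")
save_whole_list(snap2, "store_c")
print(image["locations"])  # ['store_a', 'store_c'] -- store_b was lost

# An append-only update sidesteps the race: each worker adds only its
# own location instead of rewriting the full list.
image = {"locations": ["store_a"]}
image["locations"].append("store_b")  # worker 1
image["locations"].append("store_c")  # worker 2
print(image["locations"])  # ['store_a', 'store_b', 'store_c']
```

The same idea at the DB layer would be per-location inserts (or a version/generation check on save) rather than persisting the whole serialized list.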
*** priteau has quit IRC | 17:53 | |
jokke | dansmith: yeah, it's the locations api | 18:51 |
*** jmlowe has quit IRC | 21:42 | |
*** jmlowe has joined #openstack-glance | 21:43 | |
*** rcernin has joined #openstack-glance | 22:00 | |
*** tkajinam has joined #openstack-glance | 22:58 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!