*** gyee has quit IRC | 00:21 | |
*** rcernin has quit IRC | 01:11 | |
*** rcernin has joined #openstack-glance | 01:11 | |
*** Liang__ has joined #openstack-glance | 01:23 | |
*** eandersson has quit IRC | 01:29 | |
*** eandersson has joined #openstack-glance | 01:29 | |
*** Liang__ has quit IRC | 03:01 | |
*** rcernin has quit IRC | 03:06 | |
*** Liang__ has joined #openstack-glance | 03:14 | |
*** rcernin has joined #openstack-glance | 03:52 | |
*** Liang__ has quit IRC | 04:02 | |
*** Liang__ has joined #openstack-glance | 04:05 | |
*** CeeMac has quit IRC | 04:26 | |
*** evrardjp has joined #openstack-glance | 04:33 | |
*** Liang__ has quit IRC | 04:34 | |
*** ratailor has joined #openstack-glance | 04:34 | |
*** CeeMac has joined #openstack-glance | 06:04 | |
*** nikparasyr has joined #openstack-glance | 06:09 | |
*** Liang__ has joined #openstack-glance | 06:46 | |
*** Liang__ has quit IRC | 07:23 | |
*** Liang__ has joined #openstack-glance | 07:24 | |
*** belmoreira has joined #openstack-glance | 08:30 | |
*** rcernin has quit IRC | 09:05 | |
*** Liang__ has quit IRC | 09:24 | |
*** rcernin has joined #openstack-glance | 09:57 | |
* abhishekk will be back at 1400 UTC | 09:58 | |
*** rcernin has quit IRC | 10:40 | |
*** rcernin has joined #openstack-glance | 10:46 | |
*** rcernin has quit IRC | 11:00 | |
*** ratailor has quit IRC | 13:14 | |
*** tkajinam has quit IRC | 13:32 | |
abhishekk | jokke, rosmaita, smcginnis, dansmith glance weekly meeting in 5 minutes at #openstack-meeting | 13:57 |
*** k_mouza has joined #openstack-glance | 14:54 | |
* abhishekk dinner break | 14:58 | |
* abhishekk back from break | 15:28 | |
abhishekk | smcginnis, when you have time, https://review.opendev.org/745796 | 15:28 |
*** k_mouza has quit IRC | 15:30 | |
jokke | dansmith: comment on the lock patch ... hopefully it makes sense at least on the level where you can point out what failed in my logic | 15:32 |
*** belmoreira has quit IRC | 15:42 | |
dansmith | jokke: replied | 15:46 |
*** priteau has joined #openstack-glance | 15:46 | |
jokke | dansmith: was "success" a bustable state? I thought it wasn't; that's the first part of my confusion | 15:48
dansmith | jokke: https://review.opendev.org/#/c/743597/16/glance/api/v2/images.py@146 | 15:49 |
jokke | yes, that makes it a little more possible for us to get there. | 15:51
dansmith | the task status transitions aren't enforced at the DB layer, AFAICT, | 15:52 |
dansmith | which means you can have two transactions competing to set a task to conflicting statuses and both will complete | 15:52 |
dansmith | meaning, I can have a task.fail() and task.succeed() both complete if they're interleaved properly because the starting state isn't checked at the DB layer, only in python | 15:53 |
dansmith | unless I missed where that's happening | 15:53 |
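The race dansmith describes here is a classic lost compare-and-swap: if the status precondition is only checked in Python, two transactions can both read "processing" and both write a terminal state. A minimal sketch (not Glance code; the table and column names are hypothetical) of enforcing the transition at the DB layer with a conditional UPDATE:

```python
import sqlite3

def setup():
    # Hypothetical stand-in for the tasks table.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
    db.execute("INSERT INTO tasks (id, status) VALUES (1, 'processing')")
    db.commit()
    return db

def transition(db, task_id, old, new):
    # The WHERE clause makes the transition conditional on the current
    # status, so two competing writers cannot both "win".
    cur = db.execute(
        "UPDATE tasks SET status = ? WHERE id = ? AND status = ?",
        (new, task_id, old),
    )
    db.commit()
    return cur.rowcount == 1  # True only if this writer moved the row

db = setup()
# The first writer wins the race...
print(transition(db, 1, "processing", "success"))   # True
# ...and the interleaved task.fail() now loses instead of clobbering it.
print(transition(db, 1, "processing", "failure"))   # False
```

With this pattern, a `task.fail()` and `task.succeed()` interleaved as described above cannot both complete; the loser sees `rowcount == 0` and can react instead of silently overwriting.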
*** nikparasyr has left #openstack-glance | 15:53 | |
jokke | And honestly I think we should not revert at that point anymore if the task.succeed() fails. Couple of reasons for that: the task status is actually not publicly announced anywhere, and in any configuration we already have an active image with locations which might be in use by nova already. We might end up in a state where our housekeeping goes and deletes image data which we | 15:53
jokke | already told Nova is there | 15:54
dansmith | but, we do that for all_stores_must_succeed=True, so I don't know why we wouldn't revert and de-location the image here as well | 15:56 |
dansmith | that kinda blows the whole ability to believe the task information at all, no? | 15:57 |
dansmith | IMHO, as rosmaita suggested, exposing tasks to users would be a real improvement going forward, but we need to be reliable about that status and information otherwise it's really not useful | 15:57 |
jokke | Well ... so say this is not a copy-image job and we have all_stores_must_succeed=True ... what will actually happen (and btw this is a bug I just realized is happening) is we will go all the way back to the very first store that finished; we deleted all the locations before that, but the last delete will fail, failing the revert chain, and we're left with an image that is active and has 1 location | 15:59
jokke | with all_stores_must_succeed=True it is only truly revertable up to the point where the last store succeeds and the image goes active, or gets reverted at that _VerifyImageStatus stage. | 16:01
jokke | we never can truly back out once we have got to the _CompleteTask | 16:01 |
jokke | dansmith: I'd say the best approach to that is to allow the task status transition from failed to succeed and let the _CompleteTask do the housekeeping even if someone manages to come and bust that lock at the very last moment | 16:03
jokke | but I think all that is outside of the scope of this locking for the race condition | 16:04 |
jokke | We've already done all the cleanup; we have an active image with locations. At that point we should not be backing out, as it's only a matter of our own bookkeeping failing and has nothing to do with the actual operation. | 16:08
jokke | abhishekk: what do you think ^^ | 16:08 |
abhishekk | I am inclined towards saying we should not revert if something fails in _CompleteTask | 16:11 |
abhishekk | But also this should be handled in a different patch | 16:13 |
dansmith | I have a part of a patch to add more testing for the terminal states of the tasks which is a ways from being finished, so we could do it around that patch as we can assert how this goes | 16:19 |
*** gyee has joined #openstack-glance | 16:40 | |
dansmith | hmm, if a revert() raises an exception, it looks like taskflow won't continue reverting the rest of the flow.. is that right? | 17:00 |
abhishekk | I think so | 17:02 |
abhishekk | never ran into that scenario in practice | 17:02
dansmith | looks like it goes into REVERT_FAILED and stops trying to call the other revert()s | 17:03 |
dansmith | which seems weird to me, given the goal | 17:03 |
abhishekk | in that case we need to handle that failure with appropriate messages to complete the revert cycle | 17:06 |
jokke | dansmith: I think so yes, unless you take care of that exception as usual | 17:09 |
* abhishekk signing out early today | 17:10 | |
jokke | I'm not 100% sure but I think you can pass your own error handler to it, in which case you can do something like except Exception: log. | 17:10 |
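The behaviour being discussed, that an exception escaping a `revert()` halts the rest of the rollback, can be modelled outside taskflow. This is a simplified, hypothetical sketch (plain Python, not taskflow's actual engine; the task names are invented) of jokke's suggestion: any revert that should not abort the chain catches its own exception and just logs.

```python
def run_reverts(reverts):
    """Run reverts last-to-first, swallowing failures so the
    whole chain still executes (unlike an unhandled exception,
    which would leave the flow half reverted)."""
    completed = []
    for name, fn in reversed(reverts):
        try:
            fn()
        except Exception as exc:
            # Log and continue instead of re-raising; re-raising here
            # is what stops the remaining reverts from running.
            print(f"revert {name} failed: {exc!r}")
        completed.append(name)
    return completed

def failing_revert():
    raise RuntimeError("boom")

# Hypothetical flow: two store imports followed by activation.
reverts = [
    ("import_store_a", lambda: None),
    ("import_store_b", failing_revert),
    ("set_image_active", lambda: None),
]
print(run_reverts(reverts))  # all three run, despite the failure
```

In taskflow terms this corresponds to wrapping each `revert()` body in its own try/except rather than letting the exception propagate to the engine.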
*** priteau has quit IRC | 17:36 | |
*** priteau has joined #openstack-glance | 17:45 | |
dansmith | jokke: so, was just working on some more tests, | 17:47 |
dansmith | and I think updating of image.locations is also not done safely | 17:47 |
dansmith | if I hit two glance workers, importing to two different stores, both may succeed but the last one to img_repo.save() will "win" with its list of locations and we'll lose the others that were added in the meantime | 17:48
dansmith | isn't there some API that allows nova to update the locations directly when it does a snapshot? I'm wondering if that is similarly leaky | 17:49 |
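The locations hazard dansmith describes is a textbook read-modify-write lost update. A toy sketch (hypothetical names, not Glance's actual data model) showing how saving the whole list back clobbers a concurrent writer, and how an append-only update avoids it:

```python
# Two workers each snapshot image.locations, append their own store's
# location, and save the full list back; the last save "wins".
image = {"locations": ["store_a"]}

def save_whole_list(snapshot, new_loc):
    # read-modify-write: overwrites anything added since the snapshot
    image["locations"] = snapshot + [new_loc]

snap1 = list(image["locations"])  # worker 1 reads
snap2 = list(image["locations"])  # worker 2 reads concurrently
save_whole_list(snap1, "store_b")
save_whole_list(snap2, "store_c")
print(image["locations"])  # ['store_a', 'store_c'] -- store_b was lost

# An append-only update sidesteps the race: each worker adds only its
# own location instead of rewriting the full list.
image = {"locations": ["store_a"]}
image["locations"].append("store_b")  # worker 1
image["locations"].append("store_c")  # worker 2
print(image["locations"])  # ['store_a', 'store_b', 'store_c']
```

The same idea at the DB layer would be per-location inserts (or a version/generation check on save) rather than persisting the whole serialized list.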
*** priteau has quit IRC | 17:53 | |
jokke | dansmith: yeah, it's the locations api | 18:51 |
*** jmlowe has quit IRC | 21:42 | |
*** jmlowe has joined #openstack-glance | 21:43 | |
*** rcernin has joined #openstack-glance | 22:00 | |
*** tkajinam has joined #openstack-glance | 22:58 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!