Monday, 2020-08-17

04:11 *** m75abrams has joined #openstack-glance
04:33 *** evrardjp has quit IRC
04:33 *** evrardjp has joined #openstack-glance
04:56 *** ratailor has joined #openstack-glance
05:43 *** udesale has joined #openstack-glance
06:10 *** udesale has quit IRC
06:22 *** nikparasyr has joined #openstack-glance
06:40 *** happyhemant has joined #openstack-glance
06:45 *** belmoreira has joined #openstack-glance
08:03 *** k_mouza has joined #openstack-glance
08:41 *** k_mouza has quit IRC
08:45 *** k_mouza has joined #openstack-glance
08:50 *** happyhemant has quit IRC
09:55 *** tkajinam has quit IRC
09:59 *** Luzi has joined #openstack-glance
10:10 *** k_mouza has quit IRC
10:12 *** k_mouza has joined #openstack-glance
10:40 *** priteau has joined #openstack-glance
10:52 *** k_mouza has quit IRC
10:55 *** k_mouza has joined #openstack-glance
11:20 *** udesale has joined #openstack-glance
11:25 *** happyhemant has joined #openstack-glance
11:27 *** tkajinam has joined #openstack-glance
11:57 *** k_mouza has quit IRC
12:14 *** udesale_ has joined #openstack-glance
12:17 *** udesale has quit IRC
12:21 *** ratailor has quit IRC
12:26 *** rosmaita has joined #openstack-glance
13:09 *** udesale_ has quit IRC
13:26 *** Luzi has quit IRC
13:39 <dansmith> abhishekk: so last week, while writing more tests for the locking thing, I discovered that image_update() will destroy the lock we're setting
13:40 <dansmith> so if the thread we're stealing the lock from is still alive and wakes up later, it will trample the new one
13:41 <abhishekk> how is that?
13:41 <abhishekk> does image_update update the task as well?
13:42 <dansmith> two ways, but the most serious is that if I already have my image with one value for os_glance_import_task, image_update() will change the property in the database back to match that, reverting the newer task's lock
13:42 <dansmith> and if I don't have it set, it will delete the lock from the db because purge_props=True
13:42 <dansmith> which also seems to mean that someone doing an otherwise harmless update on an image will erase the lock for an import in progress
13:43 <dansmith> so I have some changes locally to make image_update exclude some "atomic properties" from that purge list
13:43 <abhishekk> so at the moment we only have one atomic property, right?
13:44 <dansmith> right
13:44 <abhishekk> ok
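
[A minimal sketch of the idea dansmith describes above: keeping "atomic" properties out of the purge that image_update() does when purge_props=True, so an unrelated save cannot delete an in-progress import lock. The helper name and property list are assumptions for illustration, not the actual change in https://review.opendev.org/746518.]

    # Assumed list; os_glance_import_task is the only atomic property today.
    ATOMIC_PROPERTIES = ('os_glance_import_task',)

    def filter_atomic_props(existing_props, new_props):
        """Carry forward atomic properties the caller did not explicitly set,
        so a whole-dict save with purge_props=True cannot wipe them out."""
        merged = dict(new_props)
        for name in ATOMIC_PROPERTIES:
            if name in existing_props and name not in merged:
                # Preserve the lock value written by another thread instead
                # of letting the purge delete it.
                merged[name] = existing_props[name]
        return merged
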
13:44 <dansmith> also,
13:45 <dansmith> it seems to me that the locations update is also very racy; I have seen multiple times where competing threads add or delete something from the locations list and drop something that another thread has added
13:46 <dansmith> because it pulls the list of locations, modifies it, and then saves it as a whole, which means if anything else added a location in the meantime, the save will erase that one
13:46 <abhishekk> got it
13:46 <dansmith> and I imagine that is something you can even hit via the api
13:46 <dansmith> so writing some more tests has led me down a rabbit hole of these things
13:46 <abhishekk> yes
13:47 <abhishekk> so we could use this same locking mechanism for location updates as well?
13:49 <dansmith> meaning use a locking property to prevent anyone else from changing the locations list? that would be very heavyweight and probably error-prone, especially if we left a lock in place after a failed update or something
13:50 <dansmith> the locations thing should be fixed by making it only add or update the location rows that need changing, I think
13:50 <dansmith> less desirable to make that use a lock,
13:50 <dansmith> but honestly I only debugged what was happening and kinda gave up for the week, so I haven't looked deeply into the location code
13:50 <abhishekk> hmm, makes sense
13:51 <abhishekk> there will be many corner cases though
13:53 <dansmith> I assume "losing" an image location is a really bad situation, right? it likely causes us to leak storage, may leave data around after delete, and/or cause data loss if someone deletes another location without realizing the last one has already been lost
13:55 <abhishekk> yeah, if so it's really bad
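
[A sketch of the row-level approach dansmith suggests for locations: compute which entries actually changed against the snapshot you read, and only insert or delete those rows, so a location added concurrently by another thread is never overwritten. The function and URL values are illustrative only, not Glance's actual location handling.]

    def plan_location_changes(snapshot, desired):
        """Return (to_add, to_delete) so only the changed rows get touched."""
        snapshot, desired = set(snapshot), set(desired)
        return desired - snapshot, snapshot - desired

    # Another thread added 'swift://c/o' after we read our snapshot; because
    # we only apply the rows we computed, that concurrent addition survives.
    stored_rows = {'rbd://pool/img/snap1', 'swift://c/o'}
    to_add, to_delete = plan_location_changes(
        snapshot={'rbd://pool/img/snap1'},
        desired={'rbd://pool/img/snap1', 'file:///img'})
    stored_rows |= to_add      # INSERT just these rows
    stored_rows -= to_delete   # DELETE just these rows
    assert 'swift://c/o' in stored_rows
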
13:57 *** k_mouza has joined #openstack-glance
13:59 <abhishekk> have you reported this issue in launchpad yet?
14:01 <abhishekk> AFAIK glance doesn't encourage cloud users to use the locations API (apart from services like nova or cinder)
14:02 *** k_mouza has quit IRC
14:03 <dansmith> nope, I just made some local notes and stopped for the week
14:03 <abhishekk> ack
14:03 *** jv_ has joined #openstack-glance
14:04 <dansmith> maybe it's not a race that is likely to happen in the real world because of workflow, but it can definitely happen if you have one import that is proceeding slowly and the lock gets busted so that another import starts doing work
14:05 <dansmith> which is kinda why I was hesitant to have this automatic lock breaking anyway, because writing the tests for that case, where the original import is still alive, has uncovered several things like this
14:05 <dansmith> so maybe not a practical problem in the past, but it's more likely we hit some of this in the future with things like copy-image
14:06 *** tkajinam has quit IRC
14:08 <abhishekk> yes, it can happen with copy-image as well as multiple imports with allow_failure set to True
14:08 <dansmith> yeah
14:27 *** k_mouza has joined #openstack-glance
14:35 <openstackgerrit> Dan Smith proposed openstack/glance master: Handle atomic image properties separately  https://review.opendev.org/746518
14:50 *** m75abrams has quit IRC
14:51 <abhishekk> dansmith, we have added os_glance_import_task to _reserved_properties, which will restrict it if someone tries to update it explicitly, right?
14:51 <dansmith> yeah, but that doesn't fix the image save problem
14:54 <openstackgerrit> Dan Smith proposed openstack/glance master: Functional test enhancement for lock busting  https://review.opendev.org/746531
14:56 <abhishekk> ok
14:58 <dansmith> abhishekk: it's because the image and its props are fetched as a whole from the db, mutated, and then saved as a whole
14:58 *** gyee has joined #openstack-glance
14:58 <dansmith> if someone else saved a lock key in between your get and your save, you will save back a prop dict without the lock in it, and purge_props=True will make sure the lock gets removed from the database
14:59 <dansmith> even if you weren't touching the lock property
14:59 <abhishekk> yes, got it
14:59 <dansmith> cool
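
[A toy illustration of the lost-update sequence dansmith describes: two actors read the whole property dict, so whichever one saves last wins and, under purge_props=True semantics, silently deletes the other's lock. The dicts here are just stand-ins, not Glance code.]

    props_in_db = {'foo': 'bar'}

    a_view = dict(props_in_db)                   # A: fetch image props
    b_view = dict(props_in_db)                   # B: fetch image props

    b_view['os_glance_import_task'] = 'task-B'   # B: take the import lock
    props_in_db = dict(b_view)                   # B: save whole dict

    a_view['foo'] = 'baz'                        # A: unrelated change
    props_in_db = dict(a_view)                   # A: save whole dict; anything
                                                 #    missing is purged

    assert 'os_glance_import_task' not in props_in_db   # B's lock is gone
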
15:03 <dansmith> abhishekk: so, the image location race: I put a mitigation for it into the second patch above, but I think it's another good reason to make a failed import revert, even if we're close to the end, because we've given up our lock, and trying to work on the image while another thread is working on it will be problematic
15:09 <abhishekk> dansmith, looking
15:16 <abhishekk> dansmith, so you mean that if assert_lock fails in the finish task, we should revert, right?
15:16 <dansmith> not just that,
15:17 <dansmith> but if you remove that assert, the task may have taken a very long time and completed set_data(), yet it no longer has the lock, and it will try to continue doing things without that lock, like set_image_location, or other stuff that the task _with_ the lock may also be doing
15:18 <dansmith> so I'm saying if our lock is busted, we should clean up, revert, etc., regardless of whether we think we can still proceed
15:20 *** belmoreira has quit IRC
15:21 <dansmith> in addition to the racy location updating, there would also be racy os_glance_importing_to_stores updating (although the context will already prohibit that)
15:25 <abhishekk> yep, makes sense
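
[A rough sketch of the "revert if the lock was busted" behaviour being discussed: after a long-running stage, re-check that the image still records our task as the lock holder and bail out otherwise. The class and function names are assumptions, not the actual flow code in the patches above.]

    class LockLostError(Exception):
        """Raised when another import has busted our lock."""

    def run_import_stage(image_repo, image_id, task_id, stage):
        stage()  # e.g. set_data(); may take a very long time
        image = image_repo.get(image_id)
        if image.extra_properties.get('os_glance_import_task') != task_id:
            # Our lock was stolen while we were working: clean up and revert
            # rather than racing the new owner on locations/stores updates.
            raise LockLostError(image_id)
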
15:30 * abhishekk going for dinner break, will be back in 30 mins
15:35 *** happyhemant has quit IRC
15:57 * abhishekk back from dinner
15:57 *** priteau has quit IRC
16:01 <openstackgerrit> Dan Smith proposed openstack/glance master: Functional test enhancement for lock busting  https://review.opendev.org/746531
16:01 <openstackgerrit> Dan Smith proposed openstack/glance master: Cleanup import status information after busting a lock  https://review.opendev.org/746554
16:03 <openstackgerrit> Rajat Dhasmana proposed openstack/glance_store master: Support Cinder multiple stores  https://review.opendev.org/746556
16:33 *** belmoreira has joined #openstack-glance
16:36 *** belmoreira has quit IRC
16:37 *** k_mouza has quit IRC
17:00 <dansmith> abhishekk: if you can think of other cases or things to check to add to test_images_import_locking, let me know
17:01 <abhishekk> dansmith, ack
17:01 <dansmith> I probably need to add some comments to that last functional enhancement too
17:03 <abhishekk> dansmith, yes, that would be helpful
17:11 <openstackgerrit> Dan Smith proposed openstack/glance master: Functional test enhancement for lock busting  https://review.opendev.org/746531
17:11 <openstackgerrit> Dan Smith proposed openstack/glance master: Cleanup import status information after busting a lock  https://review.opendev.org/746554
17:24 * abhishekk signing out for the day
17:28 *** nikparasyr has left #openstack-glance
18:01 <openstackgerrit> Erno Kuvaja proposed openstack/glance_store master: Ramp up rbd resize to avoid excessive calls  https://review.opendev.org/746579
18:08 <openstackgerrit> Luigi Toscano proposed openstack/glance master: zuul: use the new barbican simple-crypto job  https://review.opendev.org/746582
18:20 <jokke> mnaser: around?
18:20 <mnaser> hi
18:21 <jokke> mnaser: hey, mind doing a sanity check on https://review.opendev.org/746579 ? I think you might have a strong interest in that ;)
18:22 <mnaser> seems aggressive, but i don't know just how much of a performance benefit it gives
18:23 <jokke> based on the complaints we've received over the years, I'd assume quite a bit
18:24 <jokke> but I'd like you to have a look to make sure that we do not break anything
18:29 *** hoonetorg has joined #openstack-glance
18:30 <jokke> like I think I did ;)
18:40 <openstackgerrit> Erno Kuvaja proposed openstack/glance_store master: Ramp up rbd resize to avoid excessive calls  https://review.opendev.org/746579
19:13 *** belmoreira has joined #openstack-glance
19:19 *** smcginnis has quit IRC
19:21 *** smcginnis has joined #openstack-glance
19:51 <dansmith> mnaser: what is aggressive about the resize change? the fact that it scales up to 8G above the size of the image before shrinking?
19:52 <dansmith> I expect ceph can resize without much overhead, but we'd still be calling resize ~2500 times with the current 8M resizes on a 20G image
19:53 <dansmith> if anything, just doubling the chunk size almost seems like it isn't aggressive enough, because you'd still call it a bunch of times ramping 8M up into the GB range
19:54 <dansmith> I guess it's more like 14 calls for a 20G image vs. ~2500, so that's clearly an improvement :)
20:01 <mnaser> dansmith: yeah, the 8gb scale-up, but to be honest i don't know the implications of the resize; i agree it will happen a lot less often
20:02 <dansmith> yeah, me neither, I just expect the number of calls we have to make to a remote service is the limitation, not necessarily the actual resize itself
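
[Back-of-the-envelope arithmetic for the doubling resize ramp being discussed for the rbd driver; the cap value and step logic are assumptions for illustration, not the code in https://review.opendev.org/746579.]

    GiB = 1024 ** 3
    MiB = 1024 ** 2

    def resize_calls(total, step, double=False, cap=8 * GiB):
        """Count resize calls needed to grow an image to `total` bytes."""
        size = 0
        calls = 0
        while size < total:
            size += step
            calls += 1
            if double:
                step = min(step * 2, cap)
        return calls

    print(resize_calls(20 * GiB, 8 * MiB))               # fixed 8M steps: 2560 calls
    print(resize_calls(20 * GiB, 8 * MiB, double=True))  # doubling steps: about a dozen
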
20:43 *** belmoreira has quit IRC
22:33 *** rcernin has joined #openstack-glance
23:07 *** tkajinam has joined #openstack-glance
