Wednesday, 2020-08-12

01:17 *** Liang__ has joined #openstack-glance
01:49 *** gyee has quit IRC
01:54 *** Liang__ has quit IRC
03:10 *** rcernin has quit IRC
03:17 *** rcernin has joined #openstack-glance
04:16 *** m75abrams has joined #openstack-glance
04:33 *** evrardjp has quit IRC
04:33 *** evrardjp has joined #openstack-glance
06:04 *** johanssone has quit IRC
06:11 *** johanssone has joined #openstack-glance
06:20 *** nikparasyr has joined #openstack-glance
06:30 *** ratailor has joined #openstack-glance
07:07 *** rcernin has quit IRC
08:09 *** openstackgerrit has quit IRC
08:14 *** evrardjp has quit IRC
08:19 *** tkajinam has quit IRC
08:21 *** evrardjp has joined #openstack-glance
08:26 *** evrardjp has quit IRC
08:27 *** udesale has joined #openstack-glance
08:27 *** evrardjp has joined #openstack-glance
08:27 *** udesale has quit IRC
08:27 *** udesale has joined #openstack-glance
08:50 *** udesale has quit IRC
09:20 *** udesale has joined #openstack-glance
09:59 * abhishekk will be back at 1400 UTC
10:17 *** m75abrams has quit IRC
10:17 *** m75abrams has joined #openstack-glance
10:18 *** m75abrams has quit IRC
10:20 *** m75abrams has joined #openstack-glance
10:29 *** evrardjp has quit IRC
10:35 *** evrardjp has joined #openstack-glance
10:45 *** lpetrut has joined #openstack-glance
10:48 *** evrardjp has quit IRC
11:12 *** k_mouza has joined #openstack-glance
11:52 *** k_mouza has quit IRC
12:05 *** m75abrams has quit IRC
12:11 *** m75abrams has joined #openstack-glance
12:44 *** ratailor has quit IRC
13:53 <abhishekk> dansmith, https://review.opendev.org/#/c/745763/1 should we add a related-bug or partial-fix tag to this patch?
13:54 <dansmith> abhishekk: ah yeah I forgot in the commit message.. related-bug I think, I'm still working on trying to repro the actual problem
13:55 *** openstackgerrit has joined #openstack-glance
13:55 <openstackgerrit> Dan Smith proposed openstack/glance master: Make wait_for_fork() more robust against infinite deadlock  https://review.opendev.org/745763
13:55 <abhishekk> dansmith, ack
13:55 <dansmith> it ran all night until my machine went to sleep, which is cool, but this morning there are three glance api orphans
13:56 <abhishekk> ohh
14:07 *** m75abrams has quit IRC
14:11 *** Liang__ has joined #openstack-glance
14:19 *** Liang__ has quit IRC
14:25 *** Liang__ has joined #openstack-glance
14:42 <dansmith> abhishekk: remind me how image[stores] is populated?
14:43 <dansmith> because that copy_revert_lifecycle failure is still confusing me.. the task *does* fail because images_3 is gone,
14:43 <dansmith> so I'm wondering if file2 shows up in "stores" briefly before the revert removes something after the failure and the test sometimes grabs the stores list too soon?
14:43 <abhishekk> dansmith, we traverse image_locations on a get call and retrieve the stores from the location metadata
14:44 <dansmith> and what causes a location to show up?
14:44 *** udesale_ has joined #openstack-glance
14:44 <abhishekk> means?
14:44 <abhishekk> ok, got it
14:45 <abhishekk> we are adding the store as location metadata as soon as the image is imported into a store
14:45 <dansmith> but only after the copy completes or what?
14:46 <dansmith> store.add() is failing
14:46 <dansmith> as expected
14:46 <dansmith> so I'm not sure how it can show up in locations, unless we add it first, and then remove it if add fails?
14:46 *** udesale has quit IRC
14:48 <abhishekk> When the copy completes (which means store.add() finishes), it returns the location with metadata back to glance, and at the same time we add it to the image
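
For reference, a minimal sketch of the mechanism abhishekk describes (illustrative, not the actual Glance code): each location recorded after a successful store.add() carries the backend store id in its metadata, and image['stores'] is rebuilt from that list on each GET, which is why a location added before the revert removes it can briefly surface in "stores".

```python
# Minimal sketch of the mechanism described above -- not the actual Glance
# implementation.  Each location added after a successful store.add() carries
# the backend store id in its metadata; image['stores'] is derived from that
# list on every GET, so a location that is later reverted away can still be
# visible for a short window.
def stores_from_locations(locations):
    stores = []
    for loc in locations:
        store_id = loc.get('metadata', {}).get('store')
        if store_id:
            stores.append(store_id)
    return stores

locations = [
    {'url': 'file:///images_1/ff09...', 'metadata': {'store': 'file1'}},
    {'url': 'file:///images_2/ff09...', 'metadata': {'store': 'file2'}},
]
print(stores_from_locations(locations))  # ['file1', 'file2']
```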
14:48 <dansmith> okay, but in the test, the store.add _is_ failing, yet the test sees it show up in stores
14:49 <dansmith> 2020-08-11 16:57:34.333 3938 ERROR glance.async_.taskflow_executor   File "/home/zuul/src/opendev.org/openstack/glance/.tox/functional-py36/lib/python3.6/site-packages/glance_store/_drivers/filesystem.py", line 720, in add
14:49 <dansmith> 2020-08-11 16:57:34.333 3938 ERROR glance.async_.taskflow_executor     with open(filepath, 'wb') as f:
14:49 <dansmith> 2020-08-11 16:57:34.333 3938 ERROR glance.async_.taskflow_executor FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpsehacfe8/images_3/ff090e76-e8a1-4de2-b567-5055b6a6902d'
14:52 <abhishekk> So in the test we are copying the image to the file2 and file3 stores
14:52 <dansmith> right, but we have broken file2 so it will fail
14:53 <abhishekk> when the image is copied to the file2 store the location metadata is updated
14:53 <abhishekk> we have broken file3
14:53 <abhishekk> images_3 is for file3
14:53 <dansmith> right, sorry
14:54 <dansmith> so why are we asserting that file2 is not in stores?
14:55 <abhishekk> this test is to ensure that if the copy operation fails then on revert it will delete all the data from wherever it was copied successfully
14:55 <dansmith>         # Ensure data is not deleted from existing stores on failure
14:55 <dansmith> ^ does this mean only file1 should remain?
14:56 <dansmith> even still, this test *has* to race, because the wait loop waits for file2 to show up, and that will happen while the task is failing in file3
14:56 <abhishekk> yes
14:56 <dansmith> so if it waits until file2 succeeds, then grabs the stores before file3 can fail and has time to revert file2, then of course we're seeing file2 before the revert happens right?
14:57 <abhishekk> yes, that can be the case
14:57 <dansmith> okay I think that's what is happening here, and I was previously confused by the comment about what should be happening
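
Roughly the race being described, as a sketch against a stand-in get_image() helper (illustrative only, not the real functional test code): waiting just for file2 to appear in "stores" can observe the intermediate state before file3's failure reverts file2 back out.

```python
# Illustrative sketch of the race described above -- not the real glance
# functional test.  `get_image` is a stand-in for the test's image-show call.
import time

def wait_for(predicate, timeout=60, interval=0.2):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

def racy_check(get_image):
    # Passes as soon as file2 shows up in 'stores' -- which can happen
    # *before* file3 fails and the revert yanks file2 back out again.
    return wait_for(lambda: 'file2' in get_image()['stores'])

def robust_check(get_image):
    # Wait for a terminal state first (nothing left importing), then assert.
    wait_for(lambda: not get_image()['os_glance_importing_to_stores'])
    return 'file2' not in get_image()['stores']
```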
14:59 <abhishekk> I have added a comment there about deleting the file3 store, but I should have mentioned it should delete the data from file2 and not from the original store
14:59 *** priteau has joined #openstack-glance
15:00 * abhishekk will be in meeting for next 45 mins
15:07 *** Liang__ has quit IRC
15:09 <dansmith> we can't even watch both file2 and file3, because file3 will fail and go into os_glance_failed_import immediately, but the revert will happen after that, which removes the store location for file2
15:10 <dansmith> we really need the task I think, or to just wait
15:16 <abhishekk> I think wait
15:19 <dansmith> this also highlights how difficult this will be for a non-human user to use this api reliably :/
15:19 <dansmith> luckily nova currently only ever does a copy to one store at a time, but..
15:20 <dansmith> really seems like we should allow users to see their tasks at least
15:21 *** nikparasyr has left #openstack-glance
15:23 <abhishekk> but we are not asserting anything like this in the API, and that's why we came up with os_glance_failed_stores so the user can identify what happened
15:23 <dansmith> right, but they can't tell when the overall process has completed,
15:24 <dansmith> so something may go into stores and then be yanked right back out later when another store fails
15:24 <dansmith> I know they asked for it if they said all_stores_must_succeed=True,
15:24 <dansmith> it's just hard to tell when the revert is done
15:24 <abhishekk> hmm
15:24 <dansmith> because if file3 fails, it goes into the failed stores list immediately,
15:25 <abhishekk> yes
15:25 <dansmith> but the revert could take 30 minutes before file2 gets nuked
15:26 <abhishekk> I don't think it should take that long
15:26 <dansmith> it's disk IO.. it can always take long
15:27 <abhishekk> I guess that is also async work?
15:29 <dansmith> the delete? might be depending on the backend, but probably isn't even for filestore, right?
15:29 <abhishekk> hmm
15:29 <dansmith> but even in this trivial test, we're showing that it's unpredictable
15:40 <dansmith> hmm, I also don't see file3 go out of importing_stores and into import_failed when polling
15:47 *** jv_ has quit IRC
15:48 *** jv_ has joined #openstack-glance
15:49 <dansmith> ahh
15:50 <dansmith> yeah this bug goes back to ussuri at least
15:50 <dansmith> abhishekk: https://github.com/openstack/glance/blob/stable/ussuri/glance/async_/flows/api_image_import.py#L274
15:51 <dansmith> if all_stores_must_succeed=True, we will never put the store into os_glance_failed_import
15:51 <dansmith> because we raise right there, we don't continue to put it in that list, and revert doesn't either
15:52 <abhishekk> that was the decision taken at that moment
15:53 <dansmith> but it also stays in os_glance_importing_to_stores forever
15:53 <abhishekk> because we were reverting everything back
15:53 <dansmith> ...but we don't
15:54 <abhishekk> yeah os_glance_importing_to_stores shows the remaining stores forever
15:55 <dansmith> right so a polling client doesn't know that it failed
15:55 <dansmith> file2 doesn't show in failed, but disappears from stores, and file3 says "keep waiting, I'm still doing this" forever
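
A simplified sketch of the failure-handling shape being pointed at (not the actual glance/async_/flows/api_image_import.py code; only the property names come from the discussion above): when all_stores_must_succeed is set, the exception is raised before the store is moved from the importing list to the failed list, so a poller never sees it leave "importing".

```python
# Simplified sketch of the failure handling being discussed -- not the real
# api_image_import.py code.  The image property names match the ones
# mentioned above; everything else is illustrative.
def handle_store_failure(image, store_id, all_stores_must_succeed):
    if all_stores_must_succeed:
        # Bug shape: raising here aborts the flow before the bookkeeping
        # below runs, so store_id stays in os_glance_importing_to_stores
        # forever and never appears in os_glance_failed_import.
        raise RuntimeError('import to store %s failed' % store_id)
    # The bookkeeping a polling client relies on; with the bug it is only
    # ever reached when all_stores_must_succeed is False.
    importing = [s for s in image['os_glance_importing_to_stores'].split(',') if s]
    failed = [s for s in image['os_glance_failed_import'].split(',') if s]
    if store_id in importing:
        importing.remove(store_id)
    failed.append(store_id)
    image['os_glance_importing_to_stores'] = ','.join(importing)
    image['os_glance_failed_import'] = ','.join(failed)
```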
15:56 <abhishekk> hmm
15:56 <dansmith> since this lifecycle test doesn't assert the value of those importing/failed keys, it didn't notice
15:56 <dansmith> I re-wrote the poll loop to wait for all the things to finish, but.. it times out because file3 stays in importing forever :)
15:56 <abhishekk> ahh
15:57 <dansmith> (as nova will do)
15:57 <abhishekk> right
15:58 *** lpetrut has quit IRC
16:03 <dansmith> abhishekk: see if this makes sense: https://bugs.launchpad.net/glance/+bug/1891352
16:03 <openstack> Launchpad bug 1891352 in Glance "Failed import of one store will remain in progress forever if all_stores_must_succeed=True" [Undecided,New]
16:03 <abhishekk> thank you, will have a look after the meeting
16:04 <dansmith> ack, sorry to interrupt
16:07 <abhishekk> no problem :D
16:10 <abhishekk> dansmith, looks correct, at least the failed store should be listed in os_glance_staging_store even if all_stores_must_succeed=True
16:11 <dansmith> right, so I assume that you want *only* the failed store to appear in the failed list, not all the stores that got reverted, right?
16:11 <dansmith> it's a little confusing that way,
16:11 <dansmith> because a store goes into the stores list and then disappears, but *another* store goes into the failed list.. so if you're using the image, not the one doing the import, it is hard to tell why a store disappeared
16:12 <dansmith> but I can understand why it's important to see which store failed, it would just be nice if there was more info
16:12 <abhishekk> I think the failed store will do, as we have documented that on failure everything will be reverted if all_stores_must_succeed is True
16:14 <dansmith> yeah, from the perspective of the importer, it makes sense, just not the *other* users of an image
16:14 * abhishekk going for dinner
16:47 * abhishekk back
16:52 *** udesale_ has quit IRC
16:53 <openstackgerrit> Dan Smith proposed openstack/glance master: Functional reproducer for bug 1891352  https://review.opendev.org/745932
16:53 <openstack> bug 1891352 in Glance "Failed import of one store will remain in progress forever if all_stores_must_succeed=True" [Undecided,New] https://launchpad.net/bugs/1891352
16:53 <openstackgerrit> Dan Smith proposed openstack/glance master: Fix import failure status reporting when all_stores_must_succeed=True  https://review.opendev.org/745933
16:53 <dansmith> abhishekk: ^
16:53 <abhishekk> dansmith, looking
16:55 <abhishekk> dansmith, have you pushed those as dependencies?
16:55 <dansmith> of each other? yes
16:56 <abhishekk> something went wrong,
16:56 <abhishekk> the later patch is in merge conflict
16:57 <dansmith> there's an actual merge conflict with master, that's why
16:58 <openstackgerrit> Dan Smith proposed openstack/glance master: Functional reproducer for bug 1891352  https://review.opendev.org/745932
16:58 <openstack> bug 1891352 in Glance "Failed import of one store will remain in progress forever if all_stores_must_succeed=True" [Undecided,In progress] https://launchpad.net/bugs/1891352
16:58 <openstackgerrit> Dan Smith proposed openstack/glance master: Fix import failure status reporting when all_stores_must_succeed=True  https://review.opendev.org/745933
16:59 <abhishekk> ok, now I have one question
16:59 <abhishekk> the "Functional reproducer for bug 1891352" patch will fail at the moment, right?
16:59 <openstack> bug 1891352 in Glance "Failed import of one store will remain in progress forever if all_stores_must_succeed=True" [Undecided,In progress] https://launchpad.net/bugs/1891352
17:00 <abhishekk> forget it
17:00 <abhishekk> got it, confused a little bit by the note
17:02 <dansmith> this is a very common pattern we have in nova with things like this, to make it clear that the bugfix actually fixes the bug, by first asserting the broken behavior
17:06 <abhishekk> ack
17:06 <abhishekk> added one comment on the patch
17:08 <dansmith> yep, thanks, I had originally not removed that for that reason and then forgot why.. duh :)
17:09 <abhishekk> np :D
17:11 <dansmith> fix for that, and the fix I was working on for the polling in that test coming in a sec
17:12 <abhishekk> ack
17:13 <openstackgerrit> Dan Smith proposed openstack/glance master: Fix import failure status reporting when all_stores_must_succeed=True  https://review.opendev.org/745933
17:13 <openstackgerrit> Dan Smith proposed openstack/glance master: Poll for final state on test_copy_image_revert_lifecycle()  https://review.opendev.org/745937
17:27 *** priteau has quit IRC
17:27 <openstackgerrit> Dan Smith proposed openstack/glance master: Implement time-limited import locking  https://review.opendev.org/743597
17:27 <openstackgerrit> Dan Smith proposed openstack/glance master: Add functional test for task status updating  https://review.opendev.org/745392
17:27 <openstackgerrit> Dan Smith proposed openstack/glance master: Move SynchronousAPIBase to a generalized location  https://review.opendev.org/745566
17:56 *** gyee has joined #openstack-glance
18:07 <abhishekk> dansmith, does your change add all the stores which were reverted to the failed list?
18:13 *** vkmc_ has joined #openstack-glance
18:13 *** knikolla_ has joined #openstack-glance
18:14 * abhishekk signing out for the day
18:14 <dansmith> abhishekk: no, and the test asserts that it doesn't :)
18:15 <abhishekk> dansmith, ack, I thought it was added in revert so that might be the case
18:16 <dansmith> abhishekk: that's why it checks the result, which the taskflow docs say is only a Failure() if *we* are the task that failed,
18:16 <dansmith> and the tests confirm that it gets skipped in that case
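
A minimal sketch of the taskflow behaviour being relied on here (illustrative; ImportToStore and mark_store_failed are hypothetical names, not the actual Glance task): taskflow hands each task its own outcome as the `result` argument of revert(), and it is a Failure only when that task itself raised, so revert can tell "I failed" apart from "I am being rolled back because another task failed".

```python
# Illustrative taskflow task -- not the actual glance import task.
from taskflow import task
from taskflow.types import failure

def mark_store_failed(store_id):
    """Hypothetical helper: record store_id in the image's failed-import list."""
    pass

class ImportToStore(task.Task):
    def __init__(self, store_id, **kwargs):
        super().__init__(**kwargs)
        self.store_id = store_id

    def execute(self):
        # ... copy the image data into self.store_id ...
        pass

    def revert(self, result, **kwargs):
        # taskflow passes this task's own outcome as `result`; it is a
        # Failure only if *this* task raised.  When we are reverted because
        # some other store's task failed, the failed-list bookkeeping is
        # skipped, so only the store that actually failed is recorded.
        if isinstance(result, failure.Failure):
            mark_store_failed(self.store_id)
```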
18:16 <abhishekk> cool
18:17 <abhishekk> it was saying we are the store that failed, so add us to the failed list
18:17 <abhishekk> that's why I was confused
18:19 <dansmith> I can clarify that comment for sure,
18:19 <dansmith> but to me it makes it clear already.. maybe say explicitly "if we are reverting because another task failed, we will not.." ?
18:20 <abhishekk> yeah, makes sense now
18:20 *** knikolla has quit IRC
18:20 *** vkmc has quit IRC
18:20 *** knikolla_ is now known as knikolla
18:20 *** vkmc_ is now known as vkmc
18:26 <abhishekk> I will review that tomorrow morning, my time
18:27 <dansmith> cool, thanks
18:27 <abhishekk> thank you for the quick fix
18:27 <dansmith> also I think I figured out the process leak thing,
18:27 <abhishekk> and patience
18:27 <dansmith> and it seems that it's actually just a bug in test_reload(), but will keep looking
18:27 <abhishekk> great
18:27 <dansmith> now, go to sleep :)
18:27 <abhishekk> yep, good night o/~
19:23 *** k_mouza has joined #openstack-glance
19:27 *** k_mouza has quit IRC
19:48 <dansmith> smcginnis: if you're looking for easy things, this helps us not deadlock in the functional tests: https://review.opendev.org/#/c/745763/
19:49 <dansmith> I'm debugging the actual issue in the referenced bug, but that at least avoids the test base blocking forever instead of cleaning up
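
For context, a rough sketch of what a deadlock-resistant wait_for_fork() helper can look like (hypothetical; not necessarily the shape of the patch linked above): poll the child with WNOHANG under a deadline instead of blocking in os.waitpid(), and kill it if it never exits.

```python
# Rough sketch of a deadlock-resistant wait-for-child helper -- hypothetical,
# not necessarily how the merged patch does it.  The idea is to avoid
# blocking forever in os.waitpid() when the forked API worker never exits.
import os
import signal
import time

def wait_for_fork(pid, expected_exitcode=0, timeout=30):
    deadline = time.time() + timeout
    while time.time() < deadline:
        # WNOHANG returns (0, 0) while the child is still running.
        waited, status = os.waitpid(pid, os.WNOHANG)
        if waited == pid:
            return os.WEXITSTATUS(status) == expected_exitcode
        time.sleep(0.1)
    # Child never exited: kill it so the test cleans up instead of hanging.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return False
```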
20:02 <smcginnis> Not deadlocking sounds like a useful feature.
20:15 <dansmith> should I write a spec? :P
21:13 <openstackgerrit> Merged openstack/glance master: Make wait_for_fork() more robust against infinite deadlock  https://review.opendev.org/745763
22:43 *** rcernin has joined #openstack-glance
22:43 *** rcernin has quit IRC
22:43 *** rcernin has joined #openstack-glance
22:57 *** tkajinam has joined #openstack-glance
23:13 *** openstackgerrit has quit IRC
23:35 *** whoami-rajat has quit IRC
