*** Liang__ has joined #openstack-glance | 01:17 | |
*** gyee has quit IRC | 01:49 | |
*** Liang__ has quit IRC | 01:54 | |
*** rcernin has quit IRC | 03:10 | |
*** rcernin has joined #openstack-glance | 03:17 | |
*** m75abrams has joined #openstack-glance | 04:16 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-glance | 04:33 | |
*** johanssone has quit IRC | 06:04 | |
*** johanssone has joined #openstack-glance | 06:11 | |
*** nikparasyr has joined #openstack-glance | 06:20 | |
*** ratailor has joined #openstack-glance | 06:30 | |
*** rcernin has quit IRC | 07:07 | |
*** openstackgerrit has quit IRC | 08:09 | |
*** evrardjp has quit IRC | 08:14 | |
*** tkajinam has quit IRC | 08:19 | |
*** evrardjp has joined #openstack-glance | 08:21 | |
*** evrardjp has quit IRC | 08:26 | |
*** udesale has joined #openstack-glance | 08:27 | |
*** evrardjp has joined #openstack-glance | 08:27 | |
*** udesale has quit IRC | 08:27 | |
*** udesale has joined #openstack-glance | 08:27 | |
*** udesale has quit IRC | 08:50 | |
*** udesale has joined #openstack-glance | 09:20 | |
* abhishekk will be back at 1400 UTC | 09:59 | |
*** m75abrams has quit IRC | 10:17 | |
*** m75abrams has joined #openstack-glance | 10:17 | |
*** m75abrams has quit IRC | 10:18 | |
*** m75abrams has joined #openstack-glance | 10:20 | |
*** evrardjp has quit IRC | 10:29 | |
*** evrardjp has joined #openstack-glance | 10:35 | |
*** lpetrut has joined #openstack-glance | 10:45 | |
*** evrardjp has quit IRC | 10:48 | |
*** k_mouza has joined #openstack-glance | 11:12 | |
*** k_mouza has quit IRC | 11:52 | |
*** m75abrams has quit IRC | 12:05 | |
*** m75abrams has joined #openstack-glance | 12:11 | |
*** ratailor has quit IRC | 12:44 | |
abhishekk | dansmith, https://review.opendev.org/#/c/745763/1 should we add related bug or partial fix tag to this patch? | 13:53 |
dansmith | abhishekk: ah yeah I forgot in the commit message.. related-bug I think, I'm still working on trying to repro the actual problem | 13:54 |
*** openstackgerrit has joined #openstack-glance | 13:55 | |
openstackgerrit | Dan Smith proposed openstack/glance master: Make wait_for_fork() more robust against infinite deadlock https://review.opendev.org/745763 | 13:55 |
abhishekk | dansmith, ack | 13:55 |
dansmith | it ran all night until my machine went to sleep, which is cool, but this morning there are three glance api orphans | 13:55 |
abhishekk | ohh | 13:56 |
*** m75abrams has quit IRC | 14:07 | |
*** Liang__ has joined #openstack-glance | 14:11 | |
*** Liang__ has quit IRC | 14:19 | |
*** Liang__ has joined #openstack-glance | 14:25 | |
dansmith | abhishekk: remind me how image[stores] is populated? | 14:42 |
dansmith | because that copy_revert_lifecycle failure is still confusing me.. the task *does* fail because images_3 is gone, | 14:43 |
dansmith | so I'm wondering if file2 shows up in "stores" briefly before the revert removes it after the failure, and the test sometimes grabs the stores list too soon? | 14:43 |
abhishekk | dansmith, we traverse image_locations on the get call and retrieve the stores from the location metadata | 14:43 |
dansmith | and what causes a location to show up? | 14:44 |
*** udesale_ has joined #openstack-glance | 14:44 | |
abhishekk | means? | 14:44 |
abhishekk | ok, got it | 14:44 |
abhishekk | we add the store to the location metadata as soon as the image is imported into that store | 14:45 |
dansmith | but only after the copy completes or what? | 14:45 |
dansmith | store.add() is failing | 14:46 |
dansmith | as expected | 14:46 |
dansmith | so I'm not sure how it can show up in locations, unless we add it first, and then remove it if add fails? | 14:46 |
*** udesale has quit IRC | 14:46 | |
abhishekk | When the copy completes (which means store.add() finishes), it returns the location with its metadata back to glance and at the same time we add it to the image | 14:48 |
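For readers following along, a minimal sketch of what the above amounts to, assuming a simplified location structure (the function name and dict layout are illustrative, not Glance's actual internals): each location added after a successful store.add() carries the backend store id in its metadata, and image['stores'] is derived from that.

    # Minimal sketch, assuming a simplified location structure; not Glance's
    # actual implementation.
    def stores_from_locations(locations):
        """Collect the backend store ids recorded on an image's locations."""
        stores = []
        for loc in locations:
            store_id = loc.get('metadata', {}).get('store')
            if store_id and store_id not in stores:
                stores.append(store_id)
        return stores

    # Only stores whose copy finished (and whose location was kept) show up:
    locations = [
        {'url': 'file:///opt/stack/data/glance/images/abc', 'metadata': {'store': 'file1'}},
        {'url': 'file:///tmp/tmpsehacfe8/images_2/abc', 'metadata': {'store': 'file2'}},
    ]
    assert stores_from_locations(locations) == ['file1', 'file2']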
dansmith | okay, but in the test, the store.add _is_ failing, yet the test sees it show up in stores | 14:48 |
dansmith | 2020-08-11 16:57:34.333 3938 ERROR glance.async_.taskflow_executor File "/home/zuul/src/opendev.org/openstack/glance/.tox/functional-py36/lib/python3.6/site-packages/glance_store/_drivers/filesystem.py", line 720, in add | 14:49 |
dansmith | 2020-08-11 16:57:34.333 3938 ERROR glance.async_.taskflow_executor with open(filepath, 'wb') as f: | 14:49 |
dansmith | 2020-08-11 16:57:34.333 3938 ERROR glance.async_.taskflow_executor FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpsehacfe8/images_3/ff090e76-e8a1-4de2-b567-5055b6a6902d' | 14:49 |
abhishekk | So in the test we are copying the image to the file2 and file3 stores | 14:52 |
dansmith | right, but we have broken file2 so it will fail | 14:52 |
abhishekk | when image is copied to file2 store location metadata is updated | 14:53 |
abhishekk | we have broken file3 | 14:53 |
abhishekk | images_3 is for file3 | 14:53 |
dansmith | right, sorry | 14:53 |
dansmith | so why are we asserting that file2 is not in stores? | 14:54 |
abhishekk | this test is to ensure that if the copy operation fails, then on revert it will delete all the data from the stores it was copied to successfully | 14:55 |
dansmith | # Ensure data is not deleted from existing stores on failure | 14:55 |
dansmith | ^ does this mean only file1 should remain? | 14:55 |
dansmith | even still, this test *has* to race, because the wait loop waits for file2 to show up, and that will happen while the task is failing in file3 | 14:56 |
abhishekk | yes | 14:56 |
dansmith | so if it waits until file2 succeeds, then grabs the stores before file3 can fail and has time to revert file2, then of course we're seeing file2 before the revert happens right? | 14:56 |
abhishekk | yes, that can be case | 14:57 |
dansmith | okay I think that's what is happening here, and I was previously confused about the comment about what should be happening | 14:57 |
abhishekk | I added a comment there about deleting the file3 store, but I should have mentioned that it should delete the data from file2 and not from the original store | 14:59 |
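In other words, the end state the test means to check once the revert has fully finished looks roughly like this (a hedged sketch; the helper and the sample dict are made up, not the actual functional-test code):

    def check_revert_outcome(image):
        # 'stores' is a comma-separated string in the image-show response
        stores = image['stores'].split(',') if image.get('stores') else []
        assert 'file1' in stores      # data in the original store must survive
        assert 'file2' not in stores  # successfully copied data is removed on revert
        assert 'file3' not in stores  # the broken store never received the image

    # Example of the desired terminal state:
    check_revert_outcome({'id': 'ff090e76-e8a1-4de2-b567-5055b6a6902d',
                          'status': 'active',
                          'stores': 'file1'})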
*** priteau has joined #openstack-glance | 14:59 | |
* abhishekk will be in meeting for next 45 mins | 15:00 | |
*** Liang__ has quit IRC | 15:07 | |
dansmith | we can't even watch both file2 and file3, because file3 will fail and go into os_glance_failed_import immediately, but the revert will happen after that, which removes the store location for file2 | 15:09 |
dansmith | we really need the task I think, or to just wait | 15:10 |
abhishekk | I think wait | 15:16 |
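The "just wait" idea amounts to polling for a terminal state rather than returning as soon as file2 appears in image['stores'] (which can happen before the file3 failure triggers the revert that removes file2 again). A rough sketch, assuming a get_image callable that returns the image-show dict; this is not the actual test helper:

    import time

    def wait_for_copy_settled(get_image, image_id, stores=('file2', 'file3'),
                              timeout=60, interval=0.5):
        """Poll until none of the requested stores is still importing."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            image = get_image(image_id)
            importing = set(filter(
                None, image.get('os_glance_importing_to_stores', '').split(',')))
            if not importing & set(stores):
                return image
            time.sleep(interval)
        raise AssertionError('copy-image did not reach a final state in time')

As the rest of the discussion shows, this only works if the failing store actually leaves os_glance_importing_to_stores; with the bug filed later, file3 never does, so a loop like this times out.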
dansmith | this also highlights how difficult it will be for a non-human user to use this API reliably :/ | 15:19 |
dansmith | luckily nova currently only ever does a copy to one store at a time, but.. | 15:19 |
dansmith | really seems like we should allow users to see their tasks at least | 15:20 |
*** nikparasyr has left #openstack-glance | 15:21 | |
abhishekk | but we are not asserting anything like this in the API, and that's why we came up with os_glance_failed_stores so the user can identify what happened | 15:23 |
dansmith | right, but they can't tell when the overall process has completed, | 15:23 |
dansmith | so something may go into stores and then be yanked right back out later when another store fails | 15:24 |
dansmith | I know they asked for it if they said all_stores_must_succeed=True, | 15:24 |
dansmith | it's just hard to tell when the revert is done | 15:24 |
abhishekk | hmm | 15:24 |
dansmith | because if file3 fails, it goes into the failed stores list immediately, | 15:24 |
abhishekk | yes | 15:25 |
dansmith | but the revert could take 30 minutes before file2 gets nuked | 15:25 |
abhishekk | I don't think it should take that long | 15:26 |
dansmith | it's disk IO.. it can always take long | 15:26 |
abhishekk | I guess that is also async work? | 15:27 |
dansmith | the delete? might be depending on the backend, but probably isn't even for filestore, right? | 15:29 |
abhishekk | hmm | 15:29 |
dansmith | but even in this trivial test, we're showing that it's unpredictable | 15:29 |
dansmith | hmm, I also don't see file3 go out of importing_stores and into import_failed when polling | 15:40 |
*** jv_ has quit IRC | 15:47 | |
*** jv_ has joined #openstack-glance | 15:48 | |
dansmith | ahh | 15:49 |
dansmith | yeah this bug goes back to ussuri at least | 15:50 |
dansmith | abhishekk: https://github.com/openstack/glance/blob/stable/ussuri/glance/async_/flows/api_image_import.py#L274 | 15:50 |
dansmith | if all_stores_must_succeed=True, we will never put the store into os_glance_failed_import | 15:51 |
dansmith | because we raise right there, we don't continue on to put it in that list, and revert doesn't either | 15:51 |
abhishekk | that was the decision taken at that moment | 15:52 |
dansmith | but it also stays in os_glance_importing_to_stores forever | 15:53 |
abhishekk | because we were reverting everything back | 15:53 |
dansmith | ...but we don't | 15:53 |
abhishekk | yeah, os_glance_importing_to_stores shows the remaining stores forever | 15:54 |
dansmith | right so a polling client doesn't know that it failed | 15:55 |
dansmith | file2 doesn't show in failed, but disappears from stores, and file3 says "keep waiting, I'm still doing this" forever | 15:55 |
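A paraphrased sketch of the control flow being described, with hypothetical helper names (action.copy_to_store and friends are stand-ins, not the real api_image_import.py code):

    def import_to_store(action, store_id, all_stores_must_succeed):
        try:
            action.copy_to_store(store_id)        # roughly: glance_store add()
        except Exception:
            if all_stores_must_succeed:
                # Bug: raising here means store_id is never removed from
                # os_glance_importing_to_stores nor added to
                # os_glance_failed_import, so a polling client sees it as
                # "importing" forever.
                raise
            # best-effort mode: record the failure and move on
            action.remove_importing_stores([store_id])
            action.add_failed_stores([store_id])

The fix proposed below in https://review.opendev.org/745933 is, roughly, to do that importing-to-failed bookkeeping for the failing store before raising.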
abhishekk | hmm | 15:56 |
dansmith | since this lifecycle test doesn't assert the value of those importing/failed keys, it didn't notice | 15:56 |
dansmith | I re-wrote the poll loop to wait for all the things to finish, but.. it times out because file3 stays in importing forever :) | 15:56 |
abhishekk | ahh | 15:56 |
dansmith | (as nova will do) | 15:57 |
abhishekk | right | 15:57 |
*** lpetrut has quit IRC | 15:58 | |
dansmith | abhishekk: see if this makes sense: https://bugs.launchpad.net/glance/+bug/1891352 | 16:03 |
openstack | Launchpad bug 1891352 in Glance "Failed import of one store will remain in progress forever if all_stores_must_succeed=True" [Undecided,New] | 16:03 |
abhishekk | thank you, will have a look after meeting | 16:03 |
dansmith | ack, sorry to interrupt | 16:04 |
abhishekk | no problem :D | 16:07 |
abhishekk | dansmith, looks correct, at least the failed store should be listed in os_glance_failed_import even if all_stores_must_succeed=True | 16:10 |
dansmith | right, so I assume that you want *only* the failed store to appear in the failed list, not all the stores that got reverted, right? | 16:11 |
dansmith | it's a little confusing that way, | 16:11 |
dansmith | because a store goes into the stores list and then disappears, but *another* store goes into the failed list.. so if you're using the image, not the one doing the import, it is hard to tell why a store disappeared | 16:11 |
dansmith | but I can understand why it's important to see which store failed, it would just be nice if there was more info | 16:12 |
abhishekk | I think the failed store will do, as we have documented that on failure everything will be reverted if all_stores_must_succeed is True | 16:12 |
dansmith | yeah, from the perspective of the importer, it makes sense, just not the *other* users of an image | 16:14 |
* abhishekk going for dinner | 16:14 | |
* abhishekk back | 16:47 | |
*** udesale_ has quit IRC | 16:52 | |
openstackgerrit | Dan Smith proposed openstack/glance master: Functional reproducer for bug 1891352 https://review.opendev.org/745932 | 16:53 |
openstack | bug 1891352 in Glance "Failed import of one store will remain in progress forever if all_stores_must_succeed=True" [Undecided,New] https://launchpad.net/bugs/1891352 | 16:53 |
openstackgerrit | Dan Smith proposed openstack/glance master: Fix import failure status reporting when all_stores_must_succeed=True https://review.opendev.org/745933 | 16:53 |
dansmith | abhishekk: ^ | 16:53 |
abhishekk | dansmith, looking | 16:53 |
abhishekk | dansmith, have you pushed those as dependencies? | 16:55 |
dansmith | of each other? yes | 16:55 |
abhishekk | something went wrong, | 16:56 |
abhishekk | the later patch is in merge conflict | 16:56 |
dansmith | there's an actual merge conflict with master, that's why | 16:57 |
openstackgerrit | Dan Smith proposed openstack/glance master: Functional reproducer for bug 1891352 https://review.opendev.org/745932 | 16:58 |
openstack | bug 1891352 in Glance "Failed import of one store will remain in progress forever if all_stores_must_succeed=True" [Undecided,In progress] https://launchpad.net/bugs/1891352 | 16:58 |
openstackgerrit | Dan Smith proposed openstack/glance master: Fix import failure status reporting when all_stores_must_succeed=True https://review.opendev.org/745933 | 16:58 |
abhishekk | ok, now I have one question | 16:59 |
abhishekk | the "Functional reproducer for bug 1891352" patch will fail at the moment, right? | 16:59 |
openstack | bug 1891352 in Glance "Failed import of one store will remain in progress forever if all_stores_must_succeed=True" [Undecided,In progress] https://launchpad.net/bugs/1891352 | 16:59 |
abhishekk | forget it | 17:00 |
abhishekk | got it, I was confused a little bit by the note | 17:00 |
dansmith | this is a very common pattern we have in nova with things like this, to make it clear that the bugfix actually fixes the bug, by first asserting the broken behavior | 17:02 |
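The pattern looks roughly like this (a hypothetical, self-contained illustration, not the actual glance functional test): the reproducer patch encodes the currently broken behavior so it passes on master, and the follow-up fix updates the same assertions to the corrected behavior.

    import unittest

    class TestImportFailureReporting(unittest.TestCase):
        def _import_with_broken_store(self):
            # Placeholder: the real test drives copy-image against a
            # deliberately broken file3 store and returns the image dict.
            return {'os_glance_importing_to_stores': 'file3',
                    'os_glance_failed_import': ''}

        def test_failed_store_reporting(self):
            image = self._import_with_broken_store()
            # Reproducer: assert the buggy status, documenting the bug...
            self.assertIn('file3', image['os_glance_importing_to_stores'])
            self.assertEqual('', image['os_glance_failed_import'])
            # ...the fix patch then flips these to expect file3 in
            # os_glance_failed_import and out of importing_to_stores.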
abhishekk | ack | 17:06 |
abhishekk | added one comment on the patch | 17:06 |
dansmith | yep, thanks, I had originally not removed that for that reason and then forgot why.. duh :) | 17:08 |
abhishekk | np :D | 17:09 |
dansmith | fix for that, and the fix I was working on for the polling in that test coming in a sec | 17:11 |
abhishekk | ack | 17:12 |
openstackgerrit | Dan Smith proposed openstack/glance master: Fix import failure status reporting when all_stores_must_succeed=True https://review.opendev.org/745933 | 17:13 |
openstackgerrit | Dan Smith proposed openstack/glance master: Poll for final state on test_copy_image_revert_lifecycle() https://review.opendev.org/745937 | 17:13 |
*** priteau has quit IRC | 17:27 | |
openstackgerrit | Dan Smith proposed openstack/glance master: Implement time-limited import locking https://review.opendev.org/743597 | 17:27 |
openstackgerrit | Dan Smith proposed openstack/glance master: Add functional test for task status updating https://review.opendev.org/745392 | 17:27 |
openstackgerrit | Dan Smith proposed openstack/glance master: Move SynchronousAPIBase to a generalized location https://review.opendev.org/745566 | 17:27 |
*** gyee has joined #openstack-glance | 17:56 | |
abhishekk | dansmith, does your change add all the stores which were reverted to the failed list? | 18:07 |
*** vkmc_ has joined #openstack-glance | 18:13 | |
*** knikolla_ has joined #openstack-glance | 18:13 | |
* abhishekk signing out for the day | 18:14 | |
dansmith | abhishekk: no, and the test asserts that it doesn't :) | 18:14 |
abhishekk | dansmith, ack, I thought it was added in revert so that might be the case | 18:15 |
dansmith | abhishekk: that's why it checks the result, which taskflow docs say is only a Failure() if *we* are the task that failed, | 18:16 |
dansmith | and the tests confirm that it gets skipped in that case | 18:16 |
abhishekk | cool | 18:16 |
abhishekk | the comment was saying "we are the store that failed, so add us to the failed list" | 18:17 |
abhishekk | that's why I was confused | 18:17 |
dansmith | I can clarify that comment for sure, | 18:19 |
dansmith | but to me it makes it clear already.. maybe say explicitly "if we are reverting because another task failed, we will not.." ? | 18:19 |
abhishekk | yeah, makes sense now | 18:20 |
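A hedged sketch of the revert pattern under discussion (simplified; the action helpers are assumptions, not the exact Glance task): TaskFlow calls revert() on every completed task when the flow unwinds, but per the taskflow docs the result argument is a Failure only for the task that actually raised, so only the failing store records itself in the failed list.

    from taskflow import task
    from taskflow.types import failure

    class ImportToStore(task.Task):
        def __init__(self, store_id, action):
            super().__init__(name='ImportToStore-%s' % store_id)
            self.store_id = store_id
            self.action = action

        def execute(self):
            self.action.copy_to_store(self.store_id)

        def revert(self, result, **kwargs):
            if isinstance(result, failure.Failure):
                # We are the task that raised: report this store as failed.
                self.action.remove_importing_stores([self.store_id])
                self.action.add_failed_stores([self.store_id])
            # If we are only being reverted because another task failed,
            # skip the "failed" bookkeeping and just undo our own work
            # (cleanup omitted in this sketch).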
*** knikolla has quit IRC | 18:20 | |
*** vkmc has quit IRC | 18:20 | |
*** knikolla_ is now known as knikolla | 18:20 | |
*** vkmc_ is now known as vkmc | 18:20 | |
abhishekk | I will review that tomorrow morning my time now | 18:26 |
dansmith | cool, thanks | 18:27 |
abhishekk | thank you for quick fix | 18:27 |
dansmith | also I think I figured out the process leak thing, | 18:27 |
abhishekk | and patience | 18:27 |
dansmith | and it seems that it's actually just a bug in test_reload(), but will keep looking | 18:27 |
abhishekk | great | 18:27 |
dansmith | now, go to sleep :) | 18:27 |
abhishekk | yep, good night o/~ | 18:27 |
*** k_mouza has joined #openstack-glance | 19:23 | |
*** k_mouza has quit IRC | 19:27 | |
dansmith | smcginnis: if you're looking for easy things, this helps us not deadlock in the functional tests: https://review.opendev.org/#/c/745763/ | 19:48 |
dansmith | I'm debugging the actual issue in the referenced bug, but that at least avoids the test base blocking forever instead of cleaning up | 19:49 |
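For context, one generic way to make that kind of wait robust (a sketch of the idea, assuming a plain forked child pid; not necessarily what the merged patch does): poll with WNOHANG and kill the child once a deadline passes instead of blocking in waitpid() forever.

    import os
    import signal
    import time

    def wait_for_child(pid, timeout=30, interval=0.1):
        """Wait for a forked child, but never block forever."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            waited_pid, status = os.waitpid(pid, os.WNOHANG)
            if waited_pid == pid:
                return os.WEXITSTATUS(status)
            time.sleep(interval)
        # Child did not exit in time: kill it so the test cleans up instead
        # of hanging and leaving orphan glance-api processes behind.
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
        raise RuntimeError('child %d did not exit within %ds' % (pid, timeout))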
smcginnis | Not deadlocking sounds like a useful feature. | 20:02 |
dansmith | should I write a spec? :P | 20:15 |
openstackgerrit | Merged openstack/glance master: Make wait_for_fork() more robust against infinite deadlock https://review.opendev.org/745763 | 21:13 |
*** rcernin has joined #openstack-glance | 22:43 | |
*** rcernin has quit IRC | 22:43 | |
*** rcernin has joined #openstack-glance | 22:43 | |
*** tkajinam has joined #openstack-glance | 22:57 | |
*** openstackgerrit has quit IRC | 23:13 | |
*** whoami-rajat has quit IRC | 23:35 |