Wednesday, 2021-08-18

opendevreviewAbhishek Kekane proposed openstack/glance master: DNM [Upload policy] functional-py36 timeout test  https://review.opendev.org/c/openstack/glance/+/804898 06:04
opendevreviewAbhishek Kekane proposed openstack/glance_store master: Xena cycle Release Notes  https://review.opendev.org/c/openstack/glance_store/+/804952 06:38
opendevreviewMridula Joshi proposed openstack/glance master: It was observed that md-tag-create-multiple (/v2/metadefs/namespaces/{namespace_name}/tags)  API overwrites existing tags for specified namespace rather than creating new one in addition to the existing tags. This patch resolves the issue by not deleting the previous data and adding the new tags to existing ones.  https://review.opendev.org/c/openstack/glance/+/804966 08:33
abhishekkSummary of timeout issue;08:45
abhishekkLocally I ran functional-py36 on upload patch more than 80 times - No failure08:45
abhishekkUpstream DNM functional-py36 on upload is failing 8 times out of 10 (functional-py38 passes every time)08:45
abhishekkUpstream DNM functional-py36 on Master is passing 100%08:45
abhishekkTo unblock progress, I am going to pull the upload policy change patch out of the chain for the time being and then keep looking for the possible failure.08:45
abhishekkdansmith, croelandt, lbragstad ^^^08:45
opendevreviewAbhishek Kekane proposed openstack/glance master: Check download_image policy in the API  https://review.opendev.org/c/openstack/glance/+/804547 09:27
opendevreviewAbhishek Kekane proposed openstack/glance master: Check policies for staging operation in API  https://review.opendev.org/c/openstack/glance/+/804558 09:27
abhishekkAfter rebasing on master branch both have timed out10:49
opendevreviewAbhishek Kekane proposed openstack/glance master: Check policies for delete image fro store in API  https://review.opendev.org/c/openstack/glance/+/804585 11:17
opendevreviewAbhishek Kekane proposed openstack/glance master: Check policies for delete image for store in API  https://review.opendev.org/c/openstack/glance/+/804585 11:21
JqckBHello, I have a small question that I can't find an answer to: is it possible to get something like an image alias? I would like to keep as many images as the distributions have, but expose only one target to my users.11:47
JqckBSomething like https://wiki.openstack.org/wiki/Glance/ImageAliases 11:47
JqckBDo you have an idea of how to do this? Or if you have a counter-proposal, I'm open to it 11:48
abhishekkcroelandt, dansmith we need to find out some workaround for timeout issue :/13:45
dansmithabhishekk: I was going to ask earlier when you were offline.. it's not clear to me, but is it still just that one patch or did you say that rebasing the others on master makes them timeout also?13:46
abhishekkalso please review this reno patch for glance store, so that once this is merged I can tag glance-store release tonight or tomorrow13:46
dansmithI went to -qa to talk to gmann about the timeout thing and got sucked into a scope discussion13:47
abhishekkI rebased download patch on master and staging patch on top of download, and then both timed out13:47
abhishekksecond attempt they worked13:47
dansmithack okay I thought that's what you meant, just confirming13:47
dansmithgood news is that makes sense right? that it's not something wonky with a patch that clearly shouldn't be causing timeouts13:48
abhishekkI don't think timeout has anything to do with our changes13:48
abhishekkyes13:48
dansmithmaybe we should ask him here, since it's busy in -qa13:50
abhishekkwe almost have everything covered 13:50
abhishekkyeah13:50
dansmithgmann: we're having an issue with our -py36 jobs timing out pretty seriously, but the same -py38 job is fine13:50
dansmithgmann: wondering if it has to do with some of the py36-specific constraints, because otherwise we are not sure what is different13:51
abhishekk https://review.opendev.org/c/openstack/glance/+/804898 13:51
abhishekkthis is reference13:51
dansmithgmann: have you heard anything like this from other projects or have any ideas?13:51
abhishekkthis is the latest pipeline for timeouts, https://zuul.opendev.org/t/openstack/builds?result=TIMED_OUT 13:51
abhishekkI can see neutron and tacker have the latest hits13:52
abhishekkseems similar kind of failure13:54
dansmithI dunno,13:54
dansmiththe neutron one is py3913:55
dansmiththe tacker one is py36 tho13:55
abhishekkyeah, I think neutron lower constraint is on py3613:55
dansmithokay13:56
dansmithwell, let's see what gmann has to say13:56
abhishekkswift functional is also down there13:57
abhishekkack13:57
* abhishekk 3 important days gone13:59
* abhishekk will be back in 30 mins14:12
opendevreviewDan Smith proposed openstack/glance master: Test Revert "Resolve compatibility with oslo.db future (redux)"  https://review.opendev.org/c/openstack/glance/+/805035 14:34
opendevreviewDan Smith proposed openstack/glance master: Test Revert "Resolve compatibility with oslo.db future"  https://review.opendev.org/c/openstack/glance/+/805036 14:34
dansmithabhishekk: just a guess ^14:34
abhishekkdansmith, ack, worth trying14:46
dansmithlooks like both have passed py3614:49
dansmithwe can recheck several times of course14:50
dansmiths/can/should/14:50
abhishekkyeah14:50
abhishekktopic :P14:55
abhishekkI think here also we should keep 2 jobs only to save the time and try several rechecks14:57
dansmithtopic?15:00
dansmiththese are only running small jobs, so it's not a huge deal15:00
dansmithoh my topic, I see :)15:00
abhishekkfunctional protection is installing openstack I guess15:01
abhishekkyeah15:01
dansmithdevstack?15:02
abhishekkyes15:02
dansmithI can rebase it on your 804898 patch when this run finishes15:02
abhishekk++15:02
dansmithor I guess we already know it passed, so I can do it now, hang on15:02
abhishekk++ ++15:03
opendevreviewDan Smith proposed openstack/glance master: DNM [Upload policy] functional-py36 timeout test  https://review.opendev.org/c/openstack/glance/+/804898 15:04
opendevreviewDan Smith proposed openstack/glance master: Test Revert "Resolve compatibility with oslo.db future (redux)"  https://review.opendev.org/c/openstack/glance/+/805035 15:04
opendevreviewDan Smith proposed openstack/glance master: Test Revert "Resolve compatibility with oslo.db future"  https://review.opendev.org/c/openstack/glance/+/805036 15:04
whoami-rajatdansmith, hey, i remember you worked on a feature to set virtual_size to image during image create operation?15:08
dansmithyup15:08
abhishekkIn local run these tests are skipped 15:08
dansmithabhishekk: what?15:08
gmanndansmith: abhishekk not any I know about timeout. seems it is not 100% right https://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-functional-py36 15:08
dansmithgmann: like 80%15:08
whoami-rajatdansmith, so does it apply to the case when we 'upload a volume to image' ? if not, is it feasible to do it?15:09
dansmithwhoami-rajat: it should apply to any upload case yeah15:09
dansmithabhishekk: oh the db migration tests?15:10
abhishekkyeah, just confirming15:10
dansmithabhishekk: ah because they need real db running... well, that would explain it yeah? :)15:11
whoami-rajatdansmith, hmm, somehow it doesn't show up in this bug https://bugs.launchpad.net/cinder/+bug/1939972 15:11
gmanndansmith: ohk and in glance only15:11
whoami-rajati mean the virtual_size is None15:11
abhishekkgmann, yes15:11
dansmithwhoami-rajat: there are logs if it fails to compute it or read the qcow file, so I would look for those15:12
dansmithwhoami-rajat: but also, we don't know the virtual size until we've read enough of the image to have the metadata.. don't we have to have created the volume before that? I'm not sure if the bug is really that we never get the virtual size so much as we have already created the volume at that point15:13
dansmithwhoami-rajat: look for these: https://github.com/openstack/glance/blob/dd3155516cec2cabf8f74963a44ab642d507384b/glance/location.py#L572-L585 15:15
whoami-rajatdansmith, in this scenario, a) we create the volume, b) upload it as an image, c) create a new volume from that image15:15
whoami-rajatso we should have virtual_size in step b)15:16
dansmithack15:16
whoami-rajatdansmith, ok, i will try that scenario out and see for any issues15:16
dansmithwhoami-rajat: cool, let me know what you find15:17
dansmithwhoami-rajat: I think there was some discussion of a "qcow2 v2" recently, so maybe we're not detecting the new version properly or something? the amount of data we need for qcow2 is very small, so I would hope it hasn't changed *that* much15:18
* abhishekk successful on all 3 patches15:19
dansmiththat likely means it's just the redux patch I think15:21
dansmithsince they revert in reverse order15:21
whoami-rajatdansmith, sure, IIRC the discussion was around making qcow2-v2 images read only which would render them unusable unless converted to qcow2-v3 format or maybe I'm thinking of another discussion15:21
dansmithwhoami-rajat: okay I thought it was qcow2-v1 and qcow2-v2, but still my point is, if we're not handling something about the *newer* format, that could explain it15:22
dansmithwhoami-rajat: but if you can repro it not working, then we can go from there15:22
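To illustrate why only a small amount of data is needed to learn a qcow2 image's virtual size: in the published qcow2 layout, the version and virtual size sit in the first 32 bytes of the file. The following is a minimal sketch under that assumption, not Glance's actual implementation; the function name `qcow2_virtual_size` and the synthetic `sample_header` are hypothetical and exist only for this example.

```python
import struct

# Magic bytes at the start of every qcow2 file ("QFI" followed by 0xfb).
QCOW2_MAGIC = b"QFI\xfb"


def qcow2_virtual_size(header: bytes):
    """Return (version, virtual_size_in_bytes) parsed from the first
    32 bytes of a qcow2 image, or None if the data is not qcow2.

    Hypothetical helper for illustration only.
    """
    if len(header) < 32 or header[:4] != QCOW2_MAGIC:
        return None
    # Bytes 4-7: format version, big-endian unsigned 32-bit.
    version = struct.unpack(">I", header[4:8])[0]
    # Bytes 24-31: virtual disk size, big-endian unsigned 64-bit.
    virtual_size = struct.unpack(">Q", header[24:32])[0]
    return version, virtual_size


# Synthetic 32-byte header: magic, version 3, no backing file
# (offset 0, name length 0), cluster_bits 16, 10 GiB virtual size.
sample_header = struct.pack(
    ">4sIQIIQ", QCOW2_MAGIC, 3, 0, 0, 16, 10 * 1024 ** 3
)

print(qcow2_virtual_size(sample_header))
```

Because the size field sits this early in the file, a streaming upload can in principle learn the virtual size from the first chunk it reads, which matches the point made above that the value is only known once enough of the image has been read.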
abhishekkdansmith, likely, added rechecks on both the patches15:22
whoami-rajatdansmith, sure, thanks for your inputs, i will try and let you know15:22
dansmithwhoami-rajat: cool thanks.. I was pretty excited about this feature because I thought the implementation was cool, so I definitely want to make sure it's working :)15:23
dansmithabhishekk: ack.. I think we should recheck several more just to be sure, but this actually makes sense, which is good15:23
abhishekk++15:24
dansmithabhishekk: also, stephenfin is really smart, so once we tell him we think there's a problem I'm sure he'll fix it quick...oops :)15:24
abhishekk:)15:24
stephenfinappealing to my ego, I see. Clever...15:24
dansmithheh15:25
abhishekkmaybe you will explain it better and in short 15:25
whoami-rajatdansmith, yep, i think i requested it long ago in a PTG and my general usecase was to know the virtual_size before uploading image to volume (cinder store), i wasn't aware at that time that glance streams the image in chunks and writes to volume so it wasn't possible15:25
whoami-rajatbut anyway it's a good feature :)15:26
dansmithwhoami-rajat: :)15:26
dansmithstephenfin: it seems like the redux patch is causing one worker to completely hang about 80% of the time, only on py36, and causing most of our runs to time out on that job15:27
dansmiththat's the current theory anyway15:27
dansmithwe couldn't repro the failure locally, but now thinking because nobody is running those tests locally because they don't have the opportunistic config15:27
stephenfinOh, that's interesting. In theory all I've done is made the session/connection management explicit rather than automatic15:28
dansmithack, I haven't even looked at them, we were just stabbing in the dark15:28
stephenfinbut frickler (I think) noted...somewhere that SQLA 1.4 had some significant performance issues15:29
dansmithbut the fact that it seems to always only happen on py36 is interesting15:29
stephenfinvery15:29
dansmithdo we pin sqla or any related dep on py36 but not py38?15:29
abhishekkas we are talking it passed 3rd consecutive time on redux revert15:29
dansmithI looked at u-c and saw some different pins, but...15:29
stephenfinif you get a reliable pass rate, it might be worth dragging in zzzeek if he can spare the time15:29
stephenfinwe shouldn't be, but I'll check15:29
dansmithI don't think it's just running slow or something, it seems totally hung because in the subunits I looked at, we got zero reports ever from the affected worker15:30
stephenfinnope, nothing specific in openstack/requirements or glance itself, fwict15:30
dansmithwe have some different pins in u-c based on python version, but nothing really substantial that I saw15:31
dansmithlike networkx, which I know changed recently, but doubt that is related15:31
stephenfinYeah, hangs coupled with the fact the patches made changes to DB session management would suggest a relationship. I'll play around with it...15:33
dansmithare we really running the same version of mysql, pymysqlclient, and sqla on both py36 and py38? I guess I would expect maybe some delta there since py36 distros would have been a while ago15:35
dansmithsince it's apparently version specific, I'd expect this to be related to a different version of something in that stack and not just python15:35
stephenfinFair point. I think we're running on Ubuntu 18.04 for the 3.6 tests and 20.04 for the 3.8 ones15:46
stephenfinso it's not identical15:46
dansmithah yeah, so that means different mysql at least I assume15:47
opendevreviewGhanshyam proposed openstack/glance master: Suppress policy deprecation and default change warnings  https://review.opendev.org/c/openstack/glance/+/805049 15:48
dansmithgmann: thank $deity15:48
gmanndansmith: abhishekk ^^ these warnings may cause timeouts by filling the logs 15:49
dansmiththey certainly cause pain even without timeouts, so thanks15:49
abhishekk++15:49
* abhishekk going for dinner15:54
opendevreviewGhanshyam proposed openstack/glance master: Suppress policy deprecation and default change warnings  https://review.opendev.org/c/openstack/glance/+/805049 16:13
abhishekksmcginnis, could you please review https://review.opendev.org/c/openstack/glance_store/+/804952 16:46
abhishekkdansmith, I think 4 successful runs are enough or we should add more rechecks?16:53
dansmithabhishekk: I dunno, I'd do a bunch because they're easy, but sounds like the wheels are moving anyway, so might as well be sure16:54
abhishekkyeah, I will add one more round of recheck16:54
abhishekkthinking of signing out early today, but if we have a fix then you can ninja approve it16:55
kukaczhi, having cinder as glance backend, I am dealing with issue that volumes are byte-streamed from images on creation, instead of being just thin cloned on the cinder backend (powerflex). what might be wrong?16:57
dansmithabhishekk: ack16:58
dansmithabhishekk: struggling with the image factory auth layer thing. causes a ton of fails because something else is going on, so don't think I'm not working on that :)16:58
abhishekkdansmith, no issues16:59
abhishekkkukacz, I don't think cinder has that support17:00
abhishekkyou are thinking similar to copy_on_write ?17:00
abhishekkdansmith, I will scan through glance code and see whether we skipped any get_repo or similar calls to avoid policy enforcement (maybe tomorrow)17:02
dansmithokay17:02
kukaczabhishekk: yes, I expect a snapshot based copy to be instantly created since both glance and cinder have the same backend in the end17:03
abhishekkcool, signing out for the day, have a good day17:03
abhishekkwhoami-rajat, could you please confirm whether this support is there?17:03
kukaczabhishekk: looking at this blueprint I thought the support might be in place: https://specs.openstack.org/openstack/cinder-specs/specs/liberty/clone-image-in-glance-cinder-backend.html 17:04
abhishekkkukacz, need to have a look at it17:05
gmanndansmith: abhishekk can either of you check this too, glance policy scope configuration settings are moved to devstack side (depends-on merged) which avoids restarting the api services - https://review.opendev.org/c/openstack/glance/+/778952 17:06
dansmithabhishekk: omg.. I think the auth layer is the only thing that actually sets image.owner! so removing that makes all images owner=None.17:07
dansmithgmann: will have to queue, buried atm17:07
gmanndansmith: sure, not urgent when you have time17:07
abhishekkdansmith, similar thing I faced in namespace17:07
dansmithabhishekk: where is my lighter...17:08
dansmith https://pics.me.me/burn-it-burn-all-down-memes-com-16257715.png 17:08
kukaczabhishekk: thanks. I am talking about offloading the volume cloning to the real cinder backend, to be clear17:08
abhishekk https://review.opendev.org/c/openstack/glance/+/799633/22/glance/api/v2/metadef_namespaces.py@153 17:08
abhishekklol17:09
dansmithabhishekk: yeah, same thing.. wtf17:09
abhishekkkukacz, ack, I will have a look possibly tomorrow, if it is urgent you can ping in cinder channel as well17:10
dansmithtoday is definitely a "4 coffee" day17:11
abhishekk+117:12
kukaczabhishekk: thanks a lot. yes, it is quite urgent, I will try cinder channel and get back to you tomorrow if needed17:12
abhishekkkukacz, no problem, I hope rosmaita or whoami-rajat will be able to point it out17:13
abhishekks/hope/sure17:13
* abhishekk 5th check also cleared17:19
dansmithwoot17:21
abhishekkso until we have a fix, or if it is going to take time, can we flip the connection handling on the basis of python version?17:26
dansmithI guess we could disable the opportunistic checking on py36 only17:29
dansmithgmann: is that legit/possible in our own repo?17:29
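One way to read the "disable on py36 only" idea is a version-conditional skip in the test code itself. The sketch below uses only the standard library's `unittest.skipIf`; the class name `OpportunisticMigrationTests` and its placeholder test are hypothetical stand-ins, not the real Glance test classes, and this is not what was ultimately done in the channel.

```python
import sys
import unittest

# True only when the interpreter running the tests is Python 3.6.
RUNNING_PY36 = sys.version_info[:2] == (3, 6)


@unittest.skipIf(RUNNING_PY36, "worker hangs on py36; see timeout discussion")
class OpportunisticMigrationTests(unittest.TestCase):
    """Hypothetical stand-in for tests that need a real database."""

    def test_placeholder(self):
        # A real test would exercise migrations against a live database.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

A skip like this hides the hang rather than fixing it, which is the trade-off weighed in the messages that follow.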
dansmithabhishekk: we should have a bug that documents the offending patch, failures, etc right? just for the record I would think.17:30
abhishekkdansmith, makes sense17:30
opendevreviewDan Smith proposed openstack/glance master: Check add_image policy in the API  https://review.opendev.org/c/openstack/glance/+/804800 17:31
opendevreviewDan Smith proposed openstack/glance master: Refactor gateway auth layer for image factory  https://review.opendev.org/c/openstack/glance/+/805065 17:31
abhishekk\o/17:31
dansmithabhishekk: I'm pretty fried, so the above deserves plenty of checking for proper testing and such17:31
gmanndansmith: it is a functional job, right, not a unit test? 17:32
dansmithgmann: yeah17:32
dansmithgmann: seems the regression was this: https://review.opendev.org/c/openstack/glance/+/805035 17:32
gmanndansmith: then you can either remove or make n-v openstack-tox-functional-py3617:33
dansmithabhishekk: I guess we could just merge that revert and keep the testing, since we have time before we need that SA 2.0 fix right?17:33
dansmithgmann: okay I think making it n-v is probably bad, especially since this apparently really caught something.. probably better to just actually revert the problem patch and keep testing17:33
gmanndansmith: yeah17:34
abhishekkdansmith, sounds good17:34
whoami-rajatkukacz, hi, what is the format of your image? (raw, qcow2)17:34
dansmithabhishekk: if you're filing a bug, I can fix the redux to reference that and remove "Test" from the top17:35
abhishekkI am on mobile atm17:36
dansmithabhishekk: oh sure, good excuse to get out of filing a bug :P17:36
abhishekkjust give me some time, will file it :d17:36
dansmithnah, I'm on it17:36
abhishekkcool, thank you17:37
abhishekkyou can assume my +2 on the patch/revert17:37
dansmithack17:38
kukaczwhoami-rajat: I have tried both. now I am testing with raw, after noting it was a prerequisite in the blueprint doc17:40
opendevreviewDan Smith proposed openstack/glance master: Revert "Resolve compatibility with oslo.db future (redux)"  https://review.opendev.org/c/openstack/glance/+/805035 17:40
whoami-rajatkukacz, and are you using multiple stores or single store?17:43
dansmithcroelandt: abhishekk is mobile on his badass hog, so if you want to ack this just for coverage that'd be cool: https://review.opendev.org/c/openstack/glance/+/805035 17:45
dansmithcroelandt: tl;dr that's the cause of the timeouts we've been seeing17:45
kukaczwhoami-rajat: multiple stores, having cinder as the default set in glance-api.conf17:46
* dansmith hopes the "hog" slang term for motorcycle is accepted worldwide17:46
abhishekk:P17:47
abhishekklet me check the patch, I think croelandt is not around today17:47
dansmithoh okay, I ninja'd anyway17:48
abhishekkack17:48
whoami-rajatkukacz, then we need to fix that optimization since we haven't made cinder side changes for multiple stores yet, https://review.opendev.org/c/openstack/cinder/+/755654 17:50
abhishekkhopefully things will get moving now17:54
kukaczwhoami-rajat: great, thank you. is there a workaround, when using victoria? would it help perhaps to switch to single store, while using cinder as the only backend?17:54
whoami-rajatkukacz, yes, if you use  single store (and create a new image), this optimization should work18:00
whoami-rajatkukacz, the main part is "enabled_backends" conf shouldn't be set18:00
kukaczwhoami-rajat: thanks, I will try reconfiguring it18:02
croelandtdansmith: understanding why that patch caused this issue is gonna be *fun*18:08
croelandtI +2/+1ed it18:08
croelandtwe should have 10 patches merged in the next 24 hours then :D18:08
dansmithcroelandt: seems likely to me that it's related to the fact that we're creating more connections explicitly, but yeah, haven't delved into the why yet18:09
croelandtyeah but these connections are supposed to be short-lived, that's weird18:15
kukaczwhoami-rajat: what determines whether the configuration is multi store or single? I have commented out the enabled_backends and still have the issue. have the default_store param set to cinder, not sure if I can remove that one too18:25
whoami-rajatkukacz, in single store, you have one group [glance_store] , you can refer to the config here https://paste.opendev.org/show/808184/ 18:27
kukaczwhoami-rajat: thanks a lot. will try18:30
whoami-rajatnp18:31
kukaczwhoami-rajat: can I confirm from image parameters that it was correctly created using the single store config. or the image does not reflect the configuration method at all?18:36
whoami-rajatkukacz, in my single store env, i don't see the location returned in image properties and i manually checked it inside the db, maybe dansmith or croelandt know some optional parameters to return the location as well with image properties18:38
dansmithyeah there is a flag to enable showing locations18:39
dansmithshow_multiple_locations18:40
dansmithshow_image_direct_url18:40
kukaczdansmith: thanks! and is the show_image_direct_url parameter important to make the image cloning to volume work? I am dealing with the case when image data is copied into volume instead of having a storage backend-level thin clone created18:44
whoami-rajatdansmith, yes, enabling those does show the locations18:45
dansmithkukacz: dunno, but it does make nova use the hot-clone thing for booting instances18:45
whoami-rajatkukacz,  In the cinder conf, we need to set "allowed_direct_url_schemes" to the value "cinder" and in glance conf, "show_image_direct_url" and "show_multiple_locations" needs to be set to "True"18:45
kukaczwhoami-rajat: you mean even when I am using single store?18:46
whoami-rajatkukacz, yes18:46
kukaczthank you guys, will try that after a short break18:47
* whoami-rajat signing out18:48
kukacznow it works with these parameters and single store. thank you!20:14
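The settings named in this exchange can be summarized as two small config fragments. The sketch below renders them with Python's standard `configparser`; the option names and values come from the messages above, while placing the three direct-URL options under `[DEFAULT]` and the `render` helper itself are assumptions made for illustration. Note that `enabled_backends` is deliberately absent, since the single-store setup requires it to be unset.

```python
import configparser
import io


def render(sections):
    """Render a dict of {section: {option: value}} as INI text.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_dict(sections)
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


# glance-api.conf fragment: single store, locations exposed.
# Section placement under [DEFAULT] is an assumption.
glance_api_conf = render({
    "DEFAULT": {
        "show_image_direct_url": "True",
        "show_multiple_locations": "True",
    },
    "glance_store": {
        "default_store": "cinder",
    },
})

# cinder.conf fragment: allow cloning from cinder:// image locations.
cinder_conf = render({
    "DEFAULT": {
        "allowed_direct_url_schemes": "cinder",
    },
})

print(glance_api_conf)
print(cinder_conf)
```

As noted earlier in the log, images likely need to be created again after switching to the single-store configuration for the clone optimization to apply.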
*** timburke_ is now known as timburke21:00
croelandtdansmith: well, the patch that fixes the timeout failed... with a timeout22:47
dansmithyeah just saw22:58
dansmithnot awesome22:59
dansmithI was trying to say "we need lots of rechecks on this" but it definitely seemed likely22:59
croelandtso this patch may not be the culprit :/23:04
