Monday, 2023-04-10

opendevreviewPranali Deore proposed openstack/glance_store master: DNM: Test whether few failing jobs passes or not  https://review.opendev.org/c/openstack/glance_store/+/879940 07:42
opendevreviewPranali Deore proposed openstack/glance master: Change DB migration constant to 2023_2  https://review.opendev.org/c/openstack/glance/+/879947 09:58
opendevreviewMerged openstack/glance-specs master: Add a script to prepare the next cycle  https://review.opendev.org/c/openstack/glance-specs/+/878121 10:35
dansmithI get all the "ResourceWarning" messages when I run functional locally13:36
dansmithand I do see some leaked processes, but they fluctuate and apparently eventually go away13:39
dansmithand lots of failed process launch status, "no such process" etc13:40
dansmithabhishekk: something clearly happened between 3/14 and 3/27: https://zuul.opendev.org/t/openstack/builds?job_name=glance-tox-functional-py39-rbac-defaults 14:02
dansmithalmost nothing other than merging that startup directory check, but that passed the same tests, and reverting it locally doesn't fix anything for me14:02
abhishekkdansmith, ack, don't think that startup directory check has anything to do with timeouts14:03
dansmithwell, I thought maybe it was preventing the functional api workers from starting, because that seems to be the failure (api servers aren't running)14:04
dansmithand because the time lined up, but yeah, seems unrelated14:04
abhishekkmaybe we need to use skip to isolate/find the failing test14:05
dansmithit's a ton of them, and maybe all of them?14:06
dansmithI assume you see this in the logs:  AssertionError: False is not true : Unexpected server launch status for: api,14:06
abhishekkis there any requirement side change in between related to eventlet or something ?14:06
dansmithI've been looking but I don't think so, unless there's something unconstrained14:07
abhishekkSo as per your comment those are failing locally as well, right?14:07
dansmithyeah, are they not for you?14:08
abhishekkI haven't run locally anything recently, just doing it now14:08
abhishekkyep it hung locally for me as well14:15
abhishekkSo I think all tests which are failing are using legacy api server from tests (not the new one which you have written to add new tests for import/quota/policy changes)14:17
dansmithyeah, where it starts a complete api worker process and talks to it over http14:17
abhishekkright14:18
dansmithand it seems that thing is crashing or never coming up or something, which has always been super hard to debug14:18
dansmithI see lots of zombie processes under the test workers, which are those api processes AFAICT14:18
abhishekk+114:18
abhishekkCan we refactor existing api class similar to recent one?14:20
dansmithtests you mean?14:21
abhishekkyeah14:22
dansmithit would be a massive effort14:23
dansmithalso,14:23
dansmiththe thing you're talking about is synchronous, so the nature of the tests which expect async behaviors would need to change14:23
abhishekkright, I think shortest way is to debug a single test to find out what is going wrong14:24
dansmithyeah14:25
abhishekkpdeore, you around?14:25
dansmithit's also weird that it's working in the devstack jobs, which I think means whatever is broken is something only related to this weird functional worker thing14:25
abhishekkthis made me more nervous :D14:26
abhishekkis there a shortcut command to kill zombie processes?14:28
dansmiththey must be waited on14:30
abhishekkyeah14:31
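For context on "they must be waited on": a zombie is a child process that has already exited but whose parent has not yet collected its exit status, so sending it a signal does nothing and only reaping removes it. A minimal Python sketch follows, assuming a throwaway child started with `subprocess`; `run_and_reap` is a hypothetical helper and is not part of glance's functional test code.

```python
import subprocess
import sys


def run_and_reap(exit_code):
    """Start a short-lived child and collect its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    )
    # Once the child exits it stays in the process table as a zombie
    # until the parent reaps it; wait() blocks and does the reaping.
    return child.wait()


if __name__ == "__main__":
    print(run_and_reap(7))
```

Dropping a `Popen` object without ever calling `wait()` or `poll()` on it is also what makes the standard library emit the "subprocess ... is still running" ResourceWarning.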
abhishekk  _warn("subprocess %s is still running" % self.pid,14:32
abhishekkmight be related, lots of occurrences in logs14:34
dansmithI think that's a symptom14:34
abhishekklikely14:35
dansmithso I think this is what's crashing it: ERROR: 'NoneType' object has no attribute 'group'14:36
dansmithbut there's no trace so I have no idea where that is coming from14:36
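As a hedged aside on that error text: it is the message Python raises when `.group()` is called on the result of a regular-expression match that failed, because `re.match` returns `None` when nothing matches. The sketch below only illustrates that failure mode; the pattern, the helper names, and the inputs are hypothetical and are not taken from glance or PasteDeploy.

```python
import re

# Hypothetical pattern for a "module:attribute" style spec.
SPEC = re.compile(r"^(?P<module>[\w.]+):(?P<attr>[\w.]+)$")


def split_spec(spec):
    """Split a spec, raising a descriptive error on malformed input."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError(f"malformed spec: {spec!r}")
    return match.group("module"), match.group("attr")


def split_spec_unchecked(spec):
    """Same lookup without the None check."""
    # On malformed input SPEC.match() returns None, so this raises
    # AttributeError: 'NoneType' object has no attribute 'group'
    return SPEC.match(spec).group("module")
```

Without a traceback, searching the installed packages for unchecked `.group(` calls on match results is one way to narrow down where such an error could originate.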
abhishekkI think we should try skipping test_reload once?14:36
dansmithwhy? I can run any of the tests in isolation and they fail14:36
dansmithI've been randomly using this one: test_invalid_cors_get_request14:37
abhishekkohh, I thought that is the one which is reloading configs14:37
abhishekkack14:37
abhishekkgreenlet-1.1.3 is what was installed on the passing job, whereas now it is greenlet==2.0.214:49
abhishekkhttps://d2e021dde0f27c24b843-9e47a969cbb910cc10dbd93fca848265.ssl.cf5.rackcdn.com/850417/9/check/cross-glance-tox-functional/84bf072/job-output.txt 14:49
abhishekkthis is the last passing cross-glance-tox-functional14:50
abhishekkhttps://review.opendev.org/c/openstack/requirements/+/872065 14:53
abhishekkthis patch was submitted on 26/03 14:53
dansmithah, I checked greenlet, but it was released in january, so I figured unrelated14:55
dansmithhowever,14:55
dansmithare we getting that in local runs?14:55
dansmithah, u-c changed14:55
dansmithand we install that from master when we run tox, regardless14:55
dansmiththat's unfortunate14:55
abhishekkyeah14:56
dansmiththat's why walking back in the git history doesn't change it I guess14:56
dansmithdoes that fix it for you? I'm trying14:56
abhishekknah, I just figured out the patch14:56
abhishekkexisting tests are still running for me :D14:57
dansmithI'm not actually sure if eventlet uses greenlet14:57
dansmithyeah, same behavior with 1.1.3 for me14:58
abhishekk:/14:58
abhishekkhow did you override it in the local run?15:00
abhishekkchanged uc inside .tox ?15:00
dansmith$ .tox/functional/bin/pip install -U greenlet==1.1.3 15:00
dansmithI'm rebuilding my tox env with u-c from march 14th15:01
dansmithso that should get any others to see if that's related15:01
abhishekkack15:01
dansmiththat installed the older constraints, but didn't fix the problem15:04
abhishekkeventlet 0.33.3 vs 0.33.1  ?15:05
dansmithoh hang on,15:05
dansmithI might have broken something else in my testing, just a sec15:05
abhishekkack15:05
dansmithoh snap15:06
dansmith - Passed: 1 15:06
abhishekkgreenlet or eventlet?15:06
dansmithu-c from march 1415:07
dansmithbut this same issue might have confused my just-greenlet testing earlier15:07
abhishekkack15:08
dansmithI was halfway through trying to print out something on startup and got distracted with the greenlet thing, but had left a typo15:08
dansmithmanifests the same.. a typo preventing the service from starting :)15:08
abhishekk:D15:09
abhishekkwhat changed in u-c 14th and now?15:12
dansmithokay greenlet alone does not fix it15:12
dansmithI'm looking15:12
abhishekki found greenlet and eventlet with diff versions15:13
dansmithhttps://termbin.com/cixg2 15:13
abhishekkack15:14
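The "what changed in u-c" comparison can also be done mechanically. The sketch below assumes two snapshots of an upper-constraints file as text, with one `name===version` pin per line; `parse_constraints` and `changed_pins` are hypothetical helpers. As a simplification, environment markers are discarded, so if a package is pinned several times only its last pin is kept.

```python
def parse_constraints(text):
    """Map package name -> pinned version from upper-constraints text."""
    pins = {}
    for line in text.splitlines():
        # Drop comments and environment markers (";python_version...").
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "===" not in line:
            continue
        name, version = line.split("===", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def changed_pins(old_text, new_text):
    """Return {name: (old, new)} for every package whose pin differs."""
    old, new = parse_constraints(old_text), parse_constraints(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Versions quoted in this log; "example-lib" is a made-up package.
    before = "greenlet===1.1.3\nPasteDeploy===2.1.1\nexample-lib===1.0\n"
    after = "greenlet===2.0.2\nPasteDeploy===3.0.1\nexample-lib===1.0\n"
    for name, (old, new) in changed_pins(before, after).items():
        print(f"{name}: {old} -> {new}")
```

Rolling back the listed packages one at a time is then a bisection over a short list rather than over the whole constraints file.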
dansmithrolling back eventlet failed with dns api error, rolling back dnspython too15:15
dansmiththat diff is 65245016de7cf2d1e585eeb1378aac6aa6d75de0..master in requirements, btw15:15
dansmithnope15:16
dansmithmmm, paste15:17
abhishekkpastedeploy?15:17
dansmithyeah, that's the one :)15:17
dansmithworks with PasteDeploy===2.1.1 15:18
abhishekkbummer15:18
abhishekkso we need to blacklist 3.0.1 for glance ?15:19
dansmithI dunno why it's not failing in devstack though15:19
dansmithbut no, I think you need to fix the problem .. can't stay on 2.x forever right?15:19
abhishekkyeah15:20
abhishekktill we fix (which is going to take long) shouldn't we rollback to 2.1.1?15:20
dansmithI think u-c is supposed to be across all the projects, right?15:21
dansmithnot sure it's an option to block it just for glance and rolling it back in u-c is problematic I think assuming some other project wanted it bumped15:21
abhishekkyeah, but there should/might be a way to override it?15:21
dansmithI dunno what the rules are here15:21
abhishekkme too15:22
dansmithoverriding it just means that glance can't be installed alongside nova, for example15:22
dansmithgmann: ^15:22
dansmithI think gmann has been getting in late recently, so might be a while before he's around15:22
abhishekkack15:22
dansmithprobably should quickly work to determine what the actual problem is though.. might be something simple15:22
abhishekkalso we can't skip 106 tests as well :D15:22
abhishekkneed to go through reno of PasteDeploy15:23
dansmithI really need to get back to what I was supposed to be doing this morning, but I assume you can take it from here? or maybe pdeore can try to suss out the change?15:23
abhishekkI think pdeore can take it from here15:23
dansmithknowing what the problem is should be like 90% of the work I bet15:23
abhishekk++15:24
dansmithit's probably something simple like a missing or now-required arg or something15:24
abhishekklikely15:24
abhishekkthanks for spending time on it15:25
* dansmith nods15:25
dansmithalso, maybe some unit tests for the deploy stuff will make it easier to debug what is going on15:26
dansmithand also maybe let's not add any more functional tests based on these api workers :D15:26
abhishekk++15:27
abhishekkthere are two releases, 3.0 and 3.0.1, on 2022-10-16 and 2022-10-1715:28
abhishekkfor pastedeploy ^^15:28
abhishekkhttps://docs.pylonsproject.org/projects/pastedeploy/en/latest/news.html 15:28
dansmithyeah, not much in the way of news for those15:28
dansmithseems like the major version bump is just because of dropping py2 support15:29
abhishekklikely15:29
abhishekk        app = deploy.loadapp("config:%s" % conf_file, name=app_name) 15:35
abhishekkthis is the only function i think we are calling15:36
dansmithyup15:39
dansmithand it looks the same in nova15:39
dansmithbut of course, it's loading modules provided by glance (and nova) and might be choking on those15:39
dansmithI don't know how it goes from the paste config to the python objects.. so someone probably needs to figure that out ... 15:41
abhishekkack15:43
dansmithoh yeah and those workers run with generated paste configs15:49
dansmithdifferent from the one in etc/15:49
dansmithso could be something there too I guess15:49
dansmiththis is what it's loading via paste I think: glance.api:root_app_factory15:49
dansmithwhich of course is different in the generated paste config for those workers15:51
dansmither, well, maybe the same, but in a different stack15:51
abhishekkright15:53
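On the open question of how a paste config entry becomes a Python object: a name such as `glance.api:root_app_factory` follows the common `module:attribute` convention. The helper below is a generic sketch of resolving such a name with `importlib`. It is an assumption made for illustration, not PasteDeploy's actual code, and `resolve` is a hypothetical name.

```python
import importlib


def resolve(spec):
    """Import 'package.module:attr.sub' and return the named object."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes, e.g. "SomeClass.factory".
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    # Standard-library target so the sketch runs without glance installed.
    print(resolve("os.path:join"))
```

Calling a helper like this directly on each factory name from the generated paste config, outside the worker process, would show an import or lookup failure with a full traceback.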
gmanndansmith: abhishekk_ hi17:36
gmannI do not think we can/should have different u-c (blacklist specific version) for glance only17:36
abhishekk_gmann, hi17:36
abhishekk_ack17:37
dansmithyeah, agree17:37
dansmithit's unfortunate that it got bumped without being realized, but alas, here we are17:37
abhishekk_dansmith, you have latest devstack deployed?17:38
gmanndoesn't requirements have a glance functional test job?17:38
dansmithabhishekk_: no17:38
abhishekk_dansmith, ack17:39
dansmithgmann: no17:39
dansmithit's a bummer to have to run all the projects' functionals really, especially lately with things breaking so much17:39
dansmithalso, gmann, glance's functionals (these at least) are a nightmare to debug, so asking non-glance people to investigate failures is also unfortunate17:40
gmannohk17:40
abhishekk_somehow I came to the conclusion that VersionNegotiationFilter is causing trouble, but no further luck for the last couple of hours 17:40
dansmithabhishekk_: so you think it's not crashing on start, but refusing to do anything useful?17:41
dansmithbecause the test waits for the timeout for the "ping" which should fail immediately if it's up but just not working17:41
abhishekk_yeah17:41
abhishekk_if I remove that filter from here17:42
abhishekk_https://github.com/openstack/glance/blob/master/glance/tests/functional/__init__.py#L499 17:42
abhishekk_test passes with latest paste deploy17:42
dansmithhuh, maybe just loading that filter fails?17:43
abhishekk_likely, tried putting logs there or in wsgi.Middleware but nothing actually logs17:44
dansmithright17:44
dansmiththis got me some logs:17:44
dansmithhttps://termbin.com/ycd6 17:44
abhishekk_maybe the next step is to deploy latest devstack and execute some version api calls17:44
abhishekk_looking17:45
dansmithor make sure it configures it the same way17:45
dansmithmaybe devstack doesn't have that filter? or it's not in that order?17:45
abhishekk_i think it has17:46
dansmithwell, something has to be different :)17:47
abhishekk_agree17:47
dansmithdefinitely a different order17:47
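The ordering comparison between the devstack config and the generated functional-test config can be scripted. Paste configs are INI files, so the standard `configparser` module can read the `pipeline` line of a `[pipeline:...]` section. This is a sketch under assumptions: `pipeline_order` is a hypothetical helper, and the two sample configs are made up rather than copied from devstack or glance.

```python
import configparser


def pipeline_order(paste_ini_text, section):
    """Return the ordered names listed in a [pipeline:...] section."""
    # interpolation=None keeps any literal "%" in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(paste_ini_text)
    return parser[section]["pipeline"].split()


if __name__ == "__main__":
    # Hypothetical pipelines for illustration only.
    devstack_like = (
        "[pipeline:glance-api]\n"
        "pipeline = cors versionnegotiation rootapp\n"
    )
    functional_like = (
        "[pipeline:glance-api]\n"
        "pipeline = versionnegotiation cors rootapp\n"
    )
    a = pipeline_order(devstack_like, "pipeline:glance-api")
    b = pipeline_order(functional_like, "pipeline:glance-api")
    print(a)
    print(b)
    print("same order" if a == b else "different order")
```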
abhishekk_I think now I will hand it over to pdeore17:47
* dansmith nods17:48
abhishekk_curiosity is what changed in deploy 3.0 that causes this middleware to fail :D17:48
dansmithyeah17:49
abhishekk_tempest is also not failing means not much to worry?17:56
dansmithwell, that's what I'm saying.. it must be something specific to the config in the functional workers, since the devstack jobs are fine17:56
abhishekk_yep17:57
* abhishekk_ signing out now, have a good day17:57
dansmitho/18:01
