Tuesday, 2022-06-14

[05:54] <opendevreview> Rajat Dhasmana proposed openstack/glance-specs master: Add new location APIs  https://review.opendev.org/c/openstack/glance-specs/+/840882
[10:29] <simboja_> dansmith: >>> import dbcounter
[10:29] <simboja_> >>> print(dbcounter.__file__)
[10:29] <simboja_> >>>
[10:30] <simboja_> https://paste.opendev.org/show/bgXxGgP8PzMTY9n2cQ4y/
[10:31] <opendevreview> Pranali Deore proposed openstack/glance master: Remove dead code of auth and policy layers  https://review.opendev.org/c/openstack/glance/+/845114
[13:04] <opendevreview> Pranali Deore proposed openstack/glance-specs master: [APIImpact] Add DELETE api for metadef resource types  https://review.opendev.org/c/openstack/glance-specs/+/818192
[13:26] <abhishekk> dansmith, do we need to set any parameter in local.conf other than GLANCE_STANDALONE=True? I am getting a 503 error while accessing the glance service
[13:27] <abhishekk> http://10.0.108.117/image returns 503 but curl http://0.0.0.0:19292 returns a valid response
[13:28] <dansmith> abhishekk: you need TLS_PROXY for the integrated /image thing I think
[13:28] <dansmith> simboja: okay, that's installed in the right place
[13:29] <abhishekk> dansmith, ack, I will restack with TLS_PROXY
[13:29] <dansmith> abhishekk: oh sorry you *want* standalone.. might need to disable tls_proxy for that and then /image won't work I think
[13:29] <dansmith> abhishekk: look at the standalone jobs, I'd have to :)
[13:29] <dansmith> simboja: which distro?
[13:30] <simboja> Linux Ubuntu 20.04
[13:30] <abhishekk> this is the local.conf for the standalone job, https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_87f/845114/6/check/tempest-integrated-storage-import-standalone/87fa10d/controller/logs/_.localrc_auto.txt
[13:30] <abhishekk> here tls_proxy is enabled
[13:31] <abhishekk> I will be back in a couple of hours, need to rush outside
[13:31] <dansmith> simboja: that's strange because it's clearly installed, glance should be able to import it
[13:32] <dansmith> simboja: you saw my workaround right?
[13:32] <simboja> I added this conf: https://paste.opendev.org/show/bOIQ9X6Kw2ZNv4ksNbBZ/
[13:33] <dansmith> simboja: and you're good now?
[13:33] <simboja> Yeah, the workaround is passing, but I am hitting a different issue now. But that is on Kuryr: now I am stuck with this timeout connecting to machines: https://drive.google.com/file/d/1BKyJKvHBBiB5ofp90ElDiPm_rfdGPRsp/view?usp=sharing
[13:34] <dansmith> okay I can't help with that :)
[13:34] <simboja> dansmith: Did you take a look at my HOST_IP=10.0.2.15?
[13:34] <simboja> Any issue with the HOST_IP?
[13:34] <dansmith> simboja: is that really the IP of the machine you're running it on?
[13:35] <dansmith> you shouldn't need to set that generally, fwiw
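dansmith's question (is HOST_IP really an address on this machine?) can be checked directly. A minimal sketch, assuming Linux with net.ipv4.ip_nonlocal_bind left at its default of 0: binding a socket to an address only succeeds when that address is assigned to a local interface. `is_local_address` is a hypothetical helper, not part of devstack.

```python
import errno
import socket


def is_local_address(ip: str) -> bool:
    """Return True if `ip` is assigned to an interface on this host.

    Binding to port 0 lets the kernel pick any free port; the bind fails with
    EADDRNOTAVAIL when the address does not belong to this machine.
    """
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise  # any other error is unexpected; surface it
    return True


if __name__ == "__main__":
    # The two candidate HOST_IP values mentioned in this conversation.
    for candidate in ("10.0.2.15", "192.168.43.42"):
        print(candidate, is_local_address(candidate))
```

Run on the devstack VM itself, a False for the configured HOST_IP would point at the mismatch dansmith suspected.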
[13:35] <simboja> N
[13:35] <simboja> No
[13:35] <simboja> dansmith: 192.168.43.42
[13:35] <simboja> If I use this, I will be stuck with keystone timeout :D
[13:36] <simboja> dansmith: If I do not set HOST_IP, the stack run will exit and throw an error that it's not set :)
[13:36] <simboja> Quite convoluted
[13:37] <dansmith> simboja: it shouldn't be required..
[13:37] <dansmith> simboja: can we move to -qa? that's where the devstack experts are.. this was never and definitely now isn't a glance issue :)
[13:37] <simboja> okay
[14:02] *** maysams-afk is now known as maysams
[15:13] <croelandt> jokke_: dansmith: can we push a tag for glance_store's stable/wallaby? This bug https://bugzilla.redhat.com/show_bug.cgi?id=2052857 is fixed by a patch that is in stable/wallaby, but the import won't be done automagically on Red Hat's side without a tag
[15:59] <dansmith> croelandt: I'm not on glance stable
[15:59] <dansmith> but presumably if there's content pending, a tag makes sense
[15:59] <croelandt> Is there a specific procedure?
[15:59] <croelandt> Or can I just git tag && git push --tags? :)
[15:59] <croelandt> Maybe I'll bring that up on the Thursday meeting, see if we have doc about that
[16:00] <abhishekk> dansmith, https://paste.opendev.org/show/barB6VbDh0NOQiaiWf6t/
[16:00] <abhishekk> this is my local.conf, I am still getting 503 for glance
[16:01] <dansmith> abhishekk: is it registered in the catalog as host/image ?
[16:01] <abhishekk> in the endpoint list I can see it as /image
[16:02] <abhishekk> add53df67a3042a9bafbf319efe99257 | RegionOne | glance       | image          | True    | public    | http://10.0.108.117/image
[16:02] <dansmith> and it's 503ing because why?
[16:02] <dansmith> like is that glance returning 503 or apache because it can't hit glance?
[16:06] <abhishekk> I think apache, because it's not hitting glance
[16:06] <abhishekk> <hr>
[16:06] <abhishekk> <address>Apache/2.4.41 (Ubuntu) Server at 10.0.108.117 Port 80</address>
[16:06] <abhishekk> </body></html>
[16:07] <dansmith> so somewhere in /etc/apache2 is the config file that makes it proxy to glance, so see what that's configured for
[16:07] <abhishekk> The server is temporarily unable to service your
[16:07] <abhishekk> request due to maintenance downtime or capacity
[16:07] <abhishekk> problems. Please try again later.
[16:07] <abhishekk> ack
[16:08] <abhishekk> ProxyPass "/image" "http://127.0.0.1:60999" retry=0
[16:09] <dansmith> is that right?
[16:10] <dansmith> I thought disabling tls-proxy got rid of all of this, but maybe not
[16:10] <abhishekk> I think this needs to be http://127.0.0.1:19292
[16:11] <dansmith> yeah.. I dunno why it's wrong on your system, but that looks like a "generate random port" thing and glance itself didn't get told
[16:12] <dansmith> probably some complex interaction of options
[16:12] <dansmith> maybe something set in tempest-integrated-storage-import
[16:13] <abhishekk> there is one glance.conf as well which has http://127.0.0.1:19292 under /etc/apache2
[16:13] <dansmith> oh interesting
[16:13] <abhishekk> ProxyPass "/image" "http://127.0.0.1:60999" retry=0 and this is in /etc/apache2/glance-wsgi-api.conf
[16:14] <abhishekk> after changing it in the above file it is responding
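The mismatch found here (two Apache files proxying /image to different backends, port 19292 in glance.conf and 60999 in glance-wsgi-api.conf) can be spotted with a small script. A hedged sketch, assuming the Ubuntu layout under /etc/apache2 and that the standalone glance-api listens on 19292 as in this setup; `proxypass_targets`, `mismatched_targets` and `scan` are hypothetical helpers, and only simple one-line ProxyPass directives are understood.

```python
import shlex
from pathlib import Path
from urllib.parse import urlsplit


def proxypass_targets(config_text):
    """Yield (path, backend_url) for each simple ProxyPass line in the text."""
    for line in config_text.splitlines():
        try:
            parts = shlex.split(line, comments=True)  # handles quotes and '#' comments
        except ValueError:
            continue  # unbalanced quotes: skip rather than guess
        # Apache directive names are case-insensitive; ProxyPassReverse is excluded.
        if len(parts) >= 3 and parts[0].lower() == "proxypass":
            yield parts[1], parts[2]


def mismatched_targets(config_text, path, expected_port):
    """Return backend URLs for `path` whose port differs from `expected_port`."""
    return [
        url
        for proxied_path, url in proxypass_targets(config_text)
        if proxied_path == path and urlsplit(url).port != expected_port
    ]


def scan(root="/etc/apache2", path="/image", expected_port=19292):
    """Print every *.conf file under `root` that proxies `path` to another port."""
    root_path = Path(root)
    if not root_path.is_dir():
        print(f"{root} not found")
        return
    for conf in sorted(root_path.rglob("*.conf")):
        try:
            text = conf.read_text(errors="replace")
        except OSError:
            continue
        for url in mismatched_targets(text, path, expected_port):
            print(f"{conf}: {path} -> {url}")


if __name__ == "__main__":
    scan()
```

On the machine in this log, the expectation would be one hit for glance-wsgi-api.conf pointing at 127.0.0.1:60999.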
[16:29] <abhishekk> now I am hitting the same error as simboja
[16:30] <dansmith> ugh
[16:30] <abhishekk> Can't load plugin: sqlalchemy.plugins:dbcounterplugin=dbcounter
[16:30] <dansmith> abhishekk: can you do the same thing, see if it's installed globally?
[16:31] <dansmith> import dbcounter; print(dbcounter.__file__)
[16:31] <abhishekk> yes
[16:31] <abhishekk> /usr/local/lib/python3.8/dist-packages/dbcounter.py
[16:31] <dansmith> okay just a sec
[16:32] <dansmith> cat /usr/local/lib/python3.8/dist-packages/dbcounter-0.1.dist-info/entry_points.txt
[16:33] <abhishekk> [sqlalchemy.plugins]
[16:33] <abhishekk> dbcounter = dbcounter:LogCursorEventsPlugin
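The entry_points.txt printed here shows the plugin is registered, so the next question is whether the interpreter glance runs under sees that registration and can import the target. A hedged diagnostic sketch using the standard library's importlib.metadata (Python 3.8+), meant to be run with the same interpreter and environment as glance-api; the group name comes from the log, while `list_plugins` and `try_load` are hypothetical helpers.

```python
from importlib.metadata import entry_points


def _select(group):
    """Return the entry points in `group`, across the 3.8/3.9 and 3.10+ APIs."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9: dict of group -> list


def list_plugins(group="sqlalchemy.plugins"):
    """Return sorted (name, 'module:attr') pairs registered in `group`."""
    return sorted((ep.name, ep.value) for ep in _select(group))


def try_load(name, group="sqlalchemy.plugins"):
    """Try to import the object behind entry point `name`; return (ok, detail)."""
    matches = [ep for ep in _select(group) if ep.name == name]
    if not matches:
        return False, f"no entry point named {name!r} in group {group!r}"
    try:
        obj = matches[0].load()
    except Exception as exc:  # report any import-time failure verbatim
        return False, repr(exc)
    return True, repr(obj)


if __name__ == "__main__":
    print(list_plugins())
    print(try_load("dbcounter"))
```

On the machine in the log, the first line would be expected to include ('dbcounter', 'dbcounter:LogCursorEventsPlugin'), matching the entry_points.txt output.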
[16:33] <dansmith> wtf
[16:33] <dansmith> I don't know why it's not loading then
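One detail in the error above: SQLAlchemy formats it as `Can't load plugin: <group>:<name>`, so the name it tried to load was `dbcounterplugin=dbcounter`, not `dbcounter`. One hypothesis, not confirmed anywhere in this log, is that the database connection URL ended up with a second `plugin=dbcounter` appended without an `&` separator. The sketch below uses the standard library's query parser, which splits each `key=value` pair on its first `=` only, to show how such a URL would produce exactly that name; the URLs and the `plugin_names` helper are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def plugin_names(db_url):
    """Return every value given for the `plugin` query parameter of `db_url`.

    parse_qsl splits the query on '&' and each pair on its first '=' only,
    so a missing '&' glues the next pair onto the previous value.
    """
    query = urlsplit(db_url).query
    return [value for key, value in parse_qsl(query) if key == "plugin"]


if __name__ == "__main__":
    well_formed = "mysql+pymysql://root:secret@127.0.0.1/glance?charset=utf8&plugin=dbcounter"
    missing_amp = "mysql+pymysql://root:secret@127.0.0.1/glance?charset=utf8&plugin=dbcounterplugin=dbcounter"
    print(plugin_names(well_formed))  # ['dbcounter']
    print(plugin_names(missing_amp))  # ['dbcounterplugin=dbcounter']
```

If this hypothesis holds, inspecting the `connection` URL in glance's config on an affected machine would be the quickest way to confirm or rule it out.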
[16:34] <abhishekk> I guess I will create a new VM and try again on it
[16:34] <dansmith> is there a stack trace in the logs when it fails to load that, or just the error message?
[16:36] <abhishekk> let me https://paste.opendev.org/show/bnQUTmmPAlIPoq1RmjQj/
[16:37] <dansmith> man
[16:38] <dansmith> just re-stack with MYSQL_GATHER_PERFORMANCE=False
[16:38] <dansmith> no need to recreate the vm
[16:38] <dansmith> that will avoid trying to load that plugin
[16:39] <dansmith> I dunno why it appears sometimes and not others,
[16:39] <abhishekk> ack
[16:39] <dansmith> but pip and setuptools have been really flaky lately :/
[16:39] <abhishekk> agree
[16:40] <abhishekk> just curious why this problem is not there with upstream jobs
[16:40] <dansmith> well, right, and clearly it's not breaking everyone
[16:40] <dansmith> I just dunno what's happening in some situations.. third time I've seen it in a couple weeks, but I dunno what it takes to repro it
[16:41] <abhishekk> I will try to reproduce it on a plain vm
[16:41] <abhishekk> tomorrow
[16:59] <abhishekk> finally the standalone setup is ready
[17:11] <abhishekk> looks like graceful exit is not working for standalone either
[17:24] <dansmith> in what way?
[17:25] <dansmith> when some opposition to wsgi was brought up, I was assured that the existing graceful shutdown stuff worked and was a requirement for wsgi mode
[17:25] <dansmith> so maybe I broke it?
[17:26] <abhishekk> no idea since when it has been broken
[17:27] <abhishekk> but I am sure it was working earlier (in the beginning) as I tested it
[17:31] <dansmith> what's failing now specifically?
[17:31] <abhishekk> it is exiting immediately when I send kill -9 or restart the g-api service
[17:31] <dansmith> I was pretty sure I replicated the graceful shutdown when I did the threadpool stuff, as I had something that generated data slowly to keep things running while I tried to kill it
[17:32] <dansmith> abhishekk: kill -9 is not graceful, it gives the app no option to handle things
[17:32] <dansmith> so that's definitely going to not be graceful :D
[17:32] <dansmith> a userland application can't catch SIGKILL
[17:32] <abhishekk> kill -15?
[17:32] <dansmith> SIGINT is what you want
[17:33] <abhishekk> so service restart sends a kill signal as well
[17:33] <dansmith> systemd will send INT (or similar), wait some timeout and then KILL
[17:34] <dansmith> and I think it's capped at something rather small, like 300s or something, which puts an upper bound on how long we can wait anyway
[17:34] <abhishekk> it does not wait
[17:34] <dansmith> I think it has to be configured to do so, hang on
[17:35] <dansmith> https://stackoverflow.com/questions/42978358/how-systemd-stop-command-actually-works
[17:35] <abhishekk> looking
[17:35] <dansmith> https://www.freedesktop.org/software/systemd/man/systemd.kill.html#
[17:36] <dansmith> "Processes will first be terminated via SIGTERM (unless the signal to send is changed via KillSignal= or RestartKillSignal=). Optionally, this is immediately followed by a SIGHUP (if enabled with SendSIGHUP=). If processes still remain after the main process of a unit has exited or the delay configured via the TimeoutStopSec= has passed, the termination request is repeated with the SIGKILL signal or the signal specified via
[17:36] <dansmith> FinalKillSignal= (unless this is disabled via the SendSIGKILL= option). See kill(2) for more information."
[17:37] <dansmith> https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStopSec=
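The systemd behaviour quoted above (SIGTERM first, SIGKILL only if the process is still there when TimeoutStopSec= expires) can be modelled in a few lines. This is a simplified sketch assuming a POSIX system, not systemd's implementation: it ignores KillMode=, SendSIGHUP= and ExecStop=, and `stop_like_systemd` and `spawn_child` are hypothetical names. For reference, systemd documents 90 seconds as the default for DefaultTimeoutStopSec=.

```python
import signal
import subprocess
import sys


def stop_like_systemd(proc, timeout_stop_sec):
    """Send SIGTERM, wait up to `timeout_stop_sec`, then escalate to SIGKILL.

    Returns the last signal that had to be sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout_stop_sec)
        return signal.SIGTERM
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()
        return signal.SIGKILL


def spawn_child(ignore_term):
    """Start a sleeping Python child that ignores or keeps the default SIGTERM action."""
    disposition = "SIG_IGN" if ignore_term else "SIG_DFL"
    code = (
        "import signal, time\n"
        f"signal.signal(signal.SIGTERM, signal.{disposition})\n"
        "print('ready', flush=True)\n"
        "time.sleep(30)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the disposition is in place
    return proc


if __name__ == "__main__":
    for ignore in (False, True):
        child = spawn_child(ignore_term=ignore)
        last = stop_like_systemd(child, 0.5)
        child.stdout.close()
        # Expected on Linux: "SIGTERM -15", then "SIGKILL -9"
        print(last.name, child.returncode)
```

A process that exits on SIGTERM before the timeout never sees SIGKILL, which is why an immediate exit on TERM looks, from the outside, just like being killed.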
[17:37] <abhishekk> so I need to check the glance systemd file
[17:37] <dansmith> looks like you can set it very long now
[17:39] <abhishekk> ack
[17:39] <dansmith> abhishekk: but referring back to kill, look at signal(7) and you'll see:
[17:39] <dansmith> "The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored."
[17:39] <dansmith> kill nukes you from orbit immediately, no chance to be graceful
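The signal(7) rule quoted above is easy to confirm from Python: the standard `signal` module cannot install a handler for SIGKILL or SIGSTOP, while SIGTERM and SIGINT accept one. A small sketch, assuming Linux or another POSIX system and that it runs in the main thread (a requirement of `signal.signal`); `can_install_handler` is a hypothetical name.

```python
import signal


def can_install_handler(sig):
    """Return True if a Python-level handler can be installed for `sig`."""
    def handler(signum, frame):
        pass

    try:
        previous = signal.signal(sig, handler)
    except OSError:  # sigaction() rejects SIGKILL/SIGSTOP with EINVAL
        return False
    # Put back whatever was there before; None means it was set outside Python.
    signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return True


if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGKILL, signal.SIGSTOP):
        print(sig.name, can_install_handler(sig))
```

This is why any graceful-shutdown logic in glance-api has to hang off TERM or INT: `kill -9` never reaches application code.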
[17:40] <abhishekk> yeah
[17:41] <abhishekk> in short, when I hit the systemctl restart command it should wait at least for some seconds, right?
[17:42] <dansmith> restart and stop are different I think
[17:42] <dansmith> stop should for sure
[17:43] <abhishekk> ack
[17:43] <abhishekk> will check separately
[17:45] <dansmith> well, reload is for sure, maybe restart is just stop...start, not sure, looking
[17:46] <abhishekk> no, -15 and stop also exit immediately
[17:47] <dansmith> yeah, looks like restart should stop then start
[17:47] <dansmith> abhishekk: so if we're exiting immediately on TERM then that means we're not catching that signal
[17:47] <abhishekk> likely
[17:48] <dansmith> so stop (and restart) are probably sending TERM and then waiting to send KILL, but we exit immediately, so it looks like KILL behavior
[17:48] <jokke_> yup ... iirc it's the SIGHUP that gets glance to reload; I can't remember if it's TERM or INT that is supposed to be graceful shutdown
[17:48] <dansmith> jokke_: either should trigger graceful shutdown
[17:48] <dansmith> INT is effectively Ctrl-C and TERM is the kinda service-y equivalent
[17:49] <dansmith> I'm not sure why glance doesn't use oslo.service, but this is all done there and combined into "stop" and "restart" handlers
[17:49] <jokke_> yeah
[17:50] <jokke_> I think glance service handlers were written way ahead of oslo.service and no-one ever refactored them
[17:50] <dansmith> ack
[17:50] <dansmith> nova was too, and still has a wrapper around oslo.service, but it uses them under the covers
[17:50] <abhishekk> we use sighup for reload
[17:50] <dansmith> so we get automatic config file reload and stuff
[17:51] <abhishekk> https://review.opendev.org/c/openstack/glance/+/122181
[17:51] <abhishekk> this was the original patch which added reload functionality
[17:51] <dansmith> ack, so just need to catch INT and TERM as well to handle actual graceful shutdown
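The conclusion here, that INT and TERM need to be caught and not only HUP, amounts to turning those signals into a "stop taking new work, finish what is in flight, then exit" request. The sketch below shows that pattern in plain Python, assuming a POSIX system and the main thread. It is not glance's implementation or oslo.service's API; `GracefulShutdown` and `run_worker` are hypothetical names.

```python
import signal
import threading


class GracefulShutdown:
    """Hypothetical helper: converts SIGTERM/SIGINT into a 'please stop' flag."""

    def __init__(self, signals=(signal.SIGTERM, signal.SIGINT)):
        self.stop_requested = threading.Event()
        self.received = []
        self._previous = {}
        for sig in signals:
            self._previous[sig] = signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Only record the request here; the actual draining happens elsewhere.
        self.received.append(signum)
        self.stop_requested.set()

    def restore(self):
        """Reinstall the handlers that were active before this object."""
        for sig, handler in self._previous.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)


def run_worker(requests, stop_requested, handle):
    """Process `requests` in order until a stop is requested.

    The request being handled when the flag is set always completes; only
    requests that have not started yet are skipped.
    """
    completed = []
    for request in requests:
        if stop_requested.is_set():
            break  # stop taking new work
        completed.append(handle(request))
    return completed
```

In a real server the loop would be the accept loop, and after it returns the process would wait for its outstanding workers before exiting, instead of dying on the spot as described for kill -15 and systemctl stop.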
[17:51] <abhishekk> in 2015
[17:51] <abhishekk> but at this moment HUP is also not working as expected
[17:52] <dansmith> I'm not sure why HUP would need to drain anything...
[17:52] <dansmith> so I'm not sure what glance's "expected" HUP behavior is really
[17:53] <dansmith> usually it's "reload the config file" sort of things, which doesn't need to interrupt any in-progress cache or upload/download activities
[17:55] <jokke_> the reason it doesn't work is that if the graceful shutdown does not work, everything just dies. It's supposed to send graceful shutdown to all workers and, in order of them going down, bring new ones up with new confs.
[17:55] <jokke_> but if everything just gets "die" ... well
[17:55] <dansmith> that's what HUP is supposed to do you mean?
[17:55] <jokke_> yes
[17:56] <dansmith> ack, well, that's fine I guess as long as it continues to answer queries until that happens,
[17:56] <dansmith> but for deployment stuff like updating containers, it should do similar things for TERM/INT
[17:56] <abhishekk> actually the old worker which was continuing the old task does not accept any new request and exits once its work is finished
[17:57] <abhishekk> whereas after sighup it was immediately starting new workers with new configs, and old workers used to exit once their task was done
[17:57] <jokke_> yeah can't remember the exact code anymore. IIRC if it has say 4 workers in settings, it should bring one up to serve new connections (so temporarily 5 workers) and once the old ones finish and start going down it should bring the worker count back to 4
[17:57] <abhishekk> right
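jokke_'s description of the HUP path (with 4 workers configured, start one new-config worker right away so there are briefly 5, then replace each old worker as it finishes until there are 4 new ones) is easier to see as a worker-count timeline. This is a toy model only, assuming old workers finish one at a time in a given order; it does not reproduce `_verify_and_respawn_children`, and `rolling_reload_timeline` is a hypothetical name.

```python
def rolling_reload_timeline(num_workers, finish_order):
    """Return (old_workers, new_workers) after each step of a rolling reload.

    Step 0: one new-config worker starts immediately, so capacity briefly
    exceeds `num_workers`. Afterwards, each time an old worker drains and
    exits, new workers are topped up so the total returns to `num_workers`.
    """
    old = set(range(num_workers))
    new = 1
    timeline = [(len(old), new)]
    for worker in finish_order:
        old.discard(worker)
        new = max(new, num_workers - len(old))
        timeline.append((len(old), new))
    return timeline


if __name__ == "__main__":
    for old, new in rolling_reload_timeline(4, [2, 0, 3, 1]):
        print(f"old={old} new={new} total={old + new}")
```

With 4 workers the total goes 5, 4, 4, 4, 4: the extra worker exists only until the first old worker drains, and every old worker is allowed to finish its current task rather than being killed.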
[17:58] <abhishekk> _verify_and_respawn_children in glance/common/wsgi.py
[17:59] <dansmith> ack, so for shutdown it should close its listen socket (if standalone) and then do that, but terminate when complete
[17:59] <dansmith> uwsgi handles this for you for things that are in-process connections
[17:59] <dansmith> the other threads you have spawned are not automatic in that case, but if you're processing a connection it handed you, then it knows how to do it right
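The shutdown order outlined here for standalone mode (stop listening first, let in-flight connections finish, then terminate) looks roughly like this with plain sockets and threads. It is a hypothetical sketch, not glance's eventlet-based server and not how uwsgi does it; `DrainingServer`, its one-line request/reply protocol and `work_seconds` are made up for illustration. The listening socket is polled with a short timeout so the accept loop can notice the stop request portably.

```python
import socket
import threading
import time


class DrainingServer:
    """Toy TCP server: each connection gets one slow reply, then is closed."""

    def __init__(self, host="127.0.0.1", port=0, work_seconds=0.2):
        self.work_seconds = work_seconds
        self._stopping = threading.Event()
        self._in_flight = []
        self._listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._listener.bind((host, port))
        self._listener.listen()
        self._listener.settimeout(0.1)  # wake up regularly to check _stopping
        self.address = self._listener.getsockname()
        self._accept_thread = threading.Thread(target=self._accept_loop)
        self._accept_thread.start()

    def _accept_loop(self):
        try:
            while not self._stopping.is_set():
                try:
                    conn, _ = self._listener.accept()
                except socket.timeout:
                    continue
                worker = threading.Thread(target=self._handle, args=(conn,))
                worker.start()
                self._in_flight.append(worker)
        finally:
            self._listener.close()  # new clients now get "connection refused"

    def _handle(self, conn):
        with conn:
            conn.recv(1024)                # read the request
            time.sleep(self.work_seconds)  # simulate a slow upload/download
            conn.sendall(b"done\n")

    def shutdown(self):
        """Stop accepting, then block until every in-flight request has finished."""
        self._stopping.set()
        self._accept_thread.join()
        for worker in self._in_flight:
            worker.join()
```

Calling `shutdown()` while a request is being processed still lets that client receive its reply, while a connection attempted afterwards is refused; a process would exit only once `shutdown()` returns.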
[18:00] <abhishekk> yeah
[18:13] <abhishekk> reload does not read new config values either
[18:34] <abhishekk> after reload the new configs are loaded (so the only problem is that the existing workers need to wait)
[18:37] <abhishekk> I think this is what's causing the worker to exit and not wait
[18:37] <abhishekk> https://paste.opendev.org/show/bTjORjtLtbhUe93xQFDu/
[19:05] <abhishekk> looks like an eventlet-related issue
[19:05] * abhishekk signing out for the day
[19:06] <abhishekk> reproducer (steps) https://paste.opendev.org/show/bFqCD4U4P0Q8MyIrsO4J/
[19:15] <dansmith> hey, at least the cache worker exited :)
[19:16] <abhishekk> :D
