opendevreview | Rajat Dhasmana proposed openstack/glance-specs master: Add new location APIs https://review.opendev.org/c/openstack/glance-specs/+/840882 | 05:54 |
simboja_ | dansmith: >>> import dbcounter | 10:29 |
simboja_ | >>> print(dbcounter.__file__) | 10:29 |
simboja_ | >>> | 10:29 |
simboja_ | https://paste.opendev.org/show/bgXxGgP8PzMTY9n2cQ4y/ | 10:30 |
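(Editor's note: a minimal sketch of the check being run here, assuming a devstack host on Python 3.8, where importlib.metadata.entry_points() returns a dict of groups; the entry-point lookup mirrors the entry_points.txt check done later in the log.)

```python
# Verify both where dbcounter landed and whether its SQLAlchemy
# entry point is visible to the interpreter glance runs under.
import dbcounter
print(dbcounter.__file__)  # e.g. .../dist-packages/dbcounter.py

from importlib.metadata import entry_points
for ep in entry_points().get("sqlalchemy.plugins", []):
    print(ep.name, ep.value)  # expect: dbcounter dbcounter:LogCursorEventsPlugin
```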
opendevreview | Pranali Deore proposed openstack/glance master: Remove dead code of auth and policy layers https://review.opendev.org/c/openstack/glance/+/845114 | 10:31 |
opendevreview | Pranali Deore proposed openstack/glance-specs master: [APIImpact] Add DELETE api for metadef resource types https://review.opendev.org/c/openstack/glance-specs/+/818192 | 13:04 |
abhishekk | dansmith, do we need to set any parameter in local.conf other than GLANCE_STANDALONE=True? I am getting a 503 error while accessing the glance service | 13:26 |
abhishekk | http://10.0.108.117/image returns 503 but curl http://0.0.0.0:19292 returns a valid response | 13:27 |
dansmith | abhishekk: you need TLS_PROXY for the integrated /image thing I think | 13:28 |
dansmith | simboja: okay, that's installed in the right place | 13:28 |
abhishekk | dansmith, ack, I will restack with TLS_PROXY | 13:29 |
dansmith | abhishekk: oh sorry you *want* standalone.. might need to disable tls_proxy for that and then /image won't work I think | 13:29 |
dansmith | abhishekk: look at the standalone jobs, I'd have to :) | 13:29 |
dansmith | simboja: which distro? | 13:29 |
simboja | Linux ubuntu 20.04 | 13:30 |
abhishekk | this is local.conf for standalone job, https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_87f/845114/6/check/tempest-integrated-storage-import-standalone/87fa10d/controller/logs/_.localrc_auto.txt | 13:30 |
abhishekk | here tls_proxy is enabled | 13:30 |
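(Editor's note: a minimal local.conf sketch of the settings under discussion, inferred from the linked job config rather than a verified recipe; the standalone job keeps tls-proxy enabled, which is what provides the apache /image endpoint.)

```
[[local|localrc]]
GLANCE_STANDALONE=True
enable_service tls-proxy
```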
abhishekk | I will be back in a couple of hours, need to rush outside | 13:31 |
dansmith | simboja: that's strange because it's clearly installed, glance should be able to import it | 13:31 |
dansmith | simboja: you saw my workaround right? | 13:32 |
simboja | I added this conf: https://paste.opendev.org/show/bOIQ9X6Kw2ZNv4ksNbBZ/ | 13:32 |
dansmith | simboja: and you're good now? | 13:33 |
simboja | Yeah, the workaround is passing, but I am hitting a different issue now, on Kuryr: I am stuck with this timeout connecting to machines: https://drive.google.com/file/d/1BKyJKvHBBiB5ofp90ElDiPm_rfdGPRsp/view?usp=sharing | 13:33 |
dansmith | okay I can't help with that :) | 13:34 |
simboja | dansmith: Did you take a look at my HOST_IP=10.0.2.15? | 13:34 |
simboja | Any issue with the HOST_IP? | 13:34 |
dansmith | simboja: is that really the IP of the machine you're running it on? | 13:34 |
dansmith | you shouldn't need to set that generally, fwiw | 13:35 |
simboja | No | 13:35 |
simboja | dansmith: 192.168.43.42 | 13:35 |
simboja | If I use this, I will be stuck with keystone timeout :D | 13:35 |
simboja | dansmith: If I do not set HOST_IP, the stack run will exit and throw an error that it's not set :) | 13:36 |
simboja | Quite convoluted | 13:36 |
dansmith | simboja: it shouldn't be required.. | 13:37 |
dansmith | simboja: can we move to -qa? that's where the devstack experts are.. this was never and definitely now isn't a glance issue :) | 13:37 |
simboja | okay | 13:37 |
*** maysams-afk is now known as maysams | 14:02 |
croelandt | jokke_: dansmith: can we push a tag for glance_store's stable/wallaby? This bug https://bugzilla.redhat.com/show_bug.cgi?id=2052857 is fixed by a patch that is in stable/wallaby, but the import won't be done automagically on Red Hat's side without a tag | 15:13 |
dansmith | croelandt: I'm not on glance stable | 15:59 |
dansmith | but presumably if there's content pending, a tag makes sense | 15:59 |
croelandt | Is there a specific procedure? | 15:59 |
croelandt | Or can I just git tag && git push --tags? :) | 15:59 |
croelandt | Maybe I'll bring that up at the Thursday meeting, see if we have docs about that | 15:59 |
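(Editor's note: for context, stable releases in OpenStack are normally proposed through the openstack/releases repo rather than pushed as raw tags; a hypothetical deliverable entry might look like the following, with the version and hash as placeholders only.)

```yaml
# deliverables/wallaby/glance-store.yaml in openstack/releases (sketch)
releases:
  - version: 2.5.1                      # placeholder version
    projects:
      - repo: openstack/glance_store
        hash: <stable/wallaby commit sha to tag>   # placeholder
```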
abhishekk | dansmith, https://paste.opendev.org/show/barB6VbDh0NOQiaiWf6t/ | 16:00 |
abhishekk | this is my local.conf, I am still getting 503 for glance | 16:00 |
dansmith | abhishekk: is it registered in the catalog as host/image? | 16:01 |
abhishekk | in endpoint I can see it as /image | 16:01 |
abhishekk | add53df67a3042a9bafbf319efe99257 | RegionOne | glance | image | True | public | http://10.0.108.117/image | 16:02 |
dansmith | and it's 503ing because why? | 16:02 |
dansmith | like is that glance returning 503 or apache because it can't hit glance? | 16:02 |
abhishekk | I think apache, because it's not hitting glance | 16:06 |
abhishekk | <hr> | 16:06 |
abhishekk | <address>Apache/2.4.41 (Ubuntu) Server at 10.0.108.117 Port 80</address> | 16:06 |
abhishekk | </body></html> | 16:06 |
dansmith | so in /etc/apache2 somewhere is the config file that makes it proxy to glance, so see what that's configured for | 16:07 |
abhishekk | The server is temporarily unable to service your | 16:07 |
abhishekk | request due to maintenance downtime or capacity | 16:07 |
abhishekk | problems. Please try again later. | 16:07 |
abhishekk | ack | 16:07 |
abhishekk | ProxyPass "/image" "http://127.0.0.1:60999" retry=0 | 16:08 |
dansmith | is that right? | 16:09 |
dansmith | I thought disabling tls-proxy got rid of all of this, but maybe not | 16:10 |
abhishekk | I think this needs to be http://127.0.0.1:19292 | 16:10 |
dansmith | yeah.. I dunno why it's wrong on your system, but that looks like a "generate random port" thing and glance itself didn't get told | 16:11 |
dansmith | probably some complex interaction of options | 16:12 |
dansmith | maybe something set in tempest-integrated-storage-import | 16:12 |
abhishekk | there is a glance.conf under /etc/apache2 as well which has http://127.0.0.1:19292 | 16:13 |
dansmith | oh interesting | 16:13 |
abhishekk | ProxyPass "/image" "http://127.0.0.1:60999" retry=0 and this is in /etc/apache2/glance-wsgi-api.conf | 16:13 |
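(Editor's note: the fix abhishekk applies next, shown as a sketch; 19292 is the port glance actually answered on earlier in the log.)

```
# /etc/apache2/glance-wsgi-api.conf -- point the proxy at the real
# glance port instead of the stale randomly-generated one:
ProxyPass "/image" "http://127.0.0.1:19292" retry=0
```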
abhishekk | after changing it in the above file it is responding | 16:14 |
abhishekk | now I am hitting same error as simboja | 16:29 |
dansmith | ugh | 16:30 |
abhishekk | Can't load plugin: sqlalchemy.plugins:dbcounter | 16:30 |
dansmith | abhishekk: can you do the same thing, see if it's installed globally? | 16:30 |
dansmith | import dbcounter; print(dbcounter.__file__) | 16:31 |
abhishekk | yes | 16:31 |
abhishekk | /usr/local/lib/python3.8/dist-packages/dbcounter.py | 16:31 |
dansmith | okay just a sec | 16:31 |
dansmith | cat /usr/local/lib/python3.8/dist-packages/dbcounter-0.1.dist-info/entry_points.txt | 16:32 |
abhishekk | [sqlalchemy.plugins] | 16:33 |
abhishekk | dbcounter = dbcounter:LogCursorEventsPlugin | 16:33 |
dansmith | wtf | 16:33 |
dansmith | I don't know why it's not loading then | 16:33 |
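(Editor's note: a minimal way to reproduce the plugin lookup outside of glance, assuming SQLAlchemy 1.2+ where create_engine() accepts a plugins list; only the entry-point resolution is being exercised here, so any backing URL serves.)

```python
import sqlalchemy

# If the entry point is broken this raises NoSuchModuleError:
# "Can't load plugin: sqlalchemy.plugins:dbcounter"
engine = sqlalchemy.create_engine("sqlite://", plugins=["dbcounter"])
print("dbcounter plugin loaded OK")
```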
abhishekk | I guess, I will create new VM and try again on it | 16:34 |
dansmith | is there a stack trace in the logs when it fails to load that or just the error message? | 16:34 |
abhishekk | let me https://paste.opendev.org/show/bnQUTmmPAlIPoq1RmjQj/ | 16:36 |
dansmith | man | 16:37 |
dansmith | just re-stack with MYSQL_GATHER_PERFORMANCE=False | 16:38 |
dansmith | no need to recreate the vm | 16:38 |
dansmith | that will avoid trying to load that plugin | 16:38 |
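(Editor's note: i.e., a one-line local.conf addition; a sketch of the workaround as stated above.)

```
[[local|localrc]]
MYSQL_GATHER_PERFORMANCE=False
```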
dansmith | I dunno why it appears sometimes and not others, | 16:39 |
abhishekk | ack | 16:39 |
dansmith | but pip and setuptools have been really flaky lately :/ | 16:39 |
abhishekk | agree | 16:39 |
abhishekk | just curious why this problem doesn't show up in upstream jobs | 16:40 |
dansmith | well, right, and clearly it's not breaking everyone | 16:40 |
dansmith | I just dunno what's happening in some situations.. third time I've seen it in a couple weeks, but I dunno what it takes to repro it | 16:40 |
abhishekk | I will try to reproduce it on plain vm | 16:41 |
abhishekk | tomorrow | 16:41 |
abhishekk | finally standalone setup is ready | 16:59 |
abhishekk | looks like graceful exit is not working for standalone either | 17:11 |
dansmith | in what way? | 17:24 |
dansmith | when some opposition to wsgi was brought up, I was assured that the existing graceful shutdown stuff worked and was a requirement for wsgi mode | 17:25 |
dansmith | so maybe I broke it? | 17:25 |
abhishekk | no idea since when it is broken | 17:26 |
abhishekk | but I am sure it was working earlier (in the beginning) as I tested it | 17:27 |
dansmith | what's failing now specifically? | 17:31 |
abhishekk | it is exiting immediately when I send kill -9 or restart the g-api service | 17:31 |
dansmith | I was pretty sure I replicated the graceful shutdown when I did the threadpool stuff, as I had some thing that generated data slowly to keep things running while I tried to kill it | 17:31 |
dansmith | abhishekk: kill -9 is not graceful, it gives the app no option to handle things | 17:32 |
dansmith | so that's definitely going to not be graceful :D | 17:32 |
dansmith | a userland application can't catch SIGKILL | 17:32 |
abhishekk | kill -15? | 17:32 |
dansmith | SIGINT is what you want | 17:32 |
abhishekk | so service restart sends a kill signal as well? | 17:33 |
dansmith | systemd will send INT (or similar), wait some timeout and then KILL | 17:33 |
dansmith | and I think it's capped at something rather small, like 300s or something, which puts an upper bound on how long we can wait anyway | 17:34 |
abhishekk | it does not wait | 17:34 |
dansmith | I think it has to be configured to do so, hang on | 17:34 |
dansmith | https://stackoverflow.com/questions/42978358/how-systemd-stop-command-actually-works | 17:35 |
abhishekk | looking | 17:35 |
dansmith | https://www.freedesktop.org/software/systemd/man/systemd.kill.html# | 17:35 |
dansmith | "Processes will first be terminated via SIGTERM (unless the signal to send is changed via KillSignal= or RestartKillSignal=). Optionally, this is immediately followed by a SIGHUP (if enabled with SendSIGHUP=). If processes still remain after the main process of a unit has exited or the delay configured via the TimeoutStopSec= has passed, the termination request is repeated with the SIGKILL signal or the signal specified via | 17:36 |
dansmith | FinalKillSignal= (unless this is disabled via the SendSIGKILL= option). See kill(2) for more information." | 17:36 |
dansmith | https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStopSec= | 17:37 |
abhishekk | so I need to check glance systemd file | 17:37 |
dansmith | looks like you can set it very long now | 17:37 |
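(Editor's note: a hypothetical systemd drop-in for the settings being discussed, assuming devstack's devstack@g-api unit name; the values are illustrative only.)

```
# /etc/systemd/system/devstack@g-api.service.d/graceful.conf (hypothetical)
[Service]
KillSignal=SIGTERM
# how long systemd waits for a graceful exit before escalating to SIGKILL
TimeoutStopSec=300
```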
abhishekk | ack | 17:39 |
dansmith | abhishekk: but referring back to kill, look at signal(7) and you'll see: | 17:39 |
dansmith | "The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored." | 17:39 |
dansmith | kill nukes you from orbit immediately, no chance to be graceful | 17:39 |
abhishekk | yeah | 17:40 |
abhishekk | in short, when I run the systemctl restart command it should wait at least for some seconds, right? | 17:41 |
dansmith | restart and stop are different I think | 17:42 |
dansmith | stop should for sure | 17:42 |
abhishekk | ack | 17:43 |
abhishekk | will check separately | 17:43 |
dansmith | well, reload is for sure, maybe restart is just stop...start, not sure, looking | 17:45 |
abhishekk | no, -15 and stop also exit immediately | 17:46 |
dansmith | yeah, looks like restart should stop then start | 17:47 |
dansmith | abhishekk: so if we're exiting immediately on TERM then that means we're not catching that signal | 17:47 |
abhishekk | likely | 17:47 |
dansmith | so stop (and restart) are probably sending TERM and then waiting to send KILL, but we exit immediately, so it looks like KILL behavior | 17:48 |
jokke_ | yup ... iirc it's SIGHUP that gets glance doing a reload; I can't remember if it's TERM or INT that is supposed to be the graceful shutdown | 17:48 |
dansmith | jokke_: either should trigger graceful shutdown | 17:48 |
dansmith | INT is effectively Ctrl-C and TERM is the kinda service-y equivalent | 17:48 |
dansmith | I'm not sure why glance doesn't use oslo.service, but this is all done there and combined into "stop" and "restart" handlers | 17:49 |
jokke_ | yeah | 17:49 |
jokke_ | I think the glance service handlers were written way ahead of oslo.service and no-one ever refactored them | 17:50 |
dansmith | ack | 17:50 |
dansmith | nova was too, and still has a wrapper around oslo.service, but it uses them under the covers | 17:50 |
abhishekk | we use sighup for reload | 17:50 |
dansmith | so we get automatic config file reload and stuff | 17:50 |
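(Editor's note: a rough sketch of the oslo.service pattern being described, not glance code; launch() installs the handlers so that TERM/INT do a graceful stop and HUP a restart/reload.)

```python
from oslo_config import cfg
from oslo_service import service


class APIService(service.Service):
    def start(self):
        super().start()
        # start the wsgi server / background threads here


# launch() wires up the signal handling and worker management
launcher = service.launch(cfg.CONF, APIService(), workers=4)
launcher.wait()
```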
abhishekk | https://review.opendev.org/c/openstack/glance/+/122181 | 17:51 |
abhishekk | this was the original patch which added reload functionality | 17:51 |
dansmith | ack, so just need to catch INT and TERM as well to handle actual graceful shutdown | 17:51 |
abhishekk | in 2015 | 17:51 |
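(Editor's note: a bare-bones sketch of catching INT/TERM for a graceful exit; the drain step is a placeholder, and SIGKILL, per the signal(7) quote above, never reaches a handler.)

```python
import signal
import sys
import time


def _graceful_exit(signum, frame):
    # placeholder: stop accepting new requests, drain in-flight work, exit
    print("caught %s, draining before exit" % signal.Signals(signum).name)
    sys.exit(0)


signal.signal(signal.SIGTERM, _graceful_exit)  # what systemctl stop sends
signal.signal(signal.SIGINT, _graceful_exit)   # Ctrl-C
# kill -9 (SIGKILL) bypasses both handlers and ends the process immediately
while True:
    time.sleep(1)
```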
abhishekk | but at this moment HUP is also not working as expected | 17:51 |
dansmith | I'm not sure why HUP would need to drain anything... | 17:52 |
dansmith | so I'm not sure what glance's "expected" HUP behavior is really | 17:52 |
dansmith | usually it's "reload the config file" sort of things, which doesn't need to interrupt any in-progress cache or upload/download activities | 17:53 |
jokke_ | the reason it doesn't work is that if the graceful shutdown does not work, everything just dies. It's supposed to send a graceful shutdown to all workers and, as they go down, bring new ones up with the new confs. | 17:55 |
jokke_ | but if everything just gets "die" ... well | 17:55 |
dansmith | that's what HUP is supposed to do you mean? | 17:55 |
jokke_ | yes | 17:55 |
dansmith | ack, well, that's fine I guess as long as it continues to answer queries until that happens, | 17:56 |
dansmith | but for deployment stuff like updating containers, it should do similar things for TERM/INT | 17:56 |
abhishekk | actually the old worker, which was continuing the old task, does not accept any new requests and exits once its work is finished | 17:56 |
abhishekk | whereas after sighup it was immediately starting new workers with new configs, and the old workers used to exit once their task was done | 17:57 |
jokke_ | yeah, can't remember the exact code anymore. IIRC if it has, say, 4 workers in settings, it should bring one up to serve new connections (so temporarily 5 workers) and once the old ones finish and start going down it should bring the worker count back to 4 | 17:57 |
abhishekk | right | 17:57 |
abhishekk | _verify_and_respawn_children in glance/common/wsgi.py | 17:58 |
dansmith | ack, so for shutdown it should close its listen socket (if standalone) and then do that, but terminate when complete | 17:59 |
dansmith | uwsgi handles this for you for things that are in-process connections | 17:59 |
dansmith | the other threads you have spawned are not automatic in that case, but if you're processing a connection it handed you, then it knows how to do it right | 17:59 |
abhishekk | yeah | 18:00 |
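(Editor's note: a hypothetical uwsgi fragment for that behavior; die-on-term makes SIGTERM mean shutdown rather than uwsgi's legacy reload, and worker-reload-mercy bounds how long workers get to finish in-flight requests. Values are illustrative.)

```ini
[uwsgi]
master = true
processes = 4
# SIGTERM means graceful shutdown instead of uwsgi's legacy "brutal reload"
die-on-term = true
# seconds workers get to finish in-flight requests before being killed
worker-reload-mercy = 60
```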
abhishekk | reload does not read new config values either | 18:13 |
abhishekk | after reload the new configs are loaded (so the only remaining problem is that the existing workers need to wait) | 18:34 |
abhishekk | I think this is what is causing the worker to exit and not wait | 18:37 |
abhishekk | https://paste.opendev.org/show/bTjORjtLtbhUe93xQFDu/ | 18:37 |
abhishekk | looks like eventlet related issue | 19:05 |
* abhishekk signing out for the day | 19:05 |
abhishekk | reproducer (steps) https://paste.opendev.org/show/bFqCD4U4P0Q8MyIrsO4J/ | 19:06 |
dansmith | hey, at least the cache worker exited :) | 19:15 |
abhishekk | :D | 19:16 |