opendevreview | Rajat Dhasmana proposed openstack/glance-specs master: Add new location APIs https://review.opendev.org/c/openstack/glance-specs/+/840882 | 05:54 |
simboja_ | dansmith: >>> import dbcounter | 10:29 |
simboja_ | >>> print(dbcounter.__file__) | 10:29 |
simboja_ | >>> | 10:29 |
simboja_ | https://paste.opendev.org/show/bgXxGgP8PzMTY9n2cQ4y/ | 10:30 |
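(Editor's note: a minimal sketch of the check being run here, assuming a devstack host on Python 3.8, where importlib.metadata.entry_points() returns a dict of groups; the entry-point lookup mirrors the entry_points.txt check done later in the log.)

```python
# Verify both where dbcounter landed and whether its SQLAlchemy
# entry point is visible to the interpreter glance runs under.
import dbcounter
print(dbcounter.__file__)  # e.g. .../dist-packages/dbcounter.py

from importlib.metadata import entry_points
for ep in entry_points().get("sqlalchemy.plugins", []):
    print(ep.name, ep.value)  # expect: dbcounter dbcounter:LogCursorEventsPlugin
```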
opendevreview | Pranali Deore proposed openstack/glance master: Remove dead code of auth and policy layers https://review.opendev.org/c/openstack/glance/+/845114 | 10:31 |
opendevreview | Pranali Deore proposed openstack/glance-specs master: [APIImpact] Add DELETE api for metadef resource types https://review.opendev.org/c/openstack/glance-specs/+/818192 | 13:04 |
abhishekk | dansmith, do we need to set any parameter in local.conf other than GLANCE_STANDALONE=True? I am getting a 503 error while accessing the glance service | 13:26 |
abhishekk | http://10.0.108.117/image returns 503 but curl http://0.0.0.0:19292 returns a valid response | 13:27 |
dansmith | abhishekk: you need TLS_PROXY for the integrated /image thing I think | 13:28 |
dansmith | simboja: okay, that's installed in the right place | 13:28 |
abhishekk | dansmith, ack, I will restack with TLS_PROXY | 13:29 |
dansmith | abhishekk: oh sorry you *want* standalone.. might need to disable tls_proxy for that and then /image won't work I think | 13:29 |
dansmith | abhishekk: look at the standalone jobs, I'd have to :) | 13:29 |
dansmith | simboja: which distro? | 13:29 |
simboja | Linux ubuntu 20.04 | 13:30 |
abhishekk | this is local.conf for standalone job, https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_87f/845114/6/check/tempest-integrated-storage-import-standalone/87fa10d/controller/logs/_.localrc_auto.txt | 13:30 |
abhishekk | here tls_proxy is enabled | 13:30 |
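(Editor's note: a minimal local.conf sketch of the settings under discussion, inferred from the linked job config rather than a verified recipe; the standalone job keeps tls-proxy enabled, which is what provides the apache /image endpoint.)

```
[[local|localrc]]
GLANCE_STANDALONE=True
enable_service tls-proxy
```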
abhishekk | I will be back in a couple of hours, need to rush outside | 13:31 |
dansmith | simboja: that's strange because it's clearly installed, glance should be able to import it | 13:31 |
dansmith | simboja: you saw my workaround right? | 13:32 |
simboja | I added this conf: https://paste.opendev.org/show/bOIQ9X6Kw2ZNv4ksNbBZ/ | 13:32 |
dansmith | simboja: and you're good now? | 13:33 |
simboja | Yeah, the workaround is passing, but I am hitting a different issue now, on Kuryr: I am stuck with this timeout connecting to machines: https://drive.google.com/file/d/1BKyJKvHBBiB5ofp90ElDiPm_rfdGPRsp/view?usp=sharing | 13:33 |
dansmith | okay I can't help with that :) | 13:34 |
simboja | dansmith: Did you take a look at my HOST_IP=10.0.2.15? | 13:34 |
simboja | Any issue with the HOST_IP? | 13:34 |
dansmith | simboja: is that really the IP of the machine you're running it on? | 13:34 |
dansmith | you shouldn't need to set that generally, fwiw | 13:35 |
simboja | No | 13:35 |
simboja | dansmith: 192.168.43.42 | 13:35 |
simboja | If I use this, I will be stuck with keystone timeout :D | 13:35 |
simboja | dansmith: If I do not set HOST_IP, the stack run will exit and throw an error that it's not set :) | 13:36 |
simboja | Quite convoluted | 13:36 |
dansmith | simboja: it shouldn't be required.. | 13:37 |
dansmith | simboja: can we move to -qa? that's where the devstack experts are.. this was never and definitely now isn't a glance issue :) | 13:37 |
simboja | okay | 13:37 |
*** maysams-afk is now known as maysams | 14:02 |
croelandt | jokke_: dansmith: can we push a tag for glance_store's stable/wallaby? This bug https://bugzilla.redhat.com/show_bug.cgi?id=2052857 is fixed by a patch that is in stable/wallaby, but the import won't be done automagically on Red Hat's side without a tag | 15:13 |
dansmith | croelandt: I'm not on glance stable | 15:59 |
dansmith | but presumably if there's content pending, a tag makes sense | 15:59 |
croelandt | Is there a specific procedure? | 15:59 |
croelandt | Or can I just git tag && git push --tags? :) | 15:59 |
croelandt | Maybe I'll bring that up at the Thursday meeting, see if we have docs about that | 15:59 |
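(Editor's note: for context, stable releases in OpenStack are normally proposed through the openstack/releases repo rather than pushed as raw tags; a hypothetical deliverable entry might look like the following, with the version and hash as placeholders only.)

```yaml
# deliverables/wallaby/glance-store.yaml in openstack/releases (sketch)
releases:
  - version: 2.5.1                      # placeholder version
    projects:
      - repo: openstack/glance_store
        hash: <stable/wallaby commit sha to tag>   # placeholder
```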
abhishekk | dansmith, https://paste.opendev.org/show/barB6VbDh0NOQiaiWf6t/ | 16:00 |
abhishekk | this is my local.conf, I am still getting 503 for glance | 16:00 |
dansmith | abhishekk: is it registered in the catalog as host/image? | 16:01 |
abhishekk | in endpoint I can see it as /image | 16:01 |
abhishekk | add53df67a3042a9bafbf319efe99257 | RegionOne | glance | image | True | public | http://10.0.108.117/image | 16:02 |
dansmith | and it's 503ing because why? | 16:02 |
dansmith | like is that glance returning 503 or apache because it can't hit glance? | 16:02 |
abhishekk | I think apache, because it's not hitting glance | 16:06 |
abhishekk | <hr> | 16:06 |
abhishekk | <address>Apache/2.4.41 (Ubuntu) Server at 10.0.108.117 Port 80</address> | 16:06 |
abhishekk | </body></html> | 16:06 |
dansmith | so in /etc/apache2 somewhere is the config file that makes it proxy to glance, so see what that's configured for | 16:07 |
abhishekk | The server is temporarily unable to service your | 16:07 |
abhishekk | request due to maintenance downtime or capacity | 16:07 |
abhishekk | problems. Please try again later. | 16:07 |
abhishekk | ack | 16:07 |
abhishekk | ProxyPass "/image" "http://127.0.0.1:60999" retry=0 | 16:08 |
dansmith | is that right? | 16:09 |
dansmith | I thought disabling tls-proxy got rid of all of this, but maybe not | 16:10 |
abhishekk | I think this needs to be http://127.0.0.1:19292 | 16:10 |
dansmith | yeah.. I dunno why it's wrong on your system, but that looks like a "generate random port" thing and glance itself didn't get told | 16:11 |
dansmith | probably some complex interaction of options | 16:12 |
dansmith | maybe something set in tempest-integrated-storage-import | 16:12 |
abhishekk | there is a glance.conf under /etc/apache2 as well which has http://127.0.0.1:19292 | 16:13 |
dansmith | oh interesting | 16:13 |
abhishekk | ProxyPass "/image" "http://127.0.0.1:60999" retry=0 and this is in /etc/apache2/glance-wsgi-api.conf | 16:13 |
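(Editor's note: the fix abhishekk applies next, shown as a sketch; 19292 is the port glance actually answered on earlier in the log.)

```
# /etc/apache2/glance-wsgi-api.conf -- point the proxy at the real
# glance port instead of the stale randomly-generated one:
ProxyPass "/image" "http://127.0.0.1:19292" retry=0
```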
abhishekk | after changing it in the above file it is responding | 16:14 |
abhishekk | now I am hitting same error as simboja | 16:29 |
dansmith | ugh | 16:30 |
abhishekk | Can't load plugin: sqlalchemy.plugins:dbcounter | 16:30 |
dansmith | abhishekk: can you do the same thing, see if it's installed globally? | 16:30 |
dansmith | import dbcounter; print(dbcounter.__file__) | 16:31 |
abhishekk | yes | 16:31 |
abhishekk | /usr/local/lib/python3.8/dist-packages/dbcounter.py | 16:31 |
dansmith | okay just a sec | 16:31 |
dansmith | cat /usr/local/lib/python3.8/dist-packages/dbcounter-0.1.dist-info/entry_points.txt | 16:32 |
abhishekk | [sqlalchemy.plugins] | 16:33 |
abhishekk | dbcounter = dbcounter:LogCursorEventsPlugin | 16:33 |
dansmith | wtf | 16:33 |
dansmith | I don't know why it's not loading then | 16:33 |
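(Editor's note: a minimal way to reproduce the plugin lookup outside of glance, assuming SQLAlchemy 1.2+ where create_engine() accepts a plugins list; only the entry-point resolution is being exercised here, so any backing URL serves.)

```python
import sqlalchemy

# If the entry point is broken this raises NoSuchModuleError:
# "Can't load plugin: sqlalchemy.plugins:dbcounter"
engine = sqlalchemy.create_engine("sqlite://", plugins=["dbcounter"])
print("dbcounter plugin loaded OK")
```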
abhishekk | I guess, I will create new VM and try again on it | 16:34 |
dansmith | is there a stack trace in the logs when it fails to load that or just the error message? | 16:34 |
abhishekk | let me https://paste.opendev.org/show/bnQUTmmPAlIPoq1RmjQj/ | 16:36 |
dansmith | man | 16:37 |
dansmith | just re-stack with MYSQL_GATHER_PERFORMANCE=False | 16:38 |
dansmith | no need to recreate the vm | 16:38 |
dansmith | that will avoid trying to load that plugin | 16:38 |
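(Editor's note: i.e., a one-line local.conf addition; a sketch of the workaround as stated above.)

```
[[local|localrc]]
MYSQL_GATHER_PERFORMANCE=False
```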
dansmith | I dunno why it appears sometimes and not others, | 16:39 |
abhishekk | ack | 16:39 |
dansmith | but pip and setuptools have been really flaky lately :/ | 16:39 |
abhishekk | agree | 16:39 |
abhishekk | just curious why this problem doesn't show up in upstream jobs | 16:40 |
dansmith | well, right, and clearly it's not breaking everyone | 16:40 |
dansmith | I just dunno what's happening in some situations.. third time I've seen it in a couple weeks, but I dunno what it takes to repro it | 16:40 |
abhishekk | I will try to reproduce it on plain vm | 16:41 |
abhishekk | tomorrow | 16:41 |
abhishekk | finally standalone setup is ready | 16:59 |
abhishekk | looks like graceful exit is not working for standalone either | 17:11 |
dansmith | in what way? | 17:24 |
dansmith | when some opposition to wsgi was brought up, I was assured that the existing graceful shutdown stuff worked and was a requirement for wsgi mode | 17:25 |
dansmith | so maybe I broke it? | 17:25 |
abhishekk | no idea since when it is broken | 17:26 |
abhishekk | but I am sure it was working earlier (in the beginning) as I tested it | 17:27 |
dansmith | what's failing now specifically? | 17:31 |
abhishekk | it is exiting immediately when I send kill -9 or restart the g-api service | 17:31 |
dansmith | I was pretty sure I replicated the graceful shutdown when I did the threadpool stuff, as I had some thing that generated data slowly to keep things running while I tried to kill it | 17:31 |
dansmith | abhishekk: kill -9 is not graceful, it gives the app no option to handle things | 17:32 |
dansmith | so that's definitely going to not be graceful :D | 17:32 |
dansmith | a userland application can't catch SIGKILL | 17:32 |
abhishekk | kill -15? | 17:32 |
dansmith | SIGINT is what you want | 17:32 |
abhishekk | so service restart sends a kill signal as well? | 17:33 |
dansmith | systemd will send INT (or similar), wait some timeout and then KILL | 17:33 |
dansmith | and I think it's capped at something rather small, like 300s or something, which puts an upper bound on how long we can wait anyway | 17:34 |
abhishekk | it does not wait | 17:34 |
dansmith | I think it has to be configured to do so, hang on | 17:34 |
dansmith | https://stackoverflow.com/questions/42978358/how-systemd-stop-command-actually-works | 17:35 |
abhishekk | looking | 17:35 |
dansmith | https://www.freedesktop.org/software/systemd/man/systemd.kill.html# | 17:35 |
dansmith | "Processes will first be terminated via SIGTERM (unless the signal to send is changed via KillSignal= or RestartKillSignal=). Optionally, this is immediately followed by a SIGHUP (if enabled with SendSIGHUP=). If processes still remain after the main process of a unit has exited or the delay configured via the TimeoutStopSec= has passed, the termination request is repeated with the SIGKILL signal or the signal specified via | 17:36 |
dansmith | FinalKillSignal= (unless this is disabled via the SendSIGKILL= option). See kill(2) for more information." | 17:36 |
dansmith | https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStopSec= | 17:37 |
abhishekk | so I need to check glance systemd file | 17:37 |
dansmith | looks like you can set it very long now | 17:37 |
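(Editor's note: a hypothetical systemd drop-in for the settings being discussed, assuming devstack's devstack@g-api unit name; the values are illustrative only.)

```
# /etc/systemd/system/devstack@g-api.service.d/graceful.conf (hypothetical)
[Service]
KillSignal=SIGTERM
# how long systemd waits for a graceful exit before escalating to SIGKILL
TimeoutStopSec=300
```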
abhishekk | ack | 17:39 |
dansmith | abhishekk: but referring back to kill, look at signal(7) and you'll see: | 17:39 |
dansmith | "The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored." | 17:39 |
dansmith | kill nukes you from orbit immediately, no chance to be graceful | 17:39 |
abhishekk | yeah | 17:40 |
abhishekk | in short, when I run the systemctl restart command it should wait at least for some seconds, right? | 17:41 |
dansmith | restart and stop are different I think | 17:42 |
dansmith | stop should for sure | 17:42 |
abhishekk | ack | 17:43 |
abhishekk | will check separately | 17:43 |
dansmith | well, reload is for sure, maybe restart is just stop...start, not sure, looking | 17:45 |
abhishekk | no, -15 and stop also exit immediately | 17:46 |
dansmith | yeah, looks like restart should stop then start | 17:47 |
dansmith | abhishekk: so if we're exiting immediately on TERM then that means we're not catching that signal | 17:47 |
abhishekk | likely | 17:47 |
dansmith | so stop (and restart) are probably sending TERM and then waiting to send KILL, but we exit immediately, so it looks like KILL behavior | 17:48 |
jokke_ | yup ... iirc it's SIGHUP that gets glance doing a reload; I can't remember if it's TERM or INT that is supposed to be the graceful shutdown | 17:48 |
dansmith | jokke_: either should trigger graceful shutdown | 17:48 |
dansmith | INT is effectively Ctrl-C and TERM is the kinda service-y equivalent | 17:48 |
dansmith | I'm not sure why glance doesn't use oslo.service, but this is all done there and combined into "stop" and "restart" handlers | 17:49 |
jokke_ | yeah | 17:49 |
jokke_ | I think the glance service handlers were written way ahead of oslo.service and no-one ever refactored them | 17:50 |
dansmith | ack | 17:50 |
dansmith | nova was too, and still has a wrapper around oslo.service, but it uses them under the covers | 17:50 |
abhishekk | we use sighup for reload | 17:50 |
dansmith | so we get automatic config file reload and stuff | 17:50 |
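(Editor's note: a rough sketch of the oslo.service pattern being described, not glance code; launch() installs the handlers so that TERM/INT do a graceful stop and HUP a restart/reload.)

```python
from oslo_config import cfg
from oslo_service import service


class APIService(service.Service):
    def start(self):
        super().start()
        # start the wsgi server / background threads here


# launch() wires up the signal handling and worker management
launcher = service.launch(cfg.CONF, APIService(), workers=4)
launcher.wait()
```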
abhishekk | https://review.opendev.org/c/openstack/glance/+/122181 | 17:51 |
abhishekk | this was the original patch which added reload functionality | 17:51 |
dansmith | ack, so just need to catch INT and TERM as well to handle actual graceful shutdown | 17:51 |
abhishekk | in 2015 | 17:51 |
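(Editor's note: a bare-bones sketch of catching INT/TERM for a graceful exit; the drain step is a placeholder, and SIGKILL, per the signal(7) quote above, never reaches a handler.)

```python
import signal
import sys
import time


def _graceful_exit(signum, frame):
    # placeholder: stop accepting new requests, drain in-flight work, exit
    print("caught %s, draining before exit" % signal.Signals(signum).name)
    sys.exit(0)


signal.signal(signal.SIGTERM, _graceful_exit)  # what systemctl stop sends
signal.signal(signal.SIGINT, _graceful_exit)   # Ctrl-C
# kill -9 (SIGKILL) bypasses both handlers and ends the process immediately
while True:
    time.sleep(1)
```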
abhishekk | but at this moment HUP is also not working as expected | 17:51 |
dansmith | I'm not sure why HUP would need to drain anything... | 17:52 |
dansmith | so I'm not sure what glance's "expected" HUP behavior is really | 17:52 |
dansmith | usually it's "reload the config file" sort of things, which doesn't need to interrupt any in-progress cache or upload/download activities | 17:53 |
jokke_ | the reason it doesn't work is that if the graceful shutdown does not work, everything just dies. It's supposed to send a graceful shutdown to all workers and, as they go down, bring new ones up with the new confs. | 17:55 |
jokke_ | but if everything just gets "die" ... well | 17:55 |
dansmith | that's what HUP is supposed to do you mean? | 17:55 |
jokke_ | yes | 17:55 |
dansmith | ack, well, that's fine I guess as long as it continues to answer queries until that happens, | 17:56 |
dansmith | but for deployment stuff like updating containers, it should do similar things for TERM/INT | 17:56 |
abhishekk | actually the old worker, which was continuing the old task, does not accept any new requests and exits once its work is finished | 17:56 |
abhishekk | whereas after sighup it was immediately starting new workers with new configs, and the old workers used to exit once their task was done | 17:57 |
jokke_ | yeah, can't remember the exact code anymore. IIRC if it has, say, 4 workers in settings, it should bring one up to serve new connections (so temporarily 5 workers) and once the old ones finish and start going down it should bring the worker count back to 4 | 17:57 |
abhishekk | right | 17:57 |
abhishekk | _verify_and_respawn_children in glance/common/wsgi.py | 17:58 |
dansmith | ack, so for shutdown it should close its listen socket (if standalone) and then do that, but terminate when complete | 17:59 |
dansmith | uwsgi handles this for you for things that are in-process connections | 17:59 |
dansmith | the other threads you have spawned are not automatic in that case, but if you're processing a connection it handed you, then it knows how to do it right | 17:59 |
abhishekk | yeah | 18:00 |
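(Editor's note: a hypothetical uwsgi fragment for that behavior; die-on-term makes SIGTERM mean shutdown rather than uwsgi's legacy reload, and worker-reload-mercy bounds how long workers get to finish in-flight requests. Values are illustrative.)

```ini
[uwsgi]
master = true
processes = 4
# SIGTERM means graceful shutdown instead of uwsgi's legacy "brutal reload"
die-on-term = true
# seconds workers get to finish in-flight requests before being killed
worker-reload-mercy = 60
```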
abhishekk | reload does not read new config values either | 18:13 |
abhishekk | after reload the new configs are loaded (so the only remaining problem is that the existing workers need to wait) | 18:34 |
abhishekk | I think this is what is causing the worker to exit and not wait | 18:37 |
abhishekk | https://paste.opendev.org/show/bTjORjtLtbhUe93xQFDu/ | 18:37 |
abhishekk | looks like eventlet related issue | 19:05 |
* abhishekk signing out for the day | 19:05 |
abhishekk | reproducer (steps) https://paste.opendev.org/show/bFqCD4U4P0Q8MyIrsO4J/ | 19:06 |
dansmith | hey, at least the cache worker exited :) | 19:15 |
abhishekk | :D | 19:16 |