*** irclogbot_2 has quit IRC | 02:19 | |
*** apetrich has quit IRC | 02:58 | |
*** pgaxatte has joined #openstack-mistral | 06:28 | |
*** openstackgerrit has joined #openstack-mistral | 06:59 | |
openstackgerrit | Merged openstack/python-mistralclient stable/stein: Update .gitreview for stable/stein https://review.openstack.org/644169 | 06:59 |
*** apetrich has joined #openstack-mistral | 07:32 | |
rakhmerov | d0ugal, apetrich: hi, any idea why this is happening? http://logs.openstack.org/16/648316/5/check/openstack-tox-docs/e9b504c/job-output.txt.gz#_2019-03-29_21_46_34_025523 | 07:33 |
apetrich | rakhmerov, that looks like flask from pecan inside sphinx. that's something I didn't know happened. But that seems to be a sphinx error | 07:34 |
rakhmerov | yeah | 07:35 |
apetrich | that is very odd indeed | 07:35 |
*** akovi has joined #openstack-mistral | 07:51 | |
rakhmerov | akovi: hi ) | 07:58 |
akovi | rakhmerov: hi Renat! | 07:59 |
rakhmerov | akovi: just FYI: my last few patches seriously improve Mistral performance | 07:59 |
rakhmerov | in the case of big data contexts | 07:59 |
rakhmerov | one of them was indeed a regression fix | 07:59 |
akovi | good to know | 07:59 |
rakhmerov | yes | 07:59 |
rakhmerov | you may want to try it | 07:59 |
akovi | I'll try to get an update in the product | 07:59 |
rakhmerov | yep | 08:00 |
akovi | Currently I'm working on solving the expiring tokens issue | 08:00 |
rakhmerov | some of our NSs now get deployed 4-5x faster | 08:00 |
rakhmerov | ooh, cool | 08:00 |
akovi | Nasty hack, this cannot be upstreamed :( | 08:00 |
rakhmerov | aah ) | 08:00 |
*** gkadam has joined #openstack-mistral | 08:02 | |
*** vgvoleg has joined #openstack-mistral | 08:09 | |
vgvoleg | Hi everyone! Does Mistral have any recommendations on how to work with huge contexts? Engines lose their connections to rabbit during YAQL evaluation, and sometimes engines get killed by the OOM killer and we lose some delayed calls | 08:25 |
akovi | you've got to increase the RPC and DB heartbeat timeouts | 08:26 |
akovi | and increase the memory limits | 08:26 |
vgvoleg | We have a workflow with about 6000 tasks, each of which puts about 200 key:value objects into the context | 08:26 |
akovi | I had an instance lately where I had to move the limit above 8G | 08:27 |
akovi | Does the kill happen when the Execution completes? | 08:27 |
vgvoleg | It should be less than 1GB of mem | 08:28 |
vgvoleg | Yes | 08:28 |
akovi | you should decrease the batch size | 08:28 |
vgvoleg | We tried to give 10-15 GB to the engines, but they eat everything | 08:28 |
akovi | Renat introduced it last summer if I remember correctly | 08:28 |
vgvoleg | we tried setting the batch size to 5 :D | 08:29 |
vgvoleg | ok we'll try to increase RPC heartbeat timeout | 08:29 |
vgvoleg | ty | 08:29 |
akovi | drop the limits for now | 08:30 |
akovi | run only a single engine | 08:30 |
akovi | to see what you have to deal with | 08:30 |
akovi | looks like the JSON marshalling/unmarshalling is really memory intensive | 08:30 |
akovi | 200 tasks with 4 MB contexts add up to 800 MB of context data | 08:31 |
akovi | that required 8-9 GB of memory to get through | 08:31 |
vgvoleg | how did you calculate this? :D | 08:32 |
vgvoleg | 4MB context | 08:32 |
*** gkadam has quit IRC | 08:32 | |
vgvoleg | I've just woken up... | 08:32 |
akovi | this is an example of what I had to tackle lately | 08:33 |
akovi | it's easiest to check the context size from the DB | 08:33 |
akovi | select sum(length(in_context)) from action_executions_v2; | 08:34 |
akovi | or something similar | 08:34 |
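A hedged follow-up to the query above, assuming MySQL and Mistral's action_executions_v2 schema (the id column and exact table layout are assumptions; adjust for your release and database):

    -- the ten largest action contexts, to identify the offending tasks
    select id, length(in_context) as ctx_bytes
    from action_executions_v2
    order by ctx_bytes desc
    limit 10;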
vgvoleg | oh ty | 08:34 |
vgvoleg | `increase the RPC and DB heartbeat timeouts` == heartbeat_timeout_threshold ? | 08:36 |
akovi | yes, and the number of missed HBs | 08:36 |
akovi | during YAQL evaluation the thread does not yield, so the greenthread is stuck | 08:37 |
akovi | we tried to put it on a separate real thread but other issues arose immediately | 08:38 |
akovi | (or rather: consequently) | 08:38 |
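A minimal Python sketch of the behaviour akovi describes, assuming eventlet (which Mistral's services are built on): CPU-bound work such as a long YAQL evaluation never yields, so the green thread that would answer RabbitMQ heartbeats is starved for the whole duration:

    import eventlet

    eventlet.monkey_patch()

    def heartbeats():
        # stands in for oslo.messaging's AMQP heartbeat loop
        while True:
            print("heartbeat")
            eventlet.sleep(1)

    def long_evaluation():
        # stands in for a long YAQL evaluation: pure CPU work with no
        # I/O, so it never yields control to other green threads
        return sum(i * i for i in range(10 ** 8))

    eventlet.spawn(heartbeats)
    print(eventlet.spawn(long_evaluation).wait())
    # no "heartbeat" line is printed while the evaluation runs, which
    # is why the broker eventually declares the connection dead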
vgvoleg | yes, I thought about it | 08:38 |
vgvoleg | `the number of missed HBs` - can't find it | 08:38 |
akovi | #heartbeat_rate = 2 | 08:39 |
akovi | #heartbeat_timeout_threshold = 60 | 08:39 |
akovi | I think these are the two important ones | 08:39 |
akovi | #heartbeat_interval = 3 | 08:40 |
akovi | maybe this one too | 08:40 |
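A consolidated sketch of those settings in mistral.conf, assuming the oslo.messaging RabbitMQ driver; the values are illustrative and the exact option set varies between releases (heartbeat_interval may belong to a different component), so verify against your release's config sample:

    [oslo_messaging_rabbit]
    # seconds without a heartbeat before the connection is declared dead
    heartbeat_timeout_threshold = 600
    # how many times per timeout interval the heartbeat check runs
    heartbeat_rate = 2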
akovi | it's freakin' loaded with legacy :) | 08:40 |
vgvoleg | ty so much | 08:40 |
vgvoleg | Is there any mechanism to handle stuck delayed calls? | 08:42 |
vgvoleg | I've found `pickup_job_after` option | 08:42 |
akovi | yes | 08:42 |
vgvoleg | But I can't tell whether it's what I need | 08:43 |
akovi | ah, it's only in our version | 08:44 |
akovi | you can implement it with a cron job | 08:44 |
akovi | select the delayed calls that have had the processing=1 flag set for too long | 08:44 |
akovi | simply update those rows back to 0 | 08:44 |
akovi | the engine will start processing them | 08:45 |
akovi | this is the simplest way we could tackle OOM kills too | 08:45 |
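A hedged sketch of that cron job's query, assuming MySQL and Mistral's delayed_calls_v2 table (the table and column names and the 10-minute threshold are assumptions to adapt to your schema and workload):

    -- release delayed calls that have been 'processing' for too long,
    -- e.g. because the engine that picked them up was OOM-killed
    update delayed_calls_v2
    set processing = 0
    where processing = 1
      and updated_at < now() - interval 10 minute;

As the next few lines note, the threshold has to comfortably exceed the longest legitimate processing time, or a call can be picked up twice.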
vgvoleg | Yes, I thought about implementing it locally. How do you handle the case when this timeout is less than the execution time? | 08:46 |
akovi | it should not be :) | 08:47 |
akovi | delayed calls are usually short | 08:47 |
vgvoleg | Not all functions that can be delayed are OK with being executed twice simultaneously | 08:47 |
vgvoleg | oh OK | 08:47 |
akovi | in practice this fixes the discrepancy that in-flight calls may not have been recorded consistently at the time of the OOM kill | 08:48 |
akovi | so yes, this is far slower than having an optimal solution but at least it keeps your service running | 08:49 |
vgvoleg | I think by the time we stop crashing because of memory, it will be possible to set a reasonable timeout | 08:49 |
vgvoleg | thank you | 08:50 |
akovi | you're welcome :) | 08:50 |
vgvoleg | This mechanism could be implemented right in the scheduler; was there any problem with that? Or why do you use an external cron job for it? | 08:52 |
*** bobh has joined #openstack-mistral | 09:06 | |
*** bobh has quit IRC | 09:11 | |
*** jrist has quit IRC | 09:15 | |
*** jrist has joined #openstack-mistral | 09:16 | |
*** gkadam has joined #openstack-mistral | 09:28 | |
vgvoleg | btw parallel execution takes more time than sequential | 09:35 |
vgvoleg | lol | 09:35 |
*** akovi has quit IRC | 09:44 | |
*** akovi has joined #openstack-mistral | 09:45 | |
*** d0ugal has quit IRC | 09:55 | |
*** d0ugal has joined #openstack-mistral | 10:05 | |
rakhmerov | vgvoleg: can you remind what version of Mistral you're using? | 10:36 |
rakhmerov | if you have a version from last summer (I remember something like that, Mistral Queens) then recommendation #1 from me is to switch to the latest available version from master | 10:37 |
rakhmerov | + my latest patch https://review.openstack.org/#/c/648316/ | 10:38 |
rakhmerov | this patch removes a HUGE performance regression related to YAQL evaluation | 10:38 |
rakhmerov | also lots of performance improvements were made in Oct-Nov | 10:39 |
rakhmerov | apetrich, d0ugal: sphinx was updated to 2.0.0 on Mar 28 | 10:51 |
rakhmerov | I guess that's the cause | 10:51 |
d0ugal | Sounds likely | 10:52 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral master: WIP: try to pin sphinx version https://review.openstack.org/648944 | 10:58 |
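A hedged sketch of what such a pin might look like in doc/requirements.txt (the bounds are illustrative, not the contents of the WIP patch above):

    sphinx>=1.8.0,<2.0.0  # 2.0.0 breaks the docs build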
vgvoleg | rakhmerov: I'm using the latest with your patch | 11:01 |
rakhmerov | ok | 11:01 |
rakhmerov | 6000 tasks is a lot :) | 11:01 |
rakhmerov | I guess parsing the YAML alone takes a lot of time | 11:02 |
vgvoleg | we are trying to move all the cycles in the publish section into one YAQL expression; I think we could win some time with it | 11:08 |
rakhmerov | vgvoleg: cycles? | 11:22 |
rakhmerov | what do you mean by that? | 11:22 |
rakhmerov | d0ugal, apetrich: yes, it's the sphinx version. https://review.openstack.org/#/c/648944/ passes the docs job but doesn't pass requirements-check | 11:23 |
apetrich | rakhmerov, great | 11:23 |
rakhmerov | I probably need your advice here. Do you think we need to send a patch to the global requirements to pin sphinx version? | 11:24 |
rakhmerov | vgvoleg: and make sure to set the config property "convert_input_data" to false | 11:25 |
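A hedged sketch of where that property lives in mistral.conf; in recent releases Mistral's YAQL conversion options sit in a [yaql] group, but the group name is an assumption to check against your release's config sample:

    [yaql]
    # skip converting/deep-copying input data before each YAQL
    # evaluation; a significant saving on large contexts
    convert_input_data = False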
apetrich | rakhmerov, I'm not sure that it isn't an interaction with pecan | 11:25 |
apetrich | I'd keep it like that for now. It's going to hit other projects if it's a sphinx bug. If it's not and it's just an interaction with pecan, we might need to leave it like this anyway | 11:26 |
rakhmerov | it seems like the new sphinx 2.0.0 conflicts with sphinxcontrib-pecanwsme | 11:26 |
rakhmerov | I guess that's what it is | 11:27 |
rakhmerov | there will probably have to be a new version of sphinxcontrib-pecanwsme to fix that, but it doesn't exist yet | 11:27 |
rakhmerov | because yes, the problem comes from the interaction of sphinx and pecanwsme | 11:28 |
apetrich | yeah | 11:28 |
apetrich | makes sense | 11:28 |
apetrich | "sense" | 11:28 |
*** akovi has quit IRC | 11:30 | |
rakhmerov | ok, I'll leave it as is for now | 11:30 |
rakhmerov | let's see if they fix it | 11:30 |
*** akovi has joined #openstack-mistral | 11:30 | |
openstackgerrit | Vlad Gusev proposed openstack/mistral master: Add release note for I04ba85488b27cb05c3b81ad8c973c3cc3fe56d36 https://review.openstack.org/648956 | 12:11 |
*** apetrich has quit IRC | 12:16 | |
*** apetrich has joined #openstack-mistral | 12:17 | |
*** apetrich has quit IRC | 12:36 | |
openstackgerrit | Vlad Gusev proposed openstack/mistral stable/stein: Add http_proxy_to_wsgi middleware https://review.openstack.org/647694 | 12:36 |
*** apetrich has joined #openstack-mistral | 12:53 | |
*** irclogbot_2 has joined #openstack-mistral | 13:26 | |
*** bobh has joined #openstack-mistral | 15:15 | |
*** bobh has quit IRC | 15:19 | |
*** pgaxatte has quit IRC | 15:49 | |
*** akovi has quit IRC | 16:24 | |
*** gkadam has quit IRC | 17:05 | |
*** bobh has joined #openstack-mistral | 17:15 | |
*** zigo has quit IRC | 17:37 | |
*** bobh has quit IRC | 17:54 | |
*** openstackgerrit has quit IRC | 23:56 |