Friday, 2018-03-23

ara-slack	<josh.donzello> Hello #ara. I'm curious if there is some way to limit database connections from the ara webserver. I currently see 82 connections from the webserver (mod_wsgi on centos7) to an aurora instance. They are all sleeping in the `delayed send ok done` state. I have already increased the instance size but the webserver seems to use as many connections as are available.	00:20
ara-slack	<josh.donzello> I did see https://storyboard.openstack.org/#!/story/2000882 but I don't see anything in the documentation on how to handle the issue.	00:21
ara-slack	<josh.donzello> an error from the client because the webserver is utilizing all of the available database connections: ```ERROR! Unexpected Exception: (pymysql.err.OperationalError) (1040, u'Too many connections') (Background on this error at: http://sqlalche.me/e/e3q8)```	00:22
*** weshay is now known as weshay_PTO		00:25
ara-slack	<dmsimard> Hi @josh.donzello! Unfortunately I have not gotten a chance to look into this yet. Do you need MySQL ? sqlite gets a bad reputation but it's really fast since there's no network overhead/latency or a concept of max connections.	00:43
ara-slack	<dmsimard> Are you running your playbooks from a "central" location like a bastion host ?	00:44
ara-slack	<josh.donzello> I don't need mysql, but RDS is easier than an ec2 instance with sqlite. I use a docker container, run via a zsh function, with ansible configs located in checked out git repositories.	00:47
ara-slack	<josh.donzello> also, we get to keep the database easily with the managed backups that RDS provides. I'd have to put something together to back up the sqlite db regularly. I'm lazy :(	01:00
*** harlowja_ has quit IRC		01:13
ara-slack	<dmsimard> How many hosts are you running Ansible against ?	01:47
*** dougbtv__ has joined #ara		02:34
*** dougbtv_ has quit IRC		02:34
*** bcoca has quit IRC		03:37
*** harlowja has joined #ara		04:19
*** weshay_PTO is now known as weshay		04:21
*** harlowja has quit IRC		05:12
*** mmercer has quit IRC		06:30
*** mmercer has joined #ara		06:30
*** jlozadad has quit IRC		07:53
*** resmo has joined #ara		09:54
*** paulfantom has joined #ara		11:38
*** jlozadad has joined #ara		12:39
*** tbielawa has joined #ara		13:03
*** bcoca has joined #ara		13:22
*** tbielawa is now known as tbielawa|errant		14:15
*** tbielawa|errant has quit IRC		14:19
-openstackstatus- NOTICE: zuul.o.o has been restarted to pick up latest code base and clear memory usage. Both check / gate queues were saved, be sure to check your patches and recheck when needed.		14:50
ara-slack	<josh.donzello> at the moment only a handful, but we are migrating to ansible to replace our current inhouse config management. A rough estimate in six months would be upwards of 30k systems.	15:47
-openstackstatus- NOTICE: Gerrit will be temporarily unreachable as we restart it to complete the rename of some projects.		15:47
ara-slack	<josh.donzello> we will likely not be using Tower or AWX for anything, but I'm not 100% on that. Ideally, anyone on my team would use the container with the baked in ARA config to run playbooks from their laptops.	15:48
*** harlowja has joined #ara		16:16
*** resmo has quit IRC		16:45
*** tbielawa has joined #ara		16:55
*** tbielawa is now known as tbielawa|relocat		16:59
*** tbielawa|relocat has quit IRC		17:03
ara-slack	<dmsimard> That sounds cool. I'd love to help you iron out any kinks you might find in terms of scalability and performance. I know there are several users that are running ARA against a large number of hosts like @harlowja and @pilotmattk.	17:35
harlowja	yupppers	17:35
ara-slack	<dmsimard> My ability to reproduce issues like those is limited but I started prototyping a sort of framework for scalability/performance profiling that would be ideal to have in ARA's CI. You can see an example experiment here: https://asciinema.org/a/XUDG8ZK3wY6QpiHPD39D1H65y	17:36
ara-slack	<josh.donzello> I'm happy to provide any info and detail that would be helpful. Is the conversation better outside of a thread for people in irc?	17:40
ara-slack	<dmsimard> Right now I'm trying to focus the available time I have towards the next major release (1.0) which you can read about here: https://dmsimard.com/categories/ara/ .. I'll be posting an update probably next week or the week after to let everyone know how things are progressing	17:40
ara-slack	<dmsimard> @josh.donzello the messages (even in a thread) are mirrored to IRC :slightly_smiling_face:	17:41
ara-slack	<dmsimard> http://eavesdrop.openstack.org/irclogs/%23ara/%23ara.2018-03-23.log.html	17:41
ara-slack	<dmsimard> If you need privacy, use private messages :)	17:42
ara-slack	<josh.donzello> I'm not worried about privacy, just ease of use for everyone	17:42
ara-slack	<josh.donzello> regarding the sql connections, I'm not familiar enough with sqlalchemy or python to really dig in. But, I didn't configure anything special, i just followed the examples in the documentation. I started with a t2.small aurora instance with a 45 connection cap, then upgraded to a t2.large with a 90 connection cap and in both cases the connections capped out fairly quickly. I can upgrade to a larger one but I'm assuming the same issue will happen.	17:46
*** tbielawa has joined #ara		17:47
*** harlowja has quit IRC		17:50
ara-slack	<josh.donzello> File uploaded https://ara-community.slack.com/files/U9U3DGU8Z/F9V80F0FL/-.txt / https://slack-files.com/T6VAB05L7-F9V80F0FL-88436be1fb	17:54
ara-slack	<josh.donzello> File uploaded https://ara-community.slack.com/files/U9U3DGU8Z/F9UM6NQ2V/-.txt / https://slack-files.com/T6VAB05L7-F9UM6NQ2V-d18f591db7	17:55
*** harlowja has joined #ara		18:33
*** harlowja has quit IRC		18:38
*** tbielawa is now known as tbielawa|brb		18:49
*** tbielawa|brb is now known as tbielawa		19:02
ara-slack	<dmsimard> I know that @harlowja has been using https://review.openstack.org/#/c/524427/ (which is not merged or released yet) but I don't think this would help about connection usage/re-usage ?	19:09
*** harlowja has joined #ara		19:09
ara-slack	<harlowja> ya, try that out	19:10
ara-slack	<dmsimard> @harlowja would that help with connection usage ?	19:10
ara-slack	<harlowja> i think so, yup	19:10
ara-slack	<harlowja> though i can't quite say we are approaching 90 connections at once	19:10
ara-slack	<harlowja> lol	19:10
ara-slack	<harlowja> maybe more like 5->10->15	19:11
ara-slack	<dmsimard> even with those 1000 hosts ?	19:11
*** openstackgerrit has joined #ara		19:11
openstackgerrit	David Moreau Simard proposed openstack/ara master: Add support for configuring sqlalchemy pool size, timeout and recycle https://review.openstack.org/524427	19:11
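For context on the numbers in this thread: SQLAlchemy's default QueuePool keeps up to `pool_size` connections open and allows `max_overflow` more on top (library defaults 5 and 10), and every server process holds its own pool. The following is a rough back-of-the-envelope sketch, assuming one engine per webserver process. The function names are hypothetical and the process counts are illustrative; they are not taken from josh.donzello's setup.

```python
def worst_case_connections(processes, pool_size=5, max_overflow=10):
    """Upper bound on open DB connections if every process fills its own pool."""
    return processes * (pool_size + max_overflow)


def max_processes_under_cap(cap, pool_size=5, max_overflow=10):
    """How many processes fit under a server-side connection cap."""
    return cap // (pool_size + max_overflow)
```

Under these assumptions, six processes with default pool settings are enough to reach a 90 connection cap, which is why a configurable pool size matters more than a larger instance.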
dmsimard	harlowja: ^ I'll land it	19:15
dmsimard	I've been meaning to but forgot/got sidetracked	19:15
ara-slack	<harlowja> well 1000 hosts is still just 1 ansible run injecting things in	19:16
ara-slack	<harlowja> not 1000 ansible runs	19:16
dmsimard	yeah but the amount of connections is bound to spike during ansible-playbook runs though	19:16
ara-slack	<harlowja> ya	19:18
ara-slack	<harlowja> though i don't think we've pushed it super-high	19:18
dmsimard	I mean.. I know AWS tends to nickel and dime folks but (at least as far as I'm concerned) capping to 90 connections is brutal	19:19
dmsimard	Even 500 connections is not really high, at least for the stuff I've had to deal with	19:19
bcoca	why we had to add throttling to cloud modules	19:19
dmsimard	bcoca: throttling ?	19:20
bcoca	to limit connections before hitting cap, also there is code that does 'backoff' once cap is hit	19:20
dmsimard	bcoca: wait so there's API limits/quotas to AWS things you mean ?	19:20
bcoca	yes, thinking you can look at same code to avoid issues on the callback side	19:21
*** tbielawa is now known as tbielawa|afk		19:21
bcoca	we have 'generic' and 'aws api specific' versions	19:21
bcoca	module_utils/	19:21
bcoca	i've been thinking of doing same for a few callbacks, since right now they 'expect' remote pushes to always succeed	19:22
dmsimard	well, the "connections" we had been talking about was really mysql connections.. I'm not sure about implementing throttling at the callback level to mysql (where ara would save its data) because it's a blocking operation -- see https://github.com/ansible/ansible/issues/27705 :)	19:22
bcoca	in the end its very similar code	19:22
bcoca	a) pre throttle cause you know limits b) have backoff/limit error recognition and auto throttle	19:23
bcoca	that the api is a db/cloud/random service, does not matter as much as the 'handling logic'	19:23
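The two-part handling logic bcoca describes (throttle up front because the limit is known, then back off when the cap is hit anyway) could look roughly like this. It is a minimal stdlib-only sketch, not the code in Ansible's module_utils: `TooManyConnections`, `call_with_backoff` and `Throttle` are hypothetical names, and the exception stands in for whatever "cap reached" error the backend raises.

```python
import random
import threading
import time


class TooManyConnections(Exception):
    """Stand-in for a backend 'cap reached' error (e.g. MySQL error 1040)."""


def call_with_backoff(operation, retries=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Run operation(); on a cap error, wait exponentially longer and retry."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except TooManyConnections:
            if attempt == retries:
                raise  # out of retries, surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter so that many clients do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


class Throttle:
    """Pre-throttle: never run more than `limit` operations at once."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, operation, **backoff_options):
        with self._slots:
            return call_with_backoff(operation, **backoff_options)
```

As dmsimard points out in the following messages, any waiting done inside a callback blocks the playbook run, so the retry count and delays would need to stay small there.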
dmsimard	yeah	19:28
dmsimard	One of the features that has been asked about and that 1.0 will (eventually) allow is to have a generic message bus driver/callback	19:30
dmsimard	The problem with mysql/postgresql is that it's synchronous and adds latency to the playbook runs	19:30
dmsimard	If your database server is down or there's a lot of latency you're going to have a bad time	19:30
dmsimard	Dropping the Ansible playbook/play/task data as messages on a message bus and have another reliable/durable process take those and send them to a database would yield better latency and scalability	19:32
dmsimard	We're not there yet and it's definitely not a short term item on the to-do list but the backend/structure will allow for things like that to happen if people really need it	19:32
dmsimard	and when I say message bus, really, it's just a buffer.. it would be redis/memcached/whatever	19:33
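The buffering idea dmsimard outlines can be illustrated with a small sketch: the producer drops events on a buffer and returns immediately, while a separate consumer writes them to the database. This is only an in-process stand-in, assuming `queue.Queue` in place of redis/memcached and sqlite in place of the real database, so it is not durable the way an external buffer would be. The `events` table and all function names are hypothetical and not part of ARA.

```python
import json
import queue
import sqlite3
import threading

STOP = object()  # sentinel telling the consumer to exit


def consumer(buffer, conn):
    """Drain events from the buffer and persist them one by one."""
    while True:
        event = buffer.get()
        if event is STOP:
            break
        conn.execute("INSERT INTO events (payload) VALUES (?)",
                     (json.dumps(event),))
        conn.commit()


def start_consumer(buffer, conn):
    """Create the table if needed and start the background writer thread."""
    conn.execute("CREATE TABLE IF NOT EXISTS events "
                 "(id INTEGER PRIMARY KEY, payload TEXT)")
    worker = threading.Thread(target=consumer, args=(buffer, conn), daemon=True)
    worker.start()
    return worker
```

A producer would open the connection with `sqlite3.connect(path, check_same_thread=False)`, call `buffer.put(event)` for each task result, and finally `buffer.put(STOP)` followed by `worker.join()`. The `put` call does not wait for the database, which is where the latency benefit would come from.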
dmsimard	I know that awx/tower uses memcached (and redis? I forget) but I don't remember what for	19:33
ara-slack	<josh.donzello> I should also note that those connections were seen with only running ansible-playbook a couple times and only two to four records in the schema.	19:34
*** openstackgerrit has quit IRC		19:34
bcoca	postgres has an embedded message bus, see the CHANNEL stuff	19:35
dmsimard	bcoca: yeah but I'm not very much interested in tying ara to any single reldb engine	19:35
dmsimard	need to pick up kids from school, brb	19:36
bcoca	i normally just use syslog as THE message bus, has local storage and then can send to remote that can execute 'storage action' on read	19:36
bcoca	if any errors happen, you can reprocess from the local storage	19:37
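The property bcoca relies on here (events are stored locally first, and can be reprocessed from that local copy if the remote storage step fails) can be sketched without syslog at all. The example below assumes a plain JSON-lines file as the local store; it is not bcoca's actual setup, and `journal`, `replay` and the `store` callback are hypothetical names.

```python
import json


def journal(path, event):
    """Append one event to the local journal (the 'local storage' step)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def replay(path, store, start=0):
    """Feed journaled events to store() beginning at line `start`.

    Returns the index of the next unprocessed line. If store() raises,
    processing stops there so a later call can retry from the same line.
    """
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()
    position = start
    for line in lines[start:]:
        try:
            store(json.loads(line))
        except Exception:
            break  # keep the position; the event is still in the journal
        position += 1
    return position
```

The caller keeps the returned position and passes it back as `start` on the next attempt, so a database outage delays storage but does not lose events.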
dmsimard	oh, that's clever	19:37
dmsimard	brb (really)	19:37
bcoca	previous company, i built 2 'fake message buses' while waiting for dev team to setup real one, one was syslog based, the other was using email, sending to mta as the 'bus' and using procmail to process the consumers	19:38
bcoca	reliable, always stores message on every machine until after consumption is ensured, retry, reroute, congestion handling ... all built in	19:39
harlowja	ohhhh, message bus driver niceeee	19:57
*** tbielawa|afk is now known as tbielawa		20:50
dmsimard	bcoca: lol.. I've seen an implementation using twitter :)	20:51
*** tbielawa has quit IRC		21:00
