Friday, 2018-03-23

ara-slack <josh.donzello> Hello #ara. I'm curious if there is some way to limit database connections from the ara webserver. I currently see 82 connections from the webserver (mod_wsgi on centos7) to an aurora instance. They are all sleeping in the `delayed send ok done` state. I have already increased the instance size but the webserver seems to use as many connections as are available. 00:20
ara-slack <josh.donzello> I did see https://storyboard.openstack.org/#!/story/2000882 but I don't see anything in the documentation on how to handle the issue. 00:21
ara-slack <josh.donzello> an error from the client because the webserver is utilizing all of the available database connections: ```ERROR! Unexpected Exception: (pymysql.err.OperationalError) (1040, u'Too many connections') (Background on this error at: http://sqlalche.me/e/e3q8)``` 00:22
*** weshay is now known as weshay_PTO 00:25
ara-slack <dmsimard> Hi @josh.donzello! Unfortunately I have not gotten a chance to look into this yet. Do you *need* MySQL ? sqlite gets a bad reputation but it's really fast since there's no network overhead/latency or a concept of max connections. 00:43
ara-slack <dmsimard> Are you running your playbooks from a "central" location like a bastion host ? 00:44
ara-slack <josh.donzello> I don't need mysql, but RDS is easier than an ec2 instance with sqlite. I use a docker container, run via a zsh function, with ansible configs located in checked out git repositories. 00:47
ara-slack <josh.donzello> also, we get to keep the database easily with the managed backups that RDS provides. I'd have to put something together to back up the sqlite db regularly. I'm lazy :( 01:00
*** harlowja_ has quit IRC 01:13
ara-slack <dmsimard> How many hosts are you running Ansible against ? 01:47
*** dougbtv__ has joined #ara 02:34
*** dougbtv_ has quit IRC 02:34
*** bcoca has quit IRC 03:37
*** harlowja has joined #ara 04:19
*** weshay_PTO is now known as weshay 04:21
*** harlowja has quit IRC 05:12
*** mmercer has quit IRC 06:30
*** mmercer has joined #ara 06:30
*** jlozadad has quit IRC 07:53
*** resmo has joined #ara 09:54
*** paulfantom has joined #ara 11:38
*** jlozadad has joined #ara 12:39
*** tbielawa has joined #ara 13:03
*** bcoca has joined #ara 13:22
*** bcoca has joined #ara 13:22
*** tbielawa is now known as tbielawa|errant 14:15
*** tbielawa|errant has quit IRC 14:19
-openstackstatus- NOTICE: zuul.o.o has been restarted to pick up latest code base and clear memory usage. Both check / gate queues were saved, be sure to check your patches and recheck when needed. 14:50
ara-slack <josh.donzello> at the moment only a handful, but we are migrating to ansible to replace our current in-house config management. A rough estimate in six months would be upwards of 30k systems. 15:47
-openstackstatus- NOTICE: Gerrit will be temporarily unreachable as we restart it to complete the rename of some projects. 15:47
ara-slack <josh.donzello> we will likely not be using Tower or AWX for anything, but I'm not 100% on that. Ideally, anyone on my team would use the container with the baked in ARA config to run playbooks from their laptops. 15:48
*** harlowja has joined #ara 16:16
*** resmo has quit IRC 16:45
*** tbielawa has joined #ara 16:55
*** tbielawa is now known as tbielawa|relocat 16:59
*** tbielawa|relocat has quit IRC 17:03
ara-slack <dmsimard> That sounds cool. I'd love to help you iron out any kinks you might find in terms of scalability and performance. I know there are several users that are running ARA against a large number of hosts like @harlowja and @pilotmattk. 17:35
harlowja yupppers 17:35
ara-slack <dmsimard> My ability to reproduce issues like those is limited but I started prototyping a sort of framework for scalability/performance profiling that would be ideal to have in ARA's CI. You can see an example experiment here: https://asciinema.org/a/XUDG8ZK3wY6QpiHPD39D1H65y 17:36
ara-slack <josh.donzello> I'm happy to provide any info and detail that would be helpful. Is the conversation better outside of a thread for people in irc? 17:40
ara-slack <dmsimard> Right now I'm trying to focus the available time I have towards the next major release (1.0) which you can read about here: https://dmsimard.com/categories/ara/ .. I'll be posting an update probably next week or the week after to let everyone know how things are progressing 17:40
ara-slack <dmsimard> @josh.donzello the messages (even in a thread) are mirrored to IRC :slightly_smiling_face: 17:41
ara-slack <dmsimard> http://eavesdrop.openstack.org/irclogs/%23ara/%23ara.2018-03-23.log.html 17:41
ara-slack <dmsimard> If you need privacy, use private messages :) 17:42
ara-slack <josh.donzello> I'm not worried about privacy, just ease of use for everyone 17:42
ara-slack <josh.donzello> regarding the sql connections, I'm not familiar enough with sqlalchemy or python to really dig in. But, I didn't configure anything special, I just followed the examples in the documentation. I started with a t2.small aurora instance with a 45 connection cap, then upgraded to a t2.large with a 90 connection cap and in both cases the connections capped out fairly quickly. I can upgrade to a larger one but I'm assuming the same issue will happen. 17:46
*** tbielawa has joined #ara 17:47
*** harlowja has quit IRC 17:50
ara-slack <josh.donzello> File uploaded https://ara-community.slack.com/files/U9U3DGU8Z/F9V80F0FL/-.txt / https://slack-files.com/T6VAB05L7-F9V80F0FL-88436be1fb 17:54
ara-slack <josh.donzello> File uploaded https://ara-community.slack.com/files/U9U3DGU8Z/F9UM6NQ2V/-.txt / https://slack-files.com/T6VAB05L7-F9UM6NQ2V-d18f591db7 17:55
*** harlowja has joined #ara 18:33
*** harlowja has quit IRC 18:38
*** tbielawa is now known as tbielawa|brb 18:49
*** tbielawa|brb is now known as tbielawa 19:02
ara-slack <dmsimard> I know that @harlowja has been using https://review.openstack.org/#/c/524427/ (which is not merged or released yet) but I don't think this would help with connection usage/re-usage ? 19:09
*** harlowja has joined #ara 19:09
ara-slack <harlowja> ya, try that out 19:10
ara-slack <dmsimard> @harlowja would that help with connection usage ? 19:10
ara-slack <harlowja> i think so, yup 19:10
ara-slack <harlowja> though i can't quite say we are approaching 90 connections at once 19:10
ara-slack <harlowja> lol 19:10
ara-slack <harlowja> maybe more like 5->10->15 19:11
ara-slack <dmsimard> even with those 1000 hosts ? 19:11
*** openstackgerrit has joined #ara 19:11
openstackgerrit David Moreau Simard proposed openstack/ara master: Add support for configuring sqlalchemy pool size, timeout and recycle  https://review.openstack.org/524427 19:11
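For context on the settings named in that patch title: SQLAlchemy's create_engine() accepts pool_size, max_overflow, pool_timeout and pool_recycle. The sketch below is neither ARA's nor SQLAlchemy's code. It models only the size and timeout ideas with the standard library, using a hypothetical BoundedPool class and sqlite3 as a stand-in for MySQL, and it opens its connections eagerly where a real pool would typically open them on demand. Recycling (closing connections older than a given age) is not modeled.

```python
import queue
import sqlite3
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical pool: never more than `size` connections exist."""

    def __init__(self, size, timeout, connect):
        self._timeout = timeout
        self._idle = queue.Queue(maxsize=size)
        # Open every connection up front so the cap is fixed at `size`.
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        # Wait up to `timeout` seconds for a free connection;
        # queue.Empty is raised if none is returned in time.
        conn = self._idle.get(timeout=self._timeout)
        try:
            yield conn
        finally:
            # Hand the connection back instead of closing it.
            self._idle.put(conn)


# Assumed values for illustration only.
pool = BoundedPool(
    size=2,
    timeout=0.1,
    connect=lambda: sqlite3.connect(":memory:", check_same_thread=False),
)

with pool.connection() as conn:
    conn.execute("SELECT 1")
```

With a cap like this, a third concurrent caller waits and then fails locally rather than opening another connection against the database server's own limit.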
dmsimard harlowja: ^ I'll land it 19:15
dmsimard I've been meaning to but forgot/got sidetracked 19:15
ara-slack <harlowja> well 1000 hosts is still just 1 ansible run injecting things in 19:16
ara-slack <harlowja> not 1000 ansible runs 19:16
dmsimard yeah but the amount of connections is bound to spike during ansible-playbook runs though 19:16
ara-slack <harlowja> ya 19:18
ara-slack <harlowja> though i don't think we've pushed it super-high 19:18
dmsimard I mean.. I know AWS tends to nickel and dime folks but (at least as far as I'm concerned) capping to 90 connections is brutal 19:19
dmsimard Even 500 connections is not really high, at least for the stuff I've had to deal with 19:19
bcoca why we had to add throttling to cloud modules 19:19
dmsimard bcoca: throttling ? 19:20
bcoca to limit connections before hitting cap, also there is code that does 'backoff' once cap is hit 19:20
dmsimard bcoca: wait so there's API limits/quotas to AWS things you mean ? 19:20
bcoca yes, thinking you can look at same code to avoid issues on the callback side 19:21
*** tbielawa is now known as tbielawa|afk 19:21
bcoca we have 'generic' and 'aws api specific' versions 19:21
bcoca module_utils/ 19:21
bcoca i've been thinking of doing same for a few callbacks, since right now they 'expect' remote pushes to always succeed 19:22
dmsimard well, the "connections" we had been talking about was really mysql connections.. I'm not sure about implementing throttling at the callback level to mysql (where ara would save its data) because it's a blocking operation -- see https://github.com/ansible/ansible/issues/27705 :) 19:22
bcoca in the end its very similar code 19:22
bcoca a) pre throttle cause you know limits b) have backoff/limit error recognition and auto throttle 19:23
bcoca that the api is a db/cloud/random service, does not matter as much as the 'handling logic' 19:23
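The two-part handling logic described here (cap concurrency up front, then back off when the limit is hit anyway) might look roughly like the sketch below. It is not the code from Ansible's module_utils. Throttle, call_with_backoff and TooManyConnections are hypothetical names, and the delays are assumed values.

```python
import threading
import time


class TooManyConnections(Exception):
    """Hypothetical stand-in for a 'limit reached' error such as MySQL 1040."""


def call_with_backoff(func, retries=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """(b) Recognize the limit error and retry, doubling the wait each time."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return func()
        except TooManyConnections:
            if attempt == retries - 1:
                raise  # out of retries: let the caller see the error
            sleep(delay)
            delay = min(delay * 2, max_delay)


class Throttle:
    """(a) Pre-throttle: allow at most `max_in_flight` calls at once."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, func, **backoff_kwargs):
        # Hold a slot for the whole call, including any backoff waits.
        with self._slots:
            return call_with_backoff(func, **backoff_kwargs)
```

Usage would be along the lines of `Throttle(10).run(save_result)`, where `save_result` is whatever function pushes data to the remote service. As noted in the conversation, any waiting done this way still blocks the caller.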
dmsimard yeah 19:28
dmsimard One of the features that has been asked about and that 1.0 will (eventually) allow is to have a generic message bus driver/callback 19:30
dmsimard The problem with mysql/postgresql is that it's synchronous and adds latency to the playbook runs 19:30
dmsimard If your database server is down or there's a lot of latency you're going to have a bad time 19:30
dmsimard Dropping the Ansible playbook/play/task data as messages on a message bus and have another reliable/durable process take those and send them to a database would yield better latency and scalability 19:32
dmsimard We're not there yet and it's definitely not a short term item on the to-do list but the backend/structure will allow for things like that to happen if people really need it 19:32
dmsimard and when I say message bus, really, it's just a buffer.. it would be redis/memcached/whatever 19:33
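The buffer idea can be illustrated with a minimal, standard-library sketch. This is not ARA code and not a planned design. The in-process queue.Queue stands in for redis/memcached, sqlite3 stands in for the database, and writer, on_task_result and the results table are hypothetical names. Unlike a real message bus, an in-process queue is not durable.

```python
import queue
import sqlite3
import threading

events = queue.Queue()  # in-process stand-in for an external buffer
STOP = object()         # sentinel telling the writer to shut down


def writer(db_path):
    """Drain the buffer and persist each event, off the playbook's hot path."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (host TEXT, task TEXT, status TEXT)"
    )
    while True:
        item = events.get()
        if item is STOP:
            break
        conn.execute("INSERT INTO results VALUES (?, ?, ?)", item)
        conn.commit()
    conn.close()


def on_task_result(host, task, status):
    """What a callback would do: enqueue and return without touching the database."""
    events.put((host, task, status))
```

The writer would run in its own thread or process (for example `threading.Thread(target=writer, args=("results.sqlite",)).start()`), so database latency is absorbed by the buffer instead of by the playbook run.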
dmsimard I know that awx/tower uses memcached (and redis? I forget) but I don't remember what for 19:33
ara-slack <josh.donzello> I should also note that those connections were seen with only running ansible-playbook a couple times and only two to four records in the schema. 19:34
*** openstackgerrit has quit IRC 19:34
bcoca postgres has an embedded message bus, see the CHANNEL stuff 19:35
dmsimard bcoca: yeah but I'm not very much interested in tying ara to any single reldb engine 19:35
dmsimard need to pick up kids from school, brb 19:36
bcoca i normally just use syslog as THE message bus, has local storage and then can send to remote that can execute 'storage action' on read 19:36
bcoca if any errors happen, you can reprocess from the local storage 19:37
dmsimard oh, that's clever 19:37
dmsimard brb (really) 19:37
bcoca previous company, i built 2 'fake message busses' while waiting for dev team to setup real one, one was syslog based, the other was using email, sending to mta as the 'bus' and using procmail to process the consumers 19:38
bcoca reliable, always stores message on every machine until after consumption is ensured, retry, reroute, congestion handling ... all built in 19:39
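The store-locally-then-reprocess pattern can be sketched without syslog or an MTA. The example below is an assumption-laden illustration, not bcoca's implementation: a JSON-lines file plays the role of the local storage, and spool and replay are hypothetical helper names.

```python
import json


def spool(path, message):
    """Append one message to the local spool before any delivery is attempted."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(message) + "\n")


def replay(path, store, start=0):
    """Feed spooled messages to store() from line `start`.

    Returns the line number to resume from, so a failed delivery
    can be retried later without losing or repeating messages.
    """
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()
    position = start
    for line in lines[start:]:
        try:
            store(json.loads(line))
        except Exception:
            break  # leave the rest in the spool for a later retry
        position += 1
    return position
```

A consumer would keep the returned position somewhere persistent and call `replay(path, store, start=position)` again once the remote side is reachable.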
harlowja ohhhh, message bus driver niceeee 19:57
*** tbielawa|afk is now known as tbielawa 20:50
dmsimard bcoca: lol.. I've seen an implementation using twitter :) 20:51
*** tbielawa has quit IRC 21:00
