ara-slack | <josh.donzello> Hello #ara. I'm curious if there is some way to limit database connections from the ara webserver. I currently see 82 connections from the webserver (mod_wsgi on centos7) to an aurora instance. They are all sleeping in the `delayed send ok done` state. I have already increased the instance size but the webserver seems to use as many connections as are available. | 00:20 |
---|---|---|
ara-slack | <josh.donzello> I did see https://storyboard.openstack.org/#!/story/2000882 but I don't see anything in the documentation on how to handle the issue. | 00:21 |
ara-slack | <josh.donzello> an error from the client because the webserver is utilizing all of the available database connections: ```ERROR! Unexpected Exception: (pymysql.err.OperationalError) (1040, u'Too many connections') (Background on this error at: http://sqlalche.me/e/e3q8)``` | 00:22 |
*** weshay is now known as weshay_PTO | 00:25 | |
ara-slack | <dmsimard> Hi @josh.donzello! Unfortunately I have not gotten a chance to look into this yet. Do you *need* MySQL ? sqlite gets a bad reputation but it's really fast since there's no network overhead/latency or a concept of max connections. | 00:43 |
ara-slack | <dmsimard> Are you running your playbooks from a "central" location like a bastion host ? | 00:44 |
ara-slack | <josh.donzello> I don't need mysql, but RDS is easier than an ec2 instance with sqlite. I use a docker container, run via a zsh function, with ansible configs located in checked out git repositories. | 00:47 |
ara-slack | <josh.donzello> also, we get to keep the database easily with the managed backups that RDS provides. I'd have to put something together to back up the sqlite db regularly. I'm lazy :( | 01:00 |
*** harlowja_ has quit IRC | 01:13 | |
ara-slack | <dmsimard> How many hosts are you running Ansible against ? | 01:47 |
*** dougbtv__ has joined #ara | 02:34 | |
*** dougbtv_ has quit IRC | 02:34 | |
*** bcoca has quit IRC | 03:37 | |
*** harlowja has joined #ara | 04:19 | |
*** weshay_PTO is now known as weshay | 04:21 | |
*** harlowja has quit IRC | 05:12 | |
*** mmercer has quit IRC | 06:30 | |
*** mmercer has joined #ara | 06:30 | |
*** jlozadad has quit IRC | 07:53 | |
*** resmo has joined #ara | 09:54 | |
*** paulfantom has joined #ara | 11:38 | |
*** jlozadad has joined #ara | 12:39 | |
*** tbielawa has joined #ara | 13:03 | |
*** bcoca has joined #ara | 13:22 | |
*** tbielawa is now known as tbielawa|errant | 14:15 | |
*** tbielawa|errant has quit IRC | 14:19 | |
-openstackstatus- NOTICE: zuul.o.o has been restarted to pick up latest code base and clear memory usage. Both check / gate queues were saved, be sure to check your patches and recheck when needed. | 14:50 | |
ara-slack | <josh.donzello> at the moment only a handful, but we are migrating to ansible to replace our current inhouse config management. A rough estimate in six months would be upwards of 30k systems. | 15:47 |
-openstackstatus- NOTICE: Gerrit will be temporarily unreachable as we restart it to complete the rename of some projects. | 15:47 | |
ara-slack | <josh.donzello> we will likely not be using Tower or AWX for anything, but I'm not 100% on that. Ideally, anyone on my team would use the container with the baked in ARA config to run playbooks from their laptops. | 15:48 |
*** harlowja has joined #ara | 16:16 | |
*** resmo has quit IRC | 16:45 | |
*** tbielawa has joined #ara | 16:55 | |
*** tbielawa is now known as tbielawa|relocat | 16:59 | |
*** tbielawa|relocat has quit IRC | 17:03 | |
ara-slack | <dmsimard> That sounds cool. I'd love to help you iron out any kinks you might find in terms of scalability and performance. I know there are several users that are running ARA against a large number of hosts like @harlowja and @pilotmattk. | 17:35 |
harlowja | yupppers | 17:35 |
ara-slack | <dmsimard> My ability to reproduce issues like those is limited but I started prototyping a sort of framework for scalability/performance profiling that would be ideal to have in ARA's CI. You can see an example experiment here: https://asciinema.org/a/XUDG8ZK3wY6QpiHPD39D1H65y | 17:36 |
ara-slack | <josh.donzello> I'm happy to provide any info and detail that would be helpful. Is the conversation better outside of a thread for people in irc? | 17:40 |
ara-slack | <dmsimard> Right now I'm trying to focus the available time I have towards the next major release (1.0) which you can read about here: https://dmsimard.com/categories/ara/ .. I'll be posting an update probably next week or the week after to let everyone know how things are progressing | 17:40 |
ara-slack | <dmsimard> @josh.donzello the messages (even in a thread) are mirrored to IRC :slightly_smiling_face: | 17:41 |
ara-slack | <dmsimard> http://eavesdrop.openstack.org/irclogs/%23ara/%23ara.2018-03-23.log.html | 17:41 |
ara-slack | <dmsimard> If you need privacy, use private messages :) | 17:42 |
ara-slack | <josh.donzello> I'm not worried about privacy, just ease of use for everyone | 17:42 |
ara-slack | <josh.donzello> regarding the sql connections, I'm not familiar enough with sqlalchemy or python to really dig in. But, I didn't configure anything special, I just followed the examples in the documentation. I started with a t2.small aurora instance with a 45 connection cap, then upgraded to a t2.large with a 90 connection cap and in both cases the connections capped out fairly quickly. I can upgrade to a larger one but I'm assuming the same issue will happen. | 17:46 |
*** tbielawa has joined #ara | 17:47 | |
*** harlowja has quit IRC | 17:50 | |
ara-slack | <josh.donzello> File uploaded https://ara-community.slack.com/files/U9U3DGU8Z/F9V80F0FL/-.txt / https://slack-files.com/T6VAB05L7-F9V80F0FL-88436be1fb | 17:54 |
ara-slack | <josh.donzello> File uploaded https://ara-community.slack.com/files/U9U3DGU8Z/F9UM6NQ2V/-.txt / https://slack-files.com/T6VAB05L7-F9UM6NQ2V-d18f591db7 | 17:55 |
*** harlowja has joined #ara | 18:33 | |
*** harlowja has quit IRC | 18:38 | |
*** tbielawa is now known as tbielawa|brb | 18:49 | |
*** tbielawa|brb is now known as tbielawa | 19:02 | |
ara-slack | <dmsimard> I know that @harlowja has been using https://review.openstack.org/#/c/524427/ (which is not merged or released yet) but I don't think this would help with connection usage/re-usage ? | 19:09 |
*** harlowja has joined #ara | 19:09 | |
ara-slack | <harlowja> ya, try that out | 19:10 |
ara-slack | <dmsimard> @harlowja would that help with connection usage ? | 19:10 |
ara-slack | <harlowja> i think so, yup | 19:10 |
ara-slack | <harlowja> though i can't quite say we are approaching 90 connections at once | 19:10 |
ara-slack | <harlowja> lol | 19:10 |
ara-slack | <harlowja> maybe more like 5->10->15 | 19:11 |
ara-slack | <dmsimard> even with those 1000 hosts ? | 19:11 |
*** openstackgerrit has joined #ara | 19:11 | |
openstackgerrit | David Moreau Simard proposed openstack/ara master: Add support for configuring sqlalchemy pool size, timeout and recycle https://review.openstack.org/524427 | 19:11 |
dmsimard | harlowja: ^ I'll land it | 19:15 |
dmsimard | I've been meaning to but forgot/got sidetracked | 19:15 |
ara-slack | <harlowja> well 1000 hosts is still just 1 ansible run injecting things in | 19:16 |
ara-slack | <harlowja> not 1000 ansible runs | 19:16 |
dmsimard | yeah but the amount of connections is bound to spike during ansible-playbook runs though | 19:16 |
ara-slack | <harlowja> ya | 19:18 |
ara-slack | <harlowja> though i don't think we've pushed it super-high | 19:18 |
dmsimard | I mean.. I know AWS tends to nickel and dime folks but (at least as far as I'm concerned) capping to 90 connections is brutal | 19:19 |
dmsimard | Even 500 connections is not really high, at least for the stuff I've had to deal with | 19:19 |
bcoca | why we had to add throttling to cloud modules | 19:19 |
dmsimard | bcoca: throttling ? | 19:20 |
bcoca | to limit connections before hitting cap, also there is code that does 'backoff' once cap is hit | 19:20 |
dmsimard | bcoca: wait so there's API limits/quotas to AWS things you mean ? | 19:20 |
bcoca | yes, thinking you can look at same code to avoid issues on the callback side | 19:21 |
*** tbielawa is now known as tbielawa|afk | 19:21 | |
bcoca | we have 'generic' and 'aws api specific' versions | 19:21 |
bcoca | module_utils/ | 19:21 |
bcoca | i've been thinking of doing same for a few callbacks, since right now they 'expect' remote pushes to always succeed | 19:22 |
dmsimard | well, the "connections" we had been talking about was really mysql connections.. I'm not sure about implementing throttling at the callback level to mysql (where ara would save its data) because it's a blocking operation -- see https://github.com/ansible/ansible/issues/27705 :) | 19:22 |
bcoca | in the end its very similar code | 19:22 |
bcoca | a) pre throttle cause you know limits b) have backoff/limit error recognition and auto throttle | 19:23 |
bcoca | that the api is a db/cloud/random service, does not matter as much as the 'handling logic' | 19:23 |
dmsimard | yeah | 19:28 |
dmsimard | One of the features that has been asked about and that 1.0 will (eventually) allow is to have a generic message bus driver/callback | 19:30 |
dmsimard | The problem with mysql/postgresql is that it's synchronous and adds latency to the playbook runs | 19:30 |
dmsimard | If your database server is down or there's a lot of latency you're going to have a bad time | 19:30 |
dmsimard | Dropping the Ansible playbook/play/task data as messages on a message bus and have another reliable/durable process take those and send them to a database would yield better latency and scalability | 19:32 |
dmsimard | We're not there yet and it's definitely not a short term item on the to-do list but the backend/structure will allow for things like that to happen if people really need it | 19:32 |
dmsimard | and when I say message bus, really, it's just a buffer.. it would be redis/memcached/whatever | 19:33 |
dmsimard | I know that awx/tower uses memcached (and redis? I forget) but I don't remember what for | 19:33 |
ara-slack | <josh.donzello> I should also note that those connections were seen with only running ansible-playbook a couple times and only two to four records in the schema. | 19:34 |
*** openstackgerrit has quit IRC | 19:34 | |
bcoca | postgres has an embedded message bus, see the CHANNEL stuff | 19:35 |
dmsimard | bcoca: yeah but I'm not very much interested in tying ara to any single reldb engine | 19:35 |
dmsimard | need to pick up kids from school, brb | 19:36 |
bcoca | i normally just use syslog as THE message bus, has local storage and then can send to remote that can execute 'storage action' on read | 19:36 |
bcoca | if any errors happen, you can reprocess from the local storage | 19:37 |
dmsimard | oh, that's clever | 19:37 |
dmsimard | brb (really) | 19:37 |
bcoca | previous company, i built 2 'fake message busses' while waiting for dev team to set up real one, one was syslog based, the other was using email, sending to mta as the 'bus' and using procmail to process the consumers | 19:38 |
bcoca | reliable, always stores message on every machine until after consumption is ensured, retry, reroute, congestion handling ... all built in | 19:39 |
harlowja | ohhhh, message bus driver niceeee | 19:57 |
*** tbielawa|afk is now known as tbielawa | 20:50 | |
dmsimard | bcoca: lol.. I've seen an implementation using twitter :) | 20:51 |
*** tbielawa has quit IRC | 21:00 |
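
Note on the 19:23 exchange: the "backoff/limit error recognition" idea bcoca describes can be sketched roughly as below. This is only an illustration under stated assumptions, not code from ARA or from Ansible's module_utils. The names `TooManyConnections`, `backoff_delays` and `call_with_backoff` are hypothetical, and the exception is a stand-in for a database error such as MySQL's 1040 "Too many connections".

```python
import random
import time


class TooManyConnections(Exception):
    """Hypothetical stand-in for a 'Too many connections' error (e.g. MySQL 1040)."""


def backoff_delays(retries, base=0.5, cap=30.0):
    """Return the upper bound (seconds) for the wait before each retry: base * 2**n, capped."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


def call_with_backoff(operation, retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call operation(); on TooManyConnections, wait and try again, up to `retries` retries."""
    for limit in backoff_delays(retries, base, cap):
        try:
            return operation()
        except TooManyConnections:
            # Full jitter: wait a random time up to the current limit so that
            # several clients hitting the cap do not all retry at the same moment.
            sleep(random.uniform(0, limit))
    # Final attempt: let the error propagate if the backend is still saturated.
    return operation()
```

As dmsimard points out at 19:22, a callback that waits like this still blocks the playbook while it sleeps, so it trades latency for fewer hard failures; it does not reduce the number of connections the webserver itself holds open.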