Thursday, 2017-11-30

*** Bakey has quit IRC [00:51]
*** bcoca has quit IRC [02:14]
*** _dev has quit IRC [05:18]
*** resmo has joined #ara [08:45]
*** resmo has quit IRC [08:56]
*** BlessJah has quit IRC [09:39]
*** andymccr has quit IRC [09:39]
*** BlessJah has joined #ara [09:39]
*** andymccr has joined #ara [09:40]
*** resmo has joined #ara [10:02]
*** jparrill has quit IRC [10:09]
*** berendt has quit IRC [10:09]
*** jparrill has joined #ara [10:15]
*** berendt has joined #ara [10:15]
*** sshnaidm|off is now known as sshnaidm|rover [10:48]
*** vcn[m] has quit IRC [11:14]
*** vcn[m] has joined #ara [12:29]
*** tbielawa has joined #ara [12:42]
*** tbielawa has quit IRC [12:42]
*** tbielawa has joined #ara [12:43]
*** tbielawa is now known as tbielawa|drappt [13:33]
*** bcoca has joined #ara [14:56]
*** bcoca has joined #ara [14:56]
*** Bakey has joined #ara [15:01]
*** jrist has quit IRC [15:08]
*** jrist has joined #ara [15:09]
*** tbielawa|drappt is now known as tbielawa [15:16]
-openstackstatus- NOTICE: if you received a result of "RETRY_LIMIT" after 14:15 UTC, it was likely due to an error since corrected. Please "recheck" [15:36]
*** tbielawa is now known as tbielawa|lunch [17:00]
*** resmo has quit IRC [17:21]
*** tbielawa|lunch is now known as tbielawa [18:00]
*** sshnaidm|rover is now known as sshnaidm|off [18:02]
*** wwriverrat has joined #ara [21:40]
<wwriverrat> Hi. I’ve been running ansible tied to ara here at GoDaddy. Been noticing it’s not capturing all of the tasks and seems to stop recording tasks after a while (sometimes early, sometimes later). Anyone else see something similar? [21:45]
<wwriverrat> For those that drop tasks, it appears on the main page that they never completed [21:48]
<harlowja> Might be retries needed from what I see :/ [21:55]
<harlowja> https://gist.github.com/harlowja/0c98e92c40c9366a2f56fbe5aa2f083b [21:56]
*** tbielawa has quit IRC [22:09]
<dmsimard> wwriverrat: hey there [23:00]
<wwriverrat> hey :) [23:00]
<dmsimard> harlowja is way better than I am for this kind of stuff but let's try to figure something out [23:01]
<harlowja> u can do its [23:01]
<harlowja> lol [23:01]
<dmsimard> ARA isn't very tolerant to failure right now but it's tricky... Are you getting that on a regular basis? [23:03]
<wwriverrat> I think harlowja got it figured out. Two thoughts: 1) Like he said, apply a retry and/or 2) Make sure when a connection is pulled for use, it is a connection that the server side hasn't dropped [23:03]
<dmsimard> I say it's tricky because ara has to eventually let go and give up, otherwise it could hang the entire ansible process: https://github.com/ansible/ansible/issues/27705 [23:04]
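A minimal Python sketch of the "retry, but eventually give up" idea from the two messages above (ARA is written in Python). `call_with_bounded_retry` is a hypothetical helper, not part of ARA or SQLAlchemy. The attempt count, the exponential backoff, and treating `ConnectionError` as the only retryable error are all assumptions. The number of attempts is capped, so a flaky database slows the callback down instead of hanging the whole ansible process.

```python
import time


def call_with_bounded_retry(func, attempts=3, base_delay=0.5,
                            retry_on=(ConnectionError,)):
    """Call func() and retry on retryable errors, giving up after `attempts` tries.

    Hypothetical helper for illustration only; not part of ARA.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the error rather than hang
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage: a simulated write that fails twice before the "database" comes back.
calls = {"n": 0}


def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated: MySQL server has gone away")
    return "recorded"


print(call_with_bounded_retry(flaky_insert, base_delay=0.01))  # prints: recorded
```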
<wwriverrat> We used to do a “test on borrow” technique: http://docs.sqlalchemy.org/en/latest/core/pooling.html#dealing-with-disconnects [23:07]
<dmsimard> So I think we can do a minimum here and retry another time, though [23:07]
* dmsimard reads [23:07]
<wwriverrat> That way if the server-side drops the connection while it is in the pool, it gets refreshed before there is an attempt to use it [23:07]
<dmsimard> That makes sense. The sqlalchemy stuff in ARA is very basic right now, and a bit all over the place tbh [23:08]
<wwriverrat> refreshed by typically invoking a “select 1” (and fixing the connection if broken) before your code attempts to use it [23:09]
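The "test on borrow" (pessimistic) approach described above can be sketched with Python's standard library alone. Here `sqlite3` stands in for MySQL, and a closed connection plays the part of one the server dropped while it sat idle. `PingOnBorrowPool` is a hypothetical class written for illustration, not ARA code. In SQLAlchemy 1.2 and later the same check is built in and enabled with `create_engine(url, pool_pre_ping=True)`.

```python
import sqlite3


class PingOnBorrowPool:
    """Hypothetical pool that pings each connection before lending it out."""

    def __init__(self, connect, size=2):
        self._connect = connect  # factory returning a new DB-API connection
        self._idle = [connect() for _ in range(size)]

    def _is_alive(self, conn):
        try:
            conn.execute("SELECT 1")  # cheap round trip to prove the link works
            return True
        except sqlite3.Error:  # a MySQL driver would raise its own error class
            return False

    def borrow(self):
        conn = self._idle.pop() if self._idle else self._connect()
        if not self._is_alive(conn):
            conn.close()            # discard the stale connection...
            conn = self._connect()  # ...and hand out a fresh one instead
        return conn

    def give_back(self, conn):
        self._idle.append(conn)


# Usage: simulate the server dropping a pooled connection while it sat idle.
pool = PingOnBorrowPool(lambda: sqlite3.connect(":memory:"))
stale = pool.borrow()
stale.close()
pool.give_back(stale)
fresh = pool.borrow()
print(fresh is stale)                        # prints: False
print(fresh.execute("SELECT 1").fetchone())  # prints: (1,)
```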
<dmsimard> Part of that is being addressed with ara 1.0 but it won't be out for a while so that doesn't help you [23:09]
<wwriverrat> no worries. Just a thought. Likely several ways to skin that cat. [23:09]
<dmsimard> In 1.0, actually, ara no longer speaks to the database directly. [23:09]
<dmsimard> In your use case, it'd be over HTTP with the new API. [23:10]
*** jparrill has quit IRC [23:10]
<dmsimard> Hang on, let me get off my phone and on a keyboard... :D [23:10]
<dmsimard> yay, keyboard \o/ [23:11]
<dmsimard> Ok, so, in 1.0 if you're going to be sending data from one place to the other like you are doing now, it would be through the API. Everything goes through a single place, the API client, and I am betting a lot on that to be this central point where this kind of failure tolerance will be easy to implement [23:13]
<dmsimard> If this is a regular occurrence for you, I'd love to accommodate you by releasing something helpful in the stable branch but I am honestly not very knowledgeable when it comes to stuff like that [23:14]
<dmsimard> I know that harlowja is familiar with oslo.db which likely has stuff like that built-in I suppose [23:15]
* harlowja is just a dumb person [23:15]
<harlowja> lol [23:15]
<harlowja> nope nope, don't know anything [23:15]
<harlowja> lol [23:15]
<harlowja> dumb josh [23:15]
<harlowja> being dumb [23:15]
<harlowja> lol [23:15]
<dmsimard> However, last time I looked at oslo.db, it also required oslo.config, which totally derails the thing [23:16]
<wwriverrat> I’ve been running ansible jobs all day. For ~50% of them, I’ve seen some point where the server side of MySQL drops the connection under your feet [23:16]
<dmsimard> wwriverrat: 50% ouch [23:17]
<harlowja> shitty networking? [23:17]
<wwriverrat> that or I believe mysql default wait_timeout is 30 seconds [23:17]
<wwriverrat> so if you have a pool of say 10 connections and you finally get around to using #8, it could have been long ago disconnected [23:18]
<wwriverrat> In that link I sent above, there are several techniques for handling stale mysql connections (pessimistic, optimistic, pool recycle) [23:20]
<harlowja> from what i understand ara is using http://flask-sqlalchemy.pocoo.org/2.3/config/ [23:20]
<harlowja> which has SQLALCHEMY_POOL_TIMEOUT: "Specifies the connection timeout in seconds for the pool." [23:21]
<harlowja> might just need to set that somehow [23:21]
<harlowja> and set it lower than mysql/mariadb value [23:21]
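A note on the settings being discussed: in SQLAlchemy, `pool_timeout` is how many seconds to wait for a free connection when the pool is exhausted, while `pool_recycle` retires connections older than a given number of seconds. `pool_recycle` is therefore the value to keep below the server's `wait_timeout`, which can be checked with `SHOW VARIABLES LIKE 'wait_timeout';` (the stock MySQL and MariaDB default is 28800 seconds). The sketch below models recycling with a fake clock. `RecyclingPool` is hypothetical, and the 25-second limit is only chosen to sit under the 30-second figure quoted in the chat.

```python
import time


class RecyclingPool:
    """Hypothetical pool that refuses to reuse connections older than `recycle` s."""

    def __init__(self, connect, recycle, clock=time.monotonic):
        self._connect = connect
        self._recycle = recycle
        self._clock = clock
        self._idle = []
        self._born = {}  # connection -> time it was opened

    def borrow(self):
        while self._idle:
            conn = self._idle.pop()
            if self._clock() - self._born[conn] < self._recycle:
                return conn  # young enough that the server should not have dropped it
            del self._born[conn]  # too old: retire it (a real pool would close it)
        conn = self._connect()
        self._born[conn] = self._clock()
        return conn

    def give_back(self, conn):
        self._idle.append(conn)


# Usage with a fake clock so the example runs instantly.
now = [0.0]
pool = RecyclingPool(connect=object, recycle=25, clock=lambda: now[0])
first = pool.borrow()
pool.give_back(first)
now[0] = 10
print(pool.borrow() is first)  # prints: True  (10 s old, under the limit)
pool.give_back(first)
now[0] = 40
print(pool.borrow() is first)  # prints: False (40 s old, so it was replaced)
```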
<dmsimard> yeah I'm reading right now [23:21]
<dmsimard> wwriverrat: what's your database server timeout? [23:21]
<harlowja> 302 harlowja [23:21]
<harlowja> lol [23:21]
<wwriverrat> lol.. stole my thunder [23:22]
<harlowja> we just used a standard centos mariadb [23:22]
<harlowja> so i'm guessing the default [23:22]
<wwriverrat> I did the google thing a while ago and found 30s [23:22]
<harlowja> ya, it's whatever the default is :-P [23:23]
<harlowja> of `mariadb-server-5.5.52-1.el7.x86_64` [23:23]
<harlowja> prob just need to know how to set SQLALCHEMY_POOL_TIMEOUT so that it gets picked up by ara [23:24]
<dmsimard> I love that sqlalchemy has something called connectionFairy [23:24]
<dmsimard> harlowja, wwriverrat: If I write a patch that would plumb that up, would you like to try that out and let me know if it helps? [23:25]
<harlowja> all sorts of weird shit in there, lol [23:25]
<harlowja> yes [23:25]
<harlowja> wwriverrat very much would like that [23:25]
<harlowja> :-P [23:25]
<dmsimard> ok, give me a minute [23:25]
<wwriverrat> Yep, and it would be on us to ensure SQLALCHEMY_POOL_TIMEOUT is <= mariadb wait_timeout [23:25]
*** openstackgerrit has joined #ara [23:35]
<openstackgerrit> David Moreau Simard proposed openstack/ara master: Add support for configuring sqlalchemy pool size, timeout and recycle  https://review.openstack.org/524427 [23:35]
<dmsimard> harlowja, wwriverrat ^ [23:35]
<dmsimard> do you use an ansible.cfg file to configure ara? or env variables? [23:36]
<harlowja> ansible.cfg [23:36]
<dmsimard> ok, so, under [ara], you'll want to play with sqlalchemy_pool_size, sqlalchemy_pool_timeout and sqlalchemy_pool_recycle [23:37]
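For reference, those options would sit in the `[ara]` section of `ansible.cfg`. The Python sketch below embeds such a section and reads it with the standard `configparser` module. The values are placeholders, the seconds-based units are assumed from SQLAlchemy's `pool_timeout`/`pool_recycle` parameters, and `check_ara_pool_settings` is a hypothetical helper, not how ARA itself parses its configuration. It applies a variant of the rule wwriverrat mentioned (keep the pool settings below the server's `wait_timeout`) to `sqlalchemy_pool_recycle`, the age-based setting.

```python
import configparser

# Hypothetical ansible.cfg contents: option names come from the chat,
# values are placeholders to tune for your own database server.
ANSIBLE_CFG = """
[ara]
sqlalchemy_pool_size = 10
sqlalchemy_pool_timeout = 30
sqlalchemy_pool_recycle = 25
"""


def check_ara_pool_settings(cfg_text, server_wait_timeout):
    """Read the [ara] pool options and check recycle stays below wait_timeout."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["ara"]
    settings = {
        "pool_size": section.getint("sqlalchemy_pool_size"),
        "pool_timeout": section.getint("sqlalchemy_pool_timeout"),
        "pool_recycle": section.getint("sqlalchemy_pool_recycle"),
    }
    # Recycling after the server has already dropped the connection is too late.
    if settings["pool_recycle"] >= server_wait_timeout:
        raise ValueError(
            "sqlalchemy_pool_recycle should be below the server's wait_timeout"
        )
    return settings


print(check_ara_pool_settings(ANSIBLE_CFG, server_wait_timeout=30))
# prints: {'pool_size': 10, 'pool_timeout': 30, 'pool_recycle': 25}
```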
<harlowja> any changes needed to do that? [23:37]
<harlowja> (ara code changes) [23:37]
<dmsimard> well yeah, you're going to need the patch that I just sent [23:38]
<harlowja> oh [23:38]
* harlowja slaps self in face [23:38]
<harlowja> lol [23:38]
<dmsimard> don't do that :) [23:38]
* harlowja slaps someone else in face? [23:38]
<harlowja> lol [23:38]
<dmsimard> ¯\_(ツ)_/¯ [23:38]
<dmsimard> I have to catch dinner before I digest myself. Can you try that out and let me know if it helps? I'll try and see where we go from there. [23:39]
<harlowja> yup [23:39]
<dmsimard> thanks [23:41]
<dmsimard> and like I mentioned, I'm definitely planning to bake failure tolerance into the API client layer, which will be fairly easy. Right now there are queries to the database a bit all over the place so it's not super trivial beyond tweaking those pool settings [23:42]
<harlowja> yuppers [23:42]
<harlowja> might have to get off flask_sqlalchemy thingy [23:42]
<harlowja> and onto <one of the other 10000000 libs> [23:42]
<dmsimard> lol [23:42]
<harlowja> ^ not oslo.db, lol [23:42]
<dmsimard> it's funny that you mention that [23:42]
<dmsimard> because one of the only things that kept me on flask is removed in 1.0 [23:43]
<dmsimard> and awx is built in django [23:43]
<dmsimard> but I don't really wanna rewrite everything, and I don't know django at all anyway [23:43]
<harlowja> :-p [23:43]
<dmsimard> but django rest framework looks pretty dope [23:43]
<dmsimard> ok afk food, please keep me updated [23:44]
<harlowja> willdo [23:44]
<wwriverrat> Ironically: I upped my forks from 5 to 8. Getting all the results :P . I assume it’s because the run takes less time and connections are taken from the pool sooner :) [23:47]
*** Bakey has quit IRC [23:54]
