Friday, 2016-02-26

*** chlong_ has quit IRC00:03
*** james_li_ has quit IRC00:04
*** penick has quit IRC00:05
*** jasonsb has joined #openstack-dns01:01
*** ducttape_ has joined #openstack-dns01:08
*** rudrajit has joined #openstack-dns01:14
*** ducttape_ has quit IRC01:15
*** rudrajit_ has quit IRC01:16
*** stanzgy has joined #openstack-dns01:20
*** jasonsb has quit IRC01:25
*** jasonsb has joined #openstack-dns01:35
*** bpokorny has quit IRC01:43
openstackgerritOpenStack Proposal Bot proposed openstack/designate: Updated from global requirements  https://review.openstack.org/28501601:47
openstackgerritOpenStack Proposal Bot proposed openstack/designate-dashboard: Updated from global requirements  https://review.openstack.org/28501701:47
*** rudrajit has quit IRC01:47
*** chlong_ has joined #openstack-dns01:50
*** jasonsb has quit IRC02:13
*** jasonsb has joined #openstack-dns02:14
*** EricGonczer_ has joined #openstack-dns02:24
*** EricGonczer_ has quit IRC02:25
*** EricGonczer_ has joined #openstack-dns02:27
*** jasonsb has quit IRC02:32
*** ducttape_ has joined #openstack-dns02:37
*** ducttape_ has quit IRC02:40
*** ducttape_ has joined #openstack-dns02:44
*** ducttape_ has quit IRC02:46
*** fawadkhaliq has joined #openstack-dns02:48
*** ducttape_ has joined #openstack-dns02:53
*** rudrajit has joined #openstack-dns03:10
*** EricGonczer_ has quit IRC03:12
*** boris-42 has quit IRC03:24
*** fawadkhaliq has quit IRC03:32
*** ducttape_ has quit IRC03:46
*** jasonsb has joined #openstack-dns04:08
*** richm has quit IRC04:10
*** rudrajit has quit IRC04:47
*** rudrajit has joined #openstack-dns04:47
*** fawadkhaliq has joined #openstack-dns04:58
*** ducttape_ has joined #openstack-dns05:05
*** ducttape_ has quit IRC05:37
*** rudrajit_ has joined #openstack-dns05:51
*** rudrajit has quit IRC05:54
*** jasonsb has quit IRC07:00
*** chlong_ has quit IRC07:26
*** jordanP has joined #openstack-dns08:49
*** rudrajit_ has quit IRC09:05
*** jschwarz has joined #openstack-dns09:20
*** jordanP has quit IRC09:32
*** fawadkhaliq has quit IRC09:37
*** jordanP has joined #openstack-dns09:46
*** kei_yama has quit IRC10:00
eanderssonIt's funny. I had that patch in my build, but forgot to re-apply the patch after updating to 2015.1.1.10:16
*** ducttape_ has joined #openstack-dns10:34
*** stanzgy has quit IRC10:36
*** en_austin has joined #openstack-dns10:37
en_austinKiall: ping?10:37
*** jschwarz has quit IRC10:46
*** ducttape_ has quit IRC10:57
eanderssonLooking at the [service:pool_manager] section. What would be the recommended changes from default to make it less spammy?11:20
eanderssonI am suspecting that we might be overloading powerdns with too many requests.11:27
federico3eandersson: do you have any number?12:08
eanderssonYou mean any numbers set in the configuration already?12:09
*** krotscheck_dcm is now known as krotscheck12:13
eanderssonI really don't like this bug: http://paste.openstack.org/show/yPxcgi4ErEr4FV8zDJPz/12:22
eanderssonIt happens if you try to do a dns lookup on mdns while the record is in DELETE PENDING12:22
*** jordanP has quit IRC12:35
federico3uh, opening bug report. eandersson anything else you can add?12:35
*** jordanP has joined #openstack-dns12:36
eanderssonThat bug is a side effect, but I hit you up with some more details in a PM12:37
*** johnbelamaric has quit IRC12:56
Kialleandersson: ah, that should be an easy fix. Is there a bug filed?13:01
KiallI can see what's gone wrong there..13:01
Kiallen_austin: pong - so. https://bugs.launchpad.net/designate/+bug/154998013:02
openstackLaunchpad bug 1549980 in Designate "MiniDNS TCP connections stop being accepted" [Critical,In progress] - Assigned to Rahman Syed (rahman-syed-w)13:02
KiallI was about to pull your logs down again to grep for ^13:02
eanderssonKiall: I confirmed that I didn't have the patch in place.13:03
eanderssonWhat is even funnier is that I did have the patch before upgrading to the latest Kilo release. I forgot to re-merge it after the upgrade. :p13:03
Kialleandersson: I saw, that sucks. We'll merge it today13:03
Kiall(and backport)13:03
eanderssonAwesome.13:03
eanderssonThat is one less issue to worry about.13:04
Kiallen_austin: yep, your logs have the faithful "timed out"13:04
*** km has quit IRC13:08
*** ducttape_ has joined #openstack-dns13:13
en_austinso, that means, that you've found an issue that caused my mDNS to hang?13:22
en_austinKiall: ^13:22
Kiallen_austin: yep13:23
Kiallnot me13:24
Kialleandersson found it13:24
en_austineandersson: thank you :)13:24
Kiallhttps://review.openstack.org/#/c/284912 should fix it13:24
en_austinI can apply that to my instance and check it out.13:24
Kiall(with a more comprehensive fix to prevent it in the future coming..)13:24
en_austinand also Kiall do you remember a "with lockutils.lock" issue with my PoolManager? What do you think about it?13:24
en_austinIs there a way to fix it?13:24
en_austinNow I'm experiencing a zone falling to "ERROR" state (seems like some race condition). Or will that patch fix it?13:25
KiallI remember that, I'm trying to remember the exact reason we believed it to be an issue.. It wasn't mdns lockup, was it?13:25
en_austinPoolManager13:25
en_austinhttps://bugs.launchpad.net/designate/+bug/153449013:25
openstacken_austin: Error: malone bug 1534490 not found13:25
KiallYea, it could have been us thinking PM was overloading mDNS - Trying to remember13:25
Kiallah.. Okay, remembering now. Can you give ^ patch a go, and if it's still happening, we'll dig in again13:26
en_austinSo. I should revert the removal of "with lockutils.lock" and apply the patch from the review you gave me above?13:30
eanderssonen_austin, is that similar to this? http://paste.openstack.org/show/T8LXLLnUQvu0HIWc1yZY/13:30
en_austin+/-13:30
en_austinI'll show you now.13:31
en_austin2016-02-26 16:31:28.913 24354 WARNING designate.mdns.notify [req-bcfd8c83-2b93-416f-9b75-9d327435122d noauth-user noauth-project - - -] Got lower serial for 'xxxxxxx.' to 'xxxxxxx:53'. Expected:'1456493481'. Got:'1456493431'.Retries left='9'13:31
en_austinAnd that repeats.13:32
eanderssonYea, I have the same issue.13:32
en_austinThen the zone can either recover to SUCCESS or fall to ERROR.13:32
KiallI think that "Got lower serial" message is really OK - and shouldn't be a warning.. It's just "The data hasn't propagated yet"... If it goes to success, nothing is wrong or in need of warning about13:33
eanderssonI think for us pdns is simply not keeping up.13:34
en_austinThe issue is - that it often does not go to success...13:34
en_austineandersson: I'm running BIND on my backends jfyi13:34
en_austinKiall: there was no "got lower serial" before I removed that lock in PM code13:35
*** ducttape_ has quit IRC13:35
en_austinI think it occurs because more new records are trying to propagate to the backend while it has not yet processed older records13:36
*** fawadkhaliq has joined #openstack-dns13:36
en_austinand there is no problem, really, until those retries push the zone into ERROR state13:36
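The retry behaviour discussed above (a lower serial just means "not propagated yet", and the zone only ends up in ERROR once the retries run out) can be sketched roughly as below. This is a minimal illustration, not Designate's actual code: `wait_for_serial`, `get_serial` and the returned status strings are hypothetical names, and the serials are compared as plain integers.

```python
import time


def wait_for_serial(get_serial, expected, retries=3, delay=1.0, sleep=time.sleep):
    """Poll get_serial() until it reaches `expected` or retries run out.

    Hypothetical helper: returns ("SUCCESS", attempts_used) or ("ERROR", retries).
    """
    for attempt in range(1, retries + 1):
        actual = get_serial()
        # A lower serial only means the update has not propagated yet,
        # so it is a reason to retry rather than a failure in itself.
        if actual is not None and actual >= expected:
            return "SUCCESS", attempt
        if attempt < retries:
            sleep(delay)
    # Only exhausting every retry is treated as an error.
    return "ERROR", retries
```

With the serials from the log line above, a nameserver that answers 1456493431 twice and then 1456493481 would succeed on the third attempt; one that never catches up would end in ERROR after the last retry.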
KiallSo, is the "cosmetic" only, as in assuming mDNS doesn't fall over, content is going out reasonably fast and things return to ACTIVE?13:37
Kiallis it*13:37
Kialleandersson: PowerDNS is doing lots of work with large zones like yours.. You may be a candidate for using BIND, or something that doesn't BEGIN; DELETE *; INSERT *; COMMIT13:38
eanderssonActually in this case we don't have large zones, but many zones instead.13:38
KiallOh, I thought you had a bunch of large zones? I'm probably getting people mixed up13:39
Kiall(it happens ;))13:39
eandersson:D13:39
en_austinSometimes - yes.13:45
en_austin:D13:45
en_austinKiall: ^13:45
en_austinsometimes - such log entries cause my designate to stop propagating new records (both zones fall to ERROR and an mdns restart helps, yeah)13:46
en_austinsometimes - it just falls to ERROR and self-recovers in 1-2min13:46
en_austin(e.g returns to ACTIVE)13:46
en_austinso Kiall i'm reverting that http://paste.openstack.org/show/485460/ to its original state (with 'with lockutils.lock') and applying https://review.openstack.org/#/c/284912/3/designate/service.py that patch.13:54
en_austinHope it helps...13:54
*** johnbelamaric has joined #openstack-dns13:57
*** richm has joined #openstack-dns14:03
Kiallen_austin: let's see, although both changes may be necessary. we'll see.14:07
en_austinI've done what I said before - will now restart Designate and watch it.14:07
en_austinIf it hangs - I will re-apply the removal of "with lockutils.lock" and try it that way.14:08
eanderssonbtw Kiall was this normal? http://paste.openstack.org/show/T8LXLLnUQvu0HIWc1yZY/14:08
Kialleandersson: was this during a new zone creation?14:09
eanderssonIt seems to happen during record create/delete14:09
KiallIf not, mDNS failed to query the nameserver for the zones SOA14:09
en_austinbtw eandersson I've seen the same logs.. sometimes, not always.14:09
eanderssonand in the same session you will have it retry endlessly14:10
eanderssonI changed it from 3 to 9 retries, and it just keeps retrying until it runs out of retries14:10
eanderssonIt gets propagated eventually, usually 60-180s14:10
eanderssonCould it be after it hits periodic_recovery_interval ?14:11
KiallThat could be a mis-config14:12
eanderssonOn the pool manager?14:13
KiallThe log is not detailed enough to be able to tell :/14:13
KiallYea, likely a pool_nameserver section is not right14:13
Kiall(If it always happens)14:13
eanderssonall that is in pool_nameserver is the ip and port14:13
KiallYea, one of those might be wrong :)14:14
Kiall(is there 1 pool_nameserver section, or more than 1?)14:14
eandersson1 pool_nameserver section14:14
eanderssonconfirmed that the ip and port are correct14:14
eanderssonOur theory at the moment is that pdns is overloaded14:15
KiallThat's certainly possible, and would explain it too14:15
eanderssonWhat exactly does periodic_recovery do?14:17
*** karimb has joined #openstack-dns14:17
*** karimb has quit IRC14:17
eanderssonIt just checks for records pending and tries to fix that or?14:18
Kiallrecovery finds things in ERROR status, and attempts to fix them14:24
*** mlavalle has joined #openstack-dns14:41
eanderssonKiall: Unable to AXFR zone 'example.com' from remote '<pdns-ip>' (resolver): Timeout waiting for answer from <designate01>:53 during AXFR14:53
eanderssonThis is a common error in the pdns logs.14:53
Kiallso, you don't have large zones? and is that before or after the TCP lockup fix was applied?14:54
eandersson'<pdns-ip> =  <designate01>:5314:54
eanderssonSmall zones14:54
eandersson100-200 records14:54
*** fawadkhaliq has quit IRC14:55
KiallJust checking is mDNS listening on 53?14:55
Kiallor the default of 5354?14:55
eandersson5314:55
eanderssonPatch had no effect.14:55
KiallIs there lots of churn happening in the zones?14:56
*** ducttape_ has joined #openstack-dns14:56
eanderssonabout 3 records created and deleted per 5 minutes14:56
eanderssonby monitoring14:57
eanderssonotherwise it's pretty static14:57
KiallOkay, and you mentioned you think powerdns is overloaded, is that query load coming in?14:59
eanderssonWhen I say overloaded I don't mean IO, but rather that the zone updates get queued up.14:59
eanderssonCPU and RAM usage is very low on pdns and mdns.14:59
KiallHumm, small zones with a few updates a minute really shouldn't be causing anything like that.. can you manually / via dig do an AXFR of the zones against mDNS, from the pDNS server? Keeping an eye on how long it's taking...15:02
eanderssonvery fast15:03
KiallIt's been way too long since I benchmarked it myself to remember, but someone mentioned the other day 8-10k record zones taking about 8-10 seconds15:03
KiallTrying to think how we rule out pDNS and/or mDNS.. can you run that in a loop - say once a second, until the next time you see a timeout in pDNS logs?15:04
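The "AXFR in a loop, once a second, watching the duration" check suggested here could look something like the sketch below. It assumes the `dig` CLI is installed; `axfr_with_dig` and `sample` are hypothetical helper names, and the success test (exit code plus a scan for "Transfer failed" in the output) is a heuristic, not a guaranteed way to detect a failed transfer.

```python
import subprocess
import time


def axfr_with_dig(zone, server, port=53, timeout=10):
    """Run one AXFR through the dig CLI and guess whether it worked."""
    cmd = ["dig", "+time=%d" % timeout, "-p", str(port), "@" + server, zone, "AXFR"]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output = result.stdout.decode("utf-8", "replace")
    # Heuristic: clean exit and no failure marker in the output.
    return result.returncode == 0 and "Transfer failed" not in output


def sample(transfer, count, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call transfer() `count` times; return a list of (ok, seconds_taken)."""
    results = []
    for i in range(count):
        start = clock()
        ok = transfer()
        results.append((ok, clock() - start))
        if i + 1 < count:
            sleep(interval)
    return results
```

Something like `sample(lambda: axfr_with_dig("example.com.", "192.0.2.10"), 600)` run from the PowerDNS host would give a timeline to line up against the timeouts in the pDNS log; the zone name and address are placeholders.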
eanderssonKiall: I added some additional info in a pm15:08
en_austinKiall: looks like it (PM) begins to freeze again... SOAs are out of sync now (as in previous time)15:09
en_austin2016-02-26 18:08:54.089 3275 INFO designate.mdns.notify [req-93573e6a-a34a-415c-9870-479ddeaa30fb noauth-user noauth-project - - -] Sending 'SOA' for 'xxxxxx.' to 'yyyyyyy:53'.15:09
en_austin^ and repeating15:09
en_austinit was ~300sec difference between Designate's SOA and actual one (bigger on Designate side)15:10
en_austinthen it fixed15:10
en_austinbut Zabbix is still reporting "Serials differ on designate and ns1/ns2".15:10
en_austinand kill -USR1 reports "with lockutils.lock" greenthreads.15:12
en_austin    `with lockutils.lock('update-status-%s' % domain.id):`15:12
en_austin62 green threads as for now (1hr of uptime).15:12
en_austinO_O I've begun to receive an IOError: Socket closed from... RabbitMQ15:18
en_austinKiall: http://paste.openstack.org/show/488385/15:18
eanderssonI ran into that in Liberty as well when I started testing.15:19
en_austinhttp://paste.openstack.org/show/488386/15:20
en_austinthat's what is in the RabbitMQ logs15:20
en_austineandersson: I'm using Liberty too.15:20
en_austinclosing AMQP connection <0.16519.8> (127.0.0.1:54608 -> 127.0.0.1:5672):15:21
en_austin{heartbeat_timeout,running}15:21
en_austinalso here.15:21
en_austinAFAIK "heartbeat_timeout" is raised by RabbitMQ when TCP connection from another side is dead.15:21
KiallSorry, busy with all sorts of other stuff so back/forth from IRC15:22
Kiallheartbeat_timeout <-- that's not a TCP fail.. what's your RMQ config in Designate look like?15:22
en_austinhttp://paste.openstack.org/show/488388/15:24
eanderssonKiall: Kombu is single threaded, so if something is keeping the thread busy, e.g. deadlock, it won't reply to heartbeats15:24
en_austinhttps://www.rabbitmq.com/heartbeats.html that's why I thought about a dead TCP conn15:24
eanderssonUnless they have fixed it now15:24
eanderssonAMQP server xxxx:5672 closed the connection. Check login credentials: Socket closed15:24
KiallSo, kombu requires the calling app periodically calls the "send heartbeat" method, oslo.messaging does not call it15:25
eanderssonI have that in my logs for Liberty as well.15:25
Kiallthere's something somewhere to tell RMQ not to expect heartbeats, which means it will instead rely on TCP keepalive15:25
eanderssonIf you set heartbeat interval to 015:26
KiallBut - It's been ages since I've seen it :)15:26
Kialleandersson: sounds about right15:26
eanderssonI wrote my own library, as I didn't like how pika and py-amqp handled heartbeats :p15:27
en_austinI was just worried about IOError's in my mdns.log...15:27
Kialleandersson: lol15:27
en_austinwell Kiall have you seen my report about deadlocking greenlets?15:28
*** pglass has joined #openstack-dns15:28
en_austinnow I'm running with both the patch removing "with lockutils.lock" from PoolManager and the patch for service.py (about except socket.timeout)15:28
Kiallen_austin: behaving any better?15:28
en_austinUp and running now, sometimes serials are different on Designate and ns1/2 (but returning in-sync at 5-10sec, that's OK)15:29
eanderssonbtw Kiall, I hit you up with some logs in pm in case you didn't see it15:37
Kiallsorry, multi tasking all over the place ;)15:48
*** penick has joined #openstack-dns15:55
*** bpokorny has joined #openstack-dns15:55
*** bpokorny has quit IRC16:04
*** bpokorny has joined #openstack-dns16:04
*** logan- has quit IRC16:14
*** logan- has joined #openstack-dns16:14
en_austinwell... no faults for 1hr - it's a progress lol :D16:21
en_austinKiall: can you explain in a couple of words, what was that fix for? what behaviour did it change? e.g. why would an exception about socket timeout (if any) not be caught by the socket.error except clause?16:22
timsimen_austin: It was being caught, but because the exceptions aren't uniform, a KeyError was happening during the socket.error exception handling and raising an exception in the main tcp handling thread.16:34
KiallBasically, if an exception happened in our exception handlers, we goofed up.16:37
*** penick_ has joined #openstack-dns16:37
Kiallwe need to re-work so it's some nested try/catches, with the outer one being nothing more than LOG.critical("OH CRAP, SOMETHING SPLODED") so there's little risk of it raising an exception itself16:38
*** penick has quit IRC16:40
*** penick_ is now known as penick16:40
*** jasonsb has joined #openstack-dns16:44
en_austinI've got it.. And now, if socket.error occurs, will it handle it correctly (re-initiate connection again, etc) ?16:49
*** ccneill has joined #openstack-dns16:52
KiallWell, it'll continue doing what it's doing, rather than let the exception (the one generated inside the exception handler) go un-caught, which kills the thread and leaves you with a service that does UDP but not TCP17:00
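The nested try/except shape Kiall describes (inner handlers for the expected socket errors, and an outer catch-all that does nothing but log) might look roughly like this. It is a sketch under assumptions, not the patch from review 284912: `serve_once`, `accept`, `handle` and `on_socket_error` are hypothetical names standing in for the real TCP handling code.

```python
import logging
import socket

LOG = logging.getLogger("tcp_sketch")


def serve_once(accept, handle, on_socket_error):
    """Handle one connection without letting any Exception escape.

    Returns True if the inner handlers coped, False if the outer
    catch-all had to step in.
    """
    try:
        try:
            client, addr = accept()
            handle(client, addr)
        except socket.timeout:
            LOG.warning("TCP connection timed out")
        except socket.error as exc:
            # This handler may itself be buggy (for example a KeyError
            # while formatting the error); the outer block guards that.
            on_socket_error(exc)
    except Exception:
        # Keep this as simple as possible so it has little chance of
        # raising, and the calling loop keeps accepting connections.
        LOG.critical("Unhandled exception in TCP handling", exc_info=True)
        return False
    return True
```

A caller would run `serve_once(...)` inside its `while True:` accept loop, so a bug in an error handler costs one log line rather than the whole TCP thread.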
*** james_li has joined #openstack-dns17:01
*** darkxploit has joined #openstack-dns17:03
en_austinHm..17:05
en_austinMaybe we can find the origin of that socket.error - or is it normal behaviour?17:06
*** jordanP has quit IRC17:06
*** jasonsb_ has joined #openstack-dns17:12
*** en_austin has quit IRC17:13
*** baffle___ has joined #openstack-dns17:15
*** mikal_ has joined #openstack-dns17:15
*** jasonsb has quit IRC17:20
*** ekarlso- has quit IRC17:20
*** krotscheck has quit IRC17:20
*** mikal has quit IRC17:20
*** lmiccini has quit IRC17:20
*** baffle has quit IRC17:20
*** krotscheck has joined #openstack-dns17:20
*** lmiccini has joined #openstack-dns17:23
*** ekarlso- has joined #openstack-dns17:27
*** eandersson_ has joined #openstack-dns17:39
*** rudrajit has joined #openstack-dns17:43
*** rudrajit has joined #openstack-dns17:44
*** bpokorny has quit IRC17:59
*** jasonsb_ has quit IRC18:00
*** ducttape_ has quit IRC18:34
*** bpokorny has joined #openstack-dns18:53
*** ccneill has quit IRC18:53
*** ccneill has joined #openstack-dns19:03
*** darkxploit has quit IRC19:09
*** ducttape_ has joined #openstack-dns19:23
*** porunov has joined #openstack-dns19:38
*** bpokorny has quit IRC20:07
*** johnbelamaric has quit IRC20:11
*** johnbelamaric has joined #openstack-dns20:23
*** tg90nor has quit IRC20:25
andrewbogottIf anyone is around… can I get advice about the kilo->trusty upgrade path for designate?  Any config changes?  And do I really need to start running designate-zone-manager if I’m not using ceilometer?20:36
andrewbogottbah, sorry, kilo->liberty20:37
eandersson_I am in the process of that upgrade and it was really easy.20:41
eandersson_The only thing I had to change in the config was to make sure that I had the host and port specified in pool_target20:41
eandersson_designate-zone-manager isn't required20:42
andrewbogotteandersson_: host and port are new options?20:44
andrewbogottright now I specify… options, masters, type20:44
eandersson_No, but it would use pool_namespace previously to send notifications20:45
eandersson_so if you didn't have options: host, port set under pool_target it would default to localhost.20:45
andrewbogotteandersson_: my target is a pdns database, which is specified in options = connection:20:47
andrewbogottmy pool_nameserver sections have port and host though20:47
eandersson_options = host: <pdns>, port: 5320:47
eandersson_you will need that under pool_target20:47
eandersson_in addition to what you have in pool_nameserver20:48
andrewbogottok, and that points to where pdns is running, I take it?  (It’s confusing in my case since the target is a single database, which is used by two different pdns servers running on different hosts)20:48
eandersson_yep20:49
andrewbogottany idea what that host/port is used for?  What new interaction is there between designate and pdns?20:49
eandersson_nah, it's just an undocumented change20:49
andrewbogottI should rephrase:  Since I /already/ have two pdns servers running, one of which is not on localhost...20:50
andrewbogottwhat’s broken by that?20:50
andrewbogottsince obviously the non-localhost one is already not referenced20:50
eandersson_ah, yea if you are already targeting localhost it's fine20:50
andrewbogotteandersson_: I still don’t understand, sorry20:53
andrewbogottI have /two/ targets.  Why would it work to just pick one and point to it?20:53
eandersson_So basically under pool_target options = port: 53, host: xxx put whatever you already have under pool_namespace host/port20:53
eandersson_It's due to this change: https://review.openstack.org/#/c/170612/20:54
andrewbogottso it’ll only notify whichever one I specify20:55
eandersson_yes20:55
andrewbogottand the other one will just have to catch up20:55
andrewbogottI guess that’s ok for now20:55
andrewbogottanyway, overall this sounds painless :)  thanks!20:55
eandersson_You can also add also_notify20:55
eandersson_Yep20:55
eandersson_#also_notifies = 192.0.2.1:53, 192.0.2.2:5320:56
eandersson_https://github.com/openstack/designate/blob/master/etc/designate/designate.conf.sample#L34520:56
andrewbogottah!20:56
andrewbogottbetter yet, thank you20:56
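Put together, the settings discussed above could look like the fragment embedded in the sketch below, which also shows how the comma-separated `options` value breaks down into keys. The section names (`pool:example-pool-id`, `pool_target:example-target-id`), the placement of `also_notifies` in the pool section, and the addresses are assumptions for illustration; other keys such as `type` and `masters` are left out, and `parse_options` is a hypothetical helper, not Designate's parser.

```python
import configparser

# Illustrative fragment only; IDs and addresses are placeholders.
SAMPLE = """
[pool:example-pool-id]
also_notifies = 192.0.2.1:53, 192.0.2.2:53

[pool_target:example-target-id]
options = host: 192.0.2.10, port: 53
"""


def parse_options(raw):
    """Split 'key: value, key: value' into a dict of strings."""
    options = {}
    for item in raw.split(","):
        key, _, value = item.strip().partition(":")
        options[key.strip()] = value.strip()
    return options


# Only '=' is treated as a delimiter, so colons stay inside the values.
parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.read_string(SAMPLE)

target = parse_options(parser["pool_target:example-target-id"]["options"])
notify = [addr.strip() for addr in
          parser["pool:example-pool-id"]["also_notifies"].split(",")]
```

Here `target` holds the host and port that get the NOTIFY for the target, and `notify` lists the extra servers that should be notified as well.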
openstackgerritEric Larson proposed openstack/designate: Ensure the zone records quota is enforced  https://review.openstack.org/28436121:00
*** tg90nor has joined #openstack-dns21:11
*** bpokorny has joined #openstack-dns21:17
*** mlavalle has quit IRC21:41
*** porunov has quit IRC21:45
elarsontimsim: so I'm looking at the worker review and just thought about how often we get an rcpapi. I kind of want to submit a review that basically does `from designate import rcpapi` and then do `rcpapi['pool-manager']` (or something similar) to get an instance21:58
elarsondoesn't really matter. just was thinking aloud21:58
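The `rcpapi['pool-manager']` style lookup elarson is thinking aloud about could be done with a small lazy registry, sketched below. Everything here is hypothetical: `RpcApiRegistry`, `register` and the factory callables are made-up names, and nothing like this is confirmed to exist in Designate.

```python
class RpcApiRegistry:
    """Map service names to API instances, created on first access."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        """Remember how to build the API object for `name`."""
        self._factories[name] = factory

    def __getitem__(self, name):
        # Build the instance once, then reuse it on later lookups.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]
```

Callers would then write `registry['pool-manager']` wherever they currently fetch an instance by hand; an unknown name raises `KeyError`.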
*** eandersson_ has quit IRC22:36
*** pglass has quit IRC22:40
*** ccneill has quit IRC22:40
*** ccneill has joined #openstack-dns22:45
*** rudrajit has quit IRC23:04
*** rudrajit has joined #openstack-dns23:08
*** ducttape_ has quit IRC23:10
*** ccneill has quit IRC23:48
*** james_li has quit IRC23:58
