Monday, 2016-01-18

*** bpokorny has joined #openstack-dns01:46
*** stanzgy has joined #openstack-dns01:48
*** GonZoPT has quit IRC01:53
*** GonZoPT has joined #openstack-dns01:54
*** rudrajit has joined #openstack-dns01:58
*** PsionTheory has quit IRC02:24
*** bpokorny has quit IRC02:29
*** gohko has quit IRC02:32
*** bpokorny has joined #openstack-dns02:48
*** bpokorny has quit IRC03:27
*** _RuiChen has joined #openstack-dns03:37
*** nkinder has joined #openstack-dns03:51
*** nkinder has quit IRC03:58
*** bpokorny has joined #openstack-dns04:01
*** bpokorny has quit IRC04:14
*** fawadkhaliq has joined #openstack-dns04:23
*** fawadkhaliq has quit IRC04:44
*** fawadkhaliq has joined #openstack-dns05:34
*** jasonsb has joined #openstack-dns05:50
*** sonuk has joined #openstack-dns06:33
*** jasonsb has quit IRC06:45
*** naggappan has joined #openstack-dns06:57
<naggappan> hi, I'm trying to run the unit tests with "tox -e py27" but get a failure when I use pdb ("import pdb;pdb.set_trace()"). Could someone tell me how to use pdb with tox?  06:57
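A minimal sketch of the usual workaround, assuming a standard OpenStack tox/testr setup (the module, class and test names below are made up): pdb cannot attach under a plain "tox -e py27" run because the test runner captures stdout, so the single test is typically run directly with testtools inside the tox virtualenv, where the breakpoint gets a real terminal.

    # Save as test_example.py and run with:
    #   .tox/py27/bin/python -m testtools.run test_example.ExampleTest.test_something
    import pdb
    import testtools


    class ExampleTest(testtools.TestCase):
        def test_something(self):
            value = 1 + 1
            pdb.set_trace()   # usable here, unlike under the captured-output testr runner
            self.assertEqual(2, value)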
*** chlong has quit IRC07:00
*** GonZoPT has quit IRC07:10
*** ramtalari has joined #openstack-dns07:16
<openstackgerrit> caoyue proposed openstack/designate: test: make enforce_type=True in CONF.set_override  https://review.openstack.org/268615  07:35
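For context, a rough illustration of what enforce_type=True changes in oslo.config test overrides (this is not the actual patch, and the option below is hypothetical): the override value is validated against the option's declared type instead of being accepted blindly, so type mistakes in tests surface immediately.

    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opt(cfg.IntOpt('workers', default=1), group='service:api')   # hypothetical option

    CONF.set_override('workers', 4, 'service:api', enforce_type=True)       # accepted: matches IntOpt
    CONF.set_override('workers', 'four', 'service:api', enforce_type=True)  # rejected with a ValueError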
*** sonuk has quit IRC07:50
*** sonuk has joined #openstack-dns08:10
*** fawadkhaliq has quit IRC08:12
*** fawadkhaliq has joined #openstack-dns08:13
<en_austin> federico3: sure you can  08:16
*** rudrajit has quit IRC08:18
*** fawadkhaliq has quit IRC09:01
*** fawadkhaliq has joined #openstack-dns09:01
<en_austin> federico3: please ping me when we can continue our discussion of this issue  09:01
*** nyechiel_ has joined #openstack-dns09:43
*** venkat has joined #openstack-dns10:18
*** venkat has quit IRC10:20
*** penchal has joined #openstack-dns10:22
*** venkat has joined #openstack-dns10:22
*** blabbityblabbity has joined #openstack-dns10:24
*** blabbityblabbity has quit IRC10:25
*** sonuk has quit IRC10:26
*** venkat has quit IRC10:29
*** bradjones has quit IRC10:38
*** bradjones has joined #openstack-dns10:40
*** bradjones has quit IRC10:40
*** bradjones has joined #openstack-dns10:40
*** sonuk has joined #openstack-dns10:42
*** stanzgy has quit IRC10:48
*** venkat has joined #openstack-dns11:05
*** kei_yama has quit IRC11:37
*** GonZo2000 has joined #openstack-dns11:54
*** GonZo2000 has joined #openstack-dns11:54
*** fawadkhaliq has quit IRC12:13
*** km_ has quit IRC12:16
*** sonuk has quit IRC12:21
*** bradjones has quit IRC12:30
<federico3> en_austin: are you around?  12:31
*** chlong has joined #openstack-dns12:36
*** fawadkhaliq has joined #openstack-dns12:40
<openstackgerrit> Merged openstack/designate: Add retry logic on periodic_sync  https://review.openstack.org/263295  12:49
*** rsyed_away is now known as rsyed12:51
*** ramtalari has quit IRC12:54
*** GonZo2000 has quit IRC13:07
*** sonuk has joined #openstack-dns13:09
*** venkat has quit IRC13:09
<openstackgerrit> Vishal kumar mahajan proposed openstack/designate: Replace assertEqual(None, *) with assertIsNone in tests  https://review.openstack.org/269036  13:16
*** _RuiChen has quit IRC13:23
*** RuiChen has joined #openstack-dns13:24
<en_austin> federico3: yeah, I'm here but on a slow connection; I'll be happy to pick this up with you in 1.5 hours  13:30
*** jordanP has joined #openstack-dns13:32
<federico3> sure  13:32
<en_austin> you'll be here?  13:32
*** fawadkhaliq has quit IRC13:34
*** ducttape_ has joined #openstack-dns13:36
<en_austin> And just for my info - which timezone are you in? :)  13:36
<federico3> I will - central Europe  13:44
*** nyechiel has joined #openstack-dns14:04
*** nyechiel_ has quit IRC14:04
<openstackgerrit> Merged openstack/designate: Add retry logic on periodic_sync to stable/liberty  https://review.openstack.org/264891  14:08
*** ducttape_ has quit IRC14:12
*** ducttape_ has joined #openstack-dns14:32
*** fawadkhaliq has joined #openstack-dns14:38
*** naggappan has quit IRC14:48
*** rsyed is now known as rsyed_away14:48
*** penchal has quit IRC14:51
*** sonuk has quit IRC14:56
*** rsyed_away is now known as rsyed14:57
*** ducttape_ has quit IRC15:03
<en_austin> federico3: I'm here  15:06
*** rsyed is now known as rsyed_away15:15
*** rsyed_away is now known as rsyed15:15
*** rsyed is now known as rsyed_away15:18
*** rsyed_away is now known as rsyed15:18
*** mlavalle has joined #openstack-dns15:18
*** chlong has quit IRC15:31
*** pglass has joined #openstack-dns15:37
*** nkinder has joined #openstack-dns15:39
<federico3> en_austin: yup  15:48
*** ducttape_ has joined #openstack-dns15:50
<en_austin> so  15:58
<en_austin> how can we debug my issue? :)  15:58
<en_austin> and what could it be, exactly?  15:58
*** ducttape_ has quit IRC15:59
*** ducttape_ has joined #openstack-dns16:02
<en_austin> federico3: ping? ))  16:03
*** karimb has joined #openstack-dns16:05
<federico3> en_austin: you said the logs were from a failure; at the bottom of http://paste.openstack.org/show/msl5N4tjlGDkp3tR2YLN/  16:05
<federico3> ...I'm seeing "Consensus reached for updating zone ap.int.zone"  16:05
<en_austin> see - it had already failed, and I tried to create a new record (and grep'd for the query id in the logs)  16:06
<en_austin> that's where my logs are from  16:06
<en_austin> Maybe the last merge ("add retry logic on periodic_sync") will help here? I don't know..  16:07
<en_austin> The situation is: for some reason Designate stops pushing new records to the server, but expects the pushes to have been done  16:08
<federico3> this is a log *after* the failure? So we are witnessing a successful run?  16:08
<en_austin> and, at the next sync, it tries to sync the SOAs and gets a failure (the SOA serial on the backends does not match Designate's actual one)  16:08
<en_austin> Yes. This is an API query log _after_ a failure, not before. I.e. the system is ALREADY in the failed state.  16:09
<federico3> ok, if you look at "Got lower serial for 'ap.int.zone.' to '10.28.0.17:53'."  16:09
<federico3> ...the two serials were generated at:  16:09
<federico3> Fri, 15 Jan 2016 07:26:12 GMT  16:09
<federico3> Thu, 14 Jan 2016 20:53:22 GMT  16:09
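Those dates can be read straight off the serials because Designate's default zone serial numbering is unixtime-based; a minimal sketch of the conversion (the serial value below is derived from the quoted date, not copied from the log):

    from datetime import datetime

    # Assuming unixtime-based SOA serials (Designate's default), a serial is
    # just seconds since the epoch and maps directly to a wall-clock time.
    serial = 1452804802
    print(datetime.utcfromtimestamp(serial))   # -> 2016-01-14 20:53:22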
<federico3> does it sound correct to say that the last timestamp was the last successful run?  16:11
<en_austin> I can't confirm it 100%, but according to users' reports it looks like the truth, yes.  16:11
<en_austin> 15 Jan 07:26:12 GMT - that's the timestamp of my test query (the last one)  16:12
<en_austin> after that I restarted Designate to bring it back to life  16:12
<federico3> you said it failed during the night: last successful update at 20:53:22 GMT and then you tried a new update at 07:26:12 GMT on the next day?  16:12
<en_austin> to be clearer, I started restarting components step by step to determine which component caused this failure  16:13
<en_austin> Yep.  16:13
<en_austin> I woke up to an SMS saying "your DNS failed again" and began to investigate it :(  16:13
<federico3> good, now can you look for logs around 2016-01-14 20:53:22?  16:13
<en_austin> Sure, if they haven't been rotated yet :)  16:13
<federico3> (I warmly recommend storing the logs remotely for much longer)  16:15
<en_austin> 20:53:22 - no logs at all; there are some logs at 20:53:XX (other values there).  16:15
<en_austin> no anomalies (will share 'em now)  16:16
<en_austin> http://paste.openstack.org/show/NV5ycnsncLhUlecU2i3d/  16:16
<federico3> please grep for 1452840802  16:17
<en_austin> http://pastebin.com/rFJJ89tr  here  16:20
<en_austin> (paste.openstack.org didn't let me paste that much text) :(  16:21
<en_austin> (daaaamn, it's cut off here too :(  16:21
<federico3> you can attach files to Launchpad  16:22
*** james_li has joined #openstack-dns16:22
<en_austin> oh god, 4M of logs with this SOA ((  16:24
<en_austin> see Launchpad :)  16:26
<federico3> ok, fetching it  16:27
<federico3> aha, it kept erroring until morning  16:30
*** ccneill has joined #openstack-dns16:31
<federico3> grep '^2016-01-14 23:5' pool-manager.log.3 | gzip > out.log.gz         ...will capture 10 minutes of logs around the event  16:32
*** ccneill_ has joined #openstack-dns16:33
<en_austin> see Launchpad :)  16:33
*** ccneill has quit IRC16:34
<federico3> thanks  16:34
*** ccneill_ is now known as ccneill16:34
*** nyechiel has quit IRC16:39
-openstackstatus- NOTICE: Gerrit is restarting quickly as a workaround for performance degradation  16:48
<en_austin> any ideas?)  16:55
<federico3> it might take a little while  16:57
<federico3> en_austin: how big is the whole pool-manager.log.3 file? It should compress very well with "xz -k pool-manager.log.3" - would you consider uploading it if it does not contain any security-sensitive information?  17:05
<en_austin> it contains a lot of private data, so... is there any way to share it only with you (not publicly)?  17:10
*** rudrajit has joined #openstack-dns17:10
<en_austin> it's 51.2M now  17:11
<federico3> 51 after compression?  17:11
<en_austin> no, raw  17:11
<en_austin> 1.3M after compression  17:12
<en_austin> hm, wait. it seems there isn't much private data in the PM logs.  17:14
*** rudrajit has quit IRC17:14
<en_austin> all of it is in the api/central logs.  17:14
<en_austin> am I right?  17:14
*** rudrajit has joined #openstack-dns17:14
<federico3> it depends on whether things like zone names, amount of traffic, resolver IP addrs are sensitive for you  17:14
<en_austin> I've already shared zone names and backend IPs, so...  17:15
<en_austin> ok, I'll upload it now.  17:15
<federico3> I'm looking for a way to upload embargoed files to Launchpad  17:15
<en_austin> done  17:16
<en_austin> check Launchpad )  17:16
<en_austin> Launchpad allows locking an issue to specified users, afaik.  17:16
<federico3> it's quite weird that LP has no access control on this stuff  17:16
<federico3> aha  17:17
<en_austin> Maybe let's do that?  17:17
<federico3> set to Private  17:17
<federico3> albeit it would make more sense to leave the bug Public and keep only the attachments private  17:17
<en_austin> I don't know how to do that  17:18
<en_austin> not sure that's possible  17:18
<federico3> apparently not - I made the bug private  17:23
<en_austin> good  17:23
*** ducttape_ has quit IRC17:24
<en_austin> I'm afk for an hour - time to drive home :)  17:34
<en_austin> will you still be here?  17:34
*** jordanP has quit IRC17:35
<federico3> yep, we'll investigate  17:39
*** ducttape_ has joined #openstack-dns17:39
<en_austin> OK, will contact you here in 1-1.5 hours :)  17:41
*** nyechiel has joined #openstack-dns17:52
*** ducttape_ has quit IRC17:53
*** karimb has quit IRC17:53
*** fawadkhaliq has quit IRC18:06
*** fawadkhaliq has joined #openstack-dns18:10
*** fawadkhaliq has quit IRC18:11
*** rsyed is now known as rsyed_away18:19
*** james_li has quit IRC18:30
*** doublek has joined #openstack-dns18:32
<en_austin> federico3: so...  18:39
<en_austin> any ideas? :)  18:39
*** doublek has quit IRC18:41
<federico3> the first error seems to be at 00:04:28.414 12786 as part of transaction req-5f0b0d72-b5f7-40a7-b7f2-a46013cc8304  18:42
<federico3> Could not retrieve status and serial for domain aqa.int.zone. on nameserver 10.28.0.18:53 with action UPDATE (<class 'oslo_messaging.exceptions.MessagingTimeout'>: Timed out waiting for a reply to message ID  18:42
<en_austin> should I grep for this request id?  18:43
<federico3> it's the only error of that kind, and we are broken from there on  18:43
<federico3> no need, everything needed is in the log file  18:43
<en_austin> well  18:43
<en_austin> so... what are our next steps to investigate this?  18:44
<en_austin> and where is the problem - RabbitMQ, oslo.messaging or some Designate-related code?  18:44
<federico3> the timeout is raised in _retrieve_from_mdns, /opt/designate/designate/pool_manager/service.py:656  18:47
<federico3> we might not handle the timeout correctly, and maybe insert something into the cache that blocks the pool manager permanently  18:47
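A minimal sketch of the failure mode being hypothesized here (the function, RPC method and cache interface are illustrative, not Designate's actual pool-manager code): if the RPC call to MiniDNS times out and the exception escapes, no ERROR status gets recorded and a stale cache entry can keep the zone stuck until the service is restarted.

    from oslo_messaging import exceptions as messaging_exc

    def retrieve_serial(mdns_api, context, zone, nameserver, cache):
        """Guard an mDNS RPC call against timeouts (hypothetical sketch)."""
        try:
            status, serial = mdns_api.get_serial_number(context, zone, nameserver)
        except messaging_exc.MessagingTimeout:
            # Record an explicit ERROR instead of leaving stale state behind,
            # so a later periodic sync can retry the zone.
            cache.store(zone.name, {'status': 'ERROR', 'serial': None})
            return None
        cache.store(zone.name, {'status': status, 'serial': serial})
        return serial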
<en_austin> should I just disable the cache entirely?  18:49
<en_austin> btw, I'm using memcached, not MySQL  18:49
<en_austin> maybe that's the reason?  18:49
<federico3> cat -n /opt/designate/designate/pool_manager/service.py | grep 656 -C3  18:52
<federico3> just to see what exactly is around that line :)  18:53
<en_austin> http://paste.openstack.org/show/sz3ZkUpG5gpnqOmSPjvc/  18:55
*** james_li has joined #openstack-dns19:01
<federico3> that's the smoking gun.   grep 'Could not retrieve status' pool-manager.log*   will show you *all* similar errors in the logs from the past days  19:02
<en_austin> any ideas how to prevent/fix this?  19:04
<federico3> we might need a patch - in the meantime, FWIW, it could help to ensure there's good network connectivity between PM and MiniDNS  19:06
<en_austin> yep, three failures last week - exactly matching what happened in reality.. for now I can only parse the logs in realtime and restart PoolManager when this appears  19:06
<en_austin> they are on the same box  19:06
<federico3> how many PMs do you have?  19:06
<en_austin> and they use 127.0.0.1 for communication  19:06
<en_austin> one  19:07
<en_austin> I've pasted my config in the bug report  19:07
<federico3> a restart is a good workaround in the very short term, just make sure it doesn't trigger a restart loop :)  19:07
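A minimal sketch of such a short-term watchdog, assuming systemd manages the pool manager and that the log path, unit name and error string below match the deployment (all of them are assumptions to adjust); the cooldown is what keeps it from turning into the restart loop mentioned above.

    import subprocess
    import time

    LOG_PATH = '/var/log/designate/pool-manager.log'   # assumed log location
    ERROR_MARK = 'Could not retrieve status and serial'
    COOLDOWN = 600                                      # minimum seconds between restarts

    def watch_and_restart():
        last_restart = 0.0
        # Follow the log like `tail -F` and restart PM whenever the error shows up.
        tail = subprocess.Popen(['tail', '-F', LOG_PATH],
                                stdout=subprocess.PIPE, universal_newlines=True)
        for line in tail.stdout:
            if ERROR_MARK in line and time.time() - last_restart > COOLDOWN:
                subprocess.call(['systemctl', 'restart', 'designate-pool-manager'])
                last_restart = time.time()

    if __name__ == '__main__':
        watch_and_restart()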
*** ducttape_ has joined #openstack-dns19:09
*** GonZo2000 has joined #openstack-dns19:10
*** GonZo2000 has quit IRC19:10
*** GonZo2000 has joined #openstack-dns19:10
<federico3> also, can you grep the pool manager's logs for '2016-01-15 00:03'? That should be a minute's worth of logs - do you see any errors there?  19:11
*** james_li has quit IRC19:13
<en_austin> sure  19:14
*** ducttape_ has quit IRC19:15
<en_austin> 2016-01-15 00:03:29.378 <- here is the same error  19:16
<en_austin> "could not retrieve status & serial"  19:16
<federico3> d'you mind uploading the 10 minutes (or even less if there's sensitive data)?  19:21
<en_austin> I can upload the full PM logs since the bug is already private :)  19:22
<federico3> thanks  19:23
<en_austin> well wait, I've already uploaded pool-manager-3.log  19:27
<en_austin> or are you talking about different logs?  19:27
<federico3> oh sorry, I meant minidns!  19:28
*** mlavalle has quit IRC19:29
<federico3> either PM failed to communicate to/from mDNS, or mDNS did not receive answers from the resolvers - in both cases PM should have handled the failure better  19:29
*** mlavalle has joined #openstack-dns19:29
*** fawadkhaliq has joined #openstack-dns19:35
<en_austin> PM or mDNS should have handled it? ;)  19:36
<federico3> PM should have failed more gracefully: an ERROR message instead of being broken forever  19:37
<en_austin> I think it should retry the activity that failed, not break forever  19:38
<en_austin> well, there is no mDNS activity at all for these timestamps  19:38
<en_austin> (I grep'd for "2016-01-15 00:03") - only api & PM logs  19:38
<federico3> none at all? So it seems most likely that the communication between PM and mDNS failed. Are they on different hosts?  19:39
<en_austin> none at all.  19:39
<en_austin> no, they are on a single box  19:39
<en_austin> and use 127.0.0.1 to communicate  19:39
<en_austin> rabbitmq is also on the same box as the Designate services  19:39
<federico3> humm, odd.  19:40
<federico3> if you grep for "2016-01-15 00:0" and then for mdns, is there any other mdns activity?  19:42
<en_austin> no, only api & PM  19:42
<federico3> not even for "2016-01-15 00"? When is the last message from mdns?  19:43
<en_austin> I don't understand what's wrong with the mdns logs; there are no logs at all from mdns for 15 Jan  19:44
<en_austin> it seems there are so many logs that they've already been rotated :(  19:45
<en_austin> I have logs from 16 Jan till now  19:45
<en_austin> will now try to find such a freeze in this period  19:46
<en_austin> well, for example, it fired today. Do you want to see mDNS logs since the point of failure, or for the whole day?  19:48
<federico3> mDNS is on the same host as PM - how many processes, just one?  19:49
<en_austin> yep  19:49
<federico3> I'm trying to make sure that mDNS was up before blaming PM  19:49
<federico3> do you have any logs from mDNS at all from the last 24h?  19:50
<en_austin> sure, it's running on my prod server :)  19:50
<federico3> could you upload the last day, please?  19:52
<en_austin> yep  19:52
<en_austin> done  19:53
<federico3> got it  19:54
<en_austin> awh, wait  19:54
<en_austin> it's not complete  19:54
<en_austin> :(  19:54
<en_austin> the "xz" process is still running on the Designate box )  19:54
<federico3> how big is the original file?  19:55
<federico3> maybe we can filter out a smaller range  19:55
<en_austin> they're split into 50M chunks  19:56
*** fawadkhaliq has quit IRC19:56
<en_austin> I grep'd through all of them for the date pattern  19:56
<en_austin> attachment uploaded  19:56
<en_austin> 6M when archived :)  19:56
<federico3> and 550MB uncompressed :)  19:58
<en_austin> pool-manager.log.1:2016-01-18 10:24:55.722 4719 DEBUG designate.pool_manager.service [req-09d6bcff-ef61-44bc-b5ee-7d73824d4df6 noauth-user noauth-project - - -] Could not retrieve status and serial  20:01
<en_austin> the most recent failure as of now  20:01
<federico3> I'm confused, the logfile seems to contain data from the 18th at hour 22 GMT (which is in the future?)  20:03
<federico3> cut -d':' -f2- mdns_jan18 | cut -c 1-15 | uniq -c  20:03
<en_austin> [root@designate designate]# date  20:04
<en_austin> Mon, Jan 18 23:04:04 MSK 2016  20:04
<federico3> (are you logging in your local timezone?)  20:04
<en_austin> it seems these logs are in the local timezone  20:05
<en_austin> not GMT :)  20:05
<federico3> indeed  20:05
<en_austin> I've verified - the log timestamps are not in UTC  20:08
<federico3> yep  20:08
*** nyechiel has quit IRC20:09
*** james_li has joined #openstack-dns20:09
<federico3> anyhow, you said you just had a failure from PM?  20:17
<en_austin> yeah. the last time Designate failed, I started restarting it step by step, not a bulk "restart all"  20:18
<en_austin> and that way I determined that if I restart the PM, Designate returns to its normal state :)  20:18
<federico3> en_austin: btw, one of your 2 resolvers seems to be timing out from time to time  20:21
<federico3> grep 'Unhandled exception while processing request' mdns_jan18  20:21
<en_austin> Yeah, I see. Could this be related to the PM failure?  20:23
<federico3> or better:  20:23
<federico3> grep -B20 'timeout: timed out' mdns_jan18 | grep 'Unhandled exception while processing'  20:23
<en_austin> aha, I see. Seems strange..  20:24
<federico3> it could be a contributing factor. If you grep all your logs for that timeout, do you see any interesting pattern over time?  20:25
<en_austin> btw, both the 0.17 and 0.18 resolvers are in these logs (though 0.18 failed many more times than 0.17)  20:25
<en_austin> mdns generates too many logs - they have already been rotated :(  20:26
*** jasonsb has joined #openstack-dns20:29
*** jasonsb has quit IRC20:31
*** jasonsb has joined #openstack-dns20:31
*** rudrajit has quit IRC20:37
<federico3> as a mitigation it could make sense to ensure that connectivity to the resolvers is good and that they are healthy in general  20:40
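A minimal resolver health-check sketch along those lines, using dnspython (which Designate itself builds on); the zone name and nameserver addresses are taken from the discussion above and would need adjusting for a real deployment. Comparing the serial each backend returns against the one Designate expects is a quick way to spot a wedged or unreachable backend.

    import dns.exception
    import dns.message
    import dns.query
    import dns.rdatatype

    RESOLVERS = ['10.28.0.17', '10.28.0.18']   # the two backend nameservers discussed above
    ZONE = 'ap.int.zone.'

    def check_soa(zone, server, timeout=3.0):
        """Return the SOA serial the nameserver reports for the zone, or None on timeout."""
        query = dns.message.make_query(zone, dns.rdatatype.SOA)
        try:
            response = dns.query.udp(query, server, timeout=timeout)
        except dns.exception.Timeout:
            return None
        for rrset in response.answer:
            if rrset.rdtype == dns.rdatatype.SOA:
                return rrset[0].serial
        return None

    for server in RESOLVERS:
        print('%s -> %s' % (server, check_soa(ZONE, server)))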
<en_austin> to tell the truth, I don't think these timeouts are directly connected to the PM failure - it's some AMQP task that failed...  20:41
<en_austin> so, what should I do from here?  20:42
<en_austin> are there any action items you would recommend?  20:42
<federico3> yep, it's difficult to tell from here  20:43
<federico3> you said that restarting PM helps and restarting mDNS does not, right?  20:43
<federico3> you can trigger an automated restart of PM when the error shows up in PM's logfile, as you said  20:43
<en_austin> yeah, but that's a workaround, not a solution to the problem...  20:44
<federico3> yep, the solution will be a bugfix in the next release and a backport to liberty (are you running liberty?)  20:45
<en_austin> I can contact one of the devs when it fails again - but it shouldn't be over IRC, due to the long delay between an IRC ping and a reply ;(  20:45
<en_austin> yeah, liberty.  20:45
*** rudrajit has joined #openstack-dns20:53
<timsim> To be fair, it's an open-source project. While the devs will certainly do the best they can to support you, it's not an entitlement.  20:54
<en_austin> sure - I meant "to investigate the issue as soon as it happens, not after X hours of delay"  20:55
<en_austin> because it's an intermittent bug, not a readily reproducible issue. anyway, the community has already helped me a lot, and I am very grateful for that :)  20:57
<federico3> en_austin: I'm trying to reproduce it; if it happens again it might be useful to check whether there's anything interesting in the mdns logs, and put it on Launchpad  20:57
<en_austin> OK, I'll do so. But, afaik, there is no mDNS activity at all while in the failed state.  20:58
<federico3> en_austin: you can also ping me here  20:59
<en_austin> I'll keep it in mind ) thank you for the help, anyway. Hopefully restarts will keep things stable until you release a patch for it.  21:02
<en_austin> btw, are there any shared discussions among the devs about bugs?  21:02
<en_austin> if so, are external people (regular users like me) allowed to join them?  21:02
*** james_li has quit IRC21:03
<federico3> https://wiki.openstack.org/wiki/Meetings/Designate this one  21:03
<en_austin> well, are non-devs allowed to attend it?  21:06
<en_austin> if you discuss my bug, maybe I can provide up-to-date info about it :)  21:06
<federico3> the weekly meeting is not a deep dive into each bug; we can discuss bugs here at any time :)  21:09
*** ducttape_ has joined #openstack-dns21:11
<en_austin> ok, not a problem  21:11
<en_austin> I'm asking because I've only talked about this problem with you - maybe someone else would say "awh, I know the reason - it's X" - that's why I'm asking about your meetings and bug triage  21:12
<en_austin> well.. what times are you available here? standard business hours, or do you have your own schedule? :)  21:13
*** ducttape_ has quit IRC21:15
*** james_li has joined #openstack-dns21:17
<federico3> western European time, more or less  21:21
<en_austin> well, okay  21:24
<en_austin> it seems I've done my best to help you investigate it  21:25
<en_austin> if there is anything else I can do - drop a line in the bug or ping me here, I'll be happy to help you with it )  21:26
*** sonuk has joined #openstack-dns21:29
<federico3> sure, thank you!  21:30
<en_austin> thank you for your time too - I appreciate your help =)  21:31
<en_austin> 12:31am, it's time to sleep... hope it won't fail tonight :)  21:32
*** karimb has joined #openstack-dns21:32
<federico3> and hope the restart script will kick in if it does :)  21:32
<en_austin> it will only be written tomorrow :D  21:33
*** rudrajit has quit IRC21:44
*** jordanP has joined #openstack-dns22:02
*** chlong has joined #openstack-dns22:12
*** rudrajit has joined #openstack-dns22:13
*** bpokorny has joined #openstack-dns22:14
*** jordanP has quit IRC22:30
*** sonuk has quit IRC22:36
*** james_li has quit IRC22:40
<openstackgerrit> OpenStack Proposal Bot proposed openstack/designate: Updated from global requirements  https://review.openstack.org/268436  22:40
<openstackgerrit> OpenStack Proposal Bot proposed openstack/python-designateclient: Updated from global requirements  https://review.openstack.org/268508  22:45
*** rudrajit_ has joined #openstack-dns22:47
*** james_li has joined #openstack-dns22:48
*** rudrajit has quit IRC22:50
*** rudrajit has joined #openstack-dns22:57
*** rudrajit_ has quit IRC22:59
*** jasonsb has quit IRC23:00
*** jasonsb has joined #openstack-dns23:01
*** rudrajit_ has joined #openstack-dns23:04
*** ducttape_ has joined #openstack-dns23:06
*** rudrajit has quit IRC23:07
*** km_ has joined #openstack-dns23:09
*** james_li has quit IRC23:21
*** ducttape_ has quit IRC23:32
*** rudrajit has joined #openstack-dns23:38
*** rudrajit_ has quit IRC23:40
*** kei_yama has joined #openstack-dns23:42
*** pglass has quit IRC23:47
*** kei_yama has quit IRC23:59

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!