Friday, 2019-08-30

*** nicolasbock has quit IRC02:52
*** aniketh has joined #openstack-dns04:53
*** trident has quit IRC07:10
*** trident has joined #openstack-dns07:18
*** trident has quit IRC07:22
*** trident has joined #openstack-dns07:30
*** ivve has joined #openstack-dns07:50
*** salmankhan has joined #openstack-dns09:47
*** salmankhan has quit IRC09:50
*** salmankhan has joined #openstack-dns09:51
*** salmankhan has quit IRC09:51
*** salmankhan has joined #openstack-dns09:52
*** trident has quit IRC10:23
*** bnemec has quit IRC10:23
*** frickler has quit IRC10:23
*** stingrayza has quit IRC10:23
*** salmankhan has quit IRC10:24
*** eandersson has quit IRC10:24
*** trident has joined #openstack-dns10:24
*** frickler has joined #openstack-dns10:24
*** irclogbot_0 has quit IRC10:25
*** salmankhan has joined #openstack-dns10:27
*** stingrayza has joined #openstack-dns10:27
*** irclogbot_2 has joined #openstack-dns10:27
*** bnemec has joined #openstack-dns10:28
*** brensen has joined #openstack-dns10:48
brensenHi guys, we are having an issue for a day now running python-designate-7.0.0-1.el7.noarch where mdns is logging 2019-08-30 10:45:12.056 63 WARNING designate.mdns.xfr [req-ceaa5078-aced-4e8e-910e-e4a8b59d9dc2 c1f975ffe36b4de5855540b1cd7f1c0a f32a603f32624870afb39311f9b89e3f - - -] XFR failed for XXX.XXX.XXX.. No servers in [] was reached.: XFRFailure: XFR failed for XXX.XXX.XXX.. No servers in [] was reached.10:51
brensenand no notifications are sent out10:52
brensenwe are trying to find out where this list should be coming from but so far without success10:52
brensenany clues?10:53
brensenpool show_config shows correct information10:53
brensendatabase inspection also did not reveal anything unexpected10:54
*** nicolasbock has joined #openstack-dns11:05
mugsiebrensen: and if you ssh to the mdns node, and try a `nc XXX.XXX.XXX.XXX 53` does it work?11:08
brensenyes it seems to be running fine11:08
brensencan also dig it, and get the latest expected serial etc11:09
brensenit seems it's getting to this point with an empty list11:09
brensenfor srv in servers:        to = eventlet.Timeout(timeout)        log_info = {'name': zone_name, 'host': srv}        try:            LOG.info("Doing AXFR for %(name)s from %(host)s", log_info)            xfr = dns.query.xfr(srv['host'], zone_name, relativize=False,                                timeout=1, port=srv['port'], source=source)11:09
brensenoh11:09
brensenformatting....11:09
brensendef do_axfr(zone_name, servers, timeout=None, source=None):11:10
brensenthere servers is empty11:10
brensendnsutils.py11:11
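The failure mode described above (the loop in do_axfr never runs because servers is empty, so the function falls straight through to the error) can be reproduced with a small stand-alone sketch. This is not the Designate source: XFRFailure here is a local stand-in for the exception named in the log, and do_axfr_sketch and its fetch callable are hypothetical names used only for illustration.

```python
class XFRFailure(Exception):
    """Local stand-in for the exception named in the mdns log line."""


def do_axfr_sketch(zone_name, servers, fetch):
    """Try each server in turn; raise XFRFailure if none of them answers.

    `fetch` is a hypothetical callable taking (server, zone_name) that
    returns the transfer result or raises OSError on failure.
    """
    for srv in servers:
        try:
            return fetch(srv, zone_name)
        except OSError:
            # This server could not be reached; try the next one.
            continue
    # With an empty `servers` list the loop body never runs, so we land
    # here immediately and the message contains "[]".
    raise XFRFailure(
        "XFR failed for %s. No servers in %s was reached." % (zone_name, servers)
    )


try:
    do_axfr_sketch("blah.cloud.", [], lambda srv, zone: None)
except XFRFailure as exc:
    print(exc)
```

The printed message has the same shape as the warning in the mdns log, which suggests the real question is why an empty master list reaches this code path for a PRIMARY zone at all.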
mugsiecan you do an `openstack zone list --all-projects` and see if there are any secondary zones created?11:11
brensenlet me check11:11
mugsiethat comes from secondary zones, not normal operations11:11
brensenonly type 'PRIMARY' in the list11:11
brensennow I got a couple in PENDING state when I ran a 'zone set' on it11:12
mugsiethe 'do_axfr' method should only be used for secondary (designate pulling from other DNS servers)11:12
brensenah cool, thanks11:13
brensenso this should not happen at all11:13
brensenhmmm11:13
mugsietry `openstack zone list --type SECONDARY --all-projects` ?11:14
brensenreturns nothing11:15
mugsiecan you see if there is one in the DB? someone might have created and deleted it11:15
mugsieand the old "pull" zone task could still be active11:16
brensenah good idea11:16
brensenlet me check11:16
brensenhow do I post snippets via webchat here?11:16
brensenonly PRIMARY in DB11:17
mugsiepaste.openstack.org11:17
mugsieis XXX.XXX.XXX a domain, or an IP11:17
brensenhttp://paste.openstack.org/raw/SweuGPPL1LgY0FWM0JaS/11:17
brensenin the log it shows domain names11:18
mugsiethe config looks good11:19
mugsiedid you try restarting the mdns service?11:19
brensenyeah it did not change and worked fine11:19
brensenyes we've redeployed all the components multiple times already11:20
brensenwe are running them under nomad in containers11:20
mugsieOK .11:20
mugsielet me dig a little - brb11:20
brensenthere used to be a zookeeper issue reported from central, but that was fixed11:20
brensenthanks man!11:20
mugsieOK - something is sending a notify to mdns for that zone I think11:25
mugsieis there a line "Scheduling AXFR for %(zone_id)s" in the mdns logs?11:26
brensenit happens whenever we do a `openstack zone set blah.cloud.`11:26
brensennope11:26
mugsieis there a "Triggered XFR for" log in the API logs?11:27
brensen2019-08-30 10:45:12.051 72 INFO designate.utils [-] Opening UDP Listening Socket on 172.17.49.14:25453 2019-08-30 10:45:12.051 70 INFO designate.service [-] _handle_udp thread started11:27
brensenlots of these11:27
mugsielots? that should be at boot11:27
brensen2019-08-30 10:45:11.981 52 INFO designate.mdns.base [-] Initialized mDNS notify endpoint 2019-08-30 10:45:11.982 52 INFO designate.mdns.base [-] Initialized mDNS xfr endpoint11:27
brensenhmmm11:28
brensenhttp://paste.openstack.org/raw/II8gwc84YTkRdlUgHB88/11:29
brensenit does seem to align with workers=1011:30
mugsieopenstack dns service list ?11:31
mugsieI am assuming 25453 is mapped to 5453 by nomad?11:32
brensenthat service list seems to be outdated, it lists a lot of old hostnames11:33
brensenlet me double check that port mapping11:33
brensenwhy you ask about 25453 <-> 5453?11:35
brensenam I missing something?11:35
mugsieah, no I misread your pools config11:36
mugsieand the zone doesn't work?11:36
mugsieis there any way to get some more of the logs around the error?11:37
brensenwell it's on the pdns master already, but it does not update anymore, until the master determines the zone is stale and initiates a transfer itself11:37
mugsieif you search across all the logs from the services for the request id does it show anything?11:37
brensenmdns does not seem to send out a notification anymore11:37
brensenlet me check, our centralised logging is not worky so we have to check all containers separately, 1 sec11:38
brensendoes it matter if the service list is not correct?11:40
brensenit still lists old worker nodes with status UP11:41
mugsieno, that's kind of expected11:41
mugsie(that feature is problematic)11:41
brensenit all worked fine after some re-deployments, never actually looked at the service list tbh11:41
brensenok11:41
brensen2019-08-30 11:45:49.834 18 WARNING designate.central.service [-] Managed Resource Tenant ID is not properly configured11:47
brensenis that important?11:47
*** jawad_axd has joined #openstack-dns11:48
brensencould it be related to some zookeeper issues?11:50
brensenwhat does it use zookeeper for?11:50
brensen2019-08-30 11:52:36.984 23 INFO designate.mdns.rpcapi [req-328ad6f4-a8a1-44f5-8a2b-b52efc4d0c2e c1f975ffe36b4de5855540b1cd7f1c0a f32a603f32624870afb39311f9b89e3f - - -] perform_zone_xfr: Calling mdns for zone blah.cloud.11:53
brensenfrom central log11:54
brensenI really cannot find anything else related in the logs12:00
brensenhttp://paste.openstack.org/raw/uYt99EuvWJt4ECHWku41/12:02
brensenprovisioner UNMANAGED <- is that correct?12:02
mugsiethat is12:13
mugsieit is set up for private (per project ) pools12:14
brensenok thanks12:14
mugsiebut they don't exist yet, so they are all unmanaged12:14
mugsiewhat is before the line for calling perform_zone_xfr12:14
mugsie?12:14
mugsiethere is no way the "perform_zone_xfr" should be called if the zone is not SECONDARY12:16
brensenhttp://paste.openstack.org/raw/NQkLsAmlw1gji6yEJQht/12:16
brensenit's happening for all zones right now, which are all PRIMARY12:19
mugsiedoes designate-producer have anything in its logs?12:24
brensenhttp://paste.openstack.org/raw/l0m3sLxbFrZbA1C0dSub/12:26
mugsiecan you run with DEBUG on?12:27
brensenI think we have it on, or we misconfigured something12:30
brensenah that might not be the case, let me fix that12:32
brensenrestarting with debug=true12:35
* mugsie is grabbing some lunch, will be back in a bit12:41
brensenenjoy12:49
brensenhttp://paste.openstack.org/raw/CX8OK4JnSKDqH5MSHKmt/12:49
*** jawad_axd has quit IRC13:02
*** ygk_12345 has joined #openstack-dns13:09
ygk_12345hi all13:09
ygk_12345i am having issues with designate. the zones I create are not transferring properly. It is intermittently working13:09
ygk_12345can someone help me please13:10
ygk_12345I see this error often13:10
ygk_12345Stderr: u'rndc: connection to remote host closed\nThis may indicate that\n* the remote server is using an older version of the command protocol,\n* this host is not authorized to connect,\n* the clocks are not synchronized,\n* the key signing algorithm is incorrect, or\n* the key is invalid.\n'.13:11
*** KeithMnemonic has joined #openstack-dns13:21
mugsieygk_12345: can you run rndc manually from the command line?13:36
ygk_12345from the backend bind server u mean ?13:36
mugsiefrom the designate server13:36
ygk_12345or in the designate containers ?13:36
ygk_12345what is the full command ?13:36
mugsiein the containers13:36
mugsieit should be logged just before that line afaik13:36
ygk_12345mugsie what command do I need to run ?13:39
mugsieygk_12345: it should be in the logs just before the error you pasted13:39
ygk_12345oh ok13:40
ygk_12345let me try once13:40
ygk_12345mugsie i see this error13:42
ygk_12345mugsie http://paste.openstack.org/show/767640/13:42
mugsierndc -s 172.29.236.18 -p 953 addzone trsuted-zone.com  '{ type slave; masters { 172.29.236.103 port 5354;}; file "slave.trsuted-zone.com.9b6d5299-3c18-4324-833a-b8a48f20eece"; };13:43
mugsiebash: syntax error near unexpected token `}'13:43
mugsieyou are missing the first "' ' "13:43
ygk_12345oh ok13:44
ygk_12345mugsie can you paste the exact command please. I am unable to figure out the syntax13:46
mugsie/openstack/venvs/designate-18.1.9/bin/designate-rootwrap /etc/designate/rootwrap.conf rndc -s 172.29.236.18 -p 953 addzone trsuted-zone.com  '{ type slave; masters { 172.29.236.103 port 5354;}; file "slave.trsuted-zone.com.9b6d5299-3c18-4324-833a-b8a48f20eece"; };13:47
ygk_12345mugsie it is showing > on the next line13:48
ygk_12345mugsie got it now13:49
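The quoting trouble in the exchange above comes from the zone definition argument, which contains spaces, braces, semicolons and double quotes and therefore has to reach rndc as a single shell word. A bare `>` prompt in bash means the shell is still waiting for a closing quote. A minimal Python sketch of building the same command with the standard library's shlex.quote follows; build_rndc_addzone is a hypothetical helper name, and the addresses and file name are simply the ones pasted in the channel.

```python
import shlex


def build_rndc_addzone(server, port, zone, master_ip, master_port, zone_file):
    """Return (argv, shell_line) for an `rndc addzone` call.

    The zone definition is kept as ONE argv element so that the shell
    never gets a chance to split it on spaces or semicolons.
    """
    zone_config = '{ type slave; masters { %s port %d;}; file "%s"; };' % (
        master_ip,
        master_port,
        zone_file,
    )
    argv = ["rndc", "-s", server, "-p", str(port), "addzone", zone, zone_config]
    # shlex.quote wraps any element containing shell metacharacters in
    # single quotes, giving a line that is safe to paste into bash.
    shell_line = " ".join(shlex.quote(arg) for arg in argv)
    return argv, shell_line


argv, shell_line = build_rndc_addzone(
    "172.29.236.18",
    953,
    "trsuted-zone.com",
    "172.29.236.103",
    5354,
    "slave.trsuted-zone.com.9b6d5299-3c18-4324-833a-b8a48f20eece",
)
print(shell_line)
```

The printed line has both the opening and the closing single quote around the zone definition. When calling from Python rather than a shell, passing argv directly to subprocess.run avoids the quoting question altogether.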
ygk_12345it is working one second and not working the other second13:49
mugsieis there anything in the bind logs?13:51
ygk_12345mugsie when it works the zone transfer is very slow and sometimes errors out.. but when I restart the workers in the two containers, it works, and then after some time it gets back to that same state13:54
mugsieworkers, or the mdns services?13:55
ygk_12345workers13:57
ygk_12345now I am not able to delete the zones13:57
ygk_12345Stderr: u"rndc: 'delzone' failed: not found\nno matching zone 'fun-zone.com' in any view\n"13:57
mugsiewhat do the bind logs say?13:57
* mugsie has a meeting, will be back13:58
ygk_12345showing this http://paste.openstack.org/show/767642/14:09
ygk_12345Aug 30 14:10:19 dns named[2544]: invalid command from 172.29.236.103#59741: expired  (from the bind logs)14:10
*** ygk_12345 has quit IRC14:24
fricklersounds like there may be a clock sync issue14:48
brensenwe have 2 older instances of designate running, and I'm comparing the DB, seems like in the newer nothing is set in pool_attributes table while in the older it contains a key "internal"14:51
brensencould that be related?14:51
*** ivve has quit IRC14:51
brenseninspecting the object just before the XFR stuff shows: {u'transferred_at': None, u'attributes': OVO Objects, u'masters': OVO Objects}14:53
brensenwhat does that mean?14:53
mugsieit means that the masters have been set on the zone :/14:59
mugsie(which should only happen when it is a secondard)14:59
mugsiesecondary*14:59
brensenthat sounds like it should not happen in our case15:00
mugsiethe pool attributes are just key value pairs, for when you have multiple pools, and want to schedule zones between then15:00
mugsiethem*15:00
brensenok15:00
brensenso somehow it thinks the zones are SECONDARY15:01
mugsieit seems son15:01
brensenin the logs it shows as PRIMARY15:01
mugsieso*15:01
brensenprinting the object reveals: {u'transferred_at': None, u'attributes': OVO Objects, u'masters': OVO Objects} <Zone id:'803841a6-30dd-4103-880e-7c721cc38387' type:'PRIMARY' name:'blah.cloud.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1567177080' action:'UPDATE' status:'PENDING'>15:02
mugsiecan you print zone.masters?15:02
openstackgerritErik Olof Gunnar Andersson proposed openstack/designate master: New service layer  https://review.opendev.org/67843215:03
brensenwe are checking...15:03
brensen<ZoneMaster count:‘0’ object:‘ZoneMasterList’>15:04
mugsieok. so not trying it as a secondary then15:07
mugsiebrensen: does https://opendev.org/openstack/designate/src/branch/master/designate/producer/tasks.py#L221 get logged at all?15:07
brensenthat would be on producer?15:08
mugsieyeah15:08
brensenchecking15:10
brensenI don't think we saw something like this before, and we don't have much history (anymore)15:11
brensenhow often does it do this?15:11
brensenah it's configurable on the zones, let me check15:12
brensenit does not do much more than: 2019-08-30 15:11:27.064 7 INFO designate.producer.tasks [req-0b130db0-a3ba-46a5-a81a-63c5dbeb3adf - - - - -] Recovering zones for shards 0 to 409415:12
mugsieis that in debug mode?15:20
mugsiewth is that call coming from :/15:20
brensen?15:23
mugsietalking to myself :)15:24
mugsieI honestly have no idea why it is calling do_axfr()15:25
brensenhaha, I need to attend to the kids, we're still really stuck, but I need to leave it for now15:25
brensensuper thanks for your time and effort, we'll keep digging15:26
*** eandersson has joined #openstack-dns15:30
*** ygk_12345 has joined #openstack-dns15:51
openstackgerritErik Olof Gunnar Andersson proposed openstack/designate master: Refactored service layer  https://review.opendev.org/67843215:52
mugsieygk_12345: are the clocks in sync? as frickler says ^ it sounds like a clock issue15:54
ygk_12345mugsie before that I have some confusion. can you clear that for me please15:55
mugsiesure - what is the confusion?15:56
ygk_12345mugsie i have an openstack ansible rocky setup with two controllers. so I have two designate controllers in all. I have set up a backend bind server for designate. Apart from that, should I also install bind9 servers in both the designate containers as well ?15:56
mugsieno, just the server that you are controlling from designate15:57
ygk_12345so no need of the  bind9 servers in the designate containers ?15:58
ygk_12345shall I delete them now ?15:58
mugsieare they in your pools.yaml file/15:58
mugsie?*15:58
ygk_12345nope15:58
mugsieyeah, you can delete them then15:59
ygk_12345ok i will delete them now and I will check the clock issue after that and let you know15:59
ygk_12345mugsie one  container is lagging 1 minute and a half or so behind other container and the dns server16:02
*** ivve has joined #openstack-dns16:03
*** ginopc has quit IRC16:05
mugsieygk_12345: ok that needs to be fixed16:09
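Why a lag of about a minute and a half would surface as "invalid command ...: expired" in the bind logs can be sketched as follows, assuming each rndc command carries an expiry time stamped from the sender's clock and the server rejects it once its own clock is past that time. The 60-second validity window below is an assumption chosen for illustration, not a value taken from this log, and command_expired is a hypothetical helper name.

```python
# Assumed validity window for a control command, in seconds.
# This figure is an illustrative assumption, not taken from the log above.
ASSUMED_VALIDITY_SECONDS = 60


def command_expired(sender_clock, receiver_clock, validity=ASSUMED_VALIDITY_SECONDS):
    """Return True if the receiver would consider the command expired.

    Both clocks are plain epoch seconds. The sender stamps the command
    with `sender_clock + validity`; the receiver compares that stamp
    with its own idea of "now".
    """
    expires_at = sender_clock + validity
    return receiver_clock > expires_at


# A container whose clock lags the DNS server by 90 seconds: every command
# it sends is already past its expiry when the server looks at it.
print(command_expired(sender_clock=1000, receiver_clock=1090))

# A container that is only 10 seconds behind stays inside the window.
print(command_expired(sender_clock=1000, receiver_clock=1010))
```

Under these assumptions only the lagging container's commands would fail, which would be consistent with the "working one second and not working the other" behaviour when requests alternate between the two designate containers.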
ygk_12345mugsie now when I deleted those bind9 servers on the containers, I am seeing this16:09
ygk_12345mugsie Stderr: u'rndc: neither /etc/bind/rndc.conf nor /etc/bind/rndc.key was found\16:09
ygk_12345on the containers16:09
ygk_12345in the worker logs16:09
mugsieOK, that looks weird - you were using the bind9 servers to store the RNDC keys?16:10
ygk_12345the only bind9 server I now have is the backend dns server for the designate16:10
ygk_12345but how to fix the clock on the container ? it is ubuntu 1816:11
mugsiethe host needs the time synced - you need to look at ntp16:12
ygk_12345shall I install ntp server on that container ?16:12
mugsiethis should have been set up as part of the openstack ansible install16:13
mugsieI honestly don't know what they do - #openstack-ansible is the best place to ask16:13
jrosserOSA installs chrony out-of-the-box16:46
mugsiejrosser: I thought it did16:52
*** bnemec is now known as beekneemech16:53
*** salmankhan has quit IRC17:13
eanderssonmugsie, pretty happy with the service refactor now17:48
eanderssonif you or frickler have some time over please take a look https://review.opendev.org/#/c/678432/17:48
eanderssona little unfortunate that the ssl piece does not work on py317:48
*** brensen has quit IRC18:13
*** ygk_12345 has quit IRC18:17
*** KeithMnemonic has quit IRC19:22
*** aniketh has quit IRC21:39
*** ivve has quit IRC23:10
