Friday, 2017-08-25

*** masber has joined #openstack-dns00:27
<trungnv> abaindur, I am not sure exactly what the issue is with the 2nd or 3rd record. Feel free to share some logs showing where these issues occur, either here or at https://bugs.launchpad.net/designate.  01:18
<abaindur> Unfortunately there aren't any logs...  01:49
<abaindur> I'm trying to add my own and repro  01:49
<abaindur> but hitting some issues printing the DB records  01:49
<abaindur> trungnv: https://gist.github.com/xagent003/ca15d59e61875babddb1c733c2e3dfe5  01:52
<abaindur> https://gist.githubusercontent.com/xagent003/ca15d59e61875babddb1c733c2e3dfe5/raw/55105558ebadb93c7841591031b9a1d14ab5cb58/zone%2520action  01:52
<abaindur> one thing I do notice is, right before that log at the end of _update_zone_in_storage (which we added), where the zone is returned as NONE/ACTIVE  01:53
<abaindur> there are a bunch of notification logs for "dns.domain.update" and "dns.zone.update" - the queueing-notification and emitting-notification messages  01:54
<abaindur> when the records are created without issue, we don't see those  01:54
<abaindur> it is just the logs for _create_recordset_in_storage, "Checking if ... belongs in any subzones...", "Updating serial for zone...", and "Updating zone"  01:55
<abaindur> followed by that log at the end of _update_zone_in_storage  01:55
<abaindur> But in the case where _update_zone_in_storage returns the wrong action for the zone, we see logs like that ^^  01:55
<abaindur> trungnv: what triggers those "Queuing notification" and "Emitting notification" messages? I suspect they have something to do with _update_zone_in_storage not properly returning the zone from storage as UPDATE/PENDING  01:57
<trungnv> abaindur, when you update a zone you get "action:'UPDATE' status:'PENDING'", then the worker pushes the update to the backend. If the update succeeds the zone comes back as "action:'NONE' status:'ACTIVE'", and if it fails the status is set to 'ERROR'  02:01
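A minimal sketch of the action/status lifecycle described above: central marks the zone UPDATE/PENDING, the worker applies the change to the backend and writes back NONE/ACTIVE on success or ERROR on failure. The constant and function names below are illustrative only, not Designate code.

    # Zone (action, status) pairs seen in the logs above.
    PENDING_UPDATE = ('UPDATE', 'PENDING')  # set by central before notifying the worker
    APPLIED = ('NONE', 'ACTIVE')            # written back after the backend is updated
    FAILED = ('UPDATE', 'ERROR')            # written back if the backend update fails

    def worker_should_act(action):
        # The worker only does work for zones with a pending action; a zone
        # handed over as NONE/ACTIVE (the failing case here) gets rejected.
        return action in ('CREATE', 'UPDATE', 'DELETE')

    assert worker_should_act('UPDATE')
    assert not worker_should_act('NONE')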
<trungnv> I looked at your log and I don't see any issues in there.  02:02
<abaindur> yes, but the problem here is with "then the worker pushes the update to the backend" - central itself sends the zone action as NONE to the worker, so the worker can't update  02:02
<abaindur> trungnv: when central returns from updating the zone in storage with UPDATE/PENDING, the zone is returned as NONE/ACTIVE  02:02
<abaindur> https://github.com/openstack/designate/blob/stable/newton/designate/central/service.py#L1065  02:02
<abaindur> We even see here, https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309  02:03
<abaindur> it has the correct UPDATE/PENDING. But when that storage function returns the zone, it has NONE/ACTIVE  02:04
<abaindur> We see those Queuing/Emitting notification logs around that time period when it fails - when central correctly gets UPDATE/PENDING for the zone to send to the worker, it goes fine  02:05
<trungnv> in the log you sent me I also see the log from https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309, which is: 2017-08-23 00:48:17.208 24214 DEBUG designate.storage.impl_sqlalchemy [req-f87ca390-b775-4ceb-83bb-8b3559fbc42b 332fa775cff44c0cbf4282cc4537b339 9b4e484efcb64b718944efaf05042df6 - - -] Updating zone <Zone id:'a0058952-4d23-4bf7-a87d-7184d0dbc048'  02:06
<trungnv> type:'PRIMARY' name:'net1.test-du-neutron-t-mx-c7-41216.platform9.horse.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1503449297' action:'UPDATE' status:'PENDING'> update_zone /opt/pf9/designate/lib/python2.7/site-packages/designate/storage/impl_sqlalchemy/__init__.py:309  02:06
<trungnv> you can see it has action:'UPDATE' status:'PENDING'  02:06
<abaindur> correct, that's still in central.  02:06
<abaindur> that's before the updated_zone has been returned by storage.update_zone  02:06
<abaindur> from designate/storage/impl_sqlalchemy/__init__.py: the update_zone() function  02:06
<abaindur> when https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L432  02:07
<abaindur> returns, it returns updated_zone with action=NONE, status=ACTIVE  02:07
<abaindur> But at line #302 we see that the update function was called with action=UPDATE, status=PENDING  02:08
<abaindur> (that's also the log you just pasted; line #'s in the reference code are a little off due to extra debugs I added)  02:09
<abaindur> that function is not properly updating and returning the zone in the DB. Or there is some race condition that changes it to NONE/ACTIVE  02:10
<trungnv> abaindur, Yep. Perhaps I need to investigate the update_zone method some more.  02:11
<abaindur> as mentioned - we only see this so far when we add multiple records to a zone. It's usually the 2nd or 3rd record that fails  02:12
<abaindur> the 1st record always goes through fine  02:12
<trungnv> yep.  02:12
<abaindur> and we also see those Queuing and Emitting notifications in the logs around that time, when the failure occurs  02:12
<abaindur> during record creation for passing cases, they are not there (or they are present at some point before the next record is created)  02:13
<trungnv> before the zone object is pushed to the worker, its action is NONE and status is ACTIVE, so the worker doesn't know what the next action for this session should be. Is that right?  02:14
<abaindur> What triggers those? Are they somehow updating the DB? We don't see any extra sqlalchemy logs with the wrong action/status though  02:14
<abaindur> yea. so that zone is returned from storage with NONE/ACTIVE  02:14
<abaindur> central sends that updated zone to the worker, which errors out  02:15
<abaindur> I am wondering if, as a workaround/hack fix, we could just force-set zone.action=UPDATE, status=PENDING  02:15
<trungnv> Did you check the new zone that was updated?  02:15
<abaindur> here: https://github.com/openstack/designate/blob/stable/newton/designate/central/service.py#L1065  02:15
<abaindur> after the storage call returns  02:15
<abaindur> that way, even if the DB is messed up, we at least send the correct action and status to the worker  02:16
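A rough sketch of the hack abaindur is proposing here: after storage hands the zone back, force the action/status the worker needs to see before the RPC goes out. The method name loosely follows the Newton-era central service and the guard is hypothetical; this papers over the suspected storage bug rather than fixing it.

    # Hypothetical patch around central's _update_zone_in_storage(); the real
    # signature on stable/newton may differ slightly.
    def _update_zone_in_storage(self, context, zone):
        zone.action = 'UPDATE'
        zone.status = 'PENDING'

        zone = self.storage.update_zone(context, zone)

        # Workaround: if storage handed back NONE/ACTIVE even though we just
        # set UPDATE/PENDING, restore the values so the worker will act on it.
        if zone.action == 'NONE':
            zone.action = 'UPDATE'
            zone.status = 'PENDING'
        return zone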
<abaindur> the zone remains in NONE/ACTIVE  02:16
<abaindur> the recordset remains forever in UPDATE/PENDING - the worker never processes it because action=NONE  02:17
<trungnv> No. Could you check the values in the new zone?  02:17
<abaindur> new zone?  02:17
<trungnv> yeah.  02:17
<abaindur> what do you mean?  02:18
<abaindur> in our tests, we create a couple of zones - one per tenant network  02:18
<abaindur> then we create around 3 A records per zone  02:18
<abaindur> half the time everything goes fine  02:18
<abaindur> half the time, 1 record (the 2nd or 3rd) in one of the zones hits this issue  02:18
<abaindur> the zone stays as NONE and ACTIVE  02:19
<trungnv> abaindur, any logs in the worker when you create the 2nd record?  02:21
<abaindur> as I said, the worker has a stack trace because central sent zone.action=NONE  02:21
<abaindur> so the worker does nothing for that record  02:21
<trungnv> so there are no logs in the worker, right?  02:22
<abaindur> I mean there are logs... it just gets the update_zone RPC from central with a NONE action  02:22
<abaindur> so the worker errors out  02:22
<abaindur> Let me report back in about 30-45 minutes... we have added more logs to https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309  02:25
<abaindur> and to https://github.com/openstack/designate/blob/stable/newton/designate/sqlalchemy/base.py#L524  02:25
<abaindur> The problem must lie somewhere in there, right? The zone comes in with UPDATE/PENDING, but the updated_zone that is returned has NONE/ACTIVE  02:26
<trungnv> Yep. Try printing some logs there.  02:27
<abaindur> I see it does a table.update, then a select to "# Refetch the row, for generated columns etc", then returns _set_object_from_model(obj, resultproxy.fetchone())  02:27
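The pattern abaindur is describing in designate/sqlalchemy/base.py, reduced to a self-contained illustration (this is not the Designate code itself): the row is UPDATEd and then re-SELECTed so generated columns come back fresh. Whatever another session has committed in between - for example a worker flipping the zone back to NONE/ACTIVE for the previous record - is what the caller gets handed back, which is one way the race abaindur suspects could play out.

    import sqlalchemy as sa

    metadata = sa.MetaData()
    zones = sa.Table(
        'zones', metadata,
        sa.Column('id', sa.String(36), primary_key=True),
        sa.Column('action', sa.String(10)),
        sa.Column('status', sa.String(10)),
    )

    def update_and_refetch(conn, zone_id, values):
        # 1) apply the caller's values (e.g. action='UPDATE', status='PENDING')
        conn.execute(zones.update().where(zones.c.id == zone_id).values(**values))
        # 2) "Refetch the row, for generated columns etc"
        row = conn.execute(zones.select().where(zones.c.id == zone_id)).fetchone()
        # 3) whatever is in the table *now* - including anything committed by
        #    another session in between - is what gets copied onto the object
        return row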
<trungnv> abaindur, I care more about the 2nd record in the backend than the 2nd record in the DB.  02:30
<trungnv> which backend are you using?  02:30
<abaindur> the 2nd record never gets to the backend or the worker - the worker throws an exception immediately because central sent the zone with a NONE action  02:30
<trungnv> yep. It is an issue if the 2nd record doesn't get created in the backend.  02:32
<trungnv> abaindur, Cool. I'll also look at this issue soon.  02:32
<abaindur> How could the 2nd record get created in the backend in the first place, though, if central sends the wrong update_zone to the worker?  02:33
*** rektide has joined #openstack-dns02:33
<abaindur> the update_zone from central -> worker is sent after central updates the zone in storage to UPDATE/PENDING, right?  02:33
<trungnv> Yeah, exactly.  02:34
<abaindur> 1. central's create_recordset calls 2. increment_zone_serial, which calls 3. _update_zone_in_storage, which calls 4. the DB storage driver's update_zone  02:34
<abaindur> #4 is where we are seeing the issue  02:34
<trungnv> we get the zone object from the DB first, then central calls the worker to create the record in the backend, then the worker calls back to central to update the zone.  02:35
<abaindur> Finally, 5. the worker gets the RPC request from central. It fails because the zone has the wrong values due to #4  02:35
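The chain abaindur enumerates above, condensed into a skeleton. The names follow the stable/newton central service only loosely, the worker call is paraphrased, and the bodies are pseudocode rather than the real implementation.

    # Skeleton of steps 1-5 as described above (illustrative, not Designate code).
    def create_recordset(self, context, zone_id, recordset):                 # 1
        zone = self.storage.get_zone(context, zone_id)
        recordset = self._create_recordset_in_storage(context, zone, recordset)
        zone = self._increment_zone_serial(context, zone)                    # 2
        #   -> self._update_zone_in_storage(context, zone)                   # 3
        #        -> self.storage.update_zone(context, zone)                  # 4: comes back
        #           NONE/ACTIVE instead of UPDATE/PENDING in the failing runs
        self.worker_api.update_zone(context, zone)                           # 5: worker
        return recordset                                                     #    rejects NONE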
<abaindur> so central first has the worker create the record in the backend, *BEFORE* it increments the zone serial in central/the DB? Then it sends another update_zone RPC to the worker?  02:36
<trungnv> Yep.  02:37
<trungnv> Did you see the 2nd record in the backend (BIND9 or pdns4)?  02:38
<abaindur> so is there any possibility of a race condition then? The worker tells central to update the zone to NONE/ACTIVE, while central increments the zone serial and tries to update the zone in the DB to UPDATE/PENDING  02:38
<abaindur> but that fails, so the worker rejects the update_zone  02:39
<abaindur> No, the record is unpingable - it fails to resolve  02:39
<trungnv> Yep. this is an issue.  02:39
<abaindur> and the DB shows the recordset itself stays forever in the UPDATE and PENDING state, while the zone remains NONE/ACTIVE  02:39
<trungnv> abaindur, I understand what you're saying.  02:40
<abaindur> looks like the tests are passing this time... will have to re-run  02:40
<trungnv> I will look at this issue soon. Could you file a bug on Launchpad along with the logs?  02:40
<abaindur> it will take about another 1-1.5 hrs to re-run the testbeds  02:40
<trungnv> abaindur, http://paste.ubuntu.com/25387060/ - no action and no status on the 2nd and 3rd records. Is this the issue in your case?  03:49
<abaindur> No  03:50
<abaindur> the recordset shows as PENDING  03:50
<abaindur> the zone as NONE and ACTIVE  03:50
<abaindur> we don't see it missing altogether  03:51
<trungnv> the status is PENDING on the 2nd record?  03:52
<trungnv> abaindur, http://paste.ubuntu.com/25387095/ - this is the worker log when I create the 4th record.  04:00
<abaindur> Are you creating these records quickly?  04:08
<abaindur> or waiting to create each one, one by one?  04:08
<abaindur> We are creating them in a loop via the python API - we attach a floating IP, wait for the floating IP status to become ACTIVE, then create the recordset  04:08
<abaindur> So it happens a lot quicker than, say, creating them manually with the CLI  04:08
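Roughly what that test loop looks like. The client libraries and call signatures below (python-neutronclient and python-designateclient v2) are assumptions for illustration, not the actual test code.

    import time

    def wait_for_fip_active(neutron, fip_id, timeout=60):
        # Poll until the floating IP reports ACTIVE, or give up.
        deadline = time.time() + timeout
        while time.time() < deadline:
            fip = neutron.show_floatingip(fip_id)['floatingip']
            if fip['status'] == 'ACTIVE':
                return fip
            time.sleep(2)
        raise RuntimeError('floating IP %s never became ACTIVE' % fip_id)

    def attach_and_register(neutron, designate, ext_net_id, port_id, zone_id, name):
        fip = neutron.create_floatingip(
            {'floatingip': {'floating_network_id': ext_net_id,
                            'port_id': port_id}})['floatingip']
        fip = wait_for_fip_active(neutron, fip['id'])
        # Each recordset create bumps the zone serial; doing this back-to-back
        # for 2-3 records per zone is what seems to trigger the NONE/ACTIVE issue.
        designate.recordsets.create(zone_id, name, 'A', [fip['floating_ip_address']])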
<trungnv> abaindur, just via devstack, on a single node and without a floating IP.  04:28
<frickler> is there a sample devstack config somewhere that sets up neutron with designate integration?  05:49
<frickler> I haven't been able to find any test for that scenario in jenkins - is that correct, or did I just miss it?  05:50
<trungnv> frickler, https://docs.openstack.org/designate/latest/contributor/integrations.html - this link covers the integration you're asking about.  06:09
<trungnv> frickler, We have some test cases for the notification handlers: https://github.com/openstack/designate/blob/master/designate/tests/test_notification_handler/  06:11
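For reference, a devstack local.conf fragment along the lines frickler is asking about: enable the designate plugin and turn on neutron's DNS extension. The option and variable names are recalled from the designate devstack plugin and the neutron DNS integration guide and should be double-checked against the release in use; treat this as an untested sketch.

    [[local|localrc]]
    enable_plugin designate https://git.openstack.org/openstack/designate
    DESIGNATE_BACKEND_DRIVER=bind9

    [[post-config|/$Q_PLUGIN_CONF_FILE]]
    [ml2]
    extension_drivers = port_security,dns

    [[post-config|$NEUTRON_CONF]]
    [DEFAULT]
    external_dns_driver = designate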
*** abalutoiu_ has joined #openstack-dns06:16
*** abalutoiu_ has quit IRC06:20
<frickler> trungnv: thx, but those tests seem to work with fixtures for notification only, no real interaction with neutron, right?  06:29
<frickler> maybe the neutron folks have a bit more, I'll ask over there  06:32
*** pcaruana has joined #openstack-dns06:40
*** trungnv has quit IRC07:18
*** trungnv has joined #openstack-dns07:19
*** abaindur has quit IRC07:30
*** abalutoiu_ has joined #openstack-dns07:31
*** egonzalez has joined #openstack-dns07:35
*** egonzalez has quit IRC07:50
*** abalutoiu_ has quit IRC07:51
*** egonzalez has joined #openstack-dns07:51
*** vipul has quit IRC07:54
*** vipul has joined #openstack-dns07:57
*** vipul has quit IRC08:23
*** vipul has joined #openstack-dns08:25
*** abalutoiu_ has joined #openstack-dns08:30
*** carthaca_ has quit IRC08:37
*** carthaca_ has joined #openstack-dns08:37
*** MarkBaker has joined #openstack-dns08:44
*** abalutoiu_ has quit IRC09:13
<frickler> kiall: https://review.openstack.org/352843 is still W-1ed by you, could you take a look please?  09:14
*** abalutoiu_ has joined #openstack-dns09:18
*** jmccarthy has joined #openstack-dns09:47
*** abalutoiu_ has quit IRC09:56
*** trungnv has quit IRC10:02
*** abalutoiu_ has joined #openstack-dns10:34
*** daidv has quit IRC11:00
*** MarkBaker has quit IRC11:04
*** frickler has quit IRC11:26
*** andreaf has quit IRC11:41
*** vcn[m] has quit IRC11:41
*** Guest45420 has quit IRC11:41
*** fouxm has quit IRC11:41
*** odyssey4me has quit IRC11:41
*** ircuser-1 has quit IRC11:41
*** d0ugal has quit IRC11:41
*** simonmcc has quit IRC11:41
*** timsim has quit IRC11:41
*** serverascode has quit IRC11:41
*** johnsom has quit IRC11:41
*** fyxim has quit IRC11:41
*** pcaruana has quit IRC11:41
*** masber has quit IRC11:41
*** KeithMnemonic has quit IRC11:41
*** nyloc has quit IRC11:41
*** mrhillsman has quit IRC11:41
*** amitry has quit IRC11:41
*** ChanServ has quit IRC11:41
*** abalutoiu_ has quit IRC11:41
*** jmccarthy has quit IRC11:41
*** harmw has quit IRC11:41
*** bauruine has quit IRC11:41
*** mugsie has quit IRC11:41
*** dhellmann has quit IRC11:41
*** asettle has quit IRC11:41
*** hieulq has quit IRC11:41
*** mikal has quit IRC11:41
*** brad[] has quit IRC11:41
*** filler has quit IRC11:41
*** logan- has quit IRC11:41
*** ianychoi has quit IRC11:41
*** ChrisDevita has quit IRC11:41
*** raginbajin has quit IRC11:41
*** zigo has quit IRC11:41
*** Krenair has quit IRC11:41
*** basilAB has quit IRC11:41
*** vaishali has quit IRC11:41
*** clayton has quit IRC11:41
*** JonathanD has quit IRC11:41
*** therve_ has quit IRC11:41
*** egonzalez has quit IRC11:41
*** puck has quit IRC11:41
*** lkoranda has quit IRC11:41
*** afranc has quit IRC11:41
*** d34dh0r53 has quit IRC11:41
*** mordred has quit IRC11:41
*** rackertom has quit IRC11:41
*** andrewbogott has quit IRC11:41
*** ying_zuo has quit IRC11:41
*** jmccrory has quit IRC11:41
*** mwhahaha has quit IRC11:41
*** kbyrne has quit IRC11:41
*** sapcc-bot has quit IRC11:41
*** carthaca_ has quit IRC11:41
*** nkinder has quit IRC11:41
*** yee379 has quit IRC11:41
*** vipul has quit IRC11:41
*** rektide has quit IRC11:41
*** kiall has quit IRC11:41
*** eandersson has quit IRC11:41
*** cliles has quit IRC11:41
*** openstack has joined #openstack-dns12:56
*** ChanServ sets mode: +v openstack12:56
*** openstackstatus has joined #openstack-dns12:57
*** ChanServ sets mode: +v openstackstatus12:57
*** hieulq has quit IRC13:29
<frickler> I'm seeing this message when trying a default devstack installation, is this a known issue with pdns? "gmysql Connection failed: Could not prepare statement: SELECT domain_id,name,type,modified_at,account,comment FROM comments WHERE domain_id=?: Table 'designate_pdns.comments' doesn't exist"  13:34
<frickler> also, the bugs dashboard link in the topic looks broken to me  13:34
<mugsie> frickler: damn - it is down. Will fix that over the weekend  13:36
<mugsie> it is not a known issue  13:37
<mugsie> can you log it @ https://bugs.launchpad.net/designate ?  13:38
<frickler> mugsie: possibly pdns 4 on xenial is too new? I noticed the devstack instructions still talk about trusty, but that is kinda deprecated  13:38
<frickler> mugsie: will do  13:38
<mugsie> oh, there is a different driver for xenial  13:41
<mugsie> it needs to be pdns4 instead of powerdns  13:41
<frickler> mugsie: yeah, I just found that when grepping through the devstack plugin code. Maybe worth changing the default? Or at least raising an error early when trying to run the old driver on xenial?  13:46
<mugsie> frickler: I thought we did error on xenial -_-  13:46
<mugsie> but yes, it definitely is  13:46
<mugsie> (doing both)  13:47
*** d0ugal has quit IRC13:49
<frickler> hmm, there is a check in pdns4 that fails on anything other than trusty|xenial, so making that the default will break things for everything non-ubuntu  13:51
<frickler> the "real" installation guides seem to use bind9 anyway - maybe make that the default for devstack, too? +1 for consistency  13:53
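The knob under discussion, for anyone hitting the same gmysql error on xenial: the designate devstack plugin picks its backend via DESIGNATE_BACKEND_DRIVER, so selecting pdns4 (or bind9, as the installation guides do) avoids the old powerdns driver. The variable name is recalled from the plugin's settings file and is worth double-checking for your branch.

    [[local|localrc]]
    enable_plugin designate https://git.openstack.org/openstack/designate
    # the legacy "powerdns" driver targets pdns 3.x on trusty; on xenial use pdns4 (or bind9)
    DESIGNATE_BACKEND_DRIVER=pdns4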
*** saphi has joined #openstack-dns14:01
<frickler> mugsie: FYI, dumped all that into https://bugs.launchpad.net/designate/+bug/1713069  14:02
<openstack> Launchpad bug 1713069 in Designate "Devstack setup with default settings fails on Ubuntu Xenial" [Undecided,New]  14:02
*** saphi has quit IRC14:50
*** saphi has joined #openstack-dns14:54
*** pcaruana has quit IRC15:06
*** d0ugal has joined #openstack-dns15:09
*** egonzalez has quit IRC15:12
*** catintheroof has joined #openstack-dns15:14
*** abalutoiu_ has quit IRC15:15
*** egonzalez has joined #openstack-dns15:24
*** trungnv has joined #openstack-dns15:33
*** trungnv has quit IRC15:34
*** trungnv has joined #openstack-dns15:34
*** trungnv_ has joined #openstack-dns15:38
*** trungnv has quit IRC15:38
*** trungnv_ has quit IRC15:38
*** trungnv has joined #openstack-dns15:39
*** trungnv has quit IRC15:47
*** trungnv has joined #openstack-dns15:47
*** d0ugal has quit IRC15:52
*** catintheroof has quit IRC15:59
*** trungnv has left #openstack-dns16:00
*** trungnv_ has joined #openstack-dns16:00
*** trungnv has joined #openstack-dns16:00
*** trungnv has quit IRC16:03
*** trungnv_ has quit IRC16:03
*** trungnv has joined #openstack-dns16:12
*** trungnv has quit IRC16:12
*** trungnv has joined #openstack-dns16:14
*** trungnv has quit IRC16:15
*** jmccarthy has left #openstack-dns16:31
*** catintheroof has joined #openstack-dns16:52
*** catintheroof has quit IRC17:11
*** catintheroof has joined #openstack-dns17:56
*** masber has quit IRC18:31
*** saphi has quit IRC18:37
*** catintheroof has quit IRC20:55
*** catintheroof has joined #openstack-dns20:56
*** masber has joined #openstack-dns21:27
*** masber has quit IRC21:32
*** abalutoiu_ has joined #openstack-dns22:17
*** egonzalez has quit IRC22:51
*** catintheroof has quit IRC23:17
