Wednesday, 2021-02-17

openstackgerritMerged openstack/designate stable/ussuri: Fix pdns4 devstack plugin  https://review.opendev.org/c/openstack/designate/+/75513800:38
openstackgerritMerged openstack/designate stable/train: Fix pdns4 devstack plugin  https://review.opendev.org/c/openstack/designate/+/77590201:12
openstackgerritNicolas Bock proposed openstack/designate stable/stein: Fix pdns4 devstack plugin  https://review.opendev.org/c/openstack/designate/+/77590301:15
openstackgerritMichael Chapman proposed openstack/python-designateclient stable/victoria: Increase hacking version  https://review.opendev.org/c/openstack/python-designateclient/+/77595602:00
openstackgerritMichael Chapman proposed openstack/python-designateclient stable/victoria: Fix lower-constrains job  https://review.opendev.org/c/openstack/python-designateclient/+/77596003:41
*** jpward has quit IRC04:33
*** hamalq has quit IRC04:41
michchapeandersson, should I squash those two designateclient stable/victoria patches to get the gate going, or is there a better way?05:27
eanderssonmichchap I think it's fine to just merge the working patch and then rebase the failing once after that05:36
eanderssonIf you want you can just rebase the hacking change on top of the lower-req change05:37
nicolasbockWe could also drop the lower constraints as we did for designate13:04
michchapeandersson, the lower req change is currently on top of the hacking change - they're each fixing a different job13:15
michchapnicolasbock I could put it in with the hacking change if they both need to go back a few versions13:20
nicolasbockYes, I think that would work michchap13:21
openstackgerritMichael Chapman proposed openstack/python-designateclient stable/victoria: Increase hacking version  https://review.opendev.org/c/openstack/python-designateclient/+/77595613:24
openstackgerritNicolas Bock proposed openstack/designate master: Increment zone serial instead of using time  https://review.opendev.org/c/openstack/designate/+/77617313:33
nicolasbockeandersson, johnsom: Colud you have a look at https://review.opendev.org/c/openstack/designate/+/776173 ? This is still a draft. I wanted to get your input first to see whether I am going in the right direction.13:35
nicolasbockeandersson: The background of this is that we discovered in Stein (since it's lacking the central global lock) the way zone serials are incremented lead to a race condition and result in missing records in the BIND9 backend13:36
nicolasbockSince the serial increment granularity is seconds, BIND9 does not always AXFR the zone because it doesn't see a serial increment13:37
nicolasbockWe are using the serial also as a time stamp though, which means that we need to separate those two uses (as a serial and as a timestamp) to make this work properly13:37
nicolasbockI added some more information for Designate/devstack usage at https://review.opendev.org/c/openstack/designate/+/76554115:37
nicolasbockI found this helpful because I don't do this often enough to remember how this works :)15:37
eanderssonnicolasbock I'll take a look, but another fix is to enable a coordinator20:10
nicolasbockAs in a global lock?20:11
eanderssonIt's a global lock yea, but per domain etc20:11
nicolasbockI was concerned about the performance impact of that though20:11
eanderssonNot super sure this would fix it thou20:11
eanderssonWe probably want mysql to handle the increment20:12
nicolasbockYour global lock fixes it20:12
nicolasbockI confirmed that20:12
nicolasbockRight20:12
nicolasbockThat's what my change is doing20:12
nicolasbockSo it's locked on that operation only20:12
nicolasbockIn our case a customer is using Terraform to create a log of recordsets in parallel20:12
nicolasbockYour global lock fixes the issue but the performance is pretty much serial20:13
nicolasbockI don't mean to criticize your change though. I think it's great :)20:13
eanderssonYea - it would be for a single domain20:13
eanderssonNah my fix was fixing the existing global lock :D20:13
eanderssonSo no worries20:13
nicolasbockHaha20:14
nicolasbockLocking on the SQL UPDATE operation is a lot more fine grained20:14
nicolasbockBut I don't quite see whether this will beak the delayed actions20:14
nicolasbockSince we are using the serial also as a timestamp20:14
eanderssonYou will need to add a get operation as well20:15
eanderssonbecause I don't think current_serial will work right?20:16
nicolasbockNo, that's just the timestamp20:16
nicolasbockI have https://review.opendev.org/c/openstack/designate/+/776173/1/designate/storage/impl_sqlalchemy/__init__.py#43220:17
eanderssonLike this code https://review.opendev.org/c/openstack/designate/+/776173/1/designate/worker/tasks/zone.py20:17
nicolasbockThat updates the serial with the DB version20:17
eanderssonYea20:17
nicolasbockAh right, yes, that's where I am not sure I might have broken something20:18
eanderssonbut twhat is this criterion used for?20:18
eanderssonProbably best to just remove the serial here and see what breaks lol20:18
nicolasbockHah20:18
eandersson(or fetch it from the db)20:18
nicolasbockOk I will try that20:18
nicolasbockI'll go through that code path more carefully20:18
nicolasbockThanks for the help eandersson !20:18
eanderssonI wonder if there are other race condiitons20:19
eanderssonbesides just serial increment20:20
eanderssonIf not we could look at removing the lock all together20:20
nicolasbockYes,  we don't test anything in parallel as far as I know20:20
nicolasbockI would be for removing it20:20
nicolasbockIt's a performance issue for sure20:20
nicolasbockBut on the other hand we should test the parallel aspect20:20
eanderssonThere are some nasty nested calls in central that we sohuld fix as well20:20
nicolasbockYes, agreed20:21
eanderssonhttps://github.com/openstack/designate/blob/master/designate/rpc.py#L22520:21
eanderssonI really want to remove this code someday20:21
nicolasbockI love that naming ;)20:21
nicolasbockAre you aware of other projects testing parallel API calls?20:21
eanderssonIt might be worth asking in #openstack-qa maybe20:22
nicolasbockTrue20:22
nicolasbockLet me head over there and ask20:22
johnsomRally should be for some projects.20:23
eanderssonbtw one of the reasons I think they moved away from incremental increases with Designate was because the serial value would get too high20:23
johnsomOctavia also has a "beat the heck out of it" test tool for the API. I wrote it so I could test our DB locking.20:23
eanderssonWe had to modify the schema at the time to bump the serial from uin32 to uint6420:23
johnsomWell, the protocol has a limit too20:24
eanderssonI used that a lot johnsom20:24
johnsomI think it's uint3220:24
nicolasbockSo how does a dynamic DNS provider deal with this issue eandersson ?20:24
eanderssonWell we can just bump it to a uint6420:24
eanderssonIdeal would probably be to allow batch updates, but that is practically impossible to implement today :D20:25
johnsomhttps://tools.ietf.org/html/rfc1035#section-3.3.1320:26
johnsom Serial is a uint3220:26
eanderssonoh fun20:26
eanderssonAnother approach we could do is move the increment to a periodic job :D20:26
nicolasbockRight, just found the same as johnsom20:26
nicolasbockWe could batch zone updates20:27
nicolasbockI guess just as you suggest eandersson20:27
nicolasbockBIND9 can't really issue AXFRs that fast anyway20:27
nicolasbockWe already have the delayed NOTIFY setting20:27
nicolasbockBatching could be folded into that option20:28
eanderssonYea20:28
eanderssonThat is probably the ideal20:28
eanderssonNo reason to inc serial for each tiny little change20:29
eanderssonI was very confused when I first hit this bug, because I thought it was a worker / producer issue20:29
nicolasbockYes, it took me a while to find the problem20:29
eanderssonhttps://bugs.launchpad.net/designate/+bug/176861820:29
openstackLaunchpad bug 1768618 in Designate "Designate-Worker occasionally raises Bad Action" [Critical,Fix released] - Assigned to Erik Olof Gunnar Andersson (eandersson)20:29
eanderssonThis was the initial bug I hit20:30
johnsomThe other is to let serial just increment. Everything *should* know how to handle a roll over20:30
nicolasbockAh, interesting20:30
johnsomBut, yeah, I don't see harm in some form of short batching.20:30
eanderssonYea - and it should be partly implemented already20:31
nicolasbockI'll have a look at that then20:31
eanderssonIt would drastically improve performance as well central does not have to care about it.20:31
nicolasbockI'll expand https://mxtoolbox.com/problem/dns/dns-soa-serial-number-format#:~:text=It%20has%20become%20common%20to,with%20the%20format%20of%20YYYYmmddss.20:31
nicolasbockTrue20:31
eanderssonCentral is such a train wreck and this code path is part of the problem20:31
nicolasbockHaha, very diplomatically put20:32
eanderssonbtw I wonder if maybe old Designate was using a signed int instead of unsigned20:32
eanderssonhttps://github.com/openstack/designate/blob/stable/stein/designate/backend/impl_powerdns/migrate_repo/versions/001_add_initial_schema.py#L3020:32
johnsomThat would have been an....oversight20:32
eanderssonNot sure what Integer() equals to20:33
johnsomHa, and the changing tide of what sqlalchemy decided an "Integer" was for that backend....20:33
nicolasbockOh did that change johnsom ?20:35
johnsomWell, I don't know for sure, but things like that have been an issue for me in the past.20:36
nicolasbockThe documentation https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.Integer is a little unclear here20:36
eanderssonI have a hard time thinking we hit 4,294,967,295 :D20:36
nicolasbockIn Python2 there are two int types, no?20:37
nicolasbockeandersson: Well I wouldn't put anything past an intrepid user :)20:37
nicolasbockBut that's a lot of updates20:37
nicolasbockWe could start the serial at 0 by the way20:38
nicolasbockThat would give us the full range20:38
johnsomYeah, we have what, 17 more years?20:38
eanderssonYea - for new zones20:38
nicolasbockI like to panic early johnsom :) lol20:38
johnsomFYI, serial should start with 1. grin20:39
nicolasbockHaha. Back in the day I wrote a lot of Fortran and C code and this always got me20:40
johnsomeandersson So, when are you posting the RFC to bump the SOA SERIAL field to uint64?20:43
* johnsom has a nagging feeling people would be like, "just roll it over"20:44
nicolasbocklol20:44
nicolasbockWe should also at some point work on fixing ERROR conditions. I have a zone in ERROR. BIND9 doesn't have the zone anymore, but Designate has it in the DB.20:45
nicolasbockThere is no way to remove it without DB surgery20:45
nicolasbockOctavia runs into something like that once in a while as well20:45
johnsomAh, but Octavia has two tools to fix ERROR: failover and delete20:46
nicolasbockWe could add a `--force` option to the CLI?20:46
nicolasbockRight20:46
nicolasbockTrue20:46
nicolasbockWe need something here in Designate20:46
johnsomNow, I don't know the whole situation, but wouldn't https://docs.openstack.org/api-ref/dns/?expanded=abandon-zone-detail#abandon-zone work for you?20:47
johnsomI.e. that is my disclaimer of "If you try this, you are on your own"20:47
nicolasbockWow, I had no idea this API existed :)20:47
nicolasbockHaha20:47
nicolasbockI will try that, but the description fits the bill20:48
johnsomI have never used it20:48
eanderssonNever seen that api before lol20:51
nicolasbockhaha20:52
nicolasbockI am going to try it out20:52
johnsomWatch, it's documented but not implemented. lol20:53
nicolasbockI am sure a user will find it at some point and blow up their Designate ;)20:53
johnsomI *hope* it has Admin RBAC on it20:53
nicolasbockOMG it works!20:53
nicolasbockWell, good question johnsom20:53
eanderssonhttps://github.com/openstack/designate/blob/master/designate/api/v2/controllers/zones/tasks/abandon.py20:53
eanderssonWould probably be neat to add that to the cli20:54
nicolasbockhttps://github.com/openstack/designate/blob/1ea6d44a51df01f5a15eca2d380c849b8e297941/designate/common/policies/zone.py#L5720:55
nicolasbockIt's in the CLI20:55
nicolasbockI was just using it20:55
eanderssonoh20:55
eanderssonhaha20:55
nicolasbockAnd yes, johnsom, it's admin20:55
nicolasbockI had never noticed that command though eandersson20:55
eanderssonI have only touched that file once so that is probably why lol20:55
nicolasbockhttps://github.com/openstack/python-designateclient/blob/e6e7b191a3b19a2af99fd2c8a860dea3083bc6b1/designateclient/v2/cli/zones.py#L26020:56
nicolasbockHaha20:57
* johnsom disappears to yet another meeting today20:58
nicolasbocko/ johnsom20:59
*** eandersson has quit IRC21:40
*** gmann is now known as gmann_afk21:51
openstackgerritNicolas Bock proposed openstack/designate stable/stein: Adding distributed locking to central  https://review.opendev.org/c/openstack/designate/+/77628822:42
openstackgerritLuigi Toscano proposed openstack/designate stable/ussuri: Native Zuul v3 designate-grenade-pdns4 job  https://review.opendev.org/c/openstack/designate/+/75275423:17
*** gmann_afk is now known as gmann23:22
openstackgerritLuigi Toscano proposed openstack/designate stable/train: WIP DNM more debug for devstack  https://review.opendev.org/c/openstack/designate/+/77629323:27
openstackgerritLuigi Toscano proposed openstack/designate stable/ussuri: WIP DNM more debug for devstack  https://review.opendev.org/c/openstack/designate/+/77629423:28

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!