Friday, 2024-04-05

00:15 <opendevreview> James E. Blair proposed opendev/system-config master: Add zuul-db01 to cacti  https://review.opendev.org/c/opendev/system-config/+/915101
00:20 <corvus1> i have started a timed import process
00:21 <fungi> finally have my q2 paperwork knocked out, so can be semi-helpful again
00:22 <opendevreview> James E. Blair proposed opendev/system-config master: Restrict permissions on mariadb compose file  https://review.opendev.org/c/opendev/system-config/+/915102
00:22 <corvus1> fungi: two more changes ^
00:25 <corvus1> memory during the import looks good; mysql is using 51% as expected.  there's about 3Gi available, almost all currently used for buffers/cache
00:25 <corvus1> s/mysql/mariadb/ :)
00:25 <corvus1> s/mariadb/mariadbd/ :)
00:26 <corvus1> it's using about 3/8 cores
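A rough sketch, not from the log, of how that kind of resource check can be done on the db host; the process name follows the correction above, everything else is generic:

    free -h                                    # available memory vs. buffers/cache
    nproc                                      # total core count on the host
    ps -C mariadbd -o pid,pcpu,pmem,rss,args   # CPU and memory use of the mariadbd process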
00:58 <fungi> ah, yep, i did the same perms on the mailman3 server too
01:00 <fungi> resource utilization sounds spot-on
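For reference, the permission tightening proposed in 915102 amounts to something like the following sketch; the path is an assumption, and in system-config this is applied via Ansible rather than by hand. Compose files like this typically carry database credentials, hence the restriction:

    sudo chown root:root /etc/mariadb-compose/docker-compose.yaml   # hypothetical path
    sudo chmod 0600 /etc/mariadb-compose/docker-compose.yaml        # readable by root only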
01:56 <opendevreview> Merged opendev/system-config master: Mariadb: listen on all IP addresses  https://review.opendev.org/c/opendev/system-config/+/915096
01:56 <opendevreview> Merged opendev/system-config master: Add zuul-db01 to cacti  https://review.opendev.org/c/opendev/system-config/+/915101
02:07 <corvus1> real    88m4.022s
02:23 <corvus1> that's the good news.  the bad news is that the mariadb 10.11 query planner has come up with a third way of handling these queries, and it's worse than mysql 5.7.  i'm going to manually stand up a mysql 8 on this host so we can compare apples to apples, then decide how to proceed.
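The "real 88m4.022s" above is time(1) output from the timed import; a minimal sketch of that kind of run, assuming a dump file and a containerized server (the container name, credentials, and file name are placeholders, not the commands actually used):

    time docker exec -i mariadb \
        mariadb -u root -p"$ROOT_PW" zuul < zuul-export.sql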
03:17 <opendevreview> Merged opendev/system-config master: Restrict permissions on mariadb compose file  https://review.opendev.org/c/opendev/system-config/+/915102
09:25 *** TheMaster is now known as Unit193
12:26 <fungi> after lengthy debate, it's looking like the importlib.resources "legacy" api is getting un-deprecated: https://discuss.python.org/t/deprecating-importlib-resources-legacy-api/11386/47
12:49 <opendevreview> Dr. Jens Harbott proposed openstack/project-config master: gerritbot: move docs tools to TC channel  https://review.opendev.org/c/openstack/project-config/+/915130
14:16 <corvus1> real    106m25.886s  for mysql 8
14:17 <corvus1> query planner is producing sensible results
14:18 <corvus1> oh this makes things more confusing: https://jira.mariadb.org/browse/MDEV-27302
14:19 <corvus1> apparently we may not actually be able to tell if the query planner wants to use a backwards index scan
14:19 <opendevreview> Thierry Carrez proposed opendev/irc-meetings master: Move release team meeting one hour earlier  https://review.opendev.org/c/opendev/irc-meetings/+/915134
14:19 <corvus1> (but, empirically, it's not, since the query is slow)
14:37 <corvus1> i'm trying one more idea with mariadb; i'm trying to make an explicit descending index on the primary key to see if it (a) will use that automatically, or (b) if i can force it to use it, and if so, if that improves things
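A sketch of that experiment, assuming local socket auth and using made-up table/column names (the real queries here are Zuul's, and the exact statements aren't in the log):

    mariadb zuul <<'EOF'
    -- explicit descending index on the primary key column
    CREATE INDEX buildset_id_desc ON zuul_buildset (id DESC);
    -- (a) does the planner pick it up on its own?
    EXPLAIN SELECT * FROM zuul_buildset ORDER BY id DESC LIMIT 50;
    -- (b) can it be forced, and does that improve the plan?
    EXPLAIN SELECT * FROM zuul_buildset FORCE INDEX (buildset_id_desc) ORDER BY id DESC LIMIT 50;
    EOF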
14:37 <fungi> i think i almost understand what that means ;)
14:38 <opendevreview> Merged openstack/project-config master: Revert "Temporarily remove release docs semaphores"  https://review.opendev.org/c/openstack/project-config/+/914689
14:39 <corvus1> okay, that's a negative to all of the above :(
14:44 <opendevreview> Dr. Jens Harbott proposed openstack/project-config master: gerritbot: move docs tools to TC channel  https://review.opendev.org/c/openstack/project-config/+/915130
14:48 <clarkb> so this is likely a known behavior difference between mariadb and mysql? It seems likely to me that we should be able to express what we want in mariadb; the trick is figuring out how, I guess
14:50 <clarkb> fungi: if https://review.opendev.org/c/opendev/system-config/+/914895 looks alright to you I should be able to test that it rotates properly today
14:51 <clarkb> I still have my old key so I don't anticipate getting locked out either. But on the slim possibility that it happens, it might be good to only land that when someone else is around to help debug
14:51 <fungi> ah, yes sorry, i meant to approve that yesterday but got sidetracked by other activities
14:58 <opendevreview> Merged opendev/irc-meetings master: Move release team meeting one hour earlier  https://review.opendev.org/c/opendev/irc-meetings/+/915134
14:58 <corvus1> clarkb: i'm not sure how known it is...  my understanding is that both mysql 5.7 and mariadb should be able to perform backward index scans, but with some overhead (but that would be entirely acceptable for us).  but neither seems to be doing so.  meanwhile, 8.0 can perform backward index scans without any extra overhead, and seems to do it automatically.
14:59 <corvus1> clarkb: (then there's the bonus failure of mysql 5.7 of not actually reversing the data at all, and instead returning the first N results instead of the last N)
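To illustrate the shape of the problem (table and column names are made up, not the actual Zuul query): a newest-first listing like the one below wants a backward scan of the primary-key index. MySQL 8 reports "Backward index scan" in EXPLAIN's Extra column when it chooses one, while per MDEV-27302 MariaDB's EXPLAIN gives no such indication, so the observed query time is the only real signal there:

    mariadb zuul -e "EXPLAIN SELECT id, result FROM zuul_build ORDER BY id DESC LIMIT 50;"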
15:07 <fungi> what are the odds that this is the root cause of the lengthy delays in pipeline event processing we've been seeing this week?
15:09 <corvus1> fungi: definitely non-zero
15:10 <clarkb> reading about descending indexes in the mysql docs it says "DESC in an index definition is no longer ignored but causes storage of key values in descending order." I wonder if the fact that this doesn't help mariadb implies it is still ignored there
15:10 <clarkb> that must be something in the sql spec that databases have long ignored due to complexity?
15:10 <fungi> i guess it could also be a second-order effect, inefficient queries putting extreme load on the db server, and that's causing it to be unable to process other queries in a timely fashion
15:10 <clarkb> https://mariadb.com/kb/en/descending-indexes/ this says they finally implemented it in mariadb 10.11
15:10 <clarkb> but maybe it is buggy
15:45 <opendevreview> Merged opendev/system-config master: Rotate clarkbs ssh key  https://review.opendev.org/c/opendev/system-config/+/914895
16:23 <corvus1> clarkb: fungi i think i see a way to fix zuul with mariadb, but it's going to take some non-trivial zuul changes.  i think we should proceed with migrating to the mysql8 db i manually set up yesterday, run that while i work on the zuul changes necessary to support all 3 platforms, then maybe next weekend we can migrate to the ansible-managed mariadb.
16:24 <clarkb> that sounds like a good path forward to me
16:27 <fungi> sure, no objection here. let me know what help you need
16:33 <corvus1> i think we're actually all set.  the db server is ready, tomorrow we can export/import and then update zuul.conf to point to the new dburi (we can make that change now so it's ready).  then also we'll want to merge the index hint change in zuul and restart schedulers/web again after that.  but we shouldn't merge that before we switch dbms.
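A hypothetical sketch of the zuul.conf part of that plan (the host name, password, and database name are placeholders; the real values aren't in the log):

    # point Zuul's [database] dburi at the manually built MySQL 8 server
    sudo sed -i \
        's|^dburi=.*|dburi=mysql+pymysql://zuul:SECRET@zuul-db01.opendev.org/zuul|' \
        /etc/zuul/zuul.conf
    # then, after the export/import, restart the schedulers and zuul-web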
16:34 <fungi> yep, makes sense
16:34 <clarkb> I suspect that I'll be around tomorrow having a lazy day (it's cold and rainy and motivation to go out and do something is low)
16:35 <clarkb> I'll keep an eye on irc so holler if I can help
16:35 <fungi> yeah, i'm planning to be home all day, may be away from the computer from time to time for gardening tasks
16:44 <clarkb> the key on bridge seems to have updated and I can still get in. After this cup of tea is brewed I'll ensure I remove my old key from my agent and then spot check things a bit more
16:44 <clarkb> and then I need to go look at our PTG doc and try to organize and add content to it
16:50 <clarkb> ya this appears to be working for me
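The spot check described above amounts to something like the following sketch; the key file names and bridge host name are assumptions:

    ssh-add -l                                        # list keys currently loaded in the agent
    ssh-add -d ~/.ssh/id_ed25519_old.pub              # drop the retired key from the agent
    ssh -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519 clarkb@bridge01.opendev.org true   # confirm the new key still gets in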
16:57 <clarkb> fungi: there is a gitea 1.21.10 upgrade change https://review.opendev.org/c/opendev/system-config/+/914292 up for review if you have a moment. I don't think there is a rush on landing that and you indicated a preference for taking it easy through the ptg which is fine by me
17:13 <clarkb> the LE job seems to still be failing with that same issue as yesterday, which is also likely why we got an alert today that nb02's cert expires in less than a month
17:13 <clarkb> it's interesting that nb02 is the node that also fails in the ansible log
17:15 <clarkb> ah ok this is likely the same issue we have had in the past that just mysteriously went away
17:16 <clarkb> oh! nb02 is just straight up failing but for whatever reason we progress forward
17:16 <clarkb> forward in ansible I mean. Which makes the error later a bit of a red herring /me goes to figure out why nb02 is sad
17:17 <clarkb> hrm though maybe those early failures get retried and eventually succeed. That would explain why it continues to proceed later
17:20 <clarkb> ok ya the issue is nb02's acme.sh install is in a modified state so the tasks to enforce the state we want are not running
17:21 <clarkb> then it gives up on nb02 for cert renewal but the certcheck domain list creation doesn't know that and then breaks. So we need to figure out why nb02 is in this state and whether or not to fix it further
17:26 <clarkb> acme is stored in /opt/ and /opt filled up on nb02
17:26 <clarkb> My hunch here is that the git operations we try to perform on that repo had a sad due to the disk being full and ended up in this state that causes ansible to bail out
17:27 <clarkb> I'm going to manually move that directory aside in /opt/ so that it can be further inspected/debugged but the next run of the LE playbooks should hopefully set us back into a working state
17:29 <clarkb> ya file timestamps seem to align with the timing of the disk filling
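The investigation and cleanup described above boil down to roughly the following sketch (run on nb02); the acme.sh checkout location follows the /opt discussion, while the cert path and domain are assumptions:

    df -h /opt                                       # confirm the filesystem filled up
    openssl x509 -noout -enddate -in /etc/letsencrypt-certs/nb02.opendev.org/nb02.opendev.org.cer   # check cert expiry (path assumed)
    sudo git -C /opt/acme.sh status                  # a dirty/broken checkout stops the role from running
    sudo mv /opt/acme.sh /opt/acme.sh.broken         # set it aside so the next LE run can start fresh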
17:30 <clarkb> #status log Reset acme.sh on nb02 as a full disk appears to have corrupted it
17:30 <opendevstatus> clarkb: finished logging
17:36 <opendevreview> Clark Boylan proposed opendev/system-config master: Add more LE debugging info to our Ansible role  https://review.opendev.org/c/opendev/system-config/+/915173
17:36 <clarkb> the next LE run (daily periodic I think) should correct the problem and ^ should make it slightly easier to debug in the future
17:42 <fungi> good find
17:43 <clarkb> I added that more explicit loop in order to log the node names because when this happened in the past it didn't even log that
17:43 <clarkb> we probably would've eventually found it but having the explicit "something wrong with nb02" led to finding it quicker
19:29 <opendevreview> Merged opendev/system-config master: Update gitea to v1.21.10  https://review.opendev.org/c/opendev/system-config/+/914292
19:45 <fungi> infra-prod-service-gitea is starting (or possibly already done, it's hard to tell with the current lag in zuul reporting)
19:46 <fungi> "/usr/local/bin/gitea web" process started on gitea14 at 19:39, 7 minutes ago
19:47 <fungi> https://opendev.org/ currently says "Powered by Gitea Version: v1.21.10" so i guess it's done
19:47 <fungi> browsing around, things look the same as always
19:49 <fungi> cloning openstack/nova seems to work fine, albeit slowly (but that's not uncommon for my isp unfortunately)
19:54 <fungi> yep, git clone completed without issue
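The verification steps above can be approximated with a quick sketch like this (the clone target path is arbitrary):

    curl -s https://opendev.org/ | grep -o 'Version: v[0-9.]*'          # footer should report v1.21.10
    git clone https://opendev.org/openstack/nova /tmp/nova-clone-test   # confirm cloning still works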
20:01 <fungi> openbsd 7.5 today. guess it's upgrade time
20:02 <Clark[m]> Oh heh I popped out for lunch and missed the upgrade. Thank you for pushing that along
20:02 <fungi> no sweat. it occurred without incident
20:06 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Upgrade Mailman's MariaDB to 10.11  https://review.opendev.org/c/opendev/system-config/+/915183
20:06 <fungi> we can probably do that ^ whenever
20:09 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Cleanup lingering Mailman 2 playbook  https://review.opendev.org/c/opendev/system-config/+/915184
20:17 <clarkb> I've skimmed the 6 gitea notes and they all report the expected version
20:18 <fungi> nodes? if so, yes i concur
20:19 <clarkb> yup I can't type
20:40 <opendevreview> Clark Boylan proposed opendev/system-config master: Add more LE debugging info to our Ansible role  https://review.opendev.org/c/opendev/system-config/+/915173
20:40 <opendevreview> Clark Boylan proposed opendev/system-config master: More completely disable ansible galaxy proxy testing  https://review.opendev.org/c/opendev/system-config/+/915185
20:40 <clarkb> fungi: ^ that galaxy proxy testing is why 915173 refused to pass
