Tuesday, 2023-10-24

clarkbAlmost meeting time. I suspect it might be another lightly attended quicker one due to the PTG and PTO for some folks19:00
clarkbheh by the time I actually typed that the clock had ticked over to meeting time19:00
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Oct 24 19:00:36 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/5OF3FQDLDDDDH7XQAJXMRBA6EEAAUBQY/ Our Agenda19:00
fricklerI'm around-ish19:00
clarkb#topic Announcements19:01
clarkbAs just mentioned the PTG is happening this week19:01
clarkbI think many of us have been participating so no surprises, but please do keep that in mind if you are making changes to etherpad/meetpad19:01
clarkb#topic Mailman 319:01
clarkbI don't think fungi is here today so I'll try to recap as best I can19:01
clarkbSince the last meeting the RH email problems have been resolved. We didn't make any changes on our end19:02
clarkbThe change to remove lists.openstack.org from our inventory has merged19:02
clarkbThe next step will be to snapshot the old server and shut it down. Before doing that I think we should extract the latest kernel and put it in place to ensure that snapshot is bootable19:03
clarkbThere is still the question of adding MX records. I think we can proceed with that if we choose to but doesn't seem directly related to the mm3 migration (we didn't have them before, don't have them now, and stuff seems to generally work)19:03
clarkbOn the usage side mm3 seems to be working. I haven't seen any major complaints19:04
clarkbOne thing I noticed today is that mm3 doesn't indicate who the various list admins are for lists like mm2 did19:05
clarkbBut that is a minor thing19:05
clarkb#topic LE Certcheck List Building Failures19:05
clarkbGetting `'ansible.vars.hostvars.HostVarsVars object' has no attribute 'letsencrypt_certcheck_domains'`19:05
clarkbThis happens less than 100% of the time. I suspect this is an ansible problem19:05
clarkb#link https://review.opendev.org/c/opendev/system-config/+/898475 Changes to LE roles to improve debugging19:05
clarkbthis change is intended to improve our logging of the situation so that we can debug it better19:06
clarkbreviews welcome. Also Ansible 8 may be the solution19:06
clarkb#topic Ansible 8 Upgrade on Bridge19:06
clarkb#link https://review.opendev.org/c/opendev/system-config/+/898505 will update us to Ansible 8 on Bridge. Should be merged when we can monitor19:06
clarkbI'd like to merge this when fungi is back from PTO19:06
clarkbmostly so that we can ensure the most eyeballs are present when that happens19:06
frickler+119:07
clarkbconsider this a heads up that we'll be doing that upgrade soon. The upgrade on the zuul side went well as did system-config-run-* testing with ansible 8 so it's probably fine but good to be careful with global changes like this19:07
clarkb#topic Server Upgrades19:08
clarkbNo movement on this recently19:08
clarkb#topic Python container updates19:08
clarkb#link https://review.opendev.org/q/(+topic:bookworm-python3.11+OR+hashtag:bookworm+)status:open19:08
clarkbNext up here is removing builds of our python3.9 images19:08
clarkbI think https://review.opendev.org/c/opendev/system-config/+/898480 should be an easy review to do that removal if anyone has time19:08
clarkbonce that is done I'll see about syncing with osc and zuul-operator folks to figure out migrations of those images19:09
clarkb#topic Gitea 1.21 Upgrade19:10
clarkbThere is an rc2 for this release now, but still not seeing a changelog19:10
clarkbI guess I should update https://review.opendev.org/c/opendev/system-config/+/897679/3/docker/gitea/Dockerfile to rc2 and see if our testing complains19:11
clarkb#topic Linaro Cloud SSL Cert Updates19:11
clarkbAs noted recently our zuul service was not marking jobs NODE_FAILURE for invalid labels because the linaro cloud had an invalid cert and processing for that cloud couldn't progress far enough to register node denials19:12
frickleris that something that should be improved in nodepool?19:13
clarkbI sent email to kevinz and his response was to quote an email ianw had sent about this in the past. Basically we have access to the server and can run acme.sh to reprovision LE certs and then run kolla-ansible to apply the cert to the system19:13
clarkbfrickler: possibly. If the cloud is broken due to certs then I could see that being an implied denial19:13
clarkbothers may argue that issues like that may be temporary and it is better to have the cloud continue to retry?19:13
clarkbI think my preference would be to go ahead and fail faster, but we should see if other nodepool users agree19:14
fricklerIMO at least after some timeout (couple of hours?) the request should fail19:14
clarkbI went through the process kevinz shared and reprovisioned things and now all is well.19:14
fricklergetting jobs stuck for days isn't nice19:14
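The timeout idea raised above can be sketched as a small decision function. This is only an illustration of the proposal, not nodepool code: the function name, its arguments, and the two-hour default are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def should_fail_request(first_failure, now, timeout=timedelta(hours=2)):
    """Decide whether a node request should be failed instead of retried.

    Hypothetical helper, not part of nodepool. first_failure is when the
    provider first started erroring (None if it is healthy), and timeout
    is the grace period during which retrying is still preferred.
    """
    if first_failure is None:
        # Provider is healthy, nothing to fail.
        return False
    # Fail only once the provider has been broken for the whole grace period.
    return now - first_failure >= timeout
```

With a grace period like this, a short outage still gets retried, while a cloud that stays broken for longer than the timeout stops holding jobs in the queue.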
clarkbI want to note that our access is via bridge's root key as we don't have our users created on that server19:15
fricklerah, I was just about to ask how to access it19:15
clarkband I stuck a text document with the process on the server itself. Easier to find there than in email I think. But maybe that should go into our actual docs somewhere?19:15
clarkboh and our monitoring of the name for ssl cert expiry used the wrong name. That has been fixed so we'll get reminders in two months to redo this19:16
fricklerdepends on whether we can make progress with automating it. if we need to do this ourselves now every 2-3 months, that may increase the motivation to do so19:16
clarkbya ianw was working on that before other tasks took over19:16
clarkbI think it is doable19:16
clarkbbut needs some work19:17
clarkbI think that might look like a daily job that runs acme.sh, copies files if they change, then runs kolla-ansible if there are updates19:17
clarkbI don't think we want to integrate it into our existing acme.sh runs because the infrastructure for the domain is all different19:17
clarkbwould just get ugly doing that and instead we could do the naive simple thing19:18
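The "naive simple thing" described above (renew, copy files only if they changed, redeploy only on updates) could look roughly like the sketch below. It is not the process documented on the server: renew_cmd and deploy_cmd stand in for whatever acme.sh and kolla-ansible invocations that process actually uses, and cert_pairs is a hypothetical list of (source, destination) paths.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def sync_if_changed(src, dest):
    """Copy src over dest only when contents differ; return True if copied."""
    src_digest = file_digest(src)
    if src_digest is None:
        raise FileNotFoundError(str(src))
    if src_digest == file_digest(dest):
        return False
    Path(dest).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return True


def renew_and_deploy(renew_cmd, deploy_cmd, cert_pairs, run=subprocess.run):
    """Run the renewal, sync cert files, and deploy only if something changed.

    renew_cmd and deploy_cmd are placeholders (argument lists) for the
    site-specific acme.sh and kolla-ansible commands; they are assumptions
    here, not real invocations. Returns True when a deploy was triggered.
    """
    run(renew_cmd, check=True)
    # Build the full list first so every file is synced before deciding.
    changed = [sync_if_changed(src, dest) for src, dest in cert_pairs]
    if any(changed):
        run(deploy_cmd, check=True)
        return True
    return False
```

Comparing file digests keeps the daily run cheap: on most days the renewal is a no-op, nothing is copied, and the deploy step is skipped entirely.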
clarkb#topic Gerrit 3.8 Upgrade Planning19:19
clarkbThis is the major task I'm trying to drive to completion right now19:19
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.819:19
clarkbGood news is this upgrade looks straightforward. There are no schema changes and we can trivially downgrade with an offline reindex if necessary19:20
clarkbThat said there are a number of breaking changes in the release notes that I'm slowly working through in that etherpad to sort out if they affect us before we take the plunge19:20
clarkbThe current one is https://review.opendev.org/c/opendev/system-config/+/89898919:20
clarkbI had set up autoholds for that yesterday before calling it a day but they all failed on the ruamel thing so I had to recycle those and should have new holds soon19:21
clarkbI want to make sure that commentlink changes don't come with any behavior differences then we can update our config on 3.7 first19:21
clarkband then it will be on to the next thing in the list19:21
clarkbAs for the upgrade itself I'm thinking November 17 or December 1. They are both Fridays. November 10 doesn't work as I'm out that day. November 24 is part of the US Thanksgiving holiday which is a conflict too19:22
clarkbby next week I should have an idea if November 17 is doable and we can announce it at that point19:22
clarkb#topic Open Discussion19:23
clarkbThat was all I had on the proper agenda.19:23
clarkbWorth noting we capped ruamel.yaml in our bridge ansible installs because ARA doesn't work with the latest versions19:23
clarkbthis broke system-config-run-* jobs and led to my failed node holds for gerrit the first time around19:23
clarkbAnything else?19:24
fricklernot from me19:24
clarkbthank you for your time today and for all the help keeping things running19:25
clarkbAs usual feel free to bring things up outside the meeting in #opendev or on the mailing list19:26
clarkbenjoy the PTG! hope it is a productive week for everyone19:26
clarkb#endmeeting19:26
opendevmeetMeeting ended Tue Oct 24 19:26:30 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:26
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-24-19.00.html19:26
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-24-19.00.txt19:26
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-24-19.00.log.html19:26
fricklerthx clarkb 19:26
