clarkb | Almost meeting time. I suspect it might be another lightly attended quicker one due to the PTG and PTO for some folks | 19:00 |
---|---|---|
clarkb | heh by the time I actually typed that the clock had ticked over to meeting time | 19:00 |
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Oct 24 19:00:36 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/5OF3FQDLDDDDH7XQAJXMRBA6EEAAUBQY/ Our Agenda | 19:00 |
frickler | I'm around-ish | 19:00 |
clarkb | #topic Announcements | 19:01 |
clarkb | As just mentioned the PTG is happening this week | 19:01 |
clarkb | I think many of us have been participating so no surprises, but please do keep that in mind if you are making changes to etherpad/meetpad | 19:01 |
clarkb | #topic Mailman 3 | 19:01 |
clarkb | I don't think fungi is here today so I'll try to recap as best I can | 19:01 |
clarkb | Since the last meeting the RH email problems have been resolved. We didn't make any changes on our end | 19:02 |
clarkb | The change to remove lists.openstack.org from our inventory has merged | 19:02 |
clarkb | The next step will be to snapshot the old server and shut it down. Before doing that I think we should extract the latest kernel and put it in place to ensure that snapshot is bootable | 19:03 |
clarkb | There is still the question of adding MX records. I think we can proceed with that if we choose to, but it doesn't seem directly related to the mm3 migration (we didn't have them before, don't have them now, and stuff seems to generally work) | 19:03 |
clarkb | On the usage side mm3 seems to be working. I haven't seen any major complaints | 19:04 |
clarkb | One thing I noticed today is that mm3 doesn't indicate who the various list admins are for lists like mm2 did | 19:05 |
clarkb | But that is a minor thing | 19:05 |
clarkb | #topic LE Certcheck List Building Failures | 19:05 |
clarkb | Getting `'ansible.vars.hostvars.HostVarsVars object' has no attribute 'letsencrypt_certcheck_domains'` | 19:05 |
clarkb | This happens less than 100% of the time. I suspect this is an ansible problem | 19:05 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/898475 Changes to LE roles to improve debugging | 19:05 |
clarkb | this change is intended to improve our logging of the situation so that we can debug it better | 19:06 |
clarkb | reviews welcome. Also Ansible 8 may be the solution | 19:06 |
clarkb | #topic Ansible 8 Upgrade on Bridge | 19:06 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/898505 will update us to Ansible 8 on Bridge. Should be merged when we can monitor | 19:06 |
clarkb | I'd like to merge this when fungi is back from PTO | 19:06 |
clarkb | mostly so that we can ensure the largest set of eyeballs is present when that happens | 19:06 |
frickler | +1 | 19:07 |
clarkb | consider this a heads up that we'll be doing that upgrade soon. The upgrade on the zuul side went well as did system-config-run-* testing with ansible 8 so it's probably fine but good to be careful with global changes like this | 19:07 |
clarkb | #topic Server Upgrades | 19:08 |
clarkb | No movement on this recently | 19:08 |
clarkb | #topic Python container updates | 19:08 |
clarkb | #link https://review.opendev.org/q/(+topic:bookworm-python3.11+OR+hashtag:bookworm+)status:open | 19:08 |
clarkb | Next up here is removing builds of our python3.9 images | 19:08 |
clarkb | I think https://review.opendev.org/c/opendev/system-config/+/898480 should be an easy review to do that removal if anyone has time | 19:08 |
clarkb | once that is done I'll see about syncing with osc and zuul-operator folks to figure out migrations of those images | 19:09 |
clarkb | #topic Gitea 1.21 Upgrade | 19:10 |
clarkb | There is an rc2 for this release now, but still not seeing a changelog | 19:10 |
clarkb | I guess I should update https://review.opendev.org/c/opendev/system-config/+/897679/3/docker/gitea/Dockerfile to rc2 and see if our testing complains | 19:11 |
clarkb | #topic Linaro Cloud SSL Cert Updates | 19:11 |
clarkb | As noted recently our zuul service was not marking jobs NODE_FAILURE for invalid labels because the linaro cloud had an invalid cert and processing for that cloud couldn't progress far enough to register node denials | 19:12 |
frickler | is that something that should be improved in nodepool? | 19:13 |
clarkb | I sent email to kevinz and his response was to quote an email ianw had sent about this in the past. Basically we have access to the server and can run acme.sh to reprovision LE certs and then run kolla-ansible to apply the cert to the system | 19:13 |
clarkb | frickler: possibly. If the cloud is broken due to certs then I could see that being an implied denial | 19:13 |
clarkb | others may argue that issues like that may be temporary and it is better to have the cloud continue to retry? | 19:13 |
clarkb | I think my preference would be to go ahead and fail faster, but we should see if other nodepool users agree | 19:14 |
frickler | IMO at least after some timeout (couple of hours?) the request should fail | 19:14 |
clarkb | I went through the process kevinz shared and reprovisioned things and now all is well. | 19:14 |
frickler | getting jobs stuck for days isn't nice | 19:14 |
clarkb | I want to note that our access is via bridge's root key as we don't have our users created on that server | 19:15 |
frickler | ah, I was just about to ask how to access it | 19:15 |
clarkb | and I stuck a text document with the process on the server itself. Easier to find there than in email I think. But maybe that should go into our actual docs somewhere? | 19:15 |
clarkb | oh and our monitoring of the name for ssl cert expiry used the wrong name. That has been fixed so we'll get reminders in two months to redo this | 19:16 |
frickler | depends on whether we can make progress with automating it. if we need to do this ourselves now every 2-3 months, that may increase the motivation to do so | 19:16 |
clarkb | ya ianw was working on that before other tasks took over | 19:16 |
clarkb | I think it is doable | 19:16 |
clarkb | but needs some work | 19:17 |
clarkb | I think that might look like a daily job that runs acme.sh, copies files if they change, then runs kolla-ansible if there are updates | 19:17 |
clarkb | I don't think we want to integrate it into our existing acme.sh runs because the infrastructure for the domain is all different | 19:17 |
clarkb | would just get ugly doing that and instead we could do the naive simple thing | 19:18 |
clarkb | #topic Gerrit 3.8 Upgrade Planning | 19:19 |
clarkb | This is the major task I'm trying to drive to completion right now | 19:19 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.8 | 19:19 |
clarkb | Good news is this upgrade looks straightforward. There are no schema changes and we can trivially downgrade with an offline reindex if necessary | 19:20 |
clarkb | That said there are a number of breaking changes in the release notes that I'm slowly working through in that etherpad to sort out if they affect us before we take the plunge | 19:20 |
clarkb | The current one is https://review.opendev.org/c/opendev/system-config/+/898989 | 19:20 |
clarkb | I had set up autoholds for that yesterday before calling it a day but they all failed on the ruamel thing so I had to recycle those and should have new holds soon | 19:21 |
clarkb | I want to make sure that commentlink changes don't come with any behavior differences, then we can update our config on 3.7 first | 19:21 |
clarkb | and then it will be on to the next thing in the list | 19:21 |
clarkb | As for the upgrade itself I'm thinking November 17 or December 1. They are both Fridays. November 10 doesn't work as I'm out that day. November 24 is part of the US Thanksgiving holiday which is a conflict too | 19:22 |
clarkb | by next week I should have an idea if November 17 is doable and we can announce it at that point | 19:22 |
clarkb | #topic Open Discussion | 19:23 |
clarkb | That was all I had on the proper agenda. | 19:23 |
clarkb | Worth noting we capped ruamel.yaml in our bridge ansible installs because ARA doesn't work with the latest versions | 19:23 |
clarkb | this broke system-config-run-* jobs and led to my failed node holds for gerrit the first time around | 19:23 |
clarkb | Anything else? | 19:24 |
frickler | not from me | 19:24 |
clarkb | thank you for your time today and for all the help keeping things running | 19:25 |
clarkb | As usual feel free to bring things up outside the meeting in #opendev or on the mailing list | 19:26 |
clarkb | enjoy the PTG! hope it is a productive week for everyone | 19:26 |
clarkb | #endmeeting | 19:26 |
opendevmeet | Meeting ended Tue Oct 24 19:26:30 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:26 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-24-19.00.html | 19:26 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-24-19.00.txt | 19:26 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-24-19.00.log.html | 19:26 |
frickler | thx clarkb | 19:26 |
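
The daily job clarkb describes at 19:17 (run acme.sh, copy files if they change, then run kolla-ansible if there are updates) could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not what was deployed: the function names `sync_if_changed` and `daily_cert_job` are hypothetical, and the renewal and deploy commands are deliberately left as caller-supplied argument lists because the log does not give the actual acme.sh or kolla-ansible invocations.

```python
import filecmp
import shutil
import subprocess
from pathlib import Path


def sync_if_changed(src: Path, dst: Path) -> bool:
    """Copy src over dst only when contents differ; return True if a copy happened."""
    if dst.exists() and filecmp.cmp(src, dst, shallow=False):
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return True


def daily_cert_job(renew_cmd, deploy_cmd, file_pairs) -> bool:
    """Run the renewal command, sync cert files, and deploy only on change.

    renew_cmd and deploy_cmd are argument lists chosen by the caller
    (assumption: they wrap acme.sh and kolla-ansible respectively).
    file_pairs is an iterable of (source, destination) paths.
    Returns True when the deploy command was run.
    """
    # Step 1: let the ACME client renew if it decides renewal is due.
    subprocess.run(renew_cmd, check=True)
    # Step 2: copy every file whose content changed (no short-circuit,
    # so the certificate and key are always kept in step).
    changed = [sync_if_changed(Path(s), Path(d)) for s, d in file_pairs]
    # Step 3: only touch the cloud when something actually changed.
    if any(changed):
        subprocess.run(deploy_cmd, check=True)
        return True
    return False
```

Because nothing is deployed on days when the files are byte-identical, a job like this could run daily without restarting services unnecessarily, which matches the "naive simple thing" suggested in the discussion.
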
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!