clarkb | meeting time | 19:00 |
---|---|---|
ianw | o/ | 19:00 |
clarkb | somehow I am even mostly ready | 19:00 |
clarkb | it's been an entire day of meetings | 19:01 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue May 2 19:01:06 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JNR4PAHZ5JD272JXUC3BQUSZPLRIJYID/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | I didn't have any on the agenda. | 19:01 |
clarkb | #topic Migrating to Quay | 19:02 |
clarkb | Significant progress has been made here | 19:02 |
clarkb | Zuul and all of its images have been moved and are automatically publishing to quay now. We also updated our deployment tooling to pull zuul and friends from quay | 19:03 |
clarkb | Since then Zuul had its normal weekend restart and that all seemed to work (nodepool auto updated shortly after the changes landed) | 19:03 |
clarkb | the only thing I notice as potentially problematic is I think docker image prune is treating the docker hub images as something it should keep around by default so we may need to do manual cleanup at some point. This isn't urgent, but it is something to be aware of as we move things | 19:04 |
clarkb | On the opendev side of things my changes to auto create repos if necessary landed and I updated system-config with a second set of jobs that can be inherited from to move images to quay. I did this with zookeeper-statsd and that seems to work | 19:05 |
clarkb | Where this leaves us is making a plan and todo list for getting all of our images updated. In particular one thing that is annoying is that we have some images that rely on other images. We can either update these at the root first or at the leaves first | 19:05 |
clarkb | There are disadvantages and advantages to each approach. If we update the root first (python-base/python-builder in particular) then we would want to pretty quickly update all of their descendents to avoid potentially getting caught out if we had to make quick updates to the base images. | 19:06 |
clarkb | But if we do the root first we probably only need to do a single pass of image rebuilds and publication as the flow follows the direction of the dependencies | 19:07 |
clarkb | if we go the other direction and update the leaves first then we don't need to do things as quickly because we can update the base images and rebuild and pull from docker if necessary at any time | 19:07 |
clarkb | the downside to doing the leaves first is that we will want to update them to publish to quay first, then do the base images, then reupdate all the leaves again to pull from quay | 19:07 |
clarkb | I have two changes up for updating the root first | 19:08 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/881932 move base images to quay | 19:08 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/881933?usp=dashboard consume base images from quay | 19:08 |
clarkb | and one that updates a leaf | 19:08 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/881931?usp=dashboard | 19:08 |
clarkb | #undo | 19:09 |
opendevmeet | Removing item from minutes: #link https://review.opendev.org/c/opendev/system-config/+/881931?usp=dashboard | 19:09 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/881931?usp=dashboard Move ircbot to quay | 19:09 |
clarkb | The idea here was to illustrate both approaches | 19:09 |
clarkb | I think whatever approach we do we should write down a small plan with a todo list to not get lost in the ~40 something images that all need to be done | 19:09 |
clarkb | and then we should also consider setting aside a few days to really focus on getting as much of this done as possible so that the time period where we might have to debug both docker and quay things is minimized | 19:10 |
fungi | sounds good to me | 19:10 |
ianw | fwiw it feels like updating the base images first makes the most sense | 19:10 |
clarkb | ianw: ya I am starting to come around to that myself | 19:10 |
ianw | and then working through a checklist to deploy it | 19:10 |
fungi | except it could lock us out of making base image updates to the leaf images for the remainder of the transition | 19:11 |
clarkb | fungi: yes. That said there is an outlet. We could manually push what we have pushed to quay.io to docker hub | 19:11 |
fungi | but as long as that timeframe is reasonably short, i'm in favor of whichever path is the least effort | 19:11 |
clarkb | yesterday I wasn't considering that we could do that manual push and I was far more concerned about the time where updates to base images would be difficult | 19:11 |
ianw | yeah, i think the fact that we have updated the base images, and have a list of tasks to work through, puts a nice constraint on getting it done | 19:12 |
clarkb | but since then I've realized that this is a reasonable outlet and I think doing base images first is fine | 19:12 |
clarkb | in that case I will start putting together a document (etherpad most likely) with an overview of the plan and a todo list and links to ongoing work | 19:12 |
fungi | wfm | 19:12 |
clarkb | cool. The only other thing I wanted to mention (and this should go on the todo list) is we will need to resync some of the images from docker hub to quay.io manually before updating them as a few have had updates pushed to docker since I did the initial sync | 19:13 |
ianw | ++ sounds good. i can probably help as i have somewhere a fairly recent list of all the images | 19:13 |
clarkb | not a huge deal. Just a step on the todo list | 19:13 |
ianw | at one point, i was trying to automate getting them into a .dot file for graphical view, i can't remember why | 19:14 |
clarkb | we have a plan to make a plan. I'll take it. Hopefully I'll have something to share tomorrow. Seems unlikely I'll get anywhere near done today | 19:15 |
clarkb | Anything else related to quay? | 19:15 |
clarkb | #topic Bastion Host Updates | 19:16 |
clarkb | Still just need reviews on the bridge backup thing | 19:16 |
clarkb | #link https://review.opendev.org/q/topic:bridge-backups | 19:16 |
clarkb | We've also seen some connectivity errors from bridge to various nodes randomly but I don't think that is anything to do with us | 19:16 |
ianw | yeah i have no idea what's up with that | 19:17 |
ianw | it was worrying it happened both on dns deploy changes, but in both cases it was nowhere near anything to do with dns | 19:17 |
clarkb | Probably just something to monitor and if it persists or gets worse we can bring it up with our network/cloud providers | 19:17 |
ianw | in both cases it was rax/dfw -> rax/dfw, what you'd think would be the most reliable | 19:18 |
fungi | there were reports of connectivity issues potentially to releases.openstack.org (static02.opendev.org) from job nodes earlier too, so i wonder if rax-dfw is having some network connectivity issues | 19:18 |
clarkb | fungi: that was likely the gitea thing though? | 19:18 |
clarkb | I suppose they could be separate issues | 19:18 |
ianw | istr periods where we've had weird ipv6 dropouts. but with ansible we only use ipv4 in inventory | 19:19 |
fungi | hard to know. the releases.o.o url in question is a redirect to opendev.org gitea, but it was unclear whether the problem was before or after redirecting | 19:19 |
fungi | the job logs weren't precise enough to differentiate | 19:20 |
clarkb | ya if that persists after the UA filter update I guess we can dig deeper | 19:20 |
clarkb | #topic Mailman 3 | 19:20 |
clarkb | fungi: any progress with the held test node? | 19:21 |
fungi | none, sorry :/ | 19:22 |
clarkb | #topic Gerrit Updates | 19:22 |
fungi | i ned to prioritize it | 19:22 |
fungi | er, need | 19:22 |
clarkb | ack | 19:22 |
clarkb | The acl updates landed and we should be all set there. However fungi noticed some behavior of the tool that does normalization that might need updating | 19:22 |
clarkb | fungi: do you have links to those changes? | 19:22 |
fungi | #link https://review.opendev.org/882075 Add an "apply" transformation which applies all | 19:23 |
fungi | #link https://review.opendev.org/882080 Make option indenting a selectable transformation | 19:23 |
clarkb | thanks | 19:24 |
fungi | it mostly came up in reviewing recent project additions where the authors were struggling to figure out how to make their editors indent with tab characters | 19:24 |
clarkb | Neither will change the output we already applied but are good updates to making the tool useable | 19:24 |
clarkb | fungi: you hit the tab button on the keyboard :P | 19:25 |
fungi | once those merge, we can recommend a command in our documentation to reformat acls | 19:25 |
ianw | ctl-q tab :) | 19:25 |
fungi | i was going to say "oh just run this and it'll reformat your acl file" but it wasn't as trivial as i remembered | 19:25 |
ianw | didn't the final pass write it out with tabs? | 19:26 |
fungi | i was disappointed that my prediction about people not having a basic grasp of how to use their text editors was accurate | 19:27 |
clarkb | ianw: the changes above basically showed that it was always running in noop mode which only prints not writes to files | 19:27 |
clarkb | ianw: the fix is to have it also write to files so people don't need to have editors that work :) | 19:27 |
clarkb | For the gerrit replication plugin leaking files I have not seen any movement on the bugs I filed and I have not had time to dig into trying to write fixes myself | 19:27 |
fungi | ianw: 882080 is just to make the tab indenting consistent with the other transformation rules. 882075 is about making it easier to reformat an acl from the command line | 19:27 |
clarkb | I'm somewhat scared to look at how many files are on disk now | 19:28 |
ianw | ok, i guess that makes sense. the tox job would spit out a diff file | 19:28 |
clarkb | I am strongly considering that we revert the change to bind mount that directory now | 19:28 |
fungi | well, the tox job already spits out a diff file | 19:28 |
clarkb | then when we restart gerrit we'll clear it out automatically. Far less than ideal but I'm beginning to think that might be better than my hacky solution | 19:29 |
fungi | but yes, it's hard for someone who doesn't actually know the difference between spaces and tabs, or possibly doesn't know how to translate those terms into their native tongue, to know what the diff is telling them to do | 19:29 |
clarkb | but if I can get a second reviewer on my hacky solution we can revisit a revert afterwards | 19:29 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk | 19:29 |
ianw | fungi: sure -- i guess i don't have an issue with it actually rewriting the file, and then us telling people "run this and check in the result" | 19:30 |
clarkb | I think the only real risk with the replication leas is if we cause ext4 to run out of inodes | 19:30 |
clarkb | *replication leaks. So this is not super urgent but something we should eventually address | 19:31 |
clarkb | And the last gerrit related item is we dropped 3.6 image builds. Added 3.8 images. And have a 3.7 to 3.8 upgrade job running | 19:31 |
clarkb | this has already led to fixing a couple of UI issues with 3.8 | 19:31 |
fungi | ianw: today i saw an acl go through 6 patchsets and the author is still struggling to get the whitespace right | 19:31 |
clarkb | Anything else gerrit related? | 19:33 |
fungi | none on my end | 19:33 |
ianw | nope | 19:33 |
clarkb | #topic Upgrading Servers | 19:33 |
clarkb | Etherpad server cleanup happened so etherpad is done done at this point | 19:33 |
clarkb | Nameserver migration stuff happened as well (thank you ianw!) | 19:33 |
clarkb | all four of our zones/domains should be running off of new servers with a new authoritative server as well | 19:34 |
clarkb | I think the remaining tasks are cleaning up the old server? | 19:34 |
clarkb | *old servers | 19:34 |
clarkb | ianw: anything we should be helping with to finish this completely? | 19:35 |
ianw | nope, i can remove the old servers now, and there's one zone change to remove the ns1/ns2/adns1 records | 19:35 |
ianw | the remaining todos were AAAA glue records for opendev.org and rdns for the vexxhost server | 19:35 |
clarkb | https://review.opendev.org/c/opendev/zone-opendev.org/+/881935 | 19:36 |
ianw | i've logged a ticket about the rdns, so we'll see what happens | 19:36 |
clarkb | #link https://review.opendev.org/c/opendev/zone-opendev.org/+/881935 Cleanup old dns server records | 19:36 |
clarkb | I can mention to the foundation registrar intermediary that we want AAAA glue records | 19:37 |
clarkb | fungi: assuming you don't have objections or would prefer to do it to help ensure they don't do the wrong thing at the registrar | 19:37 |
fungi | no objections | 19:37 |
fungi | i don't know how to phrase it any better to avoid confusion, i'd just be prepared for confusion | 19:38 |
clarkb | I figure something like "The ns03.opendev.org and ns04.opendev.org servers have ipv4 and ipv6 addresses. A glue records were added for both but not AAAA. Is that something we can add?" | 19:39 |
ianw | we did have them before, so there's that | 19:39 |
fungi | oh, we did? | 19:40 |
clarkb | Ok I'll reach out later today | 19:40 |
fungi | i wasn't sure since i hadn't paid close enough attention before we changed them | 19:40 |
clarkb | The last item I wanted to note on this topic is we have more servers to upgrade. All at lower priority than what we've already done but still worth doing | 19:41 |
clarkb | I'm hoping to keep pushing on that here and there and help is much appreciated | 19:41 |
clarkb | #topic AFS volume utilization | 19:42 |
clarkb | Grafana reports utilization has jumped again | 19:42 |
clarkb | I suspect the growth is simply not as nicely linear day to day as I would like | 19:42 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/881952 add bookworm mirrors | 19:42 |
clarkb | This change is going to be effectively blocked on us freeing space or adding space though | 19:42 |
clarkb | (I left a note on the change but didn't -1 or -W) | 19:42 |
fungi | yeah, i saw that go by and thought the same. thanks for calling it out | 19:43 |
clarkb | I feel like with everything else going on I have little time to devote to this as it isn't urgent quite yet. But something that will need to be addressed | 19:44 |
ianw | fungi: maybe you could look at https://review.opendev.org/c/opendev/system-config/+/879239 to confirm it's not insane and we can prune the wheels | 19:44 |
fungi | can do | 19:44 |
clarkb | and ya starting with cleanups like that is a good starting point and we can take it from there | 19:44 |
ianw | fedora has now become a move from 36 -> 38 situation | 19:44 |
fungi | and yeah, the bookworm release is still a ways out, but at least there's a date now | 19:44 |
ianw | i'm not sure how much time i'll have to drive that | 19:44 |
clarkb | ianw: one idea I had was whether or not centos 9 stream is sufficiently up to date that fedora is maybe less important? | 19:45 |
clarkb | ianw: but I'm not plugged into how those distros are being used well enough to know if that is the case | 19:45 |
clarkb | naively they seem to fit into a similar space for CI needs anyway | 19:45 |
ianw | heh, yeah, i was about to say we might want to consider the future of it | 19:45 |
clarkb | maybe we should write an email to the service-discuss list about it | 19:46 |
clarkb | to solicit feedback from users to see how that might impact them | 19:46 |
clarkb | I can write and send that if we think it is a good idea | 19:46 |
ianw | it was mostly for devstack; it's quite a lot of overhead in zuul-jobs | 19:46 |
fungi | i'm happy to put a bit of pressure on openstack to justify the continued use of fedora nodes too | 19:47 |
clarkb | ok I can draft that email | 19:47 |
clarkb | I'll be sure to get it looked over before sending just to avoid saying anything obviously incorrect | 19:47 |
clarkb | #topic Gitea 1.19 | 19:48 |
ianw | ++ | 19:48 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/877541 Upgrade to gitea 1.19.2 | 19:48 |
clarkb | I think gitea 1.19 is ready to go when we are | 19:48 |
clarkb | The "api requires auth to list orgs" bug was fixed and I updated our ansible to reflect this. | 19:48 |
tonyb | I can help with the Fedora update and also the CentOS determination | 19:48 |
clarkb | I did drop a no_log: true in that update since I dropped auth as well. But double check me on that being safe please | 19:48 |
clarkb | tonyb: cool I'll ping you with the email draft too to ensure it makes sense from your perspective and we can edit or decide to hold off if necessary | 19:49 |
clarkb | #link https://158.69.65.228:3081/opendev/system-config Held test node for checking | 19:49 |
clarkb | this is a held node running the deployment of 1.19.2 from the above change which can be used for checking it looks good | 19:49 |
tonyb | clarkb: Sounds good | 19:49 |
clarkb | My afternoon is going to be pretty busy bouncing around errands and parenting today but I'm happy to watch that deploy tomorrow if I get the necessary reviews today | 19:50 |
clarkb | The major change is the addition of gitea actions which we turn off. This makes it a much simpler upgrade compared to 1.18 | 19:51 |
clarkb | #topic Storyboard | 19:51 |
clarkb | fungi I think you've continued to do some work helping projects gracefully turn stuff off | 19:51 |
clarkb | anything new to add on this? (I think frickler is out for a few weeks right now too) | 19:52 |
fungi | not really, just making sure we're consistently setting projects to inactive if they move off sb | 19:52 |
fungi | and could stand to do an audit between projects.yaml and the sb database | 19:52 |
fungi | to see what we've missed (moves to lp and general retirements) | 19:53 |
clarkb | storyboard doesn't have the gerrit issue of marking a project read only making it difficult to undo does it? | 19:53 |
fungi | nah | 19:53 |
clarkb | Mostly just curious I doubt it would be an issue anyway | 19:53 |
fungi | however, it never added a ui widget to toggle the active field, so has to be done with the cli | 19:53 |
clarkb | fungi: is that documented somewhere? | 19:54 |
clarkb | just thinking it would be good to be able to have other people do it if you go on vacation etc | 19:54 |
fungi | it can be | 19:54 |
fungi | but it's also not an urgent thing to change | 19:54 |
fungi | i'll try to remember to push up some docs, or better still an audit script to go with it | 19:54 |
fungi | mainly just stops the project from getting used in autocomplete for fields and turning up in searches, i think | 19:55 |
clarkb | makes sense | 19:56 |
clarkb | thanks! | 19:56 |
clarkb | #topic Open Discussion | 19:56 |
clarkb | Anything else? | 19:56 |
clarkb | As briefly mentioned earlier my afternoon is going to involve me being in and out. | 19:56 |
fungi | i'll be out a lot over the next few days | 19:57 |
fungi | but not gone entirely. some errands, long lunches, and a half day on friday | 19:57 |
clarkb | Sounds like that may be everything. Thank you for your time. We'll be back here next week | 19:58 |
clarkb | enjoy your morning/afternoon/evening! | 19:58 |
clarkb | #endmeeting | 19:58 |
opendevmeet | Meeting ended Tue May 2 19:58:37 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:58 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-05-02-19.01.html | 19:58 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-05-02-19.01.txt | 19:58 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-05-02-19.01.log.html | 19:58 |
fungi | thanks clarkb! | 19:58 |
ianw | thanks! | 19:59 |
tonyb | Thanks all | 19:59 |
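
The root-first versus leaves-first question from the "Migrating to Quay" topic comes down to walking the image dependency graph in one direction or the other. The following is a minimal sketch of the root-first ordering, not OpenDev's actual tooling. The `PARENTS` map is hypothetical: the image names appear in the meeting, but which leaf builds from which base image is assumed here purely for illustration, and the helper names are made up.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: image -> set of images it is built from.
# The parent relationships are assumptions, not taken from system-config.
PARENTS = {
    "python-base": set(),
    "python-builder": set(),
    "ircbot": {"python-base", "python-builder"},
    "zookeeper-statsd": {"python-base", "python-builder"},
}


def root_first_order(parents):
    """Order images so each one comes after every image it builds from."""
    return list(TopologicalSorter(parents).static_order())


def leaves(parents):
    """Return the images that no other image builds from, sorted by name."""
    used_as_parent = set().union(*parents.values())
    return sorted(set(parents) - used_as_parent)


if __name__ == "__main__":
    print("root-first migration order:", root_first_order(PARENTS))
    print("leaf images:", leaves(PARENTS))
```

With a real image list in place of `PARENTS`, the root-first order gives a single pass through the todo list, while `leaves` identifies the images that would need two updates under the leaves-first approach.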