Tuesday, 2023-05-02

clarkbmeeting time19:00
ianwo/19:00
clarkbsomehow I am even mostly ready19:00
clarkbit's been an entire day of meetings19:01
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue May  2 19:01:06 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JNR4PAHZ5JD272JXUC3BQUSZPLRIJYID/ Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI didn't have any on the agenda.19:01
clarkb#topic Migrating to Quay19:02
clarkbSignificant progress has been made here19:02
clarkbZuul and all of its images have been moved and are automatically publishing to quay now. We also updated our deployment tooling to pull zuul and friends from quay19:03
clarkbSince then Zuul had its normal weekend restart and that all seemed to work (nodepool auto updated shortly after the changes landed)19:03
clarkbthe only thing I notice as potentially problematic is I think docker image prune is treating the docker hub images as something it should keep around by default so we may need to do manual cleanup at some point. This isn't urgent though but something to be aware of as we move things19:04
clarkbOn the opendev side of things my changes to auto create repos if necessary landed and I updated system-config with a second set of jobs that can be inherited from to move images to quay. I did this with zookeeper-statsd and that seems to work19:05
clarkbWhere this leaves us is making a plan and todo list for getting all of our images updated. In particular one thing that is annoying is that we have some images that rely on other images. We can either update these at the root first or at the leaves first19:05
clarkbThere are disadvantages and advantages to each approach. If we update the root first (python-base/python-builder in particular) then we would want to pretty quickly update all of their descendants to avoid potentially getting caught out if we had to make quick updates to the base images.19:06
clarkbBut if we do the root first we probably only need to do a single pass of image rebuilds and publication as the flow follows the direction of the dependencies19:07
clarkbif we go the other direction and update the leaves first then we don't need to do things as quickly because we can update the base images and rebuild and pull from docker if necessary at any time19:07
clarkbthe downside to doing the leaves first is that we will want to update them to publish to quay first, then do the base images, then reupdate all the leaves again to pull from quay19:07
clarkbI have two changes up for updating the root first19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/881932 move base images to quay19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/881933?usp=dashboard consume base images from quay19:08
clarkband one that updates a leaf19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/881931?usp=dashboard19:08
clarkb#undo19:09
opendevmeetRemoving item from minutes: #link https://review.opendev.org/c/opendev/system-config/+/881931?usp=dashboard19:09
clarkb#link https://review.opendev.org/c/opendev/system-config/+/881931?usp=dashboard Move ircbot to quay19:09
clarkbThe idea here was to illustrate both approaches19:09
clarkbI think whatever approach we do we should write down a small plan with a todo list to not get lost in the ~40 something images that all need to be done19:09
clarkband then we should also consider setting aside a few days to really focus on getting as much of this done as possible so that the time period where we might have to debug both docker and quay things is minimized19:10
fungisounds good to me19:10
ianwfwiw it feels like updating the base images first makes the most sense19:10
clarkbianw: ya I am starting to come around to that myself19:10
ianwand then working through a checklist to deploy it 19:10
fungiexcept it could lock us out of making base image updates to the leaf images for the remainder of the transition19:11
clarkbfungi: yes. That said there is an outlet. We could manually push what we have pushed to quay.io to docker hub19:11
fungibut as long as that timeframe is reasonably short, i'm in favor of whichever path is the least effort19:11
clarkbyesterday I wasn't considering that we could do that manual push and I was far more concerned about the time where updates to base images would be difficult19:11
ianwyeah, i think the fact that we have updated the base images, and have a list of tasks to work through, puts a nice constraint on getting it done19:12
clarkbbut since then I've realized that this is a reasonable outlet and I think doing base images first is fine19:12
clarkbin that case I will start putting together a document (etherpad most likely) with an overview of the plan and a todo list and links to ongoing work19:12
fungiwfm19:12
clarkbcool. The only other thing I wanted to mention (and this should go on the todo list) is we will need to resync some of the images from docker hub to quay.io manually before updating them as a few have had updates pushed to docker since I did the initial sync19:13
ianw++ sounds good.  i can probably help as i have somewhere a fairly recent list of all the images19:13
clarkbnot a huge deal. Just a step on the todo list19:13
ianwat one point, i was trying to automate getting them into a .dot file for graphical view, i can't remember why19:14
clarkbwe have a plan to make a plan. I'll take it. Hopefully I'll have something to share tomorrow. Seems unlikely I'll get anywhere near done today19:15
clarkbAnything else related to quay?19:15
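Editor's sketch of the root-first versus leaf-first ordering discussed above. This is a minimal illustration only, not the tooling used by the team: the dependency map is hypothetical (the image names python-base, python-builder and ircbot appear in the discussion, but which image builds from which is assumed here, and example-leaf is invented). It uses the standard library graphlib module, available in Python 3.9 and later, and also emits Graphviz DOT text similar in spirit to the graphical view ianw mentions.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: image -> base images it is built FROM.
# The relationships below are assumptions for illustration only.
DEPENDS_ON = {
    "python-base": [],
    "python-builder": [],
    "ircbot": ["python-base", "python-builder"],
    "example-leaf": ["python-base"],
}


def root_first_order(depends_on):
    """Return images ordered so every base image precedes its descendants."""
    return list(TopologicalSorter(depends_on).static_order())


def to_dot(depends_on):
    """Render the map as Graphviz DOT text, with edges pointing base -> child."""
    lines = ["digraph images {"]
    for image in sorted(depends_on):
        bases = depends_on[image]
        if not bases:
            # Emit roots explicitly so they appear even without edges.
            lines.append(f'  "{image}";')
        for base in sorted(bases):
            lines.append(f'  "{base}" -> "{image}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    order = root_first_order(DEPENDS_ON)
    print("root-first:", order)
    print("leaf-first:", list(reversed(order)))
    print(to_dot(DEPENDS_ON))
```

Walking the root-first list once corresponds to the single pass of rebuilds described in the meeting; the reversed list is the leaf-first order, which would need a second pass over the leaves once the base images have moved.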
clarkb#topic Bastion Host Updates19:16
clarkbStill just need reviews on the bridge backup thing19:16
clarkb#link https://review.opendev.org/q/topic:bridge-backups19:16
clarkbWe've also seen some connectivity errors from bridge to various nodes randomly but I don't think that is anything to do with us19:16
ianwyeah i have no idea what's up with that19:17
ianwit was worrying it happened both on dns deploy changes, but in both cases it was nowhere near anything to do with dns19:17
clarkbProbably just something to monitor and if it persists or gets worse we can bring it up with our network/cloud providers19:17
ianwin both cases it was rax/dfw -> rax/dfw, what you'd think would be the most reliable19:18
fungithere were reports of connectivity issues potentially to releases.openstack.org (static02.opendev.org) from job nodes earlier too, so i wonder if rax-dfw is having some network connectivity issues19:18
clarkbfungi: that was likely the gitea thing though?19:18
clarkbI suppose they could be separate issues19:18
ianwistr periods where we've had weird ipv6 dropouts.  but with ansible we only use ip4 in inventory19:19
fungihard to know. the releases.o.o url in question is a redirect to opendev.org gitea, but it was unclear whether the problem was before or after redirecting19:19
fungithe job logs weren't precise enough to differentiate19:20
clarkbya if that persists after the UA filter update I guess we can dig deeper19:20
clarkb#topic Mailman 319:20
clarkbfungi: any progress with the held test node?19:21
funginone, sorry :/19:22
clarkb#topic Gerrit Updates19:22
fungii ned to prioritize it19:22
fungier, need19:22
clarkback19:22
clarkbThe acl updates landed and we should be all set there. However fungi noticed some behavior of the tool that does normalization that might need updating19:22
clarkbfungi: do you have links to those changes?19:22
fungi#link https://review.opendev.org/882075 Add an "apply" transformation which applies all19:23
fungi#link https://review.opendev.org/882080 Make option indenting a selectable transformation19:23
clarkbthanks19:24
fungiit mostly came up in reviewing recent project additions where the authors were struggling to figure out how to make their editors indent with tab characters19:24
clarkbNeither will change the output we already applied but are good updates to making the tool useable19:24
clarkbfungi: you hit the tab button on the keyboard :P19:25
fungionce those merge, we can recommend a command in our documentation to reformat acls19:25
ianwctl-q tab :)19:25
fungii was going to say "oh just run this and it'll reformat your acl file" but it wasn't as trivial as i remembered19:25
ianwdidn't the final pass write it out with tabs?19:26
fungii was disappointed that my prediction about people not having a basic grasp of how to use their text editors was accurate19:27
clarkbianw: the changes above basically showed that it was always running in noop mode which only prints not writes to files19:27
clarkbianw: the fix is to have it also write to files so people don't need to have editors that work :)19:27
clarkbFor the gerrit replication plugin leaking files I have not seen any movement on the bugs I filed and I have not had time to dig into trying to write fixes myself19:27
fungiianw: 882080 is just to make the tab indenting consistent with the other transformation rules. 882075 is about making it easier to reformat an acl from the command line19:27
clarkbI'm somewhat scared to look at how many files are on disk now19:28
ianwok, i guess that makes sense.  the tox job would spit out a diff file19:28
clarkbI am strongly considering that we revert the change to bind mount that directory now19:28
fungiwell, the tox job already spits out a diff file19:28
clarkbthen when we restart gerrit we'll clear it out automatically. Far less than ideal but I'm beginning to think that might be better than my hacky solution19:29
fungibut yes, it's hard for someone who doesn't actually know the difference between spaces and tabs, or possibly doesn't know how to translate those terms into their native tongue, to know what the diff is telling them to do19:29
clarkbbut if I can get a second reviewer on my hacky solution we can revisit a revert afterwards19:29
clarkb#link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk19:29
ianwfungi: sure -- i guess i don't have an issue with it actually rewriting the file, and then us telling people "run this and check in the result"19:30
clarkbI think the only real risk with the replication leas is if we cause ext4 to run out of inodes19:30
clarkb*replication leaks. So this is not super urgent but something we should eventually address19:31
clarkbAnd the last gerrit related item is we dropped 3.6 image builds. Added 3.8 images. And have a 3.7 to 3.8 upgrade job running19:31
clarkbthis has already led to fixing a couple of UI issues with 3.819:31
fungiianw: today i saw an acl go through 6 patchsets and the author is still struggling to get the whitespace right19:31
clarkbAnything else gerrit related?19:33
funginone on my end19:33
ianwnope19:33
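Editor's sketch of the tab-indenting idea discussed above. This is not the normalization tool touched by changes 882075 and 882080; it is a hypothetical stand-alone function that assumes an ACL file in git-config style, with section headers in square brackets and option lines beneath them. The function name and the sample group name are invented for illustration.

```python
def indent_options_with_tabs(text):
    """Re-indent option lines with one tab; leave headers flush left.

    Blank lines and comment lines (starting with '#' or ';') are kept
    exactly as they were.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            out.append(line)            # blank or comment: untouched
        elif stripped.startswith("["):
            out.append(stripped)        # section header: no indent
        else:
            out.append("\t" + stripped) # option line: single tab
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        '[access "refs/heads/*"]\n'
        "    abandon = group example-core\n"
        "  label-Code-Review = -2..+2 group example-core\n"
    )
    print(indent_options_with_tabs(sample), end="")
```

Writing the result back to the file, rather than only printing a diff, is the behaviour the discussion above is asking for, so that authors do not have to produce tab characters in their editors by hand.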
clarkb#topic Upgrading Servers19:33
clarkbEtherpad server cleanup happened so etherpad is done done at this point19:33
clarkbNameserver migration stuff happened as well (thank you ianw!)19:33
clarkball four of our zones/domains should be running off of new servers with a new authoritative server as well19:34
clarkbI think the remaining tasks are cleaning up the old server?19:34
clarkb*old servers19:34
clarkbianw: anything we should be helping with to finish this completely?19:35
ianwnope, i can remove the old servers now, and there's one zone change to remove the ns1/ns2/adns1 records19:35
ianwthe remaining todos were AAAA glue records for opendev.org and rdns for the vexxhost server19:35
clarkbhttps://review.opendev.org/c/opendev/zone-opendev.org/+/88193519:36
ianwi've logged a ticket about the rdns, so we'll see what happens19:36
clarkb#link https://review.opendev.org/c/opendev/zone-opendev.org/+/881935 Cleanup old dns server records19:36
clarkbI can mention to the foundation registrar intermediary that we want AAAA glue records19:37
clarkbfungi: assuming you don't have objections or would prefer to do it to help ensure they don't do the wrong thing at the registrar19:37
fungino objections19:37
fungii don't know how to phrase it any better to avoid confusion, i'd just be prepared for confusion19:38
clarkbI figure something like "The ns03.opendev.org and ns04.opendev.org servers have ipv4 and ipv6 addresses. A glue records were added for both but not AAAA. Is that something we can add?"19:39
ianwwe did have them before, so there's that19:39
fungioh, we did?19:40
clarkbOk I'll reach out later today19:40
fungii wasn't sure since i hadn't paid close enough attention before we changed them19:40
clarkbThe last item I wanted to note on this topic is we have more servers to upgrade. All at lower priority than what we've already done but still worth doing19:41
clarkbI'm hoping to keep pushing on that here and there and help is much appreciated19:41
clarkb#topic AFS volume utilization19:42
clarkbGrafana reports utilization has jumped again19:42
clarkbI suspect the growth is simply not as nicely linear day to day as I would like19:42
clarkb#link https://review.opendev.org/c/opendev/system-config/+/881952 add bookworm mirrors19:42
clarkbThis change is going to be effectively blocked on us freeing space or adding space though19:42
clarkb(I left a note on the change but didn't -1 or -W)19:42
fungiyeah, i saw that go by and thought the same. thanks for calling it out19:43
clarkbI feel like with everything else going on I have little time to devote to this as it isn't urgent quite yet. But something that will need to be addressed19:44
ianwfungi: maybe you could look at https://review.opendev.org/c/opendev/system-config/+/879239 to confirm it's not insane and we can prune the wheels19:44
fungican do19:44
clarkband ya starting with cleanups like that is a good starting point and we can take it from there19:44
ianwfedora has now become a move from 36 -> 38 situation19:44
fungiand yeah, the bookworm release is still a ways out, but at least there's a date now19:44
ianwi'm not sure how much time i'll have to drive that19:44
clarkbianw: one idea I had was whether or not centos 9 stream is sufficiently up to date that fedora is maybe less important?19:45
clarkbianw: but I'm not plugged into how those distros are being used well enough to know if that is the case19:45
clarkbnaively they seem to fit into a similar space for CI needs anyway19:45
ianwheh, yeah, i was about to say we might want to consider the future of it19:45
clarkbmaybe we should write an email to the service-discuss list about it19:46
clarkbto solicit feedback from users to see how that might impact them19:46
clarkbI can write and send that if we think it is a good idea19:46
ianwit was mostly for devstack; it's quite a lot of overhead in zuul-jobs 19:46
fungii'm happy to put a bit of pressure on openstack to justify the continued use of fedora nodes too19:47
clarkbok I can draft that email19:47
clarkbI'll be sure to get it looked over before sending just to avoid saying anything obviously incorrect19:47
clarkb#topic Gitea 1.1919:48
ianw++19:48
clarkb#link https://review.opendev.org/c/opendev/system-config/+/877541 Upgrade to gitea 1.19.219:48
clarkbI think gitea 1.19 is ready to go when we are19:48
clarkbThe api requires auth to list orgs bug was fixed and I updated our ansible to reflect this.19:48
tonybI can help with the Fedora update and also the CentOS determination19:48
clarkbI did drop a no_log: true in that update since I dropped auth as well. But double check me on that being safe please19:48
clarkbtonyb: cool I'll ping you with the email draft too to ensure it makes sense from your perspective and we can edit or decide to hold off if necessary19:49
clarkb#link https://158.69.65.228:3081/opendev/system-config Held test node for checking19:49
clarkbthis is a held node running the deployment of 1.19.2 from the above change which can be used for checking it looks good19:49
tonybclarkb: Sounds good19:49
clarkbMy afternoon is going to be pretty busy bouncing around errands and parenting today but I'm happy to watch that deploy tomorrow if I get the necessary reviews today19:50
clarkbThe major change is the addition of gitea actions which we turn off. This makes it a much simpler upgrade compared to 1.1819:51
clarkb#topic Storyboard19:51
clarkbfungi I think you've continued to do some work helping projects gracefully turn stuff off19:51
clarkbanything new to add on this? (I think frickler is out for a few weeks right now too)19:52
funginot really, just making sure we're consistently setting projects to inactive if they move off sb19:52
fungiand could stand to do an audit between projects.yaml and the sb database19:52
fungito see what we've missed (moves to lp and general retirements)19:53
clarkbstoryboard doesn't have the gerrit issue of marking a project read only making it difficult to undo does it?19:53
funginah19:53
clarkbMostly just curious I doubt it would be an issue anyway19:53
fungihowever, it never added a ui widget to toggle the active field, so has to be done with the cli19:53
clarkbfungi: is that documented somewhere?19:54
clarkbjust thinking it would be good to be able to have other people do it if you go on vacation etc19:54
fungiit can be19:54
fungibut it's also not an urgent thing to change19:54
fungii'll try to remember to push up some docs, or better still an audit script to go with it19:54
fungimainly just stops the project from getting used in autocomplete for fields and turning up in searches, i think19:55
clarkbmakes sense19:56
clarkbthanks!19:56
clarkb#topic Open Discussion19:56
clarkbAnything else?19:56
clarkbAs briefly mentioned earlier my afternoon is going to involve me being in and out.19:56
fungii'll be out a lot over the next few days19:57
fungibut not gone entirely. some errands, long lunches, and a half day on friday19:57
clarkbSounds like that may be everything. Thank you for your time. We'll be back here next week19:58
clarkbenjoy your morning/afternoon/evening!19:58
clarkb#endmeeting19:58
opendevmeetMeeting ended Tue May  2 19:58:37 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:58
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-05-02-19.01.html19:58
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-05-02-19.01.txt19:58
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-05-02-19.01.log.html19:58
fungithanks clarkb!19:58
ianwthanks!19:59
tonybThanks all19:59
