Tuesday, 2024-06-11

clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Jun 11 19:00:14 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkbI made it back in time to host the meeting19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/4BD6VNMXEEQWEEANCGX2JWB2LK6XY5RT/ Our Agenda19:00
tonyb\o/19:00
clarkb#topic Announcements19:00
clarkbI'm going to be out from June 14-1919:01
clarkbThis means I'll miss next week's meeting. More than happy for y'all to have one without me or to skip it if that is easier19:01
tonybI'm relocating back to AU on the 14th19:01
fricklersounds like skipping would be fine19:01
fungiyeah, i'll be around but don't particularly need a meeting either19:02
clarkbworks for me19:02
clarkb#agreed The June 18th meeting is Cancelled19:02
clarkbanything else to announce?19:02
tonyband I have started a project via the OIF university partnerships to help us get keycloak talking to Ubuntu SSO19:02
tonybAlso my IRC is super laggy :(19:02
clarkbtonyb: cool! is that related to the folks saying hello in #opendev earlier today?19:03
fungithanks tonyb!!!19:03
tonybclarkb: Yes yes it is :)19:03
fungithat's the main blocker to being able to make progress on the spec19:03
fungiknikolla had some ideas on how to approach it and had written something similar, if you manage to catch him at some point19:04
tonybYup so fingers crossed we'll be able to make solid progress in the next month19:04
tonybfungi: Oh awesome!19:04
fungihe might be able to provide additional insight19:04
fungisomething about having written other simplesamlphp bridges in the past19:04
tonybI'll reach out19:05
clarkb#topic Upgrading Old Servers19:06
clarkbI've been trying to keep up with tonyb but failing to do so. I think there has been progress in getting the wiki changes into shape?19:06
clarkbtonyb: do you still need reviews on the change for new docker compose stuff?19:07
tonybYes please19:07
tonybAdding compose v2 should be ready pending https://review.opendev.org/c/opendev/system-config/+/921764?usp=search passing CI19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/920760 Please review this change which adds the newer docker compose tool to our images (which should be safe to do alongside the old tool as they are different commands)19:08
tonyb(side note) once that passes CI I could use it to migrate list3 to the golang tools19:08
clarkbneat19:09
clarkband thank you for pushing that along19:09
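
For context on why the two tools can coexist: the legacy Python implementation and the newer Go plugin are invoked differently. A quick sanity check on a host might look like the following (command and package names are the upstream defaults; whether the system-config change installs the plugin this way is an assumption):

    # legacy Python implementation, installed as a standalone docker-compose binary
    docker-compose version
    # newer Go implementation, shipped as the docker-compose-plugin package
    docker compose version
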
tonybmediawiki is closer I have it building 2 versions of our custom container19:09
tonybthe role is getting closer to failing in testinfra19:09
clarkbwhat are the two different versions for?19:09
tonybfrom there I'll have questions about more complex testing19:09
fungitwo different mediawiki releases?19:10
clarkboh right we want to do upgrades19:10
tonyb1.28.2 (the current version) and 1.35.x, the version we'll need to go to next19:10
clarkbgot it thanks19:10
fungiyeah, the plan is to deploy a container of the same version we're running, then upgrade once we're containered up19:10
tonybI wanted to make sure I had the Dockerfile and job more or less right19:10
clarkbsimilar to how we do gerrit today with the current and next version queued up19:11
tonybYup heavily inspired by gerrit :)19:11
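
A heavily hedged sketch of how one container definition can build both releases via build arguments; the base image, download URL layout, and version defaults below are illustrative assumptions, not the actual system-config Dockerfile:

    FROM docker.io/library/php:8.1-apache
    # build twice with different --build-arg values, e.g. 1.28.2/1.28 and a 1.35.x/1.35 pair
    ARG MEDIAWIKI_VERSION=1.28.2
    ARG MEDIAWIKI_BRANCH=1.28
    ADD https://releases.wikimedia.org/mediawiki/${MEDIAWIKI_BRANCH}/mediawiki-${MEDIAWIKI_VERSION}.tar.gz /tmp/
    RUN tar -xzf /tmp/mediawiki-${MEDIAWIKI_VERSION}.tar.gz --strip-components=1 -C /var/www/html \
        && rm /tmp/mediawiki-${MEDIAWIKI_VERSION}.tar.gz
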
tonybOn noble I've been trying to move that ball along19:11
clarkbwith a mirror node based on the openafs questions19:11
tonybcurrently stuck on openafs on noble19:12
clarkbhaving thought about it I think the best thing is to simply skip the ppa on noble until we need to build new packages for noble19:12
clarkbI think we can (should?) do that for jammy too but didn't because we didn't realize we were using the distro package until well after we had been using jammy19:12
tonybI tried that and hit a failure later where we install openafs-dkms so I need to cover that also19:12
clarkbat this point it's probably fine to leave jammy in that weird state where we install the ppa then ignore it. But for noble I think we can just skip the ppa19:12
tonybOkay we can decide on Jammy during the review19:13
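
The idea being discussed, as a hedged Ansible sketch (the task shape and PPA name are assumptions about the openafs-client role, not a copy of it):

    - name: Add the OpenAFS PPA (skip on noble until we build noble packages)
      become: true
      ansible.builtin.apt_repository:
        repo: ppa:openstack-ci-core/openafs
      when: ansible_facts['distribution_release'] != 'noble'
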
clarkbtonyb: ok the packages may have different names in noble now?19:13
fungiit's possible you're hitting debian bug 106089619:13
clarkbthey were the same in jammy but our ppa had an older version so apt picked the distro version instead19:13
clarkbfungi: is that the arm bug? I think the issue would only be seen on arm19:13
fungi#link https://bugs.debian.org/1060896 openafs-modules-dkms: module build fails for Linux 6.7: osi_file.c:178:42: error: 'struct inode' has no member named 'i_mtime'19:13
tonybI'll look; it only just happened.19:14
clarkboh no, that's a different thing19:14
fungioh, is openafs-modules-dkms only failing on arm?19:14
fungi1060896 is what's keeping me from using it with linux >=6.719:14
clarkbfungi: it has an arm failure, the one I helped debug and posted a bug to debian for. I just couldn't remember the number. That specific issue was arm specific19:14
tonybOh and I hit another issue where CentOS9s has moved forward from our image so the headers don't match19:14
clarkbianw has also been pushing for us to use kafs in the mirror nodes. We could test that as an alternative19:15
fungi#link https://bugs.debian.org/1069781 openafs kernel module not loadable on ARM64 Bookworm19:15
clarkbtonyb: you can manually trigger an image rebuild in nodepool if you want to speed up rebuilding images after the mirrors update19:15
tonybYeah I'm keen on that too (kAFS) but I was thinking later19:16
tonybclarkb: That'd be handy we can do that after the meeting19:16
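
The manual trigger being offered is roughly this, run as the nodepool user on a builder (the image name must match the diskimage name in the builder config; centos-9-stream here is an assumption):

    nodepool image-build centos-9-stream
    # then watch the new build and its uploads
    nodepool dib-image-list | grep centos-9-stream
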
clarkb++ especially if we can decouple using noble from that. I'm a fan of doing one thing at a time if possible19:16
fungiif the challenges with openafs become too steep, kafs might be an easier path forward, yeah19:16
tonybIt's a $newthing for me19:16
clarkbtonyb: yup can help with that19:16
tonybcool cool19:16
clarkband thank you for pushing forward on all these things. Even if we don't have all the answers it is good to be aware of the potential traps ahead of us19:17
fungii should try using kafs locally to get around bug 1060896 keeping me from accessing afs on my local systems19:17
clarkbanything else on the topic of upgrading servers?19:17
tonybNope I think that's good19:17
clarkb#topic Cleaning up AFS Mirrors19:18
fungii sense a theme19:18
clarkbI don't have any new progress on this. I'm pulled in enough directions that this is an easy deprioritization item. That said I wanted to call out that our centos-8-stream images are basically unusable and maybe unbuildable19:18
clarkbUpstream cleared out the repos and our mirrors faithfully reflected that empty state19:19
clarkbthat means all that's left for us to do really is to remove the nodes and image builds from nodepool19:19
tonyb8-stream should be okay?  I thought it was just CentOS-8 that went away?19:19
clarkboh and the nodeset stuff from zuul19:19
clarkbtonyb: no, 8-stream is EOL as of a week ago19:19
tonybOh #rats19:19
tonybOkay19:19
tonybwhere are we at with Xenial?19:20
clarkband they deleted the repos (I think technically the repos go somewhere else, but I don't want to set an expectation that we're going to chase eol centos repo mirroring locations if the release is eol)19:20
clarkbwith xenial, devstack-gate has been wound down, which had a fair bit of fallout.19:20
fungiyeah, we could try to start mirroring from the centos archive or whatever they call it, but since the platform's eol and we've said we don't make extra effort to maintain eol platforms19:20
clarkbI think the next steps for xenial removal are going to be removing projects from the zuul tenant config and/or merging changes to projects to drop xenial jobs19:20
tonybOkay so we need to correct some of that fallout?  Or can we look at removing all the xenial jobs and images?19:21
clarkbwe'll have to make educated guesses on which is appropriate (probably based on the last time anything happened in gerrit for the projects)19:21
clarkbtonyb: I don't think we need to correct the devstack-gate removal fallout. It's long been deprecated and most of the fallout is in dead projects which will probably get dealt with when we remove them from the tenant config19:21
tonybOkay.19:22
clarkbbasically we should focus on that sort of cleanup first^ then we can figure out what is still on fire and if it is worth intervention at that point19:22
tonybIs there any process or order for removing xenial jobs19:22
fungior in some very old openstack branches that have escaped eol/deletion19:22
tonybfor example system-config-zuul-role-integration-xenial?  to pick on one I noticed today19:23
clarkbtonyb: the most graceful way to do it is in the child jobs first then remove the parents. I think zuul will enforce this for the most part too19:23
clarkbtonyb: for system-config in particular I think we can just proceed with cleanup. Some of my system-config xenial cleanup changes have already merged19:23
tonybclarkb: cool beans19:23
clarkbif you do push changes for xenial removal please use that topic: https://review.opendev.org/q/topic:drop-ubuntu-xenial19:24
tonybnoted19:24
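
Setting the shared topic when pushing such a cleanup change is a one-flag affair with git-review:

    git review -t drop-ubuntu-xenial
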
clarkbone last thought before we move to the next topic: maybe we pause centos-8-stream image builds in nodepool now as a step 0 for that distro release. As I'm like 98% positive those image builds will fail now19:24
clarkbI should be able to push up a change for that19:25
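
Pausing builds is a small builder-config tweak; a sketch of the nodepool diskimage entry, with the surrounding file layout omitted:

    diskimages:
      - name: centos-8-stream
        pause: true
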
clarkb#topic Gitea 1.22 Upgrade19:25
clarkbSimilar to the last topic I haven't made much progress on this after I did the simple doctor command testing a few weeks ago19:25
clarkbI was hoping that they would have a 1.22.1 release before really diving into this as they typically do a bugfix release shortly after the main release.19:26
clarkbthere is a 1.22.1 milestone that was due like yesterday in github so I do expect that soon and will pick this back up again once available19:26
clarkbthere are also some fun bugs in that milestone list though I don't think any would really affect our use case19:26
clarkb(things like PRs are not mergeable due to git bugs in gitea)19:27
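
The doctor testing referenced above boils down to running gitea's built-in consistency checks against a copy of the data; a minimal sketch, assuming it is run inside the gitea container:

    # read-only by default; add --fix to let it repair what it can
    gitea doctor check --all
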
clarkb#topic Fixing Cloud Launcher19:27
clarkbThis was on the agenda to ensure it didn't get lost if the last fix didn't work. But tonyb reports the job was successful last night19:27
fricklerseems this has passed today19:27
clarkbwe just had to provide a proper path to the safe.directory config entries19:28
clarkbWhich is great because it means we're ready to take advantage of the tool for bootstrapping some of the openmetal stuff19:28
tonybHuzzah19:28
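
For reference, the shape of that fix (the repository path shown is illustrative, not the exact entry used):

    git config --global --add safe.directory /home/zuul/src/opendev.org/opendev/system-config
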
clarkbwhich takes us to the next topic19:28
clarkb#topic OpenMetal Cloud Rebuild19:28
clarkbThe inmotion cloud is gone and we have a new openmetal cloud on the same hardware!19:28
clarkbThe upside to all this is we have a much more modern platform and openstack deployment to take advantage of. The downside is we have to reenroll the cloud into our systems19:29
clarkbI got the ball rolling and over the last 24 hours or so tonyb has really picked up todo items as I've been distracted by other stuff19:29
clarkb#link https://etherpad.opendev.org/p/openmetal-cloud-bootstrapping captures our todo list for bootstrapping the new cloud19:29
tonybI think https://review.opendev.org/c/opendev/system-config/+/921765?usp=search is ready now19:30
clarkbhttps://review.opendev.org/c/opendev/system-config/+/921765 is the next item that needs to happen, then we can update cloud launcher to configure security groups and ssh keys, then we can keep working down the todo list in the etherpad19:30
tonybI'm happy to work with whomever to keep that rolling19:31
clarkbso anyway reviews welcome as we get changes up to move things along. At this point I expect the vast majority of things to go through gerrit19:31
clarkbtonyb: thanks! I should be able to help through thursday at least if we don't get it done by then19:31
tonybIt's a good learning opportunity19:31
tonybI was hoping the new mirror node would be on noble but that's far from essential19:32
clarkbya I wouldn't prioritize noble here if there aren't easy fixes for the openafs stuff19:32
tonybYeah19:33
frickleralso easy to replace later19:33
clarkb++19:33
clarkbthe last thing I wanted to call out on this is that I think fungi and corvus may still have account creation for the system outstanding19:33
tonybI haven't been consistent about removing inmotion as I add openmetal figuring we can do a complete cleanup "at the end"19:33
clarkbfeel free to reach out to me if you have questions (I worked through the process and while not an expert did work through a couple odd things)19:33
corvusyep sorry :)19:33
clarkbtonyb: ya that should be fine too. I have a note in the todo list to find those things that we missed already19:34
fricklerthough that's also not too relevant for operating the cloud itself19:34
tonybOh and datadog ... My invitation expired and I think clarkb was going to ask about dropping it19:34
corvusi'm not, like, blocking things right?19:34
fricklerall infra-root ssh keys should be on all nodes already19:34
clarkbcorvus: no you are not. I'm more worried that the invites might expire and we'll have to get openmetal to send new ones19:34
clarkbbut frickler is right, your ssh keys should be sufficient for the openstack side of things19:34
corvusoh i see.  i will get that done soon then.19:34
tonybIt looks like we can send them ourselves19:35
clarkbtonyb: oh cool in that case it's far less urgent19:35
tonybit's mostly for issue tracking and "above OpenStack" issues19:35
fungioh, i'll try to find the invites19:35
fungii think i may have missed mine19:35
clarkbfungi: mine got sorted into the spam folder, but once dug out it was usable19:36
funginoted, thanks19:36
clarkbanything else on the subject of the new openmetal cloud?19:36
fricklerwell ... as far as html mails with three-line-links are usable19:36
fricklerdo we have to discuss details like nested virt yet?19:36
clarkbfrickler: I figured we could wait on that until after we have "normal" nodes booting19:37
clarkbadding nested virt labels to nodepool at that point should be trivial. That said I think we can do that as long as the cloud supports it (which I believe it does)19:37
fricklerack, will need some testing first. but let's plan to do that before going "full steam ahead"19:38
tonybThat works19:38
clarkbsounds good19:38
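
When we get there, exposing nested virt should indeed just be a provider-pool label addition; a sketch with assumed names (the flavor must actually enable nested virt on the openmetal side):

    labels:
      - name: ubuntu-jammy-nested-virt
        diskimage: ubuntu-jammy
        flavor-name: nested-virt-flavor
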
tonybDo we have/need a logo to add to the "sponsors" page?19:38
clarkbtonyb: they are already on the opendev.org page19:39
clarkbbecause the inmotion stuff was really openmetal for the last little while. We were just waiting for the cloud rebuild to go through the trouble of renaming everything19:39
tonybCool19:39
fungithough it's worth double-checking that the donors listed there are still current, and cover everyone we think needs to be listed19:40
clarkbthe main questions there would be do we add osuosl and linaro/equinix/worksonarm19:40
tonybRax, Vexx, OVH and OpenMetal19:40
clarkbthat might be a good discussion outside of the meeting though as we have a couple more topics to get into and I can see that taking some time to get through19:41
fungiyep19:41
funginot urgent19:41
tonybkk19:41
clarkb#topic Improving Mailman Throughput19:41
clarkbWe increased the number of out runners from 1 to 4. This unfortunately didn't fix the openstack-discuss delivery slowness. But now that we understand things better it should prevent openstack-discuss slowness from impacting other lists, so a (small) win anyway19:42
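
For reference, the runner bump is a one-line mailman.cfg change; a sketch, assuming this is how it was expressed in our deployment (Mailman requires runner instance counts to be powers of two):

    [runner.out]
    instances: 4
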
fungiyeah, there seems to be an odd tuning mismatch between mailman and exim19:42
clarkbfungi did more digging and found that both mailman and exim have a batch size for message delivery of 10. However it almost seems like exim's might be 9, because it complains when mailman gives it 10 messages and goes into queuing mode?19:42
fungiyeah, exim's error says it's deferring delivery because there were more than 10 recipients, but there are exactly 10 the way mailman is batching them up (confirmed from exim logs even)19:43
clarkbmy suggestion was that we could bump the exim number quite a bit to see if it stops complaining that mailman is giving it too many messages at once. Then if that helps we can further bump the mailman number up to a closer value but give exim some headroom19:43
fungiso i think mailman's default is chosen trying to avoid exceeding exim's default limit, except it's just hitting it instead19:43
clarkbsay bump exim up to 50 or 100. Then bump mailman up to 45 or 9519:43
tonybAll sounds good to me19:44
corvuscould also decrease exim's queue interval19:44
fungiyeah, right now the delay is basically waiting for the next scheduled queue run19:44
clarkboh that's a piece of info I wasn't aware of. So exim will process each batch in the queue with a delay between them?19:45
funginot exactly19:45
corvus(i think synchronous is better if the system can handle it, but if not, then queing and running more often could be a solution)19:45
fungiexim will process the message ~immediately if the recipient count is below the threshold19:45
fungiotherwise it just chucks it into the big queue and waits for the batch processing to get it later19:46
clarkbgot it19:46
clarkbfungi: was pushing up changes to do some version of ^ something that you were planning to do?19:46
fungiexcept that with our mailing lists, every message exim receives outbound is going to exceed that threshold and end up batched instead19:46
fungiclarkb: i can, just figured it wasn't urgent and we could brainstorm in the meeting, which we've done now19:47
clarkbthanks!19:47
clarkb#topic Testing Rackspaces new cloud offering19:47
fungiso with synchronous being preferable, increasing the recipient limit in exim sounds like the way to go19:47
clarkb++19:47
fungiand it doesn't sound like there are any objections with 50 instead of 1019:48
clarkbwfm19:48
fungi(where we'll really just be submitting 45 anyway)19:48
tonyb#makeitso19:49
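
A heavily hedged sketch of what that tuning might look like. Neither option name is confirmed in the discussion above: the exim side assumes the limit being hit is smtp_accept_queue_per_connection (default 10, after which exim stops delivering immediately and queues), and the mailman side assumes [mta] max_recipients controls the batch size:

    # exim main configuration
    smtp_accept_queue_per_connection = 50

    # mailman.cfg
    [mta]
    max_recipients: 45
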
clarkbre rax's new thing they want us to help test. I haven't had time to ask for more info on that. However, enrolling it will be very similar to the openmetal cloud (though without setting flavors and stuff I expect) so we're ensuring the process generally works and are familiar with it by going through the process for openmetal19:49
clarkbpart of the reason I have avoided digging into this is I didn't want to start something like that then disappear for a week19:50
clarkbbut for completeness thought I'd keep it on the agenda in case anyone else had info19:50
tonybI was supposed to reach out to cloudnull and/or the RAX helpdesk for more information19:51
frickleris this something they sent to all customers or is it specifically targeted to opendev?19:51
clarkbfrickler: I am not sure. The way it was sent to us was via a ticket to our nodepool ci account19:51
fungii am a paying rackspace customer and did not receive a personal invite, so i think it's selective19:51
clarkbI don't think our other rax accounts got invites either. Cloudnull was a big proponent of what we did back in the osic days and may have asked his team to target us19:51
fricklerok19:52
fricklercoincidentally I've seen a number of tempest job timeouts that were all happening on rax recently19:52
clarkbya I think it can potentially be useful for both us and them19:53
fricklerso assuming the new cloud also has better performance, that might be really interesting19:53
clarkbwe just need to find the time to dig into it.19:53
fricklerI can certainly look into helping with testing, just not sure about communications due to time difference19:54
fungiyes, the aging rackspace public cloud hardware and platform has been hinted in a number of different performance discrepancies/variances in jobs for a while19:54
fungialso with it being kvm, migrating to it eventually could mean we can stop worrying about "some of the test nodes you get might be xen"19:54
frickler+119:55
fungiat least i think it's kvm, but don't have much info either19:55
clarkbone thing we should clarify is what happens to the resources after the test period. If we aren't welcome to continue using it in some capacity then the amount of effort may not be justified19:55
clarkbsounds like we need more info in general. tonyb if you don't manage to find out before I get back from vacation I'll put it on my todo list at that point as I will be less likely to just disappear :)19:55
clarkb#topic Open Discussion19:56
clarkbWe have about 4 minutes left. Anything else that was missed in the agenda?19:56
tonybPerfect19:56
tonybNothing from me19:56
fungii got nothin'19:56
fricklerjust ftr I finished the pypi cleanup for openstack today19:57
fricklerbut noticed some repos weren't covered yet, like dib. so a bit more work coming up possibly19:58
tonybfrickler: Wow!19:58
tonybThat's very impressive19:58
clarkbfrickler: ack thanks for the heads up19:58
fungioh, also you wanted to ask about similar cleanup for opendev-maintained packages19:58
fungilike glean19:58
frickleryeah, but that's not urgent, so can wait for a less full agenda19:59
clarkbfor our packages I think it would be good to have a central account that is the opendev pypi owner account19:59
fungithough for glean specifically mordred has already indicated he has no need to be a maintainer on it any longer19:59
clarkbwe can add that account as owner on our packages then remove anyone that isn't the zuul upload account (also make the zuul upload account a normal maintainer rather than owner)19:59
frickleryes, and not sure how many other things we actually have on pypi?19:59
fungiyeah, the biggest challenge is that there's no api, so we can't script adding maintainers/collaborators19:59
clarkbfrickler: git-review, glean, jeepyb maybe?20:00
clarkboh and gerritlib possibly20:00
fungistill a totally manual clicky process unless someone wants to fix the feature request in warehouse20:00
clarkbfungi: ya thankfully our problem space is much smaller than openstack's20:00
fungiagreed20:00
fungibindep too20:00
fungigit-restack, etc20:00
clarkbI've also confirmed that centos 8 stream image builds are failing20:01
clarkband we are at time20:01
clarkbthank you everyone. Feel free to continue discussion in #opendev or on the mailing list, but I'll end the meeting here20:02
clarkb#endmeeting20:02
opendevmeetMeeting ended Tue Jun 11 20:02:02 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:02
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-11-19.00.html20:02
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-11-19.00.txt20:02
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-11-19.00.log.html20:02
fungithanks clarkb!20:02
tonybThanks all20:02
fricklero/20:02
