clarkb | #startmeeting infra | 19:00 |
---|---|---|
opendevmeet | Meeting started Tue Jun 11 19:00:14 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | I made it back in time to host the meeting | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/4BD6VNMXEEQWEEANCGX2JWB2LK6XY5RT/ Our Agenda | 19:00 |
tonyb | \o/ | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | I'm going to be out from June 14-19 | 19:01 |
clarkb | This means I'll miss next week's meeting. More than happy for y'all to have one without me or to skip it if that is easier | 19:01 |
tonyb | I'm relocating back to AU on the 14th | 19:01 |
frickler | sounds like skipping would be fine | 19:01 |
fungi | yeah, i'll be around but don't particularly need a meeting either | 19:02 |
clarkb | works for me | 19:02 |
clarkb | #agreed The June 18th meeting is Cancelled | 19:02 |
clarkb | anything else to announce? | 19:02 |
tonyb | and I have started a project via the OIF university partnerships to help us get keycloak talking to Ubuntu SSO | 19:02 |
tonyb | Also my IRC is super laggy :( | 19:02 |
clarkb | tonyb: cool! is that related to the folks saying hello in #opendev earlier today? | 19:03 |
fungi | thanks tonyb!!! | 19:03 |
tonyb | clarkb: Yes yes it is :) | 19:03 |
fungi | that's the main blocker to being able to make progress on the spec | 19:03 |
fungi | knikolla had some ideas on how to approach it and had written something similar, if you manage to catch him at some point | 19:04 |
tonyb | Yup so fingers crossed we'll be able to make solid progress in the next month | 19:04 |
tonyb | fungi: Oh awesome! | 19:04 |
fungi | he might be able to provide additional insight | 19:04 |
fungi | something about having written other simplesamlphp bridges in the past | 19:04 |
tonyb | I'll reach out | 19:05 |
clarkb | #topic Upgrading Old Servers | 19:06 |
clarkb | I've been trying to keep up with tonyb but failing to do so. I think there has been progress in getting the wiki changes into shape? | 19:06 |
clarkb | tonyb: do you still need reviews on the change for new docker compose stuff? | 19:07 |
tonyb | Yes please | 19:07 |
tonyb | Adding compose v2 should be ready pending https://review.opendev.org/c/opendev/system-config/+/921764?usp=search passing CI | 19:08 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/920760 Please review this change which adds the newer docker compose tool to our images (which should be safe to do alongside the old tool as they are different commands) | 19:08 |
tonyb | (side note) once that passes CI I could use it to migrate list3 to the golang tools | 19:08 |
clarkb | neat | 19:09 |
clarkb | and thank you for pushing that along | 19:09 |
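For context on why adding the newer tool is safe alongside the old one: the v1 script and the v2 plugin are invoked as different commands, so both can be present on a host while services migrate. A minimal sketch (the compose file path is a placeholder, not an actual deployment path):

```shell
# v1 is the standalone docker-compose script; v2 is the "compose" plugin for
# the docker CLI. They are separate commands, so installing one does not
# disturb the other.
docker-compose version
docker compose version

# Hypothetical cut-over of a single service from v1 to the v2 plugin:
docker-compose -f /path/to/docker-compose.yaml down
docker compose -f /path/to/docker-compose.yaml up -d
```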
tonyb | mediawiki is closer, I have it building 2 versions of our custom container | 19:09 |
tonyb | the role is getting closer to failing in testinfra | 19:09 |
clarkb | what are the two different versions for? | 19:09 |
tonyb | from there I'll have questions about more complex testing | 19:09 |
fungi | two different mediawiki releases? | 19:10 |
clarkb | oh right we want to do upgrades | 19:10 |
tonyb | 1.28.2 (the current version) and 1.35.x, the version we'll need to go to next | 19:10 |
clarkb | got it thanks | 19:10 |
fungi | yeah, the plan is to deploy a container of the same version we're running, then upgrade once we're containered up | 19:10 |
tonyb | I wanted to make sure I had the Dockerfile and job more or less right | 19:10 |
clarkb | similar to how we do gerrit today with the current and next version queued up | 19:11 |
tonyb | Yup heavily inspired by gerrit :) | 19:11 |
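A hypothetical sketch of that gerrit-style pattern, building both the running version and the next upgrade target from one Dockerfile; the image tag and build argument here are illustrative, not the actual job interface:

```shell
# Build the version currently deployed and the next target side by side,
# tagged separately, so the upgrade later is just a tag switch in the
# compose configuration.
docker build --build-arg MEDIAWIKI_VERSION=1.28.2 -t mediawiki-custom:1.28 .
docker build --build-arg MEDIAWIKI_VERSION=1.35 -t mediawiki-custom:1.35 .
```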
tonyb | On noble I've been trying to move that ball along | 19:11 |
clarkb | with a mirror node based on the openafs questions | 19:11 |
tonyb | currently stuck on openafs on noble | 19:12 |
clarkb | having thought about it I think the best thing is to simply skip the ppa on noble until we need to build new packages for noble | 19:12 |
clarkb | I think we can (should?) do that for jammy too but didn't because we didn't realize we were using the distro package until well after we had been using jammy | 19:12 |
tonyb | I tried that and hit a failure later where we install openafs-dkms so I need to cover that also | 19:12 |
clarkb | at this point it's probably fine to leave jammy in that weird state where we install the ppa then ignore it. But for noble I think we can just skip the ppa | 19:12 |
tonyb | Okay we can decide on Jammy during the review | 19:13 |
clarkb | tonyb: ok the packages may have different names in noble now? | 19:13 |
fungi | it's possible you're hitting debian bug 1060896 | 19:13 |
clarkb | they were the same in jammy but our ppa had an older version so apt picked the distro version instead | 19:13 |
clarkb | fungi: is that the arm bug? I think the issue would only be seen on arm | 19:13 |
fungi | #link https://bugs.debian.org/1060896 openafs-modules-dkms: module build fails for Linux 6.7: osi_file.c:178:42: error: 'struct inode' has no member named 'i_mtime' | 19:13 |
tonyb | I'll look, it only just happened. | 19:14 |
clarkb | oh no thats a different thing | 19:14 |
fungi | oh, is openafs-modules-dkms only failing on arm? | 19:14 |
fungi | 1060896 is what's keeping me from using it with linux >=6.7 | 19:14 |
clarkb | fungi: it has an arm failure, the one I helped debug and posted a bug to debian for. I just couldn't remember the number. That specific issue was arm specific | 19:14 |
tonyb | Oh and I hit another issue where CentOS 9 Stream has moved forward from our image so the headers don't match | 19:14 |
clarkb | ianw has also been pushing for us to use kafs in the mirror nodes. We could test that as an alternative | 19:15 |
fungi | #link https://bugs.debian.org/1069781 openafs kernel module not loadable on ARM64 Bookworm | 19:15 |
clarkb | tonyb: you can manually trigger an image rebuild in nodepool if you want to speed up rebuilding images after the mirrors update | 19:15 |
tonyb | Yeah I'm keen on that too (kAFS) but I was thinking later | 19:16 |
tonyb | clarkb: That'd be handy we can do that after the meeting | 19:16 |
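For reference, the manual trigger mentioned above is a nodepool-builder CLI call; a short sketch, with a placeholder image name:

```shell
# List the existing diskimage builds, then request a fresh build of one image.
# "ubuntu-noble" is a placeholder; use whatever label the builder config defines.
nodepool dib-image-list
nodepool image-build ubuntu-noble
```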
clarkb | ++ especially if we can decouple using noble from that. I'm a fan of doing one thing at a time if possible | 19:16 |
fungi | if the challenges with openafs become too steep, kafs might be an easier path forward, yeah | 19:16 |
tonyb | It's a $newthing for me | 19:16 |
clarkb | tonyb: yup can help with that | 19:16 |
tonyb | cool cool | 19:16 |
clarkb | and thank you for pushing forward on all these things. Even if we don't have all the answers it is good to be aware of the potential traps ahead of us | 19:17 |
fungi | i should try using kafs locally to get around bug 1060896 keeping me from accessing afs on my local systems | 19:17 |
clarkb | anything else on the topic of upgrading servers? | 19:17 |
tonyb | Nope I think that's good | 19:17 |
clarkb | #topic Cleaning up AFS Mirrors | 19:18 |
fungi | i sense a theme | 19:18 |
clarkb | I don't have any new progress on this. I'm pulled in enough directions that this is an easy deprioritization item. That said I wanted to call out that our centos-8-stream images are basically unusable and maybe unbuildable | 19:18 |
clarkb | Upstream cleared out the repos and our mirrors faithfully reflected that empty state | 19:19 |
clarkb | that means all that's left for us to do really is to remove the nodes and image builds from nodepool | 19:19 |
tonyb | 8-stream should be okay? I thought it was just CentOS-8 that went away? | 19:19 |
clarkb | oh and the nodeset stuff from zuul | 19:19 |
clarkb | tonyb: no, 8-stream is EOL as of a week ago | 19:19 |
tonyb | Oh #rats | 19:19 |
tonyb | Okay | 19:19 |
tonyb | where are we at with Xenial? | 19:20 |
clarkb | and they deleted the repos (I think technically the repos go somewhere else, but I don't want to set an expectation that we're going to chase eol centos repo mirroring locations if the release is eol) | 19:20 |
clarkb | with xenial devstack-gate has been wound down which had a fair bit of fallout. | 19:20 |
fungi | yeah, we could try to start mirroring from the centos archive or whatever they call it, but since the platform's eol and we've said we don't make extra effort to maintain eol platforms | 19:20 |
clarkb | I think the next steps for xenial removal are going to be removing projects from the zuul tenant config and/or merging changes to projects to drop xenial jobs | 19:20 |
tonyb | Okay so we need to correct some of that fallout? Or can we look at removing all the xenial jobs and images? | 19:21 |
clarkb | we'll have to make educated guesses on which is appropriate (probably based on the last time anything happened in gerrit for the projects) | 19:21 |
clarkb | tonyb: I don't think we need to correct the devstack-gate removal fallout. It's long been deprecated and most of the fallout is in dead projects which will probably get dealt with when we remove them from the tenant config | 19:21 |
tonyb | Okay. | 19:22 |
clarkb | basically we should focus on that sort of cleanup first^ then we can figure out what is still on fire and if it is worth intervention at that point | 19:22 |
tonyb | Is there any process or order for removing xenial jobs | 19:22 |
fungi | or in some very old openstack branches that have escaped eol/deletion | 19:22 |
tonyb | for example system-config-zuul-role-integration-xenial? to pick on one I noticed today | 19:23 |
clarkb | tonyb: the most graceful way to do it is in the child jobs first then remove the parents. I think zuul will enforce this for the most part too | 19:23 |
clarkb | tonyb: for system-config in particular I think we can just proceed with cleanup. Some of my system-config xenial cleanup changes have already merged | 19:23 |
tonyb | clarkb: cool beans | 19:23 |
clarkb | if you do push changes for xenial removal please use https://review.opendev.org/q/topic:drop-ubuntu-xenial that topic | 19:24 |
tonyb | noted | 19:24 |
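Assuming changes are pushed with git-review, setting the shared topic is a one-liner; the -t flag sets the Gerrit topic so the change shows up in the query above:

```shell
# Push the current cleanup change with the agreed topic attached.
git review -t drop-ubuntu-xenial
```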
clarkb | one last thought before we move to the next topic: maybe we pause centos-8-stream image builds in nodepool now as a step 0 for that distro release. As I'm like 98% positive those image builds will fail now | 19:24 |
clarkb | I should be able to push up a change for that | 19:25 |
clarkb | #topic Gitea 1.22 Upgrade | 19:25 |
clarkb | Similar to the last topic I haven't made much progress on this after I did the simple doctor command testing a few weeks ago | 19:25 |
clarkb | I was hoping that they would have a 1.22.1 release before really diving into this as they typically do a bugfix release shortly after the main release. | 19:26 |
clarkb | there is a 1.22.1 milestone that was due like yesterday in github so I do expect that soon and will pick this back up again once available | 19:26 |
clarkb | there are also some fun bugs in that milestone list though I don't think any would really affect our use case | 19:26 |
clarkb | (things like PRs are not mergeable due to git bugs in gitea) | 19:27 |
clarkb | #topic Fixing Cloud Launcher | 19:27 |
clarkb | This was on the agenda to ensure it didn't get lost if the last fix didn't work. But tonyb reports the job was successful last night | 19:27 |
frickler | seems this has passed today | 19:27 |
clarkb | we just had to provide a proper path to the safe.directory config entries | 19:28 |
clarkb | Which is great because it means we're ready to take advantage of the tool for bootstrapping some of the openmetal stuff | 19:28 |
tonyb | Huzzah | 19:28 |
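For context, the fix amounts to allowlisting the absolute checkout path for git, since safe.directory entries that are not full paths are ignored. A minimal sketch with a placeholder path; the scope (--system vs --global) depends on which user runs the launcher:

```shell
# safe.directory matches on the repository's absolute path (or "*"), so the
# full path of the checkout has to be recorded, not just a directory name.
git config --global --add safe.directory /path/to/cloud-launcher-checkout
git -C /path/to/cloud-launcher-checkout status
```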
clarkb | which takes us to the next topic | 19:28 |
clarkb | #topic OpenMetal Cloud Rebuild | 19:28 |
clarkb | The inmotion cloud is gone and we have a new openmetal cloud on the same hardware! | 19:28 |
clarkb | The upside to all this is we have a much more modern platform and openstack deployment to take advantage of. The downside is we have to reenroll the cloud into our systems | 19:29 |
clarkb | I got the ball rolling and over the last 24 hours or so tonyb has really picked up todo items as I've been distracted by other stuff | 19:29 |
clarkb | #link https://etherpad.opendev.org/p/openmetal-cloud-bootstrapping captures our todo list for bootstrapping the new cloud | 19:29 |
tonyb | I think https://review.opendev.org/c/opendev/system-config/+/921765?usp=search is ready now | 19:30 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/921765 is the next item that needs to happen, then we can update cloud launcher to configure security groups and ssh keys, then we can keep working down the todo list in the etherpad | 19:30 |
tonyb | I'm happy to work with whomever to keep that rolling | 19:31 |
clarkb | so anyway reviews welcome as we get changes up to move things along. At this point I expect the vast majority of things to go through gerrit | 19:31 |
clarkb | tonyb: thanks! I should be able to help through thursday at least if we don't get it done by then | 19:31 |
tonyb | It's a good learning opportunity | 19:31 |
tonyb | I was hoping the new mirror node would be on noble but that's far from essential | 19:32 |
clarkb | ya I wouldn't prioritize noble here if there aren't easy fixes for the openafs stuff | 19:32 |
tonyb | Yeah | 19:33 |
frickler | also easy to replace later | 19:33 |
clarkb | ++ | 19:33 |
clarkb | the last thing I wanted to call out on this is that I think fungi and corvus may still have account creation for the system outstanding | 19:33 |
tonyb | I haven't been consistent about removing inmotion as I add openmetal figuring we can do a complete cleanup "at the end" | 19:33 |
clarkb | feel free to reach out to me if you have questions (I worked through the process and while not an expert did work through a couple odd things) | 19:33 |
corvus | yep sorry :) | 19:33 |
clarkb | tonyb: ya that should be fine too. I have a note in the todo list to find those things that we missed already | 19:34 |
frickler | though that's also not too relevant for operating the cloud itself | 19:34 |
tonyb | Oh and datadog ... My invitation expired and I think clarkb was going to ask about dropping it | 19:34 |
corvus | i'm not, like, blocking things right? | 19:34 |
frickler | all infra-root ssh keys should be on all nodes already | 19:34 |
clarkb | corvus: no you are not. I'm more worried that the invites might expire and we'll have to get openmetal to send new ones | 19:34 |
clarkb | but frickler is right your ssh keys should be sufficient for the openstack side of things | 19:34 |
corvus | oh i see. i will get that done soon then. | 19:34 |
tonyb | It looks like we can send them ourselves | 19:35 |
clarkb | tonyb: oh cool in that case it's far less urgent | 19:35 |
tonyb | it's mostly for issue tracking and "above OpenStack" issues | 19:35 |
fungi | oh, i'll try to find the invites | 19:35 |
fungi | i think i may have missed mine | 19:35 |
clarkb | fungi: mine got sorted into the spam folder, but once dug out it was usable | 19:36 |
fungi | noted, thanks | 19:36 |
clarkb | anything else on the subject of the new openmetal cloud? | 19:36 |
frickler | well ... as far as html mails with three-line-links are usable | 19:36 |
frickler | do we have to discuss details like nested virt yet? | 19:36 |
clarkb | frickler: I figured we could wait on that until after we have "normal" nodes booting | 19:37 |
clarkb | adding nested virt labels to nodepool at that point should be trivial. That said I think we can do that as long as the cloud supports it (which I believe it does) | 19:37 |
frickler | ack, will need some testing first. but let's plan to do that before going "full steam ahead" | 19:38 |
tonyb | That works | 19:38 |
clarkb | sounds good | 19:38 |
tonyb | Do we have/need a logo to add to the "sponsors" page? | 19:38 |
clarkb | tonyb: they are already on the opendev.org page | 19:39 |
clarkb | because the inmotion stuff was really openmetal for the last little while. We were just waiting for the cloud rebuild to go through the trouble of renaming everything | 19:39 |
tonyb | Cool | 19:39 |
fungi | though it's worth double-checking that the donors listed there are still current, and cover everyone we think needs to be listed | 19:40 |
clarkb | the main questions there would be do we add osuosl and linaro/equinix/worksonarm | 19:40 |
tonyb | Rax, Vexx, OVH and OpenMetal | 19:40 |
clarkb | that might be a good discussion outside of the meeting though as we have a couple more topics to get into and I can see that taking some time to get through | 19:41 |
fungi | yep | 19:41 |
fungi | not urgent | 19:41 |
tonyb | kk | 19:41 |
clarkb | #topic Improving Mailman Throughput | 19:41 |
clarkb | We increased the number of out runners from 1 to 4. This unfortunately didn't fix the openstack-discuss delivery slowness. But now that we understand things better, it should prevent openstack-discuss slowness from impacting other lists, so a (small) win anyway | 19:42 |
fungi | yeah, there seems to be an odd tuning mismatch between mailman and exim | 19:42 |
clarkb | fungi did more digging and found that both mailman and exim have a batch size of 10 for message delivery. However it almost seems like exim's might be 9? because it complains when mailman gives it 10 messages and goes into queueing mode | 19:42 |
fungi | yeah, exim's error says it's deferring delivery because there were more than 10 recipients, but there are exactly 10 the way mailman is batching them up (confirmed from exim logs even) | 19:43 |
clarkb | my suggestion was that we could bump the exim number quite a bit to see if it stops complaining that mailman is giving it too many messages at once. Then if that helps we can further bump the mailman number up to a closer value but give exim some headroom | 19:43 |
fungi | so i think mailman's default is chosen trying to avoid exceeding exim's default limit, except it's just hitting it instead | 19:43 |
clarkb | say bump exim up to 50 or 100. Then bump mailman up to 45 or 95 | 19:43 |
tonyb | All sounds good to me | 19:44 |
corvus | could also decrease exim's queue interval | 19:44 |
fungi | yeah, right now the delay is basically waiting for the next scheduled queue run | 19:44 |
clarkb | oh thats a piece of info I wasn't aware of. So exim will process each batch in the queue with a delay between them? | 19:45 |
fungi | not exactly | 19:45 |
corvus | (i think synchronous is better if the system can handle it, but if not, then queing and running more often could be a solution) | 19:45 |
fungi | exim will process the message ~immediately if the recipient count is below the threshold | 19:45 |
fungi | otherwise it just chucks it into the big queue and waits for the batch processing to get it later | 19:46 |
clarkb | got it | 19:46 |
clarkb | fungi: was pushing up changes to do some version of ^ something that you were planning to do? | 19:46 |
fungi | except that with our mailing lists, every message exim receives outbound is going to exceed that threshold and end up batched instead | 19:46 |
fungi | clarkb: i can, just figured it wasn't urgent and we could brainstorm in the meeting, which we've done now | 19:47 |
clarkb | thanks! | 19:47 |
clarkb | #topic Testing Rackspace's new cloud offering | 19:47 |
fungi | so with synchronous being preferable, increasing the recipient limit in exim sounds like the way to go | 19:47 |
clarkb | ++ | 19:47 |
fungi | and it doesn't sound like there are any objections with 50 instead of 10 | 19:48 |
clarkb | wfm | 19:48 |
fungi | (where we'll really just be submitting 45 anyway) | 19:48 |
tonyb | #makeitso | 19:49 |
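Before bumping anything it is worth confirming which exim limit is actually tripping; the option names below are assumptions to check against the exim docs and the lists server's logs, not a confirmed diagnosis:

```shell
# Candidate knobs: smtp_accept_queue_per_connection (messages accepted on one
# connection before exim switches to queue-only, default 10) and
# recipients_max (recipients accepted per message). -bP prints the effective
# values; the binary is named exim4 on Debian-family hosts.
exim -bP smtp_accept_queue_per_connection
exim -bP recipients_max
# Show what is currently parked in the queue waiting for the next queue run:
exim -bp | head
```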
clarkb | re rax's new thing they want us to help test. I haven't had time to ask for more info on that. However, enrolling it will be very similar to the openmetal cloud (though without setting flavors and stuff I expect) so we're ensuring the process generally works and are familiar with it by going through the process for openmetal | 19:49 |
clarkb | part of the reason I have avoided digging into this is I didn't want to start something like that then disappear for a week | 19:50 |
clarkb | but for completeness thought I'd keep it on the agenda in case anyone else had info | 19:50 |
tonyb | I was supposed to reach out to cloudnull and/or the RAX helpdesk for more information | 19:51 |
frickler | is this something they sent to all customers or is it specifically targeted to opendev? | 19:51 |
clarkb | frickler: I am not sure. The way it was sent to us was via a ticket to our nodepool ci account | 19:51 |
fungi | i am a paying rackspace customer and did not receive a personal invite, so i think it's selective | 19:51 |
clarkb | I don't think our other rax accounts got invites either. Cloudnull was a big proponent of what we did back in the osic days and may have asked his team to target us | 19:51 |
frickler | ok | 19:52 |
frickler | coincidentally I've seen a number of tempest job timeouts that were all happening on rax recently | 19:52 |
clarkb | ya I think it can potentially be useful for both us and them | 19:53 |
frickler | so assuming the new cloud also has better performance, that might be really interesting | 19:53 |
clarkb | we just need to find the time to dig into it. | 19:53 |
frickler | I can certainly look into helping with testing, just not sure about communications due to time difference | 19:54 |
fungi | yes, the aging rackspace public cloud hardware and platform has been hinted at as the cause of a number of different performance discrepancies/variances in jobs for a while | 19:54 |
fungi | also with it being kvm, migrating to it eventually could mean we can stop worrying about "some of the test nodes you get might be xen" | 19:54 |
frickler | +1 | 19:55 |
fungi | at least i think it's kvm, but don't have much info either | 19:55 |
clarkb | one thing we should clarify is what happens to the resources after the test period. If we aren't welcome to continue using it in some capacity then the amount of effort may not be justified | 19:55 |
clarkb | sounds like we need more info in general. tonyb if you don't manage to find out before I get back from vacation I'll put it on my todo list at that point as I will be less likely to just disappear :) | 19:55 |
clarkb | #topic Open Discussion | 19:56 |
clarkb | We have about 4 minutes left. Anything else that was missed in the agenda? | 19:56 |
tonyb | Perfect | 19:56 |
tonyb | Nothing from me | 19:56 |
fungi | i got nothin' | 19:56 |
frickler | just ftr I finished the pypi cleanup for openstack today | 19:57 |
frickler | but noticed some repos weren't covered yet, like dib. so a bit more work coming up possibly | 19:58 |
tonyb | frickler: Wow! | 19:58 |
tonyb | That's very impressive | 19:58 |
clarkb | frickler: ack thanks for the heads up | 19:58 |
fungi | oh, also you wanted to ask about similar cleanup for opendev-maintained packages | 19:58 |
fungi | like glean | 19:58 |
frickler | yeah, but that's not urgent, so can wait for a less full agenda | 19:59 |
clarkb | for our packages I think it would be good to have a central account that is the opendev pypi owner account | 19:59 |
fungi | though for glean specifically mordred has already indicated he has no need to be a maintainer on it any longer | 19:59 |
clarkb | we can add that account as owner on our packages then remove anyone that isn't the zuul upload account (also make the zuul upload account a normal maintainer rather than owner) | 19:59 |
frickler | yes, and not sure how many other things we actually have on pypi? | 19:59 |
fungi | yeah, the biggest challenge is that there's no api, so we can't script adding maintainers/collaborators | 19:59 |
clarkb | frickler: git-review, glean, jeepyb maybe? | 20:00 |
clarkb | oh and gerritlib possibly | 20:00 |
fungi | still a totally manual clicky process unless someone wants to fix the feature request in warehouse | 20:00 |
clarkb | fungi: ya thankfully our problem space is much smaller than openstack's | 20:00 |
fungi | agreed | 20:00 |
fungi | bindep too | 20:00 |
fungi | git-restack, etc | 20:00 |
clarkb | I've also confirmed that centos 8 stream image builds are failing | 20:01 |
clarkb | and we are at time | 20:01 |
clarkb | thank you everyone. Feel free to continue discussion in #opendev or on the mailing list, but I'll end the meeting here | 20:02 |
clarkb | #endmeeting | 20:02 |
opendevmeet | Meeting ended Tue Jun 11 20:02:02 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:02 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-11-19.00.html | 20:02 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-11-19.00.txt | 20:02 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-11-19.00.log.html | 20:02 |
fungi | thanks clarkb! | 20:02 |
tonyb | Thanks all | 20:02 |
frickler | o/ | 20:02 |