clarkb | meeting time | 19:00 |
---|---|---|
* tonyb waves | 19:00 | |
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Jun 4 19:00:39 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JH2FNAUEA32AS4GH475AHYEPLP4FUGPE/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:01 |
clarkb | I will not be able to run the meeting on June 18 | 19:01 |
clarkb | that is two weeks from today. More than happy for someone else to take over or we can skip if people prefer that | 19:01 |
tonyb | I'll be back in Australia by then so I probably can't run it | 19:02 |
clarkb | I will be afk from the 14-19th | 19:02 |
tonyb | okay. I'll be travelling for part of that. | 19:03 |
clarkb | we can sort out the plan next week. Plenty of time | 19:03 |
tonyb | poor fungi | 19:03 |
tonyb | sounds good | 19:03 |
clarkb | #topic Upgrading old servers | 19:03 |
clarkb | #link https://etherpad.opendev.org/p/opendev-mediawiki-upgrade | 19:03 |
clarkb | I think this has been the recent focus of this effort | 19:03 |
tonyb | yup | 19:04 |
clarkb | looks like there was a change just pushed too I should #link that too | 19:04 |
tonyb | there are some things to read about the plan etc | 19:04 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/921321 Mediawiki deployment for OpenDev | 19:04 |
tonyb | reviews very welcome. it works for local testing but needs more and comprehensive testing | 19:05 |
clarkb | tonyb: anything specific in the plan etherpad you want us to be looking at or careful about? | 19:05 |
tonyb | There is a small announcement that could do with a review | 19:05 |
clarkb | tonyb: ya I think if we can get something close enough to migrate data into we can live with small issues like theming not working and plan a cut over. I assume we'll want to shutdown the old wiki so we don't have content divergence? | 19:05 |
tonyb | as I'd like to get that out reasonably soon | 19:06 |
clarkb | #link https://etherpad.opendev.org/p/opendev-wiki-announce Announcement for wiki changes | 19:06 |
clarkb | that one I assume | 19:06 |
tonyb | Yes we'll shutdown the current server ASAP | 19:06 |
tonyb | There is plenty of planning stuff like moving away from rax-trove etc but I'm pretty happy with the progress | 19:07 |
clarkb | yup I think this is great. I'm planning to dig into it more today after meetings and lunch | 19:07 |
tonyb | the bare bones upgrade of the host OS is IMO a solid improvement | 19:08 |
tonyb | I'll try a noble server this week | 19:08 |
tonyb | I think that's pretty much all there is to say for server upgrades | 19:08 |
clarkb | thanks | 19:08 |
clarkb | #topic AFS Mirror Cleanup | 19:09 |
clarkb | Not much new here other than that devstack-gate has been fully retired | 19:09 |
clarkb | I've been distracted by many other things like gerrit upgrades and gitea upgrades and cloud shutdowns which we'll get to shortly | 19:09 |
clarkb | #topic Gerrit 3.9 Upgrade | 19:09 |
clarkb | This happened. It seemed to go well. People are even noticing some of the new features like suggested edits | 19:09 |
clarkb | Has anyone else seen or heard of any issues since the upgrade? | 19:10 |
clarkb | I guess there was the small gertty issue which is resolvable by starting with a new sqlite db | 19:10 |
tonyb | That's all I'm aware of | 19:11 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/920938 is ready when we think we are unlikely to revert | 19:11 |
clarkb | I'm going to go ahead and remove my WIP vote on that change since we aren't aware of any problems | 19:12 |
tonyb | I'm happy to go ahead whenever | 19:13 |
clarkb | That change has a few children which add the 3.10 image builds and testing. The testing change seems to error when trying to pull the 3.10 which I thought should be in the intermediate ci registry. But I wonder if it is because the tag doesn't exist at all | 19:13 |
clarkb | corvus: ^ just noting that in case you've seen it like does the tooling try to fetch from docker hub proper and then fail even if it could find stuff locally? | 19:13 |
clarkb | in any case that isn't urgent but getting some early feedback on whether or not the next release works and is upgradeable is always good (particularly after the last upgrade was broken initially) | 19:14 |
clarkb | thank you to everyone that helped with the upgrade. Despite my concerns feeling under prepared things went smoothly which probably speaks to how prepared we actually were | 19:14 |
clarkb | #topic Gitea 1.22 Upgrade | 19:15 |
clarkb | I was hoping there would be a 1.22.1 by now but as far as I can tell there isn't. I'm also likely going to put this on the back burner for the immediate future as I've got several other more time sensitive things to worry about before I take that time off | 19:15 |
tonyb | That's fair. | 19:16 |
clarkb | That said I think the next step is going to be getting the upgrade change into shape so if people can review that it is still helpful | 19:16 |
clarkb | then we can upgrade then we can do the doctor tooling one by one on our backend nodes | 19:17 |
tonyb | Whatever works, if we decide we need it then we're pretty prepared thanks to your work | 19:17 |
clarkb | testing of the doctor tool seems to indicate running it is straightforward. We should be able to do it with minimal impact taking one node out of service and doctoring it and doing that in a loop | 19:17 |
clarkb | tonyb: ya I think reviews are the most useful thing at this point for the gitea upgrade | 19:17 |
clarkb | since the initial pass is working as is the doctor tool in testing | 19:17 |
tonyb | Cool | 19:18 |
clarkb | #topic Fixing Cloud Launcher Ansible | 19:19 |
clarkb | frickler: fixed up the security groups in the osuosl cloud but then almost immediately we ran into the git perms trust issue that is a side effect of recent git packaging updates for security concern fixes | 19:19 |
clarkb | This appears to be our only infra-prod-* job affected by the git updates so overall pretty good | 19:19 |
clarkb | for fixing cloud launcher I went ahead and wrote a change that trusts the ansible role repos when we clone them | 19:20 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/921061 workaround the Git issue | 19:20 |
clarkb | I did find that you can pass config options to git clone but they only apply after most of the cloning is complete | 19:20 |
clarkb | so I'm like 99% sure that doing that won't work for this case as the perms check occurs before cloning begins | 19:20 |
tonyb | Yeah it'd be nice if the options applied "earlier" but I think what you've done is good for now | 19:21 |
clarkb | One thing to keep in mind reviewing that change is I'm pretty sure we don't have any real test coverage for it | 19:21 |
clarkb | so careful review is a good idea and tonyb already caught one big issue with it (now fixed) | 19:21 |
tonyb | The general git safe.directory doesn't have a one size fits all solution | 19:22 |
corvus | clarkb: (sorry i'm late) i don't think lack of image in dockerhub should be a problem; that may point to a bug/flaw | 19:22 |
clarkb | corvus: ok, I feel like this has happened before and I've double checked the dependencies and provides/requires and I think everything is working in the order I would expect | 19:23 |
clarkb | and I just wondered if using a new tag is part of the issue since 3.10 doesn't exist anywhere yet but a :latest would be present typically | 19:23 |
clarkb | I guess I'll look more closely | 19:24 |
clarkb | #topic Increase Mailman 3 Out Runner Count | 19:24 |
corvus | yeah; lemme know if you want me to dig into details with you | 19:24 |
clarkb | corvus: thanks | 19:24 |
clarkb | I don't know if anyone else has noticed but recently I had some confusion over emails being in the openstack-discuss list archive and not in my inbox thinking there was some problem | 19:24 |
clarkb | the issue was that delivery for that list takes time. Upwards of 10 minutes. | 19:24 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/920765 Increase the number of out runners to speed up outbound mail delivery | 19:25 |
clarkb | In that change I linked to a discussion about similar problems another mail list host had and ultimately the biggest impact for them was increasing the number of runner instances | 19:25 |
tonyb | Makes sense to me. | 19:25 |
clarkb | I've gone ahead and proposed we do the same. This isn't super critical but I think it will help avoid confusion in the future and keep discussions moving without unnecessary delay | 19:26 |
tonyb | I'm cool with your fix but it'd be cool to have a way to verify it has the intended impact | 19:26 |
corvus | are we still letting exim do the verp? | 19:26 |
tonyb | not that I'd block the change until that exists | 19:26 |
clarkb | corvus: yes I believe exim is doing verp | 19:26 |
clarkb | corvus: and disabling verp is one of the suggestions in the thread I found about improving this performance | 19:27 |
clarkb | however, it seems like we can try this first since it is less impactful to behavior (in theory anyway) | 19:27 |
corvus | k. exim doing verp make things pretty fast, since if we're delivering 10k messages, and 5k are for gmail and 4k are for redhat and 1k are for everyone else (made up numbers), that should boil down to just a handful of outgoing smtp transactions for mailman. | 19:28 |
corvus | s/make things/should make things/ | 19:28 |
clarkb | corvus: makes sense. And ya I suspect the bottleneck may be between mailman and exim based on that thread (but I haven't profiled it locally) | 19:29 |
corvus | if, however, mailman is doing 5k smtp transactions to exim for gmail, then we've lost the advantage | 19:29 |
fungi | from what i gather, mailman 3 limits the number of addresses per message to the mta to a fairly small number in order to avoid some spam detection heuristics that may trigger when you have too many recipients | 19:29 |
corvus | (it should do, ideally, 1, but there are recipient limits, so maybe 10 total, 500 recipients each) | 19:29 |
fungi | i don't recall what the default is for sure, but think it may be something like 10 addresses per delivery | 19:30 |
corvus | okay, so there might be an opportunity to tune that, so that we can maximize the work exim does | 19:30 |
clarkb | sounds good. Do we think that is something to do instead of increasing the out runner instances or something to try next after this update? | 19:30 |
fungi | but yeah, the usual recommendation is to increase the number of threads mailman/django will use to submit messages to the mta so it doesn't just serialize and block | 19:30 |
corvus | my gut says try both, order doesn't matter | 19:31 |
clarkb | ack I guess we proceed with this and try the other thing too | 19:31 |
corvus | (non-exclusive) | 19:31 |
corvus | yep | 19:31 |
corvus | (sending multiple 500 recipient messages to exim in parallel is an ideal outcome) | 19:32 |
clarkb | #topic OpenMetal Cloud Rebuild | 19:32 |
clarkb | The Inmotion/OpenMetal folks sent email recently calling out that they have updated their openstack cloud as a service tooling to deploy on new platforms and deploy a newer openstack version | 19:33 |
clarkb | they have volunteered to help out with the provisioning in the near future so I've been trying to prepare cleanup/shutdown of the existing cloud so that we can gracefully replace it | 19:33 |
clarkb | The hardware needs to be reused rather than setting up a new cloud adjacent to the old one which means shutting everything down first | 19:34 |
clarkb | #link https://review.opendev.org/c/openstack/project-config/+/921072 Nodepool cleanups | 19:34 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/921075 System-config cleanups | 19:34 |
clarkb | my goal is to get these changes landed over the next day or so as I'm meeting with Yuriys at 1pm Pacific time tomorrow to discuss further actions | 19:34 |
clarkb | I expect that nodepool will be in a dirty state (with stuck nodes and maybe stuck images) after the cleanup changes land. corvus pointed out the erase command which I'll use if it comes to that | 19:35 |
clarkb | Also if anyone else is interested in joining tomorrow just let me know. I can share the conference link | 19:35 |
tonyb | Happy to help with all/any of that | 19:35 |
clarkb | tonyb: will do. I think the immediate next step is to land the change which should clean up images in nodepool and see where that gets us | 19:35 |
clarkb | oh and the system-config change should be landable too | 19:35 |
corvus | ping me if there are nodepool cleanup probs | 19:36 |
clarkb | corvus: will do | 19:36 |
clarkb | then maybe tomorrow we land the final cleanup and do the erase step | 19:36 |
clarkb | I'm hopeful that by the end of the week or early next week we'll be able to spin up a new mirror and add the new cloud to nodepool under its new openmetal name | 19:37 |
tonyb | That seems doable :) | 19:37 |
corvus | ++ | 19:38 |
clarkb | #topic Testing Rackspace's New Cloud Product | 19:38 |
clarkb | Yesterday a ticket was opened in our rax nodepool account to let us know that their new openstack flex product is in a beta/testing/trial period and we could opt into testing it. They seem specifically interested in any feedback we may have | 19:39 |
clarkb | I think this is a good idea, but the info is a bit scarce so not sure if it is a good fit for us yet. I'm also pretty swamped with the other stuff going on so not sure I'll have time to run this down before I take that time off | 19:39 |
corvus | heh my first question is "are you sure?" :) | 19:39 |
corvus | you=rax | 19:39 |
clarkb | Wanted to call it out if anyone else wants to look into this more closely. The product is called openstack flex and it might be worth pinging cloudnull to get details or a referral to someone else with that info | 19:40 |
clarkb | corvus: ya I think the upside is we really do batter a cloud pretty good so if they take that data and feedback and improve with it then we're helping | 19:40 |
clarkb | at the same time we might make their lives miserable :) | 19:40 |
tonyb | I can take a stab at that if you'd like | 19:40 |
clarkb | tonyb: sure, I think the main thing is starting up some communication to see if this fits our use case and then go for it I guess | 19:41 |
clarkb | #topic Open Discussion | 19:42 |
clarkb | dansmith discovered today that centos 8 stream seems to have been deleted upstream | 19:42 |
clarkb | our mirrors have faithfully synced this state and now centos 8 stream jobs are breaking | 19:42 |
tonyb | Oh nice | 19:43 |
clarkb | I think that means we can probably put removing centos 8 stream nodes on the todo list for nodepool cleanups | 19:43 |
clarkb | and we can also remove the centos afs mirror as everything should live in centos-stream now | 19:43 |
fungi | the new rax service might be interesting if it performs better, is kvm-based, has stable support for nested kvm acceleration, etc | 19:43 |
clarkb | but all the content has been deleted already so that is mostly just bookkeeping | 19:43 |
clarkb | fungi: ++ | 19:43 |
fungi | we do get a lot of users complaining about performance or lack of cpu features in our current rax nodes | 19:44 |
tonyb | I can find out more | 19:45 |
tonyb | I assume I need to do that via the Account/webUI? | 19:45 |
tonyb | I can also ping cloudnull for an informal chat | 19:46 |
clarkb | tonyb: that would be one approach though it might get us to the first level of support first | 19:46 |
clarkb | an informal chat might be better if cloudnull has time to at least point us at someone else | 19:46 |
tonyb | Okay | 19:48 |
clarkb | sounds like that might be everything | 19:50 |
clarkb | thank you for your time | 19:50 |
clarkb | #endmeeting | 19:50 |
opendevmeet | Meeting ended Tue Jun 4 19:50:28 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:50 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-04-19.00.html | 19:50 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-04-19.00.txt | 19:50 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-04-19.00.log.html | 19:50 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
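
The arithmetic behind the Mailman 3 out runner discussion (19:28 to 19:32) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mailman's or exim's actual batching logic: the recipient counts are corvus's made-up numbers, "everyone else" is treated as a single bucket, and the function and parameter names (`smtp_transactions`, `delivery_rounds`, `max_recipients`, `runners`) are hypothetical names chosen for this sketch.

```python
from math import ceil


def smtp_transactions(recipients_by_domain, max_recipients):
    """Count SMTP transactions if each bucket of recipients is split
    into batches of at most max_recipients addresses (assumed model)."""
    return sum(
        ceil(count / max_recipients)
        for count in recipients_by_domain.values()
    )


def delivery_rounds(transactions, runners):
    """Sequential rounds needed if `runners` transactions run in parallel."""
    return ceil(transactions / runners)


# Made-up numbers from the discussion: 10k recipients in total.
recipients = {"gmail": 5000, "redhat": 4000, "everyone else": 1000}

for limit in (10, 500):
    transactions = smtp_transactions(recipients, limit)
    for runners in (1, 4):
        rounds = delivery_rounds(transactions, runners)
        print(f"limit={limit} transactions={transactions} "
              f"runners={runners} rounds={rounds}")
```

Under these assumptions a limit of 10 addresses per delivery means 1000 transactions from mailman to exim, while a limit of 500 means 20, and extra runners divide the number of sequential rounds in either case. That is consistent with the "try both, order doesn't matter" conclusion, though real delivery time depends on factors this sketch ignores.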