Tuesday, 2025-03-18

clarkbmeeting time19:00
clarkb#startmeeting19:00
opendevmeetclarkb: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee'19:00
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Mar 18 19:00:47 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YM5MZF2IHG6P4FFTRVMNVLJHYOIBVFUD/ Our Agenda19:00
clarkb#topic Announcements19:02
clarkbAnything to announce?19:02
clarkbI don't have any concrete plans but next week the kids are home from school for a break and I may try to do an easy day or two and get out of the house with them19:02
tonybFWIW, next week 25th I head back to AU19:05
clarkback19:05
clarkb#topic Zuul-launcher image builds19:05
clarkbI'm not aware of any new updates on this subject19:05
clarkbthough if corvus is around I'd be curious to know if the quota handling has improved the reliability19:06
fungialso i've still got a change up to add flex dfw3 images/quota19:06
clarkboh do you have a link to that?19:06
fungi#link https://review.opendev.org/943104 Add the DFW3 region for Rackspace Flex19:06
clarkbthanks!19:07
clarkbI'll review that after the meeting19:07
clarkb#topic Updating Flavors in OVH19:08
clarkbThe other related item was the OVH flavor update process19:08
clarkb#link https://etherpad.opendev.org/p/ovh-flavors19:08
clarkbwe proposed starting that yesterday but they came back and said that timing didn't work for them. They will be in touch with timing that does work at some point but I haven't seen any proposed dates from their end19:08
clarkbwe can probably pull this off of the agenda next week. But I wanted to make sure we called out it wasn't happening and we're waiting on timing from ovh19:09
clarkb#topic Container hygiene tasks19:09
clarkb#link https://review.opendev.org/c/opendev/system-config/+/944799 drop python 3.10 image builds19:09
clarkbaccording to codesearch we don't have anything using python 3.10 images anymore19:10
clarkbI think this is a safe change to land at any time19:10
clarkbnext up is rebuilding the images we do use19:10
clarkb#link https://review.opendev.org/c/opendev/system-config/+/944789 rebuild python 3.11 and 3.12 base images19:10
clarkbunfortunately this change has been in uWSGI purgatory19:10
clarkbit seems that compiling current uwsgi on top of current bookworm on aarch64 segfaults19:11
clarkbuwsgi is only barely maintained anymore so I've begun looking at alternatives (which is the next topic)19:11
clarkbI don't think we can safely update python-builder and python-base without also updating uwsgi-base as the wheel builds for things will see mismatched binary packages and we could break things relying on uwsgi today19:11
clarkbso I think our main options here are either to drop uwsgi, fix uwsgi builds somehow, or retry until we get lucky? (the failures don't seem to be 100% consistent)19:12
clarkbmaybe it is worth trying to enqueue to the gate a few times to see if we can get it to go through19:12
clarkbthen finally I've got changes up to move images from python 3.11 to 3.1219:13
clarkb#link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.1219:13
fricklerI didn't look at details yet, but could we use distro pkgs like devstack does?19:13
clarkbwhile those changes don't strictly depend on the new image updates it might be nice to use these rebuilds to also get new base images so we address two things with one set of service restarts19:13
clarkbfrickler: the container images are based on debian bookworm. If there is a bookworm uwsgi package then yes I think that is possible19:14
clarkbwe would add uwsgi to lodgeit's bindep requirements and then switch the base container image from uwsgi-base to python-base19:14
fungi#link https://packages.debian.org/bookworm/uwsgi19:14
clarkbthis will likely downgrade uwsgi for us but we aren't doing anything too crazy with it so thats probably fine (also uwsgi itself isn't super stateful)19:14
fricklerI think that would likely only work with distro python. so it might work for py3.11 then and we'd have to use noble for py3.1219:15
clarkbactually it wouldn't work for either if that is the case19:15
clarkbsince both 3.11 and 3.12 are custom builds on top of debian19:15
clarkbanother option would be to try and pin uwsgi versions to something older than latest to see if we can get that to build more reliably19:16
fricklerand we do need custom python builds?19:16
clarkbit dramatically simplifies supporting a range of pythons19:16
clarkbsince we can use the upstream python container images and not need to have different platforms for different python versions or even wait for new distro releases19:17
clarkbI don't think uwsgi is important enough to warrant dramatically changing how we run python services19:17
clarkbit is basically unmaintained and can't be built on modern platforms reliably19:18
clarkbif we can get by with simple fixes to uwsgi that is fine. But I would rather invest my time in replacing uwsgi than completely rearchitecting our container setup19:18
fricklerack19:18
clarkbwhich is maybe a good segue into the next topic. Reviews on the above changes are very much appreciated too19:19
clarkb#topic Dropping uWSGI19:19
* corvus arrives late19:19
clarkba number of factors have me thinking uWSGI is no longer a viable option for python web servers. First is that the project is minimally maintained and bug fixes are slow and build errors are common. Next is the python world is becoming more async which often means ASGI instead of WSGI19:20
clarkbrather than try and keep the old barely working thing going I suspect a better use of our time is looking forward and picking a modern tool that is maintained and can support asgi and wsgi19:20
clarkbin my investigation I discovered granian which seems to fit these criteria. However, frickler and others have pointed out granian is maintained largely by one person (there are a number of contributors but the vast majority of work is one person) and there is no distro packaging for it19:21
clarkbI think that the lack of distro packaging is less meaningful for us as we install from pypi and they publish x86 and arm wheels19:21
clarkb#link https://review.opendev.org/c/opendev/lodgeit/+/944805 lodgeit granian container update19:22
clarkb#link https://review.opendev.org/c/opendev/system-config/+/944806 lodgeit system-config deployment update19:22
corvuswe only use uwsgi for lodgeit, right?19:22
clarkbcorvus: the uwsgi-base container image is lodgeit only. mailman3 also uses uwsgi but not via the base image19:22
clarkbanyway I'm open to other alternatives we think might be more appropriate, but I do think we should try and stop using uwsgi19:23
corvusmy feeling is -- for something like lodgeit -- whatever works and requires the least time.  if granian works, that's good enough for me; i'd be in favor of adding a granian image and declaring uwsgi unmaintained by us19:24
fungiin the mm3 case uwsgi is part of upstream images we're reusing (well, rebuilding), right?19:24
clarkbfungi: yes19:24
fungiin which case we can probably wait for or help with upstream's decisions on whether to continue relying on it19:24
clarkbya I think being in sync with them and possibly starting a conversation with them about alternatives is a good thing and also less urgent19:25
clarkbalso I think our needs differ from openstack's needs. While it might be more important for openstack to use distro packaged wsgi servers that isn't as important for us19:25
fricklercan we simply re-use the mm3 container and install lodgeit in there? or is that setup too different?19:26
clarkbI'm not sure. But it is fairly different (uses alpine instead of debian, and has django installed)19:26
fungihttps://github.com/maxking/docker-mailman is the mailman containers. they're arranged a bit differently from how we usually do our container images19:26
tonybI think that granian is a good step forward and the single maintainer is better than no maintainer.  The switch to granian looks pretty nice19:26
clarkbgunicorn was also mentioned in discussion in the tc channel. But gunicorn doesn't do asgi. uvicorn was mentioned but that is a separate code base aiui and doesn't do wsgi19:27
clarkbanyway I noted on the change why I thought granian was a good option and I think being able to switch to gunicorn or similar later means this is fairly low risk19:28
tonyb++19:28
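For readers less familiar with the WSGI/ASGI distinction raised in this topic, here is a minimal sketch of the two calling conventions. The names `wsgi_app`, `asgi_app`, `call_wsgi`, and `call_asgi` are illustrative and not taken from lodgeit or any change linked above; the sketch uses only the Python standard library and does not involve granian, gunicorn, uvicorn, or uwsgi.

```python
import asyncio
from wsgiref.util import setup_testing_defaults


def wsgi_app(environ, start_response):
    """WSGI: a synchronous callable that returns an iterable of bytes."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from wsgi\n"]


async def asgi_app(scope, receive, send):
    """ASGI: an async callable that emits response events via send()."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from asgi\n"})


def call_wsgi(app):
    """Drive a WSGI app once with a synthetic request environ."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def call_asgi(app):
    """Drive an ASGI app once and collect the events it sends."""
    events = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(event):
        events.append(event)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    asyncio.run(app(scope, receive, send))
    return events
```

A server that supports only one interface can host only the matching kind of callable, which is why a server handling both was considered attractive in the discussion.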
frickleriiuc our uwsgi build issues were on aarch64, too? https://github.com/maxking/docker-mailman/blob/main/web/Dockerfile#L20-L2119:29
clarkbfrickler: aha thats a good find19:29
clarkbI can pin our uwsgi image to that version and see if it works19:29
clarkband then we can migrate away from uwsgi on a less urgent schedule19:29
fungicorvus summed up my position quite nicely, if it works with minimal effort, then great, lodgeit hopefully shouldn't eat a ton of our admin time19:29
clarkbya so I think plan is try that workaround from mm3, if that works land it. Concurrently we can make a plan to migrate lodgeit (as written the changes I have require coordination between container and system-config so we may have an outage)19:30
fricklerack, I wasn't aware that this is "only" for lodgeit, I'd withdraw my -1 then19:30
clarkbif the workaround doesn't work we can speed up the granian switch. If the workaround works then we can be a bit more cautious and double check for alternatives first19:30
clarkbfrickler: oh cool19:30
clarkbthat gives me a path forward I'll work on that19:31
clarkbthanks for listening to me on this subject19:31
fungiand yeah, i did a search for uwsgi in the open/closed issues and pull requests for the mailman containers and saw they'd been doing work recently to get it working for aarch6419:31
fungiso potentially related19:31
fricklerah, yes, git blame shows this https://github.com/maxking/docker-mailman/pull/743, so rather recent19:32
clarkb#topic Upgrading old servers19:33
clarkbI should probably just merge this and the sprint idea topics together19:33
tonybclarkb: Yeah it seems to be basically the same topic at this point19:33
clarkbI built nb05 and nb06 yesterday and got them deployed19:34
clarkbthey have built every image but bookworm, centos 9 stream, gentoo, and openeuler. The last two are paused and don't build. Just before the meeting I requested rebuilds of the first two19:34
clarkbeverything is looking ok to me so far. I'll plan to clean out the old servers late this week when we're confident we won't need them anymore19:34
clarkb#link https://review.opendev.org/c/opendev/system-config/+/944867 fix nodepool image export cron19:35
clarkbthis is a related fix for a change I made to be backward and forward compatible between docker-compose and docker compose19:35
clarkbwe should land that and make sure we are exporting things properly before deleting the old servers too19:35
clarkbfrickler: ^ you had a concern about updating cron's PATH instead but I don't think that is necessary as docker-compose should be a temporary shim to make things compatible between old and new systems19:36
clarkboh also good news is the nested container stuff to build rocky linux images seems to work fine19:37
clarkbanyway I'm going to keep working on server replacements as time goes on and help is very much welcome19:37
clarkbanyone else have updates on this topic?19:37
frickleryes, I was only worried about what happens on nb04, but if it works there, I'm fine19:37
fungithere was also some error you spotted on the new servers, frickler?19:38
clarkbit should. on the old servers we pip install docker-compose which puts the executable in /usr/local/bin/docker-compose and on noble and newer we write out a shim for docker-compose to docker compose that we write to /usr/local/bin/docker-compose19:38
clarkbfungi: its the same error on both I think. /usr/local/bin isn't in cron's path so we don't find the pip installed executable or our shim19:39
clarkbyour fix uses the rooted path and should fix it for both I think19:39
fungi"/etc/nodepool-builder-compose/docker-compose.yaml: `version` is obsolete"19:39
clarkboh thats a warning19:39
clarkb`docker compose` doesn't use versioned docker-compose.yaml files but `docker-compose` does19:39
fungiokay, so benign then, but something we can clean up after the transition19:39
clarkbya19:39
fricklerack19:40
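The cron failure described in this topic comes down to how PATH lookup works: an executable in a directory that is not on the search path is not found by name, but is still reachable by its full path. The sketch below reproduces that with throwaway directories standing in for /usr/bin and /usr/local/bin. The shim contents and the function name `demo_lookup` are assumptions for illustration, not the actual shim OpenDev deploys.

```python
import os
import shutil
import tempfile


def demo_lookup():
    """Show name-based lookup failing while the rooted path still works."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-ins for /usr/bin and /usr/local/bin (hypothetical layout).
        usr_bin = os.path.join(root, "usr", "bin")
        local_bin = os.path.join(root, "usr", "local", "bin")
        os.makedirs(usr_bin)
        os.makedirs(local_bin)

        # Hypothetical shim forwarding docker-compose to `docker compose`.
        shim = os.path.join(local_bin, "docker-compose")
        with open(shim, "w") as handle:
            handle.write('#!/bin/sh\nexec docker compose "$@"\n')
        os.chmod(shim, 0o755)

        # A cron-like PATH that lacks the local bin directory.
        cron_like = usr_bin
        # An interactive-like PATH that includes it.
        interactive_like = os.pathsep.join([local_bin, usr_bin])

        found_by_cron = shutil.which("docker-compose", path=cron_like)
        found_interactive = shutil.which("docker-compose", path=interactive_like)
        # The rooted path is executable regardless of the search path.
        rooted_ok = os.access(shim, os.X_OK)
        return found_by_cron, found_interactive == shim, rooted_ok
```

Calling the command by its rooted path, as the fix discussed above does, sidesteps the search path entirely, so it behaves the same whether the file is the pip-installed executable or the shim.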
clarkb#topic Running certcheck on bridge19:41
clarkbI haven't had a chance to look into running this out of an infra-prod job19:41
clarkbbut I still intend to as I think that is likely a good way to do it19:41
clarkb#topic Working through our TODO list19:42
clarkb#link https://etherpad.opendev.org/p/opendev-january-2025-meetup19:42
clarkbI marked parallel infra-prod job execution done on that list19:42
clarkband the python 3.12 effort from earlier is also out of that list19:42
clarkbfeel free to pick things off of there and work on them when you have time19:43
clarkb#topic Packaging updates for bindep19:43
clarkbfungi has two proposed changes to bindep that serve as examples for modernizing python packaging with pbr19:43
clarkb#link https://review.opendev.org/938570 Drop requirements.txt19:43
fungijust wanted to touch base quickly on those two remaining changes19:43
clarkb#link https://review.opendev.org/940711 Drop auxiliary requirements files19:43
fungiand either merge or abandon them19:43
clarkbI'm happy to reduce the delta between us and upstream PyPA expectations and this is a good step in showing people using PBR how to do that so +2 from me19:44
fungionce we have a decision one way or the other, we'll tag a new bindep version and then i can work on porting the various packaging updates from bindep to our other tools19:44
fungiideally i'd like to get a bindep release out this week19:45
fricklerI can take a closer look at those tomorrow if you want to wait for that19:45
fungibut since they're stylistic changes for packaging to serve as a possible template for our other tools and wider pbr user base, i want to be sure we have consensus19:45
fungiyeah, tomorrow would be great19:45
fungithis came about in part because of problems openstack was facing using pbr, so it would be great to have a real-world example to point them to instead of something fabricated19:46
fungianyway, that's all i had for this topic19:47
clarkb#topic Open Discussion19:47
clarkbas mentioned I marked parallel infra-prod deployments done. This has dramatically sped up our deploy buildsets in many situations19:47
clarkbthank you to everyone who helped get that over the finish line19:48
corvuswhat's the mutex at?19:48
clarkb419:48
fricklera generic certcheck job in zuul-jobs might also be interesting for other #zuul users, maybe ask if someone there is interested in helping?19:48
clarkbthere is potential to bump that up but I think we'll see diminishing returns19:48
corvusthinking of increasing it more?19:48
corvusok19:48
clarkbmutex of 1 ran periodic in 2 hours or so (usually a little over). Mutex of 2 ran in about 1 hour. Mutex of 4 got it to 40 minutes19:49
fungiwatching the load on bridge we could almost certainly increase it, but there are coalescent event horizons like the letsencrypt job which would need to be refactored to take advantage of much more parallelism19:49
clarkbbumping to 6 I think we'd only get to 35 minutes at best and then ya ^19:49
clarkbI'm happy to increase it and see what happens if people would like to do that19:49
clarkbbut it doesn't seem as urgent as the prior increases19:49
corvusmeh.  sounds like time might be better spent thinking about the next blocker19:50
clarkbI guess if anyone feels strongly about it or wants to experiment push up a change. I don't think anyone will object19:50
fricklerif we're voting I'd rather keep it conservative and stick to 419:50
corvusis there a reason to be conservative?19:50
fungii have a change open documenting how i prepped the openstack-discuss ml to switch it to moderating new subscribers by default:19:50
fungi#link https://review.opendev.org/944893 docs: Switch a mailing list to default moderation19:50
fungi#link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/L4OG3TJ5JBVS4IS7KCQKXER736PWEITB/ [administrivia] Recent change in moderation for new subscribers19:51
corvus(it sounds like the only reason not to do it is because it won't actually help, and we'll just be waiting on other resources outside the mutex)19:51
clarkbthe main reason I can think of is limiting blast radius if we decide we need to shut off ansible runs19:52
clarkbbut the speedup tradeoffs if there would be worth it imo19:52
clarkband if not there then ya not much incentive19:52
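The diminishing returns mentioned in this thread can be checked with some rough arithmetic on the quoted timings (about 120, 60, and 40 minutes for a mutex of 1, 2, and 4). The sketch below assumes a simple model, T(n) = serial + parallel / n, fitted through the two most recent data points. That model is an assumption made here for illustration and was not stated by anyone in the meeting, and the input durations are approximate.

```python
# Approximate periodic buildset durations quoted in the discussion, in minutes.
observed = {1: 120, 2: 60, 4: 40}


def speedup(timings, baseline=1):
    """Speedup of each mutex setting relative to the baseline setting."""
    return {n: timings[baseline] / t for n, t in timings.items()}


def fit_serial_parallel(n_a, t_a, n_b, t_b):
    """Fit T(n) = serial + parallel / n through two (n, T) points."""
    parallel = (t_a - t_b) / (1 / n_a - 1 / n_b)
    serial = t_a - parallel / n_a
    return serial, parallel


def predict(serial, parallel, n):
    """Predicted duration in minutes for a mutex of n under the fitted model."""
    return serial + parallel / n


# Fit through the mutex=2 and mutex=4 observations.
serial, parallel = fit_serial_parallel(2, observed[2], 4, observed[4])
estimate_for_six = predict(serial, parallel, 6)
```

Under these assumptions the fit gives a fixed portion of 20 minutes and an estimate of roughly 33 minutes for a mutex of 6, in line with the "35 minutes at best" guess. The same model would predict about 100 minutes for a mutex of 1, not the observed two hours, so it is only a loose approximation.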
clarkbGerrit meets is happening at 00:00 UTC between tuesday and wednesday. That is in just over 4 hours19:53
clarkbif you want to participate they stream it to gerritforge's youtube channel19:53
clarkbthey will discuss gerrit caches and I intend on listening in and sending questions (likely to discord)19:53
clarkbalso the openinfra foundation is putting together a newsletter for the end of march and wants to spotlight opendev19:53
clarkbdraft work is going in https://etherpad.opendev.org/p/opendev_newsletter19:54
clarkbfeel free to add ideas or write something if you are interested or have something you want to communicate in that format. I'll probably draft things early next week if no one else beats me to it19:54
fricklerfungi: btw. did you check the load on wiki before rebooting? just wondering whether the issue really was related to that19:55
fungifrickler: i didn't, but in the past have observed that the openid login problem persists even after load dies down19:55
fungiit's probably restored by restarting apache and mariadb/mysql, guessing there's something going on between them that gets stuck19:57
fungibut a reboot takes care of all of that19:57
fricklerah, ok, maybe next time I'm confident enough to try that on my own, then19:58
fungifeel free19:58
fungiwe're really trying not to waste too much of our time on it, and would instead prefer to spend it reviewing tonyb's replacement work19:58
fricklerwell this morning I preferred a running wiki with broken login to a possibly completely broken one after a reboot19:58
fungifair enough19:59
clarkbfwiw the wiki did work for me yesterday when prepping the agenda19:59
clarkbso whatever broke it occurred between about 00:00 and when frickler noticed it19:59
fungisame, i edited the agenda during my afternoon yesterday as well19:59
clarkband we are at time. Thank you everyone!19:59
fungithanks clarkb!19:59
frickleractually bauzas noticed, but anyway ;)19:59
clarkbwe'll be back here same time and location next week19:59
clarkbfeel free to continue discussion in our normal comms channels20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Mar 18 20:00:13 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-03-18-19.00.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-03-18-19.00.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-03-18-19.00.log.html20:00
