Tuesday, 2024-06-25

*** diablo_rojo_phone is now known as Guest10734		11:21
frickler	fyi I'm going to be like 15min late after the long TC meeting	18:49
clarkb	frickler: noted	18:50
clarkb	#startmeeting infra	19:00
opendevmeet	Meeting started Tue Jun 25 19:00:27 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.	19:00
opendevmeet	Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.	19:00
opendevmeet	The meeting name has been set to 'infra'	19:00
clarkb	#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/IWHUW7OFDHZSL7MGPOK53LJ3ZR43OSOF/ Our Agenda	19:00
clarkb	#topic Announcements	19:01
clarkb	It's been a couple of weeks since the last meeting, but other than that I'm not aware of any important announcements	19:01
clarkb	might be worth noting that a week from Thursday is a major US holiday and I expect those of us in the US will be busy enjoying the day off	19:01
clarkb	anything else to announce?	19:02
clarkb	seems like that's it	19:04
clarkb	#topic Upgrading Old Servers	19:04
clarkb	tonyb continues to poke at configuration management and related infrastructure for a rebuilt wiki server	19:05
tonyb	Yup it's getting closer	19:05
clarkb	The last big question that came up was where logs should live for the apache embedded in the container images that is responsible for running the php stuff. I suggested that we could hook that into our regular system of sending container logs through syslog to log files on disk	19:05
clarkb	I like that approach as it keeps the logs distinct from the host-side ssl terminating apache that we use on most of our systems and are likely to use here as well	19:06
clarkb	but there are many ways to do that so I guess let tonyb know if you have different ideas	19:06
tonyb	I still need to configure elasticsearch, but I can build a server in CI that "works", i.e. it seems to do the important stuff	19:07
clarkb	that is great progress	19:07
fungi	was the conclusion to directly expose the container apache or reverse-proxy to it from an outer apache?	19:07
tonyb	At some stage soon I want to try importing the images from the existing server and pointing a held node at the trove DB	19:07
tonyb	fungi: I'm still using a reverse proxy	19:08
clarkb	tonyb: if you do that you'll probably need to make a copy of the db, since I'm not sure it's safe to have two different installs talking to the same db	19:08
clarkb	but that sounds like a great step to include as part of the how do we migrate story	19:09
tonyb	I'll double check on that ... fortunately there is a mariadb server just waiting for data :)	19:09
fungi	yeah, weird as it sounds, proxying from an apache to another apache makes the most sense for consistency with our other systems	19:10
fungi	just wanted to double-check	19:10
tonyb	fungi: when I get a more complete install I'll need you to poke at the antispam stuff to see if it's working as expected	19:10
fungi	yep	19:11
tonyb	It's installed but that's about as far as I have gotten	19:11
tonyb	It's funny to connect to http://new-server/ and watch all the 30X replies to get to a 200 ;P	19:12
clarkb	redirect all the things	19:12
clarkb	tonyb: you were also poking at booting noble nodes but ran into trouble with the afs packaging situation on noble (due to our use of a ppa that doesn't have packages yet)	19:12
clarkb	tonyb: has there been any progress on that? That's something I wanted to catch up on after my week off and haven't managed to	19:13
tonyb	I have patches up that I think passed CI yesterday; the topic is noble-mirror or something like that	19:13
tonyb	I poked the Ubuntu stable team to get the fixed packages moved into -updates	19:14
clarkb	#link https://review.opendev.org/q/topic:noble-mirror Deploying noble based mirror servers	19:14
clarkb	anything else for server upgrades?	19:15
tonyb	I think that's it for now	19:15
clarkb	#topic AFS Mirror Cleanups	19:15
clarkb	This is something I've started pushing on again recently.	19:15
clarkb	topic:drop-ubuntu-xenial has at least one new change. This change drops xenial from system-config base role testing. I noted in the commit message for that change that this might be one of the more "dangerous" changes in the xenial cleanup for us. The reason for that is we require the infra-prod-service-base job to succeed before running most other jobs	19:16
clarkb	so if we start failing against xenial we could prevent other jobs from running even for stuff not on xenial. That said I think the risk is still relatively low as the base role stuff doesn't change often	19:17
clarkb	open to feedback if you've got it (probably best to keep that in the review though)	19:17
clarkb	I've also been trying to catch up with centos 8 stream cleanup now that none of those test nodes are really functional	19:18
clarkb	topic:drop-centos-8-stream has a whole bunch of fun changes to make that cleanup happen. The vast majority should be safe. There is an openstack-zuul-jobs change that is -1'd by zuul because projects have c8s jobs for fips	19:18
clarkb	I've been trying to get the word out and brought this up with the openstack qa team and tc. Sounds like the qa team may send email about it	19:19
clarkb	at some point I expect we may end up force merging that change though and forcing projects to address the problems if they haven't already	19:19
clarkb	I think the risk is really low though considering that all the jobs are failing on that platform since no packages are available.	19:21
clarkb	#topic Gitea 1.22 Upgrade	19:21
clarkb	This is the other item that is/was high on my todo list. Unfortunately there still isn't a 1.22.1 release yet. I don't think many if any of the issues they have with 1.22.0 will affect us but there were a lot of issues and some of them seemed important (for general use)	19:21
clarkb	so I'm feeling more confident waiting for a 1.22.1 release before we upgrade	19:21
clarkb	once we have upgraded we can proceed with fixing up the database encodings and all that	19:22
clarkb	anyway no new info here other than 1.22.1 is still absent	19:22
clarkb	#topic Improving Mailman Mail Throughput	19:22
clarkb	as discussed last time we likely need to increase both mailman queue batch sizes and exim verp batch sizes	19:23
clarkb	I don't think a change has been pushed for that yet unless I missed it in today's early morning scrollback but fungi was looking into it	19:23
clarkb	fungi: is there anything else to add?	19:23
tonyb	#link https://review.opendev.org/c/opendev/system-config/+/922703	19:24
fungi	oh, i pushed one, sorry	19:24
tonyb	I think that's what we're talking about right?	19:24
clarkb	oh hey I did miss it in scrollback thanks	19:25
fungi	thanks tonyb	19:25
clarkb	yup	19:25
clarkb	I'll give that a review after the meeting	19:25
clarkb	#topic OpenMetal Cloud Rebuild	19:26
clarkb	So this was all going great until it wasn't :)	19:26
clarkb	We had the cloud fully running in nodepool with max-servers 50 set and jobs were running etc. But then there were failures booting nodes and frickler's debugging indicated we ran out of disk?	19:26
clarkb	sounds like maybe ceph isn't backing all of the disk related stuff that we expect it to be backing and that led to us filling the smaller portion of disk not allocated to ceph	19:27
clarkb	frickler: is that a reasonable summary?	19:27
frickler	I sent a mail about that to the openmetal thread	19:27
frickler	the issue is that glance and nova cannot share the ceph pool properly in the current configuration	19:28
clarkb	ya you also suggested some kolla setting changes that could be applied to fix it. I think it is a good idea to work through the openmetal folks on this as I suspect it may affect their product as a whole and this is something we want to ensure is working for us and everyone else	19:28
frickler	thus nova needs to download each image and upload it to ceph again	19:28
clarkb	aha	19:28
clarkb	that would be inefficient	19:28
frickler	which fills up the root partition which is only like 200G and our images are large	19:28
frickler	it was actually openmetal who discovered the disk full condition, too	19:29
frickler	maybe you can poke yuri again since there was no response yet to my mail afaict	19:29
clarkb	frickler: I haven't seen a response to your email yet, any chance they responded to you directly instead of cc'ing the rest of us? or should I try and followup with them in a bit?	19:29
clarkb	ack can do	19:29
clarkb	there was also the question of the number of ceph placement groups per osd being very low. The docs suggest 100 per osd but we're at like 4? I can mention that too	19:30
frickler	once that is fixed we can do another production attempt and then possibly decide about the ceph tuning	19:30
clarkb	though the docs are also a bit hand-wavy around how much it actually matters	19:30
clarkb	#link https://docs.ceph.com/en/latest/dev/placement-group/	19:31
clarkb	sounds good thank you for helping with the debugging on that	19:31
frickler	well it will worsen the performance a bit, but difficult to tell how much	19:31
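For reference, the rule of thumb behind the "100 per osd" figure can be sketched in a few lines of Python. This is a minimal sketch assuming the classic heuristic from the Ceph placement group docs (total PGs ≈ OSDs × 100 / replica count, rounded up to a power of two) applied to a single pool; the helper names and the OSD/pool numbers are hypothetical, not the actual OpenMetal cloud layout, and current Ceph releases can also manage pg_num through the autoscaler.

```python
import math


def suggested_total_pgs(num_osds: int, replica_size: int, target_pgs_per_osd: int = 100) -> int:
    """Hypothetical helper: rule-of-thumb pg_num for one pool.

    Assumes the classic heuristic (OSDs * target) / replica size, rounded up
    to the next power of two. Clusters with several pools sharing the same
    OSDs need to split this budget between them.
    """
    if num_osds < 1 or replica_size < 1:
        raise ValueError("num_osds and replica_size must be positive")
    raw = num_osds * target_pgs_per_osd / replica_size
    # Round up to the next power of two (never below 1).
    return 2 ** max(0, math.ceil(math.log2(raw)))


def pgs_per_osd(pg_num: int, replica_size: int, num_osds: int) -> float:
    """Hypothetical helper: average number of PG replicas hosted per OSD."""
    # Every PG is stored on replica_size distinct OSDs.
    return pg_num * replica_size / num_osds


# Hypothetical cluster: 24 OSDs, 3x replication, a pool created with pg_num=32.
print(pgs_per_osd(32, 3, 24))       # -> 4.0
print(suggested_total_pgs(24, 3))   # -> 1024
```

With those made-up numbers a pool at pg_num 32 averages only 4 PGs per OSD, while the heuristic would put it at 1024 PGs (about 128 per OSD); how much that gap matters for performance is exactly the hand-wavy part noted above.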
clarkb	anything else openmetal related?	19:32
frickler	I think that should be all for now	19:33
frickler	oh	19:33
frickler	ask about the monitoring thing again, I think that also went unnoticed in your mail or unresponded at least	19:33
clarkb	can do /me scribbles some notes on the todo list	19:34
clarkb	#topic Testing Rackspace's New Cloud Offering	19:34
clarkb	rax recently reached out to fungi and myself about this. Still not a ton of info, but I'm working to try and schedule a short meeting to allow us to discuss it more synchronously and determine the next step here	19:35
clarkb	I proposed July 8 as a possible day as it avoids the holiday next week and isn't this week, but have no idea what their schedule is like and haven't heard back yet.	19:35
frickler	do you have a mail about this that you could share?	19:36
clarkb	But hopefully we'll be able to sync up and learn more and do something productive around this. I think helping them burn in the new product if we can is a great idea	19:36
clarkb	frickler: just the ticket they filed against the nodepool account	19:36
fungi	i think the details were the same as what they sent opendev, yeah	19:37
frickler	ah, that sounded like you were contacted directly, ok	19:37
clarkb	ya it's basically just "we have a new product in test/beta/limited availability, would you like to be an early user"	19:38
fungi	the separate outreach was technically to the foundation staff about using it, but the actual foundation wouldn't really have much use for it beyond supplying resources for opendev to use, i think	19:38
frickler	ack	19:38
fungi	rackspace contacted business development folks on the foundation staff, who basically forwarded it to me and clarkb	19:38
fungi	and in the end it was a matter of "yeah they already reached out to opendev about this same thing"	19:39
clarkb	and without any more info than we got in the ticket	19:39
clarkb	once I hear back anything more actionable I can share that info	19:40
frickler	so more business than technical, I'm fine to be left out of that ;)	19:40
clarkb	#topic Open Discussion	19:40
clarkb	I wanted to mention that dib was also hit by the c8s stuff but is getting sorted out	19:41
clarkb	mostly an fyi, I don't think we need help with it and jobs should be passing again as of today	19:41
corvus	some of the recent zuul performance improvements have resulted in a significant (40%+) reduction in peak zk data size in opendev:	19:42
corvus	https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-30d&to=now&viewPanel=38	19:42
clarkb	starlingx also ran into pip 24.1 problems (there is a bunch of discussion of that in #opendev from today) related to pip failing to handle metadata on some packages	19:42
fungi	oh wow!	19:42
fungi	that's a nice performance gain	19:42
tonyb	corvus: awesome!	19:42
corvus	fungi: yeah, i was not expecting it to be so big with opendev's specific characteristics, so was a pleasant surprise :)	19:43
corvus	clarkb: is an underlying cause the age of the package in question? i'm wondering if there may be more time-bombs like that...	19:43
fungi	well, there are a number of backward-incompatible changes in pip 24.1	19:44
clarkb	corvus: sort of. The package is older and newer versions do fix it	19:45
clarkb	corvus: but I think you could hit the same issue with modern packages too	19:46
corvus	ack	19:46
fungi	it dropped support for python 3.7, refuses to install anything with a non-pep440-compliant version string, and also vendors newer libs for things like metadata processing which may have gotten more strict than they used to be	19:46
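As a rough illustration of the stricter version handling, the Python sketch below checks strings against the reference regular expression published in PEP 440 (Appendix B). pip does its parsing through its vendored copy of the `packaging` library rather than this stand-alone function, so treat `is_pep440_version` as a hypothetical helper for spotting version strings that pip 24.1 may now reject, not as pip's actual code path.

```python
import re

# Reference regex from PEP 440 Appendix B. It is compiled with re.VERBOSE,
# so the whitespace and "#" comments below are ignored by the regex engine.
VERSION_PATTERN = r"""
    v?
    (?:
        (?:(?P<epoch>[0-9]+)!)?                     # epoch
        (?P<release>[0-9]+(?:\.[0-9]+)*)            # release segment
        (?P<pre>                                    # pre-release
            [-_\.]?
            (?P<pre_l>alpha|a|beta|b|preview|pre|c|rc)
            [-_\.]?
            (?P<pre_n>[0-9]+)?
        )?
        (?P<post>                                   # post release
            (?:-(?P<post_n1>[0-9]+))
            |
            (?:
                [-_\.]?
                (?P<post_l>post|rev|r)
                [-_\.]?
                (?P<post_n2>[0-9]+)?
            )
        )?
        (?P<dev>                                    # dev release
            [-_\.]?
            (?P<dev_l>dev)
            [-_\.]?
            (?P<dev_n>[0-9]+)?
        )?
    )
    (?:\+(?P<local>[a-z0-9]+(?:[-_\.][a-z0-9]+)*))? # local version
"""

# Anchor the pattern so the whole string must match, allowing outer whitespace.
_VERSION_RE = re.compile(r"^\s*" + VERSION_PATTERN + r"\s*$", re.VERBOSE | re.IGNORECASE)


def is_pep440_version(version: str) -> bool:
    """Hypothetical helper: True if the whole string is a valid PEP 440 version."""
    return _VERSION_RE.match(version) is not None


for candidate in ["1.0", "2.0.0rc1", "1.0.post1", "2004d", "1.0-foo"]:
    print(candidate, is_pep440_version(candidate))
```

For example, `is_pep440_version("2.0.0rc1")` is True while `is_pep440_version("2004d")` is False; older pip releases tolerated strings like the latter via legacy version handling.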
clarkb	I may end up being away from irc/matrix at some point today and/or tomorrow. I really need to finally get around to RMAing this laptop, and before I do I want to retest with ubuntu noble and wayland vs x11 etc to ensure this isn't just a software problem. I think I'm going to start diving into that today if I can and then sit on the phone tomorrow with lenovo if still necessary	19:49
clarkb	I got a talk accepted to the openinfra summit event in korea and need a working laptop before that event	19:49
clarkb	the annoying thing is it mostly works if I disable modesetting in the kernel, but if I do that everything has to run at 1920x1080 (including external displays) and I can't control display brightness	19:50
tonyb	clarkb: congrats and noted	19:50
clarkb	Last call otherwise I think we can have 10 minutes back for $meal/sleep/etc	19:51
tonyb	.zZ	19:51
clarkb	thanks everyone!	19:52
clarkb	#endmeeting	19:52
opendevmeet	Meeting ended Tue Jun 25 19:52:11 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)	19:52
opendevmeet	Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-25-19.00.html	19:52
opendevmeet	Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-25-19.00.txt	19:52
opendevmeet	Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-25-19.00.log.html	19:52
* corvus falls asleep into bowl of rice		19:52
