*** diablo_rojo_phone is now known as Guest10734 | 11:21 | |
frickler | fyi I'm going to be like 15min late after the long TC meeting | 18:49 |
clarkb | frickler: noted | 18:50 |
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Jun 25 19:00:27 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/IWHUW7OFDHZSL7MGPOK53LJ3ZR43OSOF/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:01 |
clarkb | It's been a couple weeks since the last meeting but other than that I'm not aware of any important announcements | 19:01 |
clarkb | might be worth noting that a week from Thursday is a major US holiday and I expect those of us in the US will be busy enjoying the day off | 19:01 |
clarkb | anything else to announce? | 19:02 |
clarkb | seems like that's it | 19:04 |
clarkb | #topic Upgrading Old Servers | 19:04 |
clarkb | tonyb continues to poke at configuration management and related infrastructure for a rebuilt wiki server | 19:05 |
tonyb | Yup it's getting closer | 19:05 |
clarkb | The last big question that came up was where logs should live for the apache embedded in the container images that is responsible for running the php stuff. I suggested that we could hook that into our regular container-logs-to-syslog-to-log-files-on-disk system | 19:05 |
clarkb | I like that approach as it keeps the logs distinct from the host side ssl terminating apache that we use on most of our systems and are likely to use here as well | 19:06 |
clarkb | but there are many ways to do that so I guess let tonyb know if you have different ideas | 19:06 |
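The container-logs-to-syslog pattern clarkb describes could be sketched roughly as below; the service name, tag, and image are hypothetical placeholders, not the actual wiki deployment config:

```yaml
# docker-compose fragment (illustrative names): send the container's
# stdout/stderr to the host syslog daemon with a distinguishing tag,
# which an rsyslog rule on the host can then route to its own file
services:
  wiki-apache:
    image: example/wiki:latest
    logging:
      driver: syslog
      options:
        tag: "docker-wiki-apache"
```

On the host, a matching rsyslog filter on that tag would write the messages to a dedicated log file, keeping them separate from the SSL-terminating apache's logs.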
tonyb | I still need to configure elastic search but I can build a server in CI that "works", i.e. seems to do the important stuff | 19:07 |
clarkb | that is great progress | 19:07 |
fungi | was the conclusion to directly expose the container apache or reverse-proxy to it from an outer apache? | 19:07 |
tonyb | At some stage soon I want to try importing the images from the existing server and pointing a held node at the trove DB | 19:07 |
tonyb | fungi: I'm still using a reverse proxy | 19:08 |
clarkb | tonyb: if you do that you'll probably need to make a copy of the db? since I'm not sure it's safe to have two different installs talking to the same db? | 19:08 |
clarkb | but that sounds like a great step to include as part of the how do we migrate story | 19:09 |
tonyb | I'll double check on that ... fortunately there is a mariadb server just waiting for data :) | 19:09 |
fungi | yeah, weird as it sounds, proxying from an apache to another apache makes the most sense for consistency with our other systems | 19:10 |
fungi | just wanted to double-check | 19:10 |
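The outer-apache-to-container-apache pattern fungi is confirming usually looks something like the fragment below; the hostname and backend port are made up for illustration:

```apache
# SSL-terminating host apache proxying to the container's apache
# (ServerName and backend port are hypothetical)
<VirtualHost *:443>
    ServerName wiki.example.org
    SSLEngine on
    ProxyPreserveHost On
    ProxyPass        "/" "http://127.0.0.1:8080/"
    ProxyPassReverse "/" "http://127.0.0.1:8080/"
    RequestHeader set X-Forwarded-Proto "https"
</VirtualHost>
```

The payoff of this layout is consistency: TLS, logging, and redirects live in the host apache like on every other system, while the container apache only has to serve php.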
tonyb | fungi: when I get a more complete install I'll need you to poke at the antispam stuff to see if it's working as expected | 19:10 |
fungi | yep | 19:11 |
tonyb | It's installed but that's about as far as I have gotten | 19:11 |
tonyb | It's funny to connect to http://new-server/ and watch all the 30X replies to get to a 200 ;P | 19:12 |
clarkb | redirect all the things | 19:12 |
clarkb | tonyb: you were also poking at booting noble nodes but ran into trouble with the afs packaging situation on noble (due to our use of a ppa that doesn't have packages yet) | 19:12 |
clarkb | tonyb: has there been any progress on that? That's something I wanted to catch up on after my week off and haven't managed to | 19:13 |
tonyb | I have patches up that I think passed CI yesterday; the topic is noble-mirror or something like that | 19:13 |
tonyb | I poked the Ubuntu stable team to get the fixed packages moved into -updates | 19:14 |
clarkb | #link https://review.opendev.org/q/topic:noble-mirror Deploying noble based mirror servers | 19:14 |
clarkb | anything else for server upgrades? | 19:15 |
tonyb | I think that's it for now | 19:15 |
clarkb | #topic AFS Mirror Cleanups | 19:15 |
clarkb | This is something I've started pushing on again recently. | 19:15 |
clarkb | topic:drop-ubuntu-xenial has at least one new change. This change drops xenial from system-config base role testing. I noted in the commit message for that change that this might be one of the more "dangerous" changes in the xenial cleanup for us. The reason for that is we require the infra-prod-service-base job to succeed before running most other jobs | 19:16 |
clarkb | so if we start failing against xenial we could prevent other jobs from running even for stuff not on xenial. That said I think the risk is still relatively low as the base role stuff doesn't change often | 19:17 |
clarkb | open to feedback if you've got it (probably best to keep that in the review though) | 19:17 |
clarkb | I've also been trying to catch up with centos 8 stream cleanup now that none of those test nodes are really functional | 19:18 |
clarkb | topic:drop-centos-8-stream has a whole bunch of fun changes to make that cleanup happen. The vast majority should be safe. There is an openstack-zuul-jobs change that is -1'd by zuul because projects have c8s jobs for fips | 19:18 |
clarkb | I've been trying to get the word out and brought this up with the openstack qa team and tc. Sounds like the qa team may send email about it | 19:19 |
clarkb | at some point I expect we may end up force merging that change though and forcing projects to address the problems if they haven't already | 19:19 |
clarkb | I think the risk is really low though considering that all the jobs are failing on that platform since no packages are available. | 19:21 |
clarkb | #topic Gitea 1.22 Upgrade | 19:21 |
clarkb | This is the other item that is/was high on my todo list. Unfortunately there still isn't a 1.22.1 release yet. I don't think many if any of the issues they have with 1.22.0 will affect us but there were a lot of issues and some of them seemed important (for general use) | 19:21 |
clarkb | so I'm feeling more confident waiting for a 1.22.1 release before we upgrade | 19:21 |
clarkb | once we have upgraded we can proceed with fixing up the database encodings and all that | 19:22 |
clarkb | anyway no new info here other than 1.22.1 is still absent | 19:22 |
clarkb | #topic Improving Mailman Mail Throughput | 19:22 |
clarkb | as discussed last time we likely need to increase both mailman queue batch sizes and exim verp batch sizes | 19:23 |
clarkb | I don't think a change has been pushed for that yet unless I missed it in today's early morning scrollback but fungi was looking into it | 19:23 |
clarkb | fungi: is there anything else to add? | 19:23 |
tonyb | #link https://review.opendev.org/c/opendev/system-config/+/922703 | 19:24 |
fungi | oh, i pushed one, sorry | 19:24 |
tonyb | I think that's what we're talking about right? | 19:24 |
clarkb | oh hey I did miss it in scrollback thanks | 19:25 |
fungi | thanks tonyb | 19:25 |
clarkb | yup | 19:25 |
clarkb | I'll give that a review after the meeting | 19:25 |
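For context on the exim half of the batch-size discussion: the relevant knob on an smtp transport is `max_rcpt`, which caps recipients per delivery and therefore controls batching. A fragment like the following sketches the idea; the value is purely illustrative and the actual change is in the review under discussion:

```
# exim transport fragment (illustrative value): max_rcpt limits how
# many recipients are sent per SMTP delivery, so raising it lets exim
# batch more recipients into each outbound message
remote_smtp:
  driver = smtp
  max_rcpt = 500
```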
clarkb | #topic OpenMetal Cloud Rebuild | 19:26 |
clarkb | So this was all going great until it wasn't :) | 19:26 |
clarkb | We had the cloud fully running in nodepool with max-servers 50 set and jobs were running etc. But then there were failures booting nodes and frickler's debugging indicated we ran out of disk? | 19:26 |
clarkb | sounds like maybe ceph isn't backing all of the disk related stuff that we expect it to be backing and that led to us filling the smaller portion of disk not allocated to ceph | 19:27 |
clarkb | frickler: is that a reasonable summary? | 19:27 |
frickler | I sent a mail about that to the openmetal thread | 19:27 |
frickler | the issue is that glance and nova cannot share the ceph pool properly in the current configuration | 19:28 |
clarkb | ya you also suggested some kolla setting changes that could be applied to fix it. I think it is a good idea to work through the openmetal folks on this as I suspect it may affect their product as a whole and this is something we want to ensure is working for us and everyone else | 19:28 |
frickler | thus nova needs to download each image and upload it to ceph again | 19:28 |
clarkb | aha | 19:28 |
clarkb | that would be inefficient | 19:28 |
frickler | which fills up the root partition which is only like 200G and our images are large | 19:28 |
frickler | it was actually openmetal who discovered the disk full condition, too | 19:29 |
frickler | maybe you can poke yuri again since there was no response yet to my mail afaict | 19:29 |
clarkb | frickler: I haven't seen a response to your email yet, any chance they responded to you directly instead of cc'ing the rest of us? or should i try and followup with them in a bit? | 19:29 |
clarkb | ack can do | 19:29 |
clarkb | there was also the question of the number of ceph placement groups per osd being very low. The docs suggest 100 per osd but we're at like 4? I can mention that too | 19:30 |
frickler | once that is fixed we can do another production attempt and then possibly decide about the ceph tuning | 19:30 |
clarkb | though the docs are also a bit hand wavey around how much it actually matters | 19:30 |
clarkb | #link https://docs.ceph.com/en/latest/dev/placement-group/ | 19:31 |
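The "100 PGs per OSD" guidance maps to the rule-of-thumb sizing formula from the ceph docs, sketched here as a rough helper (nothing deployed, just the arithmetic):

```python
import math

def suggested_pg_count(num_osds, replica_size=3, pgs_per_osd=100):
    """Rule-of-thumb PG count for a pool: (OSDs * target PGs per OSD)
    divided by the replica count, rounded up to a power of two."""
    raw = (num_osds * pgs_per_osd) / replica_size
    return 2 ** math.ceil(math.log2(raw))

# e.g. a small cloud with 9 OSDs and 3x replication
print(suggested_pg_count(9))  # -> 512
```

As clarkb notes, the docs hedge on how much this actually matters; too few PGs mainly hurts data balance across OSDs rather than breaking anything outright.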
clarkb | sounds good thank you for helping with the debugging on that | 19:31 |
frickler | well it will worsen the performance a bit, but difficult to tell how much | 19:31 |
clarkb | anything else openmetal related? | 19:32 |
frickler | I think that should be all for now | 19:33 |
frickler | oh | 19:33 |
frickler | ask about the monitoring thing again, I think that also went unnoticed in your mail, or at least got no response | 19:33 |
clarkb | can do /me scribbles some notes on the todo list | 19:34 |
clarkb | #topic Testing Rackspace's New Cloud Offering | 19:34 |
clarkb | rax recently reached out to fungi and myself about this. Still not a ton of info, but I'm working to try and schedule a short meeting to allow us to discuss it more synchronously and determine the next step here | 19:35 |
clarkb | I proposed July 8 as a possible day as it avoids the holiday next week and isn't this week, but have no idea what their schedule is like and haven't heard back yet. | 19:35 |
frickler | do you have a mail about this that you could share? | 19:36 |
clarkb | But hopefully we'll be able to sync up and learn more and do something productive around this. I think helping them burn in the new product if we can is a great idea | 19:36 |
clarkb | frickler: just the ticket they filed against the nodepool account | 19:36 |
fungi | i think the details were the same as what they sent opendev, yeah | 19:37 |
frickler | ah, that sounded like you were contacted directly, ok | 19:37 |
clarkb | ya it's basically just "we have a new product in test/beta/limited availability would you like to be an early user" | 19:38 |
fungi | the separate outreach was technically to the foundation staff about using it, but the actual foundation wouldn't really have much use for it beyond supplying resources for opendev to use, i think | 19:38 |
frickler | ack | 19:38 |
fungi | rackspace contacted business development folks on the foundation staff, who basically forwarded it to me and clarkb | 19:38 |
fungi | and in the end it was a matter of "yeah they already reached out to opendev about this same thing" | 19:39 |
clarkb | and without any more info than we got in the ticket | 19:39 |
clarkb | once I hear back anything more actionable I can share that info | 19:40 |
frickler | so more business than technical, I'm fine to be left out of that ;) | 19:40 |
clarkb | #topic Open Discussion | 19:40 |
clarkb | I wanted to mention that dib was also hit by the c8s stuff but is getting sorted out | 19:41 |
clarkb | mostly an fyi, I don't think we need help with it and jobs should be passing again as of today | 19:41 |
corvus | some of the recent zuul performance improvements have resulted in a significant (40%+) reduction in peak zk data size in opendev: | 19:42 |
corvus | https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-30d&to=now&viewPanel=38 | 19:42 |
clarkb | starlingx also ran into pip 24.1 problems (there is a bunch of discussion of that in #opendev from today) related to pip failing to handle metadata on some packages | 19:42 |
fungi | oh wow! | 19:42 |
fungi | that's a nice performance gain | 19:42 |
tonyb | corvus: awesome! | 19:42 |
corvus | fungi: yeah, i was not expecting it to be so big with opendev's specific characteristics, so was a pleasant surprise :) | 19:43 |
corvus | clarkb: is an underlying cause the age of the package in question? i'm wondering if there may be more time-bombs like that... | 19:43 |
fungi | well, there are a number of backward-incompatible changes in pip 24.1 | 19:44 |
clarkb | corvus: sort of. The package is older and newer versions do fix it | 19:45 |
clarkb | corvus: but I think you could hit the same issue with modern packages too | 19:46 |
corvus | ack | 19:46 |
fungi | it dropped support for python 3.7, refuses to install anything with a non-pep440-compliant version string, and also vendors newer libs for things like metadata processing which may have gotten more strict than they used to be | 19:46 |
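To illustrate the version-string strictness fungi mentions, here is a deliberately simplified check; the real PEP 440 grammar (which pip's vendored packaging lib implements) is considerably richer, allowing epochs, local versions, and alternate spellings:

```python
import re

# Simplified subset of the PEP 440 version grammar -- just enough to
# show the kind of version strings pip >= 24.1 now rejects outright
SIMPLE_PEP440 = re.compile(
    r"^\d+(\.\d+)*"     # release segment, e.g. 24.1
    r"((a|b|rc)\d+)?"   # optional pre-release, e.g. 1.0rc1
    r"(\.post\d+)?"     # optional post-release, e.g. 1.0.post1
    r"(\.dev\d+)?$"     # optional dev release, e.g. 1.0.dev2
)

def looks_pep440(version: str) -> bool:
    return SIMPLE_PEP440.match(version) is not None

print(looks_pep440("24.1"))        # True
print(looks_pep440("1.0.post1"))   # True
print(looks_pep440("1.0-custom"))  # False: arbitrary suffix, non-compliant
```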
clarkb | I may end up being away from irc/matrix at some point today and/or tomorrow. I really need to finally get around to RMAing this laptop and before I do I want to retest with ubuntu noble and wayland vs x11 etc to ensure this isn't just a software problem. I think I'm going to start diving into that today if I can and then sit on the phone tomorrow with lenovo if still necessary | 19:49 |
clarkb | I got a talk accepted to the openinfra summit event in korea and need a working laptop before that event | 19:49 |
clarkb | the annoying thing is it almost mostly works if I disable modesetting in the kernel but if I do that everything has to run at 1920x1080 (including external displays) and I can't control display brightness | 19:50 |
tonyb | clarkb: congrats and noted | 19:50 |
clarkb | Last call otherwise I think we can have 10 minutes back for $meal/sleep/etc | 19:51 |
tonyb | .zZ | 19:51 |
clarkb | thanks everyone! | 19:52 |
clarkb | #endmeeting | 19:52 |
opendevmeet | Meeting ended Tue Jun 25 19:52:11 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:52 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-25-19.00.html | 19:52 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-25-19.00.txt | 19:52 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-06-25-19.00.log.html | 19:52 |
* corvus falls asleep into bowl of rice | 19:52 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!