clarkb | just about meeting time | 18:59 |
---|---|---|
clarkb | I'm not sure how many people we'll get today with the PTG happening | 18:59 |
clarkb | I did want to make sure we had time to quickly go over things and catch up since we skipped last week's meeting | 19:00 |
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Oct 22 19:00:15 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/QGD26LEKHTM3AI6HTETDZWG6NQVM7ALV/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | #link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 1 | 19:00 |
clarkb | Sounds like the zuul presentation at the recent open infra days in indiana was well received. I've been told I should encourage all ya'll with good ideas to propose presentations for the SCaLE event | 19:01 |
clarkb | also as I just mentioned the PTG is happening this week | 19:01 |
* frickler is watching with one eye or so | 19:02 | |
clarkb | please be careful making changes particularly to meetpad or etherpad and ptgbot | 19:02 |
fungi | yeah, i didn't get a chance to attend any zuul talks at oid-na but was glad to see there were some | 19:02 |
corvus | o/ | 19:02 |
clarkb | along those lines I put meetpad02 and jvb02 in the emergency file because jitsi meet just cut new releases and docker images. Having those servers in the emergency file should ensure that we don't update when our daily infra-prod jobs run in about 6 hours | 19:02 |
clarkb | once the PTG is over we can remove those servers and let them upgrade normally. This just avoids any problems as meetpad has been working pretty well so far | 19:03 |
clarkb | #topic Zuul-launcher image builds | 19:03 |
clarkb | Diving right into the agenda I feel like I need to catch up on the state of things here. Anything new to report corvus ? | 19:04 |
corvus | at this point i think we can say the image build/upload is successful | 19:04 |
corvus | i think there is an opportunity to improve the throughput of the download/upload cycle on the launcher, but we're missing some log detail to confirm that | 19:05 |
fungi | yay! | 19:05 |
corvus | i even went as far as to try launching a node from the uploaded image | 19:05 |
fungi | i mean yay-success, of course, not yay-missing-logs ;) | 19:05 |
corvus | that almost worked, but the launcher didn't provide enough info to the executor to actually use the node | 19:05 |
corvus | technically we did run a job on a zuul-launcher -launched node, it just failed. :) | 19:06 |
corvus | we just (right before this meeting) merged changes to address both of those things | 19:06 |
clarkb | ok was going to ask if the problem was in our configs or in zuul itself | 19:06 |
corvus | so i will retry the upload to get better logs, and retry the job to see what is to be seen there | 19:06 |
corvus | 2 things aside from the above: | 19:07 |
corvus | 1) i have a suspicion that the x-delete-after is not working. maybe that's not honored by the swift cli when it's doing segmented uploads, or maybe the cloud doesn't support that. i still need to confirm that with the most recent uploads, and then triage which of those things it is. | 19:08 |
corvus | 2) image build jobs for other images is still waiting for tonyb or someone to start on that (no rush, but it's ready for work to start whenever) | 19:08 |
corvus | oh bonus #3: | 19:08 |
corvus | 3) i don't think we're doing periodic builds yet; but we can; so i or someone should hook up the jobs to that pipeline (that's a simple zuul.yaml change) | 19:09 |
clarkb | re 1) probably a good idea to debug before we add a lot of image builds (just to keep the total amount of data as small as possible) | 19:09 |
corvus | yep -- though to be clear, we can work on the jobs for the other platforms and not upload the images yet | 19:10 |
corvus | (so #1 is not a blocker for #2) | 19:10 |
clarkb | got it, upload is a distinct step and we can start with simply doing builds | 19:10 |
corvus | ++ | 19:10 |
corvus | i think that's about it for updates (i could yak longer, but that's the critical path) | 19:11 |
clarkb | thank you for the update | 19:11 |
* tonyb promises to write at least one job this week | 19:11 | |
clarkb | #topic OpenStack OpenAPI spec publishing | 19:11 |
clarkb | #link https://review.opendev.org/921934 | 19:11 |
clarkb | I kept this on the agenda to make sure we don't lose track of it and I was hoping to maybe catch up with the PTG but I'm not sure timing will work out for that | 19:12 |
clarkb | the sdk team met yesterday during TC time | 19:12 |
clarkb | so maybe we just need to follow up after the PTG and see what is next | 19:12 |
clarkb | (there aren't any new comments in response to frickler or myself on the change) | 19:13 |
clarkb | any other thoughts on this? | 19:13 |
fungi | i have none | 19:13 |
clarkb | #topic Backup Server Pruning | 19:13 |
clarkb | we discussed options for reducing disk consumption on the smaller of the two backup servers 2 weeks ago then I went on a last minute international trip and haven't had a chance to do that | 19:14 |
clarkb | good news is tomorrow is a quiet day in my PTG schedule so I'm hoping I can sit down and carefully trim out the backup targets for old/ancient/gone servers | 19:14 |
clarkb | ask01, ethercalc02, etherpad01, gitea01, lists, review-dev01, and review01 | 19:14 |
clarkb | that is the list I'll be looking at probably ethercalc to start since it seems the least impactful | 19:15 |
fungi | i think we already had consensus to remove those, but just to reiterate that list sounds good to me | 19:15 |
fungi | i'd volunteer to help but my dance card is full until at least mid-next week | 19:15 |
clarkb | ya between now and tomorrow is a good time to chime in if you think that we should replace the backing volume instead and keep those around or $otheridea | 19:16 |
tonyb | ++ | 19:16 |
clarkb | but my intention is to simply clear those out and ensure we're recovering expected disk space to start | 19:16 |
clarkb | we should have server snapshots and the other backup server too so risk seems low | 19:16 |
clarkb | #topic Updating Gerrit Cache Sizes | 19:17 |
clarkb | last Friday we upgraded Gerrit to pick up some bugfixes | 19:17 |
clarkb | when gerrit started up it complained about a number of caches being over sized and needing pruning | 19:17 |
clarkb | it turns out that gerrit prunes them automatically at 0100 but also on startup | 19:17 |
clarkb | https://paste.opendev.org/show/bk4pTIuQLCsWaF3dVVF7/ is the relevant logged output which shows several related caches were much larger than their configured sizes (defaults all) | 19:18 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/932763 increase sizes for four gerrit caches | 19:18 |
clarkb | I pushed this change to update the cache sizes based on the data in those logs and the documentation to what I hope is a larger more reasonable and performant set of sizes | 19:18 |
clarkb | updating this config will require another gerrit restart so this isn't a rush. May be good to try and get done after the PTG though as dev work should ramp up and give us an idea of whether or not this is helpful | 19:19 |
clarkb | probably the main concern is that we're increasing the size of some memory caches too but they seem clearly too small and this is likely impacting performance | 19:20 |
fungi | out of curiosity, i wonder if anyone has observed worse performance with the aggressively small cache target sizes | 19:20 |
fungi | but also no clue how recently this started complaining | 19:20 |
clarkb | fungi: I suspect that this is why we don't get diffs for a few minutes on gerrit startup. Gerrit is marking all of the cached data for those diffs as stale and it takes a while to repopulate | 19:20 |
fungi | does it persist caches over restarts? prune during startup? | 19:21 |
clarkb | fungi: the disk caches are persisted over restarts but it prunes them to the configured size on startup | 19:21 |
fungi | and also once a day | 19:21 |
clarkb | "Cache jdbc:h2:file:///var/gerrit/cache/gerrit_file_diff size (2.51g) is greater than maxSize (128.00m), pruning" basically all this content is marked invalid at startup | 19:21 |
clarkb | by increasing that cache size to 3g as proposed I suspect/hope that the next restart won't prune and we'll get diffs on startup | 19:22 |
fungi | so maybe if people have been observing sluggishness after 01z daily that could be an explanation | 19:22 |
clarkb | or if it prunes it will do so minimally | 19:22 |
fungi | that sounds like a great test | 19:22 |
clarkb | anyway comments welcome and definitely open to suggestions on size if we have different interpretations of the docs or concerns about memory consumption | 19:23 |
clarkb | and if we can reach general consensus a restart early next week would be great | 19:23 |
frickler | I already +2d, early next week sgtm | 19:24 |
clarkb | #topic Upgrading old servers | 19:24 |
clarkb | tonyb: not sure if you are still around. Any updates on the wiki changes? | 19:25 |
clarkb | I don't see new patchsets. Any other updates? | 19:25 |
fungi | i ended up adding some ua filters to the existing set in order to hopefully get a handle on ai training scrapers overrunning it | 19:26 |
fungi | on the production server that is | 19:26 |
clarkb | oh ya tonyb mentioned those would need syncing as part of the redeployment | 19:26 |
fungi | tonyb mentioned adding those bots to the robots.txt in an update to his changes, since most of those bots should be well-behaved but the old server doesn't present a robots.txt at all | 19:27 |
fungi | i think the load average was up around 50 when i was looking into the problem | 19:27 |
clarkb | I'm guessing tonyb managed to go on that run so we don't need to wait around | 19:27 |
clarkb | fungi: I'm guessing that your edits improved things based on my ability to edit the agenda yesterday :) | 19:27 |
fungi | well, i also fully rebooted the server | 19:27 |
fungi | load average is still pretty high, around 10 at the moment, but the reboot did seem to fix the inability to authenticate via openid | 19:28 |
fungi | anyway, the sooner we're able to move forward with the container replacement, the easier this all gets | 19:29 |
clarkb | and until the AI training wars subside we're likely to need to make continuous updates | 19:29 |
clarkb | #topic Docker compose plugin with podman service for servers | 19:31 |
clarkb | I don't think anyone has pushed up a change to start testing this with say paste/lodgeit but that is the current proposed plan | 19:31 |
clarkb | if I'm wrong about that please correct me and point out what needs reviewing or if there are any other questions | 19:31 |
corvus | i share that understanding | 19:31 |
fungi | i don't recall seeing a change yet | 19:31 |
clarkb | #topic Open Discussion | 19:32 |
clarkb | Anything else? | 19:32 |
fungi | i've got nothing | 19:33 |
clarkb | it may be worth mentioning that I'll be out around veterans day weekend. I can't remember if I'm back Tuesday or Wednesday though | 19:34 |
clarkb | also thanksgiving is about a month away for those of us in the US | 19:34 |
frickler | EU switches back from DST next sunday | 19:34 |
clarkb | looks like I'll be back tuesday so no missed meeting for me and I expect to be around tuesday before thanksgiving | 19:35 |
fungi | i think it's a couple of weeks out that the usa does the same | 19:35 |
clarkb | yes we're a week later than the EU | 19:35 |
fungi | november 3, yep | 19:35 |
clarkb | keep those date changes in mind and as far as I can tell we should have meetings for the next month and a half or so | 19:36 |
clarkb | s/date/timezone/ | 19:36 |
clarkb | I'll give it a few more minutes but we can end early if there is nothing else | 19:36 |
clarkb | thank you for your time everyone! have a productive PTG and we'll see you back here next week | 19:38 |
clarkb | #endmeeting | 19:39 |
opendevmeet | Meeting ended Tue Oct 22 19:39:02 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:39 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-10-22-19.00.html | 19:39 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-10-22-19.00.txt | 19:39 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-10-22-19.00.log.html | 19:39 |
fungi | thanks clarkb! | 19:39 |
frickler | o/ | 19:39 |
clarkb | now we can go find $meal a little early | 19:39 |
fungi | or $drink depending on where we are | 19:42 |
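
Editorial aside on the Gerrit cache topic: the prune message quoted at 19:21 can be checked mechanically against a proposed limit. The following is a minimal sketch, not anything from Gerrit or system-config. The helper names (`to_bytes`, `parse_prune_line`) are hypothetical, the regular expression is modelled only on the single log line quoted in the meeting, and the `k`/`m`/`g` suffixes are assumed to be binary units.

```python
import re

# Assumption: the size suffixes in the log line are binary units (KiB/MiB/GiB).
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}

# Pattern modelled on the one log line quoted in the meeting; other Gerrit
# versions may word this message differently.
PRUNE_RE = re.compile(
    r"Cache jdbc:h2:file://(?P<path>\S+) size \((?P<size>[\d.]+[kmg])\) "
    r"is greater than maxSize \((?P<max>[\d.]+[kmg])\), pruning"
)


def to_bytes(text):
    """Hypothetical helper: convert a string such as '2.51g' to a byte count."""
    number, unit = float(text[:-1]), text[-1].lower()
    return int(number * UNITS[unit])


def parse_prune_line(line):
    """Return cache name, current size, limit, and size/limit ratio, or None."""
    match = PRUNE_RE.search(line)
    if match is None:
        return None
    size = to_bytes(match.group("size"))
    limit = to_bytes(match.group("max"))
    return {
        "cache": match.group("path").rsplit("/", 1)[-1],
        "size": size,
        "limit": limit,
        "ratio": size / limit,
    }


line = (
    "Cache jdbc:h2:file:///var/gerrit/cache/gerrit_file_diff size (2.51g) "
    "is greater than maxSize (128.00m), pruning"
)
info = parse_prune_line(line)
# True when the 3g limit proposed in the meeting would hold the current cache.
fits_proposed = to_bytes("3g") >= info["size"]
```

For the quoted line this reports the `gerrit_file_diff` cache at roughly 20 times its configured limit, and the 3g value proposed in the meeting is large enough to hold the current contents.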