opendevreview | Merged openstack/project-config master: Move yaml2ical to the opendev tenant https://review.opendev.org/c/openstack/project-config/+/946280 | 00:03 |
---|---|---|
opendevreview | Merged opendev/bindep master: Fix authors/maintainers format in pyproject.toml https://review.opendev.org/c/opendev/bindep/+/946218 | 00:06 |
opendevreview | Merged opendev/engagement master: Drop maintainers field from pyproject.toml https://review.opendev.org/c/opendev/engagement/+/946314 | 00:07 |
clarkb | that is an odd distinction and probably assumes that software is built by an individual at all times | 00:29 |
tonyb | Okay quotas increased, adding 50GB to mirrors.ubuntu still left it > 90%, and flagging a warning, so I added a total of 100GB. I hope that's okay | 00:31 |
tonyb | I'll give it an hour and verify I can see the change in grafana | 00:31 |
clarkb | should be fine. Ubuntu in particular grows slowly typically | 00:31 |
clarkb | tonyb: you may have to wait longer since grafana may look at the ro volumes and not rw volumes. I can't remember how often ubuntu and centos 9 stream vos release | 00:32 |
tonyb | Ah okay | 00:32 |
clarkb | (I wouldn't manually vos release, just wait for the cron jobs to do it) | 00:32 |
tonyb | Yeah. No rush. I checked and the new quota has been applied so I'm happy to wait | 00:33 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add tony's AFS admin user to UserList https://review.opendev.org/c/opendev/system-config/+/946316 | 00:36 |
tonyb | clarkb: Update Gerrit images to 3.10.5 and 3.11.2 \| https://review.opendev.org/c/opendev/system-config/+/946050 looks good to me. Just need to plan the approval and restart | 00:39 |
clarkb | tonyb: thanks my goal is to start on that first thing tomorrow morning | 00:39 |
tonyb | clarkb: Sounds good | 00:40 |
clarkb | that way I have the afternoon to watch for anything unexpected before I disappear for a long weekend | 00:40 |
tonyb | ++ | 00:40 |
tonyb | Also FWIW, I can see the quota changes in grafana \o/ | 01:22 |
opendevreview | Merged opendev/irc-meetings master: kolla: Move meeting one hour backwards (DST) https://review.opendev.org/c/opendev/irc-meetings/+/946126 | 13:26 |
opendevreview | Merged opendev/gerritbot master: Feature: Keep alive interval of 60 by default https://review.opendev.org/c/opendev/gerritbot/+/941117 | 13:37 |
clarkb | any reason to not approve https://review.opendev.org/c/opendev/system-config/+/946050 nowish and plan to restart gerrit on 3.10.5 later today after the image updates? | 14:45 |
fungi | i can think of none | 14:45 |
fungi | approved it just now | 14:45 |
clarkb | thanks. I just wanted to make sure there wasn't anything going on as I'm just rolling in now | 14:46 |
fungi | i'm probably disappearing in about half an hour to grab lunch and maybe run a quick errand, but should be around for the rest of the day after 16:30 or so | 14:46 |
fungi | happy to help with a restart | 14:46 |
clarkb | sounds great. I'm not in a huge hurry to do that and it will take some time to gate and promote anyway | 14:49 |
*** dhill is now known as Guest12898 | 15:08 | |
fungi | okay, disappearing for a bit, shouldn't be much more than an hour | 15:21 |
opendevreview | Merged opendev/system-config master: Update Gerrit images to 3.10.5 and 3.11.2 https://review.opendev.org/c/opendev/system-config/+/946050 | 15:54 |
clarkb | images for ^ have promoted | 15:57 |
clarkb | the rest of the deployment jobs (which should noop) are still running | 15:57 |
dan_with | Hi @fungi, Could I have you all attempt to `openstack server reboot --hard <VM_ID>`? We're sorting through an issue with role/admin. | 15:57 |
clarkb | dan_with: yes I'll do that shortly | 15:57 |
dan_with | Thanks | 15:58 |
clarkb | dan_with: just issued that request | 16:00 |
dan_with | ok thanks | 16:00 |
clarkb | the server pings and https://mirror.dfw3.raxflex.opendev.org/ responds with expected content now | 16:03 |
clarkb | I guess let us know when you're happy on your side and we can reenable the region in the ci system | 16:03 |
dan_with | ok, I had to attach the volume after the boot, so you may need to log in via console and check things out. I would recommend a reboot (again), just to make sure everything comes up cleanly. It has both multipath paths now, so the desired result has been achieved. | 16:06 |
clarkb | oh that's good to know | 16:08 |
clarkb | checking now | 16:08 |
clarkb | confirmed that it didn't mount things properly. Rebooting it again which should take care of that | 16:09 |
clarkb | (I thought about mount -a but we want to see it come up on boot anyway) | 16:09 |
dan_with | yes please. I couldn't attach the volume through the admin account until it started to boot. I'll have to solve the role/policy problem, but wanted to get you unstuck and your server happy. | 16:10 |
clarkb | seems to have come up with the volume attached and mounting at boot worked | 16:11 |
clarkb | it's possible we leaked a small amount of data onto the rootfs at the two mount points but df reports minimal disk consumption for / so I decided to not overthink cleaning that up for a mirror node | 16:12 |
clarkb | dan_with: you're happy with the server now? and we can return it to service if we are happy with it? | 16:12 |
dan_with | Excellent. Thank you for your patience and willingness to work with me. I can tell you have a great team. You can return to service if you would like. I just wanted to make sure the volume mounted correctly and it didn't need an fsck. | 16:13 |
clarkb | booting was slow it may have fsckd. Let me check | 16:13 |
clarkb | dmesg -T only reports that rootfs was skipped | 16:14 |
clarkb | so I don't think it fscked | 16:14 |
clarkb | dan_with: and thank you for the help and resources | 16:14 |
clarkb | I've +2'd https://review.opendev.org/c/openstack/project-config/+/946266 and when fungi returns he can confirm that the server looks good and approve it if so | 16:15 |
dan_with | You're welcome. Sorry for the odyssey down a rabbit hole. It did help uncover a major problem that I'll be working on today. So, have a wonderful Friday and weekend. | 16:15 |
clarkb | you too! | 16:16 |
fungi | looks like we're all set for the upgrade when ready? | 16:42 |
fungi | thanks dan_with for your help on the mirror server! | 16:43 |
clarkb | fungi: I think so | 16:43 |
clarkb | did you want to drive or should I? | 16:43 |
dan_with | you bet | 16:43 |
clarkb | I think the process is note the current image, pull the new image and check it looks correct, down the service, move waiting queue aside, optionally delete the two large gerrit caches or move them aside, start the service | 16:43 |
fungi | i can drive the gerrit upgrade but would need a few minutes to get settled first | 16:44 |
clarkb | gerrit_file_diff, git_file_diff, git_modified_files are the three main caches that have given us problems | 16:45 |
clarkb | I know that gerrit has also tried to improve things but unsure if any of that has made it onto the 3.10 branch. So may be worth attempting to restart as is. Or we can just go easy mode and not bother | 16:45 |
clarkb | (we don't want to delete all caches as that will force everyone to log back in again which is annoying) | 16:45 |
clarkb | diff_intraline appears to be another large one | 16:46 |
clarkb | and each cache has 2-3 files associated with it. The h2 db, a lock file and an optional trace file. I've been moving all aside when clearing them out | 16:48 |
opendevreview | Merged openstack/project-config master: Revert "Temporarily turn down raxflex-dfw3 use" https://review.opendev.org/c/openstack/project-config/+/946266 | 16:50 |
fungi | okay, i have a root screen session open on review.o.o | 16:52 |
clarkb | cool let me jump on there | 16:53 |
fungi | opendevorg/gerrit <none> 66bb7c64dabb 10 months ago 682MB | 16:53 |
fungi | that looks like the image we're running? | 16:53 |
fungi | doesn't seem like it's been that long | 16:53 |
clarkb | is it ok if I type? | 16:53 |
fungi | sure | 16:54 |
fungi | been longer since we restarted gerrit than i realized | 16:54 |
clarkb | opendevorg/gerrit 3.10 279db0b1f27b 8 weeks ago | 16:55 |
clarkb | its that one I think | 16:55 |
fungi | okay, so 8 weeks seems more reasonable | 16:55 |
fungi | that didn't come up in the `docker image list` output for me | 16:55 |
fungi | oh, no it was me scrolling my local tmux session and not the remote screen session, okay | 16:56 |
clarkb | I think it was there but your small terminals cut it off :) | 16:56 |
fungi | yeah, i see it now | 16:56 |
fungi | okay, so 279db0b1f27b is the id we're running | 16:56 |
clarkb | yup | 16:56 |
clarkb | as that matches the opendevorg/gerrit:3.10 image from docker ps -a | 16:56 |
fungi | pulling new images | 16:57 |
fungi | opendevorg/gerrit 3.10 f5b922fbdc07 2 hours ago 681MB | 16:57 |
fungi | that's the one we're switching to | 16:57 |
clarkb | that mariadb image also probably updated? | 16:58 |
clarkb | maybe not. The 10.11 image isn't getting updated as frequently so could be either way | 16:58 |
fungi | quay.io/opendevmirror/mariadb 10.11 4254659f2379 8 weeks ago 326MB | 16:58 |
clarkb | ya looks like it hasn't. Thats fine and I think expected | 16:58 |
fungi | that looks older than the last restart, right | 16:58 |
fungi | mv /home/gerrit2/review_site/data/replication/ref-updates/waiting /home/gerrit2/tmp/replication_waiting_queues/waiting_queue_20250404 | 16:59 |
clarkb | ya that looks good to me | 16:59 |
fungi | that's how we'll move the waiting queue once the server is offline | 16:59 |
fungi | okay, i'll status notice something before we restart | 17:00 |
clarkb | do you want to move the four caches (* 2 or 3 files) that I mentioned above too or see if gerrit has improved? | 17:00 |
clarkb | we know that if things do go sideways/sad that stopping gerrit and moving caches aside at that point does seem to resolve it | 17:00 |
clarkb | so we do have an out | 17:00 |
fungi | what's your gut feel with those? are they getting cumbersomely large? | 17:00 |
clarkb | the sizes we currently have are in a range where we've seen things be fine on restart and we've seen things not be fine | 17:01 |
clarkb | I suspect that may be due to iops/throughput when it does its startup pruning | 17:01 |
clarkb | so the big caches are ok if the network and ceph are happy but not when its slower? But that is just a hunch | 17:01 |
clarkb | removing the caches does cause diffs to be available on changes pages quicker | 17:02 |
clarkb | but may slow down page loads overall | 17:02 |
clarkb | vs taking the 5-10 minute hit upfront and having faster loads overall | 17:02 |
fungi | so something like `rm /home/gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff,git_modified_files,diff_intraline}.{h2,db,lock}` | 17:05 |
clarkb | I think its h2.db not h2,db | 17:05 |
clarkb | and ya I guess we can leave the trace files there probably (though I've moved them in the past) | 17:05 |
fungi | oh, trace | 17:06 |
fungi | i misread your earlier comment about file suffixes | 17:06 |
fungi | okay, gerrit_file_diff and git_file_diff don't have a .trace.db | 17:08 |
fungi | but otherwise `rm /home/gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff,git_modified_files,diff_intraline}.{h2,lock,trace}.db` | 17:08 |
clarkb | ya that looks about right | 17:09 |
fungi | rm them or mv them? | 17:09 |
clarkb | I moved them previously then rm'd them out of the dest dir. The reason for that is I wasn't completely sure that gerrit would recreate them automatically but it appears to do so so I think rm should be safe | 17:09 |
clarkb | basically allowed me to be able to recreate the files if necessary with the right perms etc | 17:09 |
clarkb | but gerrit seems to create missing caches on startup | 17:10 |
fungi | so is there a reason to save the replication waiting queue, or could we just rm that too? | 17:10 |
clarkb | the reason to save the replication queue is to have a data corpus to debug and fix the issue as there are different scenarios that can cause those events to leak | 17:11 |
clarkb | at this point i think we have probably collected enough data for that | 17:11 |
fungi | so no real benefit to preserving those either | 17:11 |
clarkb | if you want to rm that content that should be fine too (it might be slow as it is a number of inodes) | 17:11 |
clarkb | so I guess mv then rm would allow us to do it with a potentially shorter gerrit outage | 17:12 |
clarkb | but its probably not slow enough to matter | 17:12 |
fungi | that's a good point, unlinking tons of inodes could be slow | 17:12 |
fungi | i'll stick with the mv for that | 17:12 |
clarkb | sounds good | 17:12 |
fungi | status notice The Gerrit service on review.opendev.org will be offline momentarily for a patch release update | 17:14 |
fungi | that look okay? | 17:14 |
clarkb | yup notice lgtm | 17:15 |
fungi | #status notice The Gerrit service on review.opendev.org will be offline momentarily for a patch release update | 17:15 |
opendevstatus | fungi: sending notice | 17:15 |
clarkb | and the command you have queued up is a mouthful but also looks right | 17:15 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily for a patch release update | 17:15 | |
fungi | once that returns, yeah... enter key | 17:15 |
clarkb | side note grafana says dfw3 is back in service | 17:17 |
fungi | nice | 17:17 |
opendevstatus | fungi: finished sending notice | 17:18 |
fungi | restarting... | 17:18 |
clarkb | [2025-04-04T17:18:35.604Z] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.10.5-1-g47283ba335-dirty ready | 17:18 |
fungi | Powered by Gerrit Code Review (3.10.5-1-g47283ba335-dirty) | 17:19 |
clarkb | web ui is up and I can view diffs on https://review.opendev.org/c/openstack/project-config/+/946280 | 17:19 |
fungi | yeah, came up rather quickly | 17:19 |
fungi | might be one of our fastest restarts yet | 17:19 |
clarkb | https://review.opendev.org/c/starlingx/specs/+/946244 | 17:20 |
clarkb | this got a new patchset | 17:20 |
clarkb | and I believe it was pushed post restart | 17:20 |
clarkb | so we can check replication using that new patchset. I'll work on that now | 17:20 |
clarkb | origin https://opendev.org/starlingx/specs/ and git fetch origin refs/changes/44/946244/14 produces 1172d3da10debd68ceb4c5f3189b8b9f8d1315d6 which seems to match the new patchset so I think that is good | 17:21 |
fungi | yes, looks right here too | 17:22 |
clarkb | the error log is still full of people trying to connect with ancient ci systems that can't negotiate ssh | 17:22 |
clarkb | I guess we shouldn't put random user ips in public config management even if we think they are tied to a system and not an individual? | 17:22 |
clarkb | I'd like to add them to a permanent firewall block list. I guess we can use secret vars for that maybe | 17:22 |
clarkb | it is an IBM ip address if anyone happens to know how to escalate to IBM via red hat or something | 17:23 |
fungi | for the record, this works too and may be easier: https://opendev.org/starlingx/specs/commit/1172d3da10debd68ceb4c5f3189b8b9f8d1315d6 | 17:23 |
fungi | (just reference the commit id) | 17:23 |
clarkb | oh cool. I never do that because I can never remember what the pathing is for gitea. But I can remember the special gerrit changes ref for whatever reason | 17:24 |
clarkb | brains are weird | 17:24 |
clarkb | anything else you want me to check? | 17:24 |
fungi | no, seems like this is good | 17:24 |
clarkb | I detached from the screen and will let you decide when you want to shut it down | 17:24 |
Clark[m] | Gerrit still happy? | 19:15 |
fungi | seems so | 19:16 |
fungi | hard to tell on a friday | 19:16 |
Clark[m] | firefox has native vertical tabs now. I don't think it looks as good or works as well as tree style tabs but progress | 21:22 |
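
Two commands from the restart discussion above are easy to get subtly wrong: the brace pattern `{...}.{h2,lock,trace}.db` used to name the cache files, and the `refs/changes/44/946244/14` ref fetched for the replication check. The following is a minimal Python sketch of how both strings are built, not the tooling actually used on the server. The function names `cache_files` and `change_ref` are hypothetical; the cache directory, cache names and suffixes are the ones quoted in the log.

```python
from itertools import product
from pathlib import PurePosixPath

# Values quoted in the log; nothing here touches the filesystem.
CACHE_DIR = PurePosixPath("/home/gerrit2/review_site/cache")
CACHES = ["gerrit_file_diff", "git_file_diff", "git_modified_files", "diff_intraline"]
SUFFIXES = ["h2", "lock", "trace"]


def cache_files(cache_dir=CACHE_DIR, caches=CACHES, suffixes=SUFFIXES):
    """Mimic shell brace expansion: every cache name paired with every suffix."""
    return [
        str(cache_dir / f"{name}.{suffix}.db")
        for name, suffix in product(caches, suffixes)
    ]


def change_ref(change: int, patchset: int) -> str:
    """Build a Gerrit change ref: refs/changes/<last two digits>/<change>/<patchset>."""
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


if __name__ == "__main__":
    for path in cache_files():
        print(path)
    print(change_ref(946244, 14))
```

Like the shell pattern, the expansion yields names whether or not the files exist, which is consistent with the observation in the log that `gerrit_file_diff` and `git_file_diff` had no `.trace.db` file.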