Thursday, 2025-04-24

<opendevreview> Tony Breeds proposed openstack/project-config master: [Discussion] Import rdoinfo from the rdoproject  https://review.opendev.org/c/openstack/project-config/+/948033  00:04
<opendevreview> Tony Breeds proposed openstack/project-config master: [Discussion] Import rdoinfo from the rdoproject  https://review.opendev.org/c/openstack/project-config/+/948033  01:08
<mnasiadka> corvus: good, so now that I understand how it works - I’ll work on the rest next week :)  05:08
*** ykarel_ is now known as ykarel  05:17
<frickler> infra-root: I noticed that the git gc cron is still running on both review02 and review03, but I assume it doesn't matter since the repos on 02 shouldn't get used anywhere anymore?  05:54
<frickler> but there is also a small regression in the content of those cron mails: on review03 there are lines like "text1\Mtext2", while on review02 those would appear properly formatted as two lines  05:56
<frickler> eh, s/\M/^M/  05:56
<opendevreview> Lukas Kranz proposed zuul/zuul-jobs master: mirror-workspace-git-repos: Allow deleting current branch  https://review.opendev.org/c/zuul/zuul-jobs/+/946033  07:05
*** rlandy__ is now known as rlandy  11:28
<noonedeadpunk> Hey! Any idea why https://www.openstack.org/project-mascots is timing out? Or am I using the wrong page to look for project mascots?  12:23
<noonedeadpunk> It also seems to have dropped out of search engines as well  12:23
<fungi> frickler: sounds like stray carriage returns in the message body  12:59
<fungi> noonedeadpunk: i think there have been problems in the past with the vexxhost ceph object store that page is served from, i've seen it with the mascot logos and also member org logos on the openinfra site. i'll give the sysadmins and vexxhost folks a heads up  13:00
<fungi> oh, though the last time this happened (february) it was unrelated to vexxhost's ceph object storage, so i initially just gave the webdev contractors for the foundation a heads up  13:07
<fungi> noonedeadpunk: it's been loading intermittently for me, if you reload a few times you might get lucky, but also they've confirmed and are looking into the problem  13:30
<noonedeadpunk> It could also be that I'm in a more remote location than you...  13:30
<noonedeadpunk> thus cloudflare might also be taking longer to get the data  13:31
<noonedeadpunk> At least my success rate was around 0 for the last 5 or 6 attempts  13:31
<noonedeadpunk> thanks for escalating!  13:32
<fungi> noonedeadpunk: seems like they made some adjustments that should make it load more reliably through the cdn now if you want to try again  14:05
<fungi> they found some backend code that was taking too long to return and they're going to make it async to speed up loading  14:06
<noonedeadpunk> Yeah, it's waaay better now  14:09
<noonedeadpunk> I think it was loading all media before returning the page  14:10
<noonedeadpunk> or something like that  14:10
<fungi> it still is for now, they simply increased some timeouts as a workaround while they refactor the code  14:12
<opendevreview> Ildiko Vancsa proposed opendev/system-config master: Remove logging from Kata IRC channels  https://review.opendev.org/c/opendev/system-config/+/948081  14:35
<clarkb> frickler: correct, the git gc runs against the local git repos so it's fine for that to keep running on review02 until we shut it down  14:46
<clarkb> frickler: I don't see what you mean about the ^M, but maybe my mail client is rendering things even in the raw message view?  14:47
<clarkb> once I'm caught up on morning stuff I intend to approve https://review.opendev.org/c/opendev/system-config/+/947758 to remove review02 from our inventory  14:54
<clarkb> please say something if you think it is too early to do so (it will apply about 3 days after the server move and doesn't mean we'll delete the server yet, just trying to get through things one step at a time)  14:54
<fungi> sounds great to me  14:56
<clarkb> at that point https://review.opendev.org/c/opendev/system-config/+/947759 should be safe too (makes the docker-compose.yaml docker compose specific) so I'll approve that one too  14:57
<frickler> clarkb: maybe, I'm using mutt  15:34
<fungi> i'm also using mutt, i'll check my cronspam folder  15:39
<clarkb> doesn't look like we get interleaved output either (we shouldn't, it's a find command that should run things serially)  15:48
<clarkb> but maybe it has to do with changes to git or find?  15:48
<fungi> some of the lines in the "Subject: Cron <gerrit2@review03> find /home/gerrit2/review_site/git/ ..." message are separated by a bare cr (^M) instead of an lf, specifically those that say "Expanding reachable commits in commit graph: ..."  15:57
<fungi> my guess is that something changed wrt the output of git tools on noble  15:58
<clarkb> those lines get rendered with line feeds instead of ^M for me, even in the raw viewer  15:59
<clarkb> but ya I'm not sure that is a regression  15:59
<frickler> oh, maybe the command really only sends ^M to have the output stay on the same line?  15:59
<fungi> most of the other lines are not strung together, e.g. the ones that list the repositories being processed  15:59
<frickler> I guess I can try running git-gc on some of those repos locally  15:59
<fungi> yeah, you'd need to get them into a not-yet-collected state though  16:00
<clarkb> I have approved https://review.opendev.org/c/opendev/system-config/+/947758  16:06
<opendevreview> Clark Boylan proposed opendev/system-config master: DNM intentional Gitea failure to hold a node  https://review.opendev.org/c/opendev/system-config/+/848181  16:10
<frickler> hmm, whatever I do, git-gc gives me much more output than what I see in those mails. also it stops generating any output as soon as I try to pipe it somewhere for detailed inspection. anyway, it doesn't seem critical, so I'll just leave it at that; I just stumbled over the difference when I saw the two mails side by side this morning  16:15
<fungi> frickler: you need stdin to not be a tty i think, or maybe stdout. try piping through cat and redirecting in from /dev/null?  16:17
<clarkb> an "at now" job might also somewhat replicate the cron env?  16:17
<fungi> also some of those lines might be on other fds, it could be that stdout and stderr are getting interleaved there  16:17
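A minimal sketch of reproducing this outside of cron and making any bare carriage returns visible; the find expression below is an assumption (the real invocation is truncated in the mail subject), only the base path comes from it:

    # Run git gc roughly the way the cron job would (stdin redirected, output
    # piped so nothing is a tty), then render bare carriage returns as ^M.
    find /home/gerrit2/review_site/git/ -type d -name '*.git' \
        -exec git --git-dir='{}' gc \; </dev/null 2>&1 | cat -v
    # Progress lines such as "Expanding reachable commits in commit graph: ..."
    # conventionally rewrite themselves in place with a bare CR, which would
    # explain why they end up joined by ^M in the mail body.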
<opendevreview> Merged opendev/system-config master: Remove review02 from the inventory  https://review.opendev.org/c/opendev/system-config/+/947758  16:57
<clarkb> that is ahead of the hourly jobs. I'm keeping an eye on it as jobs run  17:00
<clarkb> other than removing review02 from zuul known_hosts that should largely be a noop though  17:01
<clarkb> I guess the docs get updated  17:01
<clarkb> deployment of 947758 reports success. I'll approve the change to edit the docker-compose.yaml file now  17:39
<clarkb> infra-root https://zuul.opendev.org/t/openstack/build/572f8d17b5f249e9a415e35572a30614/logs just had a post failure with no logs. I'm trying to find the executor that ran it now to see if this is a sad swift backend  17:50
<clarkb> it ran on ze03 and failed to upload to ovh bhs1  17:52
<clarkb> it's possible this is a one-off so we shouldn't disable ovh yet. But we should monitor things and be prepared to do so  17:53
<clarkb> https://zuul.opendev.org/t/openstack/build/4ff5dc8ad91f47c29abeceac1070ba86/logs is another likely case. But I've only seen the two so far  18:05
<clarkb> that one ran on ze06 and also tried to upload to ovh bhs  18:06
<clarkb> the ovh status page says object storage is degraded but I think that is for https://public-cloud.status-ovhcloud.com/incidents/491vx956zx6b which shouldn't affect us as we don't use s3 sdks  18:09
<clarkb> anyway, if it gets persistently worse we can disable ovh bhs and possibly ovh gra  18:09
<fungi> i'll try to keep an eye out  18:20
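For reference, a hedged sketch of confirming which swift endpoint a failed upload targeted from an executor; the debug log path and the strings grepped for are assumptions, not the exact commands used here:

    # Hypothetical example: search an executor's debug log for the failed
    # build's UUID and any swift upload errors around it.
    BUILD=572f8d17b5f249e9a415e35572a30614
    sudo grep "$BUILD" /var/log/zuul/executor-debug.log | grep -iE 'upload|swift|post_failure'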
<opendevreview> Clark Boylan proposed opendev/system-config master: Update gitea web admin docs  https://review.opendev.org/c/opendev/system-config/+/948116  18:34
<clarkb> fungi: ^ this came out of testing the sync tag function on the held gitea. That gitea is here: https://104.239.175.21:3081 if you want to test too  18:38
<clarkb> looks like when you click the button it enqueues the task into a queue. If you navigate to monitoring -> queues on the left side there is a tag_sync queue, which by default has 1 worker and up to 4 workers, and the number in queue will jump up to 19XY then down to 0. This happens quickly on the test node so you have to go fast to see it there. I suspect it won't be so fast in production  18:39
<clarkb> fungi: so now the question is, do we just want to send it on the production node and trust that having things in a queue with a limited number of workers won't overwhelm it?  18:39
<clarkb> annoyingly the service log doesn't seem to report success or completion, and cross checking the code I think that is expected (it reports errors)  18:40
<clarkb> so we'd be relying on that number-in-queue value dropping to 0 to know when it is done  18:40
<fungi> i suppose we could temporarily down gitea09 in haproxy first if we're especially concerned  18:41
<clarkb> ya, maybe not a bad idea for the first one  18:42
<clarkb> my focus today is still getting those gerrit cleanups landed. So I'll recheck that change before going to lunch. But it's going slowly enough that maybe after lunch I'll just go ahead and do that with 09 (pull it out of haproxy, click the button, wait for the queue to report 0 in the queue, then add it back to haproxy)  18:43
<clarkb> then we can check the results and if they look good proceed to do the others  18:43
<clarkb> fungi: re 948116, I don't think this changed our security stance compared to the past  18:44
<clarkb> I'm just updating the docs so they are accurate for the current gitea deployment  18:44
<fungi> no, i agree, i was just pointing out that we do it for some other services, though they're more problematic than gitea  18:44
<clarkb> ah  18:44
<clarkb> ok, version specifier removal change has been rechecked. I should look for food now  19:05
<fungi> that was the one where you first spotted the ovh log upload failure?  19:08
<fungi> bon appétit!  19:08
<clarkb> I've logged into gitea09, but socat on gitea-lb02 is giving me permission denied on the socket object for the haproxy command  20:22
<clarkb> this server is still jammy, not noble, so I don't believe it's hitting the apparmor problems  20:23
<clarkb> oh wait, the path changed  20:24
<clarkb> yup, /var/haproxy/run/stats was the old path but we moved it to /var/lib/haproxy/run/stats even on jammy, just to keep in sync with noble  20:25
<clarkb> infra-root: gitea09 has been pulled out of rotation on gitea-lb02 so that I can click the button to resync git tags in repos on gitea09  20:26
<fungi> sounds good  20:26
<fungi> and yeah, i recall having to change the command i was running from my shell history when we changed the socket path  20:27
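For the record, a hedged sketch of the haproxy admin socket commands involved on gitea-lb02; only the /var/lib/haproxy/run/stats socket path is confirmed above, and the backend/server name below is hypothetical (check the "show stat" output for the real one):

    # List proxy and server names from the stats socket (CSV output).
    echo "show stat" | sudo socat stdio UNIX-CONNECT:/var/lib/haproxy/run/stats | cut -d, -f1,2
    # Pull a backend server out of rotation, then re-enable it afterwards.
    echo "disable server balance_git_https/gitea09.opendev.org" | sudo socat stdio UNIX-CONNECT:/var/lib/haproxy/run/stats
    echo "enable server balance_git_https/gitea09.opendev.org" | sudo socat stdio UNIX-CONNECT:/var/lib/haproxy/run/stats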
<clarkb> I think it is already about halfway done  20:29
<clarkb> it's done. So about 5-ish minutes total  20:31
<clarkb> https://gitea09.opendev.org:3081/openstack/openstack-ansible-haproxy_server/tags?q=ussuri-eol shows up now  20:31
<clarkb> noonedeadpunk: ^ fyi  20:31
<clarkb> spot checking zuul and nova I don't see anything going wrong either  20:32
<clarkb> I guess if we don't see any problems between now and tomorrow we can click that sync tags button on the other 5 backends  20:33
<clarkb> I will put 09 back into service with haproxy now  20:33
<clarkb> it's probably fine to do these without taking backends out of haproxy too, since it was quick  20:33
<clarkb> I'm logged out of gitea09 now and everything should be back to "normal"  20:37
<clarkb> #status log Ran gitea tag synchronization on gitea09 via the web dashboard  20:41
<opendevstatus> clarkb: finished logging  20:41
<fungi> yeah, seems to have fixed it, lgtm  20:48
<clarkb> I noticed the archive queue has 100000 entries in it. This is probably related to why archives don't work anymore  20:50
<clarkb> I'm not going to debug that now. It's a known thing, and when they do work they fill the disk which is worse  20:50
<opendevreview> Merged opendev/system-config master: Drop docker-compose version specifier for Gerrit  https://review.opendev.org/c/opendev/system-config/+/947759  21:00
<clarkb> that appears to have applied successfully  21:06
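As an aside, a quick hedged way to sanity-check a compose file after an edit like dropping the obsolete top-level version key; the directory below is an assumption about where the Gerrit compose file lives:

    # Hypothetical location of the Gerrit docker-compose.yaml.
    cd /etc/gerrit-compose
    # Validate the file without printing the resolved configuration.
    sudo docker compose config --quiet && echo "compose file parses OK"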
<clarkb> I went ahead and approved the gitea docs update change  21:08
<opendevreview> Merged opendev/system-config master: Update gitea web admin docs  https://review.opendev.org/c/opendev/system-config/+/948116  21:11
<clarkb> fungi: I think we can consider the python3.12 update for gerrit base images tomorrow too  21:14
<fungi> yeah, that would be great, i plan to be around and can do the restart for it too  21:15
<clarkb> though we should check the git log on gerrit stable-3.10 before we do that, just because we build from source  21:15
<clarkb> https://gerrit.googlesource.com/gerrit/+log/refs/heads/stable-3.10 it has never been an issue but I do try to check quickly for anything problematic these days  21:16
<fungi> sure  21:16
<clarkb> looks like they are in the middle of debugging some change id lookup latency. The changes primarily seem to be around when you import changes/projects from other servers with different serverids  21:19
<clarkb> we have never done that so I suspect most of those changes are noops for us and we're good to go  21:19
<clarkb> but feel free to look it over and call out any concerns. I'm happy to dig in further too  21:20
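A minimal sketch of doing that same check from a shell instead of gitweb; the --since date is an assumption standing in for whenever the images were last built:

    # Fetch only the stable-3.10 branch and skim its recent history before
    # rebuilding the image from source.
    git clone --branch stable-3.10 --single-branch \
        https://gerrit.googlesource.com/gerrit gerrit-stable-3.10
    git -C gerrit-stable-3.10 log --oneline --since='2025-04-01'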
<opendevreview> Merged opendev/system-config master: Remove logging from Kata IRC channels  https://review.opendev.org/c/opendev/system-config/+/948081  21:28
<clarkb> fungi: I don't think ^ restarted the container to pick up the new config  21:57
<fungi> oh, huh  22:02
<fungi> i wonder if we removed the automatic restart from the playbook and decided to just do manual restarts for safety? i'll take a look after dinner  22:03
<fungi> looks like https://opendev.org/opendev/system-config/src/commit/a4a885b/playbooks/roles/limnoria/tasks/main.yaml only effectively restarts the container on an image update?  22:59
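A hedged sketch of the manual restart that would be needed on eavesdrop01 in that case; the container name below is hypothetical, check docker ps for the real one:

    # Find the limnoria/meetbot container, then restart it so it rereads the
    # channel configuration.
    sudo docker ps --format '{{.Names}}'
    sudo docker restart limnoria-docker-limnoria-1  # name is a guess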
<fungi> #status log restarted meetbot container on eavesdrop01 to pick up configuration change from https://review.opendev.org/948081  23:02
<opendevstatus> fungi: finished logging  23:03

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!