Thursday, 2025-04-24

<opendevreview> Tony Breeds proposed openstack/project-config master: [Discussion] Import rdoinfo from the rdoproject  https://review.opendev.org/c/openstack/project-config/+/948033  00:04
<opendevreview> Tony Breeds proposed openstack/project-config master: [Discussion] Import rdoinfo from the rdoproject  https://review.opendev.org/c/openstack/project-config/+/948033  01:08
<mnasiadka> corvus: good, so now that I understand how it works - I’ll work on the rest next week :)  05:08
*** ykarel_ is now known as ykarel  05:17
<frickler> infra-root: I noticed that the git gc cron is still running on both review02 and review03, but I assume it doesn't matter since the repos on 02 shouldn't get used anywhere anymore?  05:54
<frickler> but there is also a small regression in the content of those cron mails: on review03 there are lines like "text1\Mtext2", while on review02 those would appear properly formatted as two lines  05:56
<frickler> eh, s/\M/^M/  05:56
<opendevreview> Lukas Kranz proposed zuul/zuul-jobs master: mirror-workspace-git-repos: Allow deleting current branch  https://review.opendev.org/c/zuul/zuul-jobs/+/946033  07:05
*** rlandy__ is now known as rlandy  11:28
<noonedeadpunk> Hey! Any idea why https://www.openstack.org/project-mascots is timing out? Or am I using the wrong page to look for project mascots?  12:23
<noonedeadpunk> It also seems to have dropped out of search engines as well  12:23
<fungi> frickler: sounds like stray carriage returns in the message body  12:59
<fungi> noonedeadpunk: i think there have been problems in the past with the vexxhost ceph object store that page is served from, i've seen it with the mascot logos and also member org logos on the openinfra site. i'll give the sysadmins and vexxhost folks a heads up  13:00
<fungi> oh, though the last time this happened (february) it was unrelated to vexxhost's ceph object storage, so i initially just gave the webdev contractors for the foundation a heads up  13:07
<fungi> noonedeadpunk: it's been loading intermittently for me, if you reload a few times you might get lucky, but also they've confirmed and are looking into the problem  13:30
<noonedeadpunk> It could also be that I'm in a more remote location than you...  13:30
<noonedeadpunk> thus cloudflare might also be taking longer to get the data  13:31
<noonedeadpunk> At least my success rate was around 0 for the last 5 or 6 attempts  13:31
<noonedeadpunk> thanks for escalating!  13:32
<fungi> noonedeadpunk: seems like they made some adjustments that should make it load more reliably through the cdn now if you want to try again  14:05
<fungi> they found some backend code that was taking too long to return and they're going to make it async to speed up loading  14:06
<noonedeadpunk> Yeah, it's waaay better now  14:09
<noonedeadpunk> I think it was loading all media before returning the page  14:10
<noonedeadpunk> or something like that  14:10
<fungi> it still is for now, they simply increased some timeouts as a workaround while they refactor the code  14:12
<opendevreview> Ildiko Vancsa proposed opendev/system-config master: Remove logging from Kata IRC channels  https://review.opendev.org/c/opendev/system-config/+/948081  14:35
<clarkb> frickler: correct, the git gc runs against the local git repos so it's fine for that to keep running on review02 until we shut it down  14:46
<clarkb> frickler: I don't see what you mean about the ^M, but maybe my mail client is rendering things even in the raw message view?  14:47
<clarkb> once I'm caught up on morning stuff I intend to approve https://review.opendev.org/c/opendev/system-config/+/947758 to remove review02 from our inventory  14:54
<clarkb> please say something if you think it is too early to do so (it will apply about 3 days after the server move and doesn't mean we'll delete the server yet, just trying to get through things one step at a time)  14:54
<fungi> sounds great to me  14:56
<clarkb> at that point https://review.opendev.org/c/opendev/system-config/+/947759 should be safe too (makes the docker-compose.yaml docker compose specific) so I'll approve that one too  14:57
<frickler> clarkb: maybe, I'm using mutt  15:34
<fungi> i'm also using mutt, i'll check my cronspam folder  15:39
<clarkb> doesn't look like we get interleaved output either (we shouldn't, it's a find command that should run things serially)  15:48
<clarkb> but maybe it has to do with changes to git or find?  15:48
<fungi> some of the lines in the "Subject: Cron <gerrit2@review03> find /home/gerrit2/review_site/git/ ..." message are separated by a bare cr (^M) instead of an lf, specifically those that say "Expanding reachable commits in commit graph: ..."  15:57
<fungi> my guess is that something changed wrt the output of git tools on noble  15:58
<clarkb> those lines get rendered with line feeds instead of ^M for me, even in the raw viewer  15:59
<clarkb> but ya I'm not sure that is a regression  15:59
<frickler> oh, maybe the command really only sends ^M to have the output stay on the same line?  15:59
<fungi> most of the other lines are not strung together, e.g. the ones that list the repositories being processed  15:59
<frickler> I guess I can try running git-gc on some of those repos locally  15:59
<fungi> yeah, you'd need to get them into a not-yet-collected state though  16:00
<clarkb> I have approved https://review.opendev.org/c/opendev/system-config/+/947758  16:06
<opendevreview> Clark Boylan proposed opendev/system-config master: DNM intentional Gitea failure to hold a node  https://review.opendev.org/c/opendev/system-config/+/848181  16:10
<frickler> hmm, whatever I do, git-gc gives me much more output than what I see in those mails. also it stops generating any output as soon as I try to pipe it somewhere for detailed inspection. anyway, it doesn't seem critical, so I'll just leave it at that; I just stumbled over the difference when I saw the two mails side by side this morning  16:15
<fungi> frickler: you need stdin to not be a tty i think, or maybe stdout. try piping through cat and redirecting in from /dev/null?  16:17
<clarkb> an "at now" job might also somewhat replicate the cron env?  16:17
<fungi> also some of those lines might be on other fds, it could be that stdout and stderr are getting interleaved there  16:17
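A minimal sketch of reproducing this outside of cron and making any bare carriage returns visible; the find expression below is an assumption (the real invocation is truncated in the mail subject), only the base path comes from it:

    # Run git gc roughly the way the cron job would (stdin redirected, output
    # piped so nothing is a tty), then render bare carriage returns as ^M.
    find /home/gerrit2/review_site/git/ -type d -name '*.git' \
        -exec git --git-dir='{}' gc \; </dev/null 2>&1 | cat -v
    # Progress lines such as "Expanding reachable commits in commit graph: ..."
    # conventionally rewrite themselves in place with a bare CR, which would
    # explain why they end up joined by ^M in the mail body.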
<opendevreview> Merged opendev/system-config master: Remove review02 from the inventory  https://review.opendev.org/c/opendev/system-config/+/947758  16:57
<clarkb> that is ahead of the hourly jobs. I'm keeping an eye on it as jobs run  17:00
<clarkb> other than removing review02 from zuul known_hosts that should largely be a noop though  17:01
<clarkb> I guess the docs get updated  17:01
<clarkb> deployment of 947758 reports success. I'll approve the change to edit the docker-compose.yaml file now  17:39
<clarkb> infra-root https://zuul.opendev.org/t/openstack/build/572f8d17b5f249e9a415e35572a30614/logs just had a post failure with no logs. I'm trying to find the executor that ran it now to see if this is a sad swift backend  17:50
<clarkb> it ran on ze03 and failed to upload to ovh bhs1  17:52
<clarkb> it's possible this is a one-off so we shouldn't disable ovh yet. But we should monitor things and be prepared to do so  17:53
<clarkb> https://zuul.opendev.org/t/openstack/build/4ff5dc8ad91f47c29abeceac1070ba86/logs is another likely case. But I've only seen the two so far  18:05
<clarkb> that one ran on ze06 and also tried to upload to ovh bhs  18:06
<clarkb> the ovh status page says object storage is degraded but I think that is for https://public-cloud.status-ovhcloud.com/incidents/491vx956zx6b which shouldn't affect us as we don't use s3 sdks  18:09
<clarkb> anyway, if it gets persistently worse we can disable ovh bhs and possibly ovh gra  18:09
<fungi> i'll try to keep an eye out  18:20
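For reference, a hedged sketch of confirming which swift endpoint a failed upload targeted from an executor; the debug log path and the strings grepped for are assumptions, not the exact commands used here:

    # Hypothetical example: search an executor's debug log for the failed
    # build's UUID and any swift upload errors around it.
    BUILD=572f8d17b5f249e9a415e35572a30614
    sudo grep "$BUILD" /var/log/zuul/executor-debug.log | grep -iE 'upload|swift|post_failure'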
<opendevreview> Clark Boylan proposed opendev/system-config master: Update gitea web admin docs  https://review.opendev.org/c/opendev/system-config/+/948116  18:34
<clarkb> fungi: ^ this came out of testing the sync tag function on the held gitea. That gitea is here: https://104.239.175.21:3081 if you want to test too  18:38
<clarkb> looks like when you click the button it enqueues the task into a queue. If you navigate to monitoring -> queues on the left side there is a tag_sync queue, which by default has 1 worker and up to 4 workers, and the number in queue will jump up to 19XY then down to 0. This happens quickly on the test node so you have to go fast to see it there. I suspect it won't be so fast in production  18:39
<clarkb> fungi: so now the question is, do we just want to send it on the production node and trust that having things in a queue with a limited number of workers won't overwhelm it?  18:39
<clarkb> annoyingly the service log doesn't seem to report success or completion, and cross checking the code I think that is expected (it reports errors)  18:40
<clarkb> so we'd be relying on that number-in-queue value dropping to 0 to know when it is done  18:40
<fungi> i suppose we could temporarily down gitea09 in haproxy first if we're especially concerned  18:41
<clarkb> ya, maybe not a bad idea for the first one  18:42
<clarkb> my focus today is still getting those gerrit cleanups landed. So I'll recheck that change before going to lunch. But it's going slowly enough that maybe after lunch I'll just go ahead and do that with 09 (pull it out of haproxy, click the button, wait for the queue to report 0 in the queue, then add it back to haproxy)  18:43
<clarkb> then we can check the results and if they look good proceed to do the others  18:43
<clarkb> fungi: re 948116, I don't think this changed our security stance compared to the past  18:44
<clarkb> I'm just updating the docs so they are accurate for the current gitea deployment  18:44
<fungi> no, i agree, i was just pointing out that we do it for some other services, though they're more problematic than gitea  18:44
<clarkb> ah  18:44
<clarkb> ok, version specifier removal change has been rechecked. I should look for food now  19:05
<fungi> that was the one where you first spotted the ovh log upload failure?  19:08
<fungi> bon appétit!  19:08
<clarkb> I've logged into gitea09, but socat on gitea-lb02 is giving me permission denied on the socket object for the haproxy command  20:22
<clarkb> this server is still jammy, not noble, so I don't believe it's hitting the apparmor problems  20:23
<clarkb> oh wait, the path changed  20:24
<clarkb> yup, /var/haproxy/run/stats was the old path but we moved it to /var/lib/haproxy/run/stats even on jammy, just to keep in sync with noble  20:25
<clarkb> infra-root: gitea09 has been pulled out of rotation on gitea-lb02 so that I can click the button to resync git tags in repos on gitea09  20:26
<fungi> sounds good  20:26
<fungi> and yeah, i recall having to change the command i was running from my shell history when we changed the socket path  20:27
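For the record, a hedged sketch of the haproxy admin socket commands involved on gitea-lb02; only the /var/lib/haproxy/run/stats socket path is confirmed above, and the backend/server name below is hypothetical (check the "show stat" output for the real one):

    # List proxy and server names from the stats socket (CSV output).
    echo "show stat" | sudo socat stdio UNIX-CONNECT:/var/lib/haproxy/run/stats | cut -d, -f1,2
    # Pull a backend server out of rotation, then re-enable it afterwards.
    echo "disable server balance_git_https/gitea09.opendev.org" | sudo socat stdio UNIX-CONNECT:/var/lib/haproxy/run/stats
    echo "enable server balance_git_https/gitea09.opendev.org" | sudo socat stdio UNIX-CONNECT:/var/lib/haproxy/run/stats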
<clarkb> I think it is already about halfway done  20:29
<clarkb> it's done. So about 5-ish minutes total  20:31
<clarkb> https://gitea09.opendev.org:3081/openstack/openstack-ansible-haproxy_server/tags?q=ussuri-eol shows up now  20:31
<clarkb> noonedeadpunk: ^ fyi  20:31
<clarkb> spot checking zuul and nova I don't see anything going wrong either  20:32
<clarkb> I guess if we don't see any problems between now and tomorrow we can click that sync tags button on the other 5 backends  20:33
<clarkb> I will put 09 back into service with haproxy now  20:33
<clarkb> it's probably fine to do these without taking backends out of haproxy too, since it was quick  20:33
<clarkb> I'm logged out of gitea09 now and everything should be back to "normal"  20:37
<clarkb> #status log Ran gitea tag synchronization on gitea09 via the web dashboard  20:41
<opendevstatus> clarkb: finished logging  20:41
<fungi> yeah, seems to have fixed it, lgtm  20:48
<clarkb> I noticed the archive queue has 100000 entries in it. This is probably related to why archives don't work anymore  20:50
<clarkb> I'm not going to debug that now. It's a known thing, and when they do work they fill the disk which is worse  20:50
<opendevreview> Merged opendev/system-config master: Drop docker-compose version specifier for Gerrit  https://review.opendev.org/c/opendev/system-config/+/947759  21:00
<clarkb> that appears to have applied successfully  21:06
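As an aside, a quick hedged way to sanity-check a compose file after an edit like dropping the obsolete top-level version key; the directory below is an assumption about where the Gerrit compose file lives:

    # Hypothetical location of the Gerrit docker-compose.yaml.
    cd /etc/gerrit-compose
    # Validate the file without printing the resolved configuration.
    sudo docker compose config --quiet && echo "compose file parses OK"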
<clarkb> I went ahead and approved the gitea docs update change  21:08
<opendevreview> Merged opendev/system-config master: Update gitea web admin docs  https://review.opendev.org/c/opendev/system-config/+/948116  21:11
<clarkb> fungi: I think we can consider the python3.12 update for gerrit base images tomorrow too  21:14
<fungi> yeah, that would be great, i plan to be around and can do the restart for it too  21:15
<clarkb> though we should check the git log on gerrit stable-3.10 before we do that, just because we build from source  21:15
<clarkb> https://gerrit.googlesource.com/gerrit/+log/refs/heads/stable-3.10 it has never been an issue but I do try to check quickly for anything problematic these days  21:16
<fungi> sure  21:16
<clarkb> looks like they are in the middle of debugging some change id lookup latency. The changes primarily seem to be around when you import changes/projects from other servers with different serverids  21:19
<clarkb> we have never done that so I suspect most of those changes are noops for us and we're good to go  21:19
<clarkb> but feel free to look it over and call out any concerns. I'm happy to dig in further too  21:20
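A minimal sketch of doing that same check from a shell instead of gitweb; the --since date is an assumption standing in for whenever the images were last built:

    # Fetch only the stable-3.10 branch and skim its recent history before
    # rebuilding the image from source.
    git clone --branch stable-3.10 --single-branch \
        https://gerrit.googlesource.com/gerrit gerrit-stable-3.10
    git -C gerrit-stable-3.10 log --oneline --since='2025-04-01'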
<opendevreview> Merged opendev/system-config master: Remove logging from Kata IRC channels  https://review.opendev.org/c/opendev/system-config/+/948081  21:28
<clarkb> fungi: I don't think ^ restarted the container to pick up the new config  21:57
<fungi> oh, huh  22:02
<fungi> i wonder if we removed the automatic restart from the playbook and decided to just do manual restarts for safety? i'll take a look after dinner  22:03
<fungi> looks like https://opendev.org/opendev/system-config/src/commit/a4a885b/playbooks/roles/limnoria/tasks/main.yaml only effectively restarts the container on an image update?  22:59
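A hedged sketch of the manual restart that would be needed on eavesdrop01 in that case; the container name below is hypothetical, check docker ps for the real one:

    # Find the limnoria/meetbot container, then restart it so it rereads the
    # channel configuration.
    sudo docker ps --format '{{.Names}}'
    sudo docker restart limnoria-docker-limnoria-1  # name is a guess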
<fungi> #status log restarted meetbot container on eavesdrop01 to pick up configuration change from https://review.opendev.org/948081  23:02
<opendevstatus> fungi: finished logging  23:03

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!