Unit193 | FWIW, the debian package of limnoria now ships a template user unit for it, not always useful depending on your setup but somewhat generic. | 01:01 |
fungi | Unit193: yeah, we actually run it in a docker container, but good to know! | 14:13 |
fungi | seems like our mailing lists aren't the only ones struggling to balance usability with deliverability these days: https://www.openwall.com/lists/oss-security/2025/04/24/9 | 14:21 |
opendevreview | Julia Kreger proposed openstack/diskimage-builder master: Fix rhel/centos multipathing issues https://review.opendev.org/c/openstack/diskimage-builder/+/948244 | 14:36 |
clarkb | I don't see any new complaints about gitea so I'll proceed with syncing tags on gitea10-14 this morning | 14:47 |
clarkb | fungi: I'm around too if we want to proceed with https://review.opendev.org/c/opendev/system-config/+/944408 and https://review.opendev.org/c/openstack/project-config/+/944409 and do a gerrit restart today | 14:48 |
clarkb | gitea10 tag sync has started | 14:52 |
clarkb | and it is done. I don't need to flood the channel with status updates as I work through the other 4. But I'll status log when I'm completely done | 14:56 |
fungi | yeah, going back over those now | 15:11 |
fungi | i've approved the first one | 15:13 |
fungi | will wait to approve the other until it's done | 15:13 |
clarkb | #status log Ran gitea repo to db tag sync process via dashboard on the remaining gitea backends (gitea10-gitea14) | 15:19 |
clarkb | this should be done now | 15:19 |
opendevstatus | clarkb: finished logging | 15:19 |
clarkb | noonedeadpunk: ^ fyi I'm going to consider this done for now. Please let us know if you notice any more tags are missing | 15:25 |
noonedeadpunk | ++ sure, will do | 15:30 |
noonedeadpunk | thanks for taking time to figure this out! | 15:30 |
clarkb | infra-root when doing that work this morning I found the admin dashboard monitoring -> stats page gives quick counts on tags that gitea knows about (as well as branches). Can be an easy way to check if the numbers start to diverge greatly again | 15:30 |
clarkb | the branch count seemed consistent across the backends but the tag count was not. I suspect because there are fewer branches and we tend to update branches multiple times giving gitea a chance to catch up if it falls behind. Whereas tags are pushed once and forgotten | 15:31 |
fungi | did the values match up after the fix? | 15:32 |
clarkb | fungi: yes | 15:32 |
clarkb | 58835 tags and 9707 branches | 15:32 |
clarkb | though I didn't go back to check gitea10 or gitea09 (I discovered this page when I got to gitea11) | 15:33 |
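A quick way to cross-check a single project outside the dashboard, sketched here with an illustrative backend hostname, API token, and on-disk repository path, is to compare the tag count Gitea reports over its API with the tags actually in the bare repo:

```shell
# Tags Gitea knows about for one repo (host, port, token, and repo are placeholders;
# the endpoint is paginated, so bump limit/page for repos with many tags).
curl -s -H "Authorization: token $GITEA_TOKEN" \
  "https://gitea10.opendev.org:3000/api/v1/repos/openstack/nova/tags?limit=50" | jq 'length'

# Tags actually present in the bare repository on disk (path is a guess).
git --git-dir=/data/git/repositories/openstack/nova.git tag | wc -l
```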
clarkb | fungi: the gerrit python3.12 image update bounced off of docker hub rate limits | 15:36 |
clarkb | Maybe give it until 1600 UTC and then recheck ? | 15:37 |
clarkb | in the meantime I'm going to find some breakfast | 15:37 |
fungi | mmm, sure can do | 15:37 |
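For context, Docker documents a way to see where an anonymous client stands against the pull rate limit, which is handy when deciding whether a recheck is worth it; run from the builder's point of view it looks like:

```shell
# Fetch an anonymous token for Docker's rate-limit preview repo, then read the
# ratelimit-limit / ratelimit-remaining headers from a manifest HEAD request.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
curl -sI -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" | grep -i ratelimit
```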
fungi | looks like it was trying to pull our python-builder image from dockerhub that failed. i guess if this keeps being a frequent problem we could revisit the choice not to publish our image dependencies to both quay and dockerhub in parallel | 15:47 |
fungi | i remember there was a semi-recent change to make the roles in zuul-jobs support that case | 15:47 |
fungi | yet another case of dockerhub rate limits getting in the way of us migrating off dockerhub | 15:48 |
clarkb | yup. But if we pull from quay then we stop getting speculative builds | 15:53 |
clarkb | so it's a damned if you do, damned if you don't situation. Things have generally been better with the "move what we can" approach at least | 15:53 |
fungi | not if they're dependencies for other images we're publishing to quay though, right? | 15:53 |
fungi | i'm assuming the problem is that the things we're running on noble are based on images we upload to quay so if we built them with dependencies hosted on quay we'd still have speculative testing | 15:54 |
fungi | but those same dependencies are also used in building images for things we run on jammy that need to be uploaded to dockerhub | 15:55 |
clarkb | it's both. There is speculative testing of the build side and the deployment side | 15:55 |
fungi | so if the dependencies were uploaded to both places in parallel then the images we're hosting on quay could use the quay copies and the ones we upload to dockerhub could use the dockerhub copies? | 15:55 |
clarkb | yes | 15:55 |
fungi | i guess we'd have to build the images twice in separate jobs to be able to speculatively test changes to the dependency images though | 15:56 |
fungi | use one python-builder image build for testing use in building the things that are uploaded to quay, and a separate one for things that are uploaded to dockerhub | 15:56 |
clarkb | oh also the build jobs don't force ipv4 | 15:56 |
clarkb | we can do that and that should help a lot | 15:57 |
fungi | so in essence we'd really need to publish a python-builder-dockerhub image to dockerhub and a python-builder-quay image to quay and then set the image build dependencies to one or the other, and yeah that's a fair amount of extra complexity and literal duplication of work | 15:57 |
clarkb | I suspect if we switch the build jobs to force things to ipv4 we'll get better results. Probably not 100% success but better than we get now | 15:59 |
clarkb | and maybe that is good enough to limp along for the moment | 15:59 |
clarkb | I'll look at porting that hack into the build jobs | 16:00 |
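The hack being ported is essentially "pin the Docker Hub endpoints to their IPv4 addresses"; a minimal sketch of that idea (the endpoint list and the use of /etc/hosts are assumptions about how the existing hack works) might be:

```shell
# Resolve only A records for the Docker Hub endpoints and pin them in /etc/hosts
# so image pulls during the build never take the IPv6 path, which has the
# lower anonymous rate limits.
for host in registry-1.docker.io auth.docker.io production.cloudflare.docker.com; do
  addr=$(getent ahostsv4 "$host" | awk 'NR==1 {print $1}')
  [ -n "$addr" ] && echo "$addr $host" | sudo tee -a /etc/hosts
done
```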
opendevreview | Clark Boylan proposed opendev/system-config master: Force IPv4 connectivity to Docker Hub during image builds https://review.opendev.org/c/opendev/system-config/+/948247 | 16:10 |
clarkb | something like that maybe. | 16:10 |
clarkb | oh crap | 16:10 |
clarkb | that runs all the image builds | 16:10 |
clarkb | fungi: do you think we should dequeue ^ from check so that the gerrit change has a higher chance of getting in nowish and then reenqueue later? | 16:11 |
clarkb | I'll go ahead and do that since it's low impact | 16:11 |
clarkb | that is done. We can recheck it when we're happy to eat into our quotas | 16:12 |
clarkb | I was worried I would have to update an image so that it would build... Had the exact opposite problem | 16:13 |
opendevreview | sean mooney proposed openstack/project-config master: create openstack/grian-ui git repo https://review.opendev.org/c/openstack/project-config/+/948249 | 16:20 |
opendevreview | sean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo https://review.opendev.org/c/openstack/project-config/+/948250 | 16:20 |
fungi | hah, yikes | 16:21 |
clarkb | I think that change will improve things once it lands. but while we try to land it it will eat into quotas (albeit the ipv4 quotas which are higher) | 16:22 |
clarkb | fungi: should I recheck the gerrit python3.12 change now? | 16:28 |
fungi | sure! | 16:30 |
clarkb | done | 16:30 |
opendevreview | sean mooney proposed openstack/project-config master: create openstack/grian-ui git repo https://review.opendev.org/c/openstack/project-config/+/948249 | 16:31 |
opendevreview | sean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo https://review.opendev.org/c/openstack/project-config/+/948250 | 16:32 |
clarkb | fungi: images built in check this time around. Now we're on to testing gerrit deployment stuff | 16:47 |
fungi | good deal | 16:49 |
clarkb | and now we are in the gate again. Halfway there | 17:02 |
opendevreview | sean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo https://review.opendev.org/c/openstack/project-config/+/948250 | 17:05 |
opendevreview | sean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo https://review.opendev.org/c/openstack/project-config/+/948250 | 17:06 |
opendevreview | Merged opendev/system-config master: Update Gerrit container image to python3.12 https://review.opendev.org/c/opendev/system-config/+/944408 | 17:55 |
clarkb | fungi: ^ | 17:55 |
clarkb | once that promotes and "deploys" we should be good to do the manual steps | 17:56 |
fungi | yep | 17:56 |
clarkb | and now promotion is done | 17:57 |
fungi | do we not want 944409 in as well before restarting? | 17:57 |
clarkb | fungi: I don't think 944409 affects the images directly. It will just ensure that when we update jeepyb later it uses the right job order and deps | 17:58 |
clarkb | but 944409 should land quickly so we may as well approve it first I guess and be sure | 17:58 |
fungi | oh, i see, it only adds/adjusts job dependencies on other jobs | 18:00 |
fungi | so yeah, probably no immediate outcome from landing that | 18:00 |
clarkb | but it should be quick so lets go ahead and do that | 18:00 |
clarkb | then its done and out of the way | 18:00 |
fungi | k, i guess it's safe to approve now, doesn't need the depends-on to be actually deployed | 18:01 |
clarkb | nope | 18:01 |
opendevreview | Merged openstack/project-config master: Update jeepyb gerrit image build deps https://review.opendev.org/c/openstack/project-config/+/944409 | 18:06 |
clarkb | fungi: the process should be roughly `docker compose pull && docker compose down && mv waiting queue aside && docker compose up -d` basically the same as before, but you can choose whether to use the hyphen in docker-compose or not | 18:06 |
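As a sketch of that sequence on the server (the compose directory and timestamp suffix are illustrative; the waiting-queue path gets pinned down a few messages later):

```shell
cd /etc/gerrit-compose            # illustrative location of the compose file
docker compose pull
docker compose down               # watch that this completes in a reasonable time
mv /home/gerrit2/review_site/data/replication/ref-updates/waiting \
   /home/gerrit2/tmp/replication-waiting.$(date +%Y%m%d)
docker compose up -d
```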
clarkb | the two things I'd like to watch for are the warning about the docker compose version being gone and that the down happens in a reasonable amount of time | 18:07 |
clarkb | if you'd like to start a screen I can join it shortly | 18:07 |
fungi | pulling in a root screen session now | 18:09 |
clarkb | oh and if we make note of the current image then we can always fall back to it later if python3.12 is a problem or whatever | 18:09 |
fungi | <none> <none> f5b922fbdc07 3 weeks ago 691MB | 18:09 |
fungi | opendevorg/gerrit 3.10 ca3978438207 38 minutes ago 691MB | 18:09 |
fungi | those are the old and new images | 18:09 |
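One low-effort way to keep that fallback handy, assuming the dangling f5b922fbdc07 image really is the previous build, is to give it a throwaway tag so it can't be pruned out from under us (the tag name here is made up):

```shell
# Re-tag the previous image; rolling back would then just mean pointing the
# compose file at this tag and restarting.
docker tag f5b922fbdc07 opendevorg/gerrit:pre-py312-rollback
docker image ls opendevorg/gerrit
```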
clarkb | can you do a docker ps -a too? | 18:10 |
fungi | there are two gerrit containers, one is running and one is exited | 18:10 |
fungi | 253604aed051 opendevorg/gerrit:3.10 "/wait-for-it.sh 127…" 4 days ago Up 4 days gerrit-compose-gerrit-1 | 18:11 |
clarkb | I think the <none> <none> and the ps -a output showing opendevorg/gerrit:3.10 must be podman artifacts | 18:11 |
clarkb | but otherwise that all looks correct to me | 18:11 |
clarkb | fungi: we should move the waiting queue aside too | 18:11 |
clarkb | cool | 18:12 |
clarkb | how does this look #status notice Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment. | 18:12 |
fungi | lgtm | 18:12 |
clarkb | fungi: review_site/data/plugins/replication/ref-updates/waiting or something | 18:12 |
clarkb | (that is from memory so check it) | 18:13 |
fungi | what's the path to the waiting queue again? is it somewhere under review_site? | 18:13 |
clarkb | fungi: ya see just a couple messages up | 18:13 |
clarkb | not plugin-manager | 18:13 |
clarkb | maybe its data/replication | 18:13 |
clarkb | ya thats it | 18:14 |
fungi | yeah | 18:14 |
clarkb | then put it in ~gerrit2/tmp so we don't back it up but have the notes for maybe fixing the bug (also same fs so mv is fast) | 18:14 |
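Spelled out, the move-aside is roughly the following (path reconstructed from the messages above, so verify it on the host before running):

```shell
mkdir -p /home/gerrit2/tmp
# Same filesystem as review_site, so this is a fast rename rather than a copy,
# and ~gerrit2/tmp is not included in backups.
mv /home/gerrit2/review_site/data/replication/ref-updates/waiting \
   /home/gerrit2/tmp/replication-waiting.$(date +%Y%m%d-%H%M)
```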
clarkb | I'll send the notice now | 18:14 |
fungi | that look right? | 18:14 |
clarkb | #status notice Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment. | 18:15 |
opendevstatus | clarkb: sending notice | 18:15 |
-opendevstatus- NOTICE: Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment. | 18:15 |
clarkb | fungi: yes that command looks correct to me | 18:15 |
fungi | also did we want to clear the caches that tend to build up if we let them go too long? | 18:15 |
clarkb | gerrit_file_diff.h2.db and git_file_diff.h2.db are both >7GB | 18:16 |
clarkb | I think we could clear those two out (and the other associated files) | 18:16 |
fungi | is that... too large? | 18:16 |
clarkb | ya it's in the range of too large | 18:16 |
fungi | k, i'll add that | 18:16 |
clarkb | gerrit_file_diff.lock.db and git_file_diff.lock.db are their associated lock files which I've deleted in the past with the h2 files | 18:17 |
clarkb | fungi: it's review_site/cache not review_site/db | 18:17 |
opendevstatus | clarkb: finished sending notice | 18:18 |
fungi | i checked in a separate window that ~gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff}.* will match the four files we're wanting | 18:19 |
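The cache cleanup queued up in the screen amounts to the following, run only while Gerrit is stopped (the glob was checked above to match exactly the four files):

```shell
# Remove the oversized H2 diff caches plus their lock files; Gerrit recreates
# them on demand after startup.
rm -v /home/gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff}.*
```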
clarkb | cool command lgtm | 18:19 |
fungi | in progress | 18:19 |
clarkb | fungi: arg I'm thinking the sigint didn't apply possibly because we started this container with sighup so shutdown is using sighup? | 18:20 |
fungi | i wonder if the sigint isn't helping | 18:20 |
fungi | oh, or that, sure | 18:20 |
fungi | i can kill the process in a separate window | 18:20 |
clarkb | cool you can use kill -HUP or kill -INT I think both should work | 18:20 |
clarkb | but then hopefully the next time we do this sigint just works | 18:20 |
fungi | gerrit2 265899 289 28.3 127490100 37370460 ? Sl Apr21 17038:35 /usr/lib/jvm/java-17-openjdk-amd64/bin/java -Djava.security.egd=file:/dev/./urandom -Dlog4j2.formatMsgNoLookups=true -Dh2.maxCompactTime=15000 -Xmx96g -jar /var/gerrit/bin/gerrit.war daemon -d /var/gerrit | 18:20 |
fungi | that one? | 18:20 |
clarkb | yes that looks correct | 18:20 |
fungi | regular sigterm may have done nothing | 18:21 |
clarkb | fungi: int or hup not term | 18:21 |
clarkb | the log does say it shutdown the sshd | 18:21 |
fungi | yeah, i did a -1 (hup) too but that seems to have been ignored | 18:22 |
fungi | sent sigint just now (-2) | 18:22 |
fungi | that did seem to do something | 18:23 |
clarkb | the log output hasn't changed | 18:23 |
fungi | though it's still saying stopping | 18:23 |
clarkb | it's about to get kill -9'd by docker compose... | 18:23 |
clarkb | which at this point I guess is probably ok | 18:24 |
fungi | do we want a second restart once this comes up? | 18:24 |
fungi | to test that the change takes effect? | 18:25 |
fungi | webui is loading now | 18:25 |
clarkb | Probably a good idea. Looking at the second screen window you did a kill -1 then a kill -2. Were there any other kills prior to that? | 18:25 |
fungi | i did no other kills besides hup and int, no | 18:25 |
fungi | oh, wait, yes i did a term (default) first | 18:25 |
fungi | maybe the handler for that confused it and it stopped responding to the subsequent signals | 18:26 |
fungi | i.e. just `kill` without specifying a specific signal | 18:26 |
clarkb | ya I guess my concern here is you also issued a hup which is what we've used for years and should've worked just fine if issued from outside podman/docker | 18:27 |
fungi | file diffs are loading for me | 18:27 |
clarkb | but if there was a term first that would be different and maybe it got stuck in some shutdown or something | 18:27 |
fungi | right, that's what i'm theorizing | 18:27 |
fungi | so the signals it got (in order) from me were: 15 (term), 1 (hup), 2 (int) | 18:28 |
clarkb | so ya now that we've loaded the container with the new config (sigint) we expect podman to be able to pass that through | 18:28 |
fungi | ready to try another restart? need to move the waiting queue aside again? | 18:28 |
clarkb | let me check for any apparmor logs really quick first in case something got blocked we didn't expect | 18:29 |
clarkb | and yes I would move the queue aside just to avoid distracting tracebacks in the error log | 18:29 |
fungi | that command is queued up in the screen when we're ready | 18:30 |
clarkb | fungi: the only apparmor audit log I see is when docker compose tried to issue the sighup | 18:32 |
clarkb | so I think we're good. Nothing unexpected there | 18:32 |
clarkb | I'm happy to proceed with another restart if you are | 18:32 |
clarkb | if this fails to stop with the sigint we should issue an out of band sighup | 18:32 |
clarkb | not term etc | 18:32 |
clarkb | since we know hup works | 18:33 |
fungi | in progress | 18:34 |
fungi | that was way faster | 18:34 |
clarkb | looks like it worked | 18:34 |
clarkb | and the log recorded it was stopping the sshd again | 18:34 |
fungi | near instantaneous | 18:34 |
clarkb | ya hup was too | 18:34 |
fungi | just what we wanted | 18:34 |
fungi | and the webui is already loading for me again | 18:34 |
clarkb | so the issue with the first round must've been with the sigterm | 18:34 |
fungi | with file diffs too | 18:34 |
fungi | yes, i concur, it must have held onto that because it was associated with a container that was started before we changed the shutdown | 18:35 |
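That theory is easy to confirm on the next go-round, since Docker and Podman both record the stop signal in the container's config at creation time (the inspect field name is the standard one; the compose file location is a guess):

```shell
# Stop signal baked into the running container...
docker inspect --format '{{.Config.StopSignal}}' gerrit-compose-gerrit-1
# ...versus what a freshly created container would get from the compose file.
grep -n stop_signal /etc/gerrit-compose/docker-compose.yaml
```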
clarkb | https://review.opendev.org/c/starlingx/distcloud/+/948262 this just got a new patchset we can use to check replication | 18:36 |
clarkb | I did git fetch origin refs/changes/62/948262/3 and git show FETCH_HEAD while origin is origin https://opendev.org/starlingx/distcloud (fetch) and there is content and the sha matches | 18:38 |
clarkb | so replication lgtm | 18:38 |
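For anyone repeating it, the replication spot-check described above boils down to:

```shell
# In a clone whose origin is https://opendev.org/starlingx/distcloud
git fetch origin refs/changes/62/948262/3
git show --stat FETCH_HEAD   # sha and contents should match what Gerrit reports
```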
clarkb | to recap: looking at audit log (grep audit in /var/log/syslog) docker compose issued a sighup not a sigint and was blocked when we first tried to restart gerrit. This implies that docker compose/podman use the config from booting the service not what is currently on disk to determine the stop signal. | 18:39 |
clarkb | Then we tried an out of band sigterm which seems to have put gerrit into an odd shutdown state even after a subsequent sighup then sigint | 18:39 |
clarkb | eventually docker compose timed out and did a sigkill and gerrit stopped and was restarted | 18:39 |
clarkb | once gerrit was up again we redid the restart to double check sigint would be used and would work and it seems to have done so. And now things should be up and running | 18:40 |
clarkb | I really want to block that ip that can't set up an ssh connection and make that block permanent | 18:40 |
clarkb | but to do that properly requires private var updates and maybe updates to our iptables role? | 18:41 |
clarkb | confirmed our iptables rules only allow us to open things up based on ansible vars not block them | 18:42 |
clarkb | I should say iptables configuration management | 18:42 |
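For comparison, a one-off manual block outside the managed ruleset would look roughly like this, with the caveat that it would not survive the next run of the iptables configuration management; the addresses and the assumption that the abuse targets Gerrit's ssh port (29418) are placeholders:

```shell
sudo iptables  -I INPUT -s 198.51.100.7 -p tcp --dport 29418 -j DROP
sudo ip6tables -I INPUT -s 2001:db8::7  -p tcp --dport 29418 -j DROP
# Check the counters later to see whether the client is still trying.
sudo iptables -L INPUT -v -n | grep 198.51.100.7
```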
fungi | yeah | 18:42 |
fungi | it's added complication for arguably little value | 18:42 |
clarkb | maybe tonyb can track down the ip address at IBM :) | 18:43 |
clarkb | fungi: is there anything else you think we should check (web ui is running, diffs load, someone pushed a patch that replicated). I guess maybe we want to check that jeepyb isn't sad. But I didn't see any tracebacks after that starlingx/distcloud patch was pushed and that should trigger jeepyb hook scripts | 18:44 |
fungi | nothing i can think of, it all lgtm | 18:44 |
fungi | i can close out the screen session now if you don't see any reason to keep the log | 18:45 |
clarkb | I'm already disconnected and I tried to summarize the sequence of events above. If you think that summary is accurate and sufficient I think we are good | 18:45 |
fungi | yeah, it's accurate | 18:45 |
fungi | terminated the screen session | 18:45 |
clarkb | looking at man 7 signal I remember now why I use logical names instead of values for kill. Because sparc | 18:47 |
clarkb | the joys of coming up on linux/unix in a mixed ubuntu and solaris shop. | 18:48 |
clarkb | anyway I'm impressed you can remember the number values :) | 18:48 |
clarkb | I'm going to pop out for lunch shortly. Let me know if there is anything else I can do related to this to be helpful | 18:50 |
clarkb | maybe now we want to recheck https://review.opendev.org/c/opendev/system-config/+/948247 or should we wait? | 18:50 |
clarkb | oh if you docker inspect that nameless image there are clues that it is the image we want | 18:51 |
clarkb | (the cmd and volume mount values for example) | 18:51 |
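Concretely, something along these lines (the image ID comes from the listing earlier in the session):

```shell
docker image inspect f5b922fbdc07 \
  --format 'created={{.Created}} cmd={{.Config.Cmd}} volumes={{.Config.Volumes}}'
```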
clarkb | ok popping out now. Back in a bit | 18:54 |
clarkb | zuul reported on that starlingx change too (they actually pushed another newer patchset) | 18:55 |
clarkb | now I'm really popping out | 18:55 |
fungi | eh, i double-checked `man 7 signal` before i did anything. for some reason i never can remember the syntax for passing named signals to kill | 18:55 |
clarkb | ya I think it was just hammered into me that the numbers sometimes differ across environments because we had solaris, ubuntu on x86 and ubuntu on sparc, so use the logical names and don't worry | 18:56 |
fungi | right, i got in the habit of checking the signal.7 page on each platform | 18:59 |
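For the record, the named forms that sidestep the per-architecture numbering differences (PID taken from the process listing earlier; it is long gone by now):

```shell
kill -s TERM 265899   # equivalent to plain `kill 265899`
kill -s HUP  265899
kill -s INT  265899
kill -l               # print this platform's name<->number mapping
```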
clarkb | it looks really quiet. I will recheck that ipv4 docker hub change now | 20:48 |
clarkb | fungi: one of the jobs that is building is the refstack image. Did we end up with an answer for how to announce that teardown? | 21:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: Force IPv4 connectivity to Docker Hub during image builds https://review.opendev.org/c/opendev/system-config/+/948247 | 21:05 |
clarkb | ok I think that second patchset is working (logs look correct and at least one image build succeeded already) | 21:23 |
clarkb | the gitea image build failed with a 403 forbidden from https://git.sr.ht/~mariusor/go-xsd-duration?go-get=1 I suspect we've hit source huts anti ddos measures there | 21:54 |
clarkb | dropping the go-get=1 parameter gets me a quick "checking you are not a bot" splash page then the repo behind it after | 21:55 |
clarkb | that appears to be the only gitea dep on source hut | 21:56 |
clarkb | https://github.com/go-gitea/gitea/issues/22389 is an old issue where there was concern that source hut would block go mod access entirely but then they worked out a plan | 21:57 |
clarkb | https://sourcehut.org/blog/2025-04-15-you-cannot-have-our-users-data/ | 21:58 |
clarkb | seems they deployed anubis but looks like that only affects you if your user agent contains mozilla which the go mod user agent shouldn't | 22:00 |
clarkb | maybe a fluke. I guess we can retry and if it fails again see what upstream gitea has to say | 22:00 |
clarkb | hrm looks like maybe we can/should set GOPROXY | 22:01 |
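What 948277 ends up doing is presumably along these lines; the build-arg name and value are assumptions, though https://proxy.golang.org,direct is the conventional GOPROXY setting:

```shell
docker build \
  --build-arg GOPROXY="https://proxy.golang.org,direct" \
  -t opendevorg/gitea:goproxy-test .   # tag and build context are illustrative
```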
fungi | clarkb: i still haven't heard from wes about whether he wants any sort of formal announcement about the trademark policy changing, i'll ping him again early in the week | 22:09 |
opendevreview | Clark Boylan proposed opendev/system-config master: Build gitea with GOPROXY set https://review.opendev.org/c/opendev/system-config/+/948277 | 22:13 |
clarkb | I don't know that ^ is strictly required, but it seems like this is the way people have settled on addressing the go mod traffic problem | 22:13 |
clarkb | the old threatened shutdown was source hut blocking the go proxy entirely, and then this wouldn't work, but they got the go proxy traffic down to reasonable levels so it seems like we should be fetching from there as a result? | 22:14 |
fungi | seems reasonable to me, sure | 22:15 |
clarkb | hrm that seems to have exploded. Almost like setting GOPROXY made docker itself use that proxy for requests for images | 22:21 |
fungi | uh | 22:21 |
fungi | why would docker use a goproxy? | 22:21 |
clarkb | because it is written in go? | 22:22 |
fungi | that seems pathological if so | 22:22 |
clarkb | https://zuul.opendev.org/t/openstack/build/ae94a46b630a482b9c6cecbcdcfa8cb2/log/job-output.txt#739-754 all of these http requests failed | 22:23 |
fungi | i mean, to build the docker tools that are written in go maybe, but i wouldn't expect the runtime to just treat that like a general http proxy | 22:23 |
clarkb | ya I wouldn't either. I guess this is similar to what we think of as the rate limit error when we get the buildset registry error back and not the docker rate limit error | 22:24 |
clarkb | I'll recheck and see if it is consistent I guess | 22:24 |
clarkb | if that breaks again I can dig in deeper. For now I think I'm going to catch some afternoon sun from a bike | 22:25 |
fungi | sounds like a far better use of your time to me | 22:25 |
clarkb | ok recheck is progressing so maybe it was just ratelimits | 22:31 |
clarkb | and I'm off | 22:31 |
fungi | have fun! | 22:33 |
fungi | pep 784 was formally accepted for adding zstandard compression/decompression to the cpython stdlib | 22:35 |
fungi | seems like that's an initial step toward supporting it for wheel compression | 22:36 |
fungi | system-config-build-image-gitea failed again... unrecognized import path "git.sr.ht/~mariusor/go-xsd-duration": reading https://git.sr.ht/~mariusor/go-xsd-duration?go-get=1: 403 Forbidden | 22:39 |
fungi | that looks more like it didn't get the proxy envvar passed through? | 22:40 |
fungi | https://zuul.opendev.org/t/openstack/build/1b45ca22ec8e4f80bd3a558f19bdae4e | 22:40 |
Clark[m] | Ya maybe the Dockerfile isn't quite right. It does have an arg and env stuff set up for goproxy | 22:41 |