Friday, 2025-04-25

Unit193FWIW, the debian package of limnoria now ships a template user unit for it, not always useful depending on your setup but somewhat generic.01:01
fungiUnit193: yeah, we actually run it in a docker container, but good to know!14:13
fungiseems like our mailing lists aren't the only ones struggling to balance usability with deliverability these days: https://www.openwall.com/lists/oss-security/2025/04/24/914:21
opendevreviewJulia Kreger proposed openstack/diskimage-builder master: Fix rhel/centos multipathing issues  https://review.opendev.org/c/openstack/diskimage-builder/+/94824414:36
clarkbI don't see any new complaints about gitea so I'll proceed with syncing tags on gitea10-14 this morning14:47
clarkbfungi: I'm around too if we want to proceed with https://review.opendev.org/c/opendev/system-config/+/944408 and https://review.opendev.org/c/openstack/project-config/+/944409 and do a gerrit restart today14:48
clarkbgitea10 tag sync has started14:52
clarkband it is done. I don't need to flood the channel with status updates as I work through the other 4. But I'll status log when I'm completely done14:56
fungiyeah, going back over those now15:11
fungii've approved the first one15:13
fungiwill wait to approve the other until it's done15:13
clarkb#status log Ran gitea repo to db tag sync process via dashboard on the remaining gitea backends (gitea10-gitea14)15:19
clarkbthis should be done now15:19
opendevstatusclarkb: finished logging15:19
clarkbnoonedeadpunk: ^ fyi I'm going to consider this done for now. Please let us know if you notice any more tags are missing15:25
noonedeadpunk++ sure, will do15:30
noonedeadpunkthanks for taking time to figure this out!15:30
clarkbinfra-root when doing that work this morning I found the admin dashboard monitoring -> stats page gives quick counts on tags that gitea knows about (as well as branches). Can be an easy way to check if the numbers start to diverge greatly again15:30
clarkbthe branch count seemed consistent across the backends but tag count was not. I suspect because there are fewer branches and we tend to update branches multiple times, giving gitea a chance to catch up if it falls behind. Whereas tags are pushed once and forgotten15:31
fungidid the values match up after the fix?15:32
clarkbfungi: yes15:32
clarkb58835 tags and 9707 branches15:32
clarkbthough I didn't go back to check gitea10 or gitea09 (I discovered this page when I got to gitea11)15:33
clarkbfungi: the gerrit python3.12 image update bounced off of docker hub rate limits15:36
clarkbMaybe give it until 1600 UTC and then recheck ?15:37
clarkbin the meantime I'm going to find some breakfast15:37
fungimmm, sure can do15:37
fungilooks like it was the pull of our python-builder image from dockerhub that failed. i guess if this keeps being a frequent problem we could revisit the choice not to publish our image dependencies to both quay and dockerhub in parallel15:47
fungii remember there was a semi-recent change to make the roles in zuul-jobs support that case15:47
fungiyet another case of dockerhub rate limits getting in the way of us migrating off dockerhub15:48
clarkbyup. But if we pull from quay then we stop getting speculative builds15:53
clarkbso it's a damned if you do, damned if you don't situation. Things have generally been better with the "move what we can" approach at least15:53
funginot if they're dependencies for other images we're publishing to quay though, right?15:53
fungii'm assuming the problem is that the things we're running on noble are based on images we upload to quay so if we built them with dependencies hosted on quay we'd still have speculative testing15:54
fungibut those same dependencies are also used in building images for things we run on jammy that need to be uploaded to dockerhub15:55
clarkbit's both. there is speculative testing of the build side and the deployment side15:55
fungiso if the dependencies were uploaded to both places in parallel then the images we're hosting on quay could use the quay copies and the ones we upload to dockerhub could use the dockerhub copies?15:55
clarkbyes15:55
fungii guess we'd have to build the images twice in separate jobs to be able to speculatively test changes to the dependency images though15:56
fungiuse one python-builder image build for testing use in building the things that are uploaded to quay, and a separate one for things that are uploaded to dockerhub15:56
clarkboh also the build jobs don't force ipv415:56
clarkbwe can do that and that should help a lot15:57
fungiso in essence we'd really need to publish a python-builder-dockerhub image to dockerhub and a python-builder-quay image to quay and then set the image build dependencies to one or the other, and yeah that's a fair amount of extra complexity and literal duplication of work15:57
clarkbI suspect if we switch the build jobs to force things to ipv4 we'll get better results. Probably not 100% success but better than we get now15:59
clarkband maybe that is good enough to limp along for the moment15:59
clarkbI'll look at porting that hack into the build jobs16:00
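
One hedged way to force IPv4 for Docker Hub on a build node is to pin the registry hostnames to their A records in /etc/hosts; the hostname list and mechanism here are assumptions for illustration, not necessarily what the change below does:

    # resolve only the IPv4 (A) records and pin them so pulls never try IPv6;
    # registry-1.docker.io / auth.docker.io are assumed endpoints
    for host in registry-1.docker.io auth.docker.io; do
        ip=$(dig +short A "$host" | head -n1)
        [ -n "$ip" ] && echo "$ip $host" | sudo tee -a /etc/hosts
    done
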
opendevreviewClark Boylan proposed opendev/system-config master: Force IPv4 connectivity to Docker Hub during image builds  https://review.opendev.org/c/opendev/system-config/+/94824716:10
clarkbsomething like that maybe.16:10
clarkboh crap16:10
clarkbthat runs all the image builds16:10
clarkbfungi: do you think we should dequeue ^ from check so that the gerrit change has a higher chance of getting in nowish and then reenqueue later?16:11
clarkbI'll go ahead and do that since it's low impact16:11
clarkbthat is done. We can recheck it when we're happy to eat into our quotas16:12
clarkbI was worried I would have to update an image so that it would build... Had the exact opposite problem16:13
opendevreviewsean mooney proposed openstack/project-config master: create openstack/grian-ui git repo  https://review.opendev.org/c/openstack/project-config/+/94824916:20
opendevreviewsean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo  https://review.opendev.org/c/openstack/project-config/+/94825016:20
fungihah, yikes16:21
clarkbI think that change will improve things once it lands. but while we try to land it it will eat into quotas (albeit the ipv4 quotas which are higher)16:22
clarkbfungi: should I recheck the gerrit python3.12 change now?16:28
fungisure!16:30
clarkbdone16:30
opendevreviewsean mooney proposed openstack/project-config master: create openstack/grian-ui git repo  https://review.opendev.org/c/openstack/project-config/+/94824916:31
opendevreviewsean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo  https://review.opendev.org/c/openstack/project-config/+/94825016:32
clarkbfungi: images built in check this time around. Now we're on to testing gerrit deployment stuff16:47
fungigood deal16:49
clarkband now we are in the gate again. Halfway there17:02
opendevreviewsean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo  https://review.opendev.org/c/openstack/project-config/+/94825017:05
opendevreviewsean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo  https://review.opendev.org/c/openstack/project-config/+/94825017:06
opendevreviewMerged opendev/system-config master: Update Gerrit container image to python3.12  https://review.opendev.org/c/opendev/system-config/+/94440817:55
clarkbfungi: ^17:55
clarkbonce that promotes and "deploys" we should be good to do the manual steps17:56
fungiyep17:56
clarkband now promotion is done17:57
fungido we not want 944409 in as well before restarting?17:57
clarkbfungi: I don't think 944409 affects the images directly. It will just ensure that when we update jeepyb later it uses the right job order and deps17:58
clarkbbut 944409 should land quickly so we may as well approve it first I guess and be sure17:58
fungioh, i see, it only adds/adjusts job dependencies on other jobs18:00
fungiso yeah, probably no immediate outcome from landing that18:00
clarkbbut it should be quick so lets go ahead and do that18:00
clarkbthen its done and out of the way18:00
fungik, i guess it's safe to approve now, doesn't need the depends-on to be actually deployed18:01
clarkbnope18:01
opendevreviewMerged openstack/project-config master: Update jeepyb gerrit image build deps  https://review.opendev.org/c/openstack/project-config/+/94440918:06
clarkbfungi: the process should be roughly `docker compose pull && docker compose down && mv waiting queue aside && docker compose up -d` basically the same as before but you can choose to use the - between docker compose or not18:06
clarkbthe two things I'd like to watch for are the warning about docker compose version being gone and that the down happens in a reasonable amount of time18:07
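
Putting the steps just described together, a sketch of the restart sequence (the compose directory and the timestamped destination name are assumptions; the waiting-queue path is confirmed a few lines below):

    cd /etc/gerrit-compose            # assumed compose directory
    docker compose pull               # fetch the new python3.12-based image
    docker compose down               # stop gerrit; watch that this is timely
    # park the replication waiting queue; ~gerrit2/tmp is on the same
    # filesystem (fast mv) and excluded from backups
    mv ~gerrit2/review_site/data/replication/ref-updates/waiting \
       ~gerrit2/tmp/replication-waiting.$(date +%F)
    docker compose up -d
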
clarkbif you'd like to start a screen I can join it shortly18:07
fungipulling in a root screen session now18:09
clarkboh and if we make note of the current image then we can always fall back to it later if python3.12 is a problem or whatever18:09
fungi<none>                          <none>    f5b922fbdc07   3 weeks ago      691MB18:09
fungiopendevorg/gerrit               3.10      ca3978438207   38 minutes ago   691MB18:09
fungithose are the old and new images18:09
clarkbcan you do a docker ps -a too?18:10
fungithere are two gerrit containers, one is running and one is exited18:10
fungi253604aed051   opendevorg/gerrit:3.10                "/wait-for-it.sh 127…"   4 days ago   Up 4 days                         gerrit-compose-gerrit-118:11
clarkbI think the <none> <none> and the ps -a output showing opendevorg/gerrit:3.10 must be podman artifacts18:11
clarkbbut otherwise that all looks correct to me18:11
clarkbfungi: we should move the waiting queue aside too18:11
clarkbcool18:12
clarkbhow does this look #status notice Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment.18:12
fungilgtm18:12
clarkbfungi: review_site/data/plugins/replication/ref-updates/waiting or something18:12
clarkb(that is from memory so check it)18:13
fungiwhat's the path to the waiting queue again? is it somewhere under review_site?18:13
clarkbfungi: ya see just a couple messages up18:13
clarkbnot plugin-manager18:13
clarkbmaybe it's data/replication18:13
clarkbya that's it18:14
fungiyeah18:14
clarkbthen put it in ~gerrit2/tmp so we don't back it up but have the notes for maybe fixing the bug (also same fs so mv is fast)18:14
clarkbI'll send the notice now18:14
fungithat look right?18:14
clarkb#status notice Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment.18:15
opendevstatusclarkb: sending notice18:15
-opendevstatus- NOTICE: Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment.18:15
clarkbfungi: yes that command looks correct to me18:15
fungialso did we want to clear the caches that tend to build up if we let them go too long?18:15
clarkbgerrit_file_diff.h2.db and git_file_diff.h2.db are both >7GB18:16
clarkbI think we could clear those two out (and the other associated files)18:16
fungiis that... too large?18:16
clarkbya it's in the range of too large18:16
fungik, i'll add that18:16
clarkbgerrit_file_diff.lock.db and git_file_diff.lock.db are their associated lock files which I've deleted in the past with the h2 files18:17
clarkbfungi: it's review_site/cache not review_site/db18:17
opendevstatusclarkb: finished sending notice18:18
fungii checked in a separate window that ~gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff}.* will match the four files we're wanting18:19
clarkbcool command lgtm18:19
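
The cache pruning being agreed to here, as a sketch; this assumes gerrit is already stopped so the h2 databases and their lock files are not in use, and relies on gerrit rebuilding these caches on demand:

    # matches exactly the four files checked above: the two >7GB h2
    # databases plus their .lock.db companions
    rm ~gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff}.*
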
fungiin progress18:19
clarkbfungi: arg I'm thinking the sigint didn't apply possibly because we started this container with sighup so shutdown is using sighup?18:20
fungii wonder if the sigint isn't helping18:20
fungioh, or that, sure18:20
fungii can kill the process in a separate window18:20
clarkbcool you can use kill -HUP or kill -INT I think both should work18:20
clarkbbut then hopefully the next time we do thsi sigint just works18:20
fungigerrit2   265899  289 28.3 127490100 37370460 ?  Sl   Apr21 17038:35 /usr/lib/jvm/java-17-openjdk-amd64/bin/java -Djava.security.egd=file:/dev/./urandom -Dlog4j2.formatMsgNoLookups=true -Dh2.maxCompactTime=15000 -Xmx96g -jar /var/gerrit/bin/gerrit.war daemon -d /var/gerrit18:20
fungithat one?18:20
clarkbyes that looks correct18:20
fungiregular sigterm may have done nothing18:21
clarkbfungi: int or hup not term18:21
clarkbthe log does say it shutdown the sshd18:21
fungiyeah, i did a -1 (hup) too but that seems to have been ignored18:22
fungisent sigint just now (-2)18:22
fungithat did seem to do something18:23
clarkbthe log output hasn't changed18:23
fungithough it's still saying stopping18:23
clarkbit's about to get kill -9'd by docker compose...18:23
clarkbwhich at this point I guess is probably ok18:24
fungido we want a second restart once this comes up?18:24
fungito test that the change takes effect?18:25
fungiwebui is loading now18:25
clarkbProbably a good idea. Looking at the second screen window you did a kill -1 then a kill -2. Were there any other kills prior to that?18:25
fungii did no other kills besides hup and int, no18:25
fungioh, wait, yes i did a term (default) first18:25
fungimaybe the handler for that confused it and it stopped responding to the subsequent signals18:26
fungii.e. just `kill` without specifying a specific signal18:26
clarkbya I guess my concern here is you also issued a hup, which is what we've used for years and should've worked just fine if issued from outside podman/docker18:27
fungifile diffs are loading for me18:27
clarkbbut if there was a term first that would be different and maybe it got stuck in some shutdown or something18:27
fungiright, that's what i'm theorizing18:27
fungiso the signals it got (in order) from me were: 15 (term), 1 (hup), 2 (int)18:28
clarkbso ya now that we've loaded the container with the new config (sigint) we expect podman to be able to pass that through18:28
fungiready to try another restart? need to move the waiting queue aside again?18:28
clarkblet me check for any apparmor logs really quick first in case something got blocked we didn't expect18:29
clarkband yes I would move the queue aside just to avoid distracting tracebacks in the error log18:29
fungithat command is queued up in the screen when we're ready18:30
clarkbfungi: the only apparmor audit log I see is when docker compose tried to issue the sighup18:32
clarkbso I think we're good. Nothing unexpected there18:32
clarkbI'm happy to proceed with another restart if you are18:32
clarkbif this fails to stop with the sigint we should issue an out of band sighup18:32
clarkbnot term etc18:32
clarkbsince we know hup works18:33
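
The out-of-band signalling described here, as a sketch (the pattern is matched against the daemon command line shown earlier in the log):

    # find the gerrit java daemon and signal it by name from outside
    # podman/docker; INT matches the new stop signal, HUP is the
    # known-good fallback
    pid=$(pgrep -f 'gerrit.war daemon')
    kill -INT "$pid"     # preferred
    # kill -HUP "$pid"   # fallback that has worked for years
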
fungiin progress18:34
fungithat was way faster18:34
clarkblooks like it worked18:34
clarkband the log recorded it was stopping the sshd again18:34
funginear instantaneous18:34
clarkbya hup was too18:34
fungijust what we wanted18:34
fungiand the webui is already loading for me again18:34
clarkbso the issue with the first round must've been with the sigterm18:34
fungiwith file diffs too18:34
fungiyes, i concur, it must have held onto that because it was associated with a container that was started before we changed the shutdown18:35
clarkbhttps://review.opendev.org/c/starlingx/distcloud/+/948262 this just got a new patchset we can use to check replication18:36
clarkbI did git fetch origin refs/changes/62/948262/3 and git show FETCH_HEAD while origin is origin https://opendev.org/starlingx/distcloud (fetch) and there is content and the sha matches18:38
clarkbso replication lgtm18:38
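
The replication spot-check just performed, spelled out (change ref and repo taken from the log):

    # fetch from the gitea frontend, not gerrit, so a matching sha
    # proves the new patchset replicated
    git clone https://opendev.org/starlingx/distcloud
    cd distcloud
    git fetch origin refs/changes/62/948262/3
    git show FETCH_HEAD   # content present; sha matches the gerrit patchset
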
clarkbto recap: looking at audit log (grep audit in /var/log/syslog) docker compose issued a sighup not a sigint and was blocked when we first tried to restart gerrit. This implies that docker compose/podman use the config from booting the service not what is currently on disk to determine the stop signal.18:39
clarkbThen we tried an out of band sigterm which seems to have put gerrit into an odd shutdown state even after subsequent sighup then sigint18:39
clarkbeventually docker compose timed out and did a sigkill and gerrit stopped and was restarted18:39
clarkbonce gerrit was up again we redid the restart to double check sigint would be used and would work and it seems to have done so. And now things should be up and running18:40
clarkbI really want to block that ip that can't set up an ssh connection and make that block permanent18:40
clarkbbut to do that properly requires private var updates and maybe updates to our iptables role?18:41
clarkbconfirmed our iptables rules only allow us to open things up based on ansible vars not block them18:42
clarkbI should say iptables configuration management18:42
fungiyeah18:42
fungiit's added complication for arguably little value18:42
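
For context, the sort of one-off rule being weighed (placeholder address; a rule inserted this way is not tracked by the ansible-managed iptables configuration and would be lost on a rules reload):

    # drop the offending source ahead of the managed rules;
    # 198.51.100.7 is a placeholder, not the real address from the log
    sudo iptables -I INPUT -s 198.51.100.7 -j DROP
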
clarkbmaybe tonyb can track down the ip address at IBM :)18:43
clarkbfungi: is there anything else you think we should check (web ui is running, diffs load, someone pushed a patch that replicated)? I guess maybe we want to check that jeepyb isn't sad. But I didn't see any tracebacks after that starlingx/distcloud patch was pushed and that should trigger jeepyb hook scripts18:44
funginothing i can think of, it all lgtm18:44
fungii can close out the screen session now if you don't see any reason to keep the log18:45
clarkbI'm already disconnected and I tried to summarize the sequence of events above. If you think that summary is accurate and sufficient I think we are good18:45
fungiyeah, it's accurate18:45
fungiterminated the screen session18:45
clarkblooking at man 7 signal I remember now why I use logical names instead of values for kill. Because SPARC18:47
clarkbthe joys of coming up on linux/unix in a mixed ubuntu and solaris shop.18:48
clarkbanyway I'm impressed you can remember the number values :)18:48
clarkbI'm going to pop out for lunch shortly. Let me know if there is anything else I can do related to this to be helpful18:50
clarkbmaybe now we want to recheck https://review.opendev.org/c/opendev/system-config/+/948247 or should we wait?18:50
clarkboh if you docker inspect that nameless image there are clues that it is the image we want18:51
clarkb(the cmd and volume mount values for example)18:51
clarkbok popping out now. Back in a bit18:54
clarkbzuul reported on that starlingx change too (they actually pushed another newer patchset)18:55
clarkbnow I'm really popping out18:55
fungieh, i double-check `man 7 signal` before i did anything. for some reason i never can remember the syntax for passing named signals to kill18:55
clarkbya I think it was just hammered into me that the numbers sometimes differ in the environment because we had solaris, ubuntu on x86 and ubuntu on sparc, so use the logical names and don't worry18:56
fungiright, i got in the habit of checking the signal.7 page on each platform18:59
clarkbit looks really quiet. I will recheck that ipv4 docker hub change now20:48
clarkbfungi: one of the jobs that is building is the refstack image. Did we end up with an answer for how to announce that teardown?21:03
opendevreviewClark Boylan proposed opendev/system-config master: Force IPv4 connectivity to Docker Hub during image builds  https://review.opendev.org/c/opendev/system-config/+/94824721:05
clarkbok I think that second patchset is working (logs look correct and at least one image build succeeded already)21:23
clarkbthe gitea image build failed with a 403 forbidden from https://git.sr.ht/~mariusor/go-xsd-duration?go-get=1 I suspect we've hit source huts anti ddos measures there21:54
clarkbdropping the go-get=1 parameter gets me a quick "checking you are not a bot" splash page then the repo behind it after21:55
clarkbthat appears to be the only gitea dep on source hut21:56
clarkbhttps://github.com/go-gitea/gitea/issues/22389 is an old issue where there was concern that source hut would block go mod access entirely but then they worked out a plan21:57
clarkbhttps://sourcehut.org/blog/2025-04-15-you-cannot-have-our-users-data/21:58
clarkbseems they deployed anubis but looks like that only affects you if your user agent contains mozilla which the go mod user agent shouldn't22:00
clarkbmaybe a fluke. I guess we can retry and if it fails again see what upstream gitea has to say22:00
clarkbhrm looks like maybe we can/should set GOPROXY22:01
fungiclarkb: i still haven't heard from wes about whether he wants any sort of formal announcement about the trademark policy changing, i'll ping him again early in the week22:09
opendevreviewClark Boylan proposed opendev/system-config master: Build gitea with GOPROXY set  https://review.opendev.org/c/opendev/system-config/+/94827722:13
clarkbI don't know that ^ is strictly required, but it seems like this is the way people have settled on addressing the go mod traffic problem22:13
clarkbthe old threatened shutdown was source hut blocking the go proxy entirely, in which case this wouldn't work, but then they got the go proxy traffic down to reasonable levels and it seems like we should be fetching from there as a result?22:14
fungiseems reasonable to me, sure22:15
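
A minimal sketch of the idea, expressed as a build invocation (the build-arg plumbing and context path are assumptions about what the change does):

    # have `go mod download` inside the image build fetch modules via the
    # public go module proxy instead of hitting git.sr.ht directly
    docker build \
        --build-arg GOPROXY=https://proxy.golang.org,direct \
        -t opendevorg/gitea:dev docker/gitea   # context path assumed
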
clarkbhrm that seems to have exploded. Almost like setting GOPROXY made docker itself use that proxy for requests for images22:21
fungiuh22:21
fungiwhy would docker use a goproxy?22:21
clarkbbecause it is written in go?22:22
fungithat seems pathological if so22:22
clarkbhttps://zuul.opendev.org/t/openstack/build/ae94a46b630a482b9c6cecbcdcfa8cb2/log/job-output.txt#739-754 all of these http reuqests failed22:23
fungii mean, to build the docker tools that are written in go maybe, but i wouldn't expect the runtime to just treat that like a general http proxy22:23
clarkbya I wouldn't either. I guess this is similar to what we think of as the rate limit error when we get the buildset registry error back and not the docker rate limit error22:24
clarkbI'll recheck and see if it is consistent I guess22:24
clarkbif that breaks again I can dig in deeper. For now I think I'm going to catch some afternoon sun from a bike22:25
fungisounds like a far better use of your time to me22:25
clarkbok recheck is progressing so maybe it was just ratelimits22:31
clarkband I'm off22:31
fungihave fun!22:33
fungipep 784 was formally accepted for adding zstandard compression/decompression to the cpython stdlib22:35
fungiseems like that's an initial step toward supporting it for wheel compression22:36
fungisystem-config-build-image-gitea failed again... unrecognized import path "git.sr.ht/~mariusor/go-xsd-duration": reading https://git.sr.ht/~mariusor/go-xsd-duration?go-get=1: 403 Forbidden22:39
fungithat looks more like it didn't get the proxy envvar passed through?22:40
fungihttps://zuul.opendev.org/t/openstack/build/1b45ca22ec8e4f80bd3a558f19bdae4e22:40
Clark[m]Ya maybe the docker file isn't quite right. It does have an arg and env stuff set up for goproxy22:41
