Unit193 | FWIW, the debian package of limnoria now ships a template user unit for it, not always useful depending on your setup but somewhat generic. | 01:01 |
fungi | Unit193: yeah, we actually run it in a docker container, but good to know! | 14:13 |
fungi | seems like our mailing lists aren't the only ones struggling to balance usability with deliverability these days: https://www.openwall.com/lists/oss-security/2025/04/24/9 | 14:21 |
opendevreview | Julia Kreger proposed openstack/diskimage-builder master: Fix rhel/centos multipathing issues https://review.opendev.org/c/openstack/diskimage-builder/+/948244 | 14:36 |
clarkb | I don't see any new complaints about gitea so I'll proceed with syncing tags on gitea10-14 this morning | 14:47 |
clarkb | fungi: I'm around too if we want to proceed with https://review.opendev.org/c/opendev/system-config/+/944408 and https://review.opendev.org/c/openstack/project-config/+/944409 and do a gerrit restart today | 14:48 |
clarkb | gitea10 tag sync has started | 14:52 |
clarkb | and it is done. I don't need to flood the channel with status updates as I work through the other 4. But I'll status log when I'm completely done | 14:56 |
fungi | yeah, going back over those now | 15:11 |
fungi | i've approved the first one | 15:13 |
fungi | will wait to approve the other until it's done | 15:13 |
clarkb | #status log Ran gitea repo to db tag sync process via dashboard on the remaining gitea backends (gitea10-gitea14) | 15:19 |
clarkb | this should be done now | 15:19 |
opendevstatus | clarkb: finished logging | 15:19 |
clarkb | noonedeadpunk: ^ fyi I'm going to consider this done for now. Please let us know if you notice any more tags are missing | 15:25 |
noonedeadpunk | ++ sure, will do | 15:30 |
noonedeadpunk | thanks for taking time to figure this out! | 15:30 |
clarkb | infra-root when doing that work this morning I found the admin dashboard monitoring -> stats page gives quick counts on tags that gitea knows about (as well as branches). Can be an easy way to check if the numbers start to diverge greatly again | 15:30 |
clarkb | the branch count seemed consistent across the backends but the tag count was not. I suspect because there are fewer branches and we tend to update branches multiple times giving gitea a chance to catch up if it falls behind. Whereas tags are pushed once and forgotten | 15:31 |
fungi | did the values match up after the fix? | 15:32 |
clarkb | fungi: yes | 15:32 |
clarkb | 58835 tags and 9707 branches | 15:32 |
clarkb | though I didn't go back to check gitea10 or gitea09 (I discovered this page when I got to gitea11) | 15:33 |
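A quick way to cross-check a single project outside the dashboard, sketched here with an illustrative backend hostname, API token, and on-disk repository path, is to compare the tag count Gitea reports over its API with the tags actually in the bare repo:

```shell
# Tags Gitea knows about for one repo (host, port, token, and repo are placeholders;
# the endpoint is paginated, so bump limit/page for repos with many tags).
curl -s -H "Authorization: token $GITEA_TOKEN" \
  "https://gitea10.opendev.org:3000/api/v1/repos/openstack/nova/tags?limit=50" | jq 'length'

# Tags actually present in the bare repository on disk (path is a guess).
git --git-dir=/data/git/repositories/openstack/nova.git tag | wc -l
```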
clarkb | fungi: the gerrit python3.12 image update bounced off of docker hub rate limits | 15:36 |
clarkb | Maybe give it until 1600 UTC and then recheck ? | 15:37 |
clarkb | in the meantime I'm going to find some breakfast | 15:37 |
fungi | mmm, sure can do | 15:37 |
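For context, Docker documents a way to see where an anonymous client stands against the pull rate limit, which is handy when deciding whether a recheck is worth it; run from the builder's point of view it looks like:

```shell
# Fetch an anonymous token for Docker's rate-limit preview repo, then read the
# ratelimit-limit / ratelimit-remaining headers from a manifest HEAD request.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
curl -sI -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" | grep -i ratelimit
```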
fungi | looks like it was trying to pull our python-builder image from dockerhub that failed. i guess if this keeps being a frequent problem we could revisit the choice not to publish our image dependencies to both quay and dockerhub in parallel | 15:47 |
fungi | i remember there was a semi-recent change to make the roles in zuul-jobs support that case | 15:47 |
fungi | yet another case of dockerhub rate limits getting in the way of us migrating off dockerhub | 15:48 |
clarkb | yup. But if we pull from quay then we stop getting speculative builds | 15:53 |
clarkb | so it's a damned if you do, damned if you don't situation. Things have generally been better with the "move what we can" approach at least | 15:53 |
fungi | not if they're dependencies for other images we're publishing to quay though, right? | 15:53 |
fungi | i'm assuming the problem is that the things we're running on noble are based on images we upload to quay so if we built them with dependencies hosted on quay we'd still have speculative testing | 15:54 |
fungi | but those same dependencies are also used in building images for things we run on jammy that need to be uploaded to dockerhub | 15:55 |
clarkb | it's both. There is speculative testing of the build side and the deployment side | 15:55 |
fungi | so if the dependencies were uploaded to both places in parallel then the images we're hosting on quay could use the quay copies and the ones we upload to dockerhub could use the dockerhub copies? | 15:55 |
clarkb | yes | 15:55 |
fungi | i guess we'd have to build the images twice in separate jobs to be able to speculatively test changes to the dependency images though | 15:56 |
fungi | use one python-builder image build for testing use in building the things that are uploaded to quay, and a separate one for things that are uploaded to dockerhub | 15:56 |
clarkb | oh also the build jobs don't force ipv4 | 15:56 |
clarkb | we can do that and that should help a lot | 15:57 |
fungi | so in essence we'd really need to publish a python-builder-dockerhub image to dockerhub and a python-builder-quay image to quay and then set the image build dependencies to one or the other, and yeah that's a fair amount of extra complexity and literal duplication of work | 15:57 |
clarkb | I suspect if we switch the build jobs to force things to ipv4 we'll get better results. Probably not 100% success but better than we get now | 15:59 |
clarkb | and maybe that is good enough to limp along for the moment | 15:59 |
clarkb | I'll look at porting that hack into the build jobs | 16:00 |
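The hack being ported is essentially "pin the Docker Hub endpoints to their IPv4 addresses"; a minimal sketch of that idea (the endpoint list and the use of /etc/hosts are assumptions about how the existing hack works) might be:

```shell
# Resolve only A records for the Docker Hub endpoints and pin them in /etc/hosts
# so image pulls during the build never take the IPv6 path, which has the
# lower anonymous rate limits.
for host in registry-1.docker.io auth.docker.io production.cloudflare.docker.com; do
  addr=$(getent ahostsv4 "$host" | awk 'NR==1 {print $1}')
  [ -n "$addr" ] && echo "$addr $host" | sudo tee -a /etc/hosts
done
```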
opendevreview | Clark Boylan proposed opendev/system-config master: Force IPv4 connectivity to Docker Hub during image builds https://review.opendev.org/c/opendev/system-config/+/948247 | 16:10 |
clarkb | something like that maybe. | 16:10 |
clarkb | oh crap | 16:10 |
clarkb | that runs all the image builds | 16:10 |
clarkb | fungi: do you think we should dequeue ^ from check so that the gerrit change has a higher chance of getting in nowish and then reenqueue later? | 16:11 |
clarkb | I'll go ahead and do that since it's low impact | 16:11 |
clarkb | that is done. We can recheck it when we're happy to eat into our quotas | 16:12 |
clarkb | I was worried I would have to update an image so that it would build... Had the exact opposite problem | 16:13 |
opendevreview | sean mooney proposed openstack/project-config master: create openstack/grian-ui git repo https://review.opendev.org/c/openstack/project-config/+/948249 | 16:20 |
opendevreview | sean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo https://review.opendev.org/c/openstack/project-config/+/948250 | 16:20 |
fungi | hah, yikes | 16:21 |
clarkb | I think that change will improve things once it lands. but while we try to land it it will eat into quotas (albeit the ipv4 quotas which are higher) | 16:22 |
clarkb | fungi: should I recheck the gerrit python3.12 change now? | 16:28 |
fungi | sure! | 16:30 |
clarkb | done | 16:30 |
opendevreview | sean mooney proposed openstack/project-config master: create openstack/grian-ui git repo https://review.opendev.org/c/openstack/project-config/+/948249 | 16:31 |
opendevreview | sean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo https://review.opendev.org/c/openstack/project-config/+/948250 | 16:32 |
clarkb | fungi: images built in check this time around. Now we're on to testing gerrit deployment stuff | 16:47 |
fungi | good deal | 16:49 |
clarkb | and now we are in the gate again. Halfway there | 17:02 |
opendevreview | sean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo https://review.opendev.org/c/openstack/project-config/+/948250 | 17:05 |
opendevreview | sean mooney proposed openstack/project-config master: Add jobs for openstack/grian-ui repo https://review.opendev.org/c/openstack/project-config/+/948250 | 17:06 |
opendevreview | Merged opendev/system-config master: Update Gerrit container image to python3.12 https://review.opendev.org/c/opendev/system-config/+/944408 | 17:55 |
clarkb | fungi: ^ | 17:55 |
clarkb | once that promotes and "deploys" we should be good to do the manual steps | 17:56 |
fungi | yep | 17:56 |
clarkb | and now promotion is done | 17:57 |
fungi | do we not want 944409 in as well before restarting? | 17:57 |
clarkb | fungi: I don't think 944409 affects the images directly. It will just ensure that when we update jeepyb later it uses the right job order and deps | 17:58 |
clarkb | but 944409 should land quickly so we may as well approve it first I guess and be sure | 17:58 |
fungi | oh, i see, it only adds/adjusts job dependencies on other jobs | 18:00 |
fungi | so yeah, probably no immediate outcome from landing that | 18:00 |
clarkb | but it should be quick so lets go ahead and do that | 18:00 |
clarkb | then its done and out of the way | 18:00 |
fungi | k, i guess it's safe to approve now, doesn't need the depends-on to be actually deployed | 18:01 |
clarkb | nope | 18:01 |
opendevreview | Merged openstack/project-config master: Update jeepyb gerrit image build deps https://review.opendev.org/c/openstack/project-config/+/944409 | 18:06 |
clarkb | fungi: the process should be roughly `docker compose pull && docker compose down && mv waiting queue aside && docker compose up -d` basically the same as before, but you can choose whether to use the hyphen in docker-compose or not | 18:06 |
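As a sketch of that sequence on the server (the compose directory and timestamp suffix are illustrative; the waiting-queue path gets pinned down a few messages later):

```shell
cd /etc/gerrit-compose            # illustrative location of the compose file
docker compose pull
docker compose down               # watch that this completes in a reasonable time
mv /home/gerrit2/review_site/data/replication/ref-updates/waiting \
   /home/gerrit2/tmp/replication-waiting.$(date +%Y%m%d)
docker compose up -d
```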
clarkb | the two things I'd like to watch for are the warning about the docker compose version being gone and that the down happens in a reasonable amount of time | 18:07 |
clarkb | if you'd like to start a screen I can join it shortly | 18:07 |
fungi | pulling in a root screen session now | 18:09 |
clarkb | oh and if we make note of the current image then we can always fall back to it later if python3.12 is a problem or whatever | 18:09 |
fungi | <none> <none> f5b922fbdc07 3 weeks ago 691MB | 18:09 |
fungi | opendevorg/gerrit 3.10 ca3978438207 38 minutes ago 691MB | 18:09 |
fungi | those are the old and new images | 18:09 |
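One low-effort way to keep that fallback handy, assuming the dangling f5b922fbdc07 image really is the previous build, is to give it a throwaway tag so it can't be pruned out from under us (the tag name here is made up):

```shell
# Re-tag the previous image; rolling back would then just mean pointing the
# compose file at this tag and restarting.
docker tag f5b922fbdc07 opendevorg/gerrit:pre-py312-rollback
docker image ls opendevorg/gerrit
```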
clarkb | can you do a docker ps -a too? | 18:10 |
fungi | there are two gerrit containers, one is running and one is exited | 18:10 |
fungi | 253604aed051 opendevorg/gerrit:3.10 "/wait-for-it.sh 127…" 4 days ago Up 4 days gerrit-compose-gerrit-1 | 18:11 |
clarkb | I think the <none> <none> and the ps -a output showing opendevorg/gerrit:3.10 must be podman artifacts | 18:11 |
clarkb | but otherwise that all looks correct to me | 18:11 |
clarkb | fungi: we should move the waiting queue aside too | 18:11 |
clarkb | cool | 18:12 |
clarkb | how does this look #status notice Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment. | 18:12 |
fungi | lgtm | 18:12 |
clarkb | fungi: review_site/data/plugins/replication/ref-updates/waiting or something | 18:12 |
clarkb | (that is from memory so check it) | 18:13 |
fungi | what's the path to the waiting queue again? is it somewhere under review_site? | 18:13 |
clarkb | fungi: ya see just a couple messages up | 18:13 |
clarkb | not plugin-manager | 18:13 |
clarkb | maybe its data/replication | 18:13 |
clarkb | ya thats it | 18:14 |
fungi | yeah | 18:14 |
clarkb | then put it in ~gerrit2/tmp so we don't back it up but have the notes for maybe fixing the bug (also same fs so mv is fast) | 18:14 |
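Spelled out, the move-aside is roughly the following (path reconstructed from the messages above, so verify it on the host before running):

```shell
mkdir -p /home/gerrit2/tmp
# Same filesystem as review_site, so this is a fast rename rather than a copy,
# and ~gerrit2/tmp is not included in backups.
mv /home/gerrit2/review_site/data/replication/ref-updates/waiting \
   /home/gerrit2/tmp/replication-waiting.$(date +%Y%m%d-%H%M)
```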
clarkb | I'll send the notice now | 18:14 |
fungi | that look right? | 18:14 |
clarkb | #status notice Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment. | 18:15 |
opendevstatus | clarkb: sending notice | 18:15 |
-opendevstatus- NOTICE: Gerrit is getting restarted to pick up container image updates. It should only be gone for a moment. | 18:15 |
clarkb | fungi: yes that command looks correct to me | 18:15 |
fungi | also did we want to clear the caches that tend to build up if we let them go too long? | 18:15 |
clarkb | gerrit_file_diff.h2.db and git_file_diff.h2.db are both >7GB | 18:16 |
clarkb | I think we could clear those two out (and the other associated files) | 18:16 |
fungi | is that... too large? | 18:16 |
clarkb | ya it's in the range of too large | 18:16 |
fungi | k, i'll add that | 18:16 |
clarkb | gerrit_file_diff.lock.db and git_file_diff.lock.db are their associated lock files which I've deleted in the past with the h2 files | 18:17 |
clarkb | fungi: it's review_site/cache not review_site/db | 18:17 |
opendevstatus | clarkb: finished sending notice | 18:18 |
fungi | i checked in a separate window that ~gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff}.* will match the four files we're wanting | 18:19 |
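The cache cleanup queued up in the screen amounts to the following, run only while Gerrit is stopped (the glob was checked above to match exactly the four files):

```shell
# Remove the oversized H2 diff caches plus their lock files; Gerrit recreates
# them on demand after startup.
rm -v /home/gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff}.*
```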
clarkb | cool command lgtm | 18:19 |
fungi | in progress | 18:19 |
clarkb | fungi: arg I'm thinking the sigint didn't apply possibly because we started this container with sighup so shutdown is using sighup? | 18:20 |
fungi | i wonder if the sigint isn't helping | 18:20 |
fungi | oh, or that, sure | 18:20 |
fungi | i can kill the process in a separate window | 18:20 |
clarkb | cool you can use kill -HUP or kill -INT I think both should work | 18:20 |
clarkb | but then hopefully the next time we do this sigint just works | 18:20 |
fungi | gerrit2 265899 289 28.3 127490100 37370460 ? Sl Apr21 17038:35 /usr/lib/jvm/java-17-openjdk-amd64/bin/java -Djava.security.egd=file:/dev/./urandom -Dlog4j2.formatMsgNoLookups=true -Dh2.maxCompactTime=15000 -Xmx96g -jar /var/gerrit/bin/gerrit.war daemon -d /var/gerrit | 18:20 |
fungi | that one? | 18:20 |
clarkb | yes that looks correct | 18:20 |
fungi | regular sigterm may have done nothing | 18:21 |
clarkb | fungi: int or hup not term | 18:21 |
clarkb | the log does say it shutdown the sshd | 18:21 |
fungi | yeah, i did a -1 (hup) too but that seems to have been ignored | 18:22 |
fungi | sent sigint just now (-2) | 18:22 |
fungi | that did seem to do something | 18:23 |
clarkb | the log output hasn't changed | 18:23 |
fungi | though it's still saying stopping | 18:23 |
clarkb | it's about to get kill -9'd by docker compose... | 18:23 |
clarkb | which at this point I guess is probably ok | 18:24 |
fungi | do we want a second restart once this comes up? | 18:24 |
fungi | to test that the change takes effect? | 18:25 |
fungi | webui is loading now | 18:25 |
clarkb | Probably a good idea. Looking at the second screen window you did a kill -1 then a kill -2. Were there any other kills prior to that? | 18:25 |
fungi | i did no other kills besides hup and int, no | 18:25 |
fungi | oh, wait, yes i did a term (default) first | 18:25 |
fungi | maybe the handler for that confused it and it stopped responding to the subsequent signals | 18:26 |
fungi | i.e. just `kill` without specifying a specific signal | 18:26 |
clarkb | ya I guess my concern here is you also issued a hup which is what we've used for years and should've worked just fine if issued from outside podman/docker | 18:27 |
fungi | file diffs are loading for me | 18:27 |
clarkb | but if there was a term first that would be different and maybe it got stuck in some shutdown or something | 18:27 |
fungi | right, that's what i'm theorizing | 18:27 |
fungi | so the signals it got (in order) from me were: 15 (term), 1 (hup), 2 (int) | 18:28 |
clarkb | so ya now that we've loaded the container with the new config (sigint) we expect podman to be able to pass that through | 18:28 |
fungi | ready to try another restart? need to move the waiting queue aside again? | 18:28 |
clarkb | let me check for any apparmor logs really quick first in case something got blocked we didn't expect | 18:29 |
clarkb | and yes I would move the queue aside just to avoid distracting tracebacks in the error log | 18:29 |
fungi | that command is queued up in the screen when we're ready | 18:30 |
clarkb | fungi: the only apparmor audit log I see is when docker compose tried to issue the sighup | 18:32 |
clarkb | so I think we're good. Nothing unexpected there | 18:32 |
clarkb | I'm happy to proceed with another restart if you are | 18:32 |
clarkb | if this fails to stop with the sigint we should issue an out of band sighup | 18:32 |
clarkb | not term etc | 18:32 |
clarkb | since we know hup works | 18:33 |
fungi | in progress | 18:34 |
fungi | that was way faster | 18:34 |
clarkb | looks like it worked | 18:34 |
clarkb | and the log recorded it was stopping the sshd again | 18:34 |
fungi | near instantaneous | 18:34 |
clarkb | ya hup was too | 18:34 |
fungi | just what we wanted | 18:34 |
fungi | and the webui is already loading for me again | 18:34 |
clarkb | so the issue with the first round must've been with the sigterm | 18:34 |
fungi | with file diffs too | 18:34 |
fungi | yes, i concur, it must have held onto that because it was associated with a container that was started before we changed the shutdown | 18:35 |
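That theory is easy to confirm on the next go-round, since Docker and Podman both record the stop signal in the container's config at creation time (the inspect field name is the standard one; the compose file location is a guess):

```shell
# Stop signal baked into the running container...
docker inspect --format '{{.Config.StopSignal}}' gerrit-compose-gerrit-1
# ...versus what a freshly created container would get from the compose file.
grep -n stop_signal /etc/gerrit-compose/docker-compose.yaml
```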
clarkb | https://review.opendev.org/c/starlingx/distcloud/+/948262 this just got a new patchset we can use to check replication | 18:36 |
clarkb | I did git fetch origin refs/changes/62/948262/3 and git show FETCH_HEAD while origin is origin https://opendev.org/starlingx/distcloud (fetch) and there is content and the sha matches | 18:38 |
clarkb | so replication lgtm | 18:38 |
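For anyone repeating it, the replication spot-check described above boils down to:

```shell
# In a clone whose origin is https://opendev.org/starlingx/distcloud
git fetch origin refs/changes/62/948262/3
git show --stat FETCH_HEAD   # sha and contents should match what Gerrit reports
```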
clarkb | to recap: looking at audit log (grep audit in /var/log/syslog) docker compose issued a sighup not a sigint and was blocked when we first tried to restart gerrit. This implies that docker compose/podman use the config from booting the service not what is currently on disk to determine the stop signal. | 18:39 |
clarkb | Then we tried an out of band sigterm which seems to have put gerrit into an odd shutdown state even after a subsequent sighup then sigint | 18:39 |
clarkb | eventually docker compose timed out and did a sigkill and gerrit stopped and was restarted | 18:39 |
clarkb | once gerrit was up again we redid the restart to double check sigint would be used and would work and it seems to have done so. And now things should be up and running | 18:40 |
clarkb | I really want to block that ip that can't set up an ssh connection and make that block permanent | 18:40 |
clarkb | but to do that properly requires private var updates and maybe updates to our iptables role? | 18:41 |
clarkb | confirmed our iptables rules only allow us to open things up based on ansible vars not block them | 18:42 |
clarkb | I should say iptables configuration management | 18:42 |
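For comparison, a one-off manual block outside the managed ruleset would look roughly like this, with the caveat that it would not survive the next run of the iptables configuration management; the addresses and the assumption that the abuse targets Gerrit's ssh port (29418) are placeholders:

```shell
sudo iptables  -I INPUT -s 198.51.100.7 -p tcp --dport 29418 -j DROP
sudo ip6tables -I INPUT -s 2001:db8::7  -p tcp --dport 29418 -j DROP
# Check the counters later to see whether the client is still trying.
sudo iptables -L INPUT -v -n | grep 198.51.100.7
```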
fungi | yeah | 18:42 |
fungi | it's added complication for arguably little value | 18:42 |
clarkb | maybe tonyb can track down the ip address at IBM :) | 18:43 |
clarkb | fungi: is there anything else you think we should check (web ui is running, diffs load, someone pushed a patch that replicated). I guess maybe we want to check that jeepyb isn't sad. But I didn't see any tracebacks after that starlingx/distcloud patch was pushed and that should trigger jeepyb hook scripts | 18:44 |
fungi | nothing i can think of, it all lgtm | 18:44 |
fungi | i can close out the screen session now if you don't see any reason to keep the log | 18:45 |
clarkb | I'm already disconnected and I tried to summarize the sequence of events above. If you think that summary is accurate and sufficient I think we are good | 18:45 |
fungi | yeah, it's accurate | 18:45 |
fungi | terminated the screen session | 18:45 |
clarkb | looking at man 7 signal I remember now why I use logical names instead of values for kill. Because sparc | 18:47 |
clarkb | the joys of coming up on linux/unix in a mixed ubuntu and solaris shop. | 18:48 |
clarkb | anyway I'm impressed you can remember the number values :) | 18:48 |
clarkb | I'm going to pop out for lunch shortly. Let me know if there is anything else I can do related to this to be helpful | 18:50 |
clarkb | maybe now we want to recheck https://review.opendev.org/c/opendev/system-config/+/948247 or should we wait? | 18:50 |
clarkb | oh if you docker inspect that nameless image there are clues that it is the image we want | 18:51 |
clarkb | (the cmd and volume mount values for example) | 18:51 |
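Concretely, something along these lines (the image ID comes from the listing earlier in the session):

```shell
docker image inspect f5b922fbdc07 \
  --format 'created={{.Created}} cmd={{.Config.Cmd}} volumes={{.Config.Volumes}}'
```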
clarkb | ok popping out now. Back in a bit | 18:54 |
clarkb | zuul reported on that starlingx change too (they actually pushed another newer patchset) | 18:55 |
clarkb | now I'm really popping out | 18:55 |
fungi | eh, i double-checked `man 7 signal` before i did anything. for some reason i never can remember the syntax for passing named signals to kill | 18:55 |
clarkb | ya I think it was just hammered into me that the numbers sometimes differ across environments because we had solaris, ubuntu on x86 and ubuntu on sparc, so use the logical names and don't worry | 18:56 |
fungi | right, i got in the habit of checking the signal.7 page on each platform | 18:59 |
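For the record, the named forms that sidestep the per-architecture numbering differences (PID taken from the process listing earlier; it is long gone by now):

```shell
kill -s TERM 265899   # equivalent to plain `kill 265899`
kill -s HUP  265899
kill -s INT  265899
kill -l               # print this platform's name<->number mapping
```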
clarkb | it looks really quiet. I will recheck that ipv4 docker hub change now | 20:48 |
clarkb | fungi: one of the jobs that is building is the refstack image. Did we end up with an answer for how to announce that teardown? | 21:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: Force IPv4 connectivity to Docker Hub during image builds https://review.opendev.org/c/opendev/system-config/+/948247 | 21:05 |
clarkb | ok I think that second patchset is working (logs look correct and at least one image build succeeded already) | 21:23 |
clarkb | the gitea image build failed with a 403 forbidden from https://git.sr.ht/~mariusor/go-xsd-duration?go-get=1 I suspect we've hit source huts anti ddos measures there | 21:54 |
clarkb | dropping the go-get=1 parameter gets me a quick "checking you are not a bot" splash page then the repo behind it after | 21:55 |
clarkb | that appears to be the only gitea dep on source hut | 21:56 |
clarkb | https://github.com/go-gitea/gitea/issues/22389 is an old issue where there was concern that source hut would block go mod access entirely but then they worked out a plan | 21:57 |
clarkb | https://sourcehut.org/blog/2025-04-15-you-cannot-have-our-users-data/ | 21:58 |
clarkb | seems they deployed anubis but looks like that only affects you if your user agent contains mozilla which the go mod user agent shouldn't | 22:00 |
clarkb | maybe a fluke. I guess we can retry and if it fails again see what upstream gitea has to say | 22:00 |
clarkb | hrm looks like maybe we can/should set GOPROXY | 22:01 |
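What 948277 ends up doing is presumably along these lines; the build-arg name and value are assumptions, though https://proxy.golang.org,direct is the conventional GOPROXY setting:

```shell
docker build \
  --build-arg GOPROXY="https://proxy.golang.org,direct" \
  -t opendevorg/gitea:goproxy-test .   # tag and build context are illustrative
```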
fungi | clarkb: i still haven't heard from wes about whether he wants any sort of formal announcement about the trademark policy changing, i'll ping him again early in the week | 22:09 |
opendevreview | Clark Boylan proposed opendev/system-config master: Build gitea with GOPROXY set https://review.opendev.org/c/opendev/system-config/+/948277 | 22:13 |
clarkb | I don't know that ^ is strictly required, but it seems like this is the way people have settled on addressing the go mod traffic problem | 22:13 |
clarkb | the old threatened shutdown was source hut blocking the go proxy entirely, and then this wouldn't work, but they got the go proxy traffic down to reasonable levels so it seems like we should be fetching from there as a result? | 22:14 |
fungi | seems reasonable to me, sure | 22:15 |
clarkb | hrm that seems to have exploded. Almost like setting GOPROXY made docker itself use that proxy for requests for images | 22:21 |
fungi | uh | 22:21 |
fungi | why would docker use a goproxy? | 22:21 |
clarkb | because it is written in go? | 22:22 |
fungi | that seems pathological if so | 22:22 |
clarkb | https://zuul.opendev.org/t/openstack/build/ae94a46b630a482b9c6cecbcdcfa8cb2/log/job-output.txt#739-754 all of these http requests failed | 22:23 |
fungi | i mean, to build the docker tools that are written in go maybe, but i wouldn't expect the runtime to just treat that like a general http proxy | 22:23 |
clarkb | ya I wouldn't either. I guess this is similar to what we think of as the rate limit error when we get the buildset registry error back and not the docker rate limit error | 22:24 |
clarkb | I'll recheck and see if it is consistent I guess | 22:24 |
clarkb | if that breaks again I can dig in deeper. For now I think I'm going to catch some afternoon sun from a bike | 22:25 |
fungi | sounds like a far better use of your time to me | 22:25 |
clarkb | ok recheck is progressing so maybe it was just ratelimits | 22:31 |
clarkb | and I'm off | 22:31 |
fungi | have fun! | 22:33 |
fungi | pep 784 was formally accepted for adding zstandard compression/decompression to the cpython stdlib | 22:35 |
fungi | seems like that's an initial step toward supporting it for wheel compression | 22:36 |
fungi | system-config-build-image-gitea failed again... unrecognized import path "git.sr.ht/~mariusor/go-xsd-duration": reading https://git.sr.ht/~mariusor/go-xsd-duration?go-get=1: 403 Forbidden | 22:39 |
fungi | that looks more like it didn't get the proxy envvar passed through? | 22:40 |
fungi | https://zuul.opendev.org/t/openstack/build/1b45ca22ec8e4f80bd3a558f19bdae4e | 22:40 |
Clark[m] | Ya maybe the Dockerfile isn't quite right. It does have an arg and env stuff set up for goproxy | 22:41 |