Thursday, 2025-01-09

@jsoo1:matrix.orgfungi: sorry, flaked again05:20
-@gerrit:opendev.org- Benjamin Schanzel proposed: [zuul/zuul] 937716: WIP: Allow pinning pipelines on status page https://review.opendev.org/c/zuul/zuul/+/93771607:37
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 938783: Remove config object freezing https://review.opendev.org/c/zuul/zuul/+/93878310:40
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 938783: Remove config object freezing https://review.opendev.org/c/zuul/zuul/+/93878312:57
@fungicide:matrix.orgjsoo1: yes, this time it looks like the integration test job ran into a problem creating a gerrit user, though i didn't find any obvious reason comparing the transcript side-by-side with the gerrit container log from it13:50
@jsoo1:matrix.orgfungi: same thing. Maybe there's a reproducible bug?14:43
@fungicide:matrix.orgyeah, sort of odd to see that crop up twice in a row. those two builds ran in different cloud providers on entirely different continents even, so probably not a performance-induced race condition in the test setup14:58
@fungicide:matrix.orgbut also zuul-nox-py312 is failing on the most recent recheck14:59
@fungicide:matrix.organd the py312 job failed a couple of tests because ansible deemed ssh on 127.0.0.1 unreachable for some reason15:21
@fungicide:matrix.orgthat's extra strange15:21
@jsoo1:matrix.orgOh is that because on those hosts localhost is ipv6 only?16:03
@fungicide:matrix.orgmmm, unlikely, i think all our systems are configured dual-stack. i'll double-check though...16:04
@fungicide:matrix.orghttps://zuul.opendev.org/t/zuul/build/94503031fb3e46e1a4eaaa425344e817/log/zuul-info/zuul-info.ubuntu-noble.txt#20-25 says the node used for that build had both 127.0.0.1/8 and ::1/128 bound to the lo interface16:06
@clarkb:matrix.orgnote ansible says "unreachable" for any ssh failures up to and including "ansible actually ssh'd into the remote node but failed to copy its remote payload over to tmp due to permissions or space issues"16:12
@clarkb:matrix.orgthey basically treat application errors as connectivity errors in the way they are reported which can lead to confusion16:13
@fungicide:matrix.orgyeah, i was unsuccessful in finding any more specific error in the log, i think ansible hides it away16:15
@fungicide:matrix.orgso it could have been a dns lookup error, a full /tmp, or any number of other causes16:16
@fungicide:matrix.orglooks like we've hit that gerrit account setup failure thrice in sequence, and i see another change failing in check with the same as well. i wonder if something changed with the gerrit image...16:58
@fungicide:matrix.orglooks like the quickstart compose file just grabs docker.io/gerritcodereview/gerrit unversioned17:00
@fungicide:matrix.orgif i'm reading the dockerhub interface correctly, the last time those images changed was 2024-12-02, over a month ago17:02
@clarkb:matrix.orgThere were the changes to gerrit permissions not allowing you to push acls, but corvus addressed that and this sounds different17:27
@fungicide:matrix.orgyeah, this seems to have just cropped up in the past 24 hours. side-by-side comparisons from earlier gerrit.log files are also not yielding any obvious differences17:28
@fungicide:matrix.orglooks like it started to happen consistently around 22:45 utc, a prior run just before 20:00 succeeded17:30
@fungicide:matrix.orgthe past 6 failures listed at https://zuul.opendev.org/t/zuul/builds?job_name=zuul-quick-start&skip=0 seem to all have that as the cause (the stray failure at 19:25 yesterday looks unrelated)17:32
@clarkb:matrix.orghttps://zuul.opendev.org/t/zuul/build/94eee2f75f1c49dda27a2f1d7459f11b/log/container_logs/gerritconfig.log this is the container that creates the zuul account in gerrit17:40
@clarkb:matrix.orgbased on that log it is stuck waiting for gerrit to be up17:40
@clarkb:matrix.orgthe gerrit log indicates it is up though so maybe our detection mechanism for that is broken17:40
@fungicide:matrix.orgyeah, it never gets past waiting for gerrit to come up, but the gerrit.log shows gerrit thinks it fully started17:40
@clarkb:matrix.orgwe wait for port 29418 to be listening on the host17:40
@clarkb:matrix.orghttps://zuul.opendev.org/t/zuul/build/94eee2f75f1c49dda27a2f1d7459f11b/log/container_logs/gerrit.log#154 indicates that it should be listening but maybe this is related to the localhost connectivity problems too17:41
@clarkb:matrix.orgmight make sense to hold the node and see what is going on17:43
@clarkb:matrix.orgnote this isn't using host networking; it's looking up the gerrit container via the container networking17:44
@fungicide:matrix.orgyeah, i was trying to work out what additional logs we could/should collect in that job, but maybe a hold is more straightforward for now17:44
@jim:acmegating.comit's not stuck waiting for it to start; it's failing to create the user: https://zuul.opendev.org/t/zuul/build/94eee2f75f1c49dda27a2f1d7459f11b/console17:44
@clarkb:matrix.orghuh so the container log is incomplete?17:45
@fungicide:matrix.orgi'm thinking it must not flush regularly17:45
@jim:acmegating.comoh you're looking at https://zuul.opendev.org/t/zuul/build/3bf117748cb7479ba83b73b4fc99368e/log/container_logs/gerritconfig.log#1117:45
@jim:acmegating.comyeah that's waiting for gerrit to start :)17:45
@clarkb:matrix.orgcorvus: yes and that playbook is the one that creates the zuul user17:46
@clarkb:matrix.orgmy assumption was that we never create the user because we don't recognize gerrit as up17:46
@fungicide:matrix.orgright, we started from trying to figure out why the account creation never seems to go through17:46
@jim:acmegating.comyeah, so the outside playbook has decided that gerrit started, but not the inside one17:46
@jim:acmegating.comi agree that smells like container connectivity issues17:47
@fungicide:matrix.orgin that case, i wonder if the name resolution issues in https://zuul.opendev.org/t/zuul/build/94503031fb3e46e1a4eaaa425344e817 could have a similar underlying cause17:48
@fungicide:matrix.org"ssh: Could not resolve hostname test_node: Name or service not known"17:49
@clarkb:matrix.orgI wonder if podman updated on jammy 17:50
@fungicide:matrix.orgnot recently that i can see: https://changelogs.ubuntu.com/changelogs/pool/universe/libp/libpod/libpod_3.4.4+ds1-1ubuntu1.22.04.3/changelog17:52
@fungicide:matrix.org3.4.4+ds1-1ubuntu1.22.04.3 is the current version in jammy and jammy-updates17:53
@fungicide:matrix.orgunless puc is lagging behind...17:54
@clarkb:matrix.orgit may be another supporting package since podman relies on a bunch of tertiary things. But ya I'm guessing holding a node is going to be quickest path to discovery here17:55
@fungicide:matrix.orgautohold is set and 938346 rechecked17:58
@fungicide:matrix.org200.225.47.45 is the held node18:58
@fungicide:matrix.orghttps://zuul.opendev.org/t/zuul/build/06d1225d4261480093c3600b1d145d8b is the corresponding build report19:03
@fungicide:matrix.orginterestingly, apt policy podman shows it's got 3.4.4+ds1-1ubuntu1 installed but that it could upgrade to 3.4.4+ds1-1ubuntu1.22.04.319:08
@fungicide:matrix.orgthe installed version is from jammy/universe while the newer available version is found in jammy-updates/universe and jammy-security/universe19:09
@clarkb:matrix.org`podman exec -it zuul-tutorial_executor_1 bash` then running python3's `socket.gethostbyname('gerrit')` results in `socket.gaierror: [Errno -2] Name or service not known`19:10
@clarkb:matrix.org`gerrit` is the name the setup.yaml playbook checks for port 29418 so that may explain it?19:12
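[Editor's note: a minimal Python sketch of the check discussed above, separating "the name does not resolve" from "the port is not reachable". The function name `check_service` and its return strings are illustrative assumptions, not part of the quickstart playbooks.]

```python
import socket


def check_service(host, port, timeout=3.0):
    """Return "unresolvable", "unreachable", or "ok" for host:port."""
    try:
        # Name resolution step: this is where the gaierror above comes from.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"
    # Try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return "ok"
        except OSError:
            continue
    return "unreachable"
```

[On the held node, calling `check_service("gerrit", 29418)` from inside the executor container would presumably have returned "unresolvable", consistent with the `socket.gaierror` reported above.]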
@fungicide:matrix.orgi guess docker does some sort of fancy "lookup containers as if they're hostnames" magic?19:13
@clarkb:matrix.orgyes docker/podman are supposed to intercept dns requests and populate the logical names for you19:13
@clarkb:matrix.orgin opendev we don't generally deal with that because we use host networking which disables that behavior19:14
@fungicide:matrix.orggot it. let's set aside my extreme fright at that discovery for the time being19:14
@fungicide:matrix.orgpresumably that's how it was working previously, so what broke it?19:15
@clarkb:matrix.orgmaybe the packages that set up dns stuff for podman did update to match the security update but the main runtime didn't and there is an incompatibility?19:15
@clarkb:matrix.orghttps://pypi.org/project/podman-compose/#history I think this is the problem actually19:16
@clarkb:matrix.orgbased only on the latest release timestamp19:16
@jim:acmegating.comit may not be creating the container with the correct name19:17
@clarkb:matrix.orgI have confirmed that the latest 1.3.0 version is what we have installed19:17
@clarkb:matrix.orgwe can push an update to use the previous version and see if that fixes it19:18
@jim:acmegating.comClark: while you do that, do you mind if i take over this host and downgrade to 1.2.0 and test19:18
@jim:acmegating.comi want to see what the difference in container metadata is19:18
@clarkb:matrix.orgcorvus: go for it19:18
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 938837: Pin podman-compose to 1.2.0 https://review.opendev.org/c/zuul/zuul/+/93883719:20
@jim:acmegating.com-                "--network=zuul-tutorial_zuul:alias=gerrit",19:21
+ "--network=zuul-tutorial_zuul",
+ "--network-alias=gerrit",
@fungicide:matrix.orglooping back around to the installed podman version being old, it was a workaround for https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2024394 implemented via https://review.opendev.org/c/zuul/zuul-jobs/+/886552 (so it's indeed intentional)19:21
@jim:acmegating.commaybe the old podman doesn't support the separate network-alias arg19:22
@clarkb:matrix.orgI guess another approach may be to update the job to noble under the assumption that newer podman compose would work. Or use docker compose or something19:23
@clarkb:matrix.orgbut if the pin works on jammy that is probably a reasonable workaround until one of those steps is taken19:23
@fungicide:matrix.orgsupposedly the podman on noble doesn't suffer from 2024394. we may end up trading that bug for newer ones, but at some point we'll need to upgrade either way19:23
@jim:acmegating.comi take it 4394 is not fixed on jammy?19:24
@fungicide:matrix.orgcorrect19:25
@fungicide:matrix.orgpeople running jammy continue to rediscover that bug report19:25
@fungicide:matrix.orgas recently as two months ago, so if it's not been fixed in ~1.5 years already it's unlikely to ever be19:26
@fungicide:matrix.orgdebian would also be an option, if that aligns better with other jobs we're running19:26
@jim:acmegating.comit sounds like the quickstart is broken on jammy then if we have to pin both podman and podman-compose19:26
@jim:acmegating.comi agree with Clark let's do the podman-compose pin for the test, but we should immediately upgrade the job to noble to see if we can drop the pins.  then after that let's see about switching to "docker compose".  i do have a change for that which could be put into shape quickly.19:28
@fungicide:matrix.orgthe sad thing is podman was working on jammy, ubuntu folk introduced this regression later with a backported patch but then never fixed it19:28
@jim:acmegating.comyes, everything about all of this is very table-flippy.19:29
@jim:acmegating.comthe manual downgrade worked btw19:30
@fungicide:matrix.orgle sigh19:30
@jim:acmegating.comi approved 83719:30
@jim:acmegating.comanyone writing a noble patch or should i?19:30
@fungicide:matrix.orglgtm as well19:30
@fungicide:matrix.orgi had not started one yet, no19:31
@clarkb:matrix.orgnor me19:31
-@gerrit:opendev.org- James E. Blair proposed: [zuul/zuul] 938840: Use noble for quickstart https://review.opendev.org/c/zuul/zuul/+/93884019:34
@fungicide:matrix.orgthanks!19:34
@jim:acmegating.comoh btw, https://review.opendev.org/923084 is already running that on noble, so i'm optimistic about that :)19:35
@jim:acmegating.com(with podman under "docker compose")19:36
@fungicide:matrix.orgeven better!19:36
@jim:acmegating.comoh, though the pip install of git-review may fail19:39
@clarkb:matrix.orgI think on noble you may have to install to a venv now?19:40
@fungicide:matrix.orgi expect so, yes19:40
@clarkb:matrix.orgglobal installs are disabled unless you pass the flag to say I really meant it19:40
@fungicide:matrix.orgor with --break-system-packages or by removing the externally-managed flagfile or a variety of other options19:40
@fungicide:matrix.orgcould e.g. create a /opt/git-review venv, pip install inside there, then drop a symlink from /usr/local/bin/git-review to /opt/git-review/bin/git-review, for example19:42
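[Editor's note: a rough Python sketch of the venv-plus-symlink approach suggested above. The function name `install_tool_in_venv` and its `install` switch are hypothetical. It assumes a POSIX layout where the venv places executables under `bin/`, and that the package installs a script with the same name as the package.]

```python
import subprocess
import venv
from pathlib import Path


def install_tool_in_venv(venv_dir, package, link_dir, install=True):
    """Create a venv, optionally pip install a package, and symlink its script."""
    venv_dir = Path(venv_dir)
    # with_pip is only requested when we actually intend to install something.
    venv.create(venv_dir, with_pip=install)
    if install:
        # Installing inside the venv avoids the externally-managed restriction.
        subprocess.run(
            [str(venv_dir / "bin" / "pip"), "install", package], check=True
        )
    target = venv_dir / "bin" / package
    link = Path(link_dir) / package
    # Replace any stale link or file left over from an earlier run.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(target)
    return link
```

[With the paths from the message above, this would be called as `install_tool_in_venv("/opt/git-review", "git-review", "/usr/local/bin")`, which needs root and network access.]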
-@gerrit:opendev.org- James E. Blair proposed: [zuul/zuul] 923084: WIP: Try docker compose with podman https://review.opendev.org/c/zuul/zuul/+/92308419:43
@jim:acmegating.comi think we should just use the distro git-review19:43
@jim:acmegating.comthat's what i do in that change ^19:44
@fungicide:matrix.orghear hear19:44
@fungicide:matrix.orgthere haven't been any critical fixes to git-review in ages, and we really don't need any of the newer features for this19:44
@jim:acmegating.comassuming 840 fails because of that, i will move that part into 84019:44
@fungicide:matrix.orgugh, now zuul-nox-py311 is failing for 93883719:52
@fungicide:matrix.org"Timeout waiting for Zuul to settle" in TestAWSDriver.testawsdiskimage_snapshot19:53
@jim:acmegating.comgood chance the niz stack i'm working on improves that; i'd recheck-bash it and not worry for now.19:54
@fungicide:matrix.orgwill do, thanks20:03
-@gerrit:opendev.org- James E. Blair proposed:20:11
- [zuul/zuul] 937946: Add image/upload delete lifecycle https://review.opendev.org/c/zuul/zuul/+/937946
- [zuul/zuul] 937947: Add web API image delete endpoints https://review.opendev.org/c/zuul/zuul/+/937947
- [zuul/zuul] 938022: Allow deleting images through web UI https://review.opendev.org/c/zuul/zuul/+/938022
- [zuul/zuul] 938023: Add REST API method to trigger image build https://review.opendev.org/c/zuul/zuul/+/938023
- [zuul/zuul] 938024: Add a web UI button to build an image https://review.opendev.org/c/zuul/zuul/+/938024
- [zuul/zuul] 938087: Add labels and flavors to web https://review.opendev.org/c/zuul/zuul/+/938087
- [zuul/zuul] 938088: Add niz nodes to rest api nodes list endpoint https://review.opendev.org/c/zuul/zuul/+/938088
-@gerrit:opendev.org- James E. Blair proposed: [zuul/zuul] 923084: WIP: Try docker compose with podman https://review.opendev.org/c/zuul/zuul/+/92308420:52
@fungicide:matrix.orgahh good, i was having trouble parsing the error message from that last attempt20:52
-@gerrit:opendev.org- James E. Blair proposed:20:55
- [zuul/zuul] 938840: Use noble for quickstart https://review.opendev.org/c/zuul/zuul/+/938840
- [zuul/zuul] 923084: WIP: Try docker compose with podman https://review.opendev.org/c/zuul/zuul/+/923084
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] 938837: Pin podman-compose to 1.2.0 https://review.opendev.org/c/zuul/zuul/+/93883721:15
-@gerrit:opendev.org- James E. Blair proposed:21:35
- [zuul/zuul] 937384: Use a TreeCache for job request queues https://review.opendev.org/c/zuul/zuul/+/937384
- [zuul/zuul] 937385: Reduce ZK lock contention in executor https://review.opendev.org/c/zuul/zuul/+/937385
- [zuul/zuul] 937386: Disable cache event log https://review.opendev.org/c/zuul/zuul/+/937386
- [zuul/zuul] 937387: Make executor sensor messages more useful https://review.opendev.org/c/zuul/zuul/+/937387
-@gerrit:opendev.org- James E. Blair proposed:21:54
- [zuul/zuul] 938840: Use noble for quickstart https://review.opendev.org/c/zuul/zuul/+/938840
- [zuul/zuul] 923084: Switch quickstart to docker compose v2 https://review.opendev.org/c/zuul/zuul/+/923084
@clarkb:matrix.orgcorvus: I left a comment on https://review.opendev.org/c/zuul/zuul-jobs/+/925916 so I didn't approve it. But feel free if you think that can be sorted out later or is unimportant22:02
@clarkb:matrix.organd a couple of notes/questions on https://review.opendev.org/c/zuul/zuul/+/923084 too22:06
-@gerrit:opendev.org- James E. Blair proposed:22:07
- [zuul/zuul] 937946: Add image/upload delete lifecycle https://review.opendev.org/c/zuul/zuul/+/937946
- [zuul/zuul] 937947: Add web API image delete endpoints https://review.opendev.org/c/zuul/zuul/+/937947
- [zuul/zuul] 938022: Allow deleting images through web UI https://review.opendev.org/c/zuul/zuul/+/938022
- [zuul/zuul] 938023: Add REST API method to trigger image build https://review.opendev.org/c/zuul/zuul/+/938023
- [zuul/zuul] 938024: Add a web UI button to build an image https://review.opendev.org/c/zuul/zuul/+/938024
- [zuul/zuul] 938087: Add labels and flavors to web https://review.opendev.org/c/zuul/zuul/+/938087
- [zuul/zuul] 938088: Add niz nodes to rest api nodes list endpoint https://review.opendev.org/c/zuul/zuul/+/938088
@jim:acmegating.comClark: i suspect either it is necessary for users, or something changed in the interim.  i was definitely being minimal.  so i think we should +w it and if you or anyone else gets curious about reverting it, that's easy enough to test with another change and depends-on.22:09
@jim:acmegating.comClark: did you do a service disabling thing for opendev we could copypasto into 923084?22:11
@clarkb:matrix.orgcorvus: yes there is a service disablement block let me find it22:14
@clarkb:matrix.orgcorvus: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/install-docker/tasks/Ubuntu.noble.yaml#L23-L3722:15
@clarkb:matrix.orgI've approved https://review.opendev.org/c/zuul/zuul-jobs/+/925916 a followup to test the cleanup should be fine22:15
-@gerrit:opendev.org- Zuul merged on behalf of John Soo: [zuul/zuul] 938346: Configure notify settings when reporting to gerrit https://review.opendev.org/c/zuul/zuul/+/93834622:42
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair:22:55
- [zuul/zuul-jobs] 926164: Add ability to exclude a specific platform https://review.opendev.org/c/zuul/zuul-jobs/+/926164
- [zuul/zuul-jobs] 925916: ensure-podman: add tasks to configure socket group https://review.opendev.org/c/zuul/zuul-jobs/+/925916
-@gerrit:opendev.org- James E. Blair proposed: [zuul/zuul] 923084: Switch quickstart to docker compose v2 https://review.opendev.org/c/zuul/zuul/+/92308423:22
@jim:acmegating.comClark: ^ thanks, updated.23:23
