Thursday, 2024-02-08

@mordred:waterwanders.com🚠🐡🩰00:10
@clarkb:matrix.orgit's interesting to see the different glyphs in my browser and on my phone. I guess that is down to font choice. Also I have no idea what that means00:28
@mordred:waterwanders.comOh me neither 00:29
@mordred:waterwanders.comIt was an attempt to express positive reaction to the release in a more abstract manner00:29
@sjal:matrix.orghas anyone had similar problem? I can't requeue jobs using zuul client that are older than x. I have some gerrit changes from 2022 that I could rebuild back in autumn and now I can't (it's just gate pipeline also), scheduler just shows me basic info like `zuul.GerritConnection Updating <Change 0xsomething project change,patchset>`05:57
-@gerrit:opendev.org- Zuul merged on behalf of Dong Zhang: [zuul/nodepool] 904196: Introduce error.capacity states_key for InsufficientInstanceCapacity error https://review.opendev.org/c/zuul/nodepool/+/90419608:23
-@gerrit:opendev.org- Benjamin Schanzel proposed: [zuul/zuul] 908420: Zuul-Web: substring search for builds, buildsets https://review.opendev.org/c/zuul/zuul/+/90842013:19
-@gerrit:opendev.org- Benjamin Schanzel proposed: [zuul/zuul] 908420: Zuul-Web: substring search for builds, buildsets https://review.opendev.org/c/zuul/zuul/+/90842013:29
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 908360: Replace Ansible 6 with Ansible 9 https://review.opendev.org/c/zuul/zuul/+/90836014:49
-@gerrit:opendev.org- Benjamin Schanzel proposed: [zuul/zuul] 908420: Zuul-Web: substring search for builds, buildsets https://review.opendev.org/c/zuul/zuul/+/90842015:52
-@gerrit:opendev.org- Francisco Seruca Salgado proposed: [zuul/zuul] 908507: Update decrypt_secret.py https://review.opendev.org/c/zuul/zuul/+/90850716:09
@clarkb:matrix.org> <@sjal:matrix.org> has anyone had similar problem? I can't requeue jobs using zuul client that are older than x. I have some gerrit changes from 2022 that I could rebuild back in autumn and now I can't (it's just gate pipeline also), scheduler just shows me basic info like `zuul.GerritConnection Updating <Change 0xsomething project change,patchset>`16:12
No, I've not seen this before. If I had to guess, the changes have states now that prevent them from enqueuing. For example maybe they are abandoned. I would check your pipeline configs against the change state and/or look at your scheduler debug logs to see why they don't enqueue
@clarkb:matrix.orgI doubt the issue is age itself. More likely it is a side effect of age, like label updates/merging/abandonment/etc16:13
@sjal:matrix.org> <@clarkb:matrix.org> No I've not seen this before. If I had to guess the changes have states now that prevent them from enqueuing. For example maybe they are abandoned. I would check your pipeline configs against the change state and/or look at your scheduler debug logs to see why they don't enqueue16:19
scheduler logs are too shallow. I tried to find what's going on but had to bring those changes back, so I just temporarily moved the build jobs to the post pipeline to build them there. Weird. Never had something like this happen to me before
@clarkb:matrix.orgthey should be pretty verbose if you enable debug logs16:19
@clarkb:matrix.orgbut you don't need the logs to compare change state against pipeline requirements. it's just a bit of manual legwork16:20
@clarkb:matrix.orgthat's my best guess though. Zuul is doing what is asked of it, which is to not enqueue those changes because they no longer match the criteria16:20
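The manual comparison described above could be sketched roughly as follows. This is only an illustration, not Zuul's or Gerrit's actual logic or data model: the `unmet_requirements` helper, the dictionary shapes, and the field names (`is_current_patchset`, `labels`) are all hypothetical, and the requirement keys merely echo the kind of conditions a gate pipeline typically states (change is open, current patchset, minimum label votes).

```python
from typing import Any


def unmet_requirements(change: dict[str, Any], require: dict[str, Any]) -> list[str]:
    """Return the names of the (hypothetical) requirements the change fails."""
    unmet = []
    # An "open" requirement rules out merged or abandoned changes.
    if require.get("open") and change.get("status") != "NEW":
        unmet.append("open")
    # The patchset being enqueued must be the latest one.
    if require.get("current-patchset") and not change.get("is_current_patchset"):
        unmet.append("current-patchset")
    # Every required label must carry at least the required vote.
    for label, minimum in require.get("labels", {}).items():
        if change.get("labels", {}).get(label, 0) < minimum:
            unmet.append(f"label:{label}")
    return unmet


# Made-up example: an old change that was abandoned and lost its Verified vote.
old_change = {
    "status": "ABANDONED",
    "is_current_patchset": True,
    "labels": {"Code-Review": 2, "Verified": 0},
}
gate_require = {
    "open": True,
    "current-patchset": True,
    "labels": {"Code-Review": 2, "Verified": 1},
}

print(unmet_requirements(old_change, gate_require))
```

Any name the helper reports is a candidate explanation for why an old change is silently skipped; the real criteria are whatever the pipeline configuration in use actually requires.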
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 908510: DNM: debug test_component_registry https://review.opendev.org/c/zuul/zuul/+/90851016:51
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:17:41
- [zuul/nodepool] 908513: Reconcile docs/validation for some options https://review.opendev.org/c/zuul/nodepool/+/908513
- [zuul/nodepool] 908514: Remove hostname-format option https://review.opendev.org/c/zuul/nodepool/+/908514
@tom.stappaerts:matrix.orgHi everyone,19:13
We have a zuul 9.2.0 running with Gerrit 3.9.1
We have configured a Zuul Queue which allows circular dependencies, then we have enabled Gerrit Submit topic as a whole.
Check goes great, but Gate always fails partway through (jobs are cancelled seemingly randomly) with the message "This change is part of a bundle that failed."
I was hoping one of you would be able to pinpoint what we are doing wrong, as now we are not able to merge anything which has a topic 🫠
@jim:acmegating.comtom.stappaerts: the scheduler debug logs may indicate what is the thing that failed originally.  though tbh your best bet is probably to wait until zuul 10.0 where all of this is getting refactored.19:15
@tom.stappaerts:matrix.orgThx, there is no other setting I need to set on zuul or gerrit as far as you know?19:26
@jim:acmegating.comtom.stappaerts: submitWholeTopic in gerrit and Queue.allow-circular-dependencies in zuul are the only things needed to enact that behavior.  there might be something about the pipeline config that is causing a problem (some required setting is lost); the debug logs would tell you that; and if that's the case, 10.0 may or may not help.  but if it's a bug or some unintended emergent behavior then at this point the fix is to wait for 10.0.  so i would check the debug logs, but if it's not obvious, then wait for the refactor.19:29
@lucianga96:matrix.orgHello, I've been using zuul v2 in my prev company for over 1 year. I'm planning to use zuul with proxmox as nodepool. From what I saw from documentation there is no support for proxmox. I'm willing to contribute with adding proxmox driver as public support. Is anyone from zuul dev team here able to tell me if I can contribute to zuul by adding Proxmox support? 19:34
@clarkb:matrix.org> <@lucianga96:matrix.org> Hello, I've been using zuul v2 in my prev company for over 1 year. I'm planning to use zuul with proxmox as nodepool. From what I saw from documentation there is no support for proxmox. I'm willing to contribute with adding proxmox driver as public support. Is anyone from zuul dev team here able to tell me if I can contribute to zuul by adding Proxmox support?19:39
It should be possible. You'll likely want to use the new state machine based driver tooling. But otherwise it's likely just a matter of implementing the glue between that foundation and the proxmox apis
@jim:acmegating.comand a fake proxmox implementation for functional tests19:50
@tristanc_:matrix.orgMay I ask for some review on https://review.opendev.org/c/zuul/zuul-jobs/+/899212 , we have been using these roles for quite a while, and it would be great if they were available by default with Zuul. Thanks in advance.19:59
@tom.stappaerts:matrix.org> <@jim:acmegating.com> tom.stappaerts: submitWholeTopic in gerrit and Queue.allow-circular-dependencies in zuul are the only things needed to enact that behavior.  there might be something about the pipeline config that is causing a problem (some required setting is lost); the debug logs would tell you that; and if that's the case, 10.0 may or may not help.  but if it's a bug or some unintended emergent behavior then at this point the fix is to wait for 10.0.  so i would check the debug logs, but if it's not obvious, then wait for the refactor.19:59
Seems like we had duplicate job names for jobs that were not actually the same, which made it fail. I renamed some jobs and now it works :)
@jim:acmegating.comtom.stappaerts: ah nice; the job deduplication is very different in 10.0.  it can be problematic in 9.x.  that was one of the main motivations for the refactor in 10.20:06
@jim:acmegating.comtom.stappaerts: also see https://zuul-ci.org/docs/zuul/latest/config/job.html#attr-job.deduplicate20:07
@tristanc_:matrix.org> <@tristanc_:matrix.org> May I ask for some review on https://review.opendev.org/c/zuul/zuul-jobs/+/899212 , we have been using these roles for quite a while, and it would be great if they were available by default with Zuul. Thanks in advance.20:08
Clark: many moons ago, in Berlin, you asked how opendev could have that, and here is how. I'd be happy to spawn the service if you are satisfied with the result, it's a simple standalone command.
@clarkb:matrix.org> <@tristanc_:matrix.org> Clark: many moons ago, in Berlin, you asked how opendev could have that, and here is how. I'd be happy to spawn the service if you are satisfied with the result, it's a simple standalone command.20:16
OpenDev has moved away from trying to manage log processing beyond simple storage. We found it required non trivial effort (in terms of resources RAM/CPU/Bandwidth/storage) and there wasn't sufficient community support to justify that
@clarkb:matrix.orgwhich is unfortunate because in the last openstack tc meeting people were complaining that elasticsearch isn't really a "find the problem" tool, but rather a fancy grep that allows you to find the problem. In any case I'm not sure that is something we'd be worried about right now given community interest20:17
@tristanc_:matrix.orgElasticsearch is great to check when and how often an issue happens, I think LogJuicer can be used to "find the problem". It does not store the logs, they are processed in streaming, only the report is kept in an efficient storage format (using capnproto).20:20
@clarkb:matrix.orgRight but there hasn't been interest in any of the other related tooling so it is hard to justify trying to revive those efforts20:21
@clarkb:matrix.orgIt will get ignored by most and then we'll have to fight to shut it down when it bitrots and I'm not interested in doing that20:22
@clarkb:matrix.orgI think it would be useful but unfortunately useful isn't sufficient to make it worthwhile. It also needs to be supportable/maintainable. It will also likely make our jobs take much longer? Though I don't know what processing time looks like20:23
@clarkb:matrix.orgThe main reason we avoided putting processing like this in jobs previously is that you have to avoid the time cost and the penalty for failures.20:24
@clarkb:matrix.orgI think those are addressable I just have very little motivation after previous experience for trying to sort all that out20:24
@jim:acmegating.comthis might be a good subject for #_oftc_#opendev:matrix.org  :)20:24
@clarkb:matrix.orgAck20:24
@jim:acmegating.com(i think the official zuul stance on adding stuff like that to zuul-jobs is something like "cool" :)20:25
@tristanc_:matrix.orgYou can actually try our service for opendev build, here is an example of a nodepool failure: https://softwarefactory-project.io/logjuicer/report/new?target=https://zuul.opendev.org/t/zuul/build/97ce036beaf64308882e1a8798b2e42a , it took 11seconds to crunch 64k lines20:26
@clarkb:matrix.orgYup. I do think there is a general "be careful what you do in post-run" problem for zuul users (see also the discussion about that yesterday) but people can run what they want where they want; zuul gives you a lot of flexibility there20:27
@clarkb:matrix.orgcorvus: I meant to ask you if you had any thoughts on preserving the retry state for pre-run failures even when post-run fails20:33
@clarkb:matrix.orgI thought about it a bit yesterday and I couldn't come up with a good reason not to. But I may be overlooking something important20:33
@jim:acmegating.comClark: pretty sure we do20:34
@jim:acmegating.comthe only way a job can flip from deciding to do a retry to not is via a zuul_return zuul.retry=False flag20:35
@jkkadgar:matrix.orgMy best guess as to what is happening is that zuul gets into both a retry and a failed state. So it retries, but the gating queue also sees it as failed and restarts all builds20:38
@clarkb:matrix.orgnot all builds just the ones for the child change(s). I would expect it to do that but only after all retry attempts fail given what corvus just said20:38
@jkkadgar:matrix.orgIt will do it on the first retry from my testing20:39
@jkkadgar:matrix.org * It will do it on the first retry from my testing for the child changes yes20:42
@mordred:waterwanders.comI think preserving retry state here means something different. this is not about preserving the state in terms of whether or not the first parent change is going to retry ... rather, whether or not the post failure causes the child jobs to see the parent job as a failed job. So I don't think the right way to characterize this is about the retry state - it's a question of whether it makes sense to introduce a behavior that says "if we're retrying the job because of a pre failure, and then a post fails, don't consider the job to be failing"20:43
@mordred:waterwanders.comI'm not arguing for or against - just happen to have been following the discussion20:43
@clarkb:matrix.orgmordred:  ah yes that is a better way of explaining the situation20:43
@mordred:waterwanders.comit's honestly a fascinating edge case. :)20:44
@jkkadgar:matrix.orgmordred: yes thanks for explaining that better.20:45
@jkkadgar:matrix.orgFor our use-case it would be very valuable to have some feature to deal with this. We have windows jobs that mount volumes which is inherently unstable. And making sure every post-run step is robust to not fail from that is painful, although we have made changes already to deal with some cases.20:49
@jim:acmegating.comit is already the case that a job that fails a pre-run playbook should not report post_failure.21:14
@jim:acmegating.comhttps://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L1949-L1950 is the code and https://zuul.opendev.org/t/openstack/build/bfea076d2734499e894f1e99718dd3d2/console shows it in action.21:15
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 908355: Deprecate Ansible 6 https://review.opendev.org/c/zuul/zuul/+/90835521:28
@jkkadgar:matrix.orgcorvus: yes that is the same behavior I see. The problem is in the gating scenario: when Change A and Change B are building, if a build in Change A fails in pre-build and also in post-build, it will cause Change B to restart its builds. On 9.3.0 this should be reproducible21:35
@jim:acmegating.comzuul-maint: how about this release for zuul: commit b038dcaf9f90befe69f6e0032eec25db9bfe1059 (HEAD -> master, tag: 9.5.0, gerrit/master, refs/changes/55/908355/1)21:59
@michael_kelly_anet:matrix.orghey folks.  Looks like opendev.org is down?22:14
@jim:acmegating.comyep, it's an issue in the cloud provider; you can see all the gory details in #_oftc_#opendev:matrix.org 22:15
@jim:acmegating.comand back up22:16
@michael_kelly_anet:matrix.orggood times22:16
@clarkb:matrix.orgcorvus: for the release I guess my only comment is that opendev isn't running the deprecation change yet, if you wanted to have opendev sanity check it first. The tests are probably good enough for something like that though23:22
@jim:acmegating.comyeah, that's the way i'm leaning; it's mostly a bit flip that we've flipped before23:24
