Thursday, 2024-09-12

-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 928913: Retry node launches on failure https://review.opendev.org/c/zuul/zuul/+/928913 07:24
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 928959: Inherit some attributes from provider configs https://review.opendev.org/c/zuul/zuul/+/928959 09:28
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 928971: Add missing AWS image configuration options https://review.opendev.org/c/zuul/zuul/+/928971 09:28
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 928962: Make AWS create_fleet more robust https://review.opendev.org/c/zuul/nodepool/+/928962 10:42
@mnasiadka:matrix.org > <@fungicide:matrix.org> > <@mnasiadka:matrix.org> Hello - does Zuul support running "ad-hoc" jobs? Like passing some arguments to a job and running it? 11:27
>
> no, zuul doesn't operate on single jobs, the closest you can come is triggering a pipeline with specific criteria
My company uses GitHub/Gitlab CI also in a way that allows running a single job by a user (e.g. run a workflow to build images) - so Zuul could be a replacement for the regular automated CI, but not for that. Thanks.
@dfajfer:fsfe.org > <@mnasiadka:matrix.org> My company uses GitHub/Gitlab CI also in a way that allows running a single job by a user (e.g. run a workflow to build images) - so Zuul could be a replacement for the regular automated CI, but not for that. Thanks. 12:50
it's more a matter of approach - Zuul doesn't work as a trigger -> job system; rather, it deals with code and runs pipelines on it
@dfajfer:fsfe.org of course you can make it do whatever you want; it's just not the workflow the developers had in mind when developing Zuul CI 12:50
@jim:acmegating.com mnasiadka: since zuul uses speculative execution for its configuration and job content, one way to achieve "ad-hoc" jobs is to just write a change to have a job do what you want and then upload it for review.  zuul will run whatever that job is, and the change doesn't even need to merge.  this keeps all the configuration in git, and serves as an audit record of the "ad-hoc" job as well.  as an example: the opendev admins do this regularly when building images of new software that opendev runs so they can evaluate them.  this doesn't address all of the ad-hoc job usecases, but it's just one of several techniques, and it does address a lot of them. 13:41
@mnasiadka:matrix.org > <@jim:acmegating.com> mnasiadka: since zuul uses speculative execution for its configuration and job content, one way to achieve "ad-hoc" jobs is to just write a change to have a job do what you want and then upload it for review.  zuul will run whatever that job is, and the change doesn't even need to merge.  this keeps all the configuration in git, and serves as an audit record of the "ad-hoc" job as well.  as an example: the opendev admins do this regularly when building images of new software that opendev runs so they can evaluate them.  this doesn't address all of the ad-hoc job usecases, but it's just one of several techniques, and it does address a lot of them. 14:22
Thanks, that makes sense :)
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 929123: Allow dib_elements key to be a nested list https://review.opendev.org/c/zuul/zuul-jobs/+/929123 14:54
@filmac:matrix.org Hello! 14:58
In our organization, we configured Zuul on k8s with three scheduler replicas. Everything started up correctly, and the logs showed that the schedulers were properly registered by other components. The documentation also mentions that this is a fully scalable component, so running multiple replicas is even recommended for high availability.
Zuul jobs in the new configuration are processed correctly - we do not experience errors or problems - everything works great.
The only worrying thing is the exceptions that the schedulers log.
Scheduler logs:
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: kazoo.exceptions.NoNodeError
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: raise NoNodeError
(...)
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: Unable to refresh pipeline change list for check
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: Exception loading ZKObject <zuul.model.PipelineChangeList object at 0x7c1dbc2e1100> at /zuul/tenant/test/pipeline/check/change_list
We know where they come from and their occurrence is understandable. The schedulers do not communicate directly with each other but do so through Zookeeper. When a change from Gerrit starts being processed by scheduler A and schedulers B and C send a trigger for the already processed change, the exception from this class in Zuul's code is thrown - https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L1013
Fortunately, all communication goes through Zookeeper, so Zookeeper itself blocks multiple triggers of the same change, and the block manifests itself with the above error.
Should we be worried about this? Apart from the exception from here - https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L1041, can we experience any problems? Data corruption in ZK, or anything else?
Thank you very much in advance for any response. :)
@clarkb:matrix.org the edited logs contain out of order entries which is a bit confusing. I also assume those errors come with a traceback? It's hard for me to build context without that information 15:08
@filmac:matrix.org Maybe I unnecessarily cut the log to make the message shorter. 15:12
Full log:
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: kazoo.exceptions.NoNodeError
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: raise NoNodeError
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 375, in _retryableLoad
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: return func(*args, **kwargs)
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/kazoo/retry.py", line 132, in _call_
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: return kazoo_retry(func, *args, **kw)
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 257, in _retry
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: data, compressed_size = self._retry(context, self._retryableLoad,
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 385, in _load
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: self._load(context)
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 182, in refresh
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: return func(*args, **kwargs)
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/kazoo/retry.py", line 132, in _call_
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: return kazoo_retry(func, *args, **kw)
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 257, in _retry
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: self._retry(context, super().refresh,
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/zuul/model.py", line 809, in refresh
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: pipeline.change_list.refresh(ctx)
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: File "/usr/local/lib/python3.8/site-packages/zuul/scheduler.py", line 2146, in process_tenant_trigger_queue
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: Traceback (most recent call last):
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: Unable to refresh pipeline change list for check
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: Exception loading ZKObject <zuul.model.PipelineChangeList object at 0x7c1dbc2e1100> at /zuul/tenant/test/pipeline/check/change_list
@clarkb:matrix.org filmac: you can use a paste service to keep the messages shorter 15:12
@clarkb:matrix.org couple of observations: it looks like that traceback is in reverse order (it helps with readability if the original order can be preserved), and you're running python3.8 which is old enough that I wonder how old the zuul installation is. That said looking at the zkobject code I don't think this has changed massively so any improvements in newer zuul would likely need to come from the scheduler/model side of things. 15:21
@clarkb:matrix.org but ya my hunch is that a different scheduler is processing the check pipeline with a rw lock. This scheduler is able to get a ro lock and then notices the inconsistency which results in an error. This isn't a real world problem because the scheduler with the rw lock is doing the interesting work. However, I haven't traced that all down and probably wouldn't bother unless it persists with an up to date zuul installation 15:23
@clarkb:matrix.org I did confirm that process_tenant_trigger_queue does appear to only grab a read lock in current zuul code so if that is the issue it likely persists 15:31
@filmac:matrix.org Sure, indeed we have an old version of Zuul (6.0.1.dev1 05b94d2f0), but we will be working on an update soon. 15:40
Thank you very much for your help and analysis!
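The traceback pasted above reads bottom-to-top, and every entry is preceded by a bare timestamp line. A minimal Python sketch for putting such a paste back into its original order before sharing it is shown here. It assumes the bare `2024-09-11 14:06:53` lines were added by a log viewer that lists the newest entries first; that is a guess, not something stated in the channel. The names `restore_order` and `VIEWER_STAMP` are hypothetical and not part of Zuul.

```python
import re

# Assumption: a line holding only "YYYY-MM-DD HH:MM:SS" was added by the
# log viewer and is not part of the scheduler's own output.
VIEWER_STAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")


def restore_order(pasted: str) -> list[str]:
    """Drop viewer-only timestamp lines, then reverse the remaining entries."""
    entries = [
        line
        for line in pasted.splitlines()
        if line.strip() and not VIEWER_STAMP.match(line.strip())
    ]
    # Assumption: the viewer listed entries newest first.
    return entries[::-1]


# A shortened excerpt of the paste, used only to exercise the function.
sample = """\
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: kazoo.exceptions.NoNodeError
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: raise NoNodeError
2024-09-11 14:06:53
2024-09-11 12:06:53,504 ERROR zuul.Scheduler: Traceback (most recent call last):
"""

for line in restore_order(sample):
    print(line)
```

Run on the excerpt, the `Traceback (most recent call last):` entry comes out first and the `kazoo.exceptions.NoNodeError` entry last, which is the order a Python traceback is normally read in.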
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 929147: Fix build-diskimage playbook paths https://review.opendev.org/c/zuul/zuul-jobs/+/929147 16:10
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: 17:23
- [zuul/zuul] 916284: Implement new status page https://review.opendev.org/c/zuul/zuul/+/916284
- [zuul/zuul] 916285: Make helper functions available to other components https://review.opendev.org/c/zuul/zuul/+/916285
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 916286: Implement QueueItemPopover https://review.opendev.org/c/zuul/zuul/+/916286 17:23
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 916287: Implement PipelineDetails view https://review.opendev.org/c/zuul/zuul/+/916287 17:24
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-base-jobs] 929180: Handle cleanup-run deprecation https://review.opendev.org/c/zuul/zuul-base-jobs/+/929180 18:37
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 916744: Visualize branches in ChangeQueues https://review.opendev.org/c/zuul/zuul/+/916744 19:42
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 929189: Remove ZUUL_REMOTE_IP from nox-remote job https://review.opendev.org/c/zuul/zuul/+/929189 20:44
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 929189: Set ZUUL_REMOTE_IP to private ipv4 nox-remote job https://review.opendev.org/c/zuul/zuul/+/929189 20:55
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 929189: Set ZUUL_REMOTE_IP to private ipv4 nox-remote job https://review.opendev.org/c/zuul/zuul/+/929189 23:23
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 916867: Implement admin actions (promote, dequeue) in new QueueItem component https://review.opendev.org/c/zuul/zuul/+/916867 23:23
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 916973: Show queue lengths and fetching state on status page https://review.opendev.org/c/zuul/zuul/+/916973 23:23
