Friday, 2022-09-09

-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 856246: Don't run cleanup playbooks after setup failure https://review.opendev.org/c/zuul/zuul/+/856246		08:52
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 855890: Don't try to report build w/o buildset to DB https://review.opendev.org/c/zuul/zuul/+/855890		08:55
@avass:vassast.org	Clark: yeah looks like the governor was unregistering/reregistering a lot yesterday before we added another two executors, which also seem to have improved performance. But shouldn't that show up in statsd as fewer executors accepting work?	09:46
@avass:vassast.org	hmm maybe they did but the resolution wasn't good enough	09:47
@fungicide:matrix.org	Albin Vass: you can see it on ours, but we seem to have roughly 20-second granularity for the graphs: https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&viewPanel=21	11:32
@fungicide:matrix.org	i think most of the time it's the "too many builds starting" governor in our case	11:33
@tristanc_:matrix.org	corvus: swest thanks, that looks great!	11:53
@avass:vassast.org	fungi: yeah i was mostly surprised by the difference it made even though we always had at least three executors accepting jobs	13:04
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 849887: Add Nodepool in Zuul spec https://review.opendev.org/c/zuul/zuul/+/849887		15:49
@pearcetyler:matrix.org	Any idea what would cause gearman to fail to start? I keep getting timeouts in the web container waiting for gearman to start	16:30
@clarkb:matrix.org	In older zuul (note current zuul doesn't use gearman anymore) the default is that the scheduler forks off a gearman daemon. Maybe the scheduler isn't starting, or something is already listening on port 4730 and preventing geard from doing so	16:31
@pearcetyler:matrix.org	Hmm, the scheduler seems to be starting but is hitting some errors like `2022-09-09 16:32:10,888 ERROR zuul.Scheduler: Exception loading ZKObject <zuul.model.PipelineSummary object at 0x7f649c06ad40> at /zuul/tenant/github-junipersquare/pipeline/gate/status`	16:32
@clarkb:matrix.org	I think the geard fork happens well before that	16:37
@clarkb:matrix.org	Here's an idea, is your scheduler much newer than your web? it may be that you've upgraded the scheduler to the point where it doesn't run gear anymore because it was removed. But if the web is older it may still be looking for it	16:38
@pearcetyler:matrix.org	Ah interesting. It just started after a `docker system prune -a --volumes` and pulling everything back down via `sudo -E docker-compose -p zuul up`. I wonder if that caused things to get out of sync?	16:39
@clarkb:matrix.org	you should be able to do a `docker image list` to see what versions of containers you've got. Or exec into the containers and check the `pip freeze` versions	16:41
@gobi_g:matrix.org	Hi,	18:49
Can anyone explain how the pipeline window and window-floor work?
Thanks
@clarkb:matrix.org	karthi: it is similar to tcp slow start. The idea is that as more changes merge, the window size increases, allowing more changes to run their jobs concurrently. When changes fail to merge the window is reduced. The idea behind it is that when the software is more stable it is safe to speculatively assign more test resources to testing changes. But when things are unstable you want it to be more limited, as a gate reset discards every live job behind that failure	19:02
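A minimal sketch of the grow-on-merge, shrink-on-failure behaviour described above. It is a toy model, not Zuul's code: the starting size of 20 and floor of 3 come from the conversation below, while the `QueueWindow` class, the additive step of 1, and the halving on failure are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueueWindow:
    """Toy model of a gate queue's window (hypothetical, not Zuul's code)."""
    window: int = 20          # starting size mentioned in the chat
    floor: int = 3            # window-floor mentioned in the chat
    increase_by: int = 1      # assumed additive growth per merged change
    decrease_factor: int = 2  # assumed divisor applied on each failure

    def on_success(self) -> None:
        # A change merged: allow one more change to be tested concurrently.
        self.window += self.increase_by

    def on_failure(self) -> None:
        # A change failed to merge: shrink, but never below the floor.
        self.window = max(self.floor, self.window // self.decrease_factor)


w = QueueWindow()
for _ in range(6):
    w.on_failure()
print(w.window)  # the window is pinned at the floor after repeated failures
```

Under these assumptions, a handful of consecutive failures is enough to take the window from 20 down to the floor, and it only climbs back one step per merged change.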
@clarkb:matrix.org	ianw: left a couple of questions/notes/concerns on https://review.opendev.org/c/zuul/zuul/+/855309	19:06
@gobi_g:matrix.org	Thanks for the info. Does the window mean one pipeline?	19:22
In the doc it mentioned the default window size is 20. Window floor 3.
But I can't run more than 3 pipelines at a single time.
@clarkb:matrix.org	> <@gobi_g:matrix.org> Thanks for the info. Does the window mean one pipeline?	19:24
>
> In the doc it mentioned the default window size is 20. Window floor 3.
> But I can't run more than 3 pipelines at a single time.
That implies to me that you've had enough failures that the window size has shrunk to the floor. If you want to force a larger number I would increase the floor. I think we set ours to 20 so it can grow from the default but not shrink smaller.
@gobi_g:matrix.org	But I was able to run 3 pipelines every time even though my previous 6 pipelines failed. Is it because of the window-floor set to 3?	19:28
@gobi_g:matrix.org	Is there a way to check the current available window value? Is there any API or command	19:29
@clarkb:matrix.org	> <@gobi_g:matrix.org> Is there a way to check the current available window value? Is there any API or command	19:31
It is in the status.json file and the web status rendering shows it using different colored bubbles on the left side of the changes. The floor is 3 so it started at 20 then shrunk to 3 as the minimum size due to the previous failures. You can configure the floor to better suit your needs
@gobi_g:matrix.org	Okay I got it now. So, the floor is max pipelines you can run at the time of continuous failures or more failures.	19:34
@clarkb:matrix.org	> <@gobi_g:matrix.org> Okay I got it now. So, the floor is max pipelines you can run at the time of continuous failures or more failures.	19:37
Yes, the window is a dynamic size based on the current state of affairs. As with tcp, where the idea is to maximize bandwidth utilization without overfilling the pipe, with zuul it's to ensure efficient use of resources to merge changes. You don't want to run jobs for 20 changes and have them restart continuously; that wastes a lot of resources
@gobi_g:matrix.org	Thank you. But in status.json I can't find the window details.	19:40
@gobi_g:matrix.org	Thanks for the detailed explanation 🙂. Is there a way to notify an admin when the window size has decreased to the window-floor, to alert on continuous failures?	19:49
@clarkb:matrix.org	I don't think that is built in currently. You could poll it and trigger an alert off of that.	19:54
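A rough sketch of the "poll it and trigger an alert" idea. It only covers the checking step, run against an already-parsed status document. The helper `queues_at_floor`, the sample data, and the field names `pipelines`, `change_queues`, `name` and `window` are all assumptions about the status JSON layout; verify them against your own deployment's output before relying on this.

```python
import json
from typing import Iterator, Tuple


def queues_at_floor(status: dict, floor: int) -> Iterator[Tuple[str, str, int]]:
    """Yield (pipeline, queue, window) for queues whose window is <= floor.

    The key names used here are assumed, not confirmed by the chat.
    """
    for pipeline in status.get("pipelines", []):
        for queue in pipeline.get("change_queues", []):
            window = queue.get("window")
            # Skip queues that report no window (missing, None, or 0).
            if window and window <= floor:
                yield pipeline.get("name", "?"), queue.get("name", "?"), window


# Hypothetical sample standing in for a fetched status document.
SAMPLE = json.loads("""
{"pipelines": [
  {"name": "gate", "change_queues": [
    {"name": "integrated", "window": 3},
    {"name": "other", "window": 20}]},
  {"name": "check", "change_queues": [{"name": "x", "window": 0}]}
]}
""")

for pipeline, queue, window in queues_at_floor(SAMPLE, floor=3):
    print(f"ALERT: {pipeline}/{queue} window is down to {window}")
```

In practice the dict would come from fetching the tenant's status JSON on a timer, and the `print` would be replaced by whatever notification channel is in use.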
@gobi_g:matrix.org	Okay. During the failures zuul adds a comment in Gerrit. Is there an option to send an email? Or to integrate other tools and notify through them?	19:59
@jim:acmegating.com	there are several other reporters available, including email, mqtt, or elasticsearch, any of which could be used for notification	20:00
@jim:acmegating.com	this doesn't exist, but adding a metric for window size would be a good idea i think.	20:01
@gobi_g:matrix.org	Could you please share the reference or doc link for the reporters 😅	20:02
@jim:acmegating.com	karthi: https://zuul-ci.org/docs/zuul/latest/drivers/index.html	20:02
@gobi_g:matrix.org	Thank you	20:06
@gobi_g:matrix.org	Is it okay to use the same label in different projects' pipelines?	20:13
Project1:
Gate1
Project2:
Gate2
The issue is that when I trigger an MR for project 1 it shows in both gate1 and gate2 for 2 seconds, then it disappears from gate2
@clarkb:matrix.org	I think that is a side effect of how zuul evaluates whether or not a change applies to a pipeline	20:14
@clarkb:matrix.org	But yes in opendev tenants there is a single gate pipeline defined for everything in the tenant. This is actually important for cross repo gating	20:14
@gobi_g:matrix.org	Okay. So pipelines need their own label to prevent this.	20:17
@clarkb:matrix.org	No, I think there may not be a way to prevent it currently	20:17
@clarkb:matrix.org	It's just rendering that each change is evaluated by each pipeline and when it doesn't apply it gets removed before doing further work	20:18
@gobi_g:matrix.org		20:20
Project1:
Pipeline: Gate1
Label: Gate1
Project2:
Pipeline: Gate2
Label: Gate2
If I configure like this it won't show the gate2 change in gate1 pipeline right?
@clarkb:matrix.org	It will still show momentarily while gate1 and gate2 both evaluate if the change applies to them. I would define a single gate pipeline instead of multiple	20:22
@gobi_g:matrix.org	Okay. In my case project1 and project2 are not related to each other; they have different kinds of test cases and different test setups.	20:23
@clarkb:matrix.org	That's fine. The pipeline configuration should express when jobs run not what kind of test or relationships between repos	20:24
@clarkb:matrix.org	The relationships between repos are set via queues and the jobs are controlled by individual projects	20:24
@gobi_g:matrix.org	Yeah got your point. My case is like two different products. If I set them in a single pipeline, product 1 will get affected because of product 2 issues, that's why.	20:27
@fungicide:matrix.org	karthi: for example, if you look at the gate pipeline shown here at the moment there are related and unrelated projects in a single gate pipeline, but only the related ones are forming sequenced queues, the others merge independently: https://zuul.opendev.org/t/openstack/status	20:28
@clarkb:matrix.org	> <@gobi_g:matrix.org> Yeah got your point. My case is like two different products. If I set it in a single pipeline because of product 2 issues product 1 will get affected that's why.	20:28
That isn't true unless you want them to have different trigger definitions
@clarkb:matrix.org	You only need a different pipeline if the trigger conditions are different	20:29
@fungicide:matrix.org	for example, keystone and swift share a dependent queue called "integrated" but the puppet-neutron change is in its own queue since it has no queue shared with the other projects showing there	20:29
@fungicide:matrix.org	the two python-cinderclient changes have formed a queue together though because they're for the same project, so they implicitly share a queue for that project	20:30
@clarkb:matrix.org	I suppose it is the reporter actions that matter too	20:31
@clarkb:matrix.org	Different pipelines when triggers or reporting actions differ. Otherwise you can share	20:31
@fungicide:matrix.org	and if the concern is about windows, the window calculations are independent per queue so if one queue in the gate pipeline has a lot of failures its window will shrink, but another queue in the same pipeline may be succeeding a lot and so its window could be growing at the same time	20:34
@fungicide:matrix.org	i guess the thing that may not be clear is that queues form within a dependent pipeline, but you can have multiple queues in one pipeline	20:35
@gobi_g:matrix.org	Okay. My doubt was	20:36
If I'm triggering 2 MRs for product 1 (project 1) that always pass
MRs for product 2 (project 2) always fail
In this case will it affect future MRs of project 1 because of the window size shrinking?
@gobi_g:matrix.org	You explained it nice 👍	20:37
@fungicide:matrix.org	if you've declared that those projects should share a queue then yes, but just using the same pipeline definition doesn't mean they share a queue	20:37
@fungicide:matrix.org	if they share a queue they will both affect the window sizing for their shared queue. if they don't share a queue then their window sizes are determined independently. but they can both still use the same pipeline	20:38
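To make the shared-versus-separate queue point concrete, here is a toy simulation of the scenario asked about above (project 1 always passes, project 2 always fails). It is not Zuul's implementation: the `run` helper, the queue names, the +1 growth, and the halving on failure are assumptions; only the starting size of 20 and floor of 3 come from the chat.

```python
START = 20  # starting window mentioned in the chat
FLOOR = 3   # window-floor mentioned in the chat


def run(events, queue_of):
    """Replay (project, merged) events and return the final window per queue."""
    windows = {}
    for project, merged in events:
        queue = queue_of[project]
        current = windows.get(queue, START)
        # Assumed rule: +1 on a merge, halve on a failure, never below FLOOR.
        windows[queue] = current + 1 if merged else max(FLOOR, current // 2)
    return windows


# Project 1 always merges, project 2 always fails, interleaved four times.
events = [("project1", True), ("project2", False)] * 4

separate = run(events, {"project1": "q1", "project2": "q2"})
shared = run(events, {"project1": "shared", "project2": "shared"})

print(separate)  # each queue's window moves on its own
print(shared)    # one window absorbs both projects' results
```

With separate queues, project 2's failures leave project 1's window untouched; with a shared queue, both projects feed the same window, so project 2's failures drag it down for project 1 as well.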
@fungicide:matrix.org	if you have projects which are fairly tightly coupled (perhaps you run integration test jobs which install them both and then test they work together) then you'll usually want them to share a queue. if they're fairly independently-developed projects which only maybe consume strictly versioned releases of one another, then it's common for them to have their own separate queues	20:41
@gobi_g:matrix.org	Got it. From your explanation it looks like we can even have a single pipeline for independently developed projects, with proper job configuration and separate queues.	20:45
@gobi_g:matrix.org	When both projects need the same kind of trigger*	20:46
@fungicide:matrix.org	exactly... if they share triggering patterns and reporting, then having the same pipeline for them is sensible. that openstack tenant i pasted the link to has many hundreds of different repositories, some of which share queues and many of which don't, but we have basic archetypes of pipeline definitions they all use like check, gate, experimental, post, promote, release, deploy, periodic...	21:00
@gobi_g:matrix.org	fungi: Clark: Thanks for all the detailed explanations🎉	21:11
@fungicide:matrix.org	any time!	21:11
