*** dmsimard0 is now known as dmsimard | 00:25 | |
jhesketh | mhu: no worries, I need to revisit that series but I've been a little snowed under recently :-s | 00:30 |
*** mattw4 has quit IRC | 00:37 | |
*** michael-beaver has quit IRC | 00:43 | |
openstackgerrit | Joshua Hesketh proposed zuul/zuul master: Expose ansible_date_time instead of date_time https://review.opendev.org/666268 | 00:44 |
SpamapS | hrm | 00:49 |
SpamapS | I wish there was a clear way to say "why didn't this event trigger that pipeline?" | 00:50 |
SpamapS | Like, I just had to dig through hundreds of lines of logs to find Exception: Project GoodMoney/funnel-cake is not allowed to run job promote-funnel-cake | 00:53 |
SpamapS | I wish that would somehow post to the PR | 00:54 |
SpamapS | (or maybe get recorded as a failed build) | 00:54 |
SpamapS | Does look like it was recorded as a CONFIG_ERROR buildset | 00:54 |
SpamapS | but there's no link to the text | 00:54 |
fungi | figuring out which things to report that about is where i get stuck | 01:00 |
SpamapS | Ultimately I think it belongs somewhere in zuul's database. | 01:01 |
fungi | there are so many things zuul chooses not to run that it seems like it would be very overwhelming | 01:01 |
fungi | but yeah, maybe database reporter | 01:01 |
SpamapS | The "don't run because not matching" is fine. CONFIG_ERROR though, is important. | 01:01 |
SpamapS | That error message is very clear, and the user will know what to do with it. | 01:02 |
SpamapS | So, it belongs in the user's hands. | 01:02 |
fungi | and that config error isn't reported with the configuration errors in the status interface (via the "bell" icon)? | 01:11 |
SpamapS | fungi: no, it's detected at runtime. | 01:15 |
SpamapS | (though from what I see, it *could* be detected at config parse time) | 01:15 |
SpamapS | If you try to add a project stanza that isn't allowed, you should get an error. | 01:16 |
SpamapS | Instead it lands just fine, and then at the time where it tries to run the job, it fails the allowed projects check. | 01:16 |
SpamapS | I haven't looked closely though, there may be runtime circumstances that make it allowed. | 01:16 |
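The restriction SpamapS hit can be declared on the job itself; a minimal sketch, reusing the hypothetical project and job names from the exception quoted above:

```yaml
# A job restricted to one project: a project stanza elsewhere that adds
# this job will fail the allowed-projects check at run time, as described.
- job:
    name: promote-funnel-cake
    allowed-projects:
      - GoodMoney/funnel-cake
```

Only changes to GoodMoney/funnel-cake may then run the job; any other project attaching it triggers the "is not allowed to run job" error discussed here.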
*** sanjayu__ has joined #zuul | 01:26 | |
*** spsurya has joined #zuul | 01:39 | |
*** rlandy|bbl is now known as rlandy | 01:42 | |
pabelanger | jlk: Did you get a chance to see about github3.py release process? | 02:01 |
*** rlandy has quit IRC | 02:44 | |
*** jamesmcarthur has joined #zuul | 02:47 | |
*** bhavikdbavishi has joined #zuul | 03:23 | |
*** bhavikdbavishi1 has joined #zuul | 03:26 | |
*** bhavikdbavishi has quit IRC | 03:27 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:28 | |
*** jamesmcarthur has quit IRC | 03:35 | |
*** jamesmcarthur has joined #zuul | 03:42 | |
*** jamesmcarthur has quit IRC | 03:47 | |
*** jamesmcarthur has joined #zuul | 03:55 | |
*** sanjayu__ has quit IRC | 04:22 | |
*** jamesmcarthur has quit IRC | 04:26 | |
*** jamesmcarthur has joined #zuul | 04:34 | |
*** bhavikdbavishi has quit IRC | 04:49 | |
*** jamesmcarthur has quit IRC | 05:04 | |
*** jamesmcarthur has joined #zuul | 05:16 | |
*** jamesmcarthur has quit IRC | 05:21 | |
*** jamesmcarthur has joined #zuul | 05:27 | |
*** jamesmcarthur has quit IRC | 05:31 | |
*** raukadah is now known as chandankumar | 05:40 | |
*** jamesmcarthur has joined #zuul | 05:47 | |
*** jamesmcarthur has quit IRC | 05:52 | |
*** pcaruana has joined #zuul | 06:15 | |
*** jamesmcarthur has joined #zuul | 06:20 | |
*** jamesmcarthur has quit IRC | 06:30 | |
*** saneax has joined #zuul | 06:45 | |
*** jamesmcarthur has joined #zuul | 06:45 | |
*** jamesmcarthur has quit IRC | 06:52 | |
*** themroc has joined #zuul | 07:09 | |
*** jamesmcarthur has joined #zuul | 07:21 | |
*** bhavikdbavishi has joined #zuul | 07:32 | |
*** jamesmcarthur has quit IRC | 07:33 | |
*** bhavikdbavishi1 has joined #zuul | 07:35 | |
*** bhavikdbavishi has quit IRC | 07:36 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 07:36 | |
*** jamesmcarthur has joined #zuul | 07:44 | |
*** jpena|off is now known as jpena | 07:46 | |
*** jamesmcarthur has quit IRC | 07:48 | |
*** jamesmcarthur has joined #zuul | 07:50 | |
*** jamesmcarthur has quit IRC | 07:57 | |
*** jamesmcarthur has joined #zuul | 07:58 | |
*** bhavikdbavishi has quit IRC | 08:06 | |
*** saneax has quit IRC | 08:06 | |
*** sanjayu_ has joined #zuul | 08:06 | |
*** bhavikdbavishi has joined #zuul | 08:06 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add command processor to zuul-web https://review.opendev.org/666307 | 08:10 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add repl server for debug purposes https://review.opendev.org/579962 | 08:10 |
*** bhavikdbavishi has quit IRC | 08:13 | |
*** zbr has joined #zuul | 08:17 | |
*** igordc has quit IRC | 08:20 | |
*** yolanda has quit IRC | 08:54 | |
*** jamesmcarthur has quit IRC | 08:55 | |
*** jamesmcarthur has joined #zuul | 08:59 | |
*** jamesmcarthur has quit IRC | 09:08 | |
*** jamesmcarthur has joined #zuul | 09:18 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Pin to openshift <= 0.8.9 https://review.opendev.org/666526 | 09:31 |
*** jamesmcarthur has quit IRC | 09:36 | |
openstackgerrit | Jean-Philippe Evrard proposed zuul/zuul-jobs master: Dockerhub now returns 200 for DELETEs https://review.opendev.org/666529 | 09:37 |
*** jamesmcarthur has joined #zuul | 09:45 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Add Bitbucket Server source functionality https://review.opendev.org/657837 | 09:47 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Create a basic Bitbucket build status reporter https://review.opendev.org/658335 | 09:47 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Create a basic Bitbucket event source https://review.opendev.org/658835 | 09:47 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Upgrade formatting of the patch series. https://review.opendev.org/660683 | 09:47 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 09:47 |
*** bhavikdbavishi has joined #zuul | 09:55 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 09:59 |
*** jamesmcarthur has quit IRC | 10:00 | |
*** gtema has joined #zuul | 10:07 | |
*** gtema has quit IRC | 10:16 | |
*** gtema has joined #zuul | 10:16 | |
openstackgerrit | Jean-Philippe Evrard proposed zuul/zuul-jobs master: Dockerhub now returns 200 for DELETEs https://review.opendev.org/666529 | 10:20 |
*** electrofelix has joined #zuul | 10:23 | |
*** electrofelix has quit IRC | 10:27 | |
*** electrofelix has joined #zuul | 10:27 | |
*** avass has joined #zuul | 10:35 | |
*** NBorg has joined #zuul | 10:36 | |
*** avass has quit IRC | 10:44 | |
*** jamesmcarthur has joined #zuul | 10:45 | |
*** jamesmcarthur has quit IRC | 10:50 | |
*** jamesmcarthur has joined #zuul | 10:50 | |
*** jpena is now known as jpena|lunch | 10:59 | |
openstackgerrit | Alexander Braverman proposed zuul/nodepool master: Openshift client https://review.opendev.org/666541 | 10:59 |
*** jamesmcarthur has quit IRC | 11:06 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 11:08 |
*** NBorg has quit IRC | 11:08 | |
*** jamesmcarthur has joined #zuul | 11:09 | |
*** rfolco_off has joined #zuul | 11:20 | |
*** rfolco_off is now known as rfolco | 11:21 | |
*** rlandy has joined #zuul | 11:30 | |
*** gtema has quit IRC | 11:30 | |
*** rlandy is now known as rlandy|afk | 11:33 | |
flaper87 | tobiash: thanks for the hint on using statefulsets. That worked | 11:35 |
*** rfolco is now known as rfolco_pto | 11:36 | |
tobiash | :) | 11:38 |
*** hashar has joined #zuul | 12:00 | |
*** bhavikdbavishi has quit IRC | 12:12 | |
*** jpena|lunch is now known as jpena | 12:17 | |
*** themroc has quit IRC | 12:17 | |
*** rfolco_pto has quit IRC | 12:35 | |
ofosos | My builds seem to be stuck :( | 12:35 |
*** themroc has joined #zuul | 12:37 | |
*** rfolco has joined #zuul | 12:42 | |
*** jamesmcarthur has quit IRC | 12:42 | |
*** NBorg_ has joined #zuul | 13:00 | |
*** pcaruana has quit IRC | 13:00 | |
*** pcaruana has joined #zuul | 13:00 | |
pabelanger | morning, I'm having an issue with nodepool 3.7.0: http://paste.openstack.org/show/753220/ | 13:19 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 13:19 |
pabelanger | zuul-maint: ^ | 13:20 |
pabelanger | I'm rolling back, and going to debug in a few minutes | 13:21 |
pabelanger | openshift==0.9.0 | 13:28 |
pabelanger | that got pulled in | 13:28 |
pabelanger | I suspect something changed in their API | 13:28 |
*** spsurya has quit IRC | 13:29 | |
Shrews | pabelanger: there are at least two changes up to fix that but I’m not really here today to review | 13:31 |
pabelanger | okay, thanks | 13:31 |
pabelanger | https://review.opendev.org/666526/ | 13:32 |
pabelanger | cool, thanks fungi / tobiash | 13:32 |
pabelanger | okay, downgrading openshift, has fixed the issue | 13:33 |
pabelanger | we maybe should consider a 3.7.1 release to pick that up | 13:33 |
*** rlandy|afk is now known as rlandy | 13:40 | |
*** rfolco has quit IRC | 13:58 | |
openstackgerrit | Merged zuul/nodepool master: Pin to openshift <= 0.8.9 https://review.opendev.org/666526 | 13:59 |
*** sanjayu_ has quit IRC | 13:59 | |
*** sanjayu_ has joined #zuul | 14:12 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 14:15 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Add missing start-message in pipeline config schema https://review.opendev.org/665936 | 14:22 |
*** jamesmcarthur has joined #zuul | 14:22 | |
*** felixgcb has joined #zuul | 14:26 | |
felixgcb | hey :) have any of you guys ever tried to use "include_role" from one untrusted project to call a role in another untrusted project? It works locally for me, but on the zuul-executor it just gives me an immediate "ok" result and proceeds.. | 14:29 |
fungi | felixgcb: is that other project in the roles list for the job? | 14:30 |
fungi | zuul-maint: there's a very lengthy log of me basically talking to myself over in #openstack-infra where it seems like we ended up perpetually locking a node for a paused build when the change was rescheduled (though i can't be certain)... it picks up around: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-06-20.log.html#t2019-06-20T13:54:01 | 14:33 |
fungi | if anybody has further ideas of things i should check (and how i should cleanly release that lock), it would be much appreciated | 14:33 |
*** sanjayu_ has quit IRC | 14:35 | |
felixgcb | fungi: It wasn't explicitly defined in the projects job.roles setting, but it should be read automatically, since it is a role in the roles directory. I tried adding it explicitly but that also didn't work.. | 14:39 |
fungi | felixgcb: you're talking about the project name, right? like this: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L18-L19 | 14:41 |
felixgcb | fungi: ah this section is really helpful. is "zuul" the repository which contains that job? | 14:42 |
fungi | zuul/zuul-jobs is the repository which contains roles we want this particular job to also use | 14:43 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Add missing doc for pipeline start-message https://review.opendev.org/665930 | 14:43 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Add support for item.change for pipeline start-message formater https://review.opendev.org/665968 | 14:43 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Add change replacement field in doc for start-message https://review.opendev.org/665974 | 14:43 |
corvus | fungi: locks should be released when the scheduler restarts | 14:43 |
fungi | felixgcb: you have to tell zuul for a given job which other projects you want it to search for roles | 14:43 |
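The linked example boils down to listing the role-providing repository on the job; a minimal sketch, with the job name hypothetical:

```yaml
# Make roles from another repository available to this job's playbooks,
# so include_role can resolve them on the executor.
- job:
    name: my-base-job
    roles:
      - zuul: zuul/zuul-jobs
```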
*** zbr is now known as zbr|ruck | 14:44 | |
fungi | corvus: but there's no (easy) way to manually tell the scheduler to release a lock for a build which has been (or was supposed to be) cancelled so that nodepool will clean up the node? | 14:44 |
fungi | corvus: is the best course of action just to manually delete the node from outside zuul/nodepool and ignore the locked node entry in nodepool until the zuul scheduler is next restarted? | 14:46 |
*** michael-beaver has joined #zuul | 14:46 | |
corvus | fungi: i guess it depends on the goal -- if the zk entry is still there, nodepool will reserve space for it in quota calculations, so deleting the node won't necessarily get the quota back | 14:47 |
corvus | fungi: wearing my opendev hat, i'd say "one node doesn't matter, just leave it until next restart" | 14:48 |
fungi | one goal is to comply with a request from the provider's support staff to clean up an apparently unused instance | 14:48 |
fungi | since they were the ones who brought it to our attention | 14:49 |
corvus | fungi: sure, then delete it out from under nodepool | 14:49 |
fungi | just wanting to make sure i'm not going to cause more problems by deleting it out of band when there's a zuul lock on it | 14:49 |
fungi | the bigger reason i brought it up in here was to better postulate about what could cause zuul to indefinitely hold a node lock associated with a build which seems to have been cancelled | 14:50 |
tobiash | corvus, fungi: I also noticed rare node leaks which I haven't got to analyze so far | 14:51 |
corvus | it should be possible to find the exact sequence that caused that in the logs | 14:51 |
tobiash | my first quick log analysis a few weeks ago showed nothing special, but I'll also have a deeper look | 14:53 |
pabelanger | corvus: if we do nodepool 3.7.1 (for openshift pin), do we need to add reno note to generate something for releasenotes page? | 14:53 |
fungi | would log messages about cancellation of a build not include the build uuid (and so that's why i'm not finding them)? or maybe did we never actually cancel the build? | 14:54 |
corvus | pabelanger: yes; we should always have at least one release note (otherwise, why did we make a release?) | 14:54 |
tobiash | but that analysis was before we annotated the logs | 14:54 |
pabelanger | corvus: ack | 14:54 |
corvus | tobiash: it was harder before, but still possible :) | 14:55 |
tobiash | I'm sure it's possible, I just had more pressing issues last weeks | 14:56 |
tobiash | We leak around 20 nodes per week atm, and a temporary workaround is deleting the znode | 14:58 |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: Add release note about pinning openshift client https://review.opendev.org/666605 | 14:59 |
corvus | it would be really good to find and fix that leak | 14:59 |
flaper87 | does zuul have a concept of stages? Something that would allow for building "artifacts" in one stage and then reuse them in other jobs? Something like https://docs.gitlab.com/ee/ci/yaml/#stages | 15:09 |
openstackgerrit | Merged zuul/zuul-jobs master: Dockerhub now returns 200 for DELETEs https://review.opendev.org/666529 | 15:09 |
pabelanger | flaper87: Yup! you'll want to checkout the recent work corvus has been doing around that | 15:10 |
pabelanger | (trying to find docs) | 15:11 |
flaper87 | sweeeeet | 15:12 |
pabelanger | flaper87: https://zuul-ci.org/docs/zuul-jobs/docker-image.html has some info about how containers would work | 15:13 |
corvus | flaper87: see https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies to have one job depend on another job | 15:14 |
*** pcaruana has quit IRC | 15:14 | |
corvus | flaper87: https://zuul-ci.org/docs/zuul/user/jobs.html#return-values can be used to pass information to dependent jobs | 15:15 |
*** themroc has quit IRC | 15:15 | |
corvus | flaper87: and https://zuul-ci.org/docs/zuul/user/jobs.html#var-zuul.artifacts can be used to extend depends-on behavior to artifacts (so you can have a change in one project depend on a built artifact in another project) | 15:16 |
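Put together, a stage-like pipeline of jobs might be sketched as follows (job names, artifact name, and URL are hypothetical):

```yaml
# Job dependency: use-artifact starts only after build-artifact succeeds.
- job:
    name: build-artifact

- job:
    name: use-artifact
    dependencies:
      - build-artifact
```

A playbook in the build job can then record what it produced via the zuul_return module, making it available to dependent jobs and to Depends-On artifact handling:

```yaml
# In a build-artifact playbook: register the produced artifact.
- hosts: localhost
  tasks:
    - zuul_return:
        data:
          zuul:
            artifacts:
              - name: release tarball
                url: "artifacts/release.tar.gz"
```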
*** hashar has quit IRC | 15:17 | |
flaper87 | niiiice! Thanks, I'll read all those and come back with questions | 15:19 |
tobiash | corvus: hrm, we seem to be missing a relation between job name and build uuid in the logs | 15:24 |
corvus | tobiash: hrm, we used to have it when we launched a build | 15:25 |
corvus | tobiash: zuul/executor/client.py: "Execute job %s (uuid: %s) on nodes %s for change %s " | 15:26 |
tobiash | corvus: oh found it | 15:26 |
tobiash | yeah that | 15:26 |
tobiash | so apparently the job that leaked a node never shows this message | 15:26 |
tobiash | last thing for the job that leaked is froze job and completed node request | 15:28 |
flaper87 | corvus: pabelanger thanks! sounds like all the building blocks for what I need are there | 15:28 |
flaper87 | nice | 15:28 |
tobiash | so it might have gotten dequeued somewhere between node lock and 'execute job' | 15:29 |
corvus | fungi: was our node also missing that line ^ ? | 15:30 |
tobiash | fungi: it's a little bit hard to correlate, you could search first for the node, then the node-request which tells you the job and then filter for event id and job name | 15:31 |
felixgcb | fungi: thank you so much :) now it works. | 15:33 |
fungi | corvus: sorry, in two meetings simultaneously at the moment. pretty sure i saw the scheduler log that in my case, yes... checking | 15:33 |
fungi | ahh, no it was the entry about the executor accepting the build request i was thinking of... the one you're asking about would be logged by the builder? | 15:35 |
*** jamesmcarthur has quit IRC | 15:36 | |
corvus | fungi: that should show up in the scheduler debug log | 15:37 |
fungi | er, yeah i meant s/builder/executor/ but i was looking at the normal scheduler log not the debug log | 15:40 |
fungi | just a sec | 15:40 |
*** electrofelix has quit IRC | 15:44 | |
fungi | well, more than a sec | 15:45 |
fungi | power outage here, but also grepping our scheduler debug log is not fast | 15:45 |
tobiash | corvus, fungi: the job that leaked is making use of skipping child jobs | 15:45 |
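"Skipping child jobs" refers to the child_jobs job return value; a minimal sketch of a playbook task that skips all children:

```yaml
# Returning an empty child_jobs list tells Zuul to skip
# every job that depends on this one.
- hosts: localhost
  tasks:
    - zuul_return:
        data:
          zuul:
            child_jobs: []
```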
*** jamesmcarthur has joined #zuul | 15:53 | |
fungi | ooh, we've got a bunch of useful info in the scheduler debug log i'm surprised didn't percolate into the normal service log | 15:54 |
corvus | we should fix that | 15:54 |
fungi | agreed, just figuring out which ones should get elevated | 15:54 |
fungi | apparently the scheduler was trying to cancel the build but couldn't find it in the queue and thought it wasn't started | 15:55 |
*** fdegir8 has joined #zuul | 15:55 | |
*** kklimonda_ has joined #zuul | 15:55 | |
corvus | mordred: i've run into an error with my new nodepool functional job, can i get your help debugging this: http://paste.openstack.org/show/753227/ (i've set an autohold on that node, so it should be available for us to ssh into in a bit) | 15:56 |
fungi | but yeah, the log entry you were asking about is present in this case: | 15:56 |
*** jamesmcarthur has quit IRC | 15:56 | |
fungi | 2019-06-20 04:53:52,464 INFO zuul.ExecutorClient: [e: 9b44cc146cd649e585ff229c0a8a296b] Execute job swift-upload-image (uuid: 395c781799ea452c8f639d3037f8de0f) on nodes <NodeSet [<Node 0008103245 ('ubuntu-bionic',):ubuntu-bionic>]> for change <Change 0x7fa6a08caf60 openstack/swift 665487,2> [...] | 15:56 |
corvus | then it seems we may have a different situation than tobiash | 15:56 |
fungi | seems so, yes | 15:56 |
*** jamesmcarthur has joined #zuul | 15:56 | |
*** persia_ has joined #zuul | 15:57 | |
tobiash | hrm, two node leaks? | 15:57 |
tobiash | bad | 15:57 |
fungi | this is probably the critical element in our scenario: | 15:57 |
fungi | 2019-06-20 04:54:18,614 DEBUG zuul.ExecutorClient: [e: 9b44cc146cd649e585ff229c0a8a296b] Response to cancel build request: b'ERR UNKNOWN_JOB' | 15:57 |
*** sean-k-mooney1 has joined #zuul | 15:58 | |
*** mattw4 has joined #zuul | 15:58 | |
fungi | the build cancellation event was logged at 04:54:18 a few lines before that | 16:00 |
fungi | so ~36 seconds after the job execution event | 16:01 |
fungi | strangely, it seems to have continued on the executor, which i guess never got asked to abort | 16:02 |
fungi | and at 05:02:00 the executor paused the build | 16:02 |
*** pcaruana has joined #zuul | 16:02 | |
fungi | so maybe a race around cancelling a job moments after execution? | 16:02 |
*** kklimonda has quit IRC | 16:03 | |
*** fdegir has quit IRC | 16:03 | |
*** persia has quit IRC | 16:03 | |
*** sean-k-mooney has quit IRC | 16:03 | |
*** kklimonda_ is now known as kklimonda | 16:03 | |
*** jangutter has quit IRC | 16:03 | |
fungi | hrm, but the first reference to that build uuid in the executor debug log was for picking up the corresponding executor:execute request from gearman at 04:53:52, then starting the ssh agent at 04:53:53... so the job was well under way when the cancellation happened | 16:09 |
tobiash | I found the sequence of my scenario: https://etherpad.openstack.org/p/EAvGRON5P4 | 16:13 |
fungi | yeah, in our case the build seems to have been well into checking out git refs on the executor when the cancellation should have happened | 16:13 |
tobiash | corvus: that etherpad shows the sequence in my scenario and a proposal for a fix ^ | 16:17 |
fungi | in the log example above, is 'ERR UNKNOWN_JOB' being returned by gearman? i don't see where the executor ever responded to a cancellation | 16:18 |
ofosos | Hey ho, has anybody experience with running the OpenShift drivers to connect to an EKS cluster? | 16:20 |
*** hwangbo has joined #zuul | 16:23 | |
fungi | ofosos: i think SpamapS maybe tried out hooking nodepool to ekcs? not sure if it was with the openshift or kubernetes node driver | 16:24 |
fungi | sounded like authentication for ekcs is complicated | 16:24 |
hwangbo | Hi everyone. When we start up our Zuul server, the executor takes a long time (several hours) updating/resetting the repos in our gerrit server. Is there any way to shorten this time? | 16:24 |
pabelanger | hwangbo: sounds like maybe an IO issue? | 16:25 |
pabelanger | how many repos do you have, and I guess they are pretty large? | 16:25 |
hwangbo | We're using 3 repos, but they have dozens of branches with thousands of commits. Pretty large | 16:26 |
*** felixgcb has quit IRC | 16:28 | |
pabelanger | hwangbo: I'd try to collect stats on your disk, and try to figure out where the bottleneck is. IIRC, tobiash also had some IO issues recently. | 16:31 |
pabelanger | hwangbo: how many executors are you running? | 16:31 |
*** fdegir8 is now known as fdegir | 16:32 | |
corvus | fungi: yeah, 'err unknown job' is the gearman server telling the client that it doesn't know about the job it was requested to cancel | 16:32 |
hwangbo | We're using the Zuul quick start docker-compose, so it's just a single executor container | 16:32 |
fungi | corvus: should it have been in there, or is that a normal occurrence once an executor picks up the job? | 16:32 |
pabelanger | hwangbo: you may want to also scale out the executor, which should allow fewer jobs per executor, resulting in less IO too | 16:33 |
tobiash | hwangbo: note that the jobdir must be on the same mountpoint as the cache dir | 16:33 |
tobiash | hwangbo: do you already have this in your deployment https://review.opendev.org/665186 ? | 16:33 |
tobiash | this fixes the default in the docker-compose so this condition is met | 16:34 |
pabelanger | good point | 16:34 |
tobiash | (which has been merged in the last few days) | 16:34 |
hwangbo | tobiash: Nope, we don't have that change yet | 16:35 |
fungi | corvus: hrm, grep counts 46 occurrences in a 24 hour period, so i guess it's somewhat expected (or else this problem is more widespread) | 16:35 |
pabelanger | tobiash: mind a review on https://review.opendev.org/666605/ | 16:35 |
tobiash | pabelanger: done | 16:35 |
corvus | fungi: yes, the race is not unexpected, and, at least theoretically handled :) | 16:37 |
fungi | right, so there's more to it than just that | 16:37 |
corvus | it is certainly an interesting code path though | 16:37 |
fungi | presumably if a build is underway on an executor then it is expected that there is a corresponding gearman job, otherwise i'd expect waaaay more than a couple of these an hour | 16:38 |
tobiash | maybe connection issues with gearman? | 16:39 |
tobiash | fungi, corvus: fwiw I don't have a match to 'ERR UNKNOWN_JOB' in my logs in the last 15 days | 16:41 |
fungi | tobiash: in the scheduler debug log, right? | 16:41 |
tobiash | oh wait, the matching was wrong :/ | 16:41 |
tobiash | I lied, I see 5000 in the last 15 days | 16:42 |
tobiash | yes, scheduler debug log | 16:42 |
hwangbo | pabelanger: Is there any documentation on how to scale the executor? This is just for the server initialization, I thought there's only ever 1 "executor" container running. | 16:44 |
tobiash | so this looks quite normal | 16:44 |
corvus | hwangbo: update with the change tobiash mentioned first, then let us know if you still have problems | 16:45 |
*** panda has quit IRC | 16:50 | |
*** panda has joined #zuul | 16:52 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Defer setting build result to event queue https://review.opendev.org/666643 | 16:55 |
tobiash | this should fix the node leak in my scenario ^ | 16:56 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job https://review.opendev.org/665023 | 16:56 |
tobiash | corvus: is this worth a release note? | 16:57 |
tobiash | pabelanger: I'd appreciate a review on https://review.opendev.org/579962 and parent. Facilitates advanced debugging techniques in a running zuul | 16:59 |
corvus | tobiash: i don't feel strongly about a release note for that | 17:00 |
pabelanger | tobiash: sure, can look shortly | 17:00 |
pabelanger | contemplating zuul upgrade to 3.9.0 this afternoon :) | 17:01 |
corvus | tobiash: that will slow things down a bit, but i think we actually process events fairly quickly now, so it's probably okay (there was a time where result event processing took so long, half our system would be idle) | 17:01 |
corvus | tobiash: i think we'll just want to keep an eye out for efficiency changes related to that | 17:02 |
openstackgerrit | Merged zuul/nodepool master: Add release note about pinning openshift client https://review.opendev.org/666605 | 17:03 |
tobiash | corvus: yeah, I also thought about that but at least freeing the nodes is at the same stage | 17:05 |
pabelanger | nodepool 3.7.1 should be ready now, if we want to tag with ^ | 17:05 |
tobiash | corvus: or do you think we need a different solution? | 17:05 |
pabelanger | We are already running the pre-release version for zuul.a.c, without issues | 17:05 |
corvus | tobiash: this feels like the more correct one, and is simpler; so i think it's worth trying | 17:06 |
tobiash | ++ | 17:06 |
corvus | i'll tag nodepool master as 3.7.1 | 17:07 |
corvus | pushed | 17:09 |
pabelanger | Thanks! | 17:10 |
hwangbo | corvus pabelanger: I don't think it helped. I updated with the change, but it's still stuck doing "Updating repo...Resetting repo" for what seems like each commit in the repository | 17:13 |
*** jpena is now known as jpena|off | 17:15 | |
tobiash | hwangbo: it shouldn't do that for each commit, so logs would be helpful | 17:16 |
hwangbo | tobiash: Here's a snippet of what the log looks like. I spoofed out some sensitive information, but I think you'd get the idea. https://pastebin.com/HEQH1eWM | 17:32 |
tobiash | hwangbo: you mean the cat jobs that are executed when starting up zuul? | 17:35 |
tobiash | those are executed for each repo and branch | 17:35 |
hwangbo | Yeah, sorry. I should have been more verbose | 17:35 |
tobiash | (not every commit) | 17:35 |
hwangbo | For us, that process takes several hours | 17:35 |
tobiash | in this case a single executor is a bottleneck | 17:35 |
tobiash | so you likely want to run more than one executor | 17:35 |
tobiash | (which is not covered by the docker-compose file) | 17:36 |
hwangbo | Is there any other documentation on how to set that up? | 17:36 |
tobiash | well you have the executor config, so you just can take that part and run it multiple times | 17:37 |
tobiash | note that the cache dirs must not be shared | 17:37 |
hwangbo | run it multiple times? | 17:38 |
hwangbo | Is this separate from the docker-compose file? | 17:38 |
fungi | in our case we run executors and additional mergers, and put them on separate servers from where we run the scheduler/web/finger-gw daemons | 17:39 |
fungi | i think our current count is 12 executors and 8 additional mergers for the opendev zuul deployment | 17:40 |
fungi | the cat job workload probably scales in an embarrassingly parallel fashion, so you could divide the current duration by the desired duration to get a rough estimate of the number you need | 17:42 |
pabelanger | does docker-compose setup a zuul-merger? | 17:43 |
fungi | another (or additional) option could be to limit which repositories you want zuul to look in for job configuration | 17:43 |
pabelanger | if not, maybe we make that an option thing to enable | 17:43 |
hwangbo | It doesn't setup a standalone zuul-merger | 17:44 |
fungi | since i *think* cat jobs are only run for repositories where zuul is looking for job configuration (someone can probably correct me on that) | 17:44 |
pabelanger | however, given this is a single server, IO is still going to be a concern | 17:44 |
hwangbo | Is there an example zuul.conf I can look at to see how multiple executors/mergers are configured? | 17:45 |
pabelanger | hwangbo: in theory, you shouldn't need to change anything when you add another executor online. However, in this case, like tobiash said, you need to create a new volume in docker as not to share it with other executor | 17:47 |
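For the quick-start setup, that could mean adding a second executor service to the compose file; a rough sketch, with the service name, volume name, and mount paths all hypothetical and assumed to mirror the existing executor service:

```yaml
# docker-compose fragment: a second executor sharing zuul.conf
# but with its own (unshared) git cache volume.
services:
  executor2:
    image: zuul/zuul-executor
    volumes:
      - ./etc_zuul:/etc/zuul:ro        # same zuul.conf as the first executor
      - executor2-git:/var/lib/zuul    # separate cache dir, not shared

volumes:
  executor2-git:
```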
pabelanger | https://github.com/openstack-infra/puppet-zuul/blob/master/templates/zuulv3.conf.erb | 17:48 |
pabelanger | is example from opendev.org | 17:48 |
pabelanger | https://github.com/ansible-network/windmill-config/blob/master/zuul/zuul.conf.j2 | 17:48 |
pabelanger | is ours for zuul.ansible.com | 17:48 |
fungi | zuul is designed so that you can have just one service configuration and hand it out to all the executors and mergers as well as the scheduler, but for merger-specific configuration options see https://zuul-ci.org/docs/zuul/admin/components.html#id4 | 17:49 |
fungi | really if you just take your executor configuration and put it on additional servers with zuul installed and start the zuul-executor or zuul-merger daemon, it ought to simply work | 17:51 |
fungi | assuming you've configured them to communicate with the scheduler on an address they can all reach | 17:51 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Differentiate between queued and waiting jobs in zuul web UI https://review.opendev.org/660878 | 17:51 |
fungi | and that any firewall rules are configured to allow communication between those servers | 17:52 |
hwangbo | fungi, pabelanger, tobiash: Thanks for all the help. I'll be giving this a shot | 17:52 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Differentiate between queued and waiting jobs in zuul web UI https://review.opendev.org/660878 | 17:52 |
fungi | for a better understanding of what firewall rules you might need to allow communication between distributed services, see https://zuul-ci.org/docs/zuul/admin/components.html | 17:53 |
corvus | yes, io on a single machine may still be a bottleneck no matter how many mergers/executors are running. it all depends on the situation. for that reason, i don't think it's worth expanding the quick start for this. the operator, otoh, should handle this case. | 17:57 |
tobiash | corvus: we're running into another scalability problem regarding the number of branches in a repo that is being used by a job. We have a project containing 2000 branches (due to branch&pull workflow and many people working on it). Restoring the repo state when running a job takes a significant amount of time (20 minutes when under io pressure). | 17:58 |
tobiash | most jobs only need the protected branches (<5) | 17:59 |
corvus | tobiash: "most"? | 17:59 |
tobiash | actually probably all | 17:59 |
corvus | whew, that's probably more actionable :) | 18:00 |
tobiash | one idea would be to be able to limit the repo state to protected branches as a job setting | 18:00 |
tobiash | but that would violate layering in the model | 18:00 |
tobiash | however I cannot guarantee that it's *all* jobs so I guess we'd need something configurable | 18:01 |
tobiash | (e.g. projects following fork and pull on github.com might not even need protected branches) | 18:02 |
tobiash | or we could determine this by the exclude-unprotected-branches option on the repo | 18:02 |
tobiash | which we already have | 18:03 |
tobiash | I think in this case we could require a user to protect a branch if he wants to run something on it | 18:03 |
tobiash | I think I'll go for the existing exclude-unprotected-branches option as the best compromise. I don't really want to put that into a job parameter. | 18:05 |
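The option tobiash settles on lives in the tenant configuration for GitHub-sourced projects. A hedged sketch of how it might be applied (tenant and project names are hypothetical):

```yaml
# Hypothetical tenant config snippet: with exclude-unprotected-branches
# set, Zuul only considers the few protected branches of the repo rather
# than all ~2000 work branches, which is what makes restoring the repo
# state for a job affordable again.
- tenant:
    name: example-tenant
    source:
      github:
        untrusted-projects:
          - big/monorepo:
              exclude-unprotected-branches: true
```

As noted in the discussion, this implies a workflow rule: a user who wants jobs to run on a branch must have that branch protected.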
*** hashar has joined #zuul | 18:06 | |
*** jamesmcarthur has quit IRC | 18:06 | |
*** jamesmcarthur has joined #zuul | 18:07 | |
pabelanger | yah, I've seen some random branches get created in a repo, when humans edit a file via web ui | 18:08 |
pabelanger | hard to suggest changing that workflow, as users tend to not be well versed in git | 18:08 |
*** jamesmcarthur has quit IRC | 18:12 | |
tobiash | in many enterprise workflows fork&pull is not even possible. In this case you always have the source branches of pull requests in the same repo (which most likely won't ever be needed by jobs) | 18:12 |
*** jamesmcarthur has joined #zuul | 18:12 | |
*** jamesmcarthur_ has joined #zuul | 18:17 | |
*** jamesmcarthur has quit IRC | 18:17 | |
*** jamesmcarthur_ has quit IRC | 18:22 | |
*** jamesmcarthur has joined #zuul | 18:23 | |
*** igordc has joined #zuul | 18:27 | |
*** Minnie100 has joined #zuul | 18:30 | |
*** igordc has quit IRC | 18:36 | |
openstackgerrit | Merged zuul/zuul master: Update cached repo during job startup only if needed https://review.opendev.org/648229 | 18:39 |
*** jamesmcarthur has quit IRC | 18:42 | |
*** jamesmcarthur has joined #zuul | 18:51 | |
*** jamesmcarthur has quit IRC | 19:01 | |
*** jamesmcarthur has joined #zuul | 19:14 | |
*** Minnie100 has quit IRC | 19:16 | |
*** jamesmcarthur has quit IRC | 19:22 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Filter out unprotected branches from builds if excluded https://review.opendev.org/666664 | 19:34 |
*** gtema has joined #zuul | 19:40 | |
*** bhavikdbavishi has joined #zuul | 19:44 | |
*** gtema has quit IRC | 19:46 | |
*** bhavikdbavishi has quit IRC | 20:15 | |
SpamapS | tobiash: or in my case, the users just refuse to use fork&pull because they like that branches on the main repo are shared by default. | 20:26 |
SpamapS | *ugh*, the prohibition of untrusted repos sharing a job with a secret just ruined my day :-/ | 20:32 |
SpamapS | have to move that job into a config repo :-/ | 20:32 |
corvus | i had an idea of how to address that, but i don't seem to have time to work on it | 20:42 |
*** pcaruana has quit IRC | 20:45 | |
SpamapS | corvus: I'm happy my secrets are protected, but yeah... I wish there was a better answer. | 20:46 |
*** jeliu_ has joined #zuul | 20:47 | |
SpamapS | also this runs into the job renaming problem I had in the past | 20:49 |
SpamapS | (there's no way to move a job from repo to repo without going through a 3-way dance to create a new name, change all references to it, and then remove the old one and rename the new one) | 20:52 |
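The three-way dance SpamapS describes might look like the following sequence of changes (job and repo names are hypothetical, for illustration only):

```yaml
# Step 1: in the destination repo, define the job under a temporary name.
- job:
    name: promote-widget-tmp     # hypothetical temporary name
    run: playbooks/promote.yaml

# Step 2: update every project stanza that referenced the old job to use
# the temporary name instead.
- project:
    promote:
      jobs:
        - promote-widget-tmp

# Step 3: delete the original job from the source repo, then rename
# promote-widget-tmp back to the original name and update all the
# references a second time.
```

Each step has to merge before the next can safely land, which is why the process feels so heavyweight for what is conceptually a single move.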
dmsimard | SpamapS: I'd love a better solution to that dance too. | 20:53 |
dmsimard | I don't have any great ideas :( | 20:53 |
corvus | i expect it could be better with bidirectional dependencies | 20:53 |
corvus | (at least, for those who wished to enable that, once support for it exists) | 20:54 |
SpamapS | Maybe, another thought I had was just to have an optional precedence field. | 20:54 |
corvus | SpamapS: you could temporarily allow a repo to shadow another | 20:54 |
*** jeliu_ has quit IRC | 20:54 | |
corvus | (that is a form of optional precedence) | 20:54 |
SpamapS | corvus: indeed.. a bit of a big hammer, but swingable. | 20:55 |
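The "big hammer" corvus suggests is the tenant configuration's per-project shadow attribute. A sketch of how it might be used during a migration (tenant and repo names are assumptions):

```yaml
# Hypothetical: while moving a job from org/repo-a to org/repo-b,
# temporarily let repo-b's copy of identically-named config take
# precedence over repo-a's, avoiding the rename dance.
- tenant:
    name: example-tenant
    source:
      github:
        untrusted-projects:
          - org/repo-b:
              shadow: org/repo-a
```

Once the job has been removed from the shadowed repo, the shadow entry can be dropped again.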
openstackgerrit | James E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job https://review.opendev.org/665023 | 20:56 |
SpamapS | corvus: I kind of wonder why I can't just break the prohibition and say "allowed-projects: foo" | 20:56 |
SpamapS | The two repos are separate purely for focus reasons.. we expect a small team to iterate rapidly on a small piece of the code so we extracted it to a separate repo.. | 20:57 |
SpamapS | so it would be nice if I could just make an explicit trust. | 20:57 |
corvus | SpamapS: i think the solution i was considering had the idea that a config project could cause a job in untrusted repo A to be run in untrusted repo B. in other words, allowing the config project to assert that trust relationship. | 20:59 |
SpamapS | corvus: yes, that's exactly what I want. | 21:00 |
corvus | SpamapS: unfortunately, just doing "allowed-projects: foo" in an untrusted project is dangerous (that is a thing we had to remove) because a change can speculatively change it. so we need to get the config project involved to fixate it safely. | 21:00 |
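The direction corvus sketches (and which the WIP change below starts on) is to let the trusted config project assert the trust relationship. A hypothetical sketch, with all job, secret, and project names invented for illustration:

```yaml
# Hypothetical job in a trusted config project: because the definition
# lives in a config project, the allowed-projects list cannot be altered
# speculatively by a proposed change, so it is safe to pair a secret
# with specific untrusted projects.
- job:
    name: promote-widget         # hypothetical job name
    secrets:
      - deploy-credentials       # hypothetical secret name
    allowed-projects:
      - org/widget
      - org/widget-lib
```

This keeps the secret protected while still letting the two closely related untrusted repos share the job, which is the explicit-trust arrangement SpamapS asks for above.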
*** hashar has quit IRC | 21:01 | |
corvus | SpamapS: i'll see if i can't negotiate that one up higher in my todo list... it's an unhandled use case, and you know i don't like those. | 21:02 |
SpamapS | I do ;) | 21:03 |
SpamapS | also, TBH, we may actually kick all the secrets out of the untrusted repos to have more separation between deploy and dev. | 21:04 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: Allow config projects to override allowed-projects https://review.opendev.org/666733 | 21:17 |
corvus | SpamapS: ^ maybe now it'll be a bit more in my face and i can finish it up piecemeal. | 21:18 |
corvus | i wonder if, in the nodepool devstack job, we could start the nodepool builder at the beginning of the job, so that it ran the DIB build portion while devstack was being installed, and we rely on the upload retries for the builder to eventually upload the image once the cloud came online... | 21:39 |
corvus | i think i'll try that after i get the basic functionality working | 21:40 |
*** sshnaidm is now known as sshnaidm|off | 21:55 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job https://review.opendev.org/665023 | 22:15 |
*** mattw4 has quit IRC | 22:47 | |
*** sean-k-mooney1 has quit IRC | 22:49 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job https://review.opendev.org/665023 | 22:53 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job https://review.opendev.org/665023 | 22:54 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job https://review.opendev.org/665023 | 22:55 |
*** mattw4 has joined #zuul | 23:02 | |
*** rlandy has quit IRC | 23:44 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!