*** tosky has quit IRC | 00:00 | |
openstackgerrit | Merged zuul/zuul master: Support per branch change queues https://review.opendev.org/c/zuul/zuul/+/718531 | 00:43 |
openstackgerrit | Merged zuul/zuul master: Move queue from pipeline to project https://review.opendev.org/c/zuul/zuul/+/720182 | 00:45 |
*** hamalq has quit IRC | 00:56 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Use ZooKeeper TLS in tests https://review.opendev.org/c/zuul/zuul/+/777489 | 00:59 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Emit config warnings if shared queues on pipelines https://review.opendev.org/c/zuul/zuul/+/777479 | 01:08 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Add test-setup-docker.sh https://review.opendev.org/c/zuul/nodepool/+/777491 | 01:21 |
corvus | tristanC, tobiash: ^ i just now realized i forgot to 'git add' that file when we were working on the nodepool zk change. some of the review comments make more sense now. :) | 01:21 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Use ZooKeeper TLS in tests https://review.opendev.org/c/zuul/zuul/+/777489 | 01:57 |
*** rlandy|bbl is now known as rlandy | 03:15 | |
*** rlandy has quit IRC | 03:16 | |
*** SpamapS has quit IRC | 03:47 | |
*** SpamapS has joined #zuul | 03:51 | |
*** ykarel has joined #zuul | 04:55 | |
*** bhagyashri|ruck is now known as bhagyashri|rover | 05:15 | |
*** asettle has quit IRC | 05:17 | |
openstackgerrit | Merged zuul/zuul master: Add python-logstash-async to container images https://review.opendev.org/c/zuul/zuul/+/776551 | 05:28 |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #zuul | 05:33 | |
*** ajitha has joined #zuul | 05:35 | |
*** jfoufas1 has joined #zuul | 05:42 | |
*** saneax has joined #zuul | 05:55 | |
*** yoctozepto0 has joined #zuul | 06:05 | |
*** yoctozepto has quit IRC | 06:05 | |
*** yoctozepto0 is now known as yoctozepto | 06:05 | |
*** zbr has quit IRC | 06:43 | |
*** reiterative has quit IRC | 06:50 | |
*** reiterative has joined #zuul | 06:51 | |
*** icey has quit IRC | 06:58 | |
*** icey has joined #zuul | 06:58 | |
*** icey has quit IRC | 07:01 | |
*** icey has joined #zuul | 07:05 | |
*** icey has quit IRC | 07:08 | |
*** jpena|off is now known as jpena | 07:45 | |
*** rpittau|afk is now known as rpittau | 07:57 | |
*** jcapitao has joined #zuul | 07:59 | |
*** jfoufas1 has quit IRC | 08:19 | |
openstackgerrit | Tobias Henkel proposed zuul/nodepool master: Add zookeeper-timeout connection config https://review.opendev.org/c/zuul/nodepool/+/752022 | 08:22 |
avass | corvus: there's a shell_type change for nodepool as well but that one is marked WIP. I think there should be a simple unittest for it | 08:28 |
*** ykarel_ has joined #zuul | 08:31 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Harmonize zk session timeouts https://review.opendev.org/c/zuul/zuul/+/763209 | 08:31 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Make repo state buildset global https://review.opendev.org/c/zuul/zuul/+/738603 | 08:33 |
*** ykarel has quit IRC | 08:33 | |
*** ykarel_ is now known as ykarel | 08:34 | |
*** tosky has joined #zuul | 08:38 | |
*** icey has joined #zuul | 08:57 | |
*** msuszko has quit IRC | 09:04 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Move fingergw config to fingergw https://review.opendev.org/c/zuul/zuul/+/664949 | 09:05 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway https://review.opendev.org/c/zuul/zuul/+/664965 | 09:05 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/c/zuul/zuul/+/664950 | 09:05 |
*** msuszko has joined #zuul | 09:05 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add --validate-tenants option to zuul scheduler https://review.opendev.org/c/zuul/zuul/+/542160 | 09:11 |
Open10K8S | Hi team. can you please check these again? https://review.opendev.org/c/zuul/zuul-jobs/+/776677 https://review.opendev.org/c/opendev/base-jobs/+/777087 | 09:37 |
*** harrymichal has joined #zuul | 09:48 | |
*** yoctozepto9 has joined #zuul | 09:55 | |
*** nils has joined #zuul | 09:55 | |
*** jhesketh_ has joined #zuul | 09:57 | |
*** paulalbertella has joined #zuul | 09:57 | |
*** avass_ has joined #zuul | 09:58 | |
*** arxcruz has quit IRC | 10:00 | |
*** arxcruz has joined #zuul | 10:01 | |
*** reiterative has quit IRC | 10:03 | |
*** yoctozepto has quit IRC | 10:03 | |
*** jhesketh has quit IRC | 10:03 | |
*** avass has quit IRC | 10:03 | |
*** imtiazc has quit IRC | 10:03 | |
*** irclogbot_0 has quit IRC | 10:03 | |
*** yoctozepto9 is now known as yoctozepto | 10:03 | |
*** irclogbot_2 has joined #zuul | 10:08 | |
*** jangutter has joined #zuul | 10:40 | |
*** jangutter_ has joined #zuul | 10:45 | |
*** jangutter has quit IRC | 10:49 | |
*** sshnaidm|afk is now known as sshnaidm|pto | 11:22 | |
*** jangutter_ is now known as jangutter | 12:05 | |
*** jcapitao is now known as jcapitao_lucnh | 12:19 | |
*** jcapitao_lucnh is now known as jcapitao_lunch | 12:19 | |
*** jpena is now known as jpena|lunch | 12:27 | |
avass_ | any plans on upgrading opendev's zuul to v4? I want to upgrade ours but I don't want to be first out :) | 12:38 |
avass_ | or is that 3.19 version the same change but from an image built in gate and not by a tag? | 12:39 |
*** rlandy has joined #zuul | 12:51 | |
openstackgerrit | Felix Edel proposed zuul/zuul master: Remove superfluous flushes and queries from SQL reporter https://review.opendev.org/c/zuul/zuul/+/752664 | 12:51 |
*** zbr has joined #zuul | 13:13 | |
*** saneax has quit IRC | 13:19 | |
*** jpena|lunch is now known as jpena | 13:26 | |
openstackgerrit | Merged zuul/nodepool master: Add test-setup-docker.sh https://review.opendev.org/c/zuul/nodepool/+/777491 | 13:27 |
tobiash | avass_: v4 has been tagged from the last restart rev | 13:28 |
*** jcapitao_lunch is now known as jcapitao | 13:30 | |
*** ykarel has quit IRC | 13:32 | |
*** ykarel has joined #zuul | 13:37 | |
*** iurygregory has quit IRC | 13:43 | |
*** iurygregory has joined #zuul | 13:44 | |
*** zbr7 has joined #zuul | 13:47 | |
avass_ | tobiash: thanks! | 13:49 |
*** zbr has quit IRC | 13:49 | |
*** zbr has joined #zuul | 13:51 | |
*** zbr7 has quit IRC | 13:53 | |
*** jangutter_ has joined #zuul | 13:55 | |
*** jangutter has quit IRC | 13:58 | |
fungi | avass_: tobiash: yeah it's v4, we restarted it on the commit we were considering tagging just to make sure it was still working | 14:01 |
*** yoctozepto has quit IRC | 14:01 | |
*** yoctozepto has joined #zuul | 14:01 | |
*** zbr0 has joined #zuul | 14:11 | |
*** zbr has quit IRC | 14:13 | |
*** zbr0 is now known as zbr | 14:13 | |
*** jangutter has joined #zuul | 14:16 | |
*** jangutter_ has quit IRC | 14:19 | |
openstackgerrit | Felix Edel proposed zuul/zuul master: Move fingergw config to fingergw https://review.opendev.org/c/zuul/zuul/+/664949 | 14:31 |
openstackgerrit | Felix Edel proposed zuul/zuul master: Route streams to different zones via finger gateway https://review.opendev.org/c/zuul/zuul/+/664965 | 14:31 |
openstackgerrit | Felix Edel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/c/zuul/zuul/+/664950 | 14:31 |
openstackgerrit | Felix Edel proposed zuul/zuul master: Perform per tenant locking in getStatus https://review.opendev.org/c/zuul/zuul/+/772695 | 14:48 |
*** saneax has joined #zuul | 14:49 | |
*** zbr1 has joined #zuul | 14:54 | |
*** avass_ is now known as avass | 14:56 | |
*** zbr has quit IRC | 14:56 | |
*** zbr1 is now known as zbr | 14:56 | |
*** jfoufas1 has joined #zuul | 15:28 | |
*** saneax has quit IRC | 15:35 | |
*** ykarel has quit IRC | 15:38 | |
*** zbr3 has joined #zuul | 15:50 | |
*** zbr has quit IRC | 15:52 | |
*** zbr3 is now known as zbr | 15:52 | |
openstackgerrit | Dong Zhang proposed zuul/zuul master: Display branch of queue in status page https://review.opendev.org/c/zuul/zuul/+/777613 | 15:53 |
openstackgerrit | Dong Zhang proposed zuul/zuul master: Display branch of queue in status page https://review.opendev.org/c/zuul/zuul/+/777613 | 15:57 |
openstackgerrit | Dong Zhang proposed zuul/zuul master: Display branch of queue in status page https://review.opendev.org/c/zuul/zuul/+/777613 | 15:58 |
openstackgerrit | Dong Zhang proposed zuul/zuul master: Display branch of queue in status page https://review.opendev.org/c/zuul/zuul/+/777613 | 15:59 |
*** jfoufas1 has quit IRC | 16:12 | |
*** zbr9 has joined #zuul | 16:13 | |
*** zbr has quit IRC | 16:15 | |
*** zbr9 is now known as zbr | 16:15 | |
*** nils has quit IRC | 16:20 | |
*** nils has joined #zuul | 16:26 | |
*** zbr3 has joined #zuul | 16:39 | |
*** zbr has quit IRC | 16:41 | |
*** zbr3 is now known as zbr | 16:41 | |
*** jpena is now known as jpena|off | 16:54 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Spec: external permissions for the REST admin API https://review.opendev.org/c/zuul/zuul/+/777629 | 17:04 |
*** zbr7 has joined #zuul | 17:17 | |
*** zbr has quit IRC | 17:19 | |
*** zbr7 is now known as zbr | 17:19 | |
*** ikhan has joined #zuul | 17:32 | |
*** rpittau is now known as rpittau|afk | 17:34 | |
*** zbr3 has joined #zuul | 17:38 | |
*** zbr has quit IRC | 17:40 | |
*** zbr3 is now known as zbr | 17:40 | |
*** zbr5 has joined #zuul | 17:42 | |
avass | fungi: I think we'll be upgrading very soon if you haven't had any issues yet | 17:44 |
*** zbr has quit IRC | 17:44 | |
*** zbr5 is now known as zbr | 17:44 | |
fungi | none related to latest zuul/nodepool, as far as i'm aware | 17:46 |
avass | good :) | 17:47 |
fungi | we've been restarting frequently on latest master branch builds though, so if we did run into problems we already addressed them earlier i suppose | 17:48 |
clarkb | the only thing we ran into was the ready locked nodes not being marked fulfilled in nodesets, but tobiash reports they observed that too previously (so unlikely to be a v4 issue) | 17:51 |
openstackgerrit | Albin Vass proposed zuul/zuul master: Remove sqlreporter from quickstart pipeline definitions https://review.opendev.org/c/zuul/zuul/+/777638 | 17:51 |
clarkb | and we solved that by restarting the launcher | 17:51 |
clarkb | the workaround at least is easy | 17:51 |
tobiash | We have had ready-locked nodes for ages | 17:51 |
avass | those ^ are supposed to be removed now right? | 17:51 |
fungi | avass: yes, they'll generate deprecation warnings in the log | 17:52 |
avass | I think we've encountered that a couple of times as well | 17:52 |
avass | we had one locked static node this week that could be what you're talking about | 17:54 |
avass | fungi: just making sure since the example pipelines hadn't been updated :) | 17:55 |
clarkb | related to that one thing I noticed was that we don't log much when we skip in the noderequest poll | 17:55 |
clarkb | maybe we should add logging to the various skip reasons to try and identify why that situation happens (I suspect it must be the poll that causes it but not sure of that) | 17:55 |
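The per-skip-reason logging clarkb describes could look like this minimal sketch — a hypothetical poll helper, not nodepool's actual handler code; the request fields (`launches_done`, `paused`) are illustrative. Each early-return path logs why the request was skipped, so a request that never completes leaves a trail in the logs:

```python
import logging

log = logging.getLogger("nodepool.poll")

def poll_request(request):
    """Poll one node request; log the reason for every skip (sketch)."""
    if not request.get("launches_done"):
        log.debug("Request %s waiting for launches to complete",
                  request["id"])
        return False
    if request.get("paused"):
        log.debug("Request %s is paused", request["id"])
        return False
    log.debug("Request %s fulfilled", request["id"])
    return True
```

As the later messages note, the "waiting for launches" branch fires on every poll cycle, so that particular line can dominate the log on a busy system.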
*** jcapitao has quit IRC | 17:55 | |
clarkb | I'll look at that again while I am thinking about it | 17:56 |
clarkb | hrm it might be very noisy though | 17:57 |
avass | oh those aren't used by quickstart. there's another zuul-conf directory that is used. | 17:57 |
openstackgerrit | Albin Vass proposed zuul/zuul master: Remove sqlreporter from example pipeline definitions https://review.opendev.org/c/zuul/zuul/+/777638 | 17:58 |
*** nils has quit IRC | 17:59 | |
avass | however for static nodes it's easy to just delete the node in zookeeper. I'm not sure if that's the correct way to solve it however | 17:59 |
avass | I guess for dynamic nodes they would be leaked? | 18:00 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Add logging to noderequest handler polls https://review.opendev.org/c/zuul/nodepool/+/777641 | 18:01 |
clarkb | avass: when you restart the launcher the locks are removed which allows the newly started replacement launcher to reprocess those nodes and assign them to jobs | 18:01 |
avass | oh that sounds a bit better | 18:03 |
*** wuchunyang has joined #zuul | 18:03 | |
clarkb | you can probably manually remove the locks too | 18:04 |
clarkb | for the same effect | 18:04 |
clarkb | but not sure, haven't tested that option | 18:04 |
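Manually clearing a stuck node lock, as clarkb speculates here, might look like the following sketch. The znode layout assumes nodepool's default `/nodepool/nodes/<id>/lock` path; the ZooKeeper host and node id are placeholders, and this is untested against a live deployment:

```python
def node_lock_path(node_id, root="/nodepool/nodes"):
    """Lock znode path for a node, assuming nodepool's default layout."""
    return "%s/%s/lock" % (root, node_id)

if __name__ == "__main__":
    # Requires kazoo and a reachable ZooKeeper; host and id are examples.
    from kazoo.client import KazooClient
    zk = KazooClient(hosts="localhost:2181")
    zk.start()
    path = node_lock_path("0000012345")
    # Deleting every contender znode under the lock releases it.
    for contender in zk.get_children(path):
        zk.delete(path + "/" + contender)
    zk.stop()
```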
avass | I _think_ I tried restarting nodepool for static nodes one time that happened and it didn't help but I might be misremembering | 18:04 |
*** wuchunyang has quit IRC | 18:07 | |
*** wuchunyang has joined #zuul | 18:41 | |
*** saneax has joined #zuul | 18:45 | |
*** wuchunyang has quit IRC | 18:45 | |
*** harrymichal has quit IRC | 18:53 | |
*** jangutter_ has joined #zuul | 19:03 | |
*** jangutter has quit IRC | 19:06 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: cabal-test: add install_args role var https://review.opendev.org/c/zuul/zuul-jobs/+/777653 | 19:17 |
openstackgerrit | Dong Zhang proposed zuul/zuul master: Display branch of queue in status page https://review.opendev.org/c/zuul/zuul/+/777613 | 19:21 |
*** tjgresha has joined #zuul | 19:33 | |
tobiash | clarkb: in our case the scheduler holds the lock of the leaked nodes | 19:57 |
*** saneax has quit IRC | 20:02 | |
*** hamalq has joined #zuul | 20:02 | |
clarkb | in https://review.opendev.org/c/zuul/nodepool/+/777641 almost 10% of the log lines in the functional openstack job are "Node request waiting for launches to complete" which that change adds | 20:24 |
clarkb | definitely chatty, would be curious to hear if others think that is a problem though | 20:24 |
*** ajitha has quit IRC | 20:25 | |
avass | tobiash: right that was the reason | 20:35 |
*** tflink_ has joined #zuul | 20:35 | |
*** logan- has quit IRC | 20:35 | |
*** icey_ has joined #zuul | 20:36 | |
*** decimuscorvinus_ has joined #zuul | 20:36 | |
*** mugsie_ has joined #zuul | 20:36 | |
tobiash | clarkb: since that's the 'not yet ready' within a poll I think that can create problems for busy systems | 20:36 |
*** persia_ has joined #zuul | 20:36 | |
*** Tahvok_ has joined #zuul | 20:36 | |
avass | clarkb: would that become very noisy with a lot of requests? | 20:36 |
*** Tahvok has quit IRC | 20:36 | |
*** zbr has quit IRC | 20:36 | |
*** fdegir has quit IRC | 20:36 | |
*** decimuscorvinus has quit IRC | 20:36 | |
*** icey has quit IRC | 20:36 | |
*** EmilienM has quit IRC | 20:36 | |
*** mvadkert has quit IRC | 20:36 | |
*** persia has quit IRC | 20:36 | |
*** melwitt has quit IRC | 20:36 | |
*** mugsie has quit IRC | 20:36 | |
*** swest_ has joined #zuul | 20:36 | |
*** Tahvok_ is now known as Tahvok | 20:36 | |
*** melwitt has joined #zuul | 20:36 | |
*** corvus has quit IRC | 20:36 | |
tobiash | clarkb: however I think you could do the debugging for that without this message as well. | 20:37 |
*** tjgresha has quit IRC | 20:37 | |
tobiash | clarkb: what I'm wondering is, do you see any exception trace with the poll function in it? | 20:37 |
*** tosky_ has joined #zuul | 20:37 | |
*** jonass_ has joined #zuul | 20:37 | |
tobiash | I guess that might be also a way to get into that state | 20:37 |
tobiash | clarkb: and to double check, you're sure that your nodes are locked by nodepool? | 20:38 |
tobiash | in this case we have two leaks | 20:38 |
tobiash | one in the scheduler and one in nodepool | 20:38 |
*** evrardjp_ has joined #zuul | 20:38 | |
*** jonass has quit IRC | 20:38 | |
*** parallax has quit IRC | 20:38 | |
*** stevthedev has quit IRC | 20:38 | |
*** dry has joined #zuul | 20:39 | |
*** mordred has quit IRC | 20:39 | |
*** stevthedev has joined #zuul | 20:39 | |
*** mmedvede has quit IRC | 20:40 | |
*** parallax has joined #zuul | 20:40 | |
*** Eighth_Doctor has quit IRC | 20:40 | |
*** tosky has quit IRC | 20:40 | |
*** noonedeadpunk has quit IRC | 20:40 | |
*** ricolin has quit IRC | 20:40 | |
*** evrardjp has quit IRC | 20:40 | |
*** guillaumec has quit IRC | 20:40 | |
*** jpenag has joined #zuul | 20:40 | |
*** msuszko has quit IRC | 20:40 | |
*** logan- has joined #zuul | 20:40 | |
*** mnasiadka has quit IRC | 20:41 | |
*** mmedvede has joined #zuul | 20:41 | |
avass | tobiash: that also means that the nodepool leak could be recent | 20:41 |
*** tosky_ is now known as tosky | 20:42 | |
*** masterpe has quit IRC | 20:42 | |
*** ironfoot has quit IRC | 20:42 | |
*** jpena|off has quit IRC | 20:42 | |
*** tflink has quit IRC | 20:42 | |
*** mnasiadka has joined #zuul | 20:42 | |
*** swest has quit IRC | 20:43 | |
clarkb | tobiash: I'm like 98% sure that nodepool had the lock because the noderequest state was not fulfilled. But I looked around in zk directly and couldn't figure out how to determine what actually owned the lock. The lock had a uuid associated with it but I couldn't find a way to map that uuid to anything | 20:44 |
*** corvus has joined #zuul | 20:44 | |
tobiash | clarkb: and the noderequest was for one node? | 20:44 |
clarkb | tobiash: the node lock should only be taken by zuul once the node request is fulfilled if I read the handshaking there properly | 20:44 |
clarkb | tobiash: correct it was a single node node request | 20:44 |
*** ironfoot has joined #zuul | 20:45 | |
tobiash | clarkb: do you have logs filtered for request id and/or node id? | 20:45 |
clarkb | checking for tracebacks with the poll function in them is a good idea. I don't think I did that when this happened. I did take thread stack dumps before restarting but those all looked normal to me (the provider handler thread was still running) | 20:45 |
clarkb | tobiash: I did, they ended with the node going ready, there was nothing for either node id or request id after that | 20:46 |
tobiash | clarkb: do you still have those logs? I'd be interested in them, maybe there is something missing. | 20:49 |
tobiash | clarkb: on exception in poll it should re-add it to the active list and poll them again | 20:50 |
tobiash | that node request should be then recurring in the 'Active requests' line | 20:51 |
tobiash | if that stops recurring there it should be either finished or failed | 20:51 |
clarkb | tobiash: I think we do have them let me look | 20:56 |
tobiash | clarkb: also you should have either 'Error unlocking node' or 'Unlocked node <number>' in the logs, the former unfortunately without node id but with exception trace | 20:57 |
*** mordred has joined #zuul | 20:58 | |
openstackgerrit | Tobias Henkel proposed zuul/nodepool master: Mention node id when unlock failed https://review.opendev.org/c/zuul/nodepool/+/777678 | 21:00 |
tobiash | clarkb: this should aid correlating node and unlocking exception ^ | 21:01 |
openstackgerrit | Tobias Henkel proposed zuul/nodepool master: Mention node id when unlock failed https://review.opendev.org/c/zuul/nodepool/+/777678 | 21:02 |
clarkb | thanks, I've got the node request and node id logs sorted out now. Looking for unlock failures and will put a paste together | 21:02 |
clarkb | tobiash: http://paste.openstack.org/show/OTea9DPKielkU8wVO1XF/ | 21:05 |
tobiash | clarkb: that is really weird | 21:08 |
tobiash | that means that we got out with a result of True of the poll method | 21:08 |
clarkb | or the poll method was always short circuiting | 21:09 |
clarkb | which was why I wanted to add that extra logging | 21:09 |
clarkb | oh except the node request is removed from the active requests list | 21:10 |
clarkb | does that imply poll returned true? | 21:10 |
tobiash | False and exception trigger reenqueue to active handlers | 21:10 |
tobiash | so poll can only get lost if it returned true | 21:10 |
*** guillaumec has joined #zuul | 21:10 | |
clarkb | and the request not being logged as active shows us it did that | 21:10 |
clarkb | that is weird | 21:10 |
tobiash | see _removeCompletedHandlers | 21:11 |
tobiash | you presumably also got no zk session loss? | 21:12 |
tobiash | clarkb: what's also weird is that you should have 'Removing request handler' in the logs | 21:13 |
avass | maybe that kze.SessionExpiredError should be logged | 21:14 |
clarkb | aha 2021-02-22 22:59:25,186 WARNING kazoo.client: Connection dropped: socket connection error: Connection reset by peer | 21:14 |
clarkb | 2021-02-22 22:59:25,442 INFO kazoo.client: Zookeeper connection established, state: CONNECTED | 21:15 |
clarkb | we were disconnected for 300ms but maybe that was long enough to cause problems for those nodes | 21:16 |
tobiash | clarkb: we do log it in: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/zk.py#L913 | 21:17 |
tobiash | if you check for 'ZooKeeper connection: LOST' you see the lost session events | 21:17 |
clarkb | ya 2021-02-22 22:59:25,187 DEBUG nodepool.zk.ZooKeeper: ZooKeeper connection: SUSPENDED | 21:17 |
clarkb | then 2021-02-22 22:59:25,442 DEBUG nodepool.zk.ZooKeeper: ZooKeeper connection: CONNECTED | 21:17 |
clarkb | that is likely the underlying cause, but doesn't quite tell us why that was not recoverable | 21:18 |
tobiash | clarkb: I think we should get this in and default to a higher session timeout: https://review.opendev.org/c/zuul/nodepool/+/752022 | 21:19 |
tobiash | clarkb: so the session was not lost but suspended which likely means that some zk store might have failed but we didn't lose the lock | 21:19 |
tobiash | we lose the lock only on session loss | 21:20 |
clarkb | and if that had happened we would've avoided this | 21:20 |
clarkb | tobiash: though your change would presumably make this problem worse? | 21:20 |
clarkb | because we'd lose the session, but not the lock and result in this problem? | 21:20 |
*** Eighth_Doctor has joined #zuul | 21:21 | |
tobiash | clarkb: well a busy nodepool is sometimes too late with the heartbeat thread which makes a longer session timeout required | 21:21 |
clarkb | I can see how this might make one problem better but another worse :) | 21:22 |
tobiash | ok, well that fixes a different issue ;) | 21:22 |
clarkb | I'm ok with that too, if we find this issue I observed becomes worse we'll just debug it faster :) | 21:22 |
tobiash | do you have any exception in that time range? | 21:22 |
avass | could there be a similar issue with the scheduler not releasing locks in that case? | 21:22 |
tobiash | some are logged without node id | 21:22 |
clarkb | tobiash: http://paste.openstack.org/show/803023/ that maybe | 21:25 |
clarkb | its the same provider and it fails in _assignHandlers ? | 21:25 |
clarkb | that happens after 2021-02-22 23:00:11,102 DEBUG nodepool.zk.ZooKeeper: ZooKeeper connection: SUSPENDED and before 2021-02-22 23:00:12,052 DEBUG nodepool.zk.ZooKeeper: ZooKeeper connection: CONNECTED | 21:26 |
tobiash | is that one trace? | 21:26 |
tobiash | looks weird | 21:26 |
clarkb | yes I believe it is one trace, but I did use grep -A -B so let me double check | 21:26 |
clarkb | yes one trace unless the log is interleaved | 21:27 |
clarkb | (which I suppose it could be) | 21:27 |
tobiash | no idea what createMinReady should have to do with _assignHandlers | 21:28 |
clarkb | 2021-02-22 23:00:11,109 ERROR nodepool.PoolWorker.rax-iad-main: Error in PoolWorker also fails just above it and has basically the same trace | 21:28 |
clarkb | so I don't think they are interleaved | 21:28 |
tobiash | anyway, nodepool is not good at handling connection errors, there is much room to improve there | 21:29 |
tobiash | as far as I've understood I think we need to handle the kazoo.exceptions.ConnectionLoss exception everywhere in the zk class and only reraise if the session has been lost and until then keep retrying | 21:30 |
tobiash | with this connectionloss the kazoo state turns into suspended | 21:31 |
clarkb | that makes sense | 21:31 |
clarkb | since before the session is lost our locks are still valid | 21:31 |
clarkb | it's only once the lock goes away that nodepool proper will care | 21:31 |
tobiash | correct | 21:32 |
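The policy tobiash outlines — on a connection loss, keep retrying the ZooKeeper operation while the session is merely suspended, and only give up once the session is truly lost (because ephemeral locks survive a suspension but not a session loss) — can be sketched in pure Python. The exception classes here are stand-ins, not kazoo's real client:

```python
import time

class ConnectionLoss(Exception):
    """Stands in for kazoo.exceptions.ConnectionLoss."""

class SessionLost(Exception):
    """Raised once the ZooKeeper session is truly gone."""

def retry_while_suspended(op, session_lost, delay=0.1, max_tries=50):
    """Run op(); on ConnectionLoss, retry unless the session was lost."""
    for _ in range(max_tries):
        try:
            return op()
        except ConnectionLoss:
            if session_lost():
                # Locks are gone with the session; re-raising is correct.
                raise SessionLost("session expired; locks are gone")
            time.sleep(delay)
    raise SessionLost("gave up waiting for reconnect")
```

In real kazoo code, `session_lost()` would be derived from a connection state listener that watches for `KazooState.LOST` versus `KazooState.SUSPENDED`.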
tobiash | corvus: I think we need to take this into consideration in zuul as well | 21:32 |
tobiash | connection state handling will become more important there as well | 21:33 |
tobiash | I guess getting this right will be non-trivial | 21:34 |
openstackgerrit | Tobias Henkel proposed zuul/nodepool master: Log error message on request handler removal on session loss https://review.opendev.org/c/zuul/nodepool/+/777689 | 21:37 |
tobiash | clarkb: maybe it went this route if there was a misleading exception in kazoo ^ | 21:38 |
clarkb | tobiash: can we log the request id in that log? | 21:39 |
clarkb | r.request.id is the value I think | 21:39 |
tobiash | that's part of the logger already | 21:39 |
clarkb | oh wait ya I see that now | 21:39 |
*** rlandy has quit IRC | 21:54 | |
clarkb | is `zuul-executor graceful` not documented intentionally? | 22:02 |
clarkb | I want to try it out but only if it is expected to work :) | 22:02 |
tobiash | we use it in production | 22:02 |
clarkb | cool I guess if it works for me I'll update the docs too | 22:02 |
tobiash | I guess that was just forgotten | 22:03 |
tobiash | or let me rephrase, I forgot ;) | 22:04 |
tobiash | clarkb: there is one glitch tho, if the executor is already paused or governed | 22:05 |
clarkb | governed based on the disk/cpu/memory governors? | 22:06 |
clarkb | anyway it seems to be working, it is currently waiting on ~39 jobs to complete | 22:06 |
tobiash | then it stays paused and won't terminate because of an uncaught "unable to unregister unregistered function" error | 22:06 |
clarkb | ah ok, so if it goes to 0 jobs to complete but doesn't exit I can probably safely stop it at that point | 22:07 |
clarkb | hrm I think I just discovered a flaw in this though. We tell docker-compose to restart-always in our zuul-executor config | 22:08 |
clarkb | so it will exit and then restart :/ | 22:08 |
clarkb | I guess what I really need is pause, wait for it to do no work then docker-compose down. /me is learning | 22:08 |
tobiash | oh, that's unfortunate ;) | 22:12 |
tobiash | btw, this is the exception when it's already paused: http://paste.openstack.org/show/803026/ | 22:12 |
tobiash | so double pausing throws an exception | 22:13 |
*** harrymichal has joined #zuul | 22:14 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Catch exception when double unregistering merge jobs https://review.opendev.org/c/zuul/zuul/+/777694 | 22:19 |
tobiash | clarkb: this should fix this corner case ^ | 22:19 |
clarkb | thanks | 22:20 |
tobiash | clarkb: can you tell docker-compose to auto-restart only on result !=0? | 22:20 |
tobiash | then you could still use graceful | 22:20 |
clarkb | good question | 22:21 |
tobiash | clarkb: at least pure docker can do that: https://docs.docker.com/config/containers/start-containers-automatically/#use-a-restart-policy | 22:21 |
clarkb | restart: on-failure and restart: unless-stopped might be good options | 22:22 |
tobiash | clarkb: restart: on-failure is probably what you want | 22:22 |
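The compose change being discussed might look like this fragment (service name and image are illustrative, not opendev's actual config): `restart: on-failure` restarts the container only when it exits non-zero, so a clean exit from `zuul-executor graceful` stays down instead of being restarted.

```yaml
# Illustrative docker-compose service; names and image are placeholders.
services:
  executor:
    image: zuul/zuul-executor:latest
    restart: on-failure   # restart only when the process exits non-zero
```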
corvus | on-failure looks good -- also, it's worth checking whether we need a restart in order to have these start on boot. maybe we want to remove restart altogether? | 22:27 |
clarkb | I think start on boot is related to whether or not the last state was an up or a down | 22:29 |
clarkb | we don't auto start on first deployment here because nothing has up'd the containers | 22:29 |
clarkb | but once they are up'd they should start on boot if I understand this correctly | 22:30 |
clarkb | so ya we can probably remove restart altogether | 22:30 |
*** harrymichal has quit IRC | 22:41 | |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Document zuul-executor graceful https://review.opendev.org/c/zuul/zuul/+/777699 | 22:59 |
corvus | 777689 failed with interesting errors if anyone feels like more nodepool debugging: https://zuul.opendev.org/t/zuul/build/b3492e29b42e4adeb480cdf177790327 | 23:11 |
corvus | i suspect the failure is transient, but it still looks like a bug | 23:11 |
corvus | clarkb: super nit on the docs patch, but worth it i think (sorry) | 23:15 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Document zuul-executor graceful https://review.opendev.org/c/zuul/zuul/+/777699 | 23:17 |
clarkb | no problem | 23:17 |
fungi | clarkb: we've mostly not used graceful because we're in a hurry to restart and builds which get aborted by an abrupt stop will be retried (unless unlucky and already on their third strike) | 23:54 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!