Thursday, 2021-02-25

00:00 *** tosky has quit IRC
00:43 <openstackgerrit> Merged zuul/zuul master: Support per branch change queues  https://review.opendev.org/c/zuul/zuul/+/718531
00:45 <openstackgerrit> Merged zuul/zuul master: Move queue from pipeline to project  https://review.opendev.org/c/zuul/zuul/+/720182
00:56 *** hamalq has quit IRC
00:59 <openstackgerrit> James E. Blair proposed zuul/zuul master: Use ZooKeeper TLS in tests  https://review.opendev.org/c/zuul/zuul/+/777489
01:08 <openstackgerrit> James E. Blair proposed zuul/zuul master: Emit config warnings if shared queues on pipelines  https://review.opendev.org/c/zuul/zuul/+/777479
01:21 <openstackgerrit> James E. Blair proposed zuul/nodepool master: Add test-setup-docker.sh  https://review.opendev.org/c/zuul/nodepool/+/777491
01:21 <corvus> tristanC, tobiash: ^ i just now realized i forgot to 'git add' that file when we were working on the nodepool zk change.  some of the review comments make more sense now.  :)
01:57 <openstackgerrit> James E. Blair proposed zuul/zuul master: Use ZooKeeper TLS in tests  https://review.opendev.org/c/zuul/zuul/+/777489
03:15 *** rlandy|bbl is now known as rlandy
03:16 *** rlandy has quit IRC
03:47 *** SpamapS has quit IRC
03:51 *** SpamapS has joined #zuul
04:55 *** ykarel has joined #zuul
05:15 *** bhagyashri|ruck is now known as bhagyashri|rover
05:17 *** asettle has quit IRC
05:28 <openstackgerrit> Merged zuul/zuul master: Add python-logstash-async to container images  https://review.opendev.org/c/zuul/zuul/+/776551
05:33 *** evrardjp has quit IRC
05:33 *** evrardjp has joined #zuul
05:35 *** ajitha has joined #zuul
05:42 *** jfoufas1 has joined #zuul
05:55 *** saneax has joined #zuul
06:05 *** yoctozepto0 has joined #zuul
06:05 *** yoctozepto has quit IRC
06:05 *** yoctozepto0 is now known as yoctozepto
06:43 *** zbr has quit IRC
06:50 *** reiterative has quit IRC
06:51 *** reiterative has joined #zuul
06:58 *** icey has quit IRC
06:58 *** icey has joined #zuul
07:01 *** icey has quit IRC
07:05 *** icey has joined #zuul
07:08 *** icey has quit IRC
07:45 *** jpena|off is now known as jpena
07:57 *** rpittau|afk is now known as rpittau
07:59 *** jcapitao has joined #zuul
08:19 *** jfoufas1 has quit IRC
08:22 <openstackgerrit> Tobias Henkel proposed zuul/nodepool master: Add zookeeper-timeout connection config  https://review.opendev.org/c/zuul/nodepool/+/752022
08:28 <avass> corvus: there's a shell_type change for nodepool as well but that one is marked WIP. I think there should be a simple unittest for it
08:31 *** ykarel_ has joined #zuul
08:31 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Harmonize zk session timeouts  https://review.opendev.org/c/zuul/zuul/+/763209
08:33 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Make repo state buildset global  https://review.opendev.org/c/zuul/zuul/+/738603
08:33 *** ykarel has quit IRC
08:34 *** ykarel_ is now known as ykarel
08:38 *** tosky has joined #zuul
08:57 *** icey has joined #zuul
09:04 *** msuszko has quit IRC
09:05 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Move fingergw config to fingergw  https://review.opendev.org/c/zuul/zuul/+/664949
09:05 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway  https://review.opendev.org/c/zuul/zuul/+/664965
09:05 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw  https://review.opendev.org/c/zuul/zuul/+/664950
09:05 *** msuszko has joined #zuul
09:11 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Add --validate-tenants option to zuul scheduler  https://review.opendev.org/c/zuul/zuul/+/542160
09:37 <Open10K8S> Hi team. can you please check these again? https://review.opendev.org/c/zuul/zuul-jobs/+/776677 https://review.opendev.org/c/opendev/base-jobs/+/777087
09:48 *** harrymichal has joined #zuul
09:55 *** yoctozepto9 has joined #zuul
09:55 *** nils has joined #zuul
09:57 *** jhesketh_ has joined #zuul
09:57 *** paulalbertella has joined #zuul
09:58 *** avass_ has joined #zuul
10:00 *** arxcruz has quit IRC
10:01 *** arxcruz has joined #zuul
10:03 *** reiterative has quit IRC
10:03 *** yoctozepto has quit IRC
10:03 *** jhesketh has quit IRC
10:03 *** avass has quit IRC
10:03 *** imtiazc has quit IRC
10:03 *** irclogbot_0 has quit IRC
10:03 *** yoctozepto9 is now known as yoctozepto
10:08 *** irclogbot_2 has joined #zuul
10:40 *** jangutter has joined #zuul
10:45 *** jangutter_ has joined #zuul
10:49 *** jangutter has quit IRC
11:22 *** sshnaidm|afk is now known as sshnaidm|pto
12:05 *** jangutter_ is now known as jangutter
12:19 *** jcapitao is now known as jcapitao_lucnh
12:19 *** jcapitao_lucnh is now known as jcapitao_lunch
12:27 *** jpena is now known as jpena|lunch
12:38 <avass_> any plan on upgrading opendevs zuul to v4? I want to upgrade ours but I don't want to be first out :)
12:39 <avass_> or is that 3.19 version the same change but from an image built in gate and not by a tag?
12:51 *** rlandy has joined #zuul
12:51 <openstackgerrit> Felix Edel proposed zuul/zuul master: Remove superfluous flushes and queries from SQL reporter  https://review.opendev.org/c/zuul/zuul/+/752664
13:13 *** zbr has joined #zuul
13:19 *** saneax has quit IRC
13:26 *** jpena|lunch is now known as jpena
13:27 <openstackgerrit> Merged zuul/nodepool master: Add test-setup-docker.sh  https://review.opendev.org/c/zuul/nodepool/+/777491
13:28 <tobiash> avass_: v4 has been tagged from the last restart rev
13:30 *** jcapitao_lunch is now known as jcapitao
13:32 *** ykarel has quit IRC
13:37 *** ykarel has joined #zuul
13:43 *** iurygregory has quit IRC
13:44 *** iurygregory has joined #zuul
13:47 *** zbr7 has joined #zuul
13:49 <avass_> tobiash: thanks!
13:49 *** zbr has quit IRC
13:51 *** zbr has joined #zuul
13:53 *** zbr7 has quit IRC
13:55 *** jangutter_ has joined #zuul
13:58 *** jangutter has quit IRC
14:01 <fungi> avass_: tobiash: yeah it's v4, we restarted it on the commit we were considering tagging just to make sure it was still working
14:01 *** yoctozepto has quit IRC
14:01 *** yoctozepto has joined #zuul
14:11 *** zbr0 has joined #zuul
14:13 *** zbr has quit IRC
14:13 *** zbr0 is now known as zbr
14:16 *** jangutter has joined #zuul
14:19 *** jangutter_ has quit IRC
14:31 <openstackgerrit> Felix Edel proposed zuul/zuul master: Move fingergw config to fingergw  https://review.opendev.org/c/zuul/zuul/+/664949
14:31 <openstackgerrit> Felix Edel proposed zuul/zuul master: Route streams to different zones via finger gateway  https://review.opendev.org/c/zuul/zuul/+/664965
14:31 <openstackgerrit> Felix Edel proposed zuul/zuul master: Support ssl encrypted fingergw  https://review.opendev.org/c/zuul/zuul/+/664950
14:48 <openstackgerrit> Felix Edel proposed zuul/zuul master: Perform per tenant locking in getStatus  https://review.opendev.org/c/zuul/zuul/+/772695
14:49 *** saneax has joined #zuul
14:54 *** zbr1 has joined #zuul
14:56 *** avass_ is now known as avass
14:56 *** zbr has quit IRC
14:56 *** zbr1 is now known as zbr
15:28 *** jfoufas1 has joined #zuul
15:35 *** saneax has quit IRC
15:38 *** ykarel has quit IRC
15:50 *** zbr3 has joined #zuul
15:52 *** zbr has quit IRC
15:52 *** zbr3 is now known as zbr
15:53 <openstackgerrit> Dong Zhang proposed zuul/zuul master: Display branch of queue in status page  https://review.opendev.org/c/zuul/zuul/+/777613
15:57 <openstackgerrit> Dong Zhang proposed zuul/zuul master: Display branch of queue in status page  https://review.opendev.org/c/zuul/zuul/+/777613
15:58 <openstackgerrit> Dong Zhang proposed zuul/zuul master: Display branch of queue in status page  https://review.opendev.org/c/zuul/zuul/+/777613
15:59 <openstackgerrit> Dong Zhang proposed zuul/zuul master: Display branch of queue in status page  https://review.opendev.org/c/zuul/zuul/+/777613
16:12 *** jfoufas1 has quit IRC
16:13 *** zbr9 has joined #zuul
16:15 *** zbr has quit IRC
16:15 *** zbr9 is now known as zbr
16:20 *** nils has quit IRC
16:26 *** nils has joined #zuul
16:39 *** zbr3 has joined #zuul
16:41 *** zbr has quit IRC
16:41 *** zbr3 is now known as zbr
16:54 *** jpena is now known as jpena|off
17:04 <openstackgerrit> Matthieu Huin proposed zuul/zuul master: Spec: external permissions for the REST admin API  https://review.opendev.org/c/zuul/zuul/+/777629
17:17 *** zbr7 has joined #zuul
17:19 *** zbr has quit IRC
17:19 *** zbr7 is now known as zbr
17:32 *** ikhan has joined #zuul
17:34 *** rpittau is now known as rpittau|afk
17:38 *** zbr3 has joined #zuul
17:40 *** zbr has quit IRC
17:40 *** zbr3 is now known as zbr
17:42 *** zbr5 has joined #zuul
17:44 <avass> fungi: I think we'll be upgrading very soon if you haven't had any issues yet
17:44 *** zbr has quit IRC
17:44 *** zbr5 is now known as zbr
17:46 <fungi> none related to latest zuul/nodepool, as far as i'm aware
17:47 <avass> good :)
17:48 <fungi> we've been restarting frequently on latest master branch builds though, so if we did run into problems we already addressed them earlier i suppose
17:51 <clarkb> the only thing we ran into was the ready locked nodes not being marked fulfilled in nodesets, but tobiash reports they observed that too previously (so unlikely to be a v4 issue)
17:51 <openstackgerrit> Albin Vass proposed zuul/zuul master: Remove sqlreporter from quickstart pipeline definitions  https://review.opendev.org/c/zuul/zuul/+/777638
17:51 <clarkb> and we solved that by restarting the launcher
17:51 <clarkb> the workaround at least is easy
17:51 <tobiash> We have had ready-locked nodes for ages
17:51 <avass> those ^ are supposed to be removed now right?
17:52 <fungi> avass: yes, they'll generate deprecation warnings in the log
17:52 <avass> I think we've encountered that a couple of times as well
17:54 <avass> we had one locked static node this week that could be what you're talking about
17:55 <avass> fungi: just making sure since the example pipelines hadn't been updated :)
17:55 <clarkb> related to that, one thing I noticed was that we don't log much when we skip in the noderequest poll
17:55 <clarkb> maybe we should add logging to the various skip reasons to try and identify why that situation happens (I suspect it must be the poll that causes it but not sure of that)
17:55 *** jcapitao has quit IRC
17:56 <clarkb> I'll look at that again while I am thinking about it
17:57 <clarkb> hrm it might be very noisy though
17:57 <avass> oh those aren't used by quickstart. there's another zuul-conf directory that is used.
17:58 <openstackgerrit> Albin Vass proposed zuul/zuul master: Remove sqlreporter from example pipeline definitions  https://review.opendev.org/c/zuul/zuul/+/777638
17:59 *** nils has quit IRC
17:59 <avass> however for static nodes it's easy to just delete the node in zookeeper. I'm not sure if that's the correct way to solve it however
18:00 <avass> I guess for dynamic nodes they would be leaked?
18:01 <openstackgerrit> Clark Boylan proposed zuul/nodepool master: Add logging to noderequest handler polls  https://review.opendev.org/c/zuul/nodepool/+/777641
18:01 <clarkb> avass: when you restart the launcher the locks are removed which allows the newly started replacement launcher to reprocess those nodes and assign them to jobs
18:03 <avass> oh that sounds a bit better
18:03 *** wuchunyang has joined #zuul
18:04 <clarkb> you can probably manually remove the locks too
18:04 <clarkb> for the same effect
18:04 <clarkb> but not sure, haven't tested that option
18:04 <avass> I _think_ I tried restarting nodepool for static nodes one time that happened and it didn't help but I might be misremembering
18:07 *** wuchunyang has quit IRC
18:41 *** wuchunyang has joined #zuul
18:45 *** saneax has joined #zuul
18:45 *** wuchunyang has quit IRC
18:53 *** harrymichal has quit IRC
19:03 *** jangutter_ has joined #zuul
19:06 *** jangutter has quit IRC
19:17 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: cabal-test: add install_args role var  https://review.opendev.org/c/zuul/zuul-jobs/+/777653
19:21 <openstackgerrit> Dong Zhang proposed zuul/zuul master: Display branch of queue in status page  https://review.opendev.org/c/zuul/zuul/+/777613
19:33 *** tjgresha has joined #zuul
19:57 <tobiash> clarkb: in our case the scheduler holds the lock of the leaked nodes
20:02 *** saneax has quit IRC
20:02 *** hamalq has joined #zuul
20:24 <clarkb> in https://review.opendev.org/c/zuul/nodepool/+/777641 almost 10% of the log lines in the functional openstack job are "Node request waiting for launches to complete" which that change adds
20:24 <clarkb> definitely chatty, would be curious to hear if others think that is a problem though
20:25 *** ajitha has quit IRC
20:35 <avass> tobiash: right that was the reason
20:35 *** tflink_ has joined #zuul
20:35 *** logan- has quit IRC
20:36 *** icey_ has joined #zuul
20:36 *** decimuscorvinus_ has joined #zuul
20:36 *** mugsie_ has joined #zuul
20:36 <tobiash> clarkb: since that's the 'not yet ready' within a poll I think that can create problems for busy systems
20:36 *** persia_ has joined #zuul
20:36 *** Tahvok_ has joined #zuul
20:36 <avass> clarkb: would that become very noisy with a lot of requests?
20:36 *** Tahvok has quit IRC
20:36 *** zbr has quit IRC
20:36 *** fdegir has quit IRC
20:36 *** decimuscorvinus has quit IRC
20:36 *** icey has quit IRC
20:36 *** EmilienM has quit IRC
20:36 *** mvadkert has quit IRC
20:36 *** persia has quit IRC
20:36 *** melwitt has quit IRC
20:36 *** mugsie has quit IRC
20:36 *** swest_ has joined #zuul
20:36 *** Tahvok_ is now known as Tahvok
20:36 *** melwitt has joined #zuul
20:36 *** corvus has quit IRC
20:37 <tobiash> clarkb: however I think you could do the debugging for that without this message as well.
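[Editor's note: the chattiness concern raised here — a per-poll-iteration log line accounting for ~10% of job logs — is commonly addressed by throttling a repeated message. A minimal sketch in Python; the `ThrottledLogger` helper and the request key are hypothetical illustrations, not part of the actual nodepool change under review.]

```python
import logging
import time


class ThrottledLogger:
    """Hypothetical helper (not nodepool code): emit a repeated debug
    message at most once per interval, tracked per key (e.g. request id)."""

    def __init__(self, logger, interval=10.0):
        self.logger = logger
        self.interval = interval
        self._last = {}  # key -> monotonic time of last emission

    def debug(self, key, msg, *args):
        now = time.monotonic()
        last = self._last.get(key)
        if last is None or now - last >= self.interval:
            self._last[key] = now
            self.logger.debug(msg, *args)
            return True  # message was emitted
        return False  # suppressed: too soon since the last emission


log = ThrottledLogger(logging.getLogger("nodepool.poll"))

# Five poll iterations in quick succession: only the first one logs.
emitted = [log.debug("req-123", "Node request %s waiting for launches", "req-123")
           for _ in range(5)]
```

With a scheme like this, a busy launcher polling the same request in a tight loop would still record that the request is waiting, without flooding the log on every iteration.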
20:37 *** tjgresha has quit IRC
20:37 <tobiash> clarkb: what I'm wondering is, do you see any exception trace with the poll function in it?
20:37 *** tosky_ has joined #zuul
20:37 *** jonass_ has joined #zuul
20:37 <tobiash> I guess that might be also a way to get into that state
20:38 <tobiash> clarkb: and to double check, you're sure that your nodes are locked by nodepool?
20:38 <tobiash> in this case we have two leaks
20:38 <tobiash> one in the scheduler and one in nodepool
20:38 *** evrardjp_ has joined #zuul
20:38 *** jonass has quit IRC
20:38 *** parallax has quit IRC
20:38 *** stevthedev has quit IRC
20:39 *** dry has joined #zuul
20:39 *** mordred has quit IRC
20:39 *** stevthedev has joined #zuul
20:40 *** mmedvede has quit IRC
20:40 *** parallax has joined #zuul
20:40 *** Eighth_Doctor has quit IRC
20:40 *** tosky has quit IRC
20:40 *** noonedeadpunk has quit IRC
20:40 *** ricolin has quit IRC
20:40 *** evrardjp has quit IRC
20:40 *** guillaumec has quit IRC
20:40 *** jpenag has joined #zuul
20:40 *** msuszko has quit IRC
20:40 *** logan- has joined #zuul
20:41 *** mnasiadka has quit IRC
20:41 *** mmedvede has joined #zuul
20:41 <avass> tobiash: that also means that the nodepool leak could be recent
20:42 *** tosky_ is now known as tosky
20:42 *** masterpe has quit IRC
20:42 *** ironfoot has quit IRC
20:42 *** jpena|off has quit IRC
20:42 *** tflink has quit IRC
20:42 *** mnasiadka has joined #zuul
20:43 *** swest has quit IRC
20:44 <clarkb> tobiash: I'm like 98% sure that nodepool had the lock because the noderequest state was not fulfilled. But I looked around in zk directly and couldn't figure out how to determine what actually owned the lock. The lock had a uuid associated with it but I couldn't find a way to map that uuid to anything
20:44 *** corvus has joined #zuul
20:44 <tobiash> clarkb: and the noderequest was for one node?
20:44 <clarkb> tobiash: the node lock should only be taken by zuul once the node request is fulfilled if I read the hand shaking there properly
20:44 <clarkb> tobiash: correct it was a single node node request
20:45 *** ironfoot has joined #zuul
20:45 <tobiash> clarkb: do you have logs filtered for request id and/or node id?
20:45 <clarkb> checking for tracebacks with the poll function in them is a good idea. I don't think I did that when this happened. I did take thread stack dumps before restarting but those all looked normal to me (the provider handler thread was still running)
20:46 <clarkb> tobiash: I did, they ended with the node going ready, there was nothing for either node id or request id after that
20:49 <tobiash> clarkb: do you still have those logs? I'd be interested in them, maybe there is something missing.
20:50 <tobiash> clarkb: on exception in poll it should re-add it to the active list and poll them again
20:51 <tobiash> that node request should be then recurring in the 'Active requests' line
20:51 <tobiash> if that stops recurring there it should be either finished or failed
20:56 <clarkb> tobiash: I think we do have them, let me look
20:57 <tobiash> clarkb: also you should have either 'Error unlocking node' or 'Unlocked node <number>' in the logs, the former unfortunately without node id but with exception trace
20:58 *** mordred has joined #zuul
21:00 <openstackgerrit> Tobias Henkel proposed zuul/nodepool master: Mention node id when unlock failed  https://review.opendev.org/c/zuul/nodepool/+/777678
21:01 <tobiash> clarkb: this should aid correlating node and unlocking exception ^
21:02 <openstackgerrit> Tobias Henkel proposed zuul/nodepool master: Mention node id when unlock failed  https://review.opendev.org/c/zuul/nodepool/+/777678
21:02 <clarkb> thanks, I've got the node request and node id logs sorted out now. Looking for unlock failures and will put a paste together
21:05 <clarkb> tobiash: http://paste.openstack.org/show/OTea9DPKielkU8wVO1XF/
21:08 <tobiash> clarkb: that is really weird
21:08 <tobiash> that means that we got out with a result of True of the poll method
21:09 <clarkb> or the poll method was always short circuiting
21:09 <clarkb> which was why I wanted to add that extra logging
21:10 <clarkb> oh except the node request is removed from the active requests list
21:10 <clarkb> does that imply poll returned true?
21:10 <tobiash> False and exception trigger reenqueue to active handlers
21:10 <tobiash> so poll can only get lost if it returned true
21:10 *** guillaumec has joined #zuul
21:10 <clarkb> and the request not being logged as active shows us it did that
21:10 <clarkb> that is weird
21:11 <tobiash> see _removeCompletedHandlers
21:12 <tobiash> you presumably also got no zk session loss?
21:13 <tobiash> clarkb: what's also weird is that you should have 'Removing request handler' in the logs
21:14 <avass> maybe that kze.SessionExpiredError should be logged
21:14 <clarkb> aha 2021-02-22 22:59:25,186 WARNING kazoo.client: Connection dropped: socket connection error: Connection reset by peer
21:15 <clarkb> 2021-02-22 22:59:25,442 INFO kazoo.client: Zookeeper connection established, state: CONNECTED
21:16 <clarkb> we were disconnected for 300ms but maybe that was long enough to cause problems for those nodes
21:17 <tobiash> clarkb: we do log it in: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/zk.py#L913
21:17 <tobiash> if you check for 'ZooKeeper connection: LOST' you see the lost session events
21:17 <clarkb> ya 2021-02-22 22:59:25,187 DEBUG nodepool.zk.ZooKeeper: ZooKeeper connection: SUSPENDED
21:17 <clarkb> then 2021-02-22 22:59:25,442 DEBUG nodepool.zk.ZooKeeper: ZooKeeper connection: CONNECTED
21:18 <clarkb> that is likely the underlying cause, but doesn't quite tell us why that was not recoverable
21:19 <tobiash> clarkb: I think we should get this in and default to a higher session timeout: https://review.opendev.org/c/zuul/nodepool/+/752022
21:19 <tobiash> clarkb: so the session was not lost but suspended which likely means that some zk store might have failed but we didn't lose the lock
21:20 <tobiash> we lose the lock only on session lost
21:20 <clarkb> and if that had happened we would've avoided this
21:20 <clarkb> tobiash: though your change would presumably make this problem worse?
21:20 <clarkb> because we'd lose the session, but not the lock and result in this problem?
21:21 *** Eighth_Doctor has joined #zuul
21:21 <tobiash> clarkb: well a busy nodepool is sometimes too late with the heartbeat thread which makes a longer session timeout required
21:22 <clarkb> I can see how this might make one problem better but another worse :)
21:22 <tobiash> ok, well that fixes a different issue ;)
21:22 <clarkb> I'm ok with that too, if we find this issue I observed becomes worse we'll just debug it faster :)
21:22 <tobiash> do you have any exception in that time range?
21:22 <avass> could there be a similar issue with the scheduler not releasing locks in that case?
21:22 <tobiash> some are logged without node id
21:25 <clarkb> tobiash: http://paste.openstack.org/show/803023/ that maybe
21:25 <clarkb> its the same provider and it fails in _assignHandlers ?
21:26 <clarkb> that happens after 2021-02-22 23:00:11,102 DEBUG nodepool.zk.ZooKeeper: ZooKeeper connection: SUSPENDED and before 2021-02-22 23:00:12,052 DEBUG nodepool.zk.ZooKeeper: ZooKeeper connection: CONNECTED
21:26 <tobiash> is that one trace?
21:26 <tobiash> looks weird
21:26 <clarkb> yes I believe it is one trace, but I did use grep -A -B so let me double check
21:27 <clarkb> yes one trace unless the log is interleaved
21:27 <clarkb> (which I suppose it could be)
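[Editor's note: clarkb's `grep -A -B` technique — pulling a traceback plus surrounding context out of a launcher log — looks like this in practice. The log content and filename below are made up for illustration:]

```shell
# Write a tiny made-up log, then extract the error with one line of
# context before (-B1) and two lines after (-A2) the matching line.
printf '%s\n' \
  'INFO nodepool: ok' \
  'ERROR nodepool.PoolWorker: Error in PoolWorker' \
  'Traceback (most recent call last):' \
  '  File "poolworker.py", line 1, in run' \
  'INFO nodepool: recovered' > sample.log
grep -B1 -A2 'Error in PoolWorker' sample.log
```

One caveat visible in the discussion above: if multiple threads interleave their traces in the same log, the context lines grep returns may belong to a different thread, which is why clarkb double-checks whether it is really one trace.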
21:28 <tobiash> no idea what createMinReady should have to do with _assignHandlers
21:28 <clarkb> 2021-02-22 23:00:11,109 ERROR nodepool.PoolWorker.rax-iad-main: Error in PoolWorker also fails just above it and has basically the same trace
21:28 <clarkb> so I don't think they are interleaved
21:29 <tobiash> anyway, nodepool is not good at handling connection errors, there is much room to improve there
21:30 <tobiash> as far as I've understood I think we need to handle the kazoo.exceptions.ConnectionLoss exception everywhere in the zk class and only reraise if the session has been lost and until then keep retrying
21:31 <tobiash> with this connectionloss the kazoo state turns into suspended
21:31 <clarkb> that makes sense
21:31 <clarkb> since before the session is lost our locks are still valid
21:31 <clarkb> its only once the lock goes away that nodepool proper will care
21:32 <tobiash> correct
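[Editor's note: the strategy tobiash sketches — catch kazoo's ConnectionLoss, keep retrying while the connection is merely SUSPENDED, and re-raise only once the session is LOST — can be illustrated as below. The `ConnectionLoss` and `KazooState` classes are stand-ins mirroring kazoo's names (kazoo.exceptions.ConnectionLoss, kazoo.protocol.states.KazooState) so the sketch runs without kazoo installed; `retry_while_suspended` is a hypothetical helper, not nodepool code.]

```python
import time


class ConnectionLoss(Exception):
    """Stand-in for kazoo.exceptions.ConnectionLoss."""


class KazooState:
    """Stand-in for kazoo.protocol.states.KazooState."""
    CONNECTED = "CONNECTED"
    SUSPENDED = "SUSPENDED"
    LOST = "LOST"


def retry_while_suspended(op, get_state, attempts=100, delay=0.0):
    """Retry a ZooKeeper operation across connection drops.

    A ConnectionLoss while the session is only SUSPENDED leaves held
    locks valid, so the operation is simply retried; the exception is
    re-raised only once the session is LOST and the locks are gone too.
    """
    for _ in range(attempts):
        try:
            return op()
        except ConnectionLoss:
            if get_state() == KazooState.LOST:
                raise  # session gone: caller must treat its locks as lost
            time.sleep(delay)  # still SUSPENDED: wait and try again
    raise ConnectionLoss("gave up waiting for the connection to recover")


# Simulate an operation that hits one connection drop, then succeeds.
state = {"now": KazooState.SUSPENDED}
calls = {"n": 0}


def flaky_get():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionLoss()
    state["now"] = KazooState.CONNECTED
    return b"node-data"


result = retry_while_suspended(flaky_get, lambda: state["now"])
```

This matches clarkb's observation above: before the session is lost the locks are still valid, so transparently retrying through a 300ms blip is safe, and only a genuine session loss needs to bubble up.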
21:32 <tobiash> corvus: I think we need to take this into consideration in zuul as well
21:33 <tobiash> connection state handling will become more important there as well
21:34 <tobiash> I guess getting this right will be non-trivial
21:37 <openstackgerrit> Tobias Henkel proposed zuul/nodepool master: Log error message on request handler removal on session loss  https://review.opendev.org/c/zuul/nodepool/+/777689
21:38 <tobiash> clarkb: maybe it went this route if there was a misleading exception in kazoo ^
21:39 <clarkb> tobiash: can we log the request id in that log?
21:39 <clarkb> r.request.id is the value I think
21:39 <tobiash> that's part of the logger already
21:39 <clarkb> oh wait ya I see that now
21:54 *** rlandy has quit IRC
22:02 <clarkb> is `zuul-executor graceful` not documented intentionally?
22:02 <clarkb> I want to try it out but only if it is expected to work :)
22:02 <tobiash> we use it in production
22:02 <clarkb> cool I guess if it works for me I'll update the docs too
22:03 <tobiash> I guess that was just forgotten
22:04 <tobiash> or let me rephrase, I forgot ;)
22:05 <tobiash> clarkb: there is one glitch tho, if the executor is already paused or governed
22:06 <clarkb> governed based on the disk/cpu/memory governors?
22:06 <clarkb> anyway it seems to be working, it is currently waiting on ~39 jobs to complete
22:06 <tobiash> then it stays in pause and won't terminate because of an uncaught 'unable to unregister unregistered function' exception
22:07 <clarkb> ah ok, so if it goes to 0 jobs to complete but doesn't exit I can probably safely stop it at that point
22:08 <clarkb> hrm I think I just discovered a flaw in this though. We tell docker-compose to restart-always in our zuul-executor config
22:08 <clarkb> so it will exit and then restart :/
22:08 <clarkb> I guess what I really need is pause, wait for it to do no work, then docker-compose down. /me is learning
22:12 <tobiash> oh, that's unfortunate ;)
22:12 <tobiash> btw, this is the exception when it's already paused: http://paste.openstack.org/show/803026/
22:13 <tobiash> so double pausing throws an exception
22:14 *** harrymichal has joined #zuul
22:19 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Catch exception when double unregistering merge jobs  https://review.opendev.org/c/zuul/zuul/+/777694
22:19 <tobiash> clarkb: this should fix this corner case ^
22:20 <clarkb> thanks
22:20 <tobiash> clarkb: can you tell docker-compose to auto-restart only on result != 0?
22:20 <tobiash> then you could still use graceful
22:21 <clarkb> good question
22:21 <tobiash> clarkb: at least pure docker can do that: https://docs.docker.com/config/containers/start-containers-automatically/#use-a-restart-policy
22:22 <clarkb> restart: on-failure and restart: unless-stopped might be good options
22:22 <tobiash> clarkb: restart: on-failure is probably what you want
22:27 <corvus> on-failure looks good -- also, it's worth checking whether we need a restart in order to have these start on boot.  maybe we want to remove restart altogether?
22:29 <clarkb> I think start on boot is related to whether or not the last state was an up or a down
22:29 <clarkb> we don't auto start on first deployment here because nothing has up'd the containers
22:30 <clarkb> but once they are up'd they should start on boot if I understand this correctly
22:30 <clarkb> so ya we can probably remove restart altogether
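[Editor's note: the restart-policy discussion above translates into a docker-compose fragment like the following. This is an illustrative sketch only — the service and image names are examples, not opendev's actual configuration:]

```yaml
services:
  executor:
    image: zuul/zuul-executor
    # Restart on crashes (non-zero exit status), but stay down after a
    # clean exit such as the one a graceful shutdown produces.
    restart: on-failure
    # restart: always         # would immediately undo a graceful shutdown
    # restart: unless-stopped # restarts after daemon/host restarts too,
    #                         # until the container is explicitly stopped
```

With `on-failure`, an executor that drains via `zuul-executor graceful` and exits 0 is left stopped, so a follow-up `docker-compose down` (or `up -d` to relaunch) works as intended — the problem clarkb hit with an always-restart policy.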
22:41 *** harrymichal has quit IRC
22:59 <openstackgerrit> Clark Boylan proposed zuul/zuul master: Document zuul-executor graceful  https://review.opendev.org/c/zuul/zuul/+/777699
23:11 <corvus> 777689 failed with interesting errors if anyone feels like more nodepool debugging: https://zuul.opendev.org/t/zuul/build/b3492e29b42e4adeb480cdf177790327
23:11 <corvus> i suspect the failure is transient, but it still looks like a bug
23:15 <corvus> clarkb: super nit on the docs patch, but worth it i think (sorry)
23:17 <openstackgerrit> Clark Boylan proposed zuul/zuul master: Document zuul-executor graceful  https://review.opendev.org/c/zuul/zuul/+/777699
23:17 <clarkb> no problem
23:54 <fungi> clarkb: we've mostly not used graceful because we're in a hurry to restart and builds which get aborted by an abrupt stop will be retried (unless unlucky and already on their third strike)

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!