Thursday, 2020-03-26

openstackgerritMerged zuul/zuul master: bindep: add bzip2 to all platforms  https://review.opendev.org/71504100:18
*** zxiiro has quit IRC00:30
*** ysandeep|away is now known as ysandeep00:43
*** jamesmcarthur has quit IRC00:46
*** tosky has quit IRC00:52
*** jamesmcarthur has joined #zuul01:01
*** rlandy has joined #zuul01:05
*** rlandy has quit IRC01:07
*** jamesmcarthur has quit IRC01:23
*** ysandeep is now known as ysandeep|rover01:25
*** Goneri has quit IRC01:30
*** bhavikdbavishi has joined #zuul02:42
*** swest has quit IRC02:54
*** bhavikdbavishi has quit IRC02:54
*** leathekd has quit IRC02:57
*** swest has joined #zuul03:10
*** jamesmcarthur has joined #zuul03:18
*** jamesmcarthur has quit IRC03:22
*** jamesmcarthur has joined #zuul03:27
*** jamesmcarthur has quit IRC03:33
*** jamesmcarthur has joined #zuul03:34
*** bhavikdbavishi has joined #zuul03:53
*** bhavikdbavishi1 has joined #zuul03:56
*** bhavikdbavishi has quit IRC03:57
*** bhavikdbavishi1 is now known as bhavikdbavishi03:57
*** jamesmcarthur has quit IRC04:23
*** jamesmcarthur has joined #zuul04:26
*** jamesmcarthur has quit IRC04:34
*** jamesmcarthur has joined #zuul04:34
*** jamesmcarthur has quit IRC04:43
*** jamesmcarthur has joined #zuul04:44
*** sgw has quit IRC04:59
*** y2kenny has quit IRC05:09
*** evrardjp has quit IRC05:36
*** evrardjp has joined #zuul05:36
*** reiterative has quit IRC05:40
*** reiterative has joined #zuul05:40
*** jamesmcarthur has quit IRC05:55
*** jamesmcarthur has joined #zuul05:57
*** jamesmcarthur has quit IRC06:02
*** jamesmcarthur has joined #zuul06:07
*** jamesmcarthur has quit IRC06:30
*** dpawlik has joined #zuul07:22
*** bhavikdbavishi has quit IRC07:27
*** ysandeep|rover is now known as ysandeep|rover|l07:35
*** ysandeep|rover|l is now known as ysandeep|roveraf07:35
*** ysandeep|roveraf is now known as ysandeep|rover|l07:36
*** bhavikdbavishi has joined #zuul07:54
*** evrardjp has quit IRC07:57
*** bolg has quit IRC07:59
*** evrardjp has joined #zuul08:01
*** bhavikdbavishi has quit IRC08:04
*** guilhermesp has quit IRC08:06
*** vblando has quit IRC08:06
*** dmellado has quit IRC08:06
*** guilhermesp has joined #zuul08:07
*** dmellado has joined #zuul08:10
*** bolg has joined #zuul08:17
*** bhavikdbavishi has joined #zuul08:19
*** bolg has quit IRC08:26
*** bolg has joined #zuul08:29
*** toabctl has quit IRC08:30
*** bolg has quit IRC08:33
*** ysandeep|rover|l is now known as ysandeep|rover08:35
*** bolg has joined #zuul08:35
*** jpena|off is now known as jpena08:53
*** tosky has joined #zuul09:00
*** toabctl has joined #zuul09:14
*** bhavikdbavishi has quit IRC09:17
*** bhavikdbavishi has joined #zuul09:31
*** wxy-xiyuan has quit IRC09:39
*** wxy-xiyuan has joined #zuul09:39
*** sshnaidm is now known as sshnaidm|afk09:56
openstackgerritTobias Henkel proposed zuul/zuul master: Install unzip on all platforms  https://review.opendev.org/71491910:16
openstackgerritTobias Henkel proposed zuul/zuul master: Update docker based test setup for zk auth  https://review.opendev.org/65709610:18
*** bhavikdbavishi has quit IRC10:27
*** bhavikdbavishi has joined #zuul10:28
*** bhavikdbavishi1 has joined #zuul10:31
*** bhavikdbavishi has quit IRC10:33
*** bhavikdbavishi1 is now known as bhavikdbavishi10:33
*** sugaar has quit IRC10:38
*** sugaar has joined #zuul10:39
*** avass has quit IRC11:14
*** sshnaidm|afk is now known as sshnaidm11:25
*** bhavikdbavishi has quit IRC11:55
tristanCwhen y2kenny comes back, iirc, `bwrap: Can't mount proc` can be caused by either the lack of userns, or that bwrap is not root setuid11:57
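
A rough way to check both of the conditions tristanC mentions on a given host (a minimal sketch; the paths and proc file are typical defaults, not guaranteed on every distro):

    import os
    import shutil
    import stat

    # Is the setuid bit set on the bwrap binary?
    bwrap = shutil.which('bwrap') or '/usr/bin/bwrap'
    print('bwrap setuid:', bool(os.stat(bwrap).st_mode & stat.S_ISUID))

    # Are unprivileged user namespaces available/enabled?
    try:
        with open('/proc/sys/user/max_user_namespaces') as f:
            print('max_user_namespaces:', f.read().strip())
    except FileNotFoundError:
        print('kernel does not expose user namespace limits')
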
*** jpena is now known as jpena|lunch11:58
*** Goneri has joined #zuul12:14
*** rfolco has joined #zuul12:20
*** rlandy has joined #zuul12:23
*** bhavikdbavishi has joined #zuul12:32
zbranyone aware of TypeError: 'NoneType' object is not iterable from zuul_sphinx/zuul.py", line 109, in parse_zuul_d ?12:36
zbrmainly it happened when I removed layout.yaml because jobs were already defined inside projecta.yaml, at https://review.rdoproject.org/r/#/c/26005/12:38
zbrzuul had no problem with that, but tox-docs choked.12:38
zbri will propose a patch12:41
zbri see that the code from master behaves differently, giving an error like: ('File %s in Zuul dir is empty', '/Users/ssbarnea/c/rdo/rdo-jobs/zuul.d/layout.yaml')12:46
mordredzbr: interesting. any reason to not just delete the file instead of removing the contents but leaving it there?12:52
zbrmordred: it was my mistake, i left the file empty.12:53
zbrsorted once i removed the file.12:54
zbrbut it would be a good idea to make a new release of zuul-sphinx, so we can benefit from the better error message12:54
zbrnow i do not know if an empty yaml file can be considered valid or not.12:54
mordredyaml loads empty files as a None object12:56
mordredbut - yeah, if there's a better error message in master that's good12:56
fungiusing yaml.load() on /dev/null returns None, yeah12:57
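
A tiny illustration of the behaviour mordred and fungi describe, using PyYAML directly (this is what made the old zuul-sphinx parse_zuul_d code raise "'NoneType' object is not iterable"):

    import yaml

    # An empty YAML document parses to None, not an empty dict or list.
    print(yaml.safe_load(''))                   # -> None
    print(yaml.safe_load(open('/dev/null')))    # -> None
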
mordredlooks like the error message fix is the only substantive change in zuul-sphinx atm - so maybe an 0.4.212:58
fungiwfm13:00
*** jpena|lunch is now known as jpena13:01
*** y2kenny has joined #zuul13:08
mnaseri'd love feedback on idea how i can test https://review.opendev.org/#/c/715045/113:23
*** bhavikdbavishi has quit IRC13:26
*** bhavikdbavishi has joined #zuul13:27
*** hashar has joined #zuul13:28
*** zxiiro has joined #zuul13:29
*** bhavikdbavishi has quit IRC13:38
*** armstrongs has joined #zuul13:44
mordredzuul-maint: we landed an update to python-builder/python-base to remove the installation of recommends. let me know if you see any image build failures.13:46
*** jcapitao has joined #zuul13:51
*** y2kenny has quit IRC13:58
*** y2kenny has joined #zuul14:03
*** sgw has joined #zuul14:06
openstackgerritMonty Taylor proposed zuul/nodepool master: Pin docker images to 3.7 explicitly  https://review.opendev.org/71504314:08
openstackgerritMonty Taylor proposed zuul/nodepool master: Add libc6-dev to bindep  https://review.opendev.org/71521614:08
mordredmnaser: ^^ speak of the devil :)14:09
mnaseraha14:09
*** ysandeep|rover is now known as ysandeep|away14:10
*** guilhermesp has quit IRC14:27
*** guilhermesp has joined #zuul14:27
openstackgerritMonty Taylor proposed zuul/zuul-registry master: Use versioned python base images  https://review.opendev.org/71522514:28
*** chandan_kumar has joined #zuul14:29
*** jkt has quit IRC14:29
*** chandankumar has quit IRC14:29
*** jkt has joined #zuul14:30
*** sshnaidm has quit IRC14:30
*** corvus has quit IRC14:30
*** sshnaidm has joined #zuul14:31
*** corvus has joined #zuul14:31
corvusmnaser: i don't have a good idea of how to test that; i don't think we can fake out the uri part of it.  i think the only way to test would be a completely synthetic test or adding testing conditionals to the role and supplying test data14:33
*** guilhermesp has quit IRC14:34
*** guilhermesp has joined #zuul14:35
*** guilhermesp has quit IRC14:38
openstackgerritMonty Taylor proposed zuul/zuul master: Be explicit about base container image  https://review.opendev.org/71454914:38
*** guilhermesp has joined #zuul14:38
*** guilhermesp has quit IRC14:39
*** guilhermesp has joined #zuul14:40
*** jamesmcarthur has joined #zuul14:51
*** jamesmcarthur has quit IRC14:59
*** jamesmcarthur has joined #zuul15:00
*** jamesmcarthur_ has joined #zuul15:02
openstackgerritAndreas Jaeger proposed zuul/zuul-website master: Update for OpenDev, https  https://review.opendev.org/71426115:04
*** jamesmcarthur has quit IRC15:07
openstackgerritJan Kubovy proposed zuul/zuul master: WIP: Enforce sql connections for scheduler and web  https://review.opendev.org/63047215:34
openstackgerritJan Kubovy proposed zuul/zuul master: Improve typings in context of 630472  https://review.opendev.org/71524715:34
y2kennySo I was able to set up a nodepool with the k8s provider and it looks to be working (it created some namespaces with a pod in each.)  the nodes show up in the web UI and it looks like the scheduler is using the right scheduler (in cluster, so the zone thing seems to be working.)15:39
y2kennybut when the job is running, the stream log only showed  Job console starting, running ansible setup, pre-runs and post runs15:40
y2kennyand the job failed after 3 attempts15:40
y2kennyso I look into the executor log (I ran it with the debug flag on.)15:41
y2kenny"Unable to start kubectl port forward" exception.  Does that mean I needed an executor image with kubectl installed?15:42
SpamapSy2kenny: IIRC, the streaming was very recently fixed. And yes, the executor absolutely must have kubectl.15:42
y2kennyor does the executor needs the right role binding as well15:42
clarkbexecutors now need socat and kubectl on them15:43
clarkbif you are using the container images I expect they are present though15:44
SpamapShttps://opendev.org/zuul/zuul/commit/2881ee578599b199280c60fb76b5201dd855f41915:44
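
A quick, illustrative way to confirm both tools are on the PATH inside a given executor image:

    import shutil

    for tool in ('kubectl', 'socat'):
        print(tool, '->', shutil.which(tool) or 'MISSING')
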
y2kennyI am using the official executor image from dockerhub.  I know I can extend that image but I am wondering if the executor actually uses the kubectl shell command or the library15:44
y2kennySpamapS: AH15:44
y2kennymissed that one15:44
SpamapSLooks like maybe that hasn't been released yet?15:45
SpamapShm no, 3.18.0 has the fixes, just not the fixed release note. ;)15:45
y2kennyI will give that a try and see how things go15:46
clarkbthe actual rendered version has it properly under 3.18 on the website15:46
fungihttps://zuul-ci.org/docs/zuul/reference/releasenotes.html#relnotes-3-18-015:51
*** chandan_kumar is now known as chandankumar15:51
*** jamesmcarthur_ has quit IRC16:00
*** jamesmcarthur has joined #zuul16:01
mnaseroh16:02
mnasersomething just occurred to me16:02
mnaserdo we have a semaphore or ordering enforced for our promote jobs16:03
mnaserwhat if two promote jobs get queued, does one start before the other?16:03
fungithe promote pipeline uses a supercedent pipeline manager16:04
fungiso only one runs at a time for the same project+branch16:04
clarkband it enforces order16:05
mnaserok cool, just something that came to mind.  i know we have some things we do in 'post' that end up needing a semaphore16:05
fungiright, and any subsequently queued changes supercede one another so only the most recently triggered one is ever waiting at a time16:05
fungibecause also running builds for all the intermediate changes would just be a waste of resources and added delay16:06
*** ysandeep|away is now known as ysandeep16:08
y2kennyquestion about relationship between job/project/branch/pipeline.16:11
y2kennyif all the jobs defined for a pipeline for a project do not run due to no matching branch, what is the result of the triggered pipeline?16:12
y2kennyis that a noop (pass) or just the no-jobs16:13
y2kennyactually, nevermind... I think I found the relevant doc that points to no-jobs16:14
fungiyeah, if there are no jobs zuul doesn't (well, shouldn't anyway, modulo bugs) report16:15
fungiif you want it to report an unconditional passing result, add a noop job for those otherwise jobless branches16:16
y2kennyok16:17
fungiif memory serves, the noop job is special in that it short-circuits immediately without having to even engage an executor16:17
fungiand doesn't need to be explicitly defined anywhere to be used in a project-pipeline16:17
y2kennyunderstood16:20
AJaegerzuul-maint, here's a small update for zuul-website, it changes some links and makes changes for OpenDev, please review https://review.opendev.org/#/c/714261/16:23
fungiAJaeger: rebased for a merge conflict, i guess?16:24
fungireapplied my prior +216:24
*** dpawlik has quit IRC16:26
AJaegerfungi: yes, indeed. Thanks!16:27
AJaegertobiash, thanks for second +2.16:29
AJaegerCould somebody +A the change please or what are we waiting for?16:29
openstackgerritJimmy McArthur proposed zuul/zuul-website master: Adding Infrastructure Donors  https://review.opendev.org/71526116:29
mnaserthis sat for a while: https://review.opendev.org/#/c/713469/ -- if anyone is feeling generous to put reviews too16:30
mnaser:p16:30
y2kennyok so I added the start-zuul-console role and verified that kubectl is in the executor image but I still get the port forwarding error.  One obvious problem is the executor's access to the cluster so I gave the executor pod a clusterrolebinding that should let it do pretty much anything but that didn't seem to work.  Does the executor k8s connection16:32
y2kennywork differently than nodepool?16:32
y2kennylike, do I have to mount in .kube/config?16:32
AJaegerthanks, tobiash !16:32
mnasery2kenny: mind pasting the error?16:33
tobiashAJaeger: I didn't see fungi's review while reviewing it ;)16:33
*** jamesmcarthur has quit IRC16:33
y2kennymnaser:16:33
y2kenny2020-03-26 16:28:59,212 ERROR zuul.AnsibleJob: [e: 7dc1e284988742c08c0721de11a689b2] [build: 287412faaa9c45f4932b9d8b3a8507c0] Unable to start port forward:Traceback (most recent call last):  File "/usr/local/lib/python3.7/site-packages/zuul/executor/server.py", line 1975, in prepareAnsibleFiles    fwd.start()  File16:33
y2kenny"/usr/local/lib/python3.7/site-packages/zuul/executor/server.py", line 361, in start    raise Exception("Unable to start kubectl port forward")Exception: Unable to start kubectl port forward16:33
y2kennyactually... I just verified kubectl works by exec'ing into the pod16:34
AJaegertobiash: ah, parallel reviewing ;)16:34
*** jamesmcarthur has joined #zuul16:34
y2kennyI can do kubectl get nodes and I got the correct data16:34
mnasery2kenny: what if you did "kubectl port-forward" in the zuul container?16:34
y2kennymnaser:  I will give that a try16:34
y2kennyerror: TYPE/NAME and list of ports are required for port-forwardSee 'kubectl port-forward -h' for help and examples.16:35
tobiashy2kenny: is kubectl in a non-default location?16:35
y2kennytobiash: no, I just use the official executor image16:36
*** jamesmcarthur_ has joined #zuul16:36
tobiashy2kenny: zuul runs jobs in a sandbox and doesn't include paths like /opt by default16:36
tobiashhrm16:36
mnasery2kenny: kubectl -n nodepool-ns get pods16:36
mnaserkubectl -n nodepool-ns port-forward pod/some-random-pod16:36
mnaseri dont remember the rest of the syntax16:36
*** jamesmcarthur has quit IRC16:37
y2kennymnaser: ok16:38
y2kennyI wonder if I need to restart nodepool after I restarted the executor16:38
mnasery2kenny: kubectl port-forward pod/name :1988516:38
mnaserand add namespace if needed16:38
mnaserlet me know what that outputs16:38
y2kenny# kubectl -n k8s-auto-0000000260 port-forward ubuntu-bionic-k8s-auto :19885 => Forwarding from 127.0.0.1:40309 -> 1988516:40
openstackgerritMerged zuul/zuul-website master: Update for OpenDev, https  https://review.opendev.org/71426116:40
*** ysandeep is now known as ysandeep|away16:41
*** jcapitao is now known as jcapitao_afk16:42
tobiashcorvus: do you also find exceptions like those in opendev? http://paste.openstack.org/show/791199/16:42
y2kennymnaser: so I think the port forwarding is working inside the executor pod16:43
tobiashthis leads to ignored events and occurs in our system roughly 7 times per 30 days16:43
tobiashclarkb: ^16:43
clarkbtobiash: looking16:43
tobiashthanks16:43
clarkbtobiash: is that the scheduler?16:43
tobiashyes16:44
mnasery2kenny: hmm, interesting16:44
y2kennyI wonder if something was out of sync16:45
mnasery2kenny: it looks like it's failing to do the regex match .. or it's failing to start the port forward16:45
y2kennyI retriggered  the job and seems like the error is not there any more16:45
mnasery2kenny: oh weird, maybe your executors are not all running the same image?16:46
clarkbtobiash: searching todays log showed nothing and now I'm running a zgrep on compressed logs16:46
*** jamesmcarthur_ has quit IRC16:46
tobiashclarkb: k, I had three hits in the last 7 days16:46
y2kennydoes that make a difference?  I only have one executor inside the cluster which I just launched16:46
y2kennythe other executor is outside the cluster and should not be used to talk to k8s nodes16:46
tobiashso not a huge problem, but looks like we have a slight multithreading problem there16:46
y2kennythat one I haven't restarted in a while16:47
*** jamesmcarthur has joined #zuul16:47
y2kennymnaser: actually no, the error is still there (scrolled too fast)16:48
clarkbtobiash: its possible your cpus are faster than ours so you hit races like that more often (zgrep still running with no hits)16:48
mnaserohhh16:48
mnaseri think i know what might have happened y2kenny16:48
mnaser`kubectl_command = 'kubectl'` but we use popen without shell=True16:49
mnaserdon't we have to use a full path?16:49
clarkbI think it will use the PATH of the calling process in that case16:50
clarkbbut ya rooting the path might fix it16:50
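
A small illustration of the PATH point (binary paths are illustrative):

    import subprocess

    # Without shell=True, Popen still resolves a bare program name against
    # the PATH inherited from the calling process, so 'kubectl' is found as
    # long as it is on the executor's PATH.
    subprocess.Popen(['kubectl', 'version', '--client']).wait()

    # "Rooting" the path avoids relying on PATH at all:
    subprocess.Popen(['/usr/local/bin/kubectl', 'version', '--client']).wait()
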
y2kennyanother thing that looks like an error:16:52
y2kennyDEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'packages/ara/plugins/actions/ara_read.py) as it seems to be invalid: module'16:52
*** jamesmcarthur has quit IRC16:53
clarkby2kenny: is there a traceback above that?16:53
clarkb(that looks like the tail end of a traceback)16:53
clarkbbut I think its saying you have enabled ara but don't have it installed properly?16:53
mnasery2kenny: give me a second, i think i might have a repro16:54
y2kennye0c21bb20e604466b27023e42e2c2f93] Ansible output: b'Using /var/lib/zuul/builds/e0c21bb20e604466b27023e42e2c2f93/ansible/setup_playbook/ansible.cfg as config file'2020-03-26 16:45:03,244 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'[WARNING]: provided hosts list is16:54
y2kennyempty, only localhost is available. Note that'2020-03-26 16:45:03,244 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b"the implicit localhost does not match 'all'"2020-03-26 16:45:03,722 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build:16:54
y2kennye0c21bb20e604466b27023e42e2c2f93] Ansible output: b'[WARNING]: Skipping plugin (/usr/local/lib/zuul/ansible/2.8/lib/python3.7/site-'16:54
clarkbah ok in that case I think it's probably fine as is. Because ya it's saying I'm not doing ara things because it's not properly installed. Disabling ara or installing it properly will make the warning go away I think16:54
y2kennyanother one I see under debug as well (that can also be nothing):16:55
y2kenny Ansible output: b"bwrap: Can't mount proc on /newroot/proc: Operation not permitted"16:55
mnasery2kenny: can you run http://paste.openstack.org/show/791201/ like this in the container - python test.py k8s-auto-0000000260 ubuntu-bionic-k8s16:56
mnaserand see what it outputs?16:56
y2kennyit's ok to not use 260 right?  (It got deleted by nodepool I think)16:57
mnaseryeah of course16:57
mnaserwhatever you have running there16:57
clarkby2kenny: tristanC had a note about the bwrap thing. 11:57:04        tristanC | when y2kenny comes back, iirc, `bwrap: Can't mount proc` can be caused by either the lack of userns, or that bwrap is not root setuid16:59
y2kennyhttp://paste.openstack.org/raw/791201/16:59
y2kennyoops16:59
y2kenny# python test.py k8s-auto-0000000273 ubuntu-bionic-k8s => Error: unknown flag: --address; Traceback (most recent call last):  File "test.py", line 32, in <module>    raise Exception("Unable to start kubectl port forward")16:59
mnaserthere we go17:00
clarkby2kenny: for lack of userns I'm not sure of an easy way to check that, but I know rhel/centos/fedora disable by default17:00
clarkby2kenny: suse and ubuntu enable by default iirc17:00
mnasery2kenny: kubectl --version ?17:00
y2kennyclarkb: oooo I am using fedora as the cluster host17:00
mnaserohhh i know what might be happening17:00
clarkby2kenny: in that case I would check if it is setuid as that should solve it for you17:00
mnasery2kenny: can you replace '--address', '127.0.0.1' by ...             '--address=127.0.0.1',17:01
mnaserso you end up with17:01
mnaserhttps://www.irccloud.com/pastebin/CKylsa81/17:01
y2kenny# kubectl versionClient Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-10-10T16:38:01Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4",17:01
y2kennyGitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}17:01
y2kenny(is my cluster too new?)17:01
mnaserno i think its the bug of how we use --address17:02
mnaserit should be --address=127.0.0.1 i think17:02
y2kennythe error is the same17:04
y2kennyunknown flag address17:04
*** bolg has quit IRC17:05
clarkbtobiash: zgrep finished with no hits17:05
y2kennyyea... the kubectl in the image is too old for that17:05
tobiashclarkb: ok, thanks17:05
clarkbtobiash: thats 30 days of logs fwiw17:05
y2kennythe kubectl in the executor image is 1.11 and it does not have the --address flag.  kubectl 1.17 has it17:05
tristanCclarkb: only el7 has userns disabled by default, el8 and fedora should allow it out of the box17:05
mnasery2kenny: i think we're installing a really old kubectl version which doesnt have --address17:05
clarkbtristanC: oh that is good to know, thanks17:06
y2kennymnaser: yes... it's 1.11 in the image17:06
tristanCy2kenny: are you running the executor in kubernetes? and are they using a privileged pod?17:06
mnaserthe issue is kubectl does not have --address cause its old17:06
y2kennytristanC: yes in k8s, no on privileged17:06
tristanCy2kenny: then you need to use privileged pod for bwrap, that is what preventing the proc mounting17:07
y2kennyok.  No need to do additional setting on the host side due to fedora?17:07
tristanCy2kenny: i guess you already disabled cgroupv2 for k8s, or you are not using f31.17:08
y2kennyf30 :)17:08
mnaserwhere are we installing kubectl from?17:10
tristanCy2kenny: upgrading to f31 should work fine by adding `systemd.unified_cgroup_hierarchy=0` to the linux cmdline17:11
y2kennytristanC: understood.17:11
*** bolg has joined #zuul17:11
*** jamesmcarthur has joined #zuul17:12
y2kennyyay!!! \o/17:13
openstackgerritTobias Henkel proposed zuul/zuul master: Protect getCachedChanges from concurrent modification  https://review.opendev.org/71527017:14
y2kennywait... celebrated too early17:14
tobiashclarkb, corvus: This should fix this ^17:14
fungiit's never too early to celebrate17:15
* fungi lives in a constant state of celebration17:15
y2kennylol17:15
openstackgerritJimmy McArthur proposed zuul/zuul-website master: Adding Infrastructure Donors  https://review.opendev.org/71526117:17
*** jcapitao_afk is now known as jcapitao17:17
tobiashclarkb: btw, this exception always happened during high load on our scheduler so might hit us more often when the scheduler is contended17:19
y2kennyso I don't see the bwrap error with privileged (thanks tristanC)17:20
y2kennyand I get more log from the job17:20
tobiashwe're starting to suffer from more stalls during tenant reconfigurations (which are mostly cpu bound)17:20
tobiashso looking forward to scale out scheduler17:20
y2kennybut I still get failure with 3 attempts17:20
y2kennymnaser: does the port forwarding affect job execution? or just log streaming?17:23
*** evrardjp has quit IRC17:49
*** evrardjp has joined #zuul17:49
fungimnaser: are you good with https://review.opendev.org/715261 ?17:49
*** openstack has quit IRC17:49
*** openstack has joined #zuul17:53
*** ChanServ sets mode: +o openstack17:53
*** jcapitao has quit IRC17:53
openstackgerritTobias Henkel proposed zuul/zuul master: Protect getCachedChanges from concurrent modification  https://review.opendev.org/71527017:54
tobiashclarkb: updated for all drivers ^17:54
clarkbthanks!17:54
*** jpena is now known as jpena|off18:02
*** sshnaidm is now known as sshnaidm|afk18:17
*** jamesmcarthur has quit IRC18:20
*** jamesmcarthur has joined #zuul18:21
*** hashar has quit IRC18:21
*** jamesmcarthur has quit IRC18:24
*** jamesmcarthur has joined #zuul18:24
*** jamesmcarthur has quit IRC18:26
*** jamesmcarthur has joined #zuul18:27
*** jamesmcarthur has quit IRC18:36
*** jamesmcarthur has joined #zuul18:37
*** jamesmcarthur has quit IRC18:50
*** jamesmcarthur has joined #zuul18:50
y2kennyI just noticed another error when trying to use the k8s driver with labels.type = namespace:19:41
y2kenny  File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/launcher.py", line 81, in main    return NodePoolLauncherApp.main()  File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 249, in main    return super(NodepoolDaemonApp, cls).main(argv)  File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line19:41
y2kenny196, in main    return cls()._main(argv=argv)  File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 186, in _main    return self._do_run()  File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 230, in _do_run    return super(NodepoolDaemonApp, self)._do_run()  File19:41
y2kenny"/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 192, in _do_run    return self.run()  File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/launcher.py", line 61, in run    config = self.pool.loadConfig()  File "/usr/local/lib/python3.7/site-packages/nodepool/launcher.py", line 925, in loadConfig    config =19:41
y2kennynodepool_config.loadConfig(self.configfile)  File "/usr/local/lib/python3.7/site-packages/nodepool/config.py", line 264, in loadConfig    newconfig.setProviders(config.get('providers'))  File "/usr/local/lib/python3.7/site-packages/nodepool/config.py", line 150, in setProviders    p.load(self)  File19:41
y2kenny"/usr/local/lib/python3.7/site-packages/nodepool/driver/kubernetes/config.py", line 91, in load    pp.load(pool, config)  File "/usr/local/lib/python3.7/site-packages/nodepool/driver/kubernetes/config.py", line 62, in load    full_config.labels[label['name']].pools.append(self)KeyError: 'zuul-nodes'19:41
y2kenny( I have name: zuul-nodes, type: namespace.)19:42
y2kennynot sure if it's my configuration error or a real bug19:42
fungiy2kenny: it may be easier to paste lengthy output like tracebacks somewhere like http://paste.openstack.org/ and then link them in here19:43
y2kennyfungi:  right... sorry about that:19:43
y2kennyhttp://paste.openstack.org/show/791210/19:43
fungithanks! that's a bit more readable at least19:45
Shrewsy2kenny: sort of seems like a config error. make sure you have that label listed in the top-most labels section (https://zuul-ci.org/docs/nodepool/configuration.html#attr-labels)19:45
y2kennyoh right... I forgot about that correspondence.19:46
y2kennysorry about the noise19:46
fungii guess label['name'] is looking up to a value of "zuul-nodes" there and it's not been loaded into the full_config.labels dict19:46
y2kennyI am actually not too sure what this config does so I thought I would play with it a bit.  with labels.type=pod, it creates a min-ready # of namespaces19:47
y2kennyok so it created a namespace as defined by the driver config.  Do you know how zuul will use this, fungi?19:52
fungii'm not familiar with the kubernetes driver, i was mostly trying to reverse engineer the error from the traceback19:54
y2kennyok19:57
y2kennyit's not a big deal, I am just curious on its function.19:57
*** jamesmcarthur has quit IRC20:09
corvusy2kenny: you'll see a namespace either way...20:12
y2kennycorvus: so what I noticed is the listing in the UI->Nodes20:12
corvusy2kenny: if you request a pod, it still gets a namespace so that it can be set up securely, but there will also be a pod, and that's what zuul puts in the inventory.  the namespace isn't really accessible, it's an implementation detail.20:12
corvusy2kenny: if you request a namespace, then you'll get a namespace without a pod, and zuul gets information about how to use the namespace, but nothing in the inventory20:13
y2kennyso for the second case, how would I use the namespace without a pod20:14
corvusy2kenny: the listing in the nodes tab should either be for a pod or a namespace; you shouldn't see an extra namespace there (what i was describing is what you would see if you looked in k8s)20:14
y2kennyso I see additional nodes with labels  zuul-nodes (the name of the namespace in this  case) with connection name  space20:15
y2kennynamespace*20:15
*** jamesmcarthur has joined #zuul20:15
y2kennythe server  is auto generated with -d (in my case my pool name is k8s-auto , so the server is k8s-auto-0000000279 for example)20:16
y2kennyFull line: 0000000279 | zuul-nodes | namespace | k8s-auto-0000000279 | k8s-containers | ready | 23 minutes ago20:16
corvusthat looks right; the next job that requests a "zuul-nodes" label will get that namespace assigned to it20:17
corvusy2kenny: as for using it, info about connecting to it will appear in the zuul.resources ansible variable20:17
y2kennyok.  So it's up to a zuul role to launch something in it20:18
y2kenny?20:18
y2kennyzuul role or ansible role?20:18
corvusy2kenny: yes... tristanC do you know if we have anything to help with that?20:18
tobiashcorvus, AJaeger: zuul py35 jobs are failing since the release of cliff 3.0.0 which seems to require py3620:18
corvusy2kenny: see here: https://zuul-ci.org/docs/zuul/reference/jobs.html#var-zuul.items.resources20:19
corvusy2kenny: that has some examples about how to use it20:19
y2kennyOOOOH20:20
tobiashAJaeger: was it intended to also drop py35 there https://review.opendev.org/705612 ?20:20
corvusy2kenny: zuul will write out the .kube/config file for you, so it should be all ready to go20:20
y2kennyI did not see that bit (I was looking at that page for some other variables.)20:20
corvusy2kenny: we should probably cross-reference that better from the nodepool docs20:20
tristanCcorvus: y2kenny: here is playbook that populate a namespace: https://review.opendev.org/#/c/570669/8/playbooks/openshift/pre.yaml20:21
corvustristanC: ah, great example, thanks!20:21
Shrewsbefore i forget, https://review.opendev.org/#/q/topic:node-attr has some changes y2kenny needs. i can fix up the merge conflict once we merge its parent20:22
corvustobiash: any idea what we use cliff for?20:22
tobiashcorvus: looks to be a transitive dependency20:22
tobiashI don't know yet what pulls it in, maybe openstacksdk?20:22
tobiashI just learned that cliff exists20:23
Shrewsi don't think sdk depends on cliff. i think osc does20:23
y2kennytristanC, corvus: thanks for the examples.20:23
y2kennymnaser, tristan: so I manually upgraded the kubectl inside the executor container image20:24
mnasery2kenny: cool -- how did that work?20:25
*** saneax has quit IRC20:25
mnaserfwiw we should install newer kubectl in our images..20:25
mnaserwe have like 1.11 or some ancient version that doesn't support --address20:25
mordredsdk definitely doesn't depend on cliff - would be interesting to know what's pulling it in20:25
corvusmnaser: what's --address for?20:25
y2kennymnaser: that seems to have solved the exception (I don't see the error in the executor log any more, but I am still not  able to get the pod to do useful work.)20:26
mnasercorvus: it seems you used --address in port-forward to force it to listen to 127.0.0.120:26
mnaserbut it seems like the default value _might_ already be 127.0.0.120:26
corvusoh20:26
mnaserso we could drop it and not have to install newer kubectl20:26
corvuswe get current kubectl from openshift clients20:26
corvusso if we do need to upgrade, we can look at either whether there's a newer openshift, or if we also need to install a dedicated kubectl20:27
corvusbut yeah, maybe dropping address would be better20:27
corvusmnaser, y2kenny: i can try a test without --address (i shoud be able to just manually set up the same kind of port forward) using the current openshift kubectl and see if that's a solution20:28
corvusmordred: any idea the best way to find out what's using cliff?20:28
corvus(pip --graphviz would be great :)20:28
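
Short of a dependency graph, one quick local way to see which installed packages declare a dependency on cliff (assumes Python 3.8+ for importlib.metadata, or the importlib_metadata backport):

    from importlib import metadata

    for dist in metadata.distributions():
        for req in (dist.requires or []):
            if req.lower().startswith('cliff'):
                print(dist.metadata['Name'], 'requires', req)
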
mordredcorvus: we can just look in the tox logs20:28
mordredone sec20:28
mordredthis is zuul or nodepool?20:29
corvustobiash: have a link to a failure online, or you just seeing this locally now?20:29
mnasercorvus: given we use ubuntu, why don't we use the kubeernetes provided deb packages?20:29
tobiashcorvus: https://c7cc7615691360b259ca-b4831d08abece0047714b6befcdb357a.ssl.cf1.rackcdn.com/715270/2/check/tox-py35/8dfd7e8/job-output.txt20:29
corvusmnaser: need openshift clients, and we get kubectl "for free"20:29
AJaegertobiash: yes, dropping py3.5 was intended20:29
corvusmnaser: not a big deal, just seemed like unecessary extra work at the time20:29
mnaseroh, yes, gotcha20:29
openstackgerritTobias Henkel proposed zuul/zuul master: Cap cliff to <3.0.0  https://review.opendev.org/71530320:30
tristanCfwiw, zuul-operator could run kubectl node and console-stream integration test, this initial change is still waiting for review: https://review.opendev.org/71416520:30
mnaseri dont know if the openshift client's port-forward behaviour is to listen on 127.0.0.120:30
corvusmnaser: yeah, i'll test it with the actual openshift kubectl20:30
tobiashI guess then we need to cap it or drop py35 in zuul as well20:30
mordredcorvus: Collecting cliff>=2.8.0 (from stestr>=1.0.0->-r /Users/mordred/src/opendev.org/zuul/zuul/test-requirements.txt (line 6))20:31
AJaegertobiash: that's what all (most?) of oslo did as part of the Ussuri goal20:31
mordredcorvus: stestr20:31
corvustristanC: i was just about to review that when this happened :(20:31
corvusmordred: can we avoid stestr?20:31
corvus(this is the sort of dependency we tried really hard to avoid in zuul and nodepool)20:31
y2kennymnaser: do you know if all the pre and post runs are done on the target node or locally on the executor?20:31
mnasercorvus: cool, i gave y2kenny a small test script and yeah the error he was getting was --address is missing error20:31
mordredcorvus: it would probably be quicker to just fix cliff20:32
corvuslike, i don't understand why stestr would drop 3.5 support20:32
mnasery2kenny: playbooks run on remote node20:32
corvusmordred: AJaeger says it was intentional in cliff20:32
tobiashI guess stestr doesn't want to but didn't cap cliff as well20:32
mordredwell - as the current PTL of cliff, let me return to that question real quick20:32
corvusmordred: i await your proclamation with renewed interest :)-20:32
corvuswow, sorry about the soul patch there, that was a typo20:33
AJaegerthe overall idea was AFAIK to remove py35 unless there was a need to keep it - and there were only a few exceptions. I just updated a few repos20:33
tobiashsince stestr is used more widely than just openstack, that might be such an exception20:34
mordredcorvus: remote:   https://review.opendev.org/715305 Re-add support for python 3.520:34
AJaegerhttps://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html#completion-criteria has "The minimum version of Python now supported by <project> is Python 3.6." as note20:34
mordredcorvus: feel free to +2 that20:34
corvusyeah, i'm getting the idea here that stestr probably didn't intend this and fixing cliff is the correct solution20:34
mordredand I'll request a release as soon as it's landed20:35
y2kennyOh... I think I just caught a problematic line.  For the "prepare-workspace" task, I got "Output suppressed because no_log was given"20:35
AJaegermordred: want to run py35 tests as well?20:35
tobiashI guess that makes sense20:36
mordredAJaeger: probably not a bad idea :)20:36
corvusmordred, AJaeger: +2 on 715305 and i think running py35 tests is reasonable20:36
mordredAJaeger: is there a template? or should I just add in openstack-tox-py35 ?20:36
mordredcorvus, AJaeger: updated with py35 jobs added20:37
AJaegermordred:  openstack-python35-jobs is the template20:37
mordredoh - there's a ... one sec20:37
mordreddone20:38
y2kennywhat is the best way to debug the ansible being run on the target node?  (is there a way to have the executor run the ansible playbook with vvvv?)20:38
AJaegermordred: LGTM20:38
corvusy2kenny: run "zuul-executor verbose" on the executor20:38
corvusy2kenny: it will switch to "-vvv"  (i'm pretty sure it's 3 not 4 v's) for all subsequent jobs20:39
mordredcorvus, tobiash: as soon as that lands I'll get a release cut20:39
tobiashawesome, I guess I can drop the cap then :)20:39
corvusy2kenny: then "zuul-executor unverbose" will restore normal behavior20:39
y2kennycorvus: thanks.   I will give that a try20:39
mordredI've got 2 other releases I need cut today anyway, so my today is mostly watching patches land20:39
tobiashy2kenny: don't forget the unverbose, I once managed to fill the hard disks over lunch :-P20:40
y2kennytobiash: :)20:41
y2kennydo I need to restart the executor?20:41
AJaegermordred: need 90 mins to run20:41
corvusy2kenny: no, if you do, it'll also go back to normal (non-verbose) mode20:42
tobiashno, just execute that command, that will tell the running executor to switch behavior20:42
y2kennyum... ok... so perhaps the log I am looking at is not what I think it is20:42
y2kennyI am looking at the console.log from the web UI20:42
tobiashy2kenny: you need to look at the executor logs then20:42
mordredAJaeger: I'm also waiting on two openstacksdk patches to land so we can cut a release to also fix an issue with declared python support - in this case we forgot to put in a "we only support python3" while adding a python3-only dep20:42
tobiashthose will contain the ansible debug logs20:42
corvusy2kenny: the easiest way to get the right output is to find the build uuid (it's in the url for the build) and grep for that in the executor logs20:43
y2kennyis it a separate file or do they get dump to stdout?20:44
y2kennymy case is easier... still prototyping so there's really just one job :)20:44
y2kennyI see a lot of build: <hash> and e: <hash>20:44
corvusy2kenny: in k8s, should go to stdout, but those might be at info-level only while the ansible stuff is at debug level?  let me check20:44
*** jamesmcarthur has quit IRC20:45
y2kennyI see some DEBUG tag (I am running executor in debug mode) but those are executor debug not ansible debug?20:45
corvusok good, then they should show up there20:45
y2kennyor ansible debug will get propagated to there?20:45
y2kennyok20:45
AJaegermordred: https://review.opendev.org/#/c/715243/6/releasenotes/notes/python-3.5-629817cec092d528.yaml has a typo, doesn't it?20:45
mordredAJaeger: yup!20:46
corvusy2kenny: all of the ansible output that goes to the console log should also end up in the zuul executor debug log, and when in verbose mode, all the *extra* ansible output should end up there too20:46
mnasercorvus: we can drop --address20:46
mnaserroot@c0d83c34bf93:~# kubectl port-forward pod/nginx-deployment-574b87c764-795g5 :80 => Forwarding from 127.0.0.1:34177 -> 8020:46
mnaser(thats in zuul/zuul-executor docker contaienr)20:46
corvusy2kenny: those should all start with "Ansible output: "20:46
mordredAJaeger: thanks - that turns out to be an important typo20:47
corvusmnaser: awesome, thanks, that's more or less what i was going to test but hadn't gotten around to; you want to push the patch or should i?20:47
y2kennycorvus: ok yes!  I see the verbose now20:47
mnasercorvus: i will right now20:47
AJaegermordred: one more typo found - sorry  ;(20:48
AJaegermordred: finally the note makes sense, took me some time20:49
mordredAJaeger: wow20:49
mordredAJaeger: done!20:49
AJaegerthanks20:50
corvustristanC: that integration test looks great; i like the git service -- that keeps it nice and separate from the scheduler pod.20:51
AJaegermordred: I'll add another "." to your note...20:51
openstackgerritMohammed Naser proposed zuul/zuul master: executor: drop --address=127.0.0.1 from kubectl  https://review.opendev.org/71530820:52
tristanCcorvus: nice thanks! I'm adding the nodepool-launcher service to the operator, and i think it will be easy to run the integration test on a kubernetes nodeset20:52
mnasercorvus: ^20:52
y2kennycorvus: ok so the verbose didn't quite help in this particular case because the failing role is the "prepare-workspace" and no_log was turned on (I didn't realize what that no_log means until I looked at the code for the role just now.)20:54
AJaegermordred: pushed a cleanup on your stack as well - https://review.opendev.org/#/c/715309/20:56
corvusy2kenny: oh sorry, i should have caught that20:57
corvusy2kenny: i think that no_log is in there just because it's really verbose otherwise20:57
y2kennyyea... since it's a file sync20:57
y2kennyso the pod image I am using for this is the one from the tutorial20:57
corvusmordred: we should maybe make the no_log a flag, so someone in y2kenny's situation could turn that off in the base job20:57
mordredcorvus: ++20:58
corvusy2kenny: oh, if you're synchronizing to a pod, you'll need to do something different20:58
y2kennyBUT, because I am using the pod type, it's launching the image plain w/o the volume20:58
y2kennyi.e. w/o the ssh key20:58
y2kennyI am guessing that's the issue?20:59
corvusy2kenny: to be clear, this is using a "pod" nodepool type, not "namespace"?20:59
y2kennythat's correct21:00
y2kennyI haven't gone to the namespace usage yet21:00
y2kennybut I have a feeling that's  probably the one more useful to me21:00
y2kennysince I doubt I will have a build/test job that would work right away by launching a plain pod21:01
corvusy2kenny: for a pod, i think you need to use the "prepare-workspace-openshift" role21:01
corvus(yes, it works for k8s)21:01
y2kennyoh!... I will give that a try21:01
y2kennyso just a quick side question, do you guys then have different base/pre.yaml or base jobs for different nodesets?21:02
y2kennyto manage this type of variation?21:02
corvusy2kenny: basically, i think synchronize doesn't work with k8s, so instead, it runs "oc rsync"; that's one of the reasons we automatically include the openshift client on the executor image21:03
y2kennyooo...21:03
* mnaser would ideally like that we use kubernetes-isms instead of openshift-isms in our roles :>21:04
corvusy2kenny: i think that's what tristanC uses in his mixed environments; you can have as many base jobs as you want, so if you've got container jobs, make sure they inherit from the container base job.  another approach would be to extend the base pre-playbook to include either the prepare-workspace or prepare-workspace-openshift role depending on what's in the inventory21:04
corvusmnaser: yes, in this case there's a thing the openshift client supports that kubectl doesn't: rsync21:04
corvusbut it works with both k8s and openshift21:05
corvusso it makes things easy for us21:05
mnaseroh i see21:05
* mnaser just sometimes assumes everyone is running inside containers21:05
corvusi think we talked about duplicating or renaming that role (to reduce confusion) but i don't recall if that made it all the way to a patch yet21:05
*** y2kenny has quit IRC21:06
tristanCcorvus: y2kenny: ftr, here is a base pre that uses different prepare roles depending on the connection: https://pagure.io/fedora-project-config/blob/master/f/playbooks/base/pre.yaml#_1221:06
corvustristanC: yeah, maybe that's the sort of thing we should do in zuul-base-jobs and the zuul docs21:06
tristanCcorvus: well, ideally the prepare-workspace would do the right thing, but there is also the issue with build-sshkey which fails with kubectl21:07
*** y2kenny has joined #zuul21:08
mnaseris someone looking into the tox-py35 failures?21:08
corvustristanC: good point; maybe we can update both of those to protect with "ansible_connection != kubectl"21:08
corvusmnaser: yes, that's the cliff thing from earlier; pending a cliff release21:08
mnaseri can pick it up if someone isnt looking21:08
mnaserah okay, cool21:08
corvusmnaser: https://review.opendev.org/71530521:08
*** jamesmcarthur has joined #zuul21:08
corvusactually, i guess pending that landing then a cliff release21:08
* y2kenny got disconnected due to stupid vpn21:09
mnaserall you missed ^ https://www.irccloud.com/pastebin/GDqUsdr5/21:10
y2kennymnaser: thanks!  those are important bits21:11
mnasernp21:11
* mnaser off to seek food21:11
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add nodepool launcher service initial deployment  https://review.opendev.org/71531021:20
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add nodepool external config  https://review.opendev.org/71531121:20
*** jamesmcarthur has quit IRC21:30
*** jamesmcarthur has joined #zuul21:33
*** tjgresha__ has joined #zuul21:33
*** tjgresha_ has quit IRC21:35
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add nodepool external config  https://review.opendev.org/71531121:48
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Adapt the integration playbook to be usable locally  https://review.opendev.org/71416321:48
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add nodepool labels to integration test  https://review.opendev.org/71531621:48
tristanCcorvus: would you mind checking https://review.opendev.org/714163 too, it's a small re-org that really help to run the test locally21:49
tristanCthen https://review.opendev.org/715316 (and its parents) should demonstrate a working console-stream when using a kubectl connection with mnaser change that removes the faulty `--address` argument21:50
*** jamesmcarthur has quit IRC21:59
*** jamesmcarthur has joined #zuul22:01
mordredtobiash, corvus: cliff change is in the gate22:21
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add nodepool labels to integration test  https://review.opendev.org/71531622:25
*** tjgresha_ has joined #zuul22:35
*** tjgresha__ has quit IRC22:38
*** rfolco has quit IRC22:45
*** jamesmcarthur has quit IRC22:46
*** rfolco has joined #zuul22:46
*** rfolco has quit IRC22:52
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script"  https://review.opendev.org/71532523:00
*** jamesmcarthur has joined #zuul23:02
*** y2kenny has quit IRC23:04
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Remove bashate from test-requirements  https://review.opendev.org/71532823:23
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script"  https://review.opendev.org/71532523:24
tristanCand it seems like install-kubernetes broke, it's now failing with `X The none driver requires conntrack to be installed for kubernetes version 1.18.0`23:26
tristanCcorvus: also it seems like the zuul-operator functional test is a bit flaky: when adding a connection, the scheduler sometimes restarts before the merger, and the cat job fails unexpectedly, resulting in the scheduler not loading the tenant config23:27
corvustristanC: can the operator manage that sequence?23:27
tristanCcorvus: the role is patching the services in order, but it seems like kubernetes sometimes restarts them out of order... here is the task: https://opendev.org/zuul/zuul-operator/src/branch/master/roles/zuul-restart-when-zuul-conf-changed/tasks/main.yaml#L1323:29
tristanCi guess we could add a lockstep between each patch, but wouldn't it be easier if the scheduler simply retried failed cat jobs?23:30
corvustristanC: yes, but only if the merger is actually going to get fixed.  if it's running with the wrong configuration, it'll never succeed23:30
corvusin that case, a hard, fast failure is better23:30
corvusso i'd prefer the operator do what we would ask a human to do, which is restart the services in the correct order23:30
corvusit's probably worth checking that the merger is up before proceeding to the scheduler23:31
corvusor, at least checking that the merger is down :)23:31
tristanCi meant, a simple cat job retry would likely mitigate that failure23:31
corvusdoesn't actually have to be up23:31
corvustristanC: probably not23:31
corvustristanC: imagine a sizable cluster with 20 mergers and 30 executors23:31
corvustristanC: we can retry cat jobs *really fast*; much faster than a scheduler can restart23:32
*** jamesmcarthur has quit IRC23:32
tristanCcorvus: the scheduler cat job failed at 2020-03-26 21:38:16,661 in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e64/715310/1/check/zuul-operator-functional-k8s/e645105/docker/k8s_scheduler_zuul-scheduler-0_default_7f90728a-333d-44d2-8e85-be73693b0f68_0.txt23:32
*** jamesmcarthur has joined #zuul23:33
tristanCcorvus: the merger got started at 2020-03-26 21:38:18,279 in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e64/715310/1/check/zuul-operator-functional-k8s/e645105/docker/k8s_merger_zuul-merger-75d5bd4644-klnhg_default_a6bdc51c-5e50-4a64-ba28-8eba1cbfecb1_0.txt23:33
corvustristanC: right, i get that slowing it down 2 seconds in this test job would fix this race in this test.  but this is a production bug, so we should fix it systemically so this doesn't happen for large installations.23:34
tristanCcorvus: so... what would be the `ready` condition for merger service?23:34
corvustristanC: they don't actually have to be ready, they just have to not be running the old config.23:35
corvustristanC: the easiest thing to do (and what we do with the opendev restart script) is to make sure they're *down*, not up.23:35
tristanCcorvus: oh ok, then that's something we should be able to do23:36
corvushttps://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_restart.yaml23:36
corvus(obvs not k8s, but that's the play structure)23:36
*** jamesmcarthur has quit IRC23:38
tristanCin k8s, i guess we can process the mergers and executors, and make sure they are in the desired state before doing the scheduler.23:42
tristanCsince we use deployment and statefulset, i don't think we can easily manage each pod individually23:42
tristanCperhaps we could delete instead of patching, but then i think we'll lose the persistent volume23:43
tristanCtime to afk for me. have a good rest of the day folks23:45
*** jamesmcarthur has joined #zuul23:53
*** tosky has quit IRC23:53

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!