openstackgerrit | Merged zuul/zuul master: bindep: add bzip2 to all platforms https://review.opendev.org/715041 | 00:18 |
*** zxiiro has quit IRC | 00:30 | |
*** ysandeep|away is now known as ysandeep | 00:43 | |
*** jamesmcarthur has quit IRC | 00:46 | |
*** tosky has quit IRC | 00:52 | |
*** jamesmcarthur has joined #zuul | 01:01 | |
*** rlandy has joined #zuul | 01:05 | |
*** rlandy has quit IRC | 01:07 | |
*** jamesmcarthur has quit IRC | 01:23 | |
*** ysandeep is now known as ysandeep|rover | 01:25 | |
*** Goneri has quit IRC | 01:30 | |
*** bhavikdbavishi has joined #zuul | 02:42 | |
*** swest has quit IRC | 02:54 | |
*** bhavikdbavishi has quit IRC | 02:54 | |
*** leathekd has quit IRC | 02:57 | |
*** swest has joined #zuul | 03:10 | |
*** jamesmcarthur has joined #zuul | 03:18 | |
*** jamesmcarthur has quit IRC | 03:22 | |
*** jamesmcarthur has joined #zuul | 03:27 | |
*** jamesmcarthur has quit IRC | 03:33 | |
*** jamesmcarthur has joined #zuul | 03:34 | |
*** bhavikdbavishi has joined #zuul | 03:53 | |
*** bhavikdbavishi1 has joined #zuul | 03:56 | |
*** bhavikdbavishi has quit IRC | 03:57 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:57 | |
*** jamesmcarthur has quit IRC | 04:23 | |
*** jamesmcarthur has joined #zuul | 04:26 | |
*** jamesmcarthur has quit IRC | 04:34 | |
*** jamesmcarthur has joined #zuul | 04:34 | |
*** jamesmcarthur has quit IRC | 04:43 | |
*** jamesmcarthur has joined #zuul | 04:44 | |
*** sgw has quit IRC | 04:59 | |
*** y2kenny has quit IRC | 05:09 | |
*** evrardjp has quit IRC | 05:36 | |
*** evrardjp has joined #zuul | 05:36 | |
*** reiterative has quit IRC | 05:40 | |
*** reiterative has joined #zuul | 05:40 | |
*** jamesmcarthur has quit IRC | 05:55 | |
*** jamesmcarthur has joined #zuul | 05:57 | |
*** jamesmcarthur has quit IRC | 06:02 | |
*** jamesmcarthur has joined #zuul | 06:07 | |
*** jamesmcarthur has quit IRC | 06:30 | |
*** dpawlik has joined #zuul | 07:22 | |
*** bhavikdbavishi has quit IRC | 07:27 | |
*** ysandeep|rover is now known as ysandeep|rover|l | 07:35 | |
*** ysandeep|rover|l is now known as ysandeep|roveraf | 07:35 | |
*** ysandeep|roveraf is now known as ysandeep|rover|l | 07:36 | |
*** bhavikdbavishi has joined #zuul | 07:54 | |
*** evrardjp has quit IRC | 07:57 | |
*** bolg has quit IRC | 07:59 | |
*** evrardjp has joined #zuul | 08:01 | |
*** bhavikdbavishi has quit IRC | 08:04 | |
*** guilhermesp has quit IRC | 08:06 | |
*** vblando has quit IRC | 08:06 | |
*** dmellado has quit IRC | 08:06 | |
*** guilhermesp has joined #zuul | 08:07 | |
*** dmellado has joined #zuul | 08:10 | |
*** bolg has joined #zuul | 08:17 | |
*** bhavikdbavishi has joined #zuul | 08:19 | |
*** bolg has quit IRC | 08:26 | |
*** bolg has joined #zuul | 08:29 | |
*** toabctl has quit IRC | 08:30 | |
*** bolg has quit IRC | 08:33 | |
*** ysandeep|rover|l is now known as ysandeep|rover | 08:35 | |
*** bolg has joined #zuul | 08:35 | |
*** jpena|off is now known as jpena | 08:53 | |
*** tosky has joined #zuul | 09:00 | |
*** toabctl has joined #zuul | 09:14 | |
*** bhavikdbavishi has quit IRC | 09:17 | |
*** bhavikdbavishi has joined #zuul | 09:31 | |
*** wxy-xiyuan has quit IRC | 09:39 | |
*** wxy-xiyuan has joined #zuul | 09:39 | |
*** sshnaidm is now known as sshnaidm|afk | 09:56 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Install unzip on all platforms https://review.opendev.org/714919 | 10:16 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Update docker based test setup for zk auth https://review.opendev.org/657096 | 10:18 |
*** bhavikdbavishi has quit IRC | 10:27 | |
*** bhavikdbavishi has joined #zuul | 10:28 | |
*** bhavikdbavishi1 has joined #zuul | 10:31 | |
*** bhavikdbavishi has quit IRC | 10:33 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 10:33 | |
*** sugaar has quit IRC | 10:38 | |
*** sugaar has joined #zuul | 10:39 | |
*** avass has quit IRC | 11:14 | |
*** sshnaidm|afk is now known as sshnaidm | 11:25 | |
*** bhavikdbavishi has quit IRC | 11:55 | |
tristanC | when y2kenny comes back, iirc, `bwrap: Can't mount proc` can be caused by either the lack of userns, or that bwrap is not root setuid | 11:57 |
*** jpena is now known as jpena|lunch | 11:58 | |
*** Goneri has joined #zuul | 12:14 | |
*** rfolco has joined #zuul | 12:20 | |
*** rlandy has joined #zuul | 12:23 | |
*** bhavikdbavishi has joined #zuul | 12:32 | |
zbr | anyone aware of TypeError: 'NoneType' object is not iterable from zuul_sphinx/zuul.py", line 109, in parse_zuul_d ? | 12:36 |
zbr | mainly it happened when I removed layout.yaml because jobs were already defined inside projecta.yaml, at https://review.rdoproject.org/r/#/c/26005/ | 12:38 |
zbr | zuul had no problem with that, but tox-docs choked. | 12:38 |
zbr | i will propose a patch | 12:41 |
zbr | i see that the code from master behaves differently, giving an error like: ('File %s in Zuul dir is empty', '/Users/ssbarnea/c/rdo/rdo-jobs/zuul.d/layout.yaml') | 12:46 |
mordred | zbr: interesting. any reason to not just delete the file instead of removing the contents but leaving it there? | 12:52 |
zbr | mordred: it was my mistake, i left the file empty. | 12:53 |
zbr | sorted once i removed the file. | 12:54 |
zbr | but it would be a good idea to make a new release of zuul-sphinx, so we can benefit from the better error message | 12:54 |
zbr | now i do not know if an empty yaml file can be considered valid or not. | 12:54 |
mordred | yaml loads empty files as a None object | 12:56 |
mordred | but - yeah, if there's a better error message in master that's good | 12:56 |
fungi | using yaml.load() on /dev/null returns None, yeah | 12:57 |
mordred | looks like the error message fix is the only substantive change in zuul-sphinx atm - so maybe an 0.4.2 | 12:58 |
fungi | wfm | 13:00 |
*** jpena|lunch is now known as jpena | 13:01 | |
*** y2kenny has joined #zuul | 13:08 | |
mnaser | i'd love feedback on ideas for how i can test https://review.opendev.org/#/c/715045/1 | 13:23 |
*** bhavikdbavishi has quit IRC | 13:26 | |
*** bhavikdbavishi has joined #zuul | 13:27 | |
*** hashar has joined #zuul | 13:28 | |
*** zxiiro has joined #zuul | 13:29 | |
*** bhavikdbavishi has quit IRC | 13:38 | |
*** armstrongs has joined #zuul | 13:44 | |
mordred | zuul-maint: we landed an update to python-builder/python-base to remove the installation of recommends. let me know if you see any image build failures. | 13:46 |
*** jcapitao has joined #zuul | 13:51 | |
*** y2kenny has quit IRC | 13:58 | |
*** y2kenny has joined #zuul | 14:03 | |
*** sgw has joined #zuul | 14:06 | |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Pin docker images to 3.7 explicitly https://review.opendev.org/715043 | 14:08 |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Add libc6-dev to bindep https://review.opendev.org/715216 | 14:08 |
mordred | mnaser: ^^ speak of the devil :) | 14:09 |
mnaser | aha | 14:09 |
*** ysandeep|rover is now known as ysandeep|away | 14:10 | |
*** guilhermesp has quit IRC | 14:27 | |
*** guilhermesp has joined #zuul | 14:27 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-registry master: Use versioned python base images https://review.opendev.org/715225 | 14:28 |
*** chandan_kumar has joined #zuul | 14:29 | |
*** jkt has quit IRC | 14:29 | |
*** chandankumar has quit IRC | 14:29 | |
*** jkt has joined #zuul | 14:30 | |
*** sshnaidm has quit IRC | 14:30 | |
*** corvus has quit IRC | 14:30 | |
*** sshnaidm has joined #zuul | 14:31 | |
*** corvus has joined #zuul | 14:31 | |
corvus | mnaser: i don't have a good idea of how to test that; i don't think we can fake out the uri part of it. i think the only way to test would be a completely synthetic test or adding testing conditionals to the role and supplying test data | 14:33 |
*** guilhermesp has quit IRC | 14:34 | |
*** guilhermesp has joined #zuul | 14:35 | |
*** guilhermesp has quit IRC | 14:38 | |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Be explicit about base container image https://review.opendev.org/714549 | 14:38 |
*** guilhermesp has joined #zuul | 14:38 | |
*** guilhermesp has quit IRC | 14:39 | |
*** guilhermesp has joined #zuul | 14:40 | |
*** jamesmcarthur has joined #zuul | 14:51 | |
*** jamesmcarthur has quit IRC | 14:59 | |
*** jamesmcarthur has joined #zuul | 15:00 | |
*** jamesmcarthur_ has joined #zuul | 15:02 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-website master: Update for OpenDev, https https://review.opendev.org/714261 | 15:04 |
*** jamesmcarthur has quit IRC | 15:07 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: WIP: Enforce sql connections for scheduler and web https://review.opendev.org/630472 | 15:34 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Improve typings in context of 630472 https://review.opendev.org/715247 | 15:34 |
y2kenny | So I was able to set up nodepool with the k8s provider and it looks to be working (it created some namespaces with a pod in each.) The nodes show up in the web UI and it looks like the scheduler is using the right executor (in cluster, so the zone thing seems to be working.) | 15:39 |
y2kenny | but when the job is running, the stream log only showed Job console starting, running ansible setup, pre-runs and post-runs | 15:40 |
y2kenny | and the job failed after 3 attempts | 15:40 |
y2kenny | so I looked into the executor log (I ran it with the debug flag on.) | 15:41 |
y2kenny | "Unable to start kubectl port forward" exception. Does that mean I needed an executor image with kubectl installed? | 15:42 |
SpamapS | y2kenny: IIRC, the streaming was very recently fixed. And yes, the executor absolutely must have kubectl. | 15:42 |
y2kenny | or does the executor needs the right role binding as well | 15:42 |
clarkb | executors now need socat and kubectl on them | 15:43 |
clarkb | if you are using the container images I expect they are present though | 15:44 |
SpamapS | https://opendev.org/zuul/zuul/commit/2881ee578599b199280c60fb76b5201dd855f419 | 15:44 |
y2kenny | I am using the official executor image from dockerhub. I know I can extend that image but I am wondering if the executor actually uses the kubectl shell command or the library | 15:44 |
y2kenny | SpamapS: AH | 15:44 |
y2kenny | missed that one | 15:44 |
SpamapS | Looks like maybe that hasn't been released yet? | 15:45 |
SpamapS | hm no, 3.18.0 has the fixes, just not the fixed release note. ;) | 15:45 |
y2kenny | I will give that a try and see how things go | 15:46 |
clarkb | the actual rendered version has it properly under 3.18 on the website | 15:46 |
fungi | https://zuul-ci.org/docs/zuul/reference/releasenotes.html#relnotes-3-18-0 | 15:51 |
*** chandan_kumar is now known as chandankumar | 15:51 | |
*** jamesmcarthur_ has quit IRC | 16:00 | |
*** jamesmcarthur has joined #zuul | 16:01 | |
mnaser | oh | 16:02 |
mnaser | something just occurred to me | 16:02 |
mnaser | do we have a semaphore or ordering enforced for our promote jobs | 16:03 |
mnaser | what if two promote jobs get queued, does one start before the other? | 16:03 |
fungi | the promote pipeline uses a supercedent pipeline manager | 16:04 |
fungi | so only one runs at a time for the same project+branch | 16:04 |
clarkb | and it enforces order | 16:05 |
mnaser | ok cool, just something that came to mind. i know we have some things we do in 'post' that end up needing a semaphore | 16:05 |
fungi | right, and any subsequently queued changes supercede one another so only the most recently triggered one is ever waiting at a time | 16:05 |
fungi | because also running builds for all the intermediate changes would just be a waste of resources and added delay | 16:06 |
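The supercedent behaviour fungi and clarkb describe is a per-pipeline setting. A minimal sketch of such a pipeline definition (the name, precedence and trigger here are illustrative, not OpenDev's exact promote pipeline):

```yaml
- pipeline:
    name: promote
    description: Publish artifacts for changes that have merged.
    manager: supercedent
    precedence: high
    trigger:
      gerrit:
        - event: change-merged
```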
*** ysandeep|away is now known as ysandeep | 16:08 | |
y2kenny | question about relationship between job/project/branch/pipeline. | 16:11 |
y2kenny | if all the jobs defined in a pipeline for a project do not run due to no matching branch, what is the result of the triggered pipeline? | 16:12 |
y2kenny | is that a noop (pass) or just no-jobs | 16:13 |
y2kenny | actually, nevermind... I think I found the relevant doc that points to no-jobs | 16:14 |
fungi | yeah, if there are no jobs zuul doesn't (well, shouldn't anyway, modulo bugs) report | 16:15 |
fungi | if you want it to report an unconditional passing result, add a noop job for those otherwise jobless branches | 16:16 |
y2kenny | ok | 16:17 |
fungi | if memory serves, the noop job is special in that it short-circuits immediately without having to even engage an executor | 16:17 |
fungi | and doesn't need to be explicitly defined anywhere to be used in a project-pipeline | 16:17 |
y2kenny | understood | 16:20 |
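The noop approach fungi describes looks roughly like this in a project stanza (project and pipeline names are illustrative); the built-in noop job needs no definition and reports success immediately:

```yaml
- project:
    name: example/project
    check:
      jobs:
        - noop
```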
AJaeger | zuul-maint, here's a small update for zuul-website, it changes some links and makes changes for OpenDev, please review https://review.opendev.org/#/c/714261/ | 16:23 |
fungi | AJaeger: rebased for a merge conflict, i guess? | 16:24 |
fungi | reapplied my prior +2 | 16:24 |
*** dpawlik has quit IRC | 16:26 | |
AJaeger | fungi: yes, indeed. Thanks! | 16:27 |
AJaeger | tobiash, thanks for second +2. | 16:29 |
AJaeger | Could somebody +A the change please or what are we waiting for? | 16:29 |
openstackgerrit | Jimmy McArthur proposed zuul/zuul-website master: Adding Infrastructure Donors https://review.opendev.org/715261 | 16:29 |
mnaser | this sat for a while: https://review.opendev.org/#/c/713469/ -- if anyone is feeling generous to put reviews too | 16:30 |
mnaser | :p | 16:30 |
y2kenny | ok so I added the start-zuul-console role and verified that kubectl is in the executor image but I still get port forwarding error. One obvious problem is the executor's access to the cluster so I gave the executor pod a clusterrolebinding that should let it do pretty much anything but that didn't seem to work. Does the executor k8s connection | 16:32 |
y2kenny | work differently than nodepool? | 16:32 |
y2kenny | like, do I have to mount in .kube/config? | 16:32 |
AJaeger | thanks, tobiash ! | 16:32 |
mnaser | y2kenny: mind pasting the error? | 16:33 |
tobiash | AJaeger: I didn't see fungi's review while reviewing it ;) | 16:33 |
*** jamesmcarthur has quit IRC | 16:33 | |
y2kenny | mnaser: | 16:33 |
y2kenny | 2020-03-26 16:28:59,212 ERROR zuul.AnsibleJob: [e: 7dc1e284988742c08c0721de11a689b2] [build: 287412faaa9c45f4932b9d8b3a8507c0] Unable to start port forward: | 16:33 |
y2kenny | Traceback (most recent call last): | 16:33 |
y2kenny |   File "/usr/local/lib/python3.7/site-packages/zuul/executor/server.py", line 1975, in prepareAnsibleFiles | 16:33 |
y2kenny |     fwd.start() | 16:33 |
y2kenny |   File "/usr/local/lib/python3.7/site-packages/zuul/executor/server.py", line 361, in start | 16:33 |
y2kenny |     raise Exception("Unable to start kubectl port forward") | 16:33 |
y2kenny | Exception: Unable to start kubectl port forward | 16:33 |
y2kenny | actually... I just verified kubectl works by exec into the pod | 16:34 |
AJaeger | tobiash: ah, parallel reviewing ;) | 16:34 |
*** jamesmcarthur has joined #zuul | 16:34 | |
y2kenny | I can do kubectl get nodes and I got the correct data | 16:34 |
mnaser | y2kenny: what if you did "kubectl port-forward" in the zuul container? | 16:34 |
y2kenny | mnaser: I will give that a try | 16:34 |
y2kenny | error: TYPE/NAME and list of ports are required for port-forward | 16:35 |
y2kenny | See 'kubectl port-forward -h' for help and examples. | 16:35 |
tobiash | y2kenny: is kubectl in a non-default location? | 16:35 |
y2kenny | tobiash: no, I just use the official executor image | 16:36 |
*** jamesmcarthur_ has joined #zuul | 16:36 | |
tobiash | y2kenny: zuul runs jobs in a sandbox and doesn't include paths like /opt by default | 16:36 |
tobiash | hrm | 16:36 |
mnaser | y2kenny: kubectl -n nodepool-ns get pods | 16:36 |
mnaser | kubectl -n nodepool-ns port-forward pod/some-random-pod | 16:36 |
mnaser | i dont remember the rest of the syntax | 16:36 |
*** jamesmcarthur has quit IRC | 16:37 | |
y2kenny | mnaser: ok | 16:38 |
y2kenny | I wonder if I need to restart nodepool after I restarted the executor | 16:38 |
mnaser | y2kenny: kubectl port-forward pod/name :19885 | 16:38 |
mnaser | and add namespace if needed | 16:38 |
mnaser | let me know what that outputs | 16:38 |
y2kenny | # kubectl -n k8s-auto-0000000260 port-forward ubuntu-bionic-k8s-auto :19885 | 16:40 |
y2kenny | Forwarding from 127.0.0.1:40309 -> 19885 | 16:40 |
openstackgerrit | Merged zuul/zuul-website master: Update for OpenDev, https https://review.opendev.org/714261 | 16:40 |
*** ysandeep is now known as ysandeep|away | 16:41 | |
*** jcapitao is now known as jcapitao_afk | 16:42 | |
tobiash | corvus: do you also find exceptions like those in opendev? http://paste.openstack.org/show/791199/ | 16:42 |
y2kenny | mnaser: so I think the port forwarding is working inside the executor pod | 16:43 |
tobiash | this leads to ignored events and occurs in our system roughly 7 times per 30 days | 16:43 |
tobiash | clarkb: ^ | 16:43 |
clarkb | tobiash: looking | 16:43 |
tobiash | thanks | 16:43 |
clarkb | tobiash: is that the scheduler? | 16:43 |
tobiash | yes | 16:44 |
mnaser | y2kenny: hmm, interesting | 16:44 |
y2kenny | I wonder if something was out of sync | 16:45 |
mnaser | y2kenny: it looks like it's failing to do the regex match .. or its failing to start the port forward | 16:45 |
y2kenny | I retriggered the job and seems like the error is not there any more | 16:45 |
mnaser | y2kenny: oh weird, maybe your executors are not all running the same image? | 16:46 |
clarkb | tobiash: searching todays log showed nothing and now I'm running a zgrep on compressed logs | 16:46 |
*** jamesmcarthur_ has quit IRC | 16:46 | |
tobiash | clarkb: k, I had three hits in the last 7 days | 16:46 |
y2kenny | does that make a difference? I only have one executor inside the cluster which I just launched | 16:46 |
y2kenny | the other executor is outside the cluster and should not be used to talk to k8s nodes | 16:46 |
tobiash | so not a huge problem, but looks like we have a slight multithreading problem there | 16:46 |
y2kenny | that one I haven't restarted in a while | 16:47 |
*** jamesmcarthur has joined #zuul | 16:47 | |
y2kenny | mnaser: actually no, the error is still there (scrolled too fast) | 16:48 |
clarkb | tobiash: its possible your cpus are faster than ours so you hit races like that more often (zgrep still running with no hits) | 16:48 |
mnaser | ohhh | 16:48 |
mnaser | i think i know what might have happened y2kenny | 16:48 |
mnaser | `kubectl_command = 'kubectl'` but we use popen without shell=True | 16:49 |
mnaser | don't we have to use a full path? | 16:49 |
clarkb | I think it will use the PATH of the calling process in that case | 16:50 |
clarkb | but ya rooting the path might fix it | 16:50 |
y2kenny | other things that seem to look like an error: | 16:52 |
y2kenny | DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'packages/ara/plugins/actions/ara_read.py) as it seems to be invalid: module' | 16:52 |
*** jamesmcarthur has quit IRC | 16:53 | |
clarkb | y2kenny: is there a traceback above that? | 16:53 |
clarkb | (that looks like the tail end of a traceback) | 16:53 |
clarkb | but I think its saying you have enabled ara but don't have it installed properly? | 16:53 |
mnaser | y2kenny: give me a second, i think i might have a repro | 16:54 |
y2kenny | e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'Using /var/lib/zuul/builds/e0c21bb20e604466b27023e42e2c2f93/ansible/setup_playbook/ansible.cfg as config file' | 16:54 |
y2kenny | 2020-03-26 16:45:03,244 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'[WARNING]: provided hosts list is empty, only localhost is available. Note that' | 16:54 |
y2kenny | 2020-03-26 16:45:03,244 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b"the implicit localhost does not match 'all'" | 16:54 |
y2kenny | 2020-03-26 16:45:03,722 DEBUG zuul.AnsibleJob.output: [e: 21093c09377d436283954954e32f886a] [build: e0c21bb20e604466b27023e42e2c2f93] Ansible output: b'[WARNING]: Skipping plugin (/usr/local/lib/zuul/ansible/2.8/lib/python3.7/site-' | 16:54 |
clarkb | ah ok in that case I think it's probably fine as is. Because ya it's saying I'm not doing ara things because it's not properly installed. Disabling ara or installing it properly will make the warning go away I think | 16:54 |
y2kenny | another one I see under debug as well (that can also be nothing): | 16:55 |
y2kenny | Ansible output: b"bwrap: Can't mount proc on /newroot/proc: Operation not permitted" | 16:55 |
mnaser | y2kenny: can you run http://paste.openstack.org/show/791201/ like this in the container - python test.py k8s-auto-0000000260 ubuntu-bionic-k8s | 16:56 |
mnaser | and see what it outputs? | 16:56 |
y2kenny | it's ok to not use 260 right? (It got deleted by nodepool I think) | 16:57 |
mnaser | yeah of course | 16:57 |
mnaser | whatever you have running there | 16:57 |
clarkb | y2kenny: tristanC had a note about the bwrap thing. 11:57:04 tristanC | when y2kenny comes back, iirc, `bwrap: Can't mount proc` can be caused by either the lack of userns, or that bwrap is not root setuid | 16:59 |
y2kenny | http://paste.openstack.org/raw/791201/ | 16:59 |
y2kenny | oops | 16:59 |
y2kenny | # python test.py k8s-auto-0000000273 ubuntu-bionic-k8s | 16:59 |
y2kenny | Error: unknown flag: --address | 16:59 |
y2kenny | Traceback (most recent call last): | 16:59 |
y2kenny |   File "test.py", line 32, in <module> | 16:59 |
y2kenny |     raise Exception("Unable to start kubectl port forward") | 16:59 |
mnaser | there we go | 17:00 |
clarkb | y2kenny: for lack of userns I'm not sure of an easy way to check that, but I know rhel/centos/fedora disable by default | 17:00 |
clarkb | y2kenny: suse and ubuntu enable by default iirc | 17:00 |
mnaser | y2kenny: kubectl --version ? | 17:00 |
y2kenny | clarkb: oooo I am using fedora as the cluster host | 17:00 |
mnaser | ohhh i know what might be happening | 17:00 |
clarkb | y2kenny: in that case I would check if it is setuid as that should solve it for you | 17:00 |
mnaser | y2kenny: can you replace '--address', '127.0.0.1' by ... '--address=127.0.0.1', | 17:01 |
mnaser | so you end up with | 17:01 |
mnaser | https://www.irccloud.com/pastebin/CKylsa81/ | 17:01 |
y2kenny | # kubectl version | 17:01 |
y2kenny | Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-10-10T16:38:01Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"} | 17:01 |
y2kenny | Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"} | 17:01 |
y2kenny | (is my cluster too new?) | 17:01 |
mnaser | no i think its the bug of how we use --address | 17:02 |
mnaser | it should be --address=127.0.0.1 i think | 17:02 |
y2kenny | the error is the same | 17:04 |
y2kenny | unknown flag address | 17:04 |
*** bolg has quit IRC | 17:05 | |
clarkb | tobiash: zgrep finished with no hits | 17:05 |
y2kenny | yea... the kubectl in the image is too old for that | 17:05 |
tobiash | clarkb: ok, thanks | 17:05 |
clarkb | tobiash: thats 30 days of logs fwiw | 17:05 |
y2kenny | the kubectl in the executor image is 1.11 and it does not have the --address flag. kubectl 1.17 has it | 17:05 |
tristanC | clarkb: only el7 has userns disabled by default, el8 and fedora should allow it out of the box | 17:05 |
mnaser | y2kenny: i think we're installing a really old kubectl version which doesnt have --address | 17:05 |
clarkb | tristanC: oh that is good to know, thanks | 17:06 |
y2kenny | mnaser: yes... it's 1.11 in the image | 17:06 |
tristanC | y2kenny: are you running the executor in kubernetes? and are they using a privileged pod? | 17:06 |
mnaser | the issue is kubectl does not have --address cause its old | 17:06 |
y2kenny | tristanC: yes in k8s, no on privileged | 17:06 |
tristanC | y2kenny: then you need to use a privileged pod for bwrap, that is what's preventing the proc mounting | 17:07 |
y2kenny | ok. No need to do additional settings on the host side due to fedora? | 17:07 |
tristanC | y2kenny: i guess you already disabled cgroupv2 for k8s, or you are not using f31. | 17:08 |
y2kenny | f30 :) | 17:08 |
mnaser | where are we installing kubectl from? | 17:10 |
tristanC | y2kenny: upgrading to f31 should work fine by adding `systemd.unified_cgroup_hierarchy=0` to the linux cmdline | 17:11 |
y2kenny | tristanC: understood. | 17:11 |
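A sketch of the relevant fragment of an executor pod spec for what tristanC suggests: running the executor container privileged lets bwrap mount /proc inside its sandbox (the container name and image tag are illustrative):

```yaml
spec:
  containers:
    - name: executor
      image: docker.io/zuul/zuul-executor:latest
      securityContext:
        privileged: true
```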
*** bolg has joined #zuul | 17:11 | |
*** jamesmcarthur has joined #zuul | 17:12 | |
y2kenny | yay!!! \o/ | 17:13 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Protect getCachedChanges from concurrent modification https://review.opendev.org/715270 | 17:14 |
y2kenny | wait... celebrated too early | 17:14 |
tobiash | clarkb, corvus: This should fix this ^ | 17:14 |
fungi | it's never too early to celebrate | 17:15 |
* fungi lives in a constant state of celebration | 17:15 | |
y2kenny | lol | 17:15 |
openstackgerrit | Jimmy McArthur proposed zuul/zuul-website master: Adding Infrastructure Donors https://review.opendev.org/715261 | 17:17 |
*** jcapitao_afk is now known as jcapitao | 17:17 | |
tobiash | clarkb: btw, this exception always happened during high load on our scheduler so might hit us more often when the scheduler is contended | 17:19 |
y2kenny | so I don't see the bwrap error with privileged (thanks tristanC) | 17:20 |
y2kenny | and I get more log from the job | 17:20 |
tobiash | we're starting to suffer from more stalls during tenant reconfigurations (which are mostly cpu bound) | 17:20 |
tobiash | so looking forward to scale out scheduler | 17:20 |
y2kenny | but I still get failure with 3 attempts | 17:20 |
y2kenny | mnaser: does the port forwarding affect job execution? or just log streaming? | 17:23 |
*** evrardjp has quit IRC | 17:49 | |
*** evrardjp has joined #zuul | 17:49 | |
fungi | mnaser: are you good with https://review.opendev.org/715261 ? | 17:49 |
*** openstack has quit IRC | 17:49 | |
*** openstack has joined #zuul | 17:53 | |
*** ChanServ sets mode: +o openstack | 17:53 | |
*** jcapitao has quit IRC | 17:53 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Protect getCachedChanges from concurrent modification https://review.opendev.org/715270 | 17:54 |
tobiash | clarkb: updated for all drivers ^ | 17:54 |
clarkb | thanks! | 17:54 |
*** jpena is now known as jpena|off | 18:02 | |
*** sshnaidm is now known as sshnaidm|afk | 18:17 | |
*** jamesmcarthur has quit IRC | 18:20 | |
*** jamesmcarthur has joined #zuul | 18:21 | |
*** hashar has quit IRC | 18:21 | |
*** jamesmcarthur has quit IRC | 18:24 | |
*** jamesmcarthur has joined #zuul | 18:24 | |
*** jamesmcarthur has quit IRC | 18:26 | |
*** jamesmcarthur has joined #zuul | 18:27 | |
*** jamesmcarthur has quit IRC | 18:36 | |
*** jamesmcarthur has joined #zuul | 18:37 | |
*** jamesmcarthur has quit IRC | 18:50 | |
*** jamesmcarthur has joined #zuul | 18:50 | |
y2kenny | I just noticed another error when trying to use the k8s driver with labels.type = namespace: | 19:41 |
y2kenny | File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/launcher.py", line 81, in main return NodePoolLauncherApp.main() File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 249, in main return super(NodepoolDaemonApp, cls).main(argv) File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line | 19:41 |
y2kenny | 196, in main return cls()._main(argv=argv) File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 186, in _main return self._do_run() File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 230, in _do_run return super(NodepoolDaemonApp, self)._do_run() File | 19:41 |
y2kenny | "/usr/local/lib/python3.7/site-packages/nodepool/cmd/__init__.py", line 192, in _do_run return self.run() File "/usr/local/lib/python3.7/site-packages/nodepool/cmd/launcher.py", line 61, in run config = self.pool.loadConfig() File "/usr/local/lib/python3.7/site-packages/nodepool/launcher.py", line 925, in loadConfig config = | 19:41 |
y2kenny | nodepool_config.loadConfig(self.configfile) File "/usr/local/lib/python3.7/site-packages/nodepool/config.py", line 264, in loadConfig newconfig.setProviders(config.get('providers')) File "/usr/local/lib/python3.7/site-packages/nodepool/config.py", line 150, in setProviders p.load(self) File | 19:41 |
y2kenny | "/usr/local/lib/python3.7/site-packages/nodepool/driver/kubernetes/config.py", line 91, in load pp.load(pool, config) File "/usr/local/lib/python3.7/site-packages/nodepool/driver/kubernetes/config.py", line 62, in load full_config.labels[label['name']].pools.append(self)KeyError: 'zuul-nodes' | 19:41 |
y2kenny | ( I have name: zuul-nodes, type: namespace.) | 19:42 |
y2kenny | not sure if it's my configuration error or a real bug | 19:42 |
fungi | y2kenny: it may be easier to paste lengthy output like tracebacks somewhere like http://paste.openstack.org/ and then link them in here | 19:43 |
y2kenny | fungi: right... sorry about that: | 19:43 |
y2kenny | http://paste.openstack.org/show/791210/ | 19:43 |
fungi | thanks! that's a bit more readable at least | 19:45 |
Shrews | y2kenny: sort of seems like a config error. make sure you have that label listed in the top-most labels section (https://zuul-ci.org/docs/nodepool/configuration.html#attr-labels) | 19:45 |
y2kenny | oh right... I forgot about that correspondence. | 19:46 |
y2kenny | sorry about the noise | 19:46 |
fungi | i guess label['name'] is looking up to a value of "zuul-nodes" there and it's not been loaded into the full_config.labels dict | 19:46 |
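In other words, the pool label also has to be declared in the top-level labels section. A minimal sketch of the kubernetes driver configuration (provider and pool names mirror y2kenny's setup but are illustrative, and connection details are omitted):

```yaml
labels:
  - name: zuul-nodes

providers:
  - name: k8s-containers
    driver: kubernetes
    pools:
      - name: k8s-auto
        labels:
          - name: zuul-nodes    # must match an entry in the top-level labels list
            type: namespace
```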
y2kenny | I am actually not too sure what this config does so I thought I would play with it a bit. With labels.type=pod, it creates min-ready # of namespaces | 19:47 |
y2kenny | ok so it created a namespace as defined by the driver config. Do you know how zuul will use this, fungi? | 19:52 |
fungi | i'm not familiar with the kubernetes driver, i was mostly trying to reverse engineer the error from the traceback | 19:54 |
y2kenny | ok | 19:57 |
y2kenny | it's not a big deal, I am just curious on its function. | 19:57 |
*** jamesmcarthur has quit IRC | 20:09 | |
corvus | y2kenny: you'll see a namespace either way... | 20:12 |
y2kenny | corvus: so what I noticed is the listing in the UI->Nodes | 20:12 |
corvus | y2kenny: if you request a pod, it still gets a namespace so that it can be set up securely, but there will also be a pod, and that's what zuul puts in the inventory. the namespace isn't really accessible, it's an implementation detail. | 20:12 |
corvus | y2kenny: if you request a namespace, then you'll get a namespace without a pod, and zuul gets information about how to use the namespace, but nothing in the inventory | 20:13 |
y2kenny | so for the second case, how would I use the namespace without a pod | 20:14 |
corvus | y2kenny: the listing in the nodes tab should either be for a pod or a namespace; you shouldn't see an extra namespace there (what i was describing is what you would see if you looked in k8s) | 20:14 |
y2kenny | so I see additional nodes with label zuul-nodes (the name of the namespace in this case) with connection namespace | 16:15 |
*** jamesmcarthur has joined #zuul | 20:15 | |
y2kenny | the server is auto generated with -d (in my case my pool name is k8s-auto , so the server is k8s-auto-0000000279 for example) | 20:16 |
y2kenny | Full line: 0000000279  zuul-nodes  namespace  k8s-auto-0000000279  k8s-containers  ready  23 minutes ago | 16:16 |
corvus | that looks right; the next job that requests a "zuul-nodes" label will get that namespace assigned to it | 20:17 |
corvus | y2kenny: as for using it, info about connecting to it will appear in the zuul.resources ansible variable | 20:17 |
y2kenny | ok. So it's up to a zuul role to launch something in it | 20:18 |
y2kenny | ? | 20:18 |
y2kenny | zuul role or ansible role? | 20:18 |
corvus | y2kenny: yes... tristanC do you know if we have anything to help with that? | 20:18 |
tobiash | corvus, AJaeger: zuul py35 jobs are failing since the release of cliff 3.0.0 which seems to require py36 | 20:18 |
corvus | y2kenny: see here: https://zuul-ci.org/docs/zuul/reference/jobs.html#var-zuul.items.resources | 20:19 |
corvus | y2kenny: that has some examples about how to use it | 20:19 |
y2kenny | OOOOH | 20:20 |
tobiash | AJaeger: was it intended to also drop py35 there https://review.opendev.org/705612 ? | 20:20 |
corvus | y2kenny: zuul will write out the .kube/config file for you, so it should be all ready to go | 20:20 |
y2kenny | I did not see that bit (I was looking at that page for some other variables.) | 20:20 |
corvus | y2kenny: we should probably cross-reference that better from the nodepool docs | 20:20 |
tristanC | corvus: y2kenny: here is playbook that populate a namespace: https://review.opendev.org/#/c/570669/8/playbooks/openshift/pre.yaml | 20:21 |
corvus | tristanC: ah, great example, thanks! | 20:21 |
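A rough sketch of what using such a namespace from a job playbook can look like, following the zuul.resources documentation linked above (the resource key assumes the nodeset names the node zuul-nodes; the pod definition is purely illustrative):

```yaml
- hosts: localhost
  tasks:
    - name: Create a pod in the namespace allocated by nodepool
      k8s:
        state: present
        context: "{{ zuul.resources['zuul-nodes'].context }}"
        namespace: "{{ zuul.resources['zuul-nodes'].namespace }}"
        definition:
          apiVersion: v1
          kind: Pod
          metadata:
            name: builder
          spec:
            containers:
              - name: builder
                image: docker.io/library/ubuntu:18.04
                command: ["sleep", "infinity"]
```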
Shrews | before i forget, https://review.opendev.org/#/q/topic:node-attr has some changes y2kenny needs. i can fix up the merge conflict once we merge its parent | 20:22 |
corvus | tobiash: any idea what we use cliff for? | 20:22 |
tobiash | corvus: looks to be a transitive dependency | 20:22 |
tobiash | I don't know yet what pulls it in, maybe openstacksdk? | 20:22 |
tobiash | I just learned that cliff exists | 20:23 |
Shrews | i don't think sdk depends on cliff. i think osc does | 20:23 |
y2kenny | tristanC, corvus: thanks for the examples. | 20:23 |
y2kenny | mnaser, tristan: so I manually upgraded the kubectl inside the executor container image | 20:24 |
mnaser | y2kenny: cool -- how did that work? | 20:25 |
*** saneax has quit IRC | 20:25 | |
mnaser | fwiw we should install newer kubectl in our images.. | 20:25 |
mnaser | we have like 1.11 or some ancient version that doesn't support --address | 20:25 |
mordred | sdk definitely doesn't depend on cliff - would be interesting to know what's pulling it in | 20:25 |
corvus | mnaser: what's --address for? | 20:25 |
y2kenny | mnaser: that seems to have solved the exception (I don't see the error in the executor log any more, but I am still not able to get the pod to do useful work.) | 20:26 |
mnaser | corvus: it seems you used --address in port-forward to force it to listen to 127.0.0.1 | 20:26 |
mnaser | but it seems like the default value _might_ already be 127.0.0.1 | 20:26 |
corvus | oh | 20:26 |
mnaser | so we could drop it and not have to install newer kubectl | 20:26 |
corvus | we get current kubectl from openshift clients | 20:26 |
corvus | so if we do need to upgrade, we can look at either whether there's a newer openshift, or if we also need to install a dedicated kubectl | 20:27 |
corvus | but yeah, maybe dropping address would be better | 20:27 |
corvus | mnaser, y2kenny: i can try a test without --address (i shoud be able to just manually set up the same kind of port forward) using the current openshift kubectl and see if that's a solution | 20:28 |
corvus | mordred: any idea the best way to find out what's using cliff? | 20:28 |
corvus | (pip --graphviz would be great :) | 20:28 |
mordred | corvus: we can just look in the tox logs | 20:28 |
mordred | one sec | 20:28 |
mordred | this is zuul or nodepool? | 20:29 |
corvus | tobiash: have a link to a failure online, or you just seeing this locally now? | 20:29 |
mnaser | corvus: given we use ubuntu, why don't we use the kubeernetes provided deb packages? | 20:29 |
tobiash | corvus: https://c7cc7615691360b259ca-b4831d08abece0047714b6befcdb357a.ssl.cf1.rackcdn.com/715270/2/check/tox-py35/8dfd7e8/job-output.txt | 20:29 |
corvus | mnaser: need openshift clients, and we get kubectl "for free" | 20:29 |
AJaeger | tobiash: yes, dropping py3.5 was intended | 20:29 |
corvus | mnaser: not a big deal, just seemed like unnecessary extra work at the time | 20:29 |
mnaser | oh, yes, gotcha | 20:29 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Cap cliff to <3.0.0 https://review.opendev.org/715303 | 20:30 |
tristanC | fwiw, zuul-operator could run kubectl node and console-stream integration test, this initial change is still waiting for review: https://review.opendev.org/714165 | 20:30 |
mnaser | i don't know if the openshift client's port-forward behaviour is to listen on 127.0.0.1 by default | 20:30 |
corvus | mnaser: yeah, i'll test it with the actual openshift kubectl | 20:30 |
tobiash | I guess then we need to cap it or drop py35 in zuul as well | 20:30 |
mordred | corvus: Collecting cliff>=2.8.0 (from stestr>=1.0.0->-r /Users/mordred/src/opendev.org/zuul/zuul/test-requirements.txt (line 6)) | 20:31 |
AJaeger | tobiash: that's what all (most?) of oslo did as part of the Ussuri goal | 20:31 |
mordred | corvus: stestr | 20:31 |
corvus | tristanC: i was just about to review that when this happened :( | 20:31 |
corvus | mordred: can we avoid stestr? | 20:31 |
corvus | (this is the sort of dependency we tried really hard to avoid in zuul and nodepool) | 20:31 |
y2kenny | mnaser: do you know if all the pre- and post-runs are done on the target node or locally on the executor? | 20:31 |
mnaser | corvus: cool, i gave y2kenny a small test script and yeah the error he was getting was --address is missing error | 20:31 |
mordred | corvus: it would probably be quicker to just fix cliff | 20:32 |
corvus | like, i don't understand why stestr would drop 3.5 support | 20:32 |
mnaser | y2kenny: playbooks run on remote node | 20:32 |
corvus | mordred: AJaeger says it was intentional in cliff | 20:32 |
tobiash | I guess stestr doesn't want to but didn't cap cliff as well | 20:32 |
mordred | well - as the current PTL of cliff, let me return to that question real quick | 20:32 |
corvus | mordred: i await your proclamation with renewed interest :)- | 20:32 |
corvus | wow, sorry about the soul patch there, that was a typo | 20:33 |
AJaeger | overall idea was AFAIK to remove py35 unless there was a need to keep it - and there were few exceptions only. I just updated a few repos | 20:33 |
tobiash | since stestr is wider spread than openstack that might be such an exception | 20:34 |
mordred | corvus: remote: https://review.opendev.org/715305 Re-add support for python 3.5 | 20:34 |
AJaeger | https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html#completion-criteria has "The minimum version of Python now supported by <project> is Python 3.6." as note | 20:34 |
mordred | corvus: feel free to +2 that | 20:34 |
corvus | yeah, i'm getting the idea here that stestr probably didn't intend this and fixing cliff is the correct solution | 20:34 |
mordred | and I'll request a release as soon as it's landed | 20:35 |
y2kenny | Oh... I think I just caught a problematic line. For the "prepare-workspace" task, I got "Output suppressed because no_log was given" | 20:35 |
AJaeger | mordred: want to run py35 tests as well? | 20:35 |
tobiash | I guess that makes sense | 20:36 |
mordred | AJaeger: probably not a bad idea :) | 20:36 |
corvus | mordred, AJaeger: +2 on 715305 and i think running py35 tests is reasonable | 20:36 |
mordred | AJaeger: is there a template? or should I just add in openstack-tox-py35 ? | 20:36 |
mordred | corvus, AJaeger: updated with py35 jobs added | 20:37 |
AJaeger | mordred: openstack-python35-jobs is the template | 20:37 |
mordred | oh - there's a ... one sec | 20:37 |
mordred | done | 20:38 |
y2kenny | what is the best way to debug the ansible being run on the target node? (is there a way to have the executor run the ansible playbook with vvvv?) | 20:38 |
AJaeger | mordred: LGTM | 20:38 |
corvus | y2kenny: run "zuul-executor verbose" on the executor | 20:38 |
corvus | y2kenny: it will switch to "-vvv" (i'm pretty sure it's 3 not 4 v's) for all subsequent jobs | 20:39 |
mordred | corvus, tobiash: as soon as that lands I'll get a release cut | 20:39 |
tobiash | awesome, I guess I can drop the cap then :) | 20:39 |
corvus | y2kenny: then "zuul-executor unverbose" will restore normal behavior | 20:39 |
y2kenny | corvus: thanks. I will give that a try | 20:39 |
mordred | I've got 2 other releases I need cut today anyway, so my today is mostly watching patches land | 20:39 |
tobiash | y2kenny: don't forget the unverbose, I once managed to fill the hard disks over lunch :-P | 20:40 |
y2kenny | tobiash: :) | 20:41 |
y2kenny | do I need to restart the executor? | 20:41 |
AJaeger | mordred: need 90 mins to run | 20:41 |
corvus | y2kenny: no, if you do, it'll also go back to normal (non-verbose) mode | 20:42 |
tobiash | no, just execute that command, that will tell the running executor to switch behavior | 20:42 |
y2kenny | um... ok... so perhaps the log I am looking at is not what I think it is | 20:42 |
y2kenny | I am looking at the console.log from the web UI | 20:42 |
tobiash | y2kenny: you need to look at the executor logs then | 20:42 |
mordred | AJaeger: I'm also waiting on two openstacksdk patches to land so we can cut a release to also fix an issue with declared python support - in this case we forgot to put in a "we only support python3" while adding a python3-only dep | 20:42 |
tobiash | those will contain the ansible debug logs | 20:42 |
corvus | y2kenny: the easiest way to get the right output is to find the build uuid (it's in the url for the build) and grep for that in the executor logs | 20:43 |
y2kenny | is it a separate file or do they get dumped to stdout? | 20:44 |
y2kenny | my case is easier... still prototyping so there's really just one job :) | 20:44 |
y2kenny | I see a lot of build: <hash> and e: <hash> | 20:44 |
corvus | y2kenny: in k8s, should go to stdout, but those might be at info-level only while the ansible stuff is at debug level? let me check | 20:44 |
*** jamesmcarthur has quit IRC | 20:45 | |
y2kenny | I see some DEBUG tag (I am running executor in debug mode) but those are executor debug not ansible debug? | 20:45 |
corvus | ok good, then they should show up there | 20:45 |
y2kenny | or ansible debug will get propagated to there? | 20:45 |
y2kenny | ok | 20:45 |
AJaeger | mordred: https://review.opendev.org/#/c/715243/6/releasenotes/notes/python-3.5-629817cec092d528.yaml has a typo, doesn't it? | 20:45 |
mordred | AJaeger: yup! | 20:46 |
corvus | y2kenny: all of the ansible output that goes to the console log should also end up in the zuul executor debug log, and when in verbose mode, all the *extra* ansible output should end up there too | 20:46 |
mnaser | corvus: we can drop --address | 20:46 |
mnaser | root@c0d83c34bf93:~# kubectl port-forward pod/nginx-deployment-574b87c764-795g5 :80 => Forwarding from 127.0.0.1:34177 -> 80 | 20:46 |
mnaser | (that's in the zuul/zuul-executor docker container) | 20:46 |
corvus | y2kenny: those should all start with "Ansible output: " | 20:46 |
mordred | AJaeger: thanks - that turns out to be an important typo | 20:47 |
corvus | mnaser: awesome, thanks, that's more or less what i was going to test but hadn't gotten around to; you want to push the patch or should i? | 20:47 |
y2kenny | corvus: ok yes! I see the verbose now | 20:47 |
mnaser | corvus: i will right now | 20:47 |
AJaeger | mordred: one more typo found - sorry ;( | 20:48 |
AJaeger | mordred: finally the note makes sense, took me some time | 20:49 |
mordred | AJaeger: wow | 20:49 |
mordred | AJaeger: done! | 20:49 |
AJaeger | thanks | 20:50 |
corvus | tristanC: that integration test looks great; i like the git service -- that keeps it nice and separate from the scheduler pod. | 20:51 |
AJaeger | mordred: I'll add another "." to your note... | 20:51 |
openstackgerrit | Mohammed Naser proposed zuul/zuul master: executor: drop --address=127.0.0.1 from kubectl https://review.opendev.org/715308 | 20:52 |
tristanC | corvus: nice thanks! I'm adding the nodepool-launcher service to the operator, and i think it will be easy to run the integration test on a kubernetes nodeset | 20:52 |
mnaser | corvus: ^ | 20:52 |
y2kenny | corvus: ok so the verbose didn't quite help in this particular case because the failing role is the "prepare-workspace" and no_log was turned on (I didn't realize what that no_log means until I looked at the code for the role just now.) | 20:54 |
AJaeger | mordred: pushed a cleanup on your stack as well - https://review.opendev.org/#/c/715309/ | 20:56 |
corvus | y2kenny: oh sorry, i should have caught that | 20:57 |
corvus | y2kenny: i think that no_log is in there just because it's really verbose otherwise | 20:57 |
y2kenny | yea... since it's a file sync | 20:57 |
y2kenny | so the pod image I am using for this is the one from the tutorial | 20:57 |
corvus | mordred: we should maybe make the no_log a flag, so someone in y2kenny's situation could turn that off in the base job | 20:57 |
mordred | corvus: ++ | 20:58 |
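A hypothetical sketch of what such a flag could look like in the role's copy task (the prepare_workspace_quiet variable name is made up here, and the synchronize parameters are simplified relative to the real role):

```yaml
- name: Copy prepared repos from the executor to the node
  synchronize:
    src: "{{ zuul.executor.src_root }}/"
    dest: "{{ ansible_user_dir }}/src/"
  # hypothetical flag; the role currently hard-codes no_log
  no_log: "{{ prepare_workspace_quiet | default(true) }}"
```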
corvus | y2kenny: oh, if you're synchronizing to a pod, you'll need to do something different | 20:58 |
y2kenny | BUT, because I am using the pod type, it's launching the image plain w/o the volume | 20:58 |
y2kenny | i.e. w/o the ssh key | 20:58 |
y2kenny | I am guessing that's the issue? | 20:59 |
corvus | y2kenny: to be clear, this is using a "pod" nodepool type, not "namespace"? | 20:59 |
y2kenny | that's correct | 21:00 |
y2kenny | I haven't gone to the namespace usage yet | 21:00 |
y2kenny | but I have a feeling that's probably the one more useful to me | 21:00 |
y2kenny | since I doubt I will have a build/test job that would work right away by launching a plain pod | 21:01 |
corvus | y2kenny: for a pod, i think you need to use the "prepare-workspace-openshift" role | 21:01 |
corvus | (yes, it works for k8s) | 21:01 |
y2kenny | oh!... I will give that a try | 21:01 |
y2kenny | so just a quick side question, do you guys then have different base/pre.yaml or base job for different nodeset? | 21:02 |
y2kenny | to manage this type of variation? | 21:02 |
corvus | y2kenny: basically, i think synchronize doesn't work with k8s, so instead, it runs "oc rsync"; that's one of the reasons we automatically include the openshift client on the executor image | 21:03 |
y2kenny | ooo... | 21:03 |
* mnaser would ideally like that we use kubernetes-isms instead of openshift-isms in our roles :> | 21:04 | |
corvus | y2kenny: i think that's what tristanC uses in his mixed environments; you can have as many base jobs as you want, so if you've got container jobs, make sure they inherit from the container base job. another approach would be to extend the base pre-playbook to include either the prepare-workspace or prepare-workspace-openshift role depending on what's in the inventory | 21:04 |
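The first approach corvus mentions, a dedicated base job for container nodesets, could look roughly like this (job, playbook, and label names are illustrative):

```yaml
- job:
    name: base-container
    parent: null
    description: Base job for jobs that run in a pod.
    pre-run: playbooks/base-container/pre.yaml    # uses prepare-workspace-openshift
    post-run: playbooks/base-container/post.yaml

- job:
    name: unit-tests-in-pod
    parent: base-container
    run: playbooks/unit-tests.yaml
    nodeset:
      nodes:
        - name: pod
          label: ubuntu-bionic-k8s
```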
corvus | mnaser: yes, in this case there's a thing the openshift client supports that kubectl doesn't: rsync | 21:04 |
corvus | but it works with both k8s and openshift | 21:05 |
corvus | so it makes things easy for us | 21:05 |
mnaser | oh i see | 21:05 |
* mnaser just sometimes assumes everyone is running inside containers | 21:05 | |
corvus | i think we talked about duplicating or renaming that role (to reduce confusion) but i don't recall if that made it all the way to a patch yet | 21:05 |
*** y2kenny has quit IRC | 21:06 | |
tristanC | corvus: y2kenny: ftr, here is a base pre that use different prepare roles depending of the connection: https://pagure.io/fedora-project-config/blob/master/f/playbooks/base/pre.yaml#_12 | 21:06 |
corvus | tristanC: yeah, maybe that's the sort of thing we should do in zuul-base-jobs and the zuul docs | 21:06 |
tristanC | corvus: well, ideally the prepare-workspace would do the right thing, but there is also the issue with build-sshkey which fails with kubectl | 21:07 |
*** y2kenny has joined #zuul | 21:08 | |
mnaser | is someone looking into the tox-py35 failures? | 21:08 |
corvus | tristanC: good point; maybe we can update both of those to protect with "ansible_connection != kubectl" | 21:08 |
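The second approach, a single base pre-run that picks the prepare role based on the connection type along the lines of tristanC's linked playbook, would look something like this (the role names are the existing zuul-jobs roles; the structure is a sketch, not the linked playbook verbatim):

```yaml
- hosts: all
  tasks:
    - name: Prepare the workspace on container nodes
      include_role:
        name: prepare-workspace-openshift
      when: ansible_connection == 'kubectl'

    - name: Prepare the workspace on regular nodes
      include_role:
        name: prepare-workspace
      when: ansible_connection != 'kubectl'
```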
corvus | mnaser: yes, that's the cliff thing from earlier; pending a cliff release | 21:08 |
mnaser | i can pick it up if someone isnt looking | 21:08 |
mnaser | ah okay, cool | 21:08 |
corvus | mnaser: https://review.opendev.org/715305 | 21:08 |
*** jamesmcarthur has joined #zuul | 21:08 | |
corvus | actually, i guess pending that landing then a cliff release | 21:08 |
* y2kenny got disconnected due to stupid vpn | 21:09 | |
mnaser | all you missed ^ https://www.irccloud.com/pastebin/GDqUsdr5/ | 21:10 |
y2kenny | mnaser: thanks! those are important bits | 21:11 |
mnaser | np | 21:11 |
* mnaser off to seek food | 21:11 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool launcher service initial deployment https://review.opendev.org/715310 | 21:20 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool external config https://review.opendev.org/715311 | 21:20 |
*** jamesmcarthur has quit IRC | 21:30 | |
*** jamesmcarthur has joined #zuul | 21:33 | |
*** tjgresha__ has joined #zuul | 21:33 | |
*** tjgresha_ has quit IRC | 21:35 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool external config https://review.opendev.org/715311 | 21:48 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Adapt the integration playbook to be usable locally https://review.opendev.org/714163 | 21:48 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool labels to integration test https://review.opendev.org/715316 | 21:48 |
tristanC | corvus: would you mind checking https://review.opendev.org/714163 too, it's a small re-org that really helps to run the test locally | 21:49 |
tristanC | then https://review.opendev.org/715316 (and its parents) should demonstrate a working console-stream when using a kubectl connection with mnaser's change that removes the faulty `--address` argument | 21:50 |
*** jamesmcarthur has quit IRC | 21:59 | |
*** jamesmcarthur has joined #zuul | 22:01 | |
mordred | tobiash, corvus: cliff change is in the gate | 22:21 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add nodepool labels to integration test https://review.opendev.org/715316 | 22:25 |
*** tjgresha_ has joined #zuul | 22:35 | |
*** tjgresha__ has quit IRC | 22:38 | |
*** rfolco has quit IRC | 22:45 | |
*** jamesmcarthur has quit IRC | 22:46 | |
*** rfolco has joined #zuul | 22:46 | |
*** rfolco has quit IRC | 22:52 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 23:00 |
*** jamesmcarthur has joined #zuul | 23:02 | |
*** y2kenny has quit IRC | 23:04 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Remove bashate from test-requirements https://review.opendev.org/715328 | 23:23 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 23:24 |
tristanC | and it seems like install-kubernetes broke, it's now failing with `X The none driver requires conntrack to be installed for kubernetes version 1.18.0` | 23:26 |
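One way to unblock jobs using the install-kubernetes role would be to install the missing dependency before the role runs; a sketch, assuming the node's package manager provides a conntrack package:

```yaml
- hosts: all
  tasks:
    - name: Install conntrack, required by minikube's none driver for k8s 1.18
      package:
        name: conntrack
      become: true
```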
tristanC | corvus: also it seems like the zuul-operator functional test is a bit flaky; it seems like when adding a connection, the scheduler sometimes restarts before the merger, and the cat job fails unexpectedly, resulting in the scheduler not loading the tenant config | 23:27 |
corvus | tristanC: can the operator manage that sequence? | 23:27 |
tristanC | corvus: the role is patching the services in order, but it seems like kubernetes sometimes restarts the services out of order... here is the task: https://opendev.org/zuul/zuul-operator/src/branch/master/roles/zuul-restart-when-zuul-conf-changed/tasks/main.yaml#L13 | 23:29 |
tristanC | i guess we could add a lockstep between each patch, but wouldn't it be easier if the scheduler simply retried failed cat jobs? | 23:30 |
corvus | tristanC: yes, but only if the merger is actually going to get fixed. if it's running with the wrong configuration, it'll never succeed | 23:30 |
corvus | in that case, a hard, fast failure is better | 23:30 |
corvus | so i'd prefer the operator do what we would ask a human to do, which is restart the services in the correct order | 23:30 |
corvus | it's probably worth checking that the merger is up before proceeding to the scheduler | 23:31 |
corvus | or, at least checking that the merger is down :) | 23:31 |
tristanC | i meant, a simple cat job retry would likely mitigate that failure | 23:31 |
corvus | doesn't actually have to be up | 23:31 |
corvus | tristanC: probably not | 23:31 |
corvus | tristanC: imagine a sizable cluster with 20 mergers and 30 executors | 23:31 |
corvus | tristanC: we can retry cat jobs *really fast*; much faster than a scheduler can restart | 23:32 |
*** jamesmcarthur has quit IRC | 23:32 | |
tristanC | corvus: the scheduler cat job failed at 2020-03-26 21:38:16,661 in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e64/715310/1/check/zuul-operator-functional-k8s/e645105/docker/k8s_scheduler_zuul-scheduler-0_default_7f90728a-333d-44d2-8e85-be73693b0f68_0.txt | 23:32 |
*** jamesmcarthur has joined #zuul | 23:33 | |
tristanC | corvus: the merger got started at 2020-03-26 21:38:18,279 in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e64/715310/1/check/zuul-operator-functional-k8s/e645105/docker/k8s_merger_zuul-merger-75d5bd4644-klnhg_default_a6bdc51c-5e50-4a64-ba28-8eba1cbfecb1_0.txt | 23:33 |
corvus | tristanC: right, i get that slowing it down 2 seconds in this test job would fix this race in this test. but this is a production bug, so we should fix it systemically so this doesn't happen for large installations. | 23:34 |
tristanC | corvus: so... what would be the `ready` condition for merger service? | 23:34 |
corvus | tristanC: they don't actually have to be ready, they just have to not be running the old config. | 23:35 |
corvus | tristanC: the easiest thing to do (and what we do with the opendev restart script) is to make sure they're *down*, not up. | 23:35 |
tristanC | corvus: oh ok, then that's something we should be able to do | 23:36 |
corvus | https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_restart.yaml | 23:36 |
corvus | (obvs not k8s, but that's the play structure) | 23:36 |
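Roughly the play structure of the linked playbook, applied to the ordering discussed here: make sure mergers (and executors) are down before the scheduler comes back up with the new config. Host group and service names are illustrative, not OpenDev's exact play:

```yaml
- hosts: merger
  tasks:
    - name: Stop mergers so none of them serves the old configuration
      service:
        name: zuul-merger
        state: stopped
      become: true

- hosts: executor
  tasks:
    - name: Stop executors for the same reason
      service:
        name: zuul-executor
        state: stopped
      become: true

- hosts: scheduler
  tasks:
    - name: Restart the scheduler only once the other components are down
      service:
        name: zuul-scheduler
        state: restarted
      become: true
```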
*** jamesmcarthur has quit IRC | 23:38 | |
tristanC | in k8s, i guess we can process the mergers and executors, and make sure they are in the desired state before doing the scheduler. | 23:42 |
tristanC | since we use deployment and statefulset, i don't think we can easily manage each pod individually | 23:42 |
tristanC | perhaps we could delete instead of patching, but then i think we'll lose the persistent volume | 23:43 |
tristanC | time to afk for me. have a good rest of the day folks | 23:45 |
*** jamesmcarthur has joined #zuul | 23:53 | |
*** tosky has quit IRC | 23:53 |