@y2kenny:matrix.org | with required-project/override-checkout, what is the expected behaviour if I specify full ref like "refs/heads/some-release" or "refs/tags/some-release-tag"? | 00:48 |
---|---|---|
@y2kenny:matrix.org | Seems like Zuul is not able to understand the full ref and just checks out the default silently. | 01:07 |
@jim:acmegating.com | i believe it is not expecting the full ref | 01:14 |
@jim:acmegating.com | it strips `refs/*/` from the start of any branches and tags when it inventories the refs in the repo for use by override-checkout | 01:15 |
@y2kenny:matrix.org | I did some experiments and it doesn't seem to strip refs/*/ | 01:16 |
@y2kenny:matrix.org | it seems to not be able to see the ref and just checks out the default | 01:16 |
@y2kenny:matrix.org | like if I specify refs/heads/release1, the override will just checkout master | 01:22 |
@y2kenny:matrix.org | This would also be an issue if branch and tag have the same name (not saying that's a good practice... but it can happen.) | 01:23 |
@jim:acmegating.com | i agree, it doesn't strip refs/*/, i was trying to say it expects you to do that. | 01:29 |
@jim:acmegating.com | it strips it on the comparison side, not the input | 01:29 |
@jim:acmegating.com | (so if your git repo has refs/heads/foo then set override-checkout: foo) | 01:30 |
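For reference, a minimal sketch of that configuration; the project and branch names here are illustrative assumptions, not taken from the discussion above:

```yaml
- job:
    name: release-test
    required-projects:
      # hypothetical project; override-checkout expects the short ref name
      - name: example/some-project
        override-checkout: some-release   # not refs/heads/some-release
```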
@y2kenny:matrix.org | ok. If there are refs/heads/foo and refs/tags/foo, is there a way to remedy the ambiguity? | 01:31 |
@jim:acmegating.com | i don't believe so. i believe it will defer to git. | 01:34 |
@y2kenny:matrix.org | ok thanks. | 01:35 |
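On the "defer to git" point: gitrevisions(7) documents a fixed resolution order for short names, checking refs/tags/<name> before refs/heads/<name>, so a name that exists as both will typically resolve to the tag for plain revision lookups and git will warn. A quick illustration (hypothetical repo with both refs/heads/foo and refs/tags/foo):

```sh
git rev-parse foo
# warning: refname 'foo' is ambiguous.
# resolves per the gitrevisions(7) order: refs/foo, refs/tags/foo, refs/heads/foo, ...
```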
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 819408: Skip tests asserting on tenant reconfig results on multi scheduler https://review.opendev.org/c/zuul/zuul/+/819408 | 01:36 | |
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] 819735: Add nonvoting multischeduler job https://review.opendev.org/c/zuul/zuul/+/819735 | 02:49 | |
@iwienand:matrix.org | zuul-maint: could i get some feedback on https://review.opendev.org/c/zuul/nodepool/+/818705 ; this checks dib is setting kernel flags correctly. as noted inline, we can introduce issues that mean things boot when built on gate dib images, but then don't boot when built externally | 03:51 |
@avass:vassast.org | Clark: corvus Cool thanks. Shouldn't be any issue for us since we haven't put zuul into production yet, and getting 5.0 before that would be perfect :) | 07:46 |
-@gerrit:opendev.org- Albin Vass proposed: [zuul/nodepool] 820024: Add support for configuration runtimeClassName https://review.opendev.org/c/zuul/nodepool/+/820024 | 13:17 | |
-@gerrit:opendev.org- Albin Vass proposed: [zuul/nodepool] 820024: Openshift: Enable configuring runtimeClassName https://review.opendev.org/c/zuul/nodepool/+/820024 | 13:23 | |
-@gerrit:opendev.org- Albin Vass proposed: [zuul/nodepool] 820024: Openshift: Enable configuring runtimeClassName https://review.opendev.org/c/zuul/nodepool/+/820024 | 13:26 | |
-@gerrit:opendev.org- Albin Vass proposed: [zuul/nodepool] 820024: Openshift: Enable configuring runtimeClassName https://review.opendev.org/c/zuul/nodepool/+/820024 | 13:33 | |
@clarkb:matrix.org | corvus: following up on https://review.opendev.org/c/zuul/zuul/+/818257 I'll try to update the change to address tobiash's comment but I wanted to make sure I understood yours better first. Can you check my question there? | 16:49 |
@spamaps:spamaps.ems.host | Really struggling to debug ansible problems with kubernetes hosts. I can't even run a basic ansible command on the executor.. | 16:50 |
@spamaps:spamaps.ems.host | ```root@28053b26ad5f:/var/lib/zuul/builds/c79ab538a1dc4d9dba66019504b345ea# /usr/local/lib/zuul/ansible/2.9/bin/ansible -c kubectl -e ansible_kubectl_kubeconfig=$PWD/work/.kube/config -e ansible_kubectl_context=zuul -m setup -i ubuntu-pod, all | 16:51 |
+ /usr/local/lib/zuul/ansible/2.9/bin/ansible -c kubectl -e ansible_kubectl_kubeconfig=/var/lib/zuul/builds/c79ab538a1dc4d9dba66019504b345ea/work/.kube/config -e ansible_kubectl_context=zuul -m setup -i ubuntu-pod, all | ||
ubuntu-pod | UNREACHABLE! => { | ||
"changed": false, | ||
"msg": "Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \"` echo ~/.ansible/tmp `\"&& mkdir \"` echo ~/.ansible/tmp/ansible-tmp-1638377458.0963545-3360-181097587516921 `\" && echo ansible-tmp-1638377458.0963545-3360-181097587516921=\"` echo ~/.ansible/tmp/ansible-tmp-1638377458.0963545-3360-181097587516921 `\" ), exited with result 1", | ||
"unreachable": true | ||
}``` | ||
@spamaps:spamaps.ems.host | If I copy/paste the command that it says failed.. it runs fine. | 16:53 |
@spamaps:spamaps.ems.host | Ah wait, -vvv shows that's actually running on the pod.. derp | 16:54 |
@tobias.henkel:matrix.org | sounds like /tmp is not writable in the pod? | 17:07 |
@spamaps:spamaps.ems.host | Starting to think maybe the kubernetes support in Zuul and Ansible isn't well tested? Or maybe I'm just dumb. I can't get it to work and it feels like it is failing at a really low level. If anybody has tips, maybe a walkthrough, I'd appreciate it. My Zuul POC is falling apart because as much as I don't mind running on VMs.. our entire CI/CD is based on containers and I don't relish trying to get that working on VMs when I have a perfectly good k8s cluster. | 17:07 |
@jpew:matrix.org | I'm using k8s | 17:07 |
@spamaps:spamaps.ems.host | ```root@28053b26ad5f:/var/lib/zuul/builds/78129a8a35e0450aaa560a7133eaf23a# /usr/bin/kubectl exec -ti ubuntu-pod-0000000005 -- /bin/bash | 17:08 |
Defaulted container "ubuntu-pod" out of: ubuntu-pod, ffwd-java-shim | ||
root@ubuntu-pod-0000000005:/# mkdir /tmp/foo | ||
root@ubuntu-pod-0000000005:/#``` | ||
@spamaps:spamaps.ems.host | nope, /tmp, and ~/.ansible/tmp, work fine | 17:08 |
@spamaps:spamaps.ems.host | When I run the kubectl commands that -vvv prints out, myself, they work | 17:08 |
@spamaps:spamaps.ems.host | ```root@28053b26ad5f:/var/lib/zuul/builds/78129a8a35e0450aaa560a7133eaf23a# /usr/bin/kubectl exec -i ubuntu-pod-0000000005 -- /bin/sh -c '( umask 77 && mkdir -p "` echo ~/.ansible/tmp `"&& mkdir "` echo ~/.ansible/tmp/ansible-tmp-1638378065.7739134-4574-248799323483403 `" && echo ansible-tmp-1638378065.7739134-4574-248799323483403="` echo ~/.ansible/tmp/ansible-tmp-1638378065.7739134-4574-248799323483403 `" ) && sleep 0' | 17:09 |
Defaulted container "ubuntu-pod" out of: ubuntu-pod, ffwd-java-shim | ||
ansible-tmp-1638378065.7739134-4574-248799323483403=/root/.ansible/tmp/ansible-tmp-1638378065.7739134-4574-248799323483403 | ||
root@28053b26ad5f:/var/lib/zuul/builds/78129a8a35e0450aaa560a7133eaf23a#``` | ||
@spamaps:spamaps.ems.host | The one thing I do notice... | 17:10 |
@spamaps:spamaps.ems.host | ```<ubuntu-pod-0000000005> EXEC ['/usr/bin/kubectl', 'exec', '-i', 'ubuntu-pod-0000000005', '--', '/bin/sh', '-c', '/bin/sh -c \'( umask 77 && mkdir -p "` echo ~/.ansible/tmp `"&& mkdir "` echo ~/.ansible/tmp/ansible-tmp-1638378594.8050508-5149-183244121444877 `" && echo ansible-tmp-1638378594.8050508-5149-183244121444877="` echo ~/.ansible/tmp/ansible-tmp-1638378594.8050508-5149-183244121444877 `" ) && sleep 0\'']``` | 17:10 |
@spamaps:spamaps.ems.host | `/bin/sh -c` twice after `--` | 17:11 |
@spamaps:spamaps.ems.host | If I do that, I don't get exit code 1, I get a kubectl that never returns. | 17:11 |
@tobias.henkel:matrix.org | I wonder why that's doing a double shell | 17:11 |
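The nesting in the EXEC line above can be reproduced by hand (pod name taken from the log); the inner command's quoting has to survive both shells:

```sh
# two layers of /bin/sh -c after the "--", exactly as in the EXEC line
/usr/bin/kubectl exec -i ubuntu-pod-0000000005 -- /bin/sh -c "/bin/sh -c 'echo nested'"
```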
@jim:acmegating.com | Clark: replied | 17:13 |
@tobias.henkel:matrix.org | the question is, does it fail because the command within the pod fails or does it already fail to do the kubectl exec | 17:13 |
@tobias.henkel:matrix.org | what does ansible tell you if you don't execute setup but just a shell task with a sleep? | 17:14 |
@clarkb:matrix.org | corvus: is the concern that calling the abstract method will raise an exception and we'll retry a couple of times unnecessarily? | 17:15 |
@spamaps:spamaps.ems.host | > <@tobias.henkel:matrix.org> what does ansible tell you if you don't execute setup but just a shell task with a sleep? | 17:16 |
same, because this is in the connection establishment phase. It hasn't even begun to try and run a module. | ||
@spamaps:spamaps.ems.host | Like, it hasn't even copied setup to the container yet. | 17:17 |
@spamaps:spamaps.ems.host | Ok I added *5* v's and I got a new error. Yay ansible. | 17:18 |
@clarkb:matrix.org | the command that failed is using ~/.ansible/tmp not /tmp | 17:18 |
@spamaps:spamaps.ems.host | > <@clarkb:matrix.org> the command that failed is using ~/.ansible/tmp not /tmp | 17:19 |
Both work fine | ||
@spamaps:spamaps.ems.host | I got a new error finally.. | 17:19 |
@spamaps:spamaps.ems.host | `error: You must be logged in to the server (Unauthorized)\n",` | 17:19 |
@clarkb:matrix.org | ok just checking as your example test above was for /tmp. | 17:19 |
@spamaps:spamaps.ems.host | Ok so that's kubectl complaining. | 17:19 |
@spamaps:spamaps.ems.host | But when I run it directly, it works. | 17:19 |
@spamaps:spamaps.ems.host | I do not get that error when I run kubectl directly. >:| | 17:20 |
@spamaps:spamaps.ems.host | ```<ubuntu-pod-0000000005> ESTABLISH kubectl CONNECTION | 17:21 |
ubuntu-pod-0000000005 | EXEC ['/usr/bin/kubectl', 'exec', '-i', 'ubuntu-pod-0000000005', '--', '/bin/sh', '-c', "/bin/sh -c 'echo ~zuul && sleep 0'"] | |
ubuntu-pod-0000000005 | EXEC ['/usr/bin/kubectl', 'exec', '-i', 'ubuntu-pod-0000000005', '--', '/bin/sh', '-c', '/bin/sh -c \'echo "`pwd`" && sleep 0\''] | |
ubuntu-pod-0000000005 | EXEC ['/usr/bin/kubectl', 'exec', '-i', 'ubuntu-pod-0000000005', '--', '/bin/sh', '-c', '/bin/sh -c \'( umask 77 && mkdir -p "` echo ~/.ansible/tmp `"&& mkdir "` echo ~/.ansible/tmp/ansible-tmp-1638379115.6251209-5724-266407412259585 `" && echo ansible-tmp-1638379115.6251209-5724-266407412259585="` echo ~/.ansible/tmp/ansible-tmp-1638379115.6251209-5724-266407412259585 `" ) && sleep 0\''] | |
ubuntu-pod | UNREACHABLE! => { | ||
"changed": false, | ||
"msg": "Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \"` echo ~/.ansible/tmp `\"&& mkdir \"` echo ~/.ansible/tmp/ansible-tmp-1638379115.6251209-5724-266407412259585 `\" && echo ansible-tmp-1638379115.6251209-5724-266407412259585=\"` echo ~/.ansible/tmp/ansible-tmp-1638379115.6251209-5724-266407412259585 `\" ), exited with result 1, stderr output: error: You must be logged in to the server (Unauthorized)\n", | ||
"unreachable": true | ||
} | ||
root@28053b26ad5f:/var/lib/zuul/builds/78129a8a35e0450aaa560a7133eaf23a#``` | ||
@spamaps:spamaps.ems.host | Ok I added in ANSIBLE_CONFIG= to use the zuul generated ansible.cfg and I get a slightly different manifestation of the same error. | 17:24 |
@jim:acmegating.com | Clark: no i mean make a new method on the base class so that we're not embedding source retry logic in the pipeline manager | 17:24 |
@spamaps:spamaps.ems.host | ```root@28053b26ad5f:/var/lib/zuul/builds/78129a8a35e0450aaa560a7133eaf23a# PYTHONPATH=/usr/local/lib/python3.8/site-packages ANSIBLE_CONFIG=ansible/playbook_0/ansible.cfg /usr/local/lib/zuul/ansible/2.9/bin/ansible -vvvvv -c kubectl -e ansible_kubectl_kubeconfig=$PWD/work/.kube/config -e ansible_kubectl_context=zuul-ci:zuul-worker/146-148-79-24 -m shell -a 'sleep 1' -i ansible/inventory.yaml all``` | 17:25 |
@jim:acmegating.com | (you can't just put it in the abstract method because that's the contract that the source class has to implement, and you don't want to duplicate the retry logic in every driver; though i would be okay with that) | 17:25 |
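A minimal sketch of the pattern corvus describes, with hypothetical method names rather than Zuul's actual API: the base class exposes a concrete method that wraps the abstract one with retries, so each driver implements only a single attempt:

```python
import abc
import time


class BaseSource(abc.ABC):
    def getChangesDependingOn(self, change, retries=3, delay=1):
        # Concrete wrapper on the base class: the retry logic lives here,
        # not in the pipeline manager and not duplicated in every driver.
        for attempt in range(retries):
            try:
                return self._getChangesDependingOn(change)
            except Exception:
                if attempt == retries - 1:
                    raise
                time.sleep(delay)

    @abc.abstractmethod
    def _getChangesDependingOn(self, change):
        """Single attempt; this is the contract each driver implements."""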
@spamaps:spamaps.ems.host | new command.. and now I get this error... | 17:25 |
@spamaps:spamaps.ems.host | ```<ubuntu-pod> Attempting python interpreter discovery | 17:25 |
ubuntu-pod-0000000005 | ESTABLISH kubectl CONNECTION | |
ubuntu-pod-0000000005 | EXEC ['/usr/bin/kubectl', 'exec', '-i', 'ubuntu-pod-0000000005', '--', '/bin/sh', '-c', '/bin/sh -c \'echo PLATFORM; uname; echo FOUND; command -v \'"\'"\'/usr/bin/python\'"\'"\'; command -v \'"\'"\'python3.7\'"\'"\'; command -v \'"\'"\'python3.6\'"\'"\'; command -v \'"\'"\'python3.5\'"\'"\'; command -v \'"\'"\'python2.7\'"\'"\'; command -v \'"\'"\'python2.6\'"\'"\'; command -v \'"\'"\'/usr/libexec/platform-python\'"\'"\'; command -v \'"\'"\'/usr/bin/python3\'"\'"\'; command -v \'"\'"\'python\'"\'"\'; echo ENDFOUND && sleep 0\''] | |
[WARNING]: Unhandled error in Python interpreter discovery for host ubuntu-pod: unexpected output from Python interpreter discovery | ||
ubuntu-pod | Interpreter discovery remote stderr: | |
error: You must be logged in to the server (Unauthorized) | ||
Using module file /var/lib/zuul/ansible/2.9/zuul/ansible/library/command.py | ||
Pipelining is enabled. | ||
ubuntu-pod-0000000005 | EXEC ['/usr/bin/kubectl', 'exec', '-i', 'ubuntu-pod-0000000005', '--', '/bin/sh', '-c', "/bin/sh -c '/usr/bin/python && sleep 0'"] | |
[WARNING]: Platform unknown on host ubuntu-pod is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python | ||
interpreter could change this. See https://docs.ansible.com/ansible/2.9/reference_appendices/interpreter_discovery.html for more information. | ||
ubuntu-pod | FAILED! => { | ||
"ansible_facts": { | ||
"discovered_interpreter_python": "/usr/bin/python" | ||
}, | ||
"changed": false, | ||
"module_stderr": "error: You must be logged in to the server (Unauthorized)\n", | ||
"module_stdout": "", | ||
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", | ||
"rc": 1 | ||
} | ||
@spamaps:spamaps.ems.host | Running that exact /usr/bin/kubectl ... command does not return. | 17:26 |
@spamaps:spamaps.ems.host | AHA! | 17:28 |
@spamaps:spamaps.ems.host | --kubeconfig was needed because ansible is silently setting the path to the right kubeconfig | 17:28 |
@tobias.henkel:matrix.org | then kubectl directly is likely using a different kubeconfig | 17:28 |
@spamaps:spamaps.ems.host | it's using the executor's adminy kubeconfig | 17:28 |
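In other words, reproducing Ansible's kubectl calls by hand only matches its behavior if the same kubeconfig and context are passed explicitly; something along these lines (paths and context name taken from the earlier commands in the log):

```sh
# the same EXEC command, but pinned to the build's kubeconfig rather than
# whatever admin kubeconfig the executor shell picks up by default
/usr/bin/kubectl \
  --kubeconfig /var/lib/zuul/builds/78129a8a35e0450aaa560a7133eaf23a/work/.kube/config \
  --context zuul \
  exec -i ubuntu-pod-0000000005 -- /bin/sh -c 'echo ok'
```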
@spamaps:spamaps.ems.host | Ok so I can reproduce. I'll debug without dumping it all here now. Honestly.. I wonder sometimes if ansible is worth it with containers. | 17:29 |
@tobias.henkel:matrix.org | Everything in zuul is structured around ansible. I don't think it's feasible to replace it with something else | 17:30 |
@spamaps:spamaps.ems.host | I do think it could be, but I understand why it might sound radical. :) | 17:36 |
@jim:acmegating.com | there's an escape valve with zuul: have ansible run the thing you really want. the problem here is that only a trusted playbook can run kubectl without yet another execution context. i'd like to remove that restriction and we're working toward it. removing gearman is a step in that direction. | 17:38 |
@clarkb:matrix.org | > <@jim:acmegating.com> Clark: no i mean make a new method on the base class so that we're not embedding source retry logic in the pipeline manager | 17:40 |
Got it | ||
@jim:acmegating.com | meanwhile, zuul is the only system out there that can coordinate actions across bare metal, vms, and containers. ansible is a big part of why it can do that. | 17:40 |
@jim:acmegating.com | given that zuul is older than k8s, the flexibility to adapt to new systems has served us well | 17:43 |
@spamaps:spamaps.ems.host | > <@jim:acmegating.com> there's an escape valve with zuul: have ansible run the thing you really want. the problem here is that only a trusted playbook can run kubectl without yet another execution context. i'd like to remove that restriction and we're working toward it. removing gearman is a step in that direction. | 17:43 |
That escape valve hinges on me having enough access to my kubernetes cluster to even get Zuul and Nodepool to run things on it. So far, that's been a real slog. | ||
@jim:acmegating.com | spamaps: i mean, you should be able to write a job that just executes a kubectl command on the zuul executor. then you're not dealing with ansible. | 17:44 |
@spamaps:spamaps.ems.host | I have a service account that is allowed access to one namespace only, and that's proving to be too restrictive I think. | 17:44 |
@spamaps:spamaps.ems.host | > <@jim:acmegating.com> spamaps: i mean, you should be able to write a job that just executes a kubectl command on the zuul executor. then you're not dealing with ansible. | 17:45 |
But then I'm also not dealing with nodepool which makes me t3h sad. ;) | ||
@jim:acmegating.com | it's not a practical solution because that requires a trusted playbook, but that's the thing i'd like to change. and that's "merely" lifting a restriction. | 17:45 |
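A sketch of that escape valve (playbook, paths, and manifest name are assumptions): a playbook in a trusted config-project can run kubectl directly on the executor, which untrusted playbooks are not permitted to do:

```yaml
# runs on the executor itself; only allowed from a trusted (config-project) playbook
- hosts: localhost
  tasks:
    - name: Apply a manifest with kubectl from the executor
      command: >-
        kubectl --kubeconfig /etc/zuul/tenant-k8s.conf
        apply -f manifests/deploy.yaml
```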
@tobias.henkel:matrix.org | spamaps: that's a point I agree we should work on, since the openshift pods driver supports exactly this use case | 17:45 |
@tobias.henkel:matrix.org | the k8s and openshift drivers differ in this regard, so we might want to harmonize their functionality at some point | 17:45 |
@spamaps:spamaps.ems.host | I could make a Deployment and Horizontal Pod Autoscaler.. and delete pods once I use them.. but man.. that sounds a lot like zuul and nodepool. | 17:45 |
@spamaps:spamaps.ems.host | I'm also like.. really struggling to figure out what permissions are needed for the openshiftpods to even work. | 17:46 |
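For what it's worth, a heavily hedged guess at single-namespace RBAC for a pod-launching driver; this is an assumption about what such a driver needs, not verified against nodepool's actual requirements, and all names are hypothetical:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: zuul-worker          # hypothetical role name
  namespace: my-namespace    # hypothetical namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/exec", "pods/log", "serviceaccounts", "secrets"]
    verbs: ["create", "get", "list", "watch", "delete"]
```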
@spamaps:spamaps.ems.host | I do have a sympathetic k8s cluster admin now.. so I can get more perms if need be. | 17:46 |
@spamaps:spamaps.ems.host | (Or I can also run my own k8s cluster.. but ... please no) | 17:46 |
@tobias.henkel:matrix.org | I think that requires a real openshift cluster so it won't work on vanilla k8s | 17:47 |
@jpew:matrix.org | spamaps: https://review.opendev.org/c/zuul/zuul-operator/+/810498 ? | 17:47 |
@spamaps:spamaps.ems.host | > <@jpew:matrix.org> spamaps: https://review.opendev.org/c/zuul/zuul-operator/+/810498 ? | 17:48 |
I can't run the operator. I do not have admin on the cluster. | ||
@jpew:matrix.org | Ah, ya. I was wondering if that was the bug you were seeing with kubeconfig, but if you aren't using the operator, probably not :) | 17:48 |
@spamaps:spamaps.ems.host | A request for perms to have CRDs in my namespace was aggressively opposed. | 17:48 |
@spamaps:spamaps.ems.host | This k8s cluster serves about 20% of Spotify's backend.. so.. they're quite protective of it. :) | 17:49 |
@jpew:matrix.org | Sure | 17:49 |
@tobias.henkel:matrix.org | I wonder if it's a good idea to mix that with ci tasks ;) | 17:49 |
@jpew:matrix.org | That was my thought :) | 17:50 |
@spamaps:spamaps.ems.host | (and collectively this is just one of about 12 identical clusters that serve more like 90% of Spotify's backend ... they have carved out permissions for the usual case.. not CI/CD) | 17:50 |
@spamaps:spamaps.ems.host | > <@tobias.henkel:matrix.org> I wonder if it's a good idea to mix that with ci tasks ;) | 17:51 |
I wonder that too. :-/ | ||
@tobias.henkel:matrix.org | I just wonder if development infrastructure should be separated from real production services | 17:51 |
@spamaps:spamaps.ems.host | But the only alternatives that exist are a staging cluster meant to be identical to prod, so that's not going to work, or rolling your own. | 17:51 |
@tobias.henkel:matrix.org | I know my users and they can break everything ;) | 17:51 |
@jim:acmegating.com | there's certainly a school of thought that k8s clusters should be plentiful, and perhaps dedicated to tasks. it's interesting to see that the opposite is also held -- that there should be 1 (or N for small values of N) to rule them all. | 17:52 |
@spamaps:spamaps.ems.host | The permissions are extremely locked down. I'm sure we can break out, but.. I dunno, it doesn't concern anyone here to run CI on the same cluster. | 17:52 |
@tobias.henkel:matrix.org | I tend to like k8s as a service these days :) | 17:53 |
@tobias.henkel:matrix.org | > <@jim:acmegating.com> there's certainly a school of thought that k8s clusters should be plentiful, and perhaps dedicated to tasks. it's interesting to see that the opposite is also held -- that there should be 1 (or N for small values of N) to rule them all. | 17:54 |
that's what we're migrating towards | ||
@spamaps:spamaps.ems.host | > <@jim:acmegating.com> there's certainly a school of thought that k8s clusters should be plentiful, and perhaps dedicated to tasks. it's interesting to see that the opposite is also held -- that there should be 1 (or N for small values of N) to rule them all. | 17:54 |
There are 12 clusters, 4 in each geo region (and about to be 20 because 2 new regions are coming). They are all identical. All namespaces are in all clusters. Apps are by default deployed on one cluster per region, and a provisioner decides which cluster you get at the time of your app creation.
@spamaps:spamaps.ems.host | One might argue this cluster *is* dedicated to a single thing. Running java backends. | 17:54 |
@spamaps:spamaps.ems.host | And I'm a weirdo.. so I should probably accept my fate and make my own k8s cluster. | 17:54 |
@tobias.henkel:matrix.org | but smaller clusters are only useful when using some kind of managed k8s | 17:54 |
@spamaps:spamaps.ems.host | But.. that basically means Zuul comes to Spotify some time in 2023. | 17:54 |
@spamaps:spamaps.ems.host | Because a new k8s cluster must have a host of security approvals to get access to the network where GHE lives. | 17:56 |
@spamaps:spamaps.ems.host | Honestly I'd probably be better off just squeezing our container CI jobs into docker commands on vms. I may do that actually. | 17:56 |
@tobias.henkel:matrix.org | I think such an architecture perfectly serves its main purpose which is reliably running production services at scale | 17:56 |
@spamaps:spamaps.ems.host | We are on Google cloud, so we have GKE.. the management trouble comes at the networking level. | 17:57 |
@tobias.henkel:matrix.org | yay networking level, I feel your pain | 17:57 |
@spamaps:spamaps.ems.host | I have to get subnets.. and routes.. and firewall rules. | 17:57 |
@spamaps:spamaps.ems.host | Thanks, y'all have talked me out of using kubernetes based jobs. ;) | 17:58 |
@tobias.henkel:matrix.org | we have multi-hop tunneling mechanisms in place to cope with networking when combining public cloud and on prem | 17:58 |
@tobias.henkel:matrix.org | > <@spamaps:spamaps.ems.host> Honestly I'd probably be better off just squeezing our container CI jobs into docker commands on vms. I may do that actually. | 17:59 |
that's actually what many of our users do... | ||
@spamaps:spamaps.ems.host | But, I have to say.. if the only option for using Zuul with k8s is to have admin on the k8s.. it just raises the bar for what has rapidly become the way people run containers. If somebody could write down exactly what is needed to use openshiftpods with a k8s cluster where you don't have admin.. that might change my mind. But right now, it feels like I'm blazing a trail, not following in footsteps. | 18:00 |
@spamaps:spamaps.ems.host | And I just don't have the time or resources to blaze a trail. | 18:00 |
@spamaps:spamaps.ems.host | Thanks everyone for chiming in. It's been helpful! | 18:01 |
@tobias.henkel:matrix.org | I think we'd need to add the functionality of the openshiftpods driver to the k8s driver | 18:01 |
@tobias.henkel:matrix.org | maybe with an option to get a namespace or not | 18:01 |
@tobias.henkel:matrix.org | openshiftpods as is likely only works on openshift | 18:02 |
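For reference, the single-namespace shape tobiash is describing looks roughly like the openshiftpods provider config below; the provider name, context, and image are assumptions:

```yaml
providers:
  - name: single-namespace-pods
    driver: openshiftpods
    context: my-namespace-context   # kubeconfig context for the service account
    pools:
      - name: main
        labels:
          - name: ubuntu-pod
            image: ubuntu:20.04
```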
@spamaps:spamaps.ems.host | I am not sure it is openshiftpods' fault right now? The failure I'm at now is that the token that nodepool put in the pod.. doesn't seem to work. Is that some missing openshift magic? Or do I just need to figure out how to bind the right role to the right service account? | 18:02 |
@spamaps:spamaps.ems.host | I guess my point is, fumbling around, asking in this chat room, isn't going to get us to a working solution quick enough to matter. I need something to start running jobs ... now. | 18:03 |
@spamaps:spamaps.ems.host | I was hoping somebody would point me at a blog post or a guide in the docs that says "so you have a k8s namespace and you want to zuul with it." | 18:04 |
@spamaps:spamaps.ems.host | Anyway.. I'm backing away from k8s slowly now. Onward to VMs. Thanks again. Have to context switch now. :( | 18:05 |
@avass:vassast.org | Heh, I've convinced my infra team to give us Kata containers on our private openshift just so we can run docker workloads inside pods :) | 18:11 |
@jim:acmegating.com | Albin Vass: i think you win container bingo | 18:16 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 817495: Use merger API for merger stats https://review.opendev.org/c/zuul/zuul/+/817495 | 18:36 | |
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: | 20:44 | |
- [zuul/zuul] 808041: [web] Pagination in builds, buildsets search https://review.opendev.org/c/zuul/zuul/+/808041 | ||
- [zuul/zuul] 820066: Update patternfly/react-core to 4.175.11 https://review.opendev.org/c/zuul/zuul/+/820066 | ||
@mhuin:matrix.org | ^this is not API-breaking anymore \o/ however I still want to improve search by using indexes rather than offsets. More on that tomorrow! | 20:45 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 818257: Retry dependency update requests https://review.opendev.org/c/zuul/zuul/+/818257 | 21:44 | |
@clarkb:matrix.org | corvus: tobiash ^ thank you for the feedback I think that addresses your comments | 21:44 |
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/nodepool] 818705: functional test: check DIB kernel flags https://review.opendev.org/c/zuul/nodepool/+/818705 | 22:31 | |
@jim:acmegating.com | Clark, tobiash i think i found the cause of the issue in #opendev (the periodic pipeline is stuck because there's a change missing from the change cache). this is subtle. | 22:49 |
@jim:acmegating.com | after every pipeline run we save a list of changes in the pipeline for quick reference later. we use that list when pruning the change cache so that we don't delete changes that are still in pipelines. | 22:50 |
@jim:acmegating.com | but we remove change queues during pipeline processing, and i think we may skip over change queue N if we remove N-1 (because we mutate the list). if we do that, we won't record change N in the list of changes in the pipeline. | 22:51 |
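The general hazard corvus is describing, as a generic Python illustration (not Zuul's actual code): removing the current element while iterating shifts the list, so the loop silently skips the next element.

```python
queues = ["q1", "q2", "q3"]
for q in queues:
    if q == "q1":
        queues.remove(q)   # "q2" shifts into index 0; iteration jumps to index 1
print(queues)              # ['q2', 'q3'] -- but the loop body never saw "q2"

# the usual fix: iterate over a snapshot of the list
for q in list(queues):
    ...
```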
@clarkb:matrix.org | oh interesting | 22:51 |
@jim:acmegating.com | so basically, if scheduler A completes processing of a queue item right before scheduler B prunes the cache, we might delete a change from the cache that we shouldn't. but it corrects on the next pass, so the timing has to be just right (and it was during this morning's periodic runs) | 22:52 |
@jim:acmegating.com | i'll work on a fix. i'm not hopeful that we'll be able to test this :/ | 22:52 |
@clarkb:matrix.org | ok then I guess we'd try restarting after the fix lands? | 22:53 |
@clarkb:matrix.org | ianw: ^ fyi since you indicated you could help too | 22:53 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 23:42 | |
- [zuul/zuul] 820079: Fix mutation while iterating over queues https://review.opendev.org/c/zuul/zuul/+/820079 | ||
- [zuul/zuul] 820080: Handle more than 1024 changes in the pipeline change list https://review.opendev.org/c/zuul/zuul/+/820080 | ||
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 820080: Handle more than 1024 changes in the pipeline change list https://review.opendev.org/c/zuul/zuul/+/820080 | 23:43 | |
@jim:acmegating.com | Clark: ^ the first change is our immediate problem; the second is something that could eventually be a problem. i think merging the first asap is okay but would like to give tobiash a chance to consider 820080 before we approve it | 23:44 |
@clarkb:matrix.org | ok I'll take a look as soon as the restart is complete | 23:45 |
@jim:acmegating.com | the first change has a fantastic 416:1 commit-message:change ratio for character counts. | 23:46 |
@mordred:inaugust.com | > <@jim:acmegating.com> the first change has a fantastic 416:1 commit-message:change ratio for character counts. | 23:59 |
It is truly glorious |