corvus | tristanC: do you think we should use dhall instead of ansible or go? | 00:00 |
---|---|---|
tristanC | and i'm also able to generate an ansible playbook based on the same spec: https://github.com/TristanCacqueray/dhall-zuul/blob/master/deployments/cr-playbook.yaml | 00:00 |
tristanC | corvus: well, the k8s operator still uses ansible: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/roles/zuul/tasks/main.yaml | 00:00 |
corvus | tristanC: oh, so you have ansible running dhall? | 00:01 |
clarkb | dhall is just a config language (replacement for json or yaml) right? | 00:01 |
tristanC | corvus: yes, dhall is just used to generate the k8s object, it doesn't manage zuul. I'll use ansible to implement backup/restore and scaling | 00:01 |
clarkb | I don't think there is an operator sdk implementation using dhall | 00:02 |
corvus | ok. got it. yeah. i mean, i wouldn't put it past someone to have made an operator sdk implementation with dhall. :) | 00:02 |
tristanC | clarkb: that's correct, dhall is like a more advanced yaml or json, the evaluation only results in a configuration | 00:02 |
corvus | remote: https://gerrit-review.googlesource.com/c/zuul/ops/+/250112 Add initial bootstrapping instructions and nodepool config [NEW] | 00:03 |
corvus | mnaser: ^ that's what i did today | 00:03 |
corvus | mnaser: thanks for your help :) | 00:03 |
tristanC | corvus: well i wrote another operator that applies a dhall expression to kubernetes, e.g.: https://github.com/TristanCacqueray/dhall-operator/blob/master/operator/examples/dhall-test.yaml | 00:04 |
corvus | clarkb: ^ there you go :) | 00:04 |
corvus | tristanC: nicely done :) | 00:04 |
tristanC | corvus: but it's implemented in ansible too: https://github.com/TristanCacqueray/dhall-operator/blob/master/operator/roles/dhall/tasks/main.yml | 00:04 |
*** mattw4 has quit IRC | 00:04 | |
tristanC | but that's low-level; for zuul i followed the spec defined in zuul | 00:05 |
mnaser | corvus: woot, the deployment stuff i made works but im trying to find a way to wait for the deployment to fully rollout | 00:05 |
mnaser | as the deployment is completing properly but nodepool doesn't go up because zookeeper isn't done rolling out yet (as it puts out 3 replicas) | 00:05 |
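(A minimal sketch of one way to express that wait, as an ansible task; the deployment name "zookeeper" and the 300s budget are assumptions, not taken from the actual job:)

```yaml
# Hedged sketch: block until a rollout finishes. The deployment name
# and timeout value are illustrative assumptions.
- name: Wait for the zookeeper deployment to finish rolling out
  command: kubectl rollout status deployment/zookeeper --timeout=300s
```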
tristanC | corvus: thanks, it's working quite well, and based on the same definition of services and configuration, it's super convenient to generate k8s objects as well as ansible playbooks | 00:05 |
clarkb | tristanC: what is the advantage of compiling dhall to k8s object yaml or ansible playbook yaml over writing them directly? templating? | 00:06 |
corvus | mnaser: yeah, i guess argo reports synced even before k8s is finished running the pods, and even running the pods isn't the same as 'service is up and running' | 00:06 |
mnaser | corvus: yeah, sync'd means the k8s state == charts, "progressing" = it's doing things | 00:06 |
tristanC | clarkb: mostly so that the user doesn't have to worry about dhall, the interface is the zuul crd | 00:07 |
mnaser | and it transitions from progressing to healthy | 00:07 |
clarkb | tristanC: so it goes from zuul crd yaml to dhall to ansible yaml? | 00:08 |
tristanC | clarkb: i misread... well there are many advantages to using dhall, but if i had to pick one, it's that it is programmable | 00:08 |
clarkb | ya ok I was worried that was it :) but maybe I'm just scarred from dealing with python packaging and its programmable config :) | 00:09 |
tristanC | clarkb: the input is json, e.g.: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml#L6-L28 | 00:09 |
tristanC | clarkb: which is converted into this Input type: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L39-L56 | 00:10 |
tristanC | clarkb: and passed on to this function: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L96 | 00:10 |
tristanC | which can then be used to generate k8s objects, ansible playbooks or podman compose scripts, see the 'cr-*' files here: https://github.com/TristanCacqueray/dhall-zuul/tree/master/deployments | 00:11 |
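(For orientation, a hypothetical CR input of the general shape being described; the apiVersion and field names below are illustrative assumptions, not copied from the linked repo:)

```yaml
# Hypothetical Zuul custom resource; all names here are assumptions.
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
metadata:
  name: example
spec:
  executor:
    count: 1
  merger:
    count: 1
  connections:
    - name: gerrit
      driver: gerrit
```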
clarkb | ya so dhall is like your IR for a compile between config formats | 00:12 |
tristanC | clarkb: it's not really comparable to python setup.py since python can do side effects, dhall evaluation is strictly pure | 00:13 |
tristanC | clarkb: yes, something like that, at least that's what i tried for zuul with this project: https://github.com/TristanCacqueray/dhall-operator | 00:13 |
tristanC | hopefully nodepool and the remaining bits like mqtt should be implemented by tomorrow, then i'll push a usable operator image | 00:16 |
tristanC | also, i wrote an optional git-daemon service to serve a ready-to-use configuration; with a fast periodic pipeline it's quite handy to validate the whole setup. it looks like this: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L23-L82 | 00:19 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:20 |
mnaser | anyone has any creative nice ideas for the name of a zuul job which would run a helm chart against a k8s cluster? | 00:21 |
clarkb | mnaser: "apply-helm-chart" ? | 00:22 |
mnaser | that seems reasonable | 00:22 |
mnaser | once i get this test working, i'll refactor out to zuul/zuul-jobs cause i'd like to make use of this too | 00:22 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:27 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:34 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:36 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:38 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs https://review.opendev.org/701868 | 00:39 |
openstackgerrit | Mohammed Naser proposed zuul/nodepool master: Switch to collect-container-logs https://review.opendev.org/701869 | 00:42 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs https://review.opendev.org/701868 | 00:42 |
clarkb | tristanC: does https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L143-L153 and https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L192-L194 mean there is a zuul main.yaml generated for non-scheduler services? | 00:49 |
clarkb | also I grok this a bit more now that I've realized it's haskell with "io" built in | 00:50 |
clarkb | it's like bash -x for haskell, kind of | 00:50 |
clarkb | but the trace is also the output if that makes sense | 00:50 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:52 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:52 |
mnaser | so close. i think zookeeper needs more than 300s to start up in this case | 00:53 |
tristanC | clarkb: yes, that's a demo "application"; the zuul operator uses another one which takes the main.yaml from a user-provided secret: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml#L13 that requires users to apply the secret before the Zuul cr, such as: https://github.com/TristanCacqueray/dhall-zuul/blob/master/deployments/cr-input-k8s.yaml#L30 | 00:54 |
tristanC | clarkb: which is added to scheduler service here: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L290-L295 and https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L230 | 00:55 |
tristanC | clarkb: this could be made optional in the future, for example when the user provides a list of projects https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-connection.yaml#L11 (which would be expanded like so: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Connection.dhall#L84-L93 ) | 00:57 |
tristanC | clarkb: another neat example is how the zk service is only enabled when the user doesn't specify a zk config: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L150-L165 | 01:00 |
mnaser | welp, everything is working nicely but | 01:05 |
mnaser | coredns is broken in our install-kubernetes job :) | 01:05 |
mnaser | https://6efce3be0458ca4ff401-19d327c34d133e2dbd296ad72151667c.ssl.cf5.rackcdn.com/701764/11/check/zuul-helm-functional/bf5d120/docker/k8s_coredns_coredns-6955765f44-csbfp_kube-system_7ae315e6-3ae6-4f07-a30c-80f511b6819f_5.txt | 01:05 |
tristanC | clarkb: it's similar to haskell with "io" built in, but the type system is different, it's based on CCw: https://hal.inria.fr/hal-01445835 | 01:05 |
clarkb | tristanC: ya structurally it is similar | 01:06 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 01:06 |
tristanC | clarkb: well it's also very different since you can't do recursion; expressions are guaranteed to evaluate (or fail, but not hang or do side effects) | 01:07 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:07 |
clarkb | mnaser: hrm I thought we set the coredns config to forward to whatever is set as the host's forwarders | 01:07 |
clarkb | mnaser: and 1.1.1.1 and 8.8.8.8 should not be forwarding back to coredns | 01:07 |
mnaser | clarkb: so the issue here i think is that you set it to the host's forwarders, which is 127.0.0.1 | 01:08 |
mnaser | and when coredns starts up, it uses the hosts' /etc/resolv.conf (or whatever you pointed it to) | 01:08 |
clarkb | mnaser: no we changed that | 01:08 |
clarkb | or at least I thought we did /me looks to find it | 01:08 |
mnaser | https://6efce3be0458ca4ff401-19d327c34d133e2dbd296ad72151667c.ssl.cf5.rackcdn.com/701764/11/check/zuul-helm-functional/bf5d120/docker/k8s_coredns_coredns-6955765f44-csbfp_kube-system_7ae315e6-3ae6-4f07-a30c-80f511b6819f_5.txt | 01:08 |
mnaser | that tells me that we're hitting 127.0.0.1 (hence how the loop gets caught) | 01:09 |
clarkb | well it tells you there is a forwarding loop it doesn't say anything about who is forwarding | 01:09 |
mnaser | oh look | 01:09 |
mnaser | yes it seems to be fixed hmm | 01:09 |
mnaser | hah | 01:10 |
mnaser | jokes on you! | 01:10 |
mnaser | 2020-01-10 00:57:18.993217 | ubuntu-bionic | * Preparing Kubernetes v1.17.0 on Docker '19.03.5' ... | 01:10 |
mnaser | 2020-01-10 00:57:21.852544 | ubuntu-bionic | - kubelet.resolv-conf=/run/systemd/resolve/resolv.conf | 01:10 |
clarkb | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/install-kubernetes/tasks/minikube.yaml#L43-L53 | 01:10 |
clarkb | you need to set minikube_dns_resolvers | 01:10 |
mnaser | ah | 01:11 |
clarkb | which we did in the tests but I guess we never plumbed that into the real jobs | 01:11 |
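(A minimal sketch of that plumbing, assuming a job that uses the install-kubernetes role; the job name and resolver addresses are example values:)

```yaml
# Hedged sketch: pass resolvers to install-kubernetes via job vars.
- job:
    name: zuul-helm-functional
    vars:
      minikube_dns_resolvers:
        - 1.1.1.1
        - 8.8.8.8
```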
*** zbr|rover has quit IRC | 01:11 | |
mnaser | yes indeed that looks like the case | 01:11 |
clarkb | I guess my changes allowed you to run it properly but you still have to set it wherever you run install-kubernetes :) | 01:11 |
mnaser | it feels like sane user behaviour to maybe have a default of 1.1.1.1 | 01:12 |
mnaser | clarkb: do you remember if that was discussed or not before i push a change for that? | 01:12 |
mnaser | or is this too much of an opendev thing because not everyone runs unbound inside their own nodepool vms i guess | 01:12 |
clarkb | I don't know that we discussed it, but considering zuul-jobs roles are meant to be pretty generic I'm not sure we should assume that? | 01:12 |
fungi | it does mean you're unconditionally opting your users into google tracking their zuul environment's dns queries | 01:13 |
clarkb | fungi: cloudflare, not google but ya | 01:13 |
fungi | oh, cloudflare is 1.1.1.1, google is 8.8.8.8 but right, neither is really above reproach, they both have tracking everyone as their business model | 01:14 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:14 |
mnaser | ok, so i think this should give us a functional nodepool deployed inside k8s | 01:14 |
clarkb | fungi: wait 'til you find out that every time you build go code you hit google servers and tell them what dependencies you are using | 01:15 |
clarkb | (unless you set some random env vars that no one seems to remember what they are) | 01:15 |
clarkb | this makes your builds faster because they can cache things (and github is slow I guess) | 01:15 |
fungi | the best way to find out is to go to china | 01:17 |
fungi | and then discover that the libs go retrieves are slightly altered because they hit a different cache | 01:17 |
fungi | which served you "suspect" substitutes | 01:18 |
clarkb | in hmac verification libraries | 01:18 |
fungi | right, nothing security-critical | 01:18 |
fungi | you know, just the bits that actually verify nothing else has been changed | 01:18 |
*** threestrands has joined #zuul | 01:19 | |
*** zbr has joined #zuul | 01:19 | |
fungi | for those who weren't with us in shanghai to see it yourselves, i swear this is actually a true story | 01:21 |
mnaser | hmm | 01:23 |
mnaser | do we have some sort of scenario for things inside zuul/zuul-jobs to allow the consumer of the job to .. do something after we do $things | 01:23 |
mnaser | in my case, im trying to make a general "apply a helm chart to kubernetes" job, but id like to allow the user to run things after we've deployed the chart | 01:24 |
*** zbr has quit IRC | 01:24 | |
mnaser | putting the code that deploys the chart in pre.yaml is a bit iffy cause it can actually fail.. having the consumer of the role use post.yaml also feels meh | 01:24 |
clarkb | mnaser: if you make that step a pre-run step then the job consumer can supply a run playbook that executes after | 01:24 |
mnaser | unless that's ok | 01:24 |
corvus | mnaser: that's part of why we make roles for everything. | 01:24 |
clarkb | also that | 01:25 |
mnaser | ok, fair, so in that case your run.yaml would have the 'helm-template' role and then whatever you need/want after | 01:25 |
corvus | yep, should be a tiny playbook; and if they're running something after anyway, it's just 1 more line | 01:25 |
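(A sketch of the resulting consumer playbook under that pattern; the verification task is an illustrative placeholder:)

```yaml
# Hedged sketch of a consumer run playbook: apply the chart via the
# role, then run whatever the consumer needs afterwards.
- hosts: all
  roles:
    - helm-template
  tasks:
    - name: Consumer-specific check after the chart is applied
      command: kubectl get pods --all-namespaces
```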
*** zbr has joined #zuul | 01:25 | |
clarkb | re failures in pre: that was something I was looking at pre-holidays and it is a non-zero-occurrence problem | 01:26 |
clarkb | you are right to be cautious there | 01:26 |
mnaser | w000t | 01:28 |
mnaser | corvus: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_548/701764/13/check/zuul-helm-functional/5484584/docker/k8s_launcher_zuul-nodepool-launcher-6b9454f5c8-shjrl_default_93aa561c-3726-4945-acbc-c37a0b7c6d15_0.txt check out the end of that file | 01:28 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 01:29 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:31 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 01:40 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 01:40 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:40 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:47 |
*** threestrands has quit IRC | 01:50 | |
mnaser | ok cool, so we have apply-helm-to-k8s roles and jobs, and that's being used to test the zuul-system chart which currently deploys nodepool with zookeeper successfully. i don't really have something that actually prods nodepool to test if it's really running but i'll leave that for those more familiar with the integration testing of nodepool | 01:59 |
* mnaser will probably be sucked into other things in the next few days so anyone can feel free to take the progress and drive it | 01:59 | |
clarkb | the other integration jobs wait for the min-ready node to come active | 02:12 |
clarkb | just polling nodepool list with a timeout | 02:12 |
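(Roughly like this, as an ansible task; the retry counts and the 'ready' match are assumptions:)

```yaml
# Hedged sketch: poll `nodepool list` until the min-ready node shows up.
- name: Wait for the min-ready node to become ready
  command: nodepool list
  register: nodes
  until: "'ready' in nodes.stdout"
  retries: 60
  delay: 10
```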
*** swest has quit IRC | 02:15 | |
*** zbr_ has joined #zuul | 02:15 | |
*** zbr has quit IRC | 02:16 | |
*** zbr_ has quit IRC | 02:17 | |
*** zbr has joined #zuul | 02:28 | |
*** saneax has quit IRC | 02:29 | |
*** swest has joined #zuul | 02:31 | |
*** zxiiro has quit IRC | 02:35 | |
*** rlandy has quit IRC | 02:55 | |
mnaser | clarkb: that would imply that I would need an openstack deployment alongside? Maybe it could be a multi-node job where we deploy openstack on one node and zuul with kubernetes on the other? | 02:55 |
*** bhavikdbavishi has joined #zuul | 02:57 | |
*** bhavikdbavishi1 has joined #zuul | 03:00 | |
*** bhavikdbavishi has quit IRC | 03:01 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:01 | |
clarkb | mnaser: no you can setup a k8s provider | 03:21 |
clarkb | and wait for that node to show up | 03:21 |
mnaser | ya I guess but that means I have to go through a non-trivial amount of work to set up the service accounts and role bindings and add them to the role too | 04:27 |
*** saneax has joined #zuul | 04:34 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #zuul | 05:34 | |
*** swest has quit IRC | 05:35 | |
*** swest has joined #zuul | 06:22 | |
*** themroc has joined #zuul | 06:39 | |
*** pcaruana has joined #zuul | 07:53 | |
*** avass has quit IRC | 07:56 | |
*** avass has joined #zuul | 07:56 | |
*** tosky has joined #zuul | 08:12 | |
*** fdegir has quit IRC | 08:22 | |
*** fdegir has joined #zuul | 08:23 | |
*** pcaruana has quit IRC | 08:46 | |
*** pcaruana has joined #zuul | 08:53 | |
*** jpena|off is now known as jpena | 08:54 | |
*** bhavikdbavishi has quit IRC | 09:16 | |
*** zbr is now known as zbr|rover | 09:19 | |
*** avass has quit IRC | 09:51 | |
*** sanjayu_ has joined #zuul | 10:26 | |
*** saneax has quit IRC | 10:26 | |
*** sanjayu__ has joined #zuul | 10:27 | |
*** sanjayu_ has quit IRC | 10:30 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: [WIP] Docker compose example: add keycloak authentication https://review.opendev.org/664813 | 10:41 |
*** mhu has joined #zuul | 10:58 | |
*** avass has joined #zuul | 11:12 | |
*** sshnaidm is now known as sshnaidm|off | 11:52 | |
zbr|rover | did anyone attempt to make the zuul badge dynamic like other CI systems? re https://zuul-ci.org/docs/zuul/user/badges.html | 11:52 |
*** jpena is now known as jpena|lunch | 12:01 | |
*** pcaruana has quit IRC | 12:34 | |
*** pcaruana has joined #zuul | 12:38 | |
*** bhavikdbavishi has joined #zuul | 12:44 | |
*** bhavikdbavishi1 has joined #zuul | 12:49 | |
*** bhavikdbavishi has quit IRC | 12:51 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 12:51 | |
*** sanjayu__ has quit IRC | 12:56 | |
*** jpena|lunch is now known as jpena | 12:57 | |
*** rlandy has joined #zuul | 13:22 | |
tobiash | zbr|rover: you mean like pass/failed? | 13:33 |
tobiash | zbr|rover: we chose zuul/gated as static because the idea behind gating is that master is never broken | 13:34 |
zbr|rover | tobiash: yeah, that is a very good sales pitch ;) -- we all know that in reality the builds are not green in perpetuity. | 13:36 |
zbr|rover | for the moment i am using the static one, but it would be nice/useful to also be able to expose the result of the last periodic run. a bit more tricky with zuul. | 13:37 |
tobiash | Exposing the last periodic needs a service that queries the zuul api and generates the appropriate svg | 13:39 |
*** Goneri has joined #zuul | 13:39 | |
tobiash | I guess this could be done with ~50 lines of python code | 13:40 |
tobiash | it could be even useful to have an api endpoint in zuul that returns the latest result of a specific job as json and produces an svg when requesting application/svg | 13:44 |
tobiash | or alter the already existing builds endpoint such that it generates the svg of the first hit | 13:45 |
tobiash | that would work even without adding an additional endpoint | 13:45 |
mhu | tobiash, that could be shown in the builds page too | 13:45 |
zbr|rover | tobiash: it may be a bit more tricky to implement, because the idea of the badge is to report status per project, not per individual job. | 13:45 |
tobiash | zbr|rover: in that case: buildsets endpoint | 13:46 |
tobiash | there you can filter for project, branch and pipeline for the whole buildset | 13:47 |
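(A hedged sketch of such a query against the buildsets endpoint; the host, tenant, project and pipeline values are examples, and the exact query parameters are assumed from the discussion:)

```yaml
# Hedged sketch: fetch the latest periodic buildset for a project.
- name: Query the latest periodic buildset result
  uri:
    url: "https://zuul.example.org/api/tenant/example/buildsets?project=example/project&pipeline=periodic&limit=1"
    return_content: true
  register: buildset
```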
zbr|rover | yeah. nice to have, but it has lower prio than other work with more impact. | 13:47 |
tobiash | maybe something when I get bored on a rainy weekend ;) | 13:48 |
*** sanjayu__ has joined #zuul | 14:01 | |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple https://review.opendev.org/701969 | 14:05 |
mnaser | i did some refactoring for the collection of container logs into zuul/zuul-jobs and updated the related projects, appreciate some reviews - https://review.opendev.org/#/q/topic:collect-container-logs | 14:08 |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple https://review.opendev.org/701969 | 14:10 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: JWT drivers: Deprecate RS256withJWKS, introduce OpenIDConnect https://review.opendev.org/701972 | 14:20 |
AJaeger | mnaser: will review - could I trade you https://review.opendev.org/700913, please? | 14:37 |
mnaser | AJaeger: done :) | 14:38 |
mhu | I'm looking at the Capabilities class in zuul.model, it says "Some plugins add elements to the external API. In order to facilitate consumers knowing if functionality is available or not, keep track of distinct capability flags." | 14:42 |
mhu | Is there a mechanism to register such capabilities ? Or should I just add them to the class directly? | 14:43 |
AJaeger | thanks, mnaser | 14:44 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Extract project config YAML into ref docs https://review.opendev.org/701977 | 14:46 |
mhu | Is it used at all? "job_history" seems unsettable | 14:47 |
tobiash | mhu: that's plumbed through to the api/info endpoint | 14:54 |
tobiash | mhu: atm, the only use case is to inform the web ui whether build history exists or not (i.e. whether there is an sql connection) | 14:55 |
tobiash | based on that the web ui hides the builds tab | 14:55 |
tobiash | mhu: there is no register mechanism for that so add it directly there | 14:57 |
tobiash | it's initialized from here: https://opendev.org/zuul/zuul/src/branch/master/zuul/cmd/web.py#L50 | 14:57 |
Shrews | hrm, our Cross Project Gating doc is confusing to me. It states it shows "how to test future state of cross projects dependencies" but there is no "how" in there | 14:59 |
Shrews | i don't get the point | 15:00 |
tristanC | Shrews: yeah, my bad, i meant to add more content | 15:03 |
tobiash | mhu: actually, job_history there is never set, so strike that... | 15:03 |
tobiash | mhu: so you probably want to modify the fromConfig of WebInfo if you want to add some new capability to the info endpoint | 15:03 |
mhu | tobiash, yeah that's what puzzled me :) if you look at http://zuul.opendev.org/api/tenant/opendev/info for example, job_history = false yet you can hit the builds endpoint | 15:04 |
mhu | tobiash, right that's the simplest way, I was just wondering if I got the spirit of this right | 15:04 |
tobiash | mhu: I guess that's broken since probably no one runs a zuul without a db so nobody complained ;) | 15:04 |
mhu | mordred, git log snitched on you, you added the webinfo capabilities in commit 518dcf8bdba9c5c22711297395c4a9cb4e0c644d, any insights on this ? :) | 15:06 |
Shrews | tristanC: ah, i see | 15:06 |
corvus | mnaser, clarkb: an integration test job could wait for a static node | 15:14 |
fungi | mhu: i hard-code `git blame` to just return "mordred" for everything ;) | 15:16 |
tristanC | tobiash: i can't find the openshift scc for zuul you posted some time ago... would you mind pasting it again? | 15:32 |
tobiash | tristanC: I use the default privileged scc | 15:33 |
*** themroc has quit IRC | 15:35 | |
*** themroc has joined #zuul | 15:35 | |
mnaser | corvus, clarkb: oh ya, i think that might work. does nodepool need to scan the ssh keys, or can i just point at the node ip and that's probably enough? | 15:38 |
openstackgerrit | Merged zuul/zuul-jobs master: Make pre-molecule tox playbook platform agnostic https://review.opendev.org/700452 | 15:38 |
*** themroc has quit IRC | 15:40 | |
*** themroc has joined #zuul | 15:41 | |
tobiash | tristanC: you just need to make sure you run the executor with a service account that is allowed to use the privileged scc | 15:41 |
tristanC | tobiash: alright, i remembered a more complex setting. i'll try that, thanks! | 15:44 |
*** jpena is now known as jpena|brb | 15:45 | |
tristanC | btw I got an operator that now implements the full Zuul CR, here is an example usage: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml | 15:46 |
*** themroc has quit IRC | 15:46 | |
*** themroc has joined #zuul | 15:47 | |
corvus | tristanC: that looks nice! | 15:51 |
*** themroc has quit IRC | 15:51 | |
*** themroc has joined #zuul | 15:51 | |
corvus | tristanC: do you want to propose that up to the zuul-operator repo? | 15:52 |
*** rfolco has quit IRC | 15:52 | |
tristanC | corvus: yes sure, if you don't mind the dhall based implementation, i'd be happy to push that to zuul/zuul-operator | 15:53 |
corvus | tristanC: i think it's worth looking at. my biggest concern is making sure users don't have to see it. but if it's just an implementation detail, that's different. | 15:55 |
mordred | mhu: yeah - I added that a while back with the intent on it being used like you're suggesting ... but then we never plumbed it all the way through - and then we started thinking that DB was really something we wanted to make a hard requirement in the future, so the motivation to finish the info/capabilities work kind of went away | 15:55 |
mhu | mordred, okay that clears things up | 15:56 |
clarkb | corvus: tristanC: a bit more annotation of the constructs would probably be good to explain what is going on since dhall is new. But I don't think end users will ever interact with it | 15:56 |
mordred | mhu: is there a new plugin capability you were thinking of adding that might be useful to register there? | 15:56 |
mhu | mordred: yeah, the authentication config | 15:57 |
corvus | clarkb, tristanC: yeah -- i'm still not quite sure what dhall is doing here | 15:57 |
mhu | as suggested by tobiash in https://review.opendev.org/#/c/643536/ | 15:57 |
mordred | mhu: ah - yeah, that makes total sense | 15:57 |
clarkb | corvus: after looking yesterday my understanding is it compiles k8s CR yaml inputs to zuul config and ansible playbook outputs | 15:58 |
tristanC | corvus: clarkb: that's good to hear, i should be able to propose a review today or next week with what i currently have | 15:58 |
*** jangutter has quit IRC | 15:59 | |
fungi | tristanC: skimming back through the dhall, how do you end up writing it? does your keyboard have keys for λ and → glyphs or are you using compose sequences? does dhall support a pure-ascii syntax variant maybe? | 16:02 |
corvus | tristanC, pabelanger, mnaser, mordred, clarkb: maybe we should schedule a meeting to talk about how to proceed? since the spec says ansible, and mnaser proposed golang, and now tristan is looking at dhall, it seems like maybe we should try to get back on the same page. | 16:03 |
clarkb | fungi: tristanC uses the ascii dhall version | 16:03 |
clarkb | corvus: tristanC's operator is still ansible | 16:03 |
tristanC | fungi: it's currently debated here: https://discourse.dhall-lang.org/t/change-dhall-format-to-use-ascii-by-default | 16:03 |
clarkb | corvus: dhall takes the k8s CR input and rewrites it to ansible. | 16:03 |
tristanC | corvus: clarkb: yes i'm using the ansible operator-sdk | 16:04 |
fungi | clarkb: he's not using pure ascii in https://github.com/TristanCacqueray/dhall-zuul/blob/master/render.dhall for example | 16:04 |
corvus | clarkb: true, but i guess the spirit of the question is "as we maintain the operator, what are we going to expect to need to modify?" | 16:04 |
clarkb | fungi: ah most of the files I read use \ instead of lambda and -> instead of arrow | 16:04 |
fungi | got it, so there at least is an option of doing pure ascii anyway | 16:04 |
clarkb | corvus: yup, and dhall would definitely be involved in that. FWIW, with a bit of annotation calling out what the expansions are, I think it would end up being quite readable | 16:05 |
tristanC | fungi: long story short, i wish i had a lambda key on my keyboard, but i type in ascii, and the formatter takes care of the rendering. I changed the default to use '--ascii' recently as upstream suggests that, but the recent poll shows that unicode is still popular https://docs.google.com/forms/d/e/1FAIpQLSc_4Se7V6jRk4SBfAx1UdZ67Cf_5Hg0uRas5PxOMeRes4nQGg/viewanalytics | 16:05 |
clarkb | it might also help to reduce some of the nesting | 16:06 |
corvus | should we maybe plan an irc and/or teleconference meeting next week to talk about it? | 16:06 |
fungi | interesting. people like to use a language they can't type without stuffing it through a preprocessor | 16:06 |
clarkb | so that each logical expansion is a top level function | 16:06 |
clarkb | rather than a bunch of closures | 16:06 |
mnaser | i'm free next week but after that my schedule is a little funky as i'll be in EU for a bit | 16:06 |
clarkb | (at least I think I would find it more readable that way) | 16:06 |
mnaser | i can be available EU evenings | 16:07 |
mordred | fungi: I imagine many of them are using os's/editors that have auto-convert features - but I'm just guessing | 16:07 |
mnaser | i guess im the only one who learned about dhall today | 16:07 |
mordred | corvus: I think a meeting would be great. I'm on a long flight on tuesday - but could do an IRC meeting | 16:07 |
mordred | mnaser: I know next to nothing about it other than tristanC's explorations | 16:07 |
tristanC | corvus: i should have time to clean up and add comments, hoping that it will make my implementation clearer | 16:07 |
mordred | corvus: other than that - I can do voice again after tuesday | 16:08 |
mnaser | mordred: oh yay, well that's a little bit more comforting :p | 16:08 |
clarkb | I'm around next week. If we get hit by a snow storm I might disappear to go sledding with the kids or if power goes out but otherwise around :) | 16:08 |
fungi | i find it interesting on that dhall discourse thread that there are folks lobbying for completely removing the ascii option and only supporting extended unicode syntax | 16:08 |
mordred | corvus: maybe once tristanC has the annotated version up? because maybe with those annotations the code might make more immediate sense to our eyes and give us a stronger context to discuss the choice? | 16:09 |
corvus | i'm thinking voice might be good for this one... how about wednesday 1800 utc? | 16:09 |
corvus | tristanC: would that be enough time? | 16:09 |
fungi | i'm free all wednesday | 16:09 |
mordred | I'll be in singapore so that'll be 2am for me - BUT - I'll be jetlagged, so it's likely fine :) | 16:10 |
tristanC | corvus: wednesday 1800 works for me, i guess i'll start commenting the code now =) | 16:10 |
clarkb | tristanC: and maybe reduce nesting so that it looks more like imperative programming with function calls? | 16:11 |
fungi | poll results are intriguing as well... >90% write dhall in ascii, but >50% prefer that the preprocessor reformats it to extended unicode symbols | 16:11 |
mordred | fungi: that's fascinating | 16:12 |
*** themroc has quit IRC | 16:12 | |
corvus | mordred: would 15:30 be any better? | 16:12 |
fungi | still, with some 35% explicitly formatting it to ascii in the preprocessor and 41% saying they prefer to read it in ascii, i doubt the folks pushing to remove the ascii syntax entirely will have sufficient support | 16:13 |
mordred | corvus: probably? but I'm happy to do either regardless if 1800 works better for others | 16:13 |
corvus | mordred, clarkb, tristanC, pabelanger, mnaser, fungi: https://ethercalc.openstack.org/ur9x4q4z1z71 | 16:15 |
clarkb | fungi: your questions about character sets as they intersect with programming remind me of apl https://en.wikipedia.org/wiki/APL_(programming_language)#/media/File:APL-keybd2.svg | 16:22 |
*** jpena|brb is now known as jpena | 16:22 | |
fungi | clarkb: reminiscent of https://en.wikipedia.org/wiki/Space-cadet_keyboard#/media/File:Space-cadet.jpg | 16:23 |
*** mattw4 has joined #zuul | 16:23 | |
fungi | (which actually existed) | 16:23 |
corvus | okay, let's go with 15:30, i'll send an email | 16:24 |
fungi | thanks! | 16:26 |
*** rlandy is now known as rlandy|brb | 16:29 | |
fungi | clarkb: on closer inspection, the space cadet keyboard has all the extended dhall glyphs | 16:31 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web capabilities: remove unused job_history attribute https://review.opendev.org/702001 | 16:41 |
mhu | mordred, tobiash ^ if it's fine with you | 16:41 |
mhu | I'll look at what was done with the webhandlers and see if I can reimplement it somehow for the auth capabilities | 16:42 |
*** mhu has quit IRC | 17:02 | |
*** mhu has joined #zuul | 17:03 | |
*** rlandy|brb is now known as rlandy | 17:04 | |
*** bhavikdbavishi has quit IRC | 17:04 | |
AJaeger | is it safe to go from go 1.13 to 1.13.5 by default in Zuul jobs? See https://review.opendev.org/#/c/700467/ | 17:15 |
tobiash | mhu: actually it's not required, but it us planned to make sql required | 17:16 |
clarkb | AJaeger: https://golang.org/doc/devel/release.html#go1.13.minor looks safe though there is a 1.13.6 now | 17:16 |
clarkb | AJaeger: in general I think upgrading minor releases is ok | 17:16 |
clarkb | it's the 1.13 -> 1.14 upgrade that will be potentially problematic | 17:17 |
AJaeger | thanks, clarkb | 17:18 |
* AJaeger will +2A 700467 then | 17:19 | |
*** zxiiro has joined #zuul | 17:24 | |
*** sanjayu__ has quit IRC | 17:30 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 17:31 |
*** saneax has joined #zuul | 17:31 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 17:31 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 17:31 |
*** evrardjp has quit IRC | 17:33 | |
*** evrardjp has joined #zuul | 17:34 | |
pabelanger | corvus: wfm, and yah, would be nice if we could all align | 17:36 |
openstackgerrit | Merged zuul/zuul-jobs master: install-go: bump version to 1.13.5 https://review.opendev.org/700467 | 17:42 |
pabelanger | Hmm, we are seeing a traceback in github driver when trying to merge PR for a repo: http://paste.openstack.org/show/788250/ | 18:01 |
pabelanger | however, I do not know why that is | 18:01 |
pabelanger | tobiash: seen ^ before? | 18:02 |
clarkb | pabelanger: https://github.community/t5/GitHub-API-Development-and/Resource-not-accessible-by-integration-when-requesting-GitHub/td-p/13829 reading that implies to me that the functionality you are trying to get isn't available to github applications | 18:02 |
clarkb | have to make a user request instead? | 18:03 |
pabelanger | I wonder if something change on merge_method | 18:04 |
tobiash | pabelanger: hrm, I never saw this before? What was the request? | 18:04 |
pabelanger | tobiash: gate passed, zuul trying to merge | 18:04 |
pabelanger | https://github.com/ansible-network/collection_migration/pull/11 is PR | 18:05 |
tobiash | pabelanger: in zuul nothing changed recently around that afaik | 18:05 |
tobiash | pabelanger: does this affect the whole repo? | 18:05 |
tobiash | The branch protection settings would be useful | 18:06 |
pabelanger | tobiash: yah, so this has 2 different branch protection rules, 1 for master and the other for feature/*; both are the same but i did two because i'm not smart enough to figure out the regex | 18:07 |
pabelanger | settings are as follows | 18:07 |
pabelanger | Require status checks to pass before merging, check / gate | 18:07 |
pabelanger | include admins | 18:07 |
pabelanger | and restrict users to ansible-zuul (github app) | 18:08 |
pabelanger | this should be same as other working repos | 18:08 |
pabelanger | Allow merge commits is selected | 18:08 |
*** pcaruana has quit IRC | 18:09 | |
tobiash | pabelanger: the restrict users to app is quite new, might be buggy? | 18:09 |
tobiash | Can you get the post request from the logs? | 18:10 |
pabelanger | tobiash: I can check, but so far haven't seen a failure before | 18:10 |
pabelanger | yah, 1 sec | 18:10 |
tobiash | And further to check that zuul isn't configured to use squash merge | 18:11 |
pabelanger | https://pastebin.com/GerARqfK | 18:12 |
pabelanger | let me check that too | 18:12 |
pabelanger | tobiash: we don't set it, so it is the default | 18:13 |
pabelanger | merge-commit, IIRC on github | 18:13 |
tobiash | hrm no idea what could cause this | 18:16 |
*** rfolco has joined #zuul | 18:16 | |
tobiash | Is this affecting just this pr, the whole repo or the whole app? | 18:16 |
clarkb | is it possible that restrict users is placing a restriction for user to server api calls | 18:16 |
clarkb | which would break the application per the link I pasted | 18:16 |
pabelanger | tobiash: not sure, this is first PR | 18:17 |
pabelanger | I can try other branch | 18:17 |
pabelanger | clarkb: maybe, but can remove and see | 18:17 |
clarkb | I would try removeing the restrict users and see if it works | 18:17 |
clarkb | ++ | 18:17 |
tobiash | yes, that' what I'd do as well | 18:17 |
tobiash | pabelanger: first pr in that repo or of that gh app? | 18:18 |
tobiash | in case of the latter also check the access rights of the app | 18:18 |
pabelanger | tobiash: well, not the first PR, but the first since we upgraded to 3.14.0; this worked in dec 2019 when using 3.11.1 | 18:19 |
pabelanger | yah, github app is on repo | 18:20 |
pabelanger | has correct permissions | 18:20 |
tobiash | Really weird | 18:20 |
pabelanger | no change after removing only ansible-zuul | 18:23 |
clarkb | pabelanger: fwiw the only github driver changes in zuul recently that I know of are the ones you and I made recently to make searching for dependencies more efficient | 18:25 |
clarkb | nothing related to merging changes | 18:25 |
clarkb | pabelanger: possible that github3 updated in your upgrade? | 18:25 |
*** jpena is now known as jpena|off | 18:25 | |
pabelanger | clarkb: yah, thats the change I was looking at | 18:26 |
pabelanger | github3.py==1.3.0 | 18:26 |
pabelanger | that is same version as before | 18:26 |
pabelanger | well, damn | 18:28 |
pabelanger | I think this is the issue | 18:28 |
pabelanger | https://github.community/t5/GitHub-Actions/GitHub-action-Resource-not-accessible-by-integration/td-p/19707 | 18:28 |
pabelanger | OAuth tokens and GitHub Apps are restricted from editing the main.workflow file. This means it isn't possible to merge a Pull Request via an Action if it updates the main.workflow file I'm afraid. | 18:29 |
pabelanger | let me rebase away that change, and see what happens | 18:29 |
clarkb | so it is change specific if you edit the workflow file? | 18:29 |
pabelanger | clarkb: so, yah this is a little messed up. the PR is a rebase of an upstream project | 18:30 |
pabelanger | so a series of commit to bring master branch up to date | 18:30 |
pabelanger | upstream uses github actions | 18:30 |
tobiash | pabelanger: ok, so that suggests that this problem won't affect other prs that don't modify those files | 18:32 |
pabelanger | welp, that is a terrible design | 18:33 |
pabelanger | that was the issue | 18:33 |
pabelanger | https://github.com/ansible-network/collection_migration/pull/11 | 18:33 |
pabelanger | that means | 18:33 |
pabelanger | zuul cannot gate github repos with .github/workflows, regardless of whether the feature is enabled or not | 18:34 |
tobiash | yay github :/ | 18:34 |
pabelanger | oh | 18:35 |
pabelanger | this repo has it enabled | 18:35 |
pabelanger | so, let me disable it and add back | 18:35 |
pabelanger | yup, fails even with disabled | 18:37 |
pabelanger | that is terrible | 18:37 |
pabelanger | so, I guess I get to write a new job that checks for the .github/workflows folder and fails if found | 18:39 |
fungi | i'm not entirely sure how to parse github.community's date "format" but the response from github seems to have been either 10 or 8 months ago, saying they were "passing the feedback to the team to consider" | 18:40 |
fungi | if they haven't solved this in most of a year, then i wouldn't hold my breath | 18:40 |
pabelanger | yah, I would love to see where this is documented | 18:42 |
pabelanger | fungi: clarkb: tobiash: guess, force push is the way to solve this | 18:48 |
clarkb | pabelanger: or escalate your privs and click the merge button | 18:48 |
clarkb | which reduces the chance that you'll merge the wrong thing | 18:49 |
pabelanger | yah, doesn't scale too well | 18:49 |
clarkb | well should be infrequent right? | 18:49 |
clarkb | onyl when you change those files | 18:49 |
pabelanger | thinking long term, some ansible projects want github actions | 18:49 |
clarkb | oh sure long term we should figure out a way to address it and having zuul push merges back would probably do it | 18:50 |
clarkb | I thought you meant for this change | 18:50 |
pabelanger | yah, in this case, best to delete the folder and deal with the rebase cost | 18:50 |
pabelanger | but yah, hear you | 18:50 |
pabelanger | just terrible that it is blocked regardless of whether actions is enabled or disabled | 18:51 |
tobiash | clarkb: push by zuul will be blocked as well according to that link | 18:52 |
clarkb | oh ha ok | 18:53 |
tobiash | the only way I see to solve this in a non hacky way is talking to github | 18:53 |
pabelanger | tobiash: not sure, internal ansible folks say it works with git push --force | 18:56 |
pabelanger | grumble | 18:56 |
pabelanger | thanks github | 18:56 |
clarkb | tobiash: pabelanger that link says they used an action token and it was from the action | 18:56 |
clarkb | could be the git push fails if done that way but with another user's token it would be ok? | 18:56 |
*** pcaruana has joined #zuul | 18:57 | |
pabelanger | https://github.com/ansible-community/collection_migration/pull/210/files | 18:57 |
tobiash | clarkb: a different user probably works but this should be considered a hack because the github app is what's supposed to be handling the access... | 18:58 |
pabelanger | I think the collection migration tool is using deployment keys | 18:58 |
clarkb | tobiash: yes, but the failure was from within the action not an app aiui | 18:59 |
clarkb | tobiash: but maybe the action and app creds are equivalent here | 18:59 |
tobiash | I guess it's the same type of auth token there as we see zuul failing to merge as well | 19:02 |
pabelanger | so, right now, I can't think of a good way to deal with this in zuul, without writing a new job for all projects that checks if the files exist. | 19:07 |
clarkb | a good start may simply to be to add a note about this in the github driver docs | 19:08 |
clarkb | so that it is less of an investigation if people hit it in the future | 19:08 |
clarkb | jlk: ^ may also be able to offer some insight | 19:08 |
pabelanger | yah, would be interested in official docs (on github) so I can link to them, but can't seem to find any | 19:23 |
jlk | oh that looks fine | 19:37 |
jlk | IIRC Actions is still in beta, so there's definitely going to be docs gaps | 19:37 |
pabelanger | jlk: what is best way to pass along feedback here? Aside from asking you :) Basically, if actions are disabled, we'd want github to allow merging of .github/workflows configs. | 19:39 |
jlk | asking | 19:42 |
pabelanger | thanks! | 19:43 |
jlk | https://support.github.com/contact/feedback?contact[category]=GitHub+Actions | 19:44 |
jlk | Looks like you still have to pick Actions. one sec | 19:44 |
jlk | ¯\_(ツ)_/¯ just pick it from the drop down | 19:45 |
pabelanger | ack, cool | 19:45 |
corvus | jlk, pabelanger: secondarily, it would be nice if there were a way to merge changes if actions *are* enabled. maybe an extra permission that can be granted to an app. otherwise, it doesn't really leave much space for github actions to co-exist with other apps. | 19:48 |
jlk | agreed | 19:48 |
pabelanger | +1 | 19:49 |
corvus | pabelanger: you want to pass on both of those pieces of feedback? | 19:49 |
pabelanger | will do | 19:49 |
corvus | pabelanger: thanks! | 19:49 |
*** Goneri has quit IRC | 19:59 | |
pabelanger | corvus: jlk: feedback sent, not sure if there is a way for me to track status of it | 20:01 |
*** michael-beaver has joined #zuul | 20:57 | |
*** rfolco has quit IRC | 21:06 | |
*** zxiiro has quit IRC | 21:23 | |
*** rlandy has quit IRC | 21:26 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Fix typo in helm role https://review.opendev.org/702046 | 21:27 |
*** Goneri has joined #zuul | 21:28 | |
corvus | pabelanger, yoctozepto: 678273 +3, thanks | 21:38 |
clarkb | pabelanger: mordred: what was the process of doing volume detachments when the nova server no longer exists? | 21:46 |
clarkb | iirc that was sorted out in here recently | 21:46 |
yoctozepto | corvus: thanks | 21:47 |
yoctozepto | any eta to be picked up by opendev/openstack instance? | 21:47 |
fungi | clarkb: based on scrollback it was this api method: https://docs.openstack.org/api-ref/block-storage/v3/#force-detach-a-volume | 21:48 |
mordred | clarkb: one sec ... | 21:48 |
mordred | fungi: do you have that scrollback quick enough to find the paste link I sent to pabelanger? | 21:48 |
fungi | though a clean recipe on how we can run that from bridge.o.o to do batch cleanup would be nice | 21:48 |
fungi | oh, there was a paste too? checking | 21:48 |
mordred | fungi: yeah - I'm also looking | 21:49 |
fungi | http://paste.openstack.org/show/788133/ | 21:49 |
fungi | the power of grep compels you | 21:49 |
fungi | now if only paste.o.o could respond in a timely fashion | 21:50 |
mordred | the uuid in that string comes from volume.attachments and is the id of a given attachment. the overall algorithm is "get volume, look at attachments, look to see if the server.id listed in the attachment exists, if not, do that DELETE call to delete the attachment, then you can delete the volume" | 21:51 |
fungi | and i guess i can pass the usual OS_CLIENT_CONFIG_FILE envvar? | 21:53 |
mordred | fungi: something like http://paste.openstack.org/show/788260/ (untested) - and yes, you should be able to do that | 21:54 |
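(The same algorithm rendered as a hedged ansible sketch; the endpoint host, token and ids are placeholders, and the microversion matches the 3.31 value used above:)

```yaml
# Hedged sketch: delete a stale attachment record, after which the
# volume itself can be deleted. All identifiers here are placeholders.
- name: Delete the stale volume attachment
  uri:
    url: "https://block-storage.example.org/v3/{{ project_id }}/attachments/{{ attachment_id }}"
    method: DELETE
    headers:
      X-Auth-Token: "{{ auth_token }}"
      OpenStack-API-Version: "volume 3.31"
    status_code: [200, 202, 204]
```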
mordred | fungi: where is this happening now? | 21:55 |
clarkb | mordred: vexxhost sjc1 | 21:55 |
mordred | if you like, I can do a quick try on that script without deleting thing and make it something we can run run | 21:55 |
mordred | clarkb: the zuul account? | 21:55 |
mordred | clarkb, fungi: give me a sec - let me make a proper script that we can run | 21:55 |
clarkb | mordred: yes | 21:55 |
clarkb | mordred: we've leaked a bunch of volumes which is preventing us from deleting images which is filling the builder disks | 21:56 |
openstackgerrit | Merged zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 21:56 |
mordred | clarkb: "awesome" | 21:56 |
corvus | yoctozepto: maybe next week? | 21:57 |
fungi | mordred: yeah, i tried but am getting errors like this: | 21:57 |
fungi | Volume 0f91579c-c627-452b-aad4-67cdeae865c3 could not be found. | 21:58 |
fungi | when that volume shows up in the `openstack volume list` output | 21:58 |
mordred | fungi: cool. let me take a swing at it | 21:58 |
fungi | `openstack volume show 0f91579c-c627-452b-aad4-67cdeae865c3` provides an attachment_id of 42511fe3-d926-4b99-b275-38f0e8fa5a76 | 21:59 |
fungi | so i tried calling print(c.block_storage.delete('/attachments/42511fe3-d926-4b99-b275-38f0e8fa5a76', microversion='3.31')) | 21:59 |
fungi | leading me to suspect there's a deeper problem (though in all likelihood i'm just doing it wrong) | 22:00 |
mordred | fungi: I feel like I just deleted that one | 22:08 |
mordred | (the attachment) | 22:08 |
mordred | can you double check? | 22:08 |
fungi | lookin' | 22:10 |
fungi | attachments [] | 22:10 |
fungi | i believe you did | 22:10 |
fungi | any clue what i was missing? | 22:10 |
fungi | magic fingers? | 22:10 |
mordred | maybe magic fingers :) | 22:11 |
mordred | I've got a script in /root called clean-volumes.py that should work - although I'm going to run it one more time in no-op mode | 22:11 |
mordred | fungi: if you run it with OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml python3 clean-volumes.py | 22:12 |
mordred | fungi: it should print a bunch of REST calls- then a long list of actions | 22:12 |
corvus | mnaser: https://github.com/aslafy-z/helm-git looks interesting but iiuc, one would need to build a custom argo deployment image in order to use it | 22:13 |
mordred | fungi: I think we mostly want to make sure that the list of actions it's going to take there don't include deleting the mirror volume | 22:13 |
fungi | sure, though ideally the mirror volume is in another castle | 22:13 |
mnaser | corvus: !!!! i like that a lot | 22:14 |
corvus | mnaser: i think https://argoproj.github.io/argo-cd/operator-manual/custom_tools/ is relevant to that | 22:14 |
mordred | fungi: oh - right - the mirror volume is in openstackci-vexxhost right? | 22:14 |
mnaser | corvus: yes, that would be exactly the way to go about it | 22:15 |
clarkb | mordred: fungi yes it should be in the other tenant/project/whatever its called | 22:15 |
mnaser | corvus: but we can use an initcontainer and cheat... | 22:15 |
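(The "cheat" could look roughly like this, along the lines of the custom-tools doc linked above; the image, command and mount path are assumptions:)

```yaml
# Hedged sketch: install the helm-git plugin into the argo repo-server
# via an initContainer. Image and paths are assumptions.
initContainers:
  - name: install-helm-git
    image: alpine/helm
    command: ["helm", "plugin", "install", "https://github.com/aslafy-z/helm-git"]
    volumeMounts:
      - name: helm-plugins
        mountPath: /root/.local/share/helm/plugins
```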
mordred | clarkb: ok - so - we should be "safe" if there is a bug in this script from deleting anything too important, right? | 22:15 |
clarkb | mordred: I think so. Another check is to look at hte volume size | 22:15 |
mordred | clarkb, fungi: maybe this is an #openstack-infra convo - whoops, sorry | 22:15 |
clarkb | 80GB should be for the test nodes | 22:15 |
mnaser | corvus: thanks for reviewing my collect-container-logs role, i refactored uses i saw of it in the topic - https://review.opendev.org/#/q/topic:collect-container-logs | 22:20 |
*** armstrongs has joined #zuul | 22:21 | |
corvus | mnaser: oh cool, thx | 22:21 |
mnaser | btw, how do we feel about shipping 'tools' inside zuul? for example, in the current ci jobs, nodepool loops forever erroring because zookeeper is not up yet .. it would be nice if we had some small python tools in the image that waited for a zookeeper cluster to be ready, so we can use those in the initContainer | 22:22 |
mnaser | so that if the zookeeper cluster is not up, nodepool simply won't start | 22:23 |
openstackgerrit | Merged zuul/zuul master: Make files matcher match changes with no files https://review.opendev.org/678273 | 22:23 |
mnaser | it avoids spamming the logs with tons of errors as it fails to connect | 22:23 |
mnaser | or perhaps nodepool can expose some sort of alive/readiness check so the containers don't become ready until nodepool is ready | 22:24 |
mnaser | that way during a new deployment or rollout, if something goes wrong, it will stop rolling out the other pods | 22:25 |
clarkb | mnaser: fwiw you can probably get away with nc or telnet or something simple like that | 22:25 |
clarkb | I'm not sure we need to ship a special tool for that | 22:25 |
mnaser | clarkb: yeah but then i'd have to do some yaml parsing of the list of zookeeper servers provided | 22:25 |
mnaser | and assume the config format doesn't change, for example | 22:25 |
clarkb | aren't you providing that list to the config anyway? | 22:26 |
clarkb | (meaning you could provide it to another tool) | 22:26 |
mnaser | in nodepool's case, i'm actually yaml-ifying the config input straight into a file, without having some manual 'insert this here' thing | 22:26 |
clarkb | remember when we had init scripts that solved this for us | 22:27 |
* clarkb grumps | 22:27 | |
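(The nc variant as a pod initContainer sketch; the service name, port and image are assumptions:)

```yaml
# Hedged sketch: gate pod startup on zookeeper answering its port.
initContainers:
  - name: wait-for-zookeeper
    image: busybox
    command: ['sh', '-c', 'until nc -z zookeeper 2181; do sleep 2; done']
```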
clarkb | mnaser: the cluster (zuul, nodepool and zk) will eventually converge on a happy state once zk is up and running right? it's not like it's going into a permanent error state? | 22:28 |
mnaser | clarkb: it will, but technically during a rollout, if something is borked, it will just keep rolling out and breaking everything | 22:29 |
clarkb | but is it broken? | 22:29 |
clarkb | (If it is I think that is a bug worth fixing) | 22:29 |
mnaser | right but we can forget zookeeper in this case, say, if you made a typo in your config and you're using the zuul helm charts | 22:29 |
mnaser | if the first pod goes up and there's no readiness check, it will start rolling out the second one | 22:30 |
mnaser | so you might roll out a broken config.. but if we have a readiness check where nodepool can tell k8s "ok i'm good to go", it can safely continue with the rollout | 22:30 |
clarkb | sure, checking for proper errors and not needing to wait for services to start in sequence are two different problems | 22:30 |
clarkb | I think we should avoid needing strict sequencing | 22:31 |
mnaser | yes, i agree, i think i started with one problem set and moved to another in the conversation | 22:31 |
*** armstrongs has quit IRC | 22:31 | |
clarkb | right what I was trying to get at earlier is if you've found a sequencing issue we should fix that at the source not workaround it | 22:31 |
clarkb | however, config validation is a good thing to verify and both zuul and nodepool have config validation for the yaml bits but not the ini bits iirc | 22:32 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:32 |
mnaser | ya i think the idea is to find a way to get nodepool to say "ok i'm up and im healthy" | 22:33 |
mnaser | and then k8s can poll that as a readiness/liveness check | 22:33 |
corvus | mnaser: i am not opposed to adding http (prometheus) endpoints that expose readiness | 22:34 |
corvus | would that help? | 22:34 |
mnaser | yeah, so generally we'd probably end up (ideally) with 3 endpoints -- readiness, liveness and metrics | 22:34 |
mnaser | readiness can equal liveness but metrics (prometheus) is different, but yeah, if we add an http endpoint, that's probably enough infrastructure to do it | 22:35 |
mnaser | reason is most of the liveness and readiness checks are http request based rather than parsing metrics | 22:35 |
corvus | okay, i thought readiness/liveness was part of prom; if it isn't let's set that aside for now, because prom for metrics is complicated | 22:35 |
corvus | i'll rephrase that as "i think it would be fine to add http endpoints that expose liveness info" | 22:36 |
corvus | and readiness :) | 22:36 |
mnaser | cool, i don't know if i have the time to dig into this but it would be neat if nodepool started reporting 200 OK on a port once it's connected to zookeeper and all threads are up and running | 22:37 |
corvus | at the same time, i think nodepool can run a config check on its config as a gate job :) | 22:37 |
corvus | but, belts and suspenders. we can have both. | 22:37 |
corvus | mnaser: can you take a look at https://review.opendev.org/702052 ? | 22:38 |
corvus | it's the most complex helm i've written to date :) | 22:38 |
mnaser | corvus: i am, im trying to double check the ternary formatting | 22:38 |
mnaser | so it looks like you can also do: true | ternary "foo" "bar" | 22:39 |
mnaser | not that i think its a big deal, but that is easier for my brain to process | 22:39 |
corvus | oh neat, that's sort of ansibleish | 22:39 |
corvus | yeah, it took me a minute to understand the docs and realize the condition was last | 22:39 |
mnaser | helm uses this library https://github.com/Masterminds/sprig afaik | 22:40 |
mnaser | https://github.com/Masterminds/sprig/blob/48e6b77026913419ba1a4694dde186dc9c4ad74d/docs/defaults.md so thats there, but yeah, it seems right, though easier to process the more ansible-y one | 22:40 |
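Both spellings render the same: sprig's ternary takes the condition last in function form, which is what makes the piped form read more naturally. The value names below are illustrative, not the chart's actual keys.

```yaml
# function form: the condition argument comes last
secretName: {{ ternary "external-secret" "chart-managed-secret" .Values.useExternalSecret }}
# equivalent piped form, closer to ansible's "value | ternary(a, b)"
secretName: {{ .Values.useExternalSecret | ternary "external-secret" "chart-managed-secret" }}
```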
clarkb | mnaser: corvus do you think we need that for every microservice or just the "brain" | 22:40 |
corvus | mnaser: i'll switch to "|" | 22:41 |
clarkb | eg with zuul would we want each executor and merger etc to report it or have zuul web aggregate and report a single value | 22:41 |
mnaser | clarkb: the approach is usually that each microservice runs its own health check, because a single zuul executor can actually be having problems while the rest are ok | 22:41 |
corvus | clarkb: good q -- if we can have zuul-web do that, it would be best... mnaser would that be complicated to have the liveness check check a service? | 22:41 |
mnaser | https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request | 22:42 |
corvus | mnaser: right, but could you say, hit "http://zuul/live?host=executor-4" ? | 22:42 |
mnaser | so generally the http request hits a url/path on the same pod afaik | 22:42 |
clarkb | mnaser: and I guess you can assign domains to those readiness checks? because a single executor of many being broken is why we have more than one (or one reason to) | 22:42 |
clarkb | so zuul application being up is different than a specific executor being up | 22:42 |
mnaser | right, so if it is stateless and its broken, k8s can (up to you) decide to kill it and restart it | 22:42 |
mnaser | many/most of the probes seem to want to hit the pod itself. also, the other reason you wouldn't want it to hit zuul-web: if you have issues with zuul-web, you don't want your executors to all show as not alive | 22:43 |
mnaser | and then k8s goes ahead and kills them because they are failing health checks | 22:43 |
corvus | a liveness command may be easier? | 22:44 |
corvus | https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-command | 22:44 |
mnaser | yes, i forgot about the commands | 22:44 |
clarkb | that all makes sense. however, your earlier example cares about zuul/nodepool the application, not its individual parts | 22:44 |
clarkb | so I think we end up wanting both things | 22:44 |
mnaser | that might be a lot better, we can just write a file | 22:44 |
corvus | we've got an easy pattern for commands with 'zuul-scheduler foo' and 'zuul-executor foo' | 22:44 |
corvus | we could extend that to 'zuul-executor ready' | 22:45 |
corvus | to return an exit code based on readiness | 22:45 |
corvus | and apply that to every microservice, regardless of whether it otherwise runs an http server | 22:45 |
mnaser | i think the only tricky bit is that that's going to be out of process | 22:45 |
mnaser | so unless you're planning to have some sort of socket open that it can talk to.. or some file it lays down that the other process reads | 22:45 |
clarkb | mnaser: thats how the commands work today | 22:46 |
corvus | yeah, we have a socket | 22:46 |
mnaser | oh okay so thats perfect then | 22:46 |
mnaser | seems easy now, too easy :p | 22:46 |
corvus | scheduler and executor definitely have this already, maybe some others, and the command structure is standardized enough that we can put it on all the commands real quick | 22:46 |
corvus | so yeah, that's probably < 1 hour of work :) | 22:47 |
mnaser | zuul-web can get away with it and just use an http health check | 22:47 |
clarkb | nodepool too. though that socket is to zk | 22:47 |
mnaser | given it already runs a server | 22:47 |
clarkb | so might get weird if zk isn't up :) | 22:47 |
mnaser | well part of the ready could be an attempt to connect to zk and if that fails, then it's not ready | 22:47 |
corvus | clarkb: nodepool and zuul share a process startup framework, so should be easy to add commands to that | 22:47 |
corvus | clarkb: so we could do "nodepool-builder ready" | 22:48 |
corvus | (rather than "nodepool ready") | 22:48 |
clarkb | makes sense | 22:48 |
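Put together, the exec-probe version of this would look something like the fragment below. The "ready" subcommand is only a proposal in this conversation; nothing released at this point actually implements it.

```yaml
# Container-spec fragment; the "ready" subcommand is hypothetical here.
readinessProbe:
  exec:
    command: ["zuul-executor", "ready"]   # exit 0 over the command socket == ready
  periodSeconds: 30
  timeoutSeconds: 10
# the same shape would work for "nodepool-builder ready", with the
# readiness check attempting the zk connection as mnaser suggests
```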
mnaser | corvus: btw, could you also rebase or depends-on https://review.opendev.org/#/c/701764/ as well so we can see it in action too | 22:48 |
mnaser | (for the secret change) | 22:48 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:49 |
corvus | mnaser: oh yep 1 sec | 22:49 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Change builder container name https://review.opendev.org/701793 | 22:50 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add empty clouds value https://review.opendev.org/701865 | 22:50 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:50 |
corvus | mnaser: https://gerrit-review.googlesource.com/c/zuul/ops/+/250112 is what that looks like in action | 22:54 |
corvus | so the tradeoff is that the secret has to be set up external to argo/helm, but in exchange we get to just edit the nodepool file on disk. | 22:55 |
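The externally managed secret would be a plain manifest applied outside argo/helm, something like the sketch below; the name, namespace, and key are guesses at what the chart option expects, not taken from 702052.

```yaml
# Hypothetical externally-managed secret; names and keys are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: nodepool-config
  namespace: zuul
stringData:
  nodepool.yaml: |
    labels: []
    providers: []
```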
clarkb | corvus: mnaser if I'm looking at ^ and want to figure out where the chart is for zookeeper how do I find that? | 22:55 |
mnaser | i think corvus deployed it externally outside argo | 22:55 |
corvus | clarkb: readme line 26 | 22:56 |
*** avass has quit IRC | 22:56 | |
clarkb | ya that's just a giant xml doc | 22:56 |
corvus | clarkb: oh, yeah that's the thing we were talking about with fungi yesterday. | 22:57 |
mnaser | corvus: btw.. kubectl -n argocd get application/zookeeper -- trim that down, and add it as an app | 22:57 |
mnaser | so you can deploy zookeeper via argocd too :) | 22:57 |
mordred | pabelanger, Shrews: https://review.opendev.org/702053 <-- has a complete script implementing cleaning up after leaked volumes from BFV openstack clouds that are attached to non-existent servers | 22:58 |
clarkb | corvus: I'm curious to see if they modify the zk settings to make zk not terribly slow, and rotate the journal file, and run in a 3 or 5 pod cluster | 22:58 |
corvus | mnaser: how's that different than the "argocd app create" for zookeeper | 22:58 |
clarkb | corvus: really the first two things are probably most important, otherwise your zuul will be slow and then run out of disk and have a sad | 22:58 |
mnaser | corvus: argocd app create pretty much creates a local yaml manifest and pushes it out to the cluster :) | 22:58 |
mnaser | clarkb: it isn't listed anywhere, but it's here -- https://github.com/helm/charts/tree/master/incubator/zookeeper | 22:59 |
corvus | clarkb: should be ~= to this https://github.com/helm/charts/tree/master/incubator/zookeeper | 22:59 |
mnaser | and you can look at all the values https://github.com/helm/charts/blob/master/incubator/zookeeper/values.yaml | 22:59 |
corvus | mnaser: so kubectl -n argocd get application/zookeeper is a shorthand? | 23:00 |
mnaser | corvus: right, i meant that you can get the application definition from k8s and store it in-repo, so you dont have to bootstrap it with a cli command | 23:00 |
mnaser | you can kubectl apply -f zookeeper-app.yaml -f nodepool-app.yaml | 23:00 |
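In-repo, that Application definition might look like the sketch below; the repoURL, chart version, and namespaces are placeholders. Once committed, kubectl apply -f replaces the "argocd app create" bootstrap step.

```yaml
# zookeeper-app.yaml -- sketch; repoURL, version and namespaces are guesses
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: zookeeper
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://kubernetes-charts-incubator.storage.googleapis.com
    chart: zookeeper
    targetRevision: 2.1.0
  destination:
    server: https://kubernetes.default.svc
    namespace: zuul
```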
clarkb | ya antiaffinity is optional so you'll likely want to toggle that | 23:03 |
mnaser | also it uses a 5G pv by default i think | 23:04 |
clarkb | mnaser: corvus https://github.com/helm/charts/blob/master/incubator/zookeeper/values.yaml#L277 is the value you want to change to avoid running out of disk iirc | 23:09 |
clarkb | otherwise you grow an infinite number of snaps | 23:09 |
corvus | clarkb: thanks! | 23:10 |
clarkb | yup our system-config/manifests/site.pp comments confirm | 23:10 |
clarkb | opendev sets that to 6, which means every 6 hours it purges old snaps down to the snap retain count | 23:10 |
clarkb | the other thing we change is snapCount to 10000 | 23:11 |
clarkb | that sets a higher limit on how big of a transaction backlog zk can have before it throttles clients to catch up | 23:11 |
clarkb | iirc, because we are bursty when restarting services, that helps during those times | 23:12 |
corvus | clarkb: i've made notes to change both of those | 23:12 |
clarkb | cool | 23:12 |
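For reference, the two tunables discussed above, expressed as a values override for the incubator/zookeeper chart. The numbers come from this conversation, but the exact env key names are from memory of that chart and worth double-checking against its values.yaml.

```yaml
# Values-override sketch for incubator/zookeeper; key names unverified.
env:
  ZK_PURGE_INTERVAL: 6        # run autopurge every 6 hours
  ZK_SNAP_RETAIN_COUNT: 3     # snapshots kept by each purge (chart default)
  ZK_SNAP_COUNT: 10000        # larger txn backlog before zk snapshots/throttles
```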
corvus | mnaser: do you have any idea what the linter is on about here? https://zuul.opendev.org/t/zuul/build/ba271a844a6c46a38c303ba4e88e33ad | 23:12 |
clarkb | the other thing to check is log rotation but https://github.com/helm/charts/blob/master/incubator/zookeeper/templates/config-script.yaml#L92 takes care of that for you I believe. | 23:13 |
corvus | mordred: do you happen to have thoughts about a mysql-ish helm chart? istr you used helm for the pxc thing a while back? | 23:13 |
corvus | maybe just https://github.com/helm/charts/tree/master/stable/percona-xtradb-cluster ? | 23:13 |
mordred | corvus: I did - but I did a helm export and then committed the results | 23:14 |
mordred | corvus: yeah - I think that's what I used ... looking real quick | 23:14 |
mordred | corvus: kubernetes/export-percona-helm.sh | 23:14 |
mordred | corvus: in system-config - is what I used when we did things before | 23:14 |
mordred | corvus: but since you're using helm directly, you obviously don't need to export - but maybe the arguments will be helpful | 23:15 |
corvus | mordred: cool, thanks | 23:15 |
mnaser | corvus: i think we need to add an empty tenantConfig and a conditional on "extraFiles" | 23:15 |
corvus | mnaser: ^ for the sql part of this | 23:15 |
corvus | mnaser: oh i see, i'll see if i can fix that real quick | 23:16 |
mnaser | because it's probably rendering to like... | 23:16 |
mnaser | https://www.irccloud.com/pastebin/wRRWY0sm/ | 23:16 |
mnaser | personally, i'm pretty indifferent about db. i actually use this operator cause it has backups and what not: https://github.com/presslabs/mysql-operator | 23:17 |
mnaser | so for me just need to make sure we have a way to disable it | 23:18 |
mordred | oh cool. that operator was _not_ finished back when we were looking at gitea | 23:18 |
mnaser | mordred: yeah it's pretty polished and runs pretty well for us. the reason i am not using the percona one is because.. at least for our openstack use case, we're always reading/writing to one server all the time because of deadlock stuff | 23:19 |
corvus | i love the juxtaposition of "bulletproof" and "alpha and not suitable for critical production workloads" :) | 23:19 |
mordred | corvus: I do not have emotional ties to the other thing - that was just best I could find back then | 23:20 |
mnaser | so running a master/slave (ehhh, do we have another term folks use these days for dbs?) does the same thing | 23:20 |
mordred | mnaser: oh - right, that's replication based not galera based, yeah? | 23:20 |
mnaser | yep | 23:20 |
mordred | mnaser: also - I probably don't want to know about the deadlocks thing with openstack do I? | 23:21 |
clarkb | tristanC is using postgres in his operator demo for zuul | 23:21 |
mnaser | mordred: https://www.percona.com/blog/2014/09/11/openstack-users-shed-light-on-percona-xtradb-cluster-deadlock-issues/ | 23:21 |
clarkb | fwiw | 23:21 |
mnaser | "The simplest way to overcome this issue from the operator’s point of view is to use only one writer node for these types of transactions. This usually involves configuration change at the load-balancer level." | 23:22 |
mnaser | that's 2014 but i don't know if much work was put into rewriting those queries so they don't deadlock a cluster | 23:22 |
fungi | the overlord/subjugated model of database clustering | 23:23 |
mordred | oh - right. I remember discussion on that and discussions about not doing things that way - and I'm pretty sure it died on the vine | 23:23 |
*** rfolco has joined #zuul | 23:25 | |
tristanC | clarkb: i just use pg because i find it easier to setup and use | 23:25 |
clarkb | ah ok so nothing to do with operator readiness | 23:25 |
*** mattw4 has quit IRC | 23:27 | |
corvus | mnaser: it looks like extraFiles is the tricky part there; i can't (with my limited helm experience) find an easy way to do that kind of arbitrary include. one way to fix it would be to iterate over it as a dict | 23:28 |
tristanC | to help with service readiness, i added init-containers to most services to run python -c 'socket.connect(service, port)' on the dependency (e.g. executor -> gearman -> zk -> db); this makes the deployment a bit slower but the service logs are cleaner that way | 23:29 |
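Expanded into an actual pod-spec fragment, that init-container pattern might look like this; the image, service name, and port are placeholders, and the loop is spelled out since a bare socket.connect would exit nonzero on the first refused connection.

```yaml
# Pod-spec fragment; image, host and port are assumptions.
initContainers:
  - name: wait-for-zookeeper
    image: python:3-alpine
    command:
      - python
      - -c
      - |
        import socket, time
        # block until the dependency accepts TCP connections
        while True:
            try:
                socket.create_connection(("zookeeper", 2181), timeout=2).close()
                break
            except OSError:
                time.sleep(1)
```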
mnaser | corvus: you could just simply add an if statement, since an empty map in Golang evaluates to false | 23:30 |
corvus | mnaser: ah yep | 23:30 |
mnaser | I believe so anyways. Alternatively, you can look at how I did nodeSelector and tolerations too | 23:31 |
* mnaser is on mobile so can’t point to specifics | 23:31 |
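Combining both suggestions, the template fix might look roughly like this; the .Values path and surrounding ConfigMap layout are guesses at the chart's structure, not the actual patch.

```yaml
# Template sketch: guard on the (possibly empty) map, then iterate it.
{{- if .Values.launcher.config.extraFiles }}
{{- range $name, $content := .Values.launcher.config.extraFiles }}
  {{ $name }}: |
{{ $content | indent 4 }}
{{- end }}
{{- end }}
```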
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add Zuul charts https://review.opendev.org/700460 | 23:31 |
*** pcaruana has quit IRC | 23:35 | |
*** rfolco has quit IRC | 23:40 | |
*** michael-beaver has quit IRC | 23:57 | |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Allow tenant config file to be managed externally https://review.opendev.org/702057 | 23:58 |