Friday, 2020-01-10

corvustristanC: do you think we should use dhall instead of ansible or go?00:00
tristanCand i'm also able to generate an ansible playbook based on the same spec: https://github.com/TristanCacqueray/dhall-zuul/blob/master/deployments/cr-playbook.yaml00:00
tristanCcorvus: well, the k8s operator still uses ansible: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/roles/zuul/tasks/main.yaml00:00
corvustristanC: oh, so you have ansible running dhall?00:01
clarkbdhall is just a config language (replacement for json or yaml) right?00:01
tristanCcorvus: yes, dhall is just used to generate the k8s objects, it doesn't manage zuul. I'll use ansible to implement backup/restore and scaling00:01
clarkbI don't think there is an operator sdk implementation using dhall00:02
corvusok.  got it.  yeah.  i mean, i wouldn't put it past someone to have made an operator sdk implementation with dhall.  :)00:02
tristanCclarkb: that's correct, dhall is like a more advanced yaml or json, the evaluation only results in a configuration00:02
corvusremote:   https://gerrit-review.googlesource.com/c/zuul/ops/+/250112 Add initial bootstrapping instructions and nodepool config [NEW]00:03
corvusmnaser: ^ that's what i did today00:03
corvusmnaser: thanks for your help :)00:03
tristanCcorvus: well i wrote another operator that applies a dhall expression to kubernetes, e.g.: https://github.com/TristanCacqueray/dhall-operator/blob/master/operator/examples/dhall-test.yaml00:04
corvusclarkb: ^ there you go :)00:04
corvustristanC: nicely done :)00:04
tristanCcorvus: but it's implemented in ansible too: https://github.com/TristanCacqueray/dhall-operator/blob/master/operator/roles/dhall/tasks/main.yml00:04
*** mattw4 has quit IRC00:04
tristanCbut that's low-level; for zuul i followed the spec defined in zuul00:05
mnasercorvus: woot, the deployment stuff i made works but i'm trying to find a way to wait for the deployment to fully roll out00:05
mnaseras the deployment is completing properly but nodepool doesn't go up because zookeeper isn't done rolling out yet (as it puts out 3 replicas)00:05
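A minimal sketch of one way to do the wait mnaser describes, assuming kubectl is on PATH and shelling out from Python; the resource names are hypothetical (the chart's zookeeper is typically a statefulset, which `kubectl rollout status` also supports):

```python
import subprocess
import sys

# Resources that must finish rolling out before nodepool can start.
# The names here are hypothetical examples.
RESOURCES = ["statefulset/zookeeper", "deployment/nodepool-launcher"]

def wait_for_rollout(resource: str, timeout: str = "300s") -> None:
    # `kubectl rollout status` blocks until the rollout completes and
    # exits non-zero if it does not finish within --timeout.
    subprocess.run(
        ["kubectl", "rollout", "status", resource, f"--timeout={timeout}"],
        check=True,
    )

if __name__ == "__main__":
    try:
        for resource in RESOURCES:
            wait_for_rollout(resource)
    except subprocess.CalledProcessError as exc:
        sys.exit(exc.returncode)
```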
tristanCcorvus: thanks, it's working quite well, and based on the same definition of services and configuration, it's super convenient to generate k8s objects as well as ansible playbooks00:05
clarkbtristanC: what is the advantage to compiling dhall to k8s objects yaml or ansible playbooks yaml over writing them directly? templating?00:06
corvusmnaser: yeah, i guess argo reports synced even before k8s is finished running the pods, and even running the pods isn't the same as 'service is up and running'00:06
mnasercorvus: yeah, sync'd means the k8s state == charts, "progressing" = it's doing things00:06
tristanCclarkb: mostly so that the user doesn't have to worry about dhall, the interface is the zuul crd00:07
mnaserand it transitions from progressing to healthy00:07
clarkbtristanC: so it goes from zuul crd yaml to dhall to ansible yaml?00:08
tristanCclarkb: i misread... well, there are many advantages to using dhall, but if i had to pick one, it's that it is programmable00:08
clarkbya ok I was worried that was it :) but maybe I'm just scarred from dealing with python packaging and its programmable config :)00:09
tristanCclarkb: the input is json, e.g.: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml#L6-L2800:09
tristanCclarkb: which is converted into this Input type: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L39-L5600:10
tristanCclarkb: and passed on to this function: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L9600:10
tristanCwhich can then be used to generate k8s object, ansible playbooks or podman compose script, see the 'cr-*' file here: https://github.com/TristanCacqueray/dhall-zuul/tree/master/deployments00:11
clarkbya so dhall is like your IR for compiling between config formats00:12
tristanCclarkb: it's not really comparable to python setup.py since python can do side effects, dhall evaluation is strictly pure00:13
tristanCclarkb: yes, something like that, at least that's what i tried for zuul with this project: https://github.com/TristanCacqueray/dhall-operator00:13
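A small illustration of the "IR between config formats" idea, assuming the dhall-to-yaml and dhall-to-json CLIs from the dhall-json package are installed; the Dhall expression is a made-up example:

```python
import subprocess

# A made-up Dhall expression: one typed record rendered to two formats.
DHALL_EXPR = '{ name = "zuul-scheduler", replicas = 1, debug = False }'

def render(tool: str, expr: str) -> str:
    # dhall-to-yaml / dhall-to-json read a Dhall expression on stdin
    # and print the evaluated configuration on stdout.
    result = subprocess.run(
        [tool], input=expr, capture_output=True, text=True, check=True
    )
    return result.stdout

if __name__ == "__main__":
    # The same source expression becomes YAML (e.g. for k8s objects)
    # or JSON, with evaluation guaranteed to terminate and stay pure.
    print(render("dhall-to-yaml", DHALL_EXPR))
    print(render("dhall-to-json", DHALL_EXPR))
```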
tristanChopefully nodepool and the remaining bits like mqtt should be implemented by tomorrow, then i'll push a usable operator image00:16
tristanCalso, i wrote an optional git-daemon service to serve a ready-to-use configuration; with a fast periodic pipeline it's quite handy to validate the whole setup, it looks like this: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L23-L8200:19
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176400:20
mnaseranyone has any creative nice ideas for the name of a zuul job which would run a helm chart against a k8s cluster?00:21
clarkbmnaser: "apply-helm-chart" ?00:22
mnaserthat seems reasonable00:22
mnaseronce i get this test working, i'll refactor out to zuul/zuul-jobs cause i'd like to make use of this too00:22
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176400:27
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role  https://review.opendev.org/70186700:34
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176400:36
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role  https://review.opendev.org/70186700:38
openstackgerritMohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs  https://review.opendev.org/70186800:39
openstackgerritMohammed Naser proposed zuul/nodepool master: Switch to collect-container-logs  https://review.opendev.org/70186900:42
openstackgerritMohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs  https://review.opendev.org/70186800:42
clarkbtristanC: does https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L143-L153 and https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L192-L194 mean there is a zuul main.yaml generated for non-scheduler services?00:49
clarkbalso I grok this a bit more now that I've realized it's haskell with "io" built in00:50
clarkbit's like bash -x for haskell, kind of00:50
clarkbbut the trace is also the output if that makes sense00:50
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role  https://review.opendev.org/70186700:52
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176400:52
mnaserso close.  i think zookeeper needs more than 300s to start up in this case00:53
tristanCclarkb: yes, that's a demo "application", the zuul operator is using another one which takes the main.yaml from a user-provided secret: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml#L13  that requires users to apply the secret before the Zuul cr, such as: https://github.com/TristanCacqueray/dhall-zuul/blob/master/deployments/cr-input-k8s.yaml#L3000:54
tristanCclarkb: which is added to scheduler service here: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L290-L295 and https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L23000:55
tristanCclarkb: this could be made optional in the future, for example when the user provides a list of projects https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-connection.yaml#L11 (which would be expanded like so: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Connection.dhall#L84-L93 )00:57
tristanCclarkb: another neat example is how the zk service is only enabled when the user doesn't specify a zk config: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L150-L16501:00
mnaserwelp, everything is working nicely but01:05
mnasercoredns is broken in our install-kubernetes job :)01:05
mnaserhttps://6efce3be0458ca4ff401-19d327c34d133e2dbd296ad72151667c.ssl.cf5.rackcdn.com/701764/11/check/zuul-helm-functional/bf5d120/docker/k8s_coredns_coredns-6955765f44-csbfp_kube-system_7ae315e6-3ae6-4f07-a30c-80f511b6819f_5.txt01:05
tristanCclarkb: it's similar to a haskell with "io" built in, but the type system is different, it's based on CCω: https://hal.inria.fr/hal-0144583501:05
clarkbtristanC: ya structurally it is similar01:06
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template'  https://review.opendev.org/70187101:06
tristanCclarkb: well it's also very different since you can't do recursion, expressions are guaranteed to evaluate (or fail, but not hang or do side effects)01:07
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:07
clarkbmnaser: hrm I thought we set the coredns config to forward to whatever is set as the host's forwarders01:07
clarkbmnaser: and 1.1.1.1 and 8.8.8.8 should not be forwarding back to coredns01:07
mnaserclarkb: so the issue here i think is that you set it as the host's forwarders, which is 127.0.0.101:08
mnaserand when coredns starts up, it uses the hosts' /etc/resolv.conf (or whatever you pointed it to)01:08
clarkbmnaser: no we changed that01:08
clarkbor at least I thought we did /me looks to find it01:08
mnaserhttps://6efce3be0458ca4ff401-19d327c34d133e2dbd296ad72151667c.ssl.cf5.rackcdn.com/701764/11/check/zuul-helm-functional/bf5d120/docker/k8s_coredns_coredns-6955765f44-csbfp_kube-system_7ae315e6-3ae6-4f07-a30c-80f511b6819f_5.txt01:08
mnaserthat tells me that we're hitting 127.0.0.1 (hence how the loop gets caught)01:09
clarkbwell it tells you there is a forwarding loop it doesn't say anything about who is forwarding01:09
mnaseroh look01:09
mnaseryes it seems to be fixed hmm01:09
mnaserhah01:10
mnaserjokes on you!01:10
mnaser2020-01-10 00:57:18.993217 | ubuntu-bionic | * Preparing Kubernetes v1.17.0 on Docker '19.03.5' ...01:10
mnaser2020-01-10 00:57:21.852544 | ubuntu-bionic |   - kubelet.resolv-conf=/run/systemd/resolve/resolv.conf01:10
clarkbhttps://opendev.org/zuul/zuul-jobs/src/branch/master/roles/install-kubernetes/tasks/minikube.yaml#L43-L5301:10
clarkbyou need to set minikube_dns_resolvers01:10
mnaserah01:11
clarkbwhich we did in the tests but I guess we never plumbed that into the real jobs01:11
*** zbr|rover has quit IRC01:11
mnaseryes indeed that looks like the case01:11
clarkbI guess my changes allowed you to run it properly but you still have to set it wherever you run install-kubernetes :)01:11
mnaserit feels like sane user behaviour to maybe have a default of 1.1.1.101:12
mnaserclarkb: do you remember if that was discussed or not before i push a change for that?01:12
mnaseror is this too much of an opendev thing because not everyone runs unbound inside their own nodepool vms i guess01:12
clarkbI don't know that we discussed it, but considering zuul-jobs roles are meant to be pretty generic I'm not sure we should assume that?01:12
fungiit does mean you're unconditionally opting your users into google tracking their zuul environment's dns queries01:13
clarkbfungi: cloudflare, not google but ya01:13
fungioh, cloudflare is 1.1.1.1, google is 8.8.8.8 but right, neither is really above reproach, they both have tracking everyone as their business model01:14
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:14
mnaserok, so i think this should give us a functional nodepool deployed inside k8s01:14
clarkbfungi: wait 'til you find out that every time you build go code you hit google servers and tell them what dependencies you are using01:15
clarkb(unless you set some random env vars that no one seems to remember what they are)01:15
clarkbthis makes your builds faster because they can cache things (and github is slow I guess)01:15
fungithe best way to find out is to go to china01:17
fungiand then discover that the libs go retrieves are slightly altered because they hit a different cache01:17
fungiwhich served you "suspect" substitutes01:18
clarkbin hmac verification libraries01:18
fungiright, nothing security-critical01:18
fungiyou know, just the bits that actually verify nothing else has been changed01:18
*** threestrands has joined #zuul01:19
*** zbr has joined #zuul01:19
fungifor those who weren't with us in shanghai to see it yourselves, i swear this is actually a true story01:21
mnaserhmm01:23
mnaserdo we have some sort of scenario for things inside zuul/zuul-jobs to allow the consumer of the job to .. do something after we do $things01:23
mnaserin my case, i'm trying to make a general "apply a helm chart to kubernetes" job, but i'd like to allow the user to run things after we've deployed the chart01:24
*** zbr has quit IRC01:24
mnaserputting the code that deploys the chart in pre.yaml is a bit iffy because it can actually fail.. having the consumer of the role use post.yaml also feels meh01:24
clarkbmnaser: if you make that step a pre-run step then the job consumer can supply a run playbook that executes after01:24
mnaserunless thats ok01:24
corvusmnaser: that's part of why we make roles for everything.01:24
clarkbalso that01:25
mnaserok, fair, so in that case your run.yaml would have the 'helm-template' role and then whatever you need/want after01:25
corvusyep, should be a tiny playbook; and if they're running something after anyway, it's just 1 more line01:25
*** zbr has joined #zuul01:25
clarkbre failures in pre, that was something I was looking at pre-holidays and it is a non-zero-occurrence problem01:26
clarkbyou are right to be cautious there01:26
mnaserw000t01:28
mnasercorvus: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_548/701764/13/check/zuul-helm-functional/5484584/docker/k8s_launcher_zuul-nodepool-launcher-6b9454f5c8-shjrl_default_93aa561c-3726-4945-acbc-c37a0b7c6d15_0.txt check out the end of that file01:28
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts  https://review.opendev.org/70187401:29
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:31
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template'  https://review.opendev.org/70187101:40
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts  https://review.opendev.org/70187401:40
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:40
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:47
*** threestrands has quit IRC01:50
mnaserok cool, so we have apply-helm-to-k8s roles and jobs, and that's being used to test the zuul-system chart which currently deploys nodepool with zookeeper successfully. i don't really have something that actually prods nodepool to test if it's really running but i'll leave that for those more familiar with the integration testing of nodepool01:59
* mnaser will probably be sucked into other things in the next few days so anyone can feel free to take the progress and drive it01:59
clarkbthe other integration jobs wait for the min-ready node to come active02:12
clarkbjust polling nodepool list with a timeout02:12
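A rough sketch of that polling loop, assuming the real `nodepool list` CLI is available on the node; the state check is simplified guesswork about the table output:

```python
import subprocess
import time

def wait_for_ready_node(timeout: int = 600, interval: int = 10) -> bool:
    """Poll `nodepool list` until a node reaches the 'ready' state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["nodepool", "list"], capture_output=True, text=True
        )
        # Crude check: look for a 'ready' state column in the table output.
        if result.returncode == 0 and "ready" in result.stdout:
            return True
        time.sleep(interval)
    return False

if __name__ == "__main__":
    raise SystemExit(0 if wait_for_ready_node() else 1)
```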
*** swest has quit IRC02:15
*** zbr_ has joined #zuul02:15
*** zbr has quit IRC02:16
*** zbr_ has quit IRC02:17
*** zbr has joined #zuul02:28
*** saneax has quit IRC02:29
*** swest has joined #zuul02:31
*** zxiiro has quit IRC02:35
*** rlandy has quit IRC02:55
mnaserclarkb: that would imply that I would need an openstack deployment alongside? Maybe it could be a multi-node job where we deploy openstack in one and zuul with kubernetes on the other?02:55
*** bhavikdbavishi has joined #zuul02:57
*** bhavikdbavishi1 has joined #zuul03:00
*** bhavikdbavishi has quit IRC03:01
*** bhavikdbavishi1 is now known as bhavikdbavishi03:01
clarkbmnaser: no you can set up a k8s provider03:21
clarkband wait for that node to show up03:21
mnaserya I guess but that means I have to go through a non-trivial amount of work in order to set up the service accounts and role bindings and add them to the role too04:27
*** saneax has joined #zuul04:34
*** evrardjp has quit IRC05:33
*** evrardjp has joined #zuul05:34
*** swest has quit IRC05:35
*** swest has joined #zuul06:22
*** themroc has joined #zuul06:39
*** pcaruana has joined #zuul07:53
*** avass has quit IRC07:56
*** avass has joined #zuul07:56
*** tosky has joined #zuul08:12
*** fdegir has quit IRC08:22
*** fdegir has joined #zuul08:23
*** pcaruana has quit IRC08:46
*** pcaruana has joined #zuul08:53
*** jpena|off is now known as jpena08:54
*** bhavikdbavishi has quit IRC09:16
*** zbr is now known as zbr|rover09:19
*** avass has quit IRC09:51
*** sanjayu_ has joined #zuul10:26
*** saneax has quit IRC10:26
*** sanjayu__ has joined #zuul10:27
*** sanjayu_ has quit IRC10:30
openstackgerritMatthieu Huin proposed zuul/zuul master: [WIP] Docker compose example: add keycloak authentication  https://review.opendev.org/66481310:41
*** mhu has joined #zuul10:58
*** avass has joined #zuul11:12
*** sshnaidm is now known as sshnaidm|off11:52
zbr|roverdid anyone attempt to make the zuul badge dynamic like other CI systems? re https://zuul-ci.org/docs/zuul/user/badges.html11:52
*** jpena is now known as jpena|lunch12:01
*** pcaruana has quit IRC12:34
*** pcaruana has joined #zuul12:38
*** bhavikdbavishi has joined #zuul12:44
*** bhavikdbavishi1 has joined #zuul12:49
*** bhavikdbavishi has quit IRC12:51
*** bhavikdbavishi1 is now known as bhavikdbavishi12:51
*** sanjayu__ has quit IRC12:56
*** jpena|lunch is now known as jpena12:57
*** rlandy has joined #zuul13:22
tobiashzbr|rover: you mean like pass/failed?13:33
tobiashzbr|rover: we chose zuul/gated as static because the idea behind gating is that master is never broken13:34
zbr|rovertobiash: yeah, that is a very good sales pitch ;) -- we all know that in reality builds are not green in perpetuity.13:36
zbr|roverfor the moment i am using the static one, but it would be nice/useful to also be able to expose the result of the last periodic run. a bit more tricky with zuul.13:37
tobiashExposing the last periodic needs a service that queries the zuul api and generates the appropriate svg13:39
*** Goneri has joined #zuul13:39
tobiashI guess this could be done with ~50 lines of python code13:40
tobiashit could be even useful to have an api endpoint in zuul that returns the latest result of a specific job as json and produces an svg when requesting application/svg13:44
tobiashor alter the already existing builds endpoint such that it generates the svg of the first hit13:45
tobiashthat would work even without adding an additional endpoint13:45
mhutobiash, that could be shown in the builds page too13:45
zbr|rovertobiash: it may be a bit more tricky to implement, because the idea of the badge is to report status per project, not per individual job.13:45
tobiashzbr|rover: in that case: buildsets endpoint13:46
tobiashthere you can filter for project, branch and pipeline for the whole buildset13:47
zbr|roveryeah. nice to have it but has lower prio than other work with more impact.13:47
tobiashmaybe something when I get bored on a rainy weekend ;)13:48
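A rough sketch of the ~50-line badge service tobiash is imagining, assuming the buildsets endpoint accepts project/pipeline/limit query parameters (the parameter names and the result field are assumptions based on this discussion):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="20">
  <rect width="120" height="20" fill="{color}"/>
  <text x="60" y="14" fill="#fff" text-anchor="middle"
        font-family="sans-serif" font-size="11">zuul: {result}</text>
</svg>"""

def badge(zuul_url: str, tenant: str, project: str, pipeline: str) -> str:
    # Ask the API for the most recent buildset of this project/pipeline.
    query = urlencode({"project": project, "pipeline": pipeline, "limit": 1})
    url = f"{zuul_url}/api/tenant/{tenant}/buildsets?{query}"
    with urlopen(url) as resp:
        buildsets = json.load(resp)
    result = buildsets[0]["result"] if buildsets else "UNKNOWN"
    color = "#4c1" if result == "SUCCESS" else "#e05d44"
    return SVG.format(color=color, result=result.lower())

if __name__ == "__main__":
    print(badge("https://zuul.opendev.org", "zuul", "zuul/zuul", "gate"))
```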
*** sanjayu__ has joined #zuul14:01
openstackgerritSimon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple  https://review.opendev.org/70196914:05
mnaseri did some refactoring for the collection of container logs into zuul/zuul-jobs and updated the related projects, appreciate some reviews - https://review.opendev.org/#/q/topic:collect-container-logs14:08
openstackgerritSimon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple  https://review.opendev.org/70196914:10
openstackgerritMatthieu Huin proposed zuul/zuul master: JWT drivers: Deprecate RS256withJWKS, introduce OpenIDConnect  https://review.opendev.org/70197214:20
AJaegermnaser: will review - could I trade you https://review.opendev.org/700913, please?14:37
mnaserAJaeger: done :)14:38
mhuI'm looking at the Capabilities class in zuul.model, it says "Some plugins add elements to the external API. In order to facilitate consumers knowing if functionality is available or not, keep track of distinct capability flags."14:42
mhuIs there a mechanism to register such capabilities ? Or should I just add them to the class directly?14:43
AJaegerthanks, mnaser14:44
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Extract project config YAML into ref docs  https://review.opendev.org/70197714:46
mhuIs it used at all? "job_history" seems unsettable14:47
tobiashmhu: that's plumbed through to the api/info endpoint14:54
tobiashmhu: atm, the only use case is to inform the web ui whether build history exists or not (so if there is an sql connection)14:55
tobiashbased on that the web ui hides the builds tab14:55
tobiashmhu: there is no register mechanism for that so add it directly there14:57
tobiashit's initialized from here: https://opendev.org/zuul/zuul/src/branch/master/zuul/cmd/web.py#L5014:57
Shrewshrm, our Cross Project Gating doc is confusing to me. It states it shows "how to test future state of cross projects dependencies" but there is no "how" in there14:59
Shrewsi don't get the point15:00
tristanCShrews: yeah, my bad, i meant to add more content15:03
tobiashmhu: actually, job_history there is never set, so strike that...15:03
tobiashmhu: so you probably want to modify the fromConfig of WebInfo if you want to add some new capability to the info endpoint15:03
mhutobiash, yeah that's what puzzled me :) if you look at http://zuul.opendev.org/api/tenant/opendev/info for example, job_history = false yet you can hit the builds endpoint15:04
mhutobiash, right that's the simplest way, I was just wondering if I got the spirit of this right15:04
tobiashmhu: I guess that's broken since probably no one runs a zuul without a db so nobody complained ;)15:04
mhumordred, git log snitched on you, you added the webinfo capabilities in commit 518dcf8bdba9c5c22711297395c4a9cb4e0c644d, any insights on this ? :)15:06
ShrewstristanC: ah, i see15:06
corvusmnaser, clarkb: an integration test job could wait for a static node15:14
fungimhu: i hard-code `git blame` to just return "mordred" for everything ;)15:16
tristanCtobiash: i can't find the openshift scc for zuul you posted sometime ago... would you mind pasting them again?15:32
tobiashtristanC: I use the default privileged scc15:33
*** themroc has quit IRC15:35
*** themroc has joined #zuul15:35
mnasercorvus, clarkb: oh ya, i think that might work.  does nodepool need to scan the ssh-keys (aka i can just point at the node ip and that's probably enough?)15:38
openstackgerritMerged zuul/zuul-jobs master: Make pre-molecule tox playbook platform agnostic  https://review.opendev.org/70045215:38
*** themroc has quit IRC15:40
*** themroc has joined #zuul15:41
tobiashtristanC: you just need to make sure you run the executor with a service account that is allowed to use the privileged scc15:41
tristanCtobiash: alright, i remembered a more complex setting. i'll try that, thanks!15:44
*** jpena is now known as jpena|brb15:45
tristanCbtw I got an operator that now implements the full Zuul CR, here is an example usage: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml15:46
*** themroc has quit IRC15:46
*** themroc has joined #zuul15:47
corvustristanC: that looks nice!15:51
*** themroc has quit IRC15:51
*** themroc has joined #zuul15:51
corvustristanC: do you want to propose that up to the zuul-operator repo?15:52
*** rfolco has quit IRC15:52
tristanCcorvus: yes sure, if you don't mind the dhall based implementation, i'd be happy to push that to zuul/zuul-operator15:53
corvustristanC: i think it's worth looking at.  my biggest concern is that users don't see it.  but if it's just implementation detail, that's different.15:55
mordredmhu: yeah - I added that a while back with the intent of it being used like you're suggesting ... but then we never plumbed it all the way through - and then we started thinking that DB was really something we wanted to make a hard requirement in the future, so the motivation to finish the info/capabilities work kind of went away15:55
mhumordred, okay that clears things up15:56
clarkbcorvus: tristanC: a bit more annotation of the constructs would probably be good to explain what is going on since dhall is new. But I don't think end users will ever interact with it15:56
mordredmhu: is there a new plugin capability you were thinking of adding that might be useful to register there?15:56
mhumordred: yeah, the authentication config15:57
corvusclarkb, tristanC: yeah -- i'm still not quite sure what dhall is doing here15:57
mhuas suggested by tobiash in https://review.opendev.org/#/c/643536/15:57
mordredmhu: ah - yeah, that makes total sense15:57
clarkbcorvus: after looking yesterday my understanding is it compiles  k8s CR yaml inputs to zuul config and ansible playbook outputs15:58
tristanCcorvus: clarkb: that's good to hear, i should be able to propose a review today or next week with what i currently have15:58
*** jangutter has quit IRC15:59
fungitristanC: skimming back through the dhall, how do you end up writing it? does your keyboard have keys for λ and → glyphs or are you using compose sequences? does dhall support a pure-ascii syntax variant maybe?16:02
corvustristanC, pabelanger, mnaser, mordred, clarkb: maybe we should schedule a meeting to talk about how to proceed?  since the spec says ansible, and mnaser proposed golang, and now tristan is looking at dhall, it seems like maybe we should try to get back on the same page.16:03
clarkbfungi: tristanC uses the ascii dhall version16:03
clarkbcorvus: tristanC's operator is still ansible16:03
tristanCfungi: it's currently debated here: https://discourse.dhall-lang.org/t/change-dhall-format-to-use-ascii-by-default16:03
clarkbcorvus: dhall takes the k8s CR input and rewrites it to ansible.16:03
tristanCcorvus: clarkb: yes i'm using the ansible operator-sdk16:04
fungiclarkb: he's not using pure ascii in https://github.com/TristanCacqueray/dhall-zuul/blob/master/render.dhall for example16:04
corvusclarkb: true, but i guess the spirit of the question is "as we maintain the operator, what are we going to expect to need to modify?"16:04
clarkbfungi: ah, most of the files I read use \ instead of lambda and -> instead of arrow16:04
fungigot it, so there at least is an option of doing pure ascii anyway16:04
clarkbcorvus: yup, and dhall would definitely be involved in that. FWIW I think with a bit of annotation calling out what the expansions are I think it would end up being quite readable16:05
tristanCfungi: long story short, i wish i had a lambda key on my keyboard, but i type in ascii, and the formatter takes care of the rendering. I changed the default to use '--ascii' recently as upstream is suggesting that, but the recent poll shows that unicode is still popular https://docs.google.com/forms/d/e/1FAIpQLSc_4Se7V6jRk4SBfAx1UdZ67Cf_5Hg0uRas5PxOMeRes4nQGg/viewanalytics16:05
clarkbit might also help to reduce some of the nesting16:06
corvusshould we maybe plan an irc and/or teleconference meeting next week to talk about it?16:06
fungiinteresting. people like to use a language they can't type without stuffing it through a preprocessor16:06
clarkbso that each logical expansion is a top level function16:06
clarkbrather than a bunch of closures16:06
mnaseri'm free next week but after that my schedule is a little funky as i'll be in EU for a bit16:06
clarkb(at least I think I would find it more readable that way)16:06
mnaseri can be available EU evenings16:07
mordredfungi: I imagine many of them are using os's/editors that have auto-convert features - but I'm just guessing16:07
mnaseri guess im the only one who learned about dhall today16:07
mordredcorvus: I think a meeting would be great. I'm on a long flight on tuesday - but could do an IRC meeting16:07
mordredmnaser: I know next to nothing about it other than tristanC's explorations16:07
tristanCcorvus: i should have time to clean up and add comments, hoping that it will make my implementation clearer16:07
mordredcorvus: other than that - I can do voice again after tuesday16:08
mnasermordred: oh yay, well that's a little bit more comforting :p16:08
clarkbI'm around next week. If we get hit by a snow storm I might disappear to go sledding with the kids or if power goes out but otherwise around :)16:08
fungii find it interesting on that dhall discourse thread that there are folks lobbying for completely removing the ascii option and only supporting extended unicode syntax16:08
mordredcorvus: maybe once tristanC has the annotated version up? because maybe with those annotations the code might make more immediate sense to our eyes and give us a stronger context to discuss the choice?16:09
corvusi'm thinking voice might be good for this one... how about wednesday 1800 utc?16:09
corvustristanC: would that be enough time?16:09
fungii'm free all wednesday16:09
mordredI'll be in singapore so that'll be 2am for me - BUT - I'll be jetlagged, so it's likely fine :)16:10
tristanCcorvus: wednesday 1800 works for me, i guess i'll start commenting the code now =)16:10
clarkbtristanC: and maybe reduce nesting so that it looks more like imperative programming with function calls?16:11
fungipoll results are intriguing as well... >90% write dhall in ascii, but >50% prefer that the preprocessor reformats it to extended unicode symbols16:11
mordredfungi: that's fascinating16:12
*** themroc has quit IRC16:12
corvusmordred: would 15:30 be any better?16:12
fungistill, with some 35% explicitly formatting it to ascii in the preprocessor and 41% saying they prefer to read it in ascii, i doubt the folks pushing to remove the ascii syntax entirely will have sufficient support16:13
mordredcorvus: probably? but I'm happy to do either regardless if 1800 works better for others16:13
corvusmordred, clarkb, tristanC, pabelanger, mnaser, fungi: https://ethercalc.openstack.org/ur9x4q4z1z7116:15
clarkbfungi: your questions about character sets as they intersect with programming remind me of apl https://en.wikipedia.org/wiki/APL_(programming_language)#/media/File:APL-keybd2.svg16:22
*** jpena|brb is now known as jpena16:22
fungiclarkb: reminiscent of https://en.wikipedia.org/wiki/Space-cadet_keyboard#/media/File:Space-cadet.jpg16:23
*** mattw4 has joined #zuul16:23
fungi(which actually existed)16:23
corvusokay, let's go with 15:30, i'll send an email16:24
fungithanks!16:26
*** rlandy is now known as rlandy|brb16:29
fungiclarkb: on closer inspection, the space cadet keyboard has all the extended dhall glyphs16:31
openstackgerritMatthieu Huin proposed zuul/zuul master: web capabilities: remove unused job_history attribute  https://review.opendev.org/70200116:41
mhumordred, tobiash ^ if it's fine with you16:41
mhuI'll look at what was done with the webhandlers and see if I can reimplement it somehow for the auth capabilities16:42
*** mhu has quit IRC17:02
*** mhu has joined #zuul17:03
*** rlandy|brb is now known as rlandy17:04
*** bhavikdbavishi has quit IRC17:04
AJaegeris it safe to go from go 1.13 to 1.13.5 by default in Zuul jobs? See https://review.opendev.org/#/c/700467/17:15
tobiashmhu: actually it's not required, but it is planned to make sql required17:16
clarkbAJaeger: https://golang.org/doc/devel/release.html#go1.13.minor looks safe though there is a 1.13.6 now17:16
clarkbAJaeger: in general I think upgrading minor releases is ok17:16
clarkbit's the 1.13 -> 1.14 upgrade that will be potentially problematic17:17
AJaegerthanks, clarkb17:18
* AJaeger will +2A 700467 then17:19
*** zxiiro has joined #zuul17:24
*** sanjayu__ has quit IRC17:30
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template'  https://review.opendev.org/70187117:31
*** saneax has joined #zuul17:31
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts  https://review.opendev.org/70187417:31
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts  https://review.opendev.org/70187417:31
*** evrardjp has quit IRC17:33
*** evrardjp has joined #zuul17:34
pabelangercorvus: wfm, and yah, would be nice if we could all align17:36
openstackgerritMerged zuul/zuul-jobs master: install-go: bump version to 1.13.5  https://review.opendev.org/70046717:42
pabelangerHmm, we are seeing a traceback in github driver when trying to merge PR for a repo: http://paste.openstack.org/show/788250/18:01
pabelangerhowever, I do not know why that is18:01
pabelangertobiash: seen ^ before?18:02
clarkbpabelanger: https://github.community/t5/GitHub-API-Development-and/Resource-not-accessible-by-integration-when-requesting-GitHub/td-p/13829 reading that implies to me that the functionality you are trying to get isn't available to github applications18:02
clarkbhave to make a user request instead?18:03
pabelangerI wonder if something changed with merge_method18:04
tobiashpabelanger: hrm, I never saw this before? What was the request?18:04
pabelangertobiash: gate passed, zuul trying to merge18:04
pabelangerhttps://github.com/ansible-network/collection_migration/pull/11 is PR18:05
tobiashpabelanger: in zuul nothing changed recently around that afaik18:05
tobiashpabelanger: does this affect the whole repo?18:05
tobiashThe branch protection settings would be useful18:06
pabelangertobiash: yah, so this has 2 different branch protection rules, 1 for master and the other for feature/*; both are the same, but i did two because i'm not smart enough to figure out the regex18:07
pabelangersettings are as follows18:07
pabelangerRequire status checks to pass before merging, check / gate18:07
pabelangerinclude admins18:07
pabelangerand restrict users to ansible-zuul (github app)18:08
pabelangerthis should be same as other working repos18:08
pabelangerAllow merge commits is selected18:08
*** pcaruana has quit IRC18:09
tobiashpabelanger: the restrict-users-to-app setting is quite new, might be buggy?18:09
tobiashCan you get the post request from the logs?18:10
pabelangertobiash: I can clear, but so far haven't seen a failure before18:10
pabelangeryah, 1 sec18:10
tobiashAnd further, check that zuul isn't configured to use squash merge18:11
pabelangerhttps://pastebin.com/GerARqfK18:12
pabelangerlet me check that too18:12
pabelangertobiash: we don't set it, so it is the default18:13
pabelangermerge-commit, IIRC on github18:13
tobiashhrm no idea what could cause this18:16
*** rfolco has joined #zuul18:16
tobiashIs this affecting just this pr, the whole repo or the whole app?18:16
clarkbis it possible that restrict users is placing a restriction on user-to-server api calls18:16
clarkbwhich would break the application per the link I pasted18:16
pabelangertobiash: not sure, this is first PR18:17
pabelangerI can try other branch18:17
pabelangerclarkb: maybe, but can remove and see18:17
clarkbI would try removing the restrict users and see if it works18:17
clarkb++18:17
tobiashyes, that's what I'd do as well18:17
tobiashpabelanger: first pr in that repo or of that gh app?18:18
tobiashin case of the latter also check the access rights of the app18:18
pabelangertobiash: well, not the first PR, but the first since we upgraded to 3.14.0; this worked in dec 2019 when using 3.11.118:19
pabelangeryah, github app is on repo18:20
pabelangerhas correct permissions18:20
tobiashReally weird18:20
pabelangerno change after removing only ansible-zuul18:23
clarkbpabelanger: fwiw the only github driver changes in zuul recently that I know of are the ones you and I made to make searching for dependencies more efficient18:25
clarkbnothing related to merging changes18:25
clarkbpabelanger: possible that github3 updated in your upgrade?18:25
*** jpena is now known as jpena|off18:25
pabelangerclarkb: yah, that's the change I was looking at18:26
pabelangergithub3.py==1.3.018:26
pabelangerthat is same version as before18:26
pabelangerwell, damn18:28
pabelangerI think this is the issue18:28
pabelangerhttps://github.community/t5/GitHub-Actions/GitHub-action-Resource-not-accessible-by-integration/td-p/1970718:28
pabelangerOAuth tokens and GitHub Apps are restricted from editing the main.workflow file. This means it isn't possible to merge a Pull Request via an Action if it updates the main.workflow file I'm afraid.18:29
pabelangerlet me rebase away that change, and see what happens18:29
clarkbso it is change specific if you edit the workflow file?18:29
pabelangerclarkb: so, yah this is a little messed up. the PR is a rebase of an upstream project18:30
pabelangerso a series of commits to bring the master branch up to date18:30
pabelangerupstream uses github actions18:30
tobiashpabelanger: ok, so that suggests that this problem won't affect other prs that don't modify those files18:32
pabelangerwelp, that is a terrible design18:33
pabelangerthat was the issue18:33
pabelangerhttps://github.com/ansible-network/collection_migration/pull/1118:33
pabelangerthat means18:33
pabelangerzuul cannot gate github repos with .github/workflows, regardless of whether the feature is enabled or not18:34
tobiashyay github :/18:34
pabelangeroh18:35
pabelangerthis repo has it enabled18:35
pabelangerso, let me disable it and add back18:35
pabelangeryup, fails even with disabled18:37
pabelangerthat is terrible18:37
pabelangerso, I guess I get to write a new job that checks for a .github/workflows folder and fails if found18:39
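A trivial sketch of such a check job's payload, assuming it runs from the root of the checked-out repo:

```python
import pathlib
import sys

# Fail if the repo carries GitHub Actions workflow definitions, since a
# GitHub App (like Zuul's) cannot merge PRs that touch those files.
workflows = pathlib.Path(".github/workflows")

if workflows.is_dir() and any(workflows.iterdir()):
    print(f"ERROR: {workflows} exists; zuul will not be able to merge PRs")
    sys.exit(1)
```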
fungii'm not entirely sure how to parse github.community's date "format" but the response from github seems to have either been 10 or 8 months ago that they were "passing the feedback to the team to consider"18:40
fungiif they haven't solved this in most of a year, then i wouldn't hold my breath18:40
pabelangeryah, I would love to see where this is documented18:42
pabelangerfungi: clarkb: tobiash: i guess force push is the way to solve this18:48
clarkbpabelanger: or escalate your privs and click the merge button18:48
clarkbwhich reduces the chances that you'll merge the wrong thing18:49
pabelangeryah, doesn't scale too well18:49
clarkbwell, it should be infrequent right?18:49
clarkbonly when you change those files18:49
pabelangerthinking long term, some ansible projects want github actions18:49
clarkboh sure long term we should figure out a way to address it and having zuul push merges back would probably do it18:50
clarkbI thought you meant for this change18:50
pabelangeryah, in this case, best to delete the folder and deal with the rebase cost18:50
pabelangerbut yah, hear you18:50
pabelangerjust terrible that it is blocked regardless of whether actions is enabled or disabled18:51
tobiashclarkb: push by zuul will be blocked as well according to that link18:52
clarkboh ha ok18:53
tobiashthe only way I see to solve this in a non hacky way is talking to github18:53
pabelangertobiash: not sure, internal ansible folks say it works with git push --force18:56
pabelangergrumble18:56
pabelangerthanks github18:56
clarkbtobiash: pabelanger that link says they used an action token and it was from the action18:56
clarkbcould be the git push fails if done in that way but with another user token it would be ok?18:56
*** pcaruana has joined #zuul18:57
pabelangerhttps://github.com/ansible-community/collection_migration/pull/210/files18:57
tobiashclarkb: a different user probably works but this should be considered a hack because github apps is for handling access to the app...18:58
pabelangerI think the collection migration tool is using deployment keys18:58
clarkbtobiash: yes, but the failure was from within the action not an app aiui18:59
clarkbtobiash: but maybe the action and app creds are equivalent here18:59
tobiashI guess it's the same type of auth token there as we see zuul failing to merge as well19:02
pabelangerso, right now, I can't think of a good way to deal with this in zuul, without writing a new job for all projects and checking if the files exist.19:07
clarkba good start may simply to be to add a note about this in the github driver docs19:08
clarkbso that it is less of an investigation if people hit it in the future19:08
clarkbjlk: ^ may also be able to offer some insight19:08
pabelangeryah, would be interested in official docs (on github) so I can link to it, but can't seem to figure it out19:23
jlkoh that looks fine19:37
jlkIIRC Actions is still in beta, so there's definitely going to be docs gaps19:37
pabelangerjlk: what is best way to pass along feedback here? Aside from asking you :) Basically, if actions are disabled, we'd want github to allow merging of .github/workflows configs.19:39
jlkasking19:42
pabelangerthanks!19:43
jlkhttps://support.github.com/contact/feedback?contact[category]=GitHub+Actions19:44
jlkLooks like you still have to pick Actions. one sec19:44
jlk¯\_(ツ)_/¯ just pick it from the drop down19:45
pabelangerack, cool19:45
corvusjlk, pabelanger: secondarily, it would be nice if there were a way to merge changes if actions *are* enabled.  maybe an extra permission that can be granted to an app.  otherwise, it doesn't really leave much space for github actions to co-exist with other apps.19:48
jlkagreed19:48
pabelanger+119:49
corvuspabelanger: you want to pass on both of those pieces of feedback?19:49
pabelangerwill do19:49
corvuspabelanger: thanks!19:49
*** Goneri has quit IRC19:59
pabelangercorvus: jlk: feedback sent, not sure if there is a way for me to track status of it20:01
*** michael-beaver has joined #zuul20:57
*** rfolco has quit IRC21:06
*** zxiiro has quit IRC21:23
*** rlandy has quit IRC21:26
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Fix typo in helm role  https://review.opendev.org/70204621:27
*** Goneri has joined #zuul21:28
corvuspabelanger, yoctozepto: 678273 +3, thanks21:38
clarkbpabelanger: mordred: what was the process of doing volume detachments when the nova server no longer exists?21:46
clarkbiirc that was sorted out in here recently21:46
yoctozeptocorvus: thanks21:47
yoctozeptoany eta for this to be picked up by the opendev/openstack instance?21:47
fungiclarkb: based on scrollback it was this api method: https://docs.openstack.org/api-ref/block-storage/v3/#force-detach-a-volume21:48
mordredclarkb: one sec ...21:48
mordredfungi: do you have that scrollback quick enough to find the paste link I sent to pabelanger?21:48
fungithough a clean recipe on how we can run that from bridge.o.o to do batch cleanup would be nice21:48
fungioh, there was a paste too? checking21:48
mordredfungi: yeah - I'm also looking21:49
fungihttp://paste.openstack.org/show/788133/21:49
fungithe power of grep compels you21:49
funginow if only paste.o.o could respond in a timely fashion21:50
mordredthe uuid in that string comes from volume.attachments and is the id of a given attachment. the overall algorithm is "get volume, look at attachments, look to see if the server.id listed in the attachment exists, if not, do that DELETE call to delete the attachment, then you can delete the volume"21:51
fungiand i guess i can pass the usual OS_CLIENT_CONFIG_FILE envvar?21:53
mordredfungi: something like http://paste.openstack.org/show/788260/ (untested) - and yes, you should be able to do that21:54
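A minimal sketch of the algorithm mordred describes, assuming openstacksdk; the attachment-delete call mirrors the microversioned REST call fungi tries below, while the cloud name and the attachment dict keys are assumptions:

```python
import openstack

def clean_leaked_volumes(cloud_name: str, dry_run: bool = True) -> None:
    conn = openstack.connect(cloud=cloud_name)
    for volume in conn.block_storage.volumes(details=True):
        for attachment in volume.attachments:
            server_id = attachment.get("server_id")
            # If the server this volume claims to be attached to is gone,
            # the attachment has leaked and blocks deleting the volume.
            if server_id and conn.compute.find_server(server_id):
                continue
            attachment_id = attachment.get("attachment_id")
            print(f"would delete attachment {attachment_id} of {volume.id}")
            if not dry_run:
                # Needs microversion 3.31; after this the volume itself
                # can be deleted.
                conn.block_storage.delete(
                    f"/attachments/{attachment_id}", microversion="3.31"
                )

if __name__ == "__main__":
    clean_leaked_volumes("vexxhost-sjc1", dry_run=True)
```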
mordredfungi: where is this happening now?21:55
clarkbmordred: vexxhost sjc121:55
mordredif you like, I can do a quick try on that script without deleting things and make it something we can run21:55
mordredclarkb: the zuul account?21:55
mordredclarkb, fungi: give me a sec - let me make a proper script that we can run21:55
clarkbmordred: yes21:55
clarkbmordred: we've leaked a bunch of volumes which is preventing us from deleting images which is filling the builder disks21:56
openstackgerritMerged zuul/zuul-jobs master: collect-container-logs: add role  https://review.opendev.org/70186721:56
mordredclarkb: "awesome"21:56
corvusyoctozepto: maybe next week?21:57
fungimordred: yeah, i tried but am getting errors like this:21:57
fungiVolume 0f91579c-c627-452b-aad4-67cdeae865c3 could not be found.21:58
fungiwhen that volume shows up in the `openstack volume list` output21:58
mordredfungi: cool. let me take a swing at it21:58
fungi`openstack volume show 0f91579c-c627-452b-aad4-67cdeae865c3` provides an attachment_id of 42511fe3-d926-4b99-b275-38f0e8fa5a7621:59
fungiso i tried calling print(c.block_storage.delete('/attachments/42511fe3-d926-4b99-b275-38f0e8fa5a76', microversion='3.31'))21:59
fungileading me to suspect there's a deeper problem (though in all likelihood i'm just doing it wrong)22:00
mordredfungi: I feel like I just deleted that one22:08
mordred(the attachment)22:08
mordredcan you double check?22:08
fungilookin'22:10
fungiattachments []22:10
fungii believe you did22:10
fungiany clue what i was missing?22:10
fungimagic fingers?22:10
mordredmaybe magic fingers :)22:11
mordredI've got a script in /root called clean-volumes.py that should work - although I'm going to run it one more time in no-op mode22:11
mordredfungi: if you run it with OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml python3 clean-volumes.py22:12
mordredfungi: it should print a bunch of REST calls- then a long list of actions22:12
corvusmnaser: https://github.com/aslafy-z/helm-git looks interesting but iiuc, one would need to build a custom argo deployment image in order to use it22:13
mordredfungi: I think we mostly want to make sure that the list of actions it's going to take there doesn't include deleting the mirror volume22:13
fungisure, though ideally the mirror volume is in another castle22:13
mnasercorvus: !!!! i like that a lot22:14
corvusmnaser: i think https://argoproj.github.io/argo-cd/operator-manual/custom_tools/ is relevant to that22:14
mordredfungi: oh - right - the mirror volume is in openstackci-vexxhost right?22:14
mnasercorvus: yes, that would be exactly the way to go about it22:15
clarkbmordred: fungi yes it should be in the other tenant/project/whatever it's called22:15
mnasercorvus: but we can use an initcontainer and cheat...22:15
mordredclarkb: ok - so - we should be "safe" from deleting anything too important if there is a bug in this script, right?22:15
clarkbmordred: I think so. Another check is to look at the volume size22:15
mordredclarkb, fungi: maybe this is an #openstack-infra convo - whoops, sorry22:15
clarkb80GB should be for the test nodes22:15
mnasercorvus: thanks for reviewing my collect-container-logs role, i refactored uses i saw of it in the topic - https://review.opendev.org/#/q/topic:collect-container-logs22:20
*** armstrongs has joined #zuul22:21
corvusmnaser: oh cool, thx22:21
mnaserbtw, how do we feel about shipping 'tools' inside zuul? for example, in the current ci jobs, nodepool loops forever erroring because zookeeper is not up yet .. it would be nice if we had some small python tool in the image that waited for a zookeeper cluster to be ready, so we could use it in the initContainer22:22
mnaserso that if the zookeeper cluster is not up, nodepool simply won't start22:23
openstackgerritMerged zuul/zuul master: Make files matcher match changes with no files  https://review.opendev.org/67827322:23
mnaserit avoids spamming the logs with tons of errors as it fails to connect22:23
mnaseror perhaps nodepool can expose some sort of alive/readiness check so the containers don't become ready until nodepool is ready22:24
mnaserthat way during a new deployment or rollout, if something goes wrong, it will stop rolling out the other pods22:25
clarkbmnaser: fwiw you can probably get away with nc or telnet or something simple like that22:25
clarkbI'm not sure we need to ship a special tool for that22:25
mnaserclarkb: yeah but then i'd have to do some yaml parsing of the list of zookeeper servers provided22:25
mnaserand assume the config format doesn't change for example22:25
clarkbaren't you providing that list to the config anyway?22:26
clarkb(meaning you could provide it to another tool)22:26
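A sketch of what such a wait tool could look like, assuming ZooKeeper's 'ruok' four-letter command is enabled (newer ZooKeeper releases whitelist these commands); the host list is passed as arguments rather than parsed out of nodepool's config:

```python
import socket
import sys
import time

def zk_ok(host, port=2181):
    # ZooKeeper answers the four-letter command 'ruok' with 'imok'
    # once it is up and serving.
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(b"ruok")
            return sock.recv(4) == b"imok"
    except OSError:
        return False

def wait_for_cluster(hosts, timeout=300):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(zk_ok(host) for host in hosts):
            return True
        time.sleep(5)
    return False

if __name__ == "__main__":
    # Hypothetical initContainer usage:
    #   python3 wait-for-zk.py zk-0 zk-1 zk-2
    sys.exit(0 if wait_for_cluster(sys.argv[1:]) else 1)
```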
mnaserin nodepool's case, i'm actually yaml-ifying the config input straight into a file, without having some manual 'insert this here' thing22:26
clarkbremember when we had init scripts that solved this for us22:27
* clarkb grumps22:27
clarkbmnaser: the cluster (zuul, nodepool and zk) will eventually converge on a happy state once zk is up and running right? it's not like this is going into an error state?22:28
clarkb*permanent error state22:29
mnaserclarkb: it will, but technically during a rollout, if something is borked, it will just keep rolling out and breaking everything22:29
clarkbbut is it broken?22:29
clarkb(If it is I think that is a bug worth fixing)22:29
mnaserright but we can forget zookeeper in this case, say, if you made a typo in your config and you're using the zuul helm charts22:29
mnaserif the first pod goes up and there's no readiness check, it will start rolling out the second one22:30
mnaserso you might roll out a broken config.. but if we have a readiness check where nodepool can tell k8s "ok i'm good to go", it can safely continue with the rollout22:30
clarkbsure, checking for proper errors and not needing to wait for services to start in sequence are two different problems22:30
clarkbI think we should avoid needing strict sequencing22:31
mnaseryes, i agree, i think i started with one problem set and moved to another in the conversation22:31
*** armstrongs has quit IRC22:31
clarkbright what I was trying to get at earlier is if you've found a sequencing issue we should fix that at the source not work around it22:31
clarkbhowever, config validation is a good thing to verify and both zuul and nodepool have config validation for the yaml bits but not the ini bits iirc22:32
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm  https://review.opendev.org/70205222:32
mnaserya i think the idea is to find a way to get nodepool to say "ok i'm up and i'm healthy"22:33
mnaserand then k8s can poll that as a readiness/liveness check22:33
corvusmnaser: i am not opposed to adding http (prometheus) endpoints that expose readiness22:34
corvuswould that help?22:34
mnaseryeah, so generally we'd probably end up (ideally) with 3 endpoints -- readiness, liveness and metrics22:34
mnaserreadiness can equal liveness but metrics (prometheus) is different, but yeah, if we add an http endpoint, that's probably enough infrastructure to do it22:35
mnaserreason is most of the liveness and readiness checks are http request based rather than parsing metrics22:35
corvusokay, i thought readiness/liveness was part of prom; if it isn't let's set that aside for now, because prom for metrics is complicated22:35
corvusi'll rephrase that as "i think it would be fine to add http endpoints that expose liveness info"22:36
corvusand readiness :)22:36
mnasercool, i don't know if i have the time to dig into this but it would be neat if nodepool started reporting 200 OK on a port once it's connected to zookeeper and all threads are up and running22:37
corvusat the same time, i think nodepool can run a config check on its config as a gate job :)22:37
corvusbut, belts and suspenders.  we can have both.22:37
corvusmnaser: can you take a look at https://review.opendev.org/702052 ?22:38
corvusit's the most complex helm i've written to date :)22:38
mnasercorvus: i am, im trying to double check the ternary formatting22:38
mnaserso it looks like you can also do:  true | ternary "foo" "bar"22:39
mnasernot that i think its a big deal, but that is easier for my brain to process22:39
corvusoh neat, that's sort of ansibleish22:39
corvusyeah, it took me a minute to understand the docs and realize the condition was last22:39
mnaserhelm uses this library https://github.com/Masterminds/sprig afaik22:40
mnaserhttps://github.com/Masterminds/sprig/blob/48e6b77026913419ba1a4694dde186dc9c4ad74d/docs/defaults.md so that's there, but yeah, it seems right, though the more ansible-y one is easier to process22:40
clarkbmnaser: corvus do you think we need that for every microservice or just the "brain"22:40
corvusmnaser: i'll switch to "|"22:41
clarkbeg with zuul would we want each executor and merger etc to report it or have zuul web aggregate and report a single value22:41
mnaserclarkb: the approach is usually that each microservice runs its own health check, because a single zuul executor can actually be having problems while the rest are ok22:41
corvusclarkb: good q -- if we can have zuul-web do that, it would be best... mnaser would that be complicated to have the liveness check check a service?22:41
mnaserhttps://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request22:42
corvusmnaser: right, but could you say, hit "http://zuul/live?host=executor-4" ?22:42
mnaserso generally the http request hits a url/path on the same pod afaik22:42
clarkbmnaser: and I guess you can assign domains to that readiness? because a single executor of many being broken is why we have more than one (or one reason to)22:42
clarkbso zuul application being up is different than a specific executor being up22:42
mnaserright, so if it is stateless and it's broken, k8s can (up to you) decide to kill it and restart it22:42
mnasermany/most of the probes seem to want to hit the pod itself. also, the other reason you wouldn't want it to hit zuul-web: if you have issues with zuul-web, you don't want your executors to all show as not alive22:43
mnaserand then k8s goes ahead and kills them because they are failing health checks22:43
corvusa liveness command may be easier?22:44
corvushttps://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-command22:44
mnaseryes, i forgot about the commands22:44
clarkbthat all makes sense. however your earlier example cares about zuul/nodepool the application not its individual parts22:44
clarkbso I think we end up wanting both things22:44
mnaserthat might be a lot better, we can just write a file22:44
corvuswe've got an easy pattern for commands with 'zuul-scheduler foo' and 'zuul-executor foo'22:44
corvuswe could extend that to 'zuul-executor ready'22:45
corvusto return an exit code based on readiness22:45
corvusand apply that to every microservice, regardless of whether it otherwise runs an http server22:45
mnaseri think the only tricky bit is that's going to be out of process22:45
mnaserso unless you're planning to have some sort of socket open that it can talk to.. or some file it lays down that the other process reads22:45
clarkbmnaser: that's how the commands work today22:46
corvusyeah, we have a socket22:46
mnaseroh okay so thats perfect then22:46
mnaserseems easy now, too easy :p22:46
corvusscheduler and executor definitely have this already, maybe some others, and the command structure is standardized enough that we can put it on all the commands real quick22:46
corvusso yeah, that's probably < 1 hour of work :)22:47
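A purely hypothetical sketch of what the probe side of a 'zuul-executor ready' command could look like; the socket path, the 'ready' command, and its reply are all assumptions about a protocol that does not exist yet in this discussion:

```python
import socket
import sys

# Assumed path; zuul services already listen on a local command socket.
SOCKET_PATH = "/var/lib/zuul/executor.socket"

def ready() -> bool:
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(5)
            sock.connect(SOCKET_PATH)
            # Assumed new command: the service would reply once all of
            # its component threads are up and running.
            sock.sendall(b"ready\n")
            return sock.recv(64).strip() == b"ok"
    except OSError:
        return False

if __name__ == "__main__":
    # The exit code doubles as a k8s exec liveness/readiness probe result.
    sys.exit(0 if ready() else 1)
```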
mnaserzuul-web can get away with it and just use an http health check22:47
clarkbnodepool too. though that socket is to zk22:47
mnasergiven it already runs a server22:47
clarkbso might get weird if zk isn't up :)22:47
mnaserwell part of the ready could be an attempt to connect to zk and if that fails, then it's not ready22:47
corvusclarkb: nodepool and zuul share a process startup framework, so should be easy to add commands to that22:47
corvusclarkb: so we could do "nodepool-builder ready"22:48
corvus(rather than "nodepool ready")22:48
clarkbmakes sense22:48
mnasercorvus: btw, could you also rebase or depends-on https://review.opendev.org/#/c/701764/ as well so we can see it in action too22:48
mnaser(for the secret change)22:48
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm  https://review.opendev.org/70205222:49
corvusmnaser: oh yep 1 sec22:49
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Change builder container name  https://review.opendev.org/70179322:50
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add empty clouds value  https://review.opendev.org/70186522:50
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm  https://review.opendev.org/70205222:50
corvusmnaser: https://gerrit-review.googlesource.com/c/zuul/ops/+/250112 is what that looks like in action22:54
corvusso the tradeoff is that the secret has to be set up external to argo/helm, but in exchange we get to just edit the nodepool file on disk.22:55
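For illustration, such an externally managed secret might look like the following, applied once outside of helm/argo (the secret and key names here are hypothetical, not necessarily what the change uses):

    # hypothetical Secret created out of band; helm only references it by name
    apiVersion: v1
    kind: Secret
    metadata:
      name: nodepool-config
    stringData:
      nodepool.yaml: |
        # the nodepool config lives here and can be edited and re-applied directly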
clarkbcorvus: mnaser if I'm looking at ^ and want to figure out where the chart is for zookeeper how do I find that?22:55
mnaseri think corvus deployed it externally outside argo22:55
corvusclarkb: readme line 2622:56
*** avass has quit IRC22:56
clarkbya that's just a giant xml doc22:56
corvusclarkb: oh, yeah that's the thing we were talking about with fungi yesterday.22:57
mnasercorvus: btw.. kubectl -n argocd get application/zookeeper -- trim that down, and add it as an app22:57
mnaserso you can deploy zookeeper via argocd too :)22:57
mordredpabelanger, Shrews: https://review.opendev.org/702053 <-- has a complete script implementing cleanup of leaked volumes from BFV openstack clouds that are attached to non-existent servers22:58
clarkbcorvus: I'm curious to see if they modify the zk settings to make zk not terribly slow, and rotate the journal file, and run in a 3 or 5 pod cluster22:58
corvusmnaser: how's that different than the "argocd app create" for zookeeper22:58
clarkbcorvus: really the first two things are probably the most important, otherwise your zuul will be slow and then run out of disk and have a sad22:58
mnasercorvus: argocd app create pretty much creates a local yaml manifest and pushes it out to the cluster :)22:58
mnaserclarkb: it isn't listed anywhere, but it's here -- https://github.com/helm/charts/tree/master/incubator/zookeeper22:59
corvusclarkb: should be ~= to this https://github.com/helm/charts/tree/master/incubator/zookeeper22:59
mnaserand you can look at all the values https://github.com/helm/charts/blob/master/incubator/zookeeper/values.yaml22:59
corvusmnaser: so kubectl -n argocd get application/zookeeper is a shorthand?23:00
mnasercorvus: right, i meant that you can get the application definition from k8s and store it in-repo, so you don't have to bootstrap it with a cli command23:00
mnaseryou can kubectl apply -f zookeeper-app.yaml nodepool-app.yaml23:00
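A sketch of what such a trimmed, in-repo application definition might look like for the incubator zookeeper chart (the repo URL, target revision, and namespace here are illustrative assumptions):

    # hypothetical trimmed Argo CD Application manifest kept in-repo
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: zookeeper
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://kubernetes-charts-incubator.storage.googleapis.com
        chart: zookeeper
        targetRevision: 3.1.0
      destination:
        server: https://kubernetes.default.svc
        namespace: zookeeper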
clarkbya antiaffinity is optional so you'll likely want to toggle that23:03
mnaseralso it uses a 5G pv by default i think23:04
clarkbmnaser: corvus https://github.com/helm/charts/blob/master/incubator/zookeeper/values.yaml#L277 is the value you want to change to avoid running out of disk iirc23:09
clarkbotherwise you grow an infinite number of snaps23:09
corvusclarkb: thanks!23:10
clarkbyup our system-config/manifests/site.pp comments confirm23:10
clarkbopendev sets that to 6, which means every 6 hours zk purges old snaps down to the snap retain count23:10
clarkbthe other thing we change is snapCount to 1000023:11
clarkbthat sets a higher limit on how big of a transaction backlog zk can have before it throttles clients to catch up23:11
clarkbiirc because we are bursty when restarting services, that helps during those times23:12
corvusclarkb: i've made notes to change both of those23:12
clarkbcool23:12
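Putting the two tweaks together, a values override for the incubator chart might look like this (the exact env var names are an assumption based on the chart's config script, not verified):

    # hypothetical incubator/zookeeper values override reflecting opendev's settings
    env:
      ZK_PURGE_INTERVAL: "6"        # purge old snapshots every 6 hours
      ZK_SNAP_RETAIN_COUNT: "3"     # how many snapshots to keep when purging
      ZK_SNAP_COUNT: "10000"        # transactions before rolling the snapshot/txn log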
corvusmnaser: do you have any idea what the linter is on about here? https://zuul.opendev.org/t/zuul/build/ba271a844a6c46a38c303ba4e88e33ad23:12
clarkbthe other thing to check is log rotation but https://github.com/helm/charts/blob/master/incubator/zookeeper/templates/config-script.yaml#L92 takes care of that for you I believe.23:13
corvusmordred: do you happen to have thoughts about a mysql-ish helm chart?  istr you used helm for the pxc thing a while back?23:13
corvusmaybe just https://github.com/helm/charts/tree/master/stable/percona-xtradb-cluster ?23:13
mordredcorvus: I did - but I did a helm export and then committed the results23:14
mordredcorvus: yeah - I think that's what I used ... looking real quick23:14
mordredcorvus: kubernetes/export-percona-helm.sh23:14
mordredcorvus: in system-config - is what I used when we did things before23:14
mordredcorvus: but since you're using helm directly, you obviously don't need to export - but maybe the arguments will be helpful23:15
corvusmordred: cool, thanks23:15
mnasercorvus: i think we need to add an empty tenantConfig and a conditional on "extraFiles"23:15
corvusmnaser: ^ for the sql part of this23:15
corvusmnaser: oh i see, i'll see if i can fix that real quick23:16
mnaserbecause it's probably rendering to something like...23:16
mnaserhttps://www.irccloud.com/pastebin/wRRWY0sm/23:16
mnaserpersonally, i'm pretty indifferent about db.  i actually use this operator cause it has backups and what not: https://github.com/presslabs/mysql-operator23:17
mnaserso for me, we just need to make sure we have a way to disable it23:18
mordredoh cool. that operator was _not_ finished back when we were looking at gitea23:18
mnasermordred: yeah it's pretty polished and runs pretty well for us. the reason i'm not using the percona one is that, at least for our openstack use case, we're always reading/writing to one server all the time because of deadlock stuff23:19
corvusi love the juxtaposition of "bulletproof" and "alpha and not suitable for critical production workloads"  :)23:19
mordredcorvus: I do not have emotional ties to the other thing - that was just the best I could find back then23:20
mnaserso running a master/slave (ehhh, do we have another term folks use these days for dbs?) does the same thing23:20
mordredmnaser: oh - right, that's replication based not galera based, yeah?23:20
mnaseryep23:20
mordredmnaser: also - I probably don't want to know about the deadlocks thing with openstack do I?23:21
clarkbtristanC is using postgres in his operator demo for zuul23:21
mnasermordred: https://www.percona.com/blog/2014/09/11/openstack-users-shed-light-on-percona-xtradb-cluster-deadlock-issues/23:21
clarkbfwiw23:21
mnaser"The simplest way to overcome this issue from the operator’s point of view is to use only one writer node for these types of transactions. This usually involves configuration change at the load-balancer level."23:22
mnaserthat's 2014 but i don't know if much work was put into rewriting those queries so they don't deadlock a cluster23:22
fungithe overlord/subjugated model of database clustering23:23
mordredoh - right. I remember discussion on that and discussions about not doing things that way - and I'm pretty sure it died on the vine23:23
*** rfolco has joined #zuul23:25
tristanCclarkb: i just use pg because i find it easier to set up and use23:25
clarkbah ok so nothing to do with operator readiness23:25
*** mattw4 has quit IRC23:27
corvusmnaser: it looks like extraFiles is the tricky part there; i can't (with my limited helm experience) find an easy way to do that kind of arbitrary include.  one way to fix it would be to iterate over it as a dict23:28
tristanCto help with service readiness, i added an init-container to most services to run python -c 'socket.connect(service, port)' on the dependency (e.g. executor -> gearman -> zk -> db); this makes the deployment a bit slower but the service logs are cleaner like that23:29
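A sketch of such a dependency-wait init container, with the image, service name, and port chosen purely for illustration:

    # hypothetical init container that blocks until the gearman port accepts connections
    initContainers:
      - name: wait-for-gearman
        image: python:3-alpine
        command:
          - python3
          - -c
          - |
            import socket, time
            while True:
                try:
                    socket.create_connection(("zuul-gearman", 4730), timeout=2).close()
                    break
                except OSError:
                    time.sleep(2)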
mnasercorvus: you could just add an if statement, since an empty map evaluates to false in Go templates23:30
corvusmnaser: ah yep23:30
mnaserI believe so anyways. Alternatively, you can look at how I did nodeSelector and tolerations too23:31
* mnaser is on mobile so can’t point to specifics23:31
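For reference, the guarded iteration over an extraFiles map that corvus and mnaser describe could look like this in a helm template (the value and ConfigMap names are assumptions, not necessarily what the chart uses):

    # hypothetical ConfigMap template; the if-guard skips rendering when extraFiles is empty
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: zuul-extra-files
    data:
    {{- if .Values.extraFiles }}
    {{- range $name, $content := .Values.extraFiles }}
      {{ $name }}: |
    {{ $content | indent 4 }}
    {{- end }}
    {{- end }}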
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add Zuul charts  https://review.opendev.org/70046023:31
*** pcaruana has quit IRC23:35
*** rfolco has quit IRC23:40
*** michael-beaver has quit IRC23:57
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Allow tenant config file to be managed externally  https://review.opendev.org/70205723:58