corvus | tristanC: do you think we should use dhall instead of ansible or go? | 00:00 |
---|---|---|
tristanC | and i'm also able to generate an ansible playbook based on the same spec: https://github.com/TristanCacqueray/dhall-zuul/blob/master/deployments/cr-playbook.yaml | 00:00 |
tristanC | corvus: well, the k8s operator still uses ansible: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/roles/zuul/tasks/main.yaml | 00:00 |
corvus | tristanC: oh, so you have ansible running dhall? | 00:01 |
clarkb | dhall is just a config language (replacement for json or yaml) right? | 00:01 |
tristanC | corvus: yes, dhall is just used to generate the k8s object, it doesn't manage zuul. I'll use ansible to implement backup/restore and scaling | 00:01 |
clarkb | I don't think there is an operator sdk implementation using dhall | 00:02 |
corvus | ok. got it. yeah. i mean, i wouldn't put it past someone to have made an operator sdk implementation with dhall. :) | 00:02 |
tristanC | clarkb: that's correct, dhall is like a more advanced yaml or json, the evaluation only results in a configuration | 00:02 |
corvus | remote: https://gerrit-review.googlesource.com/c/zuul/ops/+/250112 Add initial bootstrapping instructions and nodepool config [NEW] | 00:03 |
corvus | mnaser: ^ that's what i did today | 00:03 |
corvus | mnaser: thanks for your help :) | 00:03 |
tristanC | corvus: well i wrote another operator that applies a dhall expression to kubernetes, e.g.: https://github.com/TristanCacqueray/dhall-operator/blob/master/operator/examples/dhall-test.yaml | 00:04 |
corvus | clarkb: ^ there you go :) | 00:04 |
corvus | tristanC: nicely done :) | 00:04 |
tristanC | corvus: but it's implemented in ansible too: https://github.com/TristanCacqueray/dhall-operator/blob/master/operator/roles/dhall/tasks/main.yml | 00:04 |
*** mattw4 has quit IRC | 00:04 | |
tristanC | but that's low-level; for zuul i followed the spec defined in zuul | 00:05 |
mnaser | corvus: woot, the deployment stuff i made works but im trying to find a way to wait for the deployment to fully rollout | 00:05 |
mnaser | as the deployment is completing properly but nodepool doesn't go up because zookeeper isn't done rolling out yet (as it puts out 3 replicas) | 00:05 |
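(A minimal sketch of one way to express that wait, as an ansible task; the deployment name "zookeeper" and the 300s budget are assumptions, not taken from the actual job:)

```yaml
# Hedged sketch: block until a rollout finishes. The deployment name
# and timeout value are illustrative assumptions.
- name: Wait for the zookeeper deployment to finish rolling out
  command: kubectl rollout status deployment/zookeeper --timeout=300s
```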
tristanC | corvus: thanks, it's working quite well, and based on the same definition of services and configuration, it's super convenient to generate k8s objects as well as ansible playbooks | 00:05 |
clarkb | tristanC: what is the advantage of compiling dhall to k8s object yaml or ansible playbook yaml over writing them directly? templating? | 00:06 |
corvus | mnaser: yeah, i guess argo reports synced even before k8s is finished running the pods, and even running the pods isn't the same as 'service is up and running' | 00:06 |
mnaser | corvus: yeah, sync'd means the k8s state == charts, "progressing" = it's doing things | 00:06 |
tristanC | clarkb: mostly so that the user doesn't have to worry about dhall, the interface is the zuul crd | 00:07 |
mnaser | and it transitions from progressing to healthy | 00:07 |
clarkb | tristanC: so it goes from zuul crd yaml to dhall to ansible yaml? | 00:08 |
tristanC | clarkb: i misread... well there are many advantages to using dhall, but if i had to pick one, it's that it is programmable | 00:08 |
clarkb | ya ok I was worried that was it :) but maybe I'm just scarred from dealing with python packaging and its programmable config :) | 00:09 |
tristanC | clarkb: the input is json, e.g.: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml#L6-L28 | 00:09 |
tristanC | clarkb: which is converted into this Input type: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L39-L56 | 00:10 |
tristanC | clarkb: and passed on to this function: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L96 | 00:10 |
tristanC | which can then be used to generate k8s objects, ansible playbooks or podman compose scripts, see the 'cr-*' files here: https://github.com/TristanCacqueray/dhall-zuul/tree/master/deployments | 00:11 |
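(For orientation, a hypothetical CR input of the general shape being described; the apiVersion and field names below are illustrative assumptions, not copied from the linked repo:)

```yaml
# Hypothetical Zuul custom resource; all names here are assumptions.
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
metadata:
  name: example
spec:
  executor:
    count: 1
  merger:
    count: 1
  connections:
    - name: gerrit
      driver: gerrit
```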
clarkb | ya so dhall is like your IR for a compile between config formats | 00:12 |
tristanC | clarkb: it's not really comparable to python setup.py since python can do side effects, dhall evaluation is strictly pure | 00:13 |
tristanC | clarkb: yes, something like that, at least that's what i tried for zuul with this project: https://github.com/TristanCacqueray/dhall-operator | 00:13 |
tristanC | hopefully nodepool and the remaining bits like mqtt should be implemented by tomorrow, then i'll push a usable operator image | 00:16 |
tristanC | also, i wrote an optional git-daemon service to serve a ready-to-use configuration; with a fast periodic pipeline it's quite handy to validate the whole setup. it looks like this: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L23-L82 | 00:19 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:20 |
mnaser | anyone has any creative nice ideas for the name of a zuul job which would run a helm chart against a k8s cluster? | 00:21 |
clarkb | mnaser: "apply-helm-chart" ? | 00:22 |
mnaser | that seems reasonable | 00:22 |
mnaser | once i get this test working, i'll refactor out to zuul/zuul-jobs cause i'd like to make use of this too | 00:22 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:27 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:34 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:36 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:38 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs https://review.opendev.org/701868 | 00:39 |
openstackgerrit | Mohammed Naser proposed zuul/nodepool master: Switch to collect-container-logs https://review.opendev.org/701869 | 00:42 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs https://review.opendev.org/701868 | 00:42 |
clarkb | tristanC: does https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L143-L153 and https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Demo.dhall#L192-L194 mean there is a zuul main.yaml generated for non-scheduler services? | 00:49 |
clarkb | also I grok this a bit more now that I've realized it's haskell with "io" built in | 00:50 |
clarkb | it's like bash -x for haskell, kind of | 00:50 |
clarkb | but the trace is also the output if that makes sense | 00:50 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:52 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:52 |
mnaser | so close. i think zookeeper needs more than 300s to start up in this case | 00:53 |
tristanC | clarkb: yes, that's a demo "application"; the zuul operator uses another one which takes the main.yaml from a user-provided secret: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml#L13 that requires users to apply the secret before the Zuul cr, such as: https://github.com/TristanCacqueray/dhall-zuul/blob/master/deployments/cr-input-k8s.yaml#L30 | 00:54 |
tristanC | clarkb: which is added to scheduler service here: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L290-L295 and https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L230 | 00:55 |
tristanC | clarkb: this could be made optional in the future, for example when the user provides a list of projects https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-connection.yaml#L11 (which would be expanded like so: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Connection.dhall#L84-L93 ) | 00:57 |
tristanC | clarkb: another neat example is how the zk service is only enabled when the user doesn't specify a zk config: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/application/Zuul.dhall#L150-L165 | 01:00 |
mnaser | welp, everything is working nicely but | 01:05 |
mnaser | coredns is broken in our install-kubernetes job :) | 01:05 |
mnaser | https://6efce3be0458ca4ff401-19d327c34d133e2dbd296ad72151667c.ssl.cf5.rackcdn.com/701764/11/check/zuul-helm-functional/bf5d120/docker/k8s_coredns_coredns-6955765f44-csbfp_kube-system_7ae315e6-3ae6-4f07-a30c-80f511b6819f_5.txt | 01:05 |
tristanC | clarkb: it's similar to haskell with "io" built in, but the type system is different, it's based on CCw: https://hal.inria.fr/hal-01445835 | 01:05 |
clarkb | tristanC: ya structurally it is similar | 01:06 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 01:06 |
tristanC | clarkb: well it's also very different since you can't do recursion; expressions are guaranteed to evaluate (or fail, but not hang or do side effects) | 01:07 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:07 |
clarkb | mnaser: hrm I thought we set the coredns config to forward to whatever is set as the host's forwarders | 01:07 |
clarkb | mnaser: and 1.1.1.1 and 8.8.8.8 should not be forwarding back to coredns | 01:07 |
mnaser | clarkb: so the issue here i think is that you set it to the host's forwarders, which is 127.0.0.1 | 01:08 |
mnaser | and when coredns starts up, it uses the hosts' /etc/resolv.conf (or whatever you pointed it to) | 01:08 |
clarkb | mnaser: no we changed that | 01:08 |
clarkb | or at least I thought we did /me looks to find it | 01:08 |
mnaser | https://6efce3be0458ca4ff401-19d327c34d133e2dbd296ad72151667c.ssl.cf5.rackcdn.com/701764/11/check/zuul-helm-functional/bf5d120/docker/k8s_coredns_coredns-6955765f44-csbfp_kube-system_7ae315e6-3ae6-4f07-a30c-80f511b6819f_5.txt | 01:08 |
mnaser | that tells me that we're hitting 127.0.0.1 (hence how the loop gets caught) | 01:09 |
clarkb | well it tells you there is a forwarding loop it doesn't say anything about who is forwarding | 01:09 |
mnaser | oh look | 01:09 |
mnaser | yes it seems to be fixed hmm | 01:09 |
mnaser | hah | 01:10 |
mnaser | jokes on you! | 01:10 |
mnaser | 2020-01-10 00:57:18.993217 | ubuntu-bionic | * Preparing Kubernetes v1.17.0 on Docker '19.03.5' ... | 01:10 |
mnaser | 2020-01-10 00:57:21.852544 | ubuntu-bionic | - kubelet.resolv-conf=/run/systemd/resolve/resolv.conf | 01:10 |
clarkb | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/install-kubernetes/tasks/minikube.yaml#L43-L53 | 01:10 |
clarkb | you need to set minikube_dns_resolvers | 01:10 |
mnaser | ah | 01:11 |
clarkb | which we did in the tests but I guess we never plumbed that into the real jobs | 01:11 |
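(A minimal sketch of that plumbing, assuming a job that uses the install-kubernetes role; the job name and resolver addresses are example values:)

```yaml
# Hedged sketch: pass resolvers to install-kubernetes via job vars.
- job:
    name: zuul-helm-functional
    vars:
      minikube_dns_resolvers:
        - 1.1.1.1
        - 8.8.8.8
```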
*** zbr|rover has quit IRC | 01:11 | |
mnaser | yes indeed that looks like the case | 01:11 |
clarkb | I guess my changes allowed you to run it properly but you still have to set it wherever you run install-kubernetes :) | 01:11 |
mnaser | it feels like sane user behaviour to maybe have a default of 1.1.1.1 | 01:12 |
mnaser | clarkb: do you remember if that was discussed or not before i push a change for that? | 01:12 |
mnaser | or is this too much of an opendev thing because not everyone runs unbound inside their own nodepool vms i guess | 01:12 |
clarkb | I don't know that we discussed it, but considering zuul-jobs roles are meant to be pretty generic I'm not sure we should assume that? | 01:12 |
fungi | it does mean you're unconditionally opting your users into google tracking their zuul environment's dns queries | 01:13 |
clarkb | fungi: cloudflare, not google but ya | 01:13 |
fungi | oh, cloudflare is 1.1.1.1, google is 8.8.8.8 but right, neither is really above reproach, they both have tracking everyone as their business model | 01:14 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:14 |
mnaser | ok, so i think this should give us a functional nodepool deployed inside k8s | 01:14 |
clarkb | fungi: wait 'til you find out that every time you build go code you hit google servers and tell them what dependencies you are using | 01:15 |
clarkb | (unless you set some random env vars that no one seems to remember what they are) | 01:15 |
clarkb | this makes your builds faster because they can cache things (and github is slow I guess) | 01:15 |
fungi | the best way to find out is to go to china | 01:17 |
fungi | and then discover that the libs go retrieves are slightly altered because they hit a different cache | 01:17 |
fungi | which served you "suspect" substitutes | 01:18 |
clarkb | in hmac verification libraries | 01:18 |
fungi | right, nothing security-critical | 01:18 |
fungi | you know, just the bits that actually verify nothing else has been changed | 01:18 |
*** threestrands has joined #zuul | 01:19 | |
*** zbr has joined #zuul | 01:19 | |
fungi | for those who weren't with us in shanghai to see it yourselves, i swear this is actually a true story | 01:21 |
mnaser | hmm | 01:23 |
mnaser | do we have some sort of scenario for things inside zuul/zuul-jobs to allow the consumer of the job to .. do something after we do $things | 01:23 |
mnaser | in my case, im trying to make a general "apply a helm chart to kubernetes" job, but id like to allow the user to run things after we've deployed the chart | 01:24 |
*** zbr has quit IRC | 01:24 | |
mnaser | putting the code that deploys the chart in pre.yaml is a bit iffy cause it can actually fail.. having the consumer of the role use post.yaml also feels meh | 01:24 |
clarkb | mnaser: if you make that step a pre-run step then the job consumer can supply a run playbook that executes after | 01:24 |
mnaser | unless that's ok | 01:24 |
corvus | mnaser: that's part of why we make roles for everything. | 01:24 |
clarkb | also that | 01:25 |
mnaser | ok, fair, so in that case your run.yaml would have the 'helm-template' role and then whatever you need/want after | 01:25 |
corvus | yep, should be a tiny playbook; and if they're running something after anyway, it's just 1 more line | 01:25 |
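(A sketch of the resulting consumer playbook under that pattern; the verification task is an illustrative placeholder:)

```yaml
# Hedged sketch of a consumer run playbook: apply the chart via the
# role, then run whatever the consumer needs afterwards.
- hosts: all
  roles:
    - helm-template
  tasks:
    - name: Consumer-specific check after the chart is applied
      command: kubectl get pods --all-namespaces
```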
*** zbr has joined #zuul | 01:25 | |
clarkb | re failures in pre: that was something I was looking at pre-holidays and it is a non-zero-occurrence problem | 01:26 |
clarkb | you are right to be cautious there | 01:26 |
mnaser | w000t | 01:28 |
mnaser | corvus: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_548/701764/13/check/zuul-helm-functional/5484584/docker/k8s_launcher_zuul-nodepool-launcher-6b9454f5c8-shjrl_default_93aa561c-3726-4945-acbc-c37a0b7c6d15_0.txt check out the end of that file | 01:28 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 01:29 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:31 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 01:40 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 01:40 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:40 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:47 |
*** threestrands has quit IRC | 01:50 | |
mnaser | ok cool, so we have apply-helm-to-k8s roles and jobs, and that's being used to test the zuul-system chart which currently deploys nodepool with zookeeper successfully. i don't really have something that actually prods nodepool to test if it's really running but i'll leave that for those more familiar with the integration testing of nodepool | 01:59 |
* mnaser will probably be sucked into other things in the next few days so anyone can feel free to take the progress and drive it | 01:59 | |
clarkb | the other integration jobs wait for the min-ready node to come active | 02:12 |
clarkb | just polling nodepool list with a timeout | 02:12 |
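(Roughly like this, as an ansible task; the retry counts and the 'ready' match are assumptions:)

```yaml
# Hedged sketch: poll `nodepool list` until the min-ready node shows up.
- name: Wait for the min-ready node to become ready
  command: nodepool list
  register: nodes
  until: "'ready' in nodes.stdout"
  retries: 60
  delay: 10
```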
*** swest has quit IRC | 02:15 | |
*** zbr_ has joined #zuul | 02:15 | |
*** zbr has quit IRC | 02:16 | |
*** zbr_ has quit IRC | 02:17 | |
*** zbr has joined #zuul | 02:28 | |
*** saneax has quit IRC | 02:29 | |
*** swest has joined #zuul | 02:31 | |
*** zxiiro has quit IRC | 02:35 | |
*** rlandy has quit IRC | 02:55 | |
mnaser | clarkb: that would imply that I would need an openstack deployment alongside? Maybe it could be a multi-node job where we deploy openstack on one node and zuul with kubernetes on the other? | 02:55 |
*** bhavikdbavishi has joined #zuul | 02:57 | |
*** bhavikdbavishi1 has joined #zuul | 03:00 | |
*** bhavikdbavishi has quit IRC | 03:01 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:01 | |
clarkb | mnaser: no you can setup a k8s provider | 03:21 |
clarkb | and wait for that node to show up | 03:21 |
mnaser | ya I guess but that means I have to go through a non-trivial amount of work to set up the service accounts and role bindings and add them to the role too | 04:27 |
*** saneax has joined #zuul | 04:34 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #zuul | 05:34 | |
*** swest has quit IRC | 05:35 | |
*** swest has joined #zuul | 06:22 | |
*** themroc has joined #zuul | 06:39 | |
*** pcaruana has joined #zuul | 07:53 | |
*** avass has quit IRC | 07:56 | |
*** avass has joined #zuul | 07:56 | |
*** tosky has joined #zuul | 08:12 | |
*** fdegir has quit IRC | 08:22 | |
*** fdegir has joined #zuul | 08:23 | |
*** pcaruana has quit IRC | 08:46 | |
*** pcaruana has joined #zuul | 08:53 | |
*** jpena|off is now known as jpena | 08:54 | |
*** bhavikdbavishi has quit IRC | 09:16 | |
*** zbr is now known as zbr|rover | 09:19 | |
*** avass has quit IRC | 09:51 | |
*** sanjayu_ has joined #zuul | 10:26 | |
*** saneax has quit IRC | 10:26 | |
*** sanjayu__ has joined #zuul | 10:27 | |
*** sanjayu_ has quit IRC | 10:30 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: [WIP] Docker compose example: add keycloak authentication https://review.opendev.org/664813 | 10:41 |
*** mhu has joined #zuul | 10:58 | |
*** avass has joined #zuul | 11:12 | |
*** sshnaidm is now known as sshnaidm|off | 11:52 | |
zbr|rover | did anyone attempt to make the zuul badge dynamic like other CI systems? re https://zuul-ci.org/docs/zuul/user/badges.html | 11:52 |
*** jpena is now known as jpena|lunch | 12:01 | |
*** pcaruana has quit IRC | 12:34 | |
*** pcaruana has joined #zuul | 12:38 | |
*** bhavikdbavishi has joined #zuul | 12:44 | |
*** bhavikdbavishi1 has joined #zuul | 12:49 | |
*** bhavikdbavishi has quit IRC | 12:51 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 12:51 | |
*** sanjayu__ has quit IRC | 12:56 | |
*** jpena|lunch is now known as jpena | 12:57 | |
*** rlandy has joined #zuul | 13:22 | |
tobiash | zbr|rover: you mean like pass/failed? | 13:33 |
tobiash | zbr|rover: we chose zuul/gated as static because the idea behind gating is that master is never broken | 13:34 |
zbr|rover | tobiash: yeah, that is a very good sales pitch ;) -- we all know that in reality the builds are not green in perpetuity. | 13:36 |
zbr|rover | for the moment i am using the static one, but it would be nice/useful to also be able to expose the result of the last periodic run. a bit more tricky with zuul. | 13:37 |
tobiash | Exposing the last periodic needs a service that queries the zuul api and generates the appropriate svg | 13:39 |
*** Goneri has joined #zuul | 13:39 | |
tobiash | I guess this could be done with ~50 lines of python code | 13:40 |
tobiash | it could be even useful to have an api endpoint in zuul that returns the latest result of a specific job as json and produces an svg when requesting application/svg | 13:44 |
tobiash | or alter the already existing builds endpoint such that it generates the svg of the first hit | 13:45 |
tobiash | that would work even without adding an additional endpoint | 13:45 |
mhu | tobiash, that could be shown in the builds page too | 13:45 |
zbr|rover | tobiash: it may be a bit more tricky to implement, because the idea of the badge is to report status per project, not per individual job. | 13:45 |
tobiash | zbr|rover: in that case: buildsets endpoint | 13:46 |
tobiash | there you can filter for project, branch and pipeline for the whole buildset | 13:47 |
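(A hedged sketch of such a query against the buildsets endpoint; the host, tenant, project and pipeline values are examples, and the exact query parameters are assumed from the discussion:)

```yaml
# Hedged sketch: fetch the latest periodic buildset for a project.
- name: Query the latest periodic buildset result
  uri:
    url: "https://zuul.example.org/api/tenant/example/buildsets?project=example/project&pipeline=periodic&limit=1"
    return_content: true
  register: buildset
```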
zbr|rover | yeah. nice to have, but it has lower prio than other work with more impact. | 13:47 |
tobiash | maybe something when I get bored on a rainy weekend ;) | 13:48 |
*** sanjayu__ has joined #zuul | 14:01 | |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple https://review.opendev.org/701969 | 14:05 |
mnaser | i did some refactoring for the collection of container logs into zuul/zuul-jobs and updated the related projects, appreciate some reviews - https://review.opendev.org/#/q/topic:collect-container-logs | 14:08 |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple https://review.opendev.org/701969 | 14:10 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: JWT drivers: Deprecate RS256withJWKS, introduce OpenIDConnect https://review.opendev.org/701972 | 14:20 |
AJaeger | mnaser: will review - could I trade you https://review.opendev.org/700913, please? | 14:37 |
mnaser | AJaeger: done :) | 14:38 |
mhu | I'm looking at the Capabilities class in zuul.model, it says "Some plugins add elements to the external API. In order to facilitate consumers knowing if functionality is available or not, keep track of distinct capability flags." | 14:42 |
mhu | Is there a mechanism to register such capabilities ? Or should I just add them to the class directly? | 14:43 |
AJaeger | thanks, mnaser | 14:44 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Extract project config YAML into ref docs https://review.opendev.org/701977 | 14:46 |
mhu | Is it used at all? "job_history" seems unsettable | 14:47 |
tobiash | mhu: that's plumbed through to the api/info endpoint | 14:54 |
tobiash | mhu: atm, the only use case is to inform the web ui whether build history exists or not (i.e. whether there is an sql connection) | 14:55 |
tobiash | based on that the web ui hides the builds tab | 14:55 |
tobiash | mhu: there is no register mechanism for that so add it directly there | 14:57 |
tobiash | it's initialized from here: https://opendev.org/zuul/zuul/src/branch/master/zuul/cmd/web.py#L50 | 14:57 |
Shrews | hrm, our Cross Project Gating doc is confusing to me. It states it shows "how to test future state of cross projects dependencies" but there is no "how" in there | 14:59 |
Shrews | i don't get the point | 15:00 |
tristanC | Shrews: yeah, my bad, i meant to add more content | 15:03 |
tobiash | mhu: actually, job_history there is never set, so strike that... | 15:03 |
tobiash | mhu: so you probably want to modify the fromConfig of WebInfo if you want to add some new capability to the info endpoint | 15:03 |
mhu | tobiash, yeah that's what puzzled me :) if you look at http://zuul.opendev.org/api/tenant/opendev/info for example, job_history = false yet you can hit the builds endpoint | 15:04 |
mhu | tobiash, right that's the simplest way, I was just wondering if I got the spirit of this right | 15:04 |
tobiash | mhu: I guess that's broken since probably no one runs a zuul without a db so nobody complained ;) | 15:04 |
mhu | mordred, git log snitched on you, you added the webinfo capabilities in commit 518dcf8bdba9c5c22711297395c4a9cb4e0c644d, any insights on this ? :) | 15:06 |
Shrews | tristanC: ah, i see | 15:06 |
corvus | mnaser, clarkb: an integration test job could wait for a static node | 15:14 |
fungi | mhu: i hard-code `git blame` to just return "mordred" for everything ;) | 15:16 |
tristanC | tobiash: i can't find the openshift scc for zuul you posted some time ago... would you mind pasting it again? | 15:32 |
tobiash | tristanC: I use the default privileged scc | 15:33 |
*** themroc has quit IRC | 15:35 | |
*** themroc has joined #zuul | 15:35 | |
mnaser | corvus, clarkb: oh ya, i think that might work. does nodepool need to scan the ssh keys, or can i just point at the node ip and that's probably enough? | 15:38 |
openstackgerrit | Merged zuul/zuul-jobs master: Make pre-molecule tox playbook platform agnostic https://review.opendev.org/700452 | 15:38 |
*** themroc has quit IRC | 15:40 | |
*** themroc has joined #zuul | 15:41 | |
tobiash | tristanC: you just need to make sure you run the executor with a service account that is allowed to use the privileged scc | 15:41 |
tristanC | tobiash: alright, i remembered a more complex setting. i'll try that, thanks! | 15:44 |
*** jpena is now known as jpena|brb | 15:45 | |
tristanC | btw I got an operator that now implements the full Zuul CR, here is an example usage: https://github.com/TristanCacqueray/dhall-zuul/blob/master/operator/examples/zuul-cr.yaml | 15:46 |
*** themroc has quit IRC | 15:46 | |
*** themroc has joined #zuul | 15:47 | |
corvus | tristanC: that looks nice! | 15:51 |
*** themroc has quit IRC | 15:51 | |
*** themroc has joined #zuul | 15:51 | |
corvus | tristanC: do you want to propose that up to the zuul-operator repo? | 15:52 |
*** rfolco has quit IRC | 15:52 | |
tristanC | corvus: yes sure, if you don't mind the dhall based implementation, i'd be happy to push that to zuul/zuul-operator | 15:53 |
corvus | tristanC: i think it's worth looking at. my biggest concern is making sure users don't have to see it. but if it's just an implementation detail, that's different. | 15:55 |
mordred | mhu: yeah - I added that a while back with the intent on it being used like you're suggesting ... but then we never plumbed it all the way through - and then we started thinking that DB was really something we wanted to make a hard requirement in the future, so the motivation to finish the info/capabilities work kind of went away | 15:55 |
mhu | mordred, okay that clears things up | 15:56 |
clarkb | corvus: tristanC: a bit more annotation of the constructs would probably be good to explain what is going on since dhall is new. But I don't think end users will ever interact with it | 15:56 |
mordred | mhu: is there a new plugin capability you were thinking of adding that might be useful to register there? | 15:56 |
mhu | mordred: yeah, the authentication config | 15:57 |
corvus | clarkb, tristanC: yeah -- i'm still not quite sure what dhall is doing here | 15:57 |
mhu | as suggested by tobiash in https://review.opendev.org/#/c/643536/ | 15:57 |
mordred | mhu: ah - yeah, that makes total sense | 15:57 |
clarkb | corvus: after looking yesterday my understanding is it compiles k8s CR yaml inputs to zuul config and ansible playbook outputs | 15:58 |
tristanC | corvus: clarkb: that's good to hear, i should be able to propose a review today or next week with what i currently have | 15:58 |
*** jangutter has quit IRC | 15:59 | |
fungi | tristanC: skimming back through the dhall, how do you end up writing it? does your keyboard have keys for λ and → glyphs or are you using compose sequences? does dhall support a pure-ascii syntax variant maybe? | 16:02 |
corvus | tristanC, pabelanger, mnaser, mordred, clarkb: maybe we should schedule a meeting to talk about how to proceed? since the spec says ansible, and mnaser proposed golang, and now tristan is looking at dhall, it seems like maybe we should try to get back on the same page. | 16:03 |
clarkb | fungi: tristanC uses the ascii dhall version | 16:03 |
clarkb | corvus: tristanC's operator is still ansible | 16:03 |
tristanC | fungi: it's currently debated here: https://discourse.dhall-lang.org/t/change-dhall-format-to-use-ascii-by-default | 16:03 |
clarkb | corvus: dhall takes the k8s CR input and rewrites it to ansible. | 16:03 |
tristanC | corvus: clarkb: yes i'm using the ansible operator-sdk | 16:04 |
fungi | clarkb: he's not using pure ascii in https://github.com/TristanCacqueray/dhall-zuul/blob/master/render.dhall for example | 16:04 |
corvus | clarkb: true, but i guess the spirit of the question is "as we maintain the operator, what are we going to expect to need to modify?" | 16:04 |
clarkb | fungi: ah most of the files I read use \ instead of lambda and -> instead of arrow | 16:04 |
fungi | got it, so there at least is an option of doing pure ascii anyway | 16:04 |
clarkb | corvus: yup, and dhall would definitely be involved in that. FWIW, with a bit of annotation calling out what the expansions are, I think it would end up being quite readable | 16:05 |
tristanC | fungi: long story short, i wish i had a lambda key on my keyboard, but i type in ascii, and the formatter takes care of the rendering. I changed the default to use '--ascii' recently as upstream suggests that, but the recent poll shows that unicode is still popular https://docs.google.com/forms/d/e/1FAIpQLSc_4Se7V6jRk4SBfAx1UdZ67Cf_5Hg0uRas5PxOMeRes4nQGg/viewanalytics | 16:05 |
clarkb | it might also help to reduce some of the nesting | 16:06 |
corvus | should we maybe plan an irc and/or teleconference meeting next week to talk about it? | 16:06 |
fungi | interesting. people like to use a language they can't type without stuffing it through a preprocessor | 16:06 |
clarkb | so that each logical expansion is a top level function | 16:06 |
clarkb | rather than a bunch of closures | 16:06 |
mnaser | i'm free next week but after that my schedule is a little funky as i'll be in EU for a bit | 16:06 |
clarkb | (at least I think I would find it more readable that way) | 16:06 |
mnaser | i can be available EU evenings | 16:07 |
mordred | fungi: I imagine many of them are using os's/editors that have auto-convert features - but I'm just guessing | 16:07 |
mnaser | i guess im the only one who learned about dhall today | 16:07 |
mordred | corvus: I think a meeting would be great. I'm on a long flight on tuesday - but could do an IRC meeting | 16:07 |
mordred | mnaser: I know next to nothing about it other than tristanC's explorations | 16:07 |
tristanC | corvus: i should have time to clean up and add comments, hoping that it will make my implementation clearer | 16:07 |
mordred | corvus: other than that - I can do voice again after tuesday | 16:08 |
mnaser | mordred: oh yay, well that's a little bit more comforting :p | 16:08 |
clarkb | I'm around next week. If we get hit by a snow storm I might disappear to go sledding with the kids or if power goes out but otherwise around :) | 16:08 |
fungi | i find it interesting on that dhall discourse thread that there are folks lobbying for completely removing the ascii option and only supporting extended unicode syntax | 16:08 |
mordred | corvus: maybe once tristanC has the annotated version up? because maybe with those annotations the code might make more immediate sense to our eyes and give us a stronger context to discuss the choice? | 16:09 |
corvus | i'm thinking voice might be good for this one... how about wednesday 1800 utc? | 16:09 |
corvus | tristanC: would that be enough time? | 16:09 |
fungi | i'm free all wednesday | 16:09 |
mordred | I'll be in singapore so that'll be 2am for me - BUT - I'll be jetlagged, so it's likely fine :) | 16:10 |
tristanC | corvus: wednesday 1800 works for me, i guess i'll start commenting the code now =) | 16:10 |
clarkb | tristanC: and maybe reduce nesting so that it looks more like imperative programming with function calls? | 16:11 |
fungi | poll results are intriguing as well... >90% write dhall in ascii, but >50% prefer that the preprocessor reformats it to extended unicode symbols | 16:11 |
mordred | fungi: that's fascinating | 16:12 |
*** themroc has quit IRC | 16:12 | |
corvus | mordred: would 15:30 be any better? | 16:12 |
fungi | still, with some 35% explicitly formatting it to ascii in the preprocessor and 41% saying they prefer to read it in ascii, i doubt the folks pushing to remove the ascii syntax entirely will have sufficient support | 16:13 |
mordred | corvus: probably? but I'm happy to do either regardless if 1800 works better for others | 16:13 |
corvus | mordred, clarkb, tristanC, pabelanger, mnaser, fungi: https://ethercalc.openstack.org/ur9x4q4z1z71 | 16:15 |
clarkb | fungi: your questions about character sets as they intersect with programming remind me of apl https://en.wikipedia.org/wiki/APL_(programming_language)#/media/File:APL-keybd2.svg | 16:22 |
*** jpena|brb is now known as jpena | 16:22 | |
fungi | clarkb: reminiscent of https://en.wikipedia.org/wiki/Space-cadet_keyboard#/media/File:Space-cadet.jpg | 16:23 |
*** mattw4 has joined #zuul | 16:23 | |
fungi | (which actually existed) | 16:23 |
corvus | okay, let's go with 15:30, i'll send an email | 16:24 |
fungi | thanks! | 16:26 |
*** rlandy is now known as rlandy|brb | 16:29 | |
fungi | clarkb: on closer inspection, the space cadet keyboard has all the extended dhall glyphs | 16:31 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web capabilities: remove unused job_history attribute https://review.opendev.org/702001 | 16:41 |
mhu | mordred, tobiash ^ if it's fine with you | 16:41 |
mhu | I'll look at what was done with the webhandlers and see if I can reimplement it somehow for the auth capabilities | 16:42 |
*** mhu has quit IRC | 17:02 | |
*** mhu has joined #zuul | 17:03 | |
*** rlandy|brb is now known as rlandy | 17:04 | |
*** bhavikdbavishi has quit IRC | 17:04 | |
AJaeger | is it safe to go from go 1.13 to 1.13.5 by default in Zuul jobs? See https://review.opendev.org/#/c/700467/ | 17:15 |
tobiash | mhu: actually it's not required, but it us planned to make sql required | 17:16 |
clarkb | AJaeger: https://golang.org/doc/devel/release.html#go1.13.minor looks safe though there is a 1.13.6 now | 17:16 |
clarkb | AJaeger: in general I think upgrading minor releases is ok | 17:16 |
clarkb | it's the 1.13 -> 1.14 upgrade that will be potentially problematic | 17:17 |
AJaeger | thanks, clarkb | 17:18 |
* AJaeger will +2A 700467 then | 17:19 | |
*** zxiiro has joined #zuul | 17:24 | |
*** sanjayu__ has quit IRC | 17:30 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 17:31 |
*** saneax has joined #zuul | 17:31 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 17:31 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 17:31 |
*** evrardjp has quit IRC | 17:33 | |
*** evrardjp has joined #zuul | 17:34 | |
pabelanger | corvus: wfm, and yah, would be nice if we could all align | 17:36 |
openstackgerrit | Merged zuul/zuul-jobs master: install-go: bump version to 1.13.5 https://review.opendev.org/700467 | 17:42 |
pabelanger | Hmm, we are seeing a traceback in github driver when trying to merge PR for a repo: http://paste.openstack.org/show/788250/ | 18:01 |
pabelanger | however, I do not know why that is | 18:01 |
pabelanger | tobiash: seen ^ before? | 18:02 |
clarkb | pabelanger: https://github.community/t5/GitHub-API-Development-and/Resource-not-accessible-by-integration-when-requesting-GitHub/td-p/13829 reading that implies to me that the functionality you are trying to get isn't available to github applications | 18:02 |
clarkb | have to make a user request instead? | 18:03 |
pabelanger | I wonder if something change on merge_method | 18:04 |
tobiash | pabelanger: hrm, I never saw this before? What was the request? | 18:04 |
pabelanger | tobiash: gate passed, zuul trying to merge | 18:04 |
pabelanger | https://github.com/ansible-network/collection_migration/pull/11 is PR | 18:05 |
tobiash | pabelanger: in zuul nothing changed recently around that afaik | 18:05 |
tobiash | pabelanger: does this affect the whole repo? | 18:05 |
tobiash | The branch protection settings would be useful | 18:06 |
pabelanger | tobiash: yah, so this has 2 different branch protection rules, 1 for master and the other for feature/*; both are the same but i did two because i'm not smart enough to figure out the regex | 18:07 |
pabelanger | settings are as follows | 18:07 |
pabelanger | Require status checks to pass before merging, check / gate | 18:07 |
pabelanger | include admins | 18:07 |
pabelanger | and restrict users to ansible-zuul (github app) | 18:08 |
pabelanger | this should be same as other working repos | 18:08 |
pabelanger | Allow merge commits is selected | 18:08 |
*** pcaruana has quit IRC | 18:09 | |
tobiash | pabelanger: the restrict users to app is quite new, might be buggy? | 18:09 |
tobiash | Can you get the post request from the logs? | 18:10 |
pabelanger | tobiash: I can check, but so far haven't seen a failure before | 18:10 |
pabelanger | yah, 1 sec | 18:10 |
tobiash | And further to check that zuul isn't configured to use squash merge | 18:11 |
pabelanger | https://pastebin.com/GerARqfK | 18:12 |
pabelanger | let me check that too | 18:12 |
pabelanger | tobiash: we don't set it, so it is the default | 18:13 |
pabelanger | merge-commit, IIRC on github | 18:13 |
tobiash | hrm no idea what could cause this | 18:16 |
*** rfolco has joined #zuul | 18:16 | |
tobiash | Is this affecting just this pr, the whole repo or the whole app? | 18:16 |
clarkb | is it possible that restrict users is placing a restriction for user to server api calls | 18:16 |
clarkb | which would break the application per the link I pasted | 18:16 |
pabelanger | tobiash: not sure, this is first PR | 18:17 |
pabelanger | I can try other branch | 18:17 |
pabelanger | clarkb: maybe, but can remove and see | 18:17 |
clarkb | I would try removeing the restrict users and see if it works | 18:17 |
clarkb | ++ | 18:17 |
tobiash | yes, that' what I'd do as well | 18:17 |
tobiash | pabelanger: first pr in that repo or of that gh app? | 18:18 |
tobiash | in case of the latter also check the access rights of the app | 18:18 |
pabelanger | tobiash: well, not the first PR, but the first since we upgraded to 3.14.0; this worked in dec 2019 when using 3.11.1 | 18:19 |
pabelanger | yah, github app is on repo | 18:20 |
pabelanger | has correct permissions | 18:20 |
tobiash | Really weird | 18:20 |
pabelanger | no change after removing only ansible-zuul | 18:23 |
clarkb | pabelanger: fwiw the only github driver changes in zuul recently that I know of are the ones you and I made recently to make searching for dependencies more efficient | 18:25 |
clarkb | nothing related to merging changes | 18:25 |
clarkb | pabelanger: possible that github3 updated in your upgrade? | 18:25 |
*** jpena is now known as jpena|off | 18:25 | |
pabelanger | clarkb: yah, thats the change I was looking at | 18:26 |
pabelanger | github3.py==1.3.0 | 18:26 |
pabelanger | that is same version as before | 18:26 |
pabelanger | well, damn | 18:28 |
pabelanger | I think this is the issue | 18:28 |
pabelanger | https://github.community/t5/GitHub-Actions/GitHub-action-Resource-not-accessible-by-integration/td-p/19707 | 18:28 |
pabelanger | OAuth tokens and GitHub Apps are restricted from editing the main.workflow file. This means it isn't possible to merge a Pull Request via an Action if it updates the main.workflow file I'm afraid. | 18:29 |
pabelanger | let me rebase away that change, and see what happens | 18:29 |
clarkb | so it is change specific if you edit the workflow file? | 18:29 |
pabelanger | clarkb: so, yah this is a little messed up. the PR is a rebase of an upstream project | 18:30 |
pabelanger | so a series of commit to bring master branch up to date | 18:30 |
pabelanger | upstream uses github actions | 18:30 |
tobiash | pabelanger: ok, so that suggests that this problem won't affect other prs that don't modify those files | 18:32 |
pabelanger | welp, that is a terrible design | 18:33 |
pabelanger | that was the issue | 18:33 |
pabelanger | https://github.com/ansible-network/collection_migration/pull/11 | 18:33 |
pabelanger | that means | 18:33 |
pabelanger | zuul cannot gate github repos with .github/workflows, regardless of whether the feature is enabled or not | 18:34 |
tobiash | yay github :/ | 18:34 |
pabelanger | oh | 18:35 |
pabelanger | this repo has it enabled | 18:35 |
pabelanger | so, let me disable it and add back | 18:35 |
pabelanger | yup, fails even with disabled | 18:37 |
pabelanger | that is terrible | 18:37 |
pabelanger | so, I guess I get to write a new job that checks for the .github/workflows folder and fails if found | 18:39 |
fungi | i'm not entirely sure how to parse github.community's date "format" but the response from github seems to have been either 10 or 8 months ago, saying they were "passing the feedback to the team to consider" | 18:40 |
fungi | if they haven't solved this in most of a year, then i wouldn't hold my breath | 18:40 |
pabelanger | yah, I would love to see where this is documented | 18:42 |
pabelanger | fungi: clarkb: tobiash: guess, force push is the way to solve this | 18:48 |
clarkb | pabelanger: or escalate your privs and click the merge button | 18:48 |
clarkb | which reduces the chance that you'll merge the wrong thing | 18:49 |
pabelanger | yah, doesn't scale too well | 18:49 |
clarkb | well should be infrequent right? | 18:49 |
clarkb | onyl when you change those files | 18:49 |
pabelanger | thinking long term, some ansible projects want github actions | 18:49 |
clarkb | oh sure long term we should figure out a way to address it and having zuul push merges back would probably do it | 18:50 |
clarkb | I thought you meant for this change | 18:50 |
pabelanger | yah, in this case, best to delete the folder and deal with the rebase cost | 18:50 |
pabelanger | but yah, hear you | 18:50 |
pabelanger | just terrible that it is blocked regardless of whether actions is enabled or disabled | 18:51 |
tobiash | clarkb: push by zuul will be blocked as well according to that link | 18:52 |
clarkb | oh ha ok | 18:53 |
tobiash | the only way I see to solve this in a non hacky way is talking to github | 18:53 |
pabelanger | tobiash: not sure, internal ansible folks say it works with git push --force | 18:56 |
pabelanger | grumble | 18:56 |
pabelanger | thanks github | 18:56 |
clarkb | tobiash: pabelanger that link says they used an action token and it was from the action | 18:56 |
clarkb | could be the git push fails if done that way but with another user's token it would be ok? | 18:56 |
*** pcaruana has joined #zuul | 18:57 | |
pabelanger | https://github.com/ansible-community/collection_migration/pull/210/files | 18:57 |
tobiash | clarkb: a different user probably works but this should be considered a hack because the github app is what's supposed to be handling the access... | 18:58 |
pabelanger | I think the collection migration tool is using deployment keys | 18:58 |
clarkb | tobiash: yes, but the failure was from within the action not an app aiui | 18:59 |
clarkb | tobiash: but maybe the action and app creds are equivalent here | 18:59 |
tobiash | I guess it's the same type of auth token there as we see zuul failing to merge as well | 19:02 |
pabelanger | so, right now, I can't think of a good way to deal with this in zuul, without writing a new job for all projects that checks if the files exist. | 19:07 |
clarkb | a good start may simply to be to add a note about this in the github driver docs | 19:08 |
clarkb | so that it is less of an investigation if people hit it in the future | 19:08 |
clarkb | jlk: ^ may also be able to offer some insight | 19:08 |
pabelanger | yah, would be interested in official docs (on github) so I can link to them, but can't seem to find any | 19:23 |
jlk | oh that looks fine | 19:37 |
jlk | IIRC Actions is still in beta, so there's definitely going to be docs gaps | 19:37 |
pabelanger | jlk: what is best way to pass along feedback here? Aside from asking you :) Basically, if actions are disabled, we'd want github to allow merging of .github/workflows configs. | 19:39 |
jlk | asking | 19:42 |
pabelanger | thanks! | 19:43 |
jlk | https://support.github.com/contact/feedback?contact[category]=GitHub+Actions | 19:44 |
jlk | Looks like you still have to pick Actions. one sec | 19:44 |
jlk | ¯\_(ツ)_/¯ just pick it from the drop down | 19:45 |
pabelanger | ack, cool | 19:45 |
corvus | jlk, pabelanger: secondarily, it would be nice if there were a way to merge changes if actions *are* enabled. maybe an extra permission that can be granted to an app. otherwise, it doesn't really leave much space for github actions to co-exist with other apps. | 19:48 |
jlk | agreed | 19:48 |
pabelanger | +1 | 19:49 |
corvus | pabelanger: you want to pass on both of those pieces of feedback? | 19:49 |
pabelanger | will do | 19:49 |
corvus | pabelanger: thanks! | 19:49 |
*** Goneri has quit IRC | 19:59 | |
pabelanger | corvus: jlk: feedback sent, not sure if there is a way for me to track status of it | 20:01 |
*** michael-beaver has joined #zuul | 20:57 | |
*** rfolco has quit IRC | 21:06 | |
*** zxiiro has quit IRC | 21:23 | |
*** rlandy has quit IRC | 21:26 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Fix typo in helm role https://review.opendev.org/702046 | 21:27 |
*** Goneri has joined #zuul | 21:28 | |
corvus | pabelanger, yoctozepto: 678273 +3, thanks | 21:38 |
clarkb | pabelanger: mordred: what was the process of doing volume detachments when the nova server no longer exists? | 21:46 |
clarkb | iirc that was sorted out in here recently | 21:46 |
yoctozepto | corvus: thanks | 21:47 |
yoctozepto | any eta to be picked up by opendev/openstack instance? | 21:47 |
fungi | clarkb: based on scrollback it was this api method: https://docs.openstack.org/api-ref/block-storage/v3/#force-detach-a-volume | 21:48 |
mordred | clarkb: one sec ... | 21:48 |
mordred | fungi: do you have that scrollback quick enough to find the paste link I sent to pabelanger? | 21:48 |
fungi | though a clean recipe on how we can run that from bridge.o.o to do batch cleanup would be nice | 21:48 |
fungi | oh, there was a paste too? checking | 21:48 |
mordred | fungi: yeah - I'm also looking | 21:49 |
fungi | http://paste.openstack.org/show/788133/ | 21:49 |
fungi | the power of grep compels you | 21:49 |
fungi | now if only paste.o.o could respond in a timely fashion | 21:50 |
mordred | the uuid in that string comes from volume.attachments and is the id of a given attachment. the overall algorithm is "get volume, look at attachments, look to see if the server.id listed in the attachment exists, if not, do that DELETE call to delete the attachment, then you can delete the volume" | 21:51 |
fungi | and i guess i can pass the usual OS_CLIENT_CONFIG_FILE envvar? | 21:53 |
mordred | fungi: something like http://paste.openstack.org/show/788260/ (untested) - and yes, you should be able to do that | 21:54 |
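(The same algorithm rendered as a hedged ansible sketch; the endpoint host, token and ids are placeholders, and the microversion matches the 3.31 value used above:)

```yaml
# Hedged sketch: delete a stale attachment record, after which the
# volume itself can be deleted. All identifiers here are placeholders.
- name: Delete the stale volume attachment
  uri:
    url: "https://block-storage.example.org/v3/{{ project_id }}/attachments/{{ attachment_id }}"
    method: DELETE
    headers:
      X-Auth-Token: "{{ auth_token }}"
      OpenStack-API-Version: "volume 3.31"
    status_code: [200, 202, 204]
```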
mordred | fungi: where is this happening now? | 21:55 |
clarkb | mordred: vexxhost sjc1 | 21:55 |
mordred | if you like, I can do a quick try on that script without deleting thing and make it something we can run run | 21:55 |
mordred | clarkb: the zuul account? | 21:55 |
mordred | clarkb, fungi: give me a sec - let me make a proper script that we can run | 21:55 |
clarkb | mordred: yes | 21:55 |
clarkb | mordred: we've leaked a bunch of volumes which is preventing us from deleting images which is filling the builder disks | 21:56 |
openstackgerrit | Merged zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 21:56 |
mordred | clarkb: "awesome" | 21:56 |
corvus | yoctozepto: maybe next week? | 21:57 |
fungi | mordred: yeah, i tried but am getting errors like this: | 21:57 |
fungi | Volume 0f91579c-c627-452b-aad4-67cdeae865c3 could not be found. | 21:58 |
fungi | when that volume shows up in the `openstack volume list` output | 21:58 |
mordred | fungi: cool. let me take a swing at it | 21:58 |
fungi | `openstack volume show 0f91579c-c627-452b-aad4-67cdeae865c3` provides an attachment_id of 42511fe3-d926-4b99-b275-38f0e8fa5a76 | 21:59 |
fungi | so i tried calling print(c.block_storage.delete('/attachments/42511fe3-d926-4b99-b275-38f0e8fa5a76', microversion='3.31')) | 21:59 |
fungi | leading me to suspect there's a deeper problem (though in all likelihood i'm just doing it wrong) | 22:00 |
mordred | fungi: I feel like I just deleted that one | 22:08 |
mordred | (the attachment) | 22:08 |
mordred | can you double check? | 22:08 |
fungi | lookin' | 22:10 |
fungi | attachments [] | 22:10 |
fungi | i believe you did | 22:10 |
fungi | any clue what i was missing? | 22:10 |
fungi | magic fingers? | 22:10 |
mordred | maybe magic fingers :) | 22:11 |
mordred | I've got a script in /root called clean-volumes.py that should work - although I'm going to run it one more time in no-op mode | 22:11 |
mordred | fungi: if you run it with OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml python3 clean-volumes.py | 22:12 |
mordred | fungi: it should print a bunch of REST calls- then a long list of actions | 22:12 |
corvus | mnaser: https://github.com/aslafy-z/helm-git looks interesting but iiuc, one would need to build a custom argo deployment image in order to use it | 22:13 |
mordred | fungi: I think we mostly want to make sure that the list of actions it's going to take there don't include deleting the mirror volume | 22:13 |
fungi | sure, though ideally the mirror volume is in another castle | 22:13 |
mnaser | corvus: !!!! i like that a lot | 22:14 |
corvus | mnaser: i think https://argoproj.github.io/argo-cd/operator-manual/custom_tools/ is relevant to that | 22:14 |
mordred | fungi: oh - right - the mirror volume is in openstackci-vexxhost right? | 22:14 |
mnaser | corvus: yes, that would be exactly the way to go about it | 22:15 |
clarkb | mordred: fungi yes it should be in the other tenant/project/whatever its called | 22:15 |
mnaser | corvus: but we can use an initcontainer and cheat... | 22:15 |
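(The "cheat" could look roughly like this, along the lines of the custom-tools doc linked above; the image, command and mount path are assumptions:)

```yaml
# Hedged sketch: install the helm-git plugin into the argo repo-server
# via an initContainer. Image and paths are assumptions.
initContainers:
  - name: install-helm-git
    image: alpine/helm
    command: ["helm", "plugin", "install", "https://github.com/aslafy-z/helm-git"]
    volumeMounts:
      - name: helm-plugins
        mountPath: /root/.local/share/helm/plugins
```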
mordred | clarkb: ok - so - we should be "safe" if there is a bug in this script from deleting anything too important, right? | 22:15 |
clarkb | mordred: I think so. Another check is to look at hte volume size | 22:15 |
mordred | clarkb, fungi: maybe this is an #openstack-infra convo - whoops, sorry | 22:15 |
clarkb | 80GB should be for the test nodes | 22:15 |
mnaser | corvus: thanks for reviewing my collect-container-logs role, i refactored uses i saw of it in the topic - https://review.opendev.org/#/q/topic:collect-container-logs | 22:20 |
*** armstrongs has joined #zuul | 22:21 | |
corvus | mnaser: oh cool, thx | 22:21 |
mnaser | btw, how do we feel about shipping 'tools' inside zuul? for example, in the current ci jobs, nodepool loops forever erroring because zookeeper is not up yet .. it would be nice if we had some small python tools in the image that waited for a zookeeper cluster to be ready, so we can use those in the initContainer | 22:22 |
mnaser | so that if the zookeeper cluster is not up, nodepool simply won't start | 22:23 |
openstackgerrit | Merged zuul/zuul master: Make files matcher match changes with no files https://review.opendev.org/678273 | 22:23 |
mnaser | it avoids spamming the logs with tons of errors as it fails to connect | 22:23 |
mnaser | or perhaps nodepool can expose some sort of alive/readiness check so the containers don't become ready until nodepool is ready | 22:24 |
mnaser | that way during a new deployment or rollout, if something goes wrong, it will stop rolling out the other pods | 22:25 |
clarkb | mnaser: fwiw you can probably get away with nc or telnet or something simple like that | 22:25 |
clarkb | I'm not sure we need to ship a special tool for that | 22:25 |
mnaser | clarkb: yeah but then i'd have to do some yaml parsing of the list of zookeeper servers provided | 22:25 |
mnaser | and assume the config format doesn't change, for example | 22:25 |
clarkb | aren't you providing that list to the config anyway? | 22:26 |
clarkb | (meaning you could provide it to another tool) | 22:26 |
mnaser | in nodepool's case, i'm actually yaml-ifying the config input straight into a file, without having some manual 'insert this here' thing | 22:26 |
clarkb | remember when we had init scripts that solved this for us | 22:27 |
* clarkb grumps | 22:27 | |
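(The nc variant as a pod initContainer sketch; the service name, port and image are assumptions:)

```yaml
# Hedged sketch: gate pod startup on zookeeper answering its port.
initContainers:
  - name: wait-for-zookeeper
    image: busybox
    command: ['sh', '-c', 'until nc -z zookeeper 2181; do sleep 2; done']
```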
clarkb | mnaser: the cluster (zuul, nodepool and zk) will eventually converge on a happy state once zk is up and running right? it's not like it's going into a permanent error state? | 22:28 |
mnaser | clarkb: it will, but technically during a rollout, if something is borked, it will just keep rolling out and breaking everything | 22:29 |
clarkb | but is it broken? | 22:29 |
clarkb | (If it is I think that is a bug worth fixing) | 22:29 |
mnaser | right but we can forget zookeeper in this case, say, if you made a typo in your config and you're using the zuul helm charts | 22:29 |
mnaser | if the first pod goes up and there's no readiness check, it will start rolling out the second one | 22:30 |
mnaser | so you might roll out a broken config.. but if we have a readiness check where nodepool can tell k8s "ok i'm good to go", it can safely continue with the rollout | 22:30 |
clarkb | sure, checking for proper errors and not needing to wait for services to start in sequence are two different problems | 22:30 |
clarkb | I think we should avoid needing strict sequencing | 22:31 |
mnaser | yes, i agree, i think i started with one problem set and moved to another in the conversation | 22:31 |
*** armstrongs has quit IRC | 22:31 | |
clarkb | right what I was trying to get at earlier is if you've found a sequencing issue we should fix that at the source not workaround it | 22:31 |
clarkb | however, config validation is a good thing to verify and both zuul and nodepool have config validation for the yaml bits but not the ini bits iirc | 22:32 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:32 |
mnaser | ya i think the idea is to find a way to get nodepool to say "ok i'm up and im healthy" | 22:33 |
mnaser | and then k8s can poll that as a readiness/liveness check | 22:33 |
corvus | mnaser: i am not opposed to adding http (prometheus) endpoints that expose readiness | 22:34 |
corvus | would that help? | 22:34 |
mnaser | yeah, so generally we'd probably end up (ideally) with 3 endpoints -- readiness, liveness and metrics | 22:34 |
mnaser | readiness can equal liveness but metrics (prometheus) is different, but yeah, if we add an http endpoint, that's probably enough infrastructure to do it | 22:35 |
mnaser | reason is most of the liveness and readiness checks are http request based rather than parsing metrics | 22:35 |
corvus | okay, i thought readiness/liveness was part of prom; if it isn't let's set that aside for now, because prom for metrics is complicated | 22:35 |
corvus | i'll rephrase that as "i think it would be fine to add http endpoints that expose liveness info" | 22:36 |
corvus | and readiness :) | 22:36 |
mnaser | cool, i don't know if i have the time to dig into this but it would be neat if nodepool started reporting 200 OK on a port once it's connected to zookeeper and all threads are up and running | 22:37 |
corvus | at the same time, i think nodepool can run a config check on its config as a gate job :) | 22:37 |
corvus | but, belts and suspenders. we can have both. | 22:37 |
corvus | mnaser: can you take a look at https://review.opendev.org/702052 ? | 22:38 |
corvus | it's the most complex helm i've written to date :) | 22:38 |
mnaser | corvus: i am, im trying to double check the ternary formatting | 22:38 |
mnaser | so it looks like you can also do: true | ternary "foo" "bar" | 22:39 |
mnaser | not that i think its a big deal, but that is easier for my brain to process | 22:39 |
corvus | oh neat, that's sort of ansibleish | 22:39 |
corvus | yeah, it took me a minute to understand the docs and realize the condition was last | 22:39 |
mnaser | helm uses this library https://github.com/Masterminds/sprig afaik | 22:40 |
mnaser | https://github.com/Masterminds/sprig/blob/48e6b77026913419ba1a4694dde186dc9c4ad74d/docs/defaults.md so thats there, but yeah, it seems right, though easier to process the more ansible-y one | 22:40 |
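Both spellings render the same: sprig's ternary takes the condition last in function form, which is what makes the piped form read more naturally. The value names below are illustrative, not the chart's actual keys.

```yaml
# function form: the condition argument comes last
secretName: {{ ternary "external-secret" "chart-managed-secret" .Values.useExternalSecret }}
# equivalent piped form, closer to ansible's "value | ternary(a, b)"
secretName: {{ .Values.useExternalSecret | ternary "external-secret" "chart-managed-secret" }}
```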
clarkb | mnaser: corvus do you think we need that for every microservice or just the "brain" | 22:40 |
corvus | mnaser: i'll switch to "|" | 22:41 |
clarkb | eg with zuul would we want each executor and merger etc to report it or have zuul web aggregate and report a single value | 22:41 |
mnaser | clarkb: the approach is usually that each microservice runs its own health check, because a single zuul executor can actually be having problems while the rest are ok | 22:41 |
corvus | clarkb: good q -- if we can have zuul-web do that, it would be best... mnaser would that be complicated to have the liveness check check a service? | 22:41 |
mnaser | https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request | 22:42 |
corvus | mnaser: right, but could you say, hit "http://zuul/live?host=executor-4" ? | 22:42 |
mnaser | so generally the http request hits a url/path on the same pod afaik | 22:42 |
clarkb | mnaser: and I guess you can assign domains to those readiness checks? because a single executor of many being broken is why we have more than one (or one reason to) | 22:42 |
clarkb | so zuul application being up is different than a specific executor being up | 22:42 |
mnaser | right, so if it is stateless and its broken, k8s can (up to you) decide to kill it and restart it | 22:42 |
mnaser | many/most of the probes seem to want to hit the pod itself. also, the other reason you wouldn't want it to hit zuul-web: if you have issues with zuul-web, you don't want your executors to all show as not alive | 22:43 |
mnaser | and then k8s goes ahead and kills them because they are failing health checks | 22:43 |
corvus | a liveness command may be easier? | 22:44 |
corvus | https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-command | 22:44 |
mnaser | yes, i forgot about the commands | 22:44 |
clarkb | that all makes sense. however, your earlier example cares about zuul/nodepool the application, not its individual parts | 22:44 |
clarkb | so I think we end up wanting both things | 22:44 |
mnaser | that might be a lot better, we can just write a file | 22:44 |
corvus | we've got an easy pattern for commands with 'zuul-scheduler foo' and 'zuul-executor foo' | 22:44 |
corvus | we could extend that to 'zuul-executor ready' | 22:45 |
corvus | to return an exit code based on readiness | 22:45 |
corvus | and apply that to every microservice, regardless of whether it otherwise runs an http server | 22:45 |
mnaser | i think the only tricky bit is that that's going to be out of process | 22:45 |
mnaser | so unless you're planning to have some sort of socket open that it can talk to.. or some file it lays down that the other process reads | 22:45 |
clarkb | mnaser: thats how the commands work today | 22:46 |
corvus | yeah, we have a socket | 22:46 |
mnaser | oh okay so thats perfect then | 22:46 |
mnaser | seems easy now, too easy :p | 22:46 |
corvus | scheduler and executor definitely have this already, maybe some others, and the command structure is standardized enough that we can put it on all the commands real quick | 22:46 |
corvus | so yeah, that's probably < 1 hour of work :) | 22:47 |
mnaser | zuul-web can get away with it and just use an http health check | 22:47 |
clarkb | nodepool too. though that socket is to zk | 22:47 |
mnaser | given it already runs a server | 22:47 |
clarkb | so might get weird if zk isn't up :) | 22:47 |
mnaser | well part of the ready could be an attempt to connect to zk and if that fails, then it's not ready | 22:47 |
corvus | clarkb: nodepool and zuul share a process startup framework, so should be easy to add commands to that | 22:47 |
corvus | clarkb: so we could do "nodepool-builder ready" | 22:48 |
corvus | (rather than "nodepool ready") | 22:48 |
clarkb | makes sense | 22:48 |
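Put together, the exec-probe version of this would look something like the fragment below. The "ready" subcommand is only a proposal in this conversation; nothing released at this point actually implements it.

```yaml
# Container-spec fragment; the "ready" subcommand is hypothetical here.
readinessProbe:
  exec:
    command: ["zuul-executor", "ready"]   # exit 0 over the command socket == ready
  periodSeconds: 30
  timeoutSeconds: 10
# the same shape would work for "nodepool-builder ready", with the
# readiness check attempting the zk connection as mnaser suggests
```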
mnaser | corvus: btw, could you also rebase or depends-on https://review.opendev.org/#/c/701764/ as well so we can see it in action too | 22:48 |
mnaser | (for the secret change) | 22:48 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:49 |
corvus | mnaser: oh yep 1 sec | 22:49 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Change builder container name https://review.opendev.org/701793 | 22:50 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add empty clouds value https://review.opendev.org/701865 | 22:50 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:50 |
corvus | mnaser: https://gerrit-review.googlesource.com/c/zuul/ops/+/250112 is what that looks like in action | 22:54 |
corvus | so the tradeoff is that the secret has to be set up external to argo/helm, but in exchange we get to just edit the nodepool file on disk. | 22:55 |
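The externally managed secret would be a plain manifest applied outside argo/helm, something like the sketch below; the name, namespace, and key are guesses at what the chart option expects, not taken from 702052.

```yaml
# Hypothetical externally-managed secret; names and keys are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: nodepool-config
  namespace: zuul
stringData:
  nodepool.yaml: |
    labels: []
    providers: []
```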
clarkb | corvus: mnaser if I'm looking at ^ and want to figure out where the chart is for zookeeper how do I find that? | 22:55 |
mnaser | i think corvus deployed it externally outside argo | 22:55 |
corvus | clarkb: readme line 26 | 22:56 |
*** avass has quit IRC | 22:56 | |
clarkb | ya that's just a giant xml doc | 22:56 |
corvus | clarkb: oh, yeah that's the thing we were talking about with fungi yesterday. | 22:57 |
mnaser | corvus: btw.. kubectl -n argocd get application/zookeeper -- trim that down, and add it as an app | 22:57 |
mnaser | so you can deploy zookeeper via argocd too :) | 22:57 |
mordred | pabelanger, Shrews: https://review.opendev.org/702053 <-- has a complete script implementing cleaning up after leaked volumes from BFV openstack clouds that are attached to non-existent servers | 22:58 |
clarkb | corvus: I'm curious to see if they modify the zk settings to make zk not terribly slow, and rotate the journal file, and run in a 3 or 5 pod cluster | 22:58 |
corvus | mnaser: how's that different than the "argocd app create" for zookeeper | 22:58 |
clarkb | corvus: really the first two things are probably most important, otherwise your zuul will be slow and then run out of disk and have a sad | 22:58 |
mnaser | corvus: argocd app create pretty much creates a local yaml manifest and pushes it out to the cluster :) | 22:58 |
mnaser | clarkb: it isn't listed anywhere, but it's here -- https://github.com/helm/charts/tree/master/incubator/zookeeper | 22:59 |
corvus | clarkb: should be ~= to this https://github.com/helm/charts/tree/master/incubator/zookeeper | 22:59 |
mnaser | and you can look at all the values https://github.com/helm/charts/blob/master/incubator/zookeeper/values.yaml | 22:59 |
corvus | mnaser: so kubectl -n argocd get application/zookeeper is a shorthand? | 23:00 |
mnaser | corvus: right, i meant that you can get the application definition from k8s and store it in-repo, so you dont have to bootstrap it with a cli command | 23:00 |
mnaser | you can kubectl apply -f zookeeper-app.yaml -f nodepool-app.yaml | 23:00 |
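In-repo, that Application definition might look like the sketch below; the repoURL, chart version, and namespaces are placeholders. Once committed, kubectl apply -f replaces the "argocd app create" bootstrap step.

```yaml
# zookeeper-app.yaml -- sketch; repoURL, version and namespaces are guesses
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: zookeeper
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://kubernetes-charts-incubator.storage.googleapis.com
    chart: zookeeper
    targetRevision: 2.1.0
  destination:
    server: https://kubernetes.default.svc
    namespace: zuul
```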
clarkb | ya antiaffinity is optional so you'll likely want to toggle that | 23:03 |
mnaser | also it uses a 5G pv by default i think | 23:04 |
clarkb | mnaser: corvus https://github.com/helm/charts/blob/master/incubator/zookeeper/values.yaml#L277 is the value you want to change to avoid running out of disk iirc | 23:09 |
clarkb | otherwise you grow an infinite number of snaps | 23:09 |
corvus | clarkb: thanks! | 23:10 |
clarkb | yup our system-config/manifests/site.pp comments confirm | 23:10 |
clarkb | opendev sets that to 6, which means every 6 hours it purges old snaps down to the snap retain count | 23:10 |
clarkb | the other thing we change is snapCount to 10000 | 23:11 |
clarkb | that sets a higher limit on how big of a transaction backlog zk can have before it throttles clients to catch up | 23:11 |
clarkb | iirc, because we are bursty when restarting services, that helps during those times | 23:12 |
corvus | clarkb: i've made notes to change both of those | 23:12 |
clarkb | cool | 23:12 |
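For reference, the two tunables discussed above, expressed as a values override for the incubator/zookeeper chart. The numbers come from this conversation, but the exact env key names are from memory of that chart and worth double-checking against its values.yaml.

```yaml
# Values-override sketch for incubator/zookeeper; key names unverified.
env:
  ZK_PURGE_INTERVAL: 6        # run autopurge every 6 hours
  ZK_SNAP_RETAIN_COUNT: 3     # snapshots kept by each purge (chart default)
  ZK_SNAP_COUNT: 10000        # larger txn backlog before zk snapshots/throttles
```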
corvus | mnaser: do you have any idea what the linter is on about here? https://zuul.opendev.org/t/zuul/build/ba271a844a6c46a38c303ba4e88e33ad | 23:12 |
clarkb | the other thing to check is log rotation but https://github.com/helm/charts/blob/master/incubator/zookeeper/templates/config-script.yaml#L92 takes care of that for you I believe. | 23:13 |
corvus | mordred: do you happen to have thoughts about a mysql-ish helm chart? istr you used helm for the pxc thing a while back? | 23:13 |
corvus | maybe just https://github.com/helm/charts/tree/master/stable/percona-xtradb-cluster ? | 23:13 |
mordred | corvus: I did - but I did a helm export and then committed the results | 23:14 |
mordred | corvus: yeah - I think that's what I used ... looking real quick | 23:14 |
mordred | corvus: kubernetes/export-percona-helm.sh | 23:14 |
mordred | corvus: in system-config - is what I used when we did things before | 23:14 |
mordred | corvus: but since you're using helm directly, you obviously don't need to export - but maybe the arguments will be helpful | 23:15 |
corvus | mordred: cool, thanks | 23:15 |
mnaser | corvus: i think we need to add an empty tenantConfig and a conditional on "extraFiles" | 23:15 |
corvus | mnaser: ^ for the sql part of this | 23:15 |
corvus | mnaser: oh i see, i'll see if i can fix that real quick | 23:16 |
mnaser | because it's probably rendering to like... | 23:16 |
mnaser | https://www.irccloud.com/pastebin/wRRWY0sm/ | 23:16 |
mnaser | personally, i'm pretty indifferent about db. i actually use this operator cause it has backups and what not: https://github.com/presslabs/mysql-operator | 23:17 |
mnaser | so for me just need to make sure we have a way to disable it | 23:18 |
mordred | oh cool. that operator was _not_ finished back when we were looking at gitea | 23:18 |
mnaser | mordred: yeah it's pretty polished and runs pretty well for us. the reason i am not using the percona one is because.. at least for our openstack use case, we're always reading/writing to one server all the time because of deadlock stuff | 23:19 |
corvus | i love the juxtaposition of "bulletproof" and "alpha and not suitable for critical production workloads" :) | 23:19 |
mordred | corvus: I do not have emotional ties to the other thing - that was just best I could find back then | 23:20 |
mnaser | so running a master/slave (ehhh, do we have another term folks use these days for dbs?) does the same thing | 23:20 |
mordred | mnaser: oh - right, that's replication based not galera based, yeah? | 23:20 |
mnaser | yep | 23:20 |
mordred | mnaser: also - I probably don't want to know about the deadlocks thing with openstack do I? | 23:21 |
clarkb | tristanC is using postgres in his operator demo for zuul | 23:21 |
mnaser | mordred: https://www.percona.com/blog/2014/09/11/openstack-users-shed-light-on-percona-xtradb-cluster-deadlock-issues/ | 23:21 |
clarkb | fwiw | 23:21 |
mnaser | "The simplest way to overcome this issue from the operator’s point of view is to use only one writer node for these types of transactions. This usually involves configuration change at the load-balancer level." | 23:22 |
mnaser | that's 2014 but i don't know if much work was put into rewriting those queries so they don't deadlock a cluster | 23:22 |
fungi | the overlord/subjugated model of database clustering | 23:23 |
mordred | oh - right. I remember discussion on that and discussions about not doing things that way - and I'm pretty sure it died on the vine | 23:23 |
*** rfolco has joined #zuul | 23:25 | |
tristanC | clarkb: i just use pg because i find it easier to setup and use | 23:25 |
clarkb | ah ok so nothing to do with operator readiness | 23:25 |
*** mattw4 has quit IRC | 23:27 | |
corvus | mnaser: it looks like extraFiles is the tricky part there; i can't (with my limited helm experience) find an easy way to do that kind of arbitrary include. one way to fix it would be to iterate over it as a dict | 23:28 |
tristanC | to help with service readiness, i added init-containers to most services to run python -c 'socket.connect(service, port)' on the dependency (e.g. executor -> gearman -> zk -> db); this makes the deployment a bit slower but the service logs are cleaner that way | 23:29 |
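Expanded into an actual pod-spec fragment, that init-container pattern might look like this; the image, service name, and port are placeholders, and the loop is spelled out since a bare socket.connect would exit nonzero on the first refused connection.

```yaml
# Pod-spec fragment; image, host and port are assumptions.
initContainers:
  - name: wait-for-zookeeper
    image: python:3-alpine
    command:
      - python
      - -c
      - |
        import socket, time
        # block until the dependency accepts TCP connections
        while True:
            try:
                socket.create_connection(("zookeeper", 2181), timeout=2).close()
                break
            except OSError:
                time.sleep(1)
```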
mnaser | corvus: you could just simply add an if statement, since an empty map in Golang evaluates to false | 23:30 |
corvus | mnaser: ah yep | 23:30 |
mnaser | I believe so anyways. Alternatively, you can look at how I did nodeSelector and tolerations too | 23:31 |
* mnaser is on mobile so can’t point to specifics | 23:31 |
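Combining both suggestions, the template fix might look roughly like this; the .Values path and surrounding ConfigMap layout are guesses at the chart's structure, not the actual patch.

```yaml
# Template sketch: guard on the (possibly empty) map, then iterate it.
{{- if .Values.launcher.config.extraFiles }}
{{- range $name, $content := .Values.launcher.config.extraFiles }}
  {{ $name }}: |
{{ $content | indent 4 }}
{{- end }}
{{- end }}
```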
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add Zuul charts https://review.opendev.org/700460 | 23:31 |
*** pcaruana has quit IRC | 23:35 | |
*** rfolco has quit IRC | 23:40 | |
*** michael-beaver has quit IRC | 23:57 | |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Allow tenant config file to be managed externally https://review.opendev.org/702057 | 23:58 |