*** rfolco has joined #zuul | 00:06 | |
*** jamesmcarthur has quit IRC | 00:11 | |
*** jamesmcarthur has joined #zuul | 00:11 | |
*** armstrongs has joined #zuul | 00:12 | |
*** armstrongs has quit IRC | 00:21 | |
mordred | remote: https://review.opendev.org/715331 Release 3.1.0 of cliff | 00:24 |
mordred | that should fix the py35 issue when it lands | 00:24 |
*** rlandy has quit IRC | 00:28 | |
*** jamesmcarthur has quit IRC | 00:29 | |
*** jamesmcarthur has joined #zuul | 00:30 | |
*** jamesmcarthur has quit IRC | 00:30 | |
*** jamesmcarthur has joined #zuul | 00:31 | |
*** jamesmcarthur has quit IRC | 00:36 | |
*** sgw has quit IRC | 00:38 | |
*** sgw has joined #zuul | 00:38 | |
*** jamesmcarthur has joined #zuul | 00:39 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 00:57 |
*** jamesmcarthur has quit IRC | 01:00 | |
*** sgw has quit IRC | 01:06 | |
*** sgw has joined #zuul | 01:08 | |
*** rlandy has joined #zuul | 01:23 | |
*** rlandy has quit IRC | 01:33 | |
*** zxiiro has quit IRC | 01:35 | |
*** jamesmcarthur has joined #zuul | 01:38 | |
*** adamw has quit IRC | 01:43 | |
*** adamw has joined #zuul | 01:43 | |
*** jamesmcarthur has quit IRC | 01:49 | |
*** adamw has quit IRC | 01:58 | |
*** adamw has joined #zuul | 02:02 | |
*** jamesmcarthur has joined #zuul | 02:26 | |
*** jamesmcarthur has quit IRC | 02:32 | |
*** jamesmcarthur has joined #zuul | 02:32 | |
*** ysandeep|away is now known as ysandeep|rover | 02:34 | |
*** Goneri has quit IRC | 02:37 | |
*** swest has quit IRC | 02:54 | |
*** rfolco has quit IRC | 02:55 | |
*** jamesmcarthur has quit IRC | 02:55 | |
*** bhavikdbavishi has joined #zuul | 02:55 | |
*** jamesmcarthur has joined #zuul | 02:57 | |
*** bhavikdbavishi has quit IRC | 03:00 | |
*** jamesmcarthur has quit IRC | 03:06 | |
*** swest has joined #zuul | 03:09 | |
*** jamesmcarthur has joined #zuul | 03:13 | |
*** jamesmcarthur has quit IRC | 03:31 | |
*** jamesmcarthur has joined #zuul | 04:14 | |
*** bhavikdbavishi has joined #zuul | 04:24 | |
*** saneax has joined #zuul | 04:27 | |
*** bhavikdbavishi1 has joined #zuul | 04:29 | |
*** bhavikdbavishi has quit IRC | 04:30 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 04:30 | |
*** jamesmcarthur has quit IRC | 05:10 | |
*** jamesmcarthur has joined #zuul | 05:23 | |
*** jamesmcarthur has quit IRC | 05:27 | |
*** bhavikdbavishi has quit IRC | 05:33 | |
*** bhavikdbavishi has joined #zuul | 05:34 | |
*** evrardjp has quit IRC | 05:36 | |
*** evrardjp has joined #zuul | 05:36 | |
*** ysandeep|rover is now known as ysandeep|rover|b | 05:41 | |
*** bhavikdbavishi has quit IRC | 05:57 | |
*** sgw has quit IRC | 06:02 | |
*** bhavikdbavishi has joined #zuul | 06:08 | |
*** ysandeep|rover|b is now known as ysandeep|rover | 06:08 | |
*** bhavikdbavishi has quit IRC | 06:49 | |
*** dpawlik has joined #zuul | 07:11 | |
*** bhavikdbavishi has joined #zuul | 07:40 | |
*** jpena|off is now known as jpena | 08:14 | |
*** ysandeep|rover is now known as ysandeep|rover|l | 08:33 | |
*** ysandeep|rover|l is now known as ysandeep|rover | 08:59 | |
*** tosky has joined #zuul | 09:29 | |
*** sshnaidm|afk is now known as sshnaidm|off | 09:50 | |
*** pabelanger has quit IRC | 09:58 | |
*** fbo has quit IRC | 09:58 | |
*** jpena has quit IRC | 09:59 | |
*** jpena has joined #zuul | 10:06 | |
*** fbo has joined #zuul | 10:11 | |
*** fbo has quit IRC | 10:36 | |
*** ysandeep|rover is now known as ysandeep|rov|brb | 10:41 | |
*** bhavikdbavishi has quit IRC | 10:51 | |
*** bhavikdbavishi has joined #zuul | 10:52 | |
*** ysandeep|rov|brb is now known as ysandeep|rover | 11:03 | |
*** arxcruz is now known as arxcruz|off | 11:19 | |
*** tobias-urdin has joined #zuul | 11:28 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: WIP: Enforce sql connections for scheduler and web https://review.opendev.org/630472 | 11:48 |
*** ysandeep|rover is now known as ysandeep|rov|mtg | 12:00 | |
*** rlandy has joined #zuul | 12:03 | |
*** jpena is now known as jpena|lunch | 12:05 | |
*** ysandeep|rov|mtg is now known as ysandeep|rover | 12:37 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version https://review.opendev.org/715415 | 12:41 |
*** hashar has joined #zuul | 12:44 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version https://review.opendev.org/715415 | 12:45 |
*** rfolco has joined #zuul | 12:48 | |
mnaser | morning all | 12:53 |
mnaser | would anyone like to +W this today? we hit the 2 weeks - https://review.opendev.org/#/c/712804/1 | 12:54 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: install-kubernetes: fix missing package https://review.opendev.org/715418 | 12:59 |
mnaser | tristanC: fyi ^ | 13:00 |
tristanC | mnaser: thanks! | 13:00 |
*** openstackstatus has quit IRC | 13:01 | |
*** openstack has joined #zuul | 13:05 | |
*** ChanServ sets mode: +o openstack | 13:05 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version https://review.opendev.org/715415 | 13:07 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Required SQL reporters https://review.opendev.org/630472 | 13:15 |
*** hashar_ has joined #zuul | 13:20 | |
*** hashar has quit IRC | 13:20 | |
tristanC | mnaser: arg, install-kubernetes still fails because of `file (/home/zuul/.minikube/client.key) is absent, cannot continue` | 13:24 |
*** hashar_ is now known as hsahar | 13:26 | |
*** hsahar is now known as hashar | 13:26 | |
tristanC | how about we stop using minikube master? | 13:26 |
flaper87 | how many concurrent jobs can a Zuul executor handle? Do we have some stats/numbers on this? | 13:32 |
tristanC | flaper87: iirc this is controlled by https://zuul-ci.org/docs/zuul/discussion/components.html#attr-executor.load_multiplier | 13:35 |
tristanC | flaper87: the amount of jobs depends on the available cpu and that load_multiplier setting | 13:36 |
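[editor's note] tristanC's answer can be made concrete: per the Zuul executor docs, the load governor stops accepting new jobs once the system load average exceeds `load_multiplier` (default 2.5) times the CPU count. A minimal illustrative sketch of that check (not Zuul's actual code; the real executor also applies RAM and disk governors):

```python
def max_load_threshold(cpu_count: int, load_multiplier: float = 2.5) -> float:
    """Load-average ceiling above which the executor stops taking jobs."""
    return load_multiplier * cpu_count

def accepting_jobs(load_avg: float, cpu_count: int,
                   load_multiplier: float = 2.5) -> bool:
    # Sketch of the documented load governor only; the real executor
    # also checks available memory and disk before accepting a job.
    return load_avg < max_load_threshold(cpu_count, load_multiplier)
```

So an 8-core executor with the default multiplier keeps accepting jobs until the load average reaches 20.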
*** Goneri has joined #zuul | 13:36 | |
flaper87 | awesome, thanks. I was trying to put a number on it. Do we know how many jobs does a zuul-executor in the OpenStack deployment handle? | 13:37 |
tristanC | flaper87: you can see that in http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=24&fullscreen&orgId=1 | 13:39 |
*** ysandeep|rover is now known as ysandeep|away | 13:39 | |
openstackgerrit | Merged zuul/zuul master: executor: drop --address=127.0.0.1 from kubectl https://review.opendev.org/715308 | 13:43 |
corvus | flaper87: somewhere between 80 and 100; that's an 8g vm | 13:43 |
flaper87 | corvus: gotcha! thanks and thank you tristanC | 13:44 |
AJaeger | zuul-maint, do we want to cap stestr for zuul-jobs to unbreak py27 job? See https://review.opendev.org/715415 | 13:45 |
openstackgerrit | Merged zuul/zuul master: Display clean error message for missing secret https://review.opendev.org/713469 | 13:45 |
flaper87 | corvus: are there other zuul services running in that same 8gb VM ? | 13:45 |
corvus | flaper87: no, we have 12 of those vms, each dedicated to running zuul-executor | 13:46 |
flaper87 | gotcha! thanks | 13:46 |
fungi | AJaeger: it seems like that's the only choice if there's a newer release out there which mistakenly declares python 2.7 support, as mtreinish observed that's tough to undo | 13:46 |
corvus | flaper87: (our current total capacity is around 1000 test nodes, so that's sized closer to 80 per executor) | 13:47 |
corvus | flaper87: (but sometimes our capacity moves up to about 1200; we probably wouldn't add another executor unless it exceeds that) | 13:47 |
*** sgw has joined #zuul | 13:50 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Required SQL reporters https://review.opendev.org/630472 | 13:51 |
AJaeger | fungi: we could: Release a version that re-adds py27 support - and works - and then drop 2.7 again and do it properly. That is the dance mordred is doing on openstacksdk right now... And if that happens, we can revert my change. | 13:51 |
tobiash | the version dance :-P | 13:52 |
fungi | AJaeger: yeah, it would need to be higher than the version number which dropped python2 support though | 13:52 |
fungi | but doable, you're right | 13:52 |
*** avass has joined #zuul | 13:53 | |
fungi | i mean, with 3.0.0 broken, i suppose you could re-tag the last <3.0.0 release as 4.0.0 and then re-tag the fixed master branch as 5.0.0 with the correct python version restriction | 13:54 |
fungi | so don't actually need to revert things | 13:55 |
mordred | yeah. we talked about doing that for sdk - but reno got real confused | 13:55 |
fungi | (since it does seem to use pbr, so no additional patches required to change the version) | 13:55 |
*** bhavikdbavishi has quit IRC | 13:55 | |
fungi | mordred: ahh, i didn't realize stestr was also using reno | 13:55 |
mordred | but - maybe that's not an issue for stestr | 13:55 |
mordred | I don't know that it does - that was mostly - that was the plan I liked the most until it turned out reno didn't like that plan | 13:56 |
mordred | so I think it's a good plan in general | 13:56 |
mordred | but otherwise, yeah - revert, tag, re-revert, fix python, tag | 13:56 |
fungi | i think mtreinish was also trying to cope with a somewhat stale repository which sat collecting python 2 removal changes from other maintainers for a very long time, so reverting it all could be a major pain | 13:57 |
mordred | yeah. in his case just doing the retag might be the easier route | 13:57 |
fungi | i'm not really clear on the maintenance situation for stestr, i may be thinking of testr | 13:57 |
mordred | fungi: no reno in stestr repo that I can see | 13:58 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Remove bashate from test-requirements https://review.opendev.org/715328 | 14:00 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 14:00 |
AJaeger | just rebased to resolve conflicts ^ | 14:00 |
tobiash | woot, just learned about disk image inheritance (https://review.opendev.org/713157), we have external scripting that enables just this for our config :) | 14:01 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed https://review.opendev.org/703631 | 14:09 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: zuul-restart: prevent issue when services are restarted out of order https://review.opendev.org/715424 | 14:09 |
openstackgerrit | Merged zuul/zuul-jobs master: Cap stestr version https://review.opendev.org/715415 | 14:09 |
corvus | tobiash: yay! let me know if it works for you | 14:09 |
tobiash | corvus: I'll try it after our next nodepool update :) | 14:10 |
bolg | corus: I've updated https://review.opendev.org/#/c/630472/ according to https://etherpad.openstack.org/p/zuulv4. It has -2 from your last review | 14:10 |
tobiash | then I can throw away parts of our config generation scripts :) | 14:11 |
bolg | corvus: ^^^ | 14:11 |
AJaeger | any takers to +2A the revert of the download-script change, please? https://review.opendev.org/#/c/715325/ | 14:11 |
tobiash | AJaeger: done | 14:11 |
tristanC | zuul-maint (or rather zuul-operator-maint): i documented some tasks that should prevent issues when the scheduler restarts before the mergers, but i'm not satisfied by the requirements... could you please have a look and see if there is a better way to fix: https://review.opendev.org/#/c/715424/1/roles/zuul-restart-when-zuul-conf-changed/tasks/main.yaml | 14:12 |
corvus | bolg: thanks -- i think we aren't quite ready to merge that yet, right? i think that's step 6? | 14:14 |
AJaeger | thanks, tobiash | 14:14 |
tobiash | corvus: yes, as I read it we can merge it after merging and releasing zk auth support? | 14:18 |
corvus | tobiash: yes i think that's right. and i think we're close to that | 14:19 |
tobiash | :) | 14:19 |
bolg | corvus: sure. As far as I understand the sequence the reporters are not really dependent on the previous steps, right? Theoretically an independent change if I am not mistaken. But no need to rush it from my POV | 14:22 |
openstackgerrit | Merged zuul/zuul-jobs master: Remove bashate from test-requirements https://review.opendev.org/715328 | 14:26 |
openstackgerrit | Merged zuul/zuul-jobs master: ensure-tox: use python3 by default https://review.opendev.org/712804 | 14:26 |
openstackgerrit | Merged zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 14:26 |
*** bhavikdbavishi has joined #zuul | 14:27 | |
tobiash | corvus: this change of mordred should unbreak the nodepool builds: https://review.opendev.org/715216 | 14:38 |
*** y2kenny has joined #zuul | 14:39 | |
*** bhavikdbavishi has quit IRC | 14:40 | |
corvus | tobiash: thx, i approved that and the followup | 14:40 |
tobiash | clarkb: do you in opendev see some nodes in nodepool nodelist with state deleted and None as provider? | 14:41 |
tobiash | we accumulated ~10 of those in half a year which seems to be some sort of distributed race between two launchers | 14:42 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: ci: pin minikube version to 1.8.2 https://review.opendev.org/715443 | 14:58 |
*** bhavikdbavishi has joined #zuul | 15:01 | |
y2kenny | corvus: the prepare-workspace-openshift thing worked and things are looking great. | 15:03 |
corvus | y2kenny: awesome! tristanC ^ :) | 15:03 |
*** bhavikdbavishi1 has joined #zuul | 15:04 | |
*** Open10K8S has joined #zuul | 15:05 | |
y2kenny | so what ended up blocking the k8s provisioning were 3 things (just as a summary): the fact that the executor needs to be a privileged pod due to bwrap; old kubectl in the executor image; and a slight variation in the prepare-workspace role | 15:05 |
*** bhavikdbavishi has quit IRC | 15:05 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 15:05 | |
*** zxiiro has joined #zuul | 15:12 | |
*** jamesmcarthur has joined #zuul | 15:14 | |
*** jamesmcarthur has quit IRC | 15:15 | |
*** jamesmcarthur has joined #zuul | 15:17 | |
y2kenny | corvus: now that things are working I have an observation that I don't understand. When I use the namespace label type, I was expecting the pod (or whichever resources I create via ansible k8s/k8s_raw) to be created under the namespace named by the label, but it is created under the automatically created namespace. Is this the expected behaviour? | 15:23 |
y2kenny | so anything created by the k8s/openshift driver will always be ephemeral? | 15:24 |
clarkb | tobiash: all of our deleting nodes have a provider set. And only one of them is more than a few minutes old | 15:27 |
*** bhavikdbavishi has quit IRC | 15:27 | |
tobiash | clarkb: oh, I meant 'deleted' | 15:27 |
clarkb | tobiash: we have noticed that there are some unlocked ready nodes in a specific provider that have been there for a while (almost like nodepool isn't trying to use those to fulfull new requests) but that is the only oddity I am aware of currently in our nodepool | 15:27 |
clarkb | tobiash: oh we have none in a deleted state in the db | 15:27 |
tobiash | great | 15:28 |
tobiash | it's very seldom, I'll observe this further | 15:28 |
*** bhavikdbavishi has joined #zuul | 15:28 | |
corvus | y2kenny: yes, the namespace doesn't have access to anything outside of that namespace; the expectation is that you would use that to deploy test versions of your services in the automatically created namespace. if you need to deploy something more complicated (something that needs cluster-level access), you might want to consider deploying a k8s cluster for the job (but you will probably need a vm for | 15:28 |
corvus | that). that's what we do to test the k8s driver in nodepool. if you want to deploy something in production, you could encode k8s credentials as a zuul secret and use them inside of a job to do that. | 15:28 |
tristanC | mnaser: pinning minikube to v1.8.2 works (demo in https://review.opendev.org/#/c/715443/ ) | 15:29 |
mnaser | tristanC: i have someone working on fixing the actual thing in the zuul-jobs repo. | 15:30 |
clarkb | is that the conntrack thing? seems like you can simply install and modprobe as necessary? | 15:31 |
mnaser | clarkb: i already did that, but there's another error now about the client.key | 15:31 |
clarkb | ah | 15:31 |
mnaser | openk10s (which isn't on irc now) found out it was because they moved the path where the key is stored | 15:31 |
mnaser | so they're pushing a change now to chown that file to ansible_user and that should unblock | 15:32 |
fungi | y2kenny: also more generally, nodepool's job is to create ephemeral resources for individual builds and then delete them as soon as the builds complete to free up available capacity for future builds. it's designed to not leave anything behind (and if it does, we consider that a "leak" of resources) | 15:32 |
y2kenny | corvus: understood. | 15:32 |
y2kenny | corvus, fungi: so are there any connection between k8s resources availability to the nodepool? | 15:33 |
*** bhavikdbavishi has quit IRC | 15:33 | |
y2kenny | what I mean is, it is entirely possible for me to try to schedule something in the k8s namespace via k8s_raw that is not available. | 15:33 |
y2kenny | in that case, would I fail the job or is there a way for me to re-enqueue the job from within the job? | 15:34 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path https://review.opendev.org/715418 | 15:34 |
y2kenny | as a concrete example, I want to create a pod with some k8s-label selector. Nodepool will schedule it because the namespace is available, but kubectl will probably fail. | 15:36 |
mnaser | Open10K8S: thanks for that patch, let's see how CI feels about it | 15:37 |
Open10K8S | thanks | 15:37 |
Open10K8S | mnaser: thanks | 15:38 |
corvus | y2kenny: even the k8s pod type in nodepool gets a custom namespace with the same restriction; but i may not understand what the k8s-label has to do in this situation | 15:39 |
y2kenny | corvus: I am thinking more on k8s concepts like node-taint, resource request, etc. | 15:40 |
tristanC | Open10K8S: mnaser: should the fix also restore the legacy client key path? I was actually using that ~/.minikube/client.key path to setup the nodepool provider in integration tests | 15:40 |
y2kenny | for example, with k8s_raw, I can supply a pod spec yaml that requests 8 GPU resources, which might not be available on the cluster | 15:41 |
mnaser | tristanC: does that matter? i don't think we've ever provided a "promised path for kubernetes keys" for that role in our api | 15:41 |
mnaser | and i would encourage using kubectl commands to get that info out, i think there might be a way | 15:42 |
mnaser | tristanC: or you can parse the .kube/config and go to the file that contains it | 15:42 |
y2kenny | or a more generic use case, a job can in theory request to create a pod that has x amount of CPU and y amount of memory for the pod | 15:42 |
tristanC | mnaser: works for me, i actually think the proper fix is to not use minikube master in zuul-operator | 15:42 |
tristanC | like that, we can fix upgrade issues in a controlled manner | 15:43 |
mnaser | i think if we're going to come up with a story of pinning things, we might as well do it in zuul-jobs so our consumers don't get confused as heck :) | 15:43 |
y2kenny | so what I was wondering was, is there a connection between nodepool and k8s on resource availability? and if not, is there a way to feed that back to the scheduler | 15:43 |
corvus | y2kenny: got it. since you're controlling the requesting of resources, you could put retries in the ansible that does that. you could limit the number of 'namespace' nodes in nodepool to try to approximate the level of resources your k8s cluster provides, so that the waiting happens in nodepool rather than in the job. | 15:44 |
y2kenny | corvus: essentially, is there co-operation between the zuul scheduler and the kubernetes cluster scheduler? | 15:44 |
y2kenny | corvus: ok | 15:45 |
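[editor's note] corvus's "put retries in the ansible" suggestion is, generically, a poll-until loop; in Ansible itself the natural spelling is `retries`/`until` on the k8s task. A hedged Python sketch of the same pattern, with `check` standing in for whatever probe verifies the requested resources (GPUs, CPU, memory) became available:

```python
import time

def wait_until(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll `check` until it returns True, the way an Ansible task with
    retries/until would, giving the cluster time to free the requested
    resources before the job gives up."""
    for i in range(attempts):
        if check():
            return True
        if i + 1 < attempts:
            sleep(delay)  # back off between polls
    return False
```

The `sleep` parameter is only there so the loop can be exercised without real delays.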
tristanC | mnaser: yeah, would be nice to express a 1.18 kubernetes, and let zuul-jobs pick the latest non breaking version. | 15:45 |
* mnaser would be more excited about having third party zuul-jobs running aginst minikube | 15:48 | |
mnaser | oh | 15:49 |
corvus | yeah, i'm kinda surprised minikube broke | 15:49 |
mnaser | i mean we could have been a bit more "smart" in our code and parsed the path from .kube/config | 15:51 |
mnaser | technically the path/location of that isn't exactly an api | 15:51 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed https://review.opendev.org/703631 | 15:55 |
tristanC | corvus: would you mind if we pin minikube in zuul-operator? | 15:55 |
tristanC | because having to deal with such failure is time consuming, i'd rather do the upgrade manually when needed | 15:56 |
corvus | tristanC: why not in the role? | 15:56 |
corvus | tristanC: i'm sorry, i haven't been following; i'd appreciate a summary | 15:56 |
mnaser | corvus: we run minikube master in our gate, they made 2 changes: conntrack is now a required package, and they moved the path of the client.crt (which we did a chown on in our role, so that failed with "missing file") | 15:57 |
tristanC | corvus: install-kubernetes started to fail because of minikube 1.19 release. mnaser and Open10K8S fixed the zuul-jobs, and i propose we do that https://review.opendev.org/715443 in zuul-operator (pin to 1.18) | 15:58 |
mnaser | corvus: fixing conntrack was a one liner, which exposed the other failure that Open10K8S fixed relatively quickly afterwards which is https://review.opendev.org/#/c/715418/ (and passed tests) | 15:58 |
corvus | so can we just merge the zuul-jobs change? | 15:59 |
mnaser | yes, and i would be in favour of *not* pinning zuul-operator to a specific version of minikube | 15:59 |
tristanC | corvus: sure, but i suspect using master will break zuul-operator ci again in the future | 15:59 |
mnaser | i'd rather we break zuul-operator ci and we fix it in zuul-jobs than have all our users use something that's broken without us knowing about it | 16:00 |
corvus | tristanC: then the ci will work :) | 16:00 |
corvus | yeah, if this happened a lot, i'd think differently, but if this happens very rarely then it seems like it's worth tracking master so we don't end up forgetting about the pin and never moving it | 16:00 |
mnaser | corvus: i actually think it might be a really cool opportunity to maybe ask and see if the minikube folks are interested in us doing 3rd party ci :) | 16:01 |
mnaser | but maybe that's me thinking too far out | 16:01 |
corvus | mnaser: i'd be in favor :) | 16:01 |
corvus | /.minikube/profiles/minikube/ | 16:01 |
corvus | that's a silly path :) | 16:01 |
mnaser | corvus: i think that's because you can have multiple profiles now and the default name of the default profile is "minikube" | 16:02 |
mnaser | so i guess that's to make sure each profile had their own cert path so to speak | 16:02 |
corvus | mnaser, tristanC, Open10K8S: should we handle both the old and new path? | 16:02 |
corvus | (in case someone is pinned to 1.8?) | 16:02 |
tristanC | corvus: mnaser: right, but well, this is time consuming... i was already fighting with another issue in how the zuul scheduler is restarted, and having to deal with a bleeding edge kubernetes is not helping | 16:02 |
Open10K8S | corvus: | 16:03 |
tristanC | i'd rather have a stable foundation, and do those kinds of intrusive changes (upgrading kubernetes) in a controlled manner | 16:03 |
mnaser | i like to think we're not just building the operator, but we're building zuul as a whole system and set of projects | 16:04 |
Open10K8S | corvus: As @mnaser mentioned, the .kube/config contains the right path for the client.key file | 16:04 |
mnaser | how hard is it to parse a yaml string in ansible? | 16:05 |
mnaser | we can run: "kubectl config view" and as part of that, we have this | 16:05 |
mnaser | https://www.irccloud.com/pastebin/4bzdAdPz/ | 16:05 |
mnaser | (that's me on an older release) | 16:05 |
tristanC | mnaser: that's actually a string in a record that is present in a list | 16:06 |
mnaser | i mean, we'll probably have just _one_ user anyways | 16:07 |
mnaser | and if we _really_ want to get fancy we can scan for the 'minikube' user.. | 16:07 |
corvus | tristanC: yes, just about everything everywhere has broken this week and surprised us. maybe because everyone is bored at home and has too much time on our hands. it's very stressful. but the work needs to be done at some point. i'd be in favor of a temporary pin until it's fixed if a problem comes up, but in this case, the problem is fixed, so i'd rather not have a pin that we forget to remove. | 16:07 |
corvus | mnaser: there's a "from_yaml" filter | 16:07 |
corvus | so if you run that command and register the output, you can do: set_fact: kube_config: {{ kubectl.output | from_yaml }} | 16:08 |
corvus | then kube_config['users'][0]['user']['client-key'] | 16:09 |
corvus | or even, loop: kube_config['users'] and then item['user']['client-key'] | 16:09 |
mnaser | corvus: i like the 2nd approach even more. Open10K8S: can we revise the patch to do that? | 16:09 |
corvus | yeah, that seems very future- and past- proof | 16:10 |
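[editor's note] corvus's recipe (run `kubectl config view`, parse it, loop over `users` and read each `client-key`) can be sketched outside Ansible too. This Python version parses the `-o json` form of the same output so it stays stdlib-only; the field names (`users`, `user`, `client-key`) come from the standard kubeconfig schema, and entries without a client key (e.g. token-based users) are skipped:

```python
import json

def client_key_paths(kubeconfig_json: str) -> dict:
    """Map user name -> client-key path from `kubectl config view -o json`
    output; the loop over `users` mirrors the Ansible suggestion above."""
    config = json.loads(kubeconfig_json)
    return {
        entry["name"]: entry["user"]["client-key"]
        for entry in config.get("users", [])
        if "client-key" in entry.get("user", {})
    }
```

Reading the path out of the config rather than hard-coding it is what makes the role both future- and past-proof across minikube releases.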
tristanC | zuul-maint : please review https://review.opendev.org/#/c/715418/ to fix the install-kubernetes | 16:10 |
mnaser | uh | 16:10 |
corvus | tristanC: since that change works now, you can go ahead and depends-on it and continue your zuul-operator work | 16:12 |
Open10K8S | manser: ok | 16:12 |
Open10K8S | mnaser: ok | 16:12 |
corvus | even while Open10K8S updates it to work for multiple versions | 16:12 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed https://review.opendev.org/703631 | 16:18 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: zuul-restart: prevent issue when services are restarted out of order https://review.opendev.org/715424 | 16:21 |
*** jamesmcarthur has quit IRC | 16:40 | |
*** dpawlik has quit IRC | 16:41 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: OIDCAuthenticator: add capabilities, scope option https://review.opendev.org/702275 | 16:47 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: admin REST API: zuul-web integration https://review.opendev.org/643536 | 16:52 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path https://review.opendev.org/715418 | 16:54 |
*** y2kenny has quit IRC | 17:00 | |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path https://review.opendev.org/715418 | 17:05 |
tristanC | Open10K8S: previous PS was also failing with "Invalid data passed to 'loop', it requires a list, got this instead: . Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup." | 17:07 |
mnaser | tristanC: weird, i just saw it pass here in front of me | 17:14 |
mnaser | https://www.irccloud.com/pastebin/LawvSPUS/ | 17:14 |
*** hashar has quit IRC | 17:17 | |
tristanC | last PS seems to fix the issue indeed | 17:18 |
mnaser | tristanC: i think it got the kube config from root and it had an empty yaml string | 17:19 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: zuul-restart: prevent issue when services are restarted out of order https://review.opendev.org/715424 | 17:22 |
tristanC | so, back to the initial problem, here is a zuul restart implementation using merger deployment metadata.resourceVersion checks ^ . Though i think there is still a chance the scheduler restarts before the mergers are reconfigured, and perhaps we need a 'pause' command for the scheduler service | 17:24 |
tristanC | corvus: this is quite a bit of extra tasks to work around the merge job failure we talked about yesterday... why can't the scheduler retry failed cat jobs? | 17:26 |
*** jpena is now known as jpena|off | 17:28 | |
corvus | tristanC: you can check eavesdrop for the whole thing, but a quick reminder is that in a large system with multiple mergers, we would retry the jobs very quickly. it should be the job of the operator not to put the system into an inconsistent state. | 17:32 |
tristanC | corvus: couldn't we use an exponential delay with a 10-minute bailout or something then? | 17:33 |
tristanC | i mean right now, when the scheduler reloads the config and a merge job fails, the tenant simply isn't loaded | 17:34 |
corvus | tristanC: probably. but the entire job of the operator is to control what services are running under what configuration. it should do this. | 17:35 |
tristanC | so in a large system, with many projects and connections, a single failure seems to result in a broken tenant | 17:35 |
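[editor's note] tristanC's proposed retry policy (exponential delay with a 10-minute bailout) could be spelled like this; purely a sketch of the idea as stated in the conversation, not anything in Zuul:

```python
def backoff_delays(initial=1.0, factor=2.0, total_budget=600.0):
    """Yield exponentially growing retry delays until a total time
    budget (the '10-minute bailout') would be exceeded."""
    delay, spent = initial, 0.0
    while spent + delay <= total_budget:
        yield delay
        spent += delay
        delay *= factor
```

With the defaults this retries at 1s, 2s, 4s, ... and stops once another doubling would blow the 600-second budget.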
mnaser | Open10K8S: you can see that https://review.opendev.org/#/c/715418/ failed, but it seems like it failed because gpg timed out; you can comment with "recheck" and it will attempt to load it again | 17:35 |
*** evrardjp has quit IRC | 17:36 | |
*** evrardjp has joined #zuul | 17:36 | |
tristanC | corvus: actually we don't control what services are running, kubernetes does. So the possible failure i mentioned before: when we stop the scheduler before restarting the mergers, kubernetes may still restart the scheduler before the mergers are also turned off | 17:37 |
tristanC | oh well, we can control the service by setting the scheduler replica to zero then | 17:37 |
mnaser | the way i have handled that in my operators (granted, those are golang), is that the operator watches for the other deployment states as part of the loop and makes adjustments | 17:38 |
mnaser | i.e. if deployment is still in a rollout, the operator returns a "re-queue" result | 17:38 |
tristanC | mnaser: the trick is we have to ensure the scheduler is stopped, then restart the mergers, then finally restart the scheduler. | 17:39 |
mnaser | right so when you are inside the "update" loop instead of a "create" loop, you would just wait for the deployments/restarts to complete | 17:40 |
mnaser | like it seems to me if you just ensured the scheduler deployment finished rollout, and not make any other changes to the rest of the system | 17:41 |
tristanC | mnaser: i think that's what was happening before with the `wait: true` attribute of the k8s ansible command | 17:41 |
*** dpawlik has joined #zuul | 17:41 | |
mnaser | this was a big reason why i was adamant on golang based things, because in stuff like this you can represent logic; right now we just have something that generates manifests using dhall and ansible playbooks, and controlling logic is really hard in that | 17:42 |
mnaser | and the wait: true is not really ideal because you're blocking the control loop at the time | 17:42 |
mnaser | instead with a golang operator you can tell it to re-queue, or listens to specific events | 17:43 |
tristanC | mnaser: i'm not sure it's much harder, it's just how the scheduler is designed and that we need to stop it before restarting the service | 17:44 |
tristanC | and it seems like the only way to stop a service in k8s is actually to set the replica count to 0 | 17:45 |
mnaser | the whole point of the operator was that it would have the smarts to be able to coordinate and make those actions | 17:45 |
tristanC | right, and i'm still learning how these actions can be performed in k8s | 17:46 |
mnaser | if we're going to make those changes inside zuul, the operator becomes mostly a glorified helm applier | 17:46 |
tristanC | it seems like if the scheduler would retry failed cat jobs, that would simplify zuul deployment. it seems like the issues we are experiencing in the zuul-operator ci are caused by out-of-order events, e.g. when the scheduler restarts before the mergers | 17:50 |
tristanC | if we can't fix that inside zuul, then i'm looking for ways to fix it in the operator... it's not that trivial to translate https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_restart.yaml into kubernetes actions, whether it is performed in golang or ansible | 17:52 |
*** dpawlik has quit IRC | 17:52 | |
tristanC | we need a way to ensure the scheduler doesn't restart too early, and iiuc the only way is to set the replica count to 0 and wait for the statefulset to reach the desired state of 0 schedulers | 17:53 |
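The stop/restart sequence tristanC describes could be sketched as Ansible tasks. This is a hedged illustration, not the operator's actual code: the resource names `zuul-scheduler`, `zuul-merger`, and the `zuul` namespace are assumptions, and it relies on the `kubernetes.core` collection being installed.

```yaml
# Sketch: stop the scheduler by scaling its StatefulSet to 0 replicas
# and waiting for the drain, restart the mergers, then scale the
# scheduler back up. All names are assumptions for illustration.
- name: Stop the scheduler and wait for it to terminate
  kubernetes.core.k8s_scale:
    kind: StatefulSet
    name: zuul-scheduler
    namespace: zuul
    replicas: 0
    wait: true
    wait_timeout: 300

- name: Restart the mergers (rollout via annotation bump)
  kubernetes.core.k8s:
    state: patched
    kind: Deployment
    name: zuul-merger
    namespace: zuul
    definition:
      spec:
        template:
          metadata:
            annotations:
              restarted-at: "{{ ansible_date_time.iso8601 }}"

- name: Start the scheduler again
  kubernetes.core.k8s_scale:
    kind: StatefulSet
    name: zuul-scheduler
    namespace: zuul
    replicas: 1
    wait: true
```

The `wait: true` here is the same blocking behavior mnaser objects to above; a controller-based operator would instead re-queue the reconcile until the StatefulSet reports 0 ready replicas.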
*** y2kenny has joined #zuul | 17:54 | |
*** y2kenny has quit IRC | 18:17 | |
*** y2kenny has joined #zuul | 18:37 | |
y2kenny | if I want to connect to gerrit from the job (to query the wip state of a change, for example) is there a recommended way to do it? | 18:39 |
mnaser | y2kenny: i think that might become a bit complicated, because of how you'd have to manage credentials for gerrit access (assuming it's not public) | 18:40 |
mnaser | y2kenny: but, you have access to the commit message for example inside the zuul structure | 18:40 |
clarkb | you also have the change number so could query the http rest api anonymously to get the status of that change (if that is allowed in your setup, it is on ours) | 18:40 |
y2kenny | ok. | 18:41 |
y2kenny | thanks | 18:41 |
clarkb | in general though I think we've found that having consistent job behavior that doesn't depend on external state is a good thing (basically if zuul thinks the job should run then that job should run the same way each time) | 18:42 |
clarkb | reduces variables that people have to consider when things don't work as expected | 18:43 |
y2kenny | that's a good tip | 18:43 |
y2kenny | this is still me trying to get around the not-wanting-to-use-gerrit-but-have-to-use-gerrit issue | 18:45 |
y2kenny | (I don't mean me not want to use gerrit) | 18:45 |
mnaser | corvus: is https://review.opendev.org/#/c/715418/ okay with you at this state? | 18:50 |
*** gmann is now known as gmann_lunch | 18:59 | |
corvus | mnaser, Open10K8S: lgtm, thanks! | 19:12 |
Open10K8S | corvus: my pleasure | 19:14 |
openstackgerrit | Ronelle Landy proposed zuul/zuul-jobs master: Add var to allow jobs to skip centos8 deps install https://review.opendev.org/715524 | 19:26 |
*** gmann_lunch is now known as gmann | 19:29 | |
*** y2kenny has quit IRC | 19:31 | |
openstackgerrit | Merged zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path https://review.opendev.org/715418 | 19:35 |
*** rf0lc0 has joined #zuul | 19:43 | |
*** ysandeep|away has quit IRC | 19:44 | |
*** ysandeep has joined #zuul | 19:45 | |
*** avass has quit IRC | 19:45 | |
*** kgz has quit IRC | 19:45 | |
*** rfolco has quit IRC | 19:46 | |
*** kgz has joined #zuul | 19:51 | |
*** y2kenny has joined #zuul | 20:03 | |
y2kenny | for untrusted projects, if I want to add playbooks and roles, do they have to be in specific locations? (i.e. playbooks/ and roles/ at the root, parallel to .zuul.d/, or can I place them under .zuul.d/playbooks/ and .zuul.d/roles/ ?) | 20:05 |
clarkb | y2kenny: I think the playbooks can be in arbitrary locations. I'm looking up the example I know of | 20:07 |
clarkb | oh wait no that's still in playbooks/ hrm | 20:08 |
mordred | yes - playbooks can be in arbitrary locations - you reference them by path | 20:08 |
y2kenny | but will ansible know where to look for the roles? | 20:08 |
y2kenny | (let say I created a custom role within the untrusted project.) | 20:09 |
mordred | so if you put them in .zuul.d/playbooks/foo.yaml you'd need to say run: .zuul.d/playbooks/foo.yaml - one thing to keep in mind though is that putting yaml files in .zuul.d might be a bad idea purely because zuul config files are yaml and it loads all the things in .zuul.d - I'm pretty sure it doesn't recurse - but still, it might be mentally confusing | 20:09 |
mordred | for roles - there are two answers | 20:09 |
mordred | if you want the repo to be used in zuul in a roles: statement so that zuul uses it as a roles repo - the roles dir needs to be in the root of the repo | 20:09 |
y2kenny | oh right... I wasn't thinking about the yaml thing. | 20:10 |
mordred | if it's just roles you're using from inside the playbook - you can put the roles dir adjacent to the playbook | 20:10 |
mordred | so you could have playbooks/foo.yaml and playbooks/roles/my-role - and then use my-role in foo.yaml | 20:10 |
mordred | but then you can't use that role in _other_ zuul jobs | 20:10 |
mordred | in different repos | 20:10 |
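Putting mordred's two answers together, a sketch of the repo layout and the corresponding job config (all paths and names here are illustrative, not from the discussion):

```yaml
# Repo layout (illustrative):
#   playbooks/foo.yaml        <- referenced from the job by path
#   playbooks/roles/my-role/  <- visible only to playbooks next to it
#   roles/shared-role/        <- usable from other repos via "roles:"
#
# Job definition in the untrusted project's .zuul.yaml:
- job:
    name: my-job
    run: playbooks/foo.yaml
    # "roles:" exposes this repo's top-level roles/ directory,
    # including to jobs defined in other repos
    roles:
      - zuul: my-org/my-untrusted-project
```

As mordred notes, keeping playbooks out of .zuul.d/ avoids confusion with Zuul's own config file loading, even though arbitrary paths work.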
y2kenny | ok. Thanks mordred and clarkb | 20:10 |
*** rlandy is now known as rlandy|brb | 20:42 | |
y2kenny | I was trying to run markdownlint in one of my untrusted projects. The role executed but I got "no such file or directory" for .markdownlint/node_modules/.bin/markdownlint, did I miss some pre step? | 21:01 |
corvus | y2kenny: maybe run ensure-markdownlint first | 21:15 |
*** rlandy|brb is now known as rlandy | 21:16 | |
corvus | y2kenny: or if you have markdownlint installed already, we may need to update that role | 21:16 |
y2kenny | ok. sorry, I missed the ensure roles. | 21:17 |
corvus | i think that was the last one we added before we figured out how to make roles like that work with software pre-installed or not | 21:17 |
openstackgerrit | Merged zuul/nodepool master: Add libc6-dev to bindep https://review.opendev.org/715216 | 21:17 |
openstackgerrit | Merged zuul/nodepool master: Pin docker images to 3.7 explicitly https://review.opendev.org/715043 | 21:17 |
corvus | (basically, the ensure-markdownlint role will put markdownlint there, and the markdownlint role expects it there. if it's already installed, we need to update the role to check) | 21:17 |
y2kenny | corvus: is there a convention to follow in terms of setting up roles? | 21:21 |
corvus | y2kenny: we're sort of establishing the convention of "ensure-foo" and "foo". and ideally you'd put ensure-foo in a pre playbook | 21:22 |
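A minimal sketch of that "ensure-foo" / "foo" convention as job config (file paths are assumed; `ensure-markdownlint` and `markdownlint` are the zuul-jobs roles discussed above):

```yaml
# Sketch: install the tool in a pre-run playbook, run it in the main
# playbook. As mordred notes below, a pre-run failure retries the job
# instead of failing it, which is why the install step goes there.
- job:
    name: my-markdownlint
    pre-run: playbooks/pre.yaml
    run: playbooks/run.yaml

# playbooks/pre.yaml
- hosts: all
  roles:
    - ensure-markdownlint

# playbooks/run.yaml
- hosts: all
  roles:
    - markdownlint
```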
y2kenny | like what are the trade-offs of having a pre-installed/configured image vs doing prep in pre-run | 21:22 |
corvus | time+space vs universal applicability and simplicity | 21:22 |
corvus | in opendev, we're actually trying to simplify our images to reduce our maintenance cost | 21:23 |
y2kenny | ok. so ensure->task->fetch is typical | 21:23 |
corvus | (but big things that we use a lot, we'll probably keep on our images because they're worth it) | 21:23 |
y2kenny | how about having tools as part of the source repository? Do you guys have use cases like that? | 21:24 |
corvus | y2kenny: yeah, we occasionally run things out of a "tools/" directory. there's even a zuul-jobs role to run "tools/test-setup.sh" | 21:24 |
y2kenny | (the first example that comes to mind is the checkpatch.pl that comes with the kernel but I am not sure how typical that type of things are.) | 21:24 |
y2kenny | ok I see | 21:24 |
corvus | y2kenny: yeah, that would be a good case | 21:24 |
mordred | we also split into ensure-foo and foo so that we can make jobs do the ensure-foo step in the pre-playbook ... which will retry the job if it fails | 21:26 |
*** y2kenny38 has joined #zuul | 21:29 | |
*** y2kenny43 has joined #zuul | 21:30 | |
*** y2kenny43 has left #zuul | 21:30 | |
*** y2kenny has quit IRC | 21:32 | |
*** y2kenny has joined #zuul | 21:44 | |
*** rlandy has quit IRC | 22:15 | |
mnaser | hmm | 22:32 |
mnaser | y2kenny, corvus: do we not have a markdownlint job already in place that someone can just use? | 22:32 |
y2kenny | mnaser: that's the one I was using | 22:32 |
mnaser | y2kenny: oh interesting, so that means the job doesn't install by default? hmm | 22:33 |
y2kenny | but I need to put ensure-markdownlint in first | 22:33 |
y2kenny | and to use ensure-markdownlint I have to have a task to install npm | 22:33 |
mnaser | tbh if you are using the 'markdownlint' *job* and it's not "just working" then that's a bug | 22:33 |
y2kenny | sounds like there's a convention of ensure-job before job | 22:33 |
mnaser | y2kenny: ensure-role before role | 22:34 |
y2kenny | right | 22:34 |
y2kenny | OH | 22:34 |
y2kenny | you mean there's a job? | 22:34 |
y2kenny | let me double check | 22:34 |
mnaser | that's what I'm wondering and checking :) | 22:34 |
mnaser | y2kenny: there is a job -- https://zuul-ci.org/docs/zuul-jobs/general-jobs.html#job-markdownlint | 22:34 |
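Consuming that canned job is then a one-line addition to the project stanza (a sketch, assuming the zuul-jobs repo is already loaded in the tenant config):

```yaml
# Sketch: use the markdownlint job shipped in zuul-jobs directly;
# its pre-run playbook already handles the ensure-markdownlint step,
# so no local playbooks or roles are needed.
- project:
    check:
      jobs:
        - markdownlint
```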
corvus | mnaser: i agree, and i apologize for not thinking to mention that earlier y2kenny. | 22:35 |
y2kenny | yes. I should've just used that | 22:35 |
y2kenny | lol | 22:35 |
corvus | it's friday for everyone, right? :) | 22:35 |
mnaser | y2kenny: that should be plug and play in that case for you :) | 22:35 |
mnaser | corvus: friday .. monday .. wednesday .. i don't even know what day it is anymore | 22:35 |
mnaser | ENOSOCIALINTERACTION | 22:35 |
y2kenny | I will give that a try. thanks folks. | 22:36 |
mnaser | y2kenny: no problem! | 22:42 |
y2kenny | um... looks like the markdownlint job requires a bit more than my node provides, and it failed | 22:46 |
y2kenny | it looks for gpg and my base image doesn't have it | 22:47 |
y2kenny | in project configuration, I believe I can add variables in a job for a pipeline. for example | 23:31 |
y2kenny | check: | 23:32 |
y2kenny |   jobs: | 23:32 |
y2kenny |     - name: some-job | 23:32 |
y2kenny |       vars: | 23:32 |
y2kenny |         var-name: "value" | 23:32 |
y2kenny | but for some reason I am getting "extra keys not allowed @ data['check']['jobs'][0]['vars']['var-name']" | 23:33 |
y2kenny | is this a yaml formatting thing or I misunderstood the documentation? | 23:33 |
clarkb | y2kenny: you need a : after some-job | 23:34 |
clarkb | or wait no maybe not since you used name: | 23:34 |
clarkb | looking for examples now | 23:35 |
fungi | name and vars are the keys under the job at index 0 | 23:35 |
fungi | so no, format looks correct to me | 23:35 |
clarkb | https://opendev.org/opendev/system-config/src/branch/master/.zuul.yaml#L1641-L1645 is how we do it | 23:35 |
clarkb | we don't use name: | 23:35 |
clarkb | (is that an optional format?) | 23:35 |
fungi | ahh, yeah we allow the name: thing for project definitions but maybe that's not a thing in job definitions | 23:37 |
y2kenny | clarkb: um... when I do that I get expected str for dictionary value @ data['check']['jobs'][0]['some-job'] | 23:38 |
clarkb | y2kenny: ya when you do it the way we do you have to add the : at the end then indent vars under some-job | 23:38 |
y2kenny | I did... but I think I may have missed an indent... | 23:39 |
clarkb | the v in vars should be under the m in some-job | 23:39 |
y2kenny | yup. That does it! | 23:41 |
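For reference, the working form clarkb describes, with the job name as a mapping key (trailing colon) and `vars` indented one level under it; job and variable names are illustrative:

```yaml
# The "v" in vars lines up under the "m" in some-job, so vars nests
# inside the job entry rather than becoming a sibling key.
- project:
    check:
      jobs:
        - some-job:
            vars:
              var-name: "value"
```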
y2kenny | what does zuul run when it checks this on the server? Is there a way for me to run it locally with some tool? | 23:41 |
y2kenny | (the multiple patchset is no big deal... I am just curious) | 23:42 |
clarkb | y2kenny: it's doing a yaml load, then running its own data consistency checks on the data structures that get loaded | 23:42 |
clarkb | y2kenny: you could pretty easily do the yaml load step and ensure it parses yaml properly, but the data consistency checks are best/easiest done by zuul itself | 23:42 |
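The parse-only check clarkb mentions can be sketched in a few lines of Python (assumes the third-party PyYAML package is installed; Zuul's own consistency checks still only run on the server):

```python
# Sketch: a local well-formedness check for a Zuul config snippet using
# PyYAML. This catches YAML-level mistakes like the indentation problem
# above, but not Zuul-level errors ("extra keys not allowed", unknown
# jobs, etc.), which only the scheduler can evaluate.
import yaml  # third-party PyYAML package

snippet = """
- project:
    check:
      jobs:
        - some-job:
            vars:
              var-name: "value"
"""

data = yaml.safe_load(snippet)
job = data[0]["project"]["check"]["jobs"][0]
print(job["some-job"]["vars"]["var-name"])
```

Running `yaml.safe_load` over the whole file is enough to confirm the structure parses the way you intended before pushing a patchset.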
y2kenny | right. Ok. | 23:43 |
y2kenny | thanks for the help clarkb, fungi | 23:43 |
fungi | yw | 23:44 |
fungi | and yes, performing a dry run of zuul configuration checking is nontrivial because its context can potentially include the state of any branch of any repository in your whole system | 23:45 |
fungi | basically to determine "what will my zuul think of this" you have to ask it, or run a complete enough copy of your deployment to include all the bits which could be involved in assembling the job | 23:46 |
*** ysandeep is now known as ysandeep|rover | 23:48 | |
y2kenny | fungi: I see | 23:55 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!