Friday, 2020-03-27

*** rfolco has joined #zuul00:06
*** jamesmcarthur has quit IRC00:11
*** jamesmcarthur has joined #zuul00:11
*** armstrongs has joined #zuul00:12
*** armstrongs has quit IRC00:21
mordredremote:   https://review.opendev.org/715331 Release 3.1.0 of cliff00:24
mordredthat should fix the py35 issue when it lands00:24
*** rlandy has quit IRC00:28
*** jamesmcarthur has quit IRC00:29
*** jamesmcarthur has joined #zuul00:30
*** jamesmcarthur has quit IRC00:30
*** jamesmcarthur has joined #zuul00:31
*** jamesmcarthur has quit IRC00:36
*** sgw has quit IRC00:38
*** sgw has joined #zuul00:38
*** jamesmcarthur has joined #zuul00:39
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script"  https://review.opendev.org/71532500:57
*** jamesmcarthur has quit IRC01:00
*** sgw has quit IRC01:06
*** sgw has joined #zuul01:08
*** rlandy has joined #zuul01:23
*** rlandy has quit IRC01:33
*** zxiiro has quit IRC01:35
*** jamesmcarthur has joined #zuul01:38
*** adamw has quit IRC01:43
*** adamw has joined #zuul01:43
*** jamesmcarthur has quit IRC01:49
*** adamw has quit IRC01:58
*** adamw has joined #zuul02:02
*** jamesmcarthur has joined #zuul02:26
*** jamesmcarthur has quit IRC02:32
*** jamesmcarthur has joined #zuul02:32
*** ysandeep|away is now known as ysandeep|rover02:34
*** Goneri has quit IRC02:37
*** swest has quit IRC02:54
*** rfolco has quit IRC02:55
*** jamesmcarthur has quit IRC02:55
*** bhavikdbavishi has joined #zuul02:55
*** jamesmcarthur has joined #zuul02:57
*** bhavikdbavishi has quit IRC03:00
*** jamesmcarthur has quit IRC03:06
*** swest has joined #zuul03:09
*** jamesmcarthur has joined #zuul03:13
*** jamesmcarthur has quit IRC03:31
*** jamesmcarthur has joined #zuul04:14
*** bhavikdbavishi has joined #zuul04:24
*** saneax has joined #zuul04:27
*** bhavikdbavishi1 has joined #zuul04:29
*** bhavikdbavishi has quit IRC04:30
*** bhavikdbavishi1 is now known as bhavikdbavishi04:30
*** jamesmcarthur has quit IRC05:10
*** jamesmcarthur has joined #zuul05:23
*** jamesmcarthur has quit IRC05:27
*** bhavikdbavishi has quit IRC05:33
*** bhavikdbavishi has joined #zuul05:34
*** evrardjp has quit IRC05:36
*** evrardjp has joined #zuul05:36
*** ysandeep|rover is now known as ysandeep|rover|b05:41
*** bhavikdbavishi has quit IRC05:57
*** sgw has quit IRC06:02
*** bhavikdbavishi has joined #zuul06:08
*** ysandeep|rover|b is now known as ysandeep|rover06:08
*** bhavikdbavishi has quit IRC06:49
*** dpawlik has joined #zuul07:11
*** bhavikdbavishi has joined #zuul07:40
*** jpena|off is now known as jpena08:14
*** ysandeep|rover is now known as ysandeep|rover|l08:33
*** ysandeep|rover|l is now known as ysandeep|rover08:59
*** tosky has joined #zuul09:29
*** sshnaidm|afk is now known as sshnaidm|off09:50
*** pabelanger has quit IRC09:58
*** fbo has quit IRC09:58
*** jpena has quit IRC09:59
*** jpena has joined #zuul10:06
*** fbo has joined #zuul10:11
*** fbo has quit IRC10:36
*** ysandeep|rover is now known as ysandeep|rov|brb10:41
*** bhavikdbavishi has quit IRC10:51
*** bhavikdbavishi has joined #zuul10:52
*** ysandeep|rov|brb is now known as ysandeep|rover11:03
*** arxcruz is now known as arxcruz|off11:19
*** tobias-urdin has joined #zuul11:28
openstackgerritJan Kubovy proposed zuul/zuul master: WIP: Enforce sql connections for scheduler and web  https://review.opendev.org/63047211:48
*** ysandeep|rover is now known as ysandeep|rov|mtg12:00
*** rlandy has joined #zuul12:03
*** jpena is now known as jpena|lunch12:05
*** ysandeep|rov|mtg is now known as ysandeep|rover12:37
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version  https://review.opendev.org/71541512:41
*** hashar has joined #zuul12:44
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version  https://review.opendev.org/71541512:45
*** rfolco has joined #zuul12:48
mnasermorning all12:53
mnaserwould anyone like to +W this today? we hit the 2 weeks - https://review.opendev.org/#/c/712804/112:54
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: install-kubernetes: fix missing package  https://review.opendev.org/71541812:59
mnasertristanC: fyi ^13:00
tristanCmnaser: thanks!13:00
*** openstackstatus has quit IRC13:01
*** openstack has joined #zuul13:05
*** ChanServ sets mode: +o openstack13:05
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version  https://review.opendev.org/71541513:07
openstackgerritJan Kubovy proposed zuul/zuul master: Required SQL reporters  https://review.opendev.org/63047213:15
*** hashar_ has joined #zuul13:20
*** hashar has quit IRC13:20
tristanCmnaser: arg, install-kubernetes still fails because of `file (/home/zuul/.minikube/client.key) is absent, cannot continue`13:24
*** hashar_ is now known as hsahar13:26
*** hsahar is now known as hashar13:26
tristanChow about we stop using minikube master?13:26
flaper87how many concurrent jobs can a Zuul executor handle? Do we have some stats/numbers on this?13:32
tristanCflaper87: iirc this is controlled by https://zuul-ci.org/docs/zuul/discussion/components.html#attr-executor.load_multiplier13:35
tristanCflaper87: the amount of jobs depends on the available cpu and that load_multiplier setting13:36
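A rough sketch of what that means in practice (not from this discussion): the executor stops accepting new builds once the system load average exceeds load_multiplier times the number of CPUs, so the setting in zuul.conf is a throttle rather than a fixed job count. The value below is only an example; the documented default is 2.5.

    [executor]
    # Pause accepting new builds once the load average exceeds
    # load_multiplier * cpu_count on this executor (example value).
    load_multiplier = 2.0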
*** Goneri has joined #zuul13:36
flaper87awesome, thanks. I was trying to put a number on it. Do we know how many jobs does a zuul-executor in the OpenStack deployment handle?13:37
tristanCflaper87: you can see that in http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=24&fullscreen&orgId=113:39
*** ysandeep|rover is now known as ysandeep|away13:39
openstackgerritMerged zuul/zuul master: executor: drop --address=127.0.0.1 from kubectl  https://review.opendev.org/71530813:43
corvusflaper87: somewhere between 80 and 100; that's an 8g vm13:43
flaper87corvus: gotcha! thanks and thank you tristanC13:44
AJaegerzuul-maint, do we want to cap stestr for zuul-jobs to unbreak py27 job? See https://review.opendev.org/71541513:45
openstackgerritMerged zuul/zuul master: Display clean error message for missing secret  https://review.opendev.org/71346913:45
flaper87corvus: are there other zuul services running in that same 8gb VM ?13:45
corvusflaper87: no, we have 12 of those vms, each dedicated to running zuul-executor13:46
flaper87gotcha! thanks13:46
fungiAJaeger: it seems like that's the only choice if there's a newer release out there which mistakenly declares python 2.7 support, as mtreinish observed that's tough to undo13:46
corvusflaper87: (our current total capacity is around 1000 test nodes, so that's sized closer to 80 per executor)13:47
corvusflaper87: (but sometimes our capacity moves up to about 1200; we probably wouldn't add another executor unless it exceeds that)13:47
*** sgw has joined #zuul13:50
openstackgerritJan Kubovy proposed zuul/zuul master: Required SQL reporters  https://review.opendev.org/63047213:51
AJaegerfungi: we could: Release a version that re-adds py27 support - and works - and then drop 2.7 again and do it properly. That is the dance mordred is doing on openstacksdk right now... And if that happens, we can revert my change.13:51
tobiashthe version dance :-P13:52
fungiAJaeger: yeah, it would need to be higher than the version number which dropped python2 support though13:52
fungibut doable, you're right13:52
*** avass has joined #zuul13:53
fungii mean, with 3.0.0 broken, i suppose you could re-tag the last <3.0.0 release as 4.0.0 and then re-tag the fixed master branch as 5.0.0 with the correct python version restriction13:54
fungiso don't actually need to revert things13:55
mordredyeah. we talked about doing that for sdk - but reno got real confused13:55
fungi(since it does seem to use pbr, so no additional patches required to change the version)13:55
*** bhavikdbavishi has quit IRC13:55
fungimordred: ahh, i didn't realize stestr was also using reno13:55
mordredbut - maybe that's not an issue for stestr13:55
mordredI don't know that it does - that was mostly - that was the plan I liked the most until it turned out reno didn't like that plan13:56
mordredso I think it's a good plan in general13:56
mordredbut otherwise, yeah - revert, tag, re-revert, fix python, tag13:56
fungii think mtreinish was also trying to cope with a somewhat stale repository which sat collecting python 2 removal changes from other maintainers for a very long time, so reverting it all could be a major pain13:57
mordredyeah. in his case just doing the retag might be the easier route13:57
fungii'm not really clear on the maintenance situation for stestr, i may be thinking of testr13:57
mordredfungi: no reno in stestr repo that I can see13:58
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Remove bashate from test-requirements  https://review.opendev.org/71532814:00
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script"  https://review.opendev.org/71532514:00
AJaegerjust rebased to resolve conflicts ^14:00
tobiashwoot, just learned about disk image inheritance (https://review.opendev.org/713157), we have external scripting that enables just this for our config :)14:01
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed  https://review.opendev.org/70363114:09
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: zuul-restart: prevent issue when services are restarted out of order  https://review.opendev.org/71542414:09
openstackgerritMerged zuul/zuul-jobs master: Cap stestr version  https://review.opendev.org/71541514:09
corvustobiash: yay!  let me know if it works for you14:09
tobiashcorvus: I'll try it after our next nodepool update :)14:10
bolgcorus: I've updated https://review.opendev.org/#/c/630472/ according to https://etherpad.openstack.org/p/zuulv4. It has -2 from your last review14:10
tobiashthen I can throw away parts of our config generation scripts :)14:11
bolgcorvus: ^^^14:11
AJaegerany takers to +2A the revert of the download-script change, please? https://review.opendev.org/#/c/715325/14:11
tobiashAJaeger: done14:11
tristanCzuul-maint (or rather zuul-operator-maint): i documented some tasks that should prevent issues when the scheduler restarts before the mergers, but i'm not satisfied by the requirements... could you please have a look and see if there is a better way to fix: https://review.opendev.org/#/c/715424/1/roles/zuul-restart-when-zuul-conf-changed/tasks/main.yaml14:12
corvusbolg: thanks -- i think we aren't quite ready to merge that yet, right?  i think that's step 6?14:14
AJaegerthanks, tobiash14:14
tobiashcorvus: yes, as I read it we can merge it after merging and releasing zk auth support?14:18
corvustobiash: yes i think that's right.  and i think we're close to that14:19
tobiash:)14:19
bolgcorvus: sure. As far as I understand the sequence the reporters are not really dependent on the previous steps, right? Theoretically an independent change if I am not mistaken. But no need to rush it from my POV14:22
openstackgerritMerged zuul/zuul-jobs master: Remove bashate from test-requirements  https://review.opendev.org/71532814:26
openstackgerritMerged zuul/zuul-jobs master: ensure-tox: use python3 by default  https://review.opendev.org/71280414:26
openstackgerritMerged zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script"  https://review.opendev.org/71532514:26
*** bhavikdbavishi has joined #zuul14:27
tobiashcorvus: this change of mordred should unbreak the nodepool builds: https://review.opendev.org/71521614:38
*** y2kenny has joined #zuul14:39
*** bhavikdbavishi has quit IRC14:40
corvustobiash: thx, i approved that and the followup14:40
tobiashclarkb: do you in opendev see some nodes in nodepool nodelist with state deleted and None as provider?14:41
tobiashwe accumulated ~10 of those in half a year which seems to be some sort of distributed race between two launchers14:42
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: ci: pin minikube version to 1.8.2  https://review.opendev.org/71544314:58
*** bhavikdbavishi has joined #zuul15:01
y2kennycorvus: the prepare-workspace-openshift thing worked and things are looking great.15:03
corvusy2kenny: awesome! tristanC ^ :)15:03
*** bhavikdbavishi1 has joined #zuul15:04
*** Open10K8S has joined #zuul15:05
y2kennyso what ended up blocking the k8s provisioning was 3 things (just as a summary): the fact that the executor needs to be a privileged pod due to bwrap; an old kubectl in the executor image; and a slight variation in the prepare-workspace role15:05
*** bhavikdbavishi has quit IRC15:05
*** bhavikdbavishi1 is now known as bhavikdbavishi15:05
*** zxiiro has joined #zuul15:12
*** jamesmcarthur has joined #zuul15:14
*** jamesmcarthur has quit IRC15:15
*** jamesmcarthur has joined #zuul15:17
y2kennycorvus: now that things are working I have some observations that I don't understand.    When I use the namespace label type, I was expecting the pod (or whichever resources I created via ansible k8s/k8s_raw) to be created under the name of the namespace label.  But it is created under the automatically created namespace.  Is this the expected15:23
y2kennybehaviour?15:23
y2kennyso anything created by the k8s/openshift driver will always be ephemeral?15:24
clarkbtobiash: all of our deleting nodes have a provider set. And only one of them is more than a few minutes old15:27
*** bhavikdbavishi has quit IRC15:27
tobiashclarkb: oh, I meant 'deleted'15:27
clarkbtobiash: we have noticed that there are some unlocked ready nodes in a specific provider that have been there for a while (almost like nodepool isn't trying to use those to fulfill new requests) but that is the only oddity I am aware of currently in our nodepool15:27
clarkbtobiash: oh we have none in a deleted state in the db15:27
tobiashgreat15:28
tobiashit's very seldom, I'll observe this further15:28
*** bhavikdbavishi has joined #zuul15:28
corvusy2kenny: yes, the namespace doesn't have access to anything outside of that namespace; the expectation is that you would use that to deploy test versions of your services in the automatically created namespace.  if you need to deploy something more complicated (something that needs cluster-level access), you might want to consider deploying a k8s cluster for the job (but you will probably need a vm for15:28
corvusthat).  that's what we do to test the k8s driver in nodepool.  if you want to deploy something in production, you could encode k8s credentials as a zuul secret and use them inside of a job to do that.15:28
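A minimal sketch of that last option (repository, secret and job names are placeholders; the ciphertext would come from encrypting the kubeconfig with the project's public key):

    # .zuul.yaml in the deploying project
    - secret:
        name: prod-kubeconfig
        data:
          kubeconfig: !encrypted/pkcs1-oaep
            - <ciphertext produced with the project's public key>

    - job:
        name: deploy-to-prod
        run: playbooks/deploy.yaml
        secrets:
          - name: prod_kubeconfig     # exposed to playbooks/deploy.yaml as an Ansible variable
            secret: prod-kubeconfig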
tristanCmnaser: pinning minikube to v1.8.2 works (demo in https://review.opendev.org/#/c/715443/ )15:29
mnasertristanC: i have someone working on fixing the actual thing in the zuul-jobs repo.15:30
clarkbis that the contrack thing? seems like you can simply install and modprobe as necessary?15:31
mnaserclarkb: i already did that,  but there's another error now about the client.key15:31
clarkbah15:31
mnaseropenk10s (which isn't on irc now) found out it was because they moved the path where the key is stored15:31
mnaserso they're pushing a change now to chown that file to ansible_user and that should unblock15:32
fungiy2kenny: also more generally, nodepool's job is to create ephemeral resources for individual builds and then delete them as soon as the builds complete to free up available capacity for future builds. it's designed to not leave anything behind (and if it does, we consider that a "leak" of resources)15:32
y2kennycorvus: understood.15:32
y2kennycorvus, fungi: so are there any connection between k8s resources availability to the nodepool?15:33
*** bhavikdbavishi has quit IRC15:33
y2kennywhat I mean is, it is entirely possible for me to try to schedule something in the k8s namespace via k8s_raw that is not available.15:33
y2kennyin that case, would I fail the job or is there a way for me to re-enqueue the job from  within the job?15:34
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path  https://review.opendev.org/71541815:34
y2kennyas a concrete example, I want to create a pod with some k8s-label selector.  Nodepool will schedule it because the namespace is available but kubectl will probably fail.15:36
mnaserOpen10K8S: thanks for that patch, let's see how CI feels about it15:37
Open10K8Sthanks15:37
Open10K8Smnaser: thanks15:38
corvusy2kenny: even the k8s pod type in nodepool gets a custom namespace with the same restriction;  but i may not understand what the k8s-label has to do in this situation15:39
y2kennycorvus: I am thinking more on k8s concepts like node-taint, resource request, etc.15:40
tristanCOpen10K8S: mnaser: should the fix also restore the legacy client key path? I was actually using that ~/.minikube/client.key path to setup the nodepool provider in integration tests15:40
y2kennyfor example, with k8s_raw, I can supply a pod spec yaml that requests 8 GPU resources and that might not be available on the cluster15:41
mnasertristanC: does that matter?  i don't think we've ever provided a "promised path for kubernetes keys" for that role in our api15:41
mnaserand i would encourage using kubectl commands to get that info out, i think there might be a way15:42
mnasertristanC: or you can parse the .kube/config and go to the file that contains it15:42
y2kennyor a more generic use case, a job can in theory request to create a pod that has x amount of CPU and y amount of memory for the pod15:42
tristanCmnaser: works for me, i actually think the proper fix is to not use minikube master in zuul-operator15:42
tristanCthat way, we can fix upgrade issues in a controlled manner15:43
mnaseri think if we're going to come up with a story of pinning things, we might as well do it in zuul-jobs so our consumers don't get confused as heck :)15:43
y2kennyso what I was wondering was, is there a connection between nodepool and k8s on resource availability?  and if not, is there a way to feedback to the scheduler15:43
corvusy2kenny: got it.  since you're controlling the requesting of resources, you could put retries in the ansible that does that.  you could limit the number of 'namespace' nodes in nodepool to try to approximate the level of resources your k8s cluster provides, so that the waiting happens in nodepool rather than in the job.15:44
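A hedged sketch of that second suggestion in nodepool.yaml (the provider name, context and cap are made up, and whether your nodepool version supports a per-pool max-servers cap for the kubernetes driver is an assumption to verify):

    providers:
      - name: my-k8s-provider        # hypothetical provider name
        driver: kubernetes
        context: my-cluster-context  # hypothetical kube context
        pools:
          - name: main
            max-servers: 4           # rough approximation of what the cluster can run at once
            labels:
              - name: k8s-namespace
                type: namespace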
y2kennycorvus: essentially, are there co-operation between the zuul scheduler and the kubernetes cluster scheduler.15:44
y2kennycorvus: ok15:45
tristanCmnaser: yeah, would be nice to express a 1.18 kubernetes, and let zuul-jobs pick the latest non breaking version.15:45
* mnaser would be more excited about having third party zuul-jobs running aginst minikube15:48
mnaseroh15:49
corvusyeah, i'm kinda surprised minikube broke15:49
mnaseri mean we could have been a bit more "smart" in our code and parsed the path from .kube/config15:51
mnasertechnically the path/location of that isn't exactly an api15:51
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed  https://review.opendev.org/70363115:55
tristanCcorvus: would you mind if we pin minikube in zuul-operator?15:55
tristanCbecause having to deal with such failure is time consuming, i'd rather do the upgrade manually when needed15:56
corvustristanC: why not in the role?15:56
corvustristanC: i'm sorry, i haven't been following; i'd appreciate a summary15:56
mnasercorvus: we run minikube master in our gate, they made 2 changes: conntrack is now a required package, and they moved the path of the client.crt (which we did a chown on in our role, so that failed with "missing file")15:57
tristanCcorvus: install-kubernetes started to fail because of the minikube 1.9 release. mnaser and Open10K8S fixed the zuul-jobs, and i propose we do that https://review.opendev.org/715443 in zuul-operator (pin to 1.8.2)15:58
mnasercorvus: fixing conntrack was a one liner, which exposed the other failure that Open10K8S fixed relatively quickly afterwards which is https://review.opendev.org/#/c/715418/ (and passed tests)15:58
corvusso can we just merge the zuul-jobs change?15:59
mnaseryes, and i would be in favour of *not* pinning zuul-operator to a specific version of minikube15:59
tristanCcorvus: sure, but i suspect using master will break zuul-operator ci again in the future15:59
mnaseri'd rather we break zuul-operator ci and we fix it in zuul-jobs than have all our users use something that's broken without us knowing about it16:00
corvustristanC: then the ci will work :)16:00
corvusyeah, if this happened a lot, i'd think differently, but if this happens very rarely then it seems like it's worth tracking master so we don't end up forgetting about the pin and never moving it16:00
mnasercorvus: i actually think it might be a really cool opportunity to maybe ask and see if the minikube folks are interested in us doing 3rd party ci :)16:01
mnaserbut maybe that's me thinking too far out16:01
corvusmnaser: i'd be in favor :)16:01
corvus/.minikube/profiles/minikube/16:01
corvusthat's a silly path :)16:01
mnasercorvus: i think that's because you can have multiple profiles now and the default name of the default profile is "minikube"16:02
mnaserso i guess that's to make sure each profile had their own cert path so to speak16:02
corvusmnaser, tristanC, Open10K8S: should we handle both the old and new path?16:02
corvus(in case someone is pinned to 1.8?)16:02
tristanCcorvus: mnaser: right, but well, this is time consuming... i was already fighting with other issues in how the zuul scheduler is restarted, and having to deal with a bleeding edge kubernetes is not helping16:02
Open10K8Scorvus:16:03
tristanCi'd rather have a stable foundation, and do those kinds of intrusive changes (upgrading kubernetes) in a controlled manner16:03
mnaseri like to think we're not just building the operator, but we're building zuul as a whole system and set of projects16:04
Open10K8Scorvus: As @mnaser mentioned, the .kube/config contains the right path for the client.key file16:04
mnaserhow hard is it to parse a yaml string in ansible?16:05
mnaserwe can run: "kubectl config view" and as part of that, we have this16:05
mnaserhttps://www.irccloud.com/pastebin/4bzdAdPz/16:05
mnaser(that's me on an older release)16:05
tristanCmnaser: that's actually a string in a record that is present in a list16:06
mnaseri mean, we'll probably have just _one_ user anyways16:07
mnaserand if we _really_ want to get fancy we can scan for the 'minikube' user..16:07
corvustristanC: yes, just about everything everywhere has broken this week and surprised us.  maybe because everyone is bored at home and has too much time on our hands.  it's very stressful.  but the work needs to be done at some point.  i'd be in favor of a temporary pin until it's fixed if a problem comes up, but in this case, the problem is fixed, so i'd rather not have a pin that we forget to remove.16:07
corvusmnaser: there's a "from_yaml" filter16:07
corvusso if you run that command and register the output, you can do: set_fact: kube_config: {{ kubectl.output | from_yaml }}16:08
corvusthen kube_config['users'][0]['user']['client-key']16:09
corvusor even, loop: kube_config['users']  and then item['user']['client-key']16:09
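Assembled into Ansible tasks, that suggestion looks roughly like this (a sketch; the register name and the chown target are assumptions, and it only handles keys stored as file paths, not inline client-key-data):

    - name: Read the effective kube config
      command: kubectl config view
      register: kubectl_config

    - name: Parse the YAML output into a fact
      set_fact:
        kube_config: "{{ kubectl_config.stdout | from_yaml }}"

    - name: Make each client key readable by the job user
      become: true
      file:
        path: "{{ item['user']['client-key'] }}"
        owner: "{{ ansible_user }}"
      loop: "{{ kube_config['users'] }}"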
mnasercorvus: i like the 2nd approach even more.  Open10K8S: can we revise the patch to do that?16:09
corvusyeah, that seems very future- and past- proof16:10
tristanCzuul-maint : please review https://review.opendev.org/#/c/715418/ to fix the install-kubernetes16:10
mnaseruh16:10
corvustristanC: since that change works now, you can go ahead and depends-on it and continue your zuul-operator work16:12
Open10K8Smanser: ok16:12
Open10K8Smnaser: ok16:12
corvuseven while Open10K8S updates it to work for multiple versions16:12
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed  https://review.opendev.org/70363116:18
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: zuul-restart: prevent issue when services are restarted out of order  https://review.opendev.org/71542416:21
*** jamesmcarthur has quit IRC16:40
*** dpawlik has quit IRC16:41
openstackgerritMatthieu Huin proposed zuul/zuul master: OIDCAuthenticator: add capabilities, scope option  https://review.opendev.org/70227516:47
openstackgerritMatthieu Huin proposed zuul/zuul master: admin REST API: zuul-web integration  https://review.opendev.org/64353616:52
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path  https://review.opendev.org/71541816:54
*** y2kenny has quit IRC17:00
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path  https://review.opendev.org/71541817:05
tristanCOpen10K8S: previous PS was also failing with "Invalid data passed to 'loop', it requires a list, got this instead: . Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup."17:07
mnasertristanC: weird, i just saw it pass here infront of me17:14
mnaserhttps://www.irccloud.com/pastebin/LawvSPUS/17:14
*** hashar has quit IRC17:17
tristanClast PS seems to fix the issue indeed17:18
mnasertristanC: i think it got the kube config from root and it had an empty yaml string17:19
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: zuul-restart: prevent issue when services are restarted out of order  https://review.opendev.org/71542417:22
tristanCso, back to the initial problem, here is a zuul restart implementation using merger deployment metadata.resourceVersion checks ^  . Though i think there is still a chance the scheduler restarts before the mergers are reconfigured, and perhaps we need a 'pause' command for the scheduler service17:24
tristanCcorvus: this is quite a bit of extra tasks to work around the merge job failure we talked about yesterday... why can't the scheduler retry failed cat job again?17:26
*** jpena is now known as jpena|off17:28
corvustristanC: you can check eavesdrop for the whole thing, but a quick reminder is that in a large system with multiple mergers, we would retry the jobs very quickly.  it should be the job of the operator not to put the system into an inconsistent state.17:32
tristanCcorvus: couldn't we use an exponential delay with a 10 minute bailout or something then?17:33
tristanCi mean right now, when the scheduler reloads the config and a merger job fails, the tenant simply isn't loaded17:34
corvustristanC: probably.  but the entire job of the operator is to control what services are running under what configuration.  it should do this.17:35
tristanCso in a large system, with many projects and connections, a single failure seems to result in a broken tenant17:35
mnaserOpen10K8S: you can see that https://review.opendev.org/#/c/715418/ failed but it seems like it fail because gpg timed out, you can comment with "recheck" and it will attempt to load it again17:35
*** evrardjp has quit IRC17:36
*** evrardjp has joined #zuul17:36
tristanCcorvus: actually we don't control what services are running, kubernetes does. So the possible failure i mentioned before: when we stop the scheduler before restarting the mergers, kubernetes may still restart the scheduler before the mergers are also turned off17:37
tristanCoh well, we can control the service by setting the scheduler replica to zero then17:37
mnaserthe way i have handled that in my operators (granted, those are golang), is that the operator watches for the other deployment states as part of the loop and makes adjustments17:38
mnaseri.e. if deployment is still in a rollout, the operator returns a "re-queue" result17:38
tristanCmnaser: the trick is we have to ensure the scheduler is stopped, then restart the mergers, then finally restart the scheduler.17:39
mnaserright so when you are inside the "update" loop instead of a "create" loop, you would just wait for the deployments/restarts to complete17:40
mnaserlike it seems to me if you just ensured the scheduler deployment finished rollout, and not make any other changes to the rest of the system17:41
tristanCmnaser: i think that's what was happening before with the `wait: true` attribute of the k8s ansible command17:41
*** dpawlik has joined #zuul17:41
mnaserthis was a big reason why i was adamant on golang based things: in stuff like this, you can represent logic; right now we just have something that generates manifests using dhall and ansible playbooks, and controlling logic is really hard in that17:42
mnaserand the wait: true is not really ideal because you're blocking the control loop at the time17:42
mnaserinstead with a golang operator you can tell it to re-queue, or listen to specific events17:43
tristanCmnaser: i'm not sure it's much harder, it's just how the scheduler is designed and that we need to stop it before restarting the service17:44
tristanCand it seems like the only way to stop a service in k8s is actually to set the replica count to 017:45
mnaserthe whole point of the operator was that it would have the smarts to be able to coordinate and make those actions17:45
tristanCright, and i'm still learning how these actions can be performed in k8s17:46
mnaserif we're going to make those changes inside zuul, the operator becomes mostly a glorified helm applier17:46
tristanCit seems like if the scheduler would retry failed cat jobs, that would simplify zuul deployment. it seems like the issues we are experiencing in the zuul-operator ci are caused by out-of-order events, e.g. when the scheduler restarts before the mergers17:50
tristanCif we can't fix that inside zuul, then i'm looking for ways to fix that in the operator... it's not that trivial to translate https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_restart.yaml into kubernetes actions, whether it is performed in golang or ansible17:52
*** dpawlik has quit IRC17:52
tristanCwe need a way to ensure the scheduler doesn't restart too early, and iiuc the only way is to set the replica count to 0 and wait for the statefulset to reach the desired state of 0 scheduler17:53
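A minimal sketch of that ordering with the k8s Ansible modules (resource names and namespace are made up, and whether `wait` gives exactly the guarantee needed here is an assumption worth verifying):

    - name: Stop the scheduler by scaling its StatefulSet to zero
      k8s_scale:
        api_version: apps/v1
        kind: StatefulSet
        name: zuul-scheduler
        namespace: zuul
        replicas: 0
        wait: true          # block until no scheduler pods remain
        wait_timeout: 300

    - name: Roll out the updated merger deployment
      k8s:
        state: present
        definition: "{{ merger_deployment }}"   # hypothetical var holding the updated manifest
        wait: true

    - name: Bring the scheduler back once the mergers are reconfigured
      k8s_scale:
        api_version: apps/v1
        kind: StatefulSet
        name: zuul-scheduler
        namespace: zuul
        replicas: 1
        wait: true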
*** y2kenny has joined #zuul17:54
*** y2kenny has quit IRC18:17
*** y2kenny has joined #zuul18:37
y2kennyif I want to connect to gerrit from the job (to query the wip state of a change, for example) is there a recommended way to do it?18:39
mnasery2kenny: i think that might become a bit complicated, because of how you'd have to manage credentials for gerrit access (assuming its not public)18:40
mnasery2kenny: but, you have access to the commit message for example inside the zuul structure18:40
clarkbyou also have the change number so you could query the http rest api anonymously to get the status of that change (if that is allowed in your setup, it is on ours)18:40
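For instance, something along these lines in a playbook (a sketch; the Gerrit host is a placeholder and it assumes anonymous read access is allowed, as noted):

    - name: Query the change from Gerrit's REST API
      uri:
        url: "https://review.example.org/changes/{{ zuul.change }}"
        return_content: true
      register: gerrit_change

    - name: Parse the JSON after stripping Gerrit's 5-character XSSI prefix ( )]}' plus newline )
      set_fact:
        change_info: "{{ gerrit_change.content[5:] | from_json }}"

    - name: Report whether the change is marked work-in-progress
      debug:
        msg: "WIP: {{ change_info.work_in_progress | default(false) }}"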
y2kennyok.18:41
y2kennythanks18:41
clarkbin general though I think we've found that having consistent job behavior that doesn't depend on external state is a good thing (basically if zuul thinks the job should run then that job should run the same way each time)18:42
clarkbreduces variables that people have to consider when things don't work as expected18:43
y2kennythat's a good tip18:43
y2kennythis is still me trying to get around the not-wanting-to-use-gerrit-but-have-to-use-gerrit issue18:45
y2kenny(I don't mean me not want to use gerrit)18:45
mnasercorvus: is https://review.opendev.org/#/c/715418/ okay with you at this state?18:50
*** gmann is now known as gmann_lunch18:59
corvusmnaser, Open10K8S: lgtm, thanks!19:12
Open10K8Scorvus:my pleasure19:14
openstackgerritRonelle Landy proposed zuul/zuul-jobs master: Add var to allow jobs to skip centos8 deps install  https://review.opendev.org/71552419:26
*** gmann_lunch is now known as gmann19:29
*** y2kenny has quit IRC19:31
openstackgerritMerged zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path  https://review.opendev.org/71541819:35
*** rf0lc0 has joined #zuul19:43
*** ysandeep|away has quit IRC19:44
*** ysandeep has joined #zuul19:45
*** avass has quit IRC19:45
*** kgz has quit IRC19:45
*** rfolco has quit IRC19:46
*** kgz has joined #zuul19:51
*** y2kenny has joined #zuul20:03
y2kennyfor an untrusted project, if I want to add playbooks and roles, do they have to be in specific locations?  (i.e. playbooks/ and roles/ at the root, parallel to .zuul.d/, or can I place them under .zuul.d/playbooks/ and .zuul.d/roles/ ?)20:05
clarkby2kenny: I think the playbooks can be in arbitrary locations. I'm looking up the example I know of20:07
clarkboh wait no thats still in playbooks/ hrm20:08
mordredyes - playbooks can be in arbitrary locations - you reference them by path20:08
y2kennybut will ansible know where to look for the roles?20:08
y2kenny(let say I created a custom role within the untrusted project.)20:09
mordredso if you put them in .zuul.d/playbooks/foo.yaml you'd need to say run: .zuul.d/playbooks/foo.yaml - one thing to keep in mind though is that putting yaml files in .zuul.d might be a bad idea purely because zuul config files are yaml and it loads all the things in .zuul.d - I'm pretty sure it doesn't recurse - but still, it might be mentally confusing20:09
mordredfor roles - there are two answers20:09
mordredif you want the repo to be used in zuul in a roles: statement so that zuul uses it as a roles repo - the roles dir needs to be in the root of the repo20:09
y2kennyoh right... I wasn't thinking about the yaml thing.20:10
mordredif it's just roles you're using from inside the playbook - you can put the roles dir adjacent to the playbook20:10
mordredso you could have playbooks/foo.yaml and playbooks/roles/my-role - and then use my-role in foo.yaml20:10
mordredbut then you can't use that role in _other_ zuul jobs20:10
mordredin different repos20:10
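Putting mordred's two answers together, a sketch of the layout and a job using it (repo and file names are illustrative):

    # Repo layout:
    #   roles/shared-role/             <- must sit at the repo root to be usable via roles:
    #   playbooks/foo.yaml
    #   playbooks/roles/my-role/       <- only found automatically by the adjacent playbook
    #
    # Job definition (in this repo or another one):
    - job:
        name: my-test
        run: playbooks/foo.yaml        # path is relative to the repo root
        roles:
          - zuul: example/this-repo    # hypothetical repo name; exposes roles/ from its root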
y2kennyok.  Thanks mordred and clarkb20:10
*** rlandy is now known as rlandy|brb20:42
y2kennyI was trying to run markdownlint in one of my untrusted projects.  The role executed but I got no such file or directory for .markdownlint/node_modules/.bin/markdownlint, did I miss some pre step?21:01
corvusy2kenny: maybe run ensure-markdownlint first21:15
*** rlandy|brb is now known as rlandy21:16
corvusy2kenny: or if you have markdownlint installed already, we may need to update that role21:16
y2kennyok.  sorry, I missed the ensure roles.21:17
corvusi think that was the last one we added before we figured out how to make roles like that work with software pre-installed or not21:17
openstackgerritMerged zuul/nodepool master: Add libc6-dev to bindep  https://review.opendev.org/71521621:17
openstackgerritMerged zuul/nodepool master: Pin docker images to 3.7 explicitly  https://review.opendev.org/71504321:17
corvus(basically, the ensure-markdownlint role will put markdownlint there, and the markdownlint role expects it there.  if it's already installed, we need to update the role to check)21:17
y2kennycorvus: are there any convention to follow in terms of setting up roles?21:21
corvusy2kenny: we're sort of establishing the convention of "ensure-foo" and "foo".  and ideally you'd put ensure-foo in a pre playbook21:22
y2kennylike what are the trade-offs of having a pre-installed/configured image vs doing prep in pre-run21:22
corvustime+space vs universal applicability and simplicity21:22
corvusin opendev, we're actually trying to simplify our images to reduce our maintenance cost21:23
y2kennyok.  so ensure->task->fetch is typical21:23
corvus(but big things that we use a lot, we'll probably keep on our images because they're worth it)21:23
y2kennyhow about having tools as part of the source repository.  Do you guys have use cases like that?21:24
corvusy2kenny: yeah, we occasionally run things out of a "tools/" directory.  there's even a zuul-jobs role to run "tools/test-setup.sh"21:24
y2kenny(the first example that comes to mind is the checkpatch.pl that comes with the kernel but I am not sure how typical that type of things are.)21:24
y2kennyok I see21:24
corvusy2kenny: yeah, that would be a good case21:24
mordredwe also split into ensure-foo and foo so that we can make jobs do the ensure-foo step in the pre-playbook ... which will retry the job if it fails21:26
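So a job following the ensure-foo / foo convention looks roughly like this (a sketch; the job name and playbook paths are made up, and it assumes the zuul-jobs roles are available to the job):

    - job:
        name: my-markdownlint
        pre-run: playbooks/pre.yaml    # a failure here retries the job instead of failing it
        run: playbooks/run.yaml

    # playbooks/pre.yaml
    - hosts: all
      roles:
        - ensure-markdownlint          # installs markdownlint if it is not already present

    # playbooks/run.yaml
    - hosts: all
      roles:
        - markdownlint                 # expects ensure-markdownlint to have run already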
*** y2kenny38 has joined #zuul21:29
*** y2kenny43 has joined #zuul21:30
*** y2kenny43 has left #zuul21:30
*** y2kenny has quit IRC21:32
*** y2kenny has joined #zuul21:44
*** rlandy has quit IRC22:15
mnaserhmm22:32
mnasery2kenny, corvus: do we not have a markdownlint job already in place that someone can just use?22:32
y2kennymnaser: that's the one I was using22:32
mnasery2kenny: oh interesting, so that means the job doesnt install by default? hmm22:33
y2kennybut I need to put ensure-markdownlint in first22:33
y2kennyand to use ensure-markdownlint I have to have a task to install npm22:33
mnasertbh if you are using the 'markdownlint' *job* and it's not "just working" then that's a bug22:33
y2kennysounds like there's a convention of ensure-job before job22:33
mnasery2kenny: ensure-role before role22:34
y2kennyright22:34
y2kennyOH22:34
y2kennyyou mean there's a job?22:34
y2kennylet me double check22:34
mnaserthat's what im wondering and checking :)22:34
mnasery2kenny: there is a job -- https://zuul-ci.org/docs/zuul-jobs/general-jobs.html#job-markdownlint22:34
corvusmnaser: i agree, and i apologize for not thinking to mention that earlier y2kenny.22:35
y2kennyyes.  I should've just used that22:35
y2kennylol22:35
corvusit's friday for everyone, right? :)22:35
mnasery2kenny: that should be plug and play in that case for you :)22:35
mnasercorvus: friday .. monday .. wednesday .. i don't even know what day it is anymore22:35
mnaserENOSOCIALINTERACTION22:35
y2kennyI will give that a try.  thanks folks.22:36
mnasery2kenny: no problem!22:42
y2kennyum... looks like the playbook job requires a bit more than my node can handle and failed22:46
y2kennyit looks for gpg and my base image doesn't have it22:47
y2kennyin project configuration, I believe I can add variables in a job for a pipeline.  for example23:31
y2kennycheck:23:32
y2kenny  jobs:23:32
y2kenny    - name: some-job23:32
y2kenny      vars:23:32
y2kenny        var-name: "value"23:32
y2kennybut for some reason I am getting "extra keys not allowed @ data['check']['jobs'][0]['vars']['var-name']"23:33
y2kennyis this a yaml formatting thing or I misunderstood the documentation?23:33
clarkby2kenny: you need a : after some-job23:34
clarkbor wait no maybe not since you used name:23:34
clarkblooking for examples now23:35
funginame and vars are the keys under the job at index 023:35
fungiso no, format looks correct to me23:35
clarkbhttps://opendev.org/opendev/system-config/src/branch/master/.zuul.yaml#L1641-L1645 is how we do it23:35
clarkbwe don't use name:23:35
clarkb(is that an optional format?)23:35
fungiahh, yeah we allow the name: thing for project definitions but maybe that's not a thing in job definitions23:37
y2kennyclarkb: um... when I do that I get expected str for dictionary value @ data['check']['jobs'][0]['some-job']23:38
clarkby2kenny: ya when you do it the way we do you have to add the : at the end then indent vars under some-job23:38
y2kennyI did... but I think I may have missed an indent...23:39
clarkbthe v in vars should be under the m in some-job23:39
y2kennyyup.  That does it!23:41
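For reference, the working form clarkb describes, with the job name used as a key and vars nested beneath it:

    check:
      jobs:
        - some-job:
            vars:
              var-name: "value"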
y2kennywhat does zuul run when it check this on the server?  Is there a way for me to run it locally with some tool?23:41
y2kenny(the multiple patchset is no big deal... I am just curious)23:42
clarkby2kenny: it's doing a yaml load, then running its own data consistency checks on the data structures that get loaded23:42
clarkby2kenny: you could pretty easily do the yaml load step and ensure it parses yaml properly, but the data consistency checks are best/easiest done by zuul itself23:42
y2kennyright.  Ok.23:43
y2kennythanks for the help clarkb, fungi23:43
fungiyw23:44
fungiand yes, performing a dry run of zuul configuration checking is nontrivial because its context can potentially include the state of any branch of any repository in your whole system23:45
fungibasically to determine "what will my zuul think of this" you have to ask it, or run a complete enough copy of your deployment to include all the bits which could be involved in assembling the job23:46
*** ysandeep is now known as ysandeep|rover23:48
y2kennyfungi: I see23:55
