*** rfolco has joined #zuul | 00:06 | |
*** jamesmcarthur has quit IRC | 00:11 | |
*** jamesmcarthur has joined #zuul | 00:11 | |
*** armstrongs has joined #zuul | 00:12 | |
*** armstrongs has quit IRC | 00:21 | |
mordred | remote: https://review.opendev.org/715331 Release 3.1.0 of cliff | 00:24 |
mordred | that should fix the py35 issue when it lands | 00:24 |
*** rlandy has quit IRC | 00:28 | |
*** jamesmcarthur has quit IRC | 00:29 | |
*** jamesmcarthur has joined #zuul | 00:30 | |
*** jamesmcarthur has quit IRC | 00:30 | |
*** jamesmcarthur has joined #zuul | 00:31 | |
*** jamesmcarthur has quit IRC | 00:36 | |
*** sgw has quit IRC | 00:38 | |
*** sgw has joined #zuul | 00:38 | |
*** jamesmcarthur has joined #zuul | 00:39 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 00:57 |
*** jamesmcarthur has quit IRC | 01:00 | |
*** sgw has quit IRC | 01:06 | |
*** sgw has joined #zuul | 01:08 | |
*** rlandy has joined #zuul | 01:23 | |
*** rlandy has quit IRC | 01:33 | |
*** zxiiro has quit IRC | 01:35 | |
*** jamesmcarthur has joined #zuul | 01:38 | |
*** adamw has quit IRC | 01:43 | |
*** adamw has joined #zuul | 01:43 | |
*** jamesmcarthur has quit IRC | 01:49 | |
*** adamw has quit IRC | 01:58 | |
*** adamw has joined #zuul | 02:02 | |
*** jamesmcarthur has joined #zuul | 02:26 | |
*** jamesmcarthur has quit IRC | 02:32 | |
*** jamesmcarthur has joined #zuul | 02:32 | |
*** ysandeep|away is now known as ysandeep|rover | 02:34 | |
*** Goneri has quit IRC | 02:37 | |
*** swest has quit IRC | 02:54 | |
*** rfolco has quit IRC | 02:55 | |
*** jamesmcarthur has quit IRC | 02:55 | |
*** bhavikdbavishi has joined #zuul | 02:55 | |
*** jamesmcarthur has joined #zuul | 02:57 | |
*** bhavikdbavishi has quit IRC | 03:00 | |
*** jamesmcarthur has quit IRC | 03:06 | |
*** swest has joined #zuul | 03:09 | |
*** jamesmcarthur has joined #zuul | 03:13 | |
*** jamesmcarthur has quit IRC | 03:31 | |
*** jamesmcarthur has joined #zuul | 04:14 | |
*** bhavikdbavishi has joined #zuul | 04:24 | |
*** saneax has joined #zuul | 04:27 | |
*** bhavikdbavishi1 has joined #zuul | 04:29 | |
*** bhavikdbavishi has quit IRC | 04:30 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 04:30 | |
*** jamesmcarthur has quit IRC | 05:10 | |
*** jamesmcarthur has joined #zuul | 05:23 | |
*** jamesmcarthur has quit IRC | 05:27 | |
*** bhavikdbavishi has quit IRC | 05:33 | |
*** bhavikdbavishi has joined #zuul | 05:34 | |
*** evrardjp has quit IRC | 05:36 | |
*** evrardjp has joined #zuul | 05:36 | |
*** ysandeep|rover is now known as ysandeep|rover|b | 05:41 | |
*** bhavikdbavishi has quit IRC | 05:57 | |
*** sgw has quit IRC | 06:02 | |
*** bhavikdbavishi has joined #zuul | 06:08 | |
*** ysandeep|rover|b is now known as ysandeep|rover | 06:08 | |
*** bhavikdbavishi has quit IRC | 06:49 | |
*** dpawlik has joined #zuul | 07:11 | |
*** bhavikdbavishi has joined #zuul | 07:40 | |
*** jpena|off is now known as jpena | 08:14 | |
*** ysandeep|rover is now known as ysandeep|rover|l | 08:33 | |
*** ysandeep|rover|l is now known as ysandeep|rover | 08:59 | |
*** tosky has joined #zuul | 09:29 | |
*** sshnaidm|afk is now known as sshnaidm|off | 09:50 | |
*** pabelanger has quit IRC | 09:58 | |
*** fbo has quit IRC | 09:58 | |
*** jpena has quit IRC | 09:59 | |
*** jpena has joined #zuul | 10:06 | |
*** fbo has joined #zuul | 10:11 | |
*** fbo has quit IRC | 10:36 | |
*** ysandeep|rover is now known as ysandeep|rov|brb | 10:41 | |
*** bhavikdbavishi has quit IRC | 10:51 | |
*** bhavikdbavishi has joined #zuul | 10:52 | |
*** ysandeep|rov|brb is now known as ysandeep|rover | 11:03 | |
*** arxcruz is now known as arxcruz|off | 11:19 | |
*** tobias-urdin has joined #zuul | 11:28 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: WIP: Enforce sql connections for scheduler and web https://review.opendev.org/630472 | 11:48 |
*** ysandeep|rover is now known as ysandeep|rov|mtg | 12:00 | |
*** rlandy has joined #zuul | 12:03 | |
*** jpena is now known as jpena|lunch | 12:05 | |
*** ysandeep|rov|mtg is now known as ysandeep|rover | 12:37 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version https://review.opendev.org/715415 | 12:41 |
*** hashar has joined #zuul | 12:44 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version https://review.opendev.org/715415 | 12:45 |
*** rfolco has joined #zuul | 12:48 | |
mnaser | morning all | 12:53 |
mnaser | would anyone like to +W this today? we hit the 2 weeks - https://review.opendev.org/#/c/712804/1 | 12:54 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: install-kubernetes: fix missing package https://review.opendev.org/715418 | 12:59 |
mnaser | tristanC: fyi ^ | 13:00 |
tristanC | mnaser: thanks! | 13:00 |
*** openstackstatus has quit IRC | 13:01 | |
*** openstack has joined #zuul | 13:05 | |
*** ChanServ sets mode: +o openstack | 13:05 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Cap stestr version https://review.opendev.org/715415 | 13:07 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Required SQL reporters https://review.opendev.org/630472 | 13:15 |
*** hashar_ has joined #zuul | 13:20 | |
*** hashar has quit IRC | 13:20 | |
tristanC | mnaser: arg, install-kubernetes still fails because of `file (/home/zuul/.minikube/client.key) is absent, cannot continue` | 13:24 |
*** hashar_ is now known as hsahar | 13:26 | |
*** hsahar is now known as hashar | 13:26 | |
tristanC | how about we stop using minikube master? | 13:26 |
flaper87 | how many concurrent jobs can a Zuul executor handle? Do we have some stats/numbers on this? | 13:32 |
tristanC | flaper87: iirc this is controlled by https://zuul-ci.org/docs/zuul/discussion/components.html#attr-executor.load_multiplier | 13:35 |
tristanC | flaper87: the amount of jobs depends on the available cpu and that load_multiplier setting | 13:36 |
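[editor's note] tristanC's answer can be made concrete: per the Zuul executor docs, the load governor stops accepting new jobs once the system load average exceeds `load_multiplier` (default 2.5) times the CPU count. A minimal illustrative sketch of that check (not Zuul's actual code; the real executor also applies RAM and disk governors):

```python
def max_load_threshold(cpu_count: int, load_multiplier: float = 2.5) -> float:
    """Load-average ceiling above which the executor stops taking jobs."""
    return load_multiplier * cpu_count

def accepting_jobs(load_avg: float, cpu_count: int,
                   load_multiplier: float = 2.5) -> bool:
    # Sketch of the documented load governor only; the real executor
    # also checks available memory and disk before accepting a job.
    return load_avg < max_load_threshold(cpu_count, load_multiplier)
```

So an 8-core executor with the default multiplier keeps accepting jobs until the load average reaches 20.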
*** Goneri has joined #zuul | 13:36 | |
flaper87 | awesome, thanks. I was trying to put a number on it. Do we know how many jobs does a zuul-executor in the OpenStack deployment handle? | 13:37 |
tristanC | flaper87: you can see that in http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=24&fullscreen&orgId=1 | 13:39 |
*** ysandeep|rover is now known as ysandeep|away | 13:39 | |
openstackgerrit | Merged zuul/zuul master: executor: drop --address=127.0.0.1 from kubectl https://review.opendev.org/715308 | 13:43 |
corvus | flaper87: somewhere between 80 and 100; that's an 8g vm | 13:43 |
flaper87 | corvus: gotcha! thanks and thank you tristanC | 13:44 |
AJaeger | zuul-maint, do we want to cap stestr for zuul-jobs to unbreak py27 job? See https://review.opendev.org/715415 | 13:45 |
openstackgerrit | Merged zuul/zuul master: Display clean error message for missing secret https://review.opendev.org/713469 | 13:45 |
flaper87 | corvus: are there other zuul services running in that same 8gb VM ? | 13:45 |
corvus | flaper87: no, we have 12 of those vms, each dedicated to running zuul-executor | 13:46 |
flaper87 | gotcha! thanks | 13:46 |
fungi | AJaeger: it seems like that's the only choice if there's a newer release out there which mistakenly declares python 2.7 support, as mtreinish observed that's tough to undo | 13:46 |
corvus | flaper87: (our current total capacity is around 1000 test nodes, so that's sized closer to 80 per executor) | 13:47 |
corvus | flaper87: (but sometimes our capacity moves up to about 1200; we probably wouldn't add another executor unless it exceeds that) | 13:47 |
*** sgw has joined #zuul | 13:50 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Required SQL reporters https://review.opendev.org/630472 | 13:51 |
AJaeger | fungi: we could: Release a version that re-adds py27 support - and works - and then drop 2.7 again and do it properly. That is the dance mordred is doing on openstacksdk right now... And if that happens, we can revert my change. | 13:51 |
tobiash | the version dance :-P | 13:52 |
fungi | AJaeger: yeah, it would need to be higher than the version number which dropped python2 support though | 13:52 |
fungi | but doable, you're right | 13:52 |
*** avass has joined #zuul | 13:53 | |
fungi | i mean, with 3.0.0 broken, i suppose you could re-tag the last <3.0.0 release as 4.0.0 and then re-tag the fixed master branch as 5.0.0 with the correct python version restriction | 13:54 |
fungi | so don't actually need to revert things | 13:55 |
mordred | yeah. we talked about doing that for sdk - but reno got real confused | 13:55 |
fungi | (since it does seem to use pbr, so no additional patches required to change the version) | 13:55 |
*** bhavikdbavishi has quit IRC | 13:55 | |
fungi | mordred: ahh, i didn't realize stestr was also using reno | 13:55 |
mordred | but - maybe that's not an issue for stestr | 13:55 |
mordred | I don't know that it does - that was mostly - that was the plan I liked the most until it turned out reno didn't like that plan | 13:56 |
mordred | so I think it's a good plan in general | 13:56 |
mordred | but otherwise, yeah - revert, tag, re-revert, fix python, tag | 13:56 |
fungi | i think mtreinish was also trying to cope with a somewhat stale repository which sat collecting python 2 removal changes from other maintainers for a very long time, so reverting it all could be a major pain | 13:57 |
mordred | yeah. in his case just doing the retag might be the easier route | 13:57 |
fungi | i'm not really clear on the maintenance situation for stestr, i may be thinking of testr | 13:57 |
mordred | fungi: no reno in stestr repo that I can see | 13:58 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Remove bashate from test-requirements https://review.opendev.org/715328 | 14:00 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 14:00 |
AJaeger | just rebased to resolve conflicts ^ | 14:00 |
tobiash | woot, just learned about disk image inheritance (https://review.opendev.org/713157), we have external scripting that enables just this for our config :) | 14:01 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed https://review.opendev.org/703631 | 14:09 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: zuul-restart: prevent issue when services are restarted out of order https://review.opendev.org/715424 | 14:09 |
openstackgerrit | Merged zuul/zuul-jobs master: Cap stestr version https://review.opendev.org/715415 | 14:09 |
corvus | tobiash: yay! let me know if it works for you | 14:09 |
tobiash | corvus: I'll try it after our next nodepool update :) | 14:10 |
bolg | corus: I've updated https://review.opendev.org/#/c/630472/ according to https://etherpad.openstack.org/p/zuulv4. It has -2 from your last review | 14:10 |
tobiash | then I can throw away parts of our config generation scripts :) | 14:11 |
bolg | corvus: ^^^ | 14:11 |
AJaeger | any takers to +2A the revert of the download-script change, please? https://review.opendev.org/#/c/715325/ | 14:11 |
tobiash | AJaeger: done | 14:11 |
tristanC | zuul-maint (or rather zuul-operator-maint): i documented some tasks that should prevent issues when the scheduler restarts before the mergers, but i'm not satisfied by the requirements... could you please have a look and see if there is a better way to fix: https://review.opendev.org/#/c/715424/1/roles/zuul-restart-when-zuul-conf-changed/tasks/main.yaml | 14:12 |
corvus | bolg: thanks -- i think we aren't quite ready to merge that yet, right? i think that's step 6? | 14:14 |
AJaeger | thanks, tobiash | 14:14 |
tobiash | corvus: yes, as I read it we can merge it after merging and releasing zk auth support? | 14:18 |
corvus | tobiash: yes i think that's right. and i think we're close to that | 14:19 |
tobiash | :) | 14:19 |
bolg | corvus: sure. As far as I understand the sequence the reporters are not really dependent on the previous steps, right? Theoretically an independent change if I am not mistaken. But no need to rush it from my POV | 14:22 |
openstackgerrit | Merged zuul/zuul-jobs master: Remove bashate from test-requirements https://review.opendev.org/715328 | 14:26 |
openstackgerrit | Merged zuul/zuul-jobs master: ensure-tox: use python3 by default https://review.opendev.org/712804 | 14:26 |
openstackgerrit | Merged zuul/zuul-jobs master: Revert "upload-logs-swift: Create a download script" https://review.opendev.org/715325 | 14:26 |
*** bhavikdbavishi has joined #zuul | 14:27 | |
tobiash | corvus: this change of mordred should unbreak the nodepool builds: https://review.opendev.org/715216 | 14:38 |
*** y2kenny has joined #zuul | 14:39 | |
*** bhavikdbavishi has quit IRC | 14:40 | |
corvus | tobiash: thx, i approved that and the followup | 14:40 |
tobiash | clarkb: do you in opendev see some nodes in nodepool nodelist with state deleted and None as provider? | 14:41 |
tobiash | we accumulated ~10 of those in half a year which seems to be some sort of distributed race between two launchers | 14:42 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: ci: pin minikube version to 1.8.2 https://review.opendev.org/715443 | 14:58 |
*** bhavikdbavishi has joined #zuul | 15:01 | |
y2kenny | corvus: the prepare-workspace-openshift thing worked and things are looking great. | 15:03 |
corvus | y2kenny: awesome! tristanC ^ :) | 15:03 |
*** bhavikdbavishi1 has joined #zuul | 15:04 | |
*** Open10K8S has joined #zuul | 15:05 | |
y2kenny | so what ended up blocking the k8s provisioning were 3 things (just as a summary): the fact that the executor needs to be a privileged pod due to bwrap; old kubectl in the executor image; and a slight variation in the prepare-workspace role | 15:05 |
*** bhavikdbavishi has quit IRC | 15:05 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 15:05 | |
*** zxiiro has joined #zuul | 15:12 | |
*** jamesmcarthur has joined #zuul | 15:14 | |
*** jamesmcarthur has quit IRC | 15:15 | |
*** jamesmcarthur has joined #zuul | 15:17 | |
y2kenny | corvus: now that things are working I have an observation that I don't understand. When I use the namespace label type, I was expecting the pod (or whichever resources I create via ansible k8s/k8s_raw) to be created under the namespace named by the label, but it is created under the automatically created namespace. Is this the expected behaviour? | 15:23 |
y2kenny | so anything created by the k8s/openshift driver will always be ephemeral? | 15:24 |
clarkb | tobiash: all of our deleting nodes have a provider set. And only one of them is more than a few minutes old | 15:27 |
*** bhavikdbavishi has quit IRC | 15:27 | |
tobiash | clarkb: oh, I meant 'deleted' | 15:27 |
clarkb | tobiash: we have noticed that there are some unlocked ready nodes in a specific provider that have been there for a while (almost like nodepool isn't trying to use those to fulfull new requests) but that is the only oddity I am aware of currently in our nodepool | 15:27 |
clarkb | tobiash: oh we have none in a deleted state in the db | 15:27 |
tobiash | great | 15:28 |
tobiash | it's very seldom, I'll observe this further | 15:28 |
*** bhavikdbavishi has joined #zuul | 15:28 | |
corvus | y2kenny: yes, the namespace doesn't have access to anything outside of that namespace; the expectation is that you would use that to deploy test versions of your services in the automatically created namespace. if you need to deploy something more complicated (something that needs cluster-level access), you might want to consider deploying a k8s cluster for the job (but you will probably need a vm for | 15:28 |
corvus | that). that's what we do to test the k8s driver in nodepool. if you want to deploy something in production, you could encode k8s credentials as a zuul secret and use them inside of a job to do that. | 15:28 |
tristanC | mnaser: pinning minikube to v1.8.2 works (demo in https://review.opendev.org/#/c/715443/ ) | 15:29 |
mnaser | tristanC: i have someone working on fixing the actual thing in the zuul-jobs repo. | 15:30 |
clarkb | is that the conntrack thing? seems like you can simply install and modprobe as necessary? | 15:31 |
mnaser | clarkb: i already did that, but there's another error now about the client.key | 15:31 |
clarkb | ah | 15:31 |
mnaser | openk10s (which isn't on irc now) found out it was because they moved the path where the key is stored | 15:31 |
mnaser | so they're pushing a change now to chown that file to ansible_user and that should unblock | 15:32 |
fungi | y2kenny: also more generally, nodepool's job is to create ephemeral resources for individual builds and then delete them as soon as the builds complete to free up available capacity for future builds. it's designed to not leave anything behind (and if it does, we consider that a "leak" of resources) | 15:32 |
y2kenny | corvus: understood. | 15:32 |
y2kenny | corvus, fungi: so are there any connection between k8s resources availability to the nodepool? | 15:33 |
*** bhavikdbavishi has quit IRC | 15:33 | |
y2kenny | what I mean is, it is entirely possible for me to try to schedule something in the k8s namespace via k8s_raw that is not available. | 15:33 |
y2kenny | in that case, would I fail the job or is there a way for me to re-enqueue the job from within the job? | 15:34 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path https://review.opendev.org/715418 | 15:34 |
y2kenny | as a concrete example, I want to create a pod with some k8s-label selector. Nodepool will schedule it because the namespace is available, but kubectl will probably fail. | 15:36 |
mnaser | Open10K8S: thanks for that patch, let's see how CI feels about it | 15:37 |
Open10K8S | thanks | 15:37 |
Open10K8S | mnaser: thanks | 15:38 |
corvus | y2kenny: even the k8s pod type in nodepool gets a custom namespace with the same restriction; but i may not understand what the k8s-label has to do in this situation | 15:39 |
y2kenny | corvus: I am thinking more on k8s concepts like node-taint, resource request, etc. | 15:40 |
tristanC | Open10K8S: mnaser: should the fix also restore the legacy client key path? I was actually using that ~/.minikube/client.key path to setup the nodepool provider in integration tests | 15:40 |
y2kenny | for example, with k8s_raw, I can supply a pod spec yaml that requests 8 GPU resources, which might not be available on the cluster | 15:41 |
mnaser | tristanC: does that matter? i don't think we've ever provided a "promised path for kubernetes keys" for that role in our api | 15:41 |
mnaser | and i would encourage using kubectl commands to get that info out, i think there might be a way | 15:42 |
mnaser | tristanC: or you can parse the .kube/config and go to the file that contains it | 15:42 |
y2kenny | or a more generic use case, a job can in theory request to create a pod that has x amount of CPU and y amount of memory for the pod | 15:42 |
tristanC | mnaser: works for me, i actually think the proper fix is to not use minikube master in zuul-operator | 15:42 |
tristanC | like that, we can fix upgrade issues in a controlled manner | 15:43 |
mnaser | i think if we're going to come up with a story of pinning things, we might as well do it in zuul-jobs so our consumers don't get confused as heck :) | 15:43 |
y2kenny | so what I was wondering was, is there a connection between nodepool and k8s on resource availability? and if not, is there a way to feed that back to the scheduler | 15:43 |
corvus | y2kenny: got it. since you're controlling the requesting of resources, you could put retries in the ansible that does that. you could limit the number of 'namespace' nodes in nodepool to try to approximate the level of resources your k8s cluster provides, so that the waiting happens in nodepool rather than in the job. | 15:44 |
y2kenny | corvus: essentially, is there co-operation between the zuul scheduler and the kubernetes cluster scheduler? | 15:44 |
y2kenny | corvus: ok | 15:45 |
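[editor's note] corvus's "put retries in the ansible" suggestion is, generically, a poll-until loop; in Ansible itself the natural spelling is `retries`/`until` on the k8s task. A hedged Python sketch of the same pattern, with `check` standing in for whatever probe verifies the requested resources (GPUs, CPU, memory) became available:

```python
import time

def wait_until(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll `check` until it returns True, the way an Ansible task with
    retries/until would, giving the cluster time to free the requested
    resources before the job gives up."""
    for i in range(attempts):
        if check():
            return True
        if i + 1 < attempts:
            sleep(delay)  # back off between polls
    return False
```

The `sleep` parameter is only there so the loop can be exercised without real delays.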
tristanC | mnaser: yeah, would be nice to express a 1.18 kubernetes, and let zuul-jobs pick the latest non breaking version. | 15:45 |
* mnaser would be more excited about having third party zuul-jobs running aginst minikube | 15:48 | |
mnaser | oh | 15:49 |
corvus | yeah, i'm kinda surprised minikube broke | 15:49 |
mnaser | i mean we could have been a bit more "smart" in our code and parsed the path from .kube/config | 15:51 |
mnaser | technically the path/location of that isn't exactly an api | 15:51 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed https://review.opendev.org/703631 | 15:55 |
tristanC | corvus: would you mind if we pin minikube in zuul-operator? | 15:55 |
tristanC | because having to deal with such failure is time consuming, i'd rather do the upgrade manually when needed | 15:56 |
corvus | tristanC: why not in the role? | 15:56 |
corvus | tristanC: i'm sorry, i haven't been following; i'd appreciate a summary | 15:56 |
mnaser | corvus: we run minikube master in our gate, they made 2 changes: conntrack is now a required package, and they moved the path of the client.crt (which we did a chown on in our role, so that failed with "missing file") | 15:57 |
tristanC | corvus: install-kubernetes started to fail because of minikube 1.19 release. mnaser and Open10K8S fixed the zuul-jobs, and i propose we do that https://review.opendev.org/715443 in zuul-operator (pin to 1.18) | 15:58 |
mnaser | corvus: fixing conntrack was a one liner, which exposed the other failure that Open10K8S fixed relatively quickly afterwards which is https://review.opendev.org/#/c/715418/ (and passed tests) | 15:58 |
corvus | so can we just merge the zuul-jobs change? | 15:59 |
mnaser | yes, and i would be in favour of *not* pinning zuul-operator to a specific version of minikube | 15:59 |
tristanC | corvus: sure, but i suspect using master will break zuul-operator ci again in the future | 15:59 |
mnaser | i'd rather we break zuul-operator ci and we fix it in zuul-jobs than have all our users use something that's broken without us knowing about it | 16:00 |
corvus | tristanC: then the ci will work :) | 16:00 |
corvus | yeah, if this happened a lot, i'd think differently, but if this happens very rarely then it seems like it's worth tracking master so we don't end up forgetting about the pin and never moving it | 16:00 |
mnaser | corvus: i actually think it might be a really cool opportunity to maybe ask and see if the minikube folks are interested in us doing 3rd party ci :) | 16:01 |
mnaser | but maybe that's me thinking too far out | 16:01 |
corvus | mnaser: i'd be in favor :) | 16:01 |
corvus | /.minikube/profiles/minikube/ | 16:01 |
corvus | that's a silly path :) | 16:01 |
mnaser | corvus: i think that's because you can have multiple profiles now and the default name of the default profile is "minikube" | 16:02 |
mnaser | so i guess that's to make sure each profile had their own cert path so to speak | 16:02 |
corvus | mnaser, tristanC, Open10K8S: should we handle both the old and new path? | 16:02 |
corvus | (in case someone is pinned to 1.8?) | 16:02 |
tristanC | corvus: mnaser: right, but well, this is time consuming... i was already fighting with another issue in how the zuul scheduler is restarted, and having to deal with a bleeding edge kubernetes is not helping | 16:02 |
Open10K8S | corvus: | 16:03 |
tristanC | i'd rather have a stable foundation, and do those kinds of intrusive changes (upgrading kubernetes) in a controlled manner | 16:03 |
mnaser | i like to think we're not just building the operator, but we're building zuul as a whole system and set of projects | 16:04 |
Open10K8S | corvus: As @mnaser mentioned, the .kube/config contains the right path for the client.key file | 16:04 |
mnaser | how hard is it to parse a yaml string in ansible? | 16:05 |
mnaser | we can run: "kubectl config view" and as part of that, we have this | 16:05 |
mnaser | https://www.irccloud.com/pastebin/4bzdAdPz/ | 16:05 |
mnaser | (that's me on an older release) | 16:05 |
tristanC | mnaser: that's actually a string in a record that is present in a list | 16:06 |
mnaser | i mean, we'll probably have just _one_ user anyways | 16:07 |
mnaser | and if we _really_ want to get fancy we can scan for the 'minikube' user.. | 16:07 |
corvus | tristanC: yes, just about everything everywhere has broken this week and surprised us. maybe because everyone is bored at home and has too much time on our hands. it's very stressful. but the work needs to be done at some point. i'd be in favor of a temporary pin until it's fixed if a problem comes up, but in this case, the problem is fixed, so i'd rather not have a pin that we forget to remove. | 16:07 |
corvus | mnaser: there's a "from_yaml" filter | 16:07 |
corvus | so if you run that command and register the output, you can do: set_fact: kube_config: {{ kubectl.output | from_yaml }} | 16:08 |
corvus | then kube_config['users'][0]['user']['client-key'] | 16:09 |
corvus | or even, loop: kube_config['users'] and then item['user']['client-key'] | 16:09 |
mnaser | corvus: i like the 2nd approach even more. Open10K8S: can we revise the patch to do that? | 16:09 |
corvus | yeah, that seems very future- and past- proof | 16:10 |
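[editor's note] corvus's recipe (run `kubectl config view`, parse it, loop over `users` and read each `client-key`) can be sketched outside Ansible too. This Python version parses the `-o json` form of the same output so it stays stdlib-only; the field names (`users`, `user`, `client-key`) come from the standard kubeconfig schema, and entries without a client key (e.g. token-based users) are skipped:

```python
import json

def client_key_paths(kubeconfig_json: str) -> dict:
    """Map user name -> client-key path from `kubectl config view -o json`
    output; the loop over `users` mirrors the Ansible suggestion above."""
    config = json.loads(kubeconfig_json)
    return {
        entry["name"]: entry["user"]["client-key"]
        for entry in config.get("users", [])
        if "client-key" in entry.get("user", {})
    }
```

Reading the path out of the config rather than hard-coding it is what makes the role both future- and past-proof across minikube releases.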
tristanC | zuul-maint : please review https://review.opendev.org/#/c/715418/ to fix the install-kubernetes | 16:10 |
mnaser | uh | 16:10 |
corvus | tristanC: since that change works now, you can go ahead and depends-on it and continue your zuul-operator work | 16:12 |
Open10K8S | manser: ok | 16:12 |
Open10K8S | mnaser: ok | 16:12 |
corvus | even while Open10K8S updates it to work for multiple versions | 16:12 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add tenant reconfiguration when main.yaml changed https://review.opendev.org/703631 | 16:18 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: WIP: zuul-restart: prevent issue when services are restarted out of order https://review.opendev.org/715424 | 16:21 |
*** jamesmcarthur has quit IRC | 16:40 | |
*** dpawlik has quit IRC | 16:41 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: OIDCAuthenticator: add capabilities, scope option https://review.opendev.org/702275 | 16:47 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: admin REST API: zuul-web integration https://review.opendev.org/643536 | 16:52 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path https://review.opendev.org/715418 | 16:54 |
*** y2kenny has quit IRC | 17:00 | |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path https://review.opendev.org/715418 | 17:05 |
tristanC | Open10K8S: previous PS was also failing with "Invalid data passed to 'loop', it requires a list, got this instead: . Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup." | 17:07 |
mnaser | tristanC: weird, i just saw it pass here in front of me | 17:14 |
mnaser | https://www.irccloud.com/pastebin/LawvSPUS/ | 17:14 |
*** hashar has quit IRC | 17:17 | |
tristanC | last PS seems to fix the issue indeed | 17:18 |
mnaser | tristanC: i think it got the kube config from root and it had an empty yaml string | 17:19 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: zuul-restart: prevent issue when services are restarted out of order https://review.opendev.org/715424 | 17:22 |
tristanC | so, back to the initial problem, here is a zuul restart implementation using merger deployment metadata.resourceVersion checks ^ . Though i think there is still a chance the scheduler restarts before the mergers are reconfigured, and perhaps we need a 'pause' command for the scheduler service | 17:24 |
tristanC | corvus: this is quite a bit of extra tasks to work around the merge job failure we talked about yesterday... why can't the scheduler retry failed cat jobs? | 17:26 |
*** jpena is now known as jpena|off | 17:28 | |
corvus | tristanC: you can check eavesdrop for the whole thing, but a quick reminder is that in a large system with multiple mergers, we would retry the jobs very quickly. it should be the job of the operator not to put the system into an inconsistent state. | 17:32 |
tristanC | corvus: couldn't we use an exponential delay with a 10-minute bailout or something then? | 17:33 |
tristanC | i mean right now, when the scheduler reloads the config and a merge job fails, the tenant simply isn't loaded | 17:34 |
corvus | tristanC: probably. but the entire job of the operator is to control what services are running under what configuration. it should do this. | 17:35 |
tristanC | so in a large system, with many projects and connections, a single failure seems to result in a broken tenant | 17:35 |
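[editor's note] tristanC's proposed retry policy (exponential delay with a 10-minute bailout) could be spelled like this; purely a sketch of the idea as stated in the conversation, not anything in Zuul:

```python
def backoff_delays(initial=1.0, factor=2.0, total_budget=600.0):
    """Yield exponentially growing retry delays until a total time
    budget (the '10-minute bailout') would be exceeded."""
    delay, spent = initial, 0.0
    while spent + delay <= total_budget:
        yield delay
        spent += delay
        delay *= factor
```

With the defaults this retries at 1s, 2s, 4s, ... and stops once another doubling would blow the 600-second budget.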
mnaser | Open10K8S: you can see that https://review.opendev.org/#/c/715418/ failed, but it seems like it failed because gpg timed out; you can comment with "recheck" and it will attempt to load it again | 17:35 |
*** evrardjp has quit IRC | 17:36 | |
*** evrardjp has joined #zuul | 17:36 | |
tristanC | corvus: actually we don't control what services are running, kubernetes does. So the possible failure i mentioned before: when we stop the scheduler before restarting the mergers, kubernetes may still restart the scheduler before the mergers are also turned off | 17:37 |
tristanC | oh well, we can control the service by setting the scheduler replica to zero then | 17:37 |
mnaser | the way i have handled that in my operators (granted, those are golang), is that the operator watches for the other deployment states as part of the loop and makes adjustments | 17:38 |
mnaser | i.e. if deployment is still in a rollout, the operator returns a "re-queue" result | 17:38 |
tristanC | mnaser: the trick is we have to ensure the scheduler is stopped, then restart the mergers, then finally restart the scheduler. | 17:39 |
mnaser | right so when you are inside the "update" loop instead of a "create" loop, you would just wait for the deployments/restarts to complete | 17:40 |
mnaser | like it seems to me if you just ensured the scheduler deployment finished rollout, and not make any other changes to the rest of the system | 17:41 |
tristanC | mnaser: i think that's what was happening before with the `wait: true` attribute of the k8s ansible command | 17:41 |
*** dpawlik has joined #zuul | 17:41 | |
mnaser | this was a big reason why i was adamant on golang based things, because in stuff like this you can represent logic; right now we just have something that generates manifests using dhall and ansible playbooks, and controlling logic is really hard in that | 17:42 |
mnaser | and the wait: true is not really ideal because you're blocking the control loop at the time | 17:42 |
mnaser | instead with a golang operator you can tell it to re-queue, or listens to specific events | 17:43 |
tristanC | mnaser: i'm not sure it's much harder, it's just how the scheduler is designed and that we need to stop it before restarting the service | 17:44 |
tristanC | and it seems like the only way to stop a service in k8s is actually to set the replica count to 0 | 17:45 |
mnaser | the whole point of the operator was that it would have the smarts to be able to coordinate and make those actions | 17:45 |
tristanC | right, and i'm still learning how these actions can be performed in k8s | 17:46 |
mnaser | if we're going to make those changes inside zuul, the operator becomes mostly a glorified helm applier | 17:46 |
tristanC | it seems like if the scheduler would retry failed cat jobs, that would simplify zuul deployment. it seems like the issues we are experiencing in the zuul-operator ci are caused by out-of-order events, e.g. when the scheduler restarts before the mergers | 17:50 |
tristanC | if we can't fix that inside zuul, then i'm looking for ways to fix it in the operator... it's not that trivial to translate https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_restart.yaml into kubernetes actions, whether it is performed in golang or ansible | 17:52 |
*** dpawlik has quit IRC | 17:52 | |
tristanC | we need a way to ensure the scheduler doesn't restart too early, and iiuc the only way is to set the replica count to 0 and wait for the statefulset to reach the desired state of 0 schedulers | 17:53 |
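The stop/restart sequence tristanC describes could be sketched as Ansible tasks. This is a hedged illustration, not the operator's actual code: the resource names `zuul-scheduler`, `zuul-merger`, and the `zuul` namespace are assumptions, and it relies on the `kubernetes.core` collection being installed.

```yaml
# Sketch: stop the scheduler by scaling its StatefulSet to 0 replicas
# and waiting for the drain, restart the mergers, then scale the
# scheduler back up. All names are assumptions for illustration.
- name: Stop the scheduler and wait for it to terminate
  kubernetes.core.k8s_scale:
    kind: StatefulSet
    name: zuul-scheduler
    namespace: zuul
    replicas: 0
    wait: true
    wait_timeout: 300

- name: Restart the mergers (rollout via annotation bump)
  kubernetes.core.k8s:
    state: patched
    kind: Deployment
    name: zuul-merger
    namespace: zuul
    definition:
      spec:
        template:
          metadata:
            annotations:
              restarted-at: "{{ ansible_date_time.iso8601 }}"

- name: Start the scheduler again
  kubernetes.core.k8s_scale:
    kind: StatefulSet
    name: zuul-scheduler
    namespace: zuul
    replicas: 1
    wait: true
```

The `wait: true` here is the same blocking behavior mnaser objects to above; a controller-based operator would instead re-queue the reconcile until the StatefulSet reports 0 ready replicas.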
*** y2kenny has joined #zuul | 17:54 | |
*** y2kenny has quit IRC | 18:17 | |
*** y2kenny has joined #zuul | 18:37 | |
y2kenny | if I want to connect to gerrit from the job (to query the wip state of a change, for example) is there a recommended way to do it? | 18:39 |
mnaser | y2kenny: i think that might become a bit complicated, because of how you'd have to manage credentials for gerrit access (assuming it's not public) | 18:40 |
mnaser | y2kenny: but, you have access to the commit message for example inside the zuul structure | 18:40 |
clarkb | you also have the change number so could query the http rest api anonymously to get the status of that change (if that is allowed in your setup, it is on ours) | 18:40 |
y2kenny | ok. | 18:41 |
y2kenny | thanks | 18:41 |
clarkb | in general though I think we've found that having consistent job behavior that doesn't depend on external state is a good thing (basically if zuul thinks the job should run then that job should run the same way each time) | 18:42 |
clarkb | reduces variables that people have to consider when things don't work as expected | 18:43 |
y2kenny | that's a good tip | 18:43 |
y2kenny | this is still me trying to get around the not-wanting-to-use-gerrit-but-have-to-use-gerrit issue | 18:45 |
y2kenny | (I don't mean me not want to use gerrit) | 18:45 |
mnaser | corvus: is https://review.opendev.org/#/c/715418/ okay with you at this state? | 18:50 |
*** gmann is now known as gmann_lunch | 18:59 | |
corvus | mnaser, Open10K8S: lgtm, thanks! | 19:12 |
Open10K8S | corvus: my pleasure | 19:14 |
openstackgerrit | Ronelle Landy proposed zuul/zuul-jobs master: Add var to allow jobs to skip centos8 deps install https://review.opendev.org/715524 | 19:26 |
*** gmann_lunch is now known as gmann | 19:29 | |
*** y2kenny has quit IRC | 19:31 | |
openstackgerrit | Merged zuul/zuul-jobs master: install-kubernetes: fix missing package, fix the client.key file path https://review.opendev.org/715418 | 19:35 |
*** rf0lc0 has joined #zuul | 19:43 | |
*** ysandeep|away has quit IRC | 19:44 | |
*** ysandeep has joined #zuul | 19:45 | |
*** avass has quit IRC | 19:45 | |
*** kgz has quit IRC | 19:45 | |
*** rfolco has quit IRC | 19:46 | |
*** kgz has joined #zuul | 19:51 | |
*** y2kenny has joined #zuul | 20:03 | |
y2kenny | for untrusted projects, if I want to add playbooks and roles, do they have to be in specific locations? (i.e. playbooks/ and roles/ at the root, parallel to .zuul.d/, or can I place them under .zuul.d/playbooks/ and .zuul.d/roles/ ?) | 20:05 |
clarkb | y2kenny: I think the playbooks can be in arbitrary locations. I'm looking up the example I know of | 20:07 |
clarkb | oh wait no that's still in playbooks/ hrm | 20:08 |
mordred | yes - playbooks can be in arbitrary locations - you reference them by path | 20:08 |
y2kenny | but will ansible know where to look for the roles? | 20:08 |
y2kenny | (let say I created a custom role within the untrusted project.) | 20:09 |
mordred | so if you put them in .zuul.d/playbooks/foo.yaml you'd need to say run: .zuul.d/playbooks/foo.yaml - one thing to keep in mind though is that putting yaml files in .zuul.d might be a bad idea purely because zuul config files are yaml and it loads all the things in .zuul.d - I'm pretty sure it doesn't recurse - but still, it might be mentally confusing | 20:09 |
mordred | for roles - there are two answers | 20:09 |
mordred | if you want the repo to be used in zuul in a roles: statement so that zuul uses it as a roles repo - the roles dir needs to be in the root of the repo | 20:09 |
y2kenny | oh right... I wasn't thinking about the yaml thing. | 20:10 |
mordred | if it's just roles you're using from inside the playbook - you can put the roles dir adjacent to the playbook | 20:10 |
mordred | so you could have playbooks/foo.yaml and playbooks/roles/my-role - and then use my-role in foo.yaml | 20:10 |
mordred | but then you can't use that role in _other_ zuul jobs | 20:10 |
mordred | in different repos | 20:10 |
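Putting mordred's two answers together, a sketch of the repo layout and the corresponding job config (all paths and names here are illustrative, not from the discussion):

```yaml
# Repo layout (illustrative):
#   playbooks/foo.yaml        <- referenced from the job by path
#   playbooks/roles/my-role/  <- visible only to playbooks next to it
#   roles/shared-role/        <- usable from other repos via "roles:"
#
# Job definition in the untrusted project's .zuul.yaml:
- job:
    name: my-job
    run: playbooks/foo.yaml
    # "roles:" exposes this repo's top-level roles/ directory,
    # including to jobs defined in other repos
    roles:
      - zuul: my-org/my-untrusted-project
```

As mordred notes, keeping playbooks out of .zuul.d/ avoids confusion with Zuul's own config file loading, even though arbitrary paths work.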
y2kenny | ok. Thanks mordred and clarkb | 20:10 |
*** rlandy is now known as rlandy|brb | 20:42 | |
y2kenny | I was trying to run markdownlint in one of my untrusted projects. The role executed but I got "no such file or directory" for .markdownlint/node_modules/.bin/markdownlint, did I miss some pre step? | 21:01 |
corvus | y2kenny: maybe run ensure-markdownlint first | 21:15 |
*** rlandy|brb is now known as rlandy | 21:16 | |
corvus | y2kenny: or if you have markdownlint installed already, we may need to update that role | 21:16 |
y2kenny | ok. sorry, I missed the ensure roles. | 21:17 |
corvus | i think that was the last one we added before we figured out how to make roles like that work with software pre-installed or not | 21:17 |
openstackgerrit | Merged zuul/nodepool master: Add libc6-dev to bindep https://review.opendev.org/715216 | 21:17 |
openstackgerrit | Merged zuul/nodepool master: Pin docker images to 3.7 explicitly https://review.opendev.org/715043 | 21:17 |
corvus | (basically, the ensure-markdownlint role will put markdownlint there, and the markdownlint role expects it there. if it's already installed, we need to update the role to check) | 21:17 |
y2kenny | corvus: is there a convention to follow in terms of setting up roles? | 21:21 |
corvus | y2kenny: we're sort of establishing the convention of "ensure-foo" and "foo". and ideally you'd put ensure-foo in a pre playbook | 21:22 |
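A minimal sketch of that "ensure-foo" / "foo" convention as job config (file paths are assumed; `ensure-markdownlint` and `markdownlint` are the zuul-jobs roles discussed above):

```yaml
# Sketch: install the tool in a pre-run playbook, run it in the main
# playbook. As mordred notes below, a pre-run failure retries the job
# instead of failing it, which is why the install step goes there.
- job:
    name: my-markdownlint
    pre-run: playbooks/pre.yaml
    run: playbooks/run.yaml

# playbooks/pre.yaml
- hosts: all
  roles:
    - ensure-markdownlint

# playbooks/run.yaml
- hosts: all
  roles:
    - markdownlint
```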
y2kenny | like what are the trade-offs of having a pre-installed/configured image vs doing prep in pre-run | 21:22 |
corvus | time+space vs universal applicability and simplicity | 21:22 |
corvus | in opendev, we're actually trying to simplify our images to reduce our maintenance cost | 21:23 |
y2kenny | ok. so ensure->task->fetch is typical | 21:23 |
corvus | (but big things that we use a lot, we'll probably keep on our images because they're worth it) | 21:23 |
y2kenny | how about having tools as part of the source repository? Do you guys have use cases like that? | 21:24 |
corvus | y2kenny: yeah, we occasionally run things out of a "tools/" directory. there's even a zuul-jobs role to run "tools/test-setup.sh" | 21:24 |
y2kenny | (the first example that comes to mind is the checkpatch.pl that comes with the kernel but I am not sure how typical that type of things are.) | 21:24 |
y2kenny | ok I see | 21:24 |
corvus | y2kenny: yeah, that would be a good case | 21:24 |
mordred | we also split into ensure-foo and foo so that we can make jobs do the ensure-foo step in the pre-playbook ... which will retry the job if it fails | 21:26 |
*** y2kenny38 has joined #zuul | 21:29 | |
*** y2kenny43 has joined #zuul | 21:30 | |
*** y2kenny43 has left #zuul | 21:30 | |
*** y2kenny has quit IRC | 21:32 | |
*** y2kenny has joined #zuul | 21:44 | |
*** rlandy has quit IRC | 22:15 | |
mnaser | hmm | 22:32 |
mnaser | y2kenny, corvus: do we not have a markdownlint job already in place that someone can just use? | 22:32 |
y2kenny | mnaser: that's the one I was using | 22:32 |
mnaser | y2kenny: oh interesting, so that means the job doesn't install by default? hmm | 22:33 |
y2kenny | but I need to put ensure-markdownlint in first | 22:33 |
y2kenny | and to use ensure-markdownlint I have to have a task to install npm | 22:33 |
mnaser | tbh if you are using the 'markdownlint' *job* and it's not "just working" then that's a bug | 22:33 |
y2kenny | sounds like there's a convention of ensure-job before job | 22:33 |
mnaser | y2kenny: ensure-role before role | 22:34 |
y2kenny | right | 22:34 |
y2kenny | OH | 22:34 |
y2kenny | you mean there's a job? | 22:34 |
y2kenny | let me double check | 22:34 |
mnaser | that's what I'm wondering and checking :) | 22:34 |
mnaser | y2kenny: there is a job -- https://zuul-ci.org/docs/zuul-jobs/general-jobs.html#job-markdownlint | 22:34 |
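Consuming that canned job is then a one-line addition to the project stanza (a sketch, assuming the zuul-jobs repo is already loaded in the tenant config):

```yaml
# Sketch: use the markdownlint job shipped in zuul-jobs directly;
# its pre-run playbook already handles the ensure-markdownlint step,
# so no local playbooks or roles are needed.
- project:
    check:
      jobs:
        - markdownlint
```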
corvus | mnaser: i agree, and i apologize for not thinking to mention that earlier y2kenny. | 22:35 |
y2kenny | yes. I should've just used that | 22:35 |
y2kenny | lol | 22:35 |
corvus | it's friday for everyone, right? :) | 22:35 |
mnaser | y2kenny: that should be plug and play in that case for you :) | 22:35 |
mnaser | corvus: friday .. monday .. wednesday .. i don't even know what day it is anymore | 22:35 |
mnaser | ENOSOCIALINTERACTION | 22:35 |
y2kenny | I will give that a try. thanks folks. | 22:36 |
mnaser | y2kenny: no problem! | 22:42 |
y2kenny | um... looks like the markdownlint job requires a bit more than my node provides, and it failed | 22:46 |
y2kenny | it looks for gpg and my base image doesn't have it | 22:47 |
y2kenny | in project configuration, I believe I can add variables in a job for a pipeline. for example | 23:31 |
y2kenny | check: | 23:32 |
y2kenny |   jobs: | 23:32 |
y2kenny |     - name: some-job | 23:32 |
y2kenny |       vars: | 23:32 |
y2kenny |         var-name: "value" | 23:32 |
y2kenny | but for some reason I am getting "extra keys not allowed @ data['check']['jobs'][0]['vars']['var-name']" | 23:33 |
y2kenny | is this a yaml formatting thing or I misunderstood the documentation? | 23:33 |
clarkb | y2kenny: you need a : after some-job | 23:34 |
clarkb | or wait no maybe not since you used name: | 23:34 |
clarkb | looking for examples now | 23:35 |
fungi | name and vars are the keys under the job at index 0 | 23:35 |
fungi | so no, format looks correct to me | 23:35 |
clarkb | https://opendev.org/opendev/system-config/src/branch/master/.zuul.yaml#L1641-L1645 is how we do it | 23:35 |
clarkb | we don't use name: | 23:35 |
clarkb | (is that an optional format?) | 23:35 |
fungi | ahh, yeah we allow the name: thing for project definitions but maybe that's not a thing in job definitions | 23:37 |
y2kenny | clarkb: um... when I do that I get expected str for dictionary value @ data['check']['jobs'][0]['some-job'] | 23:38 |
clarkb | y2kenny: ya when you do it the way we do you have to add the : at the end then indent vars under some-job | 23:38 |
y2kenny | I did... but I think I may have missed an indent... | 23:39 |
clarkb | the v in vars should be under the m in some-job | 23:39 |
y2kenny | yup. That does it! | 23:41 |
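For reference, the working form clarkb describes, with the job name as a mapping key (trailing colon) and `vars` indented one level under it; job and variable names are illustrative:

```yaml
# The "v" in vars lines up under the "m" in some-job, so vars nests
# inside the job entry rather than becoming a sibling key.
- project:
    check:
      jobs:
        - some-job:
            vars:
              var-name: "value"
```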
y2kenny | what does zuul run when it checks this on the server? Is there a way for me to run it locally with some tool? | 23:41 |
y2kenny | (the multiple patchset is no big deal... I am just curious) | 23:42 |
clarkb | y2kenny: it's doing a yaml load, then running its own data consistency checks on the data structures that get loaded | 23:42 |
clarkb | y2kenny: you could pretty easily do the yaml load step and ensure it parses yaml properly, but the data consistency checks are best/easiest done by zuul itself | 23:42 |
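The parse-only check clarkb mentions can be sketched in a few lines of Python (assumes the third-party PyYAML package is installed; Zuul's own consistency checks still only run on the server):

```python
# Sketch: a local well-formedness check for a Zuul config snippet using
# PyYAML. This catches YAML-level mistakes like the indentation problem
# above, but not Zuul-level errors ("extra keys not allowed", unknown
# jobs, etc.), which only the scheduler can evaluate.
import yaml  # third-party PyYAML package

snippet = """
- project:
    check:
      jobs:
        - some-job:
            vars:
              var-name: "value"
"""

data = yaml.safe_load(snippet)
job = data[0]["project"]["check"]["jobs"][0]
print(job["some-job"]["vars"]["var-name"])
```

Running `yaml.safe_load` over the whole file is enough to confirm the structure parses the way you intended before pushing a patchset.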
y2kenny | right. Ok. | 23:43 |
y2kenny | thanks for the help clarkb, fungi | 23:43 |
fungi | yw | 23:44 |
fungi | and yes, performing a dry run of zuul configuration checking is nontrivial because its context can potentially include the state of any branch of any repository in your whole system | 23:45 |
fungi | basically to determine "what will my zuul think of this" you have to ask it, or run a complete enough copy of your deployment to include all the bits which could be involved in assembling the job | 23:46 |
*** ysandeep is now known as ysandeep|rover | 23:48 | |
y2kenny | fungi: I see | 23:55 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!