Friday, 2021-06-18

airship-irc-bot<ma257n> I am getting an etcd timeout ```[airshipctl] 2021/06/18 03:42:26 opendev.org/airship/airshipctl/pkg/events/processor.go:84: Received error when applying resources to kubernetes: polling for status failed: etcdserver: request timed out Error events received on channel, errors are: [polling for status failed: etcdserver: request timed out]``` during the ATT Airship 2.0 CI build for my PS. Is there any workaround for this? Thanks in advance. (PS06:32
airship-irc-bot = https://review.opendev.org/c/airship/airshipctl/+/783533)06:32
airship-irc-bot<sirajudeen.yasin> I am adding multiple rechecks to get this PS merged: https://review.opendev.org/c/airship/treasuremap/+/796930. It keeps hitting intermittent gate failures in the check and gate stages. Zuul: https://zuul.opendev.org/t/openstack/build/97a09c4286c3428da3a8cd55fa272848 From the log: helmrelease.helm.toolkit.fluxcd.io/ingress is InProgress: Helm upgrade failed: another operation (install/upgrade/rollback) is in progress13:45
airship-irc-bot<sidney.shiba> Hi Team, can I get some review for https://review.opendev.org/c/airship/treasuremap/+/796949? This PS adds the _dex-aio_ deployment to the _multi-tenant_ type for the Target cluster (i.e., the _workload-target_ phase), which is required for downstream.13:58
airship-irc-bot<sidney.shiba> Is anyone having issues accessing lab VMs today, i.e., auk56b? Note that I can access the VM in bsr22a.15:28
airship-irc-bot<sirajudeen.yasin> https://zuul.opendev.org/t/openstack/builds?job_name=airship-airshipctl-gate-script-runner&pipeline=gate   => several timeouts and failures at the *gate* stage, which is not usual15:54
airship-irc-bot<sidney.shiba> Hi Siraj, I had similar issues during `25_deploy_gating.sh` in Zuul and after three rechecks it went through. Maybe there is no correlation, but prior to that issue Zuul was failing in `36_deploy_workload.sh` because the master node was running out of resources, i.e., ephemeral disk, and the `ingress controller` was being evicted. I needed to change the `dex-aio` pods to be deployed on the worker node (`nodeSelector`) to make it work.16:01
airship-irc-bot<sirajudeen.yasin> Thanks Sidney. Is there a PS to move dex-aio to the worker node?16:54
airship-irc-bot<sidney.shiba> It has already been merged. The dex-aio is deployed using Helm charts; look at the `node_labels` attribute in treasuremap/manifests/function/dex-aio/dex-helmrelease.yaml. This attribute is used to override the value of _nodeSelector_ in the Deployment-dex.yaml template:  `nodeSelector:`   `node-role.kubernetes.io/worker: ""`17:44
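(For context, a minimal sketch of how such a HelmRelease values override could look. This is an assumption modeled on the description above, not the actual treasuremap manifest; only the `node_labels` attribute name and the worker label come from the chat.)

```yaml
# Hypothetical excerpt modeled on the dex-helmrelease.yaml discussion above;
# the real manifest in treasuremap may be structured differently.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: dex-aio
spec:
  values:
    # Rendered into the Deployment's nodeSelector so dex-aio pods land on
    # worker nodes instead of the resource-constrained control-plane node.
    node_labels:
      node-role.kubernetes.io/worker: ""
```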
airship-irc-bot<sirajudeen.yasin> oh! ok ... the builds are very unstable now in Zuul https://zuul.opendev.org/t/openstack/builds?job_name=airship-treasuremap-deploy-test-site19:01
airship-irc-bot<sirajudeen.yasin> have to do multiple rechecks and if it's lucky it will pass... I have not been lucky in the last 5+ rechecks19:02
airship-irc-bot<sidney.shiba> Hello, I am trying to test some updates to _*nc-regions-labs*_ using _*airshipctl*_ for the first time and am getting a strange error. I tried a simple "_*airshipctl phase list*_" and here is the error:  _unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/ubuntu/projects/nc-regions-labs'._  Would anybody know the cause and remedy?19:23
airship-irc-bot<raliev> looks like you have to update the root directory of this site with changes to metadata.yaml and add a kustomization.yaml file there, the way it was done here:  https://review.opendev.org/c/airship/airshipctl/+/792060/9/manifests/site/test-site/kustomization.yaml https://review.opendev.org/c/airship/airshipctl/+/792060/9/manifests/site/test-site/metadata.yaml19:40
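(For readers following along: a very rough sketch of the kind of site-root kustomization.yaml being described, so that kustomize can build the directory instead of failing with the "unable to find kustomization.yaml" error above. The entries are placeholders; the linked review shows the real content for test-site.)

```yaml
# manifests/site/<site-name>/kustomization.yaml -- hypothetical sketch only.
# A plain kustomize entrypoint at the site root that aggregates the site's
# documents; the actual resource list is in the review linked above.
resources:
  - phases    # placeholder entry
  - target    # placeholder entry
```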
airship-irc-bot<sidney.shiba> @raliev, checked and `nc-regions-labs` had already implemented this, so I needed to use the `airshipctl` master branch instead of the one pinned in `treasuremap`. It now gets past the previous issue but fails later on. I am guessing it is because `treasuremap/manifests/type/multi-tenant/phases` is not yet in sync with `airshipctl` master.  `$ airshipctl plan list` output: accumulating resources: 2 errors occurred: * accumulateFile error:20:59
airship-irc-bot"accumulating resources from '../../../../../nc-release/manifests/type/cloudharbor/phases/': '/home/ubuntu/projects/nc-release/manifests/type/cloudharbor/phases' must resolve to a file" * accumulateDirector error: "recursed accumulation of path '/home/ubuntu/projects/nc-release/manifests/type/cloudharbor/phases': accumulating resources: 2 errors occurred:\n\t* accumulateFile error: \"accumulating resources from20:59
airship-irc-bot'../../../../../treasuremap/manifests/type/multi-tenant/phases': '/home/ubuntu/projects/treasuremap/manifests/type/multi-tenant/phases' must resolve to a file\"\n\t* accumulateDirector error: \"recursed accumulation of path '/home/ubuntu/projects/treasuremap/manifests/type/multi-tenant/phases': 2 errors occurred:\\n\\t* accumulateFile error: \\\"accumulating resources from '../../../../../airshipctl/manifests/function/validator': evalsymlink20:59
airship-irc-botfailure on '/home/ubuntu/projects/airshipctl/manifests/function/validator' : lstat /home/ubuntu/projects/airshipctl/manifests/function/validator: no such file or directory\\\"\\n\\t* loader.New error: \\\"error loading ../../../../../airshipctl/manifests/function/validator with git: url lacks host: ../../../../../airshipctl/manifests/function/validator, dir: evalsymlink failure on '/home/ubuntu/projects/airshipctl/manifests/function/validator'20:59
airship-irc-bot: lstat /home/ubuntu/projects/airshipctl/manifests/function/validator: no such file or directory, get: invalid source string: ../../../../../airshipctl/manifests/function/validator\\\"\\n\\n\"\n\n"20:59
airship-irc-bot<raliev> airshipctl also has some new changes in validation: the `manifests/function/validator` transformer was removed, so you can safely remove all references to it from `kustomization.yaml` files21:02
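(In practice that clean-up is just deleting the entry from each affected kustomization.yaml. A hedged sketch below: only the validator path itself comes from the error output above, the other entry is a placeholder, and whether the reference sits under `resources:` or `transformers:` depends on the file.)

```yaml
# Hypothetical kustomization.yaml excerpt.
resources:
  - ../function/some-other-function                            # placeholder; keep the existing entries
  # - ../../../../../airshipctl/manifests/function/validator   # removed upstream; delete this reference
```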
airship-irc-bot<sean.eagan> It looks like the k8s master nodes are getting overloaded during the gates, which causes the ingress pods to get evicted: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_l[…]ds/ingress-ingress-nginx-controller-585567484-d5wcw.yaml ```status:   message: 'The node was low on resource: ephemeral-storage. '   phase: Failed   reason: Evicted   startTime: "2021-06-18T19:12:55Z"``` which causes21:20
airship-irc-botthis failure: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_l[…]ship-treasuremap-deploy-test-site/11786cb/job-output.txt ```2021-06-18 19:19:19.224226 | primary | + echo 'Ensure we can reach ingress controller default backend' 2021-06-18 19:19:19.224257 | primary | Ensure we can reach ingress controller default backend 2021-06-18 19:19:19.224274 | primary | + curl --head --write-out '%{http_code}'21:20
airship-irc-bot--silent --output /dev/null 10.23.25.102/should-404 2021-06-18 19:19:19.918262 | primary | + '[' 404 '!=' 000 ]``` Hoping that moving the ingress-controller to the worker nodes will help stabilize this: https://review.opendev.org/c/airship/treasuremap/+/797159  Do we need to go through and see what else can be moved to worker nodes as well?21:20
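(A minimal sketch of the kind of values override such a change could make, assuming the upstream ingress-nginx chart's `controller.nodeSelector` value; the actual patch in review 797159 may express this differently.)

```yaml
# Hypothetical HelmRelease values excerpt for the ingress controller.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: ingress
spec:
  values:
    controller:
      # Schedule the controller pods on worker nodes so control-plane
      # ephemeral-storage pressure no longer evicts them.
      nodeSelector:
        node-role.kubernetes.io/worker: ""
```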
airship-irc-bot<sidney.shiba> dex-aio has been moved to worker nodes because of this issue. It worked for me, but it seems the Zuul master node has reached the limits of its available resources and is starved.23:34
