Wednesday, 2021-07-14

airship-irc-bot<mattmceuen> Hey @sean.eagan I'm able to consistently reproduce an issue with current airshipctl (via airship in a pod), where 1. the `controlplane-ephemeral` phase completes 2. the subsequent interaction with the target cluster fails because it's not really up. I see this airshipctl/cliutils output:
```
[airshipctl] 2021/07/14 16:05:12 opendev.org/airship/airshipctl/pkg/k8s/applier/applier.go:77: Getting infos for bundle, inventory id is controlplane-ephemeral
...
baremetalhost.metal3.io/node01 is Current: Resource is current
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/cluster-controlplane is Current: Resource is current
metal3cluster.infrastructure.cluster.x-k8s.io/target-cluster is Current: Resource is current
[airshipctl] 2021/07/14 16:05:18 opendev.org/airship/airshipctl/pkg/k8s/applier/applier.go:92: applier channel closed
metal3machinetemplate.infrastructure.cluster.x-k8s.io/cluster-controlplane is Current: Resource is current
[airshipctl] 2021/07/14 16:05:18 opendev.org/airship/airshipctl/pkg/phase/client.go:295: executing phase: kubectl-get-node-target
```
But the state of the `kubeadmcontrolplane` is still:
```
root@airship-in-a-pod:/# kubectl --kubeconfig /root/.airship/kubeconfig --context ephemeral-cluster get kubeadmcontrolplane -A
NAMESPACE      NAME                   READY   INITIALIZED   REPLICAS   READY REPLICAS   UPDATED REPLICAS   UNAVAILABLE REPLICAS
target-infra   cluster-controlplane                         1                           1                  1
```16:26
airship-irc-bot<mattmceuen> We'd expect cliutils/airshipctl to wait till the kubeadmcontrolplane was in a READY status before proceeding, right?  Do you know of any corner cases that could defeat that?16:26
airship-irc-bot<kk6740> @mattmceuen no, that isn’t the case i think. because cluster-api objects don’t fully implement ready conditions. So as far as i remember, we had a special waiter that made sure control plane is reachable16:33
airship-irc-bot<mattmceuen> Ah ok - thanks Konstantine, that may be what's getting defeated (and may actually be the thing that's breaking with `Unable to connect to the server: dial tcp 10.23.25.102:6443: connect: no route to host`) -- I'll dig into that a bit16:35
airship-irc-bot<kk6740> what is the next phase that is coming after that?16:43
airship-irc-bot<kk6740> in airship in the pod env16:43
airship-irc-bot<mattmceuen> kubectl-get-node-target is what's defined in the gating phase plan (so I don't think specific to aiap)16:44
airship-irc-bot<mattmceuen> Later on we call a kubectl-wait-cluster-target -- maybe we should be calling it at that point as well?16:44
airship-irc-bot<kk6740> get node should be able to wait for node to become reachable, and retry 60 times with interval 30 seconds16:46
airship-irc-bot<mattmceuen> ok, let me make sure that's the phase that really gets called in aiap :slightly_smiling_face:16:46
airship-irc-bot<kk6740> i don’t remember aiap setup, maybe its not working with the plan at the moment16:47
airship-irc-bot<mattmceuen> It looks like `kubectl-get-node-target` is the phase that's failing, with
```
[airshipctl] 2021/07/14 16:05:18 opendev.org/airship/airshipctl/pkg/phase/client.go:295: executing phase: kubectl-get-node-target
{"Message":"starting generic container","Operation":"GenericContainerStart","Timestamp":"2021-07-14T16:05:18.894734428Z","Type":"GenericContainerEvent"}
[airshipctl] 2021/07/14 16:05:18 opendev.org/airship/airshipctl/pkg/k8s/kubeconfig/builder.go:258: Received error when extracting context, ignoring kubeconfig. Error: failed merging kubeconfig: source context 'target-cluster' does not exist in source kubeconfig
[airshipctl] 2021/07/14 16:05:18 opendev.org/airship/airshipctl/pkg/k8s/kubeconfig/builder.go:168: Merging kubecontext for cluster 'target-cluster', into site kubeconfig
[airshipctl] 2021/07/14 16:05:18 opendev.org/airship/airshipctl/pkg/phase/executors/container.go:184: Config reference is specified, looking for the object in config ref: '&ObjectReference{Kind:ConfigMap,Namespace:,Name:kubectl-get-node,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,}'
[airshipctl] 2021/07/14 16:05:19 Filtering input bundle by Group: , Version: , Kind:
+ kubectl --kubeconfig /kubeconfig --context target-cluster --request-timeout 10s get node
Unable to connect to the server: dial tcp 10.23.25.102:6443: connect: no route to host
```16:48
airship-irc-bot<kk6740> did it exit? because it should try it 60 times16:49
airship-irc-bot<kk6740> and output this error until it succeeds16:49
airship-irc-bot<mattmceuen> Yeah, it exited immediately16:49
airship-irc-bot<mattmceuen> no loop16:49
airship-irc-bot<mattmceuen> weird16:49
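The retry behaviour kk6740 describes above (60 attempts, 30 seconds apart) can be sketched as a small shell helper. This is a minimal illustration, not the contents of the airshipctl script: the `retry` function and its variable names are hypothetical, and only the attempt count, the interval and the kubectl flags in the usage comment come from the log.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <attempts> times,
# sleeping <interval> seconds between failed attempts.
# Usage: retry <attempts> <interval> <command> [args...]
retry() {
  retry_attempts=$1
  retry_interval=$2
  shift 2
  retry_n=1
  while [ "$retry_n" -le "$retry_attempts" ]; do
    # Running the command as an `if` condition keeps a failure
    # from aborting the script even when `set -e` is active.
    if "$@"; then
      return 0
    fi
    echo "attempt $retry_n/$retry_attempts failed: $*" >&2
    retry_n=$((retry_n + 1))
    if [ "$retry_n" -le "$retry_attempts" ]; then
      sleep "$retry_interval"
    fi
  done
  return 1
}

# Example call (not executed here), using the command seen in the log:
# retry 60 30 kubectl --kubeconfig /kubeconfig --context target-cluster \
#   --request-timeout 10s get node
```

With a loop like this, a `no route to host` error would be printed on each attempt and the phase would only fail after the last one, which is the behaviour expected in the discussion rather than the immediate exit that was observed.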
airship-irc-bot<kk6740> https://github.com/airshipit/airshipctl/blob/master/manifests/function/phase-helpers/wait_node/kubectl_wait_node.sh16:50
airship-irc-bot<kk6740> this is the script16:50
airship-irc-bot<mattmceuen> `set -xe`16:50
airship-irc-bot<mattmceuen> Plus `timeout 20 kubectl --context $KCTL_CONTEXT get node`16:51
airship-irc-bot<mattmceuen> won't that quit out if kubectl returns non-zero?16:51
airship-irc-bot<kk6740> ```"$(timeout 20 \               kubectl --context $KCTL_CONTEXT \               get node -o name | wc -l)"``` This is executed in a subshell, and i don’t think the `-xe` flags are propagated to the subshell16:51
airship-irc-bot<mattmceuen> ah I see16:52
airship-irc-bot<kk6740> but i have a feeling that this part succeeds ```"$(timeout 20 \               kubectl --context $KCTL_CONTEXT \               get node -o name | wc -l)"``` while this one fails: ```timeout 20 kubectl --context $KCTL_CONTEXT get node```16:53
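On the `set -xe` question: in a line like `"$(... get node -o name | wc -l)"`, the exit status of the pipeline is that of its last command (`wc -l`) unless `pipefail` is set, so a failing kubectl does not abort the script at that point, whereas a bare failing command does. A minimal sketch, with `false` standing in for a kubectl call that cannot reach the API server (the stand-in and the variable names are assumptions for illustration):

```shell
#!/bin/sh
set -e

# Case 1: the failing command is piped into `wc -l` inside a command
# substitution. The pipeline's status is that of `wc -l`, so the
# assignment succeeds and `set -e` does not trigger.
masked_count="$(false | wc -l | tr -d ' ')"
echo "assignment survived, count=$masked_count"

# Case 2: a bare failing command under `set -e`. It is run in a separate
# `sh` process so that this demo script itself keeps going.
bare_status=0
bare_output="$(sh -c 'set -e; false; echo "reached"')" || bare_status=$?
echo "bare command: status=$bare_status output='$bare_output'"
```

This matches kk6740's hunch that the counting line survives an unreachable API server while the plain `kubectl ... get node` line is the one that ends the script.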
airship-irc-bot<mattmceuen> This is what we see in the airshipctl output as the failing line: ```kubectl --kubeconfig /kubeconfig --context target-cluster --request-timeout 10s get node```16:53
airship-irc-bot<kk6740> w8 i am looking at the wrong script :slightly_smiling_face:16:53
airship-irc-bot<mattmceuen> That doesn't match either of the ones in the script, does it?16:53
airship-irc-bot<mattmceuen> hahaha16:53
airship-irc-bot<kk6740> https://github.com/airshipit/airshipctl/blob/master/manifests/function/phase-helpers/get_node/kubectl_get_node.sh16:54
airship-irc-bot<kk6740> can u check previous phase?16:54
airship-irc-bot<mattmceuen> Goes straight from `controlplane-ephemeral` -> `kubectl-get-node-target`16:55
airship-irc-bot<kk6740> that certainly looks like a wrong order16:55
airship-irc-bot<kk6740> because i think we need to wait for the node first, and then get it16:55
airship-irc-bot<mattmceuen> The strange thing is, why am I hitting this and the gates aren't?16:55
airship-irc-bot<kk6740> i actually dont think we need to get it at all :slightly_smiling_face:16:55
airship-irc-bot<mattmceuen> yeah :slightly_smiling_face:16:56
airship-irc-bot<kk6740> that may be because controlplane-ephemeral actually waits for some conditions16:56
airship-irc-bot<kk6740> but not exactly what we need16:57
airship-irc-bot<kk6740> so sometimes by the time it exits, node is available, and sometimes it's not16:57
airship-irc-bot<mattmceuen> yeah, so may be sensitive to the environment it's running in16:58
airship-irc-bot<kk6740> yes, so to mitigate that, we need different order in the plan16:58
airship-irc-bot<mattmceuen> what about the `kubectl-wait-cluster-target`, is this the waiting scenario it's designed for, or no?16:59
airship-irc-bot<kk6740> exactly that phase is working against target-cluster api, and it's not available yet at this point. But we can reuse the script, and create a phase `kubectl-wait-cluster-ephemeral` that would do the same for us17:02
airship-irc-bot<kk6740> but on ephemeral API17:02
airship-irc-bot<mattmceuen> ah I see17:02
airship-irc-bot<kk6740> wait-cluster waits for a cluster object to reach controlPlaneReady condition, so this is a good way for us to wait17:03
airship-irc-bot<kk6740> so it would be good for transparency and predictability: ```- kubectl-wait-cluster-ephemeral - kubectl-wait-node-target - kubectl-get-node-target``` 17:05
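A wait of the kind kk6740 describes, blocking until a Cluster object reports the controlPlaneReady condition through the ephemeral cluster's API, could look roughly like the sketch below. It is not the existing wait-cluster script: the function name, the `KUBECTL` override, the argument layout and the 600s default timeout are all hypothetical, and the Cluster object's name and namespace in the usage comment are assumptions based on the resource names in the log.

```shell
#!/bin/sh
# Sketch only: wait for a Cluster API Cluster object to report the
# controlPlaneReady condition, talking to the given kubeconfig context.
# KUBECTL can point at a different binary or a stub (hypothetical knob).
# Usage: wait_control_plane <context> <namespace> <cluster-name> [timeout]
wait_control_plane() {
  wcp_context=$1
  wcp_namespace=$2
  wcp_cluster=$3
  wcp_timeout=${4:-600s}   # assumed default, not taken from the source
  "${KUBECTL:-kubectl}" --context "$wcp_context" --namespace "$wcp_namespace" \
    wait --for=condition=controlPlaneReady --timeout="$wcp_timeout" \
    "cluster.cluster.x-k8s.io/$wcp_cluster"
}

# Example call (not executed here); the object name and namespace are assumed:
# wait_control_plane ephemeral-cluster target-infra target-cluster
```

Running such a wait against the ephemeral context before `kubectl-wait-node-target` and `kubectl-get-node-target` gives the ordering proposed in the plan above.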
airship-irc-bot<mattmceuen> I added the first two with: https://review.opendev.org/c/airship/airshipctl/+/800819 and will check whether it fixes my problem in aiap. We already have kubectl-get-node-target though17:34
airship-irc-bot<kk6740> :+1:  let's see how it goes17:37
airship-irc-bot<hr858f> Team, can I have reviews: https://review.opendev.org/c/airship/sip/+/800566? Thanks.22:05
