Thursday, 2021-02-25

*** uzumaki has joined #airshipit03:10
*** uzumaki has quit IRC04:49
*** uzumaki has joined #airshipit05:18
*** uzumaki has quit IRC05:22
*** happyhemant has joined #airshipit08:48
*** jhesketh_ has joined #airshipit09:57
*** jhesketh has quit IRC10:03
*** irclogbot_2 has quit IRC10:03
*** irclogbot_1 has joined #airshipit10:06
*** uzumaki has joined #airshipit12:48
*** roman_g has joined #airshipit13:01
airship-irc-bot3<scott> @kk6740 sorry for the delay. Attached is my airship log.14:20
airship-irc-bot3<mattmceuen> I went ahead and added it to the agenda14:32
*** uzumaki has quit IRC14:33
*** uzumaki has joined #airshipit14:41
airship-irc-bot3<sidney.shiba> Thanks.14:57
*** uzumaki has quit IRC15:05
airship-irc-bot3<kk6740> @scott this step takes a really long time. It is cluster-api deploying a k8s cluster on top of the node15:51
airship-irc-bot3<kk6740> you can watch that process by looking at the VM console and cluster-api logs15:51
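A minimal sketch of following the cluster-api controller logs mentioned above. It assumes upstream Cluster API's default namespace `capi-system` and deployment `capi-controller-manager`, and defaults the kubeconfig context to `ephemeral-cluster` (the context name used by the deployment script output later in this log); the helper name `watch_capi_logs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream the Cluster API core controller logs.
# Assumptions: upstream CAPI defaults (namespace capi-system, deployment
# capi-controller-manager) and a kubeconfig context named ephemeral-cluster.
watch_capi_logs() {
  local context=${1:-ephemeral-cluster}
  local namespace=${2:-capi-system}
  local deployment=${3:-capi-controller-manager}
  # deploy/NAME lets kubectl pick a pod of the deployment;
  # -f keeps streaming new lines until interrupted
  kubectl --context "$context" -n "$namespace" \
    logs "deploy/$deployment" --all-containers=true -f
}
```

For example, `KUBECONFIG=$HOME/.airship/kubeconfig watch_capi_logs`. Infrastructure provider controllers run in their own namespaces, which can be passed as the second and third arguments.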
airship-irc-bot3<mattmceuen> Hey all, design call will start asap, another meeting is finishing up16:00
airship-irc-bot3<steven.fitzpatrick> GM all. The other day I asked about some drift in the initinfra-target phase between treasuremap and airshipctl. If we look at the phase entrypoint, in airshipctl it's `site/test-site/target/initinfra`, and in treasuremap it's `type/airship-core/target/initinfra`. I'm trying to integrate my lma-infra composite into this phase, but I'm running into trouble again because the treasuremap initinfra-target phase calls the flux v1 `airshipctl/manifests/function/helm-operator`, and not the v2 `airshipctl/manifests/composite/flux-helm` used by airshipctl's initinfra-target phase. My composite was written with v2 in mind, so I need to get this sorted out for my PS to work. I've tried simply swapping out the composite, but that isn't working. Is anyone available to work through these issues with me today? TIA16:14
*** roman_g has quit IRC17:01
*** uzumaki has joined #airshipit17:05
*** happyhemant has quit IRC18:08
*** uzumaki has quit IRC18:35
*** uzumaki has joined #airshipit18:36
airship-irc-bot3<rishabh.k.jain> Hi, I am trying to set up a baremetal cluster in the Ericsson lab. However, the ephemeral node is failing to deploy. Has anyone encountered anything similar?18:52
airship-irc-bot3<rishabh.k.jain> ```
TASK [airshipctl-run-script : Run script ./tools/deployment/25_deploy_ephemeral_node.sh] ***************************************************************************************************
fatal: [primary]: FAILED! => {
    "changed": true,
    "cmd": "set -xe;\n./tools/deployment/25_deploy_ephemeral_node.sh\n",
    "delta": "0:00:26.792459",
    "end": "2021-02-25 18:38:26.807103",
    "rc": 1,
    "start": "2021-02-25 18:38:00.014644"
}

STDOUT:

Deploy ephemeral node using redfish with iso

STDERR:

+ ./tools/deployment/25_deploy_ephemeral_node.sh
+ export TIMEOUT=3600
+ TIMEOUT=3600
+ export KUBECONFIG=/home/airship/.airship/kubeconfig
+ KUBECONFIG=/home/airship/.airship/kubeconfig
+ export KUBECONFIG_EPHEMERAL_CONTEXT=ephemeral-cluster
+ KUBECONFIG_EPHEMERAL_CONTEXT=ephemeral-cluster
+ echo 'Deploy ephemeral node using redfish with iso'
+ airshipctl phase run remotedirect-ephemeral --debug
gpg: keybox '/tmp/pubring.kbx' created
gpg: /tmp/trustdb.gpg: trustdb created
gpg: key 3D16CEE4A27381B4: public key "SOPS Functional Tests Key 1 (https://github.com/mozilla/sops/) <secops@mozilla.com>" imported
gpg: key 3D16CEE4A27381B4: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
gpg: keybox '/tmp/pubring.kbx' created
gpg: /tmp/trustdb.gpg: trustdb created
gpg: key 3D16CEE4A27381B4: public key "SOPS Functional Tests Key 1 (https://github.com/mozilla/sops/) <secops@mozilla.com>" imported
gpg: key 3D16CEE4A27381B4: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
gpg: keybox '/tmp/pubring.kbx' created
gpg: /tmp/trustdb.gpg: trustdb created
gpg: key 3D16CEE4A27381B4: public key "SOPS Functional Tests Key 1 (https://github.com/mozilla/sops/) <secops@mozilla.com>" imported
gpg: key 3D16CEE4A27381B4: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
gpg: keybox '/tmp/pubring.kbx' created
gpg: /tmp/trustdb.gpg: trustdb created
gpg: key 3D16CEE4A27381B4: public key "SOPS Functional Tests Key 1 (https://github.com/mozilla/sops/) <secops@mozilla.com>" imported
gpg: key 3D16CEE4A27381B4: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
[airshipctl] 2021/02/25 18:38:19 opendev.org/airship/airshipctl@/pkg/cluster/clustermap/map.go:64: cluster  is not defined in cluster map &{{ClusterMap airshipit.org/v1alpha1} {main-map      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[airshipit.org/deploy-k8s:false] map[config.kubernetes.io/path:clustermap_main-map.yaml] [] []  []} map[ephemeral-cluster:0xc0005beaf0 target-cluster:0xc0005beb40]}
{"Message":"","Operation":"","Timestamp":"2021-02-25T18:38:19.375170603Z","Type":"Unknown event type: "}
gpg: keybox '/tmp/pubring.kbx' created
gpg: /tmp/trustdb.gpg: trustdb created
gpg: key 3D16CEE4A27381B4: public key "SOPS Functional Tests Key 1 (https://github.com/mozilla/sops/) <secops@mozilla.com>" imported
gpg: key 3D16CEE4A27381B4: secret key imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1
[airshipctl] 2021/02/25 18:38:22 opendev.org/airship/airshipctl@/pkg/inventory/baremetal/baremetal.go:73: Using selector {node02  } to filter one baremetal host
[airshipctl] 2021/02/25 18:38:22 opendev.org/airship/airshipctl@/pkg/remote/redfish/client.go:283: Bootstrapping ephemeral host with ID 'air-ephemeral' and BMC Address 'redfish+https://10.23.25.1:8443/redfish/v1/Systems/air-ephemeral'.
[airshipctl] 2021/02/25 18:38:23 opendev.org/airship/airshipctl@/pkg/remote/redfish/client.go:293: Ephemeral node has power status 'OFF'. Attempting to power on.
[airshipctl] 2021/02/25 18:38:26 opendev.org/airship/airshipctl@/pkg/events/processor.go:60: Received error on event channel {redfish client encountered an error: 500 INTERNAL SERVER ERROR BMC responded: 'Error changing power state at libvirt URI "qemu:///system": internal error: qemu unexpectedly closed the monitor: 2021-02-25T18:38:23.845220Z qemu-system-x86_64: error: failed to set MSR 0x38d to 0x0 qemu-system-x86_64: /build/qemu-jNW88T/qemu-2.11+dfsg/target/i386/kvm.c:1906: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.'}
Error events received on channel, errors are: [redfish client encountered an error: 500 INTERNAL SERVER ERROR BMC responded: 'Error changing power state at libvirt URI "qemu:///system": internal error: qemu unexpectedly closed the monitor: 2021-02-25T18:38:23.845220Z qemu-system-x86_64: error: failed to set MSR 0x38d to 0x0 qemu-system-x86_64: /build/qemu-jNW88T/qemu-2.11+dfsg/target/i386/kvm.c:1906: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.']

MSG:

non-zero return code

PLAY RECAP *********************************************************************************************************************************************************************************
primary                    : ok=8    changed=6    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
```18:52
airship-irc-bot3<sirajudeen.yasin> @rishabh.k.jain, looks like a libvirt issue. See if you can launch a small VM for testing19:06
*** raildo has joined #airshipit19:07
airship-irc-bot3<sirajudeen.yasin> Team, can I request some reviews on this treasuremap PS: https://review.opendev.org/c/airship/treasuremap/+/770446 (I validated the deployment in a local setup and in 3rd-party gating; it passed up to the workload stage)19:07
*** uzumaki has quit IRC19:33
*** uzumaki has joined #airshipit19:36
*** uzumaki has quit IRC19:37
airship-irc-bot3<mattmceuen> Hey team, @ih616h's airship-in-a-pod change failed its merge gate, with this line failing while trying to tease out the BMH definitions from the test-site controlplane: ```kustomize build --enable_alpha_plugins  src/opendev.org/airship/airshipctl/manifests/site/test-site/ephemeral/controlplane 2>/dev/null | kustomize cfg grep  "kind=BareMetalHost"```19:58
airship-irc-bot3<mattmceuen> I was able to run (essentially) the same line successfully locally against Ian's change, and I don't think anything in Ian's change would have impacted that job.  Anyone recognize what might have caused this?19:59
airship-irc-bot3<sirajudeen.yasin> @mattmceuen, we have the same issue in Zuul too; most of the retries are caused by this issue20:01
airship-irc-bot3<ih616h> @sirajudeen.yasin Thanks for the update - should I recheck, or just hold off?20:02
airship-irc-bot3<sirajudeen.yasin> for treasuremap, it has never passed this stage; I did several rechecks. I don't know whether that whole pool of nodes has issues. It used to work earlier... not sure what changed on the worker nodes20:03
airship-irc-bot3<mattmceuen> That's really strange - I wouldn't expect zuul node-by-node differences in kustomize behavior20:04
airship-irc-bot3<mattmceuen> when our scripts download kustomize20:04
airship-irc-bot3<sirajudeen.yasin> @mattmceuen, sorry, I forgot: are we downloading kustomize in the scripts?20:06
airship-irc-bot3<mattmceuen> @sirajudeen.yasin: https://github.com/airshipit/airshipctl/blob/master/roles/install-kustomize/tasks/main.yaml20:07
airship-irc-bot3<sirajudeen.yasin> thanks matt20:08
airship-irc-bot3<sirajudeen.yasin> https://zuul.opendev.org/t/openstack/builds?job_name=airship-treasuremap-deploy-test-site&project=airship/treasuremap20:18
airship-irc-bot3<sirajudeen.yasin> all the retry failures are at the same step20:19
airship-irc-bot3<mattmceuen> Are we getting that error 100% of the time in treasuremap on zuul?20:19
airship-irc-bot3<sirajudeen.yasin> yes Matt, I've been seeing this 100% of the time for the last few days20:20
airship-irc-bot3<mattmceuen> Are we getting it 100% of the time on airshipctl?  Or is Ian's PS special somehow...20:20
airship-irc-bot3<sirajudeen.yasin> not on the airshipctl gates; I have noticed it only very rarely in airshipctl. I can double-check and confirm20:21
airship-irc-bot3<mattmceuen> Yeah, that would be the questions 1) what's different about treasuremap, 2) why sometimes in airshipctl20:21
airship-irc-bot3<mattmceuen> *beg20:21
airship-irc-bot3<sirajudeen.yasin> maybe we should speak to Roman; I have already informed him about this. I don't know the difference between the nodes used; we can try to check from the labels20:23
airship-irc-bot3<sirajudeen.yasin> I see a few retries on airshipctl now for the same reason https://zuul.opendev.org/t/openstack/builds?job_name=airship-airshipctl-gate-script-runner20:24
airship-irc-bot3<mattmceuen> When I try running the command manually against treasuremap v2 I get this error (unlike success in airshipctl):
```
[madgin@leviathan:~/airship2/treasuremap]$ kustomize build --enable_alpha_plugins  manifests/site/test-site/ephemeral/controlplane | kustomize cfg grep  "kind=BareMetalHost"
failed to find any source resources identified by Gvk: ~G_~V_VariableCatalogue Name: generated-secrets
failed to find any source resources identified by Gvk: ~G_~V_VariableCatalogue Name: generated-secrets
Error: couldn't execute function: exit status 1
```20:25
airship-irc-bot3<mattmceuen> Is that expected @sirajudeen.yasin since the fixes for treasuremap haven't been merged yet?20:25
airship-irc-bot3<sirajudeen.yasin> you will have to set kustomize home20:26
airship-irc-bot3<sirajudeen.yasin> let me try20:26
airship-irc-bot3<mattmceuen> nope I don't think that's it20:27
airship-irc-bot3<sirajudeen.yasin> KUSTOMIZE_PLUGIN_HOME=. kustomize build --enable_alpha_plugins  manifests/site/test-site/ephemeral/controlplane | kustomize cfg grep  "kind=BareMetalHost"20:27
airship-irc-bot3<sirajudeen.yasin> this works for me20:27
airship-irc-bot3<sirajudeen.yasin> oh Sorry20:27
airship-irc-bot3<sirajudeen.yasin> yes, it is because of open PS20:27
airship-irc-bot3<mattmceuen> gotcha - thanks20:28
airship-irc-bot3<sirajudeen.yasin> with my PS, it should work20:28
airship-irc-bot3<mattmceuen> yeah, it wfm locally using your patchset too20:29
*** raildo_ has joined #airshipit20:35
*** mugsie_ has joined #airshipit20:36
*** mugsie has quit IRC20:36
*** raildo has quit IRC20:38
*** parallax has quit IRC20:38
*** mattmceuen has quit IRC20:38
*** airship-irc-bot has joined #airshipit20:39
*** airship-irc-bot3 has quit IRC20:39
*** parallax has joined #airshipit20:39
*** dasp has quit IRC20:43
*** dasp has joined #airshipit20:45
*** mattmceuen has joined #airshipit20:54
airship-irc-bot<mattmceuen> Hey @sirajudeen.yasin while we're waiting to see whether your change fixes the issue -- @mf4192 happened to reproduce the same error locally, but after freeing up some disk space and running the gate script again, it worked fine21:21
airship-irc-bot<mattmceuen> Not clear whether "freeing up disk space", "running the gate script again", or "black magic" was the fix in his case21:21
airship-irc-bot<mattmceuen> I don't understand why low disk space would cause that particular command to fail, but that's certainly something that could be different on a node-by-node basis21:23
airship-irc-bot<sirajudeen.yasin> that makes sense, Matt. Whenever I noticed this error, I also noticed a "no usable temporary directory" error: ```FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/zuul']```21:27
airship-irc-bot<sirajudeen.yasin> so due to the space issue it cannot even write anything into the buffer to pipe and read21:27
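The Python error quoted above comes from the standard `tempfile` module, which raises it when it cannot create a small scratch file in any candidate directory (it checks `TMPDIR`, `TEMP` and `TMP` first, then `/tmp`, `/var/tmp`, `/usr/tmp`, and finally the current working directory). A rough bash approximation of that probe, for checking a node by hand, might look like this; the helper name `first_usable_tmpdir` is hypothetical, and unlike Python it only tries the directories it is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first directory (from the arguments) in which
# a small scratch file can be created and written, roughly mirroring the probe
# Python's tempfile module runs. Returns 1 if no directory is usable.
first_usable_tmpdir() {
  local dir probe
  for dir in "$@"; do
    # mktemp fails if the directory is missing, read-only, or out of inodes
    probe=$(mktemp -p "$dir" 2>/dev/null) || continue
    # Writing a few bytes can still fail on a full filesystem (ENOSPC)
    if printf 'blat' 2>/dev/null >"$probe"; then
      rm -f "$probe"
      printf '%s\n' "$dir"
      return 0
    fi
    rm -f "$probe"
  done
  return 1
}

# Same candidates as in the error above; Python tries the working directory last
first_usable_tmpdir /tmp /var/tmp /usr/tmp "$PWD" \
  || echo "no usable temporary directory found" >&2
```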
airship-irc-bot<mattmceuen> ohh -- could it be that the kustomize download is actually failing silently, so there is no kustomize binary :slightly_smiling_face:21:27
airship-irc-bot<sirajudeen.yasin> that should have caused a different error: command not found21:28
airship-irc-bot<sirajudeen.yasin> in this case it might have space to write the output of the kustomize to a tmp path21:28
airship-irc-bot<mattmceuen> but there's the 2>/dev/null in there21:28
airship-irc-bot<sirajudeen.yasin> let me add more debugging to the same commit. Checking the binary path did not help; it was already correct21:29
airship-irc-bot<sirajudeen.yasin> 2>/dev/null => this is just to suppress the errors, right? stdout still needs to be written to some tmp to be read in the pipe operation21:32
airship-irc-bot<mattmceuen> yeah, kustomize prints info to stderr, so that stdout can be purely the yaml stream I guess21:33
airship-irc-bot<mattmceuen> I'm less sure of the "kustomize isn't there" theory -- I tested locally, and it looks like the 2>/dev/null swallows up the "file not found" error before the pipe, but I still got one from the command after the pipe21:34
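The behavior Matt describes can be reproduced in a few lines of bash: `2>/dev/null` on the first stage also discards bash's own "command not found" message, and without `pipefail` a pipeline's exit status is that of its last command only. The command name below is deliberately nonexistent, and `cat` stands in for the `kustomize cfg grep` stage so that the second stage succeeds.

```shell
#!/usr/bin/env bash
# A deliberately nonexistent command stands in for a missing kustomize binary;
# 2>/dev/null also swallows the "command not found" message bash prints for it.
set +o pipefail
no-such-kustomize-binary build . 2>/dev/null | cat
status_plain=$?        # 0: only the last stage counts, and cat succeeds on empty input

set -o pipefail
status_pipefail=0
no-such-kustomize-binary build . 2>/dev/null | cat || status_pipefail=$?  # 127: command not found
set +o pipefail

echo "without pipefail: $status_plain, with pipefail: $status_pipefail"
# prints "without pipefail: 0, with pipefail: 127"
```

Enabling `set -o pipefail` in a script makes a failure in any stage fail the whole pipeline, even when the error text itself has been redirected away.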
airship-irc-bot<sirajudeen.yasin> here is df -h just before running the kustomize command21:43
airship-irc-bot<sirajudeen.yasin>21:43
airship-irc-bot<mattmceuen> looks pretty full21:45
airship-irc-bot<sirajudeen.yasin> Ah! I removed the 2>/dev/null and am now seeing this issue: "2021-02-25 21:58:52.310074 | TASK [get BareMetalHost objects] 2021-02-25 21:58:56.239170 | primary | Unable to find image 'gcr.io/kpt-fn-contrib/sops:v0.1.0' locally"21:59
airship-irc-bot<sirajudeen.yasin> looks like this is the issue that's being eaten by 2>/dev/null... trying to understand when the sops plugin image gets downloaded22:03
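One way to keep kustomize's stderr out of the YAML stream without discarding it: send it to a temporary file and print that file only when the pipeline fails. This is a sketch under assumptions; the helper name `get_bmh_objects` is hypothetical, and the kustomize invocation mirrors the command quoted earlier in this log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the BareMetalHost documents rendered from a
# kustomization, capturing kustomize's stderr instead of sending it to /dev/null.
# The body runs in a subshell so that pipefail does not leak to the caller.
get_bmh_objects() (
  set -o pipefail
  kustomization=$1
  # Caveat: on a completely full disk even this small file may not be creatable
  err_log=$(mktemp) || exit 1
  rc=0
  kustomize build --enable_alpha_plugins "$kustomization" 2>"$err_log" \
    | kustomize cfg grep "kind=BareMetalHost" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "kustomize pipeline failed (exit $rc); its stderr was:" >&2
    cat "$err_log" >&2
  fi
  rm -f "$err_log"
  exit "$rc"
)
```

Usage would be along the lines of `get_bmh_objects manifests/site/test-site/ephemeral/controlplane`. Only the `build` stage's stderr is captured; the `cfg grep` stage still writes its errors to the terminal as usual.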
airship-irc-bot<mattmceuen> The plugin image should be downloaded during the kustomize run (by kustomize)22:05
airship-irc-bot<mattmceuen> So "unable to find" seems reasonable, as long as it tries/succeeds in downloading it22:05
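If the kustomize run is the first thing that needs the sops function image, a failed pull is easy to lose in the noise. A hypothetical pre-flight step (the helper name `ensure_image` is an assumption, not part of the gate scripts) can check for the image and pull it explicitly, so that a pull failure, for example from lack of disk space, is reported on its own before kustomize runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure a container image is present locally,
# pulling it explicitly if needed so that pull errors are reported directly.
ensure_image() {
  local image=$1
  # 'docker image inspect' exits non-zero when the image is not present locally
  if docker image inspect "$image" >/dev/null 2>&1; then
    return 0
  fi
  echo "pulling $image" >&2
  docker pull "$image"
}
```

For the image seen in this log that would be `ensure_image gcr.io/kpt-fn-contrib/sops:v0.1.0`, run before the kustomize step.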
airship-irc-bot<sirajudeen.yasin> we had a similar issue in airshipctl for replacement and templater22:05
airship-irc-bot<sirajudeen.yasin> we had to run make images; only then did it work22:05
airship-irc-bot<mattmceuen> that was a long time ago, right?22:05
airship-irc-bot<mattmceuen> well, never mind on that22:06
airship-irc-bot<mattmceuen> but in any case - the sops plugin doesn't live in airshipctl, so shouldn't have to run make images here22:06
airship-irc-bot<sirajudeen.yasin> I think I got the issue22:09
airship-irc-bot<mattmceuen> For comparison - if I don't have the image locally I see it try to download it:
```
[madgin@leviathan:~/airship2/airshipctl]$ kustomize build --enable_alpha_plugins  manifests/site/test-site/ephemeral/controlplane | kustomize cfg grep  "kind=BareMetalHost"
Unable to find image 'gcr.io/kpt-fn-contrib/sops:v0.1.0' locally
v0.1.0: Pulling from kpt-fn-contrib/sops
0a6724ff3fcd: Pulling fs layer
...
```22:11
airship-irc-bot<sirajudeen.yasin> i tried the same, but it works22:12
airship-irc-bot<mattmceuen> Mine up above works too -- I'm just saying that "Unable to find" is an ok message.  When you saw it in the gate, did it pull the image successfully?22:13
airship-irc-bot<sirajudeen.yasin> agreed22:13
*** airship-irc-bot1 has joined #airshipit23:34
*** airship-irc-bot has quit IRC23:34
airship-irc-bot1<sirajudeen.yasin> We ran df -h before running the kustomize command and had 288M free, while the sops image it attempted to download is 302MB. Could that be the reason?23:46
airship-irc-bot1<sirajudeen.yasin> so can we say that disk space on the nodes is the cause of the retry failures at the kustomize command?23:47
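With 288M free and a 302MB image to fetch, a pre-flight disk check would turn this into an explicit failure. A minimal sketch using GNU coreutils `df`; the helper name `require_free_mib`, the idea of checking Docker's data directory (commonly `/var/lib/docker`), and any threshold are assumptions. `df -BM` reports MiB, so the MB figures above are only approximated.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if the filesystem holding PATH
# has at least NEED_MIB mebibytes available (requires GNU coreutils df).
require_free_mib() {
  local path=$1 need_mib=$2 avail_mib
  # --output=avail prints a header line plus one value; -BM reports it in MiB
  avail_mib=$(df -BM --output=avail "$path" | tail -n 1 | tr -dc '0-9')
  if [ -z "$avail_mib" ] || [ "$avail_mib" -lt "$need_mib" ]; then
    echo "only ${avail_mib:-unknown}MiB free on $path; need ${need_mib}MiB" >&2
    return 1
  fi
}
```

For example, `require_free_mib /var/lib/docker 1024 || exit 1` before the kustomize step; the 1024 MiB threshold is arbitrary.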
airship-irc-bot1<mattmceuen> Ok got it.  So the next question, do we know what/how to clean up on the disks?23:47
airship-irc-bot1<mattmceuen> good detective work btw23:48
airship-irc-bot1<sirajudeen.yasin> :slightly_smiling_face:23:48
*** sreejithp has joined #airshipit23:59
airship-irc-bot1<sirajudeen.yasin> https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl02.openstack.org.yaml#L276 ==> looks like it should be 100GB, but we don't see that in df -h; I'm not clear on that23:59

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!