Monday, 2018-08-27

*** hongbin has joined #openstack-containers01:03
*** ramishra has joined #openstack-containers02:09
openstackgerritFeilong Wang proposed openstack/magnum master: Add health_status and health_status_reason to cluster  https://review.openstack.org/57081802:28
openstackgerritFeilong Wang proposed openstack/magnum master: [k8s] Update cluster health status by native API  https://review.openstack.org/57289702:28
*** adrianreza has joined #openstack-containers02:30
*** ramishra has quit IRC03:00
*** ramishra has joined #openstack-containers03:08
*** udesale has joined #openstack-containers03:27
*** ykarel has joined #openstack-containers04:54
*** hongbin has quit IRC05:27
*** janki has joined #openstack-containers06:06
*** Bhujay has joined #openstack-containers06:34
*** Bhujay has quit IRC06:40
*** pcaruana has joined #openstack-containers06:49
*** Bhujay has joined #openstack-containers06:49
*** Bhujay has quit IRC07:10
*** sfilatov has joined #openstack-containers07:14
*** Bhujay has joined #openstack-containers07:18
*** mattgo has joined #openstack-containers07:29
*** rcernin has quit IRC07:31
openstackgerritSergey Filatov proposed openstack/magnum master: Remove deprecated `tls-ca-file` option from kube-apiserver  https://review.openstack.org/59664707:32
sfilatovBasically any cluster on 1.11 with tls enabled will fail ^^^07:33
sfilatovSo I think this change should make it before the freeze07:33
ebbexkaiokmo: Hey, how far have you come troubleshooting your kubernetes problem? I have a similar problem with a timeout, and have traced it down to "scripts/part-008" of cloud-init failing on the master. But I don't really understand why.07:34
sfilatovI've checked cloud-init logs & logged into runc container to run it manually07:38
sfilatovOh, sry, I thought you referred to my msg :)07:39
ebbexsfilatov: No worries, you wouldn't happen to have an idea as to why magnum/drivers/common/templates/kubernetes/fragments/write-kube-os-config.sh (part-008) would fail on a fedora-atomichost?07:42
sfilatovebbex: Have you checked that all the needed variables are in /etc/sysconfig/heat-params?07:44
ebbexsfilatov: Yep, seems good to me. I can do "bash -x ...blahbla.../part-008", and the file gets created fine.07:48
ebbexOne strange thing maybe is the "WAIT_CURL": if I run that curl statement, I notice I have certificate issues. (self-signed)07:49
strigazisfilatov: are you here?07:49
sfilatovebbex: No clue then. You should look for the error in journalctl07:49
sfilatovstrigazi: yep07:49
strigazilast week I pinged you here07:50
strigaziregarding tls-ca-file07:50
strigaziHow are you using magnum?07:50
strigaziyou have a fork?07:50
sfilatovstrigazi: Yes07:50
sfilatovstrigazi: It's forked from ocata version07:51
strigaziyou are not using docker.io/openstackmagnum/ containers right?07:51
sfilatovstrigazi: Yes, we moved from it07:51
sfilatovstrigazi: Is there a difference in your version?07:52
strigaziSo for others tls-ca-file is not such a breaking problem07:52
strigaziAnd in magnum we use tls07:52
strigaziYou were mentioning that we don't in the channel.07:52
sfilatovstrigazi: Why is that not a problem in your distribution07:52
strigaziThe tls-ca-file param is unset in the container running the apiserver07:53
strigaziIt is not per distribution07:53
strigaziif you clone the rocky branch it works07:53
strigaziThe change is welcome of course to remove the old parameter.07:54
sfilatovstrigazi: It's interesting why it works then07:54
strigaziHow are you using magnum then? What OS?07:55
sfilatovstrigazi: fedora-atomic07:55
sfilatovstrigazi: It's very close to upstream I guess07:55
sfilatovstrigazi: Same OS used07:55
strigazirunning kubernetes in containers or you have a custom fedora-atomic07:56
sfilatovKubernetes is run in atomic containers07:56
sfilatovand then we build kubernetes containers from atomic repo07:57
strigazicool07:57
sfilatovtls-ca-file is added explicitly07:57
sfilatovif tls is not disabled07:57
sfilatovso it's weird how your kube-apiserver is started07:58
strigaziin the containers I have pushed upstream it is unset before starting07:58
sfilatovAh07:58
sfilatovDidn't see that coming07:58
sfilatov:)07:58
*** ykarel is now known as ykarel|lunch07:59
strigaziI tried to make the builds in the CI, but I was stuck, I'll pick it up again:07:59
sfilatovOkay then, no questions then07:59
strigazihttps://review.openstack.org/#/c/585420/07:59
*** mvpnitesh has joined #openstack-containers08:00
sfilatovYeah, It would be great, I had this one in mind but didn't have time to check it out08:00
*** mvpnitesh has quit IRC08:34
*** ykarel|lunch is now known as ykarel08:41
*** Bhujay has quit IRC08:48
*** Bhujay has joined #openstack-containers09:21
*** mvpnitesh has joined #openstack-containers10:09
*** mvpnitesh has quit IRC10:19
*** pcaruana has quit IRC10:32
*** pcaruana has joined #openstack-containers10:32
*** slagle has quit IRC10:42
*** slagle has joined #openstack-containers10:44
*** flwang1 has joined #openstack-containers10:49
flwang1strigazi: around for a quick sync?10:50
*** slagle has quit IRC10:55
strigaziflwang1: here11:09
flwang1strigazi: https://review.openstack.org/#/c/572897/ is basically ready for review11:10
flwang1now i'm just adding more unit test to fix the cover job11:10
flwang1we need to think about the auto healing part to make sure the health_status_reason's data structure is good enough11:11
flwang1do you know if Ricardo has any PoC code for auto healing?11:11
strigazino, he hasn't11:13
strigaziI'm reviewing now11:13
strigazioh flwang1 you were a glance core right?11:13
strigazi:)11:13
flwang1yes, for years11:13
strigaziI have a quick glance question, I'm helping a user here11:14
flwang1sure11:14
flwang1i'm listening11:14
strigazilet's not pollute the containers channel, I'm pm-ing you11:14
flwang1no problem11:14
strigaziI'll give a go to your patch now11:21
strigazihave you started with the kubelet patch? It is not trivial to make kubelet and flannel work on the master11:22
flwang1i will start the kubelet work tomorrow11:22
flwang1and i will start to review/test your rolling upgrade patch11:24
strigazicool11:25
flwang1i have a question about magnum tempest for you11:25
strigaziI have an old wip I rebased. I can push it and you can pick it up11:25
strigazifor kubelet ^^11:25
flwang1that would be nice11:25
flwang1as you know, last week i was working on the certified k8s stuff, so not much progress on coding11:26
strigazithat was cool, thanks, it gave us visibility11:26
flwang1and my next plan on that part is enabling the functional testing to make our k8s test works11:27
flwang1back to the tempest question, I can see magnum tempest only test swarm mode, no k8s,  and we only test k8s with magnum server's function testing, is that correct?11:28
flwang1wait11:28
strigaziafaik, no, let me find it11:28
flwang1i'm wrong, for function testing, we're also only testing swarm mode11:29
strigazitempest tests only swarm-mode11:29
flwang1yep11:30
strigazibut functional tests k8s and swarm-mode and more11:30
strigazicoreos etc11:30
flwang1yes, and that's why our function testing for k8s is failing11:30
flwang1ok, we're on the same page now11:30
flwang1back to the tempest question11:30
flwang1in Catalyst Cloud, we rely on tempest for e2e testing and monitoring11:31
flwang1we run tempest scenario tests periodically to make sure our cloud is working11:31
flwang1so we'd like to make magnum tempest support k8s11:31
strigaziwe do this with rally11:31
strigaziwe test cluster creations and k8s deployment with rally11:32
flwang1rally does not support scenario test IIRC11:32
flwang1does rally support a full functional e2e testing for k8s?11:33
strigazione sec11:33
strigaziwe can add a scenario here, just one method: https://github.com/openstack/rally-openstack/blob/master/rally_openstack/scenarios/magnum/k8s_pods.py11:35
strigazidownstream we patched rally to use kubectl and not the pythonclient11:35
flwang1ok11:38
flwang1but we're not using rally :(11:38
*** udesale has quit IRC11:38
flwang1are you using a dedicated rally service?11:38
strigazithere is no real benefit in using tempest vs rally for this use case11:38
strigazisince we have to use the sonobuoy client (or whatever the spelling is :))11:39
flwang1i'm not 100 % sure if sonobuoy (k8s e2e testing) can cover network policy testing with calico11:40
flwang1because besides sonobuoy, we also want to cover network policy testing and persistent volume (cinder) testing11:41
strigazifor those cases it will be a custom test right?11:41
flwang1but i think you're probably right, if we can get sonobuoy to work, then we're basically good enough11:41
*** ykarel is now known as ykarel|away11:42
flwang1strigazi: what do you mean customer test?11:42
strigazitest cinder and network policy11:42
strigazicustom, not customer :)11:42
flwang1i'd like to get them into magnum tempest plugin11:42
strigazilet's take it from the start.11:42
flwang1since i think k8s+calico is a common user case11:42
strigaziwe want to test:11:42
strigazi1. network policies with calico11:43
strigazi2. cinder11:43
flwang13. DNS11:43
flwang14. basic k8s api11:43
flwang15. dashboard11:43
flwang16. sonobuoy11:43
strigazi3 and 4 are tested in sonobuoy11:43
sfilatovAre LB covered by sonobuoy?11:43
*** rtjure has quit IRC11:44
strigaziI11:44
flwang1sfilatov: i don't think so11:44
strigaziI'm not sure11:44
strigazimaybe it is in the tests that are skipped11:44
flwang1could be11:44
flwang1i think k8s e2e testing has covered 'everything', but we need to figure out the test suite which can meet magnum's requirements11:45
strigaziI think we can have two sets of tests:11:46
*** ykarel|away has quit IRC11:46
strigaziSome that are fast like  the ones we have atm in the functional tests11:46
strigaziand then if those are passing we can run sonobuoy11:47
strigazifor the dashboard we can just check the liveness probe of its pod?11:47
flwang1strigazi: probably, FWIW generally, if we can get sonobuoy working with tempest/rally, then we're in good shape11:50
flwang1because sonobuoy is the tempest for k8s11:50
strigazithe downside is that it takes an hour no matter what11:50
flwang1strigazi: we can manage the test cases list i think11:51
flwang1start with baby steps (those we're interested in)11:51
flwang1in other words, if we think sonobuoy is good, then we should just do it11:52
flwang1at least it is endorsed by CNCF foundation11:52
strigaziI still want some tests that are faster, doesn't make sense?11:54
flwang1it does make sense for sure11:55
flwang1hence why you can set the test list in tempest11:55
*** rtjure has joined #openstack-containers11:58
sfilatovSo the idea is to call sonobuoy from tempest?11:58
flwang1we have no idea yet ;)11:58
flwang1just brainstorming and discussing11:59
strigaziThis is the most obvious11:59
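The two-stage idea discussed above (fast checks first, then the hour-long sonobuoy run only if they pass) could be sketched roughly as follows. This is a minimal illustration, not Magnum, Tempest, Rally, or sonobuoy code: `run_staged` and the check names are hypothetical, and the callables stand in for whatever a real harness would execute.

```python
from typing import Callable, Dict, List, Tuple

# A named check: (name, zero-argument callable returning True on success).
Check = Tuple[str, Callable[[], bool]]


def run_staged(fast_checks: List[Check], slow_suite: Callable[[], bool]) -> Dict:
    """Run every fast check; start the slow suite only if all of them pass."""
    # All fast checks are executed so the report lists every failure.
    failed = [name for name, check in fast_checks if not check()]
    if failed:
        # A cheap check already failed, so skip the long-running suite.
        return {"fast_failed": failed, "slow_ran": False, "passed": False}
    return {"fast_failed": [], "slow_ran": True, "passed": slow_suite()}
```

In such a setup the fast list might hold things like a DNS lookup or a dashboard liveness probe (hypothetical examples), while `slow_suite` would wrap the full conformance run.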
flwang1strigazi: for the auto healing feature, we may need another field or table to save the last status12:03
strigaziwe may need cluster versions ... this comes up all the time12:03
flwang1strigazi: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-repair12:04
flwang1what do you mean cluster versions?12:04
strigazikeep the history of the cluster12:04
strigaziwhenever a change is done12:04
strigaziscale, upgrade, heal12:05
flwang1yep12:05
flwang1i see12:06
flwang1cluster versions or template versions?12:06
strigaziI guess both?12:06
flwang1because scaling and healing shouldn't change the cluster coe version12:07
flwang1only upgrade change the coe version, but for upgrade, we're going to use template to track different versions, is it?12:07
strigaziyes12:07
strigazibut even when you scale you need some history12:08
strigazior heal12:08
flwang1so, we probably at least need to have a place to track the 'change-list'12:08
flwang1yes12:08
flwang1so better to call it 'change_history' or something like that12:09
strigaziheat has events12:09
flwang1yep, similar stuff12:09
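As a rough picture of the 'change_history' idea above (one record per scale, upgrade, or heal, similar in spirit to heat events), a sketch could look like this. Everything here is hypothetical: `ChangeEvent`, `ClusterHistory`, and their fields are illustrative names, not an existing Magnum table or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# The three kinds of change named in the discussion.
ALLOWED_ACTIONS = {"scale", "upgrade", "heal"}


@dataclass
class ChangeEvent:
    action: str                        # one of ALLOWED_ACTIONS
    detail: str                        # free-form description of the change
    template_id: Optional[str] = None  # assumed to be set only for upgrades
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ClusterHistory:
    cluster_id: str
    events: List[ChangeEvent] = field(default_factory=list)

    def record(self, action: str, detail: str,
               template_id: Optional[str] = None) -> ChangeEvent:
        """Append one change to the history and return it."""
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        event = ChangeEvent(action, detail, template_id)
        self.events.append(event)
        return event

    def last(self, action: Optional[str] = None) -> Optional[ChangeEvent]:
        """Return the most recent event, optionally filtered by action."""
        for event in reversed(self.events):
            if action is None or event.action == action:
                return event
        return None
```

Keeping `template_id` on the event reflects the point made above that only upgrades change the COE version, while scale and heal still leave a trace.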
flwang1i have to go offline12:10
*** ktibi has joined #openstack-containers12:10
flwang1anything you wanna discuss today?12:10
strigaziI think I'm good, just that this week we can do one more release on thursday 1700 UTC12:11
sfilatovflwang1: BTW Do you know current status for nodepools?12:11
strigazikind of morning for US12:11
flwang1strigazi: ok, let's get the health patches in then12:12
flwang1ttyl12:12
flwang1sfilatov: no, i have no idea12:12
strigaziflwang1: we have someone that starts working on it here12:12
flwang1strigazi: ok, cool12:13
strigazisfilatov: ^^12:13
strigazisfilatov: flwang1 this is stein work, that we should discuss during september12:13
sfilatovstrigazi: Okay. I thought there might be some updates on this12:14
*** ykarel|away has joined #openstack-containers12:31
*** ykarel|away is now known as ykarel12:32
*** ykarel_ has joined #openstack-containers12:36
*** ykarel has quit IRC12:39
*** rtjure has quit IRC12:43
*** ramishra has quit IRC12:46
*** rtjure has joined #openstack-containers12:48
*** ykarel_ is now known as ykarel12:53
*** janki has quit IRC12:55
*** ykarel has quit IRC13:03
*** ykarel has joined #openstack-containers13:03
openstackgerritweizj proposed openstack/magnum master: Remove the space to keep docs consistence with others  https://review.openstack.org/59673413:49
*** rado_ has joined #openstack-containers14:05
*** ramishra has joined #openstack-containers14:30
*** rado_ has quit IRC14:34
*** ianychoi has quit IRC14:36
*** ianychoi has joined #openstack-containers14:42
*** hongbin has joined #openstack-containers14:58
*** Bhujay has quit IRC15:02
*** pcaruana has quit IRC15:29
*** Bhujay has joined #openstack-containers15:38
*** ykarel has quit IRC15:42
*** ykarel has joined #openstack-containers15:43
*** Bhujay has quit IRC15:49
*** ktibi_ has joined #openstack-containers16:00
*** ktibi has quit IRC16:02
*** sfilatov has quit IRC16:06
*** ykarel is now known as ykarel|dinner16:26
*** pcaruana has joined #openstack-containers16:26
*** itlinux has joined #openstack-containers16:29
*** ykarel|dinner has quit IRC16:33
*** Bhujay has joined #openstack-containers17:00
*** openstackgerrit has quit IRC17:04
*** Bhujay has quit IRC17:05
*** ykarel has joined #openstack-containers17:20
*** Nisha_Agarwal has joined #openstack-containers17:21
*** ykarel_ has joined #openstack-containers17:29
*** ykarel has quit IRC17:31
Nisha_Agarwalflwang, ping17:32
Nisha_Agarwalstrigazi, ping17:34
*** mattgo has quit IRC17:47
Nisha_Agarwalhello team18:13
Nisha_Agarwali see that "curl --silent http://127.0.0.1:8080/healthz" request fails with connection error18:15
Nisha_Agarwalcould you tell me the root cause of the same?18:15
Nisha_Agarwali see the script kube-apiserver-to-kubelet-role.sh loops at this statement for ever while doing master creation.18:16
Nisha_Agarwaland kube-apiserver service is also not coming up18:17
flwang1Nisha_Agarwal: it means your api server is not coming up as you said18:20
flwang1so now you need to check cloud-init-output.log to see which script failed18:20
Nisha_Agarwalflwang1, the kube-apiserver fails with this error18:20
Nisha_Agarwalerror creating self-signed certificates: open /var/run/kubernetes/apiserver.crt: permission denied18:20
Nisha_Agarwali have the tls_disabled set to true18:22
Nisha_Agarwaland i am trying this on queens18:22
flwang1Nisha_Agarwal: hmm. though i don't think set tls_disabled=true will cause that, can you try again without tls_disabled=true?18:24
ykarel_flwang1, Nisha_Agarwal https://bugs.launchpad.net/magnum/+bug/171488018:25
openstackLaunchpad bug 1714880 in Magnum "Not able to create TLS_DISABLED k8s cluster" [Undecided,New]18:25
*** ykarel_ is now known as ykarel18:25
flwang1ykarel: interesting, does mount the /var/run/kubernetes work?18:26
flwang1Nisha_Agarwal: ^18:26
Nisha_Agarwalykarel, this bug was reported around a year back. Do we have any fix in place?18:26
ykarelflwang1, tl;dr18:26
ykarelNisha_Agarwal, as per strigazi comments there it seems not fixed yet18:27
flwang1ykarel: i agree with strigazi here18:27
ykareland if there is use case it can be tried fixing as he suggested18:27
flwang1we're enabling RBAC, and using TLS disabled cluster won't work18:28
ykarelflwang1, yes, but if i remember correctly the bug was from the time when RBAC was not enabled18:28
flwang1Nisha_Agarwal: why do you want to use tls_disabled=true?18:28
Nisha_Agarwalflwang1, what do u mean by mount the /var/run/kubernetes ? ....mounting  it to any loop device or something else18:29
flwang1ykarel: yes, i'm talking about current status ;)18:29
ykarelflwang1, ack :)18:29
Nisha_Agarwalflwang1, hmmm there is no such requirement18:29
flwang1i will probably propose a patch to raise 400 when user want to set tls_disabled=true for k8s18:30
Nisha_Agarwalflwang1, i remember when i was experimenting with cluster creation on pike i was facing some issue with tls_disabled set to false so i had set it to true18:30
flwang1Nisha_Agarwal: ok18:30
Nisha_Agarwalflwang1, so used same settings in queens :)18:31
Nisha_AgarwalOk but makes sense with tls_disabled , rbac may not work18:31
flwang1queens and rocky are relatively stable18:31
Nisha_Agarwalso i will try enabling tls18:32
Nisha_Agarwaland see18:32
flwang1good luck18:33
Nisha_Agarwalflwang1, thanks18:34
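The endless wait on `curl --silent http://127.0.0.1:8080/healthz` described above hides the real failure when the API server never starts. A bounded wait might look roughly like the sketch below. It is an illustration only, not the logic of kube-apiserver-to-kubelet-role.sh: `http_ok` and `wait_for_healthz` are hypothetical helper names, and the attempt count and delay are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_healthz(probe: Callable[[], bool], attempts: int = 30,
                     delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe up to `attempts` times; give up instead of looping forever."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

A caller could pass `lambda: http_ok("http://127.0.0.1:8080/healthz")` as the probe and, on a `False` result, go straight to journalctl and cloud-init-output.log as suggested in the conversation.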
*** ykarel is now known as ykarel|away18:38
*** hongbin has quit IRC18:53
*** openstackgerrit has joined #openstack-containers18:54
openstackgerritSpyros Trigazis proposed openstack/magnum master: Add kubelet to the master nodes  https://review.openstack.org/59686018:54
openstackgerritChuck Short proposed openstack/magnum master: Fix unit test failure with python3.6  https://review.openstack.org/59686118:54
*** itlinux has quit IRC18:54
*** hongbin has joined #openstack-containers18:54
*** itlinux has joined #openstack-containers19:11
*** pcaruana has quit IRC19:13
*** itlinux has quit IRC19:16
*** ykarel|away has quit IRC19:20
*** ramishra has quit IRC19:38
Nisha_Agarwalflwang1, apiserver still doesn't come up19:45
Nisha_Agarwalnow i see this error19:45
Nisha_Agarwal  1 cloudprovider.go:59] --external-hostname was not specified. Trying to get it from the cloud provider.19:45
Nisha_Agarwalunable to load server certificate: open /etc/kubernetes/certs/server.crt: permission denied19:45
*** flwang1 has quit IRC19:48
*** hongbin has quit IRC20:26
*** hongbin has joined #openstack-containers20:27
*** ktibi_ has quit IRC20:32
*** Nisha_Agarwal has quit IRC20:37
*** rcernin has joined #openstack-containers21:51
*** pbourke has quit IRC22:40
*** pbourke has joined #openstack-containers22:41
*** hongbin has quit IRC22:45
*** Jeffrey4l has quit IRC23:26
*** Jeffrey4l has joined #openstack-containers23:28
