*** hongbin has joined #openstack-containers | 01:03 | |
*** ramishra has joined #openstack-containers | 02:09 | |
openstackgerrit | Feilong Wang proposed openstack/magnum master: Add health_status and health_status_reason to cluster https://review.openstack.org/570818 | 02:28 |
---|---|---|
openstackgerrit | Feilong Wang proposed openstack/magnum master: [k8s] Update cluster health status by native API https://review.openstack.org/572897 | 02:28 |
*** adrianreza has joined #openstack-containers | 02:30 | |
*** ramishra has quit IRC | 03:00 | |
*** ramishra has joined #openstack-containers | 03:08 | |
*** udesale has joined #openstack-containers | 03:27 | |
*** ykarel has joined #openstack-containers | 04:54 | |
*** hongbin has quit IRC | 05:27 | |
*** janki has joined #openstack-containers | 06:06 | |
*** Bhujay has joined #openstack-containers | 06:34 | |
*** Bhujay has quit IRC | 06:40 | |
*** pcaruana has joined #openstack-containers | 06:49 | |
*** Bhujay has joined #openstack-containers | 06:49 | |
*** Bhujay has quit IRC | 07:10 | |
*** sfilatov has joined #openstack-containers | 07:14 | |
*** Bhujay has joined #openstack-containers | 07:18 | |
*** mattgo has joined #openstack-containers | 07:29 | |
*** rcernin has quit IRC | 07:31 | |
openstackgerrit | Sergey Filatov proposed openstack/magnum master: Remove deprecated `tls-ca-file` option from kube-apiserver https://review.openstack.org/596647 | 07:32 |
sfilatov | Basically any cluster on 1.11 with tls enabled will fail ^^^ | 07:33 |
sfilatov | So I think this change should make it before the freeze | 07:33 |
ebbex | kaiokmo: Hey, how far have you come troubleshooting your kubernetes problem? I have a similar problem with a timeout, and have traced it down to "scripts/part-008" of cloud-init on the master fails. But I don't really understand why. | 07:34 |
sfilatov | I've checked cloud-init logs & logged into runc container to run it manually | 07:38 |
sfilatov | Oh, sry, I thought you referred to my msg :) | 07:39 |
ebbex | sfilatov: No worries, you wouldn't happen to have an idea as to why magnum/drivers/common/templates/kubernetes/fragments/write-kube-os-config.sh (part-008) would fail on a fedora-atomichost? | 07:42 |
sfilatov | ebbex: Have you checked that all the needed variables are in /etc/sysconfig/heat-params? | 07:44 |
ebbex | sfilatov: Yep, seems good to me. I can do "bash -x ...blahbla.../part-008", and the file gets created fine. | 07:48 |
ebbex | One strange thing maybe is the "WAIT_CURL": if I run that curl statement, I notice I have certificate issues. (self-signed) | 07:49 |
strigazi | sfilatov: are you here? | 07:49 |
sfilatov | ebbex: No clue then. You should look for the error in journalctl | 07:49 |
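A minimal sketch of the check sfilatov suggests above: parse a `KEY="value"` style file such as /etc/sysconfig/heat-params and report which expected variables are missing or empty. The parsing rules and the function names `parse_params` and `missing_params` are assumptions for illustration; they are not taken from magnum.

```python
import shlex


def parse_params(text):
    """Parse KEY=value lines (shell-style quoting) into a dict."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            # shlex.split strips surrounding quotes like a shell would.
            parts = shlex.split(value)
        except ValueError:
            # Unbalanced quotes: keep the raw text rather than fail.
            parts = [value]
        params[key.strip()] = parts[0] if parts else ""
    return params


def missing_params(text, required):
    """Return the required keys that are absent or empty."""
    params = parse_params(text)
    return [name for name in required if not params.get(name)]
```

On a node this could be called with the contents of /etc/sysconfig/heat-params and the list of variables that the failing fragment reads; that list has to come from the fragment itself.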
sfilatov | strigazi: yep | 07:49 |
strigazi | last week I pinged you here | 07:50 |
strigazi | regarding tls-ca-file | 07:50 |
strigazi | How are you using magnum? | 07:50 |
strigazi | you have a fork? | 07:50 |
sfilatov | strigazi: Yes | 07:50 |
sfilatov | strigazi: It's forked from ocata version | 07:51 |
strigazi | you are not using docker.io/openstackmagnum/ containers right? | 07:51 |
sfilatov | strigazi: Yes, we moved from it | 07:51 |
sfilatov | strigazi: Is there a difference in your version? | 07:52 |
strigazi | So for others tls-ca-file is not such a breaking problem | 07:52 |
strigazi | And we magnum we use tls | 07:52 |
strigazi | And in magnum we use tls | 07:52 |
strigazi | You were mentioning that we don't in the channel. | 07:52 |
sfilatov | strigazi: Why is that not a problem in your distribution | 07:52 |
strigazi | The tls-ca-file param is unset in the container running the apiserver | 07:53 |
strigazi | It is not per distribution | 07:53 |
strigazi | if you clone the rocky branch it works | 07:53 |
strigazi | The change is welcome of course to remove the old parameter. | 07:54 |
sfilatov | strigazi: It's interesting why it works then | 07:54 |
strigazi | How are you using magnum then? What OS? | 07:55 |
sfilatov | strigazi: fedora-atomic | 07:55 |
sfilatov | strigazi: It's very close to upstream I guess | 07:55 |
sfilatov | strigazi: Same OS used | 07:55 |
strigazi | running kubernetes in containers or you have a custom fedora-atomic | 07:56 |
sfilatov | Kubernetes is run in atomic containers | 07:56 |
sfilatov | and then we build kubernetes containers from the atomic repo | 07:57 |
strigazi | cool | 07:57 |
sfilatov | tls-ca-file is added explicitly | 07:57 |
sfilatov | if tls is not disabled | 07:57 |
sfilatov | so it's weird how your kube-apiserver is started | 07:58 |
strigazi | in the containers I have pushed upstream it is unset before starting | 07:58 |
sfilatov | Ah | 07:58 |
sfilatov | Didn't see that coming | 07:58 |
sfilatov | :) | 07:58 |
*** ykarel is now known as ykarel|lunch | 07:59 | |
strigazi | I tried to make the builds in the CI, but I was stuck, I'll pick it up again: | 07:59 |
sfilatov | Okay then, no questions then | 07:59 |
strigazi | https://review.openstack.org/#/c/585420/ | 07:59 |
*** mvpnitesh has joined #openstack-containers | 08:00 | |
sfilatov | Yeah, It would be great, I had this one in mind but didn't have time to check it out | 08:00 |
*** mvpnitesh has quit IRC | 08:34 | |
*** ykarel|lunch is now known as ykarel | 08:41 | |
*** Bhujay has quit IRC | 08:48 | |
*** Bhujay has joined #openstack-containers | 09:21 | |
*** mvpnitesh has joined #openstack-containers | 10:09 | |
*** mvpnitesh has quit IRC | 10:19 | |
*** pcaruana has quit IRC | 10:32 | |
*** pcaruana has joined #openstack-containers | 10:32 | |
*** slagle has quit IRC | 10:42 | |
*** slagle has joined #openstack-containers | 10:44 | |
*** flwang1 has joined #openstack-containers | 10:49 | |
flwang1 | strigazi: around for a quick sync? | 10:50 |
*** slagle has quit IRC | 10:55 | |
strigazi | flwang1: here | 11:09 |
flwang1 | strigazi: https://review.openstack.org/#/c/572897/ is basically ready for review | 11:10 |
flwang1 | now i'm just adding more unit tests to fix the cover job | 11:10 |
flwang1 | we need to think about the auto healing part to make sure the health_status_reason's data structure is good enough | 11:11 |
flwang1 | do you know if Ricardo has any PoC code for auto healing? | 11:11 |
strigazi | no, he hasn't | 11:13 |
strigazi | I'm reviewing now | 11:13 |
strigazi | oh flwang1 you were a glance core right? | 11:13 |
strigazi | :) | 11:13 |
flwang1 | yes, for years | 11:13 |
strigazi | I have a quick glance question, I'm helping a user here | 11:14 |
flwang1 | sure | 11:14 |
flwang1 | i'm listening | 11:14 |
strigazi | let's not pollute the containers channel, I'm pm-ing you | 11:14 |
flwang1 | no problem | 11:14 |
strigazi | I'll give a go to your patch now | 11:21 |
strigazi | have you started with the kubelet patch? It is not trivial to make kubelet and flannel work on the master | 11:22 |
flwang1 | i will start the kubelet work tomorrow | 11:22 |
flwang1 | and i will start to review/test your rolling upgrade patch | 11:24 |
strigazi | cool | 11:25 |
flwang1 | i have a question about magnum tempest for you | 11:25 |
strigazi | I have an old wip I rebased. I can push it and you can pick it up | 11:25 |
strigazi | for kubelet ^^ | 11:25 |
flwang1 | that would be nice | 11:25 |
flwang1 | as you know, last week i was working on the certified k8s stuff, so not much progress on coding | 11:26 |
strigazi | that was cool, thanks, it gave us visibility | 11:26 |
flwang1 | and my next plan on that part is enabling the functional testing to make our k8s tests work | 11:27 |
flwang1 | back to the tempest question, I can see magnum tempest only tests swarm mode, no k8s, and we only test k8s with magnum server's functional testing, is that correct? | 11:28 |
flwang1 | wait | 11:28 |
strigazi | afaik, no, let me find it | 11:28 |
flwang1 | i'm wrong, for functional testing, we're also only testing swarm mode | 11:29 |
strigazi | tempest tests only swarm-mode | 11:29 |
flwang1 | yep | 11:30 |
strigazi | but functional tests k8s and swarm-mode and more | 11:30 |
strigazi | coreos etc | 11:30 |
flwang1 | yes, and that's why our functional testing for k8s is failing | 11:30 |
flwang1 | ok, we're on the same page now | 11:30 |
flwang1 | back to the tempest question | 11:30 |
flwang1 | in Catalyst Cloud, we rely on tempest for e2e testing and monitoring | 11:31 |
flwang1 | we run tempest scenario tests periodically to make sure our cloud is working | 11:31 |
flwang1 | so we'd like to make magnum tempest support k8s | 11:31 |
strigazi | we do this with rally | 11:31 |
strigazi | we test cluster creations and k8s deployment with rally | 11:32 |
flwang1 | rally does not support scenario test IIRC | 11:32 |
flwang1 | does rally support a full functional e2e testing for k8s? | 11:33 |
strigazi | one sec | 11:33 |
strigazi | we can add a scenario here, just one method: https://github.com/openstack/rally-openstack/blob/master/rally_openstack/scenarios/magnum/k8s_pods.py | 11:35 |
strigazi | downstream we patched rally to use kubectl and not the pythonclient | 11:35 |
flwang1 | ok | 11:38 |
flwang1 | but we're not using rally :( | 11:38 |
*** udesale has quit IRC | 11:38 | |
flwang1 | are you using a dedicated rally service? | 11:38 |
strigazi | there is no real benefit in using tempest vs rally for this use case | 11:38 |
strigazi | since we have to use the sonobuoy client (or whatever the spelling is :)) | 11:39 |
flwang1 | i'm not 100 % sure if sonobuoy (k8s e2e testing) can cover network policy testing with calico | 11:40 |
flwang1 | because besides sonobuoy, we also want to cover network policy testing and persistent volume (cinder) testing | 11:41 |
strigazi | for those cases it will be a custom test right? | 11:41 |
flwang1 | but i think you're probably right, if we can get sonobuoy to work, then we're basically good enough | 11:41 |
*** ykarel is now known as ykarel|away | 11:42 | |
flwang1 | strigazi: what do you mean customer test? | 11:42 |
strigazi | test cinder and network policy | 11:42 |
strigazi | custom, not customer :) | 11:42 |
flwang1 | i'd like to get them into magnum tempest plugin | 11:42 |
strigazi | let's take it from the start. | 11:42 |
strigazi | we want to tests: | 11:42 |
flwang1 | since i think k8s+calico is a common use case | 11:42 |
strigazi | we want to test: | 11:42 |
strigazi | 1. network policies with calico | 11:43 |
strigazi | 2. cinder | 11:43 |
flwang1 | 3. DNS | 11:43 |
flwang1 | 4. basic k8s api | 11:43 |
flwang1 | 5. dashboard | 11:43 |
flwang1 | 6. sonobuoy | 11:43 |
strigazi | 3 and 4 are tested in sonobuoy | 11:43 |
sfilatov | Are LBs covered by sonobuoy? | 11:43 |
*** rtjure has quit IRC | 11:44 | |
strigazi | I | 11:44 |
flwang1 | sfilatov: i don't think so | 11:44 |
strigazi | I'm not sure | 11:44 |
strigazi | maybe it is in the tests that are skipped | 11:44 |
flwang1 | could be | 11:44 |
flwang1 | i think k8s e2e testing has covered 'everything', but we need to figure out the test suite which can meet magnum's requirements | 11:45 |
strigazi | I think we can have two sets of tests: | 11:46 |
*** ykarel|away has quit IRC | 11:46 | |
strigazi | Some that are fast like the ones we have atm in the functional tests | 11:46 |
strigazi | and then if those are passing we can run sonobuoy | 11:47 |
strigazi | for the dashboard we can just check the liveness probe of its pod? | 11:47 |
flwang1 | strigazi: probably, FWIW generally, if we can get sonobuoy working with tempest/rally, then we're in good shape | 11:50 |
flwang1 | because sonobuoy is the tempest for k8s | 11:50 |
strigazi | the downside is that it takes an hour no matter what | 11:50 |
flwang1 | strigazi: we can manage the test cases list i think | 11:51 |
flwang1 | start with baby steps (those we're interested in) | 11:51 |
flwang1 | in other words, if we think sonobuoy is good, then we should just do it | 11:52 |
flwang1 | at least it is endorsed by the CNCF | 11:52 |
strigazi | I still want some tests that are faster, doesn't make sense? | 11:54 |
flwang1 | it does make sense for sure | 11:55 |
flwang1 | hence why you can set the test list in tempest | 11:55 |
*** rtjure has joined #openstack-containers | 11:58 | |
sfilatov | So the idea is to call sonobuoy from tempest? | 11:58 |
flwang1 | we have no idea yet ;) | 11:58 |
flwang1 | just brainstorming and discussing | 11:59 |
strigazi | This is the most obvious | 11:59 |
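The two-tier idea discussed above (fast checks first, then the long conformance run only if they pass) can be sketched as follows. This is only an illustration: `run_tiered` and its arguments are hypothetical, and the slow suite is passed in as a plain callable because the log does not say how sonobuoy would be invoked.

```python
def run_tiered(fast_checks, slow_suite):
    """Run fast checks first; run the slow suite only if all of them pass.

    fast_checks: dict mapping a name to a zero-argument callable returning bool.
    slow_suite: zero-argument callable returning bool (for example a wrapper
                around a conformance run).
    """
    failed = [name for name, check in fast_checks.items() if not check()]
    if failed:
        # Skip the hour-long run when a cheap check has already failed.
        return {"fast_failed": failed, "slow_ran": False, "passed": False}
    return {"fast_failed": [], "slow_ran": True, "passed": bool(slow_suite())}
```

A dashboard liveness check or a DNS lookup would be entries in `fast_checks`; a tempest or rally scenario could then call `run_tiered` as its single test method.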
flwang1 | strigazi: for the auto healing feature, we may need another field or table to save the last status | 12:03 |
strigazi | we may need cluster versions ... this comes up all the time | 12:03 |
flwang1 | strigazi: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-repair | 12:04 |
flwang1 | what do you mean cluster versions? | 12:04 |
strigazi | keep the history of the cluster | 12:04 |
strigazi | whenever a change is done | 12:04 |
strigazi | scale, upgrade, heal | 12:05 |
flwang1 | yep | 12:05 |
flwang1 | i see | 12:06 |
flwang1 | cluster versions or template versions? | 12:06 |
strigazi | I guess both? | 12:06 |
flwang1 | because scaling and healing shouldn't change the cluster coe version | 12:07 |
flwang1 | only upgrade changes the coe version, but for upgrade, we're going to use templates to track different versions, right? | 12:07 |
strigazi | yes | 12:07 |
strigazi | but even when you scale you need some history | 12:08 |
strigazi | or heal | 12:08 |
flwang1 | so, we probably at least need to have a place to track the 'change-list' | 12:08 |
flwang1 | yes | 12:08 |
flwang1 | so better to call it 'change_history' or something like that | 12:09 |
strigazi | heat has events | 12:09 |
flwang1 | yep, similar stuff | 12:09 |
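A rough sketch of what a per-cluster 'change_history' could look like, assuming an append-only list of records for the three actions named above (scale, upgrade, heal). The class and field names are hypothetical and do not describe any existing magnum table or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Actions mentioned in the discussion; anything else is rejected.
ALLOWED_ACTIONS = {"scale", "upgrade", "heal"}


@dataclass
class ChangeRecord:
    action: str
    detail: dict
    timestamp: datetime


@dataclass
class ClusterHistory:
    cluster_id: str
    records: list = field(default_factory=list)

    def record(self, action, **detail):
        """Append one change; reject actions outside the agreed set."""
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        entry = ChangeRecord(action, detail, datetime.now(timezone.utc))
        self.records.append(entry)
        return entry

    def last(self, action=None):
        """Return the most recent record, optionally filtered by action."""
        for entry in reversed(self.records):
            if action is None or entry.action == action:
                return entry
        return None
```

With something like this, an auto-healing loop could call `last("heal")` to find the previous repair before deciding whether to act again.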
flwang1 | i have to go offline | 12:10 |
*** ktibi has joined #openstack-containers | 12:10 | |
flwang1 | anything you wanna discuss today? | 12:10 |
strigazi | I think I'm good, just that this week we can do one more release on thursday 1700 UTC | 12:11 |
sfilatov | flwang1: BTW Do you know current status for nodepools? | 12:11 |
strigazi | kind of morning for US | 12:11 |
flwang1 | strigazi: ok, let's get the health patches in then | 12:12 |
flwang1 | ttyl | 12:12 |
flwang1 | sfilatov: no, i have no idea | 12:12 |
strigazi | flwang1: we have someone that starts working on it here | 12:12 |
flwang1 | strigazi: ok, cool | 12:13 |
strigazi | sfilatov: ^^ | 12:13 |
strigazi | sfilatov: flwang1 this is stein work, that we should discuss during september | 12:13 |
sfilatov | strigazi: Okay. I thought there might be some updates on this | 12:14 |
*** ykarel|away has joined #openstack-containers | 12:31 | |
*** ykarel|away is now known as ykarel | 12:32 | |
*** ykarel_ has joined #openstack-containers | 12:36 | |
*** ykarel has quit IRC | 12:39 | |
*** rtjure has quit IRC | 12:43 | |
*** ramishra has quit IRC | 12:46 | |
*** rtjure has joined #openstack-containers | 12:48 | |
*** ykarel_ is now known as ykarel | 12:53 | |
*** janki has quit IRC | 12:55 | |
*** ykarel has quit IRC | 13:03 | |
*** ykarel has joined #openstack-containers | 13:03 | |
openstackgerrit | weizj proposed openstack/magnum master: Remove the space to keep docs consistence with others https://review.openstack.org/596734 | 13:49 |
*** rado_ has joined #openstack-containers | 14:05 | |
*** ramishra has joined #openstack-containers | 14:30 | |
*** rado_ has quit IRC | 14:34 | |
*** ianychoi has quit IRC | 14:36 | |
*** ianychoi has joined #openstack-containers | 14:42 | |
*** hongbin has joined #openstack-containers | 14:58 | |
*** Bhujay has quit IRC | 15:02 | |
*** pcaruana has quit IRC | 15:29 | |
*** Bhujay has joined #openstack-containers | 15:38 | |
*** ykarel has quit IRC | 15:42 | |
*** ykarel has joined #openstack-containers | 15:43 | |
*** Bhujay has quit IRC | 15:49 | |
*** ktibi_ has joined #openstack-containers | 16:00 | |
*** ktibi has quit IRC | 16:02 | |
*** sfilatov has quit IRC | 16:06 | |
*** ykarel is now known as ykarel|dinner | 16:26 | |
*** pcaruana has joined #openstack-containers | 16:26 | |
*** itlinux has joined #openstack-containers | 16:29 | |
*** ykarel|dinner has quit IRC | 16:33 | |
*** Bhujay has joined #openstack-containers | 17:00 | |
*** openstackgerrit has quit IRC | 17:04 | |
*** Bhujay has quit IRC | 17:05 | |
*** ykarel has joined #openstack-containers | 17:20 | |
*** Nisha_Agarwal has joined #openstack-containers | 17:21 | |
*** ykarel_ has joined #openstack-containers | 17:29 | |
*** ykarel has quit IRC | 17:31 | |
Nisha_Agarwal | flwang, ping | 17:32 |
Nisha_Agarwal | strigazi, ping | 17:34 |
*** mattgo has quit IRC | 17:47 | |
Nisha_Agarwal | hello team | 18:13 |
Nisha_Agarwal | i see that "curl --silent http://127.0.0.1:8080/healthz" request fails with connection error | 18:15 |
Nisha_Agarwal | could you tell me the root cause of this? | 18:15 |
Nisha_Agarwal | i see the script kube-apiserver-to-kubelet-role.sh loops at this statement forever while doing master creation. | 18:16 |
Nisha_Agarwal | and kube-apiserver service is also not coming up | 18:17 |
flwang1 | Nisha_Agarwal: it means your api server is not coming up as you said | 18:20 |
flwang1 | so now you need to check cloud-init-output.log to see which script failed | 18:20 |
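The healthz wait described above loops forever when the API server never starts. A bounded version can be sketched as below. This is a Python illustration and not the magnum shell script; the attempt count, the delay, and the injectable `check` and `sleep` parameters are assumptions added so the loop can be exercised without a running API server.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5):
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_healthz(url="http://127.0.0.1:8080/healthz",
                     attempts=30, delay=5.0, check=probe, sleep=time.sleep):
    """Poll until healthy; give up after `attempts` tries instead of looping forever."""
    for attempt in range(1, attempts + 1):
        if check(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"{url} not healthy after {attempts} attempts")
```

Failing with a clear error after a fixed number of tries makes the failing step visible in the cloud-init output, rather than leaving cluster creation to time out later.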
Nisha_Agarwal | flwang1, the kube-apiserver fails with this error | 18:20 |
Nisha_Agarwal | error creating self-signed certificates: open /var/run/kubernetes/apiserver.crt: permission denied | 18:20 |
Nisha_Agarwal | i have the tls_disabled set to true | 18:22 |
Nisha_Agarwal | and i am trying this on queens | 18:22 |
flwang1 | Nisha_Agarwal: hmm. though i don't think setting tls_disabled=true will cause that, can you try again without tls_disabled=true? | 18:24 |
ykarel_ | flwang1, Nisha_Agarwal https://bugs.launchpad.net/magnum/+bug/1714880 | 18:25 |
openstack | Launchpad bug 1714880 in Magnum "Not able to create TLS_DISABLED k8s cluster" [Undecided,New] | 18:25 |
*** ykarel_ is now known as ykarel | 18:25 | |
flwang1 | ykarel: interesting, does mounting /var/run/kubernetes work? | 18:26 |
flwang1 | Nisha_Agarwal: ^ | 18:26 |
Nisha_Agarwal | ykarel, this bug was reported around a year back. Do we have any fix in place? | 18:26 |
ykarel | flwang1, tl;dr | 18:26 |
ykarel | Nisha_Agarwal, as per strigazi comments there it seems not fixed yet | 18:27 |
flwang1 | ykarel: i agree with strigazi here | 18:27 |
ykarel | and if there is use case it can be tried fixing as he suggested | 18:27 |
flwang1 | we're enabling RBAC, and using TLS disabled cluster won't work | 18:28 |
ykarel | flwang1, yes, but if i remember correctly the bug was from the time when RBAC was not enabled | 18:28 |
flwang1 | Nisha_Agarwal: why do you want to use tls_disabled=true? | 18:28 |
Nisha_Agarwal | flwang1, what do u mean by mount the /var/run/kubernetes ? ....mounting it to any loop device or something else | 18:29 |
flwang1 | ykarel: yes, i'm talking about current status ;) | 18:29 |
ykarel | flwang1, ack :) | 18:29 |
Nisha_Agarwal | flwang1, hmmm there is no such requirement | 18:29 |
flwang1 | i will probably propose a patch to raise 400 when a user wants to set tls_disabled=true for k8s | 18:30 |
Nisha_Agarwal | flwang1, i remember when i was experimenting with cluster creation on pike i was facing some issue with tls_disabled set to false so i had set it to true | 18:30 |
flwang1 | Nisha_Agarwal: ok | 18:30 |
Nisha_Agarwal | flwang1, so used same settings in queens :) | 18:31 |
Nisha_Agarwal | Ok but makes sense with tls_disabled , rbac may not work | 18:31 |
flwang1 | queens and rocky are relatively stable | 18:31 |
Nisha_Agarwal | so i will try enabling tls | 18:32 |
Nisha_Agarwal | and see | 18:32 |
flwang1 | good luck | 18:33 |
Nisha_Agarwal | flwang1, thanks | 18:34 |
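The validation flwang1 mentions above (rejecting tls_disabled=true for Kubernetes with a 400) might look roughly like this. It is a sketch only: `BadRequest` and `validate_template` are hypothetical stand-ins, not magnum's actual exception class or API hook, and the coe strings are assumed values.

```python
class BadRequest(Exception):
    """Hypothetical stand-in for an API error that maps to HTTP 400."""
    code = 400


def validate_template(coe, tls_disabled):
    """Reject TLS-disabled Kubernetes templates; allow everything else."""
    # Assumption: "kubernetes" is the coe value used for k8s templates.
    if coe == "kubernetes" and tls_disabled:
        raise BadRequest(
            "tls_disabled=true is not supported for kubernetes clusters")
```

Rejecting the request up front would surface the problem at template or cluster creation time, not as an API server that never comes up on the master.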
*** ykarel is now known as ykarel|away | 18:38 | |
*** hongbin has quit IRC | 18:53 | |
*** openstackgerrit has joined #openstack-containers | 18:54 | |
openstackgerrit | Spyros Trigazis proposed openstack/magnum master: Add kubelet to the master nodes https://review.openstack.org/596860 | 18:54 |
openstackgerrit | Chuck Short proposed openstack/magnum master: Fix unit test failure with python3.6 https://review.openstack.org/596861 | 18:54 |
*** itlinux has quit IRC | 18:54 | |
*** hongbin has joined #openstack-containers | 18:54 | |
*** itlinux has joined #openstack-containers | 19:11 | |
*** pcaruana has quit IRC | 19:13 | |
*** itlinux has quit IRC | 19:16 | |
*** ykarel|away has quit IRC | 19:20 | |
*** ramishra has quit IRC | 19:38 | |
Nisha_Agarwal | flwang1, apiserver still doesn't come up | 19:45 |
Nisha_Agarwal | now i see this error | 19:45 |
Nisha_Agarwal | 1 cloudprovider.go:59] --external-hostname was not specified. Trying to get it from the cloud provider. | 19:45 |
Nisha_Agarwal | unable to load server certificate: open /etc/kubernetes/certs/server.crt: permission denied | 19:45 |
*** flwang1 has quit IRC | 19:48 | |
*** hongbin has quit IRC | 20:26 | |
*** hongbin has joined #openstack-containers | 20:27 | |
*** ktibi_ has quit IRC | 20:32 | |
*** Nisha_Agarwal has quit IRC | 20:37 | |
*** rcernin has joined #openstack-containers | 21:51 | |
*** pbourke has quit IRC | 22:40 | |
*** pbourke has joined #openstack-containers | 22:41 | |
*** hongbin has quit IRC | 22:45 | |
*** Jeffrey4l has quit IRC | 23:26 | |
*** Jeffrey4l has joined #openstack-containers | 23:28 |