Wednesday, 2020-03-25

openstackgerritFeilong Wang proposed openstack/magnum master: [k8s] Upgrade calico/coredns to the latest stable version  https://review.opendev.org/70559902:06
*** xinliang has joined #openstack-containers03:28
*** xinliang has quit IRC03:51
*** flwang1 has quit IRC04:07
*** ykarel|away is now known as ykarel04:26
*** udesale has joined #openstack-containers04:51
*** vishalmanchanda has joined #openstack-containers05:03
*** vesper11 has quit IRC07:16
*** vesper has joined #openstack-containers07:16
*** sapd1_x has joined #openstack-containers08:18
*** ykarel is now known as ykarel|lunch08:39
*** xinliang has joined #openstack-containers08:43
*** flwang1 has joined #openstack-containers08:43
flwang1brtknr: ping08:43
flwang1strigazi: around?08:47
*** xinliang has quit IRC08:48
strigazio/08:56
flwang1strigazi: before the meeting, quick question08:57
flwang1did you see my email about the cluster upgrade?08:58
flwang1strigazi: did you ever think about the upgrade issue from fedora atomic to fedora coreos?08:58
strigaziI just saw it, not possible with the API. I tried to support it (mixing coreos and atomic) but you guys said no :)08:59
strigaziI don't think it is wise to pursue this08:59
strigaziWe channel users to use multiple clusters and drop the old ones09:00
strigaziUpgrade in place is more useful for CVEs09:00
strigaziat least that is our strategy at CERN09:00
flwang1i see. i tried and i realized it's very hard09:01
flwang1#startmeeting magnum09:01
openstackMeeting started Wed Mar 25 09:01:12 2020 UTC and is due to finish in 60 minutes.  The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.09:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.09:01
*** openstack changes topic to " (Meeting topic: magnum)"09:01
openstackThe meeting name has been set to 'magnum'09:01
flwang1#topic roll call09:01
*** openstack changes topic to "roll call (Meeting topic: magnum)"09:01
flwang1o/09:01
strigaziο/09:01
flwang1brtknr: ^09:01
brtknro/09:01
flwang1i think just us09:02
flwang1are you guys still safe?09:02
flwang1NZ will be in lockdown for the next 4 weeks :(09:02
strigaziall good here09:02
brtknryep, only left the house to go for a run yesterday but havent really left home for 2 weeks09:03
flwang1be kind and stay safe09:03
brtknrother than to go shopping09:03
brtknryou too!09:03
flwang1#topic update health status09:03
*** openstack changes topic to "update health status (Meeting topic: magnum)"09:03
flwang1thanks for the good review from brtknr09:03
flwang1i think it's in a good shape now09:04
flwang1and i have proposed a PR in magnum auto healer https://github.com/kubernetes/cloud-provider-openstack/pull/985 if you want to give it a try09:04
brtknri think it would still be good to try and pursue updating the reason only and letting magnum conductor infer health_status09:05
flwang1we (catalyst cloud) are keen to have this, because all our clusters are private and currently we can't monitor their status09:05
brtknrotherwise there will be multiple places with logic for determining health status09:05
flwang1brtknr: we can, but how about doing it in a separate follow-up patch?09:06
brtknralso why make 2 calls to the API when you can do this with one?09:06
flwang1i'm not feeling confident to change the internal health update logic in this patch09:07
flwang1strigazi: thoughts?09:08
strigaziI'm trying to understand which two calls you are talking about09:08
brtknr1 api call to update health_status and another api call to update health_status_reason09:09
flwang1brtknr: did we?09:09
strigazi1 call should be enough09:09
flwang1brtknr: i forgot the details09:09
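
On the "one call" point above: the magnum cluster API takes a JSON-Patch list, so a health update could in principle carry both fields in a single request. A minimal sketch, assuming the patch under review exposes /health_status and /health_status_reason through the normal cluster PATCH endpoint (endpoint URL, UUID and token are placeholders):

    curl -X PATCH "https://magnum.example.com/v1/clusters/$CLUSTER_UUID" \
      -H "X-Auth-Token: $TOKEN" \
      -H "OpenStack-API-Version: container-infra latest" \
      -H "Content-Type: application/json" \
      -d '[{"op": "replace", "path": "/health_status", "value": "HEALTHY"},
           {"op": "replace", "path": "/health_status_reason", "value": {"api": "ok"}}]'
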
flwang1I'm happy to improve it if it can be, my point is, i'd like to do it in a separate patch09:10
flwang1instead of mixing in this patch09:10
strigazi+1 to separate patch09:11
strigazi(gerrit should allow many patches in a change)09:11
strigazi(but it doesn't)09:11
flwang1strigazi: please help review the current one, thanks09:12
flwang1brtknr: are you ok with that?09:12
brtknralso the internal poller is always setting the status to UNKNOWN09:12
brtknrsomething needs to be done about that09:12
brtknrotherwise it will be like a lottery09:13
brtknr50% of the time, the status will be UNKNOWN09:13
brtknrwhich defeats the point of having an external updater09:13
flwang1brtknr: can you explain?09:14
flwang1it shouldn't be09:14
flwang1i'm happy to fix it in the following patch09:14
brtknrwhen I was testing it, the CLI would update the health_status, but the poller would reset it back to UNKNOWN09:15
flwang1if both api and worker nodes are OK, then the health status should be healthy09:15
flwang1brtknr: you're mixing the two things09:15
flwang1are you saying there is a bug with the internal health status update logic?09:16
flwang1i saw your patch for the corner case and I have already +2 on that, are you saying there is another potential bug?09:16
brtknrthere is no bug currently apart from the master_lb edge case09:16
flwang1good09:17
brtknrbut if we want to be able to update externally, its a race condition between internal poller and the external update09:17
brtknrthe internal poller sets it back to UNKNOWN every polling interval?09:17
brtknrmake sense?09:17
flwang1yep, but after we fixed the corner case by https://review.opendev.org/#/c/714589/, then we should be good09:17
flwang1strigazi: can you help review https://review.opendev.org/#/c/714589/ ?09:18
flwang1brtknr: are you ok we improve the health_status calculation in a separate patch?09:18
strigaziIf a magnum deployment is relying on an external thing, why not disable the conductor? I will have a look09:19
flwang1strigazi: it's not hard depedency09:20
strigaziit's not I know09:20
flwang1we can introduce a config if you think that's better09:20
flwang1a config to totally disable the internal polling for health status09:20
strigaziI mean for someone who uses the external controller it makes sense09:20
flwang1right09:21
strigaziwhat brtknr proposes makes sense in this patch09:21
brtknri am slightly uncomfortable with it because if we have the health_status calculation logic in both CPO and magnum-conductor, we need to make 2 patches if we ever want to change this logic... my argument is that we should do this in one place... we already have this logic in magnum-conductor so it makes sense to keep it there and let the magnum-auto-healer simply provide the health_status_reason09:21
brtknrand let it calculate the health_status reason... i'm okay with it being a separate patch but i'd like to test them together09:22
flwang1sure, i mean if the current patch is in good shape, we can get it in, which will make the following patch easier to test and review09:23
flwang1i just don't want to submit a large patch because we don't have any functional tests in the gate09:24
flwang1as you know, we're fully relying on our manual testing to keep the magnum code quality09:25
flwang1that's why i prefer to get smaller patches in09:25
flwang1hopefully that makes sense for this case09:25
brtknrok makes sense09:26
brtknrlets move to the next topic09:26
flwang1thanks, let's move on09:26
flwang1#topic https://review.opendev.org/#/c/714423/ - rootfs kubelet09:26
*** openstack changes topic to "https://review.opendev.org/#/c/714423/ - rootfs kubelet (Meeting topic: magnum)"09:26
flwang1brtknr: ^09:27
brtknrok so turns out mounting rootfs to kubelet fixes the cinder selinux issue09:27
brtknri tried mounting just the selinux specific things but that didnt help09:27
brtknrselinux specific things: /sys/fs/selinux, /var/lib/selinux/, /etc/selinux09:28
strigazikubelet has access to the docker socket or another cri socket. The least privileged pattern made little sense here.09:28
brtknrwe mounted /rootfs to kubelet in atomic; strigazi suggested doing this ages ago but flwang and i were cautious. we should take this09:29
*** xinliang has joined #openstack-containers09:29
flwang1brtknr: after taking this, do we still have to disable selinux?09:29
brtknrflwang1: nope09:29
brtknrit's up to you guys whether you want to take the selinux_mode patch09:30
brtknrit might be useful for other things09:30
strigazithe patch is useful09:30
flwang1if that's the case, i prefer to mount rootfs and still enable selinux09:30
brtknrok :) lets take both then :P09:31
brtknrselinux in fcos is always enabled by default09:31
flwang1i'm ok with that09:32
flwang1strigazi: ^09:32
strigaziof course I agree with it: optionally disabling a security feature (selinux) and giving extra access to an already super uber privileged process (kubelet)09:34
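
For reference, a sketch of how the outcome agreed here might look from the operator side, assuming the selinux_mode label from the patch above lands with enforcing/permissive/disabled as its accepted values (image and flavor names are placeholders):

    # keep selinux enforcing (the fcos default) and rely on the /rootfs mount
    # in the kubelet unit for the cinder volume case
    openstack coe cluster template create k8s-fcos \
      --coe kubernetes --image fedora-coreos-31 --external-network public \
      --flavor m1.medium --labels selinux_mode=enforcing
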
flwang1cool09:34
flwang1next topic?09:34
flwang1#topic https://review.opendev.org/#/c/714574/ - cluster name for network09:34
*** openstack changes topic to "https://review.opendev.org/#/c/714574/ - cluster name for network (Meeting topic: magnum)"09:34
flwang1i'm happy to take this one09:34
flwang1private as the network name is annoying sometimes09:35
brtknr:)09:35
brtknrglad you agree09:35
flwang1strigazi: ^09:36
flwang1anything else we need to discuss?09:36
strigaziis it an issue when two clusters with the same name exist?09:36
flwang1not a problem09:37
strigaziwe should do the same for subnets if not there09:37
brtknrnope, it will be the same as when there are two networks called private09:37
brtknrsubnets get their name from heat stack09:37
flwang1but sometimes it's not handy to find the correct network09:37
brtknre.g.  k8s-flannel-coreos-f2mpsj3k7y6i-network-2imn745rxgzv-private_subnet-27qmm3u76ubp09:37
brtknrso its not a problem there09:37
strigaziok09:38
strigazimakes sense09:38
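
The pain point being fixed: today the only reliable way to find the network a cluster created is via its Heat stack, whereas with https://review.opendev.org/#/c/714574/ the cluster name shows up in the network name itself. Illustrative commands (cluster name is an example):

    # current situation: go through the heat stack resources
    STACK_ID=$(openstack coe cluster show my-cluster -c stack_id -f value)
    openstack stack resource list "$STACK_ID" | grep 'OS::Neutron::Net'
    # after the change, a plain filter on the network list is enough
    openstack network list | grep my-cluster
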
*** ykarel|lunch is now known as ykarel09:38
flwang1anything else we should discuss?09:39
brtknrhmm i made a few patches yesterday09:39
brtknrhttps://review.opendev.org/71471909:40
brtknrchanging repo for etcd09:40
brtknris that okay with you guys09:40
brtknri prefer quay.io/coreos as it uses the same release tag as etcd development repo09:40
brtknrit annoys me that k8s.gcr.io drops the v from the release version09:41
flwang1building etcd system container for atomic?09:41
brtknralso on https://github.com/etcd-io/etcd/releases, they say they use quay.io/coreos/etcd as their secondary container registry09:41
strigaziwhere does the project publish its builds? We should use that one (i don't know which one it is)09:42
brtknri am also okay to use gcr.io/etcd-development/etcd09:42
brtknraccording to https://github.com/etcd-io/etcd/releases, they publish to gcr.io/etcd-development/etcd and quay.io/coreos/etcd officially09:42
flwang1i like quay.io since it's maintained by coreos09:43
brtknri am happy with either09:44
strigaziI would choose the primary, but for us it doesn't matter, we mirror09:44
flwang1agree09:44
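
Whichever registry ends up as the default, deployers who mirror (as strigazi notes) can already point a cluster at their own copy and pin the tag via labels; a hedged example, with the registry hostname as a placeholder:

    openstack coe cluster template create k8s-mirrored \
      --coe kubernetes --image fedora-coreos-31 --external-network public \
      --labels container_infra_prefix=registry.example.com/magnum/,etcd_tag=v3.4.3
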
flwang1brtknr: done?09:44
flwang1i have a question about metrics-server09:44
brtknrokay shall i change it to primary or leave it as secondary?09:44
flwang1when i run 'kubectl top node', i got :09:45
flwang1Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)09:45
brtknris your metric server running?09:45
flwang1yes09:46
brtknrflwang1: do you have this patch: https://review.opendev.org/#/c/705984/09:46
flwang1yes09:47
strigaziwhat the metrics-server logs say?09:47
flwang1http://paste.openstack.org/show/791116/09:48
flwang1http://paste.openstack.org/show/791115/09:48
flwang1i can't see much error from the metrics-server09:49
strigaziwhich one?09:49
strigazi16 or 1509:49
flwang179111609:49
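
For the "get nodes.metrics.k8s.io" error above, the usual first checks are whether the APIService fronting metrics-server is Available and what the aggregated endpoint returns; a short debugging sketch (resource names follow the stock metrics-server manifest, adjust the label selector if it differs):

    kubectl get apiservice v1beta1.metrics.k8s.io -o wide
    kubectl -n kube-system get pods -l k8s-app=metrics-server
    kubectl -n kube-system logs deploy/metrics-server
    kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
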
brtknrflwang1: is this master branch?09:49
flwang1yes09:49
flwang1i tested the calico and coredns09:49
flwang1maybe related to the calico issue09:50
flwang1i will test it again with a master branch, no calico change09:50
flwang1as for calico patch, strigazi, i do need your help09:50
flwang1i think i have done everything and i can't see anything wrong, but the connection between nodes and pods doesn't work09:51
brtknrflwang1: is this with calico plugin?09:51
brtknrits not working for me either with calico09:51
flwang1ok09:51
brtknrprobably to do with pod to pod communication issue09:51
brtknrits working with flannel09:52
flwang1then it should be the calico version upgrade issue09:52
strigazileft this in gerrit too "With ip encapsulation it works but the non-encapsulated mode is not working."09:52
*** ivve has joined #openstack-containers09:53
brtknrhow do you enable ip encapsulation?09:53
brtknrstrigazi:09:53
flwang1strigazi: just to be clear, you mean 'CALICO_IPV4POOL_IPIP' == 'Always' ?09:54
strigazihttps://review.opendev.org/#/c/705599/13/magnum/drivers/common/templates/kubernetes/fragments/calico-service.sh@45409:54
strigaziAlways09:54
strigaziyes09:54
strigaziNever should work though09:55
strigazias it used to work09:55
strigaziwhen you have SDN on SDN this can happen :)09:55
strigaziI mean being lost :)09:55
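
To check or flip the encapsulation mode strigazi refers to on a running cluster, the env var lives on the calico-node daemonset. Note this only affects how new IP pools are created, so it is a debugging aid rather than a fix (namespace and daemonset name assume the manifest the magnum fragment deploys):

    kubectl -n kube-system get ds calico-node -o yaml | grep -A1 CALICO_IPV4POOL_IPIP
    # hedged: force always-on IPIP to compare against the non-encapsulated behaviour
    kubectl -n kube-system set env ds/calico-node CALICO_IPV4POOL_IPIP=Always
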
flwang1strigazi: should we ask the calico team for help?09:56
strigaziyes09:56
flwang1and i just found it hard to debug because the toolbox doesn't work on fedora coreos09:56
strigaziin devstack we run calico on openvswitch09:56
flwang1so i can't use tcpdump to check the traffic09:56
flwang1strigazi: did you try it on prod?09:57
flwang1is it working?09:57
strigaziflwang1: come on, privileged daemonset with centos and install whatever you want :)09:57
strigazior add a sidecar to calico node09:57
flwang1strigazi: you mean just run a centos daemonset?09:57
strigazior exec in calico node, it is RHEL09:58
strigazimicrodnf install09:58
flwang1ok, will try09:58
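
The "exec in and install" suggestion above, spelled out as a hedged sketch (the label selector is an assumption; microdnf is what strigazi says the RHEL-based calico-node image ships):

    POD=$(kubectl -n kube-system get pods -l k8s-app=calico-node \
          -o jsonpath='{.items[0].metadata.name}')
    kubectl -n kube-system exec -it "$POD" -- \
      sh -c 'microdnf install -y tcpdump && tcpdump -ni any port 179'
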
flwang1strigazi: did you try it on prod? is it working?09:58
strigazia sidecar is optimal09:59
strigazinot yet, today, BUT09:59
strigaziin prod we don't run on openvswitch09:59
strigaziwe use linux-bridge09:59
strigaziso it may work09:59
strigaziI will update gerrit09:59
flwang1pls do, at least it can help us understand the issue09:59
flwang1should I split the calico and coredns upgrade into 2 patches?10:00
brtknrflwang1: probably good practice :)10:00
strigazias you want, it doesn't hurt10:00
flwang1i combine them because they're very critical services10:00
flwang1so i want to test them together for conformance test10:01
brtknrthey're not dependent on each other though right?10:01
flwang1no dependency10:01
strigazithey are not10:01
brtknrhave we ruled out the coredns upgrade as the cause of the regression?10:01
strigaziif you update coredns can you make it run on master too?10:01
flwang1i don't think it's related to coredns10:02
strigaziit can't be10:02
strigazitrust but verify though10:02
flwang1strigazi: you mean make coredns run only on the master node?10:02
strigaziflwang1: no, run in master as well10:02
brtknrstrigazi: why?10:02
flwang1ah, sure, i can do that10:02
brtknrwhy run on master as well?10:03
flwang1brtknr: i even want to run it only on master ;)10:03
strigazibecause the user might have a stupid app that will run next to coredns and kill it10:03
flwang1since it's critical service10:03
strigazithen things on master don't have DNS10:04
flwang1we also don't want to lose it when the worker node it runs on goes down10:04
flwang1let's end the meeting first10:04
brtknrok and I suppose we want it to run on workers too because we want the dns service to scale with the number of workers10:04
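
A hedged sketch of what "run it on master as well" could mean in practice: add a master toleration to the CoreDNS deployment so replicas may also land there. This assumes the stock coredns deployment name, that it already defines a tolerations list, and that masters carry the usual node-role.kubernetes.io/master taint:

    kubectl -n kube-system patch deployment coredns --type=json -p='[
      {"op": "add", "path": "/spec/template/spec/tolerations/-",
       "value": {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}}]'
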
flwang1#endmeeting10:05
*** openstack changes topic to "OpenStack Containers Team | Meeting: every Wednesday @ 9AM UTC | Agenda: https://etherpad.openstack.org/p/magnum-weekly-meeting"10:05
openstackMeeting ended Wed Mar 25 10:05:00 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)10:05
openstackMinutes:        http://eavesdrop.openstack.org/meetings/magnum/2020/magnum.2020-03-25-09.01.html10:05
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/magnum/2020/magnum.2020-03-25-09.01.txt10:05
openstackLog:            http://eavesdrop.openstack.org/meetings/magnum/2020/magnum.2020-03-25-09.01.log.html10:05
flwang1strigazi: if you have time, pls help debug the calico issue10:05
flwang1meanwhile, i will consult the calico team as well10:05
strigazithe argument about dns makes sense?10:05
strigaziflwang1: please cc me10:05
strigaziif it is public10:05
flwang1i will go to the calico slack channel10:05
strigazigithub issue too?10:06
flwang1good idea10:06
brtknri think they will try and ask for cash for advice :)10:06
flwang1i will cc you then10:06
flwang1brtknr: no, they won't ;)10:06
flwang1i asked them before and they're nice10:06
brtknrok maybe just the weave people then10:07
flwang1alright, i have to go, guys10:07
strigazigood night10:07
brtknrok sleep well!10:07
flwang1ttyl10:08
*** flwang1 has quit IRC10:08
*** rcernin has quit IRC10:17
*** trident has quit IRC10:29
*** trident has joined #openstack-containers10:31
*** trident has quit IRC10:33
*** xinliang has quit IRC10:36
*** trident has joined #openstack-containers10:37
*** sapd1_x has quit IRC11:06
*** yolanda has quit IRC11:27
*** markguz_ has quit IRC11:31
*** yolanda has joined #openstack-containers11:32
*** udesale_ has joined #openstack-containers12:42
*** udesale has quit IRC12:45
*** ramishra has quit IRC12:47
*** ramishra has joined #openstack-containers12:58
*** sapd1_x has joined #openstack-containers13:38
*** sapd1_x has quit IRC14:55
*** udesale_ has quit IRC14:57
*** ykarel is now known as ykarel|away15:25
brtknrcosmicsound: are you using etcd tag 3.4.3?15:38
brtknryou need to override it when using coreos15:38
brtknri think15:39
brtknrcheck that etcd is running on the master15:39
*** sapd1 has joined #openstack-containers15:41
*** sapd1 has quit IRC16:24
openstackgerritBharat Kunwar proposed openstack/magnum master: Build new autoscaler containers  https://review.opendev.org/71498616:27
*** tobias-urdin has joined #openstack-containers18:01
tobias-urdinquick question, if anybody knows, deployed a kubernetes v1.15.7 cluster with magnum18:11
tobias-urdinand using k8scloudprovider/openstack-cloud-controller-manager:v1.15.018:11
tobias-urdinis the openstack ccm v1.15.0 supposed to work with v1.15.7?18:11
tobias-urdinlxkong: maybe knows? ^18:12
tobias-urdinfails on:18:12
tobias-urdinkubectl create -f /srv/magnum/kubernetes/openstack-cloud-controller-manager.yaml18:12
tobias-urdinerror: SchemaError(io.k8s.api.autoscaling.v2beta1.ExternalMetricSource): invalid object doesn't have additional properties18:12
tobias-urdinhttps://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/kube-apiserver-to-kubelet-role.sh#L15418:12
tobias-urdincan reproduce, error message doesn't help to point out anything specific in the yaml file so probably an incompatibility issue18:12
tobias-urdini will try to respawn cluster with v1.15.0 instead, maybe openstack-cloud-controller-manager needs to release new versions to support stable v1.1518:13
*** irclogbot_1 has quit IRC18:37
tobias-urdinwith k8s v1.15.0 error: SchemaError(io.k8s.api.node.v1alpha1.RuntimeClassSpec): invalid object doesn't have additional properties18:48
*** irclogbot_0 has joined #openstack-containers19:01
NobodyCamGood Morning Folks; I am attempting to deploy a v1.15.9 kubernetes cluster with Rocky and having some issues.19:06
NobodyCam"kube_cluster_deploy" ends up timing out. are there tricks to get this working... I.e. setting calico tags differently? "kube_tag=v1.15.9,tiller_enabled=True,availability_zone=nova,calico_tag=v2.6.12,calico_cni_tag=v1.11.8,calico_kube_controllers_tag=v1.0.5,heat_container_agent_tag=rawhide"19:08
*** irclogbot_0 has quit IRC19:37
*** irclogbot_2 has joined #openstack-containers19:40
*** irclogbot_2 has quit IRC19:42
*** irclogbot_1 has joined #openstack-containers19:45
*** irclogbot_1 has quit IRC20:00
*** irclogbot_3 has joined #openstack-containers20:03
*** irclogbot_3 has quit IRC20:12
*** irclogbot_2 has joined #openstack-containers20:15
*** irclogbot_2 has quit IRC20:16
tobias-urdinthe issue seems to be the kubectl version in the heat-container-agent, if i copy the file openstack-cloud-controller-manager.yaml to my computer and run it from there it works20:17
tobias-urdin/var/lib/containers/atomic/heat-container-agent.0/rootfs/usr/bin/kubectl version20:18
tobias-urdinClient Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"archive", BuildDate:"2018-07-25T11:20:04Z", GoVersion:"go1.11beta2", Compiler:"gc", Platform:"linux/amd64"}20:18
tobias-urdinand locally20:18
tobias-urdin$kubectl version20:18
tobias-urdinClient Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.7", GitCommit:"6c143d35bb11d74970e7bc0b6c45b6bfdffc0bd4", GitTreeState:"clean", BuildDate:"2019-12-11T12:42:56Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}20:18
*** irclogbot_0 has joined #openstack-containers20:21
NobodyCamhttps://www.irccloud.com/pastebin/JjeXp3Pk/20:42
NobodyCamI end up with :20:44
NobodyCamcni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d20:44
NobodyCamkubelet.go:2173] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized20:44
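
Those two kubelet messages usually just mean the CNI daemonset never came up, so /etc/cni/net.d stays empty; worth checking from the master before adjusting any tags (illustrative commands):

    kubectl -n kube-system get pods -o wide   # is the calico/flannel pod running?
    ls /etc/cni/net.d/                        # stays empty until the CNI pod writes its config
    journalctl -u kubelet --no-pager | tail -n 50
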
tobias-urdinNobodyCam: sorry was following up on my issue before yours, not related20:44
NobodyCam:) All Good!20:45
NobodyCamI was following your issue because it on the surface seemed close to what I was seeing locally20:46
brtknrtobias-urdin: which version of magnum are you running?21:02
brtknrNobodyCam: I wouldn’t change the default calico_tag and calico_cni_tag21:04
brtknralso i think the latest kube_tag supported in rocky is v1.15.1121:04
NobodyCambrtknr: Thank you :) I have attempted with the Defaults too21:08
tobias-urdinbrtknr: 7.2.0 rocky release21:26
*** flwang1 has joined #openstack-containers22:11
flwang1brtknr: ping22:11
brtknrflwang1: pong22:17
brtknri was waiting for u!22:17
flwang1brtknr: :)22:17
flwang1brtknr: i'm reviewing the logic of _poll_health_status22:18
flwang1i don't really understand why you said 2 api calls to get the health status22:18
flwang1brtknr: are you still there?22:21
*** Jeffrey4l has quit IRC22:27
brtknrflwang1: yes sorry im trying to get my dsvm up again after my calico cluster was repeatedly failing on the clean master branch22:28
brtknrflwang1: ok i take back what i said about needing two api calls22:29
*** Jeffrey4l has joined #openstack-containers22:29
brtknri didnt realise that it was possible to make multiple updates in a single api call22:29
brtknrthat said, i am still not a great fan of logic for determining health status being in the magnum-auto-healer :(22:30
brtknri feel like the easiest time to make this change is now, it will only become harder to change this once it is merged22:31
flwang1so you mean totally don't allow update the health_status field?22:32
flwang1my point is, the health_status_reason is really a dict/json, and anything could be inside, depending on how the cloud provider would like to leverage it22:33
flwang1for example, i'm trying to put 'updated_at' into the health_status_reason, so that 3rd party monitoring code can get more information from there22:34
flwang1if we totally limit the format of the health_status_reason, then we will lose the flexibility22:35
flwang1brtknr: ^22:35
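
A hedged illustration of the flexibility being argued for: the reason field is free-form JSON, so an external healer could carry extra keys such as updated_at alongside the entries magnum's own poller writes (everything beyond the api/Ready keys is an assumption here):

    cat <<'EOF' > health_status_reason.json
    {
      "api": "ok",
      "my-cluster-node-0.Ready": "True",
      "updated_at": "2020-03-25T22:30:00Z"
    }
    EOF
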
*** vishalmanchanda has quit IRC22:35
brtknrflwang1: yes, i mean prevent update of health_status field22:36
flwang1I can see your point, we get a bit of benefit but meanwhile we will lose a lot of flexibility22:36
brtknrand let the magnum-conductor work it out22:36
flwang1brtknr: another thing is22:37
flwang1our current health_status_reason is quite simple, as you can see, it only gets the Ready condition now22:37
flwang1if you put in more information, say supporting more node and master conditions, then the calculation could be a mess22:38
flwang1which i don't think magnum should control that much22:38
flwang1in other words, when I designed this22:39
flwang1the main thing the cloud provider admin cares about is the health status, and the health_status_reason is just a reference, not the reverse22:39
flwang1brtknr: TBH, i don't want to maintain such a logic in magnum22:40
flwang1magnum is a platform; as long as we open these 2 fields for the cloud admin, we want to grant flexibility instead of limiting it22:41
brtknrok fine i see your argument22:42
NobodyCamI am able to deploy up to v1.13.11 on my rocky install22:42
brtknrif we have the option to disable the polling from magnum side, i would be happy with that solution22:42
NobodyCam1.14.# and above fail22:42
flwang1brtknr: you mean totally disable it? for that case, we probably have to introduce a config22:43
flwang1but actually, it's really a cluster-by-cluster config22:43
NobodyCamhttps://wiki.openstack.org/wiki/Magnum#Compatibility_Matrix says 1.15.X should work?22:44
flwang1i don't think totally disable it is a good idea, TBH22:44
brtknrflwang1: it doesnt make sense for the internal poller and magnum auto healer to be stepping on each other's toes22:45
brtknrNobodyCam: I got 1.15.x working when i last tested rocky22:45
brtknri probably had to use heat_container_agent_tag=train-stable22:46
brtknri cant remember 100%22:46
NobodyCamnice! I'm still working on it...22:46
NobodyCamoh Thank you I can try that22:46
flwang1brtknr: as i mentioned above, some clusters may be public with no auto healer running on them, while others may be private with the auto healer running on them22:46
brtknrNobodyCam: actually try train-stable-222:46
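
Putting the suggestions for Rocky together as a hedged example; only kube_tag and heat_container_agent_tag come from the discussion here, the image, flavors and network driver are placeholders:

    openstack coe cluster template create k8s-rocky-v115 \
      --coe kubernetes --image fedora-atomic-latest --external-network public \
      --flavor m1.medium --master-flavor m1.medium --network-driver calico \
      --labels kube_tag=v1.15.9,heat_container_agent_tag=train-stable-2
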
brtknrflwang1: i meant an option to disable it for each cluster22:47
brtknre.g. if auto healer is running22:47
brtknrcould disable automatically if auto healer is running22:48
flwang1you mean checking for the magnum-auto-healer when doing the API accessibility validation?22:48
flwang1or a separate function to disable it?22:48
flwang1no problem,i can do that22:49
brtknrsomething that stops the internal poller and magnum auto healer fighting like cats and dogs22:51
flwang1sure, i will fix it in next ps22:54
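
How the per-cluster switch gets signalled is left to the follow-up patchset, but the existing auto-healing labels already tell magnum that an external healer owns the cluster, so they are an obvious hook; a hedged sketch (whether the conductor actually keys off these labels is exactly what the next patchset would decide):

    openstack coe cluster create private-k8s \
      --cluster-template k8s-fcos \
      --labels auto_healing_enabled=true,auto_healing_controller=magnum-auto-healer
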
flwang1thank you for your review22:54
flwang1and glad to see we're on the same page now22:54
brtknrflwang1: :)22:55
brtknrflwang1: btw is calico working for you on master branch22:55
brtknrwithout your patch22:55
flwang1brtknr: i didn't try that yet TBH22:55
flwang1but it works well on our prod22:56
brtknrhmmm22:56
*** rcernin has joined #openstack-containers22:56
brtknrit appears to be working on stable/train but broken on master22:57
brtknre.g. the cluster-autoscaler cannot reach keystone for auth22:57
brtknrsame with cinder-csi-plugin22:57
brtknrotherwise all reports healthy22:57
flwang1brtknr: try opening port 179 on the master22:59
brtknrwhat? manually?23:00
brtknrflwang1: but its not an issue on stable/train branch23:05
brtknronly on master23:05
flwang1ok, then i'm not sure, probably a regression issue23:05
brtknrflwang1: hmm looks like the regression might be caused by your patch to change default calico_ipv4_cidr23:18
flwang1brtknr: no way23:19
flwang1it's impossible :)23:19
brtknrflwang1: yes way!23:20
flwang1how? can you pls show me?23:20
brtknrwhen i revert the change, it works23:20
flwang1then you should check your local settings23:21
flwang1is your local vm network using 10.100.x.x?23:21
brtknr magnum/drivers/heat/k8s_coreos_template_def.py:58:                cluster.labels.get('calico_ipv4pool', '192.168.0.0/16')23:21
brtknrmagnum/drivers/heat/k8s_fedora_template_def.py:58:                cluster.labels.get('calico_ipv4pool', '192.168.0.0/16')23:21
flwang1shit, my bad23:22
flwang1brtknr: i will submit a fix soon23:23
brtknrflwang1: mind if i propose?23:24
flwang1i will submit the patch in 5 secs23:25
brtknrok23:27
brtknrwe should remove the defaults from kubecluster.yaml23:28
brtknrespecially if pod_network_cidr depends on it23:28
brtknri wonder if this will fix calico upgrade?23:28
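
Until the fix below merges, the regression can be sidestepped by passing the pool explicitly: a user-supplied label reaches both the Heat parameter and the python-side pod_network_cidr shown in the grep output above, so the two stay in agreement (pool value and template name here are only examples):

    openstack coe cluster create calico-test \
      --cluster-template k8s-fcos-calico \
      --labels calico_ipv4pool=192.168.0.0/16
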
openstackgerritFeilong Wang proposed openstack/magnum master: Fix calico regression issue caused by default ipv4pool change  https://review.opendev.org/71509323:28
flwang1let's see23:29
flwang1i will test the calico upgrade again with this one23:29
flwang1brtknr: https://review.opendev.org/71509323:29
flwang1brtknr: i'm sorry for the regression issue :(23:31
flwang1and the stupid confidence :D23:31
brtknrhey, we approved it so partly our fault too23:31
brtknrwe should remove those defaults from kubecluster.yaml if they are never used23:32
flwang1brtknr: or remove the defaults from the python code, thoughts?23:32
flwang1let's get this one in, and you can work on how to handle the default value?23:33
brtknrhmm makes more sense to handle it in python code though23:34
brtknrsince pod_network_cidr has to match flannel_network_cidr or calico_ipv4pool23:35
brtknrim sure this logic would be far more complicated in a heat template23:35
flwang1ok, but anyway, let's do that in a separate patch23:45
brtknrflwang1: but do you agree with what i said? that the logic would be more complicated to implement in heat template?23:57
