Tuesday, 2018-10-09

*** itlinux has joined #openstack-containers00:23
openstackgerritFeilong Wang proposed openstack/magnum master: [k8s] Using node instead of minion  https://review.openstack.org/608799 01:07
*** hongbin has joined #openstack-containers01:08
*** slagle has joined #openstack-containers01:20
*** ricolin has joined #openstack-containers01:20
*** dave-mccowan has quit IRC03:02
*** ramishra has joined #openstack-containers03:08
*** jaewook_oh has joined #openstack-containers03:46
*** hongbin has quit IRC04:00
*** udesale has joined #openstack-containers04:07
openstackgerritKien Nguyen proposed openstack/magnum master: Update auth_uri option to www_authenticate_uri  https://review.openstack.org/560919 04:11
*** pcaruana has joined #openstack-containers04:38
*** janki has joined #openstack-containers05:25
*** ttsiouts has joined #openstack-containers06:34
*** dabukalam has joined #openstack-containers07:00
*** rcernin has quit IRC07:08
*** serlex has joined #openstack-containers07:15
*** jaewook_oh has quit IRC07:28
*** jaewook_oh has joined #openstack-containers07:28
*** jaewook_oh has quit IRC07:29
*** ttsiouts has quit IRC07:30
*** ttsiouts has joined #openstack-containers07:31
*** jaewook_oh has joined #openstack-containers07:32
*** mattgo has joined #openstack-containers07:32
*** ttsiouts has quit IRC07:35
*** ttsiouts has joined #openstack-containers07:36
*** ttsiouts has quit IRC07:38
*** ttsiouts has joined #openstack-containers07:38
*** belmoreira has joined #openstack-containers07:40
*** gsimondon has joined #openstack-containers07:42
*** ttsiouts has quit IRC07:52
*** ttsiouts has joined #openstack-containers08:09
*** ricolin has quit IRC08:12
*** janki is now known as janki|lunch08:16
*** imdigitaljim has quit IRC08:25
*** flwang1 has joined #openstack-containers08:40
flwang1strigazi: pls ping me when you're available08:40
*** ricolin has joined #openstack-containers08:47
*** belmoreira has quit IRC08:57
*** belmorei_ has joined #openstack-containers08:57
*** salmankhan has joined #openstack-containers09:11
strigaziflwang1: ping09:13
flwang1strigazi: pong09:13
flwang1strigazi: see my email?09:13
*** salmankhan has quit IRC09:15
flwang1strigazi: 1. https://storyboard.openstack.org/#!/story/2003992   heat-container-agent version tag09:19
strigazilet's start from the easy one, the container tags09:19
flwang1i'd like to upload a new image for heat-container-agent09:19
flwang1to fix the multi region problem09:19
flwang1the problem does exist, verified in our cloud09:20
strigaziWasn't there one already in docker.io?09:20
flwang1no, that one has a bug, which i fixed, but forgot to upload a newer version image09:20
strigazirocky-dev09:20
flwang1there is a bug in that one09:20
strigaziwhich one?09:21
strigazihttps://review.openstack.org/#/c/584215/09:21
strigaziisn't it fixed here ^^09:22
strigaziflwang1: ^^09:22
flwang1https://review.openstack.org/#/c/584215/4/magnum/drivers/common/image/heat-container-agent/scripts/heat-config-notify09:22
flwang1it is09:22
flwang1but we need a new image09:22
flwang1and we need to bump the heat-container-agent version in magnum09:22
strigaziand the fix is not included in rocky-dev?09:23
flwang1the image with rocky-dev was built before we fixed the bug09:23
flwang1yes09:23
strigaziok09:23
flwang1i just need your input for the tag09:23
strigaziwait, I have a patch already to add it as a label09:24
flwang1still using rocky-dev, rocky or rocky-stable09:24
strigazior rocky-<six digits from commit id>09:25
strigazior rocky-<seven digits from commit id>09:25
strigaziactually we can use both09:26
flwang1let's use 7 digits to be consistent with github09:27
strigazirocky-stable will be an extra tag that points to the stable image of the branch09:27
strigazithis way, the images will have the same sha09:27
flwang1ok, sounds like a plan09:27
strigaziand when we say stable we will also know where it came from09:27
flwang1cool, i like the idea09:27
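
A minimal sketch of the tagging scheme agreed above, assuming the openstackmagnum/heat-container-agent image name that appears later in the log; the real builds were expected to move into the CI reviews mentioned just below, so the exact tooling may differ:

    # Tag the image with the short commit id, then add rocky-stable as an
    # alias pointing at the same sha, so "stable" always maps to a known commit.
    COMMIT=$(git rev-parse --short=7 HEAD)
    docker build -t openstackmagnum/heat-container-agent:rocky-${COMMIT} .
    docker tag openstackmagnum/heat-container-agent:rocky-${COMMIT} \
               openstackmagnum/heat-container-agent:rocky-stable
    docker push openstackmagnum/heat-container-agent:rocky-${COMMIT}
    docker push openstackmagnum/heat-container-agent:rocky-stable
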
flwang1move to next one?09:28
strigazione sec09:28
strigazito address this issue for good09:28
strigazishould we finalize:09:28
strigazihttps://review.openstack.org/#/c/561858/ https://review.openstack.org/#/c/585420/09:28
strigaziadd the tag as a label09:29
strigaziand build in the ci09:29
flwang1sure09:29
strigaziI can also build an image right now to get the fix09:29
flwang1good to see you already have a patch to make it as a label https://review.openstack.org/#/c/561858/1/magnum/drivers/common/templates/kubernetes/fragments/start-container-agent.sh09:29
flwang1sure, go ahead09:30
strigazione more thing09:30
*** salmankhan has joined #openstack-containers09:31
strigazithe rocky branch uses rawhide atm09:31
flwang1strigazi: that's the one i'd like to address09:31
flwang1we need a simple fix and then backport09:31
flwang1i almost forgot that09:31
strigaziI don't want to override something that works in other sites already09:32
strigaziso09:32
strigaziI build with rocky-stable09:32
strigaziand we do a one-liner in the rocky branch to use that09:32
flwang1no change in master, you mean?09:32
flwang1directly change rocky branch09:33
flwang1?09:33
strigazishould we hard-code rocky-stable in master?09:33
strigaziwhy not09:33
strigaziit will be for a week there09:33
flwang1i mean we should hard code it in master and then get your patch in09:33
strigazi1. build with rocky stable 2. patch master 3. backport to rocky09:34
flwang1yes09:34
strigazidone09:34
strigazior agreed09:34
strigazinext item?09:34
flwang1+209:34
strigazicoredns?09:34
flwang1version tag of coreDNS09:34
strigazipush a patch quickly and back port?09:34
strigaziand bump to latest stable release?09:35
flwang1which one are you talking about? still the heat-container-agent issue?09:35
strigazino, coredns09:35
flwang1oh, for coreDNS, we don't have to backport09:35
flwang1current version works fine so far09:36
flwang1but we do need a tag for that09:36
flwang1and we probably need a better naming convention for all labels to follow from now on09:36
strigazi<something>_tag is not good?09:36
flwang1nope, i'm talking about all the labels09:37
flwang1sorry09:37
flwang1for the confusion09:37
flwang1coredns_tag is good enough09:37
flwang1i will propose a patch later09:37
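
An illustrative sketch of how a <component>_tag label would be consumed once the proposed coredns_tag patch lands; the label names, image, and version values here are placeholders, not settled defaults:

    # Hypothetical cluster template pinning component versions via labels.
    openstack coe cluster template create \
        --coe kubernetes \
        --image fedora-atomic-27 \
        --external-network public \
        --labels coredns_tag=1.2.6,heat_container_agent_tag=rocky-stable \
        k8s-v1.11.2-example
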
flwang1next one?09:37
strigaziok09:37
flwang1Multi discovery service09:38
flwang1do you think it's a bad/stupid idea that, even if the user sets a private discovery service, magnum would fall back to discovery.etcd.io as a backup when the private one is down?09:38
strigaziwe could do that09:38
flwang1or something like that09:39
strigaziwell, in China that won't work09:39
flwang1it's not urgent, but i think it's useful09:39
flwang1user can set the backup one i think09:39
flwang1for a better solution09:39
strigaziif the backup is configurable too, it makes sense09:39
flwang1in other words, the discovery services should be a list, not a single one09:40
strigaziok09:40
flwang1another similar requirement, the dns_nameserver09:40
flwang1it definitely should be a list, instead of a single server09:41
strigaziwill this work: https://github.com/openstack/magnum/blob/stable/rocky/magnum/drivers/k8s_fedora_atomic_v1/templates/kubecluster.yaml#L523 09:42
strigazi?09:42
flwang1i haven't tried, but we're defining the param as a string, so i'm not sure09:43
flwang1FWIW, it should be improved because neutron can support it09:43
flwang1I will dig and propose patch if it's necessary, agree?09:43
strigazione moment,09:44
strigazido you need many servers for the docker config?09:44
flwang1what's the context for 'docker config'?09:45
strigazidockerd and coredns are the components in the cluster that can be configured09:46
strigaziwith dns servers09:46
*** vabada has quit IRC09:46
flwang1ah, i see what you mean. probably yes, we just want to have a backup dns server for prod use09:46
strigaziagreed actually: https://developer.openstack.org/api-ref/network/v2/index.html#create-subnet09:47
strigazidns_nameservers (Optional, body, array): List of dns name servers associated with the subnet. Default is an empty list.09:47
flwang1so are we on the same page now?09:48
strigaziyes09:48
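
A quick illustration of the point above: neutron already accepts multiple nameservers per subnet, so only the Magnum template parameter assumes a single string. The subnet name and addresses below are placeholders:

    # --dns-nameserver can be repeated, which is the behaviour the Magnum
    # template parameter should expose as well.
    openstack subnet set \
        --dns-nameserver 8.8.8.8 \
        --dns-nameserver 1.1.1.1 \
        private-subnet
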
flwang1cool, next one09:48
flwang1stack delete09:49
strigazithe only thing to solve is how to pass it in the magnum API. ok stack delete09:49
*** dave-mccowan has joined #openstack-containers09:49
strigazifrom your email I didn't get the problem with stack delete09:49
strigaziadmins can do stack delete09:49
flwang1yep, it's still the LB delete issue09:50
flwang1https://review.openstack.org/#/c/497144/09:50
strigaziwhat is the status of ^^09:51
strigazi?09:51
flwang1my current plan is passing the cluster UUID to --cluster-name09:51
strigaziis there disagreement? I forgot09:51
flwang1then with this patch https://github.com/kubernetes/cloud-provider-openstack/pull/223/files09:51
strigazisounds good09:51
flwang1then we should be good to figure out the correct LB09:52
flwang1Jim said he has a patch, but so far I haven't seen it09:52
flwang1so I will propose a patch set to use the way I mentioned above09:53
strigazilet's propose one then09:53
strigaziwhat is the change we need?09:53
strigaziin magnum, i mean09:53
strigaziin the config of the cloud-provider?09:53
flwang1we need pass in the uuid to kube-controller-manager with --cluster-name09:53
flwang1then use basically the code in https://review.openstack.org/#/c/497144/ to get the LB09:54
strigazihttps://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/ this one?09:54
strigazi--cluster-name string     Default: "kubernetes"09:54
flwang1yes09:54
flwang1any concern?09:54
strigaziThe instance prefix for the cluster. what does this even mean?09:54
strigaziwhere will it be used?09:55
flwang1i don't know at this moment, TBH, it needs some testing09:55
flwang1but Gardener is using the same way09:55
flwang1so i assume it's fine09:55
strigaziquestion.09:56
strigaziwe need to use the out-of-tree cloud-provider to benefit from this, correct?09:56
flwang1i need to check, probably not really necessary, we should be able to use kube-controller-manager as well09:57
*** vabada has joined #openstack-containers09:57
flwang1i haven't tried yet, sorry, i can't provide much details09:57
strigaziI think we need the out-of-tree one, since the patch you sent me is merged there, not in tree09:59
flwang1no matter which one we need, that's the way we can fix the issue10:00
strigaziagreed10:00
flwang1and given we're moving to cloud-controller-manager, we should definitely dig into that10:01
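
A rough sketch of the proposal, not a merged patch: pass the Magnum cluster UUID as the controller manager's --cluster-name so the OpenStack provider can identify the load balancers belonging to that one cluster. Whether this ends up in kube-controller-manager or the out-of-tree cloud-controller-manager was still open here; only the relevant flags are shown, and the cluster name and config path are placeholders:

    # Resolve the Magnum cluster UUID and hand it to the controller manager.
    CLUSTER_UUID=$(openstack coe cluster show my-cluster -f value -c uuid)
    kube-controller-manager \
        --cloud-provider=openstack \
        --cloud-config=/etc/kubernetes/cloud-config \
        --cluster-name="${CLUSTER_UUID}"
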
flwang1ok, next one?10:01
strigazinext one.10:01
flwang1cluster template update or sandbox failure10:01
flwang1which one you want to discuss first?10:01
strigazisandbox failure10:01
flwang1ok10:01
*** janki|lunch is now known as janki10:02
flwang1recently, i have seen several times the 'sandbox failure'10:02
flwang1let me show you the log10:02
flwang1the problem is, the kubelet is in Ready status10:03
flwang1but some pods can't be created10:03
strigaziyou see this in the kubelet log?10:04
flwang1see http://paste.openstack.org/show/731752/10:04
flwang1describe pod http://paste.openstack.org/show/731753/10:06
strigaziit seems related to calico maybe? the calico pods are running but all the other ones are not.10:06
flwang1strigazi: yep, i think it's related to calico10:07
flwang1but it's not consistently reproducible10:07
flwang1just wanna ask to see if you have any idea10:08
strigaziI haven't seen it in my devstack10:08
flwang1do you think it's related to the order we start the service?10:08
strigazilet me spin a cluster10:08
strigazithe systemd service of kubelet?10:09
flwang1i mean the dependency relationship of those components in heat template10:09
flwang1https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L679 10:10
strigaziI don't think so, there is no dependency when you send requests in the k8s api10:10
flwang1right10:11
strigazifor example10:11
strigaziwith kubeadm, if you start the master and you haven't configured the cni, the pods are there pending10:11
flwang1same in magnum10:12
flwang1before having a Ready node/kubelet, those pods should be in pending10:13
flwang1but the weird thing is the calico controller pod can be created10:13
*** jaewook_oh has quit IRC10:15
flwang1and no new pods can be created10:15
strigaziis docker running in that node?10:19
flwang1yes10:19
flwang1v1.13.110:19
strigaziis this in the slow testing env?10:28
flwang1no10:28
flwang1it's on our prod10:28
strigaziin my devstack looks ok :(10:28
flwang1yep, as i said, it's not always repeatable10:29
flwang1i will dig anyway10:29
strigazidid you restart kubelet?10:29
flwang1no, i didn't10:29
flwang1and i assume that won't help10:29
flwang1let's discuss the cluster template ?10:30
strigaziok10:30
strigazisince we mentioned cni, can you also look into the flannel patch?10:30
strigazixD10:30
flwang1yep, sure10:30
strigaziso, CT10:31
flwang1you know, i was so busy recently10:31
strigazii do10:31
flwang1flannel is always on my list10:31
flwang1i will revisit it in this week for sure10:31
strigaziThe problem:10:31
strigazithanks10:31
strigaziOperators want to maintain public CTs to advertise to users10:32
flwang1yes10:32
strigaziAs time passes, operators want to advertise new features or solve configuration issues in the clusters based on the values in those CTs10:33
strigazicorrect?10:33
flwang1yes10:33
strigaziOperators do not want to change the name and uuid of the advertised cluster templates10:34
flwang1yes10:34
strigaziSo that users have the same entry point to consume magnum10:34
strigazihere is where it becomes tricky10:35
flwang1listening...10:35
strigaziTo maintain the integrity of the data model, the cluster templates are immutable if they are referenced. Elaborating on this:10:36
strigaziA running cluster references a cluster template, this cluster template should not change during the life of the cluster because10:37
flwang1you know, i was on glance for quite a long time10:37
flwang1glance image is immutable10:37
flwang1and I understand that10:38
strigaziok10:38
flwang1but template is different10:38
flwang1for image, it's very simple, technically, the image data can't be changed10:38
strigaziso, you propose to allow editing the cluster template10:38
*** ricolin has quit IRC10:38
flwang1but for a template, there are many things that can be improved without impacting the cluster itself10:38
flwang1i can give you a lot of examples10:39
strigaziyou propose to allow editing the cluster template, correct?10:39
flwang1for admin, for some special attributes/labels10:39
strigaziI know that CTs contain a lot more info10:39
strigaziI know the issue very well10:39
flwang1for example, dns_nameserver10:39
strigaziin two years with 100s of clusters and 10s of changes/improvements, I know it very well10:40
flwang1adding or replacing a dns server won't impact the cluster itself, no version change, no flavor change10:40
flwang1same for container_infra_prefix10:41
flwang1i'm not proposing editing anything of a CT10:41
flwang1but some of them can be improved10:41
strigaziallow partially editing?10:41
strigazipartially or not10:42
flwang1just like cluster update: you can't update everything on a cluster, only the node count10:42
flwang1partially10:42
flwang1like, dns_nameserver, container_infra_prefix, etc etc10:42
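
The cluster-update analogy a few lines up, in CLI terms (cluster name and count are placeholders); the proposal is to allow a similarly narrow set of fields on templates:

    # The only mutation cluster update allows today, per the discussion above.
    openstack coe cluster update my-cluster replace node_count=5
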
strigaziFirst, I want to clarify that I and our team at CERN want to address the issue, because it makes life horrible10:43
flwang1in other words, after changing those attributes/labels, the clusters created with that CT before and after are the same10:43
strigaziwhat I don't like is that we lose track of the changes.10:44
*** ttsiouts has quit IRC10:44
flwang1that's a good point, but we can do it at stage 210:45
strigaziwe see that already because we touch the db10:45
strigaziit is the same thing10:45
flwang1what do you mean?10:45
strigazifor example10:45
strigazik8s x.y.z has a bug10:46
strigazik8s x.y.z+n includes the patch10:46
strigaziwhen we want to advertise the change we just change the value in the db10:47
strigaziwhich is horrible10:47
strigazibut in practice it is the same thing as doing it with the API.10:47
strigaziwith the api you have some validation, but with respect to logging and monitoring, the data of the service is the same10:48
flwang1i see10:48
flwang1so let's do it?10:48
flwang1what's your concern now?10:48
strigazilet's version the cluster templates, would be the answer10:49
flwang1another attribute that needs to be updatable is 'name'10:49
strigazijust allowing editing of CTs is not that small a change10:50
flwang1for example, we'd like to keep a template name 'k8s-v1.11.2-prod'10:50
strigazithe uuid will be different I understand10:50
flwang1when there is a new change, we can rename the existing one to 'k8s-v1.11.2-prod-20181009' and then create a new one with name 'k8s-v1.11.2-prod'10:51
flwang1yes10:51
strigaziso for that model, there is a better and simpler solution10:51
flwang1we're using the same way for our public images10:51
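
A hypothetical sequence for the rename-and-replace model just described, assuming the name-change support discussed here lands; template names, image, and label values are placeholders:

    # Park the old public template under a dated name...
    openstack coe cluster template update k8s-v1.11.2-prod \
        replace name=k8s-v1.11.2-prod-20181009
    # ...then publish a new template under the well-known name.
    openstack coe cluster template create \
        --coe kubernetes \
        --image fedora-atomic-27 \
        --external-network public \
        --public \
        --labels kube_tag=v1.11.2 \
        k8s-v1.11.2-prod
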
strigazilet me explain10:51
strigazi1. public CT {uuid: 1, name: foo, labels: {bar: val} }10:52
*** udesale has quit IRC10:52
strigaziyou want:10:52
strigazipublic CT {uuid: 2, name: foo, labels: {bar: val5} }10:52
strigazicorrect?10:52
flwang1and we'd like to rename the CT with uuid:110:53
strigaziwhy?10:53
flwang1to name foo-xxx10:53
flwang1because we don't want to confuse user10:53
strigazieven better10:53
strigazigot it10:53
flwang1user is using ansible/terraform to look for the template10:53
strigazioh, wait, what I don't understand is why you want the same id10:54
strigaziit doesn't make sense10:54
flwang1i'm not asking the same id10:54
flwang1i don't want to touch the id10:54
flwang1and i don't think change id is a good idea for any case10:55
strigaziso the proposal is (it's not only my idea, it was proposed a long time ago)10:55
strigazito have a deprecated/hide field.10:56
flwang1that's the way it was proposed in glance a looooong time ago10:56
strigazilike you have with images10:56
flwang1if hide is true, then user can't list/show it, right?10:57
strigaziyes, but we don't want to only hide the listing10:58
strigaziwe don't want users to use the old CTs10:58
strigaziso deprecated10:58
strigaziif it is deprecated you don't list and you can't use it10:58
flwang1ok10:58
strigazifrom the ops point of view10:59
strigazihe releases a new CT10:59
flwang1it's an OK solution for me10:59
strigazisame name, deprecates the old one10:59
flwang1why isn't it implemented?10:59
strigaziit doesn't even have to be the same name, whatever the operator wants10:59
strigaziman power, priority10:59
flwang1i can do that11:00
flwang1it's not a big change i think11:00
strigaziit's way smaller than what we discussed11:00
flwang1i can do it in this cycle11:00
flwang1not really, but i won't argue11:01
flwang1i'm happy with this solution11:01
*** ttsiouts has joined #openstack-containers11:01
flwang1when a user wants to update an existing cluster, does it need to get the template info?11:03
flwang1if the template is deprecated at that moment, what will happen?11:03
strigazideprecated CTs won't be touched11:04
flwang1see my above question11:04
flwang1if there is a cluster created with a deprecated CT, and a user wants to update it to add more nodes, what could happen?11:05
strigaziyou create a new one with the same info11:05
strigazioh11:05
strigaziwait, I misread it11:05
strigaziit would work11:05
strigaziscale will work11:06
flwang1so it is still accessible for that case, right?11:06
strigaziyes11:06
flwang1that's the special case?11:06
strigaziactually, only creation is the special case11:06
strigaziand show11:06
flwang1ok, depends on the view11:07
flwang1my next question is11:07
strigaziwell user should be able to do show11:07
strigazito see what they are using11:07
flwang1if user is using ansible or terraform, does that mean they need to check the deprecated attribute to get the correct one?11:07
strigaziwhat are the parameters used by terraform? same as the client?11:08
strigaziit doesn't matter.11:09
*** ttsiouts has quit IRC11:09
flwang1https://www.terraform.io/docs/providers/openstack/11:10
strigaziif the CT is deprecated and the client (terraform, osc, ansible) tries to create a cluster with a CT that is deprecated the api will return 40111:10
flwang1ok11:11
flwang1i will discuss the proposal with our support/business people and our customers to get ideas from their point of view11:11
strigaziis it unreasonable? unexpected?11:11
flwang1no, as i said, it's an OK solution for me11:11
flwang1software is always a trade-off game11:12
flwang1i just want to avoid design issues while we can still get comments from different parties11:12
strigazishouldn't this be done in public though?11:13
strigaziwhat are your concerns?11:13
flwang1like send an email to mailing list?11:13
*** ttsiouts has joined #openstack-containers11:13
strigaziML or gerrit11:13
flwang1it's not consistent with the way we're dealing with images11:13
flwang1we do have some customers now11:14
strigazihow are you dealing with images?11:14
flwang1so i'd like to know their preference11:14
flwang1we rename old images and create new one with same name11:14
flwang1again, i think this solution works11:15
flwang1i can start with a spec and send email to mailing list to get feedback11:15
strigazihmm, this is a convention. I don't think it is incompatible with this solution11:15
flwang1no, it shouldn't i think11:15
flwang1to be short11:16
strigazithe name could change, it doesn't carry info.11:16
strigaziok11:16
flwang1i'm happy with current way11:16
flwang1i can propose spec11:16
flwang1yep, allowing name change is an easy baby step we can take11:17
strigazichange names yes, values no11:17
strigaziwhat about cluster template versioning? We (CERN) could do it11:18
flwang1ok, we can start with name change, and discuss the details of how to handle this11:19
flwang1deal?11:19
strigaziyou mean name change and deprecated field? or just name change?11:20
flwang1separately11:20
flwang1name change firstly11:20
flwang1with that, 80% requirements can be covered11:21
strigazideal11:21
flwang1then let's propose a spec to discuss the 'deprecated' idea, sounds like a plan?11:21
strigaziyes11:21
flwang1cool11:21
flwang1strigazi: thank you for your time11:21
flwang1big progress today11:21
*** slagle has quit IRC11:22
strigaziyou are welcome11:22
strigaziwe need to get Blizzard on board too though :)11:22
flwang1yep, we can add them as reviewer11:23
flwang1for code change11:23
strigaziwill you attend the meeting (tomorrow for you)?11:23
flwang1yep11:23
flwang1could be late since i work late today11:24
flwang100:24 here11:24
*** ttsiouts has quit IRC12:01
*** ttsiouts has joined #openstack-containers12:02
*** zul has joined #openstack-containers12:05
*** ttsiouts has quit IRC12:10
*** ttsiouts has joined #openstack-containers12:22
*** lpetrut has joined #openstack-containers12:58
*** zul has quit IRC13:09
*** udesale has joined #openstack-containers13:13
*** ttsiouts has quit IRC13:13
*** udesale has quit IRC13:21
*** janki has quit IRC13:23
*** ttsiouts has joined #openstack-containers13:37
*** hongbin has joined #openstack-containers13:48
*** gsimondo1 has joined #openstack-containers13:51
*** ttsiouts has quit IRC13:52
*** gsimondon has quit IRC13:54
*** ttsiouts has joined #openstack-containers14:02
*** itlinux has quit IRC14:07
*** ttsiouts has quit IRC14:11
*** ttsiouts has joined #openstack-containers14:17
*** serlex has quit IRC14:19
*** ramishra has quit IRC14:55
*** munimeha1 has joined #openstack-containers14:55
*** lpetrut has quit IRC14:56
*** gsimondo1 has quit IRC15:07
*** itlinux has joined #openstack-containers15:08
*** isitirctime has joined #openstack-containers15:12
isitirctimeHope everyone is well. Is there a mechanism to rollback a magnum upgrade that failed? The stack is in UPDATE_FAILED status and I can't make any modifications now.15:13
isitirctimeI am thinking I need to modify the magnum_service table and set back to complete status and set the node count back. But I am a little worried about what modifications may be needed in the heat table15:18
*** ttsiouts has quit IRC15:33
*** ttsiouts has joined #openstack-containers15:33
isitirctimesorry, not the magnum_service table, the cluster table15:36
*** ttsiouts has quit IRC15:38
*** ianychoi has quit IRC15:39
*** munimeha1 has quit IRC15:43
*** janki has joined #openstack-containers15:52
*** salmankhan has quit IRC15:58
*** belmorei_ has quit IRC16:03
*** isitirctime has quit IRC16:04
*** belmoreira has joined #openstack-containers16:10
*** chhagarw has joined #openstack-containers16:17
*** mattgo has quit IRC16:27
*** janki has quit IRC17:14
*** janki has joined #openstack-containers17:22
*** gsimondon has joined #openstack-containers17:51
*** gsimondo1 has joined #openstack-containers18:14
*** gsimondon has quit IRC18:16
*** kaiokmo has quit IRC18:26
*** chhagarw has quit IRC18:28
*** chhagarw has joined #openstack-containers18:38
*** janki has quit IRC18:38
*** salmankhan has joined #openstack-containers18:44
*** gsimondon has joined #openstack-containers18:47
*** flwang1 has quit IRC18:47
*** gsimondo1 has quit IRC18:48
*** kaiokmo has joined #openstack-containers18:51
*** chhagarw has quit IRC19:17
*** gsimondo1 has joined #openstack-containers19:20
*** gsimondon has quit IRC19:22
*** spiette has quit IRC19:58
*** pcaruana has quit IRC20:37
*** ttsiouts has joined #openstack-containers20:51
*** ttsiouts has quit IRC21:00
strigazi#startmeeting containers21:00
openstackMeeting started Tue Oct  9 21:00:39 2018 UTC and is due to finish in 60 minutes.  The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.21:00
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:00
*** openstack changes topic to " (Meeting topic: containers)"21:00
openstackThe meeting name has been set to 'containers'21:00
strigazi#topic Roll Call21:00
*** ttsiouts has joined #openstack-containers21:00
*** openstack changes topic to "Roll Call (Meeting topic: containers)"21:00
strigazio/21:00
ttsioutso/21:01
cbrummo/21:01
cbrummjim won't be able to make it today21:03
strigaziHello ttsiouts cbrumm, we will be cozy21:03
strigaziflwang: will join some time later I think21:04
strigazi#topic announcements21:04
*** openstack changes topic to "announcements (Meeting topic: containers)"21:04
flwango/21:04
flwangsorry, was in a standup21:05
strigazilast week we added some patches from master to rocky with flwang, we'll do a point release this week I hope. I just want to add the flannel cni patch in this release.21:05
flwangstrigazi: i will review it today21:05
strigazithe release will be 7.1.021:05
strigaziflwang: thanks21:06
strigazi#topic stories/tasks21:06
*** openstack changes topic to "stories/tasks (Meeting topic: containers)"21:06
strigaziWe worked with ttsiouts a bit on the spec for nodegroups and he updated it, so reviews are welcome. Shoot questions to ttsiouts if you want :)21:07
flwanggreat21:07
strigazisince Jim is not here we can discuss it all together in gerrit, to be in sync.21:08
ttsioutssounds good!21:08
strigaziThis morning (morning for me), we discussed allowing cluster template rename with flwang. We don't have a story for it, but flwang will create one, right? flwang: we can push a patch as docs or a spec to describe the change. I can do the doc if you want21:10
cbrummis this so that renaming templates doesn't have to require database actions?21:11
cbrumma manual database action21:11
strigaziyes21:11
strigazithe uuid and the values will remain immutable21:12
strigazithe next will be adding a "deprecated" field in the CT object21:12
cbrummso we can cleanly call one "latest" or "deprecated"21:12
flwangi do have a story, wait a sec21:12
flwangcbrumm: or any tag you want21:12
cbrummnice21:13
cbrummthank you21:13
flwanghttps://storyboard.openstack.org/#!/story/200396021:13
flwangi will update above story to reflect the requirements strigazi mentioned above21:13
strigaziyes, thank you21:14
strigazito clarify, only the template name will change as a first step21:14
flwangi just created 2 tasks21:15
flwangone for name change, another one for 'deprecated' attribute21:15
strigazithe second independent change is to add the deprecated field which will hide the template from listing and even if users know the uuid they won't be able to create new clusters.21:15
strigaziflwang: exactly21:15
strigazicbrumm: ttsiouts the discussion starts here if you want to have a look http://eavesdrop.openstack.org/irclogs/%23openstack-containers/%23openstack-containers.2018-10-09.log.html#t2018-10-09T10:32:04 21:17
strigaziFinal item for me, as discussed with flwang this morning, I'll publish a new heat-container-agent image tagged rocky-stable which includes the multi-region fix21:18
ttsioutsstrigazi:thanks21:18
strigaziand then a patch to include the agent tag in the labels.21:19
flwangstrigazi: https://review.openstack.org/#/c/585061/1/magnum/drivers/common/image/heat-container-agent/Makefile21:19
strigazithat is all from me, any comments questions?21:19
flwangabove makefile could be useful for you21:20
strigaziI'll have a look; other projects (loci and kolla) build with ansible21:20
flwangno problem21:20
strigazihttps://review.openstack.org/#/c/585420/16/playbooks/vars.yaml21:21
strigaziwe can add account there etc21:21
*** cbrumm has quit IRC21:22
flwangcool, no matter which way we build, i'd like to see us have a more stable process21:23
strigaziany question, comment?21:23
strigaziflwang:  also to solve, the "strigazi gets hit by a bus problem"21:23
flwangwhat's your bus problem? missed the last one?21:24
strigazithe process of building and publishing the images is done only by me so far and it is not documented/automated21:25
flwangcool, let me know if you need any help on that21:25
strigaziI'll ping you for reviews on https://review.openstack.org/#/c/58542021:26
flwang@all, Catalyst Cloud just deployed Magnum on our production as a public managed k8s service, so welcome to come for questions, cheers21:26
flwangstrigazi: sure, no problem21:26
strigazi\o/21:27
flwangstrigazi: thank you for all you guys help21:27
*** cbrumm has joined #openstack-containers21:27
strigazianytime, really anytime21:27
flwangit wouldn't have happened without your help and support21:28
flwangsuch a great team21:28
strigazi:)21:28
flwangin the next couple weeks, i will focus on polishing our magnum and will do upstream first21:30
strigazido we need a magnum-ui release?21:30
strigaziyour fixes are in?21:31
cbrummwe have a few edits to the ui, mostly removal of user choices21:31
cbrummI'm not sure if they're intended for upstream though21:31
strigazicbrumm: which option for example?21:32
*** salmankhan has quit IRC21:32
flwangstrigazi: it would be good to have a new release for ui21:32
cbrummI would need to check with Jim. But our UI only asks for template and minion count21:32
strigaziwe could make this configurable i guess21:33
strigaziflwang:  you need to backport to stable21:33
strigaziflwang: you can test this image: docker.io/openstackmagnum/heat-container-agent:rocky-stable21:34
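
One way to sanity-check the published image before wiring it in; the digest shown can be compared against the rocky-<commit> tag per the scheme agreed earlier in the log:

    docker pull docker.io/openstackmagnum/heat-container-agent:rocky-stable
    # Show the content digest so it can be matched against the rocky-<sha> tag.
    docker inspect --format '{{index .RepoDigests 0}}' \
        docker.io/openstackmagnum/heat-container-agent:rocky-stable
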
flwangstrigazi: yep, sure21:34
flwangstrigazi: ok, will do21:34
strigaziAnything else for the meeting?21:36
flwangi'm good21:36
cbrummgood here21:37
strigazittsiouts: if you need anything come to my office :)21:37
strigazisee you next week have a nice day cbrumm flwang21:37
ttsioutsstrigazi: will do21:38
ttsiouts:)21:38
strigazi#endmeeting21:38
*** openstack changes topic to "OpenStack Containers Team"21:38
openstackMeeting ended Tue Oct  9 21:38:16 2018 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)21:38
cbrummbye all21:38
openstackMinutes:        http://eavesdrop.openstack.org/meetings/containers/2018/containers.2018-10-09-21.00.html21:38
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/containers/2018/containers.2018-10-09-21.00.txt21:38
openstackLog:            http://eavesdrop.openstack.org/meetings/containers/2018/containers.2018-10-09-21.00.log.html21:38
flwangthank you21:39
*** ttsiouts has quit IRC21:39
*** ttsiouts has joined #openstack-containers21:39
*** ttsiouts has quit IRC21:44
*** salmankhan has joined #openstack-containers21:51
*** salmankhan has quit IRC21:56
*** imdigitaljim has joined #openstack-containers22:02
imdigitaljimhey all22:03
imdigitaljimsorry i missed meeting22:03
imdigitaljimif you're present22:03
imdigitaljimwas in another meeting22:03
flwangi'm still around22:06
flwangimdigitaljim: i do have a question for you22:06
flwangwhen you use calico and k8s v1.11.1+, did you ever see a 'sandbox create' problem?22:06
imdigitaljimnot at all22:07
imdigitaljimi think i do have some additional arguments that arent in magnum yet22:07
flwangmind sharing that?22:07
flwangespecially the kubelet arguments22:07
imdigitaljimyeah22:08
imdigitaljim1 sec22:08
imdigitaljimand thats where it is22:08
flwangthanks a lot22:08
*** gsimondo1 has quit IRC22:08
imdigitaljimnp22:08
imdigitaljimand the more our driver moves forward22:08
imdigitaljimit might be unwieldy to push it on top of the current driver22:08
imdigitaljimwe were thinking it might just be another driver v2 perhaps22:08
flwangthat's not a bad idea22:08
imdigitaljimi could at least stage it there22:09
flwangis it using standalone cloud provider controller?22:09
imdigitaljimand if you all want to use it22:09
imdigitaljimor not22:09
imdigitaljimit is using ccm yes22:09
flwangcool22:09
imdigitaljimwe have *some* pending upstream stuff22:09
imdigitaljimbut other than that22:09
imdigitaljimit even cleans up octavia, cinder, and neutron lbaas resources22:10
imdigitaljimon cluster delete22:10
flwangthat's cool22:10
imdigitaljimhttp://paste.openstack.org/show/731795/22:10
imdigitaljimour cluster boot time is starting to drop too22:10
imdigitaljimit's around 6 minutes now but i've got some changes that might drop it to ~422:10
flwangyep, the boot time is one of the area i'd like to improve22:11
imdigitaljimi haven't moved all eligible fields to the kubelet config yet but those are operational at this time22:11
flwang"--pod-infra-container-image=${CONTAINER_INFRA_PREFIX:-gcr.io/google_containers/}pause:3.0"  this one is very interesting for me22:11
imdigitaljimyeah i mean if we switched to my driver, you'd immediately get the benefit :)22:12
imdigitaljimhttps://www.ianlewis.org/en/almighty-pause-container22:12
flwangi found it with google before with my issue of sandbox22:13
flwangand i think that's probably related22:13
imdigitaljimenforceNodeAllocatable:22:13
imdigitaljim    - pods22:13
flwangsandbox is actually using the pause image, right?22:13
imdigitaljimi think this is also22:13
imdigitaljimwhat your problem is22:14
imdigitaljimdo you have this arg?22:14
imdigitaljimand cgroupsperqos22:14
flwangi don't have this arg, since we're using the upstream version22:15
imdigitaljimtry it out22:15
flwangand no cgroupsPerQOS: true22:15
imdigitaljimand see if it fixes it22:15
flwangyep, that's what i'm going to try22:15
imdigitaljimim guessing on your error22:15
flwangwait a sec22:15
imdigitaljimbut i recall fixing a related issue with these args22:15
imdigitaljimas flags i think its --enforce-node-allocatable=pods --cgroups-per-qos=true22:16
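
A sketch of the flags suggested here, written as the KUBELET_ARGS line the Fedora Atomic driver drops into /etc/kubernetes/kubelet; the file path and surrounding flag set are assumptions, the point being that the pods/true values must override the system-container defaults that show up later in the log:

    # Only the flags relevant to this discussion are shown; the pause image
    # prefix would normally come from CONTAINER_INFRA_PREFIX as quoted above.
    KUBELET_ARGS="--logtostderr=true --v=0 --address=0.0.0.0 --allow-privileged=true --enforce-node-allocatable=pods --cgroups-per-qos=true --pod-infra-container-image=gcr.io/google_containers/pause:3.0"
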
flwanghttp://paste.openstack.org/show/731797/22:18
imdigitaljimwhats the kubelet args on your minions22:19
imdigitaljimps -ef | grep kubelet22:19
imdigitaljimoutput22:19
flwangwait a sec22:20
flwanghttp://paste.openstack.org/show/731799/22:21
imdigitaljimmissing22:22
imdigitaljim--enforce-node-allocatable=pods22:22
imdigitaljimyou have --enforce-node-allocatable=22:22
flwanglet me try22:22
imdigitaljim--cgroups-per-qos=false22:22
imdigitaljimyou also have false22:22
imdigitaljim:P22:22
flwangthe problem is not always repeatable22:22
imdigitaljimyeah22:23
flwangthat's why it's weird22:23
imdigitaljimi had the same problem22:23
flwangreally?22:23
imdigitaljimand with these 2 things (for us) its 100% gone22:23
imdigitaljimhavent seen it for like 6mo22:23
flwangcooooooool22:23
flwanglet me try now22:23
imdigitaljimping me im gonna minimize22:25
flwangsure, thanks a lot22:25
flwangsame22:26
flwang:(22:27
flwangi'm going to add --pod-infra-container-image=${CONTAINER_INFRA_PREFIX:-gcr.io/google_containers/}pause:3.022:27
imdigitaljimgood choice22:28
imdigitaljimi think there's a 3.1, but the difference is insignificant22:28
flwanghttp://paste.openstack.org/show/731800/22:29
flwangseems i can't just add --cgroups-per-qos=false in /etc/kubernetes/kubelet22:30
imdigitaljimwell its here now22:30
imdigitaljimthe --cgroups-per-qos=false --enforce-node-allocatable= at the beginning is from the atomic-system-container22:31
imdigitaljimwe've edited it out22:31
imdigitaljimwe want 100% control of the image flags22:31
imdigitaljimalthough the way you have yours now it *should* be accepting the latest flag22:31
imdigitaljimis your problem present after restarting the pods?22:31
flwang--cgroups-per-qos=false --enforce-node-allocatable= --logtostderr=true --v=0 --address=0.0.0.0 --allow-privileged=true22:31
flwangbut with the above args, how will it behave?22:31
imdigitaljim122:32
imdigitaljim.22:32
imdigitaljimhyperkube kubelet --cgroups-per-qos=false --enforce-node-allocatable= --logtostderr=true --v=0 --address=0.0.0.0 --allow-privileged=true --pod-infra-container-image=gcr.io/google_containers/pause:3.0 --cgroups-per-qos=true --enforce-node-allocatable=pods22:32
imdigitaljimnear the end22:32
imdigitaljim(theres more flags i just got it off)22:32
flwangyep22:32
flwangi mean22:32
flwangwith the above args, how does it work?22:32
imdigitaljimoh did it work for you?22:32
imdigitaljim(or i dont understand)22:33
flwanghyperkube kubelet --cgroups-per-qos=false --enforce-node-allocatable= --logtostderr=true --v=0 --address=0.0.0.0 --allow-privileged=true --pod-infra-container-image=gcr.io/google_containers/pause:3.0 --cgroups-per-qos=true --enforce-node-allocatable=pods22:33
flwangno it doesn't work22:33
imdigitaljimah ok still same issue22:33
imdigitaljimhmm22:33
flwangnot sure, which value kubelet will take22:33
imdigitaljimit takes the latest iirc22:33
imdigitaljimbut we dont have them at all22:34
imdigitaljimjust the 'correct' values22:34
imdigitaljimwhat do other logs say22:34
imdigitaljimlike kubelet logs22:34
flwangwait a sec22:35
flwanghttp://paste.openstack.org/show/731801/22:36
imdigitaljimumm22:38
imdigitaljimthis is weird22:38
flwangyep22:38
flwangit doesn't always happen22:39
imdigitaljimbut would you be willing to delete and restart your calico22:39
flwangdelete calico node?22:39
flwangi mean calico node pod?22:39
imdigitaljimyeah delete the daemonset22:39
imdigitaljimand redeploy?22:39
flwanganother weird thing is, the calico-kube-controller can start on that worker node22:40
imdigitaljimhmm22:40
flwangi think it's related to calico, but i have no clue22:40
imdigitaljimon a side note22:40
imdigitaljimwe put the controller on the master22:41
flwanghow? by label?22:41
*** rcernin has joined #openstack-containers22:42
imdigitaljimyeah22:44
flwangok, interesting22:44
imdigitaljim        - key: dedicated22:45
imdigitaljim          value: master22:45
imdigitaljim          effect: NoSchedule22:45
imdigitaljim        - key: CriticalAddonsOnly22:45
imdigitaljim          value: "True"22:45
imdigitaljim          effect: NoSchedule22:45
imdigitaljimtolerations:22:45
imdigitaljim  - key: dedicated22:45
imdigitaljim    value: master22:45
imdigitaljim    effect: NoSchedule22:45
imdigitaljim  - key: CriticalAddonsOnly22:45
imdigitaljim    value: "True"22:45
imdigitaljim    effect: NoSchedule22:45
imdigitaljimnodeSelector:22:45
imdigitaljim  node-role.kubernetes.io/master: ""22:45
imdigitaljimspecifically22:45
flwanghow many replicas? just 1?22:46
*** hongbin has quit IRC22:58
*** itlinux has quit IRC23:14
