Tuesday, 2019-03-05

00:07 *** ttsiouts has joined #openstack-containers
00:08 *** sdake has joined #openstack-containers
00:09 *** itlinux has joined #openstack-containers
00:12 *** sdake has quit IRC
00:13 *** itlinux_ has joined #openstack-containers
00:14 *** sdake has joined #openstack-containers
00:15 *** itlinux has quit IRC
00:29 *** ttsiouts has quit IRC
00:41 *** sdake has quit IRC
00:42 *** sapd1 has quit IRC
00:50 *** PagliaccisCloud has quit IRC
00:52 *** PagliaccisCloud has joined #openstack-containers
00:57 *** openstackgerrit has joined #openstack-containers
00:57 <openstackgerrit> Jake Yip proposed openstack/magnum master: Update min tox version to 2.0  https://review.openstack.org/616412
01:03 *** ricolin has joined #openstack-containers
01:08 *** sdake has joined #openstack-containers
01:52 *** sapd1 has joined #openstack-containers
02:05 *** sdake has quit IRC
02:13 *** hongbin has joined #openstack-containers
02:28 *** sdake has joined #openstack-containers
02:37 *** sapd1 has quit IRC
03:06 *** itlinux_ has quit IRC
03:12 *** itlinux has joined #openstack-containers
03:21 *** itlinux has quit IRC
03:25 *** itlinux has joined #openstack-containers
03:51 *** sdake has quit IRC
04:09 *** ramishra has joined #openstack-containers
04:18 *** udesale has joined #openstack-containers
04:31 *** ykarel|away has joined #openstack-containers
04:31 *** ykarel|away is now known as ykarel
05:08 *** janki has joined #openstack-containers
05:09 *** hongbin has quit IRC
05:22 *** udesale has quit IRC
05:47 *** jhesketh has quit IRC
05:48 *** jhesketh has joined #openstack-containers
05:48 *** sdake has joined #openstack-containers
05:50 *** sdake has quit IRC
05:52 *** pcaruana has joined #openstack-containers
05:58 *** sdake has joined #openstack-containers
06:07 *** pcaruana has quit IRC
06:24 *** dims has quit IRC
06:26 *** dims has joined #openstack-containers
06:34 *** itlinux has quit IRC
06:36 *** dims has quit IRC
06:37 *** dims has joined #openstack-containers
06:48 *** mkuf has quit IRC
07:01 *** mkuf has joined #openstack-containers
07:51 *** udesale has joined #openstack-containers
08:01 *** udesale has quit IRC
08:14 *** flwang1 has joined #openstack-containers
08:16 *** sapd1 has joined #openstack-containers
08:18 *** pcaruana has joined #openstack-containers
08:25 *** pcaruana has quit IRC
08:25 *** yolanda has joined #openstack-containers
08:33 <flwang1> strigazi: around?
08:36 *** sdake has quit IRC
08:37 *** pcaruana has joined #openstack-containers
08:40 <flwang1> strigazi: do you have time for a catch up?
08:41 *** ykarel is now known as ykarel|lunch
08:44 *** pcaruana has quit IRC
08:52 *** ttsiouts has joined #openstack-containers
09:00 *** alisanhaji has joined #openstack-containers
09:01 *** pcaruana has joined #openstack-containers
09:05 *** ttsiouts has quit IRC
09:06 *** ttsiouts has joined #openstack-containers
09:10 *** ttsiouts has quit IRC
09:12 *** ttsiouts has joined #openstack-containers
09:12 *** ign0tus has joined #openstack-containers
09:30 *** alisanhaji has quit IRC
09:32 *** alisanhaji has joined #openstack-containers
09:39 *** ykarel|lunch is now known as ykarel
09:55 *** sdake has joined #openstack-containers
10:51 *** sdake has quit IRC
10:55 *** sdake has joined #openstack-containers
11:19 *** ttsiouts has quit IRC
11:20 *** ttsiouts has joined #openstack-containers
11:22 *** janki has quit IRC
11:24 *** ttsiouts has quit IRC
11:27 *** mkuf has quit IRC
11:28 *** mkuf has joined #openstack-containers
11:29 *** sapd1 has quit IRC
12:00 *** udesale has joined #openstack-containers
12:01 *** ttsiouts has joined #openstack-containers
12:19 *** dave-mccowan has joined #openstack-containers
12:39 *** sdake has quit IRC
13:17 *** janki has joined #openstack-containers
13:19 *** sdake has joined #openstack-containers
13:38 *** ivve has joined #openstack-containers
13:41 *** andrein has joined #openstack-containers
13:43 *** sapd1 has joined #openstack-containers
13:47 <andrein> Hi guys, I'm trying to configure magnum on openstack rocky. I can launch the cluster, but the heat stack fails after creating the masters. I've logged in to the masters and every one of them is hanging when starting etcd because it can't find the certificates. I've noticed the make-certs.sh job failed on all of them because they're trying to hit the keystone API over the internal endpoint. How can I change this?
13:47 *** sapd1 has quit IRC
14:11 *** sdake has quit IRC
14:12 *** janki has quit IRC
14:12 *** janki has joined #openstack-containers
14:14 *** ykarel is now known as ykarel|away
14:18 *** ykarel|away has quit IRC
14:18 *** sapd1 has joined #openstack-containers
14:19 *** ykarel|away has joined #openstack-containers
14:21 *** ttsiouts has quit IRC
14:22 *** sdake has joined #openstack-containers
14:22 *** ttsiouts has joined #openstack-containers
14:23 *** sdake has quit IRC
14:23 <DimGR> strigazi hii :)
14:25 *** ttsiouts has quit IRC
14:25 *** ttsiouts has joined #openstack-containers
14:33 *** sdake has joined #openstack-containers
14:37 <brtknr> andrein: how was your openstack deployed?
14:47 <andrein> brtknr, I deployed it using kolla-ansible
14:47 *** hongbin has joined #openstack-containers
14:47 <brtknr> Which version of kolla-ansible?
14:48 <andrein> Version 7.0.1
14:48 *** sdake has quit IRC
14:50 *** sdake has joined #openstack-containers
14:50 <brtknr> Hmm, can you check your heat-container-agent log in master?
14:51 <brtknr> also check /var/log/cloud-init.log
14:51 <brtknr> and /var/log/cloud-init-output.log
14:51 <brtknr> and grep -i for fail
14:51 <andrein> On the Kubernetes master, right?
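[editor's note: a minimal sketch of the log checks brtknr suggests, run on a master node of the failing cluster; the unit and file names are typical for the Fedora Atomic/CoreOS drivers and may differ on other images.]

    # Grep the cloud-init logs for failures:
    sudo grep -i fail /var/log/cloud-init.log /var/log/cloud-init-output.log
    # Follow the heat-container-agent unit that applies the software deployments:
    sudo journalctl -u heat-container-agent --no-pager | tail -n 100
    # Check the endpoints Magnum wrote for this cluster:
    sudo grep -i -E 'auth_url|magnum_url' /etc/sysconfig/heat-params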
14:53 * andrein is spawning another cluster
14:57 *** pcaruana has quit IRC
14:58 *** munimeha1 has joined #openstack-containers
14:59 <andrein> brtknr, cloud init log shows make-cert.service failing
15:04 <andrein> That's the only error I see in cloud-init logs. I'm using coreos as a base image for this cluster.
15:04 *** sdake has quit IRC
15:04 <andrein> From what I notice in /etc/sysconfig/heat-params, MAGNUM_URL is set to the public endpoint, but AUTH_URL is private.
15:05 <andrein> Make-certs.sh is trying to hit the private auth endpoint and times out after a while, that causes etcd to fail etc.
15:07 *** sdake has joined #openstack-containers
15:07 <openstackgerrit> jacky06 proposed openstack/magnum-tempest-plugin master: Update json module to jsonutils  https://review.openstack.org/638968
15:08 <andrein> Hmmm, wait a second, in horizon under admin/system information I do have the wrong URL for the public endpoint. Seems something went south in kolla-ansible
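[editor's note: the root cause here turned out to be a wrong public identity endpoint in the service catalog (see andrein's follow-up later in the log); a hedged sketch of checking and correcting it with the openstack CLI — the URL below is a placeholder for your reachable keystone address.]

    # List the identity endpoints cluster nodes will be handed:
    openstack endpoint list --service identity
    # Repoint the public endpoint if it is wrong (use your own ID and URL):
    openstack endpoint set --url https://keystone.example.com:5000 <endpoint-id>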
15:25 *** sapd1 has quit IRC
15:28 *** openstackgerrit has quit IRC
15:34 *** alisanhaji has quit IRC
15:42 *** pcaruana has joined #openstack-containers
15:44 *** alisanhaji has joined #openstack-containers
15:49 *** sdake has quit IRC
15:50 *** udesale has quit IRC
16:00 *** belmoreira has quit IRC
16:01 *** sdake has joined #openstack-containers
16:04 *** ricolin has quit IRC
16:21 *** Adri2000 has joined #openstack-containers
16:21 <Adri2000> hello
16:23 <Adri2000> is there any existing discussion somewhere about using 8.8.8.8 as default dns server for magnum-created networks, instead of not specifying any dns server and therefore using neutron dns resolution?
16:23 <Adri2000> at least in the k8s_fedora_atomic_v1 driver
16:41 *** janki has quit IRC
16:41 *** ramishra has quit IRC
16:41 *** janki has joined #openstack-containers
16:45 *** ign0tus has quit IRC
16:58 *** ivve has quit IRC
17:00 *** andrein has quit IRC
17:09 -openstackstatus- NOTICE: Gerrit is being restarted for a configuration change, it will be briefly offline.
17:21 *** ykarel|away has quit IRC
17:35 *** itlinux has joined #openstack-containers
17:35 *** sdake has quit IRC
17:38 *** ttsiouts has quit IRC
17:47 <flwang1> Adri2000: we have seen this requirement before
17:47 <flwang1> but no one working on that now
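[editor's note: while the default itself is not configurable here, the nameserver pushed to Magnum-created subnets is a cluster template attribute, so operators can already override it per template; image, network and resolver values below are placeholders.]

    openstack coe cluster template create k8s-internal-dns \
      --coe kubernetes \
      --image fedora-atomic-27 \
      --external-network public \
      --dns-nameserver 10.0.0.2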
18:42 *** itlinux has quit IRC
18:47 *** itlinux has joined #openstack-containers
18:56 *** andrein has joined #openstack-containers
19:10 <brtknr> Meeting today?
19:12 *** ivve has joined #openstack-containers
19:12 <brtknr> andrein: if coreos is not essential, try using fedora-atomic driver
19:12 <brtknr> not sure many people here are testing coreos environment
19:13 <brtknr> although that might change soon with fedora-coreos?
19:31 *** ttsiouts has joined #openstack-containers
19:32 *** sdake has joined #openstack-containers
19:55 *** dave-mccowan has quit IRC
19:56 *** itlinux has quit IRC
19:57 *** NobodyCam has joined #openstack-containers
19:57 <NobodyCam> morning Magnum folks
19:58 <NobodyCam> anyone encountered "Authorization failed." or token scope issues with an OpenStack-Ansible installed magnum?
20:01 *** dave-mccowan has joined #openstack-containers
20:08 *** itlinux has joined #openstack-containers
20:11 *** ttsiouts has quit IRC
20:11 *** ttsiouts has joined #openstack-containers
20:15 *** itlinux has quit IRC
20:15 *** ttsiouts has quit IRC
20:17 *** sdake has quit IRC
20:18 <andrein> brtknr, I eventually got it working with CoreOS. Had to change the keystone public endpoint manually, no idea why kolla skipped reconfiguring it, the other endpoints were changed.
20:19 *** sdake has joined #openstack-containers
20:32 *** ttsiouts has joined #openstack-containers
20:36 <flwang1> strigazi: do we have meeting today?
20:42 <flwang1> brtknr: seems we don't have meeting today, strigazi is not online
20:46 <strigazi> We do have a meeting
20:49 <brtknr> Woot!
20:49 <brtknr> andrein: we have submitted kolla-ansible config to modify keystone endpoint in the past but maybe it wasnt for coreos
20:50 <strigazi> Dates for the next three Tuesdays https://wiki.openstack.org/wiki/Meetings/Containers
20:51 <colin-> hi
20:52 *** andrein has quit IRC
20:53 <flwang1> strigazi: cool, good to see you
20:53 *** andrein has joined #openstack-containers
20:57 *** mkuf has quit IRC
21:00 <strigazi> #startmeeting containers
21:00 <openstack> Meeting started Tue Mar  5 21:00:05 2019 UTC and is due to finish in 60 minutes.  The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00 *** openstack changes topic to " (Meeting topic: containers)"
21:00 <openstack> The meeting name has been set to 'containers'
21:00 <strigazi> #topic Roll Call
21:00 *** openstack changes topic to "Roll Call (Meeting topic: containers)"
21:00 <strigazi> o/
21:00 <schaney> o/
21:00 <jakeyip> o/
21:01 <brtknr> o/
21:02 <strigazi> Hello schaney jakeyip brtknr
21:02 <strigazi> #topic Stories/Tasks
21:02 *** openstack changes topic to "Stories/Tasks (Meeting topic: containers)"
21:02 *** imdigitaljim has joined #openstack-containers
21:02 <imdigitaljim> o/
21:03 <strigazi> I want to mention three things quickly.
21:03 <strigazi> CI for swarm and kubernetes is not passing
21:03 <colin-> hello
21:03 <strigazi> Hello colin- imdigitaljim
21:04 <strigazi> I'm finding the error
21:04 <strigazi> for example for k8s http://logs.openstack.org/73/639873/3/check/magnum-functional-k8s/06f3638/logs/screen-h-eng.txt.gz?level=ERROR
21:04 <strigazi> The error is the same for swarm
21:06 <strigazi> If someone wants to take a look and then comment in https://review.openstack.org/#/c/640238/ or in a fix :)
21:06 <strigazi> 2.
21:06 <strigazi> small regression I have found for the etcd_volume_size label (persistent storage for etcd) https://storyboard.openstack.org/#!/story/2005143
21:07 <strigazi> this fix is obvious
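[editor's note: for context, the label in question is set at template creation time to give each master a dedicated Cinder volume for etcd data; a minimal example with illustrative names and size.]

    openstack coe cluster template create k8s-etcd-vol \
      --coe kubernetes \
      --image fedora-atomic-27 \
      --external-network public \
      --labels etcd_volume_size=20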
21:07 <strigazi> 3.
21:07 <strigazi> imdigitaljim created "Cluster creators that leave WRT Keystone cause major error" https://storyboard.openstack.org/#!/story/2005145
21:07 <imdigitaljim> yeah thats my 1
21:07 <strigazi> it has been discussed many times. the keystone team says there is no fix
21:08 <strigazi> in our cloud we manually transfer the trustee user to another account.
21:08 <imdigitaljim> could we rework magnum to opt to poll heat based on a service account for 1 part
21:08 <imdigitaljim> instead of using trust cred to poll heat
21:08 <strigazi> imdigitaljim: some say this is a security issue, it was like this before.
21:09 <imdigitaljim> oh?
21:09 <strigazi> but this fixes part of the problem
21:09 <imdigitaljim> couldnt it be scoped to readonly/gets for heat
21:09 <imdigitaljim> the kubernetes side
21:09 <imdigitaljim> either might be trust transfer (like you suggest)
21:09 <imdigitaljim> or we have been opting for teams to use a bot account type approach for their tenant
21:09 <imdigitaljim> that will persist among users leaving
21:10 <strigazi> trusts transfer *won't* happen in keystone, ever
21:10 <imdigitaljim> yeah
21:10 <imdigitaljim> i doubt it would
21:10 <jakeyip> does this happen only if the user is deleted from keystone?
21:10 <strigazi> they were clear with this in the Dublin PTG
21:10 *** pcaruana has quit IRC
21:10 <strigazi> yes
21:10 <imdigitaljim> yeah
21:11 <strigazi> the trust powers die when the user is deleted
21:11 <strigazi> same for application creds
21:11 <imdigitaljim> to be honest even if we fix the "magnum to opt to poll heat based on a service account"
21:11 <imdigitaljim> that would be a huge improvement
21:11 <imdigitaljim> that would at least enable us to delete the clusters
21:11 <imdigitaljim> without db edits
21:11 <strigazi> admins can delete the cluster anyway
21:12 <imdigitaljim> we could not
21:12 <strigazi> ?
21:12 <imdigitaljim> with our admin accounts
21:12 <imdigitaljim> the codepaths bomb out with heat polling
21:12 <imdigitaljim> not sure where
21:12 <jakeyip> is this a heat issue instead?
21:12 <imdigitaljim> the occurrence was just yesterday
21:12 <strigazi> maybe you diverged in the code?
21:12 <imdigitaljim> no i had to delete the heat stack underneath with normal heat functionality
21:13 <imdigitaljim> and then manually remove the cluster via db
21:13 <strigazi> wrong policy?
21:13 <imdigitaljim> not with that regard
21:13 <colin-> +1 re: service account, fwiw
21:14 <imdigitaljim> nope
21:15 <imdigitaljim> AuthorizationFailure: unexpected keystone client error occurred: Could not find user: <deleted_user>. (HTTP 404) (Request-ID: req-370b414f-239a-4e13-b00d-a1d87184904b)
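[editor's note: a sketch of the manual workaround imdigitaljim describes for a cluster whose trustee user is gone; the stack commands are standard heat CLI, while the database cleanup is an assumption — table and column names vary by release, so verify your schema before touching it.]

    # Delete the underlying heat stack directly:
    openstack stack list --all-projects | grep <cluster-name>
    openstack stack delete --yes <stack-id>
    # Then remove the orphaned Magnum record by hand, e.g. (assumed schema):
    # mysql magnum -e "DELETE FROM cluster WHERE uuid='<cluster-uuid>';"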
21:15 <strigazi> ok
21:15 <jakeyip> ok so figuring out why admin can't use magnum to delete a cluster but can use heat to delete a stack will be a way forward?
21:15 <jakeyip> I wonder what is the workflow for normal resources (e.g. nova instances) in case of people leaving?
21:16 <strigazi> the problem is magnum can't check the status of the stack
21:16 <brtknr> it would be nice if the trust was owned by a role+domain rather than a user, so anyone with the role+domain can act as that role+domain
21:16 <imdigitaljim> ^
21:16 <imdigitaljim> +1
21:16 <imdigitaljim> +1
21:16 <brtknr> guess its too late to refactor things now...
21:16 <imdigitaljim> imo not really
21:16 <strigazi> it is a bit bad as well
21:17 <imdigitaljim> but it can be bad based on the use-case
21:17 <imdigitaljim> for us its fine
21:17 <strigazi> the trust creds are a leak
21:17 <imdigitaljim> yeah
21:17 <imdigitaljim> the trust creds on the server
21:17 <strigazi> userA takes trust creds from userB that they both own the cluster
21:17 <imdigitaljim> and you can get access to other clusters
21:17 <strigazi> userA is fired, can still access keystone
21:18 <brtknr> oh, because trust is still out in the wild?
21:18 <strigazi> the polling issue is different than the trust in the cluster
21:18 <imdigitaljim> yeah
21:18 <brtknr> change trust password *rolls eyes*
21:18 <imdigitaljim> different issues
21:18 <strigazi> we can do service account for polling again
21:19 <imdigitaljim> but an admin readonly scope
21:19 <imdigitaljim> ?
21:19 <strigazi> That is possible
21:19 <strigazi> since the magnum controller is managed by admins
21:19 <imdigitaljim> yeah
21:19 <imdigitaljim> i think that would be a satisfactory solution
21:19 <imdigitaljim> the clusters we can figure out/delete/etc
21:20 <imdigitaljim> but magnums behavior is a bit unavoidable
21:20 <imdigitaljim> thanks strigazi!
21:20 <imdigitaljim> you going to denver?
21:21 <strigazi> https://github.com/openstack/magnum/commit/f895b2bd0922f29a9d6b08617cb60258fa101c68#diff-e004adac7f8cb91a28c210e2a8d08ee9
21:21 <strigazi> I'm going yes
21:21 <imdigitaljim> lets meet up!
21:22 <strigazi> sure thing :)
21:22 <strigazi> Is anyone going to work on the polling thing? maybe a longer description first in storyboard?
21:23 <flwang1> strigazi: re https://storyboard.openstack.org/#!/story/2005145 i think you and ricardo proposed this issue before in mailing list
21:24 <strigazi> yes, I mentioned this. I discussed it with the keystone team in Dublin
21:24 <flwang1> and IIRC, we need support from keystone side?
21:24 <strigazi> there won't be help or change
21:24 <strigazi> from the keystone side
21:25 <strigazi> 22:11 < strigazi> trusts transfer *won't* happen in keystone, ever
21:25 <strigazi> nor for application credentials
21:25 <flwang1> strigazi: so we have to fix it in magnum?
21:25 <strigazi> yes
21:25 <strigazi> two issues, one is the polling heat issue
21:26 <strigazi> 2nd, the cluster inside the cluster must be rotated
21:26 <imdigitaljim> creds inside*
21:26 <strigazi> we had a design for this in Dublin, but not manpower
21:26 <strigazi> yes, creds :)
21:26 <imdigitaljim> yeah 1) trust on magnum, fixable and 2) trust on cluster, no clear path yet
21:27 <strigazi> 2) we have a rotate certificates api with noop
21:27 <strigazi> it can rotate the certs and the trust
21:27 <strigazi> that was the design
21:27 <flwang1> strigazi: ok, i think we need longer discussion for this one
21:27 <imdigitaljim> im more concerned about 1) for the moment which is smaller in scope
21:27 <imdigitaljim> 2) might be more challenging and needs more discussion/desing
21:27 <imdigitaljim> design
21:27 <strigazi> no :) we did it one year ago, someone can implement it :)
21:28 <strigazi> I'll bring up the pointer in storyboard
21:29 *** janki has quit IRC
21:30 <strigazi> For the autoscaler, are there any outstanding comments? Can we start pushing the maintainers to accept it?
21:30 <flwang1> strigazi: i'm happy with current status.
21:30 <flwang1> it passed my test
21:31 <schaney> strigazi: there are some future enhancements that I am hoping to work with you guys on
21:31 <flwang1> strigazi: so we can/should start to push CA team to merge it
21:32 <strigazi> schaney: do you want to leave a comment you are happy with the current state? we can ping the CA team the {'k8s', 'sig', 'openstack'} in some order
21:32 <flwang1> schaney: sure, the /resize api is coming
21:34 <schaney> I can leave a comment yeah
21:35 <schaney> Are you alright with me including some of the stipulations in the comment?
21:35 <schaney> for things like nodegroups, resize, and a couple bugs
21:35 <strigazi> schaney: I don't know how it will work for them
21:36 <schaney> same, not sure if it's better to get something out there and start iterating
21:36 <strigazi> +1 ^^
21:36 <schaney> or try to get it perfect first
21:36 <flwang1> schaney: i would suggest to track them in magnum or open separated issues later, but just my 2c
21:37 <imdigitaljim> we'll probably just do PRs against the first iteration
21:37 <schaney> track them in magnum vs the autoscaler?
21:37 <imdigitaljim> and use issues in autoscaler repo probably
21:37 <imdigitaljim> ./shrug
21:38 <schaney> yeah, us making PRs to the autoscaler will work for us going forward
21:38 <schaney> the current PR has so much going on already
21:38 <strigazi> We can focus on the things that work atm, and when it is in, PR in the CA repo are fine
21:38 <flwang1> issues in autoscaler, but don't scare them :)
21:39 <flwang1> strigazi: +1
21:39 <schaney> one question, if tghartland has looked into the TemplateNodeInfo interface method implementation
21:39 <strigazi> as long as we agree on the direction
21:40 <schaney> I think the current implementation will cause a crash
21:40 <imdigitaljim> imho i think we're all heading the same direction
21:40 <strigazi> creash on what?
21:40 <strigazi> crash on what? why?
21:40 <schaney> the autoscaler
21:41 <strigazi> is it reproducible?
21:41 <schaney> Should be, I am curious as to if you guys have seen it
21:42 <strigazi> no
21:42 <schaney> I'll double check, but the current implementation should crash 100% of the time when it gets called
21:42 <strigazi> it is a specific call that is not implemented?
21:42 <schaney> yes
21:42 <strigazi> TemplateNodeInfo, this?
21:42 <schaney> TemplateNodeInfo()
21:43 <strigazi> I'll discuss it with him tmr
21:43 <schaney> kk sounds good, I think for good faith for the upstream autoscaler guys, we might want to figure that part out
21:44 <schaney> before requesting merge
21:44 <strigazi> 100% probability of crash should be fixed first
21:44 *** ivve has quit IRC
21:44 <schaney> :) yeah
21:45 <strigazi> it is the vm flavor basically?
21:45 <schaney> yeah pretty much
21:46 <schaney> the autoscaler gets confused when there are no schedulable nodes
21:46 *** alisanhaji has quit IRC
21:46 <schaney> so TemplateNodeInfo() should generate a sample node for a given nodegroup
21:47 <strigazi> sounds easy
21:47 <schaney> Yeah shouldn't be too bad, just need to fully construct the template node
21:48 <strigazi> this however: 'the autoscaler gets confused when there are no schedulable nodes' sounds bad.
21:48 <schaney> it tries to run simulations before scaling up
21:48 <strigazi> so how it works now?
21:49 <schaney> if there are valid nodes, it will use their info in the simulation
21:49 <strigazi> it doesn't do any simulations?
21:49 <schaney> if there is no valid node, it needs the result of templateNodeInfo
21:50 <strigazi> if you can send us a scenario to reproduce, it would help
21:51 <schaney> cordon all nodes and put the cluster in a situation to scale up, should show the issue
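[editor's note: a rough reproduction of the no-schedulable-nodes case schaney describes, assuming a working cluster autoscaler deployment; the deployment name, image and replica count are arbitrary.]

    # Cordon every node so nothing is schedulable:
    kubectl cordon $(kubectl get nodes -o name)
    # Create pending pods to force a scale-up decision:
    kubectl create deployment stress --image=nginx
    kubectl scale deployment stress --replicas=20
    # With no schedulable node to copy, the autoscaler must build a sample
    # node via TemplateNodeInfo(), which is where the crash is expected.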
21:51 <strigazi> but, won't it create a new node?
21:52 <strigazi> I pinged him, he will try tmr
21:52 <flwang1> strigazi: in my testing, it scaled up well
21:52 <strigazi> schaney: apart from that, anything else?
21:52 <strigazi> to request to merge
21:53 <strigazi> flwang1: for me as well
21:54 <schaney> I think that was the last crash that I was looking at, everything else will just be tweaking
21:54 <strigazi> nice
21:54 <schaney> flwang1: to be clear, this issue is only seen when effectively scaling up from 0
21:55 <flwang1> schaney: i see. i haven't tested that case
21:55 <schaney> rare case, but I was just bringing it up since it will cause a crash
21:55 <flwang1> schaney: cool
21:55 <strigazi> we can address it
21:56 <schaney> awesome
21:58 <strigazi> we are almost out of time
21:58 <flwang1> strigazi: rolling upgrade status?
21:58 <strigazi> I'll just ask one more time, Can someone look into the CI failures?
21:59 <flwang1> strigazi: i did
21:59 <strigazi> flwang1: end meeting first and then discuss it?
21:59 <flwang1> the current ci failure is related to nested virt
21:59 <strigazi> how so?
21:59 <flwang1> strigazi: sure
21:59 <flwang1> i even popped up in infra channel
21:59 <strigazi> let's end the meeting first
21:59 <colin-> see you next time
22:00 <strigazi> thanks everyone
22:00 <flwang1> and there is no good way now, seems infra recently upgraded their kernel
22:00 <flwang1> manser may have more inputs
22:00 <strigazi> #endmeeting
22:00 *** openstack changes topic to "OpenStack Containers Team"
22:00 <openstack> Meeting ended Tue Mar  5 22:00:33 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
22:00 <openstack> Minutes:        http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.html
22:00 <openstack> Minutes (text): http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.txt
22:00 <openstack> Log:            http://eavesdrop.openstack.org/meetings/containers/2019/containers.2019-03-05-21.00.log.html
22:02 <strigazi> this thing? http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-03-03.log.html#t2019-03-03T20:17:29
22:03 <flwang1> strigazi: yes
22:03 <strigazi> that is why I have the CI non voting
22:03 <strigazi> feels more like an indication to me all these years.
22:04 <flwang1> strigazi: yep, nested virt is still a pain
22:04 <strigazi> no problems with centos here
22:04 <strigazi> anyway,
22:04 <flwang1> maybe we should migrate to FA 29 to try?
22:05 <flwang1> did you get any luck on FA 29? it failed in my testing
22:05 <strigazi> at cern we use it
22:05 <flwang1> which k8s version?
22:05 <strigazi> I didn't have time for testing it in devstack
22:05 <flwang1> from community, no change?
22:05 <strigazi> no change
22:05 <flwang1> ok
22:06 <strigazi> 1.13.3 and 1.12.4
22:06 <flwang1> cool
22:06 <strigazi> overlay2, no extra volumes
22:06 <flwang1> ok
22:07 <flwang1> btw, i have already proposed the patch for api ref of resize API  https://review.openstack.org/639882
22:07 <flwang1> and the health_status patch in cluster listing patch is here https://review.openstack.org/640222
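[editor's note: a hedged sketch of what the resize call under review looks like once the API and client support land; cluster name and node count are placeholders.]

    openstack coe cluster resize my-cluster 5
    # Roughly equivalent raw call, per the proposed api-ref:
    # POST /v1/clusters/<cluster-uuid>/actions/resize   {"node_count": 5}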
22:08 <strigazi> i have seen them
22:08 <strigazi> missed the api-ref
22:09 <strigazi> For upgrades, I'm working in the driver code.
22:09 <strigazi> Do you want to take the api?
22:09 <flwang1> yep, i can help polish the api patch, and the api ref
22:09 <strigazi> The only part that needs work in the api is:
22:10 <flwang1> as for api, do you want to use the same way I'm using for resize api?
22:10 <strigazi> i think they are the same, no?
22:10 <strigazi> last time I checked it was
22:10 <flwang1> they should be same, a little bit diff between your current one and mine
22:10 <strigazi> ok
22:11 <strigazi> the only part that needs some thought is
22:11 <strigazi> clusterA used CT-A to be created
22:11 <strigazi> CT-A had labels X Y and Z
22:11 <flwang1> labels merging issue?
22:12 <strigazi> yes
22:13 <strigazi> I thought of a config option to check if some labels are going to be changed, and in such a case refuse to upgrade or even create a cluster
22:14 <flwang1> can we do simple merge now like ricardo and i discussed?
22:15 <jakeyip> is this to do with https://review.openstack.org/#/c/621611/ ?
22:16 <strigazi> no
22:16 <strigazi> flwang1: were you discussed it?
22:16 <strigazi> flwang1: where you discussed it?
22:16 <flwang1> https://review.openstack.org/#/c/621611/
22:16 <flwang1> we discussed similar issue in above patch
22:16 <strigazi> this is for cluster create
22:17 <flwang1> yes, but similar issue
22:17 <strigazi> almost but not
22:17 <flwang1> i mean, we probably want to use same policy for merging to avoid confusing users
22:17 <flwang1> yes i know
22:17 <strigazi> shall i explain it or not?
22:18 <strigazi> in one loine
22:18 <strigazi> in one line
22:18 <strigazi> user in cluster creation selected version 5, the new CT to upgrade has version 4, what do you do?
22:18 <strigazi> downgrade?
22:19 <flwang1> for a version of an addon?
22:19 <strigazi> based on 621611 yes, downgrade
22:19 <strigazi> addon or k8s
22:19 <jakeyip> cluster creation label should override CT label?
22:20 <strigazi> jakeyip: yes, for creation. for upgrade?
22:20 <strigazi> as an admin, I don't want to support users that go rogue
22:20 <jakeyip> btw I was just reviewing 621611 last night and I felt quite uneasy about it, prob cos there are many ways like this where it's going to be weird
22:20 <flwang1> we should support downgrade, but better not now?
22:21 <strigazi> support downgrade from user selected version, to admin suggested?
22:21 <strigazi> this is asking for trouble
22:21 <strigazi> the matrix of versions explodes this way
22:22 <flwang1> yes, so i think we don't have to do it now, maybe even in future
22:23 <jakeyip> sorry I am a bit lost, where is the change for this functionality? (CT label update and cluster upgrade)
22:23 <strigazi> downgrading is bad
22:24 <strigazi> jakeyip: there is no change for this yet
22:24 <jakeyip> strigazi: I see. are we thinking of using update on CT to update clusters?
22:25 <strigazi> the problem is that users can select labels in cluster creation, then with a CT they will try to upgrade and there will be conflicts
22:25 <strigazi> jakeyip: yes
22:26 <flwang1> strigazi: yep, labels is a pain, we can only support base image upgrade and k8s upgrade
22:26 <flwang1> we need more discussion about labels, i mean i need more thinking
22:27 <strigazi> we need to support upgrading add ons too. but even for k8s we should discourage users to pick the version in cluster creation
22:27 <strigazi> let's see next week about it
22:27 <flwang1> strigazi: that's why i mentioned before, we probably need another attribute for template
22:28 <flwang1> which can indicate the compatibility with new/old versions
22:28 *** sdake has quit IRC
22:28 <strigazi> this week you can pick the API patch and I continue with the driver
22:28 <flwang1> for example, CT-1.11.2 has attribute can_upgrade_to ['1.12.4', '1.13.4']
22:28 <flwang1> strigazi: no problem
22:29 <strigazi> cool
22:29 *** sdake has joined #openstack-containers
22:29 *** sdake has quit IRC
22:29 <flwang1> strigazi: thank you, my frined
22:29 <flwang1> friend
22:29 <strigazi> jakeyip: do you need help with anything?
22:29 <strigazi> flwang1: thank you
22:30 <strigazi> flwang1: you do too much for the project
22:30 <jakeyip> yes, maybe https://review.openstack.org/#/c/638077/ ?
22:31 <jakeyip> Pardon me, I am a newbie, but I feel like the CT to Clusters relationship needs to be defined a bit better?
22:31 <flwang1> strigazi: btw, as for https://review.openstack.org/640211
22:31 <jakeyip> if they are tightly coupled then updating CT is going to be very scary, for both operators and users
22:31 <strigazi> jakeyip: ack for 638077
22:32 <strigazi> jakeyip: we can limit the access to CTs for users
22:32 <flwang1> i changed my mind, i would like to show the health_status_reason by default, cause the api has returned everything, we don't have to ask the user to add --detailed again to trigger another api call to see the health_status_reason, thoughts?
22:32 <strigazi> jakeyip: and give less freedom in labels in cluster creation
22:33 <jakeyip> I feel like having CT just acting like a template is good, it prefills fields that you can override
22:33 <strigazi> flwang1: hmm, i'm only concerned for very big clusters
22:33 <flwang1> strigazi: that's a rare case, no?
22:33 <strigazi> what is the limit of the field in the db?
22:34 <flwang1> maybe common in cern
22:34 <strigazi> for us more than 500 nodes is a bit rare
22:34 <strigazi> but a few 100s is not
22:34 <strigazi> we can take it :)
22:35 <flwang1> strigazi: you try download the patch and give it a try
22:35 <strigazi> since the info is in the db anyway
22:35 <strigazi> yes
22:35 <flwang1> if it's really a pain, i'm open to support --detailed
22:35 <strigazi> cool
22:36 *** sapd1 has joined #openstack-containers
22:36 <jakeyip> strigazi: I, as an operator, will feel uneasy updating a CT that half the clusters in my cloud depend on. So maybe I won't do it and just create new CTs. Negating the whole benefit.
22:36 <flwang1> i'm good, sorry for a lot of pushing
22:36 <strigazi> won't be needed probably, I'll add Ricardo in the review too
22:37 <strigazi> jakeyip: it is not possible to update a used CT and it won't be
22:37 <flwang1> jakeyip: i can see your pain, but just like image, making CT immutable also has good sides
22:37 *** sdake has joined #openstack-containers
22:38 <strigazi> jakeyip: did you understand that CTs will be mutable? they will continue to be immutable
22:38 <jakeyip> strigazi: sorry I thought we are talking about updating labels on a CT to trigger upgrade on a cluster?
22:38 <jakeyip> ok. phew
22:38 <strigazi> jakeyip: selecting a new CT triggers upgrade
22:39 <jakeyip> strigazi: thanks for the clarification!
22:39 <strigazi> cool, I have to go guys
22:40 <strigazi> see you around
22:40 <strigazi> thanks flwang1 jakeyip for all the work
22:40 <brtknr> Sorry, i enjoyed my observer role, like to stay in the loop! good night
22:40 <jakeyip> see you, thanks strigazi as always
22:40 <flwang1> strigazi: see you
22:40 *** andrein has quit IRC
22:41 <strigazi> brtknr: cheers
22:41 <strigazi> bye
22:42 <imdigitaljim> flwang1 we've moved long past having CT blocking issues for upgrades fwiw
22:42 <imdigitaljim> id be glad to share more recent updates to centos driver
22:42 <imdigitaljim> we want a few last things done and a few of my team is going to scour the code and ready it for upstreaming
22:42 <brtknr> imdigitaljim: are you using centos atomic or vanilla centos?
22:42 <imdigitaljim> centos 7.6
22:43 <brtknr> atomic or not?
22:43 <imdigitaljim> nope
22:43 <brtknr> with magnum?
22:43 <imdigitaljim> yes
22:44 <imdigitaljim> this driver can be fairly easily adapted to ubuntu as well
22:44 <imdigitaljim> and the like
22:44 <brtknr> thats cool! i often get questions about whether that is possible
22:44 <imdigitaljim> oh for sure it is
22:44 <imdigitaljim> ill ping you when we start uploading the driver
22:44 <imdigitaljim> it works differently than fedoras
22:44 <imdigitaljim> but its still executed the same way if that makes sense
22:44 <brtknr> do you have kube* services running as containers?
22:44 <imdigitaljim> yes
22:45 *** sdake has quit IRC
22:45 <brtknr> cool!
22:45 <imdigitaljim> we also rely on github for versioning
22:45 <imdigitaljim> so when you have a cluster you know what git revision of the cluster it was in case you need to know how it was bootstrapped
22:45 <imdigitaljim> and providing newer versions is ezpz
22:46 <imdigitaljim> upgrades in place are done through an api call + heat agent + kubernetes deployment
22:46 <brtknr> thats one way of doing it!
22:46 <imdigitaljim> so in other words we have a repo for magnum and a repo for the bootstrapping kubernetes content
22:46 <imdigitaljim> that we handle separately
22:46 <imdigitaljim> we hardly ever update magnum's code
22:47 <imdigitaljim> ill be glad to provide documentation on it and how it works for everything
22:47 <jakeyip> how do you point k8s to the config repo ?
22:47 <brtknr> so you dont go via kubeadm ?
22:47 <imdigitaljim> when that time comes
22:47 <imdigitaljim> no but we've considered adapting that approach
22:47 <jakeyip> that's nice in a way. good for power users.
22:48 <imdigitaljim> we do it the hard way since we have more control
22:48 <imdigitaljim> (i.e similar to fedora atomics)
22:48 <imdigitaljim> btw we could use any git repo
22:48 <imdigitaljim> even github.com
22:48 <brtknr> can you upgrade docker in place too? inside containers?
22:48 <imdigitaljim> youd provide that bootstrapping endpoint in the config file
22:48 <brtknr> *inside vms
22:48 <imdigitaljim> when we need to upgrade docker
22:49 <imdigitaljim> we just do a rolling scale
22:49 <imdigitaljim> rolling/update
22:49 <imdigitaljim> but we actually have puppet connected
22:49 <imdigitaljim> which can do it too
22:49 <imdigitaljim> but thats not part of the requirement for upstream content
22:49 <imdigitaljim> puppet is not a dependency
22:49 <brtknr> do you replace the image with new docker version or update the package itself?
22:50 <imdigitaljim> so for example
22:50 <imdigitaljim> the image in the template is centos-magnum
22:50 <imdigitaljim> we update the image with a new centos-magnum with updated docker
22:50 <imdigitaljim> and when nodes are scaled in/out they come up with new version of centos
22:50 <imdigitaljim> and/or docker
22:50 <imdigitaljim> and any additional software upgrades
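[editor's note: a sketch of the image-rotation flow imdigitaljim describes — keep one glance image name that templates reference and rotate what it points at; all image and file names are placeholders.]

    # Park the old image under a different name, then upload the new one:
    openstack image set --name centos-magnum-old centos-magnum
    openstack image create centos-magnum \
      --disk-format qcow2 --container-format bare \
      --file centos-magnum-new.qcow2
    # Nodes created or replaced after this point boot the updated image.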
22:51 *** sdake has joined #openstack-containers
22:51 <imdigitaljim> based on a little git management and using the git api
22:51 <imdigitaljim> we can provide any versions of kubernetes
22:51 <imdigitaljim> (that we want to support)
22:51 <brtknr> so rolling scale = -1 old instance +1 new instance?
22:51 <imdigitaljim> so we support v1.12.1 -> v1.13.4
22:52 <imdigitaljim> and slowly move up from there as people dont have older clusters
22:52 <imdigitaljim> yeah
22:52 <brtknr> doesnt that just replace the nths node
22:52 <brtknr> n_th node*
22:52 *** munimeha1 has quit IRC
22:52 <imdigitaljim> minions_to_remove=1...N
22:53 <imdigitaljim> masters_to_remove=1...N
22:53 <imdigitaljim> just make that call
22:53 <imdigitaljim> until all minions/masters are cycled
22:53 <imdigitaljim> thats how we execute the inplace upgrades mostly too
22:53 <jakeyip> this is via kubectl or ?
22:53 <imdigitaljim> automatic
22:53 <imdigitaljim> heat-api call essentially though
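[editor's note: a hedged magnum-level equivalent of the node-cycling call described above, using the resize API's node-removal support once it is available; UUID, cluster name and count are placeholders.]

    # Remove one old node while keeping the target count, then repeat per node:
    openstack coe cluster resize --nodes-to-remove <old-node-uuid> my-cluster 3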
22:54 <imdigitaljim> we have a kubernetes deployment that gets put on the cluster
22:54 <imdigitaljim> that cycles them
22:54 <imdigitaljim> the cluster manages itself
22:55 <imdigitaljim> it does a drain + kill
22:55 <imdigitaljim> so no loss of service
22:55 <imdigitaljim> we might iterate and if you have enough capacity
22:55 <imdigitaljim> you could grow your cluster N -> 2N
22:55 <jakeyip> nice. one thing I'm confused when we were talking about a new image, is that a docker image or glance image?
22:55 <imdigitaljim> then back down to N canning the old nodes
22:56 <imdigitaljim> ah yeah
22:56 <imdigitaljim> glance image
22:56 <imdigitaljim> centos-magnum is the glance image defined in the template
22:56 <imdigitaljim> and we upgrade that
22:56 <jakeyip> ok so heat-api to do a rebuild, or is that a new nova instance ?
22:56 <imdigitaljim> so new nodes come up with the upgrades
22:56 <brtknr> so the glance image name is important
22:56 <jakeyip> I guess N -> 2N is a new nova instance
22:56 <brtknr> there cannot be duplicate centos-magnum images?
22:57 <imdigitaljim> we dont need multiple images
22:57 <imdigitaljim> for our case
22:57 <imdigitaljim> just 1
22:57 <brtknr> what i mean is, old and new version keep the same name
22:57 <imdigitaljim> if youre creating a cluster or upgrading nodes
22:57 <brtknr> ?
22:57 <imdigitaljim> we delete old
22:57 <imdigitaljim> or rename old
22:57 <imdigitaljim> and provide new
22:57 <brtknr> ok cool
22:58 <imdigitaljim> theres only ever 1 by that name
22:58 <imdigitaljim> so its always okay
22:58 <imdigitaljim> :D
22:58 <brtknr> =D
22:58 <brtknr> i like the way docker solves this problem by using tag
22:59 <brtknr> image hash can be different but tag is always the same
22:59 <imdigitaljim> yeah
22:59 <brtknr> glance could benefit from something similar
22:59 <imdigitaljim> basically thats how we treat the glance image
22:59 <imdigitaljim> but yeah
22:59 <imdigitaljim> its not by default
22:59 <imdigitaljim> because when you provide same tag in docker
23:00 <imdigitaljim> it detags the old one
23:00 <imdigitaljim> which is what we do
23:00 <imdigitaljim> heh
23:00 <brtknr> are you also doing federations?
23:00 <imdigitaljim> we do keep it for a while
23:00 <imdigitaljim> not yet but thats to come
23:00 <imdigitaljim> well push upstream probably after inplace upgrades is fully completed
23:00 <brtknr> with a mix of gpu/non-gpu nodes in the same cluster?
23:00 <imdigitaljim> its mostly like a alpha/beta level maturity
23:01 <imdigitaljim> ah yeah we'd also need to probably get hte nodegroups in place for upstream since the fedora guys all want it
23:01 <imdigitaljim> we dont use it here
23:01 <imdigitaljim> but we know its important
23:01 <brtknr> hte?
23:01 <imdigitaljim> the*
23:02 <jakeyip> imdigitaljim: so are you still using magnum?
23:02 <imdigitaljim> yup
23:02 <brtknr> lol^
23:02 <imdigitaljim> core magnum is almost the same (tiny changes that we'd upstream)
23:02 <jakeyip> lol I don't mean it that way
23:02 *** sdake has quit IRC
23:02 <jakeyip> just for 1st provisioning?
23:02 <imdigitaljim> and everything else is a driver change
23:02 <imdigitaljim> magnum =/= fedora atomic k8s for us
23:02 <imdigitaljim> if thats what you mean
23:03 <jakeyip> I am thinking everything like node-count / image is going to be out of sync with your approach
23:03 <imdigitaljim> magnum is basically just a CRUD service
23:03 <imdigitaljim> when flwang fixes the api
23:03 <imdigitaljim> it wont be
23:03 <imdigitaljim> but atm yes it only reflects create time
23:03 * brtknr googles CRUD
23:04 <brtknr> oh i see
23:04 <imdigitaljim> create read update delete
23:04 <imdigitaljim> the image doesnt get out of date
23:04 <imdigitaljim> the node-count does
23:04 <jakeyip> so it might be magnum thinks node-count is 2 but heat and actually is 4? then what happens when flwang api updates node-count?
23:04 <imdigitaljim> we were also thinking of a better way than polling heat
23:05 <imdigitaljim> but see if we can interact with a rabbitmq or something
23:05 <imdigitaljim> heat <-> magnum relationship is pretty close anyways
23:05 <imdigitaljim> jakeyip
23:05 <imdigitaljim> when any scaling happens we'd use magnum's api instead of heats
23:05 <imdigitaljim> and then magnum will update itself as expected
23:06 <imdigitaljim> but mostly magnum just proxies requests to heat anyways
23:06 <jakeyip> ok but for current clusters they'll be out of synced and someone needs to fix them up again I think?
23:06 <imdigitaljim> how so?
23:07 <jakeyip> the upgrade workflow you were mentioning about adds new nodes?
23:08 <imdigitaljim> we dont get out of synced issues
23:08 <imdigitaljim> so maybe ive already solved that and hadnt explained it
23:09 <jakeyip> is node-count going to be eventually consistent with what's defined in magnum, after the upgrade ?
23:13 <imdigitaljim> oh you mean the N->2N thing
23:13 <imdigitaljim> we dont do that now
23:13 <imdigitaljim> that was just an idea for a more optimal upgrade
23:13 <imdigitaljim> but with the extra resource requirement
23:13 <jakeyip> I see. thanks for the clarification!
23:14 <jakeyip> would love to see your implementation when it is possible!
23:31 <brtknr> Me too
23:33 *** sdake has joined #openstack-containers
23:56 <imdigitaljim> yeah im looking forward to submitting it and would love to have some additional users

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!