Wednesday, 2024-04-03

jakeyip: hi all, anyone around for the meeting?  08:54
jakeyip: it was an extra long weekend over here, nothing much changed from the last meeting  08:54
dalees: i'm back, but likewise little to share  08:58
jakeyip: ok, quick one then.  09:00
jakeyip: #startmeeting magnum  09:01
opendevmeet: Meeting started Wed Apr  3 09:01:03 2024 UTC and is due to finish in 60 minutes.  The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.  09:01
opendevmeet: Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.  09:01
opendevmeet: The meeting name has been set to 'magnum'  09:01
jakeyip: #link https://etherpad.opendev.org/p/magnum-weekly-meeting  09:01
jakeyip: #topic Roll Call  09:01
jakeyip: o/  09:01
jakeyip: mnasiadka dalees courtesy ping :)  09:01
dalees: o/  09:01
jakeyip: #topic Abandon old patches  09:03
jakeyip: I've abandoned many old patches as agreed  09:03
jakeyip: some I've starred for further review, but I haven't had time to action all of them  09:03
dalees: Thanks for doing that, jakeyip  09:03
jakeyip: feel free to abandon any patches older than 2022-10-05, as previously agreed  09:06
jakeyip: anything else on this topic?  09:06
jakeyip: I'll go on to the next one  09:08
jakeyip: next topic  09:08
jakeyip: #topic PTG  09:08
jakeyip: Wed, 10 Apr, 06-08 UTC (next week)  09:08
jakeyip: mark your cal, see you all there :)  09:09
jakeyip: anything on the PTG to chat about?  09:11
dalees: nothing from me, but see you there  09:11
mkjpryor: I guess all the Cluster API stuff  09:11
mkjpryor: Just to make sure we all understand the plan, and agree as much as possible  09:12
jakeyip: #topic ClusterAPI  09:12
jakeyip: mkjpryor: go ahead :)  09:12
mkjpryor: So we have moved the Helm driver to opendev now  09:12
mkjpryor: As discussed before, we are still hoping for it to be in-tree in Dalmatian  09:13
mkjpryor: But I think that is one part that definitely needs discussing at the PTG  09:13
mkjpryor: There are a few supporting components that we would also like to contribute to the Magnum project, but they are currently quite dependent on GitHub Actions for their CI  09:14
mkjpryor: In particular, the Helm charts themselves  09:14
mkjpryor: But also our addon provider and the janitor that we use to clean up OCCM resources  09:14
mkjpryor: They will take more work to move to opendev  09:14
mkjpryor: So, in summary, our current plan is for the Magnum project to own the Helm driver, the charts themselves, the addon provider and the janitor  09:15
mkjpryor: Whether the Helm driver is in-tree, or is just a Cluster API driver owned by the Magnum project in a separate repo, needs discussion  09:16
mkjpryor: Our preference is for an in-tree Cluster API driver  09:16
jakeyip: I think we should drop the in-tree discussion this round; let's get it working properly out-of-tree first. being out-of-tree gives us a good advantage: we can iterate without the massive change chain I was grappling with previously.  09:16
jakeyip: we need to get the CI working  09:16
mkjpryor: This is all true  09:16
mkjpryor: Although the "change chain" this time would just be an import of the driver as-is, right? We wouldn't be doing it in increments like before.  09:17
mkjpryor: I think mnasiadka has made some progress with the CI  09:18
jakeyip: yeah, we can discuss if and how to get it in once we clear the current obstacles  09:19
jakeyip: on the same topic, I had some issues running it, so I was wondering if it's being used in the current iteration anyway  09:19
mkjpryor: We should have somewhere to raise and discuss issues that isn't this meeting  09:20
jakeyip: my issue is a mismatched version of autoscaler + kubernetes. I can fix it, but without CI it's not useful.  09:20
jakeyip: so CI first  09:20
mkjpryor: We have seen basically no issues with mismatched autoscaler and kubernetes versions. There may be something more sinister going on.  09:21
jakeyip: mkjpryor: bugs will be opened in Launchpad as per governance, I think?  09:21
mkjpryor: Yes  09:21
mkjpryor: But in terms of chatting about general usage? Maybe bugs in Launchpad that don't actually turn out to be bugs are fine  09:22
jakeyip: we can chat here or in the Kubernetes Slack.  09:22
dalees: mkjpryor: there is a channel #openstack-magnum on the Kubernetes Slack. It's not very active, but perhaps that is a good persistent chat location.  09:24
mkjpryor: Sounds good  09:24
mkjpryor: Slack works for me  09:24
jakeyip: it's early days, so I'm not sure where the majority of the chat will end up. both sides have their advantages. feel free to DM me on Slack if you need me, the notifications are a bit better there :)  09:25
jakeyip: if you can stay after the meeting, that'll be great, so I can post more details about the autoscaler issue  09:26
jakeyip: mkjpryor ^  09:26
mkjpryor: I can do that  09:27
jakeyip: thanks  09:27
jakeyip: mkjpryor: for your other points, the helm charts already have a repo, `openstack/magnum-capi-helm-charts`, but it is currently empty  09:28
jakeyip: it's set up with the same governance as `openstack/magnum-capi-helm`  09:29
mkjpryor: Of all the projects, that is the one that is most heavily reliant on GitHub Actions  09:29
mkjpryor: The CI will be difficult to port  09:29
mkjpryor: And will take time, which we do not currently have  09:29
jakeyip: can you elaborate on the difficulties? maybe others can help  09:30
mkjpryor: Just that we rely on a lot of actions to do things that would need to be replicated in another way  09:31
jakeyip: ok, anything that's technically not possible in Zuul?  09:32
mkjpryor: I don't think so  09:32
mkjpryor: We also use images built for Azimuth in the CI, which might be politically not good :shrugs:  09:33
mkjpryor: I'm not sure whether the Magnum project wants to build and ship images  09:33
mkjpryor: Probably not  09:33
jakeyip: ok, someone has to push that... StackHPC is probably best suited to do it...  09:34
mkjpryor: We are, but we are also small and busy  09:34
jakeyip: we don't have to replicate ALL the CI either, a small subset is fine  09:34
mkjpryor: This is the issue with open source that isn't fully funded by a customer, right  09:34
jakeyip: yeah, same deal for most of us here... it'll be worse if we can't work together, right? :D  09:35
jakeyip: if we can get it working, we can attract more people to help with maintenance. the initial hump is getting it far enough along for others to see it as viable.  09:37
mkjpryor: Of course, but there is a significant chunk of initial effort that is required from us  09:37
jakeyip: I'm struggling with this as the PTL.  09:37
mkjpryor: We have to offset this effort against customer work as well. It is tricky.  09:38
mkjpryor: Also, we already have a set of things that work for us, so there is inertia there as well.  09:38
mkjpryor: But we do want to get there  09:39
jakeyip: yeah, I can understand. but this is critical to get us from the OpenStack world to the Kubernetes world. That's why I'm still doing this...  09:39
jakeyip: anyway, for the helm charts, I think we need them under governance and in CI so that we can strip them down to a minimal product that can be extended by operators. that's what I see as the good part of this  09:42
mkjpryor: I agree. That is definitely a strength of the approach, for sure.  09:43
jakeyip: FYI, it's a point of contention with the TC if the driver references the helm charts managed in StackHPC's GitHub, so I don't think we have much room on this.  09:43
jakeyip: this driver sure has many people's attention now :)  09:44
mkjpryor: So we talked about setting up a GitHub org for Azimuth the project, because we also want that to exist independently of StackHPC, and the charts could live there  09:44
mkjpryor: There is definitely precedent for using components from other open-source projects  09:44
mkjpryor: So maybe that is more palatable as an interim solution  09:45
mkjpryor: After all, it is just an external component that can be easily replaced  09:45
mkjpryor: I do understand that people have an issue with the default coming from a specific vendor  09:46
jakeyip: I don't understand this approach. is it so that the GH Actions can be ported more easily from the StackHPC org to the Azimuth org, because it is harder to port them to Zuul?  09:46
mkjpryor: Well - we want somewhere for Azimuth to live that is independent of StackHPC. No offence to Gerrit, but I wouldn't choose to use it unless I really had to.  09:46
mkjpryor: So the new GitHub org is primarily for that  09:47
mkjpryor: But it would have the side effect that the CAPI Helm charts would now be an Azimuth sub-project rather than a "StackHPC product" (which we don't consider them to be now, but others clearly do)  09:47
mkjpryor: And yes - the CI would "just work" (modulo setting up some CI vars) in the new org  09:48
mkjpryor: So it is easier than porting to Zuul in that sense  09:48
jakeyip: it's a departure from what was originally proposed; I'm not sure it will be agreeable to the many parties involved  09:50
mkjpryor: Just a thought on an alternative way to get to a palatable short-term solution that doesn't rely on a large time investment from us, which we might not be able to commit to  09:50
mkjpryor: Long term, we definitely want the charts under Magnum, I think  09:51
mkjpryor: If only we could get a customer to fund "moving the CAPI Helm charts under the governance of OpenInfra"  09:51
mkjpryor: :chuckles:  09:52
mkjpryor: Honestly, the time and availability of the right people to do the work is the major blocker here  09:52
mkjpryor: Especially when something already exists that works nicely  09:53
mkjpryor: Anyway - I think we are saying the same thing. We want the charts under Magnum governance.  09:53
mkjpryor: But balancing the time required to do that against the work that pays the bills is difficult.  09:54
jakeyip: I'm looking at the actions now: https://github.com/stackhpc/capi-helm-charts/actions - a bunch of them don't have any runs. is there a minimal set we need to get working? keep in mind we don't need ALL the CI  09:54
jakeyip: I think a big issue is StackHPC shifting to use the charts under Magnum governance, but that doesn't have to happen  09:54
mkjpryor: So at the moment, because they use our test infra, all the runs require manual approval, which is only given after the changes have been reviewed  09:55
mkjpryor: So there are a lot of runs that stay in the pending state and never get executed, yes  09:56
mkjpryor: That would obviously change with Zuul  09:56
jakeyip: I'm seeing that e.g. https://github.com/stackhpc/capi-helm-charts/actions/workflows/ensure-capi-images.yaml doesn't have any runs  09:57
jakeyip: https://github.com/stackhpc/capi-helm-charts/actions/workflows/lint.yaml  09:57
mkjpryor: Oh  09:57
mkjpryor: A lot of those workflows are reusable workflows that are called from other workflows  09:57
mkjpryor: Not run directly  09:57
mkjpryor: For example, those two you mention are called from main.yaml and pr.yaml  09:58
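For readers unfamiliar with the pattern: a reusable GitHub Actions workflow declares a `workflow_call` trigger and is invoked from another workflow's job via `uses:`, so it never shows standalone runs on the Actions page. A minimal sketch, with hypothetical file contents rather than the actual capi-helm-charts workflows:

```yaml
# Hypothetical caller (.github/workflows/pr.yaml) - illustrative only.
name: pr
on:
  pull_request:
jobs:
  lint:
    # Invokes the reusable workflow below; lint.yaml itself therefore
    # shows no runs of its own on the Actions page.
    uses: ./.github/workflows/lint.yaml
---
# Hypothetical callee (.github/workflows/lint.yaml) - a separate file,
# shown in the same block for brevity.
name: lint
on:
  workflow_call:  # reusable: only runs when called by another workflow
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: helm lint charts/*
```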
jakeyip: I see.  09:58
mkjpryor: This is what I mean when I say it isn't going to be trivial to port  09:58
opendevreview: Merged openstack/magnum master: CI: Use Calico v3.26.4  https://review.opendev.org/c/openstack/magnum/+/911577  09:59
jakeyip: should I start from main.yaml to get an idea of what needs to be ported?  10:00
mkjpryor: Yeah  10:01
mkjpryor: main.yaml runs the most minimal set of tests that we have  10:01
jakeyip: ok, I'll take a look  10:02
mkjpryor: Another thing we do is mirror all the required images to https://quay.io/organization/azimuth, and use that as a registry mirror for each of the major registries  10:02
jakeyip: I have a question - how do we get the chart built and hosted on opendev?  10:02
mkjpryor: mnasiadka and I spoke about this before  10:02
mkjpryor: Probably the easiest place to host the chart would be on Artifact Hub  10:03
mkjpryor: https://artifacthub.io/  10:03
mkjpryor: So I guess there would be a Zuul job that runs the Helm packaging and pushes the resulting chart there  10:03
mkjpryor: But you don't need to package the chart to test it  10:04
mkjpryor: Probably getting the CI working is more pressing  10:04
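As a rough sketch of what that could look like on the opendev side - the job name, playbook path and pipeline wiring below are assumptions for illustration, not an existing job:

```yaml
# Hypothetical Zuul config: package the charts and publish them when a
# release is tagged. Publishing credentials would be supplied via a
# Zuul secret; all names here are made up.
- job:
    name: magnum-capi-helm-charts-publish
    description: Run helm package and push the result to a chart registry.
    run: playbooks/publish-charts.yaml

- project:
    release:
      jobs:
        - magnum-capi-helm-charts-publish
```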
jakeyip: yeah  10:05
jakeyip: ok, I need to end the meeting because we've overrun  10:06
jakeyip: dalees / mnasiadka / mkjpryor: anything else we need to capture for the meeting?  10:06
jakeyip: ok, I'll end it  10:09
jakeyip: #endmeeting  10:09
opendevmeet: Meeting ended Wed Apr  3 10:09:14 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)  10:09
opendevmeet: Minutes:        https://meetings.opendev.org/meetings/magnum/2024/magnum.2024-04-03-09.01.html  10:09
opendevmeet: Minutes (text): https://meetings.opendev.org/meetings/magnum/2024/magnum.2024-04-03-09.01.txt  10:09
opendevmeet: Log:            https://meetings.opendev.org/meetings/magnum/2024/magnum.2024-04-03-09.01.log.html  10:09
jakeyip: mkjpryor: so my issue with the autoscaler was registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0 failing to start on a v1.28.1 cluster  10:10
jakeyip: have you tried this combination before?  10:11
mkjpryor: So  10:11
mkjpryor: This isn't actually a version mismatch  10:11
mkjpryor: Just a permission missing from our clusterrole  10:11
mkjpryor: https://github.com/stackhpc/capi-helm-charts/pull/282  10:11
mkjpryor: There will be a new release of the charts today with the fix in  10:12
jakeyip: cool  10:13
mkjpryor: Basically, the v1.29.0 version of the autoscaler needs permission to access an extra CAPI resource  10:13
mkjpryor: Even though the OpenStack provider doesn't actually support machinepools  10:14
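For context, a sketch of the kind of ClusterRole rule involved, reconstructed from the discussion rather than copied from the linked PR - the v1.29.0 autoscaler tries to watch CAPI MachinePools even when the provider doesn't support them, and fails to start if RBAC denies it:

```yaml
# Illustrative ClusterRole fragment (names are assumptions; see the PR
# above for the actual fix). Without the "machinepools" entry, the
# v1.29.0 autoscaler fails to start, as described above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler-management
rules:
  - apiGroups: ["cluster.x-k8s.io"]
    resources: ["machinedeployments", "machines", "machinepools"]
    verbs: ["get", "list", "watch"]
```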
jakeyip: ok. I was going off https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#releases  10:14
jakeyip: > We recommend using Cluster Autoscaler with the Kubernetes control plane (previously referred to as master) version for which it was meant  10:14
mkjpryor: They do say that  10:14
mkjpryor: But in practice we haven't seen any issues with using mismatched versions  10:15
jakeyip: any plan to set the CA tag to kube_tag? PR or WIP?  10:15
jakeyip: yeah, it was previously ok for us too, it just broke.  10:16
mkjpryor: The Helm driver doesn't use kube_tag, because you can't change the Kubernetes version like that  10:16
mkjpryor: The Kubernetes version is tied to the image you use  10:16
mkjpryor: So we use image properties  10:16
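A hedged illustration of the image-properties approach; the property names below are assumptions for illustration, not necessarily the driver's actual schema:

```yaml
# Hypothetical Glance image metadata: the driver derives the Kubernetes
# version from a property on the image itself rather than from the
# kube_tag label. Property names here are assumptions.
name: ubuntu-2204-kube-v1.28.1
properties:
  os_distro: ubuntu
  kube_version: v1.28.1
```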
jakeyip: oh yeah, right, hmm  10:17
mkjpryor: Also, they don't release autoscaler versions for every point release, which makes doing it automatically tricky  10:17
mkjpryor: i.e. if you have a 1.29.3 cluster, how do you know whether you need to use 1.29.0, 1.29.1, 1.29.2 or 1.29.3 for the autoscaler  10:18
jakeyip: well, on the webpage it's .X, so I assume they do some sort of testing within the dot releases but not across minor versions  10:19
mkjpryor: Because at the moment, the latest is 1.29.0  10:19
mkjpryor: Yeah, but there isn't a cluster-autoscaler tag of 1.29.3 even though Kubernetes 1.29.3 is out  10:19
mkjpryor: Is what I mean  10:19
mkjpryor: So you can't just use the Kubernetes version as the tag  10:19
mkjpryor: But also, you don't want to just use autoscaler 1.29.0 in case there is a bugfix release  10:20
jakeyip: yeah, I understand now  10:20
mkjpryor: For example, there are autoscaler tags for 1.28.{0,1,2} even though there are way more Kubernetes 1.28.X versions than that  10:21
mkjpryor: And I assume that if you are running a 1.28.1 cluster, you want the 1.28.2 autoscaler as it has bugfixes :shrugs:  10:21
jakeyip: I think a cloud operator might carry their own helm charts with pinned versions  10:22
mkjpryor: That makes sense  10:22
jakeyip: how would upgrades be done?  10:22
mkjpryor: At the moment, we just use 1.29.0 and it seems to work with 1.26, 1.27 and 1.28 clusters as well  10:22
mkjpryor: Just change the Helm values and the autoscaler deployment would be upgraded  10:23
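A minimal sketch of what "just change the Helm values" might look like for an operator carrying pinned versions; the value keys below are hypothetical, not the charts' actual schema:

```yaml
# Hypothetical values override pinning the autoscaler image; bumping the
# tag and re-running the Helm upgrade rolls the autoscaler Deployment.
# Key names are illustrative, not taken from capi-helm-charts.
autoscaler:
  image:
    repository: registry.k8s.io/autoscaling/cluster-autoscaler
    tag: v1.29.0  # bump to v1.29.1 here if a bugfix release appears
```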
jakeyip: from the Magnum POV?  10:23
mkjpryor: So in our Magnum deployments with this driver, we are tying each Magnum template to a specific chart version and image  10:23
mkjpryor: So when we roll out new templates, they have a new chart version and image  10:24
jakeyip: ok, makes sense  10:24
mkjpryor: Which triggers an upgrade of the various components  10:24
jakeyip: what k8s versions have you upgraded across?  10:25
mkjpryor: In the CAPI Helm chart CI we test 1.27 -> 1.28 -> 1.29 right now  10:25
jakeyip: and how do you version your cluster templates?  10:26
mkjpryor: So we release new templates every month when we build new images  10:26
mkjpryor: We version them with the Kubernetes point version  10:26
mkjpryor: If we need to create a new template mid-cycle, then we add -1, -2 to the end, I think  10:27
jakeyip: I'm thinking e.g. how to upgrade the autoscaler from 1.29.0 to 1.29.1 if there's a bug in 1.29.0  10:27
jakeyip: ok, maybe that's what I would end up doing  10:28
mkjpryor: I suppose you could version your templates as a combination of the Kubernetes and chart versions  10:28
jakeyip: have you tried upgrading -1 to -2?  10:28
mkjpryor: That might work  10:28
mkjpryor: It is just the same as any other template upgrade  10:28
jakeyip: which ideally will only upgrade the helm chart  10:28
mkjpryor: All it ends up doing is a Helm upgrade with the version and values derived from the template and cluster config  10:29
jakeyip: one annoying thing is that magnum cluster templates can't be renamed or deleted if there are clusters using them  10:29
mkjpryor: So that is also the case in Azimuth, but we allow templates to be marked as deprecated, which prevents them being used for new clusters  10:29
mkjpryor: Magnum could do similar?  10:30
jakeyip: I hide them, but then the API gets confused if the user uses the name instead of the template ID  10:30
jakeyip: ideally I hope to have template names like 'xxxx-v1.28.6', and whenever the user uses that name it gets the newest CT fitting that name  10:31
mkjpryor: I see what you mean  10:32
jakeyip: thinking it through, maybe we could actually upgrade users using the old 'xxx-v1.28.6' to the newer one and delete the old CT  10:32
jakeyip: since it's all helm anyway  10:32
mkjpryor: Depends if you want to force an upgrade  10:32
mkjpryor: Which could be disruptive  10:33
jakeyip: hmm  10:33
jakeyip: good point, I'll solve it when I get to it :)  10:33
mkjpryor: I think a lot of those decisions will come down to what the operator is happy with  10:34
mkjpryor: Some will be happy with users running out-of-date clusters as long as they are isolated  10:34
mkjpryor: Others will want to make sure their users stay up to date  10:34
jakeyip: anyway, if the CA 1.29.0 thing gets solved, I will be able to help you with it  10:35
mkjpryor: Cool  10:35
mkjpryor: There will be a chart release today  10:35
jakeyip: with the capi-helm-charts CI or something along those lines  10:35
mkjpryor: Help porting the CAPI Helm charts to opendev would be much appreciated.  10:35
jakeyip: I'm swapping Nectar dev over to use https://opendev.org/openstack/magnum-capi-helm from the in-tree patches  10:36
mkjpryor: Cool  10:36
mkjpryor: More users of that would also be helpful  10:36
jakeyip: when that works, I'll look at swapping to use https://opendev.org/openstack/magnum-capi-helm-charts  10:36
jakeyip: looking at the pace of releases, one thing I'm concerned with is when to cut from StackHPC and then start to get CI working  10:37
jakeyip: you have a lot of commits, and the two repos will drift as soon as we cut it  10:38
mkjpryor: Well, we have automated updates of addon versions  10:39
mkjpryor: So yeah - it will drift quite quickly  10:39
mkjpryor: But in terms of actual releases, we cut them on the first Wednesday of each month  10:39
mkjpryor: Unless we have bugs to fix  10:40
jakeyip: ah, good info  10:40
jakeyip: I guess I can start testing with this month's version and maybe base the import on it  10:40
jakeyip: (pending CI working for openstack/magnum-capi-helm)  10:41
mkjpryor: For the time being, we could propose each release from stackhpc/capi-helm-charts to opendev/magnum-capi-helm-charts  10:41
mkjpryor: Until we have CI working that we are happy with on the opendev side  10:41
jakeyip: ok.  10:42
mkjpryor: Then they hopefully won't drift too much  10:42
jakeyip: I wonder how downstream operators will clone and keep up  10:42
jakeyip: I'll think about it  10:44
jakeyip: thanks for your time today.  10:45
mkjpryor: No worries  10:45
jakeyip: I'll knock off, it's late for me :)  10:46
mkjpryor: Any help you are able to offer would be much appreciated  10:46
mkjpryor: I'd like to see the charts under Magnum governance  10:46
jakeyip: sure, will be happy to  10:46
mkjpryor: Yeah - go get some rest!  10:46
jakeyip: ok, if you can get onto the Kubernetes Slack that'll be great too, you can find me in #openstack-magnum.  10:48
jakeyip: thanks for coming to the meeting  10:48
jakeyip: seeya!  10:48
mnasiadka: jakeyip: was off today ;)  15:16
*** gmann_ is now known as gmann  16:55
opendevreview: Merged openstack/magnum master: Replace abc.abstractproperty with property and abc.abstractmethod  https://review.opendev.org/c/openstack/magnum/+/852010  23:05
opendevreview: Travis Holton proposed openstack/magnum-capi-helm master: add label for enabling auto scaling  https://review.opendev.org/c/openstack/magnum-capi-helm/+/915031  23:30
