dalees | Hi all, meeting here in 5 minutes. | 07:55 |
jakeyip | hi dalees | 07:57 |
dalees | #startmeeting magnum | 08:00 |
opendevmeet | Meeting started Tue Jun 24 08:00:09 2025 UTC and is due to finish in 60 minutes. The chair is dalees. Information about MeetBot at http://wiki.debian.org/MeetBot. | 08:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 08:00 |
opendevmeet | The meeting name has been set to 'magnum' | 08:00 |
dalees | #topic Roll Call | 08:00 |
dalees | o/ | 08:00 |
jakeyip | o/ | 08:01 |
sd109 | o/ | 08:01 |
dalees | mnasiadka: ping | 08:01 |
mnasiadka | o/ (but on a different meeting so might not be very responsive) | 08:01 |
dalees | okay | 08:02 |
dalees | Agenda has reviews and one topic, so I moved that first. | 08:02 |
dalees | #topic Upgrade procedure | 08:02 |
dalees | jakeyip: you brought this one up? | 08:03 |
jakeyip | hi, that's mine. just want to know, does anyone have an upgrade workflow in mind? | 08:03 |
dalees | for the combination of magnum, helm driver, helm charts, capo, capi? | 08:04 |
jakeyip | yes. | 08:05 |
dalees | do you mean versions, or actual processes of performing them? | 08:06 |
jakeyip | we recently looked into upgrading capi/capo but realised there are dependencies on magnum-capi-helm and also the helm charts | 08:07 |
sd109 | capi-helm-charts publishes a dependencies.json which specifies the capi version each release of the charts is tested against: https://github.com/azimuth-cloud/capi-helm-charts/blob/affae0544b07c4b2e641b3b5bf990e561c055a91/dependencies.json | 08:07 |
dalees | yeah, we've done capi/capo upgrades, but not to the latest where capo dropped v1alpha7. it will be tricky to make sure all clusters are moved off the old helm charts first. | 08:08 |
jakeyip | sd109: are there upgrade tests being run? | 08:10 |
jakeyip | charts will provision the resources at the version specified. upgrading CTs (cluster templates) / charts _should_ upgrade those resources too? | 08:11 |
jakeyip | capi / capo upgrades also upgrade the resources too I believe, but ideally they should be upgraded by charts first? | 08:13 |
dalees | jakeyip: so the way these versions work is that one is the 'stored' version (usually the latest) and the k8s api can translate between that and any other served version (hub-and-spoke in the kubebuilder docs). | 08:14 |
dalees | so once you upgrade capo, it'll update the CRDs in k8s with the new versions, and start storing and serving those (eg v1beta1). It doesn't matter whether the charts write the old or the new crd version, as long as it's still served. | 08:15 |
dalees | so you only need to care when a version stops being served; otherwise you can keep using either the old or new crd versions. | 08:15 |
sd109 | There are upgrade tests being run, but since we haven't upgraded CAPO past v0.10 yet (due to some security group changes in v0.11 and the need for the new ORC installation in v0.12), we haven't actually tested the v0.12 upgrade that drops v1alpha7, for example | 08:16 |
dalees | kubebuilder has some good info on this if you want to read more: https://book.kubebuilder.io/multiversion-tutorial/conversion-concepts.html | 08:16 |
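As a concrete illustration of the served/stored distinction, here is a minimal sketch using the kubernetes Python client (the same ecosystem the driver builds on), assuming a kubeconfig pointing at the CAPI management cluster; it surfaces the same served/stored information as the `kubectl get crd` command mentioned later in the meeting.

```python
# Sketch: inspect which API versions of the CAPO OpenStackCluster CRD
# are served, which one is the storage ("hub") version, and which
# versions may still exist in etcd. Assumes a kubeconfig for the
# CAPI management cluster.
from kubernetes import client, config

config.load_kube_config()

crd_name = "openstackclusters.infrastructure.cluster.x-k8s.io"
crd = client.ApiextensionsV1Api().read_custom_resource_definition(crd_name)

for v in crd.spec.versions:
    print(f"{v.name}: served={v.served} storage={v.storage}")

# status.storedVersions lists every version objects may still be stored
# as; these matter when a CAPO release stops serving a version.
print("storedVersions:", crd.status.stored_versions)
```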
jakeyip | in this case, the chart will be using an older version, what happens when you do a helm upgrade? | 08:16 |
dalees | sd109: ah, that's helpful to know you haven't gotten there either! | 08:17 |
sd109 | When you do a helm upgrade, it will upgrade the resources to new v1beta1 thanks to https://github.com/azimuth-cloud/capi-helm-charts/pull/423 | 08:18 |
jakeyip | no, I mean a `helm upgrade` that changes values but not the chart, which happens when you resize etc | 08:19 |
dalees | if helm tries to talk to capo with an old crd version that isn't served it'll fail. | 08:20 |
dalees | i.e. new capo (after v1alpha7 is no longer served) and an old chart that specifies v1alpha7. | 08:21 |
jakeyip | and in the case where it is still served but not the latest? | 08:21 |
dalees | if it's served, it'll just translate to v1beta1 and store as that. | 08:21 |
jakeyip | ok | 08:22 |
dalees | that's the hub-and-spoke model the kubebuilder docs talk about. (I *think* it changes the stored version in etcd on first write after the controller upgrade) | 08:22 |
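To make the hub-and-spoke point concrete: while a version is still served, the same object can be read at any served version and the apiserver converts on the fly. A hedged sketch, with placeholder namespace and cluster names:

```python
# Sketch: read the same OpenStackCluster at two API versions. While both
# are served, the apiserver converts between them transparently; once a
# version stops being served, the same call returns 404 (which is what
# an old chart or driver would hit). "magnum-system" and "my-cluster"
# are placeholders.
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
api = client.CustomObjectsApi()

for version in ("v1alpha7", "v1beta1"):
    try:
        obj = api.get_namespaced_custom_object(
            group="infrastructure.cluster.x-k8s.io",
            version=version,
            namespace="magnum-system",
            plural="openstackclusters",
            name="my-cluster",
        )
        print(version, "served, apiVersion:", obj["apiVersion"])
    except ApiException as exc:
        print(version, "not available:", exc.status)
```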
jakeyip | I'll read that | 08:23 |
sd109 | So I guess we need to make sure all user clusters are using a new enough version of capi-helm-charts to be on v1beta1 before we upgrade CAPO on the management cluster | 08:23 |
jakeyip | yeah, and the driver also needs to be upgraded to talk v1beta1 first | 08:24 |
jakeyip | I _think_ the sequence is something like: driver -> charts -> cluster templates -> all clusters -> capi+capo? | 08:26 |
dalees | yeah, that sounds right - for the version of capo that drops v1alpha7 (was it v0.10?). | 08:27 |
sd109 | I think the only place which needs updating in the driver is here: https://opendev.org/openstack/magnum-capi-helm/src/commit/60dc96c4dae8628e92c20b1ca594c4cf10eba5e4/magnum_capi_helm/kubernetes.py#L289 | 08:27 |
dalees | you do need capo at least up to a version that supports v1beta1 first (but that will be most of the installs) | 08:27 |
sd109 | And as far as I can tell that actually only affects the health_status of the cluster, because it gets a 404 when trying to fetch the v1alpha7 version of the openstackcluster object from the management cluster | 08:28 |
sd109 | It was v0.12 that dropped v1alpha7 in CAPO | 08:28 |
dalees | ah v0.12, thanks. | 08:28 |
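For context, a rough sketch of the shape of the change being discussed; this is not the actual patch (the real code is at the kubernetes.py link above) and the names here are illustrative. The driver hard-codes the CAPO group/version it uses to fetch OpenStackCluster objects, so moving to v1beta1 is essentially changing that constant:

```python
# Illustrative only: the real constant lives in
# magnum_capi_helm/kubernetes.py (see sd109's link above).
CAPO_GROUP = "infrastructure.cluster.x-k8s.io"
CAPO_VERSION = "v1beta1"  # was "v1alpha7"; requires a CAPO release that
                          # already serves v1beta1 (most installs do)

def openstackcluster_path(namespace, name):
    """Apiserver path the driver GETs when reporting health_status."""
    return (f"/apis/{CAPO_GROUP}/{CAPO_VERSION}"
            f"/namespaces/{namespace}/openstackclusters/{name}")
```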
jakeyip | I think there were more issues than that but I can't remember now | 08:29 |
dalees | jakeyip: you had a patchset for that - did it merge? | 08:30 |
jakeyip | for magnum? | 08:30 |
jakeyip | sorry for the driver? I didn't merge it yet I think | 08:30 |
dalees | ah this one - https://review.opendev.org/c/openstack/magnum-capi-helm/+/950806 | 08:31 |
dalees | sd109: would you have a look at that soon? | 08:31 |
sd109 | Can we move that to v1beta1 now instead of v1alpha7? | 08:31 |
dalees | yeah, I'd prefer that | 08:32 |
jakeyip | can we / should we jump more than two versions at once? | 08:32 |
dalees | well, it blocks upgrade to capo 0.12 if we don't | 08:32 |
jakeyip | does capo serve more than 2 versions at once? | 08:33 |
dalees | It's probably worth noting the version restrictions, but yes they do. | 08:34 |
dalees | (i need to look up these particular versions of capo) | 08:35 |
jakeyip | ok I really need to try it out first then report back | 08:35 |
jakeyip | happy to skip ahead while I take a look at capo, then come back later | 08:35 |
sd109 | There are some CAPO docs on API versions here which suggest CAPO does some kind of automatic migration to new API versions for us: https://cluster-api-openstack.sigs.k8s.io/topics/crd-changes/v1alpha7-to-v1beta1#migration | 08:35 |
sd109 | I also need to go away and have a closer look so happy to move on for now | 08:36 |
dalees | if you would like a whole lot of yaml to read through, all the info about your capo version is found in: `kubectl get crd openstackclusters.infrastructure.cluster.x-k8s.io -o yaml` | 08:36 |
dalees | look for "served: true", "stored: true" and "storedVersions" | 08:36 |
dalees | okay, we shall move on. thanks for sharing what we each know! | 08:37 |
dalees | #topic Review: Autoscaling min/max defaults | 08:37 |
dalees | #link https://review.opendev.org/c/openstack/magnum-capi-helm/+/952061 | 08:38 |
dalees | so this is from a customer request to be able to change min and max autoscaling values | 08:38 |
dalees | i think we touched on this last meeting, but needed more thinking time. | 08:39 |
dalees | perhaps it's the same again | 08:39 |
sd109 | Yeah sorry I haven't had time to look at that one yet, I'm hoping to get to it this week and will leave any comments I have on the patch itself | 08:40 |
jakeyip | hm I thought I reviewed that but seems like I didn't vote | 08:40 |
dalees | thanks both, we can move on unless there are things to discuss. please leave review notes in there when you get to it. | 08:41 |
jakeyip | I will give it a go again | 08:41 |
jakeyip | oh it's in draft :P | 08:42 |
dalees | #topic Review: Poll more Clusters for health status updates | 08:42 |
dalees | https://review.opendev.org/c/openstack/magnum/+/948681 | 08:43 |
jakeyip | I will review this | 08:43 |
dalees | so this one, I understand, will add polling load to the conductor for a large number of clusters | 08:43 |
dalees | we've been running it for ages, and it enables a few things I'm doing in later patchsets in the helm driver: better health_status, and pulling back node_count from autoscaler. | 08:44 |
dalees | it would be better to use watches for this than polling though. | 08:44 |
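On the watches-versus-polling point, a hedged sketch of what a watch could look like with the kubernetes Python client; this is an illustration, not a proposed patch, and the group/version/field names assume CAPO's v1beta1 API:

```python
# Sketch: a watch on OpenStackCluster objects pushes status changes to
# the consumer instead of re-reading every cluster on a timer.
from kubernetes import client, config, watch

config.load_kube_config()
api = client.CustomObjectsApi()

w = watch.Watch()
for event in w.stream(api.list_cluster_custom_object,
                      group="infrastructure.cluster.x-k8s.io",
                      version="v1beta1",
                      plural="openstackclusters"):
    obj = event["object"]
    name = obj["metadata"]["name"]
    ready = obj.get("status", {}).get("ready")
    print(f"{event['type']} {name}: ready={ready}")
    # a conductor could map events like these onto Magnum's health_status
```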
jakeyip | hm the current situation is syncing _COMPLETE, but this adds more? | 08:47 |
sd109 | Don't think I have anything to add on this one, seems like a nice addition to me but agree that the extra load is worth thinking about | 08:49 |
dalees | ah, that's true - it already is polling CREATE_COMPLETE and UPDATE_COMPLETE. So really that would be most clusters. | 08:49 |
jakeyip | yeah, I didn't understand your comment "Without the _COMPLETE I also wonder if...". | 08:51 |
dalees | huh. I think I had confused myself on what was being added. | 08:53 |
dalees | agree, _COMPLETE is already there. | 08:53 |
jakeyip | I think what you want is adding CREATE_IN_PROGRESS to surface the errors where a cluster gets stuck midway through creation, with things like autoscaler pod errors? | 08:54 |
jakeyip | maybe need to clarify the use case in the commit message, then good to go | 08:54 |
dalees | yeah I think so. thanks | 08:55 |
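For clarity, the shape of the change under discussion is roughly the following; the names are indicative, not the patch's actual code. It widens the set of cluster states the conductor's periodic health sync considers:

```python
# Illustrative only: the states whose clusters get polled for health
# updates. Adding CREATE_IN_PROGRESS surfaces clusters stuck midway
# through creation (e.g. autoscaler pod errors).
POLLED_STATES = (
    "CREATE_COMPLETE",     # already polled today
    "UPDATE_COMPLETE",     # already polled today
    "CREATE_IN_PROGRESS",  # the addition under review
)
```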
dalees | there are 3 more reviews noted in the agenda but only 5 minutes left. Any particular ones of those, or others, we might talk about? | 08:56 |
dalees | #topic Open Discussion | 08:56 |
dalees | or other topics, for the last part of the meeting | 08:56 |
jakeyip | I will look at them, maybe discuss next week | 08:57 |
jakeyip | next meeting :P | 08:57 |
sd109 | Yeah I haven't had time to look at the two Helm reviews either so I don't think we need to discuss them now | 08:58 |
dalees | all good, those helm ones are from StackHPC. John's update to his one looks good, and I want to progress Stig's one sometime as it's hurting us occasionally, but it can wait. | 08:59 |
sd109 | Great, thanks. I'm also trying to get someone from our side to progress Stig's one too, but it's proving difficult to find the time at the moment | 09:00 |
dalees | yep, i think it's promising. probably just needs to move more things (like delete) to the same reconciliation loop to avoid the conflicts. | 09:01 |
dalees | but yeah, time! | 09:01 |
dalees | #endmeeting | 09:01 |
opendevmeet | Meeting ended Tue Jun 24 09:01:27 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 09:01 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/magnum/2025/magnum.2025-06-24-08.00.html | 09:01 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/magnum/2025/magnum.2025-06-24-08.00.txt | 09:01 |
opendevmeet | Log: https://meetings.opendev.org/meetings/magnum/2025/magnum.2025-06-24-08.00.log.html | 09:01 |
dalees | thanks both for coming and sharing! | 09:01 |
jakeyip | thanks dalees | 09:02 |