Wednesday, 2023-05-17

08:20 *** mattp is now known as mkjpryor
08:36 <opendevreview> John Garbutt proposed openstack/magnum master: WIP: Implement cluster update for Cluster API driver  https://review.opendev.org/c/openstack/magnum/+/880805
08:57 <jakeyip> gmann: ok
08:59 <jakeyip> I'm here if anyone needs me
09:01 * dalees is here
09:03 <dalees> we tested Fedora CoreOS 38 this week; it doesn't look like any changes are required.
09:06 <jakeyip> nice
09:06 <jakeyip> which k8s version?
09:07 <dalees> that was with 1.25, I think.
09:08 <dalees> I did test 1.27 the other week too (with the same changes in place as required for 1.26), and it started OK as well. There might be some `kube-system` services to update, but the basic service ran with all existing arguments.
09:12 <mkjpryor> Not sure who has seen it so far, but we have been working pretty hard on the Cluster API driver over the last few weeks.
09:14 <mkjpryor> We have all the basic functionality working except template upgrade, so create/resize/delete and nodegroups all work
09:15 <dalees> we have seen it and are following it closely, mkjpryor - thank you! travisholton is testing your work. I'm currently working on the CAPI side (to ensure access to clusters with no API floating IP)
09:16 <mkjpryor> dalees - nice. travisholton and others - any reviews of the patches would be much appreciated!
09:16 <travisholton> hi all
09:17 <mkjpryor> dalees - on the subject of CAPI clusters without a floating IP: I don't know if you know Azimuth? It is the user-friendly, platform-focused portal we have been developing at StackHPC.
09:18 <mkjpryor> In Azimuth, we are able to create Kubernetes clusters and expose web services on them (e.g. monitoring, dashboards, JupyterHub, KubeFlow) without consuming any floating IPs at all.
09:20 <mkjpryor> We do this using a tunnelling proxy that we built called Zenith. Zenith also handles TLS and SSO for services that want it (not for the Kubernetes API).
09:22 <mkjpryor> Zenith has a server and a client that establish a tunnel (using SSH reverse port forwarding) over which traffic can flow to the proxied service. In the case of the Kubernetes API, we launch a Zenith client on each control plane node, using static pod manifests injected via cloud-init, that points to the API server on localhost.
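
To illustrate the approach described above, here is a minimal sketch of the kind of static pod manifest a Zenith client could use. The image name, Zenith server endpoint and arguments are assumptions for illustration, not the actual Azimuth/Zenith code.

    # Sketch only: a hypothetical static pod for a Zenith client that tunnels the
    # local Kubernetes API server. Image, args and endpoints are illustrative.
    import yaml

    zenith_client_pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "zenith-client", "namespace": "kube-system"},
        "spec": {
            "hostNetwork": True,  # so the client can reach the API server on localhost:6443
            "containers": [{
                "name": "zenith-client",
                "image": "registry.example.com/zenith-client:latest",  # hypothetical image
                "args": [
                    "--server", "zenith.example.com:22",    # hypothetical Zenith server (SSH reverse tunnel endpoint)
                    "--forward", "https://localhost:6443",  # the local kube-apiserver to proxy
                ],
            }],
        },
    }

    # cloud-init would write this to /etc/kubernetes/manifests/zenith-client.yaml on each
    # control plane node, so kubelet starts it as a static pod before the cluster is up.
    print(yaml.safe_dump(zenith_client_pod))
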
09:23 <dalees> That's interesting, I'll be keen to have a look at Zenith. I've PoC'd an Octavia solution that gives us access in from another project's network. Now I'm looking at a netns proxy on the Neutron DHCP nodes, as Vexxhost have shown - it seems like a good solution that consumes no resources.
09:23 <mkjpryor> I've also seen Vexxhost's solution, and it does look neat.
09:24 <mkjpryor> Zenith does require you to run the Zenith server somewhere, but it is one server for lots of tunnels.
09:24 <mkjpryor> We will stick with Zenith in Azimuth as we want the SSO integration.
09:25 <mkjpryor> But something like Vexxhost's solution could be neat for Magnum going forward.
09:25 <mkjpryor> In terms of the upstream patches we have now, something like that would not be in the first iteration, as the patch(es) are already large enough!
09:25 <jakeyip> any link to Vexxhost's solution?
09:25 <dalees> and so all your Zenith clients just phone home with reverse SSH? that's kinda neat, too - so once they're connected, the Zenith server handles the SSO and no one needs the SSH keys?
09:26 <dalees> jakeyip: https://github.com/vexxhost/magnum-cluster-api/blob/main/magnum_cluster_api/proxy/manager.py  (as mentioned on https://kubernetes.slack.com/archives/CFKJB65G9/p1683654401315339?thread_ts=1683121146.879219&cid=CFKJB65G9 )
09:26 <mkjpryor> jakeyip: it is in their out-of-tree driver
09:27 <jakeyip> thanks
09:27 <mkjpryor> Once we have the initial version of the in-tree Cluster API driver merged, it is something we could look at adding in a future patch
09:27 <dalees> mkjpryor: yes, I agree, it doesn't need to be in the first iteration. It's not a must-have for private clouds, and we can add it later. It would deploy in a very different way to the rest of Magnum, too.
09:28 <mkjpryor> Their code should work in our driver with very few tweaks, as it is pretty much independent of the way the CAPI resources are actually made
09:29 <mkjpryor> dalees: I'm keen not to co-opt this into a discussion on Zenith, but yeah - basically. There is a process of associating an SSH key with a service ID, and then any Zenith client that connects with that SSH key is associated with that service in a load-balanced configuration.
09:30 <mkjpryor> The server then makes that load-balanced configuration available as <serviceid>.<basedomain>, handling TLS termination and SSO as required.
09:31 <mkjpryor> (It only works for HTTP services ATM)
09:31 <mkjpryor> Because it relies on virtual hosts
09:31 <jakeyip> Vexxhost's implementation looks interesting
09:33 <mkjpryor> I do like Vexxhost's implementation if your only aim is to expose the Kubernetes API server, which it would be in Magnum
09:33 <dalees> jakeyip: yeah, haproxy does all the heavy lifting. You just need to keep its config updated.
09:34 <mkjpryor> In Azimuth we also use Zenith to expose the Kubernetes and monitoring dashboards of the stacks that we deploy on the clusters, with SSO to make sure only authorised people can access them.
09:36 <jakeyip> mkjpryor: this is more like a managed service? Azimuth is used to deploy clusters and managed resources, and users use normal k8s ingress, etc.?
09:37 <dalees> we could do extra services with HAProxy into the netns; we'd just need to write the control plane host addresses into the config.
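
For context, a sketch of the kind of haproxy configuration such a netns proxy keeps updated as control plane addresses change. The addresses and output handling are made up for illustration; this is not the vexxhost/magnum-cluster-api code linked above.

    # Sketch only: render a TCP-passthrough haproxy config for the cluster's API servers.
    control_plane_ips = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical addresses

    lines = [
        "frontend k8s_api",
        "    bind *:6443",
        "    mode tcp",
        "    default_backend k8s_api_servers",
        "",
        "backend k8s_api_servers",
        "    mode tcp",
    ]
    lines += [f"    server cp-{i} {ip}:6443 check" for i, ip in enumerate(control_plane_ips)]

    # A real proxy manager would write this into the namespace-scoped haproxy config
    # and reload haproxy whenever the control plane membership changes.
    print("\n".join(lines))
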
09:38 <mkjpryor> jakeyip: At the moment, yes. Azimuth deploys Kubernetes clusters with dashboards and monitoring. It also allows users to deploy apps onto those clusters, like JupyterHub and KubeFlow, whose interfaces are also exposed using Zenith.
09:39 <mkjpryor> The Kubernetes API server is a slightly special case, done using static pod manifests for the Zenith clients. This is because they need to run before Kubernetes itself is actually up and running.
09:40 <jakeyip> deploying apps is interesting. do users still get kubeconfig access to their clusters? does Azimuth store state? what if a user deletes e.g. the namespace of an app - will Azimuth get confused?
09:40 <mkjpryor> For all the other services, there is a Zenith operator watching for instances of the CRDs "zenith.stackhpc.com/{Reservation, Client}" to be created on each tenant cluster, and creating resources on the clusters in response to those.
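
As a rough illustration of that operator pattern, here is a minimal sketch using the kopf framework. The API version ("v1alpha1") and the handler body are assumptions; the real Zenith operator does considerably more than this.

    # Sketch only: a kopf-based watcher for the Zenith Reservation CRD mentioned above.
    import kopf

    @kopf.on.create("zenith.stackhpc.com", "v1alpha1", "reservations")
    def on_reservation_created(spec, name, namespace, logger, **_):
        # The real operator would create the resources needed to connect a Zenith
        # client for this reservation; here we just record the event.
        logger.info(f"Zenith reservation {namespace}/{name} created: {dict(spec)}")

    # Run with:  kopf run zenith_operator_sketch.py
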
09:41 <mkjpryor> jakeyip: At the moment, we are just using Helm to splat the apps onto the cluster at deploy time. However, we are in the process of moving to Argo CD to manage all the addons and apps on tenant clusters so that, in theory, we can recover from users doing stupid things like that ;-)
09:42 <mkjpryor> All the Azimuth state is in CRDs in the Kubernetes cluster on which it runs, so in etcd in most cases
09:42 <jakeyip> nice
09:43 <mkjpryor> I did once have a user delete the CNI... In theory, the Argo CD-managed addons could actually recover from that.
09:44 <mkjpryor> Because Argo is constantly watching to make sure that the resources it created are healthy
09:44 <mkjpryor> I'm giving a talk about Azimuth in Vancouver, actually, if anyone is interested.
09:45 <jakeyip> will make sure my teammate goes :P
09:45 <jakeyip> I need to read more about Azimuth and Zenith
09:45 <dalees> cool, I will look out for the recording
09:45 <mkjpryor> It is called "Self-service LOKI applications for non-technical users", in the Private & Hybrid Cloud track
09:46 <mkjpryor> I'm also giving a talk about Cluster API and Magnum with mnaser from Vexxhost
09:47 <jakeyip> we are also starting to dip our toes into managed K8s, so your experiences will be very helpful indeed. we will need to bother you more if we have questions.
09:47 <jakeyip> hopefully the code merges before Vancouver! :D
09:47 <jakeyip> no pressure
09:47 <mkjpryor> jakeyip: feel free to get in touch
09:48 <jakeyip> thanks
09:48 <mkjpryor> jakeyip: speaking of merging the code, your eyes on the patches would be much appreciated
09:48 <mkjpryor> We have been working on improving the test coverage
09:48 <jakeyip> yeah, I have a bunch of things to review
09:48 <mkjpryor> I think there is some stuff that is probably not too far from being mergeable
09:48 <jakeyip> need everyone's help on the RBAC too
09:49 <dalees> travisholton: did you have any comments or changes on mkjpryor's patchsets in Gerrit?
09:50 <jakeyip> mkjpryor: are the tests passing yet? I would appreciate it if you could guide me with a list of reviews that I should be doing, in order, etc.
09:50 <travisholton> mkjpryor: I have been experimenting with it a bit lately, as dalees mentioned
09:51 <mkjpryor> jakeyip: Tyler from our team has been doing a lot of work on the Magnum Tempest plugin.
09:51 <mkjpryor> I'm not sure things are completely passing yet in the gate
09:52 <mkjpryor> But then I don't think the old driver is fully tested in the gate either, right? Because of general slowness?
09:52 <mkjpryor> (That is what we are trying to fix)
09:53 <jakeyip> yeah, I do the testing manually, which has contributed to the slowness of reviews. mnasiadka was trying to help with that, but he got busy for a bit.
09:53 <travisholton> mkjpryor: is it OK if I submit my own patches to that? I have one change I'd like to add
09:53 <mkjpryor> Other colleagues in our team have been working on the tests for the old driver. mnasiadka probably has more context than me on how far that got.
09:54 <mkjpryor> travisholton: The patch chain is already quite complicated - maybe for now just comment on what you would change? We can add you as a co-author on the patch if we adopt it
09:55 <travisholton> yes, I've been watching the patchsets change daily :-)
09:55 <mkjpryor> Is the change adding additional functionality or fixing something?
09:55 <travisholton> additional functionality (at least as of Monday)
09:55 <mkjpryor> We want to get the simplest possible chain merged, TBH.
09:55 <mkjpryor> For example, we hardly support any labels at all right now
09:56 <mkjpryor> But I think that is fine for the first pass.
09:56 <travisholton> lol, yeah, I wanted to add a way to pass extra --set args to helm via labels
09:56 <mkjpryor> We can add support for labels one by one in smaller, more reviewable patches
09:57 <jakeyip> travisholton: you can also put up a WIP patch without the intention of it getting merged, just for sharing your code and ideas
09:57 <jakeyip> WIP/DNM
09:57 <mkjpryor> travisholton: so my idea for that is that there will be a template label containing serialized JSON that will be merged into the Helm values
09:57 <mkjpryor> That is what we do in Azimuth (we also have a templates -> clusters model, where the operator manages the available templates)
09:58 <travisholton> my idea was similar, I think. Just a label that adds a string for the --set option
09:58 <mkjpryor> I think I prefer the more structured approach, TBH
09:59 <mkjpryor> As in, the template defines a set of values that are used as a "starting point" (just an empty dict now), and then Magnum applies the cluster-specific stuff on top (node groups etc.)
10:00 <mkjpryor> --set also notoriously doesn't deal with certain types very well, which is why they added --set-string
10:00 <mkjpryor> So using serialised JSON in the label avoids that too
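
A minimal sketch of the serialized-JSON label idea being discussed. The label name and the value keys are hypothetical, not the actual Magnum label or chart schema; the point is that JSON preserves types and nesting that `helm --set` handles poorly.

    # Sketch only: a hypothetical "helm_values_json" cluster/template label whose value
    # is serialized JSON to be merged into the Helm values for the cluster.
    import json

    labels = {
        "helm_values_json": '{"etcd": {"backupIntervalHours": 12}, "apiServer": {"enableLoadBalancer": false}}',
    }

    extra_values = json.loads(labels.get("helm_values_json", "{}"))
    # Booleans, numbers and nested dicts survive the round trip, unlike `--set`,
    # which applies its own type coercion and escaping rules (hence --set-string).
    print(type(extra_values["apiServer"]["enableLoadBalancer"]))  # <class 'bool'>
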
10:02 <travisholton> sounds sensible
10:07 <travisholton> I've needed to pass some extra arguments in to get clusters to build successfully on devstack. How are you managing things like machineSSHKeyName, and passing kubeletExtraArgs, right now?
10:10 <mkjpryor> machineSSHKeyName should be set, if a key name is provided in the cluster
10:10 <mkjpryor> kubeletExtraArgs is not set ATM
10:10 <mkjpryor> What do you need to set in kubeletExtraArgs?
10:11 <travisholton> it hadn't been set in my devstack clusters (at least as of Monday). I haven't tried in the past couple of days
10:12 <travisholton> on Ubuntu I need to set resolv-conf: /run/systemd/resolve/resolv.conf
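
For reference, the kind of Helm values fragment being described here; the exact key layout expected by the charts is an assumption, not something confirmed in this log.

    # Sketch only: extra kubelet arguments for Ubuntu images using systemd-resolved,
    # expressed as the values fragment that would be merged into the Helm release.
    kubelet_values = {
        "kubeletExtraArgs": {
            # Point kubelet at systemd-resolved's real upstream resolv.conf so cluster
            # DNS does not inherit the 127.0.0.53 stub resolver (see the known issue
            # linked a few lines below).
            "resolv-conf": "/run/systemd/resolve/resolv.conf",
        },
    }
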
10:14 <mkjpryor> TBH, I haven't tried it since JohnG started mangling my original patches to make them more reviewable/testable
10:14 <mkjpryor> So it might be broken
10:14 <mkjpryor> But it should work
10:15 <travisholton> here's a complete output of the helm values that I used in devstack and that works for me: https://0bin.net/paste/Kj+EsYct#Ps1iIFncxctoFg966Yg1aVBV7W8Z6HEbOnVM0IY-mbR
10:15 <mkjpryor> I've never seen a requirement to make that change to the resolv.conf. What is it that means you need to do that?
10:16 <travisholton> https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues
10:17 <mkjpryor> "kubeadm automatically detects systemd-resolved, and adjusts the kubelet flags accordingly."
10:17 <mkjpryor> Shouldn't that be happening?
10:18 <travisholton> hmm... I know I have seen it not working. I may try it without again and see if it's still a problem
10:18 <mkjpryor> That would explain why I haven't needed to do it before
10:18 <mkjpryor> What image are you using?
10:19 <travisholton> an ubuntu-22.04-x86_64 image that I built with Packer + Ansible
10:19 <mkjpryor> Maybe it is a 22.04 thing
10:19 <mkjpryor> We are still using 20.04
10:19 <travisholton> right... I've only been using 22.04 lately
10:20 <mkjpryor> In any case, I think implementing the label containing structured "default values" would be a good thing to have
10:20 <travisholton> +1
10:20 <mkjpryor> You can implement it at the top of the patch chain if you want
10:21 <mkjpryor> Just be prepared to rebase regularly :D
10:21 <travisholton> that's not a problem
10:22 <mkjpryor> We will probably also implement other labels to simplify common things that could "technically" all be specified via this "default values" template label
10:23 <dalees> in which order should they take precedence? I'd guess: Helm chart defaults, then the default-values JSON structure, then specific labels?
10:23 <travisholton> we'll certainly want to have some that we can customise (e.g. imageRepository)
10:24 <mkjpryor> So the merge order, with the rightmost entries taking precedence, will eventually be: "chart defaults" -> "Magnum global defaults" -> "template defaults" -> "template labels" -> "cluster labels" -> "Magnum-derived cluster specifics (e.g. networks, node groups)"
10:24 <mkjpryor> IMHO, anyway
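
A small sketch of that precedence chain. The layer names follow the message above; the deep_merge helper and the example layer contents are illustrative, not existing Magnum code.

    # Sketch only: merge the value layers in order, with later (rightmost) layers winning.
    def deep_merge(base, override):
        """Recursively merge override into base, with override taking precedence."""
        merged = dict(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged

    layers = [
        {},                                            # chart defaults (live in the chart itself)
        {"kubernetesVersion": "1.26.4"},               # Magnum global defaults (hypothetical)
        {"etcd": {"backupIntervalHours": 24}},         # template defaults
        {"etcd": {"backupIntervalHours": 12}},         # template labels
        {"apiServer": {"enableLoadBalancer": False}},  # cluster labels
        {"machineSSHKeyName": "my-keypair",            # Magnum-derived cluster specifics
         "nodeGroups": [{"name": "default-worker", "machineCount": 3}]},
    ]

    values = {}
    for layer in layers:
        values = deep_merge(values, layer)
    print(values["etcd"]["backupIntervalHours"])  # 12 - the later layer took precedence
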
10:26 <dalees> yeah, that makes sense.
10:43 <dalees> thanks all, see you next week.
10:54 <jakeyip> seeya all
16:38 <gmann> jakeyip: thanks
