Tuesday, 2023-12-19

NeilHanlono/ good morning folks14:47
NeilHanlonreminder: meeting in ~13 minutes at the top of the hour14:47
jrosserNeilHanlon: is there an official way to get a newer kernel on Rocky (like Ubuntu has HWE kernels, for example)?14:51
NeilHanlonyep! we have a relatively new release of a kernel-mainline package, which you can access by installing `rocky-release-kernel`14:51
NeilHanlonplan is to have mainline and LTS options14:52
NeilHanlon(some notes here: https://sig-kernel.rocky.page/meetings/2023-11-30/#discussions)14:52
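For reference, a minimal sketch of the install flow described above; `rocky-release-kernel` and `kernel-mainline` are the package names mentioned in the chat, while the sudo usage and reboot step are assumptions:

    sudo dnf install rocky-release-kernel   # enables the SIG/Kernel repositories
    sudo dnf install kernel-mainline        # 6.5.12 at the time of this log
    sudo reboot                             # boot into the new kernel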
jrosserahha interesting14:53
jrosserwe needed >= 5.1914:53
NeilHanlonwe're providing 6.5.12 at the moment, will be bumping that to 6.6 soon-ish14:54
NeilHanlonthere is a bit of a problem when it comes to 6.7, but.. that will be a problem for almost everyone I think -- it's related to SecureBoot and NX support14:54
NeilHanloni forget which LTS branch we decided to go with14:55
jrosserit turns out that everyone who ever enabled kTLS on linux and wrote a fancy blog about achieving hardware TLS offload and zero-copy forgot to actually measure the before/after memory bandwidth14:57
jrosseryou need a new-ish kernel and a 3-week-old (as of today) OpenSSL for all of that to actually work14:59
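A quick sanity check before benchmarking kTLS offload, using standard tools; the /proc path is only present on kernels built with TLS support:

    uname -r                                      # want >= 5.19 per the discussion above
    openssl version                               # needs a very recent OpenSSL 3.x build
    sudo modprobe tls && cat /proc/net/tls_stat   # kTLS usage counters, if available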
NeilHanlonouch.15:00
NeilHanlon#startmeeting openstack_ansible_meeting15:01
opendevmeetMeeting started Tue Dec 19 15:01:05 2023 UTC and is due to finish in 60 minutes.  The chair is NeilHanlon. Information about MeetBot at http://wiki.debian.org/MeetBot.15:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:01
opendevmeetThe meeting name has been set to 'openstack_ansible_meeting'15:01
NeilHanlon#topic rollcall15:01
NeilHanlong'morning again, folks :) 15:01
jrossero/ hello15:05
NeilHanlon#topic bugs15:06
NeilHanlonnothing new, i don't think. We still have https://bugs.launchpad.net/openstack-ansible/+bug/2046172 which I need to look into more15:06
NeilHanlonthere is also https://bugs.launchpad.net/openstack-ansible/+bug/2046223 -- which we were able to fix last week15:07
NeilHanlonalso of course, we released 2023.2, so really great work everyone on that. again, my apologies for the rocky gates; will keep a better handle on this going forward.15:08
damiandabrowskihi!15:08
NeilHanlonhi damiandabrowski :) 15:08
NeilHanlonanyone have anything for bugs?15:08
jrosserso we need to merge https://review.opendev.org/c/openstack/openstack-ansible/+/903545 next for the haproxy fix15:08
NeilHanlonah yes, was just looking into if we'd backported it yet15:09
NeilHanlon#topic office hours15:11
damiandabrowskiIn a week or two I'll try to implement OVN on our multi-AZ dev environment15:14
damiandabrowskiso I'll have a chance to test https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/858271 :D 15:14
jrosserwe have been doing a ton of work on magnum-cluster-api15:15
jrosserunfortunately it being an out-of-tree driver has made it very difficult for me to get any of this merged15:16
NeilHanlonthat's "fun" 15:17
jrosserbut it is pretty much ready to be tested https://review.opendev.org/c/openstack/openstack-ansible/+/89324015:17
NeilHanlonnice work, i'll take a look at the change set15:18
jrosserthere was a long discussion here yesterday with spatel to help with the same thing in a kolla deploy15:18
jrosserand lots on the ML in general too15:18
NeilHanlonyeah i had seen some of that fly by15:19
jrosserso, imho, this could do with a higher profile15:19
NeilHanlonyes, there definitely does seem to be a lot of interest in it15:19
NeilHanlonthanks everyone for coming - a reminder we have cancelled the next couple of meetings. The next meeting will be Tuesday, January 9th.15:38
NeilHanlonWishing you all the best in this festive season :) 15:38
NeilHanlon#endmeeting15:38
opendevmeetMeeting ended Tue Dec 19 15:38:48 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:38
opendevmeetMinutes:        https://meetings.opendev.org/meetings/openstack_ansible_meeting/2023/openstack_ansible_meeting.2023-12-19-15.01.html15:38
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2023/openstack_ansible_meeting.2023-12-19-15.01.txt15:38
opendevmeetLog:            https://meetings.opendev.org/meetings/openstack_ansible_meeting/2023/openstack_ansible_meeting.2023-12-19-15.01.log.html15:38
opendevreviewMerged openstack/openstack-ansible stable/2023.2: Fix http-check ordering for services  https://review.opendev.org/c/openstack/openstack-ansible/+/90354517:33
jrosserspatel: did you make your magnum stuff work in the end?17:46
spatelI am still dealing with some public network issue.. 17:46
spatelI will be sure to let you know as soon as it works 17:47
spateljrosser Hey! how do I tell the CAPI control plane to use the publicURL endpoint? 19:31
spatelwho passes the info to CAPI about which endpoint to use? 19:31
jrossercomplicated19:31
jrosserbecause there are several things that talk to endpoints19:32
spatelI have noticed that CAPI is using the private endpoint to talk to openstack 19:32
jrosserwhat though?19:32
jrossermagnum service?19:33
spatelMy CAPI k8s is running on a public node, and openstack has both public and private IPs. 19:33
spatelIn the magnum logs, all I am seeing is private endpoint info 19:33
jrosseradjust the clients in magnum.conf19:34
spatelthat means magnum is sending private endpoint data to CAPI, and CAPI is trying to talk to the private endpoint and failing (because it's located on the public network)19:34
jrosseryou are not being very specific unfortunately19:35
jrosserthere is stuff done by magnum service to the openstack endpoint19:35
jrosseralso by the controlplane k8s cluster19:35
jrosserand also by the workload cluster19:36
spatelthis whole workflow is so confusing about public and private 19:36
jrosserwell, perhaps19:36
spatelthe doc just says to keep the control plane on the same LAN where the openstack control plane is running19:37
spatelBut in my case the CAPI control plane is running far away, in a different network, on a machine with a public IP. 19:37
spatelI have put nginx (with a public IP) in front of my private openstack to expose it publicly, and updated the public endpoint to the nginx public IP19:38
spatelNow I want to tell magnum to always use publicURL for all requests 19:39
jrosserlike I say you configure the things that magnum is going to interact with in magnum.conf19:40
jrosserand sounds like you need a mix?19:40
spatel[magnum_client] is this the section I should set?19:41
jrossermagnum container can use internal endpoint19:41
spatelhttps://docs.openstack.org/magnum/latest/configuration/sample-config.html19:41
jrosserbut you need to tell magnum-cluster-api to use the public endpoint19:41
jrosser[capi_client] config section19:41
spatelin magnum.conf just add section  [capi_client] and put endpoint_type = publicURL right?19:43
spatellet me add and restart services and see 19:44
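A hedged sketch of the magnum.conf pieces being discussed; the option names follow the chat and the sample configs, but the file path and the [cinder_client] example are assumptions (and note that later in the log the real culprit turns out to be the service catalog entry, not this setting):

    # /etc/magnum/magnum.conf
    [capi_client]
    # interface the magnum-cluster-api driver uses for OpenStack API calls
    endpoint_type = publicURL

    [cinder_client]
    # the magnum service itself can keep talking to the internal endpoint
    endpoint_type = internalURL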
NeilHanlonugh... we have some kind of cycle i think with the rocky upgrade job in the gate for https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/903544, jrosser19:45
NeilHanlon2023.1 tries to install erlang-25.2.3, but uses the el/8/ repo, which apparently doesn't have this version19:46
spateljrosser no luck.. it's still sending the internal URL to CAPI 19:47
jrosserspatel: honestly, I spent several weeks making this all work19:47
spatel[capi_client]19:48
spatelendpoint_type = publicURL19:48
spatelThis is what I added in magnum.conf 19:48
hamburglerHas anyone had any luck integrating horizon - swift dash with ceph backend so that swift storage policies are mapped 1:1 with ceph placement-targets? Thinking this is probably a no :p 19:48
jrosserspatel: check the magnum-cluster-api docs for the config options19:49
spatelhttps://github.com/vexxhost/magnum-cluster-api/blob/main/docs/user/configs.md19:50
spatelwhere should I add this flag? does magnum-cluster-api have a different config file? 19:51
jrosserhamburgler: I see the two choices that I expect in the storage policy drop-down in horizon19:52
spatelI am reading your patch here - https://review.opendev.org/c/openstack/openstack-ansible/+/893240/31/tests/roles/bootstrap-host/templates/user_variables_k8s.yml.j219:52
hamburglerjrosser: is your backend ceph for object storage? or swift?19:53
spatelI have added it in the correct place in the magnum.conf file 19:53
jrosserspatel: magnum-cluster-api is a driver for magnum, it uses settings from magnum.conf19:53
spatel+119:53
jrosserhamburgler: it’s ceph/radosgw19:53
jrosserno actual swift19:53
hamburglerso no swift installation at all, just the swift dash enabled in local_settings.py for horizon? this is what I have done; everything works fine except I cannot create buckets via horizon, yet oddly I can fully interact with them after the fact if I created them via the cli19:55
hamburglercontainers* not buckets19:55
jrosserwell, we have a pretty complicated radosgw setup19:57
spateljrosser as per the doc, the default endpoint is publicURL 19:57
spatelThis is crazy :(19:57
spatellet me see what I can do 19:57
jrosserbut, there is one specifically setup to be Swift API for horizon19:57
hamburglerah k :) you mean a dedicated cluster for horizon? 20:02
jrossera dedicated radosgw yes, but that's just because we have a slightly odd setup20:03
jrosserI can take a look at the config tomorrow if that is helpful20:04
jrosseralso ask my colleague who did this if we had any trouble20:04
hamburglerah k that makes sense :), i guess there would be issues if too many different api types were enabled on that set of gateways20:04
jrosserit was something like that yes20:05
jrossereventually it was simpler to just run different unique instances for swift / s3 and “internal” for horizon20:05
hamburglerthat'd be great if you wouldn't mind - running reef over here 20:06
hamburgleryeah totally makes sense20:06
jrosseraha we didn't dare start to try reef until the .1 release that was just made20:07
hamburglerhaha :D yeah saw the notice yesterday20:07
hamburgleri've had no issues running it thankfully20:07
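For reference, a hedged ceph.conf sketch of a radosgw instance dedicated to the Swift API with Keystone auth, roughly along the lines described above; the section name, credentials and URL are placeholders:

    [client.rgw.swift-for-horizon]
    rgw_enable_apis = swift
    rgw_swift_account_in_url = true
    rgw_keystone_url = https://keystone.example.com:5000
    rgw_keystone_api_version = 3
    rgw_keystone_admin_user = swift
    rgw_keystone_admin_password = secret
    rgw_keystone_admin_project = service
    rgw_keystone_admin_domain = default
    rgw_keystone_accepted_roles = member,admin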
spateljrosser one more question: does the workload cluster's k8s need to talk to the CAPI control plane k8s? 20:18
jrosserspatel: no it’s the other way round20:27
jrossersee my diagram20:27
spateloh so the CAPI k8s will make connections to the workload master/worker nodes? 20:27
jrosseryes20:28
spatelnow I am having an issue with my instance not getting a floating IP attached :( 20:34
spatelin the logs it's saying - CAPI OpenstackCluster status reason: Successfulcreatefloatingip: Created floating IP 104.xxx.xxx.204 with id 2422f704-78e4-45ff-9f3d-bccb713d59d3)20:34
spatelbut in nova list it's not showing up.. 20:34
jrosseropenstack server list?20:36
jrosserspatel: the floating ip is for the loadbalancer20:39
spatelmy external network is public in the template, so it always uses a public floating IP 20:40
jrosserthat’s fine20:40
jrosseryou said trouble "with the instance getting a floating ip"20:40
spateleven in my old deployment all my master and worker nodes have public IPs (floating)20:40
jrosserbut the floating ip is for octavia20:40
spatelYes octavia should have floating IP 20:41
spatelbut in my old environment all the k8s nodes had public floating IPs20:42
jrosserwell that was the heat driver?20:42
spatelyes it was heat 20:42
spatelI hate heat, so I am trying to make CAPI work 20:42
jrosserspatel: I’m not sure this puts floating ip on the instances21:02
spateljrosser here it is, I just fixed something and it's showing the floating IP now - https://paste.opendev.org/show/bxtZb82TkS5Clk1mtLP7/21:07
jrosserand the loadbalancer?21:11
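A short way to confirm the floating IP landed on the Octavia load balancer VIP rather than on an instance (the loadbalancer command needs the python-octaviaclient plugin installed):

    openstack loadbalancer list
    openstack floating ip list    # the FIP should map to the LB VIP port, not a server port
    openstack server list         # the masters/workers may only show fixed IPs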
spateljrosser I am inside the master k8s node and seeing this error - https://paste.opendev.org/show/bRkZftSA4PioHzQ026VK/21:15
spatelmy cluster is still in the creating state 21:15
spatelwhere is the kube config located inside the master node?21:18
jrosserthen look at usual k8s stuff to debug21:19
jrosserlike pod status, kubelet log21:19
jrosserall the config came in through cloud-init21:19
spatelI am not able to find ~/.kube/conf file21:19
jrosserlook at the cloud-init user data for that21:19
jrosserit’s in /etc/kubernetes ?21:20
jrosseralso there I think is clouds.yaml for openstack api setup21:20
spatelhttps://paste.opendev.org/show/bIpAzjocZpB4LOYzktk4/21:23
spatelhow do I give the path of the config in the kubectl command :D21:24
jrosserright, so admin.conf?21:24
spatelI am little noob 21:24
jrosser--kubeconfig21:24
jrosserhah I am also total noob at this21:24
spatelcp admin.conf ~/.kube/config21:25
jrosseronce the cluster deploys I have absolutely no clue what you would do next21:25
spatelit works - https://paste.opendev.org/show/besdCfQThMMIUafiKfZk/21:25
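For reference, a minimal sketch of the two ways to point kubectl at the admin kubeconfig on a kubeadm-style node; /etc/kubernetes/admin.conf is the default path mentioned above:

    kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes
    # or copy it once and rely on the default location
    mkdir -p ~/.kube && cp /etc/kubernetes/admin.conf ~/.kube/config
    kubectl get pods -A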
spatelI am also learning :) 21:25
spatelThis is fun now.. because I have no clue what to look at here to find my cinder issue 21:28
spatelhttps://paste.opendev.org/show/bOxDdbp5i65y6RxM452b/21:28
opendevreviewDoug Goldstein proposed openstack/openstack-ansible master: abstract bootstrap host disk partition names  https://review.opendev.org/c/openstack/openstack-ansible/+/90110621:29
jrossercheck the kubelet journal21:29
spatelhttps://paste.opendev.org/show/b78hlrkve1WHgS68zfvg/21:30
spatel"Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cinder-csi-plugin\" with CrashLoopBackOff:21:31
spatelthat is all I can see 21:31
spatellet me ask this on the mailing list and see 21:37
jrosserget the pod logs for cinder csi if you can21:40
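A sketch of the debugging steps suggested above; the container name comes from the pasted error, while the pod name is a placeholder:

    journalctl -u kubelet --since "15 min ago"
    kubectl -n kube-system get pods | grep -i cinder
    kubectl -n kube-system describe pod <csi-cinder-pod>
    kubectl -n kube-system logs <csi-cinder-pod> -c cinder-csi-plugin --previous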
jrosserspatel: does the cloud.conf look sensible? (don’t paste the app creds!)21:42
spatelThis is my playground cluster so not worried about it :) 21:42
spatelbut good heads-up 21:42
spatelI am thinking of building a different version of the cluster and seeing whether I get a similar error or not. 21:43
spatelor I can remove the cinder dependency from the template and try (at least it will help me understand the real issue)21:43
spateljrosser here you go - https://paste.opendev.org/show/bCpPzeyLS1ezMuYx9PO5/21:45
jrosserdid you check the cloud.conf?21:45
jrosserreason i ask is that both cinder csi and openstack-cloud-controller-manager are failing21:46
jrosserthose two things are related to your openstack21:46
jrosserfor example, you say it's a playground cluster, so what is the situation with SSL on the public endpoint?21:47
spatelhere is my cloud.conf file - https://paste.opendev.org/show/bY2aj3JbksdC6QaNPv5r/21:47
spatelThis file is pointing to the private IP, and from the instance I can't curl that URL 21:48
jrosserit is the public endpoint though?21:48
spatelThat could be my issue.. totally21:48
spatelno, it's the internal one, os2-int 21:48
jrosseroh 'int' -> internal21:48
spatelyes 21:48
jrosserahha so this is no good21:49
spatelthat is where my head is exploding, because I am not able to tell magnum to use the public endpoint :(21:49
spatelis there any way I can tell it to use the public endpoint, in the template or somewhere?21:50
jrosserthat is read from your service catalog21:53
jrosserhttps://github.com/vexxhost/magnum-cluster-api/blob/main/magnum_cluster_api/utils.py#L11621:53
spatelHmm my service catalog has the private IP for the public endpoint21:54
spatelas I said earlier, I have nginx outside openstack to expose it publicly 21:54
jrosserservice catalog should have the nginx ip then for the public endpoint21:55
spatelwhen I changed that IP to the public one, my vm stopped receiving a floating IP :(21:57
spatelLet me change it and debug what is going on21:57
spatelhold on..21:57
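A hedged sketch of inspecting and repointing the public identity endpoint at the external nginx, per the catalog discussion above; the URL is a placeholder, and the same change would be repeated for the other services:

    openstack catalog list
    openstack endpoint list --service identity --interface public
    openstack endpoint set --url https://public.example.com:5000 <endpoint-id>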
spateljrosser I think I found the issue.. my nginx is not parsing the headers properly.. let me fix it 22:09
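For context, a hypothetical nginx reverse-proxy block showing the header forwarding that typically matters in this kind of setup; server names, ports and certificate paths are placeholders:

    server {
        listen 443 ssl;
        server_name os-public.example.com;
        ssl_certificate     /etc/nginx/ssl/public.crt;
        ssl_certificate_key /etc/nginx/ssl/public.key;

        location / {
            proxy_pass https://os2-int.example.com;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }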
opendevreviewGaudenz Steinlin proposed openstack/openstack-ansible-plugins master: Ensure consistent ordering of network_mappings  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/90404022:11
spateljrosser progress.. now I can see the vm building process 22:23
spatelfingers crossed 22:23
spateljrosser epic win :)   CREATE_COMPLETE22:26
jrosser\o/22:27
jrossernice22:27
spateljrosser Thank you so much for the help :) 22:31
jrosserthats ok :)22:31
jrosserthing is, for this your deployment has to be in really good shape and everything correct22:32
spatelcredit should go to you22:32
spateltime to create a blog post before my brain shuts down 22:32
jrosseryeah write it down quick22:32
jrosseri was doing docs for OSA today for this same stuff22:32
spatelYep! 22:32
spatelfor production, do I need a multi-node CAPI control plane?22:33
jrosserthough my patches are a bit different and deploy everything including control plane22:33
jrosserwell maybe, i believe that it is stateful and important stuff is in the etcd22:33
spatelare you using kind cluster?22:33
jrosserso if you lose that then you are hosed22:33
spatelouch!!! so one more critical thing to handle :O22:34
jrosserno, i am using this https://github.com/vexxhost/ansible-collection-kubernetes22:34
spatelhmm! 22:34
jrosseryes i think that it is important like mariadb to have some plan for disaster / upgrades etc22:34
jrosserand so far that is something that i have not yet really put much time into22:35
jrosserother than with that collection i get a completely specified k8s environment, all versions controlled22:35
spatelThis is crazy then.. one more beast to maintain which I am not good at 22:35
jrosserbtw you do not have to make the control plane k8s public22:36
jrosseryou can do it public if you want22:36
spatelagreed, I have moved my k8s CAPI to private network and it works 22:36
jrosserbut you can also keep it completely on the mgmt network22:36
jrosseralso the public floating IP is optional if you want to reduce usage of public IP22:37
spatelhmm! 22:37
jrosserthere is an optional service you can run22:38
spatelokie okie! 22:39
jrosserhttps://github.com/vexxhost/magnum-cluster-api/tree/main/magnum_cluster_api/proxy22:39
jrossersadly there is zero docs for this so far22:39
jrosserbut my OSA patches provide an example22:40
jrosseryou run it on your network node and it bridges between the management network and the tenant network22:40
jrosser"bridges" -> provides an http endpoint between them with haproxy, automatically22:40
jrosserthat might be important, depending on what your production environment network looks like22:41
jrosserand how tight your supply of floating IP is22:41
spatelI think it requires proper planning for the placement of the capi node22:42
jrosseryes, though now you are in a good position to think about those things22:42
spatellook at me.. how i was struggling between public/internal endpoints :(22:43
jrosseri only know all this because we had to get some fixes done for the same stuff22:43
jrosserand we also contribute some code to it now22:43
jrosserbecause we have a very strict split between internal/public22:43
spatelin OSA, why don't we just run CAPI on the infra nodes and expose it via HAProxy? 22:44
jrosserso if there are errors in the code / selection of endpoints it's just all broken22:44
jrosserthats exactly what happens; on the internal endpoint, 6443 is the capi k8s22:44
spatelIn that case we don't need to worry about capi proxy etc.. (let the end user decide how they want to place it outside OSA)22:45
jrosserremember capi k8s needs to contact the k8s api endpoint in your workload cluster22:45
jrosseryou just did that with your floating ip22:45
spatelIn my case only the master node has a floating IP, so capi can talk to it 22:46
jrosserso the proxy allows that to happen when there is not a floating IP on the octavia lb22:46
jrosserno i keep saying :)22:46
jrosserthis deploys a LB with octavia22:46
spatelNot worker node22:46
spatelBut we need a floating IP on the workload master node, otherwise how will the customer access it?22:47
spatelVia octavia ?22:48
jrosserthats a different thing22:48
jrosseri expect the idea is that you already have other infrastructure, a bastion or something else in your tenant22:48
jrosserand you access via that and the neutron router22:48
spatelmy customers are public users, so I have to place it on a public IP to let them access it 22:48
jrossersure22:49
jrosserthey may have a bastion with a floating ip22:49
spatelHmm! (maybe a private cloud)22:49
jrosseror some other vm22:49
spatelYep! that should work 22:49
jrosseralso pay attention to the cluster template22:49
jrosserthe only way a ssh key gets into the instances is via the cluster template22:49
spatelNo.. I am injecting the key when I create the cluster 22:50
spatelopenstack coe cluster create --cluster-template k8s-v1.27.4 --master-count 1 --node-count 2 --keypair spatel-ssh-key mycluster1 22:50
jrosserok that probably overrides the template, not sure22:51
spatelThis works for me 22:51
jrosseranyway, i expect the idea is that you obtain the .kube/config using the openstack cli22:51
jrosserand there should be not much need to actually get into the nodes22:51
spatelI have used - openstack coe cluster config mycluster122:52
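A minimal end-to-end sketch of the workflow being described, from cluster create to remote kubectl access; the template, keypair and directory names are taken from or modeled on the commands above:

    openstack coe cluster create mycluster1 --cluster-template k8s-v1.27.4 \
        --master-count 1 --node-count 2 --keypair spatel-ssh-key
    openstack coe cluster config mycluster1 --dir ~/mycluster1
    export KUBECONFIG=~/mycluster1/config
    kubectl get nodes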
jrosseralso this supports auto-healing and autoscaling, so it makes not much sense to do manual config22:52
jrosseri think if you delete a worker vm it will spawn another to replace it22:52
spatelHmm! interesting point :)22:53
jrosserso workflow here for your users is important22:53
jrosserif they think they need to login to the instances to do things by hand, thats probably wrong22:53
spatelWe let customers do whatever they want. If they want to SSH to the master node then sure, they can do it 22:54
jrosserdeploy the apps and whatever remotely, using the api and credentials obtained from the openstack cli22:54
jrossersure they can do that22:54
jrosserbut it should be no surprise if their vm get recycled, or replaced22:54
spatelYes +1 22:55
jrossersomething else i have not looked at is how upgrades are supposed to work22:55
jrosserwould not surprise me if the nodes are rolling-replaced with a newer k8s image from glance22:55
spatelI am damn new to k8s.. I have no idea how stuff works inside k8s 22:57
spateltime to learn.. 22:57
jrosseryeah actually `openstack coe cluster upgrade <cluster ID> <new cluster template ID>`22:57
jrosserthe new cluster template would specify a different image in glance22:57
jrosserso i expect that the nodes would be replaced to conform to the new cluster template22:58
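A hedged sketch of that rolling-upgrade flow; the new template and image names are placeholders:

    openstack coe cluster template create k8s-v1.28.4 --coe kubernetes \
        --image ubuntu-2204-kube-v1.28.4 --external-network public \
        --flavor m1.large --master-flavor m1.large
    openstack coe cluster upgrade mycluster1 k8s-v1.28.4
    openstack coe cluster show mycluster1 -c status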
jrosserok it is late here22:58
spatelsame here! 23:02
spatelI will catch you next time :)23:02
spatelgoodnight folks! and merry Christmas if I don't see you here23:03
