Monday, 2023-12-18

08:50 <opendevreview> Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858
11:32 <opendevreview> Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858
13:29 <opendevreview> Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858
14:00 <deflated> Hi all, me again. I managed to fix the Ceph repo issue, kind of: it's still generating another Ceph repo in each container that I have to delete, but once I do, it works. Moving on, I have noticed my external VIP is binding to br-mgmt, and I'm sure this isn't right. How do I set this to my external/public network? No matter what I set haproxy_keepalived_external_interface/haproxy_bind_external_lb_vip_interface to, it either won't attach to anything or still attaches to br-mgmt; if I leave it blank it attaches to br-mgmt
14:00 <deflated> if this is intended I'll move on, if not, help is appreciated
14:11 <jrosser> you can make the external VIP be whatever you need
14:11 <deflated> I've tried and it doesn't seem to honour it in user_variables
14:12 <deflated> on 28.0.0, btw
14:12 <jrosser> can you share what you set?
14:13 <deflated> haproxy_keepalived_external_vip_cidr originally, then when it bound to br-mgmt I tried setting haproxy_keepalived_external_interface/haproxy_bind_external_lb_vip_interface to the wanted interface
14:14 <deflated> for the CIDR I used the wanted IP/subnet of course
14:14 <deflated> I can ping and access the interface/network I am trying to attach to
14:14 <jrosser> here is some of my config https://paste.opendev.org/show/bRAsO7OBq3daaIRlg3xm/
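[For reference, a minimal sketch of the user_variables.yml settings under discussion; the interface names and addresses below are placeholders, not the values from jrosser's paste:

    haproxy_keepalived_external_vip_cidr: "203.0.113.10/32"   # placeholder external VIP
    haproxy_keepalived_internal_vip_cidr: "172.29.236.9/32"   # placeholder internal VIP
    haproxy_keepalived_external_interface: br-ext             # placeholder public-facing bridge
    haproxy_keepalived_internal_interface: br-mgmt
    haproxy_bind_external_lb_vip_interface: br-ext            # placeholder public-facing bridge
    haproxy_bind_internal_lb_vip_interface: br-mgmt
]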
14:15 <deflated> yeah, those are the same as the ones I set (different values of course)
14:15 <jrosser> and which playbook are you running?
14:16 <deflated> hosts/infra.yml with --limit 'haproxy_all' to test my changes
14:17 <deflated> pretty sure I only need to run infra, but I tried hosts for my own sanity tbh
14:18 <jrosser> you can run `openstack-ansible playbooks/haproxy-install.yml`
14:18 <deflated> just noticed your interfaces don't have quotes, does that matter?
14:18 <deflated> ah ok, will do that from now on
14:18 <jrosser> they are just yaml strings, so it should be fine
14:22 <jrosser> deflated: then the corresponding part of my openstack_user_config.yml is https://paste.opendev.org/show/bPWqqGHBgvWB1JGH8QVe/
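[Similarly, a hedged sketch of the kind of openstack_user_config.yml section being referenced here; host names and addresses are placeholders:

    global_overrides:
      internal_lb_vip_address: 172.29.236.9
      external_lb_vip_address: 203.0.113.10

    haproxy_hosts:
      infra1:
        ip: 172.29.236.11
      infra2:
        ip: 172.29.236.12
      infra3:
        ip: 172.29.236.13
]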
14:23 <jrosser> deflated: do you have more than one infra node?
14:23 <deflated> yep, also have those set to match
14:23 <jrosser> actually I mean, are you running more than one haproxy instance?
14:23 <deflated> yeah, I have 3, all identical
14:24 <jrosser> so you should then be able to check the keepalived config and haproxy config on those nodes
14:25 <deflated> I checked /etc/haproxy/haproxy.cfg and it states the right bridge in the text, but it's not actually attaching
14:25 <jrosser> well, it wouldn't
14:25 <jrosser> because in an HA deployment, keepalived is responsible for the VIP
14:28 <jrosser> deflated: can I just double check that you are not getting mixed up between haproxy_bind_internal_lb_vip_interface and haproxy_keepalived_internal_interface?
14:28 <deflated> checking keepalived also shows me the correct virtual_ipaddress and bridge
14:28 <deflated> no, I don't have lb_vip set in variables
14:29 <jrosser> can you please explain more: `I checked /etc/haproxy/haproxy.cfg and it states the right bridge in the text but it's not actually attaching`
15:11 <spatel> jrosser: morning
15:11 <jrosser> o/ hello there
15:12 <spatel> I am playing with magnum-cluster-api and seeing this error in magnum: https://paste.opendev.org/show/btsoaa2SjauhVIkWq3uA/
15:12 <jrosser> I see you all over the ML and Slack and IRC :)
15:12 <spatel> :D
15:12 <spatel> I am desperate to make it work because a customer is looking for an alternative solution
15:12 <jrosser> you can only use calico
15:13 <spatel> I am frustrated because there is not enough documentation for this stuff.. :(
15:13 <spatel> I am using calico in my template
15:14 <jrosser> oh no, actually that is a magnum.conf problem
15:14 <jrosser> this is all in my patches for OSA
15:15 <jrosser> magnum.conf must say that *only* calico is allowed
15:16 <spatel> Do you know the config option which I can put in manually?
15:17 <spatel> let me add allowed_network_drivers=calico in magnum.conf
15:17 <jrosser> https://review.opendev.org/c/openstack/openstack-ansible/+/893240/31/tests/roles/bootstrap-host/templates/user_variables_k8s.yml.j2
15:18 <jrosser> don't just copy/paste the whole lot, it needs understanding
15:19 <spatel> I am using kolla-ansible :( but I can compile the info required for it
15:19 <jrosser> imho there should be proper documentation with the deployment tools
15:20 <jrosser> otherwise it is a total nightmare
15:20 <jrosser> but you know enough about how openstack-ansible overrides work to be able to translate magnum_magnum_conf_overrides in OSA into something the same in kolla?
15:20 <spatel> I do have a template with the calico driver: https://paste.opendev.org/show/b79POHXj4tWB8S1Aubdz/
15:21 <jrosser> yes, but like I say, the magnum-cluster-api driver is validating that magnum.conf allows *only* calico
15:21 <jrosser> not in your cluster template
15:23 <spatel> ok.. let me add it in the magnum.conf file
15:23 <spatel> is this the correct flag: allowed_network_drivers=calico
15:23 <jrosser> do you have barbican?
15:23 <spatel> no
15:24 <jrosser> you can see in my patch that I set kubernetes_allowed_network_drivers and kubernetes_default_network_driver in the [cluster_template] config section
15:25 <spatel> ok.. let me try and I will get back to you
15:25 <jrosser> and if you do not have barbican then you also need cert_manager_type: x509keypair in [certificates], if it is not already like that
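[Putting the two requirements together, a hedged sketch of the equivalent magnum_magnum_conf_overrides in openstack-ansible user_variables.yml; in kolla-ansible the same options would instead go into its magnum.conf override file. Option names follow the discussion and jrosser's patch, so verify them against your release:

    magnum_magnum_conf_overrides:
      cluster_template:
        kubernetes_allowed_network_drivers: calico
        kubernetes_default_network_driver: calico
      certificates:
        cert_manager_type: x509keypair   # only needed when barbican is not deployed
]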
15:25 <spatel> I don't have barbican
15:26 <spatel> cert_manager_type: x509keypair in [certificates], this will be in magnum.conf?
15:26 <jrosser> slow down :)
15:26 <jrosser> look at my patch
15:27 <spatel> ok.. :)
15:37 <spatel> Give me a few min.. stuck in a meeting..
15:38 <spatel> jrosser: are you running a kind cluster in OSA?
15:42 <jrosser> no, I have used the vexxhost.kubernetes ansible collection to deploy the control plane cluster
15:47 <spatel> jrosser: I am getting this error now: https://paste.opendev.org/show/bv9F7fK4xLGT4dCsTBif/
15:47 <jrosser> you have to debug
15:48 <spatel> I have enabled debug but there are no interesting logs there.. let me show you
15:48 <spatel> I am using this to deploy the control plane: https://github.com/vexxhost/magnum-cluster-api/blob/main/hack/stack.sh#L128C1-L140C45
15:48 <jrosser> I am going to guess that this is because your magnum container does not trust the certificate on the k8s endpoint
15:49 <spatel> the kubectl command works from the magnum container
15:57 <spatel> jrosser: how does magnum know that it has to talk to the CAPI node?
15:59 <jrosser> the credentials and CA and endpoint are all in the .kube/config
16:00 <jrosser> so if you have deleted/recreated/otherwise changed your control plane cluster but not copied the updated .kube/config to your magnum container, you could have difficulty
16:00 <jrosser> which would certainly lead to SSL errors, as the CA will be different
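[To illustrate where those pieces live, a minimal sketch of the .kube/config the magnum service would use; every value here is a placeholder:

    apiVersion: v1
    kind: Config
    clusters:
    - name: control-plane
      cluster:
        server: https://10.0.0.10:6443              # control plane k8s API endpoint
        certificate-authority-data: <base64 CA>     # changes if the cluster is recreated
    users:
    - name: admin
      user:
        client-certificate-data: <base64 cert>
        client-key-data: <base64 key>
    contexts:
    - name: default
      context:
        cluster: control-plane
        user: admin
    current-context: default
]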
16:00 <spatel> jrosser: check this out - https://paste.opendev.org/
16:01 <spatel> I do copy .kube/config when I rebuild my kind cluster
16:02 <jrosser> and do you restart magnum conductor? (I don't know if this is needed, not sure about the lifecycle of the config)
16:02 <jrosser> btw the paste link is incomplete
16:02 <spatel> I am always restarting all containers
16:03 <jrosser> and have you looked at the log for magnum conductor?
16:04 <spatel> jrosser: https://pastebin.com/gVkvDmVd
16:06 <jrosser> I mean specifically for the SSL errors you see in the cluster status
16:09 <spatel> jrosser: let me verify SSL again
16:18 <jrosser> spatel: mgariepy: here's how my magnum diagram is so far https://pasteboard.co/XtSEagQfxwgv.png
16:18 <spatel> jrosser: I got a new error this time: https://paste.opendev.org/show/bIcRHJDJJVlQiTr82bfO/
16:19 <spatel> jrosser: +++1 for the diagram :)
16:20 <jrosser> spatel: I have no idea on your error
16:21 <spatel> jrosser: did you use this code to deploy the capi control plane? https://github.com/vexxhost/magnum-cluster-api/blob/main/hack/stack.sh#L128C1-L140C45
16:21 <jrosser> the diagram is the "full fat / max complexity" deployment; lots is optional and probably not required
16:22 <jrosser> spatel: no I did not
16:22 <spatel> can you point me at what you used to deploy capi?
16:22 <jrosser> what version did you install?
16:26 <jrosser> spatel: I used this https://review.opendev.org/c/openstack/openstack-ansible/+/893240
16:39 <spatel> jrosser: looks like progress, I am seeing CREATE_IN_PROGRESS
16:39 <spatel> fingers crossed
16:42 <spatel> What is the command to check progress? In heat we can see resources, but what is the command in CAPI?
16:44 <jrosser> spatel: hah, that is a great question indeed
16:45 <jrosser> to start with I think you can see some of the progress in magnum conductor
16:46 <jrosser> you can try something like `kubectl -n capo-system logs deploy/capo-controller-manager`
16:48 <jrosser> spatel: do you have octavia deployed?
16:58 <deflated> jrosser: so sorry I didn't get back to you earlier, my son had an accident at school. I seem to have found the problem: the bridge I was using was set to manual with no IP, and setting it to static with an IP has caused the VIP to be created as a secondary address. I figured this out by trying another network and then analysing the differences, which came down to the IP. I've checked and I can't see anything in the docs that states the bridge for the VIP requires an IP
16:58 <jrosser> well, "it depends"
16:59 <jrosser> if it was your external interface for neutron routers / floating IPs then there would be no need for an IP on the bridge
17:00 <jrosser> and ultimately it pretty much depends how you want it to work
17:00 <jrosser> if you were using real internet IPv4 for this then it might be quite reasonable to not want to "waste" a public IPv4 address on each node, as well as the VIP
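[For reference, a hedged netplan-style sketch of the setup deflated describes: a static per-node address on the bridge, with keepalived adding the VIP as a secondary address. Bridge and bond names are hypothetical and your host network tooling may differ:

    network:
      version: 2
      bridges:
        br-ext:
          interfaces: [bond0]
          addresses: [203.0.113.11/24]   # per-node static address; keepalived adds the VIP on top
]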
17:01 <jrosser> the thing with openstack-ansible is that almost anything is possible, it's like a toolkit really
17:03 <deflated> I'm just happy I figured it out; it's a big learning curve. I can imagine I'm going to run into more caveats when this goes from testing to production
17:03 <jrosser> oh sure, I totally understand about the learning curve
17:03 <deflated> having my settings confirmed helped me dig deeper, so thanks for that
17:03 <jrosser> it's a very different thing to a shrink-wrap install where all the decisions are made for you
17:04 <jrosser> flip-side of that is, almost anything is possible
17:04 <deflated> I've been modding things my whole life, I much prefer to tinker and learn than be handed it on a platter
17:04 <jrosser> as an example, my API endpoints / horizon are on a different interface and subnet to the neutron networks
17:04 <jrosser> just because I choose it to be that way
17:05 <jrosser> fwiw most of the active people here in openstack-ansible IRC are operating clouds, and are contributing to the code
17:05 <deflated> currently running infrastructure, then on to openstack. I have run this before and had a ceph key error for gnocchi; if it reoccurs I'll post it up later (probably tomorrow, it's almost the end of my work day)
17:06 <jrosser> so there's quite a good perspective on what works, and what's necessary
17:06 <jrosser> ah ok, I don't run the telemetry stack so I don't have any hands-on experience with gnocchi
17:08 <deflated> I have spent a bit of time learning and following the tracker on opendev. I think I need to make an account to better understand the process, and then I think I'll submit an updated network setup for OVS as I may just have it working
17:09 <jrosser> cool - be sure to ask networking things of jamesdenton too
17:09 <jrosser> fwiw, OVS should 'just work' if you've followed how the all-in-one is set up
17:09 <jrosser> and also, new deployments probably should be using OVN
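[As a hedged hint only: in openstack-ansible, selecting the OVN backend is a user_variables.yml choice along these lines; check the OVN scenario documentation for your release rather than relying on this sketch:

    neutron_plugin_type: ml2.ovn
]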
17:10 <spatel> jrosser: yes, I do have octavia
17:11 <deflated> I actually found his blog a while back and it helped me to understand the transition from lb to ovs. I am using ovn; my bonds and bridges are however ovs
17:11 <jrosser> spatel: so you should be able to follow the creation of the loadbalancer, security groups, router, network, ... by cluster_api
17:14 <spatel> My cluster is stuck in CREATE_IN_PROGRESS
17:14 <jrosser> right - you need to find out what it is trying to do
17:14 <spatel> nova list - I can see only a single vm created - k8s-clusterapi-cluster-magnum-system-kube-5n49h
17:14 <jrosser> did you set up an ssh key with your cluster template?
17:15 <spatel> I think not.. that is my next step, to add an ssh key and re-create the cluster
17:15 <jrosser> yes, definitely do that for debugging
17:16 <jrosser> spatel: so another question - can your control plane k8s contact the API endpoint on your created workload cluster?
17:17 <jrosser> you either need "some networking" that makes that work / a floating IP to be created on the octavia LB / or use the magnum-cluster-api-proxy service
17:18 <spatel> the vm has a public floating IP so my k8s should be able to reach it
17:18 <spatel> I meant the k8s-clusterapi-cluster-magnum-system-kube-5n49h vm
17:18 <jrosser> no, I mean a floating IP on the loadbalancer
17:19 <spatel> I can't see any octavia instance yet
17:19 <jrosser> I think the default is that it's enabled, actually
17:20 <spatel> I can see only a single VM spun up with the name kube-5n49h-7jkxl-245s5
17:20 <spatel> Assuming this is the master node
17:25 <deflated> spatel, you can ssh into the vm as soon as it creates the node and run journalctl -f to watch for errors. I'm of course only just entering the convo, but I haven't seen what kube version you are using? Certain versions will fail no matter how hard you try
17:27 <spatel> deflated: these are all pre-built images so the version should work. I have a feeling that my openstack endpoint is not allowing access to the kube vms because they are not on public.
17:27 <spatel> I am debugging it to see what is going on
17:27 <deflated> ah ok, I assumed you were building from a coreos image
17:28 <jrosser> a public IP doesn't matter
17:28 <jrosser> the magnum vm should NAT out through the neutron router to your public endpoint
17:28 <jrosser> the floating IP is necessary for the control plane k8s cluster to see the workload cluster API
17:30 <jrosser> deflated: this is all new exciting stuff using cluster-api rather than the heat/coreos driver in magnum
17:31 <deflated> great, another subject to learn lol, guess more research is in order
17:49 <spatel> jrosser: my controllers are all running on private IPs, and if the kube vms are running on public IPs then they can't talk to the openstack endpoints.
17:49 <spatel> I am setting up one VM with nginx to expose all endpoints on a public IP, and then I will update the public entries in the keystone catalog to point to my nginx on the public IP
17:50 <spatel> I believe the k8s workload vms need to talk to the openstack endpoints, otherwise it won't work
17:56 <jrosser> yes, and a network and router are created for this
17:56 <jrosser> spatel: it's totally not needed to make an extra nginx
17:57 <jrosser> oh wait? you don't have a public endpoint?
18:08 <spatel> no
18:08 <spatel> not yet.. I am setting it up now with nginx
20:23 <spatel> any idea about this error in the nova console logs: handler exception: The token '***' is invalid or has expired
