Friday, 2024-10-18

[08:12] <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Freeze roles for 30.0.0.0b1 release  https://review.opendev.org/c/openstack/openstack-ansible/+/931611
[10:00] <opendevreview> Dmitriy Rabotyagov proposed openstack/ansible-role-python_venv_build master: DNM  https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/932639
[13:57] <opendevreview> Merged openstack/openstack-ansible master: Freeze roles for 30.0.0.0b1 release  https://review.opendev.org/c/openstack/openstack-ansible/+/931611
[14:01] <noonedeadpunk> folks, one more review is needed for https://review.opendev.org/c/openstack/openstack-ansible/+/932439 so I could push the tag ^
[14:01] <noonedeadpunk> as otherwise deployments with keepalived simply fail :(
[14:02] <andrewbonney> Done
[14:03] <noonedeadpunk> thanks!
[14:19] <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [reno] Add a release note for haproxy_limit_hosts key implementation  https://review.opendev.org/c/openstack/openstack-ansible/+/932713
[14:21] <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Reference keepalived scripts through integrated repo path  https://review.opendev.org/c/openstack/openstack-ansible/+/932439
[14:21] <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Unfreeze roles after milestone release  https://review.opendev.org/c/openstack/openstack-ansible/+/931612
[14:22] <noonedeadpunk> damn
[14:22] <noonedeadpunk> didn't want to rebase that :(
[14:49] <opendevreview> Merged openstack/openstack-ansible-rabbitmq_server master: Proceed with installation/upgrade even if cluster not healthy  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/931800
[15:30] <opendevreview> Merged openstack/openstack-ansible stable/2023.2: Update Neutron SHA after bugfix  https://review.opendev.org/c/openstack/openstack-ansible/+/932498
[15:38] <opendevreview> Merged openstack/openstack-ansible stable/2023.1: Update Neutron SHA after bugfix  https://review.opendev.org/c/openstack/openstack-ansible/+/932499
[15:59] <opendevreview> Merged openstack/openstack-ansible-plugins master: Simplify haproxy_service_configs defenition  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/931762
[16:55] <noonedeadpunk> jrosser: I think right now I'm all ears on how you solved TLS stuff with CAPI. As apparently supplying `openstack_ca_file: '/etc/ssl/certs/ca-certificates.crt'` is not a good option, as that will go as part of cloud-init user data on VM create request. Or you've had to increase
[16:55] <noonedeadpunk> body size for nova as well? https://docs.openstack.org/nova/latest/configuration/config.html#oslo_middleware.max_request_body_size
[16:55] <jrosser> well
[16:55] <jrosser> openstack_ca_file is indeed for the workload cluster to be able to trust the external endpoint
[16:56] <jrosser> so we do the exact equivalent of this https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/user_variables_z_magnum.yml#L27
[16:57] <jrosser> just that we use our local root here `openstack_ca_file: '/usr/local/share/ca-certificates/bbcrd-lt.crt.crt'`
[16:57] <jrosser> ^ that's the internal CA for our testlan
[16:57] <jrosser> testlab
[16:58] <jrosser> oh copy/paste error there but kywim
[16:59] <noonedeadpunk> oh, so you're using internal ca for public endpoint? I guess lucky you then...
[16:59] <jrosser> well, company CA
[16:59] <jrosser> which != internal vip CA
[16:59] <noonedeadpunk> though I think I will update that sample with more context...
[17:00] <noonedeadpunk> but this one - is also not only internal CA, right? https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/user_variables_z_magnum.yml#L29
[17:00] <noonedeadpunk> as it reaches keystone via public still I guess
[17:01] <noonedeadpunk> and that's the bug you pointed me to earlier
[17:01] <jrosser> https://paste.opendev.org/show/biXrLNe0t4I1zeXJ1uPP/
[17:01] <noonedeadpunk> so I'm just using a plain let's encrypt here. And I'm frankly not sure of a good idea, except to pass system trust
[17:02] <jrosser> well, then you should not need anything i think
[17:02] <noonedeadpunk> except ca_file: '/usr/local/share/ca-certificates/<ca-for-internal-vip>.crt' - doesn't work
[17:02] <noonedeadpunk> it tries to connect to keystone via public vip and fails or smth like that
[17:02] <jrosser> as the trust store in the ubuntu image will be good for the LE on the public endpoint
[17:02] <noonedeadpunk> so it never tries to create anything in openstack
[17:03] <noonedeadpunk> >  trust store in the ubuntu image will be good for the LE on the public endpoint - I don't think it's using system trust store at all
[17:03] <noonedeadpunk> so it's pulling in whatever is in the cloud-init from what I got
[17:03] <noonedeadpunk> or, you have to pass insecure
[17:03] <noonedeadpunk> https://github.com/vexxhost/magnum-cluster-api/blob/770fa907d909fcd93bac747b92e425229dbe9787/magnum_cluster_api/utils.py#L135-L136
[17:04] <noonedeadpunk> or well... maybe if I don't pass openstack_ca_file at all it will work?
[17:05] <jrosser> i think the intention is that you can omit it https://github.com/vexxhost/magnum-cluster-api/pull/208/commits/211bdcf0564447320d51a9db669fbc3d3f03baf7
[17:05] <noonedeadpunk> aha, let me try that....
[17:06] <jrosser> yeah i think that patch catches the case when it is not set, so does not try to pass gigantic whole system CA bundle
[17:07] <jrosser> i believe that openstack_ca_file is for the specific case of having a public endpoint with a CA that you know is not in the worker node system trust store
[17:07] <noonedeadpunk> frankly, from my perspective - Heat driver "just works" hehe
[17:08] <jrosser> it's interesting isn't it - my experience is completely the opposite
[17:08] <noonedeadpunk> oh yes, totally
[17:09] <jrosser> but i think i try to do lots of things that require everything to be exactly right
[17:09] <jrosser> like no route from controller to public endpoint
[17:09] <noonedeadpunk> and then I assume you maintain local patch that ensures nothing goes to public keystone
[17:10] <jrosser> indeed we do patch for that
[17:10] * noonedeadpunk was too lazy to do so in a sandbox
[17:11] <jrosser> it was really a lot of work with mcapi also to make it work with different internal and external CA
[17:11] <jrosser> lots of patches and fixing with mnaser to get it all happy
[17:12] <noonedeadpunk> I had to call out for devops who are more aware of k8s to help out to understand why only half of nodes are spawned and nothing is in logs
[17:12] <noonedeadpunk> took quite some time to understand that VMs can't report back due to cert issue
[17:15] <jrosser> so something I want to do is make the debugging docs better in the ops repo
[17:16] <jrosser> I have some stuff internally about all the kubectl things you can do on the controller
[17:16] <noonedeadpunk> yeah, would need to contribute here and there for sure
[17:16] <jrosser> but I think major structural things like getting the CA setup all straight, and correct for the actual deployment is tough
[17:17] <jrosser> some understanding is needed
[17:17] <jrosser> we can perhaps improve/annotate my diagram for that?
[17:17] <noonedeadpunk> well, I guess the example should be a bit more generic, with some basis around that
[17:17] <noonedeadpunk> as you might have imagined - I just defined an internal OSA CA everywhere as per the example
[17:18] <noonedeadpunk> nothing worked :D
[17:18] <jrosser> aaahhh
[17:19] <jrosser> well perhaps that’s a thing I’ve not been clear around
[17:19] <jrosser> my example is entirely based on an AIO/CI setup
[17:20] <jrosser> not necessarily a real deploy
[17:20] <noonedeadpunk> which also makes sense....
[17:21] <jrosser> so we can definitely either provide a second example or better describe what the options are doing and when/why they are needed
[17:22] <noonedeadpunk> yeah, I'm gonna add some comments in there
[17:22] <jrosser> that’s really great
[17:22] <noonedeadpunk> thanks for answering and helping out!
[17:22] <jrosser> I was always worried that I had made it not understandable but we did not get many people trying it yet for a fresh perspective
[17:24] <jrosser> then finally today I was on a k8s course - only took 1yr to book :(
[17:27] <jrosser> noonedeadpunk: did you figure out the FIP vs. proxy service bit yet?
[17:40] <jrosser> also tbh I am not so sure that vm calling back to magnum for completion is a thing here - that’s more to do with heat afaik
[17:49] <noonedeadpunk> jrosser: I have not
[17:49] <noonedeadpunk> Just managed to create the first 1 node cluster 5 mins ago :D
[17:49] <jrosser> so the health check is from the control plane to the workload cluster
[17:49] <noonedeadpunk> I think it does call somewhere, before marking cluster as create complete
[17:50] <jrosser> naaaah
[17:50] <noonedeadpunk> at least there was smth in vm logs, that it failed to verify certs connecting to keystone
[17:51] <noonedeadpunk> iirc I took that from master - https://paste.openstack.org/show/bDK3iq0azG0LFmK6PIfq/
[17:51] <jrosser> you do have a bunch of clients there for cinder/manila/keystone etc
[17:52] <noonedeadpunk> yeah might be... but without that cluster was stuck in progress still
[17:52] <noonedeadpunk> it was never marked as completed without that last CA part
[17:52] <jrosser> anyway you must have a fip for the cluster health to come good
[17:53] <noonedeadpunk> yeah, I think I do have a fip there indeed
[17:53] <jrosser> if you don’t, then look at the proxy service, but do that once you get everything else good
[17:53] <noonedeadpunk> btw, is there some way to "extend" default security groups?
[17:53] <jrosser> then you can run without the fip too
[17:53] <noonedeadpunk> proxy service... where is it?
[17:54] <noonedeadpunk> but then I need to spawn it inside of the internal network I assume?
[17:54] <jrosser> either network nodes, !ovn
[17:54] <jrosser> or computes for ovn
[17:54] <jrosser> another subtlety :)
[17:54] <noonedeadpunk> so it's k8s proxy service?
[17:56] <noonedeadpunk> ugh
[17:57] <noonedeadpunk> it's all a bit... more complex than I was imagining to myself
[17:57] <jrosser> python service that runs haproxy
[17:57] <jrosser> sorry just multitasking here :)
[17:58] <noonedeadpunk> um. I guess I'm not getting what proxy it is..
[18:06] <jrosser> when there is no fip
[18:07] <jrosser> the control plane k8s still needs to be able to reach the workload k8s cluster api endpoints
[18:07] <jrosser> so it’s an haproxy that does that for you
[18:13] <noonedeadpunk> yeah, that part I understood. but where is that service? how to run it?
[18:16] <noonedeadpunk> or it's smth to come up with on your own?
[18:17] <noonedeadpunk> ah
[18:17] <noonedeadpunk> ok, it's in a to-do part :D
[18:18] <noonedeadpunk> so it's basically install the same magnum-cluster-api but just run with a different config file and binary probably
[18:18] <noonedeadpunk> ok, makes sense
[18:19] <noonedeadpunk> and without magnum or anything like that
[18:29] <jrosser> yes it’s all the playbooks I wrote
[18:29] <noonedeadpunk> ++
[18:29] <jrosser> little python service / venv / systemd
[18:29] <jrosser> was really nice just to reuse all our roles for that
[18:30] <jrosser> so there are two cases you need that
[18:30] <jrosser> clusters without fip - it figures out if that is the case and instead proxies when needed
[18:30] <noonedeadpunk> heh, yep, we have quite some stuff for setting up things...
[18:31] <jrosser> or if your control plane is not routable you can set env var to make it always proxy for all clusters, regardless of fip or not
[18:32] <jrosser> but thing is with ovn the required network namespaces only exist on the computes, so some attention is needed as to which group to target with that
[18:32] <noonedeadpunk> I think in my production usecase I shouldn't have issues with reaching control plane API from clusters
[18:32] <noonedeadpunk> But would struggle without registry, so I guess my next step would be on how to provision without internet access on control-plane side
[18:33] <noonedeadpunk> and actually how to setup control cluster without internet
[18:33] <jrosser> ah well I have that done too :)
[18:33] <jrosser> but taking very very long time to upstream all that
[18:33] <noonedeadpunk> (I'm not there yet anyway)
[18:33] <noonedeadpunk> can totally get that
[18:34] <jrosser> there are the first few PRs there now actually
[18:34] <jrosser> to introspect the ansible role and download all the needed binaries
[18:34] <jrosser> so that you can stick them all on some internal http server and install from there
[18:38] <noonedeadpunk> ok, that sounds not too bad
[18:38] <noonedeadpunk> as indeed there're just binaries mainly
[18:38] <noonedeadpunk> though I was more unsure about helm charts and operators
[18:40] <jrosser> there’s nothing like that needed on the control plane
[18:40] <noonedeadpunk> nice
[18:41] <noonedeadpunk> and there's no issue with internet access from workloads
[18:41] <jrosser> I think that’s kind of a major difference between the vexxhost and stackhpc approaches
[18:42] <noonedeadpunk> seems gerrit just died :(
[18:42] <noonedeadpunk> ah, it's planned
[18:43] <noonedeadpunk> I'm taking very baby steps now, as I can't spend work time for that kind of thing
[18:43] <jrosser> cool well just ask if you get stuck :)
[18:43] <noonedeadpunk> sure, thanks, much appreciated!
