opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Freeze roles for 30.0.0.0b1 release https://review.opendev.org/c/openstack/openstack-ansible/+/931611 | 08:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-python_venv_build master: DNM https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/932639 | 10:00 |
opendevreview | Merged openstack/openstack-ansible master: Freeze roles for 30.0.0.0b1 release https://review.opendev.org/c/openstack/openstack-ansible/+/931611 | 13:57 |
noonedeadpunk | folks, one more review is needed for https://review.opendev.org/c/openstack/openstack-ansible/+/932439 so I can push the tag ^ | 14:01 |
noonedeadpunk | as otherwise deployments with keepalived simply fail :( | 14:01 |
andrewbonney | Done | 14:02 |
noonedeadpunk | thanks! | 14:03 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [reno] Add a release note for haproxy_limit_hosts key implementation https://review.opendev.org/c/openstack/openstack-ansible/+/932713 | 14:19 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Reference keepalived scripts through integrated repo path https://review.opendev.org/c/openstack/openstack-ansible/+/932439 | 14:21 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Unfreeze roles after milestone release https://review.opendev.org/c/openstack/openstack-ansible/+/931612 | 14:21 |
noonedeadpunk | damn | 14:22 |
noonedeadpunk | didn't want to rebase that :( | 14:22 |
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Proceed with installation/upgrade even if cluster not healthy https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/931800 | 14:49 |
opendevreview | Merged openstack/openstack-ansible stable/2023.2: Update Neutron SHA after bugfix https://review.opendev.org/c/openstack/openstack-ansible/+/932498 | 15:30 |
opendevreview | Merged openstack/openstack-ansible stable/2023.1: Update Neutron SHA after bugfix https://review.opendev.org/c/openstack/openstack-ansible/+/932499 | 15:38 |
opendevreview | Merged openstack/openstack-ansible-plugins master: Simplify haproxy_service_configs defenition https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/931762 | 15:59 |
noonedeadpunk | jrosser: I think right now I'm all ears on how you solved the TLS stuff with CAPI. Apparently supplying `openstack_ca_file: '/etc/ssl/certs/ca-certificates.crt'` is not a good option, as that will go as part of the cloud-init user data on the VM create request. Or did you have to increase | 16:55 |
noonedeadpunk | the body size for nova as well? https://docs.openstack.org/nova/latest/configuration/config.html#oslo_middleware.max_request_body_size | 16:55 |
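(For context: in OpenStack-Ansible that limit is typically raised through nova's config overrides. A minimal sketch, assuming the stock `nova_nova_conf_overrides` mechanism; the value shown is illustrative:)

```yaml
# user_variables.yml - hypothetical override raising nova's request body
# size cap so that large cloud-init user data (e.g. an embedded CA bundle)
# fits; the value is illustrative, the oslo.middleware default is 114688.
nova_nova_conf_overrides:
  oslo_middleware:
    max_request_body_size: 1048576
```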
jrosser | well | 16:55 |
jrosser | openstack_ca_file is indeed for the workload cluster to be able to trust the external endpoint | 16:55 |
jrosser | so we do the exact equivalent of this https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/user_variables_z_magnum.yml#L27 | 16:56 |
jrosser | just that we use our local root here `openstack_ca_file: '/usr/local/share/ca-certificates/bbcrd-lt.crt.crt'` | 16:57 |
jrosser | ^ thats the internal CA for our testlan | 16:57 |
jrosser | testlab | 16:57 |
jrosser | oh, copy/paste error there (the doubled `.crt`) but you know what I mean | 16:58 |
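(The override being discussed looks roughly like this; a sketch assuming magnum's `[drivers] openstack_ca_file` option is set through OSA's `magnum_config_overrides`, with a placeholder CA path:)

```yaml
# user_variables_z_magnum.yml - illustrative sketch; the CA path is a
# placeholder. This lands in magnum.conf via OSA's override mechanism,
# and the CA named here is shipped to workload cluster nodes, so use a
# single root cert, not the whole system bundle.
magnum_config_overrides:
  drivers:
    openstack_ca_file: '/usr/local/share/ca-certificates/company-root.crt'
```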
noonedeadpunk | oh, so you're using an internal CA for the public endpoint? I guess lucky you then... | 16:59 |
jrosser | well, company CA | 16:59 |
jrosser | which != internal vip CA | 16:59 |
noonedeadpunk | though I think I will update that sample with more context... | 16:59 |
noonedeadpunk | but this one - it's also not an internal-only CA, right? https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/user_variables_z_magnum.yml#L29 | 17:00 |
noonedeadpunk | as it still reaches keystone via the public endpoint, I guess | 17:00 |
noonedeadpunk | and that's the bug you pointed me to earlier | 17:01 |
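(A hedged guess at the shape of that second setting; only `ca_file` itself appears in the linked example, and the `[capi_client]` section name is an assumption about magnum-cluster-api's config layout:)

```yaml
# Illustrative only: ca_file is the key from the linked example, but the
# [capi_client] section name is an assumption. Whatever CA goes here must
# validate the keystone endpoint the driver actually reaches (public,
# unless patched to stay internal).
magnum_config_overrides:
  capi_client:
    ca_file: '/usr/local/share/ca-certificates/company-root.crt'
```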
jrosser | https://paste.opendev.org/show/biXrLNe0t4I1zeXJ1uPP/ | 17:01 |
noonedeadpunk | so I'm just using plain Let's Encrypt here. And I'm frankly not sure of a good option, except passing the system trust store | 17:01 |
jrosser | well, then you should not need anything i think | 17:02 |
noonedeadpunk | except ca_file: '/usr/local/share/ca-certificates/<ca-for-internal-vip>.crt' - doesn't work | 17:02 |
noonedeadpunk | it tries to connect to keystone via public vip and fails or smth like that | 17:02 |
jrosser | as the trust store in the ubuntu image will be good for the LE on the public endpoint | 17:02 |
noonedeadpunk | so it never tries to create anything in openstack | 17:02 |
noonedeadpunk | > trust store in the ubuntu image will be good for the LE on the public endpoint - I don't think it's using the system trust store at all | 17:03 |
noonedeadpunk | so it's pulling in whatever is in the cloud-init, from what I understood | 17:03 |
noonedeadpunk | or, you have to pass insecure | 17:03 |
noonedeadpunk | https://github.com/vexxhost/magnum-cluster-api/blob/770fa907d909fcd93bac747b92e425229dbe9787/magnum_cluster_api/utils.py#L135-L136 | 17:03 |
noonedeadpunk | or well... maybe if I don't pass openstack_ca_file at all it will work? | 17:04 |
jrosser | i think the intention is that you can omit it https://github.com/vexxhost/magnum-cluster-api/pull/208/commits/211bdcf0564447320d51a9db669fbc3d3f03baf7 | 17:05 |
noonedeadpunk | aha, let me try that.... | 17:05 |
jrosser | yeah i think that patch catches the case when it is not set, so it does not try to pass the gigantic whole-system CA bundle | 17:06 |
jrosser | i believe that openstack_ca_file is for the specific case of having a public endpoint with a CA that you know is not in the worker node system trust store | 17:07 |
noonedeadpunk | frankly, from my perspective - the Heat driver "just works" hehe | 17:07 |
jrosser | it's interesting, isn't it - my experience is completely the opposite | 17:08 |
noonedeadpunk | oh yes, totally | 17:08 |
jrosser | but i think i try to do lots of things that require everything to be exactly right | 17:09 |
jrosser | like no route from controller to public endpoint | 17:09 |
noonedeadpunk | and then I assume you maintain a local patch that ensures nothing goes to public keystone | 17:09 |
jrosser | indeed we do patch for that | 17:10 |
* noonedeadpunk was lazy to do so in a sandbox | 17:10 |
jrosser | it was really a lot of work with mcapi also to make it work with different internal and external CAs | 17:11 |
jrosser | lots of patches and fixing with mnaser to get it all happy | 17:11 |
noonedeadpunk | I had to call in devops folks who are more aware of k8s to help understand why only half of the nodes were spawned and nothing was in the logs | 17:12 |
noonedeadpunk | took quite some time to understand that the VMs couldn't report back due to a cert issue | 17:12 |
jrosser | so something I want to do is make the debugging docs better in the ops repo | 17:15 |
jrosser | I have some stuff internally about all the kubectl things you can do on the controller | 17:16 |
noonedeadpunk | yeah, would need to contribute here and there for sure | 17:16 |
jrosser | but I think major structural things like getting the CA setup all straight, and correct for the actual deployment, are tough | 17:16 |
jrosser | some understanding is needed | 17:17 |
jrosser | we can perhaps improve/annotate my diagram for that? | 17:17 |
noonedeadpunk | well, I guess the sample should contain a bit more generic example with some basics around it | 17:17 |
noonedeadpunk | as you might have imagined - I just defined the internal OSA CA everywhere, following the example | 17:17 |
noonedeadpunk | nothing worked :D | 17:18 |
jrosser | aaahhh | 17:18 |
jrosser | well perhaps that's a thing I've not been clear about | 17:19 |
jrosser | my example is entirely based on an AIO/CI setup | 17:19 |
jrosser | not necessarily a real deploy | 17:20 |
noonedeadpunk | which also makes sense.... | 17:20 |
jrosser | so we can definitely either provide a second example or better describe what the options are doing and when/why they are needed | 17:21 |
noonedeadpunk | yeah, I'm gonna add some comments in there | 17:22 |
jrosser | that’s really great | 17:22 |
noonedeadpunk | thanks for answering and helping out! | 17:22 |
jrosser | I was always worried that I had made it not understandable, but we did not get many people trying it yet for a fresh perspective | 17:22 |
jrosser | then finally today I was on a k8s course - only took 1yr to book :( | 17:24 |
jrosser | noonedeadpunk: did you figure out the FIP vs. proxy service bit yet? | 17:27 |
jrosser | also tbh I am not so sure that the VM calling back to magnum for completion is a thing here - that's more to do with heat afaik | 17:40 |
noonedeadpunk | jrosser: I am not | 17:49 |
noonedeadpunk | Just managed to create the first 1 node cluster 5 mins ago :D | 17:49 |
jrosser | so the health check is from the control plane to the workload cluster | 17:49 |
noonedeadpunk | I think it does call somewhere, before marking the cluster as create complete | 17:49 |
jrosser | naaaah | 17:50 |
noonedeadpunk | at least there was something in the VM logs saying it failed to verify certs when connecting to keystone | 17:50 |
noonedeadpunk | iirc I took that from master - https://paste.openstack.org/show/bDK3iq0azG0LFmK6PIfq/ | 17:51 |
jrosser | you do have a bunch of clients there for cinder/manila/keystone etc | 17:51 |
noonedeadpunk | yeah, might be... but without that the cluster was still stuck in progress | 17:52 |
noonedeadpunk | it was never marked as completed without that last CA part | 17:52 |
jrosser | anyway you must have a fip for the cluster health to come good | 17:52 |
noonedeadpunk | yeah, I think I do have a fip there indeed | 17:53 |
jrosser | if you don’t, then look at the proxy service, but do that once you get everything else good | 17:53 |
noonedeadpunk | btw, is there some way to "extend" the default security groups? | 17:53 |
jrosser | then you can run without the fip too | 17:53 |
noonedeadpunk | proxy service... where is it? | 17:53 |
noonedeadpunk | but then I need to spawn it inside of the internal network I assume? | 17:54 |
jrosser | either network nodes, !ovn | 17:54 |
jrosser | or computes for ovn | 17:54 |
jrosser | another subtlety :) | 17:54 |
noonedeadpunk | so it's k8s proxy service? | 17:54 |
noonedeadpunk | ugh | 17:56 |
noonedeadpunk | it's all a bit... more complex than I was imagining | 17:57 |
jrosser | python service that runs haproxy | 17:57 |
jrosser | sorry just multitasking here :) | 17:57 |
noonedeadpunk | um. I guess I'm not getting what proxy it is.. | 17:58 |
jrosser | when there is no fip | 18:06 |
jrosser | the control plane k8s still needs to be able to reach the workload k8s cluster api endpoints | 18:07 |
jrosser | so it’s an haproxy that does that for you | 18:07 |
noonedeadpunk | yeah, that part I understood. but where is that service? how do I run it? | 18:13 |
noonedeadpunk | or is it something to come up with on your own? | 18:16 |
noonedeadpunk | ah | 18:17 |
noonedeadpunk | ok, it's in a to-do part :D | 18:17 |
noonedeadpunk | so it's basically installing the same magnum-cluster-api, but run with a different config file and binary probably | 18:18 |
noonedeadpunk | ok, makes sense | 18:18 |
noonedeadpunk | and without magnum or anything like that | 18:19 |
jrosser | yes it’s all the playbooks I wrote | 18:29 |
noonedeadpunk | ++ | 18:29 |
jrosser | little python service / venv / systemd | 18:29 |
jrosser | was really nice just to reuse all our roles for that | 18:29 |
jrosser | so there are two cases you need that | 18:30 |
jrosser | clusters without a fip - it figures out if that is the case and proxies when needed | 18:30 |
noonedeadpunk | heh, yep, we have quite some stuff for setting up things... | 18:30 |
jrosser | or if your control plane is not routable you can set env var to make it always proxy for all clusters, regardless of fip or not | 18:31 |
jrosser | but the thing is, with OVN the required network namespaces only exist on the computes, so some attention is needed as to which group to target with that | 18:32 |
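(For readers following along: the proxy is the `magnum-cluster-api-proxy` service that ships with the vexxhost driver. A minimal sketch of running it via Ansible; this play is illustrative rather than the actual ops-repo playbook, and the host targeting follows the OVN caveat above:)

```yaml
# Hypothetical play; the real playbooks live in openstack-ansible-ops.
- hosts: network_hosts        # target compute hosts instead when on OVN
  become: true
  tasks:
    - name: Install magnum-cluster-api into its own venv
      ansible.builtin.pip:
        name: magnum-cluster-api
        virtualenv: /opt/magnum-cluster-api-proxy

    - name: Create a systemd unit for the proxy
      ansible.builtin.copy:
        dest: /etc/systemd/system/magnum-cluster-api-proxy.service
        content: |
          [Unit]
          Description=magnum-cluster-api proxy (manages haproxy to reach workload API endpoints)
          After=network-online.target

          [Service]
          ExecStart=/opt/magnum-cluster-api-proxy/bin/magnum-cluster-api-proxy
          Restart=always

          [Install]
          WantedBy=multi-user.target

    - name: Enable and start the proxy
      ansible.builtin.systemd:
        name: magnum-cluster-api-proxy
        enabled: true
        state: started
        daemon_reload: true
```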
noonedeadpunk | I think in my production use case I shouldn't have issues with reaching the control plane API from clusters | 18:32 |
noonedeadpunk | But I would struggle without a registry, so I guess my next step would be how to provision without internet access on the control-plane side | 18:32 |
noonedeadpunk | and actually how to setup control cluster without internet | 18:33 |
jrosser | ah well I have that done too :) | 18:33 |
jrosser | but it's taking a very, very long time to upstream all that | 18:33 |
noonedeadpunk | (I'm not there yet anyway) | 18:33 |
noonedeadpunk | can totally get that | 18:33 |
jrosser | there are the first few PRs there now actually | 18:34 |
jrosser | to introspect the ansible role and download all the needed binaries | 18:34 |
jrosser | so that you can stick them all on some internal http server and install from there | 18:34 |
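(The offline pattern being described, sketched with hypothetical URLs and variable names; the real PRs introspect the role to build the download list, but the idea reduces to mirroring the binaries and serving them internally:)

```yaml
# Illustrative mirror play; the URLs and variables are examples, not the
# actual collection's defaults.
- hosts: mirror_host
  vars:
    mirror_root: /var/www/k8s-mirror
    binaries:
      - https://dl.k8s.io/release/v1.28.2/bin/linux/amd64/kubectl
      - https://get.helm.sh/helm-v3.13.1-linux-amd64.tar.gz
  tasks:
    - name: Download each required binary to the internal web root
      ansible.builtin.get_url:
        url: "{{ item }}"
        dest: "{{ mirror_root }}/{{ item | basename }}"
        mode: '0644'
      loop: "{{ binaries }}"
```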
noonedeadpunk | ok, that sounds not too bad | 18:38 |
noonedeadpunk | as indeed it's mainly just binaries | 18:38 |
noonedeadpunk | though I was more unsure about helm charts and operators | 18:38 |
jrosser | there’s nothing like that needed on the control plane | 18:40 |
noonedeadpunk | nice | 18:40 |
noonedeadpunk | and there's no issue with internet access from the workloads | 18:41 |
jrosser | I think that’s kind of a major difference between the vexxhost and stackhpc approaches | 18:41 |
noonedeadpunk | seems gerrit just died :( | 18:42 |
noonedeadpunk | ah, it's planned | 18:42 |
noonedeadpunk | I'm taking very baby steps now, as I can't spend work time on that kind of thing | 18:43 |
jrosser | cool well just ask if you get stuck :) | 18:43 |
noonedeadpunk | sure, thanks, much appreciated! | 18:43 |