opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Freeze roles for 30.0.0.0b1 release https://review.opendev.org/c/openstack/openstack-ansible/+/931611 | 08:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-python_venv_build master: DNM https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/932639 | 10:00 |
opendevreview | Merged openstack/openstack-ansible master: Freeze roles for 30.0.0.0b1 release https://review.opendev.org/c/openstack/openstack-ansible/+/931611 | 13:57 |
noonedeadpunk | folks, one more review is needed for https://review.opendev.org/c/openstack/openstack-ansible/+/932439 so I can push the tag ^ | 14:01 |
noonedeadpunk | as otherwise deployments with keepalived simply fail :( | 14:01 |
andrewbonney | Done | 14:02 |
noonedeadpunk | thanks! | 14:03 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [reno] Add a release note for haproxy_limit_hosts key implementation https://review.opendev.org/c/openstack/openstack-ansible/+/932713 | 14:19 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Reference keepalived scripts through integrated repo path https://review.opendev.org/c/openstack/openstack-ansible/+/932439 | 14:21 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Unfreeze roles after milestone release https://review.opendev.org/c/openstack/openstack-ansible/+/931612 | 14:21 |
noonedeadpunk | damn | 14:22 |
noonedeadpunk | didn't want to rebase that :( | 14:22 |
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Proceed with installation/upgrade even if cluster not healthy https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/931800 | 14:49 |
opendevreview | Merged openstack/openstack-ansible stable/2023.2: Update Neutron SHA after bugfix https://review.opendev.org/c/openstack/openstack-ansible/+/932498 | 15:30 |
opendevreview | Merged openstack/openstack-ansible stable/2023.1: Update Neutron SHA after bugfix https://review.opendev.org/c/openstack/openstack-ansible/+/932499 | 15:38 |
opendevreview | Merged openstack/openstack-ansible-plugins master: Simplify haproxy_service_configs defenition https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/931762 | 15:59 |
noonedeadpunk | jrosser: I think right now I'm all ears on how you solved the TLS stuff with CAPI. Apparently supplying `openstack_ca_file: '/etc/ssl/certs/ca-certificates.crt'` is not a good option, as that will go as part of the cloud-init user data on the VM create request. Or did you have to increase | 16:55 |
noonedeadpunk | the body size for nova as well? https://docs.openstack.org/nova/latest/configuration/config.html#oslo_middleware.max_request_body_size | 16:55 |
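(For context: in OpenStack-Ansible that limit is typically raised through nova's config overrides. A minimal sketch, assuming the stock `nova_nova_conf_overrides` mechanism; the value shown is illustrative:)

```yaml
# user_variables.yml - hypothetical override raising nova's request body
# size cap so that large cloud-init user data (e.g. an embedded CA bundle)
# fits; the value is illustrative, the oslo.middleware default is 114688.
nova_nova_conf_overrides:
  oslo_middleware:
    max_request_body_size: 1048576
```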
jrosser | well | 16:55 |
jrosser | openstack_ca_file is indeed for the workload cluster to be able to trust the external endpoint | 16:55 |
jrosser | so we do the exact equivalent of this https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/user_variables_z_magnum.yml#L27 | 16:56 |
jrosser | just that we use our local root here `openstack_ca_file: '/usr/local/share/ca-certificates/bbcrd-lt.crt.crt'` | 16:57 |
jrosser | ^ thats the internal CA for our testlan | 16:57 |
jrosser | testlab | 16:57 |
jrosser | oh, copy/paste error there (the doubled `.crt`) but you know what I mean | 16:58 |
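(The override being discussed looks roughly like this; a sketch assuming magnum's `[drivers] openstack_ca_file` option is set through OSA's `magnum_config_overrides`, with a placeholder CA path:)

```yaml
# user_variables_z_magnum.yml - illustrative sketch; the CA path is a
# placeholder. This lands in magnum.conf via OSA's override mechanism,
# and the CA named here is shipped to workload cluster nodes, so use a
# single root cert, not the whole system bundle.
magnum_config_overrides:
  drivers:
    openstack_ca_file: '/usr/local/share/ca-certificates/company-root.crt'
```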
noonedeadpunk | oh, so you're using an internal CA for the public endpoint? I guess lucky you then... | 16:59 |
jrosser | well, company CA | 16:59 |
jrosser | which != internal vip CA | 16:59 |
noonedeadpunk | though I think I will update that sample with more context... | 16:59 |
noonedeadpunk | but this one - it's also not an internal-only CA, right? https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/user_variables_z_magnum.yml#L29 | 17:00 |
noonedeadpunk | as it still reaches keystone via the public endpoint, I guess | 17:00 |
noonedeadpunk | and that's the bug you pointed me to earlier | 17:01 |
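(A hedged guess at the shape of that second setting; only `ca_file` itself appears in the linked example, and the `[capi_client]` section name is an assumption about magnum-cluster-api's config layout:)

```yaml
# Illustrative only: ca_file is the key from the linked example, but the
# [capi_client] section name is an assumption. Whatever CA goes here must
# validate the keystone endpoint the driver actually reaches (public,
# unless patched to stay internal).
magnum_config_overrides:
  capi_client:
    ca_file: '/usr/local/share/ca-certificates/company-root.crt'
```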
jrosser | https://paste.opendev.org/show/biXrLNe0t4I1zeXJ1uPP/ | 17:01 |
noonedeadpunk | so I'm just using plain Let's Encrypt here. And I'm frankly not sure of a good option, except passing the system trust store | 17:01 |
jrosser | well, then you should not need anything i think | 17:02 |
noonedeadpunk | except ca_file: '/usr/local/share/ca-certificates/<ca-for-internal-vip>.crt' - doesn't work | 17:02 |
noonedeadpunk | it tries to connect to keystone via public vip and fails or smth like that | 17:02 |
jrosser | as the trust store in the ubuntu image will be good for the LE on the public endpoint | 17:02 |
noonedeadpunk | so it never tries to create anything in openstack | 17:02 |
noonedeadpunk | > trust store in the ubuntu image will be good for the LE on the public endpoint - I don't think it's using the system trust store at all | 17:03 |
noonedeadpunk | so it's pulling in whatever is in the cloud-init, from what I understood | 17:03 |
noonedeadpunk | or, you have to pass insecure | 17:03 |
noonedeadpunk | https://github.com/vexxhost/magnum-cluster-api/blob/770fa907d909fcd93bac747b92e425229dbe9787/magnum_cluster_api/utils.py#L135-L136 | 17:03 |
noonedeadpunk | or well... maybe if I don't pass openstack_ca_file at all it will work? | 17:04 |
jrosser | i think the intention is that you can omit it https://github.com/vexxhost/magnum-cluster-api/pull/208/commits/211bdcf0564447320d51a9db669fbc3d3f03baf7 | 17:05 |
noonedeadpunk | aha, let me try that.... | 17:05 |
jrosser | yeah i think that patch catches the case when it is not set, so it does not try to pass the gigantic whole-system CA bundle | 17:06 |
jrosser | i believe that openstack_ca_file is for the specific case of having a public endpoint with a CA that you know is not in the worker node system trust store | 17:07 |
noonedeadpunk | frankly, from my perspective - the Heat driver "just works" hehe | 17:07 |
jrosser | it's interesting, isn't it - my experience is completely the opposite | 17:08 |
noonedeadpunk | oh yes, totally | 17:08 |
jrosser | but i think i try to do lots of things that require everything to be exactly right | 17:09 |
jrosser | like no route from controller to public endpoint | 17:09 |
noonedeadpunk | and then I assume you maintain a local patch that ensures nothing goes to public keystone | 17:09 |
jrosser | indeed we do patch for that | 17:10 |
* noonedeadpunk was lazy to do so in a sandbox | 17:10 |
jrosser | it was really a lot of work with mcapi also to make it work with different internal and external CAs | 17:11 |
jrosser | lots of patches and fixing with mnaser to get it all happy | 17:11 |
noonedeadpunk | I had to call in devops folks who are more aware of k8s to help understand why only half of the nodes were spawned and nothing was in the logs | 17:12 |
noonedeadpunk | took quite some time to understand that the VMs couldn't report back due to a cert issue | 17:12 |
jrosser | so something I want to do is make the debugging docs better in the ops repo | 17:15 |
jrosser | I have some stuff internally about all the kubectl things you can do on the controller | 17:16 |
noonedeadpunk | yeah, would need to contribute here and there for sure | 17:16 |
jrosser | but I think major structural things like getting the CA setup all straight, and correct for the actual deployment, are tough | 17:16 |
jrosser | some understanding is needed | 17:17 |
jrosser | we can perhaps improve/annotate my diagram for that? | 17:17 |
noonedeadpunk | well, I guess the sample should contain a bit more generic example with some basics around it | 17:17 |
noonedeadpunk | as you might have imagined - I just defined the internal OSA CA everywhere, following the example | 17:17 |
noonedeadpunk | nothing worked :D | 17:18 |
jrosser | aaahhh | 17:18 |
jrosser | well perhaps that's a thing I've not been clear about | 17:19 |
jrosser | my example is entirely based on an AIO/CI setup | 17:19 |
jrosser | not necessarily a real deploy | 17:20 |
noonedeadpunk | which also makes sense.... | 17:20 |
jrosser | so we can definitely either provide a second example or better describe what the options are doing and when/why they are needed | 17:21 |
noonedeadpunk | yeah, I'm gonna add some comments in there | 17:22 |
jrosser | that’s really great | 17:22 |
noonedeadpunk | thanks for answering and helping out! | 17:22 |
jrosser | I was always worried that I had made it not understandable, but we did not get many people trying it yet for a fresh perspective | 17:22 |
jrosser | then finally today I was on a k8s course - only took 1yr to book :( | 17:24 |
jrosser | noonedeadpunk: did you figure out the FIP vs. proxy service bit yet? | 17:27 |
jrosser | also tbh I am not so sure that the VM calling back to magnum for completion is a thing here - that's more to do with heat afaik | 17:40 |
noonedeadpunk | jrosser: I am not | 17:49 |
noonedeadpunk | Just managed to create the first 1 node cluster 5 mins ago :D | 17:49 |
jrosser | so the health check is from the control plane to the workload cluster | 17:49 |
noonedeadpunk | I think it does call somewhere, before marking the cluster as create complete | 17:49 |
jrosser | naaaah | 17:50 |
noonedeadpunk | at least there was something in the VM logs saying it failed to verify certs when connecting to keystone | 17:50 |
noonedeadpunk | iirc I took that from master - https://paste.openstack.org/show/bDK3iq0azG0LFmK6PIfq/ | 17:51 |
jrosser | you do have a bunch of clients there for cinder/manila/keystone etc | 17:51 |
noonedeadpunk | yeah, might be... but without that the cluster was still stuck in progress | 17:52 |
noonedeadpunk | it was never marked as completed without that last CA part | 17:52 |
jrosser | anyway you must have a fip for the cluster health to come good | 17:52 |
noonedeadpunk | yeah, I think I do have a fip there indeed | 17:53 |
jrosser | if you don’t, then look at the proxy service, but do that once you get everything else good | 17:53 |
noonedeadpunk | btw, is there some way to "extend" the default security groups? | 17:53 |
jrosser | then you can run without the fip too | 17:53 |
noonedeadpunk | proxy service... where is it? | 17:53 |
noonedeadpunk | but then I need to spawn it inside of the internal network I assume? | 17:54 |
jrosser | either network nodes, !ovn | 17:54 |
jrosser | or computes for ovn | 17:54 |
jrosser | another subtlety :) | 17:54 |
noonedeadpunk | so it's k8s proxy service? | 17:54 |
noonedeadpunk | ugh | 17:56 |
noonedeadpunk | it's all a bit... more complex than I was imagining | 17:57 |
jrosser | python service that runs haproxy | 17:57 |
jrosser | sorry just multitasking here :) | 17:57 |
noonedeadpunk | um. I guess I'm not getting what proxy it is.. | 17:58 |
jrosser | when there is no fip | 18:06 |
jrosser | the control plane k8s still needs to be able to reach the workload k8s cluster api endpoints | 18:07 |
jrosser | so it’s an haproxy that does that for you | 18:07 |
noonedeadpunk | yeah, that part I understood. but where is that service? how do I run it? | 18:13 |
noonedeadpunk | or is it something to come up with on your own? | 18:16 |
noonedeadpunk | ah | 18:17 |
noonedeadpunk | ok, it's in a to-do part :D | 18:17 |
noonedeadpunk | so it's basically installing the same magnum-cluster-api, but run with a different config file and binary probably | 18:18 |
noonedeadpunk | ok, makes sense | 18:18 |
noonedeadpunk | and without magnum or anything like that | 18:19 |
jrosser | yes it’s all the playbooks I wrote | 18:29 |
noonedeadpunk | ++ | 18:29 |
jrosser | little python service / venv / systemd | 18:29 |
jrosser | was really nice just to reuse all our roles for that | 18:29 |
jrosser | so there are two cases you need that | 18:30 |
jrosser | clusters without a fip - it figures out if that is the case and proxies when needed | 18:30 |
noonedeadpunk | heh, yep, we have quite some stuff for setting up things... | 18:30 |
jrosser | or if your control plane is not routable you can set env var to make it always proxy for all clusters, regardless of fip or not | 18:31 |
jrosser | but the thing is, with OVN the required network namespaces only exist on the computes, so some attention is needed as to which group to target with that | 18:32 |
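(For readers following along: the proxy is the `magnum-cluster-api-proxy` service that ships with the vexxhost driver. A minimal sketch of running it via Ansible; this play is illustrative rather than the actual ops-repo playbook, and the host targeting follows the OVN caveat above:)

```yaml
# Hypothetical play; the real playbooks live in openstack-ansible-ops.
- hosts: network_hosts        # target compute hosts instead when on OVN
  become: true
  tasks:
    - name: Install magnum-cluster-api into its own venv
      ansible.builtin.pip:
        name: magnum-cluster-api
        virtualenv: /opt/magnum-cluster-api-proxy

    - name: Create a systemd unit for the proxy
      ansible.builtin.copy:
        dest: /etc/systemd/system/magnum-cluster-api-proxy.service
        content: |
          [Unit]
          Description=magnum-cluster-api proxy (manages haproxy to reach workload API endpoints)
          After=network-online.target

          [Service]
          ExecStart=/opt/magnum-cluster-api-proxy/bin/magnum-cluster-api-proxy
          Restart=always

          [Install]
          WantedBy=multi-user.target

    - name: Enable and start the proxy
      ansible.builtin.systemd:
        name: magnum-cluster-api-proxy
        enabled: true
        state: started
        daemon_reload: true
```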
noonedeadpunk | I think in my production use case I shouldn't have issues with reaching the control plane API from clusters | 18:32 |
noonedeadpunk | But I would struggle without a registry, so I guess my next step would be how to provision without internet access on the control-plane side | 18:32 |
noonedeadpunk | and actually how to setup control cluster without internet | 18:33 |
jrosser | ah well I have that done too :) | 18:33 |
jrosser | but it's taking a very, very long time to upstream all that | 18:33 |
noonedeadpunk | (I'm not there yet anyway) | 18:33 |
noonedeadpunk | can totally get that | 18:33 |
jrosser | there are the first few PRs there now actually | 18:34 |
jrosser | to introspect the ansible role and download all the needed binaries | 18:34 |
jrosser | so that you can stick them all on some internal http server and install from there | 18:34 |
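(The offline pattern being described, sketched with hypothetical URLs and variable names; the real PRs introspect the role to build the download list, but the idea reduces to mirroring the binaries and serving them internally:)

```yaml
# Illustrative mirror play; the URLs and variables are examples, not the
# actual collection's defaults.
- hosts: mirror_host
  vars:
    mirror_root: /var/www/k8s-mirror
    binaries:
      - https://dl.k8s.io/release/v1.28.2/bin/linux/amd64/kubectl
      - https://get.helm.sh/helm-v3.13.1-linux-amd64.tar.gz
  tasks:
    - name: Download each required binary to the internal web root
      ansible.builtin.get_url:
        url: "{{ item }}"
        dest: "{{ mirror_root }}/{{ item | basename }}"
        mode: '0644'
      loop: "{{ binaries }}"
```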
noonedeadpunk | ok, that sounds not too bad | 18:38 |
noonedeadpunk | as indeed it's mainly just binaries | 18:38 |
noonedeadpunk | though I was more unsure about helm charts and operators | 18:38 |
jrosser | there’s nothing like that needed on the control plane | 18:40 |
noonedeadpunk | nice | 18:40 |
noonedeadpunk | and there's no issue with internet access from the workloads | 18:41 |
jrosser | I think that’s kind of a major difference between the vexxhost and stackhpc approaches | 18:41 |
noonedeadpunk | seems gerrit just died :( | 18:42 |
noonedeadpunk | ah, it's planned | 18:42 |
noonedeadpunk | I'm taking very baby steps now, as I can't spend work time on that kind of thing | 18:43 |
jrosser | cool well just ask if you get stuck :) | 18:43 |
noonedeadpunk | sure, thanks, much appreciated! | 18:43 |