opendevreview | Merged openstack/ansible-role-systemd_networkd stable/2024.2: Restart systemd-networkd on routes changes https://review.opendev.org/c/openstack/ansible-role-systemd_networkd/+/938699 | 00:51 |
opendevreview | Merged openstack/ansible-role-systemd_networkd stable/2024.1: Restart systemd-networkd on routes changes https://review.opendev.org/c/openstack/ansible-role-systemd_networkd/+/938700 | 01:28 |
noonedeadpunk | good morning | 07:39 |
jrosser | o/ morning | 07:50 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-httpd master: Initial commit to the role https://review.opendev.org/c/openstack/ansible-role-httpd/+/938245 | 10:41 |
noonedeadpunk | fwiw, I can't reproduce failure for skyline with httpd role | 11:09 |
f0o | Happy 2025! | 13:45 |
f0o | Unfortunately cinder-scheduler decided to die on me without any changes... we just rebooted the instances one by one to apply OS updates without touching the lxc; now it spews this: | 13:46 |
f0o | Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'cinder-scheduler_fanout' in vhost 'cinder': received 'false' but current is 'true' | 13:46 |
f0o | I've vaguely seen this before but for the love of me cannot remember where or how I resolved it last time | 13:46 |
f0o | anyone got pointers? | 13:46 |
f0o | hrm maybe I can just bump it all to 2024.2 and see if that just fixes it... although I would really like to know why it's even happening... google returns only bugs from 2016 | 14:04 |
noonedeadpunk | f0o: it seems that some cinder service was somehow reconfigured for HA queues (instead of Quorum) on the new vhost | 15:57 |
noonedeadpunk | for 2024.1 we made a migration from HA queues (which are non-durable) to quorum (which are durable). As this is a destructive and tricky process, the best way to do so was to drop the `/cinder` vhost and create a new `cinder` (without /) | 15:58 |
noonedeadpunk | then before service restart, they were re-configured to use quorum queues, as the durable flag is set on initial creation and can't be easily changed | 16:00 |
noonedeadpunk | So I guess I'd check if all cinder services do have quorum enabled in cinder.conf and use the `cinder` vhost, and then re-create the vhost in rabbitmq | 16:01 |
noonedeadpunk | upgrade to 2024.2 won't give you anything with this regard | 16:01 |
spatel | jrosser do you know how to update existing k8s cluster template values? | 16:58 |
spatel | I want to add a new ssh-key pair to the cluster template but not sure whether I am doing it the right way or not | 16:58 |
spatel | openstack coe cluster template <template_name> update keypair_id=jmp1-key | 16:59 |
spatel | Got it - openstack coe cluster template update k8s-v1.27.4-private add keypair_id=jmp1-key | 17:00 |
spatel | ClusterTemplate dc6e16ee-3a2a-4c06-9ac5-e90dff486db1 is referenced by one or multiple clusters (HTTP 400) (Request-ID: req-5c756cc2-313d-49d4-bfe3-76c28d60faa4) | 17:00 |
spatel | look like template in use so I can't add | 17:00 |
noonedeadpunk | yeah, I don't think you can update a template in-use indeed | 17:04 |
f0o | noonedeadpunk: thanks for the pointer! will investigate | 17:30 |
f0o | i recall some rabbitmq migration steps to 2024.1 which I performed. can I simply drop the vhost and rerun those steps? | 17:32 |
spatel | noonedeadpunk I am all set there | 17:33 |
spatel | noonedeadpunk but now I am seeing strange issue related AZ stuff with mcapi | 17:33 |
spatel | I have multiple AZs and somehow mcapi doesn't honor my AZ and randomly creates nodes in different AZs | 17:33 |
spatel | This is my label on template - kube_tag=v1.27.4,ingress_controller=octavia,cloud_provider_enabled=true,availability_zone=az1 | 17:34 |
spatel | I want all my k8s nodes to go to AZ1 | 17:34 |
noonedeadpunk | f0o: you can re-run os-cinder-install --tags cinder-config, which ensures correct configuration, and re-create the vhost, yes | 17:40 |
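The recovery noonedeadpunk outlines above can be sketched as a shell sequence. This is a hedged sketch assuming a standard OSA deployment; the `cinder` vhost/user names and the config path are assumptions, and the steps run on a cinder node, inside a rabbitmq container, and on the deploy host respectively.

```shell
# Hedged sketch of the recovery described above; names/paths are assumptions.

# 1. On each cinder node: confirm quorum queues are on and the vhost has no
#    leading slash in the transport_url
grep -E 'rabbit_quorum_queue|transport_url' /etc/cinder/cinder.conf

# 2. In a rabbitmq container: drop and re-create the vhost so the exchanges
#    and queues get re-declared with the durable flag
rabbitmqctl delete_vhost cinder
rabbitmqctl add_vhost cinder
rabbitmqctl set_permissions -p cinder cinder '.*' '.*' '.*'

# 3. On the deploy host: re-apply the correct cinder configuration
openstack-ansible os-cinder-install.yml --tags cinder-config
```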
noonedeadpunk | spatel: are those worker or controller nodes? | 17:41 |
spatel | controller | 17:41 |
noonedeadpunk | as az is respected only for workers | 17:41 |
noonedeadpunk | controllers are indeed spread randomly... | 17:41 |
spatel | so what is the solution to put controller on right AZ? | 17:42 |
noonedeadpunk | I think I saw something in code to restrict controllers though | 17:42 |
spatel | control_avaibility_zone or something? | 17:42 |
noonedeadpunk | `control_plane_availability_zones` - yeah | 17:43 |
noonedeadpunk | https://github.com/vexxhost/magnum-cluster-api/blob/5c40b607e609f93842d9dd3e9d4b2a70b45f844d/magnum_cluster_api/resources.py#L2896-L2899 | 17:43 |
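For reference, the label set spatel and noonedeadpunk converge on could be applied at template creation time roughly like this. This is an illustrative sketch: the template, image, and network names are invented, and `control_plane_availability_zones` is honored by the magnum-cluster-api driver per the code link above.

```shell
# Illustrative only: template/image/network names are made up.
openstack coe cluster template create k8s-v1.27.4-az1 \
  --coe kubernetes \
  --image fedora-coreos-38 \
  --external-network public \
  --labels kube_tag=v1.27.4,ingress_controller=octavia,cloud_provider_enabled=true,availability_zone=az1,control_plane_availability_zones=az1
```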
spatel | Perfect! let me give it a try and see | 17:43 |
f0o | thanls noonedeadpunk ! | 18:07 |
f0o | cant type on phone | 18:07 |
spatel | noonedeadpunk that works! | 18:09 |
spatel | control_plane_availability_zones | 18:09 |
spatel | now one more issue related openstack server group | 18:09 |
spatel | mcapi doesn't delete the server group. I have to delete it by hand (manually) | 18:10 |
noonedeadpunk | well, we have another issue as well: when trying to scale the cluster back, the capi driver marks all of the workers as tainted, regardless of the minimal workers value | 18:18 |
noonedeadpunk | ofc k8s does not delete them, but magnum still loses them from its db | 18:18 |
spatel | hmm | 18:42 |
spatel | noonedeadpunk What permission do I have to give a normal user to create a k8s cluster? | 18:42 |
spatel | I gave the reader role to the user but still getting a permission error - openstack role add --user spatel --project production reader | 18:42 |
noonedeadpunk | at very least member... but also there could be some magnum specific roles - can't recall tbh | 18:43 |
noonedeadpunk | but could be it's only for heat driver... | 18:43 |
jrosser | afaik we have not had to do anything special with permissions, just regular user is ok | 18:46 |
spatel | hmm | 18:46 |
noonedeadpunk | but reader totally won't be enough | 18:47 |
spatel | Hmm! | 18:55 |
spatel | Did you see this error? - https://paste.opendev.org/show/btNIQ7wMvpa3BEepwLvI/ | 18:55 |
spatel | I have created template using admin account and now trying to create k8s cluster using my own account spatel | 18:56 |
spatel | I can see the template, then why is it not letting me use that template? | 18:56 |
spatel | Do I need to create a template for each project? that is strange | 18:57 |
noonedeadpunk | um, you have a typo either in config or in endpoints | 18:57 |
noonedeadpunk | I guess it's config | 18:57 |
noonedeadpunk | wrt keystone url | 18:58 |
birbilakos | hello team, I'm new with openstack-ansible and I was wondering how I could deploy magnum AFTER the initial installation of openstack. I.e. made all the config changes in user config but don't really know which playbooks to trigger. Any help? | 18:58 |
noonedeadpunk | birbilakos: have you also edited openstack_user_config? | 18:58 |
noonedeadpunk | and what version? | 18:58 |
noonedeadpunk | lxc? | 18:59 |
birbilakos | yes, it's prepped. I'm using OSA 2023.2 | 18:59 |
birbilakos | yes, on lxc | 18:59 |
spatel | noonedeadpunk are you asking me for keystone url? | 18:59 |
noonedeadpunk | just to doublecheck - when you run /opt/openstack-ansible/scripts/inventory-manage.py -G \| grep magnum - do you have some containers there? | 18:59 |
spatel | export OS_AUTH_URL=http://openstack-bos-2.example.com:5000 | 18:59 |
noonedeadpunk | spatel: you have somewhere extra "%" in keystone url according to the error | 19:00 |
spatel | hmm | 19:00 |
noonedeadpunk | `identity version 3%` - so you have somewhere `http://openstack-bos-2.example.com:5000/3%` or smth like that | 19:00 |
noonedeadpunk | birbilakos: if you have output from inventory-manage.py -G wt to magnum, then you need to: 1. Create containers - openstack-ansible lxc-container-create.yml --limit magnum_all,lxc_hosts | 19:02 |
spatel | I have removed % :) | 19:02 |
spatel | but error is still there | 19:02 |
spatel | ClusterTemplate dc6e16ee-3a2a-4c06-9ac5-e90dff486db1 could not be found (HTTP 404) (Request-ID: req-3371c096-19d2-418b-8210-36b9881325a6) | 19:02 |
noonedeadpunk | spatel: did you make the template public? | 19:02 |
noonedeadpunk | birbilakos: then, you can just run `openstack-ansible os-magnum-install.yml` and it should be enough | 19:03 |
noonedeadpunk | ofc depending on the driver for magnum.... | 19:03 |
spatel | if it's not public then how is it visible? | 19:03 |
birbilakos | thank you noonedeadpunk. Yes, /opt/openstack-ansible/scripts/inventory-manage.py -G |grep magnum returns nothing | 19:03 |
noonedeadpunk | birbilakos: ok, then seems you've missed smth | 19:03 |
noonedeadpunk | oh, well. try /opt/openstack-ansible/dynamic-inventory.py first... | 19:04 |
birbilakos | no, this config change was added now. I know it wasn't part of the initial install | 19:04 |
noonedeadpunk | as probably you did changes but didn't run anything after? | 19:04 |
birbilakos | so the question is how to deploy just this now | 19:04 |
noonedeadpunk | I just wanted to ensure you have defined openstack_user_config properly :) | 19:05 |
spatel | noonedeadpunk you are right.. it was [public] issue | 19:05 |
birbilakos | magnum-infra_hosts: *infrastructure_hosts | 19:05 |
spatel | After fixing that it works | 19:05 |
birbilakos | that's what I added just now, should be enough I guess | 19:05 |
spatel | very odd | 19:05 |
noonedeadpunk | yeah, so as I said - openstack-ansible lxc-container-create.yml --limit magnum_all,lxc_hosts ; openstack-ansible os-magnum-install.yml | 19:05 |
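Put together, the post-install magnum deployment steps from this exchange look roughly like the following sketch for an LXC-based deploy. The playbook names are as given in the channel and may vary slightly between OSA releases; the horizon step only applies if horizon is used.

```shell
# Run from the deploy host after editing openstack_user_config.yml.
# Playbook names as given in the channel; they may differ per OSA release.

# inspect the inventory - magnum containers should now show up
/opt/openstack-ansible/scripts/inventory-manage.py -G | grep magnum

# create the containers, then deploy magnum (and horizon, if used)
openstack-ansible lxc-container-create.yml --limit magnum_all,lxc_hosts
openstack-ansible os-magnum-install.yml
openstack-ansible os-horizon-install.yml
```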
birbilakos | got it, will try | 19:05 |
noonedeadpunk | birbilakos: but if you want cluster-api driver, it will be a bit more steps | 19:05 |
noonedeadpunk | https://docs.openstack.org/openstack-ansible-ops/latest/mcapi.html | 19:06 |
birbilakos | good point... thanks for the link!! | 19:06 |
birbilakos | btw, don't I also need to rerun horizon playbook for the magnum integration to work? | 19:06 |
noonedeadpunk | as there you'd also need to install a "master" k8s cluster inside lxc as well | 19:06 |
noonedeadpunk | if you use horizon - then yes :) | 19:07 |
noonedeadpunk | sorry, I just haven't had it for quite some time so clean forgot about it | 19:07 |
noonedeadpunk | btw. if it's not cluster api - then you'd need to have heat | 19:07 |
noonedeadpunk | and likely you also want to have Octavia with Magnum | 19:07 |
birbilakos | hehe, no worries. I think its openstack-ansible os-horizon-install.yml right? | 19:07 |
noonedeadpunk | yup | 19:08 |
birbilakos | i think octavia requires quite some more changes so I'm not sure if I want to take it up just now | 19:08 |
noonedeadpunk | you might need to add `-e venv_rebuild=true`, though for this case it could be you don't need it... | 19:08 |
birbilakos | I hope this wont change the haproxy configs or certs. I have manually updated them since the initial install | 19:09 |
noonedeadpunk | so for cluster api to access the spawned control plane from the "master" cluster you need either octavia or a proxy service on the computes.... | 19:09 |
noonedeadpunk | it will change some haproxy config | 19:10 |
noonedeadpunk | as the magnum playbooks need to add more backends | 19:10 |
noonedeadpunk | though it should not change existing ones... | 19:10 |
birbilakos | got it. for now I will try the default installation (without cluster api). I think the limitation is that it's very tied to specific k8s versions for this to work and of course no native lb support from openstack (which is ok for my use case). I will study more on the mcapi stuff though | 19:11 |
noonedeadpunk | birbilakos: I think it's not about specific version, rather that these versions are old | 19:11 |
birbilakos | that probably also means only one 'master' node for my k8s envs | 19:11 |
noonedeadpunk | I think latest one that works was 1.28 or smth | 19:12 |
noonedeadpunk | and heat driver is also gonna be deprecated soon by magnum team | 19:12 |
noonedeadpunk | (with kinda no migration path from heat to capi) | 19:12 |
birbilakos | correct | 19:12 |
birbilakos | 1.28 is good enough for now | 19:12 |
birbilakos | so if I get this right, capi installs a minimal control plane lxc container in my infra nodes. Then this is responsible for spinning up the actual clusters, right? | 19:14 |
noonedeadpunk | well, it's not capi but playbooks... but yes | 19:15 |
f0o | noonedeadpunk: I deleted the vhost and reran openstack-ansible os-cinder-install.yml --tags cinder-config; same error :/ | 19:15 |
noonedeadpunk | and then you need to have a separate image for each k8s version | 19:15 |
noonedeadpunk | (along with cluster template) | 19:16 |
birbilakos | looks like this is supported in 2024.1 onwards though, which is a no go until we upgrade :) | 19:16 |
noonedeadpunk | f0o: ugh... I guess I'd get rabbitmqadmin (though you'd need to extend rights for the monitoring user first) and try to see more details about the queue - whether it's durable or not right now. | 19:21 |
noonedeadpunk | if it's in "cinder" vhost (not "/cinder") - it should be durable | 19:21 |
noonedeadpunk | ie https://www.rabbitmq.com/docs/management-cli#get-a-list-of-queues-with-all-the-detail-we-can-take | 19:21 |
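With rabbitmqadmin in place, checking whether the offending exchange and queues ended up durable could look like the sketch below. The credentials are placeholders; `-V` selects the vhost, which per the discussion should be `cinder` (no leading slash).

```shell
# Placeholders: substitute real management credentials.
rabbitmqadmin -u admin -p secret -V cinder list exchanges name durable \
  | grep cinder-scheduler_fanout
rabbitmqadmin -u admin -p secret -V cinder list queues name durable
```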
f0o | if you have a rabbitmqctl command -------- nvmd you just read my mind | 19:21 |
noonedeadpunk | but rabbitmqadmin is not installed out of the box | 19:22 |
f0o | that's correct.. any chance you know the rabbitmqctl equivalent? :D | 19:24 |
noonedeadpunk | there's none afaik | 19:24 |
noonedeadpunk | birbilakos: well, I'm not 100% sure, you might want to double-check with jrosser, but I think it's doable on 2023.2 as well | 19:25 |
f0o | apt-cache search rabbitmqadmin is empty ;/ | 19:26 |
noonedeadpunk | iirc it's just a binary... | 19:28 |
noonedeadpunk | https://www.rabbitmq.com/docs/management-cli | 19:28 |
noonedeadpunk | you can download it either from the rabbitmq management api or from github | 19:29 |
noonedeadpunk | also, there's a UI for it... | 19:29 |
noonedeadpunk | but it's available only on openstack management network by default | 19:29 |
f0o | fun | 19:29 |
noonedeadpunk | so you might need to make a SOCKS proxy through SSH, ie ssh -D 8080 control01 | 19:29 |
noonedeadpunk | and then set 127.0.0.1:8080 as a socks5 proxy, e.g. in firefox... | 19:30 |
noonedeadpunk | and then you can connect to internal vip on .... 15672 ? | 19:30 |
noonedeadpunk | 15671 | 19:31 |
noonedeadpunk | for tls | 19:31 |
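The SOCKS-proxy route to the management UI sketched above, spelled out; the controller hostname is illustrative, and the ports are the ones mentioned in the channel (15671 for TLS, 15672 otherwise).

```shell
# Forward a local SOCKS5 proxy through a controller on the management network
ssh -D 8080 -N control01
# then configure 127.0.0.1:8080 as a SOCKS5 proxy in the browser and open
#   https://<internal-vip>:15671/   (TLS)  or  http://<internal-vip>:15672/
```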
f0o | soooo *** Could not connect: [Errno 111] Connection refused | 19:32 |
f0o | on the same host | 19:32 |
f0o | but it seems that the admin binary is a shell script | 19:33 |
f0o | so maybe I can just hack it | 19:33 |
noonedeadpunk | it's balanced by haproxy iirc | 19:34 |
f0o | but it should work against the local listener no? | 19:36 |
noonedeadpunk | um, yeah, it should | 19:37 |
noonedeadpunk | just tls/non-tls are different ports as well - keep that in mind | 19:37 |
noonedeadpunk | but it somehow feels like you might have one of the cinder services misconfigured (or even "lost"/unmanaged), which creates non-durable queues, while the others expect them to be durable... | 19:38 |
f0o | hrm | 19:39 |
f0o | so best guess stop all cinders everywhere? | 19:39 |
noonedeadpunk | eventually.. another thing to try is to roll back cinder to HA queues, which are not durable. This will drop the "cinder" vhost and create an old-new "/cinder" | 19:39 |
noonedeadpunk | and then revert this... but it's not a great way... | 19:40 |
spatel | noonedeadpunk after more testing I found worker nodes not honoring the AZ | 19:40 |
noonedeadpunk | https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#migrate-between-ha-and-quorum-queues | 19:40 |
f0o | right now I can't even get the admin cli to work... everything says connection refused regardless of -H -P -s -k and the certs | 19:40 |
f0o | I did that migrate tbh | 19:41 |
spatel | I have created large k8s cluster and couple of worker nodes end up in wrong AZ | 19:41 |
f0o | like just before annoying you on irc | 19:41 |
noonedeadpunk | f0o: but you can migrate both ways with this approach | 19:41 |
noonedeadpunk | just instead of the rabbitmq_server / setup_openstack playbooks, run os-cinder-install --tags common-mq,post-install | 19:42 |
spatel | kube_tag=v1.27.4,ingress_controller=octavia,cloud_provider_enabled=true,availability_zone=az1,control_plane_availability_zones=az1 | 19:42 |
noonedeadpunk | and you can also set `cinder_oslomsg_rabbit_quorum_queues: false` in vars so as not to accidentally affect everything else | 19:42 |
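The quorum→HA→quorum round-trip suggested here could be driven roughly as below. This is a sketch: it assumes the standard OSA `user_variables.yml` path, and appending/removing the override with `echo`/`sed` is just shorthand for editing the file by hand.

```shell
# Temporarily pin cinder back to HA queues; per the discussion this drops the
# "cinder" vhost and re-creates the old-style "/cinder"
echo 'cinder_oslomsg_rabbit_quorum_queues: false' >> /etc/openstack_deploy/user_variables.yml
openstack-ansible os-cinder-install.yml --tags common-mq,post-install

# once services settle, remove the override and migrate forward again so the
# vhost and queues are freshly created as durable quorum queues
sed -i '/cinder_oslomsg_rabbit_quorum_queues/d' /etc/openstack_deploy/user_variables.yml
openstack-ansible os-cinder-install.yml --tags common-mq,post-install
```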
f0o | being almost 9pm I'm gonna reiterate this tomorrow on a fresh mind... but I feel like something is off somewhere... I will try like you suggested with just stopping all cinder services and do migration | 19:44 |
f0o | I sort of have a suspicion | 19:44 |
f0o | there's some HPE appliance that might be the cause here for cinder-backups | 19:45 |
f0o | but i can nuke it | 19:45 |
f0o | just not at 9pm heh | 19:45 |
noonedeadpunk | yeah... I totally get that :) | 19:47 |
noonedeadpunk | and I also will sign out - as you said - it's 9pm almost :D | 19:48 |
f0o | I do appreciate the help tho :) | 20:01 |
opendevreview | Merged openstack/openstack-ansible-os_glance master: Remove support for amqp1 https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/938448 | 20:32 |
opendevreview | Merged openstack/ansible-role-qdrouterd master: Retire qdrouterd role https://review.opendev.org/c/openstack/ansible-role-qdrouterd/+/938192 | 21:16 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!