opendevreview | Merged openstack/ansible-role-systemd_networkd stable/2024.2: Restart systemd-networkd on routes changes https://review.opendev.org/c/openstack/ansible-role-systemd_networkd/+/938699 | 00:51 |
opendevreview | Merged openstack/ansible-role-systemd_networkd stable/2024.1: Restart systemd-networkd on routes changes https://review.opendev.org/c/openstack/ansible-role-systemd_networkd/+/938700 | 01:28 |
noonedeadpunk | good morning | 07:39 |
jrosser | o/ morning | 07:50 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-httpd master: Initial commit to the role https://review.opendev.org/c/openstack/ansible-role-httpd/+/938245 | 10:41 |
noonedeadpunk | fwiw, I can't reproduce failure for skyline with httpd role | 11:09 |
f0o | Happy 2025! | 13:45 |
f0o | Unfortunately cinder-scheduler decided to die on me without any changes... we just rebooted the instances one by one to apply OS updates without touching the lxc; now it spews this: | 13:46 |
f0o | Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'cinder-scheduler_fanout' in vhost 'cinder': received 'false' but current is 'true' | 13:46 |
f0o | I've vaguely seen this before but for the love of me cannot remember where or how I resolved it last time | 13:46 |
f0o | anyone got pointers? | 13:46 |
f0o | hrm maybe I can just bump it all to 2024.2 and see if that just fixes it... although I would really like to know why it's even happening... google returns only bugs from 2016 | 14:04 |
noonedeadpunk | f0o: it seems that some cinder service was somehow reconfigured for HA queues (instead of Quorum) on the new vhost | 15:57 |
noonedeadpunk | for 2024.1 we made a migration from HA queues (which are non-durable) to quorum (which are durable). As this is a destructive and tricky process, the best way to do so was to drop the `/cinder` vhost and create a new `cinder` (without /) | 15:58 |
noonedeadpunk | then before service restart, they were re-configured to use quorum queues, as the durable flag is set on initial creation and can't be easily changed | 16:00 |
noonedeadpunk | So I guess I'd check if all cinder services do have quorum enabled in cinder.conf and use the `cinder` vhost, and then re-create the vhost in rabbitmq | 16:01 |
noonedeadpunk | upgrade to 2024.2 won't give you anything with this regard | 16:01 |
spatel | jrosser do you know how to update existing k8s cluster template values? | 16:58 |
spatel | I want to add a new ssh-key pair to the cluster template but not sure whether I am doing it the right way or not | 16:58 |
spatel | openstack coe cluster template <template_name> update keypair_id=jmp1-key | 16:59 |
spatel | Got it - openstack coe cluster template update k8s-v1.27.4-private add keypair_id=jmp1-key | 17:00 |
spatel | ClusterTemplate dc6e16ee-3a2a-4c06-9ac5-e90dff486db1 is referenced by one or multiple clusters (HTTP 400) (Request-ID: req-5c756cc2-313d-49d4-bfe3-76c28d60faa4) | 17:00 |
spatel | look like template in use so I can't add | 17:00 |
noonedeadpunk | yeah, I don't think you can update a template in-use indeed | 17:04 |
f0o | noonedeadpunk: thanks for the pointer! will investigate | 17:30 |
f0o | i recall some rabbitmq migration steps to 2024.1 which I performed. can I simply drop the vhost and rerun those steps? | 17:32 |
spatel | noonedeadpunk I am all set there | 17:33 |
spatel | noonedeadpunk but now I am seeing strange issue related AZ stuff with mcapi | 17:33 |
spatel | I have multiple AZs and somehow mcapi doesn't honor my AZ and randomly creates nodes in different AZs | 17:33 |
spatel | This is my label on template - kube_tag=v1.27.4,ingress_controller=octavia,cloud_provider_enabled=true,availability_zone=az1 | 17:34 |
spatel | I want all my k8s nodes to go to AZ1 | 17:34 |
noonedeadpunk | f0o: you can re-run os-cinder-install --tags cinder-config, which ensures correct configuration, and re-create the vhost, yes | 17:40 |
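The recovery noonedeadpunk outlines above can be sketched as a shell sequence. This is a hedged sketch assuming a standard OSA deployment; the `cinder` vhost/user names and the config path are assumptions, and the steps run on a cinder node, inside a rabbitmq container, and on the deploy host respectively.

```shell
# Hedged sketch of the recovery described above; names/paths are assumptions.

# 1. On each cinder node: confirm quorum queues are on and the vhost has no
#    leading slash in the transport_url
grep -E 'rabbit_quorum_queue|transport_url' /etc/cinder/cinder.conf

# 2. In a rabbitmq container: drop and re-create the vhost so the exchanges
#    and queues get re-declared with the durable flag
rabbitmqctl delete_vhost cinder
rabbitmqctl add_vhost cinder
rabbitmqctl set_permissions -p cinder cinder '.*' '.*' '.*'

# 3. On the deploy host: re-apply the correct cinder configuration
openstack-ansible os-cinder-install.yml --tags cinder-config
```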
noonedeadpunk | spatel: are those worker or controller nodes? | 17:41 |
spatel | controller | 17:41 |
noonedeadpunk | as az is respected only for workers | 17:41 |
noonedeadpunk | controllers are indeed spread randomly... | 17:41 |
spatel | so what is the solution to put controller on right AZ? | 17:42 |
noonedeadpunk | I think I saw something in code to restrict controllers though | 17:42 |
spatel | control_avaibility_zone or something? | 17:42 |
noonedeadpunk | `control_plane_availability_zones` - yeah | 17:43 |
noonedeadpunk | https://github.com/vexxhost/magnum-cluster-api/blob/5c40b607e609f93842d9dd3e9d4b2a70b45f844d/magnum_cluster_api/resources.py#L2896-L2899 | 17:43 |
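For reference, the label set spatel and noonedeadpunk converge on could be applied at template creation time roughly like this. This is an illustrative sketch: the template, image, and network names are invented, and `control_plane_availability_zones` is honored by the magnum-cluster-api driver per the code link above.

```shell
# Illustrative only: template/image/network names are made up.
openstack coe cluster template create k8s-v1.27.4-az1 \
  --coe kubernetes \
  --image fedora-coreos-38 \
  --external-network public \
  --labels kube_tag=v1.27.4,ingress_controller=octavia,cloud_provider_enabled=true,availability_zone=az1,control_plane_availability_zones=az1
```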
spatel | Perfect! let me give it a try and see | 17:43 |
f0o | thanls noonedeadpunk ! | 18:07 |
f0o | cant type on phone | 18:07 |
spatel | noonedeadpunk that works! | 18:09 |
spatel | control_plane_availability_zones | 18:09 |
spatel | now one more issue related openstack server group | 18:09 |
spatel | mcapi doesn't delete the server group. I have to delete it by hand (manually) | 18:10 |
noonedeadpunk | well, we have another issue as well: when trying to scale the cluster back, the capi driver marks all of the workers as tainted, regardless of the minimal workers value | 18:18 |
noonedeadpunk | ofc k8s does not delete them, but magnum still loses them from its db | 18:18 |
spatel | hmm | 18:42 |
spatel | noonedeadpunk What permission do I have to give a normal user to create a k8s cluster? | 18:42 |
spatel | I gave the reader role to the user but still getting a permission error - openstack role add --user spatel --project production reader | 18:42 |
noonedeadpunk | at very least member... but also there could be some magnum specific roles - can't recall tbh | 18:43 |
noonedeadpunk | but could be it's only for heat driver... | 18:43 |
jrosser | afaik we have not had to do anything special with permissions, just regular user is ok | 18:46 |
spatel | hmm | 18:46 |
noonedeadpunk | but reader totally won't be enough | 18:47 |
spatel | Hmm! | 18:55 |
spatel | Did you see this error? - https://paste.opendev.org/show/btNIQ7wMvpa3BEepwLvI/ | 18:55 |
spatel | I have created template using admin account and now trying to create k8s cluster using my own account spatel | 18:56 |
spatel | I can see the template, then why is it not letting me use that template? | 18:56 |
spatel | Do I need to create a template for each project? that is strange | 18:57 |
noonedeadpunk | um, you have a typo either in config or in endpoints | 18:57 |
noonedeadpunk | I guess it's config | 18:57 |
noonedeadpunk | wrt keystone url | 18:58 |
birbilakos | hello team, I'm new with openstack-ansible and I was wondering how I could deploy magnum AFTER the initial installation of openstack. I.e. made all the config changes in user config but don't really know which playbooks to trigger. Any help? | 18:58 |
noonedeadpunk | birbilakos: have you also edited openstack_user_config? | 18:58 |
noonedeadpunk | and what version? | 18:58 |
noonedeadpunk | lxc? | 18:59 |
birbilakos | yes, it's prepped. I'm using OSA 2023.2 | 18:59 |
birbilakos | yes, on lxc | 18:59 |
spatel | noonedeadpunk are you asking me for keystone url? | 18:59 |
noonedeadpunk | just to doublecheck - when you run /opt/openstack-ansible/scripts/inventory-manage.py -G \| grep magnum - do you have some containers there? | 18:59 |
spatel | export OS_AUTH_URL=http://openstack-bos-2.example.com:5000 | 18:59 |
noonedeadpunk | spatel: you have somewhere extra "%" in keystone url according to the error | 19:00 |
spatel | hmm | 19:00 |
noonedeadpunk | `identity version 3%` - so you have somewhere `http://openstack-bos-2.example.com:5000/3%` or smth like that | 19:00 |
noonedeadpunk | birbilakos: if you have output from inventory-manage.py -G wt to magnum, then you need to: 1. Create containers - openstack-ansible lxc-container-create.yml --limit magnum_all,lxc_hosts | 19:02 |
spatel | I have removed % :) | 19:02 |
spatel | but error is still there | 19:02 |
spatel | ClusterTemplate dc6e16ee-3a2a-4c06-9ac5-e90dff486db1 could not be found (HTTP 404) (Request-ID: req-3371c096-19d2-418b-8210-36b9881325a6) | 19:02 |
noonedeadpunk | spatel: did you make the template public? | 19:02 |
noonedeadpunk | birbilakos: then, you can just run `openstack-ansible os-magnum-install.yml` and it should be enough | 19:03 |
noonedeadpunk | ofc depending on the driver for magnum.... | 19:03 |
spatel | if it's not public then how is it visible? | 19:03 |
birbilakos | thank you noonedeadpunk. Yes, /opt/openstack-ansible/scripts/inventory-manage.py -G |grep magnum returns nothing | 19:03 |
noonedeadpunk | birbilakos: ok, then seems you've missed smth | 19:03 |
noonedeadpunk | oh, well. try /opt/openstack-ansible/dynamic-inventory.py first... | 19:04 |
birbilakos | no, this config change was added now. I know it wasn't part of the initial install | 19:04 |
noonedeadpunk | as probably you did changes but didn't run anything after? | 19:04 |
birbilakos | so the question is how to deploy just this now | 19:04 |
noonedeadpunk | I just wanted to ensure you have defined openstack_user_config properly :) | 19:05 |
spatel | noonedeadpunk you are right.. it was [public] issue | 19:05 |
birbilakos | magnum-infra_hosts: *infrastructure_hosts | 19:05 |
spatel | After fixing that it works | 19:05 |
birbilakos | that's what I added just now, should be enough I guess | 19:05 |
spatel | very odd | 19:05 |
noonedeadpunk | yeah, so as I said - openstack-ansible lxc-container-create.yml --limit magnum_all,lxc_hosts ; openstack-ansible os-magnum-install.yml | 19:05 |
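Put together, the post-install magnum deployment steps from this exchange look roughly like the following sketch for an LXC-based deploy. The playbook names are as given in the channel and may vary slightly between OSA releases; the horizon step only applies if horizon is used.

```shell
# Run from the deploy host after editing openstack_user_config.yml.
# Playbook names as given in the channel; they may differ per OSA release.

# inspect the inventory - magnum containers should now show up
/opt/openstack-ansible/scripts/inventory-manage.py -G | grep magnum

# create the containers, then deploy magnum (and horizon, if used)
openstack-ansible lxc-container-create.yml --limit magnum_all,lxc_hosts
openstack-ansible os-magnum-install.yml
openstack-ansible os-horizon-install.yml
```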
birbilakos | got it, will try | 19:05 |
noonedeadpunk | birbilakos: but if you want cluster-api driver, it will be a bit more steps | 19:05 |
noonedeadpunk | https://docs.openstack.org/openstack-ansible-ops/latest/mcapi.html | 19:06 |
birbilakos | good point... thanks for the link!! | 19:06 |
birbilakos | btw, don't I also need to rerun horizon playbook for the magnum integration to work? | 19:06 |
noonedeadpunk | as there you'd also need to install a "master" k8s cluster inside lxc as well | 19:06 |
noonedeadpunk | if you use horizon - then yes :) | 19:07 |
noonedeadpunk | sorry, I just haven't had it for quite some time so clean forgot about it | 19:07 |
noonedeadpunk | btw. if it's not cluster api - then you'd need to have heat | 19:07 |
noonedeadpunk | and likely you also want to have Octavia with Magnum | 19:07 |
birbilakos | hehe, no worries. I think its openstack-ansible os-horizon-install.yml right? | 19:07 |
noonedeadpunk | yup | 19:08 |
birbilakos | i think octavia requires quite some more changes so I'm not sure if I want to take it up just now | 19:08 |
noonedeadpunk | you might need to add `-e venv_rebuild=true`, though for this case it could be you don't need it... | 19:08 |
birbilakos | I hope this wont change the haproxy configs or certs. I have manually updated them since the initial install | 19:09 |
noonedeadpunk | so for cluster api to access the spawned control plane from the "master" cluster you need either octavia or a proxy service on the computes.... | 19:09 |
noonedeadpunk | it will change some haproxy config | 19:10 |
noonedeadpunk | as the magnum playbooks need to add more backends | 19:10 |
noonedeadpunk | though it should not change existing ones... | 19:10 |
birbilakos | got it. for now I will try the default installation (without cluster api). I think the limitation is that it's very tied to specific k8s versions for this to work and of course no native lb support from openstack (which is ok for my use case). I will study more on the mcapi stuff though | 19:11 |
noonedeadpunk | birbilakos: I think it's not about specific version, rather that these versions are old | 19:11 |
birbilakos | that probably also means only one 'master' node for my k8s envs | 19:11 |
noonedeadpunk | I think latest one that works was 1.28 or smth | 19:12 |
noonedeadpunk | and heat driver is also gonna be deprecated soon by magnum team | 19:12 |
noonedeadpunk | (with kinda no migration path from heat to capi) | 19:12 |
birbilakos | correct | 19:12 |
birbilakos | 1.28 is good enough for now | 19:12 |
birbilakos | so if I get this right, capi installs a minimal control plane lxc container in my infra nodes. Then this is responsible for spinning up the actual clusters, right? | 19:14 |
noonedeadpunk | well, it's not capi but playbooks... but yes | 19:15 |
f0o | noonedeadpunk: I deleted the vhost and reran openstack-ansible os-cinder-install.yml --tags cinder-config; same error :/ | 19:15 |
noonedeadpunk | and then you need to have a separate image for each k8s version | 19:15 |
noonedeadpunk | (along with cluster template) | 19:16 |
birbilakos | looks like this is supported in 2024.1 onwards though, which is a no go until we upgrade :) | 19:16 |
noonedeadpunk | f0o: ugh... I guess I'd get rabbitmqadmin (though you'd need to extend rights for the monitoring user first) and try to see more details about the queue - whether it's durable or not right now. | 19:21 |
noonedeadpunk | if it's in "cinder" vhost (not "/cinder") - it should be durable | 19:21 |
noonedeadpunk | ie https://www.rabbitmq.com/docs/management-cli#get-a-list-of-queues-with-all-the-detail-we-can-take | 19:21 |
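With rabbitmqadmin in place, checking whether the offending exchange and queues ended up durable could look like the sketch below. The credentials are placeholders; `-V` selects the vhost, which per the discussion should be `cinder` (no leading slash).

```shell
# Placeholders: substitute real management credentials.
rabbitmqadmin -u admin -p secret -V cinder list exchanges name durable \
  | grep cinder-scheduler_fanout
rabbitmqadmin -u admin -p secret -V cinder list queues name durable
```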
f0o | if you have a rabbitmqctl command -------- nvmd you just read my mind | 19:21 |
noonedeadpunk | but rabbitmqadmin is not installed out of the box | 19:22 |
f0o | that's correct.. any chance you know the rabbitmqctl equivalent? :D | 19:24 |
noonedeadpunk | there's none afaik | 19:24 |
noonedeadpunk | birbilakos: well, I'm not 100% sure, you might want to double-check with jrosser, but I think it's doable on 2023.2 as well | 19:25 |
f0o | apt-cache search rabbitmqadmin is empty ;/ | 19:26 |
noonedeadpunk | iirc it's just a binary... | 19:28 |
noonedeadpunk | https://www.rabbitmq.com/docs/management-cli | 19:28 |
noonedeadpunk | you can download it either from the rabbitmq management api or from github | 19:29 |
noonedeadpunk | also, there's a UI for it... | 19:29 |
noonedeadpunk | but it's available only on openstack management network by default | 19:29 |
f0o | fun | 19:29 |
noonedeadpunk | so you might need to make a SOCKS proxy through SSH, ie ssh -D 8080 control01 | 19:29 |
noonedeadpunk | and then set 127.0.0.1:8080 as a socks5 proxy, e.g. in firefox... | 19:30 |
noonedeadpunk | and then you can connect to internal vip on .... 15672 ? | 19:30 |
noonedeadpunk | 15671 | 19:31 |
noonedeadpunk | for tls | 19:31 |
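The SOCKS-proxy route to the management UI sketched above, spelled out; the controller hostname is illustrative, and the ports are the ones mentioned in the channel (15671 for TLS, 15672 otherwise).

```shell
# Forward a local SOCKS5 proxy through a controller on the management network
ssh -D 8080 -N control01
# then configure 127.0.0.1:8080 as a SOCKS5 proxy in the browser and open
#   https://<internal-vip>:15671/   (TLS)  or  http://<internal-vip>:15672/
```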
f0o | soooo *** Could not connect: [Errno 111] Connection refused | 19:32 |
f0o | on the same host | 19:32 |
f0o | but it seems that the admin binary is a shell script | 19:33 |
f0o | so maybe I can just hack it | 19:33 |
noonedeadpunk | it's balanced by haproxy iirc | 19:34 |
f0o | but it should work against the local listener no? | 19:36 |
noonedeadpunk | um, yeah, it should | 19:37 |
noonedeadpunk | just tls/non-tls are different ports as well - keep that in mind | 19:37 |
noonedeadpunk | but it somehow feels like you might have one of the cinder services misconfigured (or even "lost"/unmanaged), which creates non-durable queues, while the others expect them to be durable... | 19:38 |
f0o | hrm | 19:39 |
f0o | so best guess stop all cinders everywhere? | 19:39 |
noonedeadpunk | eventually.. another thing to try is to roll back cinder to HA queues, which are not durable. This will drop the "cinder" vhost and create an old-new "/cinder" | 19:39 |
noonedeadpunk | and then revert this... but it's not a great way... | 19:40 |
spatel | noonedeadpunk after more testing I found worker nodes not honoring the AZ | 19:40 |
noonedeadpunk | https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#migrate-between-ha-and-quorum-queues | 19:40 |
f0o | right now I can't even get the admin cli to work... everything says connection refused regardless of -H -P -s -k and the certs | 19:40 |
f0o | I did that migrate tbh | 19:41 |
spatel | I have created large k8s cluster and couple of worker nodes end up in wrong AZ | 19:41 |
f0o | like just before annoying you on irc | 19:41 |
noonedeadpunk | f0o: but you can migrate both ways with this approach | 19:41 |
noonedeadpunk | just instead of the rabbitmq_server / setup_openstack playbooks, run os-cinder-install --tags common-mq,post-install | 19:42 |
spatel | kube_tag=v1.27.4,ingress_controller=octavia,cloud_provider_enabled=true,availability_zone=az1,control_plane_availability_zones=az1 | 19:42 |
noonedeadpunk | and you can also set `cinder_oslomsg_rabbit_quorum_queues: false` in vars so as not to accidentally affect everything else | 19:42 |
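The quorum→HA→quorum round-trip suggested here could be driven roughly as below. This is a sketch: it assumes the standard OSA `user_variables.yml` path, and appending/removing the override with `echo`/`sed` is just shorthand for editing the file by hand.

```shell
# Temporarily pin cinder back to HA queues; per the discussion this drops the
# "cinder" vhost and re-creates the old-style "/cinder"
echo 'cinder_oslomsg_rabbit_quorum_queues: false' >> /etc/openstack_deploy/user_variables.yml
openstack-ansible os-cinder-install.yml --tags common-mq,post-install

# once services settle, remove the override and migrate forward again so the
# vhost and queues are freshly created as durable quorum queues
sed -i '/cinder_oslomsg_rabbit_quorum_queues/d' /etc/openstack_deploy/user_variables.yml
openstack-ansible os-cinder-install.yml --tags common-mq,post-install
```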
f0o | being almost 9pm I'm gonna reiterate this tomorrow on a fresh mind... but I feel like something is off somewhere... I will try like you suggested with just stopping all cinder services and do migration | 19:44 |
f0o | I sort of have a suspicion | 19:44 |
f0o | there's some HPE appliance that might be the cause here for cinder-backups | 19:45 |
f0o | but i can nuke it | 19:45 |
f0o | just not at 9pm heh | 19:45 |
noonedeadpunk | yeah... I totally get that :) | 19:47 |
noonedeadpunk | and I also will sign out - as you said - it's 9pm almost :D | 19:48 |
f0o | I do appreciate the help tho :) | 20:01 |
opendevreview | Merged openstack/openstack-ansible-os_glance master: Remove support for amqp1 https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/938448 | 20:32 |
opendevreview | Merged openstack/ansible-role-qdrouterd master: Retire qdrouterd role https://review.opendev.org/c/openstack/ansible-role-qdrouterd/+/938192 | 21:16 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!