Wednesday, 2024-12-18

09:07 <derekokeeffe85> Morning all, hope all is well. I've been trying to add a compute node to a current cluster and have run into some issues. If anyone has a few minutes, could you take a look at this please and point me in the right direction? Thanks in advance. https://paste.openstack.org/show/bRfClW31LFWDg7iMWt0K/
09:29 <kleini> First: deleting openstack_inventory.json is a bad idea. It stores the chosen IP addresses for the LXC containers running on the infra nodes. Running dynamic_inventory.py chooses random IPs from the management network and assigns them to the LXC containers, so every container now got a new management network IP. Furthermore, the last part of each container's hostname is a random suffix. Those random suffixes are stored in openstack_inventory.json too, and they are also lost when the inventory is deleted. This means your deployment is not able to find the existing LXC containers and will create new ones, doubling them.
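For reference, each container entry in openstack_inventory.json ties the generated container name (physical host, service, plus that random suffix) to its assigned management IP. The snippet below only illustrates the general shape; the exact keys and values vary by release and deployment:

    "infra1_galera_container-a1b2c3d4": {
        "ansible_host": "172.29.239.17",
        "container_name": "infra1_galera_container-a1b2c3d4",
        "physical_host": "infra1",
        "component": "galera"
    }

Deleting the file throws away both the random suffixes and the IP assignments, which is why the next run generates brand-new containers instead of finding the existing ones.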
09:30 <kleini> The first thing you need to fix is to restore your openstack_inventory.json.
09:31 <kleini> Second: adding a compute node is very well described: https://docs.openstack.org/openstack-ansible/2023.1/admin/scale-environment.html#add-a-compute-host
09:31 <kleini> Just follow that guide.
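In essence, that guide has you add the new node to the compute_hosts stanza in /etc/openstack_deploy/openstack_user_config.yml before running any playbooks. A minimal sketch, with purely illustrative hostnames and management IPs:

    compute_hosts:
      compute1:
        ip: 172.29.236.11
      compute2:
        ip: 172.29.236.12
      compute3:
        ip: 172.29.236.13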
09:53 <derekokeeffe85> Hi kleini, I have done exactly as that link describes, but the services are being skipped; that's what I don't understand and am asking for some help with. Regarding the inventory file, I have a backup of it so I can restore it. So how does that get updated to have the new compute3 added?
09:56 <kleini> The inventory is updated by running any playbook; you don't need to care about it. And sorry to have to say it this way, but you didn't follow that guide. The limit parameter needs to contain localhost, meaning the deployment host.
09:59 <kleini> You must not change the openstack-ansible commands described in that guide. They have essential differences from the openstack-ansible commands you wrote down in your paste.
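For reference, the scale-environment guide runs the standard playbooks with a limit that includes both the deployment host and the new node, roughly like this (compute3 stands in for the new host's name):

    cd /opt/openstack-ansible/playbooks
    openstack-ansible setup-hosts.yml --limit localhost,compute3
    openstack-ansible setup-infrastructure.yml --limit localhost,compute3
    openstack-ansible setup-openstack.yml --limit localhost,compute3

Dropping localhost from the limit is exactly the deviation kleini is pointing at: some plays need the deployment host in scope to generate configuration correctly.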
09:59 <derekokeeffe85> Sorry, I had copied those 3 commands from a document I was writing and had put them in wrong. I did follow that link, and I did in fact run the script as a second alternative. The first two playbooks ran through fine and the 3rd skips the nova install etc.; maybe what you have pointed out was the reason. I also didn't know that the inventory was updated by running the playbooks. I will restore the original inventory and run again (which I did at the very beginning when following that guide). Thanks for your input.
10:35 <kleini> Restore openstack_inventory.json and then follow those instructions. Maybe you need to adjust the openstack-ansible version in the URL of the guide to match the version you're actually running in your deployment.
11:03 <jrosser> derekokeeffe85: if there were no hosts matched, then perhaps your inventory is still broken
11:04 <jrosser> you can use openstack-ansible/scripts/inventory-manage.py to view the inventory and see if your compute hosts appear where you expect them to
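A quick way to do that from the deployment host is sketched below; -f points at the inventory file and -l lists the hosts the dynamic inventory knows about (the exact flags can differ between releases, so check ./scripts/inventory-manage.py --help on your own checkout):

    cd /opt/openstack-ansible
    ./scripts/inventory-manage.py -f /etc/openstack_deploy/openstack_inventory.json -l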
11:47 <derekokeeffe85> Thanks, I'll take a look at that. I did restore the inventory, ran a playbook, checked the inventory and compute3 was there (as I said, I didn't know this was how it worked), but the 3rd command (openstack-ansible setup-openstack.yml --limit localhost,compute3) still skipped the services as far as I can see, unless I'm reading the output wrong. I'll do as you both say and maybe later stick the output in a paste, if you wouldn't mind taking a look.
11:54 <jrosser> derekokeeffe85: you can see one of the tasks that was skipped here: https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/playbooks/nova.yml#L16-L20
11:54 <jrosser> and you can see that the target group was "nova_all"
11:55 <jrosser> we can maybe infer from this that nova_all combined with --limit compute3 returns no hosts
11:55 <jrosser> i.e. compute3 is not in the nova_all group
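One way to confirm that theory without running the whole play is to ask ansible-playbook which hosts each play would match; --list-hosts is a standard ansible-playbook option that the openstack-ansible wrapper should pass straight through (compute3 here assumes the new node's inventory name):

    cd /opt/openstack-ansible/playbooks
    openstack-ansible setup-openstack.yml --limit localhost,compute3 --list-hosts

If the nova plays come back with "hosts (0)", the inventory grouping is the problem rather than the playbooks themselves.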
11:57 <derekokeeffe85> Hmm, ok jrosser, but it does actually create the nova dir and add nova.conf... it then fails saying there's no nova-compute service. Could it be looking for compute3 in different places? I only have it added in the compute_hosts in openstack_user_config.yml
12:00 <jrosser> have you checked which groups the tasks that do succeed were targeting?
12:05 <derekokeeffe85> I've been trying to dig around in the playbooks but found nothing definite (I don't know how you all developed this :)), so I ran the manage inventory script and can see compute3 there with the same output as the working computes. This is the output from the setup-openstack playbook: https://paste.openstack.org/show/bajR5O4dUtgMqGcX4mEJ/
13:42 <derekokeeffe85> jrosser: there are no hosts at all in nova_all in openstack_user_config: "nova_all": { <OMITTED> "hosts": [] }
13:42 <derekokeeffe85> Sorry, I meant openstack_inventory.json
13:44 <jrosser> have you also replaced the relevant parts of env.d / conf.d in your openstack_deploy directory?
13:46 <kleini> the inventory should look like this: https://paste.opendev.org/show/byZ3gYLEDgQPimyrT0tm/
13:54 <derekokeeffe85> Yep, group_compute_hosts & nova_compute show the 3 computes, jrosser. I didn't change/delete anything in there, kleini; they all have .example or .aio extensions, in both dirs.
13:55 <kleini> nova_all in the inventory should have children, and those should contain nova_compute
13:55 <kleini> and you already confirmed that nova_compute contains your new compute node
14:04 <jrosser> derekokeeffe85: it would be most helpful if you were comparing to a working deployment
14:04 <jrosser> because, for example, nova_all does indeed not have any hosts itself, but it has child groups
14:04 <jrosser> one of which should be nova_compute
14:05 <jrosser> and in the nova_compute group you should find your actual compute hosts under hosts[]
14:05 <jrosser> if you don't, your inventory is still incorrect
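Put together, the relevant part of a healthy openstack_inventory.json looks roughly like the sketch below; the exact list of child groups varies by release, and the host names are illustrative:

    "nova_all": {
        "children": ["nova_api_os_compute", "nova_conductor", "nova_scheduler", "nova_compute"],
        "hosts": []
    },
    "nova_compute": {
        "children": [],
        "hosts": ["compute1", "compute2", "compute3"]
    }

The playbook targets nova_all, so the new node is only picked up if it reaches nova_all through one of those child groups.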
14:17 <derekokeeffe85> kleini: that's correct, in my inventory nova_all contains nova_compute. jrosser: this is the backup of etc/openstack_deploy that deployed this current cluster, which has been working for 2 years now. Maybe we thought it was working but we didn't configure it properly from the get-go. In my inventory right now I do have nova_compute with no children, but the 3 computes are there under hosts :(
14:19 <derekokeeffe85> ahhhhh I think I may have found something!!! Does nova-compute depend on a bridge being configured? My networking might be the problem, could this be something?
14:20 <jrosser> you have to do whatever is specified in "setup the target hosts" in the documentation, adjusting for your own local networking choices
14:21 <jrosser> it's pretty obvious where things are failing in your earlier paste https://paste.openstack.org/show/bajR5O4dUtgMqGcX4mEJ/
14:21 <jrosser> this doesn't really seem to be about either the inventory or the network
14:22 <jrosser> i am very confused about what you're actually asking now :(
14:27 <derekokeeffe85> In my inventory there's a network with container_bridge: br-storage, group_binds: [glance_api, cinder_api, cinder_volume, nova_compute], and my bridge is missing. I don't have br-storage; for some reason all the networking we had built with netplan is messed up. I've taken up too much of your time on this now, so I'll go away and fix this issue first. Thanks for the help today, both.
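For anyone following along: that group_binds entry means hosts in nova_compute are expected to have a br-storage bridge configured on the host itself. A minimal netplan sketch, with a purely hypothetical physical interface name and storage-network address:

    network:
      version: 2
      ethernets:
        ens4: {}
      bridges:
        br-storage:
          interfaces: [ens4]
          addresses: [172.29.244.13/22]

The actual interface, VLAN, and addressing have to match the deployment's own cidr_networks/provider_networks settings in openstack_user_config.yml.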
15:49 <opendevreview> Jonathan Rosser proposed openstack/openstack-ansible-ops master: Test all supported versions of k8s workload cluster with magnum-cluster-api  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/916649
16:48 <opendevreview> Jonathan Rosser proposed openstack/openstack-ansible-ops master: Test all supported versions of k8s workload cluster with magnum-cluster-api  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/916649
17:24 <jrosser> noonedeadpunk: i found the difference between the working/broken rabbitmq installs https://paste.opendev.org/show/bNgQS66qbtwLlvYptGvU/
17:25 <jrosser> the jobs that work have something (doesn't matter what, really) that causes a 'changed' task, and so the handler executes to restart rabbitmq on the upgraded installation
17:25 <jrosser> the jobs that break do not have any changed task; everything is 'ok' instead
17:26 <jrosser> so the problem really starts here: https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/0e009192879ee4f9c07d5cdc08d7071d4cd6cf2c/tasks/rabbitmq_upgrade_prep.yml#L34-L37
17:26 <jrosser> we have a task there that unconditionally stops rabbitmq before an upgrade
17:27 <jrosser> but there is nothing that unconditionally restarts it afterwards, only some "accident" like a config file update, or a new version of the package actually being installed
17:28 <jrosser> i expect that at this point, when we have just branched, the version is equal on the N and N-1 branches, so that's a fairly serious issue: unless the version or something else changes, it will never restart
17:29 <jrosser> likely also why we did not see this before on upgrade jobs, as the versions would have been different on N and N-1
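To illustrate the failure mode being described (a simplified sketch, not the actual role code): a pre-upgrade task stops the service unconditionally, but the only thing that starts it again is a notify-driven handler, so a run where nothing reports 'changed' leaves rabbitmq stopped.

    # Pre-upgrade task: always runs, always stops the service.
    - name: Stop rabbitmq-server before upgrade
      ansible.builtin.service:
        name: rabbitmq-server
        state: stopped

    # Later tasks only notify the restart handler when they actually change something.
    - name: Install rabbitmq-server package
      ansible.builtin.package:
        name: rabbitmq-server
        state: latest
      notify: Restart rabbitmq-server

    # In handlers/main.yml: never fires if every task above reports 'ok'.
    - name: Restart rabbitmq-server
      ansible.builtin.service:
        name: rabbitmq-server
        state: restarted

One possible fix along these lines would be an unconditional task with state: started at the end of the upgrade path, so the service comes back even when no task reports a change.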
