derekokeeffe85 | Morning all, hope all is well. I've been trying to add a compute node to an existing cluster and have run into some issues. If anyone has a few minutes, could you take a look at this please and point me in the right direction? Thanks in advance. https://paste.openstack.org/show/bRfClW31LFWDg7iMWt0K/ | 09:07 |
kleini | First, deleting openstack_inventory.json is a bad idea. It stores the chosen IP addresses for the LXC containers running on the infra nodes. Running dynamic_inventory.py chooses random IPs from the management network and assigns them to the LXC containers, so every container now gets a new management network IP. Furthermore, the last part of each container's hostname is a random string. These random values are stored in openstack_inventory.json too. They also get | 09:29 |
kleini | lost when you delete the inventory. This means your deployment is not able to find the existing LXC containers and will create new ones. This doubles them. | 09:29 |
kleini | The first thing you need to fix is to restore your openstack_inventory.json. | 09:30 |
kleini | Second: Adding a compute node is very well described: https://docs.openstack.org/openstack-ansible/2023.1/admin/scale-environment.html#add-a-compute-host | 09:31 |
kleini | Just follow that guide. | 09:31 |
derekokeeffe85 | Hi kleini, I have done exactly as that link describes, but the services are being skipped, that's what I don't understand and am asking for some help with. Regarding the inventory file, I have a backup of it so I can restore it. So how does that get updated to have the new compute3 added? | 09:53 |
kleini | The inventory is updated by running any playbook. You don't need to care about it. And sorry to have to say it this way, but you didn't follow that guide. The limit parameter needs to contain localhost, meaning the deployment host. | 09:56 |
kleini | You must not change the openstack-ansible commands described in that guide. They have essential differences from the openstack-ansible commands you wrote down in your paste. | 09:59 |
derekokeeffe85 | Sorry, I had copied those 3 commands from a document I was writing and had put them in wrong. I did follow that link, and I did in fact run the script as a second alternative. The first two playbooks ran through fine and the 3rd skips the nova install etc., maybe what you have pointed out was the reason. I also didn't know that the inventory was updated by running the playbooks. I will restore the original inventory | 09:59 |
derekokeeffe85 | and run again (which I did at the very beginning when following that guide). Thanks for your input. | 09:59 |
kleini | Restore openstack_inventory.json and then follow those instructions. Maybe you need to adjust the openstack-ansible version in the URL of the guide to the version you're actually running in your deployment. | 10:35 |
jrosser | derekokeeffe85: if there were no hosts matched, then perhaps your inventory is still broken | 11:03 |
jrosser | you can use openstack-ansible/scripts/inventory-manage.py to view the inventory and see if your compute hosts appear where you expect them to | 11:04 |
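Besides inventory-manage.py, the same check can be done by reading openstack_inventory.json directly. Below is a sketch, assuming the usual OSA inventory layout of `{"group": {"hosts": [...], "children": [...]}}`; the sample group and host names are stand-ins, not taken from the actual deployment:

```python
import json

def host_groups(inventory, hostname):
    """Return every group whose 'hosts' list contains the given hostname."""
    return sorted(
        group
        for group, data in inventory.items()
        if isinstance(data, dict) and hostname in data.get("hosts", [])
    )

# Normally you would load the real file, e.g.:
# inventory = json.load(open("/etc/openstack_deploy/openstack_inventory.json"))
# A tiny stand-in with the same shape:
inventory = {
    "compute_hosts": {"hosts": ["compute1", "compute2", "compute3"]},
    "nova_compute": {"hosts": ["compute1", "compute2", "compute3"]},
    "nova_all": {"hosts": [], "children": ["nova_compute"]},
}
print(host_groups(inventory, "compute3"))  # ['compute_hosts', 'nova_compute']
```

If the new host shows up in compute_hosts but in none of the service groups, that points at the inventory rather than the playbooks.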
derekokeeffe85 | Thanks, I'll take a look at that. I did restore the inventory, ran a playbook, checked the inventory and compute3 was there (as I said, I didn't know this was how it worked), but the 3rd command (openstack-ansible setup-openstack.yml --limit localhost,compute3) still skipped the services as far as I can see, unless I'm reading the output wrong. I'll do as you both say and maybe later stick the output in a paste if you | 11:47 |
derekokeeffe85 | wouldn't mind taking a look. | 11:47 |
jrosser | derekokeeffe85: you can see one of the tasks that was skipped here https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/playbooks/nova.yml#L16-L20 | 11:54 |
jrosser | and you can see that the target group was "nova_all" | 11:54 |
jrosser | we can maybe infer from this that nova_all combined with --limit compute3 returns no hosts | 11:55 |
jrosser | i.e compute3 is not in the nova_all group | 11:55 |
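The effect of --limit that jrosser is describing can be modelled as a set intersection. This is a simplified sketch of Ansible's behaviour, not its real implementation, and the group contents are illustrative assumptions:

```python
def effective_hosts(group_hosts, limit):
    """Ansible runs a play only on hosts that are both in the play's
    target group and matched by the --limit pattern."""
    return sorted(set(group_hosts) & set(limit))

# Broken inventory: compute3 never resolves under nova_all, so the
# intersection with --limit localhost,compute3 is empty and every
# task in the nova play is skipped.
print(effective_hosts(["compute1", "compute2"],
                      ["localhost", "compute3"]))              # []

# Fixed inventory: compute3 is under nova_all, so the nova play runs on it.
print(effective_hosts(["compute1", "compute2", "compute3"],
                      ["localhost", "compute3"]))              # ['compute3']
```

An empty intersection produces exactly the "skipped, no hosts matched" behaviour seen in the paste.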
derekokeeffe85 | Hmm, ok jrosser, but it does actually create the nova dir and add nova.conf... it then fails saying there's no nova-compute service. Could it be looking for compute3 in different places? I only have it added to compute_hosts in openstack_user_config.yml | 11:57 |
jrosser | have you checked which groups the tasks that do succeed were targeting? | 12:00 |
derekokeeffe85 | I've been trying to dig around in the playbooks but nothing definite (I don't know how you all developed this :)), so I ran the manage inventory script and can see compute3 there with the same output as the working computes. This is the output from the setup openstack playbook https://paste.openstack.org/show/bajR5O4dUtgMqGcX4mEJ/ | 12:05 |
derekokeeffe85 | jrosser there's no hosts at all in nova_all in openstack_user_config "nova_all": { | 13:42 |
derekokeeffe85 | <OMITTED> "hosts": [] | 13:42 |
derekokeeffe85 | } | 13:42 |
derekokeeffe85 | Sorry I meant openstack_inventory.json | 13:42 |
jrosser | have you also replaced the relevant parts of env.d / conf.d in your openstack_deploy directory? | 13:44 |
kleini | inventory should look like this: https://paste.opendev.org/show/byZ3gYLEDgQPimyrT0tm/ | 13:46 |
derekokeeffe85 | Yep, group_compute_hosts & nova_compute show the 3 computes, jrosser. I didn't change/delete anything in there, kleini, they all have .example or .aio extensions | 13:54 |
derekokeeffe85 | In both dirs | 13:54 |
kleini | nova_all in inventory should have children and those should contain nova_compute | 13:55 |
kleini | and you already confirmed that nova_compute contains your new compute node | 13:55 |
jrosser | derekokeeffe85: it would be most helpful if you were comparing to a working deployment | 14:04 |
jrosser | because, for example, nova_all does indeed not have any hosts itself, but it has child groups | 14:04 |
jrosser | one of which should be nova_compute | 14:04 |
jrosser | and in the nova_compute group you should find your actual compute hosts under hosts[] | 14:05 |
jrosser | if you don't, your inventory is still incorrect | 14:05 |
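The parent/child resolution described above can be checked with a small recursive walk over the JSON inventory. A sketch only: the group names follow the usual OSA layout, and the sample data is assumed, not taken from the actual deployment:

```python
def all_hosts(inventory, group):
    """Collect the hosts of a group plus those of all its child groups,
    the way Ansible resolves a parent group such as nova_all."""
    data = inventory.get(group, {})
    hosts = set(data.get("hosts", []))
    for child in data.get("children", []):
        hosts |= all_hosts(inventory, child)
    return hosts

inventory = {
    "nova_all": {"hosts": [], "children": ["nova_compute", "nova_api"]},
    "nova_compute": {"hosts": ["compute1", "compute2", "compute3"]},
    "nova_api": {"hosts": ["infra1"]},
}
print(sorted(all_hosts(inventory, "nova_all")))
# ['compute1', 'compute2', 'compute3', 'infra1']
```

So nova_all legitimately has an empty hosts[] of its own; what matters is whether the new compute appears once the children are resolved.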
derekokeeffe85 | kleini, that's correct, in my inventory nova_all contains nova_compute. jrosser, this is the backup of etc/openstack_deploy that deployed this current cluster, which has been working for 2 years now. Maybe we thought it was working but we didn't configure it properly from the get go. In my inventory right now I do have nova_compute with no children, but the 3 computes are there under hosts :( | 14:17 |
derekokeeffe85 | ahhhhh I think I may have found something!!! Does nova-compute depend on a bridge being configured? My networking might be the problem, could this be something? | 14:19 |
jrosser | you have to do whatever is specified in "setup the target hosts" in the documentation, adjusting for your own local networking choices | 14:20 |
jrosser | it's pretty obvious where things are failing in your earlier paste https://paste.openstack.org/show/bajR5O4dUtgMqGcX4mEJ/ | 14:21 |
jrosser | this seems not really to be about either the inventory or the network | 14:21 |
jrosser | i am very confused with what you're actually asking about now :( | 14:22 |
derekokeeffe85 | In my inventory it has network, container_bridge: br-storage, group_binds: [glance_api, cinder_api, cinder_volume, nova_compute] and my bridge is missing. I don't have br-storage for some reason; all the networking that we had built with netplan is messed up. I've taken up too much of both your time on this now, so I'll go away and fix this issue first. Thanks for the help today, both | 14:27 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-ops master: Test all supported versions of k8s workload cluster with magnum-cluster-api https://review.opendev.org/c/openstack/openstack-ansible-ops/+/916649 | 15:49 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-ops master: Test all supported versions of k8s workload cluster with magnum-cluster-api https://review.opendev.org/c/openstack/openstack-ansible-ops/+/916649 | 16:48 |
jrosser | noonedeadpunk: i found the difference between the working/broken rabbitmq installs https://paste.opendev.org/show/bNgQS66qbtwLlvYptGvU/ | 17:24 |
jrosser | the jobs that work have something (doesn't matter what, really) that causes a 'changed' task, and so the handler executes to restart rabbitmq on the upgraded installation | 17:25 |
jrosser | the jobs that break do not have any changed tasks, everything is 'ok' instead | 17:25 |
jrosser | so the problem really starts here https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/0e009192879ee4f9c07d5cdc08d7071d4cd6cf2c/tasks/rabbitmq_upgrade_prep.yml#L34-L37 | 17:26 |
jrosser | that we have a task which unconditionally stops rabbitmq before an upgrade | 17:26 |
jrosser | but there is nothing that unconditionally restarts it afterwards, only some "accident" like a config file update or actually a new version of the package is installed | 17:27 |
jrosser | i expect at this point, when we have just branched, the version is equal on N and N-1, so that's a fairly serious issue: unless the version or something else changes, it will never restart | 17:28 |
jrosser | likely also why we did not see this before on upgrade jobs, as the versions would have been different on N and N-1 | 17:29 |
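The rabbitmq failure mode above comes down to one rule: a handler runs only if at least one task that notifies it reported 'changed' during the play. A simplified Python model of that semantics (not real Ansible internals; handler and task names are illustrative):

```python
def handlers_to_run(tasks, handlers):
    """A handler fires only when some task that notifies it
    reported a 'changed' result during the play."""
    notified = set()
    for task in tasks:
        if task["changed"]:
            notified.update(task.get("notify", []))
    return [h for h in handlers if h in notified]

# Upgrade job where e.g. a config file changed: the handler fires
# and rabbitmq is restarted after the unconditional stop.
changed_run = [{"changed": True, "notify": ["Restart rabbitmq"]}]
print(handlers_to_run(changed_run, ["Restart rabbitmq"]))  # ['Restart rabbitmq']

# Same package version and identical config: every task is 'ok',
# nothing notifies the handler, and the rabbitmq that the upgrade
# prep stopped is never started again.
ok_run = [{"changed": False, "notify": ["Restart rabbitmq"]}]
print(handlers_to_run(ok_run, ["Restart rabbitmq"]))       # []
```

This is why the unconditional stop in rabbitmq_upgrade_prep.yml needs a matching unconditional start, rather than relying on an "accidental" changed result to trigger the restart handler.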
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!