Thursday, 2022-10-06

*** mhen_ is now known as mhen02:01
*** yadnesh|away is now known as yadnesh04:35
*** luigi is now known as luigi-mtg07:05
*** rlandy|out is now known as rlandy10:37
*** blarnath is now known as d34dh0r5312:27
*** yadnesh is now known as yadnesh|away13:13
jpic_hi all, I'm having issues migrating instances, it fails with "cannot find compute host compute-008". nova hypervisor-list shows FQDNs in the hypervisor hostnames, but openstack server show only shows short hostnames for OS-EXT-SRV-ATTR:{host,hypervisor_hostname}. How do you suggest fixing this?14:15
jpic_i suppose we need matching hypervisor hostnames in both the instances' host attributes and the hypervisor names, so I'm going to have to change something one way or another, right?14:16
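A minimal way to see the mismatch jpic_ is describing, using only standard client commands (the VM UUID is a placeholder):

    openstack compute service list --service nova-compute   # nova's view of the compute host names
    openstack hypervisor list                                # hypervisor hostnames, often FQDNs
    openstack server show <vm-uuid> -c OS-EXT-SRV-ATTR:host -c OS-EXT-SRV-ATTR:hypervisor_hostname

The host attribute on the instance generally has to match the nova-compute service / compute node name; the hypervisor_hostname can legitimately differ (FQDN vs short name), so a difference there is not necessarily a problem by itself.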
lowercaseit would be best if you go to the nova-scheduler logs and look at the error there14:50
lowercaseIf the nova scheduler states, "no suitable hypervisor found" then yes. 14:51
lowercaseas for the hypervisor hostnames? I'm not quite sure what you mean there. This might be a stupid answer, but all hypervisors should have unique hostnames.14:52
lowercaseYou could look at hypervisor attributes in the admin console, and confirm that the different hypervisors have similar attributes. 14:53
lowercase"but in openstack server show it shows only short hostnames for OS-EXT-SRV-ATTR:{host,hypervisor_hostname}" I never did, those attributes are different in my environments as well. I just wrote code to handle the difference for me.14:54
lowercaseYou could also avoid specifying a destination server and just do `nova live-migrate <host>`; nova scheduler will then select the best suitable hypervisor and migrate the server there.14:56
lowercasejpic_: ^14:56
jpic_lowercase: it's not "no suitable hypervisor", it says it can't find the hypervisor where the instance is actually located14:59
jpic_it says it can't find compute-008, where the VM is at; compute-008 is indeed the name of the compute service, but it's not the name from `nova hypervisor-list` output, so I thought we had a naming problem there15:00
lowercasenova live-migration takes the vm uuid as a param.15:02
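For reference, the rough shapes of the two commands being discussed (placeholders, not values from this log):

    nova live-migration <vm-uuid> [<target-host>]                   # omit the host to let the scheduler choose
    openstack server migrate --live-migration [--host <target-host>] <vm-uuid>

The --host form of the openstack client needs a new enough compute API microversion (2.30+), which is why the --os-compute-api-version flag shows up later in the log.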
lowercaseso the hypervisor is online and nova-compute is running?15:03
jpic_yes, it's up, the VM is even up15:05
lowercaseConfirmed by doing `nova service-list`?15:08
lowercasethe vm can be up with the nova-compute service being down 15:11
jpic_yes, it's up in nova service-list, and openstack compute service list15:23
lowercaseokay, can you please share the command you are using to migrate an instance?15:29
jpic_openstack server migrate --live-migration 84cfe8c4-e28a-4b2e-9586-3207492710dc --host compute-009 --os-compute-api-version=2.3015:30
lowercase2.3 huh. What version of openstack is your environment running?15:31
lowercasejuno ish15:31
jpic_train ...15:31
lowercaselol, you are in luck that i still have 2 environments that run train.15:31
lowercaseeverything else i am up to date.15:31
jpic_they are using an os-ansible playbooks fork made by a company that has shut down, and they're expecting me to make a quote to "migrate to kolla-ansible" xD15:32
lowercaseso uh, openstack command isn't really fully fledged then. Do you have the command `nova`?15:32
lowercasenaw, make new environment.15:32
lowercaselol15:32
jpic_yes there's the nova command15:33
lowercaseyea, try just doing nova live-migrate <vm-uuid>15:33
jpic_well i guess i need to do some policies: ERROR (Forbidden): Policy doesn't allow os_compute_api:os-migrate-server:migrate_live to be performed. 15:34
lowercasei love a good helpful error15:34
jpic_thanks!15:34
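For context on that Forbidden error: on Train the os_compute_api:os-migrate-server:migrate_live policy defaults to admin only, so the usual fix is to run the command with admin credentials rather than to loosen the policy. If the deployment really does need an override, a hedged sketch for /etc/nova/policy.json (path and chosen rule are assumptions, adjust for the actual deployment):

    {
        "os_compute_api:os-migrate-server:migrate_live": "rule:admin_api"
    }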
jpic_lowercase: actually that wasn't the VM causing the issue i was trying to describe, see https://dpaste.com/GQXF2XAB815:41
jpic_openstack server migrate is not finding the compute where the VM is actually sitting ...15:42
jpic_`nova live-migration UUID` does not output anything, but doesn't do anything either: https://dpaste.com/AYVSJTEB215:43
lowercaseplease perform `nova migration-list`, your migration will be the top one15:48
jpic_yes, it's in error15:48
lowercaseview the scheduler logs15:49
lowercaseand then check the nova-conductor logs if you don't see anything there15:49
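A quick way to follow that advice, assuming the default log locations this deployment appears to use (paths are an assumption, confirmed only for the scheduler log further down):

    grep ERROR /var/log/nova/nova-scheduler.log | tail
    grep ERROR /var/log/nova/nova-conductor.log | tail
    grep 'req-<request-id>' /var/log/nova/nova-*.log     # follow one failed migration end to end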
jpic_Failed to compute_task_migrate_server: L'hôte de calcul compute-008 ne peut pas être trouvé.: ComputeHostNotFound: L'hôte de calcul compute-008 ne peut pas être trouvé.15:50
jpic_which means "the compute host compute-008 cannot be found"15:50
jpic_apparently they got a joyful locale15:50
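ComputeHostNotFound here usually means no compute_nodes row has host = compute-008. A hedged way to check the naming straight in the database (table and column names are stock Nova; the DB access details are assumptions):

    mysql nova -e "SELECT host, hypervisor_hostname FROM compute_nodes WHERE deleted = 0;"

If the compute service list says compute-008 but compute_nodes.host carries the FQDN (or the other way around), that mismatch would explain both this error and the FQDN-vs-short-name difference noted at the start.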
lowercasehere is a stupid thing to check15:50
lowercasego check if compute-008 is in the same AZ as 0915:50
lowercasenvm it is15:51
lowercasesorry15:51
jpic_openstack availability zone list does show 4 lines with the same name "nova"15:52
lowercaseyeah, nova is the default name for an az15:52
jpic_but why do we have 4 of them15:52
lowercasecompute-008 the host hypervisor cannot be found.15:53
jpic_check it out https://dpaste.com/HRYHSSZMQ15:53
lowercaserestart nova-compute on 008 ?15:53
lowercaseoh no.15:54
lowercasethey have 4 identical zones named nova?!15:54
jpic_apparently!!15:54
lowercaseyikes, start deleting them one by one and see if any hypervisors are assigned to the other novas15:54
jpic_no my bad they don't, they have one nova zone only for computes15:55
jpic_https://dpaste.com/CHGU6TKXT15:55
lowercasewhat does the nova-compute logs say on 008?15:55
lowercaseyou may need to enable debug logging at this point15:56
jpic_ok restarting in debug and retrying15:56
lowercasemight as well put the scheduler and conductor logs in debug as well15:57
lowercasei would still remove those extra AZs15:57
jpic_there's a nova AZ for volume, another one for storage and so on apparently15:57
lowercaseso that's 3 novas, what's the fourth one for lol15:58
jpic_ok there are 2 AZs with name nova for network15:59
lowercasewelp, guess that's okay16:00
lowercasedo a nova live-migration again and watch the logs.16:00
jpic_one is for resource router, the other for network16:00
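To see which service each identically named zone belongs to, the long listing helps (available on reasonably recent clients; worth confirming on the Train-era client in use here):

    openstack availability zone list --long

The plain listing prints one "nova" row per service type (compute, volume, network), which is why a single default zone name can show up several times.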
jpic_well there's a lot of logs but not really seeing anything related16:02
jpic_especially on compute-008, absolutely nothing16:02
jpic_in scheduler, this is logged a lot though: /var/log/nova/nova-scheduler.log:2022-05-02 18:52:13.129 42861 INFO nova.scheduler.host_manager [req-1ee6306b-4ffc-4ddd-b552-c817fdaf3996 - - - - -] The instance sync for host 'compute-008' did not match. Re-created its InstanceList.16:03
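That sync message is often just noise, but since the conductor cannot find compute-008 at all, it may be worth confirming the host is mapped where nova expects it. A hedged check with nova-manage (run on a node with access to the API database; that location is an assumption):

    nova-manage cell_v2 list_hosts                 # host names as mapped into cells
    nova-manage cell_v2 discover_hosts --verbose   # maps compute hosts that exist but are unmapped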
jpic_is it normal to have a lot of packet drops on br-ex with ovs and vxlan? like, almost half of the total packets going through?16:09
jpic_I don't have another environment to check16:09
jpic_RX packets 18008  dropped 1548716:10
lowercaseWe don't use ovs.16:11
lowercaseand no... that's a lot16:11
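A few ways to see where those br-ex drops are being counted, assuming the OVS tools are installed on that node (the bridge name is from the log):

    ovs-ofctl dump-ports br-ex          # per-port rx/tx and drop counters as OVS sees them
    ovs-vsctl list interface br-ex      # the 'statistics' column, including rx_dropped
    ip -s link show br-ex               # the kernel counters the RX/dropped figures above come from

On many deployments a high rx_dropped on the bridge-local port is broadcast/multicast with no listener on the host rather than lost tenant traffic, so it is worth checking whether real flows are affected before chasing it.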
jpic_you have ovn?16:11
lowercaseour switches and routers do the routing. We use a combination of neutron routing, bgp and dedicated routers16:12
lowercasebut the key thing is that neutron and the routers share the same multicast routing, so that both know of each other and can perform the packet routing16:13
lowercases/multicast routing/multicast group16:13
jpici did fix a bunch of networking problems earlier today, not sure if the drops are exactly recent16:23
lowercaseif the api is unable to talk to the compute node, it could result in a very similar issue 16:30
*** soniya29|ruck is now known as soniya2916:30
lowercaseSo, let me think this out loud because I didn't set it up. We use zebra (the linux BGP/routing daemon) on the neutron linux container. The router advertises a 10.0. block of addresses to the neutron routers. The neutron routers and physical routers share a multicast group that contains the location of each address and its reservation. If the ip exists, it gets the ip address of the hypervisor so it knows where to route the packet (i think).16:35
lowercaseThe hypervisors have multiple vlans, normal stuff: one for db, one for rabbit, one for trove, one for storage backend, one for management... those are all vxlans. If a vm needs to talk to another vm, it goes up to neutron and back down. If the connection is external, i believe it goes to neutron, to the router and back out.16:35
lowercaseSo every vm gets a fully routable ip address, a 10.x.x.x address, instead of the default, non-routable 1.x.x.x address16:35
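A very rough sketch of the piece lowercase is describing: an FRR/Quagga bgpd stanza on the neutron node advertising a routable block to the physical router. The ASNs, the neighbor address and the 10.0.0.0/16 block are made-up placeholders, not lowercase's actual config:

    router bgp 64512
     neighbor 192.0.2.1 remote-as 64513
     address-family ipv4 unicast
      network 10.0.0.0/16
     exit-address-family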
lowercaseim afk, be back in an hour or so16:37
*** rlandy is now known as rlandy|bbl21:58
*** rlandy|bbl is now known as rlandy|out22:15

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!