Thursday, 2022-10-06

*** mhen_ is now known as mhen02:01
*** yadnesh|away is now known as yadnesh04:35
*** luigi is now known as luigi-mtg07:05
*** rlandy|out is now known as rlandy10:37
*** blarnath is now known as d34dh0r5312:27
*** yadnesh is now known as yadnesh|away13:13
jpic_hi all, I'm having issues migrating instances, it fails with "cannot find compute host compute-008". nova hypervisor-list shows FQDNs in the hypervisor hostnames, but openstack server show only shows short hostnames for OS-EXT-SRV-ATTR:{host,hypervisor_hostname}. How do you suggest fixing this?14:15
jpic_i suppose we need matching hypervisor hostnames in both the instances' host attributes and the hypervisor names, so I'm going to have to change something one way or another, right?14:16
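A minimal way to see the mismatch jpic_ is describing, using only standard client commands (the VM UUID is a placeholder):

    openstack compute service list --service nova-compute   # nova's view of the compute host names
    openstack hypervisor list                                # hypervisor hostnames, often FQDNs
    openstack server show <vm-uuid> -c OS-EXT-SRV-ATTR:host -c OS-EXT-SRV-ATTR:hypervisor_hostname

The host attribute on the instance generally has to match the nova-compute service / compute node name; the hypervisor_hostname can legitimately differ (FQDN vs short name), so a difference there is not necessarily a problem by itself.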
lowercaseit would be best if you go to the nova-scheduler logs and look at the error there14:50
lowercaseIf the nova scheduler states, "no suitable hypervisor found" then yes. 14:51
lowercaseas for the hypervisor hostnames? I'm not quite sure what you mean there. This might be a stupid answer, but all hypervisors should have unique hostnames.14:52
lowercaseYou could look at hypervisor attributes in the admin console, and confirm that the different hypervisors have similar attributes. 14:53
lowercase"but in openstack server show it shows only short hostnames for OS-EXT-SRV-ATTR:{host,hypervisor_hostname}" I never did, those attributes are different in my environments as well. I just wrote code to handle the difference for me.14:54
lowercaseYou could also avoid specifying a destination server and just do `nova live-migrate <host>`; nova scheduler will then select the best suitable hypervisor and migrate the server there.14:56
lowercasejpic_: ^14:56
jpic_lowercase: it's not "no suitable hypervisor", it says it can't find the hypervisor where the instance is actually located14:59
jpic_it says it can't find compute-008, where the VM is at; compute-008 is indeed the name of the compute service, but it's not the name from `nova hypervisor-list` output, so I thought we had a naming problem there15:00
lowercasenova live-migration takes the vm uuid as a param.15:02
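For reference, the rough shapes of the two commands being discussed (placeholders, not values from this log):

    nova live-migration <vm-uuid> [<target-host>]                   # omit the host to let the scheduler choose
    openstack server migrate --live-migration [--host <target-host>] <vm-uuid>

The --host form of the openstack client needs a new enough compute API microversion (2.30+), which is why the --os-compute-api-version flag shows up later in the log.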
lowercaseso the hypervisor is online and nova-compute is running?15:03
jpic_yes, it's up, the VM is even up15:05
lowercaseConfirmed by doing `nova service-list`?15:08
lowercasethe vm can be up with the nova-compute service being down 15:11
jpic_yes, it's up in nova service-list, and openstack compute service list15:23
lowercaseokay, can you please share the command you are using to migrate an instance?15:29
jpic_openstack server migrate --live-migration 84cfe8c4-e28a-4b2e-9586-3207492710dc --host compute-009 --os-compute-api-version=2.3015:30
lowercase2.3 huh. What version of openstack is your environment running?15:31
lowercasejuno ish15:31
jpic_train ...15:31
lowercaselol, you are in luck that i still have 2 environments that run train.15:31
lowercaseeverything else i am up to date.15:31
jpic_they are using an os-ansible playbooks fork made by a company that has shut down, and they're expecting me to make a quote to "migrate to kolla-ansible" xD15:32
lowercaseso uh, openstack command isn't really fully fledged then. Do you have the command `nova`?15:32
lowercasenaw, make new environment.15:32
lowercaselol15:32
jpic_yes there's the nova command15:33
lowercaseyea, try just doing nova live-migrate <vm-uuid>15:33
jpic_well i guess i need to do some policies: ERROR (Forbidden): Policy doesn't allow os_compute_api:os-migrate-server:migrate_live to be performed. 15:34
lowercasei love a good helpful error15:34
jpic_thanks!15:34
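For context on that Forbidden error: on Train the os_compute_api:os-migrate-server:migrate_live policy defaults to admin only, so the usual fix is to run the command with admin credentials rather than to loosen the policy. If the deployment really does need an override, a hedged sketch for /etc/nova/policy.json (path and chosen rule are assumptions, adjust for the actual deployment):

    {
        "os_compute_api:os-migrate-server:migrate_live": "rule:admin_api"
    }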
jpic_lowercase: actually that wasn't the VM causing the issue i was trying to describe, see https://dpaste.com/GQXF2XAB815:41
jpic_openstack server migrate is not finding the compute where the VM is actually sitting ...15:42
jpic_`nova live-migration UUID` does not output anything, but doesn't do anything either: https://dpaste.com/AYVSJTEB215:43
lowercaseplease perform `nova migration-list`, your migration will be the top one15:48
jpic_yes, it's in error15:48
lowercaseview the scheduler logs15:49
lowercaseand then check the nova-conductor logs if you don't see anything there15:49
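A quick way to follow that advice, assuming the default log locations this deployment appears to use (paths are an assumption, confirmed only for the scheduler log further down):

    grep ERROR /var/log/nova/nova-scheduler.log | tail
    grep ERROR /var/log/nova/nova-conductor.log | tail
    grep 'req-<request-id>' /var/log/nova/nova-*.log     # follow one failed migration end to end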
jpic_Failed to compute_task_migrate_server: L'hôte de calcul compute-008 ne peut pas être trouvé.: ComputeHostNotFound: L'hôte de calcul compute-008 ne peut pas être trouvé.15:50
jpic_which means "the compute host compute-008 cannot be found"15:50
jpic_apparently they got a joyful locale15:50
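ComputeHostNotFound here usually means no compute_nodes row has host = compute-008. A hedged way to check the naming straight in the database (table and column names are stock Nova; the DB access details are assumptions):

    mysql nova -e "SELECT host, hypervisor_hostname FROM compute_nodes WHERE deleted = 0;"

If the compute service list says compute-008 but compute_nodes.host carries the FQDN (or the other way around), that mismatch would explain both this error and the FQDN-vs-short-name difference noted at the start.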
lowercasehere is a stupid thing to check15:50
lowercasego check if compute-008 is in the same AZ as 0915:50
lowercasenvm it is15:51
lowercasesorry15:51
jpic_openstack availability zone list does show 4 lines with the same name "nova"15:52
lowercaseyeah, nova is the default name for an az15:52
jpic_but why do we have 4 of them15:52
lowercasecompute-008 the host hypervisor cannot be found.15:53
jpic_check it out https://dpaste.com/HRYHSSZMQ15:53
lowercaserestart nova-compute on 008 ?15:53
lowercaseoh no.15:54
lowercasethey have 4 identical zones named nova?!15:54
jpic_apparently!!15:54
lowercaseyikes, start deleting them one by one and see if any hypervisors are assigned to the other novas15:54
jpic_no my bad they don't, they have one nova zone only for computes15:55
jpic_https://dpaste.com/CHGU6TKXT15:55
lowercasewhat does the nova-compute logs say on 008?15:55
lowercaseyou may need to enable debug logging at this point15:56
jpic_ok restarting in debug and retrying15:56
lowercasemight as well put the scheduler and conductor logs in debug as well15:57
lowercasei would still remove those extra AZs15:57
jpic_there's a nova AZ for volume, another one for storage and so on apparently15:57
lowercaseso that's 3 novas, what's the fourth one for lol15:58
jpic_ok there are 2 AZs with name nova for network15:59
lowercasewelp, guess that's okay16:00
lowercasedo a nova live-migration again and watch the logs.16:00
jpic_one is for resource router, the other for network16:00
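To see which service each identically named zone belongs to, the long listing helps (available on reasonably recent clients; worth confirming on the Train-era client in use here):

    openstack availability zone list --long

The plain listing prints one "nova" row per service type (compute, volume, network), which is why a single default zone name can show up several times.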
jpic_well there's a lot of logs but not really seeing anything related16:02
jpic_especially on compute-008, absolutely nothing16:02
jpic_in scheduler, this is logged a lot though: /var/log/nova/nova-scheduler.log:2022-05-02 18:52:13.129 42861 INFO nova.scheduler.host_manager [req-1ee6306b-4ffc-4ddd-b552-c817fdaf3996 - - - - -] The instance sync for host 'compute-008' did not match. Re-created its InstanceList.16:03
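That sync message is often just noise, but since the conductor cannot find compute-008 at all, it may be worth confirming the host is mapped where nova expects it. A hedged check with nova-manage (run on a node with access to the API database; that location is an assumption):

    nova-manage cell_v2 list_hosts                 # host names as mapped into cells
    nova-manage cell_v2 discover_hosts --verbose   # maps compute hosts that exist but are unmapped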
jpic_is it normal to have a lot of packet drops on br-ex with ovs and vxlan? like, almost half of the total packets going through?16:09
jpic_I don't have another environment to check16:09
jpic_RX packets 18008  dropped 1548716:10
lowercaseWe don't use ovs.16:11
lowercaseand no... that's a lot16:11
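A few ways to see where those br-ex drops are being counted, assuming the OVS tools are installed on that node (the bridge name is from the log):

    ovs-ofctl dump-ports br-ex          # per-port rx/tx and drop counters as OVS sees them
    ovs-vsctl list interface br-ex      # the 'statistics' column, including rx_dropped
    ip -s link show br-ex               # the kernel counters the RX/dropped figures above come from

On many deployments a high rx_dropped on the bridge-local port is broadcast/multicast with no listener on the host rather than lost tenant traffic, so it is worth checking whether real flows are affected before chasing it.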
jpic_you have ovn?16:11
lowercaseour switches and routers do the routing. We use a combination of neutron routing, bgp and dedicated routers16:12
lowercasebut the key thing is that neutron and the routers share the same multicast routing, so that both know of each other and can perform the packet routing16:13
lowercases/multicast routing/multicast group16:13
jpici did fix a bunch of networking problems earlier today, not sure if the drops are exactly recent16:23
lowercaseif the api is unable to talk to the compute node, it could result in a very similar issue 16:30
*** soniya29|ruck is now known as soniya2916:30
lowercaseSo, let me think this out loud because I didn't set it up. We use zebra (the linux BGP/routing daemon) on the neutron linux container. The router advertises a 10.0. block of addresses to the neutron routers. The neutron routers and physical routers share a multicast group that contains the location of each address and its reservation. If the ip exists, it gets the ip address of the hypervisor so it knows where to route the packet (i think).16:35
lowercaseThe hypervisors have multiple vlans, normal stuff: one for db, one for rabbit, one for trove, one for storage backend, one for management... those are all vxlans. If a vm needs to talk to another vm, it goes up to neutron and back down. If the connection is external, i believe it goes to neutron, to the router and back out.16:35
lowercaseSo every vm gets a fully routable ip address, a 10.x.x.x address, instead of the default, non-routable 1.x.x.x address16:35
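A very rough sketch of the piece lowercase is describing: an FRR/Quagga bgpd stanza on the neutron node advertising a routable block to the physical router. The ASNs, the neighbor address and the 10.0.0.0/16 block are made-up placeholders, not lowercase's actual config:

    router bgp 64512
     neighbor 192.0.2.1 remote-as 64513
     address-family ipv4 unicast
      network 10.0.0.0/16
     exit-address-family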
lowercaseim afk, be back in an hour or so16:37
*** rlandy is now known as rlandy|bbl21:58
*** rlandy|bbl is now known as rlandy|out22:15

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!