Tuesday, 2019-01-22

*** tetsuro has joined #openstack-placement  00:10
*** tetsuro_ has joined #openstack-placement  04:09
*** tetsuro has quit IRC  04:11
*** tetsuro has joined #openstack-placement  04:31
*** tetsuro_ has quit IRC  04:31
*** openstackgerrit has joined #openstack-placement  04:41
<openstackgerrit> Merged openstack/placement master: Add irrelevant files list to perfload job  https://review.openstack.org/624047  04:41
*** e0ne has joined #openstack-placement  05:52
*** e0ne has quit IRC  05:53
*** avolkov has joined #openstack-placement  06:03
*** takashin has left #openstack-placement  06:36
*** tssurya has joined #openstack-placement  07:41
*** helenafm has joined #openstack-placement  08:22
*** tssurya has quit IRC  09:25
*** tetsuro has quit IRC  09:36
<openstackgerrit> Tetsuro Nakamura proposed openstack/placement master: Configure database api in upgrade check  https://review.openstack.org/632365  09:51
*** cdent has joined #openstack-placement  10:12
<cdent> thanks for paying attention tetsuro  10:15
<cdent> gibi: if you're happy with https://review.openstack.org/#/c/632365/ can you kick it in? It's a bug fix to the status command that didn't get fully tested before merged  10:16
<gibi> cdent: done  10:20
<cdent> thanks  10:20
*** e0ne has joined #openstack-placement  10:24
<openstackgerrit> Tetsuro Nakamura proposed openstack/placement master: Configure database api in upgrade check  https://review.openstack.org/632365  10:32
*** tetsuro has joined #openstack-placement  10:34
<cdent> I think maybe we should just let tetsuro do everything, he's the only one paying sufficient attention to correctness :)  10:37
<tetsuro> cdent: anyway thanks for re-approving! :)  10:38
<cdent> thank you  10:39
<gibi> :)  10:46
*** ttsiouts has joined #openstack-placement  12:22
*** e0ne has quit IRC  12:25
*** e0ne has joined #openstack-placement  12:30
*** tssurya has joined #openstack-placement  12:38
*** tetsuro has quit IRC  12:53
*** mriedem has joined #openstack-placement  13:18
<openstackgerrit> Merged openstack/placement master: Configure database api in upgrade check  https://review.openstack.org/632365  14:01
*** e0ne has quit IRC  14:05
*** e0ne has joined #openstack-placement  14:08
*** mriedem has quit IRC  14:19
*** mriedem has joined #openstack-placement  14:21
*** ttsiouts has quit IRC  14:39
*** ttsiouts has joined #openstack-placement  14:39
*** ttsiouts has quit IRC  14:41
*** ttsiouts has joined #openstack-placement  14:41
*** efried_mlk is now known as efried  14:45
*** rubasov_ is now known as rubasov  14:50
*** efried1 has joined #openstack-placement  15:00
*** efried has quit IRC  15:01
*** efried1 is now known as efried  15:01
*** avolkov has quit IRC  15:34
*** openstackgerrit has quit IRC  15:51
*** efried has quit IRC  16:00
*** efried has joined #openstack-placement  16:09
*** dims has quit IRC  16:15
*** dims has joined #openstack-placement  16:20
*** e0ne has quit IRC  16:38
*** e0ne has joined #openstack-placement  16:39
*** mriedem is now known as mriedem_away  16:39
*** ttsiouts has quit IRC  16:55
*** ttsiouts has joined #openstack-placement  16:55
*** ttsiouts has quit IRC  17:00
*** efried has quit IRC  17:00
*** e0ne has quit IRC  17:02
*** helenafm has quit IRC  17:10
*** efried has joined #openstack-placement  17:49
*** avolkov has joined #openstack-placement  18:35
*** e0ne has joined #openstack-placement  19:07
*** gryf has joined #openstack-placement  19:29
*** tssurya has quit IRC  19:36
*** e0ne has quit IRC  19:42
*** mriedem_away is now known as mriedem  20:16
*** alanmeadows has joined #openstack-placement  20:37
<jaypipes> alanmeadows: whatup g-money?  20:37
<alanmeadows> Ahoy folks.  20:37
<alanmeadows> We had a change go out to a number of production sites that adjusted the hostname of the nova agents (the agent stops reporting as `host` and starts reporting as `host.fqdn`)  20:38
<jaypipes> alanmeadows: lemme guess... doubled-up resource provider records? :)  20:39
<jaypipes> alanmeadows: and a scheduler that suddenly thinks you've got a shitload of extra capacity?  20:39
<alanmeadows> yes, along those lines  20:39
*** dims has quit IRC  20:40
<alanmeadows> pci scheduling conflicts obviously, as the pci_devices table is populated with unallocated entries for these "new" nodes  20:40
<alanmeadows> but let me walk through what was tried quickly  20:40
<jaypipes> alanmeadows: and you're looking for a quick hotfix to get the data ungoofed?  20:40
<alanmeadows> and link that up to the placement question  20:40
<jaypipes> ack  20:40
<alanmeadows> we got the bright idea, given no resources have been created using the new compute_nodes entry,  20:41
<alanmeadows> that we would revive the old compute_nodes entry, update it to the new fqdn, and deactivate the new one  20:41
*** dims has joined #openstack-placement  20:42
<alanmeadows> dancing around the unique constraints  20:42
<alanmeadows> we then discovered a `uuid` in compute_nodes that clearly links the node to the placement tables  20:42
<alanmeadows> and finally on to the bit confusing us  20:43
<jaypipes> alanmeadows: yes, that's essentially what you'll need to do. the only issue is that you're going to need to first delete the placement resource_providers table records that refer to the new fqdns  20:43
<alanmeadows> well, this is what's weird  20:43
<alanmeadows> the resource_providers table has an entry for the new, target fqdn name, with the wrong uuid  20:44
<alanmeadows> we can fix that, sure  20:44
<alanmeadows> but what's odd is there is no entry for the old agent name like there was in compute_nodes  20:44
<alanmeadows> on top of that there are no allocations for the new entry generated in resource_providers  20:44
<jaypipes> alanmeadows: that is indeed weird.  20:45
<alanmeadows> much like the magic conversion nova will do for deactivating dupes in compute_nodes (set shortname to deleted=1, ...)  20:45
<jaypipes> alanmeadows: there cannot be allocations referring to the old provider UUIDs but no entries in the resource_providers table with those UUIDs.  20:45
<alanmeadows> to make a transition from short->long or long->short hostnames painless  20:45
<alanmeadows> it almost seems like some magic transition happened in the placement data, and all allocations lost in the process  20:46
<jaypipes> alanmeadows: so all allocations are gone?  20:46
<alanmeadows> all allocations for a node that has undergone this hostname transition of short->fqdn are gone  20:47
<jaypipes> yikes.  20:47
<alanmeadows> in a site where this happened to all nodes before it was noticed  20:47
<jaypipes> mriedem: ^^  20:47
<alanmeadows> the allocations table is an empty set  20:47
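
A minimal read-only sketch (not from the channel) of how one might confirm the state being described here, assuming the Ocata layout where the placement tables live in the nova_api database; the connection URL and hostname are placeholders:

    # Hypothetical inspection script; adjust the URL and hostname for your site.
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://user:pass@dbhost/nova_api")  # placeholder
    short_name = "compute-01"  # placeholder: the pre-change hostname

    with engine.connect() as conn:
        # Providers registered under either the short name or the new FQDN.
        rows = conn.execute(
            text("SELECT id, uuid, name FROM resource_providers "
                 "WHERE name = :short OR name LIKE :fqdn"),
            {"short": short_name, "fqdn": short_name + ".%"}).fetchall()
        for rp in rows:
            # How many allocation rows still point at each provider.
            count = conn.execute(
                text("SELECT count(*) FROM allocations "
                     "WHERE resource_provider_id = :rp_id"),
                {"rp_id": rp.id}).scalar()
            print(rp.uuid, rp.name, "allocations:", count)
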
<jaypipes> alanmeadows: is this a flashing red lights situation?  20:49
<jaypipes> alanmeadows: i.e. production site with no immediate way of recovering  20:49
<alanmeadows> it has some people quite interested in the outcome and a resolution ;-)  20:49
<jaypipes> heh, ok.  20:50
<jaypipes> we really need to put some sort of barrier/prevention in place when/if we notice a my_hostname CONF change...  20:50
*** dims has quit IRC  20:50
<jaypipes> or whatever the CONF option is called that determines the nova-compute service name. can't ever remember it.  20:50
<jaypipes> my_ip?  20:51
<jaypipes> meh, whatever...  20:51
<jaypipes> alanmeadows: lemme have a think.  20:52
*** dims has joined #openstack-placement  20:52
<jaypipes> alanmeadows: if you reset the hostname for a service back to its original hostname and restart the nova-compute service, what happens?  20:52
<alanmeadows> oh that's definitely coming under strict control as a lesson learned  20:52
<alanmeadows> I did not try that scenario without the mucking  20:53
<mriedem> hostname changes will result in a new compute node record  20:53
<mriedem> which means a new resource provider with new uuid  20:53
<mriedem> compute nodes are unique per hostname/nodename (which for kvm is the same)  20:53
<alanmeadows> We attempt to roll forward with the fqdn transition but preserve mappings  20:54
<jaypipes> mriedem: right, but apparently something deletes all the instances/allocations on the old provider in the process...  20:54
<alanmeadows> looking at nodes that underwent the transition  20:54
<mriedem> probably the resource tracker  20:54
<mriedem> https://github.com/openstack/nova/blob/31956108e6e785407bdcc31dbc8ba99e6a28c96d/nova/compute/resource_tracker.py#L1244  20:54
<alanmeadows> they have the highest ID increment in resource_providers  20:54
<alanmeadows> as though something deleted their `short` version, created the `longName` version and cascaded the allocations  20:54
<mriedem> my guess would be either something in the RT or something on compute restart thinking an evacuation happened  20:56
<jaypipes> mriedem: right, but https://github.com/openstack/nova/blob/31956108e6e785407bdcc31dbc8ba99e6a28c96d/nova/compute/resource_tracker.py#L792-L794 should not delete instances from the *old* hostname, since self._compute_nodes[nodename] (where nodename == new FQDN) should yield no results for InstanceList.get_all_by_host_and_nodename(), right?  20:56
<mriedem> https://github.com/openstack/nova/blob/31956108e6e785407bdcc31dbc8ba99e6a28c96d/nova/compute/manager.py#L628  20:57
<jaypipes> mriedem: that's migrations, though, which again, shouldn't be returning anything in this case (of a hostname rename)  20:58
<jaypipes> or at least, I *think* that's the case. alanmeadows, what version of openstack are you using?  20:59
<alanmeadows> This is ocata  20:59
<alanmeadows> in this instance  20:59
<mriedem> the evac migrations robustification was added by dan because in the olden times a hostname change would make compute think an evac happened and delete your instances  20:59
<mriedem> but that was liberty i think  20:59
<mriedem> https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/robustify_evacuate.html  20:59
<jaypipes> yeah, ocata code is identical as now  21:00
<jaypipes> so I don't believe it's the evacuate code path that is the issue here.  21:00
<mriedem> so the allocations were deleted for the old provider in placement or just not there for the new provider?  21:00
<jaypipes> alanmeadows: UNLESS... your deployment tooling issued some sort of host-evacuate call in doing this rename of hostname FQDNs?  21:00
<alanmeadows> I'd be ok with them not being there for the new provider  21:01
<alanmeadows> I could deal with that  21:01
<alanmeadows> it's that they appear to be gone entirely  21:01
<alanmeadows> @jaypipes: definitely no, no nova calls, just a /etc/hosts ordering and `domain` resolv updates.  21:02
<mriedem> as a workaround you could run the heal_allocations CLI but that's not in ocata, so you'd have to backport it or run it from a container https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement  21:02
<alanmeadows> didn't know about that  21:02
<alanmeadows> potential contender  21:03
<alanmeadows> one grounding question though  21:05
<alanmeadows> these agents that have undergone this name transition  21:06
<alanmeadows> they come up, and wish to report to the world their PciDevicePool counts are 100% available  21:07
<alanmeadows> when obviously resources have been assigned  21:07
<alanmeadows> they also overwrite any pci_devices entries for that node id that may have been allocated, resetting them to unallocated  21:07
<alanmeadows> and so at the end of the day, am I chasing the right thing with there being empty allocations records for these hosts  21:08
<jaypipes> alanmeadows: well, PCI devices are not handled by placement unfortunately (or... fortunately, for you at least)  21:09
<jaypipes> alanmeadows: so we need to separate out the placement DB's allocations table issues from the pci_devices table issues, because they are handled differently.  21:09
<alanmeadows> sure, the authority separation I get  21:10
<alanmeadows> I arrived at missing data in the allocations table but I started with  21:10
<alanmeadows> why these agents are reporting an incorrect state to the world  21:10
<jaypipes> alanmeadows: are there still entries in the pci_devices table that refer to the original compute nodes table records for the original hostname?  21:13
<alanmeadows> yes, until the agent starts up and whacks them  21:14
<alanmeadows> oh, re-read your question  21:14
<alanmeadows> yes, there were  21:14
<alanmeadows> but recall our brilliant idea about how to back out of this  21:14
<alanmeadows> and preserve mappings  21:14
<jaypipes> alanmeadows: ok, so at least *that* issue should be easy to resolve...  21:14
<jaypipes> alanmeadows: need to drop jules off ... back in about 20 mins  21:14
<alanmeadows> which was to revive the original compute_nodes entry  21:15
<alanmeadows> by updating its hypervisor_name  21:15
<alanmeadows> and when we do this  21:15
<alanmeadows> the correct pci_devices entries for that older node name (but now updated)  21:15
<alanmeadows> are clobbered  21:15
<alanmeadows> and all set back to available  21:16
<alanmeadows> we're just trying this approach out on one host  21:16
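
A similarly hedged sketch for the pci_devices side (placeholders again; in Ocata these rows live in the main nova database), to see whether the entries for a given compute node really were reset to available:

    # Hypothetical check; look up the compute_node_id for the host in question
    # from the compute_nodes table first.
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://user:pass@dbhost/nova")  # placeholder
    compute_node_id = 42  # placeholder

    with engine.connect() as conn:
        # Count pci_devices rows per status for this compute node.
        rows = conn.execute(
            text("SELECT status, count(*) AS cnt FROM pci_devices "
                 "WHERE compute_node_id = :cn AND deleted = 0 "
                 "GROUP BY status"),
            {"cn": compute_node_id}).fetchall()
        for status, cnt in rows:
            print(status, cnt)
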
<alanmeadows> https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6658-L6671  21:18
* alanmeadows blinks  21:18
*** efried has quit IRC  21:30
*** efried has joined #openstack-placement  21:30
<jaypipes> mriedem, alanmeadows: that looks to be it.  21:39
<alanmeadows> How much do we trust https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement  21:40
<alanmeadows> This does look to be an answer for rebuilding this data  21:40
<alanmeadows> without having to go off and figure out how to cobble it  21:40
<mriedem> i wrote it  21:49
<mriedem> but that doesn't mean you have to trust it  21:50
<mriedem> run it with --max-count of 1 if you don't trust it  21:50
<mriedem> i thought about adding a --dry-run option but didn't have time  21:50
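
For reference, the cautious first run mriedem is describing would look something like the following (flag names per the linked nova-manage docs; verify against whatever version actually gets backported):

    nova-manage placement heal_allocations --max-count 1
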
<mriedem> mnaser also has a script to fix up allocations i think  21:51
<alanmeadows> whatever the nova elders believe is the best approach  21:53
<alanmeadows> ensuring I can still read docs, I should be fine with an RPC version of 1.28 against a rocky nova-manage to leverage heal_allocations  21:54
<alanmeadows> aka rocky nova-manage on ocata nova/placement for `heal_allocations` seems to be "ok"  21:55
<jaypipes> alanmeadows: honestly, I'm still trying to figure out if the code you link above is actually the thing that is deleting the resource provider and allocation records.  21:56
<jaypipes> mriedem: I mean, wouldn't self.host be equal to the new FQDN in this line? https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6675 and therefore it would not find the old compute node record and call destroy() on it?  21:58
<mriedem> alanmeadows: heal_allocations shouldn't require anything over rpc  22:00
<mriedem> it's all db  22:00
<jaypipes> mriedem: ahhhhhhhhh  22:00
<jaypipes> mriedem: I think I understand now what happened...  22:01
<mriedem> well, was "Deleting orphan compute node" in the logs?  22:01
<jaypipes> alanmeadows: I bet you didn't change the nova.conf file's CONF.host option when you changed the hostname of the compute nodes, right?  22:02
<alanmeadows> mriedem: excellent question, working on an answer to that  22:02
<alanmeadows> jaypipes: we do not use `host` at this time, but clearly after this, we will drive it going forward  22:03
<alanmeadows> to avoid any shuffling without our consent  22:03
<alanmeadows> we let nova determine it  22:04
<alanmeadows> and of course, no one likes moving targets  22:04
<jaypipes> alanmeadows: and in doing so, there was a mismatch between the CONF.host value and what was returned by the virt driver's get_available_nodes() method (called from here: https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6654). the issue is that get_available_nodename() doesn't use CONF.host. It uses the hypervisor's local hostname, which would be different (https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/host.py#L681-L691)  22:06
<jaypipes> alanmeadows: and that's what caused the delete of orphaned compute nodes to run.  22:06
<jaypipes> mriedem: so, basically, nova-compute started up thinking it was the old hostname, libvirt told the compute manager it was the new hostname, and the compute manager deleted the compute node record referring to the old hostname.  22:07
<mriedem> yup, that's what the old evac issue was like  22:09
<mriedem> that code in the compute manager is really meant for ironic  22:09
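
A toy reconstruction (not nova code) of the mismatch jaypipes just summarized, using plain strings in place of compute node objects; the hostnames are made up:

    def find_orphan_nodes(db_hypervisor_hostnames, driver_reported_nodenames):
        """Return DB compute-node names the driver no longer reports."""
        reported = set(driver_reported_nodenames)
        return [name for name in db_hypervisor_hostnames if name not in reported]

    # Before the rename: the DB row and libvirt agree, so nothing is orphaned.
    assert find_orphan_nodes(["compute-01"], ["compute-01"]) == []

    # After the rename: the service (still keyed by the old hostname) loads the
    # old row, but libvirt now reports the FQDN, so the old record looks orphaned,
    # "Deleting orphan compute node" is logged, and the record is destroyed.
    assert find_orphan_nodes(["compute-01"], ["compute-01.example.com"]) == ["compute-01"]
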
<alanmeadows> luckily we don't allow orphan vms to be cleaned  22:09
<alanmeadows> or ... oops.  22:09
*** cdent has quit IRC  22:50
*** efried has quit IRC  22:53
<alanmeadows> looks like heal_allocations will require backporting  23:01
<mnaser> seems like hostname changing fun? :\  23:40
<mnaser> we just reboot servers on hostname changes now  23:40
<alanmeadows> since you popped in mnaser...  23:41
<alanmeadows> mriedem mentioned you had a script for fixing up allocations  23:42
<mnaser> i pasted it somewhere hmm  23:42
<mnaser> it was more meant for cleaning up in the sense of removing entries that should not be there  23:42
<alanmeadows> attempting to slam heal_allocations into ocata is proving... fun  23:42
<alanmeadows> so if you have something more simplistic about  23:42
<mnaser> but you could maybe rewrite it using the foundation to do more  23:42
<mnaser> let me find it  23:42
<mnaser> it was in a launchpad somewhere..  23:43
<mnaser> the world's worst site to search  23:43
<mnaser> alanmeadows: https://bugs.launchpad.net/nova/+bug/1793569  23:44
<openstack> Launchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed]  23:44
<mnaser> http://paste.openstack.org/show/734146/  23:44
<alanmeadows> this is much more hackable  23:45
<mnaser> so the idea is it hits the nova os-hypervisors api  23:45
<mnaser> and then kinda just does an audit comparing things back and forth  23:45
<alanmeadows> I'm not convinced the ocata placement api has everything heal_allocations wants (the report client definitely does not, but was fixing) - that rabbit hole feeling was creeping over me  23:46
<mnaser> if you can keep somewhat the same logic and add a way to make sure entries which are missing get added, it'll be even more useful  23:46
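
In the same spirit as the pasted script (a hedged sketch, not mnaser's paste): pull the hypervisor list from the nova API and the provider list from placement, then report anything that exists on only one side. Endpoints and the token are placeholders; real tooling would use keystoneauth sessions.

    import requests

    TOKEN = "REPLACE_ME"                     # placeholder keystone token
    NOVA = "http://nova-api:8774/v2.1"       # placeholder endpoint
    PLACEMENT = "http://placement-api:8778"  # placeholder endpoint
    HEADERS = {"X-Auth-Token": TOKEN}

    # Hypervisors known to nova and resource providers known to placement.
    hypervisors = requests.get(
        NOVA + "/os-hypervisors/detail", headers=HEADERS).json()["hypervisors"]
    providers = requests.get(
        PLACEMENT + "/resource_providers", headers=HEADERS).json()["resource_providers"]

    hv_names = {h["hypervisor_hostname"] for h in hypervisors}
    rp_names = {rp["name"]: rp["uuid"] for rp in providers}

    for name in sorted(hv_names - set(rp_names)):
        print("hypervisor %s has no resource provider" % name)
    for name in sorted(set(rp_names) - hv_names):
        print("resource provider %s (%s) has no matching hypervisor"
              % (name, rp_names[name]))
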
