mordred | crinkle: fwiw: delete from compute_nodes where id=21; | 00:00 |
crinkle | mordred: cool | 00:00 |
mordred | crinkle: I'm looking for pre-written scripts for doing the soft deletes | 00:00 |
mordred | crinkle: it seems to be ... tricky | 00:01 |
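A minimal sketch of the two database operations this thread converges on, assuming nova's kilo-era schema where rows are soft-deleted (flagged) rather than removed:

```sql
-- Mark an orphaned instance as deleted; this matches the statement
-- mordred runs later in this log:
update instances set task_state='deleted', deleted=1
    where uuid='<instance-uuid>';

-- Purging rows that are already soft-deleted then reduces to:
delete from instances where deleted != 0;
```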
nibalizer | 284468 | 00:01 |
nibalizer | logs ^^ | 00:01 |
*** cody-somerville has joined #openstack-sprint | 00:05 | |
*** dfflanders has quit IRC | 00:20 | |
*** cody-somerville has quit IRC | 00:21 | |
*** delatte has joined #openstack-sprint | 00:58 | |
*** delattec has quit IRC | 01:00 | |
*** yolanda_ has joined #openstack-sprint | 01:05 | |
*** yolanda has quit IRC | 01:18 | |
*** yolanda_ is now known as yolanda | 01:18 | |
*** sivaramakrishna has joined #openstack-sprint | 02:40 | |
*** baoli_ has quit IRC | 03:31 | |
*** baoli has joined #openstack-sprint | 03:40 | |
*** baoli has quit IRC | 03:42 | |
*** baoli has joined #openstack-sprint | 03:43 | |
*** baoli has quit IRC | 03:48 | |
*** yolanda has quit IRC | 04:28 | |
*** mrmartin has joined #openstack-sprint | 05:44 | |
*** mrmartin has quit IRC | 05:48 | |
*** cody-somerville has joined #openstack-sprint | 05:54 | |
pabelanger | sigh | 06:30 |
pabelanger | https://github.com/openstack/puppet-nova/blob/stable/kilo/manifests/init.pp#L603 | 06:30 |
pabelanger | kilo hardcodes /var/log/nova to 0750 | 06:31 |
pabelanger | which causes apache to 404 | 06:31 |
pabelanger | will hack on it in the morning | 06:31 |
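One possible workaround, sketched with Puppet's resource collector override syntax; the resource title and the 0755 mode are assumptions, not what eventually landed:

```puppet
# Reassert the log directory mode after the nova class declares it,
# so apache can serve files under /var/log/nova.
File <| title == '/var/log/nova' |> {
  mode => '0755',
}
```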
*** cody-somerville has quit IRC | 06:34 | |
*** mrmartin has joined #openstack-sprint | 06:38 | |
*** imcsk8 has quit IRC | 06:53 | |
*** imcsk8 has joined #openstack-sprint | 07:06 | |
*** AJaeger has quit IRC | 08:10 | |
*** mrmartin has quit IRC | 08:55 | |
*** trusted has joined #openstack-sprint | 09:08 | |
*** trusted has quit IRC | 09:10 | |
*** sivaramakrishna has quit IRC | 09:10 | |
*** sivaramakrishna has joined #openstack-sprint | 09:11 | |
*** sivaramakrishna is now known as Guest92287 | 09:11 | |
*** yolanda has joined #openstack-sprint | 09:19 | |
*** yolanda has quit IRC | 09:35 | |
*** mrmartin has joined #openstack-sprint | 09:44 | |
*** Guest92287 has quit IRC | 10:19 | |
*** _degorenko|afk is now known as degorenko | 10:30 | |
*** mrmartin has quit IRC | 10:34 | |
*** mrmartin has joined #openstack-sprint | 10:35 | |
*** mrmartin has joined #openstack-sprint | 10:47 | |
*** mrmartin has quit IRC | 11:12 | |
*** rfolco has joined #openstack-sprint | 11:53 | |
*** mrmartin has joined #openstack-sprint | 12:11 | |
*** NobodyCa1 has joined #openstack-sprint | 12:15 | |
*** NobodyCam has quit IRC | 12:17 | |
*** mrmartin has quit IRC | 12:43 | |
*** yolanda has joined #openstack-sprint | 12:45 | |
*** baoli has joined #openstack-sprint | 12:55 | |
*** baoli_ has joined #openstack-sprint | 13:26 | |
*** baoli has quit IRC | 13:28 | |
clarkb | anyone else doing pancakes? otherwise we are heading over soon | 14:33 |
fungi | clarkb: i am not. trying to catch up on some work and have been eating more than i'm used to this week | 14:38 |
fungi | though if anyone's driving to the airport this afternoon and willing to have a passenger, i'm in need of a ride. if not, i'll probably just arrange a cab | 14:39 |
clarkb | I am catching shuttle at noon | 14:42 |
clarkb | which is a bit early for you | 14:42 |
fungi | well, also i'm not booked on your shuttle, so they'd likely balk | 14:48 |
*** NobodyCa1 is now known as Nobodycam | 15:09 | |
*** Nobodycam is now known as NobodyCam | 15:10 | |
anteaya | morning | 15:13 |
*** dfflanders has joined #openstack-sprint | 15:19 | |
fungi | anteaya: how's minnesota? | 15:21 |
anteaya | much the same as fort collins | 15:21 |
anteaya | driving a rav4 | 15:21 |
anteaya | liking it | 15:21 |
anteaya | looks like devstack had a problem last night: http://lists.openstack.org/pipermail/openstack-dev/2016-February/087518.html | 15:22 |
anteaya | sdague and jhesketh worked on it: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-02-25.log.html#t2016-02-25T11:55:42 | 15:22 |
anteaya | and ansible was pinned in devstack-gate: https://review.openstack.org/#/c/284652/2 | 15:22 |
anteaya | that is my summary thus far | 15:22 |
anteaya | and I understand that you got infra cloud connected to nodepool yesterday, yay! | 15:25 |
anteaya | \o/ | 15:25 |
anteaya | nice work | 15:25 |
*** dfflanders has quit IRC | 15:25 | |
fungi | it was exciting to see | 15:27 |
nibalizer | good morning | 15:27 |
fungi | even if that short-lived excitement was soon replaced by new bugs to fix | 15:28 |
clarkb | good morning. jeblair, pabelanger, and I are at the hpe lobby. anyone check into why nodepool can't boot things yet? | 15:28 |
fungi | i'm just about to check out of my room and will be there shortly | 15:29 |
anteaya | yay | 15:30 |
anteaya | fungi: new bugs, yay! | 15:30 |
anteaya | nibalizer clarkb morning | 15:30 |
*** yolanda has quit IRC | 15:32 | |
nibalizer | clarkb: i have done no investigating | 15:37 |
nibalizer | gonna head to the hpe office in a few | 15:37 |
clarkb | https://www.cloudbees.com/press/cloudbees-announces-first-jenkins-based-continuous-delivery-service-aws-and-openstack-private | 15:50 |
nibalizer | heya | 16:07 |
nibalizer | were at the security desk | 16:07 |
fungi | yep | 16:08 |
nibalizer | can has fetching? | 16:08 |
clarkb | pleia2: is on the way | 16:08 |
*** yolanda has joined #openstack-sprint | 16:09 | |
rcarrillocruz | https://review.openstack.org/#/c/284789/ | 16:23 |
rcarrillocruz | reviews pls, fix for the set project quotas playbook | 16:23 |
clarkb | file=/var/lib/nova/instances/979306d6-e169-4ba8-a343-b266b4f422c1/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none | 16:26 |
clarkb | compute029 doesn't have the updated nova.conf | 16:27 |
clarkb | so this may just be a matter of making sure puppet ran everywhere | 16:27 |
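The nova.conf setting being rolled out is presumably the libvirt disk cache mode discussed below (cache=none vs cache=unsafe); a minimal sketch of the relevant stanza:

```ini
[libvirt]
# Makes qemu open file-backed disks with cache=unsafe instead of
# cache=none, trading durability for I/O throughput.
disk_cachemodes = file=unsafe
```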
clarkb | mordred: are you around? new ansible may have broken infracloud puppeting. We don't run puppet against the infracloud hosts | 16:31 |
clarkb | https://jenkins05.openstack.org/job/gate-rally-dsvm-rally-cinder/848/console | 16:32 |
mordred | clarkb: uhm | 16:35 |
mordred | clarkb: I am here - would it be useful for me to dial in? | 16:36 |
clarkb | mordred: sure | 16:36 |
clarkb | we are currently talking about mid-term plans for the clouds | 16:37 |
clarkb | but you are welcome to join in | 16:37 |
mordred | neat | 16:37 |
clarkb | mordred: the other issue is remember that host that nodepool couldn't delete and you cleaned up? it seems to still count against quota | 16:38 |
mordred | clarkb: neat. I may need to delete more records | 16:39 |
clarkb | we bumped quota and are now booting nodes again successfully | 16:39 |
clarkb | but we have one nodes worth of extra quota counted against us and it does not exist | 16:39 |
* jeblair thinks we should just redeploy west ;) | 16:41 | |
clarkb | jeblair: how were you checking disk IO of running VM yesterday? I can start that up on compute029 | 16:42 |
jeblair | iotop | 16:46 |
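A hedged example of that kind of check (standard iotop flags; the qemu filter is an assumption):

```console
# Batch mode, only processes actually doing I/O, five samples 2s apart:
sudo iotop -o -b -n 5 -d 2 | grep qemu
```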
clarkb | we seem to do a consistent 1-2 megabyte per second on that VM | 16:47 |
clarkb | for writes | 16:47 |
jeblair | that's what we saw on 035 yesterday too | 16:47 |
clarkb | note 029 has not updated its libvirt config | 16:47 |
clarkb | because puppet isn't running on the infracloud things | 16:48 |
yolanda | infra-root, is it ok to restart gerrit? it's performing badly | 16:48 |
fungi | yolanda: fine by me | 16:48 |
*** rfolco has quit IRC | 16:56 | |
crinkle | okay I believe https://review.openstack.org/#/c/276375/ is reasonably safe to apply to west though it will cause a short downtime | 16:57 |
Clint | fungi: "ssh: Enable HashKnownHosts by default to try to limit the spread of ssh worms." | 16:58 |
*** rfolco has joined #openstack-sprint | 16:59 | |
*** rfolco has quit IRC | 16:59 | |
fungi | Clint: thanks! i feel like we should just set that globally across all our puppeted servers in fact | 17:00 |
Clint | i certainly do personally | 17:01 |
fungi | i understand the theory as to why it's sometimes a security improvement for personal systems (since it allows an attacker to get a list of other places you may have an account) but for our servers i don't see it being anything beyond a complication | 17:04 |
mordred | fungi: ++ | 17:04 |
clarkb | mordred: [DEPRECATION WARNING]: Using bare variables is deprecated. from puppet run all log | 17:06 |
Clint | an attacker can get that list of places from my shell history | 17:06 |
clarkb | I think that is new with new ansible | 17:06 |
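The warning refers to Ansible 2.0 deprecating bare variable names in loops; a minimal before/after sketch:

```yaml
# Deprecated (bare variable):
with_items: somelist

# Preferred (explicitly templated):
with_items: "{{ somelist }}"
```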
Clint | so all it does is irritate me | 17:06 |
yolanda | crinkle is it ok now to approve https://review.openstack.org/#/c/276375/ ? | 17:09 |
yolanda | if that lands i can remove management address from my patches | 17:09 |
clarkb | nibalizer: puppet did not run | 17:10 |
clarkb | but that's what it said, so it actually tried | 17:11 |
crinkle | yolanda: I think so | 17:11 |
crinkle | except i guess puppet isn't running ^ | 17:11 |
mordred | clarkb: https://review.openstack.org/284820 | 17:11 |
fungi | Clint: true. dat | 17:18 |
*** krtaylor has quit IRC | 17:21 | |
jeblair | http://docs.openstack.org/infra/system-config/sysadmin.html#ssh-access | 17:23 |
jeblair | crinkle: ^ | 17:23 |
crinkle | jeblair: ty | 17:23 |
jeblair | yolanda: can you see my comment on https://review.openstack.org/280326 ? | 17:26 |
*** sivaramakrishna has joined #openstack-sprint | 17:27 | |
yolanda | checking | 17:28 |
yolanda | yes i will fix | 17:29 |
crinkle | nibalizer: https://review.openstack.org/284830 | 17:30 |
nibalizer | ty | 17:30 |
yolanda | jeblair, amended | 17:30 |
jeblair | yolanda: thanks | 17:31 |
nibalizer | fungi: 268366 and 284830 | 17:33 |
nibalizer | sets up infracloud-root | 17:33 |
*** krtaylor has joined #openstack-sprint | 17:33 | |
jeblair | mordred, nibalizer, yolanda: can we chat about https://review.openstack.org/281892 and https://review.openstack.org/243399 ? | 17:35 |
clarkb | nibalizer: Could not find data item openstackci_infracloud_password in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:1188 on node controller00.hpuswest.ic.openstack.org | 17:37 |
clarkb | the rally job succeeded \o/ | 17:52 |
clarkb | so while a little slow we are producing good results | 17:52 |
crinkle | \o/ | 17:57 |
yolanda | jeblair, patches fixed and waiting for review again | 17:58 |
*** sivaramakrishna has quit IRC | 18:04 | |
*** degorenko is now known as _degorenko|afk | 18:04 | |
Clint | https://jenkins01.openstack.org/job/gate-tempest-dsvm-neutron-full/21745/console aborted | 18:06 |
fungi | Clint: did the change which triggered it get a new patchset, or was it in the gate and a change ahead of it failed so testing for it was restarted as new job runs? | 18:09 |
Clint | fungi: no new patchset, no explicit dependency | 18:11 |
fungi | was it for an approved change? | 18:12 |
mordred | sensu_metrics_check: name=swift-sla-metrics plugin=metrics-os-api.py | 18:13 |
mordred | args='-S swift --scheme {{ monitoring.graphite.host_prefix }}' | 18:13 |
nibalizer | i have fixed up controller00's puppet i think | 18:13 |
yolanda | jeblair so we have port 80 in west, but it is not needed | 18:13 |
yolanda | rcarrillocruz sent a change to remove for west | 18:13 |
rcarrillocruz | i pushed a change to disable that for west | 18:13 |
rcarrillocruz | y | 18:13 |
jeblair | yolanda, crinkle: yes -- so let's make it the same | 18:13 |
jeblair | pabelanger: are you doing any logs on the baremetal nodes yet? | 18:14 |
yolanda | jeblair so i think the way is to not use 80, and remove from west, rather than adding to east | 18:14 |
jeblair | pabelanger: or just the controller for now? | 18:14 |
Clint | fungi: got approved ~1h ago | 18:14 |
pabelanger | jeblair: just controllers, but easy to enable on controllers | 18:14 |
pabelanger | err | 18:14 |
pabelanger | computes | 18:15 |
jeblair | pabelanger: okay, so if you're not doing it on baremetal for now, then it may be safe to turn off port 80 on baremetal | 18:15 |
jeblair | crinkle: sound right to you? | 18:15 |
mordred | https://github.com/blueboxgroup/ursula-monitoring/tree/master/sensu/plugins | 18:15 |
pabelanger | jeblair: Ya, a matter of just saying we want it and updating a line in puppet | 18:15 |
mordred | https://github.com/blueboxgroup/ursula-monitoring/blob/master/sensu/plugins/check-glance-store.py | 18:16 |
crinkle | jeblair: yolanda I don't see anything running on port 80...I'm wondering why it was there, nginx is running on 8080 | 18:16 |
mordred | https://github.com/blueboxgroup/ursula-monitoring/blob/master/sensu/plugins/metrics-nova.py | 18:16 |
rcarrillocruz | probably a copy pasta from other definition | 18:16 |
crinkle | could be | 18:17 |
rcarrillocruz | based as a template for baremetal | 18:17 |
fungi | Clint: what change was it? my connection is not well suited to browsing the jenkins webui | 18:17 |
crinkle | jeblair: yolanda I think it should be safe to remove it | 18:17 |
Clint | fungi: https://review.openstack.org/#/c/271599 | 18:18 |
fungi | Clint: oh, it's a cinder change. it's almost certainly still in the gate and a change ahead of it in the integrated queue failed and caused a gate reset | 18:19 |
mordred | https://github.com/blueboxgroup/ursula-monitoring/tree/master/collectd/plugins/openstack | 18:20 |
fungi | Clint: http://status.openstack.org/zuul/ would tell you for sure (i similarly can't easily look at that from this machine) | 18:20 |
Clint | fungi: yup, thanks | 18:21 |
crinkle | this might be useful for the compute nodes https://collectd.org/wiki/index.php/Plugin:virt | 18:23 |
crinkle | since sometimes the controller lies about what the computes are doing | 18:23 |
jeblair | mordred: https://review.openstack.org/281892 and https://review.openstack.org/243399 | 18:26 |
yolanda | infra-root, there is an extra +2 needed for patches on infra-cloud apart from jeblair, care reviewing those ? | 18:26 |
mordred | https://galaxy.ansible.com/openstack-infra/puppet/ btw | 18:35 |
clarkb | nibalizer: has puppet updated the libvirt settings on compute029 yet? | 18:36 |
yolanda | jeblair, resent | 18:36 |
clarkb | if that improves performance we can probably merge your change to bump max-servers | 18:37 |
nibalizer | clarkb: checking | 18:37 |
fungi | nibalizer: why is 268366 using depends-on instead of being rebased onto 284830 (they're in the same git repo) | 18:38 |
nibalizer | clarkb: yes it ran | 18:39 |
clarkb | nibalizer: if you look in /etc/nova/nova.conf is something=file=unsafe present | 18:39 |
clarkb | ? | 18:39 |
clarkb | if so check the currently running VM for its qcow2 disk line in ps | 18:40 |
nibalizer | i dont see a vm in 029 | 18:40 |
clarkb | scheduler is probably moving them around now | 18:40 |
fungi | is anyone driving to the denver airport this afternoon/evening i can bum a ride from? i don't really care what time... my flight doesn't take off until after midnight so i expect to just work from the airport for a while anyway | 18:46 |
fungi | if no, not a big deal, just don't want to waste money on cab fare needlessly | 18:46 |
crinkle | yolanda: jeblair https://review.openstack.org/284872 | 18:51 |
pabelanger | jeblair: mordred: I asked the trystack team to publish their grafana dashboard tooling too: https://github.com/trystack/trystack-collectd as another example of how to get the data | 18:53 |
yolanda | crinkle, pleia2, or anyone interested, do you want to join a skype call about move of the servers in an hour ? | 18:54 |
nibalizer | clarkb: hrm | 18:54 |
jeblair | pabelanger: cool, and yeah, more collectd | 18:54 |
crinkle | yolanda: yes | 18:54 |
nibalizer | so according to openstack server show the vm is on 029 | 18:54 |
nibalizer | but when i ps on 029 it does not exist | 18:54 |
nibalizer | so i wonder if we lost another one | 18:54 |
pleia2 | yolanda: I assumed we'd do it here in the room with the phone thing | 18:54 |
clarkb | nibalizer: if it updated the hostname when it restarted then that seems likely | 18:54 |
nibalizer | clarkb: yea the vm doesn't ping | 18:54 |
pleia2 | we can use the skype phone number on the conference a call device | 18:54 |
yolanda | i had an invite with skype so we need to change the method | 18:55 |
yolanda | ah ok | 18:55 |
clarkb | so we may want a controlled puppet run everywhere | 18:55 |
pleia2 | I don't use skype, I always use the phone numbers :) | 18:55 |
clarkb | then redo mordreds cleanups | 18:55 |
yolanda | let's do with skype phone | 18:55 |
crinkle | nibalizer: yep nova hypervisor-list | grep compute029 has one up one down | 18:55 |
nibalizer | gg | 18:55 |
clarkb | skype stopped working on linux this week too | 18:55 |
clarkb | so ya bad skype | 18:55 |
nibalizer | our mirror pings | 18:55 |
nibalizer | because its a good mirror | 18:56 |
*** AJaeger has joined #openstack-sprint | 18:56 | |
clarkb | I am on shuttle now so service may be spotty but ya just getting it everywhere then cleaning up may get us past this | 18:59 |
fungi | pleia2: yolanda: and we can dial the conference phone into pbx.openstack.org if there's more than one person not in the room who wants to be on the call | 19:00 |
crinkle | yolanda: https://review.openstack.org/#/c/277605/5/manifests/site.pp | 19:06 |
yolanda | jeblair, crinkle, can you revisit https://review.openstack.org/260018 ? fixed rabbit port to use ssl | 19:09 |
*** baoli_ has quit IRC | 19:11 | |
yolanda | fungi ok i'll share phone number and conf and we can do it on the room, sounds good | 19:22 |
nibalizer | ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174 | devstack-trusty-infracloud-west-8311836 | ACTIVE | public=15.184.55.16 | 19:27 |
yolanda | fungi thanks for approving, i'll look at puppetmaster to babysit | 19:32 |
clarkb | nibalizer: is that a new host with better cache? | 19:36 |
nibalizer | clarkb: no thats an orphaned node | 19:37 |
nibalizer | that needs the mordred cleanup funtime | 19:37 |
clarkb | ah | 19:37 |
clarkb | we should write up a proposal for nova to do that | 19:40 |
clarkb | possibly a no really just delete from db please | 19:40 |
clarkb | and maybe we can identify compute hosts with unique id set in config? | 19:41 |
clarkb | then this only changes if we explicitly change it in config | 19:41 |
clarkb | I wonder how something like mesos does that | 19:42 |
jeblair | or generate a uuid and store it in host-local storage on boot? | 19:42 |
jeblair | (first boot) | 19:42 |
jeblair | (and if you blow away your host, you get another uuid and you need to clean up the old one, but so what?) | 19:43 |
clarkb | ya as long as it is more explicit than hostname | 19:44 |
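A minimal sketch of jeblair's first-boot idea, assuming a host-local state file (the path and helper name are hypothetical):

```python
import os
import uuid

STATE_FILE = '/var/lib/compute-host-id'  # hypothetical location

def host_id():
    """Return a stable per-host UUID, generating it on first boot."""
    if not os.path.exists(STATE_FILE):
        with open(STATE_FILE, 'w') as f:
            f.write(str(uuid.uuid4()))
    with open(STATE_FILE) as f:
        return f.read().strip()
```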
*** baoli has joined #openstack-sprint | 19:49 | |
*** krotscheck is now known as krotscheck_dcm | 19:50 | |
*** baoli has quit IRC | 19:51 | |
nibalizer | https://review.openstack.org/#/c/284820/ | 20:02 |
nibalizer | infra root should be able to do that | 20:02 |
clarkb | nibalizer we can manually remove that instance from nodepool db too, since it is gone cloud side | 20:02 |
nibalizer | clarkb: ok | 20:02 |
nibalizer | would i have to stop nodepool? | 20:02 |
clarkb | that should boot a new thing with cache hopefully | 20:02 |
clarkb | nibalizer: no | 20:02 |
nibalizer | we're doing a conf call to determine the future of the gear | 20:02 |
clarkb | ah | 20:02 |
clarkb | just thinking it would be good to discover if cache mode fixes performance with io | 20:03 |
clarkb | but incapable of doing anything myself so don't worry about it | 20:03 |
jeblair | i wonder if "missing hardware" might explain some of the nodes that are unresponsive :) | 20:08 |
crinkle | afaik there are only two that are unresponsive https://etherpad.openstack.org/p/infra-cloud-inventory-status | 20:10 |
crinkle | others might not be fully deployed or have other miscellaneous errors | 20:10 |
mordred | it'd be great if we could get ipv6 in the new ecopod | 20:10 |
mordred | :) | 20:10 |
clarkb | +1 | 20:11 |
mordred | and also maybe not so much with the tagged vlans | 20:11 |
mordred | AND | 20:11 |
mordred | if they're unracking everything anyway | 20:11 |
mordred | maybe they can do battery remediation | 20:11 |
clarkb | tagged is fine if we can have two nics | 20:12 |
mordred | yah | 20:12 |
clarkb | noe that we grok vlans eith glean and friemds | 20:12 |
mordred | what? | 20:13 |
clarkb | now that we grok vlans | 20:14 |
clarkb | the trouble before was around interface setup when needing to tag | 20:14 |
clarkb | but thats mostly solved and leaves us with neutron funkyness on one interface | 20:15 |
rcarrillocruz | folks | 20:25 |
rcarrillocruz | for the ansible role to create resources baseline | 20:25 |
rcarrillocruz | what are the resources that are common to all clouds | 20:26 |
rcarrillocruz | trying to model it | 20:26 |
rcarrillocruz | flavors are common to all clouds i assume | 20:26 |
rcarrillocruz | ? | 20:26 |
rcarrillocruz | anything else? | 20:26 |
rcarrillocruz | images | 20:26 |
rcarrillocruz | ... | 20:26 |
clarkb | flavors images users/projects/domains | 20:28 |
rcarrillocruz | that's it? | 20:28 |
rcarrillocruz | thanks | 20:28 |
clarkb | the mirror maybe but I think thats a layer above | 20:28 |
rcarrillocruz | yeah | 20:28 |
mordred | crinkle (and everyone else) this: http://paste.openstack.org/show/488271 is a pseudo-code first draft of the delete script | 20:29 |
clarkb | networks | 20:29 |
clarkb | and subnets | 20:29 |
mordred | I took the previous script which was using pt-archiver to move things to shadow tables | 20:30 |
mordred | and am rewriting it ot just delete things | 20:30 |
mordred | because screw that | 20:30 |
mordred | also, it'll now take a server argument which is the UUID of a server to purge from the db | 20:30 |
mordred | I've also got a dump of the nova db from west on my laptop so that I can practice a few times :) | 20:31 |
rcarrillocruz | clarkb: ? | 20:31 |
rcarrillocruz | i had the impression you get the netowrks and subnets created for you on public cloud providers | 20:31 |
rcarrillocruz | so it's not really common to all clousd? | 20:32 |
mordred | it depends | 20:32 |
mordred | some clouds create them for you, some do not | 20:32 |
rcarrillocruz | ok | 20:32 |
rcarrillocruz | tell you what | 20:32 |
mordred | same with flavors - some clouds we create them, some we do not | 20:32 |
rcarrillocruz | i'll model it in a yaml | 20:32 |
mordred | yah | 20:32 |
mordred | ++ | 20:32 |
mordred | I think that's a great first step | 20:32 |
rcarrillocruz | with a dict as 'common_cloud_resources" | 20:32 |
mordred | images is a thing we want to be the same everywhere | 20:32 |
rcarrillocruz | those get created oin all clouds defined | 20:32 |
rcarrillocruz | then in a cloud/region basis we can have specific additional resources | 20:32 |
mordred | yah | 20:32 |
rcarrillocruz | not mutually exclusive | 20:33 |
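A sketch of the layout rcarrillocruz describes, with resources common to every cloud plus per-cloud extras (all names and values hypothetical):

```yaml
common_cloud_resources:
  flavors:
    - {name: nodepool-8g, ram: 8192, vcpus: 8}
  images:
    - ubuntu-trusty
clouds:
  infracloud-west:
    extra_resources:
      networks:
        - {name: public, external: true}
```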
anteaya | oh good news rcarrillocruz and clarkb, dougwig and kevinbenton were sitting close to me at lunch and are willing to listen to you tell them your use case involving default security groups | 20:35 |
anteaya | I suggested next week might be a good time for a chat | 20:35 |
anteaya | and recommended they try to find you in infra | 20:36 |
rcarrillocruz | nice anteaya | 20:38 |
clarkb | denver is an airport where pre would be useful | 20:38 |
rcarrillocruz | mordred: fwiw, can't wait to see this coming: https://github.com/ansible/ansible/issues/13262 | 20:38 |
rcarrillocruz | include with_items workaround is nasty :/ | 20:39 |
Clint | clarkb: they don't have pre? | 20:39 |
clarkb | I dont | 20:39 |
Clint | ohhh | 20:39 |
clarkb | portland + pre is not super useful | 20:39 |
clarkb | lines what lines | 20:39 |
clarkb | denver on the other hand | 20:39 |
clarkb | also there are special baggage claims for skis | 20:42 |
*** _degorenko|afk has quit IRC | 20:42 | |
*** _degorenko|afk has joined #openstack-sprint | 20:43 | |
rcarrillocruz | folks, can you +2+A https://review.openstack.org/#/c/284801/ , had to rebase | 20:44 |
rcarrillocruz | thx fungi | 20:47 |
pabelanger | So, crinkle was mentioning that our puppet-infracloud jobs in the gate were pretty slow. Did we want to review the jobs and possibly trim them? EG: setup -nv for centos / fedora and/or drop beaker jobs? | 20:49 |
pabelanger | since we depend on bare-precise / bare-trusty and are limited by rax | 20:50 |
crinkle | pabelanger: in this case they were waiting to run puppet-lint and puppet-syntax jobs | 20:51 |
rcarrillocruz | mordred, clarkb: http://paste.openstack.org/show/488275/ | 20:52 |
pabelanger | crinkle: right, which is on bare-trusty nodes (rax only). i think we could talk with fungi and convert -infra to bindep jobs (which use ubuntu-trusty nodes) and run on all clouds now | 20:53 |
fungi | pabelanger: the challenge i think will be that the plan so far has been to switch jobs over on a per-job-template basis (since we need to force them to specific node types), so we'd need to use slightly different templates just for the jobs using bindep if we're talking about splitting some projects to this and not others which used the same job-templates | 20:55 |
pabelanger | fungi: right | 20:56 |
fungi | that also complicates the unwinding once we finish migrating | 20:57 |
fungi | so would want to make sure we track what can be recombined into common job-templates again in the end | 20:57 |
AJaeger | fungi, yeah - I had to duplicate the tox job for moving the manual jobs in 283445 | 21:02 |
fungi | pabelanger: also i have a feeling we're about to be not landing many puppet-infracloud changes soon for at least some weeks while all the hardware is offline | 21:02 |
anteaya | clarkb: yes denver is an airport where pre would be useful | 21:02 |
AJaeger | and right now our systems are really busy - 380 changes in check queue... | 21:03 |
pabelanger | fungi: agreed. Just looking at optimizations | 21:03 |
fungi | clarkb: current discussion is around whether we should be doing 802.3ad or similar and then 802.1q over that. you almost certainly have some input here | 21:13 |
crinkle | could someone check on the puppetmaster and see if the last puppet run on controller00 was successful? it looks like puppet has run on the controller but not on the computes so the computes are trying to reach a rabbitmq that has moved ports | 21:14 |
fungi | clarkb: as opposed to putting different native vlans on the two uplinks | 21:14 |
crinkle | o.0 Feb 25 21:20:30 controller00 puppet-user[10150]: Could not find data item elasticsearch_nodes in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:8 on node controller00.hpuswest.ic.openstack.org | 21:21 |
crinkle | root@controller00:~# grep elasticsearch_nodes /opt/system-config/production/hiera/common.yaml | 21:22 |
crinkle | elasticsearch_nodes: | 21:22 |
crinkle | aha http://git.openstack.org/cgit/openstack-infra/ansible-puppet/tree/defaults/main.yml#n15 | 21:25 |
*** cody-somerville has joined #openstack-sprint | 21:25 | |
nibalizer | cody-somerville: http://paste.openstack.org/show/488279/ | 21:25 |
yolanda | infra-root , are you familiar with error | 21:27 |
yolanda | fatal: [controller00.hpuseast.ic.openstack.org]: FAILED! => {"changed": false, "disabled": false, "error": true, "failed": true, "invocation": {"module_args": {"environment": null, "facter_basename": "ansible", "facts": null, "logdest": "syslog", "manifest": "/opt/system-config/production/manifests/site.pp", "puppetmaster": null, "show_diff": false, | 21:27 |
yolanda | "timeout": "30m"}, "module_name": "puppet"}, "msg": "puppet did not run", "rc": 1, "stderr": "", "stdout": "", "stdout_lines": []} | 21:27 |
yolanda | how can i better debug it ? | 21:28 |
crinkle | yolanda: please see what i pasted above ^ | 21:28 |
crinkle | the last ansible-puppet change broke the hiera config | 21:28 |
yolanda | crinkle there was a typo on the flag, i fixed with https://review.openstack.org/284936 | 21:29 |
yolanda | but still fails for me | 21:30 |
fungi | nibalizer: i'm reading through lspci... two gigabit interfaces, a mellanox 10g interface... i wonder what the fourth nic is? | 21:30 |
yolanda | but yes , apart from that there is something with hieradata now | 21:31 |
crinkle | yolanda: the hiera.yaml is pointing the hiera data at :datadir: "/etc/puppet/hieradata" | 21:31 |
clarkb | fungi: neutron wants its own logical interface | 21:31 |
clarkb | fungi: so link aggregation doesn't really help | 21:31 |
clarkb | you can do link aggregation but you still need two logical interfaces | 21:31 |
yolanda | nibalizer, can you check the hiera paths in ansible-puppet? looks like they are not really matching our needs | 21:34 |
clarkb | I think the ideal would be 10Gbe for neutron, and 1gig or better for control | 21:34 |
clarkb | and on the controller use 10Gbe for image stuff | 21:34 |
clarkb | actually you want 10Gbe for all the things because images | 21:35 |
fungi | clarkb: sounds like these have a 10gbe mellanox interface and then a couple of 1gbe nics | 21:35 |
fungi | the hardware in west anyway | 21:35 |
yolanda | mmm | 21:35 |
yolanda | :yaml: | 21:35 |
yolanda | :datadir: "/opt/system-config/" | 21:35 |
yolanda | needs to be that, not /etc/puppet/hieradata... | 21:35 |
clarkb | looks like the unhappy VM is still in nodepool and still unhappy | 21:35 |
Clint | fungi: if you up all 4 interfaces, how many of them have link? | 21:35 |
clarkb | mordred: ^ you are working on cleaning it out of cloud? has that happened yet? | 21:36 |
fungi | also i have a feeling the ipmi boot isn't going to work over the 10gbe nic | 21:36 |
clarkb | mordred: not sure if nodepool needs encouragement or if it isn't expected to work yet | 21:36 |
fungi | Clint: two | 21:36 |
fungi | eth0 and eth1 have no carrier, eth2 and eth3 have carrier | 21:37 |
fungi | after ip link set up | 21:37 |
Clint | fungi: and which one are we using? | 21:37 |
fungi | eth2 | 21:38 |
crinkle | could someone make recommendations on https://review.openstack.org/284939 ? | 21:38 |
Clint | so if eth3 is shared with the ilo port that might make sense | 21:38 |
yolanda | crinkle nibalizer https://review.openstack.org/#/c/284942 | 21:40 |
fungi | Clint: yep, that's what i'm suspecting | 21:41 |
jeblair | Speed: 10000Mb/s | 21:41 |
jeblair | from ethtool eth2 in compute035 | 21:41 |
fungi | Clint: because i see the ilo show up in lspci | 21:41 |
yolanda | and i need https://review.openstack.org/284936 and https://review.openstack.org/284938 to land as well | 21:42 |
fungi | jeblair: did you manually install ethtool? i was only finding mii-tool installed on compute001 | 21:42 |
jeblair | fungi: yep | 21:42 |
mordred | clarkb: I'm still working on the script | 21:43 |
yolanda | infra-root, to fix puppet ^ | 21:43 |
crinkle | yolanda: I +1'd all your things and abandoned my thing | 21:44 |
yolanda | thx | 21:44 |
yolanda | i did some tests changing that parameters on my controller and puppet ran | 21:45 |
jhesketh | Howdy | 21:48 |
nibalizer | yolanda: commented on 284938/1 | 21:48 |
jeblair | jhesketh: o/ | 21:48 |
nibalizer | yolanda: crinkle is the problems with ansible-puppet why the controller00 is failing puppet? | 21:49 |
crinkle | nibalizer: yes | 21:49 |
nibalizer | cool | 21:49 |
crinkle | and most likely everything in infra is failing | 21:49 |
nibalizer | crinkle: suprisingly not | 21:50 |
nibalizer | probably because we do not have manage_true | 21:50 |
crinkle | ah | 21:50 |
nibalizer | er manage_config: true | 21:50 |
jeblair | manage_config=true but yeah | 21:50 |
nibalizer | in the other playbooks | 21:51 |
nibalizer | 0 controller00 puppet-user[11898]: Could not find data item elasticsearch_nodes in any Hiera data file and no default supplied | 21:53 |
nibalizer | crinkle: yolanda | 21:53 |
crinkle | right that is what led to this | 21:53 |
crinkle | there is a hiera.yaml that points hiera data dir to /etc/puppet/hieradata | 21:53 |
jeblair | (is elasticsearch_nodes something that controller00 should be attempting to lookup?) | 21:54 |
crinkle | jeblair: it's in the public common.yaml | 21:54 |
nibalizer | the problem is 284942 | 21:54 |
nibalizer | we need that | 21:54 |
crinkle | jeblair: http://git.openstack.org/cgit/openstack-infra/system-config/tree/manifests/site.pp#n8 | 21:55 |
clarkb | http://www.meetup.com/OpenStack-Colorado/events/228594900/ if you want to learn about magnum and kolla | 21:57 |
Clint | alas, i have a prior engagement | 21:58 |
nibalizer | https://review.openstack.org/#/c/284954/ | 21:58 |
clarkb | mordred: would it be a bad idea to remove that node fomr the nodepool db so that nodepool can boot another machine? | 21:59 |
clarkb | I think we should've applied the new nova compute config everywhere at this point | 21:59 |
yolanda | infra-root can you review https://review.openstack.org/#/c/284942/ ? | 21:59 |
mordred | clarkb: sure. which node is it? | 21:59 |
clarkb | 8311836 with uuid ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174 | 22:00 |
*** delatte has quit IRC | 22:00 | |
mordred | clarkb: done | 22:01 |
mordred | clarkb: I have not yet figured out the quota thing | 22:01 |
clarkb | mordred: done meaning nova no longer knows about it? | 22:01 |
jeblair | clarkb: puppet has been broken; i'm not sure if we know whether it has been applied everywhere | 22:01 |
clarkb | jeblair: rgr | 22:01 |
mordred | clarkb: I have done the same db things to that node as I did to the previous one | 22:02 |
clarkb | mordred: woot thanks | 22:02 |
mordred | update instances set task_state='deleted', deleted=1 where uuid= 'ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174'; | 22:02 |
mordred | for what it's worth | 22:02 |
yolanda | infra-root, i need reviews again for https://review.openstack.org/284938, i refactored | 22:02 |
jeblair | clarkb: where's the setting? | 22:02 |
clarkb | jeblair: /etc/nova/nova.conf grep for 'unsafe' | 22:02 |
jeblair | clarkb: it is applies everywhere | 22:03 |
jeblair | applied | 22:03 |
clarkb | cool | 22:04 |
mordred | clarkb: how many instances does nova think are in use now? | 22:04 |
jeblair | (confirmed with ansible adhoc grep) | 22:04 |
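The ad-hoc check jeblair mentions would look something like this (the host pattern is an assumption):

```console
ansible 'compute*' -m shell -a "grep unsafe /etc/nova/nova.conf"
```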
clarkb | mordred: nova? or nodepool? | 22:04 |
mordred | either | 22:04 |
clarkb | nodepool says 0 | 22:04 |
crinkle | one (the mirror) | 22:04 |
crinkle | from nova's perspective | 22:04 |
mordred | that's in a different project | 22:04 |
mordred | | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL | 17 | 894a11e0a16a4c29bb8b884c1c70bf2c | instances | 2 | 0 | NULL | 0 | 7dbe0f121e424a74be2eed25399e2c75 | | 22:04 |
mordred | | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL | 18 | 894a11e0a16a4c29bb8b884c1c70bf2c | ram | 16384 | 0 | NULL | 0 | 7dbe0f121e424a74be2eed25399e2c75 | | 22:04 |
mordred | | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL | 19 | 894a11e0a16a4c29bb8b884c1c70bf2c | cores | 32 | 0 | NULL | 0 | 7dbe0f121e424a74be2eed25399e2c75 | | 22:04 |
mordred | that's the quota usage for the nodepool project | 22:04 |
mordred | I think I'd like to set instance, ram and core usage to 0 | 22:04 |
mordred | I have now done that | 22:05 |
mordred | I think quotas should be correct now | 22:05 |
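Zeroing the stale usage counters shown above is presumably a direct update to nova's quota_usages table; a sketch using the project id from the paste:

```sql
update quota_usages set in_use = 0
    where project_id = '894a11e0a16a4c29bb8b884c1c70bf2c'
      and resource in ('instances', 'ram', 'cores');
```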
clarkb | the next VM booted should use cache unsafe on its boot disk | 22:07 |
mordred | woot | 22:07 |
clarkb | you can confirm this by running a ps -elf on that process | 22:07 |
clarkb | and part of the command is the string used for the root disk | 22:07 |
clarkb | was saying cache=none should say cache=unsafe | 22:07 |
Clint | clarkb: so it hasn't yet today? | 22:08 |
clarkb | no because puppet wasn't running on the computes | 22:08 |
clarkb | then it ran and it broke the single VM which made nodepool stop making new ones | 22:09 |
Clint | ahh | 22:09 |
fungi | fragility, thy name is openstack | 22:11 |
clarkb | LaunchStatusException: Server ca1453ce-23bb-41e7-ba8c-9e44415f0064 for node id: 8316914 status: ERROR | 22:12 |
clarkb | so thats neat | 22:12 |
mordred | mmm | 22:12 |
mordred | that's a great status | 22:12 |
crinkle | clarkb: run nova hypervisor-list | 22:12 |
jeblair | clarkb: yeah, rabbitmq is broken because of the puppet break | 22:12 |
clarkb | aha | 22:12 |
jeblair | i think all puppet-fixing patches have been aprvd | 22:13 |
jeblair | but only just now | 22:13 |
nibalizer | 284938 | 22:14 |
nibalizer | 284954 | 22:15 |
Clint | full of lies | 22:15 |
jeblair | 936, 938 and 942 | 22:15 |
*** delatte has joined #openstack-sprint | 22:25 | |
yolanda | infra-root, i need https://review.openstack.org/#/c/284447/ | 22:30 |
yolanda | https://review.openstack.org/#/c/284447/ , for controller to work | 22:31 |
*** delattec has joined #openstack-sprint | 22:33 | |
mordred | https://review.openstack.org/284969 | 22:35 |
mordred | clarkb, crinkle: ^^ | 22:35 |
mordred | that should be usable to delete an instance from the db - and to clean out soft-deleted records | 22:35 |
yolanda | crinkle, i fixed... and also https://review.openstack.org/#/c/284881/ needs to land to properly set neutron in west | 22:36 |
*** delatte has quit IRC | 22:36 | |
crinkle | yolanda: commented on 881 | 22:37 |
mordred | perhaps we should just make a cron to install that script and run it every 15 minutes? | 22:39 |
mordred | we pretty much never want to keep actual deleted records around | 22:39 |
crinkle | puppet is still broken, hiera.yaml still has wrong datadir | 22:40 |
crinkle | oh, possibly because 284936 merged after the last puppet run | 22:41 |
fungi | i now have a cab booked departing to den from the hilton garden inn at 5:30pm local if anyone needs to share a ride | 22:46 |
jeblair | mordred: yeah, add it to puppet-infracloud... :| | 22:47 |
crinkle | puppet looks fixed | 22:50 |
* crinkle crosses fingers | 22:50 | |
jeblair | crinkle: \o/ | 22:50 |
yolanda | infra-root, this change https://review.openstack.org/#/c/284447/ is blocking addition of controller in useast... can you review ? | 22:56 |
crinkle | puppet successfully ran on controller00, was it able to start on the computes? | 22:58 |
crinkle | nibalizer: yolanda ^ | 22:58 |
crinkle | my reading of puppetboard makes it look like controller hasn't submitted a report in a couple of hours | 22:59 |
yolanda | and this https://review.openstack.org/284881 needs to land first | 23:00 |
*** delatte has joined #openstack-sprint | 23:01 | |
crinkle | I don't think 881 and 447 will pass puppet apply separately, I think they need to be in the same change | 23:02 |
*** delattec has quit IRC | 23:03 | |
yolanda | infra-root, i abandoned 881 and added west to https://review.openstack.org/284447 | 23:08 |
nibalizer | crinkle: we failed to post facts | 23:10 |
nibalizer | so no | 23:10 |
nibalizer | im debugging | 23:10 |
crinkle | nibalizer: :'( | 23:10 |
nibalizer | weeping uncontrollably | 23:12 |
fungi | thanks yolanda | 23:12 |
fungi | nibalizer: *sniffle* | 23:12 |
yolanda | jeblair, can you add another +2 to 284447 | 23:18 |
jeblair | nibalizer: https://review.openstack.org/284346 | 23:19 |
mordred | jeblair: shouldn't it reference remote_puppet_adhoc? | 23:22 |
mordred | oh. piddle. that didn't land yet because linters | 23:23 |
mordred | silly me - I got indentation of the shellscript wrong | 23:24 |
mordred | https://review.openstack.org/#/c/284352/ | 23:24 |
yolanda | thx jeblair | 23:25 |
yolanda | jeblair http://paste.openstack.org/show/488292/ | 23:38 |
rcarrillocruz | huh | 23:38 |
rcarrillocruz | mordred: there's no os_domain module | 23:38 |
rcarrillocruz | ? | 23:38 |
* rcarrillocruz sad panda | 23:38 | |
mordred | rcarrillocruz: https://github.com/ansible/ansible-modules-extras/blob/devel/cloud/openstack/os_keystone_domain.py | 23:39 |
rcarrillocruz | aha! | 23:39 |
rcarrillocruz | thanks | 23:39 |
mordred | a domain is an admin-only thing, so it gets prefixed with the name of the service | 23:39 |
rcarrillocruz | any reason for 'keystone' in middle | 23:40 |
rcarrillocruz | ah, see the pattern | 23:40 |
rcarrillocruz | something i see there's missing is an os_quota thing | 23:40 |
rcarrillocruz | ? | 23:40 |
rcarrillocruz | in case, would that be os_server_quota | 23:40 |
rcarrillocruz | os_port_quota | 23:40 |
rcarrillocruz | etc | 23:40 |
rcarrillocruz | or os_quota with the resource passed in as param | 23:41 |
rcarrillocruz | thinking on potential work streams for myself | 23:41 |
rcarrillocruz | mordred: ^ | 23:43 |
mordred | hrm | 23:44 |
mordred | I think probably os_nova_quota | 23:44 |
mordred | again - it's an admin thing, so you konw you're setting nova quotas | 23:44 |
rcarrillocruz | erm, yeah | 23:45 |
mordred | I think we could also have the ansible modules do validation that the project_id you're passing in is a valid project_id | 23:45 |
crinkle | have we put together an etherpad yet for our networking request? and/or could someone start that? | 23:49 |
crinkle | clarkb: can you review and help expand on the bottom of https://etherpad.openstack.org/p/mitaka-infra-midcycle so we can send it as an email to the dc ops team? | 23:53 |
*** krtaylor has quit IRC | 23:58 |