Friday, 2015-03-20

*** hichihara has joined #openstack-infra00:00
anteayawell we never fail to mark feature freeze in new and exciting ways00:01
mordredclarkb: do we make nova quota queries in nodepool?00:01
mordredclarkb: I thought we just hard-coded all of that in our config file00:02
clarkbmordred: I dunno, can grep00:02
mordredme too00:02
clarkbgit grep says no00:02
mordredI agree00:02
clarkband that the config determines it00:02
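
A quick way to confirm this from the operator side, rather than grepping the code: nodepool's per-provider capacity of this era was declared statically in its YAML config (a max-servers value per provider), so a short script can print what the daemon believes its "quota" is. This is a minimal sketch; the config path and key names are assumptions about the usual layout, not taken from the running system.

    # Sketch: print the statically configured capacity per provider.
    # The path and the 'providers'/'max-servers' keys are assumptions.
    import yaml

    with open('/etc/nodepool/nodepool.yaml') as f:
        config = yaml.safe_load(f)

    for provider in config.get('providers', []):
        print("%s: max-servers=%s" % (provider['name'], provider.get('max-servers')))
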
*** VijayTripathi has quit IRC00:02
*** dannywilson has quit IRC00:03
cineramapleia2: so it looks like the openstackid bit is working with my patch. zanata does seem to have sprouted an additional login button though which is a bit odd00:03
*** camunoz is now known as camunoz_mtg00:04
pleia2cinerama: I think we'll need to play around with that a bit anyway, since we need to make sure the *only* option for logging in is via openstackid00:04
pleia2cinerama: but yay!00:04
*** ZZelle_ has quit IRC00:04
cineramapleia2: this is of course testing against a local install of openstackid so i don't have any accounts on there as yet but it does seem to be redirecting correctly00:04
cineramapleia2: oh, both buttons go to openstackid...00:05
pleia2cinerama: so it's more than theoretical!00:05
cineramapleia2: as i said, it's a bit odd :)00:05
pleia2cinerama: hah, fun00:05
*** achanda has quit IRC00:05
fungithis feature freeze we took down a major service provider... wondering how we can top that next cycle ;)00:05
*** baoli has quit IRC00:06
jeblairmordred: i think we're going to continue to try to delete those servers00:06
* fungi knows we probably didn't, but that's what history will record00:07
fungishould we stop nodepool, delete those rows and then clean up aliens later?00:07
*** baoli_ has joined #openstack-infra00:07
clarkbcan we safely delete those rows out from under nodepool?00:07
clarkbprobably not00:07
clarkbso shutdown is required00:07
clarkb(just thinking out loud here)00:08
pleia2fungi: I'm going to go with that story00:08
*** baoli_ has quit IRC00:08
*** asselin_ has joined #openstack-infra00:08
*** garyh has quit IRC00:09
jeblairfungi, clarkb: i think row locks will prevent us from deleting while running.  stopping, delete, then alien cleanup later is probably best.00:09
mordredjeblair: oh - that's a good point00:09
fungii can give that a shot now unless someone else is already doing it00:10
jeblairfungi: go for it00:10
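
For reference, the row cleanup jeblair describes only needs a couple of statements once the daemon is stopped. A minimal sketch with SQLAlchemy; the table and column names follow nodepool's schema of the time and the connection URL is a placeholder, so treat all of it as assumptions.

    # Sketch: with nodepool stopped, drop the hpcloud rows so it stops retrying deletes.
    # Table/column names and the connection URL are assumptions, not verified here.
    from sqlalchemy import create_engine, text

    engine = create_engine('mysql://nodepool:secret@localhost/nodepool')  # placeholder URL
    with engine.begin() as conn:
        result = conn.execute(
            text("DELETE FROM node WHERE provider_name LIKE :prov"),
            prov='hpcloud%')
        print("removed %d rows" % result.rowcount)

Alien instances left behind in the cloud are then cleaned up separately, which is the "alien cleanup later" part of the plan.
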
*** oomichi has joined #openstack-infra00:11
*** garyh has joined #openstack-infra00:11
*** emagana has quit IRC00:11
anteayayay the nova patch ttx needed merged00:13
reedwoot00:13
fungiokay, it's running again and the hpcloud nodes are all gone from the db00:13
anteayajust trove and cinder to go00:13
anteayacool00:13
jeblairfungi: the log seems to not be mentioning hpcloud which is good00:14
*** YorikSar has quit IRC00:14
SlickNikanteaya: trove patch is in the gate as well. *fingers crossed*00:14
reedmeanwhile I'm patting myself on the back for having understood how mediawiki template work (at very high level)  Check this out https://wiki.openstack.org/wiki/Template:InternshipIdea00:14
* reed proud of the useless (almost) knowledge accumulated00:14
anteayaSlickNik: yes I see that, and I have my reservations about that patch and shared my thoughts with ttx as well00:14
fungireed: cool--we have something similar for the third-party ci systems pages too00:15
anteayaSlickNik: the fact that patchset 2 failed jenkins 5 times disconcerts me, but it is your project00:15
anteayaand patchset 5 failed 3 times, etc. etc.00:15
reedfungi, neat-o00:15
reedfungi, now all those need are a category :)00:15
*** yamamoto has joined #openstack-infra00:15
*** sputnik13 has quit IRC00:16
fungireed: https://wiki.openstack.org/wiki/Template:ThirdPartySystemInfo00:16
SlickNikanteaya: I looked into that. The earlier failures we were seeing were caused by a flaky assert in one of the unit tests that made it past the gate.00:16
anteayaSlickNik: devs rechecking a patch rather than fixing it00:16
SlickNikanteaya: I have a different patch that's going through check now to fix that issue.00:16
anteayaSlickNik: yes00:16
anteayaSlickNik: okay00:16
reedfungi, do you mind if I add a category: to that template?00:17
fungireed: an additional category? i'm sure that's fine00:17
reedfungi, or, in other words, how do you collect all the pages create with that template?00:17
anteayareed: what category are you thinking?00:17
reedoh, I see it00:17
*** tiswanso_ has quit IRC00:17
*** koolhead17 has quit IRC00:17
anteayareed: https://wiki.openstack.org/wiki/ThirdPartySystems00:18
reedanteaya, fungi, my bad, I didn't see [[Category:ThirdPartySystems]]00:18
fungireed: yeah, they all wind up in https://wiki.openstack.org/wiki/Category:ThirdPartySystems00:18
reedsupercool00:18
anteayathanks00:18
anteayadidn't know you didn't know00:18
reedthat part makes me hate mediawiki less00:18
reeda little less00:18
SlickNikreferring to https://review.openstack.org/#/c/165995/00:18
anteayawell that's good00:18
*** sdake_ has joined #openstack-infra00:19
openstackgerritDan Prince proposed openstack-infra/system-config: Re-order tripleo Zuul images (to see if it helps)  https://review.openstack.org/16605500:19
SlickNikanteaya: Was talking to fungi about exactly that yesterday. How to get folks to move away from the "recheck" habit.00:19
anteayaSlickNik: what did you come away with as an understanding?00:20
anteayashutting off a test that exposes a race doesn't feel right to me, btw00:20
anteayabut I'm curious to hear your take away from your conversation with fugi00:21
SlickNikanteaya: It's a race in the test, not in the code.00:21
anteayafungi00:21
anteayaokay00:21
SlickNikFor starters: Taking into consideration the number of rechecks a patch has gone through when reviewing the patchset.00:21
*** sdake__ has joined #openstack-infra00:21
*** sdake has quit IRC00:22
anteayathat is a good place to begin, I agree00:23
*** VijayTripathi has joined #openstack-infra00:23
*** stevemar has joined #openstack-infra00:23
SlickNikfungi mentioned that for the current patchset that number is now visible at the top of the review as well — which is super cool.00:23
anteayavery helpful00:24
greghaynesA nice thing we have in tripleo land is http://goodsquishy.com/downloads/s_tripleo-jobs.html which gives us pass rates, having stats on per-job pass rates could be pretty enlightening00:24
greghaynesnot sure if zuul has that already...00:24
clarkbgreghaynes: http://graphite.openstack.org00:25
*** sdake_ has quit IRC00:25
greghaynessince generally recheck is a side effect of a test that doesnt pass very often00:25
clarkbjogo has a set of graphs that he uses00:25
clarkbbuilt from that data00:25
greghaynesnice (although graphite)00:25
anteayaSlickNik: sounds like you have a good place to begin00:26
greghaynesIts neat because even just the % pass rate is a super useful stat00:26
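
Per-job pass rates are derivable from the counters Zuul already publishes to graphite, without a separate dashboard. A rough sketch against the graphite render API follows; the metric paths are an assumption about Zuul's statsd naming of the time and would need checking, and the job name is just an example.

    # Sketch: compute a pass rate for one job from graphite counters over the last week.
    # The metric paths below are assumptions about zuul's statsd naming, not verified.
    import json
    import urllib2

    GRAPHITE = 'http://graphite.openstack.org/render'
    JOB = 'gate-tempest-dsvm-full'  # example job name

    def total(result):
        target = 'stats_counts.zuul.pipeline.gate.job.%s.%s' % (JOB, result)
        url = '%s?target=%s&from=-7d&format=json' % (GRAPHITE, target)
        data = json.load(urllib2.urlopen(url))
        points = data[0]['datapoints'] if data else []
        return sum(v for v, _ in points if v)

    success, failure = total('SUCCESS'), total('FAILURE')
    if success + failure:
        print('%s pass rate: %.1f%%' % (JOB, 100.0 * success / (success + failure)))
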
anteayaSlickNik: well done00:26
*** dprince has joined #openstack-infra00:26
dprinceStill not seeing any Fedora 20 jobs in the TripleO rack.00:28
greghaynesdprince: clarkb says its your keystone00:28
SlickNikSorry I was looking at the graphite metrics.00:28
greghaynesdprince: apparently its configured to not use a routable address00:28
anteayaSlickNik: yup00:28
*** tkelsey has joined #openstack-infra00:28
clarkbgreghaynes: dprince ya just run nova --debug list00:28
dprincegreghaynes: so why are Ubuntu nodes running fine00:28
clarkbyou get back 10.1.8.37 as something to talk to00:29
clarkbdprince: I do not know but in my investigating I ran into ^00:29
SlickNikanteaya: thanks! it's still WIP but we hope to get more disciplined about it.00:29
greghaynesdprince: the other question id have for that is, did anything change?00:29
SlickNikAnd any data we can gather around how we're doing is super useful. :)00:29
*** tjones1 has joined #openstack-infra00:29
greghaynes(on the tripleo ci cloud end)00:29
*** garyh has quit IRC00:29
clarkbdprince: "links": [{"href": "http://10.1.8.37:5000/v2.0/", "rel": "self"}00:30
dprinceclarkb: you may have found a different issue, but by my local testing Fedora nodes fire up fine, and I can get a floating IP too00:30
dprinceclarkb: that is the local IP, weird00:30
dprinceclarkb: let me check some other things00:30
clarkbdprince: until ^ is fixed I don't think there is much I can do from this end00:30
anteayaSlickNik: sure00:31
dprinceclarkb: okay. looking into it00:31
anteayaSlickNik: the biggest thing I have seen is response to a test failure00:31
jeblairclarkb, fungi, mordred: we have an okay to start performing alien cleanup00:31
openstackgerritIan Wienand proposed openstack-infra/nodepool: Ignore stderr for documentation program output  https://review.openstack.org/16605700:31
anteayaSlickNik: the more the devs think oh perhaps something is wrong with my patch00:31
fungijeblair: i'll fire that off now00:31
anteayaSlickNik: the better the quality of patches00:31
jeblairclarkb, fungi, mordred: i think they're going to ask us to turn things on again to verify the issue00:32
clarkbjeblair: while we have their attention maybe they can unlock that one node for us?00:32
jeblair(after we cleanup)00:32
anteayaSlickNik: when they believe that a bad structure is preventing their great code from merging, that is when problems arise00:32
jrollanteaya: relatedly, I've found that when I review my own patches, they tend to improve00:32
fungijeblair: i'm not sure i like the sound of that ;)00:32
anteayajroll: good point00:32
jeblairclarkb: good idea -- let's just delete everything and we'll give them a list of what we can't00:32
jeblairfungi: they aren't sure they do either :)00:32
clarkbjeblair: +100:32
*** tkelsey has quit IRC00:32
jrollanteaya: which is probably where that comes from, they re-read the patch to try to find the bug00:33
fungijeblair: clarkb: yep, i'll make an instance uuid list of whatever's left after deletes finish00:33
jeblairfungi: let me/us know if you can use another hand on cleanup00:33
anteayajroll: exactly00:33
SlickNikanteaya / jroll: ++00:33
anteayaSlickNik: so thanks for being proactive00:33
SlickNikit's def a mindset thing.00:33
anteayaSlickNik: that helps00:33
anteayaSlickNik: it is00:34
fungi499 alien nodes00:34
mordredfungi: heh00:34
fungiin hpcloud anyway00:34
jrollwhoa, we broke hp cloud?00:34
jrolllol.00:34
* jroll is somewhat surprised hp fell over first00:35
*** ddieterly has joined #openstack-infra00:35
jeblairfungi: i figure it's probably safe to run deletes in series across N parallel processes, where 6<N<1200:35
*** markvoelker has quit IRC00:37
ianwdprince: i have no idea what's going on because scrollback is huge; but i see fedora 20 and just yesterday i fixed an issue with the latest f20 kernel update that creates a broken extlinux.conf and hence it won't boot.00:38
clarkbianw: this is for the tripleo f20 nodes so we didn't think that was related00:39
ianwok, i figured there was hours of context i'm missing00:40
clarkbianw: and in attempts to investigate I ran into the 10.1.8.37 address coming back from keystone so talking to the cloud was cut short00:40
fungii have 10 parallel loops going over equal slices of the ~500 nodes00:40
fungihopefully this goes fairly quickly00:40
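
The same fan-out can be done in one process instead of ten shell loops. A hedged sketch with novaclient and a thread pool; the client constructor arguments come from the environment and the uuid file name is a placeholder, so this is illustrative rather than the command fungi actually ran.

    # Sketch: delete a list of alien instance UUIDs with bounded parallelism.
    # Credentials come from the environment; 'aliens.txt' is a placeholder file name.
    import os
    from concurrent.futures import ThreadPoolExecutor
    from novaclient import client

    nova = client.Client('2', os.environ['OS_USERNAME'], os.environ['OS_PASSWORD'],
                         os.environ['OS_TENANT_NAME'], os.environ['OS_AUTH_URL'])

    def delete(uuid):
        try:
            nova.servers.delete(uuid)
            return uuid, 'delete accepted'
        except Exception as exc:
            return uuid, 'failed: %s' % exc

    with open('aliens.txt') as f:
        uuids = [line.strip() for line in f if line.strip()]

    with ThreadPoolExecutor(max_workers=10) as pool:
        for uuid, status in pool.map(delete, uuids):
            print(uuid, status)
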
*** adalbas has joined #openstack-infra00:41
reedwow! I didn't know you could include inside a mediawiki page the content of another page by simply adding {{:name_of_the_page}}00:41
fungireed: you can also transclude subsections00:41
reedthey call it 'transclusion' http://www.mediawiki.org/wiki/Transclusion00:41
fungiyep00:41
dprinceianw: could be related00:42
jeblairfungi: have an idea how long each delete is taking?00:42
fungijeblair: i can probably wall clock time one. just a moment00:42
dprinceianw: we haven't had Fedora 20 tripleo jobs running all day00:42
dprinceianw: for TripleO that is (not speaking in general)00:43
fungijeblair: the bad news is, ~30sec each00:43
jeblairfungi: so could take 25 mins00:43
fungijeblair: yep00:43
*** adalbas has quit IRC00:44
fungiunless any "stuck" nodes cause some deletes to take an extra long time00:44
*** asettle has quit IRC00:44
fungiokay, i'm also seeing some go as quickly as 5 seconds, so i think it's extremely variable00:44
fungibecause you know, it's the cloud00:44
fungiwho needs consistency really?00:44
clarkbfungi: it's eventually consistent00:45
clarkbianw: mordred 16579200:45
clarkbianw: I think the whole point is to not raise the exception so that we can list the aliens that we do know about instead of dying early and writing no prettytable00:46
greghaynesI should really make a test for that00:46
greghaynessince its been that kind of week00:46
mordredianw: yes, what clarkb said00:46
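
The fix being discussed amounts to catching per-provider failures and still rendering the table for the providers that did respond. A minimal sketch of that pattern; the provider objects and list_servers() call are placeholders for nodepool's internals, not its real code.

    # Sketch: survive one provider failing during alien listing and still print the rest.
    # list_servers() and the provider objects are placeholders, not nodepool's API.
    import logging
    from prettytable import PrettyTable

    log = logging.getLogger('nodepool.alien-list')

    def alien_list(providers, known_ids):
        table = PrettyTable(['Provider', 'Hostname', 'Server ID'])
        for provider in providers:
            try:
                servers = provider.list_servers()
            except Exception:
                # Keep going; one dead provider should not kill the whole report.
                log.exception('Exception listing aliens for %s', provider.name)
                continue
            for server in servers:
                if server['id'] not in known_ids:
                    table.add_row([provider.name, server.get('name'), server['id']])
        print(table)
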
*** ddieterly has quit IRC00:46
*** dalgaaf has quit IRC00:47
*** tjones1 has quit IRC00:49
clarkbgreghaynes: comment for you on 16568200:50
*** ddieterly has joined #openstack-infra00:51
ianwmordred: ok, well it's still the only thing that writes to stderr like that ... maybe it should log.error .  or ignore me, that's fine too :)00:51
*** bknudson has joined #openstack-infra00:51
greghaynesclarkb: yea, so I was a bit confused and not wanting to read all of nova client code - why is novaclient.images a list of nodes you say?00:52
jeblairclarkb, fungi, mordred: i have to run now, i will check back in after dinner00:52
mordredkk00:52
clarkbgreghaynes: I am pretty sure its listing nodes there00:52
clarkbgreghaynes: because of the ip addrs00:52
fungithanks jeblair00:52
greghaynesclarkb: That list is instantiated as FakeClient.images00:52
greghaynesoh wait00:53
greghaynesits just our abstract list thing00:53
clarkbno create_image is a different method there00:53
greghaynesugh, so this needs some more work00:53
greghaynesthe problem is its used in more than one place00:53
ianwmordred: also, image_list doesn't want the same thing?00:53
clarkbgreghaynes: I don't think it needs major work00:53
clarkbgreghaynes: just a more generic fake object so the code doesn't read funny00:53
greghaynesclarkb: well, I also need to use it for glance images. I think the right thing to do is just get rid of FakeGlanceImage and add the method I need to dummy00:54
clarkbFakeCloudResource or something00:54
clarkbgreghaynes: ya or that00:54
greghaynesYea, that's essentially equivalent00:54
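
Something like the generic fake clarkb suggests could look like the sketch below; the attribute names are guesses at what the fake clients need, not nodepool's actual test code.

    # Sketch of a generic fake for cloud resources (servers, images, ...) that a fake
    # client could hand back in tests. Field names are assumptions, not FakeGlanceImage.
    import uuid


    class FakeCloudResource(object):
        """Stands in for any server or image record returned by a fake client."""

        def __init__(self, name, status='ACTIVE', **extra):
            self.id = str(uuid.uuid4())
            self.name = name
            self.status = status
            for key, value in extra.items():
                setattr(self, key, value)

        def get(self, item, default=None):
            # Allow dict-style access so code that treats records as dicts keeps working.
            return getattr(self, item, default)
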
mordredianw: it may?00:55
greghaynesianw: Ive been fixing those bugs as I go when im adding fail tests, theres a lot of them00:55
greghaynesso its likely other commands need it00:55
clarkbgreghaynes: so you're good with that -1?00:56
greghaynesclarkb: yes00:56
greghaynesI mean, its correct ;)00:56
greghaynesclarkb: another note about that patch - it leaves fake dib images around in the test dir00:58
greghaynesnot sure how much we care about that00:58
clarkbgreghaynes: isn't the test dir a tmpdir fixture?00:59
clarkbgreghaynes: if thats the case it should be fine since the fixture should clean it up00:59
*** mfink_ has joined #openstack-infra01:00
greghaynesoh, good point, we must not be passing a path to the fixture dir for the dib image output dest01:00
*** corvusphone has joined #openstack-infra01:00
greghaynesill mess with that too01:02
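
For reference, the pattern clarkb is describing is the fixtures temp dir, which removes itself on test cleanup; pointing the dib output at it is enough to avoid leftover fake images. A hedged sketch, with the images_dir attribute standing in for whatever nodepool's test config actually uses:

    # Sketch: route image-build output into a self-cleaning temp dir in a test.
    # 'images_dir' is a placeholder for the real config attribute name.
    import fixtures
    import testtools


    class DibImageTest(testtools.TestCase):
        def setUp(self):
            super(DibImageTest, self).setUp()
            # TempDir registers its own cleanup, so nothing is left behind after the test.
            self.images_dir = self.useFixture(fixtures.TempDir()).path
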
*** Sukhdev has joined #openstack-infra01:02
clarkboh btw one suggestion on the jenkins bug I filed is that we use a jenkins cloud slave plugin instead01:03
*** Ryan_Lane has quit IRC01:03
dprinceclarkb: try now01:03
dprinceclarkb: nova --debug lists01:03
dprincelist01:03
clarkbdprince: yup that worked (well I did nova floating-ip-list)01:03
clarkbdprince: I think at least part of the problem is we have leaked floating IPs01:04
clarkbso will clean that up now01:04
dprinceclarkb: I'm not aware of any changes we put into place on our side,  so I'll check on how this keystone setting could have been altered.01:04
dprinceclarkb: otherwise I'm not sure how this ever worked01:05
clarkb{"message": "Unknown auth strategy", "code": 500, "created": "2014-11-23T20:17:29Z"} errors like that for a precise node01:05
dprinceclarkb: Yeah, I saw those too. Possibly related to this keystone change?01:05
clarkbdprince: maybe?01:06
*** mfink_ has quit IRC01:06
dprinceclarkb: I was able to create a Fedora node 45 minutes ago01:06
clarkbanyways let me clean up the floating ips and see if tht makes it a healthier cloud01:06
dprinceclarkb: and a floating ip too01:06
dprinceclarkb: yeah, step at a time. Cleanup and lets see :)01:06
*** LinuxJed_ has quit IRC01:07
*** mayurig has joined #openstack-infra01:08
*** dims has joined #openstack-infra01:08
*** asettle has joined #openstack-infra01:10
*** tnovacik has quit IRC01:10
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Treat non-existant output files as empty files  https://review.openstack.org/16606201:11
*** ghostpl_ has joined #openstack-infra01:11
*** ddieterl_ has joined #openstack-infra01:11
*** ddieterly has quit IRC01:12
fungideletes are wrapping up now01:12
dprinceianw: while I'm waiting is there a ticket open for the Fedora boot error you mentioned?01:12
corvusphoneclarkb: speaking of which, we should check ports and ips on hpcloud01:12
*** mfink_ has joined #openstack-infra01:12
ianwdprince: see the comments in https://review.openstack.org/#/c/165681/1/install_puppet.sh01:12
clarkbcorvusphone: I can do that now01:12
dprinceianw: thanks01:12
ianwdprince: my desire to debug grubby on f20 was/is quite low, especially when it works with a later version of it01:13
clarkbya floating IPs definitely leaked there01:13
dprinceianw: sound fine to me01:13
clarkbstarting a round of deletions for hpcloud FIPs01:13
mordredcorvusphone: WOAH! when did you start corvusphoning?01:13
*** ivar-lazzaro has quit IRC01:13
mordredclarkb: harvard is going toe to toe with unc01:14
clarkbmordred: I have my television on downstairs with no one watching it like a good MURICAN01:14
fungilast of the deletes just finished but there are 52 which didn't delete, so i'm starting a second pass with just those01:14
anteayamordred: when we break hpcloud it appears01:14
clarkbfungi: ok, I just starting floating ip deletion01:15
anteayaclarkb: ha ha ha01:15
clarkband I just got rate limited01:15
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Convert all inline publisher examples to tests  https://review.openstack.org/16606401:15
corvusphonemordred: its just the webchat. I need to put a proper setup on my phone.01:15
*** LinuxJedi has joined #openstack-infra01:15
*** ghostpl_ has quit IRC01:16
clarkbI have restarted floating ip deletes serially01:16
clarkbcan I just say that independently managing 3 different resources in order to get one working VM is really not fun01:17
*** markvoelker has joined #openstack-infra01:17
clarkbespecially when I get rate limited doing it when reality is I need one api call to get one node (or maybe more than one node)01:18
*** asettle has quit IRC01:19
*** asettle has joined #openstack-infra01:19
lifelessratelimiting is a PITA01:21
*** Sukhdev has quit IRC01:21
*** prad has quit IRC01:21
mordredclarkb: HARVARD JUST TOOK THE LEAD01:22
*** markvoelker has quit IRC01:22
mordredclarkb: 3-point shot and the foul01:22
mordredclarkb: 1:15 to go01:22
*** otter768 has joined #openstack-infra01:23
clarkbha I just turned it off because I realized I didn't need it on01:23
clarkbbut maybe I should go back and watch01:23
clarkblifeless: yes xargs needs a rate limit flag01:23
clarkbI could put a sleep in the commands I suppose01:24
lifelessyes01:24
lifelessand cry into your sleep01:24
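
The quick workaround for the rate limiting is to pace the API calls yourself instead of letting xargs fire them as fast as possible. A small sketch; the novaclient handle is assumed to be constructed as in the earlier delete example, and the pause value is arbitrary.

    # Sketch: serial floating-IP cleanup with a fixed pause to stay under the rate limit.
    # 'nova' is assumed to be a novaclient handle built as in the earlier sketch.
    import time

    def delete_unattached_floating_ips(nova, pause=2.0):
        for fip in nova.floating_ips.list():
            if fip.instance_id is None:  # leaked: not attached to any server
                nova.floating_ips.delete(fip)
                time.sleep(pause)  # crude client-side rate limit
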
*** claudiub has quit IRC01:24
*** corvusphone has quit IRC01:24
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Only query jenkins plugins if config provided  https://review.openstack.org/15882601:25
*** corvusphone has joined #openstack-infra01:25
*** dmorita has quit IRC01:26
mordredclarkb: dude. that was almost CRAZY01:26
clarkbdprince: {"message": "Failed to terminate process 16378 with SIGKILL: Device or resource busy", "code": 500, "created": "2015-03-20T01:04:29Z"} is the error I see on a random f20 node01:27
*** otter768 has quit IRC01:27
corvusphonemordred: who won?01:27
clarkbdprince: also floating IPs should be cleaned up now01:27
mordredcorvusphone: unc01:28
mordredcorvusphone: at. the. end01:28
* mordred using weechat android ...01:29
clarkbI am going to run the delete port script on hpcloud now01:29
mordredclarkb: cool01:29
mordredfungi: we ready to start ramping up again yet?01:30
*** garyh has joined #openstack-infra01:30
clarkbfloating IPs were cleaned up so its just the ports left though port list was small so if we have leaked there its minimal01:31
fungieach pass through retrying to delete i manage to knock down a few more, but there are still 45 which haven't succeeded yet01:31
mordredthat's so special01:31
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Convert all inline publisher examples to tests  https://review.openstack.org/16606401:31
fungishould we just give that list of uuids to hpcloud and fire back up?01:31
fungiwe were able to delete >90% anyway01:32
clarkbI am being dragged to dinner01:32
clarkbwill check in later01:32
fungioh, the requests must still be getting processed because it's dropped to 38 now01:33
mordredkk01:33
*** tiswanso has joined #openstack-infra01:33
mordredfungi: so - maybe we should just turn back on and see how it goes?01:34
fungii guess that's what "Request to delete server X has been accepted." means01:34
fungimordred: yeah, it's probably safe to hand-revert 166043 now01:34
mordredfungi: well, it's hand applied :)01:34
mordredfungi: I'll do that now01:35
fungiindeed01:35
mordredfungi: rate: 4.0 -- is that the thing I should adjust to adjust api rate limit?01:36
fungiyeah01:36
mordredrackspace is set to 1.001:36
mordredmaybe I should set hp to that too to be nice?01:36
fungimordred: i think it's inversely named and is actually a frequency?01:38
fungias in the lower the number the faster we poll (value indicating fraction of a second delay between polls)01:38
*** baoli has joined #openstack-infra01:39
corvusphoneRackspace is much lower (faster) than 1.001:40
*** garyh has quit IRC01:40
corvusphone4.0 is one request every 0.8 secs (4.0/5)01:40
corvusphone(5 HP providers)01:41
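
In other words, the provider rate value is treated here as the delay in seconds between API calls for one provider, so the aggregate interval against the cloud is that delay divided by the number of providers sharing it. A tiny sketch of the arithmetic corvusphone is doing:

    # Sketch of the rate arithmetic: per-provider delay vs aggregate request rate.
    rate = 4.0          # seconds between API calls for one nodepool provider
    hp_providers = 5    # hpcloud is split across 5 provider entries

    aggregate_interval = rate / hp_providers
    print("one request every %.1f seconds overall" % aggregate_interval)    # 0.8
    print("%.2f requests per second overall" % (1.0 / aggregate_interval))  # 1.25
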
*** virmitio has quit IRC01:42
mordredcorvusphone: kk. cool01:43
openstackgerritStephanie Miller proposed openstack-infra/puppet-zanata: Add OpenID login provider support to Zanata config  https://review.openstack.org/16607301:43
*** ddieterl_ has quit IRC01:45
*** baoli_ has joined #openstack-infra01:45
*** ddieterly has joined #openstack-infra01:45
*** harlowja_ is now known as harlowja_away01:46
*** otter768 has joined #openstack-infra01:46
mordredyou know - when I get the shade patch done01:47
*** baoli has quit IRC01:48
mordredsome of the metadata caching support may be nice - like flavors01:48
*** garyh has joined #openstack-infra01:49
mordredor - it's possible that "tail -f debug.log | grep hpcloud" is only showing me FlavorListTask ...01:49
fungii'm winding down here, but will try to keep an eye on irc for a little while longer01:49
*** tsg has quit IRC01:50
anteayathe trove patch ttx has been waiting on is in the top of the gate01:51
cineramapleia2, StevenK: did we ever work out the deal with the zanata client?01:51
anteayaand these two cinder patches: https://review.openstack.org/#/q/status:open+topic:cinder-driver-removals,n,z01:52
anteayaand I do believe that is all ttx needs01:52
*** camunoz_mtg has quit IRC01:52
StevenKcinerama: Packaging it is hard.01:54
StevenKcinerama: We don't have support for building and updating the packaging anyway.01:54
StevenKcinerama: I have a WIP patch to change system-config to install the cli client on the proposal slave01:54
cineramaStevenK: sounds like we should just install like we do with other nonpackaged stuff in puppet01:55
StevenKcinerama: Yes, which is what my WIP patch does01:55
cineramaStevenK: oh cool. is it up anywhere yet?01:56
StevenKcinerama: I want to test it before pushing it up01:56
cineramaStevenK: bo-ring :)01:56
StevenKHah01:56
mordredfungi, clarkb, jeblair: it FEELS like nothing is happening, other than hp listing flavors01:56
cineramainsert dos equis man hipchat emoji here01:57
clarkbmordred are there deficit calculations and allocations?01:57
clarkbmordred thats what Ibwould look for in the log01:57
*** tqtran has quit IRC01:58
*** asettle has quit IRC01:58
*** tiswanso has quit IRC01:58
*** garyh has quit IRC02:00
mordredclarkb: 2015-03-20 01:57:16,230 DEBUG nodepool.NodePool:   Deficit: bare-trusty: 0 (start: 232 min-ready: 8 ready: 240 capacity: 75)02:00
*** woodster_ has quit IRC02:00
clarkbok so it wants to start 232 thats good02:00
clarkbbelow that should be the allocations any to hpcloud?02:00
clarkboh! do we still have images?02:01
clarkbmaybe those are building?02:01
*** tiswanso has joined #openstack-infra02:01
mordredclarkb: welll... a) I don't see allocations - but I did just get a 500 error02:01
*** asettle has joined #openstack-infra02:02
mordredclarkb: just went to hell02:03
mordredclarkb: quotas back to 002:03
clarkboh?02:03
mordredyah02:03
mordredsame thing as before02:03
*** unicell1 has quit IRC02:04
*** asselin_ has quit IRC02:04
mordredI show 381 nodes in nodepool in some sort of state02:05
mordredclarkb: does our rate limit apply to database as well?02:05
mordredgah02:06
mordredto delete02:06
clarkbmordred ya02:06
*** camunoz_mtg has joined #openstack-infra02:06
clarkbso maybe bump it up with quota 002:06
mordredok - I just set the rate to 1602:06
mordredwith quota 002:06
*** patrickeast has quit IRC02:08
*** yamahata has quit IRC02:11
clarkbdid that help?02:11
jrollmordred: are these the times where you wish you had access to HP control plane?02:11
mordredjroll: NO02:11
jrolllol02:11
mordredclarkb: the api plane seems to be recovering02:11
dprinceclarkb: are all the F20 nodes failing that way?02:11
jrollinteresting, I would want to figure out what's wrong02:11
anteayaare there times you wish you had access to HP control plane, mordred?02:12
dprinceclarkb: any successful ones?02:12
*** nelsnels_ has joined #openstack-infra02:12
mordredjroll: well, I mean - I sort of would - but I don't like the semblance of culpability for a system I don't own02:12
dprinceclarkb: nm, I can see those as well02:13
clarkbdprince I cant check right now, getting foods02:13
*** sigmavirus24 is now known as sigmavirus24_awa02:14
jrollmordred: I guess I get that; kind of like how y'all think I have magic rackspace powers02:14
jroll:P02:14
*** jyuso has quit IRC02:14
*** nelsnelson has quit IRC02:14
*** tiswanso has quit IRC02:15
mordredclarkb: 16 wasn't enough for them to recover - I've set it to 6402:16
clarkbmordred ok02:16
anteayajroll: you don't have magic rackspace powers?02:16
* anteaya realizes another illusion is blown02:16
jrolllol02:18
jrollanteaya: I have rackspace internal irc and ldap02:18
jrollturns out those are pretty useful02:18
*** markvoelker has joined #openstack-infra02:18
anteayayou do have magic powers02:18
anteayawhew02:18
jroll:P02:18
*** markvoelker has quit IRC02:22
*** mayurig has quit IRC02:23
openstackgerritgreghaynes proposed openstack-infra/nodepool: Monkeypatch Fake Clients for tests  https://review.openstack.org/16568202:23
*** sdake__ has quit IRC02:25
*** sdake has joined #openstack-infra02:25
*** bhunter71 has joined #openstack-infra02:27
greghaynesclarkb: ^ I think that addresses your comments02:28
mordredclarkb: oh god02:29
mordredI just looked at nova source code02:29
greghaynesmordred: some things cant be unseen?02:30
mordred# NOTE(johannes): The quota code uses SQL locking to ensure races don't02:30
mordred# cause under or over counting of resources. To avoid deadlocks, this02:30
mordred# code always acquires the lock on quota_usages before acquiring the lock02:30
mordred# on reservations.02:30
mordred*STABSTABSTAB*02:30
*** kaisers1 has joined #openstack-infra02:31
mordredclearly written by someone who knows nothing about databases02:31
mordredand should stop writing database code02:31
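
The comment mordred is quoting describes a consistent lock-ordering rule: always take the quota_usages lock before the reservations lock so two concurrent callers cannot deadlock on each other. A hedged sketch of that idea in SQLAlchemy terms; the model names and session handling are illustrative, not nova's actual quota code.

    # Sketch of the lock-ordering rule the nova comment describes: lock quota_usages
    # first, then reservations, never the reverse. QuotaUsage/Reservation/session are
    # illustrative stand-ins, not nova's real models.
    def adjust_quota(session, project_id, resource, delta):
        with session.begin():
            usage = (session.query(QuotaUsage)
                     .filter_by(project_id=project_id, resource=resource)
                     .with_for_update()         # lock quota_usages first ...
                     .one())
            reservations = (session.query(Reservation)
                            .filter_by(project_id=project_id, resource=resource)
                            .with_for_update()  # ... then reservations
                            .all())
            usage.in_use += delta
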
anteayaso talking with thingee, he is way more forgiving of ci account ops than I am, but it is his call and I am supporting him02:31
anteayahe might be coming in here and asking for a ci account to be disabled02:31
mordredokie02:31
anteayaI've given him a paste with all the info like the gerrit id if he decides to go ahead with it02:31
anteayattx's nova and trove patches are in02:32
anteayaand I'm going to bed02:32
anteayanight02:32
*** jamielennox is now known as jamielennox|lunc02:32
*** jamielennox|lunc is now known as jamielennox|food02:32
*** kaisers has quit IRC02:32
mordredcorvusphone: I'm pumpkin-ing - I'm lurking, but need to stop doing active things02:32
mordredcorvusphone: the current status is that we're back off for creates and deletes for hp in nodepool are throttled to 6402:33
mordredI think we can find an appropriate throttle number - but I also think there is an issue that should be solved internally too02:33
mordredso I'm not particularly interested in chasing the tip of a thundering herd02:33
mordredwhich is what we're doing right now02:33
*** unicell has joined #openstack-infra02:38
*** bhunter71 has quit IRC02:39
*** woodster_ has joined #openstack-infra02:39
openstackgerritJerry Zhao proposed openstack-infra/nodepool: add option to use ipv6 for image update and node launching  https://review.openstack.org/15617802:40
*** bhunter71 has joined #openstack-infra02:40
tchaypodstufft: around, perchance?02:41
*** sputnik13 has joined #openstack-infra02:45
*** achanda has joined #openstack-infra02:45
*** ujuc has joined #openstack-infra02:45
*** unicell has quit IRC02:45
*** mfink__ has joined #openstack-infra02:48
*** unicell has joined #openstack-infra02:48
*** mfink_ has quit IRC02:48
mordredclarkb, corvusphone: actually, I'm trying one more thing - I'm turning creates back on with the crazy-low rate limit02:50
*** weshay has quit IRC02:51
*** weshay has joined #openstack-infra02:52
*** amotoki has joined #openstack-infra02:53
*** tsg has joined #openstack-infra02:55
*** asettle has quit IRC02:55
*** greghaynes has quit IRC02:58
clarkbmordred how is that going?02:59
*** ghostpl_ has joined #openstack-infra02:59
*** garyh has joined #openstack-infra03:00
mordredclarkb: so far so good03:01
mordredclarkb: last time it took a while before stuff started dying03:01
*** jamielennox|food is now known as jamielennox03:01
mordredclarkb: but I'm 95% convinced that deletes are the problem03:01
mordredso we don't see the problem until we start trying to delete things03:01
clarkbhuh03:02
*** asettle has joined #openstack-infra03:02
*** mwagner_lap has joined #openstack-infra03:03
*** ddieterly has quit IRC03:04
mordredclarkb: I believe it's a thundering herd that's caused by the quota code doing table locks, combined with delete using soft deletes and bad queries so that the delete query quota updating is tying up the create quota calculation in the table lock03:04
mordredso if delete performance is slow, it causes everything to stack up03:04
*** greghaynes has joined #openstack-infra03:05
*** sdake_ has joined #openstack-infra03:05
*** subscope_ has joined #openstack-infra03:05
*** xyang1 has joined #openstack-infra03:08
*** ghostpl_ has quit IRC03:09
*** radez is now known as radez_g0n303:09
*** sdake has quit IRC03:10
mordredclarkb: it seems to not be falling over03:10
*** garyh has quit IRC03:10
*** sdake has joined #openstack-infra03:10
clarkbthats good03:11
clarkbdid they deploy new nova quota code recently?03:11
*** amotoki has quit IRC03:11
mordreddon't think so03:12
mordredI think it was our scheduling fix earlier that triggered this particular interaction03:12
corvusphoneMordred we should probably revert my nodepool patch03:12
corvusphoneIt will serialize deletes03:13
clarkbexcept things have done exceptionally poorly there the last few weeks03:13
clarkbbut maybe this just makes it worse03:13
corvusphoneSuper slow but will not overwhelm them03:13
clarkbya03:13
*** sdake_ has quit IRC03:14
corvusphoneYes. I mean we should do that now while they fix it.  Because frankly this is the worlds easiest dos03:14
mordredcorvusphone: well, the current rate limiting is holding steady03:15
mordredcorvusphone: I have not checked to see if we're winding up with any nodes03:15
*** coolsvap has joined #openstack-infra03:16
mordredcorvusphone: also - the cloud noc folks are very motivated to figure out root cause on the delete thing03:17
*** nelsnels_ has quit IRC03:18
corvusphoneat 12s per request we're probably performing worse than before03:18
mordredyeah03:19
*** markvoelker has joined #openstack-infra03:19
corvusphoneBut I guess it won't hurt to keep it like this03:19
corvusphoneAnd its easy to ramp up if the NOC asks us to03:19
*** jyuso1 has joined #openstack-infra03:19
mordredcorvusphone: well, also - we can try reverting your patch in the morning when we're all awake03:20
*** dims has quit IRC03:20
*** bhunter71 has quit IRC03:20
corvusphoneOk you don't have to convince me :)03:21
*** markvoelker has quit IRC03:23
*** coolsvap|afk has joined #openstack-infra03:23
openstackgerritMerged openstack-infra/nodepool: Move nodepool creation in tests to common method  https://review.openstack.org/16558103:25
*** coolsvap has quit IRC03:25
*** coolsvap|afk is now known as coolsvap03:26
*** coolsvap is now known as coolsvap|afk03:26
*** coolsvap|afk is now known as coolsvap03:27
*** otter768 has quit IRC03:27
*** emagana has joined #openstack-infra03:28
*** otter768 has joined #openstack-infra03:29
*** spzala has quit IRC03:30
*** corvusphone has quit IRC03:31
*** sputnik13 has quit IRC03:33
*** sputnik13 has joined #openstack-infra03:37
*** otter768 has quit IRC03:39
*** gyee has quit IRC03:41
*** sputnik13 has quit IRC03:44
*** achanda has quit IRC03:49
*** ujuc has quit IRC03:51
*** asettle has quit IRC03:52
*** ujuc has joined #openstack-infra03:54
*** sputnik13 has joined #openstack-infra03:55
*** armax has quit IRC03:55
*** asettle has joined #openstack-infra03:56
*** asettle has quit IRC03:56
*** asettle has joined #openstack-infra03:57
*** achanda has joined #openstack-infra03:59
*** achanda has quit IRC04:01
*** mayurig has joined #openstack-infra04:01
*** dannywilson has joined #openstack-infra04:02
*** ddieterly has joined #openstack-infra04:05
*** dannywilson has quit IRC04:05
*** dannywilson has joined #openstack-infra04:06
zaroclarkb: any interest going to NW linux fest this year?04:07
*** sabeen has joined #openstack-infra04:08
*** Sukhdev has joined #openstack-infra04:08
clarkbI thought about it but probably won't make it04:09
*** amotoki has joined #openstack-infra04:09
*** ddieterly has quit IRC04:09
*** achanda has joined #openstack-infra04:12
*** sputnik13 has quit IRC04:15
*** camunoz_mtg has quit IRC04:15
*** achanda has quit IRC04:15
*** VijayTripathi has quit IRC04:16
*** sputnik13 has joined #openstack-infra04:16
*** rlucio has quit IRC04:17
*** Somay has joined #openstack-infra04:19
*** markvoelker has joined #openstack-infra04:19
*** sushilkm has joined #openstack-infra04:20
*** sushilkm has left #openstack-infra04:20
*** mmedvede has joined #openstack-infra04:21
*** mayurig has quit IRC04:21
zaroi think i'll be there, 1st time.  any tip on where to stay?04:23
*** markvoelker has quit IRC04:24
*** dims has joined #openstack-infra04:25
*** wuhg has joined #openstack-infra04:26
*** sputnik13 has quit IRC04:27
*** camunoz_mtg has joined #openstack-infra04:27
clarkbnot really; the only time I went I stayed in a bad hotel off the freeway04:28
wuhghow can i add search by subject keyword to https://review.openstack.org/#/q/status:open+project:openstack-dev/devstack,n,0033de7e000285e904:28
clarkbI would try downtown or water front areas if possible04:28
*** Qiming_ has joined #openstack-infra04:29
clarkbwuhg message:"some message"04:30
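
For example, appending the message: operator to the existing query gives something like the following (the keyword itself is a placeholder):

    https://review.openstack.org/#/q/status:open+project:openstack-dev/devstack+message:"some+keyword",n,z
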
Qiming_hello, infra04:31
thingeewaiting on something that's blocking the tag for Cinder in k3...it has one job stuck in queued for some time.04:31
Qiming_another review is appreciated: https://review.openstack.org/#/c/164963/04:31
thingee166003 review04:31
wuhgclarkb: thanks, it works04:32
*** dims has quit IRC04:32
*** rkukura has quit IRC04:33
*** asettle has quit IRC04:35
*** sputnik13 has joined #openstack-infra04:36
*** tkelsey has joined #openstack-infra04:36
*** tkelsey has quit IRC04:41
*** sputnik13 has quit IRC04:44
*** sputnik13 has joined #openstack-infra04:46
*** Sukhdev has quit IRC04:53
*** ghostpl_ has joined #openstack-infra04:54
*** rkukura has joined #openstack-infra04:54
*** baoli_ has quit IRC04:57
*** yamahata has joined #openstack-infra04:58
*** sputnik13 has quit IRC04:59
*** sputnik13 has joined #openstack-infra05:00
*** amotoki_ has joined #openstack-infra05:00
*** ghostpl_ has quit IRC05:01
*** sigmavirus24_awa is now known as sigmavirus2405:02
*** sputnik13 has quit IRC05:02
*** chlong has quit IRC05:02
*** sputnik13 has joined #openstack-infra05:03
*** ddieterly has joined #openstack-infra05:06
*** achanda has joined #openstack-infra05:08
*** VijayTripathi has joined #openstack-infra05:08
*** ddieterly has quit IRC05:10
*** mriedem_away has quit IRC05:18
*** mriedem has joined #openstack-infra05:18
*** mriedem has quit IRC05:18
*** mriedem has joined #openstack-infra05:18
*** chlong has joined #openstack-infra05:19
*** markvoelker has joined #openstack-infra05:20
*** garyh has joined #openstack-infra05:21
*** markvoelker has quit IRC05:25
*** coolsvap is now known as coolsvap|afk05:28
*** jyuso1 has quit IRC05:31
*** garyh has quit IRC05:31
*** alexpilotti has quit IRC05:35
*** tsg has quit IRC05:35
*** coolsvap|afk is now known as coolsvap05:36
*** sputnik13 has quit IRC05:46
*** sputnik13 has joined #openstack-infra05:47
*** hdd has joined #openstack-infra05:48
*** Somay has quit IRC05:52
*** sputnik13 has quit IRC05:54
*** reed has quit IRC05:55
*** dannywilson has quit IRC05:58
*** xyang1 has quit IRC06:00
*** chlong has quit IRC06:01
*** hdd has quit IRC06:03
*** sdake_ has joined #openstack-infra06:04
*** VijayTripathi has quit IRC06:08
*** sdake has quit IRC06:08
*** sputnik13 has joined #openstack-infra06:11
*** BharatK has quit IRC06:11
*** BharatK has joined #openstack-infra06:12
*** sputnik13 has quit IRC06:12
*** chlong has joined #openstack-infra06:13
*** [HeOS] has quit IRC06:16
*** dims has joined #openstack-infra06:18
*** nilasae has joined #openstack-infra06:18
*** emagana has quit IRC06:18
*** sdake has joined #openstack-infra06:18
*** markvoelker has joined #openstack-infra06:21
*** sdake_ has quit IRC06:22
*** dims has quit IRC06:24
*** markvoelker has quit IRC06:26
*** mrda is now known as mrda-afk06:31
*** garyh has joined #openstack-infra06:32
*** fifieldt has joined #openstack-infra06:32
openstackgerritSteve Kowalik proposed openstack-infra/system-config: Add zanata-cli utility to proposal slave  https://review.openstack.org/16610906:32
*** jamielennox is now known as jamielennox|away06:35
StevenKpleia2, cinerama: ^06:35
*** macjack has joined #openstack-infra06:36
*** deepakcs has joined #openstack-infra06:40
*** macjack has quit IRC06:40
thingeegate queue just restart?06:40
thingeeI had two jobs pending to cut cinder that were five mins left from being done...and now back to an hour06:41
* thingee wants sleep06:41
*** macjack has joined #openstack-infra06:41
*** garyh has quit IRC06:42
*** subscope_ has quit IRC06:42
*** teran has quit IRC06:43
*** jyuso1 has joined #openstack-infra06:44
*** sigmavirus24 is now known as sigmavirus24_awa06:44
*** ghostpl_ has joined #openstack-infra06:44
thingeeand gate just restarted all my builds again06:48
*** emagana has joined #openstack-infra06:49
*** juggler_ is now known as juggler06:50
*** macjack has quit IRC06:50
*** mrunge has joined #openstack-infra06:52
openstackgerrityolanda.robla proposed openstack-infra/project-config: Add stackforge/puppet-nscld  https://review.openstack.org/16592206:53
*** yolanda has joined #openstack-infra06:54
*** emagana has quit IRC06:54
*** ghostpl_ has quit IRC06:55
*** yamahata has quit IRC06:58
*** yamahata has joined #openstack-infra06:58
*** Bsony has quit IRC07:01
*** fandi has joined #openstack-infra07:05
*** fandi has quit IRC07:06
*** fandi has joined #openstack-infra07:07
*** ddieterly has joined #openstack-infra07:08
*** coolsvap is now known as coolsvap_07:09
*** fandi has quit IRC07:10
*** scheuran has joined #openstack-infra07:10
*** fandi has joined #openstack-infra07:10
*** ddieterly has quit IRC07:12
*** fandi has quit IRC07:13
*** fandi has joined #openstack-infra07:14
*** emagana has joined #openstack-infra07:15
*** fandi has quit IRC07:17
*** achanda has quit IRC07:17
*** fandi has joined #openstack-infra07:17
*** emagana_ has joined #openstack-infra07:18
*** achuprin has quit IRC07:19
*** emagana has quit IRC07:20
*** sabeen has quit IRC07:20
*** fandi has quit IRC07:21
*** fandi has joined #openstack-infra07:22
*** markvoelker has joined #openstack-infra07:22
*** emagana_ has quit IRC07:23
*** fandi has quit IRC07:25
*** fandi has joined #openstack-infra07:26
*** markvoelker has quit IRC07:27
*** achanda has joined #openstack-infra07:29
*** fandi has quit IRC07:29
*** fandi has joined #openstack-infra07:30
openstackgerritJan Provaznik proposed openstack-infra/project-config: Create os-cloud-management project on Stackforge  https://review.openstack.org/16543307:31
*** achuprin has joined #openstack-infra07:32
*** fandi has quit IRC07:33
*** fandi has joined #openstack-infra07:33
*** yfried|afk is now known as yfried07:33
openstackgerritgreghaynes proposed openstack-infra/nodepool: Monkeypatch Fake Clients for tests  https://review.openstack.org/16568207:36
*** fandi has quit IRC07:37
*** fandi has joined #openstack-infra07:37
*** camunoz_mtg has quit IRC07:38
*** fandi has quit IRC07:39
*** shardy has joined #openstack-infra07:40
*** garyh has joined #openstack-infra07:42
*** Bsony has joined #openstack-infra07:43
GheRiveromorning07:43
*** chlong has quit IRC07:45
*** ildikov has quit IRC07:48
openstackgerritMerged openstack-infra/project-config: Update puppet-setproxy to belong to Gozer group  https://review.openstack.org/16482007:48
*** arxcruz has joined #openstack-infra07:48
yolandahi AJaeger, thx for the approval. How can we manage to get some people added to the gozer gerrit group?07:49
yolandamorning GheRivero07:49
*** e0ne has joined #openstack-infra07:51
*** yfried is now known as yfried|afk07:51
*** garyh has quit IRC07:53
*** ibiris_away is now known as ibiris07:55
*** markus_z has joined #openstack-infra07:56
*** Somay has joined #openstack-infra07:57
*** jistr has joined #openstack-infra07:58
*** asselin_ has joined #openstack-infra08:00
*** jcoufal has joined #openstack-infra08:01
*** dtantsur|afk is now known as dtantsur08:02
*** Somay has quit IRC08:04
*** asselin_ has quit IRC08:05
*** Somay has joined #openstack-infra08:05
*** __mimir has joined #openstack-infra08:08
*** ddieterly has joined #openstack-infra08:08
*** __mimir has quit IRC08:09
*** dims has joined #openstack-infra08:09
*** __mimir has joined #openstack-infra08:09
*** Somay has quit IRC08:11
*** ghostpl_ has joined #openstack-infra08:11
*** oomichi has quit IRC08:12
*** tnovacik has joined #openstack-infra08:12
*** emagana has joined #openstack-infra08:12
*** Longgeek has joined #openstack-infra08:12
*** ddieterly has quit IRC08:13
*** ominakov has joined #openstack-infra08:13
openstackgerritgreghaynes proposed openstack-infra/nodepool: Don't die while doing alien list  https://review.openstack.org/16579208:14
*** mpavone has joined #openstack-infra08:15
*** emagana has quit IRC08:17
*** ghostpl_ has quit IRC08:17
*** dims has quit IRC08:18
*** _nadya_ has joined #openstack-infra08:19
*** e0ne has quit IRC08:19
*** openstackgerrit has quit IRC08:22
*** openstackgerrit has joined #openstack-infra08:22
openstackgerritgreghaynes proposed openstack-infra/nodepool: Dont die on alien-image-list failure  https://review.openstack.org/16613208:22
*** markvoelker has joined #openstack-infra08:22
*** achanda has quit IRC08:23
*** achanda has joined #openstack-infra08:27
*** ildikov has joined #openstack-infra08:27
*** markvoelker has quit IRC08:27
*** dboik_ has quit IRC08:28
*** deepakcs has quit IRC08:30
*** boris-42 has quit IRC08:32
*** hashar has joined #openstack-infra08:36
*** Bsony_ has joined #openstack-infra08:39
*** Bsony has quit IRC08:40
*** dtantsur is now known as dtantsur|bbl08:41
AJaegeryolanda, wait for one of the infra roots to add you to the gozer gerrit group. Let's ask fungi or clarkb to it during the US morning.08:46
*** achanda has quit IRC08:50
*** marun has quit IRC08:51
*** garyh has joined #openstack-infra08:53
*** stevemar has quit IRC08:54
*** Somay has joined #openstack-infra08:56
*** andreykurilin_ has joined #openstack-infra08:58
*** dannywilson has joined #openstack-infra08:59
*** skolekonov has joined #openstack-infra09:00
*** andreykurilin_ has quit IRC09:03
*** dannywilson has quit IRC09:03
*** garyh has quit IRC09:04
*** andreykurilin_ has joined #openstack-infra09:04
*** ildikov has quit IRC09:06
*** emagana has joined #openstack-infra09:07
*** Somay has quit IRC09:07
*** Ala has joined #openstack-infra09:07
*** andreykurilin__ has joined #openstack-infra09:08
*** yamahata has quit IRC09:09
*** andreykurilin_ has quit IRC09:09
*** ___mimir has joined #openstack-infra09:10
*** emagana has quit IRC09:11
*** Somay has joined #openstack-infra09:12
*** __mimir has quit IRC09:13
*** ghostpl_ has joined #openstack-infra09:13
*** jamielennox|away is now known as jamielennox09:14
*** tkelsey has joined #openstack-infra09:19
*** markvoelker has joined #openstack-infra09:23
*** tkelsey has quit IRC09:24
*** ghostpl_ has quit IRC09:24
*** tkelsey has joined #openstack-infra09:24
*** Longgeek has quit IRC09:24
*** Longgeek has joined #openstack-infra09:25
*** zz_johnthetubagu is now known as johnthetubaguy09:25
*** andreykurilin__ has quit IRC09:27
*** andreykurilin_ has joined #openstack-infra09:27
*** derekh has joined #openstack-infra09:28
*** markvoelker has quit IRC09:28
*** Longgeek has quit IRC09:30
*** dizquierdo has joined #openstack-infra09:32
*** mtreinish has quit IRC09:35
*** mtreinish has joined #openstack-infra09:36
*** amotoki has quit IRC09:37
*** _nadya_ has joined #openstack-infra09:40
*** Qiming__ has joined #openstack-infra09:41
*** yfried|afk is now known as yfried09:43
*** andreykurilin_ has quit IRC09:43
*** ZZelle has quit IRC09:43
*** ZZelle has joined #openstack-infra09:44
*** Qiming_ has quit IRC09:44
*** ominakov has quit IRC09:49
*** ominakov has joined #openstack-infra09:50
yolandahi AJaeger, ok09:52
*** yfried is now known as yfried|afk09:54
*** hichihara has quit IRC09:56
*** BobBall_AWOL is now known as BobBall09:58
*** yfried|afk is now known as yfried09:59
*** emagana has joined #openstack-infra10:01
*** ssam2 has joined #openstack-infra10:02
*** yamamoto has quit IRC10:02
*** garyh has joined #openstack-infra10:04
*** emagana has quit IRC10:05
*** yamamoto has joined #openstack-infra10:05
*** sileht has quit IRC10:07
*** e0ne has joined #openstack-infra10:08
*** yfried is now known as yfried|afk10:10
*** ddieterly has joined #openstack-infra10:10
*** dimsum__ has joined #openstack-infra10:11
*** yfried|afk is now known as yfried10:14
*** ddieterly has quit IRC10:14
*** garyh has quit IRC10:15
*** pblaho__ is now known as pblaho10:15
*** mfink__ has quit IRC10:19
*** hashar has quit IRC10:21
*** sileht has joined #openstack-infra10:22
*** sushilkm has joined #openstack-infra10:23
*** markvoelker has joined #openstack-infra10:24
*** yamamoto has quit IRC10:25
*** Longgeek has joined #openstack-infra10:26
*** yamamoto has joined #openstack-infra10:27
*** yamamoto has quit IRC10:28
*** markvoelker has quit IRC10:29
*** yfried is now known as yfried|afk10:30
*** Longgeek has quit IRC10:31
*** ___mimir has quit IRC10:33
*** rlandy has joined #openstack-infra10:36
*** pc_m has joined #openstack-infra10:40
*** erlon has joined #openstack-infra10:42
*** YorikSar has joined #openstack-infra10:47
openstackgerritValeriy Ponomaryov proposed openstack/requirements: Bumg ddt to min version 0.7.0  https://review.openstack.org/16616210:49
*** sushilkm has left #openstack-infra10:49
*** yfried|afk is now known as yfried10:50
*** ___mimir has joined #openstack-infra10:51
openstackgerritValeriy Ponomaryov proposed openstack/requirements: Bump ddt to min version 0.7.0  https://review.openstack.org/16616210:51
*** yamamoto has joined #openstack-infra10:52
*** BharatK has quit IRC10:52
*** enikanorov has quit IRC10:54
*** emagana has joined #openstack-infra10:55
*** e0ne is now known as e0ne_10:55
*** enikanorov has joined #openstack-infra10:55
*** tkelsey has quit IRC10:56
*** tkelsey has joined #openstack-infra10:56
*** e0ne_ is now known as e0ne10:57
*** emagana has quit IRC11:00
*** Longgeek has joined #openstack-infra11:01
*** tnovacik has quit IRC11:03
*** enikanorov has quit IRC11:04
*** enikanorov has joined #openstack-infra11:05
*** Somay has quit IRC11:05
*** enikanorov has quit IRC11:06
*** ghostpl_ has joined #openstack-infra11:06
*** enikanorov has joined #openstack-infra11:07
*** yfried is now known as yfried|afk11:07
*** Somay has joined #openstack-infra11:08
*** _nadya_ has quit IRC11:09
*** mpaolino has joined #openstack-infra11:09
*** ddieterly has joined #openstack-infra11:11
*** Qiming_ has joined #openstack-infra11:11
*** cdent has joined #openstack-infra11:12
*** baoli has joined #openstack-infra11:13
*** baoli has quit IRC11:13
*** Qiming__ has quit IRC11:15
*** ddieterly has quit IRC11:15
*** garyh has joined #openstack-infra11:16
*** jcoufal has quit IRC11:16
*** enikanorov has quit IRC11:26
*** garyh has quit IRC11:26
*** enikanorov has joined #openstack-infra11:27
*** yfried|afk is now known as yfried11:27
openstackgerritChris Dent proposed openstack/requirements: Update gabbi to 0.12.0  https://review.openstack.org/15625311:29
*** enikanorov has quit IRC11:30
*** enikanorov has joined #openstack-infra11:31
*** ldnunes has joined #openstack-infra11:31
*** yfried is now known as yfried|afk11:37
*** enikanorov has quit IRC11:39
*** enikanorov has joined #openstack-infra11:40
*** dtantsur|bbl is now known as dtantsur11:41
*** otter768 has joined #openstack-infra11:42
*** jlanoux has joined #openstack-infra11:43
*** dprince has quit IRC11:43
*** claudiub has joined #openstack-infra11:45
*** mestery is now known as mestery_afk11:45
*** ominakov has quit IRC11:47
*** otter768 has quit IRC11:47
*** emagana has joined #openstack-infra11:49
*** e0ne is now known as e0ne_11:49
openstackgerritMerged openstack/requirements: Remove failing project nova-docker  https://review.openstack.org/15626011:54
*** emagana has quit IRC11:54
*** fbo has joined #openstack-infra11:54
*** fifieldt has quit IRC11:54
*** fifieldt_ has joined #openstack-infra11:55
*** dizquierdo has quit IRC11:55
*** sdake has quit IRC11:57
*** fifieldt__ has joined #openstack-infra11:58
*** pelix has joined #openstack-infra11:59
*** dizquierdo has joined #openstack-infra11:59
*** e0ne_ is now known as e0ne11:59
*** fifieldt_ has quit IRC12:00
openstackgerritDmitry Tantsur proposed openstack/requirements: Add ironic-discoverd to projects.txt  https://review.openstack.org/15627012:01
*** yfried|afk is now known as yfried12:01
openstackgerritSean Dague proposed openstack-infra/os-loganalyze: extract static methods  https://review.openstack.org/16585012:02
openstackgerritSean Dague proposed openstack-infra/os-loganalyze: unwind test class multiple inheritance  https://review.openstack.org/16585112:02
openstackgerritSean Dague proposed openstack-infra/os-loganalyze: let tests be run from test file location  https://review.openstack.org/16579912:02
*** markvoelker has joined #openstack-infra12:03
*** ghostpl_ has quit IRC12:03
*** Qiming__ has joined #openstack-infra12:06
*** eharney has quit IRC12:07
*** Qiming_ has quit IRC12:07
*** Somay has quit IRC12:10
*** rfolco has joined #openstack-infra12:10
TheJuliagood morning12:10
*** ihrachyshka has joined #openstack-infra12:11
*** dprince has joined #openstack-infra12:11
*** ibiris is now known as ibiris_away12:11
*** ddieterly has joined #openstack-infra12:11
*** yfried is now known as yfried|afk12:11
*** jlanoux has quit IRC12:14
*** radez_g0n3 is now known as radez12:16
*** ddieterly has quit IRC12:16
*** radez_g0n3 has joined #openstack-infra12:16
*** radez_g0n3 is now known as radez12:16
*** ghostpl_ has joined #openstack-infra12:17
*** dkliban_afk is now known as dkliban12:18
*** anthonyper has quit IRC12:18
*** anthonyper has joined #openstack-infra12:18
*** aysyd has joined #openstack-infra12:21
*** ibiris_away is now known as ibiris12:22
openstackgerritMerged openstack/requirements: Bump novaclient version  https://review.openstack.org/16249212:22
*** jaypipes has joined #openstack-infra12:22
Kiallany requirements core besides sean about? https://review.openstack.org/#/c/158287/ :)12:25
*** garyh has joined #openstack-infra12:27
*** Longgeek has quit IRC12:27
AJaegersdague, https://review.openstack.org/#/c/164077/ is needed to fix important bugs in our documentation toolchain, please reconsider your -212:27
*** gordc has joined #openstack-infra12:27
*** baoli has joined #openstack-infra12:29
*** bknudson has quit IRC12:29
sdagueAJaeger: can we just remove the docs from g-r12:29
sdaguebecause honestly, there is no reason for the doc repos to be in there12:30
sdagueespecially as your freeze windows are different12:30
*** sdake has joined #openstack-infra12:30
AJaegersdague, we had this discussion already ;) We really like the syncing of requirements and somebody should implement this in a different way...12:31
sdagueyep, so then you have to live with freeze restrictions12:31
sdagueyou can't have it both ways12:31
AJaegersdague, but I just had one idea: We already sync from openstack-manuals the glossary, we could sync requirements, let me investigate12:31
sdaguemy patience on this point is pretty limitted12:31
*** yfried|afk is now known as yfried12:32
AJaegersdague, it was submitted a week ago - wasn't that before the feature freeze?12:32
sdagueand, honestly, openstack-manuals is such a small number of projects, it's way easier for you folks to sync your projects directly and not do these g-r round trips12:32
sdagueAJaeger: doesn't matter when it's submitted12:32
sdagueit didn't land12:32
AJaegersdague, when do you unfreeze? Is that before Kilo is released?12:33
sdagueafter all integrated projects have stable branches12:33
*** kgiusti has joined #openstack-infra12:34
*** baoli has quit IRC12:34
*** bswartz has quit IRC12:34
sdagueI *literally* have no idea why you think g-r makes any sense for the documentation team12:35
sdagueif the projects weren't in projects.txt you would have already landed these changes in your repos12:35
AJaegersdague, I have to leave for a meeting now - I understand your arguments but need a different solution.12:36
sdagueI don't know why12:36
AJaegerOnce we have one, I happily do the changes on the documentation side.12:36
sdagueno, seriously, you have what, 6 repos?12:36
AJaegersdague, when we did this, we had 10+12:36
*** adalbas has joined #openstack-infra12:36
sdagueright, but you don't now12:37
*** garyh has quit IRC12:37
AJaeger;)12:37
sdagueand, even then, it would have been so much faster for you to locally sync all those than going through the g-r process12:37
sdagueit makes 0 sense that you keep insisting on that12:38
*** hodos has joined #openstack-infra12:40
*** adalbas has quit IRC12:41
*** e0ne is now known as e0ne_12:41
*** Longgeek has joined #openstack-infra12:42
*** unicell1 has joined #openstack-infra12:43
*** emagana has joined #openstack-infra12:43
*** e0ne_ is now known as e0ne12:44
*** unicell has quit IRC12:44
*** markus_z has quit IRC12:46
sdaguefungi: can you land - https://review.openstack.org/#/c/165542/ - I think it will fix some of the es indexing12:47
*** emagana has quit IRC12:48
*** markus_z has joined #openstack-infra12:49
*** ddieterly has joined #openstack-infra12:50
*** sdake_ has joined #openstack-infra12:52
*** adalbas has joined #openstack-infra12:53
*** ddieterly has quit IRC12:53
*** pelix has quit IRC12:53
*** bknudson has joined #openstack-infra12:54
*** baoli has joined #openstack-infra12:54
*** sdake has quit IRC12:55
openstackgerritRafael Folco proposed openstack-infra/system-config: Updates to running-your-own CI docs: Changes required  https://review.openstack.org/16226812:55
*** baoli has quit IRC12:58
*** baoli has joined #openstack-infra12:59
*** ChuckC_ has joined #openstack-infra13:00
*** ChuckC has quit IRC13:01
*** bradjones has joined #openstack-infra13:01
*** ChuckC_ has quit IRC13:05
*** enikanorov has quit IRC13:09
*** mattfarina has joined #openstack-infra13:10
*** enikanorov has joined #openstack-infra13:11
*** ihrachyshka has quit IRC13:14
*** yfried is now known as yfried|afk13:15
*** eharney has joined #openstack-infra13:15
*** xyang1 has joined #openstack-infra13:16
*** ChuckC_ has joined #openstack-infra13:19
*** bswartz has joined #openstack-infra13:19
*** dustins has joined #openstack-infra13:22
*** ildikov has joined #openstack-infra13:23
openstackgerritPaul Belanger proposed stackforge/gertty: Add missing requirement for six  https://review.openstack.org/16621813:24
*** JoshNang has quit IRC13:24
*** eharney has quit IRC13:25
*** zz_dimtruck is now known as dimtruck13:25
openstackgerritPaul Belanger proposed stackforge/gertty: Add missing requirement for six  https://review.openstack.org/16621813:25
*** JoshNang has joined #openstack-infra13:26
*** eharney has joined #openstack-infra13:26
*** dimsum__ has quit IRC13:27
dprinceIf I manually clear out the TripleO RH1 cloud instances will nodepool discover them missing on its next cycle and recreate them?13:32
*** ffrog has joined #openstack-infra13:33
*** nilasae is now known as nilasae|afk13:33
*** eharney has quit IRC13:33
*** Longgeek has quit IRC13:34
*** amotoki_ has quit IRC13:35
mordreddprince: if not, it's super simple to delete them from nodepool's database13:35
*** yfried|afk is now known as yfried13:35
*** amotoki has joined #openstack-infra13:35
*** cdent has quit IRC13:36
dprincemordred: could you delete them for me?13:36
mordreddprince: sure13:36
dprincemordred: the TripleO RH1 zone nodes.13:36
*** jamielennox is now known as jamielennox|away13:36
*** peristeri has joined #openstack-infra13:37
*** emagana has joined #openstack-infra13:37
*** garyh has joined #openstack-infra13:37
mordreddprince: you're in luck - nodepool already doesn't think it has any nodes there13:38
dprincemordred: great. any idea how long before I see new ones spawning?13:39
mordredlet me look at the logs real quick ...13:39
*** wuhg has quit IRC13:39
*** gaelL_ has quit IRC13:39
mordreddprince: should be soon - nodepool shows a demand13:41
mordreddprince: 2015-03-20 13:39:44,584 DEBUG nodepool.NodePool:   Deficit: tripleo-f20: 31 (start: 31 min-ready: 8 ready: 0 capacity: 0)13:41
mordred2015-03-20 13:39:44,603 DEBUG nodepool.NodePool:   Deficit: tripleo-precise: 51 (start: 51 min-ready: 8 ready: 0 capacity: 0)13:41
*** rhe00 has quit IRC13:41
*** gaelL has joined #openstack-infra13:41
openstackgerritMerged openstack/requirements: Import cap.py tool to cap  explicit dependencies  https://review.openstack.org/15545413:41
openstackgerritMerged openstack/requirements: Up pymongo version to avoid memory leak  https://review.openstack.org/12399513:42
dprincemordred: cool, thanks13:42
*** rhe00 has joined #openstack-infra13:42
*** emagana has quit IRC13:42
openstackgerritMerged openstack/requirements: Block eventlet 0.17.0  https://review.openstack.org/15828713:42
mordreddprince: oh! no, we have you turned off ...13:42
*** mpavone has quit IRC13:42
mordreddprince: one sec - let me see what your quota setting should be13:42
openstackgerritPaul Belanger proposed stackforge/gertty: Add support for tox -epep8  https://review.openstack.org/16622913:43
*** sushilkm has joined #openstack-infra13:43
*** sushilkm has left #openstack-infra13:43
*** otter768 has joined #openstack-infra13:43
mordreddprince: k. NOW you should start seeing nodes build13:43
*** ddieterly has joined #openstack-infra13:43
dprincemordred: okay, thanks. Will watch these closely13:44
*** Qiming__ is now known as Qiming13:45
*** yfried is now known as yfried|afk13:45
Qiminghello, openstack-infra, another review of this new project proposal is appreciated: https://review.openstack.org/#/c/164963/13:46
Qimingthanks13:46
dprincemordred: I see them going ACTIVE, and floatingips too13:47
mordreddprince: woot!13:47
dprincemordred: 1 major outage in a year isn't too bad. Thinking the root cause was a MySQL issue of sorts13:47
*** otter768 has quit IRC13:48
*** garyh has quit IRC13:48
dprincemordred: still looking into the logs but simply bouncing MySQL and clearing out some things made it happy again (we think)13:48
*** ildikov has quit IRC13:48
mordredcool! yeah - that's actually pretty solid I think13:48
mordredI mean, we had issues with hp public cloud yesterday that were also mysql related ... so it's fair :)13:48
*** eharney has joined #openstack-infra13:49
*** mtanino has joined #openstack-infra13:50
*** ihrachyshka has joined #openstack-infra13:51
*** tkelsey has quit IRC13:51
*** dimsum__ has joined #openstack-infra13:51
*** raginbajin has quit IRC13:53
*** hdd has joined #openstack-infra13:54
*** amitgandhinz has joined #openstack-infra13:54
*** raginbajin has joined #openstack-infra13:55
*** dboik has joined #openstack-infra13:55
*** alexpilotti has joined #openstack-infra13:56
openstackgerritMerged openstack-infra/project-config: puppet-openstack update  https://review.openstack.org/16333313:57
openstackgerritMerged openstack-infra/project-config: Custom OVERRIDE_ENABLED_SERVICES for heat-dsvm-functional  https://review.openstack.org/16248713:57
openstackgerritMerged openstack-infra/project-config: Run ironicclient functional tests as STACK_USER  https://review.openstack.org/16355213:57
*** hdd has quit IRC13:59
openstackgerritPaul Belanger proposed openstack-infra/project-config: Add pep8 / py27 gates for gertty  https://review.openstack.org/16623413:59
sdaguemordred: speaking of hpcloud, is that recovered yet?13:59
mordredsdague: kinda - we've found a rate limit that seems to be working out ok and not causing death14:00
*** notnownikki has joined #openstack-infra14:00
*** tqtran has joined #openstack-infra14:00
sdagueok, we still have like 500 nodes in building14:00
mordredbut we haven't poked further to see if we can increase it14:00
*** tqtran has quit IRC14:01
*** _nadya_ has joined #openstack-infra14:01
*** yfried|afk is now known as yfried14:02
*** dansmith is now known as superdan14:03
mordredsdague: the underlying problem seems to be a thundering herd issue with an interaction between slow deletes and quota interactions14:03
mordredsdague: in that something in there takes enough time that if our API rate hits above a certain point, MySQL can't service queries faster than it's getting new ones14:04
sdagueinteresting, would be nice if we could get a more direct link into the ops to figure out what that hot spot is, and if it's fixable in the code side14:06
*** esker has joined #openstack-infra14:06
mordredsdague: so you see TONS of things in _refresh_quota_usages14:06
openstackgerritMerged openstack-infra/project-config: Add experimental job for Manila scenario tests  https://review.openstack.org/16410214:06
*** cdent has joined #openstack-infra14:06
openstackgerritMerged openstack-infra/project-config: Change project description text  https://review.openstack.org/16450114:07
mordredsdague: I'm certain I could set that up- I mean, I started looking at nova source code last night, then started backing away quietly14:07
openstackgerritMerged openstack-infra/project-config: Change node param for ec2api rally job.  https://review.openstack.org/16471714:07
sdagueyeh, the quotas code is ... problematic14:07
mordredthe select for update is just not a good idea :)14:07
sdagueyeh, most of that is getting unwound14:08
openstackgerritMonty Taylor proposed openstack-infra/system-config: Turn HP back on with lower rate limit  https://review.openstack.org/16623914:10
mordredI was also thinking - in addition to adding rebuild support to nodepool14:10
mordredwe have knowledge of what our desired amount of nodes is at any given point in time - we should look in to sending multi-node requests14:11
mordredso rather than saying "nova boot" 100 times, we should say "nova boot --count=100" - perhaps14:11
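For reference, a minimal sketch of what a single multi-node boot request could look like with python-novaclient, assuming the min_count/max_count parameters are honoured by the provider; the credentials, image and flavor identifiers below are placeholders.

    from novaclient import client

    # Placeholder credentials and endpoint.
    nova = client.Client('2', 'user', 'password', 'tenant',
                         'https://example.com:5000/v2.0')

    # One create call asking nova to schedule 100 identical instances,
    # instead of issuing "nova boot" 100 times.
    nova.servers.create(
        name='nodepool-node',
        image='IMAGE_UUID',    # placeholder
        flavor='FLAVOR_ID',    # placeholder
        min_count=100,         # fail the request if fewer can be scheduled
        max_count=100)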
clarkbyou still need 100 fip attaches, this was sort of my point yesterday about why this is :(14:13
mordredyah. that part is still :(14:13
clarkbsure we optimize one bit but still we are O(n) because cloud14:13
dprincemordred: are there public logs I could view to gain insight into why the Fedora jobs are still queued?14:14
mordredyah - but one thing at a time14:14
dprincemordred: Seeing active instances getting deleted now. Makes me think something is failing with regards to setting up the Fedora slaves14:14
mordredclarkb: are we public-ing the nodepool logs? ^^14:14
dprincemordred: FWIW the Ubuntu jobs seem to be running fine we think14:14
clarkbno because openstack leaks private data to logs14:15
*** esker has quit IRC14:15
*** hodos has quit IRC14:15
dprincemordred: not fine actually, but at least trying to run...14:15
*** esker has joined #openstack-infra14:15
clarkbdprince I gave you the error from a fedora node yesterday14:15
dprinceclarkb: right, we think we solved that one.14:16
dprinceclarkb: nodepool got turned off yesterday for TripleO14:16
dprinceclarkb: now it is back on again so we are checking some things14:16
*** sputnik13 has joined #openstack-infra14:19
mordredsdague: while we're on the subject - why does a delete api call take a long time? is it blocking on something rather than just plopping a delete request on a queue?14:19
sdagueit's an async call as far as I know14:19
*** mestery_afk has quit IRC14:19
fungiyeah, if you manually nova delete, you'll see it says that it accepted the request, but it doesn't actually disappear from nova list for a while14:20
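A minimal illustration of that behaviour with python-novaclient (credentials and the server UUID are placeholders): delete() returns as soon as the API accepts the request, while the instance keeps appearing in listings until the compute side finishes tearing it down.

    import time
    from novaclient import client

    nova = client.Client('2', 'user', 'password', 'tenant',
                         'https://example.com:5000/v2.0')  # placeholders

    server = nova.servers.get('SERVER_UUID')  # placeholder
    server.delete()  # returns once the API accepts the request

    # The actual teardown is asynchronous; poll until the server is gone.
    while True:
        try:
            nova.servers.get(server.id)
            time.sleep(5)
        except Exception:  # novaclient raises NotFound once it disappears
            break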
*** prad has joined #openstack-infra14:21
*** Qiming_ has joined #openstack-infra14:21
fungiso are we still wanting to revert the nodepool patch from yesterday?14:22
sdaguemordred: nope, I'm wrong, it's sync14:23
mordredfungi: _I_ don't14:23
fungii assume from the looks of the graph that hpcloud is still in a bad way14:23
mordredsdague: I'd suggest making it async - since deleting doesn't actually happen at that point anyway14:23
sdagueyeh, easier said than done14:23
mordredsdague: like, we block on the API call for 30 seconds, and still have to wait for hours for the node to get deleted14:24
mordredsdague: :)14:24
sdagueI'm looking through that code path14:24
clarkbdprince: BadRequest: Error. Unable to associate floating ip (HTTP 400) (Request-ID: req-cedfd88b-ba1e-4a4e-aa19-e6d131fd8db7)14:24
*** tqtran has joined #openstack-infra14:24
mordredsdague: I'm in a chat with folks about the issues - do you want me to bring up that you think it might be interesting to dig in?14:24
*** Qiming has quit IRC14:24
sdaguemordred: sure, though honestly it probably won't be today14:25
*** che-arne has joined #openstack-infra14:25
mordredsdague: k. I'll bring that up in a different email then14:25
dprinceclarkb: sigh, so the same error again?14:25
*** timcline has joined #openstack-infra14:25
*** ominakov has joined #openstack-infra14:26
clarkbdprince: yes14:26
*** bhunter71 has joined #openstack-infra14:26
dprinceclarkb: thanks14:26
clarkbat least from nodepools perspective that is what is happening14:26
dprinceclarkb: are those public?14:26
*** e0ne is now known as e0ne_14:26
clarkbdprince: no we can't make these logs public because openstack clients refuse to sanitize their logging14:26
mordredclarkb: have we checked that recently? it's possible it's been fixed14:26
dprinceclarkb: also, do you see lot of these or just a few. Using clients myself I'm seeing floatingip's get assigned just fine14:27
clarkbmordred: I haven't checked it since last summit but also not sure we have upgraded any clients since last summit either14:27
*** e0ne_ is now known as e0ne14:27
dprinceclarkb: well, at least I did for the first round of instances14:27
fungii think they (some?) still do it in debug, so if we set debug on a client lib using service and it applies transitively, credentials in logs14:27
*** peristeri has quit IRC14:27
sdaguemordred: so my guess, honestly, is that all the quota calculations are the delete cost14:27
clarkbdprince: 65 since the log was last rotated14:27
openstackgerritMerged openstack/requirements: Remove hardware-specific proliantutils module  https://review.openstack.org/15800014:27
openstackgerritMerged openstack/requirements: Do not break on projects without setup.cfg  https://review.openstack.org/15622014:28
clarkbdprince: looks like it rotated ~6 hours ago14:28
openstackgerritMerged openstack/requirements: Add a script to find cruft global requirements  https://review.openstack.org/14807114:28
fungiwhere "it" is print full copies of what's being sent in the api calls, which includes credentials14:28
*** tsg_ has joined #openstack-infra14:28
dprinceclarkb: I see many instances with floatingips actually. This one default-net=10.2.8.125, 66.187.229.119; tripleo-bm-test=192.168.1.79. The 66. address is the floatingip14:29
*** peristeri has joined #openstack-infra14:29
clarkbdprince: what is the instance uuid?14:29
dprinceclarkb: daf50f7d-d73c-460b-9634-274462c6e6c414:30
*** yfried is now known as yfried|afk14:30
clarkbdprince: Exception: Timeout waiting for ssh access is the error from that node14:31
clarkbwhich may be related to the issue ianw discovered on rax f20 nodes (prevented node from booting properly so ssh would fail)14:31
dprinceclarkb: right, I was thinking similar14:32
jd__ok sorry to ask here but I'm dumb; why is https://review.openstack.org/#/c/164182/ not merging? what do I miss?14:32
mordredsdague: well, that would fit with my napkin theory14:33
dprinceclarkb: could just be slowness though, still trying some things.14:33
*** wenlock has joined #openstack-infra14:33
mordredsdague: since it was the quota code that was killing the db - and it was mostly happening when we were not ratelimiting the deletes - so if they're long and sync ... that'll pile up easily14:33
clarkbmordred: sdague if this affects kilo nova hopefully it is treated as a critical bug and we can look at it before we release14:34
*** scheuran has quit IRC14:34
clarkbjd__: I am not sure at first glance, let me poke around14:34
fungidprince: are you able to see the virtual console for it? when we ran into that, if it's what we ran into, the console was looping with the bootloader failing to find a config14:35
*** prad has quit IRC14:35
openstackgerritMerged openstack-infra/project-config: Add job for network based elastic-recheck queries test  https://review.openstack.org/16486914:35
mordredclarkb: I agree - it's effectively a DDOS in a box right now14:35
dprincefungi: I can get at it I think. Will involve some tunnelling trickery so give me a bit.14:35
sdagueclarkb: this is a super long standing issue that requires substantial architecture changes14:35
mordredawesome14:35
*** prad has joined #openstack-infra14:36
openstackgerritMerged openstack-infra/project-config: Add gate check skip for rst/doc files os-ansible-deployment repository  https://review.openstack.org/16427114:36
clarkbsdague: huh I guess we never tripped it before because we serialized deletes14:37
fungijd__: clarkb: is there a dependency loop there? some of the changes depending on that one also have depends-on commit message headers set to other changes in the same project which i think might be also in that git dependency chain14:37
fungii'm trying to map out the dependencies behind it but they're a little complex14:37
clarkbsdague: which is why we had such a large delete backlog in the node graphs for so long14:37
clarkbfungi: I am looking at zuul logs14:38
clarkbfungi: hopefully between the two we get an answer14:38
mordredclarkb: I just had a VERY evil thought14:38
sdagueright14:38
mordredclarkb: what if we stopped deleting full-stop14:38
mordredclarkb: and just replaced our delete calls with rebuild calls14:38
mordredsince delete is broken14:38
jd__fungi: ah good hint let me check14:38
mordredit means our consumption would never decrease14:38
mordredbut actually our load against the clouds would be much less14:38
clarkb2015-03-20 10:23:37,390 DEBUG zuul.IndependentPipelineManager: Change <Change 0x7fef948f42d0 164182,6> does not match pipeline requirement <ChangeishFilter required_approvals: [{'username': 'jenkins', 'verified': [1, 2]}]> is interesting, we probably want a verified of 0 to be valid for merge check14:39
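For reference, the requirement clarkb quotes corresponds roughly to a zuul layout.yaml pipeline stanza like the one below (a sketch only; the pipeline name and surrounding layout details are illustrative), and the suggestion is to also treat a verified value of 0 as acceptable:

    pipelines:
      - name: merge-check            # illustrative
        manager: IndependentPipelineManager
        require:
          approval:
            - username: jenkins
              verified: [1, 2]       # suggestion above: also allow 0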
clarkbmordred: we need to test it with rackspace14:39
mordredclarkb: yah14:39
mordredclarkb: test that rebuild works you mean?14:39
clarkbmordred: iirc they were the cloud that said don't use rebuild14:39
clarkbya14:40
*** tonytan4ever has joined #openstack-infra14:40
clarkbbecause I am pretty sure rax's feedback a while back was rebuild is :(14:40
clarkbso we didn't keep looking into it with much priority14:40
openstackgerritJulien Danjou proposed openstack-infra/project-config: Move Gnocchi from Stackforge to OpenStack  https://review.openstack.org/16214614:40
openstackgerritJulien Danjou proposed openstack-infra/project-config: Remove some tests to Gnocchi  https://review.openstack.org/16421114:40
mordredmy god. so HP actively wants us to rebuild and RAX actively wants us to not14:40
mordredthat's so great14:40
clarkbmordred: well that may have changed14:40
sdagueI wonder if that's because rebuild on xen is hokey?14:41
mordredclarkb: it would take _slightly_ more work in nodepool than just a quick hack, btw14:41
jeblairclarkb: i don't remember negative feedback from rax about rebuild14:41
*** armax has joined #openstack-infra14:41
*** asselin_ has joined #openstack-infra14:41
*** dustins_ has joined #openstack-infra14:42
jd__fungi: there was a loop with another repo but at a later point in the branch, not sure that's the issue14:42
clarkbjeblair: the feedback was it will perform worse in our cloud so please do what you are doing now iirc14:42
fungijd__: looks like you found it. you had a child of a child of that change which was I5ddb00a depending on I56f1988 which was in turn depending on I5ddb00a14:42
jeblairclarkb: where was that feedback?14:42
clarkbjeblair: here in irc when mordred asked them about it14:42
jd__fungi: ok if that's it cool :)14:42
jeblairclarkb: i remember jogo saying he might want to make some things more efficient, but that's it14:42
fungijd__: i _think_ zuul tries to build up the entire dependency set including children and parents of the given change and if it finds a loop anywhere in there it aborts14:42
jd__ack :)14:43
fungijeblair: ^ yes?14:43
jeblairfungi: that should be the case yes14:43
jd__fungi: jeblair: sounds like it, my recheck has been picked! :)14:43
jd__thanks guys <314:43
fungiawesome14:43
jeblairclarkb: who provided that feedback?14:44
clarkbjeblair: I do not remember the specific individual14:44
*** dustins has quit IRC14:44
*** sushilkm has joined #openstack-infra14:45
*** marun has joined #openstack-infra14:45
*** sushilkm has left #openstack-infra14:45
*** sputnik13 has quit IRC14:45
jeblairclarkb: well, who shall we ask again then?14:45
*** rlandy has quit IRC14:46
clarkbI am reading logs...14:46
sdagueso, I think all the time is probably in this - https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L3463 . Optimizing there would probably be the place to do it, however stuff like that has enough ripple effects that it's definitely not a post freeze issue14:47
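A rough sketch of the locking pattern being discussed, not nova's actual code (the connection string and schema details are made up): refreshing quota usage with SELECT ... FOR UPDATE holds row locks for the project until the transaction commits, so concurrent boots and deletes for the same tenant serialize on those rows, and a burst of deletes lets queries pile up faster than MySQL can service them.

    from sqlalchemy import MetaData, Table, create_engine, select

    engine = create_engine('mysql+pymysql://user:pass@dbhost/nova')  # placeholder DSN
    quota_usages = Table('quota_usages', MetaData(), autoload_with=engine)

    with engine.begin() as conn:
        # Locks this project's usage rows until the transaction commits,
        # so every other request touching the same rows has to wait.
        rows = conn.execute(
            select(quota_usages)
            .where(quota_usages.c.project_id == 'PROJECT_ID')  # placeholder
            .with_for_update()
        ).fetchall()
        # ... recompute the usage and write it back before committing ...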
*** enikanorov has quit IRC14:47
*** claudiub has quit IRC14:48
*** garyh has joined #openstack-infra14:48
*** enikanorov has joined #openstack-infra14:48
jeblairmordred: anyway, please let's not invest time in making this version of nodepool use rebuild.  it would be a huge change to the algorithm that we will then throw away with zuulv3.  if hpcloud can't improve, let's switch back to the old task algorithm and save rebuild for zuulv3.14:49
openstackgerrityolanda.robla proposed openstack-infra/system-config: Don't hardcode pip.conf values  https://review.openstack.org/16625214:49
mordredjeblair: that's not what I was talking about14:50
clarkblooks like phschwartz thought it was a good idea in http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-12.log and was going to work on a patch. So at least according to that log they were ok with it14:50
clarkbnow to see if my memory is based on results from writing that change?14:50
*** yfried|afk is now known as yfried14:50
mordredjeblair: I was talking about not changing the algorithm - and just literally never making a delete call on an existing node in nodepool, but instead changing the body of deleteNode to call rebuild14:51
mordredjeblair: which is why I said it was an evil idea - since it would essentially keep us at max utilization constantly14:51
jeblairmordred: i understood that.  that's still a major change to the algorithm.  you have to decide what to rebuild into, etc.14:51
mordredjeblair: fair nuff14:52
jeblairbiab14:52
*** ___mimir has quit IRC14:53
*** masayukig_ has joined #openstack-infra14:53
mordredclarkb: we don't seem to mark subnodes with deleted state in the db14:55
mordredclarkb: we just call cleanupNode on them and then call node.delete()14:55
yolandaah, clarkb, i need you, or fungi14:56
*** aysyd has quit IRC14:56
yolandaa new gozer group was created14:56
yolandafor stackforge projects14:56
yolandaand i need someone to add people there14:56
*** mrunge has quit IRC14:56
clarkbyolanda: who should be the first group member (they can add the remaining members)14:56
yolandayou can add myself14:57
clarkbyolanda: you have two accounts, can you give me the account id number for the one you are using?14:57
yolandaah, ok14:57
clarkbmordred: that doesn't update the subnodes table?14:57
yolandait's still that legacy thing14:57
yolandalet me check14:57
mordredclarkb: don't think so14:57
mordredclarkb: I could be wrong though14:57
mordredclarkb: I'm not REALLY looking at that - mainly said it here to remind me to look further14:58
clarkbthere is a state column and on the running DB they have different states14:58
fungiwe supposedly have 388 alien nodes in hpcloud right now14:58
fungishould we be playing whack-a-mole with these still?14:58
yolandaah, mordred, jeblair, for the rebuild, i told Tim that should be better to create a spec14:58
mordredfungi: I'm VERY confused as to how we keep growing alien node there14:59
yolandaas it involves changes on nodepool logic, for the capacity algorithm14:59
*** garyh has quit IRC14:59
clarkbfwiw I am not finding any followup to the above conversation in the logs so I may have misremembered something that was said14:59
yolandaclarkb href="https://login.launchpad.net/+id/yMkMBPe"14:59
mordredyolanda: yes - it's definitely spec worthy - but I agree with jim, it's more likely something we'll want to do as part of zuulv314:59
clarkbphschwartz: any idea where you got with nodepool using rebuild?14:59
*** aysyd has joined #openstack-infra14:59
fungimordred: i can grab an example uuid and put together a boot/delete/whatever timeline if someone in hpcloud noc wants to trace the corresponding api calls14:59
clarkbyolanda: can you give me the gerrit account id? https://review.openstack.org/#/settings/ is wher you can find it14:59
yolandaclarkb, yolanda.robla15:00
mordredfungi: of one of our aliens? yeah - let's try that15:00
clarkbyolanda: done15:00
anteayathe weather has been good so I have to start boiling down sap today, will take me a few hours to get set up, back later15:00
yolandamordred, yes, concern i had is that you only need a rebuild if you still have demand of these types of nodes15:00
mordredyolanda: you can rebuild a node to a different type15:00
mordredyolanda: you don't have to be that fancy15:01
*** dimsum__ has quit IRC15:01
*** AJaeger has quit IRC15:01
yolandaah, nice, didn't know that it was possible15:01
yolandabut how about flavor ? will that work for different flavours?15:01
yolandaclarkb, thx15:02
clarkbno, I think rebuild basically takes an arbitrary image, writes it over an existing VM's disk, then reboots the VM15:02
clarkbso flavor needs to be constant15:02
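A minimal sketch with python-novaclient of the rebuild call being discussed (identifiers and credentials are placeholders): the instance is re-imaged in place, keeping its flavor, its ID and its IP addresses.

    from novaclient import client

    nova = client.Client('2', 'user', 'password', 'tenant',
                         'https://example.com:5000/v2.0')  # placeholders

    server = nova.servers.get('SERVER_UUID')        # placeholder
    # Write the new image over the existing disk and reboot; no new
    # scheduling or IP allocation is needed, but the flavor cannot change.
    nova.servers.rebuild(server, 'NEW_IMAGE_UUID')  # placeholder image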
*** stevemar has joined #openstack-infra15:02
fungimordred: cool, seeing what i can put together from our logs for a sample15:02
yolandaand we have different ones for bare, devstack, right?15:02
clarkbyolanda: we do not15:02
clarkbbut nodepool will probably need to solve that generally15:02
clarkbsince others may15:02
yolandawe have it downstream, less memory for bare15:03
yolandaso it should pick a different flavour15:03
fungialso as we start using nodepool for more varied tasks, we may want it for ourselves too15:03
fungiso we would go from having per-label demand to per-flavor demand, i guess15:04
mordredyes - all of those things are true15:04
clarkbit would be both, because per-label would determine what to boot into15:05
mordredhowever - at the moment - none of those things are real _current_ requirements15:05
fungior probably two tiers there since we would still possibly want to pre-boot and attach the workers before the jobs want to run things on them15:05
fungiyeah, that15:05
mordredthey are requirements we should do - and should take in to account when we design the thing15:05
*** reed has joined #openstack-infra15:05
fungihrm... the math on that model is going to get fun15:06
openstackgerritMerged openstack/requirements: Add ironic-lib to project.txt  https://review.openstack.org/16160315:06
fungibut i need to ponder other more immediate concerns right now, so will revisit later15:06
clarkbwe are essentially taking over nova's scheduling problem15:06
clarkbwhich as a user is not what I would like to be spending my time doing15:06
clarkbbhunter71: can you see my comment on 161994? curious to hear what you think about that15:07
*** ociuhandu has quit IRC15:08
sdaguehmmmm jeblair / clarkb - https://review.openstack.org/#/c/165851/ another one of those zuul merge incorrect errors15:09
openstackgerritMerged openstack-infra/devstack-gate: Add ironic-lib to devstack-vm-gate-wrap.sh  https://review.openstack.org/16160015:10
openstackgerritMerged openstack-infra/devstack-gate: Remove configurable testr artifact processing  https://review.openstack.org/16142215:11
*** ffrog has quit IRC15:11
*** mjturek1 has joined #openstack-infra15:11
*** armax has quit IRC15:11
bhunter71clarkb: thanks, I think that helps.   I wanted to change the again, anyway.15:11
bhunter71sorry,I wanted to change the format anyway.15:12
clarkbbhunter71: as long as it sorts well I think it's fine15:12
clarkbsdague: jeblair looked into that yesterday and found gerrit doesn't always show the dependency when you push a series and query it immediately15:13
*** yamahata has joined #openstack-infra15:13
*** dimsum__ has joined #openstack-infra15:14
openstackgerritMerged openstack-infra/devstack-gate: XenAPI: Highlight that eth4 does not exist outside the Citrix environment  https://review.openstack.org/16560715:14
openstackgerritMerged openstack/requirements: Bump tempest-lib min version  https://review.openstack.org/16604415:14
sdagueclarkb: ah, right15:14
*** dannywilson has joined #openstack-infra15:14
*** mestery has joined #openstack-infra15:15
*** sputnik13 has joined #openstack-infra15:15
clarkbmordred: looking at graphs the hpcloud error rate is almost 100%15:15
mordredclarkb: awesome15:15
clarkbmordred: so while we may not be making nova fall over, it isn't doing us any good15:15
*** radez is now known as radez_g0n315:15
*** nelsnelson has joined #openstack-infra15:16
*** dimsum__ has quit IRC15:16
mordredclarkb: well, time to dive back in to figuring out what's failing with node boots15:16
*** rkukura_ has joined #openstack-infra15:16
*** yfried is now known as yfried|afk15:17
*** sushilkm has joined #openstack-infra15:17
*** sushilkm has left #openstack-infra15:17
*** emagana has joined #openstack-infra15:17
*** nelsnelson has quit IRC15:17
*** rkukura has quit IRC15:17
*** rkukura_ is now known as rkukura15:17
*** yamahata has quit IRC15:17
*** sputnik13 has quit IRC15:18
*** nelsnelson has joined #openstack-infra15:18
*** ddieterly has quit IRC15:18
*** radez_g0n3 is now known as radez15:18
*** sdake has joined #openstack-infra15:19
*** ddieterly has joined #openstack-infra15:19
*** ajmiller has joined #openstack-infra15:19
jogojeblair: yeah, johnthetubaguy said there is some nova / xen side work to do15:20
clarkb oh cool I am not crazy15:20
jogojeblair: to make rebuild more efficient15:20
jogoI think it involved making sure they don't delete the image and redownload it during a rebuild15:20
*** sdake_ has quit IRC15:20
johnthetubaguyjogo: ah, yeah, its more the idea we could cache every image thats in use, not just base images15:21
*** sputnik13 has joined #openstack-infra15:21
*** sdake_ has joined #openstack-infra15:21
*** jogo is now known as flashgordon15:21
*** openstackgerrit has quit IRC15:21
johnthetubaguyjogo: xenapi doesn't use the image cache stuff, this work has just dropped on my plate actually, although there are some more urgent things before I will get to this, it should happen15:21
*** openstackgerrit has joined #openstack-infra15:22
clarkbjenkins02 appears to be spiralling into thread leak terribleness15:22
clarkbI am going to put it in shutdown mode so that I can get data for my upstream bug15:22
flashgordonjohnthetubaguy: :/15:22
*** sigmavirus24_awa is now known as sigmavirus2415:22
flashgordoneven without that would rebuild be faster or slower or the same as boot delete cycles15:22
johnthetubaguyflashgordon: it should be a little faster/more reliable due to the lack of IP stuff, and scheduling etc, but that's going to be saving 5-10 seconds I would guess15:23
*** sdake__ has joined #openstack-infra15:23
*** sdake has quit IRC15:23
*** hdd has joined #openstack-infra15:24
mordredyah. so it wouldn't kill rax - and it would be a huge win for hp15:24
johnthetubaguyflashgordon: with the above change, it should save 190-200 seconds15:24
BobBallSo there is no reason to favour delete/rebuild in RAX?15:24
johnthetubaguyBobBall: there is a reason to favour rebuild, it saves quite a few error conditions from being possible15:24
johnthetubaguyBobBall: its marginal though15:25
openstackgerritJoe Gordon proposed openstack-infra/project-config: Add new keystone tempest job to only run keystone tests  https://review.openstack.org/16431415:25
johnthetubaguyflashgordon: of course you can rebuild to a new image now too, so no needed to delete at the end of the day, technically15:25
*** sdake_ has quit IRC15:26
flashgordonnice! that is pretty neat15:27
flashgordonmordred hopefully that gave you the information you needed, any other rax/rebuild questions for johnthetubaguy ?15:28
mordredflashgordon: nope. that's awesome. thanks johnthetubaguy !15:28
*** sputnik13 has quit IRC15:28
johnthetubaguymordred: do let me know if anything crops up15:28
mordredjohnthetubaguy: also, we now know that you're the new pvo in terms of us bugging someone about rax nova questions15:28
*** sputnik13 has joined #openstack-infra15:28
mordredjohnthetubaguy: hope you're ok with that15:28
mordred:)15:28
*** sputnik13 has quit IRC15:28
johnthetubaguymordred: I noticed a few errors seem to happen just after you switch the images over15:28
johnthetubaguymordred: lol, sure15:29
*** ayoung has quit IRC15:30
clarkbok I have a thread dump for jenkins02, is that a safe thing to upload to jenkins' jira?15:31
clarkbI will put a copy of it in my homedir on jenkins0215:31
clarkbjohnthetubaguy: what sort of errors? because that may make it not useable for us15:31
fungijohnthetubaguy: actually one of the reasons it's attractive to us is that we often end up waiting up to an hour for rax to assign an ip address on a nova boot request at peak activity, and rebuild was seen as a potential workaround for that15:32
*** carl_baldwin has joined #openstack-infra15:32
johnthetubaguyclarkb: I mean the existing method is hitting some image not found issues, like you deleted the image during the build, but not totally sure15:32
clarkbjohnthetubaguy: I see, not really something that is a problem generally, but a few corner issues15:33
johnthetubaguyfungi: hmm, never seen that take an hour, but yes, it would totally side step that15:33
fungiso that 10-15 second performance gain from reusing network configuration is maybe more like 3600 seconds15:33
clarkbwe can likely live with that :)15:33
fungijohnthetubaguy: at times we've been told that the regions we're booting in simply had no available ip addresses and so nova was waiting on some to free up15:34
johnthetubaguyfungi: not according to the reporting on my side, not seen any of the builds take quite that long, would love to dig into that if it comes up again15:34
*** yfried|afk is now known as yfried15:34
johnthetubaguyfungi: oh yeah, we worked around that now, only takes 15 mins to go back in the pool15:34
fungijohnthetubaguy: aha, so old info then. that's awesome15:34
*** yfried has quit IRC15:34
johnthetubaguyfungi: I think the switch configs had to get updated, etc15:34
mordredthat woudl be one of the benefits on the other side too - floating ips stay with a node that is rebuilt15:34
*** asselin_ has quit IRC15:35
johnthetubaguymordred: yeah, so we don't support those yes, sigh, but rebuild will do the trick for now15:35
fungithough we do still recycle instances quickly enough that 15 minutes to return them to the pool is potentially a lot of waste from rackspace's perspective still15:35
johnthetubaguys/yes/yet/15:35
mordredjohnthetubaguy: we LOVE that you don't have floating ips15:35
mordredjohnthetubaguy: we HATE floating ips15:35
*** ociuhandu has joined #openstack-infra15:35
mordredjohnthetubaguy: I really hope that you don't stop supporting servers having real ips15:35
mordredjohnthetubaguy: because I would consider that a regression from teh thing you do now which is very awesome15:36
johnthetubaguymordred: they are going to be an additional, AFAIK15:36
mordredyay!15:36
* mordred hugs johnthetubaguy15:36
* johnthetubaguy sends hug to brad mcconnall15:36
fungi301 hug redirect15:37
*** dustins_ has quit IRC15:37
*** dimsum__ has joined #openstack-infra15:37
johnthetubaguy:)15:37
mordrednow, if only I could convince johnthetubaguy to start using dhcp my life would be complete ...15:37
*** Qiming__ has joined #openstack-infra15:37
johnthetubaguymordred: yeah, I am kinda requesting that, but its not on a roadmap I have seen right now15:37
*** jaypipes is now known as leakypipes15:37
*** thedodd has joined #openstack-infra15:37
clarkbmordred: fungi: I have skimmed the thread dump from jenkins02, the only things it seems to expose are the server's name, its ip address, some slave names, and some job names running on those slaves15:38
*** Qiming_ has quit IRC15:38
clarkbmordred: fungi do either of you want to double check it isn't leaking anything dangerous?15:38
johnthetubaguymordred: so config drive and cloud-init might do all that for you, not had chance to test it, so you can kill the agent in your image, if you want (needs an extra image prop xenapi_use_agent=False I think)15:38
fungiclarkb: where did you save it? in your homedir?15:38
mordredjohnthetubaguy: yes, that is correct15:38
clarkbfungi: yup on jenkins0215:38
mordredjohnthetubaguy: except you need patched config drive15:38
johnthetubaguymordred: patched config drive?15:38
mordredjohnthetubaguy: but that's fine - we have a workaround/know how to deal with it15:38
dprinceclarkb,mordred: so to disable the RH1 TripleO (temporarily) while we try some things do we need to push a patch?15:39
johnthetubaguymordred: OK, if its working thats cool15:39
mordredjohnthetubaguy: yes - upstream config drive does not yet support reading the passthrough network info15:39
mordredjohnthetubaguy: but yeah - it's a thing we have a plan for15:39
clarkbdprince: ya, you want to update the nodepool.yaml.erb file to set your regions max servers to 015:39
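The change clarkb describes is a one-line edit to the provider stanza in the nodepool config, roughly like this (the provider name is illustrative; real stanzas also carry credentials, images and other settings):

    providers:
      - name: tripleo-test-cloud-rh1   # illustrative name
        max-servers: 0                 # stop launching new nodes in this provider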
BobBallQuick config drive question... does rax support volume IDs for config drive?15:39
dprinceclarkb,mordred: I mean I know we can update the nodepool conf... or would it be okay if we just temporarily block the public API port?15:39
mordredjohnthetubaguy: but if you used dhcp - we wouldn't need to work around anything15:39
johnthetubaguymordred: ack15:39
clarkbdprince: I think temporarily blocking the API port is also fine15:39
*** ChuckC_ is now known as ChuckC15:39
dprinceclarkb: okay, we might do that just so we don't have to bother you as much :), thanks15:40
johnthetubaguyBobBall: volume IDs? we just do what the upstream code does, I would have to check15:40
mordredBobBall: yes15:40
johnthetubaguyah, there you go15:40
mordredBobBall: rax allows you to mount config-215:40
jeblairdprince: yeah, an icmp reject would be best i think (don't just drop it)15:41
*** weshay has quit IRC15:41
dprincejeblair: okay, will try that15:41
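A sketch of how that reject could look with iptables (the port is a placeholder for whatever the public API listens on): REJECT returns an immediate ICMP error, so nodepool's API calls fail fast instead of hanging until they time out, which is what a plain DROP would cause.

    # Reject inbound API traffic with an ICMP error rather than silently dropping it.
    iptables -I INPUT -p tcp --dport 8774 -j REJECT --reject-with icmp-port-unreachable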
BobBallI don't _think_ I mean disk-by-label - I mean specifying config_drive=<volume_id> when creating the server.  Personally never done it so don't know what the use case is for that though :)15:41
*** ominakov has quit IRC15:41
BobBalljhesketh mentioned it on https://review.openstack.org/#/c/155770/15:42
jeblairclarkb, mordred, yolanda: can you dump the information we just gathered about rebuild into a comment on https://review.openstack.org/#/c/164371/ ?15:42
jeblairclarkb, mordred, yolanda: we can put a paragraph about rebuild into the next iteration15:43
mordredjeblair: yes. I can do that15:43
*** ominakov has joined #openstack-infra15:43
*** otter768 has joined #openstack-infra15:44
*** weshay has joined #openstack-infra15:44
yolandaah, i need to read that spec15:45
jrolljohnthetubaguy: by 'patched configdrive' mordred means this thing that's deployed in our cloud: https://review.openstack.org/#/c/153097/15:45
jroll(in case that wasn't clear)15:46
*** claudiub has joined #openstack-infra15:46
* jroll wonders if we can get that merged within a year from the original patch15:46
mordredjeblair: I think I captured everything15:46
johnthetubaguyjroll: yeah, it should also have the regular network info in there too in the XenServer based VMs, I think, so regular cloud-init should have picked it up, in theory15:47
clarkbmordred: comment on flavor?15:47
jrolljohnthetubaguy: yeah, I don't think infra uses cloud-init though15:48
clarkbjroll: we do15:48
jrollorly15:48
jrolldo you use patched cloud-init or?15:48
clarkbjroll: at least as of yesterday, things sort of changed yesterday afternoon15:48
jrollheh15:48
clarkbmordred: speaking of, did all images get rebuilt this morning without cloud init?15:48
*** otter768 has quit IRC15:49
pabelangerjeblair, nice, spec.  I was thinking about multi-nodes this morning.  After looking into the current subnodes setup, I was having some troubles getting subnodes to use a different image then the parent.15:50
*** baoli has quit IRC15:50
jeblairpabelanger: yeah, it's about 1/4 of the spec-writing necessary for the zuulv3 work i outlined in an email a while ago.  i'm hoping to write up another chunk today.15:51
mordredclarkb: no - because we didn't ever move past hpcloud-b5 yesterday because it all went to hell15:51
clarkbmordred: except that nodepool rebuilds images every day15:51
mordredclarkb: good point - then if image builds were successful ...yes15:52
jeblairmordred, clarkb: i'll add the note about flavors15:52
*** ominakov has quit IRC15:52
*** harlowja_at_home has joined #openstack-infra15:52
fungimordred: here's an example alien node leak which hpcloud noc can maybe analyze from their side http://paste.openstack.org/show/193961/15:53
mordredjeblair: nod. thanks - I knew I was missing something15:53
clarkbmordred: looks like the snapshots are all still building15:53
clarkbmordred: and the dib images haven't been uploaded yet because some are still building15:54
clarkbmordred: so we haven't flipped that switch yet but it is in progress15:54
mordredfungi: did we actually submit a delete server task there?15:54
mordredclarkb: cool15:54
clarkbmordred: and maybe that will affect the hpcloud error rate if the metadata server is still hosed there15:54
fungimordred: that's a good question. nodepool says it deleted the node, but it didn't explicitly log the api call itself so... depends on how much we trust that nodepool is making those calls?15:55
fungimordred: i guess we could add all provider api responses to the debug log. not sure how much bloat that would add15:56
clarkbfungi: does that uuid ever show up in the nodepool log?15:56
fungiclarkb: nope15:56
clarkbfungi: I have a hunch that the 502 happens before assigning a uuid, nodepool says I don't need to delete this node in the cloud because it never exists (no uuid) and simply removes it from the db15:56
clarkbthen at some point in time hpcloud says "here have a node"15:57
fungiclarkb: great point. nodepool may be assuming that errors from a boot call are always going to be cleaned up on the provider side15:57
clarkbjroll: so we do still use cloud init, we are currently rebuilding our images to stop using it because ec2 metadata server isn't very reliable15:57
clarkbfungi: yes, and I am not sure it can assume much else15:58
fungiclarkb: also i'm not entirely sure how it would ever be able to be 100% certain that it needs to clean those up15:58
fungiyeah, agreed15:58
*** unicell1 has quit IRC15:58
jrollclarkb: right, thought you were moving away from it, though15:58
mordredjroll: yup15:59
jrollcool15:59
mordredjroll: two different efforts - one is related to getting dib images to work on rackspace to start with15:59
jesusaurusclarkb: fungi: yeah last night i had to clean out a bunch of nodes that nodepool listed as aliens15:59
*** garyh has joined #openstack-infra15:59
jrollmordred: right15:59
mordredjroll: this one is reacting to the fact that ec2 metadata in hp is horky - so we want to stop using it there sooner15:59
jrollheh16:00
*** harlowja_at_home has quit IRC16:00
openstackgerritDavid Lyle proposed openstack/requirements: Raise cap for Django to allow 1.7  https://review.openstack.org/15535316:00
mordredI should remove in hp - I believe it's horky anywhere it exists16:00
openstackgerritSean Dague proposed openstack/requirements: Bump sahara client version  https://review.openstack.org/15542816:00
fungijesusaurus: seems to me like a nova bug, if the behavior we're theorizing is actually responsible16:00
jrollmordred: considering rackspace doesn't have a metadata service, that should make life easier16:00
*** ominakov has joined #openstack-infra16:00
jeblairfungi, mordred: i'm _pretty_ sure looking at the log in http://paste.openstack.org/show/193961/ that we will have issued a create server api call, gotten a 502 response from that, therefore we never received the server id, but the server was actually created16:00
mordredjroll: you'd think16:00
clarkbjeblair: yup that is my following too16:01
clarkbs/following/reading/ english hard16:01
mordredjeblair: that seems likely16:01
jrollha16:01
mordredso - it's possible that a 502'd api call can still result in a booted node16:01
*** dizquierdo has quit IRC16:01
fungiso... if hpcloud can confirm from their side the circumstances which cause the boot call to return an error but allow the server to still be built, that needs to be filed as a bug against nova yeah?16:01
jeblairfungi, mordred, clarkb: is there some, like, nova metadata we can stick in a create call that will show up on an inventory later so we could link that server to a "failed" api call?16:01
*** Qiming_ has joined #openstack-infra16:01
mordredjeblair: yes16:01
mordredjeblair: we can put anything we want in the nova metadata16:02
fungii like the canary idea there16:02
jeblairmordred: and that's included in the create call so it's synchronous?16:02
mordredyup16:02
*** dimsum__ has quit IRC16:02
mordredjeblair: we can add more to my patch for that if you want16:02
clarkbfungi: yes I think that is a nova bug16:02
jeblaircool, so we should probably do that, but also, i do think api calls that return failures while succeeding are bad form :)16:02
mordredjeblair: https://review.openstack.org/#/c/126621/16:02
clarkbfungi: if api returns error node should not be booted16:02
mordredjeblair: indeed16:02
mordredjeblair: so, I added a nodepool dict to that metadata - would be simple to put more things in that16:03
jeblairmordred: cool16:03
fungijeblair: so are you thinking stick an identifier for the nodepoold in as metadata so that it can say "here's an instance in the list, metadata says it's one i built, but i don't have any record of it, delete now"?16:03
jeblairfungi: yeah16:04
clarkbfungi: jeblair I think we can do that with the node name fwiw16:04
jeblairfungi: i think it'd have to be since we delete the node record from the db16:04
clarkbthe name is essentially metadata that we already have16:04
fungiclarkb: not necessarily. consider multiple nodepoolds using a common tenant16:04
jeblairfungi: and you may be surprised at this -- i don't want to keep records of every node we've ever created.  I've seen where that goes  ;)16:04
clarkbfungi: oh hrm16:04
*** Qiming__ has quit IRC16:04
mordredI think it's cheap to add more things into the nova metadata16:04
fungijeblair: yeah, that's why i'm guessing we just have a reusable id of the nodepoold itself (maybe specified in its config)16:05
clarkbfungi: ya you are right, so each nodepool would need to add metadata that uniquely identified a booted node to a nodepool instance16:05
fungi"here stick this value in the metadata of ever instance you boot"16:05
*** masayukig_ has quit IRC16:05
fungier, every16:05
clarkbfungi: ya16:05
jeblairwfm16:05
*** sdake has joined #openstack-infra16:05
fungiand if nodepoold sees its own canary there, it knows it's one it built16:06
*** masayukig_ has joined #openstack-infra16:06
fungiin theory we could accomplish it without metadata by namespacing the instance hostnames, but that's ugliness16:06
mordredyeah - especially when we have a friendly metadata structure to use16:06
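A minimal sketch of the canary idea with python-novaclient; the metadata key, the identifier value and all credentials are made up for illustration. The launcher tags every instance it boots, and a later cleanup pass deletes anything that carries its tag but is unknown to its database.

    from novaclient import client

    nova = client.Client('2', 'user', 'password', 'tenant',
                         'https://example.com:5000/v2.0')  # placeholders
    MY_ID = 'nodepool.openstack.org'   # illustrative per-nodepoold identifier

    # Tag every boot so the instance can be traced back to this launcher,
    # even if the create call returns an error and no record lands in the db.
    nova.servers.create(
        name='node-12345',             # placeholder
        image='IMAGE_UUID', flavor='FLAVOR_ID',
        meta={'nodepool_id': MY_ID})

    # Alien cleanup: anything carrying our tag but missing from our database
    # was leaked by a "failed" create and can safely be deleted.
    known_ids = set()                  # would be loaded from nodepool's database
    for server in nova.servers.list():
        if (server.metadata.get('nodepool_id') == MY_ID
                and server.id not in known_ids):
            server.delete()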
clarkbflashgordon: ^ any idea if the above api returned error but nova booted a node is already a filed bug?16:06
*** armax has joined #openstack-infra16:07
clarkbflashgordon: and if not ideas on how hpcloud can confirm it is a nova issue?16:07
*** amotoki has quit IRC16:07
*** amotoki has joined #openstack-infra16:07
ttxjeblair: fyi I wrote a new app for design summit scheduling -- one that allows PTLs to directly edit bits of info on sched.org16:09
ttxCurrently at https://github.com/ttx/summitsched16:09
*** sdake__ has quit IRC16:09
*** garyh has quit IRC16:10
ttxsched.org is all-or-nothing, this will allow to delegate maintenance of parts of the schedule content to people16:10
*** ominakov has quit IRC16:10
*** ominakov has joined #openstack-infra16:11
ttxalso enforces all sorts of rules, like prefixing of session titles with track name16:11
jeblairttx: cool, you want to run it in infraland?16:11
jeblairttx: (also, yay prefixing titles!)16:11
*** Qiming_ has quit IRC16:11
*** ominakov has quit IRC16:12
ttxjeblair: it will require infraland resources -- whether I'll be able to fully puppetize it or just request an empty box with root shell is yet tbd16:12
ttx(depending on how much time I'll have on my hands)16:12
jeblairttx: we've already got a puppet model for 'install django and run the syncdb thing'16:13
jeblairttx: so it shouldn't be too hard16:13
jeblairttx: (graphite.o.o does that i think)16:13
ttxjeblair: yeah, just need to add the initial data load (track names and lead usernames)16:13
ttxAlso allows multiple people to help for the same track16:14
ttxrather than be PTL-reserved16:14
ttxCurrently considering the ability to tag a session with multiple types, so that it appears on multiple tracks16:14
*** ayoung has joined #openstack-infra16:15
ttxbut the sched data model is pretty weak16:15
*** esker has quit IRC16:15
ttx(its API is weak too)16:15
clarkbfungi: were you going to check that thread dump?16:16
fungiclarkb: ahh, yep, grepping through it now16:16
fungifunny, just noticed that the spelling check in my irc client believes "grepping" is an actual word16:17
*** baoli has joined #openstack-infra16:17
*** sigmavirus24 is now known as sigmavirus24_awa16:17
*** masayukig_ has quit IRC16:18
*** thingee has quit IRC16:18
fungiclarkb: nothing troublesome that i can find16:20
openstackgerritClark Boylan proposed openstack-infra/project-config: Disable -dibtest jobs  https://review.openstack.org/16630216:20
*** sigmavirus24_awa is now known as sigmavirus2416:20
openstackgerritClark Boylan proposed openstack-infra/system-config: Cleanup devstack-(trusty|precise)-dib images  https://review.openstack.org/15889116:21
*** masayukig_ has joined #openstack-infra16:21
clarkbmordred: ^ getting those two changes in should allow you to delete the devstack-precise-dib and devstack-trusty-dib images on nodepool.o.o freeing up ~16GB of disk16:21
yolandamordred, i see your change for rate=64 for hpcloud? is really that needed? that's too high, going to make nodepool sloooow16:21
*** achuprin has quit IRC16:21
yolandaand i'm worried for the part i'm affected16:21
fungihrm, the test nodes graph says we have no nodes in use now16:21
mordredyolanda: it's required for us16:21
yolandabut this means 1 api call per 64 secs?!16:21
mordrednope16:22
mordredwell, for us16:22
funginodepool list says 14 nodes in use16:22
mordredit's 1 every 12.816:22
mordredbecause we have 5 hpcloud regions16:22
mordredyolanda: I would not copy that setting if I were you16:22
mordredyolanda: if the noc is not yelling at you, your current setting is fine16:22
yolandano, of course :)16:22
fungii think something in the image updates may have tanked rackspace node builds?16:22
clarkbfungi: all the clouds are basically 100% error rate16:22
yolandabut i was reviewing that change and my alerts raised16:22
mordredfungi: oh, that's not great16:22
fungii'm going to grab a console for one now and see what's going on16:23
clarkbfungi: it's possible that's the remove-cloud-init change going into effect16:23
*** dimsum__ has joined #openstack-infra16:23
fungiclarkb: yeah, that was my suspicion as well16:23
clarkbso we may need to delete todays build in rax then revert those changes16:23
mordredsigh16:23
jeblairmordred, yolanda: it's also only required for the un-merged change that does rate limiting on start of requests instead of end16:23
clarkbfungi: thank you for checking the thread dump I will upload that to the bug now16:23
yolandaah, jeblair, i looked at that, i'm hoping this gets merged16:23
yolandanodepool is our daily pain16:24
clarkbyolanda: nodepool or the cloud?16:24
clarkbif there are bugs in nodepool we should fix them16:24
mordredclarkb: same thing for them - they only have one cloud16:24
yolanda75% cloud 25% nodepool?16:24
mordredclarkb: which is, I believe, why they're more eager for the rebuild stuff16:24
rcarrillocruz++16:25
rcarrillocruzrebuild would be a killer feature for us16:25
jeblairmordred, yolanda: i think this is no better than what we had before and actually i think it is a little worse.  i think we should not merge that change and go back to our previous config16:25
yolandaif you look at the charts, most of nodes are in building and delete status16:25
mordredjeblair: yah16:25
*** unicell has joined #openstack-infra16:25
yolandamordred, jeblair, have you ever thought about nodepool serving docker instances? for lots of simple tests that will make things much faster16:27
yolandaa pep8 test, an alphabetized one16:27
yolandathese are very very simple things16:28
yolandawhy a full vm for that?16:28
jeblairyolanda: yes, i have.  that's something that we could consider doing in zuulv3 as well. but again, it would be very complicated in the current system.  partially because of the nodepool allocation system, but also complex for us because docker is not secure.16:28
yolandai was running tests on lxc when i started and i've always had that on my mind16:28
mordredjeblair: ++16:28
yolandajeblair, but discriminating the kind of tests that could use it... it could be a real helper16:29
yolandayou cannot run a tempest test there, but can run pep8 or unit testing16:29
fungiclarkb: mordred: no errors on the console. they're booting up to a local login prompt, but not reachable on the ip address reported by nova list (no response even via ping)16:29
mordredfungi: these are the rax nodes?16:29
fungimordred: yep16:29
jeblairyolanda: you don't need to convince me, i understand.16:29
zaromorning16:30
fungimordred: looks like just since this morning's image updates there16:30
mordredfungi: well, I mean, that definitely points to delete images and revert16:30
yolandajeblair, and how could we achieve it? some spec for it, and tied to the new nodepool spec?16:30
fungimordred: agreed. doing so now16:30
mordredI do NOT understand why16:30
mordredbut figuring out why is a task for later16:30
*** sabari has quit IRC16:31
*** spzala has joined #openstack-infra16:31
jeblairyolanda: yes, it could be done either as part of, or after, the zuulv3 work.16:31
yolandathat will be a killer feature for the simple tests16:32
rcarrillocruzi think docker is cool, but besides security implications, i think docker should be kept at the nova provider layer, and not on nodepool16:32
rcarrillocruzwould be great to maybe have that new hp infra cloud with docker or something16:32
clarkbit's important to note that containers don't address the current problems because you still need somewhere to run the container. So the current issues need to be fixed first regardless16:32
yolandado providers allow it?16:32
rcarrillocruzand devoting nodes for pep816:32
rcarrillocruzetc16:32
jeblairclarkb: yep16:32
*** sabari has joined #openstack-infra16:32
jeblairokay, so i'd like to defer this conversation for later...16:32
yolandathe way i had it implemented, is that i had x static slaves, that were serving lxc containers16:33
jeblairand instead broach the subject that we have no workers right now.16:33
*** skolekonov has quit IRC16:33
mordredjeblair: yes.16:33
jeblairwell, i mean, we have 20.16:33
mordredjeblair: statistically, that's no workers16:33
*** dkranz has quit IRC16:34
jeblairwe have 295 building in rax16:34
clarkbfungi is deleting the new images we just built16:35
fungiall images built in rackspace in the last few hours are now deleted. hopefully we see some recovery there shortly16:35
jeblairokay cool16:35
clarkbthat should get us back to pre cloud init removal. Then we also need to revert those changes16:35
clarkbotherwise this will regress over the weekend16:35
jeblairi don't see anything happening on the hpcloud side16:35
jeblairas far as the noc asking us to load test or anything16:35
*** amotoki has quit IRC16:36
jeblairso i'd like to just go ahead and revert back to thursday morning's config16:36
fungithat works for me16:36
mordredk.16:36
jeblairmordred: if hpcloud improves something, we can check our logs for deletion times16:36
*** ayoung has quit IRC16:37
*** ociuhandu has quit IRC16:37
jeblairmordred: we see 30 second delete api calls enough that we should see an improvement in that time if they manage to make an improvement16:37
mordredjeblair: yah16:37
*** tjones1 has joined #openstack-infra16:37
jeblairrcarrillocruz, yolanda: ^ (if they are able to improve things, this would help you too)16:37
*** MrAboii has joined #openstack-infra16:38
*** dkranz has joined #openstack-infra16:38
yolandaso jeblair, what's the issue, api response went worse than normal?16:38
jeblairyolanda: i believe it has slowed gradually over time16:39
*** achuprin has joined #openstack-infra16:39
openstackgerritJames E. Blair proposed openstack-infra/system-config: Revert "Turn off HP Public Cloud"  https://review.openstack.org/16630816:40
jeblairmordred: can you ninja that ^16:40
*** EmilienM is now known as EmilienM|afk16:41
mordredjeblair: yup16:41
clarkbjeblair: revert sounds good to me as well16:41
openstackgerritMerged openstack-infra/system-config: Revert "Turn off HP Public Cloud"  https://review.openstack.org/16630816:42
mordredjeblair: don't forget, puppet is disabled on nodepool16:42
*** andreykurilin_ has joined #openstack-infra16:42
jeblairmordred: yep.  i plan on stopping nodepool, re-installing master, running puppet apply, and starting nodepool16:42
jeblairfungi: are you ready for me to do that ^ ?16:42
clarkbI have made progress on https://issues.jenkins-ci.org/browse/JENKINS-27514 just by reading through this to update the bug16:43
jeblair(er, puppet agent)16:43
fungijeblair: yes, go for it16:43
*** yamahata has joined #openstack-infra16:43
*** Ala has quit IRC16:43
openstackgerritMerged openstack-infra/project-config: Add new project faafo to Stackforge  https://review.openstack.org/16466816:44
*** tsg_ has quit IRC16:44
clarkbhttps://github.com/jenkinsci/ssh-slaves-plugin/blob/ssh-slaves-1.9/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L1213 hanging coupled with a synchronized method appears to be leaking all of the threads16:44
jeblair+if type dpkg-reconfigure >/dev/null 2>&1 && ! test -f /etc/ssh/ssh_host_rsa_key16:44
jeblair+then16:44
jeblair+    dpkg-reconfigure openssh-server16:44
jeblair+fi16:44
jeblairpuppet did that ^16:45
jeblairin case that impacts your thinking about the content of images that were built this morning16:45
fungithat was part of the "regen ssh host keys ourself" patch16:45
jeblairyeah, seems like it may not have been applied16:45
fungii don't think the rackspace images were getting that far16:45
jeblairok16:45
fungiseems like they weren't actually configuring their network interfaces16:45
fungiat least from the limited testing i was able to do16:46
*** dprince has quit IRC16:46
*** dprince has joined #openstack-infra16:46
fungithey were booted to login prompts for several minutes but i got no ping response from the ip addresses reported for them by nova16:46
openstackgerritMonty Taylor proposed openstack-infra/project-config: Revert "Regenerate ssh host key on boot"  https://review.openstack.org/16631016:47
openstackgerritMonty Taylor proposed openstack-infra/project-config: Revert "Remove ssh host keys during image build"  https://review.openstack.org/16631116:47
fungican also check what nodepoold saw from those. if it really was an ssh host key problem we'd get connection closed. if lack of networking then connection timeout16:47
fungii'll go hunting in the logs16:47
openstackgerritMonty Taylor proposed openstack-infra/system-config: Revert cloud-init removal  https://review.openstack.org/16631216:50
*** david-lyle_ has joined #openstack-infra16:50
mordredI'm going to ninja those reverts above, unless there is opposition16:50
mordredclarkb, fungi, jeblair, pleia2 ^^16:50
jeblairmordred: are you abandoning the effort?16:51
*** david-lyle_ has quit IRC16:51
*** harlowja_away is now known as harlowja_16:51
mordredjeblair: well, it didn't fix hp, and it broke rax - so I think regrouping and starting over is probably in order, yeah?16:52
*** Sukhdev has joined #openstack-infra16:52
pleia2makes sense16:52
jeblairmordred: (btw, if we are in a similar situation again, i believe we could mitigate the extremely slow hpcloud boot times by lowering max-servers for one of the providers)16:52
jeblairmordred: yeah -- will we get the same effect eventually with the dib work?16:52
mordredwell - we're doing much more active and methodical testing with the dib work16:53
mordredto understand what we need at boot time for realz16:53
*** sigmavirus24 is now known as sigmavirus24_awa16:53
jeblairmordred: so just roll "don't have cloud-init depend on metadata server but also make sure we get fresh host keys" into that?16:53
mordredthat removing cloud-init somehow broke rackspace which I thought didn't use it is mindboggling to me16:53
*** dustins has joined #openstack-infra16:54
fungilooks like what nodepoold logged was "Timeout waiting for server <UUID> in rax-xxx" so it never got as far as testing ssh16:54
mordredjeblair: yah- I mean, I'd like to get an answer sooner - but worst case yes16:54
mordredjeblair: because I want to understand what about that broke rackspace16:54
jeblairmordred: okay sounds good16:54
mordredsince it defies my understanding of how the rackspace nodes work16:54
mordredwhich isn't good :)16:54
jeblairif you look at the node graph now, you can basically see what it's like when we delete all our instances at once.  a steep decline from rax, and we've leveled out waiting on hpcloud16:55
*** dtantsur is now known as dtantsur|afk16:55
jeblairand now starting to build up in rax16:55
jeblairis anyone deleting hpcloud aliens?  if not, i'll start on that16:59
fungii had not started yet16:59
openstackgerritMerged openstack-infra/project-config: Make VPNaaS StrongSwan functional gate voting  https://review.openstack.org/16539216:59
fungimy best guess is that something we did broke rax instances' ability to configure their network interface/routing and if the hypervisor can't ping the interface nova never reports it as ready?17:00
*** psedlak has joined #openstack-infra17:00
rcarrillocruzjeblair: nod, we've been plagued by those slow delete api calls... last thing we heard, the Neutron guys were looking at it17:00
clarkbfungi: I am wondering if nova agent is tied to cloud init somehow17:01
jeblairfungi: do you have an easy way to reconcile two alien lists?17:01
clarkbjeblair: comm -1217:01
*** baoli has quit IRC17:01
clarkbI learned this from sdague, its a neat little trick17:01
*** baoli has joined #openstack-infra17:01
fungijeblair: you mean to diff them? i'll give you my script17:02
clarkbI just use `comm -12 file1 file2`17:02
jeblairclarkb: that's pretty cool17:02
*** baoli has quit IRC17:02
fungijeblair: world's worst bash one-liner http://paste.openstack.org/show/193973/17:02
*** wenlock has quit IRC17:02
*** baoli has joined #openstack-infra17:02
clarkbjeblair: it does get cranky about unsorted inputs so I sort the files first usually17:02
openstackgerritMerged openstack-infra/project-config: Remove check-tempest-dsvm-f20  https://review.openstack.org/16553217:03
*** Ryan_Lane has joined #openstack-infra17:03
fungiclarkb: oh, neat. that would get rid of my hacky nested loops17:03
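
As an aside, a minimal sketch of the comm approach for reconciling two server listings, assuming each listing has been dumped to a text file with one entry per line (filenames are illustrative):

    # comm needs sorted input
    sort provider-list.txt -o provider-list.txt
    sort nodepool-list.txt -o nodepool-list.txt
    # -12 suppresses columns 1 and 2, leaving only lines common to both files
    comm -12 provider-list.txt nodepool-list.txt
    # -23 leaves only lines unique to the first file, e.g. servers the provider
    # reports that nodepool does not know about
    comm -23 provider-list.txt nodepool-list.txt
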
fungialso i need to go get some lunch, but will be back shortly17:04
*** markus_z has quit IRC17:04
clarkbjohnthetubaguy: any ideas on why purging cloud-init from our images would break our ability to have working networking on rax nodes? is nova agent piggybacking off of something cloud init does?17:04
*** sarob has joined #openstack-infra17:04
SpamapSjeblair: Are you aware of anybody who has successfully used gear w/ eventlet?17:05
* SpamapS needs to get back to real work.. has fallen down a gearman hole lately17:05
openstackgerritMonty Taylor proposed openstack-infra/project-config: Disable metadata in cloud-init config  https://review.openstack.org/16631817:06
mordredjeblair: ok - rather than the full cloud-init removal - I would like to modify it ^^17:06
jeblairSpamapS: i think there may be an unmerged patch to that effect in gear's review queue17:06
mordredSpamapS: ^^ can you check me that that's not insane?17:06
mordredclarkb: ^^ I think that is more inline with what you were saying yesterday17:06
*** Somay has joined #openstack-infra17:06
SpamapSjeblair: in my experimenting with gear as an oslo.messaging driver..  it's not working.17:06
fungijeblair: sdague: oh, as for yesterday's heat functional discussion, https://review.openstack.org/166030 seems to have gotten the job back to an hour consistently17:07
jeblairfungi: yay17:07
SpamapSAnd I've spent way more time than I ever should have on this, so I think it's time to WIP it and circle back later. :-/17:07
clarkbmordred: ya I was also suggesting we use config drive but I don't think that's necessary for the short term17:07
jeblairSpamapS: 9753317:07
clarkbfungi: awesome17:07
*** ociuhandu has joined #openstack-infra17:07
anteayafungi jeblair do you think we can change the timeouts for the heat job then?17:08
mordredclarkb: yes - I don't think we need the config drive part17:08
jeblairSpamapS: but yeah, i support you in not rat-holing on it :)17:08
clarkbmordred: agreed17:08
openstackgerritMerged openstack-infra/project-config: Adds compute-hyperv in StackForge  https://review.openstack.org/16561117:08
mordredclarkb: but I think we can satisfy the intent with that patch17:08
SpamapSjeblair: ah yes just found that. Well that is exactly what I ran into.17:08
clarkbmordred: but if config drive were enabled it should also just work17:08
clarkbassuming new enough cloud-init17:08
mordredclarkb: yes17:08
SpamapSjeblair: oh you should have hid this from me. Now it will be calling to me from the bottom of the rat hole. ;)17:08
clarkbso its win win17:08
mordredit turns out rackspace IS using cloud-init for something in addition to nova-agent17:09
clarkbmordred: do you know what that is?17:09
mordredas evidenced by the existence of /etc/cloud/cloud.cfg.d/10_rackspace.cfg17:09
mordredso - I think they are using it for many of the things17:09
mordredjust not the things that nova-agent is doing17:09
openstackgerritMerged openstack-infra/os-loganalyze: fix supports_sev matching  https://review.openstack.org/16554217:09
mordred*boggles*17:09
*** tnovacik has joined #openstack-infra17:09
SpamapSmordred: sanity checked17:10
openstackgerritMerged openstack-infra/project-config: new-project: stackforge/python-senlinclient  https://review.openstack.org/16496317:10
clarkblooking at this java code I think that there is zero reason to synchronize that method. I wish java devs wouldn't default to doing that; it's a horrible practice. Instead we need to synchronize around the connection and session objects, which are not class level but object level17:10
*** garyh has joined #openstack-infra17:10
fungianteaya: probably if they're waay higher than an hour17:10
openstackgerritMerged openstack-infra/os-loganalyze: let tests be run from test file location  https://review.openstack.org/16579917:10
fungialso, this seems to have worked yesterday... http://lists.openstack.org/pipermail/foundation-board/2015-March/thread.html17:10
mordredfungi: woot!17:11
clarkbfungi: anteaya they are currently set to 2 hours, I think we should reduce to 90 minutes or so17:11
mordredclarkb, fungi: mind if I push through the reverts and the new attempt at cloud-init and kick another hpcloud image rebuild?17:11
openstackgerritMerged openstack-infra/os-loganalyze: extract static methods  https://review.openstack.org/16585017:11
*** e0ne has quit IRC17:11
clarkbmordred: if you are around today to babysit fine by me :)17:12
*** wenlock has joined #openstack-infra17:12
mordredk. I'm going to run to get the bag of coffee beans right now - if there are no objections when I get back, I will do that next17:13
clarkbmordred: you will need to free disk space again17:13
anteayaclarkb: just found the patch so will offer something around 90 minutes noting that 60 minutes would be the ideal17:13
mordredclarkb: that's so exciting17:13
*** psedlak is now known as psedlak^afk17:13
clarkbmordred: you can `sudo -H -u nodepool dib-image-delete $imageid`17:13
anteayafungi: k17:13
fungimordred: lgtm though i didn't vet the cloud-init config syntax i'm assuming SpamapS did17:14
clarkbmordred: where imageid is the id for the older of the devstack-precise-dib devstack-trusty-dib and devstack-centos7-dib images17:14
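
Roughly, the disk-freeing step clarkb describes would look like the following; the dib-image-list step is an assumption added here for illustration, and only the delete command is quoted from the conversation:

    # list the dib-built images (assumed helper) and note the id of the older build
    # of each of devstack-precise-dib, devstack-trusty-dib and devstack-centos7-dib
    sudo -H -u nodepool nodepool dib-image-list
    # then delete the older build by id, as the nodepool user
    sudo -H -u nodepool dib-image-delete $imageid
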
fungiokay, really going to lunch now. bbiaw17:14
*** Sukhdev has quit IRC17:15
mordredfungi: fwiw, I copied that content directly from the rackspace nodes17:15
clarkbmordred:  https://issues.jenkins-ci.org/browse/JENKINS-27514 and https://review.openstack.org/#/c/158891/ should allow us to properly remove those images until we need them for rax17:15
SpamapSfungi: it's yaml, it can't be wrong. ;)17:15
*** achanda has joined #openstack-infra17:15
*** notnownikki has quit IRC17:16
*** dboik_ has joined #openstack-infra17:16
*** psedlak^afk is now known as psedlak17:16
clarkbmordred: there is a comment in those files about dpkg reconfiguring17:16
clarkbmordred: is that going to be something you need to do or something that will override your changes if it happens?17:16
*** psedlak is now known as psedlak^afk17:16
mordredSpamapS: ^^ ?17:17
*** psedlak^afk is now known as psedlak17:17
mordredclarkb: I don't know - i've never used cloud-init successfully17:17
SpamapSugh17:17
*** AJaeger has joined #openstack-infra17:17
SpamapSI think you might have to put the answer in debconf, let me check17:18
clarkbI am pretty sure that this is the real reason people use docker17:18
mordredyup17:18
clarkbnot the packaging or the potential security17:18
mordredyup17:18
clarkbbut the "I just want this damn process to run" functionality17:18
mordredyup17:18
mordredbecause everything else has lost track of that being the use case people are trying to solve 95% of the time17:19
AJaegersdague, some of the requirements we have in openstack-manuals are unique and we could remove them. Note that trove also uses the docbook XML toolchain and thus needs openstack-doc-tools. So, what about the following:17:19
SpamapSholllyyy crap17:19
* SpamapS did not need to see cloud-init's postinst today17:19
SpamapSdon't look at it17:19
SpamapSface melting17:19
clarkbthis should be simple, but after diving into upstart sysv compat on ubuntu I no longer assume anything about how running processes should be simple at boot17:19
openstackgerritAnita Kuno proposed openstack-infra/project-config: Reduce timeout for heat functional job  https://review.openstack.org/16632017:19
AJaegersdague, allow "soft" projects in requirements' projects.txt where we do not require all requirements - and set that flag for the doc projects. And then remove their unique requirements?17:19
*** jistr has quit IRC17:19
*** dboik has quit IRC17:20
*** pblaho has quit IRC17:20
SpamapSok yeah17:20
*** psedlak has quit IRC17:20
SpamapSmordred: so clarkb is right in being concerned17:20
* anteaya is not a fan of swearing in channel17:20
SpamapSthe debconf value cloud-init/datasources will be injected there17:21
clarkbanteaya: sorry17:21
*** garyh has quit IRC17:21
clarkbSpamapS: and that will happen only if dpkg-reconfigure is called right? so if the package is updated?17:21
anteayaclarkb: np, thanks17:21
clarkbwe can probably get away with the change as is17:21
clarkbbut it may also lead to weirdness down the road if we aren't carefuk17:21
*** openstackgerrit has quit IRC17:21
clarkb*careful17:21
SpamapSclarkb: updates will cause it yes17:21
*** openstackgerrit has joined #openstack-infra17:22
openstackgerritSomay Jain proposed openstack-infra/jenkins-job-builder: Adding more configurable options in Notifications plugin  https://review.openstack.org/16313717:22
*** kgiusti has quit IRC17:22
SpamapSyou can make a file, 91_reallydatasources.cfg17:22
*** dmorita has joined #openstack-infra17:22
SpamapScloud-init reads them in order and will do a  __dict__.update() using the new one17:22
SpamapSso that might be the safest way17:23
SpamapSmordred: ^17:23
clarkbSpamapS: that sounds simple and reliable, I like it17:23
SpamapSor   echo "cloud-init cloud-init/datasources Configdrive,None" | debconf-set-selections17:24
SpamapSbut really, debconf, DIAF. :-P17:25
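
Taken together, SpamapS's two options amount to something like the following sketch (the ConfigDrive spelling and the multiselect question type are assumptions based on standard cloud-init packaging, not verified here):

    # option 1: a later-sorted cloud.cfg.d file wins over 90_dpkg on every read
    printf 'datasource_list: [ ConfigDrive, None ]\n' \
        > /etc/cloud/cloud.cfg.d/91_reallydatasources.cfg

    # option 2: pre-seed debconf so a later dpkg-reconfigure writes the same answer
    echo 'cloud-init cloud-init/datasources multiselect ConfigDrive, None' \
        | debconf-set-selections
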
*** Bsony_ has quit IRC17:25
*** mjturek1 has quit IRC17:26
*** gyee has joined #openstack-infra17:27
*** gampel has joined #openstack-infra17:29
cineramahey pleia217:30
openstackgerritAnita Kuno proposed openstack-infra/project-config: Reduce timeout for heat functional job  https://review.openstack.org/16632017:30
*** koolhead17 has joined #openstack-infra17:31
*** pc_m has quit IRC17:31
*** armax has quit IRC17:32
morganfainberglbragstad, https://bugs.launchpad.net/keystone/+bug/1433311 is not wishlist, this is higher prio17:33
openstackLaunchpad bug 1433311 in Keystone "Fernet tokens current don't support token bind" [Medium,Triaged]17:33
*** tsg has joined #openstack-infra17:33
morganfainbergwhoopse wrong channel17:33
*** pelix has joined #openstack-infra17:35
jeblairclarkb, fungi, mordred: hpcloud alien deletes are running17:36
*** sputnik13 has joined #openstack-infra17:37
*** koolhead17 has quit IRC17:37
clarkbjeblair: cool, do you want me to kick off a floating ip cleanup too? I can also start the leaked port deletion script17:37
*** pelix has quit IRC17:38
clarkbjeblair: or I can give you my one liner for FIPs if you want to run it17:38
anteayamorganfainberg: I was going to say17:38
*** ivar-lazzaro has joined #openstack-infra17:38
jeblairclarkb: why don't you kick it off?  but i'm guessing it won't have much to do17:38
clarkbok17:38
jeblairclarkb: i think most of these happened before we got to the fip state17:38
jeblairstage17:38
morganfainberganteaya, yeah i know :P17:38
*** arxcruz has quit IRC17:38
morganfainberganteaya, tooooooo many irc channels17:38
*** pelix has joined #openstack-infra17:39
anteayaI've never seen lbragstad say anything in this channel17:39
anteayaI'd make fun of his hat if he did17:39
clarkbjeblair: `venv/bin/neutron floatingip-list | grep -v '10\.0\.' | sed -e '1,3d' -e '$d' | cut -d'|' -f 2 | xargs -n 1 -P 1 venv/bin/neutron floatingip-delete` is the one liner fwiw17:39
anteayamorganfainberg: I've been waiting for the opportunity17:39
clarkband its done, only 4 to delete17:39
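
For readability, the same floating-ip one-liner laid out with the intent of each stage noted (behavior unchanged; the venv path and the 10.0. filter are as given above):

    venv/bin/neutron floatingip-list |    # list all floating IPs in the project
      grep -v '10\.0\.' |                 # skip rows that still show a 10.0.x fixed address (in use)
      sed -e '1,3d' -e '$d' |             # strip the ascii table header and footer lines
      cut -d'|' -f 2 |                    # keep only the id column
      xargs -n 1 -P 1 venv/bin/neutron floatingip-delete    # delete each one serially
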
*** ivar-lazzaro has quit IRC17:39
morganfainberganteaya, ++ yes!17:40
*** ivar-lazzaro has joined #openstack-infra17:40
*** aysyd has quit IRC17:42
anteayamorganfainberg: I _know_ he is wearing it17:42
morganfainberganteaya, i'm sure he is!17:42
*** otter768 has joined #openstack-infra17:45
openstackgerritMerged openstack/requirements: Update gabbi to 0.12.0  https://review.openstack.org/15625317:45
*** ghostpl_ has quit IRC17:45
*** aysyd has joined #openstack-infra17:46
*** dmorita has quit IRC17:47
*** fandi has joined #openstack-infra17:47
*** fandi has quit IRC17:47
*** sabeen1 has joined #openstack-infra17:47
*** dmorita has joined #openstack-infra17:48
*** ayoung has joined #openstack-infra17:49
*** otter768 has quit IRC17:50
*** VijayTripathi has joined #openstack-infra17:50
*** ghostpl_ has joined #openstack-infra17:50
openstackgerritMerged openstack-infra/project-config: Drop ironic tempest regex, stop running all of Tempest  https://review.openstack.org/16142017:52
*** coolsvap_ is now known as coolsvap|afk17:52
openstackgerritMonty Taylor proposed openstack-infra/project-config: Disable metadata in cloud-init config  https://review.openstack.org/16631817:53
mordredSpamapS, clarkb: ^^ there - that also tells debconf17:53
*** dmorita has quit IRC17:54
*** andreykurilin_ has quit IRC17:54
jeblairclarkb, fungi, mordred: with nodepool running on hpcloud, i can only do about two alien delete processes in parallel17:55
clarkbmordred: you prefer that over SpamapS' suggestion?17:55
openstackgerritMerged openstack-infra/project-config: Add a python-ironicclient src job  https://review.openstack.org/16363217:55
*** mpaolino has quit IRC17:55
jeblairclarkb, fungi, mordred: 2 processes push create time up to 45 seconds / request, 3 occasionally pushes it over 60 seconds, which is what we have the api timeout set to17:55
jeblair(more than 3 regularly pushes it over that limit)17:56
*** patrickeast has joined #openstack-infra17:56
jeblairso it's going to take a really long time to run17:56
*** mrmartin has joined #openstack-infra17:56
openstackgerritMerged openstack-infra/project-config: Turn on oslo.messaging coverage report  https://review.openstack.org/16402217:56
mordredclarkb: that was one of his suggestions17:56
mordredclarkb: that will ensure that if it gets re-run, the file will remain set17:57
mordredclarkb: the other thing seems really confusing to me17:57
*** dmorita has joined #openstack-infra17:57
mordredjeblair: yoy17:57
clarkbmordred: ok17:57
*** Bsony has joined #openstack-infra17:58
openstackgerritMerged openstack/requirements: Bump keystonemiddleware requirement  https://review.openstack.org/16457317:59
clarkbmordred: lgtm17:59
clarkbI am going to pop out now for early lunch. Back in a bit18:00
*** _nadya_ has quit IRC18:00
openstackgerritMerged openstack-infra/system-config: Revert cloud-init removal  https://review.openstack.org/16631218:01
openstackgerritMerged openstack/requirements: Bump requests-mock version  https://review.openstack.org/16249318:01
openstackgerritMerged openstack/requirements: Update pip and pip-missing-reqs  https://review.openstack.org/15929318:01
*** armax has joined #openstack-infra18:01
*** shardy has quit IRC18:01
mordredSpamapS: dare I even ask why the file is called 90_dpkg ?18:02
mordredSpamapS: I mean, it has nothing to do with configuring dpkg - it's a setting that configures data sources18:02
*** sdake_ has joined #openstack-infra18:03
mordredoh - sod it - I need to do a different patch on rh systems don't I?18:03
johnthetubaguymordred: I think they did something evil inside cloud init to stop it racing with the agent…18:03
mordredjohnthetubaguy: all to avoid running dhcp18:03
mordredjohnthetubaguy: the mind boggles18:03
mordredat the amount of effort that has been expended to chat that18:03
mordredchase18:04
mordrednot chat18:04
johnthetubaguymordred: basically, the agent needs to setup network, before cloud-init starts if I remember18:04
mordredyup18:04
mordredI've looked through the init-script hacks for that18:04
johnthetubaguyah, OK, thats the bit I knew about18:05
*** e0ne has joined #openstack-infra18:05
*** Swami has joined #openstack-infra18:05
mordredclarkb, SpamapS: new version of that patch coming - I didn't think to test for ubuntu first18:05
johnthetubaguyso I assume you don't have an image metadata tag telling nova not to talk to the agent on your image, but thats another thing that can stop that working18:05
mordredjohnthetubaguy: well, we deleted cloud-init earlier18:05
mordredbecause we bake keys into the images18:06
johnthetubaguymordred: a good way to test the agent's OK is to change password, after rebooting your VM, after removing cloud-init from it, if that works?18:06
mordredbut it turns out that breaks something else on rackspace18:06
*** Sukhdev has joined #openstack-infra18:06
mordredsomething related to networking18:06
*** sdake has quit IRC18:06
*** mrmartin has quit IRC18:06
mordredwhich surprised me - because I was expecting ... OH! I think I know18:06
johnthetubaguymordred: hmm, we certainly don't use cloud-init for anything critical, I can only think of the init hack for them being linked18:06
clarkbmordred maybe do spamaps thing as it should be distro agnostic?18:07
mordredclarkb: sigh. ok. I want to go on record as saying it makes me angry, fwiw18:07
johnthetubaguyI am curious, why do you need to remove cloud-init?18:08
mordredjohnthetubaguy: we don't need it18:09
johnthetubaguyhmm, OK18:09
mordredjohnthetubaguy: but we were going for the easy way to stop hammering the hp cloud metadata service18:09
mordredjohnthetubaguy: all of our nodes boot from images we build18:09
johnthetubaguyah, that makes more sense, gotcha18:09
mordredjohnthetubaguy: we're currently working on a project which is "make an image that can boot on both rackspace and hp that contains neither nova-agent nor cloud-init"18:10
johnthetubaguymordred: eek, gotcha18:10
johnthetubaguya worth aim18:10
mordredjohnthetubaguy: I'd give in and use cloud-init if we didn't have to patch cloud-init to get networking info on rax18:10
johnthetubaguys/worth/worthy/18:10
mordredbut we do - so it's also a pita18:10
*** derekh has quit IRC18:11
johnthetubaguymordred: I didn't think you should have to do that patch though, the regular info should have been there two, sounds like a bug18:11
johnthetubaguys/two/too/18:11
mordredit's not - the patch hasn't landed upstream18:11
mordredto pass neutron IP info through to config-drive18:11
mordredrax is VERY THANKFULLY deploying the same info currently into a vendor extension (thank you thank you)18:11
mordredbut until it lands upstream, the patch to consume from cloud-init isn't even proposed to cloud-init18:12
mordredand cloud-init upstream is currently doing a 2.0 rewrite anyway18:12
johnthetubaguymordred: yeah, I am thinking there was a way if you set flat_injected=True, I thought on XenServer we had a hack that did that injection into the old location, but I never got chance to test that in production yet18:12
johnthetubaguymordred: ah, interesting18:12
johnthetubaguyafraid I have to run off now18:13
johnthetubaguyit's getting dark in the UK, and I have an extra tuba rehearsal tonight this week18:13
openstackgerritMonty Taylor proposed openstack-infra/project-config: Disable metadata in cloud-init config  https://review.openstack.org/16631818:13
mordredjohnthetubaguy: have fun at rehearsal!18:13
anteayajohnthetubaguy: you are after all, the tuba guy18:13
mordredSpamapS: ^^ can you sanity check that for me please?18:13
johnthetubaguy:)18:13
*** pc_m has joined #openstack-infra18:14
*** gampel has quit IRC18:15
jeblairmordred, fungi, clarkb: my plan today is to tend to the slow alien delete process, but otherwise avoid any nodepool changes, and take it easy this afternoon and write up more zuulv3 specs so i'm not burned out for our maint tomorrow18:16
anteayajeblair: is there anything I can do to help? I'm holding off reviewing/approving stuff as I don't want to tax the few workers we have18:17
jeblairanteaya: i would not worry about that.  approve at will; it'll get through it eventually.18:18
clarkbjeblair sounds good, should mordred avoid the cloud init change then?18:18
anteayaokay18:18
*** tkelsey has joined #openstack-infra18:21
*** garyh has joined #openstack-infra18:22
fungijeblair: that sounds like a great plan18:23
mordredclarkb, jeblair: the cloud-init change shouldn't affect the other api stuff much18:23
*** kgiusti has joined #openstack-infra18:23
*** dboik_ has quit IRC18:23
*** dboik has joined #openstack-infra18:24
fungijeblair: as for alien deletes, i usually just do them entirely serially unless the quantity is enormous (like the ~500 we needed to delete yesterday)18:25
*** johnthetubaguy is now known as zz_johnthetubagu18:25
*** ghostpl_ has quit IRC18:25
*** ghostpl_ has joined #openstack-infra18:27
anteayawhat is z/tempest? it is in zuul/layout.yaml but I don't know what repo it corresponds to: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n282618:29
*** dboik has quit IRC18:30
anteayaI don't even know where else to look to find out18:31
*** dboik has joined #openstack-infra18:31
pleia2cinerama: so yeah, saw StevenK's patch and updated the topic on it this morning so it shows up in reviews with all the other zanata patches, thanks for reviewing, I'll have a look in a bit18:31
* anteaya nips out to get more sap, back in a minute18:32
*** garyh has quit IRC18:32
*** MarkAtwood has joined #openstack-infra18:33
cineramapleia2: kool18:33
*** MrAboii has quit IRC18:34
*** arxcruz has joined #openstack-infra18:35
*** crc32 has joined #openstack-infra18:35
fungianteaya: it's a dummy project used to set up a transitive co-gating relationship between actual projects18:35
anteayafungi: ah ha18:36
fungianteaya: it's basically abusing zuul's queue sharing algorithm to establish an equivalency between multiple jobs in case a project runs one of those but not the others18:37
anteayafungi: I found it because I am reviewing https://review.openstack.org/#/c/165648/118:38
*** edwarnicke has quit IRC18:38
anteayawhich reduces the prevalence of neutron-large-ops on projects18:38
*** sweston has quit IRC18:38
*** ujuc has quit IRC18:38
nibalizermordred: so for using 1 cert for everyone with puppet apply i think we have to set this18:39
nibalizerhttps://docs.puppetlabs.com/references/latest/configuration.html#nodename18:39
anteayashould the z/tempest neutron-large-ops job also be scaled back?18:39
nibalizerthen we can set certname to everyonecert.lol.openstack.org18:39
*** dougwig has quit IRC18:39
*** erw has quit IRC18:39
nibalizerthanks to Hunner for that one18:39
mordrednibalizer: nod18:39
Hunnerand just whitelist that cert at the puppetdb18:40
HunnerPuppetdb won't care about the cert that is auth'd, only the contents of the payload18:40
fungianteaya: probably not since we likely still want to make sure projects which do continue to run that job co-gate with others which don't18:40
anteayavery good, thanks18:41
*** Somay has quit IRC18:41
anteayaI understood about 20% of what you told me about that dummy project but hopefully I can get a better visualization of it at some point18:41
openstackgerritJeremy Stanley proposed openstack-infra/system-config: Move security.openstack.org to HTTPS  https://review.openstack.org/15509918:46
*** dmorita has quit IRC18:46
*** spzala has quit IRC18:46
*** VijayTripathi1 has joined #openstack-infra18:48
*** VijayTripathi has quit IRC18:51
*** pradk has joined #openstack-infra18:55
*** pradk has quit IRC18:55
fungiclarkb: you've flown through vancouver inbound to the usa before right? i'm looking at flying back directly and trying to figure out if i need to leave buffer time in my first layover for customs or if they really do us customs when boarding in vancouver and then treat the connections as usa domestic...18:55
*** prad has quit IRC18:56
*** e0ne has quit IRC18:57
*** mjturek1 has joined #openstack-infra18:58
*** e0ne has joined #openstack-infra18:58
*** prad has joined #openstack-infra18:59
*** mjturek1 has left #openstack-infra19:00
*** ssam2 has quit IRC19:00
clarkbthey do customs in canada for us departures19:01
clarkbso add time for that19:01
clarkbyvr was pretty quick about it though19:02
fungigood to know, and conversely i can scale back on my first layover since i won't need to claim and re-check my luggage19:02
*** emagana has quit IRC19:02
*** tkelsey has quit IRC19:03
sdagueoh, man - https://review.openstack.org/#/c/125944/ - fungi / clarkb either of you want to put the final +2 on that one? That would make for an awesome Friday19:03
*** emagana has joined #openstack-infra19:04
fungias for the working-group-on-a-train idea, amtrak apparently has three coach options for the portland->vancouver run ranging from us$48-114... does it matter which coach ticket i get?19:04
openstackgerritMerged openstack-infra/project-config: Disable metadata in cloud-init config  https://review.openstack.org/16631819:04
openstackgerritMerged openstack-infra/project-config: Revert "Remove ssh host keys during image build"  https://review.openstack.org/16631119:04
openstackgerritMerged openstack-infra/project-config: Revert "Regenerate ssh host key on boot"  https://review.openstack.org/16631019:04
*** tqtran is now known as tqtran_afk19:04
*** [HeOS] has joined #openstack-infra19:04
cdentaw, fungi, I'm jealous, the train ride from portland to vancouver is beautiful19:05
fungicdent: join us! do some openstacking on a train19:05
*** rlucio has joined #openstack-infra19:06
jrolloh, that would be cool19:06
cdenttoo late, got plane tickets already, in vancouver, out seattle, taking the train south afterwards19:06
*** achanda has quit IRC19:07
*** nilasae|afk has quit IRC19:07
*** armax has quit IRC19:08
*** tjones1 has quit IRC19:08
*** e0ne has quit IRC19:08
*** EmilienM|afk is now known as EmilienM19:10
*** e0ne has joined #openstack-infra19:10
*** sdake has joined #openstack-infra19:11
fungisdague: as awesome as it is, browsers are going to choke on it. see comment19:13
*** dimtruck is now known as zz_dimtruck19:14
AJaegersdague: did you see my question above?19:14
anteayaflashgordon: I'm currently reviewing: https://review.openstack.org/#/c/165652/1 where do I see the list of projects grenade does test?19:14
*** ghostpl_ has quit IRC19:14
sdaguefungi: gotcha19:15
cineramaneat. so looks like sjc to yvr via train takes a couple days but you have to stop overnight in seattle the way it's calculating it19:15
*** sdake_ has quit IRC19:15
clarkbfungi amtrak assigns seats as you check in19:15
sdagueAJaeger: only barely, I'm wrapping up one last thing then calling it a week19:15
clarkbusually its easy to at least take over the food/observation cars19:15
fungisdague: e.g. https://review.openstack.org won't be allowed to embed http://zuul.openstack.org json content for security reasons19:16
anteayaflashgordon: just if it is listed here with an upgrade-<project> file? http://git.openstack.org/cgit/openstack-dev/grenade/tree/19:16
sdaguefungi: yeh... hmmm... so I definitely had this working before19:16
fungiclarkb: open lounge areas. got it19:16
sdagueis there a new change there?19:16
cineramano assigned seating on capitol corridor when i've ridden but that may be different19:16
*** ghostpl_ has joined #openstack-infra19:17
fungisdague: were you doing it with overrides in the javascript console? that might make a difference19:17
AJaegersdague: ok, I'll followup via email - enjoy the weekend19:17
sdaguefungi: I was injecting it directly in the js console19:18
sdagueso that could be19:18
fungisdague: at least i know when it's come up in the past, having javascript in an https-served page call an http url to retrieve data has caused browser security warnings/errors19:18
sdagueyeh, I can believe that19:19
fungisdague: though if the javascript is being provided via the js debug console instead of the site, that may not be spotted19:19
*** e0ne has quit IRC19:19
sdagueyep, good call19:19
fungisdague: well, not so much a call, as this use case is precisely why i added the changes to get zuul also serving its status data via https19:20
*** spzala has joined #openstack-infra19:20
sdaguewell, I will be excited when it shows up.19:20
fungibecause it came up before in discussion as a prerequisite for the status embedding we wanted19:21
anteayaflashgordon: and grenade doesn't seem to be testing devstack-gate (which is good since I don't know what it would test) so perhaps we should remove it from there as well19:21
openstackgerritMerged openstack-infra/project-config: Split grenade out of integrated-gate template  https://review.openstack.org/16565119:21
*** achanda has joined #openstack-infra19:21
*** pelix has quit IRC19:22
anteayazaro: do you have all the code merged that needs to be merged for tomorrow?19:22
*** tjones1 has joined #openstack-infra19:25
*** tjones1 has left #openstack-infra19:25
clarkbanteaya should be, this is just moving to trusty19:26
mordredSpamapS: back away. back slowly away19:26
*** MarkAtwood has quit IRC19:26
mordredSpamapS: (oh, I was scrolled back ... that was a response to a long time ago)19:26
clarkband I think I reviewed and got all those trusty related changes merged. good to double check though19:27
anteayaclarkb: awesome19:27
*** MarkAtwood has joined #openstack-infra19:28
anteayaI recall the js minifier patch was about to look for that, that needs to be in for tomorrow, just wondering if something else got discovered that I missed19:28
anteayapatch, I was19:28
anteayathere it is, I have reviewed, it has yet to be approved: https://review.openstack.org/#/c/165145/19:29
*** hashar has joined #openstack-infra19:31
*** sarob has quit IRC19:31
clarkboh that's a new one. will review after eating blts19:31
*** MarkAtwood has quit IRC19:32
cineramaoh pleia2 when you get a chance, the spec mentions ansible playbooks - if you have the location i wouldn't mind taking a look to see if there's stuff we missed in the modules19:32
mordredclarkb: how many blts are you going to eat?19:32
*** garyh has joined #openstack-infra19:33
*** zz_dimtruck is now known as dimtruck19:33
*** hashar is now known as hasharConfcall19:34
fungiall teh blts19:36
*** tiswanso has joined #openstack-infra19:36
*** yamahata has quit IRC19:36
openstackgerritMerged openstack-infra/project-config: Update forge-upload job to use tags  https://review.openstack.org/16401619:37
*** yamahata has joined #openstack-infra19:37
fungiclarkb: when you have a moment between blts #4 and #5, another portland travel logistics question... is 2.5 hours from landing at pdx to amtrak departure from union station easy enough to accomplish via public transit?19:37
pleia2cinerama: I'll forward them to you (they weren't strictly open sourced, just grabbed and sanitized by Red Hat IT and shared with me)19:38
greghaynesmordred: Made a couple fixes to your https://review.openstack.org/#/c/165792/ in case you didnt see19:38
cineramapleia2: oh cool thanks19:38
pleia2cinerama: which address to send them to?19:38
cineramapleia2: either hp or personal is fine19:38
mordredgreghaynes: I am a fan of fixes!19:38
pleia2cinerama: I don't know your personal address, so just PM me what you prefer :)19:39
clarkbfungi ya should be, take red line downtown ~hour + catch bus/yellow/green to union station ~20 minutes19:39
clarkbyou can walk that last step too19:39
fungiclarkb: cool. the neighborhood around union station looked marginally familiar on the map but wasn't sure what the closest stop on the red line was19:40
greghaynesfungi: youre portlanding!?19:40
greghaynesoh, im guessing this is for summit19:40
fungigreghaynes: for to ride teh trainz for summit, yes19:41
*** andreykurilin_ has joined #openstack-infra19:41
fungigreghaynes: though i have a talk accepted at oscon so will be back ~ a month later too19:41
greghaynesawesome, yes as clarkb said the max red line is kind of a direct airport -> amtrak19:41
*** ZZelle_ has joined #openstack-infra19:42
*** sushilkm has joined #openstack-infra19:42
*** sushilkm has left #openstack-infra19:42
*** sushilkm has joined #openstack-infra19:42
*** sushilkm has left #openstack-infra19:42
fungii always travel with a hiking pack as my checked luggage, so easy for me to walk a few miles briskly with it if needed19:42
greghaynesNice, I actually just booked a trip to your area for july :)19:42
*** dimtruck is now known as zz_dimtruck19:43
fungiooh! you should get a paper in for all things open and come to nc in october (though it's the week before tokyo, so maybe you actually shouldn't unless you're insane)19:43
*** garyh has quit IRC19:44
*** ihrachyshka has quit IRC19:44
greghayneshaha, the wife would be thrilled! (not really)19:44
mordredfungi: I need to submit for ATO19:44
mordredfungi: except - really it's the week before tokyo?19:44
fungimordred: SUBMIT!19:44
* mordred sobs19:44
fungimordred: it's sunday through tuesday this time though, so there's a few days buffer at least19:45
clarkbthe best part of this time of year is cadbury eggs19:45
pleia2++19:45
fungijust in case you wanted higher-octane sugar inside your normal sugar19:46
*** otter768 has joined #openstack-infra19:46
clarkbfungi: yes19:46
clarkbI got a dozen :)19:46
clarkbok time to review that change for gerrit19:46
clarkbanteaya: any others you can find?19:47
anteayanot for tomorrow19:47
anteayahoping to hear from zaro19:47
anteayado states have cadbury easter creme eggs now?19:47
anteayaI had believed you didn't19:47
mordredanteaya: we've had cadbury eggs for my entire life19:48
anteayacool19:48
mordredanteaya: it's possible that there is an additional thing that we don't have19:48
anteayanot sure what I'm thinking of then19:48
fungione of the few cadbury products we get here in the states19:48
pleia2cadbury in general isn't very common here19:48
pleia2but we get the eggs :d19:48
mordredto us, it's the company that makes the eggs19:48
fungiunless you go to import shops19:48
mordredfungi: MURICA!19:48
anteayahttps://en.wikipedia.org/wiki/Cadbury_Creme_Egg19:49
*** baoli has quit IRC19:49
fungii quite like the cadbury currant bars19:49
anteayafungi pleia2 oh okay19:49
fungibut muricans also mostly don't know what currants are either19:49
anteayaI don't know the currant bars19:49
fungior call them "tiny raisins"19:49
anteayafungi: well there's that19:49
anteaya:)19:49
pleia2fungi: not chocolate chips19:49
clarkbI think currants are those weird things we ate in belgium19:49
anteayareally?19:49
anteayaI don't consider currants belgian19:50
*** baoli_ has joined #openstack-infra19:50
*** otter768 has quit IRC19:50
clarkbI think they just had them there19:50
clarkbbecause ya we don't really have them inthis country19:50
* krotscheck has a supplier of redcurrants in Seattle. ALL TO MYSELF.19:51
clarkbok js people, why would we bother to go through the trouble of minifying jquery on trusty for ~15kb19:51
*** rfolco has quit IRC19:51
clarkbkrotscheck: ^ see https://review.openstack.org/#/c/165145/6/modules/openstack_project/manifests/gerrit.pp19:51
mordredkrotscheck: also, if you didn't see the other day - ubuntu apparently ships jquery.min.js as a symlink to jquery.js19:52
*** emagana has quit IRC19:52
krotscheckclarkb: Ehn. It doesn't hurt?19:52
anteayaclarkb: something about ensuring the toggle ci button works19:52
anteayaclarkb: not invalidating your question though19:52
clarkbanteaya: ya, mostly trying to figure out if this is worth the trouble19:52
krotscheckTo be honest, serving javascript up as gzip is more effective than minification.19:53
fungiclarkb: i agree 15kb extra that your browser's going to cache anyway isn't necessarily worth the effort to puppet compressing it19:53
krotscheckSo I usually don't bother minifying.19:53
anteayaclarkb: always worth it to ask that question19:53
krotscheckAlso, minifying makes production debugging hard.19:53
*** nilasae has joined #openstack-infra19:53
krotscheck"Exception thrown in line 1" -> Line 1 is 16K characters of text.19:54
clarkbkrotscheck: ya, though at least in gerrits case its all minified otherwise and impossible to debug so thats less of a concern19:54
fungianteaya: also it's a cadbury chocolate bar with currants and almonds. tasty, tasty stuff19:54
anteayafungi: I don't think I've ever seen that19:54
*** nilasae has quit IRC19:54
anteayasounds very tasty indeed19:54
fungianteaya: i've only found it in the uk19:54
*** nilasae has joined #openstack-infra19:54
anteayaah19:54
clarkbthe other question I have is what will yui-compressor do if fed an already minified version of the file?19:54
anteayaI'll look for it next time I'm there19:55
mordredclarkb: dude, krotscheck has convinced me we should not bother19:55
anteayahaven't spent much time in the uk yet, mostly just passing through19:55
fungiclarkb: we shouldn't be re-feeding the already minified file into it?19:55
* fungi re-checks that change19:55
clarkbfungi: oh right yup19:55
fungiyeah, it19:55
fungigrrr19:55
zaroanteaya, clarkb : yo! this is needed for the trusty upgrade, https://review.openstack.org/#/c/165145/19:55
mordredclarkb: and, in fact, should maybe stop minifying anywhere just because it would let us delete more puppet19:55
clarkbok so we do need to address the broken button19:55
fungiit's minifying the normal version not the min.js file19:55
anteayazaro: yes the very patch we are talking about19:55
clarkbfungi: ya19:56
anteayazaro: glad you are here19:56
zarowas out to lunch and now back19:56
fungii'm on board with serving readable source code from our servers19:56
fungibecause we're open19:56
anteayazaro: so how much do you care if we minify the js19:56
*** ajmiller_ has joined #openstack-infra19:56
anteayazaro: because right now the group is leaning towards not bothering19:56
clarkbzaro: can we not go through the trouble of minifying that file and simply have puppet do a symlink to /usr/share/javascript/jquery/jquery.js?19:56
zaroanteaya: fungi & jeblair seems to think it's important19:57
anteayazaro: okay so if they come back in favour of not bothering that is okay with you?19:57
fungizaro: i only felt it was important to actually have a minified file if we're serving it as jquery.min.js, but if we can serve the full source and _call_ it jquery.js i'm cool with that19:58
mordredclarkb: I'm voting for "have puppet do a symlink"19:58
clarkbI think I am fine with the change as is at this point too19:58
zarouhhm, i think that's to get better performance.19:58
zaroi'm not sure how much better though.19:58
mordredzaro: krotscheck says it won't do that really19:58
clarkbbut for simplicity a symlink would probabl be best19:58
clarkbthe file size differences is about 15kb19:58
mordredfungi: yes - my issue with the debian package was that they called it .min.js19:58
fungimordred: mine too. i think that's worse than just not including the file19:59
krotscheckThe only real benefit is download speed, and that's heavily dependent on your browser's caching settings, the server's use of cache invalidation headers, and the server's use of mod_gzip19:59
fungijavascript minification is, to some extent, an obsessive compulsive disorder some people have about squeezing every last bit of whitespace out of files they serve even if their webserver is going to turn around and gzip-encode it anyway19:59
mordredkrotscheck: all of which are going to do a better job than minification19:59
greghaynesYea, really the use case for gaining speed via minification isnt something I belive youall have19:59
* mordred hands krotscheck an extra box of redcurrants19:59
clarkbmordred: where are we with https://review.openstack.org/#/c/166318/ ? have images building in hpcloud and rax yet?19:59
fungiit's not like we're minifying our html19:59
krotscheckFrom what I remember, the gzip algorithm actually works better on things with large regular words rather than things collapsed to single-character varnames.20:00
mordredclarkb: image just uploaded to hpcloud-b5, I kicked b4 just now20:00
greghaynesThe only time ive seen that download size make a big difference is when youre dealing with things like mobile where its more of a slowstart issue than download size issue20:00
clarkbkrotscheck: that sounds right, because it does prefixes (or suffixes, maybe both) so you need longer strings that overlap20:00
*** ajmiller has quit IRC20:00
*** ghostpl_ has quit IRC20:00
anteayazaro: so this patch was to ensure the toggle ci button works, yes? https://review.openstack.org/#/c/165145/20:01
zaroyes20:01
*** andreykurilin_ has quit IRC20:01
clarkbmordred: cool, I will keep an eye on nodes there, devstack-trusty?20:01
greghaynesLike, either way, you might not even be talking a full packet in size difference when you gzip both versions so there is effectively no difference ;)20:01
anteayazaro: okay great, can we get the toggle ci button working without having to minify the js?20:01
mordredclarkb: yah20:01
*** ociuhandu has quit IRC20:02
zaroyes, it works wihtout minifying js20:02
anteayazaro: how would you feel if we went that way?20:02
clarkbzaro: oh, so it works today without that change?20:02
mordredclarkb: b4 has  it20:02
zaroclarkb: no, it will be broken on trusty20:02
clarkbzaro: ok, so we do need a change, but it doesn't have to be that change20:03
zaroclarkb: it works today because it's on precise.  precise's libjs-jquery package provides the min.js file20:03
clarkbright rather than a symlink20:03
zaroso you propose just linking to the .js file, yes that will work as well20:04
zaroactually it has to be a copy not a link20:04
clarkbzaro: we can do that then, just switch to using the real file not the .min.js20:05
*** ChuckC has quit IRC20:05
zaroclarkb: ok, maybe fungi and jeblair should chime in on that since i thought they wanted the min.js20:05
fungizaro: i only felt it was important to actually have a minified file if we're serving it as jquery.min.js, but if we can serve the full source and _call_ it jquery.js i'm cool with that20:05
*** Sukhdev has quit IRC20:06
jeblairzaro: i'm okay with the non-minified file.  it is a regression since we are serving it now, but the argument that it won't actually be any worse makes sense.  we can try it, and if it is, we can go with what you have.20:07
zarono min.js is cool with me if there's no benefit20:07
clarkband maybe we can file a bug with debuntu about this20:07
mordredclarkb: it would distract them from fixing python20:07
greghaynesDo youall do gzipping of those files when you serve them?20:08
pleia2clarkb: videos aren't online yet, but the slides for the "life of a logstash event" talk are up and helpfully detailed https://speakerdeck.com/elastic/life-of-a-logstash-event20:08
anteayayay, so we looking forward to your new patch zaro, which we hope to review and merge in the next few hours20:08
*** edwarnicke has joined #openstack-infra20:08
clarkbpleia2: thank you, was it a good talk?20:08
pleia2clarkb: it was great20:08
zarocool, i'll fix up.  maybe try to add that bug as well.  LP right?20:08
fungii mean, i sort of know why they did that. they can't ship jquery.min.js for certain reasons, but some other packaged web applications may be hard-coded to serve a file called jquery.min.js, so someone thought this was the most pragmatic compromise20:08
pleia2clarkb: I'll let you know when the video shows up :)20:08
clarkbpleia2: awesome, I should bug you when logstash derps now :)20:08
*** hdd has quit IRC20:09
greghaynesmy browser says youall do gzip, so \O/20:09
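
The same check from the command line, for reference (the URL is illustrative; any asset served by the site works):

    # ask for gzip and inspect the response headers only
    curl -s -H 'Accept-Encoding: gzip' -o /dev/null -D - \
        https://review.openstack.org/static/hideci.js | grep -i '^content-encoding'
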
clarkbfungi: wait, I thought they can if there is a FOSS toolchain to generate the file20:09
pleia2clarkb: haha, I might actually be able to help!20:09
clarkbfungi: I don't see how that is any different than say shipping a compiled gcc20:09
greghaynesYes, I thought jquery is mit license20:09
mordredclarkb: ++20:10
fungiclarkb: yeah, though the reliability of that was potentially in question when the javascript-jquery package landed in debian in the timeframe in which trusty imported it before it froze for release20:10
*** Bsony has quit IRC20:10
* fungi looks to see if it's still that way in testing/unstable20:10
*** Bsony has joined #openstack-infra20:11
jeblairgreghaynes: thanks for checking! :)20:11
jeblair(confirming gzip)20:12
clarkbpleia2: it is interesting that they still use that scaling architecture, I threw it out after about a day because it doesn't scale :)20:12
*** tiswanso has quit IRC20:12
clarkbpleia2: we run N indexers instead of funneling it down to 1 indexer20:12
* jeblair gets back to writing words20:12
fungiclarkb: mordred: jeblair: yeah, the libjs-jquery 1.7.2+dfsg-3.2 from jessie and sid has a separate min.js file not a symlink20:13
fungiand it's definitely smaller by roughly the right amount20:13
*** dougwig has joined #openstack-infra20:13
*** timcline has quit IRC20:14
clarkbcool20:14
*** ajmiller_ is now known as ajmiller20:14
*** erw has joined #openstack-infra20:14
fungiso looks like it was restored to sanity. it was a symlink in 1.7.2+debian-2.1 because the minification relied on uglify which was not at that time destined to make it into the wheezy release20:15
greghaynesclarkb: batch processing does a ton for scalng ;)20:15
fungi(circa november 2012)20:15
*** mfink_ has joined #openstack-infra20:15
clarkbmordred: devstack-trusty-1426881119.template.openstack.org is that the image I should be looking for?20:15
clarkbgreghaynes: you can do it without batch processing either20:16
clarkbgreghaynes: every shipper could just be an indexer too20:16
mordredclarkb: yes20:16
*** sweston has joined #openstack-infra20:16
mordredand it's uploaded to 2-5 now - and 1 is in progress20:16
greghaynesclarkb: Yes, I imagine under the hood thats what youre gaining by scaling via replication though20:16
clarkbgreghaynes: replication only affects query scaling not indexing20:16
clarkbor maybe you don't mean es replication20:17
greghaynesoh, I did but I guess it works differently than I thought. Maybe thats a good performance improvement if we run into write scaling issues20:17
greghaynesto somehow batch writes20:18
pleia2clarkb: the ELK family is interesting and *young* so even in 2 years since you first set up our system it's changed a ton, better scaling support across the board has been one of the big things20:18
pleia2clarkb: during one of the talks, they said they thought "logstash dropping things on the floor" was mostly an unusual bug/hardware failure/something but they came to realize it's a real thing once they started doing bigger testing20:19
clarkbpleia2: ya supposedly the elasticsearch 1.0 release performs much better but they keep having CVEs for their groovy script support which is a bit :(20:19
greghaynesbut sounds like youre just effectively making readonly replicas?20:19
pleia2clarkb: ah, that is unfortunate20:19
clarkbgreghaynes: no we basically have N logstash indexers that each process a file at a time20:19
clarkbgreghaynes: they talk to local ES clients that are part of the cluster which then index the data on the data nodes20:20
clarkbgreghaynes: but aiui you only ever write to the primary shard for indexing, so replicas don't help indexing performance20:20
*** mfink_ has quit IRC20:20
clarkbgreghaynes: they do however help reads because you can read from any shard that has the data on any node when doing queries20:20
clarkband if you lose a node with primary shards replicas will become primary shards so you also get ha from them20:21
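
For context, shard and replica counts are per-index settings in elasticsearch; a hedged example of creating a daily logstash index with several primary shards (indexing spread across data nodes) and one replica (query capacity plus failover), where the index name and counts are purely illustrative:

    curl -s -XPUT 'http://localhost:9200/logstash-2015.03.20' -d '{
      "settings": { "number_of_shards": 5, "number_of_replicas": 1 }
    }'
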
*** claudiub has quit IRC20:21
*** erlon is now known as erlon_away20:21
mordredclarkb: ok - all of hpcloud has the new devstack-trusty image20:22
clarkbmordred: ok20:22
clarkbmordred: have you also updated in rax to make sure we don't break rax tomorrow morning?20:22
*** melwitt has joined #openstack-infra20:22
openstackgerritKhai Do proposed openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514520:22
mordredclarkb: no - I'll do that next20:22
zaroclarkb, anteaya, fungi ^20:23
mordredclarkb: I'm bulding bare-trusty right now20:23
clarkbmordred: ok, probably just start with one region there20:23
mordredclarkb: yah20:23
mordredclarkb: although I _expect_ it to be a no-op since I got the file from rax - but still20:23
greghaynesclarkb: so, you obviously have to do some kind of write to the non-primary shards otherwise they don't have the data ;) I think you are effectively doing batch write replication though. Locally you're not, but when you replicate you will, which is where you tend to hit scaling issues20:23
mordredclarkb: oh - duh. bare-trusty is not dib20:24
*** melwitt_ has joined #openstack-infra20:24
mordredclarkb: I got an overquota error- apparently we're sitting at 600 nodes on hp20:24
*** dkliban is now known as dkliban_afk20:24
clarkbzaro: see comment20:24
*** thingee has joined #openstack-infra20:24
clarkbgreghaynes: oh I think maybe we have confused each other. The scaling issues are in logstash not es20:24
greghaynesclarkb: ty for the explanation though, kinda want to figure out more about tha tsetup20:24
mordredgreghaynes: we would love for someone other than clarkb to actually understand it :)20:25
clarkbgreghaynes: so the problem is scaling up cputime for logstash indexer process which means running one of those is bad for scaling20:25
clarkbgreghaynes: es is actually pretty good at scaling up, every time I have added nodes it has helped20:25
greghaynesclarkb: ah! I should stop assuming all problems are database problems20:25
clarkbgreghaynes: but basically ruby that runs lots of regexes in the jvm is slow :)20:25
*** peristeri has quit IRC20:25
mordredclarkb: it's a fair assumption20:25
openstackgerritKhai Do proposed openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514520:25
mordredgha20:25
*** dprince has quit IRC20:26
nibalizerjeblair: we confirmed that puppet-blacksmith can create the puppetforge module when it submits the first time20:26
mordrednibalizer: neat20:26
nibalizeror i guess more accurately that the forge api is smart enough20:26
zaroclarkb: argg! should be good now20:26
clarkbzaro: one more thing20:26
clarkbzaro: sorry should've caught that on the previous patchset20:26
*** melwitt has quit IRC20:27
*** armax has joined #openstack-infra20:27
zarodon't we want to copy only on package updates?20:28
clarkbmordred: for my sanity, the thing that we think caused hpcloud troubles was lots of deletes piling up after jeblair's change because they were no longer serialized? and we had lots of deletes because metadata was broken. To address this we reverted jeblair's change and are avoiding the metadata service20:28
clarkbzaro: the notify is independent. you need to tell gerrit to rebuild its stuff once you update the file20:28
clarkbzaro: so ya you need both things, the package subscription and the notify to make gerrit rebuild things20:28
mordredclarkb: yes20:29
clarkbmordred: any idea on whether or not metadata service is being fixed in hpcloud?20:29
mordredclarkb: well, we had a non-zero number of build failures that indicated some issue with metadata service in their logs20:29
mordredclarkb: I do not expect that it is - I think it is merely a fundamentally broken part of openstack20:29
clarkbwhere non-zero is the vast majority20:29
clarkbmordred: it may be, but we have only been experiencing trouble in hpcloud for about 2-3 weeks now20:30
clarkbmordred: basically I am trying to work backwards and see if we can attribute all of this to the same problem20:30
mordredclarkb: I think in general the mysql there is unhappy20:30
clarkboh I see and metadata relies on that to get its info20:30
mordredAIUI20:30
clarkbgotcha so ya it may actually all be related20:30
zaroclarkb: doesn't the GerritSiteHeader.html get reloaded on every browser page reload?20:31
clarkbbecause I think we leak resources when these random failures happen that report failure to nodepool without any resource uuids20:31
clarkbzaro: no gerrit only notices that file has changed if you touch it20:31
clarkbwhich is what the exec does20:31
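
In shell terms, the copy-plus-notify that puppet performs boils down to roughly this (the paths are assumptions following the usual Gerrit site layout):

    # copy the packaged, unminified jquery into Gerrit's static area
    cp /usr/share/javascript/jquery/jquery.js ~gerrit2/review_site/static/jquery.js
    # bump the site header's mtime so Gerrit notices the change and rebuilds
    touch ~gerrit2/review_site/etc/GerritSiteHeader.html
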
mordredclarkb: yes - although it seems that we may want to more systemically account for the 502 followed by resource pattern20:32
clarkbmordred: ya I think the metadata idea from fungi is a good one20:32
mordredclarkb: like - it's quite possible that we will ALWAYS have registered an actual request when that happens20:32
clarkbmordred: I am just trying to assert that This si what made hpcloud broken for us over the last two weeks20:33
mordredyah20:33
clarkbbecause if it isn't then we also have other things to debug20:33
clarkbflashgordon: ^ around? we would like to talk about that and how this may be a nova bug20:33
*** tqtran_afk is now known as tqtran20:33
mordredclarkb: I kinda think that we should trap for 500 errors, and if we get them, assume that the request succeeded and that we need to poll nova for the uuid based on the hostname we requested and try to resume once we have one20:34
greghaynesIt would be helpful to know how they determine request throttling...20:34
*** andreykurilin_ has joined #openstack-infra20:34
fungiclarkb: mordred: i assert the metadata idea was jeblair's20:34
openstackgerritKhai Do proposed openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514520:34
mordredclarkb: because it's a frequent enough thing - it may fit the profile of "here is another way we've discovered clouds fail that we work around"20:34
*** zz_dimtruck is now known as dimtruck20:34
greghaynesseems like the case we're optimizing for is: if we make a request and it fails, can we immediately make a second one - not can we async a bunch of requests out20:34
mordredas in - I think we should just put more logic in our retry-timeout code there - and not delete the incomplete database record until we've reached the timeout20:35
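(A minimal sketch of the retry idea being discussed, assuming a novaclient-style client object; the helper name, timeout values, and polling loop are illustrative, not the actual 166383 patch.)

    import time
    from novaclient import exceptions

    def create_server_tolerating_5xx(client, name, image, flavor,
                                     timeout=300, poll=5):
        # Ask nova to boot the server; a 5xx here does not necessarily
        # mean the request was dropped -- the instance may still appear.
        try:
            return client.servers.create(name, image, flavor)
        except exceptions.ClientException:
            # Instead of deleting the (possibly half-created) record right
            # away, poll for a server with the name we requested and only
            # give up once the timeout is reached.
            deadline = time.time() + timeout
            while time.time() < deadline:
                matches = [s for s in client.servers.list() if s.name == name]
                if matches:
                    return matches[0]
                time.sleep(poll)
            raise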
clarkbgreghaynes: I think part of the problem here is that throttling is $request/time when $request may be one of many requests that all have different costs20:35
zaroclarkb: ok, done. thanks20:35
greghaynesclarkb: oh joy20:35
mordredyup20:35
mordreddelete, for instance, is very expensive20:35
clarkbzaro: lgtm thanks20:36
*** emagana has joined #openstack-infra20:37
zaroclarkb, fungi : noticed that jquery.min.js is used for other servers as well so probably should keep an eye out for those when moving to trusty.20:37
anteayafungi mordred pleia2 jeblair https://review.openstack.org/#/c/165145/ is up for review, be best if we had it in for tomorrow20:37
clarkbmordred: 687d0191-2605-4a8b-a1e5-cd773366c9b5 is up console log looks mostly ok to me20:38
mordredclarkb: woot20:38
clarkbmordred: I think nodepool is waiting to get it a floating ip now20:38
clarkbmordred: but hopefully it goes used/ready soon20:38
*** eharney has quit IRC20:38
clarkbzaro: good point, we use it for zuul status and other tools20:39
*** dustins has quit IRC20:39
pleia2anteaya: that's the patch that clarkb and zaro are talking about how :)20:39
pleia2s/how/now20:39
openstackgerritDoug Wiegley proposed openstack-infra/project-config: For neutron and neutron-lbaas, skip more wasted jobs  https://review.openstack.org/16603520:39
anteayawell clark is +2 on the patch, so my read is that he is happy as is20:40
*** tsg_ has joined #openstack-infra20:40
pleia2yeah, I'm going to hold off until checks pass20:40
anteayafair enough20:40
*** bswartz has quit IRC20:40
pleia2but it's still on my radar20:40
anteayagreat, as long as we have it merged for tomorrow20:40
anteayadidn't want it to get lost20:40
clarkbmordred: so, thinking about rebuilds, and I know jeblair wants it to be zuul v3, but if we changed from queues for provider managers to heaps where delete had a higher cost/lower priority, we could then update those deletes to be rebuilds, which have a lower cost, re-sort, and bam, now we have a new node20:41
clarkbmordred: I think that may be a relatively easy change assuming the heap implementation doesn't make us cry20:41
*** tsg has quit IRC20:41
*** AJaeger has quit IRC20:41
jeblairclarkb: do you know why i want to work on that in zuulv3?20:41
clarkbjeblair: no20:42
greghaynesclarkb: heapq?20:42
clarkbgreghaynes: ya but needs to be thread safe20:42
clarkbgreghaynes: so likely needs a wrapper of some sort, doable I just haven't thought about implementation much20:42
jeblairokay, so i've tried to explain this already, but i guess i haven't been successful.20:42
greghaynesclarkb: what about starvation?20:42
jeblairi will try again20:42
jeblairrebuilding is a big change in logic for nodepool20:43
jeblaircurrently the whole thing is built around delete/create20:43
jeblairspecifically the allocator assumes that behavior20:43
jeblairthe allocator is _incredibly_ complex at this point20:43
clarkbjeblair: yes, so thats why I am thinking about how to make it without making it a big change and I think using a heap above can make it not a big change20:43
jeblairno individual actually understands how it works20:43
jeblairchanging from the delete/create cycle to rebuild means altering the allocator20:44
*** dimtruck is now known as zz_dimtruck20:44
clarkbjeblair: I don't think it has to20:44
clarkbwhen you call create it would replace a pending delete with a rebuild if there are any deletes queued, otherwise place a create on the heap20:44
jeblairclarkb: okay, how can we avoid that then?20:44
*** garyh has joined #openstack-infra20:45
clarkbwhen you call delete it just adds a delete like normally happens20:45
*** sdake_ has joined #openstack-infra20:45
clarkbbut the api for the allocator remains the same, the provider manager just returns a node back into the scheduler that may or may not have been rebuilt20:45
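(A toy sketch of the allocator-preserving idea clarkb is describing: the provider manager turns a queued delete into a rebuild when a create request arrives. Class and method names are invented; this is not nodepool code.)

    import threading

    class ProviderTaskList(object):
        """Pending deletes can be converted into rebuilds when a create
        request arrives, so the allocator's API never has to change."""

        def __init__(self):
            self._lock = threading.Lock()
            self._pending_deletes = []   # node ids waiting to be deleted

        def request_delete(self, node_id):
            with self._lock:
                self._pending_deletes.append(node_id)

        def request_create(self, label):
            with self._lock:
                if self._pending_deletes:
                    # Reuse a node that was about to be deleted: rebuild it
                    # to the requested label instead of delete + create.
                    node_id = self._pending_deletes.pop(0)
                    return ('rebuild', node_id, label)
                return ('create', None, label)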
clarkbgreghaynes: starvation is something to worry about, BUT we are already so starved doing deletes I don't think it will be worse20:46
*** timcline has joined #openstack-infra20:46
*** hyakuhei has joined #openstack-infra20:46
greghayneshrm, I mean, I could whip out a simple PI loop for that ;)20:46
greghaynesbecause thats effectively what you need here too20:47
greghaynesbut seems like different problem first20:47
clarkbI am fairly positive that the naive implementation would work just fine at least compared to the current situation20:47
greghaynesyea, with this kind of stuff its almost always a 'just test it'20:48
*** sdake has quit IRC20:49
*** enikanorov has quit IRC20:50
*** radez is now known as radez_g0n320:50
pleia2zaro: precise apply failed on https://review.openstack.org/#/c/165145/ note inline about it20:50
jeblairclarkb: i could see how that would work with hpcloud in its current situation.  i do not believe we would end up doing very many, if any, rebuilds on rax because deletes happen so quickly there.20:50
clarkbjeblair: ya, it likely would not work well in a situation where delete isn't very high cost20:50
jeblairclarkb: and you would not need to change the allocator, unless you wanted to fix the rax problem20:51
jeblairclarkb: though you would need to change quite a bit of the rest of nodepool -- the delete and create threads20:51
*** enikanorov has joined #openstack-infra20:51
*** adalbas has quit IRC20:52
openstackgerritDoug Hellmann proposed openstack/requirements: Fix oslo caps for kilo  https://review.openstack.org/16637720:53
jeblairso here's why i want to do this in zuulv3 -- nodepool is really complicated, and hard to maintain and hard to test changes.  the allocation system in v3 will be so much simpler -- node requests are just a fifo, and the allocator just needs to find spare capacity as it comes up.20:53
zaropleia2: uggh, the puppet file doesn't have refreshonly. it was used for exec from previous patch.  will fix up20:53
jeblairit will be really simple to build systems like this on top of it20:53
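(A toy illustration of the v3 model jeblair is describing -- node requests are a plain FIFO and spare capacity satisfies the oldest request. Nothing here is actual zuul code, and provider.launch() is an assumed interface.)

    from collections import deque

    class FifoAllocator(object):
        def __init__(self):
            self.requests = deque()    # node requests, oldest first

        def request_node(self, label):
            self.requests.append(label)

        def capacity_available(self, provider):
            # Called whenever a provider reports spare capacity: satisfy
            # the oldest outstanding request, nothing cleverer than that.
            if self.requests:
                provider.launch(self.requests.popleft())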
greghaynesjeblair: is v3 just the one spec at this point?20:54
pleia2zaro: thanks :)20:54
fungii could sort of see it: if nodepool sees we're running at capacity and is deferring creation, then waiting demand could get turned into rebuild calls on nodes which complete jobs instead of making delete calls for them, but that would then only kick in when you're out of capacity everywhere20:54
*** garyh has quit IRC20:54
jeblairwhereas changing fundamental things about nodepool is really hard right now.  i'd rather try to avoid spending a lot of time working on an implementation in the current complex system which i want to get rid of.20:55
fungiwhich wouldn't necessarily be much of an improvement over the current situation20:55
fungiyep, makes complete sense20:55
jeblairgreghaynes: yes.  i was in the middle of writing the next part20:56
mordredI think I have a patch almost finished to deal with the current weird hp failure, btw20:56
clarkbya I think the ultimate goal of zuul v3 is good. I just don't know how long we can limp along at <200 useable nodes at any time20:56
jeblairgreghaynes: (it's also an email, which is a good chunk of it)20:56
greghaynesah, ok. I should find that20:56
jeblairclarkb: it's more like 300, but yeah20:57
*** nilasae is now known as nilasae|zzz20:57
openstackgerritKhai Do proposed openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514520:57
zaropleia2: ^20:57
*** rkukura has quit IRC20:58
clarkbzaro: pleia2 oh refresh only must be an exec thing20:58
fungigreghaynes: http://lists.openstack.org/pipermail/openstack-infra/2015-February/002471.html20:58
pleia2clarkb: yeah, seems so20:58
mordredclarkb, jeblair: either one of you happen to have one of the 500 level exceptions on create tracebacks laying around handy?20:59
fungimordred: there was one in my paste earlier20:59
pleia2zaro: thanks, I'll keep an eye on tests and approve when things pass20:59
clarkbmordred: I do not but I can probably grep one for you if that helps20:59
mordredah! found one20:59
jeblairclarkb: if we need to do it, then we need to do it.  honestly, for something like that i expect that one or two infra-cores will spend a week or two babysitting it and fixing things _after_ we've merged the change.  we never find all the edge cases right away.20:59
greghaynesfungi: tyty20:59
fungigreghaynes: ywyw21:00
clarkbjeblair: yes, it wouldn't be a low cost change to implement. But I do think we can avoid allocation complications21:00
*** tkelsey has joined #openstack-infra21:00
*** achuprin has quit IRC21:00
*** sarob has joined #openstack-infra21:00
*** _nadya_ has joined #openstack-infra21:01
jeblairclarkb: how would you like to proceed?21:02
*** baoli_ has quit IRC21:02
*** emagana has quit IRC21:02
openstackgerritMerged openstack/requirements: Bump sahara client version  https://review.openstack.org/15542821:02
*** esker has joined #openstack-infra21:03
clarkbI am reading heapq docs to see how terrible a priority queue implementation might be. It doesn't look like replacing arbitrary entries is very performant or easy21:04
*** tkelsey has quit IRC21:04
*** dmorita has joined #openstack-infra21:05
*** emagana has joined #openstack-infra21:05
clarkbthough can probably work around that simply by using two Queue.Queues, since really we only have two priorities: delete and everything else21:05
*** Ryan_Lane has quit IRC21:05
clarkblet me see how terrible updating the code to do this might be really quickly21:06
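(A sketch of the two-queue idea, leaning on Queue.Queue's built-in locking -- Python 2 spelling, matching nodepool at the time. The class and the polling behaviour are illustrative only.)

    import Queue

    class TwoPriorityTasks(object):
        """Serve normal provider tasks before deletes, using two
        thread-safe queues instead of a heap that needs re-sorting."""

        def __init__(self):
            self.normal = Queue.Queue()    # create, list, floating-ip, ...
            self.deletes = Queue.Queue()   # expensive, lower priority

        def put(self, task, is_delete=False):
            (self.deletes if is_delete else self.normal).put(task)

        def get(self, timeout=1):
            try:
                return self.normal.get(block=False)
            except Queue.Empty:
                # Nothing urgent pending; fall back to a delete, or let
                # Queue.Empty propagate if there is nothing at all.
                return self.deletes.get(timeout=timeout)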
*** _nadya_ has quit IRC21:06
greghaynesclarkb: You can just rebuild the queue, I think we have *gasp* hundreds of nodes, right?21:06
*** hasharConfcall is now known as hashar21:07
openstackgerritMonty Taylor proposed openstack-infra/nodepool: Deal with failures that succeed  https://review.openstack.org/16638321:07
clarkbgreghaynes: ya but this way I also get thread safety so I am just going to go with it21:07
mordredclarkb, fungi, jeblair: ^^ there is a first stab at adding some smarts about waiting for the server record to show up after a 50021:07
greghaynesclarkb: well we have to make inserts thread safe too, IMO why not just put locks around it and call it good21:07
clarkbgreghaynes: because Queue.Queue is thread safe21:07
mordredI would like to posit that it might get us a decent amount more nodes in hpcloud - given how many aliens we keep growing21:08
mordredI'd like eyes on the approach before I spend too much time working on testing it21:08
jeblairmordred: i believe most of the aliens have been related to our failed experiment from yesterday21:08
mordredjeblair: we had them before our experiment21:09
jeblairmordred: before then, we only had a handful every few weeks i want to say?21:09
mordredjeblair: possibly - but I think it's a regular failure mode of hpcloud now21:09
mordredjeblair: so handling it is appropriate21:09
clarkbjeblair: we had ~150 a couple days ago21:09
jeblairclarkb: okay, that's more than i recall dealing with.21:10
*** zz_dimtruck is now known as dimtruck21:10
mordredyah - I remember trying to get the noc folks to care21:10
*** ldnunes has quit IRC21:13
krotscheckclarkb, mordred: I stand corrected, minified does actually gzip even more -> http://paste.openstack.org/show/194069/21:14
clarkbkrotscheck: meh its 5kb difference :)21:14
*** mattfarina has quit IRC21:15
*** bswartz has joined #openstack-infra21:15
krotscheckclarkb: You mean 5021:15
krotscheck?21:15
openstackgerritMerged openstack-infra/project-config: Use template for Rally py34 job  https://review.openstack.org/16485821:15
*** timcline has quit IRC21:15
krotscheck(well, 43)21:16
fungikrotscheck: that's also a much newer jquery than we're talking about21:17
krotscheckfungi: True.21:17
fungithe difference in size is substantial from 1.721:17
*** emagana has quit IRC21:18
clarkboh was I off by an order of magnitude because they increased size by an order of magnitude?21:18
krotscheckWell, 1.11 also has a ~40KB difference.21:18
jeblairjquery was downloaded from review 15062 times yesterday (out of 40385 requests; most were 304 not modified)21:18
krotscheckI actually think it's documentation.21:19
*** emagana has joined #openstack-infra21:19
fungioh, wait, it's not. i was looking at the file earlier and seeing an order of magnitude difference in size21:19
clarkbmaybe all my maths are off21:19
fungifridaymath21:19
clarkbmordred: that node I gave you a uuid for is still building fwiw21:19
krotscheckYeah, caching is definitely the thing that needs to happen.21:19
* greghaynes is curious what the thing trying to be optimized is21:21
greghaynesbecause if its either download speed or bandwidth then google jquery cdn will be the best fix21:21
greghaynesbut if its just effort then not sure it matters21:21
fungimostly effort as far as i'm concerned. but also i like us serving actual readable/modifiable source code21:22
krotscheckWell, the reason I care right now is that I've had a bunch of discussions with packagers and other frontend people, and I'm trying to come up with sane JS policies to propose to the TC.21:22
*** timcline has joined #openstack-infra21:22
krotscheckThings like: Don't minify, learn to cache instead.21:22
*** emagana has quit IRC21:23
fungibut also make it possible for the deployer to decide to switch out the js for minified versions if they really want to go through the effort to be that ocd about it21:23
krotscheckBut for that I need data.21:23
clarkbjeblair: you know where this really gets complicated? the fact that we have to delete >1 thing :(21:24
jrollfungi: fwiw, you can serve a source map to modern browsers so that they can make the JS readable21:24
*** hyakuhei has quit IRC21:24
mordredkrotscheck: data++21:24
fungijroll: true, like shipping separate stripped binaries and symbol files21:25
jrollyep21:25
mordredclarkb: still building seems very lame21:25
greghaynesactually, that brings up a good point krotscheck ^ the biggest gain of not minifying, if it's software we're making, is that we can actually debug errors21:25
mordredclarkb: devstack-trusty-1426883075.template.openstack.org <-- rax-dfw rebuilt21:25
greghaynesotherwise you have to do that source mapping hackery21:26
SpamapSmordred: https://review.openstack.org/#/c/166383/ -1'd21:26
SpamapSmordred: but I think the idea is sound and worthwhile.21:26
mordredSpamapS: thanks - good feedbacks21:26
krotscheckgreghaynes: Ooooh yes. I know that pain acutely.21:26
fungikrotscheck: jeblair: anyway, assuming that all those downloads yesterday were gzip-compressed, that's ~630mib additional data which would have been downloaded if we weren't minifying. so not enormous21:27
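(Back-of-the-envelope check of that figure, using the ~43 KiB minified/unminified difference mentioned above; both inputs come from the discussion, not remeasured.)

    downloads = 15062            # jquery fetches that were not 304s
    extra_kib = 43               # approx size saved per fetch by minifying
    print(downloads * extra_kib / 1024.0)   # ~632 MiB extra per day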
SpamapSgreghaynes: my experience has been that building a simple way to run in a debug mode using the query string helps with that.   foo/?dontminify=121:28
clarkbSpamapS: good luck getting that into gerrit21:28
SpamapSclarkb: They don't have UI engineers with sanity requirements? :)21:29
greghaynesSpamapS: The problem is a lot of the time you have a setup where clients send backtraces to you when they error21:29
greghaynes(I wonder if any of our deployers do that?)21:29
greghaynesSince you kind of want to know if the code you send them works21:29
krotscheckgreghaynes: I thought that's why we have tests.21:29
krotscheck:D21:29
greghaynesheh21:29
fungiSpamapS: sanity requirements? it was initially developed at google. i think they just don't have ui engineers21:31
fungialso, i think the current and new webuis for gerrit are further proof of that conjecture ;)21:31
*** kgiusti has left #openstack-infra21:32
jeblairjquery is not used by gerrit21:32
krotscheckWell, they built Angular, which is pretty cool.21:33
krotscheckBut then they ported it to TypeScript.21:33
krotscheckSo I'm not certain what that says about them.21:33
openstackgerritClark Boylan proposed openstack-infra/nodepool: Rough rough shape of what rebuilds might look like  https://review.openstack.org/16638721:33
clarkbjeblair: ^ is the basic shape of the thing21:34
fungijeblair: nope, but getting switches into the gerrit request syntax to switch between serving multiple javascript files was the ui sanity question, not really jquery specifically21:34
jeblairkrotscheck: s/built/hired developer of/21:34
clarkbjeblair: but you are right there are some hairy bits, I have commented on them with TODOs and am curious if you think its worth figuring those bits out21:34
krotscheck....oh. Well then.21:34
krotscheckStill, TypeScript.21:34
krotscheckick.21:34
krotscheck(Though maybe not really)21:35
clarkbjeblair: biggest thing is DeleteServerTask needs to become more atomic within nodepool21:35
*** teran has joined #openstack-infra21:35
*** dboik has quit IRC21:35
clarkbjeblair: so that when it fires it handles all of the delete tasks otherwise rebuilt nodes would need ot figure out floating ips and keypairs some other way21:35
*** spzala has quit IRC21:36
BobBallHow can I have a gerrit reporter comment to gerrit without voting?  Seems that I _must_ have a value after the "gerrit:" tag which gets translated to something that's actually sent through.  Is there a no-op command I can add?21:36
jeblairBobBall: look at the definition of our experimental pipeline21:37
BobBalloh - {} - how obvious... Thanks.21:37
jeblairclarkb: do you think we should defer an existing priority effort here?21:37
*** esker has quit IRC21:38
*** VijayTripathi1 has quit IRC21:39
clarkbjeblair: I think getting the deletions working correctly with the various resources that need to be deleted + rebuilds will likely sink quite a bit of time and probably are not worth it21:39
openstackgerritMonty Taylor proposed openstack-infra/nodepool: Deal with failures that succeed  https://review.openstack.org/16638321:39
clarkbjeblair: we already know that this stuff is hairy even without rebuilds21:39
clarkb(we leak floating ips for example)21:40
mordredok. if we're doing that21:40
openstackgerritMonty Taylor proposed openstack-infra/nodepool: Use rebuild instead of delete  https://review.openstack.org/16638821:40
mordredI wasn't pushing mine up because we weren't doing it and I didn't want to be a randomization function21:40
mordredbut I think it can be much much simplier21:40
clarkbmordred: your change won't work for the same reasons I think20:40
clarkbmordred: no yours has the same issues21:41
mordredok21:41
mordredjust saying21:41
clarkbmordred: the problem here is that a VM isn't just a VM21:41
clarkbit should be and we should give that feedback to nova and neutron21:41
clarkbbut unfortunately today it isn't21:41
mordredclarkb: I don't know that that matters in this case21:42
clarkboh I see you just never call cleanupServer at all21:42
mordredyeah21:42
mordredthat's why I say "this will just keep things at max usage"21:43
clarkbya, the one place that may be a problem is with snapshot builds, maybe21:43
clarkbI do not know how "clean" a rebuild is for that21:43
mordredI think a follow up could be done to deal with that - but my main thought experiment was "can this be done without affecting the algorithm at all"21:43
clarkbanyways I think the deal I was describing isn't as simple as I thought because delete is really at least 3 delete operations21:44
clarkband as soon as we have to make that more atomic it becomes complicated21:44
mordredyes. getting delete right is very hard21:44
mordredbecause of that21:44
*** EmilienM is now known as EmilienM|PTO21:44
mordredI wish re-using floating-ips was less insane - but the race condition ... ZOMG - I just had an idea for that - probably a zuulv3 idea21:45
mordredbut I don't know why I didn't think of it before21:45
mordredI've been trying to think about floating-ip reuse atomicity in pure openstack terms21:45
mordredbut we have a database21:45
mordredwhich means we can deal with the multi-thread race condition issues with allocation flags in the db, rather than in the nova/neutron api21:46
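(A sketch of the database-side claim mordred is alluding to; the table, column, and parameter names are invented, and MySQL-style SQL with a DB-API connection is assumed.)

    def claim_floating_ip(conn, claim_id):
        # Atomically mark one unallocated floating IP as claimed; the UPDATE
        # either grabs a row or touches nothing, so two nodepool threads can
        # never walk away with the same address. claim_id is assumed unique.
        cur = conn.cursor()
        cur.execute("UPDATE floating_ip SET claimed_by = %s "
                    "WHERE claimed_by IS NULL LIMIT 1", (claim_id,))
        conn.commit()
        if cur.rowcount == 0:
            return None    # nothing free; allocate a brand new one instead
        cur.execute("SELECT address FROM floating_ip WHERE claimed_by = %s",
                    (claim_id,))
        row = cur.fetchone()
        return row[0] if row else None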
jeblairokay, i'd like us to make some kind of a decision here21:46
fungiso basically, to summarize, 166383 will go ahead and delete (or attempt to delete) the errored instance and wait for it to no longer appear in the nova list... but are we necessarily able to track it at that point (e.g. has it provided an instance uuid so it can be identified)?21:46
jeblairi was expecting to spend this afternoon working on zuulv3 specs21:46
jeblairbut i'm getting nowhere, because we're still talking about rebuilds21:46
jeblairso, can we decide to either pursue this thing or not?21:46
mordredfungi: no - 383 does the opposite21:46
jeblairi don't see any way it's simple21:47
mordredfungi: it continues to try to use a node even if it gets a 500 error - since we've learned that 500 errors are lies21:47
*** otter768 has joined #openstack-infra21:47
clarkbjeblair: I am with you now, we have not pursued it yet because it is complicated in various ways. Given that we have limped along with create-delete, we should be able to make that work while zuulv3 happens21:47
jeblairi think it's at least one person working on it a while, and at least one core babysitting it for a week21:47
jeblairif people think it's worth doing, let's knock something off the priority list and do it21:47
jeblairor put it on the priority list backlog21:47
*** aysyd has quit IRC21:47
fungimordred: ahh, yeah i forget that it can actually become usable even after a 5xx rather than simply hang around broken21:47
jeblairbut it's a big enough thing that i don't think we have any more marginal time for it21:48
mordredI think it's maybe worth it - and I Agree it's not going to be quick and easy21:48
*** doug-fish has left #openstack-infra21:48
mordredI say maybe because I think one of our clouds is not going to get any better any time soon, and I think that we've learned that delete calls are super expensive in openstack21:48
*** hashar has quit IRC21:48
jeblairmordred: so are creates.  do we know how expensive rebuild is?21:48
mordredso the ability to avoid them may give us a much larger amount of bandwidth21:48
mordredjeblair: I expect it to be the same21:49
mordredwhich makes it 1/2 as expensive21:49
fungihowever, we're past feature freeze now... we might have quite a few months we can limp along with the current create/delete model21:49
jeblairhas anyone tried this and timed it on hpcloud?21:49
*** jamielennox|away is now known as jamielennox21:49
mordredI think timrc did some initial benchmarks, yes21:49
jeblairfungi: i agree21:50
clarkbtimrc did, I don't have his numbers handy but they were an improvement21:50
mordredfungi: that is a good point21:50
openstackgerritMerged openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514521:50
*** timcline has quit IRC21:50
fungihence i'd rather see those several months runway invested hard in zuul v3 rather than continuing to try to make incremental improvements to the current design21:50
mordredI don't think it's urgent anymore because of FF - and I've been solidly on the "do it later" camp because adding it now will make the nodepool-shade patch larger21:51
fungiwhich might rob us of the resources we need to get new zuul in time for the _next_ feature freeze21:51
*** otter768 has quit IRC21:51
anteayazaro: so we have all we need for tomorrow?21:51
clarkbmordred: and you don't expect hpcloud to correct whatever has ailed it over the last couple weeks?21:52
mordredno21:52
mordredor, rather21:52
clarkbI am fairly positive this was not a problem a couple months ago21:52
mordredI think we need to assume for planning purposes that it will not21:52
mordredso that if it does, it will be a pleasant surprise21:53
*** enikanorov has quit IRC21:53
jeblairmordred, clarkb: let's do this.  think about it a bit more, and talk to timrc if you want.  do some experiments to see if we would actually gain anything, and if you want to do it, propose it as either a backlog priority effort, or propose that we bump a current priority effort for it at the next meeting.21:53
jeblairmordred, clarkb: how's that sound?21:53
* timrc perks up21:53
clarkbjeblair: ya, we should definitely test it in the new hpcloud situation too21:53
mordredjeblair: I think that's a great plan21:54
*** mriedem has quit IRC21:54
clarkbjeblair: not sure that timrc's testing captured what it is like now21:54
timrcclarkb, I pastebin'ed my numbers on the review for 'rebuild'21:54
clarkbtimrc: yes, but you did so before we blew up hpcloud yesterday21:54
*** weshay has quit IRC21:54
*** enikanorov has joined #openstack-infra21:54
mordredthere is another thing - which is that I think one of us might need to test in the openstackjenkins2 tenant21:54
timrcclarkb, Yeah... want me to rerun the script?21:54
mordredtimrc: so if you can provide a script21:54
timrcThere is a script...21:55
mordredclarkb: I say that because I think delete time might be tied to tenant account too21:55
clarkbmordred: gotcha21:55
*** garyh has joined #openstack-infra21:55
timrcGive me a second... I'm running off of what my carrier says is 4G on a beach in south Texas.21:55
mordredI don't know that it is - but if there are database table issues, then our soft-delete database history could make a difference21:55
mordredtimrc: it's not urgent21:55
clarkbbut ya I think checking that performance is a good next step and from there we can decide if its worth the effort21:55
mordredclarkb: ++21:55
timrcmordred, clarkb, jeblair: Numbers: http://paste.openstack.org/show/187334/ Script: http://paste.openstack.org/show/187333/21:56
jeblairokay, i'm going to get back to writing the zuulv3 spec so that it stops being an imaginary thing, and so we can get closer to actually working on it instead of blocking on me21:56
fungithanks jeblair!21:56
mordredjeblair: woohoo!21:57
jeblairthat's region-a, which is another difference, yeah?21:57
fungimordred: on the "limping along" front, any updates on whether hpcloud west is something we should try?21:57
clarkbjeblair: yes we are in region-b21:57
timrcclarkb, So... when you say blew up hpcloud... what does that actually mean? I've been on vacation.21:57
SpamapSbewm21:57
jeblairkabloom21:57
SpamapSas in, boom with a fratboy accent21:57
fungitimrc: maybe you should avoid worrying about it while vacationing. seems like a waste of a good vacation21:57
mordredtimrc: go back to vacation - you don't want to know21:57
clarkbtimrc: https://community.hpcloud.com/status/incident/294421:58
timrcfungi, Do you have kids?21:58
fungitimrc: point taken ;)21:58
timrc;)21:58
*** gordc has quit IRC21:58
SpamapSit kind of makes sense.. rebuild does an update on the row a few times (for status, image id) but otherwise just happens all in nova-compute21:58
mordredyah - also, we don't have to re-floating-ip21:58
mordredso the number of API calls it takes is considerably less21:58
mordredand if API call limit is one of our blockers21:59
SpamapScreate has to be scheduled21:59
mordredthen that might actually be more important - or at least an important factor21:59
*** carl_baldwin has quit IRC22:00
SpamapSyeah reducing API calls would be a win especially for hpcloud's current ailments22:00
fungiagreed, testing rebuild performance in our tenant while hpcloud is failing to respond to most of our api calls might be an interesting performance datapoint22:00
SpamapSalso with HPCloud floundering, does this increment the priority a bit on infra cloud?22:00
*** armax has quit IRC22:00
fungiSpamapS: dunno. i haven't seen recent updates the people who were writing the infra-cloud bits, though i could have missed some22:01
fungier, from the people22:01
jeblairSpamapS: i think infra cloud is already fairly high priority because of this22:01
jeblairfungi: may be a misconception here... lemme splain22:01
jeblairi've asked the new folks joining our team to pitch in on existing efforts because i don't want there to be an infra-cloud team which is separate from the infra team22:02
*** mrmartin has joined #openstack-infra22:02
SpamapSI've moved writing the initial docs changes up to the top of my priority, to be multi-plexed with nodepool and shade testing.22:02
fungioh! yes that's an extremely good thing22:02
*** tnovacik has quit IRC22:02
jeblairi think that's going pretty well so far22:02
jeblairso i think that as SpamapS finishes up the doc he's writing...22:03
fungithat explains the recent uptick in people getting more involved in general infra stuff, so i'm thrilled. seems to have worked out well so far22:03
jeblairwe can start to slot that effort into the priority list when one or more things wrap up22:03
*** sabeen1 has quit IRC22:04
mordredSpamapS: actually ... I was going to ask you if you'd inject work on nodepool-dib into your priority list22:04
fungiyep, i missed that's how it was ramping up. i may have skimmed one of the meeting logs from when i was on vacation a little too lightly22:04
jeblairand we'll all be working on it together, at least as much as we do anything else -- some people are going to focus on it more than others, but it'll operate more like the other things we've got going on22:04
clarkbSpamapS: did the first step of homogenizing hardware get started? /me wonders where we are at22:04
mordredSpamapS: because I think both of the main blocking tasks there you are exceptionally well suited to attack22:04
mordredclarkb: no - we have done no tasks there22:04
*** Swami has quit IRC22:05
*** jamielennox is now known as jamielennox|away22:05
mordredclarkb: I have run puppet on a node in each cloud region - so you have a login on them22:05
* greghaynes might also be able to help with nodepool-dib if SpamapS is spread too thin22:05
*** garyh has quit IRC22:05
mordredclarkb: but they have not been cleaned in any way - pending what jeblair is discussing before22:05
SpamapSmordred: by all means, push things onto my stack. :)22:05
jeblairalso, we expect to have a few more people joining soon too22:05
mordredSpamapS, greghaynes: Ng and GheRivero are starting to look as well - but the two tricky tasks are:22:06
clarkbmordred: I understood that was step 0 what is step -1?22:06
fungiout of curiosity, and feel free to point me to existing descriptions/documentation, what sort of initial capacity are we expecting out of the current hardware?22:06
mordredclarkb: current infra priority efforts22:06
jeblairso i hope that happens in a time frame where they can also pitch in on non-infra-cloud things, but also be here when we really start on infra cloud22:06
mordredSpamapS, greghaynes: get a working base image that can boot on both rackspace and hp for ubuntu and centos/fedora22:06
clarkbmordred: right but aiui its just an internal hp ticket to have someone physically located in the same building as the hardware move some pcie cards around22:06
flashgordonanteaya: pong, re: devstack-gate triggers grenade which is why we still should run grenade on it22:06
flashgordonclarkb: pong, which bug?22:07
greghaynesmordred: ah, and AIUI the issue there is just rax networking?22:07
mordredclarkb: yes - but we need to also do some more design on networking vlans, which is an infra team task before we set that in motion22:07
clarkbflashgordon: `nova boot` returns 502 error from api server, but then nova boots the node anyways22:07
mordredgreghaynes, SpamapS: we have a script that can handle rax networking22:07
mordredcurrent issue is making sure that script runs at the right time during boot22:07
SpamapSmordred: Ah see here I thought you had that well in hand and it was about done.22:07
flashgordonclarkb: what is the full response?22:07
clarkbmordred: gotcha22:08
mordredgreghaynes, SpamapS: Ng has been looking at it some, but is battling rackspace london22:08
flashgordonclarkb: very odd22:08
*** dboik has joined #openstack-infra22:08
mordredgreghaynes, SpamapS: but, in any case, you guys know a lot about dib things too :)22:08
clarkbmordred: have that nova 500 exception handy?22:08
*** marun has quit IRC22:08
mordredclarkb: one sec ...22:08
anteayaflashgordon: okay where do I find a list of projects grenade runs on?22:08
mordredclarkb: it's a ClientException "unknown error" fwiw22:09
mordredSpamapS, greghaynes: the second thing is the "port nodepool to shade" task - which yolanda started and GheRivero started looking at22:09
flashgordonanteaya: two answers, 'git grep project-name' in grenade22:09
mordredbut it's going to be a not-small patch22:09
mordredso collaboration is likely important22:09
clarkbflashgordon: ClientException: Unknown Error (HTTP 500)22:09
mordredit's going to involve porting smarts from nodepool into shade in a few places22:10
flashgordonanteaya: and http://git.openstack.org/cgit/openstack-dev/grenade/tree/ check for upgrade-* as you mentioned above22:10
fungiclarkb: flashgordon http://paste.openstack.org/show/193961/22:10
clarkbwe apparently don't have any 502's in the log so I was wrong about specific error22:10
fungi502 error actually22:10
SpamapSmordred: writing more tests will end up being a good parallel effort to maintain while that is ongoing.22:10
anteayaflashgordon: horizon isn't there, devstack-gate isn't there22:10
clarkboh two different types of Unknown Error. The best kind of Unknown22:10
anteayaflashgordon: neutron isn't there22:10
openstackgerritmelanie witt proposed openstack-infra/project-config: Adjust regression exceptions for Nova Cells V1 job  https://review.openstack.org/16639622:10
mordredSpamapS: yes indeed - but it would be great to shove a facehead into the bucket of that - I'm betting it will expose some specific things that need specific testing22:11
flashgordonupgrade-neutron22:11
greghaynesmordred: so for the first task - is the state of things that were all good for images in hpcloud and weve yet to get booting images in rax or is there hpcloud issues as well?22:11
SpamapSI keep forgetting that infra uses topics so nicely in gerrit.22:11
clarkbgreghaynes: hpcloud is good22:11
* SpamapS finds all the revies22:11
SpamapSreviews even22:11
flashgordonanteaya: https://github.com/openstack-dev/grenade22:11
flashgordonanteaya: http://paste.openstack.org/show/194083 is what I see22:12
clarkbflashgordon: but remember I was complaining that we were leaking nodes? this is how22:12
mordredgreghaynes: it's actually mostly that centos/fedora aka systemd is weird22:12
flashgordondevstack-gate calls grenade so the relationship is the other way around22:12
flashgordonclarkb: ahh22:12
mordredgreghaynes: we MAY be really close to being awesome everywhere22:12
flashgordonclarkb: that log isn't really useful hmmm22:12
greghaynesmordred: yep, thats the whole we need our script to run at the right time WRT networking and cloud-init, yes?22:13
*** amitgandhinz has quit IRC22:13
mordredgreghaynes: but we need to do empirical testing of the axes of (ubuntu, debian, centos, fedora) * (hpcloud, rackspace)22:13
anteayaflashgordon: what is upgrade-infra?22:13
clarkbflashgordon: yes well we only have nova to blame for that :) but I think the key bit is that even after a 5XX error it is possible for nova to continue scheduling a node happily22:13
greghaynesok22:13
mordredgreghaynes: yes - except no cloud-init22:13
SpamapSmordred: maybe we need to make our script a little more fuzzy22:13
fungiflashgordon: not really useful, but also all that novaclient tells us22:13
greghaynesoh, we changed that again22:13
clarkbmordred: wait22:13
SpamapSas in, run it backgrounded and keep trying, as long as we block bad things from happening.22:13
clarkbmordred: lets back up on that, we just tried no cloud-init and it failed spectacularly22:13
clarkbmordred: I think we should cloud init for this reason22:13
clarkbmordred: at least as a first stab22:14
mordrednonononoo22:14
mordrednononononononononononono22:14
mordrednononononon22:14
fungiclarkb: but remember that means installing our own non-distro-packaged cloud-init too22:14
mordredthat is totally different22:14
mordredplease don't confuse the issues22:14
greghaynesI think hes stuck in a loop22:14
clarkbmordred: I don't think I am, you just said no cloud init22:14
mordredno22:14
SpamapSreboot him22:14
clarkbmordred: we just tried that, it broke22:14
mordredhangon22:14
mordredno22:14
* timrc gets popcorn22:15
*** esker has joined #openstack-infra22:15
mordredit broke because we assumed that rax wasn't using cloud-init in their images22:15
mordredthey are22:15
mordredwe are not using their images22:15
flashgordonanteaya: upgrade infra is some random stuff AFAIK22:15
mordredin order to use cloud-init in our own images22:15
* SpamapS sends a SIGHUP22:15
mordredwe MUST use a patched version of cloud-init22:15
anteayaflashgordon: sigh22:15
mordredand it all gets very complex22:15
mordredI promise - we do not need to start over from scratch on this effort22:15
anteayaflashgordon: okay I have to go get more sap before I burn what is on the stove22:15
SpamapSright I believe nibalizer was working on cloud-init-in-a-virtualenv for that purpose?22:15
anteayaflashgordon: I'll look at it again later, thanks22:15
mordredwe are an initscript away from being done22:15
mordredno22:15
mordredNO seriously22:16
mordredcan we not start over from scratch22:16
SpamapSok for some other weird purpose. :)22:16
mordredI don't care why he was working on it22:16
*** ddieterly has quit IRC22:16
mordredI don't want to keep having this argument22:16
SpamapSI'm up for not starting over22:16
mordredwe are almost done with this22:16
SpamapSmordred: is there somewhere I can look at the result of things not working right?22:16
mordredit works - we're fine - we need to test it and make sure we've covered the combinations22:16
clarkbmordred: so, can you clarify what rax does use cloud init for and why we don't need to use it for that?22:16
SpamapSmordred: oh so it's truly at a point of needing to be reasoned about and landed, not smoke tested?22:17
flashgordonclarkb: can you run this with novaclient debug logs on?22:17
mordredSpamapS: yes22:17
flashgordonotherwise I don't have enough to go on22:17
*** abramley has quit IRC22:17
mordredclarkb: they have designed images that depend on cloud-init - I have not dug in to why22:17
mordredbut it's irrelevant22:17
greghaynesso, somewhat related question - mordred is there any sane way to get creds to boot some rax vms22:17
mordredwe have booted appropriate images in rackspace that do not have cloud-init and they work fine22:17
SpamapSmordred: ok, please point me at a starting point.. just the element in project-config ? Active review?22:17
flashgordonanteaya: no worries. so devstack has a lib/infra section22:18
mordredgreghaynes: yes - use your amex - I will approve it :)22:18
flashgordonupgrade-infra calls that22:18
greghaynesmordred: easy enough22:18
mordredgreghaynes: just make sure they don't put you in london22:18
mordredand you'll have to request glance being turned on22:18
mordredSpamapS: one sec - I'm looming22:18
mordredlooking22:18
*** esker has quit IRC22:19
mordredSpamapS, greghaynes: https://review.openstack.org/#/c/154132/22:19
pleia2fungi: just saw your superuser.o.o interview, very nice!22:19
fungigreghaynes: i would say sign up for the iopenedthecloud.com promotion, but they stopped offering it and took the form offline22:19
flashgordonclarkb: not nova -- 10.5.3 502 Bad Gateway22:19
mordredneeds to be finished - and we'll want to import that repo into gerrit (the one in source-repositories)22:19
fungipleia2: you're welcome! ;)22:19
flashgordonhttp://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html22:19
mordredbut I figured just leaving it there until we're good is fine22:19
greghaynesfungi: Yea, missed that boat :(22:19
*** esker has joined #openstack-infra22:19
SpamapSahh base-elements as a topic22:20
*** abramley has joined #openstack-infra22:20
*** arxcruz has quit IRC22:20
clarkbflashgordon: it happens with 500 errors too iirc22:20
clarkbfungi: ^ did we narrow it down only to the 502?22:20
flashgordonclarkb: and nova doesn't raise any bad gateways22:20
fungiclarkb: i didn't see the 500 examples but i'll look for one22:21
flashgordonclarkb: do you have a 500 error log as well?22:21
clarkbflashgordon: its the same just s/502/500/22:21
flashgordonclarkb: hopefully if its a nova bug it will have a error message22:21
clarkbflashgordon: Unknown Error22:21
fungiclarkb: flashgordon: i wouldn't be surprised if this is some sort of network device sitting in front of the api endpoint getting overwhelmed22:21
flashgordonno message22:21
clarkbfungi: ya22:21
flashgordonfungi: yeah that is my bet too22:21
mordredSpamapS, greghaynes: I imagine that you and Ng can probably knock it out in like, 30 minutes22:21
flashgordonwhich cloud is this on ?22:21
clarkbits entirely possible that nova is doing what it is told while a frontend device derps22:21
SpamapSmordred: I wonder if it would be simpler to land that script in the review, and then once we know it works, publish it as its own repo?22:22
clarkbflashgordon: hpcloud22:22
flashgordonclarkb:ah22:22
mordredSpamapS: I'm fine with whatever works best for your brain22:22
greghaynesmordred: yea, I suspect 90% of the effort is going to be just getting setup with rax properly22:22
mordredgreghaynes: yup22:22
mordredbut once you are - it'll make future testing things easier22:22
mordredbecause making sure shade works in both places is important too22:22
greghaynesyep, good point22:22
SpamapSmordred: oh its like, a thing, with setup.cfg and stuff22:23
clarkbfungi: iopenedthecloud ends next month too for those of us that have it iirc22:23
mordredSpamapS: we have this cookiecutter thing ...22:23
clarkbfungi: I need to find hosting that doesn't charge a $50 base fee22:23
SpamapSmordred: its more about it being best for velocity. Less moving parts in the beginning.22:23
SpamapSlurve me some single purpose well tested repos, but that this is not. ;)22:23
mordredSpamapS: yes - TOTALLY - I say do it - we can put it back later22:23
fungiclarkb: yeah i was never able to get them to correct my account to add that promotion so i've just been paying for about a year22:23
SpamapSmordred: ok22:23
* mordred must run away ...22:24
harlowja_clarkb hey, do u know if that virtualenv change ever happened so that https://review.openstack.org/#/c/164836/ can get rechecked?22:24
SpamapSgreghaynes: so, I suggest you start trying to use that review, and I will whip it into shape to be landed22:24
SpamapSmordred: sounds good, we got this22:24
clarkbharlowja_: no one has written it yet22:24
clarkbharlowja_: feel free to22:24
harlowja_k22:24
fungiclarkb: so far the 500 errors i'm finding are tripleo22:24
greghaynesSpamapS: Yep22:24
greghaynesmordred: Yes, all your base-elements are belong to us22:25
clarkbharlowja_: we have been battling the cloud exploded fires22:25
harlowja_np22:25
*** prad has quit IRC22:25
clarkbharlowja_: but it should be as simple as updating the line I linked with the latest version22:25
harlowja_ya22:25
clarkbharlowja_: or replacing the version specifier with ensure => latest22:25
harlowja_will get a review up22:25
clarkbharlowja_: ^ is likely the change we really want since we don't care about aging virtualenv/pip/setuptools22:26
harlowja_ya22:26
fungiclarkb: oh, found some http 500 errors in hpcloud but they're all for delete calls so far22:26
clarkbfungi: so its possible that only 502s caused the leaks22:26
fungiooh! ClientException: Unknown Error (HTTP 503)22:27
fungithere's another to hunt down22:27
fungithat was also on a deletion22:27
flashgordonclarkb: any idea of who I can switch to after the rax thing ends22:27
flashgordonmikal: I can has free cloud?22:28
clarkbflashgordon: no, I haven't really looked. I would stick with rax if the $50 base charge wasn't a thing22:28
flashgordon$50 base whaaat22:28
fungiclarkb: flashgordon: maybe https://www.runabove.com/22:28
clarkbhttps://www.arpnetworks.com/ are supposed to be good but not openstack22:28
fungithey have pretty low rates and are supposedly basic openstack services22:28
zaroanteaya: yep, looks like everything is in place.22:28
openstackgerritJoshua Harlow proposed openstack-infra/system-config: Always try to use the latest virtualenv  https://review.openstack.org/16640422:28
harlowja_clarkb ^ ok, let's see how that goes22:29
*** ociuhandu has joined #openstack-infra22:29
* flashgordon wants free22:29
clarkbfungi: their prices are pretty good and its openstack22:30
*** YogeeBear has joined #openstack-infra22:31
greghayneshighly recommend arpnetworks22:32
clarkbyes but not openstack22:33
greghaynes:p22:33
greghaynesyea, if you want to actually do dev and need a cloud then :(22:33
greghaynesI need to deploy an openstack in my rack so I can do this22:34
fungii gave up having a small datacenter in my house when i moved to the beach22:34
fungiso remote virtual machines are now pretty necessary for me22:34
greghaynesfungi: I actually colo22:35
greghaynesbut yea22:35
greghaynesthers upsides and downsides22:35
fungia colo near where i live would be way more expensive than what i'm doing now22:35
openstackgerritJoe Gordon proposed openstack-infra/project-config: Don't run neutron-large-ops on neutron advanced services  https://review.openstack.org/16564822:35
reedfungi, when you have time, remember to pull the list of new ATCs... you can do it on Monday22:35
fungiand "near" would be ~2-3 hours drive22:35
greghayneseek22:35
fungireed: yep, on my list for this weekend22:35
fungithough might end up being monday22:36
greghaynesWe end up having pretty cheap colo here in pdx (although its not the best DC) and I actually just do it because my home became way too hot otherwise22:36
reedno work over weekend, fungi22:36
clarkbgreghaynes: by not the best DC I think you mean its basically someones garage22:36
*** bknudson has quit IRC22:36
clarkbgreghaynes: because lol shelf servers22:36
fungireed: what work? i'm happily enjoying retired life22:36
reedLOL22:36
greghaynesclarkb: haha, its actually impressive as the building and infra goes, but yea they are pretty low key on how they manage it22:37
greghaynesclarkb: tata was going to move their HQ there then the bubble burst and the dc was just kinda overbuilt and underused22:37
fungigreghaynes: my home datacenter had a separate air conditioning unit for exactly that reason22:37
fungiand it was in my basement, so in the winter i'd just open the door to the rest of the house and use the computers as auxiliary heating22:38
*** YogeeBear has left #openstack-infra22:38
greghaynesnice!22:38
fungii had relay racks bolted to heavy duty shipping pallets with swivel casters underneath as my poor-man's raised floor, so i could move them around in the room as needed22:40
fungi3x3 grid of 500lb-rated swivel casters underneath each22:40
clarkbrunabove won't let me try their free tier without supplying a credit card22:40
clarkbI get it but  :(22:41
*** erlon_away has quit IRC22:41
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/project-config: Add config-drive element  https://review.openstack.org/15413222:41
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/project-config: Add elements for Infra servers  https://review.openstack.org/14084022:41
SpamapSgreghaynes: ^22:41
SpamapSremoved the gitorious dependency until we can get that code into its own proper stackforge (openstack??) project.22:42
fungiclarkb: yeah, and also a friend of mine tried to sign up with it for the very small start-up he was working for, and they insisted he provide paperwork proving the company existed. it's a french parent company, so "shut up and take my money" doesn't work with them like it does with, say, amazon22:42
SpamapSgreghaynes: also what happened to your devuser patch to dib?22:42
fungiSpamapS: gitorious? you mean gitlab right? ;)22:42
greghaynesSpamapS: not merged yet22:42
SpamapSfungi: sourceforge ftw?22:43
fungiSpamapS: apparently22:43
greghaynesSpamapS: oh, looks like they added a linter for some dib stuff that it needs to be updated for https://review.openstack.org/#/c/153439/22:43
greghaynesSpamapS: You going to use it?22:43
*** sdake has joined #openstack-infra22:43
clarkbwow and I can't boot any instances because they are reserved for paying customers currently22:44
SpamapSgreghaynes: I had a moment of wanting to just run 'kvm foo.qcow2' to test .. and no user to login as. ;)22:44
*** mriedem has joined #openstack-infra22:44
clarkbI wanted to see what their sandbox instances are. container vs vm etc22:44
clarkbI am in a waiting list though so I shall wait22:44
SpamapSclarkb: just setsockopt O_NONBLOCK .. 60% of the time it works, every time.22:45
greghaynesSpamapS: Also, not sure if you saw: https://review.openstack.org/#/c/156433/22:45
fungiclarkb: so far the 503 errors are also deletions, but in rackspace22:45
clarkbweird22:45
greghaynesSpamapS: Im unsure if the fact that people have tested in rax means that is verified as working or if people have been building images by hand, but it would be good to test22:45
nibalizerwhat was I supposedly doing?22:46
greghaynesnibalizer: fixing everything22:46
nibalizerwell i can confirm i am not doing that22:46
nibalizerSpamapS: ?22:46
fungiclarkb: flashgordon: oh, here's a ClientException: Unknown Error (HTTP 500) on create in hpcloud, so we do see some with that too22:47
*** baoli has joined #openstack-infra22:47
greghaynesnibalizer: we were talking about why you were messing with cloud-init in venv22:47
*** sdake_ has quit IRC22:47
greghaynesbut its actually not relevant since were decidedly not trying to cloud-init ATM22:47
clarkbI think I may have discovered that if I use horizon with runabove I can boot an instance <_<22:47
clarkbwe will see if it actually successfully boots22:48
nibalizerokay coool, yea burn cloud-init in the firepit plz22:48
nibalizerhopefully its author isn't in this room22:48
fungioh, he is22:49
*** baoli has quit IRC22:49
* nibalizer apologizes22:50
clarkbwoot I got a node, total hacks22:50
clarkbit looks like their $2.50/month node is a kvm vm22:50
clarkbwhich is winning22:50
fungiwhat specs for ram/disk?22:51
clarkb2GB ram 20GB disk22:51
flashgordonfungi: do you have the full trace?22:51
clarkbbut its oversubscribed, the ~$10/month is supposedly not oversubscribed22:52
clarkbalso you have to login as admin22:52
fungiflashgordon: sure, but the files/lines are the same as from the 50222:52
flashgordonfungi: :/ was hoping for some other data22:52
flashgordonfungi: once again w/o the debug logs from novaclient ...22:53
clarkbalso they give you a real ip addr22:53
clarkbbut no ipv622:53
fungiflashgordon: yep, it's exactly the same point in the client, just a (slightly) different http error code22:53
flashgordonso 500's are generally something went really wrong22:54
flashgordonso not to surprised that case is leaking things22:54
*** enikanorov has quit IRC22:55
clarkbfungi: my node seems to be in europe too22:55
fungiclarkb: yeah, i think they have several datacenters in europe and also at least one in quebec22:55
clarkbya I wasn't given an option via horizon to choose a region, I should dig into that more22:56
*** enikanorov has joined #openstack-infra22:56
flashgordonfungi: my hunch is its a layer on top of OpenStack as well22:57
flashgordonfungi: there may be a way to better detect what nodes nodepool started versus others22:57
*** dimsum__ has quit IRC22:58
flashgordonto make it easier to detect zombies22:58
*** hdd has joined #openstack-infra22:58
clarkbflashgordon: there is, we can write metadata on each node that states which nodepool booted the node22:58
clarkbthen if that nodepool doesn't know about that node it can delete it22:58
clarkbor adopt it I guess22:58
anteayazaro: great well I'm going offline then so I can be coherent tomorrow, see those who will be there at 150022:58
*** andreykurilin_ has quit IRC22:58
flashgordonclarkb: there may be a better way22:59
*** dimtruck is now known as zz_dimtruck22:59
*** dimsum__ has joined #openstack-infra22:59
flashgordonclarkb: yeah nova boot --meta22:59
flashgordonthat kind of metadata22:59
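(A sketch of that tagging idea with novaclient, which accepts a meta argument on create and exposes server.metadata; the metadata key and helper names are made up.)

    NODEPOOL_TAG = 'nodepool_instance'    # hypothetical metadata key

    def boot_tagged(client, name, image, flavor, nodepool_id):
        # Stamp every instance we boot so it can be recognized later.
        return client.servers.create(name, image, flavor,
                                     meta={NODEPOOL_TAG: nodepool_id})

    def find_aliens(client, nodepool_id, known_ids):
        # Anything carrying our tag that this nodepool has no record of is
        # an alien (leaked) node and can be deleted -- or adopted.
        return [s for s in client.servers.list()
                if s.metadata.get(NODEPOOL_TAG) == nodepool_id
                and s.id not in known_ids]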
*** dimsum__ is now known as dims23:00
*** tsg_ has quit IRC23:00
fungiflashgordon: yep, we definitely have an idea of how we might do that (or just ignore the 5xx errors per mordred's proposed patch) but regardless if it was likely to be a bug in nova we wanted to figure out whether we had enough details to make a useful bug report or identify if it's an already known issue23:01
clarkbmordred: that node I was follwing in hpcloud went ready and was then used23:01
clarkbmordred: and is now in the delete queue23:01
clarkbmordred: so I think if you change works in rax (where are we on that) then we are good23:02
flashgordonfungi: in the 5xx case do you get a instance id?23:02
flashgordonfungi: ahh23:02
*** boris-42 has joined #openstack-infra23:02
fungiflashgordon: not in the api response i don't think23:02
zaroanteaya: thanks for reminding the crew, see you tomorrow.23:03
clarkbfungi: correct we get the 50X not a uuid23:03
clarkbwhich then leads to leaking the node, mordred's hackaround is to query based on the name we told it to boot23:03
fungizaro: looking forward to it23:03
fungiin good news, zuul has about finished chipping away at its waiting jobs23:04
*** thedodd has quit IRC23:05
*** garyh has joined #openstack-infra23:06
fungispeaking of alien nodes, 217 at the moment23:06
fungijeblair: when you said slowly deleting those did you manually do so or have you been doing it continuously in a loop? if the former, i'll go ahead and do another cleanup pass while i'm thinking about it23:07
*** mtanino has quit IRC23:08
*** sarob has quit IRC23:08
*** ghostpl_ has joined #openstack-infra23:08
flashgordonclarkb: ahh  that is the workaround23:09
flashgordonfungi: so if no req-id its not a nova bug23:09
flashgordonas a general rule of thumb23:09
clarkbflashgordon: oh you want reqid23:10
clarkbflashgordon: I was talking instance uuid23:10
*** wenlock has quit IRC23:10
fungii went ahead and started deleting the current alien nodes23:11
flashgordonclarkb: err I meant instance uuid23:14
flashgordonwell really either23:14
*** ddieterly has joined #openstack-infra23:14
flashgordonif get neither it isn't a nova thing23:14
jeblairfungi: i had 2 processes going through them; they are finished now (sorry i don't know the completion time, but it was probably within the last hour)23:15
*** esker has quit IRC23:15
*** garyh has quit IRC23:15
fungijeblair: cool, well i've got a serialized pass going now23:16
jeblairfungi: good, that should reduce the load and decrease the chance of timeouts on our side (which cause more alien nodes)23:17
fungiso since you started your pass, we accumulated more than 200 additional23:17
*** sarob has joined #openstack-infra23:17
*** Bsony has quit IRC23:18
fungibut demand is now down to the point where i don't think we're going to continue accumulating many from here through the weekend23:18
jeblairfungi: yeah.  my gut is we can ascribe some of them to my additional activity (especially since i started with 10 of them in parallel before i realized the effect).  but not all of them.  perhaps i would discount that by 50-100.  so still a serious problem.23:18
*** esker has joined #openstack-infra23:18
fungiit's also about time to wind down here and do some friday night things, but i'll keep an eye on irc in case something goes horribly, horribly wrong23:19
*** ibiris is now known as ibiris_away23:21
pleia2fungi: enjoy, see you in the morning23:21
fungiabsolutely23:22
*** enikanorov has quit IRC23:22
*** enikanorov has joined #openstack-infra23:23
*** achanda has quit IRC23:24
SpamapSgreghaynes: reviewed your compress_and_save thing23:24
*** ociuhandu has quit IRC23:25
*** achanda has joined #openstack-infra23:26
*** mrmartin has quit IRC23:27
greghayneshrm?23:29
*** unicell has quit IRC23:29
* greghaynes has too many things lying around23:29
*** unicell has joined #openstack-infra23:29
greghaynesthe VHD one?23:29
*** tkelsey has joined #openstack-infra23:31
SpamapSgreghaynes: hah sorry I meant VHD23:32
SpamapSgreghaynes: but I said compress_and_save because thats what I made a comment on23:32
*** tsg has joined #openstack-infra23:34
pleia2doh, I think the toggle ci button broke on production review.o.o23:35
*** tkelsey has quit IRC23:35
SpamapShopefully it broke and always shows CI because I hate that CI is hidden by default now. ;)23:36
pleia2yeah, and jenkins results aren't up at the top with our votes23:36
SpamapSoh well I like that. ;)23:37
openstackgerritJames E. Blair proposed openstack-infra/infra-specs: WIP: Add Zuul v3 spec.  https://review.openstack.org/16437123:37
SpamapS(the results at the top) :-p23:37
*** unicell has quit IRC23:37
jeblair(still not done, but a little more defined)23:37
*** unicell has joined #openstack-infra23:37
*** tonytan4ever has quit IRC23:37
jeblairpleia2: did merging that change result in a broken symlink?23:38
*** esker has quit IRC23:38
pleia2jeblair: oddly not, we've still got jquery.js and jquery.min.js with old timestamps and living out their lives as separate files23:39
pleia2oh, it's the static one from /home I should be looking at23:39
jeblairSpamapS: well, the idea is that only the latest ci is shown.  so it should never be hidden, but there's no need to wade through 100 auto generated messages23:39
jeblairpleia2: ah, one idea is that gerrit may need to be restarted -- it's got something weird going on with the hash of the file that i'm not sure we fully understand23:40
SpamapSjeblair: I think I would like that better if I worked on some of the projects with hundreds of auto-generated CI results. :)23:40
pleia2-rw-r--r--  1 root    root    243K Mar 20 22:14 jquery.js23:40
pleia2so that changed23:40
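One hedged way to check whether the running Gerrit is actually serving the freshly deployed file, rather than a stale cached copy, is to compare checksums of the file on disk and the file as served; the path and URL below are guesses for illustration, not the real locations on review.o.o:

```python
import hashlib
import urllib.request

# Both of these are assumptions; adjust to the real on-disk path and the
# URL Gerrit serves the static file from.
ON_DISK = '/home/gerrit2/review_site/static/jquery.js'
SERVED = 'https://review-dev.openstack.org/static/jquery.js'

def md5_of(data):
    return hashlib.md5(data).hexdigest()

with open(ON_DISK, 'rb') as f:
    disk_hash = md5_of(f.read())

with urllib.request.urlopen(SERVED) as resp:
    served_hash = md5_of(resp.read())

print('disk  :', disk_hash)
print('served:', served_hash)
if disk_hash == served_hash:
    print('match: Gerrit is serving the updated file')
else:
    print('mismatch: Gerrit is still serving an old copy; a restart may be needed')
```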
jeblairpleia2: want to restart gerrit and see if it fixes it?23:41
pleia2it's also broken on review-dev and our new server23:41
*** harlowja_ has quit IRC23:41
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/project-config: Add config-drive element  https://review.openstack.org/15413223:41
jeblairpleia2: do we have gerrit running on our new server?23:41
*** markvoelker has quit IRC23:42
pleia2oh, no, it just redirects23:42
*** harlowja has joined #openstack-infra23:42
pleia2jeblair: we can try a restart, I'll need some help there though23:43
pleia2maybe on review-dev first? but I don't know if review-dev has any special weirdness23:44
*** che-arne has quit IRC23:45
*** emagana has joined #openstack-infra23:45
pleia2note to self: don't find these things at 16:30 on friday23:46
jeblair:)23:46
jeblairyeah, why don't you try review-dev first23:46
jeblairpleia2: nothing tricky about gerrit restarts; /etc/init.d/gerrit restart   should do it23:46
*** ajmiller has quit IRC23:46
jeblairi'm here as backup23:46
pleia2do we use init.d or service gerrit restart?23:47
jeblairpleia2: i suppose service is more correct; i think init.d works tho23:47
pleia2I'll do init.d today, here goes on review-dev23:47
SpamapSjeblair: do you only wear your grey beard when you're on IRC, or sometimes at the market too? ;-)23:47
* SpamapS secretly wishes upstart and systemd had never been invented and /etc/init.d/ was still "the way"23:48
jeblairSpamapS: i stop people at the market and tell them what i think about systemd23:48
pleia2still broken :(23:48
jeblairSpamapS: which basically means i blend in perfectly in berkeley23:48
SpamapS"Hello, do you have a few minutes to discuss pid 1?"23:48
pleia2no wait, I think it's ok!23:49
pleia2https://review-dev.openstack.org/#/c/5270/23:49
SpamapS"Have you considered what will happen to your zombie processes when you die?"23:49
jeblairpleia2: lgtm!23:49
jeblairpleia2: so the only advice i'd give about a prod gerrit restart is don't do it right before zuul is about to merge a change23:50
pleia2so, restarting real gerrit, anything special to do re: telling people or anything?23:50
pleia2ah23:50
jeblairpleia2: current top of the queue is 11 mins out so you should be fine23:50
pleia2ok, I'm going to do it now then23:50
jeblairsounds good23:50
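As a rough illustration of that pre-restart check, something like the following could peek at the head of Zuul's gate queue via its status.json; the URL and field names are assumptions based on how Zuul has historically published status, not something confirmed in the log:

```python
import json
import urllib.request

# Assumed status endpoint; the real deployment may publish it elsewhere.
STATUS_URL = 'http://zuul.openstack.org/status.json'

with urllib.request.urlopen(STATUS_URL) as resp:
    status = json.loads(resp.read().decode('utf-8'))

for pipeline in status.get('pipelines', []):
    if pipeline.get('name') != 'gate':
        continue
    for queue in pipeline.get('change_queues', []):
        for head in queue.get('heads', []):
            for change in head:
                # remaining_time is reported in milliseconds when Zuul can estimate it.
                remaining = change.get('remaining_time')
                if remaining:
                    print(change.get('id'), 'remaining ~%.0f min' % (remaining / 60000.0))
                else:
                    print(change.get('id'), 'remaining time unknown')
```

If the head of the gate is several minutes from merging, as it was here, the restart window is comfortable.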
*** pc_m has quit IRC23:51
SpamapShah, distributed systems are hard.23:51
jeblairif we were to do this in the middle of the day, i might consider a statusbot notice, but no one is around except SpamapS so.  :)23:51
pleia2hehe23:52
SpamapSjeblair: that guy wouldn't know what to do with statusbot notices anyway23:52
pleia2alright, back up, let's see23:52
pleia2all better!23:52
pleia2thanks jeblair23:52
jeblairyay!23:52
jeblairas a bonus, gerrit will be nice and speedy until we shut it down again tomorrow.  :)23:52
pleia2so javascript changes require a gerrit restart23:52
pleia2makes sense (what)23:52
jeblairpleia2: right? :)23:52
jeblairi thought that touching the site include file was supposed to avoid that, but i'm not certain we fully understand what's going on23:53
* pleia2 nods23:53
jeblairand restarting is faster than figuring it out.23:53
SpamapSpleia2: re: "sense"  http://www.quickmeme.com/img/a5/a5fd9f50473ea78ab4a5668771803996dfaebe931facffc060a9c530337dc7e7.jpg23:53
pleia2SpamapS: ++23:54
jeblairwhat a nice way to end the day23:54
SpamapSsome day.. I'll figure out why downloading an image from cloud-images.ubuntu.com on my home connection tops out at 1Mbit23:54
SpamapSbut if I download it to an hpcloud instance, and then to my home box, 40Mbit all the way :-P23:54
SpamapSwhich at least is effectively 20x faster but 100x more annoying.23:55
*** dannywilson has quit IRC23:56
*** gyee has quit IRC23:59
