Monday, 2014-01-20

sdaguefungi: hey, so what just happened with nodepool, just saw a huge drop in nodes00:03
fungisdague: i'm restarting it to try the aggressive-delete patch and see if that gets us back some of those deleted nodes faster00:03
sdaguecool00:04
sdague+100:04
fungibut it has to quiesce node creation/deletion activity before a graceful restart00:04
fungialmost there00:04
lifelessfungi: ctrl-C :P00:04
*** jhesketh_ has joined #openstack-infra00:05
sdaguefungi: once you are done, promoting - 67480 is probably a good idea. It will help give us a console on some of the network tests that are racing00:09
fungiwill do00:10
*** salv-orlando has quit IRC00:11
*** dcramer_ has quit IRC00:12
sdaguethanks00:12
openstackgerritlifeless proposed a change to openstack-infra/config: Clamp MTU in TripleO instances  https://review.openstack.org/6774000:14
openstackgerritlifeless proposed a change to openstack-infra/config: Update the geard server for tripleo-gate.  https://review.openstack.org/6768000:14
openstackgerritlifeless proposed a change to openstack-infra/config: Configure eth1 for DHCP in tripleo-gate instances  https://review.openstack.org/6726000:14
fungisdague: the promote did interesting things to the enqueued times for a few changes in the gate00:15
lifelessyay00:21
lifeless| 3e251d4c-377d-4fa4-9b6a-4eff78f86cd7 | precise-1390175363.template.openstack.org | ACTIVE | image_pending_upload | Running     | default-net=10.0.0.7, 138.35.77.21; tripleo-bm-test=192.168.1.12 |00:21
lifelesssignificant progress00:21
lifelessnow if I can just get someone to take all my patches ;)00:21
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Don't load system host keys.  https://review.openstack.org/6773800:23
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Ignore vim editor backup and swap files.  https://review.openstack.org/6765100:23
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Only attempt to copy files when bootstrapping.  https://review.openstack.org/6767800:23
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Document that fake.yaml isn't usable.  https://review.openstack.org/6767900:23
lifelessand -woo- | 02394c2d-a200-4e9b-83a4-ca2d87b411f1 | precise-ci-overcloud-1.slave.openstack.org | BUILD  | spawning   | NOSTATE     |00:24
lifelessci-overcloud open for business, minions!00:24
fungibooyah00:25
lifelessfungi: so with all my patches applied, it should be good. We're now blocked on this :(00:29
lifelessfungi: how can I move it forward?00:29
mattoliverauBeen reading a weekend's (including US friday) worth of scroll back. Looks like the new new new zuul migration worked out well. glad to hear it!00:29
fungilifeless: use toothpicks to hold clarkb's eyelids open while he reviews all of it ;)00:30
lifelessclarkb: Hi. You need toothpicks?00:30
lifelessfungi: can mordred review this, if I can distract him sufficiently?00:31
fungimattoliverau: well enough. the stumbling blocks we did hit counted as learning experiences/bugs worth fixing00:31
fungilifeless: probably. i can too if i free myself up sufficiently, but there's a lot of other stuff we all need to review too00:31
lifelessfungi: I know :(00:32
sdaguefungi: it did?00:34
lifelessmordred: it would be a great help if you could review everything from me in infra/config and infra/nodepool00:35
sdaguehmmm... yeh, so it definitely reset some of those00:35
sdagueinteresting00:35
fungisdague: i bet it's gerrit dependencies00:35
fungilook at the pattern00:36
sdagueyeh, could be00:36
sdagueso the items only include the roots?00:36
fungiseems to always be items reset immediately following other items from the same project. zuul looks for and pulls in any approved dependencies, so however that's being accomplished may be creating new objects00:37
fungirather than reusing the existing ones00:37
sdaguewell, it's recreating all of them00:37
sdaguemy patch just provided a way of setting the enqueue time00:38
sdaguebut I guess the children are a little different00:38
fungigot it. so i guess that code path isn't hit for dependent changes00:38
*** DennyZhang has joined #openstack-infra00:39
sdagueyep00:39
sdagueso I think once this lands - https://review.openstack.org/#/c/67739/ stable/havana will work again00:39
sdagueat least that's the current blocker00:39
*** sarob has joined #openstack-infra00:43
mordredlifeless: what?00:49
lifelessmordred: I've kindof patchbombed a bunch of stuff to get tripleo-ci functional (the infra/config and nodepool bits we need)00:49
mordredlifeless: yeah - I saw that - I'll go read00:50
mordredonce those land, you believe that ci-overcloud is good for business?00:50
mordredlifeless: also, how much capacity does it have? can we also use it for normal gate nodes?00:50
lifelessmordred: derekh and I have been pushing hard on actually, you know, having it all work and we're now (running manually) in end to end fine tuning00:50
mordred:)00:50
lifelessmordred: so status with these patches:00:50
lifeless - we should be able to run 24 cores of jenkins slaves00:51
lifeless - and right now uhm 10 test environments00:51
fungi24 cores of jenkins slaves meaning 6 slaves?00:51
fungi(at 4x vcpu each)00:51
lifelessfungi: dunno, depends on the size we choose. remember it's not running devstack-gate00:51
fungiahh, yeah00:52
mordredright. I was just asking about its capacity for also running d-g - mainly because I'm asking everyone that right now. the answer can be "nope"00:52
lifelesswe have another 40+ machines we can start scaling up into00:52
lifelessplus the RH cloud coming along00:52
lifelessI'm trying to highlight that 'good for business' is nuanced :)00:53
lifelessthe silent queue runs on everything but doesn't vote, right ?00:53
fungilifeless: the silent queue doesn't report to the change at all, just uploads logs and sends stats to graphite00:54
lifelessfungi: is there something that reports but won't vote ?00:54
fungilifeless: use a filter in the jobs section of layout.yaml to set voting: false on a job or job name pattern00:55
lifelessoh right00:55
fungisame place you filter which jobs run on what branch name patterns00:55
fungithen it will report back to the change, but its result won't be taken into account for the verify score00:56
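A minimal sketch of what fungi describes, parsing a hypothetical layout.yaml fragment in Python just to show the shape of a non-voting job filter (the branch regex is illustrative; gate-tripleo-deploy is the job made non-voting a little further down):

```python
# Hypothetical zuul layout.yaml fragment: jobs matched here still report
# back to the change, but a "voting: false" entry is left out of the
# verify score. Parsed with PyYAML only to show the structure.
import yaml

LAYOUT_FRAGMENT = """
jobs:
  - name: gate-tripleo-deploy
    voting: false
  - name: ^gate-tempest-dsvm-.*$
    branch: ^(master|stable/havana)$
"""

layout = yaml.safe_load(LAYOUT_FRAGMENT)
for job in layout["jobs"]:
    # Jobs vote unless the filter explicitly turns voting off.
    print(job["name"], "voting:", job.get("voting", True))
```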
lifelessmordred: oh, running actual devstack-gate jobs, the ones rh and hp run today?00:58
lifelessmordred: I think we should layer that in only after everything else is working00:59
mordredkk00:59
lifelessmordred: not so much a capacity issue (though there is that) but rather what benefit we get00:59
lifelessd-g is running elsewhere00:59
lifelesstripleo-gate isn't00:59
mordredd-g is running elsewhere, but the gate is under pretty massive duress atm00:59
lifelessonce tripleo-gate is running and heading up the path to being a symmetric gate with everything else01:00
mordredalthough maybe rax will bump our quota01:00
mordredlifeless: ++01:00
lifelessthen adding more d-g nodes in excess capacity would be a great thing to do01:00
openstackgerritlifeless proposed a change to openstack-infra/config: Don't vote with gate-tripleo-deploy yet.  https://review.openstack.org/6774301:03
mordredlifeless: ok. your config changes are +2/+A - they all seem pretty directly only touching tripleo at the moment01:14
lifelessmordred: yeah, we're not in the collective gate yet01:15
*** sarob has quit IRC01:17
*** DennyZhang has quit IRC01:17
*** sarob has joined #openstack-infra01:20
fungisdague: the promoted change failed on a tempest test with "The resource could not be found."01:22
*** nosnos has joined #openstack-infra01:24
sdaguebummer, link?01:27
jog0so it looks like the console logs are still not in elasticSearch is that correct01:28
fungisdague: https://jenkins04.openstack.org/job/gate-tempest-dsvm-full/3488/01:28
jog0fungi: ^01:28
fungijog0: they should be for any jobs run through jenkins01 and jenkins0201:28
mordredjog0: we're rolling out the new plugin version one jenkins at a time01:29
mordredbtw - the fact that we have 5 jenkins masters is still kinda amazing to me01:29
jog0mordred: yeah last I checked it was 301:30
*** Guest52195 is now known as maelfius01:30
jog0ahh, I'll manually check for jenkins01 logs in elasticSearch01:30
*** maelfius is now known as Guest6258501:31
jog0this missing data means we are running partially blind in elastic-search01:31
mordredjog0: thats what you'll get for being sick for a period of time01:31
mordredyup01:31
mordredjog0: sdague was talking about that earlier01:31
fungijog0: i think dims has a patch proposed for adding the name of the jenkins master as a metadata field so it can be searched/summarized01:31
jog0fungi: yeah that will really help01:32
*** Guest62585 is now known as needscoffee01:32
*** needscoffee has joined #openstack-infra01:32
*** needscoffee is now known as morganfainberg01:32
*** morganfainberg is now known as morganfainberg|z01:32
jog0touchdown seahawks01:32
*** morganfainberg|z is now known as morganfainberg01:32
mordredjog0: that was a RUN01:33
mordredclarkb: are you at the stadium?01:33
openstackgerritA change was merged to openstack-infra/nodepool: Permit specifying instance networks to use.  https://review.openstack.org/6639401:33
*** cyeoh has quit IRC01:33
sdaguefungi: sigh, yeh, that's unrelated. It was on my monday fix list01:34
openstackgerritA change was merged to openstack-infra/nodepool: Permit using a known keypair when bootstrapping.  https://review.openstack.org/6764901:34
openstackgerritA change was merged to openstack-infra/nodepool: Add some debugging around image checking.  https://review.openstack.org/6765001:34
openstackgerritA change was merged to openstack-infra/nodepool: Only attempt to copy files when bootstrapping.  https://review.openstack.org/6767801:34
openstackgerritA change was merged to openstack-infra/nodepool: Document that fake.yaml isn't usable.  https://review.openstack.org/6767901:34
openstackgerritA change was merged to openstack-infra/nodepool: Don't load system host keys.  https://review.openstack.org/6773801:34
*** dcramer_ has joined #openstack-infra01:34
openstackgerritA change was merged to openstack-infra/nodepool: Ignore vim editor backup and swap files.  https://review.openstack.org/6765101:34
jog0fungi: confirmed that jenkins01 logs are in elasticSearch01:34
jog0at least for a passing job01:35
fungijog0: and for jenkins02 that should be the case as well, as of about 6 hours ago (rough estimate)01:35
lifelessfungi: where is your branch updating the nodepool definition for ci-overcloud ? I have tweaks01:35
jog0fungi: cool01:35
*** sarob has quit IRC01:35
fungilifeless: https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:tripleo-ci,n,z01:36
fungiit's really just https://review.openstack.org/66491 though01:36
openstackgerritlifeless proposed a change to openstack-infra/config: Update TripleO Cloud API endpoint for Nodepool  https://review.openstack.org/6649101:39
lifelessmordred: ^ needed too01:39
lifelessthen I think we can turn it on and start debugging the actual test scripts01:39
openstackgerritA change was merged to openstack-infra/config: Improve tripleo nodepool image build efficiency.  https://review.openstack.org/6725501:43
lifelessI think then I need to look into how all the zuul ref stuff works so that we can make sure we run the code being merged not the code in trunk01:43
openstackgerritA change was merged to openstack-infra/config: Configure eth1 for DHCP in tripleo-gate instances  https://review.openstack.org/6726001:43
openstackgerritA change was merged to openstack-infra/config: Update the geard server for tripleo-gate.  https://review.openstack.org/6768001:44
sdaguefungi: so yah, that's the big giant stack trace in pci01:45
mordredlifeless: it's actually pretty straightforward - zuul sends you a refspec and you use that01:46
lifelessmordred: yeah, I know but ...01:46
lifelessmordred: we need to translate that to our various refs01:46
lifelessetc01:46
lifelessits not that its hard, its that we need to do it01:46
mordredlifeless: wait - what do you mean by "our various refs" ?01:48
mordredwhy would your refs be different?01:48
lifelesswe have one set of variables - git url, branch, commitish - per source repository01:48
lifelesswe don't consult ZUUL_REF01:48
mordredwell, if you don't consult ZUUL_REF, you're going to have a very hard time getting the right commit01:49
lifelessthus my point01:49
lifelessjust like devstack doesn't consult ZUUL_REF but devstack_gate arranges it so things DTRT we need to do the same01:50
mordredI do not understand your southern hemisphere english01:50
*** dkranz has quit IRC01:51
StevenKmordred: You need to read it upside down01:54
mordredStevenK: DOH01:54
*** zhiwei has joined #openstack-infra01:54
lifelessmordred: anyhow, nvm - I know we have more to do, and I know how zuul works it, and I know our plumbing which you perhaps don't know as much as you could :)01:58
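For reference on the ZUUL_REF exchange above, a rough Python sketch (not devstack-gate's or tripleo's actual code) of what "consulting ZUUL_REF" amounts to: fetch the speculative ref zuul advertises from its merger and check it out, falling back to the plain branch when the project has no change in the queue item being tested.

```python
# Rough sketch of using the ZUUL_URL/ZUUL_REF variables zuul exports to a job.
import os
import subprocess

def checkout_for_test(repo_dir, project, branch="master"):
    zuul_url = os.environ.get("ZUUL_URL", "")  # zuul merger base URL
    zuul_ref = os.environ.get("ZUUL_REF", "")  # e.g. refs/zuul/master/Z...
    try:
        # Ask the zuul merger for the speculative state of this project...
        subprocess.check_call(
            ["git", "fetch", "%s/%s" % (zuul_url, project), zuul_ref],
            cwd=repo_dir)
        subprocess.check_call(["git", "checkout", "FETCH_HEAD"], cwd=repo_dir)
    except subprocess.CalledProcessError:
        # ...and fall back to the branch tip if this project has no change
        # in the item being tested (or no ZUUL_REF was provided at all).
        subprocess.check_call(["git", "checkout", branch], cwd=repo_dir)
```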
jog0given a failed job how do I know which jenkins server it ran on?02:00
jog0http://logs.openstack.org/92/64592/4/check/check-tempest-dsvm-neutron-isolated/c6cda8d/02:00
fungijog0: easiest way is to look at the hyperlink embedded in the first few lines to the slave hostname02:01
mordredlifeless: I usually hire a plumber to deal with plumbing issues...02:01
jog0fungi: oh nice02:02
jog0jenkins01, so this should get console logs02:02
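A quick sketch of the manual lookup fungi suggests: read the first lines of a job's console log and find the "Building remotely on ..." line, whose hyperlink names the slave (and thus the Jenkins master) that ran the job. The URL is the example from this conversation; treating console.html as plain text to grep is an assumption.

```python
# Sketch: print the line in a console log that names the slave the job ran on.
import urllib.request

CONSOLE_URL = ("http://logs.openstack.org/92/64592/4/check/"
               "check-tempest-dsvm-neutron-isolated/c6cda8d/console.html")

with urllib.request.urlopen(CONSOLE_URL) as resp:
    head = resp.read(8192).decode("utf-8", "replace")

for line in head.splitlines():
    if "Building remotely on" in line:
        print(line.strip())
        break
```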
lifelessmordred: you did02:03
mordredlifeless: that's what I'm saying02:04
fungisdague: i just noticed where the subway can use another color... changes cancelled because they depend on another change which is failing or hitting a merge conflict (right now those show up red)02:04
openstackgerritChangBo Guo proposed a change to openstack-dev/hacking: Add check for removed modules in Python 3  https://review.openstack.org/6104902:05
lifelessmordred: are you offering to do the work for tripleo?02:05
lifelessmordred: or am I just horribly confused02:05
*** pcrews has quit IRC02:05
mordredlifeless: let's go with confused02:06
*** senk has joined #openstack-infra02:07
sdaguefungi: sure02:08
mordredjog0: wow. that was a throw right there02:08
jog0mordred: didn't see it got distracted by work02:09
jog0but I did hear yelling from the bar down the street02:09
mordredjog0: oh my. it was a 50+ yard throw 4th down conversion for a TD02:09
*** senk has quit IRC02:09
jog0ouch02:10
jog0tie game02:10
mordredit was the type of throw which makes me worry for property damage in downtown sf02:10
mordredjog0: nope. seattle is in the lead by 3 now02:10
jog0ahh the online score is outdated02:10
jog0I was downtown on new years and it looked like something out of mad max02:11
mordredI try to not be in places like that02:11
mordredof course, we've been booking our stuff for mardi gras, so I'm actually full of shit :)02:11
sdaguejog0: you didn't manage to classify this one yet, did you - https://bugs.launchpad.net/nova/+bug/127068002:11
jog0I can't imagine what sf would do in this case02:11
jog0sdague: no sorting out some kinks on my e-r patch02:12
sdaguejog0: ok, cool, I just didn't want to dupe something you'd gotten02:13
*** sarob has joined #openstack-infra02:13
jog0sdague: btw I think it would be interesting to plot  commits to openstack/openstack and zuul gate queue02:13
sdaguejog0: sure. I'm trying to keep a balance between making the problem visible and fixing it02:14
sdaguebecause visibility is only seeming to work so much02:14
jog0sdague: heh yeah, well that would tell us if things are getting worse or better merge rate wise02:14
jog0so I don't know the answer to the following question: did concurrency=2 in gate make things better or worse02:15
fungiugh... https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/consoleText02:15
jog0and merge rate *may* shine a *little* insight into that02:15
fungifail on an hpcloud-az2 slave failing to connect via ipv6 to git.openstack.org. wtf?02:15
fungiBuilding remotely on devstack-precise-hpcloud-az2-1143800 [...] fatal: unable to connect to git.openstack.org: git.openstack.org[0: 2001:4800:7813:516:3bc3:d7f6:ff04:aacb]: errno=Network is unreachable02:16
fungiuh, yeah, hpcloud az2 has no ipv6. why did you try to use it?02:16
sdaguesweet fumble!02:16
* fungi is apparently missing some very enthralling sportball02:17
sdagueyes02:17
sdagueespecially if you don't like SF :)02:17
StevenKI am too, but australia only shows american football on pay TV02:17
lifelessfungi: you might have local ipv6 connectivity02:18
clarkbI am missing it :(02:18
fungilifeless: well, somehow that slave thought it had a global ipv6 address assigned02:18
*** yaguang has joined #openstack-infra02:18
clarkbsaw the kearse td. did seattle just recover a fumble?02:18
clarkbsdague ^02:18
*** sarob has quit IRC02:18
lifelessfungi: clearly it *did*02:18
lifelessfungi: just not a working one...02:19
fungiindeed02:19
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: add hit for bug 1270680  https://review.openstack.org/6775102:19
fungifreakish. first time i've seen an hp vm do that02:19
sdagueclarkb: yes02:19
sdaguejog0: can you look at that er fingerprint?02:19
lifelessoh wow we clone all of stackforge too...02:20
jog0sdague: looking02:20
lifelessI wonder if we made one mega git repo02:20
lifelessand sucked *everything* into it02:20
lifelessand then made branches it would be faster02:20
mordredlifeless: I've been meaning to get a grokmirror thing set up - the kernel guys say it helps02:24
jog0sdague: message:"TRACE nova.api.openstack"   AND message:"pci.py"  AND message:"InstanceNotFound: Instance"   AND filename:"logs/screen-n-api.txt"02:24
jog0message:"TRACE nova.api.openstack"  AND message:"InstanceNotFound: Instance"   AND filename:"logs/screen-n-api.txt"02:24
jog0those have very different hit counts02:24
*** senk has joined #openstack-infra02:24
sdaguethey do, it's not limited to that extension02:24
mordredclarkb: how are you MISSING the sportsball?02:24
sdagueat least from what I can tell02:24
mordredclarkb: it's one of teh best games of sports I've seen in a while02:24
openstackgerritChangBo Guo proposed a change to openstack-dev/hacking: Add check for removed modules in Python 3  https://review.openstack.org/6104902:25
clarkbmordred: I have friends that dont sports ball. about to be at a house party will try watching from there02:25
mordredclarkb: 8:33 left in the 4th02:25
sdaguejog0: I actually think this is one of the new ones that is biting us hard02:25
clarkbmordred we still winning02:26
clarkb?02:26
sdaguewow, worst handoff ever02:26
sdagueclarkb: yes, but refumbled02:26
mordredclarkb: yeah. but by 3 - and jsut lost it on downs02:26
jog0sdague: so this has rougly equal hits for FAILURE and SUCCESS02:26
jog0which is actually not a horrible query02:26
mordredclarkb: be VERY glad you didn't see the knee break though02:26
sdaguejog0: yes, you read the log message right02:26
sdagueeven on success, we are doing bad things, because we're going to be leaking resources02:26
sdagueas those success versions are on tempest compute deletes02:27
jog0agreed02:27
jog0so LGTM02:27
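For context, a minimal sketch of how a fingerprint like the ones pasted above can be checked against the logstash Elasticsearch index as a Lucene query_string. The endpoint URL and response shape are assumptions for illustration; this is not elastic-recheck's own code.

```python
# Count hits for an elastic-recheck style fingerprint with a query_string search.
import json
import urllib.request

FINGERPRINT = ('message:"TRACE nova.api.openstack" '
               'AND message:"InstanceNotFound: Instance" '
               'AND filename:"logs/screen-n-api.txt"')

body = json.dumps({
    "query": {"query_string": {"query": FINGERPRINT}},
    "size": 0,  # only the hit count matters for a fingerprint
}).encode()

req = urllib.request.Request(
    "http://logstash.openstack.org:9200/_search",  # assumed endpoint
    data=body, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print("hits:", json.load(resp)["hits"]["total"])
```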
*** nati_uen_ has quit IRC02:27
* jog0 +As sdague's patch02:27
sdaguewoot02:27
sdaguethis quarter is just ridiculous02:30
clarkbI need play by play :)02:30
openstackgerritA change was merged to openstack-infra/elastic-recheck: add hit for bug 1270680  https://review.openstack.org/6775102:31
StevenKclarkb: Aren't there apps for that?02:31
sdagueclarkb: seatle just intercepted02:31
mordredclarkb: seattle just intercepted again02:31
lifelessmordred: goingto +A too ? https://review.openstack.org/#/c/66491/ [it's passed everything except the yaml order check, which it doesn't affect]02:32
sdaguewhich means we just had: fumble SF, fumble (but not called) SEA, fumble (and self recovery) on 4th down by SEA, interception by SF02:33
sdaguein about 8 downs02:33
mordredyeah - fungi, you ok with https://review.openstack.org/#/c/66491/ going in?02:33
sdaguethe only thing that would make this better is snow :)02:34
mordredsdague: or a giant earthquake02:34
fungimordred: sure, it won't take effect automatically anyway because it's the reason puppet's still disabled on nodepool.o.o02:34
mordredfungi: k. awesome02:35
fungimordred: see https://review.openstack.org/66958 and accompanying bug 1269001 for details02:35
sdaguefungi: when you get a chance can you see if I goofed this up too badly - https://review.openstack.org/#/c/67591/02:35
sdaguethat will give us the uncategorized jobs list02:36
notmynamehow do I deal with the error on the 2nd job in the gate right now? logs: https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/console02:37
notmynameerror connecting to git02:37
openstackgerritA change was merged to openstack-infra/config: Clamp MTU in TripleO instances  https://review.openstack.org/6774002:38
notmynameif the top one fails will it stay in? or is it too late? any chance it can be retried right there so as not to wait another 40+ hours?02:38
*** senk has quit IRC02:38
openstackgerritA change was merged to openstack-infra/config: Don't vote with gate-tripleo-deploy yet.  https://review.openstack.org/6774302:38
notmynamepatch set 65604,302:39
funginotmyname: i'm stumped on that one--was looking at it earlier. hpcloud west doesn't provide global ipv6 to tenant networks, so why it thought it had one is a real enigma02:39
mordredclarkb: field-goal. seahawks up by 602:39
*** mrda has quit IRC02:39
notmynamefungi: any hope for it going in? looks like zuul already recalculated it so it has to go to the bottom of the queue with a manual reverify?02:39
funginotmyname: i think the very top of that diagram gets it wrong02:40
funginotmyname: if you look at https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/ it says Other changes tested concurrently with this change: 65255,102:40
notmynamefungi: so my only hope is that the top one fails?02:40
funginotmyname: yeah, if the change running ahead of it fails, it will be retested on the branch tip02:41
*** mrda has joined #openstack-infra02:42
sdaguefungi: so we've seen this creep up a couple times before - http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiMjAwMTo0ODAwOjc4MTM6NTE2OjNiYzM6ZDdmNjpmZjA0OmFhY2JcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiYWxsIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MDE4NTgyOTgxNX0=02:45
sdaguehttp://logs.openstack.org/19/65019/2/gate/gate-grenade-dsvm/d5c5219/console.html.gz02:46
sdaguefungi: so maybe we should register a bug and a fingerprint for it02:46
*** gokrokve has quit IRC02:48
sdagueclarkb: interception in the end zone02:50
mordredclarkb: INTERCEPTED02:50
sdagueby SEA02:50
jog0wow SF fail02:50
mordredlike, wow02:50
clarkbmordred sdague you guys are awesome thank you02:50
sdaguefinal score SF: 17, SEA 2302:53
clarkb\o/02:53
jog0☹02:54
mordredclarkb: when is supersportsball?02:54
mordredclarkb: next week or in 2 weeks?02:55
clarkb2weeks02:55
clarkbfeb 2nd02:55
mordredI'll be in Brussels02:56
mordredI'm going to need to find a place with the game02:56
mordredbecause broncos seahawks is going to be interesting02:57
*** gokrokve has joined #openstack-infra02:57
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Mark resolved bugs  https://review.openstack.org/6775202:57
sdaguejog0: so look at - http://status.openstack.org/elastic-recheck/02:58
jog0sdague: looking02:58
sdagueI actually think Bug 1270680 - v3 extensions api inherently racey wrt instances - might be one of our biggest new gate issues02:58
sdagueand the reason we're getting killed right now02:59
jog0sdague: yeah I agree02:59
jog01270680 has a pretty graph lol02:59
jog0so many colors02:59
sdagueheh, yeh02:59
sdagueso I marked it as critical for nova02:59
jog0sdague: cool03:00
sdagueI'll dive on it tomorrow03:00
jog0it looks like it's time to do some git log and git bisect03:00
jog0since we know when it started03:00
sdagueactually, it's when we actually started testing it03:00
sdaguethat code's been in nova since oct03:00
jog0:(03:00
sdaguemaybe something else changed wrt to it03:00
sdaguealso, we're kind of mostly blind for the last couple of weeks03:01
jog0hmm so logstash.o.o doesn't show the same graph03:01
jog0there are hits before jan 16th there03:01
*** gokrokve has quit IRC03:02
openstackgerritA change was merged to openstack-infra/config: Update TripleO Cloud API endpoint for Nodepool  https://review.openstack.org/6649103:02
*** cyeoh has joined #openstack-infra03:02
*** AaronGr_Zzz is now known as AaronGr03:03
sdaguejog0: maybe hot data issue03:03
sdaguelet's see if the query fills out next go around03:03
jog0yeah03:03
openstackgerritA change was merged to openstack-infra/elastic-recheck: Mark resolved bugs  https://review.openstack.org/6775203:04
notmynamefungi: what bug number should I use for a recheck?03:05
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add check for bug 1270608  https://review.openstack.org/6771303:05
*** praneshp_ has joined #openstack-infra03:05
notmynameI know you guys are aware of it, but I want this to be in the logs (ie on the record). the patch that is about to fail (because the test node couldn't connect to git.o.o) has been in the queue for at least 50 hours and been rechecked 19 times due to gate resets03:06
funginotmyname: i don't think we have one for it--not that i've seen at any rate03:06
*** praneshp has quit IRC03:06
*** praneshp_ is now known as praneshp03:06
sdaguenotmyname: I think it's worth reporting one against openstack-ci and we can build an er query for it03:08
sdaguethere were 2 logstash hits back on the 8th03:08
sdagueso it happens from time to time03:08
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Remove remaining cases of '@message'  https://review.openstack.org/6775403:08
sdagueoh, finally, I was going to get around to that03:08
notmynamesdague: could you please do that and give me a bug number?03:08
sdaguenotmyname: you can't register a bug?03:09
*** sarob has joined #openstack-infra03:09
notmynamesdague: I'm not particularly in the mood to file a bug against the gate and make it polite or charitable03:10
notmynamesee the above number for why03:10
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Remove remaining cases of '@message'  https://review.openstack.org/6775403:10
*** AaronGr is now known as AaronGr_Zzz03:11
*** sarob_ has joined #openstack-infra03:13
*** sarob has quit IRC03:14
notmynamethe job ahead actually failed!!!!!!!!03:15
fungiseems that way03:15
notmynameand now with another 60+ minutes to check the status, I'm stepping away for a bit03:16
*** sarob_ has quit IRC03:18
clarkbis something broken?03:19
clarkbsorry superbowl is happening03:19
*** gokrokve has joined #openstack-infra03:19
StevenKNot for another two weeks? :-P03:19
fungiclarkb: nothing new is broken, to my knowledge. do you ask for any particular reason, or just checking in?03:20
clarkbfungi the bug number questions03:22
fungiclarkb: oh, apparently we saw an hpcloud vm in az2 fail a job because it tried to connect to the ipv6 address of git.o.o and (unsurprisingly) got a network unreachable response03:23
fungiwhich means it must have somehow gotten a global-scope address from somewhere03:24
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Clarify required parameters in query_builder  https://review.openstack.org/6775603:24
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries  https://review.openstack.org/6759603:24
*** gokrokve has quit IRC03:24
fungiclarkb: all i can guess is maybe another client in the same ethernet broadcast domain had radvd running or was otherwise generating router advertisements for some reason03:25
clarkbawesome03:25
clarkbdont we firewall that?03:27
clarkbor not because no ipv6 typically03:27
fungiipv6 icmp type for ra? probably not explicitly03:27
*** vkozhukalov has joined #openstack-infra03:28
lifelessfungi: so, if its all landed/ing we can reenable puppet?03:35
*** nati_ueno has joined #openstack-infra03:41
jog0https://review.openstack.org/#/c/67485/03:41
jog0that should help with resource issues ever so slightly03:41
jog0sdague: ^03:41
jog0thats to get better classification numbers03:41
*** nati_ueno has quit IRC03:43
*** nati_ueno has joined #openstack-infra03:44
fungilifeless: possibly. i'm not sure if this is a good week to be experimenting with (and potentially destabilizing) nodepool, but i won't really be around to troubleshoot it much during the week so i'll defer to clarkb and mordred if they're going to be in a position to keep an eye on it03:48
jog0sdague: can you review https://review.openstack.org/#/c/67596/203:54
jog0still waiting for some more data to finish testing03:54
jog0but it looks like its working03:54
jog0had a failed-to-classify failure03:54
jog0when the current e-r had a incorrect classification03:55
jog0waiting for a successful classification03:55
*** uriststk has joined #openstack-infra03:56
*** slong has quit IRC04:06
*** slong_ has joined #openstack-infra04:06
*** sarob has joined #openstack-infra04:13
*** uriststk has quit IRC04:14
fungisdague: jog0: that latest gate reset looks like nova v3 api problems again. does v3 testing just need to be disabled again?04:17
*** sarob has quit IRC04:18
cyeohfungi: do you have a link to that failure?04:19
*** coolsvap has joined #openstack-infra04:19
fungicyeoh: https://jenkins03.openstack.org/job/gate-tempest-dsvm-neutron/2524/consoleText04:19
cyeohfungi: thx04:20
*** gokrokve has joined #openstack-infra04:20
mattoliveraucyeoh: are you breaking the v3 api again :P I see you survived the Adelaide heat wave, Melbourne had it pretty bad as well, damn thing followed us back from LCA ;P04:23
StevenKmattoliverau: I think the heatwave was on your flight04:25
cyeohmattoliverau: between Perth and Adelaide I ended up with 7 days in a row >40C and three in a row >44C04:25
cyeohI don't think the v3 API is broken but am looking now just to check :-)04:25
*** gokrokve has quit IRC04:25
mattoliverauStevenK: maybe it hid in my bags :P04:25
*** dcramer_ has quit IRC04:26
fungicyeoh: see the earlier discussion, sdague asserts "bug 1270680 - v3 extensions api inherently racey wrt instances - might be one of our biggest new gate issues"04:26
StevenKmattoliverau: Haha04:26
notmynamebug 1264972 for it?04:26
notmynamefungi: ah, bug 1270680 instead?04:26
cyeohfungi: thanks, will look into it now04:27
funginotmyname: i'm not sure--i just keep the lights on. i defer to nova devs like cyeoh and sdague on these sorts of things04:27
* fungi compares error messages04:28
cyeohI guess sdague is asleep by now...04:28
fungieh, it's not even midnight in our tz yet. he's probably just distracted by sportball (assuming the game is still going anyway)04:29
StevenKNo, game finished04:29
funginotmyname: 1264972 looks more searchable anyway04:31
*** nati_uen_ has joined #openstack-infra04:32
cyeohfungi: oh yes, 1270680 is definitely a problem. I think there's lighter weight things we can do than disable the v3 api testing though04:32
*** nati_uen_ has quit IRC04:33
*** nati_uen_ has joined #openstack-infra04:33
fungicyeoh: if it's something which will significantly reduce spurious tempest test failures, i'll gladly shove it to the head of the gate so fewer changes get kicked out needlessly04:35
notmynamewhat magic is behind the zuul queue having patches that have been in the queue for 4 hours ahead of patches that have been around for 35 hours?04:35
cyeohfungi: cool - am just looking now at how to fix it. I think a proper fix should be pretty straight forward04:36
*** nati_ueno has quit IRC04:36
funginotmyname: sdague's change to carry over the enqueue time isn't actually used on changes which are gerrit dependencies (you'll note the offenders follow changes with sane-looking enqueue times for the same project)04:36
notmynamefungi: ok04:36
fungiso when those dependent changes get reenqueued, they end up with their enqueue times reset apparently. just noticed that myself a few hours ago04:37
notmynamefungi: I figured it had something to do with dependencies. so the patches with shorter times have been around just as long, or those were put up front because of the git logic?04:37
notmynamefungi: ah ok04:37
fungiyeah, they've been in there as long as the others04:38
fungiit's just lying04:38
fungicosmetic bug04:38
jog0fungi: I haven't dug into the v3 work enough to know if disabling is the right move04:38
fungijog0: cyeoh seems to have lighter-weight ideas there04:39
*** dcramer_ has joined #openstack-infra04:39
jog0fungi: cool04:40
cyeohfungi, jog0: so I think we have this potential racey failure mode all over both v2 and v3 APIs04:44
cyeohI guess we've just been lucky in the past (or we haven't noticed it anyway)04:44
jog0cyeoh: agreed04:46
jog0its v2 and v304:46
cyeohjog0: so I guess there's two ways to fix this. Cache a whole lot more information in the resp_obj or fail gracefully in extensions if the instance is not found04:49
jog0failing when we don't need to is a bad idea04:50
cyeohI think I prefer the latter - not including a bit of information about an instance which has just been deleted anyway seems okayish to me04:50
jog0as in if the data is in the DB but we can't find it ... that's bad04:50
cyeohyea, in this case its because the data has just been deleted.04:51
jog0cyeoh: ahh04:51
cyeohso we can just not append the data we can't to anymore (because of the race)04:51
cyeoh"can't get to" I mean04:51
jog0cyeoh: TBH I haven't looked at this enough to have enough of an understanding of the issue04:51
jog0cyeoh: so its your call. I am distracted by elastic-recheck stuff at the moment04:52
cyeohjog0: ok, np. I'll see if its by luck just hitting a specific extension which we can fix quickly now, or if we need to fix all of them to make a difference04:52
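A minimal sketch of the "fail gracefully" option cyeoh describes, not the actual nova patch; the hook signature, the exception stand-in and the field names are hypothetical.

```python
# An API extension decorating a server response: if the instance was deleted
# between the main handler and the extension running, skip the decoration
# instead of letting the whole request fail with a traceback.

class InstanceNotFound(Exception):
    """Stand-in for nova's InstanceNotFound exception."""

def extend_server_response(compute_api, context, server_dict):
    try:
        instance = compute_api.get(context, server_dict["id"])
    except InstanceNotFound:
        # Race hit: the instance is already gone, so there is no extra
        # data to append; leave the response as-is.
        return server_dict
    server_dict["hypothetical:detail"] = instance.get("some_field")
    return server_dict
```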
*** amotoki has joined #openstack-infra04:55
jog0cyeoh: thanks04:55
*** gokrokve has joined #openstack-infra04:57
jog0cyeoh: can you update that bug with  your comments04:57
cyeohjog0: just did04:57
jog0cyeoh: excellent04:58
cyeohhrm and looking through logstash for occurrences of it just found another bug in the v2 api ;-)04:59
*** senk has joined #openstack-infra05:00
jog0:/05:02
*** slong has joined #openstack-infra05:07
*** slong_ has quit IRC05:07
*** katyafervent has quit IRC05:11
*** katyafervent has joined #openstack-infra05:11
*** sarob has joined #openstack-infra05:13
*** nicedice has quit IRC05:15
*** sarob has quit IRC05:18
jog0fungi: how far are we from getting all the jenkins masters to have the console.html=>elasticSearch fix05:19
fungijog0: clarkb and zaro were monitoring the plugin upgrade on jenkins02 before applying it on the others05:20
*** senk has quit IRC05:20
jog0fungi: thanks, from what I see the fix is definitely helping05:21
*** nati_ueno has joined #openstack-infra05:21
*** nati_ueno has quit IRC05:21
jog0haven't seen a missing console on any jenkins 01 and 02 nodes05:22
*** nati_ueno has joined #openstack-infra05:22
*** krtaylor has joined #openstack-infra05:25
*** nati_uen_ has quit IRC05:25
lifelessfungi: ah, so its less about experimenting with nodepool and more about getting jobs running for us; its rather critical path05:25
lifelessfungi: we'll obviously stand ready to support any issues it might cause05:25
lifelessfungi: could we run a separate nodepool in fact, avoid whatever bugs might lurk in nodepool?05:26
*** chandankumar_ has joined #openstack-infra05:26
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Sort uncategorized fails by time  https://review.openstack.org/6776105:26
lifelessmordred: clarkb: ^ you may be more awake :P05:26
clarkbwhats wrong with the existing nodepool?05:27
clarkbI'm not sure the nodepool cli is built for two nodepools05:27
clarkbor the db stuff05:27
* fungi is awake, but not *very* awake05:30
lifelessclarkb: fungi is worried that turning tripleo-test-cloud on will cause issues w/nodepool when the gate is already fragile05:31
lifelessclarkb: I was suggesting to mitigate that by running an entirely separate nodepool that is connected to the same geard05:32
funginothing's necessarily wrong with the the existing nodepool. i just don't want to greenlight all the tripleo-ci-supporting patches for it and the config by reenabling puppet on the server when i'm not going to necessarily be around to troubleshoot it05:32
fungiso leaving that call to those who will be around05:33
jog0is there a bug filed for: http://logs.openstack.org/23/66223/1/gate/gate-python-heatclient-pypy/9950fd5/console.html#_2014-01-19_01_08_37_06305:35
jog0pip timeouts05:35
clarkbfungi I see05:37
*** michchap has quit IRC05:37
*** michchap has joined #openstack-infra05:37
clarkbjog0 I think if you search under openstack-ci there may be05:37
jog0all I found was https://bugs.launchpad.net/openstack-ci/+bug/125416705:38
jog0which is a little different05:38
jog0this is the fingerprint I am using:  filename:"console.html" AND message:"download.py\", line 495"05:38
jog0there aren't many occurrences thankfully05:38
lifelessclarkb: back in the states?05:40
*** DinaBelova_ is now known as DinaBelova05:40
*** SergeyLukjanov_ is now known as SergeyLukjanov05:40
clarkblifeless: yes, mostly over jetlag now05:41
lifelessclarkb: \o/05:42
openstackgerritlifeless proposed a change to openstack-infra/config: Tripleo-gate needs the gear library.  https://review.openstack.org/6776205:42
fungithe jetlag's not gone, just lulling you into a false sense of security05:42
StevenKHaha05:42
lifelessmordred: more ^ fodder05:42
lifelessmordred: we could install that at runtime, but its really part of base setup05:42
*** carl_baldwin has joined #openstack-infra05:45
clarkbI will change scp plugins tomorrow on 03 and 04 then resume holidaying05:45
jog0clarkb: thanks05:46
*** nosnos has quit IRC05:52
*** nosnos_ has joined #openstack-infra05:52
fungioh, right, tomorrow is a usa holiday05:55
lifelessnuts :(05:55
* fungi has lost track of which days are weekends much less holidays05:55
fungiand yes, we're all nuts here05:56
*** oubiwann_ has quit IRC05:56
StevenKOh, MLK day05:58
jog0clarkb: how do I add grenade logs to elasticSearch05:59
jog0actually, since this is supposed to be the weekend, never mind05:59
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Add query for bug 1270710  https://review.openstack.org/6776406:01
clarkbjog0 add the files to the list of files06:01
jog0clarkb: where is that?06:02
clarkbjog0 though ideally we list the files without paths and recursively look them up06:02
jog0clarkb: so in tempest the files are under logs/06:03
clarkbmodules/openstack_project/files/logstash/somethingclient.yaml06:03
jog0but in grenade they are under new/logs06:03
clarkbjog0 right. today logstash needs full paths06:03
fungiokay, i swear i'm really going to try to take a nap now06:04
jog0clarkb: lets pick this up on tuesday06:04
clarkbjog0 ok. sherlock is on now :)06:04
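A tiny illustration of the path issue above: the logstash client config currently needs the full per-job path (logs/... for tempest jobs, new/logs/... for grenade jobs), while the nicer behaviour clarkb mentions would match on the bare file name. The helper below is purely illustrative.

```python
# Illustrative only: the same log file lives at different paths per job type.
import posixpath

AVAILABLE = ["logs/screen-n-api.txt", "new/logs/screen-n-api.txt"]

def find_by_name(name, available):
    # The "list files without paths and look them up recursively" idea.
    return [p for p in available if posixpath.basename(p) == name]

print(find_by_name("screen-n-api.txt", AVAILABLE))
```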
*** nati_uen_ has joined #openstack-infra06:10
clarkbjog0 is the pip fail downloading pip installer from github during devstack?06:12
clarkbthat is arguably a devstack bug06:12
jog0clarkb: http://logs.openstack.org/23/66223/1/gate/gate-python-heatclient-pypy/9950fd5/console.html#_2014-01-19_01_08_37_06306:13
*** sarob has joined #openstack-infra06:13
clarkbah no different problem06:13
*** nati_ueno has quit IRC06:14
*** gokrokve has quit IRC06:16
*** sarob has quit IRC06:18
*** rahmu has quit IRC06:27
*** DinaBelova has quit IRC06:28
*** carl_baldwin has quit IRC06:29
*** rahmu has joined #openstack-infra06:29
*** DinaBelova has joined #openstack-infra06:31
*** mrda has quit IRC06:41
*** vkozhukalov has quit IRC06:44
*** bookwar has joined #openstack-infra06:46
*** gokrokve has joined #openstack-infra06:47
*** gokrokve_ has joined #openstack-infra06:49
*** jhesketh_ has quit IRC06:50
*** yamahata has joined #openstack-infra06:52
*** gokrokve has quit IRC06:52
*** nosnos_ has quit IRC06:52
*** nosnos has joined #openstack-infra06:53
*** gokrokve_ has quit IRC06:54
*** jhesketh has quit IRC06:56
amotokihi, I would like to request gerrit account for external testing.06:57
*** pblaho has joined #openstack-infra06:57
amotokiI am now working on neutron third party testing. Is this a right place to request an account?06:57
*** gokrokve has joined #openstack-infra06:58
*** nati_ueno has joined #openstack-infra06:59
clarkbamotoki please see the document at http://ci.openstack.org06:59
*** SergeyLukjanov is now known as SergeyLukjanov_a06:59
*** SergeyLukjanov_a is now known as SergeyLukjanov_07:00
*** nati_uen_ has quit IRC07:01
amotokiclarkb: I saw http://ci.openstack.org/third_party.html and there are several ways: #openstack-infra , ML, bug report. Can I request it in this channel?07:02
*** gokrokve has quit IRC07:03
*** nati_ueno has quit IRC07:04
*** nati_ueno has joined #openstack-infra07:04
*** SergeyLukjanov_ is now known as SergeyLukjanov07:08
*** mrda has joined #openstack-infra07:09
clarkbamotoki: you can but it is sunday night before a US holiday. a better bet is the mail list07:10
amotokiclarkb: ah.... thanks.. I will request it via the list.07:12
*** sarob has joined #openstack-infra07:13
*** yolanda has joined #openstack-infra07:14
*** jhesketh_ has joined #openstack-infra07:15
*** sarob has quit IRC07:17
*** SergeyLukjanov is now known as SergeyLukjanov_07:18
*** morganfainberg is now known as morganfainberg|z07:19
*** mrda has quit IRC07:20
*** mayu has joined #openstack-infra07:29
*** jcoufal has joined #openstack-infra07:29
*** crank has quit IRC07:30
*** mayu has quit IRC07:34
*** NikitaKonovalov_ is now known as NikitaKonovalov07:43
*** crank has joined #openstack-infra07:44
*** afazekas_ has joined #openstack-infra07:52
*** SergeyLukjanov_ is now known as SergeyLukjanov07:53
*** nati_uen_ has joined #openstack-infra07:53
*** morganfainberg|z is now known as morganfainberg07:54
*** nati_ueno has quit IRC07:56
*** gokrokve has joined #openstack-infra07:59
ttxFTR I'm traveling all day, mostly on a non-wifi transatlantic plane08:02
*** mrda has joined #openstack-infra08:02
*** gokrokve has quit IRC08:04
*** yolanda has quit IRC08:05
*** jamielennox is now known as jamielennox|away08:08
*** crank has quit IRC08:09
*** mrda has quit IRC08:09
*** crank has joined #openstack-infra08:09
*** zhiwei has quit IRC08:09
*** zhiwei has joined #openstack-infra08:09
*** hashar has joined #openstack-infra08:12
*** sarob has joined #openstack-infra08:13
*** sarob has quit IRC08:18
*** flaper87|afk is now known as flaper8708:18
*** crank has quit IRC08:21
*** crank has joined #openstack-infra08:22
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Minor migration fix  https://review.openstack.org/6778908:25
*** yolanda has joined #openstack-infra08:25
*** luqas has joined #openstack-infra08:26
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API  https://review.openstack.org/6311808:27
*** vkozhukalov has joined #openstack-infra08:28
*** vkozhukalov has quit IRC08:34
*** matrohon has joined #openstack-infra08:36
*** luqas has quit IRC08:36
*** nati_ueno has joined #openstack-infra08:41
*** nati_ueno has quit IRC08:41
*** nati_ueno has joined #openstack-infra08:42
*** praneshp has quit IRC08:42
*** hashar has quit IRC08:42
*** praneshp has joined #openstack-infra08:44
*** nati_uen_ has quit IRC08:44
*** mrda has joined #openstack-infra08:44
*** SergeyLukjanov is now known as SergeyLukjanov_08:47
*** DinaBelova is now known as DinaBelova_08:48
*** jcoufal has quit IRC08:49
*** vkozhukalov has joined #openstack-infra08:51
*** fbo_away is now known as fbo08:52
*** zhiwei has quit IRC08:54
*** senk has joined #openstack-infra08:56
*** senk has quit IRC08:57
*** BobBallAway is now known as BobBall08:58
*** gokrokve has joined #openstack-infra09:00
*** gokrokve has quit IRC09:05
*** nati_ueno has quit IRC09:07
*** mancdaz_away is now known as mancdaz09:07
*** nati_ueno has joined #openstack-infra09:07
*** mancdaz is now known as mancdaz_away09:07
*** luqas has joined #openstack-infra09:12
*** nati_ueno has quit IRC09:12
*** jcoufal has joined #openstack-infra09:12
*** sarob has joined #openstack-infra09:13
*** derekh has joined #openstack-infra09:15
*** yassine has joined #openstack-infra09:16
*** sarob has quit IRC09:18
*** markmc has joined #openstack-infra09:18
*** dpyzhov has joined #openstack-infra09:18
*** yassine has quit IRC09:18
*** yassine has joined #openstack-infra09:18
*** jpich has joined #openstack-infra09:23
*** zhiwei has joined #openstack-infra09:25
*** praneshp has quit IRC09:29
*** dizquierdo has joined #openstack-infra09:35
*** dpyzhov has quit IRC09:35
*** dpyzhov has joined #openstack-infra09:36
*** SergeyLukjanov_ is now known as SergeyLukjanov09:43
*** jamielennox|away is now known as jamielennox09:48
*** SergeyLukjanov is now known as SergeyLukjanov_09:48
*** IvanBerezovskiy has joined #openstack-infra09:49
*** jp_at_hp has joined #openstack-infra09:52
*** mancdaz_away is now known as mancdaz09:54
*** morganfainberg is now known as morganfainberg|z09:57
*** derekh is now known as derekh_afk09:59
*** gokrokve has joined #openstack-infra10:01
*** rwsu has joined #openstack-infra10:03
*** vkozhukalov has quit IRC10:04
*** gokrokve has quit IRC10:06
*** Ryan_Lane has quit IRC10:08
*** johnthetubaguy has joined #openstack-infra10:08
*** amotoki has quit IRC10:08
*** sarob has joined #openstack-infra10:13
*** vkozhukalov has joined #openstack-infra10:16
*** dpyzhov has quit IRC10:16
*** sarob has quit IRC10:18
*** max_lobur_afk is now known as max_lobur10:18
*** zhiwei has quit IRC10:38
*** mrda has quit IRC10:39
*** _ruhe is now known as ruhe10:42
*** zhiwei has joined #openstack-infra10:43
*** mrda has joined #openstack-infra10:46
*** yassine has quit IRC10:46
*** dpyzhov has joined #openstack-infra10:51
*** zhiwei has quit IRC10:55
*** iv_m has joined #openstack-infra10:59
*** ArxCruz has joined #openstack-infra11:01
*** gokrokve has joined #openstack-infra11:01
*** markvoelker has quit IRC11:04
*** gokrokve has quit IRC11:06
*** sarob has joined #openstack-infra11:13
*** sarob has quit IRC11:18
*** rfolco has joined #openstack-infra11:27
*** boris-42 has quit IRC11:31
*** derekh_afk is now known as derekh11:38
*** boris-42 has joined #openstack-infra11:41
*** pblaho has quit IRC11:47
*** jhesketh_ has quit IRC11:51
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Fix the intial db migration  https://review.openstack.org/6759211:51
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API  https://review.openstack.org/6311811:52
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API  https://review.openstack.org/6311811:54
*** mrda has quit IRC11:56
sdaguefungi: when you wake up, cyeoh has a fix for that new bug11:57
*** jamielennox is now known as jamielennox|away11:58
max_loburSomebody from requirements core group, could you please review/approve the patch https://review.openstack.org/#/c/66349/3. It already has one +1 from a core reviewer12:00
*** gokrokve has joined #openstack-infra12:02
*** gokrokve has quit IRC12:07
*** coolsvap has quit IRC12:07
*** ruhe is now known as _ruhe12:09
*** gsamfira has joined #openstack-infra12:10
*** gsamfira has joined #openstack-infra12:11
*** yassine has joined #openstack-infra12:11
*** sarob has joined #openstack-infra12:13
*** sarob has quit IRC12:18
*** CaptTofu has joined #openstack-infra12:25
*** _ruhe is now known as ruhe12:30
*** dims has quit IRC12:34
*** dpyzhov has quit IRC12:34
*** yaguang has quit IRC12:34
*** yassine has quit IRC12:39
*** dims has joined #openstack-infra12:39
*** yassine has joined #openstack-infra12:40
*** markmc has quit IRC12:41
*** markmc has joined #openstack-infra12:44
*** CaptTofu has quit IRC12:46
*** senk has joined #openstack-infra12:50
*** pblaho has joined #openstack-infra12:50
*** senk has quit IRC12:51
*** dkranz has joined #openstack-infra12:57
*** david-lyle_ has quit IRC12:58
*** SergeyLukjanov_ is now known as SergeyLukjanov12:58
*** DinaBelova_ is now known as DinaBelova12:58
*** AJaeger has joined #openstack-infra13:01
*** smarcet has joined #openstack-infra13:03
*** gokrokve has joined #openstack-infra13:03
*** heyongli has joined #openstack-infra13:06
*** markmc has quit IRC13:07
*** gokrokve has quit IRC13:08
*** ruhe is now known as _ruhe13:11
*** sarob has joined #openstack-infra13:13
*** markmc has joined #openstack-infra13:15
*** sarob has quit IRC13:17
*** mriedem has joined #openstack-infra13:19
*** max_lobur is now known as max_lobur_afk13:26
*** SergeyLukjanov is now known as SergeyLukjanov_13:26
matelHi, I would like to have some recommendations on what is the proper development process for the devstack-gate project13:26
*** _ruhe is now known as ruhe13:29
sdaguematel: can you be more specific for what you are looking for?13:29
*** alexpilotti has joined #openstack-infra13:30
*** thomasem has joined #openstack-infra13:37
*** flaper87 is now known as flaper87|afk13:37
fungisdague: i saw the discussion in #nova... assuming it's https://review.openstack.org/67767 we seem to still need an approver13:41
sdaguefungi: yep13:41
sdagueand test results13:42
fungiwell, yeah, that13:42
sdagueso once we get activity on nova channel, and I get a +A, I'll ping you13:42
fungisounds good13:43
matelsdague: I want to test some changes in devstack-gate.13:44
matelsdague: I already have an "emulated" node.13:44
matelsdague: ./safe-devstack-vm-gate-wrap.sh seems to use the master.13:45
AJaegerinfra team, fungi: I would love to see the other api projects gated the same way as api-sites (right now they use gate-noop), do you have time for a review, please? https://review.openstack.org/#/c/67394/13:45
matelthe master of devstack-gate13:45
matelsdague: I have this script: https://github.com/matelakat/xenapi-os-testing/blob/start-devstack/launch-node.sh13:46
matelsdague: on line 66, I am checking out the branch that I want to try out.13:47
sdagueyeh, honestly, we don't have a good model for testing that outside of the gate itself right now13:48
sdaguehonestly, when I am making changes I usually use the gate to test them13:49
*** iv_m has quit IRC13:49
*** Ng_ has joined #openstack-infra13:49
matelsdague: How does that work? The issue in my case, is that it requires a xenserver node.13:50
matelWhich does not exist in nodepool yet.13:50
sdaguematel: well we haven't had that situation before13:50
matelsdague: I see.13:51
*** Ng_ has quit IRC13:51
*** Ng_ has joined #openstack-infra13:51
matelsdague: So I would like to modify: https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate.sh so that it can work with xenserver as well (I need to adjust the localrc basically)13:52
matelsdague: Maybe checking out my branch to a location, and set SKIP_DEVSTACK_GATE_PROJECT ?13:53
sdaguematel: yeh that might work13:53
sdaguethat's in place to test d-g changes actually, so it won't recursively keep checking itself out13:54
matelMy idea is that I'm gonna launch my node, check out d-g to the location (I need to look at it), and see if that works.13:55
matelI need to check where does the checked-out repos live.13:56
matelI guess it will live in $BASE/new13:57
matelwhich is /opt/stack/new.13:57
matelOkay, I give it a try.13:58
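A rough sketch of the workflow matel outlines, with placeholder branch name, fork URL and paths; SKIP_DEVSTACK_GATE_PROJECT is the knob sdague mentions that keeps the wrapper from re-fetching devstack-gate from master.

```python
# Sketch only: check out the devstack-gate branch under test and run the
# wrapper with SKIP_DEVSTACK_GATE_PROJECT set so it isn't overwritten.
import os
import subprocess

WORKSPACE = "/home/jenkins/workspace/testing"          # assumed path
DG_FORK = "https://github.com/example/devstack-gate"   # hypothetical fork
DG_BRANCH = "xenserver-localrc"                        # hypothetical branch

dg_dir = os.path.join(WORKSPACE, "devstack-gate")
subprocess.check_call(["git", "clone", "-b", DG_BRANCH, DG_FORK, dg_dir])

env = dict(os.environ, SKIP_DEVSTACK_GATE_PROJECT="1")
subprocess.check_call(
    ["bash", os.path.join(dg_dir, "safe-devstack-vm-gate-wrap.sh")],
    cwd=WORKSPACE, env=env)
```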
*** dstanek has joined #openstack-infra14:01
*** heyongli has quit IRC14:02
*** gokrokve has joined #openstack-infra14:04
*** dcramer_ has quit IRC14:04
*** SergeyLukjanov_ is now known as SergeyLukjanov14:04
*** b3nt_pin has joined #openstack-infra14:08
*** Ng_ has quit IRC14:08
*** gokrokve has quit IRC14:09
*** Ng_ has joined #openstack-infra14:12
*** Ng has quit IRC14:13
*** Ng_ is now known as Ng14:13
*** sarob has joined #openstack-infra14:13
*** b3nt_pin is now known as beagles14:15
sdaguefungi: so that patch failed jenkins on an unrelated race. I still think it should be promoted.14:17
*** sarob has quit IRC14:18
sdaguehttps://bugs.launchpad.net/nova/+bug/1270608 is the other new issue that showed up last week14:19
*** dprince has joined #openstack-infra14:27
*** alexpilotti has quit IRC14:29
*** alexpilotti has joined #openstack-infra14:29
BobBallsdague: what's the recommended way to run a single test in tempest these days?14:29
*** mrodden1 has quit IRC14:30
*** pblaho1 has joined #openstack-infra14:31
mriedemhttps://review.openstack.org/#/c/67767/ is +A'ed, but needs to pass jenkins14:31
*** pblaho has quit IRC14:33
sdagueBobBall: tox -eall testname14:33
*** damnsmith is now known as dansmith14:33
BobBallheh...14:33
BobBallsorry14:33
sdaguefungi: please promote 67767 when you can14:33
BobBallthat should have been one of the combinations I tried.14:33
*** max_lobur_afk is now known as max_lobur14:33
*** eharney has joined #openstack-infra14:34
sdaguefungi: actually abort on that14:37
fungiholding off14:38
*** senk has joined #openstack-infra14:38
*** coolsvap has joined #openstack-infra14:41
*** oubiwann_ has joined #openstack-infra14:45
*** ryanpetrello has joined #openstack-infra14:45
fungiunfortunate... 67767,2 seems to have a merge conflict with some change ahead of it14:47
*** mrodden has joined #openstack-infra14:48
*** SergeyLukjanov is now known as SergeyLukjanov_a14:51
*** pblaho1 has quit IRC14:52
*** SergeyLukjanov_a is now known as SergeyLukjanov_14:52
*** dcramer_ has joined #openstack-infra14:53
*** pblaho has joined #openstack-infra14:55
*** malini has joined #openstack-infra14:55
maliniGood Morning!!14:56
maliniI have a couple of patches outstanding for adding MArconi support14:56
maliniCan I get some reviews please?14:56
malinihttps://review.openstack.org/#/c/65145/14:56
malinihttps://review.openstack.org/#/c/65140/14:57
maliniI need these merged before I can get my patch to tempest merged14:57
*** malini is now known as malini_afk15:00
*** oubiwann_ has quit IRC15:00
*** malini_afk is now known as malini15:02
*** senk has quit IRC15:03
*** oubiwann_ has joined #openstack-infra15:04
*** gokrokve has joined #openstack-infra15:04
*** afazekas_ has quit IRC15:05
*** nosnos has quit IRC15:06
*** annegent_ has joined #openstack-infra15:06
*** senk has joined #openstack-infra15:07
*** senk1 has joined #openstack-infra15:08
*** gokrokve has quit IRC15:09
sdaguefungi: yeh, we're still discussing 6776715:11
*** senk has quit IRC15:12
*** sarob has joined #openstack-infra15:13
*** annegent_ has quit IRC15:13
*** DinaBelova is now known as DinaBelova_15:16
*** afazekas_ has joined #openstack-infra15:16
*** sarob has quit IRC15:18
*** SergeyLukjanov_ is now known as SergeyLukjanov15:19
*** DinaBelova_ is now known as DinaBelova15:20
openstackgerritZang MingJie proposed a change to openstack-infra/zuul: Supply authentication to zuul's gerrit baseurl  https://review.openstack.org/6785815:20
*** dims has quit IRC15:21
*** rakhmerov has quit IRC15:22
*** ryanpetrello has quit IRC15:22
*** rakhmerov has joined #openstack-infra15:22
openstackgerritZang MingJie proposed a change to openstack-infra/zuul: Supply authentication to zuul's gerrit baseurl  https://review.openstack.org/6785815:23
*** mrmartin has joined #openstack-infra15:25
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Load projects from yaml file  https://review.openstack.org/6628015:25
*** talluri has joined #openstack-infra15:30
max_loburSomebody from requirements core group, could you please review/approve the patch https://review.openstack.org/#/c/66349/3. It already has one +1 from a core reviewer15:30
*** nprivalova is now known as nadya_15:31
*** dmitkuzn has joined #openstack-infra15:32
*** jgrimm has joined #openstack-infra15:34
*** vkozhukalov has quit IRC15:34
*** dims has joined #openstack-infra15:35
*** gokrokve has joined #openstack-infra15:37
*** gokrokve has joined #openstack-infra15:37
*** rcleere has joined #openstack-infra15:40
*** johnthetubaguy has quit IRC15:40
*** DennyZhang has joined #openstack-infra15:40
*** johnthetubaguy has joined #openstack-infra15:41
openstackgerritArx Cruz proposed a change to openstack-infra/config: Change mysql-devel to community-mysql-devel in Fedora  https://review.openstack.org/6273915:43
*** afazekas_ has quit IRC15:44
*** juliashem has joined #openstack-infra15:46
*** NikitaKonovalov is now known as NikitaKonovalov_15:47
*** mrmartin has quit IRC15:49
*** annegent_ has joined #openstack-infra15:50
*** dmitkuzn has quit IRC15:51
*** senk1 has quit IRC15:51
*** juliashem has quit IRC15:51
*** annegent_ has quit IRC15:54
*** ryanpetrello has joined #openstack-infra15:55
*** ryanpetrello has quit IRC15:55
*** marun has joined #openstack-infra15:57
fungithe merge rate seems to be getting substantially worse. we're on track to merge 3 or 4 changes to openstack/openstack in a 24-hour period15:57
fungiwith the load from check pileup putting zuul into a pendulum between pipelines, we're merging or kicking out (more often kicking out) one change from the gate every couple hours, yet we're approving a dozen an hour15:59
*** talluri has quit IRC16:01
notmynamefungi: how are you tracking that number?16:01
funginotmyname: looked at http://git.openstack.org/cgit/openstack/openstack/log/16:01
fungi3 changes merged in the past 18 hours16:02
notmynamefungi: thanks16:02
*** johnthetubaguy has quit IRC16:02
fungiand the cinder change at the head of the gate just failed a grenade job, which means now we get to service the 50 or so changes waiting for nodes in the check pipeline before we restart testing on the change which was behind it in the gate16:03
*** johnthetubaguy has joined #openstack-infra16:05
fungigranted that off-the-cuff metric misses changes to stable release branches, but right now those are broken anyway so we wouldn't be merging any changes to them regardless16:05
*** david-lyle_ has joined #openstack-infra16:05
*** SergeyLukjanov is now known as SergeyLukjanov_16:09
openstackgerritArx Cruz proposed a change to openstack-infra/config: Change mysql-devel to community-mysql-devel in Fedora  https://review.openstack.org/6273916:11
*** afazekas_ has joined #openstack-infra16:11
*** nicedice has joined #openstack-infra16:13
*** sarob has joined #openstack-infra16:13
*** salv-orlando has joined #openstack-infra16:13
*** jcoufal has quit IRC16:15
*** DinaBelova is now known as DinaBelova_16:17
*** sarob has quit IRC16:18
*** nati_ueno has joined #openstack-infra16:19
*** marun has quit IRC16:20
*** thuc has joined #openstack-infra16:20
*** marun has joined #openstack-infra16:22
*** johnthetubaguy has quit IRC16:22
*** johnthetubaguy has joined #openstack-infra16:22
*** dizquierdo has quit IRC16:27
*** NikitaKonovalov_ is now known as NikitaKonovalov16:32
sdaguemordred: any word on quota bump?16:32
fungii think it must be freudian that i've started mistyping "gate" as "hate"16:34
fungisdague: it looks like https://review.openstack.org/67371 could use an approval vote16:35
sdaguefungi: doh16:36
fungiotherwise pretty much all of the tempest changes from last week's sprint have merged (except for a couple which are in the gate currently)16:36
sdaguefungi: where is it in the queue?16:36
*** mancdaz is now known as mancdaz_away16:37
fungisdague: it isn't. it already passed all the way through but failed to merge because dkranz revoked his approval16:37
*** AaronGr_Zzz is now known as AaronGr16:37
*** UtahDave has joined #openstack-infra16:38
*** markmcclain has joined #openstack-infra16:38
sdaguefungi: can we promote or ninja merge? that will actually take some of the load off the neutron tests16:39
*** yamahata has quit IRC16:39
sdaguewhich should increase their pass rate16:39
fungisdague: should be safe. looks like it would have made it were it not for the missing approval vote when it was done16:39
*** DinaBelova_ is now known as DinaBelova16:39
fungii'll merge it16:40
*** mancdaz_away is now known as mancdaz16:41
fungiit's merged now16:41
*** jpich has quit IRC16:43
*** afazekas_ has quit IRC16:45
mtreinishfungi: heh, I don't think I've actually seen that before16:45
fungimtreinish: that's the behavior if vrfy/cdrv/aprv votes are missing or there's a -2 vote on it when it comes time to merge16:46
fungigenerally happens when they're manually unset while it's in the gate16:46
mordredsdague: nope. just pinged back again16:46
mtreinishfungi: yeah it looks like dkranz removed his +A after the gate tests started on it16:46
fungiyep16:46
fungiwhich won't kick it out of the gate at the moment, but will prevent it from merging once it makes its way through16:47
sdaguefungi: so we might want to trigger a gate dequeue on removing A16:47
sdaguebecause otherwise it's kind of useless16:47
fungisdague: i believe there is intent to make that happen (along with on -2 as well), but it's still on the to-do list16:48
sdagueyep16:49
sdaguedid the early pep8 on check ever get merged?16:49
fungisdague: mordred wanted to rework it. it wouldn't have bought us much in its original form16:50
sdagueok, cool16:50
sdaguejust checking16:50
fungiall it would have preempted was python26/27 and docs checks16:50
mordredyeah. I'm not sure it's possible to express with the current template setup16:50
*** elasticio has joined #openstack-infra16:50
sdagueok16:50
*** mgagne has joined #openstack-infra16:51
*** GheRiver1 has joined #openstack-infra16:53
*** GheRiver1 has quit IRC16:53
*** MarkAtwood has joined #openstack-infra16:54
*** pblaho has quit IRC16:57
*** AaronGr is now known as AaronGr_Zzz16:58
sdaguefungi: so given that we're not really moving code anyway, what are the odds we could fix logs on the other jenkinses16:58
*** alexpilotti has quit IRC16:58
*** sarob has joined #openstack-infra16:59
*** ruhe is now known as _ruhe16:59
*** krotscheck has joined #openstack-infra17:00
fungisdague: pretty good. would be easier when clarkb is around since he knows how he was obtaining the patched plugin build to upload into them17:00
sdaguesure17:01
sdaguethat's fair, hopefully he'll be back on soon17:01
*** nati_ueno has quit IRC17:02
*** pblaho has joined #openstack-infra17:04
*** pblaho has quit IRC17:04
mgagnezaro: ping17:06
*** vkozhukalov has joined #openstack-infra17:08
*** senk1 has joined #openstack-infra17:09
*** Ryan_Lane has joined #openstack-infra17:10
*** Ryan_Lane has quit IRC17:11
sdaguemordred: if you feel like reviewing something that can merge - https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:gatestatus,n,z17:12
sdaguethen I can get that off the elastic recheck page17:13
mordredusdlooking17:13
mordredgah17:13
mordredsdague: looking17:13
*** gokrokve has quit IRC17:13
*** gokrokve has joined #openstack-infra17:13
*** aburaschi has joined #openstack-infra17:15
sdaguealso, where is that framework patch for status again?17:16
sdagueI want to look at redoing the er stuff like that before I add more logic to the existing page17:16
*** gokrokve has quit IRC17:18
*** jaypipes has joined #openstack-infra17:19
*** mancdaz is now known as mancdaz_away17:19
*** moted has quit IRC17:20
*** moted has joined #openstack-infra17:20
aburaschiHello, newbie quick question: if I want to reverify a patch in jenkins, and I identify that two bugs are associated to that failure, which is the correct way to proceed?17:20
aburaschia) put:17:20
aburaschireverify bug 117:20
aburaschireverify bug 217:20
aburaschior b) select just one and go with that one?17:20
fungiaburaschi: best would be to leave two reverify comments, one for each bug which resulted in a failure (don't leave them in the same comment though or it won't work)17:22
*** SumitNaiksatam has quit IRC17:25
aburaschiExcellent, thank you very much, fungi.17:25
fungiyou're welcome17:26
*** DennyZhang has quit IRC17:26
*** yassine has quit IRC17:27
*** AaronGr has joined #openstack-infra17:29
fungi...thinking aloud, i wonder whether giving the check pipeline priority over the gate would break the pendulum swing and improve gating performance17:29
*** AaronGr has quit IRC17:30
*** AaronGr_Zzz is now known as AaronGr17:30
fungiwe'd dribble nodes into the gate jobs in sequence as the check pipeline no longer needs them. as a result, we'd be testing fewer gate changes at a time, meaning a smaller rush of nodes to reclaim on the inevitable gate reset17:31
fungiwould have the effect of spreading nodepool delete and build operations out more evenly17:31
*** thuc has quit IRC17:36
*** afazekas_ has joined #openstack-infra17:37
*** thuc has joined #openstack-infra17:37
*** marun has quit IRC17:40
*** fbo is now known as fbo_away17:40
*** pballand has joined #openstack-infra17:40
*** marun has joined #openstack-infra17:41
*** thuc has quit IRC17:41
*** chandankumar_ has quit IRC17:42
*** luqas has quit IRC17:43
*** senk1 has quit IRC17:43
sdaguefungi: do we ever hit a point where check doesn't need them right now?17:43
sdagueI also thought both queues were equal priority17:44
fungisdague: if we were servicing it first, we probably would17:44
sdaguefungi: I'm not convinced :)17:44
fungithey are equal priority right now, which is what causes the swing17:44
sdagueit's at 10217:44
clarkbmorning17:44
sdagueand given the build delays, I think we'd just completely starve the gate17:44
sdagueif we had more nodes, I'd agree17:44
sdagueok, going to pop out for lunch17:45
clarkbfungi: I grabbed the scp.jpi file from jenkins-dev17:45
*** _ruhe is now known as ruhe17:45
clarkbfungi you can grab it from there or 0217:45
fungisdague: possibly. part of it is that right now, we're applying every new node to gate changes (because there's more than we can service) and then once a gate reset happens, we start handing every available node to the check pipeline changes which piled up while we were previously handing them all to the gate17:46
sdagueyep, swapping not fun17:46
fungibut given the gate reset frequency, most of the nodes burned on gate pipeline changes were wasted because their results were never needed17:46
*** dstufft has quit IRC17:46
sdagueright17:47
*** nati_ueno has joined #openstack-infra17:47
*** dstufft has joined #openstack-infra17:47
fungiat most the first few dozen nodes applied to the gate had any real effect at all, and the rest were just resources which could have gone to clearing out the check pipeline instead17:47
*** jasondotstar has joined #openstack-infra17:47
sdaguethe smart way to do it would be to calculate out the percentage chances for each successive piece of the gate to get through from its current position, then define a cutoff17:47
sdagueand not schedule past that point17:47
sdaguethat requires a lot more logic though17:48
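A toy sketch of the cutoff idea sdague describes above, under the assumption that each gate change passes independently with some fixed probability; the 90% pass rate and 25% cutoff below are made-up numbers for illustration, not measurements from the gate.

    def useful_gate_depth(pass_rate, cutoff):
        """Return how many gate positions are worth allocating nodes to.

        A change at depth k only merges as tested if every change ahead
        of it passes, which happens with probability pass_rate ** k, so
        stop scheduling once that falls below the cutoff.
        """
        depth = 0
        while pass_rate ** (depth + 1) >= cutoff:
            depth += 1
        return depth

    # Illustrative numbers only: a 90% per-change pass rate and a 25%
    # usefulness cutoff make roughly the first 13 positions worth testing.
    print(useful_gate_depth(0.90, 0.25))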
*** nati_ueno has quit IRC17:48
dkranzfungi, sdague : Did I do something bad?17:48
sdaguedkranz: yeh, but we fixed it17:48
mordredsdague: we can segregate teh pools more though17:48
mordredwe have precise and check-precise or whatever it's called17:48
dkranzsdague: For future reference, what am I not supposed to do?17:48
sdaguedkranz: don't remove +A from a change in the gate17:49
sdaguethe behavior isn't what you actually want17:49
mordredwe could change the nodepool config to put fewer nodes into devstack-precise and more into devstack-precise-check17:49
fungimordred: we actually got rid of precise-check nodes a few weeks ago. now all dsvm and bare nodes are available for either check or gate17:49
mordredto achieve with a baseball bat the thing you were talking about above17:49
mordredoh. well17:49
sdagueok... really, leaving for lunch17:49
dkranzsdague: ok. But I was not trying to stop it and didn't realize it was in the gate.17:49
dkranzsdague: I just saw from other comments that it should not have been approved.17:50
dkranzBut I won't do it again17:50
*** rakhmerov has quit IRC17:51
clarkbfungi mordred should we bump scp on 03 now?17:51
openstackgerritA change was merged to openstack-infra/config: Additional jobs for python-rallyclient  https://review.openstack.org/6692917:51
mordredclarkb: yeah17:51
fungidkranz: probably could have just left it approved at that point and waited to see whether the check results came back green. then if not, upload a new patchset to knock the previous broken one out of the gate17:52
openstackgerritA change was merged to openstack-infra/config: Add an experimental functional job for neutron.  https://review.openstack.org/6696717:52
fungiclarkb: i believe that would be good17:52
*** SergeyLukjanov_ is now known as SergeyLukjanov17:52
*** sarob has quit IRC17:53
*** sarob has joined #openstack-infra17:54
*** bnemec_ is now known as bnemec17:54
clarkbok putting 03 in shutdown mode17:56
*** ruhe is now known as _ruhe17:57
openstackgerritA change was merged to openstack-infra/storyboard: Fix the intial db migration  https://review.openstack.org/6759217:59
*** MarkAtwood has quit IRC18:00
*** boris-42 has quit IRC18:00
clarkbfungi: mordred: the scp.jpi file is on 03 and 04 under ~clarkb/plugins/scp/fixed18:01
fungik18:02
fungiand you're just using the upload screen in the webui to upgrade it?18:03
*** derekh has quit IRC18:03
*** chandankumar_ has joined #openstack-infra18:03
*** boris-42 has joined #openstack-infra18:04
*** nati_ueno has joined #openstack-infra18:04
clarkbfungi: no, I am actually stopping the server, putting the scp.jpi in /var/lib/jenkins/plugins then starting jenkins18:05
*** NikitaKonovalov is now known as NikitaKonovalov_18:05
clarkbfungi: you can use the webui instead, it is how zaro put it on -dev18:05
clarkbI feel like there is more control doing it by hand on disk18:05
*** CaptTofu has joined #openstack-infra18:06
clarkbbecause I don't know what magic jenkins is doing under the hood to do restartless upgrades (which don't work) and so on18:06
*** gokrokve has joined #openstack-infra18:09
*** zz_ewindisch is now known as ewindisch18:09
*** sarob has quit IRC18:09
*** markmcclain has quit IRC18:10
fungiahh, okay. the last time i did it from the fs it was because jenkins wouldn't start otherwise, and i wasn't sure how many of the accompanying files needed to be copied into place too or whether some of those were ephemeral18:10
*** markmcclain has joined #openstack-infra18:10
clarkbfungi: the scp/ dir that is created is made by expanding the jpi archive I think18:11
radixcan someone help me understand what's going on in http://logs.openstack.org/06/67006/4/check/check-tempest-dsvm-full/5fa3d8a/ ? It seems to be some kind of network failure18:11
clarkbfungi: the only thing you need is the .jpi or .hpi18:11
*** rakhmerov has joined #openstack-infra18:11
*** afazekas_ has quit IRC18:12
*** rakhmerov has joined #openstack-infra18:12
clarkbradix: http://logs.openstack.org/06/67006/4/check/check-tempest-dsvm-full/5fa3d8a/logs/devstack-gate-setup-workspace-new.txt an hpcloud node is trying to clone a repo over ipv618:13
clarkbhpcloud doesn't have an ipv6 stack18:13
*** johnthetubaguy has quit IRC18:13
clarkbfungi: did we determine anything more about that problem?18:13
radixclarkb: this came up in my heat change, pretty sure it's unrelated, and I'm not sure what to do about it18:13
clarkbradix: I am not sure either, I think fungi has investigated it18:14
radixoh ok :)18:14
clarkbone job left on 03, I will stop it and start it with new scp plugin as soon as that job clears out18:14
clarkbwhich is now18:15
openstackgerritBrant Knudson proposed a change to openstack/requirements: Update oauthlib requirement to at least 0.6  https://review.openstack.org/6790018:15
*** jaypipes has quit IRC18:16
fungiclarkb: only speculation... the ip configuration output in the console log only shows ipv4 (not even any linklocal v6), which makes me think we're doing "ip -4 ad sh" explicitly or something. i'll have a look and see how we might get more diagnostics for this on future runs18:16
clarkb03 is back up with new plugin18:17
clarkbfungi: mordred: should I put 04 in shutdown mode now?18:18
radixhmm, looks like this: https://bugs.launchpad.net/openstack-ci/+bug/126661618:18
radixI guess I'll run a recheck on that18:18
fungiclarkb: go for it18:18
*** vkozhukalov has quit IRC18:19
clarkbfungi: once 04 is done the remaining nodes will be 01 and jenkins.o.o which can get the correct version when we update their jenkins version18:19
*** fifieldt has joined #openstack-infra18:19
fungiradix: that looks like it, yeah18:19
clarkbI am going to take advantage of the wait to return to my regularly scheduled morning18:19
clarkbwill pop back in in a bit to finish 0418:20
fungiradix: current suspicion is that some other tenant in hpcloud is generating router advertisements, but adding some extra debugging around address assignments there may help enlighten us as to the cause18:20
radixyikes18:21
*** yamahata has joined #openstack-infra18:21
clarkbfungi: we can update the iptables rules right?18:21
clarkbneeds to be conditional for hpcloud only though18:22
*** pballand has quit IRC18:23
*** SergeyLukjanov is now known as SergeyLukjanov_a18:24
fungiclarkb: well, if that's the cause then yes, but if so there's every chance the same could happen in rackspace and then we'd need to be able to keep filters updated for their router linklocal addresses18:24
*** SergeyLukjanov_a is now known as SergeyLukjanov_18:25
fungiclarkb: radix: for details, see https://launchpad.net/bugs/126275918:26
*** afazekas_ has joined #openstack-infra18:26
fungiit's apparently blocked *if* you're doing openstack ipv6 networking, but given the way in which rackspace has implemented their ipv6 vm connectivity i have no idea whether that also holds true for them18:29
*** afazekas_ has quit IRC18:30
*** dcramer_ has quit IRC18:31
*** afazekas_ has joined #openstack-infra18:32
*** marun has quit IRC18:32
*** marun has joined #openstack-infra18:33
*** ewindisch is now known as zz_ewindisch18:36
*** elasticio has quit IRC18:36
*** praneshp has joined #openstack-infra18:36
*** zz_ewindisch is now known as ewindisch18:37
*** jaypipes has joined #openstack-infra18:37
*** senk1 has joined #openstack-infra18:38
*** ewindisch is now known as zz_ewindisch18:40
*** marun has quit IRC18:40
*** zz_ewindisch is now known as ewindisch18:41
*** marun has joined #openstack-infra18:41
clarkbeta on 04 is 30 minutes18:43
*** chandankumar_ has quit IRC18:44
*** yamahata has quit IRC18:48
*** jasondotstar has quit IRC18:49
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox  https://review.openstack.org/6772118:49
clarkbfungi: do we think we should submit a ticket to hpcloud about the possible bad 'router'?18:50
*** ewindisch is now known as zz_ewindisch18:50
openstackgerritJeremy Stanley proposed a change to openstack-infra/devstack-gate: Also print IPv6 address details  https://review.openstack.org/6791118:52
fungiclarkb: maybe we start with ^ and have a look at the next one which hits logstash18:52
*** CaptTofu has quit IRC18:52
*** mindjiver has quit IRC18:52
*** zz_ewindisch is now known as ewindisch18:52
* clarkb looks18:53
*** nati_uen_ has joined #openstack-infra18:53
fungido you think an ip route show along with that would also be in order?18:53
krotscheckclarkb: The run-selenium script seems to depend on having run_tests.sh in the project. Do you have a strong opinion on whether A) I can remove that, or B) I should create an xvfb builder macro that just executes tox?18:53
*** markmcclain has quit IRC18:53
fungiclarkb: oh, though for that you also need ip -6 route show. maybe add an ip {,-6} neighbor show too18:54
clarkbkrotscheck: I would love it if we can remove the dependency on run_tests.sh, but horizon is a thing18:54
clarkbkrotscheck: maybe we can feed run-selenium a command to execute a test with selenium bits prestaged18:54
clarkbkrotscheck: then feed a different command to horizon and storyboard18:55
clarkbfungi: sounds good to me18:55
krotscheckclarkb: I dunno, that feels a bit like overparameterizing a command18:55
clarkbkrotscheck: not really, its creating a specific test environment to run tests within18:55
krotscheckclarkb: BTW- so there's a python module called nodeenv that will drop a nodejs runtime into your virtualenv for you.18:55
clarkbthe tests you want to run within it don't need to be identical18:55
krotscheckclarkb: So mordred and I are working on just having storyboard use tox.18:56
clarkbfungi: want to update the existing change or do that in a different one?18:56
*** nati_ueno has quit IRC18:56
fungiclarkb: i'm updating it now18:56
openstackgerritJeremy Stanley proposed a change to openstack-infra/devstack-gate: More network debugging detail  https://review.openstack.org/6791118:58
fungiclarkb: ^ updated18:58
*** markmcclain has joined #openstack-infra18:59
fungiclarkb: turns out ip neighbor show gets you both the arp and nd table entries together, so it's just ip route show which needs a separate -6 variant18:59
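For reference, the diagnostics being added in 67911 are ordinary iproute2 calls run from devstack-gate's shell scripts; the snippet below is only a Python illustration of gathering the same output, and assumes the ip binary is present on the test node.

    import subprocess

    # The commands named above: addresses, v4 and v6 routes (which need
    # separate invocations), and the neighbor table, which covers both ARP
    # and IPv6 neighbor discovery entries in one listing.
    DIAG_COMMANDS = [
        ["ip", "address", "show"],
        ["ip", "route", "show"],
        ["ip", "-6", "route", "show"],
        ["ip", "neighbor", "show"],
    ]

    def dump_network_state():
        """Print network state so a later log review can spot rogue RAs."""
        for cmd in DIAG_COMMANDS:
            print("+ " + " ".join(cmd))
            print(subprocess.check_output(cmd).decode())

    if __name__ == "__main__":
        dump_network_state()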
*** Ajaeger1 has joined #openstack-infra19:01
*** SergeyLukjanov_ is now known as SergeyLukjanov19:02
*** amotoki has joined #openstack-infra19:03
fungiclarkb: anyway, with that it should give us enough info to spot the ethernet address of the "router" if it really is someone testing radvd or a broken switchrouter in the distribution layer or something at fault19:03
jog0sdague: ping19:03
sdaguejog0: yo19:03
*** afazekas_ has quit IRC19:03
*** amotoki has quit IRC19:03
jog0sdague: https://review.openstack.org/#/c/67596/ works can you review it19:03
jog0mtreinish: if you're around19:03
jog0sdague: that will give us more accurate e-r comments19:04
*** CaptTofu has joined #openstack-infra19:05
jog0which is why I want to get this in as soon as possible19:06
sdaguejog0: so I have one suggested change, inline19:08
jog0sdague: sounds like a good idea to me, thanks19:09
*** mrodden has quit IRC19:10
jog0so actually we use a lot of data from the gerrit event19:11
jog0and its all over the place right now19:12
jog0sdague: so I would prefer to do that refactor separately19:12
*** yamahata has joined #openstack-infra19:13
*** markmcclain has quit IRC19:13
openstackgerritlifeless proposed a change to openstack-infra/config: Add some dependencies required by toci  https://review.openstack.org/6768519:13
lifelessclarkb: fungi: if we can get ^ landed and then turn on the tripleo nodepool config, that would be the awesome19:14
mriedemdid anything change with the backing cinder volume store on the test nodes around 1/17?19:14
*** jasondotstar has joined #openstack-infra19:14
sdaguejog0: can you introduce the event object under this one19:14
sdagueI really hate having to clean these up later19:14
clarkb04 is idle now, updating scp plugin now19:15
*** mrodden has joined #openstack-infra19:15
sdaguegreat19:15
sdagueI can already see us timing out a lot less in the channel19:15
*** markmc has quit IRC19:16
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox  https://review.openstack.org/6772119:16
fungimriedem: what backing cinder volume store? you mean the one devstack creates when it starts up?19:16
mriedemfungi: yes19:16
mriedemanything using iscsu19:16
mriedem*iscsi19:16
jgriffithmriedem: fungi aren't those still just loopback files created by devstack?19:17
fungijgriffith: as far as i know, yes. so any changes would be changes in devstack or *maybe* devstack-gate repositories19:17
jgriffithfungi: or cinder :/19:17
jog0sdague: normally I would say sure, but I am not even supposed to be working today, just stopped in for an hour or so19:17
jgriffithmriedem: what are you seeing?19:17
fungijgriffith: well, yeah, or cinder ;)19:18
mriedemjgriffith: digging into this https://bugs.launchpad.net/nova/+bug/127060819:18
jog0I agree it needs cleanup but I don't think its worth holding this up for that19:18
clarkb04 seems up19:18
mriedemi might be looking at a red herring in the nova code that changed on 1/17 which is when that bug started showing up19:18
fungiclarkb: agreed. looks like it's running jobs already19:18
mriedemi'll see what changed in cinder and devstack on 1/1719:18
ewindischirt a conversation I've been having with dtroyer in #openstack-dev....19:19
ewindischwhat are the thoughts toward gating another nova hypervisor in openstack-infra?19:19
jgriffithmriedem: I seem to recall this may be a dup of another nova item we looked at a while back19:19
sdaguejog0: so I don't want to unwind this when we could do the event object first19:20
ewindischDean seems to worry about having enough resources for the extra gate19:20
sdagueas it makes more work19:20
sdagueewindisch: -119:20
sdaguerevisit at Juno summit19:20
ewindischsdague: at the root of this is russell REQUIRING a (non-voting) gate to keep hypervisors in Nova19:21
jog0sdague: you want to take a whack at the event object? I am trying to not work today19:21
*** annegent_ has joined #openstack-infra19:21
sdaguejog0: yep, I will19:21
sdagueewindisch: yep, do what everyone else is doing, and bring up a 3rd party system19:21
jog0sdague: thanks19:21
jog0!19:21
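A minimal sketch of the "event object" refactor sdague and jog0 are talking about: wrap the raw gerrit stream-events dictionary in a small class so elastic-recheck stops poking at nested keys all over the place. The class and property names are hypothetical, not taken from the actual change (67941).

    class GerritEvent(object):
        """Hypothetical wrapper around one raw gerrit JSON event."""

        def __init__(self, raw):
            self._raw = raw

        @property
        def project(self):
            return self._raw["change"]["project"]

        @property
        def change_number(self):
            return self._raw["change"]["number"]

        @property
        def patchset(self):
            return self._raw["patchSet"]["number"]

        @property
        def comment(self):
            # Only present on comment-added events.
            return self._raw.get("comment", "")

Call sites would then take a GerritEvent instead of a bare dict, which keeps knowledge of the event's key layout in one place.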
fungiewindisch: as it stands there's still a whole stack of patches against nodepool, devstack-gate and infra/config to get xenserver testing working. we haven't even had time to look at them as far as i'm aware (much to the displeasure of the xenserver devs)19:21
clarkbfungi: did they ever respond to the first round of review on those?19:22
fungiclarkb: i believe so, but i've been too busy to look through them again19:22
*** nati_uen_ has quit IRC19:22
clarkbfungi: I was really curious what the feedback would be but the changes sat idle and were auto-abandoned19:22
mriedemjgriffith: this is iscsi related and 1/17: https://github.com/openstack/cinder/commit/a9267644ee09591e2d642d6c1204d94a9fdd8c8219:22
*** annegent_ has quit IRC19:22
*** markmcclain has joined #openstack-infra19:23
ewindischsdague: everyone else being "VMware" and "Citrix" i.e. https://www.google.com/finance?q=ctxs and https://www.google.com/finance?q=vmw19:23
jgriffithmriedem: eeek19:24
*** jp_at_hp has quit IRC19:24
jog0ewindisch: even if we wanted to we don't have the resources right now19:24
mriedemjgriffith: i'm not familiar with that code, but does that look like it could cause races? like premature return from snapshot from volume when it's not ready?19:24
ewindischI understand, but I'm going to have to sync with russellb and samalba about this.19:25
jgriffithmriedem: indeed, I believe it could19:25
jgriffithmriedem: looking now19:25
jgriffithmriedem: I believe you're correct19:28
jog0so I have a question that I am not sure how to answer: do we think dropping tempest concurrency down to 2 increased the number of patches we are able to merge into openstack/openstack in a given amount of time19:28
jgriffithmriedem: I'll spin it up here and take a look after I finish what I'm in the middle of now19:28
*** thomasem has quit IRC19:28
clarkbjog0: no19:28
clarkbI think it significantly impacted the backlog19:28
jog0I count 7 patches in last 24 hours19:29
jog0clarkb: perhaps we should consider reverting the patch19:29
clarkbin the opposite direction, but I have no hard data to support that19:29
clarkbbecause tests are taking up to 1.33 hours now instead of .70 hours or wherever they were before19:29
mriedemjgriffith: cool, thanks19:30
*** yolanda has quit IRC19:30
russellbtaking 1.33 hours more reliably is better than 0.7 hours with random failures all over the place due to pegging the CPU the entire time19:30
fungias discussed in #nova, i'm going to promote https://review.openstack.org/67914 to the head of the gate pipeline. the result will be that everything in the check pipeline as of now will get new nodes first, and then that change will get a shot at fixing a substantial percentage of our gate resets19:30
sdagueso I actually think the concurrency did make things better19:30
russellbit's really just a non-starter to run the tests with CPU over the top19:30
clarkbrussellb: it isn't more reliable though19:30
sdagueclarkb: sure19:31
*** SergeyLukjanov is now known as SergeyLukjanov_a19:31
russellbthe failures are just other things right now19:31
sdaguebut it's more reliable19:31
sdagueso I'm -1 to going back to 4x19:31
russellbit eliminates a whole class of failures19:31
sdagueagree with russellb19:31
clarkbwere those failures just masking all of these failures?19:31
sdagueclarkb: possibly19:31
clarkbwe are still essentially worst-casing the gate queue, which is where we were before19:32
clarkbso the gate queue isn't more reliable19:32
*** SergeyLukjanov_a is now known as SergeyLukjanov_19:32
sdaguewe were also in deep gate queue19:32
jog0clarkb: http://status.openstack.org/elastic-recheck/ the graph at the top looks very wrong19:32
sdagueso we're basically driving a rover on mars19:32
clarkbI think we had what 30 changes merge over a day recently19:32
clarkbjog0: looks like graphite problems19:32
jog0clarkb: yeah19:32
jog0so merge rates: http://paste.openstack.org/show/61594/19:32
sdagueclarkb: yeh, friday -> sat was about 30 in 24hrs19:32
sdagueI also expect what happened is that in dropping concurrency we had some tests move around19:33
*** _david_ has joined #openstack-infra19:33
sdagueso we go new overlaps19:33
sdaguegot new overlaps19:33
fungihttp://git.openstack.org/cgit/openstack/openstack/log/ shows 4 commits in the past 22 hours, one of which i force-merged without putting through the gate19:33
sdaguewhich exposed a few new issues19:34
clarkbI need to run back to regularly scheduled holiday programming19:34
fungik19:35
*** SergeyLukjanov_ is now known as SergeyLukjanov19:35
jog0spot checking shows these numbers appear to be common19:36
jog0merges per day is below 4519:37
*** emagana has joined #openstack-infra19:37
sdaguejog0: you need to only count merge commits19:37
sdagueotherwise the timing is off19:37
sdaguefilter by author jenkins19:37
jog0https://github.com/openstack/openstack/graphs/commit-activity19:38
sdaguejog0: right, but we have 2 commits per commit19:38
lifelesssdague: so 3 in total?19:39
*** markmcclain has quit IRC19:39
sdague:P19:39
*** HenryG has quit IRC19:39
sdaguejog0: anyway if you add --author=jenkins to your git commands it will be close19:39
sdagueit will double count translations19:39
sdaguebut that's pretty minor19:39
jog0I don't see any doubles and translations are merges19:40
jog0anyway19:40
sdaguejog0: oh, github is filtering merges19:41
sdaguebut on your pastebin19:42
jog0sdague: yeah I forgot about github, they have pretty pictures19:42
jog0anyway data looks inconclusive to me19:44
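A quick sketch of the counting sdague suggests a few lines up: only merge commits authored by Jenkins mark when something actually landed, so filter on those. It assumes a local clone of openstack/openstack and git on the path; the translation-commit double counting he mentions is ignored here.

    import subprocess

    def merges_last_day(repo_path):
        """Count merge commits by Jenkins over the last 24 hours."""
        out = subprocess.check_output(
            ["git", "log", "--merges", "--author=jenkins",
             "--since=24 hours ago", "--oneline"],
            cwd=repo_path)
        return len(out.decode().splitlines())

    # e.g. merges_last_day("/path/to/openstack/openstack")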
jog0do we know why deletes take so long in nodepool btw?19:44
clarkbbecause cloud. deletes are expensive19:45
fungijog0: in particular, rackspace likes to ignore them19:45
fungiso we keep spamming them with delete calls until they finally free up the node19:45
jog0fungi: ahh19:46
fungihpcloud doesn't ignore them as much, just takes a long time to act on them19:46
*** praneshp_ has joined #openstack-infra19:46
jog0are deletes slow in openstack in general?19:46
fungii suspect it depends on the load in your cloud19:46
jog0can we complain to RAX and HP cloud about it?19:46
*** CaptTofu has quit IRC19:46
fungii have this running on nodepool.o.o for the past 18 hours or so, but it hasn't seemed to make any difference: https://review.openstack.org/6772319:47
sdaguejog0: they'll probably just complain back to us to clean up nova :)19:47
jog0we can do that19:47
jog0but  the nodepool plots are just sad19:48
fungiclarkb: i was wanting to ask on 67723, does that need a yield in the outer loop too?19:48
lifelesshah, devstack-gate really wants a lot of variables and node state :/19:48
*** praneshp has quit IRC19:48
*** praneshp_ is now known as praneshp19:48
sdaguejog0: honestly, that's what swapping looks like. We've just got a working set far too large for our resources, so now we're swapping19:50
lifelessfungi: no, its just broken19:50
lifelessfungi: reviewing it now19:50
fungisdague: well, the providers also do take waaaay too long to act on delete calls from us19:50
fungilifeless: okay, thanks. it's a bit over my head i'm afraid19:50
jog0mordred: ^ can you look into the HP side of this19:50
*** markmcclain has joined #openstack-infra19:51
sdaguejog0: I think that's a good long term conversation, I don't see that helping us over the hump19:51
*** gokrokve has quit IRC19:53
openstackgerritMichael Krotscheck proposed a change to openstack-infra/config: Use nodeenv via tox to do javascript testing  https://review.openstack.org/6772919:53
*** gokrokve has joined #openstack-infra19:53
jog0sdague: agreed19:53
*** fifieldt has quit IRC19:53
*** yolanda has joined #openstack-infra19:54
openstackgerritMichael Krotscheck proposed a change to openstack-infra/config: Use nodeenv via tox to do javascript testing  https://review.openstack.org/6772919:55
*** rnirmal has joined #openstack-infra19:55
lifelessfungi: so nodepool is regular python19:55
lifelessfungi: threads, not eventlet19:55
*** marun has quit IRC19:55
*** westmau5 is now known as westmaas19:55
fungilifeless: thanks! i'm far more used to hacking on single-threaded applications19:55
lifelessfungi: at least, AFAICT19:55
lifelessfungi: anyhow, have a look at task_manager.py - you can see that run() is single threaded19:55
lifelessfungi: it pulls a work item off of a queue, processes it, and continues.19:56
lifelessfungi: it's not using a thread *pool*, so making the time to process a single item longer (e.g. up to 10 minutes!) will delay operating /all/ the tasks in the queue19:56
*** HenryG has joined #openstack-infra19:57
lifelessfungi: I'll work up an alternative patch for you19:57
fungilifeless: well, it was 10 minutes before, but having the outer loop be 10 minutes rather than the iterate_timeout() loop may make it less of what i meant, agreed19:57
lifelessfungi: I think you're missing my point :(. All deletes occur in a single thread.19:58
*** gokrokve has quit IRC19:58
fungii pondered running two layers of iterate_timeout() inside each other there19:58
lifelessfungi: waiting in that thread for a delete to occur makes all other deletes slower.19:58
*** _ruhe is now known as ruhe19:59
fungilifeless: you mean originally, or only with that patch19:59
*** AaronGr is now known as Aarongr_afk19:59
lifelessfungi: in both cases its all single threaded19:59
fungigot it19:59
lifelessfungi: because its in the JenkinsManager TaskManager queue19:59
lifelessfungi: your patch increases how long a specific delete takes, but does so by not deleting anything else for that period... because it's single threaded20:00
fungiokay, so the yield in iterate_timeout() doesn't really allow anything helpful anyway20:00
lifelessthe yield in iterate_timeout is an entirely separate discussion20:00
lifelessits because its a generator, so its needed20:00
fungioh, right20:00
* fungi sighs at his absent-mindedness20:01
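For readers following along: iterate_timeout in nodepool is a generator that hands back loop iterations until a deadline passes, which is why the yield lifeless mentions has to stay. Roughly along these lines (a from-memory paraphrase, not the exact upstream code):

    import time

    def iterate_timeout(max_seconds, purpose):
        """Yield an increasing count until max_seconds elapse, then raise.

        Callers loop over this and break out once whatever they are
        waiting for (e.g. a server disappearing) has happened.
        """
        start = time.time()
        count = 0
        while time.time() < start + max_seconds:
            count += 1
            yield count
            time.sleep(2)
        raise Exception("Timeout waiting for %s" % purpose)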
lifelessthere's also a 1 second gap between tasks by default20:03
lifelessI'm not at all sure that makes sense20:03
lifelessif you have more than 60 actions a minute, it will backlog20:03
*** zanins has joined #openstack-infra20:03
* lifeless makes a mental note to ask jeblair about that20:04
lifelessit may be working around broken API ratelimits on small clouds20:04
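A schematic of the rate-limited, single-threaded task manager lifeless is describing: one thread drains a queue of API tasks and enforces a minimum gap between them, so at a 1-second rate anything past 60 tasks a minute backlogs, and a slow task delays everything queued behind it. Names and details are simplified, not the real nodepool code.

    import queue
    import threading
    import time

    class TaskManager(threading.Thread):
        """Sketch: serialize API tasks through one rate-limited thread."""

        def __init__(self, rate=1.0):
            super(TaskManager, self).__init__()
            self.rate = rate            # minimum seconds between tasks
            self.tasks = queue.Queue()
            self._last = 0.0

        def submit(self, func):
            self.tasks.put(func)

        def run(self):
            while True:
                func = self.tasks.get()                # one task at a time
                wait = self.rate - (time.time() - self._last)
                if wait > 0:
                    time.sleep(wait)                   # ~60 tasks/minute at rate=1
                func()                                 # a slow task stalls the whole queue
                self._last = time.time()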
jog0lifeless: that may be why deletes are so slow right now20:04
fungijog0: well, they were equally as slow before i tried that20:04
lifelessso one simple thing to try would be to set rate to 0.5 or something20:05
jog0fungi: the 1 second gap?20:05
fungijog0: oh, i thought you meant the extra loop20:05
russellbit would probably backlog earlier than 60 per minute20:06
*** marun has joined #openstack-infra20:06
*** bermut has joined #openstack-infra20:06
jog0well we definitely have more than 60 nodes in nodepool and many are in delete20:07
russellbso i wonder if we should just put a hard limit on how many changes are tested in parallel in the gate queue20:07
russellbthat would help node thrashing on resets20:07
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Provide diagnostics when task rate limiting.  https://review.openstack.org/6792420:07
lifelessfungi: so I'd back the aggressive delete out, and apply ^20:07
lifelessfungi: I haven't /tested/ that patch yet, however20:07
fungihowever the 600-second timeout in the cleanup method was being hit fairly regularly, which was similarly backing up the other delete actions from what i saw before20:08
jog0russellb: looking at status.o.o/zuul we don't get to test that many in parallel in gate20:08
lifelessfungi: that has to go too20:08
lifelessfungi: I bet thats an attempt to avoid quota overuse20:08
russellbjog0: right now yeah ...20:08
jog0because we are resource starved, in fact the top of the gate isn't getting run20:08
jog0russellb: right now yeah20:08
openstackgerritMichael Krotscheck proposed a change to openstack-infra/config: Genericize javascript release artifact creation  https://review.openstack.org/6773120:09
lifelessoh, wow20:09
lifelessso this code doesn't clearly signal what is within a task and what is not20:09
lifelessthat 600 second wait actually does a cross-thread block20:09
jeblairfungi, lifeless: easy thing to help with deletes is to increase the 600 second delete timeout (maybe 1 hour)20:09
*** gokrokve has joined #openstack-infra20:09
lifelessjeblair: *increase* it ?20:10
lifelessjeblair: what is the 600 second timeout for ?20:10
jeblairfungi, lifeless: because it turns deletes from parallel operations into serial ones20:10
jog0right now we have 150 nodes in deleting state or so20:10
jeblairlifeless: to avoid having lots of threads waiting around "forever" for something that isn't going to happen20:11
lifelessjeblair: I mean, why wait at all ?20:11
jeblairlifeless: occasionally cloud providers never delete nodes20:11
lifelessjeblair: the code doesn't take any action if it's not deleted.20:11
lifelessjeblair: other than raising an exception20:12
fungiyeah, we now have two stuck in a active(deleting) state in hpcloud-az2 which i have manually cleaned out of nodepool so that it doesn't keep trying and failing to delete those20:12
jeblairlifeless: good point; it should probably delete, wait 5-10 minutes, then delete again20:12
openstackgerritDavanum Srinivas (dims) proposed a change to openstack-infra/devstack-gate: Temporary HACK : Enable UCA  https://review.openstack.org/6756420:12
jeblairlifeless: oh, but it does set the node state, right?20:12
lifelessjeblair: not that I can see, I'm just tracing the code atm20:13
jeblairlifeless: there is an action if it does succeed -- it deletes the node from the db20:13
*** bermut has quit IRC20:13
jeblairlifeless: so that's what it's waiting for20:14
lifelessjeblair: so I think we should decouple those things20:15
lifelessjeblair: not wait, instead set a state DELETING20:15
*** jasondotstar has quit IRC20:15
lifelessjeblair: and in the periodic check if the server is gone, delete from db, if its not submit a delete again20:15
sdaguewelcome back jeblair20:16
dstufftofftopic, but I need to ask someone a pbr question and i don't see a pbr specific channel ;P anyone mind if I PM them? Or tell me if there's a better channel :D (sorry to bother y'all)20:16
*** thomasem has joined #openstack-infra20:17
dansmithdstufft: it's easy. pull the tab to open the spout, chug it, recycle the can when done20:17
lifelessjeblair: in fact, nodedb.DELETE appears to be for this already, just the surrounding code isn't quite aligned20:17
dstufftdansmith: :D20:17
jeblairlifeless: i think the behavior you described is the problem we're trying to fix20:18
jeblairso the thing we want to deal with is that rackspace (apparently) ignores deletes and takes a long time for them to run20:18
lifelessjeblair: yes, exactly20:18
lifelessjeblair: or are we talking at cross purposes ;)20:18
*** jcoufal has joined #openstack-infra20:18
jeblairdeleting nodes is parallel normally, but after the 10 minute timeout, the parallel thread exits and the only chance for it to be deleted is the serialized periodic task20:19
jeblairso overall, i would expect that process to be slower.20:19
jeblairthe periodic task should not be where the bulk of work happens, it should be where the stuff that falls through the cracks eventually gets cleaned up20:19
jeblairso i think we need to change nodepool to match what's actually happening with clouds20:19
jeblairwhich is that deletes can take longer than 10 minutes normally20:20
jeblairso step 1 is to increase the 10 minute timeout for deletes20:20
lifelessjeblair: I may be misunderstanding someting here, is deleteNode where the parallel deletes come in?20:20
jeblairand if we think that rax is ignoring delete api calls, then we should have it send more of them (step 2)20:20
lifelessjeblair: so the theory is that we're stuck on the quota because rax aren't deleting ?20:21
jeblairlifeless: yeah, or deleting very slowly20:21
jeblairlifeless, fungi: if we're hitting the 10 minute delete timeout and then later the periodic task is successfully deleting rax nodes, then what i've described is accurate20:22
lifelessok, so I see20:22
lifelessNodeCompleteThread20:22
jeblairfungi: i haven't checked the logs recently, is that the case?20:22
lifelessis started per-node20:22
jeblairlifeless: right20:22
*** SergeyLukjanov is now known as SergeyLukjanov_20:22
fungijeblair: yes, that's what i've been seeing. mostly in ord20:23
lifelessjeblair: so what I want to do is remove the 10m block, let the node complete wrap up quickly and let the periodic check also run quickly20:23
lifelessthen run the periodic check more often20:23
jeblairlifeless: it's not a block20:23
jeblairlifeless: because it's a thread-per-node, it doesn't block anything else20:24
lifelessjeblair: Clearly I'm misunderstanding the code; I see deleteNode -> cleanupServer -> getServer -> submitTask20:24
lifelessjeblair: the periodic code also calls cleanupServer, so it blocks that thread20:25
lifelessjeblair: no ?20:25
mikalMorning20:25
jeblairlifeless: all the manager tasks are fast20:26
jeblairlifeless: they are just nova api calls20:26
lifelessjeblair: except cleanupServer20:26
jeblairlifeless: serialized across multiple threads20:26
lifelessjeblair: not for the periodic cleanup20:27
lifelessjeblair: unless I've misunderstood TaskManager20:27
jeblairlifeless: (periodic cleanup is just one of the threads submitting tasks)20:27
jeblairlifeless: the cleanupServer method is slow, but it doesn't block anything else20:27
jeblairlifeless: it submits a series of tasks to the manager20:27
mikalWhat does a check time-in-queue time of 4 hours 18 minutes mean? That there weren't enough workers to start running the test immediately when it was enqueued?20:27
*** senk1 has quit IRC20:28
jeblairlifeless: it's a sort of convenience wrapper around the series of tasks needed to delete a server20:28
lifelessjeblair: and the manager is a single thread with a Queue.Queue20:28
notmynamemikal: not just workers, but also patches previous to it failing that cause a flush of the gate20:29
lifelessI see one JenkinsManager per target jenkins20:29
fungimikal: yes, currently when a gate reset happens, gate pipeline changes go to the back of the line for resource allocation and any pending check pipeline changes are getting available nodes assigned until they catch up to whatever was pending there at the time of the gate reset20:29
jeblairlifeless: so cleanupServer isn't what is run by that manager, but rather 'removeFloatingIP' 'deleteFloatingIP' 'deleteKeypair' 'deleteServer' are the actual serialized tasks20:29
*** ryanpetrello has joined #openstack-infra20:29
mikalnotmyname: this is check though, I thought that was the IndependentPipelineManager20:29
mattoliverauMorning all20:30
notmynamemikal: ah. so just what fungi said, then :-)20:30
notmynameheh. Australia has woken up ;-)20:30
mikalfungi: I am having trouble parsing that...20:30
funginotmyname: the pipeline is independent, but node allocation is on a first-come, first served basis20:30
fungier, mikal ^20:30
mikalOh, so a gate flush eats all the nodes that check would use?20:30
mikalSo check starves for a while?20:30
fungimikal: more or less. when there are enough nodes to go around you don't see this. when we run out of available nodes we get into a situation where the pipelines take turns20:31
mikalOk, fair enough20:31
mgagnezaro: ping20:31
fungiand it escalates, because each pipeline is accumulating new changes faster than it can serve them20:31
mikalSo... Should I go to the node shop and bring you back some more quota?20:32
fungis/serve/service/20:32
fungimikal: yes, a thousand standard.large would do nicely ;)20:32
jeblairlifeless: so the actual blocking parts of the manager are the methods that do 'self.submitTask(something)'20:32
lifelessjeblair: I'm not sure I believe you. periodicCleanup->cleanupOneNode->deleteNode->manager.cleanupServer20:32
mikalfungi: this is actually a serious question... Would asking rackspace for more test node quota actually get you out of trouble?20:32
jeblairlifeless: cleanupServer as a whole is not blocking20:33
jeblairlifeless: there's no thread lock around it or anything20:33
lifelessjeblair: it won't return until the server is deleted20:33
fungimikal: i got the impression mordred was already asking rackspace for more quota, so might want to confirm with him (and reinforce as needed)20:33
*** ociuhandu has joined #openstack-infra20:33
jeblairlifeless: that is correct20:33
lifelessjeblair: because getServer does a wait on the task20:33
mordredjeblair: yay!20:33
*** gsamfira has quit IRC20:33
*** rfolco has quit IRC20:33
lifelessjeblair: periodicCleanup will be blocked20:33
jeblairmordred: don't be too happy20:33
mikalmordred: you chasing rackspace for more quota?20:33
mordredsorry , that should have been "yay, it's jeblair"20:33
mordredmikal: yes20:33
jeblairmordred: i'm quite sick20:33
mordredjeblair: oh no!20:33
mordredjeblair: you need me to bring you soup? I can do that now ...20:34
*** gokrokve has quit IRC20:34
fungijeblair: you brought something more than your luggage back from perth, i take it?20:34
lifelessjeblair: I *think* you might be saying 'node deletes when jobs finish will still be attempted' - and sure, I agree.20:34
jeblairmordred: thanks!  but i don't want you to get sick20:34
lifelessjeblair: I'm talking about making the whole set of cleanup things accommodate rax better20:34
*** gokrokve has joined #openstack-infra20:34
*** andreaf has joined #openstack-infra20:34
*** ociuhandu has quit IRC20:34
jeblairlifeless: so am i.20:34
lifelessjeblair: but I want to be sure I understand the code; and when you say 'wont be blocked' while I'm specifically talking about the periodic cleanup, I'm thoroughly confused.20:35
jeblairlifeless: oh yes, the periodic cleanup _will_ be blocked.20:35
*** gokrokve_ has joined #openstack-infra20:35
jeblairlifeless: you're quite right there, and i think you understand correctly.20:35
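To make the distinction they just settled on concrete: completion handlers get a thread per node, so their waits overlap, while the periodic cleanup walks leaked nodes in a single thread and waits on each one in turn. A schematic sketch only; the node and provider methods here are placeholders, not nodepool's real API.

    import threading
    import time

    def cleanup_server(provider, node, timeout=600):
        """Ask the provider to delete node, then poll until it is gone."""
        provider.delete(node)
        deadline = time.time() + timeout
        while time.time() < deadline:
            if not provider.exists(node):
                node.delete_from_db()
                return
            time.sleep(2)
        raise Exception("Timed out deleting %s" % node)

    def on_node_complete(provider, node):
        # One thread per finished node: these waits happen in parallel.
        threading.Thread(target=cleanup_server, args=(provider, node)).start()

    def periodic_cleanup(provider, leaked_nodes):
        # Single thread: one slow provider delete holds up all the rest.
        for node in leaked_nodes:
            cleanup_server(provider, node)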
lifelessjeblair: so my point about this was that if we *stop waiting* in the nodecomplete handler20:35
lifelessjeblair: *and* stop waiting in the periodic cleanup20:35
lifelessjeblair: *then* we can retry across all the pending deletes more often20:36
*** malini has left #openstack-infra20:36
lifelessjeblair: without adding a raft of new threads or anything20:36
fungiwithout a raft, i'll never get off this island20:36
jeblairlifeless: if you don't wait at all then you only give the provider 1 second to delete a node before you ask it to again.20:36
fungithough palm trees might do a sight better than threads20:36
*** markwash has joined #openstack-infra20:36
lifelessjeblair: We do periodic deletes 1/ second ?20:36
jeblairlifeless: not at the moment20:37
jeblairlifeless: are you suggesting that we leave the periodic interval as-is, every 5 minutes?20:38
jeblairlifeless: then minimum time to delete a node will be 5 mins20:38
*** talluri has joined #openstack-infra20:38
lifelessjeblair: lets say we set it to 30 seconds20:38
lifelessjeblair: then if the cloud deletes on request, it will be deleted by nodecompletehandler20:38
jeblairlifeless: nodepool won't notice it until the next periodic run though since you aren't waiting for it20:39
lifelessjeblair: if the cloud doesn't delete it on the first request, up to 30 seconds later we will try from periodic, and every 30s thereafter20:39
*** gokrokve has quit IRC20:39
lifelessjeblair: I don't mean 'don't try' I mean 'don't block if it does not go away immediately.20:39
jeblairlifeless: it never goes away immediately20:39
jeblairlifeless: even the fastest cloud provider takes a little while (many seconds-minutes) to delete a node20:40
lifelesssure20:40
fungion a good day, novaclient reports my hpcloud vms gone after 10 seconds and rackspace after more like 6020:40
lifelessdo nodes in state DELETE count towards the max-servers count ?20:40
jeblairlifeless: yes20:40
lifelessah20:40
lifelessjeblair: so is 30 seconds a reasonable time to wait to find out if the cloud deleted the node ?20:40
jeblairlifeless: apparently 10 minutes isn't long enough20:41
lifelessjeblair: I know, but I'm looking at the nodepool state changes from what I'm proposing20:41
fungii don't think any rackspace deletes will work in a 30-second timeframe. maybe one on occasion, but unlikely20:41
lifelessthey seem to be that *if* a cloud reacts quickly, we change from finding out at 2/4/6/8 (iterate_timeout) seconds20:41
lifelessto finding out at 30+ seconds20:41
lifelessin fact, right now we do nodes in state DELETE /2 API checks a second20:42
lifelessso we could make the periodiccheck run every 2 seconds20:42
jeblairlifeless: i think there are two ways of fixing this: i propose that we adjust the parallel delete strategy to match current reality, you propose going to all-serial.20:42
lifelessand it would be the exact same API traffic20:42
*** gbrugnago has joined #openstack-infra20:42
*** dcramer_ has joined #openstack-infra20:42
lifelessjeblair: yes; though actually I wasn't intending to block there; I was more aiming at a centralised view20:43
lifelessjeblair: s/block/stop/20:43
lifelessjeblair: anyhow, now I understand more of the design - thanks - I can see why increasing the timeout will help - *as long as nodepool isn't restarted*20:43
lifelessjeblair: but when it's restarted everything will become dependent on the periodic cleanup, so I think making that much more effective is important20:44
*** NikitaKonovalov_ is now known as NikitaKonovalov20:45
fungiunder present volume, i've had to resort to ungracefully restarting nodepool and cleaning up the mess20:45
jeblairlifeless: agreed; perhaps adjusting the timeout for parallel operation and reducing it for periodic cleanup would be best20:45
jeblairs/adjusting/increasing/20:45
*** markmcclain has quit IRC20:46
lifelessjeblair: so, what about eliminating the timeout, going all serial as I proposed, but then introducing concurrency in the periodic cleanup - e.g. worker threads there to scatter-gather at some defined concurrency20:47
lifelessjeblair: this would get the same performance for live deletes and make after restart better too, without needing two different codepaths20:47
*** dcramer_ has quit IRC20:47
lifelessjeblair: oh, I just had a possible insight20:48
lifelessjeblair: one form of rate limiting is to discard requests that are over the threshold20:48
lifelessjeblair: how many nodes do we try to delete at once at peak ?20:48
*** derekh has joined #openstack-infra20:48
lifelessI'm guessing hundreds20:48
jeblairlifeless: yes. sometimes the entire quota.20:49
lifelessso what if our basically random api calls result in basically random things being actioned and the rest dropped20:49
lifelessbeing non-blocking-serial (e.g one api call to delete each server before we probe for any of them, then probe all once, then delete all once, in a loop)20:49
lifelesswould give *much* better behaviour with such rate limiters20:50
lifeless-> doctors20:50
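A sketch of the non-blocking-serial loop lifeless outlines just above: issue a delete for every pending server, then probe them all once, and repeat, rather than sitting in a long wait on any single node. The provider methods are placeholders and the 30-second poll interval is only an example.

    import time

    def drain_deletes(provider, nodes, poll_interval=30):
        """Delete a batch of nodes without blocking on any single one."""
        pending = set(nodes)
        while pending:
            for node in pending:
                provider.delete(node)        # re-ask; harmless if already underway
            time.sleep(poll_interval)
            for node in list(pending):
                if not provider.exists(node):
                    pending.discard(node)    # gone: stop tracking it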
mordredmikal: I have a thread going with pvo20:50
lifelessjeblair: I will prepare a patch after my dr visit so we can discuss code20:51
jeblairlifeless: i'm going to be semi-responsive for a while20:51
jeblairdue to illness and other schedule issues20:52
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox  https://review.openstack.org/6772120:52
jeblairfungi: is there anything else urgent i can help with?  otherwise i'm going to go sleep20:52
*** mrda has joined #openstack-infra20:53
fungijeblair: go to sleep20:53
russellbjeblair: hope you feel better soon!  health more important :)20:53
jeblairrussellb: thanks20:53
fungiwe're handling it. really most of the issues are volume+openstack bugs20:53
sdaguejeblair: yeh, hope you feel better soon20:54
jeblairsdague: thanks20:54
fungidefinitely. the sooner you're well, the more we'll get accomplished20:54
jeblairfungi: i don't think i'm well enough to go to utah, i'll try to join in by phone or something20:55
fungijeblair: my flight's through baltimore tomorrow, so there's every chance i could get stuck in maryland instead ;)20:55
jeblairbut hopefully that will give me a chance to get better and pitch in later this week, and hopefully still go to brussels20:55
mordredjeblair: ++20:55
mordredjeblair: and seriously- I'm sure you're covered, but let me know if I can be helpful20:56
*** yolanda has quit IRC20:56
jeblairmordred: cool, thanks20:56
*** rnirmal has quit IRC20:56
*** ociuhandu has joined #openstack-infra20:57
*** zanins has quit IRC20:57
*** aburaschi has quit IRC20:59
sdaguemordred: hey, tox question, because I'm abusing it for doing something unnatural20:59
sdagueis there an easy way to catch and pass a ^C through tox to the underlying thing it was running?21:00
mordredhrm21:00
mordredsdague: no idea21:00
*** DinaBelova is now known as DinaBelova_21:00
sdagueok, no big deal21:00
fungirussellb: sdague: the nova fix is getting nodes now21:02
russellbyeah just saw that21:02
fungi~1 hour to results21:02
*** dcramer_ has joined #openstack-infra21:03
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: objectify the gerrit event for our purposes  https://review.openstack.org/6794121:03
sdaguesweet21:03
sdaguefingers crossed21:03
sdagueif you watch on -qa you can see that 680 is coming back a lot21:04
*** ociuhandu has quit IRC21:05
*** CaptTofu has joined #openstack-infra21:05
sdaguenow lets hope it doesn't fail on one of the other races21:05
*** misskitty has joined #openstack-infra21:05
fungiclarkb: prelim results from 67911... http://paste.openstack.org/show/61596/ (seems to work as intended)21:07
fungiif there's any ipv6 ra monkeybusiness at that point in time, we should be able to identify it now21:08
fungi(...and knowing's half the battle)21:08
clarkb++21:09
fungionce check results come back, i say we just approve it into the gate normally and then can promote it or force-merge as necessary if the frequency increases substantially21:10
*** dprince has quit IRC21:10
*** max_lobur is now known as max_lobur_afk21:10
fungiotherwise just let the gate take its course21:10
clarkbsounds good21:11
fungii haven't see enough of these yet to suggest it's killing us21:11
fungiseen21:11
clarkbya21:11
*** jaypipes has quit IRC21:12
*** talluri has quit IRC21:14
*** misskitty has quit IRC21:14
openstackgerritDerek Higgins proposed a change to openstack-infra/config: Enable precise-backports on tripleo test nodes  https://review.openstack.org/6795821:16
*** gbrugnago has quit IRC21:17
*** kirukhin has joined #openstack-infra21:17
ewindischrussellb: it seems to me that vmware is only complying with the "group b" functional testing requirement on changes that affect their driver directly... is that okay?21:17
*** dcramer_ has quit IRC21:17
*** senk has joined #openstack-infra21:17
*** salv-orlando has quit IRC21:18
*** salv-orlando has joined #openstack-infra21:18
*** smarcet has left #openstack-infra21:21
dansmithewindisch: have you read the guidelines?21:24
ewindischdansmith: which? I've read https://wiki.openstack.org/wiki/HypervisorSupportMatrix21:24
dansmithewindisch: https://wiki.openstack.org/wiki/HypervisorSupportMatrix/DeprecationPlan21:24
dansmithewindisch: and the bit on the matrix page says "group c will  be deprecated"21:25
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: objectify the gerrit event for our purposes  https://review.openstack.org/6794121:25
ewindischdansmith: yes, I know that... which is why I'm trying to get into group B ;-)21:25
ewindischdansmith: I need to re-review the click-through for DeprecationPlan21:25
russellbright, A and B are fine21:26
russellbi expect most to end up in B21:26
dansmith(for now)21:26
russellbA is ideal21:27
russellbB acceptable21:27
ewindischrussellb / dansmith: the problem is that running our own gating infrastructure for every change is quite an undertaking. I had originally thought this could run in upstream CI21:27
*** CaptTofu has quit IRC21:28
dimsewindisch, the folks working on the vmware driver are on #openstack-vmware channel if you have questions for them as well - fyi21:28
dansmithewindisch: yeah, that's why most people got started early21:28
ewindischrussellb: well, it sounds like A -- which is what I'd prefer to implement -- is not acceptable to the openstack-infra team, based on conversations earlier today21:28
russellbyeah, this has been set since before the driver was merged21:28
russellbwell ... it's just that the timing is bad21:28
dansmithewindisch: most of them can't run in upstream infra, so you have a major advantage that you can even do that21:28
ewindischdims: the question was more to russell, "does vmware qualify as B considering it doesn't run on every proposed change to nova"?21:28
russellbit *will* be running on every change21:29
dansmithewindisch: they're working on that21:29
sdagueewindisch: you can't come to infra at i2 and ask for implementing additional hypervisor in upstream ci21:29
russellbthat's their plan21:29
ewindischrussellb: gotcha21:29
sdagueif we'd had a session at icehouse summit, it would be something worth discussing21:29
sdaguewhich is why I said -1, bring to juno summit21:29
russellbfwiw, supporting docker in existing CI is way easier than anything else21:29
*** dcramer_ has joined #openstack-infra21:29
sdaguerussellb: agreed21:30
ewindischrussellb: agreed.21:30
ewindischsdague: is it about human resources or hardware resources?21:30
russellbbut yeah, have to be sensitive to infra priorities based on the status of things21:30
sdagueewindisch: right now, both21:30
dimsewindisch, right. i was responding to "running our own gating infrastructure". you can get an idea from them if you wanted to :)21:31
ewindischdims: ah21:31
*** jhesketh_ has joined #openstack-infra21:33
openstackgerritAndreas Jaeger proposed a change to openstack-infra/config: Early abort documentation builds  https://review.openstack.org/6772221:33
*** ruhe is now known as _ruhe21:34
ewindischsdague: I've worked on gate stuff before, I don't know if it will require that much human capital besides my own effort and perhaps some inquiries here on irc -- but I could be wrong.21:34
ewindischsdague: hardware is something we might be able to help with, TBD21:35
mattoliveraulifeless: In regards to speeding up the cleaning up/deleting of nodes: I don't know if this is possible yet, I've started playing, but what if we only have to build servers once (each day). That is, build a server with a main LXC container using the prepare_node.sh etc. Then every time we need a new "server" for running a test/build, the create is as simple as creating an ephemeral LXC container (of21:35
mattoliverauan existing one). This is a container that only lasts until it's turned off... so run the tests, and then the delete and clean up of a node is as simple as stopping a container. Containers run almost as fast as the machine they run on as they use the same kernel. So as long as the tests/devstack can run inside one of course, so I could be missing something here, but wouldn't this speed up21:35
mattoliverausubsequent rebuilds and deletes of each node. Just my 2 cents. But again I'm new to the project and have a huge gap in my knowledge on the environment etc.21:35
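For readers unfamiliar with the ephemeral-container idea mattoliverau is floating, here is a minimal sketch, assuming Ubuntu's lxc-start-ephemeral tool of that era (it boots an overlay-backed throwaway clone of a prepared base container); the base container name is hypothetical:

    import subprocess

    BASE = "devstack-precise-base"  # hypothetical container prepared once per day

    def spawn_test_node(name):
        # the ephemeral clone shares the base rootfs via an overlay and
        # disappears when it is stopped
        subprocess.check_call(
            ["lxc-start-ephemeral", "-o", BASE, "-n", name, "-d"])

    def delete_test_node(name):
        # "deleting" is just stopping the container; nothing to scrub or poll
        subprocess.check_call(["lxc-stop", "-n", name])

As the replies below note, the catch is that parts of what devstack exercises are not namespaced, so a container cannot fully isolate the test workload.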
*** yamahata has quit IRC21:36
ewindischsdague: one of my concerns is that pulling from the gerrit eventstream, we don't get the advantages of things like zuul and "speculative testing" that are done upstream.21:36
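A minimal sketch of what consuming the Gerrit event stream looks like for a third-party CI, assuming the standard stream-events ssh interface; the account name and run_docker_job are illustrative. A plain consumer like this sees each patch set in isolation, which is exactly the speculative cross-project testing zuul adds and that ewindisch is pointing at:

    import json
    import subprocess

    def run_docker_job(change_number, ref):
        # hypothetical placeholder: fetch the ref, run the docker driver's
        # tempest subset, and report back with `gerrit review`
        print("would test change %s at %s" % (change_number, ref))

    # Gerrit's ssh event stream is the usual third-party CI entry point
    proc = subprocess.Popen(
        ["ssh", "-p", "29418", "third-party-ci@review.openstack.org",
         "gerrit", "stream-events"],
        stdout=subprocess.PIPE)

    for line in iter(proc.stdout.readline, b""):
        event = json.loads(line.decode("utf-8"))
        if event.get("type") != "patchset-created":
            continue
        if event["change"]["project"] != "openstack/nova":
            continue
        run_docker_job(event["change"]["number"], event["patchSet"]["ref"])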
clarkbmattoliverau: containers are insufficient for our needs. pleia2 has a list of issues iirc21:36
*** nati_ueno has joined #openstack-infra21:37
sdagueewindisch: you only need to vote on check21:37
*** jhesketh__ has joined #openstack-infra21:37
clarkbthere are things that are not namespaced that openstack touches21:37
mattoliverauclarkb: ok, just a thought :)21:37
sdagueand I agree, it's not quite the same21:37
jhesketh__Morning21:37
clarkbmattoliverau: we wish they would work :)21:37
sdagueewindisch: to be pragmatic. 1) don't plan on this happening in icehouse. 2) start working on how to do it in juno, get prelim work started now 3) be prepared to do summit session on it in Atlanta21:38
*** yamahata has joined #openstack-infra21:38
russellbsdague: it's really not that complicated ... not sure a summit session should block anything21:39
ewindischsdague: meanwhile, unless we invest in external infrastructure, our driver is removed from Nova21:39
sdaguerussellb: it's a socializing thing about what the new matrix looks like21:39
russellbwe can only socialize every 6 months?21:39
clarkbso I spoke to devananda about ironic testing too. can we have nova talk to libvirt qemu, ironic, and docker and run one test21:40
sdaguerussellb: there is *so* much to be done to get us to a functioning i3 at this point given the current gate, and there are very few people to get it done21:40
ewindischinvesting in external infrastructure is not only expensive, it distracts us from making progress on getting into "group A"21:40
mikalSo, as a data point turbo hipster runs on every nova commit and isn't _that_ big21:40
*** Shrews has quit IRC21:41
mikal(21 instances, about $2,000 a month in public cloud costs)21:41
russellbmikal: cool data point21:41
dansmithyeah, awesome21:41
mikalIt sometimes gets behind, but that's mostly when dansmith does a thing21:42
russellbsdague: yeah, i get that, it's kinda late to be getting started trying to get something running, given that infra is only going to get more busy21:42
fungiand reuses a lot of upstream ci tooling21:42
mikalAnd it catches up21:42
dansmith$6k per cycle for CI testing21:42
russellbwell ... 12k21:42
mikalBall park21:42
russellb6 months :-)21:42
*** Shrews has joined #openstack-infra21:42
mikalIf we catch one production db problem a cycle, then that's easily paid for itself21:42
dansmithrussellb: only if you feel the need for the numbers to be right21:42
russellbmikal: how long do your runs take?21:42
dansmithrussellb: er, yeah, 12 :)21:42
mikalHeh21:42
mikalUmmm, about 20 minutes...21:42
russellbwe're asking for a full tempest run21:43
mikalSo a _lot_ faster than infra's CI at this point21:43
russellbright21:43
mikalSo, what's a tempest run these days? An hour?21:43
russellbyeah21:43
dansmithcertainly a tempest docker run would be way faster than kvm, no?21:43
mikalSo I guess multiply those numbers by three21:43
russellbdansmith: yes21:43
russellbbecause a tempest config that works with docker would be a small subset21:43
mikalBut yeah, I would expect containers to be a lot faster to start than vms21:43
russellbthat too21:43
russellbbut also, docker driver supports a small subset of the API21:43
dansmithyeah, both of those things21:43
ewindischdansmith: yeah, and I've thought about doing "docker in docker" so we can avoid putting any of it into VMs at all (or gating multiple tests on a single VM in parallel)21:44
openstackgerritMatt Ray proposed a change to openstack-infra/config: Chef style testing enablement and minor speed cleanup starting w/block-storage  https://review.openstack.org/6796421:44
russellbewindisch: sure, whatever works ...21:44
russellbjust ... full tempest run on every patch :)21:45
sdagueanyway, I think with what's on the infra plate at this point, I think this is too late. Especially by a team that's not contributed to anything besides their corner of the world. So start helping on generic infra so we can free up some resources, and then it becomes part of the conversation21:45
russellbwhere "full" is a bit loose21:45
sdagueevery new feature has a cost, and i2 is the wrong place to be bringing this forward21:45
dansmithrussellb: well, I think the definition is "full, for everything you support, and show your config" :)21:46
russellbdansmith: yeah21:46
sdagueI'll let jeblair contradict me when he is well, but until then, I'll play bad cop :)21:46
dansmithI would think "we're not testing anything else until we can test what we already test" would be a reasonable answer until we get out of the current mess anyway, almost regardless of what it is21:47
russellb+1  to that21:48
ewindischsdague: we're a startup, so putting a team onto openstack-ci work is really a non-starter. I've personally worked with openstack-ci stuff in the past, admittedly only where it improved my "own little corner of the world", but I'm not entirely fresh on this.21:48
boris-42mikal russellb sdague sorry, probably off topic, but we're teaching Rally to do deployments at scale21:48
* russellb stares down the top nova change in the queue21:48
boris-42I mean in 30 minutes we got 128 compute nodes21:48
russellbboris-42: huh?21:48
boris-42russellb yep, we are working on Rally21:48
boris-42russellb a thing that makes benchmarking simple21:48
boris-42russellb so the latest result: simulating a compute node (running it in LXC) requires 150MB of RAM21:49
boris-42russellb and instead of deploying each one we are actually copy-pasting it21:49
boris-42russellb probably will be interesting for catching RabbitMQ/NovaNetwork/Scheduler/DB bottlenecks21:50
boris-42without having tons of resources21:50
boris-42and a lot of $$$21:50
sdagueewindisch: again, it's about timing. you can't show up at i2, when we are under huge strains in the existing system, and say "hey guys, I want you all to pivot out the test matrix and test our hypervisor"21:51
dansmithit's not like this nova requirement is new, or anything21:51
*** NikitaKonovalov is now known as NikitaKonovalov_21:52
mikalI think you could argue as well that our obligation to existing driver users is greater than our obligation to new drivers.21:52
fungirussellb: this is not the failure mode your new change is trying to fix, right? https://jenkins01.openstack.org/job/gate-grenade-dsvm/4786/consoleText21:52
mikalWe have a duty of care to the users we currently have21:52
*** jaypipes has joined #openstack-infra21:52
dansmithfungi: no21:53
fungiokay, good. because that cropped up with the proposed fix in place21:53
dansmithfungi: I think that's "the other one"21:53
ewindischsdague: I understand that. We're conflating two issues here of human and hardware resources. I acknowledge we might need to help with both, however.21:53
*** kirukhin has quit IRC21:53
dansmithfungi: i.e. switch the 8 and 021:53
fungidansmith: it's definitely a common one, because i've hit it on several changes today21:54
dansmithfungi: yar21:54
russellbyeah, not sure what that one is yet21:54
sdagueand we've got the other issue which is why would we play favorites on containers and pick docker instead of libvirt lxc21:54
ewindischsdague: presuming we could help with hardware, are the human-side strains still too hard?21:55
russellbsdague: well ... someone is actually trying to do the work for docker, heh21:55
sdaguewhich is why I think this is a summit conversation21:55
fungidansmith: ahh, yep, the cinderclient change behind it is also failing on that21:55
sdagueewindisch: yes21:55
*** _david_ has quit IRC21:55
sdaguethe infra team is massively strained at this point21:55
fungisdague: it's not *that* bad. i did actually sleep a few hours last night21:56
sdagueand we're probably going to need to do some heads down things to get the gate to a good state for i321:56
mikalsdague: don't forget Canonical's lxc specific driver, which has been in review for a while21:56
sdagueyep21:56
ewindischsdague / dansmith: and we haven't ignored those requirements-- Docker acknowledged that the gating work had to be done and resourced the effort -- which is in part what I've been hired to accomplish.21:57
portantesdague, fungi, clarkb: FWIW, I think you guys are doing a great job, and rely on your commitment and knowledge tremendously21:57
mikalsdague: there's at least three container options at the moment21:57
fungiportante: thanks!21:57
*** thuc has joined #openstack-infra21:57
sdagueportante: thanks21:57
russellbmikal: well ... 2 in tree21:58
russellbmikal: the other one didn't even have a blueprint last i saw it21:58
fungiewindisch: a related datapoint, note that there are a stack of changes proposed to support xenserver in upstream infra, started a while back, and still being hashed over21:58
russellbso, pretty far from even needing code review IMO21:58
sdaguemikal: right, which is why I said this is a summit conversation. Because I think containers in gate is a good idea, and I think it's a community conversation we should have. It's just not a now good idea.21:58
mikalrussellb: that's true, but it exists21:58
russellbfor some definition of exists21:58
russellbnot really relevant for this discussion of driver CI right now21:59
fungirussellb: sdague: IT LIVES21:59
sdaguefungi: sweet!21:59
russellbmerged?21:59
mikalI wonder how broken a tempest run with lxc containers turned on is?21:59
ewindischfungi: I'd have to look at those changes, but my perspective is that I'd target the docker gate to have no more impact than, say, adding a postgres gate as opposed to mysql21:59
fungirussellb: well, it *will* merge once zuul wakes up and processes the result it has there21:59
sdaguerussellb: passed everything21:59
russellbmikal: well first you'd have to come up with a tempest config that only hits what it supports21:59
russellbfungi: yay21:59
dansmithwoo!22:00
russellbnow, that other damn bug ...22:00
russellbmriedem: have you fixed it yet?  :-p22:00
sdaguesome times you do get the bear22:00
sdagueon a day like today, a win like that is a good one22:00
*** rnirmal has joined #openstack-infra22:00
fungiewindisch: right, their work involved needing separate test node configurations entirely (they have to reboot for new kernels and other stuff), so conceivably less involved for docker22:00
mriedemrussellb: nope, was thinking about pushing a test patch to increase the sleep in the libvirt volume module to see if it hits after a few rechecks, but i'm open to suggestion/help22:01
russellbmriedem: was mostly kidding of course :)P22:01
mriedemjsbryant said he looked at it a bit and nothing jumped out at him from the cinder changes22:01
ewindischfungi: we just need to install a userland package and run a daemon. We no longer have any special kernel requirements (there used to be a requirement on AUFS which required a newish vanilla kernel)22:01
fungisdague: well, it's a win, but it'll be the first change to merge through normal gating in 8 hours (per the openstack/openstack commit log)22:02
sdaguefungi: I'll take anything today22:02
ewindischfungi: the only special requirement we have right now is that our package isn't in precise-backports, only trusty (14.04)... I recognize it will be easier if we can use upstream ubuntu packages that work in Precise, so I'm pressing to get a package into precise-backports ASAP22:03
fungiewindisch: or ubuntu cloud archive for precise, assuming we can work out why it's still breaking tempest runs and nova unit tests22:03
*** nati_ueno has quit IRC22:04
ewindischfungi: at present, we have our own packages for precise that live in our own private repo (signed with our own key). I recognize that's troublesome in a few ways ;-)22:04
*** ArxCruz has quit IRC22:06
*** dizquierdo has joined #openstack-infra22:06
fungiewindisch: yes, i know you definitely understand that ;)22:07
*** beagles has quit IRC22:07
*** ArxCruz has joined #openstack-infra22:09
*** marun has quit IRC22:14
*** nati_ueno has joined #openstack-infra22:14
*** nati_ueno has quit IRC22:14
dansmithrussellb: merged22:14
russellb\o/22:15
russellbgood thing every patch isn't that hard to land22:15
russellb... usually22:15
dansmithwhat's with the big gap in the failure rates graph on the e-r page?22:15
*** nati_ueno has joined #openstack-infra22:16
*** ewindisch is now known as zz_ewindisch22:16
*** zz_ewindisch is now known as ewindisch22:17
*** Ajaeger1 has quit IRC22:18
*** ewindisch is now known as zz_ewindisch22:20
*** zz_ewindisch is now known as ewindisch22:21
*** nati_ueno has quit IRC22:23
openstackgerritDavanum Srinivas (dims) proposed a change to openstack-infra/devstack-gate: Temporary HACK : Enable UCA  https://review.openstack.org/6756422:26
*** jerryz has joined #openstack-infra22:27
*** yamahata has quit IRC22:29
*** michchap has quit IRC22:30
*** michchap has joined #openstack-infra22:31
*** senk has quit IRC22:32
jerryzfungi: ping22:32
*** dcramer_ has quit IRC22:33
*** thomasem has quit IRC22:33
fungijerryz: hi there22:34
*** nati_ueno has joined #openstack-infra22:35
*** jcoufal has quit IRC22:36
*** sandywalsh has joined #openstack-infra22:36
jerryzfungi: i have a question about third party testing. if a gerrit trigger is configured for a project, will every single patchset-created event for the project trigger a third party test?22:37
fungijerryz: yes, in a normal configuration, it will22:37
jerryzfungi: even if the patch may not have anything to do with the plugin22:37
*** yamahata has joined #openstack-infra22:38
ewindischmikal: any idea how many patchsets per day on nova?22:38
*** dims has quit IRC22:39
mikalAbout 100 last I looked22:39
ewindischthanks22:39
mikalObviously around deadlines that spikes22:39
*** thuc has quit IRC22:39
fungijerryz: i'm not familiar enough with the gerrit-trigger plugin for jenkins to know whether it can filter on changes matching only specific file patterns. but as far as whether the desired result is to test on every patch, that's more of a question for the ptl who's insisting on test results (i don't know whether requirements are differing between nova, neutron and cinder driver testing)22:39
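One hedged way to approximate the file filtering jerryz is asking about, without relying on gerrit-trigger plugin features, is to query Gerrit for the files a patch set touches and skip the job when none match; the field names below follow the Gerrit 2.x ssh query output as best as can be recalled, and the account and path prefix are illustrative:

    import json
    import subprocess

    def changed_files(change_number):
        # --files needs --current-patch-set (or --patch-sets) in the ssh query API
        out = subprocess.check_output(
            ["ssh", "-p", "29418", "third-party-ci@review.openstack.org",
             "gerrit", "query", "--format=JSON", "--files",
             "--current-patch-set", "change:%s" % change_number])
        change = json.loads(out.splitlines()[0].decode("utf-8"))
        return [f["file"] for f in change["currentPatchSet"].get("files", [])]

    def touches_driver(change_number, prefix="nova/virt/docker/"):
        # skip the third-party run when the patch never touches the driver tree
        return any(path.startswith(prefix) for path in changed_files(change_number))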
*** thuc has joined #openstack-infra22:40
*** thuc has quit IRC22:40
*** dizquierdo has quit IRC22:40
jerryzfungi: do you know what the final decision is on whether to keep the third party testing +1 privilege enabled?22:43
*** senk1 has joined #openstack-infra22:43
mikalewindisch: I am lying it seems, it's closer to 20022:43
* mikal is making a graph now22:43
jerryzfungi: if a third party testing account will post a vote on every patch to the project, that would indeed require the third party ci infra to be stable.22:44
*** dkranz has quit IRC22:44
ewindischmikal: thanks22:44
mriedemewindisch: http://russellbryant.net/openstack-stats/nova-reviewers-30.txt22:44
mriedemNew patch sets in the last 30 days: 2564 (85.5/day)22:44
jerryzfungi: i mean -1 privilege22:45
*** dcramer_ has joined #openstack-infra22:46
fungijerryz: it's mostly consensus from the project it's voting on. there are some clarifications to the guidelines being proposed at https://review.openstack.org/6347822:47
*** jasondotstar has joined #openstack-infra22:50
*** carl_baldwin has joined #openstack-infra22:50
*** nati_ueno has quit IRC22:52
*** nati_ueno has joined #openstack-infra22:54
*** dims has joined #openstack-infra22:54
lifelessok back22:57
lifelessfungi: clarkb: where are we at with exhaustion ?22:57
sdaguedansmith: on top? graphite fell over22:58
fungilifeless: i've reverted to running my manual auxiliary nodepool delete loops from the cli to keep the stale deletes minimized22:58
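Roughly what such a manual loop amounts to, sketched around the nodepool CLI (`nodepool list` / `nodepool delete`); the column positions in the list output are an assumption and would need adjusting:

    import subprocess
    import time

    ID_COL, STATE_COL = 1, 10   # assumed positions in the `nodepool list` table

    while True:
        # re-issue deletes for anything stuck in the "delete" state
        for line in subprocess.check_output(["nodepool", "list"]).splitlines():
            fields = [f.strip() for f in line.decode("utf-8").split("|")]
            if len(fields) > STATE_COL and fields[STATE_COL] == "delete":
                subprocess.call(["nodepool", "delete", fields[ID_COL]])
        time.sleep(60)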
dansmithsdague: ah, okay22:59
sdaguebecause, you know, we didn't have enough things breaking today :)22:59
fungii missed the graphite outage. who wound up fixing that?23:00
dansmithsdague: just sucks to be able to see the change, if any, from the recent merge, which is why I was asking23:00
*** jasondotstar has quit IRC23:00
sdagueyeh23:01
sdaguehonestly, it takes a while to build up data anyway23:01
*** carl_baldwin has quit IRC23:01
sdagueand I'm less trusting of the graphite numbers after I found that some of our interrupts get reported as fails23:02
fungithat's something i think would have to be addressed in jenkins itself too23:02
*** carl_baldwin has joined #openstack-infra23:02
*** nati_ueno has quit IRC23:03
lifelessfungi: ahahahaha23:04
lifelessfungi: I found a 15m latency on periodic cleanup as well23:04
fungilifeless: ooh!23:04
*** nati_ueno has joined #openstack-infra23:04
sdaguelifeless: nice23:04
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately.  https://review.openstack.org/6797923:05
lifelessI may be misunderstanding state_time23:06
*** mrodden has quit IRC23:07
lifelessactually, I think that code block is entirely broken23:07
*** miqui has joined #openstack-infra23:07
* lifeless revisits23:07
lifelessyeah, it's missing a now -23:08
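A reconstruction of the guard lifeless is describing, with approximate names rather than the actual nodepool source: the early exit compared the raw state_time timestamp against the cleanup threshold, so subtracting the current time is what makes it mean "changed state recently"; the DELETE special case reflects his companion patch to clean up DELETE-state nodes immediately:

    import time

    NODE_CLEANUP = 8 * 60 * 60   # assumed threshold; the real value lives in nodepool
    DELETE = "delete"            # stand-in for nodedb.DELETE

    def recently_touched(state, state_time):
        # broken form: `state_time < NODE_CLEANUP`, an absolute epoch timestamp
        # compared against a duration, which never means "recently changed"
        if state == DELETE:
            return False   # nodes already marked for deletion get cleaned up right away
        return (time.time() - state_time) < NODE_CLEANUP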
*** senk1 has quit IRC23:09
*** miqui has quit IRC23:09
*** miqui has joined #openstack-infra23:09
*** miqui has quit IRC23:10
lifelessthere23:11
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Fix early-exit on recently-set-state in deleteNode  https://review.openstack.org/6798023:11
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately.  https://review.openstack.org/6797923:11
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately.  https://review.openstack.org/6797923:12
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Fix early-exit in cleanupOneNode  https://review.openstack.org/6798023:12
lifelesssorry for spam :)23:13
fungigrrr... my flight tomorrow just got cancelled23:13
*** sarob has joined #openstack-infra23:13
russellbfungi: :(  weather?23:16
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Log how long nodes have been in DELETE state.  https://review.openstack.org/6798223:17
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Consolidate duplicate logging messages.  https://review.openstack.org/6798323:17
*** dcramer_ has quit IRC23:20
fungirussellb: yeah, my layover was going to be in baltimore, which is now on lockdown for tomorrow (noaa/nws winter storm warning all day)23:21
russellbbummer23:21
fungijust rebooked through vegas instead, but i think the long leg will end up being without wifi as a result23:22
*** sarob has quit IRC23:22
russellbvegas is a good choice.  much worse places to be stuck than there, just in case23:22
clarkbfungi: :( its ok I will be back to full focus tomorrow23:23
fungiyeah, i figured it's slightly less likely to get buried under ice and snow23:23
fungiclarkb: yay!23:23
russellbi'm out for today ... on the volume bug, only candidate we have is https://review.openstack.org/#/c/67973/23:23
russellbjust going to watch that through some rechecks while we keep digging23:23
fungirussellb: thanks for the heads up23:24
russellbthat's https://bugs.launchpad.net/nova/+bug/127060823:24
jgriffithrussellb: agreed23:24
russellbjgriffith: mriedem thanks again23:25
mriedemnp, fun first day back :)23:25
sdagueclarkb: if you have a little focus now, the config change with the e-r uncategorized list would be handy to help us figure out what other unknown bugs are in the reset pile23:26
sdagueit was very good gamification for jog0 to try to drive up our classification rate23:27
*** eharney has quit IRC23:28
*** jamielennox|away is now known as jamielennox23:29
*** derekh has quit IRC23:30
*** dcramer_ has joined #openstack-infra23:32
*** gokrokve_ has quit IRC23:33
*** gokrokve has joined #openstack-infra23:34
lifelessfungi: cron timing23:36
lifelessfungi: in nodepool23:36
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Make cleanupServer optionally nonblocking.  https://review.openstack.org/6798523:37
*** gokrokve has quit IRC23:38
openstackgerritlifeless proposed a change to openstack-infra/config: Cleanup old servers every minute.  https://review.openstack.org/6798623:39
lifelessfungi: clarkb: would love https://review.openstack.org/#/c/67685 to be reviewed please23:39
*** carl_baldwin has quit IRC23:39
lifelessjeblair: I've pushed a stack that will do what I propose to nodepool; I'm giving it a basic test now23:40
jog0wow 3 patches in openstack/openstack in 8 hours :/23:49
*** rcleere has quit IRC23:53
lifelessyah, messed up23:53
lifelessdid you see jay's note that passlib isn't installed properly?23:53
*** rcleere has joined #openstack-infra23:53
lifeless> https://review.openstack.org/#/c/66670/23:54
lifelessThat second patch has the gate-tempest-dsvm-neutron-isolated job failing23:54
lifelesstrying to run keystone-manage pki-setup:23:54
lifelessImportError: No module named passlib.hash23:54
*** reed has joined #openstack-infra23:55
*** rcleere has quit IRC23:58
lifelessjog0: ^23:59
jog0lifeless: I did23:59
