Wednesday, 2013-11-20

jeblairbecause of the rarity, i think filing bugs is ok, even if we don't end up doing anything with some of them because they are not ultimately actionable.00:00
jeblairit fits into the data collection strategy we have otherwise, so does not need an exception to the processes we're discussing00:01
clarkbmordred: are we using wheels yet or are they just in the mirror so that we are ready for them?00:01
clarkbmordred: the libvirt thing mikal is looking at started on the 15th and that appears to be when the mirror started doing wheels00:01
openstackgerritA change was merged to openstack-dev/hacking: Add a check for newline after docstring summary  https://review.openstack.org/5564400:04
fungianother fun issue causing gate resets... nettron py26 unit tests taking over an hour https://jenkins01.openstack.org/job/gate-neutron-python26/3161/console00:07
fungis/nettron/neutron/00:07
fungirobotron 208400:07
fungilooks like subunit2html.py got killed after 12 minutes of processing the subunit log00:09
clarkbfungi: :/ we made that go faster but I think their log files are too gigantic00:09
*** CaptTofu has joined #openstack-infra00:11
*** zaro0508 has joined #openstack-infra00:14
*** vipul-away is now known as vipul00:16
*** xeyed4good has quit IRC00:16
*** harlowja has quit IRC00:16
*** thomasem has joined #openstack-infra00:17
*** xeyed4good has joined #openstack-infra00:17
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add query for bug 1252514  https://review.openstack.org/5707000:17
uvirtbotLaunchpad bug 1252514 in swift "glance doesn't recover if Swift returns an error" [Undecided,New] https://launchpad.net/bugs/125251400:17
*** zaro0508 has quit IRC00:20
*** zaro0508 has joined #openstack-infra00:20
*** zaro0508 has joined #openstack-infra00:21
*** dkliban has joined #openstack-infra00:21
*** wenlock has quit IRC00:22
*** hogepodge has quit IRC00:23
*** thomasem has quit IRC00:26
*** dkliban has quit IRC00:27
*** matsuhashi has joined #openstack-infra00:29
*** senk has joined #openstack-infra00:30
*** pcrews has quit IRC00:31
*** dkranz has joined #openstack-infra00:34
*** loq_mac has joined #openstack-infra00:35
*** mrodden has quit IRC00:36
*** loq_mac has quit IRC00:37
*** dcramer_ has joined #openstack-infra00:39
*** MarkAtwood has quit IRC00:41
*** CaptTofu has quit IRC00:42
*** CaptTofu has joined #openstack-infra00:44
*** matsuhashi has quit IRC00:48
*** matsuhashi has joined #openstack-infra00:49
*** mrodden has joined #openstack-infra00:50
*** matsuhashi has quit IRC00:53
*** alchen99 has quit IRC00:54
*** jcooley_ has quit IRC00:55
clarkbI am going to delete the 7 nodes I held with nodepool that were not failures now00:55
*** jcooley_ has joined #openstack-infra00:55
*** alchen99 has joined #openstack-infra00:57
*** alchen99 has quit IRC00:59
*** alexpilotti has quit IRC00:59
*** Ryan_Lane has quit IRC00:59
*** Ryan_Lane has joined #openstack-infra00:59
*** alchen99 has joined #openstack-infra00:59
*** jcooley_ has quit IRC01:00
*** reed has quit IRC01:02
*** senk has quit IRC01:02
*** oubiwann has joined #openstack-infra01:07
*** matsuhashi has joined #openstack-infra01:08
*** dcramer_ has quit IRC01:08
*** nosnos has joined #openstack-infra01:09
fungiclarkb: jeblair: we seem to have grown a crust of ~100 nodepool nodes perpetually in a deleted state since earlier today. i'm guessing the periodic cleanup thread is maybe deadlocked again? should i get a thread dump and restart nodepool?01:18
jeblairfungi: oh excellent; please do01:18
clarkbfungi: ++01:18
fungion it01:19
*** dcramer_ has joined #openstack-infra01:20
*** sarob has quit IRC01:22
fungiclarkb: jeblair: trimming the stack dump out of the debug log, it's still nearly 5k lines... find it at nodepool:~fungi/stack_dump.log01:28
fungirestarting nodepool now01:28
*** svarnau has quit IRC01:29
*** wenlock has joined #openstack-infra01:29
fungiself.periodicCleanup(session) is on line 113801:30
*** senk has joined #openstack-infra01:31
fungiam i reading that correctly that it's blocked on ListFloatingIPsTask()?01:31
*** markwash has quit IRC01:31
*** senk has joined #openstack-infra01:32
jeblairfungi: what's the name of the thread?01:32
fungithough looks like there are other threads also looping in a wait01:32
fungiThread: Thread-12214 (140153506952960)01:33
jeblairfungi: that appears to be the case01:33
fungii see about half a dozen threads that might be similarly blocking on that call01:34
jeblairfungi: 1701:34
*** matsuhashi has quit IRC01:34
fungiyeah, i lowballed. grep -c next time ;)01:34
*** matsuhashi has joined #openstack-infra01:35
*** wenlock has quit IRC01:35
fungiso i guess there are clearly times where that does not return. should we be wrapping that call in a timeout?01:35
*** jamesmcarthur has joined #openstack-infra01:36
*** matsuhashi has quit IRC01:40
*** matsuhashi has joined #openstack-infra01:43
jeblairfungi: i don't think it was hung01:43
jeblairfungi: i think the problem is that periodic cleanup exceptions on a node stop the thread01:43
jeblairfungi: grep "periodic cleanup" /var/log/nodepool/debug.log01:44
*** noorul has left #openstack-infra01:45
fungiInvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: UPDATE statement on table 'node' expected to update 1 row(s); 0 were matched.01:46
funginice01:46
clarkbso the cleanup thread dies? (eating dinner so mostly afk)01:46
jeblairclarkb: exits01:46
jeblairfungi: i think it's racing with the normal delete threads01:47
*** alchen99 has quit IRC01:49
jeblairperiodic should probably only delete a node in the delete state if it's been in that state for at least 15 mins01:49
fungifair, it should realistically never take that long for the normal cleanup to run i guess01:50
jeblairfungi: it times out after 10 mins01:50
jeblairwe might actually just want to do that first before changing the exception handler, to flush out any similar bugs01:51
*** harlowja has joined #openstack-infra01:51
jeblairanyway, me -> dinner01:52
fungiahh, so it does... for count in iterate_timeout(600, "waiting for server %s deletion"01:52
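(For reference, a rough sketch of the staleness guard being discussed here: only letting the periodic cleanup reap nodes that have sat in the delete state longer than the normal delete path's timeout. The names are illustrative, not nodepool's actual code.)

    import time

    DELETE_GRACE_SECONDS = 15 * 60  # normal delete gives up after 10 minutes

    def should_periodic_delete(node, now=None):
        # node.state and node.state_time are assumed fields on the node
        # record; skip anything a regular delete thread is likely still
        # handling, to avoid the race described above
        now = time.time() if now is None else now
        if node.state != 'delete':
            return False
        return (now - node.state_time) >= DELETE_GRACE_SECONDS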
*** jcooley_ has joined #openstack-infra01:54
*** bingbu has joined #openstack-infra01:55
*** jaypipes has quit IRC01:58
*** sjing has joined #openstack-infra01:58
*** nati_ueno has joined #openstack-infra01:59
*** oubiwann has quit IRC02:03
lifelessttx: https://wiki.openstack.org/wiki/Governance/Foundation/TechnicalCommittee - 'spring' and 'fall' are meaningless terms. Can we change that to specify calendar months? Or hemispheres?02:03
*** yaguang has joined #openstack-infra02:04
*** alexpilotti has joined #openstack-infra02:06
*** sjing has quit IRC02:07
*** changbl has joined #openstack-infra02:09
*** sjing has joined #openstack-infra02:09
*** dolphm has joined #openstack-infra02:11
*** gyee has quit IRC02:13
*** metabro has quit IRC02:14
*** xeyed4good has quit IRC02:16
*** ilyashakhat has quit IRC02:17
*** changbl has quit IRC02:19
*** pcrews has joined #openstack-infra02:22
*** ogelbukh has quit IRC02:26
*** yamahata_ has joined #openstack-infra02:28
*** Ryan_Lane has quit IRC02:28
*** sarob has joined #openstack-infra02:29
*** david-lyle has quit IRC02:30
*** david-lyle has joined #openstack-infra02:31
*** yamahata_ has quit IRC02:33
*** mrodden has quit IRC02:34
*** senk has quit IRC02:38
*** sarob has quit IRC02:40
*** sarob has joined #openstack-infra02:42
*** jerryz has quit IRC02:42
*** sarob has quit IRC02:47
*** sarob has joined #openstack-infra02:48
*** yamahata_ has joined #openstack-infra02:50
*** mrodden has joined #openstack-infra02:50
*** senk has joined #openstack-infra02:51
*** sarob has quit IRC02:52
*** dcramer_ has quit IRC02:54
*** dolphm has quit IRC02:59
*** senk has quit IRC03:00
*** herndon_ has quit IRC03:02
*** changbl has joined #openstack-infra03:02
*** melwitt has quit IRC03:02
*** xeyed4good has joined #openstack-infra03:08
*** dkranz has quit IRC03:09
jog0holy crap: gate is 108 long03:11
jog0and check is 3003:12
*** nati_ueno has quit IRC03:14
*** nati_ueno has joined #openstack-infra03:16
*** D30 has joined #openstack-infra03:17
*** sileht has quit IRC03:17
clarkbjog0: its been that way all day03:17
*** nati_ueno has quit IRC03:20
*** sarob has joined #openstack-infra03:21
*** michchap has quit IRC03:21
*** michchap has joined #openstack-infra03:22
*** DennyZhang has joined #openstack-infra03:24
jog0clarkb: blames nova console log03:27
* jog0 blames ^03:27
notmynamejog0: I don't want to sound pessimistic, but did we have any patches pass jenkins today?03:29
*** matsuhashi has quit IRC03:31
notmyname108 jobs in the gate is 16 more than this morning. that's the wrong direction! ;-)03:31
*** matsuhashi has joined #openstack-infra03:31
notmynamejog0: do we need to stop triggering retries until things settle down?03:31
*** sjing has quit IRC03:32
*** sjing has joined #openstack-infra03:33
*** Ryan_Lane has joined #openstack-infra03:34
fungithere were definitely changes making it through. i saw post jobs (coverage, branch-tarball, et cetera) running for them from time to time03:34
fungispecifically for projects which are part of the integrated queue03:35
*** matsuhashi has quit IRC03:36
*** fifieldt has joined #openstack-infra03:36
notmynamefungi: should we hold off on doing rechecks?03:37
funginotmyname: i don't know that it would help any more than avoiding uploading or approving changes03:39
*** DennyZhang has quit IRC03:39
fungii just hope it gains some ground over the coming hours when activity is lower03:39
notmynamefungi: well, it would lower check jobs that simply failed but haven't been fully reviewed or aren't ready to merge yet03:39
fungithe check queue isn't really starving the gate03:40
notmynameok03:40
*** matsuhashi has joined #openstack-infra03:41
notmynamedo you know what the baseline was this morning? ie what number will be better or worse in the morning?03:41
notmyname*morning == in 10 hours for everyone03:41
jog0notmyname: you should sound pessimistic03:42
notmynameie with 70 jobs in the queue in 10 hours, will that be great or terrifying as to what the gate queue would look like in 24 hours from now?03:42
fungithroughput is being mostly limited by the increased nondeterminism in the jobs being run, which keeps restarting jobs for the changes queued behind them03:42
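(A back-of-the-envelope illustration of the compounding effect fungi describes; this is not zuul's actual scheduling logic, just the arithmetic of a dependent queue where any failure restarts the jobs for everything behind it.)

    def clean_window_probability(failure_rate, window):
        """Chance that `window` consecutive changes all pass their gate jobs."""
        return (1.0 - failure_rate) ** window

    for rate in (0.02, 0.10, 0.25):
        print('%2.0f%% per-change failure rate -> %5.1f%% chance 20 changes merge without a reset'
              % (rate * 100, clean_window_probability(rate, 20) * 100))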
notmynameor do we need 7 jobs in the queue in 10 hours?03:42
*** dstanek has quit IRC03:43
fungiwe had roughly 75 in the gate when i started working around 1400z03:43
fungiand we've lost ground by about 30 since03:44
notmynameor 50% if you want to be pessimistic like jog003:44
notmynameI gotta go for the night. I know you all are working hard on it. thanks03:45
fungii suspect that the best hope for speeding it up would be if more development effort could be focused on identifying and fixing the various sources of nondeterminism which have found their way in, rather than on unrelated development efforts03:48
fungisince the latter is adding fuel to the fire at the moment03:49
*** dstanek has joined #openstack-infra03:49
*** dstanek has quit IRC03:50
*** portante is now known as I-Am-Sam03:52
*** CaptTofu has quit IRC03:53
*** CaptTofu has joined #openstack-infra03:54
*** guohliu has joined #openstack-infra03:54
*** jcooley_ has quit IRC03:57
*** yamahata_ has quit IRC03:58
*** SergeyLukjanov has joined #openstack-infra04:01
*** I-Am-Sam is now known as portante04:02
*** boris-42 has joined #openstack-infra04:03
*** ljjjusti1 has quit IRC04:05
*** metabro has joined #openstack-infra04:10
*** dstanek has joined #openstack-infra04:14
openstackgerritKhai Do proposed a change to openstack-infra/pypi-mirror: add an export option  https://review.openstack.org/5734504:17
clarkblifeless we test openstack on libvirt 0.9.8 because cloud archive mongodb is broken (iirc this is why we don't use cloud archive)04:18
jog0notmyname: one of the bugs is swift config in devstack04:20
openstackgerritKhai Do proposed a change to openstack-infra/config: add nodepool to jenkins-dev server  https://review.openstack.org/5733304:21
jog0https://bugs.launchpad.net/bugs/125251404:21
uvirtbotLaunchpad bug 1252514 in swift "glance doesn't recover if Swift returns an error" [Undecided,New]04:21
*** mgagne has joined #openstack-infra04:22
*** mgagne has quit IRC04:22
*** mgagne has joined #openstack-infra04:22
*** DinaBelova has joined #openstack-infra04:23
*** ogelbukh has joined #openstack-infra04:24
lifelessclarkb: is there a bug open for that ?04:24
portantejog0 looking04:24
*** wenlock has joined #openstack-infra04:25
jog0portante: thanks04:25
clarkblifeless not sure jd_- was dealing with it04:26
clarkb*jd__04:26
openstackgerritJeremy Stanley proposed a change to openstack-infra/nodepool: Skip periodic cleanup if the node is not stale  https://review.openstack.org/5736404:26
* portante goes and pulls the glance code ...04:26
lifelessclarkb: because, LTS w/out cloud archive isn't a config I expect to be representative of deployments04:27
*** DennyZhang has joined #openstack-infra04:28
*** mgagne1 has joined #openstack-infra04:28
*** mgagne1 has quit IRC04:28
*** mgagne1 has joined #openstack-infra04:28
clarkbtotally agree, but cloud archive can't do an all in one install (or couldn't)04:29
lifelesscause of mongo?04:29
clarkbyup04:30
*** mgagne has quit IRC04:31
*** jcooley_ has joined #openstack-infra04:33
*** masayukig has joined #openstack-infra04:34
*** SergeyLukjanov has quit IRC04:37
*** jcooley_ has quit IRC04:38
portantejog0: okay, so it looks like the proxy-server configuration for swift on the devstack is set to 10 seconds, but the object server took around 43 seconds to create the object04:41
jog0portante: so what is the fix?04:41
jog0do we somehow make the object server faster? or just bump the timeout04:42
portanteso the tolerances for swift need to be loosened up a bit, it would seem04:42
jog0portante: ohh and awesome, thanks04:42
jog0want to propose a patch to devstack for this?04:42
portantepoint me at a repo and gerrit?04:42
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Remove queries for dead bugs  https://review.openstack.org/5736704:42
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Add doc on queries.yaml  https://review.openstack.org/5736804:42
portantenever done that before04:42
jog0portante: http://git.openstack.org/cgit/openstack-dev/devstack/04:43
* portante looks04:43
jog0portante: I think you want to look at lib/swift04:43
*** dcramer_ has joined #openstack-infra04:43
portantek04:44
jog0and look at the iniset commands04:44
*** arata has joined #openstack-infra04:46
*** sandywalsh has quit IRC04:49
*** markwash has joined #openstack-infra04:51
*** boris-42 has quit IRC04:53
*** yamahata_ has joined #openstack-infra04:54
*** boris-42 has joined #openstack-infra04:55
*** boris-42 has quit IRC04:55
*** dcramer_ has quit IRC04:55
*** boris-42 has joined #openstack-infra04:56
*** DennyZhang has quit IRC04:57
mordredclarkb: we should not be using wheels in any way yet04:58
clarkbmordred thanks I didn't think so04:59
mordredclarkb: also, even if we were, it shouldn't affect libvirt, since we don't pip install that04:59
mordred:(04:59
clarkbright but it may have affected $otherthing possibly05:00
portantejog0: see http://paste.openstack.org/show/53637/05:00
clarkbmordred: mikal and jog0 are closing in on the problem I think05:01
portantethe conn_timeout is all about how long it takes a connect() system call to return05:01
jog0portante: looks good to me05:01
*** sarob has quit IRC05:01
portante20 seconds might be too generous05:01
jog0gitreview that sucker05:01
jog0portante: you're the swift expert, your call05:01
*** sarob has joined #openstack-infra05:01
jog0clarkb: closing in, is a strong term05:01
portantenode_timeout is all about how long between read operations a node takes to respond to the proxy server05:01
*** nati_ueno has joined #openstack-infra05:02
portantejog0: telling you this so that you can adjust after I hit the sack05:02
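(A toy sketch, using plain sockets rather than swift's proxy code, of the distinction portante is drawing between the two settings; the timeout values are placeholders, not the ones in the proposed devstack change.)

    import socket

    CONN_TIMEOUT = 20   # bounds the TCP connect() itself
    NODE_TIMEOUT = 120  # bounds each wait for the backend to send more data

    def fetch(host, port, request_bytes):
        sock = socket.create_connection((host, port), timeout=CONN_TIMEOUT)
        sock.settimeout(NODE_TIMEOUT)  # applies to every recv() from here on
        try:
            sock.sendall(request_bytes)
            chunks = []
            while True:
                data = sock.recv(65536)  # raises socket.timeout if the node stalls
                if not data:
                    break
                chunks.append(data)
            return b''.join(chunks)
        finally:
            sock.close()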
clarkbjog0: I fully expect the problem to be gone tomorrow :)05:02
portanteI did not set this up to file via gerrit, so could you do that?05:02
portantejog0 ?05:03
clarkbportante I can do it in the morning if no one beats me to it05:03
jog0portante: sure05:03
portanteokay05:03
portantewhat is the load like on the host machines all this runs on?05:05
*** yamahata_ has quit IRC05:05
portanteclarkb, jog0?05:05
clarkbit can be a little high but not for long sustained periods of time05:05
portanteenough to kill one request05:05
clarkbpossibly05:06
portanteSo glance gets a 503 from swift and just gives up, which it should05:06
*** sarob has quit IRC05:06
portantebut swift is actually completing the request behind the scenes05:06
portantewe should get this behavior in a bug for the swift team to comment on05:07
jog0I figured glance should give up but wasn't 100% sure05:07
jog0portante: leave a comment on https://bugs.launchpad.net/glance/+bug/125251405:07
uvirtbotLaunchpad bug 1252514 in swift "glance doesn't recover if Swift returns an error" [Undecided,New]05:07
portantek05:07
*** xeyed4good has quit IRC05:08
*** dcramer_ has joined #openstack-infra05:08
*** sdake_ has joined #openstack-infra05:11
jog0portante: https://review.openstack.org/5737305:13
*** jcooley_ has joined #openstack-infra05:14
*** arata has quit IRC05:14
*** yamahata_ has joined #openstack-infra05:15
portantejog0: posted a +1 for that05:18
*** jcooley_ has quit IRC05:19
jog0portante: woot!05:21
jog0hooray teamwork05:22
portante+105:22
portanteship it05:22
portantelet's get one more gate job failure out of the way05:22
*** chandankumar has joined #openstack-infra05:22
portanteor perhaps even "set of failures" out of the way05:22
jog0sdague, other devstack cores ^05:27
*** sarob has joined #openstack-infra05:28
*** jamesmcarthur has quit IRC05:29
*** yamahata_ has quit IRC05:33
portantejog0: when will that make it into the gate jobs?05:36
jog0portante: when the devstack cores are around to +2 it and we can squeeze it through the gate05:38
jog0portante: raw numbers on gate issues http://paste.debian.net/66730/05:38
portantek, thanks, so the two big boys are not addressed by this, but it looks like this change will probably help out with two or three others05:40
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Add launchpad support to check_success  https://review.openstack.org/5737405:40
jog0portante: yeah about to send out an email with that data but annotated05:41
portantekewl, thanks05:42
* portante hits the sack05:42
*** mgagne1 is now known as mgagne05:44
sdaguejog0: just got back from dinner, looking05:45
*** jhesketh__ has quit IRC05:45
sdaguejog0: bash8 is going to fail you for that :)05:46
sdaguemr tabs man05:46
*** ljjjustin has joined #openstack-infra05:46
*** jhesketh__ has joined #openstack-infra05:47
*** DinaBelova has quit IRC05:49
*** SergeyLukjanov has joined #openstack-infra05:50
jog0sdague: just fixed05:51
*** harlowja has quit IRC05:52
sdagueyou didn't read my other comment though05:52
*** marun has joined #openstack-infra05:55
*** marun has quit IRC05:55
*** vipul has quit IRC05:56
*** mihgen has joined #openstack-infra05:56
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Add launchpad support to check_success  https://review.openstack.org/5737405:56
*** vipul has joined #openstack-infra05:56
*** sdake_ has quit IRC05:59
*** sdake_ has joined #openstack-infra05:59
*** sdake_ has quit IRC05:59
*** sdake_ has joined #openstack-infra05:59
*** nati_ueno has quit IRC06:03
*** senk has joined #openstack-infra06:03
*** nati_ueno has joined #openstack-infra06:04
*** michchap has quit IRC06:04
*** michchap has joined #openstack-infra06:05
*** matsuhashi has quit IRC06:08
*** matsuhashi has joined #openstack-infra06:08
*** davidhadas has joined #openstack-infra06:08
*** Ryan_Lane has quit IRC06:12
*** matsuhashi has quit IRC06:13
*** Ryan_Lane has joined #openstack-infra06:13
*** Ryan_Lane has joined #openstack-infra06:13
*** mestery has quit IRC06:14
*** sarob has quit IRC06:15
*** sarob has joined #openstack-infra06:15
*** mestery has joined #openstack-infra06:18
*** senk has quit IRC06:18
*** ljjjustin has quit IRC06:21
*** jcooley_ has joined #openstack-infra06:21
*** markwash has quit IRC06:23
*** yongli has joined #openstack-infra06:24
*** jcooley_ has quit IRC06:24
*** nosnos_ has joined #openstack-infra06:25
openstackgerritSergey Lukjanov proposed a change to openstack-infra/config: Setup devstack-gate tests for Savanna  https://review.openstack.org/5731706:25
*** senk has joined #openstack-infra06:25
*** nosnos has quit IRC06:29
*** matsuhashi has joined #openstack-infra06:29
*** senk has quit IRC06:30
*** vipul is now known as vipul-away06:34
*** jcooley_ has joined #openstack-infra06:35
*** jhesketh__ has quit IRC06:37
*** marun has joined #openstack-infra06:38
*** davidhadas has quit IRC06:42
*** sdake_ has quit IRC06:44
*** sarob has joined #openstack-infra06:47
*** afazekas has quit IRC06:50
*** dstanek has quit IRC06:51
*** jcooley_ has quit IRC06:51
*** SergeyLukjanov has quit IRC06:52
*** luisg has quit IRC06:54
*** cody-somerville has quit IRC06:55
*** odyssey4me has joined #openstack-infra06:59
*** sdake_ has joined #openstack-infra07:00
*** sarob has quit IRC07:03
*** jcooley_ has joined #openstack-infra07:03
*** nosnos_ has quit IRC07:05
*** sjing has quit IRC07:05
*** arata has joined #openstack-infra07:05
*** nosnos has joined #openstack-infra07:05
*** odyssey4me has quit IRC07:05
*** sjing has joined #openstack-infra07:06
*** vipul-away is now known as vipul07:07
*** mgagne1 has joined #openstack-infra07:08
*** mgagne1 has quit IRC07:08
*** mgagne1 has joined #openstack-infra07:08
*** mgagne has quit IRC07:09
*** jcooley_ has quit IRC07:10
*** ljjjustin has joined #openstack-infra07:11
*** marun has quit IRC07:13
*** marun has joined #openstack-infra07:14
*** denis_makogon has joined #openstack-infra07:21
*** matsuhashi has quit IRC07:22
*** matsuhashi has joined #openstack-infra07:23
*** yolanda has joined #openstack-infra07:25
*** afazekas_ has joined #openstack-infra07:29
*** matsuhas_ has joined #openstack-infra07:30
*** bingbu has quit IRC07:31
*** bingbu has joined #openstack-infra07:32
*** wenlock has quit IRC07:33
*** matsuhashi has quit IRC07:34
*** nsaje has joined #openstack-infra07:34
*** che-arne has quit IRC07:35
*** nsaje has quit IRC07:35
*** DinaBelova has joined #openstack-infra07:35
*** flaper87|afk is now known as flaper8707:36
*** mgagne1 has quit IRC07:45
*** davidhadas has joined #openstack-infra07:51
*** DinaBelova has quit IRC07:52
*** sileht has joined #openstack-infra07:55
*** sileht has quit IRC07:55
*** sileht_ has joined #openstack-infra07:55
*** sileht_ is now known as sileht07:56
*** sdake_ has quit IRC07:56
*** sarob has joined #openstack-infra07:59
*** sarob has quit IRC08:03
*** SergeyLukjanov has joined #openstack-infra08:04
*** mihgen has quit IRC08:05
*** marun has quit IRC08:05
*** marun has joined #openstack-infra08:05
*** dizquierdo has joined #openstack-infra08:06
*** xeyed4good has joined #openstack-infra08:08
*** jcooley_ has joined #openstack-infra08:11
*** osanchez has joined #openstack-infra08:12
*** xeyed4good has quit IRC08:12
*** nsaje has joined #openstack-infra08:14
*** Hefeweizen has quit IRC08:17
*** matsuhas_ has quit IRC08:18
*** matsuhashi has joined #openstack-infra08:19
*** fbo_away is now known as fbo08:19
*** DinaBelova has joined #openstack-infra08:20
*** matsuhashi has quit IRC08:24
*** boris-42 has quit IRC08:26
*** boris-42 has joined #openstack-infra08:28
openstackgerritA change was merged to openstack-dev/pbr: Ignore jenkins@openstack.org in authors building  https://review.openstack.org/5640708:29
*** matsuhashi has joined #openstack-infra08:31
*** resker has joined #openstack-infra08:36
*** arata has left #openstack-infra08:38
openstackgerritDavid Caro proposed a change to openstack-infra/jenkins-job-builder: Added config options to not overwrite jobs desc  https://review.openstack.org/5208008:38
*** esker has quit IRC08:39
*** mihgen has joined #openstack-infra08:41
*** hashar has joined #openstack-infra08:43
*** denis_makogon has quit IRC08:43
*** jcoufal has joined #openstack-infra08:44
*** jcooley_ has quit IRC08:45
*** shardy_afk is now known as shardy08:46
*** sarob has joined #openstack-infra08:47
*** boris-42 has quit IRC08:49
*** DinaBelova has quit IRC08:52
*** ljjjustin has quit IRC08:52
*** derekh has joined #openstack-infra08:54
*** guohliu has quit IRC08:58
ttxlifeless: about spring/fall, feel free to propose alternate wording (it's in openstack/governance:reference/charter) -- the trick is since elections happen a number of weeks before release, I wanted to stay fuzzy ("spring" = March-May, "fall" = September-November) rather than write month names in stone08:58
*** nati_ueno has quit IRC08:59
*** ilyashakhat has joined #openstack-infra08:59
*** DinaBelova has joined #openstack-infra08:59
lifelessttx: there you go.09:00
ttxif it fell on clear quarters we could have used Q2/Q4 but that's not really the case09:01
*** yassine has joined #openstack-infra09:06
*** jpich has joined #openstack-infra09:06
*** arata has joined #openstack-infra09:10
openstackgerritMarcus Nilsson proposed a change to openstack-infra/jenkins-job-builder: Added support for Stash Notifier  https://review.openstack.org/5633709:11
*** marun has quit IRC09:13
lifelessttx: sure, my main point was that fall and spring are relative terms09:14
*** zaro0508 has quit IRC09:14
*** zaro0508 has joined #openstack-infra09:14
lifelessttx: and it's hemispherist to assume they are northern without calling it out09:14
*** marun has joined #openstack-infra09:14
ttxI definitely am an hemispherist. I should get invited south more often09:16
lifelessttx: open invite here.09:16
lifelessttx: just bring a crate of nice French wine.09:16
*** sjing has quit IRC09:17
*** arata has quit IRC09:19
*** sarob has quit IRC09:21
*** alexpilotti has quit IRC09:23
*** bingbu has quit IRC09:25
*** talluri has joined #openstack-infra09:28
yolandamordred, jeblair, any update to the licensecheck jenkins bug?09:29
jcoufalhey, can we add #openstack-ux channel to the list of OpenStack IRCs? (https://wiki.openstack.org/wiki/IRC)09:29
*** afazekas_ is now known as afazekas09:34
*** Ryan_Lane has quit IRC09:34
*** D30 has quit IRC09:35
DinaBelovahello, guys! We had some strange thing with Jenkins tests yesterday and today the problem seems to be unsolved. On the same change pep8 may fail (with errors like 'module X is not a module. Import only modules') or may pass ok. Like for this change https://review.openstack.org/#/c/57106/ (it was merged finally): failed logs - http://logs.openstack.org/06/57106/2/check/gate-climate-pep8/ac52a11/console.html good ones -09:35
DinaBelova http://logs.openstack.org/06/57106/2/gate/gate-climate-pep8/44394c9/console.html09:35
DinaBelovaDo you have any idea what it may be connected with?09:36
*** pblaho has joined #openstack-infra09:38
*** dizquierdo has quit IRC09:39
*** D30 has joined #openstack-infra09:40
ogelbukhDinaBelova: gate seems to be unstable for last two days at least09:41
*** D30 has quit IRC09:41
*** jcooley_ has joined #openstack-infra09:41
DinaBelovaogelbukh, I've got it... But really I missed if there were any comments or other guys' complaints... So that's the known issue?09:43
jpichjcoufal: You should be able to edit the page yourself to add it, I don't think there are other requirements09:44
jcoufaljpich: okey, great, I just didn't know if there is need for some approval from openstack-infra team09:45
ogelbukhDinaBelova: here's some info on this http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html09:46
DinaBelovaogelbukh, thank you so much09:46
*** sarob has joined #openstack-infra09:47
*** jcooley_ has quit IRC09:47
*** davidhadas has quit IRC09:48
jpichjcoufal: They will probably request to register it under infra, which should be fine. We can ask about that in the afternoon when the infra chaps wake up :)09:48
jcoufaljpich: sure09:48
*** resker has quit IRC09:49
*** esker has joined #openstack-infra09:50
*** sarob has quit IRC09:51
*** odyssey4me has joined #openstack-infra09:52
*** esker has quit IRC09:54
*** talluri has quit IRC09:55
*** mattymo has joined #openstack-infra09:56
*** talluri has joined #openstack-infra09:56
*** masayukig has quit IRC09:58
*** Ryan_Lane has joined #openstack-infra10:04
*** davidhadas has joined #openstack-infra10:08
*** Ryan_Lane has quit IRC10:13
*** matsuhashi has quit IRC10:14
*** SergeyLukjanov is now known as _SergeyLukjanov10:16
*** _SergeyLukjanov has quit IRC10:17
*** SergeyLukjanov has joined #openstack-infra10:21
*** ruhe has joined #openstack-infra10:21
*** SergeyLukjanov is now known as _SergeyLukjanov10:24
*** SergeyLukjanov has joined #openstack-infra10:24
*** nsaje has quit IRC10:26
*** plomakin has quit IRC10:26
*** nsaje has joined #openstack-infra10:27
lifelessttx: do you happen to have a script to assess LP bug activity atm ?10:29
lifelessttx: see https://etherpad.openstack.org/p/nova-bug-triage for context10:29
*** talluri has quit IRC10:30
*** SergeyLukjanov has quit IRC10:30
ttxlifeless: the only time-based data I have would be http://status.openstack.org/bugday/10:30
ttx(recent bug activity)10:30
ttxlaunchpad is desperately dry when it comes to historical data, as you probably know10:31
ttxI may have other webnumbr's around though10:31
*** nsaje has quit IRC10:31
* ttx digs deeper10:32
ttxhttp://webnumbr.com/untouched-nova-bugs10:32
ttxhttp://webnumbr.com/open-nova-bugs10:33
ttxhttp://webnumbr.com/nova-bugfixes10:34
ttxlifeless: that all I have for nova ^10:34
lifelessttx: I think I'll refactor reviewstats to be unrecognisable10:38
lifelessand then feed in bug data as a source10:38
lifelesscause we all need a new data analytics framework10:38
ttxI'm interested in what you come up with.10:39
ttxthose workarounds above all query LP at regular intervals and try to build some historical data, but the queries are quite narrow10:40
*** lcestari has joined #openstack-infra10:41
*** DinaBelova has quit IRC10:41
*** marun has quit IRC10:42
*** jcooley_ has joined #openstack-infra10:43
hashar<rant>attempted to switch my Zuul setup to use the Gearman version, turns out the Jenkins gearman plugin has a very nasty bug :/ </rant>10:44
* hashar blames Zaro and jeblair :D10:44
lifelessttx: meh, I'll just query the s**t out of LP.10:46
lifelessttx: iteration 0, work but not be pretty.10:46
ttxlifeless: heh, I wonder how much % of LP total traffic can be traced back to people regularly querying it to work around its lack of historical data and graphs.10:46
*** osanchez is now known as OlivierSanchez10:46
*** sarob has joined #openstack-infra10:47
lifelessttx: a fairly large amount, but that traffic is also cheap to answer.10:47
lifelessttx: we had trouble when we had several thousand such scripts all running at once in the OEM team10:47
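(A minimal sketch of the sort of periodic Launchpad query being talked about, assuming launchpadlib is available; the consumer name and searchTasks filters are illustrative. Each run appends one data point, which is how the webnumbr-style history gets built since LP keeps none itself.)

    from datetime import datetime
    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_anonymously('bug-stats-sketch', 'production')
    nova = lp.projects['nova']

    open_bugs = nova.searchTasks(
        status=['New', 'Confirmed', 'Triaged', 'In Progress'])
    untouched = nova.searchTasks(status=['New'])

    # one CSV row per run; collect these over time to get the history
    print('%s,%d,%d' % (datetime.utcnow().isoformat(),
                        len(open_bugs), len(untouched)))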
mattymois anyone aware of an IRC bot that watches changes in Launchpad bugs, similar to our lovely gerritbot?10:49
*** ruhe has joined #openstack-infra10:50
*** sarob has quit IRC10:51
*** OlivierSanchez is now known as osanchez10:52
*** osanchez has quit IRC10:54
*** osanchez has joined #openstack-infra10:54
lifelessmattymo: I'm sure there are several10:57
lifelessmattymo: launchpad-users list would be a place to ask10:58
*** ruhe has quit IRC10:58
*** SergeyLukjanov has joined #openstack-infra10:59
*** dpyzhov has joined #openstack-infra10:59
ogelbukhlifeless: I guess mattymo means if there anything like that in infra11:00
dpyzhovhi. what happened with review.openstack.org? Reviews with +2 are hanging unmerged11:01
dpyzhovAny estimates for fix?11:01
*** DinaBelova has joined #openstack-infra11:04
*** odyssey4me has quit IRC11:05
*** ruhe has joined #openstack-infra11:09
lifelessdpyzhov: see joe's email to -dev11:09
lifelessdpyzhov: bad tests/code in trunk -> flaky gate -> backlog11:09
*** Ryan_Lane has joined #openstack-infra11:10
mattymolifeless, this one? http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html11:11
*** sgran has joined #openstack-infra11:12
sgranhello.  I'm curious who I talk to about a tempest change11:12
*** sandywalsh has joined #openstack-infra11:12
ogelbukhmattymo: that one, yes11:12
sgranhttps://review.openstack.org/#/c/57311/ is the one I'm looking at11:12
mattymoit doesn't indicate that merging was going to be frozen for any project (including stackforge)11:13
*** yaguang has quit IRC11:13
*** odyssey4me has joined #openstack-infra11:14
lifelessmattymo: it's not frozen; its just very very slow because all the optimisations depend on a low gate failure rate.11:14
*** Ryan_Lane has quit IRC11:15
mattymonice11:16
*** mihgen has quit IRC11:16
*** jcooley_ has quit IRC11:17
*** pcm_ has joined #openstack-infra11:19
*** pcm_ has quit IRC11:23
*** pcm_ has joined #openstack-infra11:24
*** mihgen has joined #openstack-infra11:25
*** davidhadas has quit IRC11:35
*** mihgen_ has joined #openstack-infra11:41
*** mihgen has quit IRC11:45
*** mihgen_ is now known as mihgen11:45
*** sarob has joined #openstack-infra11:47
*** hashar_ has joined #openstack-infra11:48
*** hashar has quit IRC11:49
*** hashar_ is now known as hashar11:49
*** dstanek has joined #openstack-infra11:51
*** dstanek has quit IRC11:55
ekarlsowhat is the new local.conf thing in devstack ?11:59
*** yamahata_ has joined #openstack-infra12:00
*** davidhadas has joined #openstack-infra12:00
*** resker has joined #openstack-infra12:04
*** resker has quit IRC12:05
BobBallekarlso: it's an amalgamation of localrc and other changes you might want to make to nova.conf post-devstack installation12:08
BobBallyou can use the existing localrc if you want, or migrate it to the brave new world12:08
*** nosnos_ has joined #openstack-infra12:09
BobBallekarlso: http://devstack.org/localrc.html12:09
ekarlsooh12:10
ekarlsocool!12:10
*** odyssey4me has quit IRC12:11
*** Ryan_Lane has joined #openstack-infra12:11
*** nosnos has quit IRC12:12
*** jcooley_ has joined #openstack-infra12:13
*** nosnos_ has quit IRC12:14
*** Ryan_Lane has quit IRC12:16
*** nsaje has joined #openstack-infra12:16
*** odyssey4me has joined #openstack-infra12:19
*** jcooley_ has quit IRC12:19
*** sarob has quit IRC12:20
*** ruhe has quit IRC12:21
*** jamesmcarthur has joined #openstack-infra12:23
*** nsaje has quit IRC12:23
*** nsaje has joined #openstack-infra12:23
*** marun has joined #openstack-infra12:24
*** michchap_ has joined #openstack-infra12:25
*** michchap has quit IRC12:25
*** ruhe has joined #openstack-infra12:26
*** chuck__ is now known as zul12:26
zulgates are backed up i guess?12:27
*** ruhe has quit IRC12:27
*** nsaje has quit IRC12:28
*** boris-42 has joined #openstack-infra12:30
*** ruhe has joined #openstack-infra12:31
*** sarob has joined #openstack-infra12:32
*** marun has quit IRC12:34
*** marun has joined #openstack-infra12:34
*** sarob has quit IRC12:37
*** johnthetubaguy has joined #openstack-infra12:45
*** davidhadas has quit IRC12:47
*** sarob has joined #openstack-infra12:47
*** pcm__ has joined #openstack-infra12:49
*** sarob has quit IRC12:51
*** pcm_ has quit IRC12:52
*** AJaeger has joined #openstack-infra12:53
*** pcm__ has quit IRC12:56
*** dprince has joined #openstack-infra12:57
*** changbl has quit IRC12:57
*** pcm_ has joined #openstack-infra12:57
*** hashar has quit IRC12:58
*** jamesmcarthur has quit IRC13:01
BobBallI think it's a bit worse than backed up - but I'm not sure13:02
BobBallgate should be given priority over check - but only a few gate jobs are running, but loads of checks are13:02
BobBallhttps://review.openstack.org/#/c/56065/ was approved yesterday but gate jobs still haven't started...  :)13:03
BobBallSounds like a broken gate to me!13:03
*** nsaje has joined #openstack-infra13:05
*** fifieldt has quit IRC13:07
*** ruhe has quit IRC13:08
*** hashar has joined #openstack-infra13:08
anteayahi BobBall yes gate is in a bad way13:09
BobBallPoor thing...13:09
anteayawe had a discussion about it last night, different options for what to do13:09
BobBallProbably having a huff13:09
anteayaI went to sleep before it all came to an end13:09
anteayawas about to read the backlog13:09
* BobBall reads the backlog too :)13:09
anteayaI think rather a perfect storm of many things13:10
anteayatrying to get them identified and deal with them effectively13:10
anteaya:D13:10
BobBalllooks nasty :)13:11
anteayayeah, 120 in the gate and only the top 7 patches have running jobs atm13:12
*** thomasem has joined #openstack-infra13:12
anteayait is13:12
*** Ryan_Lane has joined #openstack-infra13:12
anteayawith no clear and easy approach to fix13:12
BobBalland unfortunately fungi's last comment was that he hoped things improved with lower activity13:12
BobBallwhich they have not13:12
BobBallit's just got worse!13:12
anteayayes13:12
anteayaif you have time to add your thoughts, bug tracking expertise it would not go amiss today13:12
anteayamethinks the gate will be at the fore of activity again today13:13
BobBallso the problem is too many failures caused by bugs?13:13
*** dstanek has joined #openstack-infra13:13
anteayathat is a big part of it13:13
BobBallin the rechecks queue?13:13
anteayathe fact we lost jenkins01 very early Monday morning and haven't recovered since doesn't help either13:13
BobBallwell jenkins01 hasn't had a holiday for ages13:14
anteayajenkins01 is back online but the hiccup really set us back in terms of keeping up to the volume13:14
anteayayeah, and had a bit of a hard time Monday morning - too many processes running so the cron job failed and it reset its puppet.conf to a default13:15
anteayawhich set cert to undefined13:15
anteayait created and deleted nodes but ran no jobs until we got it back on track13:15
anteayathen clarkb and fungi had to go in and manually delete unattached nodes13:15
anteayathat was Monday13:15
*** jcooley_ has joined #openstack-infra13:15
BobBallblimey - as you say - a perfect storm13:15
anteayayes13:16
*** Ryan_Lane has quit IRC13:16
*** alcabrera has joined #openstack-infra13:16
anteayaso any help or kind words you have would be most welcome13:16
*** boris-42 has quit IRC13:17
*** afazekas is now known as afazekas_mtg13:18
BobBallI'm afraid that I could offer substantially less help than those who have already been involved... but I can have a look at a few of the recheck bugs to see if I can work out what's going on13:18
*** boris-42 has joined #openstack-infra13:18
*** nsaje has quit IRC13:18
anteayaawesome thanks BobBall13:19
anteayathat would be a great help13:19
BobBalldon't be so sure!13:19
*** nsaje has joined #openstack-infra13:19
anteayajust glad you are willing to look13:19
BobBallOne better way of doing this - is there a way to get a list of all changes that have been verified but failed gate?13:23
*** herndon_ has joined #openstack-infra13:24
*** nsaje has quit IRC13:24
anteayaBobBall: http://status.openstack.org/rechecks/13:27
anteayaBobBall: http://status.openstack.org/elastic-recheck/13:27
*** zoresvit has quit IRC13:27
anteayathere is a conversation going that we may disable elastic-recheck as it might be being abused and causing folks to push through bad patches13:28
anteayaor contributing to that behaviour if not causing13:28
*** zoresvit has joined #openstack-infra13:30
BobBallelastic-recheck is close to what I was thinking... the rechecks page doesn't include all of the gate failures - does it even get updated with reverify vs recheck?13:32
*** amotoki has joined #openstack-infra13:32
*** ruhe has joined #openstack-infra13:34
BobBallheh... I see enough people have been on the two bugs that are causing problems13:34
*** ruhe has quit IRC13:34
BobBallthe neutron one and the console logs one13:35
*** ruhe has joined #openstack-infra13:35
BobBallI suspect I can't help with either, but I'll look at the console logs one just in case!13:35
*** ilyashakhat has quit IRC13:35
anteayaokay look back in the logs last night for a conversation between mikal and clarkb13:35
anteayathey were working on it and had saved a few nodes to see if that would help13:36
*** ilyashakhat has joined #openstack-infra13:36
anteayaBobBall: and noone is on this bug yet: https://bugs.launchpad.net/neutron/+bug/125178413:37
uvirtbotLaunchpad bug 1251784 in nova "nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached" [Critical,New]13:37
anteayaI was just about to ask about it in -neutron13:37
*** dkranz has joined #openstack-infra13:37
anteayado you have time to help on that one?13:37
BobBallI have absolutely no knowledge in the area :/13:37
*** DinaBelova has quit IRC13:37
anteayaunderstood13:38
anteayathanks though13:38
BobBallI wish I did - I hit that in gate a few months ago - but it was claimed to have been fixed13:38
BobBallThat was https://bugs.launchpad.net/bugs/121191513:39
uvirtbotLaunchpad bug 1211915 in neutron/havana "Connection to neutron failed: Maximum attempts reached" [High,Fix committed]13:39
*** osanchez is now known as cdk_gerritbot13:40
*** cdk_gerritbot is now known as osanchez13:40
*** zul has quit IRC13:41
*** yaguang has joined #openstack-infra13:41
*** zul has joined #openstack-infra13:41
anteayaBobBall: according to jog0 this bug is a different root cause13:44
anteayaI personally don't know enough about the internals to verify independently13:44
anteayaI am, for better or worse, trusting jog0 on this assessment13:44
BobBallah ok13:45
BobBallindeed :)13:45
*** afazekas_mtg is now known as afazekas13:45
*** dcramer_ has quit IRC13:46
*** cyril has joined #openstack-infra13:46
*** osanchez has quit IRC13:46
*** DinaBelova has joined #openstack-infra13:46
*** osanchez has joined #openstack-infra13:46
*** cyril is now known as Guest9707913:46
*** mihgen_ has joined #openstack-infra13:47
*** ogelbukh has quit IRC13:47
*** sarob has joined #openstack-infra13:47
*** mihgen has quit IRC13:47
*** mihgen_ is now known as mihgen13:47
*** ilyashakhat_ has joined #openstack-infra13:48
*** ilyashakhat has quit IRC13:48
*** jcooley_ has quit IRC13:49
anteaya:D13:50
*** ogelbukh has joined #openstack-infra13:50
*** cody-somerville has joined #openstack-infra13:50
openstackgerritMarcus Nilsson proposed a change to openstack-infra/jenkins-job-builder: Added support for Stash Notifier  https://review.openstack.org/5633713:51
openstackgerritThierry Carrez proposed a change to openstack-infra/config: Track icehouse development in releasestatus  https://review.openstack.org/5744113:51
*** nsaje has joined #openstack-infra13:51
*** nsaje has quit IRC13:51
*** nsaje has joined #openstack-infra13:52
*** nsaje has quit IRC13:52
*** nsaje has joined #openstack-infra13:52
*** sarob has quit IRC13:52
*** sarob has joined #openstack-infra13:53
anteayaso fungi I know that the gate will be your immediate concern upon arrival13:53
anteayasomething to know is that marun is working on one of the -neutron gate block bugs13:54
anteayaand needs to talk to you, about another job is it marun?13:54
marunanteaya: the job addition is a side-project to introduce functional testing, it won't fix gate issues13:54
anteayaah sorry, I mis-understood13:55
anteayaso for later when introducing functional testing13:55
marunwe already have functional tests in the tree, but some of them can't run because they need sudo privileges13:57
marunsomeone suggested at the summit to create a new functional-only job that runs as the tempest user so that sudo is allowed13:58
sgrancan I ask for a review of https://review.openstack.org/#/c/57311/ when someone has a moment, please?13:59
sgranit will allow me to make a real change in neutron afterwards14:00
*** dkliban has joined #openstack-infra14:01
*** julim has joined #openstack-infra14:04
*** yamahata_ has quit IRC14:06
*** yamahata_ has joined #openstack-infra14:06
*** dolphm has joined #openstack-infra14:06
*** sarob has quit IRC14:08
*** ruhe has quit IRC14:10
*** ruhe has joined #openstack-infra14:12
*** hashar has quit IRC14:14
*** ruhe has quit IRC14:16
*** jergerber has joined #openstack-infra14:18
*** CaptTofu has quit IRC14:19
*** CaptTofu has joined #openstack-infra14:19
*** thomasm has joined #openstack-infra14:22
*** thomasm is now known as Guest9840814:23
*** mriedem has joined #openstack-infra14:23
fungianteaya: i'm no longer concerned, just holding out hope that someone will fix the current bugs in openstack which are slowing down gating14:23
*** thomasem has quit IRC14:23
*** Guest98408 is now known as thomasem14:23
fungisgran: i'm happy to review your tempest change, but keep in mind that the infrastructure team aren't core reviewers on tempest... you probably want the qa team (headquartered in #openstack-qa)14:24
sgranah, great14:24
anteayafungi: okay14:25
sgranand, the more reviewers the better, so please :)14:25
*** jamesmcarthur has joined #openstack-infra14:25
fungianteaya: it looks like zuul, jenkins, nodepool et al are working as intended, so i'm actually thrilled they don't fall over under a worst-case testing situation such as this14:27
anteayayes14:28
*** AJaeger has left #openstack-infra14:28
anteayaokay wasn't sure on what you wanted to address first this morning14:28
fungiit looks like the gate actually caught up quite a bit but then a lot of new changes started getting approved 4-5 hours ago, bringing it up to its current length14:28
anteayaah14:28
fungi(note the 8-hour sparklines above the different pipelines)14:29
anteayajust came out of one of the neutron meetings, can't keep track of them all yet, mestery needs to discuss the creation of a testing structure for multi-node testing14:29
anteayafungi: the sparklines are coming back up aren't they14:30
*** dprince has quit IRC14:30
*** weshay has joined #openstack-infra14:31
fungithe change at the head of the integrated gate queue was approved a little over 14 hours ago, so things actually *are* moving14:31
fungiit only seems like wading through pitch compared to how fast things get through on sunnier days14:32
anteayayes14:34
*** dprince has joined #openstack-infra14:34
anteayamorning dprince14:34
dprinceanteaya: hello14:34
anteaya:D14:35
*** markmc has joined #openstack-infra14:37
*** mfer has joined #openstack-infra14:40
*** boris-42_ has joined #openstack-infra14:40
*** boris-42 has quit IRC14:41
ttxmordred: in case you missed it, you were #action-ed to create a thread summarizing the issue with glance client lib branches on the ML, to kick off the discussion there14:42
ttxfun, uh?14:43
*** thomasem has quit IRC14:44
*** thomasem has joined #openstack-infra14:44
*** sarob has joined #openstack-infra14:47
*** senk has joined #openstack-infra14:50
BobBallWhere is the devstack-gate.yaml housed now?  I think we need the same change as https://review.openstack.org/#/c/53249/2/modules/openstack_project/files/jenkins_job_builder/config/devstack-gate.yaml in grenade (up timeout to 90 minutes) - hit in check queue at http://logs.openstack.org/31/57431/2/check/check-grenade-devstack-vm/d8cfbd8/console.html14:51
*** ekarlso has quit IRC14:51
*** ekarlso has joined #openstack-infra14:52
*** ekarlso has quit IRC14:52
BobBalloh... heh... typical.  I just found it in config14:52
*** wenlock has joined #openstack-infra14:53
openstackgerritMarcus Nilsson proposed a change to openstack-infra/jenkins-job-builder: Added support for Stash Notifier  https://review.openstack.org/5633714:53
*** ekarlso has joined #openstack-infra14:53
*** johnthetubaguy1 has joined #openstack-infra14:54
*** johnthetubaguy has quit IRC14:54
*** changbl has joined #openstack-infra14:56
*** dolphm has quit IRC14:56
anteayaBobBall: config, the black hole of all that is14:57
*** dolphm has joined #openstack-infra14:57
openstackgerritBob Ball proposed a change to openstack-infra/config: Increase timeout for grenade to 90 minutes  https://review.openstack.org/5745014:58
anteayaBobBall: oh sorry, I didn't see that before14:59
*** luisg has joined #openstack-infra14:59
anteayawe are currently frowning on increasing timeouts as a way to solve testing problems14:59
BobBallshame... :)14:59
anteayathat just contributes to the mass expansion of all the tests14:59
anteayadoesn't mean you don't have a case for it14:59
*** sandywalsh has quit IRC15:00
anteayabut you might be encouraged to pursue other options first15:00
*** dcramer_ has joined #openstack-infra15:00
anteayattx let's hope 56150 makes it through the gate15:01
*** dizquierdo has joined #openstack-infra15:01
BobBallI wouldn't know where to start - it's a grenade failure where things just got killed after 60 minutes - and it's running code unrelated to my change (since my change was in xenapi and grenade is running KVM!) :)15:01
anteayaBobBall: have you tried engaging anyone in -qa into conversation about it?15:02
BobBallheh15:03
BobBallno15:03
BobBallbut it's ok15:03
BobBallthis is just another symptom of the gate being broken15:03
*** sandywalsh has joined #openstack-infra15:03
* BobBall delets his change15:03
*** nati_ueno has joined #openstack-infra15:03
fungiyeah, on 57450 i'd want to see some consensual +1s from the qa core devs that it's really the best way out15:03
BobBall2013-11-20 12:51:54.621 |     inet 10.7.205.65/15 brd 10.7.255.255 scope global eth015:03
BobBall2013-11-20 13:27:54.376 | Triggered by: https://review.openstack.org/57431 patchset 215:03
BobBallmore than 30 minutes just waiting15:03
BobBalldoing nothing :)15:04
fungiin cases like that, often increasing the timeout does nothing other than let the job sit there stuck for longer before getting killed, which is a step in the wrong direction of course15:04
*** dkliban_ has joined #openstack-infra15:04
BobBallyup - agreed15:05
*** dkliban has quit IRC15:05
BobBallalthough in my case it would have just made it through because it was finally running tests quickly15:05
*** wenlock has quit IRC15:05
BobBalldunno what the pause was caused by15:05
*** sarob has quit IRC15:05
*** dcramer_ has quit IRC15:06
*** alcabrera is now known as alcabrera|afk15:09
*** odyssey4me has quit IRC15:09
*** herndon_ has quit IRC15:12
openstackgerritRoman Prykhodchenko proposed a change to openstack-infra/devstack-gate: Support Ironic in devstack gate  https://review.openstack.org/5389915:12
*** senk has quit IRC15:12
*** senk has joined #openstack-infra15:16
*** blamar has quit IRC15:16
*** ruhe has joined #openstack-infra15:17
*** odyssey4me has joined #openstack-infra15:18
Alex_Gaynorttx: A phrase you might like: DX, "Developer experience", UX for developers. Meaning stuff like docs, good sdks, cli clients, etc.15:18
*** dcramer_ has joined #openstack-infra15:18
*** davidhadas has joined #openstack-infra15:22
*** thedodd has joined #openstack-infra15:23
annegentlettx: I like DX too15:23
*** hashar has joined #openstack-infra15:23
fungiinterfaces for sentient life forms15:23
*** blamar has joined #openstack-infra15:23
*** dcramer__ has joined #openstack-infra15:23
*** xeyed4good has joined #openstack-infra15:24
*** ruhe has quit IRC15:25
annegentlefungi: heh15:25
annegentleIFSLF15:25
*** dcramer_ has quit IRC15:25
fungiannegentle: oh, heads up, yesterday i tagged the tip of the stable/folsom branch of openstack-manuals with "folsom-eol" and then removed the branch (same thing was done for essex and diablo during previous cycles). if you think it's causing/caused any issues, let me know so we can work through them15:27
annegentlefungi: should be fine (off the top of my head)15:27
fungik, great15:27
annegentlefungi: we already redirect away from folsom15:27
*** ben_duyujie has joined #openstack-infra15:27
*** davidhadas has quit IRC15:28
fungiwell, it doesn't remove any documents which were already published... just prevents you from being able to land new changes to that branch any longer15:28
*** davidhadas has joined #openstack-infra15:29
*** ben_duyujie has quit IRC15:30
SergeyLukjanovhi folks, I have a question about testr... we're using some resource files from tests, what's the right function to load them?15:30
SergeyLukjanovwe're using the code like open(pkg.resource_filename(version.version_info.package, file_name)).read() to read such files now15:30
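(A small sketch of the resource-loading pattern described above, calling pkg_resources directly; the package and path are placeholders, not savanna's real layout.)

    import pkg_resources

    def load_test_resource(file_name):
        # resolves relative to the named package, independent of the current
        # working directory or of which runner (nose, testr) imported the test
        return pkg_resources.resource_string('savanna.tests.unit',
                                             'resources/' + file_name)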
sdagueBobBall: so in that failure it took 30 minutes to prep the node in nodepool15:31
*** ben_duyujie has joined #openstack-infra15:31
BobBallIndeed.  I was too keen to follow a previous bug that upped the timeout for devstack-full :)15:32
annegentledumb question. is lifeless on the tc?15:33
notmynamefungi: starting (my) day with 137 gate jobs. do I need to go buy a plunger or a snake?15:33
funginotmyname: stuff is actually moving through. i think it got down to around 50ish changes for a while, but around 0900 utc, but then a bunch of changes started getting approved which has been steadily increasing since15:34
*** ben_duyujie1 has joined #openstack-infra15:34
*** pblaho has quit IRC15:35
fungier, by around 0900 utc15:35
ttxAlex_Gaynor: yes, in our case U ~= D15:35
fungispot checking when i first got up, changes at the head of the queue were approved around 14 hours previous15:35
ttx(imho)15:36
notmynamefungi: ok. let me know if we need to stop (or start) doing stuff on the swift side. we've got 12 patches approved but not landed because of gate issues15:36
openstackgerritJaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add batch_tasks support.  https://review.openstack.org/5746915:36
anteayaannegentle: yes he is15:36
anteayanot a dumb question15:36
*** dcramer__ has quit IRC15:37
*** mihgen has quit IRC15:37
funginotmyname: there was one swift-related failure cropping up on some jobs, but portante mentioned last night that he'd look into the details15:37
*** wenlock has joined #openstack-infra15:37
*** ben_duyujie has quit IRC15:38
*** nsaje has quit IRC15:38
notmynamefungi: that was the timeout issue? ie it's taking something like 35+ seconds to talk to a storage node. the default timeout is 10 seconds, but the root cause is that there are drive contention issues. ie the disk is backing up and not letting stuff get flushed15:38
annegentleanteaya: ah robert collins15:38
annegentleanteaya: thanks15:38
anteayanp15:38
*** nsaje has joined #openstack-infra15:39
notmynamefungi: I'm torn on the "solution" of raising the timeout. it may improve stuff, but it doesn't seem to be addressing the root issue15:39
*** datsun180b has joined #openstack-infra15:39
funginotmyname: ahh, could have been. i think jog0 brought it up, but not sure of the specifics15:39
*** davidhadas has quit IRC15:39
notmynamefungi: I haven't talked to portante about it yet today, but I saw the emails15:39
fungilikely then15:40
*** rcleere has joined #openstack-infra15:40
*** senk has quit IRC15:40
*** rnirmal has joined #openstack-infra15:41
*** nsaje has quit IRC15:43
*** yamahata_ has quit IRC15:43
*** senk has joined #openstack-infra15:45
*** xeyed4good has left #openstack-infra15:46
*** CaptTofu has quit IRC15:46
*** CaptTofu has joined #openstack-infra15:46
portantenotmyname: long response times from the object server do not always mean that the disk is the bottleneck15:47
*** boris-42_ is now known as boris-4215:47
*** sarob has joined #openstack-infra15:47
*** senk has quit IRC15:48
*** senk has joined #openstack-infra15:49
*** miqui has joined #openstack-infra15:49
miquihello...15:50
miquianyone know a workaround for this: https://code.google.com/p/gerrit/issues/detail?id=188415:50
*** nati_ueno has quit IRC15:50
*** odyssey4me has quit IRC15:50
openstackgerritJaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add seealso to batch_task from promoted_build.  https://review.openstack.org/5747315:52
russellbfun stat of the day ... 127,207 reviews in the last 365 days (avg 348.5/day) across integrated/incubated projects15:53
anteayaw00t15:54
anteayaa year in a day15:54
anteayaalmost15:54
*** atiwari has joined #openstack-infra15:56
*** dcramer__ has joined #openstack-infra15:57
*** sarob has quit IRC15:58
*** jaypipes has joined #openstack-infra15:58
*** ilyashakhat_ has quit IRC15:58
*** ilyashakhat has joined #openstack-infra15:58
*** hdd has joined #openstack-infra15:59
mordredSergeyLukjanov: that shoudl be fine with testr as well15:59
mordredSergeyLukjanov: are you hitting problems with it?15:59
*** jcoufal has quit IRC16:00
*** CaptTofu has quit IRC16:01
SergeyLukjanovmordred, yep, there were some problems with it before the summit, but now i'm hitting another error - http://paste.openstack.org/show/53684/16:01
*** CaptTofu has joined #openstack-infra16:01
portantenotmyname: there are other errors in those logs that require steve lang's fix at https://review.openstack.org/5701916:02
portantebut we get that through the gate jobs. :(16:02
portantemordred, jog0, clarkb: how can we get some commits through to help with the gate job issues?16:03
*** jcooley_ has joined #openstack-infra16:04
mordredSergeyLukjanov: that looks like a normal import issue - is that in trunk? or in a patch?16:04
SergeyLukjanovmordred, here is the patch https://review.openstack.org/#/c/57477 for moving to testr from nosetests16:05
mordredah. neat16:05
*** UtahDave has joined #openstack-infra16:05
*** marun has quit IRC16:06
*** markmc has quit IRC16:07
mordredSergeyLukjanov: looking now16:07
SergeyLukjanovmordred, thank you, that's very strange, tests work ok with nose16:08
*** dolphm is now known as dolphm_afk16:08
SergeyLukjanovmordred, I'm expecting some resource-related failures16:08
fungiportante: we have a few options, none of them ideal... two main possibilities are to force the changes in without final testing (gross) or dump the entire zuul state then reenqueue it all with your proposed fix at the front and hope it passes... or we just wait and cross our fingers (current head of the gate was approved roughly 15.75 hours ago)16:13
portantefungi: how do we determine which of the three to take?16:16
portanteWhen I looked last night the gate jobs were at 107 or so, and now at 13716:16
portantecan I get access to a running set of systems to look at how the overall VMs (assuming we are not running directly on hardware) are behaving?16:18
portantefungi?16:18
fungiportante: if you follow the sparklines you'll see it got fairly low overnight. we're actually landing quite a few changes, it's just that the approval rate on changes has been relatively high coupled with current bugs in openstack making them moderately untestable16:18
fungisorry, i'll try to answer your questions in sequence here...16:18
fungiso in the past, any solution involving "jumping the gate" has come down to a fairly involved discussion between infra and qa usually. we try really, really hard not to further compromise the state of openstack by doing that16:19
portantecertainly, requeuing seems like the best option16:20
mordredSergeyLukjanov: so - what's happening16:20
fungiin some ways, the fact that people have approved buggy changes into openstack is contributing to slowing the overall pace of openstack development, which in a one-project-big-picture view could be thought of as a self-imposed limit on the rate of development over quality16:20
mordredis that discover scans through the python module path16:20
mordredwhich causes code in __init__.py files to get executed16:20
portanteagreed16:20
mordredSergeyLukjanov: in savanna/conductor/__init__.py for instance, there is code that's executed16:21
portantethat comes down to individual teams having effective unit test strategies that help lower those incidents, no?16:21
fungior not enough cross-team involvement in helping each other fix that situation16:21
*** alcabrera|afk is now known as alcabrera16:21
portantefungi: perhaps16:22
mordredSergeyLukjanov: which it seems may expect to be executed in a particular context, that's now not there when discover is doing the test scan16:22
mordredSergeyLukjanov: in general, executing code with side effects in __init__.py is an anti-pattern and should be avoided16:22
mordred(for reasons such as this here)16:22
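A minimal illustration of the anti-pattern mordred is describing, with hypothetical module and variable names (this is not savanna's actual code): under "testtools.run discover", every package __init__.py is imported, so module-level work blows up before a single test runs; deferring the work keeps the import side-effect free.

```python
# mypkg/__init__.py -- hypothetical package illustrating the anti-pattern.
# Anything at module level runs the moment the package is imported, which
# includes discover walking the source tree:
#
#     CONF_PATH = os.environ["MYPKG_CONF"]   # KeyError at import time
#
# A side-effect-free alternative defers the work until a caller asks for it:
import os

_conf_path = None


def conf_path():
    """Look up the config path lazily, only on first use."""
    global _conf_path
    if _conf_path is None:
        _conf_path = os.environ.get("MYPKG_CONF", "/etc/mypkg.conf")
    return _conf_path
```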
fungias for vm performance, are you asking from the perspective of the slow-response-on-swift-objects issue we've seen, or just a general impression that slow virtual machines are making the gate slow (for the latter i don't see any evidence to support it)16:22
*** jcooley_ has quit IRC16:22
portanteslow response on the swift objects16:23
portanteI'd like to just peek at the "VMs" and poke around16:23
fungiportante: so, we do collect sysstat logs of the entire tempest run. those should provide some statistics on performance of various resources on the machine during the course of the test and can be correlated to the logs from the test16:23
* fungi digs up an example16:24
portanteokay, is that in the typical logs/ directory?16:24
fungiyes16:24
*** mgagne has joined #openstack-infra16:24
fungiportante: such as http://logs.openstack.org/65/51865/1/gate/gate-tempest-devstack-vm-postgres-full/da2a423/logs/sysstat.dat.gz16:25
fungi(random example selected)16:25
*** nsaje has joined #openstack-infra16:25
fungishould be able to feed that into sar and examine various resources over the course of the test16:25
mordredSergeyLukjanov: ah - actually16:25
mordredSergeyLukjanov: I lied16:25
mordredI believe it has to do with file paths16:26
mordredSergeyLukjanov: ok. I'm just lying a lot16:26
*** DinaBelova has quit IRC16:26
fungiportante: as far as getting access to virtual machines, if you decide the only efficient way is to ssh into one of the machines where a test ran into this particular issue, our best bet is to proactively mark a bunch of machines in a held state (enough to be statistically likely that one will encounter that bug) so that they won't automatically be garbage collected, and then hope we catch one16:27
fungibut doing so reduces the overall available pool in our aggregate quota, and so reduces test velocity for other changes even further, so it's also not without a downside16:28
fungiif you simply want a vm configured similarly to how we run tests, i wrote https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/README.rst#n100 to help developers recreate the conditions under which we run devstack/tempest like jobs16:30
portantefungi: cool16:30
portantewhat is the format of sysstat.dat?16:30
portantesar output?16:30
*** dolphm_afk is now known as dolphm_16:31
*** dolphm_ is now known as dolphm16:31
fungiyes16:31
portantek16:31
*** Hefeweizen has joined #openstack-infra16:31
mordredSergeyLukjanov: INFO: Configuration file "itest.conf" not found  *16:32
*** jcoufal has joined #openstack-infra16:33
*** ^d has joined #openstack-infra16:33
fungiportante: if you look in devstack's stack.sh, you'll see that enabling "sysstat" as a devstack service runs sar -o $SCREEN_LOGDIR/$SYSSTAT_FILE $SYSSTAT_INTERVAL16:34
fungiso that file is the result, which we collect at the end of the job16:34
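For reference, a rough local sketch of pulling rows back out of one of those sysstat.dat files; it assumes the sysstat tools (sadf) are installed and are a version compatible with the one that wrote the file, and the column layout in the comment is only an example since it varies between sysstat releases.

```python
#!/usr/bin/env python
"""Sketch: dump CPU rows from a downloaded sysstat.dat (assumes sadf)."""
import subprocess


def rows(datafile, sar_args=("-u",)):
    # "sadf -d" prints semicolon-separated records; everything after "--"
    # is handed to sar, so "-u" selects the CPU utilisation report.
    out = subprocess.check_output(["sadf", "-d", datafile, "--"] + list(sar_args))
    for line in out.decode("utf-8", "replace").splitlines():
        if line.startswith("#"):        # header/comment lines
            continue
        yield line.split(";")


if __name__ == "__main__":
    for fields in rows("sysstat.dat"):
        # typically: hostname;interval;timestamp;CPU;%user;%nice;%system;...
        print(fields[2], fields[3], fields[4])
```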
mordredSergeyLukjanov: ugh. I'm not sure. the things that I thought were the cause are not the cause16:34
mordredSergeyLukjanov: but all of those errors are errors it's finding while trying to run discover16:35
portantefungi: got it16:35
mordredSergeyLukjanov: if you want to poke at it more, you can run16:36
mordredSergeyLukjanov: python -m testtools.run discover savanna | less16:36
mordredSergeyLukjanov: which will run things outside of the context of testr (you'll want to be in a venv of course)16:36
fungimiqui: i haven't seen that issue before. were you encountering it on review.openstack.org or elsewhere? the bug reports linked suggest that it may have been a problem in 2.3 and possibly 2.5 but not 2.4 (which is what we currently run)16:37
mordredSergeyLukjanov: oh - it is running tests - so I'm guessing perhaps something isn't cleaning up and making it hard for something else to import16:39
mordredSergeyLukjanov: it's possible you should not listen to me16:39
portantefungi: can we get the sar data to collect using "-S DISK" and "-S XDISK"?16:41
portantethere does not appear to be any data on disk behavior16:41
fungiportante: good question... sdague/dtroyer: do you object to adding those?16:42
SergeyLukjanovmordred, sorry, was afk for last 15 mins16:43
*** senk has quit IRC16:43
annegentlemarkmc or others: do you know of auto-backport scripts for cherry-picking patches?16:44
dtroyerfungi: I don't think that would be a problem off hand...16:44
SergeyLukjanovmordred, got the idea, we'll try to debug it, thank you very much! sorry for afk16:45
*** svarnau has joined #openstack-infra16:46
*** nsaje has quit IRC16:47
*** sarob has joined #openstack-infra16:47
*** nsaje has joined #openstack-infra16:47
*** ^d is now known as ^demon|sick16:48
fungiportante: i guess try running devstack with the "sysstat" devstack service enabled but patch that line to add those options and see if it generates the output you expect. if so, the patch to devstack should be exceedingly simple16:48
*** afazekas has quit IRC16:48
mordredSergeyLukjanov: I'm not sure that side effect is the problem16:50
mordredSergeyLukjanov: I tried fixing it locally and it did not fix the problem16:50
mordredbut it's _something_ having to do with imports16:50
*** jcoufal has quit IRC16:51
portantefungi: have you noticed that the number of tcp sockets in  use starts at 5 and ramps up to 105 and never returns?16:51
SergeyLukjanovmordred, ok, maybe there's some other side effect16:51
portanteis everything supposed to be shutdown when sar stops collecting?16:52
*** sarob has quit IRC16:52
*** mrodden has quit IRC16:52
*** nsaje has quit IRC16:52
*** jcooley_ has joined #openstack-infra16:53
fungiportante: i'm not sure. the #openstack-qa channel is probably a better place to dig into details like that16:54
*** jcoufal-mob has joined #openstack-infra16:54
portantek16:55
*** sparkycollier has joined #openstack-infra16:58
*** pcrews has quit IRC16:59
*** gyee has joined #openstack-infra17:00
*** mihgen has joined #openstack-infra17:01
*** mrodden has joined #openstack-infra17:04
*** jpich has quit IRC17:05
*** dcramer__ has quit IRC17:05
*** mihgen has quit IRC17:05
*** ftcjeff has joined #openstack-infra17:08
*** jcoufal-mob has quit IRC17:09
*** jcooley_ has quit IRC17:10
*** bpokorny has joined #openstack-infra17:10
*** mihgen has joined #openstack-infra17:12
mkodererdoes somebody know if there are special privileges needed to host a meeting in #openstack-meeting? or does meetbot accept everybody?17:13
*** CaptTofu has quit IRC17:15
fungimkoderer: meetbot is an outgoing chap and will be anyone's friend. no privs needed17:16
*** CaptTofu has joined #openstack-infra17:17
mkodererfungi: cool thx17:17
fungiwe also recently added a feature allowing anyone (not just the chairperson) to #endmeeting an hour or more after a #startmeeting17:18
fungisince chairs were sometimes forgetting to do it17:19
*** dcramer__ has joined #openstack-infra17:19
*** yaguang has quit IRC17:19
zulhey is there something going on with the gates? there seems to be some python-keystoneclient stuff that has been approved but hasn't gone in yet17:20
portantezul: join the club17:20
portante;)17:20
*** dkliban_ has quit IRC17:22
*** dcramer__ has quit IRC17:25
*** hashar has quit IRC17:25
*** nsaje has joined #openstack-infra17:25
*** derekh has quit IRC17:27
anteayazul17:27
anteayayes, a bad start to the week and many gate bugs17:27
zulanteaya:  hi17:27
zulok cool17:27
anteayahttp://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html17:27
anteayazul while I have you here can we address novnc pulling in nova packages on 12.04?17:28
anteayait is interfering with many a devstack install17:28
zulanteaya:  sure open up a bug and i have a look17:28
sdaguemkoderer: no special privs17:28
anteayazul: https://bugs.launchpad.net/devstack/+bug/124892317:29
uvirtbotLaunchpad bug 1248923 in devstack "Devstack install is failing:" [Undecided,Confirmed]17:29
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce import grouping  https://review.openstack.org/5222117:29
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce grouping like imports together  https://review.openstack.org/5440217:29
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering  https://review.openstack.org/5440317:29
anteayaseems to be a packaging issue, zul, what do you think?17:29
zulanteaya:  probably ill have a look17:30
anteayathanks17:30
*** Bada has joined #openstack-infra17:32
*** dkliban_ has joined #openstack-infra17:34
*** sileht is now known as sileht_17:40
*** sileht_ is now known as sileht17:40
*** chandankumar has quit IRC17:40
*** boris-42 has quit IRC17:42
*** jamesmcarthur has quit IRC17:44
*** afazekas has joined #openstack-infra17:45
*** reed_ has joined #openstack-infra17:45
*** jamesmcarthur has joined #openstack-infra17:47
*** SergeyLukjanov has quit IRC17:48
*** salv-orlando has quit IRC17:49
clarkbmorning17:49
* clarkb catches up on sb17:49
*** senk has joined #openstack-infra17:52
*** salv-orlando has joined #openstack-infra17:54
anteayamorning clarkb17:55
*** sarob has joined #openstack-infra17:57
portantefungi: so what is up with the gates, all the jobs are queued, nothing running, it appears?17:57
clarkbportante: I think we are out of available test nodes17:58
portanteoy17:58
clarkbthe jobs that are running are using all available resources17:58
portanteso the check jobs are interferring with the gate jobs?17:58
fungiportante: up in the top-left you'll see event and result totals17:58
*** pcrews has joined #openstack-infra17:58
fungiimmediately following a gate reset, zuul blocks to process the events/results to determine what to do next17:59
fungithe more changes impacted by a gate reset, the higher the total number of factors it ends up taking into account, and the longer that takes to happen17:59
*** ruhe has joined #openstack-infra17:59
portanteokay17:59
*** sarob has quit IRC18:00
fungibut in addition, as clarkb points out, right now we have more pending jobs than we have nodes, so a gate reset effectively depletes the entire pool and has to wait for new nodes to be added since we're maxxed out on our quotas with our providers right now18:00
*** sarob has joined #openstack-infra18:00
fungilooking at the graph in the bottom-left, you'll see the swing between used and deleting which accompanies each gate reset18:00
clarkbI think there may be a slight compounding problem where our 2.5 jenkins masters can't deal with the amount of load being thrown at them18:01
clarkbhence the long delay BobBall saw earlier18:01
fungientirely possible. we've been driving the jenkins masters like sled dogs18:02
clarkbjog0: also grenade smoke tests run in parallel now? I think we may be seeing some failures there related to parallel testing18:02
clarkbtl;dr ugh18:02
SpamapSholy overwhelming QA fail batman.. http://status.openstack.org/rechecks/ is overrun18:02
*** harlowja has joined #openstack-infra18:02
*** metabro has quit IRC18:03
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce import grouping  https://review.openstack.org/5222118:03
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce grouping like imports together  https://review.openstack.org/5440218:03
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering  https://review.openstack.org/5440318:03
clarkbwe should just rebase everything to havana and start over >_>18:03
clarkb(havana works :) )18:04
*** ruhe has quit IRC18:04
*** metabro has joined #openstack-infra18:04
fungiclarkb: on a positive note, we got confirmation that those ghost "running" jobs zuul sees in the wake of the jenkins01 situation can effectively be cleared with a new patchset18:04
fungiso i'm less worried about finding an opportunity to restart it18:04
clarkbcool18:04
*** ilyashakhat has quit IRC18:05
*** sarob has quit IRC18:05
*** ilyashakhat has joined #openstack-infra18:05
clarkbSpamapS: if you want the bug that is currently most troublesome 1251920 seems to be the ticket (I know that number off the top of my head now)18:05
fungii'm working on adding more space to static.o.o now but i worry that we may max out the kernel limit on how many scsi block devices are allowed18:06
*** johnthetubaguy has joined #openstack-infra18:06
*** johnthetubaguy1 has quit IRC18:06
clarkbsilly kernel limits18:07
*** jerryz has joined #openstack-infra18:07
fungiwe'll have something like 20 cinder volumes of 0.5tb each attached to it when i'm done18:08
*** dcramer_ has joined #openstack-infra18:09
hub_capmordred: we want to move our guest agent out of our main repo and into its own. is that something that is ok to do w/o some sort of formal approval?18:09
mordredhub_cap: I think it's a great idea - and it's all inside of your program, so I dont think it's a problem18:10
hub_capk wonderful18:10
* clarkb is going to be annoying. we should fix manage-projects before adding new projects18:10
fungihub_cap: however, you probably want to explore git filter-branch (so you can preserve its revision control history)18:10
SpamapSclarkb: yeah I've done several rechecks and reverifies for 125192018:10
hub_capfungi: roger18:11
clarkbor at least everyone should know that after rerunning manage-projects by hand successfully you probably need to reclone the repo for zuul18:11
*** sdake_ has joined #openstack-infra18:11
*** sdake_ has quit IRC18:11
*** sdake_ has joined #openstack-infra18:11
mordredclarkb: what's broke with it?18:11
mordredis this the race-condition thing?18:11
clarkbmordred: first run of manage-projects creates the project in gerrit but gives it no content because of a github failure (or some other failure), and then zuul clones the empty repo18:11
*** sarob has joined #openstack-infra18:11
mordredclarkb: awesome18:11
clarkbnow zuul is stuck with an empty repo and when content arrives zuul can't resolve the mismatch18:11
clarkbso you have to move the old repo zuul cloned aside then reclone it for zuul (because zuul thinks it is cloned and won't reclone itself)18:12
clarkbif you look in the zuul users bash history there are some examples of recloning (you have to do it with the special GIT_SSH script)18:13
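The recovery clarkb describes might look roughly like the sketch below; the project name, paths, wrapper location and clone URL are all assumptions for illustration, not the exact commands from the zuul user's history. The one non-obvious piece is GIT_SSH, which git runs in place of plain ssh so the wrapper can supply the right identity.

```python
#!/usr/bin/env python
"""Hedged sketch of the manual "move aside and reclone for zuul" step."""
import os
import subprocess

PROJECT = "openstack-infra/example-project"       # hypothetical project
ZUUL_GIT_ROOT = "/var/lib/zuul/git"               # assumed zuul git workspace
GIT_SSH_WRAPPER = "/var/lib/zuul/ssh-wrapper.sh"  # assumed GIT_SSH helper script
GERRIT = "ssh://review.openstack.org:29418"       # assumed clone base URL

repo = os.path.join(ZUUL_GIT_ROOT, PROJECT)
os.rename(repo, repo + ".empty")                  # keep the bad clone around

env = dict(os.environ, GIT_SSH=GIT_SSH_WRAPPER)   # git invokes this instead of ssh
subprocess.check_call(
    ["git", "clone", "%s/%s" % (GERRIT, PROJECT), repo], env=env)
```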
*** sparkycollier has quit IRC18:14
*** dizquierdo has quit IRC18:14
*** sparkycollier has joined #openstack-infra18:15
clarkbmordred: I think we just need to make sure that failures in manage-projects are more isolated from each other, eg github derping shouldn't prevent us from seeding content in gerrit18:16
mordredclarkb: ++18:16
mordredI agree. I think that's a great idea18:16
*** sparkycollier has quit IRC18:17
*** johnthetubaguy has quit IRC18:18
*** ben_duyujie1 has quit IRC18:18
*** marun has joined #openstack-infra18:19
fungiwe may also want to consider making manage-projects slightly more stateful and not rely on it to do things like check to make sure every single project we have is configured in github, or perhaps only run specifically for newly added/changed projects18:19
*** osanchez has quit IRC18:20
fungicrap. as i feared, nova volume-attach is not giving me devnodes beyond /dev/xvdq (just seems to keep reusing that one once i got to it)18:20
clarkb:/18:20
fungiwe may need to take some time to individually pvremove 0.5tb volumes and add 1tb volumes in their place18:21
clarkband then really start looking at swift again18:22
fungidmesg on static.o.o makes no mention of any new block devices getting hotadded after xvdp18:22
clarkbfungi: you can easily swap out the 0.5TB volumes that you just added for 1TB volumes right?18:25
clarkband we can then worry about the others later?18:25
fungiclarkb: yeah, that'll be a start, but it's not enough to get me to the total i was shooting for18:25
*** melwitt has joined #openstack-infra18:26
fungihowever i can pvmove parts of the main vg to them, freeing up a few additional 0.5tb blockdevs which i can then vgreduce off of, pvremove, cinder detach, cinder delete and replace with more 1tb volumes18:27
*** hogepodge has joined #openstack-infra18:27
*** ilyashakhat_ has joined #openstack-infra18:27
fungibut that's probably best left for a weekend when there's a lot less chance of breaking log uploads and causing jobs to fail18:28
*** ilyashakhat has quit IRC18:28
lifelessannegentle: I am!18:31
*** sarob has quit IRC18:31
lifelessannegentle: thats why we had dinner together in HK :)18:31
*** sarob has joined #openstack-infra18:32
*** nsaje has quit IRC18:32
*** MarkAtwood has joined #openstack-infra18:32
*** nsaje has joined #openstack-infra18:33
*** nsaje has quit IRC18:33
*** jamesmcarthur has quit IRC18:33
*** nsaje has joined #openstack-infra18:33
jeblairfungi: what limit are we hitting?18:34
*** jamesmcarthur has joined #openstack-infra18:35
*** Bada has quit IRC18:35
jeblairclarkb: afaik we're planning on using swift, not "looking at it"18:35
jeblairclarkb: at least, that's what i got out of that design summit session18:36
fungijeblair: good question. i don't get an error message anywhere obvious, but the kernel stops registering new xvd's after the 16th one it has (i think that's a kernel limit, maybe tunable via sysctl--checking now)18:36
*** sarob has quit IRC18:37
clarkbjeblair: right18:37
*** dcramer_ has quit IRC18:37
jog0clarkb: ohh that makes sense, we went parallel for grenade because we thought that was a better solution than bumping the timeout on the job because it was getting too long on RAX18:37
jog0clarkb: and e-r doesn't cover grenade18:38
jog0sdague: sorry ^18:38
jeblairfungi: hrm, that seems very low18:38
sdaguejog0: correct, e-r doesn't cover grenade18:41
sdaguesome refactoring is needed for that18:41
mordredclarkb, jeblair, fungi: hp cloud is having some capacity issues they're asking for some help with from us18:42
mordredspecifically, az2 has way more capacity right now, so I suggested that we request a doubling of our quota in az2, and the move half of our usage out of each of az1 and az3 to az218:42
mordredthey said that would be very helpful - any issues from you guys on moving forward on that?18:42
fungimordred: that sounds sane to me18:43
jeblairmordred: ++18:43
jog0sdague: what do you think the right step forward for fixing grenade is?18:44
jog0we can just look at the bug inside of nova we are hitting18:44
* clarkb will fire off an email right now18:44
clarkbre quota bump in az218:44
fungias for the 16 block device limit, i'm finding that at various points in time that was a per-domu limit in both xen and libvirt (thinking both are resolved now but still digging for confirmation), but also at one point linux lvm2 allowed no more than 16 pv components in a vg so need to figure out whether that's still the case as well18:45
sdaguejog0: honestly, I haven't looked at it yet18:47
* sdague still working through some onboarding tasks18:47
jog0sdague: it took me months to onboard18:48
jog0sdague: no problem, short term I see two options: revert and bump the timeout, or fix the nova bug18:49
jog0I am in favor of fixing nova instead18:49
sdaguejog0: so is the issue that parallel grenade broke the world?18:49
clarkbno18:49
jog0sdague:just a tiny part18:49
clarkbparallel grenade is a small problem compared to the other issues18:49
sdagueok18:49
sdaguethe timeout bump is fine, but I'm not convinced it will fix it18:50
jog0sdague: timeout bump + serial18:50
sdaguejenkins load means it's taking 30 minutes to even have a node ready18:50
*** ilyashakhat_ has quit IRC18:50
jeblairsdague: can you clarify?18:51
*** ilyashakhat has joined #openstack-infra18:51
*** sandywalsh has quit IRC18:52
sdagueso there was a grenade fail that BobBall posted earlier18:52
mgagnezaro: ping18:53
*** jamesmcarthur has quit IRC18:53
sdaguehttp://logs.openstack.org/31/57431/2/check/check-grenade-devstack-vm/742d85e/18:53
sdaguejeblair: look at the timestamps from job kick off, until it actually does anything real18:54
sdaguehttp://logs.openstack.org/31/57431/2/check/check-grenade-devstack-vm/742d85e/console.html#_2013-11-20_15_49_44_26418:54
jeblairsdague: gotcha18:55
sdaguethe next line is 49 minutes later18:55
sdagueso yes.... that would cause a timeout issue :)18:55
jog0wait grenade is still timing out?18:55
jog0even with parallel tests?18:55
sdagueyes18:55
sdaguebecause the tests aren't the problem18:55
jog0I was referring to a different bug where some of the tests failed18:55
jog0sdague: ohh18:56
sdagueit's taking 30 - 50 minutes before devstack even starts executing18:56
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Create jenkins03 and jenkins04  https://review.openstack.org/5751018:56
sdaguewhich, based on past conversations means jenkins load is slowing things down18:57
sdagueI think18:57
jeblairsdague: hrm, no actually...18:57
fungisomething between when gate-wrap.sh started executing and when grenade started to run18:57
jog0sdague: its getting old18:57
jeblairsdague: what fungi said18:57
sdaguejeblair: ok, correct me if I'm wrong18:57
jog0soon it will need a jenkins walker18:57
sdagueoh, right18:57
fungiwe don't have timestamps in any of the setup logs though, so hard to tell18:58
sdagueyeh, I just noticed that18:58
zaromgagne: yo!18:59
mgagnezaro: bug #1253180 Would assigning an empty dict to jobparams if value is None be an acceptable solution? Or should we wrap d.update(...) calls with an if jobparams: instead?18:59
uvirtbotLaunchpad bug 1253180 in openstack-ci "jenkins-job-builder exception on valid yaml" [Undecided,New] https://launchpad.net/bugs/125318018:59
*** sarob has joined #openstack-infra19:00
* zaro reads bug19:00
mgagnezaro: jenkins_jobs/builder.py:14819:02
zaromgagne: bug is a little confusing.  that yaml is *not* valid, correct?19:02
mgagnezaro: jobparams is None if a colon is introduced after the job name as shown in the example. (and without params). Later, this value is used to update a dict. .update() expects the value to be iterable, not None.19:03
mgagnezaro: I took his word for it when he said it was valid. I'm checking atm19:03
*** alcabrera is now known as alcabrera|afk19:04
zaromgagne: i'm pretty sure that's invalid.  because it's a key value pair, but no value.19:05
*** rnirmal has quit IRC19:06
clarkbmordred: jeblair: fungi: quota bump request sent19:06
zaromgagne: but in any case, a better error message would be nice.19:06
*** sandywalsh has joined #openstack-infra19:08
jog0so we think* we have a fix for the big bug (bug 1251920)19:10
uvirtbotLaunchpad bug 1251920 in nova "Tempest failures due to failure to return console logs from an instance" [Critical,In progress] https://launchpad.net/bugs/125192019:10
jog0https://review.openstack.org/5750919:10
clarkbjog0: if oslo hadn't been synced in 4 months why is havana fine?19:10
jog0can we call on any infra magic to get that bumped to the top of the queue19:10
jog0clarkb: see lifeless's email on the thread19:11
lifelessclarkb: it may not be.19:11
clarkbjog0: (just trying to understand why we think this will fix the problem) and why did it start on the 15th?19:11
lifelessclarkb: I'm actually fairly worried about H right now.19:11
jog0clarkb: and TBH I am confused about this one19:11
clarkblifeless: k19:11
* clarkb reads email19:11
mgagnezaro: I think it is valid.19:11
jog0because the patch that we think broke things was for nova neutron. but the bug wasn't19:11
clarkbfungi: jeblair mordred basically I think the question is: do we stop zuul now, maybe ask everyone to stop approving things and/or pushing patches (ha), get some of these bug-related changes in, then open the floodgates19:12
jog0although worst case is we are wrong and this fixes something else19:12
jog0I think19:12
clarkbat this point I am game for trying it since we aren't really keeping up and gate resets are killing us19:13
clarkbjog0: can you get a list of changes together so that we know which ones should be high priority? thinking of the swift timeout change and any nova changes19:13
sdagueonce the check queue is back on that oslo change, we could also just ninja merge it19:13
clarkbsdague: we could do that as well19:14
jog0clarkb: oh right19:14
zaromgagne: i think it's parser dependend.  how are you validating?19:14
sdaguehonestly, until we've got a +2 initiated ninja merge process, I'd just ninja merge things with good check results that we think fix the world19:14
clarkbsdague: it leaves us open to small races that can make things worse but that seems like a low risk19:15
clarkb(if eg something does get through the gate somehow :) and interferes with the forced in change)19:15
jog0clarkb: https://review.openstack.org/#/c/57373/19:16
jog0let me rebase taht one19:16
jog0there were valid -1s on it19:16
mgagnezaro: my gut is telling me it is. Trying to find references for that one. In key: value, if value is empty, it is null (None).19:16
jog0I htink those are the only two I know of19:17
jog0andthe swift one is *way* lower priority19:17
clarkbjog0: ok19:17
zaromgagne: ok. i've verified with python yaml.  it does look ok.  no value is interpreted as null19:18
mgagnezaro: either the job name is a scalar (w/o job params) or it's a mapping with a single key, the key being the name and the value being the job params, which should be a mapping, not a scalar, since we are expecting a mapping.19:18
jog0clarkb: so ignoer the swift one19:18
jog0that can wait19:18
*** alcabrera|afk is now known as alcabrera19:19
jeblairsdague, jog0, clarkb: if there are patches that we believe will fix things, i think we should declare queue bankruptcy, stop zuul, and then reverify those patches.19:20
*** ilyashakhat has quit IRC19:20
*** roaet has left #openstack-infra19:20
fungiyeah, at the moment the head of the gate is changes which were approved 18 hours ago. when i started this morning it was only 14 hours... it looked like it had managed to work through quite a lot of things and gain ground while we were asleep. i think around 0800 utc or so the gate was only about 50 changes deep according to graphs19:20
jeblairi also think people should stop approving things until the project is at least kinda working again...19:21
zaromgagne: i assume that if it's valid it should be accepted.19:21
*** ilyashakhat has joined #openstack-infra19:21
jog0jeblair: if lifeless is right that one patch  will fix most things19:21
clarkbjeblair: ++19:21
jeblairso in concert with that, i'd send a msg to the list saying we have intentionally dropped the queue of approved patches, please reapprove only ones that fix known problems19:21
portantejeblair, can we disable starting gate jobs off of approval?19:21
portantemake it manual for now?19:21
jeblairportante: approvals are the manual starting of gate jobs19:22
jog0portante: can you look at https://review.openstack.org/#/c/57373/219:22
jog0sdague: ^19:22
jog0if you both sign off we can push that to the top of the queue too19:22
clarkbjeblair: any opinions on who should be doing what?19:22
jeblairclarkb: do we have a (set of) patch(es) ready for this?19:23
zaromgagne: so i guess jjb should not even throw an exception at all.19:23
clarkbI am happy to do the zuul stop start or collect the current list of changes so that we can swing around later and reverify/recheck19:23
clarkbjeblair: jog0 indicates that the only one we should worry about is the nova change https://review.openstack.org/#/c/57509/219:23
clarkbjog0: we might as well get the swift change in too if possible19:24
mgagnezaro: we should handle this use case.19:24
jeblairjog0: (i think the standard is 'Co-Authored-By', btw)19:24
clarkbso maybe we need to sync up with nova cores to make sure they are happy getting that code in asap?19:24
portantejog0: looks fine, I wrote this up to add a bit more detail to help others less familiar: http://paste.openstack.org/show/53694/19:24
jog0jeblair: thats what the other co-author in devstack said19:24
mgagnezaro: I see 2 possible solutions: Assigning an empty dict to jobparams if value is None Or wrap d.update(...) calls with an if jobparams:19:24
jog0(co-auth-by)19:25
*** markwash has joined #openstack-infra19:25
jeblairjog0: you said '-with' in that swift commit19:25
jog0jeblair: that's what I meant19:25
jeblairjog0: it's wrong :)19:25
jog0jeblair: see I0652f639673e600fd7508a9869ec85f8d5ce451819:25
jog0and blame sdague19:26
portantejeblair: while approvals might be the manual way, there are those that will still be approving requests that are not necessary to get the gate jobs working19:26
* zaro pulls jjb master19:26
jog0so bug fix patches: https://review.openstack.org/57509 https://review.openstack.org/#/c/5737319:26
fungijog0: see https://wiki.openstack.org/wiki/GitCommitMessages#Including_external_references ;)19:26
clarkbwoot we have more quota, I will propose a nodepool config change to rebalance our node distribution19:27
jeblairjog0: i'm telling you that commit is wrong.  that's really really wrong, in fact.  that's monty saying that he co-authored a patch with himself.  which is ridiculous.19:27
jeblairjog0: what does sdague have to do with it?19:27
jog0jeblair: https://review.openstack.org/#/c/35705/ is that patch I took the format from19:28
jeblairclarkb: so, step 1 is get 57509 APRV+1, then let's save the queue, stop zuul, and reverify/reapprove that one19:28
jog0jeblair: I will fix19:28
clarkbjeblair: sounds good19:28
jeblairjog0: i know.  you said that already.  it's wrong.  :)  nothing i can do about that other than tell you it's wrong.19:29
openstackgerritClark Boylan proposed a change to openstack-infra/config: Shift more node use to HPCloud AZ2.  https://review.openstack.org/5751519:29
jog0jeblair: its not wrong anymore19:29
jog0fixed19:29
clarkbjeblair: fungi mordred ^ rebalance the hpcloud nodes19:30
*** ilyashakhat has quit IRC19:30
lifelessjog0: it fixes two of the bugs - see comments in the review.19:31
fungiclarkb: plus 18 extra nodes?19:31
*** hashar has joined #openstack-infra19:31
clarkbfungi: 1219:31
*** ilyashakhat has joined #openstack-infra19:31
*** melwitt has quit IRC19:31
*** melwitt1 has joined #openstack-infra19:32
clarkbfungi: the limit is 96*3 (or was); I am bumping az2 to 192 and cutting the theoretical limit in half, not the 90 in half19:32
clarkbfungi: I can s/48/45/ though19:32
zaromgagne: imo wrapping d.update() would be better.19:32
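A small, self-contained sketch of the corner case and the guard under discussion (this is not the actual jenkins-job-builder patch, just the shape of the fix): a job entry written as "name:" with nothing after the colon parses to {'name': None}, and dict.update(None) raises TypeError unless it is guarded.

```python
#!/usr/bin/env python
import yaml

document = """
- plain-job-name              # scalar entry: no params at all
- job-with-params:
    branch: stable/havana
- job-with-trailing-colon:    # valid YAML, but the value is None
"""

defaults = {"branch": "master"}

for entry in yaml.safe_load(document):
    if isinstance(entry, dict):
        name, jobparams = next(iter(entry.items()))
    else:
        name, jobparams = entry, None
    params = dict(defaults)
    params.update(jobparams or {})   # the guard: treat a bare "name:" as {}
    print(name, params)
```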
jog0lifeless: holy crap sweet19:32
fungiclarkb: cool. just pointing out that 192+48*2-90*3=1819:33
*** mihgen has quit IRC19:33
jeblairclarkb: ah yeah, that was to give a small buffer for node leakage; we'll get more quota errors in the log without it.  either way.19:33
clarkbjeblair: the new limit in AZ2 is above 192 so we should be fine19:33
fungiwfm19:34
*** danger_fo_away is now known as dangers19:35
jeblairclarkb, sdague, jog0: are we confident enough in that fix to go ahead and restore the gate queue after it?19:35
clarkbjeblair: I have asked in #openstack-nova for people to look at it19:35
clarkbjog0 is rallying the troops there19:36
jog0clarkb: I thnkk they are all sleeping19:36
jog0including sdague19:36
jeblairthat sounds like a really good idea right now :)19:37
sdaguejog0: so there is the little fix (the 7 line tls one) and the big sync19:38
sdaguewhich one are you talking about?19:38
jog0sdague: the little one19:38
jog0https://review.openstack.org/#/c/57509/19:38
sdagueI want test results back on it before I +2 it19:38
jog0sdague: can you A+ ASAP19:38
jog0we have results19:38
*** sarob has quit IRC19:38
jog0see peter'scomments19:38
sdaguewe don't have jenkins results19:39
jog0sdague: and AFAIK we will still gate on it19:39
jog0clarkb: ^19:39
*** sarob has joined #openstack-infra19:39
jog0we won't because jenkins is backed up19:39
clarkbjog0: we will still gate on it, it will go to the head of the queue19:39
jog0sdague: ^19:39
jog0right19:39
clarkband get tested ahead of everything else19:39
jog0otherwise we wait hours19:39
*** ^demon|sick has quit IRC19:39
clarkbwe just got approval19:39
fungiand hopefully not hit a nondeterministic error19:39
fungiand get kicked back out19:40
clarkbjeblair: fungi: ready?19:40
jog0fungi: heh yeah19:40
jeblairclarkb: i'm saving queues now19:40
*** markmc has joined #openstack-infra19:40
*** ^demon|sick has joined #openstack-infra19:40
sdagueok, sorry, was confused on the ordering19:40
*** nsaje has quit IRC19:41
*** blamar has quit IRC19:41
jeblairclarkb: i have saved the check and gate queues19:41
clarkbjeblair: should I stop then start zuul now?19:41
jeblairclarkb: go for it19:41
clarkbwe may also need to manually cleanup some jobs in jenkins afterwards to free up slaves19:41
clarkbstopping zuul now19:41
jeblairclarkb: it would be easiest to do that while zuul is stopped19:42
jeblairclarkb: so why don't you delay between stopping and starting19:42
jeblairclarkb: and we'll go kill jobs19:42
fungiagreed. jumping in the jenkins01/02 webuis now19:42
clarkbjeblair: oh too late19:42
jeblairclarkb: meh.  just stop it again.  :)19:42
clarkbI can stop zuul again really quickly19:42
clarkbdone19:42
* clarkb starts killing jobs on jenkins0219:43
*** hogepodge has quit IRC19:43
fungii'll start with jenkins01 jobs then19:43
jeblairi will too.  shouldn't hurt if we double-kill things.19:44
*** blamar has joined #openstack-infra19:44
*** sarob has quit IRC19:44
* fungi is working from the bottom on 0119:44
jeblairi'll work from the top19:45
* jog0 owes everyone here a beer19:45
*** hogepodge has joined #openstack-infra19:46
fungijenkins01 is clean19:47
fungipitching in on 02 now19:47
clarkbthere are a couple stubborn jobs on 02 near the top, I may just ignore them for now19:49
jeblairi've got them all open in tabs19:49
jeblairso i can continue to try to kill them or at least track them19:49
jeblairclarkb: why don't you start zuul now19:49
clarkbjeblair: doing that now19:50
clarkbzuul is starting19:50
fungii am going to guess that 02 is being more of a pain because 01 was restarted only a couple days ago19:50
clarkbready to reverify the nova change?19:51
*** Ryan_Lane has joined #openstack-infra19:51
jeblairclarkb: done19:51
clarkbwe should recheck the full nova sync as well19:51
jeblairthe jobs on 02 were all started ~7 hours 50 mins ago19:51
* clarkb rechecks the full nova sync19:51
*** dstanek has quit IRC19:52
clarkbok those jobs have started, any other changes we want to get in asap? maybe the az2 rebalance config change?19:52
jog0clarkb: and the swift patch if you want19:52
jog0https://review.openstack.org/#/c/57373/19:52
jeblairclarkb: go for it19:52
jeblairjog0: not approved yet19:52
jog0sdague: ^19:52
clarkbapproved the nodepool config change19:52
jog0jeblair: I ment for the check queue19:52
jeblairjog0: ah sure19:53
sdagueapproved now19:53
jog0sdague: thanks19:53
fungilooks like precise3 and precise23 are dead. i'll see what i can do to get them back on line19:53
jog0portante: ^19:53
jeblairi'll clean up those stuck nodes on jenkins0219:54
*** dstanek has joined #openstack-infra19:54
openstackgerritA change was merged to openstack-infra/config: Shift more node use to HPCloud AZ2.  https://review.openstack.org/5751519:54
*** ^demon|sick is now known as ^d19:54
*** ^d has joined #openstack-infra19:54
*** reed_ is now known as reed19:54
*** reed has quit IRC19:54
*** reed has joined #openstack-infra19:54
portantejog0: ?19:55
jog0portante: your patch is on the top of the merge queue19:55
jeblairclarkb, jog0, sdague: so should we re-load the queue? or just drop it?19:55
portanteI see the swift patch cool19:55
jog0anteaya: you had something19:56
anteayaneutron needs https://review.openstack.org/#/c/53188/ and https://review.openstack.org/#/c/57475/19:56
portantethere is another swift fix that seems to affect glance runs, that is 5701919:56
anteayato merge in prep for a bug fix patch19:56
portantejog0:19:56
clarkbjog0: I think we should wait a little longer before really loading it back up again19:56
clarkber jeblair ^19:56
jog0anteaya: 56475 isn't approved19:56
portante https://review.openstack.org/5701919:56
* koolhead17 lurks19:56
clarkbportante: I think you can reverify that one19:57
anteayajog0: asking in -neutron19:57
jog0clarkb: 57019 and https://review.openstack.org/#/c/57018/219:57
jeblairclarkb, portante: i just reverified it19:57
clarkbjeblair: that reverify seems to have grabbed 57018 too19:58
jeblairclarkb: dependent change19:58
fungiyeah, parent19:59
fungi57511 looks most unhappy19:59
fungiDuplicateOptError: duplicate option: policy_file20:00
* portante is back20:00
fungiit'll need to be reworked i guess20:00
jog0fungi: 57511 will have to wait20:00
jog0its not critical (that we know of)20:00
fungioh, i see. that one just happened to jump in as we started zuul, wasn't one of the set we cared about20:01
clarkbfungi: I rechecked it20:02
clarkbfungi: because it is the larger oslo sync which we need to get in for longer term oslo syncing20:02
clarkbbut doesn't affect the immediate problem20:02
fungiprecise3 and precise23 are back on line in jenkins now and seem to both be running jobs20:02
*** SergeyLukjanov has joined #openstack-infra20:08
*** hogepodge has quit IRC20:08
lifelessgate-tempest-devstack-vm-neutron-large-ops: SUCCESS20:10
lifelessthats a good sign20:10
*** sdake_ is now known as randallburt20:10
fungiyeah, early signs on the critical changes look good20:10
sdagueanteaya: do you have a consolidated email of the details so far on the code sprint (if one hit the list recently, a link is good enough), so I can start running it up the chain here?20:11
clarkbI am drafting a thing at https://etherpad.openstack.org/p/icehouse-1-gate-reset20:11
anteayasdague: not yet, will provide20:11
*** derekh has joined #openstack-infra20:11
*** dstanek has quit IRC20:12
sdagueanteaya: cool, thanks20:12
jog0of course, now that we are in a fairly critical mode, it's time for me to go to lunch20:12
anteayajog0: enjoy lunch20:12
jog0didn't think the timing would be so bad20:13
jog0if anything major comes up, email is the quickest way to get me for the next 45 minutes or so20:13
jog0clarkb: thanks for writing this up, it looks like we are fixing 4 or 5 bugs all at once20:13
jog0which is good20:13
jog0at least20:13
fungijog0: according to the time estimates, 45 minutes should be just about right to find out if it worked20:14
portantegate jobs seem to be creeping in20:16
portantedo we want all these?20:16
clarkbportante: not really20:16
clarkbportante: we probably needed to shout louder about leaving the gate alone20:17
*** melwitt1 has quit IRC20:17
clarkbportante: worst case we do what we did again and yell louder :) I think we will just live with it for now, the important things are looking good so far20:17
portanteyes20:17
*** melwitt has joined #openstack-infra20:17
portantek20:17
jeblairclarkb: well, we haven't asked anyone to do that yet, so it's no surprise they didn't listen20:17
clarkbjeblair: right20:18
jeblairclarkb: but at any rate, yeah, i don't think it's a big deal.  if we have to move something else, we can just kill it again20:18
clarkbjeblair: I am trying to get a cohesive thought going in https://etherpad.openstack.org/p/icehouse-1-gate-reset20:18
clarkbjeblair: I agree20:18
*** randallburt is now known as sdkae20:19
clarkbbah grenade just failed in the nova fix20:19
*** sdkae is now known as sdake_20:19
portanteyup20:19
*** vipul is now known as vipul-away20:19
*** vipul-away is now known as vipul20:19
jeblairclarkb: because of the problem fixed in the grenade fix?20:19
jeblairer devstack20:19
fungirequest timeout on verify resize20:20
clarkbjeblair: maybe20:20
clarkbportante: any chance you can look at the logs?20:20
*** ilyashakhat has quit IRC20:20
portanteyes20:20
*** DinaBelova has joined #openstack-infra20:20
portantegotta link?20:20
fungiportante: https://jenkins02.openstack.org/job/gate-grenade-devstack-vm/17023/consoleFull20:21
*** sdake_ is now known as randallburt20:21
*** ilyashakhat has joined #openstack-infra20:21
portantewhat about the syslog.txt file?20:22
*** eharney has joined #openstack-infra20:22
fungigetting20:22
portantek thx20:23
fungiportante: http://logs.openstack.org/09/57509/2/gate/gate-grenade-devstack-vm/d2252da/logs/20:24
jeblairok, all those stuck nodes on jenkins02 are deleted20:25
*** randallburt is now known as sdake_20:26
fungiclarkb: your wording in the etherpad is spot on. i don't see a thing i disagree with or would change20:27
jeblair++20:27
fungialso thank you for drafting that20:28
*** dstanek has joined #openstack-infra20:28
clarkbcool, do we want to send it as a collective?20:29
*** ilyashakhat has quit IRC20:29
clarkb(I don't mind being the mail list target if not :) )20:29
fungii'm fine with it either way, but your words can be your words and we can jump in if there's any contention20:30
*** ilyashakhat has joined #openstack-infra20:30
*** sandywalsh has quit IRC20:30
clarkbhttp://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-full/c500394/console.html it failed the full tempest runs too20:31
jeblairclarkb: ++ what fungi said20:31
clarkbin similar ways maybe? I think this bug fix may expose something else? boy wouldn't that be fun20:31
*** hashar has left #openstack-infra20:31
anteaya+1 on the ether pad clarkb20:32
*** hashar has joined #openstack-infra20:32
clarkbok /me sends mail to openstack-dev20:32
clarkbwe can retry the reverify and see if the cahnges ahead of it help20:33
* portante still looking20:33
* clarkb awaits portante's analysis20:34
portanteguys, the rsyslog buffers are truncated20:34
*** kgriffs has joined #openstack-infra20:34
portantepki token values are huge20:34
portanteand some of the tracebacks are cut off20:34
portantecan we get a syslog config bumped to use bigger buffers20:34
clarkbwe did that grizzly time frame, maybe we didn't go big enough20:34
clarkbportante: this is one reason we don't use syslog for most of the service logging though20:35
portantethey appear to be 2K20:35
clarkbhmm I thought we bumped to 65k20:35
portanteyes20:35
portantemaybe something else is truncating, then20:35
clarkboh we ensure absent on the file that bumped the buffer20:36
clarkbwe must've reverted that change20:36
portantehmm20:37
portantewe might be able to put an option in to not log the entire PKI token, but that won't help with the Tracebacks20:37
jeblairportante: what is it that's only logged to syslog and not to a file?20:37
lifelessclarkb: is it 'gear' we use?20:38
clarkblifeless: yes20:38
lifelessclarkb: thats not on pypi?20:38
clarkblifeless: it is20:38
lifelessclarkb: do we run gearmand?20:38
clarkbjeblair: swift proxy logs20:38
clarkblifeless: no we run geard20:38
lifelessclarkb: or the python thing you pointed me at?20:38
jeblairclarkb: that's not screen-s-proxy.txt ?20:39
lifelessderekh: dprince: pleia2: https://pypi.python.org/pypi/gear20:39
fungiclarkb: didn't we undo the buffer change because we were overrunning/crashing/something rsyslog?20:39
portantejeblair, not sure what you mean20:39
jeblairportante: eg http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-full/c500394/logs/screen-s-proxy.txt.gz20:39
clarkbjeblair: it is but that file doesn't have timestamps and other nice things20:39
jeblairclarkb: i see.20:40
clarkbfungi: oh yeah, the service would completely fall over resulting in test failures20:40
jeblairportante: can you add timestamps and other nice things to the swift proxy logs?20:40
hasharjeblair: hi :-D I had an issue with the Gearman Jenkins plugin. What would be the best way to expose it: openstack-infra list || launchpad bug ?20:41
dprinceclarkb/lifeless: then why in the world do we have a puppet module for saz-gearman20:41
* dprince is confused20:42
jeblairhashar: https://launchpad.net/gearman-plugin20:42
clarkbdprince: because we were going to use C gearman, but then we found out C gearman is special in a couple ways20:42
hasharjeblair: thanks :-]20:42
portantejeblair: see https://review.openstack.org/5669220:42
dprinceclarkb: so we can nuke that puppet module?20:42
openstackgerritMathieu Gagné proposed a change to openstack-infra/jenkins-job-builder: Ensure jobparams and group_jobparams are dict  https://review.openstack.org/5752520:42
dprinceclarkb: the source of my confusion I think!!!20:42
jeblairdprince: yes20:42
clarkbdprince: probably, unless jeblair wants to move to the C server at some point20:42
jeblairportante: neat :)20:43
portantebut that is only for proxy request logging20:44
*** blamar has quit IRC20:44
portantejeblair: ^20:44
* clarkb finally sends mail. hopefully we get good discussion20:44
*** sandywalsh has joined #openstack-infra20:44
portantejeblair: it does not fix it for the other logs, that depends on how that is configured in devstack20:45
portantedo you know how that happens?20:45
jeblairportante: i'm not expert in that, but i'd expect something in http://git.openstack.org/cgit/openstack-dev/devstack/tree/lib/swift20:46
*** vipul is now known as vipul-away20:48
*** SergeyLukjanov has quit IRC20:49
*** yolanda has quit IRC20:49
*** hogepodge has joined #openstack-infra20:49
*** blamar has joined #openstack-infra20:50
anteayawe have a patch offered https://review.openstack.org/#/c/57290/20:50
*** senk has quit IRC20:50
anteayabut it hasn't passed check yet20:50
anteayawe hope it will address https://bugs.launchpad.net/swift/+bug/122400120:51
uvirtbotLaunchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [High,In progress]20:51
clarkbanteaya: looks like check tests are running on it now20:51
anteayacool20:51
*** vipul-away is now known as vipul20:51
*** marun has quit IRC20:53
anteayaneutron likes 57290 so once it has passed check they are ready to approve20:53
portantejeblair: looks like some uncaught exceptions in swift, not sure if they are related yet20:54
*** rpodolyaka1 has joined #openstack-infra20:54
*** senk has joined #openstack-infra20:54
clarkbI am going to go grab a quick bite for lunch while we are playing the waiting game.20:54
clarkbback in a bit20:55
anteayafungi jeblair 57290 has both check and gate jobs running on it at the same time20:55
anteayathey were keen to get it through20:55
fungianteaya: that's possible if it was approved or reverified while checks were running20:55
anteayaI would like it to pass check first and then queue for gate20:55
anteayait was20:55
anteayasuggestions at this point?20:56
anteayaleave it or ask for the approval to be removed?20:56
fungijeblair: clarkb: i gave up trying to figure out whether the 16 xvd limit is an underlying limitation on rackspace's xen rev and just opened a support case with them instead. Ticket ID20:56
fungi    131120-00494-320:56
jeblairfungi: cool20:56
jeblairanteaya: leave it20:56
anteayaleaving it20:56
*** jergerber has quit IRC20:57
*** MarkAtwood has quit IRC20:58
*** alcabrera has quit IRC20:59
portantejeblair: okay, I see, swift is run behind apache in this case, and so the default logging we do to the console does not add a timestamp21:00
*** DinaBelova has quit IRC21:00
*** shardy has quit IRC21:01
*** rpodolyaka1 has quit IRC21:01
*** sarob has joined #openstack-infra21:01
anteayawon't matter anyway, the dependency patch failed in the gate, it appears it had never passed check21:01
anteayawe are chatting about the importance of passing check tests in -neutron21:01
anteayawell, I am anyway21:01
*** sarob has quit IRC21:02
*** shardy has joined #openstack-infra21:03
*** ^d has quit IRC21:03
*** sarob has joined #openstack-infra21:04
*** kgriffs is now known as kgriffs_afk21:04
fungilooks like the devstack 57373 change we wanted is also failing out on grenade in the post-upgrade tempest run (setupclass on tempest.api.compute.servers.test_server_addresses.ServerAddressesTestXML and tempest.api.volume.test_volumes_actions.VolumesActionsTest as well as tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm)21:04
*** ^d has joined #openstack-infra21:04
*** sarob has quit IRC21:06
*** sarob has joined #openstack-infra21:06
pabelangerclarkb, fungi: Was there any discussion about publishing horizon to pypi at the summit?21:06
pabelangerIn reference to https://review.openstack.org/#/c/54795/21:07
fungipabelanger: not sure--i wasn't in any of the horizon sessions though there was some general discussion of possibly publishing all things to pypi once mordred's wheels bits are in and working as intended21:07
pabelangerfair enough21:08
portantejeblair, jog0: https://review.openstack.org/5752621:09
david-lylepabelanger: we also plan to split horizon into a ui-toolkit library and what is now the openstack dashboard, but that will only get half to pypi without the wheels change21:09
notmynameportante: devstack is running swift behind apache?21:10
portanteapparently21:11
notmynameweird21:11
pabelangerdavid-lyle, Ya, that's the main reason I wanted to see it up on pypi.  I'm building a dashboard on top of it :)  I should idle back in #openstack-horizon again to follow that development21:11
david-lylepabelanger: looking at i-2 for that split21:12
portanteprobably not a good idea to only run behind apache, we should probably have environments using the WSGI wrappers we provide21:12
mordredpabelanger, fungi what?21:12
* pabelanger is excited21:12
*** herndon_ has joined #openstack-infra21:12
pabelangermordred, I opened a review about publishing horizon to pypi before the summit. Was asked to hold off until people could talk more about it, was just looking for an update about it21:13
fungimordred: i thought we had discussed expanding the scope of which projects we publish to pypi once we're able to safely do prerelease versions there and have signing of pypi uploads in place21:13
mordredyes21:13
mordredtwo things21:13
mordreda) splitting horizon and dashboard21:13
mordredb) publishing all the things to pypi21:13
mordredboth are on the roadmap21:13
pabelangerdanke21:13
fungithat matches my recollection21:14
pabelangerI'll abandon my review for the time being21:14
*** DennyZhang has joined #openstack-infra21:15
openstackgerritDan Prince proposed a change to openstack-infra/config: Drop the saz-gearman module (we don't use it)  https://review.openstack.org/5752721:16
*** vipul is now known as vipul-away21:20
*** vipul-away is now known as vipul21:20
portantejeblair: where are we at with the rsyslog buffer size thread of inquiry?21:20
*** otherwiseguy has joined #openstack-infra21:22
lifelessdid 57509 fail?21:22
anteayafungi jeblair otherwiseguy wants to create this dependency chain 53188 -> 54747 -> 5729021:23
lifelessoh, the other flaky test shot it in the head?21:23
*** dprince has quit IRC21:23
*** hashar has quit IRC21:23
jeblairportante: i thought clarkb was working on that; i don't recall the changes/reversion he mentioned (i may not have been a part of that)21:23
anteayagiven that he has 53188 -> 54745 already21:23
portantesorry, jeblair, clarkb?21:24
anteayahow does he add 57290 to the end?21:24
anteayait is 54745 right otherwiseguy? not 5474721:24
jeblairanteaya: git review -d 54747; git review -x 57290; git review21:24
otherwiseguyyeah, 5475721:24
otherwiseguyjeblair: thanks21:25
* otherwiseguy tries21:25
jeblairlifeless: i believe portante is looking into 5750921:25
anteayajeblair: thank you21:26
lifelessjeblair: it ran into https://bugs.launchpad.net/tempest/+bug/123035421:26
uvirtbotLaunchpad bug 1230354 in tempest "tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario fails sporadically" [Undecided,Confirmed]21:26
lifelessjeblair: so is now requeued behind a bunch of other gate jobs21:26
fungiall three of the fixes we tried to queue up front failed grenade21:27
anteayait was 5747521:27
fungithe swift one (well, its parent) just tanked too21:27
jeblairportante: i think that conflicts with https://review.openstack.org/#/c/57373/4 (jog0's change that lists you as a co-author)21:29
jeblairfungi: 57509, 57373, and what was the 3rd?21:30
jeblairfungi: oh https://review.openstack.org/#/c/57019/ (and 18)21:31
fungiyeah, that one21:31
*** LarsN has quit IRC21:32
*** hogepodge has quit IRC21:32
morganfainbergperhaps there should be a zuul "mode" that allows administrative loading of tasks but just spools or ignores other tasks?21:33
jeblairso 509 failed on a bug we don't have a fix for21:33
*** kgriffs_afk is now known as kgriffs21:33
portanteregarding 509, it is hard to tell what happened with swift, but it does not appear to be part of the cause of 50921:34
jeblairmorganfainberg: we try to empower the core teams of the projects themselves.  we don't like to engineer things that make us special gatekeepers.21:34
morganfainbergjeblair, thats fine, but in cases like today, perhaps it would be worthwhile?21:34
fungislippery slope21:35
*** hogepodge has joined #openstack-infra21:35
portantejeblair: yes, I'll fix that21:35
otherwiseguyjeblair: that worked great. thanks again.21:35
morganfainbergfungi, agreed. but sometimes it's worth considering for extraordinary times21:35
jog0I see the gate still doesn't look good21:38
jeblairjog0: https://review.openstack.org/#/c/57509/ failed on https://bugs.launchpad.net/tempest/+bug/1230354 ; can you look at that bug?  it doesn't look very fleshed out21:39
uvirtbotLaunchpad bug 1230354 in tempest "tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario fails sporadically" [Undecided,Confirmed]21:39
jog0jeblair: looking21:40
jog0jeblair: anything get merged?21:41
jeblairjog0: no; 57509, 57373, 57019 all failed21:41
fungijog0: not any of the changes we put at the front anyway. all failed out21:41
jog0:(21:42
jeblairfungi: 019 just got a comment saying someone is rebasing it on master (instead of 018), so that may improve its chances21:42
jog0clarkb: what happened to your tempest test patch to stable?21:42
jog0jeblair: that bug is hella vague21:42
*** hogepodge has quit IRC21:43
jog0this just reinforces fungi's theory about bugs21:44
jeblairfungi, jog0: which is?21:44
jog0jeblair: something about bugs making friends with more bugs21:45
jog0"nondeterministic failures breed more nondeterministic failures, because people are so used to having to reverify their patches to get them to merge that they are doing so even when it's their patch which is introducing a nondeterministic bug" fungi21:46
*** nati_ueno has joined #openstack-infra21:46
fungiahh, that postulate21:46
*** kgriffs is now known as kgriffs_afk21:48
*** senk has quit IRC21:49
jog0clarkb: http://logs.openstack.org/97/44097/18/check/gate-tempest-devstack-vm-postgres-full/270e611/logs/screen-n-cpu.txt.gz#_2013-09-25_15_07_49_82421:51
jog0I can't seem to find 'iSCSI device not found at' in elasticSearch21:51
jog0any ideas21:51
jog0ohh that stack trace is old21:52
jog0never mind21:52
jog0shitty bug reports21:52
*** dcramer_ has joined #openstack-infra21:52
*** MarkAtwood has joined #openstack-infra21:54
*** hogepodge has joined #openstack-infra21:54
jog0so https://review.openstack.org/#/c/57509/ had 3 failed jobs21:54
jog0can someone help me hunt the cause down21:54
jog0dims21:54
openstackgerritKhai Do proposed a change to openstack-infra/config: add nodepool to jenkins-dev server  https://review.openstack.org/5733321:55
clarkbjog0: portante was looking at them to see if the swift bugs which had fixes behind it caused it to fail21:55
jog0cool21:55
jog0we have http://logs.openstack.org/09/57509/2/gate/gate-grenade-devstack-vm/d2252da/21:55
portantestill noodling over 509, we'd like to get the rsyslog buffers increased so we can see more21:55
jog0http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-postgres-full/9a14cc3/21:55
jog0portante: 509 failure for which one21:55
clarkbportante: I don't know if you caught it but we had to revert the larger rsyslog buffers because it was killing the syslog service21:56
jog0or http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-full/c500394/21:56
portanteyuck21:56
portantehow much larger did you go?21:56
clarkbportante: 65k iirc I will check logs21:56
fungiapparently the volume of log data we spam to syslog during devstack/tempest is more than it can deal with21:56
portantewe don't need 65k buffers, just 4 or 6k21:57
*** kgriffs_afk is now known as kgriffs21:57
portantewe have Tracebacks that are lost21:57
clarkb64k21:57
jog0lifeless: https://review.openstack.org/#/c/57509/ hit an AssertionError: Console output was empty.21:57
jog0http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-postgres-full/9a14cc3/testr_results.html.gz21:57
fungialso, that may have been back when we were using 2gb flavors for devstack tests clarkb?21:57
jog0lifeless: so that wasn't the fix21:57
clarkbportante: ok I can propose a change that bumps it to 6k21:57
clarkbfungi: maybe21:57
*** sdake_ has quit IRC21:57
portantegreat21:58
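For context, a sketch of the kind of rsyslog tweak being discussed, under the assumption that the knob in question is rsyslog's global $MaxMessageSize directive; the actual change (57538) may adjust a different setting or file:

    # raise rsyslog's message size limit so long Python tracebacks are not
    # truncated; 6k per portante's suggestion (64k was tried earlier and reverted)
    # the directive must appear before the input modules are loaded
    sudo sed -i '1i $MaxMessageSize 6k' /etc/rsyslog.conf
    sudo service rsyslog restart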
jog0lifeless: although it's *A* fix21:58
portanteI'll fix that PKI token logging change which should help to reduce the proxy log sizes, at least21:58
jog0this grenade failure is hurting us bad22:00
jog0mikal ^22:00
jog0http://logs.openstack.org/09/57509/2/gate/gate-grenade-devstack-vm/d2252da/testr_results.html.gz22:00
jog0HALP22:00
*** ericw has quit IRC22:00
*** zul has quit IRC22:00
openstackgerritClark Boylan proposed a change to openstack-infra/config: Increase rsyslog buffer sizes.  https://review.openstack.org/5753822:00
*** sarob_ has joined #openstack-infra22:00
clarkbportante: fungi ^22:01
*** sarob_ has quit IRC22:01
portanteon it22:02
portanteclarkb: thanks22:02
mikalOh hai22:03
mikaljog0: so the oslo thing didn't fix console logs?22:04
jog0mikal: no :(22:04
mikal(Sorry, doing three things at once, so not sure where we're up to here)22:04
jog0it fixed something else though22:04
jog0just can't get it merged22:04
*** sarob has quit IRC22:04
jog0mikal: we are nowhere here22:04
mikalSo... That revert didn't completely solve things.22:04
jog0nothing is merging still22:04
jog0mikal: we are on fire22:04
mikalBut it made things a little bit better.22:04
mikaljog0: doomed!22:05
jog0mikal: we can't merge anything22:05
mikalI feel a bit like we should stop all code approvals until we have this fixed22:05
jog0I am looking at this grenade failure http://logs.openstack.org/09/57509/2/gate/gate-grenade-devstack-vm/d2252da/logs/new/22:05
jog0mikal: ++++22:05
mikali.e. shut down the merge pipeline except for attempts to fix the gate22:05
mikalCould we script removing the approve bit from all patches?22:05
mikalOr add a -2 to all approved patches in a way we could detect and remove later?22:05
*** ryanpetrello has quit IRC22:05
jeblairi kind of think we should ask people to do that first22:06
mikalThis has been going for days now22:06
mikalIt's not fun anymore22:06
* mikal has more gray hair22:06
jeblairand then solve it technically only if people don't listen22:06
clarkbI'm with jeblair on this22:06
mikalCertainly I think an email saying "approve nothing" is justified22:06
jeblairclarkb: have you sent your email yet?22:07
clarkbjeblair: I did send mine22:07
jog0mikal: works for me22:07
mikaljog0: got any thoughts on that revert and if we should try it for reals?22:07
mikaljog0: it doesn't seem conclusive to me...22:07
jeblairmikal, jog0: maybe you could follow up clarkb's email with a stronger "okay, i really think we should stop approvals" msg22:07
mikalSure22:08
*** UtahDave has quit IRC22:09
*** hogepodge has quit IRC22:09
jog0I nominate mikal22:09
* mikal is drafting something now22:09
jog0mikal: I am not sure about the revert22:09
*** branen has quit IRC22:10
mikaljog0: yeah, I had hope, but it's not as clear cut as I was looking for22:10
fungiclarkb: portante: on 57538, we're still going to have to wait for new nodepool images, unless we want to start new ones building here shortly and then expire out existing servers22:11
*** openstackstatus has joined #openstack-infra22:11
clarkbfungi: yeah, I can babysit that if we go down that road22:11
portantefungi: you mean in order to get the bigger rsyslog buffers?22:11
fungiportante: yes22:12
*** hogepodge has joined #openstack-infra22:12
jog0mikal: I am looking into https://bugs.launchpad.net/tempest/+bug/1252170 which blocked the oslo sync22:12
uvirtbotLaunchpad bug 1252170 in tempest "tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm[compute] failed" [Critical,Confirmed]22:12
openstackgerritA change was merged to openstack-infra/config: Increase rsyslog buffer sizes.  https://review.openstack.org/5753822:12
fungiportante: that's in a file baked into the machine image22:12
*** ryanpetrello has joined #openstack-infra22:12
portantefungi: oh, okay22:12
mikaljog0: sigh22:12
jog0which we think fixes bug #125178422:12
uvirtbotLaunchpad bug 1251784 in tripleo "nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached (dup-of: 1251920)" [Critical,New] https://launchpad.net/bugs/125178422:12
uvirtbotLaunchpad bug 1251920 in nova "Tempest failures due to failure to return console logs from an instance" [Critical,In progress] https://launchpad.net/bugs/125192022:12
*** kgriffs is now known as kgriffs_afk22:12
clarkbfungi: I will kick off image builds via nodepool shortly (I believe we can manually trigger them)22:13
fungiclarkb: yeah, it's easy, but you'll want to wait until that hits the puppet master first obviously22:13
mikaljog0: what was the failure rate as a percentage for 1251920?22:13
clarkbfungi: yup22:13
jog0mikal:  we don't know, but a crap ton22:14
lifelessthat's the new metric22:14
jog0it had the highest rate by far22:14
fungii have to disappear shortly to cook dinner, then i'll jump back in and keep beating on things with the rest of you22:14
jog0but don't know % vs total22:14
mikaljog0: so 2 out of 9 fails might be less?22:14
*** kgriffs_afk is now known as kgriffs22:14
mikaljog0: I'm wondering if we should just try the revert to see what happens at scale22:14
mikaljog0: mostly because I am out of other ideas22:14
jog0mikal: hmm22:14
jog0I don't disagree with that logic22:15
*** ekarlso has quit IRC22:15
jog0I think at some point we may need to do a git-bisect in all of havana22:15
jog0err icehouse22:15
mikalThat's super painful...22:15
mikalWe'd have to hook bisect up with git review somehow22:15
mikalAnd then do rechecks a few times for each review22:15
mikalI'm not sure what that would look like22:15
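What mikal is describing would look roughly like an ordinary bisect with the automated "test" step replaced by pushing each candidate through the check queue and rechecking it a few times. A purely illustrative sketch; the havana tag name and the manual good/bad verdicts are assumptions:

    # bisect the suspect window between the havana release and the current tip
    git bisect start
    git bisect bad HEAD          # tip of master, known to hit the failure
    git bisect good 2013.2       # havana release tag, believed good
    # at each step, push the checked-out commit up for testing (e.g. via
    # git review), recheck it several times, then record the verdict by hand:
    git bisect good              # or: git bisect bad
    # repeat until a single commit remains, then clean up
    git bisect reset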
clarkbjust revert all of icehouse and start over >_> <- sorry I couldn't help it22:16
mikalclarkb: tempting22:16
*** DennyZhang has quit IRC22:16
*** branen has joined #openstack-infra22:17
*** ekarlso has joined #openstack-infra22:17
clarkbrebuilding d-g images now22:17
*** sarob has joined #openstack-infra22:17
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce grouping like imports together  https://review.openstack.org/5440222:18
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering  https://review.openstack.org/5440322:18
fungifork openstack at havana and slowly cherry-pick everything back in ;)22:19
*** dkliban_ has quit IRC22:20
mikalfungi: well, Havana _works_ at least22:21
jog0fungi: just squash and bisect22:21
jog0shouldn't be too hard22:21
jog0mikal:  we already are making sure it's not a tempest issue22:21
fungiclarkb: at the current rate of slave turnover, there may be little point in expiring existing servers. they seem to be getting used as fast as we can build replacements anyway22:22
fungijudging from the state of the graph22:22
clarkbfungi: nice, less work for me :)22:22
clarkb6 new images are building22:22
*** ryanpetrello_ has joined #openstack-infra22:23
fungii suspect by the time you saw the image builds complete and wrote up the command to delete the ready images, they'd already be claimed by new jobs anyway22:23
anteayalooking at https://review.openstack.org/#/c/57475/ Jenkins returns success but is not voting +122:25
anteayais that intentional?22:26
anteayaI saw it on another patch earlier22:26
*** kgriffs is now known as kgriffs_afk22:26
clarkbanteaya: the check tests came back successful then the gate jobs started22:26
clarkbanteaya: you are still waiting for the gate jobs to run22:26
*** ryanpetrello has quit IRC22:26
*** ryanpetrello_ is now known as ryanpetrello22:26
*** dangers is now known as danger_fo_away22:26
anteayaah okay, thanks22:27
anteayathe +1 disappears while the gate jobs are running22:27
lifelessok so does infra need more nodes?22:28
lifelessclarkb: ^22:28
mgagnezaro: ping22:29
clarkbjeblair: fungi: http://logs.openstack.org/06/57506/1/check/check-tempest-devstack-vm-postgres-full/9150091/logs/devstack-gate-setup-workspace-new.txt is an interesting failure22:29
clarkblifeless: not anymore22:29
clarkblifeless: killing things and focusing on fixing the gate has freed up breathing room22:29
clarkblifeless: we may need more when we get tests passing again, but I don't want to worry about that now22:30
mgagnezaro: https://review.openstack.org/#/c/57525/ Antoine thinks it would be better to force jobparams to be a dict instead22:30
clarkbjeblair: fungi: error: Failed connect to zuul.openstack.org:80; Connection timed out while accessing http://zuul.openstack.org/p/openstack/tempest/info/refs I am going to look at haproxy logs now22:30
clarkber nevermind that is zuul22:30
clarkbhmm load average on zuul is >2022:33
lifelessbussay22:33
chmouelclarkb: any chance to get a second +2 on this https://review.openstack.org/#/c/56927 (should be trivial)22:34
*** kgriffs_afk is now known as kgriffs22:34
chmouelor anyone else from infra with +2 ^22:34
*** joshuamckenty has joined #openstack-infra22:34
clarkbchmouel: right now new projects are basically at the bottom of the priority queue22:34
*** dcramer_ has quit IRC22:34
jog0chmouel: http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html22:35
joshuamckentyHey happy -infra folks22:35
clarkbchmouel: happy to look at it once things settle down22:35
chmouelah how sorry guys22:35
joshuamckentycould you set up a new lists.openstack.org mailing list for the DefCore committee please?22:35
jog0joshuamckenty: http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html22:35
chmouelclarkb, jog0: good luck to you all22:35
joshuamckentywe need myself (josh@openstack.org) and Rob Hirschfeld (rob_hirschfeld@dell.com) as admins22:35
clarkbjoshuamckenty: you can propose the change in puppet22:35
joshuamckentygotcha22:35
clarkbjoshuamckenty: openstack-infra/config/modules/openstack_project/manifests/lists.pp there should be examples in that file22:36
joshuamckentysweet22:36
jeblairjoshuamckenty: http://ci.openstack.org/lists.html <-- docs22:36
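The workflow clarkb and jeblair point at boils down to editing lists.pp in the infra config repo and pushing the change through gerrit. A rough sketch; the clone URL and branch name are illustrative, and the puppet syntax to copy lives in the existing entries in that file:

    # grab the infra config repo and branch for the change
    git clone https://review.openstack.org/openstack-infra/config
    cd config
    git checkout -b defcore-list
    # copy an existing list definition in this file and adjust the name and admins
    $EDITOR modules/openstack_project/manifests/lists.pp
    git commit -a -m "Adding the DefCore committee list"
    git review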
*** thomasem has quit IRC22:36
portantejog0: new patch to devstack/lib/swift22:36
portantepatches, really22:36
clarkbjeblair: lots of git upload-packs running on zuul. maybe we need to get those behind git.o.o by replicating zuul refs there? that ends up being messy iirc22:37
*** changbl has quit IRC22:38
jeblairclarkb: yes, if we want to spread that load, we should probably replicate to different repos; i'm not keen on zuul refs being in the canonical ones22:38
jeblairclarkb: we could potentially use the same git server farm just with different paths, or make another farm22:39
*** jhesketh_ has quit IRC22:39
*** thingee has joined #openstack-infra22:39
clarkbjeblair: possibly split the farm as I think it is currently far overpowered22:39
*** jhesketh_ has joined #openstack-infra22:40
thingeenoticed a temporary connection issue to mirror.rackspace.com22:40
thingeehttp://logs.openstack.org/06/57406/2/check/check-tempest-devstack-vm-neutron/590e7e0/console.html22:40
*** MarkAtwood has quit IRC22:42
*** MarkAtwood has joined #openstack-infra22:42
*** yassine has quit IRC22:42
jog0thingee: we have bigger issues ATM http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html22:42
clarkband there isn't much we can do about that (other than run our own mirror)22:42
*** mriedem has quit IRC22:43
*** dkranz has quit IRC22:43
*** kgriffs is now known as kgriffs_afk22:43
*** branen has quit IRC22:44
jeblairif it keeps happening, we will, but it's generally fairly stable.  it's good to have reports of when it does happen so we can keep an eye on it.22:44
jog0hopefully this will fix the grenade issue https://review.openstack.org/#/c/57357/22:44
*** hogepodge has quit IRC22:44
jeblair#status alert Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html22:45
openstackstatusNOTICE: Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html22:45
*** ChanServ changes topic to "Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html"22:45
thingeejeblair: thanks22:45
*** zul has joined #openstack-infra22:45
*** hdd has quit IRC22:45
clarkbhaving the data is useful; in fact I think there is a bug for that problem (/me looks really fast) and we can add this as a data point22:46
*** rcleere has quit IRC22:46
lifelessclarkb: are we starved of non-dg nodes? e.g. will things that don't trigger d-g be ok?22:47
*** ilyashakhat has quit IRC22:47
lifelessclarkb: or do you want everything halted ?22:47
clarkbthingee: https://bugs.launchpad.net/openstack-ci/+bug/1251117 is the bug, you can attach a comment with a link to that build log22:47
uvirtbotLaunchpad bug 1251117 in openstack-ci "devstack-vm gate failure due to apt-get download failure" [Undecided,New]22:47
clarkblifeless: currently we seem to be ok with non d-g slaves22:47
jog0we are close to getting a patch merged !22:48
clarkblifeless: you are probably ok approving things that are not part of the integrated gate22:48
clarkbjog0: woot22:48
lifelessI'm just swimming up through my daily review stuff, seeking to avoid causing headaches22:48
lifelessclarkb: ok, thanks22:48
*** ilyashakhat has joined #openstack-infra22:48
thingeeclarkb: done22:49
openstackgerritJoshua McKenty proposed a change to openstack-infra/config: Adding the DefCore committee list  https://review.openstack.org/5754722:51
*** pcm_ has quit IRC22:51
*** dkliban_ has joined #openstack-infra22:51
fungilifeless: clarkb: we do end up starved for all nodes once the gate grows deep enough that there are more jobs than we have machines (in terms of nodepool quota room and static slaves) due to thrash when there are gate resets, but i don't know that more nodes really solves that situation22:52
* portante steps out for a bit, back in a few hours22:54
clarkbfungi: portante: new d-g images in hpcloud are in. still waiting on rax22:55
*** joshuamckenty has quit IRC22:56
*** branen has joined #openstack-infra22:56
*** weshay has quit IRC22:58
*** masayukig has joined #openstack-infra23:00
*** eharney has quit IRC23:02
*** jhesketh__ has joined #openstack-infra23:03
*** sarob has quit IRC23:03
*** Ng has quit IRC23:03
*** sarob has joined #openstack-infra23:03
jog0so we have 4 patches we think should help23:07
jog0but don't have a fix for the big one, the console log23:07
jog0mikal lifeless ^23:07
jog0I think we are back to the drawing board for console log23:07
mikaljog0: as in the revert is a dead end?23:07
mikaljog0: bugger23:07
jog0mikal: so we didn't try the revert neutron bug23:08
*** sarob has quit IRC23:08
jog0mikal: I am refering to https://review.openstack.org/#/c/57509/23:08
*** joshuamckenty has joined #openstack-infra23:08
*** joshuamckenty has quit IRC23:10
jog0mikal: if you have any ideas I am for it23:13
*** flaper87 is now known as flaper87|afk23:13
anteayajog0: something -neutron can help with?23:15
anteayawe are standing by23:15
*** michchap_ has quit IRC23:15
mikaljog0: I am out of ideas23:16
mikalI will continue grinding though23:16
*** changbl has joined #openstack-infra23:17
jog0mikal: yeah me too23:17
jog0mikal: if you want a different bug23:18
*** michchap has joined #openstack-infra23:18
anteayahopefully when 57290 goes through, https://bugs.launchpad.net/swift/+bug/1224001 will be fixed from -neutron23:18
uvirtbotLaunchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [High,In progress]23:18
*** slong has joined #openstack-infra23:19
jog0anteaya: fingers crossed23:19
*** herndon_ has quit IRC23:20
anteayayeah here too23:21
clarkbjog0: can we get a tl;dr status report? 57509 does not fix the console thing but does fix a valid bug, 57290 fixes a neutron bug, both are in the gate now fingers crossed23:21
clarkbjog0: 57357 is also in the gate to disable the v3 tests because they were causing problems with grenade23:22
clarkbdoes that leave us with the major bug being 1251920?23:22
jog0clarkb: clueless on 125192023:23
jog0mikal and I are questioning our sanity23:23
clarkbfungi: portante: every cloud region but rax dfw has the new d-g image23:24
jog0clarkb: looking for other ideas23:24
jog0may just revert all the things23:24
*** fifieldt has joined #openstack-infra23:24
clarkbI suggested that crazy idea and think it is overkill23:25
clarkbmostly because for example with tempest the diff is >7k lines between now and havana23:25
clarkbI imagine nova is worse23:25
jog0clarkb: and we have https://bugs.launchpad.net/tempest/+bug/1252170 which we have a partial fix for23:26
uvirtbotLaunchpad bug 1252170 in tempest "tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm[compute] failed" [Critical,In progress]23:26
clarkbjog0: is that in the gate yet? /me looks at bug23:26
*** loq_mac has joined #openstack-infra23:28
*** datsun180b has quit IRC23:29
jog0part of it is (the v3 disable)23:29
notmynameare you suggesting reverting every patch that has landed in every openstack project since havana and then running them each through to see what breaks and find the cause of the instability?23:29
notmynameas the crazy idea23:29
jog0notmyname: if that was easy, then yes23:30
notmynamejog0: I posit that it isn't easy ;-)23:30
clarkbthe "joke" I made was revert to havana and start over :) nevermind trying to find what causes the breakage :)23:30
notmynameya. just making sure I actually read all that correctly23:30
jog0notmyname: revert large chunks of nova23:31
jog0(not merge though)23:31
*** markmc has quit IRC23:34
*** thedodd has quit IRC23:34
*** sdake_ has joined #openstack-infra23:37
*** mfer has quit IRC23:37
clarkbjog0: https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-large-ops/14880/console :(23:38
jog0clarkb: WAT23:39
jog0what's that from23:39
fungiso... maybe not entirely easy but... hook git bisect into git review coupled with multiple rechecks and see what you can zero in on? probably completely bonkers23:39
clarkbjog0: http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-large-ops/1171354/logs/screen-key.txt.gz23:40
clarkbkeystone is saying the address is already in use, we have seen that infrequently. jeblair double checked last time to make sure the slave wasn't used twice23:40
jeblairfungi: better if you can also mark them WIP as you add them23:40
clarkbI can do a quick sanity check on that now23:40
clarkbthe node is gone in jenkins, on to logstash23:41
jeblairclarkb: http://paste.openstack.org/show/53707/ nodepool says used once; feel free to confirm with logstash23:43
anteayareally turbo-hipster?23:43
fungianteaya: i hear you can blame jhesketh_23:43
clarkbjeblair: logstash confirms tags:"console.html" AND message:"Building remotely on" AND message:"devstack-precise-hpcloud-az2-703687"23:43
anteayajhesketh_: really?23:43
jhesketh__anteaya: umm? what am I being blamed for?23:44
fungijhesketh__: taking pride in random program names, i think23:45
jhesketh__oh right, yes, you can blame me for that23:45
anteayano, for squeezing in a gate change while we are all on gate lockdown23:45
anteayagate blocking bug fixes only, everyone watching zuul TV23:46
fungioh, i was lacking in context then, sorry23:46
jog0turby himpster lock down23:46
anteayaand up pops a turbo-hipster23:46
jhesketh__ah, so I didn't realise it was also on lockdown for stackforge too (I'm a bit out of the loop)23:46
jeblairanteaya: what change?23:46
jhesketh__very sorry guys, I'll stop23:46
clarkbanteaya: I semi sort of said people doing things for projects not part of the integrated gate were fine23:46
clarkbanteaya: we have plenty of non d-g slaves doing nothing23:46
anteayajeblair: it just finished, a stackforge23:46
anteayajhesketh__: was funny too23:47
jeblairjhesketh__, anteaya: yeah, don't worry about it if it's not in the devstack/tempest change queue (the 'openstack integrated gate')23:47
anteayaah okay23:47
jhesketh__anteaya: by zuul TV do you mean the status page?23:47
fungiyeah, things running devstack-tempest jobs mostly23:47
anteayaI've come down pretty hard on -neutron, just wanting to keep the consistency23:47
jhesketh__jeblair: okay, so it's cool if I merge through another 6 or so patches? (I'm happy to wait anyway)23:47
anteayaand btw they have responded well23:47
anteayajhesketh__: yes I call the status page zuul TV23:48
jeblairi mean, it's full name is "openstack-dev/devstack, openstack-dev/grenade, openstack-dev/pbr, openstack-infra/devstack-gate, openstack-infra/jeepyb, openstack-infra/pypi-mirror, openstack/ceilometer, openstack/cinder, openstack/glance, openstack/heat, openstack/horizon, openstack/keystone, openstack/neutron, openstack/nova, openstack/oslo.config, openstack/oslo.messaging, openstack/oslo.version, openstack/python-ceilometerclient, openstack/p23:48
fungi[...]23:48
jeblairbut we like to abbreviate it23:49
jeblairjhesketh__: yeah it won't hurt anything23:49
jhesketh__okay cool23:49
mordredjeblair: I think we should start calling it by its full name23:50
jhesketh__I know you guys have been working really hard on fixing the gates, let me know if I can do anything to help! :-)23:50
jog0jhesketh__: we got a nova bug for you23:50
jhesketh__the one everybody is stuck on?23:50
jog0no, the other one23:51
jog0:)23:51
jhesketh__oh I can look at the other one23:51
jog0jhesketh__: https://bugs.launchpad.net/tempest/+bug/125217023:51
uvirtbotLaunchpad bug 1252170 in tempest "tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm[compute] failed" [Critical,In progress]23:51
jog0we are banging away on that in nova room23:52
*** thomasem has joined #openstack-infra23:53
*** mrodden has quit IRC23:54
jhesketh__yep, watching :-)23:54
*** julim has quit IRC23:55
jog0clarkb: https://review.openstack.org/5756623:58
jog0hehe23:58
clarkbjog0: I think I know what the problem is with keystone not starting. the default keystone port is in the default linux local ephemeral port range23:58
jog0revert all work from this week23:58
jog0clarkb: FAIL23:59
jog0wow23:59
clarkbjog0: I will propose a change to devstack probably to come up with a fix23:59
jog0you should file a bug for that23:59
clarkbbasically shift the ephemeral port range23:59
jeblairclarkb: what's the default port?23:59
clarkbjog0: it is the IANA assigned port23:59
clarkbjeblair: 3535723:59
clarkblinux range is 32768 to 6100023:59
jeblairthat's terrible23:59
lifelesshttps://blueprints.launchpad.net/keystone/+spec/iana-register-port23:59
clarkbso unlikely to have problems but when you run as many tests as we do...23:59
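clarkb's diagnosis can be checked directly on a test node: keystone's admin port (35357) sits inside the kernel's default local ephemeral range, so any outgoing connection can occasionally grab it before keystone binds. A minimal sketch of the check and one possible mitigation (the actual devstack fix may differ):

    # show the kernel's ephemeral range; the historical Linux default prints
    # "32768 61000", which contains keystone's 35357
    cat /proc/sys/net/ipv4/ip_local_port_range
    # one mitigation: move the ephemeral range above the keystone port so it
    # can never be handed out to a random client socket
    sudo sysctl -w net.ipv4.ip_local_port_range="49152 61000"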

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!