Friday, 2013-10-11

*** mrodden has quit IRC00:01
<clarkb> mordred: I left a comment on the change, I think I managed to express my concern properly, but let me know if it isn't clear  00:02
*** vipul is now known as vipul-away00:04
*** openstackgerrit has quit IRC00:04
*** openstackgerrit has joined #openstack-infra00:05
*** pcm_ has quit IRC00:06
*** krtaylor has joined #openstack-infra00:07
*** weshay has quit IRC00:09
*** mrodden has joined #openstack-infra00:14
<openstackgerrit> Dan Bode proposed a change to openstack-infra/config: Add stackforge project: puppet_openstack_builder  https://review.openstack.org/51079  00:15
*** alexpilotti has quit IRC00:15
<clarkb> mordred: https://review.openstack.org/#/c/33926/5 if I +2 that do you want to babysit an approval?  00:15
<clarkb> I need to drop offline here for a bit in order to get more stuff done prior to the weekend  00:16
* clarkb AFKs to do that. I did +2 the change. I think it just needs a sanity check once in so that the next gerrit restart doesn't go sideways  00:17
*** melwitt has quit IRC00:17
*** vipul-away is now known as vipul00:17
*** dripton has joined #openstack-infra00:17
*** melwitt has joined #openstack-infra00:19
*** alchen99 has quit IRC00:20
<openstackgerrit> A change was merged to openstack-infra/config: Document how to delete a pad from Etherpad Lite  https://review.openstack.org/46329  00:20
*** CaptTofu has quit IRC00:21
*** CaptTofu has joined #openstack-infra00:21
*** hogepodge has quit IRC00:22
<openstackgerrit> Dan Bode proposed a change to openstack-infra/config: Add stackforge project: puppet_openstack_builder  https://review.openstack.org/51079  00:22
*** amotoki has joined #openstack-infra00:23
<openstackgerrit> A change was merged to openstack-infra/jenkins-job-builder: Add repo scm  https://review.openstack.org/45165  00:23
*** dripton has quit IRC00:25
*** matsuhashi has joined #openstack-infra00:27
*** senk has joined #openstack-infra00:27
*** oubiwann_ has quit IRC00:27
<openstackgerrit> A change was merged to openstack-infra/devstack-gate: Improve fallback to master branch  https://review.openstack.org/49894  00:27
<openstackgerrit> A change was merged to openstack-infra/devstack-gate: Revert "Revert "Enable q-vpn service""  https://review.openstack.org/50242  00:27
<openstackgerrit> A change was merged to openstack-infra/devstack-gate: Conditionally override PyPI for reqs integration  https://review.openstack.org/50198  00:27
*** dripton has joined #openstack-infra00:33
*** gyee has quit IRC00:35
*** dripton has quit IRC00:42
*** sandywalsh_ has joined #openstack-infra00:42
*** sandywalsh has quit IRC00:43
*** nosnos has joined #openstack-infra00:44
*** dripton has joined #openstack-infra00:45
*** krtaylor has quit IRC00:57
*** sarob has joined #openstack-infra00:58
*** senk has quit IRC01:02
*** wenlock has quit IRC01:07
*** sarob has quit IRC01:08
*** sarob has joined #openstack-infra01:09
*** melwitt has quit IRC01:12
*** DennyZhang has joined #openstack-infra01:12
*** senk has joined #openstack-infra01:14
<stevebaker> hey, are there some permissions I need to review heat proposals on http://summit.openstack.org/ ?  01:18
*** markmcclain has joined #openstack-infra01:23
*** mriedem has joined #openstack-infra01:24
<lifeless> PTL  01:26
*** yaguang has joined #openstack-infra01:27
*** yaguang has quit IRC01:27
*** yaguang has joined #openstack-infra01:28
*** basha has joined #openstack-infra01:31
*** senk has quit IRC01:39
*** chris613 has quit IRC01:48
*** guohliu has quit IRC01:49
*** jhesketh__ has quit IRC01:57
*** wenlock has joined #openstack-infra02:01
*** jhesketh has joined #openstack-infra02:02
*** ArxCruz has joined #openstack-infra02:05
*** dkranz has joined #openstack-infra02:08
*** ArxCruz_ has joined #openstack-infra02:12
*** fifieldt has joined #openstack-infra02:13
*** xchu has joined #openstack-infra02:14
*** ArxCruz has quit IRC02:15
*** sarob has quit IRC02:15
*** ArxCruz_ has quit IRC02:20
*** alchen99 has joined #openstack-infra02:24
*** krtaylor has joined #openstack-infra02:26
*** alchen99 has quit IRC02:36
*** senk has joined #openstack-infra02:40
*** guohliu has joined #openstack-infra02:43
*** locke105 has quit IRC02:44
*** crank has quit IRC02:44
*** kpepple has quit IRC02:44
*** alaski has quit IRC02:44
*** mkerrin has quit IRC02:44
*** guitarzan has quit IRC02:44
*** Reapster has quit IRC02:44
*** Vivek has quit IRC02:44
*** davidlenwell has quit IRC02:44
*** BobBall has quit IRC02:44
*** Ng has quit IRC02:44
*** alaski_ has joined #openstack-infra02:44
*** BobBall has joined #openstack-infra02:44
*** guitarzan has joined #openstack-infra02:44
*** Reapster has joined #openstack-infra02:44
*** crank has joined #openstack-infra02:44
*** Vivek has joined #openstack-infra02:44
*** kpepple has joined #openstack-infra02:44
*** Ng has joined #openstack-infra02:44
*** locke105 has joined #openstack-infra02:44
*** senk has quit IRC02:44
*** Vivek is now known as Guest8658602:45
*** mkerrin has joined #openstack-infra02:45
*** davidlenwell has joined #openstack-infra02:45
*** erfanian has joined #openstack-infra02:49
*** mriedem has quit IRC02:49
*** matsuhashi has quit IRC02:57
<lifeless> mordred: you might care about https://bugs.launchpad.net/tripleo/+bug/1222306  02:57
<uvirtbot> Launchpad bug 1222306 in tripleo "can't install keystone with pypi mirror" [Medium,Triaged]  02:57
<lifeless> mordred: or https://bugs.launchpad.net/tripleo/+bug/1222308  02:57
<uvirtbot> Launchpad bug 1222308 in tripleo "can't install cinderclient with pypi mirror" [Medium,Triaged]  02:57
*** HenryG has joined #openstack-infra02:58
<clarkb> lifeless: we really should require <0.8alpha or whatever the lowest 0.8 version is  02:59
<lifeless> clarkb: of requests?  03:00
*** basha has quit IRC03:00
<clarkb> lifeless: sqlalchemy  03:00
<clarkb> it's silly we can't just say <0.8  03:01
<mordred> ah. fascinating  03:01
<mordred> clarkb: we can with pip 1.4  03:01
<lifeless> clarkb: oh right, there are two distinct bugs  03:01
<clarkb> mordred: right, but everyone else doesn't do new pip  03:01
<lifeless> mordred: yeah, I found this testing --offline with a fresh mirror  03:01
<lifeless> mordred: so this is in the 'stuff we don't mirror in' category  03:01
<lifeless> the problem is global requirements doesn't list all the different requirements all releases of clients had  03:02
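[Editor's note: the "<0.8" problem above is about pre-release ordering: under the pkg_resources comparison older pip used, a beta like 0.8.0b1 sorts before the 0.8 final, so "SQLAlchemy<0.8" still admits 0.8 betas, while pip 1.4 stopped considering pre-releases by default. A sketch of the two pinning styles in a requirements file, version numbers illustrative:

    # portable across old pip, but needs an artificial ceiling:
    SQLAlchemy>=0.7.8,<=0.7.99
    # with pip>=1.4, pre-releases are skipped unless --pre is given,
    # so the natural bound is enough:
    SQLAlchemy>=0.7.8,<0.8
]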
<mordred> lifeless: yah. https://review.openstack.org/#/q/topic:openstack/requirements,n,z  03:02
<clarkb> mordred: https://review.openstack.org/#/c/51053/  03:03
<mordred> clarkb: I think we have a script bug: https://review.openstack.org/#/c/49201/  03:03
<mordred> look at the commit message  03:03
*** flaper87|afk has quit IRC03:03
*** flaper87|afk has joined #openstack-infra03:03
<clarkb> we do, 51053 fixes it :)  03:03
*** mkerrin has quit IRC03:03
*** mkerrin has joined #openstack-infra03:03
*** HenryG has quit IRC03:03
*** HenryG has joined #openstack-infra03:03
<lifeless> mordred: I'm not sure how that will fix the issue  03:03
<lifeless> mordred: we're installing releases  03:03
<mordred> done  03:04
<mordred> what?  03:04
<lifeless> mordred: when we pip install nova trunk  03:04
<lifeless> mordred: we get a release of python-neutronclient  03:04
<mordred> yah  03:04
<mordred> k  03:04
* mordred bats eyelashes  03:04
<lifeless> mordred: if the current requirements rules don't bring down versions that match the requirements when the release of that client was cut  03:05
<mordred> all of the projects should merge all of those changes and then cut releases  03:05
<clarkb> mordred: thanks. I also made sure to document why that horrible read into a variable trick is used  03:05
<mordred> hrm. ok  03:05
<mordred> lifeless: I grok what you are saying  03:05
<clarkb> because I keep forgetting why we did that and I don't want to have to remember  03:05
<lifeless> mordred: I don't claim to have an answer yet  03:05
<lifeless> mordred: just thought you should have it in your thinking cap  03:05
<openstackgerrit> A change was merged to openstack-infra/config: Use a single change ID per requirement proposal.  https://review.openstack.org/51053  03:05
<mordred> lifeless: I think this may fall into the category of things that jeblair was worried about in terms of enabling use of our mirror for non-gate activities  03:05
<mordred> lifeless: which is to say, I think it may have some design holes  03:06
<lifeless> mordred: we're not using your mirror yet  03:06
<lifeless> mordred: this is a fresh run-mirror'd mirror  03:06
<mordred> lifeless: yup. I grok. but the mirror script is designed to keep a running mirror  03:06
<lifeless> mordred: right, ack.  03:06
<mordred> lifeless: thinking cap on - btw  03:06
<mordred> this is my way of thinking  03:06
<lifeless> once we get sophisticated enough in our CI  03:07
<lifeless> we'll spin up new mirrors as part of the test  03:07
<lifeless> and detect this  03:07
<mordred> I will be honest - my most recent thinking has been to investigate use of devpi  03:07
<lifeless> s/the test/a test/  03:07
<lifeless> mordred: fully offline is very attractive for dc bringup stories  03:07
<mordred> yup. devpi has fully offline  03:07
<lifeless> mordred: so I'm not super keen on devpi  03:07
<lifeless> mordred: I thought it only captured what you used?  03:08
<mordred> it also has pockets  03:08
<mordred> so you can have a "mirror upstream" pocket, and a "my local stuff" which depends on the "mirror upstream"  03:08
<lifeless> mordred: so devpi would demonstrate the same failure mode  03:08
<mordred> so pointing at my local stuff will get you both  03:08
<mordred> lifeless: yes. I'm just saying  03:08
<lifeless> ok, tangent, sure.  03:08
<mordred> I've been thinking that richer implementation scripting might be better served at this point by devpi instead of pypi-mirror  03:09
<mordred> BUT  03:09
<mordred> I support the goal you are expressing  03:09
<lifeless> cool  03:09
<mordred> ish  03:09
<mordred> sort of  03:09
<mordred> I mean  03:09
<mordred> yeah  03:09
<lifeless> so I suspect we're going to be gating a different scenario than the gate currently does  03:09
<mordred> yup  03:09
<lifeless> I'm thinking I should mail the list when we're in sight of success  03:09
<lifeless> and get discussion  03:09
<lifeless> and/or a session in the CFP at the project level I guess  03:09
<mordred> oy  03:10
<clarkb> mordred: are you thinking we should use devpi for our mirror too?  03:10
*** dims has quit IRC03:10
<mordred> clarkb: toying with the idea  03:10
<mordred> clarkb: the fact that it supports multiple sets of things  03:10
<mordred> clarkb: and local uploads  03:10
<mordred> but also linking things  03:10
<mordred> is very attractive  03:11
<mordred> downside: it serves things from python instead of apache  03:11
<clarkb> right, I was just going to ask about that  03:11
<mordred> yup. that's the asinine part  03:11
<mordred> but also the part that allows you to describe sets that depend on other sets  03:11
<mordred> so, you know, feature. bug.  03:11
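[Editor's note: the "pockets" mordred describes are devpi indexes, which can inherit from one another. A rough sketch with the devpi client; the server URL, user, and index names here are hypothetical, and exact flags may differ by devpi version:

    devpi use https://devpi.example.org            # hypothetical server
    devpi login infra --password=...
    devpi index -c infra/mirror bases=root/pypi    # "mirror upstream" pocket
    devpi index -c infra/local bases=infra/mirror  # "my local stuff" pocket
    devpi use infra/local
    devpi upload                                   # local packages land here
    # resolving against infra/local sees both pockets:
    pip install -i https://devpi.example.org/infra/local/+simple/ nova
]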
<mordred> also - I'm thrilled that 3rd party testing has finally caught on  03:13
<mordred> it only took a year  03:13
<mordred> maybe a year and a half  03:13
<clarkb> mordred: so I was thinking about swift logs and realized we should just put our mirror in swift too  03:13
<mordred> how long have we been doing this?  03:13
<mordred> clarkb: totes  03:13
<clarkb> mordred: then we can manage a single index.html file  03:13
<clarkb> and maybe not even that  03:13
<clarkb> mordred: nova is requiring it for their hypervisors  03:14
<clarkb> mordred: I think ssh will always be the way to go for third party testing (because event stream > polling)  03:15
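[Editor's note: the "event stream" here is gerrit's stream-events command, which pushes events over ssh as they happen instead of making third-party CI systems poll:

    ssh -p 29418 username@review.openstack.org gerrit stream-events
]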
*** wenlock_ has joined #openstack-infra03:18
*** wenlock has quit IRC03:19
*** wenlock_ is now known as wenlock03:19
<mordred> ++  03:20
<mordred> amazing how russellb telling people they have to do it or they're going to get dropped gets further than us offering that they can do it and people can track the quality of their driver  03:20
*** matsuhashi has joined #openstack-infra03:24
*** matsuhashi has quit IRC03:31
*** matsuhashi has joined #openstack-infra03:32
*** guitarzan has quit IRC03:34
*** alaski_ has quit IRC03:34
*** dkranz has quit IRC03:34
*** jhesketh has quit IRC03:34
*** nosnos has quit IRC03:34
*** michchap has quit IRC03:34
*** uvirtbot has quit IRC03:34
*** Ryan_Lane has quit IRC03:34
*** SlickNik has quit IRC03:34
*** freyes has quit IRC03:34
*** mkoderer has quit IRC03:34
*** slong has quit IRC03:34
*** guitarzan has joined #openstack-infra03:34
*** alaski has joined #openstack-infra03:34
*** freyes has joined #openstack-infra03:34
*** mkoderer_ has joined #openstack-infra03:34
*** SlickNik has joined #openstack-infra03:34
*** dkranz has joined #openstack-infra03:34
*** slong has joined #openstack-infra03:34
*** jhesketh has joined #openstack-infra03:34
*** nosnos has joined #openstack-infra03:34
*** michchap has joined #openstack-infra03:34
*** Ryan_Lane has joined #openstack-infra03:35
*** Ryan_Lane has quit IRC03:35
*** Ryan_Lane has joined #openstack-infra03:35
*** matsuhashi has quit IRC03:36
*** senk has joined #openstack-infra03:41
*** matsuhashi has joined #openstack-infra03:41
*** matsuhashi has quit IRC03:41
*** matsuhashi has joined #openstack-infra03:42
*** senk has quit IRC03:45
*** basha has joined #openstack-infra03:45
*** matsuhashi has quit IRC03:46
*** CaptTofu has quit IRC03:47
*** CaptTofu has joined #openstack-infra03:48
*** matsuhashi has joined #openstack-infra03:49
*** basha_ has joined #openstack-infra03:50
*** basha has quit IRC03:52
*** basha_ is now known as basha03:52
*** basha has quit IRC03:53
*** SergeyLukjanov has joined #openstack-infra03:54
*** jerryz has quit IRC04:01
*** wenlock has quit IRC04:04
*** sarob has joined #openstack-infra04:11
*** erfanian has quit IRC04:14
*** D30 has joined #openstack-infra04:20
<openstackgerrit> Tom Fifield proposed a change to openstack-infra/config: Fix Doc Location for Transifex  https://review.openstack.org/51112  04:21
<clarkb> fifieldt: you around?  04:21
<fifieldt> yessir clarkb  04:22
<fifieldt> the sun is up and doing well  04:22
<clarkb> fifieldt: cool. We would like to add Ironic to transifex and I figured I should figure out how you would like to go about adding new projects  04:22
<fifieldt> right, yes, that procedure should be documented  04:22
<fifieldt> I take it you're most interested in the transifex side of things?  04:23
<clarkb> I think I have sufficient permissions to do it, but didn't want to be sidestepping things  04:23
<clarkb> fifieldt: right  04:23
<clarkb> fifieldt: I can send an email or submit a bug or whatever is best for you  04:23
<fifieldt> if you want, we can step through it now and just do it?  04:23
<clarkb> sure  04:23
<fifieldt> and I can update the wiki at the same time  04:23
<fifieldt> so, we start in the OpenStack "organisation" on transifex  04:23
<fifieldt> https://www.transifex.com/organization/openstack  04:23
<fifieldt> at the top of the projects list is the "+ NEW" button  04:24
<fifieldt> we type in a name, and description as appropriate  04:24
<clarkb> yup, I have clicked the NEW button  04:24
<fifieldt> and importantly: set the source language to English (en)  04:24
<clarkb> fifieldt: and the name is the project less openstack/ ?  04:24
<fifieldt> yes  04:24
<fifieldt> the openstack organisation provides the openstack bit  04:25
<fifieldt> choose "Permissive Open Source" as the license  04:25
<fifieldt> and paste the URL for the source (either github or git.openstack.org) in the "source code URL" box  04:25
<fifieldt> once you have created the project, go to its page and click the "Manage" button  04:26
<clarkb> fifieldt: does the URL for the source need to be a clonable path?  04:26
<clarkb> or is that just a handy link for humans?  04:26
<fifieldt> just a handy link for humans  04:26
<clarkb> ok I am on the manage page  04:26
*** basha has joined #openstack-infra04:27
<fifieldt> feel free to fill out a long description, home page, if you want,  04:27
<fifieldt> but the important bit here is maintainers  04:27
<fifieldt> sorry  04:27
<fifieldt> not maintainers  04:27
<fifieldt> access control  04:27
<fifieldt> set the "Project Type" to "Outsourced project"  04:27
<clarkb> fifieldt: under features is a TM check box. should I check that?  04:27
<fifieldt> and "Outsource Access to" OpenStack  04:27
<fifieldt> yes, that is a good idea clarkb  04:28
<clarkb> ok TM check box checked and project outsourced to openstack  04:28
<fifieldt> great  04:28
<clarkb> now I need to add maintainers  04:28
<fifieldt> in theory, that is done through the OpenStack organisation  04:29
<clarkb> oh  04:29
<fifieldt> but you can add anyone you think is relevant to an individual project  04:29
<clarkb> fifieldt: can you check if you have management perms on Ironic?  04:29
<clarkb> you haven't been explicitly added but are part of the project hub  04:29
<fifieldt> I do indeed  04:29
<fifieldt> so no problems with permissions  04:29
<clarkb> cool I will leave it as is then  04:29
<fifieldt> yay :)  04:29
<clarkb> is that it for the transifex side?  04:30
<fifieldt> yes  04:30
<clarkb> awesome thanks  04:30
<fifieldt> well  04:30
<fifieldt> there is one thing I'm not 100% sure of  04:30
<fifieldt> that is whether there's a need to manually create the "Resources" the first time  04:30
<fifieldt> I think the client can do that  04:30
<fifieldt> but I'm not 100% sure  04:30
<clarkb> I think the client can do that too  04:30
<fifieldt> great  04:30
<fifieldt> then yes, that should be everything  04:31
<clarkb> as other new projects haven't needed to do anything under resources, instead jenkins jobs push to them and they are automagically added  04:31
<fifieldt> excellent  04:31
<fifieldt> it's good to get confirmation on that  04:31
<clarkb> fifieldt: I will try to remember and double check ironic once the jenkins jobs are in place  04:31
<fifieldt> cheers  04:31
<clarkb> but I haven't heard complaining about it not working so it must work right? :)  04:31
<fifieldt> right :)  04:32
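[Editor's note: the "client" discussed here is transifex-client (tx), which the jenkins jobs drive. A sketch of the per-repository config that makes resource auto-creation work; the resource slug and paths are illustrative, following the pattern other projects used:

    # .tx/config in the project repository
    [main]
    host = https://www.transifex.com

    [ironic.ironic-translations]
    file_filter = ironic/locale/<lang>/LC_MESSAGES/ironic.po
    source_file = ironic/locale/ironic.pot
    source_lang = en

With this in place, "tx push -s" uploads the source file and creates the resource on first use, which is the automagic addition clarkb mentions.]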
<fifieldt> https://review.openstack.org/#/c/51112 <-- though, speaking of failing jenkins jobs, how do you feel about this? :) I'd like to get manuals working again :(  04:32
<clarkb> devananda: ^ you are ready for the jenkins jobs  04:32
<clarkb> fifieldt: 51112 lgtm +2'd  04:32
<fifieldt> cheers  04:32
* fifieldt wonders who else he can bother at this insane timezone  04:33
*** markmcclain has quit IRC04:38
<fifieldt> dammit clarkb, now I have to check every project to make sure that TM box is ticked :D  04:40
<fifieldt> it must be a new option  04:42
<fifieldt> they weren't  04:42
<fifieldt> nice job on the discovery :)  04:42
*** senk has joined #openstack-infra04:42
*** DennyZhang has quit IRC04:46
*** senk has quit IRC04:47
<clarkb> fifieldt: :)  04:47
*** changbl has quit IRC04:47
*** changbl has joined #openstack-infra04:51
*** DennyZhang has joined #openstack-infra04:56
*** sarob has quit IRC05:02
*** sarob has joined #openstack-infra05:02
*** boris-42 has joined #openstack-infra05:04
*** afazekas has joined #openstack-infra05:06
*** sarob has quit IRC05:07
*** SergeyLukjanov has quit IRC05:08
*** afazekas has quit IRC05:11
*** ryanpetrello has joined #openstack-infra05:17
*** ryanpetrello has quit IRC05:18
*** changbl has quit IRC05:38
*** senk has joined #openstack-infra05:43
*** senk has quit IRC05:48
*** cody-somerville has quit IRC05:50
*** yaguang has quit IRC05:50
*** kong has quit IRC05:57
*** Lingxian has joined #openstack-infra05:58
<openstackgerrit> Endre Karlson proposed a change to openstack-infra/config: Add pypi job to python-libraclient  https://review.openstack.org/51069  06:05
*** DennyZhang has quit IRC06:08
*** yolanda has joined #openstack-infra06:11
*** sarob has joined #openstack-infra06:13
<openstackgerrit> Endre Karlson proposed a change to openstack-infra/config: Add / Change python-libraclient jobs  https://review.openstack.org/51069  06:17
*** sarob has quit IRC06:18
*** mkoderer_ is now known as mkoderer06:37
*** senk has joined #openstack-infra06:45
*** senk has quit IRC06:50
*** uvirtbot has joined #openstack-infra06:52
*** mkerrin has quit IRC06:55
*** yamahata has joined #openstack-infra06:58
*** mkerrin has joined #openstack-infra07:00
*** cody-somerville has joined #openstack-infra07:09
*** cody-somerville has quit IRC07:09
*** cody-somerville has joined #openstack-infra07:09
*** mancdaz_ has quit IRC07:14
*** slong has quit IRC07:15
*** mancdaz has joined #openstack-infra07:15
<openstackgerrit> Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version  https://review.openstack.org/51131  07:15
*** cody-somerville has quit IRC07:16
<openstackgerrit> Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version  https://review.openstack.org/51131  07:17
*** D30 has quit IRC07:17
*** D30 has joined #openstack-infra07:22
*** bauzas has joined #openstack-infra07:22
<bauzas> hi all  07:22
<bauzas> I'm having trouble with the py27 build for a review : http://logs.openstack.org/70/50970/1/check/gate-climate-python27/5ede61d/console.html  07:23
<bauzas> my own tow -r -epy27 works like a charm  07:23
<bauzas> s/tow/tox  07:23
<bauzas> but the oslo config on the Jenkins VM is incorrect  07:24
<bauzas> I checked both Jenkins and tox venvs  07:24
<bauzas> and the pip freeze is slightly different  07:24
*** fbo_away is now known as fbo07:25
<bauzas> oslo.config is the same 1.2.1  07:25
<bauzas> but I found trace of oslo-config on Jenkins  07:26
*** osanchez has joined #openstack-infra07:26
<bauzas> which is an early build  07:26
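[Editor's note: a quick way to make the comparison bauzas describes, assuming a standard tox layout; the stray package name follows his report:

    tox -r -e py27                            # -r recreates the venv from scratch
    .tox/py27/bin/pip freeze | grep -i oslo   # compare with the jenkins console's pip freeze
    # an old "oslo-config" (hyphen) build can shadow "oslo.config" (dot):
    .tox/py27/bin/pip uninstall oslo-config
]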
*** D30 has quit IRC07:27
*** dafter has joined #openstack-infra07:29
*** cody-somerville has joined #openstack-infra07:30
*** shardy_afk is now known as shardy07:31
*** D30 has joined #openstack-infra07:32
*** cody-somerville has quit IRC07:37
*** senk has joined #openstack-infra07:46
*** senk has quit IRC07:51
*** basha has quit IRC07:59
*** che-arne has joined #openstack-infra08:01
*** luhrs1 has joined #openstack-infra08:01
*** jpich has joined #openstack-infra08:05
*** dizquierdo has joined #openstack-infra08:06
*** odyssey4me has joined #openstack-infra08:06
*** yassine has joined #openstack-infra08:08
*** amotoki has quit IRC08:11
*** derekh has joined #openstack-infra08:18
*** odyssey4me has quit IRC08:26
*** markmc has joined #openstack-infra08:29
*** odyssey4me has joined #openstack-infra08:33
*** dizquierdo has quit IRC08:38
*** hashar has joined #openstack-infra08:41
*** dims has joined #openstack-infra08:42
*** dkehn_ has joined #openstack-infra08:44
*** yamahata has quit IRC08:46
*** dkehn has quit IRC08:47
*** senk has joined #openstack-infra08:47
*** dims has quit IRC08:50
*** senk has quit IRC08:52
<openstackgerrit> Lucas Alvares Gomes proposed a change to openstack/requirements: Added lower version boundary for netaddr  https://review.openstack.org/49530  08:55
*** dizquierdo has joined #openstack-infra09:08
<openstackgerrit> Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version  https://review.openstack.org/51131  09:12
<sileht> thx fungi I have seen the pypi mirror updated !  09:13
*** johnthetubaguy has joined #openstack-infra09:19
*** basha has joined #openstack-infra09:22
*** johnthetubaguy has quit IRC09:31
*** johnthetubaguy has joined #openstack-infra09:31
*** beagles has joined #openstack-infra09:45
<openstackgerrit> Mehdi Abaakouk proposed a change to openstack-infra/jenkins-job-builder: Allow macro is dict key  https://review.openstack.org/51159  09:46
*** alexpilotti has joined #openstack-infra09:46
*** senk has joined #openstack-infra09:48
*** senk has quit IRC09:52
*** markmc has quit IRC10:02
*** xchu has quit IRC10:04
*** alexpilotti has joined #openstack-infra10:07
*** pcm_ has joined #openstack-infra10:07
*** pcm_ has quit IRC10:09
*** pcm_ has joined #openstack-infra10:09
*** markmc has joined #openstack-infra10:11
*** D30 has quit IRC10:12
* ttx juggles with CIVS since it doesn't allow more than 1000 voters  10:17
*** fifieldt has quit IRC10:18
<ttx> Fun fact: there is one voter that was left out by CIVS for a mysterious reason and I have no way of determining who it is.  10:18
*** fifieldt has joined #openstack-infra10:18
<ttx> fungi, jeblair, mordred: multiple failures downloading deps on various jobs  10:23
<ttx> http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html  10:24
<ttx> looks like network issues  10:24
<ttx> doesn't hit the same dep every time  10:24
* ttx lunches  10:24
<sdague> ttx: it doesn't allow more than 1000 voters?  10:26
*** branen has quit IRC10:30
<openstackgerrit> Endre Karlson proposed a change to openstack-infra/config: Add / Change python-libraclient jobs  https://review.openstack.org/51069  10:31
*** dims has joined #openstack-infra10:34
*** mestery has joined #openstack-infra10:40
*** hashar_ has joined #openstack-infra10:46
*** johnthetubaguy has quit IRC10:47
*** johnthetubaguy has joined #openstack-infra10:48
*** hashar has quit IRC10:48
*** hashar_ is now known as hashar10:48
*** mestery has quit IRC10:48
*** senk has joined #openstack-infra10:49
*** senk has quit IRC10:53
*** boris-42 has quit IRC10:55
*** guohliu has quit IRC11:01
<openstackgerrit> Qiu Yu proposed a change to openstack-infra/jeepyb: Print help message and exit if no config file by default  https://review.openstack.org/51182  11:03
*** cody-somerville has joined #openstack-infra11:13
<soren> ttx: CIVS is free software, IIRC. You might be able to install it somewhere and crank that limit up to eleven... thousand.  11:17
*** michchap has quit IRC11:26
*** michchap has joined #openstack-infra11:26
*** CaptTofu has quit IRC11:27
*** CaptTofu has joined #openstack-infra11:27
*** cody-somerville has quit IRC11:31
<sdague> might need that for next go around. the ATC growth being what it is  11:32
*** hashar has quit IRC11:34
<ttx> soren: yes, it's a bit weird but I ran it locally recently to test the ability to rerun ballots with alternative algorithms  11:35
<ttx> sdague: you can actually send voters in multiple batches of <1000  11:35
<sdague> ah, gotcha  11:35
*** SergeyLukjanov has joined #openstack-infra11:35
<ttx> sdague: but i wasn't sure of that until I tried and already sent half of them :)  11:36
<sdague> heh  11:37
*** SergeyLukjanov is now known as _SergeyLukjanov11:37
*** _SergeyLukjanov is now known as SergeyLukjanov11:37
<openstackgerrit> Ekaterina Fedorova proposed a change to openstack-infra/config: Add murano-repository to stackforge  https://review.openstack.org/50026  11:38
<ttx> Err... the test nodes graph at http://status.openstack.org/zuul/ looks highly suspicious  11:41
<ttx> fungi, jeblair, mordred: ^ may or may not be related with the network issues we're experiencing fetching deps  11:42
<ttx> At this rate we'll reach universe entropy in 67 minutes  11:42
<ttx> sdague: ever seen something like it?  11:43
<sdague> yeh, that looks crazy  11:44
<sdague> I wonder if the network timeouts are preventing the node builds  11:44
<sdague> which would make sense  11:44
<ttx> sdague: yes, definitely started to appear at around the same time  11:44
<sdague> so they enter that state, but stall out  11:44
<sdague> and the system is correctly trying to build more, because it's not getting any out the other side  11:45
<sdague> because we are definitely backed up on devstack nodes  11:45
<ttx> it's like watching a train wreck in slow motion  11:45
<ttx> good thing I got most of my patches merged earlier.  11:46
<sdague> heh  11:46
<sdague> who knew that skynet would need this much care and feeding  11:46
<ttx> sdague: I was thinking of issuing a statusbot alert.  11:47
<sdague> probably fair  11:47
<ttx> on it  11:47
<ttx> #status notice Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun)  11:49
*** senk has joined #openstack-infra11:49
<ttx> I like how every time I need to use that bot it miserably fails  11:50
<ttx> where the heck is openstackstatus bot  11:50
*** basha has quit IRC11:50
*** senk has quit IRC11:53
-ttx- Top issues right now: (1) test node starvation (2) networking issues fetching deps (might be the cause of 1) and (3) no statusbot to warn people  11:54
<ttx> fungi, jeblair, mordred: ^  11:55
*** thomasm has joined #openstack-infra11:55
*** thomasm has quit IRC11:55
*** thomasm has joined #openstack-infra11:56
<openstackgerrit> Tom Fifield proposed a change to openstack-infra/config: Fix Doc Location for Transifex  https://review.openstack.org/51112  11:56
<ttx> sdague: wondering if we are not past the peak of network issues and starting to gradually recover  11:58
<ttx> looking at the graph and the status of the very few tests that run  11:58
<sdague> yeh, could be  11:59
*** boris-42 has joined #openstack-infra11:59
*** basha has joined #openstack-infra12:00
<sdague> so in one of the ways to make skynet smarter, I wonder if we should consider auto respooling checks that hit a network timeout  12:01
<ttx> sdague: that wouldn't make it smarter, but would certainly make it more resilient  12:02
<sdague> yeh  12:02
*** basha has quit IRC12:02
*** markmc has quit IRC12:02
*** dizquierdo has quit IRC12:03
*** w_ has joined #openstack-infra12:04
*** markmc has joined #openstack-infra12:05
*** CaptTofu has quit IRC12:05
*** olaph has quit IRC12:05
*** CaptTofu has joined #openstack-infra12:06
*** dprince has joined #openstack-infra12:10
*** cody-somerville has joined #openstack-infra12:22
<mordred> yay! things have fixed themselves before I woke up?  12:23
<BobBall> they knew you were coming  12:23
<BobBall> and were scared...  12:23
<mordred> BobBall: ++  12:24
<thomasm> 'Tis a good day.  12:26
*** thomasbiege has joined #openstack-infra12:30
*** dcramer_ has quit IRC12:31
*** adalbas has joined #openstack-infra12:32
*** matsuhashi has quit IRC12:33
*** matsuhashi has joined #openstack-infra12:34
*** aspiers has quit IRC12:34
*** nosnos has quit IRC12:35
*** nosnos has joined #openstack-infra12:35
*** aspiers has joined #openstack-infra12:38
<openstackgerrit> Roman Podolyaka proposed a change to openstack-infra/config: Fix sqlalchemy-migrate py26/sa07 job  https://review.openstack.org/44686  12:38
*** matsuhashi has quit IRC12:38
*** dafter has quit IRC12:40
*** nosnos has quit IRC12:40
*** dafter has joined #openstack-infra12:41
<ttx> mordred: no  12:41
*** weshay has joined #openstack-infra12:41
<ttx> mordred: start by the scary "test nodes" graph @ http://status.openstack.org/zuul/  12:41
<ttx> mordred: then look at download fails @ http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html  12:42
<ttx> (might be same issue around networking)  12:42
*** hashar has joined #openstack-infra12:42
<ttx> mordred: then finally, where the heck is statusbot when you need it ?  12:42
<ttx> mordred: gate is totally wedged right now.  12:43
<fifieldt> that looks awesome  12:44
<bauzas> sdague: ping ?  12:44
<fifieldt> but the amount of scrolling was annoying to get to the graphs ;)  12:44
<ttx> fifieldt: if I didn't need it urgently for RC2 production I would probably find it funny too  12:44
<bauzas> sdague: I'm now at my office, still broken about my oslo.config version  12:44
<bauzas> btw, maybe ppl could help me ?  12:45
<fifieldt> sorry ttx :) 2345 here and the brain is off, it seems  12:45
* ttx sees his weekend vanish  12:45
<bauzas> http://logs.openstack.org/70/50970/1/check/gate-climate-python27/5ede61d/console.html  12:45
<bauzas> oslo-config got pulled in on Jenkins while it shouldn't  12:45
<bauzas> my own tox venv on my laptop doesn't get this pretty old oslo-config beta version  12:45
*** dafter has quit IRC12:46
<bauzas> the gate should be fine  12:46
<mordred> why are we timing out on fetches from pypi.o.o ?  12:46
<ttx> mordred: you tell me  12:46
ttxmordred: you tell me12:46
*** openstackstatus has joined #openstack-infra12:47
<mordred> ok. there's statusbot  12:47
<ttx> yay, a bot  12:47
<ttx> #status notice Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun)  12:48
<openstackstatus> NOTICE: Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun)  12:48
*** basha has joined #openstack-infra12:48
<openstackgerrit> Renat Akhmerov proposed a change to openstack-infra/config: Add configuration for Mistral project  https://review.openstack.org/51205  12:49
*** dhouck_ has joined #openstack-infra12:50
* ttx goes to get some fresh air  12:50
<openstackgerrit> Emilien Macchi proposed a change to openstack-infra/config: Add IRC bot on #openstack-rally for Gerrit changes  https://review.openstack.org/51207  12:58
*** dkehn_ is now known as dkehn12:58
*** basha has quit IRC12:58
*** CaptTofu has quit IRC13:00
*** CaptTofu has joined #openstack-infra13:00
<ttx> mordred: fwiw we might be past the peak of networking issues and slowly recovering  13:02
<yolanda> hi, i'm trying to create users automatically in gerrit, they are created, correctly assigned to groups, but when i click on their links (aka /#/dashboard/xxxx), it shows me a not found page, what could be the issue there?  13:02
<ttx> mordred: there are a few successful test runs by now, a few hours earlier they were all failing  13:03
<yolanda> i can see the /dashboard/ url for the logged user, but not for others, although i'm logged with an admin user  13:03
<ttx> mordred: hard to tell more from where I stand  13:03
*** miqui has joined #openstack-infra13:05
* mordred now useless and on the phone once more  13:05
*** blamar has joined #openstack-infra13:07
*** michchap has quit IRC13:11
*** michchap has joined #openstack-infra13:14
*** michchap has quit IRC13:16
*** julim has joined #openstack-infra13:16
*** sandywalsh_ has quit IRC13:16
<openstackgerrit> A change was merged to openstack-infra/elastic-recheck: Fix test_files_at_url_pass  https://review.openstack.org/50706  13:16
*** DennyZhang has joined #openstack-infra13:16
*** mriedem has joined #openstack-infra13:18
*** basha has joined #openstack-infra13:19
<fungi> having a look  13:22
<ttx> fungi: network issues preventing dep fetching, potentially also the cause of test nodes starvation  13:22
<ttx> (executive summary)  13:23
<ttx> see scary "test nodes" graph @ http://status.openstack.org/zuul/ and example dep fetching fail @ http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html  13:23
*** matty_dubs|gone is now known as matty_dubs13:25
<fungi> yeah, looking at graphs and checking rackspace's network status info  13:25
*** thedodd has joined #openstack-infra13:27
<fungi> yeah, rs mentions no current issues and no posted maintenance for today  13:27
*** sandywalsh has joined #openstack-infra13:28
<sdague> mordred: having an issue with the cookiecutter repo - http://paste.openstack.org/show/48266/  13:32
<fungi> mmm, the /srv/static/doc filesystem on static.o.o is slap full. not sure whether that's having an impact but i'll give it a little more breathing room  13:32
*** markmcclain has joined #openstack-infra13:33
<fungi> er, /srv/static/docs-draft (cacti truncated the label in its graph)  13:33
<fifieldt> there have been many more doc patches than normal of late  13:33
*** russellb is now known as rustlebee13:36
<sdague> oh, never mind  13:37
<fungi> i'm increasing it by about 25% for now and then we can discuss whether we purge drafts more aggressively or add still more space  13:37
<sdague> mordred: it's probably a good idea to kill - https://github.com/emonty/cookiecutter-openstack as it is a high hit for openstack cookiecutter  13:37
*** dafter has joined #openstack-infra13:38
*** dafter has quit IRC13:38
*** dafter has joined #openstack-infra13:38
<mordred> sdague: ++  13:40
*** dizquierdo has joined #openstack-infra13:41
<fungi> so, on the nodepool.o.o graphs i see gaps around the time the server building volume increases there on the graph. either it went to lunch and stopped responding to snmp for ~20 minutes or there was a network blip (but i don't find gaps from the same time period for other hosts)  13:41
<fungi> i'll start digging into logs on the nodepool server  13:41
*** bnemec is now known as beekneemech13:44
<fungi> there were some errors in the nodepool image log around 0230 utc, but that's way earlier than the symptoms began and i don't see a recurrence there  13:45
<ttx> fungi: is networking working now on those machines ?  13:45
<fungi> seems fine at the moment. i've got a ping test going to static.o.o right now as well as a few devstack slaves  13:46
*** fifieldt has quit IRC13:47
<ttx> fungi: the test nodes building graph still goes through the roof  13:48
*** basha has quit IRC13:48
<fungi> ttx: yeah, hoping the logs will give me some inkling of why  13:48
*** beagles is now known as seagulls13:49
<ttx> fungi: our collective guess was that they stalled on dep loading  13:49
<ttx> fungi: the issue might be gone now but they are still preventing new ones from being spun  13:49
*** aspiers has quit IRC13:49
* ttx wonders how much radical killing would be a solution at this point  13:49
<fungi> as of this week, nodepool will try to proactively build additional servers based on perceived demand for waiting jobs so that may be what we're seeing on the graphs  13:49
<ttx> all I can say is that status has not moved in gate for the last 5 hours  13:50
<jd__> ttx: are you threatening fungi? ;)  13:50
*** alaski is now known as lascii13:50
<fungi> but yes it could be a symptom of network issues in hpcloud, though i'm finding no evidence of that yet  13:50
<ttx> as in.. same jobs are waiting for resources  13:51
<ttx> so my guess is that no new test resources are made available. It's not slow, it's stuck  13:51
*** dcramer_ has joined #openstack-infra13:51
<fungi> nodepool's "alien" list (servers it sees but didn't create) is fairly long. not sure if that's a related symptom  13:52
*** prad_ has joined #openstack-infra13:52
*** aspiers has joined #openstack-infra13:53
<fungi> but nodepool is definitely building and deleting servers based on what i see in its logs, and doesn't mention any serious issues so it could be a tuning problem  13:53
<ttx> fungi: there hasn't been a devstack run that I could see in the last.. 4 hours now  13:54
<fungi> oh, nevermind. most of those are our other non-devstack slaves in rackspace  13:54
<ttx> right  13:54
<fungi> the alien nodes it lists, i mean  13:54
<ttx> (we still get pep8 tests run)  13:54
*** pabelanger has joined #openstack-infra13:55
<ttx> 4h20min to be precise  13:55
<ttx> but the networking issue is gone in the latest non-devstack runs we see  13:56
<fungi> there are definitely still *some* devstack jobs running... https://jenkins01.openstack.org/job/check-tempest-devstack-vm-full/1954/console  13:56
*** sarob has joined #openstack-infra13:57
<fungi> ahh, here we go  13:57
<ttx> fungi: yes, a dozen of them in the check line  13:57
<ttx> none in the gate  13:57
*** sarob has joined #openstack-infra13:57
<fungi> there are no hpcloud slaves in jenkins, only rackspace  13:58
<fungi> that arms me with something more i can look for  13:58
*** DennyZhang has quit IRC14:01
<ttx> sigh, looks like a busy saturday coming up for me  14:01
*** marun has quit IRC14:02
*** DennyZhang has joined #openstack-infra14:02
<jd__> ttx: yup, i'll try to be available too if you want to handle Ceilometer RC2 tomorrow  14:02
<ttx> even if we solved it now the lines are so long I won't get the RC2 stuff in before eod  14:02
<fungi> http://paste.openstack.org/show/48272/  14:02
<ttx> and I have family visiting, yay  14:02
*** sarob has quit IRC14:02
<jd__> 191 building? O_o is that normal?  14:03
<fungi> most hpcloud nodes are in a building state and many deleting with very few ready, similar to the overall nodes graph on the zuul status page  14:03
*** changbl has joined #openstack-infra14:03
<ttx> fungi: all our gate testing goes to hp nodes ?  14:03
<fungi> well, i know we've said in the past that we throw away something like 75% of the slaves we build on hpcloud because after waiting a couple minutes for them to boot they never show up  14:03
*** SergeyLukjanov is now known as _SergeyLukjanov14:04
<fungi> ttx: yes, we have very few rackspace nodes (much lower quotas) and the slaves are slower  14:04
*** _SergeyLukjanov is now known as SergeyLukjanov14:04
<ttx> maybe HP asked all their servers to work back in their datacenters  14:04
<fungi> looking to see if i can figure out what's up with hpcloud and hopefully we can get this back on track  14:04
* ttx hesitates to cut new RC2s right now, fearing that pre-release jobs would get queued forever  14:06
*** hashar has quit IRC14:11
*** yassine has quit IRC14:13
<fungi> #status alert The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC  14:14
<openstackstatus> NOTICE: The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC  14:14
*** ChanServ changes topic to "The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC"14:14
<mordred> fungi: wow. we have no HP nodes?  14:14
<fungi> mordred: well, we have a ton, but... we're not using them  14:14
<mordred> hrm  14:14
<fungi> mordred: http://paste.openstack.org/show/48273/  14:15
<fungi> i'm hunting for any real error to explain it  14:15
<fungi> we have a handful of expected errors in the nodepool log for things like timeouts deleting servers, but they're few and far between and not for several hours now  14:18
<fungi> stuff that gets retried and would have errored again if it kept happening  14:18
<annegentle_> node starvation sounds serious! Rooting for you guys.  14:20
<fungi> annegentle_: thanks!  14:21
*** rahmu has quit IRC14:24
<fungi> the handful of devstack servers nodepool knows about in hpcloud are also not getting used. i just ssh'd into one of them and it had an uptime of 20 hours  14:25
<fungi> ssh'd into one in a deleting state and it's got an uptime of 6 hours. i suspect delete (and maybe build?) calls are not being respected  14:26
*** yassine has joined #openstack-infra14:28
*** blamar has quit IRC14:28
<fungi> novaclient itself shows the same server as "active" state  14:28
<mordred> fungi: fantastic  14:29
<mordred> fungi: anything I can do to help?  14:29
* mordred is off phone now  14:29
<fungi> no idea. poke at things. i'm still casting my net wide  14:30
<fungi> doing a nova delete of that "deleting" node seems to work  14:30
<fungi> and nodepool is still listing that node in a "delete" state even after it's gone in hpcloud  14:31
<mordred> fungi: that sounds very weird  14:31
<fungi> maybe nodepool just hasn't noticed yet? (or maybe it doesn't expect anyone else to delete its nodes)  14:32
<fungi> anyway, since none of the devstack-gate slaves in hpcloud are currently being used, i'm thinking maybe we delete them all and... nodepool is stateless, right? just restart it?  14:33
*** ruhe has joined #openstack-infra14:35
<fungi> but i'm uneasy going behind its back and making changes, restarting it and losing state which might help point us to the actual error, et cetera  14:35
<mordred> yeah. I'm very shaky on doing things to nodepool without jeblair  14:40
*** wenlock has joined #openstack-infra14:40
*** datsun180b has joined #openstack-infra14:41
<fungi> i only just noticed that one of the columns in nodepool list's output is age in hours. quite a few of the "building" nodes have an age over 4 hours  14:41
<fungi> those are the oldest in that state and that's about the timeframe where we started seeing issues, judging from the graphs  14:42
*** cody-somerville has quit IRC14:43
<fungi> actually some almost 6 hours old  14:44
<fungi> around 0850 utc  14:44
<fungi> all the nodes in a nodepool delete state are not much older than that. maybe 40 minutes older, tops  14:46
<fungi> from around 0825  14:46
*** cody-somerville has joined #openstack-infra14:46
*** pentameter has joined #openstack-infra14:47
<openstackgerrit> Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version  https://review.openstack.org/51131  14:48
<fungi> so i think starting around thenish, hpcloud ceased acting on any nova delete or create calls. maybe nodepool lost a persistent connection and didn't realize/retry?  14:48
<fungi> it has established https sockets (so it thinks) to addresses very similar to what the hpcloud service endpoint resolves to. sniffing now to see if those are actually dead connections  14:52
<openstackgerrit> Qiu Yu proposed a change to openstack-infra/jeepyb: Print help message and exit if no config file by default  https://review.openstack.org/51182  14:52
<fungi> it's been a couple minutes already and i see no traffic at all to/from those addresses  14:53
*** rcleere has joined #openstack-infra14:54
*** DennyZhang has quit IRC14:54
*** basha has joined #openstack-infra14:55
<fungi> oho, so around 0850 nodepool *did* log this gem... ConnectionError: HTTPSConnectionPool(host='ord.servers.api.rackspacecloud.com', port=443): Max retries exceeded with url: /v2/637776/servers/a14b333a-9b03-48c8-b144-4f21a3eec405 (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer)  14:55
<fungi> nevermind. that was rackspace  14:55
<fungi> pretty large uptick in ssh timeouts waiting for servers to launch around that timeframe too  14:58
*** dansmith is now known as Steely_Dan14:59
<mordred> spectacular  15:00
<mordred> so have we perhaps exceeded another timeout threshold?  15:00
<fungi> not sure. also i've been sniffing for any traffic to/from 168.87.243.0/24 (where the hpcloud service endpoint resolves into and where nodepool claims to have a couple established https sockets to remote systems) and so far not a single packet for over 10 minutes  15:02
<fungi> probably much, much longer, but at least none since i started up tcpdump  15:02
*** boris-42 has quit IRC15:03
*** basha has quit IRC15:04
<dkranz> mordred: I have a process going that is reading the console log for every successful tempest run (listening to gerrit) looking for reported bogus errors. Is that going to annoy anyone?  15:05
*** blamar has joined #openstack-infra15:05
*** cody-somerville has quit IRC15:05
<fungi> dkranz: unlikely. are you pulling those console logs from logs.openstack.org?  15:06
<dkranz> fungi: Yes.  15:06
<fungi> i didn't notice any huge uptick in outbound network utilization on it at any rate  15:07
<dkranz> fungi: ok, cool. Just don't want to be a bad citizen...  15:07
<dkranz> fungi: This will stop once we start failing builds that have bogus errors (or real ones).  15:07
<fungi> dkranz: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=311&rra_id=all  15:07
*** thingee_zzz is now known as thingee15:08
<fungi> i mean, yes, it's a lot of traffic but it's not at worrying levels, i don't think  15:08
<sdague> fungi / clarkb I'm respinning the htmlify-screen-logs.py into an os_loganalyze repository so I can do some sane test additions before upping the complexity for other log times  15:09
<sdague> log types  15:09
<fungi> though we do seem to have topped out at 100mbps briefly last week  15:09
<sdague> should I just github this until good, then pull it back into openstack-infra?  15:09
<fungi> sdague: whatever's easy for you. we can import later or you can start off with a basic cookiecutter  15:09
<sdague> or would we want it as a gerrit repo earlier  15:09
<sdague> fungi: yeh, I started with a cookiecutter  15:09
*** yassine has quit IRC15:10
<sdague> so it should play nicely later  15:10
<fungi> i mean you can start out with your basic cookiecutter in gerrit or you can import it once it's usable--your call  15:10
<sdague> I was amused the cookiecutter has 3 pep8 errors in it  15:10
<fungi> patches welcome!  15:10
<sdague> yeh, I guess it's probably faster to get to working unit tests with me just committing and pushing  15:10
*** mkerrin has quit IRC15:11
<jeblair> sdague: have you kept up with the -infra thread on log storing/serving?  15:12
<sdague> not as much as I probably should  15:13
<ttx> jeblair!  15:13
<sdague> sorry, it's been one of those weeks  15:13
<jeblair> sdague: short: there's an idea that we might want to preprocess logs and statically serve them instead of using the wsgi app  15:13
<sdague> jeblair: ok  15:13
<jeblair> sdague: i don't think that invalidates any of your past or planned work, but if we decide to go that way, it may change how we use it a bit  15:13
<sdague> so that wouldn't let us do the filtering that we're doing now, which is nice  15:13
<ttx> jeblair: in case you're in "holy batman, what a backlog" mode, we are currently out of HP devstack nodes, effectively blocking the gate.. for the last 6 hours  15:14
<sdague> the filtering being nice, that is  15:14
<fungi> jeblair: thoughts on why nodepool is not talking to hpcloud since around 0825 utc (that's the best i've been able to pin details down so far)  15:14
<jeblair> sdague: i think we'd only do that if we found a way to accomplish the goals we get by filtering; anyway, your input is very welcome.  15:14
<jeblair> sdague: it's all a bit brainstormy right now -- nothing urgent  15:14
<jeblair> fungi: i'll go look  15:14
<sdague> sure, are we going to do a session in HK?  15:15
<ttx> jeblair: may or may not be related to network failures we noticed in test jobs fetching deps around the same time (which seem to be fixed now)  15:15
<jeblair> sdague: let's  15:15
<jeblair> fungi: nodepool has _extensive_ logging  15:15
<sdague> that would be good brainstormy time for it. Right now I'd just like to get this to a realm where we aren't dropping all the keystone logs for logstash :)  15:15
<fungi> jeblair: yes, i've been trying to make sense of the logging and correlate it to the behavior we're seeing  15:15
<jeblair> sdague: yeah; one of the participants on the thread isn't going to make it to hk, so i'm trying to lay some groundwork over email  15:16
<sdague> yep, no worries  15:16
<sdague> who are we going to miss in HK?  15:16
<jeblair> sdague: jhesketh  15:16
<sdague> ok, gotcha  15:16
<fungi> it's not saying things like "i'm trying to build servers and they're never appearing" (in fact it's not saying much at all--i think it's waiting for hours for them to become ready)  15:16
<ttx> fungi: could the networking issues have caused permanent damage to that nodepool/HPcloud link ?  15:16
<jeblair> fungi: yeah, it looks like they're all stuck in building, and errored out in such a way that the cleanup code failed  15:17
<fungi> ttx: i'm not sure. i'm thinking maybe the tcp sockets to the service endpoint are actually dead and the nodepool server still believes them to be in an established state  15:17
*** sandywalsh has quit IRC15:17
<jeblair> fungi: if you run 'nodepool list' you'll see a lot of 'None' values in the db  15:17
<fungi> right, i definitely saw that  15:17
<jeblair> like this: http://paste.openstack.org/show/48279/  15:18
<fungi> i also saw the periodic cleanup error, but i expected it to periodically error if it was continuing to have problems, being a periodic cleanup  15:18
<fungi> however, it only complained once, then was silent  15:18
<jeblair> fungi: so for some reason, we set the cleanup delay for non-ready to 8 hours  15:18
<jeblair> fungi: so it's going to wait another 2 hours before it starts deleting these  15:19
<jeblair> so good news: it would probably fix itself in 2 hours.  :)  15:19
<jeblair> we should probably adjust that timing a bit.  15:19
<fungi> got it. would have fixed itself while we slept if only it had started sooner  15:19
<fungi> so what's the safest way to manually clean those up in the future?  15:20
<jeblair> fungi: oh, i may have been wrong -- it may not have failed, you might be right -- it may actually have a couple hundred threads waiting for something  15:20
<fungi> delete queries in the db?  15:20
<jeblair> i want to spend a minute and try to find out if that's the case  15:20
<fungi> certainly. i was very hesitant to disturb its current state lest i lose valuable evidence of the issue  15:21
<jeblair> fungi: do you have any logged errors handy?  15:21
<fungi> jeblair: not pasted yet, but i can do that  15:21
*** branen has joined #openstack-infra15:22
<jeblair> ttx: can you tell me about the networking issues?  15:23
*** beekneemech has quit IRC15:23
*** sandywalsh has joined #openstack-infra15:24
*** bnemec has joined #openstack-infra15:24
<ttx> jeblair: most tests suddenly started to fail with dep download errors like http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html  15:24
<fungi> jeblair: nodepool tracebacks in the log from around the time this started (though quieted down to some ssh timeout errors and then nothing of note for hours): http://paste.openstack.org/show/48280/  15:25
*** bnemec has quit IRC15:25
<ttx> at around the same time, the "test nodes" graph on status/zuul started to drink heavily  15:25
<jeblair> i note that we have servers stuck in build from both rax and hpcloud  15:26
<jeblair> the 'waiting for deletion' timeouts are mostly for hpcloud, but there's one rax  15:27
<ttx> jeblair: the starvation only appears to affect devstack/gate nodes  15:27
*** markmcclain has quit IRC15:28
<jeblair> i don't see anything on rax status, and nothing relevant on hpcloud status  15:28
<fungi> jeblair: it might have been a network disruption local to the nodepool server  15:30
<fungi> i saw a gap in its cacti graphs from that timeperiod, but couldn't correlate it to any other systems  15:30
*** rnirmal has joined #openstack-infra15:32
<jeblair> gdb says all the threads are sitting in sem_wait  15:33
<jeblair> (well, most of them)  15:33
*** anteaya has joined #openstack-infra15:33
*** markmc has quit IRC15:33
<jeblair> which is really weird because the one locking thing nodepool does is to use queue.Queue which handles all the locking internally  15:34
<fungi> though the gap is actually a little later than the logged errors... seeing it span 0915-0925 roughly while we were seeing deletion and ssh errors in the nodepool log an hour prior  15:34
<jeblair> ok, so i think i want to do the following: add a thread-dump handler to nodepool like zuul has  15:35
<jeblair> consider using dequeue instead of queue  15:35
<jeblair> i think the immediate cause of this may be a mystery for now  15:35
<jeblair> but if it happens again, hopefully the thread dump handler will help  15:36
<fungi> at least we know where to focus debugging the next time this happens, and possibly minimize the disruption as well  15:36
*** markmc has joined #openstack-infra15:36
<jeblair> yeah. my thinking is that it's either a thread-related bug (which is really weird because that's hard to imagine except for a bug in the stdlib)  15:36
<jeblair> or it could be a novaclient bug, where all of the novaclient client objects are stuck doing something  15:37
<jeblair> (which may have been triggered by the host/network weirdness)  15:37
<jeblair> so, for cleanup:  15:37
<fungi> and then the manual cleanup for now is, what, shut down nodepool, run a delete query to remove any machines in a building or delete state manually and nova delete any of the failed deletes themselves, then start nodepool again?  15:38
<jeblair> fungi: close  15:38
<jeblair> fungi: i'd get the list of machines we want to delete from nodepool, restart it, then 'nodepool delete' each of them  15:38
<fungi> oh, that's nicer  15:39
<jeblair> fungi: (nodepool should be capable of deleting anything it has a record for)  15:39
<fungi> will nodepool delete work on aliens too?  15:39
<fungi> i guess not, since no record  15:39
<jeblair> fungi: then we can also use nodepool alien-list to get the others, and unfortunately no, we'll have to nova delete those  15:39
<fungi> that's easy enough  15:39
<fungi> okay, i can tackle that while you get to breakfast  15:40
<jeblair> fungi: how did you know? :)  15:40
<jeblair> fungi: nodepool list |grep building|awk '{print $2}'  15:40
<fungi> heh  15:40
<jeblair> fungi: is very handy  15:40
<jeblair> fungi: nodepool list |grep building|awk '{print "nodepool delete " $2}'  15:40
<jeblair> fungi: actually that's even handier  15:40
<fungi> yup. i was using cut to a similar effect, but maybe a slightly more machine-parsable format option would be a nice furture addition  15:41
<fungi> s/furture/future/  15:41
<jeblair> fungi: i'd recommend taking that list and splitting it into about 5 parts or so, and then background 5 scripts running through that  15:41
*** blamar has quit IRC15:41
<jeblair> fungi: to balance speed vs likelihood of hitting an api rate limit  15:41
<fungi> yeah, don't want to get throttled  15:41
<fungi> right  15:41
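[Editor's note: a sketch of the batched deletion jeblair recommends, built on his one-liner above; the split factor of 5 comes straight from the conversation:

    nodepool list | grep building | awk '{print "nodepool delete " $2}' > cmds
    split -n l/5 cmds chunk.               # GNU split: five chunks, whole lines
    for f in chunk.*; do bash "$f" & done
    wait                                   # let all five background batches drain
]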
*** blamar has joined #openstack-infra15:42
<fungi> okay, shutting down nodepool now and getting started on that unless you need anything else from the running process first  15:42
<sdague> fungi: cookiecutter question ... why does this pass tests - https://github.com/sdague/os_loganalyze  15:42
<sdague> I set up an assertTrue(False) in there to ensure it broke correctly, and no dice  15:42
<jeblair> fungi: nope, go for it; i'd go ahead and restart nodepool immediately though so it can better keep up with the still-running check nodes  15:43
<fungi> got it--will do jeblair  15:43
* ttx will get drunk now to forget he'll have to work over the weekend to catch up  15:44
*** sarob has joined #openstack-infra15:45
*** amotoki has joined #openstack-infra15:45
<sdague> ttx: hopefully with some nice wine  15:46
*** ruhe has quit IRC15:46
<jeblair> fungi: i can start work on the alien deletions if you want  15:46
<anteaya> ttx keep that Hawaiian shirt handy, you never know  15:46
<fungi> jeblair: sure, i've got the building deletions going now  15:47
*** Steely_Dan is now known as Steely_Spam15:48
<jeblair> fungi: cool i'm on it then  15:48
<fungi> 5 separate batches of ~50 each  15:48
<pabelanger> So, should I expect tox to run properly after I use cookiecutter for the first time?  15:48
<fungi> pabelanger: you and sdague seem to possibly be asking the same question  15:48
* ttx will bbl  15:48
<pabelanger> fungi, okay cool  15:49
<pabelanger> I think it missed setting up versioning  15:49
<pabelanger> for defaulting to something  15:49
<fungi> if you don't figure it out among yourselves shortly, i'll have a look once i wrap up the current firefight  15:49
<sdague> fungi: so my issue is actually subunit discover doesn't seem to find any tests  15:50
<sdague> and "passes" because of it  15:50
*** alcabrera has joined #openstack-infra15:50
<fungi> sdague: hrm, maybe the search path in the tox.ini is too strict?  15:50
<sdague> I don't think so  15:51
<sdague> if I manually venv, and run  15:51
<sdague> ./bin/python -m subunit.run discover -t ./ . --list  15:51
<sdague> nothing  15:51
<fungi> the zuul test nodes status graph seems to reflect things are on their way to recovery  15:51
<fungi> and i do see some jobs going in the gate queue now  15:51
*** sandywalsh has quit IRC15:53
<sdague> ok, off to lunch  15:53
*** sandywalsh has joined #openstack-infra15:53
*** sandywalsh has quit IRC15:54
<mordred> sdague, what's going on with discover?  15:54
<mordred> and I see code?  15:54
<fungi> mordred: his repo is https://github.com/sdague/os_loganalyze  15:54
<jeblair> fungi: aliens deleted  15:54
<fungi> jeblair: thanks!  15:54
<fungi> the building ones are deleted now too  15:54
*** cody-somerville has joined #openstack-infra15:55
mordredsdague: os_loganalyze/tests/ is missing an __init__.py file15:55
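
For reference, the failure sdague hit is generic to Python unittest discovery: if the tests directory is not a package (no __init__.py), discovery silently finds zero tests and the run "passes". A minimal sketch of the expected layout, with an illustrative canary test (the file and class names here are made up):

    # Expected layout (names illustrative):
    #
    #   os_loganalyze/
    #       __init__.py
    #       tests/
    #           __init__.py      <-- the missing file; without it the tests
    #                                directory is not a package, discovery
    #                                finds nothing, and the run "passes"
    #           test_sanity.py
    #
    # tests/test_sanity.py -- a canary that proves discovery is working:
    import unittest

    class TestDiscoveryCanary(unittest.TestCase):
        def test_discovery_actually_runs(self):
            # If discovery is broken this never executes, which is the bug.
            self.assertTrue(True)

With that in place, ./bin/python -m subunit.run discover -t ./ . --list should list the canary; an empty listing is the tell-tale for a missing package __init__.py somewhere on the path.
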
jeblairfungi: you probably want to 'nodepool delete' the ones in delete state as well, to speed things up15:56
fungijeblair: or at least they should be deleted but i see "building" state nodes in the nodepool list output with an age >7 hours still15:56
jeblairfungi: hrm15:56
openstackgerritMonty Taylor proposed a change to openstack-dev/cookiecutter: Actually git add the __init__.py file  https://review.openstack.org/5123815:57
*** sandywalsh has joined #openstack-infra15:57
mordredsdague, fungi ^^15:57
pabelangerhttp://pastebin.com/YeMM1kiK15:57
jeblairfungi: i just deleted one and it went away15:57
pabelangerthat's my error for cookiecutter15:57
*** sandywalsh has quit IRC15:57
fungijeblair: nevermind--i think at least one of my delete jobs was hung in the background for a moment15:57
mordredpabelanger: ah! so, you need to make it a git repo and actually commit the first commit before that will work15:58
pabelangermordred, Ah, I see15:58
pabelangerokay15:58
*** sandywalsh has joined #openstack-infra15:58
mordredsorry, I keep meaning to hack something in so that it will a) do that for you or b) print a warning15:58
mordredpabelanger: also - see the note to sdague above15:58
pabelangermordred, roger15:58
mordredpabelanger: I forgot to git add a file :)15:58
fungijeblair: or not. the background jobs did all finish like i thought, i'm just getting output from nodepool on my terminal after restarting it. may not properly close its file descriptors?15:58
jeblairfungi: never seen that15:59
jeblairfungi: ps suggests there's at least one delete script going15:59
*** bnemec has joined #openstack-infra15:59
fungihuh. jobs does not list it15:59
fungioh, yes it does actually16:00
fungiokay, so it's still churning apparently. may have gotten throttled after all16:00
fungiit went silent for several minutes there16:01
jeblairah16:01
funginow it's done16:02
annegentle_way to go looks like you turned a corner! http://bit.ly/1afAl8w16:02
jeblairfungi: btw, errors about '2249297' are my fault16:02
fungiand yes, no building nodes older than 15 minutes now16:02
annegentle_some days I just want to be cheerleader bystander but deadlines keep getting in the way16:02
fungijeblair: okay, noted16:02
jeblairfungi: i accidentally nova deleted it, but it really was building;16:02
fungik16:02
mordredannegentle_: that's a sexy graph!16:03
fungii've got a round of 5 parallel scripts deleting the "delete" state nodes now16:04
clarkbmorning16:06
jeblairclarkb: impeccable timing! :)16:07
pabelangerokay cool16:07
clarkbjeblair: looks like it16:07
pabelangerflake8 a little sad16:07
pabelangerbut that is okay for now16:07
clarkbbauzas: are you running tox with -r locally and are you running that in a clean git checkout?16:08
*** SergeyLukjanov is now known as _SergeyLukjanov16:10
*** _SergeyLukjanov has quit IRC16:10
clarkbjeblair: fungi: so nodepool was having a hard time with the hpcloud endpoints?16:11
clarkbbut is all better now?16:11
fungiclarkb: that was my earlier theory, but no longer suspect that to be the case16:11
fungijeblair did some investigation in a debugger, found a possible deadlock but without a thread dump it was hard to pinpoint the contention16:12
clarkbI see. Does nodepool need the zuul threaddump signal catcher?16:13
fungibasically, that was his suggestion16:13
clarkbthat should be easy to port over. I can poke at it later16:13
jeblairclarkb: i've got it -- almost done16:15
clarkbcool16:16
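
The handler jeblair is porting landed shortly after this as change 51248. The core of such a thread-dump signal handler, along the lines of what zuul already carries, fits in a few lines; this is a sketch of the technique, not the merged patch:

    import signal
    import sys
    import threading
    import traceback

    def stack_dump_handler(signum, frame):
        # Write a stack trace for every live thread to stderr.
        names = dict((t.ident, t.name) for t in threading.enumerate())
        for ident, stack in sys._current_frames().items():
            sys.stderr.write('Thread %s (%s):\n'
                             % (names.get(ident, 'unknown'), ident))
            sys.stderr.write(''.join(traceback.format_stack(stack)))

    # Installed once at daemon startup; "kill -USR2 <pid>" then dumps
    # all thread stacks without stopping the process.
    signal.signal(signal.SIGUSR2, stack_dump_handler)

This is exactly the information that was missing while debugging this morning's hang: a way to see where every thread is blocked without killing the process first.
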
*** matty_dubs is now known as matty_dubs|lunch16:16
pabelangermordred, looks like something in the import process is messing up with flake816:16
pabelangerhttp://pastebin.com/xxZqgJ0T16:16
pabelangerhttp://pastebin.com/PVB4wDQp16:17
clarkbpabelanger: the no newline at end of file? I think mordred filed a bug against upstream about that16:17
pabelangerokay cool16:17
pabelangerthat was about the only other thing I see about tox being unhappy16:18
mordredyup. I have an upstream PR up16:18
clarkbjeblair: I think https://review.openstack.org/#/c/42393/ can probably be approved if you are happy with it16:19
clarkbmordred: https://review.openstack.org/#/c/33926/ I haven't approved that because I don't have time to babysit (e.g. check results of change before next gerrit restart)16:19
clarkbmordred: but if you do, feel free to approve16:19
clarkbjeblair: https://review.openstack.org/#/c/45294/ has comments for you as well16:20
mordredclarkb: same here16:21
pabelangerjeblair, hope to get back into nodepool reviews today16:21
mordredclarkb: I think when I approve that, I'll run a manual puppet agent --test on review.o.o and watch the patch output (should be null-ish)16:21
*** bnemec is now known as beekneemech16:22
*** odyssey4me2 has joined #openstack-infra16:22
* fungi finally had a moment to put on his "i voted" sticker16:23
*** odyssey4me is now known as Guest3605116:23
*** odyssey4me2 is now known as odyssey4me16:23
mordredclarkb: https://review.openstack.org/#/c/48355/ could use a look from you or fungi - you guys had great comments last time16:25
*** markmc has quit IRC16:25
*** _david_ has joined #openstack-infra16:27
_david_zaro, ping16:27
clarkb_david_: zaro isn't around this week16:28
_david_clarkb, thx, i fixed his patch, and wanted to ask if he can test it?16:28
_david_https://gerrit-review.googlesource.com/#/c/48254/16:29
*** enikanorov-w has quit IRC16:29
_david_clarkb, i tested upgrade to Gerrit schema 85 and now a permission can be granted to new system group "Change Owner".16:30
*** blamar has quit IRC16:30
_david_jeblair, mordred, clarkb we host wip-plugin on gerrit-review16:31
_david_git clone https://gerrit.googlesource.com/plugins/wip16:31
*** markmcclain has joined #openstack-infra16:31
*** derekh has quit IRC16:32
clarkbcool16:32
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Add a thread dump signal handler  https://review.openstack.org/5124816:33
anteayafungi: yay!16:34
*** anteaya has quit IRC16:34
jeblairclarkb, fungi: fungi may still be right -- it's possible that it had a hard time with the hpcloud endpoints which caused a bug.  i don't know if it was a deadlock or not; it's really hard to say.16:34
jeblairclarkb, fungi: it's entirely possible that we were just sitting in a novaclient call, forever.16:35
fungi#status ok the gate is moving again for the past half hour or so--thanks for your collective patience while we worked through the issue16:36
openstackstatusNOTICE: the gate is moving again for the past half hour or so--thanks for your collective patience while we worked through the issue16:36
*** ChanServ changes topic to "Discussion of OpenStack Project Infrastructure | Docs http://ci.openstack.org/ | Bugs https://launchpad.net/openstack-ci | Code https://git.openstack.org/cgit/openstack-infra/"16:36
*** mrodden has quit IRC16:36
jeblairmordred, clarkb: https://review.openstack.org/#/c/45294/16:36
jeblairmordred, clarkb: i don't care how stackforge projects do project management16:37
*** dkehn_ has joined #openstack-infra16:37
jeblairmordred: but i do care that if people have tag permissions and don't know how to use them, then we get called in to clean it up, which is complicated, takes time, and is not scalable16:37
*** jpich has quit IRC16:38
*** dkehn has quit IRC16:38
jeblairmordred: so i think the best compromise between full access and no access to push tags, is that we ask that they limit the group of people who can tag to a small set who fully understand the process16:38
jeblairmordred: is that unreasonable?16:38
mordredjeblair: I don't think it's unreasonable, - but in this case they're asking for the group to match the group that has tag access for libra itself16:41
mordredjeblair: so, effectively, I believe it is the thing you are asking for, AIUI16:41
mordredLinuxJedi: ^^ right? do I grok?16:41
jeblairmordred: that's fine then.16:42
*** thomasbiege has quit IRC16:42
LinuxJedimordred: yep16:42
LinuxJedimordred: which is really small anyway, and only the people that do it now16:42
mordredexcellent16:42
*** _david_ has quit IRC16:42
jeblairLinuxJedi: the thing i wanted to ensure is that it's a small group that understands the process/dangers.  sounds like that's the case.  thanks.16:43
*** _david_ has joined #openstack-infra16:43
LinuxJedijeblair: oh hell yes.  That group is only me, Shrews, marcp and pcrews.  We are the only ones that would do tagging16:44
jeblairi have +2d16:44
ShrewsLinuxJedi: Actually, only you and I are part of the -milestone group16:44
LinuxJedieven better16:44
_david_clarkb, can you ask zaro to test that patch?16:45
_david_because the Gerrit maintainers would like to cut stable-2.816:45
clarkb_david_: yes, I will let him know when he is back16:45
_david_clarkb, weekend?16:45
clarkb_david_: oh, well he is AFK until monday iirc16:45
*** thomasbiege has joined #openstack-infra16:46
LinuxJedimordred: maybe a future release of gerrit/git-review should add a code review system for tags, if there are worries.16:46
clarkbLinuxJedi: yes! that would be awesome16:46
*** mrodden has joined #openstack-infra16:47
*** hogepodge has joined #openstack-infra16:47
*** dkehn has joined #openstack-infra16:52
*** _david_ has left #openstack-infra16:52
*** dkehn_ has quit IRC16:54
*** matty_dubs|lunch is now known as matty_dubs16:54
openstackgerritA change was merged to openstack-infra/config: Remove tuskarclient pylint job.  https://review.openstack.org/4996516:56
*** Ryan_Lane has quit IRC16:58
*** dkehn_ has joined #openstack-infra17:01
*** dkehn has quit IRC17:02
mordredLinuxJedi: yes. we're actually planning that ish17:03
mordredLinuxJedi: or, a tool that lets you do "please make a new minor release for me"17:03
*** rahmu has joined #openstack-infra17:03
mordredLinuxJedi: so it knows how to find your current version, logically increment the thing you asked it to, run the tag command with -s, etc17:04
jeblairmordred: a pbr function that implements 'python setup.py release' ?17:18
mordredjeblair: yah, something like that17:19
mordredalthough I was considering making it two commands or splittable - so you could do the local tagging separate from pushing the local tag17:20
jeblairmordred: i think that's a good idea17:21
mordred(I usually do the tag and then do an sdist to check that it worked and stuff)17:21
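
A sketch of the two-step flow mordred describes (tag locally, verify with an sdist, then push separately), assuming plain git plumbing, three-part version tags, and a remote named "gerrit"; none of this is pbr's actual interface:

    import subprocess

    def current_version():
        # Most recent reachable tag, e.g. "0.5.21".
        return subprocess.check_output(
            ['git', 'describe', '--abbrev=0', '--tags']).strip()

    def bump(version, part):
        # part: 0 = major, 1 = minor, 2 = patch.
        fields = [int(f) for f in version.split('.')]
        fields[part] += 1
        fields[part + 1:] = [0] * (len(fields) - part - 1)
        return '.'.join(str(f) for f in fields)

    def tag_release(part=2):
        # Step one: create the signed tag locally (-s) but do not push,
        # leaving room to run "python setup.py sdist" as a sanity check.
        new = bump(current_version(), part)
        subprocess.check_call(['git', 'tag', '-s', '-m', new, new])
        return new

    def push_release(tag):
        # Step two, run separately once the sdist looks right.
        # (The remote name "gerrit" is an assumption.)
        subprocess.check_call(['git', 'push', 'gerrit', tag])

Keeping the two steps splittable is the design point discussed above: the local tag can be inspected and exercised before anything irreversible is pushed.
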
*** hemnafk is now known as hemna17:22
*** osanchez has quit IRC17:27
sdaguemordred: woot17:28
sdakejeblair re our conversation at cloudopen regarding using heat to run the gate jobs, is zuul the software that does all that?17:31
*** Ryan_Lane has joined #openstack-infra17:36
fungisdake: zuul coordinates and acts as a scheduler, while jenkins handles the execution and artifact collection17:36
fungiat least presently17:36
sdakedoes jenkins execute some scripts to do the building of the vms?17:36
fungisdake: nodepool (and in some cases humans) do that part17:37
sdaguemordred: so curiously cookiecutter seems to trim newlines at the end of files, so it un pep8's our template17:37
*** Ryan_Lane has quit IRC17:38
fungisdague: specifically nodepool has some pool management heuristics including a semi-predictive evaluation of current demand and uses that to try and maintain sufficient levels of available virtual machines17:38
*** arosen1 has joined #openstack-infra17:39
fungias well as garbage-collecting the machines once they've been used17:39
*** thomasbiege1 has joined #openstack-infra17:39
sdakefungi does it use bare metal nodes as the backend, or openstack instances?17:40
*** arosen has quit IRC17:40
mordredsdake: that is correct. I have submitted a PR to upstream to fix it17:40
fungisdake: it presently uses openstack/nova-based service providers who donate resources to us17:40
fungisdake: though there is work underway to start testing tripleo on bare metal i think?17:41
*** melwitt has joined #openstack-infra17:41
fungiand have nodepool coordinate to a nova-bm/ironic environment the tripleo peeps are maintaining17:42
*** melwitt has quit IRC17:42
fungithough i've not been paying as close attention to that as i should, so i'm light on details there. lots of other stuff going on17:42
sdakefungi which part of nodepool does the orchestration of the vm?17:42
*** reed has joined #openstack-infra17:42
*** thomasbiege has quit IRC17:43
fungisdake: it has an image builder which calls into the template vm and runs some shell scripts and puppet to get it into a desired state, then shuts it down and uses it to clone others17:43
hogepodgepabelanger clarkb Do you know of anyone with free cycles to finish the review of https://review.openstack.org/#/c/49020/ ?17:43
*** SergeyLukjanov has joined #openstack-infra17:43
fungisdake: and refreshes that daily17:43
*** melwitt has joined #openstack-infra17:44
clarkbhogepodge: fungi maybe?17:44
fungiif by orchestration you mean setup, and not the running of the tests/jobs on a particular vm17:44
sdakeso when zuul says "hey I have another job for you" how does that get launched?17:44
fungisdake: zuul tells jenkins to run it and on which vm17:44
*** Ryan_Lane has joined #openstack-infra17:44
sdagueclarkb: ok, first unit tests into the htmlifier to shore up its behavior, and I already found a bug :)17:44
sdagueyay, tests17:45
clarkbwoot17:45
sdakefungi so jenkins logs into the box and does some ssh commands or something?17:45
fungisdake: zuul knows a list of the jobs and under what circumstances they should be run and which systems can run them, and jenkins has details on what each actual job does17:45
*** boris-42 has joined #openstack-infra17:45
fungisdake: a jenkins master can use multiple means of controlling its slaves, but we rely on ssh17:46
fungisdake: jenkins also has a java-based agent which runs on each slave and communicates state with the master17:46
sdakecool let my brain cook on that for awhile17:46
sdakethanks for the info fungi17:46
fungisdake: you're welcome. these are also covered with pretty diagrams and examples in a couple of brief slide presentations published at http://docs.openstack.org/infra/publications/17:47
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Add a thread dump signal handler  https://review.openstack.org/5124817:48
sdake_thanks bookmarked for later17:48
*** jerryz has joined #openstack-infra17:49
*** rickerc has quit IRC17:49
*** thomasbiege1 has quit IRC17:50
*** dkehn_ is now known as dkehn17:51
*** amotoki has quit IRC17:51
*** alcabrera has quit IRC17:52
*** thomasbiege has joined #openstack-infra17:53
fungihogepodge: left a comment on it. i think you got some of your cleanup backwards in the new patchset17:53
hogepodgefungi: I think I did too.17:54
hogepodgefungi: :-)17:54
hogepodgefungi: This is why I love gerrit17:54
hogepodgefungi: Thanks.17:54
*** thomasbiege has quit IRC17:55
fungimy pleasure17:55
*** rickerc has joined #openstack-infra17:55
*** odyssey4me has quit IRC17:56
jerryzfungi: could you tell me which dns server is used for devstack gate slaves?17:58
*** moted has quit IRC17:58
*** nati_ueno has joined #openstack-infra17:59
jerryzfungi: sometimes i hit this bug/question https://bugs.launchpad.net/devstack/+bug/119084417:59
uvirtbotLaunchpad bug 1190844 in devstack "./stack.sh is resulting any error "/opt/stack/devstack/functions: line 1228: : No such file or directory" on stable/grizzly branch" [Undecided,Invalid]17:59
fungijerryz: depends on the provider i think, but i'll check17:59
reedhello folks18:00
fungihello reed18:00
*** johnthetubaguy has quit IRC18:00
fungijerryz: it may even vary by region/availability zone... in rackspace dfw we use 72.3.128.240 and 72.3.128.24118:01
jerryzfungi: thanks. i think the dns i use which is 8.8.8.8 give me the wrong IP for cdn.download.cirros-cloud.net18:01
sdaguejerryz: the cdn for cirros got flakey some time yesterday18:02
fungijerryz: ahh, yes there were some cdn issues for cirros image downloads which got worked through yesterday. are you still encountering it in current runs?18:02
jerryzfungi: for now, i just put the right ip in /etc/hosts18:03
*** gyee has joined #openstack-infra18:03
jerryzfungi: the cdn chosen in Seattle, WA works for me18:04
fungiokay, cool18:04
*** dkehn has quit IRC18:04
*** alcabrera has joined #openstack-infra18:09
jerryzfungi sdague: the slaves from hp or rackspace for jenkins.o.o also have that problem?  i had thought the dns i used was not smart enough to refresh available cdn ip addresses.18:10
*** pycabrera has joined #openstack-infra18:13
*** zehicle_at_dell has joined #openstack-infra18:15
*** esker has joined #openstack-infra18:15
*** alcabrera has quit IRC18:16
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Rename ASRT -> AGT  https://review.openstack.org/5126718:20
jeblairclarkb: ^ to try to make the debug log from nodepool more clear18:20
*** dizquierdo has left #openstack-infra18:22
notmynamejeblair: I see https://github.com/openstack/swift-bench exists now. does that mean we're good to go? well, after I commit a .gitreview doc18:26
jeblairnotmyname: yes, i think it merged last night sometime18:26
jeblairnotmyname: hold on that...18:27
notmyname[gerrit]18:27
notmynamehost=review.openstack.org18:27
notmynameport=2941818:27
notmynameproject=openstack/swift-bench.git18:27
notmynamejeblair: proposed .gitreview ^^18:27
*** CaptTofu has quit IRC18:27
jeblairfungi, clarkb, mordred: http://git.openstack.org/cgit/openstack/swift-bench/tree/18:27
jeblairlooks empty18:27
*** CaptTofu has joined #openstack-infra18:28
notmynamejeblair: empty? I see stuff18:28
jeblairnotmyname: the joys of load balancing; it's empty on git03.o.o18:29
notmynamejeblair: so should I push a change or not?18:30
notmynamejeblair: assuming that proposed .gitreview is good18:31
jeblairnotmyname: i think you're good.  since you don't have any jobs yet, nothing automated is going to try to hit git.o.o.  i'll fix git03 shortly.18:32
*** CaptTofu has quit IRC18:32
notmynamejeblair: great18:32
jeblairnotmyname: (earlier i was worried it was a sign something more serious broke)18:32
*** mestery has joined #openstack-infra18:32
notmynamejeblair: https://review.openstack.org/#/c/51268/18:32
notmynamejeblair: if you can give me your +1 there, I'll merge it and we should be off to the races18:33
jeblairnotmyname: done18:34
*** melwitt1 has joined #openstack-infra18:34
notmynamejeblair: thanks for your help18:35
*** dafter has quit IRC18:35
jeblairnotmyname: no prob!18:35
*** itchsn has joined #openstack-infra18:36
*** melwitt has quit IRC18:37
*** dcramer_ has quit IRC18:37
*** sarob has quit IRC18:38
*** itchsn has quit IRC18:38
*** CaptTofu has joined #openstack-infra18:38
*** dkehn has joined #openstack-infra18:39
jeblairclarkb, mordred, fungi: ok, the swift-bench thing on git03 was just the replication race condition; i replicated again and it's updated.  i'm looking forward to having salt do this.  :)18:41
jeblairnotmyname: ^ all the git.o.o servers have swift-bench now18:42
notmynameyay18:42
*** dafter has joined #openstack-infra18:44
*** alexpilotti_ has joined #openstack-infra18:45
ttxjeblair: hey, nice work on unbreaking the gate! What caused the initial fail ?18:47
*** alexpilotti has quit IRC18:47
*** alexpilotti_ is now known as alexpilotti18:47
ttx(if we know that)18:48
*** dcramer_ has joined #openstack-infra18:53
mordredjeblair: ++ salt18:55
*** Bada has joined #openstack-infra19:00
dkranzThere seems to be a problem with https://review.openstack.org/#/c/50795/ merging19:05
dkranzjenkins reported success an hour ago but zuul shows some of the jobs as "queued". That is strange.19:06
*** dcramer_ has quit IRC19:06
*** melwitt has joined #openstack-infra19:06
*** melwitt1 has quit IRC19:06
clarkbdkranz: the +1 verified is for your recheck. still waiting on gate tests19:09
dkranzclarkb: ok, thanks. Guess things are really slow19:09
hub_capmordred: whats the status on the work we talked about in seattle? the images stuff.. im at a point where i can take any/all of it on19:16
*** dhouck_ has quit IRC19:16
*** jog0 is now known as flashgordon19:17
hub_capclarkb: ^ ^19:17
hub_capflashgordon: silly handle friday?19:17
flashgordonhub_cap: casual nick friday19:18
flashgordonmost of the nova folk do it19:19
hub_capoh yes im aware :)19:19
*** dcramer_ has joined #openstack-infra19:20
* fungi thinks every day is casual nick friday (and hawaiian shirt tuesday)19:20
mordredhub_cap: it's - uhm.19:20
hub_capi thought so :)19:20
mordredwe need to add a thing to the d-g caching scripts to download the images and cache them19:21
mordredthen you're good to go19:21
*** alexpilotti has quit IRC19:21
hub_caplike i said, i can help w any of it :) is someone working on the d-g caching script stuff?19:21
mordrednope19:24
*** dprince has quit IRC19:24
hub_capmind if i take a stab @it?19:25
mordredhub_cap: please do! https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/scripts/devstack-cache.py19:25
mordredhub_cap: is the script you want to look at19:25
hub_cap<319:25
mordredit currently has a place where it pre-downloads images referenced by devstack19:25
hub_capcool ill peep it and ask questions :)19:26
mordredhub_cap: steps forward would be either just add direct curl commands to download the images19:26
mordredhub_cap: OR - you could get fancy and read image elements19:26
mordredin https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/scripts/prepare_devstack.sh19:26
mordredyou'll see where we pre-clone a bunch of git repos19:26
mordredyou could add needed repos there, and then read files in them to find out what images they want to download19:27
mordreddepends on how clever you want to be19:27
hub_capya i was wondering about that.. are we wanting to test every flavor of image elements? or just the "supported" ones from openstack perspective19:27
hub_capfor trove, i dont think i need to test fedora/centos, i can test centos and call it a day. but thats my perspective..19:28
hub_capdo we have a list of official supported linux flavors?19:29
mordredfrom a d-g perspective19:29
mordredwe want to precache things that jobs that run on the nodes might want to download19:29
mordred(this is why we go through and pre-download all of the debs that devstack _might_ wind up installing, but not install them)19:29
mordredbut "might"19:29
*** mriedem has quit IRC19:30
mordredis as defined by the set of things actually referenced in elements in repos that we might actually run19:30
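
A hedged sketch of what such a pre-caching addition to devstack-cache.py might look like: collect image URLs (hard-coded here; the "clever" variant would read them out of the element repos cloned by prepare_devstack.sh) and download anything not already cached. The cache path and the URL list are illustrative assumptions, not the final implementation:

    import os
    import subprocess

    # Assumption: the same cache directory the existing script populates.
    CACHE_DIR = '/opt/cache/files'

    # Assumption: these would really be discovered by reading the elements
    # in the pre-cloned repos; this URL is the sort of thing dib's ubuntu
    # element fetches.
    IMAGE_URLS = [
        'http://cloud-images.ubuntu.com/precise/current/'
        'precise-server-cloudimg-amd64-root.tar.gz',
    ]

    def cache_images(urls):
        for url in urls:
            target = os.path.join(CACHE_DIR, url.rsplit('/', 1)[-1])
            if not os.path.exists(target):
                # -L follows the cloud-images redirects.
                subprocess.check_call(['curl', '-sSL', '-o', target, url])

    if __name__ == '__main__':
        cache_images(IMAGE_URLS)
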
mordredclarkb, fungi, jeblair: did you see [openstack-dev] "Thanks for fixing my patch" ?19:30
*** pycabrera is now known as alcabrera19:31
mordredseems like a policy amendment that might apply nicely for us too19:31
*** davidhadas has joined #openstack-infra19:31
hub_capmordred: that makes sense but if someone busts out a scientific linux element, do we want to download/cache that?19:32
hub_capoh and imma use the image elements, just fyi, cuz they have a nice little set of image url details i dont care to duplicate19:33
clarkbmordred I did19:33
clarkbmordred I think we basically do that already but only when in a time crunch19:34
hub_capcurrently there are fedora/centos/ubuntu, i can cache all 3 if we think we _may_ need to test on them all19:34
clarkbwe could shift to being proactive about it19:34
fungimordred: in fact, i do try to do that when i'm in a situation to do so (availability and knowledge-wise)19:36
fungii think it's a great idea19:36
fungii assumed it was already an accepted workflow among our team19:36
ttxfungi: did you guys get to the bottom of today's issue, root cause ?19:36
fungittx: no, we narrowed it down but there were insufficient debugging capabilities available, so jeblair has added those to the daemon for "next time"19:37
ttxfungi: ack19:38
ttxat least whatever it was that caused it, it's gone now19:38
fungiwell, whatever caused it got it into a perpetual state which was cleared through a restart, but next time we can generate a thread dump and restart it right away, then have the luxury of debugging while things don't remain indefinitely unusable19:39
ttxcinder rc2 on its way, hold to your seats19:44
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Change test_queries from logical AND to OR  https://review.openstack.org/5016019:46
*** arosen1 has quit IRC19:47
*** arosen has joined #openstack-infra19:48
clarkbmordred: do you have any more ideas on openstack_citest mysql perms? I think granting create and drop globally is necessary19:53
mordredclarkb: I believe you are correct19:59
*** zehicle_at_dell has quit IRC19:59
mordredclarkb: otherwise, we could use mysql sandbox to spin up per-testrun mysqls and tear them down afterwards...19:59
* mordred hides20:00
flashgordonif anyone is looking for reviews to do, hacking has some reviews that need some attention https://review.openstack.org/#/q/status:open+project:openstack-dev/hacking,n,z20:01
fungimordred: isn't that basically what ceilo's mongodb functional tests use?20:02
jeblairfungi, mordred, clarkb: yeah, i believe that has been accepted around infra repos, and i would expect it to be considered on-form in openstack repos too20:02
jeblairfungi, mordred, clarkb: one good reason not to do that for many patches in infra is to help people learn about our systems --20:03
flashgordonclarkb: btw you are the most active reviewer in all of openstack20:03
flashgordonhttp://russellbryant.net/openstack-stats/all-reviewers-180.txt20:03
jeblaira lot of folks come in and say "i want to try to figure this out", and you know, teaching to fish and all.20:03
fungiagreed. best reserved for urgent issues. it's not like we have tons of time to spare fixing up non-urgent changes20:05
ttxcinder rc2 out20:05
* ttx calls it a day20:05
jeblairclarkb: congrats!20:05
mordredclarkb: w00t!20:06
mordredwow. I'm 14th20:06
clarkbflashgordon: I saw that and was a little surprised. some of that is from mass rechecks though20:06
mordredclarkb: ssh20:06
clarkb:)20:06
jeblairclarkb: you leave votes with rechecks?20:06
clarkbjeblair no20:06
mordredclarkb: actually, that's only tracking votes20:07
jeblairclarkb: then... no? :)20:07
clarkbis that only votes o_O20:07
clarkbwow20:07
flashgordonclarkb: most are in infra it looks like20:07
lifelesswow, I'm up there20:07
clarkbmy review queue is huge. I try to stab at it as often as possible20:07
lifelessand wth is dripton ?20:07
lifelesshttp://russellbryant.net/openstack-stats/all-reviewers-30.txt20:08
lifeless11th, yay.20:08
hub_capthats it, im +1'ing random shit for the next 30 days20:08
lifelessyeah, no.20:09
hub_caphahaha20:09
hub_caplifeless: great work, +1.. everything20:09
jeblairhub_cap: that'll show up in the +/- column20:09
hub_capi know ill be a +1 baller jerryz20:09
hub_cap*jeblair20:09
hub_captab-fail20:09
sdaguehub_cap: yeh, that's why the % pos and conflict columns are there to try to sanity check things20:10
sdagueif you see 90+% pos, the person has missed the point of reviewing20:10
jeblairfor quite some time, clarkb and i have both maintained an 80% average20:10
hub_cap66%.. thats at least 20% worth of system-gaming i can do... numbers make u look good right??20:11
hub_cap;)20:11
flashgordonsdague: looks like i am missing the point  http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt20:12
mordredflashgordon: me too20:12
jeblairhub_cap: yeah, i think you could be 20% nicer, but you wouldn't be you.20:12
flashgordonlifeless: ^^20:12
hub_capjeblair: TRU20:12
mordredalthough part of my problem is that when I'm -1 I tend to poke someone in IRC to ask/chat about it20:12
mordredI really need to leave that in the system more20:13
hub_capmordred: i stopped doing that20:13
hub_capi was super low on tracked reviews20:13
jeblairmordred: i do both20:13
mordredjeblair: I need to do both20:13
hub_capi looked like a schmuck (well more of one than normal)20:13
hub_capyeah ptl of trove has 2 reviews in the past 30 days20:13
mordredjeblair: can we set up a bot that will let me comment on gerrit thigns?20:13
mordredjeblair: so I can say #bot -1 13415 I have issues with this20:13
mordred?20:13
hub_capemacs has a command for that mordred20:14
mordredhub_cap: good point20:14
jeblairmordred: yeah, what could go wrong with giving an irc bot super-super-admin access in gerrit?20:14
lifelessflashgordon: ?20:14
mordredjeblair: can't see anything wrong with that20:14
fungihow did i get to #6? i feel perpetually behind on reviews :/20:15
jeblairmordred: you're now at 92% positive reviews! ;)20:15
mordredflashgordon: I do find it interesting that my 30 day percentage is about == to my 180 day20:15
mordredjeblair: I am?20:15
*** adalbas has quit IRC20:15
mordredoh - just in infra?20:15
fungiugh, i'm also by far the most "positive" reviewer in the top 10 besides mordred20:15
jeblairmordred: sorry, EJOKE20:15
flashgordonlifeless: my review stats are not good http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt20:15
lifelessflashgordon: you are fairly positive20:15
lifelessflashgordon: OTOH you've been reviewing code mainly written by experienced core folk20:16
jeblairmordred: er, the idea was that you gave a cursory +1 to the idea of giving an irc bot super-super-admin access to gerrit.20:16
flashgordonlifeless: I have been doing a a lot of -0s and not -120:16
fungimordred: we need to be more curmudgeoney apparently20:16
flashgordonlifeless: yeah20:16
lifelessflashgordon: I don't think that counts against you; in fact I'd kindof like to see a partitioned metric20:16
lifelessreviews vs core20:16
lifelessreviews vs noncore20:17
hub_capfungi: maybe yall just do such good work that theres nothing to -1 and this system doesnt accurately track that20:17
lifelessI suspect it would be interesting20:17
lifelessrustlebee: ^ But I have no plans to implement just yet :P20:17
flashgordonlifeless: it would be interesting20:17
rustlebeetrack all the things20:18
* mordred needs to go back to reviewing first thing in the morning20:19
mordredand clearing the entire outstanding queue down20:19
clarkbmordred: I can't do that because by 9:30 am PST all the fun stuff is happening20:20
clarkbI have found post dinner to be good for reviews20:20
mordredclarkb: wake up at 6:30am PST like fungi and I!20:21
fungii also think one of the things which keeps my review average on the positive side, besides addressing concerns via irc only (which i should definitely stop doing) is not leaving a negative vote if someone else already has unless it's for a different issue, even if i reviewed the current state of the patch20:21
mordredyah20:21
fungii should probably just get in the habit of it, and not worry so much about people potentially getting offended by negative score dogpiling20:22
lifelessfungi: I usually do what you describe there20:23
mordredclarkb: btw - you're welcome for the email I just sent ;)20:23
lifelessfungi: often if a patch has a -1 already, I won't even review it20:23
lifelessother than a cursory check to see if the submitter replied saying 'no, I disagree'20:23
mordredfor the folks here who are not HP employees (shocking) I just sent an email to the internal openstack interest mailing list with the subject "Clark Boylan is the most active reviewer in all of OpenStack for Icehouse"20:23
mordredlifeless: I actually have a search filter that keeps me from seeing things with a -120:24
mordredbut largely that's because if jeblair or clarkb or fungi have -1'd something, it's pretty darned solid20:25
*** yolanda has quit IRC20:25
*** sandywalsh has quit IRC20:26
fungiespecially if i -1'd it... must have been written in go or something20:27
*** weshay has quit IRC20:27
mordredrustlebee: wow. your -2 count is so high!20:27
rustlebeefeature freeze did it probably20:28
sdaguemordred: feature freeze does that20:28
mordredah20:28
rustlebeebut i do tend to have a higher -2 count than most anyway :)20:28
jeblairso it looks like today is exercising the nodepool burst code20:28
rustlebeei love saying "NO!"20:28
mordredneat20:28
sdagueyou would not believe the crazy that people push after FF :) lots of people don't pay attention to the calendar20:28
mordredwe should have a "Block" button which isn't tied to code review20:29
fungior to the mailing list or to irc or to other people's comments on their other reviews or20:29
jeblairif you look at the nodepool graph, the top of the green line isn't flat anymore; i think whenever that's the case, and the green line is above its normal level, it's bursting due to demand from gearman20:29
jeblair(if it's not flat and it's below the normal level, we're hitting max capacity)20:30
mordredjeblair: oh neat!20:30
lifelessoh btw infra people20:30
lifelesstripleo now has an externally accessible trunk deployed kvm cloud20:31
lifelessupdates every 40m +-20:31
mordredlifeless: when you say updates - you mean goes away and comes back, yeah?20:31
lifelesserm, trunk of OpenStack's API services etc, for clarity (it's not /just/ trunk of tripleo's code:)20:32
lifelessmordred: yes, making it preserve vm's is the next MVP20:32
mordredneat!20:32
lifelessmordred: and after that having it not interrupt shit20:32
lifelessright now hiera has credentials for infra on the grizzly kvm cloud20:32
lifelesswhich should be very reliable as it's entirely static20:32
jeblairlifeless: i'm excited about all of that20:33
lifelessthis is just a headsup on where the next thing is at20:33
* mordred can't wait until we add some nodepool load to your CD cloud so we can watch you update under piles of load20:33
jeblair++20:33
lifelessyay :)20:34
lifelessjeblair: I believe there is a nodepool bug preventing the tripleo experimental job being enabled? Can we help with that?20:34
jeblairlifeless: i think we have the nodepool changes in place to improve our chances of using the grizzly cloud without blowing everything up.20:34
jeblairlifeless: i'm not sure all of them are in the running nodepool yet20:35
jeblairlifeless: but perhaps this weekend we can restart nodepool and put that in again20:35
jeblairsince we had a fire this morning, i want to take it easy for a while to give things a chance to catch up and hopefully minimize impact to the release process20:36
mordred++20:36
*** prad_ has quit IRC20:36
lifelessjeblair: ack, thanks20:37
openstackgerritJames E. Blair proposed a change to openstack-infra/nodepool: Rename ASRT -> AGT  https://review.openstack.org/5126720:37
sdagueclarkb: you about?20:37
sdagueI wanted to get your take on the os_loganalyze tree to figure out what more I should do before we start connecting it up to the log server. Current code won't really change anything, but at least now I have a framework of test in place so I can figure out if I break something.20:40
*** Bada has quit IRC20:41
sdagueso I can feel confident in doing the keystone and swift log support20:41
openstackgerritA change was merged to openstack-infra/gitdm: add user to openstack-config  https://review.openstack.org/5042520:41
mordredwow. that's such a good commit message20:41
*** dafter has quit IRC20:41
mordredthe other reason my review % is so high is that I keep reviewing jeblair code.20:42
jeblairmordred: nice, now i can't say anything bad about your 90% average.  :)20:42
boris-42jeblair hi20:43
jeblairboris-42: hello20:43
clarkbsdague: ish. train wifi/verizon not so good20:43
boris-42jeblair how are you?20:43
mordredjeblair: :)20:44
jeblairboris-42: i am well.  how are you?20:44
*** briancline has quit IRC20:44
clarkbsdague: I think starting with a 1:1 move is good then we can tack on bug fixes20:44
mordredjeblair: any reason I should not APRV a nodepool change? I kinda feel like you should handle landing those at the moment - am I being overly cautious?20:44
*** tvb|afk has joined #openstack-infra20:44
boris-42jeblair nice thanks. I would like to add a benchmarking & profiling tool to OpenStack CI =) so probably you will be interested20:45
*** ruhe has joined #openstack-infra20:45
jeblairmordred: nope -- as long as it doesn't require a coordinated config file change, should be safe.  nodepool doesn't auto-restart, so it doesn't take effect until we restart it manually for some reason.20:45
*** ruhe has quit IRC20:46
jeblairboris-42: yes, very much!  do you think it would be a good idea to send an email to openstack-infra@lists.openstack.org to tell us a bit about the tool?20:46
*** thomasm has quit IRC20:46
boris-42jeblair could we move to #openstack-rally20:46
*** alcabrera has quit IRC20:47
sdagueboris-42: I think it would be better here20:47
sdaguehaving a million subchannels doesn't help keep folks on board20:48
boris-42sdague jeblair it's a separate project … but ok20:48
boris-42sdague jeblair here is the wiki https://wiki.openstack.org/wiki/Rally20:48
markmcclainso looks like we hit the time limit on py26 neutron tests...20:48
markmcclainhttp://logs.openstack.org/08/50608/2/gate/gate-neutron-python26/de9ae8c/console.html20:48
boris-42sdague jeblair actually the official announcement will be this Monday..20:48
markmcclainit actually succeeded, but the gate failed since it ran over the hour20:49
jeblairmarkmcclain: do neutron unit tests really take twice as long as a full tempest run?20:49
sdaguemarkmcclain: yeh, an hour is pretty long20:49
openstackgerritA change was merged to openstack-infra/nodepool: Add a thread dump signal handler  https://review.openstack.org/5124820:49
markmcclainI'm surprised by the runtime a bit20:50
sdagueit's seemingly not doing anything for the first 15 minutes20:50
sdaguefiguring out why, would be helpful20:50
sdaguejeblair: they did pass 40 mins on py26 during rc phase20:50
clarkbis git being slow again?20:50
sdagueso if there was some new 15 min delay, I could see that smashing into 6020:50
sdagueclarkb: I don't know20:51
sdague2013-10-11 18:44:23.275 | Building remotely on centos6-6 in workspace /home/jenkins/workspace/gate-neutron-python2620:51
sdague2013-10-11 19:02:42.590 | [gate-neutron-python26] $ /bin/bash -xe /tmp/hudson3680450177999363028.sh20:51
clarkbgit seems fine. that delay at the beginning is weird though20:52
jeblairclarkb: hrm, the 15 min delay looks like it's from jenkins20:52
clarkbjeblair: ya20:52
*** melwitt has quit IRC20:53
hub_capso given that we want to cache the images for dib in the d-g jobs, its probably safe to assume we should run the entire 10-* script that does the work, ya? example: https://github.com/openstack/diskimage-builder/blob/master/elements/ubuntu/root.d/10-cache-ubuntu-tarball20:54
jeblairboris-42: ok, how can we help you?20:54
boris-42jeblair Rally is able to deploy a cloud and test it, or just test it=)20:54
hub_capotherwise if we put dib --offline, we will only have done 1/10'th of the work to make the image dib usable20:55
jeblairboris-42: which do you want to do first?20:55
boris-42jeblair to test it it requires only endpoints of cloud20:55
hub_capwhat say you to that lifeless? (see my last 2 msgs)20:55
boris-42jeblair at the end I would like to deploy and test20:55
jeblairclarkb: the current job running on centos6-6 did not have a delay20:55
clarkbjeblair: could a job have taken over the node before eg bug in gearman plugins locking?20:56
boris-42jeblair Rally will support different deploy engines. (at the moment only DevStack) but in future TripleO and Fuel20:56
jeblairboris-42: so we have added some hooks to the devstack-gate script that let you use a lot of the functionality in it20:56
* fungi is popping out for an early dinner, but will return soon20:56
openstackgerritA change was merged to openstack-infra/config: Fix sqlalchemy-migrate py26/sa07 job  https://review.openstack.org/4468620:56
clarkbso 15 minutes of some other job running?20:57
openstackgerritA change was merged to openstack-infra/config: Add tagging permissions to python-libraclient  https://review.openstack.org/4529420:57
boris-42jeblair to test an existing cloud I should have only Rally & the cloud endpoints20:57
jeblairboris-42: so you should be able to write a job that runs rally on a cloud set up by devstack20:57
jeblairboris-42: or you can write a job like devstack-gate that uses rally to set up a cloud instead of devstack20:57
*** senk has joined #openstack-infra20:58
boris-42jeblair interesting, ok I think it will be simpler to start by writing just a job that will run tests against your already deployed devstack cloud20:58
sdagueanyone up for helping me get this tree into gerrit?20:59
*** julim has quit IRC20:59
jeblairboris-42: ok.  you can look at the swift-devstack-vm-functional jobs for an example of how to do something like that21:00
boris-42jeblair thank you!21:01
boris-42jeblair will try next week=)21:01
clarkbsdague: I can try. reading ci.openstack.org/stackforge.html is a good place to start21:01
*** melwitt has joined #openstack-infra21:01
jeblairclarkb: https://jenkins02.openstack.org/job/gate-nova-python26/6747/console https://jenkins02.openstack.org/job/gate-neutron-python26/2257/console https://jenkins02.openstack.org/job/gate-horizon-python26/1379/console21:02
jeblairclarkb: that's the job before, the neutron job, and the job after21:02
jeblairtimestamps don't seem to overlap21:02
jeblairclarkb: and neither the job before or after did that21:02
jeblairi'm leaning toward 'jenkins got busy' or 'jenkins got semi-deadlocked' or 'jenkins garbage collected' or, well, in general, just blaming jenkins for being jenkins.21:03
clarkbjeblair: this is weird. ya jenkins for being jenkins seems plausible21:03
*** jerryz has quit IRC21:04
jeblairmarkmcclain, sdague: so it looks like 15 min of that runtime is jenkins derping.  let's call that a fluke for the moment, unless it happens with significant regularity.21:04
sdaguejeblair: sounds fair21:04
sdagueclarkb: ok, I'm assuming this will live in openstack-infra/ and will propose a patch accordingly21:05
jeblairsdague, clarkb: ++21:05
*** senk has quit IRC21:06
clarkbsdague yup. the stackforge page is a decent template for what you need though21:07
*** miqui has quit IRC21:08
*** matty_dubs is now known as matty_dubs|gone21:09
sdaguewhat's included in python-jobs?21:10
clarkbpep8 pythonXX and pypy21:10
clarkbalso gate-*-docs21:10
clarkband coverage21:10
*** lcestari has quit IRC21:14
*** sarob has joined #openstack-infra21:14
openstackgerritSean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul  https://review.openstack.org/5129921:15
sdagueso, that about right?21:15
*** CaptTofu has quit IRC21:17
*** CaptTofu has joined #openstack-infra21:17
clarkbsdague the pep8 and python jobs are just gate-* no check-*21:17
sdagueok21:18
sdaguelet me fix that quick21:18
openstackgerritSean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul  https://review.openstack.org/5129921:18
*** anteaya has joined #openstack-infra21:19
anteayaclarkb: I am meeting all sorts of elastic search people21:19
*** SergeyLukjanov has quit IRC21:19
dkranzclarkb: My tempest job watcher thinks only four tempest gate jobs have finished in the past few hours. Is it wrong or did I just pick a bad time to start with this?21:19
anteayado you have a list of bugs or an etherpad that outlines your current pain points with logstash and elastic search so I can read up and ask intelligent questions21:19
*** SergeyLukjanov has joined #openstack-infra21:20
anteayaand maybe find out something useful for you?21:20
clarkbanteaya: I don't; they are fairly nebulous around scaling21:20
clarkbsdague lgtm21:20
anteayaclarkb: yeah that is what I understood21:20
clarkbanteaya I need to upgrade to latest next week21:20
clarkbnewer versions are supposed to be better21:20
anteayado you think that will address some of the current scaling issues?21:20
anteayak21:21
anteayaI'll ask about versions tomorrow21:21
*** mrodden has quit IRC21:21
anteayawhat version of logstash and elastic search are we using right now21:21
anteayaand what do you want to go to next week?21:21
clarkbyes es memory use is much better in 0.90.X apparently21:21
sdagueclarkb: next time you are in logstash, I have requests for 2 pieces of metadata to get added to the runs21:21
sdague1) cloud-az21:21
sdague2) branch21:22
*** CaptTofu has quit IRC21:22
clarkbdkranz: I don't know. currently on a poor connection.21:23
anteayaclarkb: the one bit of info I got from my after dinner walk around Budapest companions, who just happen to have an elastic search as a service company - what luck - is that they run many small clusters rather than large clusters21:23
clarkbsdague: noted21:23
sdagueclarkb: thanks :)21:24
dkranzclarkb: ok, given that my patch is still hung in zuul almost 4 hours later perhaps it is just slow21:24
anteayaI'm not sure how the size of our cluster would be characterized21:24
clarkbanteaya: interesting I wonder how they shard across clusters21:24
anteayaI can ask21:24
*** esker has quit IRC21:27
sdagueclarkb: ok, jenkins did a +1 - https://review.openstack.org/#/c/51299/21:28
*** vipul is now known as vipul-away21:28
*** vipul-away is now known as vipul21:28
sdaguejeblair, you got a sec to check that out as well?21:28
sdagueI'd like to get this over so I can at least call that part good before the weekend, if possible :)21:29
*** SergeyLukjanov is now known as _SergeyLukjanov21:32
*** _SergeyLukjanov is now known as SergeyLukjanov21:32
*** SergeyLukjanov is now known as _SergeyLukjanov21:33
*** _SergeyLukjanov is now known as SergeyLukjanov21:33
*** SergeyLukjanov is now known as _SergeyLukjanov21:33
*** _SergeyLukjanov is now known as SergeyLukjanov21:33
*** SergeyLukjanov is now known as _SergeyLukjanov21:34
*** _SergeyLukjanov is now known as SergeyLukjanov21:34
*** blamar has joined #openstack-infra21:39
*** vipul is now known as vipul-away21:43
openstackgerritA change was merged to openstack-dev/pbr: Do not pass unicode where byte strings are wanted  https://review.openstack.org/4835521:47
fungisdague: you still have teh typoz21:50
*** anteaya has quit IRC21:51
*** vipul-away is now known as vipul21:54
*** mgagne has quit IRC21:56
fungiso as far as the py26 unit test timeout, i see that jenkins02 is in the midst of one of those use-all-the-things fits and is well on its way to memory exhaustion as a result... http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=41&page=221:58
fungii give it 30-60 minutes before available ram is full21:59
fungithough looking at the swap graph, yesterday's oom condition didn't happen until it reached around 0.5g swap used and then suddenly spiked in a matter of 10-20 minutes until it was up to 2g swap22:01
clarkb:/ is there a newer version of jenkins out. we could try upgrading22:02
jeblairfungi can you gracefully stop and restart it?22:02
fungijeblair: i definitely can22:02
fungiwas wondering if we wanted to troubleshoot further first, since we've caught it in this state22:02
fungii'm checking the thread count real quick22:02
*** senk has joined #openstack-infra22:03
jeblairi am afk and not useful22:03
*** dkranz has quit IRC22:03
fungino worries--i'm collecting what details i can first22:03
fungibut will definitely try to cycle it here in a moment and see if that helps22:03
*** gyee has quit IRC22:03
clarkb++22:04
*** pcm_ has quit IRC22:04
clarkbI cant help for a bit but should have proper wifi in about an hour22:04
*** SergeyLukjanov has quit IRC22:04
fungithread count is highish but reasonable. not like that other time where it went batty22:05
*** SergeyLukjanov has joined #openstack-infra22:05
fungiThreads on jenkins02.openstack.org@166.78.48.99: Number = 1,935, Maximum = 3,390, Total started = 106,69922:05
Steely_Spamhttps://review.openstack.org/#/c/49622/22:05
Steely_Spamis that hanging out because it got a -1 during check?22:05
Steely_SpamI didn't think that was a thing22:06
*** SergeyLukjanov is now known as _SergeyLukjanov22:06
fungifor comparison...22:06
fungiThreads on jenkins01.openstack.org@166.78.188.99: Number = 1,422, Maximum = 19,590, Total started = 807,51722:06
*** _SergeyLukjanov is now known as SergeyLukjanov22:06
*** senk has quit IRC22:07
clarkbno it should clear the -1 and move on. that is why zuul leaves a gate jobs starting comment22:07
Steely_Spamclarkb: okay, I thought so...22:08
fungiclarkb: Steely_Spam: though in this case i'm not finding it on the zuul status page22:08
Steely_Spamfungi: right, it's not in the queue for some reason22:09
fungiit got a new patchset upload after it was approved but before it merged, then got approved again22:09
*** jerryz has joined #openstack-infra22:09
Steely_Spammaybe a reverify would kick it?22:09
fungiit's possible it was re-approved while the previous patchset was still in the process of waiting to be kicked out of today's extremely slow gate22:09
fungiSteely_Spam: so, yes, try to reverify and see if jenkins leaves a new "starting gating" comment on it after that22:10
* Steely_Spam tries22:10
jerryzfungi: it is still not unusual for me to run into this bug: https://bugs.launchpad.net/openstack-ci/+bug/122566422:10
uvirtbotLaunchpad bug 1225664 in openstack-ci "tempest.api.volume.test_volumes_actions.VolumesActionsTestXML flakey failure" [High,Triaged]22:10
Steely_Spamfungi: related question: can I put a recheck/reverify command on the first line and more comment below it, or does the whole comment have to be just the command in order to work?22:10
fungijerryz: did you hit it recently?22:11
jerryzfungi: i also see in e-r status report several reviews also fail due to that bug22:11
Steely_Spamfungi: yes, that kicked it and like ten behind it, thanks :)22:11
fungiSteely_Spam: no, it's a very strict match right now, no comments in the same post. i usually leave a second comment with my details22:11
Steely_Spamfungi: okay, I've been doing the same, just wondering22:11
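
The strictness fungi describes amounts to an anchored pattern over the whole comment. Illustratively (the real pattern lives in zuul's trigger configuration; this regex is only an approximation of the era's "recheck bug N" / "recheck no bug" syntax):

    import re

    # Illustrative only: an anchored pattern matches a comment that is
    # exactly the command and nothing else, which is why extra prose in
    # the same post keeps the command from firing.
    COMMAND_RE = re.compile(r'^(recheck|reverify)( (?:bug \d+|no bug))?\s*$')

    assert COMMAND_RE.match('reverify bug 1225664')
    assert not COMMAND_RE.match('reverify bug 1225664\ncirros cdn again?')

Hence fungi's habit: put the bare command in one comment and the human-readable details in a second one.
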
jerryzfungi: the code base i tested was from two or three days ago22:11
jerryzfungi: but in e-r 's report, recent reviews also hit similar failure22:12
fungijerryz: it's also possible the elastic-recheck criteria for matching that issue are too vague and catching more than one problem under that umbrella22:12
fungijerryz: link to a recent failure or the report you're talking about?22:13
jerryzAffecting changes: 42523, 46696, 46479, 46206, 46598, 45306, 46738, 46777, 46219, 46792, 4224022:13
jerryzhttps://review.openstack.org/#/c/42240/22:13
*** dcramer_ has quit IRC22:14
*** tvb|afk has quit IRC22:14
fungijerryz: thanks--i'll try to take a look in a bit once i've got jenkins02 back under control22:14
fungiheh... top reports the jvm on jenkins02 is using 40g of virtual memory. it doesn't have but 32g including swap22:15
fungimust be shared22:15
fungiresident is 26g though22:16
fungiokay, jenkins02 is preparing for shutdown. i'll restart the service once all jobs complete22:17
fungiprobably about 30 minutes22:18
jeblairfungi i think nodepool is running the new code that should shift load to jenkins01.  you may want to keep an eye on jenkins01.22:19
fungiyeah, as of this morning's restart. i was thinking about that as well22:20
jeblairsince theres a lot of untested stuff going on.22:20
jeblairif jenkins01 gets overloaded we may need to add a cap in nodepool.22:21
fungijerryz: okay, i see that's the swift storage cap being exceeded? you might see if afazekas wants to work on enlarging that since he did the past couple of changes for it (or propose a similar one?)22:21
fungijeblair: definitely agree22:21
jgriffithjerryz: question on that...22:21
jgriffithjerryz: which case of it are you seeing?22:21
jeblairfungi if there is a prob you can adjust provider max values in nodepool.yaml to quickly get a similar effect.22:22
fungijeblair: noted--thanks!22:22
*** SergeyLukjanov has quit IRC22:22
jerryzjgriffith: https://review.openstack.org/#/c/46531  and https://review.openstack.org/#/c/42240/22:23
jerryzthose are recent failures22:23
fungii'll afk for a few minutes while jenkins02 finishes up and brb22:24
jgriffithjerryz: interesting... 500 failure back from the glance client22:26
jgriffithjerryz: http://logs.openstack.org/31/46531/6/gate/gate-tempest-devstack-vm-postgres-full/aa0cbc2/logs/screen-c-vol.txt.gz#_2013-10-10_15_52_57_75322:26
*** thedodd has quit IRC22:30
*** CaptTofu has joined #openstack-infra22:30
lifelessjeblair: I'd like to offer all TripleO ATC's accounts on this cloud; I could just mail -dev but I'm pondering whether something more directed (e.g. direct email) would be good22:31
*** sarob has quit IRC22:31
BobBallfungi: Is there any way to access a vnc console or similar for VMs in the HP cloud?22:33
openstackgerritSean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul  https://review.openstack.org/5129922:33
sdaguefungi: oops, thanks22:33
*** changbl has quit IRC22:35
fungilifeless: if it's decided that an e-mail list of tripleo atcs is warranted, i can generate one on whatever set of repositories and timeframe you want, basically same as we would for a tripleo ptl election22:36
*** rcleere has quit IRC22:36
jeblairlifeless recommend -dev for now as i'd want to carefully consider giving out email addrs22:36
fungiagreed. i'm hesitant as well, but it's a technical possibility22:37
lifelessjeblair: ack22:37
jeblairi personally think this is a fine use, but i dont want to surprise anyone or break any implied trusts22:37
fungi(and you can always just scrape the git commit logs, but that's got the same privacy concerns)22:37
fungiBobBall: i believe so, but it's been a while since i needed console access to an hpcloud vm22:38
jeblairso lets separately come up with some policy for the future22:38
sdaguefungi: can I get another look from you on the os_loganalyze add - https://review.openstack.org/51299 ?22:38
*** datsun180b has quit IRC22:38
fungisdague: yep, was about to pull it back up22:39
BobBallfungi: any ideas how I might do that? the web interface doesn't seem to give me a clue...22:39
sdaguecoolio22:39
*** CaptTofu has quit IRC22:39
fungisdague: keep in mind i only -1'd you to game my review positivity stats ;)22:39
sdague:)22:40
*** CaptTofu has joined #openstack-infra22:40
clarkbBobBall I am not sure you can. I had the same problem last I tried22:40
fungithat's sucky22:41
openstackgerritDan Nguyen proposed a change to openstack/requirements: Add pwtools to requirements for password generator  https://review.openstack.org/5106822:41
* BobBall sighs deeply22:41
BobBallthat's a real shame...22:41
lifelessjeblair: cool, thanks22:41
fungion the other hand, it seems like a good chunk of nova denial of service issues were related to novnc, so maybe disallowing access there is a defensive measure22:42
* BobBall bangs his head against the soft fluffy HP cloud22:42
lifelessBobBall: oh?22:42
BobBallStruggling trying to get Xen booting nested so we can look at gating tests... and the lack of VNC access means I can't play with boot parameters - once I set them, and it fails, I have to reinstall the machine22:43
BobBallit's a right pain22:43
*** CaptTofu has quit IRC22:44
lifelessBobBall: oh :)22:44
fungii know rackspace provides a console. on the down side the reason i know that is because of having to frequently try to troubleshoot crashed/hung/dead virtual machines22:44
lifelessBobBall: erm, I meant oh :(22:44
lifelessBobBall: do you have xen booting locally using kvm ?22:44
lifelessBobBall: could you just upload a custom image?22:44
BobBallwe've had it working, yes22:44
fungilifeless: via that awesome glance service they offer their customers ;)22:45
lifelessfungi: yup, we have that22:45
BobBallnot seen that upload a custom image?22:45
fungilifeless: is it no longer in beta?22:45
lifelessfungi: it's in public beta still I believe22:45
jerryzjgriffith: can i file a bug?22:46
fungiwell, public beta is way better than secret beta. that's something rackspace still hasn't provided22:46
BobBalllifeless: how would I do that?22:46
jgriffithjerryz: the bug that you pointed to is valid.  Just need to add cinder and possibly glance but not sure yet22:46
jgriffithjerryz: I'll have to get back to it here when I have some more time22:47
BobBallfungi: RS cloud is even less fun - in theory it's doable but in practice we need an HVM linux guest which is a pain to get hold of with RS cloud :P22:47
* fungi nods22:47
jgriffithjerryz: feel free to add Cinder to the projects, I don't think it's an infra bug that's for sure22:47
BobBallthis is the joy of nested virt...22:48
lifelessBobBall: hardware assisted virt will be disabled in the kvm vms though surely22:48
lifelessBobBall: go to https://account.hpcloud.com/services22:49
lifelessBobBall: select us east in the beta section and request access22:49
lifelessBobBall: then once you get that, you can ask for glance access too22:50
BobBallgreat, thanks lifeless22:50
lifelessBobBall: it was about 24 hour turnaround when I got it enabled on the -infra account22:50
lifelessthough I don't think they've done anything with it :P22:50
BobBallbeta request sent :)22:50
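(For context on the "upload a custom image" suggestion: with glance access granted, the upload itself is a one-liner. This is a minimal sketch using the python-glanceclient CLI of the era; the image name and file are placeholders, and the exact flags available would depend on the client version and what HP Cloud's beta exposed.)

```shell
# Sketch: upload a local qcow2 image so nova can later boot it.
# Assumes the usual OS_* auth variables are already exported and the
# account has been granted image-upload (glance) access.
# The image name and "xenserver-test.qcow2" are placeholders.
glance image-create --name "xenserver-nested-test" \
    --disk-format qcow2 --container-format bare \
    --is-public False --file xenserver-test.qcow2
```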
lifelessBobBall: I'd be delighted to help you get a physical test environment up, if you guys have machines - we should be able to use nova baremetal + nodepool to get you d-g style instances of actual xen deployed pretty easily22:52
BobBallwe do - although not nearly the number of machines that -infra use for the gate :)22:54
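(A rough illustration of what lifeless is proposing: nodepool treating a rack of real xen-capable machines, exposed via nova baremetal, as just another provider of d-g nodes. This sketch is hypothetical - the provider, tenant, and image names are invented, and the real nodepool configuration schema of the time differed in details, so treat it as shape rather than syntax.)

```yaml
# Hypothetical nodepool.yaml fragment: a pool of physical
# xen-capable machines behind nova baremetal, handed out to jobs
# the same way virtualised d-g nodes are. All names are invented.
providers:
  - name: xen-baremetal            # invented provider name
    username: nodepool
    project-name: xen-ci           # invented tenant
    auth-url: https://example.com:5000/v2.0
    images:
      - name: devstack-xen
        base-image: 'xenserver-nested-test'  # see glance sketch above
        min-ram: 8192
        setup: prepare_devstack.sh           # hypothetical setup script
```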
sdagueclarkb, jeblair: either of you good with putting this through https://review.openstack.org/#/c/51299/ ? then we could get the gerrit core team set, and I can make changes on that side22:54
BobBallvirtualisation should work - it _really_ should...22:55
jgriffithjerryz: cool.. thanks!22:56
*** dcramer_ has joined #openstack-infra22:56
fungisdague: clarkb seemed basically okay with the previous patchset in irc. i'm okay approving it and will troubleshoot whatever i might overlook22:56
sdaguefungi: that would be awesome22:57
lifelessBobBall: how many concurrent vm's does a gate run need though?22:57
sdaguethen add me + infra-core to the core team in gerrit22:57
lifelessBobBall: say one for d-g itself, and some N concurrent test instances: one solid xen machine should be able to support at least 5 or 6 concurrent d-g style tests.22:58
lifelessBobBall: (without slowing each test down, I mean)22:58
sdagueI'm on for about the next 20 mins22:58
sdaguethen it's off to Plan 9 - http://www.bardavon.org/mobile/event_info.php?id=69422:59
BobBallPerhaps - although I figured we needed one host per VM that's running tests - just to ensure there aren't any cross-interactions which might cause problems?23:00
BobBallalthough maybe I don't understand what d-g style tests are :P23:00
lifelessBobBall: d-g runs devstack which you'd want configured to talk to xen23:00
lifelessBobBall: I don't know xen well; could you have multiple devstacks talking to one xen ?23:00
BobBallin theory, sure23:01
BobBallbut if you have it then there is a risk of one set of tests interacting with another23:01
lifelessk23:01
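(For reference, "devstack configured to talk to xen" means driving a separate XenServer host over XenAPI rather than local libvirt/kvm, which devstack selects via its localrc. A minimal sketch, with the host address and password as placeholders:)

```shell
# Minimal localrc sketch for a devstack that drives a XenServer host
# via XenAPI instead of local libvirt/kvm. The connection URL and
# password are placeholders.
VIRT_DRIVER=xenserver
XENAPI_CONNECTION_URL="http://xenserver.example.com"
XENAPI_USER=root
XENAPI_PASSWORD=secret
```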
*** boris-42 has quit IRC23:01
BobBalle.g. if you break the xenserver in a horrible way (or the plugins don't match...) then it might show up as a failure when it shouldn't have23:01
lifelessperhaps have it just run nova gates?23:01
BobBallThat'd be easier for sure23:01
BobBallso how many hosts do you think might be needed?23:02
openstackgerritA change was merged to openstack-infra/config: add os-loganalyze to gerrit & zuul  https://review.openstack.org/5129923:02
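(For readers unfamiliar with what "add os-loganalyze to gerrit & zuul" involves: in openstack-infra/config the change is essentially declarative - an entry in the gerrit projects list plus a zuul layout entry wiring jobs to the new repository. The sketch below is illustrative only; the job names and description are guesses, not the actual contents of change 51299.)

```yaml
# Illustrative only -- not the literal contents of review 51299.
# Gerrit side: register the new repository.
- project: openstack-infra/os-loganalyze
  description: Filter for OpenStack log files (description is a guess)

# Zuul side: attach jobs to the new project.
# projects:
#   - name: openstack-infra/os-loganalyze
#     check:
#       - gate-os-loganalyze-pep8   # hypothetical job name
#     gate:
#       - gate-os-loganalyze-pep8
```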
lifelessnova is a pretty big fraction of the changes23:03
lifelessbut23:03
lifelessI don't have a gut feel - clarkb / fungi may well23:03
*** senk has joined #openstack-infra23:03
lifelessthe full gate, remembering my back of envelope figures23:04
lifelesswas 400 changes in one day23:04
lifelessat 30m each23:04
BobBallI see23:05
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Change test_queries from logical AND to OR  https://review.openstack.org/5016023:05
sdaguefungi: so now that it's merged, we just wait for the next puppet update to trigger the import?23:06
BobBalloh rubbish - just realised it's midnight23:06
BobBallI really should get some sleep23:06
lifelessBobBall: https://etherpad.openstack.org/tripleo-test-cluster23:06
lifelessBobBall: we figured 40 concurrent test environments is sufficient23:06
lifelessBobBall: so 40 small machines for xen23:06
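(Sanity-checking those back-of-envelope figures: 400 changes a day at roughly 30 minutes each is only about 8 environments of sustained load, so 40 leaves around 5x headroom for daily peaks and gate resets. A quick check:)

```python
# Back-of-envelope check of the figures quoted above.
changes_per_day = 400
minutes_per_run = 30

machine_minutes = changes_per_day * minutes_per_run   # 12000
machine_hours = machine_minutes / 60.0                # 200.0
sustained_concurrency = machine_hours / 24.0          # ~8.3

# ~8.3 environments if load were perfectly flat; 40 gives roughly
# 5x headroom for peak hours and gate restarts.
print(sustained_concurrency)
```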
fungisdague: yup, and then i'll add you as the initial core group member, and add the infra core group as included23:06
sdaguefungi: cool23:06
lifelessBobBall: perhaps a moonshot chassis fully loaded?23:07
BobBallthat'd be a very nice way to do it23:08
*** senk has quit IRC23:08
fungilifeless: BobBall: if you're just talking about gating load, have a look at http://status.openstack.org/zuul/ and note that each job listed for a change is using an 8gb vm with 4x vcpu23:09
lifelessfungi: moonshot is dual core + hyperthreads with 8GB23:09
fungiso depending on the project you're gating, maybe around 10ish servers in parallel23:09
fungilifeless: sounds comparable23:10
lifelessfungi: right, it's why I suggested it.23:10
fungiis that the arm hardfloat version or the atom one?23:10
lifelessfungi: there will be higher density cartridges in future, of course23:10
BobBall10 doesn't sound like enough to me if I'm honest23:10
lifelessfungi: atom, it even has VTx23:10
fungiBobBall: i meant 10ish per change you want to test in parallel23:10
BobBallI'm very tempted by the moonshot idea23:11
lifelessBobBall: fungi means 10 * - 10 servers per commit, but I think he's wrong :)23:11
BobBalloh I see23:11
BobBallwhy 10 per commit?23:11
fungii may be. checking the veracity of my assertion now23:11
lifelessfungi: do you mean 'tempest runs 10 sub-vm's'?23:11
lifelessfungi: or do you mean 'zuul schedules 10 jobs' ?23:11
sdagueare you guys talking about devstack/tempest runs?23:11
sdaguebecause our experience is the cpu does matter quite a bit23:11
BobBallWe're talking about adding a devstack/tempest/xenapi run somehow :)23:12
sdaguewhich is why the rax nodes aren't used23:12
sdagueso atom... not a great idea :)23:12
fungijust talking about jobs in general. if you were to replicate *all* of our gating, we use 9 virtual machines in parallel for each iteration of attempting to gate a nova change, for example23:12
lifelesssdague: mmm, I'd seriously consider native atom over virtualised $other :>23:12
lifelessfungi: right, so thats the wrong way to look at it23:13
BobBallahhh ok23:13
funginot sure what metric BobBall was looking for there23:13
lifelessfungi: the way to look at it is we're adding one more job to that set.23:13
fungioh, in that case one per change tested in parallel23:13
lifelessfungi: so from 9 vm's to 10, one of which BobBall would be providing in a dedicated xen-capable-environment.23:13
BobBallI'm not sure I know either :)23:13
sdaguelifeless: it seems pretty cpu bound, so virtualized doesn't have much overhead23:13
lifelesssdague: tempest is running against qemu vm's23:14
sdaguelifeless: the qemu vm start times aren't really the issue23:14
fungijenkins02 has been gracefully restarted and is coming up now23:14
*** sarob has joined #openstack-infra23:14
lifelesssdague: ok; I'll defer to data here.23:14
lifelesssdague: just that even cirros can't make the vm's do their stuff well :>23:14
lifelesssdague: I would want to investigate a xen-on-moonshot test before writing it off23:15
sdaguefair, just saying what I've seen.23:15
lifelesssdague: these aren't the atoms most folk have seen23:15
sdagueok, well even the amd chips in rax give us a 40% slow down compared to the intel chips at hp23:16
BobBallwell I'd question whether we'd need to run the full set of tempest tests as well - they all pass of course, so that's not the issue, but some of them are entirely independent of the hypervisor driver23:16
lifelesshttp://www8.hp.com/us/en/products/proliant-servers/product-detail.html?oid=5375897#!tab=specs <- the cartridges I'm referring to23:16
BobBallthe rax chips you were testing on are a fair bit older than the intel ones at HP though23:16
sdaguelifeless: what's the L3 look like on those?23:16
sdagueBobBall: fair23:16
lifelesshttp://ark.intel.com/products/series/71265/Intel-Atom-Processor-S1200-Product-Family-for-Server23:16
sdagueI'd say get some data on a real system first though23:17
lifelesssdague: 1 MB23:17
lifelesssdague: yes, +1 on getting real data23:17
sdagueso, I'd be suspicious then. We've seen some pretty strong correlation between L3 size and speed here.23:18
sdaguebut some runs would be good23:18
BobBallDo you have access to a moonshot system lifeless?  I can probably get access but it's likely to take a while23:18
sdagueok, movie time23:18
lifelessBobBall: not at the moment, but I know folk who do :/23:18
BobBallokay23:18
*** fifieldt has joined #openstack-infra23:19
fungisdague: if it's Wood's original Plan 9, one of my favorites ;)23:19
BobBallokay I'll check with our HP blokey23:19
lifelessBobBall: I would suggest, if doing this is a real possibility, that we go in the front door and get a sales person involved - the sales folk have ready access to moonshot for customer evaluations23:19
lifelessBobBall: (e.g. fully populated 45 cartridge + two switch chassis)23:19
BobBallMaybe.  I know someone who has been talking about moonshot so I'll have a few words with him first23:20
BobBalland try the glance upload route too :)23:21
lifelesscool23:21
BobBallall sorts of fun!23:21
lifelessif you run into a wall, let me know23:21
lifelessI have some interactions with moonshot teams23:21
BobBallperfect, thanks.23:21
BobBall*sleep*23:22
*** BobBall is now known as BobBallAway23:22
lifelessgnight!23:22
*** marktraceur is now known as FreeThaiFood23:23
fungijeblair: anecdotal but worth watching for next time, we had a great many more devstack jobs end up on jenkins02 as soon as it came up than were running on jenkins01. like it got favored for some reason (maybe accumulated shares from while it was unreachable?)23:26
fungiat the moment there are about 5 devstack jobs running on jenkins01 and nearly 50 on jenkins0223:27
fungibut jobs still seem to be running and completing successfully23:29
fungii'll check back in on it in a bit23:29
*** pentameter has quit IRC23:33
*** mriedem has joined #openstack-infra23:37
*** nati_uen_ has joined #openstack-infra23:45
*** nati_ueno has quit IRC23:46
*** FreeThaiFood is now known as marktraceur23:48
*** hogepodge has quit IRC23:49
*** rnirmal has quit IRC23:54
*** vipul is now known as vipul-away23:57
