Friday, 2013-12-13

fungizuul seems to have restarted jobs from jenkins01, as far as i can tell from the status page00:00
*** paul-- has joined #openstack-infra00:00
fungiis that new behavior, or am i seeing fairies?00:00
clarkbyeah it is running a bunch of jobs00:01
fungii want to say the last time a jenkins master fell over we had jobs stuck in a running state in zuul until we restarted it00:01
*** oubiwann-lambda has quit IRC00:01
fungi(until we restarted zuul i mean)00:01
*** nicedice has quit IRC00:02
clarkbwe need to make sure that the things that feed off of zmq reconnected00:02
funginodepool needed a restart last time, right?00:02
clarkbyeah and the logstash gearman client00:03
*** nicedice has joined #openstack-infra00:03
clarkbzmq is supposed to avoid these problems but ugh00:03
fungii can go punt it now00:03
clarkbfungi: well we should be able to check if it is connected I think00:03
fungior do we want to wait and see if nodepool retains sanity this time?00:03
fungiyeah, that00:03
*** UtahDave has quit IRC00:04
jeblairi'll check on nodepool00:04
*** pabelanger has joined #openstack-infra00:05
fungilooks like we're also down four precise nodes across the two masters00:06
fungii'll get them back up and working at some point this evening00:06
jeblairnodepool is receiving zmq events from both masters, and is currently assigning nodes to both of them.00:07
jeblair(btw, it correctly noted jenkins01 was down earlier and stopped trying to assign nodes to it)00:07
fungithat is a significant improvement over last time00:07
clarkbjeblair: woot00:08
*** mdenny has quit IRC00:08
clarkblogstash seems fine too00:08
*** pabelanger has quit IRC00:09
clarkbI think zmq reconnect mechanism works fine when the disappearance of the service is relatively short00:09
jeblairand yeah, zuul should know to restart jobs if gearman fails, so that's (optimistically) expected behavior00:09
clarkbit has problems when it is hours long00:09
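(To make the zmq discussion above concrete: a minimal pyzmq subscriber sketch of the kind of consumer nodepool and the logstash gearman client are — the endpoint, port and event handling here are illustrative assumptions, not the production configuration. A SUB socket silently retries its connection when the publisher goes away, which is the automatic reconnect behavior being described.)

```python
import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.setsockopt(zmq.SUBSCRIBE, b'')               # listen for every topic
socket.connect('tcp://jenkins01.example.org:8888')  # hypothetical publisher address

while True:
    # recv_string() blocks until an event arrives; if the publisher dies,
    # the SUB socket keeps retrying the connect in the background and
    # resumes receiving once the master is back.
    event = socket.recv_string()
    print(event)
```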
*** blamar has quit IRC00:10
clarkbhttps://review.openstack.org/#/c/61321/1 is the change to fix nodepool install on jenkins-dev00:10
jeblairclarkb: how did you notice jenkins01 died?00:10
openstackgerritA change was merged to openstack-infra/config: Enable patchset-created for #openstack-state-management channel  https://review.openstack.org/6160500:12
*** jgrimm has quit IRC00:13
clarkbjeblair: I was going to hold d-g nodes in hopes of debugging the tempest ssh test, and couldn't open the web UI for jenkins01 to find which nodes needed holding00:13
*** bnemec has quit IRC00:14
anteayaAaronGr: you still around?00:14
AaronGranteaya: hi, i am.00:15
fungiclarkb: zaro: does someone still need to test-drive https://review.openstack.org/61321 on jenkins-dev to confirm it puppets successfully?00:15
jeblairclarkb: neat.  so basically we just lost half our infrastructure and it was pretty much only noticeable by admins.  that makes me happy.  :)00:15
clarkbfungi: probably, I have also -1'd it because I think it needs a few tweaks00:15
fungioh00:15
clarkbzaro: can you address those comments?00:16
anteayaAaronGr: to help you parse the above, the path of a commit is user -> Gerrit -> Zuul -> Jenkins job running on a node provided by nodepool00:16
fungiyes, looks like you just did that00:16
anteayaAaronGr: that is simplified but a place to begin for now00:16
clarkbjeblair: yup I think I caught it within a couple minutes of it happening00:16
anteayaAaronGr: every new patch submitted follows that path for testing00:16
jeblairclarkb: i think you did too, but i meant that it seems like we're making headway in increasing fault tolerance.00:17
clarkbyup00:17
AaronGranteaya: gerrit = review, zuul = scheduler for jenkins tasks?00:17
*** bnemec has joined #openstack-infra00:18
anteayaAaronGr: gerrit == review00:18
anteayanot sure if I could call zuul a scheduler or not, but definitely a co-ordination layer between Gerrit and Jenkins jobs00:18
fungi"scheduler" is not an inappropriate term for it00:19
*** dstanek has quit IRC00:20
AaronGrsorry, by scheduler i meant 'mechanism for submitting a task to run a jenkins job', in this case to validate a reviewed patch.00:20
fungiin fact, http://ci.openstack.org/zuul/ starts out in its introduction, "The main component of Zuul is the scheduler." (so it's more than a scheduler, but that's a lot of it)00:20
AaronGrassuming it comes back without failing, what's the next step? (i am taking the ci page line at a time, hadn't hit Z yet *grins*)00:21
anteayaAaronGr: that can work for now, but as you understand more prepare to refine the definition00:21
clarkbzaro: if you aren't able to make those changes I think I will quickly make them00:21
anteayalogs get posted to the static logs server00:21
AaronGranteaya: absolutely, this is helping to balance out what i've been reading with a condensed set of steps.00:21
anteayazuul tells gerrit what to write on the comment on the patch, attributed to Jenkins00:22
anteayaAaronGr: great00:22
anteayahelps me to say it out loud too00:22
anteayalearning by teaching00:22
AaronGrso back to gerrit, and then it goes through someone who does the final merge?00:22
anteayaso pass or fail, that is what happens00:23
AaronGr(appreciated)00:23
fungianteaya: AaronGr: actually, http://docs.openstack.org/infra/publications/overview/ is a good high-level presentation on those topics00:23
anteayaAaronGr: a person approves, the patch runs through the gate, if passed it is merged as part of the jenkins job00:23
anteayano human merges00:23
fungiit's the "how we try to explain infrastructure to a room full of people in 30 minutes to an hour"00:23
AaronGrfungi: ok, the explanations here help the pictures make sense.00:24
fungi(complete with pretty pictures)00:24
*** openstackgerrit has quit IRC00:24
fungiahh, cool, didn't know if you'd found the presentations yet00:24
jeblairi think there are youtube videos of us giving that presentation.00:24
*** openstackgerrit has joined #openstack-infra00:24
*** ekarlso has quit IRC00:25
*** ekarlso has joined #openstack-infra00:25
AaronGrso code -> gerrit -> jenkins test -> gerrit -> jenkins -> codebase looks like the oneliner00:25
AaronGr(assuming 2 levels of review and no bugs)00:25
anteayaI think you have a basic understanding00:27
*** herndon has quit IRC00:27
jeblairclarkb: after only 4 training runs, i'm starting to get results like this from crm114:00:27
AaronGrnice!00:27
jeblairbad 1.0000 5.3426 2013-12-11 21:45:56.664 | Details: {u'conflictingRequest': {u'message': u"Cannot 'rebuild' while instance is in task_state rebuilding", u'code': 409}}00:27
fungineat!00:27
jeblairthe 1.0 is a rather high probability that line is associated with a failure00:28
*** reed has quit IRC00:28
fungiit's picking up quickly00:28
clarkbjeblair: nice, would it be possible to make that an elasticsearch column?00:29
jeblairfungi: knowing the answers ahead of time helps.  :)  and actually changes the problem a bit.00:29
clarkbjeblair: right you can train on any job that was successful00:29
fungijeblair: well, true. it's more ham vs spam at that stage00:29
jeblairclarkb: that's what i'm thinking00:29
clarkbjeblair: I am thinking that if we have a numeric elasticsearch column with some probability then we can search based on that00:30
clarkblucene can do >=0.8 for example on numeric fields iirc00:30
fungioh, neat. yeah filter or sort on bayesian score00:30
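(A sketch of the numeric-range filtering being proposed here: assuming a hypothetical field name error_pr holding the crm114 score, a Lucene query_string range like the one clarkb mentions could be issued against elasticsearch roughly as below — the endpoint, index pattern and field name are placeholders.)

```python
import json
import requests

query = {
    "query": {
        "query_string": {
            # Lucene numeric range: everything with a score >= 0.8
            "query": "error_pr:[0.8 TO *]"
        }
    },
    "size": 10,
}

# hypothetical elasticsearch endpoint and index pattern
resp = requests.post('http://elasticsearch.example.org:9200/logstash-*/_search',
                     data=json.dumps(query))
for hit in resp.json()['hits']['hits']:
    print(hit['_score'], hit['_source'].get('message'))
```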
lifelessjeblair: ohh what are you training?00:30
*** ArxCruz has joined #openstack-infra00:31
jeblairlifeless: i'm seeing if crm114 can help identify log lines that indicate failure; it's something we talked about at the havana summit but needed logstash to exist first.00:32
anteayahey ArxCruz, I'm going to be sending people your way if they have questions about setting up their own infra00:32
anteayayou and lifeless00:33
anteayahope that is okay with you00:33
ArxCruzanteaya: sure, what's happening ?00:33
anteayaArxCruz: neutron needs all the plugin developers to provide 3rd party testing, there are a few of them00:33
*** oubiwan__ has joined #openstack-infra00:34
anteayawe suggested they set up their own infra using zuul, devstack-gate and nodepool to do so00:34
ArxCruzanteaya: i know IBM is doing something, I've been contacted by some colleagues00:34
anteayaArxCruz: cool00:34
anteayayes if IBM has a neutron plugin they will have to test it00:34
*** rcleere has quit IRC00:35
openstackgerritClark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev  https://review.openstack.org/6132100:35
clarkbzaro: fungi ^00:35
fungianteaya: i take it the gerrit jenkins plugin solution described on the third-party testing howto was insufficient for most of them?00:35
*** vipul is now known as vipul-away00:35
*** vipul-away is now known as vipul00:35
anteayafungi: I don't think that even came up00:37
anteayaArxCruz: fungi meeting logs: http://eavesdrop.openstack.org/meetings/networking_third_party_testing/2013/networking_third_party_testing.2013-12-12-17.00.log.html00:37
lifelessjeblair: sweeet00:37
anteayaetherpad: https://etherpad.openstack.org/p/multi-node-neutron-tempest00:37
fungianteaya: http://ci.openstack.org/third_party.html00:37
clarkbfungi: there seems to be a large lack of reading prior art and a lot of "what do we do"00:37
* anteaya clicks00:37
anteayaclarkb: large lack00:38
lifelessanteaya: so, I will answer questions as needed, but folk should come here firstly.00:38
lifelessanteaya: this is the community forum for discussion00:38
anteayafungi: I hadn't read that before either, that looks so simple00:39
clarkbjeblair: the more you say about it the more I am interested :) curious to know what your plan for piping the data through crm114 is and what the crm114 setup looks like (iirc crm114 can do several different types of filters)00:39
clarkbjeblair: but don't let me distract you00:39
anteayafungi: so all they would need is their own Jenkins with this trigger plugin?00:39
anteayalifeless: understood00:40
fungianteaya: plus stuff for their jenkins to run, and a place to post their logs00:40
anteayaI find it helpful to give them names, otherwise they tend to stay silent and just write it all themselves00:40
clarkbfungi: a lot of them want to do things that don't lend themselves well to the trigger plugin00:40
anteayafungi: right, so simple00:40
anteayathat is true too00:41
clarkbthey want multi node baremetal testing with single-use environments and the ability for granular control over what events trigger specific jobs00:41
clarkbtl;dr I really think they should look at zuul devstack-gate and nodepool00:41
anteayain any case they should be in here asking questions00:41
fungigot it00:41
anteayaso brace for onslaught00:41
anteayaat least I hope they show up in here asking questions00:42
openstackgerritTom Fifield proposed a change to openstack-infra/config: Add welcome_message.py to patchset-created trigger  https://review.openstack.org/6189800:42
openstackgerritClark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev  https://review.openstack.org/6132100:44
clarkbthat should fix the lint error I hope00:44
*** dstanek has joined #openstack-infra00:44
*** sarob has quit IRC00:46
*** sarob has joined #openstack-infra00:46
*** dstanek has quit IRC00:49
*** senk has joined #openstack-infra00:50
*** sarob has quit IRC00:51
*** vipul is now known as vipul-away00:51
openstackgerritlifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now.  https://review.openstack.org/6190000:51
openstackgerritTom Fifield proposed a change to openstack-infra/jeepyb: Add dryrun flag to welcome_message.py  https://review.openstack.org/6190100:52
openstackgerritTom Fifield proposed a change to openstack-infra/config: Add welcome_message.py to patchset-created trigger  https://review.openstack.org/6189800:53
*** vipul-away is now known as vipul00:53
clarkbsdague: jeblair fungi https://bugs.launchpad.net/devstack/+bug/1253482 see my last comment there00:53
uvirtbotLaunchpad bug 1253482 in keystone "Keystone default port in linux local ephemeral port range. Devstack should shift range." [Undecided,In progress]00:53
*** senk has quit IRC00:54
fungiclarkb: good point00:55
*** mriedem1 has quit IRC00:55
funginodepool would have them marked as used, but jenkins might have undone its marking of them as used single-use slaves?00:56
jeblairfungi: yeah, the gearman plugin marks them as offline.  i'm guessing a restart marks them all online again.00:56
jeblair(so not actually a nodepool thing but rather a jenkins thing)00:57
funginext time we crash or even reboot a jenkins master, should we nodepool-delete all ephemeral slaves associated with it before starting again?00:57
clarkbjeblair: oh right00:57
clarkbfungi: all used slaves00:57
fungiright, that00:57
*** praneshp has quit IRC00:57
jeblairi'm trying to think of something nodepool could do, but the bad behavior is that jenkins brings all previously known slaves online immediately...00:58
jeblairi think perhaps once we get to all-dynamic slaves, we could probably write a quick script to remove all slaves from the config before (re-)starting jenkins00:58
fungiso, like, here in a moment when we put jenkins02 into prepare-for-shutdown, we should clear used 02 slaves out once it quiesces00:58
clarkbjeblair: right. is nodepool doing a temporary offline or a normal offline?00:58
jeblairfungi: if you put it in shutdown, they should all go away on their own.00:58
fungiahh00:59
clarkber sorry gearman plugin00:59
fungiso really only in the case of unanticipated jenkins failure00:59
jeblairclarkb: i think there is only "offline" and "disconnect";  so i think it's just doing offline.  disconnect would be problematic.  if there's an offline that's more than offline, i'm not familiar with it.00:59
*** mrodden has quit IRC00:59
jeblairclarkb: (but this part of jenkins is kind of a mess, with internal terms not lining up at all with ui elements, etc)01:00
clarkbjeblair: the gui button says "Mark this node temporarily offline"01:00
clarkbI am guessing that lines up to offline and ya disconnect would be bad01:00
jeblairclarkb: i believe that's what's going on.01:00
fungiwhy is disconnect particularly bad?01:01
jeblairanyway, it sounds like we can make an improvement soon.01:01
jeblairfungi: it'll stop the running job01:01
fungioh01:02
fungiyeah, that's bad. okay, important safety tip01:02
jeblairfungi: heh.  yeah, this happens immediately when a job starts so that there's no race condition with doing this when it finishes.01:02
*** ^demon|lunch has quit IRC01:02
*** blamar has joined #openstack-infra01:02
fungisort of the jenkins equivalent of total protonic reversal. got it01:03
clarkbwould be nice if we had temporary offline functionality that wasn't temporary01:03
jeblairwe could probably change the node label too, but that's a lot of extra work for gearman-plugin.01:03
jeblairmay be worth looking into though.01:04
clarkbjeblair: and in the long run probably better putting the effort into making jenkins more reliable01:04
jeblair(or more gone)01:04
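(A rough sketch of the "remove all slaves from the config before (re-)starting jenkins" script jeblair floats above, using Jenkins' JSON API; the master URL and credentials are placeholders and CSRF-crumb handling is omitted.)

```python
import requests

JENKINS = 'https://jenkins02.example.org'   # placeholder master URL
AUTH = ('admin', 'api-token')               # placeholder credentials

# Ask the master for every node it currently knows about.
computers = requests.get(JENKINS + '/computer/api/json', auth=AUTH).json()

for node in computers['computer']:
    name = node['displayName']
    if name == 'master':
        continue                            # never remove the master itself
    # Delete the slave's configuration so jenkins cannot bring it back
    # online automatically when it starts up again.
    requests.post(JENKINS + '/computer/%s/doDelete' % name, auth=AUTH)
```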
clarkbfungi: re https://review.openstack.org/#/c/61321/3 did you want to try applying that to nodepool.o.o and jenkins-dev? I can give it a shot tomorrow01:05
fungiclarkb: doing now01:05
*** jhesketh_ has quit IRC01:05
openstackgerritA change was merged to openstack-dev/hacking: Fix typos of comment in module core  https://review.openstack.org/6111101:06
*** jerryz__ has quit IRC01:06
clarkbfungi: I would --noop it first time just to make sure we didn't do anything silly :)01:06
fungiyup01:07
anteayaAaronGr: --noop is no op or no operation, it means a test which stands up a devstack on a node and returns true01:09
clarkbanteaya: I think AaronGr is familiar with the puppets01:09
anteayaAaronGr: its main purpose that I know of is a placeholder for further tests01:09
clarkbin fact I think we might be able to bug him about making puppet stuff better >_>01:09
anteayaclarkb: ah okay, will look for him to teach me about the puppets01:09
anteayathank you01:09
fungianteaya: in this case it means i want puppet to pretend to apply new configuration but not actually do it01:10
fungi(no op is a very common term in computing)01:10
anteayathe pretending to apply configuration is always so satisfying01:10
anteayaah sorry, my mistake then, that was new for me01:10
fungiwell, when it tells me what it's going to do without actually doing it (and then screwing things up), yes satisfying ;)01:10
anteayaguess I am sharing the stuff I wish I knew01:11
anteayahehe01:11
AaronGranteaya: thankfully, one thing i am bringing with me is a bit of puppet, been actively using it for about a year (i run a puppetmaster in my house, for my home network)01:13
fungimmm, puppet agent was not even running on jenkins-dev. going to update it from production before i --noop01:13
anteayaAaronGr: awesome, I will ask you many stupid puppet questions01:13
anteayaget ready01:13
AaronGranteaya: fair trade, you're welcome to anything i know.  have looked through about 40% of infra/config -- i saw at least 10 spots where modules could get rewritten or refactored easily01:14
* StevenK waits for "Is the puppet made out of oak, pine or maple?"01:14
AaronGrplus some really cool places to use more hiera.01:14
anteayaAaronGr: awesome01:14
fungiahh, i think the agent must have been left stopped while testing the previously-broken nodepool addition01:14
anteayahiera I don't understand at all, so if you do, power to you01:14
anteayaAaronGr: I have a few infra bugs with my name on them that you might like, puppety stuff01:15
anteayaAaronGr: been in #openstack-neutron since the summit, hard to wear two hats, at least for me01:15
*** mrodden has joined #openstack-infra01:15
*** oubiwan__ has quit IRC01:15
clarkbfungi: ya that was probably me01:16
AaronGranteaya: i'll happily take them, though not until monday, when i get fully up to speed.  after that, pour them on.01:16
*** mrodden has quit IRC01:16
anteayaAaronGr: fair enough01:16
fungiclarkb: see review comment. jenkins_dev_api_user01:17
*** mrodden has joined #openstack-infra01:17
*** ljjjustin has joined #openstack-infra01:18
clarkbfungi: looking01:19
fungiclarkb: playing around with fixing it now. i think the template needs to just not use _dev on its vars01:19
clarkbfungi: oh right, because we collapsed the variables in puppet01:20
fungiyep, those three lines need fixing, but that's not all. new errors once i do01:20
clarkbwoo01:20
fungiupdated comments with the new errors01:23
fungithough perhaps those are an artifact of --noop01:23
fungi?01:23
*** ryanpetrello has quit IRC01:24
fungii can try dropping the --noop and seeing if it applies cleanly01:24
clarkbya those look like artifacts of the --noop01:24
openstackgerritClark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev  https://review.openstack.org/6132101:24
clarkbfungi: ^ that removes the dev vars from the erb01:24
*** jhesketh has joined #openstack-infra01:25
*** hogepodge has quit IRC01:25
*** oubiwan__ has joined #openstack-infra01:26
Alex_GaynorSo the gate is about 12 hours behind real time. Is that entirely because of resets, or other causes?01:28
clarkbAlex_Gaynor: mostly resets01:29
fungiclarkb: yet still more new error comment01:29
clarkbthe sphinx thing and changes getting approved anyways really made it thrash yesterday01:29
*** syerrapragada1 has quit IRC01:29
*** syerrapragada has joined #openstack-infra01:30
clarkbfungi: if you can pip install by hand does it work?01:30
*** syerrapragada has left #openstack-infra01:31
*** praneshp has joined #openstack-infra01:31
fungiit may be missing dependencies for compiling libzmq01:31
clarkbthat could be01:31
fungiahh, yeah01:31
fungigcc: error trying to exec 'cc1plus': execvp: No such file or directory01:31
fungigrah01:32
clarkbdon't we put build-essential everywhere?01:32
clarkbfungi: curious why that wasn't a problem on nodepool.o.o01:33
fungiInstalled: (none)01:34
fungiCandidate: 11.5ubuntu2.101:34
*** praneshp_ has joined #openstack-infra01:34
fungias opposed to nodepool.o.o, Installed: 11.5ubuntu2.101:34
fungiso, no, jenkins-dev has nothing telling it to install build-essential apparently01:35
*** weshay has quit IRC01:35
*** praneshp has quit IRC01:36
*** praneshp_ is now known as praneshp01:36
clarkbinteresting01:36
*** syerrapragada1 has joined #openstack-infra01:36
fungiclarkb: also, do we still want to restart jenkins02? if so, i can go ahead and put it in shutdown now01:37
clarkbI wonder if jeblair installed that by hand, git grep doesn't show it anywhere that nodepool.o.o would pick it up on01:37
clarkbfungi: sure01:37
clarkbfungi: I am adding build-essential to the nodepool module now01:37
fungii got precise23 and precise40 back online in jenkins, but precise5 and precise9 don't seem to want to relaunch the slave agent even after rebooting (and i'm able to ssh into them fine)01:38
fungijenkins02 is in prepare for shutdown now01:38
openstackgerritClark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev  https://review.openstack.org/6132101:38
clarkbfungi: weird01:38
*** jhesketh__ has joined #openstack-infra01:38
clarkbtry that ^01:38
*** gyee has quit IRC01:38
*** dims has quit IRC01:39
*** syerrapragada1 has quit IRC01:39
fungilooks like we've got about 40 minutes to quiescence on jenkins02, based on most recently-started jobs01:39
fungiclarkb: success01:44
clarkbwoot01:44
fungihrm, though nodepool's still not installed01:44
fungithat's... no good01:44
clarkboh because the repo didn't refresh the installer?01:45
clarkbyou can probably just delete the repo and make it reclone01:45
*** dstanek has joined #openstack-infra01:45
fungigood call01:45
fungii thought it finished rather quickly on that run :(01:45
clarkbwe should just make everything stateless01:45
*** xchu has joined #openstack-infra01:46
funginodepool==3871acf01:47
*** sdake_ has quit IRC01:47
fungimuch better01:47
*** sdake_ has joined #openstack-infra01:47
*** sdake_ has quit IRC01:47
*** sdake_ has joined #openstack-infra01:47
fungiand nodepool list works (though the list is of course empty at the moment)01:47
fungialien-list and alien-image-list return entries though, so auth is definitely sane01:48
clarkbfungi: btw what was the process for getting the credential id? did you add a credential to jenkins-dev then go grab an id out of the xml?01:48
fungiclarkb: yes01:49
*** dstanek has quit IRC01:50
fungii figured out where to find it first by grep'ing the prod one out of jenkins01, then confirmed that jenkins-dev had none, then went into manage credentials and added one which matched the settings in the jenkins01 webui, then fished it out of the xml after that01:50
fungiand bob's your uncle01:50
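(For reference, "fishing it out of the xml" can be as simple as the sketch below; the credentials.xml path and the assumption that each credential entry carries an <id> element follow the credentials plugin's usual layout and may differ on a given install.)

```python
import xml.etree.ElementTree as ET

# path assumed; it is wherever JENKINS_HOME lives on the master
tree = ET.parse('/var/lib/jenkins/credentials.xml')
for elem in tree.iter('id'):
    print(elem.text)
```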
clarkbfungi: I wonder if we need to change min-ready to 1 in the nodepool config01:51
fungiprobably01:51
clarkbis nodepool building an image currently?01:51
fungiit didn't seem to be when i looked, but i'll look again01:51
openstackgerritClark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev  https://review.openstack.org/6132101:51
funginope. image-list is still empty01:51
clarkbthat bumps the min-ready number01:51
fungitesting01:51
clarkbI bet that fixes the image-listing01:51
*** dims has joined #openstack-infra01:52
*** senk has joined #openstack-infra01:52
fungiaha, no... nodepool daemon didn't start01:54
fungiclarkb: once i *started* the nodepool initscript, it began to build an image01:56
fungidid we skip an ensure => running?01:56
clarkbfungi: possibly. I know jeblair isn't a fan of ensure => running01:57
fungiyup, modules/nodepool/manifests/init.pp doesn't do it01:57
fungiokay, mystery solved01:57
*** senk has quit IRC01:58
*** dstanek has joined #openstack-infra01:58
*** senk has joined #openstack-infra02:00
*** AaronGr is now known as AaronGr_afk02:01
*** CaptTofu has joined #openstack-infra02:03
*** yongli has quit IRC02:05
*** locke105 has joined #openstack-infra02:09
*** senk has quit IRC02:10
*** mrodden1 has joined #openstack-infra02:16
*** WarrenUsui has quit IRC02:17
*** sdake_ has quit IRC02:17
*** WarrenUsui has joined #openstack-infra02:18
*** mrodden has quit IRC02:18
*** senk has joined #openstack-infra02:20
*** senk has quit IRC02:24
*** senk has joined #openstack-infra02:24
*** senk1 has joined #openstack-infra02:28
*** senk has quit IRC02:29
*** yaguang has joined #openstack-infra02:29
*** senk1 has quit IRC02:31
*** mriedem has joined #openstack-infra02:31
*** senk has joined #openstack-infra02:32
*** reed has joined #openstack-infra02:36
*** senk has quit IRC02:37
*** CaptTofu_ has joined #openstack-infra02:42
*** bingbu has joined #openstack-infra02:43
*** CaptTofu has quit IRC02:44
*** guohliu has joined #openstack-infra02:45
*** sarob has joined #openstack-infra02:46
*** SushilKM has joined #openstack-infra02:57
*** yongli has joined #openstack-infra02:58
*** yamahata_ has quit IRC03:05
*** beagles has quit IRC03:08
*** b3nt_pin has joined #openstack-infra03:09
*** b3nt_pin is now known as beagles03:09
*** mestery_ has joined #openstack-infra03:11
*** mestery has quit IRC03:14
*** sdake_ has joined #openstack-infra03:16
*** pcrews has quit IRC03:16
*** dkliban has quit IRC03:18
mordredjeblair: I support your crm114 efforts. that's really f-ing cool03:22
clarkbmordred: it is incredibly cool. I will owe jeblair lots of alcohol I bet03:23
mordredclarkb: ++03:23
clarkbmordred: one of the things I have been incredibly happy about with the whole logstash elasticsearch thing is that it has enabled folks to hack on it in simple ways without needing too many crazy workarounds for eg logs behind apache03:25
clarkbI think that portion of the system has turned out well. It isn't all perfect though. A lot of the data could be modeled with relations and we don't have that03:25
mordredclarkb: yah. it's one of the coolest things ever03:25
mordredclarkb: I think it also goes to show the power of logging in sane ways03:25
anteayamordred: what country are you in?03:27
anteayagood thing you aren't a micro manager, I haven't talked to you in a month03:28
StevenKLast I heard, it was .es, but that could have changed03:28
mordredanteaya: spain. flying out in a few hours03:28
mordredStevenK: I see you've already started playing everyone's favorite game "Where in the world is mordred?"03:28
mordredanteaya: that's right - would you like me to micro-manage more?03:29
mordredanteaya: go do things!03:29
mordredthat's all I've got03:29
clarkbha03:29
clarkbeven when he tries he isn't able :)03:29
*** dkliban has joined #openstack-infra03:30
anteayamordred: great, whew thanks for that03:30
anteayaI feel better know03:30
anteayanow03:30
mordredclarkb: do - uhm - different things. perhaps scrumming something is a good choice?03:30
mordredclarkb: or kanban. definitely you should kanban something03:30
clarkbmordred: got it03:30
mordredphew03:30
* mordred wins03:30
clarkbfungi: mordred wants us to put up a board with post its. do you have room in your lab?03:31
* mordred has kanbanned his employees employing scrum methodology03:31
clarkbfungi: then we can build a robot to move things around for us03:31
mordredclarkb: only if the robot speaks japanese03:31
*** AlexF has joined #openstack-infra03:32
fungipost-it robot, got it03:33
fungitomorrow maybe03:33
fungimordred: agile something something/03:33
StevenKAgile Robot-Driven Development ?03:34
fungi(...kill all humans...)03:35
fungiyes03:36
clarkbmore evidence that everyone from NC is a robot03:36
anteayawell at least fungi is online a lot03:36
anteayaand mostly gives kind answers03:37
anteayawho am I to judge his robot internals03:37
StevenKHaha03:37
*** changbl has joined #openstack-infra03:37
*** ryanpetrello has joined #openstack-infra03:37
fungiin the south we say "rowbut"03:37
clarkbfungi: like zoidberg03:37
StevenKZoidberg is more 'robbit'03:37
anteayaoh I like rowbut03:38
funginewqular rowbuts03:38
anteayarobbit rabbit hobbit03:38
anteayaha ha ha03:38
*** mestery has joined #openstack-infra03:39
*** ArxCruz has quit IRC03:41
*** mestery_ has quit IRC03:42
*** AlexF has quit IRC03:45
*** jhesketh__ has quit IRC03:52
*** mriedem has quit IRC03:52
*** jhesketh__ has joined #openstack-infra03:53
*** pabelanger_ has joined #openstack-infra03:56
*** weshay has joined #openstack-infra03:59
*** sarob has quit IRC04:01
*** sarob has joined #openstack-infra04:01
*** krtaylor has joined #openstack-infra04:02
*** pabelanger_ has quit IRC04:02
*** sarob has quit IRC04:06
*** sarob has joined #openstack-infra04:06
*** sarob has quit IRC04:11
*** AaronGr_afk is now known as AaronGr04:13
*** AaronGr is now known as AaronGr_Zzz04:13
*** SushilKM has quit IRC04:15
*** SushilKM has joined #openstack-infra04:17
*** jcooley_ has joined #openstack-infra04:17
*** SushilKM has quit IRC04:20
*** SushilKM has joined #openstack-infra04:21
*** sharwell has quit IRC04:22
*** pabelanger_ has joined #openstack-infra04:22
*** CaptTofu_ has quit IRC04:25
*** SushilKM has quit IRC04:25
*** CaptTofu has joined #openstack-infra04:25
*** pabelanger__ has joined #openstack-infra04:27
*** pabelanger__ has quit IRC04:27
*** guohliu has quit IRC04:29
*** CaptTofu has quit IRC04:30
*** esker has joined #openstack-infra04:34
*** dkliban has quit IRC04:40
*** guohliu has joined #openstack-infra04:42
*** dkliban has joined #openstack-infra04:43
*** pabelanger_ has quit IRC04:44
*** boris-42 has joined #openstack-infra04:49
*** dkliban has quit IRC05:00
*** sarob has joined #openstack-infra05:05
openstackgerritMatthew Treinish proposed a change to openstack-infra/devstack-gate: Up the default concurrency on tempest runs  https://review.openstack.org/5860505:06
*** jcooley_ has quit IRC05:06
*** sarob has quit IRC05:09
*** sickboy3i has joined #openstack-infra05:10
*** guohliu has quit IRC05:11
*** sickboy3i has quit IRC05:11
*** ryanpetrello has quit IRC05:13
*** dstanek has quit IRC05:13
*** jcooley_ has joined #openstack-infra05:15
*** ryanpetrello has joined #openstack-infra05:19
*** ljjjusti1 has joined #openstack-infra05:20
*** weshay has quit IRC05:21
*** ljjjustin has quit IRC05:21
*** guohliu has joined #openstack-infra05:22
*** SergeyLukjanov has joined #openstack-infra05:27
*** dstanek has joined #openstack-infra05:29
*** vkozhukalov has joined #openstack-infra05:30
*** nicedice has quit IRC05:35
*** reed has quit IRC05:36
*** basha has joined #openstack-infra05:38
bashaHi, anyone around?05:38
clarkbbasha: sort of, whats up?05:38
bashafacing a small issue with a patch05:38
bashaclarkb: https://review.openstack.org/#/c/60188/105:38
bashaThe jenkins seems to pass.05:39
bashabut when I look at the logs for python26/2705:39
bashait seems a lil weird05:39
bashaclarkb: ^^05:39
*** Abhishek has joined #openstack-infra05:39
*** talluri has joined #openstack-infra05:40
clarkbbasha: I see hte logs and exceptions05:40
clarkbbut nose is reporting that the tests pass05:40
bashaisnt it lil weird clarkb? Does this happen often?05:41
bashabtw whats hte logs?05:41
basha:D05:41
clarkbbasha: I have no idea, those would be questions for glance05:41
zaroclarkb: trying to use macbook, i'm sucking :(05:41
clarkbzaro: I'm sorry, I can't help you with the aluminum blocks05:42
bashaclarkb: have u seen this happen before?05:42
clarkbbasha: infra runs the tests, we aren't typically very good at answering questions about test weirdness05:42
bashaclarkb: zaro : macs rock!! :P05:42
*** sdake_ has quit IRC05:42
clarkbthe tests themselves fall under the responsibility of the project and the project itself would be most familiar05:43
*** dstanek has quit IRC05:43
bashaclarkb: OK. I was just a lil puzzled that jenkins went green, but the logs seemed to be weird05:43
zarobasha:i'm newbie. shortcut keys don't work same on weechat.05:43
clarkbI would expect the exception at http://logs.openstack.org/88/60188/6/check/gate-glance-python27/bf13e3b/console.html#_2013-12-12_14_46_49_807 to cause the test to fail but nose doesn't agree with me05:43
clarkbzaro: is this a loaner?05:43
zaroclarkb: hopefully, but might be perm05:44
*** Abhishek has quit IRC05:44
zaroclarkb: tara says she's gonna try to get same hp again but she says it's unlikely05:44
bashazaro: http://support.apple.com/kb/ht134305:44
clarkbzaro: :(05:44
bashahope that helps :P05:44
clarkbbasha: jenkins is just looking at the exit code of nose05:45
bashaclarkb: I've seen that fail before.05:45
zaro http://support.apple.com/kb/ht134305:45
zaro05:37:47          clarkb | zaro: :(05:45
clarkbbasha: if nose reports success jenkins reports success, and nose is clearly reporting success05:45
zarogah!05:45
bashaclarkb: I guess thats an ignored test perhaps05:45
clarkbbasha: could also be a nose bug, nose is not the greatest test runner around05:46
clarkbor a test bug05:46
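(The mechanism clarkb describes is just the exit-code contract between the build step and jenkins; a minimal sketch, with the test path purely illustrative: a traceback printed to the log does not fail the build as long as the runner exits 0.)

```python
import subprocess
import sys

# run the unit tests; the path is illustrative
rc = subprocess.call(['nosetests', 'glance/tests/unit'])

# jenkins only looks at this exit status: non-zero turns the build red,
# zero reports success no matter what was printed above.
sys.exit(rc)
```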
bashaclarkb: I see migrations running, which I haven't seen happening in unit tests05:46
clarkbbasha: I think the DB migration tests depend on having a mysql and or postgres server laying around configured properly05:46
zaroclarkb: my only other option was to get one of those bricks.05:47
*** dstanek has joined #openstack-infra05:47
bashaclarkb: hmmmmm…. I'll look into this in a bit more detail and let u know :)05:47
clarkbzaro: I would've gotten a brick :)05:47
bashaclarkb: thanks a lot05:47
zarobasha: that did not help. trying to figure out why alt-j doesn't work in weechat.05:48
zaroclarkb: are you kidding me?  that thing is like 10 lbs.05:48
bashazaro: I dont use weechat :D05:48
basha:P05:48
clarkbzaro: I wouldn't carry it anywhere05:50
clarkbbut at least I would have a useable machine at my desk05:50
clarkbbasha: http://logstash.openstack.org/#eyJzZWFyY2giOiIgYnVpbGRfbmFtZTpnYXRlLWdsYW5jZS1weXRob24yKiBBTkQgbWVzc2FnZTpcIk9wZXJhdGlvbmFsRXJyb3JcIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiODY0MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIiwic3RhbXAiOjEzODY5MTM4MTE2NTl9 looks like that exception05:50
clarkbhappens quite a bit05:50
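(The long logstash.openstack.org links in this exchange are just the search parameters serialized as JSON and base64-encoded into the URL fragment; a small sketch of building one the same way — the set of fields shown is taken from the links above, and whether the frontend accepts every re-encoded variant is not verified here.)

```python
import base64
import json

params = {
    "search": 'build_name:gate-glance-python2* AND message:"OperationalError"'
              ' AND filename:"console.html"',
    "fields": [],
    "offset": 0,
    "timeframe": "86400",
    "graphmode": "count",
}

fragment = base64.b64encode(json.dumps(params).encode()).decode()
print('http://logstash.openstack.org/#' + fragment)
```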
*** yamahata_ has joined #openstack-infra05:51
bashaclarkb: yeah. I've seen it break couple of times but the nose still passes.05:52
zaroclarkb: my broken laptop can work as a desktop.05:53
*** jcooley_ has quit IRC05:54
*** jcooley_ has joined #openstack-infra05:54
clarkbzaro: oh it's just the display that is bad? I bet you could replace it05:57
zaroclarkb: you got any experience with that?05:58
clarkbsort of, you need to find a replacement display that is compatible, then when you do the teardown document everything, otherwise it doesn't go back together05:59
*** jcooley_ has quit IRC05:59
zarodisplay is probably 80% of cost anyway.06:00
clarkbnot really those laptops have crappy cheap displays06:00
clarkbthe cpu and related peripherals are typically the costly bits06:00
zaroyeah, they do.06:01
zarothese things look pretty sealed.  probably need special tools or something.06:01
*** jcooley_ has joined #openstack-infra06:02
zarocan't even scrollback on this mac06:03
*** dstanek has quit IRC06:03
*** jcooley_ has quit IRC06:04
*** jcooley_ has joined #openstack-infra06:04
clarkbhttps://www.laptopscreen.com/English/model/HP-Compaq/ELITEBOOK~FOLIO~9470M/ is the part06:05
*** sarob has joined #openstack-infra06:06
*** Abhishek_ has joined #openstack-infra06:08
zarohow come it looks so easy on that image?  i don't even see the screws on the display.06:08
clarkbthey always make it look easy :)06:10
clarkb"flex the inside edges of the bottom edge (1), the left and right sides (2), and the top edge (3) of the display bezel until the display bezel disengages from the display enclosure"06:12
zarohey maybe it's the type of shell.  mac default shell is xterm.  what is it on ubuntu?06:12
clarkbzaro: you were using konsole which probably presents itself as an xterm06:12
clarkbswapping the display doesn't look too bad if you can pop the bezel off06:13
zarodang it! page up on mac scrolls the screen, not the backscroll06:13
zaropretty good pxd/hous game on tnt06:14
zaroyou're right about aldridge, he da man.06:16
clarkband batum and lillard and matthews06:18
openstackgerritlifeless proposed a change to openstack-infra/reviewstats: Pin Sphinx.  https://review.openstack.org/6192106:19
openstackgerritlifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now.  https://review.openstack.org/6190006:19
*** jcooley_ has quit IRC06:19
zarook. i'm done mucking with this for tonight.  good night.06:20
clarkbnight06:20
*** denis_makogon has joined #openstack-infra06:20
*** vkozhukalov has quit IRC06:21
*** slong_ has quit IRC06:24
*** ryanpetrello has quit IRC06:28
*** SushilKM has joined #openstack-infra06:39
*** vogxn has joined #openstack-infra06:41
*** basha has quit IRC06:42
*** bingbu has quit IRC06:50
*** sarob has quit IRC06:54
*** basha has joined #openstack-infra06:54
*** sarob has joined #openstack-infra06:55
openstackgerritA change was merged to openstack/requirements: Add oslo.rootwrap to global requirements  https://review.openstack.org/6173806:59
*** sarob has quit IRC06:59
*** NikitaKonovalov has joined #openstack-infra07:03
*** basha has quit IRC07:05
*** SergeyLukjanov is now known as _SergeyLukjanov07:05
*** bingbu has joined #openstack-infra07:06
*** _SergeyLukjanov has quit IRC07:06
openstackgerritlifeless proposed a change to openstack-infra/reviewstats: Pin Sphinx.  https://review.openstack.org/6192107:14
openstackgerritlifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now.  https://review.openstack.org/6190007:14
*** SergeyLukjanov has joined #openstack-infra07:19
*** sarob has joined #openstack-infra07:25
*** yolanda has joined #openstack-infra07:28
*** dstanek has joined #openstack-infra07:30
*** basha has joined #openstack-infra07:31
*** basha has quit IRC07:31
*** rcarrillocruz has joined #openstack-infra07:33
*** dstanek has quit IRC07:35
*** senk has joined #openstack-infra07:40
*** bingbu has quit IRC07:41
*** SergeyLukjanov is now known as _SergeyLukjanov07:44
*** _SergeyLukjanov has quit IRC07:45
*** sergmelikyan has joined #openstack-infra07:46
sergmelikyan>>/msg chanserv access #murano add openstackinfra +AFRfiorstv07:46
*** Abhishek_ has quit IRC07:46
sergmelikyanWhy does the bot require such privileges?07:46
sergmelikyanAnd are they required to merge https://review.openstack.org/61703?07:48
*** andreaf has joined #openstack-infra07:51
*** vkozhukalov has joined #openstack-infra07:52
*** oubiwan__ has quit IRC07:53
*** dizquierdo has joined #openstack-infra07:54
*** jcoufal has joined #openstack-infra07:55
*** sarob has quit IRC07:57
*** vkozhukalov has quit IRC08:02
openstackgerritA change was merged to openstack-infra/devstack-gate: Adding an option to use qpid instead of rabbit or zeromq  https://review.openstack.org/5582908:04
*** flaper87|afk is now known as flaper8708:06
*** vogxn1 has joined #openstack-infra08:06
*** vogxn has quit IRC08:08
*** praneshp has quit IRC08:11
*** vogxn1 has quit IRC08:11
*** SergeyLukjanov has joined #openstack-infra08:12
*** bingbu has joined #openstack-infra08:13
*** vkozhukalov has joined #openstack-infra08:14
*** praneshp has joined #openstack-infra08:16
*** basha has joined #openstack-infra08:18
*** nprivalova has joined #openstack-infra08:23
*** denis_makogon has quit IRC08:25
*** rcarrillocruz1 has joined #openstack-infra08:26
*** sarob has joined #openstack-infra08:26
*** rcarrillocruz has quit IRC08:28
*** praneshp has quit IRC08:29
*** rongze has joined #openstack-infra08:29
*** senk has quit IRC08:30
*** xchu has quit IRC08:31
*** sarob has quit IRC08:34
*** iv_m has joined #openstack-infra08:38
*** bingbu has quit IRC08:38
*** salv-orlando has joined #openstack-infra08:39
*** jpich has joined #openstack-infra08:41
*** sHellUx has joined #openstack-infra08:45
SergeyLukjanovfungi, mordred, clarkb, jeblair, hey guys08:45
SergeyLukjanovQueue lengths: 245 events, 382 results08:45
SergeyLukjanov^^ in zuul, looks not very good08:46
SergeyLukjanovmany of the jobs are failing with https://jenkins02.openstack.org/job/gate-cinder-docs/3172/console08:48
*** afazekas has joined #openstack-infra08:48
*** yongli has quit IRC08:48
*** dizquierdo has quit IRC08:49
*** bingbu has joined #openstack-infra08:51
*** nosnos has joined #openstack-infra08:55
*** apevec has joined #openstack-infra08:58
*** apevec has joined #openstack-infra08:58
*** yassine has joined #openstack-infra08:58
*** yassine has quit IRC09:00
*** yassine has joined #openstack-infra09:00
*** yassine has quit IRC09:02
apevecjava.io.IOException: Remote call on precise14 failed - is that a broken Jenkins slave?09:03
apevechttp://logs.openstack.org/32/61532/1/gate/gate-heat-python27/5d7c9dc/console.html09:03
apevecthat failed reverification of 61532 which blocks Heat CVE fixes on stable/havana :(09:04
*** yamahata_ has quit IRC09:04
*** rongze has quit IRC09:05
*** yassine has joined #openstack-infra09:06
*** yassine has quit IRC09:06
openstackgerritRuslan Kamaldinov proposed a change to openstack-infra/config: Add jenkins03, jenkins04 to cacti  https://review.openstack.org/6193809:07
*** yassine has joined #openstack-infra09:07
openstackgerritAbhishek Chanda proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1249889  https://review.openstack.org/6193909:10
uvirtbotLaunchpad bug 1249889 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern[compute,image,volume] failed" [Undecided,Invalid] https://launchpad.net/bugs/124988909:10
*** kruskakli has left #openstack-infra09:12
*** Abhishek_ has joined #openstack-infra09:14
*** derekh has joined #openstack-infra09:16
*** bingbu has quit IRC09:17
*** SergeyLukjanov has quit IRC09:22
*** sarob has joined #openstack-infra09:26
*** jooools has joined #openstack-infra09:34
*** rongze has joined #openstack-infra09:34
*** sHellUx has quit IRC09:45
*** hashar has joined #openstack-infra09:48
*** zhiyan has joined #openstack-infra09:48
*** rossella_s has joined #openstack-infra09:49
*** nosnos has quit IRC09:51
*** johnthetubaguy has joined #openstack-infra09:56
*** sarob has quit IRC09:58
*** saschpe_ has joined #openstack-infra10:01
*** saschpe has quit IRC10:02
*** ArxCruz has joined #openstack-infra10:06
andreafhi - I'm working on a tempest change which has the following implication: listing servers requires tempest.conf to be available.  gate-tempest-py27 is failing because tempest.conf is missing. Is it possible that the config file has not been generated yet when this check runs? I thought devstack would create tempest.conf at setup. What am I missing?10:07
*** masayukig has quit IRC10:08
*** basha has quit IRC10:08
*** apevec has quit IRC10:09
*** SergeyLukjanov has joined #openstack-infra10:10
*** jhesketh__ has quit IRC10:12
openstackgerritAlexandre Levine proposed a change to openstack-infra/config: Adding empty gce-api project to stackforge  https://review.openstack.org/6195410:13
*** dizquierdo has joined #openstack-infra10:16
*** nprivalova has quit IRC10:16
*** apevec has joined #openstack-infra10:21
*** apevec has joined #openstack-infra10:22
openstackgerritAlexandre Levine proposed a change to openstack-infra/config: Adding empty gce-api project to stackforge  https://review.openstack.org/6195410:22
*** sarob has joined #openstack-infra10:26
*** guohliu has quit IRC10:29
apevecttx, thanks for filing bug 1260654, that slave seems really broken: https://jenkins02.openstack.org/computer/precise14/builds10:31
uvirtbotLaunchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,New] https://launchpad.net/bugs/126065410:31
*** sarob has quit IRC10:31
apeveconly gate-noop works (what does it do?)10:31
*** ArxCruz has quit IRC10:31
apeveccan that machine be removed from the pool?10:31
*** dstanek has joined #openstack-infra10:33
ttxapevec: it can, but not by me10:33
*** flaper87 is now known as flaper87|afk10:33
ttxWe don't have a good answer yet for borked slaves in european mornings10:33
apevecok, then it will be Russian roulette in the gate10:33
ttxsince the people with power to kill them are not up10:33
ttxmordred, fungi ^10:34
apeveclicense to kill10:34
*** ljjjusti1 has quit IRC10:35
*** dstanek has quit IRC10:37
*** nprivalova has joined #openstack-infra10:40
BobBallwe need a batphone10:42
*** chandankumar has quit IRC10:43
chmouelttx: we are trying to find someone here at eNovance who can help infra during european times10:45
*** senk has joined #openstack-infra10:45
*** chandankumar has joined #openstack-infra10:46
*** senk has quit IRC10:48
*** senk has joined #openstack-infra10:49
openstackgerritVadim Rovachev proposed a change to openstack-infra/devstack-gate: Added ceilometer-anotification to enabled services  https://review.openstack.org/6195810:53
*** senk has quit IRC10:53
*** paul-- has quit IRC10:56
*** sergmelikyan has quit IRC11:00
*** paul-- has joined #openstack-infra11:04
*** marun has joined #openstack-infra11:05
apevecmore bad slaves, now precise20 https://jenkins02.openstack.org/job/gate-nova-python27/13176/console11:05
apevecCaused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.tools.ant.Location11:06
apeveclooks like it lost some java packages??11:06
*** rongze has quit IRC11:06
*** lcestari has joined #openstack-infra11:09
*** markmc has joined #openstack-infra11:19
ttxew11:21
openstackgerritDarragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Use yaml local tags to support including files  https://review.openstack.org/4878311:26
*** sarob has joined #openstack-infra11:26
*** marun has quit IRC11:29
*** rongze has joined #openstack-infra11:38
*** katyafervent has quit IRC11:45
*** afazekas has quit IRC11:47
*** rongze has quit IRC11:49
*** zhiyan has quit IRC11:49
*** sandy__ has quit IRC11:53
*** sandy__ has joined #openstack-infra11:54
*** sandy__ has quit IRC11:57
*** sarob has quit IRC11:58
*** jcoufal has quit IRC12:01
*** jcoufal has joined #openstack-infra12:02
*** nprivalova has quit IRC12:03
*** nprivalova has joined #openstack-infra12:04
sdagueso was there a bug against openstack ci on the jenkins crash?12:05
yassineHello all,12:05
yassinei got some issues with my patch https://review.openstack.org/#/c/60499  it looks like zookeeper package was not12:05
yassinesuccessfully installed in some slaves despite this patch which adds zookeeper to the Puppet manifest https://review.openstack.org/#/c/60509  for jenkins slaves.  Is it a known issue? How could it be fixed? :/12:05
apevecsdague, ttx filed bug 1260654 for one instance of NoClassDefFoundError12:07
uvirtbotLaunchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,New] https://launchpad.net/bugs/126065412:07
sdagueapevec: I actually meant the reusing of the slaves12:07
sdaguewhich caused all the jobs to fail12:07
sdaguebecause right now we are sticking it on another bug12:08
sdaguewhich wasn't really the story12:08
apevecoh, I don't know about "reusing of the slaves", what was that?12:09
*** fifieldt has quit IRC12:09
sdaguelast night, it's why everything failed for a while12:09
apevecis missing classes on slave related or not?12:10
sdaguenot sure12:11
*** ruhe has joined #openstack-infra12:13
*** rongze has joined #openstack-infra12:15
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: add query for jenkins crash  https://review.openstack.org/6197412:16
*** ianw has quit IRC12:18
openstackgerritA change was merged to openstack-infra/elastic-recheck: add query for jenkins crash  https://review.openstack.org/6197412:19
*** SergeyLukjanov is now known as _SergeyLukjanov12:24
*** dstanek has joined #openstack-infra12:24
*** _SergeyLukjanov has quit IRC12:25
*** mfer has joined #openstack-infra12:25
*** sarob has joined #openstack-infra12:26
*** SergeyLukjanov has joined #openstack-infra12:33
*** Abhishek_ has quit IRC12:37
*** thomasem has joined #openstack-infra12:48
yassinecould someone please answer my question :$12:49
*** jcoufal has quit IRC12:49
openstackgerritCyril Roelandt proposed a change to openstack/requirements: HTTPretty: update to 0.7.1  https://review.openstack.org/6198112:49
*** jcoufal has joined #openstack-infra12:50
*** HenryG has quit IRC12:51
*** dolphm has joined #openstack-infra12:51
*** sandywalsh has joined #openstack-infra12:53
*** dizquierdo has quit IRC12:56
*** HenryG has joined #openstack-infra12:58
*** sarob has quit IRC12:59
*** yaguang has quit IRC13:02
*** dkliban has joined #openstack-infra13:02
*** marun has joined #openstack-infra13:03
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Stories and Tasks search  https://review.openstack.org/6051513:05
*** dstanek has quit IRC13:06
*** dstanek has joined #openstack-infra13:10
*** oubiwan__ has joined #openstack-infra13:15
*** sandywalsh has quit IRC13:15
portantejog0, clarkb, sdague, under what category do I file this bug:13:17
portante           http://logs.openstack.org/87/61587/1/gate/gate-swift-pep8/3291350/console.html13:17
portante13:17
sdagueyeh, we definitely need someone in .eu to get skilled up on infra. Maybe we tell mordred he has to move to barcelona :)13:21
ruhealready answered in qa channel. for everyone else jenkins error is filed in https://bugs.launchpad.net/bugs/126065413:21
uvirtbotLaunchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,Confirmed]13:21
sdagueruhe: thanks13:21
*** oubiwan__ has quit IRC13:22
portantesdague: can we get an elastic recheck for these kinds of infra bugs? they are likely to happen again in the future at some point13:22
*** dizquierdo has joined #openstack-infra13:23
ruhesdague, we (mirantis) do plan to dedicate a couple of engineers to work on infra full-time, but it sure will take a lot of time to get accustomed to infra13:23
sdagueportante: there is one, but right now e-r only looks for details in tempest/devstack jobs13:24
dimsportante, its easy to submit a review against elastic-recheck, just need to add a yaml file - https://github.com/openstack-infra/elastic-recheck/tree/master/queries :)13:24
sdagueit's a future enhancement to have it look at all the jobs13:24
*** dcramer_ has joined #openstack-infra13:25
sdaguesee scroll back 16 lines where I added it to e-r13:25
*** derekh has quit IRC13:26
*** sarob has joined #openstack-infra13:26
*** esker has quit IRC13:27
dimssdague, ah cool. i just started looking at the gate queue and was expecting 50+ and found just a few and was wondering when i saw this :)13:27
sdaguedims: yeh, so when jog0 started the assumption was the only things that actually failed in the gate were races caused by a real cloud13:27
sdaguei.e. there is no reason for docs and unit tests to fail in the gate, they should have passed in check13:28
sdaguebut external events can make them fail (as well as bad reviewers)13:28
sdagueso they need to be added13:28
dimssdague, right13:28
sdagueand that's part of the code which needs some more brutal refactoring to get there13:29
portanteif we can identify these events, can we avoid making users do a recheck and just have infra retry the job?13:29
sdagueportante: so the issue is we are skipping processing them on the elastic recheck side13:29
sdaguebecause processing a job type requires actually knowing all the files that might need to have gotten to elastic search, as there is a delay13:30
portanteyes, thanks, understood13:30
sdagueportante: and, in general, we don't want to do auto recheck, because experience has shown that no one actually looks at the issues13:31
*** dcramer_ has quit IRC13:31
sdaguethe point of e-r is to help us classify the "worst" races we are seeing and grouping them, so people can prioritize these13:32
sdagueand get them fixed13:32
portantedeveloper frustration with rechecks is still growing though, and we need to address that too.13:33
sdagueportante: sure, and the way to fix that is to fix the underlying issues13:33
sdaguebecause if we just autorechecked, all it would mean is the gate merge time would grow to over a day as everything crashes through, blows up, is automatically readded.13:34
portantesdague: certainly. though I am thinking that if a job fails because of an infra issue, and it can be moved to another instance and retried, that seems like a worthwhile investment13:34
portantesdague, I am not suggesting rechecking the entire job13:35
portantejust have the ci re-run the docs job on another instance when it detects that there is an infrastructure issue13:35
sdagueportante: sure, there could be infra recovery for exactly this kind of issue. I'd like the ci team to address that13:35
portantewhere do they live? #openstack-ci13:35
sdaguehere13:35
sdagueci/infra13:36
sdaguebut the core team is basically west coast US, plus fungi on the east US, so they aren't awake yet13:36
*** oubiwan__ has joined #openstack-infra13:37
*** paul-- has quit IRC13:38
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Stories and Tasks search  https://review.openstack.org/6051513:40
portantesdague: thanks13:40
*** jcoufal has quit IRC13:41
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Added basic popup messages  https://review.openstack.org/5970613:41
*** dhellmann_ is now known as dhellmann13:42
*** dolphm has quit IRC13:43
*** weshay has joined #openstack-infra13:48
*** dkliban has quit IRC13:50
*** dolphm has joined #openstack-infra13:51
*** dolphm_ has joined #openstack-infra13:52
*** dprince has joined #openstack-infra13:55
*** dolphm has quit IRC13:56
*** bpokorny has joined #openstack-infra13:57
*** sarob has quit IRC13:58
*** rongze has quit IRC13:59
*** oubiwan__ has quit IRC14:02
openstackgerritCyril Roelandt proposed a change to openstack/requirements: HTTPretty: update to 0.7.1  https://review.openstack.org/6198114:07
*** jpich has quit IRC14:07
*** mriedem has joined #openstack-infra14:08
sdaguettx: I have a new hack idea, if you want to try it with your email thing14:09
sdagueany time a bug gets too big to modify via the web, add launchpad as an affected project14:09
sdaguewith a comment that launchpad is getting added because we can no longer modify this bug in launchpad14:10
ttxmy email thing is not magic, just applying https://help.launchpad.net/Bugs/EmailInterface14:10
sdagueI'm actually super annoyed that I've got 2 bugs in the tempest queue that are dead wood14:10
ttx(just need your PGP publickey registered with LP)14:10
sdague#1179008 rename requires files to standard names14:10
ttxsdague: maybe the other one is not as blocked14:10
sdague#1214176 Fix copyright headers to be compliant with Foundation policies14:11
ttxlet me try that second one14:11
sdaguecould you get the LP team to just delete those bugs entirely14:11
*** ruhe is now known as ruhe_14:12
ttxbah, submit request failure14:12
ttxsdague: they usually reply to launchpad questions. Let me try that14:13
sdagueI think we should just delete any bug that's gotten out of control, because it just causes problems with projects that show up late and try to fix it14:14
*** oubiwan__ has joined #openstack-infra14:15
*** vkozhukalov has quit IRC14:15
fungiwhat's the urgent machine to remove?14:16
sdaguefungi: one sec14:16
*** jcoufal has joined #openstack-infra14:17
*** blamar has quit IRC14:17
*** ruhe_ has quit IRC14:18
sdaguefungi: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiamF2YS5pby5JT0V4Y2VwdGlvblwiICAgQU5EIG1lc3NhZ2U6XCJSZW1vdGUgY2FsbCBvblwiICAgQU5EIG1lc3NhZ2U6XCJmYWlsZWRcIiAgIEFORCBmaWxlbmFtZTpcImNvbnNvbGUuaHRtbFwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiJhbGwiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzg2OTQ0Mjg0Nzg3fQ==14:18
sdagueprecise 14 and precise 20 it seems14:18
ttxsdague: let's see how that goes: https://answers.launchpad.net/launchpad/+question/24074814:19
fungisdague: i'll dampen them14:19
*** SushilKM has quit IRC14:20
fungiit's possible something went weird with the slave agent connection to them when we rebooted jenkins0214:20
*** eharney has joined #openstack-infra14:22
ttxfungi: let us know, we'll restart the stable/* jobs afterwards14:24
fungithey're already offline as of a minute or so14:25
ttxfungi: ok, retrying then.14:25
fungihttps://jenkins02.openstack.org/computer/precise14/ and https://jenkins02.openstack.org/computer/precise20/14:25
*** russellb is now known as rustlebee14:25
fungii'll work some magic to get them back into service14:25
ttxfungi: what is the appropriate keyword to reverify in that case ?14:26
*** sarob has joined #openstack-infra14:26
ttxI can abuse bug 126065414:26
uvirtbotLaunchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,Confirmed] https://launchpad.net/bugs/126065414:26
*** flaper87|afk is now known as flaper8714:26
fungittx: you can just reapprove them instead of using reverify if you're core for that (which you are), or if we have a bug open on this already then you could reverify against that bug14:26
fungithat works14:27
apevecttx, why would that be abuse?14:27
ttxapevec: it may or may not match exactly that error :)14:27
fungifor the record, slave agent failures look likely... https://jenkins02.openstack.org/job/gate-nova-python27/13301/consoleText https://jenkins02.openstack.org/job/gate-neutron-docs/3708/consoleText14:27
*** dansmith is now known as damnsmith14:27
*** oubiwan__ has quit IRC14:28
apevecyeah, some had multiple failures, I've sent 2013.2.1 update email to you specifying what failed where14:28
fungiprecise14 seemed to be dying straight away, but precise20 was getting through the job and then bailing on artifact collection14:28
ttxapevec: horizon/heat requirements sync reverified14:28
apevecthanks14:28
*** yamahata_ has joined #openstack-infra14:29
*** prad has joined #openstack-infra14:32
*** ilyashakhat has joined #openstack-infra14:35
*** ilyashakhat has quit IRC14:36
*** ruhe has joined #openstack-infra14:36
*** bknudson has joined #openstack-infra14:36
*** sarob has quit IRC14:37
*** andreaf has quit IRC14:38
*** saper_ is now known as saper14:38
fungiprecise14 and 20 rebooted and back in service, watching to make sure jobs complete on them now14:39
fungithis ran to completion on precise14... https://jenkins02.openstack.org/job/gate-puppet-neutron-puppet-syntax/83/console14:39
fungiand this on precise20... https://jenkins02.openstack.org/job/gate-puppet-neutron-puppet-unit-2.7/121/console14:40
fungishould be sane now14:41
*** Abhishe__ has joined #openstack-infra14:41
apevecfungi, so what was it?14:41
*** smarcet has joined #openstack-infra14:41
apevecdolphm_, please approve https://review.openstack.org/6142514:41
fungijava exceptions when the master was trying to communicate with the slave agent. there's every chance they lost their sanity during the reboot of jenkins02 last night14:42
*** dstanek has quit IRC14:42
fungiwell, s/reboot/restart/14:42
openstackgerritDavid Kranz proposed a change to openstack-infra/devstack-gate: Always dump errors to console  https://review.openstack.org/6185014:43
*** xchu has joined #openstack-infra14:43
fungijenkins01 shot itself in the head last night over a jvm oom condition so we had to scramble to get everything back up and running after that, and noticed jenkins02 was using at least as much memory as 01 had been so we did a controlled restart of jenkins on it as well14:43
*** xchu has quit IRC14:43
*** dstanek has joined #openstack-infra14:43
sdaguefungi: yeh, seems like it would be nice to have something that auto downs these nodes on a jenkins stack trace capture14:44
fungibut in the process, i'm betting something happened with slave agent communication to precise14 and 20 as it booted back up14:44
*** rongze has joined #openstack-infra14:44
sdaguethis seems to happen every 3 weeks or so, and basically kills a whole dev cycle for .eu14:44
fungisdague: i wonder whether there's a jenkins plugin for that14:44
*** xchu has joined #openstack-infra14:44
*** rnirmal has joined #openstack-infra14:44
*** HenryG_ has joined #openstack-infra14:45
ruhefungi, sdague: i guess a monitoring system might be enough to prevent such events14:45
fungisdague: but regardless, we're already in progress shifting jobs to single-use slaves, which is our preferred near-term solution to this (as opposed to the longer-term "get rid of jenkins entirely" solution)14:45
*** jd__ has quit IRC14:46
*** iv_m has quit IRC14:46
*** jd__ has joined #openstack-infra14:46
*** iv_m has joined #openstack-infra14:46
*** hughsaunders has quit IRC14:46
fungiruhe: interestingly, probably not. there was nothing outwardly unusual about the condition of those slaves. we'd need to interrogate the jenkins master and have it perform some sort of communication and artifact collection tests as a canary14:46
*** hughsaunders has joined #openstack-infra14:46
*** xchu has quit IRC14:46
funginontrivial14:46
fungiprobably special jobs which would need to be run between normal jobs to detect a condition like that14:47
*** xchu has joined #openstack-infra14:47
*** xchu has quit IRC14:47
*** blamar has joined #openstack-infra14:48
fungisdague: at the moment, there are already a handful of infra jobs we've shifted from long-running slaves to bare (non-devstack) single-use slaves, with great success. it's just a matter of slowly shifting the remainder14:48
*** xchu has joined #openstack-infra14:48
* fungi will brb14:48
*** HenryG has quit IRC14:49
*** dkliban has joined #openstack-infra14:50
*** andreaf has joined #openstack-infra14:58
*** jcooley_ has joined #openstack-infra15:00
*** markmcclain has joined #openstack-infra15:02
*** esker has joined #openstack-infra15:05
dolphm_apevec: done15:06
apevecthanks!15:06
*** rcleere has joined #openstack-infra15:09
*** pabelanger has joined #openstack-infra15:09
*** rongze has quit IRC15:11
*** jasond has joined #openstack-infra15:12
*** dcramer_ has joined #openstack-infra15:14
*** basha has joined #openstack-infra15:16
*** ryanpetrello has joined #openstack-infra15:18
*** markmcclain has quit IRC15:19
*** apevec has quit IRC15:21
*** alcabrera has joined #openstack-infra15:23
*** datsun180b has joined #openstack-infra15:26
*** sarob has joined #openstack-infra15:26
*** oubiwan__ has joined #openstack-infra15:29
*** markmcclain has joined #openstack-infra15:30
*** dolphm_ has quit IRC15:30
*** jcoufal has quit IRC15:30
*** zehicle_at_dell has joined #openstack-infra15:31
*** rwsu has joined #openstack-infra15:31
*** rcarrillocruz has joined #openstack-infra15:33
mriedemjust opened this against infra, not sure if it's a known issue yet or not:15:33
mriedemhttps://bugs.launchpad.net/openstack-ci/+bug/126076715:33
uvirtbotLaunchpad bug 1260767 in openstack-ci "gate-nova-docs fails on master with "Remote call on precise14 failed"" [Undecided,New]15:33
*** rcarrillocruz1 has quit IRC15:35
*** xchu has quit IRC15:36
portantemriedem: saw that earlier15:37
*** jcooley_ has quit IRC15:37
*** SushilKM has joined #openstack-infra15:37
portanteI think 126065415:38
*** jcooley_ has joined #openstack-infra15:38
portantehttps://bugs.launchpad.net/openstack-ci/+bug/126065415:38
*** dizquierdo has quit IRC15:38
uvirtbotLaunchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Critical,Fix released]15:38
portantesdague filed that, I think15:38
*** rongze has joined #openstack-infra15:39
sdague portante actually ttx15:42
sdaguebut you are right, it's a dup15:42
*** iv_m has quit IRC15:43
* ttx admits only having checked the rechecks page15:43
*** jcooley_ has quit IRC15:43
*** marun has quit IRC15:43
mriedemthanks guys15:43
*** bnemec is now known as beekneemech15:45
*** dims has quit IRC15:45
*** dims has joined #openstack-infra15:47
*** zehicle_at_dell has quit IRC15:51
fungiyassine: i've checked out unit-test slaves and it doesn't look like the centos6 slaves have a zookeeper-server installed. in fact, it doesn't appear that centos 6.4 provides an rpm for any package named zookeeper-server in its standard yum package repositories15:52
*** NikitaKonovalov has quit IRC15:53
fungiyassine: the corresponding "zookeeper" package is installed on our ubuntu precise slaves however (both our python 2.7 and python 3.3 slave variants)15:53
*** mriedem has quit IRC15:56
*** maurosr has quit IRC15:56
fungiyassine: http://paste.openstack.org/show/54962/15:57
*** mriedem has joined #openstack-infra15:57
*** sarob has quit IRC15:58
*** mfer has quit IRC15:58
*** maurosr has joined #openstack-infra15:59
*** rossella_s has quit IRC16:00
*** mdenny has joined #openstack-infra16:01
*** rnirmal_ has joined #openstack-infra16:02
*** rnirmal has quit IRC16:02
*** rnirmal_ is now known as rnirmal16:02
*** mfer has joined #openstack-infra16:03
sdaguefungi: so https://bugs.launchpad.net/tempest/+bug/126071016:03
openstackgerritA change was merged to openstack-infra/release-tools: Add mpcut.sh for milestone-proposed branch cutting  https://review.openstack.org/6138916:03
uvirtbotLaunchpad bug 1260710 in tempest "testr lists both tests and unit tests in gate-tempest-python27 job" [High,In progress]16:03
sdagueit marked it as in progress, but didn't post the review16:03
*** talluri has quit IRC16:04
sdaguewhich I find highly annoying16:04
sdagueand that seems to be the norm now16:04
*** talluri has joined #openstack-infra16:04
sdagueis that intended, or a bug?16:04
jasondis "reverify no bug" okay to use?  if not, how would i go about identifying the bug to reverify?16:05
fungisdague: what's the corresponding review for that bug? it doesn't seem to be the one mentioned in the bug description16:06
sdaguefungi: https://review.openstack.org/#/c/62019/16:06
fungisdague: https://review.openstack.org/#/c/62019/1..2//COMMIT_MSG16:07
fungithat's why16:07
sdagueoh, right mtreinish failed on commit message16:07
mtreinishsdague: did I have a period?16:08
sdaguemtreinish: Fixes bug doesn't link16:08
fungiupdate_bug.py is not smart enough to know whether or not it's posted the review link, so it errs on the side of not spamming people on every patchset with another bug comment and just does it if it's patchset #116:08
sdagueCloses-Bug: #...............16:08
fungithe diff there shows that he added the bug header on comment #2, which is one reason16:09
fungier, on patchset #216:09
mtreinishsdague: sigh... ok I'll respin it16:09
*** basha has quit IRC16:09
*** jcooley_ has joined #openstack-infra16:09
mtreinishfungi: will that be enough?16:09
openstackgerritBen Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering  https://review.openstack.org/5440316:09
jeblairi'm in favor of tightening the gerrit regex so it matches in the webui16:09
openstackgerritSahid Orentino Ferdjaoui proposed a change to openstack/requirements: Tox fails to build environment because of MySQL-Python version  https://review.openstack.org/6202716:10
sdagueyeh, it would be more obvious if the behavior was the same on both16:10
fungiusing a standardized bug header will also help (so that it will also close the bug) but the reason it set in-progress and didn't comment with the link in the bug is that you didn't have a bug header on the initial patchset16:10
sdaguefungi: sure16:11
fungijeblair: i think we tightened the regex in gerrit as much as we could without losing links on review comments like "recheck bug 12345"16:11
uvirtbotLaunchpad bug 12345 in isdnutils "isdn does not work, fritz avm (pnp?)" [Medium,Fix released] https://launchpad.net/bugs/1234516:11
sdaguebut if it gets fixed now will it post?16:11
*** jasond has left #openstack-infra16:11
*** Ryan_Lane has joined #openstack-infra16:12
openstackgerritSahid Orentino Ferdjaoui proposed a change to openstack/requirements: Tox fails to build environment because of MySQL-Python version  https://review.openstack.org/6202816:12
*** talluri has quit IRC16:13
fungisdague: i can't remember if update_bug.py will set it to fix committed/released on a string like "Fixes bug 1260710" though i'm pretty sure "Fixes-bug: 1260710" works (even though closes is the recommended term in the wiki)16:13
uvirtbotLaunchpad bug 1260710 in tempest "testr lists both tests and unit tests in gate-tempest-python27 job" [High,In progress] https://launchpad.net/bugs/126071016:13
fungithe goal being to drive contributors toward using standard git header formats for these so they can be more easily mined from commit logs in the future16:14
*** rongze has quit IRC16:14
fungioh, also it should have been in the final paragraph of the commit message to be a proper header16:14
fungithat extra blank line makes it not16:14
fungimtreinish: ^16:15
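For reference, the shape fungi is describing looks roughly like this (the subject and body here are illustrative; the bug number is the one under discussion). The Closes-Bug line has to sit in the final paragraph of the commit message, without a stray blank line separating it from that paragraph:

    Make testr list only tempest tests in the py27 job

    Longer explanation of the change goes in the body paragraphs.

    Closes-Bug: #1260710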
*** jasond has joined #openstack-infra16:15
jasonddoes anybody know why this review says "Need Verified"? https://review.openstack.org/#/c/59851/16:16
mtreinishfungi: seriously16:17
mtreinishdo I need to do another revision?16:17
fungimtreinish: nope--just pointing out if you're trying to correct the commit message, that's part of it16:17
fungijasond: taken care of16:18
*** AaronGr_Zzz is now known as AaronGr16:18
jasondfungi: thanks!16:18
*** ilyashakhat_ has quit IRC16:20
*** rongze has joined #openstack-infra16:21
*** jcooley_ has quit IRC16:21
*** jcooley_ has joined #openstack-infra16:21
*** zehicle_at_dell has joined #openstack-infra16:22
*** AaronGr is now known as AaronGr_afk16:22
*** jcooley_ has quit IRC16:25
*** sarob has joined #openstack-infra16:26
*** hashar has quit IRC16:27
*** sarob has quit IRC16:31
*** zehicle has joined #openstack-infra16:32
yassinefungi: thank you for the information !! Do you know how could i fix this issue ? :/16:33
*** zehicle_at_dell has quit IRC16:33
*** johnthetubaguy has quit IRC16:33
*** saschpe_ has quit IRC16:33
*** johnthetubaguy1 has joined #openstack-infra16:33
*** niska has quit IRC16:34
*** mrodden1 has quit IRC16:34
*** saschpe has joined #openstack-infra16:34
fungiyassine: i left a review comment on the change you linked, but in short unless you can get a zookeeper-server rpm into centos 6 main repositories or fedora epel such that we can yum install it on the test slaves, your other option for python 2.6 unit testing right now would be figuring out whether it can be installed and used locally in the jenkins user's home directory by your unit test job without16:35
fungineeding root permissions on the system16:35
fungii'm not familiar enough with what zookeeper is or how it works to know whether that's possible16:35
*** ^d has joined #openstack-infra16:35
*** dkliban has quit IRC16:37
*** hughsaunders has quit IRC16:37
*** prad has quit IRC16:37
*** yamahata_ has quit IRC16:37
*** changbl has quit IRC16:37
*** openstackgerrit has quit IRC16:37
*** Ghe_HPDiscover has quit IRC16:37
*** juice has quit IRC16:37
*** tian has quit IRC16:37
*** iccha has quit IRC16:37
*** Alex_Gaynor has quit IRC16:37
*** jasond has quit IRC16:37
*** hughsaunders_ has joined #openstack-infra16:37
*** changbl_ has joined #openstack-infra16:37
*** hughsaunders_ is now known as hughsaunders16:38
*** nicedice has joined #openstack-infra16:38
*** tian has joined #openstack-infra16:38
*** jasond has joined #openstack-infra16:38
*** dkliban has joined #openstack-infra16:38
*** prad has joined #openstack-infra16:38
*** yamahata_ has joined #openstack-infra16:38
*** openstackgerrit has joined #openstack-infra16:38
*** Ghe_HPDiscover has joined #openstack-infra16:38
*** juice has joined #openstack-infra16:38
*** iccha has joined #openstack-infra16:38
*** Alex_Gaynor has joined #openstack-infra16:38
*** niska has joined #openstack-infra16:38
*** SushilKM has quit IRC16:40
*** prad_ has joined #openstack-infra16:42
*** StevenK_ has joined #openstack-infra16:42
*** johnthetubaguy1 has quit IRC16:43
*** SergeyLukjanov has quit IRC16:43
*** lcestari has quit IRC16:44
*** iccha_ has joined #openstack-infra16:44
*** lcestari has joined #openstack-infra16:44
*** StevenK has quit IRC16:44
*** guitarzan has quit IRC16:44
yassinefungi: oh thanks! If i can wget the zookeeper tar then i can run the zookeeper server without root permissions, is it possible to wget from the slave ?16:44
*** Ghe_HPDi1cover has joined #openstack-infra16:45
*** jasond` has joined #openstack-infra16:45
dhellmannmordred: responding to your query from monday, no I don't see a 1.2 release of oslo.messaging. Did you already talk to markmc about it?16:45
dhellmannclarkb, sdague: responding to your comment from monday about overloading the branch-designator for wsme/pecan gate jobs, I'm not sure what that means. :-(16:46
*** zehicle has quit IRC16:46
ruheis puppet-dashboard.openstack.org supposed to render some html on port 3000?16:47
*** johnthetubaguy has joined #openstack-infra16:47
* dhellmann is happy for irc client history, but needs a better tool for dealing with irc while traveling16:47
fungiyassine: yes, that's fine, just be aware that downloads from the internet sometimes fail, especially if it's a large file, so it could cause your job to occasionally return a false negative result16:47
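One way to soften that failure mode is to retry the download a few times before letting the job die; a minimal sketch, with the URL, attempt count, and delay all being illustrative rather than taken from any existing job:

    import time
    import urllib.request

    def fetch_with_retries(url, dest, attempts=3, delay=30):
        # retry transient network errors so a flaky download is less likely
        # to turn into a false negative job result
        for attempt in range(1, attempts + 1):
            try:
                urllib.request.urlretrieve(url, dest)
                return
            except OSError as exc:
                if attempt == attempts:
                    raise
                print('download failed (%s), retrying in %ds' % (exc, delay))
                time.sleep(delay)

    # fetch_with_retries('https://archive.apache.org/dist/zookeeper/...', 'zk.tar.gz')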
*** pcrews has joined #openstack-infra16:47
*** johnthetubaguy has quit IRC16:48
*** guitarzan has joined #openstack-infra16:48
*** johnthetubaguy has joined #openstack-infra16:48
*** juice- has joined #openstack-infra16:48
fungiruhe: it would if it were working, but it broke. there's a project underway to replace it with something called puppetboard (anteaya, Hunner and pleia2 are collaborating on it last i heard)16:48
ruhefungi: got it, thanks16:49
*** tian has quit IRC16:49
*** dkliban has quit IRC16:49
*** prad has quit IRC16:49
*** yamahata_ has quit IRC16:49
*** openstackgerrit has quit IRC16:49
*** Ghe_HPDiscover has quit IRC16:49
*** juice has quit IRC16:49
*** iccha has quit IRC16:49
*** Alex_Gaynor has quit IRC16:49
*** jasond has quit IRC16:49
*** juice- is now known as juice16:49
*** prad_ is now known as prad16:49
fungiruhe: puppet dashboard is unfortunately somewhat fragile, and was further complicated by boundlessly growing its mysql db until we couldn't effectively clean or resize it, so we eventually stopped trying while the replacement project is underway16:50
fungi(there are simply too few of us to limp too many broken systems along indefinitely)16:51
*** esker has quit IRC16:51
*** SushilKM has joined #openstack-infra16:51
ruhelet's hope puppetboard doesn't have these issues. i'll try to install it in my infra copy and see how it goes16:52
fungiruhe: it sounds like it will work out much better. lighter weight and actually supported (puppet dashboard was effectively dead upstream, we were running somewhat of a fork, since puppetlabs had moved to recommending their proprietary dashboard instead)16:53
fungiruhe: however it needs puppetdb, which we hadn't previously been using, so i think they're working on getting a manifest together to install that along with puppetboard16:53
*** tian has joined #openstack-infra16:54
*** yamahata_ has joined #openstack-infra16:54
*** Alex_Gaynor has joined #openstack-infra16:54
fungithe alternative we'd explored was switching to the sodabrew fork of puppet dashboard, since its upstream was also somewhat active still16:55
yassinefungi: perfect ! i will wget then, it will simplify my script :) thank you for your help i really appreciate16:55
*** jcooley_ has joined #openstack-infra16:55
*** danger_fo_away is now known as dangers16:55
fungiyassine: my pleasure--let me know if you have any other questions16:55
yassinesure :)16:55
* fungi needs to disappear again for a moment, and will return shortly16:55
sdaguedo we have a bug bot anywhere?16:56
*** dkliban has joined #openstack-infra16:56
sdagueI'd really like to get IRC message on new bugs16:56
sdaguefor tempest, so we can basically keep new bugs down to 016:56
*** mrodden has joined #openstack-infra16:57
clarkbsdague: soren has one. it subscribes to bugs and alerts on imap entries16:58
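Not soren's actual bot, but a minimal sketch of the same idea: poll an IMAP mailbox that is subscribed to the project's Launchpad bug mail and pull out new bug numbers. The host, credentials, and the assumption that the Subject line carries "[Bug NNNNNN]" are all illustrative:

    import email
    import imaplib
    import re

    def new_bug_mail(host, user, password):
        imap = imaplib.IMAP4_SSL(host)
        imap.login(user, password)
        imap.select('INBOX')
        # Launchpad bug notifications carry an X-Launchpad-Bug header
        typ, data = imap.search(None, '(UNSEEN HEADER X-Launchpad-Bug "")')
        for num in data[0].split():
            typ, msg_data = imap.fetch(num, '(RFC822)')
            msg = email.message_from_bytes(msg_data[0][1])
            match = re.search(r'\[Bug (\d+)\]', msg.get('Subject', ''))
            if match:
                yield match.group(1), msg.get('Subject', '')
        imap.logout()

    # for bug, subject in new_bug_mail('imap.example.org', 'bugbot', 'secret'):
    #     print('new bug %s: %s' % (bug, subject))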
clarkbdhellman: basically in that designator you put a string saying this is a wsme/pecan job16:59
dhellmannclarkb: does that go in the job definition in one of the yaml files?16:59
clarkbeverything else about the job matches the openstack gate so you stay in sync without mutual gating16:59
markmcdhellmann, I didn't do a 1.2 release of oslo.messaging16:59
clarkbdhellman: in projects.yaml where you instantiate the template16:59
dhellmannmarkmc: I don't see any releases on pypi, should we do one?17:00
markmcdhellmann, no, it wasn't in havana - first release will be in icehouse, and that will be 1.317:00
markmcdhellmann, was going to do 1.2 when it looked like it was going to be in havana17:01
markmcdhellmann, IOW, there's still room for some API changes17:01
dhellmannmarkmc: why 1.3 if there is not yet a 1.2? I feel like we've had this conversation...17:01
dhellmannah17:01
markmcdhellmann, would like them to be minor at this point yet17:01
markmcdhellmann, matching oslo.config, for no great reason17:01
*** markmcclain has quit IRC17:01
dhellmannmarkmc: ok, I didn't think we were worried about matching release versions across libraries like that, but we can talk about it17:02
dhellmannclarkb: what does the branch-designator buy us? a separate gate queue? so pecan gate jobs don't clog up the openstack gate?17:02
clarkbdhellmann: correct that plus staying in sync with the openstack gate17:03
HunnerGuh. Still haven't done any puppetboard stuff... It's hard to do stuff so close to work, I think >_<17:03
*** dkliban has quit IRC17:04
clarkbrather than two different templates that can diverge there is one template that can create jobs with arbitrary names17:04
*** tma996 has joined #openstack-infra17:04
dhellmannclarkb, so I would add a "devstack-jobs" entry to the jobs list for pecan with pipeline=gate and branch-designator=pecan-wsme or something like that?17:06
dhellmannclarkb: or maybe that pipeline should be different, too?17:06
*** jooools has quit IRC17:06
*** UtahDave has joined #openstack-infra17:07
*** SushilKM has quit IRC17:08
*** talluri has joined #openstack-infra17:08
*** nprivalova_ has joined #openstack-infra17:11
clarkbdhellman: no that sounds fine. you may not want all of devstack-jobs though17:12
*** dstanek_afk has joined #openstack-infra17:12
*** dstanek has quit IRC17:13
*** nprivalova has quit IRC17:13
*** nprivalova_ is now known as nprivalova17:13
*** dstanek_afk is now known as dstanek17:13
dhellmannclarkb: yeah, we'll look at the list and verify before including all of them17:16
dhellmannclarkb: thanks for the tips17:17
*** ruhe has quit IRC17:19
jeblair#status log restarted gerritbot17:21
*** openstackgerrit has joined #openstack-infra17:22
*** SergeyLukjanov has joined #openstack-infra17:23
*** zehicle_at_dell has joined #openstack-infra17:23
*** Alex_Gaynor has quit IRC17:26
*** Alex_Gaynor has joined #openstack-infra17:26
mriedemare jobs timing out at all right now?17:26
mriedemhttp://logs.openstack.org/52/55752/14/check/check-tempest-dsvm-full/3eb1378/console.html17:26
mriedemhttp://logs.openstack.org/52/55752/14/check/check-tempest-dsvm-full/3eb1378/console.html#_2013-12-13_16_53_45_13417:27
*** SushilKM has joined #openstack-infra17:28
jeblairmriedem: http://status.openstack.org/zuul/ says many check jobs have succeeded recently17:30
mriedemso hiccup?17:30
jeblairmriedem: no, i believe there is a current nondeterministic bug that causes jobs to run very long and time out17:31
mriedemjeblair: i opened https://bugs.launchpad.net/openstack-ci/+bug/1260816 to recheck against17:31
uvirtbotLaunchpad bug 1260816 in openstack-ci "check-tempest-dsvm-full job timed out causing build failure" [Undecided,New]17:31
*** yaguang has joined #openstack-infra17:31
*** dolphm has joined #openstack-infra17:32
jeblairmriedem: https://bugs.launchpad.net/tempest/+bug/125868217:34
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid]17:34
jeblairmriedem: i will mark your bug as a dup of that17:34
mriedemjeblair: ah, thanks, maybe i didn't find it 'Build timed out' in LP, at least not in openstack-ci where i was looking17:34
jeblairmriedem: i also tagged it with 'gate-failure' which we've recently started doing to try to make these easier to find17:35
jeblairmriedem: [i understand, you see how long it took me to find it :( ]17:35
*** freyes has joined #openstack-infra17:36
fungijeblair: so i suppose gerritbot didn't return after one of the more recent netsplits? (i saw it in and out a few times on earlier splits today already)17:37
*** markmcclain has joined #openstack-infra17:38
*** freyes has quit IRC17:41
*** johnthetubaguy has quit IRC17:43
*** SushilKM has quit IRC17:43
*** reed has joined #openstack-infra17:43
*** johnthetubaguy has joined #openstack-infra17:43
*** reed has quit IRC17:43
*** reed has joined #openstack-infra17:43
jeblairfungi: possibly; it seemed to be running17:45
*** basha has joined #openstack-infra17:45
*** sandywalsh has joined #openstack-infra17:45
*** ruhe has joined #openstack-infra17:46
notmynamejog0: I put my gate status code and url-generating script online https://github.com/notmyname/gate_status17:48
*** SergeyLukjanov_ has joined #openstack-infra17:48
*** AaronGr_afk is now known as AaronGr17:48
*** SergeyLukjanov has quit IRC17:48
*** dolphm has quit IRC17:49
*** dkliban has joined #openstack-infra17:50
*** tma996 has quit IRC17:51
jeblairnotmyname: fyi there's a jquery plugin to build graphite urls; see it in action at the bottom of view-source:http://status.openstack.org/zuul/index.html17:52
clarkbfungi: I am going to upgrade jenkins on jenkins-dev to 1.543 now17:53
jeblairjog0's graph uses it too17:53
notmynamejeblair: cool. (but that would mean javascript and then I'd have to add "front end design" to my linkedin page and then I'd get more recruiter spam and ...)17:53
jeblairnotmyname: definitely not worth it :)17:53
notmynamehehe17:53
*** dolphm has joined #openstack-infra17:53
fungiclarkb: awesome17:54
*** sdake_ has joined #openstack-infra17:54
notmynamejeblair: 12 hour buckets, over the last 11 days (that's how long you keep data?) http://not.mn/gate_status.html17:54
jeblairclarkb: do you have a script to submit a simulated job completion event to log-gearman-worker?17:54
clarkbjeblair: I don't, the worker doesn't receive job completion events17:55
jeblairnotmyname: it's been 11 days since we renamed the jobs (and when we renamed them, we did not move the graphite data)17:55
zarogood morning17:55
notmynamejeblair: ah, gotcha17:56
yaguanghelp needed: a change to requirements stable/grizzly is failing the jenkins gate17:56
jeblairclarkb: i know, it's complicated.  there are several places where you could inject an artificial event for testing; i'm assuming you have no scripts that inject events into any such places? :)17:56
jeblairnotmyname: otherwise we do keep data for a year17:57
yaguangfor this patch https://review.openstack.org/#/c/61237/17:57
clarkbjeblair: not really no, I typically just run the client and worker locally and hook them up to a jenkins feed. jenkins is busy enough to get events that way :)17:57
notmynamejeblair: are there any events generated when the zuul pipeline gets reset? I'd _really_ like to track that number17:57
jeblairclarkb: is jenkins zmq public?17:57
clarkbjeblair: no, port forwarding is necessary17:58
clarkbI may have a stand in client though /me looks17:58
clarkbjeblair: I do have a simple stand in client17:59
clarkbwould you like a copy of that?17:59
jeblairnotmyname: not atm, however i do think such a thing is possible; probably in zuul.scheduler.Scheduler._processOneItem.18:00
jeblairclarkb: that would be lovely18:00
jeblairnotmyname: if you wanted to hack on zuul :)18:00
*** gaelL_ has quit IRC18:00
notmynamejeblair: I'll add it to the todo list, but I can't promise it will be near the top18:00
jeblairnotmyname: note that "resets of head item" and "resets of any item" are probably both interesting and distinct18:00
*** gaelL has joined #openstack-infra18:01
jeblairnotmyname: if you don't get to it, i will eventually.18:01
notmynamejeblair: with that number (and I'm guessing it will be high since the overall chance of success is so low), I think you can get a good feel for the value of the pipeline approach. I suspect that the current pipeline isn't doing much besides keeping the DC warm18:01
*** gyee has joined #openstack-infra18:01
clarkbjeblair: http://paste.openstack.org/show/54967/ super simple18:02
clarkbit provides only the necessary subset of event data that the gearman worker relies on18:02
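That paste has long since expired, so purely as a sketch of the idea: the jenkins zmq-event-publisher sends messages shaped like "<event-name> <json-body>" on a PUB socket, so a stand-in client just needs to publish a plausible onFinalized payload for the log workers to chew on. The port and the exact fields included here are assumptions, not the contents of paste 54967:

    import json
    import time
    import zmq

    context = zmq.Context()
    pub = context.socket(zmq.PUB)
    pub.bind('tcp://*:8888')  # illustrative port

    event = {
        'name': 'gate-fake-job',
        'build': {
            'number': 1,
            'phase': 'FINISHED',
            'status': 'SUCCESS',
            'full_url': 'https://jenkins-dev.openstack.org/job/gate-fake-job/1/',
        },
    }

    while True:
        # mimic the "onFinalized <json>" message shape subscribers expect
        pub.send_string('onFinalized ' + json.dumps(event))
        time.sleep(10)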
jeblairnotmyname: not sure what you mean by 'pipeline approach'?18:02
clarkbstopping jenkins now18:02
clarkb* on jenkins-dev18:02
*** yolanda has quit IRC18:04
fungiyaguang: https://review.openstack.org/55939 only just merged a few hours ago to address the iso8601 issues preventing grizzly integration testing, so this is probably a new bug which was being hidden by that one18:04
notmynamejeblair: optimistically queueing all the patches rather than doing them serially.18:04
*** freyes has joined #openstack-infra18:05
*** harlowja has quit IRC18:05
*** sandywalsh has quit IRC18:05
fungii think as long as we manage to merge 1.5 changes an hour on average, the pipeline is going at least as fast as serial testing would18:05
yaguangfungi, yes,  the iso8601 issue disappeared18:05
*** sandywalsh has joined #openstack-infra18:05
yaguangfungi, it seems there is a new one   + sudo chown -R jenkins /opt/stack/new/savanna-dashboard18:06
yaguang2013-12-13 16:33:06.574 | + cd /opt/stack/new/requirements18:06
yaguang2013-12-13 16:33:06.597 | + python update.py /opt/stack/new/savanna-dashboard18:06
yaguang2013-12-13 16:33:06.598 | Traceback (most recent call last):18:06
yaguang2013-12-13 16:33:06.598 |   File "update.py", line 94, in <module>18:06
yaguang2013-12-13 16:33:06.599 |     main(sys.argv[1:])18:06
yaguang2013-12-13 16:33:06.621 |   File "update.py", line 90, in main18:06
yaguang2013-12-13 16:33:06.621 |     _copy_requires(req, argv[0])18:06
yaguang2013-12-13 16:33:06.622 |   File "update.py", line 71, in _copy_requires18:06
yaguang2013-12-13 16:33:06.622 |     dest_reqs = _parse_reqs(dest_path)18:06
yaguang2013-12-13 16:33:06.623 |   File "update.py", line 49, in _parse_reqs18:06
jeblairnotmyname: ah, yes; i'd refer to that as speculative execution.  but yes, as the test-subject system's reliability decreases it degrades to its worst-case behavior which is serial merging.18:06
yaguang2013-12-13 16:33:06.651 |     pip_requires = open(filename, "r").readlines()18:06
yaguang2013-12-13 16:33:06.676 | IOError: [Errno 2] No such file or directory: '/opt/stack/new/savanna-dashboard/tools/pip-requires'18:06
yaguang2013-12-13 16:33:06.751 | Process leaked file descriptors. See http://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for more information18:06
yaguang2013-12-13 16:33:07.287 | Build step 'Execute shell' marked build as failure18:06
fungiyaguang: please use http://paste.openstack.org/ in the future18:06
clarkbfungi: 1.543 is running on jenkins-dev. Want to give nodepool a spin with the correct NODEPOOL_SSH_KEY value?18:07
notmynamefungi: rounding up from the current status to assume a 70% chance of passing, that means that a queue of 10 patches has a 2.8% chance of not being reset (ie the 10th patch has a 2.8% chance of landing)18:07
fungiyaguang: grizzly was broken for so long that there are likely to be new external/dependency-related issues which crept in during that span18:07
jeblairnotmyname: i think we have the configuration structured so that it shouldn't be _worse_ than serial merging.  but yes, it's, um, providing some load for our providers.18:07
funginotmyname: makes sense. just pointing out that if the average duration of our longest tests is 0.75 hours then serial testing is only going to merge 1.5 changes an hour best case (assuming none fail)18:08
notmynamecurrent queue depth of 34 means a 0.00077% chance of landing18:08
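Spelling out the arithmetic behind these figures: if each change passes its gate jobs independently with probability p, the change sitting at depth n in the queue merges without a reset only when all n results come back green, so

    P(\text{no reset at depth } n) = p^{n}, \qquad 0.7^{10} \approx 0.028, \qquad 0.7^{34} \approx 5 \times 10^{-6}

which lines up with the ~2.8% figure above and is the same order of magnitude as the 34-deep number (the exact pass rate assumed there isn't stated, so these are approximations).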
fungi0.00077% chance of landing on that iteration, but it will be automatically retried until there are no failures ahead of it in the pipeline18:09
yaguangfungi, to debug the issue, where can I find the source code for check-requirements-integration-dsvm gate ?18:09
*** dolphm has quit IRC18:09
*** dolphm has joined #openstack-infra18:10
notmynamefungi: right. I'm saying that the last item in the queue pretty much doesn't stand a chance of getting through without a retry18:10
fungiyaguang: in openstack-dev/pbr, openstack-infra/pypi-mirror and openstack-infra/config. i'll get you urls to the relevant files in just a moment18:10
notmynamefungi: and the result is that there is a pretty low chance of doing much more than "serial speed", but now we have a bunch of servers wasting cycles for tests on patches that won't land18:11
*** jpeeler has quit IRC18:11
*** dolphm has quit IRC18:11
funginotmyname: agreed--there may be a numbers game to determining a sweet spot for maximum pipeline length beyond which it makes no sense to start jobs until you get closer to the head of the gate18:11
*** dolphm_ has joined #openstack-infra18:11
notmynamefungi: right18:11
*** jpeeler has joined #openstack-infra18:12
*** dolphm_ has quit IRC18:12
fungidependent on the current/recent average failure rate for changes18:12
jeblairnotmyname: one thing that has been considered is allowing elastic-recheck to see non-final job results to collect more data on bug frequency18:12
jeblairnotmyname: that is a potential use for the otherwise discarded test runs further down the queue18:12
notmynamejeblair: that would be good. it would magnify problem areas18:12
fungii suspect the slave discard/rebuild overhead places the sweet spot somewhere in the vicinity of 50-100% use of the maximum pool size/quota aggregate too18:14
notmynamejeblair: fungi: but the real source of the problem is that even a 5% pass rate drop has a _massive_ effect on the efficiency of the overall gate queue. the 34th item in the queue only has an 18.4% chance of landing with no retries even if the pass rate is 95% (as opposed to the current <70%)18:14
fungisince that eliminates nodepool's ability to get ahead of the node demand18:14
*** alcabrera is now known as alcabrera|afk18:14
notmynameand I don't think this is a big revelation to anyone, but it's at least a new way to see and track the data18:15
jeblairnotmyname: yep; that's the impetus behind jog0's effort to try to get on gate bugs early.18:15
jeblairnotmyname: ++ more visibility18:15
jeblairokay, back to skynet for me18:16
*** freyes has quit IRC18:16
*** ruhe has quit IRC18:16
*** matel has quit IRC18:16
clarkbfungi: ready to nodepool on jenkins-dev?18:17
fungiclarkb: we can. gimme just a minute18:17
clarkbno rush, ping me when I should pay attention18:17
fungiyaguang: the meat of that job is this script... https://git.openstack.org/cgit/openstack-dev/pbr/tree/tools/integration.sh18:18
yaguangfungi, many thanks  :)18:18
fungiyaguang: the run-mirror command it's using to test building the set is http://git.openstack.org/cgit/openstack-infra/pypi-mirror/tree/pypi_mirror/cmd/run_mirror.py18:19
*** basha has quit IRC18:19
fungiyaguang: the job definition (the entry point for jenkins) can be seen at http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/jenkins_job_builder/config/requirements.yaml#n118:20
fungiyaguang: and that job is actually running the integration test script within the context of these https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/18:21
*** Ryan_Lane has quit IRC18:22
fungiso i guess four relevant projects involved in running that18:22
*** gyee has quit IRC18:22
fungi(not to mention the openstack/requirements project itself, where the projects list and global requirements reside)18:22
yaguangmaybe some projects don't have a pip-requires file at initial time18:23
yaguangcloning  savanna-dashboard18:24
yaguangfungi, thanks a lot for the info18:24
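If the culprit really is that update.py only knows the legacy tools/pip-requires name while newer projects only ship requirements.txt (or the other way around on older branches), the obvious guard would look something like the sketch below. This is illustrative only, not the actual openstack/requirements code or the eventual fix:

    import os

    def find_requirements_file(project_dir):
        # accept either the current or the legacy file name before parsing
        for name in ('requirements.txt', 'tools/pip-requires'):
            path = os.path.join(project_dir, name)
            if os.path.exists(path):
                return path
        return None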
*** harlowja has joined #openstack-infra18:24
fungiyaguang: actually, i think this may be more involved. we lack a lot of the mechanisms for requirements testing in grizzly since they were developed in the havana cycle and not all backported. we may be better off setting that job to only run on stable/havana and later for now18:25
fungimordred: ^ opinion?18:25
*** basha has joined #openstack-infra18:26
fungiyaguang: i have an outstanding change to backport some of that and try to get it working, but it was waiting on the iso8601 situation to clear up. at the moment i don't have any reasonable expectation that job will run correctly at all18:26
fungii'll propose a change real quick to exclude stable/grizzly until more of those bits are in place (though we're getting close enough to eol for that release that it may not make sense to invest much more time in requirements consistency there anyway)18:27
yaguangfungi,  I also have  some backports are blocked for a long time18:28
clarkbfungi: I would go along with that18:29
*** rwsu has quit IRC18:32
harlowjaqq, are stackforge gate/merge jobs currently disabled?18:32
harlowjawondering if i should kick https://review.openstack.org/#/c/60850/ to try to get it to move, or not worry about it yet18:33
*** mgagne1 has joined #openstack-infra18:33
*** mgagne1 has quit IRC18:33
*** mgagne1 has joined #openstack-infra18:33
*** mgagne has quit IRC18:34
*** mgagne has joined #openstack-infra18:34
*** mgagne has quit IRC18:34
*** mgagne has joined #openstack-infra18:34
*** yaguang has quit IRC18:34
fungiharlowja: i'm not immediately seeing a good reason for 60850 not to be in progress on http://status.openstack.org/zuul/ so it may merit further investigation18:35
harlowjak, another one of interest, http://logs.openstack.org/20/54220/36/check/gate-taskflow-pep8/444a457/console.html18:35
fungiharlowja: there's nothing special going on for stackforge... it's treated the same as far as whether and when gating is started18:35
*** jergerber has joined #openstack-infra18:35
harlowjakk, thx fungi18:35
harlowjafungi should i try to kick those (recheck no bug) or just leave them for a little?18:37
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Don't run requirements integration for Grizzly  https://review.openstack.org/6205518:37
fungiharlowja: the failure in the log you linked can be rechecked or reverified against bug 126065418:38
uvirtbotLaunchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Critical,Fix released] https://launchpad.net/bugs/126065418:38
harlowjak, thx fungi18:38
fungiprecise14 was not a happy camper this morning18:38
*** mgagne1 has quit IRC18:38
*** basha has quit IRC18:38
harlowja:)18:39
fungiharlowja: on 60850 i think you *may* have originally set approval without a +2 vote and then added the +2 vote after, which might be the reason. try removing and adding your approval on it18:40
harlowjakk18:40
*** rwsu has joined #openstack-infra18:40
*** zehicle_at_dell has quit IRC18:41
*** gyee has joined #openstack-infra18:42
*** herndon has joined #openstack-infra18:43
*** johnthetubaguy has quit IRC18:44
fungiharlowja: looks like you undid your +2 code review on it (which you should add back) but did not remove your +1 approve (which is the one you need to reapply for zuul to notice)18:44
harlowjaah18:44
harlowjawrong one, thx18:44
fungiglad to help18:44
harlowjaneed more coffee, ha18:44
*** basha has joined #openstack-infra18:45
*** rossella_s has joined #openstack-infra18:45
*** zehicle_at_dell has joined #openstack-infra18:46
fungiharlowja: however, that theory didn't pan out since it's still not being tested. i think it may be because you have a draft change indirectly depending on it (61689), and i think we may still have a corner-case bug where if zuul can't retrieve and inspect the entire chain of dependent and reverse-dependent changes, it doesn't enqueue18:49
harlowjaah18:49
fungizuul, like the rest of the general public, is blind to draft changes18:50
clarkbfungi: this is where you say "don't use drafts" :)18:50
fungi(one of the reasons we recommend against the draft feature)18:50
fungiheh18:50
^dUgh, drafts.18:50
harlowjaya, and 61689 seems hidden18:50
harlowjahmmm18:50
*** rcarrillocruz1 has joined #openstack-infra18:51
*** herndon has quit IRC18:51
harlowjaso if that draft is not a draft but a WIP that should solve this?18:51
*** praneshp has joined #openstack-infra18:51
clarkbyes18:51
harlowjak18:51
*** rcarrillocruz has quit IRC18:52
harlowjai think i know who owns that draft, will bug him18:52
fungiharlowja: it should, yes, though one of the patches in the set will need reapproval again probably after you publish that draft18:52
fungiso that zuul will notice18:52
*** mriedem has quit IRC18:52
harlowjak18:53
harlowjathx guys18:53
*** apevec has joined #openstack-infra18:53
fungithe last time i looked into one of these, i found a traceback in zuul's log from where it tried to retrieve a reverse-dependent change which was in a draft state, and failed to enqueue the non-draft parent change as a result18:53
fungican't remember if i filed a bug or not18:53
harlowjaseems like it should almost skip over drafts completly18:53
apevecmordred, https://review.openstack.org/61237 (grizzly reqs) failed on savanna, but savanna doesn't have grizzly branch afaict?18:54
fungiwell, it can't have any hope of skipping them if they're required for the change in question, but if they're merely draft changes requiring the non-draft change that seems like one we could do something about18:54
fungiapevec: https://review.openstack.org/6205518:55
apevecwhat's requirements-integration test doing ?18:55
apevecfungi, ah thanks18:55
clarkbdkranz: re https://review.openstack.org/#/c/61850/ were my suggestions in patchset 2 not good? (I think the logic in patchset 3 is much more complicated than it needs to be)18:55
harlowjafungi agreed, for reviews that are dependent on a draft, ya, nothing u can do, but the other way around (a draft dependent on a review) seems like u could just ignore that draft (and all its dependents, if any)18:56
*** basha has quit IRC18:56
fungiapevec: i think that job got added while grizzly was broken from iso8601 so we didn't think about the implications on that branch18:56
clarkbharlowja: the greater problem is that drafts just don't work18:56
harlowjaor that clarkb  :)18:56
clarkbthere are so many corner cases where they fall over. It isn't just zuul having a hard time18:57
fungithe worst part of gerrit drafts, in my opinion, is that as the gerrit server admin you can't even disable them18:57
harlowjaeasy/hard to remove draft feature completely?18:57
harlowjaah18:57
fungiif it were a config option, i wouldn't care18:57
apevecfungi, thanks, I've added comment in the review to prevent rechecks in vain18:57
fungiinstead it's baked in, non-optional and thus an attractive nuisance18:58
harlowjafungi agreed18:58
fungialso, the idea of stashing "hidden" changes in progress in the code review system runs pretty counter to what i think open development processes are all about18:59
*** alcabrera|afk is now known as alcabrera18:59
*** yamahata_ has quit IRC18:59
fungiapevec: i think that requirements change is also probably not really absolutely necessary, since i'm pretty sure we don't do requirements enforcement on stable branches anyway (at least not for grizzly but i think also still not for havana either)19:00
*** mrodden has quit IRC19:00
apevecfungi, true, but it'd be nice to keep new updates somewhat synced19:01
fungi(though the havana ones do need to get enforced. on my to do list to check back into the state of those)19:01
fungiapevec: agreed19:02
clarkbfungi: I am fiddling with passing NODEPOOL_SSH_KEY into the daemon env in the init script, but I am probably better off patching nodepool to accept that key as an option19:05
fungik. i'm free to help as soon as i finish filing this zuul bug i should have filed ages ago19:06
clarkbawesome19:06
*** dstanek has quit IRC19:06
clarkbI am still reading source to try and figure out how this best fits in19:06
anteayaany point in adding a comment in git review so that if someone does use the flag to submit a draft they know they are creating a painful situation?19:07
clarkbhalf tempted to dump the public key literally into the yaml config file and just read it there19:07
anteayalike "are you sure you want to create a draft? This will bite you later."19:07
clarkbbut then you have to sort out logic like kicking off image rebuilds if the config changes which I don't think exists today19:07
fungiclarkb: it's not as if we don't do similar things elsewhere (though the path to a keyfile would certainly be nicer)19:08
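Purely as a sketch of the shape such an option could take (the config key name here is a guess, not what change 62066 actually implements): read the key contents from a configured file path when one is given, and otherwise fall back to whatever arrived in the NODEPOOL_SSH_KEY environment variable:

    import os

    def get_ssh_public_key(config):
        key_file = config.get('ssh-public-key-file')  # hypothetical config key
        if key_file:
            with open(os.path.expanduser(key_file)) as f:
                return f.read().strip()
        # fall back to key contents passed directly through the environment
        return os.environ.get('NODEPOOL_SSH_KEY', '').strip()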
notmynameanteaya: draft == WIP status?19:08
anteayanotmyname: no, draft means only certain folks can see it19:08
notmynameah19:08
anteayaWIP is a button in the gui for the patch19:08
pleia2better to use wip than draft19:08
*** mrodden has joined #openstack-infra19:08
anteayasubmit and then push "work in progress"19:08
clarkbexcept it isn't completely private when you draft, anyone can still fetch the code if they are smart19:09
anteayapleia2: yes19:09
fungiwe need to add a wip flag back into git-review but have been holding off until we see the state in gerrit 2.919:09
anteayakk19:09
anteayanotmyname: and draft makes future operations on that patch a pain19:09
fungisince the wip feature we're using now exists only in our own fork of gerrit 2.419:09
anteayawhich I believe was what was being discussed above19:09
*** mriedem has joined #openstack-infra19:10
*** mgagne has quit IRC19:17
sdagueso one place where I think it would be good for infra to auto recheck would be when any of the test results come back as UNSTABLE19:18
sdagueas that clearly was an infra fail19:18
*** sandywalsh has quit IRC19:19
fungiclarkb: i assume it's safe to blow away the bad images and nodes from last night's nodepool-dev experiments19:19
*** rongze has quit IRC19:20
*** dims has quit IRC19:21
clarkbfungi: yup should be19:22
clarkbthey weren't used for anything19:22
clarkbsdague: do we still have instances of UNSTABLE jobs making it to reporting? I think the problem there is that when zuul cancels jobs intentionally they sometimes report back as UNSTABLE19:23
*** mgagne has joined #openstack-infra19:23
*** mgagne has quit IRC19:23
*** mgagne has joined #openstack-infra19:23
clarkbbut I suppose in those cases we would know why19:23
sdagueclarkb: https://review.openstack.org/#/c/61778/4 just got hit by it19:23
*** sharwell has joined #openstack-infra19:23
sdaguebecause basically UNSTABLE is completely useless to a person, because it means there typically aren't any logs. So the only option is recheck no bug anyway19:24
clarkbsdague: well we should always have the console log...19:25
*** CaptTofu has joined #openstack-infra19:25
clarkbbut it is usually an infra problem19:25
sdagueclarkb: sometimes we don't19:25
fungiright, depends on how long it's been19:25
jeblairsdague: are you talking about errors from the bad jenkins slaves earlier?19:27
fungithe other problem there is that jenkins will persist jobs to the same slaves if available19:27
sdaguejeblair: that might be what this was19:27
jeblairfungi: was precise20 one of those?19:28
sdaguejust trying to think about improvements to the system19:28
sdagueyes it was19:28
fungijeblair: yep19:28
fungithese are things i expect will get better once we no longer run tests on long-lived slaves and use nodepool. not too much longer now19:28
*** jaypipes has joined #openstack-infra19:28
jeblairsdague: so zuul does re-launch jobs when it detects some kinds of jenkins failures19:29
sdaguejeblair: ok, so maybe expand that?19:29
jeblairsdague: though obviously this isn't one of them.  it's possible that retrying on unstable for this error would have made things better, inasmuch as it may have eventually been assigned to a node other than precise20 (possibly after retrying 200 times or something because of what fungi just pointed out)19:30
*** yassine has quit IRC19:30
jeblairsdague: but often retrying on unstable results isn't going to get us anywhere, and may make things worse (logs.o.o full as an example)19:30
fungihttp://logs.openstack.org/55/62055/1/check/gate-config-layout/5ad5222/console.html "Building remotely on bare-precise-rax-ord-850570..."19:31
jeblairsdague: so i don't think that build result alone is enough to automate a retry on19:31
sdaguejeblair: so classifying the kind of problem is probably important19:31
sdaguehonestly, an exception like that should down that node19:31
jeblairsdague: i agree; i think that's a jenkins bug....19:31
fungii bet retrying an unstable job once we're using bare nodepool nodes for them will be slightly more effective19:31
sdagueabout every 3 weeks we have a node go wonky like that and destroy an entire development day for .eu19:31
sdaguebecause there is no one to solve that in that TZ19:32
jeblairsdague: but we think that going to all-dynamic slaves will solve this problem19:32
sdaguejeblair: ok, well if that's close, cool19:32
fungisdague: see the link i posted19:33
fungiwe're already doing it some19:33
jeblairsdague: it's very much in progress ^ :)19:33
sdaguejeblair: ok cool19:33
fungii did mention that in the bug when i resolved it as well19:34
sdagueevery time we have a 'splode I just like to figure out "how does this problem never happen again"19:34
jeblairsdague: this is a unit-test like job that ran on one: http://logs.openstack.org/54/61954/2/check/gate-config-layout/0702552/console.html19:34
sdaguefungi: sure, I guess timeline was the question19:34
jeblairsdague: yes, me too.  sometimes that involves a long multi-step process.  fortunately we're near the end of this one.19:35
sdaguecool19:35
clarkbthe slave threading should help with this too19:35
fungii hope so19:36
sdagueyeh, it's just been a very bad gate week, and only slightly related to actual openstack bugs :)19:36
clarkbI think errors like this are jenkins being unable to maintain 300 ssh connections with thousands of threads all vying for cpu cycles19:36
*** dims has joined #openstack-infra19:36
fungii suspect precise14 and precise20 got into a bad state after we restarted jenkins02 (timeline seems about right) but wasn't obvious until we'd all gone to sleep19:37
jeblairsdague: to be fair, i think the 30% failure rate in openstack is more than a slight contribution.19:37
*** talluri has quit IRC19:37
jeblairfungi, clarkb: they never recover, so i think it's more than just contention.19:37
openstackgerritClark Boylan proposed a change to openstack-infra/nodepool: Allow for ssh key file path in config.  https://review.openstack.org/6206619:38
*** talluri has joined #openstack-infra19:38
jeblairclarkb: do you need that in asap? ^19:38
clarkbjeblair: I don't think so19:38
clarkbjeblair: we will get by running nodepool in the foreground for now19:38
praneshphey all, was any of you able to run the docs test successfully after pinning sphinx<1.2?19:39
clarkbpraneshp: yes19:39
fungiclarkb: we're also back to a clean slate now--old images and nodes deleted successfully19:39
clarkbfungi: awesome19:39
sdaguejeblair: how are you computing that #? because while the SSH race is bad, it's not 30% bad19:39
clarkbpraneshp: you may need to update tox.ini to disable pip install --pre19:39
clarkbpraneshp: line 9 of nova's tox.ini does this19:40
fungisdague: i'm guessing it was an instance of http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/5000/600/5652/5652.strip.gif19:40
praneshpclarkb thanks. Let me look into my tox.ini19:41
openstackgerritMatt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682  https://review.openstack.org/6206719:41
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/125868219:41
sdaguefungi:  :)19:42
openstackgerritClark Boylan proposed a change to openstack-infra/nodepool: Allow for ssh key file path in config.  https://review.openstack.org/6206619:42
*** sandywalsh has joined #openstack-infra19:43
openstackgerritMatt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682  https://review.openstack.org/6206719:43
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/125868219:43
jeblairsdague: http://not.mn/gate_status.html19:44
fungiyeah, that does seem to average out to about 30% failing19:44
sdaguejeblair: so that includes all the fails, including the infra fails, which currently are the #1 recheck bug19:44
fungibased on recent freshness metrics presumably19:45
jeblairsdague: the infra fails are the dip from a few hours ago.19:45
praneshpclarkb i don't have a line relating to pip install --pre in my tox.ini https://review.openstack.org/#/c/61615/17/tox.ini19:45
clarkbsdague: which bug is that? rechecks page says 1253896 which isn't infra19:45
clarkbpraneshp: do you have a line like line 9 in nova's tox.ini?19:45
praneshpone sec.19:46
sdaguehttp://status.openstack.org/elastic-recheck/19:46
jeblairsdague: this chart is based on jog0's chart which, on the right edge measure real-time failure rates of jobs19:46
praneshpclarkb nope.19:46
clarkbpraneshp: that is what you need19:46
sdaguejeblair: so I'm not actually trying to argue who's more to blame here19:46
praneshpok, let me try, thanks19:46
sdagueI'm just saying, it's really hard to get anyone to look at the ssh bug when things are exploding for other reasons19:46
*** rossella_s has quit IRC19:47
jeblairsdague: sure.  but you included some hyperbole in your statements that i don't think helped the situation.19:47
*** zehicle_at_dell has quit IRC19:47
openstackgerritMatt Farina proposed a change to openstack-infra/config: New project request: PHP-Client  https://review.openstack.org/6206919:48
clarkbfungi: were you going to start nodepool in the foreground?19:49
*** dolphm has joined #openstack-infra19:49
clarkblunch should be starting here shortly but will do my best ot pay attention19:49
*** Ryan_Lane has joined #openstack-infra19:50
fungiclarkb: yeah, i was going to try `sudo -u nodepool NODEPOOL_SSH_KEY=~jenkins/.ssh/id_rsa.pub nodepoold -d` in a screen session, but the jenkins public key isn't readable by the nodepool user so i'm pondering options19:52
anteayafwiw, we are working hard on the ssh bug, it is proving to be a tricky one, markmcclain salv-orlando beagles and dkehn are all working on it right now19:53
anteayawill update when we have anything19:53
*** ^d is now known as ^demon|away19:53
fungiclarkb: i think i may just resort to copying it somewhere accessible for now (it's not as if the file's sensitive anyway)19:53
openstackgerritMichael Still proposed a change to openstack-infra/jeepyb: Rename the subscriber map to be a more generic config file.  https://review.openstack.org/6207319:54
openstackgerritMichael Still proposed a change to openstack-infra/jeepyb: Allow configurable mappings to different LP projects  https://review.openstack.org/6207419:54
clarkbfungi: ++ not sensitive19:54
fungiokay, it's running19:55
clarkbfungi: I think the var needs to have the actual file contents19:55
*** CaptTofu has quit IRC19:55
clarkbthe path won't work there19:55
jeblairmriedem: see my comment in https://bugs.launchpad.net/tempest/+bug/125868219:55
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid]19:55
fungiohhhhhhhh19:55
* fungi totally misread it19:56
jeblairmriedem: not all timeouts are due to the same cause.19:56
fungithanks clarkb19:56
clarkbfungi: it is passed literally to puppet on the remote end19:56
fungimmm, nodepool is also like the honey badger when it comes to trapping sigint, i see19:57
jeblairmriedem: however, i know of no current infra issues that would contribute to timeouts, so i think we can assume that all _current_ timeouts are due to the unknown bug19:57
mriedemjeblair: ok, i just pushed an e-r query for it19:57
jeblairsdague: ^ this is a big untracked contributor to gate failures19:57
mriedemsince there are no logs with errors besides console.html, i didn't have much to base the query on19:57
jeblairsdague: that makes things worse by taking 45 min jobs to 90 mins19:57
clarkbfungi: ya :( will probably need to delete the image build stuff too19:57
mriedemjeblair: https://review.openstack.org/#/c/62067/19:58
fungiclarkb: i plan to19:58
*** dstanek has joined #openstack-infra19:59
sdaguemriedem: can you change the message part to19:59
fungiclarkb: cleaned up... so how about: sudo -u nodepool NODEPOOL_SSH_KEY="`cat /tmp/id_rsa.pub`" nodepoold -d19:59
fungiclarkb: is that what you had in mind?19:59
sdaguemessage:"Build timed out (after" AND message:"minutes). Marking the build as failed."19:59
*** jaypipes has quit IRC19:59
sdagueso it catches all the job timeouts, not just the ones set to 90 minutes20:00
*** dcramer_ has quit IRC20:00
clarkbfungi: ya, see the nodepool readme, that is basically what it does20:00
clarkbsdague: that query is even better than mine :)20:00
jeblairsdague, mriedem: we'll want to remove the query asap after fixing the bug too, because lots of people upload broken code that times out20:00
fungiclarkb: founf it. thanks20:01
fungier, found20:01
sdagueso 75 hits over 7 days actually makes it 9th in the e-r bug list (by count)20:02
sdaguejust to get a sense of relative frequency20:02
*** lcestari has quit IRC20:02
fungiclarkb: for the benefit of our sanity, i checked the log and nodepool *does* think it needs two nodes, so could be an off-by-one/rounding error, or maybe that's an effect of the load prediction heuristic20:03
clarkbfungi: so if I sudo nodepool list I should see the data from the foreground process right? since this is all in the DB20:03
clarkbfungi: weird20:03
fungiclarkb: image-list at the moment20:03
fungiclarkb: list will start showing content once the image finishes building20:03
clarkbfungi: I wonder if the heuristic will always do one + what it determines20:04
clarkbor some other silly off by one error20:04
fungiclarkb: i also have both commands being called every 60 seconds under watch in the second window of that screen session20:04
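The watch loop fungi describes might look roughly like this (the exact invocation is an assumption):

    # refresh both listings every 60 seconds in a spare screen window
    sudo -u nodepool watch -n 60 'nodepool image-list; nodepool list'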
mriedemsdague: yeah, i can update the message20:04
fungiclarkb: wild guess would be that it's rounding up from very small values of 1 ;)20:05
*** eharney has quit IRC20:05
* fungi has not looked back at that section of the code to make a more reasoned guess20:06
jeblairsdague: yeah, just pointing out that i think the 2x runtime factor aggravates its severity (when it hits, it has the same throughput effect of hitting twice in a row).20:06
openstackgerritMatt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682  https://review.openstack.org/6206720:06
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/125868220:06
sdagueyep, definitely20:06
jeblairclarkb, fungi: is there something i can help elucidate?20:06
*** harlowja has quit IRC20:07
sdaguealso, the folks trying to land stable/grizzly patches that didn't fix their doc jobs is a huge problem now as well20:07
fungijeblair: min-ready is set to 1 and nodepool believes it needs 2 nodes20:07
mriedem219 hits > 77 hits:20:07
mriedemhttp://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQnVpbGQgdGltZWQgb3V0IChhZnRlclwiIEFORCBtZXNzYWdlOlwibWludXRlcykuIE1hcmtpbmcgdGhlIGJ1aWxkIGFzIGZhaWxlZC5cIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4Njk2NTA5OTI1NX0=20:07
sdagueas a neutron job will reset ahead of them, then they'll be put back into the zuul queue20:07
mriedemgood call sda20:07
mriedemsdague: *20:07
clarkbjeblair: at this point I don't think so20:07
anteayasdague: the neutron job being the ssh bug?20:07
sdagueanteaya: the ssh bug that I pointed out as the top bug yesterday20:08
anteayayes20:08
fungiclarkb: jeblair: i'm less concerned with nodepool math problems at the moment and just want to make sure we have all the moving parts in place on jenkins-dev20:08
anteayawhich 4 devs are working on now20:08
sdaguemriedem: actually, that's catching a ton of swift issues20:08
sdaguein their unit tests20:08
anteayacontinuing from yesterday20:08
sdagueso I'm not sure that was a good call20:08
jeblairfungi: but i'm curious, what's the math problem?20:08
notmynamesdague: ? (swift ping)20:08
fungijeblair: min-ready is set to 1 and nodepool believes it needs 2 nodes20:09
fungijeblair: with no jobs underway on jenkins-dev20:09
jeblairfungi: can i see the debug output from the allocator?20:09
mriedemsdague: clarkb: like this: http://logs.openstack.org/15/60215/2/check/gate-swift-python26/1b3754e/console.html20:09
fungijeblair: probably so. i'll fish it out20:09
fungijeblair: scratch that20:09
mriedemhttp://logs.openstack.org/15/60215/2/check/gate-swift-python26/1b3754e/console.html#_2013-12-13_19_31_29_11520:09
sdaguemriedem: yes20:09
*** dprince has quit IRC20:10
fungijeblair: clarkb: jobs are actually underway on jenkins-dev, just none i would expect to need these nodepool nodes20:10
fungianyway, getting debug output20:10
mriedemsdague: so going back to the strict message i had first20:10
sdaguemriedem: yeh20:10
praneshphey clarkb: thanks a lot, my review passed jenkins20:10
praneshp*patch20:10
mferclarkb any chance i could get you to look at https://review.openstack.org/#/c/62069/ ... or maybe there's someone else i could hit up20:10
clarkbpersia: np20:10
jeblairmriedem, sdague: current timeout values for d-g jobs are 60,90,12020:11
jeblairmriedem, sdague: we could change them to 61,91,121 for better fingerprinting20:11
sdagueyou could match job name20:11
clarkbmfer: currently busy trying to make jenkins more reliable. also manage-projects is still giving us grief... probably won't get to it today20:11
jeblairsdague: oh, right, that's a different field so you can match it.  that's better.  :)20:12
sdaguebuild_name is a valid thing to match20:12
mferclarkb kk20:12
fungijeblair: clarkb: debug log from daemon start to end of demand analysis... http://paste.openstack.org/show/5497120:12
fungijeblair: clarkb: so i think that's our answer20:13
clarkboh it has jobs queued20:13
fungiit wants to run some jobs on them ;)20:13
clarkbfungi: that is good, it means we will get end to end testing :)20:13
fungimystery solved20:13
mriedemsdague: jeblair: but can you do ORs?20:13
clarkbmriedem: yes20:13
sdaguemriedem: yes20:14
sdagueand you can use parens to group20:14
mriedemcan i use ternary operators? :)20:14
jeblairfungi: cool.  error: situation normal.  :)20:14
fungiclarkb: i'm going to clear old nodepool nodes manually out of jenkins-dev too20:14
clarkbfungi: o20:14
fungijeblair: yes, very much so20:14
clarkb*ok20:14
clarkbmriedem: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html20:15
clarkbthat is for an older version of lucene but I think the query syntax hasn't changed20:15
*** Abhishe__ has quit IRC20:15
*** UtahDave1 has joined #openstack-infra20:16
*** mrodden has quit IRC20:16
*** UtahDave has quit IRC20:17
*** UtahDave1 is now known as UtahDave20:17
fungiclarkb: watching jenkins-dev, i think we still have some old periodic jobs which need to be deleted from it20:17
Alex_Gaynordhellmann: Good catch -- I totally missed the existing +220:17
Alex_Gaynor(and then I missed the follow up comment, not doing real well today am I?)20:17
clarkbfungi: probably20:17
dhellmannAlex_Gaynor: yeah, I do that sometimes so I figured that's what it was20:17
fungiclarkb: particularly the devstack-reap-vms-* jobs and similar20:17
* fungi fixes20:17
fungithough hopefully they no longer match the new nodepool names20:18
*** mrodden has joined #openstack-infra20:18
mriedemmessage:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND (build_name:"check-tempest-dsvm-postgres-full" OR build_name:"check-tempest-dsvm-full") AND filename:"console.html"20:19
*** zehicle_at_dell has joined #openstack-infra20:19
sdaguemriedem: s/check/gate/ ?20:20
mriedemsdague: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQnVpbGQgdGltZWQgb3V0IChhZnRlclwiIEFORCBtZXNzYWdlOlwibWludXRlcykuIE1hcmtpbmcgdGhlIGJ1aWxkIGFzIGZhaWxlZC5cIiBBTkQgKGJ1aWxkX25hbWU6XCJjaGVjay10ZW1wZXN0LWRzdm0tcG9zdGdyZXMtZnVsbFwiIE9SIGJ1aWxkX25hbWU6XCJjaGVjay10ZW1wZXN0LWRzdm0tZnVsbFwiKSBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQ20:20
mriedemsdague: should i duplicate the build_names in the query for check and gate?20:21
mriedemotherwise the e-r query won't hit on check failures and people will have to hunt for it20:21
sdagueclarkb: do we have globbing in that field?20:21
clarkbsdague: yes, but not at the beginning of the field (lucene limitation)20:21
clarkbalso you have to remove the quotes to glob so20:22
clarkbcheck-tempest-* OR gate-tempest-* should work20:22
clarkbI wish I could will the image build to go faster :)20:22
mriedemmessage:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND (build_name:check-tempest-* OR build_name:gate-tempest-*) AND filename:"console.html"20:23
sdaguemessage:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND filename:"console.html" AND (build_name:gate-tempest* OR build_name:check-tempest*)20:23
mriedemessentially the same20:23
sdagueyeh, that20:23
mriedemi'm back to my original number of hits20:23
mriedemso looks good20:23
sdagueyep, we were going at it the same time20:23
sdagueyep, looks solid20:24
sdaguepush that and I'll land it20:24
sdagueonly 5 hits in the gate20:24
sdaguewhich is nice20:24
*** jcooley_ has quit IRC20:24
sdagueso it's not actually resetting much20:24
clarkboh grenade20:24
clarkbshould add grenade because that is timing out a bunch, right?20:25
openstackgerritMatt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682  https://review.openstack.org/6206720:25
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/125868220:25
*** zehicle has joined #openstack-infra20:26
jeblairmriedem: ^ see clarkb's comment20:28
sdagueso I'm landing mriedem's current patch, but a follow up to add would be accepted20:28
*** zehicle_at_dell has quit IRC20:29
mriedemcheck-grenade-* and gate-grenade-* right?20:29
mordredbackscroll!20:30
mordredalso20:30
mordredthe internet works20:30
mordredI can type20:30
mordredso happy20:30
mordredmorning everyone20:30
*** rcarrillocruz has joined #openstack-infra20:30
clarkbmordred: good morning20:30
clarkbfungi: we have an image id!20:30
mfermordred good morning20:30
*** rongze has joined #openstack-infra20:31
clarkbfungi: almost done building I think20:31
*** rcarrillocruz1 has quit IRC20:31
sdaguemriedem: yes20:31
anteayamorning mordred20:32
sdaguealso, just a style thing, I've been putting the conjunctions after the break20:32
*** Ryan_Lane has quit IRC20:33
anteayalooking at the gerrit merge log, once 24 hours has passed is there a way of seeing the merges that happened at a specific timestamp20:33
openstackgerritMatt Riedemann proposed a change to openstack-infra/elastic-recheck: Add grenade jobs to the bug 1258682 e-r query  https://review.openstack.org/6208420:33
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/125868220:33
anteayaor at least to the closest hour?20:33
mriedemclarkb: sdague: https://review.openstack.org/6208420:33
anteayaonce 000 utc occurs everything is just attributed to the same date20:33
jeblairanteaya: you can use an ssh query20:33
jeblairanteaya: or the git log.  or the git log for openstack/openstack.20:33
anteayaokay thanks20:34
sdaguemriedem: landed20:34
*** gyee has quit IRC20:35
mordredjeblair: in the airport this morning, jog0 requested that we add the infra repos to openstack/openstack - I think it might be an interesting idea - possibly in an infra subdir to be clear about what they are20:35
fungianteaya: yes, like i suggested yesterday, you can see it in cgit if you like browsers... http://git.openstack.org/cgit/openstack/oslo.messaging/log/20:36
fungi(otherwise, the git log command)20:36
jeblairmordred: why?20:36
mordredjeblair: but his specific question was that when he's trying to track down when something started acting wonky, he's been using o/o to walk backwards and look at system state20:36
anteayafungi: yes, thanks20:36
clarkbfungi: waiting for the image to leave the building state is like watching paint dry20:36
*** rongze has quit IRC20:36
mordredjeblair: so knowing what various infra things looked like around the time of commit X was a thing he was looking to be able to do20:36
clarkbfungi: just want to ready, we have two slaves building20:36
fungiyup20:37
*** zehicle has quit IRC20:37
jeblairmordred: that has limited utility with infra; most of our changes take effect between 10 and 1440 minutes after the commit lands...20:37
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add e-r query for bug 1258682  https://review.openstack.org/6206720:37
*** jcooley_ has joined #openstack-infra20:37
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/125868220:37
mordredjeblair: yeah - that's what I said - and devstack and devstack-gate are already in there20:38
mordredbut I guess there are potentially things in config, such as job changes, that might be helpful to look at? I feel non-strongly in either direction20:38
jeblairmordred: i don't think that was really the intent behind openstack/openstack (i mean, gee, we could just log gerrit merges if that's what's needed)20:39
*** jcooley_ has quit IRC20:39
*** zehicle_at_dell has joined #openstack-infra20:39
openstackgerritMichael Still proposed a change to openstack-infra/config: Add project configuration.  https://review.openstack.org/6208520:39
mordredjeblair: nod20:39
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add grenade jobs to the bug 1258682 e-r query  https://review.openstack.org/6208420:40
uvirtbotLaunchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/125868220:40
jeblairmordred: so i'm inclined to say "let's not" and if jog0 is extremely persuasive that it's totally useful and he's solved all kinds of problems by having the infra git log on the screen with the openstack git log, maybe we look at doing that or a git merge log thing...20:40
mordredjeblair: kk. works for me20:40
sdaguejeblair: now that you have you awesome priority tool - could you bump this to the top of the queue - https://review.openstack.org/#/c/61428/ ?20:42
clarkbfungi: we have slaves!20:42
sdaguemarkmcclain thinks that may solve the ssh race (or at least make it a ton better)20:42
fungiin jenkins-dev and everything20:42
clarkbfungi: they aren't running jobs like I expected though20:42
fungiclarkb: but not actually running any jobs20:42
sdaguebasically a neutron + nova change set pair needed to land, the neutron one did, the nova one did not20:42
fungijinx20:42
*** melwitt has joined #openstack-infra20:42
sdaguethe massive uptick corresponds to the neutron one landing20:43
*** eharney has joined #openstack-infra20:43
openstackgerritMichael Still proposed a change to openstack-infra/jeepyb: Rename the subscriber map to be a more generic config file.  https://review.openstack.org/6207320:43
openstackgerritMichael Still proposed a change to openstack-infra/jeepyb: Allow configurable mappings to different LP projects  https://review.openstack.org/6207420:43
jeblairsdague: ack, i'll start on that.20:43
sdagueit's speculation, but good speculation20:43
clarkbfungi: With that in place I am going to grab lunch very quickly20:43
fungisdague: i'm betting it was for the CVE-2013-6419 fix?20:44
uvirtbotfungi: ** RESERVED ** This candidate has been reserved by an organization or individual that will use it when announcing a new security problem.  When the candidate has been publicized, the details for this candidate will be provided. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-6419)20:44
fungiclarkb: k20:44
sdaguefungi: it's your patch... so you tell me :)20:44
fungisdague: then yes20:44
clarkbfungi: maybe we should just manually trigger some jobs on there, I think that will be sufficient for the nodepool node removal stuff to happen20:44
clarkb(by mnually I mean via gearman)20:44
jeblairzuul promote --pipeline gate --changes 61428,220:44
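The general shape of the new command, as used here (a sketch; it runs on the zuul server, and the exact option handling of this freshly added tool is an assumption):

    # move change 61428, patchset 2, to the head of the gate pipeline
    zuul promote --pipeline gate --changes 61428,2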
fungisdague: i had to pester the nova devs a but for approval on their half, so it lost a day and the neutron part went in first20:45
fungier, a bit20:45
fungijeblair: magic!20:46
dkehnclarkb: quickq: with the cirros-0.3.1-x86_64 images, when trying to ssh into them what is username/password?20:46
mordred"clarkb | fungi: we have slaves!"20:46
mordredclarkb, fungi: does that mean nodepool static slave replacement?20:46
fungimordred: if only it was meant the way you read it20:46
fungimordred: though yes, we do20:46
fungimordred: there are already several infra jobs dogfooding on the nodepool bare slaves20:47
*** melwitt has quit IRC20:47
fungimordred: but we were talking about nodepool dev slaves on jenkins-dev20:47
jeblairmordred: fungi and clarkb are working on jenkins-dev; we are using nodepool slaves for some infra jobs, but not generally yet20:47
mordredneat20:48
* mordred is very behind - but thinks everyone is great20:48
jeblairclarkb, fungi, mordred: fyi the zuul promote command waits for the queue to be completely reset before returning; that means it can take a while.20:49
fungijeblair: noted20:49
mordredjeblair: thanks20:49
mordredjeblair: also, baller command20:49
*** talluri has quit IRC20:50
openstackgerritMichael Still proposed a change to openstack-infra/config: Add project configuration.  https://review.openstack.org/6208520:50
*** esker has joined #openstack-infra20:50
fungii did like "jump the queue", but shortened to just "jump" it lost a bit of its contextual meaning as a command line20:50
*** talluri has joined #openstack-infra20:50
jeblair6 minutes in this case20:50
dkranzclarkb: Sorry, was away. I think your logic was fine and I didn't change it. But unlike previous attempts I tried to run the code locally and got syntax errors that I could not figure out.20:51
jeblairfungi: yeah, promote was the only one that read correctly to me as a direct object20:51
*** vkozhukalov has joined #openstack-infra20:51
fungiwfm20:51
fungimore important is that it does what we want, which it seems to20:51
dkranzclarkb: So I pushed the same logic using the subset of bash I actually understand.20:51
fungijeblair: clarkb: presumably we should be using a modified nodepool node name for the slaves added to jenkins-dev so we can tell them apart in a nova list more easily?20:52
dkranzclarkb: This is an important change so at this point I suggest accepting what I pushed if it is correct, or some one who really knows bash just take over the patch.20:52
*** melwitt has joined #openstack-infra20:53
jeblairfungi: it would be nice, though that affects jjb and zuul config.  not sure the right answer, but i won't be upset if we accidentally delete a dev slave.20:54
fungijeblair: okay, i won't worry too much about it for now20:54
*** talluri has quit IRC20:54
fungiand yeah, the stability of these slaves is beneath concern20:55
*** harlowja has joined #openstack-infra20:56
*** sdake_ is now known as sdake-OOO20:56
*** sdake is now known as sdake-OOO220:57
*** dolphm has quit IRC20:59
*** zehicle_at_dell has quit IRC20:59
jeblairwe should really get rid of gate-noop before going to all-dynamic slaves21:00
dkehn clarkb: quickq: with the cirros-0.3.1-x86_64 images, when trying to ssh into them what is username/password?21:00
fungidkehn: clarkb is out to lunch, but the internets tell me that you can log in as the "cirros" user with a password of "cubswin"21:02
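For dkehn's question, a quick sketch using the credentials fungi quotes (the guest address is a placeholder):

    # log into the cirros guest as its default user; enter the password when prompted
    ssh cirros@203.0.113.10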
fungisomeone is obviously a chicagoan21:02
dkehnfungi: thxs21:02
funginp21:03
*** Ryan_Lane has joined #openstack-infra21:04
*** jcooley_ has joined #openstack-infra21:05
*** AaronGr is now known as AaronGr_afk21:13
*** mfer_ has joined #openstack-infra21:15
*** mfer has quit IRC21:16
*** mfer_ has quit IRC21:16
*** mfer has joined #openstack-infra21:16
*** mfer has quit IRC21:17
*** mfer has joined #openstack-infra21:17
*** ArxCruz has joined #openstack-infra21:18
*** mfer has quit IRC21:19
*** zehicle_at_dell has joined #openstack-infra21:20
*** mfer has joined #openstack-infra21:20
*** smarcet has left #openstack-infra21:20
*** mfer has quit IRC21:20
*** mfer has joined #openstack-infra21:21
clarkbI am back21:21
fungiclarkb: i hacked up a copy of trigger-job.py in my homedir on jenkins-dev and tried to use it to inject the parameters for https://jenkins01.openstack.org/job/gate-tempest-dsvm-full/2194/parameters/ (third window of the screen session there) but no luck, just sits and no slave gets assigned. what bits may i be missing?21:22
*** mfer has quit IRC21:22
*** mfer has joined #openstack-infra21:23
clarkbfungi: I don't think jenkins-dev knows about that job21:23
fungioh, hurr21:23
clarkbhttps://jenkins-dev.openstack.org/job/gate-tempest-devstack-vm-full/ is the job it knows about21:23
fungiyeah21:24
fungii guess we need to rerun jjb on it?21:24
clarkbfungi: maybe21:25
fungior i can just sub out the job name21:25
fungitrying that first21:25
clarkbok21:25
fungiaha, node labels21:27
*** syerrapragada has joined #openstack-infra21:28
*** changbl_ has quit IRC21:29
*** changbl has joined #openstack-infra21:29
*** ^demon|away is now known as ^d21:29
dkranzclarkb: Did you see my comments above?21:30
*** alcabrera has quit IRC21:30
fungiclarkb: well, no dice. i changed that job to look for devstack-precise (which matches the label on those nodes) rather than dev-devstack-precise, then retried to trigger the job, but still not much going on21:31
*** gyee has joined #openstack-infra21:31
*** ArxCruz has quit IRC21:31
fungii wonder if the parameter list for the job needs to match the parameters i'm passing with the script now :/21:32
*** vkozhukalov has quit IRC21:33
clarkboh could be21:33
clarkbfungi: also is gearman hooked up properly/21:34
fungiooh, good question21:34
* fungi checks the plugin21:34
*** jasond has joined #openstack-infra21:35
fungiinstalled, though a couple of revs behind21:35
jasondis something wrong with the gate jobs?  this seems to be stuck https://review.openstack.org/#/c/59851/21:35
fungiclarkb: we should probably update that anyway from a proper new-jenkins testing perspective21:36
*** esker has quit IRC21:37
fungijasond: stuck how? i see it being tested in the gate pipeline on http://status.openstack.org/zuul/21:37
clarkbfungi: ++21:37
fungiclarkb: updating it now21:38
jasondfungi: it still says "Need Verified" after a reverify 5 hours ago.  so it's working as expected?21:39
fungijasond: yes, that means it's in the gate for testing. there are 17 changes still ahead of it by my count21:41
*** jcooley_ has quit IRC21:41
jasondfungi: oh ok.  thanks for checking21:41
fungidepending on how many of those fail, could still be a while21:41
fungisdague: the theory that https://review.openstack.org/61428 would address ssh timeouts seems to be debunked. after being promoted to the head of the gate it failed on gate-tempest-dsvm-neutron on bug 125389621:44
uvirtbotLaunchpad bug 1253896 in tempest "Attempts to verify guests are running via SSH fails. SSH connection to guest does not work." [Critical,Confirmed] https://launchpad.net/bugs/125389621:44
jasondfungi: i noticed that jenkins' vote has been removed since the last reverify.  do i need to recheck again?21:44
fungijasond: that gets removed automatically, until gate testing concludes21:44
fungithen it will either get a green checkmark or a red x in that column21:45
jasondfungi: okay, thanks21:45
openstackgerritElizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add 2 new ci publication branches to gerritbot  https://review.openstack.org/6209521:46
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Process logs with CRM114  https://review.openstack.org/6209621:47
jeblairclarkb, fungi, mordred: ^21:47
anteayafungi: markmcclain has another candidate21:48
jeblairclarkb, fungi, mordred: crm114 is a fun language.  :)21:48
anteayafungi: he is in a meeting right now and then will address it21:49
fungijeblair: i expect to set aside some time this weekend to revel in it21:49
fungianteaya: thanks for the update. i was mainly just passing along the result21:49
anteayayeah21:49
*** AaronGr_afk is now known as AaronGr21:49
jeblair"Because the commonest use of LIAF is in iteration, LIAF means Loop Iterate Awaiting Failure.   If that's too hard to remember, just pretend that LIAF is FAIL spelled backwards."21:49
anteayayou are correct if it failed on the bug, it is highly unlikely it is the fix for it21:50
fungiheh21:50
clarkbjeblair: is that from a how to doc?21:50
*** harlowja has quit IRC21:50
clarkbfungi: where are you running the gearman client?21:50
fungiclarkb: locally on jenkins-dev... should i not?21:51
clarkbfungi: I just did a netstat -ln and don't see a port 4730 listening. is zuul-dev still a thing? I bet that is where we need to run it21:51
fungiclarkb: should be on 127.0.0.121:51
clarkbfungi: right I think zuul-dev is running the gearman server that jenkins-dev is connected to21:52
fungiclarkb: but was just going to surmise we might need one. i believe the jenkins-gearman plugin is going to refuse to activate if it can't connect to a gearman server21:52
jeblairclarkb: it's from a 283 page non-free book.  :(21:52
*** jcooley_ has joined #openstack-infra21:52
clarkbfungi: yup looks like zuul-dev. I would give your command a shot there21:52
fungiclarkb: aha. zuul-dev *does* exist. i'll try there21:52
fungii just found it as well21:53
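A couple of quick checks along the lines clarkb describes, to confirm geard is listening on zuul-dev and what has registered with it (hosts are as discussed above; "workers" is a standard gearman admin-protocol command, so its availability here is an assumption):

    # on zuul-dev: is the gear server listening?
    netstat -lnt | grep 4730
    # list connected workers and the functions they advertise
    echo workers | nc localhost 4730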
clarkbjeblair: you'll just have to explain everything then :)21:53
*** SergeyLukjanov_ has quit IRC21:53
jeblairclarkb: it's distributed with the project.  i dunno what the licensing deal is with the book.  fortunately, the software is clear.  ;)21:54
*** vkozhukalov has joined #openstack-infra21:54
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Process logs with CRM114  https://review.openstack.org/6209621:54
jeblairrequisite pep8 fix ^21:55
clarkbjeblair: oh I see, the book is available just not free21:55
fungiclarkb: oh, after the jenkins-dev restart, nodepool deleted those slaves so it'll be a bit before new ones are enrolled21:56
*** jcooley_ has quit IRC21:57
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Process logs with CRM114  https://review.openstack.org/6209621:57
clarkbfungi: was it supposed to delete them?21:57
fungiclarkb: dunno, but the age on them is about right21:57
*** CaptTofu has joined #openstack-infra21:58
clarkbfungi: that seems odd to me, but can probably be ignored for now21:58
fungiaha, i think it may be having trouble reconnecting to the gearman plugin on jenkins-dev21:58
anteayamarkmcclain feels that this patch: https://review.openstack.org/#/c/62098/ (a reversion of an rpc patch) may address bug 125389621:58
uvirtbotLaunchpad bug 1253896 in tempest "Attempts to verify guests are running via SSH fails. SSH connection to guest does not work." [Critical,Confirmed] https://launchpad.net/bugs/125389621:58
anteayaany chance of it getting a priority in the check queue?21:59
fungiclarkb: anyway, i'm being paged to go out to dinner now, so i'll have to continue this once i return21:59
fungibut i'll restart the nodepoold first21:59
*** mfer has quit IRC21:59
*** harlowja has joined #openstack-infra22:00
openstackgerritElizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add 2 new ci publication branches to gerritbot  https://review.openstack.org/6209522:00
pleia2sneaky whitespace22:00
*** sarob has joined #openstack-infra22:00
clarkbfungi: ok, I can try triggering the job by hand over on zuul-dev22:01
*** esker has joined #openstack-infra22:01
*** jcooley_ has joined #openstack-infra22:01
*** rcarrillocruz has quit IRC22:02
fungiclarkb: what i was *going* to run is... (in ~/zuul with . venv/bin/activate) ./tools/trigger-job.py --job gate-tempest-devstack-vm-full --project openstack/nova --pipeline gate --newrev 1436c1707a127dc82136b1046934c8a56b558a0a --refname refs/zuul/master/Z2d5c1f5108fa490a8971e381fd423a09 --logpath 28/61428/2/gate/gate-tempest-devstack-vm-full/7e1d10e,222:02
funginote that the zuul on zuul-dev is too old to have trigger-job.py22:02
fungiso it may also be too old to support it, not sure yet22:02
fungioh, and i just realized i didn't modify it to support all the additional parameters a gate job would want like i did the copy i was initially testing on jenkins-dev22:03
*** praneshp has quit IRC22:03
fungibut if you want it, it's in the same place in my homedir there22:03
jeblairfungi: trigger-job doesn't affect zuul, it goes straight to the worker22:03
clarkbfungi: ok thanks22:04
*** esker has quit IRC22:04
fungijeblair: right, okay then it should be fine22:04
* fungi vanishes22:04
openstackgerritA change was merged to openstack-infra/statusbot: Set world-readable permissions on alert file  https://review.openstack.org/6158822:05
jeblairclarkb: let me know if you need anything22:06
clarkbjeblair: will do, btw looking at the crm114 change I like how simple the actual mechanics of it are. Next step is CRM114 as a service? :)22:07
*** resker has joined #openstack-infra22:07
clarkbcurrently waiting for nodepool to spin up two slaves that I can trigger jobs against22:07
clarkbit is running a job \o/22:08
clarkbI didn't have to do anything22:08
jeblairclarkb: heh :) note there's a level there too -- we can disable the filter by removing it from the config yaml22:08
jeblairs/level/lever/22:08
*** praneshp has joined #openstack-infra22:08
clarkbslave 19 is running a devstack job22:09
hemanth_Hi, can anyone help me with this? http://logs.openstack.org/14/59814/8/check/gate-tempest-dsvm-neutron-large-ops/ee6bfe0/console.html22:09
hemanth_not really sure what that means22:09
clarkbhemanth_: http://logs.openstack.org/14/59814/8/check/gate-tempest-dsvm-neutron-large-ops/ee6bfe0/logs/screen-g-api.txt.gz notice in the console log it was attempting to start glance when it failed22:10
*** thomasem has quit IRC22:11
hemanth_clarkb: oops, thanks so much for pointing it!22:13
clarkbjeblair: https://jenkins-dev.openstack.org/job/gate-tempest-devstack-vm-full/7896/console do we actually expect those jobs to run successfully? I think it may be too old22:14
clarkbjeblair: but the job did run and nodepool put the slave into the delete state22:14
clarkband removed it from jenkins22:14
*** openstackstatus has quit IRC22:14
*** openstackstatus_ has joined #openstack-infra22:14
clarkbslave is now completely gone from jenkins-dev22:14
*** openstackstatus_ is now known as openstackstatus22:14
*** resker has quit IRC22:14
jeblairclarkb: yeah, i think that ip might be an old machine i was running22:15
jeblairclarkb: long gone.  so yeah, i wouldn't worry about the jobs themselves, just the mechanics around them.22:15
clarkbjeblair: yeah the SCP thing doesn't bother me22:15
*** prad has quit IRC22:15
clarkbdevstack stopping so quickly does bother me a bit22:15
*** openstackstatus has quit IRC22:15
*** openstackstatus_ has joined #openstack-infra22:15
*** openstackstatus_ is now known as openstackstatus22:16
clarkbjeblair: anything else you think we should look at before planning some rolling upgrades?22:16
jeblairclarkb: it probably tried to fetch a zuul ref from prod22:16
jeblairclarkb: (new ZUUL_URL feature could help with that)22:16
clarkbold zuul refs maybe22:16
*** harlowja has quit IRC22:16
clarkboh from review.o.o?22:16
jeblairclarkb: no i mean i think the jobs are the same jobs as in production, so it tried to fetch from zuul.o.o not zuul-dev.o.o22:16
clarkboh right22:17
*** openstackstatus has quit IRC22:17
*** openstackstatus_ has joined #openstack-infra22:17
*** openstackstatus_ is now known as openstackstatus22:17
jeblairi'll go fix statusbot22:17
*** openstackstatus has quit IRC22:17
*** openstackstatus has joined #openstack-infra22:18
*** openstackstatus has quit IRC22:18
zaroclarkb: https://issues.jenkins-ci.org/browse/JENKINS-2100622:18
*** openstackstatus has joined #openstack-infra22:18
jeblairzaro: neat, thanks22:19
jeblairclarkb: do a jjb run?  delete at least one job from the cache so it does something..22:20
jeblairclarkb: other than that, the only thing i would think is burn-in -- do we want to leave it running for a few days to see if it leaks or explodes with nodepool annoying it all the time?22:22
clarkbjeblair: we can22:22
clarkbjeblair: looks like you ran JJB by hand on jenkins-dev. doing that now22:23
*** harlowja has joined #openstack-infra22:24
clarkbjeblair: we don't have jjb running periodically out of a system location there, so I won't worry about cache and just apply all the jobs22:24
jeblairk22:25
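A sketch of that sort of manual JJB run, with assumed paths (the config file, cache directory and job-definition location on jenkins-dev may differ):

    # drop the local cache so every job definition gets pushed, then apply them all
    rm -rf ~/.cache/jenkins_jobs
    jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update /etc/jenkins_jobs/config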
*** jcooley_ has quit IRC22:25
*** esker has joined #openstack-infra22:25
jeblairclarkb: if you want to start with the rolling upgrades without burning in on -dev, that's fine.  we do have 2 masters.22:25
*** AlexF has joined #openstack-infra22:25
*** jerryz has quit IRC22:25
clarkbjeblair: part of me wants to, the other part of me realizes the weekend is near22:25
*** harlowja has quit IRC22:27
openstackgerritA change was merged to openstack-infra/config: Fix serving alert json file on eavesdrop  https://review.openstack.org/6159322:28
*** harlowja has joined #openstack-infra22:28
openstackgerritA change was merged to openstack-infra/config: Don't re-exec in check-dg-tempest-dsvm-full  https://review.openstack.org/6156922:29
*** resker has joined #openstack-infra22:32
*** esker_ has joined #openstack-infra22:33
*** esker has quit IRC22:34
clarkbJJB is creating a bunch of jobs, seems to be happy22:35
clarkbjeblair: maybe we upgrade jenkins.o.o today then do 01 and 02 monday?22:35
*** vkozhukalov has quit IRC22:35
clarkbthat will give us a bit more burn in on less active machines22:35
*** denis_makogon_ has joined #openstack-infra22:35
jeblairclarkb: i'd rather do jenkins.o.o last since it's not HA22:35
clarkboh good point22:35
clarkbreverting is relatively easy, I am very tempted to go ahead and try 0122:36
*** resker has quit IRC22:37
*** AlexF has quit IRC22:37
*** dangers is now known as danger_fo_away22:39
*** AlexF has joined #openstack-infra22:40
*** weshay has quit IRC22:40
*** jasond has quit IRC22:42
*** paul-- has joined #openstack-infra22:42
*** ryanpetrello has quit IRC22:44
clarkbjeblair: JJB seems to have been fine, no apparent errors22:48
jeblairclarkb: cool22:48
clarkbjeblair: how do you feel about upgrading 01 or 02 today? My only concern is I will be in CA early next week and may not have as much time to babysit then22:49
*** CaptTofu has quit IRC22:49
*** ^d has quit IRC22:50
*** CaptTofu has joined #openstack-infra22:50
*** esker_ has quit IRC22:50
jeblairclarkb: wfm22:50
clarkbok putting 01 in shutdown mode now22:52
*** rcleere has quit IRC22:54
*** esker has joined #openstack-infra22:57
*** dkliban has quit IRC22:58
jeblairclarkb: i'll be afk for a while, back in a bit22:58
*** bpokorny has quit IRC22:58
clarkbjeblair: ok ping me when you are back, hopefully 01 will be quiet by then23:01
*** mgagne has quit IRC23:03
*** sarob has quit IRC23:07
*** sarob has joined #openstack-infra23:08
*** sarob has quit IRC23:09
*** sarob has joined #openstack-infra23:09
*** datsun180b has quit IRC23:10
*** sarob has quit IRC23:11
*** sarob has joined #openstack-infra23:11
*** oubiwan__ has quit IRC23:12
*** sarob has quit IRC23:12
*** gyee has quit IRC23:13
*** rcarrillocruz1 has joined #openstack-infra23:13
*** sarob has joined #openstack-infra23:13
*** rcarrillocruz2 has joined #openstack-infra23:14
*** sarob has quit IRC23:15
*** rcarrillocruz2 has quit IRC23:15
*** sarob has joined #openstack-infra23:15
*** AlexF has quit IRC23:16
*** sarob has quit IRC23:16
*** rcarrillocruz1 has quit IRC23:17
*** sarob has joined #openstack-infra23:18
*** rnirmal has quit IRC23:20
*** fbo is now known as fbo_away23:20
*** sarob has quit IRC23:21
nikhil__hi23:21
clarkbhello23:21
*** sarob has joined #openstack-infra23:21
nikhil__hey clarkb23:21
nikhil__can you please help me figure out23:21
nikhil__if there's a typo in https://jenkins01.openstack.org/job/check-grenade-dsvm/2036/console ?23:22
nikhil__2013-12-13 22:51:35.733 | [ERROR] ./grenade.sh:263 Failure in upgrade-glancwe23:22
clarkblooks like it23:23
nikhil__that is one of the jenkins runs23:23
clarkbgit grep glancwe in the grenade repo will show you where23:23
clarkbright, but the typo is in grenade23:23
nikhil__oh, is that in the openstack-infra project?23:23
*** sarob has quit IRC23:24
clarkbno23:24
*** sarob_ has joined #openstack-infra23:24
clarkbit is an openstack-dev project like devstack23:24
nikhil__oh23:24
anteayanikhil__: http://git.openstack.org/cgit/openstack-dev/grenade/tree/23:24
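A minimal sketch of the search clarkb suggests, assuming a fresh local clone of the repo anteaya links:

    git clone https://git.openstack.org/openstack-dev/grenade
    cd grenade
    # show every line containing the misspelling, with file name and line number
    git grep -n glancwe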
clarkbjeblair: fungi: jenkins01 will be idle any minute now. Let me know when at least one of you is around. Though I may go ahead and upgrade jenkins01 if I don't hear from you guys in a bit just for the sake of time23:25
nikhil__thanks clarkb anteaya , checking it out now23:25
*** sarob_ has quit IRC23:27
*** blamar has quit IRC23:28
*** sarob has joined #openstack-infra23:29
*** sarob has quit IRC23:30
*** sarob has joined #openstack-infra23:31
reedsarob, to create a new list https://wiki.openstack.org/wiki/Community#Mailing_lists_in_local_languages23:32
*** sarob has quit IRC23:32
jeblairclarkb: re23:33
clarkbjeblair: there is one job on 01 currently running on hpcloud region b. I think it has a couple more minutes23:34
*** sarob has joined #openstack-infra23:34
*** sarob has quit IRC23:35
*** praneshp has quit IRC23:36
*** sarob has joined #openstack-infra23:36
fungiokay, back... checking scrollback to see where we are23:37
clarkbfungi: jenkins-dev seemed happy with nodepool and jjb so I have put jenkins01 in shutdown mode, waiting on one job there before upgrading23:37
clarkbfungi: I will be in CA early next week so figured doing this now was beneficial despite being friday23:38
fungiyep, great!23:38
fungiso once jenkins-dev's nodepool built new slaves it picked up on the corrected node labels i guess?23:38
clarkbfungi: I guess, because the jobs started running23:38
fungiwondering whether the jenkins-gearman plugin upgrade had anything to do with tat23:38
fungithat23:38
*** praneshp has joined #openstack-infra23:39
clarkbpossibly, maybe it couldn't handle the job data being sent previously23:39
fungiso you didn't actually have to manually trigger any jobs at all i guess. too awesome23:39
fungiinterestingly, jenkins-dev has one devstack slave which is already marked offline but is running a tempest job. slightly odd...23:40
clarkbfungi: the nodes get marked offline when they start the jobs23:40
*** sarob has quit IRC23:41
fungihowever, it also thinks that tempest job should only take a total of ~2 minutes23:41
*** sarob has joined #openstack-infra23:41
clarkbfungi: yeah the job is failing, jeblair thinks it is trying to clone zuul refs from zuul.o.o and not zuul-dev23:41
fungiahh, upload timeouts23:41
clarkbbut the mechanics of add node, delete node, seem fine23:41
clarkbfungi: upload timeouts are because jeblair killed the scp endpoint23:41
fungiit probably is trying to clone from zuul.o.o23:42
*** sarob has quit IRC23:42
fungizuul-dev has too old of a zuul to pass the ZUUL_URL parameter23:42
clarkbthis regionb slave is taking forever23:43
*** sarob has joined #openstack-infra23:43
clarkbalmost tempted to kill a job and leave a comment on the change apologizing23:44
*** sarob has quit IRC23:45
fungiclarkb: assuming it's https://jenkins01.openstack.org/job/check-tempest-dsvm-full/2367/ the change already failed another dsvm job anyway23:45
clarkbthats the one23:45
clarkbok I will just manually kill it23:45
*** sarob has joined #openstack-infra23:45
clarkbfungi: want to leave the comment?23:45
fungiit failed the postgres-full so it's getting a -1 from check regardless23:45
fungisure23:45
funginova devs have grown a thick skin, i think ;)23:46
clarkbgoing to give nodepool a minute or so to try and cleanup that node23:46
fungioh, and it's rustlebee's change anyway ;)23:46
*** sarob has quit IRC23:46
clarkblet me know when you are ready for me to stop jenkins, do the upgrade and start it again23:47
fungii should be nice to him, he did approve vulnerability fixes for me yesterday, after all23:47
fungiclarkb: go for it23:47
clarkbdoing it now23:47
clarkbit is starting23:48
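A rough sketch of the stop/upgrade/start cycle being run here; the package name, service commands and CLI invocation are assumptions rather than the exact procedure used:

    # put the master in shutdown mode and let running jobs drain
    java -jar jenkins-cli.jar -s https://jenkins01.openstack.org/ quiet-down
    # once idle: stop the service, upgrade the package, start it back up
    sudo service jenkins stop
    sudo apt-get update && sudo apt-get install jenkins
    sudo service jenkins start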
*** sarob has joined #openstack-infra23:48
*** sarob has quit IRC23:49
*** sarob has joined #openstack-infra23:50
clarkbaccording to zuul it is running jobs, still waiting on the gui though23:50
sdaguehmmm... it looks like the only place we are finding new errors in logs is in grenade, which wasn't quite the intent of that job.23:50
*** flaper87 is now known as flaper87|afk23:50
*** denis_makogon_ has quit IRC23:51
sdagueI think it might be worth turning that off - https://review.openstack.org/#/c/62107/23:51
*** esker has quit IRC23:52
fungisdague: makes sense23:52
*** esker has joined #openstack-infra23:53
fungierror checks against stable, particularly, are going to be myriad until the icehouse release, i expect23:53
fungiclarkb: jenkins01 looks happytimes23:54
jeblairooh neat you can collapse the executor status box23:54
clarkbfungi: yup seems to be doing its job23:54
*** sarob has quit IRC23:54
jeblair"master + 115 computers (7 of 8 executors)"23:54
jeblairno idea what "7 of 8 executors" means.23:55
* fungi nods. +/- glyphs23:55
fungidunno, but it's fancified23:55
clarkbjeblair: now, do you want to let 01 burn in?23:55
*** sarob has joined #openstack-infra23:56
jeblairclarkb: yeah, i kinda do.  see what it looks like after a few hours/days of thrashing23:56
clarkbwfm23:56
fungialso, once we're done upgrading 01 and 02 we should not forget poor jenkins.o.o23:56
fungibut the weekend (or at least a night of churning through the gate) should give us some idea23:56
jeblairclarkb: hopefully if something does go wrong, 02 will continue to keep things going23:56
clarkbI will do my best to make time to check in and help upgrade the others on Monday23:57
*** esker has quit IRC23:57
fungiclarkb: where in ca (also, is that the state code or country code)?23:57
clarkbfungi: the state23:58
fungii need to know whether to send sheriffs or mounties23:58
clarkbI will be in sunnyvale23:58
jeblairthere are only 536 threads on 01 compared to 1,869 on 0223:58
clarkbjeblair: nice23:58
fungioh, sounds work-related. apologies23:58
clarkbfungi: it is! but it is work related in a good way23:58
fungiget zaro a fresh laptop as a souvenir23:58
clarkbShould have lots of time to sit with AaronGr and go over all the things23:59
funginice23:59
AaronGrclarkb: exciting.23:59
