Monday, 2014-01-20

sdaguefungi: hey, so what just happened with nodepool, just saw a huge drop in nodes00:03
fungisdague: i'm restarting it to try the aggressive-delete patch and see if that gets us back some of those deleted nodes faster00:03
sdaguecool00:04
sdague+100:04
fungibut it has to quiesce node creation/deletion activity before a graceful restart00:04
fungialmost there00:04
lifelessfungi: ctrl-C :P00:04
*** jhesketh_ has joined #openstack-infra00:05
sdaguefungi: once you are done, promoting - 67480 is probably a good idea. It will help give us a console on some of the network tests that are racing00:09
fungiwill do00:10
*** salv-orlando has quit IRC00:11
*** dcramer_ has quit IRC00:12
sdaguethanks00:12
openstackgerritlifeless proposed a change to openstack-infra/config: Clamp MTU in TripleO instances  https://review.openstack.org/6774000:14
openstackgerritlifeless proposed a change to openstack-infra/config: Update the geard server for tripleo-gate.  https://review.openstack.org/6768000:14
openstackgerritlifeless proposed a change to openstack-infra/config: Configure eth1 for DHCP in tripleo-gate instances  https://review.openstack.org/6726000:14
fungisdague: the promote did interesting things to the enqueued times for a few changes in the gate00:15
lifelessyay00:21
lifeless| 3e251d4c-377d-4fa4-9b6a-4eff78f86cd7 | precise-1390175363.template.openstack.org | ACTIVE | image_pending_upload | Running     | default-net=10.0.0.7, 138.35.77.21; tripleo-bm-test=192.168.1.12 |00:21
lifelesssignificant progress00:21
lifelessnow if I can just get someone to take all my patches ;)00:21
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Don't load system host keys.  https://review.openstack.org/6773800:23
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Ignore vim editor backup and swap files.  https://review.openstack.org/6765100:23
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Only attempt to copy files when bootstrapping.  https://review.openstack.org/6767800:23
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Document that fake.yaml isn't usable.  https://review.openstack.org/6767900:23
lifelessand -woo- | 02394c2d-a200-4e9b-83a4-ca2d87b411f1 | precise-ci-overcloud-1.slave.openstack.org | BUILD  | spawning   | NOSTATE     |00:24
lifelessci-overcloud open for business, minions!00:24
fungibooyah00:25
lifelessfungi: so with all my patches applied, it should be good. We're now blocked on this :(00:29
lifelessfungi: how can I move it forward?00:29
mattoliverauBeen reading a weekend's (including US friday) worth of scroll back. Looks like the new new new zuul migration worked out well. glad to hear it!00:29
fungilifeless: use toothpicks to hold clarkb's eyelids open while he reviews all of it ;)00:30
lifelessclarkb: Hi. You need toothpicks?00:30
lifelessfungi: can mordred review this, if I can distract him sufficiently?00:31
fungimattoliverau: well enough. the stumbling blocks we did hit counted as learning experiences/bugs worth fixing00:31
fungilifeless: probably. i can too if i free myself up sufficiently, but there's a lot of other stuff we all need to review too00:31
lifelessfungi: I know :(00:32
sdaguefungi: it did?00:34
lifelessmordred: it would be a great help if you could review everything from me in infra/config and infra/nodepool00:35
sdaguehmmm... yeh, so it definitely reset some of those00:35
sdagueinteresting00:35
fungisdague: i bet it's gerrit dependencies00:35
fungilook at the pattern00:36
sdagueyeh, could be00:36
sdagueso the items only include the roots?00:36
fungiseems to always be items reset immediately following other items from the same project. zuul looks for and pulls in any approved dependencies, so however that's being accomplished may be creating new objects00:37
fungirather than reusing the existing ones00:37
sdaguewell, it's recreating all of them00:37
sdaguemy patch just provided a way of setting the enqueue time00:38
sdaguebut I guess the children are a little different00:38
fungigot it. so i guess that code path isn't hit for dependent changes00:38
*** DennyZhang has joined #openstack-infra00:39
sdagueyep00:39
sdagueso I think once this lands - https://review.openstack.org/#/c/67739/ stable/havana will work again00:39
sdagueat least that's the current blocker00:39
*** sarob has joined #openstack-infra00:43
mordredlifeless: what?00:49
lifelessmordred: I've kindof patchbombed a bunch of stuff to get tripleo-ci functional (the infra/config and nodepool bits we need)00:49
mordredlifeless: yeah - I saw that - I'll go read00:50
mordredonce those land, you believe that ci-overcloud is good for business?00:50
mordredlifeless: also, how much capacity does it have? can we also use it for normal gate nodes?00:50
lifelessmordred: derekh and I have been pushing hard on actually, you know, having it all work and we're now (running manually) in end to end fine tuning00:50
mordred:)00:50
lifelessmordred: so status with these patches:00:50
lifeless - we should be able to run 24 cores of jenkins slaves00:51
lifeless - and right now uhm 10 test environments00:51
fungi24 cores of jenkins slaves meaning 6 slaves?00:51
fungi(at 4x vcpu each)00:51
lifelessfungi: dunno, depends on the size we choose. remember it's not running devstack-gate00:51
fungiahh, yeah00:52
mordredright. I was just asking about its capacity for also running d-g - mainly because I'm asking everyone that right now. the answer can be "nope"00:52
lifelesswe have another 40+ machines we can start scaling up into00:52
lifelessplus the RH cloud coming along00:52
lifelessI'm trying to highlight that 'good for business' is nuanced :)00:53
lifelessthe silent queue runs on everything but doesn't vote, right ?00:53
fungilifeless: the silent queue doesn't report to the change at all, just uploads logs and sends stats to graphite00:54
lifelessfungi: is there something that reports but won't vote ?00:54
fungilifeless: use a filter in the jobs section of layout.yaml to set voting: false on a job or job name pattern00:55
lifelessoh right00:55
fungisame place you filter which jobs run on what branch name patterns00:55
fungithen it will report back to the change, but its result won't be taken into account for the verify score00:56
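A minimal sketch of what fungi describes, parsing a hypothetical layout.yaml fragment in Python just to show the shape of a non-voting job filter (the branch regex is illustrative; gate-tripleo-deploy is the job made non-voting a little further down):

```python
# Hypothetical zuul layout.yaml fragment: jobs matched here still report
# back to the change, but a "voting: false" entry is left out of the
# verify score. Parsed with PyYAML only to show the structure.
import yaml

LAYOUT_FRAGMENT = """
jobs:
  - name: gate-tripleo-deploy
    voting: false
  - name: ^gate-tempest-dsvm-.*$
    branch: ^(master|stable/havana)$
"""

layout = yaml.safe_load(LAYOUT_FRAGMENT)
for job in layout["jobs"]:
    # Jobs vote unless the filter explicitly turns voting off.
    print(job["name"], "voting:", job.get("voting", True))
```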
lifelessmordred: oh, running actual devstack-gate jobs, the ones rh and hp run today?00:58
lifelessmordred: I think we should layer that in only after everything else is working00:59
mordredkk00:59
lifelessmordred: not so much a capacity issue (though there is that) but rather what benefit we get00:59
lifelessd-g is running elsewhere00:59
lifelesstripleo-gate isn't00:59
mordredd-g is running elsewhere, but the gate is under pretty massive duress atm00:59
lifelessonce tripleo-gate is running and heading up the path to being a symmetric gate with everything else01:00
mordredalthough maybe rax will bump our quota01:00
mordredlifeless: ++01:00
lifelessthen adding more d-g nodes in excess capacity would be a great thing to do01:00
openstackgerritlifeless proposed a change to openstack-infra/config: Don't vote with gate-tripleo-deploy yet.  https://review.openstack.org/6774301:03
mordredlifeless: ok. your config changes are +2/+A - they all seem pretty directly only touching tripleo at the moment01:14
lifelessmordred: yeah, we're not in the collective gate yet01:15
*** sarob has quit IRC01:17
*** DennyZhang has quit IRC01:17
*** sarob has joined #openstack-infra01:20
fungisdague: the promoted change failed on a tempest test with "The resource could not be found."01:22
*** nosnos has joined #openstack-infra01:24
sdaguebummer, link?01:27
jog0so it looks like the console logs are still not in elasticSearch is that correct01:28
fungisdague: https://jenkins04.openstack.org/job/gate-tempest-dsvm-full/3488/01:28
jog0fungi: ^01:28
fungijog0: they should be for any jobs run through jenkins01 and jenkins0201:28
mordredjog0: we're rolling out the new plugin version one jenkins at a time01:29
mordredbtw - the fact that we have 5 jenkins masters is still kinda amazing to me01:29
jog0mordred: yeah last I checked it was 301:30
*** Guest52195 is now known as maelfius01:30
jog0ahh, I'll manually check for jenkins01 logs in elasticSearch01:30
*** maelfius is now known as Guest6258501:31
jog0this missing data means we are running partially blind in elastic-search01:31
mordredjog0: thats what you'll get for being sick for a period of time01:31
mordredyup01:31
mordredjog0: sdague was talking about that earlier01:31
fungijog0: i think dims has a patch proposed for adding the name of the jenkins master as a metadata field so it can be searched/summarized01:31
jog0fungi: yeah that will really help01:32
*** Guest62585 is now known as needscoffee01:32
*** needscoffee has joined #openstack-infra01:32
*** needscoffee is now known as morganfainberg01:32
*** morganfainberg is now known as morganfainberg|z01:32
jog0touchdown seahawks01:32
*** morganfainberg|z is now known as morganfainberg01:32
mordredjog0: that was a RUN01:33
mordredclarkb: are you at the stadium?01:33
openstackgerritA change was merged to openstack-infra/nodepool: Permit specifying instance networks to use.  https://review.openstack.org/6639401:33
*** cyeoh has quit IRC01:33
sdaguefungi: sigh, yeh, that's unrelated. It was on my monday fix list01:34
openstackgerritA change was merged to openstack-infra/nodepool: Permit using a known keypair when bootstrapping.  https://review.openstack.org/6764901:34
openstackgerritA change was merged to openstack-infra/nodepool: Add some debugging around image checking.  https://review.openstack.org/6765001:34
openstackgerritA change was merged to openstack-infra/nodepool: Only attempt to copy files when bootstrapping.  https://review.openstack.org/6767801:34
openstackgerritA change was merged to openstack-infra/nodepool: Document that fake.yaml isn't usable.  https://review.openstack.org/6767901:34
openstackgerritA change was merged to openstack-infra/nodepool: Don't load system host keys.  https://review.openstack.org/6773801:34
*** dcramer_ has joined #openstack-infra01:34
openstackgerritA change was merged to openstack-infra/nodepool: Ignore vim editor backup and swap files.  https://review.openstack.org/6765101:34
jog0fungi: confirmed that jenkins01 logs are in elasticSearch01:34
jog0at least for a passing job01:35
fungijog0: and for jenkins02 that should be the case as well, as of about 6 hours ago (rough estimate)01:35
lifelessfungi: where is your branch updating the nodepool definition for ci-overcloud ? I have tweaks01:35
jog0fungi: cool01:35
*** sarob has quit IRC01:35
fungilifeless: https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:tripleo-ci,n,z01:36
fungiit's really just https://review.openstack.org/66491 though01:36
openstackgerritlifeless proposed a change to openstack-infra/config: Update TripleO Cloud API endpoint for Nodepool  https://review.openstack.org/6649101:39
lifelessmordred: ^ needed too01:39
lifelessthen I think we can turn it on and start debugging the actual test scripts01:39
openstackgerritA change was merged to openstack-infra/config: Improve tripleo nodepool image build efficiency.  https://review.openstack.org/6725501:43
lifelessI think then I need to look into how all the zuul ref stuff works so that we can make sure we run the code being merged not the code in trunk01:43
openstackgerritA change was merged to openstack-infra/config: Configure eth1 for DHCP in tripleo-gate instances  https://review.openstack.org/6726001:43
openstackgerritA change was merged to openstack-infra/config: Update the geard server for tripleo-gate.  https://review.openstack.org/6768001:44
sdaguefungi: so yah, that's the big giant stack trace in pci01:45
mordredlifeless: it's actually pretty straightforward - zuul sends you a refspec and you use that01:46
lifelessmordred: yeah, I know but ...01:46
lifelessmordred: we need to translate that to our various refs01:46
lifelessetc01:46
lifelessits not that its hard, its that we need to do it01:46
mordredlifeless: wait - what do you mean by "our various refs" ?01:48
mordredwhy would your refs be different?01:48
lifelesswe have one set of variables - git url, branch, commitish - per source repository01:48
lifelesswe don't consult ZUUL_REF01:48
mordredwell, if you don't consult ZUUL_REF, you're going to have a very hard time getting the right commit01:49
lifelessthus my point01:49
lifelessjust like devstack doesn't consult ZUUL_REF but devstack_gate arranges it so things DTRT we need to do the same01:50
mordredI do not understand your southern hemisphere english01:50
*** dkranz has quit IRC01:51
StevenKmordred: You need to read it upside down01:54
mordredStevenK: DOH01:54
*** zhiwei has joined #openstack-infra01:54
lifelessmordred: anyhow, nvm - I know we have more to do, and I know how zuul works it, and I know our plumbing which you perhaps don't know as much as you could :)01:58
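For reference on the ZUUL_REF exchange above, a rough Python sketch (not devstack-gate's or tripleo's actual code) of what "consulting ZUUL_REF" amounts to: fetch the speculative ref zuul advertises from its merger and check it out, falling back to the plain branch when the project has no change in the queue item being tested.

```python
# Rough sketch of using the ZUUL_URL/ZUUL_REF variables zuul exports to a job.
import os
import subprocess

def checkout_for_test(repo_dir, project, branch="master"):
    zuul_url = os.environ.get("ZUUL_URL", "")  # zuul merger base URL
    zuul_ref = os.environ.get("ZUUL_REF", "")  # e.g. refs/zuul/master/Z...
    try:
        # Ask the zuul merger for the speculative state of this project...
        subprocess.check_call(
            ["git", "fetch", "%s/%s" % (zuul_url, project), zuul_ref],
            cwd=repo_dir)
        subprocess.check_call(["git", "checkout", "FETCH_HEAD"], cwd=repo_dir)
    except subprocess.CalledProcessError:
        # ...and fall back to the branch tip if this project has no change
        # in the item being tested (or no ZUUL_REF was provided at all).
        subprocess.check_call(["git", "checkout", branch], cwd=repo_dir)
```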
jog0given a failed job how do I know which jenkins server it ran on?02:00
jog0http://logs.openstack.org/92/64592/4/check/check-tempest-dsvm-neutron-isolated/c6cda8d/02:00
fungijog0: easiest way is to look at the hyperlink embedded in the first few lines to the slave hostname02:01
mordredlifeless: I usually hire a plumber to deal with plumbing issues...02:01
jog0fungi: oh nice02:02
jog0jenkins01, so this should get console logs02:02
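A quick sketch of the manual lookup fungi suggests: read the first lines of a job's console log and find the "Building remotely on ..." line, whose hyperlink names the slave (and thus the Jenkins master) that ran the job. The URL is the example from this conversation; treating console.html as plain text to grep is an assumption.

```python
# Sketch: print the line in a console log that names the slave the job ran on.
import urllib.request

CONSOLE_URL = ("http://logs.openstack.org/92/64592/4/check/"
               "check-tempest-dsvm-neutron-isolated/c6cda8d/console.html")

with urllib.request.urlopen(CONSOLE_URL) as resp:
    head = resp.read(8192).decode("utf-8", "replace")

for line in head.splitlines():
    if "Building remotely on" in line:
        print(line.strip())
        break
```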
lifelessmordred: you did02:03
mordredlifeless: that's what I'm saying02:04
fungisdague: i just noticed where the subway can use another color... changes cancelled because they depend on another change which is failing or hitting a merge conflict (right now those show up red)02:04
openstackgerritChangBo Guo proposed a change to openstack-dev/hacking: Add check for removed modules in Python 3  https://review.openstack.org/6104902:05
lifelessmordred: are you offering to do the work for tripleo?02:05
lifelessmordred: or am I just horribly confused02:05
*** pcrews has quit IRC02:05
mordredlifeless: let's go with confused02:06
*** senk has joined #openstack-infra02:07
sdaguefungi: sure02:08
mordredjog0: wow. that was a throw right there02:08
jog0mordred: didn't see it got distracted by work02:09
jog0but I did hear yelling from the bar down the street02:09
mordredjog0: oh my. it was a 50+ yard throw 4th down conversion for a TD02:09
*** senk has quit IRC02:09
jog0ouch02:10
jog0tie game02:10
mordredit was the type of throw which makes me worry for property damage in downtown sf02:10
mordredjog0: nope. seattle is in the lead by 3 now02:10
jog0ahh the online score is outdated02:10
jog0I was downtown on new years and it looked like something out of mad max02:11
mordredI try to not be in places like that02:11
mordredof course, we've been booking our stuff for mardi gras, so I'm actually full of shit :)02:11
sdaguejog0: you didn't manage to classify this one yet, did you - https://bugs.launchpad.net/nova/+bug/127068002:11
jog0I can't imagine what sf would do in this case02:11
jog0sdague: no sorting out some kinks on my e-r patch02:12
sdaguejog0: ok, cool, I just didn't want to dupe something you'd gotten02:13
*** sarob has joined #openstack-infra02:13
jog0sdague: btw I think it would be interesting to plot  commits to openstack/openstack and zuul gate queue02:13
sdaguejog0: sure. I'm trying to keep a balance between making the problem visible and fixing it02:14
sdaguebecause visibility is only seeming to work so much02:14
jog0sdague: heh yeah, well that would tell us if things are getting worse or better merge rate wise02:14
jog0so I don't know the answer to the following question: did concurrency=2 in gate make things better or worse02:15
fungiugh... https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/consoleText02:15
jog0and merge rate *may* shine a *little* insight into that02:15
fungifail on an hpcloud-az2 slave failing to connect via ipv6 to git.openstack.org. wtf?02:15
fungiBuilding remotely on devstack-precise-hpcloud-az2-1143800 [...] fatal: unable to connect to git.openstack.org: git.openstack.org[0: 2001:4800:7813:516:3bc3:d7f6:ff04:aacb]: errno=Network is unreachable02:16
fungiuh, yeah, hpcloud az2 has no ipv6. why did you try to use it?02:16
sdaguesweet fumble!02:16
* fungi is apparently missing some very enthralling sportball02:17
sdagueyes02:17
sdagueespecially if you don't like SF :)02:17
StevenKI am too, but australia only shows american football on pay TV02:17
lifelessfungi: you might have local ipv6 connectivity02:18
clarkbI am missing it :(02:18
fungilifeless: well, somehow that slave thought it had a global ipv6 address assigned02:18
*** yaguang has joined #openstack-infra02:18
clarkbsaw the kearse td. did seattle just recover a fumble?02:18
clarkbsdague ^02:18
*** sarob has quit IRC02:18
lifelessfungi: clearly it *did*02:18
lifelessfungi: just not a working one...02:19
fungiindeed02:19
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: add hit for bug 1270680  https://review.openstack.org/6775102:19
fungifreakish. first time i've seen an hp vm do that02:19
sdagueclarkb: yes02:19
sdaguejog0: can you look at that er fingerprint?02:19
lifelessoh wow we clone all of stackforge too...02:20
jog0sdague: looking02:20
lifelessI wonder if we made one mega git repo02:20
lifelessand sucked *everything* into it02:20
lifelessand then made branches it would be faster02:20
mordredlifeless: I've been meaning to get a grokmirror thing set up - the kernel guys say it helps02:24
jog0sdague: message:"TRACE nova.api.openstack"   AND message:"pci.py"  AND message:"InstanceNotFound: Instance"   AND filename:"logs/screen-n-api.txt"02:24
jog0message:"TRACE nova.api.openstack"  AND message:"InstanceNotFound: Instance"   AND filename:"logs/screen-n-api.txt"02:24
jog0those have very different hit counts02:24
*** senk has joined #openstack-infra02:24
sdaguethey do, it's not limited to that extension02:24
mordredclarkb: how are you MISSING the sportsball?02:24
sdagueat least from what I can tell02:24
mordredclarkb: it's one of teh best games of sports I've seen in a while02:24
openstackgerritChangBo Guo proposed a change to openstack-dev/hacking: Add check for removed modules in Python 3  https://review.openstack.org/6104902:25
clarkbmordred: I have friends that dont sports ball. about to be at a house party will try watching from there02:25
mordredclarkb: 8:33 left in the 4th02:25
sdaguejog0: I actually think this is one of the new ones that is biting us hard02:25
clarkbmordred we still winning02:26
clarkb?02:26
sdaguewow, worst handoff ever02:26
sdagueclarkb: yes, but refumbled02:26
mordredclarkb: yeah. but by 3 - and jsut lost it on downs02:26
jog0sdague: so this has rougly equal hits for FAILURE and SUCCESS02:26
jog0which is actually not a horrible query02:26
mordredclarkb: be VERY glad you didn't see the knee break though02:26
sdaguejog0: yes, you read the log message right02:26
sdagueeven on success, we are doing bad things, because we're going to be leaking resources02:26
sdagueas those success versions are on tempest compute deletes02:27
jog0agreed02:27
jog0so LGTM02:27
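For context, a minimal sketch of how a fingerprint like the ones pasted above can be checked against the logstash Elasticsearch index as a Lucene query_string. The endpoint URL and response shape are assumptions for illustration; this is not elastic-recheck's own code.

```python
# Count hits for an elastic-recheck style fingerprint with a query_string search.
import json
import urllib.request

FINGERPRINT = ('message:"TRACE nova.api.openstack" '
               'AND message:"InstanceNotFound: Instance" '
               'AND filename:"logs/screen-n-api.txt"')

body = json.dumps({
    "query": {"query_string": {"query": FINGERPRINT}},
    "size": 0,  # only the hit count matters for a fingerprint
}).encode()

req = urllib.request.Request(
    "http://logstash.openstack.org:9200/_search",  # assumed endpoint
    data=body, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print("hits:", json.load(resp)["hits"]["total"])
```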
*** nati_uen_ has quit IRC02:27
* jog0 +As sdague's patch02:27
sdaguewoot02:27
sdaguethis quarter is just ridiculous02:30
clarkbI need play by play :)02:30
openstackgerritA change was merged to openstack-infra/elastic-recheck: add hit for bug 1270680  https://review.openstack.org/6775102:31
StevenKclarkb: Aren't there apps for that?02:31
sdagueclarkb: seatle just intercepted02:31
mordredclarkb: seattle just intercepted again02:31
lifelessmordred: goingto +A too ? https://review.openstack.org/#/c/66491/ [it's passed everything except the yaml order check, which it doesn't affect]02:32
sdaguewhich means we just had: fumble SF, fumble (but not called) SEA, fumble (and self recovery) on 4th down by SEA, interception by SF02:33
sdaguein about 8 downs02:33
mordredyeah - fungi, you ok with https://review.openstack.org/#/c/66491/ going in?02:33
sdaguethe only thing that would make this better is snow :)02:34
mordredsdague: or a giant earthquake02:34
fungimordred: sure, it won't take effect automatically anyway because it's the reason puppet's still disabled on nodepool.o.o02:34
mordredfungi: k. awesome02:35
fungimordred: see https://review.openstack.org/66958 and accompanying bug 1269001 for details02:35
sdaguefungi: when you get a chance can you see if I goofed this up too badly - https://review.openstack.org/#/c/67591/02:35
sdaguethat will give us the uncategorized jobs list02:36
notmynamehow do I deal with the error on the 2nd job in the gate right now? logs: https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/console02:37
notmynameerror connecting to git02:37
openstackgerritA change was merged to openstack-infra/config: Clamp MTU in TripleO instances  https://review.openstack.org/6774002:38
notmynameif the top one fails will it stay in? or is it too late? any chance it can be retried right there so as not to wait another 40+ hours?02:38
*** senk has quit IRC02:38
openstackgerritA change was merged to openstack-infra/config: Don't vote with gate-tripleo-deploy yet.  https://review.openstack.org/6774302:38
notmynamepatch set 65604,302:39
funginotmyname: i'm stumped on that one--was looking at it earlier. hpcloud west doesn't provide global ipv6 to tenant networks, so why it thought it had one is a real enigma02:39
mordredclarkb: field-goal. seahawks up by 602:39
*** mrda has quit IRC02:39
notmynamefungi: any hope for it going in? looks like zuul already recalculated it so it has to go to the bottom of the queue with a manual reverify?02:39
funginotmyname: i think the very top of that diagram gets it wrong02:40
funginotmyname: if you look at https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/ it says Other changes tested concurrently with this change: 65255,102:40
notmynamefungi: so my only hope is that the top one fails?02:40
funginotmyname: yeah, if the change running ahead of it fails, it will be retested on the branch tip02:41
*** mrda has joined #openstack-infra02:42
sdaguefungi: so we've seen this creep up a couple times before - http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiMjAwMTo0ODAwOjc4MTM6NTE2OjNiYzM6ZDdmNjpmZjA0OmFhY2JcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiYWxsIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MDE4NTgyOTgxNX0=02:45
sdaguehttp://logs.openstack.org/19/65019/2/gate/gate-grenade-dsvm/d5c5219/console.html.gz02:46
sdaguefungi: so maybe we should register a bug and a fingerprint for it02:46
*** gokrokve has quit IRC02:48
sdagueclarkb: interception in the end zone02:50
mordredclarkb: INTERCEPTED02:50
sdagueby SEA02:50
jog0wow SF fail02:50
mordredlike, wow02:50
clarkbmordred sdague you guys are awesome thank you02:50
sdaguefinal score SF: 17, SEA 2302:53
clarkb\o/02:53
jog0☹02:54
mordredclarkb: when is supersportsball?02:54
mordredclarkb: next week or in 2 weeks?02:55
clarkb2weeks02:55
clarkbfeb 2nd02:55
mordredI'll be in Brussels02:56
mordredI'm going to need to find a place with the game02:56
mordredbecause broncos seahawks is going to be interesting02:57
*** gokrokve has joined #openstack-infra02:57
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Mark resolved bugs  https://review.openstack.org/6775202:57
sdaguejog0: so look at - http://status.openstack.org/elastic-recheck/02:58
jog0sdague: looking02:58
sdagueI actually think Bug 1270680 - v3 extensions api inherently racey wrt instances - might be one of our biggest new gate issues02:58
sdagueand the reason we're getting killed right now02:59
jog0sdague: yeah I agree02:59
jog01270680 has a pretty graph lol02:59
jog0so many colors02:59
sdagueheh, yeh02:59
sdagueso I marked it as critical for nova02:59
jog0sdague: cool03:00
sdagueI'll dive on it tomorrow03:00
jog0it looks like it's time to do some git log and git bisect03:00
jog0since we know when it started03:00
sdagueactually, it's when we actually started testing it03:00
sdaguethat code's been in nova since oct03:00
jog0:(03:00
sdaguemaybe something else changed wrt to it03:00
sdaguealso, we're kind of mostly blind for the last couple of weeks03:01
jog0hmm so logstash.o.o doesn't show the same graph03:01
jog0there are hits before jan 16th there03:01
*** gokrokve has quit IRC03:02
openstackgerritA change was merged to openstack-infra/config: Update TripleO Cloud API endpoint for Nodepool  https://review.openstack.org/6649103:02
*** cyeoh has joined #openstack-infra03:02
*** AaronGr_Zzz is now known as AaronGr03:03
sdaguejog0: maybe hot data issue03:03
sdaguelet's see if the query fills out next go around03:03
jog0yeah03:03
openstackgerritA change was merged to openstack-infra/elastic-recheck: Mark resolved bugs  https://review.openstack.org/6775203:04
notmynamefungi: what bug number should I use for a recheck?03:05
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add check for bug 1270608  https://review.openstack.org/6771303:05
*** praneshp_ has joined #openstack-infra03:05
notmynameI know you guys are aware of it, but I want this to be in the logs (ie on the record). the patch that is about to fail (because the test node couldn't connect to git.o.o) has been in the queue for at least 50 hours and been rechecked 19 times due to gate resets03:06
funginotmyname: i don't think we have one for it--not that i've seen at any rate03:06
*** praneshp has quit IRC03:06
*** praneshp_ is now known as praneshp03:06
sdaguenotmyname: I think it's worth reporting one against openstack-ci and we can build an er query for it03:08
sdaguethere were 2 logstash hits back on the 8th03:08
sdagueso it happens from time to time03:08
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Remove remaining cases of '@message'  https://review.openstack.org/6775403:08
sdagueoh, finally, I was going to get around to that03:08
notmynamesdague: could you please do that and give me a bug number?03:08
sdaguenotmyname: you can't register a bug?03:09
*** sarob has joined #openstack-infra03:09
notmynamesdague: I'm not particularly in the mood to file a bug against the gate and make it polite or charitable03:10
notmynamesee the above number for why03:10
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Remove remaining cases of '@message'  https://review.openstack.org/6775403:10
*** AaronGr is now known as AaronGr_Zzz03:11
*** sarob_ has joined #openstack-infra03:13
*** sarob has quit IRC03:14
notmynamethe job ahead actually failed!!!!!!!!03:15
fungiseems that way03:15
notmynameand now with another 60+ minutes to check the status, I'm stepping away for a bit03:16
*** sarob_ has quit IRC03:18
clarkbis something broken?03:19
clarkbsorry superbowl is happening03:19
*** gokrokve has joined #openstack-infra03:19
StevenKNot for another two weeks? :-P03:19
fungiclarkb: nothing new is broken, to my knowledge. do you ask for any particular reason, or just checking in?03:20
clarkbfungi the bug number questions03:22
fungiclarkb: oh, apparently we saw an hpcloud vm in az2 fail a job because it tried to connect to the ipv6 address of git.o.o and (unsurprisingly) got a network unreachable response03:23
fungiwhich means it must have somehow gotten a global-scope address from somewhere03:24
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Clarify required parameters in query_builder  https://review.openstack.org/6775603:24
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries  https://review.openstack.org/6759603:24
*** gokrokve has quit IRC03:24
fungiclarkb: all i can guess is maybe another client in the same ethernet broadcast domain had radvd running or was otherwise generating router advertisements for some reason03:25
clarkbawesome03:25
clarkbdont we firewall that?03:27
clarkbor not because no ipv6 typically03:27
fungiipv6 icmp type for ra? probably not explicitly03:27
*** vkozhukalov has joined #openstack-infra03:28
lifelessfungi: so, if its all landed/ing we can reenable puppet?03:35
*** nati_ueno has joined #openstack-infra03:41
jog0https://review.openstack.org/#/c/67485/03:41
jog0that should help with resource issues ever so slightly03:41
jog0sdague: ^03:41
jog0thats to get better classification numbers03:41
*** nati_ueno has quit IRC03:43
*** nati_ueno has joined #openstack-infra03:44
fungilifeless: possibly. i'm not sure if this is a good week to be experimenting with (and potentially destabilizing) nodepool, but i won't really be around to troubleshoot it much during the week so i'll defer to clarkb and mordred if they're going to be in a position to keep an eye on it03:48
jog0sdague: can you review https://review.openstack.org/#/c/67596/203:54
jog0still waiting for some more data to finish testing03:54
jog0but it looks like its working03:54
jog0had a failed-to-classify failure03:54
jog0when the current e-r had a incorrect classification03:55
jog0waiting for a successful classification03:55
*** uriststk has joined #openstack-infra03:56
*** slong has quit IRC04:06
*** slong_ has joined #openstack-infra04:06
*** sarob has joined #openstack-infra04:13
*** uriststk has quit IRC04:14
fungisdague: jog0: that latest gate reset looks like nova v3 api problems again. does v3 testing just need to be disabled again?04:17
*** sarob has quit IRC04:18
cyeohfungi: do you have a link to that failure?04:19
*** coolsvap has joined #openstack-infra04:19
fungicyeoh: https://jenkins03.openstack.org/job/gate-tempest-dsvm-neutron/2524/consoleText04:19
cyeohfungi: thx04:20
*** gokrokve has joined #openstack-infra04:20
mattoliveraucyeoh: are you breaking the v3 api again :P I see you survived the Adelaide heat wave, Melbourne had it pretty bad as well, damn thing followed us back from LCA ;P04:23
StevenKmattoliverau: I think the heatwave was on your flight04:25
cyeohmattoliverau: between Perth and Adelaide I ended up with 7 days in a row >40C and three in a row >44C04:25
cyeohI don't think the v3 API is broken but am looking now just to check :-)04:25
*** gokrokve has quit IRC04:25
mattoliverauStevenK: maybe it hid in my bags :P04:25
*** dcramer_ has quit IRC04:26
fungicyeoh: see the earlier discussion, sdague asserts "bug 1270680 - v3 extensions api inherently racey wrt instances - might be one of our biggest new gate issues"04:26
StevenKmattoliverau: Haha04:26
notmynamebug 1264972 for it?04:26
notmynamefungi: ah, bug 1270680 instead?04:26
cyeohfungi: thanks, will look into it now04:27
funginotmyname: i'm not sure--i just keep the lights on. i defer to nova devs like cyeoh and sdague on these sorts of things04:27
* fungi compares error messages04:28
cyeohI guess sdague is asleep by now...04:28
fungieh, it's not even midnight in our tz yet. he's probably just distracted by sportball (assuming the game is still going anyway)04:29
StevenKNo, game finished04:29
funginotmyname: 1264972 looks more searchable anyway04:31
*** nati_uen_ has joined #openstack-infra04:32
cyeohfungi: oh yes, 1270680 is definitely a problem. I think there's lighter weight things we can do than disable the v3 api testing though04:32
*** nati_uen_ has quit IRC04:33
*** nati_uen_ has joined #openstack-infra04:33
fungicyeoh: if it's something which will significantly reduce spurious tempest test failures, i'll gladly shove it to the head of the gate so fewer changes get kicked out needlessly04:35
notmynamewhat magic is behind the zuul queue having patches that have been in the queue for 4 hours ahead of patches that have been around for 35 hours?04:35
cyeohfungi: cool - am just looking now at how to fix it. I think a proper fix should be pretty straight forward04:36
*** nati_ueno has quit IRC04:36
funginotmyname: sdague's change to carry over the enqueue time isn't actually used on changes which are gerrit dependencies (you'll note the offenders follow changes with sane-looking enqueue times for the same project)04:36
notmynamefungi: ok04:36
fungiso when those dependent changes get reenqueued, they end up with their enqueue times reset apparently. just noticed that myself a few hours ago04:37
notmynamefungi: I figured it had something to do with dependencies. so the patches with shorter times have been around just as long, or those were put up front because of the git logic?04:37
notmynamefungi: ah ok04:37
fungiyeah, they've been in there as long as the others04:38
fungiit's just lying04:38
fungicosmetic bug04:38
jog0fungi: I haven't dug into the v3 work enough to know if disabling is the right move04:38
fungijog0: cyeoh seems to have lighter-weight ideas there04:39
*** dcramer_ has joined #openstack-infra04:39
jog0fungi: cool04:40
cyeohfungi, jog0: so I think we have this potential racey failure mode all over both v2 and v3 APIs04:44
cyeohI guess we've just been lucky in the past (or we haven't noticed it anyway)04:44
jog0cyeoh: agreed04:46
jog0its v2 and v304:46
cyeohjog0: so I guess there's two ways to fix this. Cache a whole lot more information in the resp_obj or fail gracefully in extensions if the instance is not found04:49
jog0failing when we don't need to is a bad idea04:50
cyeohI think I prefer the latter - not including a bit of information about an instance which has just been deleted anyway seems okayish to me04:50
jog0as in if the data is in the DB but we can't find it ... that's bad04:50
cyeohyea, in this case its because the data has just been deleted.04:51
jog0cyeoh: ahh04:51
cyeohso we can just not append the data we can't to anymore (because of the race)04:51
cyeoh"can't get to" I mean04:51
jog0cyeoh: TBH I haven't looked at this enough to have enough of an understanding of the issue04:51
jog0cyeoh: so its your call. I am distracted by elastic-recheck stuff at the moment04:52
cyeohjog0: ok, np. I'll see if its by luck just hitting a specific extension which we can fix quickly now, or if we need to fix all of them to make a difference04:52
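A minimal sketch of the "fail gracefully" option cyeoh describes, not the actual nova patch; the hook signature, the exception stand-in and the field names are hypothetical.

```python
# An API extension decorating a server response: if the instance was deleted
# between the main handler and the extension running, skip the decoration
# instead of letting the whole request fail with a traceback.

class InstanceNotFound(Exception):
    """Stand-in for nova's InstanceNotFound exception."""

def extend_server_response(compute_api, context, server_dict):
    try:
        instance = compute_api.get(context, server_dict["id"])
    except InstanceNotFound:
        # Race hit: the instance is already gone, so there is no extra
        # data to append; leave the response as-is.
        return server_dict
    server_dict["hypothetical:detail"] = instance.get("some_field")
    return server_dict
```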
*** amotoki has joined #openstack-infra04:55
jog0cyeoh: thanks04:55
*** gokrokve has joined #openstack-infra04:57
jog0cyeoh: can you update that bug with  your comments04:57
cyeohjog0: just did04:57
jog0cyeoh: excellent04:58
cyeohhrm and looking through logstash for occurrences of it just found another bug in the v2 api ;-)04:59
*** senk has joined #openstack-infra05:00
jog0:/05:02
*** slong has joined #openstack-infra05:07
*** slong_ has quit IRC05:07
*** katyafervent has quit IRC05:11
*** katyafervent has joined #openstack-infra05:11
*** sarob has joined #openstack-infra05:13
*** nicedice has quit IRC05:15
*** sarob has quit IRC05:18
jog0fungi: how far are we from getting all the jenkins masters to have the console.html=>elasticSearch fix05:19
fungijog0: clarkb and zaro were monitoring the plugin upgrade on jenkins02 before applying it on the others05:20
*** senk has quit IRC05:20
jog0fungi: thanks, from what I see the fix is definitely helping05:21
*** nati_ueno has joined #openstack-infra05:21
*** nati_ueno has quit IRC05:21
jog0haven't seen a missing console on any jenkins 01 and 02 nodes05:22
*** nati_ueno has joined #openstack-infra05:22
*** krtaylor has joined #openstack-infra05:25
*** nati_uen_ has quit IRC05:25
lifelessfungi: ah, so its less about experimenting with nodepool and more about getting jobs running for us; its rather critical path05:25
lifelessfungi: we'll obviously stand ready to support any issues it might cause05:25
lifelessfungi: could we run a separate nodepool in fact, avoid whatever bugs might lurk in nodepool?05:26
*** chandankumar_ has joined #openstack-infra05:26
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Sort uncategorized fails by time  https://review.openstack.org/6776105:26
lifelessmordred: clarkb: ^ you may be more awake :P05:26
clarkbwhats wrong with the existing nodepool?05:27
clarkbI'm not sure the nodepool cli is built for two nodepools05:27
clarkbor the db stuff05:27
* fungi is awake, but not *very* awake05:30
lifelessclarkb: fungi is worried that turning tripleo-test-cloud on will cause issues w/nodepool when the gate is already fragile05:31
lifelessclarkb: I was suggesting to mitigate that by running an entirely separate nodepool that is connected to the same geard05:32
funginothing's necessarily wrong with the the existing nodepool. i just don't want to greenlight all the tripleo-ci-supporting patches for it and the config by reenabling puppet on the server when i'm not going to necessarily be around to troubleshoot it05:32
fungiso leaving that call to those who will be around05:33
jog0is there a bug filed for: http://logs.openstack.org/23/66223/1/gate/gate-python-heatclient-pypy/9950fd5/console.html#_2014-01-19_01_08_37_06305:35
jog0pip timeouts05:35
clarkbfungi I see05:37
*** michchap has quit IRC05:37
*** michchap has joined #openstack-infra05:37
clarkbjog0 I think if you search under openstack-ci there may be05:37
jog0all I found was https://bugs.launchpad.net/openstack-ci/+bug/125416705:38
jog0which is a little different05:38
jog0this is the fingerprint I am using:  filename:"console.html" AND message:"download.py\", line 495"05:38
jog0there aren't many occurrences thankfully05:38
lifelessclarkb: back in the states?05:40
*** DinaBelova_ is now known as DinaBelova05:40
*** SergeyLukjanov_ is now known as SergeyLukjanov05:40
clarkblifeless: yes, mostly over jetlag now05:41
lifelessclarkb: \o/05:42
openstackgerritlifeless proposed a change to openstack-infra/config: Tripleo-gate needs the gear library.  https://review.openstack.org/6776205:42
fungithe jetlag's not gone, just lulling you into a false sense of security05:42
StevenKHaha05:42
lifelessmordred: more ^ fodder05:42
lifelessmordred: we could install that at runtime, but its really part of base setup05:42
*** carl_baldwin has joined #openstack-infra05:45
clarkbI will change scp plugins tomorrow on 03 and 04 then resume holidaying05:45
jog0clarkb: thanks05:46
*** nosnos has quit IRC05:52
*** nosnos_ has joined #openstack-infra05:52
fungioh, right, tomorrow is a usa holiday05:55
lifelessnuts :(05:55
* fungi has lost track of which days are weekends much less holidays05:55
fungiand yes, we're all nuts here05:56
*** oubiwann_ has quit IRC05:56
StevenKOh, MLK day05:58
jog0clarkb: how do I add grenade logs to elasticSearch05:59
jog0actually, since this is supposed to be the weekend, never mind05:59
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Add query for bug 1270710  https://review.openstack.org/6776406:01
clarkbjog0 add the files to the list of files06:01
jog0clarkb: where is that?06:02
clarkbjog0 though ideally we list the files without paths and recursively look them up06:02
jog0clarkb: so in tempest the files are under logs/06:03
clarkbmodules/openstack_project/files/logstash/somethingclient.yaml06:03
jog0but in grenade they are under new/logs06:03
clarkbjog0 right. today logstash needs full paths06:03
fungiokay, i swear i'm really going to try to take a nap now06:04
jog0clarkb: lets pick this up on tuesday06:04
clarkbjog0 ok. sherlock is on now :)06:04
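A tiny illustration of the path issue above: the logstash client config currently needs the full per-job path (logs/... for tempest jobs, new/logs/... for grenade jobs), while the nicer behaviour clarkb mentions would match on the bare file name. The helper below is purely illustrative.

```python
# Illustrative only: the same log file lives at different paths per job type.
import posixpath

AVAILABLE = ["logs/screen-n-api.txt", "new/logs/screen-n-api.txt"]

def find_by_name(name, available):
    # The "list files without paths and look them up recursively" idea.
    return [p for p in available if posixpath.basename(p) == name]

print(find_by_name("screen-n-api.txt", AVAILABLE))
```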
*** nati_uen_ has joined #openstack-infra06:10
clarkbjog0 is the pip fail downloading pip installer from github during devstack?06:12
clarkbthat is arguably a devstack bug06:12
jog0clarkb: http://logs.openstack.org/23/66223/1/gate/gate-python-heatclient-pypy/9950fd5/console.html#_2014-01-19_01_08_37_06306:13
*** sarob has joined #openstack-infra06:13
clarkbah no different problem06:13
*** nati_ueno has quit IRC06:14
*** gokrokve has quit IRC06:16
*** sarob has quit IRC06:18
*** rahmu has quit IRC06:27
*** DinaBelova has quit IRC06:28
*** carl_baldwin has quit IRC06:29
*** rahmu has joined #openstack-infra06:29
*** DinaBelova has joined #openstack-infra06:31
*** mrda has quit IRC06:41
*** vkozhukalov has quit IRC06:44
*** bookwar has joined #openstack-infra06:46
*** gokrokve has joined #openstack-infra06:47
*** gokrokve_ has joined #openstack-infra06:49
*** jhesketh_ has quit IRC06:50
*** yamahata has joined #openstack-infra06:52
*** gokrokve has quit IRC06:52
*** nosnos_ has quit IRC06:52
*** nosnos has joined #openstack-infra06:53
*** gokrokve_ has quit IRC06:54
*** jhesketh has quit IRC06:56
amotokihi, I would like to request gerrit account for external testing.06:57
*** pblaho has joined #openstack-infra06:57
amotokiI am now working on neutron third party testing. Is this a right place to request an account?06:57
*** gokrokve has joined #openstack-infra06:58
*** nati_ueno has joined #openstack-infra06:59
clarkbamotoki please see the document at http://ci.openstack.org06:59
*** SergeyLukjanov is now known as SergeyLukjanov_a06:59
*** SergeyLukjanov_a is now known as SergeyLukjanov_07:00
*** nati_uen_ has quit IRC07:01
amotokiclarkb: I saw http://ci.openstack.org/third_party.html and there are several ways: #openstack-infra , ML, bug report. Can I request it in this channel?07:02
*** gokrokve has quit IRC07:03
*** nati_ueno has quit IRC07:04
*** nati_ueno has joined #openstack-infra07:04
*** SergeyLukjanov_ is now known as SergeyLukjanov07:08
*** mrda has joined #openstack-infra07:09
clarkbamotoki: you can but it is sunday night before a US holiday. a better bet is the mail list07:10
amotokiclarkb: ah.... thanks.. I will request it via the list.07:12
*** sarob has joined #openstack-infra07:13
*** yolanda has joined #openstack-infra07:14
*** jhesketh_ has joined #openstack-infra07:15
*** sarob has quit IRC07:17
*** SergeyLukjanov is now known as SergeyLukjanov_07:18
*** morganfainberg is now known as morganfainberg|z07:19
*** mrda has quit IRC07:20
*** mayu has joined #openstack-infra07:29
*** jcoufal has joined #openstack-infra07:29
*** crank has quit IRC07:30
*** mayu has quit IRC07:34
*** NikitaKonovalov_ is now known as NikitaKonovalov07:43
*** crank has joined #openstack-infra07:44
*** afazekas_ has joined #openstack-infra07:52
*** SergeyLukjanov_ is now known as SergeyLukjanov07:53
*** nati_uen_ has joined #openstack-infra07:53
*** morganfainberg|z is now known as morganfainberg07:54
*** nati_ueno has quit IRC07:56
*** gokrokve has joined #openstack-infra07:59
ttxFTR I'm traveling all day, mostly on a non-wifi transatlantic plane08:02
*** mrda has joined #openstack-infra08:02
*** gokrokve has quit IRC08:04
*** yolanda has quit IRC08:05
*** jamielennox is now known as jamielennox|away08:08
*** crank has quit IRC08:09
*** mrda has quit IRC08:09
*** crank has joined #openstack-infra08:09
*** zhiwei has quit IRC08:09
*** zhiwei has joined #openstack-infra08:09
*** hashar has joined #openstack-infra08:12
*** sarob has joined #openstack-infra08:13
*** sarob has quit IRC08:18
*** flaper87|afk is now known as flaper8708:18
*** crank has quit IRC08:21
*** crank has joined #openstack-infra08:22
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Minor migration fix  https://review.openstack.org/6778908:25
*** yolanda has joined #openstack-infra08:25
*** luqas has joined #openstack-infra08:26
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API  https://review.openstack.org/6311808:27
*** vkozhukalov has joined #openstack-infra08:28
*** vkozhukalov has quit IRC08:34
*** matrohon has joined #openstack-infra08:36
*** luqas has quit IRC08:36
*** nati_ueno has joined #openstack-infra08:41
*** nati_ueno has quit IRC08:41
*** nati_ueno has joined #openstack-infra08:42
*** praneshp has quit IRC08:42
*** hashar has quit IRC08:42
*** praneshp has joined #openstack-infra08:44
*** nati_uen_ has quit IRC08:44
*** mrda has joined #openstack-infra08:44
*** SergeyLukjanov is now known as SergeyLukjanov_08:47
*** DinaBelova is now known as DinaBelova_08:48
*** jcoufal has quit IRC08:49
*** vkozhukalov has joined #openstack-infra08:51
*** fbo_away is now known as fbo08:52
*** zhiwei has quit IRC08:54
*** senk has joined #openstack-infra08:56
*** senk has quit IRC08:57
*** BobBallAway is now known as BobBall08:58
*** gokrokve has joined #openstack-infra09:00
*** gokrokve has quit IRC09:05
*** nati_ueno has quit IRC09:07
*** mancdaz_away is now known as mancdaz09:07
*** nati_ueno has joined #openstack-infra09:07
*** mancdaz is now known as mancdaz_away09:07
*** luqas has joined #openstack-infra09:12
*** nati_ueno has quit IRC09:12
*** jcoufal has joined #openstack-infra09:12
*** sarob has joined #openstack-infra09:13
*** derekh has joined #openstack-infra09:15
*** yassine has joined #openstack-infra09:16
*** sarob has quit IRC09:18
*** markmc has joined #openstack-infra09:18
*** dpyzhov has joined #openstack-infra09:18
*** yassine has quit IRC09:18
*** yassine has joined #openstack-infra09:18
*** jpich has joined #openstack-infra09:23
*** zhiwei has joined #openstack-infra09:25
*** praneshp has quit IRC09:29
*** dizquierdo has joined #openstack-infra09:35
*** dpyzhov has quit IRC09:35
*** dpyzhov has joined #openstack-infra09:36
*** SergeyLukjanov_ is now known as SergeyLukjanov09:43
*** jamielennox|away is now known as jamielennox09:48
*** SergeyLukjanov is now known as SergeyLukjanov_09:48
*** IvanBerezovskiy has joined #openstack-infra09:49
*** jp_at_hp has joined #openstack-infra09:52
*** mancdaz_away is now known as mancdaz09:54
*** morganfainberg is now known as morganfainberg|z09:57
*** derekh is now known as derekh_afk09:59
*** gokrokve has joined #openstack-infra10:01
*** rwsu has joined #openstack-infra10:03
*** vkozhukalov has quit IRC10:04
*** gokrokve has quit IRC10:06
*** Ryan_Lane has quit IRC10:08
*** johnthetubaguy has joined #openstack-infra10:08
*** amotoki has quit IRC10:08
*** sarob has joined #openstack-infra10:13
*** vkozhukalov has joined #openstack-infra10:16
*** dpyzhov has quit IRC10:16
*** sarob has quit IRC10:18
*** max_lobur_afk is now known as max_lobur10:18
*** zhiwei has quit IRC10:38
*** mrda has quit IRC10:39
*** _ruhe is now known as ruhe10:42
*** zhiwei has joined #openstack-infra10:43
*** mrda has joined #openstack-infra10:46
*** yassine has quit IRC10:46
*** dpyzhov has joined #openstack-infra10:51
*** zhiwei has quit IRC10:55
*** iv_m has joined #openstack-infra10:59
*** ArxCruz has joined #openstack-infra11:01
*** gokrokve has joined #openstack-infra11:01
*** markvoelker has quit IRC11:04
*** gokrokve has quit IRC11:06
*** sarob has joined #openstack-infra11:13
*** sarob has quit IRC11:18
*** rfolco has joined #openstack-infra11:27
*** boris-42 has quit IRC11:31
*** derekh_afk is now known as derekh11:38
*** boris-42 has joined #openstack-infra11:41
*** pblaho has quit IRC11:47
*** jhesketh_ has quit IRC11:51
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Fix the intial db migration  https://review.openstack.org/6759211:51
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API  https://review.openstack.org/6311811:52
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API  https://review.openstack.org/6311811:54
*** mrda has quit IRC11:56
sdaguefungi: when you wake up, cyeoh has a fix for that new bug11:57
*** jamielennox is now known as jamielennox|away11:58
max_loburSomebody from requirements core group, could you please review/approve the patch https://review.openstack.org/#/c/66349/3. It already has one +1 from a core reviewer12:00
*** gokrokve has joined #openstack-infra12:02
*** gokrokve has quit IRC12:07
*** coolsvap has quit IRC12:07
*** ruhe is now known as _ruhe12:09
*** gsamfira has joined #openstack-infra12:10
*** gsamfira has joined #openstack-infra12:11
*** yassine has joined #openstack-infra12:11
*** sarob has joined #openstack-infra12:13
*** sarob has quit IRC12:18
*** CaptTofu has joined #openstack-infra12:25
*** _ruhe is now known as ruhe12:30
*** dims has quit IRC12:34
*** dpyzhov has quit IRC12:34
*** yaguang has quit IRC12:34
*** yassine has quit IRC12:39
*** dims has joined #openstack-infra12:39
*** yassine has joined #openstack-infra12:40
*** markmc has quit IRC12:41
*** markmc has joined #openstack-infra12:44
*** CaptTofu has quit IRC12:46
*** senk has joined #openstack-infra12:50
*** pblaho has joined #openstack-infra12:50
*** senk has quit IRC12:51
*** dkranz has joined #openstack-infra12:57
*** david-lyle_ has quit IRC12:58
*** SergeyLukjanov_ is now known as SergeyLukjanov12:58
*** DinaBelova_ is now known as DinaBelova12:58
*** AJaeger has joined #openstack-infra13:01
*** smarcet has joined #openstack-infra13:03
*** gokrokve has joined #openstack-infra13:03
*** heyongli has joined #openstack-infra13:06
*** markmc has quit IRC13:07
*** gokrokve has quit IRC13:08
*** ruhe is now known as _ruhe13:11
*** sarob has joined #openstack-infra13:13
*** markmc has joined #openstack-infra13:15
*** sarob has quit IRC13:17
*** mriedem has joined #openstack-infra13:19
*** max_lobur is now known as max_lobur_afk13:26
*** SergeyLukjanov is now known as SergeyLukjanov_13:26
matelHi, I would like to have some recommendations on what is the proper development process for the devstack-gate project13:26
*** _ruhe is now known as ruhe13:29
sdaguematel: can you be more specific for what you are looking for?13:29
*** alexpilotti has joined #openstack-infra13:30
*** thomasem has joined #openstack-infra13:37
*** flaper87 is now known as flaper87|afk13:37
fungisdague: i saw the discussion in #nova... assuming it's https://review.openstack.org/67767 we seem to still need an approver13:41
sdaguefungi: yep13:41
sdagueand test results13:42
fungiwell, yeah, that13:42
sdagueso once we get activity on nova channel, and I get a +A, I'll ping you13:42
fungisounds good13:43
matelsdague: I want to test some changes in devstack-gate.13:44
matelsdague: I already have an "emulated" node.13:44
matelsdague: ./safe-devstack-vm-gate-wrap.sh seems to use the master.13:45
AJaegerinfra team, fungi: I would love to see the other api projects gated the same way as api-sites (right now they use gate-noop), do you have time for a review, please? https://review.openstack.org/#/c/67394/13:45
matelthe master of devstack-gate13:45
matelsdague: I have this script: https://github.com/matelakat/xenapi-os-testing/blob/start-devstack/launch-node.sh13:46
matelsdague: on line 66, I am checking out the branch that I want to try out.13:47
sdagueyeh, honestly, we don't have a good model for testing that outside of the gate itself right now13:48
sdaguehonestly, when I am making changes I usually use the gate to test them13:49
*** iv_m has quit IRC13:49
*** Ng_ has joined #openstack-infra13:49
matelsdague: How does that work? The issue in my case, is that it requires a xenserver node.13:50
matelWhich does not exist in nodepool yet.13:50
sdaguematel: well we haven't had that situation before13:50
matelsdague: I see.13:51
*** Ng_ has quit IRC13:51
*** Ng_ has joined #openstack-infra13:51
matelsdague: So I would like to modify: https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate.sh so that it can work with xenserver as well (I need to adjust the localrc basically)13:52
matelsdague: Maybe checking out my branch to a location, and set SKIP_DEVSTACK_GATE_PROJECT ?13:53
sdaguematel: yeh that might work13:53
sdaguethat's in place to test d-g changes actually, so it won't recursively keep checking itself out13:54
matelMy idea is that I'm gonna launch my node, check out d-g to the location (I need to look at it), and see if that works.13:55
matelI need to check where does the checked-out repos live.13:56
matelI guess it will live in $BASE/new13:57
matelwhich is /opt/stack/new.13:57
matelOkay, I give it a try.13:58
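A rough sketch of the workflow matel outlines, with placeholder branch name, fork URL and paths; SKIP_DEVSTACK_GATE_PROJECT is the knob sdague mentions that keeps the wrapper from re-fetching devstack-gate from master.

```python
# Sketch only: check out the devstack-gate branch under test and run the
# wrapper with SKIP_DEVSTACK_GATE_PROJECT set so it isn't overwritten.
import os
import subprocess

WORKSPACE = "/home/jenkins/workspace/testing"          # assumed path
DG_FORK = "https://github.com/example/devstack-gate"   # hypothetical fork
DG_BRANCH = "xenserver-localrc"                        # hypothetical branch

dg_dir = os.path.join(WORKSPACE, "devstack-gate")
subprocess.check_call(["git", "clone", "-b", DG_BRANCH, DG_FORK, dg_dir])

env = dict(os.environ, SKIP_DEVSTACK_GATE_PROJECT="1")
subprocess.check_call(
    ["bash", os.path.join(dg_dir, "safe-devstack-vm-gate-wrap.sh")],
    cwd=WORKSPACE, env=env)
```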
*** dstanek has joined #openstack-infra14:01
*** heyongli has quit IRC14:02
*** gokrokve has joined #openstack-infra14:04
*** dcramer_ has quit IRC14:04
*** SergeyLukjanov_ is now known as SergeyLukjanov14:04
*** b3nt_pin has joined #openstack-infra14:08
*** Ng_ has quit IRC14:08
*** gokrokve has quit IRC14:09
*** Ng_ has joined #openstack-infra14:12
*** Ng has quit IRC14:13
*** Ng_ is now known as Ng14:13
*** sarob has joined #openstack-infra14:13
*** b3nt_pin is now known as beagles14:15
sdaguefungi: so that patch failed jenkins on an unrelated race. I still think it should be promoted.14:17
*** sarob has quit IRC14:18
sdaguehttps://bugs.launchpad.net/nova/+bug/1270608 is the other new issue that showed up last week14:19
*** dprince has joined #openstack-infra14:27
*** alexpilotti has quit IRC14:29
*** alexpilotti has joined #openstack-infra14:29
BobBallsdague: what's the recommended way to run a single test in tempest these days?14:29
*** mrodden1 has quit IRC14:30
*** pblaho1 has joined #openstack-infra14:31
mriedemhttps://review.openstack.org/#/c/67767/ is +A'ed, but needs to pass jenkins14:31
*** pblaho has quit IRC14:33
sdagueBobBall: tox -eall testname14:33
*** damnsmith is now known as dansmith14:33
BobBallheh...14:33
BobBallsorry14:33
sdaguefungi: please promote 67767 when you can14:33
BobBallthat should have been one of the combinations I tried.14:33
*** max_lobur_afk is now known as max_lobur14:33
*** eharney has joined #openstack-infra14:34
sdaguefungi: actually abort on that14:37
fungiholding off14:38
*** senk has joined #openstack-infra14:38
*** coolsvap has joined #openstack-infra14:41
*** oubiwann_ has joined #openstack-infra14:45
*** ryanpetrello has joined #openstack-infra14:45
fungiunfortunate... 67767,2 seems to have a merge conflict with some change ahead of it14:47
*** mrodden has joined #openstack-infra14:48
*** SergeyLukjanov is now known as SergeyLukjanov_a14:51
*** pblaho1 has quit IRC14:52
*** SergeyLukjanov_a is now known as SergeyLukjanov_14:52
*** dcramer_ has joined #openstack-infra14:53
*** pblaho has joined #openstack-infra14:55
*** malini has joined #openstack-infra14:55
maliniGood Morning!!14:56
maliniI have a couple of patches outstanding for adding MArconi support14:56
maliniCan I get some reviews please?14:56
malinihttps://review.openstack.org/#/c/65145/14:56
malinihttps://review.openstack.org/#/c/65140/14:57
maliniI need these merged before I can get my patch to tempest merged14:57
*** malini is now known as malini_afk15:00
*** oubiwann_ has quit IRC15:00
*** malini_afk is now known as malini15:02
*** senk has quit IRC15:03
*** oubiwann_ has joined #openstack-infra15:04
*** gokrokve has joined #openstack-infra15:04
*** afazekas_ has quit IRC15:05
*** nosnos has quit IRC15:06
*** annegent_ has joined #openstack-infra15:06
*** senk has joined #openstack-infra15:07
*** senk1 has joined #openstack-infra15:08
*** gokrokve has quit IRC15:09
sdaguefungi: yeh, we're still discussing 6776715:11
*** senk has quit IRC15:12
*** sarob has joined #openstack-infra15:13
*** annegent_ has quit IRC15:13
*** DinaBelova is now known as DinaBelova_15:16
*** afazekas_ has joined #openstack-infra15:16
*** sarob has quit IRC15:18
*** SergeyLukjanov_ is now known as SergeyLukjanov15:19
*** DinaBelova_ is now known as DinaBelova15:20
openstackgerritZang MingJie proposed a change to openstack-infra/zuul: Supply authentication to zuul's gerrit baseurl  https://review.openstack.org/6785815:20
*** dims has quit IRC15:21
*** rakhmerov has quit IRC15:22
*** ryanpetrello has quit IRC15:22
*** rakhmerov has joined #openstack-infra15:22
openstackgerritZang MingJie proposed a change to openstack-infra/zuul: Supply authentication to zuul's gerrit baseurl  https://review.openstack.org/6785815:23
*** mrmartin has joined #openstack-infra15:25
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Load projects from yaml file  https://review.openstack.org/6628015:25
*** talluri has joined #openstack-infra15:30
max_loburSomebody from requirements core group, could you please review/approve the patch https://review.openstack.org/#/c/66349/3. It already has one +1 from a core reviewer15:30
*** nprivalova is now known as nadya_15:31
*** dmitkuzn has joined #openstack-infra15:32
*** jgrimm has joined #openstack-infra15:34
*** vkozhukalov has quit IRC15:34
*** dims has joined #openstack-infra15:35
*** gokrokve has joined #openstack-infra15:37
*** gokrokve has joined #openstack-infra15:37
*** rcleere has joined #openstack-infra15:40
*** johnthetubaguy has quit IRC15:40
*** DennyZhang has joined #openstack-infra15:40
*** johnthetubaguy has joined #openstack-infra15:41
openstackgerritArx Cruz proposed a change to openstack-infra/config: Change mysql-devel to community-mysql-devel in Fedora  https://review.openstack.org/6273915:43
*** afazekas_ has quit IRC15:44
*** juliashem has joined #openstack-infra15:46
*** NikitaKonovalov is now known as NikitaKonovalov_15:47
*** mrmartin has quit IRC15:49
*** annegent_ has joined #openstack-infra15:50
*** dmitkuzn has quit IRC15:51
*** senk1 has quit IRC15:51
*** juliashem has quit IRC15:51
*** annegent_ has quit IRC15:54
*** ryanpetrello has joined #openstack-infra15:55
*** ryanpetrello has quit IRC15:55
*** marun has joined #openstack-infra15:57
fungithe merge rate seems to be getting substantially worse. we're on track to merge 3 or 4 changes to openstack/openstack in a 24-hour period15:57
fungiwith the load from check pileup putting zuul into a pendulum between pipelines, we're merging or kicking out (more often kicking out) one change from the gate every couple hours, yet we're approving a dozen an hour15:59
*** talluri has quit IRC16:01
notmynamefungi: how are you tracking that number?16:01
funginotmyname: looked at http://git.openstack.org/cgit/openstack/openstack/log/16:01
fungi3 changes merged in the past 18 hours16:02
notmynamefungi: thanks16:02
*** johnthetubaguy has quit IRC16:02
fungiand the cinder change at the head of the gate just failed a grenade job, which means now we get to service the 50 or so changes waiting for nodes in the check pipeline before we restart testing on the change which was behind it in the gate16:03
*** johnthetubaguy has joined #openstack-infra16:05
fungigranted that off-the-cuff metric misses changes to stable release branches, but right now those are broken anyway so we wouldn't be merging any changes to them regardless16:05
*** david-lyle_ has joined #openstack-infra16:05
*** SergeyLukjanov is now known as SergeyLukjanov_16:09
openstackgerritArx Cruz proposed a change to openstack-infra/config: Change mysql-devel to community-mysql-devel in Fedora  https://review.openstack.org/6273916:11
*** afazekas_ has joined #openstack-infra16:11
*** nicedice has joined #openstack-infra16:13
*** sarob has joined #openstack-infra16:13
*** salv-orlando has joined #openstack-infra16:13
*** jcoufal has quit IRC16:15
*** DinaBelova is now known as DinaBelova_16:17
*** sarob has quit IRC16:18
*** nati_ueno has joined #openstack-infra16:19
*** marun has quit IRC16:20
*** thuc has joined #openstack-infra16:20
*** marun has joined #openstack-infra16:22
*** johnthetubaguy has quit IRC16:22
*** johnthetubaguy has joined #openstack-infra16:22
*** dizquierdo has quit IRC16:27
*** NikitaKonovalov_ is now known as NikitaKonovalov16:32
sdaguemordred: any word on quota bump?16:32
fungii think it must be freudian that i've started mistyping "gate" as "hate"16:34
fungisdague: it looks like https://review.openstack.org/67371 could use an approval vote16:35
sdaguefungi: doh16:36
fungiotherwise pretty much all of the tempest changes from last week's sprint have merged (except for a couple which are in the gate currently)16:36
sdaguefungi: where is it in the queue?16:36
*** mancdaz is now known as mancdaz_away16:37
fungisdague: it isn't. it already passed all the way through but failed to merge because dkranz revoked his approval16:37
*** AaronGr_Zzz is now known as AaronGr16:37
*** UtahDave has joined #openstack-infra16:38
*** markmcclain has joined #openstack-infra16:38
sdaguefungi: can we promote or ninja merge? that will actually take some of the load off the neutron tests16:39
*** yamahata has quit IRC16:39
sdaguewhich should increase their pass rate16:39
fungisdague: should be safe. looks like it would have made it were it not for the missing approval vote when it was done16:39
*** DinaBelova_ is now known as DinaBelova16:39
fungii'll merge it16:40
*** mancdaz_away is now known as mancdaz16:41
fungiit's merged now16:41
*** jpich has quit IRC16:43
*** afazekas_ has quit IRC16:45
mtreinishfungi: heh, I don't think I've actually seen that before16:45
fungimtreinish: that's the behavior if vrfy/cdrv/aprv votes are missing or there's a -2 vote on it when it comes time to merge16:46
fungigenerally happens when they're manually unset while it's in the gate16:46
mordredsdague: nope. just pinged back again16:46
mtreinishfungi: yeah it looks like dkranz removed his +A after the gate tests started on it16:46
fungiyep16:46
fungiwhich won't kick it out of the gate at the moment, but will prevent it from merging once it makes its way through16:47
sdaguefungi: so we might want to trigger a gate dequeue on removing A16:47
sdaguebecause otherwise it's kind of useless16:47
fungisdague: i believe there is intent to make that happen (along with on -2 as well), but it's still on the to-do list16:48
sdagueyep16:49
sdaguedid the early pep8 on check ever get merged?16:49
fungisdague: mordred wanted to rework it. it wouldn't have bought us much in its original form16:50
sdagueok, cool16:50
sdaguejust checking16:50
fungiall it would have preempted was python26/27 and docs checks16:50
mordredyeah. I'm not sure it's possible to express with the current template setup16:50
*** elasticio has joined #openstack-infra16:50
sdagueok16:50
*** mgagne has joined #openstack-infra16:51
*** GheRiver1 has joined #openstack-infra16:53
*** GheRiver1 has quit IRC16:53
*** MarkAtwood has joined #openstack-infra16:54
*** pblaho has quit IRC16:57
*** AaronGr is now known as AaronGr_Zzz16:58
sdaguefungi: so given that we're not really moving code anyway, what are the odds we could fix logs on the other jenkinses16:58
*** alexpilotti has quit IRC16:58
*** sarob has joined #openstack-infra16:59
*** ruhe is now known as _ruhe16:59
*** krotscheck has joined #openstack-infra17:00
fungisdague: pretty good. would be easier when clarkb is around since he knows how he was obtaining the patched plugin build to upload into them17:00
sdaguesure17:01
sdaguethat's fair, hopefully he'll be back on soon17:01
*** nati_ueno has quit IRC17:02
*** pblaho has joined #openstack-infra17:04
*** pblaho has quit IRC17:04
mgagnezaro: ping17:06
*** vkozhukalov has joined #openstack-infra17:08
*** senk1 has joined #openstack-infra17:09
*** Ryan_Lane has joined #openstack-infra17:10
*** Ryan_Lane has quit IRC17:11
sdaguemordred: if you feel like reviewing something that can merge - https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:gatestatus,n,z17:12
sdaguethen I can get that off the elastic recheck page17:13
mordredusdlooking17:13
mordredgah17:13
mordredsdague: looking17:13
*** gokrokve has quit IRC17:13
*** gokrokve has joined #openstack-infra17:13
*** aburaschi has joined #openstack-infra17:15
sdaguealso, where is that framework patch for status again?17:16
sdagueI want to look at redoing the er stuff like that before I add more logic to the existing page17:16
*** gokrokve has quit IRC17:18
*** jaypipes has joined #openstack-infra17:19
*** mancdaz is now known as mancdaz_away17:19
*** moted has quit IRC17:20
*** moted has joined #openstack-infra17:20
aburaschiHello, newbie quick question: if I want to reverify a patch in jenkins, and I identify that two bugs are associated to that failure, which is the correct way to proceed?17:20
aburaschia) put:17:20
aburaschireverify bug 117:20
aburaschireverify bug 217:20
aburaschior b) select just one and go with that one?17:20
fungiaburaschi: best would be to leave two reverify comments, one for each bug which resulted in a failure (don't leave them in the same comment though or it won't work)17:22
*** SumitNaiksatam has quit IRC17:25
aburaschiExcellent, thank you very much, fungi.17:25
fungiyou're welcome17:26
*** DennyZhang has quit IRC17:26
*** yassine has quit IRC17:27
*** AaronGr has joined #openstack-infra17:29
fungi...thinking aloud, i wonder whether giving the check pipeline priority over the gate would break the pendulum swing and improve gating performance17:29
*** AaronGr has quit IRC17:30
*** AaronGr_Zzz is now known as AaronGr17:30
fungiwe'd dribble nodes into the gate jobs in sequence as the check pipeline no longer needs them. as a result, we'd be testing fewer gate changes at a time, meaning a smaller rush of nodes to reclaim on the inevitable gate reset17:31
fungiwould have the effect of spreading nodepool delete and build operations out more evenly17:31
*** thuc has quit IRC17:36
*** afazekas_ has joined #openstack-infra17:37
*** thuc has joined #openstack-infra17:37
*** marun has quit IRC17:40
*** fbo is now known as fbo_away17:40
*** pballand has joined #openstack-infra17:40
*** marun has joined #openstack-infra17:41
*** thuc has quit IRC17:41
*** chandankumar_ has quit IRC17:42
*** luqas has quit IRC17:43
*** senk1 has quit IRC17:43
sdaguefungi: do we ever hit a point where check doesn't need them right now?17:43
sdagueI also thought both queues were equal priority17:44
fungisdague: if we were servicing it first, we probably would17:44
sdaguefungi: I'm not convinced :)17:44
fungithey are equal priority right now, which is what causes the swing17:44
sdagueit's at 10217:44
clarkbmorning17:44
sdagueand given the build delays, I think we'd just completely starve the gate17:44
sdagueif we had more nodes, I'd agree17:44
sdagueok, going to pop out for lunch17:45
clarkbfungi: I grabbed the scp.jpi file from jenkins-dev17:45
*** _ruhe is now known as ruhe17:45
clarkbfungi you can grab it from there or 0217:45
fungisdague: possibly. part of it is that right now, we're applying every new node to gate changes (because there's more than we can service) and then once a gate reset happens, we start handing every available node to the check pipeline changes which piled up while we were previously handing them all to the gate17:46
sdagueyep, swapping not fun17:46
fungibut given the gate reset frequency, most of the nodes burned on gate pipeline changes were wasted because their results were never needed17:46
*** dstufft has quit IRC17:46
sdagueright17:47
*** nati_ueno has joined #openstack-infra17:47
*** dstufft has joined #openstack-infra17:47
fungiat most the first few dozen nodes applied to the gate had any real effect at all, and the rest were just resources which could have gone to clearing out the check pipeline instead17:47
*** jasondotstar has joined #openstack-infra17:47
sdaguethe smart way to do it would be to calculate out the percentage chances for each successive piece of the gate to get through from its current position, then define a cutoff17:47
sdagueand not schedule past that point17:47
sdaguethat requires a lot more logic though17:48
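A toy sketch of the cutoff idea sdague describes above, under the assumption that each gate change passes independently with some fixed probability; the 90% pass rate and 25% cutoff below are made-up numbers for illustration, not measurements from the gate.

    def useful_gate_depth(pass_rate, cutoff):
        """Return how many gate positions are worth allocating nodes to.

        A change at depth k only merges as tested if every change ahead
        of it passes, which happens with probability pass_rate ** k, so
        stop scheduling once that falls below the cutoff.
        """
        depth = 0
        while pass_rate ** (depth + 1) >= cutoff:
            depth += 1
        return depth

    # Illustrative numbers only: a 90% per-change pass rate and a 25%
    # usefulness cutoff make roughly the first 13 positions worth testing.
    print(useful_gate_depth(0.90, 0.25))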
*** nati_ueno has quit IRC17:48
dkranzfungi, sdague : Did I do something bad?17:48
sdaguedkranz: yeh, but we fixed it17:48
mordredsdague: we can segregate teh pools more though17:48
mordredwe have precise and check-precise or whatever it's called17:48
dkranzsdague: For future reference, what am I not supposed to do?17:48
sdaguedkranz: don't remove +A from a change in the gate17:49
sdaguethe behavior isn't what you actually want17:49
mordredwe could change the nodepool config to put fewer nodes into devstack-precise and more into devstack-precise-check17:49
fungimordred: we actually got rid of precise-check nodes a few weeks ago. now all dsvm and bare nodes are available for either check or gate17:49
mordredto achieve with a baseball bat the thing you were talking about above17:49
mordredoh. well17:49
sdagueok... really, leaving for lunch17:49
dkranzsdague: ok. But I was not trying to stop it and didn't realize it was in the gate.17:49
dkranzsdague: I just saw from other comments that it should not have been approved.17:50
dkranzBut I won't do it again17:50
*** rakhmerov has quit IRC17:51
clarkbfungi mordred should we bump scp on 03 now?17:51
openstackgerritA change was merged to openstack-infra/config: Additional jobs for python-rallyclient  https://review.openstack.org/6692917:51
mordredclarkb: yeah17:51
fungidkranz: probably could have just left it approved at that point and waited to see whether the check results came back green. then if not, upload a new patchset to knock the previous broken one out of the gate17:52
openstackgerritA change was merged to openstack-infra/config: Add an experimental functional job for neutron.  https://review.openstack.org/6696717:52
fungiclarkb: i believe that would be good17:52
*** SergeyLukjanov_ is now known as SergeyLukjanov17:52
*** sarob has quit IRC17:53
*** sarob has joined #openstack-infra17:54
*** bnemec_ is now known as bnemec17:54
clarkbok putting 03 in shutdown mode17:56
*** ruhe is now known as _ruhe17:57
openstackgerritA change was merged to openstack-infra/storyboard: Fix the intial db migration  https://review.openstack.org/6759217:59
*** MarkAtwood has quit IRC18:00
*** boris-42 has quit IRC18:00
clarkbfungi: mordred: the scp.jpi file is on 03 and 04 under ~clarkb/plugins/scp/fixed18:01
fungik18:02
fungiand you're just using the upload screen in the webui to upgrade it?18:03
*** derekh has quit IRC18:03
*** chandankumar_ has joined #openstack-infra18:03
*** boris-42 has joined #openstack-infra18:04
*** nati_ueno has joined #openstack-infra18:04
clarkbfungi: no, I am actually stopping the server, putting the scp.jpi in /var/lib/jenkins/plugins then starting jenkins18:05
*** NikitaKonovalov is now known as NikitaKonovalov_18:05
clarkbfungi: you can use the webui instead, it is how zaro put it on -dev18:05
clarkbI feel like there is more control doing it by hand on disk18:05
*** CaptTofu has joined #openstack-infra18:06
clarkbbecause I don't know what magic jenkins is doing under the hood to do restartless upgrades (which don't work) and so on18:06
*** gokrokve has joined #openstack-infra18:09
*** zz_ewindisch is now known as ewindisch18:09
*** sarob has quit IRC18:09
*** markmcclain has quit IRC18:10
fungiahh, okay. the last time i did it from the fs it was because jenkins wouldn't start otherwise, and i wasn't sure how many of the accompanying files needed to be copied into place too or whether some of those were ephemeral18:10
*** markmcclain has joined #openstack-infra18:10
clarkbfungi: the scp/ dir that is created is made by expanding the jpi archive I think18:11
radixcan someone help me understand what's going on in http://logs.openstack.org/06/67006/4/check/check-tempest-dsvm-full/5fa3d8a/ ? It seems to be some kind of network failure18:11
clarkbfungi: the only thing you need is the .jpi or .hpi18:11
*** rakhmerov has joined #openstack-infra18:11
*** afazekas_ has quit IRC18:12
*** rakhmerov has joined #openstack-infra18:12
clarkbradix: http://logs.openstack.org/06/67006/4/check/check-tempest-dsvm-full/5fa3d8a/logs/devstack-gate-setup-workspace-new.txt an hpcloud node is trying to clone a repo over ipv618:13
clarkbhpcloud doesn't have an ipv6 stack18:13
*** johnthetubaguy has quit IRC18:13
clarkbfungi: did we determine anything more about that problem?18:13
radixclarkb: this came up in my heat change, pretty sure it's unrelated, and I'm not sure what to do about it18:13
clarkbradix: I am not sure either, I think fungi has investigated it18:14
radixoh ok :)18:14
clarkbone job left on 03, I will stop it and start it with new scp plugin as soon as that job clears out18:14
clarkbwhich is now18:15
openstackgerritBrant Knudson proposed a change to openstack/requirements: Update oauthlib requirement to at least 0.6  https://review.openstack.org/6790018:15
*** jaypipes has quit IRC18:16
fungiclarkb: only speculation... the ip configuration output in the console log only shows ipv4 (not even any linklocal v6), which makes me think we're doing "ip -4 ad sh" explicitly or something. i'll have a look and see how we might get more diagnostics for this on future runs18:16
clarkb03 is back up with new plugin18:17
clarkbfungi: mordred: should I put 04 in shutdown mode now?18:18
radixhmm, looks like this: https://bugs.launchpad.net/openstack-ci/+bug/126661618:18
radixI guess I'll run a recheck on that18:18
fungiclarkb: go for it18:18
*** vkozhukalov has quit IRC18:19
clarkbfungi: once 04 is done the remaining nodes will be 01 and jenkins.o.o which can get the correct version when we update their jenkins version18:19
*** fifieldt has joined #openstack-infra18:19
fungiradix: that looks like it, yeah18:19
clarkbI am going to take advantage of the wait to return to my regularly scheduled morning18:19
clarkbwill pop back in in a bit to finish 0418:20
fungiradix: current suspicion is that some other tenant in hpcloud is generating router advertisements, but adding some extra debugging around address assignments there may help enlighten us as to the cause18:20
radixyikes18:21
*** yamahata has joined #openstack-infra18:21
clarkbfungi: we can update the iptables rules right?18:21
clarkbneeds to be conditional for hpcloud only though18:22
*** pballand has quit IRC18:23
*** SergeyLukjanov is now known as SergeyLukjanov_a18:24
fungiclarkb: well, if that's the cause then yes, but if so there's every chance the same could happen in rackspace and then we'd need to be able to keep filters updated for their router linklocal addresses18:24
*** SergeyLukjanov_a is now known as SergeyLukjanov_18:25
fungiclarkb: radix: for details, see https://launchpad.net/bugs/126275918:26
*** afazekas_ has joined #openstack-infra18:26
fungiit's apparently blocked *if* you're doing openstack ipv6 networking, but given the way in which rackspace has implemented their ipv6 vm connectivity i have no idea whether that also holds true for them18:29
*** afazekas_ has quit IRC18:30
*** dcramer_ has quit IRC18:31
*** afazekas_ has joined #openstack-infra18:32
*** marun has quit IRC18:32
*** marun has joined #openstack-infra18:33
*** ewindisch is now known as zz_ewindisch18:36
*** elasticio has quit IRC18:36
*** praneshp has joined #openstack-infra18:36
*** zz_ewindisch is now known as ewindisch18:37
*** jaypipes has joined #openstack-infra18:37
*** senk1 has joined #openstack-infra18:38
*** ewindisch is now known as zz_ewindisch18:40
*** marun has quit IRC18:40
*** zz_ewindisch is now known as ewindisch18:41
*** marun has joined #openstack-infra18:41
clarkbeta on 04 is 30 minutes18:43
*** chandankumar_ has quit IRC18:44
*** yamahata has quit IRC18:48
*** jasondotstar has quit IRC18:49
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox  https://review.openstack.org/6772118:49
clarkbfungi: do we think we should submit a ticket to hpcloud about the possible bad 'router'?18:50
*** ewindisch is now known as zz_ewindisch18:50
openstackgerritJeremy Stanley proposed a change to openstack-infra/devstack-gate: Also print IPv6 address details  https://review.openstack.org/6791118:52
fungiclarkb: maybe we start with ^ and have a look at the next one which hits logstash18:52
*** CaptTofu has quit IRC18:52
*** mindjiver has quit IRC18:52
*** zz_ewindisch is now known as ewindisch18:52
* clarkb looks18:53
*** nati_uen_ has joined #openstack-infra18:53
fungido you think an ip route show along with that would also be in order?18:53
krotscheckclarkb: The run-selenium script seems to depend on having run_tests.sh in the project. Do you have a strong opinion on whether A) I can remove that, or B) I should create an xvfb builder macro that just executes tox?18:53
*** markmcclain has quit IRC18:53
fungiclarkb: oh, though for that you also need ip -6 route show. maybe add an ip {,-6} neighbor show too18:54
clarkbkrotscheck: I would love it if we can remove the dependency on run_tests.sh, but horizon is a thing18:54
clarkbkrotscheck: maybe we can feed run-selenium a command to execute a test with selenium bits prestaged18:54
clarkbkrotscheck: then feed a different command to horizon and storyboard18:55
clarkbfungi: sounds good to me18:55
krotscheckclarkb: I dunno, that feels a bit like overparameterizing a command18:55
clarkbkrotscheck: not really, its creating a specific test environment to run tests within18:55
krotscheckclarkb: BTW- so there's a python module called nodeenv that will drop a nodejs runtime into your virtualenv for you.18:55
clarkbthe tests you want to run within it don't need to be identical18:55
krotscheckclarkb: So mordred and I are working on just having storyboard use tox.18:56
clarkbfungi: want to update the existing change or do that in a different one?18:56
*** nati_ueno has quit IRC18:56
fungiclarkb: i'm updating it now18:56
openstackgerritJeremy Stanley proposed a change to openstack-infra/devstack-gate: More network debugging detail  https://review.openstack.org/6791118:58
fungiclarkb: ^ updated18:58
*** markmcclain has joined #openstack-infra18:59
fungiclarkb: turns out ip neighbor show gets you both the arp and nd table entries together, so it's just ip route show which needs a separate -6 variant18:59
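For reference, the diagnostics being added in 67911 are ordinary iproute2 calls run from devstack-gate's shell scripts; the snippet below is only a Python illustration of gathering the same output, and assumes the ip binary is present on the test node.

    import subprocess

    # The commands named above: addresses, v4 and v6 routes (which need
    # separate invocations), and the neighbor table, which covers both ARP
    # and IPv6 neighbor discovery entries in one listing.
    DIAG_COMMANDS = [
        ["ip", "address", "show"],
        ["ip", "route", "show"],
        ["ip", "-6", "route", "show"],
        ["ip", "neighbor", "show"],
    ]

    def dump_network_state():
        """Print network state so a later log review can spot rogue RAs."""
        for cmd in DIAG_COMMANDS:
            print("+ " + " ".join(cmd))
            print(subprocess.check_output(cmd).decode())

    if __name__ == "__main__":
        dump_network_state()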
*** Ajaeger1 has joined #openstack-infra19:01
*** SergeyLukjanov_ is now known as SergeyLukjanov19:02
*** amotoki has joined #openstack-infra19:03
fungiclarkb: anyway, with that it should give us enough info to spot the ethernet address of the "router" if it really is someone testing radvd or a broken switchrouter in the distribution layer or something at fault19:03
jog0sdague: ping19:03
sdaguejog0: yo19:03
*** afazekas_ has quit IRC19:03
*** amotoki has quit IRC19:03
jog0sdague: https://review.openstack.org/#/c/67596/ works can you review it19:03
jog0mtreinish: if you're around19:03
jog0sdague: that will give us more accurate e-r comments19:04
*** CaptTofu has joined #openstack-infra19:05
jog0which is why I want to get this in as soon as possible19:06
sdaguejog0: so I have one suggested change, inline19:08
jog0sdague: sounds like a good idea to me, thanks19:09
*** mrodden has quit IRC19:10
jog0so actually we use a lot of data from the gerrit event19:11
jog0and its all over the place right now19:12
jog0sdague: so I would prefer to do that refactor separately19:12
*** yamahata has joined #openstack-infra19:13
*** markmcclain has quit IRC19:13
openstackgerritlifeless proposed a change to openstack-infra/config: Add some dependencies required by toci  https://review.openstack.org/6768519:13
lifelessclarkb: fungi: if we can get ^ landed and then turn on the tripleo nodepool config, that would be the awesome19:14
mriedemdid anything change with the backing cinder volume store on the test nodes around 1/17?19:14
*** jasondotstar has joined #openstack-infra19:14
sdaguejog0: can you introduce the event object under this one19:14
sdagueI really hate having to clean these up later19:14
clarkb04 is idle now, updating scp plugin now19:15
*** mrodden has joined #openstack-infra19:15
sdaguegreat19:15
sdagueI can already see us timing out a lot less in the channel19:15
*** markmc has quit IRC19:16
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox  https://review.openstack.org/6772119:16
fungimriedem: what backing cinder volume store? you mean the one devstack creates when it starts up?19:16
mriedemfungi: yes19:16
mriedemanything using iscsu19:16
mriedem*iscsi19:16
jgriffithmriedem: fungi aren't those still just loopback files created by devstack?19:17
fungijgriffith: as far as i know, yes. so any changes would be changes in devstack or *maybe* devstack-gate repositories19:17
jgriffithfungi: or cinder :/19:17
jog0sdague: normally I would say sure, but I am not even supposed to be working today, just stopped in for an hour or so19:17
jgriffithmriedem: what are you seeing?19:17
fungijgriffith: well, yeah, or cinder ;)19:18
mriedemjgriffith: digging into this https://bugs.launchpad.net/nova/+bug/127060819:18
jog0I agree it needs cleanup but I don't think its worth holding this up for that19:18
clarkb04 seems up19:18
mriedemi might be looking at a red herring in the nova code that changed on 1/17 which is when that bug started showing up19:18
fungiclarkb: agreed. looks like it's running jobs already19:18
mriedemi'll see what changed in cinder and devstack on 1/1719:18
ewindischirt a conversation I've been having with dtroyer in #openstack-dev....19:19
ewindischwhat are the thoughts toward gating another nova hypervisor in openstack-infra?19:19
jgriffithmriedem: I seem to recall this may be a dup of another nova item we looked at a while back19:19
sdaguejog0: so I don't want to unwind this when we could do the event object first19:20
ewindischDean seems to worry about having enough resources for the extra gate19:20
sdagueas it makes more work19:20
sdagueewindisch: -119:20
sdaguerevisit at Juno summit19:20
ewindischsdague: at the root of this is russell REQUIRING a (non-voting) gate to keep hypervisors in Nova19:21
jog0sdague: you want to take a whack at the event object? I am trying to not work today19:21
*** annegent_ has joined #openstack-infra19:21
sdaguejog0: yep, I will19:21
sdagueewindisch: yep, do what everyone else is doing, and bring up a 3rd party system19:21
jog0sdague: thanks19:21
jog0!19:21
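A minimal sketch of the "event object" refactor sdague and jog0 are talking about: wrap the raw gerrit stream-events dictionary in a small class so elastic-recheck stops poking at nested keys all over the place. The class and property names are hypothetical, not taken from the actual change (67941).

    class GerritEvent(object):
        """Hypothetical wrapper around one raw gerrit JSON event."""

        def __init__(self, raw):
            self._raw = raw

        @property
        def project(self):
            return self._raw["change"]["project"]

        @property
        def change_number(self):
            return self._raw["change"]["number"]

        @property
        def patchset(self):
            return self._raw["patchSet"]["number"]

        @property
        def comment(self):
            # Only present on comment-added events.
            return self._raw.get("comment", "")

Call sites would then take a GerritEvent instead of a bare dict, which keeps knowledge of the event's key layout in one place.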
fungiewindisch: as it stands there's still a whole stack of patches against nodepool, devstack-gate and infra/config to get xenserver testing working. we haven't even had time to look at them as far as i'm aware (much to the displeasure of the xenserver devs)19:21
clarkbfungi: did they ever respond to the first round of review on those?19:22
fungiclarkb: i believe so, but i've been too busy to look through them again19:22
*** nati_uen_ has quit IRC19:22
clarkbfungi: I was really curious what the feedback would be but the changes sat idle and were auto-abandoned19:22
mriedemjgriffith: this is iscsi related and 1/17: https://github.com/openstack/cinder/commit/a9267644ee09591e2d642d6c1204d94a9fdd8c8219:22
*** annegent_ has quit IRC19:22
*** markmcclain has joined #openstack-infra19:23
ewindischsdague: everyone else being "VMware" and "Citrix" i.e. https://www.google.com/finance?q=ctxs and https://www.google.com/finance?q=vmw19:23
jgriffithmriedem: eeek19:24
*** jp_at_hp has quit IRC19:24
jog0ewindisch: even if we wanted to we don't have the resources right now19:24
mriedemjgriffith: i'm not familiar with that code, but does that look like it could cause races? like premature return from snapshot from volume when it's not ready?19:24
ewindischI understand, but I'm going to have to sync with russellb and samalba about this.19:25
jgriffithmriedem: indeed, I believe it could19:25
jgriffithmriedem: looking now19:25
jgriffithmriedem: I believe you're correct19:28
jog0so I have a question that I am not sure how to answer: do we think dropping tempest concurrency down to 2 increased the number of patches we are able to merge into openstack/openstack in a given amount of time19:28
jgriffithmriedem: I'll spin it up here and take a look after I finish what I'm in the middle of now19:28
*** thomasem has quit IRC19:28
clarkbjog0: no19:28
clarkbI think it significantly impacted the backlog19:28
jog0I count 7 patches in last 24 hours19:29
jog0clarkb: perhaps we should consider reverting the patch19:29
clarkbin the opposite direction, but I have no hard data to support that19:29
clarkbbecause tests are taking up to 1.33 hours now instead of .70 hours or wherever they were before19:29
mriedemjgriffith: cool, thanks19:30
*** yolanda has quit IRC19:30
russellbtaking 1.33 hours more reliably is better than 0.7 hours with random failures all over the place due to pegging the CPU the entire time19:30
fungias discussed in #nova, i'm going to promote https://review.openstack.org/67914 to the head of the gate pipeline. the result will be that everything in the check pipeline as of now will get new nodes first, and then that change will get a shot at fixing a substantial percentage of our gate resets19:30
sdagueso I actually think the concurrency did make things better19:30
russellbit's really just a non-starter to run the tests with CPU over the top19:30
clarkbrussellb: it isn't more reliable though19:30
sdagueclarkb: sure19:31
*** SergeyLukjanov is now known as SergeyLukjanov_a19:31
russellbthe failures are just other things right now19:31
sdaguebut it's more reliable19:31
sdagueso I'm -1 to going back to 4x19:31
russellbit eliminates a whole class of failures19:31
sdagueagree with russellb19:31
clarkbwere those failures just masking all of these failures?19:31
sdagueclarkb: possibly19:31
clarkbwe are still essentially worst-casing the gate queue, which is where we were before19:32
clarkbso the gate queue isn't more reliable19:32
*** SergeyLukjanov_a is now known as SergeyLukjanov_19:32
sdaguewe were also in deep gate queue19:32
jog0clarkb: http://status.openstack.org/elastic-recheck/ the graph at the top looks very wrong19:32
sdagueso we're basically driving a rover on mars19:32
clarkbI think we had what 30 changes merge over a day recently19:32
clarkbjog0: looks like graphite problems19:32
jog0clarkb: yeah19:32
jog0so merge rates: http://paste.openstack.org/show/61594/19:32
sdagueclarkb: yeh, friday -> sat was about 30 in 24hrs19:32
sdagueI also expect what happened is that in dropping concurrency we had some tests move around19:33
*** _david_ has joined #openstack-infra19:33
sdagueso we go new overlaps19:33
sdaguegot new overlaps19:33
fungihttp://git.openstack.org/cgit/openstack/openstack/log/ shows 4 commits in the past 22 hours, one of which i force-merged without putting through the gate19:33
sdaguewhich exposed a few new issues19:34
clarkbI need to run back to regularly scheduled holiday programming19:34
fungik19:35
*** SergeyLukjanov_ is now known as SergeyLukjanov19:35
jog0spot checking shows these numbers appear to be common19:36
jog0merges per day is below 4519:37
*** emagana has joined #openstack-infra19:37
sdaguejog0: you need to only count merge commits19:37
sdagueotherwise the timing is off19:37
sdaguefilter by author jenkins19:37
jog0https://github.com/openstack/openstack/graphs/commit-activity19:38
sdaguejog0: right, but we have 2 commits per commit19:38
lifelesssdague: so 3 in total?19:39
*** markmcclain has quit IRC19:39
sdague:P19:39
*** HenryG has quit IRC19:39
sdaguejog0: anyway if you add --author=jenkins to your git commands it will be close19:39
sdagueit will double count translations19:39
sdaguebut that's pretty minor19:39
jog0I don't see any doubles and translations are merges19:40
jog0anyway19:40
sdaguejog0: oh, github is filtering merges19:41
sdaguebut on your pastebin19:42
jog0sdague: yeah I forgot about github, they have pretty pictures19:42
jog0anyway data looks inconclusive to me19:44
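A quick sketch of the counting sdague suggests a few lines up: only merge commits authored by Jenkins mark when something actually landed, so filter on those. It assumes a local clone of openstack/openstack and git on the path; the translation-commit double counting he mentions is ignored here.

    import subprocess

    def merges_last_day(repo_path):
        """Count merge commits by Jenkins over the last 24 hours."""
        out = subprocess.check_output(
            ["git", "log", "--merges", "--author=jenkins",
             "--since=24 hours ago", "--oneline"],
            cwd=repo_path)
        return len(out.decode().splitlines())

    # e.g. merges_last_day("/path/to/openstack/openstack")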
jog0do we know why deletes take so long in nodepool btw?19:44
clarkbbecause cloud. deletes are expensive19:45
fungijog0: in particular, rackspace likes to ignore them19:45
fungiso we keep spamming them with delete calls until they finally free up the node19:45
jog0fungi: ahh19:46
fungihpcloud doesn't ignore them as much, just takes a long time to act on them19:46
*** praneshp_ has joined #openstack-infra19:46
jog0are deletes slow in openstack in general?19:46
fungii suspect it depends on the load in your cloud19:46
jog0can we complain to RAX and HP cloud about it?19:46
*** CaptTofu has quit IRC19:46
fungii have this running on nodepool.o.o for the past 18 hours or so, but it hasn't seemed to make any difference: https://review.openstack.org/6772319:47
sdaguejog0: they'll probably just complain back to us to clean up nova :)19:47
jog0we can do that19:47
jog0but  the nodepool plots are just sad19:48
fungiclarkb: i was wanting to ask on 67723, does that need a yield in the outer loop too?19:48
lifelesshah, devstack-gate really wants a lot of variables and node state :/19:48
*** praneshp has quit IRC19:48
*** praneshp_ is now known as praneshp19:48
sdaguejog0: honestly, that's what swapping looks like. We've just got a working set far too large for our resources, so now we're swapping19:50
lifelessfungi: no, its just broken19:50
lifelessfungi: reviewing it now19:50
fungisdague: well, the providers also do take waaaay too long to act on delete calls from us19:50
fungilifeless: okay, thanks. it's a bit over my head i'm afraid19:50
jog0mordred: ^ can you look into the HP side of this19:50
*** markmcclain has joined #openstack-infra19:51
sdaguejog0: I think that's a good long term conversation, I don't see that helping us over the hump19:51
*** gokrokve has quit IRC19:53
openstackgerritMichael Krotscheck proposed a change to openstack-infra/config: Use nodeenv via tox to do javascript testing  https://review.openstack.org/6772919:53
*** gokrokve has joined #openstack-infra19:53
jog0sdague: agreed19:53
*** fifieldt has quit IRC19:53
*** yolanda has joined #openstack-infra19:54
openstackgerritMichael Krotscheck proposed a change to openstack-infra/config: Use nodeenv via tox to do javascript testing  https://review.openstack.org/6772919:55
*** rnirmal has joined #openstack-infra19:55
lifelessfungi: so nodepool is regular python19:55
lifelessfungi: threads, not eventlet19:55
*** marun has quit IRC19:55
*** westmau5 is now known as westmaas19:55
fungilifeless: thanks! i'm far more used to hacking on single-threaded applications19:55
lifelessfungi: at least, AFAICT19:55
lifelessfungi: anyhow, have a look at task_manager.py - you can see that run() is single threaded19:55
lifelessfungi: it pulls a work item off of a queue, processes it, and continues.19:56
lifelessfungi: it's not using a thread *pool*, so making the time to process a single item longer (e.g. up to 10 minutes!) will delay operating /all/ the tasks in the queue19:56
*** HenryG has joined #openstack-infra19:57
lifelessfungi: I'll work up an alternative patch for you19:57
fungilifeless: well, it was 10 minutes before, but having the outer loop be 10 minutes rather than the iterate_timeout() loop may make it less of what i meant, agreed19:57
lifelessfungi: I think you're missing my point :(. All deletes occur in a single thread.19:58
*** gokrokve has quit IRC19:58
fungii pondered running two layers of iterate_timeout() inside each other there19:58
lifelessfungi: waiting in that thread for a delete to occur makes all other deletes slower.19:58
*** _ruhe is now known as ruhe19:59
fungilifeless: you mean originally, or only with that patch19:59
*** AaronGr is now known as Aarongr_afk19:59
lifelessfungi: in both cases its all single threaded19:59
fungigot it19:59
lifelessfungi: because its in the JenkinsManager TaskManager queue19:59
lifelessfungi: your patch increases how long a specific delete takes, but does so by not deleting anything else for that period... because it's single threaded20:00
fungiokay, so the yield in iterate_timeout() doesn't really allow anything helpful anyway20:00
lifelessthe yield in iterate_timeout is an entirely separate discussion20:00
lifelessits because its a generator, so its needed20:00
fungioh, right20:00
* fungi sighs at his absent-mindedness20:01
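For readers following along: iterate_timeout in nodepool is a generator that hands back loop iterations until a deadline passes, which is why the yield lifeless mentions has to stay. Roughly along these lines (a from-memory paraphrase, not the exact upstream code):

    import time

    def iterate_timeout(max_seconds, purpose):
        """Yield an increasing count until max_seconds elapse, then raise.

        Callers loop over this and break out once whatever they are
        waiting for (e.g. a server disappearing) has happened.
        """
        start = time.time()
        count = 0
        while time.time() < start + max_seconds:
            count += 1
            yield count
            time.sleep(2)
        raise Exception("Timeout waiting for %s" % purpose)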
lifelessthere's also a 1 second gap between tasks by default20:03
lifelessI'm not at all sure that makes sense20:03
lifelessif you have more than 60 actions a minute, it will backlog20:03
*** zanins has joined #openstack-infra20:03
* lifeless makes a mental note to ask jeblair about that20:04
lifelessit may be working around broken API ratelimits on small clouds20:04
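A schematic of the rate-limited, single-threaded task manager lifeless is describing: one thread drains a queue of API tasks and enforces a minimum gap between them, so at a 1-second rate anything past 60 tasks a minute backlogs, and a slow task delays everything queued behind it. Names and details are simplified, not the real nodepool code.

    import queue
    import threading
    import time

    class TaskManager(threading.Thread):
        """Sketch: serialize API tasks through one rate-limited thread."""

        def __init__(self, rate=1.0):
            super(TaskManager, self).__init__()
            self.rate = rate            # minimum seconds between tasks
            self.tasks = queue.Queue()
            self._last = 0.0

        def submit(self, func):
            self.tasks.put(func)

        def run(self):
            while True:
                func = self.tasks.get()                # one task at a time
                wait = self.rate - (time.time() - self._last)
                if wait > 0:
                    time.sleep(wait)                   # ~60 tasks/minute at rate=1
                func()                                 # a slow task stalls the whole queue
                self._last = time.time()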
jog0lifeless: that may be why deletes are so slow right now20:04
fungijog0: well, they were equally as slow before i tried that20:04
lifelessso one simple thing to try would be to set rate to 0.5 or something20:05
jog0fungi: the 1 second gap?20:05
fungijog0: oh, i thought you meant the extra loop20:05
russellbit would probably backlog earlier than 60 per minute20:06
*** marun has joined #openstack-infra20:06
*** bermut has joined #openstack-infra20:06
jog0well we definitely have more than 60 nodes in nodepool and many are in delete20:07
russellbso i wonder if we should just put a hard limit on how many changes are tested in parallel in the gate queue20:07
russellbthat would help node thrashing on resets20:07
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Provide diagnostics when task rate limiting.  https://review.openstack.org/6792420:07
lifelessfungi: so I'd back the aggressive delete out, and apply ^20:07
lifelessfungi: I haven't /tested/ that patch yet, however20:07
fungihowever the 600-second timeout in the cleanup method was being hit fairly regularly, which was similarly backing up the other delete actions from what i saw before20:08
jog0russellb: looking at status.o.o/zuul we don't get to test that many in parallel in gate20:08
lifelessfungi: that has to go too20:08
lifelessfungi: I bet thats an attempt to avoid quota overuse20:08
russellbjog0: right now yeah ...20:08
jog0because we are resource starved, in fact the top of the gate isn't getting run20:08
jog0russellb: right now yeah20:08
openstackgerritMichael Krotscheck proposed a change to openstack-infra/config: Genericize javascript release artifact creation  https://review.openstack.org/6773120:09
lifelessoh, wow20:09
lifelessso this code doesn't clearly signal what is within a task and what is not20:09
lifelessthat 600 second wait actually does a cross-thread block20:09
jeblairfungi, lifeless: easy thing to help with deletes is to increase the 600 second delete timeout (maybe 1 hour)20:09
*** gokrokve has joined #openstack-infra20:09
lifelessjeblair: *increase* it ?20:10
lifelessjeblair: what is the 600 second timeout for ?20:10
jeblairfungi, lifeless: because it turns deletes from parallel operations into serial ones20:10
jog0right now we have 150 nodes in deleting state or so20:10
jeblairlifeless: to avoid having lots of threads waiting around "forever" for something that isn't going to happen20:11
lifelessjeblair: I mean, why wait at all ?20:11
jeblairlifeless: occasionally cloud providers never delete nodes20:11
lifelessjeblair: the code doesn't take any action if it's not deleted.20:11
lifelessjeblair: other than raising an exception20:12
fungiyeah, we now have two stuck in a active(deleting) state in hpcloud-az2 which i have manually cleaned out of nodepool so that it doesn't keep trying and failing to delete those20:12
jeblairlifeless: good point; it should probably delete, wait 5-10 minutes, then delete again20:12
openstackgerritDavanum Srinivas (dims) proposed a change to openstack-infra/devstack-gate: Temporary HACK : Enable UCA  https://review.openstack.org/6756420:12
jeblairlifeless: oh, but it does set the node state, right?20:12
lifelessjeblair: not that I can see, I'm just tracing the code atm20:13
jeblairlifeless: there is an action if it does succeed -- it deletes the node from the db20:13
*** bermut has quit IRC20:13
jeblairlifeless: so that's what it's waiting for20:14
lifelessjeblair: so I think we should decouple those things20:15
lifelessjeblair: not wait, instead set a state DELETING20:15
*** jasondotstar has quit IRC20:15
lifelessjeblair: and in the periodic check if the server is gone, delete from db, if its not submit a delete again20:15
sdaguewelcome back jeblair20:16
dstufftofftopic, but I need to ask someone a pbr question and i don't see a pbr specific channel ;P anyone mind if I PM them? Or tell me if there's a better channel :D (sorry to bother y'all)20:16
*** thomasem has joined #openstack-infra20:17
dansmithdstufft: it's easy. pull the tab to open the spout, chug it, recycle the can when done20:17
lifelessjeblair: in fact, nodedb.DELETE appears to be for this already, just the surrounding code isn't quite aligned20:17
dstufftdansmith: :D20:17
jeblairlifeless: i think the behavior you described is the problem we're trying to fix20:18
jeblairso the thing we want to deal with is that rackspace (apparently) ignores deletes and takes a long time for them to run20:18
lifelessjeblair: yes, exactly20:18
lifelessjeblair: or are we talking at cross purposes ;)20:18
*** jcoufal has joined #openstack-infra20:18
jeblairdeleting nodes is parallel normally, but after the 10 minute timeout, the parallel thread exits and the only chance for it to be deleted is the serialized periodic task20:19
jeblairso overall, i would expect that process to be slower.20:19
jeblairthe periodic task should not be where the bulk of work happens, it should be where the stuff that falls through the cracks eventually gets cleaned up20:19
jeblairso i think we need to change nodepool to match what's actually happening with clouds20:19
jeblairwhich is that deletes can take longer than 10 minutes normally20:20
jeblairso step 1 is to increase the 10 minute timeout for deletes20:20
lifelessjeblair: I may be misunderstanding someting here, is deleteNode where the parallel deletes come in?20:20
jeblairand if we think that rax is ignoring delete api calls, then we should have it send more of them (step 2)20:20
lifelessjeblair: so the theory is that we're stuck on the quota because rax aren't deleting ?20:21
jeblairlifeless: yeah, or deleting very slowly20:21
jeblairlifeless, fungi: if we're hitting the 10 minute delete timeout and then later the periodic task is successfully deleting rax nodes, then what i've described is accurate20:22
lifelessok, so I see20:22
lifelessNodeCompleteThread20:22
jeblairfungi: i haven't checked the logs recently, is that the case?20:22
lifelessis started per-node20:22
jeblairlifeless: right20:22
*** SergeyLukjanov is now known as SergeyLukjanov_20:22
fungijeblair: yes, that's what i've been seeing. mostly in ord20:23
lifelessjeblair: so what I want to do is remove the 10m block, let the node complete wrap up quickly and let the periodic check also run quickly20:23
lifelessthen run the periodic check more often20:23
jeblairlifeless: it's not a block20:23
jeblairlifeless: because it's a thread-per-node, it doesn't block anything else20:24
lifelessjeblair: Clearly I'm misunderstanding the code; I see deleteNode -> cleanupServer -> getServer -> submitTask20:24
lifelessjeblair: the periodic code also calls cleanupServer, so it blocks that thread20:25
lifelessjeblair: no ?20:25
mikalMorning20:25
jeblairlifeless: all the manager tasks are fast20:26
jeblairlifeless: they are just nova api calls20:26
lifelessjeblair: except cleanupServer20:26
jeblairlifeless: serialized across multiple threads20:26
lifelessjeblair: not for the periodic cleanup20:27
lifelessjeblair: unless I've misunderstood TaskManager20:27
jeblairlifeless: (periodic cleanup is just one of the threads submitting tasks)20:27
jeblairlifeless: the cleanupServer method is slow, but it doesn't block anything else20:27
jeblairlifeless: it submits a series of tasks to the manager20:27
mikalWhat does a check time-in-queue time of 4 hours 18 minutes mean? That there weren't enough workers to start running the test immediately when it was enqueued?20:27
*** senk1 has quit IRC20:28
jeblairlifeless: it's a sort of convenience wrapper around the series of tasks needed to delete a server20:28
lifelessjeblair: and the manager is a single thread with a Queue.Queue20:28
notmynamemikal: not just workers, but also patches previous to it failing that cause a flush of the gate20:29
lifelessI see one JenkinsManager per target jenkins20:29
fungimikal: yes, currently when a gate reset happens, gate pipeline changes go to the back of the line for resource allocation and any pending check pipeline changes are getting available nodes assigned until they catch up to whatever was pending there at the time of the gate reset20:29
jeblairlifeless: so cleanupServer isn't what is run by that manager, but rather 'removeFloatingIP' 'deleteFloatingIP' 'deleteKeypair' 'deleteServer' are the actual serialized tasks20:29
*** ryanpetrello has joined #openstack-infra20:29
mikalnotmyname: this is check though, I thought that was the IndependentPipelineManager20:29
mattoliverauMorning all20:30
notmynamemikal: ah. so just what fungi said, then :-)20:30
notmynameheh. Australia has woken up ;-)20:30
mikalfungi: I am having trouble parsing that...20:30
funginotmyname: the pipeline is independent, but node allocation is on a first-come, first served basis20:30
fungier, mikal ^20:30
mikalOh, so a gate flush eats all the nodes that check would use?20:30
mikalSo check starves for a while?20:30
fungimikal: more or less. when there are enough nodes to go around you don't see this. when we run out of available nodes we get into a situation where the pipelines take turns20:31
mikalOk, fair enough20:31
mgagnezaro: ping20:31
fungiand it escalates, because each pipeline is accumulating new changes faster than it can serve them20:31
mikalSo... Should I go to the node shop and bring you back some more quota?20:32
fungis/serve/service/20:32
fungimikal: yes, a thousand standard.large would do nicely ;)20:32
jeblairlifeless: so the actual blocking parts of the manager are the methods that do 'self.submitTask(something)'20:32
lifelessjeblair: I'm not sure I believe you. periodicCleanup->cleanupOneNode->deleteNode->manager.cleanupServer20:32
mikalfungi: this is actually a serious question... Would asking rackspace for more test node quota actually get you out of trouble?20:32
jeblairlifeless: cleanupServer as a whole is not blocking20:33
jeblairlifeless: there's no thread lock around it or anything20:33
lifelessjeblair: it won't return until the server is deleted20:33
fungimikal: i got the impression mordred was already asking rackspace for more quota, so might want to confirm with him (and reinforce as needed)20:33
*** ociuhandu has joined #openstack-infra20:33
jeblairlifeless: that is correct20:33
lifelessjeblair: because getServer does a wait on the task20:33
mordredjeblair: yay!20:33
*** gsamfira has quit IRC20:33
*** rfolco has quit IRC20:33
lifelessjeblair: periodicCleanup will be blocked20:33
jeblairmordred: don't be too happy20:33
mikalmordred: you chasing rackspace for more quota?20:33
mordredsorry , that should have been "yay, it's jeblair"20:33
mordredmikal: yes20:33
jeblairmordred: i'm quite sick20:33
mordredjeblair: oh no!20:33
mordredjeblair: you need me to bring you soup? I can do that now ...20:34
*** gokrokve has quit IRC20:34
fungijeblair: you brought something more than your luggage back from perth, i take it?20:34
lifelessjeblair: I *think* you might be saying 'node deletes when jobs finish will still be attempted' - and sure, I agree.20:34
jeblairmordred: thanks!  but i don't want you to get sick20:34
lifelessjeblair: I'm talking about making the whole set of cleanup things accommodate rax better20:34
*** gokrokve has joined #openstack-infra20:34
*** andreaf has joined #openstack-infra20:34
*** ociuhandu has quit IRC20:34
jeblairlifeless: so am i.20:34
lifelessjeblair: but I want to be sure I understand the code; and when you say 'wont be blocked' while I'm specifically talking about the periodic cleanup, I'm thoroughly confused.20:35
jeblairlifeless: oh yes, the periodic cleanup _will_ be blocked.20:35
*** gokrokve_ has joined #openstack-infra20:35
jeblairlifeless: you're quite right there, and i think you understand correctly.20:35
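To make the distinction they just settled on concrete: completion handlers get a thread per node, so their waits overlap, while the periodic cleanup walks leaked nodes in a single thread and waits on each one in turn. A schematic sketch only; the node and provider methods here are placeholders, not nodepool's real API.

    import threading
    import time

    def cleanup_server(provider, node, timeout=600):
        """Ask the provider to delete node, then poll until it is gone."""
        provider.delete(node)
        deadline = time.time() + timeout
        while time.time() < deadline:
            if not provider.exists(node):
                node.delete_from_db()
                return
            time.sleep(2)
        raise Exception("Timed out deleting %s" % node)

    def on_node_complete(provider, node):
        # One thread per finished node: these waits happen in parallel.
        threading.Thread(target=cleanup_server, args=(provider, node)).start()

    def periodic_cleanup(provider, leaked_nodes):
        # Single thread: one slow provider delete holds up all the rest.
        for node in leaked_nodes:
            cleanup_server(provider, node)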
lifelessjeblair: so my point about this was that if we *stop waiting* in the nodecomplete handler20:35
lifelessjeblair: *and* stop waiting in the periodic cleanup20:35
lifelessjeblair: *then* we can retry across all the pending deletes more often20:36
*** malini has left #openstack-infra20:36
lifelessjeblair: without adding a raft of new threads or anything20:36
fungiwithout a raft, i'll never get off this island20:36
jeblairlifeless: if you don't wait at all then you only give the provider 1 second to delete a node before you ask it to again.20:36
fungithough palm trees might do a sight better than threads20:36
*** markwash has joined #openstack-infra20:36
lifelessjeblair: We do periodic deletes 1/ second ?20:36
jeblairlifeless: not at the moment20:37
jeblairlifeless: are you suggesting that we leave the periodic interval as-is, every 5 minutes?20:38
jeblairlifeless: then minimum time to delete a node will be 5 mins20:38
*** talluri has joined #openstack-infra20:38
lifelessjeblair: lets say we set it to 30 seconds20:38
lifelessjeblair: then if the cloud deletes on request, it will be deleted by nodecompletehandler20:38
jeblairlifeless: nodepool won't notice it until the next periodic run though since you aren't waiting for it20:39
lifelessjeblair: if the cloud doesn't delete it on the first request, up to 30 seconds later we will try from periodic, and every 30s thereafter20:39
*** gokrokve has quit IRC20:39
lifelessjeblair: I don't mean 'don't try' I mean 'don't block if it does not go away immediately.20:39
jeblairlifeless: it never goes away immediately20:39
jeblairlifeless: even the fastest cloud provider takes a little while (many seconds-minutes) to delete a node20:40
lifelesssure20:40
fungion a good day, novaclient reports my hpcloud vms gone after 10 seconds and rackspace after more like 6020:40
lifelessdo nodes in state DELETE count towards the max-servers count ?20:40
jeblairlifeless: yes20:40
lifelessah20:40
lifelessjeblair: so is 30 seconds a reasonable time to wait to find out if the cloud deleted the node ?20:40
jeblairlifeless: apparently 10 minutes isn't long enough20:41
lifelessjeblair: I know, but I'm looking at the nodepool state changes from what I'm proposing20:41
fungii don't think any rackspace deletes will work in a 30-second timeframe. maybe one on occasion, but unlikely20:41
lifelessthey seem to be that *if* a cloud reacts quickly, we change from finding out at 2/4/6/8 (iterate_timeout) seconds20:41
lifelessto finding out at 30+ seconds20:41
lifelessin fact, right now we do nodes in state DELETE /2 API checks a second20:42
lifelessso we could make the periodiccheck run every 2 seconds20:42
jeblairlifeless: i think there are two ways of fixing this: i propose that we adjust the parallel delete strategy to match current reality, you propose going to all-serial.20:42
lifelessand it would be the exact same API traffic20:42
*** gbrugnago has joined #openstack-infra20:42
*** dcramer_ has joined #openstack-infra20:42
lifelessjeblair: yes; though actually I wasn't intending to block there; I was more aiming at a centralised view20:43
lifelessjeblair: s/block/stop/20:43
lifelessjeblair: anyhow, now I understand more of the design - thanks - I can see why increasing the timeout will help - *as long as nodepool isn't restarted*20:43
lifelessjeblair: but when it's restarted everything will become dependent on the periodic cleanup, so I think making that much more effective is important20:44
*** NikitaKonovalov_ is now known as NikitaKonovalov20:45
fungiunder present volume, i've had to resort to ungracefully restarting nodepool and cleaning up the mess20:45
jeblairlifeless: agreed; perhaps adjusting the timeout for parallel operation and reducing it for periodic cleanup would be best20:45
jeblairs/adjusting/increasing/20:45
*** markmcclain has quit IRC20:46
lifelessjeblair: so, what about eliminating the timeout, going all serial as I proposed, but then introducing concurrency in the periodic cleanup - e.g. worker threads there to scatter-gather at some defined concurrency20:47
lifelessjeblair: this would get the same performance for live deletes and make after restart better too, without needing two different codepaths20:47
*** dcramer_ has quit IRC20:47
lifelessjeblair: oh, I just had a possible insight20:48
lifelessjeblair: one form of rate limiting is to discard requests that are over the threshold20:48
lifelessjeblair: how many nodes do we try to delete at once at peak ?20:48
*** derekh has joined #openstack-infra20:48
lifelessI'm guessing hundreds20:48
jeblairlifeless: yes. sometimes the entire quota.20:49
lifelessso what if our basically random api calls result in basically random things being actioned and the rest dropped20:49
lifelessbeing non-blocking-serial (e.g one api call to delete each server before we probe for any of them, then probe all once, then delete all once, in a loop)20:49
lifelesswould give *much* better behaviour with such rate limiters20:50
lifeless-> doctors20:50
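A sketch of the non-blocking-serial loop lifeless outlines just above: issue a delete for every pending server, then probe them all once, and repeat, rather than sitting in a long wait on any single node. The provider methods are placeholders and the 30-second poll interval is only an example.

    import time

    def drain_deletes(provider, nodes, poll_interval=30):
        """Delete a batch of nodes without blocking on any single one."""
        pending = set(nodes)
        while pending:
            for node in pending:
                provider.delete(node)        # re-ask; harmless if already underway
            time.sleep(poll_interval)
            for node in list(pending):
                if not provider.exists(node):
                    pending.discard(node)    # gone: stop tracking it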
mordredmikal: I have a thread going with pvo20:50
lifelessjeblair: I will prepare a patch after my dr visit so we can discuss code20:51
jeblairlifeless: i'm going to be semi-responsive for a while20:51
jeblairdue to illness and other schedule issues20:52
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox  https://review.openstack.org/6772120:52
jeblairfungi: is there anything else urgent i can help with?  otherwise i'm going to go sleep20:52
*** mrda has joined #openstack-infra20:53
fungijeblair: go to sleep20:53
russellbjeblair: hope you feel better soon!  health more important :)20:53
jeblairrussellb: thanks20:53
fungiwe're handling it. really most of the issues are volume+openstack bugs20:53
sdaguejeblair: yeh, hope you feel better soon20:54
jeblairsdague: thanks20:54
fungidefinitely. the sooner you're well, the more we'll get accomplished20:54
jeblairfungi: i don't think i'm well enough to go to utah, i'll try to join in by phone or something20:55
fungijeblair: my flight's through baltimore tomorrow, so there's every chance i could get stuck in maryland instead ;)20:55
jeblairbut hopefully that will give me a chance to get better and pitch in later this week, and hopefully still go to brussels20:55
mordredjeblair: ++20:55
mordredjeblair: and seriously- I'm sure you're covered, but let me know if I can be helpful20:56
*** yolanda has quit IRC20:56
jeblairmordred: cool, thanks20:56
*** rnirmal has quit IRC20:56
*** ociuhandu has joined #openstack-infra20:57
*** zanins has quit IRC20:57
*** aburaschi has quit IRC20:59
sdaguemordred: hey, tox question, because I'm abusing it for doing something unnatural20:59
sdagueis there an easy way to catch and pass a ^C through tox to the underlying thing it was running?21:00
mordredhrm21:00
mordredsdague: no idea21:00
*** DinaBelova is now known as DinaBelova_21:00
sdagueok, no big deal21:00
fungirussellb: sdague: the nova fix is getting nodes now21:02
russellbyeah just saw that21:02
fungi~1 hour to results21:02
*** dcramer_ has joined #openstack-infra21:03
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: objectify the gerrit event for our purposes  https://review.openstack.org/6794121:03
sdaguesweet21:03
sdaguefingers crossed21:03
sdagueif you watch on -qa you can see that 680 is coming back a lot21:04
*** ociuhandu has quit IRC21:05
*** CaptTofu has joined #openstack-infra21:05
sdaguenow lets hope it doesn't fail on one of the other races21:05
*** misskitty has joined #openstack-infra21:05
fungiclarkb: prelim results from 67911... http://paste.openstack.org/show/61596/ (seems to work as intended)21:07
fungiif there's any ipv6 ra monkeybusiness at that point in time, we should be able to identify it now21:08
fungi(...and knowing's half the battle)21:08
clarkb++21:09
fungionce check results come back, i say we just approve it into the gate normally and then can promote it or force-merge as necessary if the frequency increases substantially21:10
*** dprince has quit IRC21:10
*** max_lobur is now known as max_lobur_afk21:10
fungiotherwise just let the gate take its course21:10
clarkbsounds good21:11
fungii haven't see enough of these yet to suggest it's killing us21:11
fungiseen21:11
clarkbya21:11
*** jaypipes has quit IRC21:12
*** talluri has quit IRC21:14
*** misskitty has quit IRC21:14
openstackgerritDerek Higgins proposed a change to openstack-infra/config: Enable precise-backports on tripleo test nodes  https://review.openstack.org/6795821:16
*** gbrugnago has quit IRC21:17
*** kirukhin has joined #openstack-infra21:17
ewindischrussellb: it seems to me that vmware is only complying with the "group b" functional testing requirement on changes that affect their driver directly... is that okay?21:17
*** dcramer_ has quit IRC21:17
*** senk has joined #openstack-infra21:17
*** salv-orlando has quit IRC21:18
*** salv-orlando has joined #openstack-infra21:18
*** smarcet has left #openstack-infra21:21
dansmithewindisch: have you read the guidelines?21:24
ewindischdansmith: which? I've read https://wiki.openstack.org/wiki/HypervisorSupportMatrix21:24
dansmithewindisch: https://wiki.openstack.org/wiki/HypervisorSupportMatrix/DeprecationPlan21:24
dansmithewindisch: and the bit on the matrix page says "group c will  be deprecated"21:25
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: objectify the gerrit event for our purposes  https://review.openstack.org/6794121:25
ewindischdansmith: yes, I know that... which is why I'm trying to get into group B ;-)21:25
ewindischdansmith: I need to re-review the click-through for DeprecationPlan21:25
russellbright, A and B are fine21:26
russellbi expect most to end up in B21:26
dansmith(for now)21:26
russellbA is ideal21:27
russellbB acceptable21:27
ewindischrussellb / dansmith: the problem is that running our own gating infrastructure for every change is quite an undertaking. I had originally thought this could run in upstream CI21:27
*** CaptTofu has quit IRC21:28
dimsewindisch, the folks working on the vmware driver are on #openstack-vmware channel if you have questions for them as well - fyi21:28
dansmithewindisch: yeah, that's why most people got started early21:28
ewindischrussellb: well, it sounds like A -- which is what I'd prefer to implement -- is not acceptable to the openstack-infra team, based on conversations earlier today21:28
russellbyeah, this has been set since before the driver was merged21:28
russellbwell ... it's just that the timing is bad21:28
dansmithewindisch: most of them can't run in upstream infra, so you have a major advantage that you can even do that21:28
ewindischdims: the question was more to russell, "does vmware qualify as B considering it doesn't run on every proposed change to nova"?21:28
russellbit *will* be running on every change21:29
dansmithewindisch: they're working on that21:29
sdagueewindisch: you can't come to infra at i2 and ask for implementing additional hypervisor in upstream ci21:29
russellbthat's their plan21:29
ewindischrussellb: gotcha21:29
sdagueif we'd had a session at icehouse summit, it would be something worth discussing21:29
sdaguewhich is why I said -1, bring to juno summit21:29
russellbfwiw, supporting docker in existing CI is way easier than anything else21:29
*** dcramer_ has joined #openstack-infra21:29
sdaguerussellb: agreed21:30
ewindischrussellb: agreed.21:30
ewindischsdague: is it about human resources or hardware resources?21:30
russellbbut yeah, have to be sensitive to infra priorities based on the status of things21:30
sdagueewindisch: right now, both21:30
dimsewindisch, right. i was responding to "running our own gating infrastructure". you can get an idea from them if you wanted to :)21:31
ewindischdims: ah21:31
*** jhesketh_ has joined #openstack-infra21:33
openstackgerritAndreas Jaeger proposed a change to openstack-infra/config: Early abort documentation builds  https://review.openstack.org/6772221:33
*** ruhe is now known as _ruhe21:34
ewindischsdague: I've worked on gate stuff before, I don't know if it will require that much human capital besides my own effort and perhaps some inquiries here on irc -- but I could be wrong.21:34
ewindischsdague: hardware is something we might be able to help with, TBD21:35
mattoliveraulifeless: In regards to speeding up the cleaning up/deleting of nodes: I don't know if this is possible yet, I've started playing, but what if we only have to build servers once (each day). That is, build a server with a main LXC container using the prepare_node.sh etc. Then every time we need a new "server" for running a test/build, the create is as simple as creating an ephemeral LXC container (of21:35
mattoliverauan existing one). This is a container that only lasts until it's turned off... so run the tests, and then the delete and clean up of a node is as simple as stopping a container. Containers run almost as fast as the machine they run on as they use the same kernel. So as long as the tests/devstack can run inside one of course, so I could be missing something here, but wouldn't this speed up21:35
mattoliverausubsequent rebuilds and deletes of each node. Just my 2 cents. But again I'm new to the project and have a huge gap in my knowledge on the environment etc.21:35
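For readers unfamiliar with the ephemeral-container idea mattoliverau is floating, here is a minimal sketch, assuming Ubuntu's lxc-start-ephemeral tool of that era (it boots an overlay-backed throwaway clone of a prepared base container); the base container name is hypothetical:

    import subprocess

    BASE = "devstack-precise-base"  # hypothetical container prepared once per day

    def spawn_test_node(name):
        # the ephemeral clone shares the base rootfs via an overlay and
        # disappears when it is stopped
        subprocess.check_call(
            ["lxc-start-ephemeral", "-o", BASE, "-n", name, "-d"])

    def delete_test_node(name):
        # "deleting" is just stopping the container; nothing to scrub or poll
        subprocess.check_call(["lxc-stop", "-n", name])

As the replies below note, the catch is that parts of what devstack exercises are not namespaced, so a container cannot fully isolate the test workload.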
*** yamahata has quit IRC21:36
ewindischsdague: one of my concerns is that pulling from the gerrit eventstream, we don't get the advantages of things like zuul and "speculative testing" that are done upstream.21:36
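A minimal sketch of what consuming the Gerrit event stream looks like for a third-party CI, assuming the standard stream-events ssh interface; the account name and run_docker_job are illustrative. A plain consumer like this sees each patch set in isolation, which is exactly the speculative cross-project testing zuul adds and that ewindisch is pointing at:

    import json
    import subprocess

    def run_docker_job(change_number, ref):
        # hypothetical placeholder: fetch the ref, run the docker driver's
        # tempest subset, and report back with `gerrit review`
        print("would test change %s at %s" % (change_number, ref))

    # Gerrit's ssh event stream is the usual third-party CI entry point
    proc = subprocess.Popen(
        ["ssh", "-p", "29418", "third-party-ci@review.openstack.org",
         "gerrit", "stream-events"],
        stdout=subprocess.PIPE)

    for line in iter(proc.stdout.readline, b""):
        event = json.loads(line.decode("utf-8"))
        if event.get("type") != "patchset-created":
            continue
        if event["change"]["project"] != "openstack/nova":
            continue
        run_docker_job(event["change"]["number"], event["patchSet"]["ref"])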
clarkbmattoliverau: containers are insufficient for our needs. pleia2 has a list of issues iirc21:36
*** nati_ueno has joined #openstack-infra21:37
sdagueewindisch: you only need to vote on check21:37
*** jhesketh__ has joined #openstack-infra21:37
clarkbthere are things that are not namespaced that openstack touches21:37
mattoliverauclarkb: ok, just a thought :)21:37
sdagueand I agree, it's not quite the same21:37
jhesketh__Morning21:37
clarkbmattoliverau: we wish they would work :)21:37
sdagueewindisch: to be pragmatic. 1) don't plan on this happening in icehouse. 2) start working on how to do it in juno, get prelim work started now 3) be prepared to do summit session on it in Atlanta21:38
*** yamahata has joined #openstack-infra21:38
russellbsdague: it's really not that complicated ... not sure a summit session should block anything21:39
ewindischsdague: meanwhile, unless we invest in external infrastructure, our driver is removed from Nova21:39
sdaguerussellb: it's a socializing thing about what the new matrix looks like21:39
russellbwe can only socialize every 6 months?21:39
clarkbso I spoke to devananda about ironic testing too. can we have nova talk to libvirt qemu, ironic, and docker and run one test21:40
sdaguerussellb: there is *so* much to be done to get us to a functioning i3 at this point given the current gate, and there are very few people to get it done21:40
ewindischinvesting in external infrastructure is not only expensive, it distracts us from making progress on getting into "group A"21:40
mikalSo, as a data point turbo hipster runs on every nova commit and isn't _that_ big21:40
*** Shrews has quit IRC21:41
mikal(21 instances, about $2,000 a month in public cloud costs)21:41
russellbmikal: cool data point21:41
dansmithyeah, awesome21:41
mikalIt sometimes gets behind, but that's mostly when dansmith does a thing21:42
russellbsdague: yeah, i get that, it's kinda late to be getting started trying to get something running, given that infra is only going to get more busy21:42
fungiand reuses a lot of upstream ci tooling21:42
mikalAnd it catches up21:42
dansmith$6k per cycle for CI testing21:42
russellbwell ... 12k21:42
mikalBall park21:42
russellb6 months :-)21:42
*** Shrews has joined #openstack-infra21:42
mikalIf we catch one production db problem a cycle, then that's easily paid for itself21:42
dansmithrussellb: only if you feel the need for the numbers to be right21:42
russellbmikal: how long do your runs take?21:42
dansmithrussellb: er, yeah, 12 :)21:42
mikalHeh21:42
mikalUmmm, about 20 minutes...21:42
russellbwe're asking for a full tempest run21:43
mikalSo a _lot_ faster than infra's CI at this point21:43
russellbright21:43
mikalSo, what's a tempest run these days? An hour?21:43
russellbyeah21:43
dansmithcertainly a tempest docker run would be way faster than kvm, no?21:43
mikalSo I guess multiply those numbers by three21:43
russellbdansmith: yes21:43
russellbbecause a tempest config that works with docker would be a small subset21:43
mikalBut yeah, I would expect containers to be a lot faster to start than vms21:43
russellbthat too21:43
russellbbut also, docker driver supports a small subset of the API21:43
dansmithyeah, both of those things21:43
ewindischdansmith: yeah, and I've thought about doing "docker in docker" so we can avoid putting any of it into VMs at all (or gating multiple tests on a single VM in parallel)21:44
openstackgerritMatt Ray proposed a change to openstack-infra/config: Chef style testing enablement and minor speed cleanup starting w/block-storage  https://review.openstack.org/6796421:44
russellbewindisch: sure, whatever works ...21:44
russellbjust ... full tempest run on every patch :)21:45
sdagueanyway, I think with what's on the infra plate at this point, I think this is too late. Especially by a team that's not contributed to anything besides their corner of the world. So start helping on generic infra so we can free up some resources, and then it becomes part of the conversation21:45
russellbwhere "full" is a bit loose21:45
sdagueevery new feature has a cost, and i2 is the wrong place to be bringing this forward21:45
dansmithrussellb: well, I think the definition is "full, for everything you support, and show your config" :)21:46
russellbdansmith: yeah21:46
sdagueI'll let jeblair contradict me when he is well, but until then, I'll play bad cop :)21:46
dansmithI would think "we're not testing anything else until we can test what we already test" would be a reasonable answer until we get out of the current mess anyway, almost regardless of what it is21:47
russellb+1  to that21:48
ewindischsdague: we're a startup, so putting a team onto openstack-ci work is really a non-starter. I've personally worked with openstack-ci stuff in the past, admittedly only where it improved my "own little corner of the world", but I'm not entirely fresh on this.21:48
boris-42mikal russellb sdague sorry, probably off topic, but we're teaching Rally to do deployments at scale21:48
* russellb stares down the top nova change in the queue21:48
boris-42I mean in 30 minutes we got 128 compute nodes21:48
russellbboris-42: huh?21:48
boris-42russellb yep, we are working on Rally21:48
boris-42russellb a thing that makes benchmarking simple21:48
boris-42russellb so the latest result: simulating a compute node (running it in LXC) requires 150MB of RAM21:49
boris-42russellb and instead of deploying each one we are actually copy-pasting it21:49
boris-42russellb probably will be interesting for catching RabbitMQ/NovaNetwork/Scheduler/DB bottlenecks21:50
boris-42without having tons of resources21:50
boris-42and a lot of $$$21:50
sdagueewindisch: again, it's about timing. you can't show up at i2, when we are under huge strains in the existing system, and say "hey guys, I want you all to pivot out the test matrix and test our hypervisor"21:51
dansmithit's not like this nova requirement is new, or anything21:51
*** NikitaKonovalov is now known as NikitaKonovalov_21:52
mikalI think you could argue as well that our obligation to existing driver users is greater than our obligation to new drivers.21:52
fungirussellb: this is not the failure mode your new change is trying to fix, right? https://jenkins01.openstack.org/job/gate-grenade-dsvm/4786/consoleText21:52
mikalWe have a duty of care to the users we currently have21:52
*** jaypipes has joined #openstack-infra21:52
dansmithfungi: no21:53
fungiokay, good. because that cropped up with the proposed fix in place21:53
dansmithfungi: I think that's "the other one"21:53
ewindischsdague: I understand that. We're conflating two issues here of human and hardware resources. I acknowledge we might need to help with both, however.21:53
*** kirukhin has quit IRC21:53
dansmithfungi: i.e. switch the 8 and 021:53
fungidansmith: it's definitely a common one, because i've hit it on several changes today21:54
dansmithfungi: yar21:54
russellbyeah, not sure what that one is yet21:54
sdagueand we've got the other issue which is why would we play favorites on containers and pick docker instead of libvirt lxc21:54
ewindischsdague: presuming we could help with hardware, are the human-side strains still too hard?21:55
russellbsdague: well ... someone is actually trying to do the work for docker, heh21:55
sdaguewhich is why I think this is a summit conversation21:55
fungidansmith: ahh, yep, the cinderclient change behind it is also failing on that21:55
sdagueewindisch: yes21:55
*** _david_ has quit IRC21:55
sdaguethe infra team is massively strained at this point21:55
fungisdague: it's not *that* bad. i did actually sleep a few hours last night21:56
sdagueand we're probably going to need to do some heads down things to get the gate to a good state for i321:56
mikalsdague: don't forget Canonical's lxc specific driver, which has been in review for a while21:56
sdagueyep21:56
ewindischsdague / dansmith: and we haven't ignored those requirements-- Docker acknowledged that the gating work had to be done and resourced the effort -- which is in part what I've been hired to accomplish.21:57
portantesdague, fungi, clarkb: FWIW, I think you guys are doing a great job, and rely on your commitment and knowledge tremendously21:57
mikalsdague: there's at least three container options at the moment21:57
fungiportante: thanks!21:57
*** thuc has joined #openstack-infra21:57
sdagueportante: thanks21:57
russellbmikal: well ... 2 in tree21:58
russellbmikal: the other one didn't even have a blueprint last i saw it21:58
fungiewindisch: a related datapoint, note that there are a stack of changes proposed to support xenserver in upstream infra, started a while back, and still being hashed over21:58
russellbso, pretty far from even needing code review IMO21:58
sdaguemikal: right, which is why I said this is a summit conversation. Because I think containers in gate is a good idea, and I think it's a community conversation we should have. It's just not a now good idea.21:58
mikalrussellb: that's true, but it exists21:58
russellbfor some definition of exists21:58
russellbnot really relevant for this discussion of driver CI right now21:59
fungirussellb: sdague: IT LIVES21:59
sdaguefungi: sweet!21:59
russellbmerged?21:59
mikalI wonder how broken a tempest run with lxc containers turned on is?21:59
ewindischfungi: I'd have to look at those changes, but my perspective is that I'd target the docker gate to have no more impact than, say, adding a postgres gate as opposed to mysql21:59
fungirussellb: well, it *will* merge once zuul wakes up and processes the result it has there21:59
sdaguerussellb: passed everything21:59
russellbmikal: well first you'd have to come up with a tempest config that only hits what it supports21:59
russellbfungi: yay21:59
dansmithwoo!22:00
russellbnow, that other damn bug ...22:00
russellbmriedem: have you fixed it yet?  :-p22:00
sdaguesome times you do get the bear22:00
sdagueon a day like today, a win like that is a good one22:00
*** rnirmal has joined #openstack-infra22:00
fungiewindisch: right, their work involved needing separate test node configurations entirely (they have to reboot for new kernels and other stuff), so conceivably less involved for docker22:00
mriedemrussellb: nope, was thinking about pushing a test patch to increase the sleep in the libvirt volume module to see if it hits after a few rechecks, but i'm open to suggestion/help22:01
russellbmriedem: was mostly kidding of course :)P22:01
mriedemjsbryant said he looked at it a bit and nothing jumped out at him from the cinder changes22:01
ewindischfungi: we just need to install a userland package and run a daemon. We no longer have any special kernel requirements (there used to be a requirement on AUFS which required a newish vanilla kernel)22:01
fungisdague: well, it's a win, but it'll be the first change to merge through normal gating in 8 hours (per the openstack/openstack commit log)22:02
sdaguefungi: I'll take anything today22:02
ewindischfungi: the only special requirement we have right now is that our package isn't in precise-backports, only trusty (14.04)... I recognize it will be easier if we can use upstream ubuntu packages that work in Precise, so I'm pressing to get a package into precise-backports ASAP22:03
fungiewindisch: or ubuntu cloud archive for precise, assuming we can work out why it's still breaking tempest runs and nova unit tests22:03
*** nati_ueno has quit IRC22:04
ewindischfungi: at present, we have our own packages for precise that live in our own private repo (signed with our own key). I recognize that's troublesome in a few ways ;-)22:04
*** ArxCruz has quit IRC22:06
*** dizquierdo has joined #openstack-infra22:06
fungiewindisch: yes, i know you definitely understand that ;)22:07
*** beagles has quit IRC22:07
*** ArxCruz has joined #openstack-infra22:09
*** marun has quit IRC22:14
*** nati_ueno has joined #openstack-infra22:14
*** nati_ueno has quit IRC22:14
dansmithrussellb: merged22:14
russellb\o/22:15
russellbgood thing every patch isn't that hard to land22:15
russellb... usually22:15
dansmithwhat's with the big gap in the failure rates graph on the e-r page?22:15
*** nati_ueno has joined #openstack-infra22:16
*** ewindisch is now known as zz_ewindisch22:16
*** zz_ewindisch is now known as ewindisch22:17
*** Ajaeger1 has quit IRC22:18
*** ewindisch is now known as zz_ewindisch22:20
*** zz_ewindisch is now known as ewindisch22:21
*** nati_ueno has quit IRC22:23
openstackgerritDavanum Srinivas (dims) proposed a change to openstack-infra/devstack-gate: Temporary HACK : Enable UCA  https://review.openstack.org/6756422:26
*** jerryz has joined #openstack-infra22:27
*** yamahata has quit IRC22:29
*** michchap has quit IRC22:30
*** michchap has joined #openstack-infra22:31
*** senk has quit IRC22:32
jerryzfungi: ping22:32
*** dcramer_ has quit IRC22:33
*** thomasem has quit IRC22:33
fungijerryz: hi there22:34
*** nati_ueno has joined #openstack-infra22:35
*** jcoufal has quit IRC22:36
*** sandywalsh has joined #openstack-infra22:36
jerryzfungi: i have a question about third party testing. if a gerrit trigger is configured for a project, will every single patchset-created event for the project trigger a third party test?22:37
fungijerryz: yes, in a normal configuration, it will22:37
jerryzfungi: even if the patch may not have anything to do with the plugin22:37
*** yamahata has joined #openstack-infra22:38
ewindischmikal: any idea how many patchsets per day on nova?22:38
*** dims has quit IRC22:39
mikalAbout 100 last I looked22:39
ewindischthanks22:39
mikalObviously around deadlines that spikes22:39
*** thuc has quit IRC22:39
fungijerryz: i'm not familiar enough with the gerrit-trigger plugin for jenkins to know whether it can filter on changes matching only specific file patterns. but as far as whether the desired result is to test on every patch, that's more of a question for the ptl who's insisting on test results (i don't know whether requirements are differing between nova, neutron and cinder driver testing)22:39
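One hedged way to approximate the file filtering jerryz is asking about, without relying on gerrit-trigger plugin features, is to query Gerrit for the files a patch set touches and skip the job when none match; the field names below follow the Gerrit 2.x ssh query output as best as can be recalled, and the account and path prefix are illustrative:

    import json
    import subprocess

    def changed_files(change_number):
        # --files needs --current-patch-set (or --patch-sets) in the ssh query API
        out = subprocess.check_output(
            ["ssh", "-p", "29418", "third-party-ci@review.openstack.org",
             "gerrit", "query", "--format=JSON", "--files",
             "--current-patch-set", "change:%s" % change_number])
        change = json.loads(out.splitlines()[0].decode("utf-8"))
        return [f["file"] for f in change["currentPatchSet"].get("files", [])]

    def touches_driver(change_number, prefix="nova/virt/docker/"):
        # skip the third-party run when the patch never touches the driver tree
        return any(path.startswith(prefix) for path in changed_files(change_number))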
*** thuc has joined #openstack-infra22:40
*** thuc has quit IRC22:40
*** dizquierdo has quit IRC22:40
jerryzfungi: do you know what the final decision is on whether to keep the third party testing +1 privilege enabled?22:43
*** senk1 has joined #openstack-infra22:43
mikalewindisch: I am lying it seems, it's closer to 20022:43
* mikal is making a graph now22:43
jerryzfungi: if a third party testing account will post a vote on every patch to the project, that would indeed require the third party ci infra to be stable.22:44
*** dkranz has quit IRC22:44
ewindischmikal: thanks22:44
mriedemewindisch: http://russellbryant.net/openstack-stats/nova-reviewers-30.txt22:44
mriedemNew patch sets in the last 30 days: 2564 (85.5/day)22:44
jerryzfungi: i mean -1 privilege22:45
*** dcramer_ has joined #openstack-infra22:46
fungijerryz: it's mostly consensus from the project it's voting on. there are some clarifications to the guidelines being proposed at https://review.openstack.org/6347822:47
*** jasondotstar has joined #openstack-infra22:50
*** carl_baldwin has joined #openstack-infra22:50
*** nati_ueno has quit IRC22:52
*** nati_ueno has joined #openstack-infra22:54
*** dims has joined #openstack-infra22:54
lifelessok back22:57
lifelessfungi: clarkb: where are we at with exhaustion ?22:57
sdaguedansmith: on top? graphite fell over22:58
fungilifeless: i've reverted to running my manual auxiliary nodepool delete loops from the cli to keep the stale deletes minimized22:58
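Roughly what such a manual loop amounts to, sketched around the nodepool CLI (`nodepool list` / `nodepool delete`); the column positions in the list output are an assumption and would need adjusting:

    import subprocess
    import time

    ID_COL, STATE_COL = 1, 10   # assumed positions in the `nodepool list` table

    while True:
        # re-issue deletes for anything stuck in the "delete" state
        for line in subprocess.check_output(["nodepool", "list"]).splitlines():
            fields = [f.strip() for f in line.decode("utf-8").split("|")]
            if len(fields) > STATE_COL and fields[STATE_COL] == "delete":
                subprocess.call(["nodepool", "delete", fields[ID_COL]])
        time.sleep(60)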
dansmithsdague: ah, okay22:59
sdaguebecause, you know, we didn't have enough things breaking today :)22:59
fungii missed the graphite outage. who wound up fixing that?23:00
dansmithsdague: just sucks to be able to see the change, if any, from the recent merge, which is why I was asking23:00
*** jasondotstar has quit IRC23:00
sdagueyeh23:01
sdaguehonestly, it takes a while to build up data anyway23:01
*** carl_baldwin has quit IRC23:01
sdagueand I'm less trusting of the graphite numbers after I found that some of our interrupts get reported as fails23:02
fungithat's something i think would have to be addressed in jenkins itself too23:02
*** carl_baldwin has joined #openstack-infra23:02
*** nati_ueno has quit IRC23:03
lifelessfungi: ahahahaha23:04
lifelessfungi: I found a 15m latency on periodic cleanup as well23:04
fungilifeless: ooh!23:04
*** nati_ueno has joined #openstack-infra23:04
sdaguelifeless: nice23:04
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately.  https://review.openstack.org/6797923:05
lifelessI may be misunderstanding state_time23:06
*** mrodden has quit IRC23:07
lifelessactually, I think that code block is entirely broken23:07
*** miqui has joined #openstack-infra23:07
* lifeless revisits23:07
lifelessyeah, it's missing a now -23:08
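A reconstruction of the guard lifeless is describing, with approximate names rather than the actual nodepool source: the early exit compared the raw state_time timestamp against the cleanup threshold, so subtracting the current time is what makes it mean "changed state recently"; the DELETE special case reflects his companion patch to clean up DELETE-state nodes immediately:

    import time

    NODE_CLEANUP = 8 * 60 * 60   # assumed threshold; the real value lives in nodepool
    DELETE = "delete"            # stand-in for nodedb.DELETE

    def recently_touched(state, state_time):
        # broken form: `state_time < NODE_CLEANUP`, an absolute epoch timestamp
        # compared against a duration, which never means "recently changed"
        if state == DELETE:
            return False   # nodes already marked for deletion get cleaned up right away
        return (time.time() - state_time) < NODE_CLEANUP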
*** senk1 has quit IRC23:09
*** miqui has quit IRC23:09
*** miqui has joined #openstack-infra23:09
*** miqui has quit IRC23:10
lifelessthere23:11
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Fix early-exit on recently-set-state in deleteNode  https://review.openstack.org/6798023:11
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately.  https://review.openstack.org/6797923:11
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately.  https://review.openstack.org/6797923:12
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Fix early-exit in cleanupOneNode  https://review.openstack.org/6798023:12
lifelesssorry for spam :)23:13
fungigrrr... my flight tomorrow just got cancelled23:13
*** sarob has joined #openstack-infra23:13
russellbfungi: :(  weather?23:16
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Log how long nodes have been in DELETE state.  https://review.openstack.org/6798223:17
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Consolidate duplicate logging messages.  https://review.openstack.org/6798323:17
*** dcramer_ has quit IRC23:20
fungirussellb: yeah, my layover was going to be in baltimore, which is now on lockdown for tomorrow (noaa/nws winter storm warning all day)23:21
russellbbummer23:21
fungijust rebooked through vegas instead, but i think the long leg will end up being without wifi as a result23:22
*** sarob has quit IRC23:22
russellbvegas is a good choice.  much worse places to be stuck than there, just in case23:22
clarkbfungi: :( its ok I will be back to full focus tomorrow23:23
fungiyeah, i figured it's slightly less likely to get buried under ice and snow23:23
fungiclarkb: yay!23:23
russellbi'm out for today ... on the volume bug, only candidate we have is https://review.openstack.org/#/c/67973/23:23
russellbjust going to watch that through some rechecks while we keep digging23:23
fungirussellb: thanks for the heads up23:24
russellbthat's https://bugs.launchpad.net/nova/+bug/127060823:24
jgriffithrussellb: agreed23:24
russellbjgriffith: mriedem thanks again23:25
mriedemnp, fun first day back :)23:25
sdagueclarkb: if you have a little focus now, the config change with the e-r uncategorized list would be handy to help us figure out what other unknown bugs are in the reset pile23:26
sdagueit was very good gamification for jog0 to try to drive up our classification rate23:27
*** eharney has quit IRC23:28
*** jamielennox|away is now known as jamielennox23:29
*** derekh has quit IRC23:30
*** dcramer_ has joined #openstack-infra23:32
*** gokrokve_ has quit IRC23:33
*** gokrokve has joined #openstack-infra23:34
lifelessfungi: cron timing23:36
lifelessfungi: in nodepool23:36
openstackgerritlifeless proposed a change to openstack-infra/nodepool: Make cleanupServer optionally nonblocking.  https://review.openstack.org/6798523:37
*** gokrokve has quit IRC23:38
openstackgerritlifeless proposed a change to openstack-infra/config: Cleanup old servers every minute.  https://review.openstack.org/6798623:39
lifelessfungi: clarkb: would love https://review.openstack.org/#/c/67685 to be reviewed please23:39
*** carl_baldwin has quit IRC23:39
lifelessjeblair: I've pushed a stack that will do what I propose to nodepool; I'm giving it a basic test now23:40
jog0wow 3 patches in openstack/openstack in 8 hours :/23:49
*** rcleere has quit IRC23:53
lifelessyah, messed up23:53
lifelessdid you see jay's note that passlib isn't installed properly?23:53
*** rcleere has joined #openstack-infra23:53
lifeless> https://review.openstack.org/#/c/66670/23:54
lifelessThat second patch has the gate-tempest-dsvm-neutron-isolated job failing23:54
lifelesstrying to run keystone-manage pki-setup:23:54
lifelessImportError: No module named passlib.hash23:54
*** reed has joined #openstack-infra23:55
*** rcleere has quit IRC23:58
lifelessjog0: ^23:59
jog0lifeless: I did23:59
