Friday, 2015-03-20

*** hichihara has joined #openstack-infra00:00
anteayawell we never fail to mark feature freeze in new and exciting ways00:01
mordredclarkb: do we make nova quota queries in nodepool?00:01
mordredclarkb: I thought we just hard-coded all of that in our config file00:02
clarkbmordred: I dunno, can grep00:02
mordredme too00:02
clarkbgit grep says no00:02
mordredI agree00:02
clarkband that the config determines it00:02
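
A quick way to confirm this from the operator side, rather than grepping the code: nodepool's per-provider capacity of this era was declared statically in its YAML config (a max-servers value per provider), so a short script can print what the daemon believes its "quota" is. This is a minimal sketch; the config path and key names are assumptions about the usual layout, not taken from the running system.

    # Sketch: print the statically configured capacity per provider.
    # The path and the 'providers'/'max-servers' keys are assumptions.
    import yaml

    with open('/etc/nodepool/nodepool.yaml') as f:
        config = yaml.safe_load(f)

    for provider in config.get('providers', []):
        print("%s: max-servers=%s" % (provider['name'], provider.get('max-servers')))
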
*** VijayTripathi has quit IRC00:02
*** dannywilson has quit IRC00:03
cineramapleia2: so it looks like the openstackid bit is working with my patch. zanata does seem to have sprouted an additional login button though which is a bit odd00:03
*** camunoz is now known as camunoz_mtg00:04
pleia2cinerama: I think we'll need to play around with that a bit anyway, since we need to make sure the *only* option for logging in is via openstackid00:04
pleia2cinerama: but yay!00:04
*** ZZelle_ has quit IRC00:04
cineramapleia2: this is of course testing against a local install of openstackid so i don't have any accounts on there as yet but it does seem to be redirecting correctly00:04
cineramapleia2: oh, both buttons go to openstackid...00:05
pleia2cinerama: so it's more than theoretical!00:05
cineramapleia2: as i said, it's a bit odd :)00:05
pleia2cinerama: hah, fun00:05
*** achanda has quit IRC00:05
fungithis feature freeze we took down a major service provider... wondering how we can top that next cycle ;)00:05
*** baoli has quit IRC00:06
jeblairmordred: i think we're going to continue to try to delete those servers00:06
* fungi knows we probably didn't, but that's what history will record00:07
fungishould we stop nodepool, delete those rows and then clean up aliens later?00:07
*** baoli_ has joined #openstack-infra00:07
clarkbcan we safely delete those rows out from under nodepool?00:07
clarkbprobably not00:07
clarkbso shutdown is required00:07
clarkb(just thinking out loud here)00:08
pleia2fungi: I'm going to go with that story00:08
*** baoli_ has quit IRC00:08
*** asselin_ has joined #openstack-infra00:08
*** garyh has quit IRC00:09
jeblairfungi, clarkb: i think row locks will prevent us from deleting while running.  stopping, delete, then alien cleanup later is probably best.00:09
mordredjeblair: oh - that's a good point00:09
fungii can give that a shot now unless someone else is already doing it00:10
jeblairfungi: go for it00:10
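
For reference, the row cleanup jeblair describes only needs a couple of statements once the daemon is stopped. A minimal sketch with SQLAlchemy; the table and column names follow nodepool's schema of the time and the connection URL is a placeholder, so treat all of it as assumptions.

    # Sketch: with nodepool stopped, drop the hpcloud rows so it stops retrying deletes.
    # Table/column names and the connection URL are assumptions, not verified here.
    from sqlalchemy import create_engine, text

    engine = create_engine('mysql://nodepool:secret@localhost/nodepool')  # placeholder URL
    with engine.begin() as conn:
        result = conn.execute(
            text("DELETE FROM node WHERE provider_name LIKE :prov"),
            prov='hpcloud%')
        print("removed %d rows" % result.rowcount)

Alien instances left behind in the cloud are then cleaned up separately, which is the "alien cleanup later" part of the plan.
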
*** oomichi has joined #openstack-infra00:11
*** garyh has joined #openstack-infra00:11
*** emagana has quit IRC00:11
anteayayay the nova patch ttx needed merged00:13
reedwoot00:13
fungiokay, it's running again and the hpcloud nodes are all gone from the db00:13
anteayajust trove and cinder to go00:13
anteayacool00:13
jeblairfungi: the log seems to not be mentioning hpcloud which is good00:14
*** YorikSar has quit IRC00:14
SlickNikanteaya: trove patch is in the gate as well. *fingers crossed*00:14
reedmeanwhile I'm patting myself on the back for having understood how mediawiki template work (at very high level)  Check this out https://wiki.openstack.org/wiki/Template:InternshipIdea00:14
* reed proud of the useless (almost) knowledge accumulated00:14
anteayaSlickNik: yes I see that, and I have my reservations about that patch and shared my thoughts with ttx as well00:14
fungireed: cool--we have something similar for the third-party ci systems pages too00:15
anteayaSlickNik: the fact that patchset 2 failed jenkins 5 times disconcerts me, but it is your project00:15
anteayaand patchset 5 failed 3 times, etc. etc.00:15
reedfungi, neat-o00:15
reedfungi, now all those need are a category :)00:15
*** yamamoto has joined #openstack-infra00:15
*** sputnik13 has quit IRC00:16
fungireed: https://wiki.openstack.org/wiki/Template:ThirdPartySystemInfo00:16
SlickNikanteaya: I looked into that. The earlier failures we were seeing were caused by a flaky assert in one of the unit tests that made it past the gate.00:16
anteayaSlickNik: devs rechecking a patch rather than fixing it00:16
SlickNikanteaya: I have a different patch that's going through check now to fix that issue.00:16
anteayaSlickNik: yes00:16
anteayaSlickNik: okay00:16
reedfungi, do you mind if I add a category: to that template?00:17
fungireed: an additional category? i'm sure that's fine00:17
reedfungi, or, in other words, how do you collect all the pages create with that template?00:17
anteayareed: what category are you thinking?00:17
reedoh, I see it00:17
*** tiswanso_ has quit IRC00:17
*** koolhead17 has quit IRC00:17
anteayareed: https://wiki.openstack.org/wiki/ThirdPartySystems00:18
reedanteaya, fungi, my bad, I didn't see [[Category:ThirdPartySystems]]00:18
fungireed: yeah, they all wind up in https://wiki.openstack.org/wiki/Category:ThirdPartySystems00:18
reedsupercool00:18
anteayathanks00:18
anteayadidn't know you didn't know00:18
reedthat part makes me hate mediawiki less00:18
reeda little less00:18
SlickNikreferring to https://review.openstack.org/#/c/165995/00:18
anteayawell that's good00:18
*** sdake_ has joined #openstack-infra00:19
openstackgerritDan Prince proposed openstack-infra/system-config: Re-order tripleo Zuul images (to see if it helps)  https://review.openstack.org/16605500:19
SlickNikanteaya: Was talking to fungi about exactly that yesterday. How to get folks to move away from the "recheck" habit.00:19
anteayaSlickNik: what did you come away with as an understanding?00:20
anteayashutting off a test that exposes a race doesn't feel right to me, btw00:20
anteayabut I'm curious to hear your take away from your conversation with fugi00:21
SlickNikanteaya: It's a race in the test, not in the code.00:21
anteayafungi00:21
anteayaokay00:21
SlickNikFor starters: Taking into consideration the number of rechecks a patch has gone through when reviewing the patchset.00:21
*** sdake__ has joined #openstack-infra00:21
*** sdake has quit IRC00:22
anteayathat is a good place to begin, I agree00:23
*** VijayTripathi has joined #openstack-infra00:23
*** stevemar has joined #openstack-infra00:23
SlickNikfungi mentioned that for the current patchset that number is now visible at the top of the review as well — which is super cool.00:23
anteayavery helpful00:24
greghaynesA nice thing we have in tripleo land is http://goodsquishy.com/downloads/s_tripleo-jobs.html which gives us pass rates, having stats on per-job pass rates could be pretty enlightening00:24
greghaynesnot sure if zuul has that already...00:24
clarkbgreghaynes: http://graphite.openstack.org00:25
*** sdake_ has quit IRC00:25
greghaynessince generally recheck is a side effect of a test that doesnt pass very often00:25
clarkbjogo has a set of graphs that he uses00:25
clarkbbuilt from that data00:25
greghaynesnice (although graphite)00:25
anteayaSlickNik: sounds like you have a good place to begin00:26
greghaynesIts neat because even just the % pass rate is a super useful stat00:26
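
Per-job pass rates are derivable from the counters Zuul already publishes to graphite, without a separate dashboard. A rough sketch against the graphite render API follows; the metric paths are an assumption about Zuul's statsd naming of the time and would need checking, and the job name is just an example.

    # Sketch: compute a pass rate for one job from graphite counters over the last week.
    # The metric paths below are assumptions about zuul's statsd naming, not verified.
    import json
    import urllib2

    GRAPHITE = 'http://graphite.openstack.org/render'
    JOB = 'gate-tempest-dsvm-full'  # example job name

    def total(result):
        target = 'stats_counts.zuul.pipeline.gate.job.%s.%s' % (JOB, result)
        url = '%s?target=%s&from=-7d&format=json' % (GRAPHITE, target)
        data = json.load(urllib2.urlopen(url))
        points = data[0]['datapoints'] if data else []
        return sum(v for v, _ in points if v)

    success, failure = total('SUCCESS'), total('FAILURE')
    if success + failure:
        print('%s pass rate: %.1f%%' % (JOB, 100.0 * success / (success + failure)))
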
anteayaSlickNik: well done00:26
*** dprince has joined #openstack-infra00:26
dprinceStill not seeing any Fedora 20 jobs in the TripleO rack.00:28
greghaynesdprince: clarkb says its your keystone00:28
SlickNikSorry I was looking at the graphite metrics.00:28
greghaynesdprince: apparently its configured to not use a routable address00:28
anteayaSlickNik: yup00:28
*** tkelsey has joined #openstack-infra00:28
clarkbgreghaynes: dprince ya just run nova --debug list00:28
dprincegreghaynes: so why are Ubuntu nodes running fine00:28
clarkbyou get back 10.1.8.37 as something to talk to00:29
clarkbdprince: I do not know but in my investigating I ran into ^00:29
SlickNikanteaya: thanks! it's still WIP but we hope to get more disciplined about it.00:29
greghaynesdprince: the other question id have for that is, did anything change?00:29
SlickNikAnd any data we can gather around how we're doing is super useful. :)00:29
*** tjones1 has joined #openstack-infra00:29
greghaynes(on the tripleo ci cloud end)00:29
*** garyh has quit IRC00:29
clarkbdprince: "links": [{"href": "http://10.1.8.37:5000/v2.0/", "rel": "self"}00:30
dprinceclarkb: you may have found a different issue, but by my local testing Fedora nodes fire up fine, and I can get a floating IP too00:30
dprinceclarkb: that is the local IP, weird00:30
dprinceclarkb: let me check some other things00:30
clarkbdprince: until ^ is fixed I don't think there is much I can do from this end00:30
anteayaSlickNik: sure00:31
dprinceclarkb: okay. looking into it00:31
anteayaSlickNik: the biggest thing I have seen is response to a test failure00:31
jeblairclarkb, fungi, mordred: we have an okay to start performing alien cleanup00:31
openstackgerritIan Wienand proposed openstack-infra/nodepool: Ignore stderr for documentation program output  https://review.openstack.org/16605700:31
anteayaSlickNik: the more the devs think oh perhaps something is wrong with my patch00:31
fungijeblair: i'll fire that off now00:31
anteayaSlickNik: the better the quality of patches00:31
jeblairclarkb, fungi, mordred: i think they're going to ask us to turn things on again to verify the issue00:32
clarkbjeblair: while we have their attention maybe they can unlock that one node for us?00:32
jeblair(after we cleanup)00:32
anteayaSlickNik: when they believe that a bad structure is preventing their great code from merging, that is when problems arise00:32
jrollanteaya: relatedly, I've found that when I review my own patches, they tend to improve00:32
fungijeblair: i'm not sure i like the sound of that ;)00:32
anteayajroll: good point00:32
jeblairclarkb: good idea -- let's just delete everything and we'll give them a list of what we can't00:32
jeblairfungi: they aren't sure they do either :)00:32
clarkbjeblair: +100:32
*** tkelsey has quit IRC00:32
jrollanteaya: which is probably where that comes from, they re-read the patch to try to find the bug00:33
fungijeblair: clarkb: yep, i'll make an instance uuid list of whatever's left after deletes finish00:33
jeblairfungi: let me/us know if you can use another hand on cleanup00:33
anteayajroll: exactly00:33
SlickNikanteaya / jroll: ++00:33
anteayaSlickNik: so thanks for being proactive00:33
SlickNikit's def a mindset thing.00:33
anteayaSlickNik: that helps00:33
anteayaSlickNik: it is00:34
fungi499 alien nodes00:34
mordredfungi: heh00:34
fungiin hpcloud anyway00:34
jrollwhoa, we broke hp cloud?00:34
jrolllol.00:34
* jroll is somewhat surprised hp fell over first00:35
*** ddieterly has joined #openstack-infra00:35
jeblairfungi: i figure it's probably safe to run deletes in series across N parallel processes, where 6<N<1200:35
*** markvoelker has quit IRC00:37
ianwdprince: i have no idea what's going on because scrollback is huge; but i see fedora 20 and just yesterday i fixed an issue with the latest f20 kernel update that creates a broken extlinux.conf and hence it won't boot.00:38
clarkbianw: this is for the tripleo f20 nodes so we didn't think that was related00:39
ianwok, i figured there was hours of context i'm missing00:40
clarkbianw: and in attempts to investigate I ran into the 10.1.8.37 address coming back from keystone so talking to the cloud was cut short00:40
fungii have 10 parallel loops going over equal slices of the ~500 nodes00:40
fungihopefully this goes fairly quickly00:40
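
The same fan-out can be done in one process instead of ten shell loops. A hedged sketch with novaclient and a thread pool; the client constructor arguments come from the environment and the uuid file name is a placeholder, so this is illustrative rather than the command fungi actually ran.

    # Sketch: delete a list of alien instance UUIDs with bounded parallelism.
    # Credentials come from the environment; 'aliens.txt' is a placeholder file name.
    import os
    from concurrent.futures import ThreadPoolExecutor
    from novaclient import client

    nova = client.Client('2', os.environ['OS_USERNAME'], os.environ['OS_PASSWORD'],
                         os.environ['OS_TENANT_NAME'], os.environ['OS_AUTH_URL'])

    def delete(uuid):
        try:
            nova.servers.delete(uuid)
            return uuid, 'delete accepted'
        except Exception as exc:
            return uuid, 'failed: %s' % exc

    with open('aliens.txt') as f:
        uuids = [line.strip() for line in f if line.strip()]

    with ThreadPoolExecutor(max_workers=10) as pool:
        for uuid, status in pool.map(delete, uuids):
            print(uuid, status)
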
*** adalbas has joined #openstack-infra00:41
reedwow! I didn't know you could include inside a mediawiki page the content of another page by simply adding {{:name_of_the_page}}00:41
fungireed: you can also transclude subsections00:41
reedthey call it 'transclusion' http://www.mediawiki.org/wiki/Transclusion00:41
fungiyep00:41
dprinceianw: could be related00:42
jeblairfungi: have an idea how long each delete is taking?00:42
fungijeblair: i can probably wall clock time one. just a moment00:42
dprinceianw: we haven't had Fedora 20 tripleo jobs running all day00:42
dprinceianw: for TripleO that is (not speaking in general)00:43
fungijeblair: the bad news is, ~30sec each00:43
jeblairfungi: so could take 25 mins00:43
fungijeblair: yep00:43
*** adalbas has quit IRC00:44
fungiunless any "stuck" nodes cause some deletes to take an extra long time00:44
*** asettle has quit IRC00:44
fungiokay, i'm also seeing some go as quickly as 5 seconds, so i think it's extremely variable00:44
fungibecause you know, it's the cloud00:44
fungiwho needs consistency really?00:44
clarkbfungi: it's eventually consistent00:45
clarkbianw: mordred 16579200:45
clarkbianw: I think the whole point is to not raise the exception so that we can list the aliens that we do know about instead of dying early and writing no prettytable00:46
greghaynesI should really make a test for that00:46
greghaynessince its been that kind of week00:46
mordredianw: yes, what clarkb said00:46
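
The fix being discussed amounts to catching per-provider failures and still rendering the table for the providers that did respond. A minimal sketch of that pattern; the provider objects and list_servers() call are placeholders for nodepool's internals, not its real code.

    # Sketch: survive one provider failing during alien listing and still print the rest.
    # list_servers() and the provider objects are placeholders, not nodepool's API.
    import logging
    from prettytable import PrettyTable

    log = logging.getLogger('nodepool.alien-list')

    def alien_list(providers, known_ids):
        table = PrettyTable(['Provider', 'Hostname', 'Server ID'])
        for provider in providers:
            try:
                servers = provider.list_servers()
            except Exception:
                # Keep going; one dead provider should not kill the whole report.
                log.exception('Exception listing aliens for %s', provider.name)
                continue
            for server in servers:
                if server['id'] not in known_ids:
                    table.add_row([provider.name, server.get('name'), server['id']])
        print(table)
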
*** ddieterly has quit IRC00:46
*** dalgaaf has quit IRC00:47
*** tjones1 has quit IRC00:49
clarkbgreghaynes: comment for you on 16568200:50
*** ddieterly has joined #openstack-infra00:51
ianwmordred: ok, well it's still the only thing that writes to stderr like that ... maybe it should log.error .  or ignore me, that's fine too :)00:51
*** bknudson has joined #openstack-infra00:51
greghaynesclarkb: yea, so I was a bit confused and not wanting to read all of nova client code - why is novaclient.images a list of nodes you say?00:52
jeblairclarkb, fungi, mordred: i have to run now, i will check back in after dinner00:52
mordredkk00:52
clarkbgreghaynes: I am pretty sure its listing nodes there00:52
clarkbgreghaynes: because of the ip addrs00:52
fungithanks jeblair00:52
greghaynesclarkb: That list is instantiated as FakeClient.images00:52
greghaynesoh wait00:53
greghaynesits just our abstract list thing00:53
clarkbno create_image is a different method there00:53
greghaynesugh, so this needs some more work00:53
greghaynesthe problem is its used in more than one place00:53
ianwmordred: also, image_list doesn't want the same thing?00:53
clarkbgreghaynes: I don't think it needs major work00:53
clarkbgreghaynes: just a more generic fake object so the code doesn't read funny00:53
greghaynesclarkb: well, I also need to use it for glance images. I think the right thing to do is just get rid of FakeGlanceImage and add the method I need to dummy00:54
clarkbFakeCloudResource or something00:54
clarkbgreghaynes: ya or that00:54
greghaynesYea, that's essentially equivalent00:54
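
Something like the generic fake clarkb suggests could look like the sketch below; the attribute names are guesses at what the fake clients need, not nodepool's actual test code.

    # Sketch of a generic fake for cloud resources (servers, images, ...) that a fake
    # client could hand back in tests. Field names are assumptions, not FakeGlanceImage.
    import uuid


    class FakeCloudResource(object):
        """Stands in for any server or image record returned by a fake client."""

        def __init__(self, name, status='ACTIVE', **extra):
            self.id = str(uuid.uuid4())
            self.name = name
            self.status = status
            for key, value in extra.items():
                setattr(self, key, value)

        def get(self, item, default=None):
            # Allow dict-style access so code that treats records as dicts keeps working.
            return getattr(self, item, default)
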
mordredianw: it may?00:55
greghaynesianw: Ive been fixing those bugs as I go when im adding fail tests, theres a lot of them00:55
greghaynesso its likely other commands need it00:55
clarkbgreghaynes: so you're good with that -1?00:56
greghaynesclarkb: yes00:56
greghaynesI mean, its correct ;)00:56
greghaynesclarkb: another note about that patch - it leaves fake dib images around in the test dir00:58
greghaynesnot sure how much we care about that00:58
clarkbgreghaynes: isn't the test dir a tmpdir fixture?00:59
clarkbgreghaynes: if thats the case it should be fine since the fixture should clean it up00:59
*** mfink_ has joined #openstack-infra01:00
greghaynesoh, good point, we must not be passing a path to the fixture dir for the dib image output dest01:00
*** corvusphone has joined #openstack-infra01:00
greghaynesill mess with that too01:02
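
For reference, the pattern clarkb is describing is the fixtures temp dir, which removes itself on test cleanup; pointing the dib output at it is enough to avoid leftover fake images. A hedged sketch, with the images_dir attribute standing in for whatever nodepool's test config actually uses:

    # Sketch: route image-build output into a self-cleaning temp dir in a test.
    # 'images_dir' is a placeholder for the real config attribute name.
    import fixtures
    import testtools


    class DibImageTest(testtools.TestCase):
        def setUp(self):
            super(DibImageTest, self).setUp()
            # TempDir registers its own cleanup, so nothing is left behind after the test.
            self.images_dir = self.useFixture(fixtures.TempDir()).path
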
*** Sukhdev has joined #openstack-infra01:02
clarkboh btw one suggestion on the jenkins bug I filed is that we use a jenkins cloud slave plugin instead01:03
*** Ryan_Lane has quit IRC01:03
dprinceclarkb: try now01:03
dprinceclarkb: nova --debug lists01:03
dprincelist01:03
clarkbdprince: yup that worked (well I did nova floating-ip-list)01:03
clarkbdprince: I think at least part of the problem is we have leaked floating IPs01:04
clarkbso will clean that up now01:04
dprinceclarkb: I'm not aware of any changes we put into place on our side,  so I'll check on how this keystone setting could have been altered.01:04
dprinceclarkb: otherwise I'm not sure how this ever worked01:05
clarkb{"message": "Unknown auth strategy", "code": 500, "created": "2014-11-23T20:17:29Z"} errors like that for a precise node01:05
dprinceclarkb: Yeah, I saw those too. Possibly related to this keystone change?01:05
clarkbdprince: maybe?01:06
*** mfink_ has quit IRC01:06
dprinceclarkb: I was able to create a Fedora node 45 minutes ago01:06
clarkbanyways let me clean up the floating ips and see if tht makes it a healthier cloud01:06
dprinceclarkb: and a floating ip too01:06
dprinceclarkb: yeah, step at a time. Cleanup and lets see :)01:06
*** LinuxJed_ has quit IRC01:07
*** mayurig has joined #openstack-infra01:08
*** dims has joined #openstack-infra01:08
*** asettle has joined #openstack-infra01:10
*** tnovacik has quit IRC01:10
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Treat non-existant output files as empty files  https://review.openstack.org/16606201:11
*** ghostpl_ has joined #openstack-infra01:11
*** ddieterl_ has joined #openstack-infra01:11
*** ddieterly has quit IRC01:12
fungideletes are wrapping up now01:12
dprinceianw: while I'm waiting is there a ticket open for the Fedora boot error you mentioned?01:12
corvusphoneclarkb: speaking of which, we should check ports and ips on hpcloud01:12
*** mfink_ has joined #openstack-infra01:12
ianwdprince: see the comments in https://review.openstack.org/#/c/165681/1/install_puppet.sh01:12
clarkbcorvusphone: I can do that now01:12
dprinceianw: thanks01:12
ianwdprince: my desire to debug grubby on f20 was/is quite low, especially when it works with a later version of it01:13
clarkbya floating IPs definitely leaked there01:13
dprinceianw: sound fine to me01:13
clarkbstarting a round of deletions for hpcloud FIPs01:13
mordredcorvusphone: WOAH! when did you start corvusphoning?01:13
*** ivar-lazzaro has quit IRC01:13
mordredclarkb: harvard is going toe to toe with unc01:14
clarkbmordred: I have my television on downstairs with no one watching it like a good MURICAN01:14
fungilast of the deletes just finished but there are 52 which didn't delete, so i'm starting a second pass with just those01:14
anteayamordred: when we break hpcloud it appears01:14
clarkbfungi: ok, I just starting floating ip deletion01:15
anteayaclarkb: ha ha ha01:15
clarkband I just got rate limited01:15
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Convert all inline publisher examples to tests  https://review.openstack.org/16606401:15
corvusphonemordred: its just the webchat. I need to put a proper setup on my phone.01:15
*** LinuxJedi has joined #openstack-infra01:15
*** ghostpl_ has quit IRC01:16
clarkbI have restarted floating ip deletes serially01:16
clarkbcan I just say that independently managing 3 different resources in order to get one working VM is really not fun01:17
*** markvoelker has joined #openstack-infra01:17
clarkbespecially when I get rate limited doing it when reality is I need one api call to get one node (or maybe more than one node)01:18
*** asettle has quit IRC01:19
*** asettle has joined #openstack-infra01:19
lifelessratelimiting is a PITA01:21
*** Sukhdev has quit IRC01:21
*** prad has quit IRC01:21
mordredclarkb: HARVARD JUST TOOK THE LEAD01:22
*** markvoelker has quit IRC01:22
mordredclarkb: 3-point shot and the foul01:22
mordredclarkb: 1:15 to go01:22
*** otter768 has joined #openstack-infra01:23
clarkbha I just turned it off because I realized I didn't need it on01:23
clarkbbut maybe I should go back and watch01:23
clarkblifeless: yes xargs needs a rate limit flag01:23
clarkbI could put a sleep in the commands I suppose01:24
lifelessyes01:24
lifelessand cry into your sleep01:24
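
The quick workaround for the rate limiting is to pace the API calls yourself instead of letting xargs fire them as fast as possible. A small sketch; the novaclient handle is assumed to be constructed as in the earlier delete example, and the pause value is arbitrary.

    # Sketch: serial floating-IP cleanup with a fixed pause to stay under the rate limit.
    # 'nova' is assumed to be a novaclient handle built as in the earlier sketch.
    import time

    def delete_unattached_floating_ips(nova, pause=2.0):
        for fip in nova.floating_ips.list():
            if fip.instance_id is None:  # leaked: not attached to any server
                nova.floating_ips.delete(fip)
                time.sleep(pause)  # crude client-side rate limit
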
*** claudiub has quit IRC01:24
*** corvusphone has quit IRC01:24
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Only query jenkins plugins if config provided  https://review.openstack.org/15882601:25
*** corvusphone has joined #openstack-infra01:25
*** dmorita has quit IRC01:26
mordredclarkb: dude. that was almost CRAZY01:26
clarkbdprince: {"message": "Failed to terminate process 16378 with SIGKILL: Device or resource busy", "code": 500, "created": "2015-03-20T01:04:29Z"} is the error I see on a random f20 node01:27
*** otter768 has quit IRC01:27
corvusphonemordred: who won?01:27
clarkbdprince: also floating IPs should be cleaned up now01:27
mordredcorvusphone: unc01:28
mordredcorvusphone: at. the. end01:28
* mordred using weechat android ...01:29
clarkbI am going to run the delete port script on hpcloud now01:29
mordredclarkb: cool01:29
mordredfungi: we ready to start ramping up again yet?01:30
*** garyh has joined #openstack-infra01:30
clarkbfloating IPs were cleaned up so its just the ports left though port list was small so if we have leaked there its minimal01:31
fungieach pass through retrying to delete i manage to knock down a few more, but there are still 45 which haven't succeeded yet01:31
mordredthat's so special01:31
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Convert all inline publisher examples to tests  https://review.openstack.org/16606401:31
fungishould we just give that list of uuids to hpcloud and fire back up?01:31
fungiwe were able to delete >90% anyway01:32
clarkbI am being dragged to dinner01:32
clarkbwill check in later01:32
fungioh, the requests must still be getting processed because it's dropped to 38 now01:33
mordredkk01:33
*** tiswanso has joined #openstack-infra01:33
mordredfungi: so - maybe we should just turn back on and see how it goes?01:34
fungii guess that's what "Request to delete server X has been accepted." means01:34
fungimordred: yeah, it's probably safe to hand-revert 166043 now01:34
mordredfungi: well, it's hand applied :)01:34
mordredfungi: I'll do that now01:35
fungiindeed01:35
mordredfungi: rate: 4.0 -- is that the thing I should adjust to adjust api rate limit?01:36
fungiyeah01:36
mordredrackspace is set to 1.001:36
mordredmaybe I should set hp to that too to be nice?01:36
fungimordred: i think it's inversely named and is actually a frequency?01:38
fungias in the lower the number the faster we poll (value indicating fraction of a second delay between polls)01:38
*** baoli has joined #openstack-infra01:39
corvusphoneRackspace is much lower (faster) than 1.001:40
*** garyh has quit IRC01:40
corvusphone4.0 is one request every 0.8 secs (4.0/5)01:40
corvusphone(5 HP providers)01:41
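
In other words, the provider rate value is treated here as the delay in seconds between API calls for one provider, so the aggregate interval against the cloud is that delay divided by the number of providers sharing it. A tiny sketch of the arithmetic corvusphone is doing:

    # Sketch of the rate arithmetic: per-provider delay vs aggregate request rate.
    rate = 4.0          # seconds between API calls for one nodepool provider
    hp_providers = 5    # hpcloud is split across 5 provider entries

    aggregate_interval = rate / hp_providers
    print("one request every %.1f seconds overall" % aggregate_interval)    # 0.8
    print("%.2f requests per second overall" % (1.0 / aggregate_interval))  # 1.25
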
*** virmitio has quit IRC01:42
mordredcorvusphone: kk. cool01:43
openstackgerritStephanie Miller proposed openstack-infra/puppet-zanata: Add OpenID login provider support to Zanata config  https://review.openstack.org/16607301:43
*** ddieterl_ has quit IRC01:45
*** baoli_ has joined #openstack-infra01:45
*** ddieterly has joined #openstack-infra01:45
*** harlowja_ is now known as harlowja_away01:46
*** otter768 has joined #openstack-infra01:46
mordredyou know - when I get the shade patch done01:47
*** baoli has quit IRC01:48
mordredsome of the metadata caching support may be nice - like flavors01:48
*** garyh has joined #openstack-infra01:49
mordredor - it's possible that "tail -f debug.log | grep hpcloud" is only showing me FlavorListTask ...01:49
fungii'm winding down here, but will try to keep an eye on irc for a little while longer01:49
*** tsg has quit IRC01:50
anteayathe trove patch ttx has been waiting on is in the top of the gate01:51
cineramapleia2, StevenK: did we ever work out the deal with the zanata client?01:51
anteayaand these two cinder patches: https://review.openstack.org/#/q/status:open+topic:cinder-driver-removals,n,z01:52
anteayaand I do believe that is all ttx needs01:52
*** camunoz_mtg has quit IRC01:52
StevenKcinerama: Packaging it is hard.01:54
StevenKcinerama: We don't have support for building and updating the packaging anyway.01:54
StevenKcinerama: I have a WIP patch to change system-config to install the cli client on the proposal slave01:54
cineramaStevenK: sounds like we should just install like we do with other nonpackaged stuff in puppet01:55
StevenKcinerama: Yes, which is what my WIP patch does01:55
cineramaStevenK: oh cool. is it up anywhere yet?01:56
StevenKcinerama: I want to test it before pushing it up01:56
cineramaStevenK: bo-ring :)01:56
StevenKHah01:56
mordredfungi, clarkb, jeblair: it FEELS like nothing is happening, other than hp listing flavors01:56
cineramainsert dos equis man hipchat emoji here01:57
clarkbmordred are there deficit calculations and allocations?01:57
clarkbmordred thats what Ibwould look for in the log01:57
*** tqtran has quit IRC01:58
*** asettle has quit IRC01:58
*** tiswanso has quit IRC01:58
*** garyh has quit IRC02:00
mordredclarkb: 2015-03-20 01:57:16,230 DEBUG nodepool.NodePool:   Deficit: bare-trusty: 0 (start: 232 min-ready: 8 ready: 240 capacity: 75)02:00
*** woodster_ has quit IRC02:00
clarkbok so it wants to start 232 thats good02:00
clarkbbelow that should be the allocations any to hpcloud?02:00
clarkboh! do we still have images?02:01
clarkbmaybe those are building?02:01
*** tiswanso has joined #openstack-infra02:01
mordredclarkb: welll... a) I don't see allocations - but I did just get a 500 error02:01
*** asettle has joined #openstack-infra02:02
mordredclarkb: just went to hell02:03
mordredclarkb: quotas back to 002:03
clarkboh?02:03
mordredyah02:03
mordredsame thing as before02:03
*** unicell1 has quit IRC02:04
*** asselin_ has quit IRC02:04
mordredI show 381 nodes in nodepool in some sort of state02:05
mordredclarkb: does our rate limit apply to database as well?02:05
mordredgah02:06
mordredto delete02:06
clarkbmordred ya02:06
*** camunoz_mtg has joined #openstack-infra02:06
clarkbso maybe bump it up with quota 002:06
mordredok - I just set the rate to 1602:06
mordredwith quota 002:06
*** patrickeast has quit IRC02:08
*** yamahata has quit IRC02:11
clarkbdid that help?02:11
jrollmordred: are these the times where you wish you had access to HP control plane?02:11
mordredjroll: NO02:11
jrolllol02:11
mordredclarkb: the api plane seems to be recovering02:11
dprinceclarkb: are all the F20 nodes failing that way?02:11
jrollinteresting, I would want to figure out what's wrong02:11
anteayaare there times you wish you had access to HP control plane, mordred?02:12
dprinceclarkb: any successful ones?02:12
*** nelsnels_ has joined #openstack-infra02:12
mordredjroll: well, I mean - I sort of would - but I don't like the semblance of culpability for a system I don't own02:12
dprinceclarkb: nm, I can see those as well02:13
clarkbdprince I cant check right now, getting foods02:13
*** sigmavirus24 is now known as sigmavirus24_awa02:14
jrollmordred: I guess I get that; kind of like how y'all think I have magic rackspace powers02:14
jroll:P02:14
*** jyuso has quit IRC02:14
*** nelsnelson has quit IRC02:14
*** tiswanso has quit IRC02:15
mordredclarkb: 16 wasn't enough for them to recover - I've set it to 6402:16
clarkbmordred ok02:16
anteayajroll: you don't have magic rackspace powers?02:16
* anteaya realizes another illusion is blown02:16
jrolllol02:18
jrollanteaya: I have rackspace internal irc and ldap02:18
jrollturns out those are pretty useful02:18
*** markvoelker has joined #openstack-infra02:18
anteayayou do have magic powers02:18
anteayawhew02:18
jroll:P02:18
*** markvoelker has quit IRC02:22
*** mayurig has quit IRC02:23
openstackgerritgreghaynes proposed openstack-infra/nodepool: Monkeypatch Fake Clients for tests  https://review.openstack.org/16568202:23
*** sdake__ has quit IRC02:25
*** sdake has joined #openstack-infra02:25
*** bhunter71 has joined #openstack-infra02:27
greghaynesclarkb: ^ I think that addresses your comments02:28
mordredclarkb: oh god02:29
mordredI just looked at nova source code02:29
greghaynesmordred: some things cant be unseen?02:30
mordred# NOTE(johannes): The quota code uses SQL locking to ensure races don't02:30
mordred# cause under or over counting of resources. To avoid deadlocks, this02:30
mordred# code always acquires the lock on quota_usages before acquiring the lock02:30
mordred# on reservations.02:30
mordred*STABSTABSTAB*02:30
*** kaisers1 has joined #openstack-infra02:31
mordredclearly written by someone who knows nothing about databases02:31
mordredand should stop writing database code02:31
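
The comment mordred is quoting describes a consistent lock-ordering rule: always take the quota_usages lock before the reservations lock so two concurrent callers cannot deadlock on each other. A hedged sketch of that idea in SQLAlchemy terms; the model names and session handling are illustrative, not nova's actual quota code.

    # Sketch of the lock-ordering rule the nova comment describes: lock quota_usages
    # first, then reservations, never the reverse. QuotaUsage/Reservation/session are
    # illustrative stand-ins, not nova's real models.
    def adjust_quota(session, project_id, resource, delta):
        with session.begin():
            usage = (session.query(QuotaUsage)
                     .filter_by(project_id=project_id, resource=resource)
                     .with_for_update()         # lock quota_usages first ...
                     .one())
            reservations = (session.query(Reservation)
                            .filter_by(project_id=project_id, resource=resource)
                            .with_for_update()  # ... then reservations
                            .all())
            usage.in_use += delta
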
anteayaso talking with thingee, he is way more forgiving of ci account ops than I am, but it is his call and I am supporting him02:31
anteayahe might be coming in here and asking for a ci account to be disabled02:31
mordredokie02:31
anteayaI've given him a paste with all the info like the gerrit id if he decides to go ahead with it02:31
anteayattx's nova and trove patches are in02:32
anteayaand I'm going to bed02:32
anteayanight02:32
*** jamielennox is now known as jamielennox|lunc02:32
*** jamielennox|lunc is now known as jamielennox|food02:32
*** kaisers has quit IRC02:32
mordredcorvusphone: I'm pumpkin-ing - I'm lurking, but need to stop doing active things02:32
mordredcorvusphone: the current status is that we're back off for creates and deletes for hp in nodepool are throttled to 6402:33
mordredI think we can find an appropriate throttle number - but I also think there is an issue that should be solved internally too02:33
mordredso I'm not particularly interested in chasing the tip of a thundering herd02:33
mordredwhich is what we're doing right now02:33
*** unicell has joined #openstack-infra02:38
*** bhunter71 has quit IRC02:39
*** woodster_ has joined #openstack-infra02:39
openstackgerritJerry Zhao proposed openstack-infra/nodepool: add option to use ipv6 for image update and node launching  https://review.openstack.org/15617802:40
*** bhunter71 has joined #openstack-infra02:40
tchaypodstufft: around, perchance?02:41
*** sputnik13 has joined #openstack-infra02:45
*** achanda has joined #openstack-infra02:45
*** ujuc has joined #openstack-infra02:45
*** unicell has quit IRC02:45
*** mfink__ has joined #openstack-infra02:48
*** unicell has joined #openstack-infra02:48
*** mfink_ has quit IRC02:48
mordredclarkb, corvusphone: actually, I'm trying one more thing - I'm turning creates back on with the crazy-low rate limit02:50
*** weshay has quit IRC02:51
*** weshay has joined #openstack-infra02:52
*** amotoki has joined #openstack-infra02:53
*** tsg has joined #openstack-infra02:55
*** asettle has quit IRC02:55
*** greghaynes has quit IRC02:58
clarkbmordred how is that going?02:59
*** ghostpl_ has joined #openstack-infra02:59
*** garyh has joined #openstack-infra03:00
mordredclarkb: so far so good03:01
mordredclarkb: last time it took a while before stuff started dying03:01
*** jamielennox|food is now known as jamielennox03:01
mordredclarkb: but I'm 95% convinced that deletes are the problem03:01
mordredso we don't see the problem until we start trying to delete things03:01
clarkbhuh03:02
*** asettle has joined #openstack-infra03:02
*** mwagner_lap has joined #openstack-infra03:03
*** ddieterly has quit IRC03:04
mordredclarkb: I believe it's a thundering herd that's caused by the quota code doing table locks, combined with delete using soft deletes and bad queries so that the delete query quota updating is tying up the create quota calculation in the table lock03:04
mordredso if delete performance is slow, it causes everything to stack up03:04
*** greghaynes has joined #openstack-infra03:05
*** sdake_ has joined #openstack-infra03:05
*** subscope_ has joined #openstack-infra03:05
*** xyang1 has joined #openstack-infra03:08
*** ghostpl_ has quit IRC03:09
*** radez is now known as radez_g0n303:09
*** sdake has quit IRC03:10
mordredclarkb: it seems to not be falling over03:10
*** garyh has quit IRC03:10
*** sdake has joined #openstack-infra03:10
clarkbthats good03:11
clarkbdid they deploy new nova quota code recently?03:11
*** amotoki has quit IRC03:11
mordreddon't think so03:12
mordredI think it was our scheduling fix earlier that triggered this particular interaction03:12
corvusphoneMordred we should probably revert my nodepool patch03:12
corvusphoneIt will serialize deletes03:13
clarkbexcept things have done exceptionally poorly there the last few weeks03:13
clarkbbut maybe this just makes it worse03:13
corvusphoneSuper slow but will not overwhelm them03:13
clarkbya03:13
*** sdake_ has quit IRC03:14
corvusphoneYes. I mean we should do that now while they fix it.  Because frankly this is the worlds easiest dos03:14
mordredcorvusphone: well, the current rate limiting is holding steady03:15
mordredcorvusphone: I have not checked to see if we're winding up with any nodes03:15
*** coolsvap has joined #openstack-infra03:16
mordredcorvusphone: also - the cloud noc folks are very motivated to figure out root cause on the delete thing03:17
*** nelsnels_ has quit IRC03:18
corvusphoneat 12s per request we're probably performing worse than before03:18
mordredyeah03:19
*** markvoelker has joined #openstack-infra03:19
corvusphoneBut I guess it won't hurt to keep it like this03:19
corvusphoneAnd its easy to ramp up if the NOC asks us to03:19
*** jyuso1 has joined #openstack-infra03:19
mordredcorvusphone: well, also - we can try reverting your patch in the morning when we're all awake03:20
*** dims has quit IRC03:20
*** bhunter71 has quit IRC03:20
corvusphoneOk you don't have to convince me :)03:21
*** markvoelker has quit IRC03:23
*** coolsvap|afk has joined #openstack-infra03:23
openstackgerritMerged openstack-infra/nodepool: Move nodepool creation in tests to common method  https://review.openstack.org/16558103:25
*** coolsvap has quit IRC03:25
*** coolsvap|afk is now known as coolsvap03:26
*** coolsvap is now known as coolsvap|afk03:26
*** coolsvap|afk is now known as coolsvap03:27
*** otter768 has quit IRC03:27
*** emagana has joined #openstack-infra03:28
*** otter768 has joined #openstack-infra03:29
*** spzala has quit IRC03:30
*** corvusphone has quit IRC03:31
*** sputnik13 has quit IRC03:33
*** sputnik13 has joined #openstack-infra03:37
*** otter768 has quit IRC03:39
*** gyee has quit IRC03:41
*** sputnik13 has quit IRC03:44
*** achanda has quit IRC03:49
*** ujuc has quit IRC03:51
*** asettle has quit IRC03:52
*** ujuc has joined #openstack-infra03:54
*** sputnik13 has joined #openstack-infra03:55
*** armax has quit IRC03:55
*** asettle has joined #openstack-infra03:56
*** asettle has quit IRC03:56
*** asettle has joined #openstack-infra03:57
*** achanda has joined #openstack-infra03:59
*** achanda has quit IRC04:01
*** mayurig has joined #openstack-infra04:01
*** dannywilson has joined #openstack-infra04:02
*** ddieterly has joined #openstack-infra04:05
*** dannywilson has quit IRC04:05
*** dannywilson has joined #openstack-infra04:06
zaroclarkb: any interest going to NW linux fest this year?04:07
*** sabeen has joined #openstack-infra04:08
*** Sukhdev has joined #openstack-infra04:08
clarkbI thought about it but probably won't make it04:09
*** amotoki has joined #openstack-infra04:09
*** ddieterly has quit IRC04:09
*** achanda has joined #openstack-infra04:12
*** sputnik13 has quit IRC04:15
*** camunoz_mtg has quit IRC04:15
*** achanda has quit IRC04:15
*** VijayTripathi has quit IRC04:16
*** sputnik13 has joined #openstack-infra04:16
*** rlucio has quit IRC04:17
*** Somay has joined #openstack-infra04:19
*** markvoelker has joined #openstack-infra04:19
*** sushilkm has joined #openstack-infra04:20
*** sushilkm has left #openstack-infra04:20
*** mmedvede has joined #openstack-infra04:21
*** mayurig has quit IRC04:21
zaroi think i'll be there, 1st time.  any tip on where to stay?04:23
*** markvoelker has quit IRC04:24
*** dims has joined #openstack-infra04:25
*** wuhg has joined #openstack-infra04:26
*** sputnik13 has quit IRC04:27
*** camunoz_mtg has joined #openstack-infra04:27
clarkbnot really; the only time I went I stayed in a bad hotel off the freeway04:28
wuhghow can i add search by subject keyword to https://review.openstack.org/#/q/status:open+project:openstack-dev/devstack,n,0033de7e000285e904:28
clarkbI would try downtown or water front areas if possible04:28
*** Qiming_ has joined #openstack-infra04:29
clarkbwuhg message:"some message"04:30
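
For example, appending the message: operator to the existing query gives something like the following (the keyword itself is a placeholder):

    https://review.openstack.org/#/q/status:open+project:openstack-dev/devstack+message:"some+keyword",n,z
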
Qiming_hello, infra04:31
thingeewaiting on something that's blocking the tag for Cinder in k3...it has one job stuck in queued for some time.04:31
Qiming_another review is appreciated: https://review.openstack.org/#/c/164963/04:31
thingee166003 review04:31
wuhgclarkb: thanks, it works04:32
*** dims has quit IRC04:32
*** rkukura has quit IRC04:33
*** asettle has quit IRC04:35
*** sputnik13 has joined #openstack-infra04:36
*** tkelsey has joined #openstack-infra04:36
*** tkelsey has quit IRC04:41
*** sputnik13 has quit IRC04:44
*** sputnik13 has joined #openstack-infra04:46
*** Sukhdev has quit IRC04:53
*** ghostpl_ has joined #openstack-infra04:54
*** rkukura has joined #openstack-infra04:54
*** baoli_ has quit IRC04:57
*** yamahata has joined #openstack-infra04:58
*** sputnik13 has quit IRC04:59
*** sputnik13 has joined #openstack-infra05:00
*** amotoki_ has joined #openstack-infra05:00
*** ghostpl_ has quit IRC05:01
*** sigmavirus24_awa is now known as sigmavirus2405:02
*** sputnik13 has quit IRC05:02
*** chlong has quit IRC05:02
*** sputnik13 has joined #openstack-infra05:03
*** ddieterly has joined #openstack-infra05:06
*** achanda has joined #openstack-infra05:08
*** VijayTripathi has joined #openstack-infra05:08
*** ddieterly has quit IRC05:10
*** mriedem_away has quit IRC05:18
*** mriedem has joined #openstack-infra05:18
*** mriedem has quit IRC05:18
*** mriedem has joined #openstack-infra05:18
*** chlong has joined #openstack-infra05:19
*** markvoelker has joined #openstack-infra05:20
*** garyh has joined #openstack-infra05:21
*** markvoelker has quit IRC05:25
*** coolsvap is now known as coolsvap|afk05:28
*** jyuso1 has quit IRC05:31
*** garyh has quit IRC05:31
*** alexpilotti has quit IRC05:35
*** tsg has quit IRC05:35
*** coolsvap|afk is now known as coolsvap05:36
*** sputnik13 has quit IRC05:46
*** sputnik13 has joined #openstack-infra05:47
*** hdd has joined #openstack-infra05:48
*** Somay has quit IRC05:52
*** sputnik13 has quit IRC05:54
*** reed has quit IRC05:55
*** dannywilson has quit IRC05:58
*** xyang1 has quit IRC06:00
*** chlong has quit IRC06:01
*** hdd has quit IRC06:03
*** sdake_ has joined #openstack-infra06:04
*** VijayTripathi has quit IRC06:08
*** sdake has quit IRC06:08
*** sputnik13 has joined #openstack-infra06:11
*** BharatK has quit IRC06:11
*** BharatK has joined #openstack-infra06:12
*** sputnik13 has quit IRC06:12
*** chlong has joined #openstack-infra06:13
*** [HeOS] has quit IRC06:16
*** dims has joined #openstack-infra06:18
*** nilasae has joined #openstack-infra06:18
*** emagana has quit IRC06:18
*** sdake has joined #openstack-infra06:18
*** markvoelker has joined #openstack-infra06:21
*** sdake_ has quit IRC06:22
*** dims has quit IRC06:24
*** markvoelker has quit IRC06:26
*** mrda is now known as mrda-afk06:31
*** garyh has joined #openstack-infra06:32
*** fifieldt has joined #openstack-infra06:32
openstackgerritSteve Kowalik proposed openstack-infra/system-config: Add zanata-cli utility to proposal slave  https://review.openstack.org/16610906:32
*** jamielennox is now known as jamielennox|away06:35
StevenKpleia2, cinerama: ^06:35
*** macjack has joined #openstack-infra06:36
*** deepakcs has joined #openstack-infra06:40
*** macjack has quit IRC06:40
thingeegate queue just restart?06:40
thingeeI had two jobs pending to cut cinder that were five mins left from being done...and now back to an hour06:41
* thingee wants sleep06:41
*** macjack has joined #openstack-infra06:41
*** garyh has quit IRC06:42
*** subscope_ has quit IRC06:42
*** teran has quit IRC06:43
*** jyuso1 has joined #openstack-infra06:44
*** sigmavirus24 is now known as sigmavirus24_awa06:44
*** ghostpl_ has joined #openstack-infra06:44
thingeeand gate just restarted all my builds again06:48
*** emagana has joined #openstack-infra06:49
*** juggler_ is now known as juggler06:50
*** macjack has quit IRC06:50
*** mrunge has joined #openstack-infra06:52
openstackgerrityolanda.robla proposed openstack-infra/project-config: Add stackforge/puppet-nscld  https://review.openstack.org/16592206:53
*** yolanda has joined #openstack-infra06:54
*** emagana has quit IRC06:54
*** ghostpl_ has quit IRC06:55
*** yamahata has quit IRC06:58
*** yamahata has joined #openstack-infra06:58
*** Bsony has quit IRC07:01
*** fandi has joined #openstack-infra07:05
*** fandi has quit IRC07:06
*** fandi has joined #openstack-infra07:07
*** ddieterly has joined #openstack-infra07:08
*** coolsvap is now known as coolsvap_07:09
*** fandi has quit IRC07:10
*** scheuran has joined #openstack-infra07:10
*** fandi has joined #openstack-infra07:10
*** ddieterly has quit IRC07:12
*** fandi has quit IRC07:13
*** fandi has joined #openstack-infra07:14
*** emagana has joined #openstack-infra07:15
*** fandi has quit IRC07:17
*** achanda has quit IRC07:17
*** fandi has joined #openstack-infra07:17
*** emagana_ has joined #openstack-infra07:18
*** achuprin has quit IRC07:19
*** emagana has quit IRC07:20
*** sabeen has quit IRC07:20
*** fandi has quit IRC07:21
*** fandi has joined #openstack-infra07:22
*** markvoelker has joined #openstack-infra07:22
*** emagana_ has quit IRC07:23
*** fandi has quit IRC07:25
*** fandi has joined #openstack-infra07:26
*** markvoelker has quit IRC07:27
*** achanda has joined #openstack-infra07:29
*** fandi has quit IRC07:29
*** fandi has joined #openstack-infra07:30
openstackgerritJan Provaznik proposed openstack-infra/project-config: Create os-cloud-management project on Stackforge  https://review.openstack.org/16543307:31
*** achuprin has joined #openstack-infra07:32
*** fandi has quit IRC07:33
*** fandi has joined #openstack-infra07:33
*** yfried|afk is now known as yfried07:33
openstackgerritgreghaynes proposed openstack-infra/nodepool: Monkeypatch Fake Clients for tests  https://review.openstack.org/16568207:36
*** fandi has quit IRC07:37
*** fandi has joined #openstack-infra07:37
*** camunoz_mtg has quit IRC07:38
*** fandi has quit IRC07:39
*** shardy has joined #openstack-infra07:40
*** garyh has joined #openstack-infra07:42
*** Bsony has joined #openstack-infra07:43
GheRiveromorning07:43
*** chlong has quit IRC07:45
*** ildikov has quit IRC07:48
openstackgerritMerged openstack-infra/project-config: Update puppet-setproxy to belong to Gozer group  https://review.openstack.org/16482007:48
*** arxcruz has joined #openstack-infra07:48
yolandahi AJaeger, thx for the approval. How can we manage to get some people added to the gozer gerrit group?07:49
yolandamorning GheRivero07:49
*** e0ne has joined #openstack-infra07:51
*** yfried is now known as yfried|afk07:51
*** garyh has quit IRC07:53
*** ibiris_away is now known as ibiris07:55
*** markus_z has joined #openstack-infra07:56
*** Somay has joined #openstack-infra07:57
*** jistr has joined #openstack-infra07:58
*** asselin_ has joined #openstack-infra08:00
*** jcoufal has joined #openstack-infra08:01
*** dtantsur|afk is now known as dtantsur08:02
*** Somay has quit IRC08:04
*** asselin_ has quit IRC08:05
*** Somay has joined #openstack-infra08:05
*** __mimir has joined #openstack-infra08:08
*** ddieterly has joined #openstack-infra08:08
*** __mimir has quit IRC08:09
*** dims has joined #openstack-infra08:09
*** __mimir has joined #openstack-infra08:09
*** Somay has quit IRC08:11
*** ghostpl_ has joined #openstack-infra08:11
*** oomichi has quit IRC08:12
*** tnovacik has joined #openstack-infra08:12
*** emagana has joined #openstack-infra08:12
*** Longgeek has joined #openstack-infra08:12
*** ddieterly has quit IRC08:13
*** ominakov has joined #openstack-infra08:13
openstackgerritgreghaynes proposed openstack-infra/nodepool: Don't die while doing alien list  https://review.openstack.org/16579208:14
*** mpavone has joined #openstack-infra08:15
*** emagana has quit IRC08:17
*** ghostpl_ has quit IRC08:17
*** dims has quit IRC08:18
*** _nadya_ has joined #openstack-infra08:19
*** e0ne has quit IRC08:19
*** openstackgerrit has quit IRC08:22
*** openstackgerrit has joined #openstack-infra08:22
openstackgerritgreghaynes proposed openstack-infra/nodepool: Dont die on alien-image-list failure  https://review.openstack.org/16613208:22
*** markvoelker has joined #openstack-infra08:22
*** achanda has quit IRC08:23
*** achanda has joined #openstack-infra08:27
*** ildikov has joined #openstack-infra08:27
*** markvoelker has quit IRC08:27
*** dboik_ has quit IRC08:28
*** deepakcs has quit IRC08:30
*** boris-42 has quit IRC08:32
*** hashar has joined #openstack-infra08:36
*** Bsony_ has joined #openstack-infra08:39
*** Bsony has quit IRC08:40
*** dtantsur is now known as dtantsur|bbl08:41
AJaegeryolanda, wait for one of the infra roots to add you to the gozer gerrit group. Let's ask fungi or clarkb to it during the US morning.08:46
*** achanda has quit IRC08:50
*** marun has quit IRC08:51
*** garyh has joined #openstack-infra08:53
*** stevemar has quit IRC08:54
*** Somay has joined #openstack-infra08:56
*** andreykurilin_ has joined #openstack-infra08:58
*** dannywilson has joined #openstack-infra08:59
*** skolekonov has joined #openstack-infra09:00
*** andreykurilin_ has quit IRC09:03
*** dannywilson has quit IRC09:03
*** garyh has quit IRC09:04
*** andreykurilin_ has joined #openstack-infra09:04
*** ildikov has quit IRC09:06
*** emagana has joined #openstack-infra09:07
*** Somay has quit IRC09:07
*** Ala has joined #openstack-infra09:07
*** andreykurilin__ has joined #openstack-infra09:08
*** yamahata has quit IRC09:09
*** andreykurilin_ has quit IRC09:09
*** ___mimir has joined #openstack-infra09:10
*** emagana has quit IRC09:11
*** Somay has joined #openstack-infra09:12
*** __mimir has quit IRC09:13
*** ghostpl_ has joined #openstack-infra09:13
*** jamielennox|away is now known as jamielennox09:14
*** tkelsey has joined #openstack-infra09:19
*** markvoelker has joined #openstack-infra09:23
*** tkelsey has quit IRC09:24
*** ghostpl_ has quit IRC09:24
*** tkelsey has joined #openstack-infra09:24
*** Longgeek has quit IRC09:24
*** Longgeek has joined #openstack-infra09:25
*** zz_johnthetubagu is now known as johnthetubaguy09:25
*** andreykurilin__ has quit IRC09:27
*** andreykurilin_ has joined #openstack-infra09:27
*** derekh has joined #openstack-infra09:28
*** markvoelker has quit IRC09:28
*** Longgeek has quit IRC09:30
*** dizquierdo has joined #openstack-infra09:32
*** mtreinish has quit IRC09:35
*** mtreinish has joined #openstack-infra09:36
*** amotoki has quit IRC09:37
*** _nadya_ has joined #openstack-infra09:40
*** Qiming__ has joined #openstack-infra09:41
*** yfried|afk is now known as yfried09:43
*** andreykurilin_ has quit IRC09:43
*** ZZelle has quit IRC09:43
*** ZZelle has joined #openstack-infra09:44
*** Qiming_ has quit IRC09:44
*** ominakov has quit IRC09:49
*** ominakov has joined #openstack-infra09:50
yolandahi AJaeger, ok09:52
*** yfried is now known as yfried|afk09:54
*** hichihara has quit IRC09:56
*** BobBall_AWOL is now known as BobBall09:58
*** yfried|afk is now known as yfried09:59
*** emagana has joined #openstack-infra10:01
*** ssam2 has joined #openstack-infra10:02
*** yamamoto has quit IRC10:02
*** garyh has joined #openstack-infra10:04
*** emagana has quit IRC10:05
*** yamamoto has joined #openstack-infra10:05
*** sileht has quit IRC10:07
*** e0ne has joined #openstack-infra10:08
*** yfried is now known as yfried|afk10:10
*** ddieterly has joined #openstack-infra10:10
*** dimsum__ has joined #openstack-infra10:11
*** yfried|afk is now known as yfried10:14
*** ddieterly has quit IRC10:14
*** garyh has quit IRC10:15
*** pblaho__ is now known as pblaho10:15
*** mfink__ has quit IRC10:19
*** hashar has quit IRC10:21
*** sileht has joined #openstack-infra10:22
*** sushilkm has joined #openstack-infra10:23
*** markvoelker has joined #openstack-infra10:24
*** yamamoto has quit IRC10:25
*** Longgeek has joined #openstack-infra10:26
*** yamamoto has joined #openstack-infra10:27
*** yamamoto has quit IRC10:28
*** markvoelker has quit IRC10:29
*** yfried is now known as yfried|afk10:30
*** Longgeek has quit IRC10:31
*** ___mimir has quit IRC10:33
*** rlandy has joined #openstack-infra10:36
*** pc_m has joined #openstack-infra10:40
*** erlon has joined #openstack-infra10:42
*** YorikSar has joined #openstack-infra10:47
openstackgerritValeriy Ponomaryov proposed openstack/requirements: Bumg ddt to min version 0.7.0  https://review.openstack.org/16616210:49
*** sushilkm has left #openstack-infra10:49
*** yfried|afk is now known as yfried10:50
*** ___mimir has joined #openstack-infra10:51
openstackgerritValeriy Ponomaryov proposed openstack/requirements: Bump ddt to min version 0.7.0  https://review.openstack.org/16616210:51
*** yamamoto has joined #openstack-infra10:52
*** BharatK has quit IRC10:52
*** enikanorov has quit IRC10:54
*** emagana has joined #openstack-infra10:55
*** e0ne is now known as e0ne_10:55
*** enikanorov has joined #openstack-infra10:55
*** tkelsey has quit IRC10:56
*** tkelsey has joined #openstack-infra10:56
*** e0ne_ is now known as e0ne10:57
*** emagana has quit IRC11:00
*** Longgeek has joined #openstack-infra11:01
*** tnovacik has quit IRC11:03
*** enikanorov has quit IRC11:04
*** enikanorov has joined #openstack-infra11:05
*** Somay has quit IRC11:05
*** enikanorov has quit IRC11:06
*** ghostpl_ has joined #openstack-infra11:06
*** enikanorov has joined #openstack-infra11:07
*** yfried is now known as yfried|afk11:07
*** Somay has joined #openstack-infra11:08
*** _nadya_ has quit IRC11:09
*** mpaolino has joined #openstack-infra11:09
*** ddieterly has joined #openstack-infra11:11
*** Qiming_ has joined #openstack-infra11:11
*** cdent has joined #openstack-infra11:12
*** baoli has joined #openstack-infra11:13
*** baoli has quit IRC11:13
*** Qiming__ has quit IRC11:15
*** ddieterly has quit IRC11:15
*** garyh has joined #openstack-infra11:16
*** jcoufal has quit IRC11:16
*** enikanorov has quit IRC11:26
*** garyh has quit IRC11:26
*** enikanorov has joined #openstack-infra11:27
*** yfried|afk is now known as yfried11:27
openstackgerritChris Dent proposed openstack/requirements: Update gabbi to 0.12.0  https://review.openstack.org/15625311:29
*** enikanorov has quit IRC11:30
*** enikanorov has joined #openstack-infra11:31
*** ldnunes has joined #openstack-infra11:31
*** yfried is now known as yfried|afk11:37
*** enikanorov has quit IRC11:39
*** enikanorov has joined #openstack-infra11:40
*** dtantsur|bbl is now known as dtantsur11:41
*** otter768 has joined #openstack-infra11:42
*** jlanoux has joined #openstack-infra11:43
*** dprince has quit IRC11:43
*** claudiub has joined #openstack-infra11:45
*** mestery is now known as mestery_afk11:45
*** ominakov has quit IRC11:47
*** otter768 has quit IRC11:47
*** emagana has joined #openstack-infra11:49
*** e0ne is now known as e0ne_11:49
openstackgerritMerged openstack/requirements: Remove failing project nova-docker  https://review.openstack.org/15626011:54
*** emagana has quit IRC11:54
*** fbo has joined #openstack-infra11:54
*** fifieldt has quit IRC11:54
*** fifieldt_ has joined #openstack-infra11:55
*** dizquierdo has quit IRC11:55
*** sdake has quit IRC11:57
*** fifieldt__ has joined #openstack-infra11:58
*** pelix has joined #openstack-infra11:59
*** dizquierdo has joined #openstack-infra11:59
*** e0ne_ is now known as e0ne11:59
*** fifieldt_ has quit IRC12:00
openstackgerritDmitry Tantsur proposed openstack/requirements: Add ironic-discoverd to projects.txt  https://review.openstack.org/15627012:01
*** yfried|afk is now known as yfried12:01
openstackgerritSean Dague proposed openstack-infra/os-loganalyze: extract static methods  https://review.openstack.org/16585012:02
openstackgerritSean Dague proposed openstack-infra/os-loganalyze: unwind test class multiple inheritance  https://review.openstack.org/16585112:02
openstackgerritSean Dague proposed openstack-infra/os-loganalyze: let tests be run from test file location  https://review.openstack.org/16579912:02
*** markvoelker has joined #openstack-infra12:03
*** ghostpl_ has quit IRC12:03
*** Qiming__ has joined #openstack-infra12:06
*** eharney has quit IRC12:07
*** Qiming_ has quit IRC12:07
*** Somay has quit IRC12:10
*** rfolco has joined #openstack-infra12:10
TheJuliagood morning12:10
*** ihrachyshka has joined #openstack-infra12:11
*** dprince has joined #openstack-infra12:11
*** ibiris is now known as ibiris_away12:11
*** ddieterly has joined #openstack-infra12:11
*** yfried is now known as yfried|afk12:11
*** jlanoux has quit IRC12:14
*** radez_g0n3 is now known as radez12:16
*** ddieterly has quit IRC12:16
*** radez_g0n3 has joined #openstack-infra12:16
*** radez_g0n3 is now known as radez12:16
*** ghostpl_ has joined #openstack-infra12:17
*** dkliban_afk is now known as dkliban12:18
*** anthonyper has quit IRC12:18
*** anthonyper has joined #openstack-infra12:18
*** aysyd has joined #openstack-infra12:21
*** ibiris_away is now known as ibiris12:22
openstackgerritMerged openstack/requirements: Bump novaclient version  https://review.openstack.org/16249212:22
*** jaypipes has joined #openstack-infra12:22
Kiallany requirements core besides sean about? https://review.openstack.org/#/c/158287/ :)12:25
*** garyh has joined #openstack-infra12:27
*** Longgeek has quit IRC12:27
AJaegersdague, https://review.openstack.org/#/c/164077/ is needed to fix important bugs in our documentation toolchain, please reconsider your -212:27
*** gordc has joined #openstack-infra12:27
*** baoli has joined #openstack-infra12:29
*** bknudson has quit IRC12:29
sdagueAJaeger: can we just remove the docs from g-r12:29
sdaguebecause honestly, there is no reason for the doc repos to be in there12:30
sdagueespecially as your freeze windows are different12:30
*** sdake has joined #openstack-infra12:30
AJaegersdague, we had this discussion already ;) We really like the syncing of requirements and somebody should implement this in a different way...12:31
sdagueyep, so then you have to live with freeze restrictions12:31
sdagueyou can't have it both ways12:31
AJaegersdague, but I just had one idea: We already sync from openstack-manuals the glossary, we could sync requirements, let me investigate12:31
sdaguemy patience on this point is pretty limitted12:31
*** yfried|afk is now known as yfried12:32
AJaegersdague, it was submitted a week ago - wasn't that before the feature freeze?12:32
sdagueand, honestly, openstack-manuals is such a small number of projects, it's way easier for you folks to sync your projects directly and not do these g-r round trips12:32
sdagueAJaeger: doesn't matter when it's submitted12:32
sdagueit didn't land12:32
AJaegersdague, when do you unfreeze? Is that before Kilo is released?12:33
sdagueafter all integrated projects have stable branches12:33
*** kgiusti has joined #openstack-infra12:34
*** baoli has quit IRC12:34
*** bswartz has quit IRC12:34
sdagueI *literally* have no idea why you think g-r makes any sense for the documentation team12:35
sdagueif the projects weren't in projects.txt you would have already landed these changes in your repos12:35
AJaegersdague, I have to leave for a meeting now - I understand your arguments but need a different solution.12:36
sdagueI don't know why12:36
AJaegerOnce we have one, I happily do the changes on the documentation side.12:36
sdagueno, seriously, you have what, 6 repos?12:36
AJaegersdague, when we did this, we had 10+12:36
*** adalbas has joined #openstack-infra12:36
sdagueright, but you don't now12:37
*** garyh has quit IRC12:37
AJaeger;)12:37
sdagueand, even then, it would have been so much faster for you to locally sync all those than going through the g-r process12:37
sdagueit makes 0 sense that you keep insisting on that12:38
*** hodos has joined #openstack-infra12:40
*** adalbas has quit IRC12:41
*** e0ne is now known as e0ne_12:41
*** Longgeek has joined #openstack-infra12:42
*** unicell1 has joined #openstack-infra12:43
*** emagana has joined #openstack-infra12:43
*** e0ne_ is now known as e0ne12:44
*** unicell has quit IRC12:44
*** markus_z has quit IRC12:46
sdaguefungi: can you land - https://review.openstack.org/#/c/165542/ - I think it will fix some of the es indexing12:47
*** emagana has quit IRC12:48
*** markus_z has joined #openstack-infra12:49
*** ddieterly has joined #openstack-infra12:50
*** sdake_ has joined #openstack-infra12:52
*** adalbas has joined #openstack-infra12:53
*** ddieterly has quit IRC12:53
*** pelix has quit IRC12:53
*** bknudson has joined #openstack-infra12:54
*** baoli has joined #openstack-infra12:54
*** sdake has quit IRC12:55
openstackgerritRafael Folco proposed openstack-infra/system-config: Updates to running-your-own CI docs: Changes required  https://review.openstack.org/16226812:55
*** baoli has quit IRC12:58
*** baoli has joined #openstack-infra12:59
*** ChuckC_ has joined #openstack-infra13:00
*** ChuckC has quit IRC13:01
*** bradjones has joined #openstack-infra13:01
*** ChuckC_ has quit IRC13:05
*** enikanorov has quit IRC13:09
*** mattfarina has joined #openstack-infra13:10
*** enikanorov has joined #openstack-infra13:11
*** ihrachyshka has quit IRC13:14
*** yfried is now known as yfried|afk13:15
*** eharney has joined #openstack-infra13:15
*** xyang1 has joined #openstack-infra13:16
*** ChuckC_ has joined #openstack-infra13:19
*** bswartz has joined #openstack-infra13:19
*** dustins has joined #openstack-infra13:22
*** ildikov has joined #openstack-infra13:23
openstackgerritPaul Belanger proposed stackforge/gertty: Add missing requirement for six  https://review.openstack.org/16621813:24
*** JoshNang has quit IRC13:24
*** eharney has quit IRC13:25
*** zz_dimtruck is now known as dimtruck13:25
openstackgerritPaul Belanger proposed stackforge/gertty: Add missing requirement for six  https://review.openstack.org/16621813:25
*** JoshNang has joined #openstack-infra13:26
*** eharney has joined #openstack-infra13:26
*** dimsum__ has quit IRC13:27
dprinceIf I manually clear out the TripleO RH1 cloud instances will nodepool discover them missing on its next cycle and recreate them?13:32
*** ffrog has joined #openstack-infra13:33
*** nilasae is now known as nilasae|afk13:33
*** eharney has quit IRC13:33
*** Longgeek has quit IRC13:34
*** amotoki_ has quit IRC13:35
mordreddprince: if not, it's super simple to delete them from nodepool's database13:35
*** yfried|afk is now known as yfried13:35
*** amotoki has joined #openstack-infra13:35
*** cdent has quit IRC13:36
dprincemordred: could you delete them for me?13:36
mordreddprince: sure13:36
dprincemordred: the TripleO RH1 zone nodes.13:36
*** jamielennox is now known as jamielennox|away13:36
*** peristeri has joined #openstack-infra13:37
*** emagana has joined #openstack-infra13:37
*** garyh has joined #openstack-infra13:37
mordreddprince: you're in luck - nodepool already doesn't think it has any nodes there13:38
dprincemordred: great. any idea how long before I see new ones spawning?13:39
mordredlet me look at the logs real quick ...13:39
*** wuhg has quit IRC13:39
*** gaelL_ has quit IRC13:39
mordreddprince: should be soon - nodepool shows a demand13:41
mordreddprince: 2015-03-20 13:39:44,584 DEBUG nodepool.NodePool:   Deficit: tripleo-f20: 31 (start: 31 min-ready: 8 ready: 0 capacity: 0)13:41
mordred2015-03-20 13:39:44,603 DEBUG nodepool.NodePool:   Deficit: tripleo-precise: 51 (start: 51 min-ready: 8 ready: 0 capacity: 0)13:41
*** rhe00 has quit IRC13:41
*** gaelL has joined #openstack-infra13:41
openstackgerritMerged openstack/requirements: Import cap.py tool to cap  explicit dependencies  https://review.openstack.org/15545413:41
openstackgerritMerged openstack/requirements: Up pymongo version to avoid memory leak  https://review.openstack.org/12399513:42
dprincemordred: cool, thanks13:42
*** rhe00 has joined #openstack-infra13:42
*** emagana has quit IRC13:42
openstackgerritMerged openstack/requirements: Block eventlet 0.17.0  https://review.openstack.org/15828713:42
mordreddprince: oh! no, we have you turned off ...13:42
*** mpavone has quit IRC13:42
mordreddprince: one sec - let me see what your quota setting should be13:42
openstackgerritPaul Belanger proposed stackforge/gertty: Add support for tox -epep8  https://review.openstack.org/16622913:43
*** sushilkm has joined #openstack-infra13:43
*** sushilkm has left #openstack-infra13:43
*** otter768 has joined #openstack-infra13:43
mordreddprince: k. NOW you should start seeing nodes build13:43
*** ddieterly has joined #openstack-infra13:43
dprincemordred: okay, thanks. Will watch these closely13:44
*** Qiming__ is now known as Qiming13:45
*** yfried is now known as yfried|afk13:45
Qiminghello, openstack-infra, another review of this new project proposal is appreciated: https://review.openstack.org/#/c/164963/13:46
Qimingthanks13:46
dprincemordred: I see them going ACTIVE, and floatingips too13:47
mordreddprince: woot!13:47
dprincemordred: 1 major outage in a year isn't too bad. Thinking the root cause was a MySQL issue of sorts13:47
*** otter768 has quit IRC13:48
*** garyh has quit IRC13:48
dprincemordred: still looking into the logs but simply bouncing MySQL and clearing out some things made it happy again (we think)13:48
*** ildikov has quit IRC13:48
mordredcool! yeah - that's actually pretty solid I think13:48
mordredI mean, we had issues with hp public cloud yesterday that were also mysql related ... so it's fair :)13:48
*** eharney has joined #openstack-infra13:49
*** mtanino has joined #openstack-infra13:50
*** ihrachyshka has joined #openstack-infra13:51
*** tkelsey has quit IRC13:51
*** dimsum__ has joined #openstack-infra13:51
*** raginbajin has quit IRC13:53
*** hdd has joined #openstack-infra13:54
*** amitgandhinz has joined #openstack-infra13:54
*** raginbajin has joined #openstack-infra13:55
*** dboik has joined #openstack-infra13:55
*** alexpilotti has joined #openstack-infra13:56
openstackgerritMerged openstack-infra/project-config: puppet-openstack update  https://review.openstack.org/16333313:57
openstackgerritMerged openstack-infra/project-config: Custom OVERRIDE_ENABLED_SERVICES for heat-dsvm-functional  https://review.openstack.org/16248713:57
openstackgerritMerged openstack-infra/project-config: Run ironicclient functional tests as STACK_USER  https://review.openstack.org/16355213:57
*** hdd has quit IRC13:59
openstackgerritPaul Belanger proposed openstack-infra/project-config: Add pep8 / py27 gates for gertty  https://review.openstack.org/16623413:59
sdaguemordred: speaking of hpcloud, is that recovered yet?13:59
mordredsdague: kinda - we've found a rate limit that seems to be working out ok and not causing death14:00
*** notnownikki has joined #openstack-infra14:00
*** tqtran has joined #openstack-infra14:00
sdagueok, we still have like 500 nodes in building14:00
mordredbut we haven't poked further to see if we can increase it14:00
*** tqtran has quit IRC14:01
*** _nadya_ has joined #openstack-infra14:01
*** yfried|afk is now known as yfried14:02
*** dansmith is now known as superdan14:03
mordredsdague: the underlying problem seems to be a thundering herd issue with an interaction between slow deletes and quota interactions14:03
mordredsdague: in that something in there takes enough time that if our API rate hits above a certain point, MySQL can't service queries faster than it's getting new ones14:04
sdagueinteresting, would be nice if we could get a more direct link into the ops to figure out what that hot spot is, and if it's fixable in the code side14:06
*** esker has joined #openstack-infra14:06
mordredsdague: so you see TONS of things in _refresh_quota_usages14:06
openstackgerritMerged openstack-infra/project-config: Add experimental job for Manila scenario tests  https://review.openstack.org/16410214:06
*** cdent has joined #openstack-infra14:06
openstackgerritMerged openstack-infra/project-config: Change project description text  https://review.openstack.org/16450114:07
mordredsdague: I'm certain I could set that up- I mean, I started looking at nova source code last night, then started backing away quietly14:07
openstackgerritMerged openstack-infra/project-config: Change node param for ec2api rally job.  https://review.openstack.org/16471714:07
sdagueyeh, the quotas code is ... problematic14:07
mordredthe select for update is just not a good idea :)14:07
sdagueyeh, most of that is getting unwound14:08
openstackgerritMonty Taylor proposed openstack-infra/system-config: Turn HP back on with lower rate limit  https://review.openstack.org/16623914:10
mordredI was also thinking - in addition to adding rebuild support to nodepool14:10
mordredwe have knowledge of what our desired amount of nodes is at any given point in time - we should look in to sending multi-node requests14:11
mordredso rather than saying "nova boot" 100 times, we should say "nova boot --count=100" - perhaps14:11
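For reference, a minimal sketch of what a single multi-node boot request could look like with python-novaclient, assuming the min_count/max_count parameters are honoured by the provider; the credentials, image and flavor identifiers below are placeholders.

    from novaclient import client

    # Placeholder credentials and endpoint.
    nova = client.Client('2', 'user', 'password', 'tenant',
                         'https://example.com:5000/v2.0')

    # One create call asking nova to schedule 100 identical instances,
    # instead of issuing "nova boot" 100 times.
    nova.servers.create(
        name='nodepool-node',
        image='IMAGE_UUID',    # placeholder
        flavor='FLAVOR_ID',    # placeholder
        min_count=100,         # fail the request if fewer can be scheduled
        max_count=100)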
clarkbyou still need 100 fip attaches, this was sort of my point yesterday about why this is :(14:13
mordredyah. that part is still :(14:13
clarkbsure we optimize one bit but still we are O(n) because cloud14:13
dprincemordred: are there public logs I could view to gain insight into why the Fedora jobs are still queued?14:14
mordredyah - but one thing at a time14:14
dprincemordred: Seeing active instances getting deleted now. Makes me think something is failing with regards to setting up the Fedora slaves14:14
mordredclarkb: are we public-ing the nodepool logs? ^^14:14
dprincemordred: FWIW the Ubuntu jobs seem to be running fine we think14:14
clarkbno because openstack leaks private data to logs14:15
*** esker has quit IRC14:15
*** hodos has quit IRC14:15
dprincemordred: not fine actually, but at least trying to run...14:15
*** esker has joined #openstack-infra14:15
clarkbdprince I gave you the error from a fedora node yesterday14:15
dprinceclarkb: right, we think we solved that one.14:16
dprinceclarkb: nodepool got turned off yesterday for TripleO14:16
dprinceclarkb: now it is back on again so we are checking some things14:16
*** sputnik13 has joined #openstack-infra14:19
mordredsdague: while we're on the subject - why does a delete api call take a long time? is it blocking on something rather than just plopping a delete request on a queue?14:19
sdagueit's an async call as far as I know14:19
*** mestery_afk has quit IRC14:19
fungiyeah, if you manually nova delete, you'll see it says that it accepted the request, but it doesn't actually disappear from nova list for a while14:20
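A minimal illustration of that behaviour with python-novaclient (credentials and the server UUID are placeholders): delete() returns as soon as the API accepts the request, while the instance keeps appearing in listings until the compute side finishes tearing it down.

    import time
    from novaclient import client

    nova = client.Client('2', 'user', 'password', 'tenant',
                         'https://example.com:5000/v2.0')  # placeholders

    server = nova.servers.get('SERVER_UUID')  # placeholder
    server.delete()  # returns once the API accepts the request

    # The actual teardown is asynchronous; poll until the server is gone.
    while True:
        try:
            nova.servers.get(server.id)
            time.sleep(5)
        except Exception:  # novaclient raises NotFound once it disappears
            break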
*** prad has joined #openstack-infra14:21
*** Qiming_ has joined #openstack-infra14:21
fungiso are we still wanting to revert the nodepool patch from yesterday?14:22
sdaguemordred: nope, I'm wrong, it's sync14:23
mordredfungi: _I_ don't14:23
fungii assume from the looks of the graph that hpcloud is still in a bad way14:23
mordredsdague: I'd suggest making it async - since deleting doesn't actually happen at that point anyway14:23
sdagueyeh, easier said than done14:23
mordredsdague: like, we block on the API call for 30 seconds, and still have to wait for hours for the node to get deleted14:24
mordredsdague: :)14:24
sdagueI'm looking through that code path14:24
clarkbdprince: BadRequest: Error. Unable to associate floating ip (HTTP 400) (Request-ID: req-cedfd88b-ba1e-4a4e-aa19-e6d131fd8db7)14:24
*** tqtran has joined #openstack-infra14:24
mordredsdague: I'm in a chat with folks about the issues - do you want me to bring up that you think it might be interesting to dig in?14:24
*** Qiming has quit IRC14:24
sdaguemordred: sure, though honestly it probably won't be today14:25
*** che-arne has joined #openstack-infra14:25
mordredsdague: k. I'll bring that up in a different email then14:25
dprinceclarkb: sigh, so the same error again?14:25
*** timcline has joined #openstack-infra14:25
*** ominakov has joined #openstack-infra14:26
clarkbdprince: yes14:26
*** bhunter71 has joined #openstack-infra14:26
dprinceclarkb: thanks14:26
clarkbat least from nodepools perspective that is what is happening14:26
dprinceclarkb: are those public?14:26
*** e0ne is now known as e0ne_14:26
clarkbdprince: no we can't make these logs public because openstack clients refuse to sanitize their logging14:26
mordredclarkb: have we checked that recently? it's possible it's been fixed14:26
dprinceclarkb: also, do you see lot of these or just a few. Using clients myself I'm seeing floatingip's get assigned just fine14:27
clarkbmordred: I haven't checked it since last summit but also not sure we have upgraded any clients since last summit either14:27
*** e0ne_ is now known as e0ne14:27
dprinceclarkb: well, at least I did for the first round of instances14:27
fungii think they (some?) still do it in debug, so if we set debug on a client lib using service and it applies transitively, credentials in logs14:27
*** peristeri has quit IRC14:27
sdaguemordred: so my guess, honestly, is that all the quota calculations are the delete cost14:27
clarkbdprince: 65 since the log was last rotated14:27
openstackgerritMerged openstack/requirements: Remove hardware-specific proliantutils module  https://review.openstack.org/15800014:27
openstackgerritMerged openstack/requirements: Do not break on projects without setup.cfg  https://review.openstack.org/15622014:28
clarkbdprince: looks like it rotated ~6 hours ago14:28
openstackgerritMerged openstack/requirements: Add a script to find cruft global requirements  https://review.openstack.org/14807114:28
fungiwhere "it" is print full copies of what's being sent in the api calls, which includes credentials14:28
*** tsg_ has joined #openstack-infra14:28
dprinceclarkb: I see many instances with floatingips actually. This one default-net=10.2.8.125, 66.187.229.119; tripleo-bm-test=192.168.1.79. The 66. address is the floatingip14:29
*** peristeri has joined #openstack-infra14:29
clarkbdprince: what is the instance uuid?14:29
dprinceclarkb: daf50f7d-d73c-460b-9634-274462c6e6c414:30
*** yfried is now known as yfried|afk14:30
clarkbdprince: Exception: Timeout waiting for ssh access is the error from that node14:31
clarkbwhich may be related to the issue ianw discovered on rax f20 nodes (prevented node from booting properly so ssh would fail)14:31
dprinceclarkb: right, I was thinking similar14:32
jd__ok sorry to ask here but I'm dumb; why is https://review.openstack.org/#/c/164182/ not merging? what do I miss?14:32
mordredsdague: well, that would fit with my napkin theory14:33
dprinceclarkb: could just be slowness though, still trying some things.14:33
*** wenlock has joined #openstack-infra14:33
mordredsdague: since it was the quota code that was killing the db - and it was mostly happening when we were not ratelimiting the deletes - so if they're long and sync ... that'll pile up easily14:33
clarkbmordred: sdague if this affects kilo nova hopefully it is treated as a critical bug and we can look at it before we release14:34
*** scheuran has quit IRC14:34
clarkbjd__: I am not sure at first glance, let me poke around14:34
fungidprince: are you able to see the virtual console for it? when we ran into that, if it's what we ran into, the console was looping with the bootloader failing to find a config14:35
*** prad has quit IRC14:35
openstackgerritMerged openstack-infra/project-config: Add job for network based elastic-recheck queries test  https://review.openstack.org/16486914:35
mordredclarkb: I agree - it's effectively a DDOS in a box right now14:35
dprincefungi: I can get at it I think. Will involve some tunnelling trickery so give me a bit.14:35
sdagueclarkb: this is a super long standing issue that requires substantial architecture changes14:35
mordredawesome14:35
*** prad has joined #openstack-infra14:36
openstackgerritMerged openstack-infra/project-config: Add gate check skip for rst/doc files os-ansible-deployment repository  https://review.openstack.org/16427114:36
clarkbsdague: huh I guess we never tripped it before because we serialized deletes14:37
fungijd__: clarkb: is there a dependency loop there? some of the changes depending on that one also have depends-on commit message headers set to other changes in the same project which i think might be also in that git dependency chain14:37
fungii'm trying to map out the dependencies behind it but they're a little complex14:37
clarkbsdague: which is why we had such a large delete backlog in the node graphs for so long14:37
clarkbfungi: I am looking at zuul logs14:38
clarkbfungi: hopefully between the two we get an answer14:38
mordredclarkb: I just had a VERY evil thought14:38
sdagueright14:38
mordredclarkb: what if we stopped deleting full-stop14:38
mordredclarkb: and just replaced our delete calls with rebuild calls14:38
mordredsince delete is broken14:38
jd__fungi: ah good hint let me check14:38
mordredit means our consumption would never decrease14:38
mordredbut actually our load against the clouds would be much less14:38
clarkb2015-03-20 10:23:37,390 DEBUG zuul.IndependentPipelineManager: Change <Change 0x7fef948f42d0 164182,6> does not match pipeline requirement <ChangeishFilter required_approvals: [{'username': 'jenkins', 'verified': [1, 2]}]> is interesting, we probably want a verified of 0 to be valid for merge check14:39
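For reference, the requirement clarkb quotes corresponds roughly to a zuul layout.yaml pipeline stanza like the one below (a sketch only; the pipeline name and surrounding layout details are illustrative), and the suggestion is to also treat a verified value of 0 as acceptable:

    pipelines:
      - name: merge-check            # illustrative
        manager: IndependentPipelineManager
        require:
          approval:
            - username: jenkins
              verified: [1, 2]       # suggestion above: also allow 0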
clarkbmordred: we need to test it with rackspace14:39
mordredclarkb: yah14:39
mordredclarkb: test that rebuild works you mean?14:39
clarkbmordred: iirc they were the cloud that said don't use rebuild14:39
clarkbya14:40
*** tonytan4ever has joined #openstack-infra14:40
clarkbbecause I am pretty sure rax's feedback a while back was rebuild is :(14:40
clarkbso we didn't keep looking into it with much priority14:40
openstackgerritJulien Danjou proposed openstack-infra/project-config: Move Gnocchi from Stackforge to OpenStack  https://review.openstack.org/16214614:40
openstackgerritJulien Danjou proposed openstack-infra/project-config: Remove some tests to Gnocchi  https://review.openstack.org/16421114:40
mordredmy god. so HP actively wants us to rebuild and RAX actively wants us to not14:40
mordredthat's so great14:40
clarkbmordred: well that may have changed14:40
sdagueI wonder if that's because rebuild on xen is hokey?14:41
mordredclarkb: it would take _slightly_ more work in nodepool than just a quick hack, btw14:41
jeblairclarkb: i don't remember negative feedback from rax about rebuild14:41
*** armax has joined #openstack-infra14:41
*** asselin_ has joined #openstack-infra14:41
*** dustins_ has joined #openstack-infra14:42
jd__fungi: there was a loop with another repo but at a later point in the branch, not sure that's the issue14:42
clarkbjeblair: the feedback was it will perform worse in our cloud so please do what you are doing now iirc14:42
fungijd__: looks like you found it. you had a child of a child of that change which was I5ddb00a depending on I56f1988 which was in turn depending on I5ddb00a14:42
jeblairclarkb: where was that feedback?14:42
clarkbjeblair: here in irc when mordred asked them about it14:42
jd__fungi: ok if that's it cool :)14:42
jeblairclarkb: i remember jogo saying he might want to make some things more efficient, but that's it14:42
fungijd__: i _think_ zuul tries to build up the entire dependency set including children and parents of the given change and if it finds a loop anywhere in there it aborts14:42
jd__ack :)14:43
fungijeblair: ^ yes?14:43
jeblairfungi: that should be the case yes14:43
jd__fungi: jeblair: sounds like it, my recheck has been picked! :)14:43
jd__thanks guys <314:43
fungiawesome14:43
jeblairclarkb: who provided that feedback?14:44
clarkbjeblair: I do not remember the specific individual14:44
*** dustins has quit IRC14:44
*** sushilkm has joined #openstack-infra14:45
*** marun has joined #openstack-infra14:45
*** sushilkm has left #openstack-infra14:45
*** sputnik13 has quit IRC14:45
jeblairclarkb: well, who shall we ask again then?14:45
*** rlandy has quit IRC14:46
clarkbI am reading logs...14:46
sdagueso, I think all the time is probably in this - https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L3463 . Optimizing there would probably be the place to do it, however stuff like that has enough ripple effects that it's definitely not a post freeze issue14:47
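A rough sketch of the locking pattern being discussed, not nova's actual code (the connection string and schema details are made up): refreshing quota usage with SELECT ... FOR UPDATE holds row locks for the project until the transaction commits, so concurrent boots and deletes for the same tenant serialize on those rows, and a burst of deletes lets queries pile up faster than MySQL can service them.

    from sqlalchemy import MetaData, Table, create_engine, select

    engine = create_engine('mysql+pymysql://user:pass@dbhost/nova')  # placeholder DSN
    quota_usages = Table('quota_usages', MetaData(), autoload_with=engine)

    with engine.begin() as conn:
        # Locks this project's usage rows until the transaction commits,
        # so every other request touching the same rows has to wait.
        rows = conn.execute(
            select(quota_usages)
            .where(quota_usages.c.project_id == 'PROJECT_ID')  # placeholder
            .with_for_update()
        ).fetchall()
        # ... recompute the usage and write it back before committing ...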
*** enikanorov has quit IRC14:47
*** claudiub has quit IRC14:48
*** garyh has joined #openstack-infra14:48
*** enikanorov has joined #openstack-infra14:48
jeblairmordred: anyway, please let's not invest time in making this version of nodepool use rebuild.  it would be a huge change to the algorithm that we will then throw away with zuulv3.  if hpcloud can't improve, let's switch back to the old task algorithm and save rebuild for zuulv3.14:49
openstackgerrityolanda.robla proposed openstack-infra/system-config: Don't hardcode pip.conf values  https://review.openstack.org/16625214:49
mordredjeblair: that's not what I was talking about14:50
clarkblooks like phschwartz thought it was a good idea in http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-12.log and was going to work on a patch. So at least according to that log they were ok with it14:50
clarkbnow to see if my memory is based on results from writing that change?14:50
*** yfried|afk is now known as yfried14:50
mordredjeblair: I was talking about not changing the algorithm - and just literally never making a delete call on an existing node in nodepool, but instead changing the body of deleteNode to call rebuild14:51
mordredjeblair: which is why I said it was an evil idea - since it would essentially keep us at max utilization constantly14:51
jeblairmordred: i understood that.  that's still a major change to the algorithm.  you have to decide what to rebuild into, etc.14:51
mordredjeblair: fair nuff14:52
jeblairbiab14:52
*** ___mimir has quit IRC14:53
*** masayukig_ has joined #openstack-infra14:53
mordredclarkb: we don't seem to mark subnodes with deleted state in the db14:55
mordredclarkb: we just call cleanupNode on them and then call node.delete()14:55
yolandaah, clarkb, i need you, or fungi14:56
*** aysyd has quit IRC14:56
yolandaa new gozer group was created14:56
yolandafor stackforge projects14:56
yolandaand i need someone to add people there14:56
*** mrunge has quit IRC14:56
clarkbyolanda: who should be the first group member (they can add the remaining members)14:56
yolandayou can add myself14:57
clarkbyolanda: you have two accounts, can you give me the account id number for the one you are using?14:57
yolandaah, ok14:57
clarkbmordred: that doesn't update the subnodes table?14:57
yolandait's still that legacy thing14:57
yolandalet me check14:57
mordredclarkb: don't think so14:57
mordredclarkb: I could be wrong though14:57
mordredclarkb: I'm not REALLY looking at that - mainly said it here to remind me to look further14:58
clarkbthere is a state column and on the running DB they have different states14:58
fungiwe supposedly have 388 alien nodes in hpcloud right now14:58
fungishould we be playing whack-a-mole with these still?14:58
yolandaah, mordred, jeblair, for the rebuild, i told Tim that should be better to create a spec14:58
mordredfungi: I'm VERY confused as to how we keep growing alien node there14:59
yolandaas it involves changes on nodepool logic, for the capacity algorithm14:59
*** garyh has quit IRC14:59
clarkbfwiw I am not finding any followup to the above conversation in the logs so I may have misremembered something that was said14:59
yolandaclarkb href="https://login.launchpad.net/+id/yMkMBPe"14:59
mordredyolanda: yes - it's definitely spec worthy - but I agree with jim, it's more likely something we'll want to do as part of zuulv314:59
clarkbphschwartz: any idea where you got with nodepool using rebuild?14:59
*** aysyd has joined #openstack-infra14:59
fungimordred: i can grab an example uuid and put together a boot/delete/whatever timeline if someone in hpcloud noc wants to trace the corresponding api calls14:59
clarkbyolanda: can you give me the gerrit account id? https://review.openstack.org/#/settings/ is wher you can find it14:59
yolandaclarkb, yolanda.robla15:00
mordredfungi: of one of our aliens? yeah - let's try that15:00
clarkbyolanda: done15:00
anteayathe weather has been good so I have to start boiling down sap today, will take me a few hours to get set up, back later15:00
yolandamordred, yes, concern i had is that you only need a rebuild if you still have demand of these types of nodes15:00
mordredyolanda: you can rebuild a node to a different type15:00
mordredyolanda: you don't have to be that fancy15:01
*** dimsum__ has quit IRC15:01
*** AJaeger has quit IRC15:01
yolandaah, nice, didn't know that it was possible15:01
yolandabut how about flavor ? will that work for different flavours?15:01
yolandaclarkb, thx15:02
clarkbno, I think rebuild basically takes an arbitrary image, writes it over an existing VM's disk, then reboots the VM15:02
clarkbso flavor needs to be constant15:02
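A minimal sketch with python-novaclient of the rebuild call being discussed (identifiers and credentials are placeholders): the instance is re-imaged in place, keeping its flavor, its ID and its IP addresses.

    from novaclient import client

    nova = client.Client('2', 'user', 'password', 'tenant',
                         'https://example.com:5000/v2.0')  # placeholders

    server = nova.servers.get('SERVER_UUID')        # placeholder
    # Write the new image over the existing disk and reboot; no new
    # scheduling or IP allocation is needed, but the flavor cannot change.
    nova.servers.rebuild(server, 'NEW_IMAGE_UUID')  # placeholder image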
*** stevemar has joined #openstack-infra15:02
fungimordred: cool, seeing what i can put together from our logs for a sample15:02
yolandaand we have different ones for bare, devstack, right?15:02
clarkbyolanda: we do not15:02
clarkbbut nodepool will probably need to solve that generally15:02
clarkbsince others may15:02
yolandawe have it downstream, less memory for bare15:03
yolandaso it should pick a different flavour15:03
fungialso as we start using nodepool for more varied tasks, we may want it for ourselves too15:03
fungiso we would go from having per-label demand to per-flavor demand, i guess15:04
mordredyes - all of those things are true15:04
clarkbit would be both, because per-label would determine what to boot into15:05
mordredhowever - at the moment - none of those things are real _current_ requirements15:05
fungior probably two tiers there since we would still possibly want to pre-boot and attach the workers before the jobs want to run things on them15:05
fungiyeah, that15:05
mordredthey are requirements we should do - and should take in to account when we design the thing15:05
*** reed has joined #openstack-infra15:05
fungihrm... the math on that model is going to get fun15:06
openstackgerritMerged openstack/requirements: Add ironic-lib to project.txt  https://review.openstack.org/16160315:06
fungibut i need to ponder other more immediate concerns right now, so will revisit later15:06
clarkbwe are essentially taking over nova's scheduling problem15:06
clarkbwhich as a user is not what I would like to be spending my time doing15:06
clarkbbhunter71: can you see my comment on 161994? curious to hear what you think about that15:07
*** ociuhandu has quit IRC15:08
sdaguehmmmm jeblair / clarkb - https://review.openstack.org/#/c/165851/ another one of those zuul merge incorrect errors15:09
openstackgerritMerged openstack-infra/devstack-gate: Add ironic-lib to devstack-vm-gate-wrap.sh  https://review.openstack.org/16160015:10
openstackgerritMerged openstack-infra/devstack-gate: Remove configurable testr artifact processing  https://review.openstack.org/16142215:11
*** ffrog has quit IRC15:11
*** mjturek1 has joined #openstack-infra15:11
*** armax has quit IRC15:11
bhunter71clarkb: thanks, I think that helps.   I wanted to change the again, anyway.15:11
bhunter71sorry,I wanted to change the format anyway.15:12
clarkbbhunter71: as long as it sorts well I think it's fine15:12
clarkbsdague: jeblair looked into that yesterday and found gerrit doesn't always show the dependency when you push a series and query it immediately15:13
*** yamahata has joined #openstack-infra15:13
*** dimsum__ has joined #openstack-infra15:14
openstackgerritMerged openstack-infra/devstack-gate: XenAPI: Highlight that eth4 does not exist outside the Citrix environment  https://review.openstack.org/16560715:14
openstackgerritMerged openstack/requirements: Bump tempest-lib min version  https://review.openstack.org/16604415:14
sdagueclarkb: ah, right15:14
*** dannywilson has joined #openstack-infra15:14
*** mestery has joined #openstack-infra15:15
*** sputnik13 has joined #openstack-infra15:15
clarkbmordred: looking at graphs the hpcloud error rate is almost 100%15:15
mordredclarkb: awesome15:15
clarkbmordred: so while we may not be making nova fall over, it isn't doing us any good15:15
*** radez is now known as radez_g0n315:15
*** nelsnelson has joined #openstack-infra15:16
*** dimsum__ has quit IRC15:16
mordredclarkb: well, time to dive back in to figuring out what's failing with node boots15:16
*** rkukura_ has joined #openstack-infra15:16
*** yfried is now known as yfried|afk15:17
*** sushilkm has joined #openstack-infra15:17
*** sushilkm has left #openstack-infra15:17
*** emagana has joined #openstack-infra15:17
*** nelsnelson has quit IRC15:17
*** rkukura has quit IRC15:17
*** rkukura_ is now known as rkukura15:17
*** yamahata has quit IRC15:17
*** sputnik13 has quit IRC15:18
*** nelsnelson has joined #openstack-infra15:18
*** ddieterly has quit IRC15:18
*** radez_g0n3 is now known as radez15:18
*** sdake has joined #openstack-infra15:19
*** ddieterly has joined #openstack-infra15:19
*** ajmiller has joined #openstack-infra15:19
jogojeblair: yeah, johnthetubaguy said there is some nova / xen side work to do15:20
clarkb oh cool I am not crazy15:20
jogojeblair: to make rebuild more efficient15:20
jogoI think it involved making sure they don't delete the image and redownload it during a rebuild15:20
*** sdake_ has quit IRC15:20
johnthetubaguyjogo: ah, yeah, its more the idea we could cache every image thats in use, not just base images15:21
*** sputnik13 has joined #openstack-infra15:21
*** sdake_ has joined #openstack-infra15:21
*** jogo is now known as flashgordon15:21
*** openstackgerrit has quit IRC15:21
johnthetubaguyjogo: xenapi doesn't use the image cache stuff, this work has just dropped on my plate actually, although there are some more urgent things before I will get to this, it should happen15:21
*** openstackgerrit has joined #openstack-infra15:22
clarkbjenkins02 appears to be spiralling into thread leak terribleness15:22
clarkbI am going to put it in shutdown mode so that I can get data for my upstream bug15:22
flashgordonjohnthetubaguy: :/15:22
*** sigmavirus24_awa is now known as sigmavirus2415:22
flashgordoneven without that would rebuild be faster or slower or the same as boot delete cycles15:22
johnthetubaguyflashgordon: it should be a little faster/more reliable due to the lack of IP stuff, and scheduling etc, but that's going to be saving 5-10 seconds I would guess15:23
*** sdake__ has joined #openstack-infra15:23
*** sdake has quit IRC15:23
*** hdd has joined #openstack-infra15:24
mordredyah. so it wouldn't kill rax - and it would be a huge win for hp15:24
johnthetubaguyflashgordon: with the above change, it should save 190-200 seconds15:24
BobBallSo there is no reason to favour delete/rebuild in RAX?15:24
johnthetubaguyBobBall: there is a reason to favour rebuild, it saves quite a few error conditions from being possible15:24
johnthetubaguyBobBall: its marginal though15:25
openstackgerritJoe Gordon proposed openstack-infra/project-config: Add new keystone tempest job to only run keystone tests  https://review.openstack.org/16431415:25
johnthetubaguyflashgordon: of course you can rebuild to a new image now too, so no needed to delete at the end of the day, technically15:25
*** sdake_ has quit IRC15:26
flashgordonnice! that is pretty neat15:27
flashgordonmordred hopefully that gave you the information you needed, any other rax/rebuild questions for johnthetubaguy ?15:28
mordredflashgordon: nope. that's awesome. thanks johnthetubaguy !15:28
*** sputnik13 has quit IRC15:28
johnthetubaguymordred: do let me know if anything crops up15:28
mordredjohnthetubaguy: also, we now know that you're the new pvo in terms of us bugging someone about rax nova questions15:28
*** sputnik13 has joined #openstack-infra15:28
mordredjohnthetubaguy: hope you're ok with that15:28
mordred:)15:28
*** sputnik13 has quit IRC15:28
johnthetubaguymordred: I noticed a few errors seem to happen just after you switch the images over15:28
johnthetubaguymordred: lol, sure15:29
*** ayoung has quit IRC15:30
clarkbok I have a thread dump for jenkins02, is that a safe thing to upload to jenkins' jira?15:31
clarkbI will put a copy of it in my homedir on jenkins0215:31
clarkbjohnthetubaguy: what sort of errors? because that may make it not useable for us15:31
fungijohnthetubaguy: actually one of the reasons it's attractive to us is that we often end up waiting up to an hour for rax to assign an ip address on a nova boot request at peak activity, and rebuild was seen as a potential workaround for that15:32
*** carl_baldwin has joined #openstack-infra15:32
johnthetubaguyclarkb: I mean the existing method is hitting some image not found issues, like you deleted the image during the build, but not totally sure15:32
clarkbjohnthetubaguy: I see, not really something that is a problem generally, but a few corner issues15:33
johnthetubaguyfungi: hmm, never seen that take an hour, but yes, it would totally side step that15:33
fungiso that 10-15 second performance gain from reusing network configuration is maybe more like 3600 seconds15:33
clarkbwe can likely live with that :)15:33
fungijohnthetubaguy: at times we've been told that the regions we're booting in simply had no available ip addresses and so nova was waiting on some to free up15:34
johnthetubaguyfungi: not according to the reporting on my side, not seen any of the builds take quite that long, would love to dig into that if it comes up again15:34
*** yfried|afk is now known as yfried15:34
johnthetubaguyfungi: oh yeah, we worked around that now, only takes 15 mins to go back in the pool15:34
fungijohnthetubaguy: aha, so old info then. that's awesome15:34
*** yfried has quit IRC15:34
johnthetubaguyfungi: I think the switch configs had to get updated, etc15:34
mordredthat woudl be one of the benefits on the other side too - floating ips stay with a node that is rebuilt15:34
*** asselin_ has quit IRC15:35
johnthetubaguymordred: yeah, so we don't support those yes, sigh, but rebuild will do the trick for now15:35
fungithough we do still recycle instances quickly enough that 15 minutes to return them to the pool is potentially a lot of waste from rackspace's perspective still15:35
johnthetubaguys/yes/yet/15:35
mordredjohnthetubaguy: we LOVE that you don't have floating ips15:35
mordredjohnthetubaguy: we HATE floating ips15:35
*** ociuhandu has joined #openstack-infra15:35
mordredjohnthetubaguy: I really hope that you don't stop supporting servers having real ips15:35
mordredjohnthetubaguy: because I would consider that a regression from teh thing you do now which is very awesome15:36
johnthetubaguymordred: they are going to be an additional, AFAIK15:36
mordredyay!15:36
* mordred hugs johnthetubaguy15:36
* johnthetubaguy sends hug to brad mcconnall15:36
fungi301 hug redirect15:37
*** dustins_ has quit IRC15:37
*** dimsum__ has joined #openstack-infra15:37
johnthetubaguy:)15:37
mordrednow, if only I could convince johnthetubaguy to start using dhcp my life would be complete ...15:37
*** Qiming__ has joined #openstack-infra15:37
johnthetubaguymordred: yeah, I am kinda requesting that, but its not on a roadmap I have seen right now15:37
*** jaypipes is now known as leakypipes15:37
*** thedodd has joined #openstack-infra15:37
clarkbmordred: fungi: I have skimmed the thread dump from jenkins02, the only things it seems to expose are the server's name, its ip address, some slave names, and some job names running on those slaves15:38
*** Qiming_ has quit IRC15:38
clarkbmordred: fungi do either of you want to double check it isn't leaking anything dangerous?15:38
johnthetubaguymordred: so config drive and cloud-init might do all that for you, not had chance to test it, so you can kill the agent in your image, if you want (needs an extra image prop xenapi_use_agent=False I think)15:38
fungiclarkb: where did you save it? in your homedir?15:38
mordredjohnthetubaguy: yes, that is correct15:38
clarkbfungi: yup on jenkins0215:38
mordredjohnthetubaguy: except you need patched config drive15:38
johnthetubaguymordred: patched config drive?15:38
mordredjohnthetubaguy: but that's fine - we have a workaround/know how to deal with it15:38
dprinceclarkb,mordred: so to disable the RH1 TripleO (temporarily) while we try some things do we need to push a patch?15:39
johnthetubaguymordred: OK, if its working thats cool15:39
mordredjohnthetubaguy: yes - upstream config drive does not yet support reading the passthrough network info15:39
mordredjohnthetubaguy: but yeah - it's a thing we have a plan for15:39
clarkbdprince: ya, you want to update the nodepool.yaml.erb file to set your regions max servers to 015:39
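The change clarkb describes is a one-line edit to the provider stanza in the nodepool config, roughly like this (the provider name is illustrative; real stanzas also carry credentials, images and other settings):

    providers:
      - name: tripleo-test-cloud-rh1   # illustrative name
        max-servers: 0                 # stop launching new nodes in this provider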
BobBallQuick config drive question... does rax support volume IDs for config drive?15:39
dprinceclarkb,mordred: I mean I know we can update the nodepool conf... or would it be okay if we just temporarily block the public API port?15:39
mordredjohnthetubaguy: but if you used dhcp - we wouldn't need to work around anything15:39
johnthetubaguymordred: ack15:39
clarkbdprince: I think temporarily blocking the API port is also fine15:39
*** ChuckC_ is now known as ChuckC15:39
dprinceclarkb: okay, we might do that just so we don't have to bother you as much :), thanks15:40
johnthetubaguyBobBall: volume IDs? we just do what the upstream code does, I would have to check15:40
mordredBobBall: yes15:40
johnthetubaguyah, there you go15:40
mordredBobBall: rax allows you to mount config-215:40
jeblairdprince: yeah, an icmp reject would be best i think (don't just drop it)15:41
*** weshay has quit IRC15:41
dprincejeblair: okay, will try that15:41
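A sketch of how that reject could look with iptables (the port is a placeholder for whatever the public API listens on): REJECT returns an immediate ICMP error, so nodepool's API calls fail fast instead of hanging until they time out, which is what a plain DROP would cause.

    # Reject inbound API traffic with an ICMP error rather than silently dropping it.
    iptables -I INPUT -p tcp --dport 8774 -j REJECT --reject-with icmp-port-unreachable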
BobBallI don't _think_ I mean disk-by-label - I mean specifying config_drive=<volume_id> when creating the server.  Personally never done it so don't know what the use case is for that though :)15:41
*** ominakov has quit IRC15:41
BobBalljhesketh mentioned it on https://review.openstack.org/#/c/155770/15:42
jeblairclarkb, mordred, yolanda: can you dump the information we just gathered about rebuild into a comment on https://review.openstack.org/#/c/164371/ ?15:42
jeblairclarkb, mordred, yolanda: we can put a paragraph about rebuild into the next iteration15:43
mordredjeblair: yes. I can do that15:43
*** ominakov has joined #openstack-infra15:43
*** otter768 has joined #openstack-infra15:44
*** weshay has joined #openstack-infra15:44
yolandaah, i need to read that spec15:45
jrolljohnthetubaguy: by 'patched configdrive' mordred means this thing that's deployed in our cloud: https://review.openstack.org/#/c/153097/15:45
jroll(in case that wasn't clear)15:46
*** claudiub has joined #openstack-infra15:46
* jroll wonders if we can get that merged within a year from the original patch15:46
mordredjeblair: I think I captured everything15:46
johnthetubaguyjroll: yeah, it should also have the regular network info in there too in the XenServer based VMs, I think, so regular cloud-init should have picked it up, in theory15:47
clarkbmordred: comment on flavor?15:47
jrolljohnthetubaguy: yeah, I don't think infra uses cloud-init though15:48
clarkbjroll: we do15:48
jrollorly15:48
jrolldo you use patched cloud-init or?15:48
clarkbjroll: at least as of yesterday, things sort of changed yesterday afternoon15:48
jrollheh15:48
clarkbmordred: speaking of, did all images get rebuilt this morning without cloud init?15:48
*** otter768 has quit IRC15:49
pabelangerjeblair, nice, spec.  I was thinking about multi-nodes this morning.  After looking into the current subnodes setup, I was having some troubles getting subnodes to use a different image then the parent.15:50
*** baoli has quit IRC15:50
jeblairpabelanger: yeah, it's about 1/4 of the spec-writing necessary for the zuulv3 work i outlined in an email a while ago.  i'm hoping to write up another chunk today.15:51
mordredclarkb: no - because we didn't ever move past hpcloud-b5 yesterday because it all went to hell15:51
clarkbmordred: except that nodepool rebuilds images every day15:51
mordredclarkb: good point - then if image builds were successful ...yes15:52
jeblairmordred, clarkb: i'll add the note about flavors15:52
*** ominakov has quit IRC15:52
*** harlowja_at_home has joined #openstack-infra15:52
fungimordred: here's an example alien node leak which hpcloud noc can maybe analyze from their side http://paste.openstack.org/show/193961/15:53
mordredjeblair: nod. thanks - I knew I was missing something15:53
clarkbmordred: looks like the snapshots are all still building15:53
clarkbmordred: and the dib images haven't been uploaded yet because some are still building15:54
clarkbmordred: so we haven't flipped that switch yet but it is in progress15:54
mordredfungi: did we actually submit a delete server task there?15:54
mordredclarkb: cool15:54
clarkbmordred: and maybe that will affect the hpcloud error rate if the metadata server is still hosed there15:54
fungimordred: that's a good question. nodepool says it deleted the node, but it didn't explicitly log the api call itself so... depends on how much we trust that nodepool is making those calls?15:55
fungimordred: i guess we could add all provider api responses to the debug log. not sure how much bloat that would add15:56
clarkbfungi: does that uuid ever show up in the nodepool log?15:56
fungiclarkb: nope15:56
clarkbfungi: I have a hunch that the 502 happens before assigning a uuid, nodepool says I don't need to delete this node in the cloud because it never exists (no uuid) and simply removes it from the db15:56
clarkbthen at some point in time hpcloud says "here have a node"15:57
fungiclarkb: great point. nodepool may be assuming that errors from a boot call are always going to be cleaned up on the provider side15:57
clarkbjroll: so we do still use cloud init, we are currently rebuilding our images to stop using it because ec2 metadata server isn't very reliable15:57
clarkbfungi: yes, and I am not sure it can assume much else15:58
fungiclarkb: also i'm not entirely sure how it would ever be able to be 100% certain that it needs to clean those up15:58
fungiyeah, agreed15:58
*** unicell1 has quit IRC15:58
jrollclarkb: right, thought you were moving away from it, though15:58
mordredjroll: yup15:59
jrollcool15:59
mordredjroll: two different efforts - one is related to getting dib images to work on rackspace to start with15:59
jesusaurusclarkb: fungi: yeah last night i had to clean out a bunch of nodes that nodepool listed as aliens15:59
*** garyh has joined #openstack-infra15:59
jrollmordred: right15:59
mordredjroll: this one is reacting to the fact that ec2 metadata in hp is horky - so we want to stop using it there sooner15:59
jrollheh16:00
*** harlowja_at_home has quit IRC16:00
openstackgerritDavid Lyle proposed openstack/requirements: Raise cap for Django to allow 1.7  https://review.openstack.org/15535316:00
mordredI should remove in hp - I believe it's horky anywhere it exists16:00
openstackgerritSean Dague proposed openstack/requirements: Bump sahara client version  https://review.openstack.org/15542816:00
fungijesusaurus: seems to me like a nova bug, if the behavior we're theorizing is actually responsible16:00
jrollmordred: considering rackspace doesn't have a metadata service, that should make life easier16:00
*** ominakov has joined #openstack-infra16:00
jeblairfungi, mordred: i'm _pretty_ sure looking at the log in http://paste.openstack.org/show/193961/ that we will have issued a create server api call, gotten a 502 response from that, therefore we never received the server id, but the server was actually created16:00
mordredjroll: you'd think16:00
clarkbjeblair: yup that is my following too16:01
clarkbs/following/reading/ english hard16:01
mordredjeblair: that seems likely16:01
jrollha16:01
mordredso - it's possible that a 502'd api call can still result in a booted node16:01
*** dizquierdo has quit IRC16:01
fungiso... if hpcloud can confirm from their side the circumstances which cause the boot call to return an error but allow the server to still be built, that needs to be filed as a bug against nova yeah?16:01
jeblairfungi, mordred, clarkb: is there some, like, nova metadata we can stick in a create call that will show up on an inventory later so we could link that server to a "failed" api call?16:01
*** Qiming_ has joined #openstack-infra16:01
mordredjeblair: yes16:01
mordredjeblair: we can put anything we want in the nova metadata16:02
fungii like the canary idea there16:02
jeblairmordred: and that's included in the create call so it's synchronous?16:02
mordredyup16:02
*** dimsum__ has quit IRC16:02
mordredjeblair: we can add more to my patch for that if you want16:02
clarkbfungi: yes I think that is a nova bug16:02
jeblaircool, so we should probably do that, but also, i do think api calls that return failures while succeeding are bad form :)16:02
mordredjeblair: https://review.openstack.org/#/c/126621/16:02
clarkbfungi: if api returns error node should not be booted16:02
mordredjeblair: indeed16:02
mordredjeblair: so, I added a nodepool dict to that metadata - would be simple to put more things in that16:03
jeblairmordred: cool16:03
fungijeblair: so are you thinking stick an identifier for the nodepoold in as metadata so that it can say "here's an instance in the list, metadata says it's one i built, but i don't have any record of it, delete now"?16:03
jeblairfungi: yeah16:04
clarkbfungi: jeblair I think we can do that with the node name fwiw16:04
jeblairfungi: i think it'd have to be since we delete the node record from the db16:04
clarkbthe name is essentially metadata that we already have16:04
fungiclarkb: not necessarily. consider multiple nodepoolds using a common tenant16:04
jeblairfungi: and you may be surprised at this -- i don't want to keep records of every node we've ever created.  I've seen where that goes  ;)16:04
clarkbfungi: oh hrm16:04
*** Qiming__ has quit IRC16:04
mordredI think it's cheap to add more things into the nova metadata16:04
fungijeblair: yeah, that's why i'm guessing we just have a reusable id of the nodepoold itself (maybe specified in its config)16:05
clarkbfungi: ya you are right, so each nodepool would need to add metadata that uniquely identified a booted node to a nodepool instance16:05
fungi"here stick this value in the metadata of ever instance you boot"16:05
*** masayukig_ has quit IRC16:05
fungier, every16:05
clarkbfungi: ya16:05
jeblairwfm16:05
*** sdake has joined #openstack-infra16:05
fungiand if nodepoold sees its own canary there, it knows it's one it built16:06
*** masayukig_ has joined #openstack-infra16:06
fungiin theory we could accomplish it without metadata by namespacing the instance hostnames, but that's ugliness16:06
mordredyeah - especially when we have a friendly metadata structure to use16:06
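A minimal sketch of the canary idea with python-novaclient; the metadata key, the identifier value and all credentials are made up for illustration. The launcher tags every instance it boots, and a later cleanup pass deletes anything that carries its tag but is unknown to its database.

    from novaclient import client

    nova = client.Client('2', 'user', 'password', 'tenant',
                         'https://example.com:5000/v2.0')  # placeholders
    MY_ID = 'nodepool.openstack.org'   # illustrative per-nodepoold identifier

    # Tag every boot so the instance can be traced back to this launcher,
    # even if the create call returns an error and no record lands in the db.
    nova.servers.create(
        name='node-12345',             # placeholder
        image='IMAGE_UUID', flavor='FLAVOR_ID',
        meta={'nodepool_id': MY_ID})

    # Alien cleanup: anything carrying our tag but missing from our database
    # was leaked by a "failed" create and can safely be deleted.
    known_ids = set()                  # would be loaded from nodepool's database
    for server in nova.servers.list():
        if (server.metadata.get('nodepool_id') == MY_ID
                and server.id not in known_ids):
            server.delete()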
clarkbflashgordon: ^ any idea if the above api returned error but nova booted a node is already a filed bug?16:06
*** armax has joined #openstack-infra16:07
clarkbflashgordon: and if not ideas on how hpcloud can confirm it is a nova issue?16:07
*** amotoki has quit IRC16:07
*** amotoki has joined #openstack-infra16:07
ttxjeblair: fyi I wrote a new app for design summit scheduling -- one that allows PTLs to directly edit bits of info on sched.org16:09
ttxCurrently at https://github.com/ttx/summitsched16:09
*** sdake__ has quit IRC16:09
*** garyh has quit IRC16:10
ttxsched.org is all-or-nothing, this will allow to delegate maintenance of parts of the schedule content to people16:10
*** ominakov has quit IRC16:10
*** ominakov has joined #openstack-infra16:11
ttxalso enforces all sorts of rules, like prefixing of session titles with track name16:11
jeblairttx: cool, you want to run it in infraland?16:11
jeblairttx: (also, yay prefixing titles!)16:11
*** Qiming_ has quit IRC16:11
*** ominakov has quit IRC16:12
ttxjeblair: it will require infraland resources -- whether I'll be able to fully puppetize it or just request an empty box with root shell is yet tbd16:12
ttx(depending on how much time I'll have on my hands)16:12
jeblairttx: we've already got a puppet model for 'install django and run the syncdb thing'16:13
jeblairttx: so it shouldn't be too hard16:13
jeblairttx: (graphite.o.o does that i think)16:13
ttxjeblair: yeah, just need to add the initial data load (track names and lead usernames)16:13
ttxAlso allows multiple people to help for the same track16:14
ttxrather than be PTL-reserved16:14
ttxCurrently considering the ability to tag a session with multiple types, so that it appears on multiple tracks16:14
*** ayoung has joined #openstack-infra16:15
ttxbut the sched data model is pretty weak16:15
*** esker has quit IRC16:15
ttx(its API is weak too)16:15
clarkbfungi: were you going to check that thread dump?16:16
fungiclarkb: ahh, yep, grepping through it now16:16
fungifunny, just noticed that the spelling check in my irc client believes "grepping" is an actual word16:17
*** baoli has joined #openstack-infra16:17
*** sigmavirus24 is now known as sigmavirus24_awa16:17
*** masayukig_ has quit IRC16:18
*** thingee has quit IRC16:18
fungiclarkb: nothing troublesome that i can find16:20
openstackgerritClark Boylan proposed openstack-infra/project-config: Disable -dibtest jobs  https://review.openstack.org/16630216:20
*** sigmavirus24_awa is now known as sigmavirus2416:20
openstackgerritClark Boylan proposed openstack-infra/system-config: Cleanup devstack-(trusty|precise)-dib images  https://review.openstack.org/15889116:21
*** masayukig_ has joined #openstack-infra16:21
clarkbmordred: ^ getting those two changes in should allow you to delete the devstack-precise-dib and devstack-trusty-dib images on nodepool.o.o freeing up ~16GB of disk16:21
yolandamordred, i see your change for rate=64 for hpcloud? is really that needed? that's too high, going to make nodepool sloooow16:21
*** achuprin has quit IRC16:21
yolandaand i'm worried for the part i'm affected16:21
fungihrm, the test nodes graph says we have no nodes in use now16:21
mordredyolanda: it's required for us16:21
yolandabut this means 1 api call per 64 secs?!16:21
mordrednope16:22
mordredwell, for us16:22
funginodepool list says 14 nodes in use16:22
mordredit's 1 every 12.816:22
mordredbecause we have 5 hpcloud regions16:22
mordredyolanda: I would not copy that setting if I were you16:22
mordredyolanda: if the noc is not yelling at you, your current setting is fine16:22
yolandano, of course :)16:22
fungii think something in the image updates may have tanked rackspace node builds?16:22
clarkbfungi: all the clouds are basically 100% error rate16:22
yolandabut i was reviewing that change and my alerts raised16:22
mordredfungi: oh, that's not great16:22
fungii'm going to grab a console for one now and see what's going on16:23
clarkbfungi: it's possible that's the remove-cloud-init change going into effect16:23
*** dimsum__ has joined #openstack-infra16:23
fungiclarkb: yeah, that was my suspicion as well16:23
clarkbso we may need to delete todays build in rax then revert those changes16:23
mordredsigh16:23
jeblairmordred, yolanda: it's also only required for the un-merged change that does rate limiting on start of requests instead of end16:23
clarkbfungi: thank you for checking the thread dump I will upload that to the bug now16:23
yolandaah, jeblair, i looked at that, i'm hoping this gets merged16:23
yolandanodepool is our daily pain16:24
clarkbyolanda: nodepool or the cloud?16:24
clarkbif there are bugs in nodepool we should fix them16:24
mordredclarkb: same thing for them - they only have one cloud16:24
yolanda75% cloud 25% nodepool?16:24
mordredclarkb: which is, I believe, why they're more eager for the rebuild stuff16:24
rcarrillocruz++16:25
rcarrillocruzrebuild would be a killer feature for us16:25
jeblairmordred, yolanda: i think this is no better than what we had before and actually i think it is a little worse.  i think we should not merge that change and go back to our previous config16:25
yolandaif you look at the charts, most of nodes are in building and delete status16:25
mordredjeblair: yah16:25
*** unicell has joined #openstack-infra16:25
yolandamordred, jeblair, have you ever thought about nodepool serving docker instances? for lots of simple tests that will make things much faster16:27
yolandaa pep8 test, an alphabetized one16:27
yolandathese are very very simple things16:28
yolandawhy a full vm for that?16:28
jeblairyolanda: yes, i have.  that's something that we could consider doing in zuulv3 as well. but again, it would be very complicated in the current system.  partially because of the nodepool allocation system, but also complex for us because docker is not secure.16:28
yolandai was running tests on lxc when i started and i've always had that on my mind16:28
mordredjeblair: ++16:28
yolandajeblair, but discriminating the kind of tests that could use it... it could be a real helper16:29
yolandayou cannot run a tempest test there, but can run pep8 or unit testing16:29
fungiclarkb: mordred: no errors on the console. they're booting up to a local login prompt, but not reachable on the ip address reported by nova list (no response even via ping)16:29
mordredfungi: these are the rax nodes?16:29
fungimordred: yep16:29
jeblairyolanda: you don't need to convince me, i understand.16:29
zaromorning16:30
fungimordred: looks like just since this morning's image updates there16:30
mordredfungi: well, I mean, that definitely points to delete images and revert16:30
yolandajeblair, and how could we achieve it? some spec for it, and tied to the new nodepool spec?16:30
fungimordred: agreed. doing so now16:30
mordredI do NOT understand why16:30
mordredbut figuring out why is a task for later16:30
*** sabari has quit IRC16:31
*** spzala has joined #openstack-infra16:31
jeblairyolanda: yes, it could be done either as part of, or after, the zuulv3 work.16:31
yolandathat will be a killer feature for the simple tests16:32
rcarrillocruzi think docker is cool, but besides security implications, i think docker should be kept at the nova provider layer, and not on nodepool16:32
rcarrillocruzwould be great to maybe have that new hp infra cloud with docker or something16:32
clarkbit's important to note that containers don't address the current problems because you still need somewhere to run the container. So the current issues need to be fixed first regardless16:32
yolandado providers allow it?16:32
rcarrillocruzand devoting nodes for pep816:32
rcarrillocruzetc16:32
jeblairclarkb: yep16:32
*** sabari has joined #openstack-infra16:32
jeblairokay, so i'd like to defer this conversation for later...16:32
yolandathe way i had it implemented, is that i had x static slaves, that were serving lxc containers16:33
jeblairand instead broach the subject that we have no workers right now.16:33
*** skolekonov has quit IRC16:33
mordredjeblair: yes.16:33
jeblairwell, i mean, we have 20.16:33
mordredjeblair: statistically, that's no workers16:33
*** dkranz has quit IRC16:34
jeblairwe have 295 building in rax16:34
clarkbfungi is deleting the new images we just built16:35
fungiall images built in rackspace in the last few hours are now deleted. hopefully we see some recovery there shortly16:35
jeblairokay cool16:35
clarkbthat should get us back to pre cloud init removal. Then we also need to revert those changes16:35
clarkbotherwise this will regress over the weekend16:35
jeblairi don't see anything happening on the hpcloud side16:35
jeblairas far as the noc asking us to load test or anything16:35
*** amotoki has quit IRC16:36
jeblairso i'd like to just go ahead and revert back to thursday morning's config16:36
fungithat works for me16:36
mordredk.16:36
jeblairmordred: if hpcloud improves something, we can check our logs for deletion times16:36
*** ayoung has quit IRC16:37
*** ociuhandu has quit IRC16:37
jeblairmordred: we see 30 second delete api calls enough that we should see an improvement in that time if they manage to make an improvement16:37
mordredjeblair: yah16:37
*** tjones1 has joined #openstack-infra16:37
jeblairrcarrillocruz, yolanda: ^ (if they are able to improve things, this would help you too)16:37
*** MrAboii has joined #openstack-infra16:38
*** dkranz has joined #openstack-infra16:38
yolandaso jeblair, what's the issue, api response went worse than normal?16:38
jeblairyolanda: i believe it has slowed gradually over time16:39
*** achuprin has joined #openstack-infra16:39
openstackgerritJames E. Blair proposed openstack-infra/system-config: Revert "Turn off HP Public Cloud"  https://review.openstack.org/16630816:40
jeblairmordred: can you ninja that ^16:40
*** EmilienM is now known as EmilienM|afk16:41
mordredjeblair: yup16:41
clarkbjeblair: revert sounds good to me as well16:41
openstackgerritMerged openstack-infra/system-config: Revert "Turn off HP Public Cloud"  https://review.openstack.org/16630816:42
mordredjeblair: don't forget, puppet is disabled on nodepool16:42
*** andreykurilin_ has joined #openstack-infra16:42
jeblairmordred: yep.  i plan on stopping nodepool, re-installing master, running puppet apply, and starting nodepool16:42
jeblairfungi: are you ready for me to do that ^ ?16:42
clarkbI have made progress on https://issues.jenkins-ci.org/browse/JENKINS-27514 just by reading through this to update the bug16:43
jeblair(er, puppet agent)16:43
fungijeblair: yes, go for it16:43
*** yamahata has joined #openstack-infra16:43
*** Ala has quit IRC16:43
openstackgerritMerged openstack-infra/project-config: Add new project faafo to Stackforge  https://review.openstack.org/16466816:44
*** tsg_ has quit IRC16:44
clarkbhttps://github.com/jenkinsci/ssh-slaves-plugin/blob/ssh-slaves-1.9/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L1213 hanging coupled with a synchronized method appears to be leaking all of the threads16:44
jeblair+if type dpkg-reconfigure >/dev/null 2>&1 && ! test -f /etc/ssh/ssh_host_rsa_key16:44
jeblair+then16:44
jeblair+    dpkg-reconfigure openssh-server16:44
jeblair+fi16:44
jeblairpuppet did that ^16:45
jeblairin case that impacts your thinking about the content of images that were built this morning16:45
fungithat was part of the "regen ssh host keys ourself" patch16:45
jeblairyeah, seems like it may not have been applied16:45
fungii don't think the rackspace images were getting that far16:45
jeblairok16:45
fungiseems like they weren't actually configuring their network interfaces16:45
fungiat least from the limited testing i was able to do16:46
*** dprince has quit IRC16:46
*** dprince has joined #openstack-infra16:46
fungithey were booted to login prompts for several minutes but i got no ping response from the ip addresses reported for them by nova16:46
openstackgerritMonty Taylor proposed openstack-infra/project-config: Revert "Regenerate ssh host key on boot"  https://review.openstack.org/16631016:47
openstackgerritMonty Taylor proposed openstack-infra/project-config: Revert "Remove ssh host keys during image build"  https://review.openstack.org/16631116:47
fungican also check what nodepoold saw from those. if it really was an ssh host key problem we'd get connection closed. if lack of networking then connection timeout16:47
fungii'll go hunting in the logs16:47
openstackgerritMonty Taylor proposed openstack-infra/system-config: Revert cloud-init removal  https://review.openstack.org/16631216:50
*** david-lyle_ has joined #openstack-infra16:50
mordredI'm going to ninja those reverts above, unless there is opposition16:50
mordredclarkb, fungi, jeblair, pleia2 ^^16:50
jeblairmordred: are you abandoning the effort?16:51
*** david-lyle_ has quit IRC16:51
*** harlowja_away is now known as harlowja_16:51
mordredjeblair: well, it didn't fix hp, and it broke rax - so I think regrouping and starting over is probably in order, yeah?16:52
*** Sukhdev has joined #openstack-infra16:52
pleia2makes sense16:52
jeblairmordred: (btw, if we are in a similar situation again, i believe we could mitigate the extremely slow hpcloud boot times by lowering max-servers for one of the providers)16:52
jeblairmordred: yeah -- will we get the same effect eventually with the dib work?16:52
mordredwell - we're doing much more active and methodical testing with the dib work16:53
mordredto understand what we need at boot time for realz16:53
*** sigmavirus24 is now known as sigmavirus24_awa16:53
jeblairmordred: so just roll "don't have cloud-init depend on metadata server but also make sure we get fresh host keys" into that?16:53
mordredthat removing cloud-init somehow broke rackspace which I thought didn't use it is mindboggling to me16:53
*** dustins has joined #openstack-infra16:54
fungilooks like what nodepoold logged was "Timeout waiting for server <UUID> in rax-xxx" so it never got as far as testing ssh16:54
mordredjeblair: yah- I mean, I'd like to get an answer sooner - but worst case yes16:54
mordredjeblair: because I want to understand what about that broke rackspace16:54
jeblairmordred: okay sounds good16:54
mordredsince it defies my understanding of how the rackspace nodes work16:54
mordredwhich isn't good :)16:54
jeblairif you look at the node graph now, you can basically see what it's like when we delete all our instances at once.  a steep decline from rax, and we've leveled out waiting on hpcloud16:55
*** dtantsur is now known as dtantsur|afk16:55
jeblairand now starting to build up in rax16:55
jeblairis anyone deleting hpcloud aliens?  if not, i'll start on that16:59
fungii had not started yet16:59
openstackgerritMerged openstack-infra/project-config: Make VPNaaS StrongSwan functional gate voting  https://review.openstack.org/16539216:59
fungimy best guess is that something we did broke rax instances' ability to configure their network interface/routing and if the hypervisor can't ping the interface nova never reports it as ready?17:00
*** psedlak has joined #openstack-infra17:00
rcarrillocruzjeblair: nod, we've been plagued by those slow delete api calls... last thing we heard, the Neutron guys were looking at it17:00
clarkbfungi: I am wondering if nova agent is tied to cloud init somehow17:01
jeblairfungi: do you have an easy way to reconcile two alien lists?17:01
clarkbjeblair: comm -1217:01
*** baoli has quit IRC17:01
clarkbI learned this from sdague, its a neat little trick17:01
*** baoli has joined #openstack-infra17:01
fungijeblair: you mean to diff them? i'll give you my script17:02
clarkbI just use `comm -12 file1 file2`17:02
jeblairclarkb: that's pretty cool17:02
*** baoli has quit IRC17:02
fungijeblair: world's worst bash one-liner http://paste.openstack.org/show/193973/17:02
*** wenlock has quit IRC17:02
*** baoli has joined #openstack-infra17:02
clarkbjeblair: it does get cranky about unsorted inputs so I sort the files first usually17:02
openstackgerritMerged openstack-infra/project-config: Remove check-tempest-dsvm-f20  https://review.openstack.org/16553217:03
*** Ryan_Lane has joined #openstack-infra17:03
fungiclarkb: oh, neat. that would get rid of my hacky nested loops17:03
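
As an aside, a minimal sketch of the comm approach for reconciling two server listings, assuming each listing has been dumped to a text file with one entry per line (filenames are illustrative):

    # comm needs sorted input
    sort provider-list.txt -o provider-list.txt
    sort nodepool-list.txt -o nodepool-list.txt
    # -12 suppresses columns 1 and 2, leaving only lines common to both files
    comm -12 provider-list.txt nodepool-list.txt
    # -23 leaves only lines unique to the first file, e.g. servers the provider
    # reports that nodepool does not know about
    comm -23 provider-list.txt nodepool-list.txt
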
fungialso i need to go get some lunch, but will be back shortly17:04
*** markus_z has quit IRC17:04
clarkbjohnthetubaguy: any ideas on why purging cloud-init from our images would break our ability to have working networking on rax nodes? is nova agent piggybacking off of something cloud init does?17:04
*** sarob has joined #openstack-infra17:04
SpamapSjeblair: Are you aware of anybody who has successfully used gear w/ eventlet?17:05
* SpamapS needs to get back to real work.. has fallen down a gearman hole lately17:05
openstackgerritMonty Taylor proposed openstack-infra/project-config: Disable metadata in cloud-init config  https://review.openstack.org/16631817:06
mordredjeblair: ok - rather than the full cloud-init removal - I would like to modify it ^^17:06
jeblairSpamapS: i think there may be an unmerged patch to that effect in gear's review queue17:06
mordredSpamapS: ^^ can you check me that that's not insane?17:06
mordredclarkb: ^^ I think that is more inline with what you were saying yesterday17:06
*** Somay has joined #openstack-infra17:06
SpamapSjeblair: in my experimenting with gear as an oslo.messaging driver..  it's not working.17:06
fungijeblair: sdague: oh, as for yesterday's heat functional discussion, https://review.openstack.org/166030 seems to have gotten the job back to an hour consistently17:07
jeblairfungi: yay17:07
SpamapSAnd I've spent way more time than I ever should have on this, so I think it's time to WIP it and circle back later. :-/17:07
clarkbmordred: ya I was also suggesting we use config drive but I don't think that's necessary for the short term17:07
jeblairSpamapS: 9753317:07
clarkbfungi: awesome17:07
*** ociuhandu has joined #openstack-infra17:07
anteayafungi jeblair do you think we can change the timeouts for the heat job then?17:08
mordredclarkb: yes - I don't think we need the config drive part17:08
jeblairSpamapS: but yeah, i support you in not rat-holing on it :)17:08
clarkbmordred: agreed17:08
openstackgerritMerged openstack-infra/project-config: Adds compute-hyperv in StackForge  https://review.openstack.org/16561117:08
mordredclarkb: but I think we can satisfy the intent with that patch17:08
SpamapSjeblair: ah yes just found that. Well that is exactly what I ran into.17:08
clarkbmordred: but if config drive were enabled it should also just work17:08
clarkbassuming new enough cloud-init17:08
mordredclarkb: yes17:08
SpamapSjeblair: oh you should have hid this from me. Now it will be calling to me from the bottom of the rat hole. ;)17:08
clarkbso its win win17:08
mordredit turns out rackspace IS using cloud-init for something in addition to nova-agent17:09
clarkbmordred: do you know what that is?17:09
mordredas evidenced by the existence of /etc/cloud/cloud.cfg.d/10_rackspace.cfg17:09
mordredso - I think they are using it for many of the things17:09
mordredjust not the things that nova-agent is doing17:09
openstackgerritMerged openstack-infra/os-loganalyze: fix supports_sev matching  https://review.openstack.org/16554217:09
mordred*boggles*17:09
*** tnovacik has joined #openstack-infra17:09
SpamapSmordred: sanity checked17:10
openstackgerritMerged openstack-infra/project-config: new-project: stackforge/python-senlinclient  https://review.openstack.org/16496317:10
clarkblooking at this java code I think that there is zero reason to synchronize that method. I wish java devs wouldn't default to doing that; it's a horrible practice. Instead we need to synchronize around the connection and session objects, which are not class level but object level17:10
*** garyh has joined #openstack-infra17:10
fungianteaya: probably if they're waay higher than an hour17:10
openstackgerritMerged openstack-infra/os-loganalyze: let tests be run from test file location  https://review.openstack.org/16579917:10
fungialso, this seems to have worked yesterday... http://lists.openstack.org/pipermail/foundation-board/2015-March/thread.html17:10
mordredfungi: woot!17:11
clarkbfungi: anteaya they are currently set to 2 hours, I think we should reduce to 90 minutes or so17:11
mordredclarkb, fungi: mind if I push through the reverts and the new attempt at cloud-init and kick another hpcloud image rebuild?17:11
openstackgerritMerged openstack-infra/os-loganalyze: extract static methods  https://review.openstack.org/16585017:11
*** e0ne has quit IRC17:11
clarkbmordred: if you are around today to babysit fine by me :)17:12
*** wenlock has joined #openstack-infra17:12
mordredk. I'm going to run to get the bag of coffee beans right now - if there are no objections when I get back, I will do that next17:13
clarkbmordred: you will need to free disk space again17:13
anteayaclarkb: just found the patch so will offer something around 90 minutes noting that 60 minutes would be the ideal17:13
mordredclarkb: that's so exciting17:13
*** psedlak is now known as psedlak^afk17:13
clarkbmordred: you can `sudo -H -u nodepool dib-image-delete $imageid`17:13
anteayafungi: k17:13
fungimordred: lgtm though i didn't vet the cloud-init config syntax i'm assuming SpamapS did17:14
clarkbmordred: where imageid is the id for the older of the devstack-precise-dib devstack-trusty-dib and devstack-centos7-dib images17:14
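
Roughly, the disk-freeing step clarkb describes would look like the following; the dib-image-list step is an assumption added here for illustration, and only the delete command is quoted from the conversation:

    # list the dib-built images (assumed helper) and note the id of the older build
    # of each of devstack-precise-dib, devstack-trusty-dib and devstack-centos7-dib
    sudo -H -u nodepool nodepool dib-image-list
    # then delete the older build by id, as the nodepool user
    sudo -H -u nodepool dib-image-delete $imageid
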
fungiokay, really going to lunch now. bbiaw17:14
*** Sukhdev has quit IRC17:15
mordredfungi: fwiw, I copied that content directly from the rackspace nodes17:15
clarkbmordred:  https://issues.jenkins-ci.org/browse/JENKINS-27514 and https://review.openstack.org/#/c/158891/ should allow us to properly remove those images until we need them for rax17:15
SpamapSfungi: it's yaml, it can't be wrong. ;)17:15
*** achanda has joined #openstack-infra17:15
*** notnownikki has quit IRC17:16
*** dboik_ has joined #openstack-infra17:16
*** psedlak^afk is now known as psedlak17:16
clarkbmordred: there is a comment in those files about dpkg reconfiguring17:16
clarkbmordred: is that going to be something you need to do or something that will override your changes if it happens?17:16
*** psedlak is now known as psedlak^afk17:16
mordredSpamapS: ^^ ?17:17
*** psedlak^afk is now known as psedlak17:17
mordredclarkb: I don't know - i've never used cloud-init successfully17:17
SpamapSugh17:17
*** AJaeger has joined #openstack-infra17:17
SpamapSI think you might have to put the answer in debconf, let me check17:18
clarkbI am pretty sure that this is the real reason people use docker17:18
mordredyup17:18
clarkbnot the packaging or the potential security17:18
mordredyup17:18
clarkbbut the "I just want this damn process to run" functionality17:18
mordredyup17:18
mordredbecause everything else has lost track of that being the use case people are trying to solve 95% of the time17:19
AJaegersdague, some of the requirements we have in openstack-manuals are unique and we could remove them. Note that trove also uses the docbook XML toolchain and thus needs openstack-doc-tools. So, what about the following:17:19
SpamapSholllyyy crap17:19
* SpamapS did not need to see cloud-init's postinst today17:19
SpamapSdon't look at it17:19
SpamapSface melting17:19
clarkbthis should be simple, but after diving into upstart sysv compat on ubuntu I no longer assume anything about how running processes should be simple at boot17:19
openstackgerritAnita Kuno proposed openstack-infra/project-config: Reduce timeout for heat functional job  https://review.openstack.org/16632017:19
AJaegersdague, allow "soft" projects in requirements' projects.txt where we do not require all requirements - and set that flag for the doc projects. And then remove their unique requirements?17:19
*** jistr has quit IRC17:19
*** dboik has quit IRC17:20
*** pblaho has quit IRC17:20
SpamapSok yeah17:20
*** psedlak has quit IRC17:20
SpamapSmordred: so clarkb is right in being concerned17:20
* anteaya is not a fan of swearing in channel17:20
SpamapSthe debconf value cloud-init/datasources will be injected there17:21
clarkbanteaya: sorry17:21
*** garyh has quit IRC17:21
clarkbSpamapS: and that will happen only if dpkg-reconfigure is called right? so if the package is updated?17:21
anteayaclarkb: np, thanks17:21
clarkbwe can probably get away with the change as is17:21
clarkbbut it may also lead to weirdness down the road if we aren't carefuk17:21
*** openstackgerrit has quit IRC17:21
clarkb*careful17:21
SpamapSclarkb: updates will cause it yes17:21
*** openstackgerrit has joined #openstack-infra17:22
openstackgerritSomay Jain proposed openstack-infra/jenkins-job-builder: Adding more configurable options in Notifications plugin  https://review.openstack.org/16313717:22
*** kgiusti has quit IRC17:22
SpamapSyou can make a file, 91_reallydatasources.cfg17:22
*** dmorita has joined #openstack-infra17:22
SpamapScloud-init reads them in order and will do a  __dict__.update() using the new one17:22
SpamapSso that might be the safest way17:23
SpamapSmordred: ^17:23
clarkbSpamapS: that sounds simple and reliable, I like it17:23
SpamapSor   echo "cloud-init cloud-init/datasources Configdrive,None" | debconf-set-selections17:24
SpamapSbut really, debconf, DIAF. :-P17:25
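
Taken together, SpamapS's two options amount to something like the following sketch (the ConfigDrive spelling and the multiselect question type are assumptions based on standard cloud-init packaging, not verified here):

    # option 1: a later-sorted cloud.cfg.d file wins over 90_dpkg on every read
    printf 'datasource_list: [ ConfigDrive, None ]\n' \
        > /etc/cloud/cloud.cfg.d/91_reallydatasources.cfg

    # option 2: pre-seed debconf so a later dpkg-reconfigure writes the same answer
    echo 'cloud-init cloud-init/datasources multiselect ConfigDrive, None' \
        | debconf-set-selections
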
*** Bsony_ has quit IRC17:25
*** mjturek1 has quit IRC17:26
*** gyee has joined #openstack-infra17:27
*** gampel has joined #openstack-infra17:29
cineramahey pleia217:30
openstackgerritAnita Kuno proposed openstack-infra/project-config: Reduce timeout for heat functional job  https://review.openstack.org/16632017:30
*** koolhead17 has joined #openstack-infra17:31
*** pc_m has quit IRC17:31
*** armax has quit IRC17:32
morganfainberglbragstad, https://bugs.launchpad.net/keystone/+bug/1433311 is not wishlist, this is higher prio17:33
openstackLaunchpad bug 1433311 in Keystone "Fernet tokens current don't support token bind" [Medium,Triaged]17:33
*** tsg has joined #openstack-infra17:33
morganfainbergwhoopse wrong channel17:33
*** pelix has joined #openstack-infra17:35
jeblairclarkb, fungi, mordred: hpcloud alien deletes are running17:36
*** sputnik13 has joined #openstack-infra17:37
*** koolhead17 has quit IRC17:37
clarkbjeblair: cool, do you want me to kick off a floating ip cleanup too? I can also start the leaked port deletion script17:37
*** pelix has quit IRC17:38
clarkbjeblair: or I can give you my one liner for FIPs if you want to run it17:38
anteayamorganfainberg: I was going to say17:38
*** ivar-lazzaro has joined #openstack-infra17:38
jeblairclarkb: why don't you kick it off?  but i'm guessing it won't have much to do17:38
clarkbok17:38
jeblairclarkb: i think most of these happened before we got to the fip state17:38
jeblairstage17:38
morganfainberganteaya, yeah i know :P17:38
*** arxcruz has quit IRC17:38
morganfainberganteaya, tooooooo many irc channels17:38
*** pelix has joined #openstack-infra17:39
anteayaI've never seen lbragstad say anything in this channel17:39
anteayaI'd make fun of his hat if he did17:39
clarkbjeblair: `venv/bin/neutron floatingip-list | grep -v '10\.0\.' | sed -e '1,3d' -e '$d' | cut -d'|' -f 2 | xargs -n 1 -P 1 venv/bin/neutron floatingip-delete` is the one liner fwiw17:39
anteayamorganfainberg: I've been waiting for the opportunity17:39
clarkband its done, only 4 to delete17:39
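
For readability, the same floating-ip one-liner laid out with the intent of each stage noted (behavior unchanged; the venv path and the 10.0. filter are as given above):

    venv/bin/neutron floatingip-list |    # list all floating IPs in the project
      grep -v '10\.0\.' |                 # skip rows that still show a 10.0.x fixed address (in use)
      sed -e '1,3d' -e '$d' |             # strip the ascii table header and footer lines
      cut -d'|' -f 2 |                    # keep only the id column
      xargs -n 1 -P 1 venv/bin/neutron floatingip-delete    # delete each one serially
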
*** ivar-lazzaro has quit IRC17:39
morganfainberganteaya, ++ yes!17:40
*** ivar-lazzaro has joined #openstack-infra17:40
*** aysyd has quit IRC17:42
anteayamorganfainberg: I _know_ he is wearing it17:42
morganfainberganteaya, i'm sure he is!17:42
*** otter768 has joined #openstack-infra17:45
openstackgerritMerged openstack/requirements: Update gabbi to 0.12.0  https://review.openstack.org/15625317:45
*** ghostpl_ has quit IRC17:45
*** aysyd has joined #openstack-infra17:46
*** dmorita has quit IRC17:47
*** fandi has joined #openstack-infra17:47
*** fandi has quit IRC17:47
*** sabeen1 has joined #openstack-infra17:47
*** dmorita has joined #openstack-infra17:48
*** ayoung has joined #openstack-infra17:49
*** otter768 has quit IRC17:50
*** VijayTripathi has joined #openstack-infra17:50
*** ghostpl_ has joined #openstack-infra17:50
openstackgerritMerged openstack-infra/project-config: Drop ironic tempest regex, stop running all of Tempest  https://review.openstack.org/16142017:52
*** coolsvap_ is now known as coolsvap|afk17:52
openstackgerritMonty Taylor proposed openstack-infra/project-config: Disable metadata in cloud-init config  https://review.openstack.org/16631817:53
mordredSpamapS, clarkb: ^^ there - that also tells debconf17:53
*** dmorita has quit IRC17:54
*** andreykurilin_ has quit IRC17:54
jeblairclarkb, fungi, mordred: with nodepool running on hpcloud, i can only do about two alien delete processes in parallel17:55
clarkbmordred: you prefer that over SpamapS' suggestion?17:55
openstackgerritMerged openstack-infra/project-config: Add a python-ironicclient src job  https://review.openstack.org/16363217:55
*** mpaolino has quit IRC17:55
jeblairclarkb, fungi, mordred: 2 processes push create time up to 45 seconds / request, 3 occasionally pushes it over 60 seconds, which is what we have the api timeout set to17:55
jeblair(more than 3 regularly pushes it over that limit)17:56
*** patrickeast has joined #openstack-infra17:56
jeblairso it's going to take a really long time to run17:56
*** mrmartin has joined #openstack-infra17:56
openstackgerritMerged openstack-infra/project-config: Turn on oslo.messaging coverage report  https://review.openstack.org/16402217:56
mordredclarkb: that was one of his suggestions17:56
mordredclarkb: that will ensure that if it gets re-run, the file will remain set17:57
mordredclarkb: the other thing seems really confusing to me17:57
*** dmorita has joined #openstack-infra17:57
mordredjeblair: yoy17:57
clarkbmordred: ok17:57
*** Bsony has joined #openstack-infra17:58
openstackgerritMerged openstack/requirements: Bump keystonemiddleware requirement  https://review.openstack.org/16457317:59
clarkbmordred: lgtm17:59
clarkbI am going to pop out now for early lunch. Back in a bit18:00
*** _nadya_ has quit IRC18:00
openstackgerritMerged openstack-infra/system-config: Revert cloud-init removal  https://review.openstack.org/16631218:01
openstackgerritMerged openstack/requirements: Bump requests-mock version  https://review.openstack.org/16249318:01
openstackgerritMerged openstack/requirements: Update pip and pip-missing-reqs  https://review.openstack.org/15929318:01
*** armax has joined #openstack-infra18:01
*** shardy has quit IRC18:01
mordredSpamapS: dare I even ask why the file is called 90_dpkg ?18:02
mordredSpamapS: I mean, it has nothing to do with configuring dpkg - it's a setting that configures data sources18:02
*** sdake_ has joined #openstack-infra18:03
mordredoh - sod it - I need to do a different patch on rh systems don't I?18:03
johnthetubaguymordred: I think they did something evil inside cloud init to stop it racing with the agent…18:03
mordredjohnthetubaguy: all to avoid running dhcp18:03
mordredjohnthetubaguy: the mind boggles18:03
mordredat the amount of effort that has been expended to chat that18:03
mordredchase18:04
mordrednot chat18:04
johnthetubaguymordred: basically, the agent needs to setup network, before cloud-init starts if I remember18:04
mordredyup18:04
mordredI've looked through the init-script hacks for that18:04
johnthetubaguyah, OK, thats the bit I knew about18:05
*** e0ne has joined #openstack-infra18:05
*** Swami has joined #openstack-infra18:05
mordredclarkb, SpamapS: new version of that patch coming - I didn't think to test for ubuntu first18:05
johnthetubaguyso I assume you don't have an image metadata tag telling nova not to talk to the agent on your image, but thats another thing that can stop that working18:05
mordredjohnthetubaguy: well, we deleted cloud-init earlier18:05
mordredbecause we bake keys into the images18:06
johnthetubaguymordred: a good way to test the agent's OK is to change password, after rebooting your VM, after removing cloud-init from it, if that works?18:06
mordredbut it turns out that breaks something else on rackspace18:06
*** Sukhdev has joined #openstack-infra18:06
mordredsomething related to networking18:06
*** sdake has quit IRC18:06
*** mrmartin has quit IRC18:06
mordredwhich surprised me - because I was expecting ... OH! I think I know18:06
johnthetubaguymordred: hmm, we certainly don't use cloud-init for anything critical, I can only think of the init hack for them being linked18:06
clarkbmordred maybe do spamaps thing as it should be distro agnostic?18:07
mordredclarkb: sigh. ok. I want to go on record as saying it makes me angry, fwiw18:07
johnthetubaguyI am curious, why do you need to remove cloud-init?18:08
mordredjohnthetubaguy: we don't need it18:09
johnthetubaguyhmm, OK18:09
mordredjohnthetubaguy: but we were going for the easy way to stop hammering the hp cloud metadata service18:09
mordredjohnthetubaguy: all of our nodes boot from images we build18:09
johnthetubaguyah, that makes more sense, gotcha18:09
mordredjohnthetubaguy: we're currently working on a project which is "make an image that can boot on both rackspace and hp that contains neither nova-agent nor cloud-init"18:10
johnthetubaguymordred: eek, gotcha18:10
johnthetubaguya worth aim18:10
mordredjohnthetubaguy: I'd give in and use cloud-init if we didn't have to patch cloud-init to get networking info on rax18:10
johnthetubaguys/worth/worthy/18:10
mordredbut we do - so it's also a pita18:10
*** derekh has quit IRC18:11
johnthetubaguymordred: I didn't think you should have to do that patch though, the regular info should have been there two, sounds like a bug18:11
johnthetubaguys/two/too/18:11
mordredit's not - the patch hasn't landed upstream18:11
mordredto pass neutron IP info through to config-drive18:11
mordredrax is VERY THANKFULLY deploying the same info currently into a vendor extension (thank you thank you)18:11
mordredbut until it lands upstream, the patch to consume from cloud-init isn't even proposed to cloud-init18:12
mordredand cloud-init upstream is currently doing a 2.0 rewrite anyway18:12
johnthetubaguymordred: yeah, I am thinking there was a way if you set flat_injected=True, I thought on XenServer we had a hack that did that injection into the old location, but I never got chance to test that in production yet18:12
johnthetubaguymordred: ah, interesting18:12
johnthetubaguyafraid I have to run off now18:13
johnthetubaguyit's getting dark in the UK, and I have an extra tuba rehearsal tonight this week18:13
openstackgerritMonty Taylor proposed openstack-infra/project-config: Disable metadata in cloud-init config  https://review.openstack.org/16631818:13
mordredjohnthetubaguy: have fun at rehearsal!18:13
anteayajohnthetubaguy: you are after all, the tuba guy18:13
mordredSpamapS: ^^ can you sanity check that for me please?18:13
johnthetubaguy:)18:13
*** pc_m has joined #openstack-infra18:14
*** gampel has quit IRC18:15
jeblairmordred, fungi, clarkb: my plan today is to tend to the slow alien delete process, but otherwise avoid any nodepool changes, and take it easy this afternoon and write up more zuulv3 specs so i'm not burned out for our maint tomorrow18:16
anteayajeblair: is there anything I can do to help? I'm holding off reviewing/approving stuff as I don't want to tax the few workers we have18:17
jeblairanteaya: i would not worry about that.  approve at will; it'll get through it eventually.18:18
clarkbjeblair sounds good, should mordred avoid the cloud init change then?18:18
anteayaokay18:18
*** tkelsey has joined #openstack-infra18:21
*** garyh has joined #openstack-infra18:22
fungijeblair: that sounds like a great plan18:23
mordredclarkb, jeblair: the cloud-init change shouldn't affect the other api stuff much18:23
*** kgiusti has joined #openstack-infra18:23
*** dboik_ has quit IRC18:23
*** dboik has joined #openstack-infra18:24
fungijeblair: as for alien deletes, i usually just do them entirely serially unless the quantity is enormous (like the ~500 we needed to delete yesterday)18:25
*** johnthetubaguy is now known as zz_johnthetubagu18:25
*** ghostpl_ has quit IRC18:25
*** ghostpl_ has joined #openstack-infra18:27
anteayawhat is z/tempest? it is in zuul/layout.yaml but I don't know what repo it corresponds to: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n282618:29
*** dboik has quit IRC18:30
anteayaI don't even know where else to look to find out18:31
*** dboik has joined #openstack-infra18:31
pleia2cinerama: so yeah, saw StevenK's patch and updated the topic on it this morning so it shows up in reviews with all the other zanata patches, thanks for reviewing, I'll have a look in a bit18:31
* anteaya nips out to get more sap, back in a minute18:32
*** garyh has quit IRC18:32
*** MarkAtwood has joined #openstack-infra18:33
cineramapleia2: kool18:33
*** MrAboii has quit IRC18:34
*** arxcruz has joined #openstack-infra18:35
*** crc32 has joined #openstack-infra18:35
fungianteaya: it's a dummy project used to set up a transitive co-gating relationship between actual projects18:35
anteayafungi: ah ha18:36
fungianteaya: it's basically abusing zuul's queue sharing algorithm to establish an equivalency between multiple jobs in case a project runs one of those but not the others18:37
anteayafungi: I found it because I am reviewing https://review.openstack.org/#/c/165648/118:38
*** edwarnicke has quit IRC18:38
anteayawhich reduces the prevalence of neutron-large-ops on projects18:38
*** sweston has quit IRC18:38
*** ujuc has quit IRC18:38
nibalizermordred: so for using 1 cert for everyone with puppet apply i think we have to set this18:39
nibalizerhttps://docs.puppetlabs.com/references/latest/configuration.html#nodename18:39
anteayashould the z/tempest neutron-large-ops job also be scaled back?18:39
nibalizerthen we can set certname to everyonecert.lol.openstack.org18:39
*** dougwig has quit IRC18:39
*** erw has quit IRC18:39
nibalizerthanks to Hunner for that one18:39
mordrednibalizer: nod18:39
Hunnerand just whitelist that cert at the puppetdb18:40
HunnerPuppetdb won't care about the cert that is auth'd, only the contents of the payload18:40
fungianteaya: probably not since we likely still want to make sure projects which do continue to run that job co-gate with others which don't18:40
anteayavery good, thanks18:41
*** Somay has quit IRC18:41
anteayaI understood about 20% of what you told me about that dummy project but hopefully I can get a better visualization of it at some point18:41
openstackgerritJeremy Stanley proposed openstack-infra/system-config: Move security.openstack.org to HTTPS  https://review.openstack.org/15509918:46
*** dmorita has quit IRC18:46
*** spzala has quit IRC18:46
*** VijayTripathi1 has joined #openstack-infra18:48
*** VijayTripathi has quit IRC18:51
*** pradk has joined #openstack-infra18:55
*** pradk has quit IRC18:55
fungiclarkb: you've flown through vancouver inbound to the usa before right? i'm looking at flying back directly and trying to figure out if i need to leave buffer time in my first layover for customs or if they really do us customs when boarding in vancouver and then treat the connections as usa domestic...18:55
*** prad has quit IRC18:56
*** e0ne has quit IRC18:57
*** mjturek1 has joined #openstack-infra18:58
*** e0ne has joined #openstack-infra18:58
*** prad has joined #openstack-infra18:59
*** mjturek1 has left #openstack-infra19:00
*** ssam2 has quit IRC19:00
clarkbthey do customs in canada for us departures19:01
clarkbso add time for that19:01
clarkbyvr was pretty quick about it though19:02
fungigood to know, and conversely i can scale back on my first layover since i won't need to claim and re-check my luggage19:02
*** emagana has quit IRC19:02
*** tkelsey has quit IRC19:03
sdagueoh, man - https://review.openstack.org/#/c/125944/ - fungi / clarkb either of you want to put the final +2 on that one? That would make for an awesome Friday19:03
*** emagana has joined #openstack-infra19:04
fungias for the working-group-on-a-train idea, amtrak apparently has three coach options for the portland->vancouver run ranging from us$48-114... does it matter which coach ticket i get?19:04
openstackgerritMerged openstack-infra/project-config: Disable metadata in cloud-init config  https://review.openstack.org/16631819:04
openstackgerritMerged openstack-infra/project-config: Revert "Remove ssh host keys during image build"  https://review.openstack.org/16631119:04
openstackgerritMerged openstack-infra/project-config: Revert "Regenerate ssh host key on boot"  https://review.openstack.org/16631019:04
*** tqtran is now known as tqtran_afk19:04
*** [HeOS] has joined #openstack-infra19:04
cdentaw, fungi, I'm jealous, the train ride from portland to vancouver is beautiful19:05
fungicdent: join us! do some openstacking on a train19:05
*** rlucio has joined #openstack-infra19:06
jrolloh, that would be cool19:06
cdenttoo late, got plane tickets already, in vancouver, out seattle, taking the train south afterwards19:06
*** achanda has quit IRC19:07
*** nilasae|afk has quit IRC19:07
*** armax has quit IRC19:08
*** tjones1 has quit IRC19:08
*** e0ne has quit IRC19:08
*** EmilienM|afk is now known as EmilienM19:10
*** e0ne has joined #openstack-infra19:10
*** sdake has joined #openstack-infra19:11
fungisdague: as awesome as it is, browsers are going to choke on it. see comment19:13
*** dimtruck is now known as zz_dimtruck19:14
AJaegersdague: did you see my question above?19:14
anteayaflashgordon: I'm currently reviewing: https://review.openstack.org/#/c/165652/1 where do I see the list of projects grenade does test?19:14
*** ghostpl_ has quit IRC19:14
sdaguefungi: gotcha19:15
cineramaneat. so looks like sjc to yvr via train takes a couple days but you have to stop overnight in seattle the way it's calculating it19:15
*** sdake_ has quit IRC19:15
clarkbfungi amtrak assigns seats as you check in19:15
sdagueAJaeger: only barely, I'm wrapping up one last thing then calling it a week19:15
clarkbusually its easy to at least take over the food/observation cars19:15
fungisdague: e.g. https://review.openstack.org won't be allowed to embed http://zuul.openstack.org json content for security reasons19:16
anteayaflashgordon: just if it is listed here with an upgrade-<project> file? http://git.openstack.org/cgit/openstack-dev/grenade/tree/19:16
sdaguefungi: yeh... hmmm... so I definitely had this working before19:16
fungiclarkb: open lounge areas. got it19:16
sdagueis there a new change there?19:16
cineramano assigned seating on capitol corridor when i've ridden but that may be different19:16
*** ghostpl_ has joined #openstack-infra19:17
fungisdague: were you doing it with overrides in the javascript console? that might make a difference19:17
AJaegersdague: ok, I'll followup via email - enjoy the weekend19:17
sdaguefungi: I was injecting it directly in the js console19:18
sdagueso that could be19:18
fungisdague: at least i know when it's come up in the past, having javascript in an https-served page call an http url to retrieve data has caused browser security warnings/errors19:18
sdagueyeh, I can believe that19:19
fungisdague: though if the javascript is being provided via the js debug console instead of the site, that may not be spotted19:19
*** e0ne has quit IRC19:19
sdagueyep, good call19:19
fungisdague: well, not so much a call, as this use case is precisely why i added the changes to get zuul also serving its status data via https19:20
*** spzala has joined #openstack-infra19:20
sdaguewell, I will be excited when it shows up.19:20
fungibecause it came up before in discussion as a prerequisite for the status embedding we wanted19:21
anteayaflashgordon: and grenade doesn't seem to be testing devstack-gate (which is good since I don't know what it would test) so perhaps we should remove it from there as well19:21
openstackgerritMerged openstack-infra/project-config: Split grenade out of integrated-gate template  https://review.openstack.org/16565119:21
*** achanda has joined #openstack-infra19:21
*** pelix has quit IRC19:22
anteayazaro: do you have all the code merged that needs to be merged for tomorrow?19:22
*** tjones1 has joined #openstack-infra19:25
*** tjones1 has left #openstack-infra19:25
clarkbanteaya should be, this is just moving to trusty19:26
mordredSpamapS: back away. back slowly away19:26
*** MarkAtwood has quit IRC19:26
mordredSpamapS: (oh, I was scrolled back ... that was a response to a long time ago)19:26
clarkband I think I reviewed and got all those trusty related changes merged. good to double check though19:27
anteayaclarkb: awesome19:27
*** MarkAtwood has joined #openstack-infra19:28
anteayaI recall the js minifier patch was about to look for that, that needs to be in for tomorrow, just wondering if something else got discovered that I missed19:28
anteayapatch, I was19:28
anteayathere it is, I have reviewed, it has yet to be approved: https://review.openstack.org/#/c/165145/19:29
*** hashar has joined #openstack-infra19:31
*** sarob has quit IRC19:31
clarkboh that's a new one. will review after eating blts19:31
*** MarkAtwood has quit IRC19:32
cineramaoh pleia2 when you get a chance, the spec mentions ansible playbooks - if you have the location i wouldn't mind taking a look to see if there's stuff we missed in the modules19:32
mordredclarkb: how many blts are you going to eat?19:32
*** garyh has joined #openstack-infra19:33
*** zz_dimtruck is now known as dimtruck19:33
*** hashar is now known as hasharConfcall19:34
fungiall teh blts19:36
*** tiswanso has joined #openstack-infra19:36
*** yamahata has quit IRC19:36
openstackgerritMerged openstack-infra/project-config: Update forge-upload job to use tags  https://review.openstack.org/16401619:37
*** yamahata has joined #openstack-infra19:37
fungiclarkb: when you have a moment between blts #4 and #5, another portland travel logistics question... is 2.5 hours from landing at pdx to amtrak departure from union station easy enough to accomplish via public transit?19:37
pleia2cinerama: I'll forward them to you (they weren't strictly open sourced, just grabbed and sanitized by Red Hat IT and shared with me)19:38
greghaynesmordred: Made a couple fixes to your https://review.openstack.org/#/c/165792/ in case you didnt see19:38
cineramapleia2: oh cool thanks19:38
pleia2cinerama: which address to send them to?19:38
cineramapleia2: either hp or personal is fine19:38
mordredgreghaynes: I am a fan of fixes!19:38
pleia2cinerama: I don't know your personal address, so just PM me what you prefer :)19:39
clarkbfungi ya should be, take red line downtown ~hour + catch bus/yellow/green to union station ~20 minutes19:39
clarkbyou can walk that last step too19:39
fungiclarkb: cool. the neighborhood around union station looked marginally familiar on the map but wasn't sure what the closest stop on the red line was19:40
greghaynesfungi: youre portlanding!?19:40
greghaynesoh, im guessing this is for summit19:40
fungigreghaynes: for to ride teh trainz for summit, yes19:41
*** andreykurilin_ has joined #openstack-infra19:41
fungigreghaynes: though i have a talk accepted at oscon so will be back ~ a month later too19:41
greghaynesawesome, yes as clarkb said the max red line is kind of a direct airport -> amtrak19:41
*** ZZelle_ has joined #openstack-infra19:42
*** sushilkm has joined #openstack-infra19:42
*** sushilkm has left #openstack-infra19:42
*** sushilkm has joined #openstack-infra19:42
*** sushilkm has left #openstack-infra19:42
fungii always travel with a hiking pack as my checked luggage, so easy for me to walk a few miles briskly with it if needed19:42
greghaynesNice, I actually just booked a trip to your area for july :)19:42
*** dimtruck is now known as zz_dimtruck19:43
fungiooh! you should get a paper in for all things open and come to nc in october (though it's the week before tokyo, so maybe you actually shouldn't unless you're insane)19:43
*** garyh has quit IRC19:44
*** ihrachyshka has quit IRC19:44
greghayneshaha, the wife would be thrilled! (not really)19:44
mordredfungi: I need to submit for ATO19:44
mordredfungi: except - really it's the week before tokyo?19:44
fungimordred: SUBMIT!19:44
* mordred sobs19:44
fungimordred: it's sunday through tuesday this time though, so there's a few days buffer at least19:45
clarkbthe best part of this time of year is cadbury eggs19:45
pleia2++19:45
fungijust in case you wanted higher-octane sugar inside your normal sugar19:46
*** otter768 has joined #openstack-infra19:46
clarkbfungi: yes19:46
clarkbI got a dozen :)19:46
clarkbok time to review that change for gerrit19:46
clarkbanteaya: any others you can find?19:47
anteayanot for tomorrow19:47
anteayahoping to hear from zaro19:47
anteayado states have cadbury easter creme eggs now?19:47
anteayaI had believed you didn't19:47
mordredanteaya: we've had cadbury eggs for my entire life19:48
anteayacool19:48
mordredanteaya: it's possible that there is an additional thing that we don't have19:48
anteayanot sure what I'm thinking of then19:48
fungione of the few cadbury products we get here in the states19:48
pleia2cadbury in general isn't very common here19:48
pleia2but we get the eggs :d19:48
mordredto us, it's the company that makes the eggs19:48
fungiunless you go to import shops19:48
mordredfungi: MURICA!19:48
anteayahttps://en.wikipedia.org/wiki/Cadbury_Creme_Egg19:49
*** baoli has quit IRC19:49
fungii quite like the cadbury currant bars19:49
anteayafungi pleia2 oh okay19:49
fungibut muricans also mostly don't know what currants are either19:49
anteayaI don't know the currant bars19:49
fungior call them "tiny raisins"19:49
anteayafungi: well there's that19:49
anteaya:)19:49
pleia2fungi: not chocolate chips19:49
clarkbI think currants are those weird things we ate in belgium19:49
anteayareally?19:49
anteayaI don't consider currants belgian19:50
*** baoli_ has joined #openstack-infra19:50
*** otter768 has quit IRC19:50
clarkbI think they just had them there19:50
clarkbbecause ya we don't really have them inthis country19:50
* krotscheck has a supplier of redcurrants in Seattle. ALL TO MYSELF.19:51
clarkbok js people, why would we bother to go through the trouble of minifying jquery on trusty for ~15kb19:51
*** rfolco has quit IRC19:51
clarkbkrotscheck: ^ see https://review.openstack.org/#/c/165145/6/modules/openstack_project/manifests/gerrit.pp19:51
mordredkrotscheck: also, if you didn't see the other day - ubuntu apparently ships jquery.min.js as a symlink to jquery.js19:52
*** emagana has quit IRC19:52
krotscheckclarkb: Ehn. It doesn't hurt?19:52
anteayaclarkb: something about ensuring the toggle ci button works19:52
anteayaclarkb: not invalidating your question though19:52
clarkbanteaya: ya, mostly trying to figure out if this is worth the trouble19:52
krotscheckTo be honest, serving javascript up as gzip is more effective than minification.19:53
fungiclarkb: i agree 15kb extra that your browser's going to cache anyway isn't necessarily worth the effort to puppet compressing it19:53
krotscheckSo I usually don't bother minifying.19:53
anteayaclarkb: always worth it to ask that question19:53
krotscheckAlso, minifying makes production debugging hard.19:53
*** nilasae has joined #openstack-infra19:53
krotscheck"Exception thrown in line 1" -> Line 1 is 16K characters of text.19:54
clarkbkrotscheck: ya, though at least in gerrits case its all minified otherwise and impossible to debug so thats less of a concern19:54
fungianteaya: also it's a cadbury chocolate bar with currants and almonds. tasty, tasty stuff19:54
anteayafungi: I don't think I've ever seen that19:54
*** nilasae has quit IRC19:54
anteayasounds very tasty indeed19:54
fungianteaya: i've only found it in the uk19:54
*** nilasae has joined #openstack-infra19:54
anteayaah19:54
clarkbthe other question I have is what will yui-compressor do if fed an already minified version of the file?19:54
anteayaI'll look for it next time I'm there19:55
mordredclarkb: dude, krotscheck has convinced me we should not bother19:55
anteayahaven't spent much time in the uk yet, mostly just passing through19:55
fungiclarkb: we shouldn't be re-feeding the already minified file into it?19:55
* fungi re-checks that change19:55
clarkbfungi: oh right yup19:55
fungiyeah, it19:55
fungigrrr19:55
zaroanteaya, clarkb : yo! this is needed for the trusty upgrade, https://review.openstack.org/#/c/165145/19:55
mordredclarkb: and, in fact, should maybe stop minifying anywhere just because it would let us delete more puppet19:55
clarkbok so we do need to address the broken button19:55
fungiit's minifying the normal version not the min.js file19:55
anteayazaro: yes the very patch we are talking about19:55
clarkbfungi: ya19:56
anteayazaro: glad you are here19:56
zarowas out to lunch and now back19:56
fungii'm on board with serving readable source code from our servers19:56
fungibecause we're open19:56
anteayazaro: so how much do you care if we minify the js19:56
*** ajmiller_ has joined #openstack-infra19:56
anteayazaro: because right now the group is leaning towards not bothering19:56
clarkbzaro: can we not go through the trouble of minifying that file and simply have puppet do a symlink to /usr/share/javascript/jquery/jquery.js?19:56
zaroanteaya: fungi & jeblair seems to think it's important19:57
anteayazaro: okay so if they come back in favour of not bothering that is okay with you?19:57
fungizaro: i only felt it was important to actually have a minified file if we're serving it as jquery.min.js, but if we can serve the full source and _call_ it jquery.js i'm cool with that19:58
mordredclarkb: I'm voting for "have puppet do a symlink"19:58
clarkbI think I am fine with the change as is at this point too19:58
zarouhhm, i think that's to get better performance.19:58
zaroi'm not sure how much better though.19:58
mordredzaro: krotscheck says it won't do that really19:58
clarkbbut for simplicity a symlink would probabl be best19:58
clarkbthe file size differences is about 15kb19:58
mordredfungi: yes - my issue with the debian package was that they called it .min.js19:58
fungimordred: mine too. i think that's worse than just not including the file19:59
krotscheckThe only real benefit is download speed, and that's heavily dependent on your browser's caching settings, the server's use of cache invalidation headers, and the server's use of mod_gzip19:59
fungijavascript minification is, to some extent, an obsessive compulsive disorder some people have about squeezing every last bit of whitespace out of files they serve even if their webserver is going to turn around and gzip-encode it anyway19:59
mordredkrotscheck: all of which are going to do a better job than minification19:59
greghaynesYea, really the use case for gaining speed via minification isnt something I belive youall have19:59
* mordred hands krotscheck an extra box of redcurrants19:59
clarkbmordred: where are we with https://review.openstack.org/#/c/166318/ ? have images building in hpcloud and rax yet?19:59
fungiit's not like we're minifying our html19:59
krotscheckFrom what I remember, the gzip algorithm actually works better on things with large regular words rather than things collapsed to single-character varnames.20:00
mordredclarkb: image just uploaded to hpcloud-b5, I kicked b4 just now20:00
greghaynesThe only time ive seen that download size make a big difference is when youre dealing with things like mobile where its more of a slowstart issue than download size issue20:00
clarkbkrotscheck: that sounds right, because it does prefixes (or suffixes, maybe both) so you need longer strings that overlap20:00
*** ajmiller has quit IRC20:00
*** ghostpl_ has quit IRC20:00
anteayazaro: so this patch was to ensure the toggle ci button works, yes? https://review.openstack.org/#/c/165145/20:01
zaroyes20:01
*** andreykurilin_ has quit IRC20:01
clarkbmordred: cool, I will keep an eye on nodes there, devstack-trusty?20:01
greghaynesLike, either way, you might not even be talking a full packet in size difference when you gzip both versions so there is effectively no difference ;)20:01
anteayazaro: okay great, can we get the toggle ci button working without having to minify the js?20:01
mordredclarkb: yah20:01
*** ociuhandu has quit IRC20:02
zaroyes, it works wihtout minifying js20:02
anteayazaro: how would you feel if we went that way?20:02
clarkbzaro: oh, so it works today without that change?20:02
mordredclarkb: b4 has  it20:02
zaroclarkb: no, it will be broken on trusty20:02
clarkbzaro: ok, so we do need a change, but it doesn't have to be that change20:03
zaroclarkb: it works today because it's on precise.  precise's libjs-jquery package provides the min.js file20:03
clarkbright rather than a symlink20:03
zaroso you propose just linking to the .js file, yes that will work as well20:04
zaroactually it has to be a copy not a link20:04
clarkbzaro: we can do that then, just switch to using the real file not the .min.js20:05
*** ChuckC has quit IRC20:05
zaroclarkb: ok, maybe fungi and jeblair should chime in on that since i thought they wanted the min.js20:05
fungizaro: i only felt it was important to actually have a minified file if we're serving it as jquery.min.js, but if we can serve the full source and _call_ it jquery.js i'm cool with that20:05
*** Sukhdev has quit IRC20:06
jeblairzaro: i'm okay with the non-minified file.  it is a regression since we are serving it now, but the argument that it won't actually be any worse makes sense.  we can try it, and if it is, we can go with what you have.20:07
zarono min.js is cool with me if there's no benefit20:07
clarkband maybe we can file a bug with debuntu about this20:07
mordredclarkb: it would distract them from fixing python20:07
greghaynesDo youall do gzipping of those files when you serve them?20:08
pleia2clarkb: videos aren't online yet, but the slides for the "life of a logstash event" talk are up and helpfully detailed https://speakerdeck.com/elastic/life-of-a-logstash-event20:08
anteayayay, so we looking forward to your new patch zaro, which we hope to review and merge in the next few hours20:08
*** edwarnicke has joined #openstack-infra20:08
clarkbpleia2: thank you, was it a good talk?20:08
pleia2clarkb: it was great20:08
zarocool, i'll fix up.  maybe try to add that bug as well.  LP right?20:08
fungii mean, i sort of know why they did that. they can't ship jquery.min.js for certain reasons, but some other packaged web applications may be hard-coded to serve a file called jquery.min.js, so someone thought this was the most pragmatic compromise20:08
pleia2clarkb: I'll let you know when the video shows up :)20:08
clarkbpleia2: awesome, I should bug you when logstash derps now :)20:08
*** hdd has quit IRC20:09
greghaynesmy browser says youall do gzip, so \O/20:09
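
The same check from the command line, for reference (the URL is illustrative; any asset served by the site works):

    # ask for gzip and inspect the response headers only
    curl -s -H 'Accept-Encoding: gzip' -o /dev/null -D - \
        https://review.openstack.org/static/hideci.js | grep -i '^content-encoding'
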
clarkbfungi: wait, I thought they can if there is a FOSS toolchain to generate the file20:09
pleia2clarkb: haha, I might actually be able to help!20:09
clarkbfungi: I don't see how that is any different than say shipping a compiled gcc20:09
greghaynesYes, I thought jquery is mit license20:09
mordredclarkb: ++20:10
fungiclarkb: yeah, though the reliability of that was potentially in question when the javascript-jquery package landed in debian in the timeframe in which trusty imported it before it froze for release20:10
*** Bsony has quit IRC20:10
* fungi looks to see if it's still that way in testing/unstable20:10
*** Bsony has joined #openstack-infra20:11
jeblairgreghaynes: thanks for checking! :)20:11
jeblair(confirming gzip)20:12
clarkbpleia2: it is interesting that they still use that scaling architecture, I threw it out after about a day because it doesn't scale :)20:12
*** tiswanso has quit IRC20:12
clarkbpleia2: we run N indexers instead of funneling it down to 1 indexer20:12
* jeblair gets back to writing words20:12
fungiclarkb: mordred: jeblair: yeah, the libjs-jquery 1.7.2+dfsg-3.2 from jessie and sid has a separate min.js file not a symlink20:13
fungiand it's definitely smaller by roughly the right amount20:13
*** dougwig has joined #openstack-infra20:13
*** timcline has quit IRC20:14
clarkbcool20:14
*** ajmiller_ is now known as ajmiller20:14
*** erw has joined #openstack-infra20:14
fungiso looks like it was restored to sanity. it was a symlink in 1.7.2+debian-2.1 because the minification relied on uglify which was not at that time destined to make it into the wheezy release20:15
greghaynesclarkb: batch processing does a ton for scalng ;)20:15
fungi(circa november 2012)20:15
*** mfink_ has joined #openstack-infra20:15
clarkbmordred: devstack-trusty-1426881119.template.openstack.org is that the image I should be looking for?20:15
clarkbgreghaynes: you can do it without batch processing either20:16
clarkbgreghaynes: every shipper could just be an indexer too20:16
mordredclarkb: yes20:16
*** sweston has joined #openstack-infra20:16
mordredand it's uploaded to 2-5 now - and 1 is in progress20:16
greghaynesclarkb: Yes, I imagine under the hood thats what youre gaining by scaling via replication though20:16
clarkbgreghaynes: replication only affects query scaling not indexing20:16
clarkbor maybe you don't mean es replication20:17
greghaynesoh, I did but I guess it works differently than I thought. Maybe thats a good performance improvement if we run into write scaling issues20:17
greghaynesto somehow batch writes20:18
pleia2clarkb: the ELK family is interesting and *young* so even in 2 years since you first set up our system it's changed a ton, better scaling support across the board has been one of the big things20:18
pleia2clarkb: during one of the talks, they said they thought "logstash dropping things on the floor" was mostly an unusual bug/hardware failure/something but they came to realize it's a real thing once they started doing bigger testing20:19
clarkbpleia2: ya supposedly the elasticsearch 1.0 release performs much better but they keep having CVEs for their groovy script support which is a bit :(20:19
greghaynesbut sounds like youre just effectively making readonly replicas?20:19
pleia2clarkb: ah, that is unfortunate20:19
clarkbgreghaynes: no we basically have N logstash indexers that each process a file at a time20:19
clarkbgreghaynes: they talk to local ES clients that are part of the cluster which then index the data on the data nodes20:20
clarkbgreghaynes: but aiui you only ever write to the primary shard for indexing, so replicas don't help indexing performance20:20
*** mfink_ has quit IRC20:20
clarkbgreghaynes: they do however help reads because you can read from any shard that has the data on any node when doing queries20:20
clarkband if you lose a node with primary shards replicas will become primary shards so you also get ha from them20:21
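
For context, shard and replica counts are per-index settings in elasticsearch; a hedged example of creating a daily logstash index with several primary shards (indexing spread across data nodes) and one replica (query capacity plus failover), where the index name and counts are purely illustrative:

    curl -s -XPUT 'http://localhost:9200/logstash-2015.03.20' -d '{
      "settings": { "number_of_shards": 5, "number_of_replicas": 1 }
    }'
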
*** claudiub has quit IRC20:21
*** erlon is now known as erlon_away20:21
mordredclarkb: ok - all of hpcloud has the new devstack-trusty image20:22
clarkbmordred: ok20:22
clarkbmordred: have you also updated in rax to make sure we don't break rax tomorrow morning?20:22
*** melwitt has joined #openstack-infra20:22
openstackgerritKhai Do proposed openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514520:22
mordredclarkb: no - I'll do that next20:22
zaroclarkb, anteaya, fungi ^20:23
mordredclarkb: I'm bulding bare-trusty right now20:23
clarkbmordred: ok, probably just start with one region there20:23
mordredclarkb: yah20:23
mordredclarkb: although I _expect_ it to be a no-op since I got the file from rax - but still20:23
greghaynesclarkb: so, you obviously have to do some kind of write to the non-primary shards otherwise they don't have the data ;) I think you are effectively doing batch write replication though. Locally you're not, but when you replicate you will, which is where you tend to hit scaling issues20:23
mordredclarkb: oh - duh. bare-trusty is not dib20:24
*** melwitt_ has joined #openstack-infra20:24
mordredclarkb: I got an overquota error- apparently we're sitting at 600 nodes on hp20:24
*** dkliban is now known as dkliban_afk20:24
clarkbzaro: see comment20:24
*** thingee has joined #openstack-infra20:24
clarkbgreghaynes: oh I think maybe we have confused each other. The scaling issues are in logstash not es20:24
greghaynesclarkb: ty for the explanation though, kinda want to figure out more about tha tsetup20:24
mordredgreghaynes: we would love for someone other than clarkb to actually understand it :)20:25
clarkbgreghaynes: so the problem is scaling up cputime for logstash indexer process which means running one of those is bad for scaling20:25
clarkbgreghaynes: es is actually pretty good at scaling up, every time I have added nodes it has helped20:25
greghaynesclarkb: ah! I should stop assuming all problems are database problems20:25
clarkbgreghaynes: but basically ruby that runs lots of regexes in the jvm is slow :)20:25
*** peristeri has quit IRC20:25
mordredclarkb: it's a fair assumption20:25
openstackgerritKhai Do proposed openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514520:25
mordredgha20:25
*** dprince has quit IRC20:26
nibalizerjeblair: we confirmed that puppet-blacksmith can create the puppetforge module when it submits the first time20:26
mordrednibalizer: neat20:26
nibalizeror i guess more accurately that the forge api is smart enough20:26
zaroclarkb: argg! should be good now20:26
clarkbzaro: one more thing20:26
clarkbzaro: sorry should've caught that on the previous patchset20:26
*** melwitt has quit IRC20:27
*** armax has joined #openstack-infra20:27
zarodon't we want to copy only on package updates?20:28
clarkbmordred: for my sanity, the thing that we think caused hpcloud troubles was lots of deletes piling up after jeblair's change because they were no longer serialized? and we had lots of deletes because metadata was broken. To address this we reverted jeblair's change and are avoiding the metadata service20:28
clarkbzaro: the notify is independent. you need to tell gerrit to rebuild its stuff once you update the file20:28
clarkbzaro: so ya you need both things, the package subscription and the notify to make gerrit rebuild things20:28
mordredclarkb: yes20:29
clarkbmordred: any idea on whether or not metadata service is being fixed in hpcloud?20:29
mordredclarkb: well, we had a non-zero number of build failures that indicated some issue with metadata service in their logs20:29
mordredclarkb: I do not expect that it is - I think it is merely a fundamentally broken part of openstack20:29
clarkbwhere non-zero is the vast majority20:29
clarkbmordred: it may be, but we have only been experiencing trouble in hpcloud for about 2-3 weeks now20:30
clarkbmordred: basically I am trying to work backwards and see if we can attribute all of this to the same problem20:30
mordredclarkb: I think in general the mysql there is unhappy20:30
clarkboh I see and metadata relies on that to get its info20:30
mordredAIUI20:30
clarkbgotcha so ya it may actually all be related20:30
zaroclarkb: doesn't the GerritSiteHeader.html get reloaded on every browser page reload?20:31
clarkbbecause I think we leak resources when these random failures happen that report failure to nodepool without any resource uuids20:31
clarkbzaro: no gerrit only notices that file has changed if you touch it20:31
clarkbwhich is what the exec does20:31
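
In shell terms, the copy-plus-notify that puppet performs boils down to roughly this (the paths are assumptions following the usual Gerrit site layout):

    # copy the packaged, unminified jquery into Gerrit's static area
    cp /usr/share/javascript/jquery/jquery.js ~gerrit2/review_site/static/jquery.js
    # bump the site header's mtime so Gerrit notices the change and rebuilds
    touch ~gerrit2/review_site/etc/GerritSiteHeader.html
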
mordredclarkb: yes - although it seems that we may want to more systemically account for the 502 followed by resource pattern20:32
clarkbmordred: ya I think the metadata idea from fungi is a good one20:32
mordredclarkb: like - it's quite possible that we will ALWAYS have registered an actual request when that happens20:32
clarkbmordred: I am just trying to assert that This si what made hpcloud broken for us over the last two weeks20:33
mordredyah20:33
clarkbbecause if it isn't then we also have other things to debug20:33
clarkbflashgordon: ^ around? we would like to talk about that and how this may be a nova bug20:33
*** tqtran_afk is now known as tqtran20:33
mordredclarkb: I kinda think that we should trap for 500 errors, and if we get them, assume that the request succeeded and that we need to poll nova for the uuid based on the hostname we requested and try to resume once we have one20:34
greghaynesIt would be helpful to know how they determine request throttling...20:34
*** andreykurilin_ has joined #openstack-infra20:34
fungiclarkb: mordred: i assert the metadata idea was jeblair's20:34
openstackgerritKhai Do proposed openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514520:34
mordredclarkb: because it's a frequent enough thing - it may fit the profile of "here is another way we've discovered clouds fail that we work around"20:34
*** zz_dimtruck is now known as dimtruck20:34
greghaynesseems like the case we're optimizing for is: if we make a request and it fails, can we immediately make a second one - not can we async a bunch of requests out20:34
mordredas in - I think we should just put more logic in our retry-timeout code there - and not delete the incomplete database record until we've reached the timeout20:35
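(A minimal sketch of the retry idea being discussed, assuming a novaclient-style client object; the helper name, timeout values, and polling loop are illustrative, not the actual 166383 patch.)

    import time
    from novaclient import exceptions

    def create_server_tolerating_5xx(client, name, image, flavor,
                                     timeout=300, poll=5):
        # Ask nova to boot the server; a 5xx here does not necessarily
        # mean the request was dropped -- the instance may still appear.
        try:
            return client.servers.create(name, image, flavor)
        except exceptions.ClientException:
            # Instead of deleting the (possibly half-created) record right
            # away, poll for a server with the name we requested and only
            # give up once the timeout is reached.
            deadline = time.time() + timeout
            while time.time() < deadline:
                matches = [s for s in client.servers.list() if s.name == name]
                if matches:
                    return matches[0]
                time.sleep(poll)
            raise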
clarkbgreghaynes: I think part of the problem here is that throttling is $request/time when $request may be one of many requests that all have different costs20:35
zaroclarkb: ok, done. thanks20:35
greghaynesclarkb: oh joy20:35
mordredyup20:35
mordreddelete, for instance, is very expensive20:35
clarkbzaro: lgtm thanks20:36
*** emagana has joined #openstack-infra20:37
zaroclarkb, fungi : noticed that jquery.min.js is used for other servers as well so probably should keep an eye out for those when moving to trusty.20:37
anteayafungi mordred pleia2 jeblair https://review.openstack.org/#/c/165145/ is up for review, be best if we had it in for tomorrow20:37
clarkbmordred: 687d0191-2605-4a8b-a1e5-cd773366c9b5 is up console log looks mostly ok to me20:38
mordredclarkb: woot20:38
clarkbmordred: I think nodepool is waiting to get it a floating ip now20:38
clarkbmordred: but hopefully it goes used/ready soon20:38
*** eharney has quit IRC20:38
clarkbzaro: good point, we use it for zuul status and other tools20:39
*** dustins has quit IRC20:39
pleia2anteaya: that's the patch that clarkb and zaro are talking about how :)20:39
pleia2s/how/now20:39
openstackgerritDoug Wiegley proposed openstack-infra/project-config: For neutron and neutron-lbaas, skip more wasted jobs  https://review.openstack.org/16603520:39
anteayawell clark is +2 on the patch, so my read is that he is happy as is20:40
*** tsg_ has joined #openstack-infra20:40
pleia2yeah, I'm going to hold off until checks pass20:40
anteayafair enough20:40
*** bswartz has quit IRC20:40
pleia2but it's still on my radar20:40
anteayagreat, as long as we have it merged for tomorrow20:40
anteayadidn't want it to get lost20:40
clarkbmordred: so, thinking about rebuilds, and I know jeblair wants it to be zuul v3, but if we changed from queues for provider managers to heaps where delete had a higher cost/lower priority, we could then update those deletes to be rebuilds, which have a lower cost, re-sort, and bam, now we have a new node20:41
clarkbmordred: I think that may be a relatively easy change assuming the heap implementation doesn't make us cry20:41
*** tsg has quit IRC20:41
*** AJaeger has quit IRC20:41
jeblairclarkb: do you know why i want to work on that in zuulv3?20:41
clarkbjeblair: no20:42
greghaynesclarkb: heapq?20:42
clarkbgreghaynes: ya but needs to be thread safe20:42
clarkbgreghaynes: so likely needs a wrapper of some sort, doable I just haven't thought about implementation much20:42
jeblairokay, so i've tried to explain this already, but i guess i haven't been successful.20:42
greghaynesclarkb: what about starvation?20:42
jeblairi will try again20:42
jeblairrebuilding is a big change in logic for nodepool20:43
jeblaircurrently the whole thing is built around delete/create20:43
jeblairspecifically the allocator assumes that behavior20:43
jeblairthe allocator is _incredibly_ complex at this point20:43
clarkbjeblair: yes, so thats why I am thinking about how to make it without making it a big change and I think using a heap above can make it not a big change20:43
jeblairno individual actually understands how it works20:43
jeblairchanging from the delete/create cycle to rebuild means altering the allocator20:44
*** dimtruck is now known as zz_dimtruck20:44
clarkbjeblair: I don't think it has to20:44
clarkbwhen you call create it would replace a pending delete with a rebuild if there are any deletes queued, otherwise place a create on the heap20:44
jeblairclarkb: okay, how can we avoid that then?20:44
*** garyh has joined #openstack-infra20:45
clarkbwhen you call delete it just adds a delete like normally happens20:45
*** sdake_ has joined #openstack-infra20:45
clarkbbut the api for the allocator remains the same, the provider manager just returns a node back into the scheduler that may or may not have been rebuilt20:45
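(A toy sketch of the allocator-preserving idea clarkb is describing: the provider manager turns a queued delete into a rebuild when a create request arrives. Class and method names are invented; this is not nodepool code.)

    import threading

    class ProviderTaskList(object):
        """Pending deletes can be converted into rebuilds when a create
        request arrives, so the allocator's API never has to change."""

        def __init__(self):
            self._lock = threading.Lock()
            self._pending_deletes = []   # node ids waiting to be deleted

        def request_delete(self, node_id):
            with self._lock:
                self._pending_deletes.append(node_id)

        def request_create(self, label):
            with self._lock:
                if self._pending_deletes:
                    # Reuse a node that was about to be deleted: rebuild it
                    # to the requested label instead of delete + create.
                    node_id = self._pending_deletes.pop(0)
                    return ('rebuild', node_id, label)
                return ('create', None, label)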
clarkbgreghaynes: starvation is something to worry about, BUT we are already so starved doing deletes I don't think it will be worse20:46
*** timcline has joined #openstack-infra20:46
*** hyakuhei has joined #openstack-infra20:46
greghayneshrm, I mean, I could whip out a simple PI loop for that ;)20:46
greghaynesbecause thats effectively what you need here too20:47
greghaynesbut seems like different problem first20:47
clarkbI am fairly positive that the naive implementation would work just fine at least compared to the current situation20:47
greghaynesyea, with this kind of stuff its almost always a 'just test it'20:48
*** sdake has quit IRC20:49
*** enikanorov has quit IRC20:50
*** radez is now known as radez_g0n320:50
pleia2zaro: precise apply failed on https://review.openstack.org/#/c/165145/ note inline about it20:50
jeblairclarkb: i could see how that would work with hpcloud in its current situation.  i do not believe we would end up doing very many, if any, rebuilds on rax because deletes happen so quickly there.20:50
clarkbjeblair: ya, it likely would not work well in a situation where delete isn't very high cost20:50
jeblairclarkb: and you would not need to change the allocator, unless you wanted to fix the rax problem20:51
jeblairclarkb: though you would need to change quite a bit of the rest of nodepool -- the delete and create threads20:51
*** enikanorov has joined #openstack-infra20:51
*** adalbas has quit IRC20:52
openstackgerritDoug Hellmann proposed openstack/requirements: Fix oslo caps for kilo  https://review.openstack.org/16637720:53
jeblairso here's why i want to do this in zuulv3 -- nodepool is really complicated, and hard to maintain and hard to test changes.  the allocation system in v3 will be so much simpler -- node requests are just a fifo, and the allocator just needs to find spare capacity as it comes up.20:53
zaropleia2: uggh, the puppet file doesn't have refreshonly. it was used for exec from previous patch.  will fix up20:53
jeblairit will be really simple to build systems like this on top of it20:53
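(A toy illustration of the v3 model jeblair is describing -- node requests are a plain FIFO and spare capacity satisfies the oldest request. Nothing here is actual zuul code, and provider.launch() is an assumed interface.)

    from collections import deque

    class FifoAllocator(object):
        def __init__(self):
            self.requests = deque()    # node requests, oldest first

        def request_node(self, label):
            self.requests.append(label)

        def capacity_available(self, provider):
            # Called whenever a provider reports spare capacity: satisfy
            # the oldest outstanding request, nothing cleverer than that.
            if self.requests:
                provider.launch(self.requests.popleft())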
greghaynesjeblair: is v3 just the one spec at this point?20:54
pleia2zaro: thanks :)20:54
fungii could sort of see it: if nodepool sees we're running at capacity and is deferring creation, then waiting demand could get turned into rebuild calls on nodes which complete jobs instead of making delete calls for them, but that would then only kick in when you're out of capacity everywhere20:54
*** garyh has quit IRC20:54
jeblairwhereas changing fundamental things about nodepool is really hard right now.  i'd rather try to avoid spending a lot of time working on an implementation in the current complex system which i want to get rid of.20:55
fungiwhich wouldn't necessarily be much of an improvement over the current situation20:55
fungiyep, makes complete sense20:55
jeblairgreghaynes: yes.  i was in the middle of writing the next part20:56
mordredI think I have a patch almost finished to deal with the current weird hp failure, btw20:56
clarkbya I think the ultimate goal of zuul v3 is good. I just don't know how long we can limp along at <200 useable nodes at any time20:56
jeblairgreghaynes: (it's also an email, which is a good chunk of it)20:56
greghaynesah, ok. I should find that20:56
jeblairclarkb: it's more like 300, but yeah20:57
*** nilasae is now known as nilasae|zzz20:57
openstackgerritKhai Do proposed openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514520:57
zaropleia2: ^20:57
*** rkukura has quit IRC20:58
clarkbzaro: pleia2 oh refresh only must be an exec thing20:58
fungigreghaynes: http://lists.openstack.org/pipermail/openstack-infra/2015-February/002471.html20:58
pleia2clarkb: yeah, seems so20:58
mordredclarkb, jeblair: either one of you happen to have one of the 500 level exceptions on create tracebacks laying around handy?20:59
fungimordred: there was one in my paste earlier20:59
pleia2zaro: thanks, I'll keep an eye on tests and approve when things pass20:59
clarkbmordred: I do not but I can probably grep one for you if that helps20:59
mordredah! found one20:59
jeblairclarkb: if we need to do it, then we need to do it.  honestly, for something like that i expect that one or two infra-cores will spend a week or two babysitting it and fixing things _after_ we've merged the change.  we never find all the edge cases right away.20:59
greghaynesfungi: tyty20:59
fungigreghaynes: ywyw21:00
clarkbjeblair: yes, it wouldn't be a low cost change to implement. But I do think we can avoid allocation complications21:00
*** tkelsey has joined #openstack-infra21:00
*** achuprin has quit IRC21:00
*** sarob has joined #openstack-infra21:00
*** _nadya_ has joined #openstack-infra21:01
jeblairclarkb: how would you like to proceed?21:02
*** baoli_ has quit IRC21:02
*** emagana has quit IRC21:02
openstackgerritMerged openstack/requirements: Bump sahara client version  https://review.openstack.org/15542821:02
*** esker has joined #openstack-infra21:03
clarkbI am reading heapq docs to see how terrible a priority queue implementation might be. It doesn't look like replacing arbitrary entries is very performant or easy21:04
*** tkelsey has quit IRC21:04
*** dmorita has joined #openstack-infra21:05
*** emagana has joined #openstack-infra21:05
clarkbthough can probably work around that simply by using two Queue.Queues, since really we only have two priorities: delete and everything else21:05
*** Ryan_Lane has quit IRC21:05
clarkblet me see how terrible updating the code to do this might be really quickly21:06
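(A sketch of the two-queue idea, leaning on Queue.Queue's built-in locking -- Python 2 spelling, matching nodepool at the time. The class and the polling behaviour are illustrative only.)

    import Queue

    class TwoPriorityTasks(object):
        """Serve normal provider tasks before deletes, using two
        thread-safe queues instead of a heap that needs re-sorting."""

        def __init__(self):
            self.normal = Queue.Queue()    # create, list, floating-ip, ...
            self.deletes = Queue.Queue()   # expensive, lower priority

        def put(self, task, is_delete=False):
            (self.deletes if is_delete else self.normal).put(task)

        def get(self, timeout=1):
            try:
                return self.normal.get(block=False)
            except Queue.Empty:
                # Nothing urgent pending; fall back to a delete, or let
                # Queue.Empty propagate if there is nothing at all.
                return self.deletes.get(timeout=timeout)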
*** _nadya_ has quit IRC21:06
greghaynesclarkb: You can just rebuild the queue, I think we have *gasp* hundreds of nodes, right?21:06
*** hasharConfcall is now known as hashar21:07
openstackgerritMonty Taylor proposed openstack-infra/nodepool: Deal with failures that succeed  https://review.openstack.org/16638321:07
clarkbgreghaynes: ya but this way I also get thread safety so I am just going to go with it21:07
mordredclarkb, fungi, jeblair: ^^ there is a first stab at adding some smarts about waiting for the server record to show up after a 50021:07
greghaynesclarkb: well we have to make inserts thread safe too, IMO why not just put locks around it and call it good21:07
clarkbgreghaynes: because Queue.Queue is thread safe21:07
mordredI would like to posit that it might get us a decent amount more nodes in hpcloud - given how many aliens we keep growing21:08
mordredI'd like eyes on the approach before I spend too much time working on testing it21:08
jeblairmordred: i believe most of the aliens have been related to our failed experiment from yesterday21:08
mordredjeblair: we had them before our experiment21:09
jeblairmordred: before then, we only had a handful every few weeks i want to say?21:09
mordredjeblair: possibly - but I think it's a regular failure mode of hpcloud now21:09
mordredjeblair: so handling it is appropriate21:09
clarkbjeblair: we had ~150 a couple days ago21:09
jeblairclarkb: okay, that's more than i recall dealing with.21:10
*** zz_dimtruck is now known as dimtruck21:10
mordredyah - I remember trying to get the noc folks to care21:10
*** ldnunes has quit IRC21:13
krotscheckclarkb, mordred: I stand corrected, minified does actually gzip even more -> http://paste.openstack.org/show/194069/21:14
clarkbkrotscheck: meh its 5kb difference :)21:14
*** mattfarina has quit IRC21:15
*** bswartz has joined #openstack-infra21:15
krotscheckclarkb: You mean 5021:15
krotscheck?21:15
openstackgerritMerged openstack-infra/project-config: Use template for Rally py34 job  https://review.openstack.org/16485821:15
*** timcline has quit IRC21:15
krotscheck(well, 43)21:16
fungikrotscheck: that's also a much newer jquery than we're talking about21:17
krotscheckfungi: True.21:17
fungithe difference in size is substantial from 1.721:17
*** emagana has quit IRC21:18
clarkboh was I off by an order of magnitude because they increased size by an order of magnitude?21:18
krotscheckWell, 1.11 also has a ~40KB difference.21:18
jeblairjquery was downloaded from review 15062 times yesterday (out of 40385 requests; most were 304 not modified)21:18
krotscheckI actually think it's documentation.21:19
*** emagana has joined #openstack-infra21:19
fungioh, wait, it's not. i was looking at the file earlier and seeing an order of magnitude difference in size21:19
clarkbmaybe all my maths are off21:19
fungifridaymath21:19
clarkbmordred: that node I gave you a uuid for is still building fwiw21:19
krotscheckYeah, caching is definitely the thing that needs to happen.21:19
* greghaynes is curious what the thing trying to be optimized is21:21
greghaynesbecause if its either download speed or bandwidth then google jquery cdn will be the best fix21:21
greghaynesbut if its just effort then not sure it matters21:21
fungimostly effort as far as i'm concerned. but also i like us serving actual readable/modifiable source code21:22
krotscheckWell, the reason I care right now is that I've had a bunch of discussions with packagers and other frontend people, and I'm trying to come up with sane JS policies to propose to the TC.21:22
*** timcline has joined #openstack-infra21:22
krotscheckThings like: Don't minify, learn to cache instead.21:22
*** emagana has quit IRC21:23
fungibut also make it possible for the deployer to decide to switch out the js for minified versions if they really want to go through the effort to be that ocd about it21:23
krotscheckBut for that I need data.21:23
clarkbjeblair: you know where this really gets complicated? the fact that we have to delete >1 thing :(21:24
jrollfungi: fwiw, you can serve a source map to modern browsers so that they can make the JS readable21:24
*** hyakuhei has quit IRC21:24
mordredkrotscheck: data++21:24
fungijroll: true, like shipping separate stripped binaries and symbol files21:25
jrollyep21:25
mordredclarkb: still building seems very lame21:25
greghaynesactually, that brings up a good point krotscheck ^ the biggest gain of not minifying, if it's software we're making, is that we can actually debug errors21:25
mordredclarkb: devstack-trusty-1426883075.template.openstack.org <-- rax-dfw rebuilt21:25
greghaynesotherwise you have to do that source mapping hackery21:26
SpamapSmordred: https://review.openstack.org/#/c/166383/ -1'd21:26
SpamapSmordred: but I think the idea is sound and worthwhile.21:26
mordredSpamapS: thanks - good feedbacks21:26
krotscheckgreghaynes: Ooooh yes. I know that pain acutely.21:26
fungikrotscheck: jeblair: anyway, assuming that all those downloads yesterday were gzip-compressed, that's ~630mib additional data which would have been downloaded if we weren't minifying. so not enormous21:27
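(Back-of-the-envelope check of that figure, using the ~43 KiB minified/unminified difference mentioned above; both inputs come from the discussion, not remeasured.)

    downloads = 15062            # jquery fetches that were not 304s
    extra_kib = 43               # approx size saved per fetch by minifying
    print(downloads * extra_kib / 1024.0)   # ~632 MiB extra per day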
SpamapSgreghaynes: my experience has been that building a simple way to run in a debug mode using the query string helps with that.   foo/?dontminify=121:28
clarkbSpamapS: good luck getting that into gerrit21:28
SpamapSclarkb: They don't have UI engineers with sanity requirements? :)21:29
greghaynesSpamapS: The problem is a lot of the time you have a setup where clients send backtraces to you when they error21:29
greghaynes(I wonder if any of our deployers do that?)21:29
greghaynesSince you kind of want to know if the code you send them works21:29
krotscheckgreghaynes: I thought that's why we have tests.21:29
krotscheck:D21:29
greghaynesheh21:29
fungiSpamapS: sanity requirements? it was initially developed at google. i think they just don't have ui engineers21:31
fungialso, i think the current and new webuis for gerrit are further proof of that conjecture ;)21:31
*** kgiusti has left #openstack-infra21:32
jeblairjquery is not used by gerrit21:32
krotscheckWell, they built Angular, which is pretty cool.21:33
krotscheckBut then they ported it to TypeScript.21:33
krotscheckSo I'm not certain what that says about them.21:33
openstackgerritClark Boylan proposed openstack-infra/nodepool: Rough rough shape of what rebuilds might look like  https://review.openstack.org/16638721:33
clarkbjeblair: ^ is the basic shape of the thing21:34
fungijeblair: nope, but getting switches into the gerrit request syntax to switch between serving multiple javascript files was the ui sanity question, not really jquery specifically21:34
jeblairkrotscheck: s/built/hired developer of/21:34
clarkbjeblair: but you are right there are some hairy bits, I have commented on them with TODOs and am curious if you think its worth figuring those bits out21:34
krotscheck....oh. Well then.21:34
krotscheckStill, TypeScript.21:34
krotscheckick.21:34
krotscheck(Though maybe not really)21:35
clarkbjeblair: biggest thing is DeleteServerTask needs to become more atomic within nodepool21:35
*** teran has joined #openstack-infra21:35
*** dboik has quit IRC21:35
clarkbjeblair: so that when it fires it handles all of the delete tasks otherwise rebuilt nodes would need ot figure out floating ips and keypairs some other way21:35
*** spzala has quit IRC21:36
BobBallHow can I have a gerrit reporter comment to gerrit without voting?  Seems that I _must_ have a value after the "gerrit:" tag which gets translated to something that's actually sent through.  Is there a no-op command I can add?21:36
jeblairBobBall: look at the definition of our experimental pipeline21:37
BobBalloh - {} - how obvious... Thanks.21:37
jeblairclarkb: do you think we should defer an existing priority effort here?21:37
*** esker has quit IRC21:38
*** VijayTripathi1 has quit IRC21:39
clarkbjeblair: I think getting the deletions working correctly with the various resources that need to be deleted + rebuilds will likely sink quite a bit of time and probably are not worth it21:39
openstackgerritMonty Taylor proposed openstack-infra/nodepool: Deal with failures that succeed  https://review.openstack.org/16638321:39
clarkbjeblair: we already know that this stuff is hairy even without rebuilds21:39
clarkb(we leak floating ips for example)21:40
mordredok. if we're doing that21:40
openstackgerritMonty Taylor proposed openstack-infra/nodepool: Use rebuild instead of delete  https://review.openstack.org/16638821:40
mordredI wasn't pushing mine up because we weren't doing it and I didn't want to be a randomization function21:40
mordredbut I think it can be much much simplier21:40
clarkbmordred: your change won't work for the same reasons I think20:40
clarkbmordred: no yours has the same issues21:41
mordredok21:41
mordredjust saying21:41
clarkbmordred: the problem here is that a VM isn't just a VM21:41
clarkbit should be and we should give that feedback to nova and neutron21:41
clarkbbut unfortunately today it isn't21:41
mordredclarkb: I don't know that that matters in this case21:42
clarkboh I see you just never call cleanupServer at all21:42
mordredyeah21:42
mordredthat's why I say "this will just keep things at max usage"21:43
clarkbya, the one place that may be a problem is with snapshot builds, maybe21:43
clarkbI do not know how "clean" a rebuild is for that21:43
mordredI think a follow up could be done to deal with that - but my main thought experiment was "can this be done without affecting the algorithm at all"21:43
clarkbanyways I think the deal I was describing isn't as simple as I thought because delete is really at least 3 delete operations21:44
clarkband as soon as we have to make that more atomic it becomes complicated21:44
mordredyes. getting delete right is very hard21:44
mordredbecause of that21:44
*** EmilienM is now known as EmilienM|PTO21:44
mordredI wish re-using floating-ips was less insane - but the race condition ... ZOMG - I just had an idea for that - probably a zuulv3 idea21:45
mordredbut I don't know why I didn't think of it before21:45
mordredI've been trying to think about floating-ip reuse atomicity in pure openstack terms21:45
mordredbut we have a database21:45
mordredwhich means we can deal with the multi-thread race condition issues with allocation flags in the db, rather than in the nova/neutron api21:46
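(A sketch of the database-side claim mordred is alluding to; the table, column, and parameter names are invented, and MySQL-style SQL with a DB-API connection is assumed.)

    def claim_floating_ip(conn, claim_id):
        # Atomically mark one unallocated floating IP as claimed; the UPDATE
        # either grabs a row or touches nothing, so two nodepool threads can
        # never walk away with the same address. claim_id is assumed unique.
        cur = conn.cursor()
        cur.execute("UPDATE floating_ip SET claimed_by = %s "
                    "WHERE claimed_by IS NULL LIMIT 1", (claim_id,))
        conn.commit()
        if cur.rowcount == 0:
            return None    # nothing free; allocate a brand new one instead
        cur.execute("SELECT address FROM floating_ip WHERE claimed_by = %s",
                    (claim_id,))
        row = cur.fetchone()
        return row[0] if row else None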
jeblairokay, i'd like us to make some kind of a decision here21:46
fungiso basically, to summarize, 166383 will go ahead and delete (or attempt to delete) the errored instance and wait for it to no longer appear in the nova list... but are we necessarily able to track it at that point (e.g. has it provided an instance uuid so it can be identified)?21:46
jeblairi was expecting to spend this afternoon working on zuulv3 specs21:46
jeblairbut i'm getting nowhere, because we're still talking about rebuilds21:46
jeblairso, can we decide to either pursue this thing or not?21:46
mordredfungi: no - 383 does the opposite21:46
jeblairi don't see any way it's simple21:47
mordredfungi: it continues to try to use a node even if it gets a 500 error - since we've learned that 500 errors are lies21:47
*** otter768 has joined #openstack-infra21:47
clarkbjeblair: I am with you now, we have not pursued it yet because it is complicated in various ways. Given that we have limped along with create-delete, we should be able to make that work while zuulv3 happens21:47
jeblairi think it's at least one person working on it a while, and at least one core babysitting it for a week21:47
jeblairif people think it's worth doing, let's knock something off the priority list and do it21:47
jeblairor put it on the priority list backlog21:47
*** aysyd has quit IRC21:47
fungimordred: ahh, yeah i forget that it can actually become usable even after a 5xx rather than simply hang around broken21:47
jeblairbut it's a big enough thing that i don't think we have any more marginal time for it21:48
mordredI think it's maybe worth it - and I Agree it's not going to be quick and easy21:48
*** doug-fish has left #openstack-infra21:48
mordredI say maybe because I think one of our clouds is not going to get any better any time soon, and I think that we've learned that delete calls are super expensive in openstack21:48
*** hashar has quit IRC21:48
jeblairmordred: so are creates.  do we know how expensive rebuild is?21:48
mordredso the ability to avoid them may give us a much larger amount of bandwidth21:48
mordredjeblair: I expect it to be the same21:49
mordredwhich makes it 1/2 as expensive21:49
fungihowever, we're past feature freeze now... we might have quite a few months we can limp along with the current create/delete model21:49
jeblairhas anyone tried this and timed it on hpcloud?21:49
*** jamielennox|away is now known as jamielennox21:49
mordredI think timrc did some initial benchmarks, yes21:49
jeblairfungi: i agree21:50
clarkbtimrc did, I don't have his numbers handy but they were an improvement21:50
mordredfungi: that is a good point21:50
openstackgerritMerged openstack-infra/system-config: Fix jquery setup on Gerrit server.  https://review.openstack.org/16514521:50
*** timcline has quit IRC21:50
fungihence i'd rather see those several months runway invested hard in zuul v3 rather than continuing to try to make incremental improvements to the current design21:50
mordredI don't think it's urgent anymore because of FF - and I've been solidly on the "do it later" camp because adding it now will make the nodepool-shade patch larger21:51
fungiwhich might rob us of the resources we need to get new zuul in time for the _next_ feature freeze21:51
*** otter768 has quit IRC21:51
anteayazaro: so we have all we need for tomorrow?21:51
clarkbmordred: and you don't expect hpcloud to correct whatever has ailed it over the last couple weeks?21:52
mordredno21:52
mordredor, rather21:52
clarkbI am fairly positive this was not a problem a couple months ago21:52
mordredI think we need to assume for planning purposes that it will not21:52
mordredso that if it does, it will be a pleasant surprise21:53
*** enikanorov has quit IRC21:53
jeblairmordred, clarkb: let's do this.  think about it a bit more, and talk to timrc if you want.  do some experiments to see if we would actually gain anything, and if you want to do it, propose it as either a backlog priority effort, or propose that we bump a current priority effort for it at the next meeting.21:53
jeblairmordred, clarkb: how's that sound?21:53
* timrc perks up21:53
clarkbjeblair: ya, we should definitely test it in the new hpcloud situation too21:53
mordredjeblair: I think that's a great plan21:54
*** mriedem has quit IRC21:54
clarkbjeblair: not sure that timrc's testing captured what it is like now21:54
timrcclarkb, I pastebin'ed my numbers on the review for 'rebuild'21:54
clarkbtimrc: yes, but you did so before we blew up hpcloud yesterday21:54
*** weshay has quit IRC21:54
*** enikanorov has joined #openstack-infra21:54
mordredthere is another thing - which is that I think one of us might need to test in the openstackjenkins2 tenant21:54
timrcclarkb, Yeah... want me to rerun the script?21:54
mordredtimrc: so if you can provide a script21:54
timrcThere is a script...21:55
mordredclarkb: I say that because I think delete time might be tied to tenant account too21:55
clarkbmordred: gotcha21:55
*** garyh has joined #openstack-infra21:55
timrcGive me a second... I'm running off of what my carrier says is 4G on a beach in south Texas.21:55
mordredI don't know that it is - but if there are database table issues, then our soft-delete database history could make a difference21:55
mordredtimrc: it's not urgent21:55
clarkbbut ya I think checking that performance is a good next step and from there we can decide if its worth the effort21:55
mordredclarkb: ++21:55
timrcmordred, clarkb, jeblair: Numbers: http://paste.openstack.org/show/187334/ Script: http://paste.openstack.org/show/187333/21:56
jeblairokay, i'm going to get back to writing the zuulv3 spec so that it stops being an imaginary thing, and so we can get closer to actually working on it instead of blocking on me21:56
fungithanks jeblair!21:56
mordredjeblair: woohoo!21:57
jeblairthat's region-a, which is another difference, yeah?21:57
fungimordred: on the "limping along" front, any updates on whether hpcloud west is something we should try?21:57
clarkbjeblair: yes we are in region-b21:57
timrcclarkb, So... when you say blew up hpcloud... what does that actually mean? I've been on vacation.21:57
SpamapSbewm21:57
jeblairkabloom21:57
SpamapSas in, boom with a fratboy accent21:57
fungitimrc: maybe you should avoid worrying about it while vacationing. seems like a waste of a good vacation21:57
mordredtimrc: go back to vacation - you don't want to know21:57
clarkbtimrc: https://community.hpcloud.com/status/incident/294421:58
timrcfungi, Do you have kids?21:58
fungitimrc: point taken ;)21:58
timrc;)21:58
*** gordc has quit IRC21:58
SpamapSit kind of makes sense.. rebuild does an update on the row a few times (for status, image id) but otherwise just happens all in nova-compute21:58
mordredyah - also, we don't have to re-floating-ip21:58
mordredso the number of API calls it takes is considerably less21:58
mordredand if API call limit is one of our blockers21:59
SpamapScreate has to be scheduled21:59
mordredthen that might actually be more important - or at least an important factor21:59
*** carl_baldwin has quit IRC22:00
SpamapSyeah reducing API calls would be a win especially for hpcloud's current ailments22:00
fungiagreed, testing rebuild performance in our tenant while hpcloud is failing to respond to most of our api calls might be an interesting performance datapoint22:00
SpamapSalso with HPCloud floundering, does this increment the priority a bit on infra cloud?22:00
*** armax has quit IRC22:00
fungiSpamapS: dunno. i haven't seen recent updates the people who were writing the infra-cloud bits, though i could have missed some22:01
fungier, from the people22:01
jeblairSpamapS: i think infra cloud is already fairly high priority because of this22:01
jeblairfungi: may be a misconception here... lemme splain22:01
jeblairi've asked the new folks joining our team to pitch in on existing efforts because i don't want there to be an infra-cloud team which is separate from the infra team22:02
*** mrmartin has joined #openstack-infra22:02
SpamapSI've moved writing the initial docs changes up to the top of my priority, to be multi-plexed with nodepool and shade testing.22:02
fungioh! yes that's an extremely good thing22:02
*** tnovacik has quit IRC22:02
jeblairi think that's going pretty well so far22:02
jeblairso i think that as SpamapS finishes up the doc he's writing...22:03
fungithat explains the recent uptick in people getting more involved in general infra stuff, so i'm thrilled. seems to have worked out well so far22:03
jeblairwe can start to slot that effort into the priority list when one or more things wrap up22:03
*** sabeen1 has quit IRC22:04
mordredSpamapS: actually ... I was going to ask you if you'd inject work on nodepool-dib into your priority list22:04
fungiyep, i missed that's how it was ramping up. i may have skimmed one of the meeting logs from when i was on vacation a little too lightly22:04
jeblairand we'll all be working on it together, at least as much as we do anything else -- some people are going to focus on it more than others, but it'll operate more like the other things we've got going on22:04
clarkbSpamapS: did the first step of homogenizing hardware get started? /me wonders where we are at22:04
mordredSpamapS: because I think both of the main blocking tasks there you are exceptionally well suited to attack22:04
mordredclarkb: no - we have done no tasks there22:04
*** Swami has quit IRC22:05
*** jamielennox is now known as jamielennox|away22:05
mordredclarkb: I have run puppet on a node in each cloud region - so you have a login on them22:05
* greghaynes might also be able to help with nodepool-dib if SpamapS is spread too thin22:05
*** garyh has quit IRC22:05
mordredclarkb: but they have not been cleaned in any way - pending what jeblair is discussing before22:05
SpamapSmordred: by all means, push things onto my stack. :)22:05
jeblairalso, we expect to have a few more people joining soon too22:05
mordredSpamapS, greghaynes: Ng and GheRivero are starting to look as well - but the two tricky tasks are:22:06
clarkbmordred: I understood that was step 0 what is step -1?22:06
fungiout of curiosity, and feel free to point me to existing descriptions/documentation, what sort of initial capacity are we expecting out of the current hardware?22:06
mordredclarkb: current infra priority efforts22:06
jeblairso i hope that happens in a time frame where they can also pitch in on non-infra-cloud things, but also be here when we really start on infra cloud22:06
mordredSpamapS, greghaynes: get a working base image that can boot on both rackspace and hp for ubuntu and centos/fedora22:06
clarkbmordred: right but aiui its just an internal hp ticket to have someone physically located in the same building as the hardware move some pcie cards around22:06
flashgordonanteaya: pong, re: devstack-gate triggers grenade which is why we still should run grenade on it22:06
flashgordonclarkb: pong, which bug?22:07
greghaynesmordred: ah, and AIUI the issue there is just rax networking?22:07
mordredclarkb: yes - but we need to also do some more design on networking vlans, which is an infra team task before we set that in motion22:07
clarkbflashgordon: `nova boot` returns 502 error from api server, but then nova boots the node anyways22:07
mordredgreghaynes, SpamapS: we have a script that can handle rax networking22:07
mordredcurrent issue is making sure that script runs at the right time during boot22:07
SpamapSmordred: Ah see here I thought you had that well in hand and it was about done.22:07
flashgordonclarkb: what is the full response?22:07
clarkbmordred: gotcha22:08
mordredgreghaynes, SpamapS: Ng has been looking at it some, but is battling rackspace london22:08
flashgordonclarkb: very odd22:08
*** dboik has joined #openstack-infra22:08
mordredgreghaynes, SpamapS: but, in any case, you guys know a lot about dib things too :)22:08
clarkbmordred: have that nova 500 exception handy?22:08
*** marun has quit IRC22:08
mordredclarkb: one sec ...22:08
anteayaflashgordon: okay where do I find a list of projects grenade runs on?22:08
mordredclarkb: it's a ClientException "unknown error" fwiw22:09
mordredSpamapS, greghaynes: the second thing is the "port nodepool to shade" task - which yolanda started and GheRivero started looking at22:09
flashgordonanteaya: two answers, 'git grep project-name' in grenade22:09
mordredbut it's going to be a not-small patch22:09
mordredso collaboration is likely important22:09
clarkbflashgordon: ClientException: Unknown Error (HTTP 500)22:09
mordredit's going to involve porting smarts from nodepool into shade in a few places22:10
flashgordonanteaya: and http://git.openstack.org/cgit/openstack-dev/grenade/tree/ check for upgrade-* as you mentioned above22:10
fungiclarkb: flashgordon http://paste.openstack.org/show/193961/22:10
clarkbwe apparently don't have any 502's in the log so I was wrong about specific error22:10
fungi502 error actually22:10
SpamapSmordred: writing more tests will end up being a good parallel effort to maintain while that is ongoing.22:10
anteayaflashgordon: horizon isn't there, devstack-gate isn't there22:10
clarkboh two different types of Unknown Error. The best kind of Unknown22:10
anteayaflashgordon: neutron isn't there22:10
openstackgerritmelanie witt proposed openstack-infra/project-config: Adjust regression exceptions for Nova Cells V1 job  https://review.openstack.org/16639622:10
mordredSpamapS: yes indeed - but it would be great to shove a facehead into the bucket of that - I'm betting it will expose some specific things that need specific testing22:11
flashgordonupgrade-neutron22:11
greghaynesmordred: so for the first task - is the state of things that were all good for images in hpcloud and weve yet to get booting images in rax or is there hpcloud issues as well?22:11
SpamapSI keep forgetting that infra uses topics so nicely in gerrit.22:11
clarkbgreghaynes: hpcloud is good22:11
* SpamapS finds all the revies22:11
SpamapSreviews even22:11
flashgordonanteaya: https://github.com/openstack-dev/grenade22:11
flashgordonanteaya: http://paste.openstack.org/show/194083 is what I see22:12
clarkbflashgordon: but remember I was complaining that we were leaking nodes? this is how22:12
mordredgreghaynes: it's actually mostly that centos/fedora aka systemd is weird22:12
flashgordondevstack-gate calls grenade so the relationship is the other way around22:12
flashgordonclarkb: ahh22:12
mordredgreghaynes: we MAY be really close to being awesome everywhere22:12
flashgordonclarkb: that log isn't really useful hmmm22:12
greghaynesmordred: yep, thats the whole we need our script to run at the right time WRT networking and cloud-init, yes?22:13
*** amitgandhinz has quit IRC22:13
mordredgreghaynes: but we need to do empirical testing of the axes of (ubuntu, debian, centos, fedora) * (hpcloud, rackspace)22:13
anteayaflashgordon: what is upgrade-infra?22:13
clarkbflashgordon: yes well we only have nova to blame for that :) but I think the key bit is that even after a 5XX error it is possible for nova to continue scheduling a node happily22:13
greghaynesok22:13
mordredgreghaynes: yes - except no cloud-init22:13
SpamapSmordred: maybe we need to make our script a little more fuzzy22:13
fungiflashgordon: not really useful, but also all that novaclient tells us22:13
greghaynesoh, we changed that again22:13
clarkbmordred: wait22:13
SpamapSas in, run it backgrounded and keep trying, as long as we block bad things from happening.22:13
clarkbmordred: lets back up on that, we just tried no cloud-init and it failed spectacularly22:13
clarkbmordred: I think we should cloud init for this reason22:13
clarkbmordred: at least as a first stab22:14
mordrednonononoo22:14
mordrednononononononononononono22:14
mordrednononononon22:14
fungiclarkb: but remember that means installing our own non-distro-packaged cloud-init too22:14
mordredthat is totally different22:14
mordredplease don't confuse the issues22:14
greghaynesI think hes stuck in a loop22:14
clarkbmordred: I don't think I am, you just said no cloud init22:14
mordredno22:14
SpamapSreboot him22:14
clarkbmordred: we just tried that, it broke22:14
mordredhangon22:14
mordredno22:14
* timrc gets popcorn22:15
*** esker has joined #openstack-infra22:15
mordredit broke because we assumed that rax wasn't using cloud-init in their images22:15
mordredthey are22:15
mordredwe are not using their images22:15
flashgordonanteaya: upgrade infra is some random stuff AFAIK22:15
mordredin order to use cloud-init in our own images22:15
* SpamapS sends a SIGHUP22:15
mordredwe MUST use a patched version of cloud-init22:15
anteayaflashgordon: sigh22:15
mordredand it all gets very complex22:15
mordredI promise - we do not need to start over from scratch on this effort22:15
anteayaflashgordon: okay I have to go get more sap before I burn what is on the stove22:15
SpamapSright I believe nibalizer was working on cloud-init-in-a-virtualenv for that purpose?22:15
anteayaflashgordon: I'll look at it again later, thanks22:15
mordredwe are an initscript away from being done22:15
mordredno22:15
mordredNO seriously22:16
mordredcan we not start over from scratch22:16
SpamapSok for some other weird purpose. :)22:16
mordredI don't care why he was working on it22:16
*** ddieterly has quit IRC22:16
mordredI don't want to keep having this argument22:16
SpamapSI'm up for not starting over22:16
mordredwe are almost done with this22:16
SpamapSmordred: is there somewhere I can look at the result of things not working right?22:16
mordredit works - we're fine - we need to test it and make sure we've covered the combinations22:16
clarkbmordred: so, can you clarify what rax does use cloud init for and why we don't need to use it for that?22:16
SpamapSmordred: oh so it's truly at a point of needing to be reasoned about and landed, not smoke tested?22:17
flashgordonclarkb: can you run this with novaclient debug logs on?22:17
mordredSpamapS: yes22:17
flashgordonotherwise I don't have enough to go on22:17
*** abramley has quit IRC22:17
mordredclarkb: they have designed images that depend on cloud-init - I have not dug in to why22:17
mordredbut it's irrelevant22:17
greghaynesso, somewhat related question - mordred is there any sane way to get creds to boot some rax vms22:17
mordredwe have booted appropriate images in rackspace that do not have cloud-init and they work fine22:17
SpamapSmordred: ok, please point me at a starting point.. just the element in project-config ? Active review?22:17
flashgordonanteaya: no worries. so devstack has a lib/infra section22:18
mordredgreghaynes: yes - use your amex - I will approve it :)22:18
flashgordonupgrade-infra calls that22:18
greghaynesmordred: easy enough22:18
mordredgreghaynes: just make sure they don't put you in london22:18
mordredand you'll have to request glance being turned on22:18
mordredSpamapS: one sec - I'm looming22:18
mordredlooking22:18
*** esker has quit IRC22:19
mordredSpamapS, greghaynes: https://review.openstack.org/#/c/154132/22:19
pleia2fungi: just saw your superuser.o.o interview, very nice!22:19
fungigreghaynes: i would say sign up for the iopenedthecloud.com promotion, but they stopped offering it and took the form offline22:19
flashgordonclarkb: not nova -- 10.5.3 502 Bad Gateway22:19
mordredneeds to be finished - and we'll want to import that repo into gerrit (the one in source-repositories)22:19
fungipleia2: you're welcome! ;)22:19
flashgordonhttp://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html22:19
mordredbut I figured just leaving it there until we're good is fine22:19
greghaynesfungi: Yea, missed that boat :(22:19
*** esker has joined #openstack-infra22:19
SpamapSahh base-elements as a topic22:20
*** abramley has joined #openstack-infra22:20
*** arxcruz has quit IRC22:20
clarkbflashgordon: it happens with 500 errors too iirc22:20
clarkbfungi: ^ did we narrow it down only to the 502?22:20
flashgordonclarkb: and nova doesn't raise any bad gateways22:20
fungiclarkb: i didn't see the 500 examples but i'll look for one22:21
flashgordonclarkb: do you have a 500 error log as well?22:21
clarkbflashgordon: its the same just s/502/500/22:21
flashgordonclarkb: hopefully if its a nova bug it will have a error message22:21
clarkbflashgordon: Unknown Error22:21
fungiclarkb: flashgordon: i wouldn't be surprised if this is some sort of network device sitting in front of the api endpoint getting overwhelmed22:21
flashgordonno message22:21
clarkbfungi: ya22:21
flashgordonfungi: yeah that is my bet too22:21
mordredSpamapS, greghaynes: I imagine that you and Ng can probably knock it out in like, 30 minutes22:21
flashgordonwhich cloud is this on ?22:21
clarkbits entirely possible that nova is doing what it is told while a frontend device derps22:21
SpamapSmordred: I wonder if it would be simpler to land that script in the review, and then once we know it works, publish it as its own repo?22:22
clarkbflashgordon: hpcloud22:22
flashgordonclarkb:ah22:22
mordredSpamapS: I'm fine with whatever works best for your brain22:22
greghaynesmordred: yea, I suspect 90% of the effort is going to be just getting setup with rax properly22:22
mordredgreghaynes: yup22:22
mordredbut once you are - it'll make future testing things easier22:22
mordredbecause making sure shade works in both places is important too22:22
greghaynesyep, good point22:22
SpamapSmordred: oh its like, a thing, with setup.cfg and stuff22:23
clarkbfungi: iopenedthecloud ends next month too for those of us that have it iirc22:23
mordredSpamapS: we have this cookiecutter thing ...22:23
clarkbfungi: I need to find hosting that doesn't charge a $50 base fee22:23
SpamapSmordred: its more about it being best for velocity. Less moving parts in the beginning.22:23
SpamapSlurve me some single purpose well tested repos, but that this is not. ;)22:23
mordredSpamapS: yes - TOTALLY - I say do it - we can put it back later22:23
fungiclarkb: yeah i was never able to get them to correct my account to add that promotion so i've just been paying for about a year22:23
SpamapSmordred: ok22:23
* mordred must run away ...22:24
harlowja_clarkb hey, do u know if that virtualenv change ever happened so that https://review.openstack.org/#/c/164836/ can get rechecked?22:24
SpamapSgreghaynes: so, I suggest you start trying to use that review, and I will whip it into shape to be landed22:24
SpamapSmordred: sounds good, we got this22:24
clarkbharlowja_: no one has written it yet22:24
clarkbharlowja_: feel free to22:24
harlowja_k22:24
fungiclarkb: so far the 500 errors i'm finding are tripleo22:24
greghaynesSpamapS: Yep22:24
greghaynesmordred: Yes, all your base-elements are belong to us22:25
clarkbharlowja_: we have been battling the cloud exploded fires22:25
harlowja_np22:25
*** prad has quit IRC22:25
clarkbharlowja_: but it should be as simple as updating the line I linked with the latest version22:25
harlowja_ya22:25
clarkbharlowja_: or replacing the version specifier with ensure => latest22:25
harlowja_will get a review up22:25
clarkbharlowja_: ^ is likely the change we really want since we don't care about aging virtualenv/pip/setuptools22:26
harlowja_ya22:26
fungiclarkb: oh, found some http 500 errors in hpcloud but they're all for delete calls so far22:26
clarkbfungi: so its possible that only 502s caused the leaks22:26
fungiooh! ClientException: Unknown Error (HTTP 503)22:27
fungithere's another to hunt down22:27
fungithat was also on a deletion22:27
flashgordonclarkb: any idea of who I can switch to after the rax thing ends22:27
flashgordonmikal: I can has free cloud?22:28
clarkbflashgordon: no, I haven't really looked. I would stick with rax if the $50 base charge wasn't a thing22:28
flashgordon$50 base whaaat22:28
fungiclarkb: flashgordon: maybe https://www.runabove.com/22:28
clarkbhttps://www.arpnetworks.com/ are supposed to be good but not openstack22:28
fungithey have pretty low rates and are supposedly basic openstack services22:28
zaroanteaya: yep, looks like everything is in place.22:28
openstackgerritJoshua Harlow proposed openstack-infra/system-config: Always try to use the latest virtualenv  https://review.openstack.org/16640422:28
harlowja_clarkb ^ ok, let's see how that goes22:29
*** ociuhandu has joined #openstack-infra22:29
* flashgordon wants free22:29
clarkbfungi: their prices are pretty good and its openstack22:30
*** YogeeBear has joined #openstack-infra22:31
greghayneshighly recommend arpnetworks22:32
clarkbyes but not openstack22:33
greghaynes:p22:33
greghaynesyea, if you want to actually do dev and need a cloud then :(22:33
greghaynesI need to deploy an openstack in my rack so I can do this22:34
fungii gave up having a small datacenter in my house when i moved to the beach22:34
fungiso remote virtual machines are now pretty necessary for me22:34
greghaynesfungi: I actually colo22:35
greghaynesbut yea22:35
greghaynesthers upsides and downsides22:35
fungia colo near where i live would be way more expensive than what i'm doing now22:35
openstackgerritJoe Gordon proposed openstack-infra/project-config: Don't run neutron-large-ops on neutron advanced services  https://review.openstack.org/16564822:35
reedfungi, when you have time, remember to pull the list of new ATCs... you can do it on Monday22:35
fungiand "near" would be ~2-3 hours drive22:35
greghayneseek22:35
fungireed: yep, on my list for this weekend22:35
fungithough might end up being monday22:36
greghaynesWe end up having pretty cheap colo here in pdx (although its not the best DC) and I actually just do it because my home became way too hot otherwise22:36
reedno work over weekend, fungi22:36
clarkbgreghaynes: by not the best DC I think you mean its basically someones garage22:36
*** bknudson has quit IRC22:36
clarkbgreghaynes: because lol shelf servers22:36
fungireed: what work? i'm happily enjoying retired life22:36
reedLOL22:36
greghaynesclarkb: haha, its actually impressive as the building and infra goes, but yea they are pretty low key on how they manage it22:37
greghaynesclarkb: tata was going to move their HQ there then the bubble burst and the dc was just kinda overbuilt and underused22:37
fungigreghaynes: my home datacenter had a separate air conditioning unit for exactly that reason22:37
fungiand it was in my basement, so in the winter i'd just open the door to the rest of the house and use the computers as auxiliary heating22:38
*** YogeeBear has left #openstack-infra22:38
greghaynesnice!22:38
fungii had relay racks bolted to heavy duty shipping pallets with swivel casters underneath as my poor-man's raised floor, so i could move them around in the room as needed22:40
fungi3x3 grid of 500lb-rated swivel casters underneath each22:40
clarkbrunabove won't let me try their free tier without supplying a credit card22:40
clarkbI get it but  :(22:41
*** erlon_away has quit IRC22:41
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/project-config: Add config-drive element  https://review.openstack.org/15413222:41
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/project-config: Add elements for Infra servers  https://review.openstack.org/14084022:41
SpamapSgreghaynes: ^22:41
SpamapSremoved the gitorious dependency until we can get that code into its own proper stackforge (openstack??) project.22:42
fungiclarkb: yeah, and also a friend of mine tried to sign up with it for the very small start-up he was working for, and they insisted he provide paperwork proving the company existed. it's a french parent company, so "shut up and take my money" doesn't work with them like it does with, say, amazon22:42
SpamapSgreghaynes: also what happened to your devuser patch to dib?22:42
fungiSpamapS: gitorious? you mean gitlab right? ;)22:42
greghaynesSpamapS: not merged yet22:42
SpamapSfungi: sourceforge ftw?22:43
fungiSpamapS: apparently22:43
greghaynesSpamapS: oh, looks like they added a linter for some dib stuff that it needs to be updated for https://review.openstack.org/#/c/153439/22:43
greghaynesSpamapS: You going to use it?22:43
*** sdake has joined #openstack-infra22:43
clarkbwow and I can't boot any instances because they are reserved for paying customers currently22:44
SpamapSgreghaynes: I had a moment of wanting to just run 'kvm foo.qcow2' to test .. and no user to login as. ;)22:44
*** mriedem has joined #openstack-infra22:44
clarkbI wanted to see what their sandbox instances are. container vs vm etc22:44
clarkbI am in a waiting list though so I shall wait22:44
SpamapSclarkb: just setsockopt O_NONBLOCK .. 60% of the time it works, every time.22:45
greghaynesSpamapS: Also, not sure if you saw: https://review.openstack.org/#/c/156433/22:45
fungiclarkb: so far the 503 errors are also deletions, but in rackspace22:45
clarkbweird22:45
greghaynesSpamapS: Im unsure if the fact that people have tested in rax means that is verified as working or if people have been building images by hand, but it would be good to test22:45
nibalizerwhat was I supposedly doing?22:46
greghaynesnibalizer: fixing everything22:46
nibalizerwell i can confirm i am not doing that22:46
nibalizerSpamapS: ?22:46
fungiclarkb: flashgordon: oh, here's a ClientException: Unknown Error (HTTP 500) on create in hpcloud, so we do see some with that too22:47
*** baoli has joined #openstack-infra22:47
greghaynesnibalizer: we were talking about why you were messing with cloud-init in venv22:47
*** sdake_ has quit IRC22:47
greghaynesbut its actually not relevant since were decidedly not trying to cloud-init ATM22:47
clarkbI think I may have discovered that if I use horizon with runabove I can boot an instance <_<22:47
clarkbwe will see if it actually successfully boots22:48
nibalizerokay coool, yea burn cloud-init in the firepit plz22:48
nibalizerhopefully its author isn't in this room22:48
fungioh, he is22:49
*** baoli has quit IRC22:49
* nibalizer apologizes22:50
clarkbwoot I got a node, total hacks22:50
clarkbit looks like their $2.50/month node is a kvm vm22:50
clarkbwhich is winning22:50
fungiwhat specs for ram/disk?22:51
clarkb2GB ram 20GB disk22:51
flashgordonfungi: do you have the full trace?22:51
clarkbbut its oversubscribed, the ~$10/month is supposedly not oversubscribed22:52
clarkbalso you have to login as admin22:52
fungiflashgordon: sure, but the files/lines are the same as from the 50222:52
flashgordonfungi: :/ was hoping for some other data22:52
flashgordonfungi: once again w/o the debug logs from novaclient ...22:53
clarkbalso they give you a real ip addr22:53
clarkbbut no ipv622:53
fungiflashgordon: yep, it's exactly the same point in the client, just a (slightly) different http error code22:53
flashgordonso 500's are generally something went really wrong22:54
flashgordonso not to surprised that case is leaking things22:54
*** enikanorov has quit IRC22:55
clarkbfungi: my node seems to be in europe too22:55
fungiclarkb: yeah, i think they have several datacenters in europe and also at least one in quebec22:55
clarkbya I wasn't given an option via horizon to choose a region, I should dig into that more22:56
*** enikanorov has joined #openstack-infra22:56
flashgordonfungi: my hunch is its a layer on top of OpenStack as well22:57
flashgordonfungi: there may be a way to better detect what nodes nodepool started versus others22:57
*** dimsum__ has quit IRC22:58
flashgordonto make it easier to detect zombies22:58
*** hdd has joined #openstack-infra22:58
clarkbflashgordon: there is, we can write metadata on each node that states which nodepool booted the node22:58
clarkbthen if that nodepool doesn't know about that node it can delete it22:58
clarkbor adopt it I guess22:58
anteayazaro: great well I'm going offline then so I can be coherent tomorrow, see those who will be there at 150022:58
*** andreykurilin_ has quit IRC22:58
flashgordonclarkb: there may be a better way22:59
*** dimtruck is now known as zz_dimtruck22:59
*** dimsum__ has joined #openstack-infra22:59
flashgordonclarkb: yeah nova boot --meta22:59
flashgordonthat kind of metadata22:59
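(A sketch of that tagging idea with novaclient, which accepts a meta argument on create and exposes server.metadata; the metadata key and helper names are made up.)

    NODEPOOL_TAG = 'nodepool_instance'    # hypothetical metadata key

    def boot_tagged(client, name, image, flavor, nodepool_id):
        # Stamp every instance we boot so it can be recognized later.
        return client.servers.create(name, image, flavor,
                                     meta={NODEPOOL_TAG: nodepool_id})

    def find_aliens(client, nodepool_id, known_ids):
        # Anything carrying our tag that this nodepool has no record of is
        # an alien (leaked) node and can be deleted -- or adopted.
        return [s for s in client.servers.list()
                if s.metadata.get(NODEPOOL_TAG) == nodepool_id
                and s.id not in known_ids]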
*** dimsum__ is now known as dims23:00
*** tsg_ has quit IRC23:00
fungiflashgordon: yep, we definitely have an idea of how we might do that (or just ignore the 5xx errors per mordred's proposed patch) but regardless if it was likely to be a bug in nova we wanted to figure out whether we had enough details to make a useful bug report or identify if it's an already known issue23:01
clarkbmordred: that node I was follwing in hpcloud went ready and was then used23:01
clarkbmordred: and is now in the delete queue23:01
clarkbmordred: so I think if you change works in rax (where are we on that) then we are good23:02
flashgordonfungi: in the 5xx case do you get a instance id?23:02
flashgordonfungi: ahh23:02
*** boris-42 has joined #openstack-infra23:02
fungiflashgordon: not in the api response i don't think23:02
zaroanteaya: thanks for reminding the crew, see you tomorrow.23:03
clarkbfungi: correct we get the 50X not a uuid23:03
clarkbwhich then leads to leaking the node, mordred's hackaround is to query based on the name we told it to boot23:03
fungizaro: looking forward to it23:03
fungiin good news, zuul has about finished chipping away at its waiting jobs23:04
*** thedodd has quit IRC23:05
*** garyh has joined #openstack-infra23:06
fungispeaking of alien nodes, 217 at the moment23:06
fungijeblair: when you said slowly deleting those did you manually do so or have you been doing it continuously in a loop? if the former, i'll go ahead and do another cleanup pass while i'm thinking about it23:07
*** mtanino has quit IRC23:08
*** sarob has quit IRC23:08
*** ghostpl_ has joined #openstack-infra23:08
flashgordonclarkb: ahh  that is the workaround23:09
flashgordonfungi: so if no req-id its not a nova bug23:09
flashgordonas a general rule of thumb23:09
clarkbflashgordon: oh you want reqid23:10
clarkbflashgordon: I was talking instance uuid23:10
*** wenlock has quit IRC23:10
fungii went ahead and started deleting the current alien nodes23:11
flashgordonclarkb: err I meant instance uuid23:14
flashgordonwell really either23:14
*** ddieterly has joined #openstack-infra23:14
flashgordonif get neither it isn't a nova thing23:14
jeblairfungi: i had 2 processes going through them; they are finished now (sorry i don't know the completion time, but it was probably within the last hour)23:15
*** esker has quit IRC23:15
*** garyh has quit IRC23:15
fungijeblair: cool, well i've got a serialized pass going now23:16
jeblairfungi: good, that should reduce the load and decrease the chance of timeouts on our side (which cause more alien nodes)23:17
fungiso since you started your pass, we accumulated more than 200 additional23:17
*** sarob has joined #openstack-infra23:17
*** Bsony has quit IRC23:18
fungibut demand is now down to the point where i don't think we're going to continue accumulating many from here through the weekend23:18
jeblairfungi: yeah.  my gut is we can ascribe some of them to my additional activity (especially since i started with 10 of them in parallel before i realized the effect).  but not all of them.  perhaps i would discount that by 50-100.  so still a serious problem.23:18
*** esker has joined #openstack-infra23:18
fungiit's also about time to wind down here and do some friday night things, but i'll keep an eye on irc in case something goes horribly, horribly wrong23:19
*** ibiris is now known as ibiris_away23:21
pleia2fungi: enjoy, see you in the morning23:21
fungiabsolutely23:22
*** enikanorov has quit IRC23:22
*** enikanorov has joined #openstack-infra23:23
*** achanda has quit IRC23:24
SpamapSgreghaynes: reviewed your compress_and_save thing23:24
*** ociuhandu has quit IRC23:25
*** achanda has joined #openstack-infra23:26
*** mrmartin has quit IRC23:27
greghayneshrm?23:29
*** unicell has quit IRC23:29
* greghaynes has too many things lying around23:29
*** unicell has joined #openstack-infra23:29
greghaynesthe VHD one?23:29
*** tkelsey has joined #openstack-infra23:31
SpamapSgreghaynes: hah sorry I meant VHD23:32
SpamapSgreghaynes: but I said compress_and_save because thats what I made a comment on23:32
*** tsg has joined #openstack-infra23:34
pleia2doh, I think the toggle ci button broke on production review.o.o23:35
*** tkelsey has quit IRC23:35
SpamapShopefully it broke and always shows CI because I hate that CI is hidden by default now. ;)23:36
pleia2yeah, and jenkins results aren't up at the top with our votes23:36
SpamapSoh well I like that. ;)23:37
openstackgerritJames E. Blair proposed openstack-infra/infra-specs: WIP: Add Zuul v3 spec.  https://review.openstack.org/16437123:37
SpamapS(the results at the top) :-p23:37
*** unicell has quit IRC23:37
jeblair(still not done, but a little more defined)23:37
*** unicell has joined #openstack-infra23:37
*** tonytan4ever has quit IRC23:37
jeblairpleia2: did merging that change result in a broken symlink?23:38
*** esker has quit IRC23:38
pleia2jeblair: oddly not, we've still got jquery.js and jquery.min.js with old timestamps and living out their lives as separate files23:39
pleia2oh, it's the static one from /home I should be looking at23:39
jeblairSpamapS: well, the idea is that only the latest ci is shown.  so it should never be hidden, but there's no need to wade through 100 auto generated messages23:39
jeblairpleia2: ah, one idea is that gerrit may need to be restarted -- it's got something weird going on with the hash of the file that i'm not sure we fully understand23:40
SpamapSjeblair: I think I would like that better if I worked on some of the projects with hundreds of auto-generated CI results. :)23:40
pleia2-rw-r--r--  1 root    root    243K Mar 20 22:14 jquery.js23:40
pleia2so that changed23:40
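One hedged way to check whether the running Gerrit is actually serving the freshly deployed file, rather than a stale cached copy, is to compare checksums of the file on disk and the file as served; the path and URL below are guesses for illustration, not the real locations on review.o.o:

```python
import hashlib
import urllib.request

# Both of these are assumptions; adjust to the real on-disk path and the
# URL Gerrit serves the static file from.
ON_DISK = '/home/gerrit2/review_site/static/jquery.js'
SERVED = 'https://review-dev.openstack.org/static/jquery.js'

def md5_of(data):
    return hashlib.md5(data).hexdigest()

with open(ON_DISK, 'rb') as f:
    disk_hash = md5_of(f.read())

with urllib.request.urlopen(SERVED) as resp:
    served_hash = md5_of(resp.read())

print('disk  :', disk_hash)
print('served:', served_hash)
if disk_hash == served_hash:
    print('match: Gerrit is serving the updated file')
else:
    print('mismatch: Gerrit is still serving an old copy; a restart may be needed')
```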
jeblairpleia2: want to restart gerrit and see if it fixes it?23:41
pleia2it's also broken on review-dev and our new server23:41
*** harlowja_ has quit IRC23:41
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/project-config: Add config-drive element  https://review.openstack.org/15413223:41
jeblairpleia2: do we have gerrit running on our new server?23:41
*** markvoelker has quit IRC23:42
pleia2oh, no, it just redirects23:42
*** harlowja has joined #openstack-infra23:42
pleia2jeblair: we can try a restart, I'll need some help there though23:43
pleia2maybe on review-dev first? but I don't know if review-dev has any special weirdness23:44
*** che-arne has quit IRC23:45
*** emagana has joined #openstack-infra23:45
pleia2note to self: don't find these things at 16:30 on friday23:46
jeblair:)23:46
jeblairyeah, why don't you try review-dev first23:46
jeblairpleia2: nothing tricky about gerrit restarts; /etc/init.d/gerrit restart   should do it23:46
*** ajmiller has quit IRC23:46
jeblairi'm here as backup23:46
pleia2do we use init.d or service gerrit restart?23:47
jeblairpleia2: i suppose service is more correct; i think init.d works tho23:47
pleia2I'll do init.d today, here goes on review-dev23:47
SpamapSjeblair: do you only wear your grey beard when you're on IRC, or sometimes at the market too? ;-)23:47
* SpamapS secretly wishes upstart and systemd had never been invented and /etc/init.d/ was still "the way"23:48
jeblairSpamapS: i stop people at the market and tell them what i think about systemd23:48
pleia2still broken :(23:48
jeblairSpamapS: which basically means i blend in perfectly in berkeley23:48
SpamapS"Hello, do you have a few minutes to discuss pid 1?"23:48
pleia2no wait, I think it's ok!23:49
pleia2https://review-dev.openstack.org/#/c/5270/23:49
SpamapS"Have you considered what will happen to your zombie processes when you die?"23:49
jeblairpleia2: lgtm!23:49
jeblairpleia2: so the only advice i'd give about a prod gerrit restart is don't do it right before zuul is about to merge a change23:50
pleia2so, restarting real gerrit, anything special to do re: telling people or anything?23:50
pleia2ah23:50
jeblairpleia2: current top of the queue is 11 mins out so you should be fine23:50
pleia2ok, I'm going to do it now then23:50
jeblairsounds good23:50
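As a rough illustration of that pre-restart check, something like the following could peek at the head of Zuul's gate queue via its status.json; the URL and field names are assumptions based on how Zuul has historically published status, not something confirmed in the log:

```python
import json
import urllib.request

# Assumed status endpoint; the real deployment may publish it elsewhere.
STATUS_URL = 'http://zuul.openstack.org/status.json'

with urllib.request.urlopen(STATUS_URL) as resp:
    status = json.loads(resp.read().decode('utf-8'))

for pipeline in status.get('pipelines', []):
    if pipeline.get('name') != 'gate':
        continue
    for queue in pipeline.get('change_queues', []):
        for head in queue.get('heads', []):
            for change in head:
                # remaining_time is reported in milliseconds when Zuul can estimate it.
                remaining = change.get('remaining_time')
                if remaining:
                    print(change.get('id'), 'remaining ~%.0f min' % (remaining / 60000.0))
                else:
                    print(change.get('id'), 'remaining time unknown')
```

If the head of the gate is several minutes from merging, as it was here, the restart window is comfortable.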
*** pc_m has quit IRC23:51
SpamapShah, distributed systems are hard.23:51
jeblairif we were to do this in the middle of the day, i might consider a statusbot notice, but no one is around except SpamapS so.  :)23:51
pleia2hehe23:52
SpamapSjeblair: that guy wouldn't know what to do with statusbot notices anyway23:52
pleia2alright, back up, let's see23:52
pleia2all better!23:52
pleia2thanks jeblair23:52
jeblairyay!23:52
jeblairas a bonus, gerrit will be nice and speedy until we shut it down again tomorrow.  :)23:52
pleia2so javascript changes require a gerrit restart23:52
pleia2makes sense (what)23:52
jeblairpleia2: right? :)23:52
jeblairi thought that touching the site include file was supposed to avoid that, but i'm not certain we fully understand what's going on23:53
* pleia2 nods23:53
jeblairand restarting is faster than figuring it out.23:53
SpamapSpleia2: re: "sense"  http://www.quickmeme.com/img/a5/a5fd9f50473ea78ab4a5668771803996dfaebe931facffc060a9c530337dc7e7.jpg23:53
pleia2SpamapS: ++23:54
jeblairwhat a nice way to end the day23:54
SpamapSsome day.. I'll figure out why downloading an image from cloud-images.ubuntu.com on my home connection tops out at 1Mbit23:54
SpamapSbut if I download it to an hpcloud instance, and then to my home box, 40Mbit all the way :-P23:54
SpamapSwhich at least is effectively 20x faster but 100x more annoying.23:55
*** dannywilson has quit IRC23:56
*** gyee has quit IRC23:59
