Thursday, 2014-01-23

jeblairlifeless: it's non trivial, but it's not our biggest issue; the time to rebuild the queue is not what's slowing us down.00:00
*** sarob has quit IRC00:00
sdaguewith 5 nova changes, 5 glance changes, 10 heat changes ... 5 nova changes00:00
*** sarob has joined #openstack-infra00:00
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Added storyboard API to webclient venv  https://review.openstack.org/6852300:00
*** flaper87 is now known as flaper87|afk00:01
sdaguewhat would you propose happens when the last nova change passes, and only the first 2 nova changes ran00:01
sdagueand nothing in between did00:01
lifelesssdague: oh I see; I had a misconception in my head about what we ran as we got a deeper queue.00:01
lifelesssdague: that would need to be addressed first00:01
lifelessthanks00:02
sdaguesure00:02
sdaguejeblair: you got enough review fu in you to get this out there - https://review.openstack.org/#/c/67591/ - our uncategorized review list00:03
sdagueuncategorized job failed list00:03
sdaguefor elastic recheck00:03
*** sarob has quit IRC00:05
jeblairsdague: what did you end up doing to build that list?00:05
*** dpyzhov has quit IRC00:06
jeblairsdague: is it going to make spikey bits on this graph? http://cacti.openstack.org/cacti/graph_image.php?action=view&local_graph_id=26&rra_id=100:06
clarkbjeblair: why do your zuul tests do the maintain cache at the end of them?00:06
*** UtahDave has quit IRC00:07
*** senk has joined #openstack-infra00:07
jeblairclarkb: oh, you know they _might_ not need to, hang on00:07
sdaguejeblair: that's all ES queries00:07
sdaguedoesn't touch gerrit00:07
sdaguejeblair: https://github.com/openstack-infra/elastic-recheck/blob/master/elastic_recheck/cmd/uncategorized_fails.py00:08
openstackgerritJames E. Blair proposed a change to openstack-infra/zuul: Add require-approval to Gerrit trigger  https://review.openstack.org/6851600:09
jeblairclarkb: only one of them does; thx; explanation in comment now00:09
fungilifeless: were there any other nodepool config patches we needed in for tripleo besides 68515? are you able to successfully build images from it yet?00:09
jeblairsdague: oh i get it00:09
fungilifeless: and will it need additional prep script patches?00:10
*** dpyzhov has joined #openstack-infra00:10
*** hogepodge has quit IRC00:10
*** fbo is now known as fbo_away00:11
sdaguejeblair: this is basically our todo list of things that failed that we don't know why00:11
* fungi has to disappear again for 4+ hours in about 20 minutes00:11
jeblairsdague: lgtm; fungi do you want to re-review https://review.openstack.org/#/c/67591/00:11
sdagueand with the new bits it will retrigger every time we push a fingerprint change00:11
fungijeblair: yes00:12
sdagueso we can point more people at it. :) jog0's been running it on his laptop to get through the list00:12
openstackgerritKhai Do proposed a change to openstack-infra/jenkins-job-builder: Add tests for YamlParser and patch 2.6 minidom  https://review.openstack.org/6357900:13
openstackgerritKhai Do proposed a change to openstack-infra/jenkins-job-builder: make scm test as the example  https://review.openstack.org/6518600:13
lifelessfungi: I've tested and it boots successfully00:13
lifelessfungi: so we won't need to restart nodepool to tweak it.00:13
fungilifeless: awesome. approving then00:13
lifelessfungi: we may want to change the deploy scripts but that isn't a nodepool restart00:13
fungilifeless: well, config changes aren't a nodepool restart either thankfully, but just trying to make sure we've got what's initially needed in there00:14
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Added storyboard API to webclient venv  https://review.openstack.org/6852300:15
jog0sdague: you run it with my latest patch?00:15
sdaguejog0: no, I was just looking at it00:16
sdagueI was going to wait for the infrastructure to merge to push it00:16
sdagueso that we can see that we can do updates right00:16
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Added storyboard API to webclient venv  https://review.openstack.org/6852300:16
sdaguefungi just +Aed it00:16
jog0sdague: cool, I more wanted you to see the results because they look pretty good00:17
jog0the numbers that is00:17
fungiso i think what we're going to want to do, since nodepool.o.o still has puppet disabled on it and we have a combo of code and config changes going in together, is to kill nodepoold, generate the filtered list of building nodes, manually apply puppet, turn puppet agent back on, start nodepoold again and then en-masse delete the old list of building nodes00:17
sdaguejog0: yeh, you have been a machine, it's awesome00:17
fungihowever i also don't want to do that and then run away for hours leaving everyone else to clean up whatever mess i make00:18
zaromgagne: ok, my jjb changes were rebased.00:18
lifelessfungi: ack00:19
lifelessfungi: I do expect we'll want to change those scripts, but I want to frontload getting something up and going :)00:19
*** pcrews has quit IRC00:19
fungiand we're also waiting on those config changes to get nodes assigned before they gate anyway00:19
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard: Load projects from yaml file  https://review.openstack.org/6628000:19
fungiso they likely won't land until i'm out. however i hope to have more available time tomorrow and be spending less time in meetings00:20
fungior i might feel up to it when i get back to the room tonight, but no idea00:20
*** CaptTofu has joined #openstack-infra00:21
fungii also have hopes nodepool might be way less strained by the time i return00:22
sdaguefungi: those are high hopes00:22
fungiyeah, i expect the zuul improvements will take a bit longer to settle out00:22
sdaguewe seem to be hovering at about 60 / day merge rate right now00:23
sdagueso it will be a while00:23
jeblairrussellb: ping00:23
fungisdague: i think what we're going to see once we churn through the initial check pipeline entries is that check will stay low and the gate reset rate will keep the gate from chewing up the whole pool00:24
*** rnirmal has quit IRC00:24
clarkbfungi: ya that is my hope00:24
fungisince it'll get throttled down to possibly a manageable chunk at a stretch00:24
*** oubiwann_ has joined #openstack-infra00:24
sdaguefungi: that's true, it will be interesting to see what the morning looks like00:24
openstackgerritAntoine Musso proposed a change to openstack-infra/zuul: webapp: set cache-control headers to prevent caching  https://review.openstack.org/6658300:25
* fungi is afk until at least 05:00 utc00:26
clarkbits a party in vegas^H^H^H^H^Hutah00:26
*** wenlock has quit IRC00:27
*** matsuhashi has joined #openstack-infra00:34
*** senk has quit IRC00:34
clarkbjeblair: fyi not approving https://review.openstack.org/#/c/52986/ in hopes you will have a chance to rereview it00:38
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Added storyboard API to webclient venv  https://review.openstack.org/6852300:38
clarkbI am finally stabbing at my review queue00:38
*** mrodden has quit IRC00:39
*** AaronGr is now known as aarongr_afk00:41
jeblairclarkb: i'm trying to respond to your comment on the zuul change, but i don't think my typing it into gerrit is going to work...00:43
jeblairclarkb: i believe the logic i wrote _does_ try to match all criteria00:43
jeblairclarkb: if you start with the flag set to false, then iterate over all the characteristics, you can certainly set something to true if it matches, but how do you make assertions about all the other characteristics?00:44
jeblairclarkb: it's much easier to assume that it matches and then say that it does not on the first instance where something differs00:45
clarkbjeblair: I think you need a flag for each then if not x and y and z return False00:45
jeblairclarkb: ah, yeah, but they are all optional00:45
clarkboh, hmm00:45
clarkbI think that was the piece missing in my head00:45
clarkbjeblair: I am fine with it as is, I will respond to my comment00:47
jeblairclarkb: so then it's "if (matched_a or not a_required) and (matched_b or not b_required) ...." which is, well... not easy to follow00:47
jeblairclarkb: ok00:47
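
The matching approach jeblair describes above (assume a match, then bail out on the first criterion that differs, with every criterion optional) can be sketched roughly as follows. This is an illustrative sketch only, not the code under review; the function name `matches` and the dictionary-based event and criteria are assumptions made for the example.

```python
def matches(event, criteria):
    """Return True unless a specified criterion disagrees with the event.

    Every criterion is optional: a key that is absent from `criteria`
    (or set to None) places no constraint on the event. Hypothetical
    helper for illustration only.
    """
    for key, wanted in criteria.items():
        if wanted is None:
            # Criterion not specified, so it cannot cause a mismatch.
            continue
        if event.get(key) != wanted:
            # First difference found: no need to look any further.
            return False
    # Nothing disagreed, so the starting assumption (a match) stands.
    return True


event = {"type": "comment-added", "branch": "master"}
only_type = matches(event, {"type": "comment-added"})
wrong_branch = matches(event, {"type": "comment-added", "branch": "stable"})
no_criteria = matches(event, {})
```

Starting from "matched" avoids the per-criterion flags clarkb suggests, which would otherwise need an "or not required" clause for each optional field.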
*** zhiwei has joined #openstack-infra00:47
*** senk has joined #openstack-infra00:48
lifelessfungi: ok, I'm about to drop Lynne and C at the airport; I will be available on phone for emergencies (taking laptop w/me) and then back here in ~2h00:48
*** melwitt has quit IRC00:49
*** hogepodge has joined #openstack-infra00:50
*** mrodden has joined #openstack-infra00:54
*** hogepodge has quit IRC00:57
jheskethjeblair: in regards to the gate enqueuing (as per your response to my review comment), I'm confused as to how it wouldn't be enqueued into gate, since the first Jenkins post from check will match all the requirements00:57
*** mrodden has quit IRC00:57
jeblairjhesketh: since gate is a dependent queue, there is a check in the dependentpipelinemanager that ensures that only changes that have met all of gerrit's requirements for merging (aside from what the queue itself will supply) are enqueued00:58
*** kraman has joined #openstack-infra00:59
jheskethah okay00:59
jeblairjhesketh: it relates to the canMerge method of the trigger00:59
jeblair(that's what to search for to find that code)01:00
kramanjeblair: ping01:00
jheskethjeblair: right, but why not have the logic in the layout match the desired behaviour?01:00
openstackgerritJames E. Blair proposed a change to openstack-infra/zuul: Add require-approval to Gerrit trigger  https://review.openstack.org/6851601:00
kramanjeblair: looking to bounce some ideas off you about solum. when you get a chance can you please take a look at https://github.com/kraman/zuul/compare/solum_hacks01:01
*** CaptTofu has quit IRC01:01
kramanjeblair: trying to add a message queue based trigger01:01
jeblairjhesketh: i think that might be a good future step -- to be explicit about that in the layout and remove the implicit canMerge check; but we'd want to consider that gerrit provides some extra complexity there...01:02
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Simple round trip API integration with /v1/teams  https://review.openstack.org/6852801:02
jeblairjhesketh: (things like being able to specify complex prolog rules, that say things like a +2 is required in addition to no -2 votes)01:02
jheskethjeblair: sure, I don't mean get rid of canMerge (at least not yet), but as your patch stands it isn't difficult to have a layout that checks for approvals before putting them into gate01:03
*** julim has quit IRC01:04
clarkbmordred: still around? re changes that add projects like https://review.openstack.org/#/c/61954/4 should we go ahead and approve those and manually trigger manage-projects?01:04
clarkbmordred: unsure of where you are in debugging that pain01:05
*** zhiwei has left #openstack-infra01:05
jeblairjhesketh: well, that's my proposed production config too; as it stands, we're relying on the gerrit mergeability check for that aspect of behavior now and i don't want to duplicate it (and thereby confuse the issue).01:05
*** CaptTofu has joined #openstack-infra01:06
*** dkranz has quit IRC01:06
jeblairkraman: i'm about dead for the day (it's been a long one and i'm still a bit ill); that's really exciting though and i'll try to look at that tomorrow when i have fresh brains; that work for you?01:07
*** julim has joined #openstack-infra01:07
kramanjeblair: works great. have a good eve01:07
clarkbone neat thing about that message queue trigger is it would make putting something between gerrit and the world for event streams simpler01:08
jog0clarkb: looks like logstash is getting no data right now01:08
jog0http://logstash.openstack.org/#eyJzZWFyY2giOiJmaWxlbmFtZTpcImNvbnNvbGUuaHRtbFwiIEFORCBtZXNzYWdlOlwiRmluaXNoZWQ6XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjE3MjgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTA0MzkzMTAzMjYsIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0=01:09
*** praneshp has quit IRC01:09
mordredclarkb: I'll manually add and run that for debugging01:10
jheskethjeblair: sure, personally though I find it more confusing looking at the layout.yaml and having it look like it may get re-enqueued even though there is code elsewhere that stops it01:10
jheskethI'm happy though if you're happy01:10
jog0nothing in last 6 hours unless the schema changed or something01:10
*** praneshp has joined #openstack-infra01:10
*** thuc has joined #openstack-infra01:11
clarkbmordred: ok, there is also a php addition to stackforge that could be used for debugging01:11
clarkbmordred: I left comments on your unlaunchpadify projects.yaml as well01:11
jog0sdague: we are flying blind again ^01:11
clarkbjog0: o_O01:11
mordredclarkb: yeah? ok. I'll look at that - I'd like to land that at some point - but I'm pretty sure it's going to need a merge01:12
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries  https://review.openstack.org/6759601:12
sdagueclarkb: hmmm... yeh, that's not good.01:13
sdagueclarkb: any chance our indexers got lost?01:13
sdagueor stuck01:13
clarkbsdague: I think the gearman client went away01:13
*** thuc_ has quit IRC01:13
sdagueoh, that's not so good01:14
*** vipul is now known as vipul-away01:14
clarkbI just restarted it, the log file doesn't show anything wrong01:14
*** thuc has quit IRC01:15
sdaguejog0: on your change, why isn't build_uuid sent to classify?01:15
*** prad_ has joined #openstack-infra01:16
*** ianw has quit IRC01:16
clarkbI know what happened01:16
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries  https://review.openstack.org/6759601:16
*** prad_ has quit IRC01:16
clarkbthe jenkins log client yaml file was updated to include the new jenkinses before they had resolvable DNS records01:16
*** ianw has joined #openstack-infra01:16
clarkbaccording to syslog the service was restarted by puppet right around when it died and DNS is not zmq happy01:17
jog0sdague: where? I thought I did send it01:17
clarkbjog0: sdague sorry for the turbulence but it should be happy now01:17
*** prad has quit IRC01:17
jog0clarkb: ahhh01:17
jog0thanks01:17
sdaguejog0: https://review.openstack.org/#/c/67596/3/elastic_recheck/bot.py01:18
sdagueI'm confused why it's not just another param to classify01:18
sdagueinstead of doing the union01:18
sdagueor maybe more importantly, which is the first call to classify still there01:19
sdagues/which/why/01:19
jog0sdague:  because thats a rebase mistake01:19
sdagueok :)01:19
sdagueI'm helping! :)01:19
jog0sdague: any other bugs before i repush?01:20
jog0note: still waiting for data to do a live test on new patch01:20
clarkbyou should have some data now, there is data in the new days index01:21
clarkbit may not be a lot of data though because jobs take a while01:21
sdaguejog0: not that I saw01:21
jog0clarkb: cool01:22
sdaguebut it's dinner time here, so I'm done01:22
sdagueand I somehow decided it was a good idea to try to get a PR into eventlet - https://github.com/eventlet/eventlet/pull/7501:22
jog0btw I take it someone knows we have for jobs in post waiting for coverage jobs to run01:22
sdaguenight all01:22
jog0sdague: lol  eventlet that will be fun01:22
jog0o/01:22
*** svarnau has quit IRC01:23
*** pcrews has joined #openstack-infra01:23
openstackgerritA change was merged to openstack-infra/devstack-gate: More network debugging detail  https://review.openstack.org/6791101:26
*** thuc has joined #openstack-infra01:29
jog057 patches landed today according to https://github.com/openstack/openstack/graphs/commit-activity not too bad01:29
clarkbjog0: we just merged a few01:29
*** vipul-away is now known as vipul01:29
*** thuc_ has joined #openstack-infra01:29
clarkbat the risk of jinxing it I think things are moving. window size has fallen to 8 though01:29
StevenKclarkb: Is this new fangled window size visible anywhere?01:30
clarkbStevenK: only in the logs, I haven't made it public yet. Doing so is on the todo list01:31
StevenKI do wonder how that horizon 000000 post job got in01:32
clarkbStevenK: that was ttx01:32
clarkbhe deleted the milestone proposed branch I think01:32
StevenKclarkb: Ah, so caused by GIGO?01:33
clarkbyeah01:33
*** thuc_ has quit IRC01:33
*** thuc has quit IRC01:33
*** nosnos has joined #openstack-infra01:34
clarkbbasically deleting things in that way creates a post job for commit 00000001:35
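
A rough sketch of how a consumer of the event stream might recognise the branch deletion clarkb describes. It assumes the event is a dictionary shaped like a Gerrit `ref-updated` stream event, with the new revision under `refUpdate.newRev`, and that a deleted ref is reported with an all-zero revision; `is_ref_deletion` is a hypothetical helper, not part of Zuul.

```python
# Assumption: a deleted ref is reported with a 40-character all-zero revision.
ZERO_REV = "0" * 40


def is_ref_deletion(event):
    """Return True if the event looks like a ref being deleted."""
    if event.get("type") != "ref-updated":
        return False
    return event.get("refUpdate", {}).get("newRev") == ZERO_REV


deleted = is_ref_deletion({
    "type": "ref-updated",
    "refUpdate": {
        "project": "openstack/horizon",
        "refName": "milestone-proposed",
        "oldRev": "a" * 40,   # placeholder revision, not a real commit
        "newRev": ZERO_REV,
    },
})
```

A trigger that skipped such events would not enqueue a post job for the all-zero commit.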
*** senk has quit IRC01:35
clarkbjeblair you don't happen to still be around do you?01:36
clarkbjeblair: I think the rate limiting doesn't handle the case where jobs beyond the window should be cancelled because the window shrunk and some change failed01:37
clarkbI don't think this affects correctness as that failing change will be booted eventually and changes behind it restarted then01:38
clarkbbut it does affect our resource use01:38
openstackgerritA change was merged to openstack-infra/config: Remove incorrect name filters from nodepool config  https://review.openstack.org/6768401:40
clarkbyeah looks like cancelJobs is in _processOneItem which is within my sliced actionable item list hmm01:41
mordredclarkb: I'm ready to start jenkins05 I think01:41
mordredclarkb: am I correct about that?01:41
clarkbmordred: /me looks01:42
mordredclarkb: (this is my first jenkins server in the new world order, so I want to make sure I'm not going to kill things)01:43
StevenKI thought it would be 0601:43
clarkbmordred: looks like the slave list is dirty01:44
*** jerryz has quit IRC01:45
*** senk has joined #openstack-infra01:46
*** hasharMeeting is now known as hashar01:47
*** yaguang has joined #openstack-infra01:47
clarkbjeblair: yup once the window shifted to cover the failure nnfi did its thing. So correctness is preserved, it just isn't super resource efficient01:47
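
The behaviour clarkb observed can be illustrated with a toy model. This is not Zuul's code: `process_queue`, the dictionary fields and the cancel-only handling are all assumptions made for the sketch. The point is only that when cancellation happens inside per-item processing, and per-item processing only sees the first `window` items, jobs behind a failure outside the window keep running until the window moves.

```python
def process_queue(queue, window):
    """Process the first `window` items; return names of cancelled items."""
    cancelled = []
    failed_ahead = False
    for item in queue[:window]:  # the sliced "actionable" list
        if failed_ahead and item["running"]:
            # A change ahead of this one failed, so its jobs are stale.
            item["running"] = False
            cancelled.append(item["name"])
        if item["failed"]:
            failed_ahead = True
    return cancelled


queue = [
    {"name": "A", "failed": False, "running": True},
    {"name": "B", "failed": False, "running": True},
    {"name": "C", "failed": True, "running": False},
    {"name": "D", "failed": False, "running": True},
]

# The window shrank to 2 after D's jobs had already started:
# C's failure sits outside the window, so D is left running.
first_pass = process_queue(queue, window=2)

# Once A and B leave the queue, the window covers C and D.
second_pass = process_queue(queue[2:], window=2)
```

In the first pass nothing is cancelled even though D is behind a failed change; in the second pass D is cancelled. The end state is the same either way, but D's jobs occupied nodes in between.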
*** senk has quit IRC01:51
*** senk has joined #openstack-infra01:51
openstackgerritA change was merged to openstack-infra/config: add in elastic-recheck-unclassified report  https://review.openstack.org/6759101:53
*** alexpilotti has quit IRC01:54
*** mrodden has joined #openstack-infra01:54
*** SumitNaiksatam has quit IRC01:55
mikaltempest is running multiple test threads, yes?01:57
clarkbmikal: s/threads/processes/ and currently in the gate there are 201:58
clarkbit was 4 before01:58
mikalclarkb: do you know if anyone has tried tempest with libvirt/lxc?01:58
mikalclarkb: it crashes my cloud instances in interesting ways...01:58
mikalclarkb: i.e. requiring hard reboot to get back01:58
clarkbmikal: I do not know01:58
mikalFairy nugg01:59
clarkbmikal: ewindisch is working on it with docker but that isn't libvirt01:59
openstackgerritA change was merged to openstack-infra/elastic-recheck: objectify the gerrit event for our purposes  https://review.openstack.org/6794101:59
ewindischmikal: actually, getting my jobs to run libvirt/lxc would literally take a single line change...02:00
ewindischhmm02:00
ewindischthat might make zul very happy with me ;-)02:00
zulyes it would :)02:02
mikalzul: have you tried to run tempest? Does it eat your machines?02:03
ewindischzul: I'm running this as opposed to using devstack-gate (*dodges tomatoes*): https://github.com/ewindisch/dockenstack02:03
zulmikal: i havent02:03
openstackgerritA change was merged to openstack-infra/jenkins-job-builder: Add local-branch option  https://review.openstack.org/6536902:04
ewindischI'm uploading it to the docker index so in a few minutes, one would be able to simply run:02:04
ewindischdocker run -privileged -lxc-conf=aa_profile=unconfined -t -i ewindisch/dockenstack-tempest02:04
ewindischand it will bring up the latest master branches and runs tempest against them02:04
zulcool why do you disable the apparmor profiles?02:05
ewindisch(other repos/branches can be specified via the environment args)02:05
mikalclarkb: I can't see where in tempest's run_tests.sh the number of threads is set? Is it hiding from me?02:05
mikals/threads/processes/02:05
jog0mikal: I2338ebf5df8bced935e9ed9b0ebd2d4e859b5dbe02:06
ewindischzul: I actually forked the code from someone else that did that. I haven't reevaluated yet02:06
jog0is the patch that changed the number of threads in gate02:06
mikaljog0: ta02:06
zulewindisch:  ah02:07
openstackgerritMatthew Treinish proposed a change to openstack-infra/elastic-recheck: Add multi-project irc support to the bot  https://review.openstack.org/6754002:07
*** gokrokve has quit IRC02:07
*** gokrokve has joined #openstack-infra02:07
clarkbmikal its in devstack gate02:08
ewindischzul: I'm running dockenstack with lxc now - we'll see if it works, I'm sure it must02:08
*** senk has quit IRC02:09
ewindischzul: are you doing anything in regard to keeping the lxc driver in per the deprecation plan?02:09
zulewindisch:  im working on something right now so i can get things tested more easily02:11
*** gokrokve has quit IRC02:12
jog0clarkb: any changes to gerrit or gerritlib of late? http://paste.openstack.org/show/61714/02:13
jog0I am wondering what is causing ^, it may be me02:14
*** david-lyle_ has joined #openstack-infra02:15
clarkbno recent changes02:16
clarkbtoo swamped02:16
*** smurugesan1 has joined #openstack-infra02:16
*** oubiwann_ has quit IRC02:16
*** smurugesan has quit IRC02:16
*** oubiwann_ has joined #openstack-infra02:16
jog0you're too swamped or gerrit is?02:17
jog0or both02:17
jog0and thanks02:17
clarkbwe are02:17
jog0clarkb: ack, thanks that answers my question02:18
lifelessfungi: back02:20
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard: Update ProjectGroups API to consume ID's rather than names.  https://review.openstack.org/6854002:21
*** coolsvap has quit IRC02:21
ewindischmikal: lxc tempest is running fine for me here, other than failing various tests02:22
*** vkozhukalov has joined #openstack-infra02:23
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Simple round trip API integration with storyboard-api  https://review.openstack.org/6852802:24
ewindischmikal: 210 tests in 139 seconds, 34 failures.  (smoketests only)02:24
*** coolsvap has joined #openstack-infra02:24
mikalewindisch: huh, interesting02:25
mikalI'm trying it on a local machine now02:25
mikalWell, installing over very slow DSL at least02:25
ewindischmikal: running on a rackspace vm, btw02:25
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard: Update ProjectGroups API to consume ID's rather than names.  https://review.openstack.org/6854002:26
*** krotscheck has quit IRC02:26
*** dcramer__ has joined #openstack-infra02:27
*** hashar has quit IRC02:28
*** rakhmerov has quit IRC02:30
zulewindisch:  which version of libvirt?02:32
ewindischzul: I should note that most of those errors are around cinder.02:33
openstackgerritA change was merged to openstack-infra/config: Add a fedora image definition for tripleo-cloud  https://review.openstack.org/6851502:33
zulewindisch:  as in attaching volumes?02:33
ewindischzul: as in creating volumes...02:33
zulhuh02:33
ewindischzul: I'm having the problem with all virt drivers, so it's probably my image, devstack, or cinder itself02:34
zulok cool..02:34
ewindisch(probably my image, I'd guess)02:34
* zul disappears ;)02:34
*** julim has quit IRC02:35
lifelessfungi: evening :)02:36
ewindischalright, time to find something to do in SF that isn't work.02:36
*** coolsvap has quit IRC02:37
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries  https://review.openstack.org/6759602:40
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Clarify required parameters in query_builder  https://review.openstack.org/6775602:42
hub_capmordred: got a sec for a dumb Q?02:43
hub_capu had reported said bug (https://bugs.launchpad.net/trove/+bug/1179009) a while ago and someone is submitting a fix and i cant seem to believe its the only thing thats wrong w/ our code (considering we use proboscis heh)02:44
clarkbhub_cap: there are reasons to use testtools even with proboscis02:45
clarkbcleanups for example02:45
clarkbalso kill proboscis with fire02:45
*** rockyg has quit IRC02:46
lifelesshub_cap: whats proboscis02:46
*** mrodden has quit IRC02:49
*** gokrokve has joined #openstack-infra02:51
hub_caphehehe02:54
hub_capclarkb: thats the plan02:54
hub_caplifeless: im sure tim simpson has tried to chat w/ u about it ;) its the super special trove testing framework02:55
hub_capclarkb: the reason i ask is cuz a guy submitted a _single_ file change, and it just doesnt seem like thats all itll take02:56
hub_capmind u, lifeless clarkb mordred i know next to nothing (in general) and wrt python testing frameworks, so it may be all we need.. but seems fishy02:56
hub_caphttps://review.openstack.org/#/c/61169/4/trove/tests/api/instances.py02:56
*** pcrews has quit IRC02:57
*** mriedem has quit IRC02:59
clarkbhub_cap: ya you would need to replace it everywhere unittest is used03:00
hub_capbut thats really _it_, like no changing setup/teardown method names etc...03:01
hub_capcuz ive got to convince a -2'er now to undo his -2 ;)03:01
*** SumitNaiksatam has joined #openstack-infra03:07
*** UtahDave has joined #openstack-infra03:09
*** emagana has quit IRC03:10
*** nati_ueno has quit IRC03:10
*** jhesketh has quit IRC03:10
*** jhesketh has joined #openstack-infra03:11
lifelesshub_cap: why should it be more?03:12
*** sdake-ooo is now known as sdake03:13
hub_caplifeless: im just making sure theres not more to it so i can be well informed when i go to talk to him ;)03:13
hub_capit seems as if we inherit from object for all our tests anyway sans one03:14
hub_capwhether thats right or not :)03:14
*** gokrokve has quit IRC03:17
*** ok_delta has joined #openstack-infra03:22
*** mayu_ has joined #openstack-infra03:24
mayu_ping anteaya03:31
notmynameclarkb: just saw in scrollback something you said about "swift doesn't do this and they never will". something you were talking about with bknudson. what was the context? something we need to look at?03:32
mayu_my patchset fail. http://logs.openstack.org/48/68148/1/check/gate-neutron-python27/b7edd9c/console.html03:33
mayu_anybody help me to check, I'm new03:34
clarkbnotmyname: oslo.logging03:34
notmynameclarkb: ok. is there a feature set that we should be targeting (or need to explain how we are targeting)? or is it just a question of "things are different"03:35
mayu_I have not found any clue for the failure03:35
*** cody-somerville has quit IRC03:35
clarkbnotmyname: common log format defaults03:35
*** smurugesan1 has quit IRC03:36
clarkbmayu_: there is a traceback03:36
mayu_yes03:36
notmynameclarkb: ok, thanks. but is it something specific in the log, or just looking for the same format?03:37
clarkblooking for the same format03:37
mayu_but it's nothing about my code03:37
mayu_It seems a bug03:38
mayu_Bug 127018203:39
lifelessfungi: around ?03:39
clarkbmayu_: yes that is possible03:40
mayu_thanks03:41
mayu_there are too many failures, here is the jenkins result, http://paste.openstack.org/show/61715/03:44
mayu_my patchset https://review.openstack.org/#/c/68148/03:45
mayu_jenkins fail, find nothing about my patchset after analysing the failure log03:47
mayu_I don't what to do03:48
mayu_I don't know what to do03:48
*** gokrokve has joined #openstack-infra03:49
mayu_clarkb: help03:49
*** cody-somerville has joined #openstack-infra03:51
mordrednotmyname: honestly - I think even "ability to configure to have a shared log format" would be a good step in the right direction03:52
mordrednotmyname: and/or helpful03:52
mordredhub_cap: the thing it will do that unittest base class won't is warn you if you don't upcall on setUp/tearDown03:55
notmynamemordred: ya, that's been mentioned. I'm not entirely opposed to that idea, but the hard part is that a different log format is generally only useful to new clusters. while I believe there are more swift clusters that have yet to be installed than have been installed so far, the existing format (which isn't really broken) does give some inertia to keeping the current way03:55
clarkbmayu_: you can recheck it with the bug number you identified as being the cause for the failure03:55
*** ArxCruz has quit IRC03:55
clarkbmordred: in the comment from jenkins is a link to a wiki article that talks about all this03:55
mordrednotmyname: yeah - and I totally hear that - I think that's why I'm less ardent on the "change the default" thing03:56
notmynamemordred: so it's mostly a question of prioritization, and adding yet another config for something that isn't broken isn't really high ;-)03:56
mordrednotmyname: well, "not broken" depends on how you consider your position inside of an openstack install03:56
notmynamemordred: to be specific, I mean "isn't broken" == "isn't a pain point for people installing swift"03:57
mordredif you consider that a goal, then you are currently broken, since you're the only member of that install that has no ability to log in a manner similar to the others03:57
mordredright. it's not for people only installing swift03:57
notmynamemordred: or actually, "isn't a pain point for people deploying and contributing to swift" ;-)03:57
mordrednotmyname: I think it's _clearly_ broken without even needing to be demonstrated from an openstack pov03:58
notmynamemordred: ya. supporting a common format seems like a good idea. is there a doc that describes the common format somewhere?03:58
mordredhow much you care about that is  --- you know :)03:58
notmynamemordred: like I said. people who contribute to swift ;-)03:58
mordredpeople who contribute to openstack03:58
mordrednotmyname: I love our regular dance about this ;)03:59
notmynamemordred: I'm just trying to antagonize you03:59
mordrednotmyname: darn. I was trying to do the same to you03:59
notmynamemordred: what is the openstack common logging format?04:00
*** ok_delta has quit IRC04:00
hub_cap"Shits broke %s"04:01
mordrednotmyname: what hub_cap said04:01
mordrednotmyname: looking - one sec04:01
notmynamekk04:01
hub_capsomething like this mordred ? https://github.com/openstack/oslo-incubator/blob/master/openstack/common/log.py#L13604:03
mordredhub_cap: no - like this: http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/log.py#n13004:05
mordred:)04:05
mordrednotmyname: ^^04:05
notmynamelooking04:05
mordredeither one - you can look at the openstack version or the github version :)04:05
*** thuc_ has joined #openstack-infra04:06
notmynameI'll look at the better one (and let y'all decide which one that is) ;-)04:06
*** david_lyle has joined #openstack-infra04:07
notmynamemordred: hub_cap: so line 137-138 is what you'd want to see for the proxy server? or for internal logging? or what?04:07
*** yamahata has joined #openstack-infra04:08
*** david-lyle_ has quit IRC04:09
notmynamewhat does %(instance)s map to?04:09
notmynameand %(message)s is just an arbitrary string?04:09
mordrednotmyname: not 100% sure - I'd assume an identifier that helps find which thing this happened on - and yes to message04:09
mordredlifeless: ^^ can you provide any insight into the above?04:10
notmynameis the first item the duration of the request or when it happened?04:10
lifelessinstance will be the string description of the instance - looks like nova specifics that have leaked into oslo04:11
dstufftSo I'm not sure about the zuul status page, if I pushed a tag to a thing 5-6 hours ago should I have seen a release by now?04:11
notmynameso that log format seems like just a prefix (assuming message is just an arbitrary string). are there any requirements on the message? eg no spaces?04:12
notmynamelifeless: mordred: ^04:12
lifelessnotmyname: http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/log.py#n32604:12
hub_capinstance is not just nova, we use it too in trove ;)04:13
lifelessnotmyname: message has no constraints04:13
hub_capand its rarely used..., it is the uuid of the instance fwiw04:13
hub_cap'[instance: %(uuid)s] ',...04:14
*** CaptTofu has quit IRC04:14
hub_caphttp://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/log.py#n17204:14
notmynameand it's intended that instance and message aren't separated by a space? so that if instance isn't passed in and message = "hello world", you get a different number of log fields than if the instance is passed in?04:14
hub_capyea yea mordred i know i need to start using git.o.o ;) for linking04:14
hub_capiirc u get a []04:15
hub_capif no instance is there04:15
*** rcleere has joined #openstack-infra04:15
hub_capnope im wrong notmyname , u get '' if no instance & no instance_uuid04:15
lifelessnotmyname: its intended, it either is invisible, or nicely presented04:16
notmynameso you get the log line as "12.034 543 DEBUG foo [-] hello world" or  "12.034 543 DEBUG foo [-] uuidhello world"04:17
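The prefix behaviour discussed above can be sketched roughly as follows. This is an illustration only, not the oslo-incubator code: the function name `render`, the field order, and the sample values (taken from notmyname's example line) are assumptions. It shows hub_cap's point that the instance part is an empty string when no instance is known and `[instance: <uuid>] ` otherwise, so the number of whitespace-separated fields differs between the two cases.

```python
def render(message, instance_uuid=None):
    """Build a log line shaped like the examples in the discussion.

    Hypothetical helper for illustration; not the oslo-incubator implementation.
    """
    # The instance prefix is either empty or a bracketed tag with a trailing
    # space, so it is "invisible or nicely presented" without a separator.
    instance = "[instance: %s] " % instance_uuid if instance_uuid else ""
    template = ("%(elapsed)s %(process)d %(levelname)s %(name)s [-] "
                "%(instance)s%(message)s")
    return template % {
        "elapsed": "12.034",   # sample values from the chat example
        "process": 543,
        "levelname": "DEBUG",
        "name": "foo",
        "instance": instance,
        "message": message,
    }


without_instance = render("hello world")
with_instance = render("hello world", instance_uuid="abc123")
print(without_instance)
print(with_instance)
```

A consumer that splits on whitespace (for example a logstash pattern) would therefore need to treat the bracketed instance tag as optional.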
hub_capexamples04:17
hub_caphttps://gist.github.com/hub-cap/857275204:17
hub_capi can hack in the instance_uuid in a test to show if needed04:17
fungiso i lied. i got back earlier than 05:00 utc (sorry lifeless, you must not have seen me say i was disappearing)04:18
lifelessfungi: I was optimistic that you were joking :)04:19
fungiahh04:19
lifelessfungi: so, how drunk are you?04:19
funginope, i was in a yurt with no electricity04:19
notmynameso to be clear, where are you wanting this log format in swift? everywhere that swift logs? internal requests (eg replication and object server logs)? proxy server logs (ie API access)?04:19
funginot drunk in the least (unfortunately)04:19
lifelessfungi: GREAT, lets do this!04:19
* hub_cap runs away to let notmyname / mordred discuss ;)04:19
fungichecking up on what the current state is so i don't jump in blind, so just a sec04:19
notmynamefungi: what I know about yurts is that they are in Kazakhstan and you drink fermented mare's milk04:20
funginotmyname: this one was somehow not in kazakhstan04:20
notmynamefungi: that's much less exciting04:20
fungiand they were fresh out of kefir04:21
ttxnot drunk.04:21
hub_capttx: lame04:21
hub_cap;)04:21
fungittx: i think that wine was non-alcoholic or something (or else the conference circuit has hardened my liver)04:22
ttxhttps://review.openstack.org/#/c/68135/ is still deep in the queue, so I guess I should just sleep now04:22
fungiso, gate scheduling changes seem to be helping. nodepool thrash is gone, gone, gone04:23
fungii think it's safe to go ahead and restart nodepool now04:23
*** harlowja is now known as harlowja_away04:23
*** dcramer__ has quit IRC04:25
mordrednotmyname: I think we notice it because of the logstash processing - so I'd probably say "everywhere that things logs things?"04:26
funginodepoold killed04:26
mordredfungi: hey...04:27
fungimordred: hey04:27
mordredfungi: I've got jenkins05 up - do we want to co-locate the nodepool restart with adding that? or wait because the queue is less suck?04:27
fungimordred: queue seems okay actually04:27
notmynamemordred: ok. thanks.04:27
fungiadding jenkins masters is only a config change, so no nodepoold restart needed for that04:28
lifelessfungi: so I'd read that as no04:28
lifelessfungi: if it doesn't need a restart, don't do one ;)04:28
fungilifeless: mordred: right04:29
fungipuppet config applied and agent started04:31
*** markmcclain has joined #openstack-infra04:32
funginodepool started and didn't insta-die... good sign04:32
mordredw00t04:33
mordredfungi: ++04:34
fungikilling the list of building nodes i recorded now04:34
lifelessfungi: cool04:35
lifelessfungi: I see a template building now04:35
*** morganfainberg is now known as morganfainberg|z04:35
*** vogxn has joined #openstack-infra04:35
lifelessfungi: if you hit quota issues, let me know, its set fairly low because this cloud has one hypervisor only ATM04:36
fungiwill do--it'll be a bit before i can start paying attention to logs and node/image lists04:36
*** vogxn has left #openstack-infra04:37
lifelessack04:38
fungimass deletes of the stale building nodes is underway now04:41
*** praneshp has quit IRC04:41
*** markmcclain has quit IRC04:42
*** markwash_ has joined #openstack-infra04:43
*** markmcclain has joined #openstack-infra04:43
*** markwash has quit IRC04:43
*** markwash_ is now known as markwash04:43
*** praneshp has joined #openstack-infra04:43
*** thuc_ has quit IRC04:43
*** thuc has joined #openstack-infra04:44
*** praneshp has quit IRC04:44
*** ryanpetrello has quit IRC04:46
*** thuc has quit IRC04:48
*** ryanpetrello has joined #openstack-infra04:48
*** smurugesan has joined #openstack-infra04:52
*** yamahata has quit IRC04:57
*** smemon92 has joined #openstack-infra04:58
openstackgerritA change was merged to openstack-infra/config: Add mailing list for OpenStack Ambassadors  https://review.openstack.org/6647805:02
*** nati_ueno has joined #openstack-infra05:03
*** nati_uen_ has joined #openstack-infra05:05
*** nati_ueno has quit IRC05:08
*** rakhmerov has joined #openstack-infra05:09
openstackgerritDarragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Add tests for YamlParser and patch 2.6 minidom  https://review.openstack.org/6357905:10
*** harlowja_away is now known as harlowja05:10
openstackgerritNoboru Arai proposed a change to openstack-dev/hacking: Checking for vim tag  https://review.openstack.org/6855605:13
*** gokrokve has quit IRC05:14
*** talluri has joined #openstack-infra05:18
SpamapSare there other critical bugs in the gate that should definitely be in front of https://review.openstack.org/#/c/68135/ ?05:19
SpamapScritical Heat bug.. terrible thing really.. but it has been "queued" all day.05:20
*** yamahata has joined #openstack-infra05:20
*** senk has joined #openstack-infra05:20
clarkbwho knows05:22
*** senk1 has joined #openstack-infra05:23
clarkbSpamapS: I am not able to do promotions of jobs now, but if that is still floundering tomorrow ping us here and we can promote it to the head of the queue05:23
clarkbalso the new rate limiting stuff seems to be helping quite a bit05:24
StevenKclarkb: Do you plan to have it rebalance and such for failures anywhere in the running queue not just the window?05:24
clarkbStevenK: I don't parse the question05:25
*** senk has quit IRC05:25
clarkbStevenK: what do you mean by rebalance?05:25
StevenKclarkb: If you look at the queue now, it hasn't kicked out 67349,1 properly05:26
clarkbStevenK: oh right. ya that is a bug. if the window shrinks when there are running jobs it basically leaves them hanging. Then when things shift into the window they are dealt with properly05:26
clarkbI noticed this just before heading home today and a quick look at the zuul scheduler code doesn't have me hopeful it will be easy to fix. I do hope to fix it though05:27
*** nicedice has quit IRC05:27
clarkbStevenK: best I can tell the current code is correct just not as efficient as it could be05:27
StevenKclarkb: Do you track the entire running queue, or just whatever the window is?05:27
clarkbStevenK: we track the entire queue except for when we do things like start and cancel jobs :)05:28
clarkbso the data is there, I just need to untangle the loop that reacts to certain events so that the window only affects job starts and not stops05:28
StevenKRight05:28
lifelessand we have slaves05:29
clarkbStevenK: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/scheduler.py#n1115 is where everything happens05:29
clarkbStevenK: and at http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/scheduler.py#n1186 I limit calls to that on the window05:29
lifelesshttps://jenkins01.openstack.org/job/gate-tripleo-deploy/4/05:30
lifelessyay05:30
*** praneshp has joined #openstack-infra05:30
lifelesslike a bought one05:30
clarkbso _processOneItem should probably be split into two things, one that is run on the entire queue and another function that runs only on the window05:30
lifelessfungi: thank you!05:30
StevenKclarkb: That sounds like a good plan05:31
clarkbStevenK: I am pretty sure that can be done without any other changes to the zuul scheduler, I will try digging into it tomorrow05:32
smemon92Hi, I am unable to register new email address in review.openstack.org please help05:33
fungilifeless: thank YOU! i'm just sorry i couldn't spare time to try it out until now05:34
*** ryanpetrello has quit IRC05:34
*** pballand has quit IRC05:34
lifelessfailed ah well.05:34
fungismemon92: does it give you an error message when you try to enter it? or is it having problems with the confirmation e-mail it sends you at the new address (are you receiving that at all)?05:35
lifelesshmm, not much in the way of log output05:35
*** pballand has joined #openstack-infra05:36
smemon92fungi: I am receiving the email, but when I try to confirm it I am getting an error like this: "Server Error,Identity in use by another account"05:38
*** pballand has quit IRC05:39
*** afazekas has quit IRC05:40
*** coolsvap has joined #openstack-infra05:41
*** dguitarbite has joined #openstack-infra05:43
fungismemon92: okay, it sounds like you may have more than one account in gerrit and one of them already has that address associated with it. what's the address you're trying to add, and what does the settings page say your current account id number is?05:43
clarkbStevenK: rereading _processOneItem() this is going to be fun. Maybe I can get jeblair to help untangle it05:44
fungiclarkb: overall your adaptive throttle is having a great effect on node starvation05:44
fungiand more rapidly than i anticipated05:45
*** gokrokve has joined #openstack-infra05:45
clarkbfungi: ya it seems to be doing well. check is all but cleared, trigger and result queues trend towards zero, and stuff is merging05:46
*** SergeyLukjanov_ is now known as SergeyLukjanov05:46
clarkbfungi: the issue StevenK points out is an efficiency thing but not a correctness thing05:46
clarkbfungi: I think now the big thing will be refining the scaling and the way the data is presented05:46
fungibeing able to tune much of that hitlessly through config reloads will be nice05:47
*** kraman1 has joined #openstack-infra05:47
*** mayu_ has quit IRC05:48
*** SergeyLukjanov is now known as SergeyLukjanov_05:49
smemon92fungi :I am trying to add "salman@aptira.com" and my current account id number is "10071" , I already registered this address with my previous account but now i deleted my previous account05:51
fungismemon92: i'll have a look in the gerrit database and get that cleaned up. just a moment and i'll let you know when it's ready to try again05:52
clarkbyou can't delete accounts in gerrit fwiw05:53
smemon92fungi: ya sure , thank u05:53
*** nati_ueno has joined #openstack-infra05:53
*** nati_ueno has quit IRC05:53
*** nati_ueno has joined #openstack-infra05:54
mordredclarkb: you can with a big enough hammer05:56
*** nati_uen_ has quit IRC05:56
mordredclarkb: also, I'm quite impressed that the adaptive throttle is so effective05:57
clarkbthe exponential backoff is pretty heavy handed05:58
clarkbwe just had another window shrink I think it is ~5 now05:58
clarkbwe might need to think about maybe something more linear +1 on passes -2 on failures05:58
*** kraman1 has quit IRC05:58
fungismemon92: i've removed that e-mail address from your old account (id 8923). also deleted smemon92@gmail.com and your old ssh username "salman" from it as well05:58
fungiclarkb: i dunno, i think we should give it some time to see where it settles out with the current heuristic first06:00
mordredfungi: ++06:02
*** markmcclain has quit IRC06:02
fungii have a feeling cautious incrementing coupled with vicious halving will work out well, but i'll reserve judgment until we have some data06:04
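The "cautious incrementing coupled with vicious halving" heuristic described above can be sketched as a simple additive-increase, multiplicative-decrease rule. This is a minimal sketch, not Zuul's actual code: the function names and the default values (floor of 3, +1 on success, halving on failure) are assumptions drawn from how the behaviour is described in this channel.

```python
def next_window(window, succeeded, floor=3, increase=1, decrease_factor=2):
    """Return the new gate window size after one change finishes.

    Hypothetical helper: grows linearly on success and shrinks
    multiplicatively on failure, never dropping below the floor.
    """
    if succeeded:
        return window + increase
    return max(floor, window // decrease_factor)


def simulate(window, results, **params):
    """Apply a sequence of pass/fail results and return every window size seen."""
    history = [window]
    for succeeded in results:
        window = next_window(window, succeeded, **params)
        history.append(window)
    return history


# Two failures, one success, then another failure, starting from a window of 20.
print(simulate(20, [False, False, True, False]))
```

clarkb's alternative of "+1 on passes -2 on failures" would replace the integer division with a subtraction, which shrinks far more slowly from a large window.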
*** rcleere has quit IRC06:06
smemon92fungi : thank you so much , I added my account06:10
fungismemon92: you're welcome06:10
*** praneshp_ has joined #openstack-infra06:13
*** Ryan_Lane has quit IRC06:14
*** Ryan_Lane has joined #openstack-infra06:14
*** CaptTofu has joined #openstack-infra06:14
*** talluri has quit IRC06:15
*** talluri has joined #openstack-infra06:15
*** praneshp has quit IRC06:16
*** praneshp_ is now known as praneshp06:16
fungiimage builds in nodepool look okay06:18
*** CaptTofu has quit IRC06:19
*** talluri has quit IRC06:19
fungithis failure mode worries me... https://jenkins01.openstack.org/job/gate-nova-python27/17278/console06:20
fungilooks like a child process dying06:20
clarkbya its affecting a lot of stuff06:21
clarkbbut pretty sure it isn't an infra problem06:21
funginova problem?06:22
*** afazekas has joined #openstack-infra06:22
*** afazekas has quit IRC06:22
fungior has it been hitting other jobs/projects?06:22
*** UtahDave has quit IRC06:22
clarkbnova problem06:22
fungik, i'm now less worried06:24
*** oubiwann_ has quit IRC06:25
*** senk1 has quit IRC06:31
fungitwo in a row failing the same job... maybe the first is introducing it?06:33
clarkbI don't think so there are check tests failing too06:35
*** pcrews has joined #openstack-infra06:37
*** gyee has quit IRC06:38
clarkbfungi: it is running `sqlite3 test_bigint.sqlite '.schema'` and that is failing06:38
fungiahh06:38
fungiconsistently on that one test06:38
fungithis on the other hand is troubling... https://jenkins02.openstack.org/job/gate-tempest-dsvm-neutron-large-ops/14063/consoleText06:38
fungilooks like a hung apt-get install hanging around maybe06:39
clarkbfungi: we don't do auto updates on those nodes do we? might also be that06:39
fungioh, i wonder if we broke the exclusion for that06:40
*** harlowja is now known as harlowja_away06:45
fungithe complimentary wireless here is flaking out pretty badly. nodepool seems stable still, so knocking off for the night06:48
*** nati_uen_ has joined #openstack-infra06:48
*** matsuhashi has quit IRC06:50
clarkbgood night, I am about to do the same06:51
*** nati_ueno has quit IRC06:51
*** pblaho has joined #openstack-infra06:52
*** odyssey4me has joined #openstack-infra06:53
*** matsuhashi has joined #openstack-infra06:54
*** smemon92 has quit IRC07:00
*** CaptTofu has joined #openstack-infra07:00
*** senk has joined #openstack-infra07:01
*** CaptTofu has quit IRC07:05
*** senk has quit IRC07:06
*** gokrokve has quit IRC07:11
*** rakhmerov has quit IRC07:12
*** yolanda_ has joined #openstack-infra07:15
*** mrda is now known as mrda_away07:16
*** gokrokve has joined #openstack-infra07:18
*** vogxn has joined #openstack-infra07:20
*** yolanda_ has quit IRC07:21
*** emagana has joined #openstack-infra07:22
*** dstanek has quit IRC07:22
*** gokrokve has quit IRC07:22
*** amotoki has joined #openstack-infra07:30
*** senk has joined #openstack-infra07:33
openstackgerritGuido Günther proposed a change to openstack-infra/jenkins-job-builder: tests: Allow to test project parameters  https://review.openstack.org/6726507:39
openstackgerritGuido Günther proposed a change to openstack-infra/jenkins-job-builder: project_maven: Don't require artifact-id and group-id  https://review.openstack.org/6603607:39
*** odyssey4me has quit IRC07:39
*** odyssey4me has joined #openstack-infra07:40
*** talluri has joined #openstack-infra07:41
*** boris-42 has quit IRC07:41
*** gokrokve has joined #openstack-infra07:41
*** nati_ueno has joined #openstack-infra07:42
*** ladquin_afk has quit IRC07:43
*** che-arne has joined #openstack-infra07:45
*** emagana has quit IRC07:45
*** nati_uen_ has quit IRC07:46
*** yamahata has quit IRC07:46
*** gokrokve has quit IRC07:46
*** gokrokve has joined #openstack-infra07:47
*** odyssey4me has quit IRC07:49
*** yolanda_ has joined #openstack-infra07:55
*** odyssey4me has joined #openstack-infra07:57
*** yamahata has joined #openstack-infra08:02
*** bauzas has joined #openstack-infra08:03
*** jcoufal has joined #openstack-infra08:08
*** flaper87|afk is now known as flaper8708:12
*** luqas has joined #openstack-infra08:15
*** pblaho has quit IRC08:20
*** pblaho has joined #openstack-infra08:20
*** dizquierdo has joined #openstack-infra08:21
*** morganfainberg|z has quit IRC08:22
*** morganfainberg|z has joined #openstack-infra08:25
*** morganfainberg|z is now known as morganfainberg08:25
*** andreaf has joined #openstack-infra08:25
*** dizquierdo has quit IRC08:26
*** markwash has quit IRC08:29
*** markwash has joined #openstack-infra08:30
*** markwash has quit IRC08:31
*** gsamfira has joined #openstack-infra08:33
*** jcoufal has quit IRC08:34
*** jcoufal has joined #openstack-infra08:35
openstackgerritZhiQiang Fan proposed a change to openstack-dev/hacking: Enhance H233 rule  https://review.openstack.org/6857308:35
*** yaguang has quit IRC08:36
*** dizquierdo has joined #openstack-infra08:38
*** praneshp has quit IRC08:40
*** vogxn1 has joined #openstack-infra08:47
*** mancdaz_away is now known as mancdaz08:47
*** vogxn has quit IRC08:50
*** yamahata has quit IRC08:53
*** che-arne has quit IRC08:56
openstackgerritNadya Privalova proposed a change to openstack/requirements: Fix happybase version  https://review.openstack.org/6843508:56
*** pblaho has quit IRC08:57
*** CaptTofu has joined #openstack-infra09:01
*** oubiwann has quit IRC09:05
*** fbo_away is now known as fbo09:06
*** smurugesan has quit IRC09:06
*** vkozhukalov has quit IRC09:06
*** bknudson has quit IRC09:06
*** CaptTofu has quit IRC09:06
*** oubiwann has joined #openstack-infra09:06
*** yassine has joined #openstack-infra09:10
*** jpich has joined #openstack-infra09:10
*** smurugesan has joined #openstack-infra09:11
*** vkozhukalov has joined #openstack-infra09:11
*** bknudson has joined #openstack-infra09:11
*** yamahata has joined #openstack-infra09:11
*** pblaho has joined #openstack-infra09:11
*** NikitaKonovalov_ is now known as NikitaKonovalov09:12
*** markmc has joined #openstack-infra09:13
*** smurugesan has quit IRC09:13
*** vkozhukalov has quit IRC09:13
*** bknudson has quit IRC09:13
*** rpodolyaka has quit IRC09:13
*** vkozhukalov has joined #openstack-infra09:14
*** smurugesan has joined #openstack-infra09:14
*** derekh has joined #openstack-infra09:14
*** masayukig has joined #openstack-infra09:16
*** Ryan_Lane has quit IRC09:16
*** rpodolyaka has joined #openstack-infra09:16
*** salv-orlando has quit IRC09:17
*** smurugesan has quit IRC09:19
*** johnthetubaguy has joined #openstack-infra09:22
*** mancdaz is now known as mancdaz_away09:22
*** mancdaz_away is now known as mancdaz09:23
*** beagles has quit IRC09:30
*** b3nt_pin has joined #openstack-infra09:35
*** luqas has quit IRC09:38
*** jooools has joined #openstack-infra09:41
*** talluri_ has joined #openstack-infra09:41
*** talluri has quit IRC09:42
*** coolsvap has quit IRC09:43
*** coolsvap has joined #openstack-infra09:45
*** johnthetubaguy1 has joined #openstack-infra09:45
*** boris-42 has joined #openstack-infra09:46
*** johnthetubaguy has quit IRC09:47
*** bknudson has joined #openstack-infra09:48
*** DinaBelova_ is now known as DinaBelova09:49
*** ArxCruz has joined #openstack-infra09:50
*** SergeyLukjanov_ is now known as SergeyLukjanov09:51
*** masayukig has quit IRC10:00
*** max_lobur_afk is now known as max_lobur10:07
*** luqas has joined #openstack-infra10:09
*** pblaho has quit IRC10:14
*** pblaho has joined #openstack-infra10:14
*** vogxn1 has quit IRC10:24
*** ociuhandu has joined #openstack-infra10:28
*** DinaBelova is now known as DinaBelova_10:36
*** jp_at_hp has joined #openstack-infra10:38
*** odyssey4me has quit IRC10:38
*** DinaBelova_ is now known as DinaBelova10:45
*** odyssey4me has joined #openstack-infra10:47
*** afazekas has joined #openstack-infra10:50
*** che-arne has joined #openstack-infra10:51
*** dizquierdo has quit IRC11:00
*** CaptTofu has joined #openstack-infra11:02
*** matsuhashi has quit IRC11:05
*** salv-orlando has joined #openstack-infra11:06
*** CaptTofu has quit IRC11:06
sdagueso I think we need a new floor in zuul11:12
sdaguethe floor of 3 is definitely way too low, I suggest 10, and I suggest faster upward growth on success11:12
*** matsuhashi has joined #openstack-infra11:14
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard-webclient: Added node_no_api env  https://review.openstack.org/6861011:18
*** rossella_s has joined #openstack-infra11:28
*** dpyzhov has quit IRC11:31
*** rfolco has joined #openstack-infra11:32
*** dpyzhov has joined #openstack-infra11:36
*** michchap has quit IRC11:40
*** salv-orlando has quit IRC11:41
*** odyssey4me has quit IRC11:42
*** che-arne has quit IRC11:44
*** lcestari has joined #openstack-infra11:47
*** yassine has quit IRC11:52
*** odyssey4me has joined #openstack-infra11:53
*** salv-orlando has joined #openstack-infra11:55
*** markmc has quit IRC11:55
*** salv-orlando has quit IRC11:56
*** weshay has joined #openstack-infra11:57
*** dpyzhov has quit IRC11:57
*** simonmcc has joined #openstack-infra12:08
*** pblaho has quit IRC12:08
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Add a sample config file  https://review.openstack.org/6862012:12
*** jooools has quit IRC12:12
*** salv-orlando has joined #openstack-infra12:13
*** senk has quit IRC12:14
*** senk has joined #openstack-infra12:16
*** pblaho has joined #openstack-infra12:20
*** senk1 has joined #openstack-infra12:21
*** senk has quit IRC12:22
*** luqas has quit IRC12:23
*** alexpilotti has joined #openstack-infra12:25
*** salv-orlando has quit IRC12:25
*** xchu has joined #openstack-infra12:25
*** b3nt_pin has quit IRC12:27
*** b3nt_pin has joined #openstack-infra12:27
*** b3nt_pin is now known as beagles12:28
sdaguewoot - https://github.com/eventlet/eventlet/pull/75 - looks like that eventlet patch will land12:29
sdaguegiving us control over the logging there12:29
*** alexpilotti has quit IRC12:30
*** pblaho has quit IRC12:30
*** pblaho has joined #openstack-infra12:31
*** luqas has joined #openstack-infra12:33
*** xchu has quit IRC12:34
*** nosnos has quit IRC12:36
sdaguehmmm.... something not working on how we generate our uncategorized page12:37
openstackgerritSergey Lukjanov proposed a change to openstack-infra/elastic-recheck: Add fingerprint for bug 1268732  https://review.openstack.org/6862512:38
*** markmc has joined #openstack-infra12:38
sdagueSergeyLukjanov: nice12:39
*** matsuhashi has quit IRC12:39
SergeyLukjanovsdague, hi12:39
sdaguejust approved your new er fingerprint12:39
SergeyLukjanovsdague, I'm not really sure that I've done it correctly :)12:40
sdagueit looks right to me12:40
*** smarcet has joined #openstack-infra12:40
SergeyLukjanovsdague, ok, thx, now I know what to do when I see some new error in logs ;)12:40
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add fingerprint for bug 1268732  https://review.openstack.org/6862512:41
sdagueso the foundation doing a retreat in Utah means they moved fungi back 2hrs. :( Need to get someone to look at adjusting the zuul window params12:41
sdagueas it's being too conservative now12:41
SergeyLukjanovsdague, are you speaking about pipeline window?12:45
sdagueyeh12:45
sdaguethe floor is too low, we're now sitting on a bunch of idle capacity12:46
*** dpyzhov has joined #openstack-infra12:46
SergeyLukjanovsdague, yep, agreed, we have a bunch of free nodes12:46
SergeyLukjanovabout a half of free nodes I think12:47
sdagueyep12:47
*** dguitarbite has quit IRC12:48
SergeyLukjanovI've missed the moment when this feature was added, so, looking now on implementation12:48
sdagueyesterday12:48
sdagueto prevent the massive thrashing12:49
sdaguegrep for window12:49
sdagueit's mostly on the model.py side12:49
*** luqas has quit IRC12:49
portante sdague: nice work on the eventlet patch12:51
sdagueportante: thanks12:51
*** senk1 has quit IRC12:52
*** senk has joined #openstack-infra12:52
*** coolsvap has quit IRC12:58
SergeyLukjanovsdague, heh, I forgot to fetch the latest code ;)12:58
*** alexpilotti has joined #openstack-infra12:59
*** _ruhe is now known as ruhe13:01
SergeyLukjanovsdague, it looks nice13:02
*** miqui has joined #openstack-infra13:02
SergeyLukjanovsdague, and it looks like floor is really too small for the current overall number of nodes13:02
*** CaptTofu has joined #openstack-infra13:03
*** david_lyle has quit IRC13:03
*** heyongli has joined #openstack-infra13:03
*** yassine has joined #openstack-infra13:03
*** dizquierdo has joined #openstack-infra13:05
*** eharney has joined #openstack-infra13:05
*** CaptTofu has quit IRC13:08
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: API tests for rest  https://review.openstack.org/6744713:12
*** SergeyLukjanov is now known as SergeyLukjanov_13:13
anteayattx you around to deal with a spammer in -neutron?13:15
*** jooools has joined #openstack-infra13:16
*** CaptTofu has joined #openstack-infra13:17
anteayasdague: isolated jobs started passing for neutron patches about 7 hours ago13:19
anteayaas far as I can tell no code has changed13:19
*** rpodolyaka has quit IRC13:19
anteayaany thoughts on what might be contributing factors for the passing tests?13:19
*** rpodolyaka has joined #openstack-infra13:20
*** jcoufal has quit IRC13:21
*** ruhe is now known as _ruhe13:22
*** jcoufal has joined #openstack-infra13:22
*** SergeyLukjanov_ is now known as SergeyLukjanov13:22
anteayaisolated jobs in the check queue13:24
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: API tests for rest  https://review.openstack.org/6744713:24
anteayaI spoke too soon, now the running jobs are failing13:26
anteayaseems we just got some through the 30% passing gap13:26
*** rahmu has left #openstack-infra13:37
*** thomasem has joined #openstack-infra13:40
*** rahmu has joined #openstack-infra13:41
*** talluri_ has quit IRC13:44
*** talluri has joined #openstack-infra13:44
*** dstufft is now known as caremad13:44
*** CaptTofu has quit IRC13:44
*** caremad is now known as dstufft13:45
*** rpodolyaka has quit IRC13:45
*** SergeyLukjanov is now known as SergeyLukjanov_a13:50
russellbjeblair: pong from yesterday13:50
*** SergeyLukjanov_a is now known as SergeyLukjanov_13:51
*** talluri has quit IRC13:54
*** senk has quit IRC13:54
*** senk1 has joined #openstack-infra13:54
*** talluri has joined #openstack-infra13:56
*** luqas has joined #openstack-infra13:57
*** thuc has joined #openstack-infra13:57
*** thuc_ has joined #openstack-infra13:58
*** yamahata has quit IRC13:58
*** johnthetubaguy1 is now known as johnthetubaguy13:59
*** dkliban has quit IRC14:00
*** dpyzhov has quit IRC14:02
*** thuc has quit IRC14:02
*** markmcclain has joined #openstack-infra14:03
*** dpyzhov has joined #openstack-infra14:03
ttxanteaya: I don't think I have such privileges yet14:06
anteayaokay14:06
anteayalet's get you privileges14:07
ttxanteaya: jeblair seems to be listed as channel founder14:07
anteayadude stopped at one advertisement fortunately14:07
anteayayes14:07
anteayaJuly 5th, 2013 if memory serves14:07
*** senk has joined #openstack-infra14:07
*** CaptTofu has joined #openstack-infra14:08
openstackgerritNikita Konovalov proposed a change to openstack-infra/storyboard: Auth controller  https://review.openstack.org/6864214:09
*** heyongli has quit IRC14:10
*** senk2 has joined #openstack-infra14:10
*** senk1 has quit IRC14:11
*** senk has quit IRC14:11
*** SergeyLukjanov_ is now known as SergeyLukjanov14:12
*** boris-42_ has joined #openstack-infra14:14
*** yamahata has joined #openstack-infra14:15
*** katyafervent has quit IRC14:15
*** senk2 has quit IRC14:15
*** boris-42 has quit IRC14:15
*** katyafervent has joined #openstack-infra14:15
*** esker has joined #openstack-infra14:17
*** _ruhe is now known as ruhe14:17
*** mriedem has joined #openstack-infra14:18
*** coolsvap has joined #openstack-infra14:18
*** changbl has quit IRC14:19
*** senk has joined #openstack-infra14:19
*** dhellmann_ is now known as dhellmann14:19
*** jooools has quit IRC14:21
*** jooools has joined #openstack-infra14:22
*** julim has joined #openstack-infra14:24
*** dims has quit IRC14:26
*** jooools1 has joined #openstack-infra14:27
*** jooools has quit IRC14:27
*** CaptTofu has quit IRC14:27
sdaguettx: any ideas when we'll see fungi this morning?14:28
*** dims has joined #openstack-infra14:28
*** senk has quit IRC14:29
ttxsdague: should be here sometime in the next hour14:29
*** thuc_ has quit IRC14:30
*** thuc has joined #openstack-infra14:30
*** dpyzhov has quit IRC14:31
openstackgerritDerek Higgins proposed a change to openstack-infra/devstack-gate: Adding the tripleo repositories to PROJECTS  https://review.openstack.org/6864514:34
*** thuc has quit IRC14:34
*** rfolco has quit IRC14:35
derekhHave set ^^ as a WIP would be good if somebody could confirm what I think is correct14:36
*** dkliban has joined #openstack-infra14:37
*** matel is now known as matel_brb14:40
*** matel_brb is now known as matel14:42
*** max_lobur has quit IRC14:45
*** mfer has joined #openstack-infra14:46
*** max_lobur has joined #openstack-infra14:46
*** gsamfira has quit IRC14:46
*** burt1 has joined #openstack-infra14:50
*** mfer has quit IRC14:50
*** jooools1 has quit IRC14:50
*** jooools has joined #openstack-infra14:51
*** senk has joined #openstack-infra14:52
fungisdague: never. ;) what's needed?14:53
*** senk has quit IRC14:53
fungioh, zuul config14:53
sdaguefungi: so... can we set the window floor via rpc?14:53
sdagueor is that a zuul restart14:53
fungisdague: no rpc, but no restart (just a config change to adjust the minimum)14:53
sdaguebecause basically we're over starved14:53
sdaguefungi: it rereads it?14:54
*** esker has quit IRC14:54
fungiyep, on the fly as far as i know14:54
sdagueso I'd like to propose we up the window floor to 6, and the success increment to 214:54
*** esker has joined #openstack-infra14:54
*** dpyzhov has joined #openstack-infra14:54
sdagueor the floor to 10 if we can't change the success increment via config14:54
sdagueif you notice, we now have tons of unused quota14:55
fungiwell, not tons, but at least some. we're keeping up because constant failures have driven us down to 3 gate changes in parallel14:56
mordredsdague: I'm not sure changing the floor is needed if the failures have driven us this low - I think the next change we need is jeblair's change and then the change to the joblist - I think this one is doing its down properly14:58
mordredjob14:58
mordrednot down14:58
mordredalso, morning!14:58
sdaguemordred: so tcp is really intended with trying to have as few errors as possible15:00
openstackgerritMonty Taylor proposed a change to openstack-infra/zuul: Add require-approval to Gerrit trigger  https://review.openstack.org/6851615:00
*** rcleere has joined #openstack-infra15:00
sdaguebut we actually are ok with speculation fails15:00
sdagueand I think we can take more than we have15:00
mordredfair15:01
sdaguefungi: sure but it takes us 1 hr to up our queue by 1 additional slot15:01
*** talluri has quit IRC15:01
sdagueif these tests were turning around in 5 minutes, I think it would be different15:01
* anteaya raises her hand for experimenting with the floor value15:03
anteayaI'd like to see what happens if we do15:03
*** julim has quit IRC15:03
sdagueor the increment must equal the floor, otherwise I think we'll always regress to floor15:03
sdaguegiven any amount of failure15:03
sdaguebut for right now, floor 10 would do us fine15:03
sdagueI also wonder if we could have nodepool try to always keep spares, not just when we are < 100 nodes15:05
sdagueor maybe it's already doing that15:06
sdagueand I'm reading the graph wrong15:06
*** luqas has quit IRC15:07
mordredsdague: it should always be trying to keep spares15:07
*** julim has joined #openstack-infra15:07
sdagueok, some times we just can't build fast enough then?15:07
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Zuul gate window increment by 2, floor at 6  https://review.openstack.org/6865615:08
fungithere's the original proposed change15:08
*** mfink has joined #openstack-infra15:09
*** dstanek has joined #openstack-infra15:09
fungisdague: so basically nodepool tries to keep 100 ready nodes, and when it sees additional need based on change activity it also tries to spin up additional nodes to accommodate that if it's not already satisfied in the base ready set15:10
*** jasondotstar has joined #openstack-infra15:10
fungibut node building takes long enough that there still ends up potentially being some delay when it spikes up there15:11
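A rough sketch of the behaviour fungi describes: keep a base set of ready nodes, and build more only when demand exceeds what is already ready or building. This is not nodepool's actual allocator; `nodes_to_launch`, its parameters and the quota handling are assumptions for illustration, and only the figure of 100 ready nodes comes from the log.

```python
def nodes_to_launch(ready, building, demand, in_use=0, min_ready=100, quota=None):
    """Return how many new nodes to request right now.

    ready/building/in_use are current node counts, demand is the number of
    nodes that pending jobs need, quota is an optional provider-wide cap.
    """
    # Aim for whichever is larger: the base ready set or the current demand.
    target = max(min_ready, demand)
    # Nodes already ready or being built count towards the target.
    shortfall = max(0, target - (ready + building))
    if quota is not None:
        # Never ask for more than the provider quota still allows.
        headroom = max(0, quota - (ready + building + in_use))
        shortfall = min(shortfall, headroom)
    return shortfall


print(nodes_to_launch(ready=100, building=0, demand=40))
print(nodes_to_launch(ready=60, building=10, demand=150))
print(nodes_to_launch(ready=60, building=10, demand=150, in_use=300, quota=400))
```

The sketch also shows why a spike still causes a delay: the shortfall is only a request, and each requested node has to finish building before it can take a job.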
*** luqas has joined #openstack-infra15:11
sdaguefungi: cool15:11
anteayawould it make sense to increase the value of ready nodes from 100 to something larger than 100?15:12
anteayasince I do believe the value of 100 was from when we had 3 jenkinses15:12
sdaguethat's an interesting idea15:13
sdaguethough if we are building as fast as we can15:13
sdaguethen I don't think it helps15:13
*** kraman1 has joined #openstack-infra15:14
*** oubiwann_ has joined #openstack-infra15:14
sdaguefungi: so I'd say let's get the windows out there as soon as possible15:14
sdaguebecause now that we aren't thrashing15:14
sdaguewe are actually self limiting our throughput15:14
*** krotscheck has joined #openstack-infra15:15
*** ociuhandu_ has joined #openstack-infra15:15
*** jergerber has joined #openstack-infra15:15
*** ociuhandu has quit IRC15:15
*** ociuhandu_ is now known as ociuhandu15:15
anteayamordred: how goes the configuring of the 3 additional jenkinses?15:16
anteayaI did see you say you had 05 up15:16
*** changbl has joined #openstack-infra15:16
fungiwell, we're testing 3 changes in parallel and it will go up to 4 in 14 minutes if that top one succeeds and none fail, but yes right now we're not using as much of our quota as we could (though i question whether the throughput is likely to increase much if the reset rate has been bad enough to drive us to the floor overnight)15:16
openstackgerritDerek Higgins proposed a change to openstack-infra/config: Remove TRIPLEO_ROOT and pull-tools  https://review.openstack.org/6866115:16
openstackgerritDerek Higgins proposed a change to openstack-infra/config: Switch path to toci_gate_test.sh  https://review.openstack.org/6866215:16
fungianyway, i'm in favor of trying it15:18
russellbfungi: i've got some nova security patches to approve today ... can we have stable +A back?  :-)   sdague says it should be good again15:21
sdaguefungi: yeh, I think we're going to be playing tuning games on it for maximum throughput15:22
openstackgerritA change was merged to openstack-infra/config: Zuul gate window increment by 2, floor at 6  https://review.openstack.org/6865615:22
*** odyssey4me has quit IRC15:23
*** afazekas has quit IRC15:23
*** pballand has joined #openstack-infra15:23
*** dcramer__ has joined #openstack-infra15:24
openstackgerritMichael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Simple round trip API integration with storyboard-api  https://review.openstack.org/6852815:25
sdaguealso, I wonder if we should flip the order on the stacked bar - http://graphite.openstack.org/render/?from=-24hours&height=180&until=now&width=334&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.used),%20%27In%20Use%27),%20%276464ff%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.building),%20%27Building%27),%20%27ffbf52%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.ready),%20%27Available%27),%20%2700c868%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.delete),%20%27Deleting%27),%20%27c864ff%27)&title=Test%20Nodes&_t=0.8432196965441108#139049056014115:26
*** ryanpetrello has joined #openstack-infra15:26
*** jgrimm has joined #openstack-infra15:26
sdagueor - http://goo.gl/NzQCxl15:26
*** sandywalsh has joined #openstack-infra15:26
sdaguebecause the blue part is basically our throughput, and nice to see how that is changing over time15:27
anteayaI agree, it would be easier to assess changes to throughput if the lower limit were stable, as portrayed in the above graph15:28
*** dkranz has joined #openstack-infra15:29
AJaegerinfra team, fungi, mordred  - is there a chance to get gating for the api projects enabled, please? I'd really like to have them but if you're still too much in fire fighting, I'll wait - patch is https://review.openstack.org/#/c/67394/15:31
*** prad_ has joined #openstack-infra15:32
*** markmc has quit IRC15:32
*** krtaylor has quit IRC15:35
*** DennyZhang has joined #openstack-infra15:35
*** changbl has quit IRC15:36
*** boris-42_ has quit IRC15:39
*** mfer has joined #openstack-infra15:45
*** markmcclain has quit IRC15:45
*** AJaeger has quit IRC15:47
*** rakhmerov has joined #openstack-infra15:51
*** MarkAtwood has joined #openstack-infra15:53
*** esker has quit IRC15:53
*** dhellmann is now known as dhellmann_15:53
*** esker has joined #openstack-infra15:53
*** DennyZha` has joined #openstack-infra15:54
*** jcoufal has quit IRC15:55
*** DennyZhang has quit IRC15:55
fungirussellb: sure, adding that now15:55
*** gokrokve has quit IRC15:56
*** jcooley_ has joined #openstack-infra15:57
SergeyLukjanovguys, is the FAILED floating_ips exercise known bug?15:57
*** esker has quit IRC15:58
*** gothicmindfood has joined #openstack-infra15:58
russellbfungi: thanks!15:59
*** DennyZha` has quit IRC15:59
*** gmurphy has joined #openstack-infra16:02
fungiSergeyLukjanov: on grizzly?16:02
SergeyLukjanovfungi, on master16:03
*** UtahDave has joined #openstack-infra16:03
SergeyLukjanovfungi, http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRkFJTEVEIGZsb2F0aW5nX2lwc1wiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiIxNzI4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzkwNDkzMDA5MjE2fQ==16:03
fungidoesn't ring a bell, though do we gate devstack exercises on master any longer?16:03
SergeyLukjanovhttps://review.openstack.org/#/c/68596/16:04
SergeyLukjanovfungi, http://logs.openstack.org/96/68596/2/check/check-devstack-dsvm-neutron/467b223/console.html#_2014-01-23_14_40_25_90816:05
SergeyLukjanovlooks like we have some :)16:05
fungiso, just a data point... it looks like any adjustment to the zuul dependent pipeline windowing restarts its calculations back to start values (so increasing the floor to 6 has caused the current window to go back to 20 initially)16:08
*** rnirmal has joined #openstack-infra16:12
anteayayeah, I saw that, I wondered how that happened16:12
*** emagana has joined #openstack-infra16:12
openstackgerritBrant Knudson proposed a change to openstack-infra/elastic-recheck: Add fingerprint for bug 1271190  https://review.openstack.org/6867816:13
*** mrmartin has joined #openstack-infra16:13
*** rfolco has joined #openstack-infra16:14
mriedembknudson: ^16:15
mriedemthat shows up in successful runs 77% of the time16:15
mriedemyou'll have to dig deeper16:15
bknudsonmriedem: how is that possible? errors in the logs don't cause problems?16:15
bknudsondon't cause a failure?16:16
mriedemsome are whitelisted16:16
bknudsonit's sometimes whitelisted and other times not whitelisted?16:16
fungibknudson: mriedem: also we turned the error checking back to non-failing because it was broken for long enough that nova grew some new occasional errors in the interim, so when we tried to turn it back to enforcing it was dragging the gate throughput back down again16:17
*** prad_ has quit IRC16:18
*** prad has joined #openstack-infra16:19
*** branen has quit IRC16:19
*** wenlock has joined #openstack-infra16:20
bknudsonfungi: mriedem: the fix is in the queue already (at #2) so maybe it's not worth it to add the e-r check.16:20
fungiunless it turns out not to actually fix it, but yeah probably worth waiting just a bit longer16:20
mriedemsounds good to me16:22
*** thuc has joined #openstack-infra16:22
*** SumitNaiksatam has quit IRC16:22
mriedemi didn't see a whitelist in tempest for that heat error, but maybe that's not checked in this case16:22
mriedemguessing tempest whitelist.yaml is only checking in the other logs16:23
mriedemderp, b/c that's what it keys off, double derp16:23
anteayabknudson: are you talking about? https://review.openstack.org/#/c/68135/16:25
anteayaI ask because ttx is waiting on it and I am babysitting it16:26
anteayaany impediment to it merging?16:26
ttxno16:26
bknudsonanteaya: yes, https://review.openstack.org/#/c/68135/ -- it caused 6 keystone changes to fail to merge.16:26
anteayahow?16:27
ttxbah. It failed16:27
bknudsonanteaya: gate-tempest-dsvm-full FAILURE in 57m 51s16:27
*** nati_ueno has quit IRC16:28
anteaya Looks like the node went offline during the build. Check the slave log for the details16:28
anteayabknudson: could you expand a bit?16:28
anteayafungi can you take a look at https://jenkins04.openstack.org/job/gate-tempest-dsvm-full/3936/console16:29
bknudsonanteaya: http://logs.openstack.org/75/64575/15/gate/gate-tempest-dsvm-full/26f0c3c/console.html -- "Logs have errors" -- FAILED16:29
anteayamy initial scan says the node went offline16:29
ttxanteaya: I'll cut i2 without it16:29
anteayabknudson: can you help me make the connection to how that caused 6 keystone changes to fail to merge?16:29
ttxanteaya: so no more need for babysitting16:30
anteayathe keystone changes were ahead of the heat patch16:30
anteayattx okay16:30
*** gyee has joined #openstack-infra16:30
bknudsonanteaya: that review didn't cause the keystone to fail -- the bug that the patch fixes caused the tests to fail16:30
anteayaah okay16:31
anteayanow I understand16:31
bknudsonanteaya: sorry, should have been clearer16:31
anteayaall the more reason for 68135 to merge16:31
anteayanp16:31
*** morganfainberg is now known as morganfainberg|z16:31
fungiyeah, looks like something may have happened to that slave. i'll see if it's still around to check16:31
anteayaI'm curious about the failure log that says the node went offline16:31
anteayathanks16:31
*** nicedice has joined #openstack-infra16:32
*** rakhmerov1 has joined #openstack-infra16:33
*** rakhmerov has quit IRC16:33
*** coolsvap is now known as coolsvap_away16:33
fungii caught it before nodepool managed to successfully delete it. the jenkins slave agent is definitely not running on it... looking around for any obvious explanation on the slave16:35
anteayagood catch16:35
*** coolsvap_away is now known as coolsvap16:35
fungigah, nodepool apparently *had* already initiated the nova delete call, it took effect while i was looking at the system logs16:36
anteayaboo16:36
anteayaanything useful left?16:36
*** aarongr_afk is now known as AaronGr16:36
*** coolsvap is now known as coolsvap_away16:37
fungithe logs i managed to look through before the slave was ripped out from under me didn't have any evidence for why the agent was no longer running. it's possible the slave agent is stopped normally when the slave is deregistered in jenkins though, so that's not necessarily related16:37
anteayahmmm16:38
anteayathe gearman worker for that slave wouldn't have anything useful, would it?16:39
fungihuh?16:40
fungithe gearman worker on the jenkins master you mean?16:41
anteayayes for that slave16:41
anteayatrying to understand where the cease and desist message came from for that slave16:41
anteayasince the slave is gone, trying to cast about for what else remains16:41
jeblairanteaya: where did you find the link to that build?16:42
anteayathe failing test for the zuul status page16:42
anteayaI clicked it before the job was finished running16:42
fungijeblair: it was a failure which reported, so wasn't cancelled behind another failure16:42
fungianteaya: i'm not convinced it came from anywhere. the slave agent could have crashed, something might have gone awry on the jenkins master (which i'm checking logs on now), might have been a communication issue between them...16:43
*** morganfainberg|z is now known as morganfainberg16:43
anteayaah16:44
anteayablast16:44
openstackgerritJames E. Blair proposed a change to openstack-infra/config: Increase zuul window floor to 10  https://review.openstack.org/6869116:45
fungiJan 23, 2014 4:25:17 PM hudson.node_monitors.AbstractDiskSpaceMonitor markNodeOfflineIfDiskspaceIsTooLow16:45
fungiWARNING: Making devstack-precise-hpcloud-az2-1187454 offline temporarily due to the lack of disk space16:46
jeblairfungi, mordred: ^ i think we actually discussed that as the default floor, i guess we missed that in the zuul patch16:46
*** amotoki is now known as amotoki_zzz16:46
fungijeblair: okay, fair enough16:46
russellbdims: what change was that on?16:46
jeblairbecause it is pretty silly to have written this thing that massively parallelizes work and then not use it at least a little.  :)16:47
russellbi'm probably capable of figuring that out.16:47
dimsrussellb, was on your review16:47
anteayafungi: lack of disk space16:47
fungianteaya: yes, on the slave16:47
fungianteaya: i'm going to sample similar slaves on that az to see how much space they normally have on their filesystems16:48
russellbdims: ah yes .. :(16:48
russellbdims: different error now it seems16:48
anteayafungi: k16:48
dimsrussellb, yea :(16:48
jeblairrussellb: when you have a sec, i wanted to learn more about the tempest 4 procs -> 2 decision16:48
anteayaI wonder how much disk space heat tests usually take to run16:49
russellbjeblair: so ... while studying random failures a couple weeks ago, they seemed to be related to things taking too long in various places in the code, not terribly consistent16:49
russellbjeblair: and then we saw that the node was pegged on CPU the entire time16:50
russellbjeblair: so that's basically the sum of it16:50
jeblairrussellb: it seems like a pretty big thing -- runtime * 1.5 is bound to have an effect on throughput, and while the math is hard for my sick-brain, i think it would have to be responsible for a substantial number of failures to cause an overall throughput increase16:50
*** samalba has quit IRC16:50
jeblairrussellb: which may be worthwhile, of course16:50
russellbit's about reliability though, not throughput16:50
russellbrandom failures are also incredibly expensive in dev time for analysis and such16:50
jeblairrussellb: so we ran with 4 runners for quite some time, and during some periods things seemed _very_ reliable16:51
jeblairrussellb: so was there something that changed?16:51
russellbhas anything about what test nodes are being used changed?16:51
russellbin nova, we made parts of nova a lot faster (able to do work concurrently a lot better) and others not16:52
russellbprimarily nova-compute can fly while nova-network is stuck being very serial16:52
russellbso that was a lot of the problems16:52
anteayajeblair can you kick user polfilm from #openstack-neutron? person is a spammer16:52
jeblairrussellb: absolutely -- we use rax performance nodes now which are slightly faster than hpcloud ones16:52
russellbdan smith and I spent a week ripping nova-network apart to make that better, patches all under review now16:52
anteayajeblair: you are the only one I know with ops for -neutron16:52
*** pballand has quit IRC16:52
russellbi just feel like if we're pegging the CPU the entire time, we're bound to have random failures again16:53
fungiyep, '/msg chanserv access #openstack-neutron list' says only jeblair16:53
russellbi really should have written down more of my analysis of specific failures, but i didn't16:53
russellbi'd really like to see some per-process CPU consumption info16:54
fungiso my best guess is that we have some tests leaking a few gigs of data outside of /opt (probably in /home/jenkins) http://paste.openstack.org/show/61768/16:54
jeblairanteaya: done and you are op16:54
russellbbecause i suspect with concurrency=4, we're running at least 4 instances of qemu (and sometimes more) at once, and that just kills it16:54
anteayathank you16:54
jeblairrussellb: ok, yeah, that was my next question -- any chance of our being able to increase the runner count in the future?16:55
russellbi sure hope so16:55
russellbi'd feel a lot better trying again after we land this patch series: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nova-network-objects,n,z16:55
russellbthat's going to improve nova-network's ability to work concurrently *a lot*16:56
russellbdan smith did some testing and it made a *huge* difference16:56
jeblairrussellb: ok; there's a sysstat file in the logs stored with the jobs; did you use that?16:56
russellbyeah, used that16:56
jeblairok, but it lacks the per-process info you need; what could help?16:56
russellbi don't think we need anything this instant ... maybe when we're ready to try again16:57
fungigiven the constrained nature of the root filesystem on these slaves, i wonder if it would make sense to move ~jenkins into /opt (either via symlink or just in /etc/passwd)16:57
russellbthe above patch series gave a 20% speedup in test runtime, and 33% decrease in CPU consumption by nova-network16:57
fungimaybe /var/log could be a culprit too16:57
fungior /tmp16:57
russellbso i'd like to land that, then we can try again IMO16:57
anteayafungi: would that change the disk space issue?16:58
*** krotscheck has quit IRC16:58
jeblairfungi: reasonable.  that's actually the biggest objection i have to the current practice of limited root spaces --16:58
russellbtest being spawning and deleting a ton of instances16:58
anteayaah, so if it is run as root then that would be the limitation16:58
jeblairfungi: it's _extremely_ difficult to put var/log onto another partition without lvm (and of course, they don't have it on lvm)16:58
*** NikitaKonovalov is now known as NikitaKonovalov_16:58
* russellb does the multiplexed conversation dance16:58
fungijeblair: right. having to close out those file descriptors without a reboot is nearly impossible16:59
*** samalba has joined #openstack-infra16:59
jeblairfungi: yep.  anyway, yeah, those would be good ideas for the slaves i think.16:59
*** SumitNaiksatam has joined #openstack-infra16:59
*** coolsvap_away is now known as coolsvap16:59
fungii suspect the increased runtime has allowed us to bump up closer to filling the root filesystem on some runs17:00
jeblairrussellb: okay, cool.  just wondering if there's anything we should be doing in the mean time17:00
*** mrodden has joined #openstack-infra17:00
*** mrmartin has quit IRC17:00
jeblairrussellb: i believe we're at the cpu sweet-spot in terms of node size17:00
jeblairrussellb: so i don't think bigger nodes would help this17:00
russellbyeah, i'm hoping some nova performance improvements will help enough17:01
russellbbut we may still see some penalties due to just qemu eating the whole node17:01
russellbemulated CPUs aren't fast17:01
russellb:)17:01
sdaguerussellb: so I've got load average, and a couple other stats in the sysstat data now as well17:01
russellbcool17:01
sdaguethat merged yesterday I think17:01
sdagueif that helps figure out bottlenecks17:01
*** fifieldt has joined #openstack-infra17:01
jeblairrussellb: oh, actually we have 4 vcpus on hp and 8 vcpus on rax17:02
*** boris-42 has joined #openstack-infra17:02
sdaguejeblair: any chance we could get a few 8 ways to experiment with, and see if they help?17:02
sdaguejeblair: we have 8 on rax?17:02
sdaguesince when?17:02
fungiperformance favor17:02
fungiflavor17:02
russellbmight be worth a periodic ps that shows CPU per process17:02
fungisdague: since a few weeks17:02
russellbwell dang17:02
jeblairrussellb: did you by any chance correlate failures to provider?17:02
russellbno ...17:03
sdaguefungi: were performance nodes in gate?17:03
russellbi thought they were all 4 vcpus17:03
fungiwe do have provider metadata in logstash now17:03
jeblairsdague: since LCA17:03
fungisdague: yes17:03
russellbi'd be happy to try 4 on the 8 CPU nodes now17:03
jeblairso, uh, 1.5 weeks-ish17:03
russellbah17:03
sdagueoh... yeh, so we'd have auto selected to 8way on the 8 CPU nodes17:03
russellbmy analysis was just before that17:03
* fungi has completely lost his sense of time17:03
sdagueok17:03
*** SergeyLukjanov is now known as SergeyLukjanov_17:03
sdaguefungi: no kidding17:03
jeblairbut in much smaller numbers than hpcloud17:03
jeblairat least until mordred finishes spinning up the new jenkinses, then we'll be getting close to comparable17:03
russellbbasically instead of concurrency=2, i want concurrency = 1/2 vCPU or something17:04
russellbor that was my intention for now17:04
sdagueso, honestly, I think we should figure out the metrics we think would tell us if we are overloaded or not, and get them in17:04
russellbcould possibly try 3/4 vCPU17:04
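A small sketch of the rule russellb is describing: derive the test runner count from the node's vCPU count instead of hard-coding 2. The function name `tempest_concurrency` is hypothetical; the 1/2 and 3/4 fractions and the 4 and 8 vCPU node sizes are the ones mentioned in this log.

```python
from fractions import Fraction


def tempest_concurrency(vcpus, fraction=Fraction(1, 2)):
    """Return the number of parallel test runners for a node with `vcpus` CPUs."""
    # Round down, but always run at least one worker.
    return max(1, int(vcpus * fraction))


for vcpus in (4, 8):
    print(vcpus, tempest_concurrency(vcpus), tempest_concurrency(vcpus, Fraction(3, 4)))
```

With the 1/2 rule the 4 vCPU nodes keep running 2 workers while the 8 vCPU nodes get 4, so the faster provider is not held back by a limit chosen for the slower one.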
sdaguethen we can try playing with concurrency counts17:04
russellbsdague: i think the load average was enough17:04
sdagueok17:04
russellbto see that we were too overloaded17:04
russellbbut it would be nice to mix in some periodic full process listings in there17:04
sdaguethere are some io and context switch numbers in there as well17:04
russellbso that when it's pegged we can see exactly what is the culprit17:04
sdaguerussellb: sure, what about periodic time dumps of cpu time used by processes we care about17:05
sdaguethat we should be able to pull out of /proc not too badly17:05
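One way the /proc idea could look, as a hedged sketch rather than what was eventually added to devstack: read `utime` and `stime` (fields 14 and 15 of `/proc/<pid>/stat`) and sum them per process name. The helper names and the process names in the usage line are illustrative assumptions, and the sampling function is Linux only.

```python
import os


def parse_cpu_ticks(stat_line):
    """Return (comm, utime + stime) in clock ticks from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # `tail` starts at field 3 (state), so utime (field 14) and stime
    # (field 15) sit at offsets 11 and 12.
    return comm, int(fields[11]) + int(fields[12])


def sample_cpu_seconds(names):
    """Sum CPU seconds used so far, per process name, for the given names."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    totals = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as handle:
                comm, ticks = parse_cpu_ticks(handle.read())
        except (OSError, IndexError, ValueError):
            continue  # the process exited while we were reading it
        if comm in names:
            totals[comm] = totals.get(comm, 0.0) + ticks / ticks_per_second
    return totals


if __name__ == "__main__":
    # Hypothetical list of processes we care about.
    print(sample_cpu_seconds({"qemu-system-x86", "python"}))
```

Calling `sample_cpu_seconds` every 30 to 60 seconds and diffing consecutive samples would give CPU time per interval. Note that the kernel truncates `comm` to 15 characters, so long process names need to be matched in their truncated form.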
*** BobBall is now known as BobBallAway17:05
russellbsdague: sure ... though that list might be a pain to make17:06
jeblairrussellb, sdague: one more question to help flesh out my understanding -- what about increasing timeouts in tempest?17:06
*** marun has joined #openstack-infra17:06
russellbwe'd have to increase tempest timeouts, as well as some timeouts in nova i think ...17:06
sdaguerussellb: I think he was thinking per test timeouts17:06
russellbyeah17:06
*** yamahata has quit IRC17:07
jeblairrussellb, sdague: to be fair, we see node deletes take _hours_ in reality on real public clouds, which is why there's so much retry code in nodepool17:07
sdagueyeh17:07
sdaguejeblair: these are cirros guests though17:07
russellbtrue ... but not on real virt, so it's slowwww17:07
russellband if we do as many of them as we have cpus ... we're going to eat the thing17:07
russellbthat on top of performance mismatches we have in nova (compute vs network) made things fall over17:08
russellbcompute is all like Y U TAKE SO LONG NETWORK17:08
*** gothicmindfood has quit IRC17:08
jeblairokay, so short answer, that's probably just a bad idea and isn't really sustainable17:08
russellbetc.17:08
sdagueso qemu is only slow when it hits priv instructions right?17:08
russellbi think so, yeah ...17:08
*** DinaBelova is now known as DinaBelova_17:08
russellbsdague: maybe, i should just stop speculating and we should get the per process info17:09
sdagueyeh17:09
*** pblaho has quit IRC17:09
sdagueI actually think we're more bottlenecked on the python from what I've seen looking at things in the past17:09
sdaguebut again, we should figure out what to count17:09
sdagueso we have real numbers17:09
russellbreally just a "ps axu" or some such17:09
russellbevery ... 30 seconds or minute or something, i don't know17:10
russellbsdague: yeah, we need to know17:10
*** starmer has joined #openstack-infra17:10
russellbsdague: so let's get something in, and then throw up a draft that changes concurrency back17:10
sdaguesure17:11
*** markwash has joined #openstack-infra17:11
sdagueok, honestly I probably can't look at that until tomorrow17:11
*** dstanek has quit IRC17:11
sdaguebut I will slice off a bit of time to do it then17:11
russellbsdague: want to point me in the right direction?17:11
russellbwhere is the magic17:11
*** dstanek has joined #openstack-infra17:11
sdaguerussellb: so honestly, we should just add something to devstack that's like how sysstat was added17:13
russellbyeah17:13
russellbwasn't sure if sysstat was in devstack or some other place17:13
russellbi'll look17:13
*** dprince has joined #openstack-infra17:13
sdagueI'm not familiar enough with ps flags to get something like seconds of run time in a way we like, if we can, that would be cool17:14
sdagueotherwise I was thinking of just doing something custom and counting out of /proc/*/sched17:14
sdagueonce upon a time I tried to use systemtap for this, but it kind of gets mad when you do that at a system level and you loop the process ids, which we do multiple times in tempest because of all the rootwrap forks17:15
sdagueat least on ubuntu17:15
mordredrussellb, sdague: it's almost like there should be a sensible way to track system performance and correlate it to openstack operation17:15
russellblol.17:16
russellbzomg17:16
sdaguemordred: so honestly, I think the answer is actually systemtap17:16
sdaguebut... above17:16
sdagueand I'm not skilled enough in it to figure it out17:16
russellbps can give %CPU at a given point at least right?17:16
openstackgerritA change was merged to openstack-infra/config: Increase zuul window floor to 10  https://review.openstack.org/6869117:17
*** pballand has joined #openstack-infra17:17
*** krtaylor has joined #openstack-infra17:18
*** mancdaz is now known as mancdaz_away17:18
russellbor pidstat if sysstat may be enough ...17:21
russellbs/if/from/17:22
*** dkliban is now known as dkliban_afk17:23
*** odyi has quit IRC17:29
*** jooools has quit IRC17:30
*** afazekas has joined #openstack-infra17:30
*** yassine has quit IRC17:30
*** bauzas has quit IRC17:31
*** afazekas has quit IRC17:36
openstackgerritKhai Do proposed a change to openstack-infra/jenkins-job-builder: make scm test as the example  https://review.openstack.org/6518617:38
*** changbl has joined #openstack-infra17:38
*** dpyzhov has quit IRC17:40
jeblairkraman: looking at that patch; it seems to broadly cover the bases and seems generally agreeable.17:41
*** dpyzhov has joined #openstack-infra17:41
kraman1jeblair: is the event i used the correct one?17:41
kraman1basically i will be posting that event when an external git repo is updated17:42
kraman1so that zuul can kick off the flo17:42
kraman1flow **17:42
kraman1the issue i had with that event is that I may not have the rev #s that are required as arguments17:42
*** jpich has quit IRC17:43
jeblairkraman1: so that's an event that is emitted by gerrit; since you aren't using a gerrit trigger, you could really do anything -- but as you've probably seen, if you want to use the EventFilter class more or less as it already exists, it does make some assumptions there...17:43
jeblairkraman1: you could perhaps clean up that abstraction a little bit, so that there are GerritEventFilters and MessagingEventFilters...17:44
kraman1yah makes sense, i can make that update17:44
jeblairkraman1: or you could go with the approach you started on there, which i'd describe as sort of emulating gerrit-like events17:44
jeblairand reusing them...17:44
kraman1there were also a few places where the gerrit trigger is assumed for calls. eg to get git web url etc. those might need to be abstracted as well17:45
russellbsdague: https://review.openstack.org/68702  .. pidstat17:45
*** max_lobur is now known as max_lobur_afk17:45
jeblairkraman1: regardless -- the ref-updated event (whether you use it, or do something else like it) is probably the best event to use (or model to follow)17:45
jeblairkraman1: it is emitted whenever any ref (including a branch or a tag) in gerrit is updated17:45
kraman1ok, thanks. i will work on the patch some more and ping you again when i have an update17:45
*** dpyzhov has quit IRC17:45
kraman1or do you prefer me to push a patch and then talk on the ML?17:45
jeblairkraman1: it's essentially the result of a merge, or even a 'git push' to a branch...17:46
jeblairkraman1: so you should actually be able to produce at least the new_rev17:46
jeblairkraman1: if a branch is updated, it would be the git sha of the tip of the branch after the update17:46
jeblairkraman1: (the old_rev would be the git sha of the branch right before the update; not sure if that's available to you)17:46
kraman1jeblair: it depends on the remote repo. i might just get a ping saying "branch updated" … without any rev info attached17:47
kraman1so i didnt want to make that assumption17:47
kraman1and i would prefer not to have external code to get that rev # in solum … trying to isolate all git ops in zuul side17:47
jeblairkraman1: ok.  well, it sounds like you can probably imagine the implications there -- if it takes a while for a job to run, it may end up using a newer revision than what it was initially triggered to run17:48
jeblairkraman1: (if the revision data is missing)17:48
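A sketch of the "emulate a gerrit-like event" approach for an external repository. The `type` and `refUpdate` keys with `project`, `refName`, `oldRev` and `newRev` follow the shape of Gerrit's ref-updated stream event; the helper functions and the `resolve_tip` callback are hypothetical and are not part of Zuul.

```python
def make_ref_updated(project, ref, new_rev=None, old_rev=None):
    """Build a ref-updated style event; revisions are optional."""
    update = {"project": project, "refName": ref}
    if old_rev:
        update["oldRev"] = old_rev
    if new_rev:
        update["newRev"] = new_rev
    return {"type": "ref-updated", "refUpdate": update}


def revision_to_build(event, resolve_tip):
    """Pick the revision a job should build for this event.

    If the notification carried no newRev, fall back to asking the repository
    for the current branch tip via `resolve_tip(project, ref)`. That tip may be
    newer than the push that triggered the event.
    """
    update = event["refUpdate"]
    return update.get("newRev") or resolve_tip(update["project"], update["refName"])


# A bare "branch updated" ping with no revision attached.
event = make_ref_updated("example/app", "master")
print(revision_to_build(event, lambda project, ref: "tip-of-" + ref))
```

Pinning `newRev` when the remote supplies it keeps a slow job tied to the push that triggered it; the fallback matches the case kraman1 describes, where only "branch updated" arrives and the job builds whatever is latest.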
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Make jenkins homedir location configurable  https://review.openstack.org/6870517:48
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Put ~jenkins in /opt on nodepool slaves  https://review.openstack.org/6870617:48
kraman1jeblair: yes, i see. but in the zuul case that is probably the correct thing to do17:49
kraman1will need to think about it a bit more tho17:49
jeblairkraman1: in the solum case?17:49
*** johnthetubaguy has quit IRC17:49
jeblairkraman1: i think it would be fine to start pushing patches and pinging the ml.17:50
kraman1jeblair: for solum, we get a call from a remote git repo (like github etc) saying a push was made. and we use zuul to build the latest code form the repo17:50
kraman1from*17:50
kraman1jeblair: ok, will start formatting my changes into a patch17:51
jeblairor patches :)17:51
kraman1:)17:51
kraman1are there any guidelines about writing tests?17:51
openstackgerritDolph Mathews proposed a change to openstack-infra/elastic-recheck: eliminate 14 false positives for bug 1268732  https://review.openstack.org/6870717:51
*** markmcclain has joined #openstack-infra17:51
jeblairkraman1: (a) please do (b) i'm more into functional tests than unit tests17:51
fungithe more i think back over the other recent nodepool nodes which have spontaneously offlined, the more it dawns on me that it's most often between tests finishing and logs getting uploaded. so almost certainly /home/jenkins filling up the / filesystem17:52
jeblairkraman1: you'll see that test_scheduler basically exercises a real running zuul with faked-out gerrit and gearman workers17:52
jeblairfungi: sounds reasonable17:52
anteayathat was the case with the heat patch, tests had finished successfully17:52
kraman1jeblair: yep, i will add something similar in that case17:52
anteayashardy showed me this bug: https://bugs.launchpad.net/openstack-ci/+bug/126873217:53
kraman1jeblair: thanks for the guidance. keep an eye out for patches :)17:53
anteayawhich I think is the same bug17:53
*** yamahata has joined #openstack-infra17:53
fungianteaya: agreed. i think it is17:54
SpamapSclarkb: https://review.openstack.org/#/c/68135/ still queued (and at the very bottom of zuul.openstack.org). Is that just "because things are broken"?17:56
jeblairfungi: devstack-gate needs an update before https://review.openstack.org/#/c/68706/17:57
*** jerryz has joined #openstack-infra17:58
jeblairoh wait it doesn't17:58
jeblairfungi: it's just the -dev script that refs /home17:58
*** markmcclain has quit IRC17:58
jeblairSpamapS: zuul has learned to be less optimistic and now only launches jobs for changes near the top of the queue17:59
jeblairSpamapS: in a sliding window based on recent success/failure rates (clarkb just wrote that feature)17:59
*** rakhmerov1 has quit IRC18:00
*** derekh has quit IRC18:00
jeblairSpamapS: not sliding; growing/shrinking.  obviously the window is anchored at the head of the queue.  :)18:00
*** nati_ueno has joined #openstack-infra18:00
jeblairSpamapS: UI indication of this is yet to be written; we'll get there soon.  we needed the feature quickly though.18:01
fungijeblair: i think i need to amend the prep script change to also leave a symlink at /home/jenkins until we can fix pypi-mirror to resolve ~jenkins (its config has /home/jenkins hard-coded) and things like that18:02
jeblairfungi: oh, right, for the requirements change jobs?18:02
fungiand also i want to link the bug anteaya mentioned in the commit message, for thoroughness18:02
*** SumitNaiksatam has quit IRC18:02
fungiyep18:02
*** MarkAtwood has quit IRC18:03
fungiit's a to-do. also, jenkins masters think the workspace is in /home/jenkins... where do we fix that during slave registration (is it somewhere separate)?18:03
*** MarkAtwood has joined #openstack-infra18:03
*** dkliban_afk is now known as dkliban18:04
jeblairfungi: it might be a parameter nodepool can pass via the jenkins library...18:05
jeblairfungi: i'm wondering if it might be not too bad to just have the nodepool setup script mv+ln -s it?18:05
jeblairfungi: and leave it configured as /home.  just throwing it out there.  :)18:05
*** SergeyLukjanov_ is now known as SergeyLukjanov18:06
*** DinaBelova_ is now known as DinaBelova18:07
*** praneshp has joined #openstack-infra18:08
fungitrying to remember... i ran across something not long ago which generously replaced any symlink it found in the parentage of a managed path with a directory... don't recall whether that was puppet or jenkins18:08
fungioh, i know, it was the jenkins publisher18:08
fungiso not relevant18:08
*** MarkAtwood has quit IRC18:08
jog0pleia2: my calendar has something about a HP/Intel SF hackathon this weekend, do you know anything about that18:08
*** MarkAtwood has joined #openstack-infra18:09
fungijeblair: in jenkins::jenkinsuser i could wrap an if $home!=/home/jenkins block around a symlink file object18:09
jeblairmordred: what's the latest on the new jenkinses?18:09
* fungi thinks that might be safer18:10
clarkbgood morning. I am having an extremely slow start today18:10
fungiclarkb: how tcp of you18:10
*** harlowja_away is now known as harlowja18:10
jeblairfungi: wfm.  i don't have a strong feeling about it other than my suggestion was prompted by the fact that it seemed like we might be rounding a bend in the rabbit-hole to find more rabbit-hole.  :)18:10
clarkbSpamapS: it needs manual promotion; you can ask for it, though it isn't a gate issue, is it?18:11
SpamapSjeblair: ahh. Thanks. I had pinged clarkb about it last night and he asked me to ping him back today if it was still queued18:11
jeblairclarkb: good morning i have a brain dump ready for you regarding the window.  let me know when you are ready to receive.18:11
*** SumitNaiksatam has joined #openstack-infra18:12
*** luqas has quit IRC18:12
SpamapSclarkb: no, it is a "every CD heat user is screwed" issue. We can't test that issue in the gate because it requires spinning up a VM so it is only in our experimental checks.18:12
*** MarkAtwood has quit IRC18:12
SpamapSotherwise we'd have found it. :-P18:12
mordredjeblair: 5 is up and running - I'm about to finish setting up 6 and 718:12
SpamapSclarkb: I don't want to pull focus off the gate blockers though. So if those are still landing.. by all means we can wait more.18:13
anteayamordred \o/18:14
jeblairmordred: the css looks weird on 5; maybe puppet needs to run again?18:14
*** hashar has joined #openstack-infra18:15
*** starmer has quit IRC18:16
*** krotscheck has joined #openstack-infra18:16
jeblairSpamapS, clarkb: we don't want to be the gatekeepers for the project, so we have a pretty limited set of changes we'll promote: gate fixes, security issues, legal issues.  I'm not sure we should add CD issues to that list, as my understanding is that most CD systems are expected to have a mechanism to deal with this kind of temporary breakage, yeah?18:16
SpamapSjeblair: indeed, we're dealing.18:17
SpamapSand I was not asking for promotion, only looking for insight into what's going on.18:17
mordredjeblair: yeah. it's entirely possible - the "run puppet, delete user, install deb, run puppet, fix username/chown dir" process was not perfect18:18
SpamapSclarkb: ah I just realized, btw, that it actually did get a run last night, but failed and was reverified (bug 1268732 fyi) .. anyway thanks for the insight18:23
SpamapSjeblair: ty as well for explaining. :)18:23
*** coolsvap has quit IRC18:23
*** coolsvap has joined #openstack-infra18:24
*** gyee has quit IRC18:27
*** praneshp has quit IRC18:27
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Put ~jenkins in /opt on nodepool slaves  https://review.openstack.org/6870618:27
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Make jenkins homedir location configurable  https://review.openstack.org/6870518:27
fungiSpamapS: ^ yes, i think tempest runs for heat changes are possibly generating an unusually large volume of logs18:28
clarkbjeblair: ready18:29
clarkbfungi: there was another test that ran out of /var/cache/nova room18:29
fungiclarkb: ick18:29
fungiclarkb: perhaps /var/cache should also end up in /opt?18:30
fungior maybe we make an /opt/cache and have devstack-gate tell devstack to tell nova to use that?18:30
clarkbfungi: perhaps. or maybe we should remount a bunch of paths? /home /var /opt and so on18:30
fungiclarkb: remounting those may be hard without a reboot18:31
*** jcooley_ has quit IRC18:31
*** praneshp has joined #openstack-infra18:31
fungiwell, remounting /home is probably doable before the jenkins agent is running18:32
*** rakhmerov has joined #openstack-infra18:32
*** NikitaKonovalov_ is now known as NikitaKonovalov18:33
* fungi ponders that18:33
*** DinaBelova is now known as DinaBelova_18:33
*** andreaf has quit IRC18:34
*** branen has joined #openstack-infra18:34
*** mrmartin has joined #openstack-infra18:34
*** shardy is now known as shardy_afk18:35
openstackgerritDonald Stufft proposed a change to openstack-infra/config: Release python-barbicanclient via Zuul  https://review.openstack.org/6871918:35
*** SumitNaiksatam_ has joined #openstack-infra18:37
mordredjeblair: re-ran puppet - re-started jenkins18:37
jeblairmordred: lovely; probably needs jjb run manually18:38
*** smurugesan has joined #openstack-infra18:38
zarofungi: wiki says we are using puppet 2.6.x is that accurate?18:38
fungizaro: 2.7.x18:38
fungior is it 2.9.x... checking now18:39
jeblairmordred: also, i can't log in; hrm, i think there may still be something wrong... did i see clarkb say that the node list was dirty a while ago?  perhaps i pointed you at the wrong files18:39
fungizaro: 2.7.22 currently18:39
zarofungi: ok. i'll update this page https://wiki.openstack.org/wiki/Puppet-openstack18:40
fungizaro: thanks18:40
*** SumitNaiksatam has quit IRC18:40
*** SumitNaiksatam_ is now known as SumitNaiksatam18:40
jeblairzaro: i don't think that page has anything to do with us18:40
fungizaro: yeah, i just looked at it18:40
fungizaro: that's the openstack puppet community documentation18:41
funginot the openstack infra puppet documentation18:41
*** NikitaKonovalov is now known as NikitaKonovalov_18:41
*** fallenpegasus has joined #openstack-infra18:41
jeblairclarkb: so the optimization you're making is to not launch jobs prematurely18:42
jeblairclarkb: i think the key to addressing the issues you brought up is to make the feature in zuul stick closer to that goal18:42
jeblairclarkb: so rather than iterating over the window, iterate over the whole list18:42
jeblairclarkb: but then when examining each change, just bypass the job launch (and i suppose the merge operation as well) if the current change is outside the window18:43
jeblairclarkb: but otherwise, continue to exercise everything else in the queue processor18:43
clarkbjeblair: so push it down into _processOneItem()18:43
jeblairclarkb: yep.  this will not only cancel jobs and other cleanup, but should fix the subway map lines, and have future benefits like actually removing changes from the deep queue for things like non-mergeability18:44
*** jcooley_ has joined #openstack-infra18:45
jeblair(as in code-review-2 non-mergeability)18:45
jeblaironce we get that fixed18:45
clarkbyup18:45
clarkbI will start in that direction then18:45
jeblairclarkb: cool18:45
clarkbthough, mergeability checking is one thing that slowed zuul processing down18:45
clarkbI am not sure we want to build zuul refs for the entire tree the whole way down18:46
*** NikitaKonovalov_ is now known as NikitaKonovalov18:46
jeblairclarkb: yeah, i was proposing you also skip the merger operation18:47
clarkbok18:47
jeblairclarkb: double use of mergeability here -- we should not check to see if we can git-merge a change.  we should, later, fix the bug that prevents us from removing a change when it fails to satisfy gerrit-merge criteria.18:48
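[Editor's sketch] The approach agreed above (walk the whole queue, but skip the merge and job launch for changes outside the window) might look something like this. It is a simplified illustration under stated assumptions: `Item`, `process_queue`, and `still_valid` are hypothetical names, not zuul's real classes or its `_processOneItem()` implementation.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a queued change."""
    name: str
    jobs_launched: bool = False
    dequeued: bool = False


def process_queue(queue, window_size, still_valid=lambda item: True):
    """Visit every item; only launch jobs for items inside the window.

    Cleanup (here: dropping items that are no longer valid) is applied
    at any depth, so it is not limited to the window.
    """
    kept = []
    for item in queue:
        if not still_valid(item):
            item.dequeued = True  # removal works even deep in the queue
            continue
        in_window = len(kept) < window_size
        if in_window and not item.jobs_launched:
            # Stands in for the merge operation plus the job launch.
            item.jobs_launched = True
        kept.append(item)
    return kept
```

The point of the design is that window membership only gates the expensive steps; everything else in the queue processor still runs for every change.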
zarofungi: ok. i won't update it.  couldn't find any other openstack-infra puppet version info on the net.18:48
fungijeblair: clarkb: thinking about the ~jenkins move a bit more, the separate /opt filesystem doesn't get created until the job starts (setup_workspace calls out to a function to do that part), so using it in puppet is way too early, and we can't move it during the job because the slave agent will have open descriptors in it18:48
dstufftI'm guessing this is the right channel ;P If anyone can +2 https://review.openstack.org/#/c/68719/ that'd be awesomesauce18:48
clarkbfungi: oh right, hrm18:49
*** mgagne has quit IRC18:49
zarofungi: i do see that ci.o.o/puppet.html references puppet-dashboard.o.o  but that puppet-dashboard link is broken.18:49
jeblairfungi: gah; you're right, and we can't do it in image prep because, well, it's an ephemeral partition.18:49
fungizaro: http://ci.openstack.org/puppet.html18:49
fungizaro: "We have not yet migrated to puppet 3, so we pin puppet to 2.x."18:50
fungizaro: (that means "most recent 2.x puppet version")18:50
jeblairfungi: i think the people who decided this system is workable should fix it.  :|18:50
jeblairfungi: there's something about openstack tests failing because of this openstack deployment decision.18:51
fungijeblair: heh. yes, i agree. just trying to decide where a good place to fix it is18:51
zarofungi: how does that relate to the broken link?18:52
fungiin hpcloud and rackspace i guess ;)18:52
jeblairzaro: ask pleia2 and anteaya about dashboard status18:52
*** pballand has quit IRC18:52
fungizaro: i wasn't answering your question about puppet dashboard. i was pointing you to where our documentation implies the version of puppet we're running18:53
*** mgagne has joined #openstack-infra18:53
jeblairfungi: i am not immediately seeing a solution to this as long as we're running jenkins.18:53
jeblairor rather, as long as we are using it to push logs.18:53
fungijeblair: ahh, right, jenkins won't collect artifacts outside the workspace, right?18:54
jeblairfungi: i believe that's the case.18:54
*** mgagne1 has joined #openstack-infra18:55
anteayazaro the latest on the dashboard was we were working with hunner to bring puppetboard online18:55
jeblairfungi: that should probably be verified though18:55
*** pballand has joined #openstack-infra18:55
anteayabut he was having issues with the underlying db requirements for puppetboard, as best as I can remember18:55
zaroanteaya: probably should remove the broken link until it's available?18:55
anteayathen I got swallowed by neutron and haven't done anything with it since18:56
anteayaI can do that, the one in ci.openstack.org?18:56
zaroyes.18:56
anteayak18:56
*** sandywalsh has quit IRC18:56
openstackgerritJames E. Blair proposed a change to openstack-infra/zuul: Remove push refs to gerrit feature  https://review.openstack.org/6872318:57
zarofungi: had a hard time making the connection.   thanks.18:57
*** mgagne has quit IRC18:57
*** mgagne1 is now known as mgagne19:00
*** jcooley_ has quit IRC19:01
fungijeblair: i don't suppose we could add a step to the node launch to make earlier use of the ephemeral disk? (that would leave it less flexible for jobs wanting to do things with it though)19:01
fungiin which case, since puppet runs during the image build, we'd have to not rely on puppeting anything into there19:02
jeblairfungi: perhaps we don't have to care about jenkins open file descriptors19:03
fungiso probably connect to freshly built node, partition/format/mount the ephemeral disk, move /home/jenkins to it and leave a symlink behind, then register it with jenkins as a slave19:03
* fungi looks in the vicinity of the ssh test routine nodepool has19:04
jeblairfungi: yeah.19:04
openstackgerritAnita Kuno proposed a change to openstack-infra/config: Remove links to puppet dashboard  https://review.openstack.org/6872419:04
fungii'm thinking we want an optional nodepool script which can be uploaded and run after/during/as the ssh test19:05
jeblairfungi: yes.  i continue to hate that this is the best solution.19:06
jeblairfungi: can we explore whether we can get jenkins to upload artifacts from another location a bit more first?19:07
fungijeblair: however, this addresses clarkb's concern about people running devstack-gate inadvertently blowing away unrelated filesystems/block devices19:07
*** DinaBelova_ is now known as DinaBelova19:07
jeblairfungi: it makes the nodes less useful19:07
fungijeblair: sure, i'll have a look at tweaking the publisher macro19:07
openstackgerritClark Boylan proposed a change to openstack-infra/zuul: Allow zuul to cleanup jobs outside window  https://review.openstack.org/6872519:10
clarkbjeblair: ^ is a relatively simple change. I do have one concern about it that I will point out inline19:10
clarkbjeblair: and posted19:12
anteayajeblair: http://ci-puppetmaster.openstack.org/ doesn't resolve for me19:12
pleia2jog0: re: hackathon - yeah, I'm working it tomorrow and Sunday, Dave Neilson's thing, details: http://public.bemyapp.com/noblecodehackathon/19:12
clarkbanteaya: I don't think we have a web server there19:12
pleia2jog0: I was actually going to ask if you were helping out too, so I can give you our LCA bag ;)19:12
clarkbjeblair: fungi: we can probably update the SCP plugin to have an option allowing unsafe file copies19:13
anteayaclarkb: that would explain why it doesn't resolve to anything19:13
Hunneranteaya: :(19:13
anteayaHunner: hello19:13
anteayasorry I have had no time for you19:13
anteayathat has changed19:13
Hunneranteaya: Sorry that I haven't made any progress :/19:13
anteayawhen do you have time to lend a hand with puppety things19:13
anteayaHunner: now that we have apologized to each other :D19:14
jog0pleia2: I am going to do Saturday but I think I can swing by on friday night too19:14
anteayawhat now?19:14
anteayaHunner: would enjoy working with you to fix all the things19:14
anteayahow is your time?19:14
Hunneranteaya: I think the only thing left was to write the apache vhost definition... but the apache module version was so old that it made me sad :(19:15
jeblairanteaya: it's not a web server, but i certainly hope the host _resolves_.19:15
anteayaHunner: yes, I remember that being an issue, how can we address the sad19:15
Hunneranteaya: Oh, and I would have to update the masters to report to puppetdb19:15
jog0pleia2: cody got me in touch with Dave19:15
anteayajeblair: I'm confused, so should that link remain in docs or no?19:15
pleia2jog0: ah ok, great19:15
Hunneranteaya: I talked with clark about forking/updating the module, but I think that will just have to be a separate effort19:16
pleia2jog0: maybe we just do lunch some day, probably easier than me bringing bag to hackathon and hoping we cross paths anyway19:16
anteayagreat19:16
anteayaHunner: how would you like to proceed?19:16
pleia2anteaya: re: puppetmaster link, it shouldn't not resolve (typing: "host ci-puppetmaster.openstack.org" in terminal should work), it probably just doesn't *connect* in a web browser19:17
pleia2err it SHOULD resolve19:17
*** UtahDave has quit IRC19:18
*** ociuhandu has quit IRC19:18
jog0pleia2: I was thinking the same thing actually19:18
jog0how does tomorrow work for you?19:18
anteayapleia2: okay so if it doesn't connect to a web browser does it make sense to leave it in the docs as a clickable link?19:18
anteayashould it just have the url as non clickable?19:19
*** ruhe is now known as _ruhe19:19
jeblairanteaya: it's not a clickable link19:19
anteayaif it is non clickable I think it should be removed from the navigation19:19
pleia2anteaya: where?19:19
anteayaand stay in the puppet.html page as non clickable19:20
anteayaci.openstack.org19:20
anteayasee the navigation at the top?19:20
*** fallenpegasus has quit IRC19:20
anteayapuppet master19:20
pleia2anteaya: it's not clickable, "host" refers to which server it's on19:20
anteayaclick it19:20
*** fallenpegasus2 has joined #openstack-infra19:20
pleia2"host" does not mean it's a website19:20
* anteaya wonders if she is imagining things19:20
pleia2(which is why it's not clickable)19:20
*** dpyzhov has joined #openstack-infra19:20
pleia2puppet-dashboard link is clickable, because it should work19:20
anteayaam I the only one who sees puppet master as a clickable option in the navigation of ci.openstack.org?19:22
anteayaa clickable link, which we have just established is not meant to be clickable?19:22
*** vkozhukalov has quit IRC19:22
*** dpyzhov has quit IRC19:22
pleia2anteaya: oh oh, on the TOP of the page!19:22
anteayayes19:22
*** sandywalsh has joined #openstack-infra19:23
pleia2anteaya: ok yeah, I think that should go away and/or be replaced when board comes up19:23
*** pballand has quit IRC19:23
anteayaokay thanks, I will do that19:24
clarkbjeblair: fungi: any idea if zuul did a layout reload recently? (I just noticed that the window size appears to be 20 but logs indicate it should be ~3, layout reload would reset it to 20)19:24
*** salv-orlando has joined #openstack-infra19:24
jeblairclarkb: yes, i bumped the min to 1019:24
fungiclarkb: and before that i had bumped it to 6 and increased the increment to 219:25
clarkbjeblair: fungi: thanks19:25
pleia2jog0: tomorrow I'll be at the hackathon all day :) I'll send an email to organize something next week (should invite gothicmindfood too!)19:25
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: A test publisher to collect logs from /opt  https://review.openstack.org/6873219:26
fungirunning out to lunch for a bit19:26
jog0pleia2:  sounds good19:26
openstackgerritAnita Kuno proposed a change to openstack-infra/config: Remove link to puppet dashboard  https://review.openstack.org/6872419:27
*** ryanpetrello has quit IRC19:28
*** mrmartin has quit IRC19:28
clarkbjeblair: reading your comment on my change, is protecting process one item on an item that has been removed something that should go into my change?19:29
clarkbjeblair: wasn't clear to me if that is something I need to address or a general issue19:29
*** gsamfira has joined #openstack-infra19:30
jeblairoh sorry, general issue19:30
*** markwash_ has joined #openstack-infra19:30
*** harlowja is now known as harlowja_away19:31
Hunneranteaya: I'm asking my boss to get some work hours to put on this so I don't have to ask my wife ;)19:32
anteayathank you19:32
anteayamuch better option19:32
anteayado not ask your wife19:32
*** markwash has quit IRC19:32
*** markwash_ is now known as markwash19:32
openstackgerritJoão Vale proposed a change to openstack-infra/jenkins-job-builder: Add support for credentials-id in git repositories.  https://review.openstack.org/6873419:34
*** pballand has joined #openstack-infra19:35
lifelesshi infra people! We could use a hint on https://review.openstack.org/#/c/68645/19:38
lifeless /cluebat/ if you prefer19:38
*** NikitaKonovalov is now known as NikitaKonovalov_19:39
*** fallenpegasus2 has quit IRC19:40
jeblairlifeless: we believe the test failure there is caused by the root filesystem filling up.  for some reason, cloud deployers think tiny root filesystems are okay.  i'm not a fan of that.19:40
*** hogepodge has joined #openstack-infra19:41
jeblairlifeless: we're currently trying to work around that but it's rather difficult.19:41
lifelessjeblair: oh, I actually meant derekh's question about whether the change is semantically correct19:41
jeblairoh! hah!19:41
lifelesse.g. will it put more stuff in /opt/stack/new19:41
lifelessif the answer is 'yes but it breaks the root filesystem' then thats sad but still helpful!19:42
jeblairlifeless: no, it's 'yes and will not affect the root filesystem';  the git repos are cloned to the ephemeral disk19:42
*** sarob has joined #openstack-infra19:42
lifelessjeblair: ok cool. He had a follow nuance there - but hey, you can read his q yourself :)19:43
jeblairlifeless: (the difficult part is that /var/log and /home/jenkins can't be moved easily; both of those relate to logs)19:43
lifelessjeblair: heh, yeah - for the jenkins I ran in hpcloud I symlink those trees to the ephemeral disk19:44
*** NikitaKonovalov_ is now known as NikitaKonovalov19:44
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: protect from the case of not passing an event  https://review.openstack.org/6873719:45
*** praneshp has quit IRC19:45
openstackgerritA change was merged to openstack-infra/reviewday: Whitelist external lazr.authentication requirement  https://review.openstack.org/6502619:46
*** markmcclain has joined #openstack-infra19:46
openstackgerritA change was merged to openstack-infra/reviewday: Generate JSON  https://review.openstack.org/6447119:48
clarkbjeblair: that python26 fail seems consistent. I am digging into it now19:48
*** sarob has quit IRC19:49
*** NikitaKonovalov is now known as NikitaKonovalov_19:49
*** sarob has joined #openstack-infra19:50
jeblairclarkb: after lca lifeless helped me find that it was probably that the test_client_enqueue_negative test was timing out.  this doesn't make a lot of sense to me and i was unable to repro on my laptop.  however, i have not done so on a centos6 vm.19:51
openstackgerritA change was merged to openstack-infra/elastic-recheck: protect from the case of not passing an event  https://review.openstack.org/6873719:51
clarkbjeblair: interesting, worker-2 has a return code of -14019:51
jeblairclarkb: he indicated that process-return-code should fail in the case of a test timeout, and that the log should indicate a test start but not finish for the test in question19:52
openstackgerritMatthew Treinish proposed a change to openstack-infra/config: Add projects section to elastic recheck bot yaml  https://review.openstack.org/6874119:52
mtreinishjog0: ^^^19:52
*** gokrokve has joined #openstack-infra19:52
*** markmcclain has quit IRC19:53
*** markmcclain has joined #openstack-infra19:54
*** sarob has quit IRC19:54
*** kostabrava has joined #openstack-infra19:54
*** NikitaKonovalov_ is now known as NikitaKonovalov19:54
*** markmcclain has quit IRC19:55
*** kostabrava has quit IRC19:55
*** markmcclain has joined #openstack-infra19:55
*** sarob has joined #openstack-infra19:56
clarkbjeblair: in this case it looks like test_two_failed_changes_at_head is at fault19:56
lifelessjeblair: clarkb: -140 isn't the return code we synthesize for a timeout though19:56
lifelessjeblair: clarkb: so -140 suggests the backend crashed, to me19:56
jeblairlifeless: it's a hard timeout19:56
lifelessoh duh - right19:57
lifelessthe timer error code19:57
lifelessjeblair: clearly I'm not fully awake yet19:57
*** ryanpetrello has joined #openstack-infra19:57
*** dizquierdo has quit IRC19:57
*** sarob has quit IRC19:58
jeblairfungi: i wanted to get a jump on this, so i looked at the scp plugin code and did a manual test in jenkins...19:58
*** gokrokve has quit IRC19:58
*** sarob has joined #openstack-infra19:58
*** harlowja_away is now known as harlowja19:59
*** NikitaKonovalov is now known as NikitaKonovalov_19:59
*** ivar-lazzaro has quit IRC20:01
jeblairfungi: it does require that the source path be relative to the workspace20:01
jeblairfungi: but it will follow a symlink out of it20:01
lifelessjeblair: huh am I right that devstack-gate nodes have *two* caches of git repos? /opt/git/everything and ~/workspace-cache/ ?20:01
jeblairfungi: so we should be able to have d-g symlink $WORKSPACE/logs to /opt20:02
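[Editor's sketch] The symlink idea above (keep a `logs` path inside the workspace, but have it point at the larger disk) can be illustrated as follows. devstack-gate itself is shell; this Python version is only a sketch, and `link_logs` and its arguments are hypothetical names. It assumes the workspace does not already contain a real `logs` directory.

```python
import os


def link_logs(workspace, big_disk):
    """Create <workspace>/logs as a symlink to <big_disk>/logs.

    Assumption: any existing <workspace>/logs is either absent or
    already a symlink; a real directory there is not handled.
    """
    target = os.path.join(big_disk, "logs")
    os.makedirs(target, exist_ok=True)
    link = os.path.join(workspace, "logs")
    if os.path.islink(link):
        os.unlink(link)  # replace a stale link from an earlier run
    os.symlink(target, link)
    return link
```

Anything written under the workspace-relative `logs` path then lands on the other filesystem, while a publisher that insists on workspace-relative source paths still finds it.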
jeblairlifeless: yes, it's a work-in-progress to move all slaves to use /opt/git/everything, including devstack-gate20:03
*** praneshp has joined #openstack-infra20:03
*** sarob has quit IRC20:04
*** mrodden has quit IRC20:04
*** NikitaKonovalov_ is now known as NikitaKonovalov20:04
*** rfolco has quit IRC20:04
*** jcoufal has joined #openstack-infra20:04
*** dhellmann_ is now known as dhellmann20:04
*** rnirmal has quit IRC20:05
lifelessok20:05
lifelessso I think we should follow the d-g pattern right now and then help with that effort20:05
*** rnirmal has joined #openstack-infra20:07
*** mrodden has joined #openstack-infra20:08
clarkbjeblair: fungi: there is a recent window size decreased to 3 message for the main gate queue, on the next iteration through the list we should see that get reflected. I don't know why it didn't drop to 10 instead though20:08
clarkbbest guess is that window floor isn't getting picked up on layout reloads properly? I wonder if my TODO in the merge change queue function needs to be done20:09
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: add tests for loading the queries  https://review.openstack.org/6874520:10
sdagueclarkb: did you see earlier discussion to bring up the floor20:11
sdagueI think 3 is too low20:11
clarkbsdague: yes, it was brought up20:11
clarkbsdague: but the config doesn't seem to have stuck20:11
sdagueah, bummer20:11
clarkbsdague: I think 3 is plenty high, right now we are just failing at the head of the queue over and over and over20:12
clarkbno sense in testing more than one change imo20:12
*** pballand has quit IRC20:12
*** rnirmal has quit IRC20:12
sdaguewell it's nova changes20:12
sdaguethat have unit test bugs20:12
sdagueso we're going to just fail the next three changes most likely20:12
sdaguerealistically unit tests are resetting us in the gate more often than tempest right now20:13
bknudsonI think the keystone jobs are going to fail until https://review.openstack.org/#/c/68135/ is merged20:13
clarkbI think something is wonky with the way layout is reloaded, we seem to still have a relatively large window according to status20:13
bknudsonso might be a good idea to promote it?20:14
jeblairclarkb: here's why i don't think <10 is helpful -- we really want this stuff to merge as quickly as possible, and we definitely have the resources to run jobs for 10 changes.20:14
clarkbbknudson: that is a heat change?20:14
*** gokrokve has joined #openstack-infra20:14
sdaguejeblair: +120:14
bknudsonclarkb: https://review.openstack.org/#/c/68135/ is the heat change20:14
clarkbjeblair: I agree, but it doesn't make a difference when we are serialized20:14
clarkbbknudson: yes that is the change you linked20:14
sdaguetcp is really about avoiding errors, we are actually ok with a certain amount of errors20:15
sdaguethe big issue we have is overrunning our resources then effectively swapping, but floor of 10 would be fine20:15
jeblairclarkb: so even though right at this moment 3 or 1 is sufficient, as soon as we get past that, i'd like us to be better utilizing zuul's capability;  the cost of that is to waste a little now, and i think we can handle that easily.20:15
clarkbjeblair: that's fair, also not convinced my change did the correct thing after the zuul layout reload20:16
clarkbjeblair: does mergeChangeQueue need to handle window-floor and the other keys to make them carry through a reload properly?20:16
jeblairclarkb: yeah, i'll help look into that in a sec; have to warm up lunch now20:17
clarkbok20:17
clarkblooks like change_queues starts as empty list in buildChangeQueues, so I don't think the old change queues will affect a zuul reload. I should grab lunch too20:18
*** elasticio has joined #openstack-infra20:19
*** NikitaKonovalov is now known as NikitaKonovalov_20:23
mordredjeblair: all three new jenkins servers should now be up and running20:24
*** vipul is now known as vipul-away20:24
anteaya\o/20:24
mordredjeblair: now I believe the next step is to write an additional change to config that adds them to nodepool, right?20:25
*** wenlock has quit IRC20:26
*** gyee has joined #openstack-infra20:27
*** SergeyLukjanov is now known as SergeyLukjanov_20:28
*** NikitaKonovalov_ is now known as NikitaKonovalov20:28
*** yolanda_ has quit IRC20:29
*** markmcclain has quit IRC20:29
sdaguemordred: awesome sauce20:30
*** NikitaKonovalov is now known as NikitaKonovalov_20:31
*** jerryz has quit IRC20:31
openstackgerritMonty Taylor proposed a change to openstack-infra/config: Enable jenkins0[5-7] in nodepool  https://review.openstack.org/6875920:32
mordredjeblair, clarkb: ^^ there ya go20:32
*** jgrimm has quit IRC20:32
*** smarcet has left #openstack-infra20:32
*** oubiwann_ is now known as mr-typo20:33
*** mr-typo is now known as oubiwann-fn20:33
*** vipul-away is now known as vipul20:36
dstufftAlso let me ask for some reviews to https://review.openstack.org/#/c/68719/ please :]20:36
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Add query for bug 1270693  https://review.openstack.org/6876220:38
*** hogepodge_ has joined #openstack-infra20:39
*** pafuent has joined #openstack-infra20:40
*** vipul is now known as vipul-away20:40
sdagueclarkb: are you able to help me figure out why the cron bits apparently didn't work?20:40
sdaguefor elastic search20:40
sdagueelastic recheck20:40
sdaguehttp://status.openstack.org/elastic-recheck/data/20:40
clarkbsdague after lunch20:40
sdaguecool, thanks!20:41
*** rnirmal has joined #openstack-infra20:41
*** hogepodge has quit IRC20:41
*** hogepodge_ is now known as hogepodge20:41
*** mrda_away is now known as mrda20:42
mordredclarkb: the above patch is +2 from me20:43
*** coolsvap is now known as coolsvap_away20:44
*** ok_delta has joined #openstack-infra20:46
*** hogepodge has quit IRC20:46
*** SumitNaiksatam has quit IRC20:51
*** marun has quit IRC20:51
openstackgerritDolph Mathews proposed a change to openstack-infra/elastic-recheck: eliminate 14 false positives for bug 1268732  https://review.openstack.org/6870720:51
*** SumitNaiksatam has joined #openstack-infra20:51
*** marun has joined #openstack-infra20:52
*** pballand has joined #openstack-infra20:53
*** markmcclain has joined #openstack-infra20:54
jeblairmordred: i still can't log into jenkins0520:55
*** hogepodge has joined #openstack-infra20:56
openstackgerritSean Dague proposed a change to openstack-infra/elastic-recheck: stop being rediculous with our time formats  https://review.openstack.org/6876520:56
jeblairmordred: i think you may need to edit some xml files and s/jenkins.openstack.org/jenkins05.openstack.org/20:56
*** coolsvap_away has quit IRC20:56
mordredjeblair: oh weird. ... OH - I know what it is20:57
mordredyeah20:57
*** jgrimm has joined #openstack-infra20:59
openstackgerritA change was merged to openstack-infra/elastic-recheck: Add query for bug 1270693  https://review.openstack.org/6876221:00
mordredjeblair: jenkins.model.JenkinsLocationConfiguration.xml21:01
mordredjeblair: I believe we could add that to the puppet21:01
jeblairmordred: istr there are 2 places21:05
mordredjeblair: also in hudson.tasks.Mailer.xml21:05
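[Editor's sketch] The manual s/jenkins.openstack.org/jenkins05.openstack.org/ fix across the two XML files named above could be scripted along these lines. This is a hedged illustration only: `rewrite_hostname` is a hypothetical helper, the Jenkins home directory is passed in rather than assumed, and it does a plain text substitution without parsing the XML.

```python
from pathlib import Path

# The two files mentioned in the discussion above.
CONFIG_FILES = (
    "jenkins.model.JenkinsLocationConfiguration.xml",
    "hudson.tasks.Mailer.xml",
)


def rewrite_hostname(jenkins_home, old, new, names=CONFIG_FILES):
    """Replace `old` with `new` in each named file that exists.

    Returns the names of the files that were actually modified.
    """
    changed = []
    for name in names:
        path = Path(jenkins_home) / name
        if not path.is_file():
            continue  # tolerate a file that is missing on this master
        text = path.read_text()
        updated = text.replace(old, new)
        if updated != text:
            path.write_text(updated)
            changed.append(name)
    return changed
```

Running it a second time changes nothing, since the old hostname no longer appears in the files; as in the log, the service would still need a restart to pick up the new values.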
mordredjeblair: jenkins05 restarted21:06
jeblairclarkb: i understand the problem21:06
fungijeblair: okay, i'll ditch my test change and work up a simple d-g patch (shouldn't be more than a few lines)21:08
openstackgerritSabari Murugesan proposed a change to openstack/requirements: nova api validation fw requires jsonschema >= 2.0.0  https://review.openstack.org/6646421:08
*** wenlock has joined #openstack-infra21:08
*** DinaBelova is now known as DinaBelova_21:09
*** jgrimm has quit IRC21:09
*** dangers_away is now known as dangers21:10
pafuentHi. I want to add tests for Heat and I want to know if the ones tagged as slow are run by Jenkins when someone uploads a new patch.21:11
openstackgerritAaron Greengrass proposed a change to openstack-infra/config: Extend user module, add 'disable user'  https://review.openstack.org/6877121:11
*** afazekas has joined #openstack-infra21:12
*** turul_ has joined #openstack-infra21:12
*** turul_ has quit IRC21:12
fungipafuent: probably better to inquire in #openstack-qa21:14
*** pballand has quit IRC21:14
pafuentfungi: Ok, thanks21:15
*** mrodden has quit IRC21:15
*** mrodden has joined #openstack-infra21:16
*** markmcclain has quit IRC21:17
clarkbjeblair: woot21:18
clarkbsdague: back from lunch21:18
*** vipul-away is now known as vipul21:18
jeblairclarkb: just a sec and i'll have something21:18
sdagueclarkb: great, was also poking fungi on this. So the new bits to build the uncategorized.html should be putting it in /var/lib/elastic-recheck21:18
sdaguehowever, instead there is just the lockfile21:19
sdaguehttp://status.openstack.org/elastic-recheck/data/21:19
sdaguewhich makes me think something died weird, and now we're dead on the lock21:19
fungisdague: http://paste.openstack.org/show/61783/21:20
sdaguefungi: yep21:20
*** dkranz has quit IRC21:20
*** salv-orlando has quit IRC21:20
sdaguethat dir maps full to the world at http://status.openstack.org/elastic-recheck/data/21:20
clarkboh interesting21:20
fungisdague: also, we seem to still spawn a new daemon on every restart... http://paste.openstack.org/show/61784/21:21
openstackgerritDennyZhang proposed a change to openstack-infra/gitdm: update personal profile  https://review.openstack.org/6877221:22
sdaguefungi: ok, we should tackle that one as well21:22
clarkbfungi: is there anything I should poke at or do you have a handle on it?21:22
clarkbdon't want to get in the way21:22
openstackgerritA change was merged to openstack-infra/elastic-recheck: eliminate 14 false positives for bug 1268732  https://review.openstack.org/6870721:22
openstackgerritJames E. Blair proposed a change to openstack-infra/zuul: Don't store change_queue in QueueItem  https://review.openstack.org/6877321:23
*** pafuent has left #openstack-infra21:23
openstackgerritIvan Melnikov proposed a change to openstack-dev/hacking: Trigger warnings for raw and unicode docstrings  https://review.openstack.org/6877421:23
jeblairclarkb: https://review.openstack.org/68773  lemme know if that makes sense21:23
fungiclarkb: i can run this to ground. i'd rather you not get dragged from zuul patches21:23
clarkbjeblair: looking21:23
clarkbjeblair: gah! good catch21:24
* fungi swears profusely at the flaky internet here21:24
*** marun has quit IRC21:24
*** ok_delta has quit IRC21:25
jeblairclarkb: so short version is that it was updating defunct change queues; thus the disconnect between logs and reality21:25
mgagneAnyone familiar with hpcloud? I have questions regarding the concept of API keys and what they should be used for.21:25
clarkbjeblair: ya, I figured it out just from your commit message21:25
*** sarob has joined #openstack-infra21:25
*** alexpilotti has quit IRC21:25
clarkbmgagne: rolling credential expiry iirc21:25
clarkbmgagne: you may also be able to restrict access of particular keys21:26
*** alexpilotti has joined #openstack-infra21:26
mgagneclarkb: lets say I use the nova client, how do I feed it the api key?21:26
clarkbmgagne: that I don't know21:26
clarkbmordred: ^21:26
*** julim has quit IRC21:26
*** marun has joined #openstack-infra21:27
openstackgerritIvan Melnikov proposed a change to openstack-dev/hacking: Trigger warnings for raw and unicode docstrings  https://review.openstack.org/6877421:28
openstackgerritAaron Greengrass proposed a change to openstack-infra/config: Extend user creation with more granularity  https://review.openstack.org/6877621:28
fungisdague: http://paste.openstack.org/show/61785/ (looks like run_er_uncat and run_er_graph are broken)21:29
sdaguefungi: interesting....21:30
openstackgerritMatthew Treinish proposed a change to openstack-infra/elastic-recheck: Add basic unit tests for the bot  https://review.openstack.org/6877821:30
fungii'm having a look at it now to see whether i have suggestions21:30
mattoliverauMorning all21:30
sdaguebecause cd isn't a command.... right21:30
sdagueand it's an exec and not bash21:30
*** alexpilotti has quit IRC21:30
fungisdague: https://git.openstack.org/cgit/openstack-infra/config/tree/modules/elastic_recheck/manifests/init.pp#n4921:31
fungiyeah, there's a separate puppet option to set cwd21:31
* fungi gets an example21:31
anteayamattoliverau: morning21:32
anteayahow are the boxes today?21:32
*** MarkAtwood has joined #openstack-infra21:32
sdagueright, cwd21:32
sdaguelet me fix21:32
*** fallenpegasus has joined #openstack-infra21:32
*** fallenpegasus has quit IRC21:32
*** markmcclain has joined #openstack-infra21:33
fungisdague: example... https://git.openstack.org/cgit/openstack-infra/config/tree/modules/kibana/manifests/init.pp#n6021:33
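The cwd point above (a command run without a shell cannot start with `cd`, so the working directory has to be passed as a separate option) has a close analogue in Python's subprocess module. This is only a sketch of the idea, not the puppet manifest under discussion; `run_in_dir` and `child_cwd` are hypothetical helper names.

```python
import os
import subprocess
import sys
import tempfile


def run_in_dir(argv, workdir):
    """Run argv without a shell, starting in workdir, and return its stdout."""
    # No shell is involved, so "cd somewhere; cmd" is not available as a
    # command line. The working directory is a separate option instead.
    completed = subprocess.run(
        argv, cwd=workdir, check=True, capture_output=True, text=True
    )
    return completed.stdout.strip()


def child_cwd(workdir):
    """Ask a child Python process which directory it was started in."""
    return run_in_dir(
        [sys.executable, "-c", "import os; print(os.getcwd())"], workdir
    )
```

Calling `child_cwd(some_dir)` should report `some_dir` (possibly with symlinks resolved), which is the behaviour the `cwd` option is meant to give.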
jheskethMorning21:33
clarkbhowdy21:33
jeblairhashar: i'd like to draw your attention to this proposed change: https://review.openstack.org/#/c/68723/21:34
openstackgerritAaron Greengrass proposed a change to openstack-infra/config: Move o.o user creation to it's own manifest.  https://review.openstack.org/6877921:34
mattoliverauanteaya: becoming fewer and fewer... still too many for my liking tho :P21:34
sdaguefungi: ... but why wouldn't the cron part work?21:34
sdaguethat's just the retrigger on git update21:35
mattoliverauanteaya: had to buy a new fridge and washing machine yesterday, turns out they didn't fare too well in storage over the last year :(21:35
fungisdague: i'll check the mail for the recheck user21:35
sdagueoh... I probably need a && there21:35
sdagueit was a semicolon instead21:35
fungisdague: no mail for recheck since the 18th21:36
sdagueor.... it helps to call the right command.... :)21:36
* fungi nods. always a good policy21:36
openstackgerritSean Dague proposed a change to openstack-infra/config: fix er_run commands  https://review.openstack.org/6878021:37
sdague"you can increase your chances of success by knowing what you are doing"21:37
hasharjeblair: hello :)21:37
anteayajhesketh: morning21:38
clarkbsdague: we can't rely on chance?21:38
hasharjeblair: will comment on change that removes the push_ref to gerrit . We don't use it at wikimedia :-]21:38
anteayamattoliverau: nooooo21:38
sdagueclarkb: I think we need more than 10k monkeys for that21:38
*** odyi has joined #openstack-infra21:38
anteayaI hope you like the new appliances though21:38
jeblairhashar: excellent.  good choice.  :)21:39
*** jhesketh_ has quit IRC21:39
clarkbmordred: if you are around I am happy to pay attention to enabling more jenkins masters21:39
clarkbmordred: is that something you had planned to watch go in?21:39
mikalSo... riddle me this batman.21:41
mikalThe zuul test chain for gate21:41
mikalIt shows patches from a minute ago in the same chain as ones from 24 hours ago21:41
mikalAll succeeding21:41
mattoliverauanteaya: yeah was annoying to get the old appliances up here, then they don't work :( But new ones arrive today and will actually be able to keep things cold.. so that's exciting :)21:41
mikalDoes this mean zuul flushes and restarts every time something gets added to the chain?21:42
mikalCause that can't be right21:42
clarkbmikal: things just get appended to the end21:42
mikalclarkb: but that means that line doesn't indicate things currently attempting to merge?21:42
mikalclarkb: or is the merge a continuous process until a flush happens?21:43
clarkbmikal: only the top (head) of the queue can merge21:43
clarkbmikal: as each of those is consumed zuul makes decisions on whether or not to merge based on test results21:43
mikalOh, so it merges the top one, and then if the second one has passed tests merges it, etc etc?21:43
clarkbyup21:43
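The exchange above (only the head of the queue can merge, and each change behind it merges in turn once its tests have passed) can be sketched as a toy model. This is a simplified illustration, not Zuul's actual code; `process_gate` and the shape of `results` are assumptions made for the example.

```python
from collections import deque


def process_gate(queue, results):
    """Merge changes from the head of the queue while their tests pass.

    queue:   deque of change ids, oldest (head) first.
    results: dict mapping change id -> True (passed) or False (failed);
             a missing entry means the tests are still running.
    Returns (merged, dropped) and stops at the first change still running.
    """
    merged, dropped = [], []
    while queue:
        head = queue[0]
        outcome = results.get(head)
        if outcome is None:
            break  # head is still testing; nothing behind it may merge yet
        queue.popleft()
        if outcome:
            merged.append(head)
        else:
            dropped.append(head)
            # Changes behind were tested on top of the failed change, so
            # their speculative results are discarded (a "gate reset").
            for change in queue:
                results.pop(change, None)
    return merged, dropped
```

In this model a change that was appended a minute ago can sit in the same chain as one from a day ago: appending never restarts anything, and only a failure ahead of a change invalidates its results.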
*** dprince has quit IRC21:43
anteayamattoliverau: keeping cold things cold is a good quality in a refrigerator, good choice21:44
*** praneshp has quit IRC21:44
mikalAnd those test results are just from millions of instances speculatively testing?21:44
jeblairmordred, clarkb: i think the new jenkins masters still don't have jobs; run jjb manually?21:44
clarkbjeblair: I will trigger jjb on 0521:45
mordredclarkb: I'm in meetings for the next couple of hours - but I'm happy if we wait and I can watch it after21:45
*** sarob has quit IRC21:47
*** sarob has joined #openstack-infra21:47
*** senk has joined #openstack-infra21:48
*** sarob_ has joined #openstack-infra21:48
*** praneshp has joined #openstack-infra21:49
clarkbmordred: ok, I will run JJB on the nodes now then21:49
clarkbauthentication failed ;(21:50
jeblairclarkb: yeah, i was trying to find out from mordred if i had pointed him at the wrong file21:51
jeblairclarkb: didn't you say something like 'the node list was dirty'?  if so, that may indicate so.21:51
*** sarob has quit IRC21:51
clarkbjeblair: yes the slave list was dirty21:51
clarkbjeblair: the users/gerrig/config.xml matches jenkins04 though21:51
clarkbguessing the trouble is elsewhere in the config21:52
jeblairclarkb: but do the secrets?21:52
clarkbjeblair: the secret.key files do not match21:52
*** melwitt has joined #openstack-infra21:52
*** pballand has joined #openstack-infra21:53
jeblairclarkb: why don't i find the correct files and finish correctly documenting this process.21:53
clarkb++21:53
openstackgerritA change was merged to openstack-infra/zuul: Allow pipelines triggers to filter by username  https://review.openstack.org/6421921:54
jeblairseeing as how it's not 3am and i'm not in au, i might finish it this time.21:54
sdaguefungi: https://review.openstack.org/68780 - clean zuul results21:54
sdaguethat should get us working on the er pages21:54
*** whoops has joined #openstack-infra21:55
fungisdague: okie dokie21:55
openstackgerritJeremy Stanley proposed a change to openstack-infra/devstack-gate: Keep logs in $BASE instead of $WORKSPACE  https://review.openstack.org/6878221:55
*** mfer has quit IRC21:55
sdaguejeblair: how do you feel about changing the stacking order on the node graph?21:56
sdaguehttp://goo.gl/NzQCxl21:57
*** sarob_ has quit IRC21:57
jeblairclarkb, mordred: ah, the secrets tarball doesn't have the right root.21:57
clarkbjeblair: I have a small update to my zuul change that I want to push (allows for disabling rate limiting on pipelines) and am writing up docs too21:58
clarkbFYI before things get merged :)21:58
jeblairsdague: i prefer the current order -- it's designed to produce a green line of ready nodes in a stable situation; you can see that on the current graph but not your revised one21:59
*** dkliban is now known as dkliban_afk22:00
sdaguejeblair: yeh, I was thinking that the most important thing is to know what our throughput is22:00
sdagueand it's a little hard to eyeball it22:00
*** hashar has quit IRC22:00
*** dkranz has joined #openstack-infra22:00
jeblairsdague: i believe we can see the relative amount and estimate the value with the area as-is, but i agree that it is difficult to read an exact value in that case.22:01
*** elasticio has quit IRC22:02
jeblairsdague: perhaps adding current-value numbers to the legend would help?22:02
*** praneshp has quit IRC22:02
jeblairsdague: or if you wanted to duplicate the graph elsewhere with more detail (maybe lines instead of stacking area), that's an option22:02
*** sarob has joined #openstack-infra22:03
*** praneshp has joined #openstack-infra22:03
jeblairsdague: i think the quick-glance overview is the most useful aspect of the graphs on that page (i usually create larger versions of them if i want to really study them)22:03
openstackgerritA change was merged to openstack-infra/config: fix er_run commands  https://review.openstack.org/6878022:04
*** mfink has quit IRC22:04
*** thomasem has quit IRC22:05
*** krtaylor has quit IRC22:09
openstackgerritClark Boylan proposed a change to openstack-infra/zuul: Document zuul rate limiting configuration  https://review.openstack.org/6878822:09
openstackgerritClark Boylan proposed a change to openstack-infra/zuul: Allow zuul to cleanup jobs outside window  https://review.openstack.org/6872522:09
clarkbjeblair: ^ there we go22:09
*** jhesketh__ has joined #openstack-infra22:10
openstackgerritA change was merged to openstack-infra/config: Release python-barbicanclient via Zuul  https://review.openstack.org/6871922:11
jeblairclarkb, mordred, fungi: corrected tarballs and instructions placed in jenkins.o.o:~root/bootstrap22:11
dstufftfungi: mordred thanks22:11
jeblairclarkb, mordred: i also configured puppet to start on 5-7 and started it22:12
jeblairclarkb: want to try jjb again now?22:12
*** gothicmindfood has joined #openstack-infra22:12
clarkbjeblair: sure, do I need to apply the new tarball or did you correct 05?22:12
jeblairclarkb: i corrected all 3 and restarted jenkins22:12
clarkbjeblair: thanks22:12
dstufftnow my next question :) is it possible to trigger a release job for a tag that was already pushed?22:12
*** fifieldt has quit IRC22:17
fungidstufft: yes, i can do that. what tag?22:17
dstufftfungi: 2.0.022:18
dstufftI didn't realize some work had to be done before I pushed the tag :]22:18
fungidstufft: will do. gimme a few minutes to manually trigger the job22:18
clarkbjeblair: I am just about through all of your zuul changes. Do we want to merge a bunch of code and do one zuul restart or put them in more slowly?22:18
dstufftfungi: np, no hurry either, thanks a ton :)22:18
fungidstufft: and no worries. first time always needs testing anyway ;)22:18
clarkbjeblair: also still getting auth errors on jenkins05, digging into that22:18
openstackgerritA change was merged to openstack-infra/zuul: Don't store change_queue in QueueItem  https://review.openstack.org/6877322:19
clarkbjeblair: the secret.key contents are still different22:19
jeblairclarkb: all at once! :)22:20
jeblairclarkb: i'll dig into it again22:20
*** marun has quit IRC22:20
jeblair(key)22:20
*** DennyZhang has joined #openstack-infra22:21
*** nati_ueno has quit IRC22:21
*** morganfainberg is now known as morganfainberg|z22:21
*** salv-orlando has joined #openstack-infra22:22
openstackgerritA change was merged to openstack-infra/zuul: Allow zuul to cleanup jobs outside window  https://review.openstack.org/6872522:23
*** slong has quit IRC22:29
sdaguejeblair: so what exactly is going on in the current zuul picture22:30
sdagueI'm kind of confused :)22:30
jeblairsdague: you mean the subway map for gate?22:31
clarkb68725 should fix that22:34
*** mfink has joined #openstack-infra22:35
openstackgerritMatt Ray proposed a change to openstack-infra/config: Create new Chef cookbook-openstack-integration-test for Tempest support.  https://review.openstack.org/6879122:36
sdaguejeblair: yeh22:38
*** mriedem has quit IRC22:39
jeblairsdague: as clarkb mentioned, 68725 should fix it, but basically it's a side effect of the window; the nnfi change reparenting isn't being run on changes outside the window, so the graph is incorrect for changes outside the window22:40
jeblairsdague: (and sometimes within the window, depending on where it's trying to draw the lines)22:40
sdaguegotcha22:40
jeblairsdague: anyway, the change going in causes us to calculate the map all the time (it's fast), so it'll look right soon22:40
sdaguecoolio22:40
*** ivar-lazzaro has joined #openstack-infra22:40
*** dcramer__ has quit IRC22:42
*** esker has joined #openstack-infra22:43
openstackgerritJoshua Hesketh proposed a change to openstack-infra/zuul: Allow workers to send back metadata  https://review.openstack.org/6617322:45
*** sandywalsh has quit IRC22:45
jeblairclarkb: okay, 05 api key for gerrig matches now; updated docs and tarballs22:47
jeblairclarkb: i'll fix 06 and 7, you want to try jjb on 5?22:47
*** dangers is now known as dangers_away22:47
openstackgerritClark Boylan proposed a change to openstack-infra/zuul: Report queue window in status JSON.  https://review.openstack.org/6879222:48
clarkbjeblair: yup running JJB again22:48
clarkbjeblair: ^ that change should allow us to make pretty status pages with window info22:49
clarkbJJB is applying jobs now22:49
*** DennyZhang has quit IRC22:50
clarkbjeblair: was the tarball still stale?22:51
jeblairclarkb: i think it was always missing secret.key, so i updated it22:52
jeblairclarkb: 6 and 7 should be gtg22:53
openstackgerritMatt Ray proposed a change to openstack-infra/config: Create new Chef cookbook-openstack-integration-test for Tempest support.  https://review.openstack.org/6879122:53
clarkbjeblair: ok, running JJB there as well22:53
*** jasondotstar has quit IRC22:55
jeblairclarkb: i think it would be more useful to have the status.json include a field for each change indicating whether it was active or not so we could put a different color dot beside it.22:55
*** sarob has quit IRC22:55
dstufftfungi: looks like the release thing worked22:56
jeblairclarkb: (rather than duplicating the in-window logic in the js)22:56
dstufftfungi: thanks again22:56
clarkbjeblair: hmm good point22:56
dstufftfungi: also, no whl? :[22:56
clarkbjeblair: is that a flag that should be assigned by _processOneItem?22:56
jeblairclarkb: (optionally also continue to do what you did)22:56
jeblairclarkb: seems easiest to me22:56
*** markmcclain has quit IRC22:57
clarkbI don't think it will be too bad to toggle a flag in _processOneItem22:58
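The suggestion above (have the status JSON carry a per-change flag saying whether the change is inside the active window, so the page does not have to repeat the in-window logic in JavaScript) might look roughly like this. The field names `id` and `active`, the function name, and treating a window of 0 as "no limit" are all assumptions for illustration, not Zuul's real status format.

```python
def annotate_active(changes, window):
    """Return one status entry per queued change with an 'active' flag.

    changes: change ids in queue order, head first.
    window:  number of changes at the head that are actively tested;
             0 is treated here as "no limit" (an assumption).
    """
    return [
        {"id": change, "active": window == 0 or index < window}
        for index, change in enumerate(changes)
    ]
```

A status page could then pick the dot colour straight from `entry["active"]` without knowing anything about how the window is computed.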
*** esker has quit IRC22:59
*** salv-orlando_ has joined #openstack-infra23:00
openstackgerritJames E. Blair proposed a change to openstack-infra/zuul: Add require-approval to Gerrit trigger  https://review.openstack.org/6851623:00
fungidstufft: we don't have wheel building and uploading automated yet. bug mordred ;)23:01
dstufftfungi: oh man23:01
dstufftmordred: is there something I can do to help with ^^23:01
dstufftI wants some whl :]23:01
lifelessfungi: so fedora images aren't building I presume; where can I see the logs for nodepool w.r.t. that?23:01
*** sarob has joined #openstack-infra23:02
fungilifeless: they're local to the nodepool server. i'll have a look23:02
jeblairclarkb: fungi: https://review.openstack.org/#/c/68516/ is a rebase with conflicts; i've re-reviewed but wouldn't mind a once over before i aprv again23:02
jeblairlifeless: i'd love to fix that; we could probably put them in a different dir and serve them via apache for starters23:02
*** sarob has quit IRC23:03
jeblairlifeless: (and then there's certainly better things we could do after that)23:03
*** salv-orlando has quit IRC23:03
*** salv-orlando_ is now known as salv-orlando23:03
clarkbjeblair: ok will look in a minute23:03
*** gsamfira has quit IRC23:04
*** thuc has quit IRC23:04
*** thuc has joined #openstack-infra23:04
*** changbl has quit IRC23:05
clarkbjeblair: was the conflict around the username filters?23:05
jog0is it possible to get russellb's patch https://review.openstack.org/#/c/68727/ promoted23:06
torgomaticI've got a change (https://review.openstack.org/67920) that's failing the Swift functional tests, but I don't see any proxy logs or anything here: http://logs.openstack.org/20/67920/3/check/check-swift-dsvm-functional/92f11ae/23:06
sdagueyeh +1 to promote on 6872723:06
torgomatichow might I go about finding out what's failed?23:06
jog0it's the top gate reset right now23:06
sdaguewhere's the other unit test fix?23:07
jog0sdague: https://review.openstack.org/#/c/68768/23:07
clarkbtorgomatic: unfortunately that is all you will get from that run because the job timed out and was forcefully killed before the logs could be grabbed23:07
sdagueyeh, so on next gate reset 68727, and 68768 should go up23:07
sdagueand I might have 2 more as well, the nova v3 xml removes from tempest, which will give us back time in all the runs23:08
torgomaticclarkb: okay, so I should focus on making the functional tests better about timing out, and then I'll get logs?23:08
jog0sdague: I just +Aed the second patch23:08
jeblairsdague: i think we may be getting close to a zuul restart.  so maybe we can roll all 4 into that23:09
*** miqui has left #openstack-infra23:09
sdaguejeblair: sure, let me go get the +2s on the tempest patches23:09
*** thuc has quit IRC23:09
jog0russellb: ^^23:09
clarkbtorgomatic: yeah, there is a mechanism to set a timeout in devstack gate which will do a soft timeout before the hard jenkins timeout. the tempest jobs use it. I bet we just need to set the same timeout variable for devstack gate in the swift functional tests23:09
clarkbtorgomatic: or, if that is already set figure out why the soft timeout failed23:09
torgomaticclarkb: thanks; I'll go investigate that.23:10
openstackgerritA change was merged to openstack-infra/zuul: Add require-approval to Gerrit trigger  https://review.openstack.org/6851623:11
clarkbnew jenkinses are all JJB'd23:11
anteayawoot23:12
fungiclarkb: so we're ready to approve https://review.openstack.org/68759 ?23:12
*** burt1 has quit IRC23:12
jeblairhrm, that change adds the new servers but does not redistribute the min-ready across old ones23:12
jeblairso we'll end up with more ready nodes than currently23:12
*** sarob has joined #openstack-infra23:12
jeblairanother 15 bare-precise and another 30 devstack-precise23:13
jeblair(and another 15 devstack-precise-check from hp region b)23:13
*** afazekas has quit IRC23:13
fungioh, good point23:14
fungiso we might want to roughly halve them all23:14
openstackgerritJoshua Hesketh proposed a change to openstack-infra/zuul: Allow workers to send back metadata  https://review.openstack.org/6617323:14
*** dims has quit IRC23:15
jeblairfungi: yeah.  want to revise it?  i think basically the goal should be to have enough ready nodes to start several nova jobs at once.23:15
fungiwill do23:15
openstackgerritJoe Gordon proposed a change to openstack-infra/elastic-recheck: Wait only 5 minutes for ES to have the data  https://review.openstack.org/6879923:16
*** jhesketh__ has quit IRC23:17
sdaguejog0: really?23:17
sdagueI thought 13 was fair23:17
*** sarob has quit IRC23:17
sdague5 seems really pushing it23:17
jog0sdague: so I can't prove it but I think we are losing gerrit events23:17
jeblairclarkb: think you'll have the status change ready soon?  if so we can wait for it, otherwise we're about ready to go23:18
openstackgerritClark Boylan proposed a change to openstack-infra/zuul: Report queue window in status JSON.  https://review.openstack.org/6879223:18
jog0and I am *guessing* it's related to the 13 minute timeout23:18
jeblairha!23:18
sdaguejog0: why?23:18
clarkbjeblair: :)23:18
sdagueso the inner gerrit loop actually means we get them all on our side23:18
jog0sdague: I think I may be wrong about losing events23:18
jog0but when is the last time we got a log after 10 minutes?23:18
sdagueI think the thing to do is actually see about instrumenting gerritlib to figure out how many unprocessed events are there23:18
jog0FWIW feel free to -2 this patch, see the commit message for the main why23:19
clarkbjeblair: I am providing both chunks of data as window length seems potentially useful for other rendering23:19
sdagueso I agree that we'll back up if ES is losing data23:19
*** jcoufal has quit IRC23:19
sdaguebut I think that given the fix for that is coming soon, I'm less concerned on optimizing that23:19
jeblairclarkb: yep, sounds good, except i have a -1 for that change.  :(23:19
jog0sdague: I only wrote that because I am running e-r bot locally23:20
jog0and waiting 13 minutes is just silly23:20
jog0when I know it won't work23:20
*** DennyZhang has joined #openstack-infra23:20
sdaguehmmm, just saw a weird reset event23:20
clarkbjeblair: :( looking23:20
sdagueso everything just reset23:21
jog0sdague: time to do russellb's patches and xml v3?23:21
sdagueI don't have those +2s yet, I wonder if mtreinish drove home23:21
clarkbjeblair: good point. I have tox running now on that switch23:22
mordreddstufft: https://review.openstack.org/#/c/56760/23:22
jog0sdague: at least nova patches are ready23:22
jeblairsdague, jog0: you are still triggering er bot on comments left in gerrit, right?23:22
jog0jeblair: correct23:22
sdagueyep23:22
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Enable jenkins0[5-7] in nodepool  https://review.openstack.org/6875923:23
jog0jeblair: btw sdague and I were hoping to get https://review.openstack.org/#/c/68768/ and https://review.openstack.org/#/c/68727/ promoted23:23
jog0both are nova unit test fixes for gate bugs23:23
jeblairsdague, jog0: and this is the only determination that the data are ready right?  so you get a comment from jenkins via gerrit and then wait 13(or 5) mins for an e-r query to return data for that change?23:24
openstackgerritClark Boylan proposed a change to openstack-infra/zuul: Report queue window in status JSON.  https://review.openstack.org/6879223:24
clarkband fixed23:24
jog0jeblair: correct23:24
sdaguejeblair: we get the gerrit event, then we start polling ES for consoles23:24
jeblairjog0: i know, i was talking to sdague about that earlier, we may roll that plus the xml changes in with the impending zuul restart23:24
jog0we have several queries (one for console.html) and one for other files23:25
sdaguejeblair: yeh, it's a poll loop, with a 40s delay between polls23:25
jog0jeblair: ack I got mixed up because there was a gate reset just now23:25
jeblairsdague, jog0: so there's a queue of changes for logstash to process.  i'm not sure we make any guarantees about how fast it is.  we want it to be fast, but many minutes does not seem unreasonable.  it occasionally has been hours (which is unreasonable).23:26
sdaguetypically ES is ready sometime between first and fifth poll23:26
*** DennyZhang has quit IRC23:26
sdaguejeblair: our experience is that if it doesn't show up in that 13 minute period, it will never show up23:26
sdagueat least recently23:26
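The polling described above (get the gerrit event, then poll ES every 40 seconds and give up after about 13 minutes) can be sketched as a generic poll-with-deadline loop. This is not the elastic-recheck bot's code; `wait_for_results` and `query` are hypothetical names, and the clock and sleep functions are injectable only so the loop can be exercised without real waiting.

```python
import time


def wait_for_results(query, timeout=13 * 60, interval=40,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll query() until it returns something truthy or timeout elapses.

    Returns the query result, or None if nothing showed up in time.
    """
    deadline = clock() + timeout
    while True:
        result = query()
        if result:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never sleep past the deadline, so the last poll lands on it.
        sleep(min(interval, remaining))
```

Making `timeout` a parameter is what allows a short value for local testing and a more generous one in production, as suggested later in the discussion.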
jog0this is mostly from the missing console.html files23:27
*** zaro has quit IRC23:27
clarkbyup we tend to not be that slow but there is the scp race23:27
jog0for example we just lost console.html for check-tempest-dsvm-full 66921,5,7183f6123:27
sdagueyeh, after we get the console, we have another delay loop to get the rest of the files23:27
jog0where last 7 digits are short build_uuid23:27
clarkbjog0: yes, there are cases where our scp plugin won't have the console log copied before logstash machinery tries to get it resulting in a 40423:27
sdaguewe don't assume the console being indexed means we have everything else yet23:27
clarkbjenkins04 05 06 07 have the fix in place23:27
clarkbbut not the other jenkinses23:27
jeblairsdague: okay.  yeah, i don't think you should optimize for that error, but i would encourage you to allow 20-30 minutes in a real production deployment (maybe make an option for local testing)23:28
*** dstanek has quit IRC23:29
jeblairsdague jog0: because if that queue gets backed up, it's not good, but it's not worth throwing out e-r results.23:29
*** hashar has joined #openstack-infra23:29
jeblairsdague jog0: at least, assuming that wait doesn't block other changes.23:29
*** DennyZhang has joined #openstack-infra23:29
jog0jeblair: thats the part we aren't very sure about23:30
jog0we *may* be losing gerrit events, but that could be red herrings23:30
*** dims has joined #openstack-infra23:30
jog0jeblair: ++ to config option23:30
openstackgerritJeremy Stanley proposed a change to openstack-infra/config: Enable jenkins0[5-7] in nodepool  https://review.openstack.org/6875923:30
jog0I'll revise my patch to make it a config option23:31
sdaguejeblair: so we only lose it for the recheck comments23:31
sdagueand in reality, if the system is just way backed up23:31
*** zaro has joined #openstack-infra23:31
sdaguewe're going to cascade up delays over time23:32
sdagueso end up just running well behind anyway23:32
sdaguethe way it goes lossy is actually kind of reasonable23:32
clarkbso one way to deal with cascade delays is keep track of some actual delay since event arrived23:33
sdagueyeh, that was the instrument gerrit lib idea23:33
jog0sdague: agreed  its not idea but not the biggest issue23:33
clarkbthat way if you spend 13 minutes waiting for event 1 when you get to event 2 that came in 2 minutes after event1 you only delay for two minutes23:33
jog0ideal*23:33
jeblairsdague: i don't think you should wait on one event at a time; wait for all the ones you have received up until a generous timeout.23:33
clarkbeventually you should reach a point where you can process the majority of the queue without delay23:33
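The idea above (charge each event only for the time left since it arrived, so a long wait on one event does not cascade onto the next) comes down to a small piece of arithmetic. A minimal sketch, assuming each event records its arrival time in seconds; `remaining_wait` is a hypothetical helper, not part of gerritlib or elastic-recheck.

```python
def remaining_wait(arrived_at, now, budget=13 * 60):
    """Seconds still worth waiting for an event that arrived at arrived_at.

    budget is the total time allowed per event, counted from its arrival
    rather than from the moment the bot starts looking at it.
    """
    return max(0.0, budget - (now - arrived_at))
```

With the numbers from the discussion: if event 1 arrived at t=0 and used its full 13 minutes, then at t=780 an event that arrived at t=120 has `remaining_wait(120, 780)` seconds left, which is 120, the two minutes mentioned above.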
sdaguejeblair: well, that's a lot more complex23:34
sdaguebecause right now we're just processing them as a fifo23:34
*** dizquierdo has joined #openstack-infra23:34
sdaguethrowing away everything we don't care about, and passing up what we do care about, or timing out23:34
jeblairsdague: yeah, but the thing is that you can't tell whether you haven't received the data you want because we're backed up or because we lost it; you want to do opposite things in those two cases (wait a long time / don't wait at all)23:35
sdagueyeh, sure23:35
sdaguebut that's only one part of the system23:35
jeblairsdague: so finding a single number that works for both is going to be very hard.  :)23:35
*** jasondotstar has joined #openstack-infra23:35
sdagueyep, agreed, but I think the bot is really in a good enough to be useful state at this point. And in experience, as long as we aren't just dropping logs all the time, it works pretty well23:36
*** krtaylor has joined #openstack-infra23:36
* ttx waves23:36
jeblairsdague: alternately, if you subscribed to events from the log pusher, you could have a much closer expectation of when the results should show up23:36
sdagueenough so that *it* was the reason we realized we were dropping logs23:36
clarkbttx: ohai23:36
jog0yeah so it turns out there are a few parts of e-r that are useful and the bot is a small part of it23:36
jeblairsdague: and then you could say "if they haven't shown up in es within 2 mins after logstash pushed them, drop it"23:36
sdaguejeblair: sure, all good ideas. I think we have bigger fish to fry23:36
* jog0 waves to ttx23:37
sdaguebut patches are welcomed :)23:37
*** pballand has quit IRC23:37
sdague68630 is also promotable, though the speed up won't come until the patch behind it23:38
jeblairsdague: ok.  at some point the log pushers are going to get 20 minutes backed up for 2 hours and i'm not going to be inclined to burst them out; just letting you know the operational constraints of the system.  :)23:38
sdaguejeblair: that's fine23:38
sdagueand in that case we'll probably just lose one or two comments to gerrit23:38
sdaguethen be hanging out in the lag23:39
jeblairsdague: okiedokie.  ack on 68727 68768 68630.23:39
sdagueuntil it catches up23:39
sdague68673 will be the speed up patch, but I had to fix a couple of things on it. So if it goes +A at any point, that should promote as well.23:40
jeblairsdague: what was weird about the gate reset?23:41
sdaguethere was no failed job at the top23:41
clarkbit may have been consumed23:42
sdaguejust the top job disappeared and everything below reset23:42
jeblairsdague: was it waiting on one outstanding job?23:42
sdagueit was waiting on 2 last I looked23:42
sdagueI guess it could have raced23:42
sdaguegotten them both and popped23:42
russellbthanks for including my 2 patches23:42
*** whoops has quit IRC23:43
russellbi'm just sorry it took me this long to step up and help more23:43
sdaguerussellb: nothing to be sorry about, you have been a machine all week :)23:44
*** dosaboy has joined #openstack-infra23:44
russellbbeen trying to focus on it as much as i can past 2 weeks or so in one way or another23:44
russellband now i reward myself with some wine and strip-ctf.com23:46
dosaboyhey guys, i'm getting the following gate failure a fair bit atm - http://logs.openstack.org/00/51900/17/check/check-tempest-dsvm-full/1405291/console.html23:46
dosaboydoes it ring any bells to anyone?23:46
* jog0 is afraid to look at strip-ctf.com in a public setting23:46
dosaboyi can't work out what is causing it23:46
russellblol23:46
russellbjog0: wrong URL ....23:46
russellbstripe-ctf.com23:46
russellbbad typo.23:46
*** dizquierdo has quit IRC23:47
jog0russellb: heh23:47
openstackgerritEli Klein proposed a change to openstack-infra/jenkins-job-builder: Added rbenv wrapper  https://review.openstack.org/6535223:47
clarkbdosaboy: yeah I think that is a problem fungi hunted down. the small root partitions our cloud providers give us are too small for all of the log files we want to collect23:47
jog0sdague: so the missing events may be that my local bot isn't as backed up as the infra e-r bot so not missing events just very backed up23:47
jog0but not sure23:47
clarkbfungi: jeblair ^ that look right?23:47
*** morganfainberg|z is now known as morganfainberg23:48
jog0sdague: actually this is easy to show23:48
openstackgerritA change was merged to openstack-infra/zuul: Report queue window in status JSON.  https://review.openstack.org/6879223:48
jog0sdague: https://review.openstack.org/#/c/62118/23:48
dosaboyclarkb: ok thanks, is there a bug lodged for that per chance?23:48
clarkbdosaboy: there is, but I don't have the number handy23:48
jog0sdague: e-r commented 30 minutes after jenkins23:48
dosaboyclarkb: ok no worries23:48
jog0jeblair: ^23:48
fungidosaboy: https://launchpad.net/bugs/126873223:49
*** prad has quit IRC23:49
dosaboyfungi: cool thanks23:49
dosaboyclarkb, fungi: is it worth doing a recheck on this one or shall i sit back a bit?23:50
fungidosaboy: it seems to be somewhat infrequent, so a recheck on it should be fine23:50
sdaguefungi: can you look at the er_run thing again? That uncategorized page has still not shown up23:50
clarkbfungi: I wonder if the mysql slow log got bigger23:50
*** mriedem has joined #openstack-infra23:50
dosaboyfungi: cool, i'll give it a shot23:50
clarkbit was already huge but would certainly make those small partitions feel very cramped23:50
sdaguefungi: actually - http://status.openstack.org/elastic-recheck/23:50
sdagueit's top of the er list now23:51
jeblair#status alert Zuul is being restarted for an upgrade23:51
openstackstatusNOTICE: Zuul is being restarted for an upgrade23:51
clarkbfungi: might be worth holding a node and investigating to make sure we don't have a silly regression in filesystem use23:51
sdague149 fails in 24hrs23:51
*** ChanServ changes topic to "Zuul is being restarted for an upgrade"23:51
sdaguewith a really steep uptick today23:51
jeblairsdague: last chance for that last change; want to try to round up some +2s?23:51
* anteaya won't click any url russellb offers23:51
russellb:(23:51
clarkbsdague: what url is the unclassified data being reported to?23:52
anteayaunless it is a gerrit one23:52
sdaguejeblair: trying, but I think I'm failing23:52
sdagueso just go for it23:52
*** markmcclain has joined #openstack-infra23:52
sdaguewe'll pick it up tomorrow23:52
anteayawas reading the backscroll23:52
sdagueclarkb: it should be http://status.openstack.org/elastic-recheck/data/unclassified.html23:53
fungisdague: oh, yeah, dsvm-full seems to have a major uptick along with the rise in that offlined slave bug23:53
sdaguewe are writing it into the state dir23:53
sdaguebecause we are generating the html23:53
*** oubiwann-fn has quit IRC23:54
openstackgerritSamuel Merritt proposed a change to openstack-infra/config: Add soft timeout to Swift functional tests  https://review.openstack.org/6880223:54
*** jhesketh_ has joined #openstack-infra23:55
clarkbtorgomatic: you can drop the jenkins timeout on line 7 to ~70 minutes instead of 125 minutes23:56
clarkbtorgomatic: so that it is closer to the total runtime of the functests23:56
jeblairzuul restarted; queues restored23:56
clarkbtorgomatic: actually nevermind, we may need to do some tuning of that since devstack runs first and takes forever23:56
torgomaticclarkb: ok, will do... also I have a test failure to resolve (gate-config-layout?) so it might take me a minute23:56
clarkbtorgomatic: I left a comment23:58
clarkbtorgomatic: that shows how to fix the fail23:58
torgomaticclarkb: thanks23:58
*** jdurgin has joined #openstack-infra23:58
*** dstanek has joined #openstack-infra23:58
