jeblair | lifeless: it's non trivial, but it's not our biggest issue; the time to rebuild the queue is not what's slowing us down. | 00:00 |
---|---|---|
*** sarob has quit IRC | 00:00 | |
sdague | with 5 nova changes, 5 glance changes, 10 heat changes ... 5 nova changes | 00:00 |
*** sarob has joined #openstack-infra | 00:00 | |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Added storyboard API to webclient venv https://review.openstack.org/68523 | 00:00 |
*** flaper87 is now known as flaper87|afk | 00:01 | |
sdague | what would you propose happens when the last nova change passes, and only the first 2 nova changes ran | 00:01 |
sdague | and nothing in between did | 00:01 |
lifeless | sdague: oh I see; I had a misconception in my head about what we ran as we got a deeper queue. | 00:01 |
lifeless | sdague: that would need to be addressed first | 00:01 |
lifeless | thanks | 00:02 |
sdague | sure | 00:02 |
sdague | jeblair: you got enough review fu in you to get this out there - https://review.openstack.org/#/c/67591/ - our uncategorized review list | 00:03 |
sdague | uncategorized job failed list | 00:03 |
sdague | for elastic recheck | 00:03 |
*** sarob has quit IRC | 00:05 | |
jeblair | sdague: what did you end up doing to build that list? | 00:05 |
*** dpyzhov has quit IRC | 00:06 | |
jeblair | sdague: is it going to make spikey bits on this graph? http://cacti.openstack.org/cacti/graph_image.php?action=view&local_graph_id=26&rra_id=1 | 00:06 |
clarkb | jeblair: why do your zuul tests do the maintain cache at the end of them? | 00:06 |
*** UtahDave has quit IRC | 00:07 | |
*** senk has joined #openstack-infra | 00:07 | |
jeblair | clarkb: oh, you know they _might_ not need to, hang on | 00:07 |
sdague | jeblair: that's all ES queries | 00:07 |
sdague | doesn't touch gerrit | 00:07 |
sdague | jeblair: https://github.com/openstack-infra/elastic-recheck/blob/master/elastic_recheck/cmd/uncategorized_fails.py | 00:08 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Add require-approval to Gerrit trigger https://review.openstack.org/68516 | 00:09 |
jeblair | clarkb: only one of them does; thx; explanation in comment now | 00:09 |
fungi | lifeless: were there any other nodepool config patches we needed in for tripleo besides 68515? are you able to successfully build images from it yet? | 00:09 |
jeblair | sdague: oh i get it | 00:09 |
fungi | lifeless: and will it need additional prep script patches? | 00:10 |
*** dpyzhov has joined #openstack-infra | 00:10 | |
*** hogepodge has quit IRC | 00:10 | |
*** fbo is now known as fbo_away | 00:11 | |
sdague | jeblair: this is basically our todo list of things that failed that we don't know why | 00:11 |
* fungi has to disappear again for 4+ hours in about 20 minutes | 00:11 | |
jeblair | sdague: lgtm; fungi do you want to re-review https://review.openstack.org/#/c/67591/ | 00:11 |
sdague | and with the new bits it will retrigger every time we push a fingerprint change | 00:11 |
fungi | jeblair: yes | 00:12 |
sdague | so we can point more people at it. :) jog0's been running it on his laptop to get through the list | 00:12 |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: Add tests for YamlParser and patch 2.6 minidom https://review.openstack.org/63579 | 00:13 |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: make scm test as the example https://review.openstack.org/65186 | 00:13 |
lifeless | fungi: I've tested and it boots successfully | 00:13 |
lifeless | fungi: so we won't need to restart nodepool to tweak it. | 00:13 |
fungi | lifeless: awesome. approving then | 00:13 |
lifeless | fungi: we may want to change the deploy scripts but that isn't a nodepool restart | 00:13 |
fungi | lifeless: well, config changes aren't a nodepool restart either thankfully, but just trying to make sure we've got what's initially needed in there | 00:14 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Added storyboard API to webclient venv https://review.openstack.org/68523 | 00:15 |
jog0 | sdague: you run it with my latest patch? | 00:15 |
sdague | jog0: no, I was just looking at it | 00:16 |
sdague | I was going to wait for the infrastructure to merge to push it | 00:16 |
sdague | so that we can see that we can do updates right | 00:16 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Added storyboard API to webclient venv https://review.openstack.org/68523 | 00:16 |
sdague | fungi just +Aed it | 00:16 |
jog0 | sdague: cool, I more wanted you to see the results because they look pretty good | 00:17 |
jog0 | the numbers that is | 00:17 |
fungi | so i think what we're going to want to do, since nodepool.o.o still has puppet disabled on it and we have a combo of code and config changes going in together, is to kill nodepoold, generate the filtered list of building nodes, manually apply puppet, turn puppet agent back on, start nodepoold again and then en-masse delete the old list of building nodes | 00:17 |
sdague | jog0: yeh, you have been a machine, it's awesome | 00:17 |
fungi | however i also don't want to do that and then run away for hours leaving everyone else to clean up whatever mess i make | 00:18 |
zaro | mgagne: ok, my jjb changes were rebased. | 00:18 |
lifeless | fungi: ack | 00:19 |
lifeless | fungi: I do expect we'll want to change those scripts, but I want to frontload getting something up and going :) | 00:19 |
*** pcrews has quit IRC | 00:19 | |
fungi | and we're also waiting on those config changes to get nodes assigned before they gate anyway | 00:19 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard: Load projects from yaml file https://review.openstack.org/66280 | 00:19 |
fungi | so they likely won't land until i'm out. however i hope to have more available time tomorrow and be spending less time in meetings | 00:20 |
fungi | or i might feel up to it when i get back to the room tonight, but no idea | 00:20 |
*** CaptTofu has joined #openstack-infra | 00:21 | |
fungi | i also have hopes nodepool might be way less strained by the time i return | 00:22 |
sdague | fungi: those are high hopes | 00:22 |
fungi | yeah, i expect the zuul improvements will take a bit longer to settle out | 00:22 |
sdague | we seem to be hovering at about 60 / day merge rate right now | 00:23 |
sdague | so it will be a while | 00:23 |
jeblair | russellb: ping | 00:23 |
fungi | sdague: i think what we're going to see once we churn through the initial check pipeline entries is that check will stay low and the gate reset rate will keep the gate from chewing up the whole pool | 00:24 |
*** rnirmal has quit IRC | 00:24 | |
clarkb | fungi: ya that is my hope | 00:24 |
fungi | since it'll get throttled down to possibly a manageable chunk at a stretch | 00:24 |
*** oubiwann_ has joined #openstack-infra | 00:24 | |
sdague | fungi: that's true, it will be interesting to see what the morning looks like | 00:24 |
openstackgerrit | Antoine Musso proposed a change to openstack-infra/zuul: webapp: set cache-control headers to prevent caching https://review.openstack.org/66583 | 00:25 |
* fungi is afk until at least 05:00 utc | 00:26 | |
clarkb | its a party in vegas^H^H^H^H^Hutah | 00:26 |
*** wenlock has quit IRC | 00:27 | |
*** matsuhashi has joined #openstack-infra | 00:34 | |
*** senk has quit IRC | 00:34 | |
clarkb | jeblair: fyi not approving https://review.openstack.org/#/c/52986/ in hopes you will have a chance to rereview it | 00:38 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Added storyboard API to webclient venv https://review.openstack.org/68523 | 00:38 |
clarkb | I am finally stabbing at my review queue | 00:38 |
*** mrodden has quit IRC | 00:39 | |
*** AaronGr is now known as aarongr_afk | 00:41 | |
jeblair | clarkb: i'm trying to respond to your comment on the zuul change, but i don't think my typing it into gerrit is going to work... | 00:43 |
jeblair | clarkb: i believe the logic i wrote _does_ try to match all criteria | 00:43 |
jeblair | clarkb: if you start with the flag set to false, then iterate over all the characteristics, you can certainly set something to true if it matches, but how do you make assertions about all the other characteristics? | 00:44 |
jeblair | clarkb: it's much easier to assume that it matches and then say that it does not on the first instance where something differs | 00:45 |
clarkb | jeblair: I think you need a flag for each then if not x and y and z return False | 00:45 |
jeblair | clarkb: ah, yeah, but they are all optional | 00:45 |
clarkb | oh, hmm | 00:45 |
clarkb | I think that was the piece missing in my head | 00:45 |
clarkb | jeblair: I am fine with it as is, I will respond to my comment | 00:47 |
jeblair | clarkb: so then it's "if (matched_a or not a_required) and (matched_b or not b_required) ...." which is, well... not easy to follow | 00:47 |
jeblair | clarkb: ok | 00:47 |
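The "assume it matches, bail out at the first difference" style jeblair describes can be sketched as follows, assuming an event represented as a plain dict. `event_matches` and its parameters are hypothetical and only approximate the Gerrit trigger filter in the zuul change under review.

```python
from typing import Mapping, Optional, Sequence


def event_matches(event: Mapping,
                  types: Optional[Sequence[str]] = None,
                  branches: Optional[Sequence[str]] = None,
                  approvals: Optional[Mapping[str, int]] = None) -> bool:
    """Hypothetical filter: True if the event satisfies every criterion given.

    Every criterion is optional (None means "no constraint"), so the function
    assumes a match and returns False at the first specified criterion that
    the event fails.
    """
    if types is not None and event.get("type") not in types:
        return False
    if branches is not None and event.get("branch") not in branches:
        return False
    if approvals is not None:
        votes = event.get("approvals", {})
        for label, value in approvals.items():
            if votes.get(label) != value:
                return False
    return True
```

With one flag per criterion, the same test becomes `(matched_type or types is None) and (matched_branch or branches is None) and ...`, which is the harder-to-follow form jeblair mentions.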
*** zhiwei has joined #openstack-infra | 00:47 | |
*** senk has joined #openstack-infra | 00:48 | |
lifeless | fungi: ok, I'm about to drop Lynne and C at the airport; I will be available on phone for emergencies (taking laptop w/me) and then back here in ~2h | 00:48 |
*** melwitt has quit IRC | 00:49 | |
*** hogepodge has joined #openstack-infra | 00:50 | |
*** mrodden has joined #openstack-infra | 00:54 | |
*** hogepodge has quit IRC | 00:57 | |
jhesketh | jeblair: regarding the gate enqueuing (as per your response to my review comment), I'm confused as to how it wouldn't be enqueued into gate, as the first Jenkins post from check will match all the requirements | 00:57 |
*** mrodden has quit IRC | 00:57 | |
jeblair | jhesketh: since gate is a dependent queue, there is a check in the dependentpipelinemanager that ensures that only changes that have met all of gerrit's requirements for merging (aside from what the queue itself will supply) are enqueued | 00:58 |
*** kraman has joined #openstack-infra | 00:59 | |
jhesketh | ah okay | 00:59 |
jeblair | jhesketh: it relates to the canMerge method of the trigger | 00:59 |
jeblair | (that's what to search for to find that code) | 01:00 |
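For readers tracing that code, here is a rough sketch of this kind of check, assuming it is driven by Gerrit's submit records (as returned by `gerrit query --submit-records --format=JSON`). It is not zuul's `canMerge`; `can_enter_gate` is a hypothetical name, and tolerating a REJECT on queue-supplied labels is an assumption (the rejection may be the pipeline's own earlier vote).

```python
from typing import Iterable, Mapping, Sequence


def can_enter_gate(submit_records: Sequence[Mapping],
                   supplied_by_queue: Iterable[str]) -> bool:
    """Hypothetical check: may this change be enqueued in a dependent pipeline?

    submit_records: Gerrit submit records, each with a "status" and a list of
    per-label entries under "labels".
    supplied_by_queue: labels the pipeline provides itself (e.g. "Verified"),
    so a missing or negative vote on those does not block enqueuing.
    """
    allowed = {label.lower() for label in supplied_by_queue}
    for record in submit_records:
        if record["status"] == "OK":
            continue
        if record["status"] != "NOT_READY":
            return False  # e.g. CLOSED or RULE_ERROR
        for label in record.get("labels", []):
            if label["status"] in ("OK", "MAY"):
                continue
            if label["status"] in ("NEED", "REJECT") and label["label"].lower() in allowed:
                continue  # assumption: the queue will supply (or redo) this vote
            return False  # a requirement the queue itself cannot satisfy
    return True
```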
kraman | jeblair: ping | 01:00 |
jhesketh | jeblair: right, but why not have the logic in the layout match the desired behaviour? | 01:00 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Add require-approval to Gerrit trigger https://review.openstack.org/68516 | 01:00 |
kraman | jeblair: looking to bounce some ideas off you about solum. when you get a chance can you please take a look at https://github.com/kraman/zuul/compare/solum_hacks | 01:01 |
*** CaptTofu has quit IRC | 01:01 | |
kraman | jeblair: trying to add a message queue based trigger | 01:01 |
jeblair | jhesketh: i think that might be a good future step -- to be explicit about that in the layout and remove the implicit canMerge check; but we'd want to consider that gerrit provides some extra complexity there... | 01:02 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Simple round trip API integration with /v1/teams https://review.openstack.org/68528 | 01:02 |
jeblair | jhesketh: (things like being able to specify complex prolog rules, that say things like a +2 is required in addition to no -2 votes) | 01:02 |
jhesketh | jeblair: sure, I don't mean get rid of canMerge (at least not yet), but as your patch stands it isn't difficult to have a layout that checks for approvals before putting them into gate | 01:03 |
*** julim has quit IRC | 01:04 | |
clarkb | mordred: still around? re changes that add projects like https://review.openstack.org/#/c/61954/4 should we go ahead and approve those and manually trigger manage-projects? | 01:04 |
clarkb | mordred: unsure of where you are in debugging that pain | 01:05 |
*** zhiwei has left #openstack-infra | 01:05 | |
jeblair | jhesketh: well, that's my proposed production config too; as it stands, we're relying on the gerrit mergeability check for that aspect of behavior now and i don't want to duplicate it (and thereby confuse the issue). | 01:05 |
*** CaptTofu has joined #openstack-infra | 01:06 | |
*** dkranz has quit IRC | 01:06 | |
jeblair | kraman: i'm about dead for the day (it's been a long one and i'm still a bit ill); that's really exciting though and i'll try to look at that tomorrow when i have fresh brains; that work for you? | 01:07 |
*** julim has joined #openstack-infra | 01:07 | |
kraman | jeblair: works great. have a good eve | 01:07 |
clarkb | one neat thing about that message queue trigger is it would make putting something between gerrit and the world for event streams simpler | 01:08 |
jog0 | clarkb: looks like logstash is getting no data right now | 01:08 |
jog0 | http://logstash.openstack.org/#eyJzZWFyY2giOiJmaWxlbmFtZTpcImNvbnNvbGUuaHRtbFwiIEFORCBtZXNzYWdlOlwiRmluaXNoZWQ6XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjE3MjgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTA0MzkzMTAzMjYsIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0= | 01:09 |
*** praneshp has quit IRC | 01:09 | |
mordred | clarkb: I'll manually add and run that for debugging | 01:10 |
jhesketh | jeblair: sure, personally though I find it more confusing looking at the layout.yaml and having it look like it may get re-enqueued even though there is code elsewhere that stops it | 01:10 |
jhesketh | I'm happy though if you're happy | 01:10 |
jog0 | nothing in last 6 hours unless the schema changed or something | 01:10 |
*** praneshp has joined #openstack-infra | 01:10 | |
*** thuc has joined #openstack-infra | 01:11 | |
clarkb | mordred: ok, there is also a php addition to stackforge that could be used for debugging | 01:11 |
clarkb | mordred: I left comments on your unlaunchpadify projects.yaml as well | 01:11 |
jog0 | sdague: we are flying blind again ^ | 01:11 |
clarkb | jog0: o_O | 01:11 |
mordred | clarkb: yeah? ok. I'll look at that - I'd like to land that at some point - but I'm pretty sure it's going to need a merge | 01:12 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries https://review.openstack.org/67596 | 01:12 |
sdague | clarkb: hmmm... yeh, that's not good. | 01:13 |
sdague | clarkb: any chance our indexers got lost? | 01:13 |
sdague | or stuck | 01:13 |
clarkb | sdague: I think the gearman client went away | 01:13 |
*** thuc_ has quit IRC | 01:13 | |
sdague | oh, that's not so good | 01:14 |
*** vipul is now known as vipul-away | 01:14 | |
clarkb | I just restarted it, the log file doesn't show anything wrong | 01:14 |
*** thuc has quit IRC | 01:15 | |
sdague | jog0: on your change, why isn't build_uuid sent to classify? | 01:15 |
*** prad_ has joined #openstack-infra | 01:16 | |
*** ianw has quit IRC | 01:16 | |
clarkb | I know what happened | 01:16 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries https://review.openstack.org/67596 | 01:16 |
*** prad_ has quit IRC | 01:16 | |
clarkb | the jenkins log client yaml file was updated to include the new jenkinses before they had resolvable DNS records | 01:16 |
*** ianw has joined #openstack-infra | 01:16 | |
clarkb | according to syslog the service was restarted by puppet right around when it died and DNS is not zmq happy | 01:17 |
jog0 | sdague: where? I thought I did send it | 01:17 |
clarkb | jog0: sdague sorry for the turbulence but it should be happy now | 01:17 |
*** prad has quit IRC | 01:17 | |
jog0 | clarkb: ahhh | 01:17 |
jog0 | thanks | 01:17 |
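One way to keep a config push like that from taking the client down is to resolve each host before connecting and skip the ones that do not resolve yet. This is a minimal sketch that assumes the client works from a plain list of Jenkins hostnames; `resolvable_hosts` is a hypothetical helper, not part of the log client.

```python
import socket
from typing import Callable, Iterable, List


def resolvable_hosts(hosts: Iterable[str],
                     resolve: Callable = socket.getaddrinfo) -> List[str]:
    """Hypothetical guard: return only the hosts whose names currently resolve.

    `resolve` defaults to socket.getaddrinfo; it is a parameter so the check
    can be exercised without real DNS.
    """
    ok = []
    for host in hosts:
        try:
            resolve(host, None)
        except socket.gaierror:
            continue  # no DNS record yet: skip it instead of crashing the client
        ok.append(host)
    return ok
```

A real client would probably also log the skipped hosts and retry them later, so that a new Jenkins master is picked up once its DNS record appears.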
sdague | jog0: https://review.openstack.org/#/c/67596/3/elastic_recheck/bot.py | 01:18 |
sdague | I'm confused why it's not just another param to classify | 01:18 |
sdague | instead of doing the union | 01:18 |
sdague | or maybe more importantly, which is the first call to classify still there | 01:19 |
sdague | s/which/why/ | 01:19 |
jog0 | sdague: because thats a rebase mistake | 01:19 |
sdague | ok :) | 01:19 |
sdague | I'm helping! :) | 01:19 |
jog0 | sdague: any other bugs before i repush? | 01:20 |
jog0 | note: still waiting for data to do a live test on new patch | 01:20 |
clarkb | you should have some data now, there is data in the new days index | 01:21 |
clarkb | it may not be a lot of data though because jobs take a while | 01:21 |
sdague | jog0: not that I saw | 01:21 |
jog0 | clarkb: cool | 01:22 |
sdague | but it's dinner time here, so I'm done | 01:22 |
sdague | and I somehow decided it was a good idea to try to get a PR into eventlet - https://github.com/eventlet/eventlet/pull/75 | 01:22 |
jog0 | btw I take it someone knows we have for jobs in post waiting for coverage jobs to run | 01:22 |
sdague | night all | 01:22 |
jog0 | sdague: lol eventlet that will be fun | 01:22 |
jog0 | o/ | 01:22 |
*** svarnau has quit IRC | 01:23 | |
*** pcrews has joined #openstack-infra | 01:23 | |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: More network debugging detail https://review.openstack.org/67911 | 01:26 |
*** thuc has joined #openstack-infra | 01:29 | |
jog0 | 57 patches landed today according to https://github.com/openstack/openstack/graphs/commit-activity not too bad | 01:29 |
clarkb | jog0: we just merged a few | 01:29 |
*** vipul-away is now known as vipul | 01:29 | |
*** thuc_ has joined #openstack-infra | 01:29 | |
clarkb | at the risk of jinxing it I think things are moving. window size has fallen to 8 though | 01:29 |
StevenK | clarkb: Is this new fangled window size visible anywhere? | 01:30 |
clarkb | StevenK: only in the logs, I haven't made it public yet. Doing so is on the todo list | 01:31 |
StevenK | I do wonder how that horizon 000000 post job got in | 01:32 |
clarkb | StevenK: that was ttx | 01:32 |
clarkb | he deleted the milestone proposed branch I think | 01:32 |
StevenK | clarkb: Ah, so caused by GIGO? | 01:33 |
clarkb | yeah | 01:33 |
*** thuc_ has quit IRC | 01:33 | |
*** thuc has quit IRC | 01:33 | |
*** nosnos has joined #openstack-infra | 01:34 | |
clarkb | basically deleting things in that way creates a post job for commit 000000 | 01:35 |
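A post pipeline could skip such events up front. This is a minimal sketch that assumes events shaped like Gerrit's `stream-events` JSON (`type`, `refUpdate.newRev`); `is_ref_deletion` is a hypothetical helper, not an existing zuul function.

```python
ZERO_SHA = "0" * 40  # git's all-zeros object name, used for "no commit"


def is_ref_deletion(event: dict) -> bool:
    """Hypothetical check: True for a ref-updated event that deletes a ref.

    When a branch is deleted, refUpdate.newRev is forty zeros; enqueuing
    that in a post pipeline yields a job for "commit 000000" with nothing
    to build.
    """
    if event.get("type") != "ref-updated":
        return False
    return event.get("refUpdate", {}).get("newRev") == ZERO_SHA
```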
*** senk has quit IRC | 01:35 | |
clarkb | jeblair you don't happen to still be around do you? | 01:36 |
clarkb | jeblair: I think the rate limiting doesn't handle the case where jobs beyond the window should be cancelled because the window shrunk and some change failed | 01:37 |
clarkb | I don't think this affects correctness as that failing change will be booted eventually and changes behind it restarted then | 01:38 |
clarkb | but it does affect our resource use | 01:38 |
openstackgerrit | A change was merged to openstack-infra/config: Remove incorrect name filters from nodepool config https://review.openstack.org/67684 | 01:40 |
clarkb | yeah looks like cancelJobs is in _processOneItem which is within my sliced actionable item list hmm | 01:41 |
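This is the resource leak clarkb describes. If job cancellation only happens while processing the actionable (in-window) slice, items that drop out when the window shrinks keep their jobs running. Below is a minimal sketch of one possible remedy, assuming a simple list-backed queue and taking the window size as given. `GateItem`, `actionable_items` and `cancel_outside_window` are hypothetical, not Zuul's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GateItem:
    # Hypothetical queue entry with the names of its currently running jobs.
    change: str
    running_jobs: Set[str] = field(default_factory=set)


def actionable_items(queue: List[GateItem], window: int) -> List[GateItem]:
    """Only the first `window` items get processed (including cancellation)."""
    return queue[:window]


def cancel_outside_window(queue: List[GateItem], window: int) -> List[str]:
    """Cancel jobs still running for items that fell outside a shrunken window.

    Returns the cancelled job names so the caller can log or report them.
    """
    cancelled = []
    for item in queue[window:]:
        cancelled.extend(sorted(item.running_jobs))
        item.running_jobs.clear()
    return cancelled
```

A real fix would also have to mark these items so their jobs are relaunched once the window grows back over them. As clarkb notes, correctness is already preserved without this; the sketch only addresses wasted nodes.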
mordred | clarkb: I'm ready to start jenkins05 I think | 01:41 |
mordred | clarkb: am I correct about that? | 01:41 |
clarkb | mordred: /me looks | 01:42 |
mordred | clarkb: (this is my first jenkins server in the new world order, so I want to make sure I'm not going to kill things) | 01:43 |
StevenK | I thought it would be 06 | 01:43 |
clarkb | mordred: looks like the slave list is dirty | 01:44 |
*** jerryz has quit IRC | 01:45 | |
*** senk has joined #openstack-infra | 01:46 | |
*** hasharMeeting is now known as hashar | 01:47 | |
*** yaguang has joined #openstack-infra | 01:47 | |
clarkb | jeblair: yup once the window shifted to cover the failure nnfi did its thing. So correctness is preserved, it just isn't super resource efficient | 01:47 |
*** senk has quit IRC | 01:51 | |
*** senk has joined #openstack-infra | 01:51 | |
openstackgerrit | A change was merged to openstack-infra/config: add in elastic-recheck-unclassified report https://review.openstack.org/67591 | 01:53 |
*** alexpilotti has quit IRC | 01:54 | |
*** mrodden has joined #openstack-infra | 01:54 | |
*** SumitNaiksatam has quit IRC | 01:55 | |
mikal | tempest is running multiple test threads, yes? | 01:57 |
clarkb | mikal: s/threads/processes/ and currently in the gate there are 2 | 01:58 |
clarkb | it was 4 before | 01:58 |
mikal | clarkb: do you know if anyone has tried tempest with libvirt/lxc? | 01:58 |
mikal | clarkb: it crashes my cloud instances in interesting ways... | 01:58 |
mikal | clarkb: i.e. requiring hard reboot to get back | 01:58 |
clarkb | mikal: I do not know | 01:58 |
mikal | Fairy nugg | 01:59 |
clarkb | mikal: ewindisch is working on it with docker but that isn't libvirt | 01:59 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: objectify the gerrit event for our purposes https://review.openstack.org/67941 | 01:59 |
ewindisch | mikal: actually, getting my jobs to run libvirt/lxc would literally take a single line change... | 02:00 |
ewindisch | hmm | 02:00 |
ewindisch | that might make zul very happy with me ;-) | 02:00 |
zul | yes it would :) | 02:02 |
mikal | zul: have you tried to run tempest? Does it eat your machines? | 02:03 |
ewindisch | zul: I'm running this as opposed to using devstack-gate (*dodges tomatoes*): https://github.com/ewindisch/dockenstack | 02:03 |
zul | mikal: i havent | 02:03 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Add local-branch option https://review.openstack.org/65369 | 02:04 |
ewindisch | I'm uploading it to the docker index so in a few minutes, one would be able to simply run: | 02:04 |
ewindisch | docker run -privileged -lxc-conf=aa_profile=unconfined -t -i ewindisch/dockenstack-tempest | 02:04 |
ewindisch | and it will bring up the latest master branches and run tempest against them | 02:04 |
zul | cool why do you disable the apparmor profiles? | 02:05 |
ewindisch | (other repos/branches can be specified via the environment args) | 02:05 |
mikal | clarkb: I can't see where in tempest's run_tests.sh the number of threads is set? Is it hiding from me? | 02:05 |
mikal | s/threads/processes/ | 02:05 |
jog0 | mikal: I2338ebf5df8bced935e9ed9b0ebd2d4e859b5dbe | 02:06 |
ewindisch | zul: I actually forked the code from someone else that did that. I haven't reevaluated yet | 02:06 |
jog0 | is the patch that changed the number of threads in gate | 02:06 |
mikal | jog0: ta | 02:06 |
zul | ewindisch: ah | 02:07 |
openstackgerrit | Matthew Treinish proposed a change to openstack-infra/elastic-recheck: Add multi-project irc support to the bot https://review.openstack.org/67540 | 02:07 |
*** gokrokve has quit IRC | 02:07 | |
*** gokrokve has joined #openstack-infra | 02:07 | |
clarkb | mikal its in devstack gate | 02:08 |
ewindisch | zul: I'm running dockenstack with lxc now - we'll see if it works, I'm sure it must | 02:08 |
*** senk has quit IRC | 02:09 | |
ewindisch | zul: are you doing anything in regard to keeping the lxc driver in per the deprecation plan? | 02:09 |
zul | ewindisch: im working on something right now so i can get things tested more easily | 02:11 |
*** gokrokve has quit IRC | 02:12 | |
jog0 | clarkb: any changes to gerrit or gerritlib of late? http://paste.openstack.org/show/61714/ | 02:13 |
jog0 | I am wondering what is causing ^, it may be me | 02:14 |
*** david-lyle_ has joined #openstack-infra | 02:15 | |
clarkb | no recent changes | 02:16 |
clarkb | too swamped | 02:16 |
*** smurugesan1 has joined #openstack-infra | 02:16 | |
*** oubiwann_ has quit IRC | 02:16 | |
*** smurugesan has quit IRC | 02:16 | |
*** oubiwann_ has joined #openstack-infra | 02:16 | |
jog0 | you're too swamped, or gerrit is? | 02:17 |
jog0 | or both | 02:17 |
jog0 | and thanks | 02:17 |
clarkb | we are | 02:17 |
jog0 | clarkb: ack, thanks that answers my question | 02:18 |
lifeless | fungi: back | 02:20 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard: Update ProjectGroups API to consume ID's rather than names. https://review.openstack.org/68540 | 02:21 |
*** coolsvap has quit IRC | 02:21 | |
ewindisch | mikal: lxc tempest is running fine for me here, other than failing various tests | 02:22 |
*** vkozhukalov has joined #openstack-infra | 02:23 | |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Simple round trip API integration with storyboard-api https://review.openstack.org/68528 | 02:24 |
ewindisch | mikal: 210 tests in 139 seconds, 34 failures. (smoketests only) | 02:24 |
*** coolsvap has joined #openstack-infra | 02:24 | |
mikal | ewindisch: huh, interesting | 02:25 |
mikal | I'm trying it on a local machine now | 02:25 |
mikal | Well, installing over very slow DSL at least | 02:25 |
ewindisch | mikal: running on a rackspace vm, btw | 02:25 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard: Update ProjectGroups API to consume ID's rather than names. https://review.openstack.org/68540 | 02:26 |
*** krotscheck has quit IRC | 02:26 | |
*** dcramer__ has joined #openstack-infra | 02:27 | |
*** hashar has quit IRC | 02:28 | |
*** rakhmerov has quit IRC | 02:30 | |
zul | ewindisch: which version of libvirt? | 02:32 |
ewindisch | zul: I should note that most of those errors are around cinder. | 02:33 |
openstackgerrit | A change was merged to openstack-infra/config: Add a fedora image definition for tripleo-cloud https://review.openstack.org/68515 | 02:33 |
zul | ewindisch: as in attaching volumes? | 02:33 |
ewindisch | zul: as in creating volumes... | 02:33 |
zul | huh | 02:33 |
ewindisch | zul: I'm having the problem with all virt drivers, so it's probably my image, devstack, or cinder itself | 02:34 |
zul | ok cool.. | 02:34 |
ewindisch | (probably my image, I'd guess) | 02:34 |
* zul disappears ;) | 02:34 | |
*** julim has quit IRC | 02:35 | |
lifeless | fungi: evening :) | 02:36 |
ewindisch | alright, time to find something to do in SF that isn't work. | 02:36 |
*** coolsvap has quit IRC | 02:37 | |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries https://review.openstack.org/67596 | 02:40 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Clarify required parameters in query_builder https://review.openstack.org/67756 | 02:42 |
hub_cap | mordred: got a sec for a dumb Q? | 02:43 |
hub_cap | u had reported said bug (https://bugs.launchpad.net/trove/+bug/1179009) a while ago and someone is submitting a fix and i cant seem to believe its the only thing thats wrong w/ our code (considering we use proboscis heh) | 02:44 |
clarkb | hub_cap: there are reasons to use testtools even with proboscis | 02:45 |
clarkb | cleanups for example | 02:45 |
clarkb | also kill proboscis with fire | 02:45 |
*** rockyg has quit IRC | 02:46 | |
lifeless | hub_cap: whats proboscis | 02:46 |
*** mrodden has quit IRC | 02:49 | |
*** gokrokve has joined #openstack-infra | 02:51 | |
hub_cap | hehehe | 02:54 |
hub_cap | clarkb: thats the plan | 02:54 |
hub_cap | lifeless: im sure tim simpson has tried to chat w/ u about it ;) its the super special trove testing framework | 02:55 |
hub_cap | clarkb: the reason i ask is cuz a guy submitted a _single_ file change, and it just doesnt seem like thats all itll take | 02:56 |
hub_cap | mind u, lifeless clarkb mordred i know next to nothing (in general) and wrt python testing frameworks, so it may be all we need.. but seems fishy | 02:56 |
hub_cap | https://review.openstack.org/#/c/61169/4/trove/tests/api/instances.py | 02:56 |
*** pcrews has quit IRC | 02:57 | |
*** mriedem has quit IRC | 02:59 | |
clarkb | hub_cap: ya you would need to replace it everywhere unittest is used | 03:00 |
hub_cap | but thats really _it_, like no changing setup/teardown method names etc... | 03:01 |
hub_cap | cuz ive got to convince a -2'er now to undo his -2 ;) | 03:01 |
*** SumitNaiksatam has joined #openstack-infra | 03:07 | |
*** UtahDave has joined #openstack-infra | 03:09 | |
*** emagana has quit IRC | 03:10 | |
*** nati_ueno has quit IRC | 03:10 | |
*** jhesketh has quit IRC | 03:10 | |
*** jhesketh has joined #openstack-infra | 03:11 | |
lifeless | hub_cap: why should it be more? | 03:12 |
*** sdake-ooo is now known as sdake | 03:13 | |
hub_cap | lifeless: im just making sure theres not more to it so i can be well informed when i go to talk to him ;) | 03:13 |
hub_cap | it seems as if we inherit from object for all our tests anyway sans one | 03:14 |
hub_cap | whether thats right or not :) | 03:14 |
*** gokrokve has quit IRC | 03:17 | |
*** ok_delta has joined #openstack-infra | 03:22 | |
*** mayu_ has joined #openstack-infra | 03:24 | |
mayu_ | ping anteaya | 03:31 |
notmyname | clarkb: just saw in scrollback something you said about "swift doesn't do this and they never will". something you were talking about with bknudson. what was the context? something we need to look at? | 03:32 |
mayu_ | my patchset fail. http://logs.openstack.org/48/68148/1/check/gate-neutron-python27/b7edd9c/console.html | 03:33 |
mayu_ | can anybody help me check it? I'm new | 03:34 |
clarkb | notmyname: oslo.logging | 03:34 |
notmyname | clarkb: ok. is there a feature set that we should be targeting (or need to explain how we are targeting)? or is it just a question of "things are different" | 03:35 |
mayu_ | I have not found any clue for the failure | 03:35 |
*** cody-somerville has quit IRC | 03:35 | |
clarkb | notmyname: common log format defaults | 03:35 |
*** smurugesan1 has quit IRC | 03:36 | |
clarkb | mayu_: there is a traceback | 03:36 |
mayu_ | yes | 03:36 |
notmyname | clarkb: ok, thanks. but is it something specific in the log, or just looking for the same format? | 03:37 |
clarkb | looking for the same format | 03:37 |
mayu_ | but it's nothing to do with my code | 03:37 |
mayu_ | It seems to be a bug | 03:38 |
mayu_ | Bug 1270182 | 03:39 |
lifeless | fungi: around ? | 03:39 |
clarkb | mayu_: yes that is possible | 03:40 |
mayu_ | thanks | 03:41 |
mayu_ | there are too many failures, here is the jenkins result, http://paste.openstack.org/show/61715/ | 03:44 |
mayu_ | my patchset https://review.openstack.org/#/c/68148/ | 03:45 |
mayu_ | jenkins failed; I found nothing related to my patchset after analysing the failure log | 03:47 |
mayu_ | I don't what to do | 03:48 |
mayu_ | I don't know what to do | 03:48 |
*** gokrokve has joined #openstack-infra | 03:49 | |
mayu_ | clarkb: help | 03:49 |
*** cody-somerville has joined #openstack-infra | 03:51 | |
mordred | notmyname: honestly - I think even "ability to configure to have a shared log format" would be a good step in the right direction | 03:52 |
mordred | notmyname: and/or helpful | 03:52 |
mordred | hub_cap: the thing it will do that unittest base class won't is warn you if you don't upcall on setUp/tearDown | 03:55 |
notmyname | mordred: ya, that's been mentioned. I'm not entirely opposed to that idea, but the hard part is that a different log format is generally only useful to new clusters. while I believe there are more swift clusters that have yet to be installed than have been installed so far, the existing format (which isn't really broken) does give some inertia to keeping the current way | 03:55 |
clarkb | mayu_: you can recheck it with the bug number you identified as being the cause for the failure | 03:55 |
*** ArxCruz has quit IRC | 03:55 | |
clarkb | mordred: in the comment from jenkins is a link to a wiki article that talks about all this | 03:55 |
mordred | notmyname: yeah - and I totally hear that - I think that's why I'm less ardent on the "change the default" thing | 03:56 |
notmyname | mordred: so it's mostly a question of prioritization, and adding yet another config for something that isn't broken isn't really high ;-) | 03:56 |
mordred | notmyname: well, "not broken" depends on how you consider your position inside of an openstack install | 03:56 |
notmyname | mordred: to be specific, I mean "isn't broken" == "isn't a pain point for people installing swift" | 03:57 |
mordred | if you consider that a goal, then you are currently broken, since you're the only member of that install that has no ability to log in a manner similar to the others | 03:57 |
mordred | right. it's not for people only installing swift | 03:57 |
notmyname | mordred: or actually, "isn't a pain point for people deploying and contributing to swift" ;-) | 03:57 |
mordred | notmyname: I think it's _clearly_ broken without even needing to be demonstrated from an openstack pov | 03:58 |
notmyname | mordred: ya. supporting a common format seems like a good idea. is there a doc that describes the common format somewhere? | 03:58 |
mordred | how much you care about that is --- you know :) | 03:58 |
notmyname | mordred: like I said. people who contribute to swift ;-) | 03:58 |
mordred | people who contribute to openstack | 03:58 |
mordred | notmyname: I love our regular dance about this ;) | 03:59 |
notmyname | mordred: I'm just trying to antagonize you | 03:59 |
mordred | notmyname: darn. I was trying to do the same to you | 03:59 |
notmyname | mordred: what is the openstack common logging format? | 04:00 |
*** ok_delta has quit IRC | 04:00 | |
hub_cap | "Shits broke %s" | 04:01 |
mordred | notmyname: what hub_cap said | 04:01 |
mordred | notmyname: looking - one sec | 04:01 |
notmyname | kk | 04:01 |
hub_cap | something like this mordred ? https://github.com/openstack/oslo-incubator/blob/master/openstack/common/log.py#L136 | 04:03 |
mordred | hub_cap: no - like this: http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/log.py#n130 | 04:05 |
mordred | :) | 04:05 |
mordred | notmyname: ^^ | 04:05 |
notmyname | looking | 04:05 |
mordred | either one - you can look at the openstack version or the github version :) | 04:05 |
*** thuc_ has joined #openstack-infra | 04:06 | |
notmyname | I'll look at the better one (and let y'all decide which one that is) ;-) | 04:06 |
*** david_lyle has joined #openstack-infra | 04:07 | |
notmyname | mordred: hub_cap: so line 137-138 is what you'd want to see for the proxy server? or for internal logging? or what? | 04:07 |
*** yamahata has joined #openstack-infra | 04:08 | |
*** david-lyle_ has quit IRC | 04:09 | |
notmyname | what does %(instance)s map to? | 04:09 |
notmyname | and %(message)s is just an arbitrary string? | 04:09 |
mordred | notmyname: not 100% sure - I'd assume an identifier that helps find which thing this happened on - and yes to message | 04:09 |
mordred | lifeless: ^^ can you provide any insight into the above? | 04:10 |
notmyname | is the first item the duration of the request or when it happened? | 04:10 |
lifeless | instance will be the string description of the instance - looks like nova specifics that have leaked into oslo | 04:11 |
dstufft | So I'm not sure about the zuul status page; if I pushed a tag to a thing 5-6 hours ago, should I have seen a release by now? | 04:11 |
notmyname | so that log format seems like just a prefix (assuming message is just an arbitrary string). are there any requirements on the message? eg no spaces? | 04:12 |
notmyname | lifeless: mordred: ^ | 04:12 |
lifeless | notmyname: http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/log.py#n326 | 04:12 |
hub_cap | instance is not just nova, we use it too in trove ;) | 04:13 |
lifeless | notmyname: message has no constraints | 04:13 |
hub_cap | and it's rarely used..., it is the uuid of the instance fwiw | 04:13 |
hub_cap | '[instance: %(uuid)s] ',... | 04:14 |
*** CaptTofu has quit IRC | 04:14 | |
hub_cap | http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/log.py#n172 | 04:14 |
notmyname | and it's intended that instance and message aren't separated by a space? so that if instance isn't passed in and message = "hello world", you get a different number of log fields than if the instance is passed in? | 04:14 |
hub_cap | yea yea mordred i know i need to start using git.o.o ;) for linking | 04:14 |
hub_cap | iirc u get a [] | 04:15 |
hub_cap | if no instance is there | 04:15 |
*** rcleere has joined #openstack-infra | 04:15 | |
hub_cap | nope im wrong notmyname , u get '' if no instance & no instance_uuid | 04:15 |
lifeless | notmyname: it's intended, it either is invisible, or nicely presented | 04:16 |
notmyname | so you get the log line as "12.034 543 DEBUG foo [-] hello world" or "12.034 543 DEBUG foo [-] uuidhello world" | 04:17 |
hub_cap | examples | 04:17 |
hub_cap | https://gist.github.com/hub-cap/8572752 | 04:17 |
hub_cap | i can hack in the instance_uuid in a test to show if needed | 04:17 |
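To make notmyname's question concrete, here is a minimal Python sketch of how a context format that ends in %(instance)s%(message)s behaves with and without an instance. It assumes the instance prefix is the '[instance: %(uuid)s] ' string hub_cap quoted and that a missing instance renders as '' (as hub_cap and lifeless describe). CONTEXT_FORMAT, ContextFilter and format_line are hypothetical names, and the format is a simplified stand-in, not the exact oslo default (timestamp, pid, user and tenant are dropped so the output is deterministic).

```python
import logging

# Instance prefix, modeled on the '[instance: %(uuid)s] ' string quoted above.
INSTANCE_FORMAT = "[instance: %(uuid)s] "

# Hypothetical, simplified context format. There is no space between
# %(instance)s and %(message)s; the prefix carries its own trailing space.
CONTEXT_FORMAT = "%(levelname)s %(name)s [%(request_id)s] %(instance)s%(message)s"


class ContextFilter(logging.Filter):
    """Fill in the extra fields the format string expects."""

    def filter(self, record):
        record.request_id = getattr(record, "request_id", "-")
        uuid = getattr(record, "instance_uuid", None)
        # No instance: the field becomes '' and simply disappears.
        record.instance = INSTANCE_FORMAT % {"uuid": uuid} if uuid else ""
        return True


def format_line(msg, instance_uuid=None, name="foo"):
    """Render one DEBUG record the way a handler using CONTEXT_FORMAT would."""
    record = logging.LogRecord(name, logging.DEBUG, "sketch.py", 0, msg, None, None)
    if instance_uuid is not None:
        record.instance_uuid = instance_uuid
    ContextFilter().filter(record)
    return logging.Formatter(CONTEXT_FORMAT).format(record)


print(format_line("hello world"))
print(format_line("hello world", instance_uuid="abc123"))
```

Under these assumptions the two lines are "DEBUG foo [-] hello world" and "DEBUG foo [-] [instance: abc123] hello world": the text never runs together, but a whitespace-split parser sees 5 fields in one case and 7 in the other, which is the field-count concern notmyname raised.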
fungi | so i lied. i got back earlier than 05:00 utc (sorry lifeless, you must not have seen me say i was disappearing) | 04:18 |
lifeless | fungi: I was optimistic that you were joking :) | 04:19 |
fungi | ahh | 04:19 |
lifeless | fungi: so, how drunk are you? | 04:19 |
fungi | nope, i was in a yurt with no electricity | 04:19 |
notmyname | so to be clear, where are you wanting this log format in swift? everywhere that swift logs? internal requests (eg replication and object server logs)? proxy server logs (ie API access)? | 04:19 |
fungi | not drunk in the least (unfortunately) | 04:19 |
lifeless | fungi: GREAT, lets do this! | 04:19 |
* hub_cap runs away to let notmyname / mordred discuss ;) | 04:19 | |
fungi | checking up on what the current state is so i don't jump in blind, so just a sec | 04:19 |
notmyname | fungi: what I know about yurts is that they are in Kazakhstan and you drink fermented mare's milk | 04:20 |
fungi | notmyname: this one was somehow not in kazakhstan | 04:20 |
notmyname | fungi: that's much less exciting | 04:20 |
fungi | and they were fresh out of kefir | 04:21 |
ttx | not drunk. | 04:21 |
hub_cap | ttx: lame | 04:21 |
hub_cap | ;) | 04:21 |
fungi | ttx: i think that wine was non-alcoholic or something (or else the conference circuit has hardened my liver) | 04:22 |
ttx | https://review.openstack.org/#/c/68135/ is still deep in the queue, so I guess I should just sleep now | 04:22 |
fungi | so, gate scheduling changes seem to be helping. nodepool thrash is gone, gone, gone | 04:23 |
fungi | i think it's safe to go ahead and restart nodepool now | 04:23 |
*** harlowja is now known as harlowja_away | 04:23 | |
*** dcramer__ has quit IRC | 04:25 | |
mordred | notmyname: I think we notice it because of the logstash processing - so I'd probably say "everywhere that things log things?" | 04:26 |
fungi | nodepoold killed | 04:26 |
mordred | fungi: hey... | 04:27 |
fungi | mordred: hey | 04:27 |
mordred | fungi: I've got jenkins05 up - do we want to co-locate the nodepool restart with adding that? or wait because the queue is less suck? | 04:27 |
fungi | mordred: queue seems okay actually | 04:27 |
notmyname | mordred: ok. thanks. | 04:27 |
fungi | adding jenkins masters is only a config change, so no nodepoold restart needed for that | 04:28 |
lifeless | fungi: so I'd read that as no | 04:28 |
lifeless | fungi: if it doesn't need a restart, don't do one ;) | 04:28 |
fungi | lifeless: mordred: right | 04:29 |
fungi | puppet config applied and agent started | 04:31 |
*** markmcclain has joined #openstack-infra | 04:32 | |
fungi | nodepool started and didn't insta-die... good sign | 04:32 |
mordred | w00t | 04:33 |
mordred | fungi: ++ | 04:34 |
fungi | killing the list of building nodes i recorded now | 04:34 |
lifeless | fungi: cool | 04:35 |
lifeless | fungi: I see a template building now | 04:35 |
*** morganfainberg is now known as morganfainberg|z | 04:35 | |
*** vogxn has joined #openstack-infra | 04:35 | |
lifeless | fungi: if you hit quota issues, let me know, it's set fairly low because this cloud has only one hypervisor ATM | 04:36 |
fungi | will do--it'll be a bit before i can start paying attention to logs and node/image lists | 04:36 |
*** vogxn has left #openstack-infra | 04:37 | |
lifeless | ack | 04:38 |
fungi | mass deletes of the stale building nodes is underway now | 04:41 |
*** praneshp has quit IRC | 04:41 | |
*** markmcclain has quit IRC | 04:42 | |
*** markwash_ has joined #openstack-infra | 04:43 | |
*** markmcclain has joined #openstack-infra | 04:43 | |
*** markwash has quit IRC | 04:43 | |
*** markwash_ is now known as markwash | 04:43 | |
*** praneshp has joined #openstack-infra | 04:43 | |
*** thuc_ has quit IRC | 04:43 | |
*** thuc has joined #openstack-infra | 04:44 | |
*** praneshp has quit IRC | 04:44 | |
*** ryanpetrello has quit IRC | 04:46 | |
*** thuc has quit IRC | 04:48 | |
*** ryanpetrello has joined #openstack-infra | 04:48 | |
*** smurugesan has joined #openstack-infra | 04:52 | |
*** yamahata has quit IRC | 04:57 | |
*** smemon92 has joined #openstack-infra | 04:58 | |
openstackgerrit | A change was merged to openstack-infra/config: Add mailing list for OpenStack Ambassadors https://review.openstack.org/66478 | 05:02 |
*** nati_ueno has joined #openstack-infra | 05:03 | |
*** nati_uen_ has joined #openstack-infra | 05:05 | |
*** nati_ueno has quit IRC | 05:08 | |
*** rakhmerov has joined #openstack-infra | 05:09 | |
openstackgerrit | Darragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Add tests for YamlParser and patch 2.6 minidom https://review.openstack.org/63579 | 05:10 |
*** harlowja_away is now known as harlowja | 05:10 | |
openstackgerrit | Noboru Arai proposed a change to openstack-dev/hacking: Checking for vim tag https://review.openstack.org/68556 | 05:13 |
*** gokrokve has quit IRC | 05:14 | |
*** talluri has joined #openstack-infra | 05:18 | |
SpamapS | are there other critical bugs in the gate that should definitely be in front of https://review.openstack.org/#/c/68135/ ? | 05:19 |
SpamapS | critical Heat bug.. terrible thing really.. but it has been "queued" all day. | 05:20 |
*** yamahata has joined #openstack-infra | 05:20 | |
*** senk has joined #openstack-infra | 05:20 | |
clarkb | who knows | 05:22 |
*** senk1 has joined #openstack-infra | 05:23 | |
clarkb | SpamapS: I am not able to do promotions of jobs now, but if that is still floundering tomorrow ping us here and we can promote it to the head of the queue | 05:23 |
clarkb | also the new rate limiting stuff seems to be helping quite a bit | 05:24 |
StevenK | clarkb: Do you plan to have it rebalance and such for failures anywhere in the running queue not just the window? | 05:24 |
clarkb | StevenK: I don't parse the question | 05:25 |
*** senk has quit IRC | 05:25 | |
clarkb | StevenK: what do you maen by rebalance? | 05:25 |
StevenK | clarkb: If you look at the queue now, it hasn't kicked out 67349,1 properly | 05:26 |
clarkb | StevenK: oh right. ya that is a bug. if the window shrinks when there are running jobs it basically leaves them hanging. Then when things shift into the window they are dealt with properly | 05:26 |
clarkb | I noticed this just before heading home today and a quick look at the zuul scheduler code doesn't have me hopeful it will be easy to fix. I do hope to fix it though | 05:27 |
*** nicedice has quit IRC | 05:27 | |
clarkb | StevenK: best I can tell the current code is correct just not as efficient as it could be | 05:27 |
StevenK | clarkb: Do you track the entire running queue, or just whatever the window is? | 05:27 |
clarkb | StevenK: we track the entire queue except for when we do things like start and cancel jobs :) | 05:28 |
clarkb | so the data is there, I just need to untangle the loop that reacts to certain events so that the window only affects job starts and not stops | 05:28 |
StevenK | Right | 05:28 |
lifeless | and we have slaves | 05:29 |
clarkb | StevenK: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/scheduler.py#n1115 is where everything happens | 05:29 |
clarkb | StevenK: and at http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/scheduler.py#n1186 I limit calls to that based on the window | 05:29 |
lifeless | https://jenkins01.openstack.org/job/gate-tripleo-deploy/4/ | 05:30 |
lifeless | yay | 05:30 |
*** praneshp has joined #openstack-infra | 05:30 | |
lifeless | like a bought one | 05:30 |
clarkb | so _processOneItem should probably be split into two things, one that is run on the entire queue and another function that runs only on the window | 05:30 |
lifeless | fungi: thank you! | 05:30 |
StevenK | clarkb: That sounds like a good plan | 05:31 |
clarkb | StevenK: I am pretty sure that can be done without any other changes to the zuul scheduler, I will try digging into it tomorrow | 05:32 |
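clarkb's proposed split of _processOneItem (one pass over the entire queue, another limited to the window) could look roughly like the sketch below. This is not zuul's code: QueueItem and process_queue are hypothetical names, and "cancel jobs" / "launch jobs" are reduced to flipping a flag. The point it illustrates is the bug StevenK noticed: when cancellation only happens inside the window, a failing item that drops out of a shrunken window is left hanging.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueItem:
    change: str            # e.g. "67349,1"
    failing: bool = False  # item is already known to be failing
    running: bool = False  # jobs are currently running for this item


def process_queue(items: List[QueueItem], window: int) -> None:
    # Pass 1 runs over the WHOLE queue, so an item that fell outside a
    # shrunken window still gets its now-useless jobs cancelled.
    for item in items:
        if item.failing and item.running:
            item.running = False  # stands in for "cancel jobs"
    # Pass 2 only lets items inside the active window start new jobs.
    for item in items[:window]:
        if not item.failing and not item.running:
            item.running = True  # stands in for "launch jobs"


queue = [QueueItem("a"), QueueItem("b"),
         QueueItem("c", failing=True, running=True), QueueItem("d")]
process_queue(queue, window=2)
print([(i.change, i.running) for i in queue])
```

With a window of 2, items "a" and "b" start, the failing item "c" is cancelled even though it sits outside the window, and "d" waits its turn.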
smemon92 | Hi, I am unable to register a new email address on review.openstack.org, please help | 05:33 |
fungi | lifeless: thank YOU! i'm just sorry i couldn't spare time to try it out until now | 05:34 |
*** ryanpetrello has quit IRC | 05:34 | |
*** pballand has quit IRC | 05:34 | |
lifeless | failed ah well. | 05:34 |
fungi | smemon92: does it give you an error message when you try to enter it? or is it having problems with the confirmation e-mail it sends you at the new address (are you receiving that at all)? | 05:35 |
lifeless | hmm, not much in the way of log output | 05:35 |
*** pballand has joined #openstack-infra | 05:36 | |
smemon92 | fungi: I am receiving the email, but when I try to confirm it I get an error like this: "Server Error,Identity in use by another account" | 05:38 |
*** pballand has quit IRC | 05:39 | |
*** afazekas has quit IRC | 05:40 | |
*** coolsvap has joined #openstack-infra | 05:41 | |
*** dguitarbite has joined #openstack-infra | 05:43 | |
fungi | smemon92: okay, it sounds like you may have more than one account in gerrit and one of them already has that address associated with it. what's the address you're trying to add, and what does the settings page say your current account id number is? | 05:43 |
clarkb | StevenK: rereading _processOneItem() this is going to be fun. Maybe I can get jeblair to help untangle it | 05:44 |
fungi | clarkb: overall your adaptive throttle is having a great effect on node starvation | 05:44 |
fungi | and more rapidly than i anticipated | 05:45 |
*** gokrokve has joined #openstack-infra | 05:45 | |
clarkb | fungi: ya it seems to be doing well. check is all but cleared, trigger and result queues trend towards zero, and stuff is merging | 05:46 |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 05:46 | |
clarkb | fungi: the issue StevenK points out is an efficiency thing but not a correctness thing | 05:46 |
clarkb | fungi: I think now the big thing will be refining the scaling and the way the data is presented | 05:46 |
fungi | being able to tune much of that hitlessly through config reloads will be nice | 05:47 |
*** kraman1 has joined #openstack-infra | 05:47 | |
*** mayu_ has quit IRC | 05:48 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 05:49 | |
smemon92 | fungi: I am trying to add "salman@aptira.com" and my current account id number is "10071". I already registered it with my previous account, but now I deleted my previous account | 05:51 |
fungi | smemon92: i'll have a look in the gerrit database and get that cleaned up. just a moment and i'll let you know when it's ready to try again | 05:52 |
clarkb | you can't delete accounts in gerrit fwiw | 05:53 |
smemon92 | fungi: ya sure , thank u | 05:53 |
*** nati_ueno has joined #openstack-infra | 05:53 | |
*** nati_ueno has quit IRC | 05:53 | |
*** nati_ueno has joined #openstack-infra | 05:54 | |
mordred | clarkb: you can with a big enough hammer | 05:56 |
*** nati_uen_ has quit IRC | 05:56 | |
mordred | clarkb: also, I'm quite impressed that the adaptive throttle is so effective | 05:57 |
clarkb | the exponential backoff is pretty heavy handed | 05:58 |
clarkb | we just had another window shrink; I think it is ~5 now | 05:58 |
clarkb | we might need to think about maybe something more linear +1 on passes -2 on failures | 05:58 |
*** kraman1 has quit IRC | 05:58 | |
fungi | smemon92: i've removed that e-mail address from your old account (id 8923). also deleted smemon92@gmail.com and your old ssh username "salman" from it as well | 05:58 |
fungi | clarkb: i dunno, i think we should give it some time to see where it settles out with the current heuristic first | 06:00 |
mordred | fungi: ++ | 06:02 |
*** markmcclain has quit IRC | 06:02 | |
fungi | i have a feeling cautious incrementing coupled with vicious halving will work out well, but i'll reserve judgment until we have some data | 06:04 |
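The two throttling policies being weighed here can be compared with a small simulation. This is a sketch under stated assumptions, not zuul's implementation: it assumes the current heuristic grows the window by 1 per passing change and halves it per failure, with the floor of 3 sdague mentions later; linear_policy is clarkb's suggested +1/-2 alternative. The function names and the starting window of 20 are hypothetical.

```python
def halving_policy(window: int, passed: bool, floor: int = 3) -> int:
    # Assumed current heuristic: +1 on a pass, halve on a failure, never below floor.
    return window + 1 if passed else max(floor, window // 2)


def linear_policy(window: int, passed: bool, floor: int = 3) -> int:
    # clarkb's suggested alternative: +1 on a pass, -2 on a failure.
    return window + 1 if passed else max(floor, window - 2)


def simulate(policy, results, start=20, floor=3):
    """Return the window size after each gate result (True = passed)."""
    window = start
    history = []
    for passed in results:
        window = policy(window, passed, floor)
        history.append(window)
    return history


print(simulate(halving_policy, [False, False, False]))
print(simulate(linear_policy, [False, False, False]))
```

Three failures in a row take the halving policy from 20 to the floor (10, 5, 3), while the linear policy only drops to 18, 16, 14. That gap is the "heavy handed" behaviour clarkb describes and the "vicious halving" fungi wants to keep observing before changing.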
*** rcleere has quit IRC | 06:06 | |
smemon92 | fungi : thank you so much , I added my account | 06:10 |
fungi | smemon92: you're welcome | 06:10 |
*** praneshp_ has joined #openstack-infra | 06:13 | |
*** Ryan_Lane has quit IRC | 06:14 | |
*** Ryan_Lane has joined #openstack-infra | 06:14 | |
*** CaptTofu has joined #openstack-infra | 06:14 | |
*** talluri has quit IRC | 06:15 | |
*** talluri has joined #openstack-infra | 06:15 | |
*** praneshp has quit IRC | 06:16 | |
*** praneshp_ is now known as praneshp | 06:16 | |
fungi | image builds in nodepool look okay | 06:18 |
*** CaptTofu has quit IRC | 06:19 | |
*** talluri has quit IRC | 06:19 | |
fungi | this failure mode worries me... https://jenkins01.openstack.org/job/gate-nova-python27/17278/console | 06:20 |
fungi | looks like a child process dying | 06:20 |
clarkb | ya its affecting a lot of stuff | 06:21 |
clarkb | but pretty sure it isn't an infra problem | 06:21 |
fungi | nova problem? | 06:22 |
*** afazekas has joined #openstack-infra | 06:22 | |
*** afazekas has quit IRC | 06:22 | |
fungi | or has it been hitting other jobs/projects? | 06:22 |
*** UtahDave has quit IRC | 06:22 | |
clarkb | nova problem | 06:22 |
fungi | k, i'm now less worried | 06:24 |
*** oubiwann_ has quit IRC | 06:25 | |
*** senk1 has quit IRC | 06:31 | |
fungi | two in a row failing the same job... maybe the first is introducing it? | 06:33 |
clarkb | I don't think so there are check tests failing too | 06:35 |
*** pcrews has joined #openstack-infra | 06:37 | |
*** gyee has quit IRC | 06:38 | |
clarkb | fungi: it is running `sqlite3 test_bigint.sqlite '.schema'` and that is failing | 06:38 |
fungi | ahh | 06:38 |
fungi | consistently on that one test | 06:38 |
fungi | this on the other hand is troubling... https://jenkins02.openstack.org/job/gate-tempest-dsvm-neutron-large-ops/14063/consoleText | 06:38 |
fungi | looks like a hung apt-get install hanging around maybe | 06:39 |
clarkb | fungi: we don't do auto updates on those nodes, do we? might also be that | 06:39 |
fungi | oh, i wonder if we broke the exclusion for that | 06:40 |
*** harlowja is now known as harlowja_away | 06:45 | |
fungi | the complimentary wireless here is flaking out pretty badly. nodepool seems stable still, so knocking off for the night | 06:48 |
*** nati_uen_ has joined #openstack-infra | 06:48 | |
*** matsuhashi has quit IRC | 06:50 | |
clarkb | good night, I am about to do the same | 06:51 |
*** nati_ueno has quit IRC | 06:51 | |
*** pblaho has joined #openstack-infra | 06:52 | |
*** odyssey4me has joined #openstack-infra | 06:53 | |
*** matsuhashi has joined #openstack-infra | 06:54 | |
*** smemon92 has quit IRC | 07:00 | |
*** CaptTofu has joined #openstack-infra | 07:00 | |
*** senk has joined #openstack-infra | 07:01 | |
*** CaptTofu has quit IRC | 07:05 | |
*** senk has quit IRC | 07:06 | |
*** gokrokve has quit IRC | 07:11 | |
*** rakhmerov has quit IRC | 07:12 | |
*** yolanda_ has joined #openstack-infra | 07:15 | |
*** mrda is now known as mrda_away | 07:16 | |
*** gokrokve has joined #openstack-infra | 07:18 | |
*** vogxn has joined #openstack-infra | 07:20 | |
*** yolanda_ has quit IRC | 07:21 | |
*** emagana has joined #openstack-infra | 07:22 | |
*** dstanek has quit IRC | 07:22 | |
*** gokrokve has quit IRC | 07:22 | |
*** amotoki has joined #openstack-infra | 07:30 | |
*** senk has joined #openstack-infra | 07:33 | |
openstackgerrit | Guido Günther proposed a change to openstack-infra/jenkins-job-builder: tests: Allow to test project parameters https://review.openstack.org/67265 | 07:39 |
openstackgerrit | Guido Günther proposed a change to openstack-infra/jenkins-job-builder: project_maven: Don't require artifact-id and group-id https://review.openstack.org/66036 | 07:39 |
*** odyssey4me has quit IRC | 07:39 | |
*** odyssey4me has joined #openstack-infra | 07:40 | |
*** talluri has joined #openstack-infra | 07:41 | |
*** boris-42 has quit IRC | 07:41 | |
*** gokrokve has joined #openstack-infra | 07:41 | |
*** nati_ueno has joined #openstack-infra | 07:42 | |
*** ladquin_afk has quit IRC | 07:43 | |
*** che-arne has joined #openstack-infra | 07:45 | |
*** emagana has quit IRC | 07:45 | |
*** nati_uen_ has quit IRC | 07:46 | |
*** yamahata has quit IRC | 07:46 | |
*** gokrokve has quit IRC | 07:46 | |
*** gokrokve has joined #openstack-infra | 07:47 | |
*** odyssey4me has quit IRC | 07:49 | |
*** yolanda_ has joined #openstack-infra | 07:55 | |
*** odyssey4me has joined #openstack-infra | 07:57 | |
*** yamahata has joined #openstack-infra | 08:02 | |
*** bauzas has joined #openstack-infra | 08:03 | |
*** jcoufal has joined #openstack-infra | 08:08 | |
*** flaper87|afk is now known as flaper87 | 08:12 | |
*** luqas has joined #openstack-infra | 08:15 | |
*** pblaho has quit IRC | 08:20 | |
*** pblaho has joined #openstack-infra | 08:20 | |
*** dizquierdo has joined #openstack-infra | 08:21 | |
*** morganfainberg|z has quit IRC | 08:22 | |
*** morganfainberg|z has joined #openstack-infra | 08:25 | |
*** morganfainberg|z is now known as morganfainberg | 08:25 | |
*** andreaf has joined #openstack-infra | 08:25 | |
*** dizquierdo has quit IRC | 08:26 | |
*** markwash has quit IRC | 08:29 | |
*** markwash has joined #openstack-infra | 08:30 | |
*** markwash has quit IRC | 08:31 | |
*** gsamfira has joined #openstack-infra | 08:33 | |
*** jcoufal has quit IRC | 08:34 | |
*** jcoufal has joined #openstack-infra | 08:35 | |
openstackgerrit | ZhiQiang Fan proposed a change to openstack-dev/hacking: Enhance H233 rule https://review.openstack.org/68573 | 08:35 |
*** yaguang has quit IRC | 08:36 | |
*** dizquierdo has joined #openstack-infra | 08:38 | |
*** praneshp has quit IRC | 08:40 | |
*** vogxn1 has joined #openstack-infra | 08:47 | |
*** mancdaz_away is now known as mancdaz | 08:47 | |
*** vogxn has quit IRC | 08:50 | |
*** yamahata has quit IRC | 08:53 | |
*** che-arne has quit IRC | 08:56 | |
openstackgerrit | Nadya Privalova proposed a change to openstack/requirements: Fix happybase version https://review.openstack.org/68435 | 08:56 |
*** pblaho has quit IRC | 08:57 | |
*** CaptTofu has joined #openstack-infra | 09:01 | |
*** oubiwann has quit IRC | 09:05 | |
*** fbo_away is now known as fbo | 09:06 | |
*** smurugesan has quit IRC | 09:06 | |
*** vkozhukalov has quit IRC | 09:06 | |
*** bknudson has quit IRC | 09:06 | |
*** CaptTofu has quit IRC | 09:06 | |
*** oubiwann has joined #openstack-infra | 09:06 | |
*** yassine has joined #openstack-infra | 09:10 | |
*** jpich has joined #openstack-infra | 09:10 | |
*** smurugesan has joined #openstack-infra | 09:11 | |
*** vkozhukalov has joined #openstack-infra | 09:11 | |
*** bknudson has joined #openstack-infra | 09:11 | |
*** yamahata has joined #openstack-infra | 09:11 | |
*** pblaho has joined #openstack-infra | 09:11 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 09:12 | |
*** markmc has joined #openstack-infra | 09:13 | |
*** smurugesan has quit IRC | 09:13 | |
*** vkozhukalov has quit IRC | 09:13 | |
*** bknudson has quit IRC | 09:13 | |
*** rpodolyaka has quit IRC | 09:13 | |
*** vkozhukalov has joined #openstack-infra | 09:14 | |
*** smurugesan has joined #openstack-infra | 09:14 | |
*** derekh has joined #openstack-infra | 09:14 | |
*** masayukig has joined #openstack-infra | 09:16 | |
*** Ryan_Lane has quit IRC | 09:16 | |
*** rpodolyaka has joined #openstack-infra | 09:16 | |
*** salv-orlando has quit IRC | 09:17 | |
*** smurugesan has quit IRC | 09:19 | |
*** johnthetubaguy has joined #openstack-infra | 09:22 | |
*** mancdaz is now known as mancdaz_away | 09:22 | |
*** mancdaz_away is now known as mancdaz | 09:23 | |
*** beagles has quit IRC | 09:30 | |
*** b3nt_pin has joined #openstack-infra | 09:35 | |
*** luqas has quit IRC | 09:38 | |
*** jooools has joined #openstack-infra | 09:41 | |
*** talluri_ has joined #openstack-infra | 09:41 | |
*** talluri has quit IRC | 09:42 | |
*** coolsvap has quit IRC | 09:43 | |
*** coolsvap has joined #openstack-infra | 09:45 | |
*** johnthetubaguy1 has joined #openstack-infra | 09:45 | |
*** boris-42 has joined #openstack-infra | 09:46 | |
*** johnthetubaguy has quit IRC | 09:47 | |
*** bknudson has joined #openstack-infra | 09:48 | |
*** DinaBelova_ is now known as DinaBelova | 09:49 | |
*** ArxCruz has joined #openstack-infra | 09:50 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 09:51 | |
*** masayukig has quit IRC | 10:00 | |
*** max_lobur_afk is now known as max_lobur | 10:07 | |
*** luqas has joined #openstack-infra | 10:09 | |
*** pblaho has quit IRC | 10:14 | |
*** pblaho has joined #openstack-infra | 10:14 | |
*** vogxn1 has quit IRC | 10:24 | |
*** ociuhandu has joined #openstack-infra | 10:28 | |
*** DinaBelova is now known as DinaBelova_ | 10:36 | |
*** jp_at_hp has joined #openstack-infra | 10:38 | |
*** odyssey4me has quit IRC | 10:38 | |
*** DinaBelova_ is now known as DinaBelova | 10:45 | |
*** odyssey4me has joined #openstack-infra | 10:47 | |
*** afazekas has joined #openstack-infra | 10:50 | |
*** che-arne has joined #openstack-infra | 10:51 | |
*** dizquierdo has quit IRC | 11:00 | |
*** CaptTofu has joined #openstack-infra | 11:02 | |
*** matsuhashi has quit IRC | 11:05 | |
*** salv-orlando has joined #openstack-infra | 11:06 | |
*** CaptTofu has quit IRC | 11:06 | |
sdague | so I think we need a new floor in zuul | 11:12 |
sdague | the floor of 3 is definitely way too low, I suggest 10, and I suggest faster upward growth on success | 11:12 |
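Some rough arithmetic on sdague's suggestion, assuming the window halves on each failure and grows by a fixed step on each pass. The target window of 20, the step of 2, and the helper names are all hypothetical, chosen only to show how the floor and the growth rate interact.

```python
def failures_to_floor(start: int, floor: int) -> int:
    """Consecutive failures (halving each time) before the window hits the floor."""
    count, window = 0, start
    while window > floor:
        window = max(floor, window // 2)
        count += 1
    return count


def passes_to_reach(target: int, floor: int, step: int = 1) -> int:
    """Consecutive passes needed to grow from the floor back up to target."""
    if floor >= target:
        return 0
    # Ceiling division: each pass adds `step` to the window.
    return -(-(target - floor) // step)


for floor, step in [(3, 1), (10, 1), (10, 2)]:
    print(floor, step, failures_to_floor(20, floor), passes_to_reach(20, floor, step))
```

With a floor of 3, three failures pin a window of 20 at the floor and it takes 17 passes to climb back; with a floor of 10 a single failure reaches the floor and recovery takes 10 passes, or 5 if each pass adds 2. This is the kind of idle-capacity tradeoff SergeyLukjanov confirms later in the log.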
*** matsuhashi has joined #openstack-infra | 11:14 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard-webclient: Added node_no_api env https://review.openstack.org/68610 | 11:18 |
*** rossella_s has joined #openstack-infra | 11:28 | |
*** dpyzhov has quit IRC | 11:31 | |
*** rfolco has joined #openstack-infra | 11:32 | |
*** dpyzhov has joined #openstack-infra | 11:36 | |
*** michchap has quit IRC | 11:40 | |
*** salv-orlando has quit IRC | 11:41 | |
*** odyssey4me has quit IRC | 11:42 | |
*** che-arne has quit IRC | 11:44 | |
*** lcestari has joined #openstack-infra | 11:47 | |
*** yassine has quit IRC | 11:52 | |
*** odyssey4me has joined #openstack-infra | 11:53 | |
*** salv-orlando has joined #openstack-infra | 11:55 | |
*** markmc has quit IRC | 11:55 | |
*** salv-orlando has quit IRC | 11:56 | |
*** weshay has joined #openstack-infra | 11:57 | |
*** dpyzhov has quit IRC | 11:57 | |
*** simonmcc has joined #openstack-infra | 12:08 | |
*** pblaho has quit IRC | 12:08 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Add a sample config file https://review.openstack.org/68620 | 12:12 |
*** jooools has quit IRC | 12:12 | |
*** salv-orlando has joined #openstack-infra | 12:13 | |
*** senk has quit IRC | 12:14 | |
*** senk has joined #openstack-infra | 12:16 | |
*** pblaho has joined #openstack-infra | 12:20 | |
*** senk1 has joined #openstack-infra | 12:21 | |
*** senk has quit IRC | 12:22 | |
*** luqas has quit IRC | 12:23 | |
*** alexpilotti has joined #openstack-infra | 12:25 | |
*** salv-orlando has quit IRC | 12:25 | |
*** xchu has joined #openstack-infra | 12:25 | |
*** b3nt_pin has quit IRC | 12:27 | |
*** b3nt_pin has joined #openstack-infra | 12:27 | |
*** b3nt_pin is now known as beagles | 12:28 | |
sdague | woot - https://github.com/eventlet/eventlet/pull/75 - looks like that eventlet patch will land | 12:29 |
sdague | giving us control over the logging there | 12:29 |
*** alexpilotti has quit IRC | 12:30 | |
*** pblaho has quit IRC | 12:30 | |
*** pblaho has joined #openstack-infra | 12:31 | |
*** luqas has joined #openstack-infra | 12:33 | |
*** xchu has quit IRC | 12:34 | |
*** nosnos has quit IRC | 12:36 | |
sdague | hmmm.... something not working on how we generate our uncategorized page | 12:37 |
openstackgerrit | Sergey Lukjanov proposed a change to openstack-infra/elastic-recheck: Add fingerprint for bug 1268732 https://review.openstack.org/68625 | 12:38 |
*** markmc has joined #openstack-infra | 12:38 | |
sdague | SergeyLukjanov: nice | 12:39 |
*** matsuhashi has quit IRC | 12:39 | |
SergeyLukjanov | sdague, hi | 12:39 |
sdague | just approved your new er fingerprint | 12:39 |
SergeyLukjanov | sdague, I'm not really sure that I've done it correctly :) | 12:40 |
sdague | it looks right to me | 12:40 |
*** smarcet has joined #openstack-infra | 12:40 | |
SergeyLukjanov | sdague, ok, thx, now I know what to do when I see some new error in logs ;) | 12:40 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add fingerprint for bug 1268732 https://review.openstack.org/68625 | 12:41 |
sdague | so the foundation doing a retreat in Utah means they moved fungi back 2hrs. :( Need to get someone to look at adjusting the zuul window params | 12:41 |
sdague | as it's being too conservative now | 12:41 |
SergeyLukjanov | sdague, are you speaking about pipeline window? | 12:45 |
sdague | yeh | 12:45 |
sdague | the floor is too low, we're now sitting on a bunch of idle capacity | 12:46 |
*** dpyzhov has joined #openstack-infra | 12:46 | |
SergeyLukjanov | sdague, yep, agreed, we have a bunch of free nodes | 12:46 |
SergeyLukjanov | about a half of free nodes I think | 12:47 |
sdague | yep | 12:47 |
*** dguitarbite has quit IRC | 12:48 | |
SergeyLukjanov | I've missed the moment when this feature was added, so, looking now at the implementation | 12:48 |
sdague | yesterday | 12:48 |
sdague | to prevent the massive thrashing | 12:49 |
sdague | grep for window | 12:49 |
sdague | it's mostly on the model.py side | 12:49 |
*** luqas has quit IRC | 12:49 | |
portante | sdague: nice work on the eventlet patch | 12:51 |
sdague | portante: thanks | 12:51 |
*** senk1 has quit IRC | 12:52 | |
*** senk has joined #openstack-infra | 12:52 | |
*** coolsvap has quit IRC | 12:58 | |
SergeyLukjanov | sdague, heh, I forgot to fetch the latest code ;) | 12:58 |
*** alexpilotti has joined #openstack-infra | 12:59 | |
*** _ruhe is now known as ruhe | 13:01 | |
SergeyLukjanov | sdague, it looks nice | 13:02 |
*** miqui has joined #openstack-infra | 13:02 | |
SergeyLukjanov | sdague, and it looks like the floor is really too small for the current overall number of nodes | 13:02 |
*** CaptTofu has joined #openstack-infra | 13:03 | |
*** david_lyle has quit IRC | 13:03 | |
*** heyongli has joined #openstack-infra | 13:03 | |
*** yassine has joined #openstack-infra | 13:03 | |
*** dizquierdo has joined #openstack-infra | 13:05 | |
*** eharney has joined #openstack-infra | 13:05 | |
*** CaptTofu has quit IRC | 13:08 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: API tests for rest https://review.openstack.org/67447 | 13:12 |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 13:13 | |
anteaya | ttx you around to deal with a spammer in -neutron? | 13:15 |
*** jooools has joined #openstack-infra | 13:16 | |
*** CaptTofu has joined #openstack-infra | 13:17 | |
anteaya | sdague: isolated jobs started passing for neutron patches about 7 hours ago | 13:19 |
anteaya | as far as I can tell no code has changed | 13:19 |
*** rpodolyaka has quit IRC | 13:19 | |
anteaya | any thoughts on what might be contributing factors for the passing tests? | 13:19 |
*** rpodolyaka has joined #openstack-infra | 13:20 | |
*** jcoufal has quit IRC | 13:21 | |
*** ruhe is now known as _ruhe | 13:22 | |
*** jcoufal has joined #openstack-infra | 13:22 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 13:22 | |
anteaya | isolated jobs in the check queue | 13:24 |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: API tests for rest https://review.openstack.org/67447 | 13:24 |
anteaya | I spoke too soon, now the running jobs are failing | 13:26 |
anteaya | seems we just got some through the 30% passing gap | 13:26 |
*** rahmu has left #openstack-infra | 13:37 | |
*** thomasem has joined #openstack-infra | 13:40 | |
*** rahmu has joined #openstack-infra | 13:41 | |
*** talluri_ has quit IRC | 13:44 | |
*** talluri has joined #openstack-infra | 13:44 | |
*** dstufft is now known as caremad | 13:44 | |
*** CaptTofu has quit IRC | 13:44 | |
*** caremad is now known as dstufft | 13:45 | |
*** rpodolyaka has quit IRC | 13:45 | |
*** SergeyLukjanov is now known as SergeyLukjanov_a | 13:50 | |
russellb | jeblair: pong from yesterday | 13:50 |
*** SergeyLukjanov_a is now known as SergeyLukjanov_ | 13:51 | |
*** talluri has quit IRC | 13:54 | |
*** senk has quit IRC | 13:54 | |
*** senk1 has joined #openstack-infra | 13:54 | |
*** talluri has joined #openstack-infra | 13:56 | |
*** luqas has joined #openstack-infra | 13:57 | |
*** thuc has joined #openstack-infra | 13:57 | |
*** thuc_ has joined #openstack-infra | 13:58 | |
*** yamahata has quit IRC | 13:58 | |
*** johnthetubaguy1 is now known as johnthetubaguy | 13:59 | |
*** dkliban has quit IRC | 14:00 | |
*** dpyzhov has quit IRC | 14:02 | |
*** thuc has quit IRC | 14:02 | |
*** markmcclain has joined #openstack-infra | 14:03 | |
*** dpyzhov has joined #openstack-infra | 14:03 | |
ttx | anteaya: I don't think I have such privileges yet | 14:06 |
anteaya | okay | 14:06 |
anteaya | let's get you privileges | 14:07 |
ttx | anteaya: jeblair seems to be listed as channel founder | 14:07 |
anteaya | dude stopped at one advertisement fortunately | 14:07 |
anteaya | yes | 14:07 |
anteaya | July 5th, 2013 if memory serves | 14:07 |
*** senk has joined #openstack-infra | 14:07 | |
*** CaptTofu has joined #openstack-infra | 14:08 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Auth controller https://review.openstack.org/68642 | 14:09 |
*** heyongli has quit IRC | 14:10 | |
*** senk2 has joined #openstack-infra | 14:10 | |
*** senk1 has quit IRC | 14:11 | |
*** senk has quit IRC | 14:11 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 14:12 | |
*** boris-42_ has joined #openstack-infra | 14:14 | |
*** yamahata has joined #openstack-infra | 14:15 | |
*** katyafervent has quit IRC | 14:15 | |
*** senk2 has quit IRC | 14:15 | |
*** boris-42 has quit IRC | 14:15 | |
*** katyafervent has joined #openstack-infra | 14:15 | |
*** esker has joined #openstack-infra | 14:17 | |
*** _ruhe is now known as ruhe | 14:17 | |
*** mriedem has joined #openstack-infra | 14:18 | |
*** coolsvap has joined #openstack-infra | 14:18 | |
*** changbl has quit IRC | 14:19 | |
*** senk has joined #openstack-infra | 14:19 | |
*** dhellmann_ is now known as dhellmann | 14:19 | |
*** jooools has quit IRC | 14:21 | |
*** jooools has joined #openstack-infra | 14:22 | |
*** julim has joined #openstack-infra | 14:24 | |
*** dims has quit IRC | 14:26 | |
*** jooools1 has joined #openstack-infra | 14:27 | |
*** jooools has quit IRC | 14:27 | |
*** CaptTofu has quit IRC | 14:27 | |
sdague | ttx: any ideas when we'll see fungi this morning? | 14:28 |
*** dims has joined #openstack-infra | 14:28 | |
*** senk has quit IRC | 14:29 | |
ttx | sdague: should be here sometime in the next hour | 14:29 |
*** thuc_ has quit IRC | 14:30 | |
*** thuc has joined #openstack-infra | 14:30 | |
*** dpyzhov has quit IRC | 14:31 | |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/devstack-gate: Adding the tripleo repositories to PROJECTS https://review.openstack.org/68645 | 14:34 |
*** thuc has quit IRC | 14:34 | |
*** rfolco has quit IRC | 14:35 | |
derekh | Have set ^^ as a WIP would be good if somebody could confirm what I think is correct | 14:36 |
*** dkliban has joined #openstack-infra | 14:37 | |
*** matel is now known as matel_brb | 14:40 | |
*** matel_brb is now known as matel | 14:42 | |
*** max_lobur has quit IRC | 14:45 | |
*** mfer has joined #openstack-infra | 14:46 | |
*** max_lobur has joined #openstack-infra | 14:46 | |
*** gsamfira has quit IRC | 14:46 | |
*** burt1 has joined #openstack-infra | 14:50 | |
*** mfer has quit IRC | 14:50 | |
*** jooools1 has quit IRC | 14:50 | |
*** jooools has joined #openstack-infra | 14:51 | |
*** senk has joined #openstack-infra | 14:52 | |
fungi | sdague: never. ;) what's needed? | 14:53 |
*** senk has quit IRC | 14:53 | |
fungi | oh, zuul config | 14:53 |
sdague | fungi: so... can we set the window floor via rpc? | 14:53 |
sdague | or is that a zuul restart | 14:53 |
fungi | sdague: no rpc, but no restart (just a config change to adjust the minimum) | 14:53 |
sdague | because basically we're over starved | 14:53 |
sdague | fungi: it rereads it? | 14:54 |
*** esker has quit IRC | 14:54 | |
fungi | yep, on the fly as far as i know | 14:54 |
sdague | so I'd like to propose we up the window floor to 6, and the success increment to 2 | 14:54 |
*** esker has joined #openstack-infra | 14:54 | |
*** dpyzhov has joined #openstack-infra | 14:54 | |
sdague | or the floor to 10 if we can't change the success increment via config | 14:54 |
sdague | if you notice, we now have tons of unused quota | 14:55 |
fungi | well, not tons, but at least some. we're keeping up because constant failures have driven us down to 3 gate changes in parallel | 14:56 |
mordred | sdague: I'm not sure changing the floor is needed if the failures have driven us this low - I think the next change we need is jeblair's change and then the change to the joblist - I think this one is doing its down properly | 14:58 |
mordred | job | 14:58 |
mordred | not down | 14:58 |
mordred | also, morning! | 14:58 |
sdague | mordred: so tcp is really designed around having as few errors as possible | 15:00 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/zuul: Add require-approval to Gerrit trigger https://review.openstack.org/68516 | 15:00 |
*** rcleere has joined #openstack-infra | 15:00 | |
sdague | but we actually are ok with speculation fails | 15:00 |
sdague | and I think we can take more than we have | 15:00 |
mordred | fair | 15:01 |
sdague | fungi: sure but it takes us 1 hr to up our queue by 1 additional slot | 15:01 |
*** talluri has quit IRC | 15:01 | |
sdague | if these tests were turning around in 5 minutes, I think it would be different | 15:01 |
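A back-of-the-envelope sketch of the climb sdague describes, assuming the gate window grows by a fixed increment each time a change succeeds and that each success costs roughly one full gate run (the 60-minute figure is just the "1 hr" quoted above, not a measurement). `minutes_to_reach` is a hypothetical helper, not part of Zuul:

```python
import math

def minutes_to_reach(target: int, current: int, increment: int,
                     minutes_per_success: float) -> float:
    """Estimate how long a linearly growing window takes to climb from
    `current` to `target`, assuming no failures along the way."""
    if target <= current:
        return 0.0
    successes_needed = math.ceil((target - current) / increment)
    return float(successes_needed * minutes_per_success)

# Floor of 3, +1 per success, ~60 minutes per success, climbing to 10:
print(minutes_to_reach(10, 3, 1, 60))   # 420.0
# Floor of 6, +2 per success, same target:
print(minutes_to_reach(10, 6, 2, 60))   # 120.0
```

Under these assumptions, getting from a floor of 3 back to 10 takes about seven hours of clean runs; a higher floor and a larger increment cut that sharply.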
* anteaya raises her hand for experimenting with the floor value | 15:03 | |
anteaya | I'd like to see what happens if we do | 15:03 |
*** julim has quit IRC | 15:03 | |
sdague | or the increment must equal the floor, otherwise I think we'll always regress to floor | 15:03 |
sdague | given any amount of failure | 15:03 |
sdague | but for right now, floor 10 would do us fine | 15:03 |
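To illustrate why sdague expects the window to keep regressing to the floor, here is a toy model, assuming linear growth on each success and halving on each failure, clamped at the floor. The starting window of 20 and floor of 3 match numbers that come up in this log; `simulate_window` is hypothetical and does not reproduce Zuul's exact bookkeeping:

```python
def simulate_window(events, start=20, floor=3, increment=1, decrease_factor=2):
    """Return the window size after each event.

    'S' = a change succeeded (window grows linearly by `increment`);
    'F' = a gate failure (window shrinks by `decrease_factor`, never below `floor`).
    """
    window = start
    history = []
    for event in events:
        if event == "S":
            window += increment
        elif event == "F":
            window = max(floor, window // decrease_factor)
        else:
            raise ValueError(f"unknown event {event!r}")
        history.append(window)
    return history

# One failure for every three successes:
pattern = "SSSF" * 6
print(simulate_window(pattern)[-1])                         # 3
print(simulate_window(pattern, floor=6, increment=2)[-1])   # 6
```

With one failure in every four results, both settings end up pinned at their floor, so under a steady failure rate it is the floor, not the increment, that sets the effective parallelism.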
sdague | I also wonder if we could have nodepool try to always keep spares, not just when we are < 100 nodes | 15:05 |
sdague | or maybe it's already doing that | 15:06 |
sdague | and I'm reading the graph wrong | 15:06 |
*** luqas has quit IRC | 15:07 | |
mordred | sdague: it should always be trying to keep spares | 15:07 |
*** julim has joined #openstack-infra | 15:07 | |
sdague | ok, some times we just can't build fast enough then? | 15:07 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Zuul gate window increment by 2, floor at 6 https://review.openstack.org/68656 | 15:08 |
fungi | there's the original proposed change | 15:08 |
*** mfink has joined #openstack-infra | 15:09 | |
*** dstanek has joined #openstack-infra | 15:09 | |
fungi | sdague: so basically nodepool tries to keep 100 ready nodes, and when it sees additional need based on change activity it also tries to spin up additional nodes to accommodate that if it's not already satisfied in the base ready set | 15:10 |
*** jasondotstar has joined #openstack-infra | 15:10 | |
fungi | but node building takes long enough that there still ends up potentially being some delay when it spikes up there | 15:11 |
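A simplified sketch of the allocation rule fungi describes, assuming nodepool aims for whichever is larger of its fixed ready target and the current demand, counts nodes that are already building, and is capped by the remaining quota. `nodes_to_launch` is hypothetical and is not nodepool's actual code:

```python
def nodes_to_launch(min_ready: int, ready: int, building: int,
                    demand: int, quota_left: int) -> int:
    """How many new nodes to request under the simplified rule above."""
    wanted = max(min_ready, demand)           # fixed target, or demand if larger
    shortfall = wanted - (ready + building)   # nodes already on the way count
    return max(0, min(shortfall, quota_left))

print(nodes_to_launch(100, 60, 20, 50, 1000))   # 20
print(nodes_to_launch(100, 60, 20, 150, 30))    # 30 (quota-limited)
```

The sketch only decides how many launches to request; it says nothing about how long each build takes, which is where the delay during spikes comes from.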
*** luqas has joined #openstack-infra | 15:11 | |
sdague | fungi: cool | 15:11 |
anteaya | would it make sense to increase the value of ready nodes from 100 to something larger than 100? | 15:12 |
anteaya | since I do believe the value of 100 was from when we had 3 jenkinses | 15:12 |
sdague | that's an interesting idea | 15:13 |
sdague | though if we are building as fast as we can | 15:13 |
sdague | then I don't think it helps | 15:13 |
*** kraman1 has joined #openstack-infra | 15:14 | |
*** oubiwann_ has joined #openstack-infra | 15:14 | |
sdague | fungi: so I'd say lets get the windows out there as soon as possible | 15:14 |
sdague | because now that we aren't thrashing | 15:14 |
sdague | we are actually self limiting our throughput | 15:14 |
*** krotscheck has joined #openstack-infra | 15:15 | |
*** ociuhandu_ has joined #openstack-infra | 15:15 | |
*** jergerber has joined #openstack-infra | 15:15 | |
*** ociuhandu has quit IRC | 15:15 | |
*** ociuhandu_ is now known as ociuhandu | 15:15 | |
anteaya | mordred: how goes the configuring of the 3 additional jenkinses? | 15:16 |
anteaya | I did see you say you had 05 up | 15:16 |
*** changbl has joined #openstack-infra | 15:16 | |
fungi | well, we're testing 3 changes in parallel and it will go up to 4 in 14 minutes if that top one succeeds and none fail, but yes right now we're not using as much of our quota as we could (though i question whether the throughput is likely to increase much if the reset rate has been bad enough to drive us to the floor overnight) | 15:16 |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/config: Remove TRIPLEO_ROOT and pull-tools https://review.openstack.org/68661 | 15:16 |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/config: Switch path to toci_gate_test.sh https://review.openstack.org/68662 | 15:16 |
fungi | anyway, i'm in favor of trying it | 15:18 |
russellb | fungi: i've got some nova security patches to approve today ... can we have stable +A back? :-) sdague says it should be good again | 15:21 |
sdague | fungi: yeh, I think we're going to be playing tuning games on it for maximum throughput | 15:22 |
openstackgerrit | A change was merged to openstack-infra/config: Zuul gate window increment by 2, floor at 6 https://review.openstack.org/68656 | 15:22 |
*** odyssey4me has quit IRC | 15:23 | |
*** afazekas has quit IRC | 15:23 | |
*** pballand has joined #openstack-infra | 15:23 | |
*** dcramer__ has joined #openstack-infra | 15:24 | |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Simple round trip API integration with storyboard-api https://review.openstack.org/68528 | 15:25 |
sdague | also, I wonder if we should flip the order on the stacked bar - http://graphite.openstack.org/render/?from=-24hours&height=180&until=now&width=334&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.used),%20%27In%20Use%27),%20%276464ff%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.building),%20%27Building%27),%20%27ffbf52%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.ready),%20%27Available%27),%20%2700c868%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.delete),%20%27Deleting%27),%20%27c864ff%27)&title=Test%20Nodes&_t=0.8432196965441108#1390490560141 | 15:26 |
*** ryanpetrello has joined #openstack-infra | 15:26 | |
*** jgrimm has joined #openstack-infra | 15:26 | |
sdague | or - http://goo.gl/NzQCxl | 15:26 |
*** sandywalsh has joined #openstack-infra | 15:26 | |
sdague | because the blue part is basically our throughput, and nice to see how that is changing over time | 15:27 |
anteaya | I agree, it would be easier to assess changes to throughput if the lower limit were stable, as portrayed in the above graph | 15:28 |
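A sketch of building that graph URL programmatically, assuming Graphite stacks series in the order their `target` parameters appear, so flipping the stack is just reversing the list. The gauge names, labels, and colours come from the URL above; the helper names are hypothetical:

```python
from urllib.parse import urlencode, urlparse, parse_qs

RENDER_URL = "http://graphite.openstack.org/render/"

# (gauge suffix, legend label, colour), taken from the graph URL in this log
NODE_SERIES = [
    ("used", "In Use", "6464ff"),
    ("building", "Building", "ffbf52"),
    ("ready", "Available", "00c868"),
    ("delete", "Deleting", "c864ff"),
]

def target(state: str, label: str, colour: str) -> str:
    """One Graphite target: sum a nodepool gauge across all providers."""
    return (f"color(alias(sumSeries(stats.gauges.nodepool.target.*.*.*.{state}),"
            f" '{label}'), '{colour}')")

def node_graph_url(series) -> str:
    """Stacked 'Test Nodes' graph whose targets keep the order of `series`."""
    params = [("from", "-24hours"), ("areaMode", "stacked"), ("title", "Test Nodes")]
    params += [("target", target(*s)) for s in series]
    return RENDER_URL + "?" + urlencode(params)

flipped = node_graph_url(list(reversed(NODE_SERIES)))
print(parse_qs(urlparse(flipped).query)["target"][0])
```

Keeping the series in a list makes it easy to try each ordering and see which one keeps the "In Use" band on a stable baseline.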
*** dkranz has joined #openstack-infra | 15:29 | |
AJaeger | infra team, fungi, mordred - is there a chance to get gating for the api projects enabled, please? I'd really like to have them but if you're still too much in fire fighting, I'll wait - patch is https://review.openstack.org/#/c/67394/ | 15:31 |
*** prad_ has joined #openstack-infra | 15:32 | |
*** markmc has quit IRC | 15:32 | |
*** krtaylor has quit IRC | 15:35 | |
*** DennyZhang has joined #openstack-infra | 15:35 | |
*** changbl has quit IRC | 15:36 | |
*** boris-42_ has quit IRC | 15:39 | |
*** mfer has joined #openstack-infra | 15:45 | |
*** markmcclain has quit IRC | 15:45 | |
*** AJaeger has quit IRC | 15:47 | |
*** rakhmerov has joined #openstack-infra | 15:51 | |
*** MarkAtwood has joined #openstack-infra | 15:53 | |
*** esker has quit IRC | 15:53 | |
*** dhellmann is now known as dhellmann_ | 15:53 | |
*** esker has joined #openstack-infra | 15:53 | |
*** DennyZha` has joined #openstack-infra | 15:54 | |
*** jcoufal has quit IRC | 15:55 | |
*** DennyZhang has quit IRC | 15:55 | |
fungi | russellb: sure, adding that now | 15:55 |
*** gokrokve has quit IRC | 15:56 | |
*** jcooley_ has joined #openstack-infra | 15:57 | |
SergeyLukjanov | guys, is the FAILED floating_ips exercise known bug? | 15:57 |
*** esker has quit IRC | 15:58 | |
*** gothicmindfood has joined #openstack-infra | 15:58 | |
russellb | fungi: thanks! | 15:59 |
*** DennyZha` has quit IRC | 15:59 | |
*** gmurphy has joined #openstack-infra | 16:02 | |
fungi | SergeyLukjanov: on grizzly? | 16:02 |
SergeyLukjanov | fungi, on master | 16:03 |
*** UtahDave has joined #openstack-infra | 16:03 | |
SergeyLukjanov | fungi, http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRkFJTEVEIGZsb2F0aW5nX2lwc1wiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiIxNzI4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzkwNDkzMDA5MjE2fQ== | 16:03 |
fungi | doesn't ring a bell, though do we gate devstack exercises on master any longer? | 16:03 |
SergeyLukjanov | https://review.openstack.org/#/c/68596/ | 16:04 |
SergeyLukjanov | fungi, http://logs.openstack.org/96/68596/2/check/check-devstack-dsvm-neutron/467b223/console.html#_2014-01-23_14_40_25_908 | 16:05 |
SergeyLukjanov | looks like we have some :) | 16:05 |
fungi | so, just a data point... it looks like any adjustment to the zuul dependent pipeline windowing restarts its calculations back to start values (so increasing the floor to 6 has caused the current window to go back to 20 initially) | 16:08 |
*** rnirmal has joined #openstack-infra | 16:12 | |
anteaya | yeah, I saw that, I wondered how that happened | 16:12 |
*** emagana has joined #openstack-infra | 16:12 | |
openstackgerrit | Brant Knudson proposed a change to openstack-infra/elastic-recheck: Add fingerprint for bug 1271190 https://review.openstack.org/68678 | 16:13 |
*** mrmartin has joined #openstack-infra | 16:13 | |
*** rfolco has joined #openstack-infra | 16:14 | |
mriedem | bknudson: ^ | 16:15 |
mriedem | that shows up in successful runs 77% of the time | 16:15 |
mriedem | you'll have to dig deeper | 16:15 |
bknudson | mriedem: how is that possible? errors in the logs don't cause problems? | 16:15 |
bknudson | don't cause a failure? | 16:16 |
mriedem | some are whitelisted | 16:16 |
bknudson | it's sometimes whitelisted and other times not whitelisted? | 16:16 |
fungi | bknudson: mriedem: also we turned the error checking back to non-failing because it was broken for long enough that nova grew some new occasional errors in the interim, so when we tried to turn it back to enforcing it was dragging the gate throughput back down again | 16:17 |
*** prad_ has quit IRC | 16:18 | |
*** prad has joined #openstack-infra | 16:19 | |
*** branen has quit IRC | 16:19 | |
*** wenlock has joined #openstack-infra | 16:20 | |
bknudson | fungi: mriedem: the fix is in the queue already (at #2) so maybe it's not worth it to add the e-r check. | 16:20 |
fungi | unless it turns out not to actually fix it, but yeah probably worth waiting just a bit longer | 16:20 |
mriedem | sounds good to me | 16:22 |
*** thuc has joined #openstack-infra | 16:22 | |
*** SumitNaiksatam has quit IRC | 16:22 | |
mriedem | i didn't see a whitelist in tempest for that heat error, but maybe that's not checked in this case | 16:22 |
mriedem | guessing tempest whitelist.yaml is only checking in the other logs | 16:23 |
mriedem | derp, b/c that's what it keys off, double derp | 16:23 |
anteaya | bknudson: are you talking about? https://review.openstack.org/#/c/68135/ | 16:25 |
anteaya | I ask because ttx is waiting on it and I am babysitting it | 16:26 |
anteaya | any impediment to it merging? | 16:26 |
ttx | no | 16:26 |
bknudson | anteaya: yes, https://review.openstack.org/#/c/68135/ -- it caused 6 keystone changes to fail to merge. | 16:26 |
anteaya | how? | 16:27 |
ttx | bah. It failed | 16:27 |
bknudson | anteaya: gate-tempest-dsvm-full FAILURE in 57m 51s | 16:27 |
*** nati_ueno has quit IRC | 16:28 | |
anteaya | Looks like the node went offline during the build. Check the slave log for the details | 16:28 |
anteaya | bknudson: could you expand a bit? | 16:28 |
anteaya | fungi can you take a look at https://jenkins04.openstack.org/job/gate-tempest-dsvm-full/3936/console | 16:29 |
bknudson | anteaya: http://logs.openstack.org/75/64575/15/gate/gate-tempest-dsvm-full/26f0c3c/console.html -- "Logs have errors" -- FAILED | 16:29 |
anteaya | my initial scan says the node went offline | 16:29 |
ttx | anteaya: I'll cut i2 without it | 16:29 |
anteaya | bknudson: can you help me make the connection to how that caused 6 keystone changes to fail to merge? | 16:29 |
ttx | anteaya: so no more need for babysitting | 16:30 |
anteaya | the keystone changes were ahead of the heat patch | 16:30 |
anteaya | ttx okay | 16:30 |
*** gyee has joined #openstack-infra | 16:30 | |
bknudson | anteaya: that review didn't cause the keystone to fail -- the bug that the patch fixes caused the tests to fail | 16:30 |
anteaya | ah okay | 16:31 |
anteaya | now I understand | 16:31 |
bknudson | anteaya: sorry, should have been clearer | 16:31 |
anteaya | all the more reason for 68135 to merge | 16:31 |
anteaya | np | 16:31 |
*** morganfainberg is now known as morganfainberg|z | 16:31 | |
fungi | yeah, looks like something may have happened to that slave. i'll see if it's still around to check | 16:31 |
anteaya | I'm curious about the failure log that says the node went offline | 16:31 |
anteaya | thanks | 16:31 |
*** nicedice has joined #openstack-infra | 16:32 | |
*** rakhmerov1 has joined #openstack-infra | 16:33 | |
*** rakhmerov has quit IRC | 16:33 | |
*** coolsvap is now known as coolsvap_away | 16:33 | |
fungi | i caught it before nodepool managed to successfully delete it. the jenkins slave agent is definitely not running on it... looking around for any obvious explanation on the slave | 16:35 |
anteaya | good catch | 16:35 |
*** coolsvap_away is now known as coolsvap | 16:35 | |
fungi | gah, nodepool apparently *had* already initiated the nova delete call, it took effect while i was looking at the system logs | 16:36 |
anteaya | boo | 16:36 |
anteaya | anything useful left? | 16:36 |
*** aarongr_afk is now known as AaronGr | 16:36 | |
*** coolsvap is now known as coolsvap_away | 16:37 | |
fungi | the logs i managed to look through before the slave was ripped out from under me didn't have any evidence for why the agent was no longer running. it's possible the slave agent is stopped normally when the slave is deregistered in jenkins though, so that's not necessarily related | 16:37 |
anteaya | hmmm | 16:38 |
anteaya | the gearman worker for that slave wouldn't have anything useful, would it? | 16:39 |
fungi | huh? | 16:40 |
fungi | the gearman worker on the jenkins master you mean? | 16:41 |
anteaya | yes for that slave | 16:41 |
anteaya | trying to understand where the cease and desist message came from for that slave | 16:41 |
anteaya | since the slave is gone, trying to cast about for what else remains | 16:41 |
jeblair | anteaya: where did you find the link to that build? | 16:42 |
anteaya | the failing test for the zuul status page | 16:42 |
anteaya | I clicked it before the job was finished running | 16:42 |
fungi | jeblair: it was a failure which reported, so wasn't cancelled behind another failure | 16:42 |
fungi | anteaya: i'm not convinced it came from anywhere. the slave agent could have crashed, something might have gone awry on the jenkins master (which i'm checking logs on now), might have been a communication issue between them... | 16:43 |
*** morganfainberg|z is now known as morganfainberg | 16:43 | |
anteaya | ah | 16:44 |
anteaya | blast | 16:44 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Increase zuul window floor to 10 https://review.openstack.org/68691 | 16:45 |
fungi | Jan 23, 2014 4:25:17 PM hudson.node_monitors.AbstractDiskSpaceMonitor markNodeOfflineIfDiskspaceIsTooLow | 16:45 |
fungi | WARNING: Making devstack-precise-hpcloud-az2-1187454 offline temporarily due to the lack of disk space | 16:46 |
jeblair | fungi, mordred: ^ i think we actually discussed that as the default floor, i guess we missed that in the zuul patch | 16:46 |
*** amotoki is now known as amotoki_zzz | 16:46 | |
fungi | jeblair: okay, fair enough | 16:46 |
russellb | dims: what change was that on? | 16:46 |
jeblair | because it is pretty silly to have written this thing that massively parallelizes work and then not use it at least a little. :) | 16:47 |
russellb | i'm probably capable of figuring that out. | 16:47 |
dims | russellb, was on your review | 16:47 |
anteaya | fungi: lack of disk space | 16:47 |
fungi | anteaya: yes, on the slave | 16:47 |
fungi | anteaya: i'm going to sample similar slaves on that az to see how much space they normally have on their filesystems | 16:48 |
russellb | dims: ah yes .. :( | 16:48 |
russellb | dims: different error now it seems | 16:48 |
anteaya | fungi: k | 16:48 |
dims | russellb, yea :( | 16:48 |
jeblair | russellb: when you have a sec, i wanted to learn more about the tempest 4 procs -> 2 decision | 16:48 |
anteaya | I wonder how much disk space heat tests usually take to run | 16:49 |
russellb | jeblair: so ... while studying random failures a couple weeks ago, they seemed to be related to things taking too long in various places in the code, not terribly consistent | 16:49 |
russellb | jeblair: and then we saw that the node was pegged on CPU the entire time | 16:50 |
russellb | jeblair: so that's basically the sum of it | 16:50 |
jeblair | russellb: it seems like a pretty big thing -- runtime * 1.5 is bound to have an effect on throughput, and while the math is hard for my sick-brain, i think it would have to be responsible for a substantial number of failures to cause an overall throughput increase | 16:50 |
*** samalba has quit IRC | 16:50 | |
jeblair | russellb: which may be worthwhile, of course | 16:50 |
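A crude way to put numbers on jeblair's trade-off, assuming per-slot throughput is proportional to pass rate divided by runtime, and ignoring the gate resets a failure triggers for everything queued behind it (which makes failures more expensive than this model suggests). Both helpers are hypothetical:

```python
import math

def breakeven_pass_rate(old_pass_rate: float, runtime_factor: float) -> float:
    """Pass rate needed after jobs get `runtime_factor` times slower to keep
    pass_rate / runtime, a crude throughput proxy, unchanged."""
    return old_pass_rate * runtime_factor

def throughput_ratio(old_pass_rate: float, new_pass_rate: float,
                     runtime_factor: float) -> float:
    """New throughput relative to old under the same proxy (>1.0 is better)."""
    return (new_pass_rate / runtime_factor) / old_pass_rate

# Dropping from 4 to 2 test runners made runs roughly 1.5x longer.
print(round(breakeven_pass_rate(0.6, 1.5), 2))              # 0.9
print(math.isclose(throughput_ratio(0.6, 0.9, 1.5), 1.0))   # True
```

Under this model, if more than about two thirds of runs already passed with 4 runners, no pass rate can make up for a 1.5x slowdown on raw throughput alone; the case for fewer runners then rests on fewer resets and less debugging time.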
russellb | it's about reliability though, not throughput | 16:50 |
russellb | random failures are also incredibly expensive in dev time for analysis and such | 16:50 |
jeblair | russellb: so we ran with 4 runners for quite some time, and during some periods things seemed _very_ reliable | 16:51 |
jeblair | russellb: so was there something that changed? | 16:51 |
russellb | has anything about what test nodes are being used changed? | 16:51 |
russellb | in nova, we made parts of nova a lot faster (able to do work concurrently a lot better) and others not | 16:52 |
russellb | primarily nova-compute can fly while nova-network is stuck being very serial | 16:52 |
russellb | so that was a lot of the problems | 16:52 |
anteaya | jeblair can you kick user polfilm from #openstack-neutron? person is a spammer | 16:52 |
jeblair | russellb: absolutely -- we use rax performance nodes now which are slightly faster than hpcloud ones | 16:52 |
russellb | dan smith and I spent a week ripping nova-network apart to make that better, patches all under review now | 16:52 |
anteaya | jeblair: you are the only one I know with ops for -neutron | 16:52 |
*** pballand has quit IRC | 16:52 | |
russellb | i just feel like if we're pegging the CPU the entire time, we're bound to have random failures again | 16:53 |
fungi | yep, '/msg chanserv access #openstack-neutron list' says only jeblair | 16:53 |
russellb | i really should have written down more of my analysis of specific failures, but i didn't | 16:53 |
russellb | i'd really like to see some per-process CPU consumption info | 16:54 |
fungi | so my best guess is that we have some tests leaking a few gigs of data outside of /opt (probably in /home/jenkins) http://paste.openstack.org/show/61768/ | 16:54 |
jeblair | anteaya: done and you are op | 16:54 |
russellb | because i suspect with concurrency=4, we're running at least 4 instances of qemu (and sometimes more) at once, and that just kills it | 16:54 |
anteaya | thank you | 16:54 |
jeblair | russellb: ok, yeah, that was my next question -- any chance of our being able to increase the runner count in the future? | 16:55 |
russellb | i sure hope so | 16:55 |
russellb | i'd feel a lot better trying again after we land this patch series: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/nova-network-objects,n,z | 16:55 |
russellb | that's going to improve nova-network's ability to work concurrently *a lot* | 16:56 |
russellb | dan smith did some testing and it made a *huge* difference | 16:56 |
jeblair | russellb: ok; there's a sysstat file in the logs stored with the jobs; did you use that? | 16:56 |
russellb | yeah, used that | 16:56 |
jeblair | ok, but it lacks the per-process info you need; what could help? | 16:56 |
russellb | i don't think we need anything this instant ... maybe when we're ready to try again | 16:57 |
fungi | given the constrained nature of the root filesystem on these slaves, i wonder if it would make sense to move ~jenkins into /opt (either via symlink or just in /etc/passwd) | 16:57 |
russellb | the above patch series gave a 20% speedup in test runtime, and 33% decrease in CPU consumption by nova-network | 16:57 |
fungi | maybe /var/log could be a culprit too | 16:57 |
fungi | or /tmp | 16:57 |
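A sketch for checking fungi's theory on a live slave, assuming Python 3 is available there: report filesystems that are short on free space, and total up the directories suspected of filling the root filesystem (`/home/jenkins`, `/var/log`, `/tmp`). The 1 GiB threshold is a placeholder, not the value Jenkins' disk space monitor uses, and both helper names are hypothetical:

```python
import os
import shutil

def low_space(paths, min_free_gib=1.0):
    """Return {path: free GiB} for each path whose filesystem has less than
    `min_free_gib` free. The default threshold is a placeholder."""
    report = {}
    for path in paths:
        free_gib = shutil.disk_usage(path).free / 2**30
        if free_gib < min_free_gib:
            report[path] = round(free_gib, 2)
    return report

def tree_size(root):
    """Total bytes of regular files under `root`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total

if __name__ == "__main__":
    print(low_space(["/", "/opt"]))
    for suspect in ["/home/jenkins", "/var/log", "/tmp"]:
        if os.path.isdir(suspect):
            print(suspect, tree_size(suspect) // 2**20, "MiB")
```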
russellb | so i'd like to land that, then we can try again IMO | 16:57 |
anteaya | fungi: would that change the disk space issue? | 16:58 |
*** krotscheck has quit IRC | 16:58 | |
jeblair | fungi: reasonable. that's actually the biggest objection i have to the current practice of limited root spaces -- | 16:58 |
russellb | test being spawning and deleting a ton of instances | 16:58 |
anteaya | ah, so if it is run as root then that would be the limitation | 16:58 |
jeblair | fungi: it's _extremely_ difficult to put /var/log onto another partition without lvm (and of course, they don't have it on lvm) | 16:58 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 16:58 | |
* russellb does the multiplexed conversation dance | 16:58 | |
fungi | jeblair: right. having to close out those file descriptors without a reboot is nearly impossible | 16:59 |
*** samalba has joined #openstack-infra | 16:59 | |
jeblair | fungi: yep. anyway, yeah, those would be good ideas for the slaves i think. | 16:59 |
*** SumitNaiksatam has joined #openstack-infra | 16:59 | |
*** coolsvap_away is now known as coolsvap | 16:59 | |
fungi | i suspect the increased runtime has allowed us to bump up closer to filling the root filesystem on some runs | 17:00 |
jeblair | russellb: okay, cool. just wondering if there's anything we should be doing in the mean time | 17:00 |
*** mrodden has joined #openstack-infra | 17:00 | |
*** mrmartin has quit IRC | 17:00 | |
jeblair | russellb: i believe we're at the cpu sweet-spot in terms of node size | 17:00 |
jeblair | russellb: so i don't think bigger nodes would help this | 17:00 |
russellb | yeah, i'm hoping some nova performance improvements will help enough | 17:01 |
russellb | but we may still see some penalties due to just qemu eating the whole node | 17:01 |
russellb | emulated CPUs aren't fast | 17:01 |
russellb | :) | 17:01 |
sdague | russellb: so I've got load average, and a couple other stats in the sysstat data now as well | 17:01 |
russellb | cool | 17:01 |
sdague | that merged yesterday I think | 17:01 |
sdague | if that helps figure out bottlenecks | 17:01 |
*** fifieldt has joined #openstack-infra | 17:01 | |
jeblair | russellb: oh, actually we have 4 vcpus on hp and 8 vcpus on rax | 17:02 |
*** boris-42 has joined #openstack-infra | 17:02 | |
sdague | jeblair: any chance we could get a few 8 ways to experiment with, and see if they help? | 17:02 |
sdague | jeblair: we have 8 on rax? | 17:02 |
sdague | since when? | 17:02 |
fungi | performance favor | 17:02 |
fungi | flavor | 17:02 |
russellb | might be worth a periodic ps that shows CPU per process | 17:02 |
fungi | sdague: since a few weeks | 17:02 |
russellb | well dang | 17:02 |
jeblair | russellb: did you by any chance correlate failures to provider? | 17:02 |
russellb | no ... | 17:03 |
sdague | fungi: were performance nodes in gate? | 17:03 |
russellb | i thought they were all 4 vcpus | 17:03 |
fungi | we do have provider metadata in logstash now | 17:03 |
jeblair | sdague: since LCA | 17:03 |
fungi | sdague: yes | 17:03 |
russellb | i'd be happy to try 4 on the 8 CPU nodes now | 17:03 |
jeblair | so, uh, 1.5 weeks-ish | 17:03 |
russellb | ah | 17:03 |
sdague | oh... yeh, so we'd have auto selected to 8way on the 8 CPU nodes | 17:03 |
russellb | my analysis was just before that | 17:03 |
* fungi has completely lost his sense of time | 17:03 | |
sdague | ok | 17:03 |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 17:03 | |
sdague | fungi: no kidding | 17:03 |
jeblair | but in much smaller numbers than hpcloud | 17:03 |
jeblair | at least until mordred finishes spinning up the new jenkinses, then we'll be getting close to comparable | 17:03 |
russellb | basically instead of concurrency=2, i want concurrency = 1/2 vCPU or something | 17:04 |
russellb | or that was my intention for now | 17:04 |
sdague | so, honestly, I think we should figure out the metrics we think would tell us if we are overloaded or not, and get them in | 17:04 |
russellb | could possibly try 3/4 vCPU | 17:04 |
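A minimal sketch of deriving the worker count from the node size instead of hard-coding it, assuming the ratio is the only knob (0.5 is russellb's "1/2 vCPU", 0.75 the "3/4 vCPU" alternative). `tempest_concurrency` is a hypothetical helper, not an existing tempest or devstack-gate option:

```python
import os

def tempest_concurrency(vcpus=None, ratio=0.5):
    """Number of test workers as a fraction of the node's vCPUs (at least 1)."""
    if vcpus is None:
        vcpus = os.cpu_count() or 1
    return max(1, int(vcpus * ratio))

for cpus in (4, 8):   # hpcloud and rax performance node sizes mentioned here
    print(cpus, tempest_concurrency(cpus), tempest_concurrency(cpus, 0.75))
```

With 4 vCPUs on hpcloud and 8 on rax, a ratio of 0.5 gives 2 and 4 workers, and 0.75 gives 3 and 6.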
sdague | then we can try playing with concurency counts | 17:04 |
russellb | sdague: i think the load average was enough | 17:04 |
sdague | ok | 17:04 |
russellb | to see that we were too overloaded | 17:04 |
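When reading the load average from the sysstat data, it helps to normalise it by the vCPU count, since a load of 4 means something different on a 4-vCPU hpcloud node than on an 8-vCPU rax node. A minimal sketch with a hypothetical helper; note that Linux load averages also count tasks blocked in uninterruptible I/O, so a high value is not purely CPU pressure:

```python
import os

def load_per_cpu(load1=None, cpus=None):
    """1-minute load average divided by CPU count; values that stay above
    about 1.0 mean more runnable (or I/O-blocked) tasks than CPUs."""
    if load1 is None:
        load1 = os.getloadavg()[0]
    if cpus is None:
        cpus = os.cpu_count() or 1
    return load1 / cpus

print(load_per_cpu(8.0, 4))   # 2.0: twice as much work as CPUs
print(load_per_cpu(2.0, 8))   # 0.25
```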
russellb | but it would be nice to mix in some periodic full process listings in there | 17:04 |
sdague | there are some io and context switch numbers in there as well | 17:04 |
russellb | so that when it's pegged we can see exactly what is the culprit | 17:04 |
sdague | russellb: sure, what about periodic time dumps of cpu time used by processes we care about | 17:05 |
sdague | that we should be able to pull out of /proc not too badly | 17:05 |
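A sketch of what sdague suggests, pulling cumulative CPU time per process straight out of `/proc`, assuming a Linux node. Fields 14 and 15 of `/proc/<pid>/stat` are user and system time in clock ticks (see proc(5)); the helper names and the process names in the example are illustrative:

```python
import os

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second used by /proc

def parse_stat(line):
    """Return (pid, comm, cpu_seconds) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split on the *last' ')' and the first ' (' before it.
    head, _, rest = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = rest.split()
    # fields[0] is field 3 (state), so utime/stime (fields 14/15) are 11/12.
    utime, stime = int(fields[11]), int(fields[12])
    return int(pid_str), comm, (utime + stime) / CLK_TCK

def cpu_seconds_by_name(names):
    """Sum CPU seconds for every running process whose comm is in `names`."""
    totals = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                _pid, comm, secs = parse_stat(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if comm in names:
            totals[comm] = totals.get(comm, 0.0) + secs
    return totals

if __name__ == "__main__":
    print(cpu_seconds_by_name({"qemu-system-x86", "mysqld"}))
```

Processes started as `python some-script` show up with the interpreter's name as `comm`, and `comm` is truncated to 15 characters, so matching on `/proc/<pid>/cmdline` may be more reliable for the OpenStack services.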
*** BobBall is now known as BobBallAway | 17:05 | |
russellb | sdague: sure ... though that list might be a pain to make | 17:06 |
jeblair | russellb, sdague: one more question to help flesh out my understanding -- what about increasing timeouts in tempest? | 17:06 |
*** marun has joined #openstack-infra | 17:06 | |
russellb | we'd have to increase tempest timeouts, as well as some timeouts in nova i think ... | 17:06 |
sdague | russellb: I think he was thinking per test timeouts | 17:06 |
russellb | yeah | 17:06 |
*** yamahata has quit IRC | 17:07 | |
jeblair | russellb, sdague: to be fair, we see node deletes take _hours_ in reality on real public clouds, which is why there's so much retry code in nodepool | 17:07 |
sdague | yeh | 17:07 |
sdague | jeblair: these are cirros guests though | 17:07 |
russellb | true ... but not on real virt, so it's slowwww | 17:07 |
russellb | and if we do as many of them as we have cpus ... we're going to eat the thing | 17:07 |
russellb | that on top of performance mismatches we have in nova (compute vs network) made things fall over | 17:08 |
russellb | compute is all like Y U TAKE SO LONG NETWORK | 17:08 |
*** gothicmindfood has quit IRC | 17:08 | |
jeblair | okay, so short answer, that's probably just a bad idea and isn't really sustainable | 17:08 |
russellb | etc. | 17:08 |
sdague | so qemu is only slow when it hits priv instructions right? | 17:08 |
russellb | i think so, yeah ... | 17:08 |
*** DinaBelova is now known as DinaBelova_ | 17:08 | |
russellb | sdague: maybe, i should just stop speculating and we should get the per process info | 17:09 |
sdague | yeh | 17:09 |
*** pblaho has quit IRC | 17:09 | |
sdague | I actually think we're more bottlenecked on the python from what I've seen looking at things in the past | 17:09 |
sdague | but again, we should figure out what to count | 17:09 |
sdague | so we have real numbers | 17:09 |
russellb | really just a "ps axu" or some such | 17:09 |
russellb | every ... 30 seconds or minute or something, i don't know | 17:10 |
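Periodic listings like the one russellb suggests are most useful when consecutive samples are diffed: cumulative CPU counters only show what was busy during an interval once the previous sample is subtracted. A small sketch, assuming each snapshot is a hypothetical `{pid: (comm, cpu_seconds)}` dict produced by whatever per-process reader is used; the helper names are illustrative.

```python
import time


def top_consumers(before, after, n=5):
    """Given two {pid: (comm, cpu_seconds)} snapshots, return the n processes
    that used the most CPU between them as (pid, comm, seconds) tuples."""
    deltas = []
    for pid, (comm, seconds) in after.items():
        # A pid missing from `before` started during the interval, so all of
        # its CPU time counts. (pid reuse within one interval is ignored.)
        previous = before.get(pid, (comm, 0.0))[1]
        deltas.append((pid, comm, seconds - previous))
    deltas.sort(key=lambda d: d[2], reverse=True)
    return deltas[:n]


def sample_forever(take_snapshot, interval=30.0, n=5):
    """Print the top CPU consumers once every `interval` seconds."""
    before = take_snapshot()
    while True:
        time.sleep(interval)
        after = take_snapshot()
        stamp = time.strftime("%H:%M:%S")
        for pid, comm, seconds in top_consumers(before, after, n):
            print(f"{stamp} {pid:>7} {seconds:8.2f}s {comm}")
        before = after
```

Reporting deltas rather than raw totals means a long-lived but idle service does not crowd out whatever actually pegged the CPU in the last 30 seconds.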
russellb | sdague: yeah, we need to know | 17:10 |
*** starmer has joined #openstack-infra | 17:10 | |
russellb | sdague: so let's get something in, and then throw up a draft that changes concurrency back | 17:10 |
sdague | sure | 17:11 |
*** markwash has joined #openstack-infra | 17:11 | |
sdague | ok, honestly I probably can't look at that until tomorrow | 17:11 |
*** dstanek has quit IRC | 17:11 | |
sdague | but I will slice off a bit of time to do it then | 17:11 |
russellb | sdague: want to point me in the right direction? | 17:11 |
russellb | where is the magic | 17:11 |
*** dstanek has joined #openstack-infra | 17:11 | |
sdague | russellb: so honestly, we should just add something to devstack that's like how systat was added | 17:13 |
russellb | yeah | 17:13 |
russellb | wasn't sure if systat was in devstack or some other place | 17:13 |
russellb | i'll look | 17:13 |
*** dprince has joined #openstack-infra | 17:13 | |
sdague | I'm not familiar enough with ps flags to get something like seconds of run time in a way we like, if we can, that would be cool | 17:14 |
sdague | otherwise I was thinking of just doing something custom and counting out of /proc/*/sched | 17:14 |
sdague | once upon a time I tried to use systemtap for this, but it kind of gets mad when you do that at a system level and you loop the process ids, which we do multiple times in tempest because of all the rootwrap forks | 17:15 |
sdague | at least on ubuntu | 17:15 |
mordred | russellb, sdague: it's almost like there should be a sensible way to track system performance and correlate it to openstack operation | 17:15 |
russellb | lol. | 17:16 |
russellb | zomg | 17:16 |
sdague | mordred: so honestly, I think the answer is actually systemtap | 17:16 |
sdague | but... above | 17:16 |
sdague | and I'm not skilled enough in it to figure it out | 17:16 |
russellb | ps can give %CPU at a given point at least right? | 17:16 |
openstackgerrit | A change was merged to openstack-infra/config: Increase zuul window floor to 10 https://review.openstack.org/68691 | 17:17 |
*** pballand has joined #openstack-infra | 17:17 | |
*** krtaylor has joined #openstack-infra | 17:18 | |
*** mancdaz is now known as mancdaz_away | 17:18 | |
russellb | or pidstat if sysstat may be enough ... | 17:21 |
russellb | s/if/from/ | 17:22 |
*** dkliban is now known as dkliban_afk | 17:23 | |
*** odyi has quit IRC | 17:29 | |
*** jooools has quit IRC | 17:30 | |
*** afazekas has joined #openstack-infra | 17:30 | |
*** yassine has quit IRC | 17:30 | |
*** bauzas has quit IRC | 17:31 | |
*** afazekas has quit IRC | 17:36 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: make scm test as the example https://review.openstack.org/65186 | 17:38 |
*** changbl has joined #openstack-infra | 17:38 | |
*** dpyzhov has quit IRC | 17:40 | |
jeblair | kraman: looking at that patch; it seems to broadly cover the bases and seems generally agreeable. | 17:41 |
*** dpyzhov has joined #openstack-infra | 17:41 | |
kraman1 | jeblair: is the event i used the correct one? | 17:41 |
kraman1 | basically i will be posting that event when an external git repo is updated | 17:42 |
kraman1 | so that zuul can kick off the flo | 17:42 |
kraman1 | flow ** | 17:42 |
kraman1 | the issue i had with that event is that I may not have the rev #s that are required as arguments | 17:42 |
*** jpich has quit IRC | 17:43 | |
jeblair | kraman1: so that's an event that is emitted by gerrit; since you aren't using a gerrit trigger, you could really do anything -- but as you've probably seen, if you want to use the EventFilter class more or less as it already exists, it does make some assumptions there... | 17:43 |
jeblair | kraman1: you could perhaps clean up that abstraction a little bit, so that there are GerritEventFilters and MessagingEventFilters... | 17:44 |
kraman1 | yah makes sense, i can make that update | 17:44 |
jeblair | kraman1: or you could go with the approach you started on there, which i'd describe as sort of emulating gerrit-like events | 17:44 |
jeblair | and reusing them... | 17:44 |
kraman1 | there were also a few places where the gerrit trigger is assumed for calls. eg to get git web url etc. those might need to be abstracted as well | 17:45 |
russellb | sdague: https://review.openstack.org/68702 .. pidstat | 17:45 |
*** max_lobur is now known as max_lobur_afk | 17:45 | |
jeblair | kraman1: regardless -- the ref-updated event (whether you use it, or do something else like it) is probably the best event to use (or model to follow) | 17:45 |
jeblair | kraman1: it is emitted when ever any ref (including a branch or a tag) in gerrit is updated | 17:45 |
kraman1 | ok, thanks. i will work on the patch some more and ping you again when i have an update | 17:45 |
*** dpyzhov has quit IRC | 17:45 | |
kraman1 | or do you prefer me to push a patch and then talk on the ML? | 17:45 |
jeblair | kraman1: it's essentially the result of a merge, or even a 'git push' to a branch... | 17:46 |
jeblair | kraman1: so you should actually be able to produce at least the new_rev | 17:46 |
jeblair | kraman1: if a branch is updated, it would be the git sha of the tip of the branch after the update | 17:46 |
jeblair | kraman1: (the old_rev would be the git sha of the branch right before the update; not sure if that's available to you) | 17:46 |
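A sketch of the "emulate gerrit-like events" option jeblair mentions. Gerrit's stream-events `ref-updated` payload carries a `refUpdate` object with `project`, `refName`, `oldRev` and `newRev`; the helper below is hypothetical (not a zuul API) and simply allows either revision to be missing, as it may be when an external repo only reports "branch updated".

```python
def emulate_ref_updated(project, ref_name, new_rev=None, old_rev=None):
    """Build a dict shaped like Gerrit's ref-updated stream event.

    Hypothetical helper: when new_rev is None, whoever consumes the event
    has to resolve the branch tip later, and may therefore build a newer
    commit than the push that triggered it.
    """
    return {
        "type": "ref-updated",
        "refUpdate": {
            "project": project,
            "refName": ref_name,
            "oldRev": old_rev,
            "newRev": new_rev,
        },
    }


def has_exact_revision(event):
    """True if the event pins the exact commit to build (newRev present)."""
    return event["refUpdate"].get("newRev") is not None
```

Keeping the missing revision as an explicit `None` makes the trade-off jeblair describes next visible to any code that later decides what to build.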
kraman1 | jeblair: it depends on the remote repo. i might just get a ping saying "branch updated" … without any rev info attached | 17:47 |
kraman1 | so i didnt want to make that assumption | 17:47 |
kraman1 | and i would prefer not to have external code to get that rev # in solum … trying to isolate all git ops in zuul side | 17:47 |
jeblair | kraman1: ok. well, it sounds like you can probably imagine the implications there -- if it takes a while for a job to run, it may end up using a newer revision than what it was initially triggered to run | 17:48 |
jeblair | kraman1: (if the revision data is missing) | 17:48 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Make jenkins homedir location configurable https://review.openstack.org/68705 | 17:48 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Put ~jenkins in /opt on nodepool slaves https://review.openstack.org/68706 | 17:48 |
kraman1 | jeblair: yes, i see. but in the zuul case that is probably the correct thing to do | 17:49 |
kraman1 | will need to think about it a bit more tho | 17:49 |
jeblair | kraman1: in the solum case? | 17:49 |
*** johnthetubaguy has quit IRC | 17:49 | |
jeblair | kraman1: i think it would be fine to start pushing patches and pinging the ml. | 17:50 |
kraman1 | jeblair: for solum, we get a call from a remote git repo (like github etc) saying a push was made. and we use zuul to build the latest code form the repo | 17:50 |
kraman1 | from* | 17:50 |
kraman1 | jeblair: ok, will start formatting my changes into a patch | 17:51 |
jeblair | or patches :) | 17:51 |
kraman1 | :) | 17:51 |
kraman1 | are there any guidelines about writing tests? | 17:51 |
openstackgerrit | Dolph Mathews proposed a change to openstack-infra/elastic-recheck: eliminate 14 false positives for bug 1268732 https://review.openstack.org/68707 | 17:51 |
*** markmcclain has joined #openstack-infra | 17:51 | |
jeblair | kraman1: (a) please do (b) i'm more into functional tests than unit tests | 17:51 |
fungi | the more i think back over the other recent nodepool nodes which have spontaneously offlined, the more it dawns on me that it's most often between tests finishing and logs getting uploaded. so almost certainly /home/jenkins filling up the / filesystem | 17:52 |
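One cheap way to test fungi's theory is to record how full the relevant filesystems are just before and after log collection. A minimal sketch using only the standard library; the example paths and the 90% threshold are assumptions, not values taken from the job configuration.

```python
import shutil


def percent_used(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_paths(paths, threshold=90.0):
    """Return [(path, percent)] for every path whose filesystem is over threshold."""
    report = []
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold:
            report.append((path, pct))
    return report


if __name__ == "__main__":
    # Example paths only; adjust to whatever the node actually mounts.
    for path, pct in check_paths(["/", "/home", "/opt"], threshold=90.0):
        print(f"{path}: {pct:.1f}% used")
```

Because `disk_usage` reports per filesystem, `/` and `/home/jenkins` give the same number unless they are on separate mounts, which is exactly the situation being discussed.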
jeblair | kraman1: you'll see that test_scheduler basically exercises a real running zuul with faked-out gerrit and gearman workers | 17:52 |
jeblair | fungi: sounds reasonable | 17:52 |
anteaya | that was the case with the heat patch, tests had finished successfully | 17:52 |
kraman1 | jeblair: yep, i will add something similar in that case | 17:52 |
anteaya | shardy showed me this bug: https://bugs.launchpad.net/openstack-ci/+bug/1268732 | 17:53 |
kraman1 | jeblair: thanks for the guidance. keep an eye out for patches :) | 17:53 |
anteaya | which I think is the same bug | 17:53 |
*** yamahata has joined #openstack-infra | 17:53 | |
fungi | anteaya: agreed. i think it is | 17:54 |
SpamapS | clarkb: https://review.openstack.org/#/c/68135/ still queued (and at the very bottom of zuul.openstack.org). Is that just "because things are broken"? | 17:56 |
jeblair | fungi: devstack-gate needs an update before https://review.openstack.org/#/c/68706/ | 17:57 |
*** jerryz has joined #openstack-infra | 17:58 | |
jeblair | oh wait it doesn't | 17:58 |
jeblair | fungi: it's just the -dev script that refs /home | 17:58 |
*** markmcclain has quit IRC | 17:58 | |
jeblair | SpamapS: zuul has learned to be less optimistic and now only launches jobs for changes near the top of the queue | 17:59 |
jeblair | SpamapS: in a sliding window based on recent success/failure rates (clarkb just wrote that feature) | 17:59 |
*** rakhmerov1 has quit IRC | 18:00 | |
*** derekh has quit IRC | 18:00 | |
jeblair | SpamapS: not sliding; growing/shrinking. obviously the window is anchored at the head of the queue. :) | 18:00 |
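The behaviour jeblair describes, a window anchored at the head of the queue that grows and shrinks with recent results, can be sketched as below. The starting size of 20 matches what clarkb sees after a layout reload later in this log; the floor, the additive increase and the halving on failure (an AIMD scheme like TCP's) are assumptions for illustration, not necessarily zuul's exact policy, and the class name is hypothetical.

```python
class ActionWindow:
    """Sketch of a window anchored at the head of a gate queue.

    Assumed policy: grow by `increase` after each success, divide by
    `decrease_factor` after each failure, never drop below `floor`.
    """

    def __init__(self, size=20, floor=3, increase=1, decrease_factor=2):
        self.size = size
        self.floor = floor
        self.increase = increase
        self.decrease_factor = decrease_factor

    def on_success(self):
        self.size += self.increase

    def on_failure(self):
        self.size = max(self.floor, self.size // self.decrease_factor)

    def active(self, queue):
        """Items close enough to the head of the queue to get jobs launched."""
        return queue[: self.size]
```

Shrinking quickly on failure limits how many speculative jobs get thrown away when a change near the head fails, while growing slowly on success avoids over-committing test nodes.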
*** nati_ueno has joined #openstack-infra | 18:00 | |
jeblair | SpamapS: UI indication of this is yet to be written; we'll get there soon. we needed the feature quickly though. | 18:01 |
fungi | jeblair: i think i need to amend the prep script change to also leave a symlink at /home/jenkins until we can fix pypi-mirror to resolve ~jenkins (its config has /home/jenkins hard-coded) and things like that | 18:02 |
jeblair | fungi: oh, right, for the requirements change jobs? | 18:02 |
fungi | and also i want to link the bug anteaya mentioned in the commit message, for thoroughness | 18:02 |
*** SumitNaiksatam has quit IRC | 18:02 | |
fungi | yep | 18:02 |
*** MarkAtwood has quit IRC | 18:03 | |
fungi | it's a to-do. also, jenkins masters think the workspace is in /home/jenkins... where do we fix that during slave registration (is it somewhere separate)? | 18:03 |
*** MarkAtwood has joined #openstack-infra | 18:03 | |
*** dkliban_afk is now known as dkliban | 18:04 | |
jeblair | fungi: it might be a parameter nodepool can pass via the jenkins library... | 18:05 |
jeblair | fungi: i'm wondering if it might be not too bad to just have the nodepool setup script mv+ln -s it? | 18:05 |
jeblair | fungi: and leave it configured as /home. just throwing it out there. :) | 18:05 |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 18:06 | |
*** DinaBelova_ is now known as DinaBelova | 18:07 | |
*** praneshp has joined #openstack-infra | 18:08 | |
fungi | trying to remember... i ran across something not long ago which generously replaced any symlink it found in the parentage of a managed path with a directory... don't recall whether that was puppet or jenkins | 18:08 |
fungi | oh, i know, it was the jenkins publisher | 18:08 |
fungi | so not relevant | 18:08 |
*** MarkAtwood has quit IRC | 18:08 | |
jog0 | pleia2: my calendar has something about a HP/Intel SF hackathon this weekend, do you know anything about that | 18:08 |
*** MarkAtwood has joined #openstack-infra | 18:09 | |
fungi | jeblair: in jenkins::jenkinsuser i could wrap an if $home!=/home/jenkins block around a symlink file object | 18:09 |
jeblair | mordred: what's the latest on the new jenkinses? | 18:09 |
* fungi thinks that might be safer | 18:10 | |
clarkb | good morning. I am having an extremely slow start today | 18:10 |
fungi | clarkb: how tcp of you | 18:10 |
*** harlowja_away is now known as harlowja | 18:10 | |
jeblair | fungi: wfm. i don't have a strong feeling about it other than my suggestion was prompted by the fact that it seemed like we might be rounding a bend in the rabbit-hole to find more rabbit-hole. :) | 18:10 |
clarkb | spamaps it needs manual promotion you can ask for it though it isnt a gate issue is it? | 18:11 |
SpamapS | jeblair: ahh. Thanks. I had pinged clarkb about it last night and he asked me to ping him back today if it was still queued | 18:11 |
jeblair | clarkb: good morning i have a brain dump ready for you regarding the window. let me know when you are ready to receive. | 18:11 |
*** SumitNaiksatam has joined #openstack-infra | 18:12 | |
*** luqas has quit IRC | 18:12 | |
SpamapS | clarkb: no, it is a "every CD heat user is screwed" issue. We can't test that issue in the gate because it requires spinning up a VM so it is only in our experimental checks. | 18:12 |
*** MarkAtwood has quit IRC | 18:12 | |
SpamapS | otherwise we'd have found it. :-P | 18:12 |
mordred | jeblair: 5 is up and running - I'm about to finish setting up 6 and 7 | 18:12 |
SpamapS | clarkb: I don't want to pull focus off the gate blockers though. So if those are still landing.. by all means we can wait more. | 18:13 |
anteaya | mordred \o/ | 18:14 |
jeblair | mordred: the css looks weird on 5; maybe puppet needs to run again? | 18:14 |
*** hashar has joined #openstack-infra | 18:15 | |
*** starmer has quit IRC | 18:16 | |
*** krotscheck has joined #openstack-infra | 18:16 | |
jeblair | SpamapS, clarkb: we don't want to be the gatekeepers for the project, so we have a pretty limited set of changes we'll promote: gate fixes, security issues, legal issues. I'm not sure we should add CD issues to that list, as my understanding is that most CD systems are expected to have a mechanism to deal with this kind of temporary breakage, yeah? | 18:16 |
SpamapS | jeblair: indeed, we're dealing. | 18:17 |
SpamapS | and I was not asking for promotion, only looking for insight into what's going on. | 18:17 |
mordred | jeblair: yeah. it's entirely possible - the "run puppet, delete user, install deb, run puppet, fix username/chown dir" process was not perfect | 18:18 |
SpamapS | clarkb: ah I just realized, btw, that it actually did get a run last night, but failed and was reverified (bug 1268732 fyi) .. anyway thanks for the insight | 18:23 |
SpamapS | jeblair: ty as well for explaining. :) | 18:23 |
*** coolsvap has quit IRC | 18:23 | |
*** coolsvap has joined #openstack-infra | 18:24 | |
*** gyee has quit IRC | 18:27 | |
*** praneshp has quit IRC | 18:27 | |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Put ~jenkins in /opt on nodepool slaves https://review.openstack.org/68706 | 18:27 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Make jenkins homedir location configurable https://review.openstack.org/68705 | 18:27 |
fungi | SpamapS: ^ yes, i think tempest runs for heat changes are possibly generating an unusually large volume of logs | 18:28 |
clarkb | jeblair: ready | 18:29 |
clarkb | fungi: there was another test that ran out of /var/cache/nova room | 18:29 |
fungi | clarkb: ick | 18:29 |
fungi | clarkb: perhaps /var/cache should also end up in /opt? | 18:30 |
fungi | or maybe we make an /opt/cache and have devstack-gate tell devstack to tell nova to use that? | 18:30 |
clarkb | fungi: perhaps. or maybe we should remount a bunch of paths? /home /var /opt and so on | 18:30 |
fungi | clarkb: remounting those may be hard without a reboot | 18:31 |
*** jcooley_ has quit IRC | 18:31 | |
*** praneshp has joined #openstack-infra | 18:31 | |
fungi | well, remounting /home is probably doable before the jenkins agent is running | 18:32 |
*** rakhmerov has joined #openstack-infra | 18:32 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 18:33 | |
* fungi ponders that | 18:33 | |
*** DinaBelova is now known as DinaBelova_ | 18:33 | |
*** andreaf has quit IRC | 18:34 | |
*** branen has joined #openstack-infra | 18:34 | |
*** mrmartin has joined #openstack-infra | 18:34 | |
*** shardy is now known as shardy_afk | 18:35 | |
openstackgerrit | Donald Stufft proposed a change to openstack-infra/config: Release python-barbicanclient via Zuul https://review.openstack.org/68719 | 18:35 |
*** SumitNaiksatam_ has joined #openstack-infra | 18:37 | |
mordred | jeblair: re-ran puppet - re-started jenkins | 18:37 |
jeblair | mordred: lovely; probably needs jjb run manually | 18:38 |
*** smurugesan has joined #openstack-infra | 18:38 | |
zaro | fungi: wiki says we are using puppet 2.6.x is that accurate? | 18:38 |
fungi | zaro: 2.7.x | 18:38 |
fungi | or is it 2.9.x... checking now | 18:39 |
jeblair | mordred: also, i can't log in; hrm, i think there may still be something wrong... did i see clarkb say that the node list was dirty a while ago? perhaps i pointed you at the wrong files | 18:39 |
fungi | zaro: 2.7.22 currently | 18:39 |
zaro | fungi: ok. i'll update this page https://wiki.openstack.org/wiki/Puppet-openstack | 18:40 |
fungi | zaro: thanks | 18:40 |
*** SumitNaiksatam has quit IRC | 18:40 | |
*** SumitNaiksatam_ is now known as SumitNaiksatam | 18:40 | |
jeblair | zaro: i don't think that page has anything to do with us | 18:40 |
fungi | zaro: yeah, i just looked at it | 18:40 |
fungi | zaro: that's the openstack puppet community documentation | 18:41 |
fungi | not the openstack infra puppet documentation | 18:41 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 18:41 | |
*** fallenpegasus has joined #openstack-infra | 18:41 | |
jeblair | clarkb: so the optimization you're making is to not launch jobs prematurely | 18:42 |
jeblair | clarkb: i think the key to addressing the issues you brought up is to make the feature in zuul stick closer to that goal | 18:42 |
jeblair | clarkb: so rather than iterating over the window, iterate over the whole list | 18:42 |
jeblair | clarkb: but then when examining each change, just bypass the job launch (and i suppose the merge operation as well) if the current change is outside the window | 18:43 |
jeblair | clarkb: but otherwise, continue to exercise everything else in the queue processor | 18:43 |
clarkb | jeblair: so push it down into _processOneItem() | 18:43 |
jeblair | clarkb: yep. this will not only cancel jobs and other cleanup, but should fix the subway map lines, and have future benefits like actually removing changes from the deep queue for things like non-mergeability | 18:44 |
*** jcooley_ has joined #openstack-infra | 18:45 | |
jeblair | (as in code-review-2 non-mergability) | 18:45 |
jeblair | once we get that fixed | 18:45 |
clarkb | yup | 18:45 |
clarkb | I will start in that direction then | 18:45 |
jeblair | clarkb: cool | 18:45 |
clarkb | though, mergability checking is one thing that slowed zuul processing down | 18:45 |
clarkb | I am not sure we want to build zuul refs for the entire tree the whole way down | 18:46 |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 18:46 | |
jeblair | clarkb: yeah, i was proposing you also skip the merger operation | 18:47 |
clarkb | ok | 18:47 |
jeblair | clarkb: double use of mergability here -- we should not check to see if we can git-merge a change. we should, later, fix the bug that prevents us from removing a change when it fails to satisfy gerrit-merge criteria. | 18:48 |
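A sketch of the restructuring jeblair proposes: walk the whole queue so per-item bookkeeping still happens everywhere, and skip only the merge and job launch for items outside the window. The function and callback names are hypothetical; this illustrates the control flow, not zuul's actual `_processOneItem()`.

```python
def process_queue(queue, window_size, launch, cleanup):
    """Walk the entire queue; only items inside the window launch jobs.

    Hypothetical callbacks: `queue` is ordered head-first, `launch(item)`
    performs the merge and job launch, and `cleanup(item)` does everything
    else the processor needs for every item (cancelling jobs, dropping
    items that can no longer merge, updating dependency lines for the UI).
    """
    for index, item in enumerate(queue):
        cleanup(item)  # always runs, however deep in the queue
        if index < window_size:
            launch(item)  # merge + launch only near the head
```

Putting the window check inside the loop, rather than slicing the queue first, is what lets cancellation and removal keep working for changes deep in the queue.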
zaro | fungi: ok. i won't update it. couldn't find any other openstack-infra puppet version info on the net. | 18:48 |
fungi | jeblair: clarkb: thinking about the ~jenkins move a bit more, the separate /opt filesystem doesn't get created until the job starts (setup_workspace calls out to a function to do that part), so using it in puppet is way too early, and we can't move it during the job because the slave agent will have open descriptors in it | 18:48 |
dstufft | I'm guessing this is the right channel ;P If anyone can +2 https://review.openstack.org/#/c/68719/ that'd be awesomesauce | 18:48 |
clarkb | fungi: oh right, hrm | 18:49 |
*** mgagne has quit IRC | 18:49 | |
zaro | fungi: i do see that ci.o.o/puppet.html references puppet-dashboard.o.o but that puppet-dashboard link is broken. | 18:49 |
jeblair | fungi: gah; you're right, and we can't do it in image prep because, well, it's an ephemeral partition. | 18:49 |
fungi | zaro: http://ci.openstack.org/puppet.html | 18:49 |
fungi | zaro: "We have not yet migrated to puppet 3, so we pin puppet to 2.x." | 18:50 |
fungi | zaro: (that means "most recent 2.x puppet version") | 18:50 |
jeblair | fungi: i think the people who decided this system is workable should fix it. :| | 18:50 |
jeblair | fungi: there's something about openstack tests failing because of this openstack deployment decision. | 18:51 |
fungi | jeblair: heh. yes, i agree. just trying to decide where a good place to fix it is | 18:51 |
zaro | fungi: how does that relate to the broken link? | 18:52 |
fungi | in hpcloud and rackspace i guess ;) | 18:52 |
jeblair | zaro: ask pleia2 and anteaya about dashboard status | 18:52 |
*** pballand has quit IRC | 18:52 | |
fungi | zaro: i wasn't answering your question about puppet dashboard. i was pointing you to where our documentation implies the version of puppet we're running | 18:53 |
*** mgagne has joined #openstack-infra | 18:53 | |
jeblair | fungi: i am not immediately seeing a solution to this as long as we're running jenkins. | 18:53 |
jeblair | or rather, as long as we are using it to push logs. | 18:53 |
fungi | jeblair: ahh, right, jenkins won't collect artifacts outside the workspace, right? | 18:54 |
jeblair | fungi: i believe that's the case. | 18:54 |
*** mgagne1 has joined #openstack-infra | 18:55 | |
anteaya | zaro the latest on the dashboard was we were working with hunner to bring puppetboard online | 18:55 |
jeblair | fungi: that should probably be verified though | 18:55 |
*** pballand has joined #openstack-infra | 18:55 | |
anteaya | but he was having issues with the underlying db requirements for puppetboard, as best as I can remember | 18:55 |
zaro | anteaya: probably should remove the broken link until it's available? | 18:55 |
anteaya | then I got swallowed by neutron and haven't done anything with it since | 18:56 |
anteaya | I can do that, the one in ci.openstack.org? | 18:56 |
zaro | yes. | 18:56 |
anteaya | k | 18:56 |
*** sandywalsh has quit IRC | 18:56 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Remove push refs to gerrit feature https://review.openstack.org/68723 | 18:57 |
zaro | fungi: had a hard time making the connection. thanks. | 18:57 |
*** mgagne has quit IRC | 18:57 | |
*** mgagne1 is now known as mgagne | 19:00 | |
*** jcooley_ has quit IRC | 19:01 | |
fungi | jeblair: i don't suppose we could add a step to the node launch to make earlier use of the ephemeral disk? (that would leave it less flexible for jobs wanting to do things with it though) | 19:01 |
fungi | in which case, since puppet runs during the image build, we'd have to not rely on puppeting anything into there | 19:02 |
jeblair | fungi: perhaps we don't have to care about jenkins open file descriptors | 19:03 |
fungi | so probably connect to freshly built node, partition/format/mount the ephemeral disk, move /home/jenkins to it and leave a symlink behind, then register it with jenkins as a slave | 19:03 |
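The "move /home/jenkins to the ephemeral disk and leave a symlink behind" step from fungi's outline could look roughly like this; partitioning, formatting and mounting are out of scope here, and the helper name is hypothetical. It should run before the Jenkins agent starts: a move across filesystems is a copy followed by a delete, so files a running process holds open would not follow the move.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(src, dest_dir):
    """Move directory `src` into `dest_dir` and leave a symlink at `src`.

    Idempotent: if `src` is already a symlink, nothing is moved.
    """
    src = os.path.abspath(src)
    target = os.path.join(dest_dir, os.path.basename(src))
    if os.path.islink(src):
        return target
    if os.path.exists(target):
        raise FileExistsError(target)
    shutil.move(src, target)
    os.symlink(target, src)
    return target


if __name__ == "__main__":
    # Demo in a scratch directory instead of the real /home and /opt.
    base = tempfile.mkdtemp()
    home = os.path.join(base, "home", "jenkins")
    os.makedirs(home)
    opt = os.path.join(base, "opt")
    os.makedirs(opt)
    print(relocate_with_symlink(home, opt), "<-", home)
```

Leaving the symlink keeps tools with `/home/jenkins` hard-coded (such as the pypi-mirror config fungi mentions) working unchanged.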
* fungi looks in the vicinity of the ssh test routine nodepool has | 19:04 | |
jeblair | fungi: yeah. | 19:04 |
openstackgerrit | Anita Kuno proposed a change to openstack-infra/config: Remove links to puppet dashboard https://review.openstack.org/68724 | 19:04 |
fungi | i'm thinking we want an optional nodepool script which can be uploaded and run after/during/as the ssh test | 19:05 |
jeblair | fungi: yes. i continue to hate that this is the best solution. | 19:06 |
jeblair | fungi: can we explore whether we can get jenkins to upload artifacts from another location a bit more first? | 19:07 |
fungi | jeblair: however, this addresses clarkb's concern about people running devstack-gate inadvertently blowing away unrelated filesystems/block devices | 19:07 |
*** DinaBelova_ is now known as DinaBelova | 19:07 | |
jeblair | fungi: it makes the nodes less useful | 19:07 |
fungi | jeblair: sure, i'll have a look at tweaking the publisher macro | 19:07 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/zuul: Allow zuul to cleanup jobs outside window https://review.openstack.org/68725 | 19:10 |
clarkb | jeblair: ^ is a relatively simple change. I do have one concern about it that I will point out inline | 19:10 |
clarkb | jeblair: and posted | 19:12 |
anteaya | jeblair: http://ci-puppetmaster.openstack.org/ doesn't resolve for me | 19:12 |
pleia2 | jog0: re: hackathon - yeah, I'm working it tomorrow and Sunday, Dave Neilson's thing, details: http://public.bemyapp.com/noblecodehackathon/ | 19:12 |
clarkb | anteaya: I don't think we have a web server there | 19:12 |
pleia2 | jog0: I was actually going to ask if you were helping out too, so I can give you our LCA bag ;) | 19:12 |
clarkb | jeblair: fungi: we can probably update the SCP plugin to have an option allowing unsafe file copies | 19:13 |
anteaya | clarkb: that would explain why it doesn't resolve to anything | 19:13 |
Hunner | anteaya: :( | 19:13 |
anteaya | Hunner: hello | 19:13 |
anteaya | sorry I have had no time for you | 19:13 |
anteaya | that has changed | 19:13 |
Hunner | anteaya: Sorry that I haven't made any progress :/ | 19:13 |
anteaya | when do you have time to lend a hand with puppety things | 19:13 |
anteaya | Hunner: now that we have apologized to each other :D | 19:14 |
jog0 | pleia2: I am going to do Saturday but I think I can swing by on friday night too | 19:14 |
anteaya | what now? | 19:14 |
anteaya | Hunner: would enjoy working with you to fix all the things | 19:14 |
anteaya | how is your time? | 19:14 |
Hunner | anteaya: I think the only thing left was to write the apache vhost definition... but the apache module version was so old that it made me sad :( | 19:15 |
jeblair | anteaya: it's not a web server, but i certainly hope the host _resolves_. | 19:15 |
anteaya | Hunner: yes, I remember that being an issue, how can we address the sad | 19:15 |
Hunner | anteaya: Oh, and I would have to update the masters to report to puppetdb | 19:15 |
jog0 | pleia2: cody got me in touch with Dave | 19:15 |
anteaya | jeblair: I'm confused, so should that link remain in docs or no? | 19:15 |
pleia2 | jog0: ah ok, great | 19:15 |
Hunner | anteaya: I talked with clark about forking/updating the module, but I think that will just have to be a separate effort | 19:16 |
pleia2 | jog0: maybe we just do lunch some day, probably easier than me bringing bag to hackathon and hoping we cross paths anyway | 19:16 |
anteaya | great | 19:16 |
anteaya | Hunner: how would you like to proceed? | 19:16 |
pleia2 | anteaya: re: puppetmaster link, it shouldn't not resolve (typing: "host ci-puppetmaster.openstack.org" in terminal should work), it probably just doesn't *connect* in a web browser | 19:17 |
pleia2 | err it SHOULD resolve | 19:17 |
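pleia2's distinction between a name that resolves and a host that serves web pages can be checked separately with the standard `socket` module: `getaddrinfo` answers the same question as `host`, while a TCP connect to port 80 is what a browser needs. A minimal sketch; the helper names are illustrative.

```python
import socket


def resolves(hostname):
    """True if DNS (or /etc/hosts) returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def accepts_connections(hostname, port=80, timeout=3.0):
    """True if a TCP connection to hostname:port succeeds within `timeout`."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolvable
        return False
```

A host such as a puppet master can pass the first check and fail the second, which is why it should not be presented as a clickable link.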
*** UtahDave has quit IRC | 19:18 | |
*** ociuhandu has quit IRC | 19:18 | |
jog0 | pleia2: I was thinking the same thing actually | 19:18 |
jog0 | how does tomorrow work for you? | 19:18 |
anteaya | pleia2: okay so if it doesn't connect to a web browser does it make sense to leave it in the docs as a clickable link? | 19:18 |
anteaya | should it just have the url as non clickable? | 19:19 |
*** ruhe is now known as _ruhe | 19:19 | |
jeblair | anteaya: it's not a clickable link | 19:19 |
anteaya | if it is non clickable I think it should be removed from the navigation | 19:19 |
pleia2 | anteaya: where? | 19:19 |
anteaya | and stay in the puppet.html page as non clickable | 19:20 |
anteaya | ci.openstack.org | 19:20 |
anteaya | see the navigation at the top? | 19:20 |
*** fallenpegasus has quit IRC | 19:20 | |
anteaya | puppet master | 19:20 |
pleia2 | anteaya: it's not clickable, "host" refers to which server it's on | 19:20 |
anteaya | click it | 19:20 |
*** fallenpegasus2 has joined #openstack-infra | 19:20 | |
pleia2 | "host" does not mean it's a website | 19:20 |
* anteaya wonders if she is imagining things | 19:20 | |
pleia2 | (which is why it's not clickable) | 19:20 |
*** dpyzhov has joined #openstack-infra | 19:20 | |
pleia2 | puppet-dashboard link is clickable, because it should work | 19:20 |
anteaya | am I the only one who sees puppet master as a clickable option in the navigation of ci.openstack.org? | 19:22 |
anteaya | a clickable link, which we have just established is not meant to be clickable? | 19:22 |
*** vkozhukalov has quit IRC | 19:22 | |
*** dpyzhov has quit IRC | 19:22 | |
pleia2 | anteaya: oh oh, on the TOP of the page! | 19:22 |
anteaya | yes | 19:22 |
*** sandywalsh has joined #openstack-infra | 19:23 | |
pleia2 | anteaya: ok yeah, I think that should go away and/or be replaced when board comes up | 19:23 |
*** pballand has quit IRC | 19:23 | |
anteaya | okay thanks, I will do that | 19:24 |
clarkb | jeblair: fungi: any idea if zuul did a layout reload recently? (I just noticed that the window size appears to be 20 but logs indicate is should be ~3, layout reload would reset it to 20) | 19:24 |
*** salv-orlando has joined #openstack-infra | 19:24 | |
jeblair | clarkb: yes, i bumped the min to 10 | 19:24 |
fungi | clarkb: and before that i had bumped it to 6 and increased the increment to 2 | 19:25 |
clarkb | jeblair: fungi: thanks | 19:25 |
pleia2 | jog0: tomorrow I'll be at the hackathon all day :) I'll send an email to organize something next week (should invite gothicmindfood too!) | 19:25 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: A test publisher to collect logs from /opt https://review.openstack.org/68732 | 19:26 |
fungi | running out to lunch for a bit | 19:26 |
jog0 | pleia2: sounds good | 19:26 |
openstackgerrit | Anita Kuno proposed a change to openstack-infra/config: Remove link to puppet dashboard https://review.openstack.org/68724 | 19:27 |
*** ryanpetrello has quit IRC | 19:28 | |
*** mrmartin has quit IRC | 19:28 | |
clarkb | jeblair: reading your comment on my change, is protecting process one item on an item that has been removed something that should go into my change? | 19:29 |
clarkb | jeblair: wasn't clear to me if that is something I need to address or a general issue | 19:29 |
*** gsamfira has joined #openstack-infra | 19:30 | |
jeblair | oh sorry, general issue | 19:30 |
*** markwash_ has joined #openstack-infra | 19:30 | |
*** harlowja is now known as harlowja_away | 19:31 | |
Hunner | anteaya: I'm asking my boss to get some work hours to put on this so I don't have to ask my wife ;) | 19:32 |
anteaya | thank you | 19:32 |
anteaya | much better option | 19:32 |
anteaya | do not ask your wife | 19:32 |
*** markwash has quit IRC | 19:32 | |
*** markwash_ is now known as markwash | 19:32 | |
openstackgerrit | João Vale proposed a change to openstack-infra/jenkins-job-builder: Add support for credentials-id in git repositories. https://review.openstack.org/68734 | 19:34 |
*** pballand has joined #openstack-infra | 19:35 | |
lifeless | hi infra people! We could use a hint on https://review.openstack.org/#/c/68645/ | 19:38 |
lifeless | /cluebat/ if you prefer | 19:38 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 19:39 | |
*** fallenpegasus2 has quit IRC | 19:40 | |
jeblair | lifeless: we believe the test failure there is caused by the root filesystem filling up. for some reason, cloud deployers think tiny root filesystems are okay. i'm not a fan of that. | 19:40 |
*** hogepodge has joined #openstack-infra | 19:41 | |
jeblair | lifeless: we're currently trying to work around that but it's rather difficult. | 19:41 |
lifeless | jeblair: oh, I actually meant derekh's question about whether the change is semantically correct | 19:41 |
jeblair | oh! hah! | 19:41 |
lifeless | e.g. will it put more stuff in /opt/stack/new | 19:41 |
lifeless | if the answer is 'yes but it breaks the root filesystem' then that's sad but still helpful! | 19:42 |
jeblair | lifeless: no, it's 'yes and will not affect the root filesystem'; the git repos are cloned to the ephemeral disk | 19:42 |
*** sarob has joined #openstack-infra | 19:42 | |
lifeless | jeblair: ok cool. He had a follow-up nuance there - but hey, you can read his q yourself :) | 19:43 |
jeblair | lifeless: (the difficult part is that /var/log and /home/jenkins can't be moved easily; both of those relate to logs) | 19:43 |
lifeless | jeblair: heh, yeah - for the jenkins I ran in hpcloud I symlink those trees to the ephemeral disk | 19:44 |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 19:44 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/elastic-recheck: protect from the case of not passing an event https://review.openstack.org/68737 | 19:45 |
*** praneshp has quit IRC | 19:45 | |
openstackgerrit | A change was merged to openstack-infra/reviewday: Whitelist external lazr.authentication requirement https://review.openstack.org/65026 | 19:46 |
*** markmcclain has joined #openstack-infra | 19:46 | |
openstackgerrit | A change was merged to openstack-infra/reviewday: Generate JSON https://review.openstack.org/64471 | 19:48 |
clarkb | jeblair: that python26 fail seems consistent. I am digging into it now | 19:48 |
*** sarob has quit IRC | 19:49 | |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 19:49 | |
*** sarob has joined #openstack-infra | 19:50 | |
jeblair | clarkb: after lca lifeless helped me find that it was probably that the test_client_enqueue_negative test was timing out. this doesn't make a lot of sense to me and i was unable to repro on my laptop. however, i have not done so on a centos6 vm. | 19:51 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: protect from the case of not passing an event https://review.openstack.org/68737 | 19:51 |
clarkb | jeblair: interesting, worker-2 has a return code of -140 | 19:51 |
jeblair | clarkb: he indicated that process-return-code should fail in the case of a test timeout, and that the log should indicate a test start but not finish for the test in question | 19:52 |
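The diagnostic lifeless describes (a test that logs a start but never a finish) can be found mechanically. A minimal sketch, assuming a hypothetical log format with one `START <test_id>` or `END <test_id> <status>` line per event; this is not testr's or subunit's real output format, so the regex would need adapting.

```python
import re

# Hypothetical event format (an assumption, not testr's actual log layout):
#   "START zuul.tests.test_scheduler.TestScheduler.test_foo"
#   "END zuul.tests.test_scheduler.TestScheduler.test_foo ok"
_EVENT = re.compile(r"^(START|END)\s+(\S+)")


def unfinished_tests(lines):
    """Return test ids that have a START event but no matching END, in start order."""
    started = {}  # test id -> line number of its first START
    for lineno, line in enumerate(lines):
        match = _EVENT.match(line.strip())
        if not match:
            continue  # ignore ordinary log output between events
        kind, test_id = match.groups()
        if kind == "START":
            started.setdefault(test_id, lineno)
        else:
            started.pop(test_id, None)
    return sorted(started, key=started.get)
```

A test killed by a hard timeout would show up here because its END line is never written.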
openstackgerrit | Matthew Treinish proposed a change to openstack-infra/config: Add projects section to elastic recheck bot yaml https://review.openstack.org/68741 | 19:52 |
mtreinish | jog0: ^^^ | 19:52 |
*** gokrokve has joined #openstack-infra | 19:52 | |
*** markmcclain has quit IRC | 19:53 | |
*** markmcclain has joined #openstack-infra | 19:54 | |
*** sarob has quit IRC | 19:54 | |
*** kostabrava has joined #openstack-infra | 19:54 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 19:54 | |
*** markmcclain has quit IRC | 19:55 | |
*** kostabrava has quit IRC | 19:55 | |
*** markmcclain has joined #openstack-infra | 19:55 | |
*** sarob has joined #openstack-infra | 19:56 | |
clarkb | jeblair: in this case it looks like test_two_failed_changes_at_head is at fault | 19:56 |
lifeless | jeblair: clarkb: -140 isn't the return code we synthesise for a timeout though | 19:56 |
lifeless | jeblair: clarkb: so -140 suggests the backend crashed, to me | 19:56 |
jeblair | lifeless: it's a hard timeout | 19:56 |
lifeless | oh duh - right | 19:57 |
lifeless | the timer error code | 19:57 |
lifeless | jeblair: clearly I'm not fully awake yet | 19:57 |
*** ryanpetrello has joined #openstack-infra | 19:57 | |
*** dizquierdo has quit IRC | 19:57 | |
*** sarob has quit IRC | 19:58 | |
jeblair | fungi: i wanted to get a jump on this, so i looked at the scp plugin code and did a manual test in jenkins... | 19:58 |
*** gokrokve has quit IRC | 19:58 | |
*** sarob has joined #openstack-infra | 19:58 | |
*** harlowja_away is now known as harlowja | 19:59 | |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 19:59 | |
*** ivar-lazzaro has quit IRC | 20:01 | |
jeblair | fungi: it does require that the source path be relative to the workspace | 20:01 |
jeblair | fungi: but it will follow a symlink out of it | 20:01 |
lifeless | jeblair: huh am I right that devstack-gate nodes have *two* caches of git repos? /opt/git/everything and ~/workspace-cache/ ? | 20:01 |
jeblair | fungi: so we should be able to have d-g symlink $WORKSPACE/logs to /opt | 20:02 |
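The workaround relies on two properties jeblair found: the scp publisher takes a workspace-relative source path, but follows symlinks out of the workspace. A minimal Python sketch of the symlink step; the real change lives in devstack-gate's shell scripts, and the target directory (e.g. somewhere under /opt) is a hypothetical choice here.

```python
from pathlib import Path


def link_workspace_logs(workspace, target):
    """Make <workspace>/logs a symlink to target (e.g. a dir under /opt).

    The publisher still sees the workspace-relative path "logs", but the
    files themselves live on the larger disk that target points at.
    """
    target_dir = Path(target).resolve()  # absolute, so the link is not relative to workspace
    target_dir.mkdir(parents=True, exist_ok=True)
    link = Path(workspace) / "logs"
    if link.is_symlink():
        link.unlink()  # replace a stale link left by a previous run
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target_dir, target_is_directory=True)
    return link
```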
jeblair | lifeless: yes, it's a work-in-progress to move all slaves to use /opt/git/everything, including devstack-gate | 20:03 |
*** praneshp has joined #openstack-infra | 20:03 | |
*** sarob has quit IRC | 20:04 | |
*** mrodden has quit IRC | 20:04 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 20:04 | |
*** rfolco has quit IRC | 20:04 | |
*** jcoufal has joined #openstack-infra | 20:04 | |
*** dhellmann_ is now known as dhellmann | 20:04 | |
*** rnirmal has quit IRC | 20:05 | |
lifeless | ok | 20:05 |
lifeless | so I think we should follow the d-g pattern right now and then help with that effort | 20:05 |
*** rnirmal has joined #openstack-infra | 20:07 | |
*** mrodden has joined #openstack-infra | 20:08 | |
clarkb | jeblair: fungi: there is a recent window size decreased to 3 message for the main gate queue, on the next iteration through the list we should see that get reflected. I don't know why it didn't drop to 10 instead though | 20:08 |
clarkb | best guess is that window floor isn't getting picked up on layout reloads properly? I wonder if my TODO in the merge change queue function needs to be done | 20:09 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/elastic-recheck: add tests for loading the queries https://review.openstack.org/68745 | 20:10 |
sdague | clarkb: did you see earlier discussion to bring up the floor | 20:11 |
sdague | I think 3 is too low | 20:11 |
clarkb | sdague: yes, it was brought up | 20:11 |
clarkb | sdague: but the config doesn't seem to have stuck | 20:11 |
sdague | ah, bummer | 20:11 |
clarkb | sdague: I think 3 is plenty high, right now we are just failing at the head of the queue over and over and over | 20:12 |
clarkb | no sense in testing more than one change imo | 20:12 |
*** pballand has quit IRC | 20:12 | |
*** rnirmal has quit IRC | 20:12 | |
sdague | well it's nova changes | 20:12 |
sdague | that have unit test bugs | 20:12 |
sdague | so we're going to just fail the next three changes most likely | 20:12 |
sdague | realistically unit tests are resetting us in the gate more often than tempest right now | 20:13 |
bknudson | I think the keystone jobs are going to fail until https://review.openstack.org/#/c/68135/ is merged | 20:13 |
clarkb | I think something is wonky with the way layout is reloaded, we seem to still have a relatively large window according to status | 20:13 |
bknudson | so might be a good idea to promote it? | 20:14 |
jeblair | clarkb: here's why i don't think <10 is helpful -- we really want this stuff to merge as quickly as possible, and we definitely have the resources to run jobs for 10 changes. | 20:14 |
clarkb | bknudson: that is a heat change? | 20:14 |
*** gokrokve has joined #openstack-infra | 20:14 | |
sdague | jeblair: +1 | 20:14 |
bknudson | clarkb: https://review.openstack.org/#/c/68135/ is the heat change | 20:14 |
clarkb | jeblair: I agree, but it doesn't make a difference when we are serialized | 20:14 |
clarkb | bknudson: yes that is the change you linked | 20:14 |
sdague | tcp is really about avoiding errors, we are actually ok with a certain amount of errors | 20:15 |
sdague | the big issue we have is overunning our resources then effectively swapping, but floor of 10 would be fine | 20:15 |
jeblair | clarkb: so even though right at this moment 3 or 1 is sufficient, as soon as we get past that, i'd like us to be better utilizing zuul's capability; the cost of that is to waste a little now, and i think we can handle that easily. | 20:15 |
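The trade-off being argued (sdague's TCP comparison, jeblair's preference for a floor of 10) can be illustrated with an additive-increase, multiplicative-decrease window. A minimal sketch, assuming the window halves on a gate reset and grows by a fixed increment per merge; the class and parameter names are hypothetical and are not Zuul's actual configuration keys.

```python
from dataclasses import dataclass


@dataclass
class GateWindow:
    """Hypothetical TCP-like window over a shared gate queue (illustrative only)."""

    window: int = 20
    floor: int = 3
    increment: int = 1        # additive increase after each successful merge
    decrease_factor: int = 2  # multiplicative decrease after a gate reset

    def on_merge(self):
        self.window += self.increment

    def on_failure(self):
        # Shrink, but never below the floor, so some parallelism is always kept.
        self.window = max(self.floor, self.window // self.decrease_factor)

    def active(self, queue):
        """Only the first `window` items get test resources."""
        return queue[: self.window]
```

With a floor of 3, repeated failures take the window 20 → 10 → 5 → 3; with a floor of 10 it stops at 10, which keeps jobs running for ten changes once the head of the queue starts passing again.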
clarkb | jeblair: that's fair, also not convinced my change did the correct thing after the zuul layout reload | 20:16 |
clarkb | jeblair: does mergeChangeQueue need to handle window-floor and the other keys to make them carry through a reload properly? | 20:16 |
jeblair | clarkb: yeah, i'll help look into that in a sec; have to warm up lunch now | 20:17 |
clarkb | ok | 20:17 |
clarkb | looks like change_queues starts as empty list in buildChangeQueues, so I don't think the old change queues will affect a zuul reload. I should grab lunch too | 20:18 |
*** elasticio has joined #openstack-infra | 20:19 | |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 20:23 | |
mordred | jeblair: all three new jenkins servers should now be up and running | 20:24 |
*** vipul is now known as vipul-away | 20:24 | |
anteaya | \o/ | 20:24 |
mordred | jeblair: now I believe the next step is to write an additional change to config that adds them to nodepool, right? | 20:25 |
*** wenlock has quit IRC | 20:26 | |
*** gyee has joined #openstack-infra | 20:27 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 20:28 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 20:28 | |
*** yolanda_ has quit IRC | 20:29 | |
*** markmcclain has quit IRC | 20:29 | |
sdague | mordred: awesome sauce | 20:30 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 20:31 | |
*** jerryz has quit IRC | 20:31 | |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Enable jenkins0[5-7] in nodepool https://review.openstack.org/68759 | 20:32 |
mordred | jeblair, clarkb: ^^ there ya go | 20:32 |
*** jgrimm has quit IRC | 20:32 | |
*** smarcet has left #openstack-infra | 20:32 | |
*** oubiwann_ is now known as mr-typo | 20:33 | |
*** mr-typo is now known as oubiwann-fn | 20:33 | |
*** vipul-away is now known as vipul | 20:36 | |
dstufft | Also let me ask for some reviews to https://review.openstack.org/#/c/68719/ please :] | 20:36 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Add query for bug 1270693 https://review.openstack.org/68762 | 20:38 |
*** hogepodge_ has joined #openstack-infra | 20:39 | |
*** pafuent has joined #openstack-infra | 20:40 | |
*** vipul is now known as vipul-away | 20:40 | |
sdague | clarkb: are you able to help me figure out why the cron bits apparently didn't work? | 20:40 |
sdague | for elastic search | 20:40 |
sdague | elastic recheck | 20:40 |
sdague | http://status.openstack.org/elastic-recheck/data/ | 20:40 |
clarkb | sdague after lunch | 20:40 |
sdague | cool, thanks! | 20:41 |
*** rnirmal has joined #openstack-infra | 20:41 | |
*** hogepodge has quit IRC | 20:41 | |
*** hogepodge_ is now known as hogepodge | 20:41 | |
*** mrda_away is now known as mrda | 20:42 | |
mordred | clarkb: the above patch is +2 from me | 20:43 |
*** coolsvap is now known as coolsvap_away | 20:44 | |
*** ok_delta has joined #openstack-infra | 20:46 | |
*** hogepodge has quit IRC | 20:46 | |
*** SumitNaiksatam has quit IRC | 20:51 | |
*** marun has quit IRC | 20:51 | |
openstackgerrit | Dolph Mathews proposed a change to openstack-infra/elastic-recheck: eliminate 14 false positives for bug 1268732 https://review.openstack.org/68707 | 20:51 |
*** SumitNaiksatam has joined #openstack-infra | 20:51 | |
*** marun has joined #openstack-infra | 20:52 | |
*** pballand has joined #openstack-infra | 20:53 | |
*** markmcclain has joined #openstack-infra | 20:54 | |
jeblair | mordred: i still can't log into jenkins05 | 20:55 |
*** hogepodge has joined #openstack-infra | 20:56 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/elastic-recheck: stop being rediculous with our time formats https://review.openstack.org/68765 | 20:56 |
jeblair | mordred: i think you may need to edit some xml files and s/jenkins.openstack.org/jenkins05.openstack.org/ | 20:56 |
*** coolsvap_away has quit IRC | 20:56 | |
mordred | jeblair: oh weird. ... OH - I know what it is | 20:57 |
mordred | yeah | 20:57 |
*** jgrimm has joined #openstack-infra | 20:59 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add query for bug 1270693 https://review.openstack.org/68762 | 21:00 |
mordred | jeblair: jenkins.model.JenkinsLocationConfiguration.xml | 21:01 |
mordred | jeblair: I believe we could add that to the puppet | 21:01 |
jeblair | mordred: istr there are 2 places | 21:05 |
mordred | jeblair: also in hudson.tasks.Mailer.xml | 21:05 |
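The fix is a plain `s/jenkins.openstack.org/jenkins05.openstack.org/` in the two files named above, followed by a Jenkins restart. A minimal sketch of that substitution; the Jenkins home directory is passed in because its location is an assumption here, and the function does not restart Jenkins.

```python
from pathlib import Path

# The two files named in the discussion; both store the master's public URL.
CONFIG_FILES = (
    "jenkins.model.JenkinsLocationConfiguration.xml",
    "hudson.tasks.Mailer.xml",
)


def retarget_hostname(jenkins_home, old, new):
    """Replace old with new in the URL config files; return the names changed.

    Plain string replacement, like sed; Jenkins must be restarted afterwards
    for the new URL to take effect.
    """
    changed = []
    for name in CONFIG_FILES:
        path = Path(jenkins_home) / name
        if not path.exists():
            continue
        text = path.read_text(encoding="utf-8")
        updated = text.replace(old, new)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(name)
    return changed
```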
mordred | jeblair: jenkins05 restarted | 21:06 |
jeblair | clarkb: i understand the problem | 21:06 |
fungi | jeblair: okay, i'll ditch my test change and work up a simple d-g patch (shouldn't be more than a few lines) | 21:08 |
openstackgerrit | Sabari Murugesan proposed a change to openstack/requirements: nova api validation fw requires jsonschema >= 2.0.0 https://review.openstack.org/66464 | 21:08 |
*** wenlock has joined #openstack-infra | 21:08 | |
*** DinaBelova is now known as DinaBelova_ | 21:09 | |
*** jgrimm has quit IRC | 21:09 | |
*** dangers_away is now known as dangers | 21:10 | |
pafuent | Hi. I want to add tests for Heat and I want to know if the ones tagged as slow are run by Jenkins when someone upload a new patch. | 21:11 |
openstackgerrit | Aaron Greengrass proposed a change to openstack-infra/config: Extend user module, add 'disable user' https://review.openstack.org/68771 | 21:11 |
*** afazekas has joined #openstack-infra | 21:12 | |
*** turul_ has joined #openstack-infra | 21:12 | |
*** turul_ has quit IRC | 21:12 | |
fungi | pafuent: probably better to inquire in #openstack-qa | 21:14 |
*** pballand has quit IRC | 21:14 | |
pafuent | fungi: Ok, thanks | 21:15 |
*** mrodden has quit IRC | 21:15 | |
*** mrodden has joined #openstack-infra | 21:16 | |
*** markmcclain has quit IRC | 21:17 | |
clarkb | jeblair: woot | 21:18 |
clarkb | sdague: back from lunch | 21:18 |
*** vipul-away is now known as vipul | 21:18 | |
jeblair | clarkb: just a sec and i'll have something | 21:18 |
sdague | clarkb: great, was also poking fungi on this. So the new bits to build the uncategorized.html should be putting it in /var/lib/elastic-recheck | 21:18 |
sdague | however, instead there is just the lockfile | 21:19 |
sdague | http://status.openstack.org/elastic-recheck/data/ | 21:19 |
sdague | which makes me think something died weird, and now we're dead on the lock | 21:19 |
fungi | sdague: http://paste.openstack.org/show/61783/ | 21:20 |
sdague | fungi: yep | 21:20 |
*** dkranz has quit IRC | 21:20 | |
*** salv-orlando has quit IRC | 21:20 | |
sdague | that dir maps full to the world at http://status.openstack.org/elastic-recheck/data/ | 21:20 |
clarkb | oh interesting | 21:20 |
fungi | sdague: also, we seem to still spawn a new daemon on every restart... http://paste.openstack.org/show/61784/ | 21:21 |
openstackgerrit | DennyZhang proposed a change to openstack-infra/gitdm: update personal profile https://review.openstack.org/68772 | 21:22 |
sdague | fungi: ok, we should tackle that one as well | 21:22 |
clarkb | fungi: is there anything I should poke at or do you have a handle on it? | 21:22 |
clarkb | don't want to get in the way | 21:22 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: eliminate 14 false positives for bug 1268732 https://review.openstack.org/68707 | 21:22 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Don't store change_queue in QueueItem https://review.openstack.org/68773 | 21:23 |
*** pafuent has left #openstack-infra | 21:23 | |
openstackgerrit | Ivan Melnikov proposed a change to openstack-dev/hacking: Trigger warnings for raw and unicode docstrings https://review.openstack.org/68774 | 21:23 |
jeblair | clarkb: https://review.openstack.org/68773 lemme know if that makes sense | 21:23 |
fungi | clarkb: i can run this to ground. i'd rather you not get dragged from zuul patches | 21:23 |
clarkb | jeblair: looking | 21:23 |
clarkb | jeblair: gah! good catch | 21:24 |
* fungi swears profusely at the flaky internet here | 21:24 | |
*** marun has quit IRC | 21:24 | |
*** ok_delta has quit IRC | 21:25 | |
jeblair | clarkb: so short version is that it was updating defunct change queues; thus the disconnect between logs and reality | 21:25 |
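The bug pattern jeblair describes (window updates landing on change queues that a layout reload had already replaced) is a stale object reference. A generic illustration, assuming simplified hypothetical classes; this is not Zuul's actual code, only the shape of the problem fixed by "Don't store change_queue in QueueItem".

```python
class ChangeQueue:
    def __init__(self, name, window):
        self.name = name
        self.window = window


class Pipeline:
    def __init__(self):
        self.queues = {}

    def build_queues(self, window):
        # Called on every (re)load: creates brand-new queue objects.
        self.queues = {"integrated": ChangeQueue("integrated", window)}

    def queue_for(self, item):
        # Resolve the live queue at the moment it is needed.
        return self.queues[item.queue_name]


class StaleItem:
    """Captures a queue object once; after a reload it points at a defunct queue."""

    def __init__(self, queue):
        self.change_queue = queue


class Item:
    """Stores only a name and looks the queue up on use."""

    def __init__(self, queue_name):
        self.queue_name = queue_name
```

After `build_queues()` runs again, writes through `StaleItem.change_queue` go to an object nothing else reads, so the logs report a window change that the live queue never sees.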
mgagne | Anyone familiar with hpcloud? I have questions regarding the concept of API keys and what they should be used for. | 21:25 |
clarkb | jeblair: ya, I figured it out just from your commit message | 21:25 |
*** sarob has joined #openstack-infra | 21:25 | |
*** alexpilotti has quit IRC | 21:25 | |
clarkb | mgagne: rolling credential expiry iirc | 21:25 |
clarkb | mgagne: you may also be able to restrict access of particular keys | 21:26 |
*** alexpilotti has joined #openstack-infra | 21:26 | |
mgagne | clarkb: lets say I use the nova client, how do I feed it the api key? | 21:26 |
clarkb | mgagne: that I don't know | 21:26 |
clarkb | mordred: ^ | 21:26 |
*** julim has quit IRC | 21:26 | |
*** marun has joined #openstack-infra | 21:27 | |
openstackgerrit | Ivan Melnikov proposed a change to openstack-dev/hacking: Trigger warnings for raw and unicode docstrings https://review.openstack.org/68774 | 21:28 |
openstackgerrit | Aaron Greengrass proposed a change to openstack-infra/config: Extend user creation with more granularity https://review.openstack.org/68776 | 21:28 |
fungi | sdague: http://paste.openstack.org/show/61785/ (looks like run_er_uncat and run_er_graph are broken) | 21:29 |
sdague | fungi: interesting.... | 21:30 |
openstackgerrit | Matthew Treinish proposed a change to openstack-infra/elastic-recheck: Add basic unit tests for the bot https://review.openstack.org/68778 | 21:30 |
fungi | i'm having a look at it now to see whether i have suggestions | 21:30 |
mattoliverau | Morning all | 21:30 |
sdague | because cd isn't a command.... right | 21:30 |
sdague | and it's an exec and not bash | 21:30 |
*** alexpilotti has quit IRC | 21:30 | |
fungi | sdague: https://git.openstack.org/cgit/openstack-infra/config/tree/modules/elastic_recheck/manifests/init.pp#n49 | 21:31 |
fungi | yeah, there's a separate puppet option to set cwd | 21:31 |
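The underlying issue: a Puppet `exec` run without a shell cannot use `cd`, because `cd` is a shell builtin. Run on its own, it is either not found or, where a standalone wrapper exists, it changes only its own process's directory and cannot affect the next command. The `cwd` attribute sets the directory for the process instead. The same idea in Python, as an analogous sketch rather than Puppet code:

```python
import subprocess


def run_in(directory, argv):
    """Run argv directly (no shell) with its working directory set to directory.

    This is the equivalent of giving an exec resource a cwd, rather than
    prefixing the command with "cd directory;".
    """
    completed = subprocess.run(
        argv, cwd=directory, capture_output=True, text=True, check=True
    )
    return completed.stdout
```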
* fungi gets an example | 21:31 | |
anteaya | mattoliverau: morning | 21:32 |
anteaya | how are the boxes today? | 21:32 |
*** MarkAtwood has joined #openstack-infra | 21:32 | |
sdague | right, cwd | 21:32 |
sdague | let me fix | 21:32 |
*** fallenpegasus has joined #openstack-infra | 21:32 | |
*** fallenpegasus has quit IRC | 21:32 | |
*** markmcclain has joined #openstack-infra | 21:33 | |
fungi | sdague: example... https://git.openstack.org/cgit/openstack-infra/config/tree/modules/kibana/manifests/init.pp#n60 | 21:33 |
jhesketh | Morning | 21:33 |
clarkb | howdy | 21:33 |
jeblair | hashar: i'd like to draw your attention to this proposed change: https://review.openstack.org/#/c/68723/ | 21:34 |
openstackgerrit | Aaron Greengrass proposed a change to openstack-infra/config: Move o.o user creation to it's own manifest. https://review.openstack.org/68779 | 21:34 |
mattoliverau | anteaya: becoming fewer and fewer... still too many for my liking tho :P | 21:34 |
sdague | fungi: ... but why wouldn't the cron part work? | 21:34 |
sdague | that's just the retrigger on git update | 21:35 |
mattoliverau | anteaya: had to buy a new fridge and washing machine yesterday, turns out they didn't fare too well in storage over the last year :( | 21:35 |
fungi | sdague: i'll check the mail for the recheck user | 21:35 |
sdague | oh... I probably need a && there | 21:35 |
sdague | it was a semicolon instead | 21:35 |
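The `;` versus `&&` distinction matters in a shell command line: `a; b` runs `b` regardless of whether `a` succeeded, while `a && b` runs `b` only if `a` exited 0. A small sketch that demonstrates this with `/bin/sh`:

```python
import subprocess


def sh(script):
    """Run script with /bin/sh -c and return (exit status, stdout)."""
    completed = subprocess.run(
        ["/bin/sh", "-c", script], capture_output=True, text=True
    )
    return completed.returncode, completed.stdout


# "false; echo ran"    -> echo still runs, and the list exits 0
# "false && echo ran"  -> echo is skipped, and the list exits 1
```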
fungi | sdague: no mail for recheck since the 18th | 21:36 |
sdague | or.... it helps to call the right command.... :) | 21:36 |
* fungi nods. always a good policy | 21:36 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: fix er_run commands https://review.openstack.org/68780 | 21:37 |
sdague | "you can increase your chances of success by knowing what you are doing" | 21:37 |
hashar | jeblair: hello :) | 21:37 |
anteaya | jhesketh: morning | 21:38 |
clarkb | sdague: we can't rely on chance? | 21:38 |
hashar | jeblair: will comment on change that removes the push_ref to gerrit . We don't use it at wikimedia :-] | 21:38 |
anteaya | mattoliverau: nooooo | 21:38 |
sdague | clarkb: I think we need more than 10k monkeys for that | 21:38 |
*** odyi has joined #openstack-infra | 21:38 | |
anteaya | I hope you like the new appliances though | 21:38 |
jeblair | hashar: excellent. good choice. :) | 21:39 |
*** jhesketh_ has quit IRC | 21:39 | |
clarkb | mordred: if you are around I am happy to pay attention to enabling more jenkins masters | 21:39 |
clarkb | mordred: is that something you had planned to watch go in? | 21:39 |
mikal | So... riddle me this batman. | 21:41 |
mikal | The zuul test chain for gate | 21:41 |
mikal | It shows patches from a minute ago in the same chain as ones from 24 hours ago | 21:41 |
mikal | All succeeding | 21:41 |
mattoliverau | anteaya: yeah was annoying to get the old appliances up here, then they don't work :( But new ones arrive today and will actually be able to keep things cold.. so that's exciting :) | 21:41 |
mikal | Does this mean zuul flushes and restarts every time something gets added to the chain? | 21:42 |
mikal | Cause that can't be right | 21:42 |
clarkb | mikal: things just get appended to the end | 21:42 |
mikal | clarkb: but that means that line doesn't indicate things currently attempting to merge? | 21:42 |
mikal | clarkb: or is the merge a continuous process until a flush happens? | 21:43 |
clarkb | mikal: only the top (head) of the queue can merge | 21:43 |
clarkb | mikal: as each of those is consumed zuul makes decisions on whether or not to merge based on test results | 21:43 |
mikal | Oh, so it merges the top one, and then if the second one has passed tests merges it, etc etc? | 21:43 |
clarkb | yup | 21:43 |
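A simplified model of what clarkb describes: items are tested speculatively in parallel, but only the head can merge. Passing items merge one after another from the head until the first item whose tests are still running. A minimal sketch under simplifying assumptions: a failing head is ejected and everything behind it is reset for re-testing, whereas real Zuul re-parents items onto the nearest non-failing item ahead instead of simply resetting them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateItem:
    change: str
    result: Optional[bool] = None  # None = still testing, True = passed, False = failed


def merge_from_head(queue):
    """Consume finished items from the head; return (merged, ejected, remaining)."""
    merged, ejected = [], []
    queue = list(queue)
    while queue:
        head = queue[0]
        if head.result is None:
            break  # head still testing: nothing behind it can merge yet
        queue.pop(0)
        if head.result:
            merged.append(head.change)
        else:
            ejected.append(head.change)
            # Items behind were tested assuming the head would merge, so
            # their results are no longer valid and must be re-run.
            for item in queue:
                item.result = None
            break
    return merged, ejected, queue
```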
*** dprince has quit IRC | 21:43 | |
anteaya | mattoliverau: keeping cold things cold is a good quality in a refrigerator, good choice | 21:44 |
*** praneshp has quit IRC | 21:44 | |
mikal | And those test results are just from millions of instances speculatively testing? | 21:44 |
jeblair | mordred, clarkb: i think the new jenkins masters still don't have jobs; run jjb manually? | 21:44 |
clarkb | jeblair: I will trigger jjb on 05 | 21:45 |
mordred | clarkb: I'm in meetings for the next couple of hours - but I'm happy if we wait and I can watch it after | 21:45 |
*** sarob has quit IRC | 21:47 | |
*** sarob has joined #openstack-infra | 21:47 | |
*** senk has joined #openstack-infra | 21:48 | |
*** sarob_ has joined #openstack-infra | 21:48 | |
*** praneshp has joined #openstack-infra | 21:49 | |
clarkb | mordred: ok, I will run JJB on the nodes now then | 21:49 |
clarkb | authentication failed ;( | 21:50 |
jeblair | clarkb: yeah, i was trying to find out from mordred if i had pointed him at the wrong file | 21:51 |
jeblair | clarkb: didn't you say something like 'the node list was dirty'? if so, that may indicate so. | 21:51 |
*** sarob has quit IRC | 21:51 | |
clarkb | jeblair: yes the slave list was dirty | 21:51 |
clarkb | jeblair: the users/gerrig/config.xml matches jenkins04 though | 21:51 |
clarkb | guessing the trouble is elsewhere in the config | 21:52 |
jeblair | clarkb: but do the secrets? | 21:52 |
clarkb | jeblair: the secret.key files do not match | 21:52 |
*** melwitt has joined #openstack-infra | 21:52 | |
*** pballand has joined #openstack-infra | 21:53 | |
jeblair | clarkb: why don't i find the correct files and finish correctly documenting this process. | 21:53 |
clarkb | ++ | 21:53 |
openstackgerrit | A change was merged to openstack-infra/zuul: Allow pipelines triggers to filter by username https://review.openstack.org/64219 | 21:54 |
jeblair | seeing as how it's not 3am and i'm not in au, i might finish it this time. | 21:54 |
sdague | fungi: https://review.openstack.org/68780 - clean zuul results | 21:54 |
sdague | that should get us working on the er pages | 21:54 |
*** whoops has joined #openstack-infra | 21:55 | |
fungi | sdague: okie dokie | 21:55 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/devstack-gate: Keep logs in $BASE instead of $WORKSPACE https://review.openstack.org/68782 | 21:55 |
*** mfer has quit IRC | 21:55 | |
sdague | jeblair: how do you feel about changing the stacking order on the node graph? | 21:56 |
sdague | http://goo.gl/NzQCxl | 21:57 |
*** sarob_ has quit IRC | 21:57 | |
jeblair | clarkb, mordred: ah, the secrets tarball doesn't have the right root. | 21:57 |
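Two problems surfaced with the bootstrap tarball: members were not rooted where they needed to be, and (per the later discussion) `secret.key` was missing. Both can be checked before unpacking. A minimal sketch; the expected root `/var/lib/jenkins` is an assumption (a common Jenkins home on Debian/Ubuntu packages), not something stated in the log.

```python
import tarfile


def check_bootstrap_tarball(path, expected_root, required=("secret.key",)):
    """Return a list of problems: members outside expected_root, missing files."""
    root = expected_root.strip("/")
    problems = []
    names = set()
    with tarfile.open(path) as tar:  # mode "r" auto-detects compression
        for member in tar.getmembers():
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            name = name.rstrip("/")
            names.add(name)
            inside = name == root or name.startswith(root + "/")
            # Directory entries such as "var" or "var/lib" are fine.
            ancestor = name in ("", ".") or root.startswith(name + "/")
            if not (inside or ancestor):
                problems.append(f"outside {root}/: {member.name}")
    for rel in required:
        if f"{root}/{rel}" not in names:
            problems.append(f"missing {root}/{rel}")
    return problems
```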
clarkb | jeblair: I have a small update to my zuul change that I want to push (allows for disabling rate limiting on pipelines) and am writing up docs too | 21:58 |
clarkb | FYI before things get merged :) | 21:58 |
jeblair | sdague: i prefer the current order -- it's designed to produce a green line of ready nodes in a stable situation; you can see that on the current graph but not your revised one | 21:59 |
*** dkliban is now known as dkliban_afk | 22:00 | |
sdague | jeblair: yeh, I was thinking that the most important thing is to know what our throughput is | 22:00 |
sdague | and it's a little hard to eyeball it | 22:00 |
*** hashar has quit IRC | 22:00 | |
*** dkranz has joined #openstack-infra | 22:00 | |
jeblair | sdague: i believe we can see the relative amount and estimate the value with the area as-is, but i agree that it is difficult to read an exact value in that case. | 22:01 |
*** elasticio has quit IRC | 22:02 | |
jeblair | sdague: perhaps adding current-value numbers to the legend would help? | 22:02 |
*** praneshp has quit IRC | 22:02 | |
jeblair | sdague: or if you wanted to duplicate the graph elsewhere with more detail (maybe lines instead of stacking area), that's an option | 22:02 |
*** sarob has joined #openstack-infra | 22:03 | |
*** praneshp has joined #openstack-infra | 22:03 | |
jeblair | sdague: i think the quick-glance overview is the most useful aspect of the graphs on that page (i usually create larger versions of them if i want to really study them) | 22:03 |
openstackgerrit | A change was merged to openstack-infra/config: fix er_run commands https://review.openstack.org/68780 | 22:04 |
*** mfink has quit IRC | 22:04 | |
*** thomasem has quit IRC | 22:05 | |
*** krtaylor has quit IRC | 22:09 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/zuul: Document zuul rate limiting configuration https://review.openstack.org/68788 | 22:09 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/zuul: Allow zuul to cleanup jobs outside window https://review.openstack.org/68725 | 22:09 |
clarkb | jeblair: ^ there we go | 22:09 |
*** jhesketh__ has joined #openstack-infra | 22:10 | |
openstackgerrit | A change was merged to openstack-infra/config: Release python-barbicanclient via Zuul https://review.openstack.org/68719 | 22:11 |
jeblair | clarkb, mordred, fungi: corrected tarballs and instructions placed in jenkins.o.o:~root/bootstrap | 22:11 |
dstufft | fungi: mordred thanks | 22:11 |
jeblair | clarkb, mordred: i also configured puppet to start on 5-7 and started it | 22:12 |
jeblair | clarkb: want to try jjb again now? | 22:12 |
*** gothicmindfood has joined #openstack-infra | 22:12 | |
clarkb | jeblair: sure, do I need to apply the new tarball or did you correct 05? | 22:12 |
jeblair | clarkb: i corrected all 3 and restarted jenkins | 22:12 |
clarkb | jeblair: thanks | 22:12 |
dstufft | now my next question :) is it possible to trigger a release job for a tag that was already pushed? | 22:12 |
*** fifieldt has quit IRC | 22:17 | |
fungi | dstufft: yes, i can do that. what tag? | 22:17 |
dstufft | fungi: 2.0.0 | 22:18 |
dstufft | I didn't realize some work had to be done before I pushed the tag :] | 22:18 |
fungi | dstufft: will do. gimme a few minutes to manually trigger the job | 22:18 |
clarkb | jeblair: I am just about through all of your zuul changes. Do we want to merge a bunch of code and do one zuul restart or put them in more slowly? | 22:18 |
dstufft | fungi: np, no hurry either, thanks a ton :) | 22:18 |
fungi | dstufft: and no worries. first time always needs testing anyway ;) | 22:18 |
clarkb | jeblair: also still getting auth errors on jenkins05, digging into that | 22:18 |
openstackgerrit | A change was merged to openstack-infra/zuul: Don't store change_queue in QueueItem https://review.openstack.org/68773 | 22:19 |
clarkb | jeblair: the secret.key contents are still different | 22:19 |
jeblair | clarkb: all at once! :) | 22:20 |
jeblair | clarkb: i'll dig into it again | 22:20 |
*** marun has quit IRC | 22:20 | |
jeblair | (key) | 22:20 |
*** DennyZhang has joined #openstack-infra | 22:21 | |
*** nati_ueno has quit IRC | 22:21 | |
*** morganfainberg is now known as morganfainberg|z | 22:21 | |
*** salv-orlando has joined #openstack-infra | 22:22 | |
openstackgerrit | A change was merged to openstack-infra/zuul: Allow zuul to cleanup jobs outside window https://review.openstack.org/68725 | 22:23 |
*** slong has quit IRC | 22:29 | |
sdague | jeblair: so what exactly is going on in the current zuul picture | 22:30 |
sdague | I'm kind of confused :) | 22:30 |
jeblair | sdague: you mean the subway map for gate? | 22:31 |
clarkb | 68725 should fix that | 22:34 |
*** mfink has joined #openstack-infra | 22:35 | |
openstackgerrit | Matt Ray proposed a change to openstack-infra/config: Create new Chef cookbook-openstack-integration-test for Tempest support. https://review.openstack.org/68791 | 22:36 |
sdague | jeblair: yeh | 22:38 |
*** mriedem has quit IRC | 22:39 | |
jeblair | sdague: as clarkb mentioned, 68725 should fix it, but basically it's a side effect of the window; the nnfi change reparenting isn't being run on changes outside the window, so the graph is incorrect for changes outside the window | 22:40 |
jeblair | sdague: (and sometimes within the window, depending on where it's trying to draw the lines) | 22:40 |
sdague | gotcha | 22:40 |
jeblair | sdague: anyway, the change going in causes us to calculate the map all the time (it's fast), so it'll look right soon | 22:40 |
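The subway map is drawn from each item's nearest non-failing item (NNFI) ahead of it, which is the change it is tested on top of. If that pass only runs inside the window, items beyond it keep stale parents and the map is wrong. A minimal sketch of the computation over the whole queue, assuming items are plain dicts with a hypothetical `failing` flag; it is a single linear pass, consistent with "it's fast".

```python
def nearest_non_failing(items):
    """For each item (head first), return the index of the nearest
    non-failing item ahead of it, or None to test against the branch tip."""
    parents = []
    nnfi = None
    for index, item in enumerate(items):
        parents.append(nnfi)
        if not item["failing"]:
            nnfi = index  # later items build on this one
    return parents
```

For a queue A (passing), B (failing), C (passing), both B and C are parented on A, so C's test no longer includes the failing B.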
sdague | coolio | 22:40 |
*** ivar-lazzaro has joined #openstack-infra | 22:40 | |
*** dcramer__ has quit IRC | 22:42 | |
*** esker has joined #openstack-infra | 22:43 | |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Allow workers to send back metadata https://review.openstack.org/66173 | 22:45 |
*** sandywalsh has quit IRC | 22:45 | |
jeblair | clarkb: okay, 05 api key for gerrig matches now; updated docs and tarballs | 22:47 |
jeblair | clarkb: i'll fix 06 and 7, you want to try jjb on 5? | 22:47 |
*** dangers is now known as dangers_away | 22:47 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/zuul: Report queue window in status JSON. https://review.openstack.org/68792 | 22:48 |
clarkb | jeblair: yup running JJB again | 22:48 |
clarkb | jeblair: ^ that change should allow us to make pretty status pages with window info | 22:49 |
clarkb | JJB is applying jobs now | 22:49 |
*** DennyZhang has quit IRC | 22:50 | |
clarkb | jeblair: was the tarball still stale? | 22:51 |
jeblair | clarkb: i think it was always missing secret.key, so i updated it | 22:52 |
jeblair | clarkb: 6 and 7 should be gtg | 22:53 |
openstackgerrit | Matt Ray proposed a change to openstack-infra/config: Create new Chef cookbook-openstack-integration-test for Tempest support. https://review.openstack.org/68791 | 22:53 |
clarkb | jeblair: ok, running JJB there as well | 22:53 |
*** jasondotstar has quit IRC | 22:55 | |
jeblair | clarkb: i think it would be more useful to have the status.json include a field for each change indicating whether it was active or not so we could put a different color dot beside it. | 22:55 |
*** sarob has quit IRC | 22:55 | |
dstufft | fungi: looks like the release thing worked | 22:56 |
jeblair | clarkb: (rather than duplicating the in-window logic in the js) | 22:56 |
dstufft | fungi: thanks again | 22:56 |
clarkb | jeblair: hmm good point | 22:56 |
dstufft | fungi: also, no whl? :[ | 22:56 |
clarkb | jeblair: is that a flag that should be assigned by _processOneItem? | 22:56 |
jeblair | clarkb: (optionally also continue to do what you did) | 22:56 |
jeblair | clarkb: seems easiest to me | 22:56 |
*** markmcclain has quit IRC | 22:57 | |
clarkb | I don't think it will be too bad to toggle a flag in _processOneItem | 22:58 |
*** esker has quit IRC | 22:59 | |
*** salv-orlando_ has joined #openstack-infra | 23:00 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Add require-approval to Gerrit trigger https://review.openstack.org/68516 | 23:00 |
fungi | dstufft: we don't have wheel building and uploading automated yet. bug mordred ;) | 23:01 |
dstufft | fungi: oh man | 23:01 |
dstufft | mordred: is there something I can do to help with ^^ | 23:01 |
dstufft | I wants some whl :] | 23:01 |
lifeless | fungi: so fedora images aren't building I presume; where can I see the logs for nodepool w.r.t. that? | 23:01 |
*** sarob has joined #openstack-infra | 23:02 | |
fungi | lifeless: they're local to the nodepool server. i'll have a look | 23:02 |
jeblair | clarkb: fungi: https://review.openstack.org/#/c/68516/ is a rebase with conflicts; i've re-reviewed but wouldn't mind a once over before i aprv again | 23:02 |
jeblair | lifeless: i'd love to fix that; we could probably put them in a different dir and serve them via apache for starters | 23:02 |
*** sarob has quit IRC | 23:03 | |
jeblair | lifeless: (and then there's certainly better things we could do after that) | 23:03 |
*** salv-orlando has quit IRC | 23:03 | |
*** salv-orlando_ is now known as salv-orlando | 23:03 | |
clarkb | jeblair: ok will look in a minute | 23:03 |
*** gsamfira has quit IRC | 23:04 | |
*** thuc has quit IRC | 23:04 | |
*** thuc has joined #openstack-infra | 23:04 | |
*** changbl has quit IRC | 23:05 | |
clarkb | jeblair: was the conflict around the username filters? | 23:05 |
jog0 | is it possible to get russellb's patch https://review.openstack.org/#/c/68727/ promoted | 23:06 |
torgomatic | I've got a change (https://review.openstack.org/67920) that's failing the Swift functional tests, but I don't see any proxy logs or anything here: http://logs.openstack.org/20/67920/3/check/check-swift-dsvm-functional/92f11ae/ | 23:06 |
sdague | yeh +1 to promote on 68727 | 23:06 |
torgomatic | how might I go about finding out what's failed? | 23:06 |
jog0 | it's the top gate reset right now | 23:06 |
sdague | where's the other unit test fix? | 23:07 |
jog0 | sdague: https://review.openstack.org/#/c/68768/ | 23:07 |
clarkb | torgomatic: unfortunately that is all you will get from that run because the job timed out and was forcefully killed before the logs could be grabbed | 23:07 |
sdague | yeh, so on next gate reset 68727, and 68768 should go up | 23:07 |
sdague | and I might have 2 more as well, the nova v3 xml removes from tempest, which will give us back time in all the runs | 23:08 |
torgomatic | clarkb: okay, so I should focus on making the functional tests better about timing out, and then I'll get logs? | 23:08 |
jog0 | sdague: I just +Aed the second patch | 23:08 |
jeblair | sdague: i think we may be getting close to a zuul restart. so maybe we can roll all 4 into that | 23:09 |
*** miqui has left #openstack-infra | 23:09 | |
sdague | jeblair: sure, let me go get the +2s on the tempest patches | 23:09 |
*** thuc has quit IRC | 23:09 | |
jog0 | russellb: ^^ | 23:09 |
clarkb | torgomatic: yeah, there is a mechanism to set a timeout in devstack gate which will do a soft timeout before the hard jenkins timeout. the tempest jobs use it. I bet we just need to set the same timeout variable for devstack gate in the swift functional tests | 23:09 |
clarkb | torgomatic: or, if that is already set figure out why the soft timeout failed | 23:09 |
torgomatic | clarkb: thanks; I'll go investigate that. | 23:10 |
openstackgerrit | A change was merged to openstack-infra/zuul: Add require-approval to Gerrit trigger https://review.openstack.org/68516 | 23:11 |
clarkb | new jenkinses are all JJB'd | 23:11 |
anteaya | woot | 23:12 |
fungi | clarkb: so we're ready to approve https://review.openstack.org/68759 ? | 23:12 |
*** burt1 has quit IRC | 23:12 | |
jeblair | hrm, that change adds the new servers but does not redistribute the min-ready across old ones | 23:12 |
jeblair | so we'll end up with more ready nodes than currently | 23:12 |
*** sarob has joined #openstack-infra | 23:12 | |
jeblair | another 15 bare-precise and another 30 devstack-precise | 23:13 |
jeblair | (and another 15 devstack-precise-check from hp region b) | 23:13 |
*** afazekas has quit IRC | 23:13 | |
fungi | oh, good point | 23:14 |
fungi | so we might want to roughly halve them all | 23:14 |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Allow workers to send back metadata https://review.openstack.org/66173 | 23:14 |
*** dims has quit IRC | 23:15 | |
jeblair | fungi: yeah. want to revise it? i think basically the goal should be to have enough ready nodes to start several nova jobs at once. | 23:15 |
fungi | will do | 23:15 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Wait only 5 minutes for ES to have the data https://review.openstack.org/68799 | 23:16 |
*** jhesketh__ has quit IRC | 23:17 | |
sdague | jog0: really? | 23:17 |
sdague | I thought 13 was fair | 23:17 |
*** sarob has quit IRC | 23:17 | |
sdague | 5 seems really pushing it | 23:17 |
jog0 | sdague: so I can't prove it but I think we are losing gerrit events | 23:17 |
jeblair | clarkb: think you'll have the status change ready soon? if so we can wait for it, otherwise we're about ready to go | 23:18 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/zuul: Report queue window in status JSON. https://review.openstack.org/68792 | 23:18 |
jog0 | and I am *guessing* its related to the 13 minute timeout | 23:18 |
jeblair | ha! | 23:18 |
sdague | jog0: why? | 23:18 |
clarkb | jeblair: :) | 23:18 |
sdague | so the inner gerrit loop actually means we get them all on our side | 23:18 |
jog0 | sdague: I think I may be wrong about losing events | 23:18 |
jog0 | but when is the last time we got a log after 10 minutes? | 23:18 |
sdague | I think the thing to do is actually see about instrumenting gerritlib to figure out how many unprocessed events are there | 23:18 |
jog0 | FWIW feel free to -2 this patch, see the commit message for the main why | 23:19 |
clarkb | jeblair: I am providing both chunks of data as window length seems potentially useful for other rendering | 23:19 |
sdague | so I agree that we'll back up if ES is losing data | 23:19 |
*** jcoufal has quit IRC | 23:19 | |
sdague | but I think that given the fix for that is coming soon, I'm less concerned on optimizing that | 23:19 |
jeblair | clarkb: yep, sounds good, except i have a -1 for that change. :( | 23:19 |
jog0 | sdague: I only wrote that because I am running e-r bot locally | 23:20 |
jog0 | and waiting 13 minutes is just silly | 23:20 |
jog0 | when I know it won't work | 23:20 |
*** DennyZhang has joined #openstack-infra | 23:20 | |
sdague | hmmm, just saw a weird reset event | 23:20 |
clarkb | jeblair: :( looking | 23:20 |
sdague | so everything just reset | 23:21 |
jog0 | sdague: time to do russellb's patches and xml v3? | 23:21 |
sdague | I don't have those +2s yet, I wonder if mtreinish drove home | 23:21 |
clarkb | jeblair: good point. I have tox running now on that switch | 23:22 |
mordred | dstufft: https://review.openstack.org/#/c/56760/ | 23:22 |
jog0 | sdague: at least nova patches are ready | 23:22 |
jeblair | sdague, jog0: you are still triggering er bot on comments left in gerrit, right? | 23:22 |
jog0 | jeblair: correct | 23:22 |
sdague | yep | 23:22 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Enable jenkins0[5-7] in nodepool https://review.openstack.org/68759 | 23:23 |
jog0 | jeblair: btw sdague and I were hoping to get https://review.openstack.org/#/c/68768/ and https://review.openstack.org/#/c/68727/ promoted | 23:23 |
jog0 | both are nova unit test fixes for gate bugs | 23:23 |
jeblair | sdague, jog0: and this is the only determination that the data are ready right? so you get a comment from jenkins via gerrit and then wait 13(or 5) mins for an e-r query to return data for that change? | 23:24 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/zuul: Report queue window in status JSON. https://review.openstack.org/68792 | 23:24 |
clarkb | and fixed | 23:24 |
jog0 | jeblair: correct | 23:24 |
sdague | jeblair: we get the gerrit event, then we start polling ES for consoles | 23:24 |
jeblair | jog0: i know, i was talking to sdague about that earlier, we may roll that plus the xml changes in with the impending zuul restart | 23:24 |
jog0 | we have several queries (one for console.html and one for other files) | 23:25 |
sdague | jeblair: yeh, it's a poll loop, with a 40s delay between polls | 23:25 |
jog0 | jeblair: ack I got mixed up because there was a gate reset just now | 23:25 |
jeblair | sdague, jog0: so there's a queue of changes for logstash to process. i'm not sure we make any guarantees about how fast it is. we want it to be fast, but many minutes does not seem unreasonable. it occasionally has been hours (which is unreasonable). | 23:26 |
sdague | typically ES is ready sometime between first and fifth poll | 23:26 |
*** DennyZhang has quit IRC | 23:26 | |
sdague | jeblair: our experience is that if it doesn't show up in that 13 minute period, it will never show up | 23:26 |
sdague | at least recently | 23:26 |
jog0 | this is mostly from the missing console.html files | 23:27 |
*** zaro has quit IRC | 23:27 | |
clarkb | yup we tend to not be that slow but there is the scp race | 23:27 |
jog0 | for example we just lost console.html for check-tempest-dsvm-full 66921,5,7183f61 | 23:27 |
sdague | yeh, after we get the console, we have another delay loop to get the rest of the files | 23:27 |
jog0 | where the last 7 digits are the short build_uuid | 23:27 |
clarkb | jog0: yes, there are cases where our scp plugin won't have the console log copied before logstash machinery tries to get it resulting in a 404 | 23:27 |
sdague | we don't assume the console being indexed means we have everything else yet | 23:27 |
clarkb | jenkins04 05 06 07 have the fix in place | 23:27 |
clarkb | but not the other jenkinses | 23:27 |
jeblair | sdague: okay. yeah, i don't think you should optimize for that error, but i would encourage you to allow 20-30 minutes in a real production deployment (maybe make an option for local testing) | 23:28 |
*** dstanek has quit IRC | 23:29 | |
jeblair | sdague jog0: because if that queue gets backed up, it's not good, but it's not worth throwing out e-r results. | 23:29 |
*** hashar has joined #openstack-infra | 23:29 | |
jeblair | sdague jog0: at least, assuming that wait doesn't block other changes. | 23:29 |
*** DennyZhang has joined #openstack-infra | 23:29 | |
jog0 | jeblair: that's the part we aren't very sure about | 23:30 |
jog0 | we *may* be losing gerrit events, but those could be red herrings | 23:30 |
*** dims has joined #openstack-infra | 23:30 | |
jog0 | jeblair: ++ to config option | 23:30 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Enable jenkins0[5-7] in nodepool https://review.openstack.org/68759 | 23:30 |
jog0 | I'll revise my patch to make it a config option | 23:31 |
sdague | jeblair: so we only lose it for the recheck comments | 23:31 |
sdague | and in reality, if the system is just way backed up | 23:31 |
*** zaro has joined #openstack-infra | 23:31 | |
sdague | we're going to cascade up delays over time | 23:32 |
sdague | so end up just running well behind anyway | 23:32 |
sdague | the way it goes lossy is actually kind of reasonable | 23:32 |
clarkb | so one way to deal with cascade delays is keep track of some actual delay since event arrived | 23:33 |
sdague | yeh, that was the instrument gerrit lib idea | 23:33 |
jog0 | sdague: agreed it's not idea but not the biggest issue | 23:33 |
clarkb | that way if you spend 13 minutes waiting for event 1 when you get to event 2 that came in 2 minutes after event1 you only delay for two minutes | 23:33 |
jog0 | ideal* | 23:33 |
jeblair | sdague: i don't think you should wait on one event at a time; wait for all the ones you have received up until a generous timeout. | 23:33 |
clarkb | eventually you should reach a point where you can process the majority of the queue without delay | 23:33 |
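A minimal sketch of clarkb's suggestion, under stated assumptions: each event's deadline is anchored to when the Gerrit event arrived rather than to when the bot started waiting on it, so time already spent on earlier events in the FIFO counts against later ones instead of stacking up. The 13-minute budget and 40-second poll interval are the numbers mentioned in this discussion. `Event`, `wait_for_index` and the `is_indexed` callback (standing in for "ask Elasticsearch whether this change's logs are there yet") are hypothetical names for illustration, not elastic-recheck's actual code.

```python
import time
from dataclasses import dataclass
from typing import Callable

TIMEOUT = 13 * 60   # total budget per event, in seconds (13 minutes, as discussed)
POLL_INTERVAL = 40  # delay between Elasticsearch polls, in seconds


@dataclass
class Event:
    change: str
    arrived: float  # clock() value when the Gerrit event was received


def wait_for_index(event: Event,
                   is_indexed: Callable[[str], bool],  # hypothetical ES check
                   timeout: float = TIMEOUT,
                   interval: float = POLL_INTERVAL,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the change's logs are indexed or the deadline passes.

    The deadline is measured from event.arrived, so an event that sat in
    the queue behind a slow one only waits for whatever budget it has left.
    """
    deadline = event.arrived + timeout
    while True:
        if is_indexed(event.change):
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # give up: logs were probably lost, or we are too far behind
        sleep(min(interval, remaining))  # never sleep past the deadline
```

With this scheme, if event 1 burns its full 13 minutes and event 2 arrived 2 minutes after it, event 2 gets only about 2 more minutes of polling, which is the behaviour clarkb describes. jeblair's later idea of subscribing to log-pusher events would fit the same function by setting `arrived` to the push time and using a much shorter `timeout`.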
sdague | jeblair: well, that's a lot more complex | 23:34 |
sdague | because right now we're just processing them as a fifo | 23:34 |
*** dizquierdo has joined #openstack-infra | 23:34 | |
sdague | throwing away everything we don't care about, and passing up what we do care about, or timing out | 23:34 |
jeblair | sdague: yeah, but the thing is that you can't tell whether you haven't received the data you want because we're backed up or because we lost it; you want to do opposite things in those two cases (wait a long time / don't wait at all) | 23:35 |
sdague | yeh, sure | 23:35 |
sdague | but that's only one part of the system | 23:35 |
jeblair | sdague: so finding a single number that works for both is going to be very hard. :) | 23:35 |
*** jasondotstar has joined #openstack-infra | 23:35 | |
sdague | yep, agreed, but I think the bot is really in a good enough to be useful state at this point. And in experience, as long as we aren't just dropping logs all the time, it works pretty well | 23:36 |
*** krtaylor has joined #openstack-infra | 23:36 | |
* ttx waves | 23:36 | |
jeblair | sdague: alternately, if you subscribed to events from the log pusher, you could have a much closer expectation of when the results should show up | 23:36 |
sdague | enough so that *it* was the reason we realized we were dropping logs | 23:36 |
clarkb | ttx: ohai | 23:36 |
jog0 | yeah so it turns out there are a few parts of e-r that are useful and the bot is a small part of it | 23:36 |
jeblair | sdague: and then you could say "if they haven't shown up in es within 2 mins after logstash pushed them, drop it" | 23:36 |
sdague | jeblair: sure, all good ideas. I think we have bigger fish to fry | 23:36 |
* jog0 waves to ttx | 23:37 | |
sdague | but patches are welcomed :) | 23:37 |
*** pballand has quit IRC | 23:37 | |
sdague | 68630 is also promotable, though the speed up won't come until the patch behind it | 23:38 |
jeblair | sdague: ok. at some point the log pushers are going to get 20 minutes backed up for 2 hours and i'm not going to be inclined to burst them out; just letting you know the operational constraints of the system. :) | 23:38 |
sdague | jeblair: that's fine | 23:38 |
sdague | and in that case we'll probably just lose one or two comments to gerrit | 23:38 |
sdague | then be hanging out in the lag | 23:39 |
jeblair | sdague: okiedokie. ack on 68727 68768 68630. | 23:39 |
sdague | until it catches up | 23:39 |
sdague | 68673 will be the speed up patch, but I had to fix a couple of things on it. So if it goes +A at any point, that should promote as well. | 23:40 |
jeblair | sdague: what was weird about the gate reset? | 23:41 |
sdague | there was no failed job at the top | 23:41 |
clarkb | it may have been consumed | 23:42 |
sdague | just the top job disappeared and everything below reset | 23:42 |
jeblair | sdague: was it waiting on one outstanding job? | 23:42 |
sdague | it was waiting on 2 last I looked | 23:42 |
sdague | I guess it could have raced | 23:42 |
sdague | gotten them both and popped | 23:42 |
russellb | thanks for including my 2 patches | 23:42 |
*** whoops has quit IRC | 23:43 | |
russellb | i'm just sorry it took me this long to step up and help more | 23:43 |
sdague | russellb: nothing to be sorry about, you have been a machine all week :) | 23:44 |
*** dosaboy has joined #openstack-infra | 23:44 | |
russellb | been trying to focus on it as much as i can past 2 weeks or so in one way or another | 23:44 |
russellb | and now i reward myself with some wine and strip-ctf.com | 23:46 |
dosaboy | hey guys, i'm getting the following gate failure a fair bit atm - http://logs.openstack.org/00/51900/17/check/check-tempest-dsvm-full/1405291/console.html | 23:46 |
dosaboy | does it ring any bells to anyone? | 23:46 |
* jog0 is afraid to look at strip-ctf.com in a public setting | 23:46 | |
dosaboy | i can't work out what is causing it | 23:46 |
russellb | lol | 23:46 |
russellb | jog0: wrong URL .... | 23:46 |
russellb | stripe-ctf.com | 23:46 |
russellb | bad typo. | 23:46 |
*** dizquierdo has quit IRC | 23:47 | |
jog0 | russellb: heh | 23:47 |
openstackgerrit | Eli Klein proposed a change to openstack-infra/jenkins-job-builder: Added rbenv wrapper https://review.openstack.org/65352 | 23:47 |
clarkb | dosaboy: yeah I think that is a problem fungi hunted down. the small root partitions our cloud providers give us are too small for all of the log files we want to collect | 23:47 |
jog0 | sdague: so the missing events may be that my local bot isn't as backed up as the infra e-r bot so not missing events just very backed up | 23:47 |
jog0 | but not sure | 23:47 |
clarkb | fungi: jeblair ^ that look right? | 23:47 |
*** morganfainberg|z is now known as morganfainberg | 23:48 | |
jog0 | sdague: actually this is easy to show | 23:48 |
openstackgerrit | A change was merged to openstack-infra/zuul: Report queue window in status JSON. https://review.openstack.org/68792 | 23:48 |
jog0 | sdague: https://review.openstack.org/#/c/62118/ | 23:48 |
dosaboy | clarkb: ok thanks, is there a bug lodged for that per chance? | 23:48 |
clarkb | dosaboy: there is, but I don't have the number handy | 23:48 |
jog0 | sdague: e-r commented 30 minutes after jenkins | 23:48 |
dosaboy | clarkb: ok no worries | 23:48 |
jog0 | jeblair: ^ | 23:48 |
fungi | dosaboy: https://launchpad.net/bugs/1268732 | 23:49 |
*** prad has quit IRC | 23:49 | |
dosaboy | fungi: cool thanks | 23:49 |
dosaboy | clarkb, fungi: is it worth doing a recheck on this one or shall i sit back a bit? | 23:50 |
fungi | dosaboy: it seems to be somewhat infrequent, so a recheck on it should be fine | 23:50 |
sdague | fungi: can you look at the er_run thing again? That uncategorized page has still not shown up | 23:50 |
clarkb | fungi: I wonder if the mysql slow log got bigger | 23:50 |
*** mriedem has joined #openstack-infra | 23:50 | |
dosaboy | fungi: cool, i'll give it a shot | 23:50 |
clarkb | it was already huge but would certainly make those small partitions feel very cramped | 23:50 |
sdague | fungi: actually - http://status.openstack.org/elastic-recheck/ | 23:50 |
sdague | it's top of the er list now | 23:51 |
jeblair | #status alert Zuul is being restarted for an upgrade | 23:51 |
openstackstatus | NOTICE: Zuul is being restarted for an upgrade | 23:51 |
clarkb | fungi: might be worth holding a node and investigating to make sure we don't have a silly regression in filesystem use | 23:51 |
sdague | 149 fails in 24hrs | 23:51 |
*** ChanServ changes topic to "Zuul is being restarted for an upgrade" | 23:51 | |
sdague | with a really steep uptick today | 23:51 |
jeblair | sdague: last chance for that last change; want to try to round up some +2s? | 23:51 |
* anteaya won't click any url russellb offers | 23:51 | |
russellb | :( | 23:51 |
clarkb | sdague: what url is the unclassified data being reported to? | 23:52 |
anteaya | unless it is a gerrit one | 23:52 |
sdague | jeblair: trying, but I think I'm failing | 23:52 |
sdague | so just go for it | 23:52 |
*** markmcclain has joined #openstack-infra | 23:52 | |
sdague | we'll pick it up tomorrow | 23:52 |
anteaya | was reading the backscroll | 23:52 |
sdague | clarkb: it should be http://status.openstack.org/elastic-recheck/data/unclassified.html | 23:53 |
fungi | sdague: oh, yeah, dsvm-full seems to have a major uptick along with the rise in that offlined slave bug | 23:53 |
sdague | we are writing it into the state dir | 23:53 |
sdague | because we are generating the html | 23:53 |
*** oubiwann-fn has quit IRC | 23:54 | |
openstackgerrit | Samuel Merritt proposed a change to openstack-infra/config: Add soft timeout to Swift functional tests https://review.openstack.org/68802 | 23:54 |
*** jhesketh_ has joined #openstack-infra | 23:55 | |
clarkb | torgomatic: you can drop the jenkins timeout on line 7 to ~70 minutes instead of 125 minutes | 23:56 |
clarkb | torgomatic: so that it is closer to the total runtime of the functests | 23:56 |
jeblair | zuul restarted; queues restored | 23:56 |
clarkb | torgomatic: actually nevermind, we may need to do some tuning of that since devstack runs first and takes forever | 23:56 |
torgomatic | clarkb: ok, will do... also I have a test failure to resolve (gate-config-layout?) so it might take me a minute | 23:56 |
clarkb | torgomatic: I left a comment | 23:58 |
clarkb | torgomatic: that shows how to fix the fail | 23:58 |
torgomatic | clarkb: thanks | 23:58 |
*** jdurgin has joined #openstack-infra | 23:58 | |
*** dstanek has joined #openstack-infra | 23:58 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!