Friday, 2018-11-23

00:52 *** eumel8 has quit IRC
01:14 <openstackgerrit> Ian Wienand proposed openstack-infra/nodepool master: [wip] Add Fedora 29 testing  https://review.openstack.org/618671
02:57 *** ianychoi has quit IRC
02:57 *** ianychoi has joined #zuul
03:33 *** bhavikdbavishi has joined #zuul
03:33 *** bhavikdbavishi1 has joined #zuul
03:37 *** bhavikdbavishi has quit IRC
03:37 *** bhavikdbavishi1 is now known as bhavikdbavishi
04:13 *** marwil has joined #zuul
04:28 *** caphrim007_ has quit IRC
04:43 <openstackgerrit> Ian Wienand proposed openstack-infra/nodepool master: [wip] Add Fedora 29 testing  https://review.openstack.org/618671
04:44 <openstackgerrit> Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing  https://review.openstack.org/618671
04:50 *** dkehn has quit IRC
04:55 *** dkehn has joined #zuul
05:44 *** pcaruana has quit IRC
06:11 <openstackgerrit> Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing  https://review.openstack.org/618671
06:23 *** chandankumar has joined #zuul
06:25 *** chandankumar is now known as chkumar|rover
06:25 *** chkumar|rover is now known as chkumar|ruck
06:41 *** quiquell|off is now known as quiquell
06:43 *** marwil has quit IRC
06:49 <openstackgerrit> Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing  https://review.openstack.org/618671
07:08 *** pcaruana has joined #zuul
07:17 *** goern has quit IRC
07:45 *** gtema has joined #zuul
07:53 <evrardjp> pabelanger: I didn't know about cacheable: true. Thanks :)
07:54 <evrardjp> pabelanger: I used another mechanism now, though :)
07:56 *** quiquell is now known as quiquell|brb
08:17 *** quiquell|brb is now known as quiquell
08:17 *** jpena|off is now known as jpena
08:37 *** hashar has joined #zuul
08:58 <openstackgerrit> Brendan proposed openstack-infra/zuul master: Add support for Gerrit v2.16's change URL schema  https://review.openstack.org/619533
09:07 *** nilashishc has joined #zuul
09:17 *** bhavikdbavishi has quit IRC
09:52 *** sshnaidm is now known as sshnaidm|off
09:55 <openstackgerrit> Brendan proposed openstack-infra/zuul master: Add support for Gerrit v2.16's change URL schema  https://review.openstack.org/619533
10:18 *** bjackman has joined #zuul
10:22 *** chkumar|ruck has quit IRC
10:35 *** chandan_kumar has joined #zuul
10:35 *** chandan_kumar is now known as chkumar|ruck
10:38 *** electrofelix has joined #zuul
10:47 *** bjackman has quit IRC
10:57 *** dkehn has quit IRC
11:40 *** gtema has quit IRC
12:14 *** EmilienM is now known as EvilienM
12:15 *** gtema has joined #zuul
12:20 *** jpena is now known as jpena|lunch
12:26 *** pcaruana has quit IRC
12:41 *** bhavikdbavishi has joined #zuul
13:15 *** jpena|lunch is now known as jpena
13:27 *** goern has joined #zuul
13:27 <goern> hey all, how can I investigate why jobs are receiving retry_limit failures at an increased rate?!
13:30 *** dkehn has joined #zuul
13:31 <AJaeger> goern: do you have a log file?
13:32 <goern> I think so... which one? the executor's?
13:32 <AJaeger> the job log file
13:32 <AJaeger> depending on the failure, it might show up in that log...
13:32 <goern> na, zuul is pointing to finger:// so I think it was not able to give back the log, right?
13:32 <AJaeger> goern: yeah, then you don't have logs
13:33 <AJaeger> goern: in that case I can't help further.
13:33 <goern> ack
13:33 <AJaeger> you might need to wait until next week; with the Thanksgiving weekend and Friday afternoon in Europe, not many are around ...
13:34 *** bhavikdbavishi has quit IRC
13:34 <goern> right
13:34 <AJaeger> but stay online; if somebody with more knowledge reads the backscroll, they will answer
13:39 *** quiquell is now known as quiquell|lunch
13:49 <tobiash> goern: your own zuul or openstack's zuul?
13:50 <tobiash> goern: in case this is reproducible, look at the live log; this can reveal problems with log upload
13:50 <tobiash> goern: in case this doesn't work, you should inspect the executor logs; they normally contain the information you need if the job logs are not available
13:52 <goern> my zuul
13:53 <tobiash> goern: so retry_limit means that either a pre playbook failed too often, or it could also be caused by connection problems to the nodes
13:53 <goern> it's not one specific job on one repo, it's all over the place; 'feels' like it is due to high load on the executor/scheduler ?!?! maybe?
13:54 <tobiash> do you have load statistics of the executor?
13:54 <tobiash> at least the first place to look is the executor log
13:54 <tobiash> the scheduler shouldn't have anything to do with retry_limit
13:55 <goern> no, I was waiting for that prometheus discussion to crystallize into one solution
13:55 <goern> ja, I set the executor to keep and started investigating the logs in /tmp/.....
13:55 <tobiash> you can also just look over the normal executor logs
13:56 <tobiash> most of the errors are already there
13:56 <goern> ack
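
One quick way to follow tobiash's advice is to scan the executor log for builds that ended in a retried result. A minimal sketch, assuming a plain-text executor debug log; the path and exact message wording are deployment-specific assumptions, so adjust the pattern to your output:

    import re
    import sys

    # Scan a Zuul executor log for retry-related results. The result
    # strings below are assumptions; match them to your executor's log.
    pattern = re.compile(r"RETRY_LIMIT|RESULT_UNREACHABLE|RESULT_TIMED_OUT")

    with open(sys.argv[1], errors="replace") as log:
        for line in log:
            if pattern.search(line):
                print(line.rstrip())
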
13:56 <goern> wrt load, I decreased the number of parallel jobs, keeping the load at 2-4
13:57 <tobiash> the executor should deregister itself automatically if the load is above 2.5*cores
13:58 <goern> ja, lowered 2.5 to 1 at 8 cores
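
For reference, the threshold tobiash describes is load_multiplier * CPU cores. A minimal sketch of that check, for illustration only (assumed logic, not Zuul's actual code):

    import multiprocessing
    import os

    # Stop accepting new jobs when the 1-minute load average exceeds
    # load_multiplier * cores (Zuul's default multiplier is 2.5).
    load_multiplier = 1.0                 # goern lowered this from 2.5
    cores = multiprocessing.cpu_count()   # 8 on goern's executor
    load1, _, _ = os.getloadavg()

    if load1 > load_multiplier * cores:
        print(f"load {load1:.1f} > {load_multiplier * cores:.1f}: stop accepting jobs")
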
14:01 *** quiquell|lunch is now known as quiquell
14:06 *** nilashishc has quit IRC
14:15 *** chkumar|ruck has quit IRC
14:18 *** bjackman has joined #zuul
14:22 <bjackman> Got a Verified -1 on my zuul patch; something TIMED_OUT while running the tests, but the comment doesn't show up in the Gerrit web UI (I only have it by mail). Are there infra issues?
14:23 <bjackman> Oh hold on, maybe the zuul comment just shows up as a separate table in the Gerrit UI
14:24 <gtema> to tell the truth, lots of jobs are failing due to some devstack setup issues. Always in different places, so it's hard to say what the reason really is
14:24 <gtema> at least for me in the openstacksdk area
14:27 <bjackman> gtema, OK cheers. To my inexpert eyes it just looks like the job wasn't getting much CPU time and got killed at t=30 mins.
14:29 <gtema> bjackman, no idea. For me it is sometimes "nova-api did not start", sometimes "devstack post-upgrade returned 1", sometimes tempest jobs fail, sometimes timeouts in devstack. And with the same patch it is always different
14:31 <AJaeger> gtema: any commonality? Do all the jobs run in the same region?
14:32 *** bjackman has quit IRC
14:32 <gtema> ajaeger: at least the last 2 were in ovh-bhs1
14:33 <jpena> Can anyone point me to where nodepool calls the NetworkDeleteFloatingips task? I'm trying to debug a strange problem where a nodepool VM's FIP gets removed before the job is finished, and I think that task did it
14:36 <gtema> ajaeger: it seems all the failed ones were in ovh-bhs1 (at least from the logs I have recovered); inap-mtl01, for example, works ok
14:38 <gtema> ajaeger: the more I dig, the more confirmations I find - ovh-bhs1
14:43 <AJaeger> gtema: http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1 shows some fails
14:43 <AJaeger> infra-root, can you check OVH, please? ^
14:43 <AJaeger> gtema: better let's discuss on #openstack-infra
14:43 <gtema> ajaeger: sure
14:45 <fungi> we do seem to be rather steadily deleting a lot of nodes there, which is probably related to the error node launch attempts count too
14:49 <fungi> ahh, no, that's gra1, not bhs1
14:55 *** quiquell is now known as quiquell|off
15:26 *** bhavikdbavishi has joined #zuul
15:34 <gtema> question about the API/UI. I am often following the status of a single patch or project. However, the UI fetches the complete status and filters it in the browser. This results in quite a lot of bandwidth use and load in my browser. What about adding the possibility to filter status results in the API?
15:35 <gtema> depending on the time of day it might be 2 MB of data every 5 seconds
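
Roughly what the web UI does today: download the whole status document and filter locally. A minimal sketch of that client-side filtering, assuming the Zuul v3 status JSON layout (pipelines -> change_queues -> heads -> items); the change number is illustrative:

    import requests

    # Fetch the full status blob (can be ~2 MB) and filter client-side,
    # which is the bandwidth cost gtema is describing.
    status = requests.get("http://zuul.openstack.org/api/status", timeout=10).json()
    wanted = "619533"  # change number to follow; illustrative value

    for pipeline in status["pipelines"]:
        for queue in pipeline["change_queues"]:
            for head in queue["heads"]:
                for item in head:
                    if (item.get("id") or "").startswith(wanted + ","):
                        print(pipeline["name"], item["id"], item.get("url"))
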
15:43 *** bhavikdbavishi has quit IRC
15:45 *** mattclay has quit IRC
15:47 *** mattclay has joined #zuul
16:00 *** pcaruana has joined #zuul
16:11 *** bhavikdbavishi has joined #zuul
16:33 *** hashar has quit IRC
16:40 <SpamapS> gtema: I'd love to see us move to GraphQL for that too.
16:41 <gtema> spamaps: interesting idea
16:41 <gtema> so you mean providing a new GraphQL API in zuul. Right?
16:41 <SpamapS> Yeah, it's pretty good for optimizations like that.
16:42 <SpamapS> But you could do it with a new REST endpoint too.
16:42 <gtema> cool. Will think about it and play a bit to come up with a suggestion
16:43 <SpamapS> Or it's possible a filter argument to the current status endpoint would handle your case.
16:43 <gtema> sure, it would. I wanted to ask the room what people think about such an extension
16:44 <gtema> but it looks like most cores are away :D
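
To make SpamapS's idea concrete: a GraphQL endpoint lets the client request exactly the fields it needs. Zuul has no such endpoint today, and everything below (URL, schema, field names) is invented purely for illustration:

    import requests

    # Hypothetical only: Zuul does not expose a GraphQL API. The schema
    # and field names are invented to show a client asking for just the
    # pipeline and job results of one change.
    query = """
    {
      status(change: "619533") {
        pipeline
        jobs { name result }
      }
    }
    """
    # resp = requests.post("https://zuul.example.org/api/graphql",
    #                      json={"query": query})
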
16:46 <dmsimard> I got pulled into a weird issue with nodepool... in summary, it /looks/ like nodepool has failed to create some virtual machines in a way that it can't recover from?
16:46 <dmsimard> some logs/output/trace: http://paste.openstack.org/raw/735977/
16:47 <dmsimard> you'll see that the first warning has pool: None, which I guess explains "Cannot find provider pool for node"
16:55 <fungi> SpamapS: gtema: pretty sure you can already request a specific change's status from the existing zuul api
16:56 <gtema> fungi: yes, but the UI is not currently capable of handling that
16:56 <fungi> right, we were looking at reintroducing the gerrit webui overlay to display change test status there
16:57 <fungi> and that api method was a prerequisite
16:57 <gtema> the filter in the UI splits values on ",", but the API responds only for a change with a patchset
16:57 <fungi> point being, i don't see where waving a magic graphql wand is relevant
16:58 <gtema> it can be a POC for implementing different filters in the API
16:59 <pabelanger> dmsimard: do you have access to the full log? given that rdo-cloud-tripleo is also doing OVB, we need to be sure that something didn't do anything behind the back of nodepool
16:59 <pabelanger> dmsimard: you also cannot delete a locked node, so the exception is to be expected
17:01 <dmsimard> pabelanger: yeah, it's a little bit messy and they're about to upgrade OVS on rdo cloud, so I'll check next week ...
17:05 *** bjackman has joined #zuul
17:12 *** bjackman has quit IRC
17:13 *** gtema has quit IRC
17:15 <fungi> SpamapS: gtema: i've just confirmed, e.g. http://zuul.openstack.org/api/status/change/619625,2 works on our zuul deployment anyway
17:16 <fungi> so anyway, no need for a new rest endpoint, i don't think?
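
The per-change endpoint fungi confirmed makes the polling loop cheap. A minimal sketch, assuming the route returns a JSON list of queue-item dicts, each with a "jobs" list; the change and patchset are the ones from the log, so swap in your own:

    import requests

    # Poll only one change's status instead of the full ~2 MB blob.
    # Assumption: the endpoint returns a JSON list of queue-item dicts.
    url = "http://zuul.openstack.org/api/status/change/619625,2"

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    for item in resp.json():
        for job in item.get("jobs", []):
            print(job.get("name"), job.get("result"))
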
17:33 *** jpena is now known as jpena|off
17:38 *** cmurphy has quit IRC
17:40 *** cmurphy has joined #zuul
17:56 *** robcresswell has joined #zuul
18:59 <SpamapS> fungi: agreed, we just have to make the UI use it.
19:00 <SpamapS> and yeah, the API should maybe accept a change w/o a patchset
19:06 *** electrofelix has quit IRC
19:09 <fungi> agreed
19:12 *** bhavikdbavishi has quit IRC
19:40 *** rfolco has quit IRC
20:25 *** robcresswell has quit IRC
20:31 *** nhicher has joined #zuul
21:46 *** pcaruana has quit IRC
23:37 <SpamapS> Hrm
23:37 <SpamapS> nodepool keeps doing NODE_FAILUREs because AWS says no more nodes. Need to get quota management in.
23:37 <SpamapS> (My quota with AWS is 10, max-servers is 5, but sometimes I haven't quite terminated 5 and it tries to do 6 more and ... doh).
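
A sketch of the kind of pre-launch check SpamapS is describing, using boto3: count instances in any not-yet-terminated state against the account quota before launching more, so in-flight terminations don't push past the limit. The region is an assumption and the quota value comes from the conversation:

    import boto3

    # Count instances that are not yet fully terminated before launching
    # more, so in-flight terminations don't exceed the account quota.
    QUOTA = 10  # SpamapS's AWS instance limit
    ec2 = boto3.client("ec2", region_name="us-west-2")  # region assumed

    resp = ec2.describe_instances(
        Filters=[{
            "Name": "instance-state-name",
            "Values": ["pending", "running", "shutting-down", "stopping", "stopped"],
        }]
    )
    active = sum(len(r["Instances"]) for r in resp["Reservations"])

    if active >= QUOTA:
        print(f"at quota ({active}/{QUOTA}); defer new launches")
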
