*** eumel8 has quit IRC | 00:52 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: [wip] Add Fedora 29 testing https://review.openstack.org/618671 | 01:14 |
*** ianychoi has quit IRC | 02:57 | |
*** ianychoi has joined #zuul | 02:57 | |
*** bhavikdbavishi has joined #zuul | 03:33 | |
*** bhavikdbavishi1 has joined #zuul | 03:33 | |
*** bhavikdbavishi has quit IRC | 03:37 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:37 | |
*** marwil has joined #zuul | 04:13 | |
*** caphrim007_ has quit IRC | 04:28 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: [wip] Add Fedora 29 testing https://review.openstack.org/618671 | 04:43 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing https://review.openstack.org/618671 | 04:44 |
*** dkehn has quit IRC | 04:50 | |
*** dkehn has joined #zuul | 04:55 | |
*** pcaruana has quit IRC | 05:44 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing https://review.openstack.org/618671 | 06:11 |
*** chandankumar has joined #zuul | 06:23 | |
*** chandankumar is now known as chkumar|rover | 06:25 | |
*** chkumar|rover is now known as chkumar|ruck | 06:25 | |
*** quiquell|off is now known as quiquell | 06:41 | |
*** marwil has quit IRC | 06:43 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing https://review.openstack.org/618671 | 06:49 |
*** pcaruana has joined #zuul | 07:08 | |
*** goern has quit IRC | 07:17 | |
*** gtema has joined #zuul | 07:45 | |
evrardjp | pabelanger: I didn't know about cachable: true. Thanks :) | 07:53 |
evrardjp | pabelanger: I used another mechanism though now :) | 07:54 |
*** quiquell is now known as quiquell|brb | 07:56 | |
*** quiquell|brb is now known as quiquell | 08:17 | |
*** jpena|off is now known as jpena | 08:17 | |
*** hashar has joined #zuul | 08:37 | |
openstackgerrit | Brendan proposed openstack-infra/zuul master: Add support for Gerrit v2.16's change URL schema https://review.openstack.org/619533 | 08:58 |
*** nilashishc has joined #zuul | 09:07 | |
*** bhavikdbavishi has quit IRC | 09:17 | |
*** sshnaidm is now known as sshnaidm|off | 09:52 | |
openstackgerrit | Brendan proposed openstack-infra/zuul master: Add support for Gerrit v2.16's change URL schema https://review.openstack.org/619533 | 09:55 |
*** bjackman has joined #zuul | 10:18 | |
*** chkumar|ruck has quit IRC | 10:22 | |
*** chandan_kumar has joined #zuul | 10:35 | |
*** chandan_kumar is now known as chkumar|ruck | 10:35 | |
*** electrofelix has joined #zuul | 10:38 | |
*** bjackman has quit IRC | 10:47 | |
*** dkehn has quit IRC | 10:57 | |
*** gtema has quit IRC | 11:40 | |
*** EmilienM is now known as EvilienM | 12:14 | |
*** gtema has joined #zuul | 12:15 | |
*** jpena is now known as jpena|lunch | 12:20 | |
*** pcaruana has quit IRC | 12:26 | |
*** bhavikdbavishi has joined #zuul | 12:41 | |
*** jpena|lunch is now known as jpena | 13:15 | |
*** goern has joined #zuul | 13:27 | |
goern | hey all, how can I investigate why jobs are receiving retry_limit failures at an increased rate?! | 13:27 |
*** dkehn has joined #zuul | 13:30 | |
AJaeger | goern: do you have a log file? | 13:31 |
goern | I think so... which one? executor? | 13:32 |
AJaeger | job log file | 13:32 |
AJaeger | depending on the failure, it might show up in that log... | 13:32 |
goern | na, zuul is pointing to finger:// so I think it was not able to give back the log, right? | 13:32 |
AJaeger | goern: yeah, then you don't have logs | 13:32 |
AJaeger | goern: in that case I can't help further. | 13:33 |
goern | ack | 13:33 |
AJaeger | you might need to wait for next week, with Thanksgiving weekend and Friday afternoon in Europe, not many are around ... | 13:33 |
*** bhavikdbavishi has quit IRC | 13:34 | |
goern | right | 13:34 |
AJaeger | but stay online, if somebody with more knowledge reads the backscroll, they will answer | 13:34 |
*** quiquell is now known as quiquell|lunch | 13:39 | |
tobiash | goern: your own zuul or openstack zuul? | 13:49 |
tobiash | goern: in case this is reproducible, look at the live log, this can reveal problems with log upload | 13:50 |
tobiash | goern: in case this doesn't work, you should inspect the executor logs, they normally contain the information you need if the job logs are not available | 13:50 |
goern | my zuul | 13:52 |
tobiash | goern: so retry_limit means that either a pre playbook failed too often, or it could also be caused by connection problems to the nodes | 13:53 |
goern | it's not one specific job on one repo, it's all over the place, 'feels' like it is due to high load on the executor/scheduler ?!?! maybe? | 13:53 |
tobiash | do you have load statistics of the executor? | 13:54 |
tobiash | at least the first place to look is the executor log | 13:54 |
tobiash | scheduler shouldn't have anything to do with retry_limit | 13:54 |
goern | no, I was waiting for that prometheus discussion to crystallize into one solution | 13:55 |
goern | yeah, I set the executor to keep and started investigating the logs in /tmp/..... | 13:55 |
tobiash | you also can just look over the normal executor logs | 13:55 |
tobiash | most of the errors are already there | 13:56 |
goern | ack | 13:56 |
goern | wrt load, I decreased the number of parallel jobs, keeping the load at 2-4 | 13:56 |
tobiash | the executor should deregister itself automatically if load is above 2.5*cores | 13:57 |
goern | yeah, lowered 2.5 to 1 at 8 cores | 13:58 |
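(Editor's note: a minimal sketch of the load check tobiash describes, assuming the threshold is "1-minute load average above load_multiplier * CPU cores"; the helper name is illustrative and this is not Zuul's actual governor code.)

```python
import multiprocessing
import os


def accepting_new_jobs(load_multiplier: float = 2.5) -> bool:
    """Sketch of the governor described above: stop accepting new work once
    the 1-minute load average exceeds load_multiplier * number of cores."""
    one_minute_load = os.getloadavg()[0]
    cores = multiprocessing.cpu_count()
    return one_minute_load <= load_multiplier * cores


# With the settings mentioned above (multiplier lowered to 1 on 8 cores),
# the executor would stop taking new work once the load average passes 8.
```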
*** quiquell|lunch is now known as quiquell | 14:01 | |
*** nilashishc has quit IRC | 14:06 | |
*** chkumar|ruck has quit IRC | 14:15 | |
*** bjackman has joined #zuul | 14:18 | |
bjackman | Got a Verified -1 on my zuul patch, something TIMED_OUT while running the tests, but the comment doesn't show up in the Gerrit web UI (I only have it by mail). Are there infra issues? | 14:22 |
bjackman | Oh hold on maybe the zuul comment just shows up as a separate table in the Gerrit UI | 14:23 |
gtema | to tell the truth lots of jobs are failing due to some devstack setup issues. Always in different places, so hard to say what is really the reason | 14:24 |
gtema | at least for me in openstacksdk area | 14:24 |
bjackman | gtema, OK cheers. To my inexpert eyes it just looks like the job wasn't getting much CPU time and got killed at t=30 mins. | 14:27 |
gtema | bjackman, no idea. For me it is sometimes "nova-api did not start", sometimes "devstack post-upgrade returned 1", sometimes tempest jobs fail, sometimes timeouts in devstack. And with the same patch it is always different | 14:29 |
AJaeger | gtema: any commonality? All jobs run in the same region? | 14:31 |
*** bjackman has quit IRC | 14:32 | |
gtema | ajaeger: at least 2 last in ovh-bhs1 | 14:32 |
jpena | Can anyone point me to where nodepool calls the NetworkDeleteFloatingips task? I'm trying to debug a strange problem where a nodepool VM's FIP gets removed before the job is finished, and I think that task did it | 14:33 |
gtema | ajaeger: seems to be all failed were in ovh-bhs1 (at least from logs I have recovered). i.e. inap-mtl01 - works ok | 14:36 |
gtema | ajaeger: the more I dig - the more confirmations I find - ovh-bhs1 | 14:38 |
AJaeger | gtema: http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1 shows some fails | 14:43 |
AJaeger | infra-root, can you check OVH, please? ^ | 14:43 |
AJaeger | gtema: better let's discuss on #openstack-infra | 14:43 |
gtema | ajaeger: sure | 14:43 |
fungi | we do seem to be rather steadily deleting a lot of nodes there, which is probably related to the error node launch attempts count too | 14:45 |
fungi | ahh, no that's gra1 not bhs1 | 14:49 |
*** quiquell is now known as quiquell|off | 14:55 | |
*** bhavikdbavishi has joined #zuul | 15:26 | |
gtema | question about the API/UI. I am often following the status of a single patch or project. However, the UI fetches the complete status and filters it in the browser. This results in quite a lot of bandwidth use and load in my browser. What about adding the possibility to filter status results in the API? | 15:34 |
gtema | depending on the time of day it might be 2Mb of data every 5 seconds | 15:35 |
*** bhavikdbavishi has quit IRC | 15:43 | |
*** mattclay has quit IRC | 15:45 | |
*** mattclay has joined #zuul | 15:47 | |
*** pcaruana has joined #zuul | 16:00 | |
*** bhavikdbavishi has joined #zuul | 16:11 | |
*** hashar has quit IRC | 16:33 | |
SpamapS | gtema: I'd love to see us move to GraphQL for that too. | 16:40 |
gtema | spamaps: interesting idea | 16:41 |
gtema | so you mean provide new GraphQL API in zuul. Right? | 16:41 |
SpamapS | Yeah, it's pretty good for optimizations like that. | 16:41 |
SpamapS | But you could do it with a new REST endpoint too. | 16:42 |
gtema | cool. Will think about it and play a bit to come up with a suggestion | 16:42 |
SpamapS | Or it's possible a filter argument to the current status endpoint would handle your case. | 16:43 |
gtema | sure, it would. I wanted to ask the room what people think about such an extension | 16:43 |
gtema | but looks like most cores are away :D | 16:44 |
dmsimard | I got pulled into a weird issue with nodepool... in summary, it /looks/ like nodepool has failed to create some virtual machines in a way that it can't recover from ? | 16:46 |
dmsimard | some logs/output/trace http://paste.openstack.org/raw/735977/ | 16:46 |
dmsimard | you'll see that the first warning has pool: None, which I guess explains "Cannot find provider pool for node" | 16:47 |
fungi | SpamapS: gtema: pretty sure you can already request a specific change's status from the existing zuul api | 16:55 |
gtema | fungi: yes, but the UI is not currently capable of handling that | 16:56 |
fungi | right, we were looking at reintroducing the gerrit webui overlay to display change test status there | 16:56 |
fungi | and that api method was a prerequisite | 16:57 |
gtema | the filter in the UI splits values on ",", but the API only responds for a change with a patchset | 16:57 |
fungi | point being, i don't see where waving a magic graphql wand is relevant | 16:57 |
gtema | it can be a POC to implement different filters in the API | 16:58 |
pabelanger | dmsimard: do you have access to the full log? given that rdo-cloud-triplo is also doing OVB, we need to be sure that something didn't do anything behind the back of nodepool | 16:59 |
pabelanger | dmsimard: you also cannot delete a locked node, so the exception is to be expected | 16:59 |
dmsimard | pabelanger: yeah it's a little bit messy and they're about to upgrade OVS on rdo cloud so I'll check next week .. | 17:01 |
*** bjackman has joined #zuul | 17:05 | |
*** bjackman has quit IRC | 17:12 | |
*** gtema has quit IRC | 17:13 | |
fungi | SpamapS: gtema: i've just confirmed, e.g. http://zuul.openstack.org/api/status/change/619625,2 works on our zuul deployment anyway | 17:15 |
fungi | so anyway, no need for a new rest endpoint i don't think? | 17:16 |
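(Editor's note: a minimal sketch of polling that per-change endpoint instead of the full status blob, assuming the requests library and the deployment URL from fungi's example; the function name is illustrative.)

```python
import requests


def change_status(base_url, change, patchset):
    """Fetch the status of a single change rather than the whole status dump."""
    url = f"{base_url}/api/status/change/{change},{patchset}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


# Example against the deployment linked above:
#   change_status("http://zuul.openstack.org", "619625", "2")
```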
*** jpena is now known as jpena|off | 17:33 | |
*** cmurphy has quit IRC | 17:38 | |
*** cmurphy has joined #zuul | 17:40 | |
*** robcresswell has joined #zuul | 17:56 | |
SpamapS | fungi: agreed, we just have to make the UI use it. | 18:59 |
SpamapS | and yeah, the API should maybe accept a change w/o a patchset | 19:00 |
*** electrofelix has quit IRC | 19:06 | |
fungi | agreed | 19:09 |
*** bhavikdbavishi has quit IRC | 19:12 | |
*** rfolco has quit IRC | 19:40 | |
*** robcresswell has quit IRC | 20:25 | |
*** nhicher has joined #zuul | 20:31 | |
*** pcaruana has quit IRC | 21:46 | |
SpamapS | Hrm | 23:37 |
SpamapS | nodepool keeps doing NODE_FAILURE's because AWS says no more nodes. Need to get quota management in. | 23:37 |
SpamapS | (My quota with AWS is 10, max-servers is 5, but sometimes I haven't quite terminated 5 and it tries to do 6 more and ... doh). | 23:37 |
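(Editor's note: a hedged sketch of the kind of pre-launch check quota management would add, assuming boto3 and treating instances that are not yet fully terminated as still consuming quota; this is illustrative, not nodepool's actual AWS driver code.)

```python
import boto3


def can_launch(region, max_servers):
    """Only allow a new launch while instances that still count against the
    quota (pending, running, or not yet fully terminated) stay under the cap."""
    ec2 = boto3.client("ec2", region_name=region)
    in_use = 0
    paginator = ec2.get_paginator("describe_instances")
    filters = [{"Name": "instance-state-name",
                "Values": ["pending", "running", "shutting-down", "stopping"]}]
    for page in paginator.paginate(Filters=filters):
        for reservation in page["Reservations"]:
            in_use += len(reservation["Instances"])
    return in_use < max_servers


# With the numbers above (max-servers 5, AWS quota 10), this would block the
# sixth launch until the not-quite-terminated nodes are actually gone.
```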