*** eumel8 has quit IRC | 00:52 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: [wip] Add Fedora 29 testing https://review.openstack.org/618671 | 01:14 |
*** ianychoi has quit IRC | 02:57 | |
*** ianychoi has joined #zuul | 02:57 | |
*** bhavikdbavishi has joined #zuul | 03:33 | |
*** bhavikdbavishi1 has joined #zuul | 03:33 | |
*** bhavikdbavishi has quit IRC | 03:37 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:37 | |
*** marwil has joined #zuul | 04:13 | |
*** caphrim007_ has quit IRC | 04:28 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: [wip] Add Fedora 29 testing https://review.openstack.org/618671 | 04:43 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing https://review.openstack.org/618671 | 04:44 |
*** dkehn has quit IRC | 04:50 | |
*** dkehn has joined #zuul | 04:55 | |
*** pcaruana has quit IRC | 05:44 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing https://review.openstack.org/618671 | 06:11 |
*** chandankumar has joined #zuul | 06:23 | |
*** chandankumar is now known as chkumar|rover | 06:25 | |
*** chkumar|rover is now known as chkumar|ruck | 06:25 | |
*** quiquell|off is now known as quiquell | 06:41 | |
*** marwil has quit IRC | 06:43 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing https://review.openstack.org/618671 | 06:49 |
*** pcaruana has joined #zuul | 07:08 | |
*** goern has quit IRC | 07:17 | |
*** gtema has joined #zuul | 07:45 | |
evrardjp | pabelanger: I didn't know about cachable: true. Thanks :) | 07:53 |
evrardjp | pabelanger: I used another mechanism though now :) | 07:54 |
*** quiquell is now known as quiquell|brb | 07:56 | |
*** quiquell|brb is now known as quiquell | 08:17 | |
*** jpena|off is now known as jpena | 08:17 | |
*** hashar has joined #zuul | 08:37 | |
openstackgerrit | Brendan proposed openstack-infra/zuul master: Add support for Gerrit v2.16's change URL schema https://review.openstack.org/619533 | 08:58 |
*** nilashishc has joined #zuul | 09:07 | |
*** bhavikdbavishi has quit IRC | 09:17 | |
*** sshnaidm is now known as sshnaidm|off | 09:52 | |
openstackgerrit | Brendan proposed openstack-infra/zuul master: Add support for Gerrit v2.16's change URL schema https://review.openstack.org/619533 | 09:55 |
*** bjackman has joined #zuul | 10:18 | |
*** chkumar|ruck has quit IRC | 10:22 | |
*** chandan_kumar has joined #zuul | 10:35 | |
*** chandan_kumar is now known as chkumar|ruck | 10:35 | |
*** electrofelix has joined #zuul | 10:38 | |
*** bjackman has quit IRC | 10:47 | |
*** dkehn has quit IRC | 10:57 | |
*** gtema has quit IRC | 11:40 | |
*** EmilienM is now known as EvilienM | 12:14 | |
*** gtema has joined #zuul | 12:15 | |
*** jpena is now known as jpena|lunch | 12:20 | |
*** pcaruana has quit IRC | 12:26 | |
*** bhavikdbavishi has joined #zuul | 12:41 | |
*** jpena|lunch is now known as jpena | 13:15 | |
*** goern has joined #zuul | 13:27 | |
goern | hey all, how can I investigate why jobs are receiving retry_limit failures at an increased rate?! | 13:27 |
*** dkehn has joined #zuul | 13:30 | |
AJaeger | goern: do you have a log file? | 13:31 |
goern | I think so... which one? executor? | 13:32 |
AJaeger | job log file | 13:32 |
AJaeger | depending on the failure, it might show up in that log... | 13:32 |
goern | na, zuul is pointing to finger:// so I think it was not able to give back the log, right? | 13:32 |
AJaeger | goern: yeah, then you don't have logs | 13:32 |
AJaeger | goern: in that case I can't help further. | 13:33 |
goern | ack | 13:33 |
AJaeger | you might need to wait for next week, with Thanksgiving weekend and Friday afternoon in Europe, not many are around ... | 13:33 |
*** bhavikdbavishi has quit IRC | 13:34 | |
goern | right | 13:34 |
AJaeger | but stay online, if somebody with more knowledge reads the backscroll, they will answer | 13:34 |
*** quiquell is now known as quiquell|lunch | 13:39 | |
tobiash | goern: your own zuul or openstack zuul? | 13:49 |
tobiash | goern: in case this is reproducible, look at the live log, this can reveal problems with log upload | 13:50 |
tobiash | goern: in case this doesn't work, you should inspect the executor logs, they normally contain the information you need if the job logs are not available | 13:50 |
goern | my zuul | 13:52 |
tobiash | goern: so retry_limit means that either a pre playbook failed too often, or it could also be caused by connection problems to the nodes | 13:53 |
goern | it's not one specific job on one repo, it's all over the place, 'feels' like it is due to high load on the executor/scheduler ?!?! maybe? | 13:53 |
tobiash | do you have load statistics of the executor? | 13:54 |
tobiash | at least the first place to look is the executor log | 13:54 |
tobiash | scheduler shouldn't have anything to do with retry_limit | 13:54 |
goern | no, I was waiting for that prometheus discussion to crystallize into one solution | 13:55 |
goern | yeah, I set the executor to keep and started investigating the logs in /tmp/..... | 13:55 |
tobiash | you also can just look over the normal executor logs | 13:55 |
tobiash | most of the errors are already there | 13:56 |
goern | ack | 13:56 |
goern | wrt load, I decreased the number of parallel jobs, keeping the load at 2-4 | 13:56 |
tobiash | the executor should deregister itself automatically if load is above 2.5*cores | 13:57 |
goern | yeah, lowered 2.5 to 1 at 8 cores | 13:58 |
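(Editor's note: a minimal sketch of the load check tobiash describes, assuming the threshold is "1-minute load average above load_multiplier * CPU cores"; the helper name is illustrative and this is not Zuul's actual governor code.)

```python
import multiprocessing
import os


def accepting_new_jobs(load_multiplier: float = 2.5) -> bool:
    """Sketch of the governor described above: stop accepting new work once
    the 1-minute load average exceeds load_multiplier * number of cores."""
    one_minute_load = os.getloadavg()[0]
    cores = multiprocessing.cpu_count()
    return one_minute_load <= load_multiplier * cores


# With the settings mentioned above (multiplier lowered to 1 on 8 cores),
# the executor would stop taking new work once the load average passes 8.
```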
*** quiquell|lunch is now known as quiquell | 14:01 | |
*** nilashishc has quit IRC | 14:06 | |
*** chkumar|ruck has quit IRC | 14:15 | |
*** bjackman has joined #zuul | 14:18 | |
bjackman | Got a Verified -1 on my zuul patch, something TIMED_OUT while running the tests, but the comment doesn't show up in the Gerrit web UI (I only have it by mail). Are there infra issues? | 14:22 |
bjackman | Oh hold on maybe the zuul comment just shows up as a separate table in the Gerrit UI | 14:23 |
gtema | to tell the truth lots of jobs are failing due to some devstack setup issues. Always in different places, so hard to say what is really the reason | 14:24 |
gtema | at least for me in openstacksdk area | 14:24 |
bjackman | gtema, OK cheers. To my inexpert eyes it just looks like the job wasn't getting much CPU time and got killed at t=30 mins. | 14:27 |
gtema | bjackman, no idea. For me it is sometimes "nova-api did not start", sometimes "devstack post-upgrade returned 1", sometimes tempest jobs fail, sometimes timeouts in devstack. And with the same patch it is always different | 14:29 |
AJaeger | gtema: any commonality? All jobs run in the same region? | 14:31 |
*** bjackman has quit IRC | 14:32 | |
gtema | ajaeger: at least 2 last in ovh-bhs1 | 14:32 |
jpena | Can anyone point me to where nodepool calls the NetworkDeleteFloatingips task? I'm trying to debug a strange problem where a nodepool VM's FIP gets removed before the job is finished, and I think that task did it | 14:33 |
gtema | ajaeger: seems to be all failed were in ovh-bhs1 (at least from logs I have recovered). i.e. inap-mtl01 - works ok | 14:36 |
gtema | ajaeger: the more I dig - the more confirmations I find - ovh-bhs1 | 14:38 |
AJaeger | gtema: http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1 shows some fails | 14:43 |
AJaeger | infra-root, can you check OVH, please? ^ | 14:43 |
AJaeger | gtema: better let's discuss on #openstack-infra | 14:43 |
gtema | ajaeger: sure | 14:43 |
fungi | we do seem to be rather steadily deleting a lot of nodes there, which is probably related to the error node launch attempts count too | 14:45 |
fungi | ahh, no that's gra1 not bhs1 | 14:49 |
*** quiquell is now known as quiquell|off | 14:55 | |
*** bhavikdbavishi has joined #zuul | 15:26 | |
gtema | question about the API/UI. I am often following the status of a single patch or project. However, the UI fetches the complete status and filters it in the browser. This results in quite a lot of bandwidth use and load in my browser. What about adding the possibility to filter status results in the API? | 15:34 |
gtema | depending on the time of day it might be 2Mb of data every 5 seconds | 15:35 |
*** bhavikdbavishi has quit IRC | 15:43 | |
*** mattclay has quit IRC | 15:45 | |
*** mattclay has joined #zuul | 15:47 | |
*** pcaruana has joined #zuul | 16:00 | |
*** bhavikdbavishi has joined #zuul | 16:11 | |
*** hashar has quit IRC | 16:33 | |
SpamapS | gtema: I'd love to see us move to GraphQL for that too. | 16:40 |
gtema | spamaps: interesting idea | 16:41 |
gtema | so you mean provide new GraphQL API in zuul. Right? | 16:41 |
SpamapS | Yeah, it's pretty good for optimizations like that. | 16:41 |
SpamapS | But you could do it with a new REST endpoint too. | 16:42 |
gtema | cool. Will think about it and play a bit to come up with a suggestion | 16:42 |
SpamapS | Or it's possible a filter argument to the current status endpoint would handle your case. | 16:43 |
gtema | sure, it would. I wanted to ask the room what people think about such an extension | 16:43 |
gtema | but looks like most cores are away :D | 16:44 |
dmsimard | I got pulled into a weird issue with nodepool... in summary, it /looks/ like nodepool has failed to create some virtual machines in a way that it can't recover from ? | 16:46 |
dmsimard | some logs/output/trace http://paste.openstack.org/raw/735977/ | 16:46 |
dmsimard | you'll see that the first warning has pool: None, which I guess explains "Cannot find provider pool for node" | 16:47 |
fungi | SpamapS: gtema: pretty sure you can already request a specific change's status from the existing zuul api | 16:55 |
gtema | fungi: yes, but the UI is not currently capable of handling that | 16:56 |
fungi | right, we were looking at reintroducing the gerrit webui overlay to display change test status there | 16:56 |
fungi | and that api method was a prerequisite | 16:57 |
gtema | the filter in the UI splits values on ",", but the API only responds for a change with a patchset | 16:57 |
fungi | point being, i don't see where waving a magic graphql wand is relevant | 16:57 |
gtema | it can be a POC to implement different filters in the API | 16:58 |
pabelanger | dmsimard: do you have access to the full log? given that rdo-cloud-triplo is also doing OVB, we need to be sure that something didn't do anything behind the back of nodepool | 16:59 |
pabelanger | dmsimard: you also cannot delete a locked node, so the exception is to be expected | 16:59 |
dmsimard | pabelanger: yeah it's a little bit messy and they're about to upgrade OVS on rdo cloud so I'll check next week .. | 17:01 |
*** bjackman has joined #zuul | 17:05 | |
*** bjackman has quit IRC | 17:12 | |
*** gtema has quit IRC | 17:13 | |
fungi | SpamapS: gtema: i've just confirmed, e.g. http://zuul.openstack.org/api/status/change/619625,2 works on our zuul deployment anyway | 17:15 |
fungi | so anyway, no need for a new rest endpoint i don't think? | 17:16 |
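(Editor's note: a minimal sketch of polling that per-change endpoint instead of the full status blob, assuming the requests library and the deployment URL from fungi's example; the function name is illustrative.)

```python
import requests


def change_status(base_url, change, patchset):
    """Fetch the status of a single change rather than the whole status dump."""
    url = f"{base_url}/api/status/change/{change},{patchset}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


# Example against the deployment linked above:
#   change_status("http://zuul.openstack.org", "619625", "2")
```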
*** jpena is now known as jpena|off | 17:33 | |
*** cmurphy has quit IRC | 17:38 | |
*** cmurphy has joined #zuul | 17:40 | |
*** robcresswell has joined #zuul | 17:56 | |
SpamapS | fungi: agreed, we just have to make the UI use it. | 18:59 |
SpamapS | and yeah, the API should maybe accept a change w/o a patchset | 19:00 |
*** electrofelix has quit IRC | 19:06 | |
fungi | agreed | 19:09 |
*** bhavikdbavishi has quit IRC | 19:12 | |
*** rfolco has quit IRC | 19:40 | |
*** robcresswell has quit IRC | 20:25 | |
*** nhicher has joined #zuul | 20:31 | |
*** pcaruana has quit IRC | 21:46 | |
SpamapS | Hrm | 23:37 |
SpamapS | nodepool keeps doing NODE_FAILURE's because AWS says no more nodes. Need to get quota management in. | 23:37 |
SpamapS | (My quota with AWS is 10, max-servers is 5, but sometimes I haven't quite terminated 5 and it tries to do 6 more and ... doh). | 23:37 |
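(Editor's note: a hedged sketch of the kind of pre-launch check quota management would add, assuming boto3 and treating instances that are not yet fully terminated as still consuming quota; this is illustrative, not nodepool's actual AWS driver code.)

```python
import boto3


def can_launch(region, max_servers):
    """Only allow a new launch while instances that still count against the
    quota (pending, running, or not yet fully terminated) stay under the cap."""
    ec2 = boto3.client("ec2", region_name=region)
    in_use = 0
    paginator = ec2.get_paginator("describe_instances")
    filters = [{"Name": "instance-state-name",
                "Values": ["pending", "running", "shutting-down", "stopping"]}]
    for page in paginator.paginate(Filters=filters):
        for reservation in page["Reservations"]:
            in_use += len(reservation["Instances"])
    return in_use < max_servers


# With the numbers above (max-servers 5, AWS quota 10), this would block the
# sixth launch until the not-quite-terminated nodes are actually gone.
```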