pabelanger | easy +3 for somebody https://review.openstack.org/509833/ | 00:04 |
pabelanger | job layout changes | 00:04 |
pabelanger | jeblair: ^ maybe when you have a moment; it switches to use build-openstack-sphinx-docs | 00:05 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove references to pipelines, queues, and layouts on dequeue https://review.openstack.org/509903 | 00:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig https://review.openstack.org/509912 | 00:37 |
pabelanger | something to look into in the morning | 00:47 |
pabelanger | I see the following in nodepool-launcher debug.log | 00:48 |
pabelanger | 2017-10-05 17:37:38,277 DEBUG nodepool.driver.openstack.OpenStackNodeRequestHandler[nl01.openstack.org-6338-PoolWorker.inap-mtl01-main]: Unlocked node 0000140400 for request 200-0000188337 | 00:48 |
pabelanger | however | 00:49 |
pabelanger | | 0000140400 | inap-mtl01 | nova | ubuntu-trusty | 309ec46a-1fee-4314-91fe-c3be122e891a | ready | 00:07:11:10 | locked | | 00:49 |
fungi | pabelanger: 509833 hit another post_failure (i rechecked just now) | 00:59 |
fungi | i guess those are still going on | 00:59 |
pabelanger | looking | 01:02 |
fungi | i wasn't having much luck identifying it from the console log, and ara is rather tough to navigate in lynx | 01:03 |
pabelanger | ya, according to ze01.o.o, we got exit code: 2 | 01:04 |
pabelanger | 2017-10-06 00:24:09,149 DEBUG zuul.AnsibleJob: [build: 2f4383a190674034a8d6e71e0d7d0aff] Ansible output: b'Using /var/lib/zuul/builds/2f4383a190674034a8d6e71e0d7d0aff/ansible/post_playbook_3/ansible.cfg as config file' | 01:05 |
pabelanger | was last post playbook | 01:05 |
*** jkilpatr has quit IRC | 01:06 | |
pabelanger | fungi: it is possible something in https://review.openstack.org/505451/ is causing the issue | 01:10 |
pabelanger | we'd never know since it happens after logs are uploaded | 01:10 |
pabelanger | we should confirm with jeblair about maybe moving that before the upload-logs role, to get info | 01:10 |
pabelanger | now I EOD | 01:13 |
fungi | a plausible theory | 01:21 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Use normal docs build jobs https://review.openstack.org/509833 | 01:25 |
*** zigo has quit IRC | 01:27 | |
fungi | though i think they rely on the logs already being uploaded right? so couldn't go before upload-logs | 01:28 |
fungi | we might need some alternative means of identifying if/how they're breaking | 01:29 |
*** zigo has joined #zuul | 01:31 | |
*** logan- has quit IRC | 02:12 | |
*** eventingmonkey has quit IRC | 02:12 | |
*** fbouliane has quit IRC | 02:15 | |
*** logan- has joined #zuul | 02:15 | |
*** fbouliane has joined #zuul | 02:17 | |
*** eventingmonkey has joined #zuul | 02:17 | |
*** mgagne has quit IRC | 02:23 | |
*** mgagne has joined #zuul | 02:24 | |
*** mgagne is now known as Guest66098 | 02:24 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix path exclusions https://review.openstack.org/509901 | 04:35 |
AJaeger | team, I proposed a change to shift some more nodes to v3, see https://review.openstack.org/509965 - changes have been sitting in the queue for 22 hours | 05:25 |
*** tobiash has quit IRC | 05:27 | |
*** tobiash has joined #zuul | 05:27 | |
*** Diabelko has quit IRC | 05:40 | |
*** isaacb has joined #zuul | 06:46 | |
*** isaacb has quit IRC | 07:30 | |
*** hashar has joined #zuul | 07:49 | |
*** electrofelix has joined #zuul | 08:38 | |
*** jkilpatr has joined #zuul | 11:03 | |
Shrews | pabelanger: jeblair: it would seem that zuul has chosen NOT to reuse any READY nodes for some reason: http://paste.openstack.org/show/622822/ | 11:03 |
Shrews | note the "reuse": false | 11:03 |
Shrews | this has the vexxhost pool thread stuck because it has 10 ready nodes (its max) | 11:04 |
Shrews | maybe some other pool threads too | 11:04 |
Shrews | oh! nm, this is a request from nodepool itself (to satisfy min-ready). | 11:10 |
Shrews | i wonder how it got into this state.... | 11:10 |
Shrews | this is going to require some deeper digging. but i need to breakfast first | 11:14 |
Shrews | i deleted a few nodes from a few regions to unwedge things | 11:15 |
*** jkilpatr has quit IRC | 11:16 | |
*** jkilpatr has joined #zuul | 11:18 | |
*** dkranz has joined #zuul | 12:09 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not satisfy min-ready requests if at capacity https://review.openstack.org/510085 | 12:11 |
Shrews | jeblair: pabelanger: i think that ^^^ will prevent such wedging again | 12:11 |
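For reference, the shape of the guard 510085 adds, as a rough sketch; the names below are illustrative, not nodepool's actual API (the real handler lives in the OpenStack driver):

```python
# Hypothetical sketch of the capacity check discussed above.

def should_decline_min_ready(request, pool):
    """Decline a min-ready request when the pool has no room to grow.

    Without a check like this, a pool sitting at max-servers with all
    of its nodes READY holds the request open forever and its thread
    wedges, as happened with vexxhost above.
    """
    if not request.is_min_ready:  # only nodepool's own min-ready requests
        return False
    in_use = pool.ready_count + pool.in_progress_count
    return in_use >= pool.max_servers  # at capacity: decline rather than wait
```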
*** hashar has quit IRC | 12:42 | |
*** hashar has joined #zuul | 12:53 | |
pabelanger | Shrews: great! +2 | 13:24 |
*** hashar has quit IRC | 14:09 | |
*** hashar has joined #zuul | 14:14 | |
pabelanger | 2017-10-06 14:06:56.793664 | {phase} {step} {result}: [{trusted} : {playbook}@{branch}]2017-10-06 14:06:56.793882 | {phase} {step}: [{trusted} : {playbook}@{branch}]2017-10-06 14:06:58.268508 | | 14:16 |
pabelanger | dmsimard: ^ I think you added that recently? | 14:17 |
dmsimard | errrrrrrrrrrrrrrrrr | 14:17 |
dmsimard | pabelanger: have a link? | 14:17 |
pabelanger | http://logs.openstack.org/91/509491/3/check/ansible-role-nodepool-ubuntu-xenial/84cba16/job-output.txt.gz | 14:17 |
dmsimard | damn it :( | 14:18 |
dmsimard | pabelanger: thanks, I'll figure out a fix, it's indeed coming from https://review.openstack.org/#/c/509254/4/zuul/executor/server.py | 14:19 |
jeblair | msg = msg.format() :) | 14:19 |
dmsimard | doh | 14:20 |
dmsimard | indeed | 14:20 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Properly format messages coming out of emitPlaybookBanner https://review.openstack.org/510135 | 14:23 |
dmsimard | well, that was embarrassing | 14:23 |
dmsimard | we also need a line break in there, let me add that | 14:24 |
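The bug itself is easy to reproduce in plain Python: the banner template was emitted before `.format()` ran, so the placeholder names leaked into the job output verbatim. A minimal reconstruction, with made-up field values:

```python
template = "{phase} {step} {result}: [{trusted} : {playbook}@{branch}]"

# Buggy: the unformatted template goes straight to the log.
print(template)
# -> {phase} {step} {result}: [{trusted} : {playbook}@{branch}]

# Fixed: substitute the fields (and add the line break dmsimard mentions).
msg = template.format(phase="POST-RUN", step="END", result="RESULT_NORMAL",
                      trusted="untrusted", playbook="tox/post", branch="master")
print(msg + "\n")
# -> POST-RUN END RESULT_NORMAL: [untrusted : tox/post@master]
```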
jeblair | Shrews: comment on the min-ready change | 14:25 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Properly format messages coming out of emitPlaybookBanner https://review.openstack.org/510135 | 14:26 |
dmsimard | pabelanger, jeblair ^ | 14:26 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not satisfy min-ready requests if at capacity https://review.openstack.org/510085 | 14:36 |
Shrews | jeblair: responded. as an example of my comment, i've been watching our ready node situation and we always have a LOT of ready/unlocked nodes, which i speculate happens when we get super busy responding to requests | 14:43 |
Shrews | (a LOT when idle, that is) | 14:44 |
jeblair | Shrews: i'm confused | 14:44 |
jeblair | Shrews: are you saying when we are very busy, we create a lot of min-ready requests but have few ready nodes, but then when we are idle, we have no requests but many ready nodes? | 14:45 |
Shrews | oh actually, nevermind. i already have checks in place to not submit more min-ready requests when there are unfulfilled requests already. | 14:47 |
Shrews | i'll revert that priority part | 14:47 |
jeblair | ok | 14:47 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not satisfy min-ready requests if at capacity https://review.openstack.org/510085 | 14:48 |
Shrews | we can bikeshed the priority thing later if we need to | 14:48 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Properly format messages coming out of emitPlaybookBanner https://review.openstack.org/510135 | 14:53 |
mordred | jeblair: I'm looking at a fix for https://review.openstack.org/#/c/508822/2 which is on our list of job-related errors to fix ... tl;dr is that in the project config a job entry was given a list instead of a dict | 14:54 |
mordred | jeblair: I'm poking through configloader - but any hints as to where to look? | 14:54 |
dmsimard | infra-root: need to reload executors; ^ landed and fixes this abomination http://logs.openstack.org/91/509491/3/check/ansible-role-nodepool-ubuntu-xenial/84cba16/job-output.txt.gz#_2017-10-06_14_03_41_849986 | 14:55 |
mordred | infra-root: do we have a "restart executors" playbook yet? | 14:55 |
mordred | hrm. lemme take that to infra ... | 14:56 |
dmsimard | mordred: that patchset rightfully returned a syntax error, I guess you want to make it more friendly? | 14:56 |
mordred | dmsimard: yah - the patchset just returned "unknown configuration error" - but in this case we should be able to know that the issue was that required-projects was given as a list and not a dict | 14:57 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Changes for Ansible 2.4 https://review.openstack.org/505354 | 14:57 |
dmsimard | mordred: we probably need something broader that knows what data types we are expecting and validate that we're receiving what we're expecting | 14:58 |
dmsimard | (for every config parameter) | 14:59 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Changes for Ansible 2.4 https://review.openstack.org/505354 | 14:59 |
jeblair | dmsimard: something *other* than voluptuous? | 15:00 |
jeblair | mordred: i start by writing a test for the syntax error, then temporarily adding a raise into the exception handler that's masking it to see the full traceback, then go from there. | 15:02 |
dmsimard | jeblair: I'm not super familiar with how the configuration is loaded in the first place, would need to take a look. | 15:03 |
dmsimard | I guess what I was saying is that we should try to avoid handling things on a case-by-case basis (i.e., only fix "required-projects") when this kind of issue can be expected elsewhere | 15:04 |
jeblair | dmsimard: ah. well, we have what you describe. there is a bug. mordred is investigating the bug. | 15:04 |
jeblair | dmsimard: we absolutely do not handle this on a case-by-case basis. | 15:05 |
dmsimard | ok, whew :) | 15:05 |
*** hashar is now known as hasharAway | 15:05 | |
mordred | jeblair: gotcha! cool | 15:07 |
*** hasharAway has quit IRC | 15:10 | |
jeblair | mordred: see test_untrusted_shadow_error for an example test | 15:11 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Changes for Ansible 2.4 https://review.openstack.org/505354 | 15:15 |
Shrews | that ^^^ doesn't actually change the ansible requirement yet, but gets us ready for it when 2.4.1 is released | 15:16 |
Shrews | mordred: i had to rebase part of your changes in that. i think it's all still valid, but you might want to verify when you have idle time | 15:18 |
mordred | awesome | 15:18 |
dmsimard | Shrews: you're also waiting for 2.4.1 ? | 15:21 |
dmsimard | as in, ara is in stalemate right now and we also need to wait for 2.4.1 | 15:22 |
dmsimard | I had to send a bugfix upstream that will only land in 2.4.1 https://github.com/ansible/ansible/pull/31200 | 15:22 |
Shrews | dmsimard: 2.4 breaks YAML inventory parsing | 15:23 |
dmsimard | doh | 15:24 |
dmsimard | Shrews: we should probably add a job on zuul which tests against devel, make it non-voting | 15:25 |
dmsimard | that's what I do for ara | 15:25 |
mordred | jeblair: TIL you can set a merge-mode in a project-template | 15:26 |
jeblair | mordred: ya; first one wins though | 15:27 |
dmsimard | Shrews: lgtm, just added a comment | 15:30 |
Shrews | dmsimard: that's one of the changes mordred made. i'll let him comment | 15:30 |
mordred | jeblair: I've got the config syntax issue - made one fix, then realized that was slightly inaccurate, working on second fix | 15:49 |
mordred | jeblair: (tl;dr - we do not actually have a voluptuous schema here other than "dict") | 15:50 |
mordred | jeblair: I've now got it emitting this: http://paste.openstack.org/show/622850/ | 15:54 |
mordred | jeblair: which is better, but I think still maybe not as clear to the unwashed | 15:54 |
mordred | jeblair: but I think for now that's better than what it was | 15:55 |
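Zuul's config validation is voluptuous-based (as jeblair notes above), and the gap was that this spot only checked against `dict`. A toy example of the difference a declared schema makes; the field layout here is made up for illustration, not zuul's actual schema:

```python
import voluptuous as vs

# Each jobs entry maps a job name to an attribute dict.
schema = vs.Schema({'jobs': [{str: {'required-projects': [str]}}]})

schema({'jobs': [{'myjob': {'required-projects': ['openstack-infra/zuul']}}]})  # ok

try:
    # A list given where a dict was expected, as in the 508822 case.
    schema({'jobs': [{'myjob': ['openstack-infra/zuul']}]})
except vs.Invalid as e:
    print(e)  # names the offending path, e.g. "... @ data['jobs'][0]['myjob']"
```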
fungi | looks like zuulv3 stabilized to around 28GiB virtual memory in use in the 10:00-13:00z span, but looking at the timeline that's basically the period where it was unresponsive due to full rootfs up to the point where i nova rebooted it | 15:56 |
fungi | yikes, load average spiked up into the 360s around 06:30z. periodic jobs kicking off, i guess | 15:59 |
fungi | looks like maybe the periodic jobs starting at 06:00z may have just pushed memory utilization into swap, so then it was trying to deal with them while thrashing swap | 16:01 |
jeblair | mordred: yeah, all those errors are like that. it's not a great message, but it does at least say what's wrong. | 16:02 |
mordred | ++ | 16:02 |
jeblair | mordred: since this is just turning out to be a schema bug, i don't think we need a specific test for it (i.e., it's not a new class of unhandled error). up to you whether you want to include that in the fix, i'd say. | 16:02 |
mordred | jeblair: well - I made two tests for the case, might as well keep them :) | 16:03 |
jeblair | ok | 16:03 |
fungi | also, can see the dip on the disk usage graph when logrotate kicked in at 06:00z, but used space in / started curving upward sharply at that point (perhaps filling up with the kazoo thread exceptions pabelanger noted) | 16:04 |
fungi | so this was probably a cascade effect | 16:04 |
mordred | jeblair: do we have a list anywhere of which Job attributes are invalid in project and project-template job lists? I know name and parent are | 16:04 |
jeblair | mordred: i think those are the only ones... | 16:06 |
jeblair | mordred: though, fun fact, because of the bug you're fixing, we totally would have passed those through for amusing results | 16:07 |
mordred | yah | 16:07 |
jeblair | mordred: actually, maybe just omit name | 16:07 |
jeblair | mordred: if we reject parent for job variants, we should do that both here and at the top level | 16:08 |
mordred | jeblair: ooh - I have just learned a new python 3.5 syntax | 16:16 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig https://review.openstack.org/509912 | 16:17 |
jeblair | uhoh :) | 16:18 |
mordred | jeblair: well, it doesn't work here - but in 3.5 you can pass more than one **kwargs argument to an invocation | 16:20 |
mordred | jeblair: so you can do dict(**dict1, **dict2) | 16:21 |
fungi | mordred: does it merge them? | 16:21 |
mordred | fungi: yes | 16:21 |
fungi | i guess it instantiates an empty dict and then does an .update() with each of them in sequence or something like that | 16:21 |
mordred | foo = dict(**dict1, **dict2) is like doing foo = dict1.copy() ; foo.update(dict2) | 16:21 |
fungi | yeah, that's more or less what i was thinking | 16:21 |
fungi | neat! | 16:22 |
mordred | but potentially useful in places where the update step isn't possible (like in a list of class attributes) | 16:22 |
fungi | yup | 16:28 |
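A quick runnable illustration of that PEP 448 (Python 3.5+) syntax. One caveat worth noting alongside the copy/update analogy: in the `dict(**a, **b)` call form a key present in both mappings raises TypeError, while the `{**a, **b}` literal form lets the rightmost mapping win:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3}

merged = dict(**d1, **d2)  # {'a': 1, 'b': 2, 'c': 3}

# Equivalent to the copy-then-update idiom:
tmp = d1.copy()
tmp.update(d2)
assert merged == tmp

# Duplicate keys are fine in the literal form (rightmost wins) ...
assert {**d1, **{'b': 9}} == {'a': 1, 'b': 9}
# ... but dict(**d1, **{'b': 9}) raises TypeError (duplicate keyword argument).
```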
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed https://review.openstack.org/510180 | 16:36 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed https://review.openstack.org/510180 | 16:44 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Provide error message on malformed job list https://review.openstack.org/510185 | 16:48 |
mordred | jeblair, dmsimard, fungi: ^^ that should fix the syntax error reporting for list instead of dict from https://review.openstack.org/#/c/508822/2 | 16:49 |
jeblair | mordred: the chainmap thing works with voluptuous? neat | 16:54 |
jeblair | mordred: oh i see, you dict() the result | 16:54 |
jeblair | that makes sense | 16:54 |
jeblair | we can probably use that later to reduce the duplication in project-template and project schemas | 16:55 |
mordred | jeblair: ++ | 16:57 |
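The ChainMap trick from 510185, sketched with made-up schema fields: layer the shared attributes under the specific ones and `dict()` the result so voluptuous sees a plain mapping. The first mapping wins on duplicate keys, which is also what would let the project and project-template schemas share their common keys later:

```python
from collections import ChainMap

template_attrs = {'jobs': list}          # illustrative, not zuul's real fields
project_attrs = {'merge-mode': str}

combined = dict(ChainMap(project_attrs, template_attrs))
# {'merge-mode': <class 'str'>, 'jobs': <class 'list'>}
```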
*** openstack has joined #zuul | 17:09 | |
*** ChanServ sets mode: +o openstack | 17:09 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Provide error message on malformed job list https://review.openstack.org/510185 | 17:19 |
*** electrofelix has quit IRC | 17:50 | |
dmsimard | pabelanger, jeblair, mordred: I just realized that we're likely not setting up unbound in v3 as it was configured in the v2 ready-script | 17:56 |
dmsimard | I'll put it on my todo, I found it while troubleshooting a centos unbound issue | 17:56 |
jeblair | dmsimard: add to jobs section of etherpad? | 17:56 |
dmsimard | yup | 17:56 |
mordred | dmsimard: great catch - thanks! | 18:07 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Do not satisfy min-ready requests if at capacity https://review.openstack.org/510085 | 18:14 |
pabelanger | Yay | 18:15 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Create git_http_low_speed_limit / git_http_low_speed_time https://review.openstack.org/509893 | 18:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Have zuul re-run ABORTED jobs https://review.openstack.org/510211 | 18:50 |
pabelanger | jeblair: mordred: I think that fixes our aborted jobs issue^. If so, I can see about writing a test too | 18:51 |
pabelanger | and fixing tox issues now | 19:14 |
jeblair | pabelanger: see my comment on change | 19:15 |
jeblair | i need to grab lunch now | 19:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle non-syntax errors from Ansible https://review.openstack.org/510219 | 19:15 |
mordred | jeblair, pabelanger: as a follow up to that - earlier when looking at restarting executors there was a comment about jobs not getting re-run if the executor was hard-killed rather than gracefulled - can we also detect that in the client and retry? It seems like the executor going away unexpectedly should *always* result in a retry, no? | 19:34 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Retry jobs on executor disconnect https://review.openstack.org/510223 | 19:38 |
mordred | jeblair, pabelanger: something like that ^^ although I'm not super sure how to test it ATM | 19:39 |
fungi | mordred: that's separate from what 510211 is attempting to address, i guess? | 19:44 |
fungi | mordred: by hard-killed you mean sigkill or sigsegv instead of sigterm? | 19:45 |
fungi | so the executor doesn't get sufficient time to communicate the abort? | 19:45 |
*** hashar has joined #zuul | 19:46 | |
mordred | yah | 19:48 |
mordred | or, heck, cloud decides to kill the VM on which the executor is running | 19:48 |
pabelanger | will look in a minute, about to push an update for 510211 | 19:48 |
mordred | the 'executor went away and we don't know why' case | 19:48 |
fungi | makes sense | 19:53 |
fungi | thanks | 19:53 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 19:55 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Have zuul re-run ABORTED jobs https://review.openstack.org/510211 | 19:56 |
pabelanger | okay, v2 of abort is up; however, I think one disk monitor job is failing | 19:57 |
pabelanger | might need help on how to properly handle that | 19:57 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Set zuul_log_path for a periodic job https://review.openstack.org/509384 | 19:59 |
jeblair | mordred: the discussion earlier about re-running jobs when restarting executors is exactly what pabelanger is working on | 20:11 |
mordred | jeblair: what's the difference between "ABORTED" and "DISCONNECTED" ? | 20:11 |
jeblair | mordred: iow, when an executor is stopped (via some method other than killall -9), it goes through the code path pabelanger is touching | 20:12 |
jeblair | mordred: where does DISCONNECT show up? afaict, you're adding it in your change | 20:12 |
mordred | jeblair: sorry - I meant what's the difference between result=='ABORTED' and onDisconnect - but I believe I understand now | 20:13 |
jeblair | mordred: ya, so the case you're looking at is important to consider too -- it's the unclean shutdown case | 20:13 |
mordred | jeblair: pabelanger's change is to handle the clean shutdown case, mine is the unclean | 20:13 |
mordred | yah | 20:13 |
jeblair | mordred: however, that should be handled by the current "result is None" case | 20:13 |
mordred | cool - so the only difference would be the suggestion that we don't count executor restarts (clean or unclean) against a job's retry count | 20:15 |
jeblair | mordred: ya; considering how things have changed in v3, if you want to push for it on that basis, i could see that | 20:15 |
* mordred can abandon his change, didn't quite track in the brain that the None case would handle it - but now that you say it, it totally makes sense and should have been obvious :) | 20:15 | |
pabelanger | jeblair: so, should I clean up the test_disk_accountant_kills_job to handle the proper abort retry logic now? http://logs.openstack.org/11/510211/2/infra-check/tox-py35/d6bc444/testr_results.html.gz Or do you have any other suggestion for that use case? | 20:16 |
pabelanger | we'd abort 3 times on that now | 20:17 |
pabelanger | eventually hitting retry_limit | 20:17 |
jeblair | hrm, i think if we hit the disk limit, we should actually return an error other than ABORTED. it's not a retryable error really | 20:18 |
jeblair | lemme look | 20:18 |
pabelanger | kk | 20:18 |
mordred | jeblair: agree | 20:19 |
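The dispatch being converged on, as a simplified sketch (hypothetical names and result strings, not zuul's actual code): an unexpectedly missing result is retryable, an explicit disk-full result is not:

```python
def handle_build_result(build, result):
    if result is None or result == 'ABORTED':
        # Executor disconnected or was restarted: not the change's fault,
        # so retry (per the discussion, ideally without counting this
        # against the job's retry limit).
        build.retry()
    elif result == 'DISK_FULL':  # hypothetical name for the distinct result
        # Hitting the disk limit is deterministic; retrying would just
        # fail the same way three times and end in retry_limit.
        build.fail('disk limit exceeded')
    else:
        build.complete(result)
```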
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Always retry jobs on executor disconnect https://review.openstack.org/510223 | 20:20 |
mordred | jeblair, pabelanger: ^^ rebased that on top of pabelanger's change and included ABORTED in the always-retry logic | 20:20 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Always retry jobs on executor disconnect https://review.openstack.org/510223 | 20:22 |
mordred | except this time without sucking | 20:22 |
pabelanger | mordred: wouldn't that mean if a job kept aborting, it would run forever? | 20:24 |
jlk | hey all, I'm out at SeaGL today, sorry for not responding to anything. But I had lunch with Clark :) | 20:24 |
pabelanger | say hi! | 20:24 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Return a distinct result on executor disk full https://review.openstack.org/510227 | 20:29 |
jeblair | mordred, pabelanger: maybe base your 2 changes on that? | 20:29 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds https://review.openstack.org/509653 | 20:33 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed https://review.openstack.org/510180 | 20:33 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Have zuul re-run ABORTED jobs https://review.openstack.org/510211 | 20:35 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Always retry jobs on executor disconnect https://review.openstack.org/510223 | 20:35 |
pabelanger | done | 20:35 |
jeblair | that's some teamwork | 20:35 |
pabelanger | jeblair: mordred: quick question on 510223 | 20:37 |
jeblair | mordred: on 510185 what was "None" needed for? | 20:38 |
jeblair | mordred: oh, i see it broke that test. i'm kind of inclined to go with PS1 and fix the test. i think that was an accident, and i think it's better not to have a ':' unless needed (that's the behavior elsewhere at least) | 20:40 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds https://review.openstack.org/509653 | 20:52 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed https://review.openstack.org/510180 | 20:52 |
pabelanger | so, if I see the following request from nodepool-launcher | 20:55 |
pabelanger | | 200-0000309834 | requested | zuulv3 | centos-7 | | nl02.openstack.org-10797-PoolWorker.infracloud-vanilla-main,nl02.openstack.org-10797-PoolWorker.citycloud-lon1-main,nl02.openstack.org-10797-PoolWorker.citycloud-sto2-main,nl02.openstack.org-10797-PoolWorker.infracloud-chocolate-main,nl02.openstack.org-10797-PoolWorker.citycloud-la1-main | 20:55 |
pabelanger | does that mean, it is waiting to launch 1 centos-7 on the clouds listed there? | 20:55 |
pabelanger | the other question: it seems nl02 has many more requests assigned to it than nl01 (2562 vs 60) | 20:58 |
pabelanger | trying to see why that is | 20:58 |
jeblair | pabelanger: what's the heading for that column? | 21:02 |
pabelanger | jeblair: I think declined by? | 21:03 |
pabelanger | okay, that helps | 21:03 |
jeblair | pabelanger: so that's a negative. those are launchers which have decided not to handle the request | 21:03 |
jeblair | pabelanger: where are you seeing the 2562 vs 60 numbers? | 21:05 |
jeblair | launchers shouldn't accept more requests than they can handle, and 2562 is definitely more than nl01 can handle | 21:05 |
pabelanger | jeblair: I might be doing it incorrectly, but I ran: sudo -H -u nodepool nodepool request-list | grep nl01 | wc -l vs sudo -H -u nodepool nodepool request-list | grep nl02 | wc -l | 21:07 |
pabelanger | and was trying to see if the requests were somehow split across launchers | 21:07 |
pabelanger | however, I admit, request-list is still new for me | 21:07 |
jeblair | pabelanger: does it include declined-by? | 21:08 |
pabelanger | yah | 21:08 |
pabelanger | I started looking at it, because we have 1 centos-7 ready node | 21:08 |
pabelanger | and trying to see why it wasn't getting used | 21:08 |
pabelanger | Oh, something has grabbed it now | 21:09 |
jeblair | pabelanger: okay, so nl01 is *declining* more requests than nl02. which may still be undesirable, but not what i was expecting when you first brought it up. | 21:09 |
pabelanger | ok | 21:10 |
pabelanger | I just saw this in the debug log | 21:13 |
pabelanger | 2017-10-06 21:12:01,739 DEBUG nodepool.driver.openstack.OpenStackNodeRequestHandler[nl02.openstack.org-10797-PoolWorker.citycloud-lon1-main]: Declining node request 200-0000310062 because it would exceed quota | 21:13 |
pabelanger | so, maybe a side effect of split quota over 2 nodepools? | 21:13 |
pabelanger | Oh, I know | 21:20 |
pabelanger | we have infracloud disabled in nl02 | 21:21 |
pabelanger | so, it will always decline | 21:21 |
pabelanger | same goes for citycloud | 21:21 |
pabelanger | jeblair: ^ | 21:21 |
clarkb | I think citycloud we might be able to turn back on again if the az thing is fixed? | 21:21 |
clarkb | fungi: ^ | 21:21 |
pabelanger | ya, infracloud can also be enabled again | 21:22 |
clarkb | and possibly we can turn on infracloud now that zuul cpu use is saner | 21:22 |
pabelanger | we did on nodepool.o.o, but not nl02 | 21:22 |
pabelanger | ya | 21:22 |
pabelanger | okay, EOD for me now | 21:23 |
fungi | clarkb: the assumption being that the citycloud failures were related to us booting in their starved "nova-local" ssd az? | 21:23 |
clarkb | fungi: ya, or at least we knew we had a problem there that we should've fixed | 21:23 |
fungi | worth a retry i s'pose | 21:24 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig https://review.openstack.org/509912 | 21:28 |
jeblair | i'm going to self-approve some changes after rebase | 21:30 |
jeblair | i'm self-approving 509653 which skips the test as we discussed earlier | 21:30 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update node requests after nodes https://review.openstack.org/509571 | 21:31 |
jeblair | fungi, mordred, clarkb: can you +2/+3 508698 | 21:32 |
fungi | lgtm | 21:35 |
fungi | also, i have a fair amount of confidence in SpamapS's review on it as well | 21:36 |
jeblair | mordred: i'm pretty sure i wrote a filename other than "sub_nodes" in the etherpad but it looks like you deleted it and replaced it with that | 21:39 |
jeblair | i agree that sub_nodes is used and we should add it, but i'm also pretty sure i was deliberate and correct about the thing i put in there | 21:39 |
jeblair | i will look through the history and try to find out what it was | 21:40 |
jeblair | okay, it was node_private | 21:41 |
jeblair | i will put that back | 21:41 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds https://review.openstack.org/509653 | 21:42 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed https://review.openstack.org/510180 | 21:42 |
jeblair | i'm going to self-approve 509571 which was previously approved before a rebase | 21:45 |
jeblair | also has 3 other +2s | 21:45 |
jeblair | 509912 is ready for review now | 21:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add base job and roles for javascript https://review.openstack.org/510236 | 21:46 |
jeblair | i'm going to pick up the stable branch depends on bug now | 21:47 |
mordred | jeblair: ok. so - the tripleo patch that was linked to didn't use node_private anywhere but instead used sub_nodes all throughout it | 21:48 |
mordred | jeblair: so I think we were looking at different things perhaps? | 21:48 |
jeblair | mordred: i saw node_private in the tht patches | 21:48 |
mordred | jeblair: k. https://review.openstack.org/#/c/508660/ is what I was looking at ... could you point me to what you're looking at? (I don't doubt you - but if we're seeing different things then I worry that there may be other things missed too) | 21:50 |
jeblair | mordred: https://review.openstack.org/509704 | 21:51 |
mordred | jeblair: thanks! | 21:53 |
jeblair | np, sorry i didn't leave enough breadcrumbs | 21:54 |
mordred | jeblair: re: None in 510185 - I can go either way - whichever you feel is better | 21:55 |
jeblair | mordred: okay. i'm feeling fond of PS1 and no support for none. i have a moderate (but not strong) feeling that will encourage tidy layouts and reduce typo errors. | 21:56 |
mordred | jeblair: okie - I'll update! | 21:57 |
jeblair | (i'm also certain it will result in more errors to users, but think it's worth it since it points out a potential typo) | 21:58 |
jeblair | it's one of those "maybe you just forgot to remove a colon, or maybe you forgot to add something important" things | 21:59 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Provide error message on malformed job list https://review.openstack.org/510185 | 22:01 |
mordred | jeblair: there ya go ^^ | 22:01 |
jeblair | w00t | 22:03 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Update node requests after nodes https://review.openstack.org/509571 | 22:04 |
jeblair | mordred: good news and bad news. | 22:07 |
jeblair | mordred: good news: we don't have a test for depends-on same id on multiple branches. | 22:07 |
jeblair | mordred: bad news: i added one and it passed. | 22:07 |
jeblair | even inspected the inventory file, and the stable branch change shows up in the items list | 22:07 |
jeblair | so i'm going to have to dig deeper on that. | 22:08 |
mordred | jeblair: I agree - that is both good news and bad news | 22:09 |
jeblair | i approved mordred's 2 outstanding zuul changes; i think when they land we should restart all components of zuul and zookeeper and nodepool | 22:11 |
mordred | ++ | 22:12 |
jeblair | i have to run now; anyone who feels up to it, feel free to do that restart when 510185 and 510219 land | 22:12 |
mordred | dmsimard: heya - you around? | 22:15 |
dmsimard | mordred: not for long | 22:15 |
mordred | dmsimard: well ... maybe you can answer super-quick | 22:15 |
mordred | dmsimard: I'm looking at the tox-linters job for openstack-zuul-jobs and the linters env in tox.ini | 22:15 |
mordred | it has this: | 22:16 |
mordred | ANSIBLE_ROLES_PATH = {toxinidir}/roles:{envdir}/src/zuul-jobs/roles | 22:16 |
mordred | but I don't know where it's getting that zuul-jobs from | 22:16 |
mordred | do you? | 22:16 |
dmsimard | iirc required-projects gets automatically added to role path or something to that effect | 22:17 |
dmsimard | or perhaps roles: | 22:17 |
mordred | AH - I see it ... | 22:17 |
mordred | -e git://git.openstack.org/openstack-infra/zuul-jobs#egg=zuul-jobs | 22:17 |
dmsimard | https://docs.openstack.org/infra/zuul/feature/zuulv3/user/config.html#attr-job.roles | 22:17 |
mordred | in test-requirements.txt | 22:17 |
dmsimard | Roles are added to the Ansible role path in the order they appear on the job – roles earlier in the list will take precedence over those which follow. | 22:17 |
mordred | dmsimard: yah - this is on the test node itself running linters | 22:18 |
mordred | dmsimard: I've got an ozj patch with a depends-on on a zj patch but it's not getting set up properly ... | 22:18 |
dmsimard | mordred: link ? | 22:18 |
mordred | although you know what - tox_install_siblings should be able to totally fix this for me :) | 22:19 |
mordred | http://logs.openstack.org/37/510237/1/infra-check/tox-linters/382fe42/job-output.txt.gz | 22:19 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Handle non-syntax errors from Ansible https://review.openstack.org/510219 | 22:19 |
mordred | dmsimard: gonna see if the magic in the tox job will take care of it ^^ | 22:19 |
mordred | wait ... I mean: remote: https://review.openstack.org/510237 Add javascript tarball publication job | 22:19 |
mordred | :) | 22:19 |
dmsimard | fwiw that patch makes sense :p | 22:20 |
mordred | heh | 22:21 |
mordred | dmsimard: btw - I like the POST-RUN END RESULT_NORMAL: [untrusted : git.openstack.org/openstack-infra/zuul-jobs/playbooks/tox/post@master] lines | 22:26 |
dmsimard | Aw yeah | 22:27 |
dmsimard | Much better than the non-formatted strings we had this morning | 22:27 |
* dmsimard coughs | 22:27 | |
dmsimard | afk food | 22:27 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Provide error message on malformed job list https://review.openstack.org/510185 | 22:27 |
*** hashar has quit IRC | 22:50 | |
*** harlowja has quit IRC | 23:45 | |
*** docaedo has joined #zuul | 23:47 |