Friday, 2017-10-06

00:04 <pabelanger> easy +3 for somebody https://review.openstack.org/509833/
00:04 <pabelanger> job layout changes
00:05 <pabelanger> jeblair: ^ maybe when you have a moment - it switches to use build-openstack-sphinx-docs
00:37 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove references to pipelines, queues, and layouts on dequeue  https://review.openstack.org/509903
00:37 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig  https://review.openstack.org/509912
00:47 <pabelanger> something to look into in the morning
00:48 <pabelanger> I see the following in nodepool-launcher debug.log
00:48 <pabelanger> 2017-10-05 17:37:38,277 DEBUG nodepool.driver.openstack.OpenStackNodeRequestHandler[nl01.openstack.org-6338-PoolWorker.inap-mtl01-main]: Unlocked node 0000140400 for request 200-0000188337
00:49 <pabelanger> however
00:49 <pabelanger> | 0000140400 | inap-mtl01             | nova     | ubuntu-trusty    | 309ec46a-1fee-4314-91fe-c3be122e891a | ready    | 00:07:11:10 | locked   |
00:59 <fungi> pabelanger: 509833 hit another post_failure (i rechecked just now)
00:59 <fungi> i guess those are still going on
01:02 <pabelanger> looking
01:03 <fungi> i wasn't having much luck identifying it from the console log, and ara is rather tough to navigate in lynx
01:04 <pabelanger> ya, according to ze01.o.o, we got exit code: 2
01:05 <pabelanger> 2017-10-06 00:24:09,149 DEBUG zuul.AnsibleJob: [build: 2f4383a190674034a8d6e71e0d7d0aff] Ansible output: b'Using /var/lib/zuul/builds/2f4383a190674034a8d6e71e0d7d0aff/ansible/post_playbook_3/ansible.cfg as config file'
01:05 <pabelanger> that was the last post playbook
01:06 *** jkilpatr has quit IRC
01:10 <pabelanger> fungi: it is possible something in https://review.openstack.org/505451/ is causing the issue
01:10 <pabelanger> we'd never know since it happens after logs are uploaded
01:10 <pabelanger> we should confirm with jeblair about maybe moving that before the upload-logs role, to get more info
01:13 <pabelanger> now I EOD
01:21 <fungi> a plausible theory
01:25 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Use normal docs build jobs  https://review.openstack.org/509833
01:27 *** zigo has quit IRC
01:28 <fungi> though i think they rely on the logs already being uploaded, right? so it couldn't go before upload-logs
01:29 <fungi> we might need some alternative means of identifying if/how they're breaking
01:31 *** zigo has joined #zuul
02:12 *** logan- has quit IRC
02:12 *** eventingmonkey has quit IRC
02:15 *** fbouliane has quit IRC
02:15 *** logan- has joined #zuul
02:17 *** fbouliane has joined #zuul
02:17 *** eventingmonkey has joined #zuul
02:23 *** mgagne has quit IRC
02:24 *** mgagne has joined #zuul
02:24 *** mgagne is now known as Guest66098
04:35 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Fix path exclusions  https://review.openstack.org/509901
05:25 <AJaeger> team, I proposed a change to shift some more nodes to v3, see https://review.openstack.org/509965 - we have changes that have been sitting in the queue for 22 hours
05:27 *** tobiash has quit IRC
05:27 *** tobiash has joined #zuul
05:40 *** Diabelko has quit IRC
06:46 *** isaacb has joined #zuul
07:30 *** isaacb has quit IRC
07:49 *** hashar has joined #zuul
08:38 *** electrofelix has joined #zuul
11:03 *** jkilpatr has joined #zuul
11:03 <Shrews> pabelanger: jeblair: it would seem that zuul has chosen NOT to reuse any READY nodes for some reason: http://paste.openstack.org/show/622822/
11:03 <Shrews> note the "reuse": false
11:04 <Shrews> this has the vexxhost pool thread stuck because it has 10 ready nodes (its max)
11:04 <Shrews> maybe some other pool threads too
11:10 <Shrews> oh! nm, this is a request from nodepool itself (to satisfy min-ready).
11:10 <Shrews> i wonder how it got into this state....
11:14 <Shrews> this is going to require some deeper digging. but i need to breakfast first
11:15 <Shrews> i deleted a few nodes from a few regions to unwedge things
11:16 *** jkilpatr has quit IRC
11:18 *** jkilpatr has joined #zuul
12:09 *** dkranz has joined #zuul
12:11 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not satisfy min-ready requests if at capacity  https://review.openstack.org/510085
12:11 <Shrews> jeblair: pabelanger: i think that ^^^ will prevent such wedging again
12:42 *** hashar has quit IRC
12:53 *** hashar has joined #zuul
13:24 <pabelanger> Shrews: great! +2
14:09 *** hashar has quit IRC
14:14 *** hashar has joined #zuul
14:16 <pabelanger> 2017-10-06 14:06:56.793664 | {phase} {step} {result}: [{trusted} : {playbook}@{branch}]2017-10-06 14:06:56.793882 | {phase} {step}: [{trusted} : {playbook}@{branch}]2017-10-06 14:06:58.268508 |
14:17 <pabelanger> dmsimard: ^ I think you added that recently?
14:17 <dmsimard> errrrrrrrrrrrrrrrrr
14:17 <dmsimard> pabelanger: have a link?
14:17 <pabelanger> http://logs.openstack.org/91/509491/3/check/ansible-role-nodepool-ubuntu-xenial/84cba16/job-output.txt.gz
14:18 <dmsimard> damn it :(
14:19 <dmsimard> pabelanger: thanks, I'll figure out a fix, it's indeed coming from https://review.openstack.org/#/c/509254/4/zuul/executor/server.py
14:19 <jeblair> msg = msg.format()  :)
14:20 <dmsimard> doh
14:20 <dmsimard> indeed
14:23 <openstackgerrit> David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Properly format messages coming out of emitPlaybookBanner  https://review.openstack.org/510135
14:23 <dmsimard> well, that was embarrassing
14:24 <dmsimard> we also need a line break in there, let me add that
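A minimal sketch of the formatting bug discussed above, assuming the banner template is a str.format-style string that got logged before interpolation (the placeholder names match the broken output pasted at 14:16; the field values below are purely illustrative, not Zuul's actual emitPlaybookBanner code):

    # hypothetical reconstruction of the bug and of jeblair's suggested fix
    msg = "{phase} {step} {result}: [{trusted} : {playbook}@{branch}]"

    print(msg)  # buggy: the literal placeholders end up in job-output.txt

    print(msg.format(phase='POST-RUN', step='END', result='RESULT_NORMAL',
                     trusted='untrusted', playbook='tox/post', branch='master'))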
14:25 <jeblair> Shrews: comment on the min-ready change
14:26 <openstackgerrit> David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Properly format messages coming out of emitPlaybookBanner  https://review.openstack.org/510135
14:26 <dmsimard> pabelanger, jeblair ^
14:36 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not satisfy min-ready requests if at capacity  https://review.openstack.org/510085
14:43 <Shrews> jeblair: responded. as an example of my comment, i've been watching our ready node situation and we always have a LOT of ready/unlocked nodes, which i speculate happens when we get super busy responding to requests
14:44 <Shrews> (a LOT when idle, that is)
14:44 <jeblair> Shrews: i'm confused
14:45 <jeblair> Shrews: are you saying when we are very busy, we create a lot of min-ready requests but have few ready nodes, but then when we are idle, we have no requests but many ready nodes?
14:47 <Shrews> oh actually, nevermind. i already have checks in place to not submit more min-ready requests when there are unfulfilled requests already.
14:47 <Shrews> i'll revert that priority part
14:47 <jeblair> ok
14:48 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not satisfy min-ready requests if at capacity  https://review.openstack.org/510085
14:48 <Shrews> we can bikeshed the priority thing later if we need to
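A rough sketch of the idea behind 510085, using hypothetical attribute names rather than nodepool's real handler code: a min-ready request gets declined instead of held when the pool is already at its limit, so the pool thread cannot wedge on it the way the vexxhost pool did above.

    # illustrative only; the request/pool attributes are made up for this sketch
    def should_decline_min_ready(request, pool):
        at_capacity = (pool.ready_nodes + pool.in_use_nodes) >= pool.max_servers
        return request.is_min_ready and at_capacity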
14:54 <mordred> jeblair: I'm looking at a fix for https://review.openstack.org/#/c/508822/2 which is on our list of job-related errors to fix ... tl;dr is that in the project config a job entry was given a list instead of a dict
14:54 <mordred> jeblair: I'm poking through configloader - but any hints as to where to look?
14:55 <dmsimard> infra-root: we need to reload executors - ^ landed and it fixes this abomination http://logs.openstack.org/91/509491/3/check/ansible-role-nodepool-ubuntu-xenial/84cba16/job-output.txt.gz#_2017-10-06_14_03_41_849986
14:55 <mordred> infra-root: do we have a "restart executors" playbook yet?
14:56 <mordred> hrm. lemme take that to infra ...
14:56 <dmsimard> mordred: that patchset rightfully returned a syntax error, I guess you want to make it more friendly?
14:57 <mordred> dmsimard: yah - the patchset just returned "unknown configuration error" - but in this case we should be able to know that the issue was that required-projects was given as a list and not a dict
14:57 <openstackgerrit> David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Changes for Ansible 2.4  https://review.openstack.org/505354
14:58 <dmsimard> mordred: we probably need something broader that knows what data types we are expecting and validates that we're receiving what we're expecting
14:59 <dmsimard> (for every config parameter)
14:59 <openstackgerrit> David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Changes for Ansible 2.4  https://review.openstack.org/505354
15:00 <jeblair> dmsimard: something *other* than voluptuous?
15:02 <jeblair> mordred: i start by writing a test for the syntax error, then temporarily adding a raise into the exception handler that's masking it to see the full traceback, then go from there.
15:03 <dmsimard> jeblair: I'm not super familiar with how the configuration is loaded in the first place, I would need to take a look.
15:04 <dmsimard> I guess what I was saying is that we should try to avoid handling things on a case-by-case basis (i.e., only fix "required-projects") when this kind of issue can be expected elsewhere
15:04 <jeblair> dmsimard: ah.  well, we have what you describe.  there is a bug.  mordred is investigating the bug.
15:05 <jeblair> dmsimard: we absolutely do not handle this on a case-by-case basis.
15:05 <dmsimard> ok, whew :)
15:05 *** hashar is now known as hasharAway
15:10 *** hasharAway has quit IRC
15:11 <jeblair> mordred: see test_untrusted_shadow_error for an example test
15:15 <openstackgerrit> David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Changes for Ansible 2.4  https://review.openstack.org/505354
15:16 <Shrews> that ^^^ doesn't actually change the ansible requirement yet, but gets us ready for it when 2.4.1 is released
15:18 <Shrews> mordred: i had to rebase part of your changes in that. i think it's all still valid, but you might want to verify when you have idle time
15:18 <mordred> awesome
15:21 <dmsimard> Shrews: you're also waiting for 2.4.1?
15:22 <dmsimard> as in, ara is in stalemate right now and we also need to wait for 2.4.1
15:22 <dmsimard> I had to send a bugfix upstream that will only land in 2.4.1: https://github.com/ansible/ansible/pull/31200
15:23 <Shrews> dmsimard: 2.4 breaks YAML inventory parsing
15:24 <dmsimard> doh
15:25 <dmsimard> Shrews: we should probably add a job on zuul which tests against devel, make it non-voting
15:25 <dmsimard> that's what I do for ara
15:26 <mordred> jeblair: TIL you can set a merge-mode in a project-template
15:27 <jeblair> mordred: ya; first one wins though
15:30 <dmsimard> Shrews: lgtm, just added a comment
15:30 <Shrews> dmsimard: that's one of the changes mordred made. i'll let him comment
15:49 <mordred> jeblair: I've got the config syntax issue - made one fix, then realized that was slightly inaccurate, working on a second fix
15:50 <mordred> jeblair: (tl;dr - we do not actually have a voluptuous schema here other than "dict")
15:54 <mordred> jeblair: I've now got it emitting this: http://paste.openstack.org/show/622850/
15:54 <mordred> jeblair: which is better, but I think still maybe not as clear to the unwashed
15:55 <mordred> jeblair: but I think for now that's better than what it was
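For readers unfamiliar with voluptuous, a small self-contained example of the kind of check being discussed (the schema below is illustrative, not Zuul's actual project schema): when a job entry that should be a job name or a dict of attributes is given a list instead, voluptuous reports the path to the offending entry rather than a generic "unknown configuration error".

    import voluptuous as vs

    # a job entry may be a bare job name or a dict of attribute overrides
    job_entry = vs.Any(str, {str: dict})
    project_pipeline = vs.Schema({'jobs': [job_entry]})

    try:
        # required-projects supplied as a list where a dict was expected
        project_pipeline({'jobs': [{'tox-py35': ['required-projects']}]})
    except vs.MultipleInvalid as e:
        print(e)  # the error names the offending key/index in the structure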
15:56 <fungi> looks like zuulv3 stabilized to around 28GiB virtual memory in use in the 10:00-13:00z span, but looking at the timeline that's basically the period where it was unresponsive due to a full rootfs, up to the point where i nova rebooted it
15:59 <fungi> yikes, load average spiked up into the 360s around 06:30z. periodic jobs kicking off, i guess
16:01 <fungi> looks like maybe the periodic jobs starting at 06:00z may have just pushed memory utilization into swap, so then it was trying to deal with them while thrashing swap
16:02 <jeblair> mordred: yeah, all those errors are like that.  it's not a great message, but it does at least say what's wrong.
16:02 <mordred> ++
16:02 <jeblair> mordred: since this is just turning out to be a schema bug, i don't think we need a specific test for it (ie, it's not a new class of unhandled error).  up to you whether you want to include that in the fix, i'd say.
16:03 <mordred> jeblair: well - I made two tests for the case, might as well keep them :)
16:03 <jeblair> ok
16:04 <fungi> also, i can see the dip on the disk usage graph when logrotate kicked in at 06:00z, but used space in / started curving upward sharply at that point (perhaps filling up with the kazoo thread exceptions pabelanger noted)
16:04 <fungi> so this was probably a cascade effect
16:04 <mordred> jeblair: do we have a list anywhere of which Job attributes are invalid in project and project-template job lists? I know name and parent are
16:06 <jeblair> mordred: i think those are the only ones...
16:07 <jeblair> mordred: though, fun fact, because of the bug you're fixing, we totally would have passed those through for amusing results
16:07 <mordred> yah
16:07 <jeblair> mordred: actually, maybe just omit name
16:08 <jeblair> mordred: if we reject parent for job variants, we should do that both here and at the top level
16:16 <mordred> jeblair: ooh - I have just learned a new python 3.5 syntax
16:17 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig  https://review.openstack.org/509912
16:18 <jeblair> uhoh :)
16:20 <mordred> jeblair: well, it doesn't work here - but in 3.5 you can pass more than one **kwargs argument to an invocation
16:21 <mordred> jeblair: so you can do dict(**dict1, **dict2)
16:21 <fungi> mordred: does it merge them?
16:21 <mordred> fungi: yes
16:21 <fungi> i guess it instantiates an empty dict and then does an .update() with each of them in sequence or something like that
16:21 <mordred> foo = dict(**dict1, **dict2) is like doing foo = dict1.copy() ; foo.update(dict2)
16:21 <fungi> yeah, that's more or less what i was thinking
16:22 <fungi> neat!
16:22 <mordred> but potentially useful in places where the update step isn't possible (like in a list of class attributes)
16:28 <fungi> yup
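A quick illustration of the python 3.5 behaviour described above (PEP 448 generalized unpacking), with made-up dicts:

    d1 = {'voting': True, 'timeout': 1800}
    d2 = {'nodeset': 'ubuntu-xenial'}

    merged = dict(**d1, **d2)   # python 3.5+: multiple ** unpackings in one call

    # equivalent to the copy/update dance
    expected = d1.copy()
    expected.update(d2)
    assert merged == expected

    # a key present in both dicts raises TypeError with dict(**d1, **d2),
    # whereas {**d1, **d2} (also 3.5+) silently lets the right-hand dict win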
16:36 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed  https://review.openstack.org/510180
16:44 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed  https://review.openstack.org/510180
16:48 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Provide error message on malformed job list  https://review.openstack.org/510185
16:49 <mordred> jeblair, dmsimard, fungi: ^^ that should fix the syntax error reporting for list instead of dict from https://review.openstack.org/#/c/508822/2
16:54 <jeblair> mordred: the chainmap thing works with voluptuous? neat
16:54 <jeblair> mordred: oh i see, you dict() the result
16:54 <jeblair> that makes sense
16:55 <jeblair> we can probably use that later to reduce the duplication in the project-template and project schemas
16:57 <mordred> jeblair: ++
17:09 *** openstack has joined #zuul
17:09 *** ChanServ sets mode: +o openstack
17:19 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Provide error message on malformed job list  https://review.openstack.org/510185
17:50 *** electrofelix has quit IRC
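A sketch of the ChainMap trick being referred to, using hypothetical attribute sets: dict() flattens the ChainMap so voluptuous receives a plain dict schema, which is also how the project and project-template schemas could later share their common job attributes.

    from collections import ChainMap
    import voluptuous as vs

    job_attributes = {'voting': bool, 'timeout': int}   # shared job attributes (made up)
    project_only = {'merge-mode': str}                  # project-specific extras (made up)

    project_schema = vs.Schema(dict(ChainMap(project_only, job_attributes)))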
17:56 <dmsimard> pabelanger, jeblair, mordred: I just realized that we're likely not setting up unbound in v3 as it was configured in the v2 ready-script
17:56 <dmsimard> I'll put it on my todo, I found it while troubleshooting a centos unbound issue
17:56 <jeblair> dmsimard: add it to the jobs section of the etherpad?
17:56 <dmsimard> yup
18:07 <mordred> dmsimard: great catch - thanks!
18:14 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Do not satisfy min-ready requests if at capacity  https://review.openstack.org/510085
18:15 <pabelanger> Yay
18:31 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Create git_http_low_speed_limit / git_http_low_speed_time  https://review.openstack.org/509893
18:50 <openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Have zuul re-run ABORTED jobs  https://review.openstack.org/510211
18:51 <pabelanger> jeblair: mordred: I think that fixes our aborted jobs issue ^. If so, I can see about writing a test too
19:14 <pabelanger> and fixing tox issues now
19:15 <jeblair> pabelanger: see my comment on the change
19:15 <jeblair> i need to grab lunch now
19:15 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle non-syntax errors from Ansible  https://review.openstack.org/510219
19:34 <mordred> jeblair, pabelanger: as a follow-up to that - earlier when looking at restarting executors there was a comment about jobs not getting re-run if the executor was hard-killed rather than stopped gracefully - can we also detect that in the client and retry? It seems like the executor going away unexpectedly should *always* result in a retry, no?
19:38 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Retry jobs on executor disconnect  https://review.openstack.org/510223
19:39 <mordred> jeblair, pabelanger: something like that ^^ although I'm not super sure how to test it ATM
19:44 <fungi> mordred: that's separate from what 510211 is attempting to address, i guess?
19:45 <fungi> mordred: by hard-killed you mean sigkill or sigsegv instead of sigterm?
19:45 <fungi> so the executor doesn't get sufficient time to communicate the abort?
19:46 *** hashar has joined #zuul
19:48 <mordred> yah
19:48 <mordred> or, heck, the cloud decides to kill the VM on which the executor is running
19:48 <pabelanger> will look in a minute, about to push an update for 510211
19:48 <mordred> the 'executor went away and we don't know why' case
19:53 <fungi> makes sense
19:53 <fungi> thanks
19:55 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Handle double node locking snafu  https://review.openstack.org/509603
19:56 <openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Have zuul re-run ABORTED jobs  https://review.openstack.org/510211
19:57 <pabelanger> okay, v2 of the abort change; however, I think one disk monitor job is failing
19:57 <pabelanger> might need help on how to properly handle that
19:59 <openstackgerrit> Merged openstack-infra/zuul-jobs master: Set zuul_log_path for a periodic job  https://review.openstack.org/509384
20:11 <jeblair> mordred: the discussion earlier about re-running jobs when restarting executors is exactly what pabelanger is working on
20:11 <mordred> jeblair: what's the difference between "ABORTED" and "DISCONNECTED"?
20:12 <jeblair> mordred: iow, when an executor is stopped (via some method other than killall -9), it goes through the code path pabelanger is touching
20:12 <jeblair> mordred: where does DISCONNECT show up? afaict, you're adding it in your change
20:13 <mordred> jeblair: sorry - I meant what's the difference between result=='ABORTED' and onDisconnect - but I believe I understand now
20:13 <jeblair> mordred: ya, so the case you're looking at is important to consider too -- it's the unclean shutdown case
20:13 <mordred> jeblair: pabelanger's change is to handle the clean shutdown case, mine is the unclean one
20:13 <mordred> yah
20:13 <jeblair> mordred: however, that should be handled by the current "result is None" case
20:15 <mordred> cool - so the only difference would be the suggestion that we don't count executor restarts (clean or unclean) against a job's retry count
20:15 <jeblair> mordred: ya; considering how things have changed in v3, if you want to push for it on that basis, i could see that
20:15 * mordred can abandon his change, didn't quite track in the brain that the None case would handle it - but now that you say it, it totally makes sense and should have been obvious :)
20:16 <pabelanger> jeblair: so, should I clean up the test_disk_accountant_kills_job test to handle the proper abort retry logic now? http://logs.openstack.org/11/510211/2/infra-check/tox-py35/d6bc444/testr_results.html.gz Or do you have any other suggestion for that use case?
20:17 <pabelanger> we'd abort 3 times on that now
20:17 <pabelanger> eventually hitting retry_limit
20:18 <jeblair> hrm, i think if we hit the disk limit, we should actually return an error other than ABORTED.  it's not a retryable error really
20:18 <jeblair> lemme look
20:18 <pabelanger> kk
20:19 <mordred> jeblair: agree
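An illustrative sketch (not Zuul's actual code) of the two cases being distinguished here, assuming the scheduler sees a result of None when an executor vanishes uncleanly and 'ABORTED' when one is stopped cleanly mid-build:

    # hypothetical names, only meant to show the two retry paths
    def on_build_completed(build, result):
        if result is None:        # executor disappeared mid-build (unclean shutdown)
            return 'retry'
        if result == 'ABORTED':   # executor stopped cleanly while the build was running
            return 'retry'
        return result             # normal SUCCESS/FAILURE handling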
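A sketch of the direction suggested here, with a hypothetical result name: running out of disk is reported as its own non-retryable result instead of ABORTED, so the build fails once rather than being retried up to retry_limit.

    RETRYABLE = {None, 'ABORTED'}        # lost or cleanly-stopped executor

    def handle_result(build, result):
        if result == 'DISK_FULL':        # hypothetical name for the distinct result
            build.result = 'DISK_FULL'   # fail immediately, do not retry
        elif result in RETRYABLE:
            build.retry = True
        else:
            build.result = result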
20:20 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Always retry jobs on executor disconnect  https://review.openstack.org/510223
20:20 <mordred> jeblair, pabelanger: ^^ rebased that on top of pabelanger's change and included ABORTED in the always-retry logic
20:22 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Always retry jobs on executor disconnect  https://review.openstack.org/510223
20:22 <mordred> except this time without sucking
20:24 <pabelanger> mordred: wouldn't that mean if a job kept aborting, it would run forever?
20:24 <jlk> hey all, I'm out at SeaGL today, sorry for not responding to anything. But I had lunch with Clark :)
20:24 <pabelanger> say hi!
20:29 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Return a distinct result on executor disk full  https://review.openstack.org/510227
20:29 <jeblair> mordred, pabelanger: maybe base your 2 changes on that?
20:33 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds  https://review.openstack.org/509653
20:33 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed  https://review.openstack.org/510180
20:35 <openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Have zuul re-run ABORTED jobs  https://review.openstack.org/510211
20:35 <openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Always retry jobs on executor disconnect  https://review.openstack.org/510223
20:35 <pabelanger> done
20:35 <jeblair> that's some teamwork
20:37 <pabelanger> jeblair: mordred: quick question on 510223
20:38 <jeblair> mordred: on 510185 what was "None" needed for?
20:40 <jeblair> mordred: oh, i see it broke that test.  i'm kind of inclined to go with PS1 and fix the test.  i think that was an accident, and i think it's better not to have a ':' unless needed (that's the behavior elsewhere at least)
20:52 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds  https://review.openstack.org/509653
20:52 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed  https://review.openstack.org/510180
20:55 <pabelanger> so, if I see the following request from nodepool-launcher
20:55 <pabelanger> | 200-0000309834 | requested | zuulv3 | centos-7 | | nl02.openstack.org-10797-PoolWorker.infracloud-vanilla-main,nl02.openstack.org-10797-PoolWorker.citycloud-lon1-main,nl02.openstack.org-10797-PoolWorker.citycloud-sto2-main,nl02.openstack.org-10797-PoolWorker.infracloud-chocolate-main,nl02.openstack.org-10797-PoolWorker.citycloud-la1-main
20:55 <pabelanger> does that mean it is waiting to launch 1 centos-7 on the clouds listed there?
20:58 <pabelanger> the other question: it seems nl02 has many more requests assigned to it than nl01 (2562 vs 60)
20:58 <pabelanger> trying to see why that is
21:02 <jeblair> pabelanger: what's the heading for that column?
21:03 <pabelanger> jeblair: I think declined by?
21:03 <pabelanger> okay, that helps
21:03 <jeblair> pabelanger: so that's a negative.  those are launchers which have decided not to handle the request
21:05 <jeblair> pabelanger: where are you seeing the 2562 vs 60 numbers?
21:05 <jeblair> launchers shouldn't accept more requests than they can handle, and 2562 is definitely more than nl01 can handle
21:07 <pabelanger> jeblair: I might be doing it incorrectly, but I ran: sudo -H -u nodepool nodepool request-list | grep nl01 | wc -l vs sudo -H -u nodepool nodepool request-list | grep nl02 | wc -l
21:07 <pabelanger> and was trying to see if the requests were somehow split across launchers
21:07 <pabelanger> however, I admit, request-list is still new to me
21:08 <jeblair> pabelanger: does it include declined-by?
21:08 <pabelanger> yah
21:08 <pabelanger> I started looking at it because we have 1 centos-7 ready node
21:08 <pabelanger> and was trying to see why it wasn't getting used
21:09 <pabelanger> Oh, something has grabbed it now
21:09 <pabelanger> jeblair: okay, so nl01 is *declining* more requests than nl02.  which may still be undesirable, but not what i was expecting when you first brought it up.
21:10 <pabelanger> ok
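A rough python equivalent of the grep|wc pipeline above, assuming "nodepool request-list" prints one request per line with the declined-by launchers in one of its columns; it only counts lines mentioning each launcher, so the numbers reflect how often a launcher appears (here, mostly in declined-by), not how many requests it accepted.

    import collections
    import subprocess

    out = subprocess.check_output(['nodepool', 'request-list'],
                                  universal_newlines=True)
    counts = collections.Counter()
    for line in out.splitlines():
        for launcher in ('nl01', 'nl02'):
            if launcher in line:
                counts[launcher] += 1
    print(counts)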
21:13 <pabelanger> I just saw this in the debug log
21:13 <pabelanger> 2017-10-06 21:12:01,739 DEBUG nodepool.driver.openstack.OpenStackNodeRequestHandler[nl02.openstack.org-10797-PoolWorker.citycloud-lon1-main]: Declining node request 200-0000310062 because it would exceed quota
21:13 <pabelanger> so, maybe a side effect of the quota being split over 2 nodepools?
21:20 <pabelanger> Oh, I know
21:21 <pabelanger> we have infracloud disabled in nl02
21:21 <pabelanger> so, it will always decline
21:21 <pabelanger> same goes for citycloud
21:21 <pabelanger> jeblair: ^
21:21 <clarkb> I think citycloud we might be able to turn back on again if the az thing is fixed?
21:21 <clarkb> fungi: ^
21:22 <pabelanger> ya, infracloud can also be enabled again
21:22 <clarkb> and possibly we can turn on infracloud now that zuul cpu use is saner
21:22 <pabelanger> we did on nodepool.o.o, but not nl02
21:22 <pabelanger> ya
21:23 <pabelanger> okay, EOD for me now
21:23 <fungi> clarkb: the assumption being that the citycloud failures were related to us booting in their starved "nova-local" ssd az?
21:23 <clarkb> fungi: ya, or at least we knew we had a problem there that we should've fixed
21:24 <fungi> worth a retry i s'pose
21:28 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig  https://review.openstack.org/509912
21:30 <jeblair> i'm going to self-approve some changes after rebase
21:30 <jeblair> i'm self-approving 509653 which skips the test as we discussed earlier
21:31 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update node requests after nodes  https://review.openstack.org/509571
21:32 <jeblair> fungi, mordred, clarkb: can you +2/+3 508698
21:35 <fungi> lgtm
21:36 <fungi> also, i have a fair amount of confidence in SpamapS's review on it as well
21:39 <jeblair> mordred: i'm pretty sure i wrote a filename other than "sub_nodes" in the etherpad but it looks like you deleted it and replaced it with that
21:39 <jeblair> i agree that sub_nodes is used and we should add it, but i'm also pretty sure i was deliberate and correct about the thing i put in there
21:40 <jeblair> i will look through the history and try to find out what it was
21:41 <jeblair> okay, it was node_private
21:41 <jeblair> i will put that back
21:42 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds  https://review.openstack.org/509653
21:42 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Don't load dynamic layout twice unless needed  https://review.openstack.org/510180
21:45 <jeblair> i'm going to self-approve 509571 which was previously approved before a rebase
21:45 <jeblair> it also has 3 other +2s
21:46 <jeblair> 509912 is ready for review now
21:46 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Add base job and roles for javascript  https://review.openstack.org/510236
21:47 <jeblair> i'm going to pick up the stable branch depends-on bug now
21:48 <mordred> jeblair: ok. so - the tripleo patch that was linked to didn't use node_private anywhere but instead used sub_nodes throughout
21:48 <mordred> jeblair: so I think we were looking at different things perhaps?
21:48 <jeblair> mordred: i saw node_private in the tht patches
21:50 <mordred> jeblair: k. https://review.openstack.org/#/c/508660/ is what I was looking at ... could you point me to what you're looking at? (I don't doubt you - but if we're seeing different things then I worry that there may be other things missed too)
21:51 <jeblair> mordred: https://review.openstack.org/509704
21:53 <mordred> jeblair: thanks!
21:54 <jeblair> np, sorry i didn't leave enough breadcrumbs
21:55 <mordred> jeblair: re: None in 510185 - I can go either way - whichever you feel is better
21:56 <jeblair> mordred: okay.  i'm feeling fond of PS1 and no support for none.  i have a moderate (but not strong) feeling that it will encourage tidy layouts and reduce typo errors.
21:57 <mordred> jeblair: okie - I'll update!
21:58 <jeblair> (i'm also certain it will result in more errors to users, but i think it's worth it since it points out a potential typo)
21:59 <jeblair> it's one of those "maybe you just forgot to remove a colon, or maybe you forgot to add something important" things
22:01 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Provide error message on malformed job list  https://review.openstack.org/510185
22:01 <mordred> jeblair: there ya go ^^
22:03 <jeblair> w00t
22:04 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Update node requests after nodes  https://review.openstack.org/509571
22:07 <jeblair> mordred: good news and bad news.
22:07 <jeblair> mordred: good news: we don't have a test for depends-on with the same id on multiple branches.
22:07 <jeblair> mordred: bad news: i added one and it passed.
22:07 <jeblair> i even inspected the inventory file, and the stable branch change shows up in the items list
22:08 <jeblair> so i'm going to have to dig deeper on that.
22:09 <mordred> jeblair: I agree - that is both good news and bad news
22:11 <jeblair> i approved mordred's 2 outstanding zuul changes; i think when they land we should restart all components of zuul and zookeeper and nodepool
22:12 <mordred> ++
22:12 <jeblair> i have to run now; anyone who feels up to it, feel free to do that restart when 510185 and 510219 land
22:15 <mordred> dmsimard: heya - you around?
22:15 <dmsimard> mordred: not for long
22:15 <mordred> dmsimard: well ... maybe you can answer super-quick
22:15 <mordred> dmsimard: I'm looking at the tox-linters job for openstack-zuul-jobs and the linters env in tox.ini
22:16 <mordred> it has this:
22:16 <mordred>   ANSIBLE_ROLES_PATH = {toxinidir}/roles:{envdir}/src/zuul-jobs/roles
22:16 <mordred> but I don't know where it's getting that zuul-jobs from
22:16 <mordred> do you?
22:17 <dmsimard> iirc required-projects gets automatically added to the role path or something to that effect
22:17 <dmsimard> or perhaps roles:
22:17 <mordred> AH - I see it ...
22:17 <mordred> -e git://git.openstack.org/openstack-infra/zuul-jobs#egg=zuul-jobs
22:17 <dmsimard> https://docs.openstack.org/infra/zuul/feature/zuulv3/user/config.html#attr-job.roles
22:17 <mordred> in test-requirements.txt
22:17 <dmsimard> "Roles are added to the Ansible role path in the order they appear on the job – roles earlier in the list will take precedence over those which follow."
22:18 <mordred> dmsimard: yah - this is on the test node itself running linters
22:18 <mordred> dmsimard: I've got an ozj patch with a depends-on on a zj patch but it's not getting set up properly ...
22:18 <dmsimard> mordred: link?
22:19 <mordred> although you know what - tox_install_siblings should be able to totally fix this for me :)
22:19 <mordred> http://logs.openstack.org/37/510237/1/infra-check/tox-linters/382fe42/job-output.txt.gz
22:19 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Handle non-syntax errors from Ansible  https://review.openstack.org/510219
22:19 <mordred> dmsimard: gonna see if the magic in the tox job will take care of it ^^
22:19 <mordred> wait ... I mean: remote:   https://review.openstack.org/510237 Add javascript tarball publication job
22:19 <mordred> :)
22:20 <dmsimard> fwiw that patch makes sense :p
22:21 <mordred> heh
22:26 <mordred> dmsimard: btw - I like the "POST-RUN END RESULT_NORMAL: [untrusted : git.openstack.org/openstack-infra/zuul-jobs/playbooks/tox/post@master]" lines
22:27 <dmsimard> Aw yeah
22:27 <dmsimard> Much better than the non-formatted strings we had this morning
22:27 * dmsimard coughs
22:27 <dmsimard> afk food
22:27 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Provide error message on malformed job list  https://review.openstack.org/510185
22:50 *** hashar has quit IRC
23:45 *** harlowja has quit IRC
23:47 *** docaedo has joined #zuul
