Thursday, 2019-01-03

*** bhavikdbavishi has joined #zuul02:35
*** rlandy|rover|bbl is now known as rlandy|rover02:46
*** rlandy|rover has quit IRC03:20
*** bhavikdbavishi has quit IRC03:46
*** bhavikdbavishi has joined #zuul03:46
*** bhavikdbavishi has quit IRC03:48
*** bhavikdbavishi has joined #zuul03:48
*** bjackman has joined #zuul04:40
*** quiquell|off is now known as quiquell07:18
*** bjackman has quit IRC08:52
*** bjackman has joined #zuul08:54
bjackmanCould anyone guess how I might have ended up with stuff in my Zuul queue that appears to be "running" in the web UI, but that has nothing in its logs, and all I can see in the executor logs is stuff from the merger?09:48
bjackmanNone of my jobs seem to start _really_ running. I can see that nodepool satisfied some node requests but then seems to have freed them up again09:49
*** sshnaidm|afk is now known as sshnaidm09:52
*** bhavikdbavishi has quit IRC10:13
tristanCbjackman: iirc this can happen when executor hosts are busy, are they accepting jobs?10:23
bjackmantristanC, the scheduler was printing the "Received handle " messages, so I guess so?10:52
bjackmanThe blockage just suddenly cleared up so I suspect that zuul wasn't actually locked up, my infra was just very slow..10:52
bjackmanI don't know what could cause such long delays though10:53
*** dkehn has quit IRC11:04
*** bjackman has quit IRC12:01
*** bhavikdbavishi has joined #zuul12:29
*** rlandy has joined #zuul12:42
*** rlandy is now known as rlandy|rover12:43
*** bjackman has joined #zuul12:50
*** bhavikdbavishi has quit IRC12:54
*** bhavikdbavishi has joined #zuul13:01
*** quiquell is now known as quiquell|brb13:15
*** ianychoi has joined #zuul13:17
*** quiquell|brb is now known as quiquell13:35
*** bhavikdbavishi has quit IRC13:44
*** bjackman has quit IRC13:44
ShrewstristanC: do you think this is a transient error, or something more concerning? http://logs.openstack.org/42/621642/3/check/nodepool-functional-k8s/b9aef79/job-output.txt.gz#_2019-01-02_20_49_34_020284 13:56
sshnaidmdo you know if releasing external resources feature is planned (that was discussed here  http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-December/000653.html)13:58
sshnaidmor worth to create a task/story/whatever to track it?13:59
fungisshnaidm: bjackman seems to have disappeared in the last few minutes, but was the initiator of that discussion. i'm not sure whether he's working on it. a story in sb would be good if there isn't one already13:59
sshnaidmI didn't find a story and created this one: https://storyboard.openstack.org/#!/story/2004694 14:05
sshnaidmfungi, ^^14:05
sshnaidmplease feel free to comment/correct it if something is wrong14:05
fungithanks sshnaidm!14:06
*** dkehn has joined #zuul14:16
*** bhavikdbavishi has joined #zuul15:10
*** sshnaidm has quit IRC15:15
*** bhavikdbavishi has quit IRC15:16
*** sshnaidm has joined #zuul15:26
*** quiquell is now known as quiquell|off15:28
*** bhavikdbavishi has joined #zuul15:41
*** bjackman has joined #zuul15:54
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool  https://review.openstack.org/621642 16:05
*** sshnaidm has quit IRC16:15
corvusi'm looking at two potential zuul bugs this morning -- 1) a bug that fungi and Shrews found where a change is stuck in the queue because zuul doesn't know what jobs are attached to it.  and 2) we seem to be performing tenant reconfiguration events very often in response to actions on ansible pull requests.  i haven't started looking into that one at all.16:22
Shrewscorvus: i'm very interested in the outcome of #1 and how we get to that state. it's obviously something we protect against for some reason16:25
*** sshnaidm has joined #zuul16:28
Shrewsthat last sentence should read: "it's obviously something we expect since we protect against it"16:29
*** sshnaidm_ has joined #zuul16:32
*** bjackman has quit IRC16:33
*** sshnaidm has quit IRC16:35
*** sshnaidm__ has joined #zuul16:35
*** sshnaidm has joined #zuul16:37
*** sshnaidm_ has quit IRC16:38
*** sshnaidm__ has quit IRC16:40
*** sshnaidm_ has joined #zuul16:40
*** sshnaidm has quit IRC16:43
corvusand #3 -- there's an unhandled exception in the reviseRequest method; it's falling back on the general scheduler exception handler.16:54
openstackgerritMerged openstack-infra/zuul master: Switch back to three columns for mid sized screens  https://review.openstack.org/627997 17:02
*** bjackman has joined #zuul17:09
corvusShrews: any chance you have a few mins to track down the reconfiguration events in the logs and find out what the actual events are? (i know they are mostly about pull requests -- but are they pr's merged, opened, commented?)17:19
corvusShrews: the current description of the problem is "something about ansible pull requests are causing reconfigs"; i'm hoping if we refine that into "ansible pull requests cause reconfigs when <blank>" and it makes the problem obvious :)17:20
*** bjackman has quit IRC17:22
Shrewscorvus: will see what i can dig up for you17:22
corvusShrews: "reconfiguration" is the best thing to grep for17:23
Shrewsossum17:23
corvus?reconfiguration17:23
*** panda is now known as panda|off17:27
*** sshnaidm__ has joined #zuul17:31
*** sshnaidm_ has quit IRC17:34
Shrewscorvus: omg17:35
corvusyeah, it's a few, huh? :)17:35
Shrewsumm, it would be pretty bad if every PR change to ansible/ansible (even if it doesn't touch openstack files) triggered reconfigs, eh?17:35
corvusShrews: yeah, it would be a little bit of wasted work.  :)17:36
Shrewsb/c this change triggers the first reconfig in the current log file: https://github.com/ansible/ansible/pull/50435/commits/503561a54bdf3c551683278615f1e6e198b74744 17:36
corvusconsidering it takes about 40-60 seconds of dedicated computation to do a reconfig.17:37
Shrewsthat touches nothing of concern to us17:37
Shrewspardon the paste17:38
Shrews2019-01-03 06:27:10,348 INFO zuul.Pipeline.openstack.check: Adding change <Change 0x7f5674666b70 ansible/ansible 50435,503561a54bdf3c551683278615f1e6e198b74744> to queue <ChangeQueue check: ansible/ansible> in <Pipeline check>17:38
Shrews2019-01-03 06:27:10,350 INFO zuul.Pipeline.openstack.third-party-check: Adding change <Change 0x7f5674666b70 ansible/ansible 50435,503561a54bdf3c551683278615f1e6e198b74744> to queue <ChangeQueue third-party-check: ansible/ansible> in <Pipeline third-party-check>17:38
Shrews2019-01-03 06:27:10,599 INFO zuul.Scheduler: Tenant reconfiguration beginning for openstack due to projects {(<Project ansible/ansible>, 'devel')}17:38
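The grep for "reconfiguration" suggested above can be turned into a small tally of which project and branch pairs trigger reconfigurations. The following is a minimal sketch only: the regular expressions are assumed from the single scheduler line pasted above, and `count_reconfigs` is a hypothetical helper name, not part of Zuul.

```python
import re
from collections import Counter

# Both patterns are inferred from the one sample line in the log above;
# the real scheduler log format may differ in other cases.
RECONFIG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO zuul\.Scheduler: "
    r"Tenant reconfiguration beginning for (?P<tenant>\S+) "
    r"due to projects \{(?P<projects>.*)\}$"
)
PROJECT_RE = re.compile(r"<Project (?P<name>[^>]+)>, '(?P<branch>[^']+)'")


def count_reconfigs(lines):
    """Count reconfiguration starts per (project, branch) pair."""
    counts = Counter()
    for line in lines:
        match = RECONFIG_RE.match(line.strip())
        if not match:
            continue  # not a "reconfiguration beginning" line
        for proj in PROJECT_RE.finditer(match.group("projects")):
            counts[(proj.group("name"), proj.group("branch"))] += 1
    return counts


sample = [
    "2019-01-03 06:27:10,348 INFO zuul.Pipeline.openstack.check: Adding change ...",
    "2019-01-03 06:27:10,599 INFO zuul.Scheduler: Tenant reconfiguration "
    "beginning for openstack due to projects "
    "{(<Project ansible/ansible>, 'devel')}",
]
print(count_reconfigs(sample))
```

Feeding a whole scheduler log through `count_reconfigs` would show at a glance whether one project dominates the reconfiguration count.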
corvuson the other bug -- i'm developing a theory that we're seeing the change in question move in and out of the active window during reconfiguration; if the change moves out of the active window during a reconfiguration, it may no longer have any jobs attached to it, and if there were builds running for those jobs, we'd reject their results; however i'm not sure why we wouldn't restart them when they move into17:40
corvusthe active window17:40
corvusShrews: so was that a PR open event?17:40
Shrewscorvus: not sure how to tell17:42
Shrewsi think so, based on the first ansibot comment on that pr17:43
corvusShrews: maybe line up timestamps?  if all else fails, you can backtrack to the event delivery and look up the event in github.17:43
*** panda|off has quit IRC17:44
Shrewsthe "about a day ago" timestamp on the PR does not help, but it does reference that commit #17:45
corvusShrews: if you mouseover you'll get a real time17:46
*** panda has joined #zuul17:48
SpamapSpabelanger: are you still looking for nginx configs that work btw? I'm revisiting ours to fix some problems.17:49
pabelangerSpamapS: Yah, I'd be happy to also try them out17:52
pabelangerI haven't had much time to dig into it just yet17:52
Shrewscorvus: our logs indicate it was a PR changed event17:57
Shrewsi could not line up the PR open to our timestamp17:57
corvusShrews: so adding commits to the pr?17:57
Shrews2019-01-03 06:27:10,347 DEBUG zuul.Scheduler: Submitting tenant reconfiguration event for openstack due to event <GithubTriggerEvent 0x7f5674666a90 pull_request ansible/ansible refs/pull/50435/head changed github.com/ansible/ansible 50435,503561a54bdf3c551683278615f1e6e198b74744 delivery: 97dffdc0-0f20-11e9-80c1-aefc073e53ea> in project ansible/ansible17:58
corvusShrews: are they all like that?17:58
Shrewsthat commit was to fix the ansible lint errors, so that seems correct17:59
Shrewscorvus: will look at the next one17:59
Shrewscorvus: and that timestamp does line up, so yes, a PR change event18:01
Shrewsok, now on to the next18:01
Shrewscorvus: hrm, looks like a status change (possibly labels being removed/added?) on that same change triggers another reconfigure18:05
Shrewsseconds after the previous18:06
corvusok not just the one type of event then18:06
Shrewsand for irrelevant PRs18:07
Shrewsso multiple issues i guess18:07
corvusShrews: based on that, i wonder if this is what is causing it?  http://git.zuul-ci.org/cgit/zuul/tree/zuul/scheduler.py#n1058 18:11
corvusfungi, Shrews: i think i've traced bug #1 -- a change has to be in the active window, make node requests, leave the active window, be subject to a tenant reconfiguration and re-enqueue, then the earlier node requests complete.  at that point zuul has lost track of the jobs for the change because it is no longer active, so it returns the nodes, but does not clear the record of the node request.  when the18:37
openstackbug 1 in Ubuntu Malaysia LoCo Team "Microsoft has a majority market share" [Critical,In progress] https://launchpad.net/bugs/1 - Assigned to MFauzilkamil Zainuddin (apogee)18:37
corvuschange enters the active window again, it still thinks there's an outstanding node request and does nothing.18:37
corvusalways helpful that one :)18:37
corvusit's going to take me a little bit to make up the test case for that18:39
fungihrm, so how does a change go from within the active window to outside it?18:40
corvus(and i think bug number 3 is related here -- the sheer number of reconfigurations we are undergoing has served to increase the odds of hitting that case)18:40
fungiwindow shrinkage?18:40
corvusfungi: the active window shrinks.  yeah18:40
Shrewscorvus: oh how fun18:40
corvusit shrinks by half on a failure in our config18:40
fungicool, that's an obscure set of preconditions18:41
corvusfungi: yeah, most of which we test individually, but not together :)18:41
clarkbopenstack is a really good beta user18:41
clarkb:P18:41
corvusalso, worth noting that you have to have an active window > the minimum for this to happen. also a rare condition for us.18:42
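The sequence corvus describes for the stuck change can be replayed with a toy model. This is a sketch under stated assumptions: `ToyItem`, `request_nodes`, `on_nodes_provisioned` and `replay` are hypothetical names invented for illustration, and the `clear_stale_record` switch only illustrates the missing cleanup, not the actual fix that was later proposed.

```python
class ToyItem:
    """Hypothetical stand-in for a queue item; not Zuul's real class."""

    def __init__(self, jobs):
        self.jobs = set(jobs)        # jobs currently attached to the item
        self.pending_requests = {}   # job name -> node request id
        self.nodes = {}              # job name -> assigned nodeset


def request_nodes(item, job, request_id):
    """Submit a node request unless one is already recorded for the job."""
    if job in item.pending_requests or job in item.nodes:
        return False
    item.pending_requests[job] = request_id
    return True


def on_nodes_provisioned(item, job, nodeset, clear_stale_record):
    """Handle a fulfilled request; hand nodes back if the job is unknown."""
    if job not in item.jobs:
        # The item lost its jobs while outside the active window.
        if clear_stale_record:
            item.pending_requests.pop(job, None)
        return "returned"
    item.pending_requests.pop(job, None)
    item.nodes[job] = nodeset
    return "accepted"


def replay(clear_stale_record):
    item = ToyItem(["unit-tests"])
    request_nodes(item, "unit-tests", request_id=1)  # inside active window
    item.jobs.clear()              # window shrinks, reconfig re-enqueues
    on_nodes_provisioned(item, "unit-tests", "node-a", clear_stale_record)
    item.jobs.add("unit-tests")    # change re-enters the active window
    return request_nodes(item, "unit-tests", request_id=2)


print(replay(clear_stale_record=False))  # no new request: item is stuck
print(replay(clear_stale_record=True))   # a new request can be submitted
```

In the first replay the stale record blocks any new request, which matches the "still thinks there's an outstanding node request and does nothing" symptom.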
clarkbcorvus: something about not merging a bunch of broken code over the holidays helping with that :)18:45
corvus\o/ i have a reproducing test case19:02
corvusthe exception i noted as bug 2 is directly related to bug 1, and i don't think will happen once bug 1 is fixed; so i think i'm going to decline to "fix" that for now, as it might just mask other error conditions later.19:13
openstackbug 1 in Ubuntu Malaysia LoCo Team "Microsoft has a majority market share" [Critical,In progress] https://launchpad.net/bugs/1 - Assigned to MFauzilkamil Zainuddin (apogee)19:13
corvusi'm going to have to call these bug A,B,C aren't i?19:13
*** bhavikdbavishi has quit IRC19:25
Shrewsi prefer alpha, beta, charlie...19:26
Shrewscorvus: so that scheduler code you pointed to... not quite sure how that would be triggered. would be nice if we had some logging to indicate which condition actually triggered the reconfigure there20:00
Shrewsgrr, gotta afk for a bit20:19
corvusShrews: we don't load any config from the ansible repo, so the second clause (line 1063-1064) should always be true.  so it sounds like 1061-1064 reduces to "if an event has a branch, reconfigure".  especially since i don't think we set "exclude unprotected branches" which means the carveout on lines 1069-1071 won't apply either.20:35
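The reduction corvus describes can be written out as a toy predicate. The real scheduler code is not quoted in this log, so this is only a reading of the description: `needs_reconfig` and the `cache` dictionary are hypothetical, and the empty placeholder entry mirrors the idea in the title of the change proposed later in the log ("Cache empty branch config to prevent spurious reconfig") rather than its actual implementation.

```python
def needs_reconfig(event_branch, cached_branch_config):
    """Toy version of the check as described in the chat.

    An event that carries a branch, for which no branch config is
    cached, is treated as if a new branch had appeared.
    """
    return event_branch is not None and cached_branch_config is None


# Hypothetical cache: (project, branch) -> parsed config.
cache = {}
key = ("ansible/ansible", "devel")

# No config is loaded from the project, so nothing is ever cached and
# every event with a branch looks like a new branch.
print(needs_reconfig("devel", cache.get(key)))

# Storing an empty placeholder makes the lookup return something other
# than None, so the same event no longer triggers a reconfiguration.
cache[key] = {}
print(needs_reconfig("devel", cache.get(key)))
```

The key point is the distinction between "no config known" (None) and "known to be empty" (an empty entry).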
*** hashar has joined #zuul20:38
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Remove unecessary finally clauses  https://review.openstack.org/628299 21:08
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Fix items stuck in queue pending node requests  https://review.openstack.org/628300 21:08
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Be more aggressive in canceling node requests  https://review.openstack.org/628301 21:08
corvusokay, that's the outcome of looking into bug alpha (and bravo)21:08
corvusShrews, fungi, clarkb: ^21:09
clarkbthe first one is an easy review. I'll have to dig into the other two after I eat21:11
corvusi'm going to try to work on a test case for bug charlie now21:15
mordredcorvus: I like the assertTrue(len(self.builds), 4)21:18
corvusmordred: it's why my tests are so reliable21:18
mordred++21:19
mordredoh - I think in that case true is for len(self.builds) - and 4 becomes the message it will display if len(self.builds) is false right?21:19
corvusyeah i think so21:19
mordredassertion failed: 421:19
corvusso it would still catch len(self.builds)==0.  but that's not an expected failure case... more likely is 2 instead of 4.21:20
mordredyah21:20
mordredwhich would happily still be true21:21
mordredtype inference is great until it isn't21:21
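The assertTrue pitfall discussed above is easy to demonstrate with the standard unittest module: the second positional argument of `assertTrue` is the failure message, not an expected value. The test class and the two-element `builds` list below are made up for illustration.

```python
import unittest


class AssertDemo(unittest.TestCase):
    def test_assert_true_ignores_expected(self):
        builds = ["a", "b"]
        # Passes with only 2 builds: 4 is just the failure message, and
        # len(builds) == 2 is truthy.
        self.assertTrue(len(builds), 4)

    def test_assert_equal_checks_count(self):
        builds = ["a", "b"]
        # assertEqual compares the values, so 2 != 4 raises.
        with self.assertRaises(AssertionError):
            self.assertEqual(len(builds), 4)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertDemo)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Both tests pass, which shows that the `assertTrue` form would only catch an empty list, while `assertEqual(len(builds), 4)` catches any wrong count.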
*** hashar has quit IRC21:37
Shrewscorvus: oh, i see. i clearly did not understand the 2nd clause21:48
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Cache empty branch config to prevent spurious reconfig  https://review.openstack.org/628307 21:48
corvusShrews, tobiash: ^ i think that's the fix for the reconfiguration bug21:49
corvusfbo: ^21:49
mordredcorvus: you have angered the pep8 gods: http://zuul.openstack.org/build/bfd4140565b84c57af7e75494c9faba8 ... but the zuul dashboard helped me see the error more quickly than before22:03
clarkbthe first change in the stack failed py36 tests22:05
mordredsad panda22:07
corvusfalse negative; possibly due to limestone perf issues22:07
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Be more aggressive in canceling node requests  https://review.openstack.org/628301 22:08
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Cache empty branch config to prevent spurious reconfig  https://review.openstack.org/628307 22:08
corvusthat should take care of the pep8 thing22:08
clarkbcorvus: for deleting the node request, nodepool will notice the request has gone away and then stick the booted node for that into its pool for fulfilling the next request?22:19
corvusyep22:22
corvus(we do currently delete node requests if we're resetting an item (eg, a gate reset) so that code is exercised)22:23
Shrewscorvus: with 628307 in place, would we still catch the github edge case of going from protected to unprotected?22:38
Shrewsor vice versa i guess22:39
corvusShrews: i believe so -- i think the branch won't show up in the list of branches we get from the driver in that case22:39
corvus(so when it does show up (because the bit has flipped), it really will return None in the call to get the branch config in the scheduler -- that second clause we were looking at earlier)22:40
corvusbut we might want to double check that with tobiash  :)22:41
Shrewsyeah, i'm a bit lost on that portion of it, but the rest looks sane (and i learned things!)22:41
clarkbZuul ignores the unprotected branches?22:45
clarkbfor configuration that is22:45
corvusclarkb: it's an option22:45
Shrewshttps://zuul-ci.org/docs/zuul/admin/tenants.html#attr-tenant.untrusted-projects.<project>.exclude-unprotected-branches , i think22:46
clarkbI guess this situation only matters if we are ignoring unprotected branches otherwise we'd be tracking the config the entire time22:47
clarkbalso that is scoped to only when we are ignoring all config anyway22:49
clarkbso I think this is ok?22:49
Shrewscorvus: how does one differentiate between protected and unprotected branches in github?22:52
Shrewsis that a github setting?22:53
corvusShrews: we've reached the limits of my knowledge here, i just know it's a github thing.22:53
Shrews:)22:55
pabelangerShrews: yup, there are branch settings in github where you apply branch protection rules22:55
Shrewshttps://help.github.com/articles/defining-the-mergeability-of-pull-requests/ 22:55
Shrewsthat ^ describes it22:55
Shrewspabelanger: thx22:55
pabelangerwe just started to enable it on all ansible-network roles before the break22:56
Shrewscorvus: mainly wanted to verify that i wasn't missing a zuul-side config for it22:56
clarkbpabelanger: are those rules viewable by non admins somewhere?22:56
pabelangerclarkb: I don't believe so22:57
pabelangerclarkb: I think, if you are logged into github you will just see branch checks are required on a PR22:58
pabelangereg: https://github.com/ansible-network/cloud_vpn/pull/55 22:58
clarkbI don't see anything there indicating that22:58
clarkbso maybe we can't see that22:59
pabelangerRequired statuses must pass before merging, is the section I see22:59
pabelangerbut it is my PR22:59
openstackgerritMerged openstack-infra/zuul master: Remove unecessary finally clauses  https://review.openstack.org/628299 23:00
openstackgerritMerged openstack-infra/zuul master: Fix items stuck in queue pending node requests  https://review.openstack.org/628300 23:00
clarkbI don't see that (^F doesn't show it either)23:00
pabelangerclarkb: okay, so possible only owners / collaborators / teams see it in a PR, since they can only merge code23:01
openstackgerritMerged openstack-infra/zuul master: Be more aggressive in canceling node requests  https://review.openstack.org/628301 23:03
openstackgerritMerged openstack-infra/zuul master: Cache empty branch config to prevent spurious reconfig  https://review.openstack.org/628307 23:03
Shrewsour tenant config docs show the use of exclude-unprotected-branches with a gerrit source. that's not correct, is it?23:06
Shrewsthe subsequent docs for that option only mention github23:07
Shrewstobiash: maybe you know? ^^23:07
mordredShrews: I believe you are right - that is a github option - branch protections are a github concept generally23:08
Shrewsmordred: i'm allowed to be right once a year, so it's possible  :)23:09
Shrewsoh wow, i need to start dinner23:10
tobiashShrews: exclude unprotected branches is a tenant or project option in main.yaml and not directly related to a connection23:33
tobiashBut yes it only has an effect in the github driver23:34
tobiashShrews: do you have a link to that page? Maybe it needs a correction.23:35
Shrewstobiash: https://zuul-ci.org/docs/zuul/admin/tenants.html#tenant 23:37
tobiashShrews: ok, so that's an option on tenant or project level and has no effect on projects of a gerrit source23:38
tobiashAnd I'd suggest to set this to true on tenant level23:39
corvustobiash: if you could retro-review https://review.openstack.org/628307 i'd appreciate it :)23:52
tobiashcorvus: sure, I'll have a look tomorrow :)23:53
