*** GonZo2000_ has quit IRC | 00:04 | |
*** GonZo2000_ has joined #zuul | 00:07 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix paused handler exception handling https://review.openstack.org/575515 | 00:08 |
---|---|---|
pabelanger | just catching up on zuul commits this week, is there any more info on https://review.openstack.org/#/c/574788/ ? from the commit it seems like a potential security issue. Are we looking to create 3.0.4 to include it? | 00:10 |
tristanC | pabelanger: iirc, jeblair mentioned doing a 3.1.0 instead because of some other changes to the default behavior | 00:12 |
pabelanger | k, that would get ansible 2.5 too | 00:12 |
*** GonZo2000_ has quit IRC | 00:17 | |
*** rlandy is now known as rlandy|afk | 00:29 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: angular6 fix suggestion https://review.openstack.org/573494 | 01:30 |
*** rlandy|afk is now known as rlandy | 01:36 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Add support for Ansible extra-vars flag https://review.openstack.org/546474 | 02:10 |
tristanC | tobiash: not sure what's going on, but current master version of zuul is failing localhost command task with "msg": "zuul_log_id missing | 02:51 |
tristanC | with those zuul_json exceptions in logs: http://paste.openstack.org/show/723525/ | 02:52 |
tristanC | (i mean on my local deployment) | 02:56 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Add missing __init__.py files https://review.openstack.org/575591 | 03:09 |
tristanC | tobiash: nevermind, found the issue: the action-general directory was missing without 575591 | 03:10 |
*** rlandy has quit IRC | 03:27 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Add support for Ansible extra-vars flag https://review.openstack.org/546474 | 03:32 |
tobiash | pabelanger: yes, I also planned to write a mail about the security fix but that has fallen down because of the other ansible problems | 04:07 |
tobiash | tristanC: phew, my first thought was 'fuck, what did I miss' because the test cases worked | 04:08 |
*** gouthamr has quit IRC | 04:22 | |
*** dmellado has quit IRC | 04:23 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log https://review.openstack.org/574672 | 04:53 |
tristanC | tobiash: sorry about that :-) Thanks for the many fixes you worked on! | 05:01 |
tobiash | tristanC: pep8 doesn't like your change | 05:02 |
tobiash | we may need to change the directory name | 05:03 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Rename action-general to actiongeneral https://review.openstack.org/575617 | 05:09 |
tobiash | tristanC: ^ | 05:09 |
tobiash | that renames that directory and adds the __init__.py | 05:10 |
tobiash | I'll leave the other __init__.py files to you as they're unrelated | 05:10 |
openstackgerrit | Merged openstack-infra/zuul master: Add supercedent pipeline manager https://review.openstack.org/571932 | 05:15 |
*** pcaruana has quit IRC | 05:18 | |
*** gtema has joined #zuul | 05:34 | |
*** gtema has quit IRC | 06:09 | |
*** dmsimard has quit IRC | 06:13 | |
*** dmsimard has joined #zuul | 06:27 | |
*** gtema has joined #zuul | 06:41 | |
*** pcaruana has joined #zuul | 06:44 | |
*** swest has quit IRC | 07:33 | |
*** swest has joined #zuul | 07:34 | |
*** jpena|off is now known as jpena | 07:47 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: job: add ansible-tags and ansible-skip-tags attribute https://review.openstack.org/575672 | 07:50 |
*** hashar has joined #zuul | 07:51 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: zk: use kazoo retry facilities https://review.openstack.org/535537 | 08:18 |
*** threestrands has quit IRC | 08:20 | |
*** gouthamr has joined #zuul | 09:02 | |
*** electrofelix has joined #zuul | 09:03 | |
*** GonZo2000_ has joined #zuul | 09:20 | |
*** GonZo2000_ has quit IRC | 09:21 | |
tobiash | hrm, I'm currently trying to react on changes of the branch protection settings | 09:32 |
tobiash | but it seems that github doesn't offer any event for this | 09:32 |
tobiash | jlk: is that correct? | 09:32 |
tobiash | jlk: if yes, that would be an important feature in the long run | 09:33 |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: Add tenant yaml validation option to scheduler https://review.openstack.org/574265 | 09:47 |
*** egonzalez has joined #zuul | 10:18 | |
*** GonZo2000_ has joined #zuul | 10:27 | |
*** yolanda has quit IRC | 11:03 | |
*** yolanda has joined #zuul | 11:06 | |
*** jpena is now known as jpena|lunch | 11:12 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix paused handler exception handling https://review.openstack.org/575515 | 11:13 |
*** jpena|lunch is now known as jpena|off | 11:38 | |
*** jpena|off is now known as jpena | 12:05 | |
*** rlandy has joined #zuul | 12:19 | |
*** rlandy_ has joined #zuul | 12:46 | |
*** gouthamr has quit IRC | 12:46 | |
*** rlandy_ has quit IRC | 12:48 | |
*** rlandy has quit IRC | 12:48 | |
*** rlandy_ has joined #zuul | 12:48 | |
*** rlandy_ is now known as rlandy | 12:49 | |
*** myoung|off is now known as myoung | 12:49 | |
*** egonzalez has quit IRC | 12:54 | |
Shrews | mordred: pabelanger: clarkb: corvus: we should try to get 575515 ^ merged | 12:57 |
Shrews | ps1 shows the failure w/o the fix | 12:58 |
mordred | Shrews: lgtm - should we wait for corvus to look before landing? looks fairly straightforward to me | 13:03 |
Shrews | mordred: actually, that test has a slight race in the last assert... lemme fix | 13:04 |
mordred | kk | 13:05 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix paused handler exception handling https://review.openstack.org/575515 | 13:05 |
Shrews | ok, that should do it | 13:05 |
Shrews | mordred: we can wait. just want it to go in in case we restart any launchers soonish | 13:05 |
mordred | Shrews: cool. I mean, it seems like one we can just land - but I know things related to locking/unlocking/etc benefit from all the eyeballs | 13:07 |
Shrews | all it really does is set self.paused = False (no locking involved), so we can just land it with another +2 | 13:08 |
Shrews | tobiash: if you wanna do the honors ^^ | 13:08 |
tobiash | Shrews: _a | 13:09 |
tobiash | +a | 13:09 |
Shrews | _a should totally be a gerrit op | 13:09 |
tobiash | lol | 13:10 |
mordred | tristanC: left some comments on https://review.openstack.org/#/c/573494 | 13:14 |
*** elyezer has joined #zuul | 13:50 | |
pabelanger | Shrews: great work on leak | 14:01 |
Shrews | pabelanger: well, i *think* that fixes the leak. it definitely does fix the request lingering. i feel like they're related | 14:03 |
openstackgerrit | Merged openstack-infra/nodepool master: Fix paused handler exception handling https://review.openstack.org/575515 | 14:11 |
mordred | tobiash, tristanC: I believe we need to squash your action-general/action-general patches because of pep8 | 14:32 |
mordred | oh - nevermind - the tobiash change does the whole thing | 14:33 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add missing __init__.py files https://review.openstack.org/575591 | 14:35 |
tobiash | just got an interesting zuul misbehavior | 14:43 |
tobiash | one of my users has a pull request with more than 2000 files and a change to zuul.yaml | 14:43 |
tobiash | and zuul ignores the dynamic zuul.yaml change | 14:44 |
tobiash | it looks like zuul takes the changed file list from the github event and github caps that list at some point | 14:44 |
jlk | tobiash: hrm, that's a good question. Let me do some splunking, but I'm pretty sure that's not an event. | 14:48 |
tobiash | jlk: that just destroyed my plans to solve a few scaling issues we're starting to hit with github :( | 14:49 |
jlk | Yeah, you'd have to do it on a timer or something. | 14:49 |
tobiash | jlk: I wanted to start caching of the project branches and that would be an important part in that concept | 14:49 |
jlk | is it an API limit you're reaching, where we're doing an expensive API call too often? | 14:49 |
jlk | I _thought_ that a way forward for that was to move to graphql and limit the number of transactions, but that's a long delayed project | 14:50 |
tobiash | jlk: it's more of a zuul hangs often for a while during tenant reconfiguration querying hundreds of repos for their branches | 14:50 |
mordred | yah - and if you could keep a cache that you update just from the PR event info it would be cheaper ... | 14:51 |
jlk | oh.... why does it need every branch? | 14:51 |
tobiash | maybe I'll try out some stuff with graphql | 14:51 |
mordred | but if the pr event info is incomplete, then you'd still have to do a query | 14:51 |
tobiash | jlk: it queries the project branches when building the global config | 14:51 |
jlk | it has to look for .zuul files in every branch doesn't it. | 14:51 |
mordred | tobiash: does the payload indicate that it's a truncated file list? | 14:51 |
mordred | jlk: yah | 14:51 |
tobiash | mordred: I didn't have a look at the payload yet | 14:51 |
jlk | in theory with GraphQL you can ask for all branches in a set of repos with one query | 14:52 |
jlk | that query will have a cost. I don't know what that cost will be | 14:52 |
tobiash | our github has no rate limit configured | 14:52 |
jlk | GraphQL limits are based on query cost, not transactions | 14:52 |
jlk | in theory the return will be faster because it's like one SQL query rather than 200 | 14:52 |
tobiash | but I guess one big query is not that bad as hammering github for 30s while zuul is stalled in the eye of the users | 14:53 |
mordred | jlk: one would imagine that the graphql query for that would be less expensive than the multiple normal queries ... or would hope so :) | 14:53 |
jlk | it should be less expensive overall, yes | 14:53 |
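
The idea jlk describes, asking for the branches of many repositories in a single GraphQL request, could look roughly like the sketch below. It only builds the query text and parses a response of the expected shape; it does not talk to GitHub. The helper names `build_branches_query` and `parse_branches` and the `r0`, `r1`, ... aliases are made up for illustration. The sketch assumes the `repository(owner:, name:)` field and the `refs(refPrefix:, first:)` connection from GitHub's GraphQL schema, and it ignores pagination for repositories with more branches than `per_repo`.

```python
import json


def build_branches_query(repos, per_repo=100):
    """Build one GraphQL document asking for the branches of many repos.

    repos is a list of (owner, name) pairs. Each repository gets its own
    alias (r0, r1, ...) so that all of them fit into a single query.
    """
    parts = []
    for i, (owner, name) in enumerate(repos):
        parts.append(
            # json.dumps quotes and escapes the values as string literals.
            f"  r{i}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            f'    refs(refPrefix: "refs/heads/", first: {per_repo}) {{\n'
            f"      nodes {{ name }}\n"
            f"    }}\n"
            f"  }}"
        )
    return "query {\n" + "\n".join(parts) + "\n}"


def parse_branches(repos, data):
    """Map each (owner, name) pair to its branch names.

    data is the "data" member of the GraphQL response. A repository that
    could not be resolved comes back as null and yields an empty list.
    """
    result = {}
    for i, repo in enumerate(repos):
        node = data.get(f"r{i}") or {}
        refs = node.get("refs") or {}
        result[tuple(repo)] = [n["name"] for n in refs.get("nodes") or []]
    return result
```

Whether one large query really costs less than 200 small ones against a given GitHub installation is the open question from the discussion; the sketch only shows that the request count can drop to one.
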
jlk | tobiash: can you go on a great branch reaping mission? I know internally at GitHub we do all dev in the canonical repo, rather than forks, but we're very active about deleting our PR branches | 14:54 |
*** myoung is now known as myoung|bbl | 14:54 | |
tobiash | well it's not that every project has 300 branches but we have now 200 repos with 5 branches each | 14:55 |
jlk | I see, so definitely more about a single thread traversing a list of 200 projects | 14:55 |
tobiash | jlk: and in fact the tenant reconfig only queries the repos for the branches (the config itself is cached) | 14:55 |
tobiash | so it scales with the number of repos | 14:55 |
tobiash | and that is growing now so it's getting a pain slowly | 14:56 |
jlk | is it single threaded right there? | 14:56 |
tobiash | yes | 14:56 |
tobiash | and it blocks the scheduler event loop | 14:56 |
jlk | can we async it, parallel it? (that may run you into API limits much quicker) | 14:56 |
tobiash | maybe | 14:56 |
mnaser | could we add [zuul] to the mailing list emails | 14:56 |
jlk | mnaser: I'm curious why? | 14:57 |
mnaser | easy to identify in my mailing list and filters | 14:57 |
tobiash | but I think that might be difficult as that would dramatically change how the tenant reconfiguration works | 14:57 |
tobiash | so caching of the project branches would solve that issue | 14:57 |
jlk | mnaser: but list-id is in the headers, which is way better for filtering. | 14:58 |
tobiash | maybe I can get something that is good enough for now | 14:58 |
mordred | mnaser: List-Id: "Discussion of Zuul usage and development." <zuul-discuss.lists.zuul-ci.org> | 14:58 |
mordred | mnaser: is what I filter on | 14:58 |
jlk | or just simply the 'zuul-discuss.lists.zuul-ci.org' bit | 14:59 |
mordred | yah | 14:59 |
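
Filtering on the `List-Id` header rather than a subject tag, as jlk and mordred suggest, can be sketched with Python's standard `email` module. The function name `is_zuul_discuss` is hypothetical; the list identifier is the one mordred quotes above.

```python
from email import message_from_string

# List identifier as quoted in the discussion above.
ZUUL_LIST_ID = "zuul-discuss.lists.zuul-ci.org"


def is_zuul_discuss(raw_message: str) -> bool:
    """Return True if the raw message carries the zuul-discuss List-Id."""
    msg = message_from_string(raw_message)
    # Header lookup is case-insensitive; a missing header gives "".
    list_id = str(msg.get("List-Id", ""))
    return ZUUL_LIST_ID in list_id
```

Most mail clients and server-side filter languages can match on the same header directly, so no subject prefix is needed.
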
*** pcaruana has quit IRC | 15:01 | |
*** gouthamr has joined #zuul | 15:19 | |
*** elyezer has quit IRC | 15:23 | |
rcarrillocruz | helo helo folks. Turns out we are writing a bunch of roles at ansible-networking, some of them will interact with cloud providers (like for example creating cloudformation stacks in AWS and the likes). I'm thinking the cloud creds can be stored on the zuul tenant as secret. As for the job, maybe the static driver node is the best match? The reason I'm asking is cos pre v3, the static node would have those kind | 15:27 |
rcarrillocruz | of secrets stored in filesystem with config mgmt tools (i'm remembering for pypi upload), but given that now Zuul has secrets natively what's the advantage/disadvantage of using a static node vs an ephemeral node for such kind of jobs? | 15:27 |
clarkb | rcarrillocruz: there isn't really one, openstack uses all dynamic ephemeral nodes now and we have deleted our static nodes | 15:31 |
clarkb | rcarrillocruz: static nodes are more of a bootstrap without cloud option I think | 15:31 |
rcarrillocruz | so static nodes now are merely 'i need to run jobs here cos it has a leg on $network' ? that's the use case i can just think of now | 15:32 |
clarkb | rcarrillocruz: ya or maybe you are using zuul for deployment too and you statically define your production env or something | 15:33 |
clarkb | I'm sure there are more use cases but they don't need to be tightly coupled to secrets anymore | 15:33 |
rcarrillocruz | sweet, thanks for the prompt response (as always) | 15:33 |
*** gouthamr has quit IRC | 15:38 | |
*** gouthamr has joined #zuul | 15:39 | |
*** jpena is now known as jpena|off | 16:27 | |
*** jpena|off is now known as jpena | 16:29 | |
*** jpena is now known as jpena|away | 16:29 | |
*** gouthamr has quit IRC | 16:50 | |
*** gouthamr has joined #zuul | 16:50 | |
mordred | rcarrillocruz: fwiw, you can also run zuul jobs nodeless | 16:52 |
mordred | rcarrillocruz: depending on what the content of the job is, of course, but if all the job is doing is calling to a remote web service, you don't *actually* need to spin up a VM or a static node to accomplish that | 16:53 |
mordred | rcarrillocruz: you'd need to define those jobs in a config-project so that they'd be trusted jobs - but it's possible that's a useful option for you | 16:54 |
*** gouthamr_ has joined #zuul | 17:27 | |
*** gouthamr has quit IRC | 17:27 | |
*** gouthamr_ is now known as gouthamr | 17:27 | |
*** jpena|away is now known as jpena | 17:47 | |
*** electrofelix has quit IRC | 18:05 | |
*** bhavik1 has joined #zuul | 18:11 | |
*** jpena is now known as jpena|off | 18:13 | |
*** bhavik1 has quit IRC | 18:18 | |
corvus | fyi, i'll be in china for a conference next week so my availability may be limited and different. but i should be able to review changes on the plane with gertty :) | 18:24 |
corvus | please remember to fill in https://etherpad.openstack.org/p/zuul-update-email | 18:28 |
*** rlandy_ has joined #zuul | 18:31 | |
*** rlandy_ has quit IRC | 18:31 | |
*** rlandy_ has joined #zuul | 18:31 | |
*** rlandy has quit IRC | 18:33 | |
*** gtema has quit IRC | 18:34 | |
*** rlandy_ is now known as rlandy | 18:36 | |
rcarrillocruz | mordred: yeah, it's just that i feel uneasy about doing that, what if the change has a bogus delegate_to localhost and reviewer overlooks it or the likes | 18:54 |
mordred | rcarrillocruz: yah, totally | 18:54 |
mordred | rcarrillocruz: just wanted to mention it for completeness | 18:54 |
rcarrillocruz | how post-review works, would it help for this case? | 18:55 |
rcarrillocruz | also, was thinking the other day about voting false in the GH scenario , is it a thing, providing there's no +1/-1 on GH | 18:55 |
clarkb | corvus: before the week ends are you still planning a new zuul release? | 18:59 |
clarkb | I haven't explicitly followed up on the logging None issue with tobiash's changes deployed yesterday but haven't heard of problems | 18:59 |
*** rlandy has quit IRC | 19:15 | |
*** rlandy has joined #zuul | 19:18 | |
*** rlandy_ has joined #zuul | 19:39 | |
*** rlandy has quit IRC | 19:41 | |
corvus | https://review.openstack.org/564220 previously had the logging None issue, and the recheck i recently issued passed the devstack-multinode job, so i think that adds extra confirmation about the fix | 21:02 |
corvus | i'd like to tag 057d664ecc2ce151789e2488250b5e1da36d48a3 as 3.1.0 | 21:02 |
corvus | clarkb, tobiash, fungi, mordred, anyone else: ^ thoughts | 21:03 |
tobiash | corvus: which one is that? | 21:04 |
* tobiash is not at the computer | 21:04 | |
clarkb | looking at log | 21:04 |
corvus | tobiash: it's right before supercedent. it's what we restarted openstack-infra with yesterday. it includes all the logging fixes | 21:05 |
fungi | we seem to be running 06df3bb on our systems at the moment | 21:05 |
tobiash | corvus: fine for me | 21:05 |
clarkb | Merge "Remove extra argument when logging logger timeout" | 21:05 |
corvus | fungi: running, or have installed? | 21:05 |
clarkb | I'm good with 057d664ecc2ce151789e2488250b5e1da36d48a3 as 3.1.0 | 21:06 |
fungi | oh, good point. installed not restarted since then | 21:06 |
* fungi rechecks assumptions | 21:06 | |
corvus | fungi: ya, the bottom of http://zuul.openstack.org/ has the sha the scheduler is running at least | 21:06 |
corvus | which is 057 | 21:06 |
mordred | ++ to tag | 21:07 |
fungi | yeah our zuul-web says 057d664 | 21:07 |
fungi | okay, makes sense | 21:07 |
fungi | so basically tagging the merge before master tip | 21:08 |
fungi | sounds great to me | 21:08 |
corvus | yeah. if we restart the scheduler now we'd get a new pipeline manager. :) | 21:08 |
fungi | that gets us the multinode console log fix and the unreachable node information leak fix as big ticket changes since last point release? | 21:09 |
clarkb | fungi: yup | 21:09 |
corvus | erm, i had some minor dyslexia when making the tag. can folks double check this before i push it up? http://paste.openstack.org/show/723564/ | 21:10 |
clarkb | corvus: that commit is one I have locally and appears to be the one above | 21:11 |
fungi | lgtm | 21:11 |
corvus | (to be clear, i caught my error and fixed it, i think -- i just want to make sure i really did get the version number and sha right) | 21:11 |
clarkb | corvus: version and sha lgtm | 21:11 |
clarkb | 3.1.0 because it adds ansible 2.5 | 21:12 |
clarkb | sha1 is one above | 21:12 |
corvus | 3.1.0 pushed | 21:12 |
corvus | and, happily, 0.3.1 still does not exist | 21:12 |
corvus | i'm clearly not used to such large version numbers | 21:13 |
fungi | next up, zuul 10! | 21:13 |
corvus | i only learned to count to 2 in sound engineering classes | 21:13 |
corvus | and only up to 1 in computer science | 21:14 |
corvus | 3 just boggles the mind | 21:14 |
fungi | zuul hrair | 21:16 |
*** GonZo2000_ has quit IRC | 21:32 | |
*** hashar has quit IRC | 22:10 | |
*** rlandy_ is now known as rlandy | 22:15 | |
jesusaur | where can I update the zuul-ci docs? specifically I want to add an openSuse environment section to https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html | 22:33 |
clarkb | jesusaur: they are in the zuul repo itself | 22:34 |
clarkb | zuul/doc/source/admin looks like | 22:34 |
jesusaur | aha, there it is, thanks! | 22:35 |
corvus | i've observed a couple of things slowing down openstack-infra's zuul. 1) we run cat jobs on too many project-branches on tenant reconfiguration because we don't distinguish between a cache invalidation and not having any data to start with | 22:36 |
corvus | (about 2k of our 6k project-branches have no zuul config on them, so we query the mergers for them every time) | 22:36 |
corvus | 2) we trigger tenant reconfiguration on branch deletions from random github repos where we aren't actually loading any config | 22:38 |
corvus | presumably we do the same for branch creation | 22:39 |
corvus | i don't think even setting exclude_unprotected_branches would help here -- that's only honored during the configuration itself, it doesn't impact whether a configuration is triggered | 22:39 |
corvus | the fix for #1 is obvious | 22:40 |
corvus | the fix for #2 i'm not sure about -- should we special case repos where no configuration is loaded and ignore those events? should we try to do something with exclude_unprotected_branches? (i worry about being too aggressive there because even if no configuration files change, the configuration itself can differ whenever a repo creates the second branch (it goes from unbranched to branched).) | 22:44 |
corvus | maybe we could look at exclude_unprotected_branches but also be aware of the second-branch issue, and filter out useless create/delete events, unless it's creating or deleting the second branch | 22:45 |
clarkb | corvus: do pull requests generate useless branch creation events? | 22:49 |
clarkb | or are those distinct? | 22:49 |
corvus | clarkb: i believe those are distinct | 22:50 |
corvus | the events i observed in prod i traced back to real branch creation events. one of the third-party repos we're watching is branch-happy | 22:50 |
clarkb | ah | 22:50 |
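
One way to read corvus's proposal for filtering branch create/delete events is sketched below. This is not Zuul's implementation: `BranchEvent`, `needs_reconfig` and all of their parameters are hypothetical, and the sketch assumes the caller already knows how many branches the repo had before the event and whether any configuration is loaded from it. The questions corvus leaves open are treated here as if the answer were "yes".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchEvent:
    """Hypothetical stand-in for a branch create/delete notification."""
    project: str
    branch: str
    kind: str        # "create" or "delete"
    protected: bool  # whether the affected branch is protected


def needs_reconfig(event, branch_count_before, loads_config,
                   exclude_unprotected):
    """Decide whether a branch event should trigger a tenant reconfiguration.

    Assumed rules, following the discussion above:
      * repos from which no configuration is loaded are ignored;
      * going from one branch to two (or back) always counts, because
        the project changes between unbranched and branched;
      * otherwise, with exclude_unprotected set, events on unprotected
        branches are dropped.
    """
    if not loads_config:
        return False
    delta = 1 if event.kind == "create" else -1
    count_after = branch_count_before + delta
    if {branch_count_before, count_after} == {1, 2}:
        return True  # creating or deleting the second branch
    if exclude_unprotected and not event.protected:
        return False
    return True
```

Whether the branch count should include only protected branches when `exclude_unprotected` is set is left open here, as it is in the discussion.
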
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Log more information about what events trigger a reconfig https://review.openstack.org/575871 | 22:53 |
corvus | i was able to deduce the info without that ^ but it took a long time. that should make it clearer in the future | 22:53 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Log more information about what events trigger a reconfig https://review.openstack.org/575871 | 22:54 |
clarkb | that change prompted me to check zuul sched for the logging fix I did and we now have github delivery events in the logs properly | 22:57 |
corvus | clarkb: yeah, those were really helpful :) | 23:14 |