jeblair | turning off keep | 00:01 |
---|---|---|
*** harlowja has joined #zuul | 00:03 | |
*** harlowja has quit IRC | 00:09 | |
*** harlowja has joined #zuul | 00:14 | |
ianw | jeblair: doesn't fd.readline() in follow return str ... so under python3 the strings will always be encoded | 00:52 |
ianw | not that i think this is wrong per se, but slightly conflicts with the changelog | 00:53 |
*** harlowja has quit IRC | 00:57 | |
ianw | ah no, you're right, it's only with universal_newlines set that Popen does that | 00:57 |
jeblair | ianw: er, good! i hadn't realized that, but it sounds like it's still accidentally correct. that's part of why i wrote the detailed changelog -- just to make sure it all makes sense to all of us. :) | 01:01 |
*** harlowja has joined #zuul | 01:01 | |
*** jkilpatr has quit IRC | 01:26 | |
clarkb | readline returns str if you set universal newlines to true iirc | 01:34 |
clarkb | oh depends if text mode or binary and text implies universal newlines | 01:35 |
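The Popen behavior ianw and clarkb settle on above can be demonstrated directly. A minimal stand-alone sketch (not zuul code) showing that pipes yield bytes by default under Python 3, and str only in text mode:

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('hello')"]

# Default: pipe contents are bytes under Python 3.
raw = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout.readline()
print(type(raw))   # a bytes object

# universal_newlines=True switches the pipe to text mode: readline()
# returns str, and newline handling is normalized (text mode implies
# universal newlines, as clarkb notes).
text = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                        universal_newlines=True).stdout.readline()
print(type(text))  # a str object
```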
*** jkilpatr has joined #zuul | 01:37 | |
mordred | jeblair: lovely! | 01:49 |
mordred | clarkb: yah - I agree on the collapsing of the environment defaults for tox | 01:52 |
mordred | clarkb: if you're around, the 2 changes before 229 only have 1 +2 - they're mostly just lead-up to 229 though | 01:54 |
mordred | clarkb: I'm not sure if you want to explicitly review them, or if your +2 on the end results of 229 is good and I can just land the stack | 01:54 |
jamielennox | did we end up implementing a new way to hold the nodes of failing jobs? | 01:55 |
mordred | jamielennox: yes we did! | 01:55 |
jamielennox | \o/ - do you remember it? | 01:55 |
mordred | jamielennox: https://docs.openstack.org/infra/zuul/feature/zuulv3/admin/client.html#autohold | 01:55 |
mordred | jamielennox: (was looking for doc link for you) | 01:56 |
jamielennox | ahh, i was looking at nodepool | 01:56 |
jamielennox | probably does make more sense now for that to be on zuul side | 01:56 |
mordred | yah - it used to be there - but with v3 and the shift to active-requests ... yah | 01:56 |
jamielennox | is there still a use for "nodepool hold" | 01:57 |
jamielennox | ? | 01:57 |
jamielennox | mordred: anyway - thanks! | 01:59 |
mordred | jamielennox: I don't think so? maybe? | 02:00 |
jamielennox | not super urgent for now anyway - it can be a useful way of pulling a node out for your own usage | 02:00 |
jamielennox | just not sure if that'll be common | 02:01 |
mordred | jamielennox: I do know that a thing we don't have but people keep asking for is "nodepool boot" - so that you can ask nodepool to boot you a node of a particular label - like if you need to debug something about one | 02:01 |
mordred | jamielennox: which I think is similar to the use case you're talking about yeah? | 02:01 |
jamielennox | mordred: yea, because zuul will skip things marked HOLD, it basically reserves you a node | 02:02 |
mordred | "as an admin of a zuul/nodepool, I'm having issues that only show up in test and I'd like a node to ssh in to and poke around at to see if I can figure out" | 02:02 |
mordred | jamielennox: ah - ya - hold in nodepool gets you a node to play with - autohold in zuul doesn't delete a node when a job fails | 02:02 |
jamielennox | re: autohold, it'd be useful to not have every parameter required, like if tox-py27 is failing consistently in a tenant i probably don't care which project i capture from? | 02:02 |
jamielennox | which is a feature i should put in storyboard, but i still struggle to know where to put things like that in storyboard | 02:03 |
mordred | yah- I could see that | 02:03 |
mordred | jamielennox: I think we all do | 02:03 |
clarkb | mordred: I think you can go for it. I'm fighting the "my washing machine stopped working and neither water valve for it has a handle" battle now | 02:04 |
mordred | clarkb: oh good | 02:04 |
*** jkilpatr has quit IRC | 02:04 | |
mordred | clarkb: I'll land those in a sec - I'm working on the squash-tox-environment thing right now | 02:04 |
mordred | but I need to prove something to myself first | 02:05 |
clarkb | it turns out when you replace a section of leaky pipe, that kills washing machines | 02:05 |
clarkb | can I blame jaypipes for this? | 02:05 |
mordred | clarkb: yes. he's the right person to blame | 02:07 |
*** jkilpatr has joined #zuul | 02:16 | |
*** xinliang has quit IRC | 02:17 | |
fungi | pabelanger: jamielennox: mordred: i'm still catching up... not installing bindep if there's no bindep.txt ignores the fact that we have a bindep fallback list, doesn't it? or am i misunderstanding the suggestion? | 02:19 |
*** xinliang has joined #zuul | 02:30 | |
*** xinliang has quit IRC | 02:30 | |
*** xinliang has joined #zuul | 02:30 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Collapse tox_environment and tox_environment_defaults https://review.openstack.org/501075 | 02:38 |
mordred | fungi: the code that looks for bindep.txt also looks for the fallback file | 02:39 |
mordred | fungi: so for openstack, it will always find a bindep.txt - AND it will never install bindep because we have it pre-installed | 02:39 |
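The lookup mordred describes (a per-repo bindep.txt with a site-wide fallback file) can be sketched roughly as follows. The helper name and signature here are hypothetical, not the actual role code; only the file-selection logic mirrors the discussion:

```python
import os

def find_bindep_file(repo_dir, fallback_path=None):
    """Return the bindep file to use: the repo's own bindep.txt if present,
    otherwise a configured site-wide fallback file (if any), else None."""
    candidate = os.path.join(repo_dir, "bindep.txt")
    if os.path.exists(candidate):
        return candidate
    if fallback_path and os.path.exists(fallback_path):
        return fallback_path
    # No file at all: skip running (and installing) bindep entirely.
    return None
```

With a fallback configured, as OpenStack has, this always resolves to some file, so bindep would always run there; installing bindep itself is separately skipped when it is pre-installed on the image.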
mordred | clarkb: ^^ https://review.openstack.org/501075 collapses the tox_environment settings like you mentioned AND gets rid of the python module | 02:39 |
mordred | clarkb: so tons of simplification | 02:40 |
clarkb | nice | 02:40 |
mordred | clarkb: can probably squash with the previous patch, but I figured I'd put it up separate for reading purposes | 02:40 |
clarkb | mordred: I've pulled it up for review first thing tomorrow | 02:40 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Don't install bindep if there's no bindep file https://review.openstack.org/501018 | 02:50 |
mordred | jeblair: I hit +A on https://review.openstack.org/#/c/501040 but there's some good comments from ianw in there that are worthy of reading and potentially a followup | 03:07 |
fungi | jhesketh: trying to fix up your 456162 change i'm down to just one unit test failure now... if you get a chance to take a look at why test_crd_gate_unknown is unhappy with it we might be able to get it merged soon | 03:26 |
* fungi needs to get some sleep, but will be getting into more zuulishness tomorrow | 03:26 | |
jhesketh | fungi: sure, I'll take a look | 03:30 |
ianw | mordred: 501040 ... is that a known thing? | 03:39 |
ianw | the -2 i mean | 03:39 |
*** bhavik1 has joined #zuul | 05:02 | |
jamielennox | {"msg": "[Errno 2] No such file or directory" is such a useless message | 05:03 |
jamielennox | why isn't the accessed name in there by default | 05:04 |
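The complaint is fair: Python itself does record the offending path on the exception; the Ansible message jamielennox quotes just doesn't include it. A quick illustration:

```python
try:
    open("/no/such/path")
except FileNotFoundError as e:
    # Errno 2 plus the accessed name are both on the exception object,
    # so an error message can easily include the path.
    print(e.errno)      # 2 (ENOENT)
    print(e.filename)   # /no/such/path
    print(e.strerror)   # No such file or directory
```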
*** bhavik1 has quit IRC | 05:25 | |
*** hashar has joined #zuul | 05:35 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Only grab the gerrit change if necessary https://review.openstack.org/456162 | 06:06 |
jhesketh | fungi: ^ I think that fixes the problem | 06:07 |
* jhesketh misses working on zuul | 06:08 | |
tobiash | jhesketh: I have a thought on ^ | 07:44 |
tobiash | jeblair: just noticed that maintainCache is never called so we probably don't clear anything from the change caches currently | 07:57 |
tobiash | jeblair: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/scheduler.py?h=feature/zuulv3#n589 | 07:57 |
tobiash | jeblair: there is still some comment to update maintainConnectionCache for tenants but to me this method looks correct | 07:58 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Enable maintainConnectionCache https://review.openstack.org/501144 | 08:02 |
tobiash | jeblair: wip'ed ^ in case you have any objections for this | 08:03 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use password supplied from nodepool https://review.openstack.org/500823 | 08:44 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Enable maintainConnectionCache https://review.openstack.org/501144 | 09:06 |
*** openstackgerrit has quit IRC | 09:18 | |
jhesketh | tobiash: ah cool, thanks... I think you're right and your suggestion is good. Should we let this one land and fix it up in a follow up or do it now? | 09:19 |
tobiash | jhesketh: I don't mind, but I also think the data structure change should be its own patch | 09:19 |
jhesketh | tobiash: umm, so you do want them separate? (sorry, I'm confused by your last message) | 09:21 |
tobiash | jhesketh: I think we should have a patch which restructures the cache data structure and the patch which already exists. Possibilities are that the restructure change is either the parent or the child of your change | 09:22 |
jhesketh | oh right, I follow | 09:23 |
*** openstackgerrit has joined #zuul | 09:48 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Only grab the gerrit change if necessary https://review.openstack.org/456162 | 09:48 |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Connection change cache improvement https://review.openstack.org/501187 | 09:48 |
jhesketh | tobiash: ^ | 09:48 |
*** openstackgerrit has quit IRC | 10:03 | |
tobiash | looking | 10:08 |
*** openstackgerrit has joined #zuul | 10:22 | |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Connection change cache improvement https://review.openstack.org/501187 | 10:22 |
openstackgerrit | Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Only grab the gerrit change if necessary https://review.openstack.org/456162 | 10:22 |
tobiash | jhesketh: +2 | 10:23 |
*** jkilpatr has quit IRC | 10:42 | |
rcarrillocruz | mordred, jeblair : ok, so I created "Ricky Zuul" GH app https://github.com/apps/ricky-zuul . The perms are a bit of guesswork, put r/w on PR and r/w on repo contents | 10:46 |
rcarrillocruz | oh, and also on commit statuses | 10:46 |
rcarrillocruz | i installed that app on my rcarrillocruz org dummy repo 'zuul-tested-repo' | 10:46 |
rcarrillocruz | now on my way to set up zuul-scheduler with github driver | 10:47 |
rcarrillocruz | pabelanger: ^ | 10:47 |
rcarrillocruz | erm, i guess the commit statuses is not needed | 11:01 |
rcarrillocruz | jeblair: so i guess once we agree the bare minimum perms that are needed for creating a bespoke GH app for zuul usage, I can push a change and document that | 11:02 |
rcarrillocruz | is that expected to change? i remember reading a perms model change in GH, something about graphql, not sure if the GitHub App thing may be in flux ? | 11:03 |
*** jkilpatr has joined #zuul | 11:07 | |
*** jkilpatr has quit IRC | 11:07 | |
*** jkilpatr has joined #zuul | 11:07 | |
*** jkilpatr has quit IRC | 11:15 | |
*** jkilpatr has joined #zuul | 11:28 | |
mordred | rcarrillocruz: my understanding is that the App thing itself is the "new" way for 3rd parties to provide services to github users | 11:31 |
mordred | rcarrillocruz: but you're very right - gh is moving their apis to all be graphql-based | 11:31 |
mordred | rcarrillocruz: so at *some* point we'll need to update the gh driver to use graphql-api instead of rest | 11:32 |
mordred | ianw: hrm. not to me | 11:33 |
mordred | jhesketh: we miss you working on zuul! | 11:33 |
rcarrillocruz | sigh | 11:35 |
rcarrillocruz | what's wrong with rest | 11:35 |
* rcarrillocruz has nightmares, as network vendors instead of adopting rest they are coming back to xml apis | 11:36 | |
rcarrillocruz | does http://paste.openstack.org/show/620512/ ring a bell anyone? | 11:41 |
rcarrillocruz | that from scheduler startup | 11:41 |
rcarrillocruz | if i do on python3 shell | 11:41 |
rcarrillocruz | import github3 | 11:42 |
rcarrillocruz | gh = github3.GitHub() | 11:42 |
rcarrillocruz | it does not have a session either | 11:42 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Connection change cache improvement https://review.openstack.org/501187 | 11:51 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Only grab the gerrit change if necessary https://review.openstack.org/456162 | 11:51 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Collapse tox_environment and tox_environment_defaults https://review.openstack.org/501075 | 12:01 |
*** weshay_PTO is now known as weshay | 12:08 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Collapse tox_environment and tox_environment_defaults https://review.openstack.org/501075 | 12:11 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Delete unused run-cover role https://review.openstack.org/501244 | 12:12 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Switch to openstack-doc-build for doc build jobs https://review.openstack.org/501246 | 12:18 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Be explicit about byte and encoding in command module https://review.openstack.org/501040 | 12:21 |
mordred | jeblair: SOOOOOO | 12:26 |
mordred | jeblair: issue for you to look at as soon as you are awake | 12:26 |
mordred | jeblair: I just watched shade have a job in the gate pipeline - which is incorrect | 12:27 |
mordred | jeblair: the build uuid is 80e369018a664c5f86c3cd64af7a4640 | 12:27 |
mordred | jeblair: https://review.openstack.org/#/c/494535/ is the change it happened for ... | 12:29 |
mordred | jeblair: there is another change: https://review.openstack.org/#/c/500201/ which added the job to the gate pipeline, which I approved because I'm a moron and didn't register that it had a gate entry | 12:30 |
mordred | jeblair: that change had not landed, but it *WAS* running in the gate when https://review.openstack.org/#/c/494535/ was approved and enqueued | 12:31 |
mordred | rcarrillocruz: zomg network vendors are moving back to XML? they should at least, if they're not going to do REST, do something sane like gRPC | 12:32 |
mordred | rcarrillocruz: did you properly get the version of github3.py from git? | 12:33 |
rcarrillocruz | https://en.wikipedia.org/wiki/NETCONF | 12:33 |
rcarrillocruz | it's been around for a while, but getting more vendors onboard now | 12:33 |
rcarrillocruz | which is a shame, since there's a thing called RESTConf | 12:33 |
mordred | rcarrillocruz: http://git.openstack.org/cgit/openstack-infra/zuul/tree/requirements.txt?h=feature/zuulv3#n5 | 12:33 |
rcarrillocruz | anyway | 12:33 |
mordred | rcarrillocruz: **HEADDESK** | 12:34 |
rcarrillocruz | mordred: yeah, i did | 12:34 |
rcarrillocruz | thing is | 12:34 |
mordred | rcarrillocruz: now is _definitely_ the time to start adopting a protocol written in 2006 | 12:34 |
rcarrillocruz | i don't understand that code | 12:34 |
rcarrillocruz | the github object is supposed to get the session when it logs in | 12:34 |
rcarrillocruz | let me link | 12:34 |
rcarrillocruz | https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/driver/github/githubconnection.py#L427 | 12:35 |
rcarrillocruz | it fails there | 12:35 |
rcarrillocruz | but, the login is done after that method | 12:35 |
rcarrillocruz | so at that point there's no session | 12:35 |
rcarrillocruz | commenting out those lines the execution goes over fine | 12:38 |
mordred | rcarrillocruz: that's really weird - I don't see that error in our logs | 12:39 |
mordred | rcarrillocruz: I wonder if there is a difference in how we have the auth things configured? | 12:39 |
mordred | rcarrillocruz: http://paste.openstack.org/show/620519/ is our github config snippet (with two values omitted, clearly) | 12:40 |
rcarrillocruz | yeah, i have the same thing | 12:41 |
rcarrillocruz | mordred: http://paste.openstack.org/show/620521/ | 12:42 |
rcarrillocruz | that's pretty much what the driver code does | 12:42 |
rcarrillocruz | in a python3 shell session | 12:42 |
rcarrillocruz | getting a client | 12:42 |
rcarrillocruz | i don't have a session attr | 12:42 |
rcarrillocruz | i'm confused how that works in your side | 12:42 |
mordred | rcarrillocruz: that works for me: <github3.session.GitHubSession object at 0x7efeae5aa9e8> | 12:44 |
rcarrillocruz | hmm | 12:44 |
mordred | http://paste.openstack.org/show/620522/ | 12:45 |
mordred | rcarrillocruz: >>> github3.__version__ | 12:45 |
mordred | '1.0.0a4' | 12:45 |
mordred | although that doesn't really tell what version from git it's installed from | 12:46 |
rcarrillocruz | eugh | 12:49 |
rcarrillocruz | that was it | 12:49 |
rcarrillocruz | it seems i had a github lib floating around | 12:49 |
rcarrillocruz | my messing with pip vs pip3 probably | 12:50 |
rcarrillocruz | thx | 12:50 |
mordred | rcarrillocruz: the pip vs. pip3 thing has bitten us more than once :) | 12:50 |
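The stale-copy problem rcarrillocruz hit is easy to diagnose from a Python shell: a module's `__file__` shows which installation actually got imported. A generic sketch, using a stdlib module as a stand-in for github3 (the helper itself is hypothetical):

```python
import importlib

def describe_module(name):
    """Report where a module was imported from and its version, if any.
    Handy when pip and pip3 have installed different copies of a package."""
    mod = importlib.import_module(name)
    return {
        "file": getattr(mod, "__file__", "<builtin>"),
        "version": getattr(mod, "__version__", "<no __version__>"),
    }

# Stdlib example; for the case above you'd pass "github3" instead.
print(describe_module("json"))
```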
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Collapse tox_environment and tox_environment_defaults https://review.openstack.org/501075 | 12:56 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Delete unused run-cover role https://review.openstack.org/501244 | 12:56 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add UPPER_CONSTRAINTS_FILE file if it exists https://review.openstack.org/500320 | 12:56 |
rcarrillocruz | mordred: the webhook URL path, is by default <zuul_server>/connection/github/payload or you have some reverse proxy redirecting that to other thing ? | 13:00 |
mordred | rcarrillocruz: it is that by default where zuul_server is the zuul-scheduler webapp process | 13:02 |
rcarrillocruz | sweet | 13:03 |
mordred | rcarrillocruz: we also have a reverse proxy in front of that for us, as there are 2 different web apps at the moment (zuul-scheduler and zuul-web) and we want to hide that | 13:03 |
rcarrillocruz | that reminds me i should bring t up now (zuul-web) | 13:03 |
mordred | hopefully it won't be too long after the PTG for us to migrate the rest of the scheduler webapp into zuul-web so we can go back to having one web app | 13:04 |
rcarrillocruz | jebus | 13:09 |
rcarrillocruz | http://paste.openstack.org/show/620528/ | 13:09 |
rcarrillocruz | i'm excited! | 13:09 |
rcarrillocruz | \o/ | 13:09 |
Shrews | morning folks | 13:13 |
mordred | morning Shrews | 13:13 |
mordred | rcarrillocruz: woot! | 13:13 |
mordred | rcarrillocruz: it's kind of amazeballs isn't it? | 13:13 |
rcarrillocruz | for sure :D | 13:14 |
*** dkranz has joined #zuul | 13:21 | |
mordred | jeblair: also still seeing the weird -2 on patches not at the top of a stack over in shade even with yesterday's patch running | 13:23 |
mordred | jeblair: http://paste.openstack.org/show/620530/ is the relevant portion of the log I think | 13:23 |
mordred | jeblair: also, a little further back in the log: 2017-09-06 12:03:16,903 DEBUG zuul.DependentPipelineManager: Scheduling merge for item <QueueItem 0x7f4882408b38 for <Change 0x7f48927d5ba8 500930,2> in gate> (files: ['zuul.yaml', '.zuul.yaml'], dirs: ['zuul.d', '.zuul.d']) | 13:27 |
rcarrillocruz | mordred: does zuul-scheduler log github events? | 13:28 |
rcarrillocruz | like if i do a PR push, should I expect something in that log (in my case, foreground, not logging to its own file yet) | 13:29 |
mordred | yes | 13:29 |
mordred | you should definitely see activity | 13:29 |
rcarrillocruz | sweet, i'll try that out | 13:30 |
mordred | rcarrillocruz: https://github.com/organizations/openstack-infra/settings/apps/openstack-zuul/advanced (or replacing with your url) | 13:30 |
mordred | rcarrillocruz: shows you a list of events github has delivered to your app | 13:30 |
mordred | rcarrillocruz: we should be logging the event id so that if you want you can cross-reference with github's log | 13:31 |
mordred | jlk: ^^ speaking of that ... on that advanced tab gh also shows the response it got | 13:31 |
rcarrillocruz | sweet | 13:31 |
mordred | jlk: I wonder if maybe we should add $something to our response - like a header - that includes $something from zuul | 13:32 |
rcarrillocruz | oh , i get 404 , i guess cos i'm not member of the org | 13:32 |
mordred | jlk: it's possible we don't have anything yet | 13:32 |
rcarrillocruz | but yeah | 13:32 |
rcarrillocruz | i can look on my own org | 13:32 |
mordred | jlk: at that stage of processing | 13:32 |
mordred | jlk: but if we do, maybe returning it in our response headers is a thing that could be useful somehow? | 13:33 |
mordred | jlk: just an idle thought | 13:33 |
*** hashar is now known as hasharAway | 13:38 | |
rcarrillocruz | hmm, mordred i don't see logging on a PR I just pushed. However, I do not have layout.yaml set yet. I wonder if the logging is only when the project is set up on a pipeline with jobs and all. i.e. the webhook raw events are not logged ? | 13:38 |
rcarrillocruz | nah, seems like a config issue | 13:41 |
rcarrillocruz | checking the github app i see undelivered messages | 13:41 |
* rcarrillocruz looks | 13:41 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Update tests to use AF_INET6 https://review.openstack.org/501266 | 13:50 |
mordred | rcarrillocruz: yah - you should see log entries for every event that happens | 13:51 |
rcarrillocruz | tadaaaa | 14:03 |
rcarrillocruz | Sep 06 14:03:08 zuul sh[16326]: 2017-09-06 14:03:08,619 DEBUG zuul.GithubWebhookListener: Github Webhook Received: 7eda70b6-9308-11e7-8827-64faa1b0f4fd | 14:03 |
rcarrillocruz | got confused between zuul-webapp and zuul-web ports | 14:03 |
rcarrillocruz | put 8001 on the gh app URL and sorted it | 14:03 |
mordred | woot! | 14:12 |
mordred | rcarrillocruz: and yah - I'm looking forward to there only being one web port - the current thing is annoying | 14:12 |
mordred | tobiash: left -1 on https://review.openstack.org/#/c/500799 - but overall I like both sides of that stack! | 14:13 |
tobiash | :) | 14:13 |
rcarrillocruz | in order to have feature parity to what we have with dci (periodic CI jobs), i'll set up a periodic pipeline and set what we have now. After that, check_github and gate_github | 14:15 |
dmsimard | mordred: hey, a bit of a silly question -- how do we make the base job work on either ubuntu-xenial and centos-7, but not both ? | 14:17 |
mordred | rcarrillocruz: \o/ | 14:17 |
dmsimard | right now the base job defaults to one ubuntu-xenial node -- so jobs wanting to run on something else would need to override it I guess ? | 14:18 |
mordred | dmsimard: uhm. I'm not 100% sure what you mean by that - can you say that with different words? | 14:18 |
mordred | dmsimard: yes! that is correct | 14:18 |
mordred | dmsimard: jobs that want not-ubuntu-xenial just add whatever they want in nodes: | 14:18 |
dmsimard | mordred: okay, pabelanger stood up the centos-7 image so I'll run some tests to see if it works ahead of the migration | 14:18 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix missing logconfig when running tests in pycharm https://review.openstack.org/500748 | 14:19 |
mordred | dmsimard: ++ | 14:19 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Close logging config file after write https://review.openstack.org/500754 | 14:19 |
dmsimard | mordred: the JJB translation to shell will handle the node definition as well ? | 14:19 |
dmsimard | some have centos-7, centos-7-2-nodes etc | 14:20 |
mordred | dmsimard: yah | 14:21 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Do not merge: test v3 jobs on the centos-7 image https://review.openstack.org/501281 | 14:21 |
jeblair | mordred: regarding 494535 being in gate -- i'm not actually sure that was incorrect. when a change is enqueued into gate, it's a proposed future state, and each change enqueued after it exists in the context of that proposed future state. so as soon as the change adding the shade job to gate was in the gate pipeline, the next shade change added to the gate pipeline should run that job too, because it's running in a world where shade has a gate job. | 14:33 |
mordred | jeblair: yes - for sure! | 14:33 |
mordred | jeblair: but - when the change adding the gate job does not land, the change behind it should get re-enqueued in a context that doesn't include the gate-adding change | 14:34 |
mordred | jeblair: so while it should run in the gate for a moment in time, the other change should certainly not be merged by the gate pipeline | 14:34 |
mordred | (which is what happened) | 14:35 |
jeblair | mordred: ah yes. i'll make a test. | 14:35 |
mordred | jeblair: here's the sequence that occurred https://etherpad.openstack.org/p/FZhs4Rh86F | 14:39 |
jeblair | mordred: got it | 14:40 |
mordred | jeblair: also - I was going to fix pabelanger's change to remove the gate mention, but was waiting to make sure you didn't need it for any reason | 14:40 |
jeblair | nope | 14:40 |
mordred | jeblair: cool. also added followup with theory on what the followups are with the other changes (that issue is persistent and still happening, fwiw) | 14:45 |
jeblair | mordred: that makes sense too | 14:47 |
mordred | k. cool | 14:47 |
jeblair | we'd certainly want to fix that before we start having zuulv3 chime in on more repos. | 14:47 |
jlk | mordred: a header tossed back, some sort of uuid? | 14:48 |
jlk | mordred: would make sense, particularly if we could carry that ID throughout zuul logging, for tracing. | 14:50 |
mordred | jeblair: agree | 14:51 |
mordred | jlk: and yah - maybe? I mean, on the other hand we already log the GH event id in the zuul logs | 14:51 |
mordred | jlk: so I'm not sure if reporting a zuul id in the gh response is *actually* useful for cross-reference | 14:52 |
jlk | well | 14:52 |
mordred | jlk: since you'd ultimately need to find the gh event id in the zuul log to know which event to open in the gh ui | 14:52 |
jlk | it'd make sense if we did the same elsewhere, and had an ID that could be carried around like with openstack | 14:52 |
mordred | yah - that's a good point "I got an event from something, I created an ID for it, and that ID is gonna carry through the system on the things that event triggers" | 14:53 |
jlk | but yeah, if you were looking from zuul side, you'd eventually trace it back to the incoming event, which would have the GH ID | 14:53 |
mordred | yup | 14:53 |
jlk | so maybe not as useful to toss it back, but certainly that idea spurred other ideas :D | 14:54 |
jlk | spurred? | 14:54 |
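The "carry an ID through the system" idea jlk and mordred land on is commonly done with a logging adapter. A hypothetical sketch, not zuul's actual logging setup, of tagging every log line from one event's processing with its delivery ID:

```python
import logging
import uuid

logging.basicConfig(format="%(levelname)s %(event_id)s %(message)s")
log = logging.getLogger("zuul.GithubEventProcessor")  # name is illustrative

def process_event(payload):
    # One ID per incoming event: use the delivery ID GitHub sent if we
    # have it, else mint one. Every downstream log line carries it, so
    # grepping for the ID reconstructs that event's trail through zuul.
    event_id = payload.get("delivery_id") or uuid.uuid4().hex
    elog = logging.LoggerAdapter(log, {"event_id": event_id})
    elog.warning("received event")
    return event_id
```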
mordred | jlk: oh - I also happened to notice in the log: | 14:54 |
mordred | AttributeError: 'GithubWebhookListener' object has no attribute '_event_pull_request_review_comment' | 14:54 |
mordred | jlk: only 4 of them in the current debug log | 14:54 |
jlk | yeah we aren't listening for those | 14:54 |
jlk | that's when somebody does a "review" and comments on the code/review in the review context | 14:54 |
jlk | separate from just a single comment on the PR | 14:55 |
jlk | (and also different from a single comment on the diff) | 14:55 |
mordred | jlk: gotcha. so we don't care about those because we only care about comments on PRs for recheck, yeah? | 14:55 |
jlk | correct. Those come through as issue comment | 14:55 |
mordred | jlk: and for reviews I'd imagine we'd prefer to respond to review approval rather than text in a review | 14:56 |
jlk | because a PR is an issue, except that it isn't. | 14:56 |
mordred | yah | 14:56 |
jlk | mordred: bingo. We do respond to approval/request changes events | 14:56 |
mordred | jlk: maybe we should make a few explicit no-op handlers for thins we know we're not listening to on purpose | 14:56 |
mordred | jlk: so that we don't log AttributeErrors for a thing that's actually purposeful behavior | 14:57 |
jlk | We could do that. It should be gracefully returning to github if we don't handle the event. | 14:57 |
jlk | but yeah, that's a tad ugly in the code | 14:57 |
jlk | Maybe we could just not emit that error when we don't match an event. | 14:57 |
jlk | GH adds events from time to time, we'd be chasing it if we had a noop for each. | 14:57 |
mordred | jlk: yah - at this point I think we're fairly happy with our event matching - we could make a separate logger for unmatched events that defaults to off but that people could add to their logging config if they wanted to debug something related to event matching | 14:59 |
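The dispatch-with-a-quiet-default idea mordred sketches can look like this. The `_event_<type>` handler naming follows the convention visible in the AttributeError above, but the class itself is hypothetical:

```python
import logging

# A separate, normally-silent logger that operators can enable in their
# logging config if they need to debug event matching.
unmatched_log = logging.getLogger("zuul.GithubUnmatchedEvents")

class WebhookDispatcher:
    def _event_pull_request(self, body):
        return "enqueued"  # stand-in for real handling

    def dispatch(self, event_type, body):
        # Look the handler up by convention; an event type we don't
        # handle is logged at debug level instead of raising
        # AttributeError, so new GitHub event types need no chasing.
        handler = getattr(self, "_event_" + event_type, None)
        if handler is None:
            unmatched_log.debug("no handler for event %s", event_type)
            return None
        return handler(body)
```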
mordred | btw - we're getting "We couldn’t deliver this payload: Service Timeout" from time to time on gh events | 15:01 |
jlk | oh wonderful | 15:02 |
jlk | from gh to zuul or from zuul to gh? | 15:02 |
mordred | so it's possible that we've already hit the scaling point where having more than one webhook listener behind a loadbalancer is needed | 15:02 |
mordred | gh to zuul | 15:02 |
jlk | yeah, I figured that would happen soon | 15:02 |
jlk | I think a big part of that problem is a bunch of processing happens while the sender is connected | 15:03 |
mordred | yah - even though we're not doing anything with them yet, the firehose of gh events from ansible/ansible is actually kind of useful for shaking this sort of stuff out | 15:03 |
jlk | sender connects, zuul chews on the event for a bit, hits the API a bunch, then returns | 15:03 |
jlk | it would probably be much better to take in the event content and return right away. | 15:04 |
mordred | ah. any reason we can't return as soon as we have the json even if we haven't enqueued it yet? or do you think waiting until it's enqueued so that we properly return to gh that we didn't accept it is better? | 15:04 |
jlk | we should be doing minimal processing. | 15:04 |
jlk | I think I broke this a bit | 15:04 |
jlk | it probably was really fast before, but when I moved to caching the PR data, it meant hitting the APIs much earlier. | 15:05 |
mordred | yah. I think returning quickly is likely better here- and we can do work zuul-side to make sure we either don't lose events after returning 200 or that we log ourselves when we do | 15:05 |
jlk | building up the cached object at web event time and carrying it forward. | 15:05 |
jlk | we could still do that, we just have to be careful | 15:05 |
jlk | need a persistent queue | 15:05 |
jlk | btw I'm going to try to participate in the ansible contributor day thing | 15:06 |
mordred | yah - well - luckily the move from webapp to zuul-web will put a gearman in the middle anyway | 15:06 |
jlk | via video/IRC | 15:06 |
mordred | jlk: cool | 15:06 |
jlk | I think the one hard one to easily do a 200 immediately on is when it's a status event | 15:06 |
jlk | actually, no, we could probably post-process that anyway | 15:06 |
jlk | so basically, we'd get an event from GH, we'd ensure it's signed and properly formatted, then return a 200. Maybe we could check to see if it's a project we care about and do a !200 if we don't care about the project yet. | 15:09 |
jlk | doing all the API work in the event thread ties up the event processor. I think that's single threaded, no? | 15:09 |
jeblair | jlk, mordred: that's what we do with gerrit today -- there's a queue object that connects the gerrit listener to the gerrit connection. all of that within the driver. | 15:10 |
jlk | nod | 15:10 |
jeblair | so if we need this before we move the listener into zuul-web, there's a pattern we can copy pretty quickly. it's not much code. | 15:11 |
jlk | yeah I could probably bang that out today | 15:11 |
jlk | see if that gets us out of resource contention | 15:11 |
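The gerrit pattern jeblair refers to (ack the sender fast, process the event later) is the classic producer/consumer split. A minimal stand-alone sketch of that shape, not the actual driver code:

```python
import queue
import threading

event_queue = queue.Queue()
processed = []

def listener(event):
    """Webhook side: validate cheaply, enqueue, return immediately."""
    if "type" not in event:
        return 400          # reject malformed payloads up front
    event_queue.put(event)  # the slow API work happens off-thread
    return 200

def worker():
    """Connection side: drain the queue and do the heavy processing."""
    while True:
        event = event_queue.get()
        if event is None:   # sentinel to stop the worker
            break
        processed.append(event)  # stand-in for API calls / zuul enqueue
        event_queue.task_done()

t = threading.Thread(target=worker)
t.start()
listener({"type": "pull_request"})
event_queue.put(None)
t.join()
```

Because the queue decouples the two sides, the sender's connection is held only for the cheap validation step; persistence of the queue is then a separate concern, as jlk notes.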
jlk | I honestly think I'll need some high bandwidth brain / face time with y'all to sort out the zuul-web move in my head. I read some of the code but it's not exactly clicking yet | 15:12 |
jeblair | jlk: it helps if you write it out on a piece of glass and look at it from the back side, upside down | 15:17 |
jlk | perfect! | 15:19 |
mordred | jlk: :) | 15:20 |
mordred | jlk: I think we can sort out the zuul-web move with high bandwidth brain time pretty quickly - it's actually pretty straightforward given the structure of the github driver - at least in my head | 15:22 |
* jlk drops off to prepare for ansiblefest things | 15:26 | |
rcarrillocruz | folks, i have to say the zuul docs have been vaaastly improved | 15:29 |
rcarrillocruz | kudos everyone | 15:29 |
* rcarrillocruz keeps reading how to define zuul v3 jobs | 15:29 | |
*** hasharAway is now known as hashar | 15:31 | |
mordred | rcarrillocruz: luckily - we've got a TON of content now | 15:33 |
Shrews | more content to go in with some +3's: https://review.openstack.org/500213 | 15:43 |
Shrews | jeblair: left a -1 on https://review.openstack.org/500216 because of a missing _ causing the link to not be clickable | 15:44 |
jeblair | Shrews: okay, i'll update that after i finish writing tests for mordred's problem | 15:45 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Do not merge: test v3 jobs on the centos-7 image https://review.openstack.org/501281 | 15:45 |
jeblair | mordred: with the test i've written, change B *does* incorrectly run the gate job, however, it correctly *does not* report it. | 15:46 |
jeblair | mordred: are you sure there aren't any other gate jobs? maybe some with matchers so they aren't actually run? | 15:48 |
jeblair | mordred: or were there any other changes involved in that sequence? | 15:48 |
mordred | jeblair: no - not to my knowledge to either | 15:58 |
jeblair | oh... there *might* be an interaction with the check pipeline... lemme rejigger the test | 15:59 |
mordred | jeblair: there is no mention of shade in a zuul gate pipeline anywhere other than that one change | 16:00 |
jeblair | mordred: ah there we go -- it's the presence in check that caused that behavior. i think i have the reproduction now. sorry for the red herring. | 16:01 |
pabelanger | mordred: I think we are ready to land https://review.openstack.org/500990 this morning. Do you have time to review? Our new publish-openstack-python-docs-infra job | 16:01 |
pabelanger | and child patches | 16:02 |
mordred | jeblair: woot! | 16:03 |
mordred | pabelanger: looking now | 16:03 |
mordred | jeblair: glad you found a reproduction - it's those sorts of squirrely things that this whole run-it phase should be smoking out | 16:06 |
rcarrillocruz | folks, where is the zuul base default job defined | 16:09 |
rcarrillocruz | is it in tree | 16:09 |
rcarrillocruz | or within zuul-jobs repo | 16:09 |
jlk | I thought it was in zuul-jobs | 16:09 |
rcarrillocruz | asking as i defined a custom job on my test repo | 16:09 |
rcarrillocruz | and got | 16:09 |
jlk | or maybe project-config | 16:09 |
jlk | project-config | 16:09 |
rcarrillocruz | "Job base not defined" | 16:09 |
mordred | rcarrillocruz: it's project-config | 16:10 |
rcarrillocruz | so, that means, it's a requirement to pull that repo in order to have a minimal zuul right? | 16:10 |
mordred | rcarrillocruz: you have to define your own base job | 16:10 |
jlk | playbooks/base | 16:10 |
jlk | you can define your own base, or re-use project-config. | 16:10 |
mordred | rcarrillocruz: however, our base job playbooks are just built on roles that are all in zuul-jobs | 16:10 |
rcarrillocruz | k, thought we had some sort of empty 'base' in the code | 16:10 |
rcarrillocruz | so we didn't have to define it | 16:10 |
mordred | rcarrillocruz: it's on our todo list to do that - there are a few things that need to be sorted out first, so for now you need a deployment-specific base job | 16:11 |
rcarrillocruz | dummy question: as there's going to be a super handy library of base jobs, how do you plan to distribute that? as part of pip or will it always be a thing on git openstack | 16:12 |
mordred | rcarrillocruz: I recommend copying the base job + playbooks from project-config, then defining a secret that holds credential for whereever you want to upload logs | 16:12 |
mordred | rcarrillocruz: git openstack | 16:12 |
mordred | rcarrillocruz: because you can just put openstack-infra/zuul-jobs directly in your zuul/main.yaml | 16:12 |
mordred | rcarrillocruz: and zuul will update it for you magically | 16:12 |
mordred | rcarrillocruz: I'm sure at some point someone is going to think they want a frozen pip/rpm/deb installable set of jobs, but I'm going to argue with them as strongly as I can that they don't really want that :) | 16:13 |
mordred | pabelanger: I'm +2 on that whole stack | 16:13 |
rcarrillocruz | ah ofc, zuul-jobs is *in* github too | 16:14 |
rcarrillocruz | so in my case, a GitHub-driver-only installation, it would pull it as well | 16:14 |
rcarrillocruz | ++ | 16:14 |
jlk | yeah you can point to github | 16:14 |
jlk | we do at Bonny | 16:14 |
mordred | rcarrillocruz: yup | 16:14 |
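A minimal sketch of what mordred describes above — listing zuul-jobs directly in the tenant config so zuul keeps it updated "magically". The tenant name, connection name, and project-config repo name here are assumptions, not the real deployment's values:

```yaml
# zuul/main.yaml (illustrative; "gerrit" could equally be a github
# connection for a GitHub-only installation like rcarrillocruz's)
- tenant:
    name: example-tenant
    source:
      gerrit:
        config-projects:
          - my-org/project-config       # holds the deployment-specific base job
        untrusted-projects:
          - openstack-infra/zuul-jobs   # shared roles, updated by zuul itself
```

The distinction matters for the later discussion: jobs in config-projects run in the trusted context, while zuul-jobs content is untrusted.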
dmsimard | mordred, jeblair: is it possible to prevent the base role from running ? | 16:15 |
rcarrillocruz | jlk: periodic also work on github right? | 16:15 |
rcarrillocruz | i'm doing a POC | 16:15 |
jlk | periodic driver? | 16:15 |
jlk | I haven't tried... | 16:15 |
rcarrillocruz | a periodic pipeline | 16:15 |
rcarrillocruz | with github source | 16:15 |
jlk | I mean, it should? | 16:15 |
mordred | dmsimard: you can say "parent: none" to make a job that doesn't use the base job | 16:15 |
dmsimard | mordred: perfect! thanks. | 16:15 |
mordred | dmsimard: although if you did that on openstack's zuul you'd be very sad | 16:15 |
mordred | dmsimard: since you don't get logging without our base job :) | 16:15 |
dmsimard | mordred: right, but the purpose is to test the base playbook (and the roles it contains) | 16:16 |
dmsimard | so it's kind of inconvenient if the trusted role runs first, and then we re-run the (modified) role on top | 16:16 |
mordred | oh - well - we'll never run a proposed version of the base job in a job | 16:16 |
dmsimard | mordred: what is preventing me from adding a required-projects: project-config and then running that playbook with the checked out roles from a review ? | 16:17 |
mordred | dmsimard: zuul is | 16:17 |
mordred | oh - hrm. | 16:18 |
mordred | dmsimard: yah - ok, you could construct something - it would still be synthetic, as it wouldn't have access to the secrets needed for the base job to work | 16:18 |
dmsimard | mordred: the tl;dr is that I want to make sure the base playbook works for all distros -- right now it only works for ubuntu. This is for centos: https://review.openstack.org/#/c/501281/ but I'll also add the debian, fedora and opensuse image to nodepool v3 | 16:19 |
dmsimard | and not getting configure-mirror "self-tested" in the gate will make this suck, a lot | 16:20 |
dmsimard | thus, I was planning on spawning a multi-node job and running ansible from a controller node, to the node where the base role would be applied -- a bit like how you showed me with zuul stream | 16:20 |
mordred | dmsimard: yah - well, the base-job content is purposely not self-testing - which is why we need to make some synthetic tests | 16:20 |
mordred | dmsimard: oh - but yes, that's exactly right | 16:20 |
dmsimard | mordred: so, can I do that then ? | 16:20 |
mordred | dmsimard: doing that is, I think, what we need to do to verify base content - but then it's not really about being able to run a job without its base job... | 16:21 |
mordred | or, rather... | 16:21 |
mordred | dmsimard: the synthetic job will still need to deal with providing the job running from the controller to the other node with variables that the job can use when it runs the roles in the base job's playbooks | 16:22 |
dmsimard | sure, I can figure what to pass and provide "mock" data as necessary | 16:23 |
mordred | dmsimard: so - that's a thing you could have your synthetic job create - like making sure there is a key installed on one of the nodes, then passing it in | 16:24 |
dmsimard | the purpose is to test that it works without failing horribly | 16:24 |
dmsimard | right | 16:24 |
pabelanger | mordred: thanks, jeblair clarkb fungi: are you interested in reviewing https://review.openstack.org/500990 for our publish-openstack-python-docs-infra jobs | 16:24 |
mordred | dmsimard: we still likely want to make a base job in project-config like "base-post-only" or something that you could use that would only run the post-logs playbook - and maybe that would only run the pre-playbooks against the controller node instead of against hosts: all | 16:25 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix dynamic dependent pipeline failure https://review.openstack.org/501345 | 16:25 |
rbergeron | jlk, pabelanger: jeblair / mordred / shrews aren't here with us today at the ansible contributor event -- but since you're here (paul in person, jesse remotely) if you feel like there's anything to hail them about, feel free to do so -- | 16:25 |
mordred | dmsimard: so that you can use the REAL base job to set up your interactions with controller and to get logs published, etc | 16:25 |
rbergeron | the agenda and bluejeans stuff is posted here: https://public.etherpad-mozilla.org/p/ansible-summit-september-2017-core including the bluejeans video stuff | 16:25 |
rbergeron | which i said twice in one sentence | 16:26 |
* mordred waves to rbergeron | 16:26 | |
rbergeron | anyway. | 16:26 |
rbergeron | shrews: i have your hoodie also :) | 16:26 |
fungi | pabelanger: i'm interested in reviewing anything and everything zuulv3, working my way through the infra-manual patches right now but i can look at those next | 16:26 |
* rbergeron waves to mordred | 16:26 | |
rbergeron | (and we're in #ansible-meeting on irc). sorry for the noise. also if anyone else wants to pop in for whatever reason you are welcome to :) | 16:26 |
jeblair | rbergeron: thanks! | 16:27 |
pabelanger | fungi: great! I think we're ready to start testing afs publishing again for infra jobs | 16:27 |
jeblair | mordred: https://review.openstack.org/501345 fixes the first thing (the A+B changes) | 16:28 |
mordred | jeblair: awesome. reading. also, I added post-merge review comments to https://review.openstack.org/#/c/500213 | 16:31 |
dmsimard | mordred: I'm not sure that matters, 'controller' is a subset of 'all' anyway -- so when running the playbook in the job, it will target a specific node as appropriate | 16:31 |
mordred | dmsimard: I believe I see where you're going, but I do not believe it's going to work quite like you want - you can get the proposed project-config change onto a build node, but you cannot get the existing zuul to execute playbooks from the proposed change no matter what you do because of the way config repos work | 16:39 |
clarkb | pabelanger: mordred please see comment on 990 | 16:39 |
mordred | dmsimard: so by synthetic, I mean you're going to have to command: ansible-playbook something on the controller against one of the other nodes in the multinode job | 16:39 |
dmsimard | mordred: yes, exactly -- I'll be running ansible-playbook | 16:39 |
mordred | clarkb: will do - could you look at https://review.openstack.org/501345 ? | 16:39 |
dmsimard | it's not perfect but it's the best we've got | 16:39 |
mordred | dmsimard: ++ | 16:39 |
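A sketch of the "synthetic" approach mordred outlines: the playbook zuul runs targets only the controller, and the controller invokes ansible-playbook itself against the other node. All paths, the inventory file name, and the playbook location are assumptions for illustration:

```yaml
# run.yaml of the hypothetical integration job
- hosts: controller
  tasks:
    - name: Run the proposed base playbook against the worker node
      command: >
        ansible-playbook -i ~/test-inventory
        ~/src/git.openstack.org/openstack-infra/project-config/playbooks/base/pre.yaml
```

Because the inner ansible-playbook runs on a build node rather than the executor, it can exercise unmerged project-config content (checked out via required-projects) without ever touching the trusted execution context — which is exactly the property jeblair insists on later in the discussion.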
mordred | clarkb: we found a really fun edge case with gating this morning :) | 16:40 |
mordred | clarkb: responded. I agree with your comment, but I think pabelanger can make the mv docs/post.yaml docs/infra-post.yaml when he adds the real post playbook for the openstack job | 16:42 |
clarkb | mordred: re parent: none from above, the change you just linked uses parent: null. Is that just a convention of pointing to undefined name or is null actually needed? | 16:47 |
dmsimard | jeblair: Can parameters from a job also be applied to a node ? If, for example, you'd want to run a playbook or a role against only one node. | 16:52 |
mordred | dmsimard: you define that in the playbook | 16:52 |
mordred | dmsimard: you can define groups for your nodes in the nodeset definition | 16:52 |
mordred | dmsimard: so you can put some of them into a group and then write the playbook to target that group | 16:52 |
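What mordred describes might look like the following; the nodeset name, node labels, and group name are assumptions:

```yaml
# nodeset definition in a zuul config file
- nodeset:
    name: controller-and-node
    nodes:
      - name: controller
        label: ubuntu-xenial
      - name: node
        label: centos-7
    groups:
      - name: targets          # playbooks can address this group by name
        nodes:
          - node

# playbooks/example/run.yaml — restricts itself to the group:
# - hosts: targets
#   tasks:
#     - name: Only runs on the grouped node
#       debug:
#         msg: "{{ inventory_hostname }}"
```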
mordred | clarkb: parent: null is required for the base job - since by definition it's the root of the inheritance hierarchy | 16:53 |
clarkb | mordred: and that is distinct from parent: none? | 16:53 |
mordred | nope. I just mistyped earlier | 16:53 |
clarkb | ah ok | 16:53 |
mordred | null is the yaml for None iirc | 16:54 |
dmsimard | mordred: so here's my next awesome problem | 16:54 |
clarkb | like false == False and so on | 16:54 |
mordred | main thing is - for a base job you must tell zuul explicitly it doesn't have a parent | 16:54 |
mordred | becase omitting parent: means parent: base | 16:54 |
mordred | clarkb: yup | 16:54 |
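Put together, the rule mordred states looks like this (playbook paths are illustrative):

```yaml
- job:
    name: base
    parent: null          # explicit: this job is the root of the inheritance tree
    pre-run: playbooks/base/pre.yaml
    post-run: playbooks/base/post.yaml

- job:
    name: tox-py35        # no "parent:" line, so parent defaults to base
    run: playbooks/tox/run.yaml
```

And since `null` is YAML's spelling of Python's `None` (as clarkb and mordred note), `parent: none` would instead be the string "none" — hence the required spelling.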
pabelanger | mordred: clarkb: replied. Yes, publish-openstack-python-docs job needs to be updated now, which I am working on locally. But want to make sure that python-docs-infra is now working, since we can build on top of that for unified docs | 16:54 |
dmsimard | mordred: I'd like the *real real* base job to actually run on the controller node (so it can, like, upload logs for real and stuff) but not on the node I'm going to fake-run base on | 16:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add test for dependent changes not in a pipeline https://review.openstack.org/501353 | 16:55 |
dmsimard | I hope that makes sense | 16:55 |
jeblair | mordred: ^ turns out that change fixed the second thing too, so that's just a test add | 16:55 |
mordred | dmsimard: right- that's why I was thinking we needed to make a special base job for this purpose in project-config that is like the base job but has playbooks that target a specific group instead of "all" | 16:55 |
* jeblair reads scrollback | 16:56 | |
mordred | jeblair: woot. btw - the first patch failed tests in gate | 16:56 |
dmsimard | mordred: making another playbook is easy, I mean, I can just add it as a "fixture" in zuul-jobs -- but then it can get out of sync from the "real" base playbook | 16:56 |
dmsimard | mordred: oh, wait, I'm confusing myself I see what you mean now | 16:57 |
dmsimard | ok, sure, let's do that | 16:57 |
mordred | dmsimard: although - now that I think about it - hosts: lines can have variables - so we COULD consider making a variable on the base job that is like "zuul_default_target: all" - and then defining some of our base playbooks to use hosts: "{{ zuul_default_target }}" - which would let people override that variable and run base playbooks against a subset of nodes | 16:57 |
dmsimard | mordred: what I was thinking about is more along the lines of --limit from the CLI | 16:58 |
dmsimard | mordred: your playbook has 'all' but you're passing a --limit <node name> so that you'd only be running against a specific node or group | 16:58 |
mordred | I could see times in which that would be beneficial to other people - especially with a controller/nodes pattern - like if someone wanted 3 nodes for a puppet integration test but only was ever going to run their zuul playbooks against controller since they want puppet to talk to and manage their other nodes | 16:58 |
pabelanger | mordred: jeblair: re: wheel builders we'll need more openstack-infra projects to zuulv3, are we okay to do that or hold off until mass import? | 16:58 |
mordred | dmsimard: I think both are things we should consider - but for now let's just do a second base job with a limited hardcoded set | 16:59 |
mordred | dmsimard: and hash out a plan for the general usecase next week | 16:59 |
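The variable-hosts idea mordred floats above might be sketched like this; `zuul_default_target` is a hypothetical variable name from this discussion, not an existing Zuul feature:

```yaml
# base playbook sketch
- hosts: "{{ zuul_default_target | default('all') }}"
  tasks:
    - name: Base-job setup that a child job could constrain to a subset
      debug:
        msg: "running on {{ inventory_hostname }}"
```

A child job would then set `zuul_default_target: controller` in its variables, approximating the `ansible-playbook --limit` behavior dmsimard describes, but expressed in job configuration rather than on the CLI.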
mordred | pabelanger: it's only 3 more projects | 16:59 |
dmsimard | mordred: right, I think adding support for a parameter which gets passed to --limit makes sense, wonder if I should write it down somewhere | 16:59 |
pabelanger | mordred: yes, I can propose it now. just wanted to confirm first | 16:59 |
mordred | https://review.openstack.org/#/c/500626/2/zuul.yaml | 16:59 |
mordred | pabelanger: I have the whole wheel-builder stack :) | 17:00 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Add zuul-cloner shim https://review.openstack.org/500922 | 17:00 |
jeblair | dmsimard: why not just write your playbook to act on the nodes you want it to? | 17:00 |
Shrews | rbergeron: \o/ | 17:01 |
mordred | jeblair: because what he wants to do is limit the nodes the base playbook operates on | 17:01 |
jeblair | mordred: if that's the case then we have too much stuff in the base playbook | 17:01 |
mordred | jeblair: so that content he runs on the 'controller' node is what's responsible for doing the things to the other node that base would normally do | 17:01 |
dmsimard | it's probably confusing to explain in writing, I'll write it down and explain in person :D | 17:01 |
pabelanger | mordred: Oh, now I see | 17:02 |
mordred | jeblair: not really - zuul_stream test, for instance, has a zuul_console on the node it runs against because the zuul base job sets that up - so we don't actually test in that change that running zuul_console on the remote node does what we expect | 17:02 |
pabelanger | doh | 17:02 |
mordred | jeblair: but I think we can shelve this as a general topic until next week | 17:02 |
dmsimard | mordred: could/should I re-purpose base-test for that purpose ? | 17:02 |
pabelanger | mordred: we need to land zuul/main.yaml first. I'll make change | 17:02 |
mordred | jeblair: and for now do exactly what you said - which is "write a playbook which explicitly lists the hosts desired" | 17:02 |
jeblair | mordred: i have a really high bar for adding features to zuul for the express purpose of being able to test zuul | 17:02 |
mordred | jeblair: yes. I do not think it's needed | 17:03 |
mordred | jeblair: to add any features | 17:03 |
jeblair | so the fact that zuul_stream is hard to test doesn't bother me. that's *our* problem, we don't need to inflict that on users of the general case :) | 17:03 |
mordred | jeblair: right. I'm not saying we need to add any features - I think there are some things we can do simply right now, and some things we can do that might be more complex and general that we can talk about next week | 17:03 |
jeblair | (we can write a playbook to kill it; job done :) | 17:03 |
dmsimard | jeblair: What I'm talking about is not for the purpose of testing Zuul, it's a generic feature of Ansible to be able to limit what hosts a playbook will run against -- your playbook could have 'hosts: all' but if you do ansible-playbook playbook.yml --limit controller, it would only run against your controller node. | 17:04 |
jlk | Do we have any pending upstream Ansible features? | 17:05 |
*** jkilpatr has quit IRC | 17:06 | |
mordred | jlk: the upstreaming of log streaming that we discussed with abadger1999 and jimi-c - but I believe that's at "write a spec" stage | 17:06 |
jlk | okay | 17:06 |
jeblair | dmsimard: right, but i worry that in the zuul context that gets a little confusing when compared to playbooks authored to run against node lists. it seems like if you have a playbook that runs against 'foobar' nodes, and you don't want it to run on foobar nodes, don't add any foobar nodes to the job? | 17:07 |
mordred | jlk: other than that, I think we're pretty solid at the moment | 17:07 |
jlk | okay | 17:07 |
*** harlowja has quit IRC | 17:07 | |
*** harlowja has joined #zuul | 17:07 | |
mordred | for now, how about we add a base-integration - or even base-zuul-integration - that has the same content as base but with playbooks that target controller instead of all - it's a purely job-content solution for being able to build a zuul job to test the base job | 17:09 |
mordred | which, as jeblair points out, is a fairly unique and specific problem | 17:10 |
Shrews | jeblair: cloner shim is ready for review. I thought it would be less error prone to just flat out copy the ClonerMapper class into the shim. Also, using 'cp' for the hard linking since that seemed less error prone than a pure python solution. | 17:10 |
jeblair | yeah, i think part of the context that we're missing here is that we just spent about 3 weeks ensuring that the base content ran everywhere as part of our security posture | 17:10 |
jeblair | mordred, dmsimard: so anything where we back that out is something that we need to do very carefully | 17:11 |
jeblair | mordred, dmsimard: for instance, even in the reduced base job that mordred is talking about, we need to run the ssh key thing | 17:12 |
jeblair | mordred, dmsimard: and there can be no ability for a job to opt-out of that, otherwise we have created a vulnerability | 17:13 |
dmsimard | jeblair: ok let's take a step back | 17:13 |
jeblair | mordred, dmsimard: (to be clear, i'm in favor of mordred's reduced base job, as long as it still contains the ssh key roles running on all hosts) | 17:14 |
dmsimard | jeblair: instead of telling you what I think I want, let me tell you what's my problem and let's see if we're on the same wavelength | 17:14 |
dmsimard | jeblair: I'm trying to iterate on this: https://review.openstack.org/#/c/501281/ | 17:15 |
dmsimard | jeblair: my problem: how do I test that this doesn't break the base playbook on different distros ? | 17:16 |
pabelanger | Shrews: so if we wanted to bring nl02.o.o online, is it better to run both nl01 and nl02 at the same time or stop nl01.o.o and start nl02.o.o. Do you have a preference? | 17:16 |
Shrews | pabelanger: i can't think of any reason why you'd need to stop nl01 | 17:17 |
jeblair | dmsimard: thanks. having that example helps. | 17:17 |
dmsimard | jeblair: what I think I need: create a 2 node job, 'controller' and 'node', run the *real* base playbook on the controller, and get 'controller' to run ansible-playbook pre/post/etcbase.yaml on 'node' | 17:17 |
pabelanger | Shrews: eventually we want to stop / delete / rebuild nl01, since it is trusty | 17:17 |
dmsimard | but if the *real* base playbooks run on 'node', then it kind of sucks because I'm running on top of what already ran. | 17:18 |
clarkb | mordred: is log collecting in v3 expected to left align everything? http://logs.openstack.org/45/501345/1/check/tox-py35/4f47527/job-output.txt.gz#_2017-09-06_16_36_23_660139 | 17:18 |
pabelanger | I think we might want to start bringing online another zuulv3 merger, ze01.o.o is currently processing a large nova change | 17:19 |
jeblair | dmsimard: yeah, so i think mordred's suggestion of the reduced base job which only does minimal things (ssh keys, zuul_stream, logs) is the way to go; ssh keys and zuul_stream are the only things that will run on the remote node, and ssh keys are the only thing that might have an operating system interaction | 17:19 |
mordred | clarkb: nope. that's a bug | 17:19 |
*** jkilpatr has joined #zuul | 17:19 | |
jeblair | pabelanger: ze01 -- ze04 exist; you can make sure the others are up to date and bring them online | 17:19 |
jeblair | dmsimard: however | 17:20 |
pabelanger | jeblair: ah, right. I'll check that now | 17:20 |
jeblair | dmsimard: note that once you have the reduced base job, you don't actually need to implement this as a multinode job, you can run the additional roles on the main node | 17:20 |
mordred | clarkb: although that's content from inside of the output from tox from testr from zuul's tests - so I don't believe we're processing that exception text zuul side | 17:20 |
dmsimard | jeblair: I guess | 17:21 |
clarkb | jeblair: mordred: looking at test failures for 501345 I think that the fix has basically caught test fixtures that were/are broken and now we dequeue and don't report but assert we should report | 17:22 |
jeblair | dmsimard: i think the key thing here is the ssh keys -- regardless of the technical capabilities of the system, we must as a matter of policy in openstack-infra, at the very least run the ssh keys role on every node. | 17:22 |
mordred | jeblair: can you though? you won't be able to get zuul to put the proposed versions of the project-config roles in place on the executor | 17:22 |
mordred | jeblair: or I guess I'm wrong - the job playbook that declares role will get those ... so yah | 17:22 |
jeblair | mordred: ya that second thing, so we can have a test job in zuul-jobs that exercises the roles | 17:23 |
clarkb | mordred: unless that is a new behavior in testr I'm pretty sure it won't left align like that | 17:23 |
mordred | jeblair: yah- main thing will be that we won't be able to get zuul to run *playbooks* from project-config | 17:23 |
pabelanger | jeblair: mordred: clarkb: fungi: I'm going to start ze02.o.o now, any objections? It is up to date | 17:23 |
jeblair | clarkb: ah thanks, looks like i have a bit of cleanup to do | 17:23 |
mordred | jeblair: but since those playbooks are simple anyway, that shouldn't be a problem | 17:23 |
dmsimard | mordred: could you not do required-projects: project-config and then have a playbook that includes project-config/playbooks/something.yaml ? | 17:24 |
mordred | clarkb: yah - I'll look at the zuul_stream stack and see if I can reproduce | 17:24 |
mordred | dmsimard: nope | 17:24 |
dmsimard | mordred: or ansible-playbook project-config/playbooks/something.yaml | 17:24 |
dmsimard | why ? | 17:24 |
mordred | dmsimard: you can't execute commands on the executor | 17:24 |
jeblair | (that sounds ironic) | 17:24 |
mordred | dmsimard: the only way for playbooks to be executed from the executor is for zuul to execute them | 17:24 |
clarkb | mordred: if you don't mind I'd like to poke at that for a bit first just to gain more familiarity with the streaming setup | 17:25 |
mordred | so we can put a playbook in zuul-jobs that runs the same roles as the base job - and _that_ playbook is one that zuul can execute | 17:25 |
mordred | clarkb: awesome! I have a helper tool in tree to help get set up with local testing, fwiw | 17:25 |
dmsimard | mordred: I'm confused, let me put up a gist to express what I'm trying to say | 17:26 |
clarkb | mordred: ya I see it test() :) | 17:26 |
mordred | clarkb: https://review.openstack.org/#/c/500161/ | 17:26 |
mordred | dmsimard: ++ | 17:26 |
fungi | pabelanger: starting ze02 sounds good to me | 17:27 |
pabelanger | okay, ze02.o.o started | 17:30 |
dmsimard | mordred: https://gist.github.com/dmsimard/1fc6b22a40009298713c7432d9368a37 | 17:34 |
fungi | pabelanger: is your comment in 500990 implying that you have another patchset coming to address clarkb's concern, or a followup change? | 17:34 |
dmsimard | mordred: I guess in this example, we'd also need to set up the ansible roles path to seek from the checked out zuul-jobs | 17:35 |
pabelanger | fungi: yes, that is what I am writing now. I hope to push up the publish-openstack-python-docs changes in the next hour | 17:35 |
dmsimard | mordred: edited the gist to add the roles_path | 17:36 |
dmsimard | jeblair: https://gist.github.com/dmsimard/1fc6b22a40009298713c7432d9368a37 ? | 17:36 |
dmsimard | er, that ansible.cfg would not be effective | 17:38 |
dmsimard | unless we'd run from a multi-node setup and run ansible manually from a controller to a node | 17:39 |
fungi | pabelanger: awesome, but my question was about whether it's a new patchset for that change, or a new change entirely | 17:39 |
jeblair | dmsimard, mordred: theoretically, i think the include approach could work -- you could probably use include to get zuul to execute the un-merged project-config code (if you merged that job which did that) | 17:39 |
jeblair | dmsimard, mordred: however, that's the reason we should not merge such a change, as it allows arbitrary code execution on the executor | 17:39 |
pabelanger | fungi: sorry, it will be a follow up because we need a new role in openstack-zuul-jobs. Basically, we can delete publish-openstack-python-docs job right now, if we want to avoid projects using it, currently that is shade | 17:40 |
jeblair | dmsimard: (ftr that path would actually be "{{ zuul.executor.src_dir }}/git.openstack.org/project-config/..." but that's a minor detail) | 17:41 |
dmsimard | jeblair: ah that's the one I was looking for actually, I couldn't find it | 17:42 |
jeblair | dmsimard: put another way: that change would let you tell the executor to run un-vetted code. it would also let people pwn the executor. so we can't merge it. | 17:42 |
fungi | pabelanger: okay, i'll put 500990 on the back burner for a bit and review it in the context of your coming changes | 17:42 |
dmsimard | jeblair: but $enduser from $project could merge something like that | 17:43 |
jeblair | dmsimard: no, that's a base job, and they can only go into project-config | 17:43 |
dmsimard | jeblair: do trusted jobs run outside the bubblewrap ? | 17:43 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Switch to publish-openstack-python-docs-infra https://review.openstack.org/501362 | 17:44 |
jeblair | dmsimard: no, they have their own bubblewrap (with potentially more access) | 17:44 |
pabelanger | fungi: so, we'd need to land ^, then we can remove publish-openstack-python-docs until new code is ready | 17:44 |
dmsimard | jeblair: I guess that's part of what I missed, okay. | 17:44 |
jlk | mordred: et al: There was code added to the github driver to handle a ping event, if it's from a repo we aren't configured to listen to. We're apparently being nice and responding back to github with a 404. If I move over to an ingest, queue, process model, we wouldn't be able to immediately (or at all really) return that 404. How important is this nicety of the 404? | 17:47 |
*** jkilpatr has quit IRC | 17:47 | |
pabelanger | fungi: 501363 removes it for now, until I push up new role | 17:47 |
jeblair | jlk: i guess the 404 just says to anyone looking at the github webhook logs that zuul is ignoring it? | 17:48 |
clarkb | mordred: where does the hostname come from in the log? I see we do timestamp | log_line but no hostname | 17:48 |
jlk | jeblair: yeah | 17:48 |
pabelanger | fungi: but, I think we should start testing 500990 sooner to confirm it works as expected | 17:48 |
*** jkilpatr has joined #zuul | 17:48 | |
jeblair | jlk: i feel like 200 is okay. like "message received!" the fact that it was subsequently ignored is a detail that a zuul admin can inspect. | 17:48 |
jlk | jeblair: it's odd that we do it specifically for the ping event, which is when somebody installs a webhook . | 17:49 |
jlk | different than an app install I think. | 17:49 |
mordred | jeblair, dmsimard: I don't think it would open the door to arbitrary code execution - it would just not run because the execution context is still untrusted | 17:49 |
jlk | I'll drop a TODO in here to validate that the project we got an event for is a project we care about. | 17:49 |
jlk | because we've talked about doing that anyway across the board, not just on ping events. | 17:49 |
mordred | clarkb: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/ansible/callback/zuul_stream.py?h=feature/zuulv3#n277 | 17:50 |
jeblair | mordred: in dmsimard's gist, there is a playbook defined in a project-config job, so it runs in the trusted context. the content of that playbook is to ansible-include other playbooks from the checkout of unmerged code on the executor. that means that a change to an untrusted-project which used that base job and depended on an un-merged change to project-config would run the un-merged project-config content in the trusted context. | 17:52 |
mordred | jlk: I think it's fine to do 200 | 17:52 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Ignore errors from ara generate https://review.openstack.org/500645 | 17:52 |
clarkb | mordred: thanks, I also see the bug now. But have more questions about the relationship between zuul/ansible/library/ and zuul/ansible/callback | 17:53 |
mordred | jeblair: oh - right - sorry - in my brain that playbook was a playbook in zuul-jobs | 17:53 |
mordred | clarkb: sweet! questions are good | 17:53 |
clarkb | mordred: so I was mostly looking in library. and command.py there writes to /tmp/uuid.log and zuul_console.py in library reads that and streams it on 19885 | 17:54 |
clarkb | mordred: so I guess the question is where does the callback fit in if we are already writing to the file and streaming it | 17:55 |
mordred | jeblair, dmsimard: in any case, I agree that that's not how we should do this - I think the limited base job in project-config that just does ssh keys, stream and log collection, then a job in zuul-jobs or anywhere else can use that safely | 17:55 |
dmsimard | mordred: yeah I'm sending a patch for exactly that in a moment | 17:55 |
mordred | clarkb: yah - so - command.py in library is ACTUALLY the thing you want to be thinking about from library | 17:55 |
mordred | clarkb: that runs on the remote node when command or shell tasks are run | 17:56 |
mordred | clarkb: and yes, it logs to a local file | 17:56 |
jeblair | dmsimard: note (this is based on your gist) that the base job doesn't need to be node-specific. you can specify a default, or even omit the nodes section entirely from it. either way, the job or jobs you use to iterate on this in the zuul-jobs repo can specify a nodes section with the labels you want | 17:56 |
mordred | clarkb: at the top of the pre-playbook in the base job we run zuul_console which forks off a daemon that reads the files that the command tasks write to disk | 17:56 |
mordred | clarkb: that daemon also listens on port 19885 for incoming connections | 17:57 |
clarkb | right and that is in library/ as well | 17:57 |
mordred | clarkb: zuul_stream in callbacks runs as part of the ansible-playbook process on the executor | 17:58 |
mordred | clarkb: one instance of it is created per ansible-playbook invocation, and ansible-playbook calls its methods as things happen on the executor | 17:58 |
dmsimard | jeblair: I was actually wondering -- what I'm doing amounts to integration test the base playbooks against all distros. Should I do one job with 5 nodes? One of each distro? | 17:58 |
clarkb | mordred: and the callbacks are the aggregation point? | 17:59 |
mordred | clarkb: yes | 18:00 |
mordred | clarkb: as tasks start and stop the callbacks get methods called - and then in zuul_stream if we notice that it's a command or shell task, we spin up a thread to connect to the port of the daemon on the remote node and read the log stream from it | 18:00 |
mordred | clarkb: as we collect that content it is written to local disk on the executor in job-output.txt - which is what the finger daemon reads from when you hit it | 18:01 |
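The streaming path mordred describes — zuul_stream spinning up a per-task thread that connects to the console daemon on the node and appends what it reads to job-output.txt on the executor — can be sketched roughly like this. This is a simplified assumption of the mechanism, not Zuul's actual protocol: the port number comes from the discussion, but the handshake and file naming are invented for illustration.

```python
import socket
import threading

ZUUL_CONSOLE_PORT = 19885  # port the zuul_console daemon listens on (from the discussion)

def stream_task_log(host, log_id, output_path, port=ZUUL_CONSOLE_PORT):
    """Connect to the console daemon on a remote node and append its
    log stream to the local job-output.txt (simplified sketch)."""
    sock = socket.create_connection((host, port))
    try:
        # Hypothetical handshake: ask the daemon for a specific task's log.
        sock.sendall(f"{log_id}\n".encode("utf-8"))
        with open(output_path, "ab") as out:
            while True:
                chunk = sock.recv(4096)
                if not chunk:  # daemon closed the connection; task is done
                    break
                out.write(chunk)
    finally:
        sock.close()

def start_streamer(host, log_id, output_path):
    # zuul_stream starts one such thread per command/shell task it notices.
    t = threading.Thread(target=stream_task_log,
                         args=(host, log_id, output_path), daemon=True)
    t.start()
    return t
```

The finger daemon then only ever has to read the single aggregated job-output.txt on the executor, never talk to the nodes directly.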
jeblair | dmsimard: you can; though 5 jobs each on one node is both easier for nodepool to supply and easier for humans to parse test results. | 18:02 |
*** harlowja has quit IRC | 18:02 | |
dmsimard | jeblair: in zuul could I set up a job-template and then expand the template with the node types ? | 18:03 |
dmsimard | I see in the docs there's a notion of job template but it's not very fleshed out | 18:03 |
jeblair | dmsimard: there's no job-template in zuul v3. | 18:03 |
dmsimard | jeblair: ah, er, I mistook for project template I guess | 18:03 |
jeblair | dmsimard: instead, make a job definition, then make 5 jobs that inherit from it, each with a different node type | 18:04 |
dmsimard | jeblair: | 18:04 |
dmsimard | jeblair: that's what I was doing but it felt a bit verbose | 18:04 |
jeblair | dmsimard: something like https://etherpad.openstack.org/p/Gdqs1NlMIQ | 18:06 |
jeblair | dmsimard: also, we should probably put these jobs in openstack-zuul-jobs rather than the zuul-jobs repo | 18:06 |
jeblair | dmsimard: the actual labels we're testing against are a little openstack-specific | 18:06 |
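A layout along the lines jeblair sketches — one parent job plus per-distro children, each overriding only the node label — might look like this (job and label names here are illustrative, not taken from the etherpad):

```yaml
# Hypothetical zuul.yaml fragment: one base job, single-node children per distro.
- job:
    name: base-integration
    run: playbooks/integration.yaml

- job:
    name: base-integration-centos-7
    parent: base-integration
    nodes:
      - name: primary
        label: centos-7

- job:
    name: base-integration-ubuntu-xenial
    parent: base-integration
    nodes:
      - name: primary
        label: ubuntu-xenial
```

One single-node job per distro is easier for nodepool to supply than one five-node job, and a failure report names the distro directly.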
clarkb | mordred: and is _log_message() there to record non command/shell logs? | 18:07 |
clarkb | mordred: there is both _log and _log_message in the callback and trying to figure out why we need both | 18:07 |
jeblair | dmsimard: (but if it's faster to iterate against zuul-jobs for now, we can do that, and just avoid landing the changes for the moment) | 18:08 |
pabelanger | ze02.o.o looks to be working | 18:12 |
dmsimard | jeblair: I didn't even know that openstack-zuul-jobs was a thing | 18:12 |
pabelanger | mordred: https://review.openstack.org/500201/ so do we want openstack-doc-builds for shade and zuul? Or will it be tox-docs ? | 18:14 |
clarkb | mordred: actually in v2_runner_on_skipped we use both | 18:15 |
mordred | clarkb: _log is lower level - _log_message is a convenience wrapper | 18:18 |
dmsimard | Do we need to use the new depends-on syntax for v3 ? | 18:20 |
dmsimard | Or can we still just use the gerrit changeid ? | 18:20 |
dmsimard | docs seem to suggest it's just gerrit changeid but I recall a certain thread mentioning it would change (to support gerrit and github side by side for example?) | 18:23 |
jlk | I know github only supports the new method | 18:27 |
jlk | I think gerrit supports both? | 18:28 |
dmsimard | ah so it'd be driver/backend specific | 18:28 |
* dmsimard digs in code | 18:28 | |
jlk | ah crap. Something broke local logging | 18:28 |
jlk | zuul-scheduler_1 | Error grabbing logs: invalid character '\x00' looking for beginning of value | 18:28 |
jlk | and I'm not getting things logged to console | 18:29 |
dmsimard | jlk: yeah you're right it's driver specific | 18:29 |
mordred | clarkb: also - fwiw, the entire zuul_stream file needs to be refactored - but have been putting that off | 18:31 |
mordred | dmsimard: we also have not yet implemented cross-driver depends-on - that's a post-ptg thing | 18:32 |
mordred | dmsimard: so you can't (yet) depends-on a github change from a gerrit change or vice-versa | 18:32 |
mordred | dmsimard: we _definitely_ want to add that though | 18:32 |
dmsimard | jlk: I think jeblair had a patch to fix some junk whitespace issue | 18:32 |
dmsimard | jlk: https://github.com/openstack-infra/zuul-jobs/commit/a35c2ad35ed4aa5be85194d8bcf419bb0025272f | 18:32 |
dmsimard | not sure if it's related | 18:32 |
jlk | not related, this is well before job running | 18:33 |
dmsimard | mordred: right, I was wondering if inside gerrit we could keep using depends-on: <changeid> which seems to be the case so that's okay | 18:33 |
jlk | mordred: we should add, if we haven't already, the ability for gerrit to USE the new syntax, even if it just refers to itself. | 18:33 |
jlk | so that the same syntax between github and gerrit can be used | 18:34 |
dmsimard | mordred: btw, the minimal playbook thing: https://review.openstack.org/#/c/501368/ | 18:35 |
mordred | jlk: I believe we have | 18:36 |
fungi | clarkb: does pabelanger's followup change address your comment on 500990? | 18:37 |
dmsimard | jlk: ah, yes, I guess that's what I meant -- if it was necessary for gerrit to use the new syntax against itself | 18:38 |
fungi | dmsimard: the proposal i recall was that for the gerrit trigger we would support change-id format as a means of backward-compatibility, but deprecate it and encourage everyone to switch to the new url-based format | 18:42 |
jlk | ansibot is about to be talked about in the ansible contributor thing | 18:44 |
jeblair | fungi, dmsimard, jlk: yes; we haven't had a chance to pull the gerrit syntax forward yet; that'll come with cross-source depends | 18:45 |
jeblair | mordred, jlk: no i don't think gerrit supports the new syntax yet | 18:45 |
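For reference, the two Depends-On footer styles being discussed look like this in a commit message (the change-id and PR number below are made up):

```
# Gerrit change-id form (what the Gerrit driver accepts today):
Depends-On: I0123456789abcdef0123456789abcdef01234567

# URL-based form (what GitHub requires; planned for Gerrit alongside
# cross-source dependencies):
Depends-On: https://github.com/example/project/pull/42
```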
dmsimard | jlk: there's no livestream/hangouts/whatever we can stalk in I guess ? | 18:45 |
jeblair | dmsimard: it's on the etherpad: https://public.etherpad-mozilla.org/p/ansible-summit-september-2017-core | 18:46 |
dmsimard | oohhh | 18:46 |
jlk | dmsimard: yup, bluejeans, IRC | 18:47 |
fungi | jlk: i _so_ hope ansibot is a bot for making ansi-escape-based art and animations | 18:47 |
dmsimard | fungi: it's what helps the ansible maintainers keep their sanity with the github workflow of issues and pull requests :) | 18:48 |
pabelanger | jeblair: mordred: fungi: do we have syntax for zuul client to enqueue-ref on a periodic pipeline? | 18:48 |
jeblair | pabelanger: "zuul --help"? | 18:49 |
fungi | pabelanger: i want to say last time i looked, enqueue-ref didn't work with periodic? or maybe i just haven't tried recently | 18:49 |
fungi | especially since there is no ref for periodic jobs | 18:49 |
jeblair | there is in zuul v3 | 18:49 |
fungi | ooh | 18:49 |
jeblair | so try it and if it doesn't work fix it :) | 18:49 |
pabelanger | k, will see if I can figure it out | 18:50 |
jeblair | fungi: thus ending the requirement that periodic jobs bake in their branch; you can just use a regular gate job in periodic and it gets a branch just like any other | 18:50 |
fungi | i missed that innovation | 18:51 |
fungi | that'll be quite handy | 18:51 |
jeblair | you may have been on vacation :) | 18:51 |
fungi | i may have. i seem to do that a lot | 18:51 |
pabelanger | okay, I think https://review.openstack.org/500626/ is ready for final review, if everybody is okay, I can +A up to 500626 | 18:52 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix dynamic dependent pipeline failure https://review.openstack.org/501345 | 19:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add test for dependent changes not in a pipeline https://review.openstack.org/501353 | 19:01 |
jlk | Well, I have a thread started for reading things from the queue, and it read at least one from the queue. Not sure why it's not reading more from the queue. | 19:03 |
jeblair | pabelanger: wfm | 19:04 |
jeblair | Shrews: i'll look at cloner after lunch | 19:04 |
pabelanger | jeblair: mordred: Shrews: any objections on bringing online nl02.o.o? Currently waiting on some code reviews, so can shift to standing up infra stuff | 19:05 |
jeblair | pabelanger: go for it | 19:05 |
mordred | pabelanger: do it | 19:05 |
pabelanger | k | 19:05 |
Shrews | dooo eeet | 19:12 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: WIP: Make the base playbooks/roles work for every supported distro https://review.openstack.org/501281 | 19:28 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: WIP: Make the base playbooks/roles work for every supported distro https://review.openstack.org/501281 | 19:29 |
dmsimard | bah, doing a depends-on a review that hasn't merged in project-config doesn't work :) | 19:33 |
dmsimard | (in v3) | 19:33 |
dmsimard | which I guess is okay, once again security wins | 19:33 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Split github hook ingest and processing https://review.openstack.org/501390 | 19:33 |
jlk | mordred: jeblair: ^^ that introduces the eat and queue model for github events. Sending events is significantly faster to get a 200 back | 19:35 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul feature/zuulv3: Only strip trailing whitespace from console logs https://review.openstack.org/501394 | 19:36 |
clarkb | mordred: ^ something like that I think | 19:36 |
jlk | mordred: jeblair: so from cold start, new method is 1.2 total seconds to get to a 200, and then after that it's .4 on repeated events. Old method was 4.35 on cold, and then 1.2 to 2.9 to whatever on repeat events. | 19:39 |
* jlk lunches | 19:40 | |
*** olaph has quit IRC | 19:50 | |
*** olaph has joined #zuul | 19:51 | |
mordred | clarkb: reading | 19:53 |
mordred | jlk: reading | 19:53 |
* mordred reads in parallel | 19:53 | |
*** olaph1 has joined #zuul | 19:55 | |
*** olaph has quit IRC | 19:56 | |
mordred | clarkb: one comment - otherwise looks great | 19:58 |
*** olaph1 is now known as olaph | 20:00 | |
mordred | clarkb: unfortunately ansible does not log issues in the callback plugins particularly well | 20:00 |
mordred | clarkb: I just had an idea of a thing we can do about that in this context though... | 20:01 |
pabelanger | okay, I am stopping nl01 | 20:02 |
clarkb | mordred: the test failure appears to be in syncing the job-output.json file though. Not sure it's related to my change | 20:02 |
clarkb | broken pipe (32) from rsync | 20:02 |
pabelanger | and nl02.o.o started | 20:03 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul feature/zuulv3: Only strip trailing whitespace from console logs https://review.openstack.org/501394 | 20:04 |
pabelanger | cool, nl02.o.o is responding to requests | 20:04 |
jeblair | Shrews: clone mapper looks good to me. we may want to shift that over and merge it into zuul-jobs instead of zuul. stick that in as a templated file and make a role that copies it over top of the zuul-cloner that's baked into our images. | 20:05 |
jeblair | mordred, clarkb: ^ one of you want to look that over (https://review.openstack.org/500922) | 20:05 |
pabelanger | Hmm, I've just noticed nl02.o.o doesn't have swap setup | 20:05 |
jeblair | jlk: cool! though zuul seems to be expressing some displeasure with the unit tests with your patch. | 20:07 |
jlk | ruh roh | 20:07 |
mordred | jeblair, jlk: is queue.Queue inherently threadsafe? | 20:07 |
jlk | dunno, it's what Gerrit driver uses | 20:07 |
clarkb | mordred: ya | 20:07 |
mordred | cool | 20:07 |
mordred | it was just putting and getting from a multi thread context without explicit locks, so I figured I'd ask :) | 20:08 |
clarkb | that said if using asyncio I think there are special queue objects for it (that are not thread safe because no proper threads) | 20:08 |
mordred | clarkb: yah - that'll be a whole other thing for later | 20:08 |
jeblair | mordred: yes | 20:10 |
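As jeblair and clarkb confirm, `queue.Queue` is thread-safe: its `put`/`get` are internally locked, which is why the driver can share one between the webhook thread and the consumer thread without explicit locks. A minimal illustration:

```python
import queue
import threading

events = queue.Queue()  # internally synchronized; safe across threads

def producer(n):
    for i in range(n):
        events.put(i)

def consumer(n, out):
    for _ in range(n):
        out.append(events.get())  # blocks until an item is available
        events.task_done()

results = []
threads = [threading.Thread(target=producer, args=(100,)),
           threading.Thread(target=consumer, args=(100, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Every event arrives exactly once, with no explicit locking.
```

Note clarkb's caveat: `asyncio.Queue` is a different animal — it coordinates coroutines within one event loop and is not safe to share between OS threads.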
clarkb | jeblair: looking at 501345, curious how there were tests that failed outside of that change, but you got it to pass only be modifying tests in that change | 20:10 |
clarkb | jeblair: are the test's side effecting each other? | 20:11 |
jeblair | clarkb: no the addition of check: jobs: [] fixed those | 20:11 |
mordred | clarkb: although my first hunch is that the zuul-web webhook will do what this one is doing except instead of self.connection.addEvent() it'll do "gearman.submitJob('addGithubEvent', background=True)" or something | 20:11 |
jeblair | clarkb: basically, the fix tightened up when we report on things not in pipelines; some of those tests then needed their project to be in a pipeline. so i added them to the check pipeline with no jobs. | 20:11 |
jeblair | that's a thing in zuul v3, more or less exactly for this. :) | 20:12 |
pabelanger | jeblair: clarkb: fungi: mordred: okay, nl02.o.o is running, but without swap. I am thinking of leaving it for now, rebuild nl01.o.o under xenial, validate swap is working, swap back to nl01.o.o then fix nl02.o.o. any objections? | 20:12 |
pabelanger | otherwise, I can roll back to nl01.o.o first, and fix swap on nl02.o.o | 20:12 |
jeblair | pabelanger: wfm | 20:12 |
clarkb | jeblair: oh I see other tests not touched in that change are also using the in-repo fixture | 20:13 |
jeblair | mordred: agreed; gearman should be the queue in next iteration | 20:13 |
mordred | jeblair: yah | 20:13 |
jeblair | clarkb: yep | 20:13 |
fungi | pabelanger: sounds fine. nl02 doesn't seem to be under any memory pressure | 20:13 |
jeblair | mordred: want to re +2/+3 501345 and child? | 20:14 |
mordred | jeblair: I do! | 20:14 |
jeblair | mordred, pabelanger: we have not restarted since the command.py (utf8 log streaming) patch landed, correct? | 20:15 |
clarkb | I've got another log streaming change https://review.openstack.org/501394 that would be nice to get in for better formatted logs (though utf8 fix is definitely higher priority) | 20:16 |
jeblair | clarkb: ack | 20:16 |
* fungi reviews | 20:16 | |
pabelanger | jeblair: I am not sure | 20:16 |
mordred | jeblair: I restarted zuul first thing this morning iirc - but not since then | 20:17 |
jeblair | mordred: do you know if the stream change had landed? i think you approved it first thing this morning as well, so unsure which first thing was first :) | 20:18 |
fungi | clarkb: did you mean rstrip where you used rsplit? | 20:18 |
clarkb | fungi: yes I most certainly did | 20:18 |
* clarkb fixes | 20:18 | |
* fungi suddenly feels useful | 20:18 | |
jeblair | there *is* an rsplit | 20:18 |
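The distinction behind 501394 (and behind the rstrip/rsplit slip fungi caught): `strip()` would eat the leading indentation that makes multi-line command output readable, so only the trailing whitespace should go, and `rsplit()` does something else entirely. Using a line like the one clarkb quotes later:

```python
line = "    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00\n"

# strip() loses the indentation along with the trailing newline:
print(repr(line.strip()))
# 'link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00'

# rstrip() keeps the leading whitespace and drops only the trailing bits:
print(repr(line.rstrip()))
# '    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00'

# rsplit(), the accidental substitution, splits on whitespace from the right:
print(line.rsplit())
# ['link/loopback', '00:00:00:00:00:00', 'brd', '00:00:00:00:00:00']
```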
mordred | jeblair: I do not remember | 20:19 |
jeblair | mordred: okay, let's just land clarkb's thing and restart anyway | 20:19 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul feature/zuulv3: Only strip trailing whitespace from console logs https://review.openstack.org/501394 | 20:19 |
mordred | jeblair: ++ | 20:19 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix dynamic dependent pipeline failure https://review.openstack.org/501345 | 20:23 |
jlk | hrm I think something isn't closing the thread. | 20:26 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add test for dependent changes not in a pipeline https://review.openstack.org/501353 | 20:28 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add zuul-cloner shim https://review.openstack.org/500922 | 20:30 |
pabelanger | fungi: clarkb: do you mind reviewing https://review.openstack.org/500990/ again, it has related patches needed for zuul and publishing afs docs | 20:30 |
pabelanger | that will stop overwriting http://docs.openstack.org/infra/zuul/ with zuulv3 docs | 20:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Update tests to use AF_INET6 https://review.openstack.org/501266 | 20:32 |
Shrews | jeblair: ack. will do the env var thing, then shift it to zuul-jobs | 20:32 |
clarkb | pabelanger: done | 20:32 |
mordred | clarkb, pabelanger: https://review.openstack.org/#/c/501381/ while we're at it | 20:33 |
jeblair | jlk: ah i think i see the issue | 20:34 |
jlk | oh good! I haven't found it yet | 20:34 |
jeblair | jlk: tests/base.py line 2137 | 20:35 |
jeblair | jlk: that makes sure that the tests wait for the gerrit connector event queue to empty before deciding that the system is stable (in waitUntilSettled) | 20:35 |
dmsimard | ah, eh, ew ? | 20:35 |
jeblair | jlk: we need to add the github event queue to that list as wel, a few lines down. | 20:35 |
dmsimard | ansible_distribution returns "openSUSE Leap" with an actual space in it | 20:35 |
jeblair | Shrews: or maybe if it ends up being a templated script, you could just template that in instead of env-varring | 20:36 |
* dmsimard uses os_family for suse | 20:36 | |
jlk | I think that line number is off? | 20:36 |
jeblair | jlk: maybe; it's in def getGerritConnection(driver, name, config): | 20:36 |
jeblair | jlk: self.event_queues.append(con.event_queue) | 20:37 |
jlk | oooh I see | 20:37 |
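The pattern jeblair is pointing at — the test harness keeps a list of every connection's event queue and won't consider the system settled until all of them have drained — can be sketched generically like this. This is an illustrative reconstruction, not Zuul's actual `tests/base.py` code:

```python
import queue
import time

class Harness:
    def __init__(self):
        # Every driver's event queue gets registered here; forgetting one
        # (e.g. the GitHub connection's) makes waitUntilSettled return early.
        self.event_queues = []

    def register_connection(self, connection):
        # Equivalent of tests/base.py appending con.event_queue for Gerrit.
        self.event_queues.append(connection.event_queue)

    def wait_until_settled(self, timeout=10.0):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if all(q.empty() for q in self.event_queues):
                return True
            time.sleep(0.01)
        return False
```

With the GitHub queue missing from the list, tests race ahead while events are still in flight — which matches the hang jlk was chasing.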
jeblair | i should really make an emacs macro to open the current line in cgit | 20:37 |
clarkb | dmsimard: and I think for tumbleweed it may be just "tumbleweed" ? family sounds like a good idea | 20:38 |
dmsimard | clarkb: facts: http://logs.openstack.org/67/499467/8/check/gate-tempest-dsvm-neutron-full-opensuse-423-nv/e9b97c3/logs/ara/host/0962bf05-4d54-41b1-8c1d-49bc318e9f33/ | 20:38 |
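The workaround dmsimard lands on — since `ansible_distribution` comes back as "openSUSE Leap" (with the space) while the family fact is a single clean token — is to branch on `ansible_os_family` instead. A hedged sketch (the task content and package name are illustrative; "Suse" is the family value Ansible reports for openSUSE variants):

```yaml
# Illustrative task: match on os_family rather than the
# space-containing distribution name.
- name: Install mirror configuration (SUSE)
  zypper:
    name: some-package   # hypothetical package
  when: ansible_os_family == "Suse"
```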
jeblair | clarkb: mordred: stream change https://review.openstack.org/501394 failed | 20:41 |
clarkb | hrm this time it actually failed and wasn't post sync failing | 20:42 |
clarkb | aha I see why | 20:43 |
jeblair | clarkb: change worked, test needs updating | 20:43 |
clarkb | 2017-09-06 20:33:09.738370 | node1 | link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 <- yup | 20:44 |
jlk | oh haha, I have to remove the ping event handler too | 20:45 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul feature/zuulv3: Only strip trailing whitespace from console logs https://review.openstack.org/501394 | 20:46 |
mordred | clarkb: nice | 20:46 |
clarkb | I skimmed the other greps too and ^ appeared to be the only two that needed indentation | 20:46 |
mordred | clarkb: also - thanks for fixing that - my eyes hadn't realized what was going on - the update looks great | 20:46 |
jeblair | i went ahead and +3d it; if folks see issues in http://logs.openstack.org/94/501394/3/check/zuul-stream-functional/f889e56/stream-files/stream-job-output.txt we can block it before it merges | 20:47 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: WIP: Make the base playbooks/roles work for every supported distro https://review.openstack.org/501281 | 20:49 |
pabelanger | clarkb: dmsimard: do you happen to have an idea how to automate that process? I'm happy to write the code, but I ended up just modifying make_swap.sh manually to create it on xenial server. Is this an issue for devstack-gate, if so, maybe we just update launch-node to use that role now | 20:51 |
clarkb | pabelanger: make_swap.sh in system-config is independent of devstack-gate completely iirc | 20:51 |
pabelanger | clarkb: dmsimard: sorry, this should have been in #openstack-infra for swap issue | 20:51 |
*** jkilpatr has quit IRC | 20:54 | |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Split github hook ingest and processing https://review.openstack.org/501390 | 20:59 |
jlk | mordred: jeblair: fixed! | 20:59 |
jeblair | jlk: lgtm; let's see what zuul thinks! :) | 21:00 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Actually use fetch-stestr-output in unittests base job https://review.openstack.org/501441 | 21:03 |
*** jkilpatr has joined #zuul | 21:08 | |
mordred | jeblair: have we restarted zuul yet with your dependent fix? or are we waiting for clarkb's? | 21:15 |
mordred | clarkb: btw - that failed again | 21:15 |
clarkb | mordred: wat | 21:16 |
clarkb | mordred: host key verification failed | 21:17 |
clarkb | dont think I touched that | 21:17 |
mordred | oh - that's not great | 21:17 |
mordred | you didn't | 21:17 |
fungi | mordred: jeblair: looks like puppet updated the zuul install on zuulv3.o.o 8 minutes ago... time to restart? | 21:17 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Split github hook ingest and processing https://review.openstack.org/501390 | 21:18 |
fungi | oh, we don't have the rstrip console change in yet | 21:18 |
mordred | clarkb: so - it did the hostkey role - http://logs.openstack.org/94/501394/4/check/zuul-stream-functional/868fc52/job-output.txt.gz#_2017-09-06_20_56_20_714355 | 21:19 |
jeblair | mordred, clarkb, fungi: it looks like my stream fix merged before the last restart, so i've resumed poking at devstack while waiting for clark's to merge | 21:20 |
mordred | jeblair: awesome | 21:21 |
mordred | clarkb: 91.106.198.111 floating ips | 21:21 |
mordred | clarkb: these nodes have http://logs.openstack.org/94/501394/4/check/zuul-stream-functional/868fc52/zuul-info/inventory.yaml | 21:21 |
mordred | floating ips (see interface_ip in the inventory) | 21:21 |
fungi | jeblair: yeah, the stream encoding fix is installed on the server already, i did see it on there | 21:22 |
mordred | but multi-node-known-hosts doesn't seem to be adding those | 21:22 |
pabelanger | okay, I've rolled back to nl01.o.o, which is xenial | 21:23 |
fungi | oh weird... multinode over fip bug? | 21:23 |
mordred | well - bug in our role that sets it up so that all hte nodes can ssh to each other | 21:24 |
mordred | I see a fix - one sec | 21:24 |
jeblair | clarkb: can you help me out with a suggestion regarding https://review.openstack.org/451492 | 21:25 |
jeblair | clarkb: the devstack job is running into this: http://logs.openstack.org/02/500202/18/check/devstack/9eb2549/job-output.txt.gz#_2017-09-06_21_08_47_095758 | 21:25 |
clarkb | oh hrm | 21:26 |
*** yolanda has quit IRC | 21:26 | |
jeblair | what creates the mirror_info.sh file? | 21:26 |
*** yolanda has joined #zuul | 21:26 | |
clarkb | nodepool ready script iirc | 21:26 |
clarkb | which we dont have in v4 | 21:27 |
clarkb | *3 | 21:27 |
clarkb | so maybe we just need a pre task that drops that in? I think mordred was looking at that? | 21:27 |
jeblair | yeah, in theory we could do it in the configure_mirrors role | 21:29 |
jeblair | (cc dmsimard, pabelanger ^) | 21:29 |
pabelanger | okay, and nl02.o.o is back online too. So we are running 2 nodepool-launchers right now, are we good with that? | 21:29 |
jeblair | though it's also worth asking: is that the way we want to handle this? | 21:29 |
jeblair | clarkb: why isn't that in devstack-gate? | 21:30 |
jlk | Oooh, hopefully next restart of zuul includes the github change I just made, so we can see if it reduces the timeouts. | 21:30 |
dmsimard | We'll have to keep the file for the time being for backwards compat | 21:30 |
jeblair | clarkb: (that == the add-apt-repo) | 21:30 |
clarkb | jeblair: because devstack needs it to function | 21:30 |
dmsimard | And yes, we can likely set it up through config mirror | 21:30 |
clarkb | old libvirt just isnt reliable | 21:30 |
jeblair | clarkb: yeah, but there's a big "if running in gate" block there... | 21:31 |
jeblair | clarkb: so why not do the "if running in gate" block in devstack-gate, and then... hopefully add-apt-repo noops in devstack? :) | 21:31 |
clarkb | ya we could have devstack skip if some uca repo exists | 21:32 |
jeblair | so to make this a really high-level question: is /etc/ci/mirror_info.sh an API that we want to support for openstack | 21:33 |
jeblair | er | 21:33 |
jeblair | for jobs in openstack-infra | 21:33 |
pabelanger | Oh, /etc/ci/mirror_info.sh. Ya, we'll have to create that file today. But we could write them as facts on disk moving forward | 21:33 |
jeblair | or is there something more ansiblish/v3 we could do. | 21:33 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Handle floating ips in multi-node-known-hosts https://review.openstack.org/501459 | 21:33 |
jeblair | like what pabelanger just suggested | 21:33 |
mordred | clarkb, jeblair, pabelanger: ^^ that should fix the ssh hostkey task for our floating ip clouds | 21:33 |
pabelanger | ya, we could write them into /etc/ansible/facts, but haven't thought what that would look like | 21:34 |
jeblair | okay, so maybe we should add /etc/ci/mirror_info.sh to configure_mirrors role for now, and think about alternatives later | 21:35 |
mordred | yah - so - I think sort-term we _definitely_ have to write that file out, because a metric truckload of people consume mirror_info.sh aiui | 21:35 |
jeblair | really? | 21:35 |
mordred | yah - people use it in things like inside-of-docker-images-in-kolla | 21:35 |
clarkb | ya and dib builds and such | 21:35 |
clarkb | it's how we communicate "this is where you find things" | 21:36 |
jeblair | mordred: well 14 projects do :) | 21:36 |
clarkb | particularly useful if say building ubuntu image on centos | 21:36 |
*** olaph1 has joined #zuul | 21:36 | |
mordred | yah - at least "some" | 21:36 |
mordred | maybe not metric truckload - but maybe an imperial one | 21:36 |
jeblair | who wants to add that? | 21:36 |
jeblair | dmsimard: does it make sense for you to work that into your current effort? | 21:36 |
dmsimard | Sure | 21:37 |
*** olaph has quit IRC | 21:37 | |
fungi | it seems worth changing soonish after cut-over, and we ought to be able to find consumers of it pretty easily with git grep (or codesearch.o.o in case they're calling into it from scripts in their repos) | 21:37 |
mordred | it can almost certainly be a fairly easy cut/paste from the existing configure_mirror.sh just replacing the here-doc with a template | 21:37 |
jeblair | dmsimard: http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/scripts/configure_mirror.sh#n73 | 21:37 |
dmsimard | I'm familiar with that file, yes, we hack it in review.rdo :) | 21:38 |
jeblair | dmsimard: cool thanks :) | 21:38 |
pabelanger | ya, adding to configure-mirror role +1 | 21:38 |
dmsimard | Need +3 on https://review.openstack.org/#/c/501368/ to unblock me though | 21:38 |
jeblair | i will do a hacky thing to devstack to get past that now | 21:38 |
jeblair | mordred: ^ that's you | 21:39 |
fungi | i definitely don't think a sourced shell snippet setting some relatively ad-hoc envvars is an api we want to support in the long term if we value our sanity | 21:39 |
jeblair | mordred: (the +3 on 501368) | 21:39 |
jeblair | fungi: yeah, there must be a better way. i don't know it right now, but we'll find it :) | 21:40 |
* dmsimard Raymond H. voice "there has to be a better way" | 21:40 | |
mordred | dmsimard: +A | 21:41 |
pabelanger | okay, nodepool-launcher looks happy. I'm moving back to testing afs publishing | 21:41 |
clarkb | the biggest problem has been discovering complete list regardless of platform | 21:42 |
clarkb | because there are cases where ubuntu based jobs need centos repos | 21:42 |
mordred | dmsimard, jeblair: for the configure mirrors thing - it probably ALSO needs to write out that list of files after the here doc | 21:42 |
mordred | honestly - I think for now we're likely better off actually just copying that entire file and then running it with NODEPOOL_MIRROR_HOST set properly in an env var | 21:43 |
mordred | cause all of the putting sources.list.available.d and whatnot at the end | 21:44 |
mordred | and it sets up unbound at the top | 21:44 |
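The here-doc-to-template conversion mordred mentions amounts to rendering the same shell exports from job parameters instead of generating them on the node. A minimal sketch, with heavy assumptions: `NODEPOOL_MIRROR_HOST` is from the discussion, but the other variable names and the template text are abbreviated stand-ins for the real configure_mirror.sh here-doc:

```python
import string

# Abbreviated stand-in for the /etc/ci/mirror_info.sh here-doc,
# rendered from parameters rather than hard-coded on the node.
MIRROR_INFO_TEMPLATE = string.Template("""\
export NODEPOOL_MIRROR_HOST=$mirror_host
export NODEPOOL_PYPI_MIRROR=http://$mirror_host/pypi/simple
export NODEPOOL_WHEEL_MIRROR=http://$mirror_host/wheel/$wheel_slug
""")

def render_mirror_info(mirror_host, wheel_slug):
    return MIRROR_INFO_TEMPLATE.substitute(
        mirror_host=mirror_host, wheel_slug=wheel_slug)

print(render_mirror_info("mirror.dfw.rax.openstack.org", "ubuntu-16.04-x86-64"))
```

Consumers that source /etc/ci/mirror_info.sh keep working unchanged, which is the backward-compat point dmsimard raises.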
*** harlowja has joined #zuul | 21:45 | |
pabelanger | ++ | 21:45 |
pabelanger | then we can iterate on it | 21:45 |
pabelanger | mordred: jeblair: clarkb: fungi: https://review.openstack.org/501362 would like a review, switches zuul to publish-openstack-python-docs-infra job | 21:47 |
*** olaph1 is now known as olaph | 21:51 | |
*** hashar has quit IRC | 21:54 | |
mordred | pabelanger: lgtm +A - also added follow up https://review.openstack.org/501475 which just caught my eye in the review | 21:54 |
SpamapS | is there a way to tell zuul/nodepool that a certain job should _always_ hold its nodes? | 21:55 |
mordred | SpamapS: not to my knowledge, no | 21:57 |
SpamapS | I wonder how well it would work to just have jobs that don't complete for a few days. | 21:57 |
mordred | SpamapS: there is a count argument to autohold though - so I imagine implementing support for that as a count=-1 or something similar wouldn't be terribly difficult | 21:57 |
mordred | SpamapS: it should work as well as gearman works :) | 21:57 |
mordred | SpamapS: oh - I mean, holding a node happens after the job completes though | 21:58 |
mordred | SpamapS: so the only jobs-don't-complete portion would be if you're starving your available nodes by holding all of them | 21:58 |
mordred | SpamapS: while you're here ... any chance you have a sec to look at / review https://review.openstack.org/#/c/501459/ ? | 21:59 |
mordred | SpamapS: (since you wrote that originally) | 21:59 |
mordred | clarkb: could I get an amen on https://review.openstack.org/#/c/501441 ? | 21:59 |
SpamapS | mordred: I'll look in a few. | 22:00 |
mordred | SpamapS: thanks! | 22:01 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: WIP: Make the base playbooks/roles work for every supported distro https://review.openstack.org/501281 | 22:01 |
pabelanger | mordred: left -1 with comment | 22:01 |
mordred | SpamapS: (as you well know the multi-node-known-hosts code is dense) | 22:01 |
dmsimard | jeblair, mordred: you have a hint on this "unknown configuration error" error here ? https://review.openstack.org/#/c/501281/6 | 22:02 |
jeblair | dmsimard: btw, you don't need to define those nodesets, you can just inline them in the jobs under "nodes:" (nodes: is either a nodeset name, or a nodeset definition) | 22:03 |
jeblair | dmsimard: i will go spelunking in logs | 22:03 |
mordred | pabelanger: good call - I responded - but tl;dr - that says to me I'd like to refactor something, but I think we can refactor it later | 22:03 |
dmsimard | jeblair: I know about the node definition but I was actually wondering if these should be defined by default in project-config to be made less redundant | 22:04 |
dmsimard | jeblair: otherwise we end up re-declaring these same nodes over and over | 22:04 |
dmsimard | It probably looks too verbose because it's just one job but would end up saving lines for a dozen different jobs | 22:06 |
dmsimard | I don't have a strong opinion on this one. | 22:06 |
jeblair | nor do i | 22:06 |
dmsimard | I was ready to do one job with 5 nodes to keep zuul.yaml clean :D | 22:06 |
mordred | I mean - there's not much value in a nodeset called "ubuntu-trusty" that has a single node called "ubuntu-trusty" on label "ubuntu-trusty" - it seems that if you're adding a node to a job you should just be able to say "nodes: - ubuntu-trusty" | 22:06 |
mordred | dmsimard: well, you can totally do that you know - ansible will let you :) | 22:07 |
jeblair | mordred: yeah, we can make the name optional and default it to the label | 22:07 |
dmsimard | jeblair: +1 | 22:07 |
dmsimard | that would solve the problem | 22:07 |
jeblair | mostly, i just don't want to establish the idea that you have to define a nodeset | 22:07 |
jeblair | to be honest, i'd rather we be *slightly* less clever at first if it means we avoid showing people the wrong way to do things :) | 22:08 |
mordred | yah | 22:08 |
jeblair | Exception: Configuration item dictionaries must have a single key | 22:09 |
mordred | jeblair: I can't define two different variants with different nodes can I? | 22:09 |
jeblair | mordred: sure you can | 22:09 |
mordred | jeblair: cool | 22:09 |
jeblair | mordred: assuming they match different things | 22:10 |
jeblair | that's sort of the primary use case for variants | 22:10 |
mordred | oh - no - matching the same thing | 22:10 |
jeblair | "stable runs on trusty; master runs on xenial" | 22:10 |
jeblair | mordred: that's not a variant, that's a job | 22:10 |
mordred | nod | 22:10 |
jeblair | dmsimard: all those nodeset definitions need more indentation | 22:11 |
jeblair | i'll see if i can't make that into a nice error | 22:12 |
dmsimard | jeblair: I'm submitting a patchset without nodesets anyway | 22:12 |
jeblair | k | 22:12 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: WIP: Make the base playbooks/roles work for every supported distro https://review.openstack.org/501281 | 22:12 |
dmsimard | ^ | 22:13 |
dmsimard | that one worked | 22:13 |
dmsimard | the jobs are scheduled | 22:13 |
dmsimard | so I guess it was junk out of the nodeset config | 22:13 |
jeblair | dmsimard: it was the indentation | 22:13 |
jeblair | maybe it wasn't clear, but that exception and my indentation suggestion were the result of checking the zuul log for the actual error | 22:14 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: WIP: Make the base playbooks/roles work for every supported distro https://review.openstack.org/501281 | 22:14 |
dmsimard | yeah, I think almost anything except "unknown configuration error" could be a good pointer | 22:14 |
jeblair | that's a specific enough error we can actually say "dude, hit tab" | 22:15 |
dmsimard | lol | 22:15 |
dmsimard | or space space space space | 22:15 |
dmsimard | afk for a bit, I'll hack on the base roles tonight now that it's unblocked \o/ | 22:15 |
dmsimard | oops, looks like something might be wrong with the minimal job http://logs.openstack.org/81/501281/8/check/base-integration-ubuntu-xenial/cc1ffbb/job-output.txt.gz#_2017-09-06_22_14_56_415036 | 22:16 |
dmsimard | I'll look later /me afk | 22:16 |
pabelanger | mordred: ack | 22:17 |
SpamapS | mordred: +A'd | 22:17 |
mordred | SpamapS: thanks! | 22:18 |
fungi | dmsimard: wow! look at all those job failures ;) | 22:18 |
mordred | pabelanger: oh - also - https://review.openstack.org/#/c/501246 goes along with your other one | 22:19 |
SpamapS | mordred: so the use case I have is that I want to have zuul and nodepool spin up test nodes in a number of scenarios, and one of those is more of the "developer wants test nodes deployed with the latest code to test XXX" ... | 22:19 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Update tests to use AF_INET6 https://review.openstack.org/501266 | 22:20 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Actually use fetch-stestr-output in unittests base job https://review.openstack.org/501441 | 22:20 |
fungi | dmsimard: "ERROR: Executing local code is prohibited" | 22:20 |
fungi | for when you get back | 22:20 |
fungi | oh, i see you already found that error | 22:22 |
* fungi should read scrollback more carefully | 22:22 | |
pabelanger | mordred: +3 | 22:22 |
mordred | SpamapS: nod. I can understand that desire. I think we should chat about the best way to expose that | 22:22 |
mordred | SpamapS: for now, you could TOTALLY fake it by adding an autohold to a job with a count of like 99999 or something | 22:22 |
SpamapS | mordred: One way I was thinking to do it is to just have a job that doesn't complete until the user is done playing with the nodes. | 22:23 |
SpamapS | How does one return held nodes? | 22:23 |
mordred | SpamapS: yah - that's another thing you could do - you'd have to either disable or put a REALLY long timeout on it | 22:23 |
* SpamapS has, oddly enough, never done that | 22:23 | |
mordred | SpamapS: you tell nodepool to delete the node | 22:23 |
fungi | SpamapS: an administrator sets the node state to something else (generally delete) | 22:23 |
fungi | so it's not really self-service | 22:24 |
SpamapS | Hm | 22:24 |
SpamapS | I wonder if a better thing to do would be to just emulate zuul+nodepool with a manual provisioning playbook or something. | 22:24 |
mordred | SpamapS: oh - hah. I've got an idea ... | 22:24 |
SpamapS | but that gets into pushing.. blah blah | 22:24 |
SpamapS | The reason I want this is that the job we run to run would have 5 nodes | 22:25 |
mordred | SpamapS: have your job that doesn't complete until the user is done ... just create a stamp file and then wait until the file goes away | 22:25 |
SpamapS | s/run to run/want to run/ | 22:25 |
mordred | SpamapS: so that the dev can just delete the stamp file when they're done | 22:25 |
SpamapS | mordred: that's exactly what I was thinking too | 22:25 |
mordred | SpamapS: and that could be just on one node | 22:25 |
SpamapS | Yeah just have like, a 72 hour timeout on the job and test for a stamp file at the end of the job playbook. | 22:26 |
mordred | yah | 22:26 |
mordred | and the 72 hour timeout is your safety net for people forgetting about it | 22:26 |
SpamapS | exactly | 22:26 |
pabelanger | jeblair: mordred: interesting failure | 22:40 |
pabelanger | http://logs.openstack.org/62/501362/1/gate/tox-py35/040f590/job-output.txt.gz | 22:40 |
pabelanger | possible related to ipv6? | 22:41 |
pabelanger | that ran on vexxhost | 22:41 |
mordred | hrm | 22:43 |
pabelanger | oh | 22:44 |
pabelanger | 2017-09-06 22:40:47,211 DEBUG zuul.AnsibleJob: [build: 72137254d1804768873127b65f5006f3] Ansible output: b'ERROR! A worker was found in a dead state' | 22:44 |
pabelanger | that ran on ze02 | 22:44 |
pabelanger | I don't think we are running the right python there | 22:44 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Only strip trailing whitespace from console logs https://review.openstack.org/501394 | 22:45 |
pabelanger | mordred: jeblair: I am going to stop ze02.o.o because of dead state | 22:45 |
jamielennox | pabelanger: gah: http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/fetch-testr-output/tasks/process.yaml#n8 | 22:45 |
pabelanger | jamielennox: ya, we need to fix that | 22:46 |
fungi | looks like clarkb's rstrip fix landed, so watching puppet apply logs now | 22:46 |
jamielennox | pabelanger: so does that always make sense in a base job like that or something that should (somehow) be performed by the client's tests? | 22:47 |
jamielennox | particularly in a non-openstack case, is that something that the client should perform and drop into the logs folder? | 22:47 |
pabelanger | jamielennox: might want to check with mordred. But, we should likely have ensure-testr role, or something like that | 22:49 |
jamielennox | well testr is within tox right? so that should/will be installed each time | 22:50 |
jamielennox | i guess the question is does post-processing always occur in the base jobs or is it something that (somehow) the individual repo should generate? | 22:51 |
pabelanger | right now we run testr in its own virtualenv on DIB | 22:51 |
pabelanger | but, it could be installed in test-requirements.txt | 22:51 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Handle some common yaml syntax errors https://review.openstack.org/501486 | 22:51 |
pabelanger | but, we likely want the job to ensure testr is installed and if missing, install it some place | 22:51 |
jeblair | dmsimard: https://review.openstack.org/501486 handles the syntax error you found and some more | 22:52 |
mordred | jamielennox: well - for now, the general idea is that if your job can produce subunit then we should be able to get that into other forms - or alternately you can just make whatever output you want | 22:52 |
pabelanger | mordred: how did you add python PPA to ze01.o.o? | 22:53 |
pabelanger | I thought that was in system-config | 22:53 |
fungi | i'm not sure what to make of the unit test failure on 501459 | 22:53 |
mordred | pabelanger: it's in the puppet | 22:53 |
fungi | has anyone seen that yet? | 22:53 |
mordred | jamielennox: but it's an area we need to work out amongst ourselves - cause there's a too-much case and a not-enough case | 22:53 |
pabelanger | mordred: k, I see it | 22:53 |
jeblair | pabelanger, mordred: do we need to do something to get mordred's patched python3 on ze02-ze04? | 22:54 |
jeblair | pabelanger: oh you're on that, sorry | 22:54 |
mordred | jamielennox: I call out subunit specifically because at some point in the future when we get around to it we want to be able to have the executor snoop the output stream as it's happening and if contains subunit to notice if any tests failed so we can report that tests are _going_ to fail without waiting until the end | 22:54 |
mordred | pabelanger: did I put it in a bad place? | 22:54 |
clarkb | fungi: http://logs.openstack.org/59/501459/1/gate/tox-py35-on-zuul/143c361/job-output.txt.gz#_2017-09-06_22_25_24_680199 | 22:54 |
jamielennox | patched python3 ? | 22:55 |
jeblair | jamielennox: lemme dig up links | 22:55 |
mordred | jamielennox: yah - there's a crashing bug in the version in xenial - we've submitted a backport upstream | 22:55 |
pabelanger | mordred: no, we just need to run apt-get upgrade for some reason | 22:55 |
clarkb | jamielennox: turns out that python rewrote a good chunk of dict which broke things in ubuntu's version of python | 22:55 |
pabelanger | on ze02.o.o | 22:55 |
mordred | https://launchpad.net/~openstack-ci-core/+archive/ubuntu/python-bpo-27945-backport | 22:56 |
jamielennox | oh god | 22:56 |
mordred | jamielennox, jeblair: ^^ | 22:56 |
clarkb | it's the second lts release of ubuntu with broken python3 :) | 22:56 |
clarkb | hard to blame ubuntu as both were bugs upstream but still painful | 22:56 |
pabelanger | mordred: ya, so we have newer versions of python that need to be installed via apt. Puppet isn't upgrading from the PPA, because we don't have python3-dev latest any place | 22:57 |
mordred | GAH | 22:57 |
pabelanger | mordred: I can manually do it, but we should puppet it | 22:57 |
mordred | SpamapS: "Chris Halse Rogers (raof) wrote 20 hours ago: Proposed package upload rejected" | 22:57 |
mordred | pabelanger: yes. we should | 22:57 |
fungi | clarkb: yeah, i found the traceback, just trying to figure out how it lost the console log file | 22:57 |
mordred | SpamapS: from https://bugs.launchpad.net/ubuntu/+source/python3.5/+bug/1711724 | 22:58 |
openstack | Launchpad bug 1711724 in python3.5 (Ubuntu Xenial) "Segfaults with dict" [High,In progress] - Assigned to Clint Byrum (clint-fewbar) | 22:58 |
pabelanger | mordred: so, if we have puppet manage that, it might break zuul-executor, since we need to uninstall python | 22:58 |
mordred | pabelanger: why do we need to uninstall python? | 22:59 |
pabelanger | mordred: apt does it | 23:00 |
pabelanger | oh wait | 23:00 |
pabelanger | mordred: ignore me | 23:00 |
clarkb | fungi: happened at http://logs.openstack.org/59/501459/1/gate/tox-py35-on-zuul/143c361/job-output.txt.gz#_2017-09-06_22_25_24_599560 too which is a different path | 23:01 |
clarkb | fungi: perhaps that tmpdir was removed? | 23:01 |
SpamapS | mordred: DAMNIT | 23:03 |
pabelanger | SpamapS: mordred: didn't the patch add a unit test? | 23:04 |
mordred | pabelanger: I believe they rejected the upload because of a different bug that also requested an SRU | 23:04 |
SpamapS | Yeah | 23:04 |
SpamapS | doko piggy backed on ours | 23:04 |
*** dkranz has quit IRC | 23:05 | |
SpamapS | and then failed to follow the process | 23:05 |
clarkb | https://bugs.launchpad.net/ubuntu/+source/python3.5/+bug/1682934 that one | 23:05 |
openstack | Launchpad bug 1682934 in python2.7 (Ubuntu) "python3 in /usr/local/bin can cause python3 packages to fail to install" [Undecided,Confirmed] | 23:05 |
SpamapS | (our bug also explained why a zesty upload wasn't needed) | 23:05 |
mordred | thanks doko | 23:05 |
jeblair | SpamapS: you explained artful, but not zesty? | 23:05 |
SpamapS | Oh maybe. Hrm | 23:06 |
SpamapS | looks like doko is dropping that one | 23:06 |
SpamapS | so I can just re-upload the one I previously produced | 23:06 |
clarkb | I don't see where/how those bugs were associated | 23:07 |
clarkb | other than the comment saying no have a nice day | 23:07 |
SpamapS | clarkb: they were only associated by an upload that was caught in a manual-approval queue | 23:07 |
SpamapS | doko downloaded my upload, added his fix, then re-uploaded | 23:08 |
SpamapS | which I knew.. | 23:08 |
SpamapS | and is not uncommon | 23:08 |
SpamapS | wtf.. zesty eol'd 4/13 | 23:09 |
jeblair | SpamapS: oh. maybe raof needs to update a rejectoscript. | 23:10 |
clarkb | jeblair: or use some zuul gate pipeline to properly evict children :) | 23:10 |
SpamapS | jeblair: I think that's manually typed in | 23:10 |
SpamapS | No I'm dumnb | 23:11 |
SpamapS | dumb | 23:11 |
clarkb | fungi: looking more, it's a logging handler called jobfile that wants to write to that job-output.txt location and fails to open it. Maybe a race in test setup? | 23:11 |
SpamapS | Zesty was RELEASED 4/13 | 23:11 |
clarkb | so eol is ~11/13 | 23:12 |
clarkb | er that math is wrong | 23:12 |
clarkb | 01/13 ? | 23:12 |
clarkb | I can add 9 to 4 and mod by 12 honest | 23:12 |
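[ed: clarkb's month arithmetic, spelled out — a trivial sketch, assuming the standard 9-month support window for Ubuntu non-LTS releases:]

```python
# Zesty was released in month 4 (April 2017); 9 months of support
# puts EOL in month (4 + 9) % 12 == 1, i.e. January 2018.
release_month = 4
support_months = 9
eol_month = (release_month + support_months) % 12
print(eol_month)  # 1
```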
clarkb | fungi: ya reading logs there are playbooks under that tmpdir that appear to be read fine | 23:13 |
fungi | huh. vexing | 23:14 |
jeblair | clarkb, fungi: do i need to look into something? i haven't been following. | 23:15 |
fungi | jeblair: unit test failure on 501459 looks like a race on a console log file | 23:15 |
clarkb | http://logs.openstack.org/59/501459/1/gate/tox-py35-on-zuul/143c361/job-output.txt.gz#_2017-09-06_22_25_24_599560 and http://logs.openstack.org/59/501459/1/gate/tox-py35-on-zuul/143c361/job-output.txt.gz#_2017-09-06_22_25_24_680199 | 23:16 |
fungi | not sure whether it's just a racy test or finding a bug in zuul | 23:16 |
clarkb | ok I don't think it's a race in the log anymore | 23:17 |
clarkb | we attempt to cat the job-output.txt file when a test fails | 23:17 |
clarkb | but depending on where that assertion happens it may be completely valid to not have that file on disk | 23:17 |
fungi | aha, as in perhaps too early | 23:17 |
clarkb | ya | 23:18 |
clarkb | http://logs.openstack.org/59/501459/1/gate/tox-py35-on-zuul/143c361/job-output.txt.gz#_2017-09-06_22_25_24_616595 appears to be a valid fail that is being caught there | 23:19 |
clarkb | I'll push up a patch to not traceback if the file doesn't exist (and log the case instead) | 23:19 |
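[ed: a minimal sketch of that kind of guard — hypothetical names, not the actual zuul test helper:]

```python
import logging
import os

log = logging.getLogger("zuul.test")


def cat_job_output(log_path):
    """Dump job-output.txt into the test log if it exists.

    A failing assertion can fire before Ansible has created the file,
    so a missing file is logged instead of raising a traceback.
    """
    path = os.path.join(log_path, "job-output.txt")
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        log.info("No job-output.txt found at %s", path)
        return None
```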
jeblair | clarkb: ++ | 23:19 |
fungi | good eye | 23:20 |
clarkb | jeblair: http://logs.openstack.org/59/501459/1/gate/tox-py35-on-zuul/143c361/job-output.txt.gz#_2017-09-06_22_25_24_611455 I think that is what caused it to post failure | 23:21 |
clarkb | and then there it post failures http://logs.openstack.org/59/501459/1/gate/tox-py35-on-zuul/143c361/job-output.txt.gz#_2017-09-06_22_25_24_615440 ? | 23:22 |
clarkb | jeblair: but I'm not sure if the nonodeerror is expected | 23:22 |
jeblair | clarkb: the nonodeerror should be fine | 23:22 |
jeblair | that's probably a periodic poll | 23:22 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle debug messages cleanly https://review.openstack.org/501490 | 23:23 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Switch to publish-openstack-python-docs-infra https://review.openstack.org/501362 | 23:23 |
clarkb | ok if not that then I don't see any other sadness, it is running the command then later it returns exit code 2 | 23:24 |
jeblair | clarkb: hrm; the answer should be in job-output.txt. if it got that far, it should be there | 23:24 |
clarkb | [build: 30249a4eec9e429294f5cfeff0ccfd3e] Ansible output: b"<localhost> EXEC /bin/sh -c '/usr/bin/python2 && sleep 0'" then [build: 30249a4eec9e429294f5cfeff0ccfd3e] Ansible output terminated | 23:24 |
clarkb | the python2 invocation there is weird to me, is it piping python into it to execute? | 23:27 |
clarkb | ya that appears to be ansible's localhost connection logging that it is running python2 | 23:31 |
pabelanger | woot | 23:32 |
pabelanger | http://logs.openstack.org/9f/ffee8582c2d8013b89ae5f9c82c4bec9fdd5b59f/post/publish-openstack-python-docs-infra/a389335/job-output.txt.gz | 23:32 |
clarkb | and that hello-world playbook is copying "hello world" into a file in that tmpdir so maybe I'm back to something being off with the tmpdir | 23:32 |
pabelanger | afs-docs job worked after our refactors | 23:32 |
fungi | looks like the streamer indentation fix is installed on zuulv3.o.o now if anyone's up for a restart | 23:32 |
fungi | oh, though i guess it's actually the executors we care about there? | 23:33 |
jeblair | fungi: yep, executors | 23:33 |
pabelanger | ze01 and ze02 please | 23:33 |
pabelanger | we're running both now | 23:33 |
clarkb | dest: "{{zuul.executor.log_root}}/hello-world.txt" so could just be the log dir | 23:33 |
pabelanger | we likely should update our ansible-playbook in system-config to support zuul-executors | 23:34 |
fungi | and was there a bug which is currently causing jobs not to get reenqueued when an executor is restarted? | 23:34 |
fungi | any special handling i need there? | 23:34 |
pabelanger | Ya, I think we'll have to see why aborted jobs are not getting requeued | 23:35 |
pabelanger | I can likely look into that in the morning | 23:35 |
jeblair | fungi: nah, just recheck if you care :) | 23:35 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add change_url to zuul dict passed into inventory https://review.openstack.org/501492 | 23:37 |
mordred | pabelanger: oh good re: afs-docs! | 23:37 |
pabelanger | mordred: Yup, I'll finish up publish-openstack-python-docs and get it up for review, but should be able to close that out for tomorrow | 23:38 |
fungi | is zuul not actually installed system-wide on the executors? | 23:38 |
fungi | oh, it's installed into a chroot, right? | 23:38 |
jeblair | fungi: it's installed in the normal manner | 23:38 |
pabelanger | zuul-executor is the service | 23:39 |
fungi | weird, pbr freeze doesn't seem to think zuul is installed | 23:40 |
mordred | fungi: it's python3 | 23:40 |
pabelanger | pip3 :) | 23:40 |
mordred | fungi: you're probably getting pbr from python2 | 23:40 |
* mordred needs to make "python3 -m pbr freeze" work ... | 23:40 | |
fungi | mordred: gah, you're correct | 23:41 |
fungi | for some reason on zuulv3.o.o that wasn't happening | 23:41 |
mordred | fungi: well, to be fair it should also not be happening on ze01 - but we have, from time to time, accidentally done pip install . instead of pip3 install . | 23:42 |
mordred | fungi: which means some things, like the pbr bin script, may have last been installed with pip2 | 23:42 |
mordred | fungi: since command line entrypoints are last-installed-wins | 23:43 |
fungi | got it | 23:44 |
fungi | well, anyway, i confirmed the fixed version is present on both ze01 and ze02 and restarted zuul-executor on them | 23:44 |
fungi | mordred: anyway, `python3 /usr/local/bin/pbr freeze` does work even if python3 -m doesn't there yet | 23:46 |
fungi | so good enough for me once you reminded me zuul's not installed under the default python any longer | 23:46 |
clarkb | jeblair: I think streams may be getting crossed between ansible builds (and maybe tests) http://logs.openstack.org/59/501459/1/gate/tox-py35-on-zuul/143c361/job-output.txt.gz#_2017-09-06_22_25_24_600305 notice that the build: uuid doesn't match the uuid for the work dir | 23:55 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Split the log_path creation into its own role https://review.openstack.org/501494 | 23:58 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add a role to emit an informative header for logs https://review.openstack.org/501495 | 23:58 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!