openstackgerrit | Clark Boylan proposed openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack plugin.sh https://review.openstack.org/468127 | 00:12 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Revert "Use devstack's zookeeper support" https://review.openstack.org/468205 | 00:38 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat https://review.openstack.org/468253 | 02:09 |
*** yolanda has joined #zuul | 05:32 | |
*** Cibo_ has joined #zuul | 06:15 | |
tobiash | morning | 06:26 |
tobiash | jeblair: how does zuulv3 choose the branch of the config repos? | 06:27 |
tobiash | I have the following use case and trying to find out if that's already possible or not. | 06:28 |
tobiash | Let's say I have several deployments for different projects. We will have an internal release process of the whole ci system where different deployments could be at different releases. | 06:30 |
tobiash | I'd like to be able to point the different deployments at a common config repo with common roles but pin them to individual branches/tags per deployment. | 06:31 |
tobiash | (independent of the branching model of the project repos themselves) | 06:33 |
*** Cibo_ has quit IRC | 06:40 | |
*** jamielennox is now known as jamielennox|away | 06:48 | |
*** Cibo_ has joined #zuul | 06:52 | |
*** bhavik1 has joined #zuul | 06:53 | |
*** bhavik1 has quit IRC | 06:59 | |
*** Cibo_ has quit IRC | 07:30 | |
*** openstackgerrit has quit IRC | 07:48 | |
*** jkilpatr has quit IRC | 10:30 | |
*** Cibo_ has joined #zuul | 10:34 | |
*** jkilpatr has joined #zuul | 10:52 | |
*** openstackgerrit has joined #zuul | 11:08 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Test shared pipeline across multiple gerrits https://review.openstack.org/468374 | 11:08 |
*** Cibo_ has quit IRC | 11:14 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Decode ssh output of gerrit connection https://review.openstack.org/468392 | 12:09 |
*** rcarrillocruz has quit IRC | 12:33 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix socket.error exception usage https://review.openstack.org/468405 | 12:44 |
Shrews | pabelanger: i noticed this py3.5 failure ^^^ in your job failure on 468253 | 12:45 |
pabelanger | Shrews: thanks, I am looking at the iteration error now too | 12:50 |
pabelanger | trying to see if | 12:50 |
pabelanger | for k in dict.keys() vs for k in dict vs for k in list(dict.keys()) have differences | 12:51 |
pabelanger | I think we could just do: for label in self._submittedRequests | 12:51 |
Shrews | pabelanger: does that work in 2.7? | 12:56 |
Shrews | nah, surely not | 12:58 |
Shrews | heh, it does | 12:59 |
Shrews | TIL | 12:59 |
Shrews | oh, the other error there is us doing bad things with a dict. we loop through it and modify it within the loop. yikes | 13:05 |
Shrews | imma bad programmer | 13:06 |
pabelanger | Shrews: Ya, not sure which one we'd want to go with | 13:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration https://review.openstack.org/468409 | 13:13 |
Shrews | let's see if that fixes it. pabelanger, you wanna rebase 468253 on top of that one and see what we get? | 13:13 |
Shrews | maybe that's not necessary though | 13:14 |
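[Editor's aside: a minimal sketch of the two dict-iteration pitfalls discussed above; the dict name and contents are hypothetical, not nodepool's.]

```python
# Hypothetical stand-in for the submitted-requests dict discussed above.
requests = {'label-a': 'req-1', 'label-b': 'req-2'}

# Fine on both py2 and py3: iterating a dict yields its keys, so
# "for label in requests" and "for label in requests.keys()" agree.
for label in requests:
    print(label)

# Broken on py3 (and fragile on py2) because the dict is mutated while
# being iterated -- py3 raises "RuntimeError: dictionary changed size
# during iteration":
#     for label in requests:
#         if finished(label):
#             del requests[label]

# Safe on both: snapshot the keys first, then mutate freely.
for label in list(requests.keys()):
    del requests[label]
```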
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use https to clone Github3.py https://review.openstack.org/468413 | 13:15 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack plugin.sh https://review.openstack.org/468127 | 13:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat https://review.openstack.org/468253 | 13:17 |
Shrews | pabelanger: i meant rebase on 468409, but that's alright. let's see what the experimental job comes back with | 13:18 |
pabelanger | we need 468127 currently | 13:19 |
Shrews | i thought maybe your patches were exposing the problem, but looks like maybe not | 13:19 |
pabelanger | so, if you rebase 468409 on 468127 it should expose the failure | 13:19 |
pabelanger | or 468253 | 13:20 |
pabelanger | that is the tip | 13:20 |
Shrews | ah, thx | 13:20 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration https://review.openstack.org/468409 | 13:21 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208 | 13:40 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 13:40 |
SpamapS | jeblair: ^ Amusing amendment to the bwrap there, informed by our close_fds find yesterday. Since py2.7 lacks the pass_fds parameter, if we add close_fds=True to our ansible running (not a bad idea) we may become py2.7 incompatible :-P | 13:41 |
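[Editor's aside: a minimal sketch of the incompatibility SpamapS is pointing at, using hypothetical stand-ins for the getent pipes; pass_fds exists only on python 3.]

```python
import os
import subprocess

# Hypothetical stand-ins for the getent passwd/group pipes mentioned
# above; real fds so the example actually runs.
passwd_r, passwd_w = os.pipe()
group_r, group_w = os.pipe()

# py3 only: pass_fds whitelists fds that survive close_fds=True
# (close_fds already defaults to True on py3).
subprocess.Popen(['true'], close_fds=True,
                 pass_fds=[passwd_r, group_r]).wait()

# py2.7 has no pass_fds, so the choice is all-or-nothing:
# close_fds=False leaks every inherited fd into the child.
```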
*** isaacb has joined #zuul | 14:02 | |
SpamapS | http://zuulv3-dev.openstack.org/logs/9275f367a0ae4993965dc4961f94b9b2/ <--- can somebody explain this? | 14:06 |
SpamapS | looks like maybe something went terribly wrong | 14:06 |
jeblair | tobiash: the config repos are assumed to be branchless -- zuul only reads from master there (mostly because considering more than one branch of a config repo becomes very difficult to reason about). i suppose we could have a zuul configuration setting to indicate which single branch (master or something else) of config repos to read, but i would want to be very careful with that so that we don't make it too confusing. | 14:24 |
jeblair | SpamapS: do you mean the fact that there is a lot of test output going to the console there? | 14:25 |
tobiash | jeblair: something like http://paste.openstack.org/show/610740/ ? | 14:27 |
jeblair | tobiash: yep | 14:29 |
tobiash | jeblair: ok, then I'll try this next week | 14:29 |
tobiash | thanks | 14:29 |
*** dkranz has joined #zuul | 14:30 | |
jeblair | tobiash: note that a lot of the self-testing jobs and instant re-configuration in zuulv3 is geared toward something more like a continuous deployment model. so pinning to a tag, etc, will inhibit some of that. | 14:30 |
tobiash | jeblair: I'm aware of that so I'll try to minimize the pinned config parts | 14:31 |
tobiash | jeblair: I'm just thinking of the really basics of the deployment like log collection or similar which would need pinning if I share them between deployments and want to add something which requires a zuul update | 14:32 |
jeblair | tobiash: that makes sense. we're also considering site-local variables, so that even base jobs could have, say, log collection destinations in a config file rather than a repo. | 14:34 |
tobiash | great, I think a carefully chosen (minimal) mixture of both of these two concepts would fit well in our deployment/release model | 14:37 |
Shrews | pabelanger: looks like my patches eliminated the exceptions, but now looks like the nodes is never booted. stuck in the waitfornode loop | 14:41 |
Shrews | s/nodes is/node is/ | 14:41 |
pabelanger | Shrews: ya, I think base64.encodestring() is the issue now | 14:44 |
pabelanger | not python3 compat | 14:44 |
pabelanger | reading docs, it expects a bytes-like object | 14:45 |
pabelanger | and we pass string | 14:45 |
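[Editor's aside: a sketch of the py3 fix being described, with a hypothetical key string; base64's encoders want bytes, not str, and encodestring is a deprecated alias of encodebytes on py3.]

```python
import base64

host_key = 'ssh-rsa AAAAB3Nza...'  # hypothetical server key as str

# py3: base64.b64encode(host_key) would raise TypeError on a str,
# so encode to bytes first...
encoded = base64.b64encode(host_key.encode('utf-8'))

# ...and decode back to str if a text value is needed downstream
# (e.g. data destined for zookeeper).
encoded_str = encoded.decode('ascii')
```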
tobiash | Was there a change in the executor config? Looks like it doesn't know the configured gerrit connections anymore. | 14:51 |
Shrews | pabelanger: hrm, yeah. also, would be nice if we could set min-ready:0 for labels we don't build images for | 14:54 |
jeblair | pabelanger: there's probably some base64 changes in zuul you can reference | 14:57 |
jeblair | tobiash: yeah, i think we may have merged a change that should *only* load the gerrit connections in the executor (ie, not sql reporters). I453754052b75de3d4fac266dc73921c7ce9c075f obviously that shouldn't cause the behavior you're seeing, but maybe there was an issue with it? | 14:59 |
jeblair | tobiash: that's the only recent thing i can think of | 14:59 |
tobiash | jeblair: yepp, reverting this fixes my issue | 14:59 |
jeblair | i guess we're not running it yet :| | 15:01 |
jeblair | i wonder if we need isinstance rather than issubclass | 15:01 |
tobiash | jeblair: so with this change only timer and zuul get configured in the merger | 15:01 |
tobiash | I'll try this out | 15:01 |
jeblair | that sounds like the opposite of the desired behavior | 15:02 |
jeblair | oh yeah, 'driver' in that function is an object instance, not a class object. so i think the fix will be to switch to isinstance | 15:03 |
jeblair | i would have expected this to be caught by tests... :| | 15:04 |
tobiash | changing to isinstance -> timer, zuul, gerrit connections | 15:05 |
tobiash | I'll try to craft a test case | 15:05 |
jeblair | oh, i wonder if the tests aren't passing source_only correctly. maybe we should not make that a default argument. | 15:06 |
jeblair | since it defaults to false, that means the executor in the tests would load everything. | 15:06 |
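[Editor's aside: a minimal illustration of the bug jeblair describes, with hypothetical class names standing in for zuul's driver interfaces.]

```python
class SourceInterface(object):
    """Hypothetical stand-in for zuul's source-driver interface."""


class GerritDriver(SourceInterface):
    pass


driver = GerritDriver()  # in the registry, 'driver' is an instance

# issubclass expects a class object; handing it the instance raises
# "TypeError: issubclass() arg 1 must be a class".
#     issubclass(driver, SourceInterface)

# isinstance is the right check for an object instance:
assert isinstance(driver, SourceInterface)
# ...which is equivalent to checking the instance's class:
assert issubclass(type(driver), SourceInterface)
```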
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not request nodes for which we have no images https://review.openstack.org/468447 | 15:11 |
Shrews | pabelanger: ^^^ i *think* that will work | 15:11 |
tobiash | jeblair: do the tests have separate connection registries by scheduler and executor? | 15:14 |
jeblair | tobiash: oh, you're right, they don't. | 15:15 |
jeblair | tobiash: i'm not sure i want to change that, so maybe we do just need a separate unit test of this. | 15:15 |
tobiash | jeblair: I think I have a test case (+fix) | 15:33 |
tobiash | will upload shortly | 15:33 |
jeblair | tobiash: cool, thanks :) | 15:33 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Fix missing source connections on merger https://review.openstack.org/468451 | 15:42 |
tobiash | jeblair: ^^^ that's my try | 15:42 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Fix base64 encoding of server key https://review.openstack.org/468453 | 15:42 |
Shrews | pabelanger: maybe?? ^^^ | 15:42 |
pabelanger | Shrews: ya, that should work | 15:45 |
pabelanger | just tested locally | 15:45 |
pabelanger | [u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbqq2UG8bkIcd30BsNt0w9HizdwyB5MMZfBQi8phXaHuKieUEZ1NnXfohIKWBCMC1wYm/XH0rsJOGuolNlDKc12bQ/ibvnNmAKT7H95QHa6QVOwMioeQQPVZcGvs0g6iJOM6bfLn6OJpDQeUMJz9aGJ6JuFeR8/kQsT8s/hXLOxLI58Ac1m5heDZ/8u+GKYW5eB9rXWWKDcYoeWOgs3hsw9rjP2ILW2QjNYAKnv9RPir/CCpa6A2Ydf9Kq5Mr4rdWz1ZP4J/zvpgyIsCrc9pmGPUZ0WkcivxPIk04al71pLKz2ZXX9mtIbmY6sq2L3DBE+vKBpJfj9fi4Zz/4/yqZJ'] | 15:45 |
pabelanger | lets see if zookeeper likes it now | 15:46 |
jeblair | tobiash: lgtm, thanks! | 15:48 |
tobiash | :) | 15:48 |
jeblair | anyone else want to review mordred's log output cleanup patches? 467310 467603 | 15:49 |
* SpamapS will take a look | 15:52 | |
SpamapS | jeblair: would that make the link I pasted easier to read? | 15:52 |
SpamapS | because I honestly have no idea what went wrong there | 15:52 |
jeblair | SpamapS: yes | 15:52 |
SpamapS | "Could not determine repository location of /home/zuul/src/git.openstack.org/openstack-infra/zuul" ? | 15:52 |
jeblair | SpamapS: however, i think mordred identified a further problem in that we might not be getting data back from subsequent command invocations? Shrews, were you looking into that? | 15:53 |
Shrews | no, i think monty was | 15:54 |
jeblair | SpamapS: (it's not clear to me whether we would make things "worse" by landing mordreds fix in that we wouldn't see any output after the first command; though "worse" in this case is definitely subjective) | 15:54 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Use https to clone Github3.py https://review.openstack.org/468413 | 15:58 |
jeblair | SpamapS: inline question on 468173; otherwise lgtm! | 16:00 |
SpamapS | jeblair: I think the net result is "better" | 16:01 |
jeblair | Shrews: how so? we don't need to test that we can generate an ssh key, right? that seems like it would unnecessarily slow down the tests... | 16:03 |
jeblair | SpamapS: oh! that was a response to the console log thing. :) | 16:04 |
jeblair | ha, sorry | 16:04 |
jeblair | Shrews: and sorry, i s-tab-failed there | 16:04 |
jeblair | we're all kinds of mixed up in irc | 16:05 |
SpamapS | jeblair: indeed, we should add a list of tags to any multiplexed irc conversation. tags=['console'] | 16:06 |
jeblair | yes, and require unique first letters... we may need to use unicode | 16:06 |
mordred | jeblair, Shrews: my thinking was that seeing the issue would be clearer if we land the patches - although I agree that it 'reduces' our ability to see test output at that point | 16:08 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Strip unneeded cmd, changed, stdout and stderr from results https://review.openstack.org/467310 | 16:09 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Display command/shell results in a banner https://review.openstack.org/467603 | 16:09 |
jeblair | mordred: wfm | 16:09 |
Shrews | mordred: yeah. i'll review those too | 16:10 |
mordred | k. let's see what new issues I created | 16:10 |
jeblair | i love seeing the vintage of zuul unit tests when they say things like "stable/havana" | 16:12 |
Shrews | pabelanger: eh? http://logs.openstack.org/53/468453/1/experimental/gate-dsvm-nodepool-py35-src-nv/2f1785a/logs/devstacklog.txt.gz#_2017-05-26_15_55_28_164 | 16:12 |
pabelanger | Shrews: need to be rebased on 468127 | 16:14 |
pabelanger | we should see about landing that | 16:14 |
Shrews | gah. i rebased the earlier patches in gerrit, not locally | 16:14 |
Shrews | grrr | 16:14 |
mordred | jeblair: since those are the action plugins we should pick up those changes without a restart, right? | 16:16 |
jeblair | clarkb, pabelanger: i'm okay with 468127, but just for my own edification, why do we need to drop -e there? | 16:17 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix base64 encoding of server key https://review.openstack.org/468453 | 16:17 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not request nodes for which we have no images https://review.openstack.org/468447 | 16:17 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration https://review.openstack.org/468409 | 16:17 |
jeblair | mordred: i think those are the things we copy into a staging area on start, so we will need a restart | 16:18 |
jeblair | (we'll also need a manual git pull and pip install since the server isn't puppeted) | 16:19 |
pabelanger | jeblair: it appears pip -e for python3 writes more data to our git repo, we think maybe some additional setup files. The issue is, setup_develop above is run under devstack, and sudo. So our choice was either use sudo or drop -e. | 16:20 |
pabelanger | happy to do a 3rd option if there is one | 16:21 |
pabelanger | http://logs.openstack.org/53/468453/1/experimental/gate-dsvm-nodepool-py35-src-nv/2f1785a/logs/devstacklog.txt.gz#_2017-05-26_15_55_28_164 is the issue | 16:21 |
mordred | jeblair: ah - ok. cool. I can work on doing that | 16:22 |
mordred | jeblair: it should just be the executor that needs a restart yah? | 16:25 |
jeblair | mordred: yep | 16:27 |
mordred | jeblair, pabelanger: I'm tailing the debug log of the executor right now ... and I'm seeing issues float by about no-route-to-git.o.o | 16:36 |
pabelanger | odd | 16:41 |
*** isaacb has quit IRC | 16:42 | |
jeblair | mordred: there's an ipv6 and an ipv4 check; i think it's okay if the ipv6 check fails if ipv4 works | 16:47 |
mordred | ah - ok | 16:50 |
* SpamapS rebases | 16:52 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208 | 16:52 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 16:52 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage https://review.openstack.org/468173 | 16:52 |
pabelanger | ya, we just started doing traceroute / traceroute6 on zuulv3-dev | 16:53 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix socket.error exception usage https://review.openstack.org/468405 | 17:02 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack plugin.sh https://review.openstack.org/468127 | 17:03 |
mordred | jeblair, Shrews: WELP - those patches just flat didn't work at all ... investigating | 17:13 |
mordred | jeblair: is it possible we only do the code staging copy on scheduler restart? | 17:14 |
* mordred goes to look | 17:14 | |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat https://review.openstack.org/468253 | 17:15 |
mordred | jeblair: OH FOR THE LOVE OF ... | 17:16 |
jeblair | mordred: did you do a "pip install -U ." ? | 17:16 |
mordred | jeblair: nope | 17:16 |
mordred | jeblair: I just did the git pull :) | 17:16 |
jeblair | that'll do it | 17:16 |
mordred | I'm going to restart the executor again | 17:17 |
jeblair | ++ | 17:17 |
mordred | maybe THIS time I'll be able to make something break because I write bad code | 17:17 |
mordred | rather than making it no-op because I don't know how to install software | 17:17 |
SpamapS | hrm | 17:19 |
SpamapS | http://paste.openstack.org/show/610752/ | 17:19 |
jeblair | mordred: a universe of possibilities awaits! | 17:19 |
Shrews | oh, what a universe | 17:20 |
clarkb | Shrews: I've approved https://review.openstack.org/#/c/468409/3 but left a comment inline if thats something you want to clean up | 17:20 |
SpamapS | mordred: HeroOps! | 17:20 |
SpamapS | which is also "Her Oops" which I'm thinking is why we haven't seen much uptake on that moniker. ;) | 17:21 |
Shrews | clarkb: ah, yep. will clean it up after helping pabelanger fix the py3.5 things | 17:21 |
jlk | THEY CALL ME, MISTER OOPS! | 17:21 |
jlk | uh, without the comma | 17:22 |
jeblair | oops | 17:22 |
Shrews | capitalization is everything, SpaMaps | 17:22 |
jeblair | SpamapS: [paste] hrm, do you think that's related to/racing with ssh-agent and close_fds? | 17:23 |
jeblair | SpamapS: (or racing with git ops?) | 17:23 |
SpamapS | Glittering marble tile and tranquil aroma therapy with local cannabis incense burning awaits you once you brave this dirt track outside Fontana, CA... THIS, is Spa Maps! | 17:23 |
SpamapS | jeblair: I do.. I think we may need to start passing close_fds=True on some things. But that also might hasten our py3 transition (or we have to double-fork to run bwrap with open fds) | 17:25 |
jeblair | SpamapS: oh, that's py3 too, so everything should be getting close_fds by default | 17:26 |
jeblair | SpamapS: do you set close_fds=False for the bwrap run? | 17:26 |
SpamapS | jeblair: I don't force it no, since I'm letting the caller pass any subprocess.Popen args in. | 17:26 |
clarkb | Shrews: I've approved those nodepool cleanups. Doesn't look like py35 job is happy yet, but I figure those changes can only help | 17:27 |
SpamapS | jeblair: I recently added some code that is py3 only to it though, where if you do pass close_fds=True, we have to use pass_fds=[passwd_r, group_r] so that our getent stuff makes it in. | 17:27 |
jeblair | SpamapS: close_fds defaults to true in py3 though | 17:27 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix base64 encoding of server key https://review.openstack.org/468453 | 17:27 |
SpamapS | jeblair: right... let me show you the patch.. | 17:28 |
Shrews | clarkb: thx. waiting for the log to see what else is wrong | 17:28 |
SpamapS | jeblair: https://review.openstack.org/#/c/453851/18/zuul/driver/bubblewrap/__init__.py | 17:28 |
SpamapS | https://server/job/tox-py35/0/ <-- handy link | 17:29 |
SpamapS | :-P | 17:29 |
jeblair | SpamapS: Shrews has a fix for that in progress :) | 17:30 |
SpamapS | indeed! | 17:30 |
SpamapS | so the other option btw, is to double-fork | 17:30 |
SpamapS | we could use the zuul-bwrap thing I wrote to do it actually | 17:30 |
jeblair | SpamapS: i'm really confused now. it seems like your code should handle the py35 case... :? | 17:31 |
SpamapS | if we get a close_fds=True in py27, we could run zuul-bwrap, and zuul-bwrap would run bwrap with close_fds=False | 17:31 |
Shrews | i have a fix for what? | 17:31 |
jeblair | Shrews: log streaming | 17:31 |
SpamapS | jeblair: It does handle py35. It does not handle py27 if we start passing close_fds=True anywhere else. These are facts, but I think we got off the context path... | 17:32 |
jeblair | (specifically, a fix for the lack thereof) | 17:32 |
jeblair | SpamapS: i thought you showed me a bad fd error with py3 | 17:32 |
SpamapS | jeblair: in the py35 run that failed, we can presume close_fds=True is the default, and so there may be something else causing our bad file descriptor errors. :-/ | 17:32 |
*** rcarrillocruz has joined #zuul | 17:32 | |
SpamapS | for once, though, this reliably fails when run just as that one test | 17:34 |
jeblair | nice! | 17:34 |
SpamapS | so I can actually employ like, debuggers and stuff | 17:34 |
SpamapS | The quality of my problems is increasing. | 17:34 |
jlk | It's a good problem, Bort | 17:39 |
jlk | Is there a description of what the various gerrit statuses mean? Context is using them as a pipeline requirement. | 17:42 |
pabelanger | http://paste.openstack.org/show/610755/ | 17:45 |
pabelanger | have we changed anything on zuulv3-dev.o.o recently^ | 17:46 |
pabelanger | ya, looks like it was restarted | 17:46 |
mordred | pabelanger: yup | 17:48 |
jeblair | mordred, pabelanger: we should land https://review.openstack.org/468451 | 17:48 |
jeblair | sorry i lost track of that | 17:49 |
mordred | jeblair: oh - why yes we should! | 17:49 |
mordred | +3 | 17:49 |
jeblair | jlk: https://review.openstack.org/Documentation/rest-api-changes.html#change-info is all i see | 17:50 |
jeblair | jlk: it just enumerates them | 17:50 |
jeblair | jlk: but i can describe them more if you have questions | 17:51 |
jlk | I'm trying to determine if I need to map any of that over to github | 17:51 |
jlk | "status" is a completely different thing over there | 17:51 |
jlk | so I was hoping to understand the deeper meaning on the gerrit side | 17:52 |
jeblair | jlk: (the trickiest one is probably submitted, which is gerrit speak for "someone or something (ie zuul) has told gerrit to merge the change and that action has been added to the queue") | 17:52 |
SpamapS | oh man this is getting ugly | 17:52 |
SpamapS | it's definitely still a race | 17:53 |
jeblair | jlk: most closely maps to pull request 'state': open/closed | 17:53 |
SpamapS | http://paste.openstack.org/show/610756/ | 17:53 |
jlk | yeah, and we've already got an "open" boolean | 17:53 |
jeblair | though... how is it that pull requests have only one closed state? shouldn't there be closed-and-merged and closed-unmerged? | 17:53 |
SpamapS | seems like maybe there's a bunch of thread-unsafe stuff | 17:53 |
jlk | well | 17:53 |
jlk | one sec | 17:54 |
jlk | there are multiple data points on github side | 17:55 |
jeblair | SpamapS: there should never be more than one thread dealing with a single jobdir | 17:55 |
jlk | there is "state" which can be open or closed | 17:55 |
SpamapS | jeblair: the problem is pipes and fds | 17:55 |
jeblair | jlk: ah, and 'merged' boolean | 17:55 |
SpamapS | jeblair: there's one global list of open fds for the executor.. and something is closing other threads' fds | 17:55 |
jlk | jeblair: yeah, merged_at | 17:56 |
jlk | or an API call to see if the thing is merged. | 17:56 |
jlk | GET /repos/:owner/:repo/pulls/:number/merge 204 merged, 404 not. | 17:56 |
jeblair | jlk: i see merged_at as well as merged | 17:56 |
jlk | ah right | 17:57 |
jlk | and for even more confusion, there is 'merge_commit_sha' which changes its meaning and value depending on the state of the PR and of the repo configuration | 17:57 |
mordred | I see merged: False for unmerged things | 17:57 |
mordred | maybe | 17:58 |
mordred | or something | 17:58 |
SpamapS | OSError: [Errno 9] Bad file descriptor: 'role_0' | 17:58 |
jeblair | jlk: at any rate... i wouldn't recommend mapping things, per se. but rather just exposing (some of) what github provides. so let 'status' be different between github and gerrit. if we decide we want a merged attribute, we can add it. | 17:59 |
SpamapS | that's weird | 17:59 |
jeblair | SpamapS: yes that is a funny looking file descriptor | 18:00 |
jlk | oh right I wanted those to be different | 18:00 |
jlk | I am exposing the things in github native ways. I just didn't know what all you could expect from "status" on gerrit. | 18:01 |
jeblair | jlk: got it, cool. so basically gerrit status is a combo of github 'state' and 'merged'. | 18:02 |
jeblair | state really being the important one to have | 18:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix missing source connections on merger https://review.openstack.org/468451 | 18:02 |
jeblair | (which is covered by 'open' as you point out) | 18:02 |
jlk | On github side I already added a boolean for open, so you can stop processing if a thing is closed. | 18:02 |
jlk | Does it make sense to expose 'merged' as another boolean? | 18:02 |
jlk | I can't think of one immediately, so I'm inclined to not add it. | 18:03 |
jeblair | jlk: same here | 18:03 |
jeblair | add it when it proves useful, i'd say | 18:03 |
jlk | +1 | 18:03 |
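[Editor's aside: a hedged illustration of the mapping jeblair sketches above, treating gerrit status as roughly github's 'state' plus 'merged'; the function and status names are illustrative, not zuul's code.]

```python
def gerrit_like_status(state, merged):
    """Collapse a github PR's (state, merged) pair into a
    gerrit-style status."""
    if state == 'open':
        return 'NEW'
    if merged:
        return 'MERGED'
    return 'ABANDONED'  # closed without merging

assert gerrit_like_status('open', False) == 'NEW'
assert gerrit_like_status('closed', True) == 'MERGED'
assert gerrit_like_status('closed', False) == 'ABANDONED'
```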
SpamapS | __del__ strikes again | 18:04 |
SpamapS | oh | 18:04 |
SpamapS | haha | 18:04 |
* SpamapS sees the bug | 18:04 | |
SpamapS | so.......... | 18:07 |
* SpamapS just found 5 ssh-agents hanging around | 18:07 |
SpamapS | beginning to doubt running them in the background | 18:07 |
SpamapS | they're all from aborted test runs in the last 2 hours :-P | 18:07 |
jeblair | SpamapS: yeah... though they didn't seem to die after aborted test runs with -D either, iirc... | 18:10 |
* jeblair assumes something something testr | 18:11 | |
SpamapS | jeblair: for desktop it's not a big difference. But I'm concerned about the server. | 18:11 |
SpamapS | no it's not testr, it's actually systemd/upstart holding them open | 18:11 |
SpamapS | because all the terminals are in the same process group now | 18:11 |
SpamapS | so your child processes get orphaned to the desktop session manager | 18:12 |
SpamapS | on a server though, if zuul-executor dies, the ssh-agents that it hasn't killed would get killed _if_ it's running as a daemon or systemd service since systemd will make a process group for service units by default | 18:13 |
jeblair | SpamapS: in production, it should only be an issue in the case of an unclean shutdown.... | 18:13 |
SpamapS | agreed, a 'splosion level crash | 18:13 |
jeblair | SpamapS: and since we already make a pid group for the ansible run, an unclean shutdown is already messy there | 18:13 |
SpamapS | what's the thinking behind that btw? | 18:14 |
jeblair | (since all the ansible processes would continue in the same way as ssh-agent, right?) | 18:14 |
jeblair | SpamapS: we have to kill ansible sometimes, and we want it and all its subprocesses to die | 18:14 |
SpamapS | the ansible processes would have their stdout/err connected to pipes though, no? | 18:14 |
SpamapS | so they'd get EPIPE or EBADF on operations | 18:15 |
jlk | jeblair: if zuul gets an event for a change, and queues up that event for processing, and time passes, when it's ready to process that event when it looks at the change to run filters, does it get new data about the change at that point? In gerrit it's _updateChange (with some caching), in github it's getChange. | 18:15 |
jlk | the getChange happens at schedule time, which could be some time after event time, right? | 18:15 |
jeblair | SpamapS: maybe? we can try killing it and see what happens :) | 18:15 |
jeblair | jlk: part of the reason for the caching is so that any time there's new information about a change, we update the singleton object in zuul to reflect the new information | 18:16 |
jeblair | jlk: so in gerrit's case, it will use the most recent data | 18:17 |
jeblair | jlk: iirc, github doesn't implement that yet, but i think it should | 18:17 |
jlk | yeah I think we talked through the caching once before, and I put a pin in implementing a cache for github. | 18:17 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Remove unnecessary list() https://review.openstack.org/468489 | 18:17 |
jlk | mostly I wanted to be sure I had the model right, that "fresh" data is grabbed at schedule time | 18:17 |
jeblair | jlk: yep. and it'll go stale until we implement the caching/update thing. then it should get asynchronously updated. | 18:18 |
jlk | it'll "go stale" between the time scheduler decides it needs to be tested and it gets tested? | 18:19 |
jeblair | jlk: yes, or even in the time between when it was enqueued and when it's evaluated | 18:20 |
jlk | what would cause a fresh update at evaluation time? | 18:20 |
jlk | (in the gerrit case) | 18:21 |
jeblair | jlk: in either case, any further update to the change/pr. | 18:22 |
jlk | ooooh wait, I see, because a new event would come in, which would cause an update to the in-memory cached object | 18:23 |
jeblair | jlk: exactly | 18:23 |
jlk | <--- LIGHT BULB | 18:23 |
jeblair | jlk: that might just be a comment-added. it might also be something substantial like someone closed the pr. | 18:23 |
jlk | yup, makes total sense now. I was looking at it from the wrong angle. | 18:23 |
jeblair | in any case, that event should (as a side effect) update the data on the change/pr in memory | 18:24 |
jeblair | and if the scheduler hasn't gotten around to evaluating the original event, once it does, it'll use the new data (so it may well not enqueue a newly updated change in check if someone manages to close it quickly) | 18:25 |
jeblair | or, if it's already running, the next time the pipeline manager passes over that change, it will see that it no longer meets the "open: true" pipeline requirement that it has, and kick it out. | 18:25 |
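[Editor's aside: a simplified sketch of the caching model described above: one object per change, mutated in place as events arrive, so queued events and running pipelines holding a reference see fresh data. Class names are illustrative only.]

```python
class Change(object):
    def __init__(self, number):
        self.number = number
        self.open = True


class ChangeCache(object):
    """Hypothetical stand-in for a connection's change cache."""

    def __init__(self):
        self._changes = {}

    def getChange(self, number):
        if number not in self._changes:
            self._changes[number] = Change(number)
        return self._changes[number]

    def onEvent(self, number, data):
        # Update the existing singleton rather than replacing it, so
        # every holder of a reference sees the new state.
        change = self.getChange(number)
        change.open = data.get('open', change.open)


cache = ChangeCache()
queued = cache.getChange(42)        # enqueued for later evaluation
cache.onEvent(42, {'open': False})  # PR is closed before evaluation
assert queued.open is False         # 'open: true' requirement now fails
```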
SpamapS | Oo, I wonder if we should check for leaked FDs in our test shutdown checks. | 18:28 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208 | 18:30 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 18:30 |
* SpamapS stops closing other peoples' fds | 18:30 | |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Cleanup failed upload records https://review.openstack.org/467783 | 18:31 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fetch server console log if ssh connection fails https://review.openstack.org/452494 | 18:35 |
* jlk afk for some MTB riding | 18:52 | |
jeblair | SpamapS: aha! i see what you mean about __del__ there. is it worth thinking about accomplishing that with a context manager? | 18:58 |
jeblair | mordred: did you re-restart zuulv3 yet after the connection fix landed? | 18:58 |
mordred | jeblair: nope. I will now do that | 18:58 |
mordred | heh | 18:59 |
mordred | Successfully installed zuul-2.5.2.dev834 | 18:59 |
mordred | that's a lot of dev patches :) | 18:59 |
jeblair | hehe | 18:59 |
jeblair | mordred: if only it would tell us how many patches less than 3.0 we are :) | 18:59 |
mordred | jeblair: I did service zuul-executor stop and it did not stop | 19:00 |
mordred | jeblair: how much do we care about that right now? | 19:00 |
jeblair | mordred: i don't know what "service zuul-executor stop" does :| | 19:00 |
mordred | me either | 19:01 |
jeblair | mordred: "zuul-executor stop" will, however, dtrt (and that is, ultimately, what service zuul-executor stop should do) | 19:01 |
jeblair | or maybe it should graceful | 19:01 |
jeblair | i dunno | 19:01 |
mordred | nope. still running | 19:01 |
jeblair | mordred: if it did graceful, then that's expected | 19:01 |
jeblair | it will stop eventually | 19:01 |
mordred | ah. ok | 19:01 |
mordred | the last 4 lines from the log are: | 19:02 |
mordred | 2017-05-26 18:59:41,381 DEBUG zuul.ExecutorServer: Stopping | 19:02 |
jeblair | (also, it will be really nice to have rotated logs) | 19:02 |
mordred | 2017-05-26 18:59:41,381 DEBUG zuul.CommandSocket: Accepted socket connection <socket._socketobject object at 0x7efdb0a2f7c0> | 19:02 |
mordred | 2017-05-26 18:59:41,381 DEBUG zuul.CommandSocket: Received _stop from socket | 19:02 |
mordred | 2017-05-26 18:59:41,382 DEBUG zuul.AnsibleJob: Abort: no process is running | 19:02 |
mordred | ++ | 19:02 |
jeblair | so i guess it ran "stop" and not graceful | 19:02 |
jeblair | which means this is unexpected | 19:03 |
jeblair | mordred: i'd sigkill at this point | 19:03 |
mordred | jeblair: ok. just didn't want to lose any valuable debugging if we thought it was valuable | 19:03 |
jeblair | (maybe some side effect of the bug we're fixing?) | 19:03 |
jeblair | mordred: nah | 19:04 |
mordred | jeblair: I'm voting side-effect | 19:04 |
mordred | jeblair: ok - killed and restarted | 19:04 |
jeblair | mordred, Shrews, pabelanger, jlk, jesusaur, SpamapS, clarkb, fungi, anyone else: er i just realized monday is a us holiday; should we cancel the zuul meeting? | 19:21 |
* jeblair is bad at meetings | 19:21 | |
Shrews | Yes | 19:21 |
mordred | jeblair: ++ | 19:22 |
mordred | jeblair: that'll give me another week to accomplish my tasks so that tristanC doesn't dislike me | 19:22 |
clarkb | I'm likely to be afk | 19:23 |
fungi | jeblair: i will gladly attend a zuul meeting monday, but will be just as happy not to need to remember to take a break from holidaying to do so | 19:24 |
* jeblair is also bad at mondays | 19:25 | |
jlk | I'm AFK | 19:28 |
jlk | for Monday | 19:28 |
pabelanger | jeblair: wfm | 19:29 |
* jesusaur will likely be around monday, but not working | 19:34 | |
SpamapS | +1 cancel | 19:44 |
*** jkilpatr has quit IRC | 19:49 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208 | 19:52 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 19:52 |
mordred | Shrews, jeblair: I think the hard-restart we did of the zuul executor may have broken something | 19:57 |
mordred | Shrews, jeblair: http://zuulv3-dev.openstack.org/ is not returning and the executor debug log hasn't really done anything in a while | 19:59 |
jeblair | bummer | 20:03 |
jeblair | mordred: http://zuulv3-dev.openstack.org/status.json works | 20:03 |
* Shrews certainly has no idea about executor things | 20:04 | |
mordred | Shrews: well - I was pinging you originally because I had a stuck status page that made it look like everything was queued - so I thought maybe zk/nodepool blah | 20:05 |
mordred | but then I reloaded the status page | 20:05 |
mordred | and got blank | 20:05 |
jeblair | mordred: http://git.openstack.org/cgit/openstack-infra/zuul/tree/etc/status/public_html/index.html?h=feature/zuulv3#n34 | 20:07 |
jeblair | i think that's the problem | 20:07 |
mordred | jeblair: oh - that sure does seem to have wrapped wrong | 20:08 |
jeblair | yeah, i'm assuming line 34 should be on line 33 | 20:08 |
mordred | yup | 20:08 |
*** jkilpatr has joined #zuul | 20:08 | |
jeblair | mordred: i'm knee deep in something i don't want to stash; you want to push up a fix? | 20:08 |
mordred | jeblair: so - that said- the executor debug log is still very not doing anything | 20:08 |
mordred | jeblair: yup. I'm on it | 20:08 |
jeblair | mordred: agreed; lemme look at the status json and see if it should be | 20:09 |
jeblair | yeah, it definitely thinks it should be running jobs | 20:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix bad text wrap in status page https://review.openstack.org/468532 | 20:09 |
mordred | oh! it just did something | 20:10 |
mordred | jeblair: so - the last time it sat there doing nothing, the last line in the log was: | 20:10 |
mordred | 2017-05-26 20:02:01,121 DEBUG zuul.Repo: CreateZuulRef feature/zuulv3/Zdfed0f5092ff42cd8f510ce65be7771d at 7b7520b55df36b62b6800273c16e1200c7c76c03 on <git.Repo "/var/lib/zuul/executor-git/git.openstack.org/openstack-infra/zuul/.git"> | 20:10 |
mordred | it is once again seeming to not do anything - and the last line in the log is now: | 20:11 |
mordred | 2017-05-26 20:10:09,099 DEBUG zuul.Repo: CreateZuulRef feature/zuulv3/Zed63cd22bbdb43b7a2b67a337215a2b2 at 9c702bcb666e17d57f05372eafd68b39a6e8a280 on <git.Repo "/var/lib/zuul/executor-git/git.openstack.org/openstack-infra/zuul/.git"> | 20:11 |
jeblair | yeah, so the executor's merger is doing stuff, but it's apparently not running jobs | 20:11 |
mordred | yah | 20:11 |
jeblair | executor:execute001 | 20:12 |
jeblair | nothing queued | 20:12 |
mordred | jeblair: 14b315d was the commit that was running before - there aren't a ton of things related to interaction in there, but there is some of the work finishing the merger/executor split in the new running executor | 20:15 |
jeblair | mordred: maybe we should restart the scheduler in case something changed with the interaction | 20:16 |
mordred | jeblair: perhaps we're on opposite sides of a shift? | 20:16 |
mordred | yah | 20:16 |
* mordred tries that | 20:16 | |
jeblair | jobs! | 20:17 |
mordred | jeblair: yah. much happier now | 20:17 |
mordred | jeblair: so - let's say it was interface mismatch | 20:18 |
jeblair | wfm | 20:18 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini https://review.openstack.org/467634 | 20:34 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add a space and a capital letter to success banner https://review.openstack.org/468539 | 20:39 |
mordred | SpamapS, jeblair: ^^ based on looking at the output now, there is a minor edit to the format string | 20:41 |
mordred | http://zuulv3-dev.openstack.org/logs/59f6cd6d3ae84dd2af24d90878f0e964/ansible_log.txt | 20:41 |
pabelanger | clarkb: also means, if we create the same setup in openstackzuul project, this should just work | 20:41 |
clarkb | pabelanger: for ipv6? I think we want to have consistent connectivity and routes | 20:41 |
clarkb | but ya | 20:41 |
mordred | SpamapS, jeblair: I also think we need a filtered on_task_start method too - printing _raw_params is bad | 20:42 |
pabelanger | mordred: ++ | 20:42 |
pabelanger | i noticed that | 20:43 |
mordred | and honestly I think we need to sort out how to not double-timestamp this | 20:43 |
pabelanger | clarkb: oops wrong room | 20:43 |
SpamapS | mordred: can't we just ARA this stuff and be done? ;) | 20:48 |
mordred | SpamapS: :) | 21:00 |
mordred | pabelanger: I feel like a few weeks ago one of us figured out what we thought was wrong with the "stream logs from host only works for first task" issue | 21:01 |
mordred | pabelanger: you don't happen to remember do you? | 21:01 |
pabelanger | mordred: was it something to do with unicode / encoding / decoding? | 21:02 |
pabelanger | cannot remember | 21:02 |
jeblair | mordred: i agree it would be nice not to double timestamp; i'm really not sure how to accomplish it (i don't even mean technically -- i'm not sure what the desired outcome would actually be) | 21:07 |
jeblair | mordred: the inner timestamp is critical for debugging (it's the *actual* timestamp); the outer one can be discarded... *except* that means that the log no longer has a consistent timestamp series or format. | 21:09 |
jeblair | i'm inclined toward that (discarding the outer timestamp, if that's possible). i just worry about whether that makes life difficult for log parsers like osla and grok | 21:10 |
mordred | well - I'm thinking we could change the logger format we emit in the callback plugin (which is what controls the outer one) | 21:10 |
jeblair | ooh yeah | 21:10 |
mordred | to match the format the inner one would have | 21:11 |
jeblair | if we change that to match the command module, and then drop it from content streamed from the command module, we might be set | 21:11 |
mordred | then when we emit the messages from the remote logger, don't append timestamps | 21:11 |
mordred | yup | 21:11 |
mordred | I think that's doable | 21:11 |
jeblair | (also p=871 u=zuul are *really* unnecessary for us) | 21:11 |
mordred | yup | 21:12 |
mordred | they should go away | 21:12 |
mordred | and a few other cleanups | 21:12 |
mordred | pabelanger: I don't either :( | 21:12 |
jeblair | (it's only interesting when the pid changes, but we can easily throw in a "starting new playbook" line) | 21:12 |
mordred | jeblair: indeed | 21:13 |
mordred | jeblair: oh wow - wanna see a fun hack? | 21:14 |
mordred | jeblair: logger = logging.getLogger("p=%s u=%s | " % (mypid, user)) | 21:14 |
mordred | jeblair: they name the logger itself "p=%s u=%s | " % (mypid, user) - then set the format to '%(asctime)s %(name)s %(message)s' | 21:15 |
jeblair | oh, wow | 21:22 |
jeblair | why? | 21:22 |
jeblair | maybe they couldn't think of another way of adding in extra data like that...? | 21:24 |
clarkb | mordred: its too near the weekend for eye bleeding | 21:25 |
mordred | jeblair: I think yes - I think it's because the callback plugin itself doesn't actually know that information, perhaps? | 21:26 |
mordred | (I honestly have no clue) | 21:26 |
jeblair | there are so many ways of manipulating logger data if you use the python logging library correctly... | 21:26 |
clarkb | os.getpid() should work anywhere? | 21:26 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Release executor jobs on test exit https://review.openstack.org/468553 | 21:27 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add LoggerAdapter for executor jobs https://review.openstack.org/468554 | 21:27 |
jeblair | there's one ^ :) | 21:27 |
jeblair | mordred: anyway, we can probably get the logger and reset the formatter? | 21:28 |
mordred | jeblair: yah | 21:28 |
mordred | I'll poke at that on money | 21:28 |
mordred | monday | 21:28 |
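[Editor's aside: for reference, a sketch of the LoggerAdapter approach from the patch above -- the idiomatic alternative to smuggling context through the logger's name; the adapter class and prefix format here are made up.]

```python
import logging
import os

logging.basicConfig(format='%(asctime)s %(message)s')
log = logging.getLogger('zuul.AnsibleJob')
log.setLevel(logging.DEBUG)


class JobLogAdapter(logging.LoggerAdapter):
    """Hypothetical adapter injecting per-job context into messages."""

    def process(self, msg, kwargs):
        return ('[p=%s u=%s] %s' % (self.extra['pid'],
                                    self.extra['user'], msg), kwargs)


job_log = JobLogAdapter(log, {'pid': os.getpid(), 'user': 'zuul'})
job_log.debug('Stopping')  # -> 2017-05-26 ... [p=871 u=zuul] Stopping
```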
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Actually check out the branch in the jobdir https://review.openstack.org/468556 | 21:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Enable test_multi_branch cloner/executor test https://review.openstack.org/468557 | 21:32 |
jeblair | 2 cloner tests enabled! ha ha ha | 21:32 |
SpamapS | cloner? | 21:33 |
SpamapS | isn't that a dead thing? | 21:33 |
SpamapS | is zuulv3-dev not running jobs now? | 21:34 |
SpamapS | not seeing it post results to my most recent patches | 21:34 |
jeblair | SpamapS: [cloner] yes, but what it *does* is now handled by the executor. the cloner tests are really detailed and important, so i'm enabling them applied to the executor with minimal auditable changes; i'll rename them at the end. | 21:36 |
jeblair | SpamapS: er, there were issues earlier; i think they were resolved, but i haven't followed up with very recent stuff | 21:36 |
jeblair | SpamapS: anything uploading since 20:17 utc should be okay | 21:38 |
jeblair | er uploaded since | 21:38 |
jeblair | older than that, not so much | 21:38 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add a space and a capital letter to success banner https://review.openstack.org/468539 | 21:43 |
SpamapS | jeblair: ah ok I fell into the "problems" window ;) | 21:46 |
* SpamapS will recheck | 21:46 | |
SpamapS | jeblair: also [cloner] ACK | 21:47 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Verify executor branch checkout https://review.openstack.org/468564 | 22:45 |
* SpamapS hits reviewer exhaustion | 22:46 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Enable test_multi_branch cloner/executor test https://review.openstack.org/468557 | 22:47 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Verify executor branch checkout https://review.openstack.org/468564 | 22:47 |
pabelanger | jeblair: have we committed to no longer having zuul-merger running for zuulv3? | 22:48 |
SpamapS | pabelanger: I believe the thought is that it might be useful to have it doing just merges to offload work from executors. | 22:48 |
SpamapS | tho if the result of that work doesn't make it into the executor and must be repeated, I'm not sure there's value in doing that. | 22:49 |
jeblair | SpamapS: unfortunately, since jobs for a single change can run on multiple executors, we will either need to repeat it or fetch it from the mergers (which requires that they serve git repos (though only privately, not publicly)) | 22:50 |
jeblair | at any rate, i've implemented "repeat it"; if we later decide we'd rather have them fetch, that's fine -- it'll mostly just be a matter of deleting code :) | 22:51 |
SpamapS | I could see just having two ways of configuring it as we learn to scale. | 22:52 |
SpamapS | for big ones like infra... you have the mergers serving to the executors | 22:52 |
SpamapS | for the average one, just have executors merging. | 22:52 |
jeblair | while that's related to the question of whether standalone mergers are worthwhile, it's not the only consideration -- either way, there's significant work that can be offloaded from the executors by having standalone mergers, so i want to keep them around until we see how they perform. | 22:52 |
SpamapS | and I'd rather focus on the average one first, and then when we break it, we have data to guide our optimization | 22:53 |
jeblair | well, we kind of need to run the infra one :) | 22:53 |
SpamapS | what about another way of thinking.. what if mergers could push their merge results to executors, sort of like how gerrit push triggers work | 22:54 |
jeblair | at any rate, both are supported in the code right now, and i think if pabelanger is asking "what should the puppet modules do?" i think the answer is they should continue to support both for now. | 22:54 |
SpamapS | jeblair: what I mean by that is that we haven't run an infra with this scheme, so we don't know that it will break at that scale. | 22:54 |
pabelanger | jeblair: ya, puppet / ansible. I have the code to support both still, was mostly curious if still valid. Thanks for the feedback | 22:55 |
SpamapS | entirely possible the executors will remain blissfully busy and humming doing merges sometimes and running jobs others. | 22:55 |
jeblair | SpamapS: yeah, though that's the same problem with the direction of transfer reversed. it's actually a little trickier since an executor knows where to pull from, but a merger doesn't know where to push. | 22:55 |
SpamapS | jeblair: I was thinking merger would push to all. :-P | 22:55 |
SpamapS | or.. something crazy like.. to AFS | 22:55 |
SpamapS | would that even work | 22:56 |
* SpamapS has no idea | 22:56 | |
jeblair | SpamapS: about scale: yes, that much i agree with; we don't need to push people to run mergers yet; we just need to keep them around in case infra-scale needs them. | 22:56 |
jeblair | SpamapS: (mostly ^ i wanted to clarify that i think we do need to keep that architecture working so we have it ready if infra-scale needs it) | 22:56 |
SpamapS | ACK | 22:56 |
jeblair | SpamapS: pull would have the advantage of only doing work on the machines that need it; i think the access/network concerns are about the same either way | 22:57 |
* SpamapS really would love to be in the position of actually watching it scale or fail to scale. ;) | 22:59 | |
jeblair | (i'm just really happy that we're out of the business of saying "here's how you set up apache to serve git repos"; probably if we did pull, we could just have intra-zuul ssh access or something to make it simpler) | 22:59 |
jeblair | (it has also occurred to me that we could push git blobs over gearman, but i'm trying not to think about that) | 22:59 |
jeblair | the "repeat" approach is dumb, but simple. | 23:00 |
SpamapS | is there such a thing as a git blob? | 23:01 |
SpamapS | I wondered if maybe there was , like a git pack thing, that can be easily created. | 23:01 |
jeblair | SpamapS: well, the stuff that git upload-pack ships around | 23:01 |
pabelanger | ya, ssh would be nice over apache | 23:02 |
SpamapS | I just don't know how those are created | 23:02 |
SpamapS | I guess you know the refs you made with the merge | 23:02 |
jeblair | it could get complicated since it depends on what objects are on both sides of the transaction | 23:02 |
jeblair | probably you'd just assume the executor had all the objects required by the current branch tips, and then send all the ones required by the change not in that set. then the executor could start by updating from origin, then apply the objects from gearman. | 23:04 |
jeblair | ugh. i'm failing to not think about this. :) | 23:04 |
jeblair | SpamapS: you've now gotten me convinced this is worth spending a few minutes poking at. | 23:05 |
SpamapS | https://git-scm.com/docs/git-pack-objects | 23:06 |
SpamapS | yep | 23:06 |
SpamapS | it actually doesn't seem so haphazard | 23:06 |
jeblair | so next thing is: get a list of objects we need to send (so we can pass that to git-pack-objects) | 23:07 |
jeblair | would that just be a list of commits, or is it more complex than that? | 23:08 |
jeblair | (like, would we also have to identify what tree objects are required?) | 23:09 |
SpamapS | jeblair: if you know that the other side has everything except those commits, then it's just those commits. | 23:09 |
SpamapS | But if there are things in those objects' tree that are not present, you need those too to unpack. | 23:09 |
jeblair | SpamapS: so basically: for each new commit object, get the associated tree object. pack the set of commit and tree objects. send it over the wire and unpack. ? | 23:17 |
SpamapS | jeblair: indeed | 23:20 |
* SpamapS is playing localy | 23:21 | |
SpamapS | locally even | 23:21 |
SpamapS | jeblair: so I don't know how to get the updates to refs | 23:21 |
SpamapS | I commit to master, pack up that commit hash, and move it to the other repo with unpack-objects ... and then I can checkout that hash, but master is still at the old hash | 23:22 |
jeblair | SpamapS: that's cool -- i already have code to update all the refs, including branch heads, to point at the right commits | 23:22 |
SpamapS | so yeah seems doable | 23:23 |
SpamapS | and the pack files are pretty efficiently sized.. already gzipped | 23:23 |
SpamapS | so even a large chain of merges would probably keep under the 1MB limit | 23:23 |
jeblair | okay... i'll putz around with doing this with gitpython, and maybe try it out next week... | 23:24 |
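[Editor's aside: a rough sketch of the round trip being discussed, driving the git CLI directly rather than gitpython for brevity; the repo paths and rev range are hypothetical.]

```python
import subprocess


def pack_new_objects(repo, have_tip, want_tip):
    """Pack every object reachable from want_tip but not have_tip."""
    # rev-list --objects emits the commits, trees and blobs needed.
    revs = subprocess.check_output(
        ['git', 'rev-list', '--objects', '%s..%s' % (have_tip, want_tip)],
        cwd=repo)
    # pack-objects reads object names on stdin, writes a pack stream.
    return subprocess.check_output(
        ['git', 'pack-objects', '--stdout'], input=revs, cwd=repo)


def unpack_objects(repo, pack_data):
    """Feed a pack stream into the receiving repo's object store."""
    subprocess.run(['git', 'unpack-objects'],
                   input=pack_data, cwd=repo, check=True)
    # Refs (branch heads, zuul refs) still need updating separately,
    # as noted above.
```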
jeblair | SpamapS: 1MB limit? | 23:24 |
SpamapS | gearman | 23:24 |
jeblair | SpamapS: per packet? | 23:24 |
SpamapS | technically the limit is 4GB but realistically pushing bigger than 1MB through gearman is pretty inefficient. | 23:25 |
SpamapS | gearmand will flat out refuse > 1MB jobs | 23:25 |
SpamapS | dunno if geard does the same | 23:25 |
jeblair | nope, didn't know about a limit :) | 23:26 |
SpamapS | but .. memory usage would obviously get pretty hairy | 23:26 |
SpamapS | it's just an implementation specific limit | 23:26 |
SpamapS | the protocol allows a 32-bit unsigned int for packet length.. that's the only theoretical limit. | 23:26 |
jeblair | SpamapS: but a job can send multiple work_data packets, so we could stream reasonable chunks at a time. | 23:26 |
SpamapS | that's a good point, those are handled one at a time for that reason | 23:26 |
SpamapS | the original usage of that was resizing images on livejournal | 23:27 |
jeblair | (we might even want to go ahead and do that just to avoid starvation) | 23:27 |
SpamapS | yeah I could see just pushing a merge job with a flag that says "get some git objects from work data first" | 23:28 |
SpamapS | err no | 23:28 |
SpamapS | clients don't send work data (doh) | 23:28 |
jeblair | oh good point | 23:28 |
SpamapS | feeling more and more like configuring apache to fetch from mergers is simpler ;) | 23:29 |
jeblair | so if we do merger -> scheduler -> executor, we can only chuck from merger -> scheduler, not from scheduler-> executor.... | 23:29 |
jeblair | *but* | 23:29 |
jeblair | if we have the scheduler ask the merger for data directly (via gearman), that could stream in chunks, and, incidentally, avoid the scheduler seeing a lot of data it doesn't care about. | 23:30 |
SpamapS | if we had a coalescing gearmand we could go in reverse | 23:31 |
jeblair | scheduler -> [merger:merger], returns {merger: merger-name; objects: ...}. then executor -> [merger-name:fetch-objects, objects:...] would be able to stream chunks from the merger to the executor directly (at the cost of requiring the merger to stay online for the duration of the buildset) | 23:32 |
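[Editor's aside: a hedged sketch of the merger side of that scheme, assuming the gear library's worker API (sendWorkData / sendWorkComplete); the queue name and chunk size are made up for illustration.]

```python
import gear

CHUNK = 1024 * 1024  # stream in ~1MB WORK_DATA packets, per the above

worker = gear.Worker('merger-name')
worker.addServer('127.0.0.1')
worker.registerFunction('merger-name:fetch-objects')
worker.waitForServer()

while True:
    job = worker.getJob()           # an executor asking for objects
    pack_data = b'...'              # pack built as in the sketch above
    for i in range(0, len(pack_data), CHUNK):
        job.sendWorkData(pack_data[i:i + CHUNK])  # one WORK_DATA packet
    job.sendWorkComplete()
```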
SpamapS | this feels like a very roundabout way to get around mergers simply serving their completed merges, doesn't it? | 23:33 |
SpamapS | like, not needing tests nodes to contact mergers is a big win. Not needing executors to do the same seems less important. | 23:33 |
jeblair | SpamapS: yes, fetch over http or ssh would be much simpler, but complicates the operational deployment. | 23:33 |
jeblair | SpamapS: agreed, internal access isn't as significant as external access. | 23:34 |
jeblair | SpamapS: how does coalescing help here? | 23:34 |
SpamapS | you know I think it could be done w/o coalesce in this instance | 23:36 |
SpamapS | but basically flip the roles | 23:37 |
SpamapS | have the executors submit a job to a queue name in response to receiving a merge request, and then basically let the mergers serve it like cache requests. | 23:37 |
SpamapS | "do I have these objects already", no, do the merge.. send packfile. | 23:38 |
SpamapS | the coalesce would just be good for avoiding repeating merges | 23:39 |
SpamapS | since if you had several executors request the same objects they could coalesce on a hash of that request and only one merger would do the merge | 23:39 |
jeblair | SpamapS: oh, yeah, i think this is only a win if we only perform the merge once | 23:40 |
SpamapS | but if you want the merges done beforehand, stuffing the result in somewhere addressable and fetching it from there seems like the win. | 23:40 |
jeblair | which is why i was thinking one option is to have the executors ask the merger which performed it for the objects | 23:40 |
SpamapS | I guess you could have a gearman queue per merger and attach the merger name that did the initial merge to the job that the executors get? | 23:40 |
jeblair | yeah that ^ | 23:41 |
SpamapS | That would achieve 1-merge-per-change and would avoid new network pathways | 23:41 |
jeblair | still sends lots of data through gearman, but it's the shortest path version of that. | 23:41 |
SpamapS | and you wouldn't need coalescing to make it efficient since even if multiple executors submitted they'd just get served rapidly in succession | 23:42 |
jeblair | (both sides can be efficient about how they handle the data since it doesn't require intermediate storage) | 23:42 |
jeblair | yep | 23:42 |
SpamapS | (though coalescing _would_ make it more efficient when you get to lots and lots of executors) | 23:42 |
jeblair | if we go with that, we really should look into whether we can switch to C gearmand though. i'm not worried about gear clients bottlenecking here, but this may exceed what we can do with geard. | 23:43 |
SpamapS | I don't think gearman handling lots of data is much of a problem.. it's stateless so the only waste is network bandwidth and a little buffer churn | 23:43 |
SpamapS | And yeah, C gearmand gets you multi-threaded performance. | 23:43 |
jeblair | i feel fairly confident that if there are any remaining problems with gearmand for zuul, we can fix them. :) | 23:44 |
SpamapS | perhaps | 23:44 |
* SpamapS eyes rustygear longingly, but shuns it again | 23:45 | |
SpamapS | so basically we just need 'git pack over gearman' ;-) | 23:45 |
SpamapS | I will point out that whenever users on the ML ask me for solutions like this, I recommend that they just serve that data over http ;) | 23:46 |
jeblair | SpamapS: here's what i have so far (up to the point where we would git-pack): https://etherpad.openstack.org/p/v5FIUHK4hQ | 23:49 |
SpamapS | jeblair: seems legit :) | 23:58 |