-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 906596: Fix duplicate fd registration in nodescan https://review.opendev.org/c/zuul/nodepool/+/906596 | 00:34 | |
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [zuul/zuul] 907149: Replace mock_kinesis with mock_aws https://review.opendev.org/c/zuul/zuul/+/907149 | 04:08 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 905465: Client (old): don't translate null to 0000000 https://review.opendev.org/c/zuul/zuul/+/905465 | 09:04 | |
-@gerrit:opendev.org- Simon Westphahl proposed: | 13:02 | |
- [zuul/nodepool] 906815: Refactor config loading in builder and launcher https://review.opendev.org/c/zuul/nodepool/+/906815 | ||
- [zuul/nodepool] 907201: Improve logging around manual/scheduled image builds https://review.opendev.org/c/zuul/nodepool/+/907201 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 907210: Add backoff delay only after successful job start https://review.opendev.org/c/zuul/zuul/+/907210 | 14:17 | |
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 907210: Add backoff delay only after successful job start https://review.opendev.org/c/zuul/zuul/+/907210 | 14:32 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 907109: Allow custom k8s pod specs https://review.opendev.org/c/zuul/nodepool/+/907109 | 15:01 | |
@raaar:matrix.org | Hi folks, does anyone have any negative experience with squash-merging? We enabled it some time ago across several projects we use with Zuul, and have recently discovered that under relatively moderate load (20-30 concurrent jobs per executor) it takes a very long time (up to a few hours) for an executor to prepare a workspace for a build. Most of this slowdown appears to come from the `repo.index.commit` call in the `squash_merge` method. Has anyone seen anything similar and could perhaps suggest a way to solve this? | 17:25 |
---|---|---|
@clarkb:matrix.org | raaar: how many mergers are you running? It is possible you need more to process the resulting git tasks. | 17:28 |
@raaar:matrix.org | We have 5 mergers and 5 executors (with merge_jobs enabled), so 10 in total. Whenever the squash_merge is done by a merger, it takes virtually no time... but on executor pods it's extremely slow... FWIW, we run those on shared k8s nodes with 32 cores | 17:30 |
@clarkb:matrix.org | are the executors all adjacent to each other on the same shared node(s) and the mergers elsewhere? could be a problem of contention (cpu, memory and disk io are all important for merges iirc). Determining why one is slow and not the other is probably helpful in finding a solution | 17:32 |
@raaar:matrix.org | Just double-checked: we have 5 nodes (out of 6 total) that each run an executor instance; in addition, they might run a merger, a launcher, a ZooKeeper, or a web shard. So a merger and an executor might share the same node, yet perform vastly differently. Also, we don't see any clear evidence of resource contention. | 17:42 |
@clarkb:matrix.org | I think the merge logs do record what actions are being taken (you might need to set log level to debug). That may have clues? | 17:59 |
@raaar:matrix.org | clarkb: Well, we added some print-debugging around https://opendev.org/zuul/zuul/src/branch/master/zuul/merger/merger.py#L678 and all we see is that the call to `repo.index.commit` takes tens of minutes. We'll also try enabling logging in GitPython to see what is actually going on there... | 18:26 |
IIUC, one major difference between executors and mergers is that the former are heavily multi-threaded and run multiple merge operations concurrently (one for each AnsibleJob they start), while the latter do a single merge at a time... I wonder if that could explain our observations? | ||
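The print-debugging described above could look roughly like the following sketch. It is not the code raaar used: `timed` and `fake_commit` are hypothetical names introduced here, and the sleep only stands in for the real `repo.index.commit` call. Logging the thread name alongside the duration may help show whether slow commits coincide with many merges running at once.

```python
import logging
import threading
import time
from contextlib import contextmanager

log = logging.getLogger("merge-timing")  # hypothetical logger name


@contextmanager
def timed(label, slow_after=1.0):
    """Time the wrapped block and log it, tagged with the current thread.

    Yields a dict whose "elapsed" key is filled in when the block exits.
    """
    record = {"label": label, "elapsed": None}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["elapsed"] = time.perf_counter() - start
        # Escalate to WARNING when the block exceeded the threshold.
        level = logging.WARNING if record["elapsed"] >= slow_after else logging.DEBUG
        log.log(
            level,
            "%s took %.3fs in thread %s",
            label,
            record["elapsed"],
            threading.current_thread().name,
        )


def fake_commit():
    """Stand-in for the real commit call (assumption, for illustration only)."""
    time.sleep(0.05)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    with timed("repo.index.commit", slow_after=0.01) as rec:
        fake_commit()
```

In a real deployment the `with timed(...)` block would wrap the actual commit call, and the threshold would be set well above a normal commit time so that only the outliers are logged at WARNING.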
@jim:acmegating.com | there is a mixture of threads and processes in the executor; most merge operations happen in their own process | 18:30 |
@clarkb:matrix.org | But merges occur one at a time per merger | 18:32 |
@clarkb:matrix.org | Pretty sure the run loop in mergers can't handle more than one at a time. But then ya they fork out to git for that | 18:32 |
@clarkb:matrix.org | looking at the squash merge code it does a merge first then a commit. I would've expected the merge to be the slow part then the commit is mostly bookkeeping. But maybe the bookkeeping actually causes the other work to happen? | 18:38 |
@clarkb:matrix.org | one crazy thing to try may be a tmpfs for those directories to try and rule out disk io problems | 18:39 |
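Before moving anything onto a tmpfs, a crude comparison of small-file write latency between two directories can give a first hint about disk I/O. The sketch below is only a rough proxy for git's workload, not a measurement of it. `time_small_writes` is a hypothetical helper, and the candidate paths (`/dev/shm` and the system temp directory) are assumptions to be replaced with the executor's actual working directory.

```python
import os
import tempfile
import time


def time_small_writes(directory, count=200, size=4096):
    """Write and fsync `count` files of `size` bytes under `directory`.

    Returns the elapsed seconds for the writes only; the scratch
    directory is removed afterwards and is not part of the timing.
    """
    payload = os.urandom(size)
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(scratch, f"obj-{i}")
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # force the data out of the page cache
        elapsed = time.perf_counter() - start
    return elapsed


if __name__ == "__main__":
    # Assumed candidate locations; adjust to the directories actually in use.
    candidates = ["/dev/shm", tempfile.gettempdir()]
    for directory in candidates:
        if os.path.isdir(directory) and os.access(directory, os.W_OK):
            seconds = time_small_writes(directory)
            print(f"{directory}: {seconds:.3f}s for 200 fsync'd 4 KiB writes")
        else:
            print(f"{directory}: skipped (missing or not writable)")
```

If the disk-backed directory is not dramatically slower than the memory-backed one under load, disk I/O is less likely to be the cause and attention can shift to CPU or memory contention.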
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 906320: WIP Finish circular dependency refactor https://review.opendev.org/c/zuul/zuul/+/906320 | 18:46 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 906320: WIP Finish circular dependency refactor https://review.opendev.org/c/zuul/zuul/+/906320 | 18:47 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 906320: WIP Finish circular dependency refactor https://review.opendev.org/c/zuul/zuul/+/906320 | 19:27 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 907109: Allow custom k8s pod specs https://review.opendev.org/c/zuul/nodepool/+/907109 | 20:40 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 907109: Allow custom k8s pod specs https://review.opendev.org/c/zuul/nodepool/+/907109 | 20:43 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 907109: Allow custom k8s pod specs https://review.opendev.org/c/zuul/nodepool/+/907109 | 20:46 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 907255: Simplify k8s pod sleep https://review.opendev.org/c/zuul/nodepool/+/907255 | 20:52 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 22:34 | |
- [zuul/zuul] 906320: WIP Finish circular dependency refactor https://review.opendev.org/c/zuul/zuul/+/906320 | ||
- [zuul/zuul] 907256: Add --keep-config-cache option to delete-state command https://review.opendev.org/c/zuul/zuul/+/907256 | ||
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 905275: Optimize db prune query https://review.opendev.org/c/zuul/zuul/+/905275 | 22:39 |