Tuesday, 2021-11-23

-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: 00:43
- [zuul/zuul] 818385: Fix job data attribute serialization https://review.opendev.org/c/zuul/zuul/+/818385
- [zuul/zuul] 818360: WIP: Load job from pipeline state on executors https://review.opendev.org/c/zuul/zuul/+/818360
-@gerrit:opendev.org- Felix Edel proposed on behalf of Simon Westphahl: [zuul/zuul] 815278: DNM: execute tests with two schedulers https://review.opendev.org/c/zuul/zuul/+/815278 08:02
-@gerrit:opendev.org- Felix Edel proposed: 08:02
- [zuul/zuul] 818205: WIP: Add source attribute to GitConnection https://review.opendev.org/c/zuul/zuul/+/818205
- [zuul/zuul] 818862: Only use a single createScheduler() helper method in tests https://review.opendev.org/c/zuul/zuul/+/818862
- [zuul/zuul] 818863: Limit scheduler_count to 1 for Scale out Scheduler tests https://review.opendev.org/c/zuul/zuul/+/818863
- [zuul/zuul] 818864: Limit scheduler_count to 1 for broken tenant config tests https://review.opendev.org/c/zuul/zuul/+/818864
-@gerrit:opendev.org- Felix Edel proposed on behalf of Simon Westphahl: [zuul/zuul] 815278: DNM: execute tests with two schedulers https://review.opendev.org/c/zuul/zuul/+/815278 09:15
-@gerrit:opendev.org- Felix Edel proposed: 09:15
- [zuul/zuul] 818205: WIP: Add source attribute to GitConnection https://review.opendev.org/c/zuul/zuul/+/818205
- [zuul/zuul] 818867: Don't use RecordingMergeClient.history in TestNonLiveMerges https://review.opendev.org/c/zuul/zuul/+/818867
- [zuul/zuul] 818868: Combine different history approaches for merge jobs in tests https://review.opendev.org/c/zuul/zuul/+/818868
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 810699: Web UI: Show pipeline types as icons https://review.opendev.org/c/zuul/zuul/+/810699 11:35
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 810699: Web UI: Show pipeline types as icons https://review.opendev.org/c/zuul/zuul/+/810699 11:36
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 781858: web UI: allow a privileged user to promote a change https://review.opendev.org/c/zuul/zuul/+/781858 11:36
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 802559: Web UI: Add "Create Autohold Request" form, improve API error messages https://review.opendev.org/c/zuul/zuul/+/802559 11:37
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 818295: web UI: Add a credentials renew modal https://review.opendev.org/c/zuul/zuul/+/818295 11:37
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 818295: web UI: Add a credentials renew modal https://review.opendev.org/c/zuul/zuul/+/818295 11:43
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 769943: Example Docker compose: keycloak integration https://review.opendev.org/c/zuul/zuul/+/769943 13:19
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] 769943: Example Docker compose: keycloak integration https://review.opendev.org/c/zuul/zuul/+/769943 15:14
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 818385: Fix job data attribute serialization https://review.opendev.org/c/zuul/zuul/+/818385 17:06
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-sphinx] 819006: Set project vars to match Zuul https://review.opendev.org/c/zuul/zuul-sphinx/+/819006 18:08
@jim:acmegating.com zuul-maint: can you speedy review that ^ -- i'll re-enqueue the release jobs when it merges 18:09
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-sphinx] 819006: Set project vars to match Zuul https://review.opendev.org/c/zuul/zuul-sphinx/+/819006 19:06
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 809300: Add a zookeeper map to developer docs https://review.opendev.org/c/zuul/zuul/+/809300 19:20
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 809300: Add a zookeeper map to developer docs https://review.opendev.org/c/zuul/zuul/+/809300 21:02
@fungicide:matrix.org i've been watching the system resource graphs for opendev's zk servers, mainly to confirm we don't have any obvious (slow) memory leaks, and things are looking great. i'm confused by one thing though... two of the zk servers show significant amounts of outbound network traffic, but one has almost no outbound traffic (all have roughly the same inbound rates). is there something about how zk works which would cause that? or is it that each of our two running schedulers has each picked a different zk cluster member to query and the third one is simply unloved? 21:09
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 819023: Add a scale-out-scheduler smoke test https://review.opendev.org/c/zuul/zuul/+/819023 21:13
@jim:acmegating.com fungi: yes, that's plausible, especially if there's a zk rollover event; clients can end up connected to 2 servers when that's all that's running and they may stay stuck to them 21:14
@clarkb:matrix.org fungi: you can ask the zookeepers for client connection info. That is likely the cause though. Unbalanced connections 21:14
@fungicide:matrix.org also, if it's the latter, is there some anti-affinity mechanism in play to make sure the connections are distributed to different cluster members? if not, i'm a little worried that we're likely to see significant packet loss if both schedulers happen to decide they prefer the same cluster member, as we could end up running afoul of rate limits in our service provider 21:15
@clarkb:matrix.org There is no anti affinity. Just random connection sorting iirc 21:15
@jim:acmegating.com fungi: wow, is the traffic that high? 21:15
@fungicide:matrix.org hovering around 40-50Mbps outbound from each of two zk cluster members, looks like 21:16
@jim:acmegating.com fungi: on this graph you can see the zk watches for all 3 servers.  the watches are set by clients, so that tells us that right now, zk06 has no clients: https://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=38&orgId=1 21:16
@fungicide:matrix.org if those flavors are capped at 100Mbps (and i think the provider likes to add the inbound and outbound together for their bandwidth limits?) things could get a little ugly 21:16
@jim:acmegating.com fungi: maybe we should switch those to internal network 21:17
@fungicide:matrix.org neat, should we be worried that zk06 has no clients? does that imply a problem with the server? 21:17
@clarkb:matrix.org I recall limits being much higher than 100mbps though 21:18
@fungicide:matrix.org and yeah, i wondered if we could do the "backend" network there as a workaround 21:18
@clarkb:matrix.org Closer to gigabit ranges iirc 21:18
@jim:acmegating.com fungi: no, that's likely just due to zk rollover.  this is pretty typical; the only way to get them to level out would be a hard restart of zuul. 21:18
@jim:acmegating.com well, actually, i guess a rolling restart would do it :) 21:18
@jim:acmegating.com but any network hiccup and you're right back here, so probably not worth getting on that treadmill 21:19
@fungicide:matrix.org Clark: yeah, probably the bandwidth cap is higher unless we chose a fairly small flavor for the servers 21:19
@clarkb:matrix.org Also if they get throttled and close connections it should rebalance automatically 21:19
@clarkb:matrix.org Since reconnects are to a random choice iirc 21:19
@fungicide:matrix.org oh, that's not so bad then 21:20
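[Editor's note] A minimal sketch of the "random choice" point above, assuming each client picks a cluster member uniformly at random and independently of every other client. That selection model is an assumption for illustration, not a confirmed description of the ZooKeeper client library, and the function name is hypothetical. It enumerates every possible assignment and reports how often all clients land on the same server.

```python
from fractions import Fraction
from itertools import product


def same_server_probability(num_servers: int, num_clients: int) -> Fraction:
    """Exact probability that every client ends up on the same server.

    Assumption: each client chooses uniformly at random among the
    servers, independently of the other clients.
    """
    # Enumerate every possible (client -> server) assignment.
    outcomes = list(product(range(num_servers), repeat=num_clients))
    # An outcome is a "pile-up" when only one distinct server was chosen.
    pile_ups = sum(1 for picks in outcomes if len(set(picks)) == 1)
    return Fraction(pile_ups, len(outcomes))


# Two schedulers, three cluster members (the setup discussed above).
print(same_server_probability(3, 2))  # 1/3
```

Under that assumption, two schedulers would share a cluster member about one time in three after any reconnect, which is consistent with the point that a throttled connection would likely land somewhere else when it reconnects.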
@fungicide:matrix.org looks like the zk cluster members are a 4gb ram flavor... i'll try to remember how to look up that provider's flavor-specific bandwidth limits 21:21
@fungicide:matrix.org `openstack flavor show` says those zk servers have a rxtx_factor of 800.0 which i assume means we get packets dropped when inbound and outbound together exceed 800Mbps, so not in danger of that i don't think 21:32
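[Editor's note] Earlier in this discussion it was suggested that the ZooKeeper servers can be asked for client connection info. A rough sketch of one way to do that uses ZooKeeper's `cons` four-letter command over a plain TCP socket. Several things here are assumptions: the command must be allowed in the server's four-letter-word whitelist, the client port must accept unencrypted connections (a TLS-only port would need an `ssl` wrapper), the sample reply below is made up to resemble `cons` output rather than captured from a real server, and the function names are hypothetical.

```python
import socket


def four_letter_word(host: str, port: int = 2181, command: str = "cons",
                     timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        # The server closes the connection once the reply is complete.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def count_sessions(cons_output: str) -> int:
    """Count connection lines that carry a session id.

    Assumption: each connection is listed on a line starting with "/",
    and only real client sessions include a "sid=" field, so the
    short-lived probe connection itself is not counted.
    """
    return sum(
        1
        for line in cons_output.splitlines()
        if line.strip().startswith("/") and "sid=" in line
    )


# Made-up reply shaped like `cons` output (addresses are documentation IPs).
SAMPLE = """\
 /203.0.113.10:41234[1](queued=0,recved=120,sent=121,sid=0x100000000000001,lop=PING)
 /203.0.113.11:41240[1](queued=0,recved=98,sent=98,sid=0x100000000000002,lop=GETD)
 /127.0.0.1:50000[0](queued=0,recved=1,sent=0)
"""

print(count_sessions(SAMPLE))  # 2
```

Against a live cluster, the idea would be to call `count_sessions(four_letter_word(host))` for each cluster member and compare the totals; a member reporting zero sessions would match the "no clients" reading of the watches graph.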
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: [zuul/zuul] 818360: Load job from pipeline state on executors https://review.opendev.org/c/zuul/zuul/+/818360 23:16
