Tuesday, 2021-10-05

-@gerrit:opendev.org- Joshua Watt proposed: [zuul/zuul-operator] Update spec if image version changes https://review.opendev.org/c/zuul/zuul-operator/+/81123502:24
-@gerrit:opendev.org- Simon Westphahl proposed:07:12
- [zuul/zuul] Simplified attribute API for ZKObjects https://review.opendev.org/c/zuul/zuul/+/809532
- [zuul/zuul] Make QueueItem a Zookeeper object https://review.opendev.org/c/zuul/zuul/+/809414
- [zuul/zuul] Store pipeline state in Zookeeper https://review.opendev.org/c/zuul/zuul/+/810658
- [zuul/zuul] Store change queues in Zookeeper https://review.opendev.org/c/zuul/zuul/+/810920
- [zuul/zuul] Save and restore bundle with item in Zookeeper https://review.opendev.org/c/zuul/zuul/+/811422
- [zuul/zuul] Pass ZK context to deserialize method of ZKObjects https://review.opendev.org/c/zuul/zuul/+/811955
- [zuul/zuul] Move ZuulMark from configloader to model https://review.opendev.org/c/zuul/zuul/+/812450
- [zuul/zuul] Recursively delete all sub-nodes of ZKObjects https://review.opendev.org/c/zuul/zuul/+/812451
- [zuul/zuul] wip: Store build sets in Zookeeper https://review.opendev.org/c/zuul/zuul/+/812452
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com:07:12
- [zuul/zuul] Add an API for ZK-backed objects https://review.opendev.org/c/zuul/zuul/+/809293
- [zuul/zuul] Temporarily enqueue cycle changes for reporting https://review.opendev.org/c/zuul/zuul/+/810328
- [zuul/zuul] Dequeue items after we're done with them https://review.opendev.org/c/zuul/zuul/+/811244
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] wip: Store build sets in Zookeeper https://review.opendev.org/c/zuul/zuul/+/81245207:35
-@gerrit:opendev.org- Simon Westphahl proposed:09:16
- [zuul/zuul] Move ZuulMark from configloader to model https://review.opendev.org/c/zuul/zuul/+/812450
- [zuul/zuul] Recursively delete all sub-nodes of ZKObjects https://review.opendev.org/c/zuul/zuul/+/812451
- [zuul/zuul] wip: Store build sets in Zookeeper https://review.opendev.org/c/zuul/zuul/+/812452
- [zuul/zuul] Only retry ZK operations for Kazoo exceptions https://review.opendev.org/c/zuul/zuul/+/812466
- [zuul/zuul] wip: Add support for sharded ZKObjects https://review.opendev.org/c/zuul/zuul/+/812467
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of Felix Edel: [zuul/zuul] Make reporting asynchronous https://review.opendev.org/c/zuul/zuul/+/69125312:06
-@gerrit:opendev.org- daniel.pawlik https://matrix.to/#/@dpawlik:matrix.org proposed on behalf of Jiri Podivin: [zuul/zuul-jobs] DNM https://review.opendev.org/c/zuul/zuul-jobs/+/80703112:13
-@gerrit:opendev.org- Simon Westphahl proposed:12:17
- [zuul/zuul] wip: Store build sets in Zookeeper https://review.opendev.org/c/zuul/zuul/+/812452
- [zuul/zuul] wip: Add support for sharded ZKObjects https://review.opendev.org/c/zuul/zuul/+/812467
-@gerrit:opendev.org- Simon Westphahl proposed:12:40
- [zuul/zuul] Store build sets in Zookeeper https://review.opendev.org/c/zuul/zuul/+/812452
- [zuul/zuul] wip: Add support for sharded ZKObjects https://review.opendev.org/c/zuul/zuul/+/812467
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley: [zuul/zuul-website] Update the OpenInfra Community CoC URL https://review.opendev.org/c/zuul/zuul-website/+/81046912:47
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of Felix Edel: [zuul/zuul] Make reporting asynchronous https://review.opendev.org/c/zuul/zuul/+/69125313:57
@jpew:matrix.orgzuul (latest as of yesterday) doesn't seem to be picking up job changes when I force push a branch.... am I doing something wrong?15:47
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] UI: harmonize build color codes https://review.opendev.org/c/zuul/zuul/+/81252315:49
@clarkb:matrix.orgjpew: I would start by looking for the event that zuul should trigger off of in your scheduler logs. With gerrit that would be ref-updated. Not sure what driver(s) you are using16:00
@clarkb:matrix.orgjpew: but that should give you an event id in the logs that you can then grep for and see all (really most I guess) related log entries associated to that event16:00
@mhuin:matrix.orgHello zuul-maint, could I get reviews on https://review.opendev.org/c/zuul/zuul/+/735586 and https://review.opendev.org/c/zuul/zuul/+/736968 ? These are prerequisites on zuul-web for the Admin GUI changes16:10
@gtema:matrix.orgcan I please also ask zuul-maint for giving a look at tiny https://review.opendev.org/c/zuul/zuul/+/806230 and https://review.opendev.org/c/zuul/zuul/+/806231 which help with gitlab driver?16:26
@jpew:matrix.orgClark: Thanks. Turns out it was k8s being unhappy because I restarted the node that was running the service without cordoning it16:30
@jpew:matrix.orgSorry for the noise16:30
@clarkb:matrix.orggtema: for https://review.opendev.org/c/zuul/zuul/+/806231 should the zuul mergers be updated to support a squash method too (or maybe they already do)? I ask because zuul intends to test what the end result is and if we are out of sync on the final merge process the results could differe16:32
@clarkb:matrix.org * gtema: for https://review.opendev.org/c/zuul/zuul/+/806231 should the zuul mergers be updated to support a squash method too (or maybe they already do)? I ask because zuul intends to test what the end result is and if we are out of sync on the final merge process the results could differ16:32
@clarkb:matrix.orgoh I see it is already done on the merger side and this is just enabling it in the request to the remote? I guess if the remote forces squash it doesn't imply it when you request a merge16:33
@gtema:matrix.orghm, this is currently implemented same way like in github driver - passing API param to the API16:33
@clarkb:matrix.orgya I think this is just an oddity of their APIs16:34
@gtema:matrix.org> <@clarkb:matrix.org> oh I see it is already done on the merger side and this is just enabling it in the request to the remote? I guess if the remote forces squash it doesn't imply it when you request a merge16:34
correct - if remote requires squash we can only push raising the flag
@gtema:matrix.orgthanks Clark   16:39
@jim:acmegating.comgtema: do you know where the docs are for the api used in 806230?16:40
@jim:acmegating.comgtema: is it https://docs.gitlab.com/ee/api/merge_requests.html#accept-mr ?16:41
@gtema:matrix.orgyes, it is16:41
@gtema:matrix.orghere I was exactly experiencing that with enabled squash forcing it is returning 200 but with error message16:42
@jim:acmegating.comgtema: yeah, that seems poorly documented, but i trust your report of emperically tested behavior :)16:43
@gtema:matrix.orgyupp - precisely bugfixing from real deployment16:43
@clarkb:matrix.orgcorvus: is https://zuul.opendev.org/t/zuul/build/32b44f9f49bb4773b090281ab5406823 a known failure mode for quickstart?16:52
@jim:acmegating.comgtema: have you seen any http request timeout issues with gitlab?  i fixed some recently (with jpew's help), but i'm still seeing some.  i suspect it's network topology but wondered what your experience has been (jpew you too)16:53
@gtema:matrix.orgnope, so far not16:53
@gtema:matrix.orgwe have dedicated gitlab, thus maybe not really affected16:53
@jim:acmegating.com(oh yeah i'm also dealing with gitlab-ee should have specified)16:54
@jim:acmegating.comgtema: that's good to know thanks.16:55
@gtema:matrix.orgglad to help16:55
@jim:acmegating.comClark: oO  https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_32b/812161/1/gate/zuul-quick-start/32b44f9/container_logs/executor.log16:55
@jim:acmegating.comClark: good news: the cert file read check i added works and successfully produces an error.  bad news is there was an error.16:56
@jim:acmegating.comClark: maybe we can race creating certs16:56
@jim:acmegating.comClark: we may need to add something to wait for that file to appear (like we do for the zookeeper container)17:02
@clarkb:matrix.orgInteresting. I thought we already ran things in order though?17:04
@clarkb:matrix.orgmaybe a file sync issue? I guess enforcing order can't hurt17:05
@clarkb:matrix.orgI'll recheck the change too17:05
@jim:acmegating.comClark: ordering in docker-compose is very weak.  it's just container start (not ready)17:08
@clarkb:matrix.orgoh right17:08
@clarkb:matrix.orgcorvus: would it work here to just have the executor startup retry?17:12
@clarkb:matrix.orgI guess that isn't very clean17:13
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] Executor wait for certs in quickstart docker-compose.yaml https://review.opendev.org/c/zuul/zuul/+/81253117:26
@clarkb:matrix.orgSomething like ^ maybe. I think we probably need to evaluate that for the other zuul services in the docker-compose.yaml file for quickstart too17:26
@jim:acmegating.comClark: we could do that.  we do the same for connecting.  i think "local file is not present" is normally a bit more of a permanent error, but it sure would make the docker-compose neater17:26
@jim:acmegating.comyeah, should basically be every zuul/nodepool service17:27
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] Quickstart wait for certs in quickstart docker-compose.yaml https://review.opendev.org/c/zuul/zuul/+/81253117:56
@clarkb:matrix.orgcorvus: ^ updated to do all the services that should need it17:56
@jim:acmegating.comClark: ++17:56
-@gerrit:opendev.org- Zuul merged on behalf of Artem Goncharov: [zuul/zuul] Fix failed merge detection in gitlab https://review.opendev.org/c/zuul/zuul/+/80623018:07
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] Handle no file commits in the Gerrit driver https://review.opendev.org/c/zuul/zuul/+/81216119:35
@fungicide:matrix.orgcorvus: are you still thinking about doing zuul 4.10.0 today? trying to put together the last minute tidbits for this month's openinfra newsletter, and don't want to say it's released if we need to delay tagging it20:14
@jim:acmegating.comfungi: yes, just wasn't in a hurry; should be able to get to it within a couple hours20:15
@fungicide:matrix.orgcorvus: thanks! no rush, i'll keep it in. i've already let them know the url to the release announcement is a placeholder and will get up with them on the correct one once it's out20:16
@jim:acmegating.comthx, will do as soon as i finish current task20:16
@clarkb:matrix.orgcorvus: any idea if zuul recheck comments will work with the patchset level comments in Gerrit 3.3? This came up in some context I don't remember20:20
@clarkb:matrix.orgWe can disable that in 3.3 if necessary but not 3.4 so zuul should support it anyway20:20
@jim:acmegating.comClark: i do not know20:21
@clarkb:matrix.orgianw: ^ fyi I think we should look into that really quickly pre upgrade.20:23
@clarkb:matrix.orghrm looking into this I really don't like that gerrit made this change20:30
@clarkb:matrix.orgIt seems that there can be any number of patchset level comments and they don't really distinguish them?20:31
@jim:acmegating.comdon't distinguish how?20:32
@clarkb:matrix.orgWell the current top level comment is the top level comment in the event field. But the new thing gives you a list of dicts under the event's patchsets entry20:33
@clarkb:matrix.orgI guess they may use a magic file name to specify this is the top level comment20:33
@jim:acmegating.comyes it's /PATCHSET_COMMENTS or something like that20:34
@jim:acmegating.com(no real file in git starts with / so they own the namespace)20:34
@clarkb:matrix.orgit also feels like they are reverting to patchset focused change review rather than change focused change review which is a back and forth that has happend before20:34
@jim:acmegating.comClark: actual name /PATCHSET_LEVEL20:34
@clarkb:matrix.orgthe polygerrit ui fixed that problem with the new old ui20:34
@clarkb:matrix.orgcorvus: how did you find that?20:35
@fungicide:matrix.orgyeah, there was a recent change in gertty to handle those, i think?20:35
@jim:acmegating.comClark: i looked at what gertty is using :)20:35
@jim:acmegating.coms/using/expecting from gerrit/ to be clear20:35
@clarkb:matrix.orgha ok20:35
@clarkb:matrix.orgThat was a clue to find https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/gerrit/gerritconnection.py#L239-L24520:36
@clarkb:matrix.organd with that https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/gerrit/gerritmodel.py#L431-L443 I think zuul does supportthis already 20:37
@jim:acmegating.comClark: agreed20:37
@fungicide:matrix.orgin theory, gerrit's own gerrit has been including them for a while20:38
@jim:acmegating.comthanks guillaumec for I5cb84fca5062a129ed461934bb564365371ecc54 :)20:38
@jim:acmegating.comfungi: yes, but gerrit's own zuul doesn't use them (in favor of the checks plugin)20:38
@fungicide:matrix.orgd'oh! right20:39
@jim:acmegating.compushed zuul 4.10.020:45
-@gerrit:opendev.org- Zuul merged on behalf of Artem Goncharov: [zuul/zuul] Add support for gitlab squash merge https://review.opendev.org/c/zuul/zuul/+/80623121:12
@clarkb:matrix.orgcorvus: https://zuul.opendev.org/t/zuul/build/657df90dd57c45c1a4c8ed4ee63b59b0 pypi packaging failed :/21:15
@clarkb:matrix.orgnot sure how important that is anymore, but it was the silly yarn issue we've seen periodically21:15
@clarkb:matrix.orgnote that happened on rax too so not an inmotion thing. THe mystery around that deepens21:16
@jim:acmegating.comi guess we should retrigger that?21:16
@clarkb:matrix.orgcorvus: ya I think so. The docker images might rebuild but not much hard in that21:17
@clarkb:matrix.orgI've got to pop out for a school run but can help with that more later if necessary21:17
@jim:acmegating.comi have re-enqueued21:17
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] Quickstart wait for certs in quickstart docker-compose.yaml https://review.opendev.org/c/zuul/zuul/+/81253121:29
@clarkb:matrix.orgcorvus: https://zuul.opendev.org/t/zuul/build/87a17ffae554465ab45abcd0b93d4a94 it succeeded on the retry21:59
@jim:acmegating.com\o/22:03
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Add TCP keepalive to gitlab https://review.opendev.org/c/zuul/zuul/+/81261922:14
@jim:acmegating.comgtema: ^ found the issue.  Clark, fungi: ^ i'm interested in whether or not you think that's appropriate for zuul22:15
@clarkb:matrix.orgcorvus: it might be better to disable http keepalives on the zuul client side?22:16
@clarkb:matrix.orgthen connections shouldn't be reused as often?22:16
@jim:acmegating.comClark: i think under linux they are either disabled or have very high timeout values (hours) by default, whereas most nats we find in clouds drop their mapping in minutes22:17
@jim:acmegating.comClark: so that enables and sets a low timeout in order to either keep the connection alive via the keepalive packet itself or find the timout sooner22:18
@clarkb:matrix.orgyup but http keepalive being disabled should force every request to use a new connection?22:18
@jim:acmegating.comtcp keepalive22:18
@jim:acmegating.comClark: you're saying don't set "connection: keep-alive" in the http headerL22:18
@jim:acmegating.com?22:18
@clarkb:matrix.orgyes22:19
@jim:acmegating.comi think that's usually set on the server side.  is there a client header for that?22:19
@clarkb:matrix.orgpython requests sessions do this by default and iirc not doing that can solve problems like this22:19
@clarkb:matrix.orghttps://2.python-requests.org/en/master/user/advanced/#keep-alive unfortunately that doesn't explain disabling it22:19
@clarkb:matrix.orgthis may not be an option22:19
@jim:acmegating.comwell, urllib3 observes that HTTP/1.1 connections are keep-alive by default22:20
@jim:acmegating.comClark: if we don't like tcp keepalives for this, then i think the thing to do would be to disable caching in requests.  i think that can be done.22:20
@jim:acmegating.combut i started with this approach because i figured caching+keepalive might be better for everyone?22:20
@clarkb:matrix.orgthe problem with relying on tcp keepalives aiui is that the host kernel may not respect them. Though chances of our users running into that is also probably low22:22
@jim:acmegating.com(to disable caching, i think we would make a pool with a max of 0 requests)22:22
@clarkb:matrix.orgfwiw I'm not against this it just always feels a little weird to make this change at an application level. But that might also be the best location for users22:23
@jim:acmegating.comyeah, me too, which is why i brought it up.  there are definitely other options, but given that this specific situation is coming up in a "gitlab running in a public cloud" environment, it does feel like something that's not terribly unlikely for other folks to hit.  and we know from our own experiences that this sort of thing happens all the time in clouds.  so it seems maybe not inappropriate for us to say we have some application domain knowledge that says that users setting up a zuul connection to a code review system probably have a low tolerance for connection failures and would welcome any attempt to keep connections working by default, even at the cost of an extra network packet every minute.22:26
@fungicide:matrix.orgadding the keepalive seems like a fine thing to try here, the change makes it configurable and disableable anyway22:27
@jim:acmegating.comthe other alternatives are either disable connection caching for everyone (which might hurt more active users), or add an option to disable caching (which makes it one of those "set this to not have a broken system" options i'm always trying to avoid)22:27
@clarkb:matrix.orgrefreshing my memory on this it seems that ~120 seconds after we enter keepalive protocol which will be 60 seconds after the last data transfer we will kill the connection if there isn't an ack'd keepalive. That does seem a bit aggressive, but firewalls tend to be aggressive too22:28
@fungicide:matrix.orggood firewalls respect connection states for far longer (days even), but bad firewalls are everywhere these days22:28
@fungicide:matrix.orgalso things not intended to be firewalls but just happen to have traffic routing through them, for example it used to be that f5 bigip load balancers, which are often put inline with normally routed traffic, used to close all "idle" tcp sessions after 120 seconds by default. hopefully that's not still the case, but wow was that ever annoying to have to deal with22:30
@fungicide:matrix.orgthat wasn't even for sockets they were load balancing, it was just any connection routed through the machine22:31
@jim:acmegating.compretty sure there's a bigip involved here.  i'm suspecting 255 seconds for the timeout. (i'm not 100% sure on that yet, i'm confirming it via connections from multiple locations now)22:33
@fungicide:matrix.orgneat22:34
@clarkb:matrix.orgya I think really the only other thought I've got is should you document this option as a driver option?22:34
@clarkb:matrix.orglooks like it finds it in a sub dict of some existing config so no actual config loading changes needed?22:34
@jim:acmegating.comi think i did document it22:34
@clarkb:matrix.orgoh heh you did I just fail at gerriting22:35
@clarkb:matrix.orgI +2'd it but didn't approve in case there was other feedback you wanted. I think it should be safe to land though22:36
@jim:acmegating.comokay, thanks.  we can give it time for more feedback too -- i don't want to open a can of worms.  i'll make a custom build for my client and ask them to try it with that to confirm it improves things too.22:38
@clarkb:matrix.orgcorvus: one lighter weight approach would be to simply requset keepalives on sockets then let users set the values via sysctl/procfs/etc22:39
@clarkb:matrix.orgthe issue with that approach is I suspect that in certain container based setups that won't be configurable via those tools22:40
@fungicide:matrix.orgthat also sounds far less user-friendly than a driver config option22:40
@fungicide:matrix.orgnot everyone is as comfortable noodling with sockopts as we are22:41
@jim:acmegating.comagree on all points :)22:43
@clarkb:matrix.orgunrelated https://review.opendev.org/c/zuul/zuul/+/755988 is a change ianw wrote a while back that will help opendev run tests for https://review.opendev.org/c/opendev/system-config/+/807672/4 if we can get zuulian eyeballs on this change22:43
@clarkb:matrix.orgfungi: we did unrevert the zuul-jobs sibling thing that got reverted due to tox config parsing?22:44
@clarkb:matrix.orgjust making sure we managed to finish up that bit of work22:44
@iwienand:matrix.orgClark: i did look at that stack, but didn't +w the final revert in as i thought it probably wanted to be babysat22:48
@jim:acmegating.comClark: how does 755988 help 807672?22:50
@jim:acmegating.com(i don't see any playbooks being changed there)22:50
@clarkb:matrix.orghrm ya. ianw mentioned it in the meeting today, but maybe they don't quite line up22:51
@clarkb:matrix.orgI'm looking at 807672 next and agree I don't think 755988 would help here unless 807672 was updated to change a playbook22:52
@jim:acmegating.comianw, Clark: i left a -1 on 755988 for the release note.  i'm not sold on the change being a net win... i left my thinking there, but i don't object to it.22:57
@jim:acmegating.comsome of that is informed by my own experience where when i'm changing a system-config job, it's almost always a role, not a playbook i'm modifying.22:57
@clarkb:matrix.orgthanks23:00

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!