Wednesday, 2022-10-05

-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 804956: web: Jobs: Use TreeView for job overview page https://review.opendev.org/c/zuul/zuul/+/804956 00:08
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 804956: web: Jobs: Use TreeView for job overview page https://review.opendev.org/c/zuul/zuul/+/804956 00:24
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 804956: web: Jobs: Use TreeView for job overview page https://review.opendev.org/c/zuul/zuul/+/804956 00:41
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] 860280: Clarify extra vars are not passed with -e https://review.opendev.org/c/zuul/zuul/+/860280 02:42
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 804956: web: Jobs: Use TreeView for job overview page https://review.opendev.org/c/zuul/zuul/+/804956 03:38
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul-jobs] 859943: Allow overriding of Bazel installer checksum https://review.opendev.org/c/zuul/zuul-jobs/+/859943 05:44
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 860470: Cleanup local builds without .d folder https://review.opendev.org/c/zuul/nodepool/+/860470 14:59
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 860470: Cleanup local builds without .d folder https://review.opendev.org/c/zuul/nodepool/+/860470 15:11
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 828614: Correct exit routine in web, merger https://review.opendev.org/c/zuul/zuul/+/828614 16:31
@jim:acmegating.com tobiash: fungi ^ that needed a rebase 16:31
@jim:acmegating.com Clark: your link from monday that brought up a queueitem and a merge now returns 2 queueitems: https://tracing.opendev.org/search?end=1664816893236000&limit=20&lookback=1h&maxDuration&minDuration&operation=Build&service=zuul&start=1664813293236000&tags=%7B%22zuul_event_id%22%3A%22163bc9a5f8744d57aa4bde6ee693746c%22%7D 17:52
@jim:acmegating.com Clark: so i suspect we were seeing an incomplete queue item at the time 17:53
@clarkb:matrix.org But doesn't a queueitem initiate the trace? 17:59
@jim:acmegating.com Clark: yes, but the way it works is that it determines the trace id, and then it and all of the child traces report their spans with that trace id.  since spans are reported to the collector at the end of the span, it's normal behavior for the deepest spans to report first and the highest level ones to report later (mind you, in most cases contemplated by otlp, i have to imagine they were expecting milliseconds between them, not hours or days).  so anyway, jaeger won't be able to show us the queueitem until it's dequeued, but it may have other completed steps. 18:05
@jim:acmegating.com so i think we've learned a visual cue here: if the highest spans we see aren't our expected root level spans, then we know that operations are still in progress (or, possibly, data corruption) 18:06
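The reporting order described above can be sketched in a few lines of plain Python. This is a toy model, not Zuul's tracing code and not the OpenTelemetry API: the `collector` list, the `span` helper, and the span names `QueueItem`, `BuildSet`, and `Build` are assumptions made for illustration. Each span shares one trace id and is only appended to the collector when it ends, so the innermost span arrives first and the root arrives last.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a trace collector: a span only shows up here
# once it has ended.
collector = []


@contextmanager
def span(name, trace_id):
    """Toy span: reported to the collector on exit, never on entry."""
    try:
        yield
    finally:
        collector.append((trace_id, name))


def run_queue_item(trace_id="trace-1"):
    """Run three nested spans and return what the collector had mid-flight."""
    with span("QueueItem", trace_id):
        with span("BuildSet", trace_id):
            with span("Build", trace_id):
                pass  # the build itself
            # Only the innermost span has been reported so far.
            seen_mid_flight = [name for _, name in collector]
    return seen_mid_flight


def root_missing(names, expected_root="QueueItem"):
    """The visual cue: no root span yet means work is still in progress."""
    return expected_root not in names
```

Calling `run_queue_item()` returns the partial view a viewer would have while the item is still enqueued; after it returns, `collector` holds all three spans with the root last.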
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 860487: Include tenant and pipeline in QueueItem span https://review.opendev.org/c/zuul/zuul/+/860487 18:07
@jim:acmegating.com Clark: ^ and i think that may be useful based on your feedback 18:07
@clarkb:matrix.org I see we're operating a LIFO ish system 18:09
@jim:acmegating.com yep 18:09
@jim:acmegating.com (and that's not our choice, that's determined by the OTEL protocol) 18:10
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 860488: Don't trace merge jobs that we don't lock https://review.opendev.org/c/zuul/zuul/+/860488 18:20
@jim:acmegating.com swest:  Clark tristanC ^ i consider that a discussion-starter.  i'm in favor of that for reasons i explained in the commit message, but i think it's a good thing to check on our goals. 18:21
@clarkb:matrix.org oh yup that was something I wondered about for half a second when looking at tracing.opendev.org but didn't dig into it. I agree that change makes sense 18:23
@jim:acmegating.com yeah, took me a minute to figure out what was going on 18:24
@westphahl:matrix.org corvus: makes sense. adding the span for jobs where we can't lock the request was a mistake on my end 18:26
@jim:acmegating.com swest: cool, i thought that might be the case (i missed that in review too!), but technically it does give us more information, so i just wanted to double check before we reduced it :) 18:27
@clarkb:matrix.org I think if we want that info we could maybe do another nested span 18:27
@clarkb:matrix.org the outer one for all merge jobs with or without a lock and the inner for when the lock is held. Then it will be more clear in the visualization 18:27
@clarkb:matrix.org But I don't think that info is terribly useful 18:27
@jim:acmegating.com agree on both 18:27
@jim:acmegating.com we can keep that in our back pocket if we change our minds 18:28
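For the record, the nested-span alternative that was set aside might look roughly like the sketch below. It is a plain-Python illustration, not Zuul's merger code: `make_tracer`, `handle_merge_request`, the `try_lock` callable, and the span names `MergeRequest` and `MergeRequestLocked` are all hypothetical. The outer span is recorded for every merge request that is seen, and the inner span only when the lock is obtained.

```python
from contextlib import contextmanager


def make_tracer():
    """Return a toy span helper and the list of finished (name, parent) spans."""
    finished = []
    stack = []

    @contextmanager
    def span(name):
        parent = stack[-1] if stack else None
        stack.append(name)
        try:
            yield
        finally:
            stack.pop()
            finished.append((name, parent))

    return span, finished


def handle_merge_request(span, try_lock):
    """Outer span for every request; inner span only while the lock is held."""
    with span("MergeRequest"):
        if not try_lock():
            # Another worker holds the request: only the outer span is recorded.
            return False
        with span("MergeRequestLocked"):
            pass  # the actual merge work would happen here
        return True
```

With a failing `try_lock`, only the outer span is recorded; with a succeeding one, the inner span appears as its child, which is what would make the two cases distinguishable in a trace view.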
@fungicide:matrix.org i'm perplexed by the failure on https://zuul.opendev.org/t/zuul/build/9395a118af824786ae2e91236ea8f04f 19:36
@fungicide:matrix.org seems ansible tries 30 times to get the status page and finally gives up, but by then the web container log claims the service had been up for roughly 6 minutes 19:37
@fungicide:matrix.org it's getting a connection reset by the socket, like it's not listening at all, or blocked, or connecting to the wrong socket or address maybe 19:38
@fungicide:matrix.org seems to fail the same way consistently for that change 19:38
@jim:acmegating.com fungi: hrm, since it deals with init code, it may be the change at issue.  i'll take a look in a bit. 19:41
@fungicide:matrix.org yeah, that was my suspicion too for the same reason, but it's unclear to me what didn't actually initialize. the logs seem to indicate the service started 19:45
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 828614: Correct exit routine in web, merger https://review.opendev.org/c/zuul/zuul/+/828614 20:25
@jim:acmegating.com fungi: i think there was a logic error in there ^ 20:25
@fungicide:matrix.org i've been up longer than usual today, so not super confident in my assessment of the difference there, but continuing an infinite loop when something is unset vs breaking out of an infinite loop when that thing is set seems logically the same to me at the moment 21:29
@jim:acmegating.com yeah.  the best defense i have is that change and its antecedents have had to deal with a lot of changes around it (and maybe if/when we get it right we should probably not let it sit on the vine too long) 21:38
@fungicide:matrix.org oh! nevermind, i see it now 21:39
@fungicide:matrix.org if we only continue the loop, we'll never escape (unless the other conditional with the return happens to match) 21:39
@fungicide:matrix.org the continue was at the end of the loop block, and therefore entirely redundant. the loop would continue regardless 21:40
@fungicide:matrix.org so that conditional was doing approximately nothing 21:40
@fungicide:matrix.org i'll take that as a sign i've been staring at the screen too long 21:41
@jim:acmegating.com yep, i think it was backwards 21:51
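The difference fungi works out above can be shown with a small sketch. This is not the code from change 828614, which is not quoted in the log: `buggy_wait`, `fixed_wait`, the `poll` callable, and the `limit` guard are hypothetical, and the guard exists only so the demonstration terminates. A `continue` as the last statement of a `while True` body changes nothing, whereas a `break` on the opposite condition actually leaves the loop.

```python
def buggy_wait(poll, limit=5):
    """Loop whose only conditional is a trailing `continue`.

    Returns None once the demo guard trips, because the loop never exits
    on its own.
    """
    count = 0
    while True:
        count += 1
        if count > limit:  # demo guard only, stands in for "runs forever"
            return None
        if not poll():
            continue  # last statement in the body: the loop repeats anyway


def fixed_wait(poll, limit=5):
    """Same loop with the condition inverted into a `break`.

    Returns the number of iterations it took for poll() to become true,
    or None if the demo guard trips first.
    """
    count = 0
    while True:
        count += 1
        if count > limit:
            return None
        if poll():
            break  # the thing is set: leave the loop
    return count
```

`buggy_wait(lambda: True)` still spins until the guard trips even though the condition is satisfied on the first pass, while `fixed_wait(lambda: True)` returns after one iteration.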
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 860506: Include skipped builds in database and web ui https://review.opendev.org/c/zuul/zuul/+/860506 22:56
