-@gerrit:opendev.org- Tobias Henkel proposed: [zuul/zuul] 824706: Enable reprime by default https://review.opendev.org/c/zuul/zuul/+/824706 | 12:56 | |
@clarkb:matrix.org | The zuul client job seems flaky and its failures have hit the last two changes gerritbot notified about above | 16:40 |
---|---|---|
@clarkb:matrix.org | Interesting test_dequeue in zuul client tests is exiting non zero when running the client | 16:47 |
@clarkb:matrix.org | With empty output | 16:47 |
@clarkb:matrix.org | I think the job might be succeeding before it can be dequeued | 16:48 |
@clarkb:matrix.org | hrm but we hold them in build. I must be misreading the log then | 16:49 |
@clarkb:matrix.org | `"POST /api/tenant/tenant-one/project/org/project/dequeue HTTP/1.1" 400 726 "" "python-requests/2.27.1"` cherrypy returned a 400 result for some reason | 16:55 |
@clarkb:matrix.org | This is odd, we call zuul-client with the -v flag set but we don't get the info level log messages showing what args it is calling with | 17:04 |
@clarkb:matrix.org | But we do see the dequeue request come in on the zuul web log side and we see zuul web return a 400 error. This rules out problems with exec'ing the process. However, it is really weird we don't get that logged | 17:06 |
@mhuin:matrix.org | Clark: which log are you looking at? | 17:22 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 824742: Cleanup timer triggers in test fixtures https://review.opendev.org/c/zuul/zuul/+/824742 | 17:23 | |
@clarkb:matrix.org | mhu: https://zuul.opendev.org/t/zuul/build/4a3dbbaf45a54314b877ae7315347e80/ that job | 17:23 |
@clarkb:matrix.org | corvus: 824742 is an attempt at cleaning up some log warnings/errors I noticed when debugging ^. I notice that OpenDev's config does similar too. Is that something we need to address more broadly? Another option would be to accept the - time: config I guess | 17:24 |
@jim:acmegating.com | Clark: does that mean opendev has periodic pipelines that aren't triggering? | 17:28 |
@mhuin:matrix.org | Clark: is the config syntax error normal at 2022-01-14 16:17:51,736 ? | 17:31 |
@clarkb:matrix.org | corvus: no it appears to work anyway somehow. | 17:31 |
@clarkb:matrix.org | mhu: that is what https://review.opendev.org/c/zuul/zuul/+/824742 attempts to address. It seems zuul works despite those warnings but they are distracting | 17:32 |
@clarkb:matrix.org | it's also possible I've misread what the error is telling me, but other examples show time: without the list prefix and our docs seem to show that is correct, as does the getSchema method in the trigger implementation | 17:33 |
@mhuin:matrix.org | Clark: I think that's why you get the error, the dequeue targets a job in the periodic pipeline which is not properly defined | 17:34 |
@mhuin:matrix.org | if the pipeline isn't found zuul-web's dequeue returns a 400 | 17:34 |
@mhuin:matrix.org | https://opendev.org/zuul/zuul/src/branch/master/zuul/web/__init__.py#L385 | 17:36 |
@clarkb:matrix.org | huh ok so my change may fix it then. Is this a new change though? And will that break opendev? | 17:38 |
@clarkb:matrix.org | I guess we need to better understand why this is happening | 17:38 |
@clarkb:matrix.org | re "works anyway somehow": the periodic jobs in that test, for example, all queue up and get held, then after the zuul client error they are released and complete | 17:39 |
@clarkb:matrix.org | mhu: any idea why we didn't get verbose logging from the client in that test? it should've been included as part of the assertion failure logging as we log output there but it is logged as `(b'', None)` | 17:39 |
@mhuin:matrix.org | > <@clarkb:matrix.org> mhu: any idea why we didn't get verbose logging from the client in that test? it should've been included as part of the assertion failure logging as we log output there but it is logged as `(b'', None)` | 17:42 |
Not really, haven't looked into zuul-client in a while, maybe logging to stdout/stderr is buggy | ||
@jim:acmegating.com | Clark: yes i think we need to fully understand how we can have configuration errors and pipelines that still operate. | 17:43 |
@clarkb:matrix.org | grepping "extra keys not allowed" in opendev's debug log from today reports no results. Implying that maybe this is a regression since the last opendev restart | 17:47 |
@clarkb:matrix.org | (I'm only grepping on one scheduler and in a few files but we reload configs often enough it should've appeared I would think) | 17:48 |
@clarkb:matrix.org | What if it is specific to https://review.opendev.org/c/zuul/zuul/+/824640/2/zuul/lib/connections.py ? | 17:51 |
@clarkb:matrix.org | I had assumed that this was a general problem because other changes were failing the zuul client job, but I haven't investigated those to this level of detail and they could be the result of different failures | 17:51 |
@jim:acmegating.com | occams razor says "yes" to that i think | 17:53 |
@jim:acmegating.com | i can look into 824640 later today. | 17:55 |
@clarkb:matrix.org | ok, I'm looking at that code and not seeing how it would've caused this, but not impossible either as triggers are like sources | 17:55 |
-@gerrit:opendev.org- Niklas Borg proposed: [zuul/zuul] 824748: Required gitlab labels is subset of set labels https://review.opendev.org/c/zuul/zuul/+/824748 | 17:55 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 818300: Add support for adding and removing labels in gitlab https://review.opendev.org/c/zuul/zuul/+/818300 | 18:04 | |
-@gerrit:opendev.org- Niklas Borg proposed: [zuul/zuul] 824748: Required gitlab labels is subset of set labels https://review.opendev.org/c/zuul/zuul/+/824748 | 18:04 | |
-@gerrit:opendev.org- Niklas Borg proposed: [zuul/zuul] 824748: Required gitlab labels is subset of set labels https://review.opendev.org/c/zuul/zuul/+/824748 | 18:15 | |
-@gerrit:opendev.org- Niklas Borg proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 818300: Add support for adding and removing labels in gitlab https://review.opendev.org/c/zuul/zuul/+/818300 | 18:18 | |
-@gerrit:opendev.org- Niklas Borg proposed: [zuul/zuul] 824748: Required gitlab labels is subset of set labels https://review.opendev.org/c/zuul/zuul/+/824748 | 18:18 | |
-@gerrit:opendev.org- Niklas Borg proposed: [zuul/zuul] 824748: Required gitlab labels is subset of set labels https://review.opendev.org/c/zuul/zuul/+/824748 | 18:53 | |
@clarkb:matrix.org | ok I guess the error is the timer is not valid in the config not the time entry? And ya that continues to point towards the change that had the failures | 19:07 |
@clarkb:matrix.org | mhu: corvus I think the issue may be that we stopped loading the non source drivers in zuul web and zuul web is loading the tenant config to serve the api requests | 19:14 |
@clarkb:matrix.org | it isn't the scheduler that is complaining about the config issue it is the web. This explains why the periodic jobs run just fine | 19:14 |
@tristanc_:matrix.org | Not sure if that is related, but we had to mitigate an issue with the build config being cached in zookeeper: one of our integration tests tries to change the fqdn of the deployment, and by doing so the procedure pushes a new config change with the updated fqdn for the log server site secret, and then it restarts zuul. After the restart, a job in the post pipeline is re-enqueued, and that job is now failing because it is using the old secret which has the previous fqdn. Deleting `/zuul/tenant/*/pipeline/*/item` from zookeeper fixed the issue, as this seems to ensure the build config is using up-to-date secrets. | 19:14 |
@clarkb:matrix.org | I think this means the mqtt secrets are actually required | 19:15 |
@tristanc_:matrix.org | Deleting before the restart* | 19:15 |
@clarkb:matrix.org | since mqtt may be used in pipeline config too | 19:15 |
@jim:acmegating.com | Clark: this is in reference to https://review.opendev.org/824640 ? | 19:18 |
@clarkb:matrix.org | > <@jim:acmegating.com> Clark: this is in reference to https://review.opendev.org/824640 ? | 19:18 |
yes | ||
@jim:acmegating.com | okay, i haven't started looking at that yet | 19:18 |
@clarkb:matrix.org | ya no worries, I kept digging to make sure I understood it and to make sure it was related to that change and not an already present regression. I think that analysis confirms it is related to the change | 19:20 |
-@gerrit:opendev.org- Zuul merged on behalf of Tobias Henkel: [zuul/zuul] 824245: Don't join the command thread if we stop via command socket https://review.opendev.org/c/zuul/zuul/+/824245 | 20:15 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 824640: Add release note about additional zuul-web requirements https://review.opendev.org/c/zuul/zuul/+/824640 | 21:05 | |
@jim:acmegating.com | Clark: Albin Vass tobiash tristanC ^ i agree with Clark's evaluation, so I turned that into a reno+docs change | 21:07 |
@jim:acmegating.com | https://review.opendev.org/824655 is also a zuul-web config change that might be good to get into 4.12 if we like it | 21:09 |
@avass:vassast.org | corvus: does it actually use mqtt in any way or is requiring secrets just a side effect of it needing to instantiate connections? (It's a bit weird, but i may be out of the loop) | 21:15 |
@avass:vassast.org | Also why does zuul-web parse pipeline definitions? :) | 21:17 |
@clarkb:matrix.org | I've approved the change. I guess -W it if we aren't ready for that | 21:19 |
@jim:acmegating.com | Albin Vass: 1) probably -- a quick look at the mqtt driver suggests it will try to connect. it won't actually emit any reports from zuul-web though. you can confirm with "Starting MQTT Connection" in the log file. | 21:19 |
@jim:acmegating.com | Albin Vass: 2) because zuul-web is essentially a complete scheduler now, just one that never processes pipelines. it needs to be in order to remove the RPC dependency between web and scheduler (where you would ask zuul-web for a list of projects, and it would ask zuul-scheduler; that process was slow and not scalable; now zuul-web has a complete copy of the config). | 21:21 |
@jim:acmegating.com | i should really get better about using the reply feature of matrix :) | 21:23 |
-@gerrit:opendev.org- Kenny Ho proposed: [zuul/nodepool] 756569: Add Cobbler driver https://review.opendev.org/c/zuul/nodepool/+/756569 | 21:25 | |
@avass:vassast.org | > <@jim:acmegating.com> Albin Vass: 1) probably -- a quick look at the mqtt driver suggests it will try to connect. it won't actually emit any reports from zuul-web though. you can confirm with "Starting MQTT Connection" in the log file. | 21:27 |
I'm gonna guess it's too much work to make zuul web instantiate an mqtt connection object without actually establishing a connection to the service (unless we want it to do that to make sure the connection works?) | ||
@avass:vassast.org | > <@jim:acmegating.com> i should really get better about using the reply feature of matrix :) | 21:27 |
Yes me too :) | ||
@jim:acmegating.com | > <@avass:vassast.org> I'm gonna guess it's too much work to make zuul web instantiate an mqtt connection object without actually establishing a connection to the service (unless we want it to do that to make sure the connection works?) | 21:28 |
yeah. i don't love it either, but i don't think it's a trivial change. it should be *possible* though, it just may take an internal api change and update to all the drivers. they basically weren't written with that in mind. | ||
@avass:vassast.org | I'm thinking that it could be enough to give it alternative certs so it can connect but it's not allowed to do anything | 21:29 |
@jim:acmegating.com | probably; zuul shouldn't care, and i can't think of anything that would change that in the foreseeable future. | 21:31 |
@jim:acmegating.com | these connections _are_ used by zuul-web for triggers, which is one of the reasons why this mostly makes sense; but they're not used for reporters. mqtt being a reporter-only connection is a bit of an outlier. | 21:32 |
@jim:acmegating.com | (but we may have an mqtt trigger in the future; that would still probably only be used by the scheduler, but it's at least conceivable we could have zuul-web do that too) | 21:33 |
@avass:vassast.org | I know there are people who want mqtt to be able to trigger jobs (or add a rabbitmq driver for triggers) :) | 21:37 |
@tristanc_:matrix.org | Albin Vass: like https://review.opendev.org/c/zuul/zuul/+/637458 ? :) | 21:42 |
@avass:vassast.org | tristanC: i remember looking at that before but i don't remember what I thought about it. But usually someone wants to be able to trigger jobs in zuul from some kind of message bus service. | 22:01 |
@avass:vassast.org | Usually to trigger a job when a new binary version has been delivered to test the software with | 22:07 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 824640: Add release note about additional zuul-web requirements https://review.opendev.org/c/zuul/zuul/+/824640 | 22:25 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 824655: Handle missing config files in zuul-web https://review.opendev.org/c/zuul/zuul/+/824655 | 22:25 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 824806: Improve handling of errors in provider manager startup https://review.opendev.org/c/zuul/nodepool/+/824806 | 22:53 |
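
The 400 that cherrypy returned for the dequeue request (16:55) can be picked out of a zuul-web access log with a small parser. This is only a minimal sketch: it assumes each access-log entry contains a fragment shaped like the one quoted in that message, and `ACCESS_RE`, `AccessEntry`, `parse_access_line`, and `client_errors` are hypothetical helper names, not part of Zuul or zuul-client.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed fragment shape, taken from the line quoted in the log:
#   "METHOD PATH PROTOCOL" STATUS SIZE "REFERER" "USER-AGENT"
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class AccessEntry(NamedTuple):
    method: str
    path: str
    status: int
    agent: str


def parse_access_line(line: str) -> Optional[AccessEntry]:
    """Return the request details found in one log line, or None."""
    match = ACCESS_RE.search(line)
    if match is None:
        return None
    return AccessEntry(
        match["method"], match["path"], int(match["status"]), match["agent"]
    )


def client_errors(lines: Iterable[str]) -> Iterator[AccessEntry]:
    """Yield only the entries whose status is in the 4xx range."""
    for line in lines:
        entry = parse_access_line(line)
        if entry is not None and 400 <= entry.status < 500:
            yield entry


if __name__ == "__main__":
    sample = (
        '"POST /api/tenant/tenant-one/project/org/project/dequeue HTTP/1.1" '
        '400 726 "" "python-requests/2.27.1"'
    )
    for entry in client_errors([sample]):
        print(entry.method, entry.path, entry.status)
```

Filtering on the 4xx range rather than on 400 alone would also surface other client-side rejections from the same test run, which helps when it is not yet clear whether several failing changes share one cause.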