Monday, 2021-10-04

-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: drop rustup default run https://review.opendev.org/c/zuul/zuul-jobs/+/812272 00:24
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: rework global install https://review.opendev.org/c/zuul/zuul-jobs/+/812272 00:28
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: rework global install https://review.opendev.org/c/zuul/zuul-jobs/+/812272 00:32
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: rework global install https://review.opendev.org/c/zuul/zuul-jobs/+/812272 00:37
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: rework global install https://review.opendev.org/c/zuul/zuul-jobs/+/812272 00:43
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: verify cryptography build on Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/812273 01:01
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: verify cryptography build on Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/812273 02:10
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: verify cryptography build on Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/812273 02:17
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: verify cryptography build on Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/812273 02:22
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: verify cryptography build on Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/812273 02:30
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: verify cryptography build on Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/812273 02:39
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] ensure-rust: verify cryptography build on Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/812273 02:44
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley: [zuul/zuul-jobs] Test tox.requires doesn't break sibling installs https://review.opendev.org/c/zuul/zuul-jobs/+/812004 03:28
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley: [zuul/zuul-jobs] More exact section matching for tox showconfig https://review.opendev.org/c/zuul/zuul-jobs/+/812160 03:36
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] Web UI: add builds timeline on buildset page https://review.opendev.org/c/zuul/zuul/+/811753 08:57
@mhuin:matrix.org corvus mordred Done 08:58
@johanssone_:matrix.org Hello 👋 09:34
I've run into a snag during 4.6.0 -> 4.9.0 Zuul upgrade (started as a 4.6.0 -> 4.8.1 upgrade but same issue there so jumped 4.9.0 as well).
Running a fairly basic setup with zuul/Gerrit, based on docker/docker-compose.
Change notes don't really point to anything w.r.t. the error AFAICT.
Symptom:
Post-upgrade, all new changes in gerrit successfully trigger a zuul enqueue of the proper job (check pipe in this case),
but fails (or even doesn't send) a request to nodepool to snatch a nl node for the job, so it's stuck indefinitely in "unknown" state.
I've enabled debug on both zuul side and nodepool launcher side, and the ONLY error I can find is not during the enqueue phase, but rather in nodepool logs during the creation of new worker VMs, at which point I get this: https://paste.opendev.org/show/809755/, but node still reports as ready (both from a nodepool list, as well as in zuul-web).
On top of that, full-reconfigures, smart-reconfigures, restarts etc. have all been done. (Even the new zuul delete-state command in 4.9.0.)
Would greatly appreciate any guidance on where to start to look, as at this point, I'm not sure how to continue.
versions:
nodepool-launcher: 4.3.0 (docker image)
zuul-scheduler: 4.9.0 (docker image)
zuul-web: 4.9.0 (docker image)
finger-gw: 4.9.0 (docker image)
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] zuul-web: add start time, end time info on buildsets https://review.opendev.org/c/zuul/zuul/+/812173 09:47
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] zuul-web: add start time, end time info on buildsets https://review.opendev.org/c/zuul/zuul/+/812173 10:02
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul] Web UI: display buildsets start, end times on buildsets page https://review.opendev.org/c/zuul/zuul/+/812174 10:02
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] Flatten SourceContext data structure https://review.opendev.org/c/zuul/zuul/+/811967 11:42
@clarkb:matrix.org Erik Johansson that looks like nodepool is failing to validate ssh works to the booted instances after the cloud says they are ready. One way this can happen is you are running nodepool on fedora and using an RSA key. Fedora doesn't allow to use RSA in some situations anymore, but you are running it on the upstream docker images which are based on debian. I would try to manually create an instance and ssh to it and see if you can sort out why ssh is unhappy from there. 13:10
@fungicide:matrix.org we had an odd "stuck build" scenario in opendev over the weekend... details are at https://meetings.opendev.org/irclogs/%23openstack-infra/%23openstack-infra.2021-10-03.log.html but the tl;dr is: it looks like a network outage impacting our gerrit server caused a git operation to hang during workspace setup, and so the build never timed out 14:24
@fungicide:matrix.org > <@iwienand:matrix.org> I see "commit 6439b16c04c1d36c48a802cc28d89c67a630368e (HEAD -> master, tag: 1.1.0, origin/master, origin/HEAD, refs/changes/28/809528/1)" in element 14:38
thanks, i do now too, since reloading the webclient. for some reason it seems to have gotten "stuck," stopped showing any updates, and also thought my messages weren't going through to the channel
@jim:acmegating.com i've seen that once too :/ 14:39
@jim:acmegating.com fungi: if that sha looks good to you, i'll push that today 14:39
@fungicide:matrix.org corvus: yep, looks sane to me. i confirm that's the current master branch, and a minor version increase is warranted for some of the feature changes and dependency version increases since 1.0.0 14:43
@mordred:inaugust.com mhu: +2 from me - I think that's both really slick - and also legitimately useful (I browsed through several arbitrary buildsets and that graph tells a very clear story on several of them) 14:44
@jim:acmegating.com tristanC: 811753 is yours to re-review 14:51
@mhuin:matrix.org mordred: happy to be of help! On a side note patternfly's chart library (victory) is relatively easy to use 14:52
-@gerrit:opendev.org- Zuul merged on behalf of Matthieu Huin https://matrix.to/#/@mhuin:matrix.org: [zuul/zuul] Web UI: add builds timeline on buildset page https://review.opendev.org/c/zuul/zuul/+/811753 16:20
@jim:acmegating.com mhu: I did reply to some of your comments on that btw; nothing terribly important, just wanted you to know in case you didn't check it again post-merge. 17:30
@johanssone_:matrix.org Clark: Thanks for the reply. The issue was purely human though (missed full zuul-executor upgrade), but your comment brought me out of temporary tunnel vision to find the issue...Thanks! 18:10
@clarkb:matrix.org > <@johanssone_:matrix.org> Clark: Thanks for the reply. The issue was purely human though (missed full zuul-executor upgrade), but your comment brought me out of temporary tunnel vision to find the issue...Thanks! 19:39
Glad I could help
-@gerrit:opendev.org- Ade Lee proposed on behalf of Jiri Podivin: [zuul/zuul-jobs] DNM https://review.opendev.org/c/zuul/zuul-jobs/+/807031 20:12
-@gerrit:opendev.org- Zuul merged on behalf of Guillaume Chauvel: [zuul/zuul-jobs] ensure-twine: Avoid Reinstalling twine if present https://review.opendev.org/c/zuul/zuul-jobs/+/735340 20:58
@iwienand:matrix.org fungi: i've had that "stuck" thing before too. took me a while to realise that everyone wasn't just being very very quiet :) 21:38
@iwienand:matrix.org zuul-maint: if you have a little time for https://review.opendev.org/c/zuul/zuul-jobs/+/812272 and https://review.opendev.org/c/zuul/zuul-jobs/+/812273 it's some updates to ensure-rust after pyca/cryptography noticed we were installing all the rust bits twice, adding time to the CI runs 21:39
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] Handle long strings in build/buildsets tables https://review.opendev.org/c/zuul/zuul/+/812008 22:01
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Add `playbook_projects` zuul variable https://review.opendev.org/c/zuul/zuul/+/812410 22:39
@jim:acmegating.com pushed zuul-registry 1.1.0 22:55
@jim:acmegating.com zuul-maint: how does this look for a zuul release? commit 5858510e221bb4904d81331d681a68bb0c4b0f9a (tag: 4.10.0) 22:57
@jim:acmegating.com that's not master; it's a few changes behind. it's what we restarted opendev on last according to https://wiki.openstack.org/wiki/Infrastructure_Status 22:58
@fungicide:matrix.org corvus: commit 5858510 for zuul 4.9.1 or 4.10.0? looks fine to me for the latter, not so much the former due to added features 23:28
@jim:acmegating.com fungi: 4.10.0 23:38
@clarkb:matrix.org corvus: 4.10.0 on that commit lgtm though I haven't double checked that is what opendev restarted on 23:58