Wednesday, 2022-11-02

-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 00:03
-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 00:15
-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 00:25
-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 00:31
-@gerrit:opendev.org- Michael Kelly proposed:03:35
- [zuul/zuul-operator] 853592: Allow the specification of storageClassName in PVCs https://review.opendev.org/c/zuul/zuul-operator/+/853592
- [zuul/zuul-operator] 853695: Prefix zuul-specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853695
- [zuul/zuul-operator] 853696: Prefix nodepool specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853696
- [zuul/zuul-operator] 861488: helm: Add a basic helm chart for zuul-operator https://review.opendev.org/c/zuul/zuul-operator/+/861488
- [zuul/zuul-operator] 862390: helm: Add cert-manager as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/862390
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279
@michael_kelly_anet:matrix.orgtristanC: and corvus can one (or both) of you folks take another look at https://review.opendev.org/c/zuul/zuul-operator/+/853592/12 and maybe give it workflow +2?03:36
-@gerrit:opendev.org- Michael Kelly proposed:04:28
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279
- [zuul/zuul-operator] 863191: helm: Add pxc-operator as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/863191
-@gerrit:opendev.org- Michael Kelly proposed:04:32
- [zuul/zuul-operator] 862390: helm: Add cert-manager as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/862390
- [zuul/zuul-operator] 863191: helm: Add pxc-operator as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/863191
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279
@iwienand:matrix.orgi cannot let this zuul-sphinx issue rest because nothing about it seems to make sense :(07:17
@iwienand:matrix.orgupstream docutils have been helpful and engaged with the issue, but it's still not clear to me what's going on.  i've bisected it down to one change in sphinx07:17
@iwienand:matrix.orghttps://github.com/sphinx-doc/sphinx/issues/10951 07:18
@iwienand:matrix.orgi can't replicate with docutils >0.17 ... but we can't use that until rtd_sphinx_theme makes their next release (they are pinned to <0.18, but their changelog says next release is slated to remove that)07:19
@iwienand:matrix.orgthat might be enough to convince us to just pin to sphinx 5.2.3 until that release.  i just didn't want to end up in a situation where zuul-sphinx has bit-rotted and we can never move on to new sphinx versions07:20
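iwienand's bisection is, in the abstract, a binary search for the first bad commit in an ordered range, which is what `git bisect` automates. A minimal Python sketch, assuming a hypothetical `is_bad` predicate (for example, building the zuul-sphinx docs against a given sphinx commit and checking for the failure) and a history that flips from good to bad exactly once:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest and that behaviour is
    monotonic: once a commit is bad, every later commit is bad too.
    """
    if not commits:
        raise ValueError("no commits to bisect")
    lo, hi = 0, len(commits) - 1
    if not is_bad(commits[hi]):
        raise ValueError("newest commit is not bad; nothing to bisect")
    # Invariant: the first bad commit lies in commits[lo..hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # first bad is at mid or earlier
        else:
            lo = mid + 1  # first bad is strictly after mid
    return commits[lo]
```

Each probe halves the remaining range, so a range of 100 commits needs at most 8 calls to `is_bad` (one for the newest commit plus 7 halvings).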
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 863186: Fix skipped builds filter in web ui https://review.opendev.org/c/zuul/zuul/+/863186 14:37
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 863326: Fix config-errors dedicated page https://review.opendev.org/c/zuul/zuul/+/863326 15:48
@clarkb:matrix.orgI'm not sure ^ is a complete or accurate fix. I'm hoping the preview site will help debug.15:50
@jim:acmegating.comClark: that wfm locally.16:02
@jim:acmegating.comzuul-maint: ^ i think we can/should sneak that into the 8.0.1 release too if you want to +3 it.  i can make the release later today.16:03
@clarkb:matrix.orgcool, the bit I was less sure about was whether or not I needed to handle the ready flag and clearing state. But I think because the page is tenant specific, we don't need to do that16:03
@jim:acmegating.comfyi, we're about to perform a zk upgrade on opendev's zuul, join #_oftc_#opendev:matrix.org to follow along/participate16:06
-@gerrit:opendev.org- Michael Kelly proposed:16:23
- [zuul/zuul-operator] 853592: Allow the specification of storageClassName in PVCs https://review.opendev.org/c/zuul/zuul-operator/+/853592
- [zuul/zuul-operator] 853695: Prefix zuul-specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853695
- [zuul/zuul-operator] 853696: Prefix nodepool specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853696
- [zuul/zuul-operator] 861488: helm: Add a basic helm chart for zuul-operator https://review.opendev.org/c/zuul/zuul-operator/+/861488
- [zuul/zuul-operator] 862390: helm: Add cert-manager as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/862390
- [zuul/zuul-operator] 863191: helm: Add pxc-operator as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/863191
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279
-@gerrit:opendev.org- Michael Kelly proposed:16:24
- [zuul/zuul-operator] 853592: Allow the specification of storageClassName in PVCs https://review.opendev.org/c/zuul/zuul-operator/+/853592
- [zuul/zuul-operator] 853695: Prefix zuul-specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853695
- [zuul/zuul-operator] 853696: Prefix nodepool specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853696
- [zuul/zuul-operator] 861488: helm: Add a basic helm chart for zuul-operator https://review.opendev.org/c/zuul/zuul-operator/+/861488
- [zuul/zuul-operator] 862390: helm: Add cert-manager as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/862390
- [zuul/zuul-operator] 863191: helm: Add pxc-operator as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/863191
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279
@clarkb:matrix.org> <@jim:acmegating.com> Clark: that wfm locally.18:03
The site preview also shows it working for anyone else doing reviews.
@jim:acmegating.comi'm going to go ahead and +w that due to the breakage; retro-reviews still welcome18:04
-@gerrit:opendev.org- Joshua Watt proposed: [zuul/zuul] 863068: executor: Skip line mapping for special Gerrit files https://review.opendev.org/c/zuul/zuul/+/863068 20:02
@clarkb:matrix.orgcorvus: this isn't super urgent but would be nice to have fixed: https://review.opendev.org/c/zuul/zuul-jobs/+/863098 allows the named-checkzone command to run via $PATH, as the install dir changes across ubuntu versions20:29
@jim:acmegating.comClark: looks like no test job for that... that should be a pretty easy use of 'simple-role-test'... i think we'd just need to stick a sample zonedb file in the repo...20:39
@jim:acmegating.combasically just:20:40
run: test-playbooks/simple-role-test.yaml
vars: {zone_db_files: [path]}
@clarkb:matrix.orgYa, the change it broke has a Depends-On to show it works on jammy at least. But that doesn't cover the old /usr/sbin path, I guess20:41
@clarkb:matrix.orgI'll take a look at making a test job20:42
@jim:acmegating.comthanks, that'd be great.  i really like having at least simple-role-test jobs for every role if we can.  maybe that role predated that.20:42
@jim:acmegating.comwe can throw the multi-platform tag (whatever it's called) on it too to get versions for all the platforms we care about20:43
@jim:acmegating.com(it's both regression and future-proofing, as if it had existed, it would have caught this issue ahead of time)20:44
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 863098: Fix check zone role for Jammy https://review.opendev.org/c/zuul/zuul-jobs/+/863098 21:04
@clarkb:matrix.orgok that shows older ubuntu working which is great but debian fails I think because they don't have /usr/sbin in the path by default. I'll reduce this to ubuntu and if we decide to add more platforms later we can make it more robust then21:18
@clarkb:matrix.orgoh except I guess it may have run on debian before since it hardcoded /usr/sbin. Maybe I need to address that for debian now21:19
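The role fix and Clark's Debian concern both come down to executable lookup order: prefer whatever `$PATH` resolves, and only then try the sbin directories that a non-root Debian PATH may omit. The zuul-jobs role is Ansible, not Python, so this is only an illustrative sketch of that lookup order; `find_tool` and the fallback directory list are hypothetical, while `shutil.which` is the standard-library call doing the search:

```python
import os
import shutil
from typing import Optional, Sequence

# Hypothetical fallback directories: searched only if $PATH has no match,
# since a default non-root PATH on some distros omits the sbin directories.
SBIN_DIRS = ("/usr/sbin", "/sbin")


def find_tool(name: str, fallback_dirs: Sequence[str] = SBIN_DIRS) -> Optional[str]:
    """Locate an executable via $PATH first, then via fallback_dirs."""
    found = shutil.which(name)  # honours the caller's $PATH
    if found:
        return found
    # shutil.which accepts an explicit os.pathsep-separated search path.
    return shutil.which(name, path=os.pathsep.join(fallback_dirs))
```

For example, `find_tool("named-checkzone")` would return the `$PATH` hit on a release that installs it in `/usr/bin`, and fall back to `/usr/sbin/named-checkzone` where that directory is not on `$PATH`.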
-@gerrit:opendev.org- Michael Kelly proposed:21:20
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279
- [zuul/zuul-operator] 863439: doc: Re-write install doc to use helm chart https://review.opendev.org/c/zuul/zuul-operator/+/863439
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 863098: Fix check zone role for Jammy https://review.opendev.org/c/zuul/zuul-jobs/+/863098 21:21
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 863098: Fix check zone role for Jammy https://review.opendev.org/c/zuul/zuul-jobs/+/863098 21:26
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 863440: Fix deduplication exceptions in pipeline processing https://review.opendev.org/c/zuul/zuul/+/863440 21:28
@clarkb:matrix.orgthe diff between latest patchset and previous one on that zuul-jobs change is maybe my least favorite ansible behavior21:28
@clarkb:matrix.organyway that appears to be working now. yay for testing21:29
@clarkb:matrix.orgcorvus: is 863440 related to the job timeouts we're seeing?21:34
@jim:acmegating.comaroo?21:34
@clarkb:matrix.orgI noticed them with the py311 stuff first and attributed it to the interpreter. But now it seems py38 and py310 may be getting stuck too?21:34
@clarkb:matrix.orghttps://zuul.opendev.org/t/zuul/stream/ee2643c5980240c98f80606bc547a9cf?logfile=console.log for example21:34
@clarkb:matrix.orgwhen I looked for py311 it appeared the job actually ran a full complement of tests but the stestr process wasn't completing21:34
@clarkb:matrix.organd that was about as far as I got before other things distracted me21:35
@clarkb:matrix.orgbut if you look at that job it hasn't done anything for almost 25 minutes21:35
@jim:acmegating.comClark: i'm unfamiliar with the issue you're describing.  440 is to fix a bug that should only apply to job deduplication, which should never be seen in opendev due to the lack of circular dep support21:35
@clarkb:matrix.orggot it21:35
@clarkb:matrix.orgjust talking out loud here: python38 and python310 run on different distro releases. This and the fact that the problem affects multiple python versions implies it isn't a platform problem but more likely to be a problem with our tools or the zuul test suite itself21:43
@clarkb:matrix.orgI've ssh'd onto the python310 test node and there seems to be a python process stuck on a read from fd 18; trying to determine what fd 18 is now21:45
@clarkb:matrix.orgLooks to be multiprocessing related21:48
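To find out what a blocked `read()` on fd 18 is reading from, Linux exposes every open descriptor as a symlink under `/proc/<pid>/fd` (the same data `ls -l /proc/<pid>/fd` shows). A pipe appears as `pipe:[<inode>]`, and any other process holding the same inode is a candidate for the writer that never wrote. A minimal, Linux-only sketch; the helper names are hypothetical, and processes owned by other users are silently skipped unless it is run as root:

```python
import os
from typing import Dict, List


def fd_target(pid: int, fd: int) -> str:
    """Return what /proc says a descriptor points at, e.g. 'pipe:[123]'."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def holders(target: str) -> Dict[int, List[int]]:
    """Map pid -> fds for every readable process that has `target` open."""
    found: Dict[int, List[int]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # permission denied, or the process already exited
            continue
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    found.setdefault(int(entry), []).append(int(fd))
            except OSError:  # descriptor closed while we were looking
                continue
    return found
```

On the held node, something like `holders(fd_target(<stuck pid>, 18))` would list every process sharing that pipe; both ends of a pipe report the same inode, so the output includes the reader as well as any writer.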
@clarkb:matrix.orghttps://paste.opendev.org/show/bIfaYeDOEgM6Zz8gqEry/ I'm not sure what to make of that yet but the nodes will be deleted soonish so making a record here21:52
@clarkb:matrix.orgthose deprecation warning flags are something we set via tox.ini21:53
@clarkb:matrix.org8023's parent is 8022 which is a shell process that the stestr command seems to invoke. I think 9070 is the process actually running tests and 8023 is a subprocess of stestr coordinating things?21:56
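Working out how 8022, 8023 and 9070 relate is what `ps -o pid,ppid,cmd` or `pstree -p` report; the parent pid can also be read straight from `/proc/<pid>/stat`. A Linux-only sketch with hypothetical helper names; the command-name field in that file is parenthesised and may itself contain spaces or parentheses, so the parse splits after the last `)`:

```python
import os
from typing import List


def ppid_of(pid: int) -> int:
    """Parse the parent pid from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Format: "pid (comm) state ppid ...". Skip past the last ") ".
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[1])  # fields[0] is the state letter


def ancestry(pid: int) -> List[int]:
    """Return [pid, parent, grandparent, ...] until pid 1 or an unknown parent."""
    chain = [pid]
    while chain[-1] > 1:
        try:
            parent = ppid_of(chain[-1])
        except OSError:  # process vanished or is not readable
            break
        if parent == 0:  # parent lives outside this pid namespace
            break
        chain.append(parent)
    return chain
```

Running `ancestry(9070)` on the node would print the whole chain in one go, which makes it easier to confirm which process is the stestr coordinator and which is the worker.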
@clarkb:matrix.orgstestr's last release was over a month ago.21:56
-@gerrit:opendev.org- Sebastian Gonzalez Pintor proposed wip: [zuul/zuul] 862983: [WIP] Add soft version of fail-fast https://review.opendev.org/c/zuul/zuul/+/862983 22:06
-@gerrit:opendev.org- Sebastian Gonzalez Pintor proposed wip: [zuul/zuul] 862983: [WIP] Add soft version of fail-fast https://review.opendev.org/c/zuul/zuul/+/862983 22:09
@clarkb:matrix.orgI've attempted to hold the node that paste was generated from22:12
@jim:acmegating.comClark: to be clear, we're looking at stestr processes hanging?  not a zuul executor bug or something?22:13
@clarkb:matrix.orgcorvus: It is theoretically possible that it could be a zuul executor issue, but I don't believe it is because the strace on the child process shows it waiting on a read from the parent process. That indicates to me that it is the test framework/tooling itself that is stuck not the code under test22:13
@clarkb:matrix.orgthe zuul tests shouldn't know anything about that interprocess communication between the top level test runner and the per cpu actual test runners22:14
@jim:acmegating.comright.  mostly just confirming that we're talking about the contents of a job, not an operational zuul problem22:14
@clarkb:matrix.orgI've pinged mtreinish in #opendev about it (stestr maintainer) and have attempted to put a hold in for that node. Hopefully we can use that to debug this properly22:15
@clarkb:matrix.orgcorvus: the node is held now if you want to take a look22:20
@clarkb:matrix.orgoh except22:20
@clarkb:matrix.orgoh nevermind. I thought maybe the processes had died due to ansible connection going away22:20
@clarkb:matrix.orgbut they are still there22:20
@clarkb:matrix.orgthere is a defunct python process child of 8023. I wonder if it is actually hung up trying to reap that and thus not processing things to allow 9070 to proceed?22:22
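A defunct (zombie) process has exited but has not yet been reaped by its parent via `wait()`/`waitpid()`; it shows state `Z` in `/proc/<pid>/stat`. Whether 8023 is actually stuck on that reap is Clark's open hypothesis, and listing zombies does not confirm it. The Linux-only sketch below (hypothetical helper names) just finds a parent's defunct children so their state can be checked on a held node:

```python
import os
from typing import List


def proc_state(pid: int) -> str:
    """Return the one-letter state from /proc/<pid>/stat, e.g. 'S' or 'Z'."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The state letter is the first character after "(comm) ".
    return stat[stat.rindex(")") + 2]


def zombie_children(parent: int) -> List[int]:
    """Pids whose parent is `parent` and whose state is Z (defunct)."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:  # process exited while scanning
            continue
        rest = stat[stat.rindex(")") + 2:].split()
        if rest[0] == "Z" and int(rest[1]) == parent:
            zombies.append(int(entry))
    return zombies
```

Reading `/proc` does not reap anything, so a zombie stays visible until its parent calls one of the `wait` family functions.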
@jim:acmegating.comClark: the kazoo RolledBackError exceptions are interesting.  i don't believe we expect to see those normally; the system is probably extremely overloaded.  they may be causing test failures in such a way that the tests never exit.22:27
@jim:acmegating.comwe have had a ... complex relationship with unit test timeouts22:29
@clarkb:matrix.orghrm we can roll back the concurrency to 6 from 7 to try and mitigate some of that load22:36
@clarkb:matrix.orgthat will make python38 jobs even longer but py310 should stay within reasonable bounds22:36
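For a rough sense of the cost: under an idealised model where the suite's total test time divides evenly across worker processes and nothing contends for CPU or I/O, wall-clock time scales with the inverse of the worker count, so going from 7 to 6 workers multiplies it by 7/6, about 1.17. On a node that is already oversubscribed, as the RolledBackError exceptions may indicate, the real penalty may be smaller. The hypothetical helper below only illustrates that ratio; it is not a measurement:

```python
def relative_wall_time(old_workers: int, new_workers: int) -> float:
    """Wall-time ratio new/old, assuming perfectly divisible parallel work."""
    if old_workers <= 0 or new_workers <= 0:
        raise ValueError("worker counts must be positive")
    # Ideal model: wall time = total_work / workers, so the ratio is old/new.
    return old_workers / new_workers
```

Usage: `relative_wall_time(7, 6)` gives roughly 1.167, i.e. about a 17% longer run in the ideal case.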
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 862978: Add playbook semaphores https://review.opendev.org/c/zuul/zuul/+/862978 23:03
