Wednesday, 2022-08-17

-@gerrit:opendev.org- Ian Wienand proposed: [01:33]
- [zuul/zuul] 853369: zuul-stream: validate on per node basis https://review.opendev.org/c/zuul/zuul/+/853369
- [zuul/zuul] 853208: zuul-stream : Test against a Python 2.7 container https://review.opendev.org/c/zuul/zuul/+/853208
- [zuul/zuul] 853382: zuul-stream: fix typo in file detection https://review.opendev.org/c/zuul/zuul/+/853382
@iwienand:matrix.org things I learnt from that series ^ [01:33]
@iwienand:matrix.org "ssh-keyscan localhost -p 2022" does *not* scan localhost on port 2022.  It scans localhost, and two hosts called "-p" and "2022" which it vaguely warns about, but exits successfully anyway.  So although you will think you have keys for the ssh listening on port 2022, you actually have them for the ssh listening on port 22 [01:35]
@iwienand:matrix.org "ssh-keyscan -p 2022 localhost" *will* scan the keys on port 2022 [01:35]
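The ssh-keyscan surprise is consistent with getopt-style parsing, which stops looking for options at the first non-option argument. As a minimal sketch (assuming ssh-keyscan's parser behaves that way, which matches the behaviour reported above), Python's standard `getopt.getopt` follows the same rule and shows how each command line gets split. The option string is an assumed subset of ssh-keyscan's flags, and `keyscan_argv` is a hypothetical helper, not part of any tool.

```python
import getopt

# Assumed subset of ssh-keyscan's short options; "p:" means -p takes a value.
OPTSTRING = "p:T:t:f:"


def parse(argv):
    """Split argv like a POSIX getopt parser: option parsing stops at the first non-option."""
    opts, hosts = getopt.getopt(argv, OPTSTRING)
    return dict(opts), hosts


def keyscan_argv(host, port):
    """Hypothetical helper: build an ssh-keyscan command with all options before the host."""
    return ["ssh-keyscan", "-p", str(port), host]


# Wrong order: "-p" and "2022" end up as two extra "hosts".
print(parse(["localhost", "-p", "2022"]))  # ({}, ['localhost', '-p', '2022'])

# Right order: -p is parsed as the port option.
print(parse(keyscan_argv("localhost", 2022)[1:]))  # ({'-p': '2022'}, ['localhost'])
```

Passing the list from `keyscan_argv("localhost", 2022)` to `subprocess.run(..., check=True)` keeps the options ahead of the host. Checking the exit status alone would not catch the wrong ordering, since, as reported above, ssh-keyscan exits successfully anyway.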
@iwienand:matrix.org "_inventory" is not a safe variable name to use with set_fact: when working with something inventory-ish.  You will end up with some mangled thing including ansible_facts [01:36]
@iwienand:matrix.org this will be quite unobvious until you've dumped the variables in debug many, many times :) [01:37]
@iwienand:matrix.org ansible is smart enough to find python3 in containers that have python2 in /usr/local/bin, so if you want python2 you need to set ansible_python_interpreter explicitly.  this probably isn't that surprising in hindsight, but it can mean you're not testing what you think you're testing [01:38]
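One way to stop interpreter discovery from quietly picking python3 is to pin the interpreter per host in the inventory. A minimal sketch, assuming a YAML inventory (JSON is valid YAML, so the `json.dumps` output can serve as the inventory file) and a hypothetical container host `py27-container` whose python2 lives at `/usr/local/bin/python2.7` (an assumed path; check the real one in your image):

```python
import json


def build_inventory(hosts):
    """Build a YAML-style Ansible inventory that pins ansible_python_interpreter per host.

    `hosts` maps each host name to the interpreter path Ansible should use on it.
    """
    return {
        "all": {
            "hosts": {
                name: {"ansible_python_interpreter": interpreter}
                for name, interpreter in hosts.items()
            }
        }
    }


# Hypothetical host name and interpreter path, for illustration only.
inventory = build_inventory({"py27-container": "/usr/local/bin/python2.7"})
print(json.dumps(inventory, indent=2))
```

Printing the `ansible_python_version` fact in a debug task is a quick way to confirm which interpreter actually ran.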
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul] 850718: zuul-stream : don't write out logfile for tasks in loops https://review.opendev.org/c/zuul/zuul/+/850718 [01:54]
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 852191: Web: disable scroll wheel zoom in job graph https://review.opendev.org/c/zuul/zuul/+/852191 [02:56]
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 852202: Web: fix tabs on project page https://review.opendev.org/c/zuul/zuul/+/852202 [03:09]
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 851942: docs: update console streaming docs https://review.opendev.org/c/zuul/zuul/+/851942 [06:58]
-@gerrit:opendev.org- Stephen Finucane proposed: [09:58]
- [zuul/zuul] 853419: Add index page for 'config' docs directory https://review.opendev.org/c/zuul/zuul/+/853419
- [zuul/zuul] 853420: setup: Replace dash-separated config https://review.opendev.org/c/zuul/zuul/+/853420
- [zuul/zuul] 853421: docs: Remove unnecessary noise https://review.opendev.org/c/zuul/zuul/+/853421
@jim:acmegating.com the gerrit folks are working on removing the extension point that the zuul-summary-plugin uses: https://gerrit-review.googlesource.com/c/gerrit/+/342836 [13:54]
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 853351: Add zuul change queue https://review.opendev.org/c/zuul/zuul/+/853351 [14:13]
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [zuul/zuul] 852289: Update unit test container setup and instructions https://review.opendev.org/c/zuul/zuul/+/852289 [14:17]
@clarkb:matrix.org Has anyone else been noticing flakier than usual SSH connectivity with Ansible? OpenDev has seen this with Zuul + Ansible and with nested Ansible in Zuul jobs. The SSH connections go between different cloud providers, within the same provider, and sometimes fail against localhost. Different distros all seem to be affected too (CentOS 8 and 9 and Ubuntu Focal at least) [17:38]
@clarkb:matrix.org The fact that we see this across different distros, and even with connections to localhost, makes me suspect something about ansible itself. But I haven't found any clear indication that ansible is to blame. [17:39]
@clarkb:matrix.org It's also not super persistent. It happens often enough to be noticeable, but infrequently enough that it doesn't completely halt everything [17:50]
@jim:acmegating.com Clark: are all the tenants on ansible v5 now? [18:06]
@clarkb:matrix.org > <@jim:acmegating.com> Clark: are all the tenants on ansible v5 now? [18:09]
Yes everyone should be defaulting to Ansible v5 now
@clarkb:matrix.org I noticed that ansible-core did a release on Monday (2.12.8) but our executors are still on the prior release, 2.12.7. [18:11]
@clarkb:matrix.org also this was happening at least as far back as last week [18:12]
@jim:acmegating.com Clark: i'm aware of some users encountering problems with ansible crashing/hanging on old (2.9) ansible, but they haven't upgraded to ansible 5 yet.  that's 2 major differences (crashing vs connection error, plus also version differences) so i doubt it's the same, but it's the closest thing i have to offer.  i'm working with another user who is having connection problems, but i'm 95% sure that's related to job/node issues at this point, so i would not draw a line connecting those yet.  tl;dr, i don't have any other accounts to corroborate with that, only things to track and potentially correlate later if things change. [18:23]
@clarkb:matrix.org Ok, good to know [18:23]
@y2kenny:matrix.org Does zuul provide a way to suspend service in preparation for shutdown?  (for example, suspend processing of incoming events but allow existing jobs to run to completion?) [18:36]
@tristanc_:matrix.org Kenny Ho: zuul-executor graceful should do that, see: https://zuul-ci.org/docs/zuul/latest/operation.html#executor [18:55]
@y2kenny:matrix.org ok... I was wondering if there is something at the scheduler level, but the executor one should be useful also. [19:04]
@jim:acmegating.com the current version of zuul is designed to never need to be shut down :) [19:07]
@clarkb:matrix.org > <@jim:acmegating.com> Clark: i'm aware of some users encountering problems with ansible crashing/hanging on old (2.9) ansible, but they haven't upgraded to ansible 5 yet.  that's 2 major differences (crashing vs connection error, plus also version differences) so i doubt it's the same, but it's the closest thing i have to offer.  i'm working with another user who is having connection problems, but i'm 95% sure that's related to job/node issues at this point, so i would not draw a line connecting those yet.  tl;dr, i don't have any other accounts to corroborate with that, only things to track and potentially correlate later if things change. [19:43]
After further investigation I've pushed https://review.opendev.org/c/openstack/project-config/+/853536, suspecting that ssh scanners and brute forcers may be consuming all our slots.
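If the "slots" in question are sshd's limit on unauthenticated connections (`MaxStartups`; that is an assumption here, since the linked change is not quoted in this log), the sshd_config(5) man page describes random early drop as `start:rate:full`, with a default of `10:30:100`: once `start` unauthenticated connections are pending, new attempts are refused with probability `rate/100`, rising linearly until every attempt is refused at `full`. A small Python sketch of that rule shows why scanners holding connections open would surface as intermittent failures rather than a hard outage, even against localhost:

```python
def drop_probability(pending, start=10, rate=30, full=100):
    """Probability that sshd refuses a new connection under MaxStartups start:rate:full.

    `pending` is the number of currently unauthenticated connections.
    Defaults mirror the documented sshd default of 10:30:100. sshd itself
    uses integer arithmetic; this sketch uses floats for readability.
    """
    if pending < start:
        return 0.0
    if pending >= full:
        return 1.0
    # Linear ramp from rate% at `start` pending connections to 100% at `full`.
    percent = rate + (100 - rate) * (pending - start) / (full - start)
    return percent / 100


for pending in (5, 10, 55, 100):
    print(pending, drop_probability(pending))
```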
@y2kenny:matrix.org > <@jim:acmegating.com> the current version of zuul is designed to never need to be shut down :) [19:44]
ok so I am definitely using it wrong lol :)
@jim:acmegating.com Clark: wow.  that would also explain the unique circumstances :) [19:44]
@jim:acmegating.com Kenny Ho: all zuul components can be run redundantly, so the whole system can be restarted in a rolling fashion.  opendev currently does that automatically once a week, and is considering doing it continuously. [19:46]
@jim:acmegating.com depending on job runtime, it may not be fast, but it can be seamless [19:46]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 853372: Convert GCE to state machine driver and remove simple https://review.opendev.org/c/zuul/nodepool/+/853372 [21:51]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 853548: Deprecate Ansible 2, make Ansible 5 default https://review.opendev.org/c/zuul/zuul/+/853548 [22:13]
@jim:acmegating.com zuul-maint: ^ i think we should merge 853548, make a 6.X release, then remove ansible 2 and make a 7.0 release.  how does that sound? [22:15]
@pearcetyler:matrix.org Any idea what would cause Zuul to ignore the retry limit on jobs? I'm using the default retry limit of 3, and it correctly fails after 3 attempts if there are legitimate errors, but it sometimes gets stuck in a loop (I think on the `prepare-workspace` step). The workaround is to abort and re-queue the PR, but the manual intervention is not ideal [22:15]
@jim:acmegating.com (then do it all again for ansible 6) [22:15]
@clarkb:matrix.org > <@jim:acmegating.com> zuul-maint: ^ i think we should merge 853548, make a 6.X release, then remove ansible 2 and make a 7.0 release.  how does that sound? [22:16]
No objections from me. Might be worth polling on zuul-discuss? I don't know how attached to specific ansible versions people are
@jim:acmegating.com Clark: well, i mean, they are EOL.... [22:17]
@jim:acmegating.com i don't think we really have a choice? [22:17]
@jim:acmegating.com Tyler Pearce: certainly not expected behavior [22:18]
@pearcetyler:matrix.org For reference, here's a legitimate failure that honors retry attempts: [22:19]
@pearcetyler:matrix.org So I know the limit is there/working [22:19]
@pearcetyler:matrix.org The infinite retry happens about once a week on average, and has for several months (sometimes a few weeks between occurrences, sometimes multiple times per week). I can't determine any kind of pattern here, and it does succeed after abort/re-queue [22:20]
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 853552: Add Ansible 6 https://review.opendev.org/c/zuul/zuul/+/853552 [22:29]
@clarkb:matrix.org corvus: ya that's a good point. [22:30]
@iwienand:matrix.org corvus: minor point, but it would be nice if we could have the zuul-console docs in the same release as the other updates https://review.opendev.org/c/zuul/zuul/+/851942 [23:14]
@jim:acmegating.com ianw: agreed; +2 here. Clark, that could use a re-review from you when you have a sec [23:16]
@iwienand:matrix.org also it looks like the ansible 6 stuff just links into the base versions of console/stream so shouldn't be different, but https://review.opendev.org/q/topic:stream-2.7-container should regression test those bits back to 2.7 [23:17]
@michael_kelly_anet:matrix.org hey folks.  I'm exploring using Zuul in Kubernetes.  right now I'm using the zuul operator, but I've run into a few areas where I'm not sure how to proceed. [23:19]
@michael_kelly_anet:matrix.org Specifically, I'm trying to figure out how to deal with build results in Kubernetes, and it looks like some behaviours of the Kubernetes nodepool driver may keep some of the examples from working as expected. [23:20]
@michael_kelly_anet:matrix.org The upload-logs role doesn't appear to be putting them anywhere useful for me, for example.  How is one intended to accomplish this in a k8s operator based deployment? [23:21]
@iwienand:matrix.org > <@iwienand:matrix.org> also it looks like the ansible 6 stuff just links into the base versions of console/stream so shouldn't be different, but https://review.opendev.org/q/topic:stream-2.7-container should regression test those bits back to 2.7 [23:22]
actually i'm going to respin that, i think the bottom change needs an update
@jim:acmegating.com Michael Kelly: you'll definitely need a place to put the build logs; that can be either cloud object storage or a static file server.  the zuul operator doesn't automate that. [23:23]
@jim:acmegating.com Michael Kelly: for example, if you're using a k8s from aws, you might want to store logs in s3.  if you're doing it on-prem, you might want to set up a static server. [23:24]
@michael_kelly_anet:matrix.org This is in a baremetal k8s cluster, fyi. [23:24]
@michael_kelly_anet:matrix.org Right now, the upload-logs role (?) in the executor appears to be trying to rsync them to a subtree of `/srv` [23:24]
@michael_kelly_anet:matrix.org in its own filesystem. [23:24]
@jim:acmegating.com Michael Kelly: the zuul-jobs roles for uploading logs are here, so these are the options available from the community: https://zuul-ci.org/docs/zuul-jobs/log-roles.html [23:24]
@jim:acmegating.com the operator isn't going to set up any base jobs, so that's sort of an early task you'll need to do when deploying. [23:26]
@michael_kelly_anet:matrix.org I have the base jobs set up based on https://zuul-ci.org/docs/zuul/latest/tutorials/quick-start.html#configure-a-base-job [23:26]
@michael_kelly_anet:matrix.org (this is where I added `upload-logs` from) [23:26]
@jim:acmegating.com ah, that's designed for a docker-compose setup where the volume is shared with a static server [23:27]
@michael_kelly_anet:matrix.org Is a sensible approach here to just create a Kubernetes PV and mount this in via `jobVolumes`? [23:27]
@jim:acmegating.com so basically, you'd need to tweak that to have it upload to the server (probably via ssh/scp) instead.  the https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-add-fileserver role can help with that [23:27]
@michael_kelly_anet:matrix.org (can I mount this via `jobVolumes`?) [23:28]
@jim:acmegating.com that would restrict your executor and log server to the same k8s node -- so maybe good for a prototype, but less great for production [23:28]
@michael_kelly_anet:matrix.org `jobVolumes` only supports hostPath, I guess? [23:28]
@jim:acmegating.com for production, i'd go with setting up a static server with a k8s-internal service and uploading to it over ssh using those roles. [23:29]
@jim:acmegating.com (honestly, i'm not sure it's that much harder to do the zuul secret encryption + roles, vs the k8s volume stuff, so i might even just suggest doing that from the start) [23:30]
@jim:acmegating.com maybe we should adjust the tutorial to do that too... we might be keeping it too simple by having the shared volume.  it's kind of a balancing act. [23:31]
@michael_kelly_anet:matrix.org For sure. [23:31]
@michael_kelly_anet:matrix.org Fwiw, in my case I have a storage device that will auto-provision PVs for me, so if it's just a case of needing support for non-hostpath volumes in there, it could be easy enough to add that, assuming it was acceptable. [23:32]
@michael_kelly_anet:matrix.org I have some fixes/changes I had intended to push for the operator anyway... [23:32]
@jim:acmegating.com Michael Kelly: since i rambled a bit, the tl;dr is: I'd use https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-add-fileserver in combination with https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-upload-logs for your environment. [23:32]
@jim:acmegating.com Michael Kelly: that arrangement (upload via ssh) lets you scale out to multiple executors and locate them on different k8s nodes than your log storage.  executors are the first thing that needs to scale in zuul. [23:34]
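As a rough sketch of the add-fileserver + upload-logs arrangement, assume a Zuul secret named `site_logs` (hypothetical) holding the dictionary add-fileserver expects (`fqdn`, `path`, `ssh_known_hosts`, `ssh_private_key`, `ssh_username`) and a placeholder log URL. A base job's post-run playbook could then look like the data below. The role variable names follow the zuul-jobs log-roles documentation as I understand it; verify them against the linked docs before relying on this. The playbook is built in Python and dumped as JSON, which YAML parsers, including Ansible's, should accept.

```python
import json

# Hypothetical names: "site_logs" is an assumed Zuul secret holding the
# add-fileserver dictionary; the log URL is a placeholder.
SECRET = "site_logs"
LOG_URL = "https://logs.example.com/"


def post_logs_playbook(secret=SECRET, log_url=LOG_URL):
    """Return a post-run playbook (as data) that uploads logs to a fileserver over SSH."""
    return [
        {
            # Play 1: on the executor, add the fileserver to the in-memory inventory.
            "hosts": "localhost",
            "roles": [{"role": "add-fileserver", "fileserver": "{{ %s }}" % secret}],
        },
        {
            # Play 2: run upload-logs against the fileserver host added above.
            "hosts": "{{ %s.fqdn }}" % secret,
            "gather_facts": False,
            "roles": [
                {
                    "role": "upload-logs",
                    "zuul_log_url": log_url,
                    "zuul_logserver_root": "{{ %s.path }}" % secret,
                }
            ],
        },
    ]


print(json.dumps(post_logs_playbook(), indent=2))
```

The secret would be attached to the base job through its `secrets` attribute, so only that job's playbooks get the SSH key, and the executors and the static server can then sit on any k8s nodes.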
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-client] 853352: Add zuul-client to zuul change queue https://review.opendev.org/c/zuul/zuul-client/+/853352 [23:34]
@michael_kelly_anet:matrix.org > on different k8s nodes than your log storage [23:35]
In my case, a PV would live on the external storage device.
@michael_kelly_anet:matrix.org Not on the K8S node. [23:35]
@michael_kelly_anet:matrix.org Hence the question of whether support for non-hostpath volumes would be acceptable. :) [23:36]
@jim:acmegating.com right, but there are 2 pods involved here: the executor which needs to "upload" the log, and a static http server (apache/nginx/whatever) to serve the content [23:36]
@jim:acmegating.com so if you want to just use a volume, then it'd have to be a hostpath that's shared between the pods [23:36]
@michael_kelly_anet:matrix.org Totally. [23:36]
@jim:acmegating.com so i think the only viable choices are (PV, hostpath, apache, executor all on one k8s node) or (upload via ssh, and we don't care what nodes apache and executor are on or what kind of volume is attached to apache) [23:39]
@jim:acmegating.com unless i'm missing something... there's a lot of... variety... in k8s :) [23:39]
@michael_kelly_anet:matrix.org The PV can be shared between pods even if they are not on the same node. [23:39]
@michael_kelly_anet:matrix.org Either via an NFS mount or via native ReadWriteMany support [23:39]
@michael_kelly_anet:matrix.org (the latter depends on the CSI driver: in our case it definitely supports it) [23:40]
@jim:acmegating.com okay, that's what i was missing...  then yes, you could probably do that and share the directory like in the quick start [23:40]
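For the ReadWriteMany route, here is a minimal sketch of a PersistentVolumeClaim that the executor and static web server pods could both mount, assuming the CSI driver behind the storage class supports ReadWriteMany. The claim name, size, and storage class are hypothetical. Wiring it into the executor would still depend on the operator's `jobVolumes` accepting non-hostPath volumes, which was an open question above.

```python
import json


def rwx_claim(name, size, storage_class):
    """Build a PersistentVolumeClaim that pods on different nodes can mount read-write.

    ReadWriteMany only works if the CSI driver behind `storage_class` supports it.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim name, size and storage class; substitute your own.
claim = rwx_claim("zuul-logs", "100Gi", "example-rwx")
print(json.dumps(claim, indent=2))
```

`kubectl apply -f` accepts JSON manifests as well as YAML, so the printed output can be applied directly. Keep in mind the access-control trade-off raised next: with a shared directory, every trusted playbook gets read/write access to the logs.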
@jim:acmegating.com Michael Kelly: in that case there are access controls to consider; basically, any trusted playbook in zuul will have full access to the log storage [23:42]
@michael_kelly_anet:matrix.org Yea, I have to think that through a bit. [23:42]
@michael_kelly_anet:matrix.org :) [23:42]
@michael_kelly_anet:matrix.org The other question I had was around the prepare-workspace role.. it looks like if I'm using the K8S nodepool driver, it uses `kubectl exec` in lieu of `ssh` to run commands, and prepare-workspace appears to just puke at me. [23:42]
@jim:acmegating.com that can be reduced to just certain playbooks with the ssh method, since the ssh key is a zuul secret; but with a shared directory, basically all trusted playbooks get r/w access [23:43]
@michael_kelly_anet:matrix.org prepare-workspace-git seems happy though.  Is that the suggested solution here? [23:43]
@clarkb:matrix.org I wonder if we should put a big warning on prepare-workspace and tell people to use prepare-workspace-git [23:43]
@michael_kelly_anet:matrix.org Regarding the permissions on the volume mounts: that's a good point. So, maybe I fall back to the initial suggestion. [23:44]
@michael_kelly_anet:matrix.org Clark: The tutorial tells you to use prepare-workspace :) [23:44]
@clarkb:matrix.org prepare-workspace uses rsync, which needs ssh and can't properly deal with git caching due to how git packs objects [23:44]
@jim:acmegating.com Michael Kelly: yeah.  and watch out for any roles using rsync [23:44]
@michael_kelly_anet:matrix.org blerg. [23:44]
@jim:acmegating.com there is a version of prepare workspace that uses oc rsync i think [23:44]
@michael_kelly_anet:matrix.org Good to know (although this was pretty obvious from the situation with prepare-workspace, I suppose) [23:44]
@clarkb:matrix.org prepare-workspace-git does all the manipulation with git, which can figure out a delta if you've got a cache on the test node [23:45]
@michael_kelly_anet:matrix.org Gotcha. [23:45]
-@gerrit:opendev.org- Ian Wienand proposed: [23:45]
- [zuul/zuul] 853369: zuul-stream: validate on per node basis https://review.opendev.org/c/zuul/zuul/+/853369
- [zuul/zuul] 853208: zuul-stream : Test against a Python 2.7 container https://review.opendev.org/c/zuul/zuul/+/853208
@michael_kelly_anet:matrix.org Just wanted to make sure I wasn't doing something overly bad there. [23:45]
@michael_kelly_anet:matrix.org Broader question regarding the operator - how `production ready` is it considered to be? [23:45]
