-@gerrit:opendev.org- Ian Wienand proposed: | 01:33 | |
- [zuul/zuul] 853369: zuul-stream: validate on per node basis https://review.opendev.org/c/zuul/zuul/+/853369 | ||
- [zuul/zuul] 853208: zuul-stream : Test against a Python 2.7 container https://review.opendev.org/c/zuul/zuul/+/853208 | ||
- [zuul/zuul] 853382: zuul-stream: fix typo in file detection https://review.opendev.org/c/zuul/zuul/+/853382 | ||
@iwienand:matrix.org | things I learnt from that series ^ | 01:33 |
@iwienand:matrix.org | "ssh-keyscan localhost -p 2022" does *not* scan localhost on port 2022. It scans localhost, and two hosts called "-p" and "2022" which it vaguely warns about, but exits successfully anyway. So although you will think you have keys for the ssh listening on port 2022, you actually have them for the ssh listening on port 22 | 01:35 |
@iwienand:matrix.org | "ssh-keyscan -p 2022 localhost" *will* scan the keys on port 2022 | 01:35 |
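A minimal Python sketch of the safe ordering, assuming you drive ssh-keyscan from a script. The helpers `keyscan_argv`, `expected_prefix` and `keyscan` are hypothetical and are not part of any Zuul role. They always put options before host names. They also check that every returned key line is labelled for the requested port, relying on the known_hosts convention of writing non-default ports as `[host]:port`, so a scan that quietly returned port-22 keys would fail the check.

```python
import subprocess
from typing import List, Optional, Sequence


def keyscan_argv(hosts: Sequence[str], port: Optional[int] = None) -> List[str]:
    """Build an ssh-keyscan command line with every option before the hosts."""
    argv = ["ssh-keyscan"]
    if port is not None:
        argv += ["-p", str(port)]  # options first...
    argv += list(hosts)            # ...host names last
    return argv


def expected_prefix(host: str, port: Optional[int]) -> str:
    """known_hosts-style name: non-default ports are written as [host]:port."""
    return host if port in (None, 22) else f"[{host}]:{port}"


def keyscan(host: str, port: Optional[int] = None) -> List[str]:
    """Scan one host and reject output that is not labelled for that port."""
    result = subprocess.run(keyscan_argv([host], port),
                            capture_output=True, text=True, check=True)
    # Ignore blank lines and "#" banner comments; keep only key lines.
    lines = [l for l in result.stdout.splitlines() if l and not l.startswith("#")]
    prefix = expected_prefix(host, port)
    if not lines or any(not l.startswith(prefix + " ") for l in lines):
        raise RuntimeError(f"no keys for {prefix}; got {lines!r}")
    return lines
```

For example, `keyscan_argv(["localhost"], port=2022)` yields `["ssh-keyscan", "-p", "2022", "localhost"]`, the ordering that works.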
@iwienand:matrix.org | "_inventory" is not a safe variable name to use with set_fact: when working with something inventory-ish. You will end up with some mangled thing including ansible_facts | 01:36 |
@iwienand:matrix.org | this will be quite unobvious until you've dumped the variables in debug many, many times :) | 01:37 |
@iwienand:matrix.org | ansible is smart enough to find python3 in containers that have python2 in /usr/local/bin, so if you want python2 you need to set ansible_python_interpreter explicitly. this probably isn't that surprising in hindsight, but it can mean you're not testing what you think you're testing | 01:38 |
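To see why pinning matters, here is a deliberately simplified Python model of interpreter selection. The `CANDIDATES` order and the `pick_interpreter` helper are hypothetical and are not Ansible's actual discovery algorithm; they only encode the behaviour described above, where a python3 that is found first wins unless an interpreter is set explicitly.

```python
from typing import Iterable, Optional

# Hypothetical search order: a stand-in for automatic discovery that,
# like the behaviour described above, reaches python3 before python2.
CANDIDATES = ("/usr/bin/python3", "/usr/local/bin/python3",
              "/usr/bin/python2.7", "/usr/local/bin/python2")


def pick_interpreter(available: Iterable[str],
                     explicit: Optional[str] = None) -> str:
    present = set(available)
    if explicit is not None:
        # An explicit setting (cf. ansible_python_interpreter) wins outright.
        if explicit not in present:
            raise FileNotFoundError(explicit)
        return explicit
    # Otherwise take the first candidate that exists in this environment.
    for path in CANDIDATES:
        if path in present:
            return path
    raise FileNotFoundError("no python interpreter found")
```

With a container that has both `/usr/bin/python3` and `/usr/local/bin/python2`, only passing `explicit="/usr/local/bin/python2"` gets you python2.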
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul] 850718: zuul-stream : don't write out logfile for tasks in loops https://review.opendev.org/c/zuul/zuul/+/850718 | 01:54 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 852191: Web: disable scroll wheel zoom in job graph https://review.opendev.org/c/zuul/zuul/+/852191 | 02:56 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 852202: Web: fix tabs on project page https://review.opendev.org/c/zuul/zuul/+/852202 | 03:09 | |
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 851942: docs: update console streaming docs https://review.opendev.org/c/zuul/zuul/+/851942 | 06:58 | |
-@gerrit:opendev.org- Stephen Finucane proposed: | 09:58 | |
- [zuul/zuul] 853419: Add index page for 'config' docs directory https://review.opendev.org/c/zuul/zuul/+/853419 | ||
- [zuul/zuul] 853420: setup: Replace dash-separated config https://review.opendev.org/c/zuul/zuul/+/853420 | ||
- [zuul/zuul] 853421: docs: Remove unnecessary noise https://review.opendev.org/c/zuul/zuul/+/853421 | ||
@jim:acmegating.com | the gerrit folks are working on removing the extension point that the zuul-summary-plugin uses: https://gerrit-review.googlesource.com/c/gerrit/+/342836 | 13:54 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 853351: Add zuul change queue https://review.opendev.org/c/zuul/zuul/+/853351 | 14:13 | |
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [zuul/zuul] 852289: Update unit test container setup and instructions https://review.opendev.org/c/zuul/zuul/+/852289 | 14:17 | |
@clarkb:matrix.org | Has anyone else been noticing flakier than usual SSH connectivity with Ansible? OpenDev has seen this with Zuul + Ansible and nested Ansible in Zuul jobs. The SSH connections go between different cloud providers, within the same provider, and sometimes fail against localhost. Different distros all seem to be affected too (centos 8 and 9 and ubuntu focal at least) | 17:38 |
@clarkb:matrix.org | The fact that we see this between different distros and with connections to localhost even makes me suspect something about ansible itself. But I haven't found any clear indications ansible is to blame. | 17:39 |
@clarkb:matrix.org | It's also not super persistent. It happens enough to be noticeable, but also infrequently enough that it doesn't completely halt everything | 17:50 |
@jim:acmegating.com | Clark: are all the tenants on ansible v5 now? | 18:06 |
@clarkb:matrix.org | > <@jim:acmegating.com> Clark: are all the tenants on ansible v5 now? | 18:09 |
Yes everyone should be defaulting to Ansible v5 now | ||
@clarkb:matrix.org | I noticed that ansible-core did a release on monday (2.12.8) but our executors are still on the prior release 2.12.7. | 18:11 |
@clarkb:matrix.org | also this was happening at least last week as well | 18:12 |
@jim:acmegating.com | Clark: i'm aware of some users encountering problems with ansible crashing/hanging on old (2.9) ansible, but they haven't upgraded to ansible 5 yet. that's 2 major differences (crashing vs connection error, plus also version differences) so i doubt it's the same, but it's the closest thing i have to offer. i'm working with another user who is having connection problems, but i'm 95% sure that's related to job/node issues at this point, so i would not draw a line connecting those yet. tl;dr, i don't have any other accounts to corroborate with that, only things to track and potentially correlate later if things change. | 18:23 |
@clarkb:matrix.org | Ok, good to know | 18:23 |
@y2kenny:matrix.org | Does zuul provide a way to suspend service in preparation for shutdown? (for example, suspend processing of incoming event but allow existing jobs to run to completion?) | 18:36 |
@tristanc_:matrix.org | Kenny Ho: zuul-executor graceful should do that, see: https://zuul-ci.org/docs/zuul/latest/operation.html#executor | 18:55 |
@y2kenny:matrix.org | ok... I was wondering if there is something at the scheduler level but executor should be useful also. | 19:04 |
@jim:acmegating.com | the current version of zuul is designed to never need to be shut down :) | 19:07 |
@clarkb:matrix.org | > <@jim:acmegating.com> Clark: i'm aware of some users encountering problems with ansible crashing/hanging on old (2.9) ansible, but they haven't upgraded to ansible 5 yet. that's 2 major differences (crashing vs connection error, plus also version differences) so i doubt it's the same, but it's the closest thing i have to offer. i'm working with another user who is having connection problems, but i'm 95% sure that's related to job/node issues at this point, so i would not draw a line connecting those yet. tl;dr, i don't have any other accounts to corroborate with that, only things to track and potentially correlate later if things change. | 19:43 |
After further investigation I've pushed https://review.opendev.org/c/openstack/project-config/+/853536, suspecting that SSH scanners and brute forcers may be consuming all our slots.
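The contents of that change aren't reproduced here. Assuming the "slots" are sshd's pre-authentication connection limit (`MaxStartups start:rate:full` in sshd_config, default `10:30:100`; this is an assumption about what the change touches), the refusal behaviour documented in sshd_config(5) can be modelled in Python as below. It shows how a background of scanner connections could make legitimate Ansible connections fail only some of the time. `drop_probability` is a hypothetical helper and a linear float model, not sshd's exact integer arithmetic.

```python
def drop_probability(unauthenticated: int, start: int = 10,
                     rate: int = 30, full: int = 100) -> float:
    """Chance that sshd refuses a new connection under MaxStartups start:rate:full.

    Below `start` pending unauthenticated connections nothing is refused; at
    `start` the refusal chance is rate/100, rising linearly to 1.0 at `full`.
    """
    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 1.0
    base = rate / 100
    return base + (1 - base) * (unauthenticated - start) / (full - start)


for pending in (5, 10, 55, 100):
    print(pending, round(drop_probability(pending), 2))
```

With the defaults, 55 pending unauthenticated connections already means roughly a 65% chance that the next incoming connection, Ansible's included, is refused.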
@y2kenny:matrix.org | > <@jim:acmegating.com> the current version of zuul is designed to never need to be shut down :) | 19:44 |
ok so I am definitely using it wrong lol :) | ||
@jim:acmegating.com | Clark: wow. that would also explain the unique circumstances :) | 19:44 |
@jim:acmegating.com | Kenny Ho: all zuul components can be run redundantly so the whole system can rolling restart. opendev does that once a week automatically currently, and is considering doing it continuously. | 19:46 |
@jim:acmegating.com | depending on job runtime, it may not be fast, but it can be seamless | 19:46 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 853372: Convert GCE to state machine driver and remove simple https://review.opendev.org/c/zuul/nodepool/+/853372 | 21:51 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 853548: Deprecate Ansible 2, make Ansible 5 default https://review.opendev.org/c/zuul/zuul/+/853548 | 22:13 | |
@jim:acmegating.com | zuul-maint: ^ i think we should merge 853548, make a 6.X release, then remove ansible 5 and make a 7.0 release. how does that sound? | 22:15 |
@jim:acmegating.com | * zuul-maint: ^ i think we should merge 853548, make a 6.X release, then remove ansible 2 and make a 7.0 release. how does that sound? | 22:15 |
@pearcetyler:matrix.org | Any idea what would cause Zuul to ignore the retry limit on jobs? I'm using the default retry limit of 3, and it correctly fails after 3 attempts if there are legitimate errors, but sometimes gets stuck in a loop (I think on the `prepare-workspace` step). The workaround is to abort and re-queue the PR, but the manual intervention is not ideal | 22:15 |
@jim:acmegating.com | (then do it all again for ansible 6) | 22:15 |
@clarkb:matrix.org | > <@jim:acmegating.com> zuul-maint: ^ i think we should merge 853548, make a 6.X release, then remove ansible 2 and make a 7.0 release. how does that sound? | 22:16 |
No objections from me. Might be worth polling on zuul-discuss? I don't know how attached to specific ansible versions people are | ||
@jim:acmegating.com | Clark: well, i mean, they are EOL.... | 22:17 |
@jim:acmegating.com | i don't think we really have a choice? | 22:17 |
@jim:acmegating.com | Tyler Pearce: certainly not expected behavior | 22:18 |
@pearcetyler:matrix.org | For reference, here's a legitimate failure that honors retry attempts: | 22:19 |
@pearcetyler:matrix.org | So I know the limit is there/working | 22:19 |
@pearcetyler:matrix.org | The infinite retry happens probably once a week on average, for several months (sometimes a few weeks between issues, sometimes multiple times per week). I can't determine any kind of pattern here, and it does succeed after abort/re-queue | 22:20 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 853552: Add Ansible 6 https://review.opendev.org/c/zuul/zuul/+/853552 | 22:29 | |
@clarkb:matrix.org | corvus: ya that's a good point. | 22:30 |
@iwienand:matrix.org | corvus: minor point, but it would be nice if we could have the zuul-console docs in the same release as the other updates https://review.opendev.org/c/zuul/zuul/+/851942 | 23:14 |
@jim:acmegating.com | ianw: agreed; +2 here. Clark, that could use a re-review from you when you have a sec | 23:16 |
@iwienand:matrix.org | also it looks like the ansible 6 stuff just links into the base versions of console/stream so shouldn't be different, but https://review.opendev.org/q/topic:stream-2.7-container should regression test those bits back to 2.7 | 23:17 |
@michael_kelly_anet:matrix.org | hey folks. I'm exploring using Zuul in Kubernetes. right now I'm using the zuul operator, but I've run into a few areas that I'm not sure how to proceed. | 23:19 |
@michael_kelly_anet:matrix.org | Specifically, I'm trying to figure out how to deal with build results in Kubernetes, and it looks like some behaviours of the Kubernetes nodepool driver may make some of the examples not work as expected. | 23:20 |
@michael_kelly_anet:matrix.org | The upload-logs role doesn't appear to be sticking them anywhere useful for me, for example. How is one intended to accomplish this in a k8s operator based deployment? | 23:21 |
@iwienand:matrix.org | > <@iwienand:matrix.org> also it looks like the ansible 6 stuff just links into the base versions of console/stream so shouldn't be different, but https://review.opendev.org/q/topic:stream-2.7-container should regression test those bits back to 2.7 | 23:22 |
actually i'm going to respin that, i think the bottom change needs an update | ||
@jim:acmegating.com | Michael Kelly: you'll definitely need a place to put the build logs; that can either be a cloud object storage, or a static file server. the zuul operator doesn't automate that. | 23:23 |
@jim:acmegating.com | Michael Kelly: for example, if you're using a k8s from aws, you might want to store logs in s3. if you're doing it on-prem, you might want to set up a static server. | 23:24 |
@michael_kelly_anet:matrix.org | This is in a baremetal k8s cluster, fyi. | 23:24 |
@michael_kelly_anet:matrix.org | Right now, the upload-logs role (?) in the executor appears to be trying to rsync them to a subtree `/srv` | 23:24 |
@michael_kelly_anet:matrix.org | In its own filesystem. | 23:24 |
@jim:acmegating.com | Michael Kelly: the zuul-job roles for uploading logs are here, so these are the options available from the community: https://zuul-ci.org/docs/zuul-jobs/log-roles.html | 23:24 |
@jim:acmegating.com | the operator isn't going to set up any base jobs, so that's sort of an early task you'll need to do when deploying. | 23:26 |
@michael_kelly_anet:matrix.org | I have the base jobs setup based on https://zuul-ci.org/docs/zuul/latest/tutorials/quick-start.html#configure-a-base-job | 23:26 |
@michael_kelly_anet:matrix.org | (this is where I added `upload-logs` from) | 23:26 |
@jim:acmegating.com | ah, that's designed for a docker-compose set up where the volume is shared with a static server | 23:27 |
@michael_kelly_anet:matrix.org | Is a sensible approach here to just create a Kubernetes PV and mount this in via `jobVolumes`? | 23:27 |
@jim:acmegating.com | so basically, you'd need to tweak that to have it upload to the server (probably via ssh/scp) instead. the https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-add-fileserver can help with that | 23:27 |
@michael_kelly_anet:matrix.org | (can I mount this via `jobVolumes`?) | 23:28 |
@jim:acmegating.com | that would restrict your executor and log server to the same k8s node -- so maybe good for a prototype, but less great for production | 23:28 |
@michael_kelly_anet:matrix.org | jobvolumes only support hostpath I guess? | 23:28 |
@jim:acmegating.com | for production, i'd go with setting up a static server with a k8s-internal service and upload to it over ssh using those roles. | 23:29 |
@jim:acmegating.com | (honestly, i'm not sure it's that much harder to do the zuul secret encryption + roles, vs the k8s volume stuff, so i might even just suggest doing that from the start) | 23:30 |
@jim:acmegating.com | maybe we should adjust the tutorial to do that too... we might be keeping it too simple by having the shared volume. it's kind of a balancing act. | 23:31 |
@michael_kelly_anet:matrix.org | For sure. | 23:31 |
@michael_kelly_anet:matrix.org | Fwiw, in my case I have a storage device that will auto-provision PVs for me, so if it's just a case of needing support for non-hostpath volumes in there it could be easy enough to do that, assuming it was acceptable. | 23:32 |
@michael_kelly_anet:matrix.org | I have some fixes/changes I had intended to push for the operator anyway... | 23:32 |
@jim:acmegating.com | Michael Kelly: since i rambled a bit: the tl;dr is: I'd use https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-add-fileserver in combination with https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-upload-logs for your environment. | 23:32 |
@jim:acmegating.com | Michael Kelly: that arrangement (upload via ssh) lets you scale out to multiple executors and locate them on different k8s nodes than your log storage. executors are the first thing that needs to scale in zuul. | 23:34 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-client] 853352: Add zuul-client to zuul change queue https://review.opendev.org/c/zuul/zuul-client/+/853352 | 23:34 | |
@michael_kelly_anet:matrix.org | ```on different k8s nodes than your log storage``` | 23:35 |
In my case, using a PV would be on the external storage device. | ||
@michael_kelly_anet:matrix.org | Not on the K8S node. | 23:35 |
@michael_kelly_anet:matrix.org | Thus the question as to whether support for non-hostpath volumes would be acceptable. :) | 23:36 |
@jim:acmegating.com | right, but there are 2 pods involved here: the executor which needs to "upload" the log, and a static http server (apache/nginx/whatever) to serve the content | 23:36 |
@jim:acmegating.com | so if you want to just use a volume, then it'd have to be a hostpath that's shared between the pods | 23:36 |
@michael_kelly_anet:matrix.org | Totally. | 23:36 |
@jim:acmegating.com | so i think the only viable choices are (PV, hostpath, apache, executor all on one k8s node) or (upload via ssh, and we don't care what nodes apache and executor are on or what kind of volume is attached to apache) | 23:39 |
@jim:acmegating.com | unless i'm missing something... there's a lot of... variety... in k8s :) | 23:39 |
@michael_kelly_anet:matrix.org | The PV can be shared between pods even if not on the same node. | 23:39 |
@michael_kelly_anet:matrix.org | Either via NFS mount or via native ReadWriteMany support | 23:39 |
@michael_kelly_anet:matrix.org | (the latter depends on the CSI driver: in our case it definitely supports it) | 23:40 |
@jim:acmegating.com | okay that's what i was missing... then yes, you could probably do that and share the directory like in the quick start | 23:40 |
@jim:acmegating.com | Michael Kelly: in that case there are access controls to consider; basically, any trusted playbook in zuul will have full access to the log storage | 23:42 |
@michael_kelly_anet:matrix.org | Yea, I have to think that through a bit. | 23:42 |
@michael_kelly_anet:matrix.org | :) | 23:42 |
@michael_kelly_anet:matrix.org | The other question I had was around the prepare-workspace role.. looks like if I'm using the K8S nodepool driver, it uses `kubectl exec` in lieu of `ssh` to run commands and prepare-workspace appears to just puke at me. | 23:42 |
@jim:acmegating.com | that can be reduced to just certain playbooks using the ssh method since the ssh key is a zuul secret; but with a shared directory, basically all trusted playbooks get r/w access | 23:43 |
@michael_kelly_anet:matrix.org | prepare-workspace-git seems happy though. This is presumably the suggested soln here? | 23:43 |
@clarkb:matrix.org | I wonder if we should put a big warning on prepare-workspace and tell people to use prepare-workspace-git | 23:43 |
@michael_kelly_anet:matrix.org | Regarding the permissions on the volume mounts: That's a good point. So, maybe I fall back to the initial suggestion. | 23:44 |
@michael_kelly_anet:matrix.org | Clark: The tutorial tells you to use prepare-workspace :) | 23:44 |
@clarkb:matrix.org | prepare-workspace uses rsync which needs ssh and can't properly deal with git caching due to how git packs objects | 23:44 |
@jim:acmegating.com | Michael Kelly: yeah. and watch out for any roles using rsync | 23:44 |
@michael_kelly_anet:matrix.org | blerg. | 23:44 |
@jim:acmegating.com | there is a version of prepare workspace that uses oc rsync i think | 23:44 |
@michael_kelly_anet:matrix.org | Good to know (although this was pretty obvious from the situation with prepare-workspace, I suppose) | 23:44 |
@clarkb:matrix.org | prepare-workspace-git does all the manipulation with git which can figure out a delta if you've got a cache on the test node | 23:45 |
@michael_kelly_anet:matrix.org | Gotcha. | 23:45 |
-@gerrit:opendev.org- Ian Wienand proposed: | 23:45 | |
- [zuul/zuul] 853369: zuul-stream: validate on per node basis https://review.opendev.org/c/zuul/zuul/+/853369 | ||
- [zuul/zuul] 853208: zuul-stream : Test against a Python 2.7 container https://review.opendev.org/c/zuul/zuul/+/853208 | ||
@michael_kelly_anet:matrix.org | Just wanted to make sure I wasn't doing something overly bad there. | 23:45 |
@michael_kelly_anet:matrix.org | Broader question regarding the operator - how `production ready` is it considered to be? | 23:45 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!