Wednesday, 2022-08-17

-@gerrit:opendev.org- Ian Wienand proposed:		01:33
- [zuul/zuul] 853369: zuul-stream: validate on per node basis https://review.opendev.org/c/zuul/zuul/+/853369
- [zuul/zuul] 853208: zuul-stream : Test against a Python 2.7 container https://review.opendev.org/c/zuul/zuul/+/853208
- [zuul/zuul] 853382: zuul-stream: fix typo in file detection https://review.opendev.org/c/zuul/zuul/+/853382
@iwienand:matrix.org	things I learnt from that series ^	01:33
@iwienand:matrix.org	"ssh-keyscan localhost -p 2022" does not scan localhost on port 2022. It scans localhost, and two hosts called "-p" and "2022" which it vaguely warns about, but exits successfully anyway. So although you will think you have keys for the ssh listening on port 2022, you actually have them for the ssh listening on port 22	01:35
@iwienand:matrix.org	"ssh-keyscan -p 2022 localhost" will scan the keys on port 2022	01:35
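A minimal Python sketch of the ordering point above: options such as `-p` go before the host names. The helper name `keyscan_argv` is hypothetical and is not part of Zuul or OpenSSH; it only builds the argument list and does not run anything.

```python
import shlex


def keyscan_argv(hosts, port=22):
    """Build an ssh-keyscan argument list with options ahead of host names.

    Hypothetical helper for illustration only.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    for host in hosts:
        # A leading dash means an option ended up in the host list,
        # which is the mistake described in the chat above.
        if host.startswith("-"):
            raise ValueError(f"suspicious host name: {host!r}")
    return ["ssh-keyscan", "-p", str(port), *hosts]


print(shlex.join(keyscan_argv(["localhost"], port=2022)))
# prints: ssh-keyscan -p 2022 localhost
```

The resulting list could be handed to `subprocess.run(argv, capture_output=True, text=True)`. As noted above, the mis-ordered form still exits successfully, so checking the exit status alone would not catch it.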
@iwienand:matrix.org	"_inventory" is not a safe variable name to set_fact: with when working with something inventory-ish. You will end up with some mangled thing including ansible_facts	01:36
@iwienand:matrix.org	this will be quite unobvious until you've dumped the variables in debug many, many times :)	01:37
@iwienand:matrix.org	ansible is smart enough to find python3 in containers that have python2 in /usr/local/bin, and you want to set ansible_python_interpreter to explicitly use python2. this probably isn't that surprising in hindsight, but can mean you're not testing what you think you're testing	01:38
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul] 850718: zuul-stream : don't write out logfile for tasks in loops https://review.opendev.org/c/zuul/zuul/+/850718		01:54
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 852191: Web: disable scroll wheel zoom in job graph https://review.opendev.org/c/zuul/zuul/+/852191		02:56
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 852202: Web: fix tabs on project page https://review.opendev.org/c/zuul/zuul/+/852202		03:09
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul] 851942: docs: update console streaming docs https://review.opendev.org/c/zuul/zuul/+/851942		06:58
-@gerrit:opendev.org- Stephen Finucane proposed:		09:58
- [zuul/zuul] 853419: Add index page for 'config' docs directory https://review.opendev.org/c/zuul/zuul/+/853419
- [zuul/zuul] 853420: setup: Replace dash-separated config https://review.opendev.org/c/zuul/zuul/+/853420
- [zuul/zuul] 853421: docs: Remove unnecessary noise https://review.opendev.org/c/zuul/zuul/+/853421
@jim:acmegating.com	the gerrit folks are working on removing the extension point that the zuul-summary-plugin uses: https://gerrit-review.googlesource.com/c/gerrit/+/342836	13:54
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 853351: Add zuul change queue https://review.opendev.org/c/zuul/zuul/+/853351		14:13
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [zuul/zuul] 852289: Update unit test container setup and instructions https://review.opendev.org/c/zuul/zuul/+/852289		14:17
@clarkb:matrix.org	Has anyone else been noticing flakier than usual SSH connectivity with Ansible? OpenDev has seen this with Zuul + Ansible and nested Ansible in Zuul jobs. The SSH connections go between different cloud providers, within the same provider, and sometimes fail against localhost. Different distros all seem to be affected too (centos 8 and 9 and ubuntu focal at least)	17:38
@clarkb:matrix.org	The fact that we see this between different distros and with connections to localhost even makes me suspect something about ansible itself. But I haven't found any clear indications ansible is to blame.	17:39
@clarkb:matrix.org	It's also not super persistent. It happens enough to be noticeable, but also infrequent enough that it doesn't completely halt everything	17:50
@jim:acmegating.com	Clark: are all the tenants on ansible v5 now?	18:06
@clarkb:matrix.org	> <@jim:acmegating.com> Clark: are all the tenants on ansible v5 now?	18:09
Yes everyone should be defaulting to Ansible v5 now
@clarkb:matrix.org	I noticed that ansible-core did a release on monday (2.12.8) but our executors are still on the prior release 2.12.7.	18:11
@clarkb:matrix.org	also this was happening at least last week as well	18:12
@jim:acmegating.com	Clark: i'm aware of some users encountering problems with ansible crashing/hanging on old (2.9) ansible, but they haven't upgraded to ansible 5 yet. that's 2 major differences (crashing vs connection error, plus also version differences) so i doubt it's the same, but it's the closest thing i have to offer. i'm working with another user who is having connection problems, but i'm 95% sure that's related to job/node issues at this point, so i would not draw a line connecting those yet. tl;dr, i don't have any other accounts to corroborate with that, only things to track and potentially correlate later if things change.	18:23
@clarkb:matrix.org	Ok, good to know	18:23
@y2kenny:matrix.org	Does zuul provide a way to suspend service in preparation for shutdown? (for example, suspend processing of incoming event but allow existing jobs to run to completion?)	18:36
@tristanc_:matrix.org	Kenny Ho: zuul-executor graceful should do that, see: https://zuul-ci.org/docs/zuul/latest/operation.html#executor	18:55
@y2kenny:matrix.org	ok... I was wondering if there is something at the scheduler level but executor should be useful also.	19:04
@jim:acmegating.com	the current version of zuul is designed to never need to be shut down :)	19:07
@clarkb:matrix.org	> <@jim:acmegating.com> Clark: i'm aware of some users encountering problems with ansible crashing/hanging on old (2.9) ansible, but they haven't upgraded to ansible 5 yet. that's 2 major differences (crashing vs connection error, plus also version differences) so i doubt it's the same, but it's the closest thing i have to offer. i'm working with another user who is having connection problems, but i'm 95% sure that's related to job/node issues at this point, so i would not draw a line connecting those yet. tl;dr, i don't have any other accounts to corroborate with that, only things to track and potentially correlate later if things change.	19:43
After further investigation I've pushed https://review.opendev.org/c/openstack/project-config/+/853536 suspecting that maybe ssh scanners and brute forcers may be consuming all our slots.
@y2kenny:matrix.org	> <@jim:acmegating.com> the current version of zuul is designed to never need to be shut down :)	19:44
ok so I am definitely using it wrong lol :)
@jim:acmegating.com	Clark: wow. that would also explain the unique circumstances :)	19:44
@jim:acmegating.com	Kenny Ho: all zuul components can be run redundantly so the whole system can rolling restart. opendev does that once a week automatically currently, and is considering doing it continuously.	19:46
@jim:acmegating.com	depending on job runtime, it may not be fast, but it can be seamless	19:46
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 853372: Convert GCE to state machine driver and remove simple https://review.opendev.org/c/zuul/nodepool/+/853372		21:51
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 853548: Deprecate Ansible 2, make Ansible 5 default https://review.opendev.org/c/zuul/zuul/+/853548		22:13
@jim:acmegating.com	zuul-maint: ^ i think we should merge 853548, make a 6.X release, then remove ansible 2 and make a 7.0 release. how does that sound?	22:15
@pearcetyler:matrix.org	Any idea what would cause Zuul to ignore the retry limit on jobs? I'm using the default retry limit of 3, and it correctly fails after 3 attempts if there are legitimate errors, but sometimes gets stuck in a loop (I think on the `prepare-workspace` step). The workaround is to abort and re-queue the PR, but the manual intervention is not ideal	22:15
@jim:acmegating.com	(then do it all again for ansible 6)	22:15
@clarkb:matrix.org	> <@jim:acmegating.com> zuul-maint: ^ i think we should merge 853548, make a 6.X release, then remove ansible 2 and make a 7.0 release. how does that sound?	22:16
No objections from me. Might be worth polling on zuul-discuss? I don't know how attached to specific ansible versions people are
@jim:acmegating.com	Clark: well, i mean, they are EOL....	22:17
@jim:acmegating.com	i don't think we really have a choice?	22:17
@jim:acmegating.com	Tyler Pearce: certainly not expected behavior	22:18
@pearcetyler:matrix.org	For reference, here's a legitimate failure that honors retry attempts:	22:19
@pearcetyler:matrix.org	So I know the limit is there/working	22:19
@pearcetyler:matrix.org	The infinite retry happens probably once a week on average, for several months (Sometimes a few weeks between issue, sometimes multiple times per week). I can't determine any kind of pattern here, and it does succeed after abort/re-queue	22:20
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 853552: Add Ansible 6 https://review.opendev.org/c/zuul/zuul/+/853552		22:29
@clarkb:matrix.org	corvus: ya thats a good point.	22:30
@iwienand:matrix.org	corvus: minor point but would be nice if could have zuul-console docs in same release as the other updates https://review.opendev.org/c/zuul/zuul/+/851942	23:14
@jim:acmegating.com	ianw: agreed; +2 here. Clark: that could use a re-review from you when you have a sec	23:16
@iwienand:matrix.org	also it looks like the ansible 6 stuff just links into the base versions of console/stream so shouldn't be different, but https://review.opendev.org/q/topic:stream-2.7-container should regression test those bits back to 2.7	23:17
@michael_kelly_anet:matrix.org	hey folks. I'm exploring using Zuul in Kubernetes. right now I'm using the zuul operator, but I've run into a few areas that I'm not sure how to proceed.	23:19
@michael_kelly_anet:matrix.org	Specifically, trying to figure out how to deal with build results in Kubernetes + looks like there's some behaviours of the Kubernetes nodepool driver that make some of the examples maybe not work as expected.	23:20
@michael_kelly_anet:matrix.org	The upload-logs role doesn't appear to be sticking them anywhere useful for me, for example. How is one intended to accomplish this in a k8s operator based deployment?	23:21
@iwienand:matrix.org	> <@iwienand:matrix.org> also it looks like the ansible 6 stuff just links into the base versions of console/stream so shouldn't be different, but https://review.opendev.org/q/topic:stream-2.7-container should regression test those bits back to 2.7	23:22
actually i'm going to respin that, i think the bottom change needs an update
@jim:acmegating.com	Michael Kelly: you'll definitely need a place to put the build logs; that can either be a cloud object storage, or a static file server. the zuul operator doesn't automate that.	23:23
@jim:acmegating.com	Michael Kelly: for example, if you're using a k8s from aws, you might want to store logs in s3. if you're doing it on-prem, you might want to set up a static server.	23:24
@michael_kelly_anet:matrix.org	This is in a baremetal k8s cluster, fyi.	23:24
@michael_kelly_anet:matrix.org	Right now, the upload-logs role (?) in the executor appears to be trying to rsync them to a subtree of `/srv`	23:24
@michael_kelly_anet:matrix.org	In its own filesystem.	23:24
@jim:acmegating.com	Michael Kelly: the zuul-job roles for uploading logs are here, so these are the options available from the community: https://zuul-ci.org/docs/zuul-jobs/log-roles.html	23:24
@jim:acmegating.com	the operator isn't going to set up any base jobs, so that's sort of an early task you'll need to do when deploying.	23:26
@michael_kelly_anet:matrix.org	I have the base jobs setup based on https://zuul-ci.org/docs/zuul/latest/tutorials/quick-start.html#configure-a-base-job	23:26
@michael_kelly_anet:matrix.org	(this is where I added `upload-logs` from)	23:26
@jim:acmegating.com	ah, that's designed for a docker-compose set up where the volume is shared with a static server	23:27
@michael_kelly_anet:matrix.org	Is a sensible approach here to just create a Kubernetes PV and mount this in via `jobVolumes`?	23:27
@jim:acmegating.com	so basically, you'd need to tweak that to have it upload to the server (probably via ssh/scp) instead. the https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-add-fileserver can help with that	23:27
@michael_kelly_anet:matrix.org	(can I mount this via `jobVolumes`?)	23:28
@jim:acmegating.com	that would restrict your executor and log server to the same k8s node -- so maybe good for a prototype, but less great for production	23:28
@michael_kelly_anet:matrix.org	jobvolumes only support hostpath I guess?	23:28
@jim:acmegating.com	for production, i'd go with setting up a static server with a k8s-internal service and upload to it over ssh using those roles.	23:29
@jim:acmegating.com	(honestly, i'm not sure it's that much harder to do the zuul secret encryption + roles, vs the k8s volume stuff, so i might even just suggest doing that from the start)	23:30
@jim:acmegating.com	maybe we should adjust the tutorial to do that too... we might be keeping it too simple by having the shared volume. it's kind of a balancing act.	23:31
@michael_kelly_anet:matrix.org	For sure.	23:31
@michael_kelly_anet:matrix.org	Fwiw, in my case I have a storage device that will auto-provision PVs for me, so if it's just a case of needing support for non-hostpath volumes in there it could be easy enough to do that, assuming it was acceptable.	23:32
@michael_kelly_anet:matrix.org	I have some fixes/changes I had intended to push for the operator anyway...	23:32
@jim:acmegating.com	Michael Kelly: since i rambled a bit: the tl;dr is: I'd use https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-add-fileserver in combination with https://zuul-ci.org/docs/zuul-jobs/log-roles.html#role-upload-logs for your environment.	23:32
@jim:acmegating.com	Michael Kelly: that arrangement (upload via ssh) lets you scale out to multiple executors and locate them on different k8s nodes than your log storage. executors are the first thing that needs to scale in zuul.	23:34
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-client] 853352: Add zuul-client to zuul change queue https://review.opendev.org/c/zuul/zuul-client/+/853352		23:34
@michael_kelly_anet:matrix.org	```on different k8s nodes than your log storage```	23:35
In my case, using a PV would be on the external storage device.
@michael_kelly_anet:matrix.org	Not on the K8S node.	23:35
@michael_kelly_anet:matrix.org	Thus the question as to whether support for non-hostpath volumes would be acceptable. :)	23:36
@jim:acmegating.com	right, but there are 2 pods involved here: the executor which needs to "upload" the log, and a static http server (apache/nginx/whatever) to serve the content	23:36
@jim:acmegating.com	so if you want to just use a volume, then it'd have to be a hostpath that's shared between the pods	23:36
@michael_kelly_anet:matrix.org	Totally.	23:36
@jim:acmegating.com	so i think the only viable choices are (PV, hostpath, apache, executor all on one k8s node) or (upload via ssh, and we don't care what nodes apache and executor are on or what kind of volume is attached to apache)	23:39
@jim:acmegating.com	unless i'm missing something... there's a lot of... variety... in k8s :)	23:39
@michael_kelly_anet:matrix.org	The PV can be shared between pods even if not on the same node.	23:39
@michael_kelly_anet:matrix.org	Either via NFS mount or via native ReadWriteMany support	23:39
@michael_kelly_anet:matrix.org	(the latter depends on the CSI driver: in our case it definitely supports it)	23:40
@jim:acmegating.com	okay that's what i was missing... then yes, you could probably do that and share the directory like in the quick start	23:40
@jim:acmegating.com	Michael Kelly: in that case there are access controls to consider; basically, any trusted playbook in zuul will have full access to the log storage	23:42
@michael_kelly_anet:matrix.org	Yea, I have to think that through a bit.	23:42
@michael_kelly_anet:matrix.org	:)	23:42
@michael_kelly_anet:matrix.org	The other question I had was around the prepare-workspace role.. looks like if I'm using the K8S nodepool driver, it uses `kubectl exec` in lieu of `ssh` to run commands and prepare-workspace appears to just puke at me.	23:42
@jim:acmegating.com	that can be reduced to just certain playbooks using the ssh method since the ssh key is a zuul secret; but with a shared directory, it's basically all trusted playbooks get r/w access	23:43
@michael_kelly_anet:matrix.org	prepare-workspace-git seems happy though. This is presumably the suggested soln here?	23:43
@clarkb:matrix.org	I wonder if we should put a big warning on prepare-workspace and tell people to use prepare-workspace-git	23:43
@michael_kelly_anet:matrix.org	Regarding the permissions on the volume mounts: That's a good point. So, maybe I fall back to the initial suggestion.	23:44
@michael_kelly_anet:matrix.org	Clark: The tutorial tells you to use prepare-workspace :)	23:44
@clarkb:matrix.org	prepare-workspace uses rsync which needs ssh and can't properly deal with git caching due to how git packs objects	23:44
@jim:acmegating.com	Michael Kelly: yeah. and watch out for any roles using rsync	23:44
@michael_kelly_anet:matrix.org	blerg.	23:44
@jim:acmegating.com	there is a version of prepare workspace that uses oc rsync i think	23:44
@michael_kelly_anet:matrix.org	Good to know (although this was pretty obvious from the situation with prepare-workspace, I suppose)	23:44
@clarkb:matrix.org	prepare-workspace-git does all the manipulation with git which can figure out a delta if you've got a cache on the test node	23:45
@michael_kelly_anet:matrix.org	Gotcha.	23:45
-@gerrit:opendev.org- Ian Wienand proposed:		23:45
- [zuul/zuul] 853369: zuul-stream: validate on per node basis https://review.opendev.org/c/zuul/zuul/+/853369
- [zuul/zuul] 853208: zuul-stream : Test against a Python 2.7 container https://review.opendev.org/c/zuul/zuul/+/853208
@michael_kelly_anet:matrix.org	Just wanted to make sure I wasn't doing something overly bad there.	23:45
@michael_kelly_anet:matrix.org	Broader question regarding the operator - how `production ready` is it considered to be?	23:45
