Thursday, 2023-05-25

-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/nodepool] 884234: Double check node request during node cleanup https://review.opendev.org/c/zuul/nodepool/+/884234 06:11
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/nodepool] 884241: Double check node allocation in a few more places https://review.opendev.org/c/zuul/nodepool/+/884241 06:14
@jjbeckman:matrix.orgAh, I missed your comment. Thank you for the nudge in the right direction.08:42
I've set the following in `main.yaml`
```
- authorization-rule:
    name: rule-name
    conditions:
      - groups: {redacted}
- api-root:
    authentication-realm: {redacted}.com
    access-rules: rule-name
- tenant:
    name: example-tenant
    admin-rules:
      - rule-name
```
As a result, when accessing the root URL, after authentication, I get stuck with the message "Login in progress You will be redirected shortly...".
I assume something is missing from my configuration?
@jjbeckman:matrix.orgI've been able to get "authentication + getting privileged actions working in the Web UI". Thanks so much for your advice :D08:43
@mhuin:matrix.org🎉08:43
@flaper87:matrix.org👋 Is there a way to re-trigger a post job from the UI? I'm looking for  a way to retrigger failed jobs manually (periodic jobs, jobs in the post pipeline, etc)08:57
@mhuin:matrix.org> <@flaper87:matrix.org> 👋 Is there a way to re-trigger a post job from the UI? I'm looking for  a way to retrigger failed jobs manually (periodic jobs, jobs in the post pipeline, etc)08:58
if you are authenticated and your token matches the admin rules set for your tenant, you can re-enqueue from the buildset's summary page
@flaper87:matrix.orgmhu:  a-ha, thanks. Then I think my problem is the admin-rules because I don't see it anywhere. I'm using an OAuth proxy in front of Zuul rather than the Zuul oauth support for now. I'll probably switch soon. That said, do you have an example on how to make all users admin? This is a small org and we don't really need to have multiple roles just yet09:02
@flaper87:matrix.org"Make authenticated users admin" should be enough09:04
@flaper87:matrix.org * "Make authenticated users admin" should be enough, looking at this: https://zuul-ci.org/docs/zuul/latest/developer/specs/tenant-scoped-admin-web-API.html#access-control-configuration 09:04
@mhuin:matrix.orgI'm not sure an oauth proxy would work for authentication in the web UI. IIRC the openid lib used in the UI expects to fetch an auth token09:04
@mhuin:matrix.orgI think you could write a trivial rule, like some condition about the issuer or the audience since this should be the same for every user and under your control09:05
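For reference, such a catch-all rule could be written as a condition on the `iss` claim, which is identical for every user of the same identity provider (illustrative sketch — the rule name and issuer value here are placeholders, and the issuer must match what your provider actually puts in its tokens):

```
- authorization-rule:
    name: all-authenticated-users
    conditions:
      # Every token minted by this issuer matches, so all
      # authenticated users become tenant admins.
      - iss: https://accounts.google.com

- tenant:
    name: example-tenant
    admin-rules:
      - all-authenticated-users
```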
@flaper87:matrix.orgmhu: thanks! Do you know if there's an easy way to test authorization rules without fully deploying them? 09:15
@mhuin:matrix.orgNo but that could be a good feature to add to the zuul-admin CLI09:16
@flaper87:matrix.orgI've enabled oauth with Google but it's definitely not matching my rule09:16
@mhuin:matrix.orgwhat I do is grab a JWT - you can do that on the user page once you're authenticated on zuul - and analyze it in the debugger at https://jwt.io 09:16
@flaper87:matrix.orgah, thanks! That's useful. I also just found out that Zuul's web client stores a `zuul_auth_param` cookie in the session storage09:20
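Since a JWT's payload is just base64url-encoded JSON, the claims can also be inspected locally without pasting the token into an external site (a debugging sketch only — it does not verify the signature, so never use it to decide whether to trust a token):

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Return the (unverified) claims from a JWT's payload segment."""
    payload_b64 = token.split(".")[1]
    # base64url data must be padded to a multiple of 4 before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

Feed it the token from the user page (or the `zuul_auth_param` value from session storage) and check that the `iss`/`aud` claims line up with the conditions in your authorization rules.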
@woju:invisiblethingslab.comHi, I'm trying to deploy zuul and I'm hitting this error when starting scheduler:13:12
```
2023-05-25 12:05:14,846 ERROR zuul.Scheduler: Error starting Zuul:
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/cmd/scheduler.py", line 102, in run
    self.sched.prime(self.config)
  File "/opt/zuul/lib/python3.11/site-packages/zuul/scheduler.py", line 1054, in prime
    tenant = loader.loadTenant(
             ^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/configloader.py", line 2853, in loadTenant
    new_tenant = self.tenant_parser.fromYaml(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/configloader.py", line 1832, in fromYaml
    self._cacheTenantYAML(abide, tenant, loading_errors, min_ltimes,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/configloader.py", line 2152, in _cacheTenantYAML
    future.result()
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/configloader.py", line 2211, in _cacheTenantYAMLBranch
    self._updateUnparsedBranchCache(
  File "/opt/zuul/lib/python3.11/site-packages/zuul/configloader.py", line 2325, in _updateUnparsedBranchCache
    min_ltimes[source_context.project_canonical_name][
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'localhost/gramine-zuul-config'
2023-05-25 12:05:14,848 DEBUG zuul.Scheduler: Stopping scheduler
```
(Environment: Debian testing, python 3.11, virtualenv with latest version of zuul from pip as of about a week ago, systemd with custom units)
@woju:invisiblethingslab.comWhat did I do wrong and/or where to look for problems?13:12
@woju:invisiblethingslab.comI've tried reading through `_cacheTenantYAML` function, but I don't understand much13:12
-@gerrit:opendev.org- Matthieu Huin https://matrix.to/#/@mhuin:matrix.org proposed: [zuul/zuul-operator] 884393: Update zuul-ci, opendevorg containers to fetch from quay.io https://review.opendev.org/c/zuul/zuul-operator/+/884393 13:13
@woju:invisiblethingslab.comalso there's similar traceback in the log about one another project, same function and line13:14
@fungicide:matrix.orgwoju: is that project included in the tenant config?13:21
@woju:invisiblethingslab.comyes, it is13:22
@woju:invisiblethingslab.comlet me paste13:22
@woju:invisiblethingslab.com 13:22
```
- tenant:
    name: gramine
    source:
      local:
        config-projects:
          - gramine-zuul-config
          - zuul-base-jobs
          - zuul-jobs
        untrusted-projects:
          - gramine
```
@woju:invisiblethingslab.com"local" is `file:///srv/local` and those repositories are cloned there, though some have only stub `.zuul.yaml`13:23
@woju:invisiblethingslab.comthere's single pipeline `check` and an example job in `gramine*` repos13:23
@fungicide:matrix.org`local` is using the git source connection driver?13:25
@woju:invisiblethingslab.com 13:26
```
[connection local]
driver=git
baseurl=file:///srv/git
poll_delay=10
```
@woju:invisiblethingslab.comsorry I'm not pasting the whole file, but it's full of secrets :)13:26
@fungicide:matrix.orgsure, just making sure i understand how it's trying to access those repositories13:27
@fungicide:matrix.orgi'm a little out of my depth on this question, having only used the git driver for trivial cases (it doesn't really have usable triggers besides ref-updated, and isn't appropriate for gating proposed changes as there's no code review workflow visible to zuul). hopefully someone else can either spot the configuration mistake or knows whether this is a limitation of the git driver13:31
@woju:invisiblethingslab.comfyi those keys are really missing, from `min_ltimes`, I've quickly added `self.log.debug` and:13:33
```
2023-05-25 13:31:29,542 DEBUG zuul.TenantParser: _cacheTenantYAMLBranch(..., min_ltimes={'localhost/zuul-base-jobs': {'master': 132}, 'localhost/zuul-jobs': {'master': 133}})
```
@woju:invisiblethingslab.comeventually I'll move untrusted repo(s) to github, but right now I wanted to just get something running13:34
@fungicide:matrix.orgyeah, possible it's a corner case where there's missing implementation and we don't have test coverage, i'm trying to follow the source and see if i can understand how that could happen13:38
@woju:invisiblethingslab.comI've followed this up to `zk/layout.py:LayoutStateStore.getMinLtimes` and looks like this is data from zookeeper:13:48
@woju:invisiblethingslab.com 13:48
```
2023-05-25 13:46:07,084 DEBUG zuul.LayoutStore: getMinLtimes(layout_state=<LayoutState gramine: ltime=252, hostname=ip-172-26-9-217.eu-central-1.compute.internal, last_reconfigured=1684776455>) path='/zuul/layout-data/c20051a77d95465b920531cde9bf2191'
2023-05-25 13:46:07,085 DEBUG zuul.LayoutStore: data=b'{"min_ltimes": {"localhost/zuul-base-jobs": {"master": 132}, "localhost/zuul-jobs": {"master": 133}}}'
2023-05-25 13:46:07,086 DEBUG zuul.ConfigLoader: loadTenant(..., tenant_name='gramine', min_ltimes={'localhost/zuul-base-jobs': {'master': 132}, 'localhost/zuul-jobs': {'master': 133}})
2023-05-25 13:46:07,118 DEBUG zuul.TenantParser: _cacheTenantBranch(..., min_ltimes={'localhost/zuul-base-jobs': {'master': 132}, 'localhost/zuul-jobs': {'master': 133}})
2023-05-25 13:46:07,118 DEBUG zuul.TenantParser: _cacheTenantYAMLBranch(..., min_ltimes={'localhost/zuul-base-jobs': {'master': 132}, 'localhost/zuul-jobs': {'master': 133}})
```
@woju:invisiblethingslab.comI have no idea what to do now13:49
@fungicide:matrix.orglooks like this may have changed a couple of months back in https://review.opendev.org/835100 which introduced min_ltimes, though that didn't touch the drivers so i suspect it's not a driver-specific behavior (unless it's relying on aspects of the other drivers which the git driver lacks)13:51
@woju:invisiblethingslab.comthanks for the link, I'll read what I can13:53
@jim:acmegating.comwoju: assuming that your system is dead in the water and you can handle losing the running state information, you might consider running this command to make sure that it's not some kind of data corruption: https://zuul-ci.org/docs/zuul/latest/client.html#delete-state 13:53
@woju:invisiblethingslab.comyeah, it's pretty much empty13:53
@jim:acmegating.comthat's the big red "reset everything" button13:53
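For context, that reset amounts to stopping every Zuul component and wiping Zuul's state in ZooKeeper with the `zuul-admin` CLI (a rough outline only — see the linked delete-state documentation for the exact procedure and caveats):

```
# Stop all Zuul components first (scheduler, web, executors, mergers),
# then clear Zuul's ZooKeeper state:
zuul-admin delete-state

# On restart, the scheduler repopulates ZooKeeper from the tenant
# configuration; all running builds and queue state are lost.
```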
@fungicide:matrix.orgalso i suppose if /zuul/layout-data isn't writeable by the service that could cause cache problems13:53
@fungicide:matrix.orgmmm, that's the zk path though not a disk path13:54
@fungicide:matrix.organd presumably zk is working for other aspects of the system13:54
@woju:invisiblethingslab.comyes, this is `self.log.debug(f'{path=}')`, a local variable from this function13:55
@fungicide:matrix.org * <del>also i suppose if /zuul/layout-data isn't writeable by the service that could cause cache problems</del>13:55
@woju:invisiblethingslab.com> <@jim:acmegating.com> woju: assuming that your system is dead in the water and you can handle losing the running state information, you might consider running this command to make sure that it's not some kind of data corruption: https://zuul-ci.org/docs/zuul/latest/client.html#delete-state 13:58
this has helped, thank you!
@fungicide:matrix.orgzuul equivalent of "turn it off and on again" ;)14:00
@fungicide:matrix.orgwoju: so it's getting further after clearing the state?14:01
@woju:invisiblethingslab.comit got `defaultdict`s as this min_ltimes= argument, and the service started (I still have this ad-hoc logging)14:07
@woju:invisiblethingslab.comI'll continue setting it up, let's see how far I'll go14:07
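The failure mode observed above can be reduced to a small illustration: a `min_ltimes` mapping deserialized from the JSON blob in ZooKeeper is a plain dict, whereas a freshly built one is a `defaultdict` that tolerates projects it has never seen (a minimal sketch using the project names from the log; the default value here is illustrative, not Zuul's actual default):

```python
from collections import defaultdict

# Freshly built min_ltimes: unknown projects/branches get a default ltime
fresh = defaultdict(lambda: defaultdict(lambda: -1))
print(fresh["localhost/gramine-zuul-config"]["master"])  # -1, no KeyError

# min_ltimes restored from a JSON blob is a plain dict: unknown keys raise
restored = {"localhost/zuul-base-jobs": {"master": 132},
            "localhost/zuul-jobs": {"master": 133}}
try:
    restored["localhost/gramine-zuul-config"]["master"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # the error from the scheduler traceback
```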
@fungicide:matrix.orgi wonder if you could have been running a version of zuul from before the schema change and the migration didn't get applied to the existing data when updating to the newer version14:09
@woju:invisiblethingslab.comprobably not a chance, I've started deploying this installation two weeks ago and this change is from March?14:10
@fungicide:matrix.orgthe change merged in march but may not have appeared in a release until much later14:11
@woju:invisiblethingslab.comah, yes, I didn't check this14:11
@fungicide:matrix.orger, no ignore me. i can't remember what year it is any more. the change that updated the schema for that was from march of last year, not this year14:12
@woju:invisiblethingslab.com:)14:12
@fungicide:matrix.orgbut as far as release timelines, there was a several-month gap between 8.2.0 (2023-02-21) and 8.3.0 (2023-05-16) so if you started a couple of weeks ago then something related may have changed in 8.3.0/8.3.1 which led to the existing data not being viable14:14
@woju:invisiblethingslab.comlet me check the versions14:15
@woju:invisiblethingslab.compython's package metadata says 8.3.114:15
@woju:invisiblethingslab.comI suspect I had something misconfigured and data in zookeeper got corrupted14:17
@fungicide:matrix.orgthat seems increasingly likely. the last change to the zk model was in 8.2.0 and i don't see any warnings in the release notes about any additional steps needed on update either14:18
@tristanc_:matrix.orgcorvus: Clark looking at https://review.opendev.org/c/zuul/zuul-operator/+/881245 , it seems like kubernetes is no longer pulling the image from the intermediate registry, is this a known issue with quay.io? 15:39
@jim:acmegating.comtristanC: it seems like the docker-based solutions for speculative execution don't work for images with quay.io.  so we're moving all the other repos to podman to address that.  i'm not sure exactly what is broken with that k8s setup, but i suspect we might need to switch to microk8s since that is the most recent thing that ianw got working with speculative registry jobs.15:44
@tristanc_:matrix.orgcorvus: alright, I meant to try that, would you mind if I update your change?15:44
@jim:acmegating.comtristanC: please do and thanks!15:45
@jim:acmegating.comi've been focusing on the other repos.  i think we're 99% there; the podman changes are ready for both zuul and nodepool, but i want to make sure i fully understand the implications of some cgroup stuff in https://review.opendev.org/883952 before we actually merge them.15:46
@jim:acmegating.comtristanC: btw there should be some jobs in zuul-jobs using the microk8s roles for reference15:48
-@gerrit:opendev.org- Tristan Cacqueray proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-operator] 881245: Publish container images to quay.io https://review.opendev.org/c/zuul/zuul-operator/+/881245 15:48
@tristanc_:matrix.orgcorvus: on 883952, it seems like the service is restarting because of `PermissionError: [Errno 13] Permission denied: '/opt/dib/images/builder_id.txt'`16:07
@jim:acmegating.comtristanC: agreed, i have a held node and it looks like podman is not setting up the expected subuidmap, so i'm trying to work out the right set of arguments for that16:13
-@gerrit:opendev.org- Tristan Cacqueray proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-operator] 881245: Publish container images to quay.io https://review.opendev.org/c/zuul/zuul-operator/+/881245 18:19
@jim:acmegating.comokay so the issue with 883952 is that we bind mount files owned by user 10001 into the container.  that is the 'nodepool' user on the host, and also the 'nodepool' user in the container.  we are running as the 'zuul' user on the host, and the zuul user on the host does not have permission to map host uid 10001 into the container (it's allowed uids 1000 (itself) and 100000-65535).  so this is a design mismatch between rootless and root containers.  this test mimics the opendev deployment style where we have zuul and nodepool users on the host which are intended to match the in-container users of the same name and uid for our bind mounts.  i think we can resolve it by either doing "sudo podman" so we don't end up with a userns and we mount everything in (so basically, run like docker).  or redesign the test and opendev deployments for rootless use (which probably means chowning this directory on the host to be owned by "zuul" which we would then either map to the "nodepool" user in the container, or the "root" user (and then run the nodepool process in the container as root, which isn't really root, it's the user that launched the container (ie, zuul))18:23
@jim:acmegating.comClark: fyi opendev podman thoughts ^18:24
@jim:acmegating.com * okay so the issue with 883952 is that we bind mount files owned by user 10001 into the container.  that is the 'nodepool' user on the host, and also the 'nodepool' user in the container.  we are running as the 'zuul' user on the host, and the zuul user on the host does not have permission to map host uid 10001 into the container (it's allowed uids 1000 (itself) and 100000-165535).  so this is a design mismatch between rootless and root containers.  this test mimics the opendev deployment style where we have zuul and nodepool users on the host which are intended to match the in-container users of the same name and uid for our bind mounts.  i think we can resolve it by either doing "sudo podman" so we don't end up with a userns and we mount everything in (so basically, run like docker).  or redesign the test and opendev deployments for rootless use (which probably means chowning this directory on the host to be owned by "zuul" which we would then either map to the "nodepool" user in the container, or the "root" user (and then run the nodepool process in the container as root, which isn't really root, it's the user that launched the container (ie, zuul))18:25
@clarkb:matrix.orgcorvus: another option may be to exec podman as nodepool on the host?18:26
@clarkb:matrix.orgthen its uids should all match up by default?18:26
@jim:acmegating.comyes18:26
@clarkb:matrix.orgI think ansible has `become` tooling to do that18:27
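A play along those lines might look like this (hypothetical sketch — host group, image, and mount path are illustrative; it assumes the `nodepool` account exists on the host and `become` can switch to it):

```
- hosts: nodepool-builders
  tasks:
    - name: Start nodepool-builder rootless as the nodepool user
      become: true
      become_user: nodepool
      ansible.builtin.command: >-
        podman run -d --name nodepool-builder
        -v /opt/dib/images:/opt/dib/images
        quay.io/zuul-ci/nodepool-builder
```

Running podman under the same account that owns the bind-mounted data sidesteps the subuid mapping problem, since the invoking user's own uid needs no subordinate range.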
@jim:acmegating.comthat actually makes the whole thing sound slightly less insane :)18:27
@jim:acmegating.com(it complicates the test a little bit though)18:27
@tristanc_:matrix.orgif the host uid match the container uid, then are we using `--userns keep-id` ?18:28
@jim:acmegating.comwell, we aren't now, but it might be an option if we go with clark's suggestion18:29
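For reference, `--userns keep-id` makes rootless podman map the invoking user's uid/gid to the same ids inside the container rather than remapping them (sketch; image and mount path are illustrative):

```
# Invoked as the nodepool user: uid 10001 on the host stays uid 10001
# in the container, so bind-mounted files keep sensible ownership.
podman run --rm --userns keep-id \
    -v /opt/dib/images:/opt/dib/images \
    quay.io/zuul-ci/nodepool-builder
```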
@jim:acmegating.com(oh, another option would be to, on the host, allow the zuul user to use the nodepool uid; ie, set up /etc/subuid to allow that)18:32
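That would be a small `/etc/subuid` change (illustrative entries — the format is `user:first_uid:count`, so the middle line grants zuul exactly the single host uid 10001 on top of its default subordinate range):

```
zuul:1000:1
zuul:10001:1
zuul:100000:65536
```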
-@gerrit:opendev.org- Tristan Cacqueray proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-operator] 881245: Publish container images to quay.io https://review.opendev.org/c/zuul/zuul-operator/+/881245 18:37
@jim:acmegating.comClark: hrm, podman as the nodepool user isn't working out that well; if i su to nodepool, i'm not actually able to run podman apparently due to login issues19:11
@jim:acmegating.comfor one, it keeps trying to do this:  `mkdir /run/user/1000/libpod/tmp: permission denied`  (nodepool is user 10001, zuul is user 1000)19:11
@mordred:inaugust.comcorvus: is it any difference if you su - nodepool?19:12
-@gerrit:opendev.org- Tristan Cacqueray proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-operator] 881245: Publish container images to quay.io https://review.opendev.org/c/zuul/zuul-operator/+/881245 19:12
@jim:acmegating.commordred: yeah, i also did `sudo su -` then `su - nodepool`19:12
@mordred:inaugust.comwow19:12
@jim:acmegating.comnothing in env says 1000, so idunno where it's getting that19:13
@mordred:inaugust.comthat seems like a bug in podman19:13
@mordred:inaugust.com(which is a shame - because I really like the "run podman as the nodepool user" - as that models the intended reduced privileges really nice)19:14
@jim:acmegating.comyeah it sounds like the shortest way out of this maze...19:14
@clarkb:matrix.orgIt really wants that systemd session stuff19:17
@clarkb:matrix.orgYou probably need to login in a way that triggers systemd session stuff. I don't know how to do that19:17
@mordred:inaugust.comwhat's the command that sounds like systemd but is less systemd? systemd-launch or systemd-runcommand or something like that?19:19
@jim:acmegating.comClark: mordred tristanC i'm about out of ideas.  this issue is pretty easy to replicate and test locally, but for now, i think we're going to have to leave the test running as root (with "sudo podman") and likewise opendev will probably have to run podman containers as root.19:27
@jim:acmegating.comif we're okay accepting that, then i think we can proceed.  if we aren't okay with that, then i think someone needs to figure out an alternative, or we roll back to dockerhub.19:28
@tristanc_:matrix.orgcorvus: would it be possible to mount podman volumes instead of existing host directory? I think doing that should handle uid mapping transparently.19:29
@clarkb:matrix.org> <@jim:acmegating.com> if we're okay accepting that, then i think we can proceed.  if we aren't okay with that, then i think someone needs to figure out an alternative, or we roll back to dockerhub.19:30
I don't think this is any worse than with docker so should be fine? With possibility of improvement in the future
@jim:acmegating.comtristanC:  the opendev admins like to have things like /opt/dib bind mounted so it's accessible outside of containers and allows for complete removal and replacement of containers without losing vital data19:31
@jim:acmegating.comClark: i agree, it's no worse so i think we can live with it.19:32
@tristanc_:matrix.orgcorvus: you can still access the volume data from the host through ~/.local/share/containers/storage/volumes19:33
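Concretely, for a rootless setup with default storage configuration, the volume's on-host location can be queried directly (volume name is illustrative):

```
podman volume create dib-images
podman volume inspect dib-images --format '{{ .Mountpoint }}'
# prints a path under ~/.local/share/containers/storage/volumes/
```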
@tristanc_:matrix.orgalso, for the record, we had quite a few issues running images featuring a USER statement in openshift, and in the end we removed the user creation inside the image, and let the runtime assign the uid.19:36
-@gerrit:opendev.org- Tristan Cacqueray proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-operator] 881245: Publish container images to quay.io https://review.opendev.org/c/zuul/zuul-operator/+/881245 19:43
@fungicide:matrix.orgproblem with the container runtime assigning a uid is that you don't have a stable uid on the host end, so if you replace that container and mount the old data back in, it may no longer be owned by the same uids20:13
@avass:matrix.vassast.org> <@fungicide:matrix.org> problem with the container runtime assigning a uid is that you don't have a stable uid on the host end, so if you replace that container and mount the old data back in, it may no longer be owned by the same uids20:14
I think I've used gods to work around that. But maybe that doesn't work for your issue
@avass:matrix.vassast.orggids* :)20:14
@fungicide:matrix.orgwell, also if we want to make things from the host that are owned by users inside the container, we need to be able to predict the uids the container's users will have20:14
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 883985: Add error information to config-errors API endpoint https://review.opendev.org/c/zuul/zuul/+/883985 22:22
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 883985: Add error information to config-errors API endpoint https://review.opendev.org/c/zuul/zuul/+/883985 22:41

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!