@avass:vassast.org | corvus: I get the feeling that HA schedulers are very close? :) | 08:53 |
---|---|---|
@avass:vassast.org | has anything changed in bwrap? I upgraded to the latest master and executors started failing to run bwrap with the following error: | 12:51 |
```
zuul@zuul-executor-0:/$ bwrap
bwrap: Unexpected capabilities but not setuid, old file caps config?
```
@avass:vassast.org | the `4.10.1` executor image is working but `4.10.2`, `4.10.3` and `4.10.4` are not. | 13:02 |
@avass:vassast.org | tobiash: have you seen anything like that in openshift ^ ? | 13:04 |
@avass:vassast.org | hmm I can get bwrap to run by following the suggestion from this github comment: | 13:10 |
https://github.com/containers/bubblewrap/issues/380#issuecomment-713842550 | ||
@avass:vassast.org | (just realized federation had broken again so I've been talking to myself until now :) | 13:18 |
@nhicher:matrix.org | Hello, the 4.10.x tags for the latest zuul releases are not present on https://opendev.org/zuul/zuul/tags, and the https://tarballs.opendev.org/zuul/zuul/ redirect is broken | 13:20 |
@avass:vassast.org | oh zuul `4.10.1` uses bwrap `0.3.1` while zuul `4.10.2+` is using bwrap `0.4.1` | 13:58 |
@tristanc_:matrix.org | adding `setpriv --ambient-caps '-all'` to the bwrap command sounds like a good mitigation | 13:59 |
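The suggested mitigation amounts to launching bwrap through `setpriv` so that the ambient capability set is cleared before bwrap starts. A minimal Python sketch of prefixing an argv list; the helper name and the sample bwrap arguments are hypothetical and not taken from Zuul's executor.

```python
from typing import List

# util-linux setpriv option that removes all capabilities from the ambient
# set before exec'ing the wrapped command.
SETPRIV_PREFIX = ["setpriv", "--ambient-caps", "-all"]


def with_ambient_caps_dropped(cmd: List[str]) -> List[str]:
    """Return a new argv that runs `cmd` with an empty ambient set.

    Hypothetical helper: elements are handed to exec() verbatim, and neither
    the input list nor SETPRIV_PREFIX is modified.
    """
    return SETPRIV_PREFIX + list(cmd)


if __name__ == "__main__":
    bwrap_cmd = ["bwrap", "--ro-bind", "/", "/", "--", "true"]
    print(with_ambient_caps_dropped(bwrap_cmd))
```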
@jpew:matrix.org | FYI: I was able to work around kubectl's 4 hour single connection time limit by using ansible `async` in my long running task | 14:00 |
@avass:vassast.org | tristanC: yeah I'm just gonna take a look at what's causing bwrap 0.4.1 to fail first | 14:06 |
@avass:vassast.org | tristanC: I guess the difference is that the 0.4.1 bwrap package doesn't setuid bwrap while 0.3.1 does | 14:12 |
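One way to check which executor image installs bwrap setuid is to look for the S_ISUID bit (shown as `s` in `ls -l`, e.g. `-rwsr-xr-x`). A minimal Python sketch; the helper names are hypothetical.

```python
import os
import shutil
import stat


def mode_is_setuid(mode: int) -> bool:
    """True if the S_ISUID bit is set in an st_mode value."""
    return bool(mode & stat.S_ISUID)


def path_is_setuid(path: str) -> bool:
    """Check whether the file at `path` is installed setuid."""
    return mode_is_setuid(os.stat(path).st_mode)


if __name__ == "__main__":
    bwrap = shutil.which("bwrap")
    if bwrap is None:
        print("bwrap not found on PATH")
    else:
        print(bwrap, "is setuid" if path_is_setuid(bwrap) else "is not setuid")
```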
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: setuid /usr/bin/bwrap https://review.opendev.org/c/zuul/zuul/+/816176 | 14:17 | |
@avass:vassast.org | tristanC: that's ^ also a solution | 14:20 |
@avass:vassast.org | looks like it was caused by the python-3.8-bullseye upgrade: https://review.opendev.org/c/zuul/zuul/+/813711 | 14:30 |
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: setuid /usr/bin/bwrap https://review.opendev.org/c/zuul/zuul/+/816176 | 14:30 | |
@jim:acmegating.com | maybe we should update the quickstart docker-compose to use non-root users | 14:45 |
@avass:vassast.org | except that I can't reproduce it in docker | 14:48 |
@jim:acmegating.com | oh i see; i've read the github issue now | 14:51 |
@jim:acmegating.com | Albin Vass: then i think the commit message in 816176 may be a little misleading? | 14:53 |
@avass:vassast.org | corvus: yeah | 14:53 |
@clarkb:matrix.org | Ya it works in OpenDev and we don't run as root? | 15:01 |
@avass:vassast.org | I wonder if docker causes this check to behave differently: https://github.com/containers/bubblewrap/blob/bae85baf7208c4acddd9cf032059d1429f179e4a/bubblewrap.c#L749 | 15:11 |
-@gerrit:opendev.org- Ashley Bullock proposed: [zuul/zuul] 759003: Add initial bitbucket cloud driver using webhooks https://review.opendev.org/c/zuul/zuul/+/759003 | 15:13 | |
@avass:vassast.org | oh just realized I read the setpriv command the wrong way. The pod in OpenShift has ambient caps, which cause this check to fail: | 15:20 |
https://github.com/containers/bubblewrap/blob/bae85baf7208c4acddd9cf032059d1429f179e4a/bubblewrap.c#L780 | ||
while the docker container doesn't have any ambient capabilities set | ||
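`setpriv -d` reports the process's capability sets; the kernel also exposes them as hex bitmasks in the `Cap*` lines of `/proc/<pid>/status`. A minimal Python sketch for checking whether a container starts with ambient capabilities (as the OpenShift pod did and the docker container did not); the helper names are hypothetical.

```python
import os
from typing import Dict, List

CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_masks(status_text: str) -> Dict[str, int]:
    """Extract the hex capability masks from /proc/<pid>/status text."""
    masks: Dict[str, int] = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_FIELDS:
            masks[key] = int(value.strip(), 16)
    return masks


def set_bits(mask: int) -> List[int]:
    """Capability numbers (bit positions) that are set in `mask`."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as status:
        masks = parse_cap_masks(status.read())
    # An empty list means no ambient caps, like the docker container above.
    print("ambient capabilities:", set_bits(masks.get("CapAmb", 0)))
```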
@clarkb:matrix.org | nhicher: I'm not sure why gitea doesn't produce tarballs for 4.x.y. We do our best to disable that stuff in gitea (did so for releases), but there doesn't seem to be a good way to disable it for the tag tarballs. The tags themselves are in git. For the tarballs.opendev.org issue it appears that something may be publishing the zuulweb js to that location (that would be an error of zuul's release jobs somewhere) | 15:22 |
@clarkb:matrix.org | nhicher: the sdists are also published to pypi: https://pypi.org/project/zuul/#files I think that may be the best location to grab them for now while the other stuff is sorted out | 15:23 |
@nhicher:matrix.org | Clark: thanks, I will use pypi to get the tgz | 15:25 |
@nhicher:matrix.org | Clark: the url is not useful for automation, eg https://files.pythonhosted.org/packages/bf/f9/ec5cef67b4f64a23138ee4da2a829128aa5bdf668b87ca4348b6bad77ac7/zuul-4.10.4.tar.gz | 15:28 |
@nhicher:matrix.org | but it's ok for now | 15:28 |
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: Drop ambient capabilities when running bwrap https://review.opendev.org/c/zuul/zuul/+/816176 | 15:30 | |
@avass:vassast.org | corvus, Clark ^ that should do it, but I'm gonna see if that works in openshift as well | 15:30 |
@avass:vassast.org | (or at least it did when I ran it manually :) | 15:31 |
@jim:acmegating.com | nhicher: pypi provides an index to resolve the url. that's the only official place it's published; the tarballs that gitea (sometimes?) builds are not the same and in the event they are somehow accidentally made available, should not be used. | 15:33 |
@nhicher:matrix.org | corvus: ok, thanks | 15:34 |
@jim:acmegating.com | nhicher: check out https://pypi.org/simple/zuul/ as a basis for automation | 15:35 |
@mordred:inaugust.com | nhicher: pip3 download --no-binary :all: --no-deps zuul | 15:56 |
@nhicher:matrix.org | @mordred perfect, thanks | 15:58 |
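Using https://pypi.org/simple/zuul/ as a basis for automation: the simple index (PEP 503) is an HTML page with one link per release file, and each link's text is the filename. A minimal stdlib-only Python sketch that picks out the sdist URL for a given version; fetching the page (for example with `urllib.request.urlopen`) is left out so the parsing can be tested offline, and the function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from a simple index page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_sdist_url(index_html: str, project: str, version: str,
                   base_url: str = "https://pypi.org/simple/zuul/") -> Optional[str]:
    """Return the sdist URL for `version`, without the #sha256= fragment."""
    wanted = f"{project}-{version}.tar.gz"
    parser = _LinkCollector()
    parser.feed(index_html)
    for href, text in parser.links:
        if text == wanted:
            url, _fragment = urldefrag(urljoin(base_url, href))
            return url
    return None
```

The `#sha256=` fragment that the sketch strips can instead be kept and used to verify the download.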
@clarkb:matrix.org | Albin Vass: will that weaken the security posture of bwrap? If so it should probably be optional (I guess I need to go read the docs on ambient caps) | 16:00 |
@clarkb:matrix.org | "The ambient capability set obeys the invariant that no capability can ever be ambient if it is not both permitted and inheritable." Ok so these are capabilities that are permitted and inheritable that apply to non privileged exec'd processes | 16:11 |
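The quoted invariant can be written as a bitmask test: every ambient bit must also be set in both the permitted and the inheritable mask. A minimal Python sketch; the helper name is hypothetical.

```python
def ambient_invariant_holds(ambient: int, permitted: int, inheritable: int) -> bool:
    """True if the ambient set is a subset of (permitted AND inheritable).

    Arguments are capability bitmasks as in the Cap* lines of
    /proc/<pid>/status (bit n set means capability n is in the set).
    """
    allowed = permitted & inheritable
    return (ambient & ~allowed) == 0


# Capability 2 is ambient, permitted and inheritable: allowed.
print(ambient_invariant_holds(0b100, permitted=0b110, inheritable=0b101))  # True
# Capability 1 is ambient but not inheritable: violates the invariant.
print(ambient_invariant_holds(0b010, permitted=0b110, inheritable=0b101))  # False
```

The kernel keeps this invariant itself: removing a capability from the permitted or inheritable set also removes it from the ambient set.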
@clarkb:matrix.org | The proposed change removes all ambient caps from the bwrap execution environment. This means that none of those processes (bwrap or what it forks?) can apply those capabilities. However, it isn't clear to me how setpriv can drop those since it isn't privileged either? | 16:13 |
@clarkb:matrix.org | This sounds like a bwrap bug? The whole point of ambient caps is that they apply to non privileged processes so you should be able to apply them to a non setuid process? | 16:14 |
@clarkb:matrix.org | or is the problem that bwrap can't manage caps in that case which means it would "pollute" its child processes with the ambient caps in an unwanted way? | 16:14 |
@jim:acmegating.com | Clark: there's some bwrap background on the github issue | 16:17 |
@clarkb:matrix.org | corvus: ya it indicates that it makes it work, but I'm not really seeing anything indicating whether or not the non-setuid situation is broken if ambient caps are removed? The comment indicates that in a setuid setup you are trusting bwrap (which is fair) | 16:20 |
@clarkb:matrix.org | I guess in the non setuid case the implication is bwrap will assume it is setuid with those caps and tries to do more than the ambient caps allow | 16:22 |
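This matches the execve() rules in capabilities(7): a binary that is neither setuid nor carries file capabilities gets its permitted set from the ambient set, so the pod's ambient caps end up in bwrap's permitted set even though bwrap is not setuid. Clearing the ambient set needs no extra privilege, which is why an unprivileged setpriv can do it. The Python model below walks through both cases; `bwrap_would_refuse` is a simplified assumption about the check linked above, not bwrap's actual code, and the uid and capability values are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Caps:
    """Capability sets of a process, as bitmasks (bit n = capability n)."""
    inheritable: int = 0
    permitted: int = 0
    effective: int = 0
    ambient: int = 0


def exec_plain_binary(parent: Caps) -> Caps:
    """Sets after execve() of a binary with no file caps and no setuid bit.

    With all file capability sets empty, the capabilities(7) rules reduce to
    P'(ambient) = P(ambient), P'(permitted) = P'(effective) = P'(ambient),
    and P'(inheritable) = P(inheritable).
    """
    return Caps(
        inheritable=parent.inheritable,
        permitted=parent.ambient,
        effective=parent.ambient,
        ambient=parent.ambient,
    )


def bwrap_would_refuse(caps: Caps, real_uid: int, effective_uid: int) -> bool:
    """Simplified model (assumption) of "Unexpected capabilities but not
    setuid": not setuid, not real root, yet holding permitted caps."""
    running_setuid = real_uid != effective_uid
    return not running_setuid and real_uid != 0 and caps.permitted != 0


bit = 1 << 10  # example capability number 10 (CAP_NET_BIND_SERVICE)
pod = Caps(inheritable=bit, permitted=bit, effective=bit, ambient=bit)

# Running bwrap directly from the pod process.
print(bwrap_would_refuse(exec_plain_binary(pod), 1000, 1000))  # True

# pod -> exec setpriv -> "--ambient-caps -all" -> exec bwrap.
setpriv = replace(exec_plain_binary(pod), ambient=0)
print(bwrap_would_refuse(exec_plain_binary(setpriv), 1000, 1000))  # False
```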
@clarkb:matrix.org | I don't understand enough about bwrap to grok why they can't infer how they are executed properly at runtime and make the right choices | 16:22 |
@avass:vassast.org | Clark: yeah I'm not 100% sure yet either but i got the idea that it should at least not lower the security of bwrap. The other options are to run bwrap as root or setuid the bwrap executable. The reason it worked in my docker environment is because it didn't have any ambient caps set (by checking `setpriv -d`) even though the container was privileged while the openshift environment had ambient caps | 16:32 |
@clarkb:matrix.org | Ya I assume that is the same reason it is working in opendev (no ambient caps set) | 16:33 |
@avass:vassast.org | Yeah | 16:33 |
@clarkb:matrix.org | In that case removing ambient caps is probably no worse than why it works now. I just like to understand this stuff before proceeding :) | 16:34 |
@clarkb:matrix.org | Albin Vass: I think your change may not be working: https://zuul.opendev.org/t/zuul/build/58948b4a67ef48338adb181b3f09313f/log/container_logs/scheduler.log#81-85 is an error in the scheduler related to db entries the executor should be making? But the executor log is pretty empty in comparison | 17:42 |
@avass:vassast.org | Clark: yeah i think i got too many quotes in there :) | 17:43 |
@clarkb:matrix.org | ya around the '-all' ? | 17:44 |
@avass:vassast.org | Yeah | 17:44 |
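Quotes only protect an argument when a shell parses a command string; inside an argv list they become literal characters. A minimal Python illustration; exactly how the extra quotes got into the patchset isn't shown here, so the `BAD` list is a hypothetical reconstruction.

```python
import shlex

# Correct argv: every element reaches setpriv exactly as written.
GOOD = ["setpriv", "--ambient-caps", "-all"]

# Hypothetical over-quoted argv: the single quotes are part of the string,
# so setpriv would receive "'-all'" (quotes included) as its argument.
BAD = ["setpriv", "--ambient-caps", "'-all'"]

# The quotes are only stripped when a shell (or shlex) parses a string.
print(shlex.split("setpriv --ambient-caps '-all'") == GOOD)  # True
print(BAD == GOOD)  # False
```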
@clarkb:matrix.org | Also FileNotFoundError: [Errno 2] No such file or directory: 'setpriv' in the unittests | 17:44 |
@avass:vassast.org | Hmm i can take a look at that in a minute, making food atm | 17:45 |
@clarkb:matrix.org | Its a separate package on ubuntu bionic | 17:45 |
@clarkb:matrix.org | I think you just need to add setpriv to bindep on bionic (which is where the unittests currently run) | 17:46 |
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: Drop ambient capabilities when running bwrap https://review.opendev.org/c/zuul/zuul/+/816176 | 18:11 | |
@avass:vassast.org | lets see if that does it | 18:11 |
@clarkb:matrix.org | Albin Vass: I think you might need util-linux on newer ubuntu too. So will want to match setpriv only on bionic | 18:12 |
@clarkb:matrix.org | Albin Vass: let me see if I can find an example of that for you | 18:12 |
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: Drop ambient capabilities when running bwrap https://review.opendev.org/c/zuul/zuul/+/816176 | 18:13 | |
@clarkb:matrix.org | Albin Vass: "libffi6 [platform:dpkg !platform:ubuntu-focal !platform:debian-bullseye]" in this case you would do "setpriv [platform:ubuntu-bionic]\nutil-linux" | 18:14 |
@clarkb:matrix.org | or ya what you just pushed should work too as it is more specific on all the platforms that want util-linux | 18:14 |
@avass:vassast.org | unless we care about xenial :) | 18:15 |
@avass:vassast.org | but since it's EOL I guess we don't | 18:15 |
@clarkb:matrix.org | we also dropped python3.5 support which is xenial's python | 18:15 |
@clarkb:matrix.org | I think what you've got is good if testing agrees | 18:15 |
@avass:vassast.org | otherwise I'll take a closer look tomorrow | 18:16 |
@fungicide:matrix.org | yeah, we dropped xenial testing, i think i posted to the zuul-discuss ml about that | 18:24 |
@fungicide:matrix.org | oh, though that was specifically for zuul-jobs, now that i think back | 18:24 |
@ecsantos:matrix.org | fungi: Thanks for your help the other day regarding the Zuul console! I figured out what my problem was. I'm using an OpenStack cloud as a Nodepool provider, and the security group used by the nodes didn't have a rule for port 19885, hence the error. I found out through the SF docs [1]. | 20:03 |
[1] https://softwarefactory-project.io/docs/operator/nodepool_operator.html#add-a-cloud-provider | ||
@fungicide:matrix.org | ecsantos: certainly sounded like it was something along those lines, glad you were able to work it out! in opendev, we just replace any default security groups with wide open ones, since we run with locked-down iptables rules on our test nodes by default | 20:12 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816208: WIP Use AuthProvider https://review.opendev.org/c/zuul/zuul/+/816208 | 20:47 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816208: WIP Use AuthProvider https://review.opendev.org/c/zuul/zuul/+/816208 | 20:48 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816215: Adjust debug messages in change cache https://review.opendev.org/c/zuul/zuul/+/816215 | 21:51 | |
@jim:acmegating.com | Clark, fungi, tobiash, swest: ^ that's an outcome from this weekend's test-in-prod. i can't think of how we could have ended up with the change cache in that state, so i think we need more debug msgs. i believe we saw that even with one scheduler running. i'd like to merge that asap so we get more data on the next restart. | 21:52 |
@clarkb:matrix.org | looking | 21:53 |
@jim:acmegating.com | oh, flake8 error, 1 sec | 21:54 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816215: Adjust debug messages in change cache https://review.opendev.org/c/zuul/zuul/+/816215 | 21:55 | |
@jim:acmegating.com | there; fixed a pep8 line length issue | 21:55 |
@jim:acmegating.com | fungi: i think i understand the issue in the traceback you found at https://paste.opendev.org/show/810302/ -- i'll work on a fix | 23:05 |
@jim:acmegating.com | (basically, we're refreshing the queueitem data from zk in 2 places and there's a race between them) | 23:05 |
@jim:acmegating.com | i don't think that one is critical, but it's certainly a nuisance | 23:07 |
@jim:acmegating.com | (i think it only affects rendering status json) | 23:07 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 816215: Adjust debug messages in change cache https://review.opendev.org/c/zuul/zuul/+/816215 | 23:28 | |
@jim:acmegating.com | wow that merged in one go. we must be doing something right. :) | 23:28 |
@clarkb:matrix.org | And runtime for unittests was right at an hour. So we improved the speed too | 23:30 |