Monday, 2021-11-01

@avass:vassast.orgcorvus: I get the feeling that HA schedulers are very close? :)08:53
@avass:vassast.orghas anything changed bwrap? I upgraded to latest master and executors started failing to run bwrap with the following error:12:51
```
zuul@zuul-executor-0:/$ bwrap
bwrap: Unexpected capabilities but not setuid, old file caps config?
```
@avass:vassast.orgthe `4.10.1` executor image is working but `4.10.2`, `4.10.3` and `4.10.4` are not.13:02
@avass:vassast.orgtobiash: have you seen anything like that in openshift ^ ?13:04
@avass:vassast.orghmm I can get bwrap to run by following the suggestion from this github comment:13:10
https://github.com/containers/bubblewrap/issues/380#issuecomment-713842550
@avass:vassast.org(just realized federation had broken again so I've been talking to myself until now :)13:18
@nhicher:matrix.orgHello, the tags 4.10.X for the last zuul releases are not present on https://opendev.org/zuul/zuul/tags, and https://tarballs.opendev.org/zuul/zuul/ redirection is broken13:20
@avass:vassast.orgoh zuul `4.10.1` uses bwrap `0.3.1` while zuul `4.10.2+` is using bwrap `0.4.1`13:58
@tristanc_:matrix.orgadding `setpriv --ambient-caps '-all'` to the bwrap command sounds like a good mitigation13:59
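Editor's sketch of the mitigation suggested above, assuming the executor builds the bwrap command as an argv list. The helper name `wrap_without_ambient_caps` and the sample bwrap arguments are hypothetical and are not taken from the Zuul source; only `setpriv --ambient-caps -all` comes from the discussion.

```python
import shlex


def wrap_without_ambient_caps(bwrap_cmd):
    """Prefix a bwrap argv list with setpriv so ambient caps are dropped.

    Hypothetical helper for illustration only.
    """
    # Each argv element is passed to exec separately, so '-all' needs no
    # shell quoting here; quotes are only needed when typing the command
    # into a shell.
    return ["setpriv", "--ambient-caps", "-all"] + list(bwrap_cmd)


# Sample bwrap arguments, chosen only for illustration.
cmd = wrap_without_ambient_caps(["bwrap", "--ro-bind", "/", "/", "true"])
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids a layer of shell quoting around `-all`.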
@jpew:matrix.orgFYI: I was able to work around kubectl's 4 hour single connection time limit by using ansible `async` in my long running task14:00
@avass:vassast.orgtristanC:  yeah I'm just gonna take a look at what's causing bwrap 0.4.1 to fail first14:06
@avass:vassast.orgtristanC:  I guess the difference is that the 0.4.1 bwrap package doesn't setuid bwrap while 0.3.1 does14:12
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: setuid /usr/bin/bwrap https://review.opendev.org/c/zuul/zuul/+/81617614:17
@avass:vassast.orgtristanC:  that's ^ also a solution14:20
@avass:vassast.orglooks like it was caused by the python-3.8-bullseye upgrade: https://review.opendev.org/c/zuul/zuul/+/81371114:30
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: setuid /usr/bin/bwrap https://review.opendev.org/c/zuul/zuul/+/81617614:30
@jim:acmegating.commaybe we should update the quickstart docker-compose to use non-root users14:45
@avass:vassast.orgexcept that I can't reproduce it in docker14:48
@jim:acmegating.comoh i see; i've read the github issue now14:51
@jim:acmegating.comAlbin Vass: then i think the commit message in 816176 may be a little misleading?14:53
@avass:vassast.orgcorvus: yeah14:53
@clarkb:matrix.orgYa it works in OpenDev and we don't run as root?15:01
@avass:vassast.orgI wonder if docker causes this check to behave differently: https://github.com/containers/bubblewrap/blob/bae85baf7208c4acddd9cf032059d1429f179e4a/bubblewrap.c#L74915:11
-@gerrit:opendev.org- Ashley Bullock proposed: [zuul/zuul] 759003: Add initial bitbucket cloud driver using webhooks https://review.opendev.org/c/zuul/zuul/+/75900315:13
@avass:vassast.orgoh just realized I read the setpriv command the wrong way. the pod in openshift has ambient caps which is causing this check to fail:15:20
https://github.com/containers/bubblewrap/blob/bae85baf7208c4acddd9cf032059d1429f179e4a/bubblewrap.c#L780
while the docker container doesn't have any ambient capabilities set
@clarkb:matrix.orgnhicher: I'm not sure why gitea doesn't produce tarballs for 4.x.y. We do our best to disable that stuff in gitea (did so for releases), but there doesn't seem to be a good way to disable it for the tag tarballs. The tags themselves are in git. For the tarballs.opendev.org issue it appears that something may be publishing the zuulweb js to that location (that would be an error of zuul's release jobs somewhere)15:22
@clarkb:matrix.orgnhicher: the sdists are also published to pypi: https://pypi.org/project/zuul/#files I think that may be the best location to grab them for now while the other stuff is sorted out15:23
@nhicher:matrix.orgClark: thanks, I will use pypi to get the tgz15:25
@nhicher:matrix.orgClark: the url is not useful for automation, eg https://files.pythonhosted.org/packages/bf/f9/ec5cef67b4f64a23138ee4da2a829128aa5bdf668b87ca4348b6bad77ac7/zuul-4.10.4.tar.gz15:28
@nhicher:matrix.orgbut it's ok for now15:28
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: Drop ambient capabilities when running bwrap https://review.opendev.org/c/zuul/zuul/+/81617615:30
@avass:vassast.orgcorvus, Clark  ^ that should do it, but I'm gonna see if that works in openshift as well15:30
@avass:vassast.org(or at least it did when I ran it manually:)15:31
@jim:acmegating.comnhicher: pypi provides an index to resolve the url.  that's the only official place it's published; the tarballs that gitea (sometimes?) builds are not the same and in the event they are somehow accidentally made available, should not be used.15:33
@nhicher:matrix.orgcorvus: ok, thanks15:34
@jim:acmegating.comnhicher: check out https://pypi.org/simple/zuul/ as a basis for automation15:35
@mordred:inaugust.comnhicher: pip3 download --no-binary :all: --no-deps zuul15:56
@nhicher:matrix.org@mordred perfect, thanks15:58
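Editor's sketch of using the simple index as a basis for automation, assuming the page is a list of `<a href=...>filename</a>` anchors, one per release file. The parser class, the `find_sdist_url` helper, and the sample HTML with its `example.invalid` host are hypothetical; a real script would first fetch https://pypi.org/simple/zuul/.

```python
from html.parser import HTMLParser


class SdistLinkParser(HTMLParser):
    """Collect filename -> href for every .tar.gz anchor on an index page."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None  # href of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        name = data.strip()
        if self._href and name.endswith(".tar.gz"):
            self.links[name] = self._href

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def find_sdist_url(index_html, version, project="zuul"):
    """Return the href of the sdist for the given version, or None."""
    parser = SdistLinkParser()
    parser.feed(index_html)
    return parser.links.get(f"{project}-{version}.tar.gz")


# Made-up index content; the host and paths are placeholders.
SAMPLE_INDEX = """
<html><body>
<a href="https://files.example.invalid/a/zuul-4.10.3.tar.gz">zuul-4.10.3.tar.gz</a><br/>
<a href="https://files.example.invalid/b/zuul-4.10.4.tar.gz">zuul-4.10.4.tar.gz</a><br/>
<a href="https://files.example.invalid/c/zuul-4.10.4-py3-none-any.whl">zuul-4.10.4-py3-none-any.whl</a><br/>
</body></html>
"""

print(find_sdist_url(SAMPLE_INDEX, "4.10.4"))
```

The `pip3 download` command above gets the same file without any parsing, so this is only worth it when pip is not available in the automation environment.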
@clarkb:matrix.orgAlbin Vass: will that reduce the security stance of bwrap? If so it should probably be optional (I guess I need to go read docs on ambient caps)16:00
@clarkb:matrix.org"The ambient capability set obeys the invariant that no capability can ever be ambient if it is not both permitted and inheritable." Ok so these are capabilities that are permitted and inheritable that apply to non privileged exec'd processes16:11
@clarkb:matrix.orgThe proposed change removes all ambient caps from the bwrap execution environment. This means that none of those processes (bwrap or what it forks?) can apply those capabilities. However, it isn't clear to me how setpriv can drop those since it isn't privileged either?16:13
@clarkb:matrix.orgThis sounds like a bwrap bug? The whole point of ambient caps is that they apply to non privileged processes so you should be able to apply them to a non setuid process?16:14
@clarkb:matrix.orgor is the problem that bwrap can't manage caps in that case which means it would "pollute" its child processes with the ambient caps in an unwanted way?16:14
@jim:acmegating.comClark: there's some bwrap background on the github issue16:17
@clarkb:matrix.orgcorvus: ya it indicates that it makes it work, but I'm not really seeing anything indicating whether or not the non setuid situation is broken if ambient caps are removed? The comment indicates that in setuid setup you are trusting bwrap (which is fair)16:20
@clarkb:matrix.orgI guess in the non setuid case the implication is bwrap will assume it is setuid with those caps and tries to do more than the ambient caps allow16:22
@clarkb:matrix.orgI don't understand enough about bwrap to grok why they can't infer how they are executed properly at runtime and make the right choices16:22
@avass:vassast.orgClark: yeah I'm not 100% sure yet either but i got the idea that it should at least not lower the security of bwrap. The other options are to run bwrap as root or setuid the bwrap executable. The reason it worked in my docker environment is because it didn't have any ambient caps set (by checking `setpriv -d`) even though the container was privileged while the openshift environment had ambient caps16:32
@clarkb:matrix.orgYa I assume that is the same reason it is working in opendev (no ambient caps set)16:33
@avass:vassast.orgYeah16:33
@clarkb:matrix.orgIn that case removing ambient caps is probably no worse than why it works now. I just like to understand this stuff before proceeding :)16:34
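Editor's sketch of checking for ambient capabilities without `setpriv -d`, assuming a Linux kernel that exposes the `CapInh`, `CapPrm` and `CapAmb` hex bitmasks in `/proc/<pid>/status`. The function names and the sample bitmask values are hypothetical; the second helper encodes the invariant quoted above, that an ambient capability must be both permitted and inheritable.

```python
def parse_cap_sets(status_text):
    """Return the Cap* bitmasks from /proc/<pid>/status text as integers."""
    caps = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.startswith("Cap"):
            caps[name] = int(value.strip(), 16)
    return caps


def has_ambient_caps(caps):
    """True if any ambient capability bit is set."""
    return caps.get("CapAmb", 0) != 0


def ambient_invariant_holds(caps):
    """True if every ambient cap is both permitted and inheritable."""
    allowed = caps["CapPrm"] & caps["CapInh"]
    return (caps["CapAmb"] & ~allowed) == 0


# Made-up excerpts; on a real system read /proc/self/status instead.
WITH_AMBIENT = (
    "Name:\tbash\n"
    "CapInh:\t00000000a80425fb\n"
    "CapPrm:\t00000000a80425fb\n"
    "CapAmb:\t00000000a80425fb\n"
)
WITHOUT_AMBIENT = (
    "Name:\tbash\n"
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t00000000a80425fb\n"
    "CapAmb:\t0000000000000000\n"
)

for label, text in (("with", WITH_AMBIENT), ("without", WITHOUT_AMBIENT)):
    caps = parse_cap_sets(text)
    print(label, has_ambient_caps(caps), ambient_invariant_holds(caps))
```

Running a check like this in both the docker and the openshift environment would show the difference described above: a non-zero `CapAmb` only where ambient caps are set.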
@clarkb:matrix.orgAlbin Vass: I think your change may not be working https://zuul.opendev.org/t/zuul/build/58948b4a67ef48338adb181b3f09313f/log/container_logs/scheduler.log#81-85 is an error in the scheduler related to db entries the executor should be making? But the executor log is pretty empty in comparison17:42
@avass:vassast.orgClark: yeah i think i got too many quotes in there :)17:43
@clarkb:matrix.orgya around the '-all' ?17:44
@avass:vassast.orgYeah17:44
@clarkb:matrix.orgAlso FileNotFoundError: [Errno 2] No such file or directory: 'setpriv' in the unittests17:44
@avass:vassast.orgHmm i can take a look at that in a minute, making food atm17:45
@clarkb:matrix.orgIt's a separate package on ubuntu bionic17:45
@clarkb:matrix.orgI think you just need to add setpriv to bindep on bionic (which is where the unittests currently run)17:46
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: Drop ambient capabilities when running bwrap https://review.opendev.org/c/zuul/zuul/+/81617618:11
@avass:vassast.orglet's see if that does it18:11
@clarkb:matrix.orgAlbin Vass: I think you might need util-linux on newer ubuntu too. So will want to match setpriv only on bionic18:12
@clarkb:matrix.orgAlbin Vass: let me see if I can find an example of that for you18:12
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul] 816176: Drop ambient capabilities when running bwrap https://review.opendev.org/c/zuul/zuul/+/81617618:13
@clarkb:matrix.orgAlbin Vass: "libffi6 [platform:dpkg !platform:ubuntu-focal !platform:debian-bullseye]" in this case you would do "setpriv [platform:ubuntu-bionic]\nutil-linux"18:14
@clarkb:matrix.orgor ya what you just pushed should work too as it is more specific on all the platforms that want util-linux18:14
@avass:vassast.orgunless we care about xenial :)18:15
@avass:vassast.orgbut since it's EOL I guess we don't18:15
@clarkb:matrix.orgwe also dropped python3.5 support which is xenial's python18:15
@clarkb:matrix.orgI think what you've got is good if testing agrees18:15
@avass:vassast.orgotherwise I'll take a closer look tomorrow18:16
@fungicide:matrix.orgyeah, we dropped xenial testing, i think i posted to the zuul-discuss ml about that18:24
@fungicide:matrix.orgoh, though that was specifically for zuul-jobs, now that i think back18:24
@ecsantos:matrix.orgfungi: Thanks for your help the other day regarding Zuul console! I figured out what was my problem. I'm using an OpenStack cloud as a Nodepool provider, and the security group being used by the nodes didn't have a rule for port 19885, hence the error. Found out through the SF docs [1].20:03
[1] https://softwarefactory-project.io/docs/operator/nodepool_operator.html#add-a-cloud-provider
@fungicide:matrix.orgecsantos: certainly sounded like it was something along those lines, glad you were able to work it out! in opendev, we just replace any default security groups with wide open ones, since we run with locked-down iptables rules on our test nodes by default20:12
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816208: WIP Use AuthProvider https://review.opendev.org/c/zuul/zuul/+/81620820:47
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816208: WIP Use AuthProvider https://review.opendev.org/c/zuul/zuul/+/81620820:48
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816215: Adjust debug messages in change cache https://review.opendev.org/c/zuul/zuul/+/81621521:51
@jim:acmegating.comClark, fungi, tobiash, swest: ^ that's an outcome from this weekend's test-in-prod.  i can't think of how we could have ended up with the change cache in that state, so i think we need more debug msgs.  i believe we saw that even with one scheduler running.  i'd like to merge that asap so we get more data on the next restart.21:52
@clarkb:matrix.orglooking21:53
@jim:acmegating.comoh, flake8 error, 1 sec21:54
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 816215: Adjust debug messages in change cache https://review.opendev.org/c/zuul/zuul/+/81621521:55
@jim:acmegating.comthere; fixed a pep8 line length issue21:55
@jim:acmegating.comfungi: i think i understand the issue in the traceback you found at https://paste.opendev.org/show/810302/ -- i'll work on a fix23:05
@jim:acmegating.com(basically, we're refreshing the queueitem data from zk in 2 places and there's a race between them)23:05
@jim:acmegating.comi don't think that one is critical, but it's certainly a nuisance23:07
@jim:acmegating.com(i think it only affects rendering status json)23:07
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 816215: Adjust debug messages in change cache https://review.opendev.org/c/zuul/zuul/+/81621523:28
@jim:acmegating.comwow that merged in one go.  we must be doing something right.  :)23:28
@clarkb:matrix.orgAnd runtime for unittests was right at an hour. So we improved the speed too23:30
