Monday, 2017-04-17

*** bhavik1 has joined #zuul		05:36
*** bhavik1 has quit IRC		06:09
*** pbelamge has joined #zuul		09:15
pbelamge	Hello All	09:17
pbelamge	For the past few days I have been exploring Zuul by following the openstack infra zuul web page	09:18
pbelamge	As per the doc, when I run zuul-server, it just exits and doesn't provide any logs as to why it is exiting	09:20
pbelamge	if I run with -d option, then it runs fine on the console	09:20
pbelamge	am I missing anything in first case?	09:20
pbelamge	anybody?	09:52
*** _ari_|gone is now known as _ari_		13:09
pabelanger	pbelamge: what version of zuul are you running?	13:11
pabelanger	we had this issue recently with a change to logging IIRC	13:11
pabelanger	make sure your logging file is correct	13:11
openstackgerrit	Merged openstack-infra/nodepool master: Add mirror support for fedora-25 DIB https://review.openstack.org/456372	13:49
openstackgerrit	Merged openstack-infra/nodepool master: Switch to /etc/ci/mirror_info.sh for nodepool mirrors https://review.openstack.org/456374	13:50
*** jkilpatr has joined #zuul		13:50
pabelanger	yay	13:51
pabelanger	https://review.openstack.org/#/c/455770/ if people are reviewing :D	13:51
openstackgerrit	Merged openstack-infra/nodepool master: Add functional test for key-name and glean https://review.openstack.org/455770	13:59
*** dkranz has joined #zuul		14:09
*** pbelamge has quit IRC		14:11
*** eggshell has left #zuul		15:57
*** eggshell has joined #zuul		15:57
*** corvus is now known as jeblair		16:12
jeblair	good morning!	16:12
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (1/2) https://review.openstack.org/453362	16:21
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace config/project repos with config/untrusted projects https://review.openstack.org/453347	16:21
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (2/2) https://review.openstack.org/453821	16:21
Shrews	jeblair: welcome back	16:44
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Fix dynamic reconfiguration https://review.openstack.org/454395	16:44
jeblair	mordred, pabelanger: i guess you didn't make any more headway on https://review.openstack.org/454396 ?	16:48
pabelanger	jeblair: sadly no, it was a light week for me last week on zuul things	16:49
*** harlowja has quit IRC		16:49
*** harlowja has joined #zuul		16:52
*** jkilpatr has quit IRC		17:26
*** jkilpatr has joined #zuul		17:27
* SpamapS cracks knuckles and prepares to dive back in		17:35
SpamapS	jeblair: feeling recharged I hope? :)	17:35
jeblair	SpamapS: yes! let's merge some changes! :)	17:40
Shrews	i think there's still the random job failure issue, yeah?	17:41
clarkb	Shrews: yes I think so	17:43
clarkb	I have a change up that runs tests twice that seems to catch it that you can recheck to see	17:43
jeblair	clarkb: what's that telling us?	17:47
SpamapS	If running twice hits sometimes.. tells me there's a race.	17:47
SpamapS	If it hits always, tells me there's a cleanup problem.	17:48
SpamapS	jlk: IIRC, --analyze-isolation did not find a bad interaction, right?	17:48
clarkb	jeblair: SpamapS that and if it's test order dependent the .testrepository data from first run influences test order of second run	17:48
clarkb	local testing showed clean first runs are more likely to pass	17:48
jeblair	clarkb: have you gotten data from your experiment yet?	17:49
jlk	SpamapS: that's right	17:49
clarkb	the change I pushed? it failed as expected to match local results. I haven't had much time to look at it further though	17:50
jlk	I can run with concurrency of 4 and get failures.	17:50
jlk	concurrency 8 is fine	17:50
jeblair	clarkb: did it fail the first run or the second?	17:51
SpamapS	jlk: on a box w/ 8 CPUs yeah?	17:52
jlk	yeah, I haven't tried doing this on a 4 cpu box but with forced 8 concurrency	17:52
clarkb	jeblair: the second	17:52
SpamapS	weren't we also suspicious about the sqla reporter tests?	17:53
jeblair	clarkb: where will you go next with that change?	17:54
jlk	I removed those from my set and still got failures	17:54
clarkb	jeblair: I think we need to track down the fails and fix them then possibly merge the change if we think it will prevent regressions else abandon	17:55
clarkb	it was mostly a sanity check that the gate wasn't special	17:55
*** jonesn has joined #zuul		17:56
jeblair	clarkb: okay, i wasn't sure if you had a plan to use that change to track down the failures.	17:57
jonesn	Is anyone around who could answer a few (probably basic) questions about adding a gate to a project?	17:57
jeblair	clarkb: we have a significant first-run error rate as it is, so i don't think running twice is necessary to prevent regressions.	17:57
jeblair	clarkb: (our error rate is also significant enough that i don't think that a single run of that change is enough to show that successive runs always cause problems. i run locally with no cleanup between runs and pass/fail about as often as the gate)	17:59
clarkb	ok	17:59
jeblair	clarkb: may want to throw a bunch of rechecks at it?	17:59
SpamapS	jonesn: I'm certain people in here can answer questions about gates and projects. It may be best to just ask, and then when people have a moment they can answer.	18:03
SpamapS	jonesn: most of us are pretty focused on v3 dev, so your patience is very much appreciated. :)	18:03
clarkb	jeblair: ya, though I think SpamapS' --analyze-isolation plan is likely to return the best results in short term	18:03
jonesn	Is getting a tox based gate added as simple as "have a toxenv to run, edit zuul/layout.yaml, edit jenkins/jobs/projects.yaml"?	18:03
jeblair	jonesn: are you asking about openstack's instance of zuul?	18:05
jonesn	jeblair: Yes, I think. I'm trying to add a bandit gate to the cinder project.	18:06
SpamapS	clarkb: To be clear, my suggestion was to hold a node that fails, and use --analyze-isolation on it when it fails. But jlk basically simulated that and got no insight.	18:06
openstackgerrit	David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer https://review.openstack.org/456721	18:07
jeblair	jonesn: thanks, that context helps. these doc links may help: https://docs.openstack.org/infra/manual/creators.html#add-basic-jenkins-jobs should be pretty close to what you want, and for further information: https://docs.openstack.org/infra/manual/drivers.html#running-jobs-with-zuul	18:07
clarkb	SpamapS: you shouldn't need to hold a node, just make it fail locally (which seems easy) and run analyze-isolation on that. :( that it didn't catch anything though	18:07
jeblair	jonesn: the #openstack-infra channel is for discussion of openstack infrastructure tools, and there are more folks there that can help with this specific kind of issue	18:08
jonesn	jeblair: Thank you. Sorry for being on the wrong channel.	18:08
jeblair	jonesn: you're welcome (and it's not the wrong channel, just that there's a better one :)	18:09
SpamapS	so..	18:11
SpamapS	the recent shake-up of Ubuntu dev has me worried about landing bubblewrap in xenial-backports	18:12
*** jonesn has left #zuul		18:12
SpamapS	I've asked on the ubuntu-devel mailing list if they need help and gotten no replies	18:12
SpamapS	Anybody want to migrate to Debian unstable? ;-)	18:12
jeblair	SpamapS: iirc, fungi does. i'm not opposed.	18:13
jeblair	(i'm also not opposed to centos, fwiw :)	18:13
SpamapS	Yeah either one would be fine.	18:13
clarkb	my concern with centos generally is lag on security patches	18:14
SpamapS	also Debian's releasing every 2 years now, and stretch is about 3 weeks away.	18:14
SpamapS	so it's actually not such a terrible thing to be on stable	18:14
jeblair	SpamapS: it's possible i misspoke for fungi and that's actually what he would prefer :)	18:14
jeblair	SpamapS: what's the feasibility of doing our own backport?	18:14
SpamapS	jeblair: Oh our own backport is done. Just don't know if that's something infra wants to host somewhere.	18:15
SpamapS	the bug requesting it is literally just a rubber stamp, the backporters team will run a script and upload the backport as soon as they get to it	18:15
SpamapS	but there are 43 others in front of it.	18:15
SpamapS	I've also offered to start helping with that, since I find it quite useful to have a functioning ubuntu backports system.	18:16
mordred	SpamapS: we _do_ depend on a PPA in infra for one package, although it's not the happiest thing in the world to depend on it since it's a one-off with no process around it	18:16
jlk	eww	18:16
jlk	those always bit us at Blue Box	18:17
SpamapS	PPA's are, IMO, Launchpad's killer feature.	18:17
mordred	patched version of vhd-utils is needed to be able to make images for rackspace public cloud	18:17
SpamapS	but yeah, one-offs w/o process are a problem.	18:17
mordred	yup. totally agree	18:17
jlk	agreed	18:17
jlk	I had championed a similar thing over in Fedora land	18:17
mordred	also, it's super annoying that there isn't already a tool that can make vhd images that rax public can consume	18:17
jeblair	at least this would be time-limited (until next release)	18:17
mordred	jeblair: ++	18:17
jlk	it just took a lot of time for it to come to fruition (after I left)	18:18
SpamapS	Yeah a backports PPA is better than a "that's never getting into Debian" PPA.	18:18
jlk	mordred: seems like that problem will solve itself in the future... :(	18:18
mordred	yah. the vhd-util thing falls in to a "wow, that's a terrible patch" category	18:18
mordred	jlk: sssh. we're hoping that future remains far away - will be a lot of work if it comes to pass soon :(	18:19
SpamapS	and with a xenial PPA, it's somewhat natural to end up on the next LTS of Ubuntu with the package coming from universe instead of that PPA.	18:19
mordred	SpamapS: ++	18:19
SpamapS	also it's possible the backporters team gets to it eventually	18:19
SpamapS	and we just delete it from the PPA	18:19
SpamapS	kk I'm convinced	18:19
mordred	in any case, I personally would be fine with a bubblewrap ppa	18:19
mordred	for now - since there is a path to the future	18:19
Shrews	mordred: should you find time, i'm hoping you could check out https://review.openstack.org/456721 and see if i'm heading in the correct direction. it's pretty basic atm	18:22
mordred	Shrews: looking now	18:22
Shrews	mordred: and if you wonder why i chose socketserver, it's the only thing i found that works with handling sockets in a forked/threaded manner	18:22
Shrews	could not get my own manual version to function correctly :(	18:23
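As an illustration of the socketserver approach Shrews mentions, here is a minimal sketch of a threaded, finger-style TCP server in Python. It is not the code from review 456721; the class names, the `query` helper, and the reply text are hypothetical, and a real log streamer would look up and stream the job's log instead of returning a fixed string.

```python
import socket
import socketserver
import threading


class FingerHandler(socketserver.StreamRequestHandler):
    """Answer one finger-style request: read a line, reply, close."""

    def handle(self):
        # A finger client sends a single CRLF-terminated query line.
        query_text = self.rfile.readline().strip().decode("utf-8", "replace")
        # Hypothetical reply; a real streamer would find the job's log here.
        self.wfile.write(f"no log for {query_text!r}\n".encode("utf-8"))


class ThreadedFingerServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """Handle each connection in its own thread."""

    allow_reuse_address = True
    daemon_threads = True


def query(host, port, text):
    """Send one query line and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(text.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    # Port 0 asks the OS for any free port, so no root is needed to try this.
    server = ThreadedFingerServer(("127.0.0.1", 0), FingerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    print(query(host, port, "some-job-uuid"))
    server.shutdown()
    server.server_close()
```

Swapping `ThreadingMixIn` for `socketserver.ForkingMixIn` gives the forked variant on POSIX systems with the same handler class.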
mordred	Shrews: that looks good to me so far - jeblair you wanna double check that to make sure I'm not crazy?	18:27
*** hashar has joined #zuul		18:58
*** hashar has quit IRC		19:18
*** dkranz has quit IRC		19:33
fungi	jeblair: you know me too well. i'd put in time on getting stuff going on sid if there was some consensus it's a good idea ;)	19:34
fungi	(i mean, would be convenient for me since that's what my dev systems run...)	19:34
fungi	(or stretch, sure, why not)	19:34
openstackgerrit	David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer https://review.openstack.org/456721	19:37
mordred	fungi: I think bubblewrap is already in sid, so we'd be good there	19:39
pabelanger	clarkb: do you have examples of the security lag for centos-7? never heard that before	19:39
clarkb	pabelanger: patches to things (for example heartbleed) take a day or two longer than the other distros out there	19:39
fungi	mordred: already in stretch (stable rsn!) and in jessie-backports... https://packages.debian.org/search?keywords=bubblewrap	19:40
mordred	fungi: woot	19:40
fungi	so pick your poison	19:40
clarkb	pabelanger: I think because instead of centos patching things directly they wait for rhel then do all their testing then push? something like that maybe?	19:40
pabelanger	clarkb: ya, if centos depends on RHEL, I could see the day lag or so	19:41
fungi	mordred: if you _specifically_ want >=0.1.8 though, you're probably stuck waiting post stretch for the packages presently hiding in experimental	19:41
fungi	but 0.1.7 can be had now in stable(backports), testing(frozen) or unstable	19:41
fungi	jeblair: clarkb: SpamapS: should we add the executor security spec on today's meeting agenda for some last-minute digging into the comments about the on-a-test-node alternative? would like to be certain on the pros and cons before approval	19:44
SpamapS	fungi: I feel like that horse is dead, but we can of course reanimate it if there are some more questions we forgot to ask while we killed it.	19:49
fungi	clarkb: are you still on the other side of the fence there after my and SpamapS' subsequent comments about the additional cons? (or have you read them yet?)	19:51
clarkb	I haven't read them yet. I wasn't going to stop moving foward as is (I noted that last meeting)	19:51
* clarkb looks now		19:52
SpamapS	I actually kind of think the executor is a perfect case for a real kubernetes system btw.	19:56
SpamapS	scale them out as needed, isolate each pod to vms owned only by the one project	19:56
SpamapS	but that's a large pile of design work	19:56
mordred	I agree with both things - I think there are some potential benefits of k8s things, and also that it'll be a large pile of design work to figure out how	19:57
* clarkb discovered that k8s reused the metadata service design from openstack (and possibly elsewhere) had a sad		20:00
mordred	sigh	20:00
mordred	oh well	20:00
mordred	well, to be fair, the openstack one wouldn't be a problem if it was just a part of the normal api layer and thus scaled out with it instead of a separate service layer that nobody wanted to spend resources to scale	20:01
mordred	it's not like scaling a rest service that returns json blobs is hard	20:02
clarkb	mordred: I think NAT by definition is in the poor to scale category of solutions	20:02
mordred	well - yah	20:02
mordred	that part is stupid	20:02
fungi	clarkb: yeah, i appreciate you were willing to not hold up approval, but since pabelanger scheduled it for approval tomorrow anyway we have time to find out if you still actually disagree. that's important to me (at least)	20:02
clarkb	also I discovered it because it wasn't working :)	20:02
*** dkranz has joined #zuul		20:02
clarkb	fungi: yes I think I still do disagree. We know the alternative works, and works relatively well at scale. We also know that ansible has been tricky to secure so I think using an unproven system is less desirable	20:03
clarkb	that's not to say the other system can't work, it's just a lot more risky imo	20:04
pabelanger	I thought the issue of scaling out zuul-executor today was the caching of our git repos? Or have I confused something	20:04
mordred	pabelanger: you have not	20:04
clarkb	I think the option chosen is a good one if not going with the alternative and SpamapS did an excellent job laying out the problem space and options available	20:05
mordred	clarkb: can you expand "the alternative" into slightly more words so I'm sure I'm following you?	20:06
clarkb	mordred: running the ansible in the test env itself rather than on the executor. Basically what we do today with eg d-g	20:06
fungi	clarkb: important enough to give up on being able to have untrusted pre/post playbooks which need access to things only the executor can do, like uploading logs/tarballs? it seemed like a pretty useful (if ambitious) feature, but i understand everything is of course a trade-off to some degree	20:06
clarkb	fungi: thinking about it operationally, every time a hole is discovered we'd have to turn off zuul	20:07
mordred	clarkb: gotcha. thank you	20:07
fungi	or i guess the publication playbooks themselves do still need to be trusted, so it's more than those would have to run on the executor (without protections) while untrusted pre/post playbooks run on the test node?	20:07
*** jkilpatr has quit IRC		20:07
clarkb	fungi: right	20:07
fungi	er, more that those	20:07
*** jkilpatr has joined #zuul		20:07
pabelanger	I mean, I like the idea of what clarkb is saying, mostly because that is the only way to do it today. However, I am happy to give the bubblewrap approach a shot too.	20:09
fungi	i suppose there's nothing in the current design preventing a trusted playbook on the executor from calling ansible on the test node to run untrusted playbooks, so in theory we could support both if we already set up executor-side protections	20:09
clarkb	fungi: the problem is you don't control that in the current design	20:09
fungi	at worst it makes the spec under discussion redundant/overkill if it ends up only securing trusted playbook execution	20:10
mordred	well - yah. I believe we need to write the "run ansible on test node" stuff at some point regardless - because there will be things that want to run with no restrictions but are untrusted	20:10
clarkb	fungi: your users can push arbitrary code to run in the sandbox and current experience is its quite arbitrary	20:10
clarkb	mordred: speaking of, does http://docs.ansible.com/ansible/raw_module.html need to be handled specially?	20:10
mordred	clarkb: I don't believe so, no? but I'll go look at it to make sure	20:12
mordred	clarkb: yah - it doesn't look like it does anything particularly interesting	20:13
openstackgerrit	David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer https://review.openstack.org/456721	20:13
clarkb	mordred: executable set that to something like dd	20:13
SpamapS	mordred: executable=/something/evil ?	20:13
clarkb	mordred: then the freeform arg options to dd	20:13
clarkb	SpamapS: ya that exactly	20:13
SpamapS	needs a path filter	20:14
clarkb	and you have to make sure the args list can't be abused either if its run by shell by default	20:14
clarkb	since you can ; foo	20:14
mordred	executable is talking about remote executable	20:14
mordred	not local	20:14
fungi	okay, so situation is that 444495 outlines a mechanism for securing ansible on the executor in the face of untrusted playbooks. we need to be able to run at least some trusted playbooks on the executor anyway, and if there is a shift in consensus later that we should only run untrusted playbooks remotely on single-use test nodes and not on the executor that's probably not a huge additional amount of work (but	20:15
fungi	does lose us some exciting v3 features unfortunately)	20:15
clarkb	ah ok	20:15
mordred	as in, "don't run bash on the remote host, run XXX"	20:15
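The "; foo" concern clarkb raises is the general shell-injection problem: free-form arguments interpolated into a shell command line can smuggle in a second command. The sketch below illustrates that in Python with `echo`; it says nothing about how Ansible's raw module actually builds its command (mordred notes the `executable` option refers to the remote host), and the function names are hypothetical.

```python
import shlex
import subprocess


def echo_unsafe(user_args):
    """Interpolate free-form text into a shell command line (injectable)."""
    result = subprocess.run("echo " + user_args, shell=True,
                            capture_output=True, text=True, check=True)
    return result.stdout


def echo_quoted(user_args):
    """Quote the text so the shell treats it as one literal argument."""
    result = subprocess.run("echo " + shlex.quote(user_args), shell=True,
                            capture_output=True, text=True, check=True)
    return result.stdout


def echo_argv(user_args):
    """Skip the shell entirely by passing an argument vector."""
    result = subprocess.run(["echo", user_args],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    payload = "hi; echo INJECTED"
    print(repr(echo_unsafe(payload)))  # the second command runs
    print(repr(echo_quoted(payload)))  # the payload is printed literally
    print(repr(echo_argv(payload)))    # the payload is printed literally
```

Quoting or avoiding the shell only addresses the argument list; restricting which executable may run at all still needs something like the path filter SpamapS suggests.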
openstackgerrit	David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer https://review.openstack.org/456721	20:15
SpamapS	fungi: I already have a working bubblewrap patch btw	20:16
mordred	SpamapS: woot!	20:16
SpamapS	https://review.openstack.org/453851	20:16
SpamapS	just needs bwrap	20:16
SpamapS	doesn't do seccomp yet	20:16
clarkb	SpamapS: why the subshell in bwrap-executor.sh?	20:18
SpamapS	clarkb: for the FDs	20:19
clarkb	does that not work without a subshell too? I guess concern is that something already has fd 11 and 12?	20:19
SpamapS	clarkb: so /etc/passwd is just the result of the getent	20:19
SpamapS	clarkb: we fork to run this so shouldn't be.	20:20
SpamapS	oh we don't close all tho	20:20
clarkb	(mostly just curious because shell magics)	20:20
SpamapS	clarkb: we could get the fds	20:20
jeblair	clarkb: the "test like production" design goal of v3 would be significantly compromised by "run ansible from the test node" alternative. i think we have to run ansible from something completely outside of the test framework in the general case. doing that with k8s is a reasonable thing to look into, much later. in the mean time, a light-weight containerization is very close from the POV of security architecture (just not in terms of scaling). ...	20:33
jeblair	... i hear your concern about vulnerabilities; this adds defense in depth -- we'll have (at least) two layers of protection for the executor -- which i find very reassuring.	20:33
jeblair	(it also would constrain the CD aspect -- if you have zuul run jobs on a production server, there is no test node from which to launch ansible)	20:34
clarkb	yup, as I said I think the spec does a good job of laying out the options and reasoning about why this one is chosen. I just personally think that given the trade off of turn zuul off for indeterminate period of time vs a couple annoying aspects about the run in the test env option the run in test env option is better	20:34
clarkb	jeblair: why couldn't it run on the production nodes too?	20:34
clarkb	or at least a production node if production == a test env	20:35
jeblair	clarkb: ansible to the production node to run ansible?	20:36
clarkb	I also don't think k8s fixes it in the general case where code is coming from users and is arbitrary (it would be if you scoped it down to a tenant and trust your tenant users though)	20:36
clarkb	jeblair: something to the production node to run ansible (possibly ansible)	20:37
jeblair	clarkb: if you have a job which is "run something on all the git servers" where does that job launch from?	20:37
clarkb	but that something would be more tightly controlled	20:37
clarkb	jeblair: head (list_of_git_servers) ?	20:37
clarkb	(there are other security concerns to that too, but at least you've confined the scope of breakage to within the "env", and not env or orchestrator)	20:38
jeblair	clarkb: that seems pretty arbitrary, and not really i think how people are accustomed to using ansible. the goal is to try to be as transparent as possible. if you think "i run this playbook to update my cluster" that should map easily to "zuul runs this playbook to update my cluster". i think if we add topology design requirements for users beyond that, it won't be very attractive.	20:39
clarkb	jeblair: yes, I understand that. The problem is its a very poor choice for how we run zuul for openstack	20:40
clarkb	at least if you are worried about the orchestrator thing being compromised the scope of that is quite large	20:40
jeblair	i don't see why it's a poor choice	20:40
jeblair	i see that it's important to be careful and get right; but that doesn't necessarily make it poor	20:41
clarkb	because if I manage to get control of the orchestrator now I control everything and not just the test env that I owned	20:41
jeblair	clarkb: sure, but there are many other aspects of zuul that if you got control over would give you similar access	20:41
clarkb	and from what we have seen using ansible in this capacity is incredibly leaky	20:41
clarkb	jeblair: right but we don't let arbitrary code execute within the context of those pieces of zuul	20:42
jeblair	clarkb: we let zuul run with arbitrary configuration	20:42
jeblair	clarkb: that seems almost as dangerous. if not worse.	20:42
clarkb	jeblair: I'm not sure I follow? today zuul config is not arbitrary. Its reviewed by multiple individuals first	20:43
jeblair	(personally, i'm actually more worried that we'll mess up something there than someone will escape bubblewrap)	20:43
jeblair	clarkb: in v3 we have dynamic config	20:43
clarkb	jeblair: you are saying bigger concern that tenant A might somehow get tenant B's secrets by configuring themselves to run that job?	20:44
clarkb	(I can see that being a concern too)	20:44
SpamapS	in a CD situation, Maybe this is misguided, but I don't expect any untrusted playbooks to be running.	20:44
jeblair	clarkb: sure, or run a job they aren't supposed to, or run a job on a node they aren't supposed to.	20:44
jeblair	SpamapS: i agree that would be bad form :)	20:44
clarkb	SpamapS: but you don't actually have control over that	20:44
SpamapS	I, the zuul admin, do.	20:45
clarkb	(though maybe that's something that should be configurable? if it isn't already)	20:45
SpamapS	final for all the things.	20:45
jeblair	clarkb: i'm not sure what you mean by "but you don't actually have control over that"	20:45
SpamapS	actually not final for all the things.	20:45
clarkb	jeblair: users can push changes that change how jobs run. And then they will run before being reviewed	20:45
clarkb	jeblair: so in the CD case you could have someone push a playbook that takes out testing/staging/prod whatever potentially without being reviewed first?	20:46
SpamapS	just that if it's CD, I'm not running check jobs on the "actually deploy to things" branch	20:46
SpamapS	of course, you still get a gun and a foot	20:46
jeblair	clarkb: i would not recommend putting CD jobs in untrusted projects; config projects don't run with dynamic config. so if you put your CD stuff in config projects, you should be fine.	20:47
jeblair	clarkb: (unless we mess up implementation, as i was saying above)	20:47
clarkb	right gotcha	20:47
SpamapS	one would hope anybody with access to the guns will be trying hard not to aim at their foot, and mostly use push powers (guns) for dire situations where reviews are broken.	20:47
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Send interface_ip in the node description https://review.openstack.org/455639	20:49
clarkb	I think my concern on the ansible code execution front is I've spent a little time with the code base now and basically think there is no way to fully lock it down. Which is where bubblewrap comes in. And in that space people who know a lot more about it than me still say containers are not an isolation primitive. Whereas VMs (which still do get compromised too) seem to have a better reputation in that space	20:49
jeblair	clarkb: i think we may be crossing the hump on container as security primitive. certainly the bubblewrap folks seem to think it's reasonable. and it's being used more and more.	20:50
SpamapS	so, bubblewrap _is_ trying to drop privileges, and close known doors out of containers.	20:51
SpamapS	it's not just making namespaces	20:52
jeblair	++	20:52
jeblair	clarkb: i agree that ansible blocking is leaky. and containers have potential vulns (they are still new, in this context, in the scheme of things). but i think that the two together give me warm fuzzies.	20:52
SpamapS	the one piece I'm missing in the implementation I did already is seccomp to further ratchet down access to kernel subsystems.	20:53
clarkb	definitely (which I've said before, of the non-VM options bubblewrap seems like a good one)	20:53
SpamapS	and then another piece which I think is a layer we would want but is hard if not impossible to build into zuul, is configuration of a MAC	20:53
mordred	clarkb: ++	20:53
mordred	this is why I've been an advocate for both things. I agree that we're unlikely to get perfect on locking down ansible - and also that i'm still holding out concerns about containers - but with both of them at the same time it seems to me to be in the acceptable range - and also will give me some time in production with a container tech to get to trust it more	20:54
clarkb	I'm also slightly concerned about the operational overhead involved with getting a working secure bubblewrap on $distro, but thats not my real concern, thats solvable once and then done	20:54
mordred	without relying _only_ on the container tech	20:54
jeblair	clarkb: and i think that the design we've chosen for v3 is a good one -- i think that the way we are looking at jobs in v3 is nothing short of revolutionary in both the CI and CD spaces. i think it's worth doing as much as we can to try to achieve that. if bwrap+ansiblock is insufficient, i'd rather tack toward k8s/nodepool-extra-node/etc rather than give up on that idea.	20:54
SpamapS	It may make sense to include an apparmor and/or an selinux setup kit. But I think those may be hard to make dynamic and built in.	20:54
mordred	SpamapS: yah	20:55
jeblair	SpamapS: the day we can run zuul from packages that include that will be a happy day :)	20:55
mordred	jeblair: ++	20:56
pabelanger	fedora rawhide soon(tm)	20:56
clarkb	(I also don't want to go too far down the path of k8s magically solving this problem as I do not think it does, it solves an orthogonal problem of having cheap throwaway execution envs that may or may not be leaky themselves)	20:57
jeblair	clarkb: but ultimately, i think that bwrap+ansiblock(+more) is good now, and if we can get security+scalability out of k8s/nodepool/etc later, that's a good plan.	20:57
jeblair	clarkb: indeed	20:57
SpamapS	at this point, with just bubblewrap, if you can get ansible to bust out, you're sitting in a purpose built readonly dir with a bind-ro mounted, user namespaced (There is no root) /usr with a namespaced everything-except-networking, a locked fsuid ...	20:58
clarkb	SpamapS: interesting that networking is not namespaced (but since there is no root that should be fine)	20:58
SpamapS	it can be	20:59
SpamapS	but we don't need it	20:59
SpamapS	You are also limited to CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_NET_ADMIN, CAP_SETUID, CAP_SETGID (but without a way to get a uid==0)	20:59
clarkb	SpamapS: do we want to allow CAP_NET_ADMIN if not namespacing networking (or do you mean in general you can have CAP_NET_ADMIN?)	21:00
SpamapS	clarkb: bubblewrap allows those caps, and drops all others.	21:01
SpamapS	could probably suggest a feature where if not doing --unshare-net then CAP_NET_ADMIN is dropped	21:02
SpamapS	a lot of what's done in bubblewrap is dropping "things that make no sense to give to applications". But there are, I guess, apps that need that CAP sometimes.	21:02
clarkb	SpamapS: ya just thinking that could be used to DoS the env if you break out of ansiblock	21:03
SpamapS	it also implements privilege separation for things that it does post-fork	21:04
SpamapS	it's just solid sandboxing	21:04
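To make the sandbox SpamapS describes more concrete, here is a rough Python sketch that assembles a bwrap command line with a read-only work directory, a read-only /usr, and unshared user, PID, IPC and UTS namespaces while leaving networking shared. It is not the contents of https://review.openstack.org/453851 or bwrap-executor.sh; the function name and the example paths are hypothetical, and a working sandbox would likely need more mounts (for example library directories outside /usr).

```python
def build_bwrap_argv(work_dir, command):
    """Return an argv list that runs `command` inside a bubblewrap sandbox.

    Networking is deliberately left shared: --unshare-net is not passed.
    """
    return [
        "bwrap",
        # Read-only view of the system's programs and libraries.
        "--ro-bind", "/usr", "/usr",
        # Read-only view of the purpose-built job directory.
        "--ro-bind", work_dir, work_dir,
        # Fresh /proc and a minimal /dev inside the sandbox.
        "--proc", "/proc",
        "--dev", "/dev",
        # New namespaces for everything except networking.
        "--unshare-user",
        "--unshare-pid",
        "--unshare-ipc",
        "--unshare-uts",
        # Start the command in the job directory.
        "--chdir", work_dir,
    ] + list(command)


if __name__ == "__main__":
    # Hypothetical job directory and playbook name, for illustration only.
    argv = build_bwrap_argv("/var/lib/zuul/job", ["ansible-playbook", "site.yaml"])
    print(" ".join(argv))
```

Building the argument vector as a list, rather than a shell string, also keeps the job command from being reinterpreted by a shell before bwrap sees it.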
jeblair	Shrews, mordred: why not make fingerd part of executord?	21:13
Shrews	jeblair: tried that first, but gotta be root for the finger port	21:14
jeblair	Shrews: ah fun. well, we still have to solve that even if it's a separate program, right? we could drop privileges when daemonizing...	21:16
Shrews	jeblair: so it was either a separate thing, or change executor to drop privs to the configured user	21:16
jeblair	is there a way to use capabilities with a python program?	21:16
mordred	https://pypi.python.org/pypi/deescalate/0.1	21:17
jeblair	(can you setcap /usr/local/bin/foo.py or would you have to setcap /usr/bin/python)?	21:17
clarkb	could also potentially do the gerrit thing and run on some high port by default?	21:17
clarkb	(that's less useful if you just want your finger command to work though)	21:17
mordred	which is, of course, using C: https://github.com/stephane-martin/deescalate/blob/master/deescalate/_deescalate.pyx	21:18
jeblair	clarkb: yeah, could do that and iptables.	21:18
jeblair	i think the daemon module also supports switching uids	21:20
Shrews	jeblair: i went with the separate daemon to avoid any pesky security things by changing how the executor privs work, but i could go back and rework it again to be in the executor if you prefer.	21:20
Shrews	i think the separate process is actually pretty simplified. it can get all the info about jobs it needs from the zuul.conf file. but tomato tomato	21:22
* Shrews just glad to be coding on non-nodepool things :)		21:22
jeblair	Shrews: i'm not convinced i know the right answer right now. combining it with executor pros: there's a 1:1 relationship between executors and fingerds, so it makes sense. there would be less boilerplate process code for devs and fewer daemons for operators to know and run. if we used threads, we have easy internal access to the jobdir and the host inventory. cons: more work for the executord, especially if we use threads. we can still use ...	21:24
jeblair	... fork, though then we lose the easy access to internal variables.	21:24
jeblair	Shrews: on balance, i'm leaning toward ignoring the internal variable access argument so that we can choose thread/fork as appropriate. but i'm being swayed by the idea that if we combine them, we don't need to keep track of (or explain to operators) an extra process.	21:25
SpamapS	I'm curious now...	21:26
SpamapS	how does zuul know what to put in the finger hostname?	21:26
jeblair	Shrews: (and i realize you have already solved the jobdir location problem, so that shouldn't be an issue. at some point in the future, we will probably want to know host inventory so we can request /var/log/syslog@test-host. but of course, we can read the ansible inventory file. :)	21:27
SpamapS	in the past, the telnet hostname is just the node's best effort public IP	21:27
jeblair	SpamapS: the scheduler will know the executor running a job, so it can say "finger UUID@executorhostname"	21:27
Shrews	jeblair: playing devil's advocate, the executor should just "execute" jobs	21:28
Shrews	but also, we're sending the finger requests to the executor host, so....	21:28
SpamapS	jeblair: yeah, then that's another good argument for fingerd==executord	21:28
jeblair	Shrews: i hear that. i think there's a fuzzy line between too few and too many microservices. i'm not sure the best way to characterize that, but things like "relationship between services and hosts" is one of them, "how annoying is it for operators" is another, and "how does it affect scalability".	21:30
jeblair	Shrews: those first 2 might be the same thing. :)	21:30
Shrews	jeblair: i'll code it up the other way and then we can do a side-by-side comparison. maybe that will help with the decision making?	21:30
Shrews	grrr... i think i lost the code from the first time. ah well.	21:32
fungi	from a security perspective, separate daemons _feels_ safer. but the devil is of course in the details	21:32
Shrews	fungi: YOU are the devil	21:32
jeblair	Shrews: at any rate, for the last one, i'd say that's a weak push toward microservice, but the first two, i'd rate as a slightly stronger push toward monolithic.	21:33
fungi	i advocate for him well, at any rate	21:33
jeblair	fungi: good point, that should be on the list too. :)	21:33
jeblair	fungi, Shrews: i think the security footprint is similar in both cases (we want this to run as the zuul user after getting the port and dropping privileges regardless). so it should end up having the same level of access.	21:34
* fungi still thinks qmail was a good design, security-wise (just made for a fairly unfun management situation at times if you forgot what needed to be kept running)		21:34
fungi	yeah, i guess my worry is that you inadvertently open up an anonymous/unauthenticated vulnerability in the finger socket implementation which allows an attacker to influence or even take control of an executor running a sensitive job	21:35
fungi	i get that's a ton of hand-waving though	21:35
Shrews	fungi: yeah, that was my initial thinking in choosing the separate daemon pathway	21:37
fungi	and still conceivable even if they're separate... assuming the fingerd has access to all the same files on disk that the executor does	21:38
fungi	though would it be possible to limit its access to just the logs it's intended to stream?	21:38
jeblair	fungi: if that's a concern, we could start a separate process from the executor daemon (like we do for geard). so at least it's transparent for the user. the identical access argument makes me weigh this fairly lightly though.	21:38
fungi	yep, i get that	21:39
fungi	just doing my part at devil's advocacy	21:39
fungi	ultimately it's going to be about the same either way, and one way is less work	21:39
jeblair	Shrews: i left some total nits on PS4 because that's what you were on when i started typing them. :)	21:40
jeblair	(ok, 3 nits and one actual thing)	21:40
Shrews	jeblair: k. ps5 just adds the actual streaming	21:41
Shrews	which is really just a rip off of zuul_console.py	21:42
jeblair	that seems to have been successfully streaming logs, so ++ :)	21:42
Shrews	fwiw, i do not believe zuul_console would actually properly close the sockets at the end of the log. i had a hard time getting that to actually work	21:43
mordred	jeblair: btw I read the finger protocol RFC ... and in so doing learned about finger foo@bar.com@bang.com syntax	21:43
Shrews	and the vending machine query functionality!	21:45
jlk	today's data point	21:46
jlk	my tree passes tests with 8 cores, concurrency 8	21:46
jlk	fails tests with 8 cores concurrency 4	21:46
jlk	fails tests with 4 cores concurrency 8	21:46
jlk	(unless tox/testr forces concurrency to be no more than cores)	21:47
jlk	(also fails 4 cores, concurrency 4)	21:47
fungi	unit test race(s)?	21:48
mordred	jlk: I swear to god, the fix is going to be a comma somewhere	21:49
SpamapS	jlk: have you tried changing the hard fail to soft fail in the timeout setup?	21:50
jlk	I have not.	21:50
jlk	what would that look like?	21:50
SpamapS	gentle=False changes to gentle=True	21:51
SpamapS	in tests/base.py	21:51
SpamapS	jlk: also do you have a log of your fail that isn't berzillions of bytes?	21:52
jlk	no	21:52
jlk	because it seems to need to be the whole she-bang	21:52
jeblair	mordred, Shrews: would "finger /var/log/nova.log@compute@build_uuid@zuul.example.com" be appropriate? or would the rfc have us use something other than @ there?	21:52
jlk	smaller sets of tests have passed	21:52
jamielennox	no meeting today?	21:52
SpamapS	jlk: k, is there one I can pull down?	21:52
jeblair	jamielennox: i think we are going to have one?	21:53
jeblair	unless we all decide it's not useful. :)	21:53
jamielennox	isn't now the time?	21:53
Shrews	jeblair: i think so. i was considering 'finger JOB:/log1@compute@something'	21:53
jeblair	jamielennox: in 7 minutes.	21:53
jamielennox	or have my calendars gone crazy again	21:53
jamielennox	doh	21:53
jamielennox	yea, my bad	21:53
mordred	jeblair: I will not be able to be at the meeting- it took us slightly longer to arrive back in Dallas and we're just now hitting rush hour traffic	21:53
SpamapS	jamielennox: sorry, we don't do wallaby time.	21:53
jamielennox	SpamapS: i try so hard to get the hours lined up i completely screwed up the minutes ;)	21:54
jlk	you could have been from one of those dictatorships that set their time to be a few minutes off, just to dick with people.	21:54
fungi	i see you've read the constitution for my jerkocracy	21:56
jamielennox	just google calendar changing the notification to 10 minutes, rather than starting now	21:56
jeblair	fungi: i didn't think you let anyone read that?	21:57
jlk	first rule of jerkocracy, nobody gets to know what the rules are.	21:57
clarkb	the timezones that are off by half an hour always make my brain stop working	21:57
jlk	I rather like the UTC complication on my watch face.	21:58
clarkb	I can do hour math mostly. But when I have to add or subtract half an hour I dunno what happens but brain breaks	21:59
SpamapS	jlk: you mean, like.. Bangalore?	21:59
SpamapS	which is UTC+1230	21:59
SpamapS	or something like that	21:59
jlk	I think there are a few others like that	21:59
jamielennox	adelaide	22:00
SpamapS	UTC+0930 I think	22:00
jamielennox	luckily nothing really happens in adelaide	22:00
SpamapS	like "oh we couldn't possibly have tea with the sun in the sky 3 degrees further on. no no no"	22:00
pabelanger	Newfoundland time zone checking in! UTC-3:30	22:00
SpamapS	so now there is a meeting yeah?	22:00
SpamapS	pabelanger: I guess closer to the poles it might matter. ;)	22:01
jeblair	meeting time now	22:01
jeblair	or 1 minute ago	22:01
*** jkilpatr has quit IRC		22:40
*** jkilpatr has joined #zuul		23:18
