Monday, 2017-04-17

*** bhavik1 has joined #zuul05:36
*** bhavik1 has quit IRC06:09
*** pbelamge has joined #zuul09:15
pbelamgeHello All09:17
pbelamgeFor the past few days I have been exploring Zuul by following the OpenStack infra Zuul web page09:18
pbelamgeAs per the doc, when I run zuul-server, it just exits and doesn't provide any logs as to why it is exiting09:20
pbelamgeif I run with -d option, then it runs fine on the console09:20
pbelamgeam I missing anything in first case?09:20
pbelamgeanybody?09:52
*** _ari_|gone is now known as _ari_13:09
pabelangerpbelamge: what version of zuul are you running?13:11
pabelangerwe had this issue recently with a change to logging IIRC13:11
pabelangermake sure your logging file is correct13:11
openstackgerritMerged openstack-infra/nodepool master: Add mirror support for fedora-25 DIB  https://review.openstack.org/45637213:49
openstackgerritMerged openstack-infra/nodepool master: Switch to /etc/ci/mirror_info.sh for nodepool mirrors  https://review.openstack.org/45637413:50
*** jkilpatr has joined #zuul13:50
pabelangeryay13:51
pabelangerhttps://review.openstack.org/#/c/455770/ if people are reviewing :D13:51
openstackgerritMerged openstack-infra/nodepool master: Add functional test for key-name and glean  https://review.openstack.org/45577013:59
*** dkranz has joined #zuul14:09
*** pbelamge has quit IRC14:11
*** eggshell has left #zuul15:57
*** eggshell has joined #zuul15:57
*** corvus is now known as jeblair16:12
jeblairgood morning!16:12
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (1/2)  https://review.openstack.org/45336216:21
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace config/project repos with config/untrusted projects  https://review.openstack.org/45334716:21
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (2/2)  https://review.openstack.org/45382116:21
Shrewsjeblair: welcome back16:44
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix dynamic reconfiguration  https://review.openstack.org/45439516:44
jeblairmordred, pabelanger: i guess you didn't make any more headway on https://review.openstack.org/454396  ?16:48
pabelangerjeblair: sadly no, it was a light week for me last week on zuul things16:49
*** harlowja has quit IRC16:49
*** harlowja has joined #zuul16:52
*** jkilpatr has quit IRC17:26
*** jkilpatr has joined #zuul17:27
* SpamapS cracks knuckles and prepares to dive back in17:35
SpamapSjeblair: feeling recharged I hope? :)17:35
jeblairSpamapS: yes!  let's merge some changes!  :)17:40
Shrewsi think there's still the random job failure issue, yeah?17:41
clarkbShrews: yes I think so17:43
clarkbI have a change up that runs tests twice that seems to catch it that you can recheck to see17:43
jeblairclarkb: what's that telling us?17:47
SpamapSIf running twice hits sometimes.. tells me there's a race.17:47
SpamapSIf it hits always, tells me there's a cleanup problem.17:48
SpamapSjlk: IIRC, --analyze-isolation did not find a bad interaction, right?17:48
clarkbjeblair: SpamapS that and if it's test order dependent the .testrepository data from first run influences test order of second run17:48
clarkblocal testing showed clean first runs are more likely to pass17:48
jeblairclarkb: have you gotten data from your experiment yet?17:49
jlkSpamapS: that's right17:49
clarkbthe change I pushed? it failed as expected to match local results. I haven't had much time to look at it further though17:50
jlkI can run with concurrency of 4 and get failures.17:50
jlkconcurrency 8 is fine17:50
jeblairclarkb: did it fail the first run or the second?17:51
SpamapSjlk: on a box w/ 8 CPUs yeah?17:52
jlkyeah, I haven't tried doing this on a 4 cpu box but with forced 8 concurrency17:52
clarkbjeblair: the second17:52
SpamapSweren't we also suspicious about the sqla reporter tests?17:53
jeblairclarkb: where will you go next with that change?17:54
jlkI removed those from my set and still got failures17:54
clarkbjeblair: I think we need to track down the fails and fix them then possibly merge the change if we think it will prevent regressions else abandon17:55
clarkbit was mostly a sanity check that the gate wasn't special17:55
*** jonesn has joined #zuul17:56
jeblairclarkb: okay, i wasn't sure if you had a plan to use that change to track down the failures.17:57
jonesnIs anyone around who could answer a few (probably basic) questions about adding a gate to a project?17:57
jeblairclarkb: we have a significant first-run error rate as it is, so i don't think running twice is necessary to prevent regressions.17:57
jeblairclarkb: (our error rate is also significant enough that i don't think that a single run of that change is enough to show that successive runs always cause problems.  i run locally with no cleanup between runs and pass/fail about as often as the gate)17:59
clarkbok17:59
jeblairclarkb: may want to throw a bunch of rechecks at it?17:59
SpamapSjonesn: I'm certain people in here can answer questions about gates and projects. It may be best to just ask, and then when people have a moment they can answer.18:03
SpamapSjonesn: most of us are pretty focused on v3 dev, so your patience is very much appreciated. :)18:03
clarkbjeblair: ya, though I think SpamapS --analyze-isolation plan is likely to return the best results in short term18:03
jonesnIs getting a tox based gate added as simple as "have a toxenv to run, edit zuul/layout.yaml, edit jenkins/jobs/projects.yaml"?18:03
jeblairjonesn: are you asking about openstack's instance of zuul?18:05
jonesnjeblair: Yes, I think. I'm trying to add a bandit gate to the cinder project.18:06
SpamapSclarkb: To be clear, my suggestion was to hold a node that fails, and use --analyze-isolation on it when it fails. But jlk basically simulated that and got no insight.18:06
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer  https://review.openstack.org/45672118:07
jeblairjonesn: thanks, that context helps.  these doc links may help: https://docs.openstack.org/infra/manual/creators.html#add-basic-jenkins-jobs should be pretty close to what you want, and for further information: https://docs.openstack.org/infra/manual/drivers.html#running-jobs-with-zuul18:07
clarkbSpamapS: you shouldn't need to hold a node, just make it fail locally (which seems easy) and run analyze-isolation on that. :( that it didn't catch anything though18:07
jeblairjonesn: the #openstack-infra channel is for discussion of openstack infrastructure tools, and there are more folks there that can help with this specific kind of issue18:08
jonesnjeblair: Thank you. Sorry for being on the wrong channel.18:08
jeblairjonesn: you're welcome (and it's not the *wrong* channel, just that there's a better one :)18:09
SpamapSso..18:11
SpamapSthe recent shake-up of Ubuntu dev has me worried about landing bubblewrap in xenial-backports18:12
*** jonesn has left #zuul18:12
SpamapSI've asked on the ubuntu-devel mailing list if they need help and gotten no replies18:12
SpamapSAnybody want to migrate to Debian unstable? ;-)18:12
jeblairSpamapS: iirc, fungi does.  i'm not opposed.18:13
jeblair(i'm also not opposed to centos, fwiw :)18:13
SpamapSYeah either one would be fine.18:13
clarkbmy concern with centos generally is lag on security patches18:14
SpamapSalso Debian's releasing every 2 years now, and stretch is about 3 weeks away.18:14
SpamapSso it's actually not such a terrible thing to be on stable18:14
jeblairSpamapS: it's possible i misspoke for fungi and that's actually what he would prefer :)18:14
jeblairSpamapS: what's the feasibility of doing our own backport?18:14
SpamapSjeblair: Oh our own backport is done. Just don't know if that's something infra wants to host somewhere.18:15
SpamapSthe bug requesting it is literally just a rubber stamp, the backporters team will run a script and upload the backport as soon as they get to it18:15
SpamapSbut there are 43 others in front of it.18:15
SpamapSI've also offered to start helping with that, since I find it quite useful to have a functioning ubuntu backports system.18:16
mordredSpamapS: we _do_ depend on a PPA in infra for one package, although it's not the happiest thing in the world to depend on it since it's a one-off with no process around it18:16
jlkeww18:16
jlkthose always bit us at Blue Box18:17
SpamapSPPA's are, IMO, Launchpad's killer feature.18:17
mordredpatched version of vhd-utils is needed to be able to make images for rackspace public cloud18:17
SpamapSbut yeah, one-offs w/o process are a problem.18:17
mordredyup. totally agree18:17
jlkagreed18:17
jlkI had championed a similar thing over in Fedora land18:17
mordredalso, it's super annoying that there isn't already a tool that can make vhd images that rax public can consume18:17
jeblairat least this would be time-limited (until next release)18:17
mordredjeblair: ++18:17
jlkit just took a lot of time for it to come to fruition (after I left)18:18
SpamapSYeah a backports PPA is better than a "that's never getting into Debian" PPA.18:18
jlkmordred: seems like that problem will solve itself in the future... :(18:18
mordredyah. the vhd-util thing falls in to a "wow, that's a terrible patch" category18:18
mordredjlk: sssh. we're hoping that future remains far away - will be a lot of work if it comes to pass soon :(18:19
SpamapSand with a xenial PPA, it's somewhat natural to end up on the next LTS of Ubuntu with the package coming from universe instead of that PPA.18:19
mordredSpamapS: ++18:19
SpamapSalso it's possible the backporters team gets to it eventually18:19
SpamapSand we just delete it from the PPA18:19
SpamapSkk I'm convinced18:19
mordredin any case, I personally would be fine with a bubblewrap ppa18:19
mordredfor now - since there is a path to the future18:19
Shrewsmordred: should you find time, i'm hoping if you could check out https://review.openstack.org/456721 and see if i'm heading in the correct direction. it's pretty basic atm18:22
mordredShrews: looking now18:22
Shrewsmordred: and if you wonder why i chose socketserver, it's the only thing i found that works with handling sockets in a forked/threaded manner18:22
Shrewscould not get my own manual version to function correctly  :(18:23
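A minimal sketch of the approach Shrews describes, using Python's standard socketserver module to hand each connection to its own thread. This is not the code in change 456721: the handler name, the in-memory logs dictionary, and the use of an ephemeral port on localhost are assumptions made for illustration, and a real streamer would read the job's log file instead.

```python
import socket
import socketserver
import threading


class FingerHandler(socketserver.StreamRequestHandler):
    """Read one finger-style request line and reply with stored log lines."""

    def handle(self):
        # A finger request is a single line terminated by CRLF.
        request = self.rfile.readline().decode("utf-8", "replace").strip()
        # Hypothetical lookup table; stands in for opening a job log file.
        lines = self.server.logs.get(request, ["unknown build\n"])
        for line in lines:
            self.wfile.write(line.encode("utf-8"))


class ThreadedFingerServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    # One thread per connection; threads do not block interpreter exit.
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, logs):
        super().__init__(address, FingerHandler)
        self.logs = logs


def finger(host, port, query):
    """Send a query line and collect everything until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def demo(query="abc123"):
    """Start the server on an ephemeral port, issue one query, shut down."""
    logs = {"abc123": ["line one\n", "line two\n"]}
    server = ThreadedFingerServer(("127.0.0.1", 0), logs)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return finger(host, port, query)
    finally:
        server.shutdown()
        server.server_close()
```

Swapping ThreadingMixIn for socketserver.ForkingMixIn gives the forked variant on POSIX systems, at the cost of losing shared in-process state such as the logs dictionary updates made after the fork.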
mordredShrews: that looks good to me so far - jeblair you wanna double check that to make sure I'm not crazy?18:27
*** hashar has joined #zuul18:58
*** hashar has quit IRC19:18
*** dkranz has quit IRC19:33
fungijeblair: you know me too well. i'd put in time on getting stuff going on sid if there was some consensus it's a good idea ;)19:34
fungi(i mean, would be convenient for me since that's what my dev systems run...)19:34
fungi(or stretch, sure, why not)19:34
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer  https://review.openstack.org/45672119:37
mordredfungi: I think bubblewrap is already in sid, so we'd be good there19:39
pabelangerclarkb: do you have examples of the security lag for centos-7? never heard that before19:39
clarkbpabelanger: patches to things (for example heartbleed) take a day or two longer than the other distros out there19:39
fungimordred: already in stretch (stable rsn!) and in jessie-backports... https://packages.debian.org/search?keywords=bubblewrap19:40
mordredfungi: woot19:40
fungiso pick your poison19:40
clarkbpabelanger: I think because instead of centos patching things directly they wait for rhel then do all their testing then push? something like that maybe?19:40
pabelangerclarkb: ya, if centos depends on RHEL, I could see the day lag or so19:41
fungimordred: if you _specifically_ want >=0.1.8 though, you're probably stuck waiting post stretch for the packages presently hiding in experimental19:41
fungibut 0.1.7 can be had now in stable(backports), testing(frozen) or unstable19:41
fungijeblair: clarkb: SpamapS: should we add the executor security spec on today's meeting agenda for some last-minute digging into the comments about the on-a-test-node alternative? would like to be certain on the pros and cons before approval19:44
SpamapSfungi: I feel like that horse is dead, but we can of course reanimate it if there are some more questions we forgot to ask while we killed it.19:49
fungiclarkb: are you still on the other side of the fence there after my and SpamapS' subsequent comments about the additional cons? (or have you read them yet?)19:51
clarkbI haven't read them yet. I wasn't going to stop moving forward as is (I noted that last meeting)19:51
* clarkb looks now19:52
SpamapSI actually kind of think the executor is a perfect case for a real kubernetes system btw.19:56
SpamapSscale them out as needed, isolate each pod to vms owned only by the one project19:56
SpamapSbut that's a large pile of design work19:56
mordredI agree with both things - I think there are some potential benefits of k8s things, and also that it'll be a large pile of design work to figure out how19:57
* clarkb discovered that k8s reused the metadata service design from openstack (and possibly elsewhere) had a sad20:00
mordredsigh20:00
mordredoh well20:00
mordredwell, to be fair, the openstack one wouldn't be a problem if it was just a part of the normal api layer and thus scaled out with it instead of a separate service layer that nobody wanted to spend resources to scale20:01
mordredit's not like scaling a rest service that returns json blobs is hard20:02
clarkbmordred: I think NAT by definition is in the poor to scale category of solutions20:02
mordredwell - yah20:02
mordredthat part is stupid20:02
fungiclarkb: yeah, i appreciate you were willing to not hold up approval, but since pabelanger scheduled it for approval tomorrow anyway we have time to find out if you still actually disagree. that's important to me (at least)20:02
clarkbalso I discovered it because it wasn't working :)20:02
*** dkranz has joined #zuul20:02
clarkbfungi: yes I think I still do disagree. We know the alternative works, and works relatively well at scale. We also know that ansible has been tricky to secure so I think using an unproven system is less desirable20:03
clarkbthats not to say the other system can't work, its just a lot more risky imo20:04
pabelangerI thought the issue of scaling out zuul-executor today was the caching of our git repos? Or have I confused something20:04
mordredpabelanger: you have not20:04
clarkbI think the option chosen is a good one if not going with the alternative and SpamapS did an excellent job laying out the problem space and options available20:05
mordredclarkb: can you expand "the alternative" into slightly more words so I'm sure I'm following you?20:06
clarkbmordred: running the ansible in the test env itself rather than on the executor. Basically what we do today with eg d-g20:06
fungiclarkb: important enough to give up on being able to have untrusted pre/post playbooks which need access to things only the executor can do, like uploading logs/tarballs? it seemed like a pretty useful (if ambitious) feature, but i understand everything is of course a trade-off to some degree20:06
clarkbfungi: thinking about it operationally, every time a hole is discovered we'd have to turn off zuul20:07
mordredclarkb: gotcha. thank you20:07
fungior i guess the publication playbooks themselves do still need to be trusted, so it's more than those would have to run on the executor (without protections) while untrusted pre/post playbooks run on the test node?20:07
*** jkilpatr has quit IRC20:07
clarkbfungi: right20:07
fungier, more that those20:07
*** jkilpatr has joined #zuul20:07
pabelangerI mean, I like the idea of what clarkb is saying, mostly because that is the only way to do it today. However, I am happy to give the bubblewrap approach a shot too.20:09
fungii suppose there's nothing in the current design preventing a trusted playbook on the executor from calling ansible on the test node to run untrusted playbooks, so in theory we could support both if we already set up executor-side protections20:09
clarkbfungi: the problem is you don't control that in the current design20:09
fungiat worst it makes the spec under discussion redundant/overkill if it ends up only securing trusted playbook execution20:10
mordredwell - yah. I believe we need to write the "run ansible on test node" stuff at some point regardless - because there will be things that want to run with no restrictions but are untrusted20:10
clarkbfungi: your users can push arbitrary code to run in the sandbox and current experience is its quite arbitrary20:10
clarkbmordred: speaking of, does http://docs.ansible.com/ansible/raw_module.html need to be handled specially?20:10
mordredclarkb: I don't believe so, no? but I'll go look at it to make sure20:12
mordredclarkb: yah - it doesn't look like it does anything particularly interesting20:13
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer  https://review.openstack.org/45672120:13
clarkbmordred: executable set that to something like dd20:13
SpamapSmordred: executable=/something/evil ?20:13
clarkbmordred: then the freeform arg options to dd20:13
clarkbSpamapS: ya that exactly20:13
SpamapSneeds a path filter20:14
clarkband you have to make sure the args list can't be abused either if its run by shell by default20:14
clarkbsince you can ; foo20:14
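The `; foo` concern can be illustrated with a small Python sketch, assuming a POSIX system with /bin/sh and an echo binary on the PATH. It does not show how ansible's raw module actually builds its command; the function names are hypothetical and only contrast a shell-parsed string with an argument vector.

```python
import shlex
import subprocess


def run_via_shell(user_args):
    """Unsafe: the string is parsed by /bin/sh, so '; cmd' starts a new command."""
    result = subprocess.run(
        "echo " + user_args, shell=True, capture_output=True, text=True
    )
    return result.stdout


def run_as_argv(user_args):
    """Safer: arguments are tokenized but no shell ever interprets them."""
    argv = ["echo"] + shlex.split(user_args)
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    payload = "hi; echo injected"
    # The shell runs two commands, printing "hi" and "injected" on separate lines.
    print(run_via_shell(payload), end="")
    # Here the semicolon is just a character inside an argument to echo.
    print(run_as_argv(payload), end="")
```

Passing an argument vector only removes the shell from the picture; it does not by itself restrict which executable is run, which is the separate path-filter point raised above.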
mordredexecutable is talking about remote executable20:14
mordrednot local20:14
fungiokay, so situation is that 444495 outlines a mechanism for securing ansible on the executor in the face of untrusted playbooks. we need to be able to run at least some trusted playbooks on the executor anyway, and if there is a shift in consensus later that we should only run untrusted playbooks remotely on single-use test nodes and not on the executor that's probably not a huge additional amount of work (but20:15
fungidoes lose us some exciting v3 features unfortunately)20:15
clarkbah ok20:15
mordredas in, "don't run  bash on the remote host, run XXX"20:15
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer  https://review.openstack.org/45672120:15
SpamapSfungi: I already have a working bubblewrap patch btw20:16
mordredSpamapS: woot!20:16
SpamapShttps://review.openstack.org/45385120:16
SpamapSjust needs bwrap20:16
SpamapSdoesn't do seccomp yet20:16
clarkbSpamapS: why the subshell in bwrap-executor.sh?20:18
SpamapSclarkb: for the FDs20:19
clarkbdoes that not work without a subshell too? I guess concern is that something already has fd 11 and 12?20:19
SpamapSclarkb: so /etc/passwd is just the result of the getent20:19
SpamapSclarkb: we fork to run this so shouldn't be.20:20
SpamapSoh we don't close all tho20:20
clarkb(mostly just curious because shell magics)20:20
SpamapSclarkb: we could get the fds20:20
jeblairclarkb: the "test like production" design goal of v3 would be significantly compromised by "run ansible from the test node" alternative.  i think we have to run ansible from something completely outside of the test framework in the general case.  doing that with k8s is a reasonable thing to look into, much later.  in the mean time, a light-weight containerization is very close from the POV of security architecture (just not in terms of scaling). ...20:33
jeblair...  i hear your concern about vulnerabilities; this adds defense in depth -- we'll have (at least) two layers of protection for the executor -- which i find very reassuring.20:33
jeblair(it also would constrain the CD aspect -- if you have zuul run jobs on a production server, there is no test node from which to launch ansible)20:34
clarkbyup, as I said I think the spec does a good job of laying out the options and reasoning about why this one is chosen. I just personally think that given the trade off of turn zuul off for indeterminate period of time vs a couple annoying aspects about the run in the test env option the run in test env option is better20:34
clarkbjeblair: why couldn't it run on the production nodes too?20:34
clarkbor at least a production node if production == a test env20:35
jeblairclarkb: ansible to the production node to run ansible?20:36
clarkbI also don't think k8s fixes it in the general case where code is coming from users and is arbitrary (it would be if you scoped it down to a tenant and trust your tenant users though)20:36
clarkbjeblair: something to the production node to run ansible (possibly ansible)20:37
jeblairclarkb: if you have a job which is "run something on all the git servers" where does that job launch from?20:37
clarkbbut that something would be more tightly controlled20:37
clarkbjeblair: head (list_of_git_servers) ?20:37
clarkb(there are other security concerns to that too, but at least you've confined the scope of breakage to within the "env", and not env or orchestrator)20:38
jeblairclarkb: that seems pretty arbitrary, and not really i think how people are accustomed to using ansible.  the goal is to try to be as transparent as possible.  if you think "i run this playbook to update my cluster" that should map easily to "zuul runs this playbook to update my cluster".  i think if we add topology design requirements for users beyond that, it won't be very attractive.20:39
clarkbjeblair: yes, I understand that. The problem is its a very poor choice for how we run zuul for openstack20:40
clarkbat least if you are worried about the orchestrator thing being compromised that scope of that is quite large20:40
jeblairi don't see why it's a poor choice20:40
jeblairi see that it's important to be careful and get right; but that doesn't necessarily make it poor20:41
clarkbbecause if I manage to get control of the orchestrator now I control everything and not just the test env that I owned20:41
jeblairclarkb: sure, but there are many other aspects of zuul that if you got control over would give you similar access20:41
clarkband from what we have seen using ansible in this capacity is incredibly leaky20:41
clarkbjeblair: right but we don't let arbitrary code execute within the context of those pieces of zuul20:42
jeblairclarkb: we let zuul run with arbitrary configuration20:42
jeblairclarkb: that seems almost as dangerous.  if not worse.20:42
clarkbjeblair: I'm not sure I follow? today zuul config is not arbitrary. Its reviewed by multiple individuals first20:43
jeblair(personally, i'm actually more worried that we'll mess up something there than someone will escape bubblewrap)20:43
jeblairclarkb: in v3 we have dynamic config20:43
clarkbjeblair: you are saying bigger concern that tenant A might somehow get tenant B's secrets by configuring themselves to run that job?20:44
clarkb(I can see that being a concern too)20:44
SpamapSin a CD situation, Maybe this is misguided, but I don't expect any untrusted playbooks to be running.20:44
jeblairclarkb: sure, or run a job they aren't supposed to, or run a job on a node they aren't supposed to.20:44
jeblairSpamapS: i agree that would be bad form :)20:44
clarkbSpamapS: but you don't actually have control over that20:44
SpamapSI, the zuul admin, do.20:45
clarkb(thought maybe that's something that should be configurable? if it isn't already)20:45
SpamapSfinal for all the things.20:45
jeblairclarkb: i'm not sure what you mean by "but you don't actually have control over that"20:45
SpamapSactually not final for all the things.20:45
clarkbjeblair: users can push changes that change how jobs run. And then they will run before being reviewed20:45
clarkbjeblair: so in the CD case you could have someone push a playbook that takes out testing/staging/prod whatever potentially without being reviewed first?20:46
SpamapSjust that if it's CD, I'm not running check jobs on the "actually deploy to things" branch20:46
SpamapSof course, you still get a gun and a foot20:46
jeblairclarkb: i would not recommend putting CD jobs in untrusted projects; config projects don't run with dynamic config.  so if you put your CD stuff in config projects, you should be fine.20:47
jeblairclarkb: (unless we mess up implementation, as i was saying above)20:47
clarkbright gotcha20:47
SpamapSone would hope anybody with access to the guns will be trying hard not to aim at their foot, and mostly use push powers (guns) for dire situations where reviews are broken.20:47
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Send interface_ip in the node description  https://review.openstack.org/45563920:49
clarkbI think my concern on the ansible code execution front is I've spent a little time with the code base now and basically think there is no way to fully lock it down. Which is where bubblewrap comes in. And in that space people who know a lot more about it than me still say containers are not an isolation primitive. Whereas VMs (which still do get compromised too) seem to have a better reputation in that space20:49
jeblairclarkb: i think we may be crossing the hump on container as security primitive.  certainly the bubblewrap folks seem to think it's reasonable.  and it's being used more and more.20:50
SpamapSso, bubblewrap _is_ trying to drop privileges, and close known doors out of containers.20:51
SpamapSit's not just making namespaces20:52
jeblair++20:52
jeblairclarkb: i agree that ansible blocking is leaky.  and containers have potential vulns (they are still new, in this context, in the scheme of things).  but i think that the two together give me warm fuzzies.20:52
SpamapSthe one piece I'm missing in the implementation I did already is seccomp to further ratchet down access to kernel subsystems.20:53
clarkbdefinitely (which I've said before, bubble wrap of the not VM option seems like a good one)20:53
SpamapSand then another piece which I think is a layer we would want but is hard if not impossible to build into zuul, is configuration of a MAC20:53
mordredclarkb: ++20:53
mordredthis is why I've been an advocate for both things. I agree that we're unlikely to get perfect on locking down ansible - and also that i'm still holding out concerns about containers - but with both of them at the same time it seems to me to be in the acceptable range - and also will give me some time in production with a container tech to get to trust it more20:54
clarkbI'm also slightly concerned about the operational overhead involved with getting a working secure bubblewrap on $distro, but thats not my real concern, thats solvable once and then done20:54
mordredwithout relying _only_ on the container tech20:54
jeblairclarkb: and i think that the design we've chosen for v3 is a good one -- i think that the way we are looking at jobs in v3 is nothing short of revolutionary in both the CI and CD spaces.  i think it's worth doing as much as we can to try to achieve that.  if bwrap+ansiblock is insufficient, i'd rather tack toward k8s/nodepool-extra-node/etc rather than give up on that idea.20:54
SpamapSIt may make sense to include an apparmor and/or an selinux setup kit. But I think those may be hard to make dynamic and built in.20:54
mordredSpamapS: yah20:55
jeblairSpamapS: the day we can run zuul from packages that include that will be a happy day :)20:55
mordredjeblair: ++20:56
pabelangerfedora rawhide soon(tm)20:56
clarkb(I also don't want to go too far down the path of k8s magically solving this problem as I do not think it does, its solves an orthogonal problem of having cheap throwaway execution envs that may or may not be leaky themselves)20:57
jeblairclarkb: but ultimately, i think that bwrap+ansiblock(+more) is good now, and if we can get security+scalability out of k8s/nodepool/etc later, that's a good plan.20:57
jeblairclarkb: indeed20:57
SpamapSat this point, with just bubblewrap, if you can get ansible to bust out, you're sitting in a purpose built readonly dir with a bind-ro mounted, user namespaced (There is no root) /usr with a namespaced everything-except-networking, a locked fsuid ...20:58
clarkbSpamapS: interesting that networking is not namespaced (but since there is no root that should be fine)20:58
SpamapSit can be20:59
SpamapSbut we don't need it20:59
SpamapSYou are also limited to CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_NET_ADMIN, CAP_SETUID, CAP_SETGID (but without a way to get a uid==0)20:59
clarkbSpamapS: do we want to allow CAP_NET_ADMIN if not namespacing networking (or do you mean in general you can have CAP_NET_ADMIN?)21:00
SpamapSclarkb: bubblewrap allows those caps, and drops all others.21:01
SpamapScould probably suggest a feature where if not doing --unshare-net then CAP_NET_ADMIN is dropped21:02
SpamapSa lot of what's done in bubblewrap is dropping "things that make no sense to give to applications". But there are, I guess, apps that need that CAP sometimes.21:02
clarkbSpamapS: ya just thinking that could be used to DoS the env if you break out of ansiblock21:03
SpamapSit also implements privilege separation for things that it does post-fork21:04
SpamapSit's just solid sandboxing21:04
jeblairShrews, mordred: why not make fingerd part of executord?21:13
Shrewsjeblair: tried that first, but gotta be root for the finger port21:14
jeblairShrews: ah fun.  well, we still have to solve that even if it's a separate program, right?  we could drop privileges when daemonizing...21:16
Shrewsjeblair: so it was either a separate thing, or change executor to drop privs to the configured user21:16
jeblairis there a way to use capabilities with a python program?21:16
mordredhttps://pypi.python.org/pypi/deescalate/0.121:17
jeblair(can you setcap /usr/local/bin/foo.py or would you have to setcap /usr/bin/python)?21:17
clarkbcould also potentially do the gerrit thing and run on some high port by default?21:17
clarkb(that's less useful if you just want your finger command to work though)21:17
mordredwhich is, of course, using C: https://github.com/stephane-martin/deescalate/blob/master/deescalate/_deescalate.pyx21:18
jeblairclarkb: yeah, could do that and iptables.21:18
jeblairi think the daemon module also supports switching uids21:20
Shrewsjeblair: i went with the separate daemon to avoid any pesky security things by changing how the executor privs work, but i could go back and rework it again to be in the executor if you prefer.21:20
Shrewsi think the separate process is actually pretty simplified. it can get all the info about jobs it needs from the zuul.conf file. but tomato tomato21:22
* Shrews just glad to be coding on non-nodepool things :)21:22
jeblairShrews: i'm not convinced i know the right answer right now.  combining it with executor pros: there's a 1:1 relationship between executors and fingerds, so it makes sense.  there would be less boilerplate process code for devs and fewer daemons for operators to know and run.  if we used threads, we have easy internal access to the jobdir and the host inventory.  cons: more work for the executord, especially if we use threads.  we can still use ...21:24
jeblair... fork, though then we lose the easy access to internal variables.21:24
jeblairShrews: on balance, i'm leaning toward ignoring the internal variable access argument so that we can choose thread/fork as appropriate.  but i'm being swayed by the idea that if we combine them, we don't need to keep track of (or explain to operators) an extra process.21:25
SpamapSI'm curious now...21:26
SpamapShow does zuul know what to put in the finger hostname?21:26
jeblairShrews: (and i realize you have already solved the jobdir location problem, so that shouldn't be an issue.  at some point in the future, we will probably want to know host inventory so we can request /var/log/syslog@test-host.  but of course, we can read the ansible inventory file.  :)21:27
SpamapSin the past, the telnet hostname is just the node's best effort public IP21:27
jeblairSpamapS: the scheduler will know the executor running a job, so it can say "finger UUID@executorhostname"21:27
Shrewsjeblair: playing devil's advocate, the executor should just "execute" jobs21:28
Shrewsbut also, we're sending the finger requests to the executor host, so....21:28
SpamapSjeblair: yeah, then that's another good argument for fingerd==executord21:28
jeblairShrews: i hear that.  i think there's a fuzzy line between too few and too many microservices.  i'm not sure the best way to characterize that, but things like "relationship between services and hosts" is one of them, "how annoying is it for operators" is another, and "how does it affect scalability".21:30
jeblairShrews: those first 2 might be the same thing.  :)21:30
Shrewsjeblair: i'll code it up the other way and then we can do a side-by-side comparison. maybe that will help with the decision making?21:30
Shrewsgrrr... i think i lost the code from the first time. ah well.21:32
fungifrom a security perspective, separate daemons _feels_ safer. but the devil is of course in the details21:32
Shrewsfungi: YOU are the devil21:32
jeblairShrews: at any rate, for the last one, i'd say that's a weak push toward microservice, but the first two, i'd rate as a slightly stronger push toward monolithic.21:33
fungii advocate for him well, at any rate21:33
jeblairfungi: good point, that should be on the list too.  :)21:33
jeblairfungi, Shrews: i think the security footprint is similar in both cases (we want this to run as the zuul user after getting the port and dropping privileges regardless).  so it should end up having the same level of access.21:34
* fungi still thinks qmail was a good design, security-wise (just made for a fairly unfun management situation at times if you forgot what needed to be kept running)21:34
fungiyeah, i guess my worry is that you inadvertently open up an anonymous/unauthenticated vulnerability in the finger socket implementation which allows an attacker to influence or even take control of an executor running a sensitive job21:35
fungii get that's a ton of hand-waving though21:35
Shrewsfungi: yeah, that was my initial thinking in choosing the separate daemon pathway21:37
fungiand still conceivable even if they're separate... assuming the fingerd has access to all the same files on disk that the executor does21:38
fungithough would it be possible to limit its access to just the logs it's intended to stream?21:38
jeblairfungi: if that's a concern, we *could* start a separate process from the executor daemon (like we do for geard).  so at least it's transparent for the user.  the identical access argument makes me weigh this fairly lightly though.21:38
fungiyep, i get that21:39
fungijust doing my part at devil's advocacy21:39
fungiultimately it's going to be about the same either way, and one way is less work21:39
jeblairShrews: i left some total nits on PS4 because that's what you were on when i started typing them.  :)21:40
jeblair(ok, 3 nits and one actual thing)21:40
Shrewsjeblair: k. ps5 just adds the actual streaming21:41
Shrewswhich is really just a rip off of zuul_console.py21:42
jeblairthat seems to have been successfully streaming logs, so ++ :)21:42
Shrewsfwiw, i do not believe zuul_console would actually properly close the sockets at the end of the log. i had a hard time getting that to actually work21:43
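For readers following the streaming discussion, here is a minimal sketch of a finger-style log server that closes the socket once the log has been sent. It is not the code from the change under review and not zuul_console.py: the `LOG_PATHS` table, the handler name, and the helper functions are all hypothetical, and it only serves a log file that is already complete rather than following one that is still being written.

```python
import socket
import socketserver
import threading

# Hypothetical lookup table: build UUID -> path of that build's console log.
LOG_PATHS = {}


class FingerLogHandler(socketserver.StreamRequestHandler):
    """Answer one finger-style query by sending back a log file."""

    def handle(self):
        # A finger query is a single line terminated by CRLF.
        query = self.rfile.readline(1024).decode("ascii", "replace").strip()
        path = LOG_PATHS.get(query)
        if path is None:
            self.wfile.write(b"unknown build\r\n")
            return
        with open(path, "rb") as log:
            while True:
                chunk = log.read(4096)
                if not chunk:
                    break
                self.wfile.write(chunk)
        # Returning from handle() lets socketserver shut down and close the
        # connection, which is how the client learns the log has ended.


def serve(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), FingerLogHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def finger(host, port, query):
    """Tiny client: send the query line and read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        data = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                return data
            data += chunk
```

The real finger port (79) is privileged, so a deployment would bind it and then drop privileges as discussed above; the sketch sidesteps that by using an unprivileged port.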
mordredjeblair: btw  I read the finger protocol RFC ... and in so doing learned about finger foo@bar.com@bang.com syntax21:43
Shrewsand the vending machine query functionality!21:45
jlktoday's data point21:46
jlkmy tree passes tests with 8 cores, concurrency 821:46
jlkfails tests with 8 cores concurrency 421:46
jlkfails tests with 4 cores concurrency 821:46
jlk(unless tox /testr forces concurrency to be no more than cores)21:47
jlk(also fails 4 cores, concurrency 4)21:47
fungiunit test race(s)?21:48
mordredjlk: I swear to god, the fix is going to be a comma somewhere21:49
SpamapSjlk: have you tried changing the hard fail to soft fail in the timeout setup?21:50
jlkI have not.21:50
jlkwhat would that look like?21:50
SpamapSgentle=False changes to gentle=True21:51
SpamapSin tests/base.py21:51
SpamapSjlk: also do you have a log of your fail that isn't berzillions of bytes?21:52
jlkno21:52
jlkbecause it seems to need to be the whole she-bang21:52
jeblairmordred, Shrews: would "finger /var/log/nova.log@compute@build_uuid@zuul.example.com" be appropriate?  or would the rfc have us use something other than @ there?21:52
jlksmaller sets of tests have passed21:52
jamielennoxno meeting today?21:52
SpamapSjlk: k, is there one I can pull down?21:52
jeblairjamielennox: i think we are going to have one?21:53
jeblairunless we all decide it's not useful.  :)21:53
jamielennoxisn't now the time?21:53
Shrewsjeblair: i think so. i was considering 'finger JOB:/log1@compute@something'21:53
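To make the multi-@ idea concrete, here is a small sketch of how a target like the one jeblair proposes could be split. It assumes the usual forwarding convention, where the client connects to the last host and sends everything before the final @ as the query line; the function names are hypothetical, and how Zuul would interpret the middle components (test host, build UUID) is exactly what was still being decided in this conversation.

```python
def parse_finger_target(target):
    """Split 'query@host1@...@hostN' into (query, relay_hosts, connect_host).

    Assumption: the client connects to the last host and treats any hosts
    in between as forwarding hops, as in 'finger foo@bar.com@bang.com'.
    """
    parts = target.split("@")
    if len(parts) < 2 or not parts[-1]:
        raise ValueError("expected at least query@host")
    query = parts[0]
    relay_hosts = parts[1:-1]
    connect_host = parts[-1]
    return query, relay_hosts, connect_host


def request_line(target):
    """Build the line a client would send to connect_host."""
    query, relay_hosts, _ = parse_finger_target(target)
    return "@".join([query] + relay_hosts) + "\r\n"
```

For `/var/log/nova.log@compute@build_uuid@zuul.example.com`, this connects to `zuul.example.com` and sends `/var/log/nova.log@compute@build_uuid`, leaving the server to decide what `compute` and `build_uuid` mean.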
jeblairjamielennox: in 7 minutes.21:53
jamielennoxor have my calendars gone crazy again21:53
jamielennoxdoh21:53
jamielennoxyea, my bad21:53
mordredjeblair: I will not be able to be at the meeting- it took us slightly longer to arrive back in Dallas and we're just now hitting rush hour traffic21:53
SpamapSjamielennox: sorry, we don't do wallaby time.21:53
jamielennoxSpamapS: i try so hard to get the hours lined up i completely screwed up the minutes ;)21:54
jlkyou could have been from one of those dictatorships that set their time to be a few minutes off, just to dick with people.21:54
fungii see you've read the constitution for my jerkocracy21:56
jamielennoxjust google calendar changing the notification to 10 minutes, rather than starting now21:56
jeblairfungi: i didn't think you let anyone read that?21:57
jlkfirst rule of jerkocracy, nobody gets to know what the rules are.21:57
clarkbthe timezones that are off by half an hour always make my brain stop working21:57
jlkI rather like the UTC complication on my watch face.21:58
clarkbI can do hour math mostly. But when I have to add or subtract half an hour I dunno what happens but brain breaks21:59
SpamapSjlk: you mean, like.. Bangalore?21:59
SpamapSwhich is UTC+123021:59
SpamapSor something like that21:59
jlkI think there are a few others like that21:59
jamielennoxadelaide22:00
SpamapSUTC+0930 I think22:00
jamielennoxluckily nothing really happens in adelaide22:00
SpamapSlike "oh we couldn't possibly have tea with the sun in the sky 3 degrees further on. no no no"22:00
pabelangerNewfoundland time zone checking in! UTC-3:3022:00
SpamapSso now there is a meeting yeah?22:00
SpamapSpabelanger: I guess closer to the poles it might matter. ;)22:01
jeblairmeeting time now22:01
jeblairor 1 minute ago22:01
*** jkilpatr has quit IRC22:40
*** jkilpatr has joined #zuul23:18
