Tuesday, 2017-03-14

00:01 <clarkb> https://lwn.net/Articles/706025/
00:01 <SpamapS> ok I'm finding the cgroup things
00:02 <clarkb> looks like it may still be a work in progress
00:03 <SpamapS> It works
00:03 <SpamapS> but it's new
00:03 <SpamapS> so I think the way you're supposed to do it
00:03 <SpamapS> is write a template unit file
00:03 <SpamapS> and then systemd-nspawn with that
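(Editor's note: a minimal sketch of the "template unit + systemd-nspawn" idea mentioned here. The unit name, paths, and flags are hypothetical illustrations, not anything from the actual zuul spec.)

```ini
# Hypothetical template unit, e.g. /etc/systemd/system/zuul-job@.service
[Unit]
Description=Zuul untrusted job %i

[Service]
# --ephemeral runs the container on a throwaway snapshot of the tree,
# so anything the job writes is discarded when the unit stops.
ExecStart=/usr/bin/systemd-nspawn --ephemeral \
    --directory=/var/lib/zuul/jobs/%i \
    /usr/bin/ansible-playbook /work/playbook.yaml
```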
00:04 <jlk> So it might help if we sketch out what we think we want to happen
00:05 <jlk> and then look at what tooling will do some of it, and what tooling will require us to write glue around it
00:06 <SpamapS> Yeah the spec right now is about what we think might happen if somebody breaks out, and weighs the capabilities against that.
00:06 <SpamapS> But it doesn't do much to help design the happy path.
00:11 <SpamapS> What I do see as the happy path is 0) rootfs image with ansible-playbook, git, and deps, built periodically 1) copy image into empty dir 2) make writable scratch space in dir 3) [git magic in scratch space] 4) run trusted pre-playbooks in dir 5) chroot into dir and run ansible-playbook in untrusted context 6) run trusted post-playbooks in dir 7) dust off, nuke it from orbit
00:11 <SpamapS> (it's the only way to be sure)
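(Editor's note: a shell sketch of the happy path above, steps 0-7. All paths are illustrative assumptions; the chroot/ansible steps are left as comments because they need root and a real prebuilt rootfs image.)

```shell
ROOTFS=${ROOTFS:-/var/lib/zuul/rootfs}   # 0) image with ansible-playbook, git, deps
JOBDIR=$(mktemp -d)                      # 1) copy image into an empty dir
cp -a "$ROOTFS/." "$JOBDIR" 2>/dev/null || true   # missing image tolerated in this sketch
mkdir -p "$JOBDIR/work"                  # 2) writable scratch space
# 3) [git magic]: prepare the merged change under test in $JOBDIR/work
# 4) trusted pre-playbooks run against $JOBDIR from the launcher
# 5) chroot "$JOBDIR" ansible-playbook /work/playbook.yaml   # untrusted context
# 6) trusted post-playbooks collect logs/artifacts out of $JOBDIR
rm -rf "$JOBDIR"                         # 7) dust off, nuke it from orbit
```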
00:13 <jlk> I'm fuzzy on v3 stuff, the launcher is going to push code into the node, yes?
00:13 <SpamapS> yes, the pre playbook does that
00:13 <jlk> and the launcher is also what will fetch code from the upstream to merge the proposed patch(es) in the right order?
00:13 <SpamapS> [git magic]
00:14 <SpamapS> yes that's step 3
00:14 <jlk> any reason why the [git magic] can't happen inside the containment?
00:14 <jlk> does it require too much zuul?
00:15 <SpamapS> so that's where the debate on whether or not to have separate mergers comes from
00:15 <SpamapS> I recall there being some reason that having it separate is desirable
00:15 <clarkb> SpamapS: scaling is a big one (git merging is slow)
00:15 <SpamapS> but clearly it got ejected from my LRU cache
00:15 <jlk> there's a security concern
00:15 <jlk> if dealing with private repos
00:15 <SpamapS> Right, private repos are enabled by the auth system
00:16 <SpamapS> In theory anyway
00:16 <SpamapS> probably some plumbing to make that work eventually
00:16 <SpamapS> Since right now connections are what define how to fetch code.
00:17 <SpamapS> jlk: step 3 is in [] because it may also just be that we copy the merge result which the mergers created somehow somewhere.
00:17 <SpamapS> I think I'll add a revision to the spec that puts that general flow in for debate
00:18 <jlk> kk
00:18 <jlk> walk backwards
00:18 <jlk> to execute the in-repo playbook, we need the repo contents. heh.
00:18 <jlk> but wait
00:18 <SpamapS> aye we do
00:19 <jlk> repo has a playbook in it
00:19 <SpamapS> but in-repo playbooks are only in step 5
00:19 <SpamapS> pre/post come from config repos
00:19 <jlk> k
00:19 <jlk> I'm still struggling with step 5 for a moment
00:19 <SpamapS> that's the biggie :)
00:20 <jlk> an in-repo playbook gets executed by the launcher, so it has to be written from the context of ....
00:20 <jlk> this still feels weird to me, as in why we aren't prepping a VM, and making that in-repo playbook execute on the _vm_, with all the code around it
00:21 <SpamapS> jlk: the reason we're using ansible is it lets us do some interesting things between vms
00:21 <SpamapS> jlk: but.. that's not a bad idea
00:21 <jlk> it does, but it opens up a big can of worms
00:22 <jlk> because if I want to do interesting things between VMs, I'm going to care about the ansible I'm using to do it
00:22 * rbergeron pulls out her pandora's box of dependencies
00:22 <jlk> and the environment it's running in
00:22 <SpamapS> rbergeron: no it's ok, we're rewriting ansible in Rust
00:22 <SpamapS> almost done
00:22 <SpamapS> ransible
00:22 <rbergeron> SpamapS: oh good
00:22 <SpamapS> you heard it here first
00:22 <jlk> ansibust
00:22 <SpamapS> ding ding
00:23 * SpamapS renames
00:23 <SpamapS> jlk: so I see what you're getting at
00:23 <SpamapS> and I kind of love it
00:23 <SpamapS> just push the git trees up and run ansible from one of the nodes
00:23 <SpamapS> and they can do whatever they want
00:24 <SpamapS> no plugins needed even
00:24 <jlk> it just feels wrong to me to hand the repo rights to execute things on our control environment
00:24 <SpamapS> I agree
00:24 <SpamapS> ansible is too powerful for that
00:24 <jlk> maybe it costs jobs an extra VM as a bastion
00:24 <jlk> but it side-steps a whole lot of nastiness
00:25 <SpamapS> Right, that bastion can be pretty tiny
00:25 <SpamapS> and if your job can handle it, just run your tasks on localhost
00:25 <jlk> I may be biased, I've written ansible to run ansible before
00:26 <SpamapS> so the default py27-tox job is just a localhost shell.
00:26 <jlk> yeah
00:26 <jlk> you can get even trickier
00:26 <jlk> this is ugly, but you add 10 hosts, all with ansible_connection=local
00:26 <jlk> and then you can do things in parallel, locally
00:27 <jlk> stop, disregard that.
00:27 <SpamapS> Yeah that's not the thing :)
00:27 <SpamapS> I think what you'd do is simply have a predictable host that the executor uses to run the in-repo playbooks.
00:28 <SpamapS> mordred: ^ why didn't you think of this? ;)
00:28 <SpamapS> jeblair: ^
00:28 <SpamapS> clarkb: ^
00:28 <jlk> like, Please do convince me why it's a good idea to be doing the in-repo playbook execution _on_ the executor
00:28 <SpamapS> oh man, time changed.. it's 5:30 but I could like, go outside and feel the day star still
00:28 <clarkb> except it's raining
00:29 <SpamapS> clarkb: what you need is a nice drought-addled state ;)
00:30 <clarkb> SpamapS: oregon is completely drought free for the first time in 5 years or something
00:30 <clarkb> ERAIN
00:33 <jlk> I think we've had 3 "mild" days since October.
00:35 <SpamapS> I believe California is now 98% drought free
00:35 <SpamapS> with San Diego being the only part still slightly behind
00:35 <SpamapS> jlk: seems mordred and jeblair are not around. I think we should discuss your thinking with them tomorrow.
00:36 <clarkb> also coldest winter in a quarter century. I have not enjoyed this winter
00:36 <jlk> I could be missing something fundamental
00:36 <SpamapS> Because honestly.. I'd much rather inject ansible into the node that we're already working hard to build and isolate, than try to execute it securely on our executor.
00:37 <SpamapS> http://www.laalmanac.com/weather/we13a.htm
00:37 <SpamapS> We're a wee bit ahead of usual season totals ;)
00:38 <jlk> not exactly zuul related, but I have a question on workflow
00:38 <jlk> I feel like I'm doing this wrong.
00:39 <jlk> I've got this long series of patches I'm trying to bring over, one by one, to v3 and submit them
00:39 <jlk> what I've been doing is making a new topic branch based on the previous topic branch, then cherry-picking the old patch over.
00:39 <jlk> fixups, then git review -t
00:40 <jlk> I realize that if I need to fix one of the early patches, I now have like 20 branches to mess with
00:40 <jlk> should I be doing this all from _one_ local branch, so that a fixup/rebase will bubble up the stack once and a git review -t will re-submit new patch sets for the whole stack in one shot?
00:41 <jhesketh> jlk: that is the approach I take with long series (and I've seen others do the same).. It's kind of expected that a change down the stack will rebase subsequent work. The patchset diff view in gerrit makes re-reviewing changes easier
00:42 <SpamapS> jlk: one branch, definitely
00:42 <SpamapS> name the branch your desired -t argument
00:42 <jlk> okay. I had started with one branch, but for some reason I abandoned that early on, and I can't recall why.
00:42 <SpamapS> git review will set topic based on branch name IIRC
00:42 <clarkb> ya I do a lot of git rebase -i HEAD~N
00:43 <SpamapS> that^
00:43 <clarkb> and jeblair has a little utility to figure out that rebase command automagically for you
00:43 <clarkb> I forget the name though
00:43 <SpamapS> would be cool to have that as a git sub-command
00:43 <clarkb> git restack?
00:43 <SpamapS> like git review-rebase or something
00:43 <clarkb> SpamapS: ya I think it is
00:43 <SpamapS> though it's probably also part of stgit
00:43 <clarkb> so if you pip install it you just git foo it
00:43 <SpamapS> well but stgit doesn't check gerrit for you, just checks when you last stacked
00:44 <jeblair> yep, git-restack; it's on pypi
00:44 <clarkb> I just do git rebase -i HEAD~N because I have used it for years and it's harder to retrain my brain than to keep going with it
00:45 <jeblair> yeah, it's like ^ but for people who can't count.  like me.  :)
00:45 <jeblair> https://pypi.python.org/pypi/git-restack/1.0.0
00:45 <jeblair> has doc and source links
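(Editor's note: a self-contained demo of the one-branch workflow discussed above: keep the whole series on a single topic branch and fix an early patch with one rebase. The gerrit push, `git review -t <topic>`, is only shown as a comment since it needs a live gerrit; the repo and commit names are made up.)

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m "base"

for n in 1 2 3; do                       # the stacked series, one branch
    echo "change $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -qm "patch $n"
done

echo "fix" >> file1.txt                  # a fix for the earliest patch
git add file1.txt
git commit -qm "fixup! patch 1"

# Non-interactive stand-in for "git rebase -i HEAD~4": --autosquash folds
# the fixup commit back into "patch 1" and replays patches 2 and 3 on top.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~4
# git review -t my-topic                 # would resubmit the whole stack
```

git-restack (mentioned just above) exists to compute that `HEAD~N` base for you by finding where the branch diverged.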
00:52 <jhesketh> I usually rebase back onto master or the target branch as it helps with merge-conflicts and the testing from the gate anyway
00:53 <SpamapS> jlk: I'm writing up what we talked about into the spec. I do have a few little gotchas (but they're smaller than "use docker")
00:54 <SpamapS> But it also gains us a feature
00:54 <SpamapS> which is that you'd only need a public IP on the untrusted executor node
00:54 <SpamapS> s/public/reachable/
00:58 *** jamielennox is now known as jamielennox|away
01:07 <SpamapS> jlk: spec updated to include execution on node
01:08 <SpamapS> I suspect the reason it was discounted was that putting ansible on the node and running it there feels like intruding on the user's node.
01:12 *** jamielennox|away is now known as jamielennox
01:12 <jlk> so the jobs and such that would run tox, were they going to run on a node, or were they going to run right on the executor?
01:14 <jlk> A concern I have outside of security with allowing execution on the executor is that we'd have to scale them a lot more. Before, nodepool was the scale point, because that's where the jobs executed. More jobs == more nodes. But if it can happen on the executor, that's something new we'd have to scale
01:14 <jlk> and that cost model may be different. Nodepool resources may be "cheaper" (donated), but infra resources need to be _ours_ and more tightly managed.
01:40 <pabelanger> jlk: SpamapS: FWIW this is how I test ansible things today with zuul v2.5: clone repos on node, pip install ansible, set host to 127.0.0.1. Works as expected for single-host jobs
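(Editor's note: pabelanger's pattern might look roughly like the sketch below; the inventory group, test command, and path are hypothetical illustrations, not his actual job.)

```yaml
# Inventory on the node itself, so the node runs its own ansible:
#   [test]
#   127.0.0.1 ansible_connection=local
- hosts: test
  tasks:
    - name: run the project's tests from the repo cloned onto this node
      command: tox -e py27
      args:
        chdir: "{{ ansible_user_dir }}/src/myproject"   # hypothetical path
```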
01:41 <jlk> in V3, in layout, is there still a syntax when defining a project to say "run these jobs only if this first job succeeds"?
01:41 <jlk> the docs say yes, but my testing is upset with the syntax
01:43 <jlk> pabelanger: that's what we do too
01:43 <jlk> we're actually doing that plus multi-node to test ansible working with multiple nodes
01:43 <pabelanger> devstack works that way today too
01:45 <jlk> ah, syntax changed slightly
01:46 <pabelanger> but, so far with my testing of zuulv3, things work as expected from zuul-executor. Assuming we can agree on the container / chroot method of running ansible-playbook, that gets us 99% of things covered I think
01:46 <jlk> SpamapS: are we tracking docs bugs in v3 anywhere?
01:46 <pabelanger> then we don't need to worry about bootstrapping workers with ansible dependencies
01:46 <jlk> well...
01:46 <jlk> it works until you want to test your stuff with a different version of Ansible than what's on the zuul-executor
01:47 <pabelanger> sure, but that is an issue today that is not specific to zuul
01:47 <jlk> right, however
01:47 <jlk> the "run on the node" model allows the user to influence the version of Ansible used
01:47 <jlk> "run on the executor" does not
01:47 <pabelanger> right
01:47 <pabelanger> but
01:48 <pabelanger> I think we want to have a playbook that allows a user to run ansible on the node
01:48 <pabelanger> so, zuul-executor runs a playbook on a remote worker, to then run ansible
01:49 <pabelanger> I wonder
01:49 <pabelanger> if we do have ansible-playbook in a container, on the executor, could we not expose which container (and version of ansible) to run?
01:50 <jlk> I honestly think people got too excited that Ansible is the task execution engine and crept too far up the stack.
01:50 <pabelanger> that will get complicated fast I think
01:50 <jlk> There's a ton of work that went into scale/isolation of nodes to run suspect code
01:50 <jlk> re-producing that inside the executor feels wrong
01:51 <jlk> pabelanger: I think we'd want to just simply expose containers in general as a "node" resource in nodepool to go down that route
01:53 <pabelanger> I think this is where k8s comes into play
01:53 <pabelanger> and with that
01:53 <pabelanger> I am getting a beer and calling it a night
04:55 <openstackgerrit> Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Support for dependent pipelines with github  https://review.openstack.org/445292
05:20 <openstackgerrit> Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Don't try and delete nodes with no external_id  https://review.openstack.org/445299
05:21 <mordred> jlk: the intent is that work always happens on nodes, never on executors - that's why we disallow local execution etc
05:22 <mordred> jlk: you said many more things above that will be easier to chat about when we're both online at the same time I think though :)
05:22 <jlk> most likely, yes
05:22 <jlk> maybe even in a voice situation
05:23 <mordred> yah - possibly so
05:29 <mordred> SpamapS, jlk: yes, amongst the reasons we didn't execute the ansible on the remote node is that then we'd have to install things on the remote node - and we'd have to re-implement the remote execution code that ansible already has in order to execute things remotely - so we'd end up having unclean target nodes again and a more brittle execution pipeline
05:29 <jlk> it's a game of tradeoffs
05:29 <mordred> yes, it is
05:30 <jlk> because now we're re-implementing user-provided code execution isolation
05:31 <mordred> well, we don't necessarily have to re-implement that part - it is totally acceptable for us to use an existing technology for such containment
05:31 <jlk> well, I mean, you're having to add a second containment system so that you can execute things in your existing containment system
05:31 <mordred> again - tradeoffs - so far we've been uncomfortableish with the root requirements of most of them - or the VC-backed obvious pattern of industry abuse of the one that doesn't require zuul itself to be root
05:33 <mordred> but the executor executing a local process that is "rkt ansible-playbook" instead of "ansible-playbook" isn't nearly as different as replacing "ansible-playbook" with "use paramiko to scp files to remote host, then use paramiko to execute ansible-playbook there"
05:34 <mordred> because then we'd also have to do something about trusted pre and post playbooks copying resources - because if we allow code to execute on the same host - even if it's the remote one - we can't trust that the user didn't rootkit the host anymore
05:34 <jlk> right
05:34 <mordred> which means we wouldn't be able to use ansible to write log publishing or artifact copying or execution of jobs that need secrets
05:35 <jlk> How does the user write things with this in mind?
05:35 <rbergeron> stupid rootkits
05:35 <mordred> jlk: so - 2 different answers in my head - for two different userbases ...
05:36 <jlk> I'm trying to map this in my head, how I would craft ansible playbooks to do my unit tests and whatnot
05:37 <jhesketh> mordred: why not use ansible on the executor to run ansible on the node?
05:37 <mordred> yah - so the main thing is that you can't write playbooks that do host=localhost - they must always be host=somenodename - and if you have tasks that naturally do something on the localhost, like copy, you can't reference absolute paths that are outside of the execution context
05:37 <mordred> jhesketh: right - that's the answer for folks who _do_ need to test more exquisitely crafted ansible
05:39 <mordred> (btw, I'm saying all this like it's settled mostly just to try to explain the thinking thus far, not to necessarily rule out discussion of what might be)
05:41 <jlk> what feels "easiest" to me would be using ansible on the executor to run the pre playbook (since we control that content), which preps the remote. Then whatever the user provided (be it a shell call or a full playbook) gets executed _on_ the node (since it's isolated, tenant separated, etc..) via ansible calling ansible, and then the post is run again from the executor since we control that content, and ostensibly nothing has leaked back from the node to the executor.
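(Editor's note: a rough sketch of the flow jlk describes, as executor-side plays. All names here, hosts, paths, and the playbook under test, are hypothetical illustrations, not the real zuul v3 job interface.)

```yaml
- name: trusted pre phase, run from the executor
  hosts: node
  tasks:
    - name: push the prepared git trees to the node
      synchronize:
        src: /var/lib/zuul/work/
        dest: /home/zuul/work/

- name: untrusted phase, ansible calling ansible on the node
  hosts: node
  tasks:
    - name: run the user's in-repo playbook on the node itself
      command: ansible-playbook -i 127.0.0.1, -c local playbooks/test.yaml
      args:
        chdir: /home/zuul/work

- name: trusted post phase, run from the executor
  hosts: node
  tasks:
    - name: pull logs back without trusting node-side code
      synchronize:
        src: /home/zuul/work/logs/
        dest: /var/lib/zuul/logs/
        mode: pull
```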
05:42 <jlk> but I get that it's awkward
05:42 <mordred> well - I 100% think we should write that
05:42 <mordred> and have it be an opt-in that a user can request for one of their jobs
05:43 <mordred> because we need it to handle the case of "test ursula" - since ursula needs plugins and whatnot (ursula is the most complex ansible I'm deeply familiar with, so it's my usual go-to in my head for complex use-cases)
05:44 <jlk> right, and Ursula cares _which_ version of ansible is used
05:44 <mordred> but I _think_ we can write a lot of the "execute remote ansible" with ansible
05:44 <jlk> or say openstack-ansible
05:44 <mordred> like a role
05:44 <jlk> which may want to test _multiple_ versions of Ansible
05:44 <mordred> that you can request an ansible version with
05:44 <mordred> so like "role: remote-ansible version: 2.1 target: {{ zuul.work_dir }}" or something
05:44 <jlk> but this also feels kind of like python envs or ruby envs
05:45 <mordred> that'll be super crazy ansible but that we wrote
05:45 <jlk> in travis we can say "I want py27, py32"
05:45 <jlk> I don't know how those are implemented, but I know when my job runs on them that version of python is there
05:45 <jlk> those might be nodepool images
05:45 <mordred> I don't think they need to be images
05:46 <mordred> I think the "run ansible" job in the standard library can install the requested version of ansible on the node in a pre-playbook
05:46 <jlk> yeah, it could be pre-playbooks to install them
05:46 <mordred> and then run with that
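(Editor's note: a hedged sketch of such a pre-playbook; the variable, version, and venv path are made up for illustration.)

```yaml
- hosts: node
  vars:
    requested_ansible_version: "2.1.0"   # would come from the job definition
  tasks:
    - name: install the requested ansible version into a venv on the node
      pip:
        name: "ansible=={{ requested_ansible_version }}"
        virtualenv: /home/zuul/ansible-venv
    # the job's "run ansible" step would then invoke
    # /home/zuul/ansible-venv/bin/ansible-playbook
```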
05:47 <mordred> and in zuul.yaml you'll just say "I want to run the 'run-ansible' job on my repo", sort of like how in travis you say "I want to run with python2.7"
05:47 <jlk> wandering into the territory of "how does a site decide and expose these different env types"
05:47 <mordred> but for people who just want to run py27, they can just say "I want to run the py27 job"
05:47 <jlk> or, maybe our site just doesn't _do_ that feature.
05:48 <mordred> jlk: yah - we definitely need a plan for that so we don't accidentally and with the best of intentions turn zuul into another non-compatible openstack mess
05:48 <jlk> heh
05:48 <jlk> so far, so good
05:48 <mordred> jlk: I think zuul itself eventually wants some amount of a "standard library" of jobs
05:48 <mordred> and some amount of site-specific jobs
05:48 <mordred> so all zuul users, no matter the operator, should be able to, for instance, depend on a base python27 unittests job
05:49 <mordred> that's pretty much like the travis: python: 2.7 thing
05:49 <jlk> right. We were hoping to be able to survey a bunch of users of travis and others, to see if there are some common things like that which bubble up
05:49 <mordred> but then, you know, sites that aren't openstack probably don't need a pre-canned "run devstack for me" - although that's clearly a centrally defined job openstack needs
05:50 <jlk> well, travis has a py27 env, but it doesn't run anything by default.
05:50 <jlk> you still have to define a script to run
05:51 <mordred> yah - I think we want to have one that does run things - or multiples ... like pabelanger's run-tox thing he's been working on for tox
05:51 <mordred> but then having an "install py27" pre-playbook / base job that people can use is probably also a good idea
05:51 <jlk> yup. our hunch is that the vast majority of those script calls are going to be to tox, or something like that
05:51 <mordred> yup
05:52 <jlk> and that there are similar patterns for other languages
05:52 <jlk> doing it as pre stuff is an interesting balance.
05:52 <mordred> so we let people make it easy to write scripts that run python with different pythons, but also bake in a tox target that'll just run tox so they don't even have to write a script for it
05:52 <mordred> yup
05:52 <jlk> reduces the number of images you have to build every night, at the cost of a longer "spin up" for each test
05:52 <mordred> yes
05:52 <mordred> building the ansible standard library for this is one of the exciting bits, but I think it often gets lost in our discussion of building the zuul framework itself
05:53 <mordred> so when the answer is often "you can just do that in ansible" - that doesn't mean a user _has_ to write ansible, because hopefully we've written enough standard library for normal tasks that they're just referencing pre-written zuul jobs
05:54 <jlk> right. the "in ansible" is just an implementation detail they don't care about
05:54 <mordred> (run-autotools, run-maven, run-cargo, run-cmake, etc)
05:54 <mordred> yup
05:54 <jlk> it's just one more job to list, I suppose
05:54 <mordred> and it's there for them when they want to get advanced
05:54 * SpamapS catching up with backscroll now
05:55 <jlk> SpamapS: we've carefully danced around actually arguing about the spec, and have found happier topics to discuss instead
05:55 * mordred apologizes for being in europe and waking up to do the jet-lag-conversation-bomb
05:55 <mordred> jlk: :)
05:55 <SpamapS> no it looks productive
05:55 <mordred> jlk: to be honest, I think this is actually really good background content
05:55 <SpamapS> and about the spec :)
05:55 <mordred> yah - what SpamapS said
05:55 <jlk> I'm doing a fine job of avoiding working on my book.
05:55 <mordred> ++
05:56 <jlk> the chapter I'm working on right now is ansible + containers, so...
05:56 <mordred> \o/
05:56 <SpamapS> So I feel like we could put ansible into a chroot in a temp dir venv, run it, and not have it disturb anything on the node. Crazy?
05:56 <SpamapS> jlk: THIS IS working on that ;)
05:57 <mordred> fwiw, if the problem with bubblewrap is that it needs zuul-launcher to have root, that's also the problem with rkt - but I think just granting zuul-launcher the ability to run rkt as root is better than not wrapping, or wrapping incompletely
05:57 <SpamapS> and we wouldn't need any special execution code in zuul itself. We'd have local ansible that we control from a config repo run remote ansible playbooks that come from the in-repo def.
05:58 <SpamapS> mordred: still feels like we're reinventing isolation instead of just using what we already have.
05:59 <rbergeron> jlk: oh, ansible + containers isn't a rabbit hole or anything
05:59 <jlk> I just had a bad vision of porting oslo runroot over to zuul.
05:59 * rbergeron sighs
05:59 <jlk> rbergeron: I'm just doing a surface scratch, Packt wanted another chapter.
05:59 <SpamapS> btw I really wish we had like, a rack of ribs, glasses of whiskey, and a patio view of the sea, to get this discussion done. :)
05:59 <mordred> (re-pasting from earlier) 05:29 <mordred> SpamapS, jlk: yes, amongst the reasons we didn't execute the ansible on the remote node is that then we'd have to install things on the remote node - and we'd have to re-implement the remote execution code that ansible already has in order to execute things remotely - so we'd end up having unclean target nodes again and a more brittle execution pipeline
06:00 <jlk> I also think this discussion is funny, because decisions here may drastically impact the talk I pitched for mordred and I to do at AnsibleFest in London.
06:00 <SpamapS> Yeah so I'm suggesting that we could probably do that without being brittle or dirtying the node.
06:00 <mordred> otoh - just make zuul run "sudo rkt ansible-playbook"
06:00 <rbergeron> jlk: yup, i think that's about what oreilly wanted for the next thing that lorin and rene are working on. that and "ansible 2.0 plz"
06:01 <jlk> shocker, same.
06:01 <SpamapS> And we could probably do it with less complexity than rkt/bubblewrap/systemd-nspawn/etc.
06:01 <SpamapS> mordred: yeah rkt may be the answer
06:01 <SpamapS> It's a nice tight wrapper around systemd-nspawn and even kvm.
06:02 <jlk> if we gave zuul a very fine line of being able to call rkt to launch unprivileged containers, that's not too dirty
06:02 <jlk> ideally not running all of zuul-executor as root
06:02 <mordred> yah
06:02 <SpamapS> The thing I worry about is that we're also building images.
06:02 <SpamapS> (with ansible in them)
06:02 <SpamapS> and it's just a lot.
06:02 <mordred> we shouldn't be building images with ansible in them
06:03 <mordred> oh - rkt images
06:03 <rbergeron> since i missed half of this: is this mostly related to the containery topic or ... evvvveryting? (not the rkt, but everything else about where the zuulecutioner lives)
06:03 <SpamapS> mordred: right
06:03 <SpamapS> it's a smaller thing
06:03 <mordred> I do not believe you need to build full images for rkt like you do with docker
06:03 <SpamapS> and less surface
06:03 <SpamapS> but.. just feels weird.
06:03 <mordred> I'm pretty sure you can point rkt at a chroot-like dir
06:03 <jlk> you don't, just a directory tree
06:03 <mordred> yah
06:03 <jlk> but you still have to build that tree
06:03 <SpamapS> yes it's still an image
06:03 <SpamapS> a chroot image
06:03 <mordred> and we already build a directory tree anyway
06:03 <jlk> and do an overlay for each run
06:03 <SpamapS> but still an OS
06:03 <SpamapS> a user space anyway
06:04 <SpamapS> This will feel a lot less awkward once I'm done with Ansibust
06:04 <mordred> we don't need a full user space - we need ansible-playbook
06:04 <jlk> rbergeron: it's around "how do we run user supplied code on a zuul control node without allowing the user to own the control node"
06:04 <SpamapS> mordred: which needs rsync, python, in the midnight hour, it screams moar moar moar
06:04 <jlk> rbergeron: to satisfy a zuul v3 feature of "directly run user provided playbook content"
06:04 <SpamapS> mordred: it's not a "let's not do this"
06:05 <mordred> SpamapS: yah - but it doesn't necessarily need to be like a full ubuntu image
06:05 <SpamapS> it's "what if we didn't have to do this?"
06:05 <mordred> totes
06:05 <mordred> I'm just saying - it would be neat to figure out the smallest thing that looks more like an app container that we need to run ansible-playbook
06:06 <mordred> I'd also like to keep 2 things separate for now until they need to not be separate
06:06 <mordred> that's a) allow zuul to test 'ursula' and b) protect the ansible execution on the launchers
06:06 <mordred> it MIGHT be that solving b solves a
06:06 <rbergeron> jlk: without allowing the user to own the control node == you're theoretically enabling the user to pwn the control node through some malicious playbook?
06:07 <jlk> rbergeron: correct. Ansible is pretty powerful; if you give a user the ability to provide their own playbook content, there are numerous ways they can interact with the host that's running ansible-playbook, in nefarious ways
06:07 <rbergeron> i guess that's not an == but something else, i am tired...
06:07 <mordred> but it's not necessarily true - and I think that jlk brought up many good points, such as needing specific ansible versions, that make me still think a) is a great job for a really nice complex ansible role we write ourselves
06:08 <mordred> for b, keeping the rkt image really small should be easy since we still wouldn't be allowing host=localhost - the number of binaries that are actually needed should be super minimal
06:09 <mordred> we _already_ copy the ansible modules library on zuul restarts - so we're already managing a copy of 75% of what would go into the temp dir we point rkt at :)
06:13 <rbergeron> jlk: not to open the floor to obvious comments, but... i wonder how much of that is handled by tower. or if not at all, aside from "assuming there are multiple layers of permissions / eyeballs on things" which... doesn't really fix it if you're like, super sekurity oriented
06:13 <SpamapS> mordred: yeah I'm pretty sure it will be a small chroot
06:13 <rbergeron> and i have to think they've gotten hit with that question before
06:13 <mordred> rbergeron: not at all
06:13 <jlk> rbergeron: it's a problem that exists in tower as well
06:13 <jlk> rbergeron: without careful human review of what goes into tower, one can pretty easily own the tower box.
06:13 <SpamapS> And I was kind of hoping these app sandboxers would be more aligned with what we want.
06:14 <mordred> yah - there are people in ansible-land, including bcoca, with a desire to have a restricted-ansible ... but doing it _right_ is hard because ansible was never designed for it
06:14 <SpamapS> they all have some o_O piece
06:14 <jlk> or at least gain access to whatever items are on the box that can be read by whatever user is used to execute ansible on tower.
06:14 <mordred> SpamapS: ++
06:14 <rbergeron> jlk: ssh, don't tell the rhel folks
06:14 <jlk> I think they know
06:14 <SpamapS> Somebody said it earlier.. Unix gives you a bunch of tools to get anything done. But if you actually use all of them.. it looks like a complicated mess.
06:14 <mordred> :)
06:15 <mordred> ok - I gotta run ... this is a super great conversation though and I look forward to having more of it
06:15 <SpamapS> I kind of want a nice simple facade in front of chroot+cgroup+namespaces
06:15 <SpamapS> rkt is probably the closest to simple
06:15 <mordred> SpamapS: ++
06:15 <SpamapS> systemd-machined/systemd-nspawn might also count
06:15 <SpamapS> except systemd makes me want to throw stuff
06:16 <SpamapS> (but rkt is just a go frontend for that anyway)
06:16 <mordred> yah - but then you'll make me actually type that word in a non-joking manner
06:16 <rbergeron> spamaps: that's 2 out of three
06:16 <rbergeron> i should have a bot
06:16 <mordred> I'd rather say "we use rkt" than "we directly use a spawn of the devil"
06:16 <mordred> rbergeron: ++
06:16 <mordred> because I'd NEVER be able to get through a conference talk on the subject without grumbling
06:17 <mordred> but I can choose to ignore rkt's implementation details ...
06:17 <SpamapS> exactly
06:17 <SpamapS> same
06:17 * SpamapS did write a chapter of the upstart cookbook after all
06:17 <SpamapS> still a little bitter about that ;)
06:18 <mordred> SpamapS: so say we all
06:18 <jlk> I helped port Fedora from sysv to upstart to systemd. I'm done with init systems.
06:19 <rbergeron> man, y'all don't know bitter about that topic like i do :)
06:19 <SpamapS> rbergeron: yeah, you probably had to actually talk to lennart. ;)
06:19 <SpamapS> (even though Lennart was just the angry shouting monkey that distracted us while Kay stole pid 1 from our pockets) ;)
06:20 <jlk> I spent enough time talking to Lennart.
06:20 <jlk> and Hoyer for the stuff around it
06:21 <jlk> and ported s390x boot-up stuff to Dracut. That was totes fun
06:22 <jlk> welp, I've reached full-on WTF for the night, so I'm done.
06:22 <rbergeron> I feel like if y'all poked at vbatts' or dwalsh's brains a bit they might have ideas. maybe even rackerhacker
06:23 <rbergeron> since he's off in sekurity land pretty often
06:23 <jlk> right, SELinux is likely part of this strategy
06:25 <mordred> rbergeron: I've talked to vbatts a little bit
06:25 <mordred> rbergeron: the problem is - as awesome as he is, the amount of the problemspace that one needs to page in to be able to be helpful is more than they have time for
06:26 <rbergeron> yeah
06:26 <rbergeron> so many ppl i wish i could clone to further all my causes
06:27 <mordred> rbergeron: but I think vbatts would say "use bubblewrap, or rkt/runc"
06:28 <rbergeron> mordred: wait, are we just talking about the container case or also not-container cases? i thought it was both?
06:28 <rbergeron> mordred: also, i thought you were going to like, do things and stuff :)
06:28 <mordred> rbergeron: yah - I'm working on it :)
06:28 <mordred> rbergeron: can you say "container case or also not container cases" but with different words? I don't understand the question
06:29 <mordred> oh - I think I do now
06:29 <mordred> rbergeron: we're talking about ALL invocations of ansible-playbook on user supplied code
06:29 <mordred> and since we run the ansible-playbook process on the zuul launcher, wrapping those invocations with rkt or some other container tech would help disallow malicious code
06:30 <mordred> there was also some discussion of running ansible-playbook on the remote node - which is a thing we _will_ want to do in some cases, but I'm arguing that we don't want to do it for all of them, and am hoping we can use ansible to do that when we need to
06:31 * mordred may try to write a small POC of using ansible to do that tonight
06:31 <rbergeron> mordred: yeah, i think we want to avoid that becoming the de facto standard in zuul. ppl do it with ansible, but it's usually like, "because complicated stuff"
06:31 <jlk> command: ansible-playbook ......
06:32 <SpamapS> yeah I'm not so concerned about that
06:32 <SpamapS> I'm more concerned with isolating ansible-playbook on the remote node from the rest of the node
06:32 <SpamapS> which may be only slightly simpler than rkt/systemd-nspawning on the executor
06:32 <mordred> rbergeron: I don't want humans to write ansible to run ansible when it's needed - I want us to do it one time and have it be a pre-canned role people can refer to, similar to run-tox
06:33 <rbergeron> esp since a lot of this is just "ansible is easy and makes it easy for you to do stuff" that i get to say over and over and then being like "oh but actually maybe not bring all your best practices over because aieeeeeeee"
06:33 <mordred> SpamapS: I'm _very_ concerned with the theory of doing the paramiko calls ourselves and not using ansible for that - paramiko likes to break and do crazy things and it's really hard to get it all right
06:34 <rbergeron> mordred: yeah, i'm still wrapping my brain around... mostly words with that, i poked at jim about that today in the meeting
06:34 <SpamapS> mordred: that is never an option IMO
06:34 <mordred> rbergeron: right - so there will be a better story for this that doesn't expose all of these guts - most of the ugly we're talking about here is impl details that most users should never need to know
06:34 <mordred> SpamapS: yah. IMO too
06:35 <mordred> ok. srsly ... must AFK
06:35 <SpamapS> just ansible -m ansible-playbook-runner node1
06:35 <SpamapS> go AFK
06:35 <SpamapS> nao
06:35 <rbergeron> i think there are reasons we went from paramiko to ssh as default :)
06:36 <jlk> somebody showed how much faster it was... :)
06:36 * jlk afk
06:37 <rbergeron> ;)
06:59 <openstackgerrit> Joshua Hesketh proposed openstack-infra/nodepool feature/zuulv3: Merge branch 'master' into feature/zuulv3  https://review.openstack.org/445325
07:10 <rbergeron> jlk, mordred, in your glorious awayness: i guess that thing called tower has started using bubblewrap for things, but unclear to what extent (and yes, obvious comments here, lol)
07:10 <rbergeron> unclear to.. my immediate eyeballs / brainpower anyway
*** isaacb has joined #zuul08:01
*** hashar has joined #zuul08:13
*** yolanda has quit IRC08:37
*** yolanda has joined #zuul08:42
*** Cibo_ has joined #zuul08:55
*** Cibo_ has quit IRC09:14
*** bhavik1 has joined #zuul09:25
*** lennyb has quit IRC10:06
*** lennyb has joined #zuul10:11
*** isaacb has quit IRC11:00
*** bhavik1 has quit IRC11:13
*** isaacb has joined #zuul11:48
*** hashar is now known as hasharLunch12:12
Shrewssuch scrollback12:36
Shrewsat such wee hours12:37
Shrewsyou folks are weird12:37
*** isaacb has quit IRC12:43
*** isaacb has joined #zuul12:44
*** madgoat has joined #zuul13:30
*** madgoat has quit IRC13:31
rbergeronlol14:12
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add request-list nodepool command  https://review.openstack.org/44516914:21
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Remove AllocatorTestCase and RoundRobinTestCase  https://review.openstack.org/44517514:21
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix min-ready/max-servers in test configs  https://review.openstack.org/44551214:21
Shrewsjeblair: pabelanger: rebased those two reviews on new review 445512, which fixes an odd edge case in our tests. We hit that today in 445169 tests.14:22
pabelangerlooking14:23
pabelangerShrews: Hmm, so if I understand right, you are saying min-ready: 2, max-servers: 1 is an invalid config?14:26
Shrewspabelanger: not an invalid config14:27
Shrewspabelanger: it's invalid for our tests (because we can't have them hang)14:27
Shrewstotally fine for production because you will have something freeing up nodes eventually14:27
pabelangerHmm, let me check something.14:28
Shrewspabelanger: btw, this is not the problem with nl01. that problem is that we have lost requests that we don't process. still working up a fix for that14:29
pabelangerokay, not the issue I was thinking of14:31
pabelangerShrews: so, just to confirm, if we had 2 providers, like node_vhd_and_qcow2.yaml, each with max-servers: 1 and label of min-ready: 2. We'd properly launch 1 node in each provider?14:33
Shrewspabelanger: not necessarily. we don't know which provider will attempt to satisfy the min-ready requests.14:34
pabelangerright, but if provider A was at max-servers, doesn't nodepool-launcher move to the next provider14:34
Shrewsso if provider A ends up trying to handle both min-ready requests (1 per node), then it will pause because it can only have 1 server ready14:34
pabelangerOh14:35
pabelangerso, it won't release the request back to other launchers?14:35
Shrewsnope. that's not the algorithm. the algorithm says to "pause if we are at quota".14:35
Shrewspabelanger: if we were to do that, and there were no other launchers, the request would fail. And we don't really want that. This is a very weird case for the min-ready nodes.14:36
pabelangerYa, a little different from how we do it today, but that is okay.14:37
Shrewswe can probably put our heads together and come up with a more elegant solution for that, but i'm not concentrating on that right now. gotta fix the lost requests thing.14:51
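The "pause if we are at quota" behavior Shrews describes above can be sketched roughly as follows; the function and names are hypothetical for illustration, not actual nodepool code:

```python
# Illustrative sketch of the launcher decision discussed above: a
# provider handler that is at quota pauses rather than declining the
# request, since declining would fail the request outright when no
# other launcher exists. Names are hypothetical, not nodepool's API.

def handle_node_request(active_servers, max_servers):
    """Return what the provider handler does with the next request."""
    if active_servers >= max_servers:
        return "pause"   # wait for a node to be freed; do not decline
    return "launch"
```

So with max-servers: 1, a handler that already launched one min-ready node pauses on the second min-ready request instead of handing it back, which is why the test configs needed fixing.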
jeblairjlk, SpamapS, rbergeron: i think you mostly covered this last night, but here's background on why we're executing ansible on the launcher^Wexecutor: https://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#execution14:52
jeblairjlk: and no doc bugs at the moment.  we've completely stopped updating docs in zuulv3 and will update/rewrite them as a separate step.  until then, the spec and my PTG email are the docs.14:54
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create run-cover role  https://review.openstack.org/44133215:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create zuul_workspace_root job variable  https://review.openstack.org/44144115:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Organize playbooks folder  https://review.openstack.org/44154715:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Rename prepare-workspace role to bootstrap  https://review.openstack.org/44144015:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add run-docs role and tox-docs job  https://review.openstack.org/44134515:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info to bootstrap role  https://review.openstack.org/44161715:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add revoke-sudo role and update tox jobs  https://review.openstack.org/44146715:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create tox-tarball job  https://review.openstack.org/44160915:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create run-cover role  https://review.openstack.org/44133215:16
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create zuul_workspace_root job variable  https://review.openstack.org/44144115:16
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Organize playbooks folder  https://review.openstack.org/44154715:16
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Rename prepare-workspace role to bootstrap  https://review.openstack.org/44144015:16
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add run-docs role and tox-docs job  https://review.openstack.org/44134515:17
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info to bootstrap role  https://review.openstack.org/44161715:17
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add revoke-sudo role and update tox jobs  https://review.openstack.org/44146715:17
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create tox-tarball job  https://review.openstack.org/44160915:17
*** hasharLunch is now known as hashar15:24
*** jesusaur has quit IRC15:38
*** jesusaur has joined #zuul15:42
pabelangerI feel like hacking at a coffeeshop today15:49
*** isaacb has quit IRC15:54
*** hashar has quit IRC16:06
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add request-list nodepool command  https://review.openstack.org/44516916:10
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Remove AllocatorTestCase and RoundRobinTestCase  https://review.openstack.org/44517516:10
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix min-ready/max-servers in test configs  https://review.openstack.org/44551216:10
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix race on node state check in node cleanup  https://review.openstack.org/44555716:10
Shrewsrebased stack on a race fix ^^^16:10
* Shrews hopes to stop finding bugs so that he can fix the one bug that was supposed to be the focus of his day16:15
rbergeronshrews: that's not how software works, dontcha know :)16:20
*** adamw has quit IRC16:22
*** adamw has joined #zuul16:23
* Shrews gives up for a while to release stress at the gym. bbl16:23
pabelangerI've arrived at a coffeeshop16:27
*** bhavik1 has joined #zuul16:31
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Remove ready-script support  https://review.openstack.org/44556716:36
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Stop writing nodepool bash variable on nodes  https://review.openstack.org/44557216:42
SpamapSjeblair: thanks. I am still feeling like as long as we're taking such great pains to isolate tests, we could use some or most of that to isolate untrusted playbooks.16:43
SpamapSjeblair: another way to put it is.. if we're going to teach nodepool to kubernetes.. maybe we should just do that instead and run untrusted playbooks on kubernetes things? I dunno.16:43
SpamapSThat may be over-reductive.16:44
jeblairSpamapS: take a look at the 2 comments i just left on your spec16:50
*** hashar has joined #zuul16:51
SpamapSjeblair: kk16:55
SpamapSjeblair: So I agree it's confusing and I hope we can get through this exercise with something not confusing either way. Ansibling on the node *is* a bit weird, however, by setting up the node in inventory by its name/groups, and just setting ansible_connection=localhost the only confusing part is that there's no SSH in -vvvv right? local is just "local to ansible-playbook's execution context"17:00
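In inventory terms, what SpamapS sketches might look like the fragment below (illustrative only; note the Ansible connection plugin is actually named `local`, set via ansible_connection=local):

```ini
# Illustrative inventory fragment: the node keeps its usual name and
# groups, but ansible_connection=local runs its tasks in
# ansible-playbook's own execution context instead of over SSH.
[test-nodes]
ubuntu-xenial-node ansible_connection=local
```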
SpamapSI think the bigger obstacle is getting a working ansible on the node without affecting it too much.17:01
jeblairright.  it was a design goal to avoid that.17:01
SpamapSSo what I think I'm driving at is whether we can use the node abstraction, rather than the nodes from nodepool itself.17:05
jeblaircan you say that in different words?17:05
SpamapSYeah I'm working on it. I want to use code if I can.17:05
jeblairi'd really rather just understand the sentence you just said17:06
SpamapSLike basically I wonder if we can make something like 'untrusted-node: k8s-ansible' or if you have plenty of capacity 'untrusted-node: ubuntu-xenial' .. makes more sense?17:07
SpamapSNot the same as the test execution node.17:07
SpamapSSo you'd have your test execution nodes for your job, but then an untrusted node as well. But use the same abstraction.17:08
SpamapSand instead of crippled ansible in untrusted context, you'd get a playbook that runs ansible on whatever untrusted-node is defined.17:09
jeblairwhy would it be useful to have a choice there?17:09
SpamapSBecause we can have one that is entirely doable without operator pain, but maybe isn't suitable for all end users.17:10
SpamapSAnd we can take advantage of nodepool when users think that's suitable.17:10
jeblairubuntu-xenial is doable without operator pain?17:10
SpamapSI think it is, if users are already deploying zuul against a large cloud with lots of tests (so, infra-esque users).17:11
jeblairopenstack-infra has free donated cloud resources on a huge scale, and i can't see it embracing the idea that we need 2 test nodes in order to run a pep8 job.17:11
jeblair(even if one of them is "only" a 1G vm)17:12
SpamapSI'd say even smaller would be required to make this sensible.17:12
SpamapS(and address quotas likely become the limiting factor here)17:12
jeblairright; those are not choices we have17:12
SpamapSAlso one optimization would be that untrusted nodes could be shared by a trigger?17:13
SpamapSDunno17:13
jeblairSpamapS: i guess i misunderstood where you were at the end of your marathon conversation with mordred last night17:14
SpamapSI'm working it out in my head, but the point is really that I'm pretty sure one size won't fit all.17:14
SpamapSjeblair: I slept since then. :-P17:14
SpamapSlots of ideas flowing out of my head17:15
jeblairSpamapS: okay, so how can we have the conversation you need in a productive way?17:15
jeblairbecause i feel like we're going in circles a bit here17:15
SpamapSjeblair: maybe I should state the problems I have with the other method.17:15
SpamapSRather than trying to sell something that is also very complex.17:15
SpamapSSo isolating on the executor SHOULD be simple. But it's feeling pretty heavy from a development standpoint.17:16
SpamapSNow, rkt may be simpler than I thought, and it may be a good option. I'm still processing its documentation. IMO, rkt is much bigger and more ambitious than what we need.17:17
jeblairi guess that's the part i don't understand -- i'm still under the impression that "bubblewrap ansible-playbook" or "rkt ansible-playbook" or similar things are viable candidates.17:18
SpamapSbubblewrap requires you to build a chroot to run things in.17:18
jeblairwe're like 90% of the way there, right?17:18
SpamapSand doesn't do anything with LSM/MAC's.. so you're still vulnerable to namespace breakouts.17:18
SpamapSrkt is more or less the same, and mostly acts as a frontend for systemd-nspawn and systemd-machined interactions.17:20
SpamapSjeblair: my point is really that the complexity of bubblewrap and rkt aren't gaining us that much, other than another layer to peel.17:21
SpamapSI suppose with that in mind, the simplest one should win. :-P17:22
*** hashar has quit IRC17:23
pabelangermy container foo is not good, but we have a JobDir today, created by zuul. What would be the simplest tool today to make that a container, and just bindmount the ansible bits?17:26
pabelangerwithout diving into image builds atm17:26
clarkbpabelanger: you'd need a fully contained python and ansible install too17:27
clarkb(which is doable with virtualenv iirc)17:27
pabelangerclarkb: right, say we just use virtualenv, inside the chroot17:27
clarkbyou'd have to do the full copy thing and also likely carry along things like libc?17:27
jeblairwe already copy most of ansible into a zuul staging area when we start.  that plus the jobdir is why i say we're close.17:27
pabelangerI mean, we can quickly build something in DIB. I've done that before17:27
pabelangerjeblair: that's what I am thinking too.17:28
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Rename zuul-launcher to zuul-executor  https://review.openstack.org/44559417:29
SpamapSvirtualenv is not chrootable IIRC17:30
SpamapSonce you chroot in, the paths change17:31
SpamapSand thus, it freaks out17:31
*** Cibo_ has joined #zuul17:31
clarkbSpamapS: it has a copy everything option which should avoid that problem17:31
clarkbSpamapS: by default that is true because it symlinks and does funny relative to system python things17:31
SpamapSclarkb: I've always been told virtualenvs are not relocatable.17:32
* SpamapS testing17:33
clarkbSpamapS: looks like it copies all the python things but not your system deps17:33
clarkbso not complete solution17:33
clarkband for relocatable I think the --relocatable flag would work in this case17:34
clarkb(if you use --always-copy too)17:34
SpamapSoh see they're always making new things17:34
clarkbthe general case is not relocatable for a variety of reasons. symlinks not being relative to things that move, versions of python being different etc17:34
SpamapSyou have to run --relocatable after creating17:35
SpamapSinteresting17:35
* SpamapS is lurning17:35
clarkbbut yes general case is not relocatable17:35
SpamapSI can't seem to chroot into it anyway17:37
SpamapSbecause it's not a statically compiled python maybe17:38
SpamapSthough it's saying not found17:38
clarkbya you'd need all the system deps too so not complete solution17:39
clarkbwhich might make a normal python install in a simple image better/simpler17:40
SpamapSyeah, it's got to be a real user-spaced chroot17:42
SpamapSvenv isn't even close17:42
SpamapSbash-4.3# python17:42
SpamapSCould not find platform independent libraries <prefix>17:43
SpamapSeven once I got the system libs in there17:43
SpamapSMuch simpler to just diskimage-builder up a tarball with ansible installed17:43
jeblairthat can probably be done fairly quickly at executor start, yeah?17:43
SpamapSYep17:44
SpamapSgah.. forgot how long it takes the first time you do an ubuntu-minimal dib17:59
jlkso yes18:00
jlkI figured at executor start, and by operator trigger, you could cause a new base image to be created18:00
jlkthe base image is what you start with, and you would overlay the dir that zuul is prepping with all the source18:00
jlkbut here is my concern as an operator18:00
jlkI feel very uncomfortable having end user provided code executing on the resources I have to secure18:00
jlkwe already have an isolation and scale system in place, the nodepool cloud18:01
jlkthose may be resources I don't have full control over18:01
jlkand they're totally ephemeral18:01
SpamapSjlk: to jeblair's point from before.. to run python unit tests, you either need two vms, or one polluted vm.18:01
jlkbut if this is executing _on_ the executor, then I have to scale the more costly resource I control, _and_ deal with clean ups and break outs and such18:02
jlkSpamapS: a polluted VM, yes18:02
jlkyou get the VM node, + enough to run ansible locally18:02
SpamapSOr, you deploy a VM, and you make two containers in it... ansible executor, and unit test executor.18:02
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP Add per-repo public and private keys  https://review.openstack.org/40638218:02
jlkand then if you're doing pep8, you're doing it locally18:02
SpamapSBut that's the easy case. devstack-gate ... this gets complicated.18:02
jlkright, I need to read the link jeblair sent this morning, before getting too far into this again18:03
clarkbSpamapS: re d-g its not that complicated we already do that thing today18:03
SpamapSjlk: it's pretty brief. But basically suggests that it is simpler to just run ansible that we control than it is to try and cleanly inject ansible.18:03
clarkb(there is the extra overhead of teaching all the hosts how to talk to each other properly)18:03
jlkright, that bit, teaching the hosts to talk, that's hard.18:04
SpamapSclarkb: one might expect in a multi-node test that the point is for the nodes to talk to each other?18:04
jlkand we hit that now in 2.5 with our multi-node setup18:04
clarkbSpamapS: yes it is, but there is overhead to set up the communication from a job control perspective18:04
jlkdevstack-gate does feel pretty special case though.18:04
clarkbwe do it today. This is exactly how d-g runs if you run it today18:04
SpamapS-rw-r--r-- 1 clint clint 712M Mar 14 11:04 ansible-chroot.tar18:05
*** mgagne_ is now known as mgagne18:05
jlkSpamapS: compare that to the size of the docker image that Ansible produces to run Ansible in18:05
clarkbsetuppy things make sure ssh keys are in place, then we run ansible on the "polluted" node to run the job18:05
SpamapSThat's the result of running 'disk-image-create -t tar -o ansible-chroot ubuntu-minimal -p ansible'18:05
jlkoh they don't any more :/18:05
SpamapSnot totally shocking really18:06
jlkAnother issue I have18:06
clarkbSpamapS: how old is your dib? there was a bug in old dib that made ubuntu minimal really not minimal18:06
clarkbSpamapS: I think you can likely shave off at least 250MB just by updating dib if you are on an older version18:06
pabelangerSpamapS: I have something smaller, I managed to get a tarball down to about 50MB over christmas18:06
jlkdoing all the multi-node things from executor host(s) means an increased amount of forks/ssh threads going out from those18:06
clarkbbut ya not small18:06
jlkwhich costs memory18:06
jlkinstead of pushing that work down to the cloud resource18:06
* pabelanger restores ubuntu-rootfs elements18:06
SpamapSclarkb: git pulled just before18:07
SpamapSoh, why do I have a kernel?18:08
jlkDiB may be dumb18:08
SpamapSyeah18:08
jlkand not know how to do things without a kernel18:08
SpamapSthat's 209MB18:08
jlkFedora did all this with a thing called "mock"18:08
SpamapSmore actually18:08
jlkwhich was just a wrapper around yum install --over-there/18:08
SpamapS360MB18:08
pabelangerubuntu-minimal is very opinionated for a VM18:09
SpamapSwow why is firmware so big??18:09
jlkdnf has this too, where you can just dnf --installroot /over/there <my-package-set>18:09
jlknothing "depends" on kernel, so you can avoid a whole bunch of code18:09
*** bhavik1 has quit IRC18:10
SpamapSright, debootstrap does this too, dib just forces a bunch of VM stuff in18:11
jlkoh I see, it's a wrapper around a wrapper18:11
SpamapSso it's 446M without kernel stuff18:11
jlkare... are you in a wrap battle?18:11
jlk(I'll see myself out)18:12
SpamapSSpamapS The Wrapper18:12
clarkbit has to do kernel things because debootstrap is silly with kernels iirc18:12
SpamapSjlk: Dad Joke Level 7 achieved.18:12
SpamapSclarkb: nope18:12
SpamapSit adds the kernel18:12
clarkbSpamapS: yes because debootstrap adds the wrong kernel iirc18:12
SpamapSdebootstrap is very smart about just installing essentials18:12
jlkif by "silly" you mean "it doesn't include the kernel", then yes18:12
pabelangerwe also don't clean out apt-cache in ubuntu-minimal, so there will be cached dpkgs in the tarball18:13
SpamapSkernel is not an essential18:13
jlkso here's the thing.18:13
clarkbI was pretty sure debootstrap was installing a kernel but not the current kernel18:13
jlksize of the image isn't that big of a deal18:13
clarkbso dib swings around and updates it to be current and cleans out the old one18:13
jlkbecause you're going to maintain one, maybe 2 per executor, locally18:13
clarkbsomething like that18:13
jlknot shlepping them around18:13
jlkand start up is super fast18:13
jlkand overlay is low overhead18:13
SpamapSyeah sorry I got distracted by the size18:14
SpamapSpabelanger: good point, 56MB more18:14
pabelangerjust restored: https://review.openstack.org/#/q/status:open+topic:debootstrap-minimal (untested). That was the path I started down to create a ubuntu-rootfs element, for minimal container things18:14
pabelangerwill have to look at my notes again, but managed to get things pretty small18:14
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP Add per-repo public and private keys  https://review.openstack.org/40638218:14
pabelangereven a minimal ansible + python tarball18:14
jlkI like this work anyway18:14
SpamapSoh, ansible doesn't like being in a chroot18:15
SpamapShttp://paste.openstack.org/show/602732/18:15
jlkbecause even if we weren't running untrusted code on executor, I still like the layers of running the ansible from inside a containment zøne18:15
SpamapS(that looks like a devpts issue or something)18:15
jlkwhoh, not sure how I made that ø18:15
SpamapScompose key fun18:16
jlkanyway, the idea of running the ansible things in ephemeral mostly immutable containments sits well with me, regardless of having end user code or not in there.18:16
SpamapSfeels weird to be talking about tools to build containers. Isn't that like, what the whole world is doing right now? How do rkt users build containers?18:17
clarkboh also I think apt installs recommends by default? another area you can probably trim the tree size18:18
SpamapSclarkb: I believe we turn that off, let me check18:18
jlkhttps://github.com/containers/build18:19
SpamapSYeah, dib turns off recommends18:19
jlkrkt can use docker images, so the same tooling there18:19
SpamapSjlk: so, ansible-container.........18:19
jlkno18:19
jlkit's very tech-preview18:19
pabelangercontainer builds should be no different than image builds IMO18:20
SpamapSbummer18:20
jlkand it's built up around docker-compose18:20
pabelangerright18:20
SpamapScause, that would be very congruent18:20
jlkit would, but it's WEIRD18:20
pabelangerI'd rather not have to depend on docker for build things.18:20
SpamapSpabelanger: except without kernels and boot loaders and partitions and...18:20
jlkit builds a container, to run ansible in, to talk to a second container18:20
jeblairSpamapS: apparently mounting a tmpfs at /dev/shm should solve the error 38 problem: http://stackoverflow.com/questions/6033599/oserror-38-errno-38-with-multiprocessing18:20
jlkto deploy software into18:20
pabelangerSpamapS: right, we just need a new element, which diskimage-builder doesn't include today18:20
SpamapSjeblair: yeah I figured that's just my naive chrooting .. thanks for looking that up18:20
pabelangerI raised a topic on ML about this a few months ago18:21
SpamapSHopefully ansible playbook can survive without proc or sys mounted.18:21
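On jeblair's /dev/shm pointer: the "error 38" is errno 38 (ENOSYS) raised when multiprocessing's POSIX semaphores have no shared-memory backing, which is what happens in a chroot with no tmpfs at /dev/shm. A quick probe one could run inside a candidate chroot (illustrative; the function name is ours, not Ansible's):

```python
import errno
import multiprocessing


def posix_semaphores_work():
    """Probe whether multiprocessing's POSIX semaphores are usable.

    ansible-playbook forks workers via multiprocessing, whose
    semaphores are backed by shared memory (/dev/shm on Linux); a
    chroot without a tmpfs mounted there fails with errno 38 (ENOSYS).
    """
    try:
        multiprocessing.Semaphore(1)
        return True
    except OSError as e:
        if e.errno == errno.ENOSYS:
            return False
        raise
```

On a normal host this returns True; inside the bare chroot SpamapS built it would return False until a tmpfs is mounted at /dev/shm.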
pabelangersome interest but wanted it to be a larger refactor18:21
clarkbalso if you use tarball output you already forgo boot loaders and partitions iirc18:21
pabelangersimple-playbook is also a thing: https://review.openstack.org/#/c/385608/ which allows you to run ansible outside the chroot, to populate things inside the chroot :)18:22
jlkyeah, there's a chroot connection method I thought18:22
jlkso you can do ansible to build the chroot and wrap it up18:23
jlkor there's the docker connection method18:23
jlkbut that's kind of awkward too18:23
SpamapSIs the fact that I have to re-read the elements I wrote 3 years ago a sign that I'm getting old, or lazy?18:23
*** Cibo_ has quit IRC18:24
jlknah, it's that we have small scratch space in our brain for retaining details18:25
jlkthe rest gets paged out18:25
SpamapSyeah, lately, aggressively :)18:25
jlkSpamapS: so a more serious answer to your question about building containers18:26
jlkjust about everybody in the world builds container images by starting with a base image.18:27
jlkand then layering the changes on top of that18:27
jlkso they start with something from docker hub or what have you18:27
jlkI think it's far more rare to be building images completely from scratch18:27
pabelangerwhich is great, until you lose a layer some place. eg: image gets deleted18:29
SpamapSjlk: makes sense18:29
clarkbit also means you are beholden to the choices of the layers below you18:29
clarkb(which has been a problem for us in the "real world")18:30
SpamapSRight, that's the rub with using Ubuntu's images18:30
SpamapSyou can't really build the exact same image the way they do18:30
SpamapSwhich is really why ubuntu-minimal exists18:30
SpamapSpabelanger: did you actually make ubuntu-rootfs or something like it?18:31
SpamapSor just proposed?18:31
SpamapSbecause that's mostly all that's needed I think18:31
pabelangeryes, I've had something for a while.18:31
pabelangerbut haven't gotten it merged into diskimage-builder18:31
pabelangerI've been meaning to push it into project-config for now18:32
jlkclarkb: correct, it's very much a concern.18:34
jlkhonestly, I feel that if zuul is going to depend on images at runtime, it should be in the business of building them up from scratch, not doing a FROM <somebody_else's_work> thing.18:35
pabelangerright, nodepool-builder could handle that. Just a matter of getting the tarball to zuul-executor host18:36
jlkdoes it have to be nodepool builder?18:36
jlkthat feels very back/forth to me18:36
clarkbno, but the overlap is significant18:36
pabelangerit is our image build service today18:36
clarkbyou could have zuul just configure a known image location then users use whatever they want18:36
clarkbwe'd likely use nodepool-builder18:37
* jlk waves hands "these are not the microservices you're looking for"18:37
jlkanyway, that's still implementation detail without having solid agreement on overall approach18:38
clarkbjlk: not sure what the concern is? you just need something to put a tarball at $PATH18:38
jlkmoving part complexity I guess, bootstrap dances18:39
jlkit'd feel more simple if executor just built the darn image locally when the service starts.18:39
jlkbut I guess it can just rely on a configured path to find the image18:39
jlkand how that file gets there is a site decision18:39
clarkbya I think thats what it could do via from nodepool import builder ; builder.build_image_at(/path)18:39
clarkbthen if you want to use some other building facility you'd set the config flag to "just look for your image here"18:40
clarkbanyways its all hand wavy at this point18:40
pabelangeronly concern about building images on executor, sudo permissions are needed18:41
jlk... are they?18:41
pabelangerand, increase pressure on HDD for caching things18:41
clarkbjlk: to chroot yes18:41
pabelangeryup18:41
jlkuh18:41
pabelangerI've always wanted to see if fakechroot would work18:41
clarkbI guess we might be able to work around those since non vm image builds are simpler18:41
jlkthat doesn't gel with my memory of building chroots on Fedora18:41
pabelangerbut never tried18:41
clarkbjlk: it has to do with how it chroots and bind mounts things in iirc18:42
jlkwhich was just yum to an alt path18:42
clarkbpossibly changeable for simpler builds18:42
jlkyou need the bind mounts to chroot into the thing, yes, but to build it??18:42
clarkbjlk: the way dib builds are done it bind mounts things into a chroot to do the build18:43
jlkwell that's dumb :)18:43
clarkbjlk: its actually quite useful for many use cases. Just probably not necessary in all18:43
pabelangerdebootstrap needs sudo, IIRC18:44
SpamapSdib's main reason for existing is efficiency, not simplicity of implementation. It does a lot of dumb stuff to make it faster.18:50
jlkoh hrm, doing it as non-root would make permissions of the files installed all wonky, so maybe I'm misremembering things18:50
SpamapSclarkb: do you think importing nodepool's builder like that is a good idea? I haven't thought it through, but that would make me feel a lot better about this if I could just have zuul-executor start as root.. run that to build image.. then drop to non-root.18:55
SpamapSPretty standard daemony behavior.. do some stuff as root then drop all the privileges and capabilities.18:56
clarkbSpamapS: maybe, I also think that the dib 2.0 work is aimed at making that sort of use case work better iwth stock dib18:56
ShrewsCALLING ALL PYTHON EXPERTS: I need to compute the difference of two lists where elements can be repeated (so I cannot use set operations). Can anyone save me some time here?18:56
clarkbso possibly it would be better to just support that work18:56
jlkShrews: feels like a google interview challenge18:57
jlklet me go get a whiteboard and a crippling case of imposter syndrome18:57
SpamapSindeed it does18:57
Shrewsjlk: ssssshhh, they're looking over my shoulder18:57
SpamapSShrews: without set operations. "the difference" could mean different things.18:58
clarkbShrews: collections.Counter might help18:58
jheskethMorning18:58
clarkbdepending on your definition for difference Counter.subtract() may do what you want18:58
ShrewsSpamapS: [1, 1, 1, 2] and [1, 2] ... diff would be [1, 1]18:59
SpamapSThe difference in the actual content as-ordered, or the set of keys, duplicated or not? What I mean is, what do you want when you compare ['a', 'b', 'a'] and [ 'a', 'b', 'b'] ?18:59
Shrewsnot as-ordered18:59
Shrews[1, 2, 1, 1] - [1, 2] is still [1, 1]19:00
SpamapSThere's a name for this19:00
clarkbShrews: then ya I think collections.Counter will work19:00
SpamapSit's the opposite of the union19:00
* SpamapS will probably find out that name is "difference"19:00
clarkband treat negative values as zero19:00
SpamapSyeah counter19:01
Shrewsclarkb: that might work. thx19:01
SpamapSmmmmm freshly shorn yak19:01
jeblairinfra meeting is starting now in #openstack-meeting19:03
Shrewsclarkb: worked brilliantly. thx x 219:07
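The multiset difference clarkb suggested can be sketched with `collections.Counter`; subtracting one Counter from another keeps only positive counts, which matches the [1, 1, 1, 2] − [1, 2] = [1, 1] behavior Shrews described (the function name here is illustrative):

```python
from collections import Counter

def multiset_diff(a, b):
    # Counter subtraction drops zero and negative counts automatically,
    # so repeated elements survive only as many times as the surplus.
    return list((Counter(a) - Counter(b)).elements())

print(multiset_diff([1, 1, 1, 2], [1, 2]))  # [1, 1]
```

Order is not preserved (as Shrews noted, it doesn't need to be): `elements()` yields items grouped by first insertion into the Counter, not by original position.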
jlkjeblair: how would you feel about "running user's playbook directly on executor" being a flaggable feature? So that an implementer could say "nope" to that whole code path?19:13
jeblairjlk: what's the alternative?19:21
jlkah, that's a good question. I guess I hadn't thought that through enough, as all the multi-node stuff we rely on now would be going away.19:21
harlowjamordred SpamapS since i think i want to pull u in, https://lists.cncf.io/pipermail/cncf-ci-public/2017-March/000025.html19:22
jlkis it safe to say that any multi-node testing would require doing the playbook on the executor ?19:22
harlowja`19:22
harlowjaIn summary, we are offering to build (initially) a proof-of-concept implementation of a CNCF-centric CI Workflow Platform to make it easy to compose and run cross-project continuous integration workflows composed out of re-usable and configurable building blocks (which incorporate other CI systems like Jenkins, CircleCI, Travis, Zuul, etc).`19:22
harlowja:(19:22
harlowjawhy do people do this, lol19:22
jlka number of us are reading that right now19:24
harlowjakk19:24
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Allow github trigger to match on branches/refs  https://review.openstack.org/44562519:25
harlowjahttps://lists.cncf.io/pipermail/cncf-toc/2017-March/000699.html is the other place this shows up (sent to both?)19:25
* harlowja would rather have zuul just work here (not build yet-another...)19:33
harlowjabut that may require some reach-out from folks here19:33
harlowjareach-out/education...19:33
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix for unpaused request handlers  https://review.openstack.org/44563219:45
ShrewsALL: The nodepool gate seems to have gained some instability with all of the new code added the last few days, and tests re-enabled. Be aware. Trying to squash what I can.19:46
jeblairjlk: how about we try to continue down the path that we're on now, so that we can get something running based on the current design, and get some experience?  the most viable alternatives i see involve interacting with a container service (eg k8s).  revisiting this after (or as part of) our work to design and add container support to nodepool may be more fruitful.  i think if we can agree that we want ansible to run from the perspective of an ...19:50
jeblair... external party to the worker nodes, we will be able to port to other ways of running ansible without needing to re-work playbooks.  if we run into technical roadblocks, or we become so convinced this is a terrible idea and will never work, let's try to schedule a phone call to go over the constraints and design, because this touches a lot of stuff.19:50
jlkjeblair: yeah, i'm backing away from the cliff.19:51
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix for unpaused request handlers  https://review.openstack.org/44563219:56
*** hashar has joined #zuul20:00
pabelangeralso, https://review.openstack.org/#/c/445594/ is now ready to bikeshed over. Renames zuul-launcher to zuul-executor20:13
pabelangerI still have to update puppet, but holding off until we get some +2's on it20:14
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Better merge message for GitHub pull reqeusts  https://review.openstack.org/44564420:15
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Remove the --no-delete option from nodepool  https://review.openstack.org/44524120:24
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add per-repo public and private keys  https://review.openstack.org/40638220:24
jeblairokay, that's step one of secrets ready for review; i'm going to continue building on that20:24
SpamapSharlowja: so, background on that.. Huawei and I spoke at the Tahoe linux foundation leadership summit thing about a month ago.. and they feel it is complementary to Zuul20:27
SpamapSI haven't read the full thing so I'm not sure it actually is20:27
harlowjau spoke with all of huawei20:27
harlowjalol20:27
harlowjamr.huawei20:27
harlowjabut fair point, they can take it wherever they want (more power to them i guess)20:28
SpamapSharlowja: yes, yes I did.20:29
SpamapSI spoke with a team led by Quanyi Ma, specifically20:29
harlowjacools20:31
harlowjajust made me wonder when i saw `In summary, we are offering to build (initially) a proof-of-concept implementation of a CNCF-centric CI Workflow Platform to make it easy to compose and run cross-project continuous integration workflows composed out of re-usable and configurable building blocks (which incorporate other CI systems like Jenkins, CircleCI, Travis, Zuul, etc). `20:31
harlowjathrow away the 'blah blah' terms from that20:31
harlowjaand it seems like zuul?20:31
pabelangerjeblair: exciting20:37
* pabelanger off to read the spec on secrets again20:37
pabelangerjeblair: left a question / observation on 40638220:45
jeblairpabelanger: excellent point :)20:45
ShrewsWould appreciate some reviews on https://review.openstack.org/445632 and https://review.openstack.org/445557 so we can merge those and stabilize the np gate at least a bit20:45
ShrewsWhen folks have time, of course20:46
pabelangerlooking20:52
pabelanger+320:54
jeblairpabelanger: so we definitely need to add the source dir there.  i'll take a look and see if i can put the public and private keys into the same file there.  we might decide that we want them in two different directories (private/public), but i don't think anything else will access those files directly, so it should be fine.20:54
pabelangerjeblair: cool20:55
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Fix race on node state check in node cleanup  https://review.openstack.org/44555720:58
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Fix min-ready/max-servers in test configs  https://review.openstack.org/44551220:58
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Fix for unpaused request handlers  https://review.openstack.org/44563220:58
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Add request-list nodepool command  https://review.openstack.org/44516921:01
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Remove AllocatorTestCase and RoundRobinTestCase  https://review.openstack.org/44517521:01
Shrewssweet. those top 3 should help. thx21:04
harlowjahmmmm, unsure who on kazoo is around to review stuff anymore21:05
harlowjanone of the other guys seem to be on IRC anymore, lol21:05
harlowjai hope they're just out or on vacation, lol21:06
pabelangerShrews: so, once nl01.o.o updates, we can start it back up?21:07
Shrewspabelanger: no. unfortunately, the plethora of other bugs affecting the gate has pulled me away from the lost requests bug.21:08
pabelangerShrews: Ah. I see21:08
Shrewssorry21:08
pabelangerno problems21:08
pabelangerShrews: do you have an idea what the current issue is?21:08
Shrewspabelanger: i think so. If we kill n-l while it's processing requests, they are left in the PENDING state with nodes allocated for them. We never continue processing them on the restart21:09
Shrewspabelanger: plan is to clean them up on restart and restart the request handling for them from scratch21:10
pabelangerk21:10
Shrewsi just actually have to GET to the point of coding that21:10
openstackgerritJamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Allow loading logging config from yaml  https://review.openstack.org/44565621:15
SpamapSjeblair: ok, just to update you on the spec.. I'm going to move the "run on the node" bit down to alternatives (as in, not what we're doing). I'm also going to start experimenting, in earnest, with bubblewrap and rkt (and add rkt to the list)21:25
jeblairSpamapS: cool, thx21:26
pabelangerSpamapS: https://review.openstack.org/#/q/topic:debootstrap-minimal21:33
pabelangergets a tarball down to 110M21:33
pabelanger-rw-r--r-- 1 pabelanger pabelanger 110M Mar 14 21:32 ansible-chroot.tar21:33
pabelangerthat is missing python and ansible currently21:33
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add per-repo public and private keys  https://review.openstack.org/40638221:55
jeblairpabelanger: that should be to spec :)21:55
pabelangerjeblair: cool, will review shortly21:55
dmsimardI'm going to ask a stupid question that just crossed my mind21:56
dmsimardWhat's stopping Zuul from doing the VM provisioning with Ansible OpenStack modules ?21:56
dmsimardI guess nodepool also does other things like images and pre-reserved instances21:56
dmsimardYeah, nevermind21:57
jeblairdmsimard: we've discussed that more or less as a plan to use linch-pin with a nodepool worker to do that.21:58
jeblairdmsimard: the short version is: using openstack at scale is hard, so we need a native launcher for that.21:59
dmsimardMakes sense21:59
dmsimardIt's also okay to keep things mostly single-purpose, i.e, do one thing and do it well21:59
jeblairdmsimard: but there are folks who would like to use nodepool with other clouds at not-openstack scale, so outsourcing that to ansible modules makes sense.  linch-pin helps organize those kinds of requests, so the sort of napkin-level idea is to have a nodepool launcher that uses linch-pin to talk to anything ansible can talk to.22:00
jeblairdmsimard: if we run into scaling trouble, we can always make a new launcher for that system.  eg, if we end up with a huge aws user.22:00
dmsimardTIL about https://github.com/CentOS-PaaS-SIG/linch-pin22:01
jeblairya that's the one22:01
pabelangernot to side track us or anything, but the plugin interface spec for nodepool would be interesting.  could be fun to try and add gcloud support for nodepool22:02
SpamapSpabelanger: cool.22:04
pabelangerSpamapS: it's about 251M with ansible bits22:04
SpamapSyeah I'm fine with it being 700M too btw.. just was surprised. :)22:06
pabelangerjeblair: sorry for the noob question, but pem files can contain both public / private keys?22:10
dmsimardpem is a bit of a misnomer22:12
pabelangerjeblair: I am trying to understand why we are no longer writing the public_key_file22:12
dmsimardit's meant for x509 certificate stuff iirc22:12
pabelangerand looks like we get the public key from the pem file22:12
pabelangerya, that looks to be true. The googles says so22:18
jeblairit's 'pem encoded' so i called the file .pem (plus that's what's in the spec)22:19
*** hashar has quit IRC22:19
pabelangergreat, thanks22:19
jeblairand yeah, you can get the public key from the private key, so unless we have something else that wants to read that file on its own, i don't think there's too much of a point in writing it out separately at the moment.22:20
pabelangermakes sense22:20
*** hashar has joined #zuul22:20
jeblair(we'll be serving the public key via the webserver, but that's part of zuul and has access to the extracted public key)22:20
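The point jeblair makes — that the public key can be recovered from the PEM-encoded private key, so only one file needs to be written — can be shown with the `cryptography` library. A freshly generated key stands in here for the per-repo key file; this is a sketch of the principle, not zuul's actual key-handling code.

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# Generate a private key and serialize it as PEM (the single file on disk).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pem = private_key.private_bytes(
    serialization.Encoding.PEM,
    serialization.PrivateFormat.PKCS8,
    serialization.NoEncryption(),
)

# Later, the public key is derived from the private key alone -- no
# separate public_key_file is needed.
loaded = serialization.load_pem_private_key(pem, password=None)
public_pem = loaded.public_key().public_bytes(
    serialization.Encoding.PEM,
    serialization.PublicFormat.SubjectPublicKeyInfo,
)
print(public_pem.decode().splitlines()[0])  # -----BEGIN PUBLIC KEY-----
```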
SpamapS5 minutes playing with bubblewrap the right way has me liking it a lot22:50
SpamapSmight be worth running it setuid, which is how the Ubuntu packages install it anyway22:50
openstackgerritJamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Refactor nodepool apps into base app  https://review.openstack.org/44567422:58
openstackgerritJamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Split webapp into its own nodepool application  https://review.openstack.org/44567522:58
openstackgerritJamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Split webapp into its own nodepool application  https://review.openstack.org/44567523:02
openstackgerritJamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Refactor nodepool apps into base app  https://review.openstack.org/44567423:02
*** hashar has quit IRC23:51
SpamapSso bubblewrap is actually pretty great.23:53
SpamapSwhen run as a setuid helper.. it pretty much locks down a chroot completely.23:54
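The kind of bubblewrap invocation being discussed might look like the following. The `bwrap` flags are real options; the bind-mount paths, work directory, and playbook name are illustrative assumptions, and the actual launch is left commented out since it requires bwrap to be installed.

```python
import subprocess

def build_bwrap_cmd(workdir, playbook):
    # Read-only binds for the system, one writable work dir, and fresh
    # namespaces for everything except the network.
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",
        "--ro-bind", "/lib", "/lib",
        "--ro-bind", "/lib64", "/lib64",
        "--ro-bind", "/bin", "/bin",
        "--bind", workdir, "/work",   # the only writable mount
        "--unshare-all",              # new pid/mount/ipc/uts/user namespaces
        "--share-net",                # ...but keep network access
        "--die-with-parent",          # kill the sandbox if the executor dies
        "--chdir", "/work",
        "ansible-playbook", playbook,
    ]

cmd = build_bwrap_cmd("/tmp/work", "playbook.yaml")
# subprocess.run(cmd, check=True)  # would actually launch the sandbox
print(cmd[0])  # bwrap
```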
SpamapSonly thing missing is cgroups.. which I'm less concerned about (so technically an ansible-escaper could use up all of the executor's allocated RAM/CPU)23:55
SpamapSbut we can probably just make a templated systemd unit for that23:55
SpamapSand then each bubblewrapped playbook will be in its own cgroup23:55
clarkbya should be simple to apply those too if needed23:55
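The templated systemd unit idea could be sketched like this — a hypothetical `zuul-bwrap@.service` (the unit name, wrapper script path, and limit values are all illustrative, not from the source) that gives each bubblewrapped playbook run its own cgroup with resource limits:

```ini
# Hypothetical /etc/systemd/system/zuul-bwrap@.service
# Each instance (zuul-bwrap@<build-id>.service) gets its own cgroup.
[Unit]
Description=Bubblewrapped playbook run %i

[Service]
Type=oneshot
# Assumed wrapper that builds and runs the bwrap command for build %i
ExecStart=/usr/local/bin/zuul-bwrap-run %i
# Cgroup resource limits -- the piece bwrap itself doesn't provide
MemoryMax=2G
CPUQuota=100%
```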

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!