Wednesday, 2017-05-03

jeblairjlk: i think maybe half of what that test is doing is still relevant -- testing that if a new repo shows up after zuul starts, it doesn't bork.  however, we certainly can't have a configuration for that repo until after it exists now.  as pabelanger found out earlier today.00:00
jlkseems kind of chicken-egg00:00
jlkyou can't use a repo unless it's defined in a tenant config00:00
jeblairindeed00:00
jlkcan't put a repo in a tenant config until it exists00:01
jeblairyeah, so our repo creation in infra will need to be at least two phases: create in gerrit, then, after existing, add to tenant config.00:02
jeblairbut aside from that, zuul can get events from that repo even if it's not in its config, and it could end up getting used as a foreign project.00:02
jeblairwe should probably drop that test, and add a test for foreign projects.  i think that would cover what we need.00:03
jlkalright00:04
jeblairSpamapS, clarkb: i would rank ssh-agent above generating keys per node -- keys-per-node won't work well for a static host inventory in nodepool00:04
jeblairpabelanger: ^00:04
*** harlowja has quit IRC00:04
jlk        # TODOv3 (jeblair): re-add support for foreign projects if needed   hehe found that.00:05
clarkbjlk: thats a good point re static hosts00:05
jeblair(also, just to make sure this is out there -- the custom ansible plugins should prevent a rogue check job from accessing the private key data.  however, that's just a belt, not belt-and-suspenders, so i think pursuing suspenders here is worthwhile)00:05
clarkber jeblair ^00:05
jeblairbeen thinking about that a lot lately :)00:06
pabelangerSpamapS: I was able to get ssh-agent working inside bubblewrap container on local fedora system using steps listed: https://review.openstack.org/#/c/453851/7/zuul/driver/bubblewrap/__init__.py00:19
pabelangerSpamapS: I am going to dive more into it tomorrow00:19
pabelangerI've been side tracked by flatpak tonight :)00:20
openstackgerritK Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Add github reporter status_url config option  https://review.openstack.org/44979401:27
*** jamielennox is now known as jamielennox|away02:03
*** jamielennox|away is now known as jamielennox02:17
jamielennoxshould it be possible to filter reporters based on source?02:37
jamielennoxthis is mostly a conceptual thing i'm thinking of02:38
jamielennoxi'm hooking v3 up to both gerrit and github simultaneously, eventually it would be nice to do cross-project dependencies and so you need triggers for both sources02:38
jamielennoxhowever i started by just reusing the pipeline in both projects02:39
jamielennoxbut then on action i have both a gerrit and github reporter so when a github patch succeeds/fails it tries to report on gerrit as well02:44
jamielennoxobviously i can (and will for now) create separate pipelines - but is this the intention or should i be able to say only report to github if the source is github?02:45
jamielennoxcurrent zuul.yaml: https://github.com/BonnyCI/project-config-v3/blob/d7830586dbc98b4c3ff10742b3d3862b6448d869/zuul.yaml02:47
jlkjamielennox: that's a good question!03:30
jlkI would think that a report should be tied to the source the job came through03:30
jlkalso, because that's how multiple connections to the same source should work too03:31
jlker multiple connections via the same driver03:31
jamielennoxjlk: so i'm wondering how complex this should be, but i'm at least thinking that on the pipeline you could say:03:32
jamielennoxreporters: {'github': {'source': 'github'}}03:32
jlkwell03:32
jamielennoxso a reporterfilter in the same way you do event filter03:32
jlkwe did work so that the trigger is source specific from the event03:32
jlkI would think that same work should translate somehow to the reporter things03:33
jamielennoxmaybe03:33
jlkI have to go afk, kiddo bed time03:33
jamielennoxi mean an event is always specific to the source, it has to be03:33
jlkI believe the defining bit is the canonical_hostname03:33
jlkevent hostname has to match the trigger source hostname03:33
jamielennoxa reporter could be like please record a log over here that is common to all reporters03:33
jlkoooh, you're thinking of a global reporter and maybe connection specific reporters.03:34
jamielennoxso i would be ok saying like this reporter triggers for these sources, and this reporter triggers for all sources03:34
jlksounds like a larger discussion03:34
jamielennox(just preferably not in every action)03:34
* jlk really afk03:34
jamielennoxyea, i figured i'd put it out now and people can read and i'll chase it tomorrow03:35
jamielennoxi think things like github/gerrit reporters should be source specific by default, smtp and sql global by default, and preferably overridable in config03:38
jamielennox(can't think of a legitimate reason you'd override github/gerrit being non source specific)03:42
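What jamielennox is floating above is only a proposal, but the idea can be made concrete: a pipeline could filter its reporters by comparing the canonical hostname of the reporter's connection against the change's. A minimal illustrative sketch in Python, where reporter.connection and change.project.canonical_hostname are assumed attributes for the sake of the example, not Zuul's actual API:

    def reporters_for(pipeline_actions, change):
        # Hypothetical filter: driver-specific reporters (gerrit, github)
        # only report when the change came from the same connection;
        # connection-less reporters (smtp, sql) always report.
        selected = []
        for reporter in pipeline_actions:
            reporter_host = getattr(reporter.connection, 'canonical_hostname', None)
            if reporter_host is None or reporter_host == change.project.canonical_hostname:
                selected.append(reporter)
        return selected
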
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit  https://review.openstack.org/46198103:57
jamielennoxnot sure ^ is appropriate or not, going to try it out03:57
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit  https://review.openstack.org/46198104:16
*** bhavik1 has joined #zuul05:37
*** yolanda has quit IRC06:20
jamielennoxjeblair: so with the change to canonical hostnames i'm missing a way in the job to determine the directory that the project code was cloned into06:20
jamielennoxfor example the merger used to maintain BonnyCI/sandbox and now it maintains github.com/BonnyCI/sandbox06:21
jamielennoxhowever github.com there is a default, in my jobs etc i only say BonnyCI/sandbox06:22
jamielennoxi used to locate this directory like: "{{ workspace_root }}/src/{{ zuul.project }}" however now there's an additional github.com folder and no zuul variable to tell me what it should be06:24
jamielennoxthere's a couple of ways i could see to fix this and just want to know your opinion - i think zuul.project should be the canonical name but there's a few ways to do that and don't want to go down the wrong rabbit hole06:25
*** hashar has joined #zuul07:28
*** isaacb has joined #zuul08:48
*** Cibo_ has joined #zuul09:01
*** Cibo_ has quit IRC09:15
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use routes for URL handling in webapp  https://review.openstack.org/46133109:30
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use tenant_name to look up project key  https://review.openstack.org/46133209:30
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use routes for URL handling in webapp  https://review.openstack.org/46133110:04
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use tenant_name to look up project key  https://review.openstack.org/46133210:04
*** jkilpatr has quit IRC10:31
mordredjamielennox: that commit message ^^ made me twitch10:37
jamielennoxmordred: which?10:41
mordredjamielennox: "Use tenant_name to look up project..."10:42
mordredjamielennox: just incorrect brain context moment10:42
mordredI was like "No - that's backwards!!! ... oh, wait, wrong project"10:42
jamielennoxoh, right - not allowed to call them tenants for keystone so i'm maintaining the barrier10:42
*** jkilpatr has joined #zuul10:49
*** bhavik1 has quit IRC10:52
*** hashar is now known as hasharLunch11:48
*** isaacb has quit IRC12:01
*** isaacb has joined #zuul12:03
*** hasharLunch is now known as hashar12:19
*** isaacb has quit IRC12:23
pabelangermorning!13:02
hashargood morning13:04
*** dkranz_ has joined #zuul13:20
*** isaacb has joined #zuul13:21
SpamapSpabelanger: so I had a thought about the SSH key in untrusted context conundrum.14:03
SpamapSpabelanger: wondering if we can basically load the key into ssh-agent _inside_ the bwrap, and then rm the key, and kill the agent if ansible-playbook exits.14:06
jeblairSpamapS: let bubblewrap handle the process mgmt?14:07
SpamapShttp://paste.openstack.org/show/608711/14:09
SpamapSjeblair: bwrap is pid 1... and will aggressively remove the entire pid space under it. But I mean in between bwrap and ansible-playbook.14:09
pabelangerThat would mean multiple ssh-agents running on the host, inside bubblewrap, over a single ssh-agent outside. Trying to understand why that is better14:09
jeblairjamielennox: i could see making zuul.project be the canonical project name (hostname + full path).  or we could do a series of variables based on the internal ones like https://etherpad.openstack.org/p/vB1WjfL804  (mordred, clarkb: thoughts?)14:10
SpamapSpabelanger: because that is a copy of the key in RAM that we can kill14:10
SpamapSbut now that I think about it14:10
SpamapSbwrap's going to aggressively kill anything on exit of ansible-playbook.... so an attacker is already going to have to _replace_ and fork under ansible-playbook14:11
SpamapSso this gets us nothing14:11
pabelangernot sure I follow14:11
pabelangerthe last part14:11
SpamapSbwrap forks and execs its arguments, in our case, that is 'ansible-playbook --foo --bar baz'14:12
SpamapSand when that process exits, bwrap being its pid 1, it simply terminates the entire namespace, and itself.14:12
SpamapSso if somebody wants to break out of ansible-playbook, they will have to do it by forking something from ansible-playbook, and keeping ansible-playbook running.14:13
SpamapSsince SSH forks from ansible-playbook and reads its private key every time, I don't know that we can prevent a nefarious process from doing the same.14:14
SpamapS(we could with SELinux or AppArmor, because we could say only ssh is allowed to read the key)14:14
pabelangeris it possible to list a private key from ssh-agent?14:15
SpamapSSo, a single executor ssh-agent buys us something, as it will keep the SSH key in RAM and not sprayed out into the various work dirs, so prevents accidental key material propagation. But it doesn't protect us from ansible-playbook escapers.14:16
SpamapSYes, ssh-agent's protocol is to request a private key and get it back over the socket.14:16
SpamapSoy, have to AFK now to get kids off to school14:16
*** isaacb has quit IRC14:17
pabelangerRight, that's the first part I am concerned about right now.  I'd like to minimize the copying of our private keys around, if possible.  I think clarkb's suggestion of ssh-agent gets us that14:17
SpamapSI'm still concerned that the private SSH key will be a single ansible-playbook breakout away, and thus must if nothing else be rotated aggressively.14:17
pabelangeras for breaking out of ansible-playbook, I haven't considered testing that yet14:17
SpamapSThe only reason we're bwrapping is we think it's possible.14:17
*** openstackgerrit has quit IRC14:18
*** isaacb has joined #zuul14:18
jeblairSpamapS: i thought the ssh-agent protocol was signing operations, not key data transfer14:19
pabelangerright, it also stops playbooks from looking into /home/zuul on the executor too, which contains our sensitive data right now14:19
jeblairso if you use ssh-agent, you don't request a copy of the key, but rather, request that the key be used to sign something14:19
jeblairif that's the case, we don't have to worry as much about losing the key material in a breakout, however, that job would still be able to use the key to log into any host for the duration of the job.14:21
jeblairhttps://tools.ietf.org/id/draft-miller-ssh-agent-00.html#rfc.section.4.514:22
pabelangerYa, having somebody break out and use the key is something I haven't looked into yet.  I was mostly wanting to stop somebody writing a cat ~/.ssh/id_rsa playbook right now, and think ssh-agent might provide protection against that14:23
*** isaacb has quit IRC14:24
jeblairpabelanger: sure, but the cat playbook only works in the case of a breakout too14:24
pabelangeror somebody missed a trusted playbook change to do that14:25
pabelangerbut, agree14:25
jeblairpabelanger: trusted playbooks will not run with bwrap, so this won't stop that.14:26
pabelangerI'm still trying to understand why we'd mount the private key into bwrap at all, rather than using ssh-agent outside14:26
pabelangerjeblair: ah, right. keep forgetting that14:27
pabelangerif we moved to ssh-agent on executor, we could keep ssh_private_key contents outside of zuul in theory. Not including the discussion from last night about having zuul run ssh-add commands14:29
jeblairpabelanger: there will be multiple keys in the future, some of them managed by zuul, so it's likely that we would need multiple ssh agents, and i think zuul would have to start them.14:31
dmsimardfungi, pabelanger: so, the reason why I'm interested in booting from volume with nodepool is because the cloud I'm working with has a huge disparity in performance between ephemeral (local spindles) and volumes (all ssd ceph)14:47
dmsimardsome jobs can land in ephemeral just fine (say, simple tox jobs) but I'd rather have our "intensive" jobs to use volumes (i.e, openstack installation/integration tests)14:48
pabelangerI don't see why it couldn't be used, I think shade supports it out of the box14:49
dmsimardI would guess so since ansible modules can use boot from volume14:51
Shrewsshade supports boot from volume, nodepool does not so that's where the work would be14:51
fungidmsimard: is there any openstack feature allowing you to pre-image a volume? or do you need a separate step to boot from image, attach the seed volume, overwrite it with the bootable image, detach it and then clone that for subsequent booting?14:51
pabelangerYa, create_server takes a few parameters, 50GB volume seems to be default right now.  I think it would be a matter of exposing some configuration settings into 'images' section to indicate we'd want to use the image from a volume14:51
pabelangerShrews: agree14:52
clarkbyes, ssh-agent doesn't expose the key. If bubblewrap doesn't protect access to memory we lose regardless14:52
clarkbjeblair: re project names, maybe have a project name object whose default representation/text is the canonical version. But include methods to get the hostname/short_name/name/protocol?15:04
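A minimal sketch of the kind of project-name object clarkb is suggesting; the attribute and property names here are illustrative, not what Zuul actually ships:

    class ProjectName(object):
        """Illustrative only: one object carrying every form of a project's name."""

        def __init__(self, canonical_hostname, name):
            self.canonical_hostname = canonical_hostname  # e.g. "github.com"
            self.name = name                              # e.g. "BonnyCI/sandbox"

        @property
        def canonical_name(self):
            # "github.com/BonnyCI/sandbox" (what the merger uses on disk)
            return '%s/%s' % (self.canonical_hostname, self.name)

        @property
        def short_name(self):
            # "sandbox"
            return self.name.split('/')[-1]

        def __str__(self):
            # Default text representation is the canonical form
            return self.canonical_name

With something like that, a job could keep building "{{ workspace_root }}/src/{{ zuul.project }}" and get the canonical form by default, while the short and bare forms stay reachable when needed.
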
*** openstackgerrit has joined #zuul15:06
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Increase file permissions around generate keys  https://review.openstack.org/46121815:06
pabelangerjeblair: any objection to updating zuulv3-dev.o.o with new openstack-zuul-jobs / openstack-zuul-roles config changes?15:08
*** rcarrillocruz has quit IRC15:16
jeblairpabelanger: please do15:17
*** rcarrillocruz has joined #zuul15:18
pabelangernice, reload worked15:23
mordredfungi, Shrews, pabelanger, dmsimard: I _believe_ (would have to double-check) that we have options for create_server that allow booting from a pre-created volume, as well as booting from an image but on to a volume. there is also a shade call which is "please make me a volume from this image"15:24
fungioh, very nice!15:25
mordred(we honestly should think about using such things for our long-lived servers, fwiw)15:25
fungiso in theory the logic within nodepool could remain basically the same if you have "please make me a volume from this image" followed by "please boot from this volume"15:26
mordredyes15:26
mordredin fact, it's just a boolean flag to shade15:26
mordredso it can be _very_ little code in nodepool15:26
fungiconfiguration wise you still have images you're maintaining, so just need somewhere (new toggle in nodepool config?) for "and i want it on a volume thxbye"15:27
mordredif you say "boot_from_volume=True" - you then have 2 options - provide a volume to boot from in "boot_volume" - or or provide an image to turn into a volume in the normal image param15:27
mordredyou also want "volume_size"15:27
mordredwhich is in G15:27
mordred(it defaults, as pabelanger mentions, to 50GB)15:27
fungiahh, yep right15:27
mordredso maybe 2 new options - boot_from_volume and volume_size?15:28
fungiand maybe volume type/flavor i guess if your provider has more than just a default one15:28
pabelangerYa, I think the hard part is new options to bike shed on15:28
pabelangerhopefully we could also try this in nodepool dsvm job too15:28
fungilike in rax we can choose between sata (default) and ssd (non-default)15:28
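For reference, the shade call mordred is describing looks roughly like this; the parameter names are from memory and worth double-checking against the shade documentation, but boot_from_volume, volume_size (in GB) and terminate_volume are the relevant knobs:

    import shade

    cloud = shade.openstack_cloud(cloud='mycloud')

    # Boot from the named image, but place the root filesystem on a
    # cinder volume instead of local ephemeral disk.
    server = cloud.create_server(
        name='ubuntu-xenial-0000001',
        image='ubuntu-xenial',
        flavor=cloud.get_flavor_by_ram(8192),
        boot_from_volume=True,
        volume_size=80,            # GB; shade defaults to 50 if omitted
        terminate_volume=True,     # delete the volume when the server goes away
        wait=True)
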
pabelangermordred: why for long lived servers? Not making the connection15:31
mordredpabelanger: it allows us to pick the size of the root filesystem15:42
mordredpabelanger: and also for our root filesystem to be in a cinder volume instead of just on a local disk - this might or might not be an improvement, depending on the server - but maybe it's worth investigating and thinking about if/when we should use it15:42
mordredfungi: oh - excellent call on volume_type - that is, in fact, _not_ exposed in shade - but I believe it should be15:44
mordredwill need to read up on how block_device_mapping_v2 works for volume types15:44
clarkbmordred: I mentioned it in -infra but apparently image metadata can control volume-related things on boot, any idea if it can be used to have an image boot on volume?15:44
clarkbmordred: beacuse if it can then you could do that in nodepool with no changes to any code, just config15:45
mordredclarkb: I do not know - but I'll admit to being wary of making more use of magic glance metadata ... I don't believe any of them are actually documented15:45
fungicertainly sounds convenient15:45
clarkbmordred: I mean 95% of shade is doing undocumented things15:46
clarkb:P15:46
mordred:)15:46
clarkblike figuring out ip addresses15:46
mordredif we can learn how to trigger such a thing, that seems potentially fine - however it might also be nice to add the options15:46
mordredso that nodepool users can just express "boot these using boot-from-volume" and not have to themselves learn BFV image metadata magic :)15:47
mordred(also, if we do learn more about them, it might be worth while exposing them in some way as options somehow in shade so that it's easy for people to choose to do that themselves)15:47
pabelangermordred: got it, thanks15:48
clarkbmordred: definitely, just thinking it might be doable now if people need it (but ya undocumented in general so who knows!?)15:50
*** isaacb has joined #zuul15:51
SpamapSjeblair: AH! TIL something :)15:57
SpamapSMy understanding was still based on ssh1 keys.. when was ssh2 created, 18 years ago?15:57
SpamapS:-P15:57
mordredSpamapS: that's still super new15:57
SpamapSI really just hadn't thought about how ssh-agent works completely.15:58
SpamapShttps://github.com/openssh/openssh-portable/blob/master/PROTOCOL.agent has the whole thing spelled out nicely btw15:59
jeblairyeah, that agrees with the ietf thing too15:59
SpamapSso yes, an ssh-agent exposed into the bwrap will limit an ansible escaper to using the key only while they're running on the box.16:00
SpamapSso changes to the current implementation are needed.. and I think 1 spec revision.16:03
SpamapSspec should add a brief SSH key discussion16:03
SpamapSand we still need to add the venv and now ssh agent to the implementation. Yes?16:04
jeblairSpamapS: sounds about right16:07
* SpamapS is on it16:08
SpamapSpabelanger: have you started on any of that?16:09
pabelangerSpamapS: no, I have POC'd ssh-agent locally but think we need to solve venv issue first. So we can test bwrap in gate16:10
pabelangercurrently head down, trying to get openstack-zuul-jobs setup16:11
SpamapSpabelanger: k, I'm going to take them both on.16:11
eventingmonkeypabelanger: Hey, SpamapS suggested I reach out to you about https://review.openstack.org/#/c/454396/3/?16:13
pabelangereventingmonkey: sure, what can I help with16:14
eventingmonkeypabelanger: I'm looking for something I can help out with, and SpamapS told me to ping you about in here.16:16
pabelangereventingmonkey: sure, the idea of 454396, is we want to add playbooks to get code coverage for all our blocked lookup plugins here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/ansible/lookup?h=feature/zuulv3 Right now, the test is setup as an untrusted-project, which means we should report FAILURE if we try to use certain lookups.16:19
pabelangerfor example, lookup('passwd') will always fail16:19
SpamapSpabelanger: is there already an example test written, or maybe one that asserts success that could be modified to test for fail?16:20
pabelangerjust what we have in 454396 right now16:20
* eventingmonkey relooks over 454396...16:21
pabelangereventingmonkey: SpamapS: what I was thinking, we'd write playbooks to use the lookup function, register a variable, then assert it failed. I do something like that here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/tests/fixtures/config/ansible/git/common-config/playbooks/check-vars.yaml?h=feature/zuulv316:23
pabelangerthen, we can enhance the test, and validate we get the right failure message in back from zuul16:23
SpamapSpabelanger: k. I just want eventingmonkey to have a framework to hang his tests off of, because the test fakes can be a little confusing at first.16:24
pabelangersure, I can add some code to 454396 on how I envision it working. Ensure we are all happy with that, then iterate on it16:26
jeblairwhat if we proceeded with what was already in 454396, and then improved it with the variable registration later?16:27
pabelangerthat works for me, in that case eventingmonkey should be okay to keep iterating on it16:29
eventingmonkeypabelanger: cool, I'm down.16:30
eventingmonkeypabelanger and SpamapS: What I'm not clear on is what it is you would like me to iterate on? I can look at it and see where I would go, but if you had something in mind already I'd rather not jump into the wrong thing...16:31
jeblaireventingmonkey: that test covers a portion of the custom lookup plugins; we want to test all of them.16:31
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Move playbook out of zuul  https://review.openstack.org/46221016:31
jlko/16:32
eventingmonkeyjeblair: Ah...I see, so working in the same area, just add tests for the other ones. Cool beans.16:33
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml  https://review.openstack.org/46221216:35
*** isaacb_ has joined #zuul16:35
*** isaacb has quit IRC16:39
*** isaacb_ has quit IRC16:39
jeblairjlk: if you have a minute to weigh in on 461331, i'd love another opinion there.16:40
* jlk reads16:40
* jeblair subscribes to openstack-zuul-jobs and updates dashboards16:43
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml  https://review.openstack.org/46221216:44
jeblairZuul: Patch Set 2: Verified-1 (2017-05-03 16:25:54+0000) < Reply >16:45
jeblairUnknown configuration error16:45
jeblairwe should track that down16:45
* jeblair looks in logs16:45
pabelangerI think it is job permissions16:45
jeblairhttp://paste.openstack.org/show/608728/16:46
jeblairwe should make sure that error propagates back to gerrit16:46
pabelangeryes, I am struggling to setup dependencies between untrusted projects ATM16:46
pabelangerlearning how it all works16:47
jeblairpabelanger: i think it might be very hard to move these between repos16:47
*** harlowja has joined #zuul16:48
jeblairbecause you can't have jobs in two different repos with the same name16:48
pabelangershould I not expect depends-on to work properly in this case?16:48
jeblairpabelanger: you should.  and the second change should work.  but the first change won't.16:49
jeblairpabelanger: the first change, which adds a second copy of the jobs, is invalid configuration.  the second job, which removes the first copy of the jobs, should be valid config.16:50
jeblairpabelanger: (assuming the second depends-on the first)16:50
pabelangerokay, great16:50
jeblairbut that sequence will never merge with one-way dependencies (if we did two-way cross-repo-deps, we could merge them both simultaneously and that would be correct)16:51
jeblairpabelanger: i have an idea though16:51
*** hashar has quit IRC16:51
jlkjeblair: commented. I'm on the same fence as you.16:51
pabelangerjeblair: please share :)16:51
pabelangeralso, just seen the following:16:52
pabelanger2017-05-03 16:51:06,879 DEBUG zuul.IndependentPipelineManager: <QueueItem 0x7efe18089590 for <Change 0x7efdfeea4ed0 462214,1> in check> is a failing item because ['is a non-live item with no items behind']16:52
jeblairpabelanger: that's a pretty normal message16:52
jeblairpabelanger: that's a dependent change being dequeued after the depending change reported16:53
jeblairpabelanger: (recall that dependent changes are added to the queue, but aren't acted upon)16:54
pabelangerk16:54
pabelangerI'm having some trouble understanding why 462212 is not running jobs on zuulv3 ATM16:54
pabelangerI don't think openstack-infra/zuul can see openstack-infra/openstack-zuul-jobs right now16:55
pabelangerOh16:57
pabelangerI think I know16:57
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml  https://review.openstack.org/46221216:58
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml  https://review.openstack.org/46221216:59
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml  https://review.openstack.org/46221217:01
jeblairpabelanger: you keep shadowing those jobs.  it's not going to work.17:03
pabelangerjeblair: maybe I am confused. How do I get zuul to run tox-linters again?17:03
jeblairpabelanger: it's what i said above -- you can not have the same job defined in two different repos.17:04
*** Cibo_ has joined #zuul17:04
jeblairpabelanger: maybe we should have a phone call?17:04
pabelangeryes, that would help17:05
pabelangergive me 1 moment17:05
clarkbcouldn't you use a new name for the moved stuff, remove the old stuff, then rename to name you want?17:05
jeblairclarkb: yes, that's a possibility17:05
pabelangerokay, back. should we jump into pbx.o.o?17:07
jeblairyep.  let's go with room 681217:07
jeblairanyone is welcome to join: https://wiki.openstack.org/wiki/Infrastructure/Conferencing17:08
SpamapSpabelanger: btw, the bubblewrap implementation doesn't bind mount in the user's home dir17:21
SpamapSpabelanger: --dir just creates that dir in the chroot.17:21
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml  https://review.openstack.org/46221217:24
pabelangerSpamapS: okay cool. I was confused17:24
jeblairquick call summary: we're going to proceed with: 1) a change that removes the jobs from zuul and switches zuul to noop; 2) a change that adds them to openstack-role-jobs and depends-on (1); 3) a change to start running the newly moved jobs on zuul that depends-on (2)17:24
jeblairwe should end up with +1s all the way on that.  but we may have bugs.  :)17:25
pabelangerokay, I think 462212,6 is correct now17:26
pabelangerI see zuulv3 reporting no jobs ATM17:26
pabelangerAlso, I wonder if we should also bubble up that message to gerrit17:27
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap  https://review.openstack.org/45385117:27
SpamapSpabelanger: ^ that should now copy in the venv17:27
SpamapSwell, ro-bind it17:27
*** Cibo_ has quit IRC17:29
jeblairpabelanger: no jobs isn't normally an error, so probably shouldn't report to gerrit17:29
jeblairpabelanger: i'm going to go work on a test case for 'dynamically adding a job in one repo and dynamically running it from another one' and see if i can reproduce this17:32
SpamapSso with the SSH agent. I'm not sure what to do if SSH_AUTH_SOCK is already set.17:32
pabelangerjeblair: ok17:32
SpamapSdo I just use it? save it, execute stuff, and restore it?17:32
clarkbi would pass it through? that seems like it would be least surprise method17:32
jeblairSpamapS: force it to our value?17:33
jeblairi think zuul has to control the ssh agent(s), otherwise we won't be able to manage having multiple keys later17:33
jeblairwe're just talking about setting the env var with bubblewrap, right?  so we're not altering zuul's environment?17:34
jlkjeblair: in gertty, when in a diff view, when I'm trying to comment on a line it keeps opening the comment on the left side (old code) rather than the right side (new code). What changes the focus there?17:36
jeblairjlk: tab17:37
jlkoh arrow key too17:37
jeblairoh yep17:37
jlkwhy didn't I try that already17:37
jeblairit should also be persistent once you switch17:37
jlkand is it normal that my comment box opens on top of somebody else's comment, but once saved it'll show _below_ the comment?17:38
jeblairyeah.  not my preferred behavior, but programming is hard.17:38
jeblairi might know enough about urwid to fix both of those things now.  :)17:38
jlkhaha, no worries17:39
pabelangerIf SSH_AUTH_SOCK was already set up, then zuul also added a key, you would get 2 keys I would assume?17:39
SpamapSjeblair: right17:39
SpamapScurrently we don't set env= when calling bwrap17:40
SpamapSbut perhaps we should be very explicit about what gets passed in there17:40
* SpamapS ponders while brewing mmoar coffee17:40
jlkjamielennox: I implemented fixes for your feedback on the github patches. I pushed up the result to my 'github-v3' branch in j2sol/zuul, but I'm not going to push up to gerrit until we get more feedback, since it'd be a lot of noise.17:44
pabelangerSpamapS: jeblair: so, it sounds like we might be okay with https://review.openstack.org/#/c/331148/ for initializing ssh-agent and ssh-add / ssh-remove of keys when executor starts17:44
jeblairSpamapS: pabelanger noted an env variable option to bwrap -- maybe that's what we need to use instead of the env= argument to the subprocess call?17:44
jeblairor maybe we need both?17:44
jeblairpabelanger: something like that, maybe -- but i think it needs to happen entirely inside the executor, because i think it will eventually need to start and stop multiple agents.17:46
pabelangeryes, I used --setenv SSH_AUTH_SOCK $SSH_AUTH_SOCK in testing last night, however should maybe confirm if that is needed. I'm not sure which env vars are passed to bwrap by default17:46
jeblairpabelanger: (so, it shouldn't touch any files in zuul/cmd/; only zuul/executor/server)17:46
pabelangerk17:47
pabelangermake sense17:47
SpamapSI've not looked at the env var option to bwrap17:48
SpamapSlet me see17:48
*** Cibo_ has joined #zuul17:48
SpamapSah ok, those are just unset/set options. No real difference I think than just env= to Popen17:49
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test  https://review.openstack.org/46188117:50
SpamapSpabelanger: bwrap gets zuul's env, and thus, ansible-playbook gets the same env17:50
SpamapSwhich is why I say, maybe we should black/whitelist envvars17:51
pabelangeryes, I would be okay with that.17:51
*** Cibo_ has quit IRC17:53
pabelangerjeblair: SpamapS: the good news, is paramiko has an SSH agent interface which we can likely use17:53
SpamapShm17:54
SpamapSI'll take a look17:54
pabelangerhmm, at first glance it seems we cannot add a key to the paramiko SSH agent, just read the keys already set up17:56
clarkbpabelanger: its just for consuming keys out of an agent?17:57
pabelangerclarkb: looks like it17:57
pabelangerguess we could push that upstream17:57
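To make the gap concrete: paramiko's agent support can enumerate the identities an existing agent holds and ask it to sign data, but there is no call to add a key, so the executor would still need to shell out to ssh-add (or rely on the upstream addition pabelanger mentions). A rough sketch, noting that exact method signatures vary a bit between paramiko releases:

    import paramiko

    # Talks to the agent named by SSH_AUTH_SOCK in our environment.
    agent = paramiko.Agent()
    for key in agent.get_keys():
        print(key.get_name(), key.get_fingerprint())
        # The private key never leaves the agent; we can only ask it to
        # perform signing operations on our behalf.
        signature = key.sign_ssh_data(b'data to sign')
    agent.close()
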
jeblairpabelanger: hrm, a 2-change series to add jobs in one repo and use them in another passes tests.  i'm going to extend my test to do the 3-change series we're doing now.17:58
pabelangerack18:02
SpamapSpabelanger: so yeah, I think if we move the Evseev change into the executor specifically, and pass that agent's envvars into bwrap via Popen's env= .. I think we're good.18:07
SpamapSwe can talk about filtering the env later.18:07
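A sketch of the env-filtering idea under discussion: instead of letting bwrap inherit the executor's whole environment, build an explicit whitelist and hand it to Popen via env=, adding the build's SSH_AUTH_SOCK on top. The variable names and whitelist contents here are placeholders:

    import os
    import subprocess

    SAFE_ENV = ('HOME', 'LANG', 'LOGNAME', 'PATH', 'TZ', 'USER')

    def run_bwrap(bwrap_cmd, ssh_auth_sock):
        # Whitelist rather than blacklist, so an escaped playbook can't
        # read arbitrary executor environment variables.
        env = {k: os.environ[k] for k in SAFE_ENV if k in os.environ}
        env['SSH_AUTH_SOCK'] = ssh_auth_sock
        return subprocess.Popen(bwrap_cmd, env=env)
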
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers  https://review.openstack.org/46226818:48
jeblairpabelanger: ^ i think that's the bug that's causing no jobs to run.  the key is the branch -- the openstack-zuul-jobs definitions are getting implicit 'master' branch matchers, so they aren't running on feature/zuulv3.18:49
pabelangerAh interesting18:49
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers  https://review.openstack.org/46226818:50
jeblairgit add ^18:50
jlkthat's quite the commit message18:50
*** Cibo_ has joined #zuul18:52
jeblairit's longer than the change (but not the test)18:53
jeblairwow, i'm going to python that a bit better, hang on.18:54
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers  https://review.openstack.org/46226818:54
jeblairthere goes 3 ridiculous lines18:54
jlkoof, okay18:55
jlkre-reviews18:55
jeblairsorry, it's hot here.18:55
jlkhahah18:55
*** tobiash_ has joined #zuul19:01
jeblairtobiash_: have you seen https://review.openstack.org/461509 ?19:02
tobiash_jeblair: not yet, title sounds interesting :)19:03
tobiash_will have a look asap19:04
tobiash_jeblair: I encountered a bug with v3 where it reports with 'Starting check jobs' for every change in the gerrit instance19:06
tobiash_https://review.openstack.org/#/c/45571119:06
tobiash_that's my try to fix that19:06
jeblairtobiash_: ah thanks, that slipped my attention19:06
pabelangerSpamapS: here is the bwrap command running on my local zuul env, but don't think we have the right mounts setup: http://paste.openstack.org/raw/608739/19:11
pabelangeransible-playbook runs properly now, but missing zuul.ansible libraries19:12
pabelangercheck ansible.cfg pathes now19:13
Shrewsmorgan, mordred, jeblair: why were the zuul py3 changes (325661, 327436, 327459) all abandoned?19:14
jeblairShrews: maybe because of the gear work?19:16
pabelangerI think I see the issue19:18
*** Cibo_ has quit IRC19:20
jeblairShrews: SpamapS has changes in progress for gear19:20
tobiash_jeblair: the spec sounds great, thanks for writing this :)19:24
*** hashar has joined #zuul19:26
*** tobiash_ has quit IRC19:29
Shrewsjeblair: good spec. i think dkranz_ would probably be interested in that, too19:33
jlkit's too nice to be inside any more. I'm out for a few hours of cycling.20:02
dkranz_Shrews: I'll check it out.20:10
jeblairpabelanger, jlk: i think we need something close to the old implied branch-matcher behavior for the project-pipeline variants.  in other words, the change i wrote makes sense for job definitions, but i think if you stuck a project definition inside of a project repo, you would expect implied branch matchers for that...  i'm working on a new revision20:13
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test  https://review.openstack.org/46188120:25
pabelangerShrews: jeblair: some progress on bwrap^ I think we need to mount the full git repo in bwrap, because we pip install -e for tox.ini20:26
pabelangerShrews: sorry20:26
pabelangerSpamapS: ^20:26
pabelangerthe blocker now, is untrusted and Executing local code is prohibited20:26
jeblairpabelanger: hrm.  that sounds like it could be complex -- how would we know whether (and how) to do that?20:31
jeblair(keeping in mind that the goal here isn't just to handle zuul's unit tests, but to also handle zuul being installed in a virtualenv in production.  even an editable virtualenv, i guess)20:31
pabelangerjeblair: so, I am using the copy module now20:40
pabelangerhttp://paste.openstack.org/show/608757/20:40
pabelangerbut trying to determine if that is a valid use case or not20:40
pabelangertoday, that actually breaks20:41
pabelangerIIRC, we say it was okay for untrusted jobs to write into the executor job dir20:42
jeblairpabelanger: what's the error?20:44
pabelangerhttp://paste.openstack.org/show/608758/20:45
pabelangersrc is none20:45
pabelangerhappy to fix, but want to confirm that is actually valid code for untrusted executor20:46
pabelangeruntrusted job on executor*20:46
jeblairpabelanger: yep, i think so20:46
pabelangerk, let me work on fixing it20:48
SpamapSpabelanger: I thought work_dir would have all the git repos in it.20:57
*** dkranz_ has quit IRC20:58
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test  https://review.openstack.org/46188121:02
pabelangerokay, that should pass, and use bwrap properly. Comments most welcome!^21:03
SpamapSpabelanger: left a comment inline.21:28
SpamapSpabelanger: Little confused by ../../..21:28
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers  https://review.openstack.org/46226821:28
jeblairi'm really sorry about that commit message21:28
jeblairit's kind of "but wait!  there's more!"21:29
SpamapSapprove now and you'll get not one, but TWO changes21:34
jeblairour layaway plan lets you buy now and pay down your technical debt over the lifetime of the program!21:35
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Report job shadow errors to users  https://review.openstack.org/46230021:51
jeblairpabelanger: i think those two changes take care of all the issues we've seen with your move.21:52
jeblairi think the first one is done and passes tests now, so we should probably see about merging it asap to proceed with the job move.  i think the second one is good too, just waiting on tests now.21:53
SpamapSwell this is stupid21:54
SpamapSssh-add -x locks an agent21:54
SpamapSbut it won't accept stdin redirects as the password21:54
jeblairSpamapS: is locking feasible for us?  i'd assume ansible might start new ssh connections at any time...21:54
SpamapSjeblair: I was going to look at whether we could use the controlsocket functionality21:55
SpamapSjeblair: so establish the connection and configure ssh to use it, then lock21:56
SpamapSbut that's a bit of a stretch21:56
jeblairSpamapS: interesting... ansible does use controlsocket on its own.  maybe it does the right thing if there is one already, but i don't know.21:57
SpamapSactually, locking just prevents ssh-add operations. I think.21:57
SpamapSI thought anyway21:58
jeblairSpamapS: according to the rfc (draft), locking should "at least" suspend private key operations.  so i'd expect it to not sign anything.  but maybe also not accept additions.22:00
*** fbouliane has joined #zuul22:01
SpamapSthe docs are not awesome22:01
SpamapSfor openssh22:01
SpamapSWhile locked, the agent will refuse all requests except22:03
SpamapSSSH_AGENTC_UNLOCK, SSH_AGENTC_REQUEST_RSA_IDENTITIES and22:03
SpamapSSSH2_AGENTC_REQUEST_IDENTITIES.22:03
SpamapSjeblair: the concern is that the escaper can forward the agent to the test node, and then poke stuff into the agent22:04
SpamapSwhich suggests that agent-per-bwrap may be necessary22:04
*** jkilpatr has quit IRC22:05
jeblairSpamapS: *nod*22:06
clarkbthe main issue being removing a key that subsequent jobs would depend on?22:06
jeblairclarkb: that, or adding a bunch of useless keys (dos?)22:07
SpamapSboth22:07
SpamapSalso keeping it very busy22:07
SpamapSwe can cgroup the bwraps and they'll share CPU fairly. But there would be only one ssh-agent.22:08
SpamapSI think in the scheme of things, ssh-agent with one key loaded is going to be quite a bit smaller than ansible-playbook22:09
jeblairyeah, doesn't seem troubling.22:13
*** cmurphy_ has joined #zuul22:24
*** hashar_ has joined #zuul22:26
*** SotK_ has joined #zuul22:26
*** cmurphy has quit IRC22:26
*** cmurphy_ is now known as cmurphy22:26
*** nibz has joined #zuul22:27
*** jesusaur has quit IRC22:28
*** SotK has quit IRC22:28
*** nibalizer has quit IRC22:28
*** hashar has quit IRC22:28
*** harlowja has quit IRC22:28
*** jesusaur has joined #zuul22:29
*** nibz is now known as nibalizer22:31
jamielennoxso much scroll22:32
jamielennoxis it possible to do per-run ssh keys?22:32
jamielennoxso the image or cloud-init gets baked with an ssh key for a trusted zuul process, be that executor-trusted or zuul elsewhere22:33
jamielennoxit generates a new ssh key there, copies public to the job nodes, and private into the executor22:34
*** jkilpatr has joined #zuul22:39
pabelangerSpamapS: replied to your comment22:45
pabelangerjeblair: ack, looking now22:46
jeblairjamielennox: interesting idea; though i think the easier way to do that would just be to have nodepool generate the keys for every node, which is possible but not something i'm excited about (it may be easier, but it's very difficult -- key management in openstack is difficult to get right).  neither approach maps well either to nodepool static nodes or zuul bypassing nodepool to log into machines directly with project- or tenant-specific keys.22:51
pabelangerya, that is something we talked about last night too. Wen22:53
jamielennoxthat would require nodepool having to log into the new nodes though right?22:53
pabelangerwe'd need to update glean too22:53
jamielennoxlike you can't do an arbitrary 'put this key on this vm' in openstack, you have to register it with nova (AFAIK)22:54
jamielennoxwhich sucks22:54
clarkbjamielennox: yup22:54
jeblairjamielennox: right, which means making boatloads of keys.22:54
jeblairpossibly more than we have quota for22:54
jamielennoxso you'd still be in a position where something with the starting privileged key has to log in to the box and replace that key with a per-job generated one22:55
jeblair(you can have 1000 servers.  and 10 keys.  have fun.)22:55
jeblairjamielennox: well, only if we can't use nova.22:55
jamielennoxjeblair: having nodepool maintain and rotate X keys in nova sounds horrible22:56
jamielennoxso given complete freedom on this (and not having many deployments to worry about) the ideal for me would be a pam module out to hashicorp vault22:57
* jamielennox would need to double check you can do ssh keys via pam, but the same daemon on box idea22:57
jeblairjamielennox: yes, it was when we were doing one key per day for image builds.  but at this point, we're pretty good at working around nova leaking things, so the only thing (other than just not wanting to do it because it's so painful) that would technically block it is the key quota issue.22:58
jamielennoxbut vault (or similar) is not something you can put a hard dep on22:58
SpamapSI don't think we need key per node.22:58
SpamapSBut making keys short lifetime is a good idea.22:58
clarkbjamielennox: there are ldap things but require changes to sshd?22:58
clarkbI think rhel may carry the patches22:59
jamielennoxSpamapS: i'm happy with short for now, but there are some easily exploitable things in our deploy atm, like test nodes come up with the same keys on basically sequential ips so a malicious actor can easily pre-hose a few waiting boxes22:59
jamielennoxnot a huge problem for now,23:00
jamielennoxa big problem if you want to ever entertain private jobs in tenants23:00
jamielennoxclarkb: i think sssd can do it23:00
pabelangeranother idea, since zuul is using a http server for job secrets, is add ssh-import-id support to glean. Then have zuul worker fetch data from zuul on boot.23:00
SpamapSKey per tenant is one we can do too.23:00
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Store initial repo state in the merger  https://review.openstack.org/46117623:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use previously stored repo state on executor  https://review.openstack.org/46117723:01
pabelangerIIRC, cloud-init supports that, from github URLs23:01
SpamapSyeah, cloud-init does github or lp users23:01
jeblairjamielennox: the key should not be available to untrusted projects, so that should not be easily exploitable.23:02
clarkbSpamapS: pabelanger I wouldn't rely on external services that way23:02
clarkbthey are far too unreliable23:02
SpamapSclarkb: it would be zuul23:02
SpamapSnot an external service23:02
jeblairjamielennox: (the work SpamapS is doing is still in the realm of protecting from an escape from ansible)23:03
jamielennoxjeblair: in which model? the key has to be present for ansible to ssh even in the untrusted phase23:03
SpamapSand you don't really need ssh-import-id support. just    curl http://{{zuul}}/{{key_name}} >> /root/.ssh/authorized_keys23:03
clarkbSpamapS: zuul maybe external to the cloud?23:03
jeblairSpamapS: we can't rely on zuul being accessible from the nodes23:03
jeblairright that23:04
SpamapSit's true, we did fix that23:04
SpamapSso let's not break it again :)23:04
pabelangerclarkb: sure, but that's one way it works today.23:04
jamielennoxSpamapS: right, that was where i was starting because you don't need to execute that from cloud-init/glean, you can do that from the trusted playbooks to set up a key for the untrusted components23:04
SpamapSI think we just have to live with the DoS potential of an ansible-playbook escaper.23:05
SpamapSUnless we do key-per-test-node, which is quite expensive.23:05
*** hashar_ has quit IRC23:05
jeblairSpamapS: if you want to use zuul to drive CD, getting the key is about as bad as things can get though.  so i wouldn't characterize it as a dos potential.  i'd characterize it as privilege escalation.23:06
SpamapSIMO CD would never be untrusted.23:07
SpamapSIn fact that's where I think key-per-tenant makes sense.23:07
jeblair(unless your CD key is a different key, which is the plan in the spec)23:07
jeblairSpamapS: yeah, that's what the spec describes, is a per-tenant (and also per-project) key.23:07
clarkbhaving an agent would protect the key though righr?23:07
SpamapSclarkb: if they escape, they get to use it to ssh to things while they're escaped23:08
SpamapSand once they ssh somewhere, they can agent forward23:08
pabelangertrying to catch up, are we saying ssh-agent might be an issue?23:08
clarkbright which should be same test env?23:08
jeblairSpamapS, clarkb: so assuming we have agents set up correctly, an exposure of the nodepool key doesn't necessarily mean an exposure of the CD key, which is good.  but we just need to keep that case in mind.23:08
SpamapSand then from there.. as long as they are still connected and running.. they're ssh'ing wherever they can23:08
pabelangerjeblair: yes, that is my thinking too23:08
SpamapSso we have to think in job timeout windows23:09
jeblairstill, if you grab the nodepool key, you could start sshing into nodes that use secrets and you may be able to steal the secrets23:09
SpamapSIf we let jobs go for 4 hours.. that's how long somebody can play with a key.23:09
SpamapSthat's nsaty23:10
SpamapSnasty even23:10
SpamapSSo.. what if we did have the executor just do 'ssh-keygen -t rsa -b 1024 -f tmpdir/jobuuid ; ssh-agent ; ssh-add tmpdir/jobuuid ; ssh-copy-id node-address ; bwrap ansible-playbook' ?23:12
SpamapS(in pseudo-commands...)23:12
SpamapSbasically just make a crappy key.. copy it up for this _one playbook run_ using the real key, and then only expose the crappy key to untrusted execution.23:13
jeblairi think it would work for dynamic nodes; needs something to cleanup static nodes though23:14
jeblairSpamapS, jamielennox: the "cleanup" job idea from SpamapS yesterday may apply here23:14
jeblair(and i think we could do this either hard-coded in the executor, or as a trusted pre-playbook as jamielennox suggested; the setup/cleanup cycle may be easier that way actually)23:15
jamielennoxSpamapS: that's basically what i meant by generating a key in the trusted pre-playbooks, though i wasn't thinking of the ssh-agent23:15
pabelangera trusted playbook seems like it would fit well23:15
SpamapSI think this goes inside zuul23:16
jeblairif we do trusted pre-playbook we have to remove the trusted key from the agent after installing the new one23:16
jamielennoxthat ssh-agent is single use right? you just shut it down?23:17
SpamapSAnd we can make the cleanup a finally: after bwrap23:17
SpamapSjamielennox: yep23:17
jamielennoxi think it preferable to code this into the executor that playbook - but either works23:17
jamielennoxs/that/than23:17
SpamapSit's foreground, you just read the sock path from it in and feed it to bwrap/ansible-playbook as env vars and then after kill it off23:18
jamielennoxi don't see a situation where you would want to opt-out from this23:18
SpamapSyeah, this feels like it should be in the executor23:18
jeblairjamielennox: yeah, though i was imagining an agent per job, not per ansible-playbook invocation; so you'd use the agent with the nodepool key for the first pre-playbook, then remove the nodepool key, the subsequent playbooks use the new key23:18
jeblairSpamapS: note that we only want to create one key per job, but we run ansible-playbook and bwrap several times within a job23:19
SpamapSper job seems more sane23:19
pabelangerSo, if we do a pre-playbook, don't we still have an issue that somebody who breaks out of bwrap can still access our primary SSH key to access the node?23:19
SpamapSjeblair: can set it up as a lazyload singleton23:19
SpamapSthat gets torn down after the job23:19
jeblairpabelanger: that's why i say if we do that, the pre-playbook needs to remove the key from the agent23:19
jamielennoxso what is this key per job idea?23:19
pabelangerjeblair: right, but how does the next job SSH into the next node?23:20
jamielennoxjeblair: i don't think it's removed from the agent, you would start a new ssh-agent with only the new key and put that socket through to the bwrap environment23:20
jeblairpabelanger: agent is per-job too23:20
jeblairjamielennox: i would expect something like 3-10 ansible invocations per job; a mix of bubblewrap and non-bubblewrap.  that's why i was thinking agent-per-job.23:21
jamielennoxjeblair: oh sorry, my bad i mentally read that as key per project for some reason23:21
jamielennoxyes agent-per-job23:22
pabelangersorry, still confused :( Does the executor SSH into the zuul worker, and add key B to the authorized_keys file? Then bwrap connects with key B to the zuul worker?23:22
SpamapSYeah ok, but I don't think playbooks would do this well.23:22
jamielennoxwell, no agent-per-build/executor not per defined job23:22
jamielennoxwell, no agent-per-build/run not per defined job23:22
SpamapSyeah, agent+key per build23:22
jeblairjamielennox: i stand corrected:  agent per build.  :)23:22
SpamapSrun?23:23
SpamapSgah23:23
SpamapSterms23:23
SpamapS>:23:23
jamielennoxi'm still not exactly sure of all the terms23:23
jeblairyeah, "build" is the right zuul vocab23:23
openstackgerritIan Wienand proposed openstack-infra/nodepool master: Remove timestamp in devstack plugin  https://review.openstack.org/46231923:23
pabelangerCan we maybe get an etherpad going?23:23
jeblair(it's sometimes loose in the executor since we accept gearman "jobs" which tell us to perform a zuul "build")23:23
jeblairhttps://etherpad.openstack.org/p/EIrCy4mhuR23:24
pabelangerty23:24
SpamapSso the way I'm seeing it, we have the executor basically just say "I want to run ansible playbook" and we go "do I have an ssh agent yet? no. Ok... let's create one.. and spray the key out to the nodes (maybe that part is an internal-to-zuul playbook) and then run the playbook you asked for"23:24
SpamapSand then when a build completes we tear that agent down and clean the key off the nodes.23:25
SpamapSbut I don't think we actually ever remove the "real" nodepool key. We just leave that agent running and use it for setup/teardown23:25
SpamapSsince it's never exposed to untrusted playbooks23:25
jamielennoxi don't think there's a teardown in any of these scenarios23:26
jamielennoxnew keys are generated in jobdir that executor auto cleans up23:26
jamielennoxand ssh-agent exits at the end of the build23:26
pabelangerright, if the real nodepool key is still on the executor, I am struggling to understand why we go through all the steps for per-build ssh keys23:27
jamielennoxpabelanger: because it is super easy to have ansible-playbook exec something on the executor that just proxies the ssh-agent out to the rest of the world23:27
SpamapSjamielennox: you have to remove the keys from static nodes23:28
jeblairpabelanger: so that if someone escapes out of ansible, into the bubblewrap container, they can't use the nodepool key (whether on disk or in agent) to ssh to anything else other than the nodes for that build23:28
jamielennoxSpamapS: oh, static, right23:28
jeblairi think i have the steps for hardcode-in-zuul in the etherpad23:28
jeblairworking on pre-playbook now23:28
SpamapSjamielennox: and ssh-agent would only exit when you kill it, so it's still part of tearing things down.23:30
*** harlowja has joined #zuul23:31
jeblairokay, i think that's the plan for either hardcode or playbook variants23:31
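Pulling SpamapS's pseudo-commands and the etherpad notes together, the executor-side lifecycle might look roughly like the following. The paths, key size and flags are placeholders, and distributing the public key to the nodes (and removing it from static nodes) is left to the trusted setup and cleanup playbooks:

    import os
    import subprocess

    def start_build_agent(jobdir):
        """Per-build throwaway key plus an agent that holds only that key."""
        key = os.path.join(jobdir, 'build_rsa')
        subprocess.check_call(
            ['ssh-keygen', '-t', 'rsa', '-b', '2048', '-N', '', '-f', key])
        sock = os.path.join(jobdir, 'ssh-agent.sock')
        # Foreground agent bound to a socket inside the jobdir, so the
        # socket can be bind-mounted into bubblewrap and the executor can
        # kill the agent when the build ends.
        agent = subprocess.Popen(['ssh-agent', '-D', '-a', sock])
        # (a real implementation would wait for the socket to appear)
        subprocess.check_call(['ssh-add', key], env={'SSH_AUTH_SOCK': sock})
        return agent, sock, key + '.pub'

    def stop_build_agent(agent):
        agent.terminate()
        agent.wait()

Each ansible-playbook invocation in the build would then only ever see SSH_AUTH_SOCK pointing at this agent, while the nodepool key stays in a separate agent used solely by the trusted setup and teardown steps.
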
* SpamapS looking23:31
jamielennoxfrom ssh-agent man page:      If a command line is given, this is executed as a subprocess of the agent.  When the command dies, so does the agent.23:31
jamielennoxmay or may not be possible if you have to add keys etc23:31
jeblairjamielennox: we still have multiple playbook invocations to deal with though23:32
jamielennoxalso not sure how that ould play with bubblewrap23:32
jamielennoxyea, ok, ssh-agent cleanup23:32
pabelangerI'm leaning to trying it with pre-playbook first, see how it goes23:35
SpamapSit's a pretty easy cleanup23:36
jeblairSpamapS: i'm with you on having this be a batteries included thing (and not something that we want users to mess up).  but i'm hesitant to hard-code assumptions about the target hosts into zuul.  so i kind of like the pre-playbook idea since it lets us put it in the zuul standard-lib base job so that everyone can use it easily.  but in the rare case that someone has to do something different in some circumstance, it's still site-customizable.23:36
SpamapSbuild.agent.kill()23:36
jamielennoxpabelanger:  can you establish an agent in pre-playbook that will influence how it is run in the untrusted23:36
SpamapSjeblair: I hadn't thought about a base job, but yes, I think you're right.23:37
SpamapSerr23:37
SpamapSbase playbook?23:37
jamielennoxpabelanger: and have an always-run cleanup of the agent even on failure?23:37
jeblairSpamapS: job ~= playbook in this situation23:37
SpamapSk23:37
pabelangerjamielennox: I think so. I'd need to test23:38
jamielennoxsomehow you need to tell bwrap to plumb the socket into the untrusted environment and i'm not sure how you do that from a playbook23:38
jeblairjamielennox: yeah, theoretically a trusted playbook could start a daemon, and a post playbook always runs, even on failure.  but.. yes ^ that's one of the reasons we probably want zuul managing the agent, so it can set up the right bubblewrap cmdline23:39
jamielennox(and mostly because i don't know how a lot of this stuff works in bwrap)23:39
pabelangerwe should be able to place the ssh-agent bind_address inside our job dir, right?23:39
jamielennoxbecause depending on how you put code into the bwrap environment you might be able to put through the actual domain socket as like {jobdir}/ssh-agent23:40
pabelangerthen untrusted would look for it23:40
SpamapSYeah I like the idea of zuul just having the per-build agent be in all playbooks environments.23:40
SpamapSbut all the stuff we do with the agent makes a lot more sense being done in playbook23:40
SpamapSsince we want to copy data from executor -> all nodes, that's kind of ansible's job :)23:40
jeblair++23:41
SpamapSpabelanger: so.. turning attention to the venv...23:41
SpamapSpabelanger: since we pip install -e for tox jobs, the problem is that where the code is coming from is not actually _inside_ the venv. Yes?23:41
pabelangerI do like the idea of having nodepool add the one-time key to the worker, then passing the key via zookeeper to zuul, but no more SSH23:41
pabelangerSpamapS: right, it is 2 directories up23:42
SpamapSpabelanger: in the tox case.23:42
SpamapSin prod.. ????23:42
pabelangertox23:42
pabelangerwell23:42
pabelangerI mean, somebody could pip -e in production, but we don't23:43
pabelangerpip install -U /opt/zuul23:43
SpamapSso.. point being, venvs can have symlinks to wherever23:44
SpamapSso we can't just bind mount in the venv itself.23:44
SpamapSpabelanger: also we don't need zuul's code, we need ansible23:44
pabelangerSpamapS: we actually get ansible on your patch23:45
jeblairtechnically, we need a tiny bit of zuul's code, for our custom plugins, but we can remove that dependency23:45
pabelangerbut, we are missing zuul.ansible23:45
pabelangeryes, that23:45
jeblair(i think)23:45
SpamapSjeblair: that's already handled when we copy the action plugins in. I think?23:45
SpamapSOh, those import zuul stuff.23:45
SpamapSdurn shared library code23:46
jeblairyep.  but it's just zuul.ansible.path, which is a 70 line file, mostly comments, whose only further dependencies are ansible23:46
pabelangerhttp://paste.openstack.org/show/608776/ is the traceback23:47
SpamapSyeah, so.. hm23:47
SpamapStwo ideas popped into my head immediately. They might be insane.23:47
SpamapS1) move that into its own pypi library that we require23:48
SpamapS(2) was in fact insane and I will not share.23:48
jeblaira simple option is: we can copy that specific file into somewhere in sys.path in the bubblewrap and change the import statements to match.23:48
jeblair(was that #2? :)23:49
SpamapSsorta23:49
SpamapSyeah23:49
jeblair(and i guess we'd have to add /var/lib/zuul/ansible to sys.path for the non-bubblewrap case)23:49
jeblairSpamapS: i feel it's less insane because we're already compelled to copy the custom plugins to some location anyway.23:50
jeblairreally just adding that location to sys.path might be enough23:50
SpamapSYeah the custom plugins make everything bonkers already.23:50
jeblairthen "import path" instead of "import zuul.path".  obviousl, path is a bad name then, so "import zuulpaths" or something.23:50
openstackgerritIan Wienand proposed openstack-infra/nodepool master: Turn of time stamps in dib log  https://review.openstack.org/46232123:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Report job shadow errors to users  https://review.openstack.org/46230023:52
SpamapSjeblair: that will bork in py3 unfortunately...23:53
SpamapSbut we can make a well known parent dir and add that to sys.path23:53
jeblairSpamapS: why would it bork?23:53
SpamapSjeblair: py3 does not do relative imports anymore23:53
SpamapSyou need package name23:54
jeblairSpamapS: but this would appear as a "root" package, right?  if it's /var/lib/zuul/zuulpaths.py and /var/lib/zuul is in sys.path?23:54
SpamapSperhaps? It's a solvable problem23:55
SpamapSit was a nit pick that flew out of me without much thought23:55
SpamapSEOD rushing without testing ;)23:55
* SpamapS has to go23:55
jeblairya23:55
* SpamapS -> AFK23:56
pabelangerjeblair: would we ever have a zuul-executor that would only run trusted playbooks?23:59
jeblairpabelanger: nope (unless the only playbooks in the system were trusted)23:59
