Wednesday, 2017-03-22

jlkI may have messed something up here00:00
mordredjlk: pip freeze won't show setuptools00:00
mordredjlk: pbr freeze will though00:00
jlkgood point, I'm too used to doing _everything_ inside venvs00:01
SpamapSOk I think I've got a pretty good working thing now00:01
mordredjlk: (also, you might like pbr freeze more than pip freeze anyway - it'll show you git shas for anything that has them recorded in their metadata)00:01
clarkbpip list too00:01
mordredwhich is everything that use pbr, fwiw00:01
* mordred hasn't used pip freeze in a couple of years now00:02
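A hedged illustration of the pbr freeze behavior being described (the exact output format is approximate, not captured from this session):

    $ pip freeze | grep zuul
    zuul==2.5.1
    $ pbr freeze | grep zuul
    zuul==2.5.1  # git sha 1a2b3c4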
jlkI have PTSD from pbr, so I haven't touched it00:02
mordredawww00:02
SpamapSdamnit...00:02
SpamapSKeyError: 'getpwuid(): uid not found: 1000'00:03
SpamapSso close00:03
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests  https://review.openstack.org/44831400:05
SpamapSwell I'm 5 minutes over on EOD00:05
SpamapSjlk: ^ that kinda works for me00:05
SpamapSexcept when I try to run as non root00:05
SpamapSnot sure how to plumb in my user00:05
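A common way to plumb the host user into a container, sketched here under assumptions (user name and uid handling are illustrative; the getpwuid error above is what running as a uid with no /etc/passwd entry inside the image looks like):

    # Dockerfile: create a user whose uid matches the host user
    ARG UID=1000
    RUN useradd -m -u $UID zuul
    USER zuul

    # build and run with the host uid plumbed in
    $ docker build --build-arg UID=$(id -u) -t zuuldev .
    $ docker run --rm -u $(id -u) zuuldev tox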
jlkhrm, pbr needs more than git-core it would seem00:07
jlkoh own-goal00:08
*** rattboi has left #zuul00:09
*** rattboi has joined #zuul00:09
*** rattboi is now known as rattboi-test00:11
*** rattboi-test is now known as rattboi00:11
openstackgerritJoshua Hesketh proposed openstack-infra/nodepool feature/zuulv3: Merge branch 'master' into feature/zuulv3  https://review.openstack.org/44532500:27
jheskethpabelanger: ^00:27
*** harlowja has joined #zuul00:42
openstackgerritK Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Perform pre-launch merge checks  https://review.openstack.org/44627500:44
openstackgerritK Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Perform pre-launch merge checks  https://review.openstack.org/44627500:48
jlkoh interesting01:38
jlkSpamapS: on my system, I'm running docker as my user, and while it's "root" inside the container, it's actually my UID. It's writing things to the filesystem that show up as my UID/GID when I look at them outside the container.01:38
jlkBWAHAHAHA. My container got named angry_edison and I am amused.01:41
*** harlowja has quit IRC03:35
jeblairjlk: watch out for vengeful_tesla04:16
SpamapSjlk: right, but I want it to not think it is root inside.05:10
SpamapSperhaps that is a bad idea05:10
SpamapSthough I had trouble when they ran "as root"05:17
SpamapS590ee83c155d        zuuldev             "/bin/sh -c tox"    34 seconds ago      Up 32 seconds                           dreamy_stallman05:24
SpamapSthe hits keep on coming05:24
SpamapSjlk: also I think the way you're doing it, you have a VM between you and docker (docker-machine) so that's likely why the ownership stays you05:45
SpamapSfor me, if I'm root in the container, volume touched files are root owned05:45
SpamapSbest way to slow down zuul unit tests seems to be to run it on aufs05:59
* SpamapS looks at how to make container rootfs == tmpfs06:00
SpamapSoo neeat, --tmpfs /tmp makes go fast06:03
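For reference, that looks roughly like this on the command line (image name taken from the docker ps output above; the size option is an invented example):

    $ docker run --rm --tmpfs /tmp:rw,size=2g zuuldev tox -e py27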
openstackgerritJamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Refactor nodepool apps into base app  https://review.openstack.org/44839506:05
SpamapShrm..  running tests in my container fails for weird reasons06:11
SpamapSRan 51 (-6) tests in 179.562s (+39.656s)06:11
SpamapSFAILED (id=10, failures=8 (+3))06:11
*** isaacb has joined #zuul06:26
*** isaacb has quit IRC07:13
*** isaacb has joined #zuul07:23
SpamapSok, docker was a huge mistake.08:05
* SpamapS deletes it forever08:05
* SpamapS got 6 timeouts even in single-threaded mode under docker.08:06
*** hashar has joined #zuul08:19
openstackgerritJoshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Remove url_pattern config parameter  https://review.openstack.org/44716511:08
openstackgerritJoshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Simplify the log url  https://review.openstack.org/43802811:08
*** hashar is now known as hasharLunch11:08
openstackgerritJoshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Remove url_pattern config parameter  https://review.openstack.org/44716511:46
openstackgerritJoshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Simplify the log url  https://review.openstack.org/43802811:46
Shrewsjhesketh: why do you think the nodepool_id feature might need to be removed?12:27
jheskethShrews: I'm not sure it will.. I suspect it's useful, but once we stop running the old nodepool it may not be necessary12:35
Shrewsjhesketh: the merge, as presented by gerrit, confuses me. Are we going to lose the current working code for test_leaked_node for the old broken version if we approve that?12:40
Shrewsi really hate to see skips added back, especially for tests that are working now  :(12:42
jheskethShrews: it's only tests on the v3 branch12:42
jheskethso another commit to turn them back on12:42
jheskethrather than fixing the test in the merge commit making it even longer12:42
Shrewsjhesketh: i understand the skips on the new tests that v3 didn't have. it's the existing test i'm concerned about. looks to me like the merge breaks it (and then skips it)12:43
Shrewstest_leaked_node_with_nodepool_id and test_leaked_node_not_deleted are new. that's fine to fix in a later review. but test_leaked_node works now12:44
jheskethShrews: okay, that's fair12:44
jheskethsomething changed in the merge to break them, but you're right, they should probably be fixed in the merge commit rather than later on12:45
jheskethI'll have to look tomorrow though because it's nearly midnight and this wine is nice12:46
Shrewsjhesketh: mmm, wine. enjoy!12:46
mordredyah - I actually agree more that it may need to be removed - we don't need nodepool_id in the zk version because we track ownership via zk, no?12:46
mordredthat was the hack for v2 to not delete v3 nodes12:46
Shrewsmordred: this entirely depends on what you do with your json change  :)12:47
jheskethplus I just got my esp8266 reading temperature correctly so I can continue writing a logger/server to monitor what might become a cellar ;-)12:47
Shrewsmordred: did you see the note i left on that review about breaking the world if you don't abandon it?12:47
Shrewsmordred: my latest comment here  https://review.openstack.org/#/c/297950/12:49
mordredShrews: oh - yeah - let's just abandon that for v212:50
Shrewshrm, i actually don't understand the purpose of test_leaked_node_not_deleted12:54
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Make sure services are running for test-setup.sh  https://review.openstack.org/44855512:54
pabelangermorning12:54
Shrewsoh, the nodepool_id test12:55
*** hasharLunch is now known as hashar13:05
jheskethnight all!13:12
tobiash_mordred: did some short testing about io footprint optimization13:53
tobiash_I tested with a ccached big c++ project13:54
mordredtobiash_: how did it go?13:54
tobiash_a combination of increasing commit interval of ext4 (> build duration) and a set of vm.dirty_* settings saved me about 2.6GB of writes13:55
mordrednice! the vm.dirty_ settings have to be set on the host, right?13:55
mordred(rather than the guest)13:56
tobiash_mordred: I just do this at the beginning of the build job: http://paste.openstack.org/show/603765/13:56
mordredtobiash_: oh neat!13:57
tobiash_without this the dirty pages graph (which contains unwritten data) goes up and down13:57
tobiash_with this it more or less goes monotonic up (if there's enough ram) inhibiting most writes13:58
tobiash_if this proves to give good results in a wider range, these settings could just be baked into the dib image nodepool builds13:59
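The paste above is not preserved, but from the description the job-setup snippet would look roughly like this; the specific values are illustrative assumptions, not tobiash_'s actual numbers:

    # remount the workspace ext4 with a commit interval longer than the build
    sudo mount -o remount,commit=600 /
    # let dirty pages accumulate in RAM rather than being flushed early
    sudo sysctl -w vm.dirty_ratio=80
    sudo sysctl -w vm.dirty_background_ratio=50
    sudo sysctl -w vm.dirty_expire_centisecs=60000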
*** isaacb has quit IRC14:00
mordredyah - I'm going to test to see its effects on the openstack side real quick14:02
tobiash_depending on what you do in the job this might need to be combined with eatmydata14:04
mordredtobiash_: pushed up a quick test: https://review.openstack.org/44859114:05
tobiash_the build time itself (did not test many parallel jobs at once) was unaffected as the writes typically are done asynchronously14:05
mordredtobiash_: in a few of our clouds we're our own noisy-neighbor - so even if the only effect is reducing load on the underlying cloud, it still might be a win for us14:06
mordredverifying that will be a little bit of work of course :)14:06
tobiash_yepp, that was my initial idea to reduce noisy-neighbor behaviour14:07
*** isaacb has joined #zuul14:11
tobiash_one possible side effect of this is that if unwritten data is so much that it is forced to write then so much data could be written at once that dmesg logs a warning like this:14:11
tobiash_INFO: task jbd2/sdb1-8:612 blocked for more than 120 seconds.14:11
tobiash_could happen in a situation where 15gb are unwritten, 2gb are free and a process wants 8gb; then 6gb would need to be synced to disk at once14:13
pabelangerhttp://paste.openstack.org/show/603769/14:17
pabelangermordred: managed to get dox working^14:17
pabelangerhttps://review.openstack.org/#/c/448555/ was the only patch to zuul needed14:18
Shrewspabelanger: dox??? oy14:27
Shrewsi'm going to have to re-learn that code now, aren't i?  :)14:28
pabelangerShrews: :)14:29
pabelangerit's actually not that bad14:29
pabelangerobviously we need some new images14:29
pabelangerbut, once I got my dox.yaml file setup14:29
pabelangerthings worked as expected14:29
*** bhavik1 has joined #zuul15:13
*** isaacb has quit IRC15:14
jeblairclarkb: should we abandon https://review.openstack.org/436544 now?15:51
clarkbya I can abandon it15:53
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests  https://review.openstack.org/44831415:57
jeblairSpamapS: i can't keep up with your on-again / off-again relationship with docker :)15:58
SpamapSjeblair: I know15:58
SpamapSit's awful15:58
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul master: Add Dockerfile for running tests  https://review.openstack.org/44831415:59
SpamapSjeblair: that's just me dumping my local changes and context switching back into not-docker15:59
SpamapSThe reality is.. it was kind of fun to work with16:00
SpamapSbut it can't even run our test suite fast enough to avoid the hard timeouts.16:00
jeblair:(16:00
jeblairSpamapS: that seems strange to me based on what i think i know about containers, but i definitely don't want to fall into that rabbit hole i see you're in, so i'm just going to look away :)16:01
SpamapSjeblair: I'm pretty sure it's the overlay filesystem16:01
SpamapSsystem CPU usage was _very_ high16:01
jeblairah16:01
SpamapScould dink around with btrfs or lvm16:01
SpamapSbut...16:02
SpamapSat some point16:02
jeblairSpamapS: and the tmpfsing didn't help?16:02
SpamapSrunning the tests on a VM works fine16:02
SpamapSjeblair: it did a little.16:02
pabelangerSpamapS: jeblair: do you mind reviewing 448042? that is our first step to green jobs again for zuulv3-dev16:02
SpamapSbut even with 1 thread I still got 6 alarm clocks16:03
SpamapSpabelanger: will do that shortly16:03
jeblairpabelanger: we haven't landed the workspace var yet have we?16:04
*** bhavik1 has quit IRC16:04
pabelangerjeblair: not yet, I restacked it on 44804216:04
jeblaircool16:04
pabelangerthat stack is now green too16:04
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove irrelevant test_merging_queues  https://review.openstack.org/44676816:04
*** bhavik1 has joined #zuul16:04
*** harlowja has joined #zuul16:06
*** bhavik1 has quit IRC16:11
jeblairpabelanger: +2s down the stack until a -1 at the end.  i'll leave the +3 for you after SpamapS weighs in.  i dunno if you want to ping clarkb on them too.16:14
pabelangerjeblair: sure, more people the better!16:14
clarkbwhich stack?16:15
jeblairclarkb: starts at https://review.openstack.org/44804216:15
jeblairpabelanger: can you elaborate on the socket stuff you added in 446683?16:18
jeblairpabelanger: oh, is it because we used to have something equivalent to wait until a host was up and serving ssh (while we tried to ssh and run ready script), so you want to get similar fine-grained failure messages?16:19
pabelangerjeblair: right, with SSHClient, it would raise a socket exception, however Transport doesn't.16:20
pabelangerSo, it would result in spamming the logs: http://logs.openstack.org/83/446683/5/check/gate-dsvm-nodepool/7c4e0b6/logs/screen-nodepool.txt.gz16:20
pabelangerand had no easy way to trap the exception16:20
pabelangerhowever, open to suggestions on making it better16:21
SpamapSpabelanger: reviewed 448042.. heading down stack16:22
* SpamapS may still be a little bit decaffeinated and thus carrying a +1 cranky buff16:24
jeblairpabelanger: i left a -0 on 446683; can you take a look at that; and i'd like clarkb and Shrews to take a(nother) look.16:25
pabelangerjeblair: sure16:25
SpamapSew16:39
SpamapS/tmp/console.log should be /run/console.log, FYI16:39
SpamapS(predictable filenames in /tmp are basically always a terrible idea)16:39
SpamapSand really /run/zuul-prepared-workspace/console.log is better (just reading roles/prepare-workspace)16:40
jeblairSpamapS: what's the attack vector/endgame there exactly?16:40
jeblairSpamapS: console.log could end up being rather larger than one might want on a tmpfs.16:41
jeblair(i mean, if a job wanted to replace the console log, it could do regardless of where it's stored; it will have rights to do that)16:43
SpamapSjeblair: it's super low risk, I know, but I prefer to viciously eliminate all use of /tmp/staticanything than try to reason about every attack vector surrounding symlinks as predictable files in /tmp16:46
SpamapSdamnit, my enter is beating my letters too often16:47
SpamapSWith throwaway nodes, I know we don't have to worry16:47
SpamapSBut I don't like putting bad practice into code that others will consume and possibly cargo cult without knowing.16:47
jeblairSpamapS: okay, but in this case, we're talking about storing potentially huge files in a tmpfs, and expanding zuul's footprint on the node.16:48
jeblairSpamapS: i don't accept it's bad practice.  :)16:48
SpamapSjeblair: /tmp/${randstring} then, and randstring=$(cat /run/mydir/wheresmystring)16:49
SpamapSalso, /var/tmp is for big files16:49
SpamapS /tmp is not16:49
jeblairSpamapS: no i mean i understand the issue16:49
jlkSpamapS: yeah I'm not sure what happens on OSX any more. It's no longer "docker machine", it's a native thing?16:50
jlkor it's a very very well hidden vm16:50
SpamapSjlk: oh? wild16:50
jeblairSpamapS: what i'm saying is that this is something we have reasoned about at length, and come to a conclusion.  i don't like to blindly follow conventional wisdom.  i think this is something to think about carefully.16:50
jlkah16:51
jlk"The Docker engine is running in an Alpine Linux distribution on top of an xhyve Virtual Machine on Mac OS X"16:51
SpamapSjeblair: fair enough. I have not taken the time to reason about it because I've accepted that it's always a bad idea and have not been proven wrong, nor have I attempted to re-evaluate that position. I'm entirely willing to ignore this case in the face of those who have taken time to think hard about it.16:52
SpamapSjlk: yeah, well hidden is right!16:52
clarkbis xhyve a port of bhyve to os x?16:52
SpamapSprobably16:52
jlkYes16:52
jlkhttps://github.com/mist64/xhyve16:53
SpamapSa google search for 'predictable filename tmp' reveals a pretty awful list of CVE's though16:53
SpamapSso I'd have to really want to have my mind changed16:53
* SpamapS is fully submerged in the confirmation bias now16:53
SpamapScrap I have a meeting in 7 minutes and 8 minutes of prep16:53
* SpamapS de-ircs for a moment16:54
jeblairSpamapS: i'm happy to talk about it, or alternatives, if we can fast-forward past the part where we assume the author (o/) is not aware of the issues and hasn't thought of it.  :)16:54
SpamapSjeblair: No assumption is being made about the author. Only questions from a stubborn old greybeard. ;)16:54
jeblairSpamapS: okay.  happy to continue when we have some more time (it will take a bit).16:55
Shrewsjeblair: pabelanger: i think we can go ahead and move forward with https://review.openstack.org/447108 and https://review.openstack.org/447109 today, if you two agree17:06
pabelangersure17:07
jeblair++17:08
Shrewsgreat17:09
clarkbok I think I have gotten past setuptools is broken and now need caffeine and breakfast then will review pabelanger's stack17:19
mordredclarkb: oh good - I love it when setuptools breaks17:19
*** hashar has quit IRC17:29
Shrewsjeblair: rbergeron: just sent you two an email regarding doc things17:29
Shrewsenjoy at your leisure17:29
* Shrews decides afternoon coffee is a good idea at this point17:30
pabelanger2017-03-22 17:41:44,015 INFO nodepool.NodePool: Starting ProviderWorker.infracloud-vanilla17:41
pabelangerShrews: ^17:41
Shrewsyeah. and first bug found17:42
Shrewsmin-ready nodes not being started for a new provider17:43
pabelangerya, noticing that17:43
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove ZooKeeperConnectionConfig class  https://review.openstack.org/44768317:44
Shrewsoh, no. that's actually correct17:45
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix hostname issue with nodepool integration job  https://review.openstack.org/44823917:45
Shrewsit's per label, not provider17:45
Shrewsso yay17:46
jeblaircool, so after some use, we should see some min-readies start to pop up17:46
Shrewsi'm going to delete a node, just to see if vanilla catches it17:47
jlkSpamapS: where did you mount the tmpfs?17:47
Shrewschocolate is greedy  :(17:49
* Shrews waits for another jlk or jeblair patch bomb17:49
jlkoh no17:49
jeblairShrews: oh, i do have a stack that needs revising...17:50
Shrews\o/17:50
jlkjeblair: You had mentioned something about using tmpfs to speed up tox, where did you make the tmpfs? in .tox/ ?17:53
jeblairexport ZUUL_TEST_ROOT=/tmpfs17:53
jeblairjlk: i do that ^17:53
jeblairjlk: so you can mount one anywhere, then tell zuul unit tests to use it that way17:54
jlkI see, and that tells, tox to dump stuff there?17:54
jeblairjlk: it's internal to zuul's tests.  so whenever zuul creates a tmpdir (like ALL THE TIME) it makes one there17:55
jlkI see17:55
jeblairjlk: i, er, probably could have just used TMPDIR env variable, but i don't think i realized python tmpdir respected that at the time.17:55
jeblairit's old.17:55
jeblairfor that matter, i reckon that would probably just transparently work too; i haven't tried it.  :)17:56
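Putting the pieces together, the setup described here is roughly as follows (mount point and size are arbitrary choices):

    $ sudo mkdir -p /tmpfs
    $ sudo mount -t tmpfs -o size=2g tmpfs /tmpfs
    $ export ZUUL_TEST_ROOT=/tmpfs
    $ tox -e py27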
clarkbShrews: in v2 allocations were proportional to total provider quota. But I think now it's whoever can respond to a request first ya?18:02
clarkbat least at zero usage18:02
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add 'allow-secrets' pipeline attribute  https://review.openstack.org/44713818:02
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Isolate encryption-related methods  https://review.openstack.org/44708718:02
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Augment references of pkcs1 with oaep  https://review.openstack.org/44708818:02
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add support for job allowed-projects  https://review.openstack.org/44713418:02
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Serve public keys through webapp  https://review.openstack.org/44675618:02
jlkhrm, 4 minutes in docker to run pep818:03
Shrewsclarkb: yeah. though each request is for a single node, so that gives each provider at least an opportunity to satisfy it18:03
pabelangerthat did something for infracloud-vanilla18:03
Shrewscool, i see vanilla nodes18:03
jeblairand providers should respond more slowly as they get busier; if that's not enough, we can borrow a gearman trick from zuul v2.5 and start adding proportional sleeps to the algorithm.18:04
jlkLOL vs 32 seconds on the VM. WTF.18:04
clarkbI approved the bottom change of pabelanger's stack (please let me know now if there are still more reviewers interested in it but looks like it got a lot of review)18:05
pabelangerclarkb: great. I think you are the last reviewer atm. So, you should be safe to go up to 44161718:06
clarkbthats what I thought, thanks18:06
pabelangerShrews: but, if we had a nodepool-launcher, per provider, each would have 2 min-ready nodes, right?18:07
pabelangeror still 2 across launchers18:07
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add 'allow-secrets' pipeline attribute  https://review.openstack.org/44713818:07
Shrewspabelanger: no. it would still see 2 ready 'label' nodes18:08
pabelangerk18:08
Shrewspabelanger: assuming the labels in each config were identically named, that is18:09
pabelangerright18:09
clarkbpabelanger: et al one thing I notice reading http://zuulv3-dev.openstack.org/logs/1ce3b8446a594d8c8f07092786aba219/console.log to review that change is we don't annotate the console log with what script is being run18:11
clarkbnot sure if we want to have the logger handle that globally or just modify our scripts to echo some info about themselves as a form of header or what, but it would make following console log flow easier I think18:11
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Create extra-test-setup role  https://review.openstack.org/44804218:13
pabelangerclarkb: Ya, so that is an interesting problem now.  Because there isn't any stdout when ansible runs the script, which means zuul_stream cannot append it to console.log.  I'm not sure how best to fix that18:14
jeblairmordred: ^18:14
pabelangerand agree, it will be confusing for people that just look at console.log18:14
jeblairi know that if we *streamed* the log, we would see what was actually being run18:14
jeblairthe solution might be to copy something produced by the callback plugin rather than the console log on the host?18:15
clarkbanother log observation is we don't seem to capture where the job ran?18:16
pabelangerya, we need to add that still18:16
pabelangeron my list18:16
clarkbin one way that's nice this was an "ubuntu-xenial" host and details are abstracted. On the other the two py27 jobs took vastly different amounts of time to run and wondering if that's related to cloud/region or the changes themselves etc18:17
clarkbpabelanger: for that I think a host info role like our net info macro in jjb probably makes sense18:17
clarkbjust echo the hostname and some basic host networking information18:17
pabelanger++18:18
pabelangergoing to do something like: http://logs.openstack.org/17/441617/10/check/gate-zuul-pep8-ubuntu-xenial/621bbeb/_zuul_ansible/pre_playbook18:18
pabelangerand use zuul_stream18:18
pabelangerI'll do that now18:18
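A minimal sketch of what such a host-info role's tasks could look like (task name and commands invented here; the real change is what pabelanger pushes as 441617 below):

    - name: Echo basic host information into the console log
      shell: |
        hostname
        ip addr show
        cat /etc/resolv.conf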
clarkbwhat is zuul_stream?18:20
pabelangerour process that runs on the worker to add things into console.log18:20
pabelangermordred: created it as an ansible task18:21
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Create zuul_workspace_root job variable  https://review.openstack.org/44144118:21
clarkbI remember reviewing those changes, but there were many and the beginning state was completely different than the end state so I'll admit to not really having it all sorted out18:21
pabelangeractually, I think it changed a little from zuulv2.518:21
clarkbya its different18:21
pabelangerso maybe just an echo like you said18:22
Shrewsn-l handled that workload well with the additional provider. chalking that up as a success.18:22
clarkbiirc we fork a process on the nodepool node and every playbook that runs there writes to a socket?18:22
clarkband the forked process is on the other end of that socket reading the data as each playbook runs and it writes it to console log? so I think you just have to echo ya18:22
clarkbrather than use a special annotation18:23
clarkbpabelanger: or does it run on the zuul launcher itself? I think its the nodepool node18:23
jeblairzuul_stream is the callback plugin which runs on the executor.18:24
jeblairwhen i suggested that instead of saving the console log, we should save the output of 'the callback plugin' that's what i was referring to18:24
clarkbjeblair: so there is a forked process for every job on the executor with an open socket reading the writes from the job itself?18:24
mordredjeblair: reading18:25
clarkbthe callback plugin is running in the context of ansible execution on the executor, but wondering where the forked process that reads from that is18:25
clarkb(thats the bit I am currently confused about, though its not super important here)18:26
jeblairclarkb: the callback plugin forks a process to read the stdout over TCP from each *node*18:26
clarkbgotcha thanks18:27
jeblair(so we don't fork on every ansible 'command' execution -- just on the first ansible 'command' execution for a given node)18:27
clarkbya18:27
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add revoke-sudo role and update tox jobs  https://review.openstack.org/44146718:27
pabelanger^ was the task I meant that will not show up in console.log18:27
mordredjeblair: yes - I believe that is actually the right way forward - I have not done that yet because I think we'll wind up doing that as a matter of course when we plumb streaming all the way through18:28
mordredjeblair: that is - the thing we wind up writing to console.log should be the same thing as what we do from our streaming18:28
clarkbpabelanger: https://review.openstack.org/#/c/441617/10/playbooks/roles/prepare-workspace/tasks/main.yaml is roughyl where I would collect the data18:29
clarkbpabelanger: have a dump of it all in one place, hostname, networking, etc18:29
jeblairmordred: where does the output of zuul_stream go right now?18:29
clarkb(but I agree that a host-info role is appropriate rather than part of workspace info)18:30
pabelanger++18:30
mordredjeblair: hrm - actually, that should be what's writing console.log18:30
mordredjeblair: so, ignore me - this should already be the case18:31
jeblairmordred: zuul_log writes /tmp/console.log on the worker node18:32
mordredhttp://zuulv3-dev.openstack.org/logs/1ce3b8446a594d8c8f07092786aba219/ansible_log.txt18:32
jeblairmordred: zuul_stream runs on the executor and reads /tmp/console.log and interpolates it with normal ansible output18:32
mordredyes. normal ansible output goes to ansible_log.txt on the executor18:32
jeblairmordred: okay, so zuul_stream should be going to that, yeah18:32
jeblairmordred: i don't see the console output there?18:32
pabelangerso, in the case of: https://review.openstack.org/#/c/441467/20/playbooks/roles/revoke-sudo/tasks/main.yaml I think we should have added a shell: echo "Remove sudo access for zuul user.", so console.log would have seen it18:33
mordred2017-03-21 20:49:31,202 p=6495 u=zuul |   [WARNING]: Failure using method (v2_playbook_on_task_start) in callback plugin18:33
mordred(<ansible.plugins.callback.zuul_stream.CallbackModule object at18:33
mordred0x7f1b8d8f0550>): all not found in hostvars18:33
mordredwell- there's at least one issue in there-although I think jamielennox has a patch up to deal with that18:33
pabelangerya, we haven't restarted zuul yet18:33
jeblairmordred: yeah, that was I9274a2098348b736198e5fea344f078ee0404b41 which merged18:34
mordredcool18:34
mordredthere is likely a bug then - will start staring at code18:34
jeblaircool18:34
mordredit SHOULD be in there - that said, that ansible_log.txt file is ugly - so we may also want to do additional things18:35
jeblairmordred, clarkb, pabelanger: so aiui, we should stop copying /tmp/console.log in our post-playbook, and instead, copy the ansible_log.txt file as its replacement.18:35
jeblairmordred: yeah, then i think we should look at maybe doing something sane with the rsync output.  that's the biggest unreadable mess.18:36
mordredyes18:36
mordredto both things18:36
jeblairSpamapS: that actually removes one of the main drivers for having /tmp/console.log be a known location.  after we do that, i think we can feel free to make it a regular anonymous tmpfile.  :)18:36
mordredwe actually should be able to just stop copying anything in the post-playbook - ansible writes it directly into the workspace already18:36
SpamapSjeblair: neat. :-D18:37
jeblairmordred: er yeah, that's a good point.  i mean, we did just link directly to the file.  :)18:37
mordredjeblair: so - 2 things to sort - a) make that rsync output less ugly18:37
SpamapSjeblair: I've since had the requisite two cups of coffee, so my beard is a bit less grey and I'm far less cranky. :)18:37
mordredjeblair: b) figure out why the console output isn't showing up in there in the first place18:37
mordredSpamapS: you only require 2 ?18:37
jeblairSpamapS: storing coffee in your beard for later? :)18:38
clarkbre rsync maybe just summarize "x bytes transfered into y files" ?18:38
mordredjeblair: I think large pile of # 127.0.0.1:22 SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.1 lines are super helpful too :)18:38
pabelangerOh, I figured that out18:38
pabelangerit was ssh-keyscan18:38
jeblairmordred: oh that's actual stdout from the test18:38
mordredjeblair: yay!18:38
jeblairmordred: so -- yes, we should get rid of it because it's annoying stuff in our unit tests, but from a zuul arch point of view, that is correct output from the job which should be included.  :)18:39
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Organize playbooks folder  https://review.openstack.org/44154718:39
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Create tox-tarball job  https://review.openstack.org/44160918:39
mordredwe could just plop a nolog onto our rsync of the git repos18:39
jeblairmordred: and as pabelanger says, it's in the process of being removed.18:39
jeblairmordred: cool, that sounds nice and easy18:40
mordredyah18:40
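The knob in question is ansible's per-task no_log; a schematic sketch (module arguments and variable names invented, not the actual change mordred pushes below as 448750):

    - name: Copy the prepared git repos to the node
      synchronize:
        src: "{{ item }}"
        dest: "{{ zuul_workspace_root }}"
      with_items: "{{ repos }}"
      no_log: true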
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Silence command warnings  https://review.openstack.org/44874818:40
mordredspeaking of ^^ that's a meaningless change but just happened to notice when I was looking at something else - I'm also happy if people don't think it's a good idea18:41
jeblairoh i wish i knew that for v2.5 :)18:41
clarkbis the all not found in hostvars the thing you were saying was fixed?18:41
clarkband if so, next question is why did that not make the job fail?18:42
jeblairclarkb: i think it's just in the callback plugin which isn't critical?18:42
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Stop logging git repo rsync output  https://review.openstack.org/44875018:42
mordredyes. the callback plugin is "just for logging"18:42
clarkbI'm conflicted by that statement :)18:43
clarkbif logging doesn't work then I have no way of knowing a success was legit18:43
clarkbI think logging not working should be a failure?18:43
mordredwhen we get zuul to start interpreting the output of running stuff, we should likely figure out a way to trap for that and fail hard18:43
mordredclarkb: yes. I agree18:43
mordredthat's why I put it in scare-quotes18:43
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop copying console.log  https://review.openstack.org/44875218:45
jeblairThat's just a simple remove but i put in a todo about the tmpfile thing18:45
SpamapSmordred: yes, 2 cups == human sauce ... more and I become a _pleasant_ human18:46
* SpamapS afk's18:46
jeblairbut i'm going to WIP that because we shouldn't land it until mordred fixes the callback plugin18:46
mordredbefore we investigate TOO much further, it might be nice to restart with jamielennox's change applied18:47
jeblairoh! restart!18:47
jeblairi forgot about that18:47
jeblairwe so rarely have to now :)18:47
mordredthe original testing was that stuff got written to ansible_log.txt18:47
mordredjeblair: I know! it's exciting18:47
pabelangerokay, a little confused about 44875218:49
pabelangerso, what is the console.log used for?18:49
pabelangerjust live streaming I guess18:49
jlkSpamapS: alright I think I'm going to give up on Docker Toxxer  too, at least on OSX.18:51
Shrewsjlk: you should give up on OSX18:52
Shrewsi just can't use it for real development anymore18:52
jlkheh, I really don't feel like going through the pain of re-imaging this laptop18:53
jeblairmordred: restarted18:53
Shrewsjlk: buy a new laptop! :)18:53
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop copying console.log  https://review.openstack.org/44875218:53
jlkShrews: you mean make IBM buy me a new laptop18:53
Shrewsyes. that.18:53
jlkI haven't actually owned my own laptop since... 2002~18:54
mordredpabelanger: yes - console.log on the remote host is where things write content so that the log streamer can stream it18:55
mordredpabelanger: we do things in the command and shell modules to cause the stdout/stderr to go there instead of being returned in the ansible return structure18:56
mordredpabelanger: then in the zuul_stream callback plugin on the executor we open a socket connection to the daemon on the remote node and read the stdout/stderr from it and inject it into the output18:56
mordredpabelanger: it's a bit of a strange dance - it's probably worth a diagram at some point18:57
pabelangerdo we think people will not be confused between the 2 outputs? stream and archived logging?18:57
Shrewsthat screams "diagram please"18:57
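In lieu of that diagram, a rough sketch of the dance as just described:

    executor                                worker node
    --------                                -----------
    ansible-playbook                        command/shell modules append
      + zuul_stream callback                stdout/stderr to /tmp/console.log
            |                                         ^
            |  TCP connection                         |
            +------------------------->  log streaming daemon reads the file
            |                             and serves it over the socket
            v
    interleaved into normal ansible output (ansible_log.txt on the executor)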
clarkbjlk: double check you have vmx?18:57
clarkbif speed is the only problem, looks like bhyve doesn't actually require vmx for single cpu VMs so you might be getting slow emulation18:58
jlkclarkb: it's a known issue with OSX and mounted host volumes18:58
jlkit's just super slow. There are some hacky go-arounds, like using unison to sync files into the container rather than do a volume mount. That made it much faster, but tests that should pass are just failing18:58
jeblairpabelanger: let's see if we can get a good example and see if it's confusing.18:58
*** harlowja has quit IRC18:59
pabelangeragree! I think diagram will help too18:59
jlkI suppose it could be me testing on Fedora rather than Ubuntu18:59
jeblairpabelanger: well, a diagram will help us understand how it works.  a diagram must not be necessary to help a user understand the output.18:59
*** harlowja has joined #zuul19:00
*** harlowja has quit IRC19:01
jeblairpabelanger, mordred: http://zuulv3-dev.openstack.org/logs/07206b6a6d514c40b4852254e2993f8a/ansible_log.txt is much better19:08
jeblairthough it seems to be missing the tox output?19:09
Shrewsjeblair: i wonder if we should add retry logic to our delete* zk api methods? that kazoo recursive delete() issue keeps popping up: http://logs.openstack.org/95/448395/1/check/nodepool-coverage-ubuntu-xenial/86a50ab/console.html#_2017-03-22_06_07_58_05259519:11
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role  https://review.openstack.org/44161719:11
jeblairShrews: the shared locks would let us fix that, right?19:12
Shrewsjeblair: yeah, it would19:12
Shrewsassuming it ever gets merged :)19:13
jeblairShrews: maybe as a temporary measure, so we can land changes?  :)19:13
jeblairShrews: oh, it'll get merged somewhere19:13
Shrewsjeblair: let's hold off a bit then. it doesn't bite us too terribly often. if it gets worse, then yeah19:13
jeblairpabelanger, mordred: it looks like maybe we only got the console log for the first task?19:14
jeblairpabelanger, mordred: oh!  it's because we have multiple playbooks19:17
jeblairpabelanger, mordred: the zuul_stream log reading subprocess terminates at the end of a playbook, but the daemon_stamp file still exists, so the next zuul_stream callback (for the next playbook) does not launch a new subprocess.19:17
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role  https://review.openstack.org/44161719:18
jeblairwe either need to make the stamp file specific to a playbook (where does it get written anyway?), or clean it up properly.19:18
mordredjeblair: OH!19:19
clarkbcould we avoid the stamp file entirely and have ansible parent process kill the child using atexit?19:19
jeblairclarkb: the stamp is so that two tasks within the same playbook don't each start threads19:20
jeblairer, processes19:20
jeblair(heh, if it were threads this would be easy)19:20
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role  https://review.openstack.org/44161719:21
clarkbright instead of using a file though ust do process management?19:21
clarkbeg "do I have a child log worker process if no then create one" then later when process dies kill its child so it doesn't zombie19:21
jeblairclarkb: how do we prevent a second process from running for a second task?19:21
pabelangerreading backscroll19:22
jeblairclarkb: oh i think i see what you're saying -- keep track of subprocesses in memory in the callback.  i'm going to turn this over to mordred because i'm fuzzy on the ansible internal details here.  :)19:23
clarkbjeblair: ya either that or query the operating system for members of the process group and filter19:23
*** bhavik1 has joined #zuul19:24
jeblairclarkb, mordred: i will say that part of the reason we don't want to open a new tcp connection for each task is that we get the whole stream again each time.  so actually, we probably don't want to just naively make the callback plugin do another subprocess for a second playbook -- we'll get a copy of everything that has come before.19:26
* mordred is trying to remember if there was a reason we didn't do it that way originally19:26
clarkbjeblair: oh interesting19:26
mordredyah19:26
clarkbjeblair: even though they are bound by different invocations of ansible-playbook?19:26
mordredit's because on the remote host we don't have any idea that the first ansible-playbook stopped19:27
jeblairmordred: maybe we thought that this would span playbooks and that's why we made a stamp file, but the error is that we weren't expecting the subprocess to die at the end of the playbook?19:27
mordredyes. I think we made that logic error19:27
jeblair(i'm assuming it does die, but i'm only inferring that from log output)19:27
mordredwhich is pretty obvious now in hindsight19:28
mordredI mean - we launch a new subprocess for each playbook19:28
jeblairmordred: yes, though we set p.daemon, and without looking that up in the manual -- i might make assumptions as to what it does.  :)19:28
mordredyah.19:28
jeblairto be clear, i'm going to modify and repeat an earlier thing i typed:19:29
jeblairmordred: maybe we thought that this would span playbooks and that's why we made a stamp file, but the error is that we weren't expecting the *log streaming* subprocess to die at the end of the playbook?19:29
jeblair(just in case that was ambiguous as to which 'subprocess' i meant there)19:29
*** bhavik1 has quit IRC19:29
jeblair"When a process exits, it attempts to terminate all of its daemonic child processes."19:30
mordredjeblair: yes - I think that may be the case19:30
jeblairso, yeah, according to docs, it seems it's expected for our ("daemon") log-streaming subprocess to die19:30
jeblair(this is via the multiprocessing module)19:31
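A minimal standalone demonstration of that documented behavior (not zuul code):

    import multiprocessing
    import time

    def stream_logs():
        # stand-in for the log-reading loop in the callback plugin
        while True:
            time.sleep(1)

    if __name__ == '__main__':
        p = multiprocessing.Process(target=stream_logs)
        p.daemon = True  # daemonic children are killed when the parent exits
        p.start()
        time.sleep(2)  # the parent does some work...
        # on return from main, the daemonic child is terminated with the parent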
jeblairi have to forage for food now19:31
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role  https://review.openstack.org/44161719:32
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role  https://review.openstack.org/44161719:33
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info role  https://review.openstack.org/44161719:35
pabelangerk, sorry for noise19:35
pabelangerthe only piece we might want, is info about which zuul-executor was used19:35
pabelangerbut, we don't log that currently19:36
*** hashar has joined #zuul19:40
mordredjeblair: ugh. so it's really caused by the fact that we used multiprocessing - which we used because trying to get a multiprocessing subprocess from ansible to spawn a daemon process the way you did in the log streamer originally died in a fire19:40
mordredjeblair: ping me when you get back from food19:41
*** harlowja has joined #zuul19:45
jlkpabelanger: when using dox, did you just pip install it, or install from git?19:56
pabelangerjlk: git19:59
pabelangerI also had to rebuild the image from docker19:59
pabelangerI was going to push up some reviews tonight19:59
jlkokay. I wanted this to work on Fedora, but it might not. :(20:00
pabelangerI ran dox from fedora-2520:00
jlkI don't know if that's the issue.20:00
pabelangerbut used xenial images20:00
jlkI meant the image _in_ docker20:00
pabelangercontainers*20:00
pabelangerya, I haven't tried fedora yet20:00
jlklike I wanted tox to pass on fedora-2520:00
jlkI don't know if that's my problem locally.20:00
pabelangerYa, I've been testing on fedora-25 too locally, tox does work20:02
pabelangerI'll try fedora-25 dox later tonight too20:03
jlkokay, well then it's probably just that docker on osx is too unstable for the py27 tox target. pep8 worked, and I got it reasonably fast, but py27 falls over for unknown reasons20:07
mordredjlk: you saw that SpamapS had timeouts with his docker too - his hunch was aufs20:12
jlkhrm20:14
jlkCOFFEEEEEEE20:15
mordredyah man20:15
mordredI'm doing that right now20:15
*** harlowja has quit IRC20:18
pabelangerneed some coffee too... or I should just nap20:19
*** hashar has quit IRC20:23
*** hashar has joined #zuul20:23
jlkokay, maybe I can think clearly now.20:34
mordredjlk: just to be sure - have another mug20:39
jeblairmordred: i am sufficiently burritoed.20:42
mordredjeblair: woot!20:42
mordredjeblair: so - I have a call with cdub in 18 minutes20:42
mordredjeblair: but - I have been thinking about our issue ... and have 2 ideas20:44
mordredjeblair: one is to make the streaming interaction with the worker richer - like, rather than telnet/netcat, have it be more complex, maybe with a playbook id that can get passed in so that playbooks can request to start at a point in time or something20:45
mordredjeblair: which seems like a lot of work and probably more and more fragile - but ultimately is still doable20:45
mordredjeblair: but the thing I like, which is even more hacky but likely to maybe/probably be simpler/more reslient20:46
mordredis to spin up a "logging" thread/subprocess on the zuul-side which runs a playbook that basically does the zuul_log module on all of the hosts in the inventory, then just starts streaming the results back into the ansible_log.txt file like the daemon process does now20:47
mordredand have zuul kill that subprocess when it's done executing the job20:48
jeblairmordred: re the first thing -- that would mean having the zuul_log plugin thingy annotate the console.log with metadata, yeah?  or maybe having it write out different streaming output files for each playbook (then obviously being able to serve each of them)?20:48
mordredjeblair: yes - one of those two things20:48
jeblairmordred: i was thinking of the problem and had two thoughts as well --20:48
mordredneat!20:48
jeblairmordred: one was that part of the "start from beginning" thing was for humans, and we don't need that anymore on the worker node.  so we can maybe think about dropping that.  but we still have synchronization issues (like the worker starts recording data before the callback starts fetching it), so maybe we still need it.20:49
jeblairmordred: the second was almost exactly what your second idea was.  :)20:49
jeblairmordred: so that's two votes for that.  :)20:50
mordredjeblair: woot! maybe we should explore that one for now then20:50
jeblairmordred: sounds like a plan20:51
mordredwoot20:51
mordredI'll think about it in earnest after my next call20:51
*** harlowja has joined #zuul21:33
jlkAnybody seen anything like "Exception: Job project-gerrit in common-config is not permitted to shadow job project-gerrit in common-config" ?21:49
SpamapSnot I21:50
jlkokay cool, so this is definitely broken21:53
jeblairjlk: no; zuul emits that when two jobs with the same name appear in two different repos.  so it suggests zuul thinks that the first "common-config" is different than the second "common-config" repo in some way.21:57
jlkthat's... bizarre21:57
jeblairjlk: (maybe it appeared twice in the repo listing in the tenant?  i don't think we have any sanity checking around that yet)21:58
jlkhttp://paste.openstack.org/show/603828/ is the zuul.yaml21:59
jheskethMorning21:59
jlkoh so22:00
jlkmaybe22:00
jlkhttp://paste.openstack.org/show/603829/22:00
jlkthat's the tenant config22:00
jeblairyep22:00
SpamapSoh man22:01
SpamapSI broke subunit22:01
SpamapSLength too long: 1833540522:01
jlkjeblair: I took a guess on the tenant config, what's the right way to list multiple sources?22:01
jlkcan a tenant have more than one connection?22:02
jeblairjlk: i think in production that should work, but the test framework probably won't deal with that correctly -- they'll need different git repo names.  also, in general, we're going to have a devil of a time with identical repo names until we implement http://lists.openstack.org/pipermail/openstack-infra/2017-March/005208.html22:04
jeblairjlk: so yes, can have multiple connections, will just (temporarily) need distinct git repo names across them22:04
jlkI can see the 'repos to test' needing to be unique, but I think where it's falling over is that there needs to be different repos to grab configuration from22:05
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role  https://review.openstack.org/44161722:05
jeblairjlk: i agree, while it's possible/likely that the test framework will not do the correct thing with the underlying git repos if they have the same name, that probably isn't the actual issue you are hitting here (unless you have distinct content in those two identically named repos -- then zuul would probably read the same content twice because of that error)22:07
jeblairjlk: the actual error is that the model.Project object associated with each of those is the same, despite being from two different connections22:07
jlkthey aren't identically named repos, they're literally the same repo. I was approaching this from a "centrally configured service that works with multiple project locations"22:08
jlkso one configuration repo, handling projects that live in more than one connection location22:08
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role  https://review.openstack.org/44161722:08
jeblairjlk: but one of them is accessed via gerrit, and one is acccessed via github...22:08
jlkthe config repo? It's accessed by git.22:09
jlkunless I'm really misunderstanding something22:09
jlkI shouldn't have to host my zuul configuration in somebody's gerrit in order to connect to that gerrit22:09
jeblairjlk: that file tells zuul where to find all the repos it works with.  it's a list of each connection, and for each connection, a list of repos accessed via that connection.22:10
jlkhrm.22:10
jeblairjlk: so that config says "work with the common-config repo on the github server" and "work with the common-config repo on the gerrit server"22:11
jlkMy brain hasn't caught up with the whole "config in git" world.22:11
jlkI'm trying to replicate where all the config lives on the local filesystem22:11
jeblairjlk: the "git" driver will do that for you22:11
jlklike, I want _one_ place to put my zuul configuration, even if that configuration is used by multiple connections22:11
jeblairjlk: i think there's still a mismatch22:11
jeblairjlk: the zuul configuration isn't used by connections -- it's global.  it loads bits of the config from every git repo it knows about.22:12
jeblairjlk: regardless of which connection it uses to access each of those repos22:12
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role  https://review.openstack.org/44161722:13
jeblairjlk: so the part of the config you load from github can be used by projects in gerrit, and vice versa22:13
jeblair(it all goes in to one pot)22:13
jlkokay, okay...22:13
jlkso I should be able ot list ONE config source, and it may list multiple pipelines, which use multiple drivers22:14
jlkand I could use the "git" source in tenant config to deliver it by raw git, and not bother with trying to go through "github"22:14
jeblairjlk: the second thing yes.  the first thing i'm still having trouble with.  :)22:15
jeblairjlk: mostly because zuul won't know about any projects that aren't listed in the tenant config.22:16
jlker...22:17
jeblairjlk: http://paste.openstack.org/show/603830/ is valid22:17
jlkbut we don't list projects in the tenant config22:17
jeblairjlk: if that's what you're getting at22:17
jeblairjlk: (then foo and bar can both be enqueued into a pipeline defined in project-config)22:17
jlkI should re-state.22:17
jlk_I_ haven't been listing project repos in the tenant config, I've only been listing them in the zuul.yaml file within the config repo22:18
jlkI may have been doing this all wrong!22:19
jeblairjlk: true, we list 'repos' in the tenant config, and 'projects' in zuul.yaml.  however, it turns out that they need to refer to the same objects in memory anyway, and it's a little confusing, so in that email i'm proposing we start using the word 'project' in the tenant config as well.22:19
jlkso what I've had that seems to be passing tests, is22:19
SpamapSjeblair: when you say "replace NullChange with enqueueing Refs" .. just so I know we're on the same page.. what you meant was to enqueue a Ref that represents HEAD of the given project yes?22:20
jeblairjlk: really every project should show up in two places: once in the tenant config ("main.yaml") to tell zuul to go fetch it and read its .zuul.yaml file, and at least once in a zuul.yaml or .zuul.yaml file within one of those projects.   [we will probably need to make exceptions for foreign projects in the third-party ci case, but let's ignore that for this conversation]22:20
jlkhttp://paste.openstack.org/show/603831/22:20
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role  https://review.openstack.org/44161722:21
jeblairSpamapS: yes, or possibly the tip of a given branch.  i think that will make periodic jobs make much more sense (zuul says "test this ref.  don't worry about where it's from")22:21
SpamapSjeblair: great. Just checking before I fall down the "how do I ask Gerrit for that" hole22:21
jeblairSpamapS: might be able to ask a zuul merger too (that might make it slightly more driver independent)22:22
jeblairjlk: you should list org/one-job-project in main.yaml as well; at the very least, it won't be able to have a .zuul.yaml file if it's not listed there.22:23
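A hedged sketch of that two-places rule using the repo names from this conversation (tenant-config syntax as of this period, with the 'repos' naming still in place):

    # main.yaml (tenant config)
    - tenant:
        name: example-tenant
        source:
          gerrit:
            config-repos:
              - common-config
            project-repos:
              - org/one-job-project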
jeblairpabelanger: i've flipped to a -1 on https://review.openstack.org/447647 can you see my comment, please?22:24
jlkright, we aren't... testing that path yet.22:24
jlkit just seemed to "work" as far as unit tests are concerned.22:24
jeblairpabelanger: i think we need a working gate-zuul-nodepool job to see the failure i mentioned.22:24
SpamapSjeblair: Ah, ok. Hadn't thought of that. Currently changing zuul/gerrit/source.py to stop spewing NullChanges though, so right now I think it's ok.22:24
jlkbecause the entirety of the config was in the common-  repo22:24
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role  https://review.openstack.org/44161722:24
SpamapSour zuul/source/gerrit.py maybe?22:24
pabelangerjeblair: sure. in fact, it should be fixed now, I can check experimental22:25
jeblairjlk: yeah, i can see how that would work.22:26
jlkAHAHAHAHAHA22:26
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role  https://review.openstack.org/44161722:26
jlkoooops22:26
jlkjeblair: so yeah, I just tried to list the project repo in main.yaml22:26
jeblairjlk: we may even want to keep that working for the third-party-ci foreign-project case.  but generally speaking, we'd want to list all the projects there so that they can have .zuul.yaml files.22:26
jlkand ran into a lovely:   File "zuul/driver/github/githubsource.py", line 81, in getProjectBranches raise NotImplementedError()"22:27
jeblairjlk: that sounds useful!22:27
jlklooks like I clearly haven't added the capability to fetch zuul files from project repos to the github driver yet.22:27
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role  https://review.openstack.org/44161722:27
jeblairjlk: that's a new thing in v3.  it's "list all the branches of a project, so i can come right back and ask for a .zuul.yaml file on every one of them".22:27
jlkokay well, I'll go further down the path of not listing the projects there yet22:28
jlkbecause I removed it, and got the config to load (and the test fails later down the road, but that's a good start)22:28
jeblaircool22:28
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: [WIP] Add net-info role  https://review.openstack.org/44161722:29
jlkjeblair: thank you for the guidance!22:29
pabelangerOkay, neat!22:30
jeblairjlk: np!22:30
pabelangerI figured out the hostname of zuulv3-dev from ansible22:30
pabelangerhowever...22:30
pabelangerI think it is a security issue22:30
pabelangerhttps://review.openstack.org/#/c/441617/22:30
pabelangerpretty sure we don't want to allow that ^22:30
jlka lookup outside of the safe path?22:31
pabelangerspecifically the lookup for executor22:31
pabelangerya22:31
jlkisn't this ran trusted?22:31
jlkor is this the untrusted bits?22:31
pabelangeruntrusted22:31
jlkyeah, okay.22:31
pabelangerbasically, I think we need to disable lookups for untrusted stuff22:31
pabelangerwhich I think we can do in ansible.cfg?22:32
jeblairmordred: ^ fyi22:32
jlkfile lookups for sure22:32
jlkand probably template too22:32
pabelangerya22:32
jlksince that is also doing the same thing?22:32
pabelangerI didn't want to dig more into it22:32
pabelangerhttps://docs.ansible.com/ansible/intro_configuration.html#lookup-plugins22:33
pabelangerwe should be able to set that to None for untrusted22:33
jeblairif lookup is useful, we can also make a sanitized version of it like the other plugins22:34
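The settings being referenced are the plugin search paths in ansible.cfg, which is how zuul substitutes its sanitized plugins for the stock ones; paths here are illustrative:

    [defaults]
    lookup_plugins   = /var/lib/zuul/ansible/lookup
    action_plugins   = /var/lib/zuul/ansible/action
    callback_plugins = /var/lib/zuul/ansible/callback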
jlkoh dear there are a lot of lookups22:34
mordredoh wow.22:34
pabelangerThey could be, but lookups are limited to the side where ansible-playbook runs22:34
pabelangerso, would we want jobs looking up things on executors?22:34
jlkwell...22:34
jlkyou could lookup passwords from a password store22:35
jlkor DNS entries, or...22:35
pabelangerright22:35
jlkoh god, redis.22:35
mordredyeah - this is a whole new layer of fun22:35
jlkhahaha22:35
jlkthere's a pipe lookup22:35
jlkNOTHING CAN GO WRONG THERE22:35
jlklike that's just straight "run my code on your box please"22:35
pabelangerI did try to gather facts on localhost, which was blocked. So that is good22:35
mordredpabelanger: woot!22:36
jlkand env lookups. We don't have anything important in the env, do we?22:36
mordredmaybe we start with disabling lookups while we go through the list of lookups to sanitize them22:36
pabelangerjust the defaults setup by bash for zuul user22:36
jlkyeah, shut 'em down22:37
jlkwe'll turn any on that we absolutely need22:37
pabelangermordred: ++ good place to start22:37
pabelangerI mean, container things should help with this too right?22:37
jlkeh...22:37
jlkif we're going belt+suspenders22:37
mordredyah - we should belt/suspenders it in both places22:37
jlktrying to prevent local code execution22:38
pabelangerokay, I'm good with disabling of look ups :)22:38
jeblair++ belt and suspenders22:38
pabelangerso, I'd like to know the hostname of the executor, so I can log it.22:38
jeblairpabelanger: so once we have containers, when we run into something like this we should say "oops -- look, zuul let me do this dangerous ansible that the container caught; let's go fix zuul so that doesn't happen"  :)22:39
jlkDo we have a list somewhere of all the plugins we're rejecting?22:39
pabelangeraside from modifying zuul to add it to vars.yaml, any other suggestions?22:39
jlkor are we rejecting _all_ custom plugins?22:39
pabelangerjeblair: agree22:39
jeblairpabelanger: add it to zuul.executor in vars.yaml22:39
pabelangerk22:39
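i.e. the generated inventory vars.yaml would grow something like this (hostname invented):

    zuul:
      executor:
        hostname: ze01.example.org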
jeblairjlk: dirs and files in zuul/ansible/* is more or less the list22:40
jlkis that a white list or a blacklist?22:41
jeblairjlk: untrusted mode forces those to be the only callback + action (and soon + lookup) plugins available to ansible, and refuses to run if a playbook or role has a plugins dir22:41
jlkokay so we refuse any plugins/ dir22:42
jlkthat's good22:42
jeblairjlk: yep, and override the built-in ones22:42
SpamapSoh yay, I fixed enough stuff that subunit works again22:42
SpamapSFAILED (id=16, failures=26 (-8))22:42
jlkwe should add connections to that22:43
mordredjlk: whatchamean?22:43
jlkWe should narrow down what connection plugins are allowed22:44
mordred++22:44
jlkso that user couldn't influence a host in the inventory to have new facts, which would include ansible_connection22:44
jlkdo we have something trying to stop users from changing ansible_host ?22:45
jlkeither by set_fact or add_host ?22:46
mordredjlk: we do not - but we do prevent them from connecting to localhost22:48
jlkin what way?22:48
jlkI think I recall something checking if the connection is "local" or localhost22:49
jlkor 127.0.0.122:49
jeblairpabelanger, Shrews, rbergeron: i have started on the nodepool config structure update described at http://lists.openstack.org/pipermail/openstack-infra/2017-January/005018.html i hope to have it finished this week.22:52
mordredjlk: yah - that - we _do_ block add_host22:54
jlkwhat about set_fact?22:54
mordredjlk: we do not block set_fact for ansible_host - so we should probably do that (and go ahead and do ansible_connection too)22:54
openstackgerritJames E. Blair proposed openstack-infra/nodepool feature/zuulv3: WIP Update nodepool config syntax  https://review.openstack.org/44881422:54
jlkyeah I thought you had something in the connection plugin we force that prevents an attempt to hit localhost tho22:54
mordredjlk: in the "normal" action plugin22:55
mordredjlk: we do:22:55
mordred        if (self._play_context.connection == 'local'22:56
mordred                or self._play_context.remote_addr == 'localhost'22:56
mordred                or self._play_context.remote_addr.startswith('127.')22:56
mordred                or self._task.delegate_to == 'localhost'22:56
mordred                or (self._task.delegate_to22:56
mordred                    and self._task.delegate_to.startswith('127.'))):22:56
jlkgotcha22:56
jlkI wonder if play_context gets updated by the host in question22:56
mordredso we should definitely block the set_fact route22:56
jeblairpabelanger: for 448814 i have just started matching on provider.name.startswith('fake') for now (re the bug in 447647).  i'm not super happy about that but it keeps me moving.22:56
jlklike if the host context has the remote addr22:56
jlknod22:56
jlkjeblair: for v3, a pipeline can only have one source (driver) still, right?22:58
mordredjlk: I should probably do the old versions too - ansible_ssh_host and ansible_ssh_connection too - right? those are still 'valid' just deprecated?22:58
jlkcorrect22:58
jlkhttp://docs.ansible.com/ansible/intro_inventory.html#list-of-behavioral-inventory-parameters22:58
jlkwant to prevent setting any ansible_ssh_ stuff22:59
jlkor ansible_sftp*22:59
jlkor ansible_scp23:00
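A hypothetical sketch of blocking that route with an overriding action plugin, in the same style as the 'normal' override quoted above (module path, key filter, and error message are invented here):

    from ansible.plugins.action import set_fact

    class ActionModule(set_fact.ActionModule):

        def run(self, tmp=None, task_vars=None):
            # refuse to set any behavioral inventory parameter
            # (ansible_host, ansible_connection, ansible_ssh_*, ...)
            for key in self._task.args.keys():
                if key.startswith('ansible_'):
                    return dict(failed=True,
                                msg='Setting %s is not permitted' % key)
            return super(ActionModule, self).run(tmp, task_vars)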
jeblairjlk: currently.  that needs to change as described in http://lists.openstack.org/pipermail/openstack-infra/2017-March/005208.html23:00
jlkokay23:00
*** hashar has quit IRC23:01
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create zuul.executor.hostname ansible variable  https://review.openstack.org/44882023:01
pabelangerjeblair: okay, I haven't looked at 448814 as of yet. Likely tomorrow now23:02
jeblairpabelanger: you don't have to look at it; it's not ready yet.  i pointed out the bug i ran into on 447647.23:04
pabelangerjeblair: okay. I confused the patches23:05
jlkjeblair: yeah I think I'm bumping into that change barrier. I wonder if I should stop trying to shove this in, and accept that for now you can have gerrit, or you can have github, but you can't have both.23:06
jeblairjlk: fwiw, i'm tentatively planning on working on that next week.  :)23:06
jlkoh then I definitely should.23:06
Shrewsjeblair: oh cool. Thx23:16
jlkholy crap23:35
jlkjeblair: I'm probably missing something really really basic, but I might have made this multiple drivers thing work with just one if statement in scheduler.py....23:36
jlkat least until you dig into it and tear all that apart :)23:36
SpamapSjeblair: seems like it might be nice to stack that on top of a re-factored Change/Ref model23:42
jlkhah nope, I broke it23:43
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul master: Remove Changeish and refactor around Ref as base  https://review.openstack.org/44882923:43
SpamapSjeblair: ^^ WIP, fails 26 scheduler tests, but it's a start. :-P23:43
SpamapSSome of them I think are failing because they're expecting some of the aspects of NullChange instead of having a Ref23:44
SpamapSalso I think I might need to dig back to HEAD^ for oldrev23:44
jlkdamn, a timer based trigger doesn't get a trigger_name23:54
mordredjlk: when we were talking plugin holes earlier - blocking set_fact of connection things, blocking connection plugins and blocking lookup plugins were the three things we talked about yeah?23:55
jlkuh.23:56
jlkthat reads right from looking at backscroll23:57
mordredsweet!23:59

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!