Thursday, 2017-10-05

pabelangerYa, zuulv3 does seem to be performing better00:20
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix Gearman UnknownJob handler  https://review.openstack.org/50899200:39
*** smyers has quit IRC00:54
*** smyers has joined #zuul00:57
*** smyers has quit IRC01:39
*** smyers has joined #zuul01:45
*** jhesketh has quit IRC01:51
*** jhesketh has joined #zuul01:51
*** jkilpatr has quit IRC02:46
mordredrcarrillocruz: I was thinking it was going to fail because you added a line to the test that was SUPER long - but if pep8 allows it, awesome03:21
*** jamielennox has quit IRC04:14
*** jamielennox has joined #zuul04:18
*** bhavik1 has joined #zuul04:44
*** ricky_ has joined #zuul08:28
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/nodepool feature/zuulv3: Bring back per label groups in Openstack  https://review.openstack.org/50962008:33
*** hashar has joined #zuul08:34
openstackgerritTobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Use same flake8 config as in zuul  https://review.openstack.org/50971508:34
ricky_tobiash: please re-review when you get a sec ^08:34
ricky_thx08:34
tobiashmordred, jeblair, rcarrillocruz: looks like pep8 config in nodepool is broken. ^ would sync it to the same settings as in zuul, but we would have to fix quite some stuff...08:35
tobiashricky_: lgtm08:35
ricky_thx08:37
*** bhavik1 has quit IRC09:23
kklimondacan I ship my own ansible action plugins with roles/playbooks? Or perhaps I can explain my use case: I'd like to expose some additional variables to my tasks (for example I have a repo with debian packaging, I'd like to parse debian/changelog and expose version to other tasks as a variable).09:41
tobiashkklimonda: I think you can ship your own modules, but no action plugins as they are restricted by zuul in order to prevent unreviewed code from doing bad stuff09:46
kklimondatobiash: is this just matter of trusted vs untrusted projects? i.e. can I ship action plugin if it's part of a trusted project?09:47
tobiashkklimonda: by looking into the code I think it's just restricted for untrusted projects:10:28
tobiashkklimonda: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/executor/server.py?h=feature/zuulv3#n153310:28
tobiashkklimonda: but I don't know what's the default search path of action plugins in ansible10:28
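For reference, Ansible's action plugin search path is normally set via `ansible.cfg`; a minimal illustrative fragment (the path is hypothetical, and as noted above Zuul restricts these paths for untrusted projects, so this only applies where Zuul permits it):

```ini
# ansible.cfg -- illustrative; Zuul controls plugin paths for
# untrusted playbooks, so this setting is only honored where allowed.
[defaults]
action_plugins = /usr/share/ansible/plugins/action
```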
*** ricky_ has quit IRC10:59
*** jkilpatr has joined #zuul11:15
*** jkilpatr has quit IRC11:24
*** jkilpatr has joined #zuul11:37
kklimonda@tobiash thanks, I'll check it out12:22
*** SotK_ has joined #zuul12:41
*** SotK_ has left #zuul12:48
dmsimardAre we planning on reloading the executors sometime soon ? I'd like to have https://review.openstack.org/#/c/509254/ in to properly test zuulv3 elastic-recheck changes13:29
*** dkranz has joined #zuul13:40
fungimemory utilization on zuulv3 is looking muuuuch better today: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63979&rra_id=all14:17
jeblairyeah, it's still *a lot* of memory, but not an ever increasing graph14:21
jeblairbtw, we do know that we leak cached change (and now pull-request) objects.  there's not an easy solution to that right now, but they are small, and that's a slow leak.14:21
mordredjeblair: with the current rate of change in the scheduler- and existing planned changes - I'm comfortable with a slow leak14:30
mordredjeblair: (I mean, I'm fairly certain we'll have at least one change per week we'll want to restart to pick up between now and whenever we could fix the leak)14:30
jeblairwe have about 1k items in the pipelines right now; so more things in memory than we would even have while normally running -- though i don't know what our proportion of dynamic configs is14:33
pabelangerya, memory does look pretty flat this morning14:37
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu  https://review.openstack.org/50960314:38
Shrewsjeblair: probably going to need your expertise for a test for that ^^^14:38
Shrewsmy gerrit "zuulv3" filter is now useless since everyone is using that  :(14:40
jeblairwe should switch to "frank"14:42
rcarrillocruzgot post failure on zuulv3 for sphinx on https://review.openstack.org/#/c/509620/, but regardless, +1 from Jenkins zuulv214:44
rcarrillocruzare we good to merge mordred ?14:44
rcarrillocruzi made a shorter line the assert14:44
dmelladorcarrillocruz: I've been seeing that behavior quite a bit, sadly14:47
rcarrillocruzthx jeblair14:47
pabelangerrcarrillocruz: Docs should not be published for feature branches14:48
rcarrillocruzso failure expected14:48
rcarrillocruzok14:48
pabelangerya14:48
pabelangerwe need to fix the job14:48
pabelangerbuild-openstack-sphinx-docs should be using prepare-infra-docs-for-afs role14:48
jeblairpabelanger: that's the doc *build* job14:50
jeblairit should run on all changes14:50
jeblairand it should publish to logs.o.o14:50
pabelangerjeblair: right, but openstack docs is how it was setup.  Which doesn't allow feature branches to be published when it was zuulv2.5 JJB.14:51
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Bring back per label groups in Openstack  https://review.openstack.org/50962014:51
pabelangerat one point we did have a build-infra-docs, but I am not sure atm14:52
pabelangerlooking14:52
jeblairpabelanger: let's move this to -infra14:52
pabelangerkk14:52
jeblairpabelanger, Shrews: i don't see any zookeeper connection issues since i stopped doing objgraph stuff after the restart15:04
pabelangerYa, I think the high CPU load was causing them to drop15:06
clarkbcacti reports significantly more sane cpu usage15:06
pabelangerclarkb: jeblair: I am noticing we are consuming more HDD space, I am not sure we have the 2nd drive mounted15:07
jeblairShrews: i think we can probably pause the nodepool provider to control when it fulfills requests, and we can probably manually close the zk connection... i don't think we have a facility to stop scheduler processing of zk events while we do that.  i think tests for this could be very difficult.15:07
jeblairpabelanger: the 2nd drive is all swap15:07
pabelangerHA15:09
pabelangernice15:09
jeblairShrews: there are some more unit-test like tests in test_nodepool... maybe we could do it there15:09
jeblairShrews: the test class itself acts as a scheduler, so it has its own onnodesprovisioned event15:10
jeblairShrews: i think i'd give that a shot -- probably make a new test class because you'll want to control onnodesprovisioned and have it do something differently15:10
jeblairShrews: aha!  there's even a test in there for disconnects15:11
jeblairShrews: so i think that has almost all the pieces15:11
pabelangerjeblair: assuming all of swap on the 2nd drive is wrong, on the next stoppage of zuulv3 should we rebuild the 2nd drive and set up fstab properly?15:13
pabelangerI'm sorry, but if people are upset there was an outage for 5 days because CI was down or sucked, we should have added more people to zuulv3 effort. Its not like we've been asking for more help.15:15
pabelangerwow15:15
pabelangerthat was the wrong window15:16
mnaserits nice that the memory of zuul is relatively stable15:18
mnaserand my browser cant even handle rendering how big the queue is :D15:19
dmsimardmnaser: time to use zuultty15:21
mnaseri need to learn gertty first but maybe ill look into that15:22
mnaser:p15:22
dmsimardI always forget where zuultty is hidden, it's like a subfolder in some other project15:22
mnasergoogle search shows... a result of you in eavesdrop15:23
mnaser:p15:23
Shrewsjeblair: k. i'll see if i can figure something out15:24
dmsimardmnaser: lmao15:24
*** hashar is now known as hasharAway15:24
mnaserdmsimard https://gist.github.com/sileht/c342606a7ba64761936e15:24
dmsimardmnaser: nah that's not it.. hang on, let me find it15:25
dmsimardmnaser: https://github.com/harlowja/gerrit_view/blob/master/README.rst#czuul15:26
mnaserdmsimard nice!15:27
fungijeblair: seeing a recent jump in memory utilization (not huge, but at least pronounced) in the past few minutes... wondering if any of this will drop as zuulv3 catches up on its backlog15:30
fungithat is, once we're adding changes more slowly than we're reporting on them15:30
jeblairfungi: i doubt it will ever drop due to python memory management....15:30
jeblairalso, i don't expect zuulv3's queues to ever shrink in our current configuration15:31
fungi"management" needs quotes there ;)15:31
jeblairi'm going to poke at more memory things, it may cause disruption again15:38
*** bhavik1 has joined #zuul15:56
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Use normal docs build jobs  https://review.openstack.org/50983316:49
*** bhavik1 has quit IRC16:50
pabelangerjeblair: here is my first attempt at getting kill_after_timeout working locally in GitPython.  https://github.com/pabelanger/GitPython/commit/2e78443444c3b836ba3bcd6e6dde62be77ce3779 Not that you are an expert, but anything pop out as a potential issue?16:53
pabelangerwhen the timeout happens, it will now raise the following: git.exc.GitCommandError: Cmd('git') failed due to: exit code(-9)16:53
pabelangerwhich we can likely trap and then proceed to clean up the repo16:53
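The trap-and-clean-up idea discussed here can be sketched with plain `subprocess` rather than GitPython (whose `kill_after_timeout` does the SIGKILL that produces the `exit code(-9)` mentioned above); the helper name is illustrative, not Zuul's actual code:

```python
import subprocess

def run_cmd(cmd, timeout=30):
    """Run a command, killing it if it has not finished within
    `timeout` seconds.  Illustrative sketch of trapping a hung git
    operation so the caller can clean up and retry."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout, check=True)
        return result.stdout
    except subprocess.TimeoutExpired:
        # The repo may now be in a half-cloned state; the caller would
        # remove the directory and clone again.
        raise
```

On timeout, `subprocess.run` kills the child and raises `TimeoutExpired`, which the caller traps to redo the clone.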
jeblairpabelanger: i'm pretty sure the as_process option is tightly integrated with the progress output16:55
jeblairpabelanger: so you may want to try setting as_process to false if progress is None16:56
jeblairpabelanger: however, in parallel to working upstream, why don't you make a local method for us to use instead of that one, so we don't have to wait for a gitpython release?16:57
pabelangerjeblair: Oh, I see. Good idea16:57
pabelangerjeblair: Sure, I can try my hand at it16:57
dmsimardjeblair, mordred, fungi, clarkb, pabelanger: I don't know if we can make use of this or if it's relevant but it seemed awesome enough that it was at least worth sharing: https://github.com/nickjj/ansigenome16:59
jeblairdmsimard: nice, thanks17:00
*** maxamillion has quit IRC17:01
*** maxamillion has joined #zuul17:03
dmsimardThere's some interesting features, like making sure we have READMEs, it can generate them, etc. I might poke at it out of personal curiosity to see what it does.17:04
odyssey4medmsimard nickjj also did https://github.com/nickjj/rolespec17:17
pabelangerI saw a talk at ansiblefest SF, using the testinfra library too. https://pypi.python.org/pypi/testinfra17:22
jeblairwe're still using a bit more memory than we should; i'm currently looking at some layouts held in memory because some merger jobs have gotten stuck on the git timeout issue that pabelanger is working on.  i'm going to think about whether we need to do anything about that other than just fix the git timeout thing.17:22
pabelangerI might start using that for helping test roles17:22
dmsimardodyssey4me: one day I would like to see something like serverspec but with ansible17:23
dmsimardserverspec being ruby and all that17:23
dmsimardhttps://github.com/larsks/ansible-assertive/ allows for doing non-failing asserts for example17:24
dmsimardI had written this a long time ago inspired from stuff that EmilienM did back at Enovance17:25
dmsimardhttps://github.com/dmsimard/openstack-serverspec/blob/master/spec/tests/swift_loadbalancer/swift_proxy_spec.rb17:25
dmsimardpabelanger: ah so testinfra is basically like serverspec but in python17:26
dmsimardnever heard of it before17:26
* EmilienM hides17:27
dmsimardI guess I want to do the same thing as serverspec and testinfra but with ansible proper :D17:27
dmsimardpabelanger: oh, but since testinfra is python, we could probably just easily wrap it inside ansible modules17:27
pabelangerdmsimard: well, I'd want it to run outside of ansible. EG: have ansible do its thing, then run the python unit test to validate it worked17:28
pabelangerotherwise, if ansible is broken, it will be hard to detect that if running inside17:28
pabelangerI'll have to find the talk, but it was about molecule at ansiblefest SF17:28
dmsimardare the talks online yet ?17:29
pabelangerI think so17:29
dmsimardI've heard about molecule too but never used it17:29
jlkI know that guy17:29
jlkwho wrote it17:29
jlkHe used to work at Blue Box17:29
jlkalas, I haven't really given molecule a spin yet :(17:29
pabelangerYa, i don't think we'd use it, since it works like beaker. Meant to setup your nodes into containers, then run ansible. But we have zuul / nodepool to do that17:29
pabelangerbut the testinfra was interesting17:30
pabelangerassert file exists, server runs, etc17:30
pabelangerservice*17:30
jlkyeah, there is need for things like that in the enterprise world17:30
jlkwe used ansible to set up serverspec17:31
jlkto validate the ansible17:31
dmsimardI almost went as far as getting monitoring checks to run serverspecs in prod on a regular basis17:31
dmsimardbut got sidetracked by far less fun things17:31
jlkparticularly because it wasn't a "continual deployment" environment, Ansible was run on-demand. So something like serverspec could catch something messing with the system17:31
jlkdmsimard: that's exactly what we did17:32
jlksensu alerts for serverspec failures17:32
dmsimardjlk: neat, that's what we wanted to do yeah17:32
dmsimardbut that was at $oldjob :)17:32
dmsimardjlk: so the guy worked at metacloud first? then bluebox? does he have a flair for acquisitions or something ? :p17:33
jlkbluebox then metacloud17:33
jlkhe left BB before acquisition17:33
jlk(before I joined BB actually)17:34
Shrewsjeblair: i'm stumped on this test. if i push up the current iteration of it, would you mind showing me where i'm going wrong?17:38
pabelangerokay, I think I fixed kill_after_timeout upstream: https://github.com/gitpython-developers/GitPython/pull/683 working on a zuul function now17:41
kklimondazuul doesn't seem to be doing anything to ensure that all jobs that are part of the dependency graph will run on the same cloud, right?17:42
pabelangernodes in the same nodeset should be on the same cloud17:43
kklimondabut nodeset is per job, right?17:44
pabelangerRight17:44
pabelangerthe only way to pin jobs to a cloud, would be to create a unique label for said cloud17:45
pabelangerwe do this today for tripleo-centos-7 images17:45
pabelangerand they only run jobs on tripleo-test-cloud-rh1, for historical reasons17:45
kklimondayes, but I don't want to pin it to a specific cloud, just make sure that a given set of jobs will all run on a single cloud17:45
Shrewskklimonda: what's the use case for that requirement?17:46
pabelangerI don't think we support that currently17:46
kklimondaShrews: I need to build 1GB of packages and then install them for testing.17:46
pabelangersounds like artifact handling?17:47
kklimondayeah17:47
pabelangerya, so this is something we don't do too well atm17:47
pabelangerhow we worked around it was regional mirrors / proxies to help with that17:47
kklimondawell, probably less than that - a lot of packages are dbg symbols, but I'll still end up with ~100MB of packages that have to actually be transferred per build.17:47
kklimondasure, but mirrors/proxies don't help me when it's all new artifacts each time17:48
pabelangerright17:48
kklimonda(not that I don't need those anyway)17:48
pabelangerwe basically have the same issue today with the kolla project. They upload large artifacts to tarballs.o.o, and jobs then download from it.17:49
clarkbwe've talked about possibly using shared cinder volumes for that which would imply scheduling to one cloud region.17:49
pabelangeryah17:49
clarkbbut cinder volumes aren't currently multi-attachable so that has been possible future work17:49
dmsimardnor can we guarantee that cinder will be available in every cloud ?17:52
clarkbdmsimard: correct though only infracloud doesnt in our current clouds iirc17:53
jeblairpabelanger: thinking about it a bit more -- maybe we've only seen the git hangs on https?  so maybe we should go with that solution, and consider the gitpython timeout as a backup solution we can implement later if needed.  what do you think?18:04
jeblairShrews: of course!18:05
pabelangerjeblair: actually, yah. Looking at etherpad it was HTTPs.  So sure, let me get some .gitconfig settings going18:09
pabelangerjeblair: do you have recommendations on limits for ratelimit and time?18:09
pabelangerjeblair: also, I'm having a hard time understanding if our WatchDog for zuulv3 executor is working. I don't see how we abort the process any more18:10
pabelangernot important at the moment, maybe when you have spare time18:10
jeblairpabelanger: maybe as close to "no data in 30s" as we can get?  so probably 30s for the time and the lowest non-zero value you can do for rate?18:10
jeblairpabelanger: i thought we time out builds all the time? :)18:11
pabelangerwe do, i think I'm just not understanding https://review.openstack.org/426306/ which is where we stopped passing the proc into Watchdog class18:12
pabelangerbut first, I'll do .gitconfig changes18:13
jeblairpabelanger: we pass an instance method to the watchdog.  the instance method aborts self.proc18:14
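The pattern jeblair describes, handing the watchdog a callable (an instance method that aborts self.proc) rather than the process object itself, can be sketched like this; class and names are illustrative, not Zuul's actual implementation:

```python
import threading

class Watchdog:
    """Illustrative watchdog: fires `function` after `timeout` seconds
    unless stop() is called first.  The caller supplies a callable that
    knows how to abort its own process."""

    def __init__(self, timeout, function):
        self.timer = threading.Timer(timeout, function)

    def start(self):
        self.timer.start()

    def stop(self):
        # Called when the build finishes in time; the abort never fires.
        self.timer.cancel()
```

The executor would call `stop()` when the job completes normally; otherwise the callable runs and aborts the build's process.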
mnaserstatus.json for zuulc3 is almost 4.4M at this point heh18:16
jeblairwhat i'd really like to do with that is send updates over websocket18:16
jeblairof course, zuul itself barely knows when something changes at this point, so that's a ways down the road18:17
pabelangerjeblair: doh, I see it now. Thank you18:18
*** electrofelix has quit IRC18:20
pabelangerjeblair: okay, settings work locally: http://paste.openstack.org/show/622785/18:24
pabelangerproposing patch for 1000 bytes/sec for 30 sec18:24
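For reference, the values proposed correspond to git's standard `http.lowSpeedLimit` / `http.lowSpeedTime` settings; a `.gitconfig` fragment with those numbers would look like:

```ini
# Abort an HTTP(S) transfer that stays below 1000 bytes/sec
# for 30 seconds.
[http]
	lowSpeedLimit = 1000
	lowSpeedTime = 30
```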
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu  https://review.openstack.org/50960318:34
Shrewsjeblair: thx. see test_nodepool.py18:35
dmsimardmordred, jeblair: https://www.anandtech.com/show/11902/thinkpad-anniversary-edition-25-limited-edition-thinkpad-goes-retro :)18:37
mordreddmsimard: yes. it gives me great excitement18:39
dmsimardI was almost excited until I saw the Geforce in it ? My W541 has an nvidia card and it has brought me nothing but trouble :(18:40
dmsimardStarting at 1899$, ouch18:40
jeblairShrews: hey cool, you found a test for another issue on the etherpad!18:45
jeblairthat's line 148 -- kazoo callback error18:46
Shrewsjeblair: quite by accident18:46
Shrewsjeblair: oh, i just noticed i missed setting fail_first_request in setup (used to be there, but then made the base class and accidentally removed it)18:49
jeblairShrews: ok, i think i understand the exception -- in the test, we're doing a zk operation in the callback we can't do -- we can't shut down zk from the zk callback18:51
jeblairthat means the error is something different from production.....oh.... i bet in production we somehow hit that inside the resubmit (which happens in the callback).18:51
jeblairShrews: oh, no, strike that.18:52
jeblairShrews: the production error is actually something we can ignore, i think.18:52
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add git timeout for HTTP(S) operations  https://review.openstack.org/50987618:53
jeblairShrews: the key to this is that this is happening inside of an exception handler, and the "TypeError: callback() takes 2 positional arguments but 3 were given" error is a red herring18:53
pabelangerjeblair: how does that look^18:53
jeblairShrews: that's a harmless exception, it's the one after that matters18:53
Shrewsjeblair: the callback is _updateNodeRequest(), yeah? i was having a devil of a time trying to get that to trigger18:53
jeblairShrews: yep18:53
jeblairShrews: how about this?  in onNodesProvisioned, set an Event, and .wait() for it inside the main test method.  and then kill zk in the main test method18:55
jeblairShrews: then it'll happen outside the callback thread; should work18:55
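The test shape jeblair suggests, setting an Event in onNodesProvisioned and waiting for it in the main test method so the disruptive step runs outside the callback thread, can be sketched as follows; the class and function names are illustrative stand-ins for the real test code:

```python
import threading

class FakeScheduler:
    """Stand-in for the test class acting as a scheduler: the callback
    only records that nodes were provisioned and signals the event."""

    def __init__(self):
        self.provisioned = threading.Event()
        self.request = None

    def onNodesProvisioned(self, request):
        self.request = request
        self.provisioned.set()


def run_main_test(sched, disrupt):
    # Main test method: wait for the callback to fire, then perform the
    # disruptive step (e.g. killing the ZK connection) from this thread,
    # not from inside the ZK callback thread.
    assert sched.provisioned.wait(timeout=5)
    disrupt()
```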
Shrewsjeblair: should I delete the request from onNodesProvisioned before setting the event? i'm not seeing how to invalidate the first request19:01
jeblairShrews: it should be deleted automatically after the zk client disconnects (it's ephemeral)19:03
Shrewsjeblair: yeah, but doing it in the main test method seems too late. we need it to trigger again by the time it gets back to the main thread.19:04
Shrewsi hate to admit that i'm totally lost by the sequencing here  :(19:04
Shrewsi tried the Event thing and i'm not seeing it retrigger19:05
jeblairShrews: nevermind the test -- what's the sequence you need to have happen?19:07
jeblairShrews: maybe you can write that up on an etherpad, and i can take a look at it when i get back from lunch19:08
Shrews1) request A fulfilled 2) req A enters event queue 3) before the queue is processed, req A disappears, causing req A to resubmit 4) waitForRequests() returns19:09
Shrews0) submit request A19:10
Shrewsif anyone else more familiar with zuul testing wants to take a stab, by all means, please  :)19:11
tobiashShrews: sounds like monkey patching might be able to help disappearing req in step 319:17
tobiashI don't have the code at hand currently, but I could imagine that a disappearing req could be injected into the event processing like that19:20
Shrewsi think the error with my thinking is believing that there is an event queue in test-land. which now leaves me more confuzzled about how to test this19:32
* Shrews walks19:33
*** hasharAway is now known as hashar19:50
*** ianw|pto is now known as ianw20:00
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu  https://review.openstack.org/50960320:01
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu  https://review.openstack.org/50960320:02
Shrewsgah20:02
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu  https://review.openstack.org/50960320:03
jeblairShrews: here's what i've got: https://etherpad.openstack.org/p/4cZGJDn02i20:03
Shrewsjeblair: i think i got it now20:03
*** jkilpatr has quit IRC20:03
jeblairShrews: and yeah, i think we need to make our own "event loop" in the test, since the test is standing in for the scheduler20:03
Shrewsjeblair: took me a while to realize how to actually get to my codepath to test20:03
Shrewsi don't think we need to introduce the event loop20:04
jeblairShrews: well, we want something to call acceptNodes with outdated info, right?20:04
jeblairShrews: (ie, we should end up calling acceptNodes twice in that test)20:05
jeblairhope that makes sense; the main test thread is on the left; other threads are on the right20:06
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create git_http_low_speed_limit / git_http_low_speed_time  https://review.openstack.org/50989320:06
pabelangerokay, git timeout patches uploaded20:07
Shrewsjeblair: you don't feel that simulating those conditions as i did in that new PS is sufficient? if not, then yeah, we'll have to add an event loop20:08
jeblairShrews: i hadn't looked.  apparently i was working on the etherpad while you were updating the change.20:08
jeblairShrews: i think those are good tests as long as we've got the sequencing right.  i think the only thing they don't do is actually exercise the zk disconnect callback.  however, test_node_request_disconnect covers that separately, so we may be okay.20:11
jeblairShrews: if you're happy, i'm happy :)20:11
Shrewsjeblair: not even test_node_request is testing the event queue path. i really just needed a way to exercise acceptNodes(), which i think those do20:12
Shrewsand my head hurts. :)  being dumb/hardheaded takes a lot of energy20:13
Shrewsmust be all the beer from my younger years20:14
jeblairShrews: yeah -- the first thing the scheduler event processor does is acceptNodes; so all of these are doing a first-order approximation of that and assuming nothing interesting happens after20:14
pabelangeretherpad also updated20:14
jeblairi think that's okay for the scope of these tests20:14
jeblairpabelanger: cool, thx.  +3 on the first and small -1s on the second20:21
*** jkilpatr has joined #zuul20:22
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Create git_http_low_speed_limit / git_http_low_speed_time  https://review.openstack.org/50989320:33
jeblairi have found a second memory leak triggered by these git timeouts.  i have a test case and fix in progress.20:34
jeblair(and i have concluded that we should fix this regardless of the git timeout issue)20:35
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add git timeout for HTTP(S) operations  https://review.openstack.org/50987620:35
jeblairpabelanger: do you want to update and restart executors with that ^ ?20:37
pabelangerjeblair: great work!20:37
pabelangerjeblair: sure, give me a moment to fetch a drink20:38
*** dkranz has quit IRC20:43
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix path exclusions  https://review.openstack.org/50990120:47
mordredjeblair: ok - after thinking WAY too hard about that ^^ I think that should do us20:48
jeblairmordred: heh, the last thing i remember from you on the subject was "let me write that real quick!" :)20:50
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove references to pipelines, queues, and layouts on dequeue  https://review.openstack.org/50990320:54
jeblairthere's memory leak #220:54
jeblairi also realized there's another bug that helped uncover that; it's minor, but i'll try to fix it now too.  causes us to run too many merge operations, and we could always do with fewer of those.20:55
mordredjeblair: yah - 'real quick' got distracted and pushed down on the stack until I remembered that I was going to write it real quick20:56
jeblairmordred: zomg the commit-message-longer-than-bugfix club needs a secret handshake.20:57
jeblairi was settling in for a long review and now find myself unprepared20:58
mordredjeblair: hehehe20:58
mordredjeblair: your most recent patch isn't as easy to read as the previous memory leak - you had to touch WAY more than one line there20:59
jeblairmordred: indeed; i also haven't run the full test suite against it; could have some lurking bugs still.21:00
mordredjeblair: it'll run the full test suite against itself21:01
clarkbmordred: does path fix work if user changes $HOME21:01
clarkbis that even possible in bwrap?21:01
mordredclarkb: unpossible21:01
clarkbbecause passwd is ro?21:01
mordredclarkb: the user doesn't have the execution context to change the environment in which ansible-playbook is executed21:02
pabelangerokay, starting to restart executors. puppet has been run21:02
mordredclarkb: they can set environment in tasks - but those are all executed by ansible-playbook, so are subshells of the shell where the env is checked21:03
clarkbmordred: even via something like /proc?21:03
mordredwell - they can't write to /proc unless the path filter is already busted21:03
jeblair(worth noting, this can be improved when ansible 2.4.1 is released and we can get this value from an ansible.cfg file)21:04
mordredbut it's all sequencing - zuul-executor executes ansible-playbook and passes an explicit environment to that subprocess - the action plugin that checks the path against HOME is in that process - and the user shouldn't have access to change the environment that exists there21:04
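The sequencing mordred describes, the executor passing an explicit environment to the ansible-playbook subprocess so tasks (which run in subshells of it) cannot alter the environment where the path check happens, can be sketched like this; the helper name and HOME value are illustrative:

```python
import os
import subprocess

def launch_playbook(cmd, home="/var/lib/zuul"):
    """Run `cmd` with an explicitly constructed environment.  Because
    the parent builds env itself, nothing the child (or its subshells)
    does can change the environment the parent's checks see."""
    env = {
        "HOME": home,
        "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
    }
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```

The child sees only the injected HOME; the parent's own environment is untouched.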
*** hashar has quit IRC21:04
mordredalso what jeblair said21:04
clarkbthat was going to be my next question, can we supply it directly as config rather than potentially user changeablr env stuff21:05
mordredyah - most definitely once 2.4.1 is out21:05
clarkbsounds like later, ok21:05
mordredbut also - if the user is able to change HOME - that should be considered a SERIOUS issue21:05
mordredas that would mean that the user was able to execute abitrary code in a context that they should not be able to execute arbitrary code21:06
mordredwhich is not to say it's unpossible - obviously- but if we find an instance of that we should drop everything and think about nothing but that21:06
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix doc typo that inverts logic  https://review.openstack.org/50990521:17
mordredjeblair: doublecheck me on that ^^ but I just noticed that looking at the docs for something else21:18
pabelangerso far, ze05 and ze06 were in hung state for git clone. They've been since restarted with fixes, moving to ze0721:21
jeblairmordred: good catch, but i think the fix is different; commented21:25
pabelangerjeblair: all executors upgraded and restarted21:26
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu  https://review.openstack.org/50960321:27
Shrewsnoticed i missed a var for a %s ^^^21:27
jeblairdmsimard: ^ re executors21:28
mordredjeblair: ah - good -I'll update that - and that tells me I want to set some of our publish jobs to be post-review: true as wel21:28
dmsimardYay, thanks21:28
jeblairmordred: do we have publish jobs defined outside of project-config?21:29
jeblairmordred: or i can just wait for the change and review it :)21:29
mordredjeblair: oh - no, we don't. nevermind. all good21:30
jeblairkk21:30
pabelangerjeblair: I'm going to switch to the ze03.o.o stopped issue again21:35
pabelangerunless something else you'd like me to do21:35
jeblairpabelanger: thanks21:41
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig  https://review.openstack.org/50991221:59
mordredjeblair: wow. that's a fun one22:00
jeblairmordred: yeah, that happened, and exposed the memory leak, and i was writing a test around it, and realised, "hey, maybe i shouldn't write a test that replicates the behavior of a bug; maybe i should fix the bug."22:07
mnaserjeblair im looking at *a lot* of inefficent behaviour at the moment in the zuul status page22:15
mnaserwould it be okay to introduce 1 additional dependency (mustache)22:15
*** hogepodge has joined #zuul22:15
mnaserits a very simple/small javascript templating language that will help the translation of status state <=> html22:16
jeblairsounds pretty hipster.  :)22:19
mordredmnaser: I should probably talk to you about the 'rework how we deal with javascript and html' patches I'm going to get back to once the v3 rollout is done ...22:20
mnaserjeblair or maybe angular? that would simplify the code base soooo much22:20
mnasermordred jeblair i could probably get the status page redone in angular.. tonight.  maintaining the same look :>22:20
mordredmnaser: I believe tristanC's dashboard work introduces angular - so maybe we should just put a pin in improvements here until the rollout is done and we can give some attention to how it's all being stitched together22:21
mordredmnaser: we've been holding off on that work until post-rollout so that we don't get too distracted ... one sec though, I'll link you to a couple of patches22:21
mnaserok ill have a look22:21
mnasernot like i can help much in the internals of zuul and finding memory leaks :-P22:22
tristanCmordred: mnaser: indeed, the zuul-web patch for tenants, jobs and builds are using angular: https://review.openstack.org/#/q/topic:zuul-web22:22
jeblairya -- my only request for angular is that it be understandable by folks who have used web systems other than angular -- i'm able to follow the patterns that tristanC has used fairly easily22:23
fungifor some reason i always confuse angularjs with reactjs (the facebook one with the patent license controversy)22:23
mnaserjeblair i agree 100% -- i dont want to leave zuul with some complicated codebase no one knows how to fix if im not available22:24
mordredmnaser: https://review.openstack.org/#/c/487538/ and https://review.openstack.org/#/c/503268/ are the two relevant pieces22:24
jeblairmnaser: exactly! :)22:24
mnasermordred ++ to using webpack to manage dependencies22:24
mordredmnaser: the first is some initial exploration I did around incorporating javascript toolchains - the second is the first patch from tristanC that adds angular and uses it using the current setup22:24
jeblairso maybe building on mordred and tristanC's work is the best bet.  i think the only caveat is that we won't really review+merge larger changes like that until after the dust settles (but probably *soon* after the dust settles)22:25
mordredyah - I want the dashboard :)22:25
mnaserthe nice thing is the status page is really well/easily tested22:25
jeblairso just know there will be a bit of a delay if we go that route.  but in the long run, it's the best i think.22:26
mnaserthanks to whoever wrote the ?demo= stuff22:26
mordredthat's one of the reasons I used status page as a trial balloon for the toolchain stuff :)22:26
mnaseri'll work on angular-ifying the status page and then we can "integrate" it with the other angular-based pages later, it shouldn't be too much work (if i do it correctly)22:26
mnaserbecause personally the status page is currently unusable for me, always crashes my browser :(22:27
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix doc typo that missed important words  https://review.openstack.org/50990522:27
mnaserthe v3 one that is, with the big status.json file22:27
jeblairme too22:27
mordredmnaser: I'm rebasing that patch of mine real quick - silly merge conflicts22:29
fungimnaser: is the v2 one at http://zuul.openstack.org/ (not the custom one we have on status.o.o) actually any better in that regard?22:31
fungiit also is nigh unusable for me at relatively high queue sizes22:31
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Use yarn and webpack to manage status javascript  https://review.openstack.org/48753822:39
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Migrate console streaming to webpack/yarn  https://review.openstack.org/48753922:39
mordredmnaser: in https://review.openstack.org/487538 if you do "yarn install ; npm run start:livev3" it'll spin it up in a local dev server pointed at zuulv3.openstack.org22:39
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: alien-list: use provider name  https://review.openstack.org/50878822:42
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Migrate console streaming to webpack/yarn  https://review.openstack.org/48753922:52
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP Use yarn and webpack to manage status javascript  https://review.openstack.org/48753822:54
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP Migrate console streaming to webpack/yarn  https://review.openstack.org/48753922:54
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove references to pipelines, queues, and layouts on dequeue  https://review.openstack.org/50990323:30
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix early processing of merge-pending items on reconfig  https://review.openstack.org/50991223:30

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!