Wednesday, 2017-03-08

*** saneax is now known as saneax-_-|AFK00:40
*** jamielennox is now known as jamielennox|away00:41
jeblairjlk: your request for early review is entirely reasonable, and i'd like to help.  but as i try, i'm honestly not sure how to proceed.  i can do a half-assed review and say "yes/no good/bad general direction" and then do a real review later where we change all kinds of interfaces, etc (and i may still be wrong).  or i can do a real review which involves a major context switch and time investment.00:42
*** jamielennox|away is now known as jamielennox00:44
jeblairjlk: i don't feel like we have achieved enough forward progress on the major milestones which are ahead of the github milestone to be able to afford doing that at this point.00:45
jeblairjlk: i can't see a cursory review being helpful.  you're likely to just get upset that i'm not being consistent.  :)00:51
jeblairi've just spent 30 minutes reviewing the first one and have only gotten as far as determining i need to spend more time on it.00:54
jeblairmy hope was that you would be able to do the bulk of the porting work now, and then once we cleared out more of the work ahead of it, we'd both have time to dig into refining the series.00:57
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Re-enable test_tags  https://review.openstack.org/43985801:25
*** Wei_Liu has joined #zuul02:08
SpamapSjeblair: deerrrrppp.. I totally forgot about example.* in rfc2606 .. sorry.. it's a knee jerk to see names in urls. :-P02:29
SpamapSjeblair: and regarding py3 stuff.. I wonder if we can get a non-intrusive way to start running a subset of the tests for py3. I have some pretty minimal patches that got a couple of the unit test suites to pass.02:30
SpamapSjust to start preventing backsliding02:30
SpamapSbecause I'd definitely prefer /usr/bin/python3 as the linter in this case. :)02:30
*** Wei_Liu has quit IRC02:31
*** abregman_ has joined #zuul06:33
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Change mutex to counting semaphore  https://review.openstack.org/44256306:43
*** saneax-_-|AFK is now known as saneax07:05
*** abregman_ is now known as abregman08:53
*** openstackgerrit has quit IRC09:03
*** hashar has joined #zuul09:06
*** isaacb has joined #zuul09:08
*** bhavik1 has joined #zuul09:23
*** bhavik1 has quit IRC11:07
*** isaacb has quit IRC11:44
*** abregman has quit IRC12:58
*** abregman has joined #zuul12:59
*** abregman has quit IRC13:02
*** abregman has joined #zuul13:03
*** isaacb has joined #zuul14:02
*** openstackgerrit has joined #zuul14:31
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add 'requestor' to NodeRequest model  https://review.openstack.org/44315114:31
pabelangermorning14:37
pabelangerhttps://etherpad.openstack.org/p/zuulv3-ansible-jobs14:37
pabelangerjeblair: Shrews: mordred: SpamapS: ^ please review my proposed email for feedback request14:37
pabelangerrelated to our tox playbook(s)14:38
Shrewspabelanger: will look in a bit14:38
Shrewspabelanger: looks good, definitely helped me14:59
*** saneax is now known as saneax-_-|AFK15:04
*** abregman is now known as abregman|mtg15:06
Shrewspabelanger: and i think i agree with mordred that, when considering what run-tox might look like after it's no longer a passthrough, the individual playbooks seem a bit better than the single playbook. And opens it up for more easily customizing things, I think.15:13
pabelangerShrews: right, that is what I am thinking too.15:14
Shrewsthx for the writeup15:14
jeblairpabelanger: email lgtm.  you might want to copy/paste some of that into the respective commit msgs15:18
pabelangerjeblair: agree15:19
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks)  https://review.openstack.org/43828115:26
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (single playbook)  https://review.openstack.org/44218015:28
jeblairpabelanger: ++15:37
pabelangerWow, all sorts of badness here: http://zuulv3-dev.openstack.org/logs/62edb6f2d5b64cee9369d605e5eed65c/console.log15:39
Shrewspabelanger: wow15:49
Shrewssomething eat up all the memory?15:49
pabelangerShrews: looks that way15:49
pabelangertesting locally with fresh virtualenv15:50
pabelangerto see if I can reproduce15:50
jeblair2017-03-08 15:37:16.099322 | [28908.424773] Out of memory: Kill process 3758 (coverage) score 443 or sacrifice child15:50
jeblairlooks like each process was about 1G each15:52
pabelangerYa, something looks to have changed. A recheck also fails16:08
pabelangerthe good news, is we can reproduce the issue on zuulv2.516:09
jeblairthat's good news?16:10
Shrewsthis isn't zuulv3 specific? uh oh16:10
pabelanger:)16:10
pabelangergoing to compare dependencies16:11
pabelangerhttp://paste.openstack.org/show/601957/16:14
pabelangernew GitPython today it looks like16:14
pabelangerand appdirs yesterday16:15
jeblairi'll use old and new venvs locally and see if i can spot a memory diff16:16
jeblairmeanwhile, it may be a good idea to push up a change that experimentally pins gitpython to the previous version and see what shakes out16:16
pabelangerk16:17
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: DNM: pin GitPython  https://review.openstack.org/44322116:19
jeblairoh yep, definite increase in memory usage with a new venv compared to old16:20
mordredjeblair: AWESOME16:20
mordredpabelanger: that DNM may turn in to merge please rather quickly :)16:20
pabelangermordred: indeed16:20
Shrewsmordred: DNM means "Desperately Needs Merged"   duh16:20
mordredthe commit message on 2.1.3 is amazing: https://github.com/gitpython-developers/GitPython/commit/c23ae3a48bb37ae7ebd6aacc8539fee090ca34bd16:22
mordredI almost want to just respond to it directly16:22
pabelangermordred: please do! deleting things from pypi makes me sadface16:22
jeblairmordred: please do.  i hope he will appreciate learning that he has users that depend on important information like that16:23
jeblairalso, i just had to kill my test process because my workstation was swapping.16:23
jeblairmy workstation *never* swaps.16:23
jeblair(*one* of the coverace processes was up to 16gb)16:24
pabelangerwoah16:25
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Cap GitPython at 2.1.1 due to memory consumption  https://review.openstack.org/44322116:32
pabelangercapping GitPython fixes our memory issues16:32
jeblairi'm testing out some reverts of commits with suspicous commit msgs16:33
mordredjeblair: I'm going to guess the code in https://github.com/gitpython-developers/GitPython/compare/c823d482d03caa8238b48714af4dec6d9e476520...master in git/repo/base.py is going to be involved16:33
mordredsince it does gc things16:33
jeblairthere seemed to be at least one particular test, or smaller set of tests, that really set it off16:34
jeblairbut i don't know how to figure out what it/they were16:34
jeblairif we could, that would be useful in producing a test case16:34
mordredall of the rest of the changes had something to do with docs, tests or kwarg handling16:34
mordredjeblair: ++16:34
mordredjeblair: so - the gc code seems to be doing something similar to what the paramiko code was doing - trying to be more explicit about memory on close or something16:35
jeblairi reverted both bdb4c7cb5b3dec9e4020aac864958dd16623de77 and f1a82e45fc177cec8cffcfe3ff970560d272d0bf and the problem did not present16:36
jeblairmordred: wow, did someone write a blog post on how to (poorly) muck around with python gc internals?16:36
mordredyah- bdb4c is the one I was suspecting16:36
jeblairmordred: i'm re-running with only that one reverted now16:37
mordredoh - I'm sorry - I got that backwards16:37
mordredf1a82 is the one that does it not for the test suite16:38
jeblairoh okay, i'll let this run then try the other16:38
mordredhttps://github.com/gitpython-developers/GitPython/issues/553 is the issue the commit purports to be about16:39
jeblairthere it goes -- so with only bdb4c7cb5b3dec9e4020aac864958dd16623de77 reverted we still see the problem.  i'll just run with only f1a82e45fc177cec8cffcfe3ff970560d272d0bf reverted now to confirm16:40
jeblairpabelanger: when you said this is reproducible with zuulv2.5, did you mean zuulv2.5 running tests on zuulv3, or did you mean tests running on the code in the master branch?16:43
pabelangerjeblair: running the py27 job on both zuul.o.o and zuulv3-dev.o.o showed the same failure for the feature/zuulv3 branch16:45
pabelangerI haven't checked master yet16:45
jeblairpabelanger: can you please?16:46
pabelangersure16:46
jeblairi'm poking around at subunit log files, and i notice that the debugging output we added to try to catch the gitpython leak problem is outputting *a lot* of data.  it's possible the oom is triggered by that debug output.16:46
pabelangerrecheck of 442533 underway16:47
jeblairi mean, there's still a lot of unknowns there (why did we suddenly start seeing gitpython object leaks about a week ago, and why did the new release seem to make it a certainty).16:47
jeblairmordred: confirmed that reverting f1a82e45fc177cec8cffcfe3ff970560d272d0bf only fixes the problem16:48
mordredjeblair: cool. so, that's at least some information16:48
jeblairi'm running with gitpython 2.1.3 now but without the insane gc logging in zuul's tests16:49
mordredjeblair: looking further at the change, there are two changes in place there - one is adding context manager and an explicit close method so that what was the contents of __del__ will get called in context manager context but also let someone do an explicit cllose- this seems perfectly innocuous16:52
mordredjeblair: but the other change is adding two calls to gc.collect() on either side of a call to gitdb.util.mman.collect()16:53
pabelangerjeblair: no kernel traces on master, but jobs did fail still. Doesn't appear related to OOM16:54
pabelangerhttp://logs.openstack.org/33/442533/1/check/zuul-coverage-ubuntu-xenial-nv/37bb728/console.html16:55
mordredjeblair: I betcha that http://paste.openstack.org/show/601965/ is actually the revert that will make it go away16:55
mordredin the mman.collect() call is this comment:16:58
mordred        .. TODO::16:58
mordred            implement a case where all unusued regions are discarded efficiently.16:58
mordred            Currently its only brute force16:58
jeblairwe've had a problem in the past with zuul leaking gitpython git.Repo objects.  so we added a check at the end of every unit test that calls gc.collect and then gc.get_objects to make sure that there are no git.Repo objects in memory.16:59
jeblairthat hadn't produced any errors in, idunno, years17:00
jeblairlast week a couple started popping up, which seemed really weird, so i added some more debugging to try to figure out what the references are.  that produces a lot of output.  but since merging that change, we haven't seen the errors again.17:01
jeblairit does seem that it's the debugging output that causes the oom condition -- probably because of all the debug logging going into memory.17:01
jeblairwhen i remove the extra debugging and run with gitpython 2.1.3, i don't get ooms, but i do consistently get a lot of tests failing the gc reference count check17:02
jeblairso it seems like the change in 2.1.3 causes zuul to leak git.Repo objects reliably17:03
jeblair(there are 225 instances of "Leaked git repo object" in the test output)17:04
mordredjeblair: I cannot explain that from reading the code17:06
*** hashar has quit IRC17:06
jeblairmordred: where's mman.collect?17:06
mordredin https://github.com/gitpython-developers/smmap17:06
mordredjeblair: in smmap/mman.py17:06
mordredjeblair: I have not dived too deeply  into the crack that's in there - although I will say that implementing your own lru mmap in python seems like a choice _I_ wouldn't choose17:09
jeblairmordred: hrm, i wonder if the additional gc.collect calls are pushing git.python objects into higher generations and so ours isn't actually running on them17:10
jeblair(it's not clear to me what the behavior is for gc.collect() with no arguments)17:10
jeblairi guess "full collection" means run on all the generations?17:11
mordredmaybe?17:12
jeblairalso, i'd really like an analysis of why they think they need to run a full gc sweep twice every time a repo object is closed17:12
jeblairwe close repo objects *all the time*17:12
mordredyah17:12
mordredthat seems super extra expensive17:12
jeblairlike, many times in a single repo update, because gitpython is poor at dealing with files changed out from under it -- even if *it* caused those files to change.17:13
jeblair(run git remote update; get a new repo object; run git fetch; get a new repo object... etc)17:13
mordredyah17:14
jeblairmordred: i'm going to poke at that 3 line patch and see what makes things change17:15
mordredjeblair: k.17:17
mordredjeblair: I give even odds on it being the call to the mman.collect17:17
*** bhavik1 has joined #zuul17:18
*** abregman|mtg has quit IRC17:24
jeblairfirst i'm going to spend a few minutes trying to find a smaller test case17:27
*** bhavik1 has quit IRC17:35
jeblairmordred: hrm, in issue 553, he says you have to call gc.collect before calling clear_cache, but that's not what the patch does17:55
mordredoh - interesting17:56
jeblair(though, i mean, i still think "you have to call gc.collect before..." is probably wrong, or at least needs some real justification)17:56
mordredyah17:57
jeblairi can't get a smaller test case.  the leaked object detection only seems to be happening when i run the full test suite in parallel17:59
mordredjeblair: JOY18:01
jeblair(at some point, it would probably be a good idea to remove our leak detection and then time the test run with and without the change to gitpython to see how much the extra gc calls cost)18:01
mordred++18:03
jeblairoh, i just had a thought18:03
jeblairit's possible i'm seeing a bunch of leaked objects *because* the tests run slower and are timing out18:03
mordredjeblair: oh! yeah18:04
jeblairyeah, i see a lot of correlation with test timeouts.  so, erm, i guess i will perform that benchmark now.  :)18:06
jeblairmordred: i ran *one* test with and without that 3 line patch.  6.7s vs 24.2s.18:07
mordredjeblair: WOW. at the very least, it would be nice if maybe we could get a Repo constructor option that would skip the three collect calls on every close18:08
mordredI'd love to understand more why it's needed - but if the upstream devs can't explain that, then a get-out-of-jail-free card might be nice :)18:08
jeblairmordred: the mman.collect doesn't seem to have much of an effect.  it's the gc calls.18:09
jeblairmordred: i think maybe we have enough to file a bug now?18:11
Shrewspabelanger: in the current nodepoold, the updateStats() call for node counts is called in several different places, some of which won't be relevent in the new system. The zuulv3 relevent ones are at node launch time and node deletion. It's called twice at node deletion for whatever reason. I'm having difficulty figuring out exactly how often updateStats() should be called in the new system. Does it necessarily18:14
Shrewsneed to be immediately when nodes are created/deleted, or can it be more periodic?18:14
ShrewsI guess I'm wondering if those data points need to be recorded so closely to node changes, or can should we spread out the data points a bit.18:16
jeblairShrews: the intent is to be as close to real-time as possible (in openstack-infra, we have a 10s resolution)18:18
pabelangerYa, keeping it close to realtime would be my preference18:19
Shrewsok. fwiw, i've written this change twice now because i haven't been happy with how it turned out. i might try one more way to see how i like it.18:21
jeblairShrews: if the concern is needing to query zk for the gauges (the counters should basically be free), having that be somewhat periodic is probably okay; but an interval measured in seconds would still be ideal.18:21
Shrewsjeblair: that was one concern (since each provider thread is going to be querying zk for the counts), but it may not be so bad since nodes can only be created/deleted so quickly18:23
*** isaacb has quit IRC18:27
jeblairmordred: https://etherpad.openstack.org/p/3BnandNi86  how's that look?18:30
jeblairmordred: (also, do i need to do any github magic to make the commit sha link?)18:31
jlkjeblair: that's perfect feedback, and certainly a good direction to go in.18:34
mordredjeblair: looking18:43
mordredjeblair: yah - I agree - I think it's a good bug18:44
mordredjeblair: I've also got a patch that would add a get-out-of-jail constructor argument, should we decide we desire such a thing18:45
*** adam_g has quit IRC18:46
*** adam_g has joined #zuul18:47
jeblairhttps://github.com/gitpython-developers/GitPython/issues/60518:55
jeblairmordred, pabelanger: i think we should pin to <2.1.2 until that's resolved18:55
jeblairi'll update pabelanger's change to reference that in the commit msg18:56
pabelangerwfm18:56
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Cap GitPython at 2.1.1 due to performance degradation  https://review.openstack.org/44322118:57
jeblairapparently github permits me to express my reaction to that issue.  i don't see an icon that looks appropriate for "shock".18:59
mordredjeblair: oh fun - have you read the readme at https://github.com/gitpython-developers/smmap ?19:16
mordred"System resources (file-handles) are likely to be leaked! This is due to the library authors reliance on a deterministic __del__() destructor."19:16
mordredjeblair: also - https://github.com/gitpython-developers/smmap/pull/33 is potentially worth being on our radar19:18
jlkspeaking of radar19:21
jlkmordred: https://github.com/ansible/ansible/pull/22281  this may be of some interest19:21
openstackgerritPaul Belanger proposed openstack-infra/nodepool master: Remove ubuntu-precise from AFS mirror list  https://review.openstack.org/44329519:24
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Allow using webapp from connections  https://review.openstack.org/43983119:30
mordredjlk: yah - I was looking at that the other day19:33
jlkI gave it a cursory once-over, but seems vaguely similar to our problem19:33
jlkIf we get our talk accepted to AnsibleFest, it'd be worth mentioning this effort.19:33
mordredjlk: yah - it's similar but I think possibly opposite in some ways ... that said, we should likely talk to those people and figure out where we might be able to store common ground19:34
mordredjlk: (for them, it's easy to restrict playbook content, which is the thing we can't do) - but I think the list of restrictions and unsafe thigns you want to prevent would likely be pretty much the same19:34
jlkyup19:35
mordredI like bcoca's first response19:35
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Support GitHub PR webhooks  https://review.openstack.org/43983419:37
mordredjlk: thanks - I responded to that pr19:41
openstackgerritPaul Belanger proposed openstack-infra/nodepool master: Check if /etc/apt/apt.conf.d first exists  https://review.openstack.org/44330219:41
mordredjlk: do you think that returning the package list in a fact for the bindep module is the right thing to do? or should we just return values and have people do register?19:50
mordredjlk: (I was just doing your suggestion of making an example of feeding to package)19:50
mordredjlk: pushed updates19:56
jlkRegister probably, as it seems like a strange thing to pollute the fact space with? dunno.20:11
mordredyah20:17
pabelangerleft a comment, but list and register is fine with me20:19
jlkhuh, why are we getting OOM in tests?20:19
pabelangerunless we want to get fancy and have the bindep task pip it into the package task20:19
pabelangerjlk: ya we have a workaround20:20
pabelangermordred: jlk: https://review.openstack.org/#/c/443221/20:20
pabelangermind a review20:20
jlkoh taht.20:20
jlkI already reviewed20:20
mordredjlk: +A20:20
jlkdon't have those rights :(20:20
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Cap GitPython at 2.1.1 due to performance degradation  https://review.openstack.org/44330620:21
jlk(I don't think I should either yet)20:21
mordredgah20:21
mordredpabelanger: +A I mean20:21
pabelangermaster patch^20:21
pabelangermordred: danke20:21
mordredjlk, pabelanger: ok. I updated the PR to not use facts - and it now includes an appropriate example20:23
jlkthere.20:24
jlkwhoh, github now has a "changes since you last looked" button type thing20:24
jlkand it seems to show the right stuff20:24
mordredoh neat20:24
pabelangermordred: Yay, already have a role lined up to use it20:25
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Cap GitPython at 2.1.1 due to performance degradation  https://review.openstack.org/44322120:27
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Allow using webapp from connections  https://review.openstack.org/43983120:28
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Support GitHub PR webhooks  https://review.openstack.org/43983420:29
mordredpabelanger: nice work on your reference to the issue in the zuul patch :)20:30
pabelangermordred: thanks, but that was jeblair who linked it\20:31
mordredoh - so it was20:32
jeblairmordred: that was partly for our reference, and partly to ensure it ended up in the github event stream for the issue.  :)20:48
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove extra debugging around git repo leaks  https://review.openstack.org/44331720:52
mordredjeblair: yup.  it worked20:52
*** eggshell has quit IRC21:05
*** eventingmonkey has quit IRC21:05
*** eggshell has joined #zuul21:06
*** eventingmonkey has joined #zuul21:06
*** hashar has joined #zuul21:08
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Add basic Github Zuul Reporter.  https://review.openstack.org/44332321:09
*** eggshell has quit IRC21:20
*** eventingmonkey has quit IRC21:20
*** eggshell has joined #zuul21:26
*** eventingmonkey has joined #zuul21:27
jheskethMorning21:29
mordredlook it's a jhesketh21:37
openstackgerritMerged openstack-infra/nodepool master: Remove ubuntu-precise from AFS mirror list  https://review.openstack.org/44329522:34
openstackgerritMerged openstack-infra/nodepool master: Check if /etc/apt/apt.conf.d first exists  https://review.openstack.org/44330222:36
jamielennoxeek, that gitpython thing is horrible, wtf is a library doing invoking python's gc?23:08
*** saneax-_-|AFK is now known as saneax23:21
*** hashar has quit IRC23:34
mordredjamielennox: this is amongst today's many horrors23:41
mordredjamielennox: if that is concerning, I recommend you definitely do not look at the library that implements an LRU mmap thing in python23:42
jamielennoxmordred: it's still morning here, don't ruin all the surprises for me23:42
jamielennoxthere's a library for that?23:42
mordredjamielennox: oh yeah23:42
mordredjamielennox: https://github.com/gitpython-developers/smmap23:43
mordredjamielennox: enjoy the limitations listed in the readme23:43
jamielennoxthis may be obvious - but why does this exist?23:44
mordredjamielennox: well, that's easy .... alsjdnc93w4un 0384fnoain3408nvclsairnvasr34unfvclasdna23:44
mordred(to be fair, the gitpython library has served us well for a few years now, so it's doing an amount of tasks well - but my mind definitely boggles when I see smmap and gc.collect() in a library)23:45
jamielennoxso we're looking at the libgit2 bindings or going to limp it through a while23:46
jamielennox?23:46
mordredwe've submitted a bug, and the pin to the earlier version is making this go away - so I think for now just dealing with it23:46

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!