Tuesday, 2017-04-18

*** jkilpatr has quit IRC01:53
*** jkilpatr has joined #zuul02:01
*** TheJulia has quit IRC02:26
*** TheJulia has joined #zuul02:29
*** dkranz has quit IRC03:12
*** jamielennox is now known as jamielennox|away05:12
*** jamielennox|away is now known as jamielennox05:17
*** jkilpatr has quit IRC06:33
*** jkilpatr has joined #zuul06:38
*** jamielennox is now known as jamielennox|away07:34
*** jamielennox|away is now known as jamielennox08:28
*** openstackgerrit has quit IRC08:33
*** pbelamge has joined #zuul08:51
mordredjamielennox, jeblair: catching up on scrollback from meeting yesterday - another use case to keep in mind for "more support for username in zuul.conf" is people using zuul without having nodepool build images ... this is not implemented yet, but is a thing we've discussed being a thing, and setting a username will likely be useful for that08:55
jamielennoxmordred: yep, i can see a case where people want to use the distro supplied cloud images10:09
jamielennoxbut that would be different per image10:10
mordredjamielennox: exactly10:11
tobiashso nodepool will tell zuul which user to use?10:13
jamielennoxtobiash: the direction for now was just to give you the default option in zuul.conf and nodepool when we have a real use case10:14
tobiashis someone working on support for 'external' images in nodepool which don't get rebuilt?10:23
tobiashif not I hopefully have time for that in the next few weeks10:23
mordredtobiash: I do not believe anyone is working on that yet - no10:23
mordredtobiash: hopefully it won't be _super_ hard10:23
mordredsince it'll mostly be about not doing thing10:23
mordredthings10:23
tobiashI have such a use case (windows images)10:24
mordredah - that's an excellent use case10:24
tobiashcurrently I'm using windows images with an old nodepool with snapshot approach10:25
tobiashbut that's ugly as I'm not changing anything during that snapshotting...10:25
mordredyah. that is ugly :(10:25
tobiashwith zookeeper enabled nodepool this won't work anymore so externally managed images seem promising10:25
mordredI think also for some users, the cloud provided image would be fine - if there is no desire to cache things in the images then the urge to rebuild them may be lower10:26
mordredso the build/upload step is just extra complexity for some cases10:26
tobiashyepp10:26
tobiashhm, image rebuild is tightly coupled to diskimage builder right?10:27
mordredit is10:28
tobiashI'm thinking of a possible use case like ansible managed images (could support 'building' of windows images)10:28
mordredalthough with the new zookeeper approach it should _also_ be easier for someone to write a different driver for that10:28
tobiashpossibly running an ansible script instead could be cool10:29
mordredI mean, it shouldn't be that hard to make dib be able to handle windows images - pabelanger added the ability to run ansible playbooks as part of an element - and dib can already download a base image from somewhere10:29
mordredpabelanger: ^^ might be a fun use case to look at for your ansible+dib stuff10:30
tobiashhm, how would that work for windows?10:30
tobiashdownload base image, create local vm (libvirt) run ansible against that?10:31
mordredwell - yah - it would likely have to use local vm - the normal process for dib is to download base image then create a chroot for it10:32
mordredand ansible has a chroot connection type, so the ansible playbook support can be pretty lightweight10:32
mordredhowever, chroot into a windows image will not work well :)10:32
tobiashnot really ;)10:32
tobiashmaybe also the cloud itself could help there10:33
tobiashinstead of running the local vm10:33
mordredyah - this is maybe the reason we haven't gotten too far down this path yet10:33
mordredsince doing a remote modification and snapshot would make much more sense here10:33
mordredbut I think enough people want unmodified images that adding support for _that_ is likely the biggest win10:34
mordredsince then if you want to manage a few seldom-changing images with something like packer or any other tool, you can do that without nodepool needing to know10:34
tobiashyes, I think an easier way for that would be an externally managed image + jobs which update the images (easy via ansible)10:35
mordredyup10:35
mordredand as soon as we find a person who needs windows images AND wants to have them updated daily like nodepool does and wants nodepool to manage that - well, then we can solve that issue :)10:36
tobiashas this is not a real standard use case I can live very well with updating windows images outside of nodepool10:37
tobiashbut that dib+ansible stuff makes me curious10:38
tobiashhave to look at that10:38
tobiashI have all stuff of our ci deployment in ansible so dib seems a bit 'unnatural' in my env10:40
*** jkilpatr has quit IRC10:40
tobiashcombining dib with ansible for the linux nodes sounds really interesting :)10:40
mordredyah - I agree - I'm a little sad I haven't had more time to play with it directly and also maybe to write some blog posts about it10:45
tobiashmordred: you mean this one? https://review.openstack.org/#/c/38560810:51
mordredtobiash: yes. I think that's it - I guess it hasn't landed yet :)10:52
tobiashnope10:52
* mordred will add that to his list of things to go argue about10:53
*** jkilpatr has joined #zuul11:14
*** hashar has joined #zuul11:25
*** rcarrillocruz has quit IRC11:34
*** yolanda has quit IRC11:44
*** yolanda has joined #zuul11:44
*** yolanda has quit IRC11:49
*** yolanda has joined #zuul11:50
*** yolanda_ has joined #zuul11:56
*** yolanda has quit IRC11:58
*** yolanda_ has quit IRC12:19
*** yolanda_ has joined #zuul12:19
*** yolanda_ has quit IRC12:31
*** yolanda_ has joined #zuul12:31
pabelangermordred: ah yes, it would be great to get some feedback from ansible users. I've run into a snag on that review12:41
*** yolanda_ has quit IRC12:43
*** yolanda_ has joined #zuul12:44
*** yolanda_ has quit IRC13:04
*** yolanda_ has joined #zuul13:05
*** yolanda_ is now known as yolanda13:05
*** gundalow has quit IRC13:06
*** gundalow has joined #zuul13:07
*** dmsimard|afk is now known as dmsimard13:15
*** dkranz has joined #zuul13:18
*** openstackgerrit has joined #zuul13:41
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Allow for prefixing job dir names  https://review.openstack.org/45668513:41
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Initial code for a fingerd log streamer  https://review.openstack.org/45672113:41
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Allow for specifying root job directory  https://review.openstack.org/45669113:41
Shrewsjeblair: mordred: I think those reviews ^^^ actually make the log streamer work now. I don't have a working zuul setup to actually test, but running the zuul-log-streamer alone (and manually creating the log) works. Will now look at the 2nd option of integrating with the executor.13:45
mordredShrews: awesome13:47
ShrewsInitial study of safe ways to do privilege deescalation in python makes me sad though13:47
mordred:(13:47
Shrewsmordred: that deescalate library you noted yesterday scares me a bit. a 0.1 release with 0 contributors and only 22 commits13:55
Shrewsso probably safe to say it's not widely tested/used13:55
Shrewslast commit Aug 201513:56
*** pbelamge has quit IRC13:57
mordredShrews: yah - also, it's basically just using pyrex to hit the C versions of the capabilities calls14:00
mordredShrews: I think if anything we'd wind up doing something similar ourselves14:00
Shrewslooking into the daemon.DaemonContext options now. that might be an easy way to do this14:01
Shrewsthere are uid and gid options for that14:01
*** isaacb has joined #zuul14:07
*** isaacb has quit IRC14:13
*** isaacb has joined #zuul14:13
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Initial code for a fingerd log streamer  https://review.openstack.org/45672114:37
Shrewsadded code ^^^ to drop permissions to 'zuul' user before handling the request14:37
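The permission drop Shrews describes can be sketched with the standard library alone. This is a minimal sketch, not the code from review 456721: the function name `drop_privileges` and its `getpwnam`/`osmod` parameters are hypothetical, and are injectable only so the logic can be exercised without root. It assumes a Unix host and a process that starts as root.

```python
import os
import pwd


def drop_privileges(username, getpwnam=pwd.getpwnam, osmod=os):
    """Switch the current process to `username` if it is running as root.

    Returns True if privileges were dropped, False if the process was
    not root to begin with (so there was nothing to drop).
    """
    if osmod.geteuid() != 0:
        return False
    entry = getpwnam(username)
    # Order matters: groups and gid must change while we are still root.
    # setuid comes last because it cannot be undone afterwards.
    osmod.setgroups([])
    osmod.setgid(entry.pw_gid)
    osmod.setuid(entry.pw_uid)
    return True
```

Calling `drop_privileges('zuul')` before handling a request would match the behaviour described above, assuming a 'zuul' account exists on the host.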
jeblairclarkb, jlk, SpamapS: the unit test failures in the gate happen on osic where we recently switched to 4vcpus.  all the other providers are still 8.14:38
jeblairi think that explains the change in behavior14:39
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (1/2)  https://review.openstack.org/45336214:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace config/project repos with config/untrusted projects  https://review.openstack.org/45334714:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add hostname to TriggerEvent  https://review.openstack.org/45234814:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (2/2)  https://review.openstack.org/45382114:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Extend test timeout to 120s  https://review.openstack.org/45480614:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fully qualify project configuration names  https://review.openstack.org/45197014:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add source to project and remove unused tenant attrs  https://review.openstack.org/45196914:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Pass source to project instantiations  https://review.openstack.org/45159614:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add a project index to Tenant  https://review.openstack.org/45159714:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove unused Tenant.getRepo method  https://review.openstack.org/45192914:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use new tenant project index for config references  https://review.openstack.org/45192814:46
*** isaacb has quit IRC14:59
*** isaacb has joined #zuul15:26
*** isaacb_ has joined #zuul15:28
*** isaacb has quit IRC15:31
jeblairi rebased that stack on a timeout increase, and it passed all of the times it ran in osic.  it failed 1/2 in vanilla and 1/3 in internap.15:32
*** rcarrillocruz has joined #zuul16:05
*** rcarrillocruz has quit IRC16:07
*** rcarrillocruz has joined #zuul16:07
*** isaacb_ has quit IRC16:08
*** hashar has quit IRC16:12
*** rcarrillocruz has quit IRC16:18
*** rcarrillocruz has joined #zuul16:18
jeblairclarkb, jlk: i have found one inter-test leak; working on patch/test now16:50
*** harlowja_ has joined #zuul16:50
*** harlowja has quit IRC16:52
SpamapSjeblair: so longer timeout made it fail on 8cpu boxes, but pass on 4cpu?17:01
jeblairSpamapS: i don't think the longer timeout hurt the 8vcpu; we had many more successes than failures there.17:03
jeblair(there were 8vcpu successes on other providers i didn't mention)17:05
SpamapSjeblair: so is there any insight into the root cause? I can't think of one.17:14
mordredSpamapS: "jeblair | clarkb, jlk: i have found one inter-test leak; working on patch/test now" perhaps that?17:16
jeblairi think we're doing significantly more work in the tests, especially with git repos, which makes them very susceptible to slowdowns.  i think that the leak i found (there may be more leaks) exacerbates the problem by causing cascading failures.17:17
SpamapSAhhhhhhh I didn't see that.17:18
SpamapSjeblair: right, it goes slow, another one that uses something shared runs at the same time, and boom, race?17:18
jeblairi think there are some things we can do to greatly reduce the amount of git repo work being done in tests which i think will help.  we took some shortcuts enabling some of the tests.  we're going to have to take the long way round to improve it.17:19
SpamapSoh that sounds interesting. :)17:19
jeblairi'll try to have a patch up to explain what i've found soon17:22
*** jkilpatr has quit IRC17:22
*** jkilpatr has joined #zuul17:38
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop jobs when executor stops  https://review.openstack.org/45775318:04
jeblairSpamapS, clarkb, mordred, jlk: ^ give that a once over please and let me know if it makes sense.18:04
jeblairi don't think it will solve all our problems, but i think it does clear up at least one mystery, and will help us pinpoint actual issues in tests rather than being awash in unrelated failures.18:05
jeblairin http://logs.openstack.org/62/453362/14/check/gate-zuul-python27-ubuntu-xenial/6bb6d65/testr_results.html.gz the first failure is an actual racy test.  all the other failures are due to the leak from that first test.18:09
jeblairso next up -- fixing that race :)18:09
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Add a finger protocol log streamer  https://review.openstack.org/45672118:23
SpamapSjeblair: the premise makes perfect sense to me.18:25
*** hashar has joined #zuul18:29
mordredjeblair: what an exciting patch!18:33
*** jkilpatr_ has joined #zuul18:36
*** jkilpatr has quit IRC18:36
jeblairi guess to clarify, that leak is only between tests within a single run of a single test runner.  it's not a resource leak across test run invocations, which is something that folks were theorizing earlier.18:42
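A leak of this kind, where something started by one test is still alive when the next test in the same runner begins, can be checked for by comparing the set of live threads before and after a test. The sketch below is illustrative only and is not the assertion added in review 457753; the helper name `leaked_threads` is hypothetical.

```python
import threading


def leaked_threads(before):
    """Return threads that are alive now but were not in `before`.

    `before` is a collection of Thread objects captured at test setup,
    for example set(threading.enumerate()).
    """
    return [t for t in threading.enumerate()
            if t not in before and t.is_alive()]
```

A test fixture could capture `set(threading.enumerate())` during setup and fail the test during teardown if `leaked_threads(...)` is non-empty, so the failure is attributed to the test that leaked rather than to whichever test runs next.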
jeblairbut it's definitely an issue.  :)18:42
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Allow for prefixing job dir names  https://review.openstack.org/45668519:33
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Allow for specifying root job directory  https://review.openstack.org/45669119:55
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Add a finger protocol log streamer  https://review.openstack.org/45672119:55
clarkbjeblair: is the failure on 457753 one of the races you are talking about?20:11
clarkblooks like only a single test failed. I am going to recheck to see if its consistent which should help in reviews20:11
jeblairclarkb: just started looking at that.  it looks like it's a new failure uncovered by the additional assertion.  i'm still digging and not sure what to think of it yet.  :)20:12
clarkbjeblair: in the new stop() method I think you may have to wait for settled?20:18
clarkbas the release isn't instantaneous?20:18
clarkbnevermind I see that the super call in stop() is going to forcefully stop things20:22
jeblairclarkb: the superclass stop() waits on the thread, so the executor should be completely done with it by the time it returns20:22
*** openstackgerrit has quit IRC20:33
*** jkilpatr_ has quit IRC20:34
clarkbjeblair: based on the end of the logs there I think the apschedulers aren't being cleared out by the replacement of the pipeline20:36
clarkbjeblair: so the code that is guarding against a race in apscheduler scheduling new things may not be sufficient20:37
clarkbneed to make sure the apscheduler threads die too20:37
jeblairclarkb: i think we expect the apscheduler to continue; we just remove the jobs from the pipeline so it doesn't schedule any more.20:38
clarkbya reading _removeJobs now.20:40
SpamapSI'm constantly calling .shutdown() on apscheduler when pdb'ing20:43
clarkbcurrently not clear to me what the difference is between postConfig and reconfigure20:44
SpamapSin fact I usually just prefix pdb.set_trace() with x.y.z.apsched.shutdown()20:44
clarkbas it appears postConfig is what actually does that work of a reconfiguration event, but timer has no postConfig?20:44
jeblairi think the error in that test is because waitUntilSettled doesn't wait until the AnsibleJob is "finalized" which is what we look for in the new assertion20:47
clarkbaha reconfigureDrivers is called a little earlier. /me starting to make sense of this20:47
jeblairso i'm adding a check for that to waitUntilSettled20:47
jeblairi think that should take care of it20:47
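The idea of adding one more condition to a "wait until settled" helper can be sketched as a polling loop over named checks. This is a generic illustration, not Zuul's `waitUntilSettled`; the function name `wait_until_settled`, its parameters, and the condition names are all assumptions.

```python
import time


def wait_until_settled(conditions, timeout=10.0, interval=0.05):
    """Poll until every check in `conditions` returns True.

    `conditions` maps a human-readable name to a zero-argument callable.
    Raises TimeoutError naming the checks that are still pending.
    """
    deadline = time.monotonic() + timeout
    while True:
        pending = [name for name, check in conditions.items() if not check()]
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("not settled: " + ", ".join(pending))
        time.sleep(interval)
```

Under this shape, waiting for jobs to be finalized is just one more entry in `conditions`, and a timeout reports exactly which condition never became true.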
clarkbyou'll still get the error logged about apscheduler threads. Would it be better to remove those if no longer needed or maybe whitelist them so they don't generate errors in the logs?20:49
*** openstackgerrit has joined #zuul20:49
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop jobs when executor stops  https://review.openstack.org/45775320:49
jeblairthat's a lot of apscheduler threads20:49
clarkbI think it would work to call apsched.stop() after remove jobs. Then start if not already started and at least one job is added in after?20:51
* clarkb checks apscheduler api20:51
clarkbstop is not documented20:52
clarkboh I may have made that up20:52
clarkblooks like pause may be a thing20:53
jeblairwell, i don't think it's critical20:53
jeblair(would be nice to tidy up)20:53
*** jkilpatr has joined #zuul20:55
*** hashar has quit IRC20:58
openstackgerritClark Boylan proposed openstack-infra/zuul feature/zuulv3: Don't run apsched thread when no jobs need scheduling  https://review.openstack.org/45779821:00
clarkbI have no idea if that will actually work21:00
clarkband ya not really necessary since the apsched thread will be reused once new jobs that need timer scheduling show up21:00
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix race in test_queue_rate_limiting_dependent  https://review.openstack.org/45779921:02
jeblairthat fixes the race that was buried in the thread leak errors from earlier ^21:03
jeblairclarkb: i think your initializer should be False instead of True21:04
*** _ari_ is now known as _ari_|gone21:09
clarkbjeblair: because startup implies reconfigure?21:14
jeblairclarkb: oops, i'm wrong, i missed the start() there.  looks right.21:17
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix race in test_queue_rate_limiting_dependent  https://review.openstack.org/45779921:48
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop jobs when executor stops  https://review.openstack.org/45775321:48
jeblairclarkb: i restacked those on the timeout bump21:49
clarkbok gonna tank up on caffeine and then review21:49
jeblairi'm going to perform work to reduce the excessive git repo activity after my multi-source patches because there will be significant conflicts21:50
clarkbjeblair: basically you are going to reduce tests down to required subset of repos?22:02
jlkWas there any more info/progress in test instability?22:07
clarkbjlk: https://review.openstack.org/#/c/457799/2 that stack is what jeblair has put together to start fixing things22:09
jeblairclarkb: yeah22:09
clarkbjeblair: executor one failed pep8 on uuid being shadowed22:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (1/2)  https://review.openstack.org/45336222:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace config/project repos with config/untrusted projects  https://review.openstack.org/45334722:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add hostname to TriggerEvent  https://review.openstack.org/45234822:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (2/2)  https://review.openstack.org/45382122:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fully qualify project configuration names  https://review.openstack.org/45197022:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add source to project and remove unused tenant attrs  https://review.openstack.org/45196922:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Pass source to project instantiations  https://review.openstack.org/45159622:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add a project index to Tenant  https://review.openstack.org/45159722:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove unused Tenant.getRepo method  https://review.openstack.org/45192922:09
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use new tenant project index for config references  https://review.openstack.org/45192822:09
jeblairclarkb: pep8 fixed along with the rebase22:10
clarkbjeblair: the current ps is failing, did the fix get pushed?22:10
jeblairhuh, did i miss that?22:11
jeblairdrat22:11
clarkbI left inline comment with details22:11
clarkboh it got into the next change22:12
clarkbhttps://review.openstack.org/#/c/457799/2/tests/base.py is where its fixed22:12
jeblairwhoops22:12
jeblairthanks for finding that :)22:12
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Stop jobs when executor stops  https://review.openstack.org/45775322:13
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (1/2)  https://review.openstack.org/45336222:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace config/project repos with config/untrusted projects  https://review.openstack.org/45334722:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix race in test_queue_rate_limiting_dependent  https://review.openstack.org/45779922:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add hostname to TriggerEvent  https://review.openstack.org/45234822:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (2/2)  https://review.openstack.org/45382122:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fully qualify project configuration names  https://review.openstack.org/45197022:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add source to project and remove unused tenant attrs  https://review.openstack.org/45196922:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Pass source to project instantiations  https://review.openstack.org/45159622:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add a project index to Tenant  https://review.openstack.org/45159722:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove unused Tenant.getRepo method  https://review.openstack.org/45192922:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use new tenant project index for config references  https://review.openstack.org/45192822:15
openstackgerritClark Boylan proposed openstack-infra/zuul feature/zuulv3: Don't run apsched thread when no jobs need scheduling  https://review.openstack.org/45779822:19
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (1/2)  https://review.openstack.org/45336222:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace config/project repos with config/untrusted projects  https://review.openstack.org/45334722:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add hostname to TriggerEvent  https://review.openstack.org/45234822:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (2/2)  https://review.openstack.org/45382122:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fully qualify project configuration names  https://review.openstack.org/45197022:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add source to project and remove unused tenant attrs  https://review.openstack.org/45196922:53
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add a project index to Tenant  https://review.openstack.org/45159722:53
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove unused Tenant.getRepo method  https://review.openstack.org/45192922:53
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use new tenant project index for config references  https://review.openstack.org/45192822:53
jheskethMorning22:58
