Saturday, 2017-09-30

00:13 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle build_set being None for priority  https://review.openstack.org/508634
00:14 *** harlowja has quit IRC
00:14 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle build_set being None for priority  https://review.openstack.org/508634
00:14 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Protect against builds dict changing while we iterate  https://review.openstack.org/508629
01:40 <SpamapS> ooo I wish I hadn't been in meetings all day, I'd love to do the load average limiter patch
01:41 <SpamapS> fungi: still working on it?
01:44 <fungi> SpamapS: i never even really got off the ground with it--if you want to take it, all yours!
01:45 <fungi> there's some discussion in here on a viable direction for it, at least
01:45 <fungi> if you haven't already caught up
01:46 <SpamapS> I did see that
01:46 <SpamapS> I'm wondering if we can just go simpler and limit things with a thread pool.
01:47 <SpamapS> if ansible jobs are the source of load and RAM usage, then limiting concurrency seems like the way to go.
01:50 <fungi> and just let the remaining jobs pile up in gearman until an executor has available threads again? i guess that would be the result
01:51 <SpamapS> yep
01:51 <SpamapS> but it's easier to just make a thread pool than monitor load
01:52 <SpamapS> and the way gearman works, busier executors will always respond slower than idle ones if the concurrency hasn't been all used up
01:55 <openstackgerrit> John L. Villalovos proposed openstack-infra/zuul feature/zuulv3: Fix pep8 error  https://review.openstack.org/508643
02:39 *** harlowja has joined #zuul
02:55 *** harlowja has quit IRC
03:00 <SpamapS> hm actually no
03:00 <SpamapS> if I just have a thread pool for jobs, the executor server will slurp all of the jobs in.
03:01 <SpamapS> it's simpler than that anyway. I can have a counter for active jobs and deregister/register when it crosses the concurrency threshold
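(A minimal sketch of that counter-plus-deregister/register idea, assuming a gear.Worker-style API; the class and method names here are illustrative, not the actual zuul-executor code, and unRegisterFunction is assumed from the gear library:)

    # Sketch only: count active jobs and unregister the expensive
    # gearman function at a threshold, re-registering when running
    # jobs drop back below it.
    import threading

    MAX_JOBS = 5  # the threshold SpamapS mentions below

    class ConcurrencyGovernor:
        def __init__(self, worker):
            self.worker = worker            # assumed gear.Worker
            self.lock = threading.Lock()
            self.running = 0
            self.accepting = True

        def job_started(self):
            with self.lock:
                self.running += 1
                if self.accepting and self.running >= MAX_JOBS:
                    # stop advertising the expensive function
                    self.worker.unRegisterFunction('executor:execute')
                    self.accepting = False

        def job_finished(self):
            with self.lock:
                self.running -= 1
                if not self.accepting and self.running < MAX_JOBS:
                    self.worker.registerFunction('executor:execute')
                    self.accepting = True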
03:11 <jeblair> SpamapS: that works for a max job count, but we were thinking that load average might be more adaptive
03:11 <jeblair> (like, actual system load average)
03:13 <SpamapS> jeblair: It is, but it's also more complicated. ;)
03:14 <SpamapS> now that I'm digging in
03:15 <SpamapS> it's not thaaaat much more complicated
03:16 <SpamapS> I have it working where it will deregister when it has more than 5 jobs running, and re-register when it drops below 5
03:17 <SpamapS> jeblair: a concurrency limit will also help control memory in a coarse kind of way.
03:17 <SpamapS> but if we can poll load average, we can poll free
03:18 <SpamapS> still I'm inclined to start with this and see how it goes.
03:18 <SpamapS> as much because I'm about to get home and I won't be coding for about 48 hours after that. ;)
03:21 <openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add a concurrency limit to zuul-executor  https://review.openstack.org/508649
03:21 <SpamapS> anyway, I may hack on a load average based one later or something
03:21 * SpamapS afk for a bit
03:22 <jeblair> SpamapS: i'm not opposed to more tunable limits; i think what i'd like the default to be though is automatic based on load average.  i'm not a fan of systems that make sysadmins guess tunable parameters when we have computers that can do it for them.  but we can build on that.  :)
03:22 <jeblair> i love ndb.  i'd love it even more if it ran that perl script for me before starting.  ;)
03:44 <SpamapS> hah yeah
03:45 <SpamapS> jeblair: I love adaptable systems too. Rarely do I get to write one. :) Sitting on a bus now, will see what flies out of me.
03:46 <SpamapS> I think the right thing is to just check load before getJob
03:46 <SpamapS> if load is too high, sleep a bit and check again.
03:49 <jeblair> SpamapS: thing is we need to keep getting jobs other than execute:execute though.  especially execute:stop
03:50 <jeblair> well, i guess that's the only other one.
03:50 <jeblair> but it is important.  :)
03:50 <SpamapS> oh right
03:50 <SpamapS> not sleep, unregister
03:50 <SpamapS> so check load... if too busy, unregister expensive
03:50 <SpamapS> I think that's the ticket
03:51 <SpamapS> note that there's still a tunable required
03:51 <SpamapS> Which is "what's an OK load?"
03:51 <jeblair> yeah, the load average.  we could set it to nproc*3
03:51 <jeblair> by default
03:51 <SpamapS> what's the 3 coming from?
03:51 <jeblair> swag
03:52 <SpamapS> +1
03:54 <jeblair> looking at http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=63999&rra_id=1&view_type=&graph_start=1506660317&graph_end=1506742125&graph_height=120&graph_width=500&title_font_size=10 makes it look like 20 is the magic number for those servers
03:54 <jeblair> so maybe nproc*2.5 :)
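(For reference, a hedged sketch of how that nproc-based default could be computed; the LOAD_MULTIPLIER name and the choice of the 1-minute average are illustrative, with 2.5 being jeblair's "swag":)

    # Sketch: derive a load-average ceiling from the CPU count, per
    # the nproc*2.5 guess above. os.getloadavg() returns the 1-, 5-,
    # and 15-minute averages.
    import multiprocessing
    import os

    LOAD_MULTIPLIER = 2.5  # tunable; illustrative name

    def overloaded():
        max_load = multiprocessing.cpu_count() * LOAD_MULTIPLIER
        return os.getloadavg()[0] > max_load  # 1-minute average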
03:58 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP Do not add implied branch matchers in project-templates  https://review.openstack.org/508658
03:59 <jeblair> clarkb, mordred: ^ i *think* that's 99% there; i think the logic is correct; i just need to write a commit message and do a little cleanup from a previous attempt.  should be able to do that tomorrow.
04:01 <SpamapS> whoa that was weird
04:01 <SpamapS> I just did git review -d 508649, and got the content from 508658
04:02 <SpamapS> almost like it collided with you uploading 508658
04:18 <SpamapS> bah
04:18 <SpamapS> if I do it only before getJob... we stay doing nothing until a cancel/cat/merge comes in
04:18 <SpamapS> need a thread and I think a lock on the client :-P
04:20 <clarkb> can you do it without a thread as a noop handler?
04:20 <clarkb> iirc the server sends those gratuitously to wake workers?
04:21 <clarkb> or maybe it was a different request
04:21 <SpamapS> no
04:21 <SpamapS> well
04:22 <SpamapS> yeah
04:22 <SpamapS> NOOP is what the server sends to say "Hey, you say you can do this, wake up and GRAB_JOB"
04:22 <SpamapS> so yes
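(A hedged illustration of that sleep/wake cycle using the gear library's worker API; the server hostname is made up, and unRegisterFunction is assumed from gear's Worker class:)

    # Sketch of the worker side of the gearman sleep/wake exchange
    # described above, using the gear library.
    import gear

    worker = gear.Worker('zuul-executor')
    worker.addServer('gearman.example.org')   # hypothetical server
    worker.registerFunction('executor:execute')

    # getJob() sends GRAB_JOB; if the server answers NO_JOB, the
    # worker sends PRE_SLEEP and blocks until a NOOP wakes it to
    # send GRAB_JOB again.
    job = worker.getJob()

    # Once the function is unregistered, the server has nothing to
    # wake this worker for, so no more NOOPs arrive -- the problem
    # SpamapS notes just below.
    worker.unRegisterFunction('executor:execute')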
04:22 <SpamapS> this delay thing is interesting
04:22 <SpamapS> Not sure why that's there.
04:23 <SpamapS> I mean I know why it says it is there.
04:23 <SpamapS> but that seems unnecessary. The delay should already be happening by virtue of the fact that the less busy servers should respond faster.
04:26 <clarkb> the problem is the job cost is delayed itself
04:26 <SpamapS> clarkb: so the problem is once we've unregistered from work, we won't get NOOPs anymore.
04:27 <clarkb> so they will more slowly grab jobs
04:27 <clarkb> but that doesn't matter when load is 80 because ansible
04:27 <clarkb> SpamapS: aha
04:27 <SpamapS> and gearman will send us jobs for anything we're registered for, so if we don't unregister, getJob will still assign us jobs
04:27 <SpamapS> so we need something that periodically checks to see that we're ready for more work
04:28 <clarkb> but ya if you grab 50 jobs and load skyrockets you'll just continue to add on, but more slowly
04:28 <SpamapS> yeah it's untenable this way
04:29 <SpamapS> so a thread that just sleeps and goes "am I taking work? If not, is load low enough to take more work? If yes, register" every few seconds seems like the right thing.
04:30 <SpamapS> but that also gets into "are gear.Workers thread safe?"
04:31 <SpamapS> because the worker is likely to be in getJob()
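(Putting those pieces together, the polling thread being described might look like the following; names are illustrative, and it assumes the register calls are safe to make from another thread, which is exactly the open question above:)

    # Sketch: a background "governor" thread that periodically
    # compares load average to the nproc-based threshold and
    # (un)registers the expensive function accordingly.
    import multiprocessing
    import os
    import threading

    LOAD_MULTIPLIER = 2.5  # illustrative default

    def governor(worker, stop_event, interval=10):
        accepting = True
        max_load = multiprocessing.cpu_count() * LOAD_MULTIPLIER
        # wait() returns False on timeout, True once the event is set
        while not stop_event.wait(interval):
            load = os.getloadavg()[0]
            if accepting and load > max_load:
                worker.unRegisterFunction('executor:execute')
                accepting = False
            elif not accepting and load <= max_load:
                worker.registerFunction('executor:execute')
                accepting = True

    # usage (sketch):
    #   stop = threading.Event()
    #   threading.Thread(target=governor, args=(worker, stop)).start()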
04:32 <SpamapS> I wonder if we could just make the tunable "concurrency_factor" and basically say "multiply this times nproc to get the concurrent jobs"?
04:32 <SpamapS> because controlling the # of concurrent jobs is pretty easy
04:32 <SpamapS> since we're always going to be in control when jobs are started or finished.
04:34 <SpamapS> that also feels like a pretty reasonable factor to expect sites to tune as they optimize. We can make the default roughly what infra sees on its executors, but other sites may have very different jobs
04:34 <SpamapS> It's also a bit safer, since taking on jobs while load is low and we're just waiting for a bunch of devstack runs may backfire if the devstack runs all finish at once and we're doing 50 concurrent rsyncs.
04:36 <SpamapS> we could even get good at tracking job cost eventually, and go not by the count of jobs but by the count of expected job execution seconds/iops/etc.
04:37 <SpamapS> anyway.. home now... will ponder more. If we see it getting out of control, the patch I made will at least give us a governor
04:48 <SpamapS> Actually I'll just check load in finish and start. I can make a special exception never to unregister in finish if it would take me below 1 job running.
04:48 * SpamapS may not sleep until this is implemented
07:46 *** xinliang has quit IRC
07:59 *** xinliang has joined #zuul
09:31 *** bhavik1 has joined #zuul
09:35 *** bhavik1 has quit IRC
09:35 *** xinliang has quit IRC
11:11 *** huangtianhua has quit IRC
13:15 *** zhuli has quit IRC
14:31 <openstackgerrit> Jeremy Stanley proposed openstack-infra/zuul-jobs master: Yet still more fix post log location  https://review.openstack.org/508684
15:04 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle build_set being None for priority  https://review.openstack.org/508634
15:06 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Yet still more fix post log location  https://review.openstack.org/508684
15:06 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Protect against builds dict changing while we iterate  https://review.openstack.org/508629
15:28 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Make fetch-tox-output more resilient  https://review.openstack.org/508563
15:49 <jeblair> SpamapS: i was about to say "oh it's totally safe to call register functions from another thread", and in fact we do exactly that in zuul v2.5
15:49 <jeblair> SpamapS: i went to double check that though, and i did find that we may want this: remote:   https://review.openstack.org/508698 Add a send lock to the base Connection class
15:50 <jeblair> SpamapS: zuul v2.5 was not using ssl, but v3 is.  so we were very unlikely to see a problem with that in v2, but somewhat more likely in v3.
15:52 <jeblair> SpamapS: also, a downside to only checking at the start/end of jobs: if we unregister, and a bunch of jobs then finish while load is still high, we stay unregistered; if the load then drops but it's an hour until the next job finishes, we may end up substantially underutilized.  so i favor something that checks more regularly.
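(The gist of that send-lock change, as an illustration rather than the actual 508698 patch: serialize writes on the connection so packets sent from different threads can't interleave on the socket; the LockedConnection wrapper and method names are hypothetical:)

    # Sketch of the idea behind a send lock on the connection: two
    # threads (say, a job thread completing work and a governor
    # thread re-registering) must not interleave bytes on the same
    # socket.
    import threading

    class LockedConnection:
        def __init__(self, conn):
            self.conn = conn
            self.send_lock = threading.Lock()

        def sendRaw(self, data):
            # one writer at a time; matters even more over SSL
            with self.send_lock:
                self.conn.sendRaw(data)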
15:55 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Make fetch-tox-output more resilient  https://review.openstack.org/508563
16:13 <SpamapS> jeblair: yeah, in fact finishJob sends work complete from another thread, from what I see.
16:14 <jeblair> good point, we're already playing the odds
16:15 <SpamapS> Yeah so I think I can just start a thread that checks load every few seconds and registers or unregisters appropriately.
16:16 <SpamapS> And then maybe we can put some armor on gear to make it less of a dice roll.
16:19 <jeblair> SpamapS: https://review.openstack.org/508698 should be armor
16:19 <SpamapS> I will say also that I'm not sure gearman is the best protocol for this. AMQP has a specific response which is "send this to somebody else, I'm too busy", which would allow us to make per-job cost decisions.
16:21 <SpamapS> Nice!
16:21 <SpamapS> Re: lock
16:24 <SpamapS> oh man, I just discovered mosh the other day.. nice that my irc session never disconnects now
16:42 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Handle build_set being None for priority  https://review.openstack.org/508634
16:48 <openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Limit concurrency in zuul-executor under load  https://review.openstack.org/508649
16:48 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Do not add implied branch matchers in project-templates  https://review.openstack.org/508658
16:50 <SpamapS> jeblair: ^ load based; definitely could end up trying to send at the same time we're sending other stuff, so may add another dice roll. ;)
16:51 <SpamapS> I'm going to test it out in my GD internal zuul.
16:51 <jeblair> i'm never going to stop giggling whenever you say that :)
16:54 *** bhavik1 has joined #zuul
16:54 <openstackgerrit> Merged openstack-infra/zuul-jobs master: Yet still more fix post log location  https://review.openstack.org/508684
16:56 *** bhavik1 has quit IRC
16:56 *** bhavik1 has joined #zuul
16:58 *** bhavik1 has quit IRC
16:58 *** bhavik1 has joined #zuul
17:00 *** bhavik1 has quit IRC
17:07 <SpamapS> jeblair: It's a great GD zuul. ;)
17:19 <openstackgerrit> Clark Boylan proposed openstack-infra/zuul feature/zuulv3: Do not add implied branch matchers in project-templates  https://review.openstack.org/508658
17:57 <openstackgerrit> David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Delete IncludeRole object from result object for include_role tasks  https://review.openstack.org/504238
18:11 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Do not add implied branch matchers in project-templates  https://review.openstack.org/508658
20:39 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Protect against builds dict changing while we iterate  https://review.openstack.org/508629
21:45 <openstackgerrit> Merged openstack-infra/zuul-jobs master: Multi-node: Set up hosts file  https://review.openstack.org/504552
21:45 <openstackgerrit> Merged openstack-infra/zuul-jobs master: Multi-node: Set up firewalls  https://review.openstack.org/504553
21:55 *** mrhillsman has joined #zuul
