Tuesday, 2018-02-06

*** haint has quit IRC00:01
*** haint has joined #zuul00:02
corvusclarkb: if you have a sec to +3 https://review.openstack.org/540965 before you retire for the evening, that would be lovely to have in place tomorrow00:30
* clarkb looks00:30
clarkbdone00:35
corvusi've added executor oom errors to the infra meeting tomorrow because it's looking like this issue may mostly be specific to our deployment.  just mentioning it as a heads up in case it's still interesting to other folks.00:35
openstackgerritMerged openstack-infra/zuul master: Allow a few more starting builds  https://review.openstack.org/54096500:43
*** JasonCL has joined #zuul01:55
*** harlowja has quit IRC02:18
*** jimi|ansible has quit IRC04:39
*** harlowja has joined #zuul04:46
*** harlowja has quit IRC05:11
*** threestrands has quit IRC06:40
*** yolanda has joined #zuul06:43
*** jpena|off is now known as jpena07:53
*** hashar has joined #zuul08:24
AJaegerzuul team, something fishy is happening with zuul and nodepool, see http://grafana.openstack.org/dashboard/db/nodepool and http://grafana.openstack.org/dashboard/db/zuul-status. No root  around in #openstack-infra to investigate.08:39
*** threestrands has joined #zuul09:06
*** threestrands has quit IRC09:21
*** sshnaidm|bbl is now known as sshnaidm|rover09:30
openstackgerritMatthieu Huin proposed openstack-infra/zuul-jobs master: role: Inject public keys in case of failure  https://review.openstack.org/53580309:30
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Clean held nodes automatically after configurable timeout  https://review.openstack.org/53629509:36
tobiashAJaeger: looks like all executors deregistered themselves due to some erratic governor behavior09:47
tobiashhrm, the starting builds doesn't decrease09:48
tobiashso there might be something broken in starting builds limitation09:49
tobiashcorvus, mordred, pabelanger: maybe we leak job workers under some circumstances09:56
tobiashleaked not yet started job workers would be counted towards starting builds and thus explain the current behavior09:57
AJaegertobiash: see backscroll on #openstack-infra as well, please09:58
AJaegerlet's discuss there09:58
*** jaianshu has joined #zuul10:14
*** dtruong has quit IRC10:35
*** dtruong has joined #zuul10:35
openstackgerritMerged openstack-infra/nodepool master: Convert nodepool-zuul-functional job  https://review.openstack.org/54059511:53
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Add /node-list to the webapp  https://review.openstack.org/53556212:19
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Add /label-list to the webapp  https://review.openstack.org/53556312:19
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Refactor status functions, add web endpoints, allow params  https://review.openstack.org/53630112:19
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Add /node-list to the webapp  https://review.openstack.org/53556212:25
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Add /label-list to the webapp  https://review.openstack.org/53556312:25
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Refactor status functions, add web endpoints, allow params  https://review.openstack.org/53630112:25
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Add a separate module for node management commands  https://review.openstack.org/53630312:25
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: webapp: add optional admin endpoint  https://review.openstack.org/53631912:26
*** sshnaidm|rover is now known as sshnaidm|afk12:27
*** hashar is now known as hasharAway12:28
*** jaianshu has quit IRC12:50
*** jpena is now known as jpena|lunch12:51
*** jimi|ansible has joined #zuul13:44
*** jimi|ansible has quit IRC13:44
*** jimi|ansible has joined #zuul13:44
*** jpena|lunch is now known as jpena13:47
*** sshnaidm|afk is now known as sshnaidm|rover13:52
*** elyezer has quit IRC13:58
*** elyezer has joined #zuul14:00
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Fix for age calculation on unused nodes  https://review.openstack.org/54128114:11
*** dkranz has quit IRC14:55
*** hasharAway is now known as hashar15:00
*** mnaser has quit IRC15:11
*** mnaser has joined #zuul15:12
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: Add separate modules for management commands  https://review.openstack.org/53630315:20
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: webapp: add optional admin endpoint  https://review.openstack.org/53631915:20
tobiashSpamapS: responded on https://review.openstack.org/#/c/54077415:21
SpamapStobiash: yeah, I'm thinking it might actually be useful as an entry point rather than a template.15:30
SpamapStobiash: or something to import from your debugging template15:30
SpamapSI've used it a couple of times now15:30
SpamapSand thought "i wish this just took --foo"15:31
SpamapSbut it's good as-is, which is why I +3'd :)15:31
tobiashSpamapS: yeah, I'm open for improvements ;)15:31
tobiashit's also bugging me that I have an unclean workspace when using this ;)15:31
tobiashbut didn't have time yet for polishing this15:32
SpamapSYeah so if nothing else we could make it into a module and import it into debug scripts or even just python cli15:33
SpamapSbut, just thoughts on some yak shaving for the future15:34
tobiashSpamapS: that sounds cool15:35
pabelangertobiash: so, I think we might have missed finishJob() in stopJobByUnique(): http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/executor/server.py#n203415:51
pabelangertobiash: IIUC, if zuul tests the job to stop, either disk is full or timeout, we don't del key from self.job_workers15:51
pabelangers/tests/tells15:51
pabelangerbut, not 100% sure15:52
tobiashpabelanger: have to grok that deeper15:52
tobiashBut heading home now. Maybe I can look at that later this evening.15:53
pabelangernp, I'll see if others can confirm once openstack-infra is more stable15:53
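[Editor's sketch] The leak pabelanger suspects can be illustrated with a toy model. This is not Zuul's code: the class and method names below are hypothetical, and only the idea of a job_workers dict keyed by job id comes from the discussion above. It assumes the starting-builds figure is derived from how many entries remain in that dict.

```python
class ExecutorSketch:
    """Hypothetical stand-in for the executor's worker bookkeeping."""

    def __init__(self):
        # Maps a job's unique id to its worker, like self.job_workers
        # as described in the discussion (structure assumed).
        self.job_workers = {}

    def start_job(self, unique):
        self.job_workers[unique] = object()

    def finish_job(self, unique):
        # Dropping the key is what keeps the worker count accurate.
        self.job_workers.pop(unique, None)

    def stop_job_leaky(self, unique):
        # Suspected behaviour: the job is stopped, but the key is
        # never removed, so the worker is counted forever.
        pass

    def stop_job_fixed(self, unique):
        # Possible fix: always run the same cleanup as a normal finish.
        self.finish_job(unique)

    def starting_builds(self):
        # Assumption: leaked entries count towards starting builds.
        return len(self.job_workers)
```

If the hypothesis holds, every stop that skips the cleanup leaves one entry behind, which would match tobiash's observation that the starting builds number never decreases.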
openstackgerritMerged openstack-infra/zuul master: Fix github connection for standalone debugging  https://review.openstack.org/54077215:56
*** dkranz has joined #zuul15:56
*** Wei_Liu has quit IRC16:48
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Do not delete unused but allocated nodes  https://review.openstack.org/54137517:02
*** jpena is now known as jpena|off17:18
openstackgerritMerged openstack-infra/zuul master: Enhance github debugging script for apps  https://review.openstack.org/54077418:03
*** JasonCL has quit IRC18:05
*** JasonCL has joined #zuul18:06
*** JasonCL has quit IRC18:07
*** JasonCL has joined #zuul18:08
*** JasonCL has quit IRC18:11
*** JasonCL has joined #zuul18:12
*** JasonCL has quit IRC18:15
*** myoung is now known as myoung|food18:15
*** sshnaidm|rover is now known as sshnaidm|bbl18:16
*** JasonCL has joined #zuul18:20
*** JasonCL has quit IRC18:21
*** JasonCL has joined #zuul18:22
*** Wei_Liu has joined #zuul18:32
*** harlowja has joined #zuul18:35
openstackgerritMatthieu Huin proposed openstack-infra/nodepool master: [WIP] webapp: add optional admin endpoint  https://review.openstack.org/53631918:54
*** myoung|food is now known as myoung19:03
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Rework log streaming to use python logging  https://review.openstack.org/54143420:13
mordredShrews: if you get a second, I got that ^^ mostly there, but I'm a little stuck20:13
mordredShrews: just running tests.unit.test_log_streamer.TestStreaming.test_streaming ... it doesn't seem like LogRecordStreamHandler is getting constructed or called20:15
mordredcorvus: ^^ overall shape should be in place - working on trying to figure out testing20:16
Shrewsmordred: yeah, will take a look20:17
mordredShrews: also - hopefully this version of that patch makes more sense than the last time I pushed it up20:17
Shrewsmordred: you're going to finally get me to get zuul tests to run locally, aren't you?  ugh20:21
* Shrews waits for the zuul logs :)20:22
mordredShrews: you might be waiting for a long time today :)20:25
mordredShrews: this one is pretty easy -  ttrun -epy35 tests.unit.test_log_streamer.TestStreaming.test_streaming works easy-peasy for it20:26
dmsimardDo we need to initialize a counter anywhere when adding a new one ? or does statsd.incr create it if it doesn't exist ? Trying to find but no luck20:30
clarkbI think it creates it if you send it data20:32
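[Editor's sketch] A counter increment in the statsd line protocol is just a small UDP datagram of the form name:value|c, which is why the client side has nothing to initialize. Whether the server creates the counter on first receipt depends on the statsd implementation; clarkb's answer above is itself hedged. The metric name, host, and port below are placeholders, and the helper functions are hypothetical, not the statsd client library's API.

```python
import socket


def format_counter(name, value=1):
    # statsd counter wire format: "<bucket>:<value>|c"
    return f"{name}:{value}|c".encode("ascii")


def incr(name, host="127.0.0.1", port=8125, value=1):
    # Fire-and-forget UDP send; nothing is registered beforehand.
    payload = format_counter(name, value)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
    return payload
```

Calling incr("zuul.executor.example") would send one datagram to a local statsd listener, if one is running on the assumed port.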
tobiashmordred: did I correctly understand your commit message that the node connects back to the executor for log streaming?20:32
mordredtobiash: sort of - it connects over a port forwarded over the ssh connection20:34
tobiashmordred: ah, that's ok20:34
mordredtobiash: yah - the other thing would not work very well :)20:34
tobiashmy build nodes don't even have a route back to my executor :)20:35
mordredtobiash: if we can get it working, it should allow us to delete zuul_console and have things work across remote node reboots20:35
mordredtobiash: indeed!20:35
tobiashthat sounds pretty cool20:35
mordrednow - if only we can figure why the tests don't work20:37
Shrewsmordred: so, this is changing things between the executor and the node, right?20:47
openstackgerritMerged openstack-infra/nodepool master: Fix for age calculation on unused nodes  https://review.openstack.org/54128120:48
Shrewsmordred: you're changing a test that is testing what the streaming daemon would send back to a finger client. maybe we need a new test there20:48
Shrewsbecause i think we should test both things still20:48
Shrewserr, test the original thing still. plus your new thing20:48
mordredShrews: ah - well, yes - I agree20:59
tobiashzookeeper is a crazy memory hog...21:02
tobiashwe have a 5 node cluster, each taking 500mb ram21:02
tobiashfor managing 40kb of data...21:02
clarkbtobiash: I think that is likely due to the way the jvm works21:06
clarkbtobiash: you can tune that but by default the jvm uses some heuristics to know how much memory the heap should preallocate for a minimum21:06
clarkband it goes out and grabs that memory when it starts21:06
clarkbit will then add on more memory up to the maximum limit21:06
*** dkranz has quit IRC21:07
clarkbchances are it is actually using much less than that memory if you only have 40kb of data21:07
tobiashwell, the memory consumption is constant and I don't really care about that :)21:08
clarkbbut ya thats how java works21:08
tobiashbtw, having that on tmpfs works really great21:08
clarkbhaving the disk backing zk on tmpfs?21:09
tobiashthe zk latency is almost all of the time below 5ms21:09
tobiashyes21:09
mordredjust like mysql-cluster21:10
tobiashwe had some trouble with IO due to ceph cluster restructure and meltdown patching21:10
tobiashso we increased the replica to 5 and put the data and datalog onto tmpfs21:10
tobiashsince then we had no problems with zk anymore21:11
clarkbI don't think we've really been having problems with zk21:11
tobiashwe occasionally had io latencies over 20s21:11
tobiashwhich broke zk sessions21:12
pabelangeronly issue so far has been when we lost nodepool.o.o, but even then we didn't lose any data21:12
pabelangerjust few hour outage, then back online21:12
mordredShrews: k. i've got the test redone to be a new test and not mess with the existing test21:13
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Rework log streaming to use python logging  https://review.openstack.org/54143421:13
*** threestrands has joined #zuul21:32
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul master: Add Executor Merger and Ansible execution statsd counters  https://review.openstack.org/54145221:46
dmsimardcorvus: ^ as per discussed earlier21:46
jheskethMorning21:46
SpamapSHas anybody thought about possibly making src_dir absolute rather than relative?21:50
SpamapSI find myself doing a lot of  'chdir: "~{{ ansible_user }}/{{ zuul.project.src_dir }}"' after a job fails because I used become: True and suddenly the path isn't relative to where I am on the filesystem.21:50
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Fix stuck node requests across ZK reconnection  https://review.openstack.org/54145421:52
corvusShrews, clarkb, mordred, fungi, dmsimard, pabelanger: ^ that should take care of the scheduler wedge from this morning21:53
clarkbcool I'll take a look shortly. Currently trying to figure out why zuul tests are unhappy on my desktop after changing the timeout behavior21:54
corvusSpamapS: i think it's that way because the first part of it is image dependent21:56
dmsimardcorvus: we know the username we'll be using ahead of time though, right ?21:57
corvusdmsimard: yes.  we don't know if it has a home directory, or if that's where the repos are.21:58
dmsimardI guess21:58
SpamapScorvus: yeah, maybe we can set a fact in prepare-workspace that is responsive to the image?22:11
SpamapSI haven't looked, maybe we even do.22:11
SpamapSAnother thing is I kind of wish I could just tell Ansible to set its default CWD to something and not change whether becoming or not.22:11
*** openstackgerrit has quit IRC22:16
*** openstackgerrit has joined #zuul22:24
openstackgerritMerged openstack-infra/nodepool master: Do not delete unused but allocated nodes  https://review.openstack.org/54137522:24
clarkbdoes anyone know if the test_disk_accountant test is flaky? I am getting testtools.matchers._impl.MismatchError: {'/tmp/tmpf7v6_m8n/012345'} != set() which seems like it shouldn't be related to changes in timeouts but still digging22:26
*** hashar has quit IRC22:26
mordredShrews: woot. I think I got it22:43
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Rework log streaming to use python logging  https://review.openstack.org/54143422:46
clarkblooks like disk accountant tests are not arbitrary fs safe /me working on fix for that22:48
clarkbalso we leak tmpdirs like crazy22:48
clarkbalso looking at fixing that22:48
*** myoung is now known as myoung|off22:53
dmsimardmordred: someone smarter than me once told me to use json instead of pickle due to security concerns but I see there's already a comment about that :p22:56
mordreddmsimard: :)22:59
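[Editor's sketch] The concern dmsimard raises is that unpickling data from an untrusted peer can run arbitrary code, whereas JSON decoding only yields plain data. As an illustration only (this is not what the patch under review does, and the helper names and field list are hypothetical), a log record can be round-tripped through JSON using the standard library's logging.makeLogRecord:

```python
import json
import logging

# Fields copied across the wire; a deliberately small, assumed subset.
FIELDS = ("name", "levelno", "levelname", "created")


def encode_record(record):
    data = {key: getattr(record, key) for key in FIELDS}
    # Format the message on the sending side so args need not be sent.
    data["msg"] = record.getMessage()
    return json.dumps(data).encode("utf-8")


def decode_record(payload):
    # json.loads only builds dicts, lists, strings and numbers;
    # makeLogRecord then copies them onto a fresh LogRecord.
    return logging.makeLogRecord(json.loads(payload.decode("utf-8")))
```

The trade-off is that anything not JSON-serializable, such as exception info or arbitrary extra attributes, would need explicit handling.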
openstackgerritClark Boylan proposed openstack-infra/zuul master: Make timeout value apply to entire job  https://review.openstack.org/54148523:02
openstackgerritClark Boylan proposed openstack-infra/zuul master: Sync when doing disk accountant testing  https://review.openstack.org/54148623:03
Shrewsmordred: awesome. i will look more closely again also too for a second time tomorrow23:08
mordredShrews: awesome. the new test works now - there's a few things about it now that it's all hanging together where I think  I might need to re-think a chunk of it23:11
Shrewscorvus: funny... my name is associated with the first part of that fix, but I have very little memory of it. your part looks great though. +223:12
pabelangerjust noticed a statsd error in zuul debug log: http://paste.openstack.org/show/664149/23:16
pabelangershould be straightforward fix23:16
corvusShrews: yeah, pretty fuzzy for me too :)23:18
openstackgerritClark Boylan proposed openstack-infra/zuul master: Use nested tempfile fixture for cleanups  https://review.openstack.org/54148723:20
clarkbok thats a couple fixes for the tests based on local experiences23:20
clarkbwe already seemed to have a test that covers job timeouts that appears to still be valid after my timeout change so I haven't added a new test23:20
*** sshnaidm|bbl has quit IRC23:21
clarkbreviewing corvus' fix for the stuck nodes now23:22
clarkbcorvus: left a thought/comment/question on https://review.openstack.org/#/c/541454/ I think I'm ok with it as is but I think we can possibly simplify and make things a little easier to understand with a minor change23:37
corvusclarkb: yep.  i have to push up a pep8 fix so i'll incorporate that23:39
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Fix stuck node requests across ZK reconnection  https://review.openstack.org/54145423:40
corvusclarkb, Shrews: ^23:40
clarkbugh browser so slow trying to +2 somehow ended up clicking the cherry pick button23:41
clarkbin any case +2'd23:41
clarkbnow to fix firefox23:41
