Friday, 2017-04-21

clarkbtests.unit.test_scheduler.TestScheduler.test_crd_branch was very near 2 minutes too00:01
SpamapSwhat I'm seeing is each of the two python processes using 70-90 percent of "a CPU"00:02
clarkbtest_crd_check_ignore_dependencies just failed with concurrency=100:05
SpamapSthere's a couple timeouts that are shorter than the default 120s one00:05
clarkbthis is Exception: Timeout waiting for Zuul to settle00:06
SpamapSlike any single waitUntilSettled I think is timed out at 60s or so00:06
SpamapSYeah00:06
SpamapSthat's the one00:06
clarkbI shouldn't be runnin with a global timeout as I am testring directly (no tox)00:06
clarkbunless maybe tox sets the env stuff in the venv directly?00:06
SpamapS30s00:06
SpamapSone test sets it to 12000:06
SpamapSI think it's the behind_dequeue00:06
SpamapSyeah00:07
SpamapSso funny thing is, with concurrency=2 I still hit a system load of 500:10
SpamapSlots of gits getting run00:10
SpamapSjava poking around00:10
SpamapS  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND00:11
SpamapS17765 clint     20   0 4610876 2.389g   9232 S  72.8 15.5  15:17.63 python00:11
SpamapSthat's a healthy sized python!00:11
clarkbwow my python is 4.5g resident00:12
SpamapSah you know, I am looking at dmesg and during the last run I got this:00:13
clarkbwhich is ~2x yours and I am concurrency = 100:13
SpamapS[651803.188168] CPU1: Core temperature above threshold, cpu clock throttled (total events = 4211)00:13
SpamapS(my last concurrency=4 run)00:13
clarkbouch, I don't have that problem. cpu is warm but well below thresholds00:13
SpamapSclarkb: probably a horrible chain of resources in subunit.00:13
SpamapSduring this conc=2 run no throttling00:13
SpamapSbut.. we're also at 16 minutes00:14
clarkbtests.unit.test_scheduler.TestScheduler.test_crd_gate is another almost 2 minute test locally00:14
clarkb(passes though)00:14
clarkbI'm gonna leave this to make my office warm and go and assemble a car seat00:15
SpamapSGah I hate those00:16
SpamapSso I guess the moral of this story is that the Zuul test suite wants cores.00:17
SpamapSlots and lots of cores00:17
jlkso many cores00:19
jlkpartly why I went back to my 8 core cloud VM00:20
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Add 'pr-comment' github webhook event  https://review.openstack.org/44395900:21
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Support GitHub PR webhooks  https://review.openstack.org/43983400:21
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Add basic Github Zuul Reporter.  https://review.openstack.org/44332300:21
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Add 'push' and 'tag' github webhook events.  https://review.openstack.org/44394700:21
SpamapSyeah I think I may need to do something similar00:39
jeblairttrun -e py27 tests.unit.test_scheduler.TestScheduler.test_dependent_behind_dequeue01:53
jeblairRan 1 test in 15.328s01:53
clarkbjeblair: is that with a tmpfs?02:15
*** Shrews has quit IRC05:43
*** yolanda has joined #zuul05:58
*** hashar has joined #zuul07:26
*** bhavik1 has joined #zuul08:18
*** phschwartz has quit IRC08:50
*** bhavik1 has quit IRC10:43
*** hashar has quit IRC11:53
*** hashar has joined #zuul12:07
*** dkranz has quit IRC12:17
*** dmellado has quit IRC12:26
*** dkranz has joined #zuul12:33
*** dmellado has joined #zuul12:47
jeblairclarkb: yes, zookeeper and git on tmpfs.14:21
openstackgerritMerged openstack-infra/zuul feature/zuulv3: move test_disable_at to simple_layout  https://review.openstack.org/45825214:38
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Move check_ignore_dependencies to simple_layout  https://review.openstack.org/45825514:39
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Move test_repo_deleted to simple_layout  https://review.openstack.org/45826914:39
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove org_unknown git repo from single-client test config  https://review.openstack.org/45827014:39
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Move semaphore tests to their own class  https://review.openstack.org/45858414:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Move timer tests to commitConfigUpdate  https://review.openstack.org/45859914:44
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove commitLayoutUpdate  https://review.openstack.org/45864814:50
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move two more tests to commitConfigUpdate  https://review.openstack.org/45864714:50
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix race in test_queue_rate_limiting  https://review.openstack.org/45886314:51
jeblairclarkb: can you look at 458863?  it fixes a race that showed up in a test run that bumped the second-to-last change from the merge queue.14:51
mordredjeblair: so the reason it's a race fix in the test itself is because in production we don't care about subsequent order so releasing en-masse is ok- but because we have to track sequence in the tests it's bad, yes?15:31
jeblairmordred: sort-of -- the old fake jenkins was simple enough that it would tend to strictly order test runs, but since we're throwing them at at executor now, they can end up in random order because of context switches.  so that used to be deterministic, but isn't anymore.15:35
mordredjeblair: nod15:35
clarkband the asserts just below the diff care about that order since its test1 test2 for first project then test1 test2 for second15:36
mordredjeblair: I had a similar issue that I _totally_ punted on when we switched off of swiftclient and implemented the threaded object chunk uploads - turns out you can't really assert the sequence of REST calls at the end of the operation when a set of them in the middle are running in parallel on purpose :)15:36
clarkbRan 133 (+132) tests in 6531.664s (+6473.655s) FAILED (id=16, failures=11 (+11), skips=10 was my space heater run15:36
jeblairyeah.  we have some methods to do assertions of completed builds in random order, but for a lot of the in-progress builds like these, it's a bit more work (especially if you want to pick one of them out to do something with).15:37
clarkbI'm gonna grab the above fix, then rerun without sql tests (this time) on tmpfs15:38
jeblairclarkb: that is very slow.  that run on chocolate was 300s.15:38
jeblairclarkb: but i'm glad your office is warm.15:39
clarkbthis was single process15:39
clarkbI wonder how that lines up in total time and not just wall15:39
SpamapSjeblair: I'm on the tip of your optimization branch, and I get a lot higher15:42
SpamapSRan 1 test in 32.540s15:42
SpamapSjeblair: really wondering now what is different15:42
jeblairmodel name      : Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz15:43
SpamapSruh roh.. my tmpfs mount is gone15:43
SpamapSthat's it15:43
jeblairthat'll do it15:43
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix race in test_queue_rate_limiting  https://review.openstack.org/45886315:44
SpamapSerr no15:44
SpamapSRan 1 test in 31.455s15:44
clarkbhow big of a tmpfs is necessary? guessing not too large15:45
SpamapSMy CPU is 0.5Ghz slower.. or 17% ... but I'm at 2x slower15:46
SpamapSclarkb: I just let it do the default of half of RAM.15:46
SpamapSbut probably 200MB would do it.15:46
SpamapSmaybe less15:46
jeblairSpamapS: and zk is still on tmpfs?15:48
SpamapSyep15:48
SpamapSjeblair: can you try adding '/usr/bin/time' to the front of yours so I can see what the process itself does?15:49
SpamapS19.06user 37.66system 0:33.24elapsed 170%CPU (0avgtext+0avgdata 74896maxresident)k15:49
SpamapS0inputs+0outputs (0major+3259084minor)pagefaults 0swaps15:49
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Move two more tests to commitConfigUpdate  https://review.openstack.org/45864715:49
SpamapSso much system :-P15:49
jeblair9.00user 17.40system 0:15.77elapsed 167%CPU (0avgtext+0avgdata 70460maxresident)k15:50
jeblair0inputs+0outputs (0major+4652893minor)pagefaults 0swaps15:50
SpamapSYour CPU is just 2x faster. It's settled.15:52
SpamapSjeblair: OS?15:52
jeblairtrusty15:53
SpamapSinteresting.. I wonder if they broke something on xenial15:53
* SpamapS blames systemd15:53
jeblairPython 2.7.615:53
SpamapSxenial is 2.7.1215:53
SpamapSbut I'd be more suspicious about compiler settings, kernel, and libc things than python 2.7 patch versions15:54
clarkbhuh testr run locally is only using one process15:54
SpamapSclarkb: yeah it only turns on parallelism if you run it via setup.py testr15:54
SpamapSI forget the flag to run testr run with the conc auto thing15:54
clarkboh ha ok. Well I guess I will discover if single cpu is stable with tmpfs15:55
clarkband can expand to multi cpu once I have these results15:55
jeblairi wonder if i can grab a trusty and xenial node from one of our clouds and hope they are suitable for a comparison...15:55
clarkbosa I think it was did notice some python performance issues15:55
clarkbwith xenail15:55
SpamapSjeblair: if you use the bluebox cloud you can use affinity and get them on the same compute node #worstFeatureEver15:56
SpamapSand we may not have turned off affinity on infra cloud.. dunno if it's on by default15:56
jeblairthe only trusty nodes we have handy are on rax-iad :/15:57
jeblairoh, also osic15:57
jeblairi held nodes; they are running jobs.  i'll check back in 30 mins or so and see if they're ready.15:59
clarkbgetting Exception: Timeout waiting for Zuul to settle failures now even with tmpfs, though my zk isn't running on it so I guess thats next step16:11
clarkbmaybe even run under eatmydata too16:12
SpamapSOh I hadn't even considered eatmydata'ing16:13
SpamapSeatmydata did in fact shave a few seconds off16:14
clarkbtests.unit.test_scheduler.TestScheduler.test_crd_multiline took 2 minutes 11 seconds in current run16:17
*** jkilpatr has quit IRC16:28
*** jkilpatr_ has joined #zuul16:28
*** hashar has quit IRC16:29
clarkbok I have it rerunning single process with zk and zuul test root on a tmpfs and under eatmydata for good measure16:32
* SpamapS got side tracked by his old SSL certs expiring and trying to setup certbot16:33
SpamapSofflineimap doesn't seem to like the letsencrypt CA :-|16:33
clarkbif this is still slow then I don't think its purely an IO issue16:33
SpamapSclarkb: no I think it's python perf16:33
SpamapSgiven jeblair's /usr/bin/time output16:33
jeblairSpamapS: yeah, i just pinned the hash, but that's not going to last for long16:33
SpamapShttps://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/163869516:33
openstackLaunchpad bug 1638695 in python2.7 (Ubuntu) "Python 2.7.12 performance regression" [High,Confirmed] - Assigned to Louis Bouchard (louis)16:33
SpamapSjeblair: that's what I was about to do.. that will explode every few weeks tho. :-/16:34
clarkbSpamapS: I'm on tumbleweed which would imply its at least not ubuntu's fault if that is the performance issue16:34
SpamapSclarkb: what's your gcc --version ?16:35
SpamapSxenial is 5.4.016:35
jeblairgcc-4.8.real (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.416:36
mordredclarkb: "yeah it only turns on parallelism if you run it via setup.py testr" - is that going to be a problem for OpenStack with our push to not use the setup.py testr ?16:37
mordredclarkb: or was our push to _use_ setup.py testr and not setup.py test ?16:37
clarkbgcc (SUSE Linux) 6.3.1 20170202 [gcc-6-branch revision 245119]16:37
mordredwow. gcc 4, 5 and 616:37
clarkbmordred: it works with both setup.py test and testr, push is to use test16:37
mordredok. cool. so it's not a problem for the gate - just making sure (that could be really unfortunate)16:38
clarkbmordred: its because both entrypoint things check numcpu and default to that concurrency whereas testr itself defaults to no concurrency16:38
SpamapSmordred: 'setup.py test' does the parallel thing16:38
clarkbbecause this is rolling distro I don't actually know if python was compiled with that gcc16:38
SpamapSsorry I misspoke.16:38
clarkbtrying to query python for the info but so far only getting "GCC"16:38
mordredclarkb: python should tell you if you launch its repl16:38
mordredmordred@solace:~$ python16:39
mordredPython 2.7.12 (default, Nov 19 2016, 06:48:10)16:39
mordred[GCC 5.4.0 20160609] on linux216:39
SpamapSso, the analysis on that launchpad bug is legit16:39
jeblair[GCC 4.8.4] on linux216:39
jeblairSpamapS: that's a fascinating bug -- it sure looks like it's saying "python compiled with gcc 4 is better than later versions"16:39
clarkbmordred: Python 2.7.13 (default, Jan 03 2017, 17:41:54) [GCC] on linux2 :/16:39
SpamapShttp://paste.ubuntu.com/24078834/ <-- the bug explains what each of those 'nopgo' 'nolto' configs are16:39
mordredclarkb: oh wow16:39
mordredclarkb: I'd file that as a bug against tumbleweed :)16:39
clarkbmordred: its possible gcc 6 breaks something in pythons preprocessor funzies16:39
mordredyah16:39
SpamapSjeblair: I think they found that the PGO+LTO combination in gcc 5 is hurting python16:40
jeblairSpamapS: ah, k, i'm still only halfway through16:40
SpamapSthe analysis stops late February.. probably around the time doko started looking for a new job. :-/16:40
SpamapSwe should try squeeze and stretch16:41
jeblairin addition to all the other reasons we should do that, it may help me decide what to upgrade my workstation to.  :)16:42
SpamapSI've been feeling itchy about it since the layoffs16:42
SpamapSI don't think doko was laid off.. but I don't know that he'd stick around either.16:43
* SpamapS stops speculating16:43
jeblairSpamapS: read this?  https://github.com/docker-library/python/issues/16016:44
jeblair(i haven't)16:44
jeblairand related https://bugs.python.org/issue2635916:45
clarkbthe docker issue says python should be quicker with pgo and lto enabled. However thats for python316:46
SpamapShave not read them.16:47
SpamapSbut it sounds to me like xenial needs to go ahead and recompile python with gcc416:47
SpamapSThat's definitely an SRU-worthy issue.16:47
clarkbSpamapS: doesn't that present potential binary comaptiblity issues for people compiling modules?16:48
clarkbI guess python-all-dev or whatever it is would have to dep on gcc4 and then have magic to have things compile under that cc by default16:48
SpamapSit should not16:49
SpamapSit won't change the symbols16:49
clarkbI seem to recall a major deal with fedora switched to gcc 516:49
clarkbit required a full recompile of everything16:49
SpamapSOh god, did they do that? really? :-/16:50
SpamapS(they == gcc)16:50
SpamapSwould not put it past the optimizer monkies16:50
clarkbit may be that fedora opted into something that required it, but there were binary compat issues not too long ago16:50
SpamapSI'd say if ldflags and cflags stay the same (they should) it should be fine16:52
SpamapSbut I could be wrong16:52
*** dmellado has quit IRC16:52
SpamapS(just would shock me that compiler upgrades would break your existing dynamically loaded objects)16:52
SpamapSthat's not really how ELF is supposed to work. But hell.. somebody convinced the world to do systemd.. so they might have done this too.16:53
clarkbwell ELF doesn't say things about calling conventions and the like16:53
clarkbits possible to introduce things that elf is happy about that would otherwise break16:53
*** _ari_ has quit IRC16:53
*** pabelanger has quit IRC16:54
jeblairthose that do not understand computer history are doomed to infinite loop16:54
clarkband confirmed I now have timeout failures16:55
clarkbusing tmpfs and eatmydata16:55
jesusaurjeblair: lol16:56
SpamapSoh yay I got offlineimap working16:58
SpamapSjesusaur: if you're struggling.. https://letsencrypt.org/certs/isrgrootx1.pem.txt <-- sslcacertfile={that saved locally somewhere} worked for me.16:58
SpamapSdoh16:58
SpamapSjeblair: if you're struggling.. https://letsencrypt.org/certs/isrgrootx1.pem.txt <-- sslcacertfile={that saved locally somewhere} worked for me.16:59
SpamapStabcomplete fail16:59
SpamapSclarkb: that's a good point.. I suppose if they really wanted to turn the world on its ear they could 'splode calling conventions. I bet they did. :-/16:59
jeblairSpamapS: oh cool, thanks17:00
*** pabelanger has joined #zuul17:00
SpamapSoh derp17:02
SpamapSjeblair: n/m17:02
jeblairSpamapS: i was just starting to dig into why that wasn't working :)17:02
*** _ari_ has joined #zuul17:02
SpamapSjeblair: at this point I'm reporting it as a github issue and going to get on the hash treadmill17:03
*** dmellado has joined #zuul17:06
*** hashar has joined #zuul17:37
*** hashar is now known as hasharAway17:38
clarkbmordred: #define COMPILER "\n[GCC " __VERSION__ "]" <-is how its defined in python17:42
clarkbI wouldn't expect gcc to break that anyways buuilding for myself and can check17:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove commitLayoutUpdate  https://review.openstack.org/45864817:51
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Configurable SSH access to GitHub  https://review.openstack.org/44403418:55
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Support for github commit status  https://review.openstack.org/44406020:19
*** openstackgerrit has quit IRC20:48
*** openstackgerrit has joined #zuul20:59
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Merge pull requests from github reporter  https://review.openstack.org/44446320:59
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Support for github commit status  https://review.openstack.org/44406021:12
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Merge pull requests from github reporter  https://review.openstack.org/44446321:12
*** hasharAway has quit IRC22:46

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!