Thursday, 2019-04-25

00:02 <openstackgerrit> Clark Boylan proposed zuul/zuul master: Make paused status bar blue  https://review.opendev.org/655588
00:02 <openstackgerrit> Clark Boylan proposed zuul/zuul master: Tiny cleanup in change panel js  https://review.opendev.org/655589
00:04 <clarkb> someone that understands react and jsx etc better than me should review ^
00:04 *** jamesmcarthur has quit IRC
00:33 *** jamesmcarthur has joined #zuul
00:42 *** mattw4 has quit IRC
00:48 *** rlandy|bbl is now known as rlandy
00:50 *** jamesmcarthur has quit IRC
00:52 *** jamesmcarthur has joined #zuul
00:53 *** jamesmcarthur has quit IRC
00:54 *** jamesmcarthur has joined #zuul
00:56 *** jamesmcarthur has quit IRC
00:56 *** jamesmcarthur has joined #zuul
02:00 *** jamesmcarthur has quit IRC
02:01 *** jamesmcarthur has joined #zuul
02:02 *** jamesmcarthur has quit IRC
02:02 *** jamesmcarthur has joined #zuul
02:24 *** bjackman has joined #zuul
02:52 *** jamesmcarthur has quit IRC
02:53 *** jamesmcarthur_ has joined #zuul
03:05 *** bhavikdbavishi has joined #zuul
03:06 *** smcginnis has quit IRC
03:08 *** bhavikdbavishi1 has joined #zuul
03:09 *** bhavikdbavishi has quit IRC
03:09 *** bhavikdbavishi1 is now known as bhavikdbavishi
03:14 *** jamesmcarthur_ has quit IRC
03:16 *** jamesmcarthur has joined #zuul
03:21 *** bhavikdbavishi has quit IRC
03:39 *** bjackman has quit IRC
03:46 *** bhavikdbavishi has joined #zuul
03:49 *** bhavikdbavishi has quit IRC
03:51 *** bjackman has joined #zuul
03:58 *** jamesmcarthur has quit IRC
04:11 *** pcaruana has joined #zuul
04:42 *** bjackman has quit IRC
04:43 *** pcaruana has quit IRC
04:44 *** bjackman has joined #zuul
05:09 *** quiquell|off is now known as quiquell
05:30 <bjackman> In the "independent" pipeline manager, is it expected that new patchsets to cross-repo dependee changes re-trigger dependent changes?
06:10 *** electrofelix has joined #zuul
06:20 *** pcaruana has joined #zuul
06:47 <AJaeger> bjackman: no, that's not done anywhere
06:47 *** AJaeger has quit IRC
06:48 *** quiquell is now known as quiquell|brb
06:51 *** bjackman has quit IRC
06:54 *** themroc has joined #zuul
06:56 *** AJaeger has joined #zuul
07:24 *** zbr has joined #zuul
07:38 *** bhavikdbavishi has joined #zuul
07:40 *** gtema has joined #zuul
07:43 *** jpena|off is now known as jpena
08:07 *** quiquell|brb is now known as quiquell
08:18 *** zbr is now known as zbr|rover
09:42 *** zbr|rover has quit IRC
09:43 *** zbr has joined #zuul
09:47 <electrofelix> Using the docker-compose env, where is the ARA report generated?
10:20 *** bhavikdbavishi has quit IRC
10:55 *** jpena is now known as jpena|lunch
11:12 *** rfolco|ruck is now known as rfolco|ruck|doct
11:31 *** zbr has quit IRC
11:32 *** zbr has joined #zuul
11:38 *** gtema has quit IRC
11:44 *** bhavikdbavishi has joined #zuul
11:59 *** panda is now known as panda|lunch
12:08 *** hashar has joined #zuul
12:20 *** rlandy has joined #zuul
12:34 *** altlogbot_2 has quit IRC
12:34 *** themroc has quit IRC
12:36 *** themr0c has joined #zuul
12:40 *** altlogbot_0 has joined #zuul
12:47 *** altlogbot_0 has quit IRC
12:50 *** altlogbot_2 has joined #zuul
13:05 *** gtema has joined #zuul
13:18 *** rfolco|ruck|doct is now known as rfolco|ruck
13:23 *** altlogbot_2 has quit IRC
13:32 *** themr0c has quit IRC
13:34 *** altlogbot_1 has joined #zuul
13:36 *** hashar has quit IRC
13:42 *** themroc has joined #zuul
13:42 *** altlogbot_1 has quit IRC
13:43 *** themroc has quit IRC
13:44 *** themroc has joined #zuul
13:58 *** altlogbot_1 has joined #zuul
14:02 *** panda|lunch is now known as panda
14:06 *** gtema has quit IRC
14:22 <corvus> electrofelix: it isn't; the jobs are very simple and don't use the ara roles
14:32 *** themroc has quit IRC
14:33 *** themroc has joined #zuul
14:33 *** jpena|lunch is now known as jpena
14:41 <electrofelix> corvus: oops, badly put-together question; I should have asked where I need to look to have jobs provide ARA
14:42 <electrofelix> I don't mind adding it in, just not sure where the bits are placed; is it fully a post part of the jobs?
14:44 <corvus> electrofelix: yes, there's a role for it; you can look in zuul-jobs for the role and its documentation, or the opendev jobs
14:45 <corvus> electrofelix: we just discovered a problem that may impact that though
14:46 <corvus> electrofelix: i'm not 100% sure that's going to be stable in the quickstart environment for the next little bit.  i think it should work now, but i think we're about to break it and fix it
14:46 <corvus> electrofelix: so just keep that in mind, and sorry
14:48 <electrofelix> corvus: thanks, no worries, I'm busy breaking it and fixing it as I go as well
14:48 *** electrofelix has quit IRC
14:48 *** electrofelix has joined #zuul
14:49 *** electrofelix has quit IRC
14:57 *** ericbarrett has joined #zuul
15:04 *** themroc has quit IRC
15:05 *** themroc has joined #zuul
15:14 *** quiquell is now known as quiquell|off
15:20 *** jamesmcarthur has joined #zuul
15:30 *** themroc has quit IRC
15:32 *** jamesmcarthur has quit IRC
15:34 *** jamesmcarthur has joined #zuul
15:45 <openstackgerrit> Merged zuul/zuul master: Update references for opendev  https://review.opendev.org/654238
15:46 <pabelanger> \o/
15:50 <tobiash> wohoo, took just a day of rechecking
15:51 <tobiash> I have no idea why this got that unstable :(
15:53 <pabelanger> does insecure-ci-registry.opendev.org not have a web interface to see what images are currently uploaded?
15:53 <pabelanger> Or do you need to use the docker CLI to see that?
15:55 <pabelanger> tobiash: btw: I think ansible 2.8.0rc1 is today, at least according to the roadmap
15:55 <pabelanger> going to refresh your patch and make sure everything is green
15:59 *** weshay|rover is now known as weshay
16:06 *** hashar has joined #zuul
16:18 *** jamesmcarthur has quit IRC
16:18 *** sshnaidm|off has quit IRC
16:19 *** pcaruana has quit IRC
16:20 *** quiquell|off has quit IRC
16:31 *** jamesmcarthur has joined #zuul
16:37 *** zbr is now known as zbr|rover
16:40 *** pcaruana has joined #zuul
16:45 *** hashar has quit IRC
16:50 *** jpena is now known as jpena|off
16:55 *** mattw4 has joined #zuul
16:57 *** saneax has quit IRC
17:03 *** sshnaidm has joined #zuul
17:03 *** sshnaidm has quit IRC
17:20 <openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Add retries to skopeo copy operations  https://review.opendev.org/655739
17:22 <openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Add retries to skopeo copy operations  https://review.opendev.org/655739
17:30 <mordred> pabelanger, tobiash: ^^ does that retries syntax look ok?
17:31 *** nickx-intel has quit IRC
17:37 <tobiash> mordred: lgtm
17:37 <mordred> tobiash: thanks!
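
For context, 655739 adds Ansible retries/until handling around the skopeo copy task in zuul-jobs. The same bounded-retry idea, sketched in Python rather than the actual role (the image references and retry counts here are purely illustrative):

```python
# Rough sketch of the bounded-retry idea behind the zuul-jobs change; the
# real role expresses this with Ansible's retries/until on the skopeo task.
import subprocess
import time

def skopeo_copy_with_retries(src, dst, attempts=3, delay=5):
    """Run `skopeo copy src dst`, retrying a few times before giving up."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(["skopeo", "copy", src, dst])
        if result.returncode == 0:
            return
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"skopeo copy failed after {attempts} attempts")

if __name__ == "__main__":
    skopeo_copy_with_retries(
        "docker://registry.example.test/buildset/image:latest",  # placeholder source
        "docker://registry.example.test/promoted/image:latest",  # placeholder destination
    )
```
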
17:39 *** hashar has joined #zuul
17:41 <corvus> tobiash: two big problems held up 654238 -- unit tests unstable (the usual thing), and some problems with the buildset registry system
17:41 <corvus> mordred and i have been working through those in infra;  655739 is hopefully a fix to one of them
17:41 <corvus> https://review.opendev.org/655744 is the other
17:41 <tobiash> corvus: ah, so two random things that accumulate
17:41 <corvus> yep
17:42 <mordred> gotta love when that happens
17:42 *** jangutter has quit IRC
17:42 <corvus> they seemed to be very cooperative in that very often, exactly one of them caused the patch to fail.  :|
17:42 <corvus> while we're waiting on that -- i looked at our mem usage: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all
17:42 <corvus> it looks ridiculously good
17:43 <corvus> and we're still pretty busy today
17:43 <corvus> hopefully that means the problem is in one of those 2 patches clarkb reverted
17:43 <corvus> (clarkb performed a local revert of 2 patches on our scheduler; they haven't been reverted in the codebase yet)
17:45 <corvus> 9f7c642a and 3704095c are the ones he reverted
17:47 <tobiash> impressive
17:48 <tobiash> oh, both patches by me :/
17:48 <corvus> and reviewed by me :)
17:48 <Shrews> hrm, i wish the nodepool leak was so easy to nail down
17:49 <corvus> Shrews: i'm happy it's not so urgent that we have to emergency-local-revert patches just to do something to keep it from being on fire :)
17:49 <corvus> Shrews: but.... i do think that's a perfectly fine thing to do if you want to try bisecting
17:50 <corvus> Shrews: it's actually pretty low-impact for nodepool.  but i know we merged a bunch of leaky-cleanup things recently, and that could make it tricky.
17:51 <Shrews> corvus: i suppose we could try reverting the zk cache on 1 launcher to see if our suspicions are at least on the right track
17:51 <clarkb> tobiash: fwiw I identified ~4 likely changes but only those two I felt were safe to revert
17:52 <tobiash> so we have one bisect step left...
17:52 <clarkb> the other two were my security fix and the pass-artifacts-to-child-jobs one
17:54 <openstackgerrit> Merged zuul/zuul-jobs master: Add retries to skopeo copy operations  https://review.opendev.org/655739
17:54 <corvus> Shrews: yeah -- that's necessary for the relative_priority feature too, so it'd have to go as well... that would be :( but we can live with it for a while
17:54 <Shrews> corvus: oh, ugh
17:55 <Shrews> not sure if we can run launchers with different feature sets. i'd have to take a look at that relative_priority code again
17:55 <corvus> Shrews: i think it should be safe
17:56 <Shrews> corvus: might be better to attempt that next week when there is less traffic? if we mess up the zk database somehow, it wouldn't be so impactful then
17:59 <pabelanger> tristanC: SpamapS: you may be interested in https://review.opendev.org/655188/ for a github improvement on merge commits
18:00 <pabelanger> also, once things stabilize with our jobs, I'd love to get some eyes on https://review.opendev.org/655204/ -- another improvement for github to help make sure zuul doesn't miss an event
18:00 *** bhavikdbavishi has quit IRC
18:01 <Shrews> corvus: though, i don't see any dependency on the cache here: https://review.opendev.org/620954
18:01 <Shrews> oh, nm
18:06 <Shrews> still, so much in flux right now, might be best to wait a bit anyway
18:11 <pabelanger> is there a way to do pagination on build results?
18:20 *** hashar has quit IRC
18:22 <tobiash> pabelanger: yes, but only in the api, not the ui
18:23 <tobiash> it supports the parameters count and skip
18:23 *** corvus is now known as jeblair
18:23 *** jeblair is now known as corvus
18:23 <pabelanger> tobiash: okay, thanks
18:24 <tobiash> (I think it was count and skip)
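
For reference, the builds endpoint of the Zuul REST API takes pagination query parameters; recent Zuul documents them as `limit` and `skip`, so verify the names against your version. A minimal sketch of paging through results with `requests` (the URL and tenant below are placeholders):

```python
# Minimal sketch of paginating Zuul build results via the REST API.
# Parameter names (`limit`, `skip`) follow current Zuul documentation;
# the endpoint URL and tenant below are placeholders.
import requests

ZUUL_API = "https://zuul.example.test/api"  # placeholder deployment
TENANT = "example-tenant"                   # placeholder tenant

def iter_builds(page_size=50, max_pages=10):
    """Yield builds page by page from the builds endpoint."""
    for page in range(max_pages):
        resp = requests.get(
            f"{ZUUL_API}/tenant/{TENANT}/builds",
            params={"limit": page_size, "skip": page * page_size},
        )
        resp.raise_for_status()
        builds = resp.json()
        if not builds:
            break
        yield from builds

for build in iter_builds():
    print(build.get("uuid"), build.get("job_name"), build.get("result"))
```
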
18:44 <pabelanger> tobiash: btw: did you ever get around to trying molecule with zuul-executor again? IIRC you pushed up a patch a while back to add support
18:44 <tobiash> pabelanger: I never did anything with molecule yet
18:45 <tobiash> maybe it was SpamapS?
18:45 <pabelanger> oh, sorry
18:45 <pabelanger> mitogen
18:45 <pabelanger> got too many ansible things on the brain
18:45 <SpamapS> I did mitogen, yes
18:45 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: WIP add repl  https://review.opendev.org/579962
18:45 <SpamapS> 'twas fast, but I did not do it in a Zuul context
18:46 <tobiash> pabelanger: I did a proof of concept with mitogen
18:46 <tobiash> in zuul
18:46 <tobiash> which worked in a quick test
18:46 <pabelanger> yah, I just enabled it for some testing yesterday, and now it looks like ansible-playbook is using less resources
18:47 <tobiash> pabelanger: https://review.opendev.org/582728 and parent
18:47 <pabelanger> was thinking, if it worked with zuul, it'd be a way to squeeze more jobs out of an executor
18:48 <openstackgerrit> Paul Belanger proposed zuul/zuul master: Support Ansible 2.8  https://review.opendev.org/631933
18:48 <pabelanger> 2.8.0rc1 is now out ^
18:48 <tobiash> I also had to patch the zuul_stream module to fork, which i don't seem to have pushed up
18:49 <pabelanger> tobiash: cool, will look tonight
18:49 <pabelanger> not an issue for us yet, but something I remembered today
19:02 <tobiash> pabelanger: btw, in my deployment the io during job startup seems to be the main limiting factor on the number of jobs per executor
19:02 *** jamesmcarthur has quit IRC
19:03 <tobiash> but my users are also mainly dealing with large repos
19:13 <pabelanger> mordred: corvus: fungi: clarkb: tobiash: do you mind adding https://review.opendev.org/655474/ to your review pipeline? it's related to the recent ML discussion about pushing dev releases to pypi so humans can pip install --pre zuul
19:14 <corvus> pabelanger: honestly, i'm very much focused on stability at this point
19:14 *** jamesmcarthur has joined #zuul
19:14 <pabelanger> ack, understood
19:15 <corvus> we have a very large memory leak, the unit test jobs are unstable, the registry jobs are not working, there seems to be a serious bug with artifact passing causing loops...
19:15 <corvus> those are just the major fires that come to mind
19:15 <corvus> if anyone is looking for a way to help, i'd love a theory as to how the loop that shows up in https://review.opendev.org/655173 is even possible
19:16 <corvus> or why did this fail?  http://logs.openstack.org/91/655491/2/gate/zuul-tox-remote/4440cc9/testr_results.html.gz
19:16 <corvus> that's a problem with the log streaming
19:16 <corvus> it looks like it ran the rescue task -- why didn't it end up in the console log?
19:17 *** jamesmcarthur has quit IRC
19:18 <corvus> maybe it's time to sound the alarm and ask that we all drop work on new features and dig into those things
19:18 *** jamesmcarthur has joined #zuul
19:19 <corvus> i'm happy to review patches that are major blockers to folks, but otherwise, please pitch in and help maintain the stability of the software
19:19 <pabelanger> sure, I should have some cycles tomorrow to dig into some tox failures
19:22 <corvus> zuul-maint: due to the current instability, i suggest that we don't approve changes other than test or critical bugfixes and focus on the multitude of problems that have crept up recently which are preventing us from merging changes.
19:28 <SpamapS> tobiash: I see the same thing btw... IO from the rsyncs is also what pushes my executors. And since it all comes at once, it's pretty bursty.
19:29 <SpamapS> I currently only run two executors.. and with a monorepo.. some changes kick off 10 jobs ... job startup lags a *lot* when 3 or more patches are submitted all at once.
19:29 <Shrews> corvus: done with the many doctoring things i had today. still need someone to look at the looping in 655173?
19:30 <SpamapS> But I figure that's just how things work. :-P
19:30 <tobiash> SpamapS: you may want to use https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/prepare-workspace-git instead of prepare-workspace
19:30 <corvus> Shrews: yeah, i'm stumped there
19:31 <SpamapS> I don't have a builder, so it won't help.
19:31 <SpamapS> tobiash: cool though!
19:31 <SpamapS> Until somebody finishes off the nodepool-builder mods for AWS... I'm just on stock Ubuntu images.
19:31 <corvus> Shrews: the methods involved are supposed to be called a lot, but it's supposed to only leave that warning once because it creates a fake build
19:32 *** jamesmcarthur has quit IRC
19:32 <Shrews> corvus: is this seen in any of the 4 failed tests?
19:32 <corvus> Shrews: it's the "requirements not met by build" thing
19:32 <corvus> if you're using hideci, don't.  :)
19:32 <Shrews> ah ha
19:33 <corvus> pabelanger: are you using hideci?
19:33 <tobiash> corvus: what do you mean by loop wrt 655173?
19:33 <tobiash> the zuul-stream-functional failures?
19:33 <Shrews> tobiash: see the zuul comment
19:33 <corvus> no one can see the error except me because of hideci :(
19:33 <tobiash> oh I see it now
19:34 <corvus> zuul-maint: please don't use the "hide ci" functionality on the opendev gerrit
19:34 <tobiash> corvus: I turn it off frequently but it seems to default to on
19:34 <corvus> we need to get rid of that once we upgrade gerrit
19:34 <Shrews> corvus: it's not that the ci comment is hidden, it just isn't expanded
19:34 <tobiash> (web ui)
19:34 <Shrews> not sure if that's configurable
19:35 <corvus> Shrews: it may be both things
19:36 <corvus> i'm going to assume that's why pabelanger kept leaving recheck comments instead of reporting the bug
19:36 <corvus> as the authors of the ci system, we kind of need to look at it :)
19:37 <pabelanger> hideci is off now
19:38 <pabelanger> as for the rechecks in 655173, ya I was trying to see if build-images was going to be green
19:38 <pabelanger> but last night I did see the unit test failure you mentioned, I just haven't dug into it yet
19:39 <tobiash> corvus: so looking at the code, the expected result would have been a failed build instead, right?
19:40 <corvus> to be clear, *this* is the issue with 655173: https://screenshots.firefox.com/gOvoTW5JW5PvymIT/review.opendev.org
19:40 <corvus> and apparently firefox screenshots crops images to a max height
19:40 <pabelanger> Oh
19:40 <pabelanger> no, i did not see that
19:40 <tobiash> crystal clear now ;)
19:40 <tobiash> https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L2383
19:40 <tobiash> that's the line that emits the warning
19:40 <pabelanger> I really should set up gertty again
19:40 <corvus> yep.  and the next 3 lines are the thing that's supposed to keep it from happening more than once
19:40 <tobiash> but the context tells me that if we hit that we want to fail the build
19:41 <corvus> tobiash: i think the build was failed (it's zuul-quick-start)
19:41 <corvus> it shows up failed in that screenshot
19:41 <corvus> (it's garbled in the report because the format is wrong, but it's there)
19:42 <corvus> oh, i bet another "openstackism" is at play here
19:42 <corvus> it's not reported that way in the table at the top
19:42 <corvus> and again, that's hideci's fault
19:45 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: WIP: Fix requirements loop warning  https://review.opendev.org/655781
19:45 <tobiash> corvus: I think this should fix that loop ^
19:46 <corvus> tobiash: OOOOOOOOOOH
19:46 <corvus> tobiash: that makes sense -- it used to be skipped
19:46 <tobiash> the failed build is not ignored there
19:46 <corvus> i looked at that over and over running that code in my head and kept making the wrong branch choices there
19:46 <corvus> thank you
19:47 <tobiash> no problem :)
19:47 <tobiash> but it's too late for me to think about a proper test case, so feel free to take it over
19:49 <Shrews> oh, i'm glad tobiash looked, as I certainly don't have the context of that code in my head  :)
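
The pattern being debugged above, in simplified form: when a job's requirements are not met, model.py logs the warning and records a placeholder failed build so later passes skip the job instead of warning again; the bug was that the failed placeholder was no longer treated as "already handled". This sketch only illustrates the shape of that guard and is not the actual Zuul code:

```python
# Simplified illustration of the guard discussed above -- not zuul/model.py.
class FakeBuild:
    result = "FAILURE"

class RequirementsGuard:
    def __init__(self):
        self.builds = {}  # job name -> build (real or placeholder)

    def job_runnable(self, job, requirements_met):
        existing = self.builds.get(job)
        if existing is not None and existing.result != "SUCCESS":
            # A failed (or placeholder) build already exists: treat the job
            # as handled and do not warn again.  Losing this early return is
            # what made the warning repeat on every pass through the queue.
            return False
        if not requirements_met:
            print(f"Job {job}: requirements not met by build")
            self.builds[job] = FakeBuild()
            return False
        return True
```
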
19:52 <clarkb>
19:52 <clarkb> derp
19:52 <tobiash> corvus: another thing, shouldn't the result be FAILURE instead of FAILED?
19:53 <corvus> tobiash: yes
19:53 <corvus> tobiash: let's fix that too
19:53 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: WIP: Fix requirements loop warning  https://review.opendev.org/655781
19:54 <tobiash> both need to match, otherwise the fix is not a fix :)
19:58 <corvus> if we're lucky the existing test just needs to be updated to check that the warning appears *once*
19:58 <corvus> it checks that it's present, but not that it's not duplicated
20:00 <corvus> we're not lucky
20:06 <mnaser> hi zuul-ers -- I'm trying to fix opendev to catch those extra failure templates
20:07 <mnaser> I have 'ipa-tempest-partition-bios-ipmi-direct-tinyipa-src finger://ze04.openstack.org/4fcf5618a3624c039c4cd8669f982f28 : RETRY_LIMIT in 36m 52s' as a potential example of failure
20:07 <mnaser> and then the 'FAILED' one too?
20:07 <mnaser> is there somewhere I can find in the code the possible ways that Zuul can report things to gerrit?
20:07 <mnaser> so I can try to catch as many of them as possible
20:07 <Shrews> corvus: that log streamer error (rescue output not found) is weird. I see it logged in the test output
20:08 <mnaser> okay, I assume https://opendev.org/zuul/zuul/src/branch/master/zuul/reporter/__init__.py
20:10 <mnaser> https://opendev.org/zuul/zuul/src/branch/master/zuul/reporter/__init__.py#L207-L208 well, that's an interesting choice of var :p
20:11 <mordred> mnaser: I started poking at a solution to the problem you're talking about the other day but haven't had time to come back to it
20:11 <mordred> mnaser: things are on fire a bit - so how about I circle around with you on it tomorrow or at the ptg?
20:11 <mnaser> mordred: afaik all that is needed is to add the appropriate thing here -- https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/manifests/review.pp#L176-L180
20:11 <mnaser> mordred: cool, no worries
20:18 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: A reporter for Elasticsearch with the capability to index build and buildset results in an index.  https://review.opendev.org/644927
20:19 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: A reporter for Elasticsearch with the capability to index build and buildset results in an index.  https://review.opendev.org/644927
20:24 <corvus> mnaser, mordred: there's no definitive set of things zuul can report.  the only thing i will guarantee (for the foreseeable future) is that one of them will be "SUCCESS", and that anything else is unsuccessful in some way
20:24 <corvus> i say that because that's all that zuul actually cares about
20:24 <corvus> the rest are for humans, and we should feel free to add more as they are useful
20:25 <corvus> "FAILED" was a mistake and we're correcting it
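
That suggests the Gerrit comment matching mnaser is adjusting should not enumerate failure strings. A hedged sketch of that approach in Python (the result-line pattern below is modeled on the report lines quoted earlier, not taken from Zuul's code): treat anything other than SUCCESS as unsuccessful.

```python
# Sketch of the "only SUCCESS is guaranteed" rule: rather than listing
# FAILURE, FAILED, RETRY_LIMIT, etc., treat any non-SUCCESS result as
# unsuccessful.  The regex models the report lines quoted earlier and is
# illustrative, not Zuul's own format definition.
import re

RESULT_LINE = re.compile(r"^-?\s*(?P<job>\S+) (?P<url>\S+) : (?P<result>[A-Z_]+)")

def is_unsuccessful(report_line):
    match = RESULT_LINE.match(report_line)
    return bool(match) and match.group("result") != "SUCCESS"

print(is_unsuccessful(
    "ipa-tempest-partition-bios-ipmi-direct-tinyipa-src "
    "finger://ze04.openstack.org/4fcf5618a3624c039c4cd8669f982f28 "
    ": RETRY_LIMIT in 36m 52s"
))  # True
```
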
20:30 *** panda is now known as panda|off
20:32 *** jamesmcarthur has joined #zuul
20:42 <Shrews> tobiash: i know i figured this out at one point in the past, but what was the secret sauce to run the tox remote tests locally?
20:43 <tobiash> Shrews: ssh access to user zuul (when using localhost as target, don't use the loopback address)
20:44 <tobiash> Shrews: and there's an environment variable you have to define (look in tox.ini)
20:45 <tobiash> zuul_remote_ip or something similar
20:46 <Shrews> tobiash: yeah, something still isn't right
20:46 <Shrews> ZUUL_REMOTE_IPV4=192.168.1.19 ZUUL_REMOTE_KEEP=true ZUUL_SSH_KEY=/home/shrews/.ssh/id_rsa ttrun -eremote tests.remote.test_remote_zuul_stream.TestZuulStream26.test_command
20:47 <tobiash> What's the error?
20:47 <Shrews> i wonder if i need to start a geard manually
20:47 <tobiash> No, but zookeeper
20:48 *** pcaruana has quit IRC
20:48 <Shrews> I see this repeatedly: http://paste.openstack.org/show/749778/
20:48 <Shrews> yeah, zk is running
20:49 <tobiash> Is there more log context?
20:51 <Shrews> oh, i think i need a password less ssh key for the zuul user
20:51 <Shrews> passwordless
20:51 <tobiash> Yes, definitely
20:51 <tobiash> That probably blocks the build
20:54 <Shrews> tobiash: yep, that was it.
20:54 <Shrews> thx
20:54 <corvus> mordred: is there a way to squash the alembic migrations?
20:55 <corvus> we spend enough time running those in tests that to do so might make a noticeable improvement
20:55 <corvus> about 4-5 seconds in the test i'm looking at right now
20:56 <corvus> though, also, a lot of tests are configured to use both mysql and pg, but only use one
20:56 <corvus> that would probably be an easier improvement
21:13 <corvus> tobiash: i have a reproducing test case; i'll polish it up and squash it into your change
21:16 <Shrews> ZUUL_REMOTE_KEEP does not appear to actually do anything
21:17 <Shrews> a 'git grep' finds it only in .zuul.yaml
21:24 <openstackgerrit> James E. Blair proposed zuul/zuul master: Fix requirements loop warning  https://review.opendev.org/655781
21:24 <corvus> tobiash, Shrews, mordred, fungi, pabelanger: that should take care of the looping requirements errors we saw in https://review.opendev.org/655173
21:25 <fungi> right on!
21:25 <fungi> reviewing will help me spend a few more minutes ignoring the fact that the lawn's only halfway done
21:26 <pabelanger> static/.keep strikes again
21:26 <pabelanger> I'm just about to #dadops for a few hours, but happy to review when back
21:33 <openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Don't repeat the etc/alias setup for buildset registry pushes  https://review.opendev.org/655802
21:36 *** rlandy has quit IRC
21:37 <corvus> pabelanger: oh, that's what you meant by keep
21:38 <corvus> apparently you were trying to tell me that there is a problem with https://review.opendev.org/655781
21:38 <fungi> and one i did not spot when i +2'd it
21:38 <corvus> enough information was missing that i did not understand your message
21:39 <fungi> but yeah, that would account for those test failures
21:40 <openstackgerrit> James E. Blair proposed zuul/zuul master: Fix requirements loop warning  https://review.opendev.org/655781
21:44 *** jamesmcarthur has quit IRC
21:47 <corvus> i did a quick look at the last 10 successes and 10 failures of the tox-py35 job.  gra1 is 0/3, ord is 4/1, dfw is 1/3, iad is 5/3 (success/failure respectively)
21:48 <corvus> i don't know if that's statistically significant or not. rax as a whole is well enough represented on both sides that i think perhaps it's not.
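
A quick way to sanity-check that hunch is to put the per-region counts into a contingency table and run a chi-square test; with cells this small the approximation is weak, so the p-value is only a rough guide (assumes scipy is available):

```python
# Rough significance check on the per-region success/failure counts quoted
# above (gra1 0/3, ord 4/1, dfw 1/3, iad 5/3).  Counts this small strain the
# chi-square approximation, so treat the result as a hint, not a verdict.
from scipy.stats import chi2_contingency

observed = [
    [0, 3],  # gra1: successes, failures
    [4, 1],  # ord
    [1, 3],  # dfw
    [5, 3],  # iad
]

chi2, p_value, dof, _expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f} dof={dof} p={p_value:.2f}")
# A p-value well above 0.05 supports "perhaps it's not significant";
# well below would point at a real per-region difference.
```
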
21:48 <corvus> mostly i was looking for whether we could say that a certain level of concurrency was good or bad.
21:49 <corvus> by default (s)testr uses the # of cpus.  maybe we should see if we can halve that?
21:49 <corvus> i'm trying to remember if we have something like "run testr with concurrency=cpu/2"
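
stestr accepts an explicit --concurrency value, so halving the default is just cpu_count // 2. A minimal wrapper sketch to illustrate the computation (the actual change wires this through tox/job configuration rather than a script like this):

```python
# Minimal sketch of "run stestr with concurrency = cpus / 2"; illustration
# only, the real change configures this via tox rather than a wrapper.
import multiprocessing
import subprocess
import sys

def run_tests_half_concurrency(extra_args=()):
    concurrency = max(1, multiprocessing.cpu_count() // 2)
    cmd = ["stestr", "run", "--concurrency", str(concurrency), *extra_args]
    return subprocess.call(cmd)

if __name__ == "__main__":
    sys.exit(run_tests_half_concurrency(sys.argv[1:]))
```
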
21:50 <clarkb> corvus: is it possible the memory leak is affecting the tests?
21:51 *** jamesmcarthur has joined #zuul
21:51 <corvus> clarkb: yes, depending on what exactly the leak is
21:52 <corvus> clarkb: i would tend to think though that since our zuul always starts up with about the same amount of memory, the tests alone probably aren't going to kill it that way
21:52 <corvus> but, considering that tests have gotten worse in approximately the same time period that the memleak showed up, we should not exclude that hypothesis
21:53 *** jamesmcarthur has quit IRC
21:56 *** jamesmcarthur has joined #zuul
21:56 <openstackgerrit> James E. Blair proposed zuul/zuul master: Halve stestr concurrency  https://review.opendev.org/655804
21:56 <corvus> clarkb, tobiash: ^ let's see what comes out of rechecking that a bunch?
21:59 <clarkb> seems reasonable
22:04 *** jamesmcarthur has quit IRC
22:05 <corvus> do we need 35 and 36 jobs?
22:06 <clarkb> opendev still runs on 35. I think the biggest concern is merging code that is not valid on 35 (like maybe something that assumes ordered dicts?)
22:06 <corvus> that's a good point
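
clarkb's "something that assumes ordered dicts" is the classic 3.5-versus-newer hazard: CPython only guarantees dict insertion order from 3.7 (3.6 preserves it as an implementation detail), so code relying on it can pass on newer interpreters and break on 3.5. A small illustration:

```python
# Illustration of the ordered-dict hazard for the 3.5 job: on 3.7+ the first
# print is guaranteed to show first, second, third in that order; on 3.5 the
# iteration order is arbitrary.
from collections import OrderedDict

options = {"first": 1, "second": 2, "third": 3}
print(list(options))

# If order matters and 3.5 must stay supported, be explicit about it:
ordered = OrderedDict([("first", 1), ("second", 2), ("third", 3)])
print(list(ordered))
```
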
22:11 <openstackgerrit> James E. Blair proposed zuul/zuul master: DNM: exercise halving concurrency  https://review.opendev.org/655805
22:11 <corvus> let's throw some computers at that ^
22:12 <openstackgerrit> James E. Blair proposed zuul/zuul master: DNM: exercise halving concurrency  https://review.opendev.org/655805
22:27 *** jamesmcarthur has joined #zuul
22:32 *** jamesmcarthur has quit IRC
22:36 <openstackgerrit> James E. Blair proposed zuul/zuul master: Fix race in test_job_pause_pre_skipped_child  https://review.opendev.org/655808
22:36 <corvus> found a legit and fixable test race ^
22:45 <openstackgerrit> Merged zuul/nodepool master: Update devstack settings and docs for opendev  https://review.opendev.org/654230
22:51 <mordred> mordred: re: alembic migrations - I think nova does rollup patches at release time (although I think they're hand-crafted) - they combine all of the things the migrations are doing into a single migration
22:51 <mordred> gah
22:51 <mordred> corvus: you are corvus, not mordred
22:52 <corvus> mordred: yeah, that sounds good; let's look into that when we come up for air
22:52 <mordred> corvus: and I think they try to have one per release - I don't think that approach would save us a ton, but perhaps from time to time we roll things up
22:52 <corvus> agreed
22:52 <mordred> corvus: yeah
22:54 <mordred> corvus, clarkb: re 35 and 36 - maybe we can get opendev running zuul from containers during our ansible/container time at the PTG - which would then perhaps let us consider bumping the python min and dropping the 3.5 job
22:54 <corvus> if we think other users aren't in the same boat
22:54 <mordred> yeah. we should obviously ask questions / communicate widely before doing such a thing
22:55 <mordred> I believe everyone who regularly talks in here other than opendev is already on newer python than 3.5 - because they're either using containers or they're deploying with software collections, which I believe are using 3.6 over in centos land
22:55 <corvus> ooh that's promising
23:03 <corvus> this has been a very rough day (week), and we still haven't merged the executor path patch; but we have made positive progress on every stumbling block we identified, so i think we're heading in the right direction.
23:03 <corvus> thanks everyone for your help :)
23:03 <mordred> corvus: ++
23:04 <mordred> corvus: it's always the worst when it's a cluster of issues - thanks for steering the ship
23:05 <clarkb> any idea if either of the two reverts I made is the one to focus on? or is that further down the todo list since it's stable for us in production?
23:08 <mordred> clarkb: I was reading through them earlier and the one that re-orged stuff seemed more likely to me - but that's TOTALLY unscientific
23:19 *** jamesmcarthur has joined #zuul
23:20 <openstackgerrit> Merged zuul/zuul master: Revert "Prepend path with bin dir of ansible virtualenv"  https://review.opendev.org/655491
23:27 <corvus> yeah, i was trying to get a feel for that too, and don't have a great idea.  they both still look good to me, but i feel like the reorg stuff (i think that's the job cancel one) is probably more likely to be able to hide something subtle
23:27 <corvus> the other one (missing projects) seems like something we can prove correct on a napkin.
23:27 <corvus> i'm probably going to regret saying that.
23:32 <pabelanger> corvus: yes, sorry, I was heading out the door, I really should have left that comment on gerrit
23:40 <mordred> corvus: those are basically my thoughts
23:40 <mordred> the cancel patch looks fine - but it's big enough that maybe I'm missing something, where the other one seems very straightforward ... and I'm certain I'm going to wind up being wrong :)
23:48 <pabelanger> so looking at the http://logs.openstack.org/08/655808/1/check/tox-py36/e72c2e2/testr_results.html.gz failure, I decided to run it locally; for me it takes 7 seconds to run. that failure took more than 45 seconds, it seems
23:48 *** jamesmcarthur has quit IRC
23:48 <pabelanger> so the system was likely under serious load
23:49 *** jamesmcarthur has joined #zuul
23:51 <pabelanger> 2019-04-25 23:03:45,552 kazoo.client                     DEBUG    Received error(xid=2) NoNodeError()
23:52 <pabelanger> that is from the failure
23:52 <pabelanger> but locally I do not get that
23:52 <corvus> that's probably normal
23:54 <openstackgerrit> James E. Blair proposed zuul/zuul master: Fix race in test_job_pause_pre_skipped_child  https://review.opendev.org/655808
23:54 <corvus> let's see if another waituntilsettled is needed there
23:55 <pabelanger> k
23:59 <corvus> the halve-concurrency patch is looking promising (655804 and the child which throws a lot of tests at it)
