openstackgerrit | Clark Boylan proposed zuul/zuul master: Make paused status bar blue https://review.opendev.org/655588 | 00:02 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Tiny cleanup in change panel js https://review.opendev.org/655589 | 00:02 |
clarkb | someone that understands react and jsx etc better than me should review ^ | 00:04 |
*** jamesmcarthur has quit IRC | 00:04 | |
*** jamesmcarthur has joined #zuul | 00:33 | |
*** mattw4 has quit IRC | 00:42 | |
*** rlandy|bbl is now known as rlandy | 00:48 | |
*** jamesmcarthur has quit IRC | 00:50 | |
*** jamesmcarthur has joined #zuul | 00:52 | |
*** jamesmcarthur has quit IRC | 00:53 | |
*** jamesmcarthur has joined #zuul | 00:54 | |
*** jamesmcarthur has quit IRC | 00:56 | |
*** jamesmcarthur has joined #zuul | 00:56 | |
*** jamesmcarthur has quit IRC | 02:00 | |
*** jamesmcarthur has joined #zuul | 02:01 | |
*** jamesmcarthur has quit IRC | 02:02 | |
*** jamesmcarthur has joined #zuul | 02:02 | |
*** bjackman has joined #zuul | 02:24 | |
*** jamesmcarthur has quit IRC | 02:52 | |
*** jamesmcarthur_ has joined #zuul | 02:53 | |
*** bhavikdbavishi has joined #zuul | 03:05 | |
*** smcginnis has quit IRC | 03:06 | |
*** bhavikdbavishi1 has joined #zuul | 03:08 | |
*** bhavikdbavishi has quit IRC | 03:09 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:09 | |
*** jamesmcarthur_ has quit IRC | 03:14 | |
*** jamesmcarthur has joined #zuul | 03:16 | |
*** bhavikdbavishi has quit IRC | 03:21 | |
*** bjackman has quit IRC | 03:39 | |
*** bhavikdbavishi has joined #zuul | 03:46 | |
*** bhavikdbavishi has quit IRC | 03:49 | |
*** bjackman has joined #zuul | 03:51 | |
*** jamesmcarthur has quit IRC | 03:58 | |
*** pcaruana has joined #zuul | 04:11 | |
*** bjackman has quit IRC | 04:42 | |
*** pcaruana has quit IRC | 04:43 | |
*** bjackman has joined #zuul | 04:44 | |
*** quiquell|off is now known as quiquell | 05:09 | |
bjackman | In the "independent" pipeline manager, is it expected that new patchsets to cross-repo dependee changes re-trigger dependent changes? | 05:30 |
*** electrofelix has joined #zuul | 06:10 | |
*** pcaruana has joined #zuul | 06:20 | |
AJaeger | bjackman: no, this is not done anywhere | 06:47 |
*** AJaeger has quit IRC | 06:47 | |
*** quiquell is now known as quiquell|brb | 06:48 | |
*** bjackman has quit IRC | 06:51 | |
*** themroc has joined #zuul | 06:54 | |
*** AJaeger has joined #zuul | 06:56 | |
*** zbr has joined #zuul | 07:24 | |
*** bhavikdbavishi has joined #zuul | 07:38 | |
*** gtema has joined #zuul | 07:40 | |
*** jpena|off is now known as jpena | 07:43 | |
*** quiquell|brb is now known as quiquell | 08:07 | |
*** zbr is now known as zbr|rover | 08:18 | |
*** zbr|rover has quit IRC | 09:42 | |
*** zbr has joined #zuul | 09:43 | |
electrofelix | Using the docker-compose env, where is the ARA generated? | 09:47 |
*** bhavikdbavishi has quit IRC | 10:20 | |
*** jpena is now known as jpena|lunch | 10:55 | |
*** rfolco|ruck is now known as rfolco|ruck|doct | 11:12 | |
*** zbr has quit IRC | 11:31 | |
*** zbr has joined #zuul | 11:32 | |
*** gtema has quit IRC | 11:38 | |
*** bhavikdbavishi has joined #zuul | 11:44 | |
*** panda is now known as panda|lunch | 11:59 | |
*** hashar has joined #zuul | 12:08 | |
*** rlandy has joined #zuul | 12:20 | |
*** altlogbot_2 has quit IRC | 12:34 | |
*** themroc has quit IRC | 12:34 | |
*** themr0c has joined #zuul | 12:36 | |
*** altlogbot_0 has joined #zuul | 12:40 | |
*** altlogbot_0 has quit IRC | 12:47 | |
*** altlogbot_2 has joined #zuul | 12:50 | |
*** gtema has joined #zuul | 13:05 | |
*** rfolco|ruck|doct is now known as rfolco|ruck | 13:18 | |
*** altlogbot_2 has quit IRC | 13:23 | |
*** themr0c has quit IRC | 13:32 | |
*** altlogbot_1 has joined #zuul | 13:34 | |
*** hashar has quit IRC | 13:36 | |
*** themroc has joined #zuul | 13:42 | |
*** altlogbot_1 has quit IRC | 13:42 | |
*** themroc has quit IRC | 13:43 | |
*** themroc has joined #zuul | 13:44 | |
*** altlogbot_1 has joined #zuul | 13:58 | |
*** panda|lunch is now known as panda | 14:02 | |
*** gtema has quit IRC | 14:06 | |
corvus | electrofelix: it isn't; the jobs are very simple and don't use the ara roles | 14:22 |
*** themroc has quit IRC | 14:32 | |
*** themroc has joined #zuul | 14:33 | |
*** jpena|lunch is now known as jpena | 14:33 | |
electrofelix | corvus: oops, badly put together question, should have asked, where do I need to look to have jobs provide ARA | 14:41 |
electrofelix | I don't mind adding it in, just not sure where bits are placed, is it fully a post part of the jobs? | 14:42 |
corvus | electrofelix: yes, there's a role for it; you can look in zuul-jobs for the role and its documentation, or the opendev jobs | 14:44 |
corvus | electrofelix: we just discovered a problem that may impact that though | 14:45 |
corvus | electrofelix: i'm not 100% sure that's going to be stable in the quickstart environment for the next little bit. i think it should work now, but i think we're about to break it and fix it | 14:46 |
corvus | electrofelix: so just keep that in mind, and sorry | 14:46 |
electrofelix | corvus: thanks, no worries, I'm busy breaking it and fixing as I go as well | 14:48 |
*** electrofelix has quit IRC | 14:48 | |
*** electrofelix has joined #zuul | 14:48 | |
*** electrofelix has quit IRC | 14:49 | |
*** ericbarrett has joined #zuul | 14:57 | |
*** themroc has quit IRC | 15:04 | |
*** themroc has joined #zuul | 15:05 | |
*** quiquell is now known as quiquell|off | 15:14 | |
*** jamesmcarthur has joined #zuul | 15:20 | |
*** themroc has quit IRC | 15:30 | |
*** jamesmcarthur has quit IRC | 15:32 | |
*** jamesmcarthur has joined #zuul | 15:34 | |
openstackgerrit | Merged zuul/zuul master: Update references for opendev https://review.opendev.org/654238 | 15:45 |
pabelanger | \o/ | 15:46 |
tobiash | woohoo, took just a day of rechecking | 15:50 |
tobiash | I have no idea why this got that unstable :( | 15:51 |
pabelanger | does insecure-ci-registry.opendev.org not have a web interface to see what images are currently uploaded? | 15:53 |
pabelanger | Or do you need to use the docker CLI to see that? | 15:53 |
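For reference, a registry that implements the Docker Registry HTTP API V2 exposes `GET /v2/_catalog` to list repositories and `GET /v2/<name>/tags/list` to list a repository's tags. Whether insecure-ci-registry.opendev.org allows catalog queries (some registries restrict them) is an assumption; this sketch only builds the URLs and parses a catalog response body, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote

# Assumption: the registry speaks the Docker Registry HTTP API V2 and permits
# catalog listing. The base URL is only an example.
REGISTRY = "https://insecure-ci-registry.opendev.org"


def catalog_url(base: str = REGISTRY) -> str:
    """URL that lists repositories (GET /v2/_catalog)."""
    return f"{base}/v2/_catalog"


def tags_url(repository: str, base: str = REGISTRY) -> str:
    """URL that lists the tags of one repository (GET /v2/<name>/tags/list)."""
    # Repository names may contain '/', which must stay unescaped in the path.
    return f"{base}/v2/{quote(repository, safe='/')}/tags/list"


def parse_catalog(body: str) -> list:
    """Extract repository names from a /v2/_catalog JSON response body."""
    return json.loads(body).get("repositories", [])
```

The URLs can be fetched with any HTTP client (for example `urllib.request.urlopen`), which avoids needing the docker CLI just to see what has been pushed.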
pabelanger | tobiash: btw: I think ansible 2.8.0rc1 is today, at least according to roadmap | 15:55 |
pabelanger | going to refresh your patch and make sure everything is green | 15:55 |
*** weshay|rover is now known as weshay | 15:59 | |
*** hashar has joined #zuul | 16:06 | |
*** jamesmcarthur has quit IRC | 16:18 | |
*** sshnaidm|off has quit IRC | 16:18 | |
*** pcaruana has quit IRC | 16:19 | |
*** quiquell|off has quit IRC | 16:20 | |
*** jamesmcarthur has joined #zuul | 16:31 | |
*** zbr is now known as zbr|rover | 16:37 | |
*** pcaruana has joined #zuul | 16:40 | |
*** hashar has quit IRC | 16:45 | |
*** jpena is now known as jpena|off | 16:50 | |
*** mattw4 has joined #zuul | 16:55 | |
*** saneax has quit IRC | 16:57 | |
*** sshnaidm has joined #zuul | 17:03 | |
*** sshnaidm has quit IRC | 17:03 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add retries to skopeo copy operations https://review.opendev.org/655739 | 17:20 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add retries to skopeo copy operations https://review.opendev.org/655739 | 17:22 |
mordred | pabelanger, tobiash: ^^ does that retries syntax look ok? | 17:30 |
*** nickx-intel has quit IRC | 17:31 | |
tobiash | mordred: lgtm | 17:37 |
mordred | tobiash: thanks! | 17:37 |
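The change above wraps flaky `skopeo copy` operations in a retry. The idea is a retry-until loop: run the action, check a condition, wait, and repeat up to a limit. The sketch below is a Python analogue of that pattern for illustration; it is not Ansible's implementation of `until`/`retries`/`delay`, and the function name and parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until(action: Callable[[], T], until: Callable[[T], bool],
                attempts: int = 3, delay: float = 5.0,
                sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action` until `until(result)` is true or attempts run out.

    Illustrative analogue of a task-level retry loop, not Ansible's code.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    result = action()
    for _ in range(attempts - 1):
        if until(result):
            return result
        sleep(delay)  # back off before retrying, e.g. a flaky network copy
        result = action()
    if not until(result):
        raise RuntimeError(f"condition not met after {attempts} attempts")
    return result
```

Passing `sleep` as a parameter keeps the helper testable without real delays.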
*** hashar has joined #zuul | 17:39 | |
corvus | tobiash: two big problems held up 654238 -- unit tests unstable (the usual thing), and some problems with the buildset registry system | 17:41 |
corvus | mordred and i have been working through those in infra; 655739 is hopefully a fix to one of them | 17:41 |
corvus | https://review.opendev.org/655744 is the other | 17:41 |
tobiash | corvus: ah, so two random things that accumulate | 17:41 |
corvus | yep | 17:41 |
mordred | gotta love when that happens | 17:42 |
*** jangutter has quit IRC | 17:42 | |
corvus | they seemed to be very cooperative in that very often, exactly one of them caused the patch to fail. :| | 17:42 |
corvus | while we're waiting on that -- i looked at our mem usage: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all | 17:42 |
corvus | it looks ridiculously good | 17:42 |
corvus | and we're still pretty busy today | 17:43 |
corvus | hopefully that means the problem is in one of those 2 patches clarkb reverted | 17:43 |
corvus | (clarkb performed a local revert of 2 patches on our scheduler; they haven't been reverted in the codebase yet) | 17:43 |
corvus | 9f7c642a and 3704095c are the ones he reverted | 17:45 |
tobiash | impressive | 17:47 |
tobiash | oh, both patches by me :/ | 17:48 |
corvus | and reviewed by me :) | 17:48 |
Shrews | hrm, i wish the nodepool leak was so easy to nail down | 17:48 |
corvus | Shrews: i'm happy it's not so urgent that we have to emergency-local-revert patches just to do something to keep it from being on fire :) | 17:49 |
corvus | Shrews: but.... i do think that's a perfectly fine thing to do if you want to try bisecting | 17:49 |
corvus | Shrews: it's actually pretty low-impact for nodepool. but i know we merged a bunch of leaky-cleanup things recently, and that could make it tricky. | 17:50 |
Shrews | corvus: i suppose we could try reverting the zk cache on 1 launcher to see if our suspicions are at least on the right track | 17:51 |
clarkb | tobiash: fwiw I identified ~4 likely changes but only those two I felt were safe to revert | 17:51 |
tobiash | so we have one bisect step left... | 17:52 |
clarkb | the other two were my security fix and the pass artifacts to child jobs | 17:52 |
openstackgerrit | Merged zuul/zuul-jobs master: Add retries to skopeo copy operations https://review.opendev.org/655739 | 17:54 |
corvus | Shrews: yeah -- that's necessary for the relative_priority feature too, so it'd have to go as well... that would be :( but we can live with it for a while | 17:54 |
Shrews | corvus: oh, ugh | 17:54 |
Shrews | not sure if we can run launchers in different feature sets. i'd have to take a look at that relative_priority code again | 17:55 |
corvus | Shrews: i think it should be safe | 17:55 |
Shrews | corvus: might be better to attempt that next week when there is less traffic? if we mess up the zk database somehow, it wouldn't be so impactful then | 17:56 |
pabelanger | tristanC: SpamapS: you maybe interested in https://review.opendev.org/655188/ for a github improvement on merge commits | 17:59 |
pabelanger | also, once things stabilize with our jobs, I'd love to get some eyes on https://review.opendev.org/655204/ another improvement for github to help make sure zuul doesn't miss an event | 18:00 |
*** bhavikdbavishi has quit IRC | 18:00 | |
Shrews | corvus: though, i don't see any dependency on the cache here: https://review.opendev.org/620954 | 18:01 |
Shrews | oh, nm | 18:01 |
Shrews | still, so much in flux right now, might be best to wait a bit anyway | 18:06 |
pabelanger | is there a way to do pagination on build results? | 18:11 |
*** hashar has quit IRC | 18:20 | |
tobiash | pabelanger: yes, but only in the api, not the ui | 18:22 |
tobiash | it supports the parameters count and skip | 18:23 |
*** corvus is now known as jeblair | 18:23 | |
*** jeblair is now known as corvus | 18:23 | |
pabelanger | tobiash: okay, thanks | 18:23 |
tobiash | (I think it was count and skip) | 18:24 |
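Paging through build results via the API amounts to stepping the skip offset by the page size. In this sketch the `/api/tenant/<tenant>/builds` path and the parameter names (`count` and `skip`, as recalled above with some uncertainty) are assumptions to check against the Zuul web API documentation, so both names are left configurable; the helper itself is hypothetical.

```python
from urllib.parse import urlencode


def build_page_urls(base: str, tenant: str, page_size: int, pages: int,
                    size_param: str = "count", skip_param: str = "skip") -> list:
    """Return URLs for successive pages of build results.

    Assumption: endpoint path and parameter names come from the discussion
    above and may differ in the actual API.
    """
    urls = []
    for page in range(pages):
        # Each page skips all results already returned by earlier pages.
        query = urlencode({size_param: page_size, skip_param: page * page_size})
        urls.append(f"{base}/api/tenant/{tenant}/builds?{query}")
    return urls
```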
pabelanger | tobiash: btw: did you ever get around to trying molecule with zuul-executor again? IIRC you pushed up a patch a while back to add support | 18:44 |
tobiash | pabelanger: I never did anything with molecule yet | 18:44 |
tobiash | maybe it was SpamapS? | 18:45 |
pabelanger | oh, sorry | 18:45 |
pabelanger | mitogen | 18:45 |
pabelanger | got too many ansible things on brain | 18:45 |
SpamapS | I did mitogen yes | 18:45 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP add repl https://review.opendev.org/579962 | 18:45 |
SpamapS | 'twas fast, but I did not do it in a Zuul context | 18:45 |
tobiash | pabelanger: I did a proof of concept with mitogen | 18:46 |
tobiash | in zuul | 18:46 |
tobiash | which worked in a quick test | 18:46 |
pabelanger | yah, I just enabled it for some testing yesterday, and now looks like ansible-playbook is using less resources | 18:46 |
tobiash | pabelanger: https://review.opendev.org/582728 and parent | 18:47 |
pabelanger | was thinking, if it worked with zuul, as a way to squeeze more jobs out of an executor | 18:47 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Support Ansible 2.8 https://review.opendev.org/631933 | 18:48 |
pabelanger | 2.8.0rc1 is now out ^ | 18:48 |
tobiash | I also had to patch the zuul_stream module to fork which i don't seem to have pushed up | 18:48 |
pabelanger | tobiash: cool, will look tonight | 18:49 |
pabelanger | not an issue for us yet, but something I remembered today | 18:49 |
tobiash | pabelanger: btw, in my deployment the io during job startups seems to be the mainly limiting factor to the number of jobs per executor | 19:02 |
*** jamesmcarthur has quit IRC | 19:02 | |
tobiash | but my users are also mainly dealing with large repos | 19:03 |
pabelanger | mordred: corvus: fungi: clarkb: tobiash: do you mind adding https://review.opendev.org/655474/ to your review pipeline? It is related to the recent ML discussion about pushing dev releases to pypi so humans can pip install --pre zuul | 19:13 |
corvus | pabelanger: honestly, i'm very much focused on stability at this point | 19:14 |
*** jamesmcarthur has joined #zuul | 19:14 | |
pabelanger | ack, understood | 19:14 |
corvus | we have a very large memory leak, the unit test jobs are unstable, the registry jobs are not working, there seems to be a serious bug with artifact passing causing loops... | 19:15 |
corvus | those are just the major fires that come to mind | 19:15 |
corvus | if anyone is looking for a way to help, i'd love a theory as to how the loop that shows up in https://review.opendev.org/655173 is even possible | 19:15 |
corvus | or why did this fail? http://logs.openstack.org/91/655491/2/gate/zuul-tox-remote/4440cc9/testr_results.html.gz | 19:16 |
corvus | that's a problem with the log streaming | 19:16 |
corvus | it looks like it ran the rescue task -- why didn't it end up in the console log? | 19:16 |
*** jamesmcarthur has quit IRC | 19:17 | |
corvus | maybe it's time to sound the alarm and ask that we all drop work on new features and dig into those things | 19:18 |
*** jamesmcarthur has joined #zuul | 19:18 | |
corvus | i'm happy to review patches that are major blockers to folks, but otherwise, please pitch in and help maintain the stability of the software | 19:19 |
pabelanger | sure, I should have some cycles tomorrow to dig into some tox failures | 19:19 |
corvus | zuul-maint: due to the current instability, i suggest that we don't approve changes other than test or critical bugfixes and focus on the multitude of problems that have crept up recently which are preventing us from merging changes. | 19:22 |
SpamapS | tobiash: I see the same thing btw... IO from the rsyncs is also what pushes my executors. And since it all comes at once, it's pretty bursty. | 19:28 |
SpamapS | I currently only run two executors.. and with a monorepo.. some changes kick off 10 jobs ... job startup lags a *lot* when 3 or more patches are submitted all at once. | 19:29 |
Shrews | corvus: done with the many doctoring things i had today. still need someone to look at the looping in 655173? | 19:29 |
SpamapS | But I figure that's just how things work. :-P | 19:30 |
tobiash | SpamapS: you may want to use https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/prepare-workspace-git instead of prepare-workspace | 19:30 |
corvus | Shrews: yeah, i'm stumped there | 19:30 |
SpamapS | I don't have a builder, so it won't help. | 19:31 |
SpamapS | tobiash:cool though! | 19:31 |
SpamapS | Until somebody finishes off nodepool-builder mods for AWS... I'm just on stock Ubuntu images. | 19:31 |
corvus | Shrews: the methods involved are supposed to be called a lot, but it's supposed to only leave that warning once because it creates a fake build | 19:31 |
*** jamesmcarthur has quit IRC | 19:32 | |
Shrews | corvus: this is seen in any of the 4 failed tests? | 19:32 |
corvus | Shrews: it's the "requirements not met by build" thing | 19:32 |
corvus | if you're using hideci, don't. :) | 19:32 |
Shrews | ah ha | 19:32 |
corvus | pabelanger: are you using hideci? | 19:33 |
tobiash | corvus: what do you mean with loop wrt 655173? | 19:33 |
tobiash | the zuul-stream-functional failures? | 19:33 |
Shrews | tobiash: see the zuul comment | 19:33 |
corvus | no one can see the error except me because of hideci :( | 19:33 |
tobiash | oh I see it now | 19:33 |
corvus | zuul-maint: please don't use the "hide ci" functionality on the opendev gerrit | 19:34 |
tobiash | corvus: I turn it off frequently but it seems to default to on | 19:34 |
corvus | we need to get rid of that once we upgrade gerrit | 19:34 |
Shrews | corvus: it's not that the ci comment is hidden, it just isn't expanded | 19:34 |
tobiash | (web ui) | 19:34 |
Shrews | not sure if that's configurable | 19:34 |
corvus | Shrews: it may be both things | 19:35 |
corvus | i'm going to assume that's why pabelanger kept leaving recheck comments instead of reporting the bug | 19:36 |
corvus | as the authors of the ci system, we kind of need to look at it :) | 19:36 |
pabelanger | hideci is off now | 19:37 |
pabelanger | as for rechecks in 655173, ya I was trying to see if build-images was going to be green | 19:38 |
pabelanger | but last night did see unitest failure you mentioned, I just haven't dug into it yet | 19:38 |
tobiash | corvus: so looking at the code the expected result would have been a failed build instead right? | 19:39 |
corvus | to be clear, *this* is the issue with 655173: https://screenshots.firefox.com/gOvoTW5JW5PvymIT/review.opendev.org | 19:40 |
corvus | and apparently firefox screenshots crops images to a max height | 19:40 |
pabelanger | Oh | 19:40 |
pabelanger | no, i did not see that | 19:40 |
tobiash | crystal clear now ;) | 19:40 |
tobiash | https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L2383 | 19:40 |
tobiash | that's the line that emits the warning | 19:40 |
pabelanger | I really should setup gertty again | 19:40 |
corvus | yep. and the next 3 lines are the thing that's supposed to keep it from happening more than once | 19:40 |
tobiash | but the context tells me that if we hit that we want to fail the build | 19:40 |
corvus | tobiash: i think the build was failed (it's zuul-quick-start) | 19:41 |
corvus | it shows up failed in that screenshot | 19:41 |
corvus | (it's garbled in the report because the format is wrong, but it's there) | 19:41 |
corvus | oh, i bet another "openstackism" is at play here | 19:42 |
corvus | it's not reported that way in the table at the top | 19:42 |
corvus | and again, that's hideci's fault | 19:42 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Fix requirements loop warning https://review.opendev.org/655781 | 19:45 |
tobiash | corvus: I think this should fix that loop ^ | 19:45 |
corvus | tobiash: OOOOOOOOOOH | 19:46 |
corvus | tobiash: that makes sense -- it used to be skipped | 19:46 |
tobiash | the failed build is not ignored there | 19:46 |
corvus | i looked at that over and over running that code in my head and kept making the wrong branch choices there | 19:46 |
corvus | thank you | 19:46 |
tobiash | no problem :) | 19:47 |
tobiash | but it's too late for me to think about a proper test case, so feel free to take it over | 19:47 |
Shrews | oh, i'm glad tobiash looked as I certainly don't have the context of that code in my head :) | 19:49 |
clarkb | 19:52 | |
clarkb | derp | 19:52 |
tobiash | corvus: another thing, shouldn't the result be FAILURE instead of FAILED? | 19:52 |
corvus | tobiash: yes | 19:53 |
corvus | tobiash: let's fix that too | 19:53 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Fix requirements loop warning https://review.opendev.org/655781 | 19:53 |
tobiash | both need to match otherwise the fix is not a fix :) | 19:54 |
corvus | if we're lucky the existing test just needs to be updated to check that the warning appears *once* | 19:58 |
corvus | it checks that it's present, but not that it's not duplicated | 19:58 |
corvus | we're not lucky | 20:00 |
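The loop fix depends on two things discussed above: the placeholder build must be recognised on later passes, and the result string that is written must match the one being checked (`FAILURE`, not `FAILED`). The following is a hypothetical sketch of that warn-once pattern; the class and method names are illustrative and are not Zuul's actual `model.py`.

```python
# Hypothetical sketch of a warn-once guard; names are illustrative only.
FAILURE = "FAILURE"  # one shared constant so the writer and the check can't drift


class FakeBuild:
    def __init__(self, result: str):
        self.result = result


class BuildSet:
    def __init__(self):
        self.builds = {}
        self.warnings = []

    def requirements_not_met(self, job_name: str) -> None:
        """Record an unmet-requirements warning at most once per job."""
        existing = self.builds.get(job_name)
        if existing is not None and existing.result == FAILURE:
            return  # placeholder already recorded: don't warn again
        self.warnings.append(f"Requirements not met by build {job_name}")
        # A placeholder failed build makes later calls take the early return.
        self.builds[job_name] = FakeBuild(FAILURE)
```

A regression test for this kind of bug should call the method several times and assert the warning appears exactly once, not merely that it is present.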
mnaser | hi zuul-ers -- I'm trying to fix opendev to catch those extra failure templates | 20:06 |
mnaser | I have 'ipa-tempest-partition-bios-ipmi-direct-tinyipa-src finger://ze04.openstack.org/4fcf5618a3624c039c4cd8669f982f28 : RETRY_LIMIT in 36m 52s' as a potential example of failure | 20:07 |
mnaser | and then the 'FAILED' one too ? | 20:07 |
mnaser | is there somewhere I can find in the code the possible ways that Zuul can report things to gerrit | 20:07 |
mnaser | so I can try to catch as much of em as possible | 20:07 |
Shrews | corvus: that log streamer error (rescue output not found) is weird. I see it logged in the test output | 20:07 |
mnaser | okay, I assume https://opendev.org/zuul/zuul/src/branch/master/zuul/reporter/__init__.py | 20:08 |
mnaser | https://opendev.org/zuul/zuul/src/branch/master/zuul/reporter/__init__.py#L207-L208 well thats an interesting choice of var :p | 20:10 |
mordred | mnaser: I started poking at a solution to the problem you're talking about the other day but haven't had time to come back to it | 20:11 |
mordred | mnaser: things are on fire a bit - so how about I circle around with you on it tomorrow or at the ptg? | 20:11 |
mnaser | mordred: afaik all that is needed is add the appropriate thing here -- https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/manifests/review.pp#L176-L180 | 20:11 |
mnaser | mordred: cool no worries | 20:11 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: A reporter for Elasticsearch with the capability to index build and buildset results in an index. https://review.opendev.org/644927 | 20:18 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: A reporter for Elasticsearch with the capability to index build and buildset results in an index. https://review.opendev.org/644927 | 20:19 |
corvus | mnaser, mordred: there's no definitive set of things zuul can report. the only things i will guarantee (for the foreseeable future) are that one of them will be "SUCCESS", and that anything else is unsuccessful in some way | 20:24 |
corvus | i say that because that's all that zuul actually cares about | 20:24 |
corvus | the rest are for humans, and we should feel free to add more as they are useful | 20:24 |
corvus | "FAILED" was a mistake and we're correcting it | 20:25 |
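Given that only `SUCCESS` is guaranteed, a comment-matching rule is more robust if it treats any other result token as unsuccessful instead of enumerating failure values. This sketch parses report lines shaped like mnaser's sample (`<job> <url> : <RESULT> in <duration>`); that layout is inferred from a single example and is an assumption, and the helper name is hypothetical.

```python
import re

# Assumption: report lines look like "<job> <url> : <RESULT> in <duration>",
# inferred from one sample; real reports may carry extra annotations.
REPORT_LINE = re.compile(r"^(?P<job>\S+) (?P<url>\S+) : (?P<result>[A-Z_]+)\b")


def parse_report_line(line: str):
    """Return (job, result, succeeded), or None if the line doesn't match."""
    m = REPORT_LINE.match(line.strip())
    if m is None:
        return None
    result = m.group("result")
    # Only "SUCCESS" is guaranteed; every other value counts as unsuccessful.
    return m.group("job"), result, result == "SUCCESS"
```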
*** panda is now known as panda|off | 20:30 | |
*** jamesmcarthur has joined #zuul | 20:32 | |
Shrews | tobiash: i know i figured this out at one point in the past, but what was the secret sauce to run the tox remote tests locally? | 20:42 |
tobiash | Shrews: ssh access to user zuul (when using localhost as target don't use the loopback address) | 20:43 |
tobiash | Shrews: and there's an environment variable you have to define ( look in tox.ini) | 20:44 |
tobiash | zuul_remote_ip or something similar | 20:45 |
Shrews | tobiash: yeah, something still isn't right | 20:46 |
Shrews | ZUUL_REMOTE_IPV4=192.168.1.19 ZUUL_REMOTE_KEEP=true ZUUL_SSH_KEY=/home/shrews/.ssh/id_rsa ttrun -eremote tests.remote.test_remote_zuul_stream.TestZuulStream26.test_command | 20:46 |
tobiash | What's the error? | 20:47 |
Shrews | i wonder if i need to start a geard manually | 20:47 |
tobiash | No, but zookeeper | 20:47 |
*** pcaruana has quit IRC | 20:48 | |
Shrews | I see this repeatedly: http://paste.openstack.org/show/749778/ | 20:48 |
Shrews | yeah, zk is running | 20:48 |
tobiash | Is there more log context? | 20:49 |
Shrews | oh, i think i need a password less ssh key for the zuul user | 20:51 |
Shrews | passwordless | 20:51 |
tobiash | Yes, definitely | 20:51 |
tobiash | That probably blocks the build | 20:51 |
Shrews | tobiash: yep, that was it. | 20:54 |
Shrews | thx | 20:54 |
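The remote-test prerequisites that came up here are: ssh access as the zuul user, a target address that is not loopback, a passwordless key, and a running ZooKeeper. A small preflight check can catch the environment-variable part before a run hangs. This is a hypothetical helper: the variable names come from Shrews's command line, treating `ZUUL_SSH_KEY` as required is an assumption, and neither key passphrases nor ZooKeeper are checked.

```python
import ipaddress
import os


def remote_test_preflight(env=None) -> list:
    """Return problems with the remote-test environment (empty list = OK).

    Hypothetical check; it does not verify that the key is passwordless or
    that ZooKeeper is running.
    """
    env = os.environ if env is None else env
    problems = []
    ip = env.get("ZUUL_REMOTE_IPV4")
    if not ip:
        problems.append("ZUUL_REMOTE_IPV4 is not set")
    else:
        try:
            if ipaddress.ip_address(ip).is_loopback:
                problems.append("ZUUL_REMOTE_IPV4 must not be a loopback address")
        except ValueError:
            problems.append(f"ZUUL_REMOTE_IPV4 is not a valid IP: {ip!r}")
    key = env.get("ZUUL_SSH_KEY")
    if not key:
        problems.append("ZUUL_SSH_KEY is not set")
    elif not os.path.isfile(key):
        problems.append(f"ZUUL_SSH_KEY does not exist: {key}")
    return problems
```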
corvus | mordred: is there a way to squash the alembic migrations? | 20:54 |
corvus | we spend enough time running those in tests that to do so might make a noticeable improvement | 20:55 |
corvus | about 4-5 seconds in the test i'm looking at right now | 20:55 |
corvus | though, also, a lot of tests are configured to use both mysql and pg, but only use one | 20:56 |
corvus | that would probably be an easier improvement | 20:56 |
corvus | tobiash: i have a reproducing test case; i'll polish it up and squash it into your change | 21:13 |
Shrews | ZUUL_REMOTE_KEEP does not appear to actually do anything | 21:16 |
Shrews | a 'git grep' finds it only in .zuul.yaml | 21:17 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix requirements loop warning https://review.opendev.org/655781 | 21:24 |
corvus | tobiash, Shrews, mordred, fungi, pabelanger: that should take care of the looping requirements errors we saw in https://review.opendev.org/655173 | 21:24 |
fungi | right on! | 21:25 |
fungi | reviewing will help me spend a few more minutes ignoring the fact that the lawn's only halfway done | 21:25 |
pabelanger | static/.keep strikes again | 21:26 |
pabelanger | I'm just about to #dadops for a few hours, but happy to review when back | 21:26 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Don't repeat the etc/alias setup for buildset registry pushes https://review.opendev.org/655802 | 21:33 |
*** rlandy has quit IRC | 21:36 | |
corvus | pabelanger: oh, that's what you meant by keep | 21:37 |
corvus | apparently you were trying to tell me that there is a problem with https://review.opendev.org/655781 | 21:38 |
fungi | and one i did not spot when i +2'd it | 21:38 |
corvus | enough information was missing that i did not understand your message | 21:38 |
fungi | but yeah, that would account for those test failures | 21:39 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix requirements loop warning https://review.opendev.org/655781 | 21:40 |
*** jamesmcarthur has quit IRC | 21:44 | |
corvus | i did a quick look at the last 10 successes and 10 failures of the tox-py35 job. gra1 is 0/3, ord is 4/1, dfw is 1/3, iad is 5/3 (success/failure respectively) | 21:47 |
corvus | i don't know if that's statistically significant or not. rax as a whole is well enough represented on both sides that i think perhaps it's not. | 21:48 |
corvus | mostly i was looking for whether we could say that a certain level of concurrency was good or bad. | 21:48 |
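One way to put a number on the region split is Fisher's exact test on a 2x2 table. The grouping used here is an illustrative choice and does not come from the discussion: the three rax regions (ord, dfw, iad) total 10 successes and 7 failures, and gra1 has 0 successes and 3 failures. With only 20 builds, the two-sided p-value is 4/19 (about 0.21), which is consistent with the hunch that the split is not significant.

```python
from fractions import Fraction
from math import comb  # Python 3.8+


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. providers), columns are (successes, failures).
    Exact Fraction arithmetic makes equal-probability tables compare equal.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def p(k: int) -> Fraction:
        # Hypergeometric probability that row 1 contains k successes.
        return Fraction(comb(col1, k) * comb(n - col1, row1 - k), denom)

    observed = p(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum((p(k) for k in range(lo, hi + 1) if p(k) <= observed), Fraction(0))
```

For example, `fisher_exact_two_sided(10, 7, 0, 3)` evaluates to `Fraction(4, 19)`.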
corvus | by default (s)testr uses the # of cpus. maybe we should see if we can halve that? | 21:49 |
corvus | i'm trying to remember if we have something like "run testr with concurrency=cpu/2" | 21:49 |
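Halving the worker count relative to the CPU count is straightforward to compute. `stestr run` accepts a `--concurrency` option, but how 655804 actually wires this into tox is not shown in this log. The helpers below are a hypothetical sketch that clamps the value so a single-CPU node still gets one worker.

```python
import os


def half_concurrency(cpus=None) -> int:
    """Half the CPU count, but never fewer than one worker."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cpus // 2)


def stestr_command(cpus=None) -> list:
    """Argument list for running stestr with halved concurrency."""
    return ["stestr", "run", "--concurrency", str(half_concurrency(cpus))]
```

The list can be passed directly to `subprocess.run` in a wrapper script.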
clarkb | corvus: is it possible the memory leak is affecting the tests? | 21:50 |
*** jamesmcarthur has joined #zuul | 21:51 | |
corvus | clarkb: yes, depending on what exactly the leak is | 21:51 |
corvus | clarkb: i would tend to think though that since our zuul always starts up with about the same amount of memory, the tests alone probably aren't going to kill it that way | 21:52 |
corvus | but, considering that tests have gotten worse in approximately the same time period that the memleak showed up, we should not exclude that hypothesis | 21:52 |
*** jamesmcarthur has quit IRC | 21:53 | |
*** jamesmcarthur has joined #zuul | 21:56 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Halve stestr concurrency https://review.opendev.org/655804 | 21:56 |
corvus | clarkb, tobiash: ^ let's see what comes out of rechecking that a bunch? | 21:56 |
clarkb | seems reasonable | 21:59 |
*** jamesmcarthur has quit IRC | 22:04 | |
corvus | do we need 35 and 36 jobs? | 22:05 |
clarkb | opendev still runs on 35. I think the biggest concern is merging code that is not valid on 35 (like maybe something that assumes ordered dicts?) | 22:06 |
corvus | that's a good point | 22:06 |
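For context on the ordered-dict example: dict insertion order is a language guarantee only from Python 3.7. It is a CPython implementation detail in 3.6 and is not preserved in 3.5. Code that must still run on 3.5 and depends on ordering should request it explicitly, for instance with `collections.OrderedDict`. A minimal illustration (the helper name is hypothetical):

```python
from collections import OrderedDict


def ordered_jobs(jobs):
    """Map job name -> config, preserving declaration order.

    OrderedDict keeps insertion order on every Python 3 version, whereas a
    plain dict only guarantees it from 3.7 onwards.
    """
    result = OrderedDict()
    for name, config in jobs:
        result[name] = config
    return result
```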
openstackgerrit | James E. Blair proposed zuul/zuul master: DNM: exercise halving concurrency https://review.opendev.org/655805 | 22:11 |
corvus | let's throw some computers at that ^ | 22:11 |
openstackgerrit | James E. Blair proposed zuul/zuul master: DNM: exercise halving concurrency https://review.opendev.org/655805 | 22:12 |
*** jamesmcarthur has joined #zuul | 22:27 | |
*** jamesmcarthur has quit IRC | 22:32 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix race in test_job_pause_pre_skipped_child https://review.opendev.org/655808 | 22:36 |
corvus | found a legit and fixable test race ^ | 22:36 |
openstackgerrit | Merged zuul/nodepool master: Update devstack settings and docs for opendev https://review.opendev.org/654230 | 22:45 |
mordred | mordred: re: alembic migrations - I think nova does rollup patches at release time (although I think they're hand-crafted) - that combine all of the things the migration is doing into a single migration | 22:51 |
mordred | gah | 22:51 |
mordred | corvus: you are corvus, not mordred | 22:51 |
corvus | mordred: yeah, that sounds good; let's look into that when we come up for air | 22:52 |
mordred | corvus: and I think they try to have one-per-release - I don't think that approach would save us a ton, but perhaps from time to time we roll things up | 22:52 |
corvus | agreed | 22:52 |
mordred | corvus: yeah | 22:52 |
mordred | corvus, clarkb: re 35 and 36 - maybe we can get opendev running zuul from containers during our ansible/container time at the PTG - which would then perhaps let us consider bumping the python min and dropping the 3.5 job | 22:54 |
corvus | if we think other users aren't in the same boat | 22:54 |
mordred | yeah. we should obviously ask questions / communicate widely before doing such a thing | 22:54 |
mordred | I believe everyone who regularly talks in here other than opendev are all already on newer python than 3.5 - because they're either using containers or they're deploying with software collections which I believe are using 3.6 over in centos land | 22:55 |
corvus | ooh that's promising | 22:55 |
corvus | this has been a very rough day(week), and we still haven't merged the executor path patch; but we have made positive progress on every stumbling block we identified, so i think we're heading in the right direction. | 23:03 |
corvus | thanks everyone for your help :) | 23:03 |
mordred | corvus: ++ | 23:03 |
mordred | corvus: it's always the worst when it's a cluster of issues - thanks for steering the ship | 23:04 |
clarkb | any idea if either of the two reverts I made is the one to focus on? or is that further down the todo list since its stable for us in production? | 23:05 |
mordred | clarkb: I was reading through them earlier and the one that re-orged stuff seemed more likely to me - but that's TOTALLY unscientific | 23:08 |
*** jamesmcarthur has joined #zuul | 23:19 | |
openstackgerrit | Merged zuul/zuul master: Revert "Prepend path with bin dir of ansible virtualenv" https://review.opendev.org/655491 | 23:20 |
corvus | yeah, i was trying to get a feel for that too, and don't have a great idea. they both still look good to me, but i feel like the reorg stuff (i think that's the job cancel one) is probably more likely to be able to hide something subtle | 23:27 |
corvus | the other one (missing projects) seems like something we can prove correct on a napkin. | 23:27 |
corvus | i'm probably going to regret saying that. | 23:27 |
pabelanger | corvus: yes, sorry I was heading out the door, I really should have left that comment on gerrit | 23:32 |
mordred | corvus: those are basically my thoughts | 23:40 |
mordred | the cancel patch looks fine - but it's big enough that maybe I'm missing something, where the other one seems very straightforward ... and I'm certain I'm going to wind up being wrong :) | 23:40 |
pabelanger | so looking at http://logs.openstack.org/08/655808/1/check/tox-py36/e72c2e2/testr_results.html.gz failure, I decided to run it locally; for me it takes 7 seconds to run. that failure took more than 45 seconds, it seems | 23:48 |
*** jamesmcarthur has quit IRC | 23:48 | |
pabelanger | so system likely under serious load | 23:48 |
*** jamesmcarthur has joined #zuul | 23:49 | |
pabelanger | 2019-04-25 23:03:45,552 kazoo.client DEBUG Received error(xid=2) NoNodeError() | 23:51 |
pabelanger | that is from failure | 23:52 |
pabelanger | but locally I do not get that | 23:52 |
corvus | that's probably normal | 23:52 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix race in test_job_pause_pre_skipped_child https://review.opendev.org/655808 | 23:54 |
corvus | let's see if another waituntilsettled is needed there | 23:54 |
pabelanger | k | 23:55 |
corvus | the halve-concurrency patch is looking promising (655804 and the child which throws a lot of tests at it) | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!