Friday, 2019-05-17

*** mattw4 has quit IRC00:06
*** zer0c00l_ has quit IRC00:06
pabelangerokay, the tox-remote failure is getting worse, it seems.  I'll try to spend some time tomorrow looking into it00:19
pabelangerI think tristanC has some code to reproduce, but I haven't looked yet00:19
fungiagreed, i just hit it a moment ago too00:26
fungiand i *really* don't push many zuul changes00:27
tristanCtobiash: oh, the zuul_console is actually started by the opendev pre run, and i think tox-remote is indeed using the existing service instead of creating a new one with the speculative state00:27
tristanCpabelanger: i'll update the code to re-add the instrumentation with a fix for ^00:28
openstackgerritPaul Belanger proposed zuul/zuul master: Add retries to getPullReviews() with github  https://review.opendev.org/65520400:28
pabelangerSpamapS: ^ is that what you were thinking of for the for/else? I agree there is a bug; I guess I need help on what to do when we loop 5 times00:29
pabelangertristanC: ah, cool00:30
pabelangeralso, I have no idea how we test that, aside from maybe mock, on 65520400:30
pabelangerwhat does: Requirements ['docker-image'] not met by build e4abfc519a8c431c91c2564a2848127f00:52
pabelangermean again?00:52
pabelangerhttps://review.opendev.org/658486/00:52
pabelangeris that because the depends-on patch docker image is missing?00:53
openstackgerritPaul Belanger proposed zuul/zuul master: Update quickstart nodepool node to python3  https://review.opendev.org/65848600:54
pabelangergoing to drop it and see00:54
pabelangerhttps://zuul.openstack.org/build/e4abfc519a8c431c91c2564a2848127f00:57
pabelangerah00:58
pabelangerI see00:58
pabelangerthe parent change failed00:58
pabelangerbut odd00:58
pabelangerin that case, does it matter that it is the same project? I would expect the container to be rebuilt again00:59
openstackgerritPaul Belanger proposed zuul/zuul master: Update quickstart nodepool node to python3  https://review.opendev.org/65848601:25
fungizuul-maint: 659674 is passing now and will allow us to uncap gerrit in the quickstart jobs again01:32
SpamapSpabelanger:you definitely got for/else right.01:46
SpamapSpabelanger:I think this is a better exception than exploding later when the variable is undefined.01:46
pabelangerSpamapS: okay, neat. I still think it prevents zuul from enqueuing the change, but guess that is what we live with if github api is wonky01:47
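The for/else idiom under discussion, roughly sketched (an illustration of the pattern, not the actual 655204 patch; the function name, exception, and retry count here are made up):

    import time

    def fetch_with_retries(fetch, retries=5, delay=1):
        # Python's for/else runs the else block only when the loop finished
        # without hitting `break`, i.e. when every attempt failed.
        for attempt in range(retries):
            try:
                result = fetch()
                break  # success, skip the else clause
            except Exception:
                time.sleep(delay)
        else:
            # Raising here is clearer than letting `result` be undefined
            # and exploding later, as SpamapS notes.
            raise RuntimeError("no data after %d attempts" % retries)
        return result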
*** ianychoi has joined #zuul02:00
tristanCwouldn't it make sense to have a periodic task to check open changes missing ci votes, and automatically enqueue those in zuul?02:02
pabelangeralso thought of doing that02:03
pabelangerbut, ideally we don't miss the events to start02:03
pabelangerhowever, so far when we do, it has been because of 500 errors on github side02:03
tristanCpabelanger: it can also happen when upgrading zuul, or when we need to reboot the control plane02:08
pabelangertristanC: yah, I think the HA scheduler will fix some of that; there has been talk of fixing it02:10
pabelangerso, we shouldn't miss any events if scheduler is down02:11
pabelangerbut agree, a known issue02:11
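A periodic checker along the lines tristanC suggests could be sketched against the Gerrit REST API; everything below (the project name, the label heuristic, and the choice to only report rather than enqueue) is an assumption for illustration:

    import json
    import requests

    GERRIT = "https://review.opendev.org"

    def open_changes_missing_ci(project="zuul/zuul"):
        # Ask Gerrit for open changes and their label summaries.
        resp = requests.get(
            "%s/changes/" % GERRIT,
            params={"q": "status:open project:%s" % project, "o": "LABELS"})
        resp.raise_for_status()
        # Gerrit prefixes JSON responses with )]}' to prevent XSSI.
        changes = json.loads(resp.text.split("\n", 1)[1])
        # Treat an empty Verified label summary as "no CI vote yet".
        return [c for c in changes if not c.get("labels", {}).get("Verified")]

    if __name__ == "__main__":
        for change in open_changes_missing_ci():
            # An operator (or a cron job) could feed these to zuul enqueue.
            print(change["_number"], change["subject"])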
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures  https://review.opendev.org/65791402:29
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures  https://review.opendev.org/65791402:50
*** jesusaur has quit IRC02:58
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures  https://review.opendev.org/65791403:28
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: prevent incorrect task exit code detection  https://review.opendev.org/65970803:28
*** ianychoi has quit IRC03:49
*** ianychoi has joined #zuul03:49
*** ianychoi has quit IRC03:52
*** ianychoi has joined #zuul03:52
*** bhavikdbavishi has joined #zuul03:54
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures  https://review.opendev.org/65791403:59
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate github logs with the event id  https://review.opendev.org/65864504:11
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate gerrit event logs  https://review.opendev.org/65864604:11
openstackgerritTobias Henkel proposed zuul/zuul master: Attach event to queue item  https://review.opendev.org/65864704:11
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate some logs in the scheduler with event id  https://review.opendev.org/65864804:11
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate logs in the zuul driver with event ids  https://review.opendev.org/65864904:11
openstackgerritTobias Henkel proposed zuul/zuul master: Add event id to timer events  https://review.opendev.org/65865004:11
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate pipeline processing with event id  https://review.opendev.org/65865104:11
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate merger logs with event id  https://review.opendev.org/65865204:11
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate job freezing logs with event id  https://review.opendev.org/65888804:11
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate node request processing with event id  https://review.opendev.org/65888904:11
openstackgerritTobias Henkel proposed zuul/zuul master: WIP: Annotate builds with event id  https://review.opendev.org/65889504:11
daniel2SpamapS: there was no nginx specific stuff in the docs, just apache04:19
*** bhavikdbavishi1 has joined #zuul04:23
*** bhavikdbavishi has quit IRC04:23
*** bhavikdbavishi1 is now known as bhavikdbavishi04:23
tobiashtristanC: I think I've understood the tox-remote failures. It's a race, not a parsing problem. I'll sum it up later when I have more time.04:24
tristanCtobiash: have you seen my comment about zuul_console not being restarted by the job and using the one set up by the opendev.org job?04:24
tobiashyes, that's unfortunate and should be fixed as well04:25
*** bhavikdbavishi has quit IRC04:29
tobiashtristanC: but we shouldn't fix this by restarting zuul_console but by running a second one on a different port04:35
*** bhavikdbavishi has joined #zuul04:36
tobiashOtherwise we're doomed when a change breaks it as this will also break the overall job output04:36
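For context on why the stale daemon matters: zuul_console listens on the node (19885 by default) and streams a task's output to whoever sends it the log uuid, so a leftover service from the parent job keeps answering for the new job too. A rough probe of that protocol as I understand it (the host, uuid, and error handling here are illustrative, not taken from the change under review):

    import socket

    def read_console_log(host, log_uuid, port=19885):
        # The client sends the task's log uuid plus a newline, then reads
        # whatever the daemon streams back until the connection closes.
        with socket.create_connection((host, port)) as sock:
            sock.sendall(("%s\n" % log_uuid).encode("utf-8"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode("utf-8", errors="replace")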
*** yolanda_ has quit IRC04:41
*** yolanda_ has joined #zuul04:42
*** saneax has joined #zuul04:45
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: kill previously started zuul_console service  https://review.opendev.org/65970804:47
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures  https://review.opendev.org/65791404:47
*** pcaruana has joined #zuul05:20
openstackgerritMerged zuul/zuul master: tox: Integrate tox-docker  https://review.opendev.org/64904105:42
*** bhavikdbavishi has quit IRC05:43
tristanCtobiash: alright, then i'll give https://review.opendev.org/#/c/535538/ another try05:46
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: use unique zuul_console service  https://review.opendev.org/65970806:09
*** bjackman_ has joined #zuul06:12
*** gtema has joined #zuul06:15
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures  https://review.opendev.org/65791406:23
*** bhavikdbavishi has joined #zuul06:30
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: use unique zuul_console service  https://review.opendev.org/65970806:39
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures  https://review.opendev.org/65791406:42
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: use unique zuul_console service  https://review.opendev.org/65970806:55
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures  https://review.opendev.org/65791406:55
*** EmilienM|pto has quit IRC07:16
*** EmilienM has joined #zuul07:17
openstackgerritTristan Cacqueray proposed zuul/zuul master: [DNM] run zuul-tox-remote many times  https://review.opendev.org/65791407:21
tristanCtobiash: ok, 6 SUCCESS on 657914 for zuul-tox-remote07:21
AJaegertristanC: Yeah, thanks for fixing!07:26
tristanCAJaeger: you're welcome, so the fix may be: https://review.opendev.org/65970807:32
openstackgerritTobias Henkel proposed zuul/zuul master: DNM: further zuul-remote debugging attempts  https://review.opendev.org/65963107:46
openstackgerritTobias Henkel proposed zuul/zuul master: Fix race in log streaming  https://review.opendev.org/65973807:46
tristanCunfortunately that doesn't fully fix the issue; a second recheck showed some failures07:48
tobiashtristanC: 659738 should fix that race hopefully07:49
openstackgerritTristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: use unique zuul_console service  https://review.opendev.org/65970807:56
*** hashar has joined #zuul07:57
*** jpena|off is now known as jpena07:59
tobiashall 10 tox-remote successful on https://review.opendev.org/65963108:15
tobiash:)08:15
AJaegertobiash, so is tristanC's change not needed?08:17
tobiashAJaeger: it's needed for a different reason (we don't test zuul_stream without it)08:19
openstackgerritTobias Henkel proposed zuul/zuul master: DNM: further zuul-remote debugging attempts  https://review.opendev.org/65963108:24
openstackgerritTobias Henkel proposed zuul/zuul master: Increase timeout of zuul-tox-remote  https://review.opendev.org/65974308:24
tobiashand also increase the timeout as many runs are close to the default of 30 minutes ^08:24
openstackgerritMerged zuul/zuul master: Add proper __repr__ to merger repo  https://review.opendev.org/64994908:46
*** panda|rover|off is now known as panda|rover09:10
*** saneax has quit IRC09:33
*** saneax has joined #zuul09:35
*** saneax has quit IRC09:42
*** hashar has quit IRC09:47
*** gtema has quit IRC10:04
*** gtema has joined #zuul10:05
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure driver - https://pagure.io/pagure/  https://review.opendev.org/60440410:23
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure driver - https://pagure.io/pagure/  https://review.opendev.org/60440410:29
tobiash659631 passed 20 zuul-tox-remote jobs in a row :)10:40
AJaegeryeah!10:45
openstackgerritTobias Henkel proposed zuul/zuul master: Update cached repo during job startup only if needed  https://review.opendev.org/64822910:47
*** gtema has quit IRC11:06
*** gtema has joined #zuul11:06
*** hashar has joined #zuul11:08
*** jpena is now known as jpena|lunch11:31
*** bhavikdbavishi has quit IRC12:06
mordredtristanC: did you see we had to revert the react v2 change yesterday? I haven't had time to reproduce locally and debug, but it looks like the same issue was happening in the build-dashboard job, so it should be reproducible12:10
mordredtobiash, tristanC: nice zuul-stream patches!12:13
AJaegermordred: did you see https://review.opendev.org/#/c/659738/ https://review.opendev.org/#/c/659743/ and https://review.opendev.org/659708 ? Those address the tox-remote failures...12:14
AJaegermordred: ah, you did see it ;)12:14
mordred:)12:14
mordredyes - nice patches12:14
AJaegerindeed!12:15
tobiashthanks :)12:29
mordredtobiash: maybe with the updates to tox-remote and fixes to the image builds we'll actually be able to land your log event id stack :)12:32
tobiashmordred: that would be great :)12:32
Shrewsmordred: i don't think i've noticed this before, but re: the react revert (https://review.opendev.org/659655), it looked like it failed gate initially but immediately requeued and succeeded. Is that a thing?12:32
Shrewsthere is no recheck comment there12:34
mordredShrews: I'm very confused about that12:35
tobiashwas there a scheduler restart around that time?12:35
tobiasha restart + reenqueue race could explain that12:35
mordredoh - yeah, there might have been12:35
mordredwe did restart the scheduler yesterday12:35
Shrewsah12:36
*** jpena|lunch is now known as jpena12:36
tobiashbtw, 659631 has now succeeded 30 remote tests in a row, so I think the tox-remote stack looks good now12:37
openstackgerritFabien Boucher proposed zuul/zuul master: Prevent Zuul scheduler to crash at startup if gerrit down  https://review.opendev.org/57619212:39
*** rlandy has joined #zuul12:40
Shrewstobiash: what finally led you to find the cause of the remote job failure?12:44
Shrews+3'd , btw12:44
mordredfbo_: accidental file deletion in that last patchset &&12:46
mordredShrews: I'm going to choose to believe he loaded the tests into a bmw and had someone drive it around the test track and eventually the autopilot pointed out the error12:47
Shrewsmordred: it looks more like he flogged zuul until it gave up its secrets. i approve of this method.12:48
tobiashyou know the feeling when you suddenly have an idea how to fix a hard problem and you don't have access to a laptop to immediately try it out? I had this moment this morning before I took the bike to work.12:49
AJaegermordred: it's a race - so that involves multiple cars and drivers ;)12:49
tobiashit's like torture having to think about that for an hour until you can work on that ;)12:49
tobiashShrews: just a sec, looking for the logs with the hint12:50
Shrewstobiash: i think i found it12:50
tobiashShrews: I found the final hint there: http://logs.openstack.org/31/659631/2/check/zuul-tox-remote-5/e2b93e4/testr_results.html.gz12:52
tobiashsearch for 'XXX: streamers stop and log not found controller'12:52
Shrewsyep, that's what i found  :)12:53
openstackgerritFabien Boucher proposed zuul/zuul master: Prevent Zuul scheduler to crash at startup if gerrit down  https://review.opendev.org/57619212:53
fbo_mordred: argh, yes thanks12:53
mordredtobiash: yes! I know that feeling - it can be very annoying.12:54
*** bjackman has joined #zuul12:57
*** bjackman_ has quit IRC12:58
tobiashmordred: thinking about this .keep file, is it possible to remove that and instead create the target dir on the fly?13:13
tobiashalmost everyone accidentally removes that occasionally in changes13:13
tobiash(including me)13:13
mordredtobiash: probably? I can't remember what gets confused when the directory is gone - I wanna say something in react-scripts was grumpy13:14
mordred(I agree, I dislike the file)13:14
*** bjackman has quit IRC13:21
openstackgerritMerged zuul/zuul master: Fix race in log streaming  https://review.opendev.org/65973813:23
tobiash\o/13:23
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure driver - https://pagure.io/pagure/  https://review.opendev.org/60440413:26
*** bhavikdbavishi has joined #zuul13:27
clarkbShrews: mordred I zuul enqueued the revert back to the gate after it failed13:32
openstackgerritMerged zuul/zuul master: Increase timeout of zuul-tox-remote  https://review.opendev.org/65974314:00
openstackgerritBenedikt Löffler proposed zuul/zuul master: Get playbook data from extra_vars  https://review.opendev.org/65980214:03
*** bhavikdbavishi has quit IRC14:05
pabelangerHmm, that's new14:10
pabelangerhttp://paste.openstack.org/show/751518/14:10
pabelangerrunning zuul tests on fedora-30 now14:10
pabelangerguess something changed14:10
pabelangerhttp://paste.openstack.org/show/751519/14:11
pabelangerfirst one was not complete14:11
clarkbpabelanger: rebuild your venvs14:12
clarkbdistro likely updated libcrypt and now you need to relink against it14:12
pabelangerclarkb: I did, but let me try again14:12
pabelangeroh14:13
pabelangerhttps://fedoraproject.org/wiki/Changes/FullyRemoveDeprecatedAndUnsafeFunctionsFromLibcrypt14:13
pabelangerthat looks related14:13
pabelangerhttps://github.com/psycopg/psycopg2/issues/91214:15
pabelangerlibxcrypt-compat14:16
pabelangerseems to be the workaround14:16
pabelangerwill update bindep.txt for fedora-3014:16
tobiashpabelanger: yeah I had the same problem after upgrading to fedora 30 and installing libxcrypt-compat is the solution14:18
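If I'm reading the paste right, the failure is a manylinux wheel (psycopg2) still linked against libcrypt.so.1, which Fedora 30 only provides once libxcrypt-compat is installed; a quick hedged check from Python (the function name is mine):

    import ctypes

    def has_legacy_libcrypt():
        # manylinux1 wheels built against glibc's old libcrypt need this
        # soname; on Fedora 30 it only exists after installing
        # libxcrypt-compat, so CDLL raises OSError without it.
        try:
            ctypes.CDLL("libcrypt.so.1")
            return True
        except OSError:
            return False

    if __name__ == "__main__":
        print("libcrypt.so.1 available:", has_legacy_libcrypt())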
pabelanger+114:19
pabelangerlet me finish this python-path test, and will push up that too14:19
clarkbthe discussion on the manylinux wheel bug is interesting14:32
fungithe psycopg2 one or something else?14:35
fungibut yeah, manylinux1 wheels which rely on libcrypt are simply going to be broken across the board on fedora 3014:36
fungiwhich is why it's called "many linux" and not "all linux"14:37
fungijust like musl makes manylinux1 wheels not applicable on alpine14:37
openstackgerritFabien Boucher proposed zuul/zuul master: Prevent Zuul scheduler to crash at startup if gerrit down  https://review.opendev.org/57619214:37
openstackgerritFabien Boucher proposed zuul/zuul master: Prevent Zuul scheduler to crash at startup if gerrit down  https://review.opendev.org/57619214:38
fungiclarkb: ahh, you were likely referring to the https://github.com/pypa/manylinux/issues/305 referenced from it?14:38
fungithis mess is why i prefer to jump through hoops and reinvent some wheels (pun intended) to stick to pure python any time it's remotely possible14:40
clarkbya14:41
openstackgerritPaul Belanger proposed zuul/zuul master: Add more test coverage on using python-path  https://review.opendev.org/65981214:41
pabelangerzuul-maint: ^ adds more testing around the python-path setting from nodepool in zuul14:42
pabelangerI've added my +2 back to https://review.opendev.org/65981214:42
pabelangerand confident that it is now working correctly14:42
openstackgerritMark Meyer proposed zuul/zuul master: Create a basic Bitbucket event source  https://review.opendev.org/65883514:46
*** kmalloc is now known as kmalloc_away14:47
openstackgerritMark Meyer proposed zuul/zuul master: Add Bitbucket Server source functionality  https://review.opendev.org/65783714:47
AJaegerzuul-maint, is updating our linting jobs to ansible 2.7 correct? See https://review.opendev.org/659810 and https://review.opendev.org/65981114:49
pabelangerAJaeger: we should consider going right to ansible 2.8, since I expect us to start trying it out in opendev soon14:50
AJaegerpabelanger: let's do it stepwise ;)14:50
pabelangersure14:50
AJaegerour users are not that fast14:50
AJaegerand with multi-version ansible, i wonder whether linting jobs with 2.7 are correct or whether we need 2.5 as the "deprecated" version14:51
pabelangeryah, that is the tricky part, we need to keep backwards compat for all versions zuul supports14:52
pabelangeransible 2.5 isn't EOL just yet14:52
pabelangeranother 30 days I think14:52
AJaeger2.5 is marked as deprecated in Zuul14:52
AJaegershould I send an email to the zuul list and WIP the changes for now?14:53
pabelangermaybe14:53
pabelangerwe'll have to learn as we go I think14:53
AJaeger;)14:54
openstackgerritMark Meyer proposed zuul/zuul master: Add Bitbucket Server source functionality  https://review.opendev.org/65783714:54
openstackgerritMark Meyer proposed zuul/zuul master: Create a basic Bitbucket build status reporter  https://review.opendev.org/65833514:54
openstackgerritMark Meyer proposed zuul/zuul master: Create a basic Bitbucket event source  https://review.opendev.org/65883514:54
pabelangerare any jobs in opendev set up to use another version of ansible besides 2.7?14:54
AJaegerpabelanger: not that I know of14:56
tobiashachievement unlocked, 500 jobs in parallel14:57
pabelangernice14:58
mordredtobiash: as a heads up - 0.28.0 of openstacksdk was released which contains a refactor in image processing code. it should be no different than before - but just wanted you to be aware15:43
tobiashthanks for the info :)15:44
fungi#status log applied opendev migration renames to the storyboard-dev database to stop puppet complaining15:44
openstackstatusfungi: finished logging15:44
fungier, wrong channel, sorry for the noise15:44
*** rlandy is now known as rlandy|biab15:58
*** panda|rover is now known as panda|rover|off16:06
*** hashar has quit IRC16:07
*** armstrongs has joined #zuul16:09
armstrongshttps://storyboard.openstack.org/#!/story/2004868 I am currently hitting this bug and the scheduler is started but not rendering the tenants on the web interface. As a result my executor can't connect to Gearman. Is there any known fix or workaround to get back up and running again?16:12
*** armstrongs has quit IRC16:21
*** jangutter has quit IRC16:22
*** rlandy|biab is now known as rlandy16:30
clarkbarmstrongs is gone, but I wonder what the merger says16:48
*** mattw4 has joined #zuul17:01
tobiashyepp, that exception means that the merger failed17:07
*** mattw4 has quit IRC17:16
*** mattw4 has joined #zuul17:21
*** jpena is now known as jpena|off17:22
*** gtema has quit IRC17:46
tobiashfbo_: I commented on https://review.opendev.org/57619217:48
tobiashpabelanger: +2 with comment on https://review.opendev.org/65981217:53
*** Armstrongs has joined #zuul17:55
ArmstrongsI'm not running a separate merger component17:56
tobiashArmstrongs: the executor is a merger too17:56
ArmstrongsSure17:56
tobiashyour exception indicates that you either don't run an executor/merger or have problems accessing the repo from the executor/merger17:56
ArmstrongsSo that starts, then says 'ending log stream' and stops17:57
ArmstrongsIn the logs17:57
ArmstrongsDebug logs say can't connect to gearman17:57
ArmstrongsPort17:57
tobiashArmstrongs: can you post a log of the scheduler and the executor?17:57
ArmstrongsWill do 2secs17:58
fungialso, is the scheduler configured to run a gearman server? or are you trying to run a separate gearman service for it?17:58
tobiashthat would have been my next question ;)17:58
tobiashbecause gearman is started before contacting the executor (if gearman is configured to be started by the scheduler)17:59
ArmstrongsScheduler is running based on the setup from zuul from scratch guide17:59
ArmstrongsSo it runs gearman17:59
ArmstrongsI believe17:59
tobiashso you have this config: https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html#zuul ?17:59
ArmstrongsYeah18:00
ArmstrongsThat's how my components are configured18:00
tobiashok, then I think I need scheduler logs, executor logs and the zuul.conf to understand the problem18:00
ArmstrongsWill do18:01
ArmstrongsLogging into my laptop now18:01
ArmstrongsThanks again18:02
pabelangertobiash: replied18:02
tobiashno hurry ;)18:02
*** armstrongs_ has joined #zuul18:02
fungiArmstrongs: also is your executor on a different machine than your scheduler? if so, have to make sure you have firewall rules on the scheduler allowing access to the gearman port, at least from the ip address of the executor18:02
ArmstrongsNo, the executor is on the same machine18:03
ArmstrongsAll on 1 box at moment18:03
fungiin that case it's not likely to be a firewall problem at least18:03
tobiashpabelanger: I guess there is a misunderstanding, I just meant that I would have expected 'self.fake_nodepool.python_path = python_path' on that line18:03
ArmstrongsNo I can telnet the port18:03
tobiashpabelanger: because you pass the path to this function and ignore it there18:04
tobiashArmstrongs: you're running *only* the scheduler?18:05
tobiashoh, I think I misunderstood that18:05
tobiashArmstrongs: so you're running scheduler and executor on the same box right?18:06
armstrongs_i had a running executor, web and scheduler18:06
armstrongs_suddenly the executor stopped18:06
armstrongs_i couldn't start it again18:06
pabelangertobiash: Oh18:06
pabelangerha18:06
pabelangeryah, that is a typo18:06
armstrongs_and it has the errors in the logs that match the ticket i found18:06
pabelangerI see now18:06
armstrongs_wheres the best place to send you the logs18:06
tobiashpaste.openstack.org18:07
openstackgerritPaul Belanger proposed zuul/zuul master: Add more test coverage on using python-path  https://review.opendev.org/65981218:07
armstrongs_ok 2 secs18:07
pabelangertobiash: updated18:07
tobiash+218:07
armstrongs_executor.log is here paste.openstack.org/show/751533/18:19
*** Armstrongs has quit IRC18:19
armstrongs_scheduler.log is here paste.openstack.org/show/751534/18:22
armstrongs_and the zuul.conf is here paste.openstack.org/show/751535/18:24
tobiasharmstrongs_: you should change your db password now ;)18:28
armstrongs_it's a PoC and test instance18:29
armstrongs_we are evaluating; it would never be our prod one haha18:29
armstrongs_hoping to get this complete so we can use it in production at Just Eat instead of TeamCity :)18:30
tobiashare those logs incomplete?18:31
tobiashI don't see the error there18:31
armstrongs_just checking18:32
tobiashalso you might be interested in trying out the quickstart instead which is docker based and easier to start with18:34
armstrongs_i had it all working18:34
tobiashah ok18:34
armstrongs_it just stopped after the 3rd day18:34
armstrongs_this happened before too18:35
armstrongs_so defo eventually hits a bug18:35
armstrongs_will update full logs now18:35
armstrongs_sorry18:35
armstrongs_will re-upload the scheduler18:37
tobiashin which order did you start the services?18:39
armstrongs_paste.openstack.org/show/751537/ is the full scheduler log18:39
tobiashto me it looks like the executor tried for some time to connect to gearman and then gave up18:39
armstrongs_started them in the order in the guide, so executor, web, then scheduler18:39
armstrongs_yeah i have tried all orders18:39
tobiashtry the scheduler first18:40
armstrongs_i have18:40
tobiashall services need gearman which is started by the scheduler18:40
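One way to check that the gear server embedded in the scheduler is actually accepting connections (beyond telnet) is the gear client library zuul itself uses; a minimal sketch, assuming the default port 4730 and a local scheduler with [gearman_server] start=true:

    import gear

    def check_gearman(host="127.0.0.1", port=4730):
        # waitForServer() blocks until a connection is established, so if
        # this never returns, nothing usable is listening on the port.
        client = gear.Client("connectivity-check")
        client.addServer(host, port)
        client.waitForServer()
        print("gearman at %s:%s accepted a client connection" % (host, port))
        client.shutdown()

    if __name__ == "__main__":
        check_gearman()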
armstrongs_it starts but the main page doesn't load18:40
armstrongs_on the web18:40
armstrongs_the tenants18:40
tobiashalso I noticed that the executor log looks weird as there are large time gaps in between18:40
armstrongs_so defo not in a good state18:40
armstrongs_yeah that was me trying to restart it18:40
armstrongs_after debugging the scheduler logs18:41
armstrongs_i will try starting scheduler then web then executor18:42
armstrongs_in that order now18:42
tobiashweb is not necessary for the startup at first18:42
tobiashcould you please delete all logs before and then just start scheduler and executor?18:43
armstrongs_will do18:43
armstrongs_scheduler.log paste.openstack.org/show/751538/18:47
armstrongs_executor.log paste.openstack.org/show/751539/18:49
armstrongs_so scheduler service is running but executor just dies as before after starting18:49
SpamapSarmstrongs_:you shouldn't drop the https:// ... makes me have to do an extra 2 clicks to see your pastes. ;)18:50
tobiasharmstrongs_: the executor just dies?18:51
SpamapSyeah that's weird18:51
tobiasharmstrongs_: you could try to run it in foreground by executing 'zuul-executor -d'18:52
armstrongs_ok will try that18:52
clarkbalso check dmesg for oomkiller when processes just die18:53
tobiashyeah, maybe the instance is now too small18:53
armstrongs_the web interface can't load my tenants now either though18:55
tobiashit will only load the tenants after the full startup18:55
armstrongs_scheduler just sits there with -d18:56
armstrongs_saying setting to sleep18:56
armstrongs_polling 1 connection18:56
armstrongs_then nothing18:56
armstrongs_as looking for gearman port18:57
armstrongs_it's a powerful box and i was only using a small test repo on it18:57
armstrongs_m5.large18:57
*** tjgresha has quit IRC18:59
SpamapSarmstrongs_:how's zookeeper's health?19:01
*** tjgresha has joined #zuul19:02
armstrongs_healthy service19:03
clarkbfwiw 8GB of memory isn't a ton for a service like zuul (though usually it's only a problem under load)19:04
clarkbI would still double check dmesg for oomkiller output19:04
clarkbanother thing to check is if it is segfaulting19:04
clarkbpython3 on some distros has been less than stable19:05
armstrongs_this is fedora 2819:05
armstrongs_the box19:05
pabelangershould be okay with fedora-28, I was using that for local development for 6+ months19:05
armstrongs_i gave up on centos 719:05
armstrongs_:)19:05
pabelangeryah, I wouldn't do centos-7 yet19:05
pabelangerhopefully centos-8 things are better19:06
armstrongs_defo19:06
clarkb28 is eol right?19:06
pabelangeryah, should be now, if not soon19:06
armstrongs_yeah it is19:06
armstrongs_need to package some new images19:06
pabelangerso, executor starts then dies?19:06
armstrongs_:)19:06
armstrongs_yeah19:06
pabelangerstarting with systemd?19:06
pabelangeror manually19:06
armstrongs_so i was setting zuul-executor to verbose19:06
armstrongs_when it happened19:07
armstrongs_if that shines any light on it19:07
pabelangerI'd check what clarkb says, dmesg for OOM19:07
armstrongs_as i wanted my ansible output to be verbose19:07
pabelangerwhat version of zuul?19:07
pabelanger3.8.2.dev619:08
armstrongs_looking now19:08
armstrongs_3.8.2.dev619:09
armstrongs_not seeing any OOM19:10
pabelangerstrange19:11
tjgreshahave a question on separating zuul from nodepool onto different servers, if anyone has insights (is it possible?)19:12
pabelangeronly real issue I had was with not enough entropy for /dev/random. Installing haveged fixed that for me, for VM19:12
pabelangerfor zuul-executors19:12
pabelangertjgresha: yup, ask away; also, it is possible19:13
pabelangerarmstrongs_: maybe run it under strace and see what happens19:13
clarkbsegfault would be my next suspicion if it is just dying19:14
clarkbstrace will show that I think19:14
armstrongs_will give it a go in a bit, thanks for all the help debugging so far. my girlfriend's home now, it's 8pm on a friday here so I will get in trouble for not paying attention to her :)19:15
armstrongs_will give strace a go and report back19:15
tjgreshawhere do i give zuul the location of the nodepool it should use?19:15
*** armstrongs_ has quit IRC19:15
clarkbtjgresha: you don't explicitly set that; instead you point zuul and nodepool at a common zookeeper database19:16
pabelangertjgresha: https://zuul-ci.org/docs/zuul/admin/components.html#components might help with connections too19:17
tjgreshahmm ok - that is what I thought -- so we put ZK and Nodepool onto a different server than zuul now, and changed the nodepool.yaml and zuul.conf to point to the new zk19:22
clarkbyes19:22
tjgreshadoes zuul need to be restarted after a zuul.conf change?19:24
openstackgerritPaul Belanger proposed zuul/zuul master: Set iterate_timeout to 60 for pause jobs  https://review.opendev.org/65987119:24
pabelangertjgresha: usually yes19:24
pabelangertobiash: clarkb: just seen a pause job timeout, ^ should give us some more room while waiting for results19:25
tjgreshathanks all19:27
clarkbpabelanger: I think part of the reason for those values is OS_TEST_TIMEOUT is set to 24019:28
clarkb30*4 comes in under 240 but 60*4 == 24019:28
clarkbpabelanger: I still think the change is ok and if we hit that longer timeout we'll see it19:28
pabelangerclarkb: Ah, that makes sense19:28
pabelangeryah, we might need to bump that too19:29
pabelangerso far, I've only seen it fail once in 3 weeks, since we improved our testing19:29
*** tjgresha__ has joined #zuul19:30
*** tjgresha__ has quit IRC19:30
*** tjgresha has left #zuul19:30
*** tjgresha has joined #zuul19:31
pabelangerclarkb: we also seem to be hitting a subunit limit?19:31
pabelangerhttp://logs.openstack.org/08/659708/6/gate/tox-py35/4d7ddb8/job-output.txt.gz#_2019-05-17_13_32_55_89619119:31
clarkbpabelanger: ya, that's the 'too much data in the per-test stream' limit19:32
pabelangeryah19:33
pabelangertest_plugins19:33
pabelangerlooks to have timed out19:33
pabelangerthink it is bumping up against OS_TEST_TIMEOUT19:33
pabelangerlet me confirm19:33
tobiashgear is very chatty in the tests19:33
clarkbalso zuul only attaches logs on failure19:34
pabelangertests.unit.test_v3.TestAnsible28.test_plugins [260.366155s] ... FAILED19:34
clarkbso that means the test is failing for some other reason and this error is making it harder to debug but not the root source of the error19:34
pabelangerwonder is we have a new failure with 2.8 stuff19:34
pabelangeryah, I think we might need to bump that timeout for test_plugins or look to split it up19:36
pabelangervery busy that test19:37
pabelangerrun in ovh-bhs1, so possible we got a slower node there19:37
pabelangerclarkb: up for a review on https://review.opendev.org/659812 / https://review.opendev.org/637339 that is to allow for python3 only images for zuul / nodepool20:00
*** pcaruana has quit IRC20:02
clarkbpabelanger: reading that second chagne there isn't a way to use python3 on the executor/localhost ?20:38
clarkbwe don't put localhost in the inventory and it isn't managed by nodepool20:38
clarkbdo we need to consider this case too?20:38
*** mattw4 has quit IRC21:00
*** mattw4 has joined #zuul21:00
openstackgerritMerged zuul/zuul master: zuul-tox-remote: use unique zuul_console service  https://review.opendev.org/65970821:08
*** tjgresha has quit IRC21:11
openstackgerritMerged zuul/zuul master: Set iterate_timeout to 60 for pause jobs  https://review.opendev.org/65987121:11
*** tjgresha_ has joined #zuul21:11
*** tjgresha has joined #zuul21:14
pabelangerclarkb: yah, good question. For that, we don't have a way to override right now21:15
pabelangerso, we should think about that21:15
pabelangernot localhost, but for nodes not managed by nodepool, I use add_host to set it21:16
*** tjgresha_ has quit IRC21:16
*** tjgresha has quit IRC21:22
*** Armstrongs has joined #zuul21:22
*** Armstrongs has quit IRC21:31
*** mattw4 has quit IRC21:35
*** mattw4 has joined #zuul21:35
pabelangerclarkb: replied, but also friday. Likely pick it up on monday21:50
*** mattw4 has quit IRC21:54
*** mattw4 has joined #zuul22:05
*** rlandy has quit IRC22:31
*** mattw4 has quit IRC22:38
*** mattw4 has joined #zuul22:42

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!