Friday, 2018-07-13

*** rlandy has quit IRC00:17
*** harlowja has quit IRC01:05
tristanCarg, a misclick and storyboard discarded my report :(03:23
tristanCcorvus: the issue pabelanger mentioned was that the final attribute verification doesn't check for child job in other projects03:24
tristanCcorvus: in our case, periodic job was depending on a parent job that got the final attribute set to True03:24
tristanCcorvus: after zuul merged that change, periodic job wasn't triggered, and the only tell was a stacktrace in the scheduler log that mentioned the "Unable to freeze job graph: Unable to modify final job" error03:25
tristanCso... shouldn't the configloader check for freezing errors before accepting such changes?03:26
tristanCand/or, periodic pipeline job graph errors shouldn't be reported to the config-errors list?03:26
tristanCerr, shouldn't* post-review job graph errors be reported to the config-errors list?03:36
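(For illustration, a minimal sketch of the situation tristanC describes, with hypothetical project and job names: a parent job marked final in one project, and a child job in a second project that inherits from and modifies it. Nothing stops the final flag from merging; the conflict only surfaces later, when Zuul freezes the job graph.)

```yaml
# project-a/.zuul.yaml -- the parent job is made final here
- job:
    name: base-deploy
    final: true          # child variants may no longer override attributes
    run: playbooks/deploy.yaml

# project-b/.zuul.yaml -- this pre-existing child job now breaks
- job:
    name: periodic-deploy
    parent: base-deploy
    vars:                # overriding an attribute of a final parent raises
      cloud: staging     # "Unable to modify final job" at freeze time
```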
tobiashtristanC: checking for freezing errors is not that simple, as it depends on project, branch, and changed files03:54
tobiashtristanC: but maybe we can/should report freezing errors to the user03:56
corvustristanC, tobiash: the pipeline should report the error03:56
tobiashcorvus: actually I thought it does but it might be a problem that this was a periodic pipeline03:58
corvustristanC: does the periodic pipeline report anywhere?  (email, etc?)03:58
*** spsurya_ has joined #zuul03:59
corvusi don't think the sql reporter is capable of displaying errors like that, so if that's the only reporter, that may explain why no one saw the msg03:59
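(A hedged sketch, with hypothetical connection names, of such a periodic pipeline: with sql as the only reporter, a freeze error like the one above is recorded nowhere a human routinely looks.)

```yaml
- pipeline:
    name: periodic
    manager: independent
    trigger:
      timer:
        - time: '0 2 * * *'
    # "sql" and "smtp" here are connection names from zuul.conf; with only
    # the sql reporter, job-graph freeze errors stay in the scheduler log
    success:
      sql:
    failure:
      sql:
      # an additional reporter, e.g. smtp, would surface the message:
      # smtp:
      #   to: ci-admins@example.com
```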
tobiashcorvus: maybe we could inject a failing fake job on a freezing error03:59
tobiashThen it would be included in the normal sql reporting and show up in the builds tab04:00
corvustobiash: perhaps; or we could have the sql reporter store a buildset message04:01
corvusit's > eod, so i'm not going to think about which approach would be better :)04:01
tobiashOr that, I'm still at breakfast so I haven't thought very deeply about that04:02
tobiashcorvus: I already wondered why you're responding at this time :)04:03
corvuspoor virtual desktop organization ;)04:03
*** harlowja has joined #zuul04:25
tristanCcorvus: rdo's periodic pipeline only has the sql reporter, and it didn't report the issue04:33
tristanCcorvus: it can also happen with any pipeline; i reproduced the issue with a regular job. zuul doesn't prevent setting a job final, even though it's currently used as a parent job04:34
tristanCthen, if that setting gets merged, every user of that job is basically broken04:34
tobiashtristanC: I think we should solve that particular problem by improving error reporting04:37
tobiashChecking for child jobs is not as trivial as it seems due to the various filters04:38
tobiashtristanC: and it shouldn't prevent you from making a job final if you detect misuse by a child job in a different repo04:39
tobiashtristanC: in order to check this in the config loader I think you would have to freeze a job graph for every child job on every branch, with various changed files...04:42
tobiashThat's not realistic04:42
*** harlowja has quit IRC04:44
tristanCtobiash: sure, error reporting would be good enough... right now one has to dig through GBs of logs to find the traceback and figure out what went wrong...04:45
tobiashtristanC: yes, that's not how it should be04:46
tristanCthough, i don't understand why we couldn't do: "for each job, for each variant, for each parent, check if a final flag would prevent the job graph from freezing"04:47
tristanCand be able to prevent the error early, instead of waiting for the pipeline to fail execution04:48
*** ianychoi_ has joined #zuul05:27
*** ianychoi has quit IRC05:30
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: scheduler: fix enqueue event to use canonical project name  https://review.openstack.org/580040 05:47
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: scheduler: return project_canonical in status page  https://review.openstack.org/582451 05:54
*** nchakrab has joined #zuul06:23
*** lennyb has quit IRC06:49
*** logan- has quit IRC06:50
*** logan- has joined #zuul06:50
*** lennyb has joined #zuul06:52
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: scheduler: fix enqueue event to use canonical project name  https://review.openstack.org/580040 06:58
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: scheduler: return project_canonical in status page  https://review.openstack.org/582451 06:58
*** jimi|ansible has quit IRC07:06
*** gtema has joined #zuul07:18
*** electrofelix has joined #zuul07:37
*** hwoarang has quit IRC08:15
*** hwoarang has joined #zuul08:23
*** hwoarang has quit IRC08:23
*** hwoarang has joined #zuul08:23
*** GonZo2000 has joined #zuul08:52
*** hwoarang has quit IRC08:54
*** hwoarang has joined #zuul08:55
*** GonZo2000 has quit IRC08:56
*** sshnaidm|afk is now known as sshnaidm|off09:00
*** sshnaidm|off has quit IRC09:05
*** quiquell has joined #zuul09:06
quiquellHello09:06
quiquellQuestion: are the repos specified in "required-projects" cloned from master or from the change that triggered the build?09:06
*** jimi|ansible has joined #zuul09:50
*** sshnaidm|off has joined #zuul09:55
*** GonZo2000 has joined #zuul10:01
*** GonZo2000 has quit IRC10:01
*** GonZo2000 has joined #zuul10:01
*** nchakrab_ has joined #zuul10:34
*** nchakrab has quit IRC10:37
*** nchakrab has joined #zuul10:55
*** nchakrab has quit IRC10:57
*** nchakrab has joined #zuul10:57
*** nchakrab_ has quit IRC10:58
mordredquiquell: the change that triggered the build and/or any depends-on patches that are relevant11:00
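(A short hedged example of required-projects, with a hypothetical job name: every listed repository is prepared on the node with the triggering change and any relevant Depends-On changes applied, rather than plain master.)

```yaml
- job:
    name: integration-test
    required-projects:
      # each repo is checked out with the change under test and any
      # Depends-On changes applied; otherwise the appropriate branch tip
      - openstack-infra/zuul
      - openstack-infra/nodepool
```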
quiquellmordred: ack11:03
*** nchakrab_ has joined #zuul11:11
*** nchakrab has quit IRC11:15
*** abelur has quit IRC11:15
*** abelur has joined #zuul11:16
*** abelur has quit IRC11:16
*** abelur has joined #zuul11:17
*** nchakrab_ has quit IRC11:30
*** nchakrab has joined #zuul11:31
gtemamordred: should I update https://review.openstack.org/#/c/572829/ with respect to https://review.openstack.org/#/c/577616/ and sdk==0.16.0?11:50
*** D0han has joined #zuul12:08
D0hanhi, i'm looking for more materials, other presentations, videos etc. about https://www.youtube.com/watch?v=zC1lY9_gfcE12:09
D0han'ansible-based ci'12:09
mordredgtema: yes please - with 0.16 being cut, we should be able to get that patch landed once you do12:16
mordred(I'll bug Shrews once he's up) :)12:17
gtemaok, thanks12:17
mordredD0han: hi! https://zuul-ci.org/ is the main website and probably a good place to start12:18
D0hanthanks! will look at that12:18
openstackgerritArtem Goncharov proposed openstack-infra/nodepool master: retire shade in favor of openstacksdk  https://review.openstack.org/572829 12:19
mordredgtema: lgtm12:21
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Remove OpenStack driver waitForImage call  https://review.openstack.org/580487 12:22
gtemamordred: ack12:22
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Use openstacksdk instead of os-client-config  https://review.openstack.org/566158 12:23
mordredgtema: ^^ I rebased mine on top of yours12:23
gtemaok12:23
gtemashould be good now finally12:24
mordredyah. that was fun :)12:25
gtemaquite silent today12:25
gtemain all channels12:25
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: Add a dequeue command to zuul client  https://review.openstack.org/95035 12:32
*** rlandy has joined #zuul12:38
rcarrillocruzlong live shade!12:50
*** TheJulia is now known as needssleep12:58
mordredrcarrillocruz: :)13:01
*** samccann has joined #zuul13:12
openstackgerritMerged openstack-infra/zuul-jobs master: Revert "Revert "Install build bindep profiles alongside doc and test""  https://review.openstack.org/580872 13:22
*** nchakrab_ has joined #zuul13:26
*** nchakrab has quit IRC13:29
*** gtema has quit IRC13:31
*** quiquell is now known as quiquell|lunch13:38
*** samccann has quit IRC13:47
*** samccann has joined #zuul13:48
*** samccann_ has joined #zuul13:49
*** samccann has quit IRC13:50
*** samccann_ is now known as samccann13:50
Shrewsmordred: what's up with the job timeouts on 572829?14:03
*** nchakrab_ has quit IRC14:04
*** nchakrab has joined #zuul14:05
*** nchakrab has quit IRC14:05
*** nchakrab has joined #zuul14:06
mordredShrews: ugh. jeez. who the heck knows14:12
Shrewsmordred: AttributeError: 'TaskManager' object has no attribute 'submit_function'14:13
mordredShrews: hrm14:13
mordredoh for the love of14:15
Shrewswhy was https://review.openstack.org/580487 rebased on the shade removal changes?14:15
* Shrews blames Friday14:16
mordredShrews: I can fix that on my next push14:17
*** nchakrab has quit IRC14:18
mordredShrews: I think we might want to get this: https://review.openstack.org/#/c/414759 fixed so that the TaskManager stuff isn't conflicting - then do the shade patches on top of that?14:18
mordredof course, that patch probably has the same issue14:19
mordredoh. that's the positional argument thing14:20
*** quiquell|lunch is now known as quiquell14:21
mordredShrews: I really want to finish cleaning this mess up14:22
mordredShrews: I hit recheck on 414759 - I feel like I fixed that issue already14:23
*** nchakrab has joined #zuul14:24
mordredShrews, tobiash, corvus, SpamapS: https://review.openstack.org/#/q/topic:container-images is the stack of things related to building zuul container images14:26
mordred(ignore the tripleo patches from last year)14:27
D0hanis zuul limited to gerrit and github only?14:29
mordredD0han: currently yes. it's pluggable, so support for other systems can certainly be added - but so far nobody has14:30
mordredD0han: at various times various people have expressed interest in both gitlab and bitbucket plugins14:30
mordredand I believe we'd also like to see those added14:30
D0hanmakes sense14:31
D0hanwhat about nodepool, can i have a mixed setup with some static and some cloud nodes?14:32
openstackgerritMerged openstack-infra/nodepool master: Change TaskManager from is-a to has-a thread  https://review.openstack.org/580463 14:32
rcarrillocruzyeah, there's a driver for static nodes14:38
rcarrillocruzD0han:14:38
mordredD0han: absolutely!14:38
rcarrillocruzeven some drivers in review for AWS14:38
rcarrillocruznot sure if OCI (containers) already merged14:38
rcarrillocruztristanC is the drivers mastah14:38
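(A hedged sketch of such a mixed nodepool.yaml, with hypothetical labels, cloud, and host names; the diskimage build definition is omitted for brevity.)

```yaml
labels:
  - name: cloud-node
    min-ready: 1
  - name: bench-device

providers:
  # dynamic nodes booted in an OpenStack cloud
  - name: my-cloud
    driver: openstack
    cloud: mycloud
    diskimages:
      - name: ubuntu-xenial
    pools:
      - name: main
        max-servers: 10
        labels:
          - name: cloud-node
            flavor-name: m1.small
            diskimage: ubuntu-xenial

  # fixed, pre-existing lab hardware
  - name: lab-devices
    driver: static
    pools:
      - name: main
        nodes:
          - name: device01.example.com
            labels: bench-device
            host-key: ssh-rsa AAAA...
```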
*** nchakrab has quit IRC14:39
D0hani have a pretty big embedded project to put through ci/cd, and usually ready-made solutions are hard to apply14:43
D0hanzuul looks neat so far14:43
mordredD0han: excellent - and yes, pre-existing things are often hard to apply to complex projects - I hope we're providing enough flexibility to be useful to you, but not so much that we're useless again :)14:45
D0han;D14:46
rcarrillocruzD0han: fwiw, we use it at ansible networking, to test network appliance modules and roles, works a treat14:47
D0hansounds good14:48
D0hani will probably need to research nodepool to make sure it can do stuff14:49
mordredif it can't - definitely chat with us about the use case you have that it can't meet14:50
D0hanif i understand correctly, triggering flows/jobs/pipelines/whatever is on zuul, and nodepool takes care of providing nodes to execute them - but is there some part that keeps track of the history of what has been run on specific nodes/resources?14:52
*** quiquell is now known as quiquell|off15:08
rcarrillocruzD0han: https://ansible.softwarefactory-project.io/zuul/builds.html 15:08
rcarrillocruzit will be available from Zuul dhasboard, builds15:09
rcarrillocruzas for nodes15:09
rcarrillocruzhttps://ansible.softwarefactory-project.io/zuul/nodes.html 15:09
rcarrillocruzwhat node is assigned to what job is set on the job definition itself15:09
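(i.e. something like the following hedged sketch, with hypothetical names: the job's nodeset names the nodepool label, static or cloud, that the job runs on.)

```yaml
- job:
    name: appliance-test
    nodeset:
      nodes:
        - name: dut             # the "device under test" in the playbooks
          label: bench-device   # nodepool supplies a node with this label
    run: playbooks/appliance-test.yaml
```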
*** yolanda__ has joined #zuul15:16
D0hani have a pool of devices v1 (A,B,C,D) and v2 (A,E,F); a job is set to run on pool v1 - can i check the execution history for A?15:18
*** EmilienM is now known as EvilienM15:18
*** yolanda_ has quit IRC15:18
rcarrillocruzi don't think there's a way from the dashboard to see what jobs ran recently on $static nodes, but i think you can pull that info if you have logstash, like infra has or sf can install15:20
rcarrillocruzhttp://logstash.openstack.org/ 15:20
rcarrillocruzfrom there you can search for node15:20
rcarrillocruzthen reference back15:20
mordredit might also be possible to add $something to be able to track such a thing15:20
mordredShrews: ^^ fwiw15:20
rcarrillocruzlook on the left15:21
rcarrillocruz'build_node'15:21
rcarrillocruzif you click on it, it will show jobs per node type15:21
rcarrillocruzin your use case, you would see it per static node15:21
Shrewsnodepool knows nothing about the jobs that run on the nodes it delivers15:21
Shrewsyour zuul job would have to log that info somewhere15:22
Shrewswe have $something that logs such info: http://logs.openstack.org/59/414759/14/check/tox-py35/626e2c6/zuul-info/ 15:23
Shrewsi don't know offhand where that comes from15:23
D0hani need such a feature because of the embedded device development process; it helps to recognize broken nodes15:24
rcarrillocruzyah, honestly the only thing i can think of as a central location to look at 'what jobs ran on node FOO' is logstash, i.e. by mining job logs15:24
mordredwell, I mean - we could log node into the build log15:25
D0hanthat brings another question - can zuul handle in some smart way 'suddenly broken nodes'?15:25
mordred(in zuul)15:25
mordredand then it would be possible to expose a way to search/report on that in the dashboard15:25
D0hanthis would be similar to jenkins15:26
*** nchakrab has joined #zuul15:27
mordredD0han: as for suddenly broken nodes - no, I do not believe so, although Shrews would know better than I. there's a bunch of work ongoing related to the scheduler and static nodes though15:27
mordredI think the tricky thing would be defining "suddenly broken" in a way that nodepool could detect such a thing15:27
D0hanyep15:27
mordredsince test jobs could obviously legitimately fail - or else they're not very useful test jobs ;)15:28
D0hanyep again15:28
D0hanso, there's a need for a health check when putting a node into the pool15:29
corvusi can think of a way we could have zuul tell nodepool that it thinks a node is broken.  so if someone can write a playbook which determines whether a node is broken, we can have nodepool take it out of service.15:29
rcarrillocruzcorvus: like a pre - pre - run playbook15:29
rcarrillocruz?15:30
corvus(this feature doesn't exist yet, but most of the pieces are there to do that, so it wouldn't be hard to add)15:30
mordred++15:30
corvusrcarrillocruz: i was thinking a pre-playbook with an extra return code that tells nodepool to hold the node15:30
corvusprobably we would only allow trusted playbooks to return that code15:30
mordredoh yeah - that would totally work15:30
rcarrillocruzic15:30
corvusand zuul records the result as a pre_fail (and re-runs the job on a new node)15:31
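(The pre_fail signal to nodepool described here did not exist at the time, but the health-check playbook itself would be ordinary Ansible; a minimal hedged sketch with hypothetical checks, attached via pre-run in a trusted base job. A pre-run failure already causes Zuul to retry the job on a fresh node.)

```yaml
# playbooks/pre/health-check.yaml (hypothetical)
- hosts: all
  tasks:
    - name: Check the node still answers basic commands
      command: uname -a
      changed_when: false

    - name: Fail early if the root filesystem is nearly full
      shell: test "$(df --output=avail / | tail -n1)" -gt 1048576
      changed_when: false
```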
D0hanyou're talking about pre, so this health check would run just before the test?15:31
corvusD0han: yes15:32
D0hani think it would be better to run it when the node is assimilated into the pool, so it's known good and ready for tests15:32
mordredwell, I'd imagine that the pre playbook return would cause zuul to reschedule that job on a different node - so I think it would wind up having a similar effect - but would also cover "something went wrong on this node since the last time it was used in a test"15:33
* mordred just thinking out loud15:34
D0hanthis may make more sense with this info - i need to be able to take a node from the pool and give it directly to a dev/other system for some time, and the node then comes back in an unknown state15:34
corvusmordred: yeah -- hopefully the automatic rescheduling helps avoid a user impact15:34
*** nchakrab has quit IRC15:36
*** nchakrab has joined #zuul15:36
D0hanhm, or maybe i need to stack zuul/nodepool on top of lava15:37
D0hankinda redundant15:37
corvusD0han: if the node failure doesn't cause the job to fail, does it still matter that it's detected later rather than earlier?15:39
D0hanyes, because 'later' creates a delay for tests15:39
*** nchakrab has quit IRC15:41
D0hanmaybe even smarter would be to _not_ run the health check if the last test on it passed15:41
corvusD0han: well, we're trying to keep nodepool as simple as possible; it doesn't have any facility to run workloads on nodes.  to do so means re-implementing a lot of what zuul does.15:41
D0hanmhm15:42
*** pawelzny has quit IRC15:46
*** gtema has joined #zuul15:49
gtemamordred: now for me, what is up with the timeouts? Should 414759 be repaired and merged first, or what? I am wondering, since the shade>sdk change was always passing16:06
*** GonZo2000 has quit IRC16:08
mordredgtema: it's related to the taskmanager stuff16:08
*** nchakrab has joined #zuul16:08
gtemawhat is the plan?16:08
mordredgtema: there is another patch that is about aligning those: https://review.openstack.org/#/c/414759/ 16:08
*** harlowja has joined #zuul16:08
mordredI've rechecked it to see if recent changes fix it (I feel like I did something to fix this already)16:09
gtemaok16:09
mordredand if it still comes back failing, we can debug further16:09
mordredoh - speaking of16:09
mordredShrews, corvus: I THINK the issue with 414759 the previous time it ran had to do with upper-constraints ...16:10
mordredwhich is obviously not what we want in our lives16:10
*** GonZo2000 has joined #zuul16:10
*** GonZo2000 has quit IRC16:10
*** GonZo2000 has joined #zuul16:10
mordredbut I think IIRC when I dug in last time I discovered that our nodepool-functional jobs are picking up upper-constraints because they're built on top of devstack and run as a devstack plugin16:10
*** harlowja has quit IRC16:10
mordredso we probably want to dig in, figure that out (If I'm right) and rectify it, since nodepool does not follow upper-constraints16:11
Shrewsmordred: it just failed again16:25
corvusShrews, mordred: http://logs.openstack.org/59/414759/14/check/nodepool-functional-py35/6974ad2/controller/logs/screen-nodepool-launcher.txt.gz 16:31
mordredyeah. that makes no sense to me - since that is not a required argument in openstacksdk or in that patch16:36
mordredwhich makes me think the wrong version of something is getting installed16:36
mordredthere it is: http://logs.openstack.org/59/414759/14/check/nodepool-functional-py35/6974ad2/job-output.txt.gz#_2018-07-13_14_56_16_810166 16:38
mordredwait - nevermind16:38
mordredstill not it16:38
corvusdid run ever take a client arg?16:39
corvus(also, i really wish python's error there included the class name)16:40
mordredright?16:41
mordredand yes - it did16:41
mordredthe shade version does16:41
mordredOH16:42
mordred*DUH*16:42
corvusthe job/plugin installs shade16:42
mordredyah - this patch hasn't migrated to sdk yet16:42
mordredthis is trying to migrate to openstacksdk's task_manager but still using shade for rest calls16:42
mordredI think we need to squish this patch with gtema's patch16:43
gtemashould I do something?16:44
mordredgtema: possibly ...16:44
*** fbo is now known as fbo|off16:46
mordredgtema: I believe we want to squash https://review.openstack.org/#/c/414759 into your patch - mind if I take a stab real quick and push up a new rev?16:48
gtemasure - do it16:48
*** electrofelix has quit IRC16:57
*** GonZo2000 has quit IRC16:59
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Remove OpenStack driver waitForImage call  https://review.openstack.org/580487 17:03
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Remove Task class  https://review.openstack.org/580466 17:03
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Replace shade and os-client-config with openstacksdk.  https://review.openstack.org/572829 17:03
mordredShrews, gtema: rebased/squashed - I think that stack should work now17:04
*** gtema has quit IRC17:08
tobiashmordred: just tried out mitogen (outside of zuul, so far) and it speeds up a clean openshift cluster deployment from about 45min to 25min17:23
rcarrillocruzjimi|ansible: ^17:24
rcarrillocruz:-)17:24
tobiashusing openshift-ansible17:24
tobiashand the deployment was successful :)17:24
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: try out mitogen for zuul jobs  https://review.openstack.org/582654 17:54
*** acozine1 has joined #zuul17:55
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: break ansible strategy  https://review.openstack.org/582656 18:13
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Remove Task class  https://review.openstack.org/580466 18:34
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Replace shade and os-client-config with openstacksdk.  https://review.openstack.org/572829 18:34
mordredsigh. pep818:34
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: try out mitogen for zuul jobs  https://review.openstack.org/582654 18:40
tobiashmordred: it seems like when using mitogen within zuul a job freezes with http://paste.openstack.org/show/725849/ 19:02
tobiashjust tried that in my testenv19:02
tobiashthe assertion is here: https://github.com/dw/mitogen/blob/d493a3d7ca9c9440848739704f9a1ab2d4118de5/mitogen/core.py#L1370 19:07
tobiashshell tasks loop forever with 'Waiting on logger'19:12
tobiashlooks like mitogen interferes with log streaming19:12
tobiashmordred: am I right that 'Waiting on logger' means that zuul-console isn't started?19:21
tobiashmordred: mitogen re-uses processes on the other end so that might interfere with the daemonization of zuul-console19:21
corvustobiash: yeah, or at least, it means the executor can't connect to the zuul-console19:26
tobiashcorvus: that's unlikely as it works without the mitogen patch19:26
corvustobiash: no i mean i'm just trying to clarify that it means the ansible process on the executor can't open a socket to the zuul console server on the worker for some reason.  a very likely reason is the one you suggest, but there could be others.  :)19:27
tobiashso I think that with mitogen zuul-console might have a problem with daemonizing or daemonizing kills mitogen on the remote end somehow19:27
tobiashcorvus: yes, I guess in most cases it's connectivity or a forgotten zuul-console :)19:28
tobiashok, at least nothing is listening on the node19:31
*** sshnaidm|off has quit IRC19:40
tobiashgot it working with a hack20:01
tobiashit's in fact the fork that breaks it20:01
*** nchakrab has quit IRC20:08
tobiashsoo, just did some quick tests with mitogen and zuul20:49
tobiashregarding shell tasks, unfortunately we gain nothing (most likely because of our log streaming)20:50
openstackgerritJames E. Blair proposed openstack-infra/zuul-jobs master: WIP: Add a role to return file comments  https://review.openstack.org/579033 20:51
tobiashregarding other tasks (e.g. the file module), I tested a relatively simple job with just 100 file tasks; the execution time went down from 1m55s to 1m25s20:52
tobiashso it is an improvement but not as huge as I expected20:55
tobiashbut I think we can expect lower cpu usage which could improve the scalability of the executors20:56
tobiashin a non-zuul related ansible playbook I saw a reduction of cpu time by 2/320:57
tobiashI also have to note that this task took almost 20s regardless of the ansible plugin: http://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/validate-host/tasks/main.yaml#n12 21:00
*** samccann has quit IRC21:03
tobiashand a further ~20s was executor preparation time21:04
*** acozine1 has quit IRC21:11
openstackgerritGoutham Pacha Ravi proposed openstack-infra/zuul-jobs master: Attempt to copy the coverage report even if job fails  https://review.openstack.org/582690 21:26
*** sshnaidm|off has joined #zuul21:27
-openstackstatus- NOTICE: logs.openstack.org is offline, causing POST_FAILURE results from Zuul. Cause and resolution timeframe currently unknown.21:53
*** ChanServ changes topic to "logs.openstack.org is offline, causing POST_FAILURE results from Zuul. Cause and resolution timeframe currently unknown."21:53
*** rlandy has quit IRC22:11
*** ChanServ changes topic to "Discussion of the project gating system Zuul | Website: https://zuul-ci.org/ | Docs: https://zuul-ci.org/docs/ | Source: https://git.zuul-ci.org/ | Channel logs: http://eavesdrop.openstack.org/irclogs/%23zuul/ | Weekly updates: https://etherpad.openstack.org/p/zuul-update-email"23:38
-openstackstatus- NOTICE: logs.openstack.org is back on-line. Changes with "POST_FAILURE" job results should be rechecked.23:38
