Thursday, 2018-08-02

corvusclarkb: thinking further on that, i don't think there's a blocker for creating a 'buildset' page in the webapp now.00:00
corvusi think we have all the infrastructure needed for that.00:00
clarkbdo we need to record buildset uuids?00:01
clarkbbuilds have a uuid but not buildsets I don't think00:01
corvusclarkb: they do have uuids.  there's a buildset table already, which i hope has that as a column00:01
clarkbah00:01
gundalowOK, so I need to work out which variables represent the rest of that URL00:02
gundalowwhich is 1) PR 2)  job00:02
corvusgundalow: to be clear, the checks tab at the top of the PR (with conversation, commits, checks, files changed) is new, and no one has written support in zuul for it yet00:07
corvusgundalow: the existing support is to set the status url of the checks comment-box-thingy at the bottom of the PR.  note that users don't even see the checks comment-box-thingy unless they are logged in00:08
*** clarkb has quit IRC00:09
*** elyezer has quit IRC00:10
gundalowcorvus: yup, asking about the details box at the bottom of the main page00:13
corvusgundalow: the log urls you have have the buildset at the *end* of the url, after the job, which means that there is no single url with a summary of all the jobs run for a pull request00:14
corvusgundalow:  https://ansible.softwarefactory-project.io/logs/4/4/f5c6eca18c78cda01c9432e88b66dbde8a33aece/check/ansible-test-network-integration-vyos-py2/195a67f/00:14
corvusgundalow: that's: /logs/pr[-2:]/pr/commit/pipeline/job/short_buildset_uuid00:15
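[editor's note] The path layout corvus describes can be sketched in Python as below. This is only an illustration of that layout. The function name `log_path` and its parameter names are hypothetical, and as noted later in the log, the URL may not be a stable interface and is probably better treated as opaque data.

```python
def log_path(pr, commit, pipeline, job, short_uuid):
    """Assemble a path of the form
    /logs/pr[-2:]/pr/commit/pipeline/job/short_uuid/
    (layout taken from the conversation; names here are hypothetical).
    """
    pr = str(pr)
    # pr[-2:] is the last two characters of the PR number, used as a shard dir.
    parts = ["", "logs", pr[-2:], pr, commit, pipeline, job, short_uuid, ""]
    return "/".join(parts)


example = log_path(
    4,
    "f5c6eca18c78cda01c9432e88b66dbde8a33aece",
    "check",
    "ansible-test-network-integration-vyos-py2",
    "195a67f",
)
print(example)
```

Dropping the trailing components (job and short uuid) gives the per-pipeline directory that lists all builds of all jobs for that commit.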
gundalowoh, so link to https://ansible.softwarefactory-project.io/logs/4/4/f5c6eca18c78cda01c9432e88b66dbde8a33aece/check/ ?00:15
corvusgundalow: that location is all builds of all jobs for that pr00:16
corvusgundalow: if you 'recheck' you'll get another entry at https://ansible.softwarefactory-project.io/logs/4/4/f5c6eca18c78cda01c9432e88b66dbde8a33aece/check/ansible-test-network-integration-vyos-py2/00:16
gundalowah, so I do want with 195a67f/ as that's the last commit00:16
corvuswell, it's the last build.  if the commit on the pr changes, the f5c6eca... part will change00:17
corvusgundalow: https://ansible.softwarefactory-project.io/logs/4/4/f5c6eca18c78cda01c9432e88b66dbde8a33aece is probably the best you can do under the circumstances, but it's far from ideal.00:18
openstackgerritMerged openstack-infra/zuul-jobs master: trigger-readthedocs: Move secret bits into a dict  https://review.openstack.org/58776700:19
corvusgundalow: the variables involved there are change.number and change.patchset00:19
corvusgundalow: whether or not the softwarefactory folks consider that a stable interface, i don't know.00:20
corvusgundalow: my personal recommendation would be to avoid doing this because i think that the log url should be treated as opaque data, and the user experience of browsing at that url is pretty confusing.  if someone has some time to add a 'buildset' page to the zuul webapp, that would be a much better solution and would work here.00:21
corvusgundalow: but if you want to do this temporarily, those variable names should get you moving00:22
gundalowThis is mainly for failing tests, we want to match Shippable's functionality, people know to look for logs at the end00:22
gundalowie look at the end of https://github.com/ansible/ansible/pull/4357600:22
corvusgundalow: well, i see nothing at the end because i'm not logged in to github00:23
corvusoh the little red "X" is a link to shippable00:24
corvushttps://screenshots.firefox.com/1TstE3nuXtLDIdNC/github.com00:24
corvusthat's the not-logged-in view00:24
gundalowYup and the "Details" link to the right00:24
corvusi don't see a details link00:24
gundalowah, interesting00:24
gundalowif you log in00:24
gundalowpeople doing Ansible development will be logged in00:25
corvusi actually almost never log in to github, even when i do use it (i open prs from the command line)00:26
corvusbut you know your users and of course should tailor the experience for them.  i only hope to convey that the check comment-box-thingy at the bottom has some serious and non-obvious ui issues :)00:27
corvusgundalow: anyway, there isn't a direct equivalent to that shippable link yet.  that will be the buildset page in the zuul webapp, once someone writes that (and it shouldn't be too difficult)00:29
corvusthe buildset page will be able to link to the appropriate builds of all the individual jobs for the latest run of the pr00:30
gundalowcorvus: What's the `buildset` page? The right information is currently added as a comment: on `linters : SUCCESS in 1m 19s`00:31
gundalowwhere `linters` is a link00:32
corvusgundalow: that's only a single build of a single job; when you run two jobs you'll have 2 links there00:32
gundalowah, I see00:34
corvusthere's no way to set 2 links as a status in github, so we need a new page to link people to which has the summary of all the jobs00:34
corvus(however, once zuul has support for github's new checks api, the UI there is actually much more suited to zuul.  take a look at the samples here https://github.com/ansible/ansible/pull/43576/checks -- that box on the left is exactly the way that zuul normally presents info)00:35
corvusprogress on either the buildset page or the checks api would be helpful here00:36
corvus(we ultimately will need both of them i think)00:36
gundalowYup, think you are right there00:46
gundalowWe have contacted Shippable to see what their plans are for the new Checks API. One thing we haven't checked yet is if you can use Checks API to display information that isn't tied to a specific line in a file00:48
gundalowwith `ansible-test` we generate junit files, so that should be easy to link to file:line00:48
gundalowcorvus: What do you think we need to do to progress this?00:49
gundalowI don't think this blocks us putting it live for ansible/ansible00:51
gundalowJust a slightly different UI to shippable, though it's still clear where to look for issues01:01
gundalowOn a different topic, one thing we might need to do is use ansible-test to build the `files:`. It would be great if that was dynamic, though I don't think that's an option01:02
*** EmilienM has quit IRC01:02
gundalowas the parsing of `files:` seems to be done before much of the job runs at all01:02
gundalow`ansible-test` has a load of logic for following Python imports and working out given a git diff which tests need to be run01:03
gundalowie if lib/ansible/modules/network/vyos/vyos_command.py is changed, only run the vyos_command integration tests01:03
*** EmilienM has joined #zuul01:03
gundalowthough if lib/ansible/module_utils/network/vyos/* is changed run all the vyos tests01:03
gundalowwhich is really useful as some of the full test suites are many hours per platform01:04
*** swest has quit IRC01:52
*** swest has joined #zuul02:16
*** clarkb has joined #zuul03:05
tobiashgundalow: a dynamic files filter does not exist atm but you can leverage job hierarchies and skipping child jobs via zuul_return03:51
tobiashhttps://zuul-ci.org/docs/zuul/user/jobs.html#return-values03:52
gundalowtobiash: interesting. Can I trigger things to run via that before having to run the main job and waste time booting the nodesets that will not be used?04:35
tobiashgundalow: you can run many jobs per change and also define dependencies between them. So the idea could be to have a job that runs before all other jobs and decides which will be run.04:38
tobiashgundalow: like http://paste.openstack.org/show/727114/04:40
tobiashthat will make 'job-a' and 'job-b' run after 'decide-job'04:40
tobiashand if decide-job returns a list of child jobs to run via zuul_return the children zuul runs will be filtered by that list04:41
tobiashthe job result is the skipped for a child that isn't run and there also won't be a vm booted for the skipped jobs04:41
tobiashwith that you can build up a full directed acyclic graph of jobs to run04:44
openstackgerritIan Wienand proposed openstack-infra/zuul-jobs master: trigger-readthedocs: fix typo  https://review.openstack.org/58813704:52
gundalowtobiash: interesting. Can one job return facts that another job uses?05:02
tristanCgundalow: iirc zuul_return data that are not named zuul are passed as variable to child jobs05:13
tobiashYes, that's correct05:15
gundalowtristanC: tobiash wonder if something like this would work https://etherpad.openstack.org/p/gundalow05:20
openstackgerritMerged openstack-infra/zuul-jobs master: trigger-readthedocs: fix typo  https://review.openstack.org/58813705:23
tristanCgundalow: not sure the list can be extended, it may get replaced by the last zuul_return if multiple list are set05:39
tobiashgundalow: added some comments to this etherpad05:47
tobiashI think the general approach can work if you obey these notes05:47
openstackgerritIan Wienand proposed openstack-infra/zuul-jobs master: Debugging for readthedoc web ping  https://review.openstack.org/58814606:27
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Require at least openstacksdk 0.17.1  https://review.openstack.org/58814906:41
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Fix comparison with wrong mime type  https://review.openstack.org/58815006:43
*** snapiri has joined #zuul07:30
*** snapiri has quit IRC07:32
*** snapiri has joined #zuul07:32
*** gouthamr has quit IRC07:36
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Fixup header/footer  https://review.openstack.org/58816307:44
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Only link timestamps  https://review.openstack.org/58816407:44
tobiashcorvus: this is a fix and a suggestion to htmlify that makes it work in our environment ^07:45
*** SotK has quit IRC07:45
*** eandersson has quit IRC07:46
tobiashcorvus: feel free to squash this into your change or just ignore it :)07:46
*** leifmadsen_ has quit IRC07:47
*** leifmadsen has joined #zuul07:50
openstackgerritMerged openstack-infra/zuul-jobs master: Debugging for readthedoc web ping  https://review.openstack.org/58814607:50
*** quiquell has joined #zuul08:09
quiquellGood morning08:09
quiquellHave a question about how to reproduce a zuul run locally08:10
quiquellIt's possible to startup a local zuul and point it to a specific change in gerrit towards my openstack ?08:10
*** jpena|off is now known as jpena08:12
*** jlvillal has joined #zuul08:29
tobiashquiquell: in principle that might be possible but it would be *much* work08:33
quiquelltobiash: ack08:35
tobiashcorvus: I got a bug report from a user that zuul matches wrong project-templates under certain circumstances08:37
*** electrofelix has joined #zuul08:37
tobiashcorvus: the analysis showed that he has two branches with config and pulling in different project templates08:37
tobiashcorvus: like foo and foo-bar while a change to foo-bar matches both project pipelines08:38
tobiashcorvus: it looks that the implied branch matcher doesn't do a full match08:38
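[editor's note] The symptom tobiash describes (a matcher for branch `foo` also matching `foo-bar`) is what a prefix match gives where a full match was intended. The sketch below is not Zuul's matcher code. It only shows the difference using Python's `re.match` and `re.fullmatch`, and the helper names are hypothetical.

```python
import re


def prefix_match(pattern, branch):
    # re.match anchors only at the start, so "foo" also matches "foo-bar".
    return re.match(pattern, branch) is not None


def full_match(pattern, branch):
    # re.fullmatch requires the whole string to match the pattern.
    return re.fullmatch(pattern, branch) is not None


for branch in ("foo", "foo-bar"):
    print(branch, prefix_match("foo", branch), full_match("foo", branch))
```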
*** pcaruana has joined #zuul09:00
*** SotK has joined #zuul09:06
openstackgerritIan Wienand proposed openstack-infra/zuul-jobs master: Revert "Debugging for readthedoc web ping"  https://review.openstack.org/58818609:18
openstackgerritMerged openstack-infra/zuul-jobs master: Revert "Debugging for readthedoc web ping"  https://review.openstack.org/58818610:04
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix wrong matched project template  https://review.openstack.org/58820110:19
tobiashcorvus: this fixes that behavior ^10:20
*** panda|rover|bbl is now known as panda|rover10:47
*** jpena is now known as jpena|lunch11:04
*** maxamillion has quit IRC11:04
*** mattclay has quit IRC11:04
*** gregdek has quit IRC11:04
*** zxiiro has quit IRC11:04
*** samccann has joined #zuul11:15
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/pipelines route  https://review.openstack.org/54152111:25
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: scheduler: add job's parent name to the rpc job_list method  https://review.openstack.org/57347311:27
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/labels route  https://review.openstack.org/55397911:32
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/nodes route  https://review.openstack.org/55399811:32
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: dashboard: add /{tenant}/job.html page to display job details  https://review.openstack.org/53554511:44
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: dashboard: add /{tenant}/projects.html web page  https://review.openstack.org/53787011:44
*** elyezer has joined #zuul12:02
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: dashboard: add /{tenant}/labels.html web page  https://review.openstack.org/55398012:15
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: dashboard: add /{tenant}/nodes.html web page  https://review.openstack.org/55399912:16
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: dashboard: add jobs graph rendering  https://review.openstack.org/53786912:16
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: dashboard: add project pipeline rendering  https://review.openstack.org/53787112:17
*** zxiiro has joined #zuul13:03
*** maxamillion has joined #zuul13:28
*** jpena|lunch is now known as jpena13:33
*** quiquell is now known as quiquell|off13:43
*** quiquell|off has quit IRC13:48
*** shachar has joined #zuul13:53
*** shachar has quit IRC13:54
corvustobiash: unfortunately there's a whole lot of log files in openstack jobs with random timestamp formats.  i think we can add support for them, but in the mean time, we should probably fall back on linking the whole line if there's no timestamp, so that we get something.13:59
corvustobiash: and it might be good to structure that as a list of regexes or something, since we know there are a lot of formats14:00
dmsimardmordred: is it known that zuul_stream doesn't provide the runtime for every task ?14:34
dmsimardmordred: I was trying to compare how long it took to upload logs upstream vs rdo and there's no runtime for the log upload task at the very end of http://logs.openstack.org/93/582493/6/gate/tripleo-ci-centos-7-containers-multinode/2364aac/job-output.txt.gz but you can see runtime in some other tasks intermittently14:35
mordreddmsimard: I don't think it's known?14:35
dmsimardhmmm, I wonder why14:37
dmsimardmordred: there's no runtime here https://github.com/openstack-infra/zuul/blob/master/zuul/ansible/callback/zuul_stream.py#L43014:37
dmsimardcould it be as simple as that ?14:37
dmsimardor here https://github.com/openstack-infra/zuul/blob/master/zuul/ansible/callback/zuul_stream.py#L417-L42314:39
dmsimardunrelated: clever use of format with dict keys, didn't realize you could do that and it just ignored other keys14:40
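[editor's note] The behaviour dmsimard mentions is standard `str.format`: when a dict is unpacked as keyword arguments, only the keys named in the template are used and the rest are ignored. A minimal sketch follows; the dict keys are made up for illustration and are not the fields zuul_stream uses.

```python
# Hypothetical task info; only "host" and "task" appear in the template.
info = {"host": "node1", "task": "upload logs", "elapsed": 12.5}

# Extra keys ("elapsed") are silently ignored by str.format.
msg = "{host} | {task}".format(**info)
print(msg)
```

A key that the template names but the dict lacks raises `KeyError`, so the tolerance only runs in one direction.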
pabelangerwe could also consider using14:40
pabelangercallback_whitelist = profile_tasks, timer14:40
pabelangerto give extra output on tasks14:40
pabelangerhttp://samdoran.com/performance-tuning-ansible-playbooks/ is a good write up14:40
dmsimardpabelanger: I thought about that too first but then profile_tasks adds lines to the output which would probably mangle the "tidy" output that zuul_stream does14:41
dmsimardor maybe it wouldn't, I dunno.14:41
dmsimardpabelanger: fwiw I really like this website which demonstrates what each callback does in practice: https://rndmh3ro.github.io/14:42
pabelangercould I get a few reviews on https://review.openstack.org/557947/ adds test-emit-job-header so we can nodepool info14:53
*** gouthamr has joined #zuul15:01
tobiashcorvus: how does the current htmlify work in openstack? There only the timestamp seems to be linked.15:12
tobiashdoes it do it more sophisticated or is it tailored to zuul logs?15:12
clarkbtobiash: it has whitelisted filenames that it knows it can parse then applies a set of regexes to make sense of them15:12
clarkbtobiash: it is fairly tailored to the logs that openstack ci produces for openstack services15:13
tobiashunderstood15:13
clarkbas for linking lines without timestamps the existing os-loganalyze tool will create a block back to the last timestamp iirc15:16
clarkband if there are no timestamp at all then the whole thing is one giant block?15:17
corvusi think we can just have a list of regexes for different timestamp formats, and if one matches, use it.  otherwise, link the whole line.15:24
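[editor's note] The approach corvus outlines (try a list of timestamp regexes, otherwise link the whole line) might look roughly like the sketch below. The two patterns and the name `link_text` are assumptions for illustration, not what the htmlify role implements.

```python
import re

# Hypothetical list of timestamp formats; more can be appended as needed.
TIMESTAMP_PATTERNS = [
    # e.g. "2018-08-02 14:00:01.123" (fractional seconds optional)
    re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:[.,]\d+)?"),
    # e.g. "Aug  2 14:00:01" (syslog style)
    re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}"),
]


def link_text(line):
    """Return the part of the line to turn into a hyperlink:
    the leading timestamp if one is recognised, else the whole line."""
    for pattern in TIMESTAMP_PATTERNS:
        found = pattern.match(line)
        if found:
            return found.group(0)
    return line


for sample in ("2018-08-02 14:00:01.123 | task started",
               "Aug  2 14:00:01 host sshd: session opened",
               "no timestamp here"):
    print(repr(link_text(sample)))
```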
dmsimardcorvus: could it be as simple as linking log lines instead of relying on parsing and timestamps ?15:27
corvusdmsimard: i don't follow15:30
dmsimardlogs.o.o/some/path/file?line=1015:31
corvusdmsimard: what are you proposing as the hyperlink text?15:31
dmsimardhmmmm, it's probably not that easy in practice15:36
dmsimard:(15:36
dmsimardI went back and read how I had done it for ara and it's likely not appropriate to do for general purpose logging15:37
dmsimardbut tl;dr in case it might be helpful anyway, ara uses pygment's htmlformatter http://pygments.org/docs/formatters/#HtmlFormatter15:39
dmsimardwhich wraps around the YAML pygments lexer15:40
dmsimardthe lexer provides the syntax highlighting and the html formatter provides the table for line linking and highlighting15:40
*** eandersson has joined #zuul16:10
tristanCdmsimard: fwiw https://review.openstack.org/580891 adds multi-line selection to os-loganalyze, with anchor similar to github, e.g. job-output.txt#L23-L4216:12
*** jpena is now known as jpena|off16:30
*** jpena|off is now known as jpena16:31
*** jpena is now known as jpena|off16:32
corvustristanC: fwiw, it's looking like we may stop using os-loganalyze in openstack-infra -- so that we don't have to run a proxy to swift17:07
corvustristanC: ^ moving that to #openstack-infra17:15
openstackgerritMerged openstack-infra/zuul-jobs master: Require at least openstacksdk 0.17.1  https://review.openstack.org/58814917:48
*** electrofelix has quit IRC18:02
*** rcarrillocruz has joined #zuul18:03
tobiashhrm, we filled up 3*4 TB disk space with broken nodepool builds :/18:10
clarkbtobiash: image builds?18:10
tobiashwe had a dib-image which failed at every build by trying to download something that doesn't exist anymore18:10
tobiashand nodepool doesn't seem to do any cleanup after failed dib-image-builds18:11
clarkbtobiash: we built latest nodepool to aggressively rebuild on failures so you don't wait a day for the next attempt, but maybe we should add a backoff18:11
clarkbtobiash: dib is expected to clean up after itself18:11
clarkbianw: ^ might know if there are issues with that18:12
tobiashclarkb: well, rebuilding directly is ok, but not leaving the tmp dirs behind18:12
tobiashdeleting that cruft takes ages...18:12
pabelangerI think nodepool will only clean up when we delete an image, if it fails, don't think nodepool does anything. Maybe we should18:14
pabelangerbut, we only delete images today18:14
tobiashthat's left behind: http://paste.openstack.org/show/727169/18:16
tobiashplus the same directories as dib_image...18:16
clarkbpabelanger: yes dib is expected to clean up on failure18:16
clarkbit has a bunch of on exit handler code18:16
clarkblikely just a bug and need to update the handlers to clean more stuff up18:16
tobiashso maybe a dib upgrade fixes this18:16
tobiashI'm still on an ancient dib...18:17
pabelanger++18:19
*** rcarrillocruz has quit IRC18:20
pabelangerlooking at nb02.o.o, I can also see some leaked builds18:21
tobiashpabelanger: but you're probably on the latest release18:22
pabelangerwe likely could do the same thing in nodepool, we do for zuul-executor and lost builds dirs. Delete them on startup if any are found18:22
pabelangertobiash: 22GB leaked for nb02.o.o18:24
pabelangerclarkb: ^18:24
clarkbpabelanger: we should really fix dib18:24
tobiashlucky you, we leaked 12TB within one week18:24
corvustobiash: can you try the latest dib and see if it's still a problem?18:27
tobiashpabelanger: delete on startup is not sufficient if the builders live long18:28
tobiashcorvus: yes, but probably not this evening18:28
pabelangerclarkb: tobiash I think we stop / start nodepool-builder during DIB build: https://nb02.openstack.org/ubuntu-bionic-0000000203.log18:28
pabelangerwhich caused the leak18:28
pabelangertrying to confirm18:29
corvuscool.  let's wait for the results of that.18:29
clarkbpabelanger: ah that could be, possible that dib doesn't clean up properly when init reaps it18:30
corvuspabelanger: nodepool should tell dib to stop and wait for it.  if not, that's a bug in nodepool.18:30
tobiashpabelanger: ok, the delete on start would fix your interrupted dib case18:30
clarkbcorvus: I don't think it does, I think we fall back to init18:30
corvusthen we should fix nodepool there18:30
corvus(i'm not opposed to eventually adding delete on startup, but we should only do so after we've covered all the cases where it shouldn't be necessary)18:31
clarkb++ and also possibly have dib handle this better if it is leaking stuff in its on exit handlers18:32
corvusso first step is to identify any dib bugs and fix them.  then identify any nodepool bugs and fix them.  then add delete on startup.  otherwise we'll just paper over a bunch of broken.18:32
pabelangerhttp://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-07-23.log.html#t2018-07-23T18:49:1818:32
pabelangeryah, Shrews looks to have restarted them18:32
tobiashcorvus: a pod restart would make proper cleanup impossible18:32
pabelangerso, that explains one of the leaks18:32
tobiashcorvus: but that could be handled in our init script as well18:32
corvustobiash: i can't really help what other programs do to us.  my goal is to make *our* program work as well as possible.18:33
tobiashcorvus: yes, so I think I'll just add a cleanup in our startup script because in our environment the builder could get hard killed sometimes18:34
corvustobiash: well, if you can help us identify the bug that would be great18:34
corvustobiash: so upgrading dib and reporting whether it fixed the problem would help18:35
tobiashcorvus: yes, that's still needed as nodepool filled up its disks without any restart ;)18:35
tobiashstartup cleanup wouldn't have helped in my issue18:35
corvusright.  again, my goal is to make sure that dib is working as designed, and nodepool is working as designed.  if either of them leaks during ongoing operation, or on restart, we need to fix that.18:35
corvuswhen there are zero bugs in that code, i think we can add delete on startup to deal with unforeseen situations.   but if we add it early, we won't be able to fix our bugs.18:36
tobiash++18:36
tobiashcorvus: ok, updated dib on one of my builders and it at least successfully builds an image. Now I can test how it behaves if the build fails.19:45
gundalowShrews: mordred corvus tristanC Ansible Contributor Summit signup info https://github.com/ansible/community/issues/247#issuecomment-410042603 (feel free to forward to whoever)19:54
mordredgundalow: done! thanks19:59
tobiashcorvus: looks like a recent dib does a better job regarding cleanup :)20:01
openstackgerritJames E. Blair proposed openstack-infra/zuul-jobs master: Add HTMLify logs role  https://review.openstack.org/58810520:06
openstackgerritJames E. Blair proposed openstack-infra/zuul-jobs master: Add HTMLify logs role  https://review.openstack.org/58810520:12
corvustobiash: i squashed your template fix into that patch ^20:13
tobiash++20:35
pabelangerI wouldn't expect zuul to give a -1 on https://review.openstack.org/588387/ given depends-on, it removes the job in question20:50
pabelangerwill dig more into it later, stepping away for some food20:51
*** elyezer has quit IRC21:11
*** dmsimard has quit IRC21:24
corvuspabelanger: you're right.  but i have a theory -- it may be that there is more than one error (ie, something else refers to that job).  but we only return the first error to the user, and that one from the second pass at layout generation (the one where we don't include the config-project change).  here's the code: http://git.zuul-ci.org/cgit/zuul/tree/zuul/manager/__init__.py#n46221:28
corvuspabelanger: so maybe we need to keep both layouts around, and if there are issues with the layout that includes trusted changes, report the error from that.  otherwise, the error from the second layout.21:29
corvusshould be relatively easy to add a test for this case.  basically, have a job used in a config project and untrusted project.  then make a change which removes the job, and have it depends-on a change that removes it from the config project.  but leave the usage in the untrusted project.21:30
ianwtobiash: dib exit cleanup is a known point of some issue.  it depends quite a lot on *when* the failure happens, as there's a few points we might exit and might not unwind mounts cleanly and then can leave things behind21:30
ianwpatches welcome :)21:30
*** dmsimard has joined #zuul21:32
*** samccann has quit IRC21:35
openstackgerritJames E. Blair proposed openstack-infra/zuul-jobs master: Swift logs: don't allow links outside of the supplied path  https://review.openstack.org/58758022:19
corvusclarkb, tobiash: ^ that should take care of the symlink-in-tests problem22:19
*** goern has quit IRC22:45
pabelangercorvus: okay, thanks for the pointer. I'll see if I can reproduce the issue with a unit test, then try to patch zuul23:19
openstackgerritClark Boylan proposed openstack-infra/zuul-jobs master: Swift logs: don't allow links outside of the supplied path  https://review.openstack.org/58758023:59
