Monday, 2018-07-16

<openstackgerrit> Tristan Cacqueray proposed openstack-infra/nodepool master: zk: skip node already being deleted in cleanup leaked instance task  https://review.openstack.org/576288  00:09
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: job: add ansible-tags and ansible-skip-tags attribute  https://review.openstack.org/575672  00:10
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: angular: call enableProdMode  https://review.openstack.org/573494  00:13
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection  https://review.openstack.org/568216  00:15
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/pipelines route  https://review.openstack.org/541521  00:29
*** jiapei has joined #zuul  03:23
*** elyezer has quit IRC  04:24
*** elyezer has joined #zuul  04:26
<tobiash> tristanC: these functions now return (True, 'Reason') or (False, 'Reason') and both still evaluate to True and I think that's dangerous for a 'matches' function  05:24
<tobiash> tristanC: but you could wrap this in a class and overwrite the __bool__ function https://docs.python.org/3/reference/datamodel.html#object.__bool__  05:30
<tobiash> tristanC: like a FalseWithReason class  05:33
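A minimal sketch of the wrapper tobiash is describing: a non-empty tuple like (False, 'Reason') is always truthy, while an object overriding __bool__ can stay falsy and still carry its reason. The matches() helper below is hypothetical, not zuul's actual code.

```python
class FalseWithReason:
    """Falsy in boolean context, but still carries an explanation."""

    def __init__(self, reason):
        self.reason = reason

    def __bool__(self):
        # Unlike the tuple (False, 'Reason'), which is truthy because it
        # is a non-empty tuple, this object really evaluates to False.
        return False

    def __str__(self):
        return self.reason


def matches(branch):
    # Hypothetical matcher: return True, or a falsy value with a reason.
    if branch != 'master':
        return FalseWithReason('branch %s is not master' % branch)
    return True


result = matches('feature/foo')
if not result:
    print(result)  # -> branch feature/foo is not master
```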
*** elyezer has quit IRC  05:35
*** elyezer has joined #zuul  05:36
*** gtema has joined #zuul  05:53
*** bhavik1 has joined #zuul  06:35
<tristanC> tobiash: oh good idea, i'll update the review  06:35
*** pcaruana has joined #zuul  06:36
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: github: add event filter debug  https://review.openstack.org/580547  06:37
*** bhavik1 has quit IRC  06:40
*** hashar has joined #zuul  07:00
*** quiquell|off is now known as quiquell  07:04
*** elyezer has quit IRC  07:21
*** fbo|off is now known as fbo  07:23
*** elyezer has joined #zuul  07:24
*** elyezer has quit IRC  07:33
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/pipelines route  https://review.openstack.org/541521  07:33
*** elyezer has joined #zuul  07:36
*** jiapei has quit IRC  07:43
*** dmsimard has quit IRC  07:54
*** dmsimard has joined #zuul  07:55
<tobiash> yay, tested that caching patch in prod now and freeze time of one of our bigger jobs went down from 400ms before to 9ms now :)  08:09
<tristanC> 98% increased performance :-)  08:20
<openstackgerrit> Markus Hosch proposed openstack-infra/zuul master: Reduce number of reconfigurations on branch delete  https://review.openstack.org/580967  09:10
<openstackgerrit> Markus Hosch proposed openstack-infra/zuul master: Per-branch management of unparsed config cache  https://review.openstack.org/582897  09:10
*** electrofelix has joined #zuul  09:14
<openstackgerrit> Markus Hosch proposed openstack-infra/zuul master: Per-branch management of unparsed config cache  https://review.openstack.org/582897  09:19
*** sambetts_ is now known as sambetts  09:25
*** electrofelix has quit IRC  09:59
*** electrofelix has joined #zuul  10:11
*** rcarrill1 is now known as rcarrillocruz  10:16
*** elyezer has quit IRC  10:23
*** elyezer has joined #zuul  10:24
*** electrofelix has quit IRC  10:34
<quiquell> tristanC: zuul.project.src_dir is the place to generate stuff in a job?  10:50
<tristanC> quiquell: you need to fetch to zuul.executor.log_root  10:51
<quiquell> tristanC: is this directory also the directory on log.o.o?  10:55
<quiquell> tristanC: I have tried zuul.executor.work_root but I don't have permissions there  10:56
<tristanC> quiquell: the upload-logs role executed by the base post playbook will export the files in executor.log_root to logs.o.o  11:00
<quiquell> tristanC: Ok, thanks!  11:01
<quiquell> tristanC: And zuul.executor.work_root, we cannot write there.  11:02
<tristanC> quiquell: you're welcome. no you can't write to work_root, but log_root should be fine  11:05
<quiquell> tristanC: Ok, is there any way we can instruct upload-logs to ignore some dirs?  11:06
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: docs: add job's logs guide  https://review.openstack.org/582921  11:12
<tristanC> quiquell: https://review.openstack.org/582921 is a copy of https://softwarefactory-project.io/docs/user/zuul_user.html#export-logs-artifacts-to-the-logserver which contains more detailed instructions  11:12
<tristanC> not sure logstash_processor_config is available to untrusted-projects in zuul.openstack.org  11:13
<quiquell> tristanC: So it looks like zuul.project.src_dir is the correct place to generate stuff  11:15
<quiquell> tristanC: Then copy whatever should be logged to log_root  11:15
<quiquell> Is that correct?  11:15
<tristanC> oh right, you can prepare artifacts in zuul.project.src_dir before doing the fetch to zuul.executor.log_root  11:16
<quiquell> tristanC: And after that just copy it to log_root  11:16
<quiquell> ok  11:17
<tristanC> yes, zuul.project is on the ephemeral test instance, zuul.executor is on the executor, e.g. localhost  11:17
<quiquell> so zuul.project.src_dir is like the workspace for the build  11:17
<quiquell> and also the place for the cloned project  11:17
<tristanC> you can also create a dir at "{{ ansible_env.HOME }}/workspace" on the test node to get a clean workspace  11:20
<quiquell> tristanC: Yep, we were doing so, but looking for an already created place  11:21
<quiquell> tristanC: Do you see any problem with using src_dir?  11:21
<tristanC> quiquell: beware that zuul.project.src_dir is relative to ansible_env.HOME  11:22
*** elyezer has quit IRC  11:22
<tristanC> quiquell: src_dir should work, but make sure you don't conflict with existing files, for example in case there is already a "build" or "logs" directory in the project, or as a result of the test  11:22
<tristanC> otherwise you'll fetch and export extra bits  11:22
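A sketch of the pattern tristanC describes, as a post-run playbook; the build/ artifact directory is an illustrative assumption, and the synchronize pull mirrors what the zuul-jobs fetch roles do.

```yaml
# Post-run playbook sketch (assumed paths; adapt to the job's layout).
# Runs against the test node and pulls artifacts back to the executor's
# log_root, from which the base job's upload role publishes to logs.o.o.
- hosts: all
  tasks:
    - name: Collect artifacts generated in the project checkout
      synchronize:
        mode: pull
        src: "{{ ansible_env.HOME }}/{{ zuul.project.src_dir }}/build/"
        dest: "{{ zuul.executor.log_root }}/artifacts/"
```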
<quiquell> tristanC: That's a good one, will go back to 'workspace' dir  11:23
<quiquell> tristanC: Thanks  11:23
*** elyezer has joined #zuul  11:23
<tristanC> quiquell: you're welcome :-)  11:23
*** electrofelix has joined #zuul  11:23
*** GonZo2000 has joined #zuul  11:30
*** GonZo2000 has quit IRC  11:30
*** GonZo2000 has joined #zuul  11:30
*** gtema has quit IRC  11:32
*** sshnaidm is now known as sshnaidm|rover  11:43
*** quiquell is now known as quiquell|lunch  12:12
*** rlandy has joined #zuul  12:23
*** quiquell|lunch is now known as quiquell  12:37
*** elyezer has quit IRC  12:44
*** samccann has joined #zuul  12:50
*** elyezer has joined #zuul  12:57
<fungi> note that having an explicit workspace the job is allowed to write to on the job nodes is a bit of a jenkinsism inherited in zuul v2 which we got rid of in v3. as discussed in #openstack-infra the job can write anywhere on a node you like as long as the usual posix filesystem permissions are taken into account (so if you want the "zuul" user to write in /opt/mydir you need to use root permissions in the job to create and chown/chmod it accordingly)  13:04
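For instance, a pre-run task handling those permissions (a sketch; /opt/mydir is fungi's placeholder and the task name is made up):

```yaml
# Pre-run task sketch: create a directory outside $HOME that the
# unprivileged "zuul" user can write to during the job.
- name: Create /opt/mydir and hand it to the zuul user
  become: true
  file:
    path: /opt/mydir
    state: directory
    owner: zuul
    group: zuul
    mode: "0755"
```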
*** gtema has joined #zuul  13:05
<fungi> there _is_ a "workspace" on the executors because they're shared by lots of builds and tightly restricted as to where they allow writes, but you shouldn't write files explicitly on the executor unless you know you need to do that (e.g., a nodeless executor-only job)  13:06
*** electrofelix has quit IRC  13:07
*** gtema has quit IRC  13:08
<mordred> Shrews: I made a pbrx patch for nodepool too https://review.openstack.org/#/c/582732/  13:29
<mordred> tristanC: I'm gonna try to get something pushed up this morning, but if I don't, I'll hand it off to you for sure  13:29
*** rcarrill1 has joined #zuul  13:30
<Shrews> cool  13:30
*** rcarrillocruz has quit IRC  13:31
*** rcarrill1 is now known as rcarrillocruz  13:33
<Shrews> hrm, i'm not sure what's happening with the nodepool-functional-py35-src job: http://logs.openstack.org/32/582732/2/check/nodepool-functional-py35-src/7c024da/job-output.txt.gz#_2018-07-15_13_58_14_680051  13:41
<Shrews> python3 should be the version in use, so that's weird  13:41
*** acozine1 has joined #zuul  13:42
<Shrews> but obviously it's not  :/  13:44
<openstackgerrit> Merged openstack-infra/zuul-jobs master: ara-report: add missing ara_report_run check  https://review.openstack.org/577675  13:52
*** sshnaidm|rover is now known as sshnaidm|afk  13:54
<pabelanger> Shrews: looks like a broken dependency for diskimage-builder  13:55
<pabelanger> guess something dropped 2.7 support  13:56
<Shrews> oh, so maybe yesterday's release of astroid breaks dib  13:59
<Shrews> i thought dib was also using py3 but i guess not?  14:00
<Shrews> oh, i think dhellmann already has a fix out  14:04
*** openstackgerrit has quit IRC  14:04
<Shrews> maybe not  14:07
*** weshay is now known as weshay_mtg  14:30
*** quiquell is now known as quiquell|off  14:32
*** sshnaidm|afk is now known as sshnaidm|rover  14:47
<corvus> tristanC, tobiash: any thoughts about my email question regarding building container images?  14:55
<pabelanger> I still need to read up on it, but hope to reply this afternoon  14:57
*** mhu is now known as mhu|He-Man  15:00
*** mhu|He-Man is now known as mhu  15:01
<clarkb> corvus: I need to write a response to that, tl;dr is dib can already build container images (it might not be the best/most useful tool for that but it is possible iirc) so it might be more a question of uploading images than building them?  15:02
*** pcaruana has quit IRC  15:02
*** jiapei has joined #zuul  15:03
<corvus> clarkb: yeah, i meant to ask less about "how" and more about "whether we should, regardless of mechanism"  15:03
<corvus> clarkb: i agree, if we decide we want to, we should consider dib  15:04
<pabelanger> Yah, some optimization for DIB and containers would be recommended, but it works pretty well.  15:05
<pabelanger> a while back I created an ubuntu-rootfs element to help with container builds: https://review.openstack.org/413115/ needs to be rebased  15:07
<mordred> corvus: thanks for the reminder - I just replied to the list  15:08
<tobiash> corvus: sorry, I've been in firefighting mode for quite some time, will read later this evening  15:09
*** weshay_mtg is now known as weshay  15:13
<Shrews> i've read mordred's words and find myself agreeing with him (crazy, i know) and cannot think of convincing arguments in the other direction  15:21
<corvus> yeah, mordred and jhesketh both make compelling arguments :)  15:23
<corvus> Shrews: also, yes, if you agree with mordred you are by definition crazy  15:23
<corvus> i frequently agree with mordred, for what that's worth :)  15:23
<Shrews> now i agree with corvus, which makes me some sort of exponential of crazy  15:25
<mordred> you're all nuts  15:27
<corvus> i agree  15:27
<mordred> speaking of nuts - if anyone wants to review an ansible role: https://review.openstack.org/#/c/580730/  15:27
<mordred> I should recheck the depends-on patches for that just to make sure...  15:28
<mordred> corvus: (also, as you may have already seen, I did a 'build images of nodepool' patch as well  15:29
<Shrews> speaking of crazy... mordred, i sort of want to land your nodepool openstacksdk change. we ready for that?  15:29
<mordred> Shrews: yes. we should totally be ready for that stack  15:29
<Shrews> well, i think it's the last one  15:29
<Shrews> anyone else want to review that change? https://review.openstack.org/572829  15:30
<Shrews> already have two +2's, but since it's a rather large deal, others might be interested  15:31
<Shrews> mordred: just did a 'check experimental' on it, fwiw  15:32
<mordred> cool  15:34
*** ssbarnea1 has joined #zuul  15:39
<ssbarnea1> https://review.openstack.org/#/c/570546/ anyone? one liner.  15:40
*** samccann has quit IRC  15:44
*** samccann has joined #zuul  15:45
*** samccann has quit IRC  15:46
*** samccann has joined #zuul  15:46
*** openstackgerrit has joined #zuul  15:57
<openstackgerrit> Fabien Boucher proposed openstack-infra/zuul master: Add tenant yaml validation option to zuul client  https://review.openstack.org/574265  15:57
*** sshnaidm|rover is now known as sshnaidm|bbl  16:28
*** sshnaidm|bbl has quit IRC  16:31
<tobiash> fbo: added a question to 574265  16:36
<tobiash> corvus: what do you think about ^ ?  16:36
<pabelanger> looking for some job design help, hopefully this will be clear. In rdoproject.org, we have a project (rdoinfo) that potentially needs to have required-projects of 300 packaging repos (eg: openstack/nova-distgit). Today, I don't believe https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.name can be a regex; any concerns about making it one?  16:40
<pabelanger> The 2nd part of the design: we actually don't really need to push the 300 projects using prepare-workspace to the node; only if rdoinfo has the depends-on header do we actually care about using that project in required-projects to then build an rpm. trying to best think of how to make the rsync more dynamic. Could I use zuul.items here?  16:42
<clarkb> pabelanger: maybe update zuul to have depends-on populate required projects?  16:50
<clarkb> could be a job attribute (I think it may do this in some cases already though?)  16:50
*** sambetts is now known as sambetts|afk  16:51
*** gtema has joined #zuul  16:54
<pabelanger> I see, if zuul is able to find the project, but it's not in required-projects, auto add it?  16:54
<pabelanger> when using depends-on  16:54
<clarkb> yeah, an implicit required-project mode based on depends-on  16:55
<pabelanger> yah, that would actually work well here. the other concern was, if we added 300 required-projects, how do we keep it in sync when we add new distgit packages, aside from writing check jobs to help validation  16:58
<fungi> could have some nasty side effects if parts of the job rely on required-projects to inform them of what versions of things to install (think tox-siblings) since it could result in deadlocking  16:58
<fungi> so we should make it non-default, or have a second required-projects-like var which they go into, or just document that you shouldn't rely on required-projects to only be influenced by explicit static job configuration  16:59
<fungi> leaning toward something like a required-projects-implicit list, and then have zuul merge and check those out on disk but keep the vars distinct so you know which were required by configuration and which by dynamic dependencies  17:01
<tobiash> corvus: do we already have a concept for buildset resources (nodes) or buildset lifetime of nodes?  17:01
<pabelanger> Yah, I can see that. I don't actually mind if we allow regex for required-projects and then write some check job to deal with sync issues too  17:02
<pabelanger> the 300 projects would be ^openstack/*-distgit  17:02
<tobiash> corvus: our projects are starting to kill our artifactory with cached stuff which should be handed over to the next child job  17:02
<clarkb> pabelanger: my immediate thought on the repo sync is that we should always sync the required projects. Which is why I'm trying to come up with some way of influencing that list rather than special casing behavior around syncing  17:03
<clarkb> pabelanger: if you have asserted a repo is required then it should be synced  17:03
<tobiash> buildset resources or optional buildset lifetimes of nodes could be a good way to solve this use case much better than using a centralized hosting service  17:03
<pabelanger> clarkb: yah, I think if we added some logic or a new role to only push projects in zuul.items for this job, it might be something that worked today  17:04
<clarkb> tobiash: something similar came up in the k8s discussion. A way to keep a build around for the lifetime of all other builds (so that it could host a registry iirc)  17:04
*** gtema has quit IRC  17:04
<pabelanger> if I understand zuul.items correctly  17:04
*** hashar is now known as hasharAway  17:04
<pabelanger> tobiash: yah, rdoproject just pushes to a central log server today for artifacts, so really looking forward to the container spec to help remove that dependency  17:05
<fungi> tobiash: at one point we had discussed providing (limited) scratch space on executors readable by other builds, arranging so dependent jobs all get their builds for a particular changeset run from the same executor, and expiring the shared scratch space when the last build in a dependent job set terminates  17:05
<tobiash> clarkb: yes, I remember this. Did this reach a point where I could start to implement such a functionality?  17:05
<clarkb> pabelanger: it will still have to update 300 repos for that job though; you are only optimizing half of the problem (the sync from executor to the other test nodes)  17:06
<fungi> oh, but the long-lived build also makes sense. especially if it can come with some sort of provider affinity  17:06
<clarkb> tobiash: I think the spec may have been updated to mention it, but unsure if anything more has been done  17:06
<pabelanger> clarkb: agree, downside to that  17:07
<fungi> build node provider affinity might also be easier to satisfy than provider-pinned executors  17:07
<fungi> and easier to scale  17:07
<pabelanger> clarkb: however, likely step 1 here is to solve the issue how we can today without zuul changes, then work to maybe implement what we discussed here today  17:08
<fungi> we already have something similar for multi-node build provider affinity anyway  17:08
<clarkb> pabelanger: ya, the downside is it means you have to support and deprecate that functionality in the repo syncing role  17:09
<openstackgerrit> Goutham Pacha Ravi proposed openstack-infra/zuul-jobs master: Attempt to copy the coverage report even if job fails  https://review.openstack.org/582690  17:09
<pabelanger> clarkb: yup, and the job is broken today, so working to land a new feature (assuming we want to do it) might just be the best path here  17:11
<pabelanger> I'd like to hear from mordred / corvus too  17:11
<tobiash> clarkb, fungi: ah yes, it was in the container spec  17:13
<logan-> pabelanger: i'm confused why required-projects is being used for the repos that should only be cloned/synced when depends-on is present  17:13
<logan-> those will get cloned/synced due to the depends-on even if they are not in required projects  17:14
<clarkb> logan-: I thought we might do that but wasn't sure.  17:16
<tobiash> So I think buildset lifetime and provider affinity could be solved one after the other  17:16
<pabelanger> I also think I am leaving out a key piece of information: we also need to set up the https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout setting, because these projects don't have a master branch: https://review.rdoproject.org/r/13330/  17:17
<pabelanger> we get the following error: ERROR Project review.rdoproject.org/openstack/networking-cisco-distgit does not have the default branch master  17:17
<tobiash> provider affinity in nodepool might be as easy as: if a provider is requested, all other providers just decline the request  17:17
<corvus> wow, are we talking about two really complicated issues at the same time?  17:17
<fungi> so it seems  17:17
<tobiash> oh sorry, I'll defer my thoughts  17:18
<corvus> okay, i'm going to need like 10 minutes to untangle scrollback  17:18
<fungi> calls for a scrollback deinterlacer  17:20
<corvus> i'm doing it. i'll have an etherpad in a minute  17:20
<corvus> https://etherpad.openstack.org/p/ajP8DUX02S  17:23
<corvus> first time i've ever had to do that :)  17:23
<corvus> okay, i'm going to read the conversation about required-projects first  17:23
<clarkb> I'm sure slack would've solved that for us right? >_>  17:24
<corvus> /kick clarkb  17:25
<mordred> clarkb: wasn't chromakode working on a threaded chat client at one point?  17:25
<clarkb> mordred: ya, I think he was involved with some startup that was going to fix chat problems. Like the other dozen startups all doing that :)  17:25
<clarkb> tl;dr it is a hard problem  17:26
<mordred> yup  17:27
<corvus> okay, i've read the required-projects chat, and my understanding is: i agree with logan-: depends-on should cause the repo to show up on disk, but pabelanger says that we *also* need a specific branch of that repo checked out.  pabelanger, what branch do you want to have checked out?  presumably these repos don't have the branch of the change (otherwise it would have checked them out).  is the issue that you need a specific branch checked out, or that those projects need a default branch so that the job checks *something* out and the error goes away?  17:30
<corvus> pabelanger: try setting "default-branch" on the projects in question in their project stanzas.  you can do this in a config-project.  you can even do it with a regex project matcher, like "name: .*-distgit" so you don't have to type it 300 times.   https://zuul-ci.org/docs/zuul/user/config.html#attr-project.default-branch  17:33
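A sketch of the stanza corvus suggests, assuming rpm-master as the default branch (per the error pabelanger quoted above):

```yaml
# In a config-project (sketch): one regex-matched stanza instead of 300.
- project:
    name: .*-distgit
    default-branch: rpm-master
```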
<pabelanger> corvus: we'd need a specific branch checked out; in the case of the error it is rpm-master, but we'd also need queens-rdo, pike-rdo. I am unsure of the history here for rdoproject, why the branches are different  17:34
<pabelanger> corvus: ack, I can test that  17:34
<corvus> pabelanger: well, that doesn't apply if you need different branches checked out  17:34
<pabelanger> yes, I was looking to fix the master error first, but rdoproject does development on the other branches too  17:35
<corvus> pabelanger: if you have a project (-distgit) with branches that correspond to an equal set of branches in another project, zuul can only handle that if they have the same name.  17:36
<corvus> pabelanger: what i would do is recognize the advantages of having corresponding branches of different projects have the same name, and use that system going forward so that zuul and humans can both do the intuitive thing.  pike == pike.  :)  17:37
<corvus> pabelanger: but, if you need to map different branch names to each other, then i'd recommend setting the default-branch attribute to get rid of the error; then you can always manually check out the right branch in a pre-playbook.  17:37
<corvus> pabelanger: so you can say "if zuul.branch == pike, checkout pike-rdo"  17:38
<logan-> would https://zuul-ci.org/docs/zuul/user/config.html#attr-pragma be useful to bridge the branch name issue?  17:38
<corvus> pabelanger: you can even iterate over all of the non-required projects by iterating over zuul.projects and looking for required=false.  those will be depends-on projects (or, possibly, the project of the change under test)  17:38
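Put together as a pre-run task, the mapping might look like this sketch; the branch-name convention and paths are assumptions, and it presumes the dict form of zuul.projects whose entries carry src_dir and required:

```yaml
# Pre-run sketch: for repos zuul prepared only because of Depends-On
# (required == false), check out the -rdo branch matching the change.
- name: Check out mapped branch in depends-on projects
  command: git checkout "{{ zuul.branch }}-rdo"
  args:
    chdir: "{{ ansible_env.HOME }}/{{ item.src_dir }}"
  loop: "{{ zuul.projects.values() | selectattr('required', 'equalto', false) | list }}"
```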
<pabelanger> okay, thanks. Let me get the error resolved first using default-branch, and confirm if other branches have an issue  17:38
<pabelanger> I don't really know yet  17:39
<corvus> logan-: it can do so for job definitions (ie, jobs on pike should apply to changes on pike-rdo).  and it may have a place here just for that, depending on what the job definitions look like.  but i don't think it's a complete solution to the problem --  17:39
<corvus> logan-: because it won't cause the pike-rdo versions of the repos to be checked out for a pike change  17:39
<logan-> ah  17:40
<corvus> logan-: *however*, if one were in a situation where you had branch variants of jobs and could specify required-projects for every branch, you could construct a mapping like that.  so the pike job runs on changes to pike, and changes to pike-rdo, and its required-projects say: check out nova@pike and nova-distgit@pike-rdo.  but that only works for things added to required-projects.  17:41
<corvus> so it's really close, but the 300 projects thing throws a wrench in it here :)  17:42
<corvus> pabelanger: okay, sounds like you've got the next one or two steps to take, yeah?  17:43
<pabelanger> I do, thanks for the help  17:44
<corvus> ok, i'll read the buildset conversation now :)  17:44
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Don't write docker proxy config if docker_mirror undef  https://review.openstack.org/583010  17:47
<corvus> tobiash, clarkb: yeah, i think we can start with having jobs zuul_return something from the main playbook which says "i completed successfully, keep me running until my child jobs are finished, then run my post playbook".  that's what i was getting at in the container spec, and it should allow us to implement all kinds of things based on a create, wait, cleanup pattern.  17:48
<corvus> provider affinity makes that better, but isn't required for a first pass, and shouldn't interfere with the initial implementation.  17:49
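Since the feature is only being designed in this exchange, any syntax is hypothetical; one plausible shape, placed at the end of the run playbook:

```yaml
# Hypothetical zuul_return usage (this API did not exist at the time of
# this discussion): signal "keep me running until my children finish".
- name: Pause this job until all child jobs have completed
  zuul_return:
    data:
      zuul:
        pause: true
```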
<tobiash> corvus: ah, so that would keep the job alive instead of just the node  17:49
<tobiash> corvus: that's an interesting idea  17:49
*** GonZo2000 has quit IRC  17:49
<corvus> tobiash: yep.  that makes things like "create an object store container; child jobs use it; delete container" easy to do.  17:50
<corvus> that could be a zero-node job  17:50
<tobiash> ok, I was more on the track of adding a leave-me-alive annotation to a node in a buildset, but your idea seems to be more powerful  17:51
<corvus> this *also* touches on SpamapS's idea of a cleanup job -- a job which runs after all other jobs are finished.  but i think the idea of suspending then resuming the parent job may be able to accomplish the same things, and more, so may be the better approach.  17:51
<corvus> tobiash: to be fair, i think what i wrote in the spec suggests more of what you describe.  i think i, erm, may have forgotten to write down this newly revised version.  :)  17:52
<tobiash> corvus: so with that zuul_return info, would you want to pause just at the end of the playbook that called this, or just at the end of the run playbook?  17:53
<corvus> tobiash: i think only at the end of the run playbook?  i think it will be difficult to figure out what to do if a job pauses twice, so if we say we can only do it after the run playbook, that makes things clearer?  17:54
<corvus> though, i guess we could say that we only pause the first time.  17:55
<tobiash> corvus: depends, we could also say pause will be done just once,  17:55
<corvus> whether that happens in a pre/run/post doesn't really matter, i guess...  17:55
<tobiash> corvus: except you want to attach the success status to this signal  17:56
<pabelanger> just catching up, +1 for zuul_return + keep running  17:56
<tobiash> corvus: but I don't think we need to attach the success status to this signal, as that job still can just use zuul_return to forward any variable to the child jobs  17:57
<tobiash> corvus: so I have a slight preference to just let it pause once, regardless at which playbook  17:58
<tobiash> (but I also can live with end of run playbook)  17:58
<tobiash> detail question: would we add a new metric to the executor stats (jobs starting, running, paused)?  17:59
<corvus> tobiash: well, we still decide whether to run child jobs based on the result of the parent.  right now, that's the final result.  we'd be talking about moving that to either the run playbook, or any playbook.  so if we allow it on any, then we can end up in a situation where the pre_playbook pauses with success and child jobs run, then the run playbook fails, so the job result switches to failure.  that's weird, but, i guess okay?  only allowing this from the run playbook avoids that situation.  18:00
<openstackgerrit> Adam Harwell proposed openstack-infra/zuul-jobs master: Make revoke-sudo work on base cloud-init images  https://review.openstack.org/564674  18:01
<tobiash> corvus: good point  18:01
<corvus> tobiash: a new metric sounds like a fine idea  18:01
<corvus> (we should also expose the job state in the api and the status page)  18:01
* mordred likes pause-at-end-of-run  18:02
<tobiash> uhm, more work, but yes, definitely  18:02
<tobiash> corvus: so with your point we should do pause only at end of run  18:02
<corvus> my gut says only do this for the run playbook.  it's clear and should be sufficient.  i think if we decide we want to allow it at other playbooks later, we make that change later.  18:02
<corvus> (if a use case comes up that pause-at-end-of-run can't handle)  18:03
<tobiash> however the first job still could switch to post failure  18:03
<tobiash> but I guess that's ok then in this case  18:03
<pabelanger> does the parent unpause once child jobs are finished?  18:03
<corvus> yep.  i feel like that's a fairly minor change.  18:03
<corvus> pabelanger: yes  18:04
<pabelanger> and we'd pause the parent watchdog too, I assume  18:04
<tobiash> corvus: so for a start, automatic unpause if all recursive children are finished?  18:05
<corvus> someday, someone is going to ask to be able to unpause before child jobs are finished.  i'm sure we'll be able to accommodate that then.  but until then, let's keep it simple and just unpause when all child jobs are finished.  18:05
<corvus> tobiash: exactly  18:05
<tobiash> corvus: that use case probably could be added easily also via zuul_return  18:05
<corvus> tobiash: ya  18:05
<tobiash> (later)  18:05
<pabelanger> yah, I don't see a need to unpause for the rdoproject use case  18:05
<tobiash> so I guess I will start tomorrow with that  18:06
<corvus> pabelanger: good question -- should the parent job timeout be paused?  i think probably so.  18:06
<mordred> I think so too  18:06
<tobiash> because we'll probably kill our artifactory within the next two weeks without this feature :(  18:06
<corvus> tobiash: great, when i next update the spec, i'll clarify this section to match :)  18:06
<pabelanger> ++  18:07
* mordred is excited about this  18:07
<pabelanger> exciting  18:07
<pabelanger> mordred: YES  18:07
<tobiash> mordred: excited about killing artifactory? ;)  18:07
<corvus> one more detail: i think currently aborted jobs don't run post-playbooks  18:07
<corvus> so you could end up creating containers but not cleaning them up  18:08
<tobiash> corvus: yes, they just stop  18:08
<tobiash> corvus: do you think for the container use case we need an 'always run this post playbook' annotation in the job?  18:09
<clarkb> tobiash: corvus: maybe a new cleanup-playbook: specifier  18:10
<tobiash> or that  18:10
<tobiash> definitely better than a post playbook annotation  18:10
<corvus> yeah, i think one of those would be useful.  18:10
<corvus> with cleanup-playbook, we need to define the nesting order (it has 4 dimensions now, which is harder to think about than 3 :).  with annotation, we need to alter the yaml structure to allow for annotations (post-run is currently a simple list of strings; we'd have to make it a list of [string or dict])  18:11
<corvus> i think if we did cleanup-playbook, maybe add cleanup playbooks before post playbooks at each level.  18:13
<corvus> like: pre-parent, pre-child, run, cleanup-child, post-child, cleanup-parent, post-parent  18:14
<corvus> the annotation approach would let you do that plus more options.  18:15
<corvus> i think either would work fine; it's just a matter of (a) whether we want the extra flexibility, and (b) whether we're going to end up needing annotations anyway in the future for some other change :)  18:16
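Neither shape existed at this point; a sketch of the two options being weighed, with made-up job and playbook names:

```yaml
# Option A (hypothetical): a dedicated cleanup attribute, ordered per
# corvus's sequence: pre, run, cleanup, then post at each level.
- job:
    name: build-images
    pre-run: playbooks/setup.yaml
    run: playbooks/build.yaml
    cleanup-run: playbooks/teardown.yaml  # would run even when aborted
    post-run: playbooks/logs.yaml

# Option B (hypothetical): post-run becomes a list of [string or dict],
# so individual post playbooks can be annotated:
- job:
    name: build-images
    post-run:
      - playbooks/logs.yaml
      - path: playbooks/teardown.yaml
        cleanup: true
```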
<mordred> corvus: I'm curious - why before post and not after post?  18:16
<pabelanger> I could see post-run always running regardless of aborts, and maybe expect users to use blocks with zuul_success; then we can add cleanup handler things into roles in the always section. But maybe too much work on the user side?  18:17
<tobiash> mordred: after post could be difficult if you deregister the build ssh key in the post  18:17
*** acozine1 has quit IRC  18:17
<mordred> oh. good point  18:17
<corvus> mordred: because in the simplest case of just one job level (no child inheritance), you'd probably want "upload logs" to be last, and that's the only way you could do that.  18:17
<tobiash> mordred: and you might want to have logs about the cleanup  18:17
<corvus> pabelanger: oh, yes, that's another option.  but we'd need to change a lot of existing jobs, i bet.  18:18
<clarkb> corvus: tobiash: could also have cleanup be exclusive to post  18:18
<clarkb> have one or the other  18:18
<mordred> still gotta deal with inheritance hierarchy though  18:18
<clarkb> ya, so parent pre, child pre, run, child cleanup, parent post type of deal?  18:19
* mordred has been convinced of pre-run-cleanup-post as a sequence  18:19
<clarkb> not sure that is easier or more clear  18:19
<tobiash> hrm, the annotation would make the inheritance easy  18:20
<corvus> clarkb: true, but i don't think that gains us much (and loses us the ability to have a job with both a cleanup and regular post playbook; granted, you can still do the same thing with conditionals, but you may have to build more logic into the playbook than otherwise)  18:20
<pabelanger> corvus: yah, some jobs today (want to say tox) are already using zuul_success for log collection, but agreed, we'd likely need some post-run cleanup  18:20
<corvus> pabelanger: yeah.  right now, for example, we're not uploading logs for aborted builds, just because the playbook isn't running.  18:21
*** jiapei has quit IRC  18:23
<corvus> tobiash: i lean ever so slightly towards the annotation idea, because it's more future proof, and because it keeps the pre/run/post sequence looking simple (but intuitively accommodates more complexity when needed)  18:23
<tobiash> corvus: shall we use storyboard for noting such ideas?  18:26
<tobiash> I find it hard to find them weeks later, buried deep in the backlog  18:26
<corvus> tobiash: i thought you were writing this tomorrow? :)  18:26
<tobiash> corvus: those were two ideas :)  18:27
<corvus> i think we'll find quickly that pausing jobs will require cleanup playbooks :)  18:27
<tobiash> yes, but technically they're independent of each other  18:27
<corvus> tobiash: storyboard is a fine place for such ideas, though the first half of this idea is in the container spec  18:29
<tobiash> corvus: I'll need pausing jobs now to avoid being killed in the next two weeks, and after I've survived I can volunteer to implement the cleanup if nobody else has taken that task  18:29
<corvus> tobiash: you don't need a cleanup for artifactory?  18:30
<tobiash> corvus: ok, I think that both may fit well into the container spec  18:30
<tobiash> corvus: we do that by annotating expiry dates on the artifacts and asynchronously deleting expired stuff  18:30
<corvus> tobiash: or are you going to do the thing described in the container spec and run a service on a node for the duration of the buildset?  18:30
<corvus> tobiash: so what are you going to use the pause for?  18:31
<tobiash> corvus: my plan is to leave a lightweight node with the first job running, serving the short-lived cache instead of artifactory  18:31
<corvus> ah, ok.  so yeah, that's the model described in the container spec, and i agree, it doesn't need cleanup (deleting the node is sufficient)  18:32
<tobiash> corvus: like a prepare-workspace job that gets the synced source and git-lfs data (several GB)  18:32
<tobiash> and all the other jobs get their data from that node and push their data to that node  18:32
<tobiash> then that's not hitting artifactory at all, and the network traffic is more decentral in the cloud and not targeted at a single load balancer  18:33
<corvus> having said all of that, fungi made a point earlier that we could engineer inter-job scratch space on the executors after we add executor affinity.  18:33
<corvus> but, i think in the long run, having both of these options will be good.  and 'pause' is probably both easier to implement, and also useful for the container work.  18:34
<mordred> ++  18:34
<tobiash> and in my case probably distributes the network traffic better  18:34
<fungi> yeah, i like the resource build idea anyway since it's more flexible  18:34
<mordred> I think the scratch space from pause seems more potentially scalable, since the space can grow with job nodesets as needed ... but I could also see executor scratch space with affinity being a thing too  18:35
<mordred> also - scratch space from pause will work soon (potentially) for tobias - and just be inefficient for openstack/multi-cloud scenarios until affinity is done  18:35
<corvus> (pause, i'll note, will benefit from *provider* affinity too, and that may be effectively required for the container use case in some environments, but isn't *strictly* required like executor affinity is for scratch space)  18:35
<fungi> the scratch space on executor model wins on simplicity but mostly only handles the one use case of dependent jobs sharing artifacts  18:35
<mordred> corvus: jinx  18:35
<corvus> mordred: :)  18:35
<fungi> and is also yet one more place to run into executor scaling issues  18:36
<fungi> if provider build affinity gets implemented then a resource build could theoretically handle sharing very large ephemeral artifacts between jobs with decent performance  18:37
<corvus> yep  18:37
<fungi> whereas executor scratch space would need to be tightly constrained to avoid creating a denial of service scenario  18:37
<fungi> (not just in terms of disk space but also bandwidth consumption)  18:38
<fungi> and to get similar network performance you'd need provider-specific executors, which is yet one more scaling axis to manage  18:39
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Add role for installing docker and configuring registry mirror  https://review.openstack.org/580730  18:39
<tobiash> ok, so the plan is to implement job pause, then cleanup, then provider affinity?  18:40
<fungi> and probably we could have roles available in the zuul-jobs stdlib to set up arbitrary storage during a resource build and pass around the necessary credentials, so in the end it _could_ be made just as easy as the scratch space on executor idea, i think  18:41
<corvus> tobiash, fungi: ++  18:46
<tobiash> fungi: the credentials (ssh key?) could also be passed via zuul_return to the child jobs  18:48
<tobiash> so ++ for zuul-jobs  18:48
<fungi> tobiash: yes, that's what i had in mind, just thinking we could orchestrate the handling of it via zuul_return in said role(s)  18:48
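zuul_return's data payload already becomes variables in child jobs, so the hand-off could look like this sketch; the variable names are illustrative, not an established convention.

```yaml
# Sketch: after setting up the cache service, publish its coordinates.
# cache_node_ip and cache_key would come from earlier tasks; the names
# here are made up for illustration.
- hosts: localhost
  tasks:
    - name: Hand the cache node's address and key to child jobs
      zuul_return:
        data:
          cache_host: "{{ cache_node_ip }}"
          cache_ssh_pubkey: "{{ cache_key }}"
```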
<tobiash> yes, good idea  18:49
<tobiash> corvus: regarding provider affinity, would you request that via an info in zuul_return (on demand) or in the project pipeline (static)?  18:54
* mordred would vote for project pipeline - so that zuul/nodepool would know at the beginning that they might need to allocate all the nodes for a job graph in the same provider (might be important to know from a capacity perspective)  18:56
<mordred> like, if parent (2 nodes) + 2 children (4 each) need a total of 10 nodes and one of the providers only has 8 nodes of total capacity - letting the parent schedule there then request affinity via zuul_return would potentially lead to a completely stuck situation  18:58
<mordred> but I'm just thinking out loud  18:58
<tobiash> mordred: hrm, that would require a credit-card-like request model: take 5 nodes now, but make sure you can fulfill 13 nodes  18:59
<corvus> tobiash, mordred: project-pipeline (static) means that we can provide the most information to nodepool.  whether we make use of it now or not is a separate question.  it would let us do the really sophisticated thing that mordred describes in the future if we want, but we can still do a simpler version (request child jobs in the same provider as the parent if they indicate they need it) as well.  19:27
<tobiash> corvus: just had a further idea: when having a prepare-workspace job that pauses, it might reduce load on the executor if we could tag a job to skip setting up non-playbook/roles repos  19:29
<tobiash> regarding provider affinity, I maybe would tag the jobs in the project pipeline with a provider group (an arbitrary user-choosable value) to indicate that this set of jobs needs to run on the same provider. With this information we could easily do the validation of whether the whole group could in theory be satisfied by the provider (the abs quota check in nodepool)  19:39
*** rcarrill1 has joined #zuul  19:49
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Don't write docker proxy config if docker_mirror undef  https://review.openstack.org/583010  19:50
*** rcarrillocruz has quit IRC  19:51
<mordred> Shrews: ^^ woot!  19:53
<mordred> Shrews: I now expect a bazillion patches to land all in a row  19:54
<Shrews> mordred: was that the dependency for the pbrx jobs?  19:54
<mordred> yup  19:56
<Shrews> oh, that was that one's parent, actually  19:56
<mordred> yeah  19:56
<mordred> that one there is really just a cleanup  19:57
*** sshnaidm|bbl has joined #zuul  20:00
<corvus> tobiash: could you, today, make a new base job which didn't copy the workspaces over, and inherit from that for jobs which shouldn't do that?  20:21
<pabelanger> there was a good idea from mordred about adding a new group into the inventory that was something like skip_git; then we update prepare-workspace to run on hosts: all,!skip_git and repos shouldn't get pushed. But I haven't tested that yet.  20:23
<pabelanger> but there is also a need for that workflow in rdoproject to help save some IO / time  20:23
<tobiash> corvus: we have such a base job, but I mean in this case we don't even have to prepare all repos on the executor  20:23
<tobiash> pabelanger: our base job just reacts to a skip_synchronization variable that can be set on a job or even parts of the nodes in a nodeset  20:25
<tobiash> pabelanger: you don't need groups for that  20:25
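A sketch of such a base-job guard; the variable name is tobiash's, the task wiring is an assumption:

```yaml
# In the base job's pre playbook (sketch): skip pushing the prepared
# repos when a job opts out via skip_synchronization.
- name: Copy prepared git repos to the test nodes
  include_role:
    name: prepare-workspace
  when: not (skip_synchronization | default(false))
```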
<openstackgerrit> Goutham Pacha Ravi proposed openstack-infra/zuul-jobs master: Attempt to copy the coverage report even if job fails  https://review.openstack.org/582690  20:27
<dmsimard> corvus: not entirely sure what that job runs, but it's probably worth considering sending some of that output to log files instead of stdout so that the job logs and the ara-report are manageable  20:47
<corvus> dmsimard: the job only emits output on test failure because the output is so big.  all the tests failing is a pathological case.  20:47
<dmsimard> oh, so it's not generally that big -- got it  20:48
<corvus> ya.  normal case is, say, 0-5 test failures :)  20:48
<corvus> (for channel context, this is in re ara's performance with a very large sqlite database)  20:49
*** hasharAway has quit IRC  20:58
<SpamapS> Oh interesting... a parent job that can say "I've done the things children might need" and then pause and wait for the children to finish. I like that, and the implementation would be pretty simple I think, since you could just use SIGSTOP/SIGCONT  21:07
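In process terms, SpamapS's suggestion amounts to something like this sketch (how the executor would wire it up is, of course, an open question):

```python
import os
import signal

def pause_build(playbook_pid: int) -> None:
    # Freeze the build's ansible-playbook process; it stops consuming
    # CPU but keeps all its state until resumed.
    os.kill(playbook_pid, signal.SIGSTOP)

def resume_build(playbook_pid: int) -> None:
    # Thaw the process once all child jobs have finished.
    os.kill(playbook_pid, signal.SIGCONT)
```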
<mordred> SpamapS: you're a SIGCONT  21:07
<SpamapS> Or a control socket or something else, I suppose.  21:07
<SpamapS> mordred: A feckless SIGCONT?  21:08
* SpamapS should probably have SIGSTOP'd himself there.  21:08
*** samccann has quit IRC  21:12
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Put ubuntu_gpg_key into defaults instead of vars  https://review.openstack.org/583047  21:17
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul master: Build container images using pbrx  https://review.openstack.org/580160  21:28
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul master: Specify a prefix for building the images  https://review.openstack.org/582396  21:28
*** ianw_pto is now known as ianw  21:44
<openstackgerrit> Merged openstack-infra/zuul master: Update bindep file with compile profiles  https://review.openstack.org/580159  21:50
<openstackgerrit> Merged openstack-infra/zuul master: Add alpine packages to bindep.txt  https://review.openstack.org/582276  21:58
*** harlowja has joined #zuul  21:59
*** jpena|off has quit IRC  22:33
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul master: Install less than alpine-sdk  https://review.openstack.org/583062  22:35
<tristanC> mordred: should i look into using a TenantName singleton service to query api/info and manage the zuul_api_root_url?  22:45
<tristanC> mordred: and update all the components to wait for the singleton service to be set up...  22:46
<tristanC> mordred: i was thinking we could have a tenant drop-down list, like the project list in horizon, where you could just switch tenants if many are available  22:46
<tristanC> hum, but that wouldn't work with the '/t/{tenant}/page.html' routing...  22:47
<tristanC> mordred: what do you think would be the easiest fix for the current ui issue?  22:48
<tristanC> well, i volunteer to fix that bug as it seems like a release blocker, but are there other blockers i can work on?  22:59
*** harlowja has quit IRC  23:03
<openstackgerrit> Ian Wienand proposed openstack-infra/zuul-jobs master: upload-logs: generate a script to download logs  https://review.openstack.org/581204  23:45
<openstackgerrit> Ian Wienand proposed openstack-infra/zuul-jobs master: upload-logs: generate a script to download logs  https://review.openstack.org/581204  23:55
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Put ubuntu_gpg_key into defaults instead of vars  https://review.openstack.org/583047  23:58
