Monday, 2018-07-16

<openstackgerrit> Tristan Cacqueray proposed openstack-infra/nodepool master: zk: skip node already being deleted in cleanup leaked instance task  https://review.openstack.org/576288  00:09
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: job: add ansible-tags and ansible-skip-tags attribute  https://review.openstack.org/575672  00:10
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: angular: call enableProdMode  https://review.openstack.org/573494  00:13
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection  https://review.openstack.org/568216  00:15
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/pipelines route  https://review.openstack.org/541521  00:29
*** jiapei has joined #zuul  03:23
*** elyezer has quit IRC  04:24
*** elyezer has joined #zuul  04:26
<tobiash> tristanC: these functions now return (True, 'Reason') or (False, 'Reason') and both still evaluate to True and I think that's dangerous for a 'matches' function  05:24
<tobiash> tristanC: but you could wrap this in a class and overwrite the __bool__ function https://docs.python.org/3/reference/datamodel.html#object.__bool__  05:30
<tobiash> tristanC: like a FalseWithReason class  05:33
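A minimal sketch of the wrapper tobiash is describing: a non-empty tuple like (False, 'Reason') is always truthy, while an object overriding __bool__ can stay falsy and still carry its reason. The matches() helper below is hypothetical, not zuul's actual code.

```python
class FalseWithReason:
    """Falsy in boolean context, but still carries an explanation."""

    def __init__(self, reason):
        self.reason = reason

    def __bool__(self):
        # Unlike the tuple (False, 'Reason'), which is truthy because it
        # is a non-empty tuple, this object really evaluates to False.
        return False

    def __str__(self):
        return self.reason


def matches(branch):
    # Hypothetical matcher: return True, or a falsy value with a reason.
    if branch != 'master':
        return FalseWithReason('branch %s is not master' % branch)
    return True


result = matches('feature/foo')
if not result:
    print(result)  # -> branch feature/foo is not master
```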
*** elyezer has quit IRC  05:35
*** elyezer has joined #zuul  05:36
*** gtema has joined #zuul  05:53
*** bhavik1 has joined #zuul  06:35
<tristanC> tobiash: oh good idea, i'll update the review  06:35
*** pcaruana has joined #zuul  06:36
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: github: add event filter debug  https://review.openstack.org/580547  06:37
*** bhavik1 has quit IRC  06:40
*** hashar has joined #zuul  07:00
*** quiquell|off is now known as quiquell  07:04
*** elyezer has quit IRC  07:21
*** fbo|off is now known as fbo  07:23
*** elyezer has joined #zuul  07:24
*** elyezer has quit IRC  07:33
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/pipelines route  https://review.openstack.org/541521  07:33
*** elyezer has joined #zuul  07:36
*** jiapei has quit IRC  07:43
*** dmsimard has quit IRC  07:54
*** dmsimard has joined #zuul  07:55
<tobiash> yay, tested that caching patch in prod now and freeze time of one of our bigger jobs went down from 400ms before to 9ms now :)  08:09
<tristanC> 98% increased performance :-)  08:20
<openstackgerrit> Markus Hosch proposed openstack-infra/zuul master: Reduce number of reconfigurations on branch delete  https://review.openstack.org/580967  09:10
<openstackgerrit> Markus Hosch proposed openstack-infra/zuul master: Per-branch management of unparsed config cache  https://review.openstack.org/582897  09:10
*** electrofelix has joined #zuul  09:14
<openstackgerrit> Markus Hosch proposed openstack-infra/zuul master: Per-branch management of unparsed config cache  https://review.openstack.org/582897  09:19
*** sambetts_ is now known as sambetts  09:25
*** electrofelix has quit IRC  09:59
*** electrofelix has joined #zuul  10:11
*** rcarrill1 is now known as rcarrillocruz  10:16
*** elyezer has quit IRC  10:23
*** elyezer has joined #zuul  10:24
*** electrofelix has quit IRC  10:34
<quiquell> tristanC: zuul.project.src_dir is the place to generate stuff in a job?  10:50
<tristanC> quiquell: you need to fetch to zuul.executor.log_root  10:51
<quiquell> tristanC: is this directory also the directory on log.o.o?  10:55
<quiquell> tristanC: I have tried zuul.executor.work_root but I don't have permissions there  10:56
<tristanC> quiquell: the upload-logs role executed by the base post playbook will export the files in executor.log_root to logs.o.o  11:00
<quiquell> tristanC: Ok, thanks!  11:01
<quiquell> tristanC: And zuul.executor.work_root, we cannot write there.  11:02
<tristanC> quiquell: you're welcome. no you can't write to work_root, but log_root should be fine  11:05
<quiquell> tristanC: Ok, is there any way we can instruct upload-logs to ignore some dirs?  11:06
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: docs: add job's logs guide  https://review.openstack.org/582921  11:12
<tristanC> quiquell: https://review.openstack.org/582921 is a copy of https://softwarefactory-project.io/docs/user/zuul_user.html#export-logs-artifacts-to-the-logserver which contains more detailed instructions  11:12
<tristanC> not sure logstash_processor_config is available to untrusted-projects in zuul.openstack.org  11:13
<quiquell> tristanC: So it looks like zuul.project.src_dir is the correct place to generate stuff  11:15
<quiquell> tristanC: Then copy whatever should be logged to log_root  11:15
<quiquell> Is that correct?  11:15
<tristanC> oh right, you can prepare artifacts in zuul.project.src_dir before doing the fetch to zuul.executor.log_root  11:16
<quiquell> tristanC: And after that just copy it to log_root  11:16
<quiquell> ok  11:17
<tristanC> yes, zuul.project is on the ephemeral test instance, zuul.executor is on the executor, e.g. localhost  11:17
<quiquell> so zuul.project.src_dir is like the workspace for the build  11:17
<quiquell> and also the place for the cloned project  11:17
<tristanC> you can also create a dir at "{{ ansible_env.HOME }}/workspace" on the test node to get a clean workspace  11:20
<quiquell> tristanC: Yep, we were doing so, but looking for an already created place  11:21
<quiquell> tristanC: Do you see any problem with using src_dir?  11:21
<tristanC> quiquell: beware that zuul.project.src_dir is relative to ansible_env.HOME  11:22
*** elyezer has quit IRC  11:22
<tristanC> quiquell: src_dir should work, but make sure you don't conflict with existing files, for example in case there is already a "build" or "logs" directory in the project, or as a result of the test  11:22
<tristanC> otherwise you'll fetch and export extra bits  11:22
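A sketch of the pattern tristanC describes, as a post-run playbook; the build/ artifact directory is an illustrative assumption, and the synchronize pull mirrors what the zuul-jobs fetch roles do.

```yaml
# Post-run playbook sketch (assumed paths; adapt to the job's layout).
# Runs against the test node and pulls artifacts back to the executor's
# log_root, from which the base job's upload role publishes to logs.o.o.
- hosts: all
  tasks:
    - name: Collect artifacts generated in the project checkout
      synchronize:
        mode: pull
        src: "{{ ansible_env.HOME }}/{{ zuul.project.src_dir }}/build/"
        dest: "{{ zuul.executor.log_root }}/artifacts/"
```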
<quiquell> tristanC: That's a good one, will go back to 'workspace' dir  11:23
<quiquell> tristanC: Thanks  11:23
*** elyezer has joined #zuul  11:23
<tristanC> quiquell: you're welcome :-)  11:23
*** electrofelix has joined #zuul  11:23
*** GonZo2000 has joined #zuul  11:30
*** GonZo2000 has quit IRC  11:30
*** GonZo2000 has joined #zuul  11:30
*** gtema has quit IRC  11:32
*** sshnaidm is now known as sshnaidm|rover  11:43
*** quiquell is now known as quiquell|lunch  12:12
*** rlandy has joined #zuul  12:23
*** quiquell|lunch is now known as quiquell  12:37
*** elyezer has quit IRC  12:44
*** samccann has joined #zuul  12:50
*** elyezer has joined #zuul  12:57
<fungi> note that having an explicit workspace the job is allowed to write to on the job nodes is a bit of a jenkinsism inherited in zuul v2 which we got rid of in v3. as discussed in #openstack-infra the job can write anywhere on a node you like as long as the usual posix filesystem permissions are taken into account (so if you want the "zuul" user to write in /opt/mydir you need to use root permissions in the job to create and chown/chmod it accordingly)  13:04
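For instance, a pre-run task handling those permissions (a sketch; /opt/mydir is fungi's placeholder and the task name is made up):

```yaml
# Pre-run task sketch: create a directory outside $HOME that the
# unprivileged "zuul" user can write to during the job.
- name: Create /opt/mydir and hand it to the zuul user
  become: true
  file:
    path: /opt/mydir
    state: directory
    owner: zuul
    group: zuul
    mode: "0755"
```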
*** gtema has joined #zuul  13:05
<fungi> there _is_ a "workspace" on the executors because they're shared by lots of builds and tightly restricted as to where they allow writes, but you shouldn't write files explicitly on the executor unless you know you need to do that (e.g., a nodeless executor-only job)  13:06
*** electrofelix has quit IRC  13:07
*** gtema has quit IRC  13:08
<mordred> Shrews: I made a pbrx patch for nodepool too https://review.openstack.org/#/c/582732/  13:29
<mordred> tristanC: I'm gonna try to get something pushed up this morning, but if I don't, I'll hand it off to you for sure  13:29
*** rcarrill1 has joined #zuul  13:30
<Shrews> cool  13:30
*** rcarrillocruz has quit IRC  13:31
*** rcarrill1 is now known as rcarrillocruz  13:33
<Shrews> hrm, i'm not sure what's happening with the nodepool-functional-py35-src job: http://logs.openstack.org/32/582732/2/check/nodepool-functional-py35-src/7c024da/job-output.txt.gz#_2018-07-15_13_58_14_680051  13:41
<Shrews> python3 should be the version in use, so that's weird  13:41
*** acozine1 has joined #zuul  13:42
<Shrews> but obviously it's not  :/  13:44
<openstackgerrit> Merged openstack-infra/zuul-jobs master: ara-report: add missing ara_report_run check  https://review.openstack.org/577675  13:52
*** sshnaidm|rover is now known as sshnaidm|afk  13:54
<pabelanger> Shrews: looks like a broken dependency for diskimage-builder  13:55
<pabelanger> guess something dropped 2.7 support  13:56
<Shrews> oh, so maybe yesterday's release of astroid breaks dib  13:59
<Shrews> i thought dib was also using py3 but i guess not?  14:00
<Shrews> oh, i think dhellmann already has a fix out  14:04
*** openstackgerrit has quit IRC  14:04
<Shrews> maybe not  14:07
*** weshay is now known as weshay_mtg  14:30
*** quiquell is now known as quiquell|off  14:32
*** sshnaidm|afk is now known as sshnaidm|rover  14:47
<corvus> tristanC, tobiash: any thoughts about my email question regarding building container images?  14:55
<pabelanger> I still need to read up on it, but hope to reply this afternoon  14:57
*** mhu is now known as mhu|He-Man  15:00
*** mhu|He-Man is now known as mhu  15:01
<clarkb> corvus: I need to write a response to that, tl;dr is dib can already build container images (it might not be the best/most useful tool for that but it is possible iirc) so it might be more a question of uploading images than building them?  15:02
*** pcaruana has quit IRC  15:02
*** jiapei has joined #zuul  15:03
<corvus> clarkb: yeah, i meant to ask less about "how" and more about "whether we should, regardless of mechanism"  15:03
<corvus> clarkb: i agree, if we decide we want to, we should consider dib  15:04
<pabelanger> Yah, some optimization for DIB and containers would be recommended, but it works pretty well.  15:05
<pabelanger> a while back I created an ubuntu-rootfs element to help with container builds: https://review.openstack.org/413115/ needs to be rebased  15:07
<mordred> corvus: thanks for the reminder - I just replied to the list  15:08
<tobiash> corvus: sorry, I've been in firefighting mode for quite some time, will read later this evening  15:09
*** weshay_mtg is now known as weshay  15:13
<Shrews> i've read mordred's words and find myself agreeing with him (crazy, i know) and cannot think of convincing arguments in the other direction  15:21
<corvus> yeah, mordred and jhesketh both make compelling arguments :)  15:23
<corvus> Shrews: also, yes, if you agree with mordred you are by definition crazy  15:23
<corvus> i frequently agree with mordred, for what that's worth :)  15:23
<Shrews> now i agree with corvus, which makes me some sort of exponential of crazy  15:25
<mordred> you're all nuts  15:27
<corvus> i agree  15:27
<mordred> speaking of nuts - if anyone wants to review an ansible role: https://review.openstack.org/#/c/580730/  15:27
<mordred> I should recheck the depends-on patches for that just to make sure...  15:28
<mordred> corvus: (also, as you may have already seen, I did a 'build images of nodepool' patch as well  15:29
<Shrews> speaking of crazy... mordred, i sort of want to land your nodepool openstacksdk change. we ready for that?  15:29
<mordred> Shrews: yes. we should totally be ready for that stack  15:29
<Shrews> well, i think it's the last one  15:29
<Shrews> anyone else want to review that change? https://review.openstack.org/572829  15:30
<Shrews> already have two +2's, but since it's a rather large deal, others might be interested  15:31
<Shrews> mordred: just did a 'check experimental' on it, fwiw  15:32
<mordred> cool  15:34
*** ssbarnea1 has joined #zuul  15:39
<ssbarnea1> https://review.openstack.org/#/c/570546/ anyone? one liner.  15:40
*** samccann has quit IRC  15:44
*** samccann has joined #zuul  15:45
*** samccann has quit IRC  15:46
*** samccann has joined #zuul  15:46
*** openstackgerrit has joined #zuul  15:57
<openstackgerrit> Fabien Boucher proposed openstack-infra/zuul master: Add tenant yaml validation option to zuul client  https://review.openstack.org/574265  15:57
*** sshnaidm|rover is now known as sshnaidm|bbl  16:28
*** sshnaidm|bbl has quit IRC  16:31
<tobiash> fbo: added a question to 574265  16:36
<tobiash> corvus: what do you think about ^ ?  16:36
<pabelanger> looking for some job design help, hopefully this will be clear. In rdoproject.org, we have a project (rdoinfo) that potentially needs to have required-projects of 300 packaging repos (eg: openstack/nova-distgit). Today, I don't believe https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.name can be a regex; any concerns about making it one?  16:40
<pabelanger> The 2nd part of the design: we actually don't really need to push the 300 projects using prepare-workspace to the node; only if rdoinfo has the depends-on header do we actually care about using that project in required-projects to then build an rpm. trying to best think of how to make the rsync more dynamic. Could I use zuul.items here?  16:42
<clarkb> pabelanger: maybe update zuul to have depends-on populate required projects?  16:50
<clarkb> could be a job attribute (I think it may do this in some cases already though?)  16:50
*** sambetts is now known as sambetts|afk  16:51
*** gtema has joined #zuul  16:54
<pabelanger> I see, if zuul is able to find the project, but it's not in required-projects, auto add it?  16:54
<pabelanger> when using depends-on  16:54
<clarkb> yeah, an implicit required-project mode based on depends-on  16:55
<pabelanger> yah, that would actually work well here. the other concern was, if we added 300 required-projects, how do we keep it in sync when we add new distgit packages, aside from writing check jobs to help validation  16:58
<fungi> could have some nasty side effects if parts of the job rely on required-projects to inform them of what versions of things to install (think tox-siblings) since it could result in deadlocking  16:58
<fungi> so we should make it non-default, or have a second required-projects-like var which they go into, or just document that you shouldn't rely on required-projects to only be influenced by explicit static job configuration  16:59
<fungi> leaning toward something like a required-projects-implicit list, and then have zuul merge and check those out on disk but keep the vars distinct so you know which were required by configuration and which by dynamic dependencies  17:01
<tobiash> corvus: do we already have a concept for buildset resources (nodes) or buildset lifetime of nodes?  17:01
<pabelanger> Yah, I can see that. I don't actually mind if we allow regex for required-projects and then write some check job to deal with sync issues too  17:02
<pabelanger> the 300 projects would be ^openstack/*-distgit  17:02
<tobiash> corvus: our projects are starting to kill our artifactory with cached stuff which should be handed over to the next child job  17:02
<clarkb> pabelanger: my immediate thought on the repo sync is that we should always sync the required projects. Which is why I'm trying to come up with some way of influencing that list rather than special casing behavior around syncing  17:03
<clarkb> pabelanger: if you have asserted a repo is required then it should be synced  17:03
<tobiash> buildset resources or optional buildset lifetimes of nodes could be a good way to solve this use case much better than using a centralized hosting service  17:03
<pabelanger> clarkb: yah, I think if we added some logic or a new role to only push projects in zuul.items for this job, it might be something that worked today  17:04
<clarkb> tobiash: something similar came up in the k8s discussion. A way to keep a build around for the lifetime of all other builds (so that it could host a registry iirc)  17:04
*** gtema has quit IRC  17:04
<pabelanger> if I understand zuul.items correctly  17:04
*** hashar is now known as hasharAway  17:04
<pabelanger> tobiash: yah, rdoproject just pushes to a central log server today for artifacts, so really looking forward to the container spec to help remove that dependency  17:05
<fungi> tobiash: at one point we had discussed providing (limited) scratch space on executors readable by other builds, arranging so dependent jobs all get their builds for a particular changeset run from the same executor, and expiring the shared scratch space when the last build in a dependent job set terminates  17:05
<tobiash> clarkb: yes, I remember this. Did this reach a point where I could start to implement such a functionality?  17:05
<clarkb> pabelanger: it will still have to update 300 repos for that job though; you are only optimizing half of the problem (the sync from executor to the other test nodes)  17:06
<fungi> oh, but the long-lived build also makes sense. especially if it can come with some sort of provider affinity  17:06
<clarkb> tobiash: I think the spec may have been updated to mention it, but unsure if anything more has been done  17:06
<pabelanger> clarkb: agree, downside to that  17:07
<fungi> build node provider affinity might also be easier to satisfy than provider-pinned executors  17:07
<fungi> and easier to scale  17:07
<pabelanger> clarkb: however, likely step 1 here is to solve the issue how we can today without zuul changes, then work to maybe implement what we discussed here today  17:08
<fungi> we already have something similar for multi-node build provider affinity anyway  17:08
<clarkb> pabelanger: ya, the downside is it means you have to support and deprecate that functionality in the repo syncing role  17:09
<openstackgerrit> Goutham Pacha Ravi proposed openstack-infra/zuul-jobs master: Attempt to copy the coverage report even if job fails  https://review.openstack.org/582690  17:09
<pabelanger> clarkb: yup, and the job is broken today, so working to land a new feature (assuming we want to do it) might just be the best path here  17:11
<pabelanger> I'd like to hear from mordred / corvus too  17:11
<tobiash> clarkb, fungi: ah yes, it was in the container spec  17:13
<logan-> pabelanger: i'm confused why required-projects is being used for the repos that should only be cloned/synced when depends-on is present  17:13
<logan-> those will get cloned/synced due to the depends-on even if they are not in required projects  17:14
<clarkb> logan-: I thought we might do that but wasn't sure.  17:16
<tobiash> So I think buildset lifetime and provider affinity could be solved one after the other  17:16
<pabelanger> I also think I am leaving out a key piece of information: we also need to set up the https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout setting, because these projects don't have a master branch: https://review.rdoproject.org/r/13330/  17:17
<pabelanger> we get the following error: ERROR Project review.rdoproject.org/openstack/networking-cisco-distgit does not have the default branch master  17:17
<tobiash> provider affinity in nodepool might be as easy as: if a provider is requested, all other providers just decline the request  17:17
<corvus> wow, are we talking about two really complicated issues at the same time?  17:17
<fungi> so it seems  17:17
<tobiash> oh sorry, I'll defer my thoughts  17:18
<corvus> okay, i'm going to need like 10 minutes to untangle scrollback  17:18
<fungi> calls for a scrollback deinterlacer  17:20
<corvus> i'm doing it. i'll have an etherpad in a minute  17:20
<corvus> https://etherpad.openstack.org/p/ajP8DUX02S  17:23
<corvus> first time i've ever had to do that :)  17:23
<corvus> okay, i'm going to read the conversation about required-projects first  17:23
<clarkb> I'm sure slack would've solved that for us right? >_>  17:24
<corvus> /kick clarkb  17:25
<mordred> clarkb: wasn't chromakode working on a threaded chat client at one point?  17:25
<clarkb> mordred: ya, I think he was involved with some startup that was going to fix chat problems. Like the other dozen startups all doing that :)  17:25
<clarkb> tl;dr it is a hard problem  17:26
<mordred> yup  17:27
<corvus> okay, i've read the required-projects chat, and my understanding is: i agree with logan-: depends-on should cause the repo to show up on disk, but pabelanger says that we *also* need a specific branch of that repo checked out.  pabelanger, what branch do you want to have checked out?  presumably these repos don't have the branch of the change (otherwise it would have checked them out).  is the issue that you need a specific branch checked out, or that those projects need a default branch so that the job checks *something* out and the error goes away?  17:30
<corvus> pabelanger: try setting "default-branch" on the projects in question in their project stanzas.  you can do this in a config-project.  you can even do it with a regex project matcher, like "name: .*-distgit" so you don't have to type it 300 times.   https://zuul-ci.org/docs/zuul/user/config.html#attr-project.default-branch  17:33
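A sketch of the stanza corvus suggests, assuming rpm-master as the default branch (per the error pabelanger quoted above):

```yaml
# In a config-project (sketch): one regex-matched stanza instead of 300.
- project:
    name: .*-distgit
    default-branch: rpm-master
```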
<pabelanger> corvus: we'd need a specific branch checked out; in the case of the error it is rpm-master, but we'd also need queens-rdo, pike-rdo. I am unsure of the history here for rdoproject, why the branches are different  17:34
<pabelanger> corvus: ack, I can test that  17:34
<corvus> pabelanger: well, that doesn't apply if you need different branches checked out  17:34
<pabelanger> yes, I was looking to fix the master error first, but rdoproject does development on the other branches too  17:35
<corvus> pabelanger: if you have a project (-distgit) with branches that correspond to an equal set of branches in another project, zuul can only handle that if they have the same name.  17:36
<corvus> pabelanger: what i would do is recognize the advantages of having corresponding branches of different projects have the same name, and use that system going forward so that zuul and humans can both do the intuitive thing.  pike == pike.  :)  17:37
<corvus> pabelanger: but, if you need to map different branch names to each other, then i'd recommend setting the default-branch attribute to get rid of the error; then you can always manually check out the right branch in a pre-playbook.  17:37
<corvus> pabelanger: so you can say "if zuul.branch == pike, checkout pike-rdo"  17:38
<logan-> would https://zuul-ci.org/docs/zuul/user/config.html#attr-pragma be useful to bridge the branch name issue?  17:38
<corvus> pabelanger: you can even iterate over all of the non-required projects by iterating over zuul.projects and looking for required=false.  those will be depends-on projects (or, possibly, the project of the change under test)  17:38
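Put together as a pre-run task, the mapping might look like this sketch; the branch-name convention and paths are assumptions, and it presumes the dict form of zuul.projects whose entries carry src_dir and required:

```yaml
# Pre-run sketch: for repos zuul prepared only because of Depends-On
# (required == false), check out the -rdo branch matching the change.
- name: Check out mapped branch in depends-on projects
  command: git checkout "{{ zuul.branch }}-rdo"
  args:
    chdir: "{{ ansible_env.HOME }}/{{ item.src_dir }}"
  loop: "{{ zuul.projects.values() | selectattr('required', 'equalto', false) | list }}"
```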
<pabelanger> okay, thanks. Let me get the error resolved first using default-branch, and confirm if other branches have an issue  17:38
<pabelanger> I don't really know yet  17:39
<corvus> logan-: it can do so for job definitions (ie, jobs on pike should apply to changes on pike-rdo).  and it may have a place here just for that, depending on what the job definitions look like.  but i don't think it's a complete solution to the problem --  17:39
<corvus> logan-: because it won't cause the pike-rdo versions of the repos to be checked out for a pike change  17:39
<logan-> ah  17:40
<corvus> logan-: *however*, if one were in a situation where you had branch variants of jobs and could specify required-projects for every branch, you could construct a mapping like that.  so the pike job runs on changes to pike, and changes to pike-rdo, and its required-projects say: check out nova@pike and nova-distgit@pike-rdo.  but that only works for things added to required-projects.  17:41
<corvus> so it's really close, but the 300 projects thing throws a wrench in it here :)  17:42
<corvus> pabelanger: okay, sounds like you've got the next one or two steps to take, yeah?  17:43
<pabelanger> I do, thanks for the help  17:44
<corvus> ok, i'll read the buildset conversation now :)  17:44
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Don't write docker proxy config if docker_mirror undef  https://review.openstack.org/583010  17:47
<corvus> tobiash, clarkb: yeah, i think we can start with having jobs zuul_return something from the main playbook which says "i completed successfully, keep me running until my child jobs are finished, then run my post playbook".  that's what i was getting at in the container spec, and it should allow us to implement all kinds of things based on a create, wait, cleanup pattern.  17:48
<corvus> provider affinity makes that better, but isn't required for a first pass, and shouldn't interfere with the initial implementation.  17:49
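Since the feature is only being designed in this exchange, any syntax is hypothetical; one plausible shape, placed at the end of the run playbook:

```yaml
# Hypothetical zuul_return usage (this API did not exist at the time of
# this discussion): signal "keep me running until my children finish".
- name: Pause this job until all child jobs have completed
  zuul_return:
    data:
      zuul:
        pause: true
```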
<tobiash> corvus: ah, so that would keep the job alive instead of just the node  17:49
<tobiash> corvus: that's an interesting idea  17:49
*** GonZo2000 has quit IRC  17:49
<corvus> tobiash: yep.  that makes things like "create an object store container; child jobs use it; delete container" easy to do.  17:50
<corvus> that could be a zero-node job  17:50
<tobiash> ok, I was more on the track of adding a leave-me-alive annotation to a node in a buildset, but your idea seems to be more powerful  17:51
<corvus> this *also* touches on SpamapS's idea of a cleanup job -- a job which runs after all other jobs are finished.  but i think the idea of suspending then resuming the parent job may be able to accomplish the same things, and more, so may be the better approach.  17:51
<corvus> tobiash: to be fair, i think what i wrote in the spec suggests more of what you describe.  i think i, erm, may have forgotten to write down this newly revised version.  :)  17:52
<tobiash> corvus: so with that zuul_return info, would you want to pause just at the end of the playbook that called this, or just at the end of the run playbook?  17:53
<corvus> tobiash: i think only at the end of the run playbook?  i think it will be difficult to figure out what to do if a job pauses twice, so if we say we can only do it after the run playbook, that makes things clearer?  17:54
<corvus> though, i guess we could say that we only pause the first time.  17:55
<tobiash> corvus: depends, we could also say pause will be done just once,  17:55
<corvus> whether that happens in a pre/run/post doesn't really matter, i guess...  17:55
<tobiash> corvus: except you want to attach the success status to this signal  17:56
<pabelanger> just catching up, +1 for zuul_return + keep running  17:56
<tobiash> corvus: but I don't think we need to attach the success status to this signal, as that job still can just use zuul_return to forward any variable to the child jobs  17:57
<tobiash> corvus: so I have a slight preference to just let it pause once, regardless at which playbook  17:58
<tobiash> (but I also can live with end of run playbook)  17:58
<tobiash> detail question: would we add a new metric to the executor stats (jobs starting, running, paused)?  17:59
<corvus> tobiash: well, we still decide whether to run child jobs based on the result of the parent.  right now, that's the final result.  we'd be talking about moving that to either the run playbook, or any playbook.  so if we allow it on any, then we can end up in a situation where the pre_playbook pauses with success and child jobs run, then the run playbook fails, so the job result switches to failure.  that's weird, but, i guess okay?  only allowing this from the run playbook avoids that situation.  18:00
<openstackgerrit> Adam Harwell proposed openstack-infra/zuul-jobs master: Make revoke-sudo work on base cloud-init images  https://review.openstack.org/564674  18:01
<tobiash> corvus: good point  18:01
<corvus> tobiash: a new metric sounds like a fine idea  18:01
<corvus> (we should also expose the job state in the api and the status page)  18:01
* mordred likes pause-at-end-of-run  18:02
<tobiash> uhm, more work, but yes, definitely  18:02
<tobiash> corvus: so with your point we should do pause only at end of run  18:02
<corvus> my gut says only do this for the run playbook.  it's clear and should be sufficient.  i think if we decide we want to allow it at other playbooks later, we make that change later.  18:02
<corvus> (if a use case comes up that pause-at-end-of-run can't handle)  18:03
<tobiash> however the first job still could switch to post failure  18:03
<tobiash> but I guess that's ok then in this case  18:03
<pabelanger> does the parent unpause once child jobs are finished?  18:03
<corvus> yep.  i feel like that's a fairly minor change.  18:03
<corvus> pabelanger: yes  18:04
<pabelanger> and we'd pause the parent watchdog too, I assume  18:04
<tobiash> corvus: so for a start, automatic unpause if all recursive children are finished?  18:05
<corvus> someday, someone is going to ask to be able to unpause before child jobs are finished.  i'm sure we'll be able to accommodate that then.  but until then, let's keep it simple and just unpause when all child jobs are finished.  18:05
<corvus> tobiash: exactly  18:05
<tobiash> corvus: that use case probably could be added easily also via zuul_return  18:05
<corvus> tobiash: ya  18:05
<tobiash> (later)  18:05
<pabelanger> yah, I don't see a need to unpause for the rdoproject use case  18:05
<tobiash> so I guess I will start tomorrow with that  18:06
<corvus> pabelanger: good question -- should the parent job timeout be paused?  i think probably so.  18:06
<mordred> I think so too  18:06
<tobiash> because we'll probably kill our artifactory within the next two weeks without this feature :(  18:06
<corvus> tobiash: great, when i next update the spec, i'll clarify this section to match :)  18:06
<pabelanger> ++  18:07
* mordred is excited about this  18:07
<pabelanger> exciting  18:07
<pabelanger> mordred: YES  18:07
<tobiash> mordred: excited about killing artifactory? ;)  18:07
<corvus> one more detail: i think currently aborted jobs don't run post-playbooks  18:07
<corvus> so you could end up creating containers but not cleaning them up  18:08
<tobiash> corvus: yes, they just stop  18:08
<tobiash> corvus: do you think for the container use case we need an 'always run this post playbook' annotation in the job?  18:09
<clarkb> tobiash: corvus: maybe a new cleanup-playbook: specifier  18:10
<tobiash> or that  18:10
<tobiash> definitely better than a post playbook annotation  18:10
<corvus> yeah, i think one of those would be useful.  18:10
<corvus> with cleanup-playbook, we need to define the nesting order (it has 4 dimensions now, which is harder to think about than 3 :).  with annotation, we need to alter the yaml structure to allow for annotations (post-run is currently a simple list of strings; we'd have to make it a list of [string or dict])  18:11
<corvus> i think if we did cleanup-playbook, maybe add cleanup playbooks before post playbooks at each level.  18:13
<corvus> like: pre-parent, pre-child, run, cleanup-child, post-child, cleanup-parent, post-parent  18:14
<corvus> the annotation approach would let you do that plus more options.  18:15
<corvus> i think either would work fine; it's just a matter of (a) whether we want the extra flexibility, and (b) whether we're going to end up needing annotations anyway in the future for some other change :)  18:16
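Neither shape existed at this point; a sketch of the two options being weighed, with made-up job and playbook names:

```yaml
# Option A (hypothetical): a dedicated cleanup attribute, ordered per
# corvus's sequence: pre, run, cleanup, then post at each level.
- job:
    name: build-images
    pre-run: playbooks/setup.yaml
    run: playbooks/build.yaml
    cleanup-run: playbooks/teardown.yaml  # would run even when aborted
    post-run: playbooks/logs.yaml

# Option B (hypothetical): post-run becomes a list of [string or dict],
# so individual post playbooks can be annotated:
- job:
    name: build-images
    post-run:
      - playbooks/logs.yaml
      - path: playbooks/teardown.yaml
        cleanup: true
```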
<mordred> corvus: I'm curious - why before post and not after post?  18:16
<pabelanger> I could see post-run always running regardless of aborts, and maybe expect users to use blocks with zuul_success; then we can add cleanup handler things into roles in the always section. But maybe too much work on the user side?  18:17
<tobiash> mordred: after post could be difficult if you deregister the build ssh key in the post  18:17
*** acozine1 has quit IRC  18:17
<mordred> oh. good point  18:17
<corvus> mordred: because in the simplest case of just one job level (no child inheritance), you'd probably want "upload logs" to be last, and that's the only way you could do that.  18:17
<tobiash> mordred: and you might want to have logs about the cleanup  18:17
<corvus> pabelanger: oh, yes, that's another option.  but we'd need to change a lot of existing jobs, i bet.  18:18
<clarkb> corvus: tobiash: could also have cleanup be exclusive to post  18:18
<clarkb> have one or the other  18:18
<mordred> still gotta deal with inheritance hierarchy though  18:18
<clarkb> ya, so parent pre, child pre, run, child cleanup, parent post type of deal?  18:19
* mordred has been convinced of pre-run-cleanup-post as a sequence  18:19
<clarkb> not sure that is easier or more clear  18:19
<tobiash> hrm, the annotation would make the inheritance easy  18:20
<corvus> clarkb: true, but i don't think that gains us much (and loses us the ability to have a job with both a cleanup and regular post playbook; granted, you can still do the same thing with conditionals, but you may have to build more logic into the playbook than otherwise)  18:20
<pabelanger> corvus: yah, some jobs today (want to say tox) are already using zuul_success for log collection, but agreed, we'd likely need some post-run cleanup  18:20
<corvus> pabelanger: yeah.  right now, for example, we're not uploading logs for aborted builds, just because the playbook isn't running.  18:21
*** jiapei has quit IRC  18:23
<corvus> tobiash: i lean ever so slightly towards the annotation idea, because it's more future proof, and because it keeps the pre/run/post sequence looking simple (but intuitively accommodates more complexity when needed)  18:23
<tobiash> corvus: shall we use storyboard for noting such ideas?  18:26
<tobiash> I find it hard to find them weeks later, buried deep in the backlog  18:26
<corvus> tobiash: i thought you were writing this tomorrow? :)  18:26
<tobiash> corvus: those were two ideas :)  18:27
<corvus> i think we'll find quickly that pausing jobs will require cleanup playbooks :)  18:27
<tobiash> yes, but technically they're independent of each other  18:27
<corvus> tobiash: storyboard is a fine place for such ideas, though the first half of this idea is in the container spec  18:29
<tobiash> corvus: I'll need pausing jobs now to avoid being killed in the next two weeks, and after I've survived I can volunteer to implement the cleanup if nobody else has taken that task  18:29
<corvus> tobiash: you don't need a cleanup for artifactory?  18:30
<tobiash> corvus: ok, I think that both may fit well into the container spec  18:30
<tobiash> corvus: we do that by annotating expiry dates on the artifacts and asynchronously deleting expired stuff  18:30
<corvus> tobiash: or are you going to do the thing described in the container spec and run a service on a node for the duration of the buildset?  18:30
<corvus> tobiash: so what are you going to use the pause for?  18:31
<tobiash> corvus: my plan is to leave a lightweight node with the first job running, serving the short-lived cache instead of artifactory  18:31
<corvus> ah, ok.  so yeah, that's the model described in the container spec, and i agree, it doesn't need cleanup (deleting the node is sufficient)  18:32
<tobiash> corvus: like a prepare-workspace job that gets the synced source and git-lfs data (several GB)  18:32
<tobiash> and all the other jobs get their data from that node and push their data to that node  18:32
<tobiash> then that's not hitting artifactory at all, and the network traffic is more decentral in the cloud and not targeted at a single load balancer  18:33
<corvus> having said all of that, fungi made a point earlier that we could engineer inter-job scratch space on the executors after we add executor affinity.  18:33
<corvus> but, i think in the long run, having both of these options will be good.  and 'pause' is probably both easier to implement, and also useful for the container work.  18:34
<mordred> ++  18:34
<tobiash> and in my case probably distributes the network traffic better  18:34
<fungi> yeah, i like the resource build idea anyway since it's more flexible  18:34
<mordred> I think the scratch space from pause seems more potentially scalable, since the space can grow with job nodesets as needed ... but I could also see executor scratch space with affinity being a thing too  18:35
<mordred> also - scratch space from pause will work soon (potentially) for tobias - and just be inefficient for openstack/multi-cloud scenarios until affinity is done  18:35
<corvus> (pause, i'll note, will benefit from *provider* affinity too, and that may be effectively required for the container use case in some environments, but isn't *strictly* required like executor affinity is for scratch space)  18:35
<fungi> the scratch space on executor model wins on simplicity but mostly only handles the one use case of dependent jobs sharing artifacts  18:35
<mordred> corvus: jinx  18:35
<corvus> mordred: :)  18:35
<fungi> and is also yet one more place to run into executor scaling issues  18:36
<fungi> if provider build affinity gets implemented then a resource build could theoretically handle sharing very large ephemeral artifacts between jobs with decent performance  18:37
<corvus> yep  18:37
<fungi> whereas executor scratch space would need to be tightly constrained to avoid creating a denial of service scenario  18:37
<fungi> (not just in terms of disk space but also bandwidth consumption)  18:38
<fungi> and to get similar network performance you'd need provider-specific executors, which is yet one more scaling axis to manage  18:39
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Add role for installing docker and configuring registry mirror  https://review.openstack.org/580730  18:39
<tobiash> ok, so the plan is to implement job pause, then cleanup, then provider affinity?  18:40
<fungi> and probably we could have roles available in the zuul-jobs stdlib to set up arbitrary storage during a resource build and pass around the necessary credentials, so in the end it _could_ be made just as easy as the scratch space on executor idea, i think  18:41
<corvus> tobiash, fungi: ++  18:46
<tobiash> fungi: the credentials (ssh key?) could also be passed via zuul_return to the child jobs  18:48
<tobiash> so ++ for zuul-jobs  18:48
<fungi> tobiash: yes, that's what i had in mind, just thinking we could orchestrate the handling of it via zuul_return in said role(s)  18:48
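zuul_return's data payload already becomes variables in child jobs, so the hand-off could look like this sketch; the variable names are illustrative, not an established convention.

```yaml
# Sketch: after setting up the cache service, publish its coordinates.
# cache_node_ip and cache_key would come from earlier tasks; the names
# here are made up for illustration.
- hosts: localhost
  tasks:
    - name: Hand the cache node's address and key to child jobs
      zuul_return:
        data:
          cache_host: "{{ cache_node_ip }}"
          cache_ssh_pubkey: "{{ cache_key }}"
```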
<tobiash> yes, good idea  18:49
<tobiash> corvus: regarding provider affinity, would you request that via an info in zuul_return (on demand) or in the project pipeline (static)?  18:54
* mordred would vote for project pipeline - so that zuul/nodepool would know at the beginning that they might need to allocate all the nodes for a job graph in the same provider (might be important to know from a capacity perspective)  18:56
<mordred> like, if parent (2 nodes) + 2 children (4 each) need a total of 10 nodes and one of the providers only has 8 nodes of total capacity - letting the parent schedule there then request affinity via zuul_return would potentially lead to a completely stuck situation  18:58
<mordred> but I'm just thinking out loud  18:58
<tobiash> mordred: hrm, that would require a credit-card-like request model: take 5 nodes now, but make sure you can fulfill 13 nodes  18:59
<corvus> tobiash, mordred: project-pipeline (static) means that we can provide the most information to nodepool.  whether we make use of it now or not is a separate question.  it would let us do the really sophisticated thing that mordred describes in the future if we want, but we can still do a simpler version (request child jobs in the same provider as the parent if they indicate they need it) as well.  19:27
<tobiash> corvus: just had a further idea: when having a prepare-workspace job that pauses, it might reduce load on the executor if we could tag a job to skip setting up non-playbook/roles repos  19:29
<tobiash> regarding provider affinity, I maybe would tag the jobs in the project pipeline with a provider group (an arbitrary user-choosable value) to indicate that this set of jobs needs to run on the same provider. With this information we could easily do the validation of whether the whole group could in theory be satisfied by the provider (the abs quota check in nodepool)  19:39
*** rcarrill1 has joined #zuul  19:49
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Don't write docker proxy config if docker_mirror undef  https://review.openstack.org/583010  19:50
*** rcarrillocruz has quit IRC  19:51
<mordred> Shrews: ^^ woot!  19:53
<mordred> Shrews: I now expect a bazillion patches to land all in a row  19:54
<Shrews> mordred: was that the dependency for the pbrx jobs?  19:54
<mordred> yup  19:56
<Shrews> oh, that was that one's parent, actually  19:56
<mordred> yeah  19:56
<mordred> that one there is really just a cleanup  19:57
*** sshnaidm|bbl has joined #zuul  20:00
<corvus> tobiash: could you, today, make a new base job which didn't copy the workspaces over, and inherit from that for jobs which shouldn't do that?  20:21
<pabelanger> there was a good idea from mordred about adding a new group into the inventory that was something like skip_git; then we update prepare-workspace to run on hosts: all,!skip_git and repos shouldn't get pushed. But I haven't tested that yet.  20:23
<pabelanger> but there is also a need for that workflow in rdoproject to help save some IO / time  20:23
<tobiash> corvus: we have such a base job, but I mean in this case we don't even have to prepare all repos on the executor  20:23
<tobiash> pabelanger: our base job just reacts to a skip_synchronization variable that can be set on a job or even parts of the nodes in a nodeset  20:25
<tobiash> pabelanger: you don't need groups for that  20:25
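A sketch of such a base-job guard; the variable name is tobiash's, the task wiring is an assumption:

```yaml
# In the base job's pre playbook (sketch): skip pushing the prepared
# repos when a job opts out via skip_synchronization.
- name: Copy prepared git repos to the test nodes
  include_role:
    name: prepare-workspace
  when: not (skip_synchronization | default(false))
```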
<openstackgerrit> Goutham Pacha Ravi proposed openstack-infra/zuul-jobs master: Attempt to copy the coverage report even if job fails  https://review.openstack.org/582690  20:27
<dmsimard> corvus: not entirely sure what that job runs, but it's probably worth considering sending some of that output to log files instead of stdout so that the job logs and the ara-report are manageable  20:47
<corvus> dmsimard: the job only emits output on test failure because the output is so big.  all the tests failing is a pathological case.  20:47
<dmsimard> oh, so it's not generally that big -- got it  20:48
<corvus> ya.  normal case is, say, 0-5 test failures :)  20:48
<corvus> (for channel context, this is in re ara's performance with a very large sqlite database)  20:49
*** hasharAway has quit IRC  20:58
<SpamapS> Oh interesting... a parent job that can say "I've done the things children might need" and then pause and wait for the children to finish. I like that, and the implementation would be pretty simple I think, since you could just use SIGSTOP/SIGCONT  21:07
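In process terms, SpamapS's suggestion amounts to something like this sketch (how the executor would wire it up is, of course, an open question):

```python
import os
import signal

def pause_build(playbook_pid: int) -> None:
    # Freeze the build's ansible-playbook process; it stops consuming
    # CPU but keeps all its state until resumed.
    os.kill(playbook_pid, signal.SIGSTOP)

def resume_build(playbook_pid: int) -> None:
    # Thaw the process once all child jobs have finished.
    os.kill(playbook_pid, signal.SIGCONT)
```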
<mordred> SpamapS: you're a SIGCONT  21:07
<SpamapS> Or a control socket or something else, I suppose.  21:07
<SpamapS> mordred: A feckless SIGCONT?  21:08
* SpamapS should probably have SIGSTOP'd himself there.  21:08
*** samccann has quit IRC  21:12
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Put ubuntu_gpg_key into defaults instead of vars  https://review.openstack.org/583047  21:17
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul master: Build container images using pbrx  https://review.openstack.org/580160  21:28
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul master: Specify a prefix for building the images  https://review.openstack.org/582396  21:28
*** ianw_pto is now known as ianw  21:44
<openstackgerrit> Merged openstack-infra/zuul master: Update bindep file with compile profiles  https://review.openstack.org/580159  21:50
<openstackgerrit> Merged openstack-infra/zuul master: Add alpine packages to bindep.txt  https://review.openstack.org/582276  21:58
*** harlowja has joined #zuul  21:59
*** jpena|off has quit IRC  22:33
<openstackgerrit> Monty Taylor proposed openstack-infra/zuul master: Install less than alpine-sdk  https://review.openstack.org/583062  22:35
<tristanC> mordred: should i look into using a TenantName singleton service to query api/info and manage the zuul_api_root_url?  22:45
<tristanC> mordred: and update all the components to wait for the singleton service to be set up...  22:46
<tristanC> mordred: i was thinking we could have a tenant drop-down list, like the project list in horizon, where you could just switch tenants if many are available  22:46
<tristanC> hum, but that wouldn't work with the '/t/{tenant}/page.html' routing...  22:47
<tristanC> mordred: what do you think would be the easiest fix for the current ui issue?  22:48
<tristanC> well, i volunteer to fix that bug as it seems like a release blocker, but are there other blockers i can work on?  22:59
*** harlowja has quit IRC  23:03
<openstackgerrit> Ian Wienand proposed openstack-infra/zuul-jobs master: upload-logs: generate a script to download logs  https://review.openstack.org/581204  23:45
<openstackgerrit> Ian Wienand proposed openstack-infra/zuul-jobs master: upload-logs: generate a script to download logs  https://review.openstack.org/581204  23:55
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Put ubuntu_gpg_key into defaults instead of vars  https://review.openstack.org/583047  23:58
