Thursday, 2019-05-30

openstackgerritTristan Cacqueray proposed zuul/zuul master: test_v3: replace while loop with iterate_timeout  https://review.opendev.org/662112 00:29
*** ianychoi has quit IRC00:56
pabelangerclarkb: when you have time to review: https://review.opendev.org/661866/ 01:38
*** threestrands has joined #zuul01:56
openstackgerritTristan Cacqueray proposed zuul/zuul master: test_v3: replace while loop with iterate_timeout  https://review.opendev.org/662112 02:45
*** rlandy|bbl has quit IRC03:42
*** threestrands has quit IRC03:45
*** threestrands has joined #zuul04:05
*** threestrands has quit IRC04:06
*** raukadah is now known as chandankumar04:31
*** saneax has joined #zuul04:57
*** pcaruana has joined #zuul05:00
openstackgerritMark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/662134 06:02
*** bjackman has joined #zuul06:44
*** ianychoi has joined #zuul06:57
openstackgerritTristan Cacqueray proposed zuul/zuul master: executor: run cleanup playbook on stop  https://review.opendev.org/661881 07:27
openstackgerritTristan Cacqueray proposed zuul/zuul master: docs: add cleanup-run documentation  https://review.opendev.org/662147 07:27
*** jpena|off is now known as jpena07:36
*** toabctl has quit IRC07:50
openstackgerritAndriy Shevchenko proposed x/pbrx master: Update home-page  https://review.opendev.org/630132 08:44
*** bjackman has quit IRC08:45
*** bjackman has joined #zuul08:46
*** saneax has quit IRC08:47
*** panda is now known as panda|ruck09:17
openstackgerritSlawek Kaplonski proposed zuul/zuul-jobs master: Add role to fetch journal log from test node  https://review.opendev.org/643733 09:37
*** electrofelix has joined #zuul09:44
openstackgerritSlawek Kaplonski proposed zuul/zuul-jobs master: Add role to fetch journal log from test node  https://review.opendev.org/643733 10:42
*** bjackman_ has joined #zuul10:51
*** bjackman has quit IRC10:54
*** jpena is now known as jpena|lunch11:02
*** tosky has joined #zuul12:14
*** jpena|lunch is now known as jpena12:25
*** sshnaidm|off has quit IRC12:32
*** rlandy has joined #zuul12:33
*** sshnaidm has joined #zuul12:49
openstackgerritMark Meyer proposed zuul/zuul master: Build a slack integration  https://review.opendev.org/662208 13:11
*** ofosos has joined #zuul13:14
ofososThe Slack integration is very much WIP, it lacks docs, connection interface and probably a lot more. I'll polish until monday.13:15
pabelangerthere is some interest from ansible network folks on a slack reporter, so might take a peek13:16
ofososdon't, it's crap right now. We have a public holiday and I'm on-call and just hacked something together13:16
ofososIt currently lacks a reporter, it's a chat bot, that you can tell to run pipelines13:17
ofososBut the reporter is next, but not today :)13:17
ofososProbably on the weekend.13:17
pabelangerI can't comment on trigger option, I know there has been some discussion in the past for users to be able to more freely do that, but haven't been following the discussion13:18
pabelangerbut reporter is of some interest13:18
AJaegerofosos: check also https://review.opendev.org/536391 for a previous attempt and see discussion there13:20
*** pcaruana has quit IRC13:23
ofososAJaeger: thanks for the pointer13:23
ofososessentially I need both a trigger and a reporter, I want to run infrastructure repos in zuul13:24
fungias part of a team who runs infrastructure repos in zuul, i wonder what the desire for a manually-triggered pipeline is, unless the idea is to be able to drive things like maintenance activities with zuul and not have the build associated with a particular code event?13:35
AJaegereven those you can trigger with an "approve" of a change. A manually-triggered post pipeline (queue everything and release at certain time) comes to mind...13:36
*** bjackman_ has quit IRC13:36
fungiyeah, i mean you could trigger off approval of a code-reviewed maintenance plan in a planning repository or something, i suppose13:37
AJaegerexactly13:38
fungibut i can see the allure of starting the upgrade-firmware-on-all-the-ethernet-switches job on demand, when there is a sufficient critical mass of other sysadmins on hand to deal with any fallout... it's just zuul's model is around running sets of builds for events related to a git repository (even the periodic pipeline trigger expects a repository associated with any buildset)13:39
ofososHow is this different from the TimeTrigger implementation?13:43
ofososI think I basically worked off that piece of code13:43
fungiit would still be relative to the state of a particular repository and run a statically-defined set of builds, if it's like the timer trigger13:44
ofososWhat we want to do is specify the ideal state in a repo, gate it and then roll it out.13:44
ofososBut infrastructure has the tendency to degrade and we might need a way to manually trigger a deployment, without a code change13:45
fungiahh, and the "roll it out" part would be triggered by a human instead of the timer13:45
fungior instead of happening immediately after merging13:45
ofososKind of, the default is to roll it out as part of the gate13:45
ofososBut, if stuff breaks, we need to redeploy13:45
fungiahh, so for rerunning13:46
ofososSo we'll have a `deployment.yaml' which specifies the state that the system should be in13:46
fungithere is the zuul rpc command-line utility, which has an enqueue-ref subcommand13:46
ofososAnd rerunning will just pick up any drift and smooth it out13:46
fungiwe frequently rely on that for rerunning things... so i guess this interface would be similar?13:47
ofososWait...13:47
*** pcaruana has joined #zuul13:47
ofosos https://imgur.com/a/f6LCgAx 13:49
ofososWorks like this13:49
ofososIn this case the pipeline is just named check, but could be anything13:49
fungiso functionally similar to https://zuul-ci.org/docs/zuul/admin/client.html#enqueue-ref 13:49
ofososYes13:50
pabelangerofosos: I've had good success with promote pipeline and periodic pipeline. After gate, promote runs.13:50
pabelangerand if that fails for some reason, periodic will then run13:50
pabelangerand hopefully fix13:50
ofososWe don't do any magic parameters that are outside git, we just need a way to rerun things13:50
fungii'm guessing your "check" pipeline is ref-oriented and not change-oriented, or else running a buildset on a git head wouldn't be doable13:50
corvusif the jobs don't make too many assumptions about zuul.* vars, both might be okay13:51
fungimmm, good point13:52
ofososI'm just sitting in a wood workshop and fooling around, we don't have that intricate pipelines yet. I've to see how it works out tomorrow.13:53
ofososRight now it's just in a proof-of-concept state.13:53
fungiin our (opendev deployment) case the closest equivalent would probably be either using `zuul enqueue ...` on our promote pipeline or `zuul enqueue-ref ...` on our post pipeline to rerun (run a new buildset for) the jobs which originally ran after a change successfully merged13:54
corvusbut yeah, part of the refactor into triggers/sources/reporters in v3 was to accommodate this kind of abstraction -- so i think it should work out13:54
fungior, well, not necessarily the jobs which originally ran but the jobs which are configured to run currently, which is usually the same (but sometimes it's not, and sometimes that's also why we want to reenqueue them)13:55
fungiwell, anyway, my point was that the same could *probably* be accomplished by a chatbot which ran zuul rpc subcommands or implemented the same rpc client interfaces13:56
fungiwith the right sort of socket configuration and protections you could probably even put it on a separate machine from the scheduler13:57
fungi(might need a trivial proxy to go from a tcp socket to a named pipe)13:58
pabelangerfungi: I know a while back, maybe gozer folks, talked about zuul client being able to do remote rpc commands (so we didn't need to expose ssh to users).14:00
clarkbit uses gearman iirc14:05
corvusthere's a lot of opportunity for ux improvement with chat triggers/reporters -- if we add an irc bot, we could "recheck 661627" for example.14:06
corvusif you're interested in the user-accessed-rpc approach, see the web-admin api spec14:06
*** chandankumar is now known as raukadah14:13
SpamapSofosos:FYI, I have had the same desires as you for manual triggering, and I've found empty commits work better.14:14
SpamapSofosos:the only problem is using files matchers and such, which don't trigger, and I've often thought that a header like `Ignore-Matchers: files` in the commit message would be a nice feature anyway.14:15
ofososSpamapS: I'm not really taken by pushing an empty commit and opening a PR based on that. And I'm not sure what Bitbucket will do, when I try to open a PR with an empty commit.14:18
SpamapSofosos: GitHub and Gerrit handle it fine.14:18
SpamapSWorst case, you tack a line into a file, manual_runs.txt.14:18
SpamapSMake a script , like   `date >> manual_runs.txt && git add manual_runs.txt && git commit -m "manual_run by $USER" && git push origin manual-run-$USER-$(date +%Y%m%d%H%M%S) && bitbucket-client-open-thing`14:20
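[Editor's sketch, not from the channel.] The one-liner above could also be written as a small Python helper. This is a minimal sketch under stated assumptions: the function names are hypothetical, the `HEAD:refs/heads/...` refspec is an assumption chosen so the push works without first creating a local branch, and the final "open a PR" step (`bitbucket-client-open-thing` above is a placeholder) is left out because no real client was named.

```python
import datetime
import subprocess
from pathlib import Path


def record_manual_run(repo, user, now):
    """Append a timestamp line to manual_runs.txt; return the branch name to push."""
    stamp = now.strftime("%Y%m%d%H%M%S")
    with open(Path(repo) / "manual_runs.txt", "a", encoding="utf-8") as fh:
        fh.write(now.isoformat() + "\n")
    return f"manual-run-{user}-{stamp}"


def git_commands(user, branch):
    """Build the git invocations that record and publish the manual run."""
    return [
        ["git", "add", "manual_runs.txt"],
        ["git", "commit", "-m", f"manual_run by {user}"],
        # Assumption: push the current HEAD to a new remote branch.
        ["git", "push", "origin", f"HEAD:refs/heads/{branch}"],
    ]


def trigger(repo, user, now, run=subprocess.run):
    """Record the run, then execute each git command inside the repo."""
    branch = record_manual_run(repo, user, now)
    for argv in git_commands(user, branch):
        run(argv, cwd=repo, check=True)
    return branch
```

Calling `trigger(Path("."), "alice", datetime.datetime.now())` inside a checkout would leave one commit per manual run, which is the audit trail the discussion is after. The `run` parameter exists only so the git calls can be replaced in a dry run.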
SpamapSPoint being, it's actually *immensely* valuable to have *everything* you ever did linked to git.14:20
ofososHow do you handle versioning with this? I.e. I want to have well defined versions on master and ideally only roll those out.14:21
SpamapSEspecially if you have change management controls, the PR/review/whatever-bitbucket-calls-it becomes your paperwork.14:21
SpamapSofosos:tag the new commit?14:21
SpamapSI've actually given up on human-defined versions. My devs tag the repo, but everything is tied to the Zuul build UUID, which links to the git commit, so the versions are just a human-readable summary of important commits.14:23
ofososIf some piece of infra broke down, why would this justify having a new version of the software? With infrastructure repos this might seem ok, but if I have a joint repo for infra & software it looks foreign14:23
SpamapSIt's pretty common in practice to have "rebuild" versions of software.14:25
SpamapSBut, you can always have a repo that is just for triggering.14:25
corvusSpamapS: i think at this point we could all learn from a conference presentation on your build/deployment practices.  :)14:25
SpamapScorvus: :-D14:26
SpamapSI should probably submit an abstract for Shanghai eh? ;)14:26
SpamapSofosos: so yeah, one interesting thing you can do with Zuul is attach jobs to repos that aren't the main focus of the job. So you could have a deploy job that requires the 'manual-triggers' project, which is just for recording manual triggers.14:29
SpamapSAnd don't think I haven't tried Slack integration in a similar fashion. :)14:29
ofososSo how does the manual-triggers project look?14:30
ofososlike14:30
SpamapSWe carried an experimental patch on our Zuul at GoDaddy for a while. But ultimately, I found that git was still the better way to trigger, and I reverted the slack patch, and wrote a slack-notify role.14:30
ofososIs that slack-notify role somewhere available?14:31
SpamapSofosos: README and maybe a zuul.yaml.14:31
SpamapSofosos:it's been in review forever.. let me dig out the link14:31
SpamapS https://review.opendev.org/623594 14:31
SpamapSTesting it proved.. complicated. ;)14:32
SpamapSThough I think I mostly just needed to change the test slack to something random so we didn't accidentally migrate opendev to slack. ;)14:32
ofososManual triggers would be interesting, I'd like to loop in a manual trigger, since that might allow me to delegate credentials from the user that wants to run the pipeline to the build system.14:32
ofososThat would be a different approach to constraining the build job to purpose built user credentials.14:34
SpamapSIndeed!14:34
SpamapSofosos:happy to help you hash it out.. hopefully I've steered you in a happy direction. Have to run for a while.14:35
ofososHave a good run! I'll take care of my beechwood box and let the ideas percolate.14:36
*** zbr_ has joined #zuul15:06
pabelangerclarkb: corvus: tobiash: do you mind adding https://review.opendev.org/660856/ to your review pipeline, that is tristanC patch to skip file matcher on timer trigger pipelines. Would like to get your eyes on it please15:07
*** zbr has quit IRC15:09
ofososInterestingly this need arose, when I talked to our devs.15:12
ofososSpamapS: how do you deploy an older version?15:13
clarkbrevert probably15:15
SpamapSofosos: clarkb is correct. The HEAD is what we deploy. Always.15:17
clarkbthat ensures you have history of the rollback which is nice15:18
SpamapSExactly.15:18
SpamapSRollbacks are changes.15:18
pabelanger+115:18
ofososSounds good15:18
SpamapSI will say... the git->build->test->upload->deploy->test pipeline is too slow for prod, so there are hot-rollback procedures.15:18
SpamapSFor instance, if we can get back to a steady state by just rolling back a Kubernetes deployment, we do that.15:19
SpamapSBut for the most part, if it can wait 15 minutes, it goes through git.15:19
SpamapSLooking in to more automatic ways to do that, like Spinnaker's canary deploys.15:20
SpamapSAlso I realized yesterday our stack is Kubernetes, Ansible, Terraform, Zuul... so.. we herd KATZ15:20
pabelangerSpamapS: yah, I'm curious how often people use UI for k8s / openshift to scale up / down stuff15:20
pabelangerover say, gitops15:21
SpamapSpabelanger: scaling should be handled by the pod autoscaler and AWS autoscaling groups. In theory. ;)15:21
SpamapSThe plumbing on that may have a few "TODO" comments. ;)15:21
SpamapSBut in general, scaling should always be in response to real data.15:22
SpamapSOur git config just sets a baseline, which we try to make "10X more than normal traffic" if we can afford it.15:22
ofososHmm, we're planning on having an entire blue/green cycle inside the gate pipeline including checking Splunk and Datadog. It feels like this will run for quite some time, especially because we're doing multi-region deployments. Any better ideas?15:25
SpamapSofosos:That probably belongs in a promote style pipeline, not gate.15:26
SpamapSpromote generally is tied to close+merge events, so that git reflects your intended state at any given time.15:26
SpamapSThat's how we do it anyway. gate is for validating the proposed git state, and staging artifacts.15:27
ofososSpamapS: can you point me to an example of how this looks in zuul?15:27
SpamapSI think it could work to deploy in gate though. Haven't thought about that.15:27
pabelangeryah, agree with SpamapS, we've been doing promote too for production things15:27
pabelangerwould be awkward in gate, in case that change didn't merge properly15:28
ofososWhat happens if a deployment fails in promote?15:29
SpamapSofosos: http://paste.openstack.org/show/752305/ 15:29
SpamapSthat's our pipeline config15:30
SpamapSNote that we don't actually use the `post` pipeline anymore.15:30
SpamapSofosos: promote fails notify slack, and often trigger monitors before that. ;)15:31
SpamapSin theory they could comment on the PR too, but we don't do that.15:31
pabelanger https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L96 15:32
ofososBut then you're in a state where `master' (or whatever) will not result in a viable deployment.15:32
* SpamapS utterly failed at anonymizing that paste. :-P15:32
pabelangerthat is our promote, based on SpamapS one15:32
pabelangerwe do comment too15:33
pabelangerI find that helpful, in case promote doesn't work for some reason15:33
*** pcaruana has quit IRC15:33
SpamapSofosos:correct! But that's a 3-alarm fire, and generally we have to decide whether to revert or handle urgently.15:33
SpamapSif it happened in gate, we wouldn't actually have the state that resulted in the problem15:34
SpamapSSince a gate fail would just reset, and the next thing in the queue would start deploying.15:34
SpamapSand TBH, promote jobs don't always detect the failures15:34
SpamapSOur promote job fails to wait for all of the things it started to finish, for instance.15:35
SpamapSSo we have to fall back on monitoring to alert us to that fail.15:35
SpamapSThis is great btw15:35
SpamapSyou are all writing my talk for me.15:35
SpamapS;)15:35
pabelangeryah, I'd be interested to shadow your deployment for a day or so, to see how it all works :)15:36
corvusSpamapS: "so then ofosos asked '...' and i said '...' and then pabelanger was like '...'!"15:37
fungii want to say corvus did a conference presentation involving opendev's promote pipeline usage model in the past couple of weeks15:37
fungii can't recall where that was merged though15:37
SpamapSIt's pretty boring. We deploy like, 2 python API's, some 3rd-party stuff, a frontend website, and a bunch of AWS plumbing with terraform.15:37
corvusfungi: that was mostly focused on the k8s stuff, only incidental mention of zuul15:37
fungi(i think i approved the addition in git though, so shame on my fallible memory)15:37
SpamapScorvus: and he was like "shuuttt uuuup" and I was like "whaaatever". ;)15:37
fungicorvus: ahh, okay15:38
corvusSpamapS: totally krad talk.15:38
SpamapSbruh, do you even zuul?15:38
pabelangerSpamapS: corvus: I'd totally come to a talk about our irc discussions, and the solutions that came from them :)15:39
SpamapSThe more interesting work is where people keep reacting violently to Zuul's model and asking "what do we do when the build breaks?" ;)15:39
clarkbSpamapS: semi related elsewhere I saw comments about "lets just merge this because we can wait around to fix the zuul gate"15:39
SpamapSclarkb:can't?15:40
clarkber ya15:40
SpamapS:)15:40
clarkbbasically they didn't understand that you can't merge unless the gate passes15:40
clarkbit is a learning experience for many15:40
SpamapSYeah luckily our gate runs about 15 minutes, and we don't do clean-check, so I haven't had any "skip the gate" conversations as yet.15:40
pabelangerclarkb: oh, yah, that happened recently for us too. The hard part right now, is humans still have admin access to repos zuul runs on. It has been difficult asking them to stop doing that workflow15:41
SpamapSOddly enough I also haven't had any "hey it's amazing master always works" compliments yet. Ungrateful devs.15:41
SpamapSofosos: one thing I haven't mentioned yet. We deploy master to our staging environment, but we have a separate branch, called prod, that we use for production. The staging environment is used as a buffer in case there are things people want to visually verify, etc.15:42
ofososSpamapS: interesting detail15:43
SpamapSAnd lately I've had to yell at people to stop doing manual API testing there and write real tests for the gate. "SHIFT LEFT!" I scream into the void.15:43
ofososSpamapS: do you do canary or blue/green on any of the Terraform stuff?15:43
SpamapSNearly every failure we've had in deploy can be traced to things like "Wrong API key in prod config." or "Visual/Legal-review-needed detail missed in staging."15:44
pabelangerclarkb: could I get a review on https://review.opendev.org/661866/ wouldn't mind seeing if we could land that15:44
*** tosky has quit IRC15:44
SpamapSofosos:no, we just apply and slurp outputs.15:45
ofososWe're  in the CloudFormation 'rollback failed nightmare'-camp15:45
clarkbpabelanger: yes15:46
SpamapSI refuse to use CloudFormation. There are real reasons, and also personal reasons, for that. ;)15:46
clarkbpabelanger: see my question in #openstack-infra about ansible things if you have a moment too please :)15:46
ofososSo no updates, just re-creates. Despite that, we want to roll out VPCs and the like with this, so there are really not going to be updates15:46
* SpamapS shoots an appreciative but worried glance at the Heat dev team. ;)15:46
ofososIt's not a nice experience, I agree with that.15:47
SpamapSofosos: With terraform we do actually sometimes apply before deploy, and we're talking about storing the plan in git so we don't have surprises.15:47
SpamapSBut so far none of that has bit us.15:47
pabelangerclarkb: replied15:48
pabelangeralso, relocating network here is terrible15:48
SpamapSofosos: Terraform and CloudFormation are both very very powerful, with emergent behaviors if you don't rein them in.15:48
SpamapSIMO nobody should ever use CloudFormation now that Terraform exists.15:48
fungiSpamapS: nobody ever notices when things are always working. need to occasionally break stuff to get their attention ;)15:49
SpamapSWe plumb AWS[VPC, EC2, ELB, RDS] -> CloudFlare -> Kubernetes -> StatusCake all with terraform. I can't imagine how much bash/python/garbage code we'd have to write without it.15:49
clarkbpabelanger: I had put it on the backburner after the request for a test but now see shrews feels that isn't necessary (I think it wouldn't hurt to have one but also agree seems unnecessary)15:50
ofososI arrived at this company, with the impression that this discussion was already finalized and people still believe that terraform is somehow inferior15:50
clarkbSpamapS: do you ipv6 with elb? can you maybe help docker to do the same with docker hub :)15:51
SpamapSclarkb: nope, CloudFlare does all our ipv6.15:51
SpamapSBut AFAIK elbs always get an AAAA and an ipv6.15:52
fungii guess dockerhub needs a cdn15:52
clarkbdocker hub uses cloudflare to serve the fs layer objects15:52
SpamapSso if they're failing to republish the AAAA they're just lazy.15:52
fungihuh... i wonder why the dockerhub elb lacks aaaa then15:52
clarkbbut the index is served behind elb15:52
fungiyeah, could be the lazy15:52
SpamapSclarkb:weird, I wonder why they wouldn't want to CloudFlare the index.15:53
clarkbSpamapS: it leads to fun caching proxy rules15:53
clarkbcertainly would be easier for us if it was all at a single location15:53
SpamapShm I was wrong, you have to turn on ipv6 on classic ELB15:55
ofososHmm, I'm wondering on how to do a multi-region promote. Does that make sense to have the promote job run with a delay between regions?15:57
SpamapSofosos:I do a multi-region promote. I think at that point, it's just like any other automated deploy. If you need a delay, do a delay. We do them one after the other.15:59
ofosossounds reasonable15:59
ofososIf one of them fails, do you rollback all previous deploys?16:00
SpamapSBut I actually would really like to hand this all off to Spinnaker and use some of their awesome primitives.16:00
SpamapSofosos:no, we just explode and notify. Some of our stuff self-heals though. The kubernetes deploys for instance do a good job of detecting readiness and not destroying working pods.16:01
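[Editor's sketch, not from the channel.] The "one region after the other, optional delay, explode and notify on failure, no automatic rollback" approach described above might look roughly like the following. All names here are hypothetical: `deploy` and `notify` stand in for whatever the real promote job calls, and nothing in this sketch is Zuul API.

```python
import time


class PromoteFailed(Exception):
    """Raised when one region fails; records what had already been deployed."""

    def __init__(self, region, completed):
        super().__init__(f"promote failed in {region}")
        self.region = region
        self.completed = completed


def promote_regions(regions, deploy, notify, delay_seconds=0.0, sleep=time.sleep):
    """Deploy to each region in order; stop and notify on the first failure."""
    completed = []
    for index, region in enumerate(regions):
        # Optional pause between regions (never before the first one).
        if index and delay_seconds:
            sleep(delay_seconds)
        try:
            deploy(region)
        except Exception as exc:
            # No rollback of earlier regions: report and stop.
            notify(f"promote failed in {region}: {exc}")
            raise PromoteFailed(region, list(completed)) from exc
        completed.append(region)
    return completed
```

Earlier regions are deliberately left as they are when a later one fails, matching the "self-healing over auto-rollback" preference stated in the discussion. A rollback step would go in the `except` branch if that policy were wanted instead.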
SpamapSI'm focused more on self-healing than auto-rollback.16:02
SpamapSWhich is where I want to get Spinnaker canaries involved.16:02
*** rfolco has quit IRC16:06
fungiour situation is probably a lot different, but we're performing more and more full-stack tests by using our deployment ansible playbooks to deploy test copies of sections of the infrastructure in virtual machines and exercise it16:09
fungibefore we approve, heck before we even review those modifications16:09
*** rfolco has joined #zuul16:11
*** electrofelix has quit IRC16:16
openstackgerritMerged zuul/nodepool master: Add error handling when cleaning up resources  https://review.opendev.org/661866 16:18
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/661327 16:20
SpamapSfungi:that's exactly what we do too16:20
fungicool!16:21
fungiso it's not just us then16:21
SpamapScheck does what it can w/o secrets. gate does more with secrets. It's a pretty tight funnel, and most of what makes it through does what the programmer/automator intended.16:21
SpamapSFor instance, in check, we deploy all of our kubernetes things into a minikube.16:22
SpamapSIn gate, we can stick them into a real k8s cluster because now we have creds in secrets.16:22
fungiwe use fake credentials in check/gate and just stand up copies of the additional services we want the service we're testing to interact with16:22
*** electrofelix has joined #zuul16:22
SpamapSfungi:one big difference for us, is that we have ~20 3rd party API's to deal with, so we can't stand up fakes.16:22
SpamapSBut in gate, we have fake-ish account creds to run tests with.16:23
ofososHmm, hmm, hmm. The glue of my box is setting...16:23
fungibut yeah, that doesn't catch possible differences on long-running persistent systems, and we also eschew proprietary software/services16:23
fungiso our free software ideals help us out there16:23
SpamapSI swim in a sea of OPP (other peoples programs). ;)16:23
fungiwe do rely on lots of opp, it's just opflossp16:24
ofososI still like to have some knobs: I'd like to model my playbooks in a way that I can trigger a hot rollback in some easy/general way and I'd like a knob for passing credentials to the promote job. I think I need to build another box to mull this through :)16:25
SpamapSofosos: one way to model this in zuul is to make a fast-track pipeline. I did that at GoDaddy, where certain things would trigger things to run w/o long tests.16:35
SpamapSLike, a label on a PR, or a specific hot-fix branch.16:35
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/661327 16:39
ofososSpamapS: what do you think about 'a thing' that uses AWS cognito to log you into a site and then passes your role/permissions/credentials to a job to execute with? I think that would be fairly easy to realize.16:42
SpamapSofosos: I think that might be fine. I choose to run everything through git though. ;)16:46
SpamapSI am immune to eye rolls though.16:46
ofososYeah, but then you end up with a) a lot of roles that can do a lot of stuff, in sum allowing zuul to do everything; or b) outright allowing zuul to do everything. Both are kind of bad, IMO.16:50
ofosos This could be part of a promote pipeline, i.e. the pipeline requests credentials from a credential provider.16:51
SpamapSofosos:the roles aren't what enables things, the secrets are. And those should be tightly coupled to whatever lets people approve/merge changes.16:51
fungizuul's secrets model has had tons of thought put into its design specifically to allow these use cases, so that you don't need to have a separate secrets store for your jobs to authenticate to and fetch from16:52
ofososHaving fixed credentials is kind of bad. I'd prefer to operate with temporary credentials. Irrespective of how much brainpower people put into managing these fixed credentials.16:55
fungiahh, so you have some separate system create new credentials on the fly and authorize them in the relevant services and hand those to jobs and then revoke them when the build completes?16:56
fungii guess it just depends on where you put that trusted central authority. in our case zuul is our trusted central authority for such purposes16:57
ofososYep, that's right. But mostly they'll just time out.16:57
fungialso the job needs some way to authenticate to the credential broker, so the fixed credentials it uses to authenticate to the credential broker becomes the new authority, in effect16:59
fungii suppose it does though give you the ability to insta-revoke access from zuul jobs to all systems by just revoking the credentials it uses to interface with the credential broker, rather than needing to individually revoke various credentials which were in the job secrets17:03
ofososThe workflow would be: Job requests credentials, user logs into web ui and grants access with their credentials, job is passed a set of credentials which don't renew. Ideally the job could sign that request, so the user is presented with information about which system is requesting their permissions.17:03
ofososThis is for prod, for test, we'll likely use a more relaxed policy.17:04
fungido the user's credentials also time out? otherwise what's to prevent the system from caching/saving and reusing them?17:08
fungii suppose otp could work there17:08
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/661327 17:08
ofososWe'll just generate temporary credentials based on the users permission level.17:11
fungiand then the user enters those?17:15
fungior the user is entering durable credentials which the job then uses to obtain temporary credentials?17:15
ofososNope, the user is authenticated, we check his permissions and generate appropriate temporary credentials based on his permission level to pass on to the job.17:15
ofososThe user has to authenticate in some way. I think with our setup, this will likely be oauth.17:16
ofososThe `thing' (service) just makes sure that the job never has durable credentials.17:16
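[Editor's sketch, not from the channel.] The idea of a service that hands a job only temporary credentials derived from the user's permission level can be illustrated with a signed, expiring token. This is a toy under stated assumptions, not what ofosos built and not a replacement for a real token service such as the AWS STS mentioned later in the log: the token format, the function names and the shared `secret` are all hypothetical.

```python
import hashlib
import hmac
import time


def issue_token(secret, user, level, ttl, now=None):
    """Return a token naming the user and permission level that expires after ttl seconds."""
    if "|" in user or "|" in level:
        raise ValueError("user and level must not contain '|'")
    expires = int(now if now is not None else time.time()) + ttl
    payload = f"{user}|{level}|{expires}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def check_token(secret, token, now=None):
    """Return (user, level) if the token is authentic and unexpired, else None."""
    try:
        user, level, expires, sig = token.split("|")
        expires_at = int(expires)
    except ValueError:
        return None
    payload = f"{user}|{level}|{expires}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking signature prefixes.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    current = now if now is not None else time.time()
    if current >= expires_at:
        return None
    return user, level
```

The job only ever sees the token, which stops validating once `ttl` has elapsed; the durable `secret` stays with the issuing service. As fungi notes in the lines that follow, whatever the job uses to reach that service then becomes the new point of trust.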
ofososI'll try to build something on the weekend, so I can demo it. Maybe that'll be easier than just text. :)17:18
ofososI'll be afk for the rest of the day, need to enjoy the public holiday some more. Very enjoyable discussion :)17:19
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate builds with event id  https://review.opendev.org/658895 17:28
openstackgerritTobias Henkel proposed zuul/zuul master: Log github requests with annotated events  https://review.opendev.org/660800 17:28
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate logs around build completion and cancellation  https://review.opendev.org/660806 17:28
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate logs around build states  https://review.opendev.org/661489 17:28
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate logs around reporting  https://review.opendev.org/661490 17:28
openstackgerritTobias Henkel proposed zuul/zuul master: Annotate logs around finished builds  https://review.opendev.org/661491 17:28
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/661327 17:34
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Store autohold requests in zookeeper  https://review.opendev.org/661114 17:34
Shrewsi'm sad zuul tests seem to lack an equivalent of nodepool's CLI tests17:39
SpamapSofosos: I don't know why you'd want a user to be the gateway for credentials. We make policy the gateway. If a job has been granted permissions, it can go forward. We scope down when API's allow it, like Amazon's sts, where we make a token that only is valid for the life of the job, but a human doesn't do that, a trusted job does it.17:40
SpamapSAlso we only allow a narrow team of individuals to commit things to the prod branch, so if it's in prod, it has already had a human authorization.17:41
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/661327 17:44
corvusShrews: the rpcs are tested, and so far the client has been a thin enough layer on the rpcs that if they work, the cli should too.  that's in test_scheduler.py (of course, because they're old and everything is there) -- eg test_autohold17:47
corvusShrews: there is a test_client.py which seems to have one cli executable test17:47
*** electrofelix has quit IRC17:48
corvuscould probably combine the two to make a new test if you didn't feel the rpc-only test was sufficient17:48
Shrewscorvus: yep, i'm aware of that one. it lacks the framework for testing output though, similar to https://opendev.org/zuul/nodepool/src/branch/master/nodepool/tests/unit/test_commands.py 17:49
*** electrofelix has joined #zuul17:50
tobiashtristanC: I added some thoughts to https://review.opendev.org/590092 17:59
tobiashcorvus: I'd be curious what you think ^18:01
*** electrofelix has quit IRC18:03
corvustobiash: i think i agree about forwarding zuul_return.  i'm not sure about the rest right now.18:06
tobiashcorvus: thanks18:10
*** jpena is now known as jpena|off18:13
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/661327 18:17
*** pcaruana has joined #zuul18:24
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/661327 18:38
*** nickx-intel has joined #zuul18:54
nickx-intelhow do I inherit variables from main.yaml > main-task role > leaf-task role ?18:55
nickx-intelweird19:02
nickx-intelit's erroring because it's finding variablename in - name: but variablename isn't noted by {{}}19:03
nickx-intelcan't I escape variablename so that it doesn't try to parse - name: "stuff"19:03
pabelangernickx-intel: where is your variable?19:03
nickx-intelpabelanger, I have it declared in run.yaml variables19:04
pabelangernickx-intel: you can look to inventory file for it19:05
pabelangerit depends how you are setting it19:05
nickx-intelhmm19:05
pabelangerif you are using set_fact, they don't persist across ansible-playbook runs19:05
nickx-inteldoes run.yaml vars: use set_fact?19:06
nickx-intelimplicitly?19:06
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Store autohold requests in zookeeper  https://review.opendev.org/66111419:06
pabelangernickx-intel: what is run.yaml?19:06
pabelangeris that a pre-run / run / post-run playbook?19:07
pabelangerin your zuul job19:07
nickx-intelit's a run playbook19:07
pabelangernickx-intel: how are you setting the fact? It would only be set_fact, if you called that task19:08
pabelangerotherwise, if a zuul job variable, that will be stored in the inventory file19:08
nickx-intelpabelanger, setting implicitly? idk? it's not an explicitly defined variable assignment. like. it just does like this,19:10
nickx-intelvars:19:10
nickx-intel  key: value19:10
nickx-intel  key2: value219:10
pabelangeryah, if that is in your play, that should work19:10
nickx-inteldoes this implicitly call set_fact?19:10
pabelangerno19:10
pabelangernickx-intel: https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html explains all the fun with variables in ansible19:11
nickx-inteldo I need to call branch_role(vars) leaf_role(vars) or something?19:11
pabelangernope19:11
pabelangeryou should be able to call19:11
pabelangertask: shell: "echo {{ key }}"19:12
pabelangerand it works19:12
pabelangerin the same play19:12
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/66132719:13
nickx-intelI'm trying to implement include_role: leaf_role19:13
nickx-intelbut my leaf_role is dumb19:13
pabelangeryou might need to pass vars into include_role, see: https://docs.ansible.com/ansible/latest/modules/include_role_module.html19:14
pabelangeryou're likely hitting a scoping issue19:14
pabelangerPass variables to role example in link above19:14
nickx-intelyeah that's my apparent position pabelanger, vis branch_role(vars) leaf_role(vars) :)19:14
nickx-intelI'll dig more after lunch, I think this is sufficient, thank you pabelanger for confirming my suspicion19:15
nickx-intelI'll post my fix after I fix lol :)19:16
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/66132719:24
*** tosky has joined #zuul19:38
*** rlandy is now known as rlandy|brb19:39
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/66132719:52
*** rlandy|brb is now known as rlandy20:08
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/66132720:12
openstackgerritClark Boylan proposed zuul/zuul master: Update axios version and yarn.lock  https://review.opendev.org/66231620:16
clarkbtristanC: one thing I notice about my change at ^ is that it added integrity shas to the packages in my yarn.lock update but we don't seem to have those on the other locked packages20:18
clarkbtristanC: does that mean I did something wrong or will it add those optimistically?20:19
clarkbreading on the internet seems like older yarn didn't add those and newer yarn does. Maybe the version of yarn used to generate the existing lock file was older?20:27
clarkbseems like checking package hashes is a good thing so I don't think I'll try to undo it unless someone says we need to20:27
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/66132720:32
*** pcaruana has quit IRC21:18
pabelangertobiash: have you seen this error before with github? http://paste.openstack.org/show/752331/21:27
openstackgerritClark Boylan proposed zuul/zuul master: Update axios version and yarn.lock  https://review.opendev.org/66231621:27
openstackgerritClark Boylan proposed zuul/zuul master: Use nodejs v10 in testing  https://review.opendev.org/66233921:27
clarkbthe axios change failed and in debugging it I noticed we use nodejs 6, 8 and 10 in different jobs21:30
clarkbI've tried to make it nodejs 10 across the board in hopes that also fixes my axios problem21:30
clarkbbut I think we should use nodejs10 regardless21:30
fungilikely so21:31
clarkbok new nodejs doesn't fix the axios change issue21:38
* clarkb generates new yarn.lock from scratch21:38
clarkbafter rebuilding venv so that it has nodejs 10 in it21:38
fungiweb browsers are so last decade anyway21:40
pabelangertobiash: it looks like some PR reviews, don't have commit_id: https://api.github.com/repos/ansible/ansible/pulls/45469/reviews21:43
pabelangerbut I don't know why21:44
openstackgerritClark Boylan proposed zuul/zuul master: Update axios version and yarn.lock  https://review.opendev.org/66231621:45
pabelangerhttps://github.com/ansible/ansible/pull/45469/files/1feaf0f2df238cf6788c65c80f08e655891091f621:45
pabelangerlooks to be deleted?21:45
pabelangerjlk: when you have spare cycles, I'd be interested in what you think we need to do about pull reviews missing a commit_id, see pb above21:46
clarkbpabelanger: maybe that happens if you do a rebase and replace the old commits?21:46
clarkbgithub has in the past not been great about keeping that data around21:46
pabelangerclarkb: maybe21:47
clarkbit does keep diff contexts now but last I checked the commits are gone21:47
clarkband it is the first 2 comments that don't have it in this case which would fit under that I think21:47
pabelangerbut looks like we need to update github3.py, because https://github.com/sigmavirus24/github3.py/blob/master/src/github3/pulls.py#L961 is where it is failing21:47
pabelangernot sure what we should do in that case21:48
clarkbpabelanger: ya seems like it21:48
pabelangerI'm not even sure what we are using pullreviews for right now21:50
pabelangerI guess for pipeline trigger21:51
pabelangerso, in our case, we likely don't care about commit_id21:51
clarkbmaybe not? if you trigger approvals or rechecks and expect the commit id to identify what to test we might, but I think we always use the current state of the PR HEAD so it is similar to gerrit in that way21:53
clarkbok someone smarter than me will have to figure out the axios bump. The other change https://review.opendev.org/662339 is good to go I expect22:01
jlkpabelanger: My thought is that if it's missing a commit_id it gets discarded. But _also_ this looks like a bug in github3.py; always assuming there is a commit_id. That's probably my code.22:06
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: WIP: registry test job  https://review.opendev.org/66132722:21
pabelangerclarkb: jlk: my 2min fix: https://github.com/sigmavirus24/github3.py/pull/94422:27
pabelangerwill look at tests in a bit to see if we need to add coverage22:27
jlkalrighty. I think there might be, but again I wrote most of that so it's possible I didn't do it right.22:27
jlkI'm asking internally about this. It should be documented.22:27
pabelangercool, thanks22:27
clarkbpabelanger: we may need to update zuul to check for a None commit_id after that goes in but that is probably the right approach22:28
jlkYes, if our assumption is correct (a review for a commit that was force pushed out of the branch) then the proper thing to do is toss the review.22:29
pabelangerclarkb: yah, seems like a good idea22:29
pabelangerdoing that patch now22:30
openstackgerritPaul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete  https://review.opendev.org/66234722:35
pabelangerclarkb: jlk: believe that is what you are suggesting^22:35
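[Editor's sketch] The approach discussed above (toss any pull review that has no commit_id) could look roughly like the following. This is a minimal illustration only, not the code in the proposed Zuul change and not the github3.py API: the function name usable_reviews, the plain-dict review payloads, and the placeholder commit id are all assumptions made for the example.

```python
from typing import Any, Dict, List


def usable_reviews(reviews: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only reviews that still reference a commit.

    Hypothetical helper: a review whose commit_id is missing or None
    (for example, because the commit was force-pushed out of the
    branch) is discarded instead of raising an error.
    """
    return [r for r in reviews if r.get("commit_id") is not None]


# Made-up payloads standing in for GitHub review data.
reviews = [
    {"id": 1, "state": "APPROVED", "commit_id": None},   # commit_id is null
    {"id": 2, "state": "COMMENTED"},                      # key absent entirely
    {"id": 3, "state": "APPROVED", "commit_id": "abc123"},
]

kept = usable_reviews(reviews)
print([r["id"] for r in kept])
```

Using dict.get() covers both the "key is null" and "key is absent" cases, so the caller does not need to distinguish between them.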
openstackgerritPaul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete  https://review.opendev.org/66234722:37
pabelangerheh22:38
pabelangerzuul can't seem to merge depends-on on ^22:38
pabelangerlet me remove to confirm22:38
openstackgerritPaul Belanger proposed zuul/zuul master: Discard GitHub PullReview if incomplete  https://review.opendev.org/66234722:38
pabelangerHA22:45
pabelangerclarkb: zuul.o.o is getting the same commit_id exception on that review22:45
jlkodd22:46
pabelangerthat is why there is a merge failure it seems22:46
jlkoh, because it was trying to see if that referenced PR is mergeable yet, by looking at reviews?22:46
pabelangerthat is what I'm looking at now22:46
pabelangeroh, maybe it was another event from github22:48
pabelangerI'll have to dig later on zuul.o.o22:48
tristanCclarkb: that should be fine, perhaps you need to also bump (or remove) the yarn version from the lock/packages.json22:56
clarkbtristanC: it did bump it but is still failing23:00
tristanCclarkb: i meant in the packages.json, though i don't know if yarn should install itself or if it's safe to use a global one23:09
clarkbah23:11
tristanCclarkb: and it seems like the yarn.lock change bump versions for un-pinned dependencies like eslint-plugin-react (from 7.11 to 7.13)23:11
tristanCwhich may not be compatible with the pinned one like react-scripts 1.1423:11
tristanCclarkb: perhaps we should try to rebase on https://review.opendev.org/65999123:11
openstackgerritClark Boylan proposed zuul/zuul master: Update axios version and yarn.lock  https://review.opendev.org/66231623:13
clarkbis that what you mean about the yarn versions?23:13
clarkband ya wouldn't surprise me if we need to update other things and so basing it on that revert might be the way to go23:13
clarkbor update the revert to update axios23:13
*** ianychoi has quit IRC23:13
*** rlandy has quit IRC23:16
*** panda|ruck has quit IRC23:22
*** panda has joined #zuul23:23
*** tosky has quit IRC23:33
*** tjgresha has joined #zuul23:50
*** tjgresha has quit IRC23:55
*** tjgresha has joined #zuul23:55
