Tuesday, 2021-09-28

-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] Make QueueItem a Zookeeper object https://review.opendev.org/c/zuul/zuul/+/809414 08:07
-@gerrit:opendev.org- Simon Westphahl proposed: 11:11
- [zuul/zuul] Store pipeline state in Zookeeper https://review.opendev.org/c/zuul/zuul/+/810658
- [zuul/zuul] Store change queues in Zookeeper https://review.opendev.org/c/zuul/zuul/+/810920
@spamaps:spamaps.ems.host > <@jim:acmegating.com> zuul-operator would also be an option of course, but it doesn't have the gcp service account special sauce 13:45
The zuul operator is not an option for me. We don't get root on the cluster. I can run a privileged container with a PR to a repo that allows it, but I can't get CRDs installed without a massive hoop jumping exercise.
@spamaps:spamaps.ems.host And honestly, I reject the paradigm entirely. 13:45
@jim:acmegating.com different strokes for different folks :) 13:51
-@gerrit:opendev.org- Simon Westphahl proposed: 13:52
- [zuul/zuul] Store change queues in Zookeeper https://review.opendev.org/c/zuul/zuul/+/810920
- [zuul/zuul] Save and restore bundle with item in Zookeeper https://review.opendev.org/c/zuul/zuul/+/811422
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] Save and restore bundle with item in Zookeeper https://review.opendev.org/c/zuul/zuul/+/811422 14:18
@clarkb:matrix.org > <@jim:acmegating.com> opendev is running master (well, as of a few commits ago), and so far the event queue is looking good. 15:07
Is the next step pushing a new zuul release?
@fungicide:matrix.org > <@iwienand:matrix.org> fungi: that all seems fine.  no need to spend our limited attention spans on xenial now.  i think the stack is all ready to go? 15:27
ianw: yeah, i think we can merge them unless you feel like we need to announce that anywhere, https://review.opendev.org/810299 is also related to that and needs reviews (though it's not a blocker for the rest, just updating docs in order to better align user expectations)
@jim:acmegating.com Clark: i see some errors in opendev's scheduler log i'd like to look into 15:47
@clarkb:matrix.org oh good idea. 15:48
@tristanc_:matrix.org spamaps: would you mind explaining (in a nutshell) what is the paradigm issue with the operator? 15:48
@tristanc_:matrix.org corvus: would it be possible to bypass the CRD and use the operator container standalone? e.g. write the CRD spec in a config map and make the operator run in some sort of one-shot mode 15:50
@jim:acmegating.com tristanC: theoretically, and it doesn't even really need to be one-shot.  but the operator would do a lot of extra work to filter out events from all the config maps.  however, it's already doing that work since it tracks config files that are stored as configmaps (since we didn't make those custom resources).  but honestly, it seems like the wrong direction -- it feels like extra work to accommodate a local policy decision, and CRDs seem much more like the right way to use k8s (we should seriously consider making our config files CRs as well, then the operator can do less work).  but i sort of interpreted spamaps's objection as objecting to the idea of an operator process taking control of deployment versus building up a deployment by hand from first principles.  i guess we'll see what he says.  :) 15:56
@jim:acmegating.com tristanC: sorry it's secrets, not configmaps, that the operator watches.  so here's a function that's called every time a secret is updated: https://opendev.org/zuul/zuul-operator/src/branch/master/zuul_operator/operator.py#L71 15:58
@tristanc_:matrix.org corvus: i agree CRDs seem to be the way to go, but I wonder if (and why) they would be an issue (besides the required privilege to install them) 16:00
@jim:acmegating.com yeah, i think that would be an interesting thing to learn too. 16:01
@spamaps:spamaps.ems.host > <@tristanc_:matrix.org> spamaps: would you mind explaining (in a nutshell) what is the paradigm issue with the operator? 16:02
I may be behind in my K8S-fu, maybe it's better, but personally I think operators complicate operations by running at the wrong layer. I don't want my k8s cluster managing *anything* outside of pods, services, ingresses, networks, etc. Not everything maps to a convergent model where one puts a yaml somewhere and waits for it to realize itself. I much prefer the terraform flow, where APIs are contacted by a unified orchestrator that lives outside of the production runtime.
@tristanc_:matrix.org spamaps: so would it be ok for you to run the zuul-operator container locally to deploy on a remote cluster, or you want to use an existing tool like terraform? 16:05
@jim:acmegating.com Clark, swest: https://paste.opendev.org/show/809655/ are 2 errors that look significant enough we should try to address them asap.  i'll get started on that. 16:07
@spamaps:spamaps.ems.host > <@tristanc_:matrix.org> spamaps: so would it be ok for you to run the zuul-operator container locally to deploy on a remote cluster, or you want to use an existing tool like terraform? 16:08
I don't know what benefits said operator container would offer me. Terraform gives me something I always want in operations: intended state, and actual state. The diff between intended state and actual state allows calculating the change. That becomes very useful the more complex something becomes.
@jim:acmegating.com (they are persistent, since they abort queue processing and keep those items in the queue) 16:08
@spamaps:spamaps.ems.host > <@spamaps:spamaps.ems.host> I don't know what benefits said operator container would offer me. Terraform gives me something I always want in operations: intended state, and actual state. The diff between intended state and actual state allows calculating the change. That becomes very useful the more complex something becomes. 16:09
It also gives me a bridge to other APIs, so if I want to pull information out of the k8s resources and ship them into a firewall, that's a very natural thing in Terraform, and very much a black box coding exercise otherwise.
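[Editor's illustration] A minimal sketch of the "intended state versus actual state" diff described above, assuming both states are plain dictionaries keyed by resource name. The `plan` function and the resource names are hypothetical; this is not Terraform's algorithm or API, only the general idea of calculating a change set from two states.

```python
def plan(intended: dict, actual: dict) -> dict:
    """Compute the changes needed to move `actual` to `intended`."""
    # Resources that are wanted but do not exist yet.
    create = {k: v for k, v in intended.items() if k not in actual}
    # Resources that exist but are no longer wanted.
    delete = {k: v for k, v in actual.items() if k not in intended}
    # Resources present in both states whose settings differ: (old, new).
    update = {
        k: (actual[k], intended[k])
        for k in intended.keys() & actual.keys()
        if intended[k] != actual[k]
    }
    return {"create": create, "update": update, "delete": delete}


# Hypothetical example states.
intended = {"zuul-scheduler": {"replicas": 1}, "zuul-web": {"replicas": 2}}
actual = {"zuul-web": {"replicas": 1}, "old-merger": {"replicas": 1}}
changes = plan(intended, actual)
```

Here `changes` lists one resource to create, one to update and one to delete, which is the kind of reviewable change set the message above refers to.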
@tristanc_:matrix.org spamaps: alright, thank you very much for the feedback. Then I guess there is no point trying to run the operator standalone without CRDs. I agree with @corvus that it should actually leverage CRDs for the individual config, as it seems to be the way to go for this usage. 16:11
@spamaps:spamaps.ems.host My actual experience with operators is scant. They feel like black boxes and they assume a lot of privileges in the cluster that not everyone will have. 16:14
@tristanc_:matrix.org Then I can't tell if and how the zuul project could provide an additional deployment option for terraform users. 16:15
@jpew:matrix.org My (limited) experience is that good operators are amazing, but bad operators are worse than useless. I think they just have a lot higher possibility of variability than other things like helm/terraform 16:21
@jpew:matrix.org higher ceiling, much lower floor :) 16:21
@spamaps:spamaps.ems.host Zuul on K8S Terraform modules is something I'm trying to get open sourced from my previous employer who isn't even using them anymore. 17:05
@spamaps:spamaps.ems.host We ran it on EKS with S3 log storage, StatefulSet Zuul and ZooKeeper, and RDS for the DB. It worked great. 17:09
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Never externally delete change cache entries https://review.opendev.org/c/zuul/zuul/+/811452 17:20
@jim:acmegating.com Clark, swest, tobiash: ^ i think that should fix the most urgent error we're seeing in opendev.  i'll start looking into the other one (which seems a little less urgent). 17:20
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Never externally delete change cache entries https://review.opendev.org/c/zuul/zuul/+/811452 17:21
@jim:acmegating.com pep8 fix ^ 17:21
@clarkb:matrix.org thanks I'll review it momentarily 17:26
@clarkb:matrix.org It has been approved 17:32
@clarkb:matrix.org Thank you for the detailed commit message 17:32
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] Never externally delete change cache entries https://review.opendev.org/c/zuul/zuul/+/811452 18:43
@spamaps:spamaps.ems.host > <@jpew:matrix.org> My (limited) experience is that good operators are amazing, but bad operators are worse than useless. I think they just have a lot higher possibility of variability than other things like helm/terraform 19:01
My experience with operators is all bad. Same with helm actually. ;)
@mordred:inaugust.com I enjoyed rook, the ceph operator 19:17
@gtema:matrix.org Rook is great, but if it crashes 19:19
@gtema:matrix.org Fixing it is not as great 19:20
@clarkb:matrix.org corvus: couple of things on https://review.opendev.org/c/zuul/zuul/+/808841 for when the zuul restarting is done 20:33
@jim:acmegating.com Clark: replied, thx.  let me know if you feel strongly about either of those (i don't) 20:38
@jim:acmegating.com (to elaborate on the first comment, that's sort of idiomatic elsewhere in zuul too) 20:39
@clarkb:matrix.org +2'd. I don't feel strongly about them either 20:39
@jim:acmegating.com opendev and gerrit-review are both running zuul master now 20:41
@jpew:matrix.org I have a bunch of projects that need the exact same job definition, so I'd like to include that in a common repo so that the others can pull it in... but it requires a secret and I'm not sure how to deal with that. Does anyone have suggestions? 20:58
@clarkb:matrix.org jpew: if the secret needs to change between repos they can pass their secrets to a trusted parent with a job directive 20:59
@clarkb:matrix.org if you're worried that a common secret can be exposed by other repos one mitigation against that is making the job final. You can also restrict which repos can run a job 21:00
@jpew:matrix.org Clark: Marking it as final only works if it's in a trusted repo, correct? 21:02
@jpew:matrix.org (according to the big warning in the documentation) 21:03
@clarkb:matrix.org I'd have to double check that but if the docs say that then I would trust them 21:07
@jim:acmegating.com if they should share the secret, i'd put it all in a common config-project.  an alternative is the pass-to-parent option if they have different secrets (or you want to duplicate the secret in each repo) 21:09
@clarkb:matrix.org Another approach would be to do something similar to how the secrets for logs are handled. You move things to the executor context in the base job's post-run and the stuff running on the test nodes earlier shouldn't have access to that 21:36
@jim:acmegating.com Clark, swest: i've looked into the other traceback, and i don't currently fully understand it.  i can't exclude the possibility that something was awry with opendev's configuration at the time.  i think i want to wait and see if it shows up again after the next batch of periodic jobs. 21:37
@jim:acmegating.com oh... i think i see another clue...  digging in again 21:40
@jim:acmegating.com i grok it.  looking into test/fix now. 21:45
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Include project name in gerrit branch cache https://review.opendev.org/c/zuul/zuul/+/811488 22:19
@jim:acmegating.com Clark, fungi: i think we may want to strive to merge that and restart opendev again with it today. ^ 22:20
@clarkb:matrix.org ok looking 22:21
@clarkb:matrix.org corvus: is the project available in the change cache as well? Should we put it in there for consistency since it won't hurt? 22:23
@jim:acmegating.com Clark: it is sometimes available, but if we put it in the change key, we can't retrieve the change without it.  so we should keep the key the minimum unique identifier 22:24
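[Editor's illustration] A minimal sketch of the cache-key point made above, going only by the change title ("Include project name in gerrit branch cache") and the remark about keeping a key to the minimum unique identifier. The project names and stored values are hypothetical and this is not Zuul's actual cache code. It shows that a branch name alone is not unique across projects, while a (project, branch) pair is.

```python
# Hypothetical data: two projects that both have a "master" branch.
branches = [
    ("org/project-a", "master", {"protected": True}),
    ("org/project-b", "master", {"protected": False}),
]

# Keyed by branch name alone: the second entry silently overwrites the first.
by_branch = {}
for project, branch, info in branches:
    by_branch[branch] = info

# Keyed by (project, branch): the smallest key that is still unique.
by_project_branch = {}
for project, branch, info in branches:
    by_project_branch[(project, branch)] = info
```

The same reasoning cuts the other way for the change key: adding a field that is not always known at lookup time would make some entries impossible to retrieve.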
@clarkb:matrix.org I see. Also, that test fixture modifies an existing test? 22:25
@jim:acmegating.com Clark: nope, forgot a git add, thanks! 22:25
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] Include project name in gerrit branch cache https://review.opendev.org/c/zuul/zuul/+/811488 22:25
@jim:acmegating.com Clark: git add ^ 22:25
@clarkb:matrix.org corvus: +2'd but left a question about making the test a bit more robust 22:30
@jim:acmegating.com Clark: replied; let me know if you think it's worth it 22:36
@clarkb:matrix.org ah I see ya I guess ensuring the job overlaps is also worth testing so we're talking combinatorial problems 22:39
@clarkb:matrix.org I'm good as is then 22:39
@clarkb:matrix.org fungi: if you have time to review that one ^ that would be good given the plan to restart corvus has mentioned 22:45
@jim:acmegating.com i think it might be worth me +3ing that and inviting zuul-maint to retro-review it at their convenience 22:54
@clarkb:matrix.org wfm 22:56
@jim:acmegating.com still got at least an hour before it merges :) 22:56
@clarkb:matrix.org I'm going to take that opportunity to make sure I don't need to help with dinner since restarting time will be around when I'd normally help with dinner. Back in a bit 22:59
@jim:acmegating.com Clark: sounds good (but i can also commit to restarting opendev regardless) 23:00
@clarkb:matrix.org I don't mind helping :) 23:00
@clarkb:matrix.org I guess the changes have been minimal but wouldn't want you to debug something like the github thing alone 23:01
