Wednesday, 2020-02-26

*** mattw4 has quit IRC00:03
ianware we getting "waiting on logger" a lot more in the streaming now?00:29
fungiit has seemed like it, but i don't have enough long-term memory left to remember clearly a time when we weren't00:33
clarkbya I think we always got some of it, though it may be mroe of it now. Basically its waiting for the remote test node side daemon to start listening on port 1988500:35
clarkbpossible that newer ansible doesn't trigger the bit that starts that until later now? (could be side effect of ansible 2.7 -> 2,8 default change?)00:35
fungiany idea if a full-reconfigure is needed to get the scheduler to stop applying implied branch matchers after deleting a branch from gerrit?00:37
clarkbfungi: I thought our mergers handled that on git repo update, which may mean you need a new commit to land. If a full reconfigure implies a git repo update I think that would work too00:42
clarkband I think full reconfigure does imply that beacuse its what is done at startup and zuul wants to make sure it is up to date when doing that00:43
fungiclarkb: yep, on a whim i already gave that a shot, and it seemed to take care of it00:45
fungitwo uses for zuul-scheduler reconfigure in one day!00:45
fungier, zuul-scheduler full-reconfigure00:46
fungitwo different utc days though, technically00:46
*** sgw has quit IRC00:50
*** zenkuro13 has quit IRC01:36
*** dangtrinhnt has joined #zuul01:38
*** michael-beaver has quit IRC01:38
dangtrinhntHi, I have a couple of sessions at my company presenting about Zuul and people are so excited to use it.01:40
dangtrinhntI will have a chance to do a pilot at my company over a couple of months. My plan is to replace our Jenkins with Zuul.01:40
clarkbdangtrinhnt: exciting. Let us know how it goes and if wecan help01:45
dangtrinhntSure. Thanks. :D01:45
fungithat's awesome01:45
fungiyeah, do let us know if you have any questions01:45
dangtrinhntthanks.01:46
*** rlandy has quit IRC02:42
*** rlandy has joined #zuul03:21
*** rlandy has quit IRC03:22
*** dangtrinhnt has quit IRC03:55
*** dangtrinhnt has joined #zuul03:56
*** dangtrinhnt has quit IRC04:01
*** dangtrinhnt has joined #zuul04:07
*** raukadah is now known as chandankumar05:09
*** evrardjp has quit IRC05:34
*** evrardjp has joined #zuul05:35
openstackgerritMerged zuul/zuul-jobs master: install-javascript-packages: add tox_constraints_file  https://review.opendev.org/70941405:40
*** swest has quit IRC06:17
SpamapSmordred:thanks, I added a question06:20
*** dangtrinhnt has quit IRC06:24
*** dangtrinhnt has joined #zuul06:26
*** saneax has joined #zuul06:33
*** swest has joined #zuul06:42
openstackgerritTobias Henkel proposed zuul/zuul master: Optimize canMerge using graphql  https://review.opendev.org/70983607:44
openstackgerritTobias Henkel proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check  https://review.opendev.org/64455707:44
*** avass has joined #zuul07:51
*** harrymichal has joined #zuul08:00
*** bolg has joined #zuul08:01
*** avass has quit IRC08:09
*** jcapitao has joined #zuul08:11
bolgmordred: also + for dropping 3.5 :)08:14
*** Defolos has joined #zuul08:18
*** tosky has joined #zuul08:20
*** jpena|off is now known as jpena08:46
* mnaser is actually a fan of old-style depends-on :(09:04
mnaserit is extremely useful when doing cherry-picks to different branches09:05
mnaserbecause i dont have to modify the messages on all of them, it just naturally works09:05
*** lennyb has quit IRC09:06
*** mhu has quit IRC09:07
*** lennyb has joined #zuul09:07
*** dangtrinhnt has quit IRC09:14
*** mhu has joined #zuul09:14
*** dangtrinhnt has joined #zuul09:16
*** dangtrinhnt has quit IRC09:17
*** dangtrinhnt has joined #zuul09:17
openstackgerritJan Kubovy proposed zuul/zuul master: Refactor `self.event_queues` in tests  https://review.opendev.org/70999009:26
*** jfoufas1 has joined #zuul09:32
openstackgerritFelix Edel proposed zuul/zuul master: Dequeue items via buildset uuid  https://review.opendev.org/70913510:05
*** harrymichal has quit IRC10:09
*** harrymichal_ has joined #zuul10:10
*** harrymichal_ is now known as harrymichal10:10
*** felixedel has joined #zuul10:18
*** dangtrinhnt has quit IRC10:18
*** dangtrinhnt has joined #zuul10:19
felixedelcorvus: I've updated https://review.opendev.org/#/c/633501/23 to follow up on your latest comments. We were indeed focusing on two different API endpoints :D So, now the buildset endpoint returns a JSON that's equivalent to the one included in the MQTT report. I will add a follow up change to improve the UI part showing also the retried builds on the buildset page.10:20
*** harrymichal has quit IRC10:22
*** dangtrinhnt has quit IRC10:24
*** ianychoi_ is now known as ianychoi11:32
*** jfoufas1 has quit IRC12:02
*** dangtrinhnt has joined #zuul12:06
*** sshnaidm is now known as sshnaidm|bbl12:09
*** jcapitao is now known as jcapitao_lunch12:10
*** avass has joined #zuul12:20
*** jpena is now known as jpena|lunch12:34
*** dangtrinhnt has quit IRC12:38
*** Goneri has joined #zuul12:38
*** dangtrinhnt has joined #zuul12:40
*** harrymichal has joined #zuul12:42
*** dangtrinhnt has quit IRC12:50
*** dangtrinhnt has joined #zuul12:52
*** dangtrinhnt has quit IRC12:55
openstackgerritJan Kubovy proposed zuul/zuul master: Scheduler's pause/resume functionality  https://review.opendev.org/70973512:59
*** harrymichal has quit IRC13:00
*** rlandy has joined #zuul13:00
*** harrymichal has joined #zuul13:00
*** jcapitao_lunch is now known as jcapitao13:17
openstackgerritJan Kubovy proposed zuul/zuul master: Scheduler's pause/resume functionality  https://review.opendev.org/70973513:22
openstackgerritJan Kubovy proposed zuul/zuul master: WIP: Store unparsed branch config in Zookeeper  https://review.opendev.org/70571613:25
*** jpena|lunch is now known as jpena13:26
*** avass has quit IRC13:26
*** Goneri has quit IRC13:28
openstackgerritTobias Henkel proposed zuul/zuul master: Refactor github auth handling into its own class  https://review.opendev.org/71003413:35
*** Goneri has joined #zuul13:42
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Tests ensure-tox on all-platforms  https://review.opendev.org/70864213:53
*** jamesmcarthur has joined #zuul13:53
openstackgerritJan Kubovy proposed zuul/zuul master: WIP: Store unparsed branch config in Zookeeper  https://review.opendev.org/70571613:54
*** harrymichal has quit IRC13:57
*** harrymichal has joined #zuul13:59
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Tests bindep role on all-platforms  https://review.opendev.org/70870414:02
fungimnaser: i don't foresee it being necessary to drop the old-style dependency syntax, it's mainly deprecated because it's vague in a multi-connection scenario and so discouraged over being more specific about the system in which the change resides14:04
fungialso not all gerrit deployments/projects require a change-id14:05
fungiopendev's gerrit deployment has standardized on/recommended it because we think it's useful14:06
*** harrymichal has quit IRC14:24
*** jamesmcarthur has quit IRC14:27
*** jamesmcarthur has joined #zuul14:30
*** sgw has joined #zuul14:32
*** jamesmcarthur has quit IRC14:32
mnaserfungi: I didn’t know gerrit use was possible without change ID!  I thought that’s how it recognized revisions14:53
fungithat's one way, another is pushing to a specific remote ref corresponding to the change14:55
fungigit-review relies on gerrit change-id due to how it's set up14:55
*** sshnaidm|bbl is now known as sshnaidm14:56
fungiand i think it's a more intuitive solution, even though it does rely on having a commit hook to append a random id into commit messages14:56
corvusdepends-on: change-id is deprecated and planned for removal; see "An older syntax" in https://zuul-ci.org/docs/zuul/discussion/gating.html14:57
fungiwhen we do remove it, i'm assuming it'll be because of the maintenance burden for the additional code paths14:58
corvusfungi: that (it is a tortuous code path) and also to promote parity and a unified experience between drivers14:59
fungiand i think we haven't rushed to remove it because doing so would silently stop applying dependencies for existing changes which haven't switched to the new syntax15:01
fungithough maybe we could start returning errors if a malformed depends-on footer is found15:01
corvusmostly as a courtesy to the folks doing third-party ci on opendev that are still running zuul v215:01
corvusyeah, warnings are doable15:02
fungiwarnings too i suppose, but i was thinking in terms of when support is eventually removed, we could opt to have zuul refuse to run jobs instead of silently ignoring the depends-on15:04
fungiespecially because folks may be relying on dependencies to block changes from merging15:04
fungiwe've already seen that when opendev switched the canonical hostname for its gerrit in zuul15:04
fungisome changes with depends-on to changes using the old hostname were no longer blocked from merging15:05
fungiand occasionally got merged prematurely as a result15:05
*** swest has quit IRC15:06
*** saneax has quit IRC15:07
mnaserspeaking of canonical url, one of the things I’ve struggled with is that our gerrit connection driver uses an internal gerrit URL (gerrit.gerrit) as it’s hosted inside kubernetes.15:15
mnaserEven though we have our canonical hostname set to something else, the web UI shows the gerrit.gerrit path15:16
mnaserI think maybe a driver option to include “public facing URL” or maybe gerrit exposes that somewhere and we can use that in the driver instead15:16
corvusmnaser: where?  (that's a bug, and since we do the same thing in opendev, it should be easy for us to see)15:16
mnasercorvus: for example, on the “builds” page in the web UI, when you click on a review ID there15:17
mnaserFor a specific build15:17
mnaserI assume it’s likely similar in the status page too15:17
mordredmnaser: oh - because your zuul is talking to the internal url of the gerrit?15:18
mnasermordred: yep15:18
corvusmnaser: but there's also a public url for gerrit?15:18
mnaseryep, there is a public URL but I’d like zuul to use the internal one to avoid doing a big loop15:19
mnaserSo I have my user public facing URL, and the internal one that lives within the cluster network (overlay in this case)15:19
fungiso you're using different hostnames to work around different paths through the network topology15:19
*** mugsie has quit IRC15:20
mnaserI don’t think using canonical hostname is the right option for this either cause that would break me (and OpenDEv too)15:20
mnaserfungi: correct15:20
corvusyes, it is not currently anticipated that zuul and users would use a different hostname to access the same gerrit15:20
fungithe classical solution for that is split-horizon dns, but that is likely overkill for this one case alone15:20
fungiif you had a lot of instances of this sort of problem then solving it at the dns server level would make more sense15:21
corvus(canonical hostname is for when the gerrit url is not the url where users typically clone the authoritative upstream repo from)15:21
mnaserIf that’s something that sounds like it makes sense for zuul and we can come up with an option name I can push up a patch15:21
*** mugsie has joined #zuul15:22
* mnaser about to take off, will be around in a short 45 minutes hopefully :)15:23
fungianother popular solution is to solve it at the routing level, publishing separate routes for that address space to the machines which shouldn't use the global routes, and possibly also performing 1:1 address translation if those addresses aren't bound directly to the servers themselves15:23
corvusmnaser: i'm game, but it'll need to hit all the drivers15:27
mordredfungi: yeah - but in this case I think the internal location is just the standard internal k8s service name and the external is the external dns name attached to the published external ip - so to do that would involve updating k8s15:27
mordredcorvus: yah. because one could imagine someone doing a private github + zuul in the same model15:27
corvusyup15:27
fungimordred: seems like the routing solution should be doable in network namespaces for the individual containers, but i don't know a whole lot about how the kernel routes between namespaces15:28
mordredfungi: I think the issue might be separation of ownership in this case - k8s owns the internal ip and name but likely not the external dns name - so there might not be anyone (other than mnaser as a human) who has complete enough context to do the split routing15:29
mordredfungi: k8s runs an internal dns service that it uses to assign logically addressable names to pods, so that intra-k8s communication can be done via logical names leveraging dns15:30
corvusi think a notable aspect of this is that the public ip is potentially entirely outside of k8s (ie, an external load balancer); though if the load balancer is configured by k8s (via a custom ingress controller), then k8s can know the ip, so could theoretically create an internal route for it15:30
mordredcorvus: yah15:31
corvusmordred: but routing is by ip, not name15:31
corvusand k8s might know the ip15:31
mordredthat's true15:31
fungiwell, i was thinking in terms of destination route in the container namespaces for the public address, and layer-3 translation to map that address to the actual internal one15:31
fungiso that the containers themselves when communicating with the "public" addresses of other services just use local routes15:32
fungirather than hitting the "load balancer" (whatever it may be)15:32
corvusfungi: right, i think that's theoretically possible if k8s knows that the public ip == an internal service15:33
fungibut if the desire is to also have kubernetes set up and maintain all the local routing in your container network namespaces (rather than injecting the routes and translation rules into the network namespaces yourself through other means) then that likely depends on whether it supports such notions15:33
fungiit's one of the reasons i struggle to appreciate kubernetes, it seems lots of people treat it as a black box and anything it can't do itself is simply deemed not possible15:34
corvusfungi, mordred: fwiw, i just did a traceroute from zuul-scheduler on gerrit's zuul to ci.gerritcodereview.com and got 9 hops terminating at the public ip, and did the same to zuul-web.zuul (the internal service name) and got 2 hops.  so based on my *extensive* testing, it seems like we do not get a pony.15:35
fungithese are long-standing solutions to these sorts of problems which predate kubernetes by decades, and just because they're not kubernetes features doesn't mean they can't still be used alongside it15:36
mordredcorvus: darn. I wanted a pony15:36
mordredfungi: maybe another way to look at this one is that it's still possible this could be implemented by k8s - but just because k8s might implement it doesn't mean every k8s admin will have deployed it that way - so from a zuul perspective we should still probably support the split to be friendly to zuul admins regardless of the shape of their k8s15:37
mordredall openstack services support public and internal endpoints in the keystone catalog for the same use case - fwiw15:38
fungii'm not opposed to the suggested feature, simply pointing out that there are already ways to accomplish the same goal through other means (name resolution, routing, et cetera)15:39
corvusright, but since k8s controls both the name resolution and routing for the containers inside k8s, then unless/until k8s supports doing those things the options effectively aren't available15:40
fungii'm not a big fan of address translation to start with, but when address translation is already forced on you (it seems like kubernetes insists on it?) then fixing it with more address translation seems reasonable15:40
corvus(and who knows?  maybe it does?  it's... complex... to put it mildly)15:41
mordredfungi: you *can* actually set up a k8s with no NAT15:42
mordredfungi: almost nobody does of course15:42
fungikubernetes doesn't exclusively control routing and name resolution for the containers though, as i understand it, you can still influence them from the host by manipulating the namespace routing tables and controlling whatever resolvers requests are forwarded to15:42
mordredbecause the world has been somehow brainwashed into believing that NAT isa . good thing15:42
corvus(pnat is evil; nat is not bad)15:42
corvusfungi: right, we could probably edit /etc/hosts or something.  but that seems a bit heavy handed, and may not be a realistic option in some deployments.15:43
funginat as a means of packet filtering is pretty bad though, and something we can blame cisco's purchase of pix and subsequent popularization for15:44
corvusanyway, mnaser's patch will not be a one-liner, but it will be simpler than "deploy a new dns system" :)15:45
fungicorvus: i was thinking split-horizon dns in the recursive resolver actually, but i know folks are often not a fan of that either15:45
corvuser, i guess "deploy a new dns".  i have to go to the atm machine.15:45
mordredcorvus: you must of done good on the SAT test15:45
Shrewsmordred: or just SAT15:46
fungii had a scholastic aptitude for the sat test15:47
corvusi did sit for the sat15:48
corvusso if i can change the topic for a moment; i'm looking into removing the "requires/provides: docker-image" we have in our opendev base image jobs and replacing them with more specific attributes on the leaf-node jobs like "provides/requires: python-base-image" so that zuul can be smarter about serializing them (right now, we've reduced zuul's gate queue to a serialized window of 1 because it thinks every15:50
corvuschange depends on the container images of the change ahead).15:50
corvusthat seems pretty straightforward up to this point: the "requires:" is not on the build job, but rather, on the buildset-registry job, because the buildset registry role pulls the previous artifacts in when it starts.15:51
corvusthat means that i'd need to put "requires" lines on either the leaf-node job or the buildset registry job, depending on whether the repo is using it (ie, zuul does not have a separate buildset registry job, so i would put requires on "zuul-quick-start", but nodepool does, so i would need to duplicate them there)15:52
*** bolg has quit IRC15:52
corvusthat sounds messy, and prone to error if someone were to switch from no buildset-registry job to using one15:52
corvusi was thinking maybe i could have zuul "pull-up" the requirements of any dependent jobs (so buildset-registry gets the requirements of zuul-quick-start automatically in nodepool).  but i worry that could go too far (for instance, if you made a tree of jobs that started with a linter, then ran buildset-registry if linting passed.  you'd have the linter waiting for container builds of changes ahead)15:53
corvusmaybe it would be okay to move the role that populates the buildset registry into the base image building job instead of the buildset-registry job?15:54
corvusyou could end up with some jobs which use the buildset-registry starting before it was completely populated with images from previous builds, but that should be okay, because presumably, those jobs don't "require" the images that haven't been imported yet15:55
corvusi think i'll look into that, unless someone has another idea (or can see why that wouldn't work)15:56
mordredcorvus: wow, that's a second coffee question ...15:56
mordredcorvus: but on first read it seems reasonable15:56
mordredand I support the goal state :)15:56
corvusyeah, i mean, i should have breakfast too.  :)15:56
mordredcorvus: you're pre-breakfast? I'm violating one of my cardinal rules - never get between corvus and breakfast15:57
mordred(never start a land war in asia is still a more important rule)15:58
Shrewsthat's more of a 4th coffee or 1st coffee-flavored beer question16:00
tobiashzuul-maint: we're currently facing performance issues with github so I took some time to dive into optimizing the api requests we're doing towards github. This reduces the number of requests needed to enqueue a change from 12 to 6 while four of the saved requests are in the critical code path in the run_handler: https://review.opendev.org/#/q/topic:github-optimization16:01
fungii'm still trying to wrap my head around the premise with the buildset-registry requires question... maybe i should rethink my one-coffee-a-day policy16:09
tristanCtobiash: i though github graphql endpoint needs a pageInfo element to get a cursor and be able to query more than 100 results or something.16:11
tobiashtristanC: yes, the initial assumption is that nobody has more than 100 protection rules and check runs16:12
mordredtobiash: that seems like a reasonable starting place :)16:13
tobiashI'll look into (partly generic) paging if needed16:13
tristanCtobiash: isn't the query also used to get mergable PR?16:13
tobiashtristanC: you mean if the pr can be merged without conflicts?16:14
tobiashregarding cursor on check suites/runs: zuul has one checksuite/run per pipeline, so I'm pretty sure a limit of 100 is safe there16:15
*** felixedel has quit IRC16:23
mordredtobiash: the codeowners patch can't use the graphql too I suppose?16:24
mordredtobiash: NEVERMIND16:25
mordredtobiash: I see it now16:25
tobiashmordred: it does, but I'd like to refactor it to match the structure of the parent16:26
tobiashI didn't wip it because it never got reviews ;)16:26
mordredtobiash: it's always so scary!16:26
mordredtobiash: but yeah, I think updating it to match will be nice16:27
tobiashyes I know that this is scary (and only needed due to api limitations)16:27
mordredthe new graphql patch is actually much more understandable than I thought when I saw the commit come in originally16:28
tobiashmordred: that was the goal :)16:29
tobiashand if you get used to it, the graphene based mock of the github api is quite simple16:29
mordredtobiash: you've reminded me that I have a very out-of-date todo list item to migrate the zuul website to gatsby ... largely because gatsby also uses graphql and doing a website with it is what got me to learn the graphql in the first place16:30
mordredso maybe I should actually get around to doing that16:30
*** mattw4 has joined #zuul16:33
tobiashmordred: oops, I forgot one in the topic: https://review.opendev.org/#/c/70914916:35
*** jamesmcarthur has joined #zuul16:36
mnasersome good discussion. I agree that all drivers should support it16:39
mnaserI will work in something ..16:40
mnaserok i've made two discoveries16:51
mnaserthere is a 'baseurl' config option inside the gerrit driver..16:52
mnaserwhich seems like it would be the actual perfect case to use for this, so i'd *maybe* argue that if it's not being respected, maybe that's an issue, because it seems to be used for things like the gitweb url16:53
mnaserand to me it seems like if i add that setting, this solves my issue?16:53
mnaser(it defaults to the value of 'server')16:54
corvusmnaser: oh i forgot about that, sorry.  yeah, that's worth a try :)16:54
corvusmnaser: oh, no that's not quite it16:55
mnaserif it's not it, i'm struggling a little bit in finding the code that 'generates' the url which ends up going into the database (which seems to be item.change.url) which is what is provided by ref_url inside the API16:56
corvusmnaser: baseurl is used for rest api requests.  basically, it's the http equivalent of 'server'16:56
mnaserahhh16:56
corvus(server for ssh, baseurl for http)16:56
mnaserok, yes makes sense16:57
corvusso yeah, i think the main challenge is going to be separating out all of the internal vs external uses16:57
mnaserok yeah i'm starting to see how this can be a little hard16:58
mnaserit might involve adding an extra thing to the zuul change model which is item.external_url or whatever it's called16:59
mnaserand then we save that in the sql reporter instead of the item.change.url16:59
corvusi don't think the model should need to be changed16:59
mnaseri guess this also means that unknowingly my depends-on with urls will only work with Depends-On: http://gerrit.gerrit/123416:59
mnaseri'm trying to think out loud with the least impactful change that doesnt end up touching a lot of different bits17:00
corvusall the internal actions should be done via the source, trigger, and reporter interfaces; those are the only thing that needs to know the internal hostname17:00
corvus(all of those are aspects of a connection)17:01
corvusand yes, i think that's your current depends-on syntax17:01
*** mattw4 has quit IRC17:03
*** mattw4 has joined #zuul17:04
openstackgerritMerged zuul/zuul master: Don't set untouched refs of the repo state twice.  https://review.opendev.org/70785717:05
*** mattw4 has quit IRC17:34
*** evrardjp has quit IRC17:34
*** evrardjp has joined #zuul17:35
*** jpena is now known as jpena|off17:40
openstackgerritMerged zuul/zuul master: Don't fetch pull request twice for status event  https://review.opendev.org/70914917:43
*** mattw4 has joined #zuul17:44
*** jamesmcarthur has quit IRC17:58
*** jamesmcarthur has joined #zuul18:00
*** jamesmcarthur has quit IRC18:05
*** mattw4 has quit IRC18:05
*** Goneri has quit IRC18:11
*** Goneri has joined #zuul18:12
*** jamesmcarthur has joined #zuul18:21
*** chandankumar is now known as raukadah18:21
*** michael-beaver has joined #zuul18:29
*** jcapitao is now known as jcapitao_off18:35
*** mattw4 has joined #zuul18:36
*** tjgresha has joined #zuul18:36
*** igordc has joined #zuul18:37
mordredcorvus: reply from SpamapS in https://etherpad.openstack.org/p/zuulv418:39
*** igordc has quit IRC18:43
corvusmordred, SpamapS: i think the warning should come after we merge implicit sql reporters, and that's a semi-breaking change (at least, it's a change that operators *may* need to make carefully).  but if we follow the sequence of: tell people that sql reporters will be required, make them implicit, make explicit sql reporters an error, then it should be relatively smooth for most folks?  unless i'm missing18:46
corvussomething18:46
*** rlandy is now known as rlandy|mtg18:46
*** tjgresha has quit IRC18:49
*** tjgresha has joined #zuul18:52
openstackgerritJames E. Blair proposed zuul/nodepool master: Use explicit provides/requires for container jobs  https://review.opendev.org/71011518:52
*** tosky has quit IRC18:56
openstackgerritJames E. Blair proposed zuul/zuul master: Use explicit provides/requires for container jobs  https://review.opendev.org/71011618:58
*** dpawlik has quit IRC19:01
*** felixedel has joined #zuul19:05
*** mattw4 has quit IRC19:09
*** mattw4 has joined #zuul19:17
*** dpawlik has joined #zuul19:18
*** rlandy|mtg is now known as rlandy19:27
*** jamesmcarthur has quit IRC19:31
*** jamesmcarthur has joined #zuul19:40
*** jamesmcarthur has quit IRC19:58
*** sshnaidm is now known as sshnaidm|afk20:11
*** jcapitao_off has quit IRC20:17
*** jamesmcarthur has joined #zuul20:21
SpamapScorvus: Was just thinking, start telling people about it now, and then the jump to 4 is the only place where anybody has to sit and think about their reporters.20:33
SpamapSbut I'm fine with it as-is, just wondering if we can be more aggressive while crossing a major release boundary.20:33
*** mattw4 has quit IRC20:42
*** mattw4 has joined #zuul20:42
*** michael-beaver has quit IRC21:19
openstackgerritIan Wienand proposed opendev/zone-zuul-ci.org master: git.zuul-ci.org : point to static.opendev.org  https://review.opendev.org/71014221:23
corvusSpamapS: i think if we want people to be able to upgrade without significant downtime we have to have a period where explicit reporters are not an error, but zuul uses implicit reporters.  so if we move the explicit-reporters-are-error phase back to 4.0, we'll need to make implicit reporters happen before 4.0.  even though that might be okay (i suspect few or zero users have more (or less) than one sql21:23
corvusconnection in a tenant at this point), if they did, that'd be a pretty rough change to do without major signaling ("i know your pipeline says this, but zuul is really going to do this instead")21:23
*** adam_g has quit IRC21:36
*** jamesmcarthur has quit IRC21:36
*** jamesmcarthur has joined #zuul21:36
*** adam_g has joined #zuul21:37
*** jamesmcarthur has quit IRC21:44
*** jamesmcarthur has joined #zuul21:45
*** adam_g has quit IRC21:46
tristanCconsidering zuul doesn't have stable branch and is kind of a rolling release, shouldn't we use the v4 increment as a scaled-out scheduler marketting version number21:47
*** adam_g has joined #zuul21:47
tristanCotherwise, according to semver, asking user to configure zk-auth should be the 4.0.021:52
fungii'm wary of any ties between version number choices and "marketing"21:54
tristanCthus i'm in favor of adding support for zk auth, implicit sql, and the other thing we'll need for scaled-out scheduler as 3.x version, and then use zuul v4 for the scaled-out feature21:54
openstackgerritJames E. Blair proposed zuul/zuul master: Increase timeout in timeout test  https://review.opendev.org/71014621:56
corvuszk auth is not a breaking change, implicit sql is (we're literally saying that zuul is going to ignore the configuration you tell it to, and if you dont have a sql connection configured, it won't start)21:58
*** jamesmcarthur has quit IRC21:59
corvusour interpretation of how to apply the principles of semver to this application (zuul is not a library) are that architectural changes or significant breaking changes for operators get a major version, and breaking changes for users get a minor.21:59
corvusit's vague, but it's something.22:00
tristanCcorvus: i'm not sure what `asking the user to configure zk auth` implies then. it seems like something major where you can't just update zuul22:00
*** jamesmcarthur has joined #zuul22:00
corvustristanC: we land zk auth asap, and zuul will work with both but not require auth.  users can configure auth anytime between now and 4.0.  probably at 4.0 we require auth?22:01
*** jamesmcarthur has quit IRC22:01
corvusyes, that's #422:01
tristanCcorvus: can't the implicit reporter also land asap too?22:02
corvustristanC: no, an implicit sql reporter is 2 breaking changes: 1) it will ignore what you have in your pipeline; 2) it will break if there's no sql connection22:02
corvusit's mostly #1 that i think is intractable (we could obviously make #2 softer)22:03
tristanCwhy not adding sql reporter by default so that user can remove reporter anytime between now and 4.0, and probably at 4.0 we requires no sql connection?22:04
tristanC(when sql connection exists)22:05
*** jamesmcarthur has joined #zuul22:05
corvussure, we can do that, but doing something unexpected (contrary to what the user configured) is a big deal, and we should signal it.  i think we should signal it with a major version bump.22:06
tristanCalright, then why not making scaled-out scheduler v5.0 then?22:07
corvusi think we have two parallel issues here -- i think everyone wants the ha scheduler work to proceed with minimal disruption.  if you ignore the "release" steps in https://etherpad.openstack.org/p/zuulv4 then i think that process achieves it.22:07
corvusso then the question is, which of those steps necessitate releases.  i think: implied sql reporter (with required sql conn), and required zk auth are breaking changes.  that plan proposes the 4.0.0 release be those.  i don't think there are any further breaking changes for operators.22:09
corvusif we're not happy with those version numbers, i see 2 alternatives:22:09
corvusa) decide that major version number bumps should not be used to signal major changes to operators and instead should be used for marketing.  in which case, we just relase 4.0.0 at the end of that process.22:10
tristanCi meant, between the proposed 4.0 and 4.1, it won't be 'breaking', but correctly operating a ha scheduler probably needs breaking change for operators (e.g. adds an extra scheduler service)22:10
corvusb) some compromise where we use 4.0 to signal breaking changes and use 5.0 to signal completion of the feature set and a marketing milestone22:10
corvustristanC: when we have ha schedulers, it's still okay to run one.  so at the point the feature is implemented, folks can just optionally run a second one.  it's work, but it's not required or disruptive.22:11
tristanCi think a or b are better than a 4.0 with only required sql reporter and 4.1 with ha scheduler22:11
*** rfolco has quit IRC22:12
*** tosky has joined #zuul22:12
mnaseri know this might sound silly, but 'marketability' is something that makes sense for big feature changes.22:12
corvusthe plan in the etherpad doesn't even have 4.1 as ha scheduler, it's just the end of the deprecation period for sql reporters in pipelines.22:12
mnaserfor example, saying 'zuul' and 'zuul v3' told users a big story in the difference and feature set that comes with it22:13
tobiashzuulv5 as ha scheduler :)22:13
mnaserbeing able to easily know if you have a version of an application that has <big feature> is useful overall imho22:13
corvusmnaser: yep, i recognize that.  i would like to attract new users to whom the feature is important, but also don't want to piss off current users who are like "wtf you broke everything on 3.18.0?" :)22:13
tristanCmnaser: i find it easier to communicate22:13
mnaseryes, i agree it's easier to communicate, but i also agree on not breaking things in 3.18.022:14
tristanCcorvus: oh my bad, i read the pad as 4.1 is ha scheduler22:14
corvusso yeah, if we don't think (just random guess here) 4.3 == ha scheduler is marketable enough, i'm not opposed to zuul v5 for that.22:14
corvusi think our users would forgive v5 == no new changes after 4.X other than completion of the ha scheduler work more than breaking things on 3.18.0.22:15
corvus(and i think if we want 4.0.0 to be all the breaking changes plus the ha scheduler, we would need a feature branch)22:16
tristanCcorvus: zuul v5 for ha scheduler sounds more aligned with semver, but zuul v4 would works for me too22:16
corvusit sounds like we may be able to obtain consensus on that; let's check with fungi, mordred, and SpamapS (see additional steps 12 and 13 in https://etherpad.openstack.org/p/zuulv4 -- basically, release 5.0.0 as a signal that ha scheduler work is complete)22:19
mnasermaybe it's something to think cause its kindof the first time that this happens (yes i acknowledge that it isn't really the first time because we're at zuul v3) but perhaps having some messaging around "zuul v3 is no longer supported/released"22:20
corvustristanC: that does make "error on explicit sql in pipelines" show up in 5.0, which is nice (but not strictly necessary according to our established pracitce)22:21
fungiokay, catching up22:23
*** sshnaidm|afk has quit IRC22:23
corvusmnaser: indeed, both major version changes so far have been major architectural/breaking changes.  1->2 == jenkins http api -> gearman plugin; 2->3 == jenkins -> ansible.22:24
tristanCfwiw i've been contemplating an haskell tool that automatically picks a version number based on code changes: https://kowainik.github.io/posts/policeman-bristol#briefing  unfortunately that doesn't apply well to unsafe language :)22:26
*** jamesmcarthur has quit IRC22:29
* fungi wonders what qualifies as a "safe" language22:29
corvusone without side effects22:30
*** jamesmcarthur has joined #zuul22:30
*** Goneri has quit IRC22:30
fungii'm fine with 4.0.0 being the breaking changes and 5.0.0 signalling the completion of those features, though also if the breaking changes can't land at the same time having one be 4.0.0 and the other be 5.0.0 and then completion of both being 6.0.0 also works for me22:33
fungicorvus: given the state of modern processors, i wonder if there is any language without side effects (even if unintentional ones)22:34
*** jamesmcarthur has quit IRC22:35
*** jamesmcarthur has joined #zuul22:35
tristanCfungi: i meant a safe a language where a behavior change results in a semantic diff on the exposed interface, like what haskell or rust do22:39
*** jamesmcarthur has quit IRC22:49
*** jamesmcarthur has joined #zuul22:54
fungimakes sense22:56
*** felixedel has joined #zuul23:07
openstackgerritJames E. Blair proposed zuul/zuul master: Store build.error_detail in SQL  https://review.opendev.org/70985723:11
*** jamesmcarthur has quit IRC23:15
*** rlandy is now known as rlandy|bbl23:18
SpamapScorvus: I like that. I'm also in agreement with you either way on how to make the reporter switch (just wondered if we can skip a few steps really)23:28
mordredI'm not convinced we need #13 (although I've also had the same thought myself - marketability IS A big deal) ... BUT - I don't think my being convinced or not convinced on 13 matters right now - the nice thing about 13 is that, once HA scheduler is done and we're happy - we could cut a v5 and make noise about it - OR - we could _not_ cut a v5 and instead say "LOOK, we just released HA scheduler as a point23:34
mordredrelease because we are stable enough to do so"23:34
mordredlike - as much as I want to tout the new feature, I'm also excited that the plan for HA scheduler can be done primarily without breaking changes, and the breaking changes we're doing in prep for it are also not particularly breaking either (they are - by our definition, but compared to _other_ breaking changes out there - I think we're doing pretty good) and it's something worth trumpeting.23:35
mordredbut it's also just numbers, so I could totally get behind tagging a v5 if it'll help us talk about it23:36
*** Defolos has quit IRC23:48

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!