Monday, 2021-06-28

*** marios is now known as marios|ruck05:16
*** jpena|off is now known as jpena06:57
opendevreviewBenjamin Schanzel proposed zuul/zuul-jobs master: Add a meta log upload role with a failover mechanism  https://review.opendev.org/c/zuul/zuul-jobs/+/79533607:39
*** bhagyashris_ is now known as bhagyashris09:40
*** jpena is now known as jpena|lunch11:32
*** jpena|lunch is now known as jpena12:29
*** marios|ruck is now known as marios|ruck|call12:30
pabelanger[m]tobiash: if you have spare time, maybe your team and check out https://review.opendev.org/c/zuul/zuul/+/79813112:40
tristanCRestarting our main zuul scheduler now takes more than 15 minutes and it seems like the bottleneck is getting the branches information of repositories hosted on src.fedoraproject.org...12:43
tobiash[m]pabelanger: do you know if the neutral status is available on all ghe versions?12:43
tristanCCould we look into skipping that loading step for project with the include attribute set to []? Or is this going to be refactored with the rest of the scheduler ha?12:43
pabelanger[m]I didn't see anything specific in https://docs.github.com/en/rest/reference/checks#update-a-check-run12:44
tobiash[m]tristanC: I'm not aware about changes around this topic, do you think this is a regression or just something to be tweaked?12:44
tristanCtobiash[m]: i am not sure if this is a regression, and/or if this is because we are adding more and more projects. But this is going to be a blocker as we would like to add a new tenant with double the amount of projects we currently have.12:50
tristanCand since most of these projects do not include zuul configuration, i guess we could tweak the scheduler startup process by skipping the branch collection when include is set to []12:51
tristanCor perhaps this branch information should be stored in the ZooKeeper, so that multiple schedulers would not need to look it up when starting12:53
pabelanger[m]I feel we had this issue before, about reloads taking a long time for projects zuul didn't really care about12:55
pabelanger[m]in the early 3.0 days12:56
fungitristanC: not necessarily "refactored" but with the configuration state moving into zookeeper it will be persisted there between scheduler restarts13:35
fungiso hopefully scheduler restarts will perform more like a smart-reconfigure13:37
corvussome of the branch selection logic in the executor uses the list of branches on project even with include:[], so ignoring that wouldn't be straightforward (ie, it may not be correct in all cases).  it might be best to wait until the scheduler cache is in place and see how that goes before looking into more specific optimizations13:42
tristanCcorvus: how far are we to have this cache?13:44
corvus(it's partially in place now; we just still reload everything on startup)13:44
corvustristanC: do you know what part of the process takes so long?  is there something in one of the drivers we could make less expensive?13:45
corvus(because collecting the branches on all of opendev's zuul takes only a few seconds)13:45
clarkbtristanC: tobiash[m]: I know for Gerrit they say that git protocol v2 really helps. I don't think src.fedoraproject.org is running gerrit but maybe that is something to try toggling on there too?13:46
clarkbit may just be a client side thing (we did that in the zuul published images already though if you use those)13:47
corvusRUN git config --system protocol.version 213:48
corvushttps://review.opendev.org/766765 for more info13:48
fungithough it's worth keeping in mind that while opendev does have a lot of projects we culturally (mostly due to the gerrit workflow) don't have the proliferation of thousands of topic branches you see on some other projects in the wild14:07
corvusyep, just wondering what's eating the time?  does the problem scale with number of projects or number of branches?14:09
tristanCcorvus: it seems to scale with the number of projects, and we guess this is cause by src.fedoraproject.org being slow14:12
fungiand that's pagure, right?14:13
tristanCfungi: yes, with >30k projects14:14
*** marios|ruck|call is now known as marios|ruck14:15
corvusif so, it uses the api to get branches; another option might be to use git14:15
tristanCwe would like to add all those project to zuul, but the scheduler is currently struggling to load 1k projects in reasonable amount of time14:15
clarkbpabelanger[m]: tobiash[m] for neutral and skipped statuses do we need GHE to handle them to land the change? Or is it only a problem if the user sets that status in their config and their github doesn't support it?14:39
opendevreviewMerged zuul/zuul master: Increase unit test job timeout to 90 minutes  https://review.opendev.org/c/zuul/zuul/+/79801914:51
pabelanger[m]clarkb: I am not sure, I would imagine if GHE didn't support it, they would raise an error from the API.14:56
pabelanger[m]however, I don't see any dates or API release notes to say when it was added14:57
clarkbpabelanger[m]: looks like you cna select the github api version in the top right of the docs page. It defaults to github.com15:00
clarkband while I see GHE 3.1 and 3.0 list neutral I don't see skipped on any of the version pages15:00
clarkbI suspect it is fine https://docs.github.com/en/enterprise-server@3.1/rest/guides/getting-started-with-the-checks-api15:01
clarkbcorvus: is https://review.opendev.org/c/zuul/zuul/+/797180/2 desireable if testing shows it is slightly slower than using alembic? Maybe we want tokeep the two in sync anyway and this is a good way to ensure that?15:04
corvusclarkb: my thinking is yes -- it's not slow enough to make a difference, and the functionality is an improvement15:08
corvus(just getting them in sync already caught one minor error)15:08
pabelanger[m]clarkb: I can see it in 2.21 https://docs.github.com/en/enterprise-server@2.21/rest/reference/checks#update-a-check-run15:24
clarkbpabelanger[m]: ya I sspect your change is fine given those docs15:24
pabelanger[m]so, think we are safe15:24
clarkbcorvus: do we also need to test postgres the same way?15:28
corvusclarkb: not a bad idea; i can work on a change for that15:30
corvus(it should be the same, but having an extra test could catch an unlikely edge case)15:31
clarkbya and we may have different indexes or types with postgres15:32
clarkbcorvus: in https://review.opendev.org/c/zuul/zuul/+/797181/1/tests/base.py should we just go ahead and delete that commented out code? Or do you want to do that when you update all of the test cases to use the new parent class?15:38
corvusclarkb: that should probably be deleted from that change15:41
clarkbalso what is the difference betwene the sql driver and the database driver? We seem to hanlde both driver setups15:46
clarkbwe made the canonical config "database" right?15:46
clarkbya if database then driver = self.drivers['sql']15:47
clarkbleft another note on that change15:50
corvusclarkb: yeah, sql is deprecated, to be removed with 5.015:58
clarkbfungi: corvus  is https://review.opendev.org/c/zuul/zuul/+/798092 related to that zuul-jobs update mhu1 was writing? if so I suspect that is a priority even if it has conflicts iwth many changes16:00
*** jpena is now known as jpena|off16:00
corvusheh, i only see it conflicting with 1 change; but yes, that is related to mhu1's change -- we can revert mhu's change after we make a release with 798092 in it16:01
fungiclarkb: yes, it's the longer-term solution which the zuul-jobs change is attempting to provide a temporary workaround for16:01
fungiwhich i was also able to get working over the weekend if anyone has time to look at it16:02
clarkbcorvus: gerrit web ui shows ~40? conflicting changes16:02
corvusclarkb: impressive!  gertty shows it only conflicting with https://review.opendev.org/66496516:03
corvus(with which, btw, it truly does conflict -- i just checked)16:04
fungii think gerrit considers two changes to "conflict" if they both edit the same file16:04
corvusanyway, i agree, it's a priority16:04
clarkbthat change would write out the zuul resources dict to that file via zuul_vars.update({'resources': zuul_resources})16:06
clarkbwe don't put any secret connection details in there thatwe might accidentally expose? This would be for container workloads I think (since winrm and ssh rely on external credentials)16:06
opendevreviewJames E. Blair proposed zuul/zuul master: Replace TreeCache in component registry  https://review.opendev.org/c/zuul/zuul/+/79658216:07
opendevreviewJames E. Blair proposed zuul/zuul master: Add ExecutorApi  https://review.opendev.org/c/zuul/zuul/+/77090216:07
opendevreviewJames E. Blair proposed zuul/zuul master: Change zone handling in ExecutorApi  https://review.opendev.org/c/zuul/zuul/+/78783316:07
opendevreviewJames E. Blair proposed zuul/zuul master: Switch to string constants in BuildRequest  https://review.opendev.org/c/zuul/zuul/+/79184916:07
opendevreviewJames E. Blair proposed zuul/zuul master: Clean up Executor API build request locking and add tests  https://review.opendev.org/c/zuul/zuul/+/78862416:07
opendevreviewJames E. Blair proposed zuul/zuul master: Fix race with watches in ExecutorAPI  https://review.opendev.org/c/zuul/zuul/+/79230016:07
opendevreviewJames E. Blair proposed zuul/zuul master: Execute builds via ZooKeeper  https://review.opendev.org/c/zuul/zuul/+/78898816:07
opendevreviewJames E. Blair proposed zuul/zuul master: Move build request cleanup from executor to scheduler  https://review.opendev.org/c/zuul/zuul/+/79468716:07
opendevreviewJames E. Blair proposed zuul/zuul master: Handle errors in the executor main loop  https://review.opendev.org/c/zuul/zuul/+/79658316:07
opendevreviewJames E. Blair proposed zuul/zuul master: Shard BuildRequest parameters  https://review.opendev.org/c/zuul/zuul/+/79714916:07
opendevreviewJames E. Blair proposed zuul/zuul master: Compress sharded ZK data  https://review.opendev.org/c/zuul/zuul/+/79715616:07
opendevreviewJames E. Blair proposed zuul/zuul master: Switch to ZooKeeper backed merge result events  https://review.opendev.org/c/zuul/zuul/+/78419516:07
opendevreviewJames E. Blair proposed zuul/zuul master: Switch from global to tenant event queues  https://review.opendev.org/c/zuul/zuul/+/79744016:07
opendevreviewJames E. Blair proposed zuul/zuul master: Remove unused addResultEvent method  https://review.opendev.org/c/zuul/zuul/+/79754216:07
corvusclarkb: hrm, it probably does have connection information...16:07
corvusclarkb: it's possible we "leak" that now in the same way that we leaked the buildset registry info before16:08
clarkbhrm16:08
corvus(ie, it's only available after the job finishes, so it doesn't matter)16:08
*** marios|ruck is now known as marios|out16:09
corvusbut considering what we learned about the buildset registry, we should think about whether that's actually true16:09
clarkbya I was trying to think if there would be a case where it matters. In theory the creds are only used while the job is running and once the job is done it would no longer become a problem16:09
clarkbexceptions would be if we returned those credentials while the job was running (via console log or zuul_return) or if we reused credentials between jobs16:10
corvuszuul_return won't expose those until the end of the job though16:11
clarkbah yup, they would have to then be used by another job consuming the return somehow16:11
corvusi think my inclination right now is to say that it's okay because it doesn't escape the executor until the end of the job.  and that in general, the "inventory" part of that data is useful debugging info.  but if we wanted to tighten it up a bit, we could remove the credentials from there when writing the debug inventory.yaml file; but that would basically be special-casing some stuff.16:12
clarkbmaybe tristanC and tobiash[m] want to weigh in since they use systems with credetnials in there? (I don't think opendev really does)16:13
clarkbI'll +2 the change and they can approve it if they are ok wtih it16:14
corvus(and to be clear: it's not a regression -- but it is an opportunity to rethink something that has been happening for a while)16:16
clarkb++16:16
clarkbcorvus: https://review.opendev.org/c/zuul/zuul/+/795195 ianw reports that firefox 88 exhibits the issue with xterm.js text highlights16:16
clarkbI suspect that maybe there is an interaction between xterm.js and $browser $version. Probably worth updating to ensure it is happy regradless of browser16:17
corvusclarkb: yep -- do we know if anyone's rechecked the behavior in the build?16:17
clarkbcorvus: using the preview site? I haven't. But I can do that now16:18
clarkbI don't have unhappy FF though16:18
corvusoh interesting, the scrollbar is back to being unhappy for me, but i also don't have a selection issue either way.16:19
corvusanyway, change lgtm16:19
clarkbcorvus: on FF 90.0b12 the highlighting works for me using the site preview16:19
clarkbThe scrollbar seems to work16:19
corvusmy thought process was: something changed in production, better double check the change16:20
clarkb++16:20
clarkbI suspect the production hcnage was we all get newer FF installs16:20
corvusya16:20
corvus(and that also could affect the change :)16:20
opendevreviewJames E. Blair proposed zuul/zuul master: Use an actual database in all Zuul tests  https://review.opendev.org/c/zuul/zuul/+/79718116:22
avass[m]corvus, clarkb: do you mean the missing zuul.* vars and kubernetes/openshift credentials? or do other drivers use credentials?16:23
clarkbavass[m]: I think this would be largely limited to drivers that don't use key auth (so ya k8s and openshift)16:23
avass[m]so that16:23
corvusyeah, it's basically drivers that don't end up putting stuff in the ansible inventory16:24
avass[m]that's also specifically the namespace drivers I suppose16:24
corvusclarkb: https://review.opendev.org/796582 is the next v5 thing, btw; i think it's ready to go16:26
clarkbcorvus: ok I'll take a look16:26
opendevreviewJames E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase  https://review.opendev.org/c/zuul/zuul/+/79835016:29
opendevreviewJames E. Blair proposed zuul/zuul master: Use an actual database in all Zuul tests  https://review.opendev.org/c/zuul/zuul/+/79718117:31
opendevreviewJames E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase  https://review.opendev.org/c/zuul/zuul/+/79835017:31
opendevreviewJames E. Blair proposed zuul/zuul master: Report intermediate buildsets and builds  https://review.opendev.org/c/zuul/zuul/+/79718217:31
opendevreviewJames E. Blair proposed zuul/zuul master: Add nodeset to build table  https://review.opendev.org/c/zuul/zuul/+/79820717:31
opendevreviewJames E. Blair proposed zuul/zuul master: Add Postgres schema test  https://review.opendev.org/c/zuul/zuul/+/79836217:31
corvusclarkb: ^ i think that's your comments addressed17:31
corvustook a bit to figure out a workable (i hope?) solution for postgres schema comparison17:32
clarkbcorvus: thanks. I'm still working through the TreeCache replacement. It isn't a quick review17:32
corvusclarkb: oh sorry :(17:32
clarkbno worries, I just like to think about the state as I'm going through and should probably drag out pen and paper  but doing it in my head instead17:33
clarkbI think I'm almost through it now though.17:34
corvusclarkb: the good news is that the executor api behaves in exactly the same way (it's several changes down the stack); so it's a pattern we'll be establishing.  the executor api came first; then we found out the component registry had bugs that kept failing changes further down the stack.17:35
opendevreviewMerged zuul/zuul master: web: update to latest xterm; disabled selection overrides  https://review.opendev.org/c/zuul/zuul/+/79519517:45
clarkbcorvus: I left a couple of questions on the TreeCache replacement. Can you take a look at those?17:47
clarkbalso huh does postgres not do the show create table stuff?17:50
corvusnope17:50
corvusi mean, "nope" to question #2.  question #1 is "yes i'm looking now" :)17:51
corvusclarkb: replied17:54
clarkbwfm17:55
fungi"tell em yes on one and no on two" - buckaroo banzai17:55
opendevreviewJames E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase  https://review.opendev.org/c/zuul/zuul/+/79835019:33
opendevreviewJames E. Blair proposed zuul/zuul master: Report intermediate buildsets and builds  https://review.opendev.org/c/zuul/zuul/+/79718219:33
opendevreviewJames E. Blair proposed zuul/zuul master: Add nodeset to build table  https://review.opendev.org/c/zuul/zuul/+/79820719:33
clarkbif anyone has review time https://review.opendev.org/c/zuul/nodepool/+/793508 is the last change in hte azure driver rewrite stack. A pretty easy cleanup fix19:53
clarkbcorvus: https://review.opendev.org/c/zuul/nodepool/+/785821 is an accounting change in nodepool that you may want to look at. I think I didn't approve it due to the zuul 4.6.0 work and wanting to avoid landing any problems concurrent with that19:54
clarkbcorvus: its parent too19:54
clarkbtristanC: I have approved https://review.opendev.org/c/zuul/nodepool/+/777022 with one minor nit noted. I don't think its a big deal. more of a code simplification20:02
corvusclarkb, ianw, fungi, tobiash: re 785821 -- left a q on there20:16
corvusi left a +1 on that to indicate i think the change is probably okay, but maybe if we do the other thing we can delete a bunch of code and not have to think about changes like that :)20:18
clarkbcorvus: I think if we do that we have the launch failures due to being over quota when we think we are under? The provider will then go into pause mode until it frees space so it isn't fatal but can be noisy in logs iirc20:18
clarkbI think the idea behind the code there is to include leaked instances in quota calculations to avoid that if possible20:20
corvusclarkb: as long as the provider counts the leaked nodes as having reduced quota, the effect should be the same -- though it's true that we might bump into a quota error before we discover it.  should just happen once at most though.20:20
corvusalternatively, we could have the leak detector update the quota.20:20
corvusi think this will happen if/when we port openstack to the state machine driver framework anyway20:21
clarkbya I think this approach is one way to have the provider count the leaked nodes as reducing quota20:21
corvusclarkb: no sorry i meant something really specific by that20:22
clarkbhttps://review.opendev.org/c/zuul/nodepool/+/795642 is an unrelated change that should be approvable now (the functional openshift job just succeeded in the check queue for it)20:23
clarkbif anyone wants to be a quick second review on it20:23
corvusthe provider keeps track of quota that is reduced by nodes it doesn't know about; by "remembering" these nodes, they account for quota that we do know about.  if we "forget" them, then they should appear in quota of nodes we don't know about20:23
corvusevery driver other than openstack does the second thing, and is simpler.  openstack does the first thing and is complex.20:23
clarkbgot it20:24
clarkbmaybe we land these changes since they improve the current driver for those of us that run from master in particular. Then plan to switch when it gets rewritten to the state machine driver?20:25
clarkb(sound slike that would just happen for free)20:25
corvuswell, i dunno if i'd say "for free" :)20:26
corvusi don't know of anyone currently signed up to rewrite the openstack driver using state machine atm20:27
clarkbwell as a necessary part of any rewrite I guess (the rewrite itself isn't free)20:27
corvusso my personal preference would be to incrementally make it better and move toward it, which is why i made that suggestion20:27
clarkbI see20:27
clarkbI'll let ianw weigh in then as the original author20:28
corvusyeah, i'm not opposed; just that if presented the choice of "add more complexity" and "delete code" i'd go with 2.20:29
clarkbdoes the openstack dirver have handling for the unmanaged quota?20:29
corvusyep20:29
clarkbI assume so if it is common to the other drivers since they inherited a bunch of openstack stuff20:29
clarkbok cool20:29
corvusclarkb: can you review https://review.opendev.org/798374 real quick-like?  should unblock gating.dev dns20:32
clarkbdone20:34
opendevreviewMerged zuul/nodepool master: kubernetes: refactor client creation to utils_k8s  https://review.opendev.org/c/zuul/nodepool/+/77702221:16
corvusclarkb: if you have some brainspace left, this is an interesting non-change that i think could use your review: https://review.opendev.org/79405021:37
opendevreviewMerged zuul/nodepool master: QuotaInformation : abstract resource recording  https://review.opendev.org/c/zuul/nodepool/+/78709321:46
fungihopefully he's not passed out from heatstroke21:47
opendevreviewJames E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase  https://review.opendev.org/c/zuul/zuul/+/79835021:59
opendevreviewJames E. Blair proposed zuul/zuul master: Report intermediate buildsets and builds  https://review.opendev.org/c/zuul/zuul/+/79718221:59
opendevreviewJames E. Blair proposed zuul/zuul master: Add nodeset to build table  https://review.opendev.org/c/zuul/zuul/+/79820721:59
SpamapSHey all. Just wanted to say, well written security messaging today. Sorry for the bugs in 3! ;)22:59
fungiSpamapS: ooh, did you see the zuul-announce mailing list post first, or the twitters?23:08
fungiwe asked foundation comms folks to spread the word on social media today in case folks missed the ml announcement thursday23:09
corvusmaybe we need to bounce those messages to -discuss too?23:12
corvusspamaps: also, you have nothing to apologize for; you gave us bwrap which has basically saved our skin on every other security issue so far, so thanks!  :)23:15
corvusand thanks for the kudos on the msg :)  it's nice to know it's comprehensible23:16
opendevreviewJames E. Blair proposed zuul/zuul master: Remove ZuulDBTestCase  https://review.opendev.org/c/zuul/zuul/+/79835023:18
opendevreviewJames E. Blair proposed zuul/zuul master: Report intermediate buildsets and builds  https://review.opendev.org/c/zuul/zuul/+/79718223:18
opendevreviewJames E. Blair proposed zuul/zuul master: Add nodeset to build table  https://review.opendev.org/c/zuul/zuul/+/79820723:18
SpamapS<corvus "spamaps: also, you have nothing "> aww shucks. :) Glad it's been handy. It was a great idea, the origin of which I don't recall. :)23:34
SpamapS<corvus "and thanks for the kudos on the "> Yeah I just liked how clear and matter-of-fact it was. :) I saw it on the Twitters yes.23:34
SpamapSsadly, I rarely get around to reading -discuss until the weekend.23:35
fungiSpamapS: that's great feedback, it'll help me remember to get the openinfra social media crew involved sooner in future cases23:36
fungiby the time i was caught up on opendev's upgrade and the related config changes/fixes, thursday was gone, and friday's not a good day to announce things if you want people to actually notice23:37

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!